Resources for the Audio Application

This application is based on sound recordings. Except for Session 1, they come mostly from the Silent Cities Dataset, from which we extract subdatasets specifically for this course.

Resources for Session 1

Dataset

For Lab Session 1, we use all training examples from two classes of the ESC-50 dataset.

  • The two classes are "Dog barking" and "Fireworks".
  • Each sound lasts 5 seconds
  • Sampling rate is 44.1 kHz
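As a quick sanity check on these figures, the number of samples in one clip follows from the duration and the sampling rate. The variable names below are illustrative only, and the count is per audio channel.

```python
# Values taken from the list above.
SAMPLING_RATE_HZ = 44_100  # 44.1 kHz
DURATION_S = 5             # seconds per clip

# Number of samples per clip, per audio channel.
samples_per_clip = SAMPLING_RATE_HZ * DURATION_S
print(samples_per_clip)
```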

More details can be found in the original paper.

Visualisation - Listening to a few sounds

Dog

Fireworks

Latent Space

The 80 sounds have been projected into a latent space using CNN14, a deep learning model from this paper: PANNs. We will go into the details of Deep Learning and feature extraction starting in course 4.

For now, you can simply load the NumPy array containing all samples in the latent space from the file embeddings-audio-lab1.npz.
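A minimal sketch of how such a file could be opened is given below. An .npz file is an archive that can hold several named arrays, and the names stored in embeddings-audio-lab1.npz are not listed here, so the helper simply returns everything it finds. The function name load_npz_arrays is hypothetical and not part of the course material.

```python
import numpy as np


def load_npz_arrays(path):
    """Return a dict mapping each array name stored in an .npz archive to its array.

    Hypothetical helper: the array names inside the course file are not
    assumed, they are read from the archive itself.
    """
    with np.load(path) as archive:
        # archive.files lists the names of the arrays stored in the archive.
        return {name: archive[name] for name in archive.files}
```

Calling load_npz_arrays("embeddings-audio-lab1.npz") and printing the shape of each returned array is a simple way to find out which entry holds the embeddings, provided the file sits in the working directory.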

Work to do

Compute, visualize and interpret the distance matrix, as explained on the Lab Session 1 main page.
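As a starting point, here is one possible sketch of a pairwise distance computation. It assumes the embeddings form a 2-D array with one row per sound and uses the Euclidean distance. Both are assumptions: the Lab Session 1 main page remains the reference for the distance actually expected, and the function name pairwise_euclidean is illustrative.

```python
import numpy as np


def pairwise_euclidean(X):
    """Return the (n, n) matrix of Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    # Squared norm of each row, shape (n,).
    sq_norms = np.sum(X ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs at once.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Rounding errors can produce tiny negative values; clip them to zero.
    np.maximum(sq_dists, 0.0, out=sq_dists)
    # The distance from a sample to itself is exactly zero.
    np.fill_diagonal(sq_dists, 0.0)
    return np.sqrt(sq_dists)
```

With 80 sounds the result is an 80 x 80 symmetric matrix, which can then be displayed as an image, for example with matplotlib's imshow.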