Exploring Audio Datasets with Python

Create a simple GUI to browse large datasets

Image datasets are easy to explore. Even with hundreds of images, we can scroll through directories and glance at the data, quickly noticing interesting properties such as colours, locations, or the time of day. If we used the same strategy for audio datasets, however, we would not get far: we would have to listen to, or skip through, one file at a time. For a large number of samples, this approach is prohibitively slow.

A solution is to visualize the audio files. Rather than listening to the samples sequentially, we plot different characteristics. The most basic plot is the waveform, which is shown in this example:

A visualization of the waveform for several audio samples. Image by the author.
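As a rough illustration of how such a grid of waveforms could be produced, here is a minimal sketch. It is not the code behind the figure above: the helper names `load_wav` and `plot_waveforms` are hypothetical, the loader assumes 16-bit PCM WAV files, and Matplotlib is assumed to be installed for the plotting part.

```python
import wave
from pathlib import Path

import numpy as np


def load_wav(path):
    """Read a 16-bit PCM WAV file and return (mono samples in [-1, 1], sample rate)."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM files")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    # Average the channels so that stereo files become mono
    samples = samples.reshape(-1, channels).mean(axis=1)
    return samples, rate


def plot_waveforms(paths, columns=4):
    """Draw one waveform per file in a grid (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so load_wav works without it

    rows = max(1, -(-len(paths) // columns))  # ceiling division
    fig, axes = plt.subplots(rows, columns, figsize=(3 * columns, 2 * rows), squeeze=False)
    for ax in axes.flat:
        ax.axis("off")  # hide every cell first; used cells are re-enabled below
    for ax, path in zip(axes.flat, paths):
        samples, rate = load_wav(path)
        ax.axis("on")
        ax.plot(np.arange(len(samples)) / rate, samples, linewidth=0.5)
        ax.set_title(Path(path).name, fontsize=8)
    fig.tight_layout()
    return fig
```

Calling `plot_waveforms(sorted(Path("data").glob("*.wav")))` would then give a quick overview of a folder, where `data` stands in for wherever the dataset lives.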

We can see that the samples in the first row all show similar patterns. In the second row, the sound seems to be concentrated at the beginning. Switching to a spectrogram representation lets us examine the samples further:

A visualization of the spectrogram for several audio samples. Image by the author.

The y-axis shows the frequency and the x-axis shows the time. The colour carries the remaining information: the brighter an area, the more energy it contains. For our second row, the energy seems to be concentrated in the first two to three seconds.
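To make the three dimensions of a spectrogram concrete, the following sketch computes one with a plain short-time Fourier transform in NumPy. The function names and the parameters `n_fft` and `hop` are assumptions chosen for illustration; the figures in this article may well have been produced with a dedicated audio library and different settings.

```python
import numpy as np


def spectrogram_db(samples, rate, n_fft=1024, hop=256):
    """Return (times, freqs, power_db), with power_db shaped (frequency, time)."""
    samples = np.asarray(samples, dtype=np.float64)
    if len(samples) < n_fft:
        # Pad very short clips so that at least one frame fits
        samples = np.pad(samples, (0, n_fft - len(samples)))
    window = np.hanning(n_fft)
    starts = np.arange(0, len(samples) - n_fft + 1, hop)
    # One windowed frame per row
    frames = np.stack([samples[s:s + n_fft] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    power_db = 10.0 * np.log10(power + 1e-10)  # small offset avoids log(0)
    times = (starts + n_fft / 2) / rate         # frame centres in seconds
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / rate)
    return times, freqs, power_db.T


def plot_spectrogram(samples, rate):
    """Time on the x-axis, frequency on the y-axis, energy as colour."""
    import matplotlib.pyplot as plt  # assumed to be installed

    times, freqs, power_db = spectrogram_db(samples, rate)
    fig, ax = plt.subplots()
    mesh = ax.pcolormesh(times, freqs, power_db, shading="auto")
    ax.set_xlabel("Time [s]")
    ax.set_ylabel("Frequency [Hz]")
    fig.colorbar(mesh, ax=ax, label="Power [dB]")
    return fig
```

Each column of `power_db` describes one short slice of time, so bright columns near the left edge correspond to energy at the start of the clip.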

In the next step, we take a single sample and visualize several transformations of it at the same time, as shown below:

A combined view of different visualization techniques. Image by the author.
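A combined view of this kind can be sketched by computing several representations of one clip and stacking them in a single figure. The article does not list which transformations the view contains, so the three used here (waveform, frame-wise RMS energy, and magnitude spectrum) are an assumption, as are the names `compute_views` and `plot_views`.

```python
import numpy as np


def compute_views(samples, rate, frame=1024, hop=512):
    """Return a dict mapping a view name to its (x, y) arrays."""
    samples = np.asarray(samples, dtype=np.float64)
    # Frame-wise root-mean-square energy: a coarse loudness curve over time
    starts = np.arange(0, max(len(samples) - frame, 0) + 1, hop)
    rms = np.array([np.sqrt(np.mean(samples[s:s + frame] ** 2)) for s in starts])
    # Magnitude spectrum of the whole clip
    spectrum = np.abs(np.fft.rfft(samples))
    return {
        "waveform": (np.arange(len(samples)) / rate, samples),
        "rms": (starts / rate, rms),
        "spectrum": (np.fft.rfftfreq(len(samples), d=1.0 / rate), spectrum),
    }


def plot_views(views, title=""):
    """Stack all views vertically in one figure."""
    import matplotlib.pyplot as plt  # assumed to be installed

    fig, axes = plt.subplots(len(views), 1, figsize=(8, 2.5 * len(views)), squeeze=False)
    for ax, (name, (x, y)) in zip(axes.flat, views.items()):
        ax.plot(x, y, linewidth=0.6)
        ax.set_title(name)
    fig.suptitle(title)
    fig.tight_layout()
    return fig
```

The RMS curve gives a numeric counterpart to the visual impression: for a clip whose sound sits at the beginning, the first values are clearly larger than the last ones.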

Doing all this manually, over and over, is tedious. Thanks to Python and Streamlit, however, we can develop a simple GUI that is rendered in the browser. You can try it live here.
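To give an idea of how little code such a browser-based GUI needs, here is a minimal sketch of a file browser with playback. It is not the app linked above: the helper `list_audio_files`, the restriction to WAV files, and the layout are assumptions made for illustration.

```python
from pathlib import Path

# Assumption: only WAV files are listed; extend the set for other formats
AUDIO_SUFFIXES = {".wav"}


def list_audio_files(root):
    """Return all audio files below root, sorted for a stable order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )


def main():
    import streamlit as st  # third-party; imported here so the helper above has no dependencies

    st.title("Audio dataset browser")
    root = st.sidebar.text_input("Dataset directory", value=".")
    files = list_audio_files(root)
    if not files:
        st.warning("No audio files found.")
        return
    chosen = st.sidebar.selectbox("File", files, format_func=lambda p: p.name)
    st.write(f"{len(files)} files found; showing {chosen.name}")
    st.audio(str(chosen))  # playback widget for the selected file
```

To try it, save the code to a file (for example a hypothetical `app.py`), add a call to `main()` at the end, and start it with `streamlit run app.py`. Plots such as waveforms or spectrograms could then be added to the page with `st.pyplot`.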


via WordPress https://ramseyelbasheer.io/2021/07/20/exploring-audio-datasets-with-python/