[MSc thesis] Audio Data Augmentation with respect to Musical Instrument Recognition

Author: Siddharth Bhardwaj

Supervisors: Olga Slizovskaia, Emilia Gómez and Gloria Haro

MSc program: Master in Sound and Music Computing

Identifying musical instruments in a polyphonic music recording is a difficult yet crucial problem in music information retrieval. It enables auto-tagging of a musical piece by instrument and, consequently, searching music databases by instrument. Other useful applications of instrument recognition include source separation, genre recognition, music transcription, and instrument-specific equalization. We review the state-of-the-art methods for the task, including recent approaches based on Convolutional Neural Networks (CNNs). These deep learning models require large quantities of annotated data, a problem that can be partly addressed by synthetic data augmentation. We study different types of audio data transformations that can help in various audio-related tasks, and publish an augmentation library in the process. We investigate the effect of using augmented data when training three state-of-the-art CNN-based models. We achieve a performance improvement of 2% over the best-performing model with almost half the number of trainable model parameters. We attain a 6% performance improvement for the single-layer CNN architecture and 4% for the multi-layer architecture. We also study the influence of each type of audio augmentation on each instrument class individually.
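
To illustrate what an audio data transformation can look like, the following is a minimal sketch of three common waveform augmentations: additive noise, circular time shift, and gain change. It is not the augmentation library published with the thesis, and the abstract does not say which transformations that library implements. All function names, parameter values, and the synthetic sine tone used as input are assumptions made for illustration, and the code uses only the Python standard library to stay self-contained.

```python
import math
import random


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at roughly the requested signal-to-noise ratio."""
    power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def time_shift(signal, shift):
    """Circularly shift the signal by `shift` samples (positive = delay)."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift] if shift else list(signal)


def apply_gain(signal, gain_db):
    """Scale the amplitude by a gain given in decibels."""
    factor = 10 ** (gain_db / 20)
    return [x * factor for x in signal]


def augment(signal, rng):
    """Return a few hypothetical augmented variants of one training example."""
    return {
        "noise": add_noise(signal, snr_db=20.0, rng=rng),
        "shift": time_shift(signal, shift=rng.randrange(len(signal))),
        "gain": apply_gain(signal, gain_db=rng.uniform(-6.0, 6.0)),
    }


# A 440 Hz sine tone (0.1 s at 8 kHz) stands in for a real recording.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(800)]
variants = augment(tone, random.Random(0))
```

Each variant keeps the instrument label of the original example, so a training set can be enlarged without additional annotation. In practice such transformations would more likely be applied to arrays with a numerical library than to Python lists.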

Additional material:

Link: https://doi.org/10.5281/zenodo.1066136

DTIC MdM Strategic Program: Artificial and Natural Intelligence for ICT and beyond
