Slizovskaia O, Gómez E, Haro G. Automatic musical instrument recognition in audiovisual recordings by combining image and audio classification strategies. 13th Sound and Music Computing Conference (SMC 2016)
The goal of this work is to incorporate the visual modality into a musical instrument recognition system. To that end, we first evaluate state-of-the-art image recognition techniques in the context of musical instrument recognition, using a database of about 20,000 images and 12 instrument classes. We then reproduce the results of state-of-the-art methods for audio-based musical instrument recognition, considering standard datasets comprising more than 9,000 sound excerpts and 45 instrument classes. Finally, we compare the accuracy and confusions across both modalities and showcase how they can be integrated for audio-visual instrument recognition in music videos. We obtain an F1-measure of around 0.75 for audio and 0.77 for images, with similar confusions between instruments. This study confirms that the visual (shape) and acoustic (timbre) properties of musical instruments are related to each other and reveals the potential of audiovisual music description systems.
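To illustrate how per-modality predictions could be integrated, the following is a minimal sketch of one common late-fusion strategy: weighted averaging of class probabilities from the audio and image classifiers. The function name, weights, and example scores are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical late-fusion sketch (illustrative, not the paper's method):
# combine class probabilities from an audio classifier and an image
# classifier by weighted averaging, then pick the top class.

def fuse_predictions(audio_probs, image_probs, audio_weight=0.5):
    """Weighted average of two per-class probability dicts.

    audio_probs / image_probs map instrument class -> probability.
    Classes missing from one modality contribute 0 for that modality.
    """
    classes = set(audio_probs) | set(image_probs)
    fused = {
        c: audio_weight * audio_probs.get(c, 0.0)
           + (1.0 - audio_weight) * image_probs.get(c, 0.0)
        for c in classes
    }
    top_class = max(fused, key=fused.get)
    return top_class, fused

# Example with made-up scores for three instrument classes:
audio = {"violin": 0.6, "cello": 0.3, "flute": 0.1}
image = {"violin": 0.4, "cello": 0.5, "flute": 0.1}
label, scores = fuse_predictions(audio, image)
# With equal weights, violin scores 0.5 vs cello's 0.4
```

Tuning `audio_weight` lets one modality dominate when it is known to be more reliable for a given class, which is one way the complementary audio and visual confusions reported above could be exploited.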
Additional material:
- Code on GitHub
- IRMAS: A dataset for instrument recognition in musical audio signals
- RWC Music Database
- ImageNet ILSVRC collection