TensorFlow Audio Models in Essentia

  • Authors
  • Alonso-Jiménez P, Bogdanov D, Pons J, Serra X
  • UPF authors
  • BOGDANOV, DMITRY; SERRA CASALS, FRANCESC XAVIER; ALONSO JIMENEZ, PABLO; PONS PUIG, JORDI
  • Authors of the book
  • AA. VV.
  • Book title
  • Proceedings ICASSP 2020 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Publisher
  • IEEE
  • Publication year
  • 2020
  • Pages
  • 266-270
  • ISBN
  • 978-1-5090-6631-5
  • Abstract
  • Essentia is a reference open-source C++/Python library for audio and music analysis. In this work, we present a set of algorithms that employ TensorFlow in Essentia, allow predictions with pre-trained deep learning models, and are designed to offer flexibility of use, easy extensibility, and real-time inference. To show the potential of this new interface with TensorFlow, we provide a number of pre-trained state-of-the-art music tagging and classification CNN models. We run an extensive evaluation of the developed models. In particular, we assess their generalization capabilities in a cross-collection evaluation that uses both external tag datasets and manual annotations tailored to the taxonomies of our models.
  • Complete citation
  • Alonso-Jiménez P, Bogdanov D, Pons J, Serra X. TensorFlow Audio Models in Essentia. In: AA. VV. Proceedings ICASSP 2020 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 1st ed. Barcelona: IEEE; 2020. p. 266-270.