27 June 2019
09:30 - 10:00 Registration
10:00 - 11:20 Oral Session 1
- Xavier Serra (MTG): 25 years of the MTG
- Álvaro Barbosa (University of Saint Joseph): From Networked Music to Designing Musical Interactions
- Thomas Aussenac (Sound Object): Presentation of Sound Object
- Rafael Ramirez (MTG): Technology-Enhanced Music Learning, Health, and Well-Being
- Frederic Font (MTG) & Bram de Jong (Holoplot): Freesound: Past, Present, and Future
- Jordi Pons (MTG): End-to-end Learning for Music Audio Tagging at Scale
- António Ramires (MTG): Methods for Supporting Electronic Music Production with Large-Scale Sound Databases
- Lorenzo Porcaro (MTG): 20 Years of Playlists: A Statistical Analysis on Popularity and Diversity
- Olga Slizovskaia (MTG): End-to-End Sound Source Separation Conditioned on Instrument Labels
- Juan Sebastian Gómez (MTG): The Emotions that We Perceive in Music: Agreement and Language
- Miguel Garcia Casado (MTG): Contributing to New Musicological Theories with Computational Methods: Quantifying Melodic Pattern Importance in Arab-Andalusian Music
- Andres Ferraro (MTG): Music Cold-Start and Long-Tail Recommendation: Evaluation of Bias in Deep Model Representations
- Rachit Gupta (MTG): Comprehending Hindustani Music by Analysing Rhythmic Patterns
- Vsevolod Eremenko & Blazej Kotowski (MTG): Automatic Assessment of Beginner-Level Guitar Exercise Performances from Audio Recordings
- Merlijn Blaauw (MTG): Recent Work on Neural Singing Synthesis
- Pritish Chandna (MTG): A Vocoder-Based Method for Singing Voice Extraction
- Genís Plaja (UPF): Sounds of Science: Share your Research Works Through the Matter of Sound
- Rubén Hinojosa (Independent consultant): Technoplayer: A Digital Algorithmic Music Instrument for Real-time Interactive Performances
12:10 - 12:30 Musical Interlude 1
- Sons de Barcelona: Free-Sounds de Barcelona
12:30 - 13:30 Oral Session 2
- Albin Correya (MTG): Music Tech Community India
- Pedro J. González González (Ubiquotechs – Ubox): Trends in Audio Interactivity: From Theremin to Web Audio API
- Joachim Haas (SWR EXPERIMENTALSTUDIO): The SWR EXPERIMENTALSTUDIO: 40 Years of Live-Electronic Practice
- Sonia Espí (MTG): Beyond Research: Outreach and Communication
- Waldo Nogueira (Hannover Medical School): Making Music More Accessible for Cochlear Implant Listeners
- Enric Guaus (ESMUC & MTG): Decisions in my Academic Career
13:30 - 14:30 Lunch break
14:30 - 15:50 Oral Session 3
- Sergi Jordà (MTG): Before and After the Reactable: 25 Years of Musical Interaction at the MTG
- Georgi Dzhambazov (Voice Magix): From Research Prototype to a Product: The Music Tech Perspective
- Alastair Porter (MTG): Using the Crowd to Analyse Music at Scale with AcousticBrainz
- Adan Garriga (EURECAT): 15 Years of Research in 3D Audio at Eurecat
- Jordi Bonada (MTG): 20 Years of Singing Voice Synthesis at the MTG
- Gerard Roma (University of Huddersfield): The Fluid Decomposition Toolbox: Decomposing Audio in Creative Coding Environments
- Andres Lewin-Richter (Phonos): Phonos, the Cradle of the MTG
- Ángel Faraldo (Phonos): Phonos at 45: Midlife Crisis or Second Adolescence?
16:40 - 17:00 Musical Interlude 2
- Sergi Jordà & Ángel Faraldo: Reactable Duet
17:00 - 19:00 Oral Session 4
- Fabien Gouyon (Pandora): Is Audio the Future of Music Streaming?
- Cyril Laurier (Hand Coded): From Artificial Intelligence to Art Installations
- Joan Serrà (Telefónica): The Amazing Normalizing Flows
- Oscar Mayor (Voctrolabs): My (almost) 20 Years at the MTG in Ten Minutes
- Martin Kaltenbrunner (University of Art and Design in Linz): Post-Digital Lutherie at the Tangible Music Lab
- Amaury Hazan (Billaboop): 10 Years Post-MTG, Music Technology from Mobile to Embedded
- Paul Brossier (Aubio): Around Aubio: 15 Years of Consulting for Open Source Music Applications
- Ramon Mañas (Odiseimusic): Travel Sax, The Smallest and Lightest Electronic Saxophone in the World Designed in Barcelona!
- Maarten DeBoer & Pau Arumí (Dolby): Dolby Atmos: Object-Based Immersive Audio
19:00 Beers & Networking
28 June 2019
10:00 - 11:30 Oral Session 5
- Emilia Gómez (MTG): MIR Technologies: The Human Rights Perspective
- Rafael Caro Repetto (MTG): The Musical Bridges project: Technology for the Understanding of Music Cultures
- Hendrik Purwins (Accenture): Deep Learning for Audio Signal Processing
- Miguel Risueño AKA Mike808 (Production.Club): From Creative Coder to Creative Director
- Martí Umbert (Verbio): R&D on Speech Technologies at Verbio
- Dmitry Bogdanov (MTG) & Nicolas Wack (Jacoti): Essentia: Past, Present, and Future
- Philip Tovstogan (MTG): Facilitating Interactive Music Exploration
- Xavier Favory (MTG): Graph-Based Audio Clustering for Browsing Online Sound Collections
- Luis Joglar Ongay (SonoSuite & MTG) & Stefano Negrín (SonoSuite): Automation of Processes for the Digital Music Distribution Industry
- Aggelos Gkiokas (MTG): Deep Learning Methods for Automatic Beat Tracking
- Furkan Yesiler (MTG): Identifying and Understanding Versions of Songs with Computational Approaches
- Blai Melendez (BMAT & MTG): Open Broadcast Media Audio from TV: a Dataset of TV Broadcast Audio with Music Relative Loudness Annotations
- Eduardo Fonseca (MTG): Model-Agnostic Approaches to Handling Noisy Labels When Training Sound Event Classifiers
- Ninad Puranik (MTG): Automatic Assessment of Hindustani Singing Exercises
- Fabio Ortega (MTG): Teaching Music Expression Using Data Visualization and Machine Learning Models
- Alia Morsi (MTG): From General to Culture-Specific and Back: How Dunya and CompMusic Empower Both Culture-Specific and More General Music Research Tasks
- Sergio Giraldo (MTG): Enhancing Music Learning with Smart Technologies
- David Dalmazzo (MTG): Gesture Classification in Music Performance: A Machine Learning Approach
12:30 - 13:30 Oral Session 6
- Marius Miron (European Commission's Joint Research Centre): Investigating the Causes of Group Unfairness in Machine Learning Classification
- Lorena Aldana (Bielefeld University): Cardio Sounds: ECG Sonification to Support Cardiac Diagnosis and Monitoring
- Gabriel Meseguer (IRCAM): DALI: A Large Dataset of Synchronized Audio, Lyrics, and Notes, Automatically Created Using a Teacher-Student Machine Learning Paradigm
- Javier Nistal (Sony): Conditional Generation of Audio Using Deep Neural Networks and its Applications to Music Production
- Leny Vinceslas (Loughborough University London): Multi-Sound Zone Reproduction: Modelling Personal Audio Spaces in a Multi-User Environment
- Adrian Mateo (SEAT): The MTG Sound Effect: A Personal Review
13:30 - 14:30 Lunch break
14:30 - 15:50 Oral Session 7
- Perfecto Herrera (ESMUC & MTG): What If?
- Sergio Oramas (Pandora): From Software Engineer to Data Scientist
- Jose Luis Zagazeta (SonoSuite): Empowering the Music Industry Through Innovative Technology Solutions
- Jordi Janer (Voctrolabs): A Voiceful Story: Singing Technologies for Creative Applications
- Enric Giné (Tasso): Technology, Techniques, and Standards for the Preservation of Historical Sound Recordings
- Bruno Rocha (Universidade de Coimbra): The Challenges of Building a Respiratory Sound Database
- David Vidal Gonzalez & Jorge Fuentes (Musical Instruments Innovation Lab): Manufacturing Disruptive High-End Musical Instruments by Applying Acoustically Engineered Carbon Fibre Composites
16:30 - 18:30 Oral Session 8
- José Lozano (UPF): La Candelaria, El Raval, El Born, PobleNou
- Jorge Garcia (independent consultant): Working in Game Audio Technology
- Eduard Resina (ESMUC): Don’t Loop on Me
- Alex Loscos (BMAT): BMAT the FAQ
- Emilio Molina (BMAT): R&D on Music Monitoring at BMAT
- Felipe Navarro (MTG): Creative Technologies: From Art to Therapy
- Ferdinand Fuhrmann (Joanneum Research): Developing and Deploying Machine Listening Systems for Commercial Applications
- Juanjo Bosch (Spotify): From Automatic Music Description to Computer-Aided Music Creation
- Zacharias Vamvakousis (Eyeharp): Teaching Music with the EyeHarp Gaze Controlled Music Interface
19:00 - 21:30 Closing event: MTGers music (Plaça Gutenberg)
- Ignasi Nou (drums), Lorenzo Porcaro (bass), David Dalmazzo (guitar) & Sergio Giraldo (guitar): The Machine Learning Lab Jazz Quartet
- Albin Correya & Xavier Favory: Poly Atmospheres
- António Ramires, a.k.a. Caucenus: Dubstep / Batida
- Sergio Oramas (vocals & samples) + Emilio Molina (keys): Supertrópica
- Open Jam Session
Xavier Serra (MTG): 25 years of the MTG [video]
In 1994 I joined the UPF and assembled a small team of researchers to work on music technology. Soon after, we decided to name ourselves the Music Technology Group. The origins and evolution of the MTG are linked to a number of institutions that have shaped who we are. Our seed was the Phonos Foundation, and the trigger was its integration into the (now defunct) Audiovisual Institute of the UPF. The MTG then became one of the founding groups of the Department of Information and Communication Technologies, of which we remain an integral part. But the development of the current MTG has been a long journey, shaped by the many people who have worked and collaborated with us. During these two days, we will get a glimpse of these 25 years of history and their impact.
Álvaro Barbosa (University of Saint Joseph): From Networked Music to Designing Musical Interactions [video]
In 2006 I defended my PhD thesis at the MTG. It was one of the earliest solid research works in the field of Networked Music, and it has been the foundation for the follow-up research I carried out over the next 13 years. In this presentation, I will give a short overview of the most relevant doctoral research projects I have supervised during the last decade, and of how they link to my academic career.
Thomas Aussenac (Sound Object): Presentation of Sound Object [video]
Sound Object is a sound & music company based in Barcelona. We are experts in audio branding, sound design, original music for TV/web/radio ads, video games, films, and new media art.
Rafael Ramirez (MTG): Technology-enhanced music learning, health, and well-being [video]
Learning to play a musical instrument has been shown to provide a number of benefits, both for health and well-being and for acquiring non-musical skills. However, access to music education is far from universal, and musical instrument learning is mostly based on the master-apprentice model, in which the student’s interaction with the teacher is often restricted to short, occasional contact followed by long periods of self-study, resulting in high abandonment rates. Part of our work has been devoted to enhancing music education by using technology to create new interactive, assistive, self-learning, and multimodal systems that complement traditional teaching. In this talk, I will present research carried out in our lab on technology-enhanced music learning and accessible music interfaces.
Frederic Font (MTG) & Bram de Jong (Holoplot): Freesound: past, present, and future [video]
Freesound was created at the MTG back in 2005, in the context of the International Computer Music Conference (ICMC). At that time, no one could have imagined that 14 years later Freesound would be home to more than 410,000 sounds uploaded by more than 22,000 users, and would have served more than 137 million sound downloads. Freesound is most probably the MTG project with the biggest impact worldwide. In this talk, we'll walk you through Freesound's history from its creation to the present day, share some curious anecdotes, and talk about our future plans.
Albin Correya (MTG): Music Tech Community India [video]
India is a country with a vibrant music scene grounded in centuries of rich musical tradition. At the same time, India has experienced rapid technological advances in recent decades. However, the exponential growth of new technologies has not been matched by an equivalent growth of Music Technology spaces. While the reasons for this are many and complex, Music Tech Community - India (MTC - India) was created to address two crucial needs: the lack of awareness of this interdisciplinary field and the lack of accurate information about it. MTC - India is a group of volunteers that brings together musicians, technologists, artists, developers, audiophiles, and makers with an interest in Music Technology. It aims to foster interaction among people at different ends of the Music Technology spectrum, in order to build an open, transparent, and actively collaborative community. Since it was founded in April 2018 by a group of former MTG students, MTC - India has organized workshops, talks, and networking events towards this goal. This presentation outlines its past activities and its long-term plans.
Pedro J. González González (Ubiquotechs – Ubox): Trends in audio interactivity: From Theremin to Web Audio API [video]
Since the middle of the 20th century, there have been many advances in both analog and digital audio and music technologies, for creation as well as for analysis. The 21st century has been driven by ubiquitous web technologies, which make possible projects that were unfeasible just a few years ago. Thanks to these technologies and to the computing power of today’s computers and mobile phones, most of the audio equipment of the past can be emulated with great accuracy and controlled from nearly anywhere. This talk will cover, among other topics, the opportunities that the Web Audio API opens up for interactivity, enabling synthesis engines that run in the browser as well as tools for analysing sound and music in both the time and frequency domains.
Joachim Haas (SWR EXPERIMENTALSTUDIO): The SWR EXPERIMENTALSTUDIO: 40 years of live-electronic practice [video]
The SWR EXPERIMENTALSTUDIO was founded in 1971 in Freiburg, Germany. It is situated as an interface between compositional ideas and their technical realisation. Every year several composers and musicians are invited to accept a fellowship in order to realize their works with the studio’s special equipment in interaction with its staff members: experts in computer music, sound designers, sound engineers, and sound directors. The EXPERIMENTALSTUDIO then takes an active role in the worldwide performance of these works. With its 40-year presence in the international music scene, it has become one of the leading centers for ambitious works of music with live electronics.
Sonia Espí (MTG): Beyond research: outreach and communication [video]
What is the impact of our research on society? How much do we think about the potential users of our work and how they can benefit from our research? And, most importantly, do they get to know what we do? For the MTG, it is very important to dedicate effort to communicating and transferring our results beyond academic environments and to engaging with the creative community, young people, industry, and society in general. In this talk, I will present the different actions we take at the MTG to improve our social impact.
Waldo Nogueira (Hannover Medical School): Making Music More Accessible for Cochlear Implant Listeners [video]
Cochlear implants (CIs) have become remarkably successful in restoring the hearing abilities of profoundly hearing-impaired people. Although in most cases speech understanding with CIs reaches around 90%, key musical features such as pitch and timbre are poorly transmitted by CIs, leading to a severely distorted perception of music. Because music is a ubiquitous means of sociocultural interaction, this handicap significantly degrades the quality of life of CI users. This contribution therefore presents recent developments that enable CI users to access music. More concretely, it describes the limitations of pitch and timbre perception with CIs and their implications for music perception. Next, it reviews different emerging strategies for improving CI users' music enjoyment, such as customized music compositions, music pre-processing methods that reduce signal complexity, and improved sound coding strategies, together with subjective and objective instrumental evaluation procedures.
Enric Guaus (ESMUC & MTG): Decisions in My Academic Career [video]
About 18 years ago, I started collaborating with the Music Technology Group as a master's student. I came from the field of acoustics and vibrations and, given my musical background, being accepted at the MTG to start working on signal processing was a great success. This was my first important decision. Since that September in 2001, I've had to make some (minor) decisions that, as a whole, have led me to my (not so bad) current personal and academic situation. In this talk, I will share my personal viewpoint on topics such as participating in research projects vs teaching duties, considering the PhD thesis as the end or as the starting point of an academic career, and pursuing research in academia vs founding a start-up. I'm not trying to provide recipes, but rather a senior point of view that I hope can help you in your long-term professional career.
Sergi Jordà (MTG): Before and after the reactable: 25 Years of Musical Interaction at the MTG [video]
The reactable has clearly been the most popular and acclaimed outcome of the research in musical interaction we have pursued at the MTG since day 1, but it has definitely not been the only one! In this talk, I will provide an overview of these 25 years of work, which have involved more than 15 researchers, and also summarise how this field has evolved over this quarter-century.
Georgi Dzhambazov (Voice Magix): From Research Prototype to a Product: The Music Tech Perspective [video]
The gap between music technology research and the music industry is still wide! Despite the ever-increasing amount and quality of research papers, there are still only a few cases of successful industry products that apply research outcomes. Why is it a challenge to start your own music tech product? In this talk, I will outline the fundamental steps to build a market-ready product out of the debris of an existing research prototype, drawing on my personal experience while building www.voicemagix.com. Among other things, the focus will be on aspects such as time-to-market, market study, and monetization model, with references to key resources like music technology incubators and music business conferences. The talk can serve as a “quick start” for aspiring entrepreneurs who would like to validate the market potential of their ideas for music tech startups.
Alastair Porter (MTG): Using the crowd to analyse music at scale with AcousticBrainz [video]
AcousticBrainz is an online database of features computed from audio files. We ask thousands of volunteers to analyse their music collections using Essentia and send the results to us. This has allowed us to build a large collection of automatically generated features for over 10 million music recordings. This talk will cover the research that we have performed to date with this collection of features, and our future plans.
Adan Garriga (EURECAT): 15 years of research in 3D audio at Eurecat [video]
I will review the research projects and achievements of the audio group at Eurecat (formerly Barcelona Media), founded in 2004. I will focus on 3D audio advances, from the incubation of Imm Sound to the recent development of 3D audio technologies for music production: Sfëar.
Jordi Bonada (MTG): 20 years of singing voice synthesis at the MTG [video]
In this talk, we will give an overview of our research in singing synthesis over the last two decades, and of the impact that deep learning has had on it. We will see how our models evolved from sinusoidal-plus-residual models to spectral models, and currently to deep learning approaches that model vocoder features. We will also see how we moved from manual expression controls to automatic expressive singing generation. While in the past our datasets consisted of pseudo-singing recordings, highly constrained to a constant pace and loudness, nowadays we directly use phonetically segmented expressive songs without score annotations. Our models can learn vocal loudness and vocal strength without specific annotations and are able to automatically generate expressive singing.
Gerard Roma (University of Huddersfield): The Fluid Decomposition Toolbox: Decomposing Audio in Creative Coding Environments [video]
The Fluid Decomposition Toolbox is a C++ library developed within the Fluid Corpus Manipulation (FluCoMa) project, which makes a number of signal processing algorithms for audio decomposition available to popular creative coding environments. The goal is to facilitate music creation based on audio collections by enabling many possible ways of splitting recordings into "layers", "slices" and "objects", while providing a few descriptors and other related utilities. The beta version of the library is now available. I will summarize the general architecture and the available algorithms, as well as remaining challenges and future directions.
Andres Lewin-Richter (Phonos): Phonos, the cradle of the MTG [video]
Who could have guessed in 1974 that Phonos would one day be part of a university? When the four of us established Phonos S.A., we had the clear idea that an electronic music studio needed three clearly defined departments: (1) Education, (2) Research (at that time, building hardware; later, moving into the digital world), and (3) Production (composition and concerts). All of this was achieved and later split up: in 1994 the UPF took over education (IDEC, later ESMUC-Sonology) and research (today the MTG), and Phonos now takes care only of composition and concerts, with a clear commitment to expanding on the MTG's research.
Ángel Faraldo (Phonos): Phonos at 45: midlife crisis or second adolescence? [video]
In this presentation, I will openly evaluate my first year as artistic director of Phonos, sharing the ideas behind our artistic and dissemination programmes and engaging in some self-criticism that takes into account attendance at our events, our relations with the University and other cultural institutions in Barcelona, and the non-trivial task of bridging research in (music) technology and music-making. Based on these facts, I will reflect on the challenges that a foundation such as Phonos needs to face in order to renew its mission and raison d'être in the present day. These reflections should guide the design of future dissemination and educational activities that can strengthen our position, between the city and academia, as a reference centre for the creation and dissemination of musics made with new technologies.
Fabien Gouyon (Pandora): Is Audio the Future of Music Streaming? [video]
The early 2000s were transformative for the music industry. They were the years of the first internet music companies, such as Napster and Pandora, and the first years of MIR at the MTG (and anywhere, really). Back then, a number of people in this room predicted, and were damn sure, that (our PhD research on) audio processing was the answer to pretty much everything, including saving the music industry. It wasn't, really, at that time. But is now the time for audio? Guessing the future (or even the present, in the case of music streaming...) is difficult. But then again, it appears The Simpsons correctly guessed the future about 20 times, so let's give it another try.
Cyril Laurier (Hand Coded): From Artificial Intelligence to Art Installations [video]
After my PhD at the MTG, I co-founded Hand-Coded, a collective investigating new technologies for light and sound installations. It quickly became a playground for developing new ideas, and especially for experimenting with how to use machine learning in an artistic context. I will present some of those works, such as a touring audio-reactive DJ stage that analyses the music in real time to generate visuals and control kinetic lights.
Joan Serrà (Telefónica): The Amazing Normalizing Flows [video]
I will briefly present a new deep architecture for generative modeling and density estimation, commonly termed normalizing flows. Easier to train than generative adversarial networks and faster than autoregressive models, normalizing flows have a number of desirable properties that I will discuss. I will end the talk with some audio-related applications that we have been developing over the last year.
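The core idea behind normalizing flows is the change-of-variables formula: if x = f(z) with f invertible and z drawn from a simple base density, then log p_x(x) = log p_z(f^-1(x)) + log|d f^-1 / dx|. The following is a minimal one-dimensional sketch, assuming a standard normal base distribution and two hand-picked invertible layers (an affine map and sinh). The class names and parameter values are hypothetical and are not taken from the talk or from any library; real flows use learned, multi-dimensional layers.

```python
import math

# Hypothetical 1-D normalizing flow: x = f(z), z ~ N(0, 1),
# where f applies an affine map and then sinh (both invertible).


class Affine:
    def __init__(self, scale: float, shift: float):
        self.scale, self.shift = scale, shift

    def forward(self, z: float) -> float:
        return self.scale * z + self.shift

    def inverse(self, x: float) -> tuple[float, float]:
        # Returns f^-1(x) and log|d f^-1 / dx|
        return (x - self.shift) / self.scale, -math.log(abs(self.scale))


class Sinh:
    def forward(self, z: float) -> float:
        return math.sinh(z)

    def inverse(self, x: float) -> tuple[float, float]:
        # d asinh(x) / dx = 1 / sqrt(1 + x^2)
        return math.asinh(x), -0.5 * math.log1p(x * x)


def standard_normal_logpdf(z: float) -> float:
    return -0.5 * (z * z + math.log(2 * math.pi))


def sample(z: float, layers) -> float:
    """Generation: push a base sample through the layers in order."""
    for layer in layers:
        z = layer.forward(z)
    return z


def log_density(x: float, layers) -> float:
    """Density estimation: invert the layers in reverse order, summing log-Jacobians."""
    log_det = 0.0
    for layer in reversed(layers):
        x, ld = layer.inverse(x)
        log_det += ld
    return standard_normal_logpdf(x) + log_det


flow = [Affine(scale=0.8, shift=0.3), Sinh()]
```

Sampling only needs the forward pass, while density evaluation only needs the inverse pass and its log-Jacobian; practical flow layers are designed so that both directions, and the Jacobian determinant, are cheap to compute.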
Oscar Mayor (Voctrolabs): My (almost) 20 years at the MTG in ten minutes [video]
When I finished my degree in Computer Science, I had a clear idea of what I wanted to do professionally: work that could combine my modest musical expertise with my ambitions as a software developer. I managed to get a scholarship to do my final degree project in collaboration with the IIIA-CSIC and the MTG, and after graduating in 2000 I started working full-time at the MTG, something I happily did until the beginning of 2019, when I moved to Voctro Labs, a spin-off of the MTG, after a five-year transition period. In this talk, I will explain the things I learned at the MTG, the long list of projects I worked on, and my personal evolution: I started out single and ended up with a wife and two kids, the youngest born on exactly the day I left the MTG.
Martin Kaltenbrunner (University of Art and Design in Linz): Post-Digital Lutherie at the Tangible Music Lab [video]
Martin Kaltenbrunner is professor of Tangible Interaction Design and director of the Tangible Music Lab, an artistic research group within the Institute of Media Studies at the University of Art and Design in Linz, Austria. Until 2009, he worked as a researcher at the Music Technology Group and as a lecturer at Pompeu Fabra University in Barcelona, as well as at other research institutions such as the MIT Medialab Europe in Dublin. As co-founder of Reactable Systems, he mainly worked on the interaction design of the electronic musical instrument Reactable, which was awarded a Golden Nica for Digital Music at the Prix Ars Electronica in 2008. This instrument provided the framework for his present artistic research practice, which applies tangible interaction design methods to post-digital lutherie. He received a Doctorate in Engineering from the Bauhaus-University Weimar, and his technical developments in the field of Tangible User Interfaces, such as TUIO and reacTIVision, have been employed in numerous artistic and scientific projects.
Amaury Hazan (Billaboop): 10 years post-MTG, music technology from mobile to embedded [video]
Since graduating from the MTG, I have been involved in a range of music technology, AI, and audio-related projects, mostly in the private sector. These include the mobile apps I created (BoomClap, CamBox, and Vidibox) and my collaborations with the companies RjDj, Chirp.io, and Jacoti.
Paul Brossier (Aubio): Around aubio: 15 years of consulting for open source music applications [video]
After being a researcher at the MTG, Paul Brossier has gained more than ten years of experience as a consultant in music software, with a focus on real-time applications. As the maintainer of the aubio library, he will look back on the past 15 years of this open-source library, presenting some of the projects he worked on as a developer and drawing a parallel with the evolution of open-source adoption in both academia and industry.
Ramon Mañas (Odiseimusic): Travel Sax, the smallest and lightest electronic saxophone in the world designed in Barcelona! [video]
Travel Sax is the smallest and lightest electronic saxophone on the market. It has been designed and developed by Odisei Music, a startup based in Barcelona. The goal is to help saxophone players improve their skills more easily and quickly. During this talk, we will walk you through the evolution of the project since it started less than a year ago.
Maarten DeBoer & Pau Arumí (Dolby): Dolby Atmos: object-based immersive audio
With object-based audio, content creators are not tied to any specific playback layout: audio tracks and their metadata are preserved from production through to the playback stage, where they are rendered to the consumer’s loudspeakers or headphones. This paradigm enables flexible production and distribution of immersive audio content. In this presentation, we will show how immersive cinematic and music content in Dolby Atmos is produced, authored, distributed, and played back in a variety of professional and consumer scenarios and configurations, from cinemas to living rooms to mobile phones.
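To illustrate the idea of deferring rendering to the playback stage, the sketch below keeps each object as a mono track plus a position (here only a horizontal azimuth) and computes loudspeaker gains only once the target layout is known. It uses simple pairwise constant-power panning as a stand-in renderer; this is an assumption for illustration and not how Dolby Atmos renderers work internally, and the `AudioObject`, `pan_gains`, and `render` names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """A mono audio track plus positional metadata, preserved until playback."""
    samples: list[float]
    azimuth_deg: float  # hypothetical metadata: 0 = front, positive = towards the left


def pan_gains(azimuth_deg: float, speaker_azimuths: list[float]) -> list[float]:
    """Pairwise constant-power panning between the two loudspeakers that enclose
    the object's azimuth (illustrative stand-in, not Dolby's renderer)."""
    n = len(speaker_azimuths)
    if n == 1:
        return [1.0]
    gains = [0.0] * n
    az = azimuth_deg % 360
    # Walk around the circle in order of loudspeaker angle.
    order = sorted(range(n), key=lambda i: speaker_azimuths[i] % 360)
    for k in range(n):
        a, b = order[k], order[(k + 1) % n]
        start = speaker_azimuths[a] % 360
        span = (speaker_azimuths[b] - speaker_azimuths[a]) % 360 or 360
        offset = (az - start) % 360
        if offset <= span:
            frac = offset / span
            # cos^2 + sin^2 = 1, so total power stays constant while panning
            gains[a] = math.cos(frac * math.pi / 2)
            gains[b] = math.sin(frac * math.pi / 2)
            break
    return gains


def render(objects: list[AudioObject], speaker_azimuths: list[float]) -> list[list[float]]:
    """Mix all objects into one output channel per loudspeaker of the given layout."""
    length = max(len(obj.samples) for obj in objects)
    out = [[0.0] * length for _ in speaker_azimuths]
    for obj in objects:
        for ch, gain in enumerate(pan_gains(obj.azimuth_deg, speaker_azimuths)):
            for t, s in enumerate(obj.samples):
                out[ch][t] += gain * s
    return out
```

The same object can then be rendered to different layouts without re-mixing, for example `render([voice], [30.0, -30.0])` for stereo (a phantom centre with a gain of about 0.707 in each speaker for an object at 0 degrees) or `render([voice], [30.0, -30.0, 0.0, 110.0, -110.0])` for a 5.0 layout, where the same object lands entirely in the centre speaker.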
Emilia Gómez (MTG): MIR technologies: the human rights perspective [video]
Music information retrieval (MIR) research has aimed to develop technologies that support music listening experiences, facilitate the understanding of large music collections, and advance our scientific understanding of music. In this talk, I will present some ethical challenges related to these goals and connect MIR development with human rights, in particular with the right to enjoy the benefits of cultural freedom and scientific progress, as defined in the United Nations International Covenant on Economic, Social and Cultural Rights.
Rafael Caro Repetto (MTG): The Musical Bridges project: Technology for the understanding of music cultures [video]
The increasing availability of digital data made possible by the development of ICT brings us closer to the different cultures of the world. However, the availability of data does not necessarily imply understanding. A music tradition is a well-known passageway to the culture from which it stems, but it can only be walked through by comprehending its structural and aesthetic principles. The purpose of this project is thus to put technology at the service of music understanding, with the ultimate goal of bridging cultures. To this end, Musical Bridges draws on the musical data gathered in the CompMusic project from five music traditions, namely Hindustani, Carnatic, Ottoman-Turkish Makam, Chinese jingju, and Arab-Andalusian. State-of-the-art methods for music education mostly consist of textbooks with audio samples on accompanying CDs or websites. However, a gap still remains between intellectual understanding and the perception of the audio samples. To bridge it, most textbooks rely on staff notation, which limits their reach to readers with music literacy. In Musical Bridges, we develop computational tools that bridge the gap between intellectual understanding and aural perception through intuitive, interactive visualizations that require no musical literacy to understand. Their online availability and the number of samples allow users to set their own pace and depth of engagement with the new music tradition. In this presentation, I will present the motivation, goals, and methodology of the Musical Bridges project, as well as the challenges of its development, illustrating our first results with two examples from our tools for Hindustani and jingju music.
Hendrik Purwins (Accenture): Deep Learning for Audio Signal Processing
I will present our recent overview article with the same title (IEEE JSTSP 14, 2019 by myself, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-yiin Chang, and Tara Sainath). Given the recent surge in developments of deep learning, this talk provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
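As a concrete illustration of the log-mel representation mentioned above, the following sketch computes a log-mel spectrum for a single audio frame: a power spectrum, a bank of triangular filters spaced evenly on the mel scale, and a logarithm. It assumes the HTK-style mel formula (one of several variants in use), no windowing, and a naive DFT to stay dependency-free; all function names and defaults are illustrative and are not taken from the article.

```python
import math


def hz_to_mel(f: float) -> float:
    # HTK-style mel scale; other toolkits use slightly different formulas
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: list[float]) -> list[float]:
    """|DFT|^2 for bins 0..N/2, via a naive O(N^2) DFT (no window applied)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec


def mel_filterbank(n_mels: int, n_fft: int, sr: float) -> list[list[float]]:
    """Triangular filters with edges equally spaced on the mel scale, 0 Hz to sr/2."""
    bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    top = hz_to_mel(sr / 2)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bank = []
    for m in range(n_mels):
        left, centre, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_freqs:
            if left < f <= centre:
                row.append((f - left) / (centre - left))  # rising slope
            elif centre < f < right:
                row.append((right - f) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel(frame: list[float], sr: float, n_mels: int = 40, eps: float = 1e-10) -> list[float]:
    """Log of the mel-filtered power spectrum of one frame; eps avoids log(0)."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sr)
    return [math.log(sum(w * p for w, p in zip(row, spec)) + eps) for row in bank]
```

In practice, a library FFT and a window function would replace the naive DFT, and successive frames would be stacked over time to form a log-mel spectrogram.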
Miguel Risueño AKA Mike808 (Production.Club): From Creative Coder to Creative Director [video]
After finishing my Master's degree at the MTG, I moved to Los Angeles, where I co-founded Production Club, a multidisciplinary creative studio working for artists and tech companies. Since then I have designed shows, experiences, and interactive installations for DJs (Skrillex, Zedd, The Chainsmokers, Deadmau5, ZHU...) and tech companies (Amazon, Intel, Epic Games, Notch...). In this short talk, I will present some of my projects combining art and technology, talk about the particularities of working with big artists, and explain how the mindset developed at the MTG helped me create innovative work.
Martí Umbert (Verbio): R&D on speech technologies at Verbio [video]
How can speech technologies be combined towards building a smart conversational system? What is the technology behind AI-based call centers? How can chatbots understand, retrieve information, and answer questions? In this talk, I will provide an overview of the R&D at Verbio, where five main areas are being addressed: text-to-speech synthesis, continuous speech recognition, voice biometrics, speech analytics, and natural language processing.
Dmitry Bogdanov (MTG) & Nicolas Wack (Jacoti): Essentia: past, present, and future [video]
Essentia is an open-source library for audio and music analysis, description, and synthesis, developed at the MTG and adopted in many music technology applications. We highlight some key moments in the history of its development, the main technical features that made it attractive for industry and research, and our vision for the future of the project.
Marius Miron (European Commission's Joint Research Centre): Investigating the causes of group unfairness in machine learning classification [video]
The Fairness, Accountability, and Transparency in Machine Learning (FAT-ML) literature proposes a varied set of group fairness metrics to evaluate the disparity between two groups defined by a protected feature, such as gender or race, in systems performing binary classification. In this research, we analyze the case of criminal recidivism prediction. We discuss potential sources of unfairness and provide explanations for them by combining machine learning interpretability techniques with a thorough data analysis. Our findings explain why ML techniques lead to unfairness in data-driven risk assessment, even when protected attributes are not used in training.
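The abstract does not name the specific metrics used. As an illustration, two group fairness metrics common in the FAT-ML literature, the statistical parity difference (the gap in positive-prediction rates) and the false positive rate difference, can be computed from a classifier's binary outputs as sketched below. The function names and toy data are hypothetical. The protected attribute is needed only to evaluate the metrics, not to train the model.

```python
from typing import List, Sequence, Tuple


def positive_rate(preds: Sequence[int]) -> float:
    """Share of individuals predicted positive (e.g. 'will reoffend')."""
    return sum(preds) / len(preds)


def false_positive_rate(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Share of actual negatives that the classifier flags as positive."""
    flagged = [p for p, y in zip(preds, labels) if y == 0]
    return sum(flagged) / len(flagged)


def group_fairness(preds: Sequence[int], labels: Sequence[int],
                   groups: Sequence[str], a: str, b: str) -> Tuple[float, float]:
    """Return (statistical parity difference, false positive rate difference)
    between groups a and b; 0.0 means parity on that metric."""
    def subset(g: str) -> Tuple[List[int], List[int]]:
        idx = [i for i, gi in enumerate(groups) if gi == g]
        return [preds[i] for i in idx], [labels[i] for i in idx]

    pa, ya = subset(a)
    pb, yb = subset(b)
    parity = positive_rate(pa) - positive_rate(pb)
    fpr_gap = false_positive_rate(pa, ya) - false_positive_rate(pb, yb)
    return parity, fpr_gap


# Toy data (made up for illustration): 1 = predicted / actual reoffence.
preds = [1, 1, 0, 1, 0, 0, 1, 0]
labels = [1, 0, 0, 0, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(group_fairness(preds, labels, groups, "A", "B"))  # (0.5, 0.6666666666666666)
```

Here group A is flagged positive three times as often as group B and, among people who did not reoffend, two thirds of group A are wrongly flagged versus none of group B.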
Lorena Aldana (Bielefeld University): Cardio sounds: ECG sonification to support cardiac diagnostic and monitoring [video]
In the medical field, most diagnostic and monitoring tools rely mainly on visual displays. However, there are situations in which physicians need to monitor a number of signals while performing a primary task that requires their visual attention (e.g. performing surgery), so another type of feedback is needed. Given that the human auditory system is already very capable of detecting patterns and changes in signals, sonification emerges as a valuable tool to support diagnostic and monitoring tasks in medicine or in sports-related activities. In this talk, I focus on current ECG sonification methods, ongoing challenges in the field, and possibilities for future work.
Gabriel Meseguer (IRCAM): Dali: A Large Dataset of Synchronized Audio, Lyrics and Notes, Automatically Created Using Teacher-Student Machine Learning Paradigm [video]
The DALI dataset is a large dataset of time-aligned vocal melody notes and lyrics at four levels of granularity. The first version has 5358 audio tracks and the second 7756. We start with songs whose vocal melody has been manually annotated into notes and lyrics by non-expert users. However, these annotations come without audio and are described only by the artist's name and the song's title. Moreover, the annotations are not always accurate enough to be used in a MIR dataset. To create a clean dataset, we need to 1) find the corresponding audio and 2) improve the quality of the annotations' time information. For each annotated song, we retrieve a set of audio candidates from the Web. Each candidate is turned into a singing-voice probability (SVP) over time using a deep Convolutional Neural Network. By comparing the SVP with the annotated singing voice activity, derived from the time-aligned lyrics, we find the best candidate and correct the annotation's time information. The quality of this matching is limited by the performance of the singing voice detection (SVD) system. To improve it, we adopt a teacher-student paradigm. The teacher is our first SVD system, trained on clean data, which is used to select a new training set (annotation/audio matches). This set is used to train a new SVD system, the student. We show that this loop (performed twice) progressively improves the performance of the SVD systems, yielding better audio-annotation pairs and better alignment.
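The candidate selection step can be pictured as a search over audio candidates and time offsets for the best match between SVP and annotated voice activity. The abstract does not state the matching score, so this sketch assumes a simple frame-wise agreement (SVP where the annotation says "singing", 1 - SVP elsewhere), averaged over the annotation length; the names, the lag range, and the toy values are hypothetical.

```python
from typing import Dict, List, Tuple


def agreement(svp: List[float], activity: List[int], lag: int) -> float:
    """Mean frame-wise agreement between a candidate's singing-voice
    probabilities and the annotated voice activity shifted by `lag` frames.
    Frames that fall outside the candidate count as zero agreement."""
    total = 0.0
    for t, active in enumerate(activity):
        s = t + lag
        if 0 <= s < len(svp):
            total += svp[s] if active else 1.0 - svp[s]
    return total / len(activity)


def best_match(candidates: Dict[str, List[float]], activity: List[int],
               max_lag: int) -> Tuple[str, int, float]:
    """Search all candidates and lags; return (candidate, lag, score)."""
    best = ("", 0, -1.0)
    for name, svp in candidates.items():
        for lag in range(-max_lag, max_lag + 1):
            score = agreement(svp, activity, lag)
            if score > best[2]:
                best = (name, lag, score)
    return best


activity = [0, 0, 1, 1, 1, 0, 0, 0]  # derived from time-aligned lyrics
candidates = {
    "a": [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.1],  # same song, voice 2 frames later
    "b": [0.5] * 8,                                  # uninformative candidate
}
name, lag, score = best_match(candidates, activity, max_lag=3)
print(name, lag)  # a 2
```

The winning lag is the correction to apply to the annotation's timestamps; in the teacher-student loop, a better SVD model sharpens the SVP curves and therefore this search.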
Javier Nistal (Sony): Conditional generation of audio using deep neural networks and its applications to music production
Recent advances in artificial intelligence have led to the development of powerful generative models capable of synthesizing high-quality audio with little or no human intervention. I believe that this practice is not engaging from a user-experience point of view, since interaction lies at the core of the creative process. My research focuses on conditioning these models to generate audio that is musically adapted to some conditioning musical content. The goal is for the generated audio to be consistent with the conditioning audio in harmony, melody, timbre, and mid/long-term structure. Conditional generative models may provide more flexible and expressive music tools that can enhance creativity.
Leny Vinceslas (Loughborough University London): Multi-sound zone reproduction: modelling personal audio spaces in a multi-user environment [video]
Our modern environment is saturated with audio-visual content that can be accessed from a great number of devices and in an increasing number of locations. For example, the same living space might contain several laptops, tablets, audio systems, smartphones, or televisions. Unlike visual content, multiple audio programmes cannot be delivered in a single space without competing, which seriously impairs the listening experience. Headphones could be used to create isolated listening conditions; however, they seriously impede communication between listeners, and extended listening can cause fatigue and decrease awareness of the environment. Multi-sound zone reproduction aims to reproduce multiple audio programmes over multiple regions of space without physical isolation and with minimal interference between them. In this talk, I will give an overview of the state of the art in multi-sound zone reproduction, outline the principal challenges of the research area, and present an example of a real-life implementation.
Adrian Mateo (SEAT): The MTG Sound Effect: a personal review
As a former student of the Sound and Music Computing Master at the MTG, I will give a short talk on the effect it has had on my sound work, both as a freelance sound engineer and as a car audio engineer at SEAT. It is not only the many topics covered by the Master’s program, such as Music Perception and Cognition, Sound Production, Audio Signal Processing, or Music Information Retrieval, that have had a ‘Sound Effect’; the MTG has also been a source of inspiration for innovation and has provided great networking opportunities. I will therefore also use this opportunity to encourage this amazing community to think of new ways to help the sound of SEAT vehicles evolve in every respect, as the emerging era of electric, autonomous vehicles will increase the need for high-quality in-car entertainment.
Perfecto Herrera (ESMUC & MTG): What if? [video]
In this short talk, I will try to expose, and perhaps challenge, some assumptions we (unnoticeably) bring to research as part of its conceptual framework. They are basic assumptions about music, processing, and computation, so obvious or so hidden in the fabric of our activities that we rarely question them. Most of us have learned to get by without noticing them, as they seem natural or useful for reaching successful research outcomes. I wonder whether a different kind of research would be possible by breaking such assumptions.
Sergio Oramas (Pandora): From software engineer to data scientist [video]
Like almost everybody who has been at the MTG, my two passions are music and computers, and I wanted to combine them, but it wasn’t clear to me how. This short talk will describe my path from being an unhappy software engineer who worked as little as possible to have more time for playing music, to a happy scientist who spends the whole day working with music (and computers) at one of the best companies in the world for this kind of job (and passion).
Jose Luis Zagazeta (SonoSuite): Empowering the music industry through innovative technology solutions [video]
Digital innovations are revolutionizing the whole music industry, and the distribution process is no exception. Recent reports claim that the power of change resides in the independent music sector, and we want to be part of it: helping music businesses all over the world to develop their own music industry, understanding their needs, and ensuring they are fairly paid through royalty collection. This is why we are developing SonoSuite, digital distribution software made by tech professionals who love music and are building this solution at the service of the music industry.
Jordi Janer (Voctrolabs): A Voiceful story: singing technologies for creative applications [video]
In this talk, we will present the research and innovation activities around singing voice technologies at Voctro Labs. We distilled several algorithms for voice analysis, transformation, and synthesis into the Voiceful toolkit. We will show a few examples of how Voiceful is integrated into end-user creative applications. As Voctro Labs is a spin-off of the MTG, this talk also serves as an opportunity to give an overview of our evolution, both at the technical level (from basic Signal Processing to the latest Deep Learning algorithms) and at the company level (the stepping stones of past years and the current challenges).
Enric Giné (Tasso): Technology, techniques, and standards for the preservation of historical sound recordings [video]
This short talk will review how we can deal with old recordings on analogue formats: their preservation, playback settings and devices, digitization standards, restoration criteria for master and broadcast, and related metadata, from a professional, day-to-day perspective. We'll discuss case studies and general guidelines, and look into possible research lines.
Bruno Rocha (Universidade de Coimbra): The challenges of building a respiratory sound database [video]
During the last decades, there has been a significant interest in the automatic analysis of respiratory sounds. However, a core problem in the field is the lack of publicly available large databases with which to evaluate algorithms and compare results. In this talk, I will present the first open access respiratory sound database and the limitations we expect to overcome when expanding it.
David Vidal Gonzalez & Jorge Fuentes (Musical Instruments Innovation Lab): Manufacturing Disruptive high‐end Musical Instruments by Applying Acoustically‐engineered Carbon Fibre Composites [video]
Reasons for, and advantages of, using materials other than wood in the construction of high-end musical instruments.
José Lozano (UPF): La Candelaria, El Raval, El Born, PobleNou
I will share my experience with the MTG from the perspective of the PHONOS-MTG recording studio. In 1998 I arrived from Bogotá in the Raval neighbourhood to study "The Master in Digital Arts" (co-directed by Xavier Serra, MTG). At the end of that year I began to work hand in hand with Perfecto Herrera in the recording studio, and since then I have spent 20 years in the studio linked to the MTG and Phonos. Over those years we have moved the studio through three Barcelona neighbourhoods, finally arriving at the PobleNou Campus.
Jorge Garcia (independent consultant): Working in Game Audio technology [video]
This talk will briefly introduce some of my experiences working in medium- and large-sized game studios such as Electronic Arts, Codemasters, Activision, and MercurySteam, as well as working as an independent Game Audio consultant. I will also present some of the technology challenges and opportunities for audio runtimes and tools when collaborating on multidisciplinary projects with audio directors, sound designers, audio programmers, and engineers.
Eduard Resina (ESMUC): Don't loop me [video]
From my first contact with electronic music at Phonos in 1976 to the present day, I’ve always sought in electronic sound an extension of the possibilities of acoustic sound in a number of ways, as well as the introduction of new elements that also involve other ways of compositional thinking. Yet I’ve always struggled not to give up an essential aspect of acoustic instrumental performance: that no sound is ever identical to a previous one. As long as sound could not be digitally frozen and endlessly reproduced without loss, change of some kind was unavoidable and an essential aspect of the very nature of any sound, its aura, to put it in the terms of Benjamin’s “The Work of Art in the Age of Mechanical Reproduction”. Sound had always been unrepeatable and unique. Just as no one can declaim a phrase twice with exactly the same rhythm and intonation, change, even when slight, is an essential element of the nature of time. A living time is one in which we experience that things change, breathe, and move. Change is at the basis of movement, just as the passing of time is at the essence of life and music. Sound liveliness, the non-exact repetition of sounding elements, is a crucial aspect affecting musical time and musical memory, which ground the recognition of large-scale and complex musical form.
Alex Loscos (BMAT): BMAT the FAQ [video]
The first spin-off of the MTG is soon to be 15 and we’ve compiled the top 15 most frequently asked questions about us - from 'what does BMAT stand for' to 'who pays you for this'.
Emilio Molina (BMAT): R&D on music monitoring at BMAT [video]
At BMAT we continuously identify the music played on more than 6000 channels (TV, radio, or venues) so that artists get the recognition they deserve. To do so, we use a set of technologies: audio fingerprinting (against a catalog of tens of millions of tracks), music detection, and, under some conditions, cover song identification. In this talk, I will share my experience of the research challenges (not always considered in the literature) that arise when applying these technologies at industrial scale.
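BMAT's fingerprinting system is not described here. To give a feel for how identification against a large catalog can work, the following is a minimal sketch of generic landmark-hash matching with offset voting, in the style of the Shazam algorithm described by Wang (2003). It assumes hashes have already been extracted from spectral peak pairs (that step is omitted), and the integer hashes, frame times, and track ids are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Landmark = Tuple[int, int]  # (hash, frame time): hypothetical representation


def build_index(catalog: Dict[str, List[Landmark]]) -> Dict[int, List[Tuple[str, int]]]:
    """Inverted index: hash -> list of (track id, time in the reference track)."""
    index: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for track, landmarks in catalog.items():
        for h, t in landmarks:
            index[h].append((track, t))
    return index


def identify(index: Dict[int, List[Tuple[str, int]]],
             query: List[Landmark]) -> Optional[Tuple[str, int, int]]:
    """Vote for (track, time offset); a true match piles up votes at one offset."""
    votes: Counter = Counter()
    for h, t_query in query:
        for track, t_ref in index.get(h, []):
            votes[(track, t_ref - t_query)] += 1
    if not votes:
        return None
    (track, offset), count = votes.most_common(1)[0]
    return track, offset, count


catalog = {"song1": [(11, 0), (22, 1), (33, 2), (44, 3)], "song2": [(22, 0), (55, 1)]}
index = build_index(catalog)
print(identify(index, [(22, 0), (33, 1), (44, 2)]))  # ('song1', 1, 3)
```

Voting on a consistent time offset, rather than counting shared hashes, is what makes this kind of scheme robust to spurious hash collisions across a large catalog.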
Felipe Navarro (MTG): Creative Technologies: From Art to Therapy
Since graduating from the MTG in 2013, I have focused my work on finding creative and innovative ways of dealing with technology. Over these years, I have noticed its enormous potential to unite people, facilitate communication, share experiences, and make a positive impact on their well-being and quality of life. In this sense, I have developed multiple interactive installations, designed sound for different purposes, and pursued my electronic music project with a live audiovisual show. In parallel, I have been working on MTG’s Banda Sonora Vital, a further development of my master's thesis: a music recommendation system that attempts to recover the most representative music of the user’s life. It is currently being studied with a group of people with mild and moderate dementia, in order to evaluate the relevance and impact of the proposed music for use in therapy sessions.
Ferdinand Fuhrmann (Joanneum Research): Developing and deploying machine listening systems for commercial applications [video]
In this talk, I’ll give an overview of the commercial potential of machine listening systems. More specifically, I’ll provide insights into the field of applied research for real-time acoustic monitoring systems. In two different application scenarios I’ll show the steps involved in the process starting from the client specifications up to the release of the final product. I’ll review the development phases, evaluation criteria, software tools for research and development, as well as hardware specifications involved in the two presented applications. Finally, I’ll draw conclusions and give an outlook into the future of machine listening applications.
Juanjo Bosch (Spotify): From Automatic Music Description to Computer-Aided Music Creation [video]
In this talk, I will describe my personal (research) journey which first took me from the Fraunhofer Institute to the MTG, where I pursued the SMC master and PhD (graduated in 2017), and then to the Spotify Creator Technology Research Lab. I will also explain how my research interests have organically grown over time from MIR to making tools to help musicians and producers in their creative process, which is my current focus. Finally, I will also mention the main research areas at Spotify and a selection of recent works.
Zacharias Vamvakousis (EyeHarp): Teaching Music with the EyeHarp Gaze Controlled Music Interface [video]
The EyeHarp is a gaze- or head-controlled digital music instrument that allows people with tetraplegia to play music through eye or head movements. It was developed as part of my master's thesis in the SMC Master at UPF in 2011. During my PhD at UPF it was evaluated with healthy users. For the last 2 years I have been teaching music with the EyeHarp to 7 people with spastic tetraplegia in the Barcelona area. The EyeHarp users have participated in various concerts in Barcelona and other cities around the world. The EyeHarp has now evolved into gamified music teaching software. Our team now consists of 2 programmers and one sound designer. We plan to release the first commercial version of the EyeHarp in 8 months. At the same time, we are starting collaborations with creative centres and residences for people with disabilities in the Barcelona area.
Jordi Pons (MTG): End-to-end learning for music audio tagging at scale
The lack of data tends to limit the outcomes of deep learning research, particularly when dealing with end-to-end learning stacks that process raw data such as waveforms. In this study, 1.2M tracks annotated with musical labels are available to train our end-to-end models. This large amount of data allows us to freely explore two different design paradigms for music auto-tagging: assumption-free models, which use waveforms as input with very small convolutional filters, and models that rely on domain knowledge, which use log-mel spectrograms with a convolutional neural network designed to learn timbral and temporal features. Our work focuses on studying how these two types of deep architectures perform when datasets of variable size are available for training: the MagnaTagATune (25k songs), the Million Song Dataset (240k songs), and a private dataset of 1.2M songs. Our experiments suggest that music domain assumptions are relevant when not enough training data are available, whereas waveform-based models outperform spectrogram-based ones in large-scale data scenarios.
António Ramires (MTG): Methods for Supporting Electronic Music Production with Large-Scale Sound Databases
The repurposing of audio material has been a key component of Electronic Music Production (EMP) since its early days. Hip-hop grew out of this practice, which has also had a major influence on a large variety of musical genres. The availability of software such as Digital Audio Workstations made the manipulation of audio easy and affordable. The internet further aided this practice by making it easier to share audio material, which led to the emergence of a variety of audio-sharing platforms with different characteristics. In our research, we will focus on a specific database, Freesound: a collaborative database of user-created sounds whose characteristics differ from those of commercial sample databases. The different nature of the sounds offered by Freesound is extremely useful for EMP but, because sounds are characterised by unrestricted tags and descriptions, navigating it for EMP is not intuitive. Our work will focus on providing better automatic content characterisation of the sounds in this database by designing classification methods for the specific context of EMP. We will first work on the identification and characterisation of audio loops, which are commonly used in Electronic Dance Music, and then focus on similarity measures and the automatic classification of isolated instrumental one-shot sounds. The methodologies developed during our research will then be implemented in prototypes to better evaluate how they can improve the EMP workflow and empower creativity.
Lorenzo Porcaro (MTG): 20 years of Playlists: a Statistical Analysis on Popularity and Diversity
Grouping songs together according to music preferences, mood, or other characteristics is an activity that reflects personal listening behaviors and tastes. In the last two decades, due to the growing size of accessible music catalogues and to improvements in recommendation algorithms, people have been exposed to new ways of creating playlists. In this work, through the statistical analysis of more than 400K playlists from four datasets created in different temporal and technological contexts, we aim to understand whether it is possible to extract information about the evolution of human strategies for playlist creation. We focus our analysis on two driving concepts of the Music Information Retrieval literature: popularity and diversity.
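The abstract does not define how popularity and diversity are measured. As one plausible operationalisation, offered as an assumption rather than the study's definitions, a track's popularity can be the number of playlists containing it, and a playlist's diversity can be the Shannon entropy of its artist distribution normalised by the log of its length. The (track id, artist) schema below is hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Track = Tuple[str, str]  # (track id, artist): hypothetical playlist schema


def track_popularity(playlists: List[List[Track]]) -> Counter:
    """Popularity of a track = number of playlists that contain it."""
    counts: Counter = Counter()
    for playlist in playlists:
        counts.update({track for track, _ in playlist})  # count once per playlist
    return counts


def mean_popularity(playlist: List[Track], popularity: Counter) -> float:
    """Average popularity of the tracks in one playlist."""
    return sum(popularity[track] for track, _ in playlist) / len(playlist)


def artist_diversity(playlist: List[Track]) -> float:
    """Shannon entropy of the playlist's artist distribution, normalised by
    log(playlist length): 0 = a single artist, 1 = every track by a new artist."""
    n = len(playlist)
    counts = Counter(artist for _, artist in playlist)
    if n < 2 or len(counts) < 2:
        return 0.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n)


playlists = [
    [("t1", "A"), ("t2", "B")],
    [("t1", "A"), ("t3", "A")],
    [("t1", "A"), ("t2", "B"), ("t4", "C")],
]
pop = track_popularity(playlists)
print(mean_popularity(playlists[0], pop))  # 2.5
print([round(artist_diversity(p), 3) for p in playlists])
```

Tracking how the averages of these two quantities shift between datasets from different periods is one way to ask whether playlist-creation strategies have changed.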
Olga Slizovkaia (MTG): End-to-End Sound Source Separation Conditioned on Instrument Labels
Can we perform end-to-end music source separation with a variable number of sources using a deep learning model? We present an extension of the Wave-U-Net model that allows end-to-end monaural source separation with a non-fixed number of sources. Furthermore, we propose multiplicative conditioning with instrument labels at the bottleneck of the Wave-U-Net and show its effect on the separation results. This approach opens the way to other types of conditioning, such as audio-visual source separation and score-informed source separation.
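The abstract does not detail how the labels enter the network. A minimal sketch of multiplicative conditioning, assuming the labels form a multi-hot vector that is linearly projected to one gain per bottleneck channel and multiplied into the activations (in a real model the projection weights would be learned). Shapes and names are hypothetical, and plain Python lists stand in for a deep learning framework.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def condition_bottleneck(features: Matrix, labels: Sequence[int],
                         weights: Matrix) -> Matrix:
    """Multiplicatively condition bottleneck activations on instrument labels.

    features: [channels][time] activations at the Wave-U-Net bottleneck
    labels:   multi-hot vector, 1 for each instrument to be separated
    weights:  [channels][num_labels] projection (learned in a real model;
              fixed here for illustration)
    """
    # One gain per channel: a linear projection of the label vector.
    gains = [sum(w * l for w, l in zip(row, labels)) for row in weights]
    # Scale every time step of each channel by that channel's gain.
    return [[g * x for x in channel] for g, channel in zip(gains, features)]


features = [[1.0, 2.0], [3.0, 4.0]]  # 2 channels x 2 time steps
weights = [[1.0, 0.5], [0.0, 2.0]]   # 2 channels x 2 instrument labels
print(condition_bottleneck(features, [1, 0], weights))  # [[1.0, 2.0], [0.0, 0.0]]
```

Because the gains can switch channels off entirely, the same decoder can be steered toward a different instrument simply by changing the label vector.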
Juan Sebastian Gómez (MTG): The Emotions That We Perceive in Music: Agreement and Language
In this study, we address the relation between the emotions perceived in pop and rock music and the language the listener speaks. This research attempts to answer the following research questions: 1) Are there differences/correlations in the emotions perceived from pop/rock music depending on the mother tongue the listener was raised with? 2) Do personal characteristics correlate with the perceived emotions of these styles of music within a particular language? Our hypothesis is that subjects who speak the same language will show higher agreement on perceived emotions, as measured by the Krippendorff alpha coefficient. Secondly, we attempt to replicate previous studies showing that subjects with more musical knowledge and experience tend to show lower agreement in perceived emotions. We use pop and rock music in English and Spanish, since these musical styles can be considered neutral and homogeneous, even when sung in different languages. The musical fragments are rated by listeners with emotion tags from the Geneva Emotional Music Scale (GEMS). We also aim to characterize perceived emotion with respect to mother tongue, demographic data, musical knowledge and taste, and familiarity with the examples. We created online surveys in four different languages (Spanish, English, German, and Mandarin) using two excerpts per emotion, for a total of 22 excerpts. While general emotion recognition models may achieve a certain level of accuracy, there is a need to further study agreement in subjective emotion annotation and to develop personalized emotional labels and personalized, language-sensitive models.
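For nominal labels such as GEMS emotion tags, Krippendorff's alpha is 1 - D_o / D_e, where D_o is the observed disagreement and D_e the disagreement expected by chance, both computed from a coincidence matrix of label pairs within each rated excerpt. The following sketch implements the standard nominal form; the data layout (one list of labels per excerpt, with missing ratings simply omitted) is an assumption about how the ratings might be stored, not the study's code.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def krippendorff_alpha_nominal(units: Sequence[Sequence[Hashable]]) -> float:
    """Krippendorff's alpha for nominal labels.

    `units` holds, for each rated item (e.g. a music excerpt), the labels
    given by the raters who rated it.
    """
    coincidence: Counter = Counter()  # o_ck over ordered label pairs
    for values in units:
        m = len(values)
        if m < 2:
            continue  # a unit needs at least two ratings to be pairable
        for c, k in permutations(values, 2):
            coincidence[(c, k)] += 1.0 / (m - 1)
    n_c: Counter = Counter()  # marginal totals per label
    for (c, _), w in coincidence.items():
        n_c[c] += w
    n = sum(n_c.values())
    observed = sum(w for (c, k), w in coincidence.items() if c != k) / n
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when all ratings use one label")
    return 1.0 - observed / expected


# Three excerpts, two raters each: agreement on two, disagreement on one.
ratings = [["joy", "joy"], ["sad", "sad"], ["joy", "sad"]]
print(krippendorff_alpha_nominal(ratings))  # approximately 0.444 (= 4/9)
```

Alpha is 1 for perfect agreement and around 0 for chance-level agreement, so computing it separately per language group makes the hypothesis above directly testable.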
Miguel Garcia Casado (MTG): Contributing to new musicological theories with computational methods: Quantifying melodic pattern importance in Arab-Andalusian Music
Arab-Andalusian music was formed in the medieval Islamic territories of the Iberian Peninsula, drawing on local traditions and assuming Arabic influences. Amin Chaachoo, an expert performer and researcher of the Moroccan tradition of this music, is developing a theory, whose latest formulation was published in La Musique Hispano-Arabe, al-Ala (2016), arguing that centonization, a melodic composition technique used in Gregorian chant, was also used to create this repertoire. In this paper we aim to contribute to Chaachoo’s theory by means of tf-idf analysis. A high-order n-gram model is applied to a corpus of 149 prescriptive transcriptions of heterophonic recordings, representing each as an unordered multiset of patterns. Computing the tf-idf statistic of each pattern in this corpus provides a means to rank and compare motivic content across nawabāt, the distinct musical forms of the tradition. For each nawba, an empirical comparison is made between the patterns identified as significant by our approach and those proposed by Chaachoo. Ultimately, we observe considerable agreement between the two pattern sets, and we go further by proposing new, unique, and as yet undocumented patterns that occur at least as frequently and with at least as much importance as those in Chaachoo’s proposals.
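The representation described above, each transcription as an unordered multiset of n-gram patterns scored with tf-idf, can be sketched as follows. The note encoding (pitch names as strings), the choice of n, and the tf-idf variant (length-normalised term frequency times log(N/df)) are assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def ngrams(notes: Sequence[str], n: int) -> Counter:
    """Unordered multiset of all length-n melodic patterns in a transcription."""
    return Counter(tuple(notes[i:i + n]) for i in range(len(notes) - n + 1))


def tf_idf(corpus: List[Sequence[str]], n: int) -> List[Dict[Pattern, float]]:
    """tf-idf score of every pattern in every document of the corpus."""
    bags = [ngrams(doc, n) for doc in corpus]
    df: Counter = Counter()
    for bag in bags:
        df.update(bag.keys())  # document frequency: each pattern once per document
    n_docs = len(corpus)
    scores = []
    for bag in bags:
        total = sum(bag.values())
        scores.append({p: (c / total) * math.log(n_docs / df[p]) for p, c in bag.items()})
    return scores


corpus = [["C", "D", "E", "C", "D"], ["C", "D", "F"], ["G", "A", "G", "A"]]
scores = tf_idf(corpus, n=2)
# Rank the patterns of the third document by importance.
print(sorted(scores[2].items(), key=lambda kv: kv[1], reverse=True))
```

A pattern that is frequent in one document (or one nawba, if transcriptions are grouped by nawba) but rare elsewhere receives a high score, which is what makes tf-idf a natural ranking for characteristic motifs.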
Andres Ferraro (MTG): Music cold-start and long-tail recommendation: evaluation of bias in deep model representations
Recent advances in deep learning have yielded new approaches for music recommendation in the long tail. These approaches are based on data related to the music content (i.e. the audio signal) and context (i.e. other textual information), from which they automatically obtain a representation in a latent space that is used to generate the recommendations. The authors of these approaches have shown improved accuracy, making them the new state of the art for music recommendation in the long tail. One drawback of these methods is that it is not possible to understand how the recommendations are generated or what the different dimensions of the underlying models represent. The goal of this thesis is to evaluate these models to understand how good the results are from the user's perspective and how fair they are to new artists or to new music that is not popular (i.e. the long tail). For example, if a model predicts the latent representation from the audio but a given genre is not well represented in the collection, songs of this genre are unlikely to be recommended. First, we will focus on defining a measure that can be used to assess how fair a model is to new artists or to less represented genres. Then, the state-of-the-art methods will be evaluated to understand how they perform under different circumstances. Later, an online evaluation will make it possible to understand how the defined metric relates to user satisfaction. As algorithms are increasingly responsible for the music we consume, understanding their behavior is fundamental to making sure they give new artists and music styles an opportunity. This work will contribute in this direction, helping to provide better recommendations for users.
Rachit Gupta (MTG): Comprehending Hindustani Music by Analysing Rhythmic Patterns
Culture influences the processing of musical rhythm, but the precise aspects involved are still unknown. We developed a rhythm training setup for a music cognition study and investigated the effects of visual-feedback-enforced mimetic comprehension of Hindustani rhythmic patterns on participants from different cultural backgrounds who were unfamiliar with this style of music. Participants were asked to recognise the “Sam”, the starting beat of every rhythmic cycle, in simple and complex Hindustani rhythmic patterns, and their performance was recorded to measure the improvement in their rhythm comprehension.
Vsevolod Eremenko & Blazej Kotowski (MTG): Automatic assessment of beginners level guitar exercises performances by audio recordings
Music MOOCs for beginners are now flourishing on a variety of internet platforms. Self-assessment and peer assessment remain the primary approaches to evaluating students' progress. Could technology help students obtain more insightful and detailed feedback? We consider the particular case of the "Guitar for Beginners" course.
Merlijn Blaauw (MTG): Recent Work on Neural Singing Synthesis
Over the last few years, Text-To-Speech (TTS) systems based on deep learning have reached a point where the results are practically indistinguishable from real speech. Our work on neural singing synthesis aims to similarly close the gap between synthetic and real singing. We present a convolutional autoregressive model trained on natural singing and predicting vocoder features, which are then converted to waveform using a traditional or neural vocoder. This model not only outperforms the previous state of the art, concatenative synthesis, in listening tests, but also allows for a much greater degree of flexibility. One example of this flexibility is the ability to "clone" a voice from only a few minutes of recordings, by leveraging data from other singers.
Pritish Chandna (MTG): A Vocoder Based Method For Singing Voice Extraction
This paper presents a novel method for extracting the vocal track from a musical mixture. The musical mixture consists of a singing voice and a backing track, which may comprise various instruments. We use a convolutional network with skip and residual connections, as well as dilated convolutions, to estimate vocoder parameters given the spectrogram of an input mixture. The estimated parameters are then used to synthesize the vocal track without any interference from the backing track. We evaluate our system through objective metrics pertinent to audio quality and interference from background sources, and via a comparative subjective evaluation. We use open-source source separation systems based on Non-negative Matrix Factorization (NMF) and Deep Learning methods as benchmarks for our system, and discuss future applications for this particular algorithm.
Genís Plaja (UPF): Sounds of Science: share your research works through the matter of sound
The Sounds of Science is an outreach project that aims to explain research activity through its sounds. This project, developed within the framework of the DTIC Maria de Maeztu strategic program, involves different research groups in creating a community, based on Freesound, where they can share their research through sounds. To complete the flow of scientific information between the diverse users and improve its reusability, all these sounds are accompanied by a brief explanation, references, tags, and geolocation.
Rubén Hinojosa (Independent consultant): Technoplayer: A digital algorithmic music instrument for real-time interactive performances
The Technoplayer is a system for the real-time interactive performance of electronic music. Its generative model, based on iterative functions that exhibit chaotic behavior, can potentially create more than 2000 million different musical themes. It is a practical result of the ideas I defended 16 years ago in my pre-thesis for the PhD program in Computer Science and Digital Communication (UPF-MTG), and of my research and experimentation on symbolic music modelling by means of chaotic functions since 1990.
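The abstract does not specify which iterative functions the Technoplayer uses. As a hypothetical illustration of the general idea only, the logistic map (a textbook chaotic function) can be iterated from a seed and its values quantised to a scale, so that each seed yields a different theme. The scale, r = 3.99, and the quantisation are all assumptions.

```python
from typing import List, Sequence

# Assumed scale: C minor pentatonic as MIDI note numbers.
SCALE = (60, 63, 65, 67, 70)


def logistic_orbit(x0: float, r: float, length: int) -> List[float]:
    """Iterate the logistic map x -> r * x * (1 - x), chaotic for r near 4."""
    xs, x = [], x0
    for _ in range(length):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def theme(x0: float, r: float = 3.99, length: int = 8,
          scale: Sequence[int] = SCALE) -> List[int]:
    """Quantise each orbit value in [0, 1] to a scale degree."""
    n = len(scale)
    return [scale[min(int(x * n), n - 1)] for x in logistic_orbit(x0, r, length)]


print(theme(0.2))  # one theme per seed...
print(theme(0.3))  # ...and a different seed gives a different theme
```

Because chaotic orbits are deterministic but highly sensitive to the seed, a performer can recall a theme exactly from its seed while nearby seeds still produce varied material.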
Philip Tovstogan (MTG): Facilitating Interactive Music Exploration
Music recommendation systems are an integral part of modern music streaming services. They usually rely on the exploit-versus-explore model from game theory. Much work has been done on music exploitation, but far less on exploration. We address the strict categorization of music into classes (genres, moods, themes) by introducing a continuous semantic latent space constructed through feature learning and transfer learning from relevant classifier systems. This space is used to map out user preferences and identify good recommendations for exploration. Moreover, we use reinforcement learning and negative feedback to learn the user's current context and goal, in order to provide the most relevant suggestions at the time of interaction. The system will be evaluated through experiments involving user interaction, using metrics based on user engagement and novelty.
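The exploit-versus-explore trade-off mentioned above is often illustrated with an epsilon-greedy bandit. The sketch below is a deliberate simplification, not the proposed reinforcement learning system: the class `EpsilonGreedyRecommender`, the idea of treating regions of the latent space as arms, and the reward convention (positive for engagement, negative for skips) are hypothetical.

```python
import random
from typing import Optional


class EpsilonGreedyRecommender:
    """Hypothetical epsilon-greedy recommender; each arm could stand for a region of a latent space."""

    def __init__(self, arms: list[str], epsilon: float = 0.1, seed: Optional[int] = None):
        self.epsilon = epsilon
        self.values = {arm: 0.0 for arm in arms}  # running mean reward per arm
        self.counts = {arm: 0 for arm in arms}    # number of feedback events per arm
        self.rng = random.Random(seed)

    def select(self) -> str:
        """Explore with probability epsilon, otherwise exploit the best-rated arm."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))         # explore
        return max(self.values, key=self.values.__getitem__)  # exploit

    def update(self, arm: str, reward: float) -> None:
        """Incremental mean update; negative feedback (e.g. a skip) is a reward below zero."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    rec = EpsilonGreedyRecommender(["calm", "energetic", "dark"], epsilon=0.2, seed=42)
    for _ in range(5):
        arm = rec.select()
        rec.update(arm, 1.0 if arm == "energetic" else -1.0)  # simulated user feedback
    print(rec.values)
```

Raising `epsilon` shifts the balance toward exploration; a context-aware system would additionally condition the choice on the user's current goal.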
Xavier Favory (MTG): Graph-Based Audio Clustering for Browsing Online Sound Collections
The large size of today's online multimedia databases makes retrieving their content a difficult and time-consuming task. Users of online sound collections typically submit search queries that express a broad intent, which often causes the system to return large result sets. Search Result Clustering is a technique that groups search results into coherent clusters, enabling users to identify useful subsets of their results. Obtaining coherent and distinctive clusters is crucial for making this technique a useful complement to traditional search engines. In our work, we propose a graph-based approach, using audio features, for clustering the diverse sound collections obtained when querying large online databases. We evaluate our method by taking advantage of the associated metadata, and we show that the use of confidence measures makes it possible to discard results that are not coherent enough to be presented to the user.
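The abstract does not detail how the graph is built or clustered. One common recipe, sketched here under that assumption, is to connect each sound to its k most similar neighbours (cosine similarity over audio feature vectors), take connected components as clusters, and use the mean intra-cluster similarity as a simple confidence measure for discarding incoherent clusters. The function names, `k`, `min_sim`, and the cohesion threshold are hypothetical choices, not the authors' method.

```python
import math
from collections import deque
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_graph(features: list[list[float]], k: int, min_sim: float) -> dict[int, set[int]]:
    """Undirected k-nearest-neighbour graph; edges below min_sim are dropped."""
    n = len(features)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        sims = sorted(((cosine(features[i], features[j]), j) for j in range(n) if j != i),
                      reverse=True)
        for sim, j in sims[:k]:
            if sim >= min_sim:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj: dict[int, set[int]]) -> list[list[int]]:
    """Breadth-first search; each component becomes one cluster."""
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), []
        while queue:
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        clusters.append(sorted(comp))
    return clusters


def cohesion(cluster: list[int], features: list[list[float]]) -> float:
    """Mean pairwise similarity inside a cluster; singletons get 0.0 (no evidence)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(cosine(features[i], features[j]) for i, j in pairs) / len(pairs)


def cluster_results(features, k=2, min_sim=0.5, min_cohesion=0.8):
    """Cluster search results and keep only clusters deemed coherent enough."""
    clusters = connected_components(knn_graph(features, k, min_sim))
    return [c for c in clusters if cohesion(c, features) >= min_cohesion]
```

Connected components avoid fixing the number of clusters in advance, which suits result sets whose composition varies from query to query.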
Luis Joglar Ongay (SonoSuite) & Stefano Negrín (SonoSuite): Automation of processes for the digital music distribution industry
SonoSuite is a white-label, flexible, scalable, and affordable digital music distribution SaaS. Our vision is to empower media creators around the world by allowing them to develop their catalog and manage their distribution, marketing, licensing, and royalty accounting activities. Our team invests in R&D to develop state-of-the-art technologies for the digital music industry. In this poster, we present our current research project in collaboration with the MTG, in which we are developing tools to automate processes such as audio quality control and metadata generation.
Aggelos Gkiokas (MTG): Deep Learning Methods for Automatic Beat Tracking
Beat tracking is one of the most challenging tasks in the Music Information Retrieval field. Tempo fluctuations, soft onsets (classical music), complex rhythms (jazz, electro-acoustic music), rhythm changes (meter, tempo), and other rhythmic irregularities are some of the reasons that make beat tracking difficult. Although it is a well-studied task, state-of-the-art methods achieve only roughly 60% accuracy in detecting the correct beats in some cases. In this presentation, we demonstrate ongoing research on beat tracking based on Deep Learning methods. In particular, we deploy Long Short-Term Memory (LSTM) neural networks to learn beat sequences from the input spectrogram of an audio signal. Moreover, we investigate multi-task learning approaches for rhythm analysis, such as the joint learning of tempo and beats.
Furkan Yesiler (MTG): Identifying and Understanding Versions of Songs with Computational Approaches
Nowadays, with large collections of music easily accessible, MIR researchers are investigating new ways of organizing these collections. Version identification can provide new ways of modelling music similarity; it has a direct application in copyright infringement cases; and users may be interested in finding other versions of their favorite songs. To address the limitations of previous work, we will first curate a large-scale dataset with features commonly used for version identification. We plan to exploit this data to explore deep learning applications that incorporate several musical characteristics and provide industrial-level scalability for a version identification system. To understand the relations among versions, we will investigate the existence of an “essence” in various musical dimensions by employing time series motif discovery methods. Moreover, we will study the musical evolution of versions by defining several topics for musical characteristics and analyzing their development over the years.
Blai Melendez (BMAT & MTG): Open Broadcast Media Audio from TV: a Dataset of TV Broadcast Audio with Music Relative Loudness Annotations
Open Broadcast Media Audio from TV (OpenBMAT) is an open, annotated dataset for the task of music detection that contains over 27 hours of TV broadcast audio from 4 countries, distributed over 1647 one-minute excerpts. It is designed to encompass several essential features for any music detection dataset and is the first one to include annotations about the loudness of music in relation to other simultaneous non-music sounds. OpenBMAT has been cross-annotated by 3 annotators with high inter-annotator agreement percentages, which allows us to validate the annotation methodology and ensure the reliability of the annotations. In this work, we first review the currently available public music detection datasets and state OpenBMAT's contributions. After that, we detail its building process: the selection of the audio and the annotation methodology. Then, we analyze the produced annotations and validate their reliability. We continue with an experiment that highlights the value of these annotations and investigates the most challenging content in OpenBMAT. Finally, we describe the format in which the dataset is presented and the platform where we have made it available. We believe OpenBMAT will contribute to major advances in research on music detection in real-life scenarios.
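The abstract reports inter-annotator agreement as percentages without giving the exact procedure. A minimal sketch of pairwise percentage agreement, assuming each annotator assigns one label per fixed-length segment, is shown below; the segment granularity, the label strings (`fg`, `bg`, `none`), and the function name `pairwise_agreement` are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(annotations: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Percentage of segments on which each pair of annotators chose the same label."""
    result = {}
    for a, b in combinations(sorted(annotations), 2):
        la, lb = annotations[a], annotations[b]
        if len(la) != len(lb) or not la:
            raise ValueError("annotators must label the same, non-empty set of segments")
        matches = sum(x == y for x, y in zip(la, lb))
        result[(a, b)] = 100.0 * matches / len(la)
    return result


if __name__ == "__main__":
    # Hypothetical relative-loudness labels per segment: foreground, background, no music.
    labels = {
        "A": ["fg", "fg", "bg", "none"],
        "B": ["fg", "bg", "bg", "none"],
        "C": ["fg", "fg", "bg", "bg"],
    }
    print(pairwise_agreement(labels))  # {('A', 'B'): 75.0, ('A', 'C'): 75.0, ('B', 'C'): 50.0}
```

Raw percentage agreement does not correct for chance; chance-corrected statistics such as Cohen's kappa are a common complement.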
Eduardo Fonseca (MTG): Model-agnostic Approaches to Handling Noisy Labels When Training Sound Event Classifiers
Label noise is emerging as a pressing issue in sound event classification. It arises as we move towards larger datasets that are difficult to annotate manually, but it is even more severe if datasets are collected automatically from online repositories, where labels are inferred through automated heuristics applied to the audio content or metadata. While learning from noisy labels has been an active area of research in computer vision, it has received little attention in sound event classification. Most existing computer vision approaches to label noise are relatively complex, requiring elaborate networks or extra data resources. In this work, we evaluate simple and efficient model-agnostic approaches to handling noisy labels when training sound event classifiers, namely label smoothing regularization, mixup, and noise-robust loss functions. The main advantage of these methods is that they can be incorporated into existing deep learning pipelines without network modifications or extra resources. We report results from experiments conducted with the FSDnoisy18k dataset. We show that these simple methods can be effective in mitigating the effect of label noise, providing up to a 2% accuracy boost when added to a CNN baseline, while requiring minimal intervention and computational overhead.
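The techniques named above have compact standard definitions. The sketch below shows label smoothing (mixing a one-hot target with a uniform distribution), mixup (a convex combination of two examples and their targets, with the weight drawn from a Beta(α, α) distribution), and, as one example of a noise-robust loss, the generalized cross-entropy L_q = (1 − p^q)/q. Plain Python lists stand in for tensors; the function names and the ε, α, and q values are illustrative, and the abstract does not state which noise-robust loss was evaluated.

```python
import random


def smooth_labels(one_hot: list[float], eps: float) -> list[float]:
    """Label smoothing: (1 - eps) * y + eps / K for K classes."""
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]


def mixup(x1, y1, x2, y2, alpha: float, rng: random.Random):
    """Mixup: blend two inputs and their targets with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def lq_loss(p_true: float, q: float = 0.7) -> float:
    """Generalized cross-entropy (1 - p^q) / q on the probability of the labelled class.

    As q -> 0 it approaches cross-entropy; q = 1 gives the MAE-like loss 1 - p.
    """
    return (1.0 - p_true ** q) / q


if __name__ == "__main__":
    rng = random.Random(0)
    print(smooth_labels([0.0, 1.0, 0.0, 0.0], eps=0.1))  # approximately [0.025, 0.925, 0.025, 0.025]
    x, y, lam = mixup([0.2, 0.4], [1.0, 0.0], [0.6, 0.8], [0.0, 1.0], alpha=0.2, rng=rng)
    print(lam, x, y)
    # A likely-mislabelled example (low p) gets a loss bounded by 1/q, unlike cross-entropy.
    print(lq_loss(0.9), lq_loss(0.1))
```

None of these functions touches the network itself, which is what makes them model-agnostic: they act only on targets, inputs, or the loss.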
Ninad Puranik (MTG): Automatic Assessment of Hindustani Singing Exercises
Poster presentation and live demonstration of automatic grading of singing exercises in Hindustani music. The work showcases research done at the Audio Signal Processing Lab as part of the TECSOME project, which is developing an automatic assessment system named Music Critic to support music performance exercises and help them scale to MOOC level. The system compares a melody recorded by the user with a reference melody and assigns a grade based on pitch and timing accuracy relative to the reference.
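The abstract states only that grades are based on pitch and timing accuracy relative to a reference. A minimal sketch, assuming the user and reference melodies have already been segmented into notes and aligned one-to-one (a real system would need an alignment step such as dynamic time warping), measures pitch error in cents and onset error in seconds. The tolerances, the equal weighting, and the function names `cents` and `grade` are hypothetical, not Music Critic's actual grading rules.

```python
import math


def cents(f: float, f_ref: float) -> float:
    """Pitch difference in cents: 1200 * log2(f / f_ref)."""
    return 1200.0 * math.log2(f / f_ref)


def grade(user, reference, pitch_tol_cents: float = 50.0,
          onset_tol_s: float = 0.1) -> dict[str, float]:
    """Score aligned notes given as (onset_seconds, pitch_hz) pairs.

    Returns the fraction of notes within the pitch and timing tolerances,
    plus their unweighted mean as an overall grade in [0, 1].
    """
    if len(user) != len(reference) or not reference:
        raise ValueError("expected two non-empty, one-to-one aligned note lists")
    n = len(reference)
    pitch_ok = sum(abs(cents(u[1], r[1])) <= pitch_tol_cents for u, r in zip(user, reference))
    timing_ok = sum(abs(u[0] - r[0]) <= onset_tol_s for u, r in zip(user, reference))
    return {
        "pitch": pitch_ok / n,
        "timing": timing_ok / n,
        "overall": (pitch_ok + timing_ok) / (2 * n),
    }


if __name__ == "__main__":
    reference = [(0.0, 220.0), (0.5, 246.94), (1.0, 261.63)]
    user = [(0.02, 221.0), (0.55, 246.94), (1.3, 277.18)]  # last note late and about a semitone sharp
    print(grade(user, reference))
```

Measuring pitch in cents keeps the tolerance independent of register, which matters when singers pitch exercises relative to their own tonic.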
Fabio Ortega (MTG): Teaching music expression using data visualization and machine learning models
A central element in music performance is the musicians' expression, which characterizes their artistic translation of a composition into sound through choices of dynamics, note articulation, rhythmic variation, and several other parameters. Teaching such skills is a challenging task for music educators, and few suitable tools exist to support the communication of expressive intentions and their underlying creative process. We propose an experimental setup to assess the impact of software tools that provide musicians with visual feedback about expressive features of their performances. Additionally, a machine learning model of expression is presented, and its application as a creative tool assisting performance practice is discussed.
Alia Morsi (MTG): From General to Culture Specific and Back: How Dunya and Compmusic Empower Both Culture-Specific and More General Music Research Tasks
Dunya and Compmusic are intertwined projects that aim to enhance computational research on traditional music. Given that models and algorithms resulting from research on a particular tradition would most likely carry biases or assumptions that are not ideal for another, incorporating more domain knowledge and music-cultural specificity (the objectives of Dunya and Compmusic) can significantly enhance the automatic description of each music tradition being studied. The goal of this work is to explain what each of these projects fundamentally is, and how they differ from and influence one another. In addition, we show the scheme through which the corpora data and metadata are organized in general, followed by an overview of the main components of Dunya. Finally, we aim to emphasize that although these projects were made with traditional music in mind, their accumulated knowledge is not constrained to it and is applicable to other, more general computational music tasks.
Sergio Giraldo (MTG): Enhancing Music Learning with Smart Technologies
Learning to play a musical instrument is a difficult task that requires the development of sophisticated skills. Nowadays, this learning process is mostly based on the master-apprentice model. However, learning under this model leads to long periods of private study by the student, making learning a rather harsh and solitary experience and resulting in high abandonment rates. Technologies are rarely employed and are usually restricted to audio and video recording and playback. The TELMI (Technology Enhanced Learning of Musical Instrument Performance) project seeks to design and implement new interaction paradigms for music learning and training based on state-of-the-art multimodal (audio, image, video, and motion) technologies.
David Dalmazzo (MTG): Gestures Classification in Music Performance: A Machine Learning Approach
Gestures in music are of paramount importance, partly because they are directly linked to musicians’ sound and expressiveness. At the same time, current motion capture technologies can detect details of body motion and gestures very accurately. We present a Machine Learning (ML) approach to automatic violin bow gesture classification that compares Hierarchical Hidden Markov Models (HHMM) with neural architectures such as Long Short-Term Memory (LSTM) recurrent networks and Temporal Convolutional Networks (TCN). We recorded motion and audio data corresponding to seven representative bow techniques (Détaché, Martelé, Spiccato, Ricochet, Sautillé, Staccato, and Bariolage) performed by professional violin players. We used the commercial Myo device to record inertial motion information from the right and left forearms, and synchronized it with the audio recordings. After extracting features from both the motion and audio data using the Essentia C++ library, we trained the ML models to identify the different bowing techniques automatically. Our model can determine the studied bowing techniques with over 94% accuracy. The results make it feasible to apply this work in a practical learning scenario, where violin students can benefit from real-time feedback provided by the system.