Best Audio Representation Learning Paper Award at WASPAA 2021

The work done during Edu Fonseca’s internship at Google Research explores the use of automatic sound separation for self-supervised representation learning

25.10.2021


The work done during Edu Fonseca’s most recent internship at Google Research has received a special award for “Best Audio Representation Learning Paper” at the 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), one of the most relevant venues in the field of audio and acoustic signal processing. The paper explores the use of automatic sound separation to decompose sound scenes into multiple semantically-linked views for use in self-supervised representation learning. The work is a collaboration with multiple researchers at Google Research from New York, California, Massachusetts and Switzerland.

Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings. The association of these constituent sound events with their mixture, and with each other, is semantically constrained: the sound scene contains the union of its sources' classes, and not all classes co-occur naturally. With this motivation, the paper explores the use of automatic sound separation to decompose sound scenes into multiple semantically linked views for use in self-supervised contrastive learning.

The paper shows that sound separation can be seen as a valid augmentation to generate positive views for contrastive learning. In particular, learning to associate input sound mixtures with their constituent separated channels elicits semantic structure in the learned representation, outperforming comparable systems without separation. Through extensive experimentation, the authors also discover that optimal sound separation performance is not essential for successful representation learning.
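The idea of associating a mixture with its separated channels can be illustrated with a standard contrastive (InfoNCE-style) objective. The sketch below is a minimal, hypothetical illustration, not the authors' implementation: it assumes precomputed embedding arrays (`mixture_emb` for sound mixtures, `channel_emb` for their separated channels), where row `i` of each array comes from the same recording, and other rows in the batch act as negatives.

```python
import numpy as np

def info_nce_loss(mixture_emb, channel_emb, temperature=0.1):
    """Toy contrastive loss pairing each mixture embedding with the
    embedding of one of its separated channels.

    mixture_emb, channel_emb: arrays of shape [batch, dim]; row i of
    both arrays is assumed to come from the same sound scene (positive
    pair), while all other rows serve as negatives.
    """
    # L2-normalize so the dot product is a cosine similarity
    m = mixture_emb / np.linalg.norm(mixture_emb, axis=1, keepdims=True)
    c = channel_emb / np.linalg.norm(channel_emb, axis=1, keepdims=True)
    # Pairwise similarities, sharpened by the temperature
    logits = m @ c.T / temperature
    # Positive pairs lie on the diagonal of the similarity matrix
    idx = np.arange(len(m))
    # Row-wise cross-entropy with the diagonal as the target
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[idx, idx].mean()
```

Minimizing this loss pulls each mixture embedding toward the embeddings of its own separated channels and pushes it away from channels of other scenes, which is the kind of mixture-to-channel association the paper describes.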

An extended version of the WASPAA 2021 paper, with additional discussion for easier reading, is available on arXiv. A video presentation and the slide deck are also available.

 

Reference:

Eduardo Fonseca, Aren Jansen, Daniel P.W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore and Xavier Serra. Self-Supervised Learning from Automatically Separated Sound Scenes. In 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
