Soundscape Generation - Authoring tool

Main features / specifications

Soundscape design is receiving considerable attention in virtual and augmented reality environments, games and interactive media. Our technology consists of an online platform that simplifies the authoring process while generating a realistic and interactive soundscape. In the authoring stage, automatic audio classification facilitates the search for samples in Freesound. The authoring tool uses a standard format (extended KML) for storing soundscape designs. In the synthesis stage, an autonomous engine is driven by graph models and multiple audio samples. Finally, in our server/client architecture, the server receives position update messages from the client in real time, and the soundscape is delivered to the application as a web stream.

DEMO (& GUI)

(extended version)

Use cases and applications

Augmented mobile reality: Walking in a city while listening to an augmented soundscape can be an enriching experience. It can be thought of as an informational activity (e.g. an enhanced tourist audio guide, or audio branding) or an artistic one (e.g. spatially distributed music compositions). Today's mobile devices are equipped with GPS positioning and an internet connection, which enable these novel applications.
Driving in soundscapes: A variant of the previous use case in which the listener drives on the road instead of walking in a city. Adapting the soundscape to a larger scale can generate an interesting in-car listening experience.
Architectural 3D modelling: New urban developments are often presented by means of virtual models. To increase the sense of presence

Available software tools

Authoring tool: An application written in SuperCollider that loads a map file (KML) from Google Earth. The KML file contains the positions of the sound objects, and the authoring tool lets the user assign several samples from Freesound to each sound object. It exports all the files needed to define a particular soundscape (map, sound object positions, sample list, and synthesis parameters).
Web streaming server: A web server implemented with the Twisted framework (http://twistedmatrix.com) provides an HTTP interface for external Internet clients and translates their requests into OSC (OpenSoundControl) calls that control the streaming server. The server module streams the listener output produced by the soundscape generation in MPEG-1 Layer 3 (MP3) format.
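As an illustration of the map file mentioned above, the following minimal sketch reads sound object positions from standard KML Placemark elements using only the Python standard library. It is not the authoring tool's code (which is written in SuperCollider), and it does not cover the project's KML extensions, which are not described here. The example document, the object names and the function name read_sound_objects are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

KML_URI = "http://www.opengis.net/kml/2.2"
KML_NS = {"kml": KML_URI}

# Hypothetical example: two sound objects stored as standard KML Placemarks.
EXAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>fountain</name>
      <Point><coordinates>2.1734,41.3851,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>traffic</name>
      <Point><coordinates>2.1760,41.3870,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def read_sound_objects(kml_text):
    """Return (name, longitude, latitude) for every Placemark that has a Point."""
    root = ET.fromstring(kml_text)
    objects = []
    for placemark in root.iter("{%s}Placemark" % KML_URI):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # skip Placemarks without a point geometry
        # KML stores coordinates as "longitude,latitude[,altitude]".
        lon, lat = coords.strip().split(",")[:2]
        objects.append((name.strip(), float(lon), float(lat)))
    return objects


sound_objects = read_sound_objects(EXAMPLE_KML)
```

Each returned tuple could then be paired with a list of Freesound sample identifiers, which is the assignment step the authoring tool performs.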

System requirements

Authoring Tool: All code has been implemented in SuperCollider and is available under the GNU GPL license. Linux only.
Soundscape Generation: All code has been implemented in SuperCollider and is available under the GNU GPL license. The system depends on other SuperCollider packages (GeoGraphy, XML). Linux/Mac/Windows.
Web streaming server: Icecast Streaming Media server. (http://www.icecast.org/). Linux server

Acknowledgement

ITEA2 Metaverse Project
TECNIO network (ACC1Ó - Generalitat de Catalunya)

Awards and other recognitions

Selected talk at the Game Developers Conference (GDC) 2012, San Francisco.

Team involved

Stefan Kersten, Gerard Roma, Mattia Schirosa, Jordi Janer

Related publications

'Talking Soundscapes: Automatizing Voice Transformations For Crowd Simulation', Janer, J., Geraerts R., van Toll W. G., & Bonada J., AES 49th International Conference, Audio for Games, 2013.
'Authoring augmented soundscapes with user-contributed content', Janer, J., Roma G., & Kersten S., ISMAR Workshop on Authoring Solutions for Augmented Reality, 2011
'An online platform for interactive soundscapes with user-contributed content', Janer, J., Kersten S., Schirosa M., & Roma G., AES 41st International Conference on Audio for Games, 2011
'Ecological acoustics perspective for content-based retrieval of environmental sounds', Roma, G.; Janer, J.; Kersten, S.; Schirosa, M.; Herrera, P.; Serra, X., EURASIP Journal on Audio, Speech, and Music Processing, 2011
'Content-based retrieval from unstructured databases using an ecological acoustics taxonomy', Roma, G.; Janer, J.; Kersten, S.; Schirosa, M.; Herrera, P., Proceedings International Community for the Auditory Display (ICAD, 2010)
'Sound Texture Synthesis with Hidden Markov Tree Models in the Wavelet Domain', Kersten, Stefan ; Purwins, Hendrik, Proceedings SMC Conference, Barcelona, 2010
'An Online Platform for Interactive Soundscapes with User-Contributed Content', Jordi Janer, Stefan Kersten, Mattia Schirosa, Gerard Roma, Proceedings AES 41st Conference Audio for Games, London, 2010
'A system for soundscape generation, composition and streaming', Schirosa, M.; Janer, J.; Kersten, S.; Roma, G., XVII CIM - Colloquium of Musical Informatics, 2010
'Soundscape Generation for Virtual Environments using Community-Provided Audio Databases', Finney, N.; Janer, J., W3C Workshop: Augmented Reality on the Web, Barcelona, 2010
'Design and Evaluation of a Visualization Interface for Querying Large Unstructured Sound Databases', Font, F., Master Thesis, UPF 2010
'Supporting Soundscape Design in Virtual Environments with Content-based Audio Retrieval', Janer, J.; Finney, N.; Roma, G.; Kersten, S.; Serra, X., Journal of Virtual Worlds Research, Vol.2, (3), Virtual Worlds Research Consortium, 2009.
'Autonomous Generation of Soundscapes using Unstructured Sound Databases', Finney, N., Master Thesis, UPF, 2009

Commercialization options

We have partnered with the startup SampleCount, which is exploiting the technology through its locosonic application. The application consists of a web browser interface for the authoring tool, which connects to the Freesound API to collect the sounds, and a mobile app for the synthesis engine, which runs locally on the client device.

Environmental Sound Search

We present a method to search for environmental sounds in large unstructured databases of user-submitted audio, using a general sound events taxonomy from ecological acoustics. We use Support Vector Machines to classify sound recordings according to the taxonomy, and provide a content-based web search interface for a large audio database.

You can try out our Freesound Taxonomy Browser (beta version).

Sound designers have traditionally made extensive use of recordings for creating the auditory content of audiovisual productions. Many of these sound effects come from commercial sound libraries, either in the form of CD/DVD collections or more recently as online databases. These repositories are organized according to editorial criteria and contain a wide range of sounds recorded in controlled environments. With the rapid growth of social media, large amounts of sound material are becoming available through the web every day. In contrast with traditional audiovisual media, networked multimedia environments can exploit such a rich source of data to provide content that evolves over time.

Soundscape Synthesis

Soundscape Generation System

We developed a generative system that aims to simplify the authoring process while offering a realistic and interactive soundscape. A sample-based synthesis algorithm is driven by graph models (see figure). Sound samples can be retrieved from a user-contributed audio repository. The synthesis engine runs on a server that receives position update messages, and the soundscape is delivered to the client application as a web stream. The system provides a standard format for soundscape composition. It includes an authoring module to create the soundscapes, and the actual audio generation engine. All code has been implemented in SuperCollider and is available under the GNU GPL license. The system depends on other SuperCollider packages (GeoGraphy, XML).
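To make the idea of graph-driven, sample-based sequencing more concrete, here is a minimal sketch in Python. It is not the project's SuperCollider implementation. It assumes, for illustration only, that each vertex of the graph names an audio sample and that each edge carries the delay in seconds until the next sample starts. The graph contents, the file names and the function name walk are hypothetical.

```python
import random

# Hypothetical graph: vertex = sample name,
# edge = (next vertex, seconds until the next onset).
GRAPH = {
    "birds_a.wav": [("birds_b.wav", 2.0), ("wind.wav", 5.0)],
    "birds_b.wav": [("birds_a.wav", 1.5)],
    "wind.wav": [("birds_a.wav", 4.0), ("wind.wav", 6.0)],
}


def walk(graph, start, n_events, rng):
    """Follow randomly chosen outgoing edges; return (onset_seconds, sample) events."""
    events = []
    vertex, onset = start, 0.0
    for _ in range(n_events):
        events.append((onset, vertex))
        # Pick one outgoing edge at random and advance the clock by its delay.
        vertex, delay = rng.choice(graph[vertex])
        onset += delay
    return events


# A seeded generator makes the walk reproducible.
schedule = walk(GRAPH, "birds_a.wav", 5, random.Random(0))
for onset, sample in schedule:
    print(f"{onset:5.1f} s  play {sample}")
```

Because the sequence comes from a random walk rather than a fixed timeline, a small graph can produce a long, non-repeating stream of events, which is one reason a graph model suits ambient material.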

Sound Texture Synthesis

We are working on new parametric and non-parametric statistical models for synthesizing environmental sound textures, such as running water, rain, and fire. Sound texture analysis is cast in the framework of multiresolution statistical models. We stochastically sample from a model that has been trained on source sounds and captures correlations of these sound textures in a transformed feature space representation. By reconstructing a time-domain signal from the sampled feature sequences, e.g. by the inverse wavelet transform or corpus-based synthesis, we aim to create distinct but perceptually similar versions of a sound.
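The train-then-sample principle can be illustrated with a deliberately simple stand-in. The sketch below uses a first-order Markov chain over quantized feature symbols, not the multiresolution models described above, and it omits the reconstruction of a time-domain signal. The symbol sequence and the function names are assumptions made for illustration.

```python
import random
from collections import defaultdict


def train(sequence):
    """For every symbol, collect the symbols that followed it in the source."""
    followers = defaultdict(list)
    for current, nxt in zip(sequence, sequence[1:]):
        followers[current].append(nxt)
    return followers


def sample(followers, start, length, rng):
    """Draw a new sequence whose transitions all occurred in the training data."""
    out = [start]
    while len(out) < length:
        options = followers.get(out[-1])
        if not options:  # dead end: symbol was never followed by anything
            break
        # Choosing from the raw follower list reproduces observed transition frequencies.
        out.append(rng.choice(options))
    return out


# Hypothetical quantized feature sequence extracted from a source sound.
source = [0, 1, 2, 1, 0, 1, 2, 2, 1, 0]
model = train(source)
variant = sample(model, source[0], 12, random.Random(1))
```

The sampled sequence differs from the source but shares its local statistics, which is the sense in which the synthesized versions are meant to be distinct yet perceptually similar.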

Streaming Server

In online virtual environments, audiovisual content is typically rendered by a standalone application on the client side. Rather than deploying a large-scale, efficient system, our interest here is to provide a flexible platform for soundscape design and generation that is both application-agnostic and focused on user accessibility. We intend to foster user-generated content, and the web has become commonplace as a collaborative repository of media content. As for the actual audio rendering, in spite of its lower efficiency, a server-side architecture offers advantages, since it does not require the user to install any specific software.

Interaction with the soundscape running on the server is done through a web API. This allows client applications to add listeners and obtain personalized streams given the coordinates of each listener (“position” and “rotation”). A web server implemented with the Twisted framework (http://twistedmatrix.com) provides an HTTP interface for external Internet clients and translates their requests into OSC (OpenSoundControl) calls that control the streaming server. The web server is also responsible for maintaining client sessions. In the current implementation, a fixed pool of streaming URLs is used, so the number of clients is bounded. Finally, this server module streams the listener output produced by the soundscape generation in MPEG-1 Layer 3 (MP3) format. Currently, the streaming server hosts only the soundscape for the "Virtual Travel" demo (see downloads page). For evaluation purposes, streaming of user-created soundscapes can be hosted on request from research or commercial institutions.
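Two mechanisms from the paragraph above can be sketched in a few lines of Python: a fixed pool of stream URLs that bounds the number of listeners, and the encoding of a listener update as an OSC message. This is not the project's server code. The OSC address /listener/position, the argument layout, the example.org URLs and the class and function names are assumptions, and only the int32, float32 and string argument types of OSC 1.0 are handled.

```python
import struct


def _pad(data):
    """Null-terminate and pad to a multiple of 4 bytes, as OSC strings require."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_osc(address, args):
    """Encode an OSC 1.0 message: address, type tag string, big-endian arguments."""
    tags, payload = ",", b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("bool arguments are not handled in this sketch")
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)
        elif isinstance(arg, str):
            tags += "s"
            payload += _pad(arg.encode("ascii"))
        else:
            raise TypeError("unsupported argument type: %r" % type(arg))
    return _pad(address.encode("ascii")) + _pad(tags.encode("ascii")) + payload


class StreamPool:
    """Hand out stream URLs from a fixed pool; the pool size bounds the listeners."""

    def __init__(self, urls):
        self._free = list(urls)
        self._sessions = {}

    def add_listener(self, session_id):
        if session_id in self._sessions:
            return self._sessions[session_id]
        if not self._free:
            raise RuntimeError("all streams are in use")
        url = self._free.pop(0)
        self._sessions[session_id] = url
        return url

    def remove_listener(self, session_id):
        # Return the stream URL to the pool so another client can use it.
        self._free.append(self._sessions.pop(session_id))


# Hypothetical usage: listener 1 moves to x=1.5, y=2.0.
pool = StreamPool(["http://example.org/stream1.mp3", "http://example.org/stream2.mp3"])
stream_url = pool.add_listener("session-a")
message = encode_osc("/listener/position", [1, 1.5, 2.0])
```

In a deployment of this kind, the HTTP handler would look up the session, build a message like the one above, and send it over UDP to the audio engine, while the client plays the URL it was assigned.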

Downloads

Metaverse1 Virtual Travel prototype

The capabilities for creating generative soundscapes in SecondLife are currently quite limited. Thus we intercept the traffic between the SecondLife server (sim) and the viewer in order to send status updates (such as avatar position, head rotation, time of day, etc.) to a custom soundscape streaming server with an HTTP API. A separate stream is generated for each client and received through the SecondLife viewer application via a local streaming proxy connection. This requires running a proxy program along with SecondLife. Check the Readme file for further information.

Download