[PhD thesis] From Heuristics-Based to Data-Driven Audio Melody Extraction
Author: Juan José Bosch
Supervisor: Emilia Gómez
Identifying the melody of a music recording is relatively easy for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied and realistic set of data, and by considering multiple definitions of melody. We first present an extensive overview of the state of the art, and perform an evaluation on a novel symphonic music melody extraction dataset. Results show that most approaches do not generalise well to the characteristics of such data, which covers a wide pitch range. A pitch salience function based on source-filter modelling is found to be especially useful in this context. We then propose its integration with melody tracking methods based on pitch contour characterisation, and evaluate them on a wide range of music genres. First, this salience function is adapted for pitch contour creation by combining it with another one based on harmonic summation. This combination increases the salience of melody pitches and improves melody extraction accuracy over previous approaches with two different contour-based melody tracking methods: pitch contour selection based on heuristic rules, and supervised pitch contour classification. Second, the latter approach is further improved by using novel timbre, tonal and spatial features, which help to discriminate melodic from non-melodic pitch contours. Finally, we propose a method for the estimation of multiple melodic lines based on pitch contour classification, which exploits continuity within melodic lines.
The combination of supervised and unsupervised approaches leads to advances in melody extraction and shows a promising path for future research and applications.
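As a rough illustration of the harmonic summation idea mentioned in the abstract, the following Python sketch scores candidate fundamental frequencies by adding up the magnitudes of spectral peaks that lie near their harmonics. It is a simplified toy example and not the salience function used in the thesis: the function name `harmonic_sum_salience`, the parameters `n_harmonics`, `alpha` and `tol_cents`, and the peak values are all assumptions made for this illustration.

```python
import math


def harmonic_sum_salience(peaks, f0_candidates, n_harmonics=8,
                          alpha=0.8, tol_cents=50.0):
    """Return one salience value per candidate f0 (hypothetical helper).

    peaks: list of (frequency_hz, magnitude) spectral peaks for one frame.
    Each candidate collects the magnitude of every peak lying within
    tol_cents of one of its harmonics h * f0, weighted by alpha ** (h - 1).
    """
    salience = []
    for f0 in f0_candidates:
        total = 0.0
        for h in range(1, n_harmonics + 1):
            target = h * f0
            for freq, mag in peaks:
                # Distance between the peak and the harmonic, in cents.
                cents = abs(1200.0 * math.log2(freq / target))
                if cents <= tol_cents:
                    total += (alpha ** (h - 1)) * mag
        salience.append(total)
    return salience


# Toy frame (assumed values): a 220 Hz tone with decaying harmonics,
# plus one unrelated peak at 523 Hz.
peaks = [(220.0, 1.0), (440.0, 0.6), (660.0, 0.4), (880.0, 0.2), (523.0, 0.3)]
candidates = [110.0, 220.0, 261.5, 440.0]

salience = harmonic_sum_salience(peaks, candidates)
best_f0 = candidates[salience.index(max(salience))]
print(best_f0)
```

In this toy frame the 220 Hz candidate scores highest, because the geometric weighting penalises the 110 Hz subharmonic, whose matching peaks only appear at its even harmonics. A real system would compute peaks from a time-frequency transform over many frames and, as described in the abstract, could combine such a function with one based on source-filter modelling.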
- Open access version available at the TDR repository and Zenodo
- Datasets: The symphonic music dataset proposed in this thesis is available at:
- Code: Source code of the melody extraction algorithms proposed in this thesis is available at: