PhD thesis
Data-driven Pitch Content Description of Choral Singing Recordings
Abstract
Ensemble singing is a well-established practice across cultures, found in a great diversity of forms, languages, and levels.
However, it has not been widely studied in the field of Music Information Retrieval (MIR), likely due to the lack of appropriate data.
In this dissertation, we first address this data scarcity by building new open, multi-track datasets of ensemble singing.
Then, we address three main research problems: multiple F0 estimation, voice assignment, and the characterization of vocal unisons, all in the context of four-part vocal ensembles.
A primary contribution of this thesis is therefore the development and release of four multi-track datasets of vocal ensembles:
Choral Singing Dataset, Dagstuhl ChoirSet, ESMUC Choir Dataset, and Cantoría Dataset, all of them with audio recordings and accompanying annotations.
The second contribution is a set of deep learning models for multiple F0 estimation, streaming, and voice assignment of vocal quartets, mainly based on
convolutional neural networks whose design leverages music domain knowledge.
Finally, we propose two methods to characterize vocal unison performances in terms of pitch dispersion.
Datasets
Link to the dissertation document:
http://hdl.handle.net/10803/673924