Voice assignment in vocal quartets using deep learning models based on pitch salience


  • dc.contributor.author Cuesta, Helena
  • dc.contributor.author Gómez Gutiérrez, Emilia, 1975-
  • dc.date.accessioned 2023-01-20T07:52:15Z
  • dc.date.available 2023-01-20T07:52:15Z
  • dc.date.issued 2022
  • dc.description.abstract This paper deals with the automatic transcription of audio performances of four-part a cappella singing. In particular, we exploit an existing deep-learning-based multiple-F0 estimation method and complement it with two neural network architectures for voice assignment (VA) in order to create a music transcription system that converts an input audio mixture into four pitch contours. To train our VA models, we create a novel synthetic dataset by collecting 5381 choral music scores from public-domain music archives, which we make publicly available for further research. We compare the performance of the proposed VA models on different types of input data, as well as against a hidden Markov model-based baseline system. In addition, we assess the generalization capabilities of these models on audio recordings with differing pitch distributions and vocal music styles. Our experiments show that the two proposed models, a CNN and a ConvLSTM, perform very similarly, and both outperform the baseline HMM-based system. We also observe a high confusion rate between the alto and tenor voice parts, which commonly have overlapping pitch ranges, while the bass voice obtains the highest scores in all evaluated scenarios. (An illustrative code sketch of the voice-assignment idea follows the record below.)
  • dc.description.sponsorship This work is partially supported by the European Commission under the TROMPA project (H2020 770376), the Spanish Ministry of Science and Innovation under the Musical AI project (PID2019-111403GB-I00), and by AGAUR (Generalitat de Catalunya) through an FI Predoctoral Grant (2018FI-B01015).
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Cuesta H, Gómez E. Voice assignment in vocal quartets using deep learning models based on pitch salience. Transactions of the International Society for Music Information Retrieval. 2022;5(1):99-112. DOI: 10.5334/tismir.121
  • dc.identifier.doi http://dx.doi.org/10.5334/tismir.121
  • dc.identifier.issn 2514-3298
  • dc.identifier.uri http://hdl.handle.net/10230/55358
  • dc.language.iso eng
  • dc.publisher Ubiquity Press
  • dc.relation.ispartof Transactions of the International Society for Music Information Retrieval. 2022;5(1):99-112.
  • dc.relation.isreferencedby https://github.com/helenacuesta/voas-vocal-quartets
  • dc.relation.isreferencedby https://doi.org/10.5334/tismir.121.s1
  • dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/770376
  • dc.relation.projectID info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
  • dc.rights © 2022 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri http://creativecommons.org/licenses/by/4.0/
  • dc.subject.keyword voice assignment
  • dc.subject.keyword multi-pitch estimation
  • dc.subject.keyword music information retrieval
  • dc.subject.keyword vocal quartets
  • dc.subject.keyword polyphonic vocal music
  • dc.subject.keyword deep learning
  • dc.title Voice assignment in vocal quartets using deep learning models based on pitch salience
  • dc.type info:eu-repo/semantics/article
  • dc.type.version info:eu-repo/semantics/publishedVersion
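
To make the pipeline described in the abstract more concrete, below is a minimal, hypothetical Python/Keras sketch of the voice-assignment step: a small CNN that maps a multi-pitch salience representation (time x pitch bins) to four per-voice salience maps (SATB), from which one F0 contour per voice is decoded. All layer sizes, variable names, and the decoding threshold are illustrative assumptions, not the authors' actual architecture; the real implementation is in the linked GitHub repository (https://github.com/helenacuesta/voas-vocal-quartets).

```python
# Hypothetical sketch of pitch-salience-based voice assignment.
# Input: a salience map (time x pitch bins); output: four per-voice
# salience maps, one per SATB part. Architecture details are
# illustrative only, not the paper's actual CNN or ConvLSTM.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

N_TIME, N_BINS = 128, 360  # assumed patch size: 128 frames x 360 pitch bins

inp = layers.Input(shape=(N_TIME, N_BINS, 1), name="salience")
x = layers.Conv2D(32, (5, 5), padding="same", activation="relu")(inp)
x = layers.Conv2D(32, (5, 5), padding="same", activation="relu")(x)
x = layers.Conv2D(16, (3, 3), padding="same", activation="relu")(x)
# One sigmoid channel per voice part: soprano, alto, tenor, bass.
out = layers.Conv2D(4, (1, 1), padding="same", activation="sigmoid",
                    name="voices")(x)
model = Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy")

def contours(voice_maps, threshold=0.3):
    """Decode each voice channel to an F0 contour: per frame, keep the
    most salient pitch bin, or -1 (unvoiced) if its peak salience is
    below an assumed threshold.

    voice_maps: (time, bins, 4) array -> (4, time) array of bin indices.
    """
    best = voice_maps.argmax(axis=1)   # (time, 4): best bin per frame/voice
    peak = voice_maps.max(axis=1)      # (time, 4): salience of that bin
    best[peak < threshold] = -1        # mark weak frames as unvoiced
    return best.T                      # (4, time): one contour per voice

# Usage on a random (untrained) example, just to show the shapes.
demo = model.predict(np.random.rand(1, N_TIME, N_BINS, 1), verbose=0)[0]
print(contours(demo).shape)  # (4, 128): four pitch-bin contours
```

The design choice sketched here, predicting one output channel per voice from a shared salience input, mirrors the paper's framing of VA as turning one multi-pitch representation into four pitch contours; the paper's ConvLSTM variant would additionally model temporal continuity across frames.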