Score-informed syllable segmentation for a cappella singing voice with convolutional neural networks


  • dc.contributor.author Pons Puig, Jordi
  • dc.contributor.author Gong, Rong
  • dc.contributor.author Serra, Xavier
  • dc.date.accessioned 2018-03-12T10:50:23Z
  • dc.date.available 2018-03-12T10:50:23Z
  • dc.date.issued 2017
  • dc.description Paper presented at the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), held on 23-27 October 2017 in Suzhou, China.
  • dc.description.abstract This paper introduces a new score-informed method for segmenting jingju a cappella singing phrases into syllables. The proposed method estimates the most likely sequence of syllable boundaries given the estimated syllable onset detection function (ODF) and its score. We first examine the structure of jingju syllables and propose a definition of the term “syllable onset”. We then identify the challenges that jingju a cappella singing poses. Further, we investigate how to improve the syllable ODF estimation with convolutional neural networks (CNNs). We propose a novel CNN architecture that efficiently captures different time-frequency scales for estimating syllable onsets. In addition, we propose using a score-informed Viterbi algorithm instead of thresholding the onset function, because the available musical knowledge (the score) can inform the Viterbi algorithm to overcome the identified challenges. The proposed method outperforms the state of the art in syllable segmentation for jingju a cappella singing. We further provide an analysis of the segmentation errors, which points to possible research directions.
  • dc.description.sponsorship This work is partially supported by the Maria de Maeztu Programme (MDM-2015-0502) and the European Research Council under the European Union’s Seventh Framework Program, as part of the CompMusic project (ERC grant agreement 267583).
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Pons J, Gong R, Serra X. Score-informed syllable segmentation for a cappella singing voice with convolutional neural networks. In: Hu X, Cunningham SJ, Turnbull D, Duan Z. ISMIR 2017. 18th International Society for Music Information Retrieval Conference; 2017 Oct 23-27; Suzhou, China. [Canada]: ISMIR; 2017. p. 483-9.
  • dc.identifier.uri http://hdl.handle.net/10230/34089
  • dc.language.iso eng
  • dc.publisher International Society for Music Information Retrieval (ISMIR)
  • dc.relation.ispartof Hu X, Cunningham SJ, Turnbull D, Duan Z. ISMIR 2017. 18th International Society for Music Information Retrieval Conference; 2017 Oct 23-27; Suzhou, China. [Canada]: ISMIR; 2017. p. 483-9.
  • dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/267583
  • dc.rights © Jordi Pons, Rong Gong and Xavier Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Jordi Pons∗, Rong Gong∗ and Xavier Serra. “Score-informed syllable segmentation for a cappella singing voice with convolutional neural networks”, 18th International Society for Music Information Retrieval Conference, Suzhou, China, 2017.
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri http://creativecommons.org/licenses/by/4.0/
  • dc.subject.other Music -- Computer science
  • dc.title Score-informed syllable segmentation for a cappella singing voice with convolutional neural networks
  • dc.type info:eu-repo/semantics/conferenceObject
  • dc.type.version info:eu-repo/semantics/publishedVersion