Modeling of phoneme durations for alignment between polyphonic audio and lyrics


  • dc.contributor.author Dzhambazov, Georgi Bogomilov
  • dc.contributor.author Serra, Xavier
  • dc.date.accessioned 2016-11-28T08:14:31Z
  • dc.date.available 2016-11-28T08:14:31Z
  • dc.date.issued 2015
  • dc.description Paper presented at the 12th Sound and Music Computing Conference, held from 30 July to 1 August 2015 in Maynooth (Ireland).
  • dc.description.abstract In this work we propose how to modify a standard text-to-speech alignment scheme for the alignment of lyrics and singing voice. To this end we model phoneme durations specific to the case of singing. We rely on a duration-explicit hidden Markov model (DHMM) phonetic recognizer based on mel-frequency cepstral coefficients (MFCCs), which are extracted in a way that is robust to background instrumental sounds. The proposed approach is tested on polyphonic audio from the classical Turkish music tradition in two settings: with and without modeling phoneme durations. Phoneme durations are inferred from sheet music. In order to assess the impact of the polyphonic setting, alignment is also evaluated on an a cappella dataset compiled especially for this study. We show that the explicit modeling of phoneme durations improves alignment accuracy by an absolute 10 percent at the level of lyrics lines (phrases) and performs on par with state-of-the-art aligners for other languages.
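The duration-explicit decoding the abstract refers to can be illustrated with a minimal sketch. The code below is not the authors' implementation: it is a simplified duration-explicit (semi-Markov) Viterbi pass over a fixed left-to-right phoneme sequence, in which `log_obs` stands in for frame-level acoustic scores (e.g. MFCC likelihoods) and `log_dur` for per-phoneme duration priors such as those the paper infers from sheet music. All names and shapes here are illustrative assumptions.

```python
import numpy as np

def dhmm_viterbi(log_obs, log_dur, max_dur):
    """Align a fixed left-to-right phoneme sequence to audio frames with a
    duration-explicit HMM (simplified hidden semi-Markov Viterbi).

    log_obs : (T, S) log-likelihood of each frame under each phoneme state
    log_dur : (S, max_dur) log-probability that state s lasts d+1 frames
    Returns an array of length T with the best phoneme label per frame.
    Assumes T >= S so every state can occupy at least one frame.
    """
    T, S = log_obs.shape
    NEG = -np.inf
    # delta[t, s]: best log-score of a segmentation ending state s at frame t
    delta = np.full((T, S), NEG)
    back = np.zeros((T, S), dtype=int)   # start frame of the winning segment
    cum = np.cumsum(log_obs, axis=0)     # prefix sums of observation scores

    def seg_score(s, start, end):        # sum of log_obs[start..end, s]
        return cum[end, s] - (cum[start - 1, s] if start > 0 else 0.0)

    for t in range(T):
        for s in range(S):
            for d in range(1, min(max_dur, t + 1) + 1):
                start = t - d + 1
                if s == 0 and start == 0:          # first phoneme starts at 0
                    prev = 0.0
                elif s > 0 and start > 0:          # previous phoneme ends just before
                    prev = delta[start - 1, s - 1]
                else:
                    prev = NEG
                score = prev + log_dur[s, d - 1] + seg_score(s, start, t)
                if score > delta[t, s]:
                    delta[t, s] = score
                    back[t, s] = start

    # Backtrace from the last phoneme ending at the last frame.
    labels = np.empty(T, dtype=int)
    t, s = T - 1, S - 1
    while t >= 0:
        start = back[t, s]
        labels[start:t + 1] = s
        t, s = start - 1, s - 1
    return labels
```

For example, with two phoneme states over five frames whose acoustic scores favor state 0 for the first three frames and state 1 for the last two, and flat duration priors, the decoder recovers the boundary at frame 3. Note-length priors from a score would simply sharpen `log_dur` around the expected durations.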
  • dc.description.sponsorship This work is partly supported by the European Research Council under the European Union's Seventh Framework Programme, as part of the CompMusic project (ERC grant agreement 267583), and partly by the AGAUR research grant.
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Dzhambazov G, Serra X. Modeling of phoneme durations for alignment between polyphonic audio and lyrics. In: Timoney J, Lysaght T, editors. 12th Sound and Music Computing Conference; 2015 Jul 30-Aug 1; Maynooth (Ireland). Maynooth: Music Technology Research Group, Department of Computer Science, Maynooth University; 2015. Oral session 7, Computational musicology and mathematical music theory 1; p. 281-286.
  • dc.identifier.uri http://hdl.handle.net/10230/27614
  • dc.language.iso eng
  • dc.publisher Music Technology Research Group, Department of Computer Science, Maynooth University
  • dc.relation.ispartof Timoney J, Lysaght T, editors. 12th Sound and Music Computing Conference; 2015 Jul 30-Aug 1; Maynooth (Ireland). Maynooth: Music Technology Research Group, Department of Computer Science, Maynooth University; 2015. Oral session 7, Computational musicology and mathematical music theory 1; p. 281-286.
  • dc.relation.isreferencedby http://hdl.handle.net/10230/27563
  • dc.relation.isreferencedby http://hdl.handle.net/10230/27606
  • dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/267583
  • dc.rights ©2015 Georgi Dzhambazov et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri http://creativecommons.org/licenses/by/3.0/
  • dc.subject.other Sound -- Recording and reproducing -- Digital techniques
  • dc.title Modeling of phoneme durations for alignment between polyphonic audio and lyrics
  • dc.type info:eu-repo/semantics/conferenceObject
  • dc.type.version info:eu-repo/semantics/publishedVersion