Singing voice accompaniment data augmentation with generative models

Perez, Miguel; Kirchhoff, Holger; Grosche, Peter; Serra, Xavier

dc.contributor.author Perez, Miguel
dc.contributor.author Kirchhoff, Holger
dc.contributor.author Grosche, Peter
dc.contributor.author Serra, Xavier
dc.date.accessioned 2025-10-16T06:07:03Z
dc.date.available 2025-10-16T06:07:03Z
dc.date.issued 2025
dc.description.abstract Singing voice transcription is a key task in Music Information Retrieval (MIR) that focuses on identifying sung notes within a music audio segment. Advancing state-of-the-art methods in this area relies heavily on high-quality data, yet annotating such data is resource-intensive and requires musical expertise. In genres like pop music, data sharing is further complicated by copyright and distribution limitations. In this paper, we refine a recently proposed data augmentation technique that leverages AI-generated music audio to address these data-related challenges. Specifically, we create musical accompaniments for vocals with known target notes, enabling the generation of new mixes that retain the original piece’s harmony while introducing substantial audio variation. Our cross-dataset experiments reveal that using harmony-matched mixes improves generalization, though performance remains below that achieved by training with additional real data.
dc.format.mimetype application/pdf
dc.identifier.citation Perez M, Kirchhoff H, Grosche P, Serra X. Singing voice accompaniment data augmentation with generative models. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2025 April 6-11; Hyderabad, India. 5 p. DOI: 10.1109/ICASSPW65056.2025.11011167
dc.identifier.uri http://hdl.handle.net/10230/71522
dc.language.iso eng
dc.publisher Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2025 April 6-11; Hyderabad, India.
dc.rights © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. http://dx.doi.org/10.1109/ICASSPW65056.2025.11011167
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.subject.keyword Singing voice
dc.subject.keyword Data augmentation
dc.subject.keyword Generative models
dc.title Singing voice accompaniment data augmentation with generative models
dc.type info:eu-repo/semantics/conferenceObject
dc.type.version info:eu-repo/semantics/acceptedVersion

Collections

Conferences (Department of Information and Communication Technologies)