Leveraging Carnatic live recordings for singing voice separation using regression-guided latent diffusion
- dc.contributor.author Plaja-Roglans, Genís
- dc.contributor.author Serra, Xavier
- dc.contributor.author Rocamora, Martín
- dc.date.accessioned 2025-09-09T05:57:45Z
- dc.date.available 2025-09-09T05:57:45Z
- dc.date.issued 2025
- dc.description Paper presented at the 26th International Society for Music Information Retrieval Conference (ISMIR 2025), held in Daejeon (Korea) from 21 to 25 September 2025.
- dc.description.abstract Diffusion models have demonstrated potential to separate individual sources from music mixtures in a generative fashion, enabling a new solution for this challenging problem. However, existing works require clean multi-stem data, which is scarce for several repertoires, consequently compromising generalization. We explore the potential of generative modeling to perform weakly-supervised singing voice separation for Carnatic Music, a music repertoire for which large quantities of multi-stem recordings with bleeding between sources have been collected from live performances. We pre-train a latent diffusion model to perform preliminary vocal separation conditioned on the corresponding mixture. Then, using a regression model which is separately trained on a clean, smaller, and out-of-domain dataset, we estimate the level of bleeding in the preliminary separations and use that information to guide the diffusion model toward generating cleaner samples. The objective and perceptual evaluations show the potential of the proposed generative system for Carnatic vocal separation.
- dc.description.sponsorship This work is supported by IA y Música: Cátedra en Inteligencia Artificial y Música (TSI-100929-2023-1), funded by the Secretaría de Estado de Digitalización e Inteligencia Artificial, and the European Union-Next Generation EU, under the program Cátedras ENIA 2022 para la creación de cátedras universidad-empresa en IA, and IMPA: Multimodal AI for Audio Processing (PID2023-152250OB-I00), funded by the Ministry of Science, Innovation and Universities of the Spanish Government, the Agencia Estatal de Investigación (AEI) and co-financed by the European Union.
- dc.format.mimetype application/pdf
- dc.identifier.citation Plaja-Roglans G, Serra X, Rocamora M. Leveraging Carnatic live recordings for singing voice separation using regression-guided latent diffusion. Paper presented at: 26th International Society for Music Information Retrieval Conference (ISMIR 2025); 2025 Sep 21-25; Daejeon, Korea. 9p.
- dc.identifier.uri http://hdl.handle.net/10230/71153
- dc.language.iso eng
- dc.publisher International Society for Music Information Retrieval (ISMIR)
- dc.relation.projectID info:eu-repo/grantAgreement/ES/3PE/PID2023-152250OB-I00
- dc.rights Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: G. Plaja-Roglans, X. Serra, and M. Rocamora, “Leveraging Carnatic live recordings for singing voice separation using regression-guided latent diffusion”, in Proc. of the 26th Int. Society for Music Information Retrieval Conf., Daejeon, South Korea, 2025.
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri http://creativecommons.org/licenses/by/4.0/
- dc.subject.keyword Carnatic music
- dc.subject.keyword Live recordings
- dc.subject.keyword Singing voice separation
- dc.subject.keyword Regression-guided latent diffusion
- dc.title Leveraging Carnatic live recordings for singing voice separation using regression-guided latent diffusion
- dc.type info:eu-repo/semantics/conferenceObject
- dc.type.version info:eu-repo/semantics/publishedVersion
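The abstract describes steering a diffusion sampler with a separately trained regressor that estimates bleeding. The toy sketch below illustrates that general idea: a generic gradient-guidance step, loosely in the spirit of classifier guidance. It is not the authors' implementation. Everything in it is a hypothetical stand-in: `toy_denoiser` replaces the latent diffusion model, `predicted_bleeding` replaces the trained regression model, and the "bleeding subspace" `P` is an assumption chosen only so the gradient has a closed form. A real system would obtain the gradient by automatic differentiation through the regressor.

```python
import numpy as np

# Hypothetical "bleeding subspace": we assume the first two latent dimensions
# carry interference from other sources (illustrative assumption only).
P = np.zeros((2, 8))
P[0, 0] = P[1, 1] = 1.0


def predicted_bleeding(z):
    """Toy stand-in for the regression model: bleeding level = ||P z||^2."""
    return float(np.sum((P @ z) ** 2))


def bleeding_gradient(z):
    """Gradient of ||P z||^2 with respect to z, i.e. 2 P^T P z."""
    return 2.0 * P.T @ (P @ z)


def toy_denoiser(z):
    """Placeholder for one denoising update of the (hypothetical) diffusion model."""
    return 0.9 * z


def sample(z0, steps=10, guidance_scale=0.0):
    """Run `steps` denoising updates. When guidance_scale > 0, each estimate is
    nudged against the gradient of the predicted bleeding level."""
    z = z0.copy()
    for _ in range(steps):
        z = toy_denoiser(z)
        z = z - guidance_scale * bleeding_gradient(z)
    return z


rng = np.random.default_rng(0)
z0 = rng.normal(size=8)
unguided = sample(z0)
guided = sample(z0, guidance_scale=0.1)
print(predicted_bleeding(unguided), predicted_bleeding(guided))
```

In this toy setting, guidance only shrinks the dimensions the regressor associates with bleeding, leaving the rest of the latent identical to the unguided run. The guidance scale trades off how strongly the regressor's signal overrides the denoiser.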