Welcome to the UPF Digital Repository

A diffusion-inspired training strategy for singing voice extraction in the waveform domain


dc.contributor.author Plaja-Roglans, Genís
dc.contributor.author Miron, Marius
dc.contributor.author Serra, Xavier
dc.date.accessioned 2022-09-22T12:15:38Z
dc.date.available 2022-09-22T12:15:38Z
dc.date.issued 2022-09-22
dc.identifier.uri http://hdl.handle.net/10230/54156
dc.description This work has been accepted at the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022), held in Bengaluru, India, on December 4-8, 2022.
dc.description.abstract Notable progress in music source separation has been achieved using multi-branch networks that operate in both the temporal and spectral domains. However, such networks tend to be complex and computationally heavy. In this work, we tackle singing voice extraction from polyphonic music signals in an end-to-end manner, using an approach inspired by the training procedure of denoising diffusion models. We perform unconditional signal modelling to gradually convert an input mixture signal into the corresponding singing voice or accompaniment. We use fewer parameters than state-of-the-art models while operating in the waveform domain, which bypasses phase-related problems. More specifically, we train a non-causal WaveNet with a diffusion-inspired strategy that improves this network for singing voice extraction, obtaining performance comparable to the end-to-end state of the art on MUSDB18. We further report results on a version of MedleyDB that does not overlap with MUSDB18 and on the multi-track audio of the Saraga Carnatic dataset, showing good generalization, and we run perceptual tests of our approach. Code, models, and audio examples are made available.
dc.description.sponsorship This work was carried out under the projects Musical AI - PID2019-111403GB-I00/AEI/10.13039/501100011033 and NextCore - RTC2019-007248-7 funded by the Spanish Ministerio de Ciencia, Innovación y Universidades (MCIU) and the Agencia Estatal de Investigación (AEI).
dc.format.mimetype application/pdf
dc.language.iso eng
dc.rights © G. Plaja-Roglans, M. Miron, and X. Serra. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: G. Plaja-Roglans, M. Miron, and X. Serra, “A diffusion-inspired training strategy for singing voice extraction in the waveform domain”, in Proc. of the 23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022.
dc.rights.uri https://creativecommons.org/licenses/by/4.0
dc.title A diffusion-inspired training strategy for singing voice extraction in the waveform domain
dc.type info:eu-repo/semantics/preprint
dc.relation.projectID info:eu-repo/grantAgreement/ES/2PE/PID2019-111403GB-I00
dc.relation.projectID info:eu-repo/grantAgreement/ES/2PE/RTC2019-007248-7
dc.rights.accessRights info:eu-repo/semantics/openAccess
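The abstract describes training a network to gradually convert a mixture into its singing voice or accompaniment, in the style of denoising diffusion models. The sketch below illustrates one such forward blend and its inverse, assuming (our assumption, not stated in this record) that the accompaniment plays the role of diffusion noise under a DDPM-style linear beta schedule. The function names, schedule values, and step count are hypothetical, and this is not the authors' released code.

```python
import numpy as np


def linear_alpha_bar(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear beta schedule (DDPM-style, hypothetical values)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def make_training_pair(vocals: np.ndarray, accompaniment: np.ndarray, t: int, alpha_bar: np.ndarray):
    """Blend the clean vocals with the accompaniment at step t.

    Assumption: the accompaniment acts as the 'noise' term, so the network's
    regression target is the accompaniment itself.
    """
    a = alpha_bar[t]
    x_t = np.sqrt(a) * vocals + np.sqrt(1.0 - a) * accompaniment
    return x_t, accompaniment


def estimate_vocals(x_t: np.ndarray, predicted_accompaniment: np.ndarray, t: int, alpha_bar: np.ndarray) -> np.ndarray:
    """Invert the blend at step t given a predicted accompaniment."""
    a = alpha_bar[t]
    return (x_t - np.sqrt(1.0 - a) * predicted_accompaniment) / np.sqrt(a)


# Usage with random signals standing in for one second of 16 kHz audio.
rng = np.random.default_rng(0)
vocals = rng.standard_normal(16000)
accompaniment = rng.standard_normal(16000)
alpha_bar = linear_alpha_bar(50)
t = int(rng.integers(0, 50))
x_t, target = make_training_pair(vocals, accompaniment, t, alpha_bar)
recovered = estimate_vocals(x_t, target, t, alpha_bar)
```

With a perfect accompaniment prediction, the inverse step recovers the vocals exactly. In a real setup, a network such as the non-causal WaveNet mentioned in the abstract would be trained to produce that prediction from x_t and t; how the authors parameterize the schedule and conditioning is not specified in this record.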

