WGANSing: a multi-voice singing voice synthesizer based on the Wasserstein-GAN
- dc.contributor.author Chandna, Pritish
- dc.contributor.author Blaauw, Merlijn
- dc.contributor.author Bonada, Jordi, 1973-
- dc.contributor.author Gómez Gutiérrez, Emilia, 1975-
- dc.date.accessioned 2021-05-11T09:00:41Z
- dc.date.available 2021-05-11T09:00:41Z
- dc.date.issued 2019
- dc.description Paper presented at EUSIPCO 2019: 27th European Signal Processing Conference, held 2-6 September 2019 in A Coruña, Spain.
- dc.description.abstract We present a deep neural network based singing voice synthesizer, inspired by the Deep Convolutional Generative Adversarial Networks (DCGAN) architecture and optimized using the Wasserstein-GAN algorithm. We use vocoder parameters for acoustic modelling, to separate the influence of pitch and timbre. This facilitates the modelling of the large variability of pitch in the singing voice. Our network takes a block of consecutive frame-wise linguistic and fundamental frequency features, along with global singer identity, as input and outputs the corresponding block of vocoder features. This block-wise approach, along with the training methodology, allows us to model temporal dependencies within the features of the input block. For inference, sequential blocks are concatenated using an overlap-add procedure. We show that the performance of our model is competitive with regard to the state-of-the-art and the original sample, using objective metrics and a subjective listening test. We also present examples of the synthesis on a supplementary website and the source code via GitHub.
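The block-wise inference described in the abstract can be pictured with a small sketch: consecutive blocks of generated vocoder features are cross-faded and summed into one continuous feature sequence. The routine below is a minimal, illustrative overlap-add in NumPy, not the authors' implementation; the block length, hop size, and Hann cross-fade window are assumptions chosen for the example.

```python
# Illustrative only: minimal overlap-add of consecutive feature blocks.
# block_len, hop, and the Hann window are hypothetical choices, not the paper's settings.
import numpy as np

def overlap_add(blocks, hop):
    """Concatenate consecutive blocks of frame-wise features by overlap-add.

    blocks : list of arrays, each of shape (block_len, n_features)
    hop    : number of frames between the starts of consecutive blocks
    """
    block_len, n_feats = blocks[0].shape
    total_len = hop * (len(blocks) - 1) + block_len
    out = np.zeros((total_len, n_feats))
    norm = np.zeros((total_len, 1))
    window = np.hanning(block_len)[:, None]  # cross-fade weight over the overlapping region
    for i, block in enumerate(blocks):
        start = i * hop
        out[start:start + block_len] += window * block
        norm[start:start + block_len] += window
    # Normalize by the summed window weight so overlapping frames average correctly.
    return out / np.maximum(norm, 1e-8)

# Example: 10 blocks of 128 frames of 64-dimensional vocoder features, 50% overlap.
blocks = [np.random.randn(128, 64) for _ in range(10)]
features = overlap_add(blocks, hop=64)
```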
- dc.description.sponsorship This work is partially supported by the European Commission under the TROMPA project (H2020 770376). The TITAN X used for this research was donated by the NVIDIA Corporation.
- dc.format.mimetype application/pdf
- dc.identifier.citation Chandna P, Blaauw M, Bonada J, Gómez E. WGANSing: a multi-voice singing voice synthesizer based on the Wasserstein-GAN. In: EUSIPCO 2019. 27th European Signal Processing Conference; 2019 Sep 2-6; A Coruña, Spain. New Jersey: IEEE; 2019. [5 p.]. DOI: 10.23919/EUSIPCO.2019.8903099
- dc.identifier.doi http://dx.doi.org/10.23919/EUSIPCO.2019.8903099
- dc.identifier.issn 2076-1465
- dc.identifier.uri http://hdl.handle.net/10230/47389
- dc.language.iso eng
- dc.publisher Institute of Electrical and Electronics Engineers (IEEE)
- dc.relation.ispartof EUSIPCO 2019. 27th European Signal Processing Conference; 2019 Sep 2-6; A Coruña, Spain. New Jersey: IEEE; 2019. [5 p.]
- dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/770376
- dc.rights © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. http://dx.doi.org/10.23919/EUSIPCO.2019.8903099
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.subject.keyword Vocoders
- dc.subject.keyword Gallium nitride
- dc.subject.keyword Generators
- dc.subject.keyword Generative adversarial networks
- dc.subject.keyword Adaptation models
- dc.subject.keyword Acoustics
- dc.subject.keyword Training
- dc.title WGANSing: a multi-voice singing voice synthesizer based on the Wasserstein-GAN
- dc.type info:eu-repo/semantics/conferenceObject
- dc.type.version info:eu-repo/semantics/acceptedVersion