Modeling and transforming speech using variational autoencoders

  • dc.contributor.author Blaauw, Merlijn
  • dc.contributor.author Bonada, Jordi, 1973-
  • dc.date.accessioned 2017-05-29T10:07:33Z
  • dc.date.available 2017-05-29T10:07:33Z
  • dc.date.issued 2016
  • dc.description Paper presented at Interspeech 2016, held in San Francisco (California, USA) on 8-12 September 2016 and organized by the International Speech Communication Association (ISCA).
  • dc.description.abstract Latent generative models can learn higher-level underlying factors from complex data in an unsupervised manner. Such models can be used in a wide range of speech processing applications, including synthesis, transformation and classification. While there have been many advances in this field in recent years, the application of the resulting models to speech processing tasks is generally not explicitly considered. In this paper we apply the variational autoencoder (VAE) to the task of modeling frame-wise spectral envelopes. The VAE model has many attractive properties such as continuous latent variables, prior probability over these latent variables, a tractable lower bound on the marginal log likelihood, both generative and recognition models, and end-to-end training of deep models. We consider different aspects of training such models for speech data and compare them to more conventional models such as the Restricted Boltzmann Machine (RBM). While evaluating generative models is difficult, we try to obtain a balanced picture by considering both performance in terms of reconstruction error and when applying the model to a series of modeling and transformation tasks to get an idea of the quality of the learned features.
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Blaauw M, Bonada J. Modeling and transforming speech using variational autoencoders. In: Morgan N, editor. Interspeech 2016; 2016 Sep 8-12; San Francisco, CA. [place unknown]: ISCA; 2016. p. 1770-4. DOI: 10.21437/Interspeech.2016-1183
  • dc.identifier.doi http://dx.doi.org/10.21437/Interspeech.2016-1183
  • dc.identifier.uri http://hdl.handle.net/10230/32189
  • dc.language.iso eng
  • dc.publisher International Speech Communication Association (ISCA)
  • dc.relation.ispartof Morgan N, editor. Interspeech 2016; 2016 Sep 8-12; San Francisco, CA. [place unknown]: ISCA; 2016. p. 1770-4.
  • dc.rights Copyright © 2016 ISCA
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.subject.keyword Generative models
  • dc.subject.keyword Variational autoencoder
  • dc.subject.keyword Acoustic modeling
  • dc.subject.keyword Deep learning
  • dc.title Modeling and transforming speech using variational autoencoders
  • dc.type info:eu-repo/semantics/conferenceObject
  • dc.type.version info:eu-repo/semantics/publishedVersion
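
The abstract mentions a tractable lower bound on the marginal log likelihood and continuous latent variables with a prior. As a rough illustration of what that bound looks like, here is a minimal NumPy sketch of a single-sample estimate for a VAE with a diagonal Gaussian recognition model and a standard normal prior. This record does not describe the paper's architecture or likelihood, so the unit-variance Gaussian likelihood, the `decode` callable, and all function names below are assumptions made for illustration only.

```python
import numpy as np


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps


def gaussian_log_likelihood(x, x_mean, log_var=0.0):
    """log N(x; x_mean, exp(log_var) * I), summed over feature dims."""
    sq_err = (x - x_mean) ** 2 / np.exp(log_var)
    return -0.5 * np.sum(np.log(2.0 * np.pi) + log_var + sq_err, axis=-1)


def elbo_estimate(x, mu, log_var, decode, rng):
    """Single-sample estimate of the lower bound:
    E_q[log p(x|z)] - KL(q(z|x) || p(z)).

    `decode` is a hypothetical callable mapping a latent vector z to the
    mean of p(x|z); `mu` and `log_var` stand in for the encoder outputs.
    """
    z = reparameterize(mu, log_var, rng)
    return gaussian_log_likelihood(x, decode(z)) - gaussian_kl(mu, log_var)
```

In practice `mu`, `log_var`, and `decode` would come from neural networks trained jointly by maximizing this estimate over spectral-envelope frames; here they are left as plain arrays and a callable so the bound itself stays visible.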