Welcome to the UPF Digital Repository

Attentional parallel RNNs for generating punctuation in transcribed speech

dc.contributor.author Öktem, Alp
dc.contributor.author Farrús, Mireia
dc.contributor.author Wanner, Leo
dc.date.accessioned 2018-02-20T11:30:28Z
dc.date.available 2018-02-20T11:30:28Z
dc.date.issued 2017
dc.identifier.citation Öktem A, Farrús M, Wanner L. Attentional parallel RNNs for generating punctuation in transcribed speech. In: Camelin N, Estève Y, Martín-Vide C. Statistical Language and Speech Processing. 5th International Conference SLSP 2017; 2017 Oct 23-25; Le Mans, France. Cham: Springer, 2017. p. 131-42. (LNCS; no. 10583). DOI: 10.1007/978-3-319-68456-7_11
dc.identifier.issn 0302-9743
dc.identifier.uri http://hdl.handle.net/10230/33936
dc.description.abstract Until very recently, the generation of punctuation marks for automatic speech recognition (ASR) output has mostly been done by looking at the syntactic structure of the recognized utterances. Prosodic cues such as breaks, speech rate, and pitch intonation, which influence the placement of punctuation marks in speech transcripts, have seldom been used. We propose a method that uses recurrent neural networks, taking prosodic and lexical information into account in order to predict punctuation marks for raw ASR output. Our experiments show that an attention mechanism over parallel sequences of prosodic cues aligned with transcribed speech improves the accuracy of punctuation generation.
dc.description.sponsorship We would like to thank Francesco Barbieri for offering his technical insights throughout this work. This work is part of the KRISTINA project, which has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Grant Agreement number H2020-RIA-645012. The second author is partially funded by the Spanish Ministry of Economy, Industry and Competitiveness through the Ramón y Cajal program.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher Springer
dc.relation.ispartof Camelin N, Estève Y, Martín-Vide C. Statistical Language and Speech Processing. 5th International Conference SLSP 2017; 2017 Oct 23-25; Le Mans, France. Cham: Springer, 2017. p. 131-42. (LNCS; no. 10583).
dc.relation.isreferencedby http://hdl.handle.net/10230/33981
dc.relation.isreferencedby http://hdl.handle.net/10230/33982
dc.rights © Springer. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-68456-7_11
dc.title Attentional parallel RNNs for generating punctuation in transcribed speech
dc.type info:eu-repo/semantics/conferenceObject
dc.identifier.doi http://dx.doi.org/10.1007/978-3-319-68456-7_11
dc.subject.keyword Speech transcription
dc.subject.keyword Recurrent neural networks
dc.subject.keyword Prosody
dc.subject.keyword Punctuation generation
dc.subject.keyword Automatic speech recognition
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/645012
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.type.version info:eu-repo/semantics/acceptedVersion
