Attentional parallel RNNs for generating punctuation in transcribed speech

dc.contributor.author: Öktem, Alp [ca]
dc.contributor.author: Farrús, Mireia [ca]
dc.contributor.author: Wanner, Leo [ca]
dc.date.accessioned: 2018-02-20T11:30:28Z
dc.date.available: 2018-02-20T11:30:28Z
dc.date.issued: 2017
dc.description.abstract: Until very recently, the generation of punctuation marks for automatic speech recognition (ASR) output has been mostly done by looking at the syntactic structure of the recognized utterances. Prosodic cues such as breaks, speech rate, pitch intonation that influence placing of punctuation marks on speech transcripts have been seldom used. We propose a method that uses recurrent neural networks, taking prosodic and lexical information into account in order to predict punctuation marks for raw ASR output. Our experiments show that an attention mechanism over parallel sequences of prosodic cues aligned with transcribed speech improves accuracy of punctuation generation.
dc.description.sponsorship: We would like to thank Francesco Barbieri for offering his technical insights throughout this work. This work is part of the KRISTINA project, which has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Grant Agreement number H2020-RIA-645012. The second author is partially funded by the Spanish Ministry of Economy, Industry and Competitiveness through the Ramón y Cajal program.
dc.format.mimetype: application/pdf
dc.identifier.citation: Öktem A, Farrús M, Wanner L. Attentional parallel RNNs for generating punctuation in transcribed speech. In: Camelin N, Estève Y, Martín-Vide C. Statistical Language and Speech Processing. 5th International Conference SLSP 2017; 2017 Oct 23-25; Le Mans, France. Cham: Springer, 2017. p. 131-42. (LNCS; no. 10583). DOI: 10.1007/978-3-319-68456-7_11
dc.identifier.doi: http://dx.doi.org/10.1007/978-3-319-68456-7_11
dc.identifier.issn: 0302-9743
dc.identifier.uri: http://hdl.handle.net/10230/33936
dc.language.iso: eng
dc.publisher: Springer [ca]
dc.relation.ispartof: Camelin N, Estève Y, Martín-Vide C. Statistical Language and Speech Processing. 5th International Conference SLSP 2017; 2017 Oct 23-25; Le Mans, France. Cham: Springer, 2017. p. 131-42. (LNCS; no. 10583).
dc.relation.isreferencedby: http://hdl.handle.net/10230/33981
dc.relation.isreferencedby: http://hdl.handle.net/10230/33982
dc.relation.projectID: info:eu-repo/grantAgreement/EC/H2020/645012
dc.rights: © Springer The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-68456-7_11
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.subject.keyword: Speech transcription
dc.subject.keyword: Recurrent neural networks
dc.subject.keyword: Prosody
dc.subject.keyword: Punctuation generation
dc.subject.keyword: Automatic speech recognition
dc.title: Attentional parallel RNNs for generating punctuation in transcribed speech [ca]
dc.type: info:eu-repo/semantics/conferenceObject
dc.type.version: info:eu-repo/semantics/acceptedVersion

Files

Original bundle

Name: oktem_lncs_attentional.pdf
Size: 279.49 KB
Format: Adobe Portable Document Format