Authors: Öktem, Alp; Farrús, Mireia; Wanner, Leo
Title: Attentional parallel RNNs for generating punctuation in transcribed speech
Date issued: 2017
Date available: 2018-02-20
Citation: Öktem A, Farrús M, Wanner L. Attentional parallel RNNs for generating punctuation in transcribed speech. In: Camelin N, Estève Y, Martín-Vide C, editors. Statistical Language and Speech Processing. 5th International Conference, SLSP 2017; 2017 Oct 23-25; Le Mans, France. Cham: Springer; 2017. p. 131-42. (LNCS; no. 10583). DOI: 10.1007/978-3-319-68456-7_11
ISSN: 0302-9743
DOI: http://dx.doi.org/10.1007/978-3-319-68456-7_11
Handle: http://hdl.handle.net/10230/33936
Abstract: Until very recently, punctuation marks for automatic speech recognition (ASR) output have been generated mostly by looking at the syntactic structure of the recognized utterances. Prosodic cues such as breaks, speech rate, and pitch intonation, which influence the placement of punctuation marks in speech transcripts, have seldom been used. We propose a method that uses recurrent neural networks, taking both prosodic and lexical information into account, to predict punctuation marks for raw ASR output. Our experiments show that an attention mechanism over parallel sequences of prosodic cues aligned with the transcribed speech improves the accuracy of punctuation generation.
Format: application/pdf
Language: eng
Rights: © Springer. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-68456-7_11
Type: info:eu-repo/semantics/conferenceObject
Keywords: Speech transcription; Recurrent neural networks; Prosody; Punctuation generation; Automatic speech recognition
Access: info:eu-repo/semantics/openAccess
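The abstract describes attending over a prosodic sequence aligned with the lexical one. As a minimal illustrative sketch (not the authors' implementation, and with purely hypothetical feature values), dot-product attention over parallel prosodic hidden states could look like this: a lexical query scores each aligned prosodic state, the scores are softmax-normalised, and the weighted sum forms a context vector used when predicting the punctuation mark.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, prosody_states):
    """Attention over the parallel prosodic sequence: score each
    prosodic hidden state against the lexical query, normalise the
    scores, and return (weights, weighted context vector)."""
    scores = [dot(query, h) for h in prosody_states]
    weights = softmax(scores)
    dim = len(prosody_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, prosody_states))
               for d in range(dim)]
    return weights, context

# Toy example: one lexical query vector and three aligned prosodic
# hidden states (standing in for encoded pause/pitch/rate features;
# all numbers are made up for illustration).
query = [1.0, 0.0]
prosody = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend(query, prosody)
```

In an actual model the query and states would come from trained RNNs over words and prosodic features, and the context vector would feed a classifier over punctuation symbols; this sketch only shows the attention step itself.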