Punctuation marks support understandability and readability in written language. In spoken language, punctuation of the transcribed speech is influenced by two phenomena: (1) syntax and (2) prosody. We present a software architecture that makes it possible to train punctuation restoration models from any combination of lexical, morphosyntactic, prosodic and acoustic features. Architecture is language independent and feeds on word-segmented data. A dataset compiled from English TED talks is given ...
Punctuation marks support understandability and readability in written language. In spoken language, punctuation of the transcribed speech is influenced by two phenomena: (1) syntax and (2) prosody. We present a software architecture that makes it possible to train punctuation restoration models from any combination of lexical, morphosyntactic, prosodic and acoustic features. Architecture is language independent and feeds on word-segmented data. A dataset compiled from English TED talks is given in http://hdl.handle.net/10230/33981
+