Welcome to the UPF Digital Repository

Integrating lexical and prosodic features for automatic paragraph segmentation

dc.contributor.author Lai, Catherine
dc.contributor.author Farrús, Mireia
dc.contributor.author Moore, Johanna D.
dc.date.accessioned 2020-05-26T09:26:51Z
dc.date.available 2020-05-26T09:26:51Z
dc.date.issued 2020
dc.identifier.citation Lai C, Farrús M, Moore JD. Integrating lexical and prosodic features for automatic paragraph segmentation. Speech Commun. 2020 Aug;121:44-57. DOI: 10.1016/j.specom.2020.04.007
dc.identifier.issn 0167-6393
dc.identifier.uri http://hdl.handle.net/10230/44818
dc.description.abstract Spoken documents, such as podcasts or lectures, are a growing presence in everyday life. Being able to automatically identify their discourse structure is an important step to understanding what a spoken document is about. Moreover, finer-grained units, such as paragraphs, are highly desirable for presenting and analyzing spoken content. However, little work has been done on discourse-based speech segmentation below the level of broad topics. In order to examine how discourse transitions are cued in speech, we investigate automatic paragraph segmentation of TED talks using lexical and prosodic features. Experiments using Support Vector Machines, AdaBoost, and Neural Networks show that models using supra-sentential prosodic features and induced cue words perform better than those based on the type of lexical cohesion measures often used in broad topic segmentation. Moreover, combining a wide range of individually weak lexical and prosodic predictors improves performance, and modelling contextual information using recurrent neural networks outperforms other approaches by a large margin. Our best results come from using late fusion methods that integrate representations generated by separate lexical and prosodic models while allowing interactions between these feature streams rather than treating them as independent information sources. Application to ASR outputs shows that adding prosodic features, particularly using late fusion, can significantly ameliorate decreases in performance due to transcription errors.
dc.description.sponsorship The second author was funded by the EU’s Horizon 2020 Research and Innovation Programme under GA H2020-RIA-645012 and by the Spanish Ministry of Economy and Competitiveness Juan de la Cierva program. The other authors were funded by the University of Edinburgh.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher Elsevier
dc.relation.ispartof Speech Communication. 2020 Aug;121:44-57.
dc.rights © Elsevier https://doi.org/10.1016/j.specom.2020.04.007
dc.title Integrating lexical and prosodic features for automatic paragraph segmentation
dc.type info:eu-repo/semantics/article
dc.identifier.doi http://dx.doi.org/10.1016/j.specom.2020.04.007
dc.subject.keyword Discourse structure
dc.subject.keyword Paragraph segmentation
dc.subject.keyword Prosody
dc.subject.keyword Spoken language understanding
dc.subject.keyword Coherence
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/645012
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.type.version info:eu-repo/semantics/acceptedVersion
