Using hierarchical information structure for prosody prediction in content-to-speech applications

Citation

Domínguez M, Farrús M, Burga A, Wanner L. Using hierarchical information structure for prosody prediction in content-to-speech applications. In: Proceedings of Speech Prosody 8; 2016 May 31 - Jun 3; Boston, United States. [Boston]: ISCA, 2016. p. 1019-23. DOI: 10.21437/SPEECHPROSODY.2016-209

Abstract
State-of-the-art prosody modelling in content-to-speech (CTS) applications still predicts intonation cues with the same methodology as text-to-speech (TTS) applications, namely by analysing the generated surface sentences with respect to part of speech, syntactic dependency relations and word order. On the other hand, several theoretical studies argue that morphology, syntax, and the information (or communicative) structure that organizes a given content (a semantic or deep-syntactic structure) according to the speaker's intention correlate strongly with intonation. However, little empirical work based on sufficiently large corpora has been carried out so far to support this argument. We present empirical evidence for the Information Structure–Prosody correlation using the Wall Street Journal Penn Treebank corpus as recorded by native American English speakers. Our experiments reach a prosody prediction accuracy of 80% using the hierarchical information structure of Meaning-Text Theory, compared to 59% for the baseline.
Description
Paper presented at Speech Prosody 8, 2016 May 31 - Jun 3; Boston, United States.
DOI
http://dx.doi.org/10.21437/SPEECHPROSODY.2016-209
Collections
Conferences (Department of Information and Communication Technologies)
Documents OpenAIRE (Open Access Infrastructure for Research in Europe)