Intonation is traditionally considered the most important prosodic feature, and considerable research effort has therefore been devoted to the automatic segmentation and labeling of speech samples to capture intonation cues. A number of studies also show that incorporating duration or intensity further improves automatic prosody labeling. However, combinations of word-level acoustic features still yield poor results when machine learning techniques are applied to annotated corpora to derive intonation for speech synthesis applications. To address this problem, we present an experimental setup for developing a hierarchical prosodic structure model that combines linguistic features, including information structure, with three acoustic cues (intensity, pitch, and duration). We show empirically that this combination yields a considerably more accurate representation of prosody and, consequently, a more reliable automatic labeling of speech corpora for machine learning.
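The feature combination described above can be illustrated with a minimal sketch: word-level acoustic measurements (pitch, intensity, duration) joined with a binary information-structure feature, fed to a simple classifier. All feature names, toy values, and labels here are illustrative assumptions, not the paper's actual corpus, features, or model; a hand-rolled 1-nearest-neighbour rule stands in for the machine learning component.

```python
import math

# Each training row: (pitch_mean_hz, intensity_db, duration_ms, is_focus)
# is_focus is an assumed binary information-structure feature.
# Labels are toy prosodic tags, not a real annotation scheme.
TRAIN = [
    ((220.0, 68.0, 310.0, 1.0), "accented"),
    ((180.0, 60.0, 140.0, 0.0), "unaccented"),
    ((240.0, 70.0, 350.0, 1.0), "accented"),
    ((175.0, 58.0, 120.0, 0.0), "unaccented"),
]

def predict(features):
    """Label a word by its nearest training example (Euclidean distance)."""
    _, label = min(
        ((math.dist(features, x), lab) for x, lab in TRAIN),
        key=lambda pair: pair[0],
    )
    return label

print(predict((230.0, 69.0, 330.0, 1.0)))  # nearest neighbours are accented words
```

In practice the acoustic features would be extracted from forced-aligned speech and the learner would be trained on a prosodically annotated corpus; the sketch only shows how heterogeneous word-level features are combined into a single feature vector for classification.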