Bilingual prosodic dataset compilation for spoken language translation
- dc.contributor.author Öktem, Alp
- dc.contributor.author Farrús, Mireia
- dc.contributor.author Bonafonte Cávez, Antonio
- dc.date.accessioned 2018-10-11T10:19:22Z
- dc.date.available 2018-10-11T10:19:22Z
- dc.date.issued 2018
- dc.description Paper presented at: IberSpeech 2018, held from 21 to 23 November 2018 in Barcelona.
- dc.description.abstract This paper builds on a previous methodology that exploits dubbed media material to build prosodically annotated bilingual corpora. The almost fully automated process serves to build data for training spoken language models without the need to design and record bilingual data. The methodology is put into use by compiling an English-Spanish parallel corpus using a recent TV series. The collected corpus contains 7000 parallel utterances totaling about 10 hours of data annotated with speaker information, word alignments and word-level acoustic features. Both the extraction scripts and the dataset are distributed open-source for research purposes.
- dc.description.sponsorship The annotation work carried out by the annotators was financed with the 2018 Maria de Maeztu Reproducibility Award from the Department of Information and Communication Technologies of Universitat Pompeu Fabra, received by the first author. The second author is funded by the Spanish Ministry through the Ramón y Cajal program.
- dc.format.mimetype application/pdf
- dc.identifier.citation Öktem A, Farrús M, Bonafonte A. Bilingual prosodic dataset compilation for spoken language translation. In: IberSpeech 2018; 2018 Nov 21-23; Barcelona, Spain. Baixas, France: ISCA; 2018. p. 20-4. DOI: 10.21437/IberSPEECH.2018-5
- dc.identifier.doi http://dx.doi.org/10.21437/IberSPEECH.2018-5
- dc.identifier.uri http://hdl.handle.net/10230/35600
- dc.language.iso eng
- dc.publisher International Speech Communication Association (ISCA)
- dc.relation.ispartof IberSpeech 2018; 2018 Nov 21-23; Barcelona, Spain. Baixas, France: ISCA; 2018. p. 20-4.
- dc.relation.isreferencedby http://hdl.handle.net/10230/35572
- dc.rights © 2018 ISCA
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.subject.keyword Bilingual corpora
- dc.subject.keyword Spoken machine translation
- dc.subject.keyword Prosody
- dc.title Bilingual prosodic dataset compilation for spoken language translation
- dc.type info:eu-repo/semantics/conferenceObject
- dc.type.version info:eu-repo/semantics/publishedVersion