Title: Bilingual prosodic dataset compilation for spoken language translation
Authors: Öktem, Alp; Farrús, Mireia; Bonafonte Cávez, Antonio
Date accessioned: 2018-10-11
Date available: 2018-10-11
Date issued: 2018
Citation: Öktem A, Farrús M, Bonafonte A. Bilingual prosodic dataset compilation for spoken language translation. In: IberSpeech 2018; 2018 Nov 21-23; Barcelona, Spain. Baixas, France: ISCA; 2018. p. 20-4. DOI: 10.21437/IberSPEECH.2018-5
Handle: http://hdl.handle.net/10230/35600
Note: Paper presented at IberSpeech 2018, held from 21 to 23 November 2018 in Barcelona.
Abstract: This paper builds on a previous methodology that exploits dubbed media material to build prosodically annotated bilingual corpora. The almost fully automated process serves to build data for training spoken language models without the need to design and record bilingual data. The methodology is put to use by compiling an English-Spanish parallel corpus from a recent TV series. The collected corpus contains 7000 parallel utterances, totaling about 10 hours of data, annotated with speaker information, word alignments and word-level acoustic features. Both the extraction scripts and the dataset are distributed as open source for research purposes.
Format: application/pdf
Language: eng
Rights: © 2018 ISCA
Type: info:eu-repo/semantics/conferenceObject
Published version: http://dx.doi.org/10.21437/IberSPEECH.2018-5
Subjects: Bilingual corpora; Spoken machine translation; Prosody
Access rights: info:eu-repo/semantics/openAccess