dc.contributor.author | Öktem, Alp |
dc.contributor.author | Farrús, Mireia |
dc.contributor.author | Bonafonte Cávez, Antonio |
dc.date.accessioned | 2018-10-11T10:19:22Z |
dc.date.available | 2018-10-11T10:19:22Z |
dc.date.issued | 2018 |
dc.identifier.citation | Öktem A, Farrús M, Bonafonte A. Bilingual prosodic dataset compilation for spoken language translation. In: IberSpeech 2018; 2018 Nov 21-23; Barcelona, Spain. Baixas, France: ISCA; 2018. p. 20-4. DOI: 10.21437/IberSPEECH.2018-5 |
dc.identifier.uri | http://hdl.handle.net/10230/35600 |
dc.description | Paper presented at: IberSpeech 2018, held 21-23 November 2018 in Barcelona. |
dc.description.abstract | This paper builds on a previous methodology that exploits dubbed media material to build prosodically annotated bilingual corpora. The almost fully automated process produces data for training spoken language models without the need to design and record bilingual data. The methodology is put to use by compiling an English-Spanish parallel corpus from a recent TV series. The collected corpus contains 7000 parallel utterances totaling about 10 hours of data, annotated with speaker information, word alignments and word-level acoustic features. Both the extraction scripts and the dataset are distributed as open source for research purposes. |
dc.description.sponsorship | The annotation work carried out by the annotators was financed with the 2018 Maria de Maeztu Reproducibility Award from the Department of Information and Communication Technologies of Universitat Pompeu Fabra, received by the first author. The second author is funded by the Spanish Ministry through the Ramón y Cajal program. |
dc.format.mimetype | application/pdf |
dc.language.iso | eng |
dc.publisher | International Speech Communication Association (ISCA) |
dc.relation.ispartof | IberSpeech 2018; 2018 Nov 21-23; Barcelona, Spain. Baixas, France: ISCA; 2018. p. 20-4. |
dc.relation.isreferencedby | http://hdl.handle.net/10230/35572 |
dc.rights | © 2018 ISCA |
dc.title | Bilingual prosodic dataset compilation for spoken language translation |
dc.type | info:eu-repo/semantics/conferenceObject |
dc.identifier.doi | http://dx.doi.org/10.21437/IberSPEECH.2018-5 |
dc.subject.keyword | Bilingual corpora |
dc.subject.keyword | Spoken machine translation |
dc.subject.keyword | Prosody |
dc.rights.accessRights | info:eu-repo/semantics/openAccess |
dc.type.version | info:eu-repo/semantics/publishedVersion |