A large Spanish-Catalan parallel corpus release for machine translation
- dc.contributor.author Costa-jussà, Marta R.
- dc.contributor.author Fonollosa, José A Rodriguez
- dc.contributor.author Mariño Acebal, José B.
- dc.contributor.author Poch, Marc
- dc.contributor.author Farrús, Mireia
- dc.date.accessioned 2016-05-11T12:46:29Z
- dc.date.available 2016-05-11T12:46:29Z
- dc.date.issued 2014
- dc.description.abstract We present a large Spanish-Catalan parallel corpus extracted from ten years of the paper edition of a bilingual Catalan newspaper. The produced corpus of 7.5M parallel sentences (around 180M words per language) is useful for many natural language applications. We report excellent results when building a statistical machine translation system trained on this parallel corpus. The Spanish-Catalan corpus is partially available via ELDA (Evaluations and Language Resources Distribution Agency) in catalog number ELRA-W0053.
- dc.description.sponsorship This work has been partially funded by the Seventh Framework Program of the European Commission through the International Outgoing Fellowship Marie Curie Action (IMTraP-2011-29951). The authors also want to thank the Universitat Politècnica de Catalunya for its support and permission to publish this research.
- dc.format.mimetype application/pdf
- dc.identifier.citation Costa-Jussa MR, Fonollosa JAR, Marino JB, Poch M, Farrus M. A large Spanish-Catalan parallel corpus release for machine translation. Computing and Informatics. 2014;33(4):907-20.
- dc.identifier.issn 1335-9150
- dc.identifier.uri http://hdl.handle.net/10230/26266
- dc.language.iso eng
- dc.publisher Institute of Informatics Slovak Academy of Sciences
- dc.relation.ispartof Computing and Informatics. 2014;33(4):907-20
- dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/29951
- dc.rights © Institute of Informatics Slovak Academy of Sciences
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.subject.keyword Catalan-Spanish parallel corpus
- dc.subject.keyword Machine translation
- dc.title A large Spanish-Catalan parallel corpus release for machine translation
- dc.type info:eu-repo/semantics/article
- dc.type.version info:eu-repo/semantics/publishedVersion