A large Spanish-Catalan parallel corpus release for machine translation

  • dc.contributor.author Costa-jussà, Marta R.
  • dc.contributor.author Fonollosa, José A Rodriguez
  • dc.contributor.author Mariño Acebal, José B.
  • dc.contributor.author Poch, Marc
  • dc.contributor.author Farrús, Mireia
  • dc.date.accessioned 2016-05-11T12:46:29Z
  • dc.date.available 2016-05-11T12:46:29Z
  • dc.date.issued 2014
  • dc.description.abstract We present a large Spanish-Catalan parallel corpus extracted from ten years of the paper edition of a bilingual Catalan newspaper. The produced corpus of 7.5M parallel sentences (around 180M words per language) is useful for many natural language applications. We report excellent results when building a statistical machine translation system trained on this parallel corpus. The Spanish-Catalan corpus is partially available via ELDA (Evaluations and Language Resources Distribution Agency) in catalog number ELRA-W0053.
  • dc.description.sponsorship This work has been partially funded by the Seventh Framework Program of the European Commission through the International Outgoing Fellowship Marie Curie Action (IMTraP-2011-29951). The authors also want to thank the Universitat Politècnica de Catalunya for its support and permission to publish this research.
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Costa-Jussa MR, Fonollosa JAR, Marino JB, Poch M, Farrus M. A large Spanish-Catalan parallel corpus release for machine translation. Computing and Informatics. 2014;33(4):907-20.
  • dc.identifier.issn 1335-9150
  • dc.identifier.uri http://hdl.handle.net/10230/26266
  • dc.language.iso eng
  • dc.publisher Institute of Informatics, Slovak Academy of Sciences
  • dc.relation.ispartof Computing and Informatics. 2014;33(4):907-20
  • dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/29951
  • dc.rights © Institute of Informatics, Slovak Academy of Sciences
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.subject.keyword Catalan-Spanish parallel corpus
  • dc.subject.keyword Machine translation
  • dc.title A large Spanish-Catalan parallel corpus release for machine translation
  • dc.type info:eu-repo/semantics/article
  • dc.type.version info:eu-repo/semantics/publishedVersion