A large Spanish-Catalan parallel corpus release for machine translation

dc.contributor.author: Costa-jussà, Marta R. [ca]
dc.contributor.author: Fonollosa, José A Rodriguez [ca]
dc.contributor.author: Mariño Acebal, José B. [ca]
dc.contributor.author: Poch, Marc [ca]
dc.contributor.author: Farrús, Mireia [ca]
dc.date.accessioned: 2016-05-11T12:46:29Z
dc.date.available: 2016-05-11T12:46:29Z
dc.date.issued: 2014 [ca]
dc.description.abstract: We present a large Spanish-Catalan parallel corpus extracted from ten years of the paper edition of a bilingual Catalan newspaper. The resulting corpus of 7.5M parallel sentences (around 180M words per language) is useful for many natural language applications. We report excellent results when building a statistical machine translation system trained on this parallel corpus. The Spanish-Catalan corpus is partially available via ELDA (Evaluations and Language Resources Distribution Agency) under catalog number ELRA-W0053. [en]
dc.description.sponsorship: This work has been partially funded by the Seventh Framework Program of the European Commission through the International Outgoing Fellowship Marie Curie Action (IMTraP-2011-29951). The authors also want to thank the Universitat Politècnica de Catalunya for its support and permission to publish this research. [en]
dc.format.mimetype: application/pdf [ca]
dc.identifier.citation: Costa-Jussa MR, Fonollosa JAR, Marino JB, Poch M, Farrus M. A large Spanish-Catalan parallel corpus release for machine translation. Computing and Informatics. 2014;33(4):907-20. [ca]
dc.identifier.issn: 1335-9150 [ca]
dc.identifier.uri: http://hdl.handle.net/10230/26266
dc.language.iso: eng [ca]
dc.publisher: Institute of Informatics Slovak Academy of Sciences [en]
dc.relation.ispartof: Computing and Informatics. 2014;33(4):907-20 [en]
dc.relation.projectID: info:eu-repo/grantAgreement/EC/FP7/29951
dc.rights: © Institute of Informatics Slovak Academy of Sciences [en]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [ca]
dc.subject.keyword: Catalan-Spanish parallel corpus [en]
dc.subject.keyword: Machine translation [en]
dc.title: A large Spanish-Catalan parallel corpus release for machine translation [ca]
dc.type: info:eu-repo/semantics/article [ca]
dc.type.version: info:eu-repo/semantics/publishedVersion [ca]

Files

Original bundle

Name: Poch_ci_lsc.pdf
Size: 1.1 MB
Format: Adobe Portable Document Format