A large Spanish-Catalan parallel corpus release for machine translation
| dc.contributor.author | Costa-jussà, Marta R. | ca |
| dc.contributor.author | Fonollosa, José A Rodriguez | ca |
| dc.contributor.author | Mariño Acebal, José B. | ca |
| dc.contributor.author | Poch, Marc | ca |
| dc.contributor.author | Farrús, Mireia | ca |
| dc.date.accessioned | 2016-05-11T12:46:29Z | |
| dc.date.available | 2016-05-11T12:46:29Z | |
| dc.date.issued | 2014 | ca |
| dc.description.abstract | We present a large Spanish-Catalan parallel corpus extracted from ten years of the paper edition of a bilingual Catalan newspaper. The produced corpus of 7.5M parallel sentences (around 180M words per language) is useful for many natural language applications. We report excellent results when building a statistical machine translation system trained on this parallel corpus. The Spanish-Catalan corpus is partially available via ELDA (Evaluations and Language Resources Distribution Agency) in catalog number ELRA-W0053. | en |
| dc.description.sponsorship | This work has been partially funded by the Seventh Framework Program of the European Commission through the International Outgoing Fellowship Marie Curie Action (IMTraP-2011-29951). The authors also want to thank the Universitat Politècnica de Catalunya for its support and permission to publish this research. | en |
| dc.format.mimetype | application/pdf | ca |
| dc.identifier.citation | Costa-Jussa MR, Fonollosa JAR, Marino JB, Poch M, Farrus M. A large Spanish-Catalan parallel corpus release for machine translation. Computing and Informatics. 2014;33(4):907-20. | ca |
| dc.identifier.issn | 1335-9150 | ca |
| dc.identifier.uri | http://hdl.handle.net/10230/26266 | |
| dc.language.iso | eng | ca |
| dc.publisher | Institute of Informatics Slovak Academy of Sciences | en |
| dc.relation.ispartof | Computing and Informatics. 2014;33(4):907-20 | en |
| dc.relation.projectID | info:eu-repo/grantAgreement/EC/FP7/29951 | |
| dc.rights | © Institute of Informatics Slovak Academy of Sciences | en |
| dc.rights.accessRights | info:eu-repo/semantics/openAccess | ca |
| dc.subject.keyword | Catalan-Spanish parallel corpus | en |
| dc.subject.keyword | Machine translation | en |
| dc.title | A large Spanish-Catalan parallel corpus release for machine translation | ca |
| dc.type | info:eu-repo/semantics/article | ca |
| dc.type.version | info:eu-repo/semantics/publishedVersion | ca |
