Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair

dc.contributor.author: Farrús, Mireia [ca]
dc.contributor.author: Costa-jussà, Marta R. [ca]
dc.contributor.author: Mariño Acebal, José B. [ca]
dc.contributor.author: Poch, Marc [ca]
dc.contributor.author: Hernández, Adolfo [ca]
dc.contributor.author: Henríquez, Carlos [ca]
dc.contributor.author: Fonollosa, José A Rodriguez [ca]
dc.date.accessioned: 2017-09-04T12:49:24Z
dc.date.available: 2017-09-04T12:49:24Z
dc.date.issued: 2011
dc.description.abstract: This work aims to improve an N-gram-based statistical machine translation system between Catalan and Spanish, trained on an aligned Spanish–Catalan parallel corpus of 1.7 million sentences taken from the newspaper El Periódico. Starting from a linguistic error analysis of this baseline system, orthographic, morphological, lexical, semantic and syntactic problems are addressed using a set of techniques. The proposed solutions include the development and application of additional statistical techniques, text pre- and post-processing tasks, and rules based on grammatical categories, as well as lexical categorization. Both human and automatic evaluations show a clear improvement over the baseline, with a gain of about 1.1 BLEU points in the Spanish-to-Catalan direction and about 0.5 BLEU points in the reverse direction. The final system is freely available online as a linguistic resource. [en]
dc.description.sponsorship: This work has been partially funded by the Spanish Department of Science and Innovation through the Juan de la Cierva fellowship program and by the Spanish Government under the BUCEADOR project (TEC2009-14094-C04-01). [en]
dc.format.mimetype: application/pdf
dc.identifier.citation: Farrús M, Costa-Jussà MR, Mariño JB, Poch M, Hernández A, Henríquez C, Fonollosa JAR. Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair. Lang Resour Eval. 2011;45(2):181-208. DOI: 10.1007/s10579-011-9137-0
dc.identifier.doi: http://dx.doi.org/10.1007/s10579-011-9137-0
dc.identifier.issn: 1574-020X
dc.identifier.uri: http://hdl.handle.net/10230/32733
dc.language.iso: eng
dc.publisher: Springer [ca]
dc.relation.ispartof: Language resources and evaluation. 2011;45(2):181-208.
dc.relation.projectID: info:eu-repo/grantAgreement/ES/3PN/TEC2009-14094-C04-01
dc.rights: © Springer. The final publication is available at Springer via http://dx.doi.org/10.1007/s10579-011-9137-0
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.subject.keyword: Statistical machine translation [en]
dc.subject.keyword: N-gram-based translation [en]
dc.subject.keyword: Linguistic knowledge [en]
dc.subject.keyword: Grammatical categories [en]
dc.title: Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair [ca]
dc.type: info:eu-repo/semantics/article
dc.type.version: info:eu-repo/semantics/acceptedVersion

Files

Original bundle

Name: farrus_LRE45_over.pdf
Size: 297.95 KB
Format: Adobe Portable Document Format