Welcome to the UPF Digital Repository

Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair

dc.contributor.author Farrús, Mireia
dc.contributor.author Costa-jussà, Marta R.
dc.contributor.author Mariño Acebal, José B.
dc.contributor.author Poch, Marc
dc.contributor.author Hernández, Adolfo
dc.contributor.author Henríquez, Carlos
dc.contributor.author Fonollosa, José A Rodriguez
dc.date.accessioned 2017-09-04T12:49:24Z
dc.date.available 2017-09-04T12:49:24Z
dc.date.issued 2011
dc.identifier.citation Farrús M, Costa-Jussà MR, Mariño JB, Poch M, Hernández A, Henríquez C, Fonollosa JAR. Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair. Lang Resour Eval. 2011;45(2):181-208. DOI: 10.1007/s10579-011-9137-0
dc.identifier.issn 1574-020X
dc.identifier.uri http://hdl.handle.net/10230/32733
dc.description.abstract This work aims to improve an N-gram-based statistical machine translation system between the Catalan and Spanish languages, trained with an aligned Spanish-Catalan parallel corpus consisting of 1.7 million sentences taken from El Periódico newspaper. Starting from a linguistic error analysis of this baseline system, orthographic, morphological, lexical, semantic and syntactic problems are addressed using a set of techniques. The proposed solutions include the development and application of additional statistical techniques, text pre- and post-processing tasks, and rules based on the use of grammatical categories, as well as lexical categorization. The performance of the improved system clearly increases, as shown in both human and automatic evaluations of the system, with a gain of about 1.1 BLEU points observed in the Spanish-to-Catalan direction of translation, and a gain of about 0.5 points in the reverse direction. The final system is freely available online as a linguistic resource.
dc.description.sponsorship This work has been partially funded by the Spanish Department of Science and Innovation through the Juan de la Cierva fellowship program and the Spanish Government under the BUCEADOR project (TEC2009-14094-C04-01).
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher Springer
dc.relation.ispartof Language resources and evaluation. 2011;45(2):181-208.
dc.rights © Springer. The final publication is available at Springer via http://dx.doi.org/10.1007/s10579-011-9137-0
dc.title Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan–Spanish language pair
dc.type info:eu-repo/semantics/article
dc.identifier.doi http://dx.doi.org/10.1007/s10579-011-9137-0
dc.subject.keyword Statistical machine translation
dc.subject.keyword N-gram-based translation
dc.subject.keyword Linguistic knowledge
dc.subject.keyword Grammatical categories
dc.relation.projectID info:eu-repo/grantAgreement/ES/3PN/TEC2009-14094-C04-01
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.type.version info:eu-repo/semantics/acceptedVersion

