Translation quality regarding low-resource, custom machine translations: a fine-grained comparative study on Turkish-to-English statistical and neural machine translation systems

Dogru, Gokhan

dc.contributor.author Dogru, Gokhan
dc.date.accessioned 2025-01-28T15:20:45Z
dc.date.available 2025-01-28T15:20:45Z
dc.date.issued 2022
dc.description.abstract Corpus-based machine translation (MT) has been the main approach to developing and implementing MT systems in both academia and industry over the last three decades. In this field, the type and size of the training corpus have posed problems for both statistical MT (SMT) and neural MT (NMT) systems, the two dominant corpus-based approaches. Moreover, language pairs such as Turkish-English have been understudied within this framework. This article evaluates translation quality in Turkish-to-English custom MT systems trained on corpora of different sizes and types. Two NMT engines and two SMT engines were trained on the KantanMT platform using two training corpus types: a domain-specific cardiology corpus alone, or this corpus plus a mixed-domain corpus. The study conducted automatic evaluations with metrics including BLEU, F-Measure, and TER, as well as a comprehensive human evaluation with metrics including fluency, an A/B test, and adequacy. Lastly, the study carried out a separate, subjective terminology evaluation to investigate how differently the MT systems handle terminology, a crucial aspect of domain-specific text types such as cardiology. While the automatic evaluation results suggest that the SMT engines perform better than the NMT engines, all human evaluators rated the mixed-domain NMT engine as the highest performing one. However, the terminology evaluation showed that SMT can still perform better and make fewer terminology errors, despite industry and academia shifting toward NMT engines.
dc.format.mimetype application/pdf
dc.identifier.citation Dogru G. Translation quality regarding low-resource, custom machine translations: a fine-grained comparative study on Turkish-to-English statistical and neural machine translation systems. İstanbul Üniversitesi Çeviribilim Dergisi - Istanbul University Journal of Translation Studies. 2022;17:95-115. DOI: 10.26650/iujts.2022.1182687
dc.identifier.doi http://dx.doi.org/10.26650/iujts.2022.1182687
dc.identifier.issn 2717-6959
dc.identifier.uri http://hdl.handle.net/10230/69350
dc.language.iso eng
dc.publisher Istanbul University Press
dc.relation.ispartof İstanbul Üniversitesi Çeviribilim Dergisi - Istanbul University Journal of Translation Studies. 2022;17:95-115
dc.rights This work is licensed under Creative Commons Attribution-NonCommercial 4.0 International License
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.rights.uri http://creativecommons.org/licenses/by-nc/4.0/
dc.subject.keyword Machine translation evaluation
dc.subject.keyword Turkish-to-English machine translation
dc.subject.keyword Medical translation
dc.subject.keyword Neural machine translation
dc.subject.keyword Statistical machine translation
dc.title Translation quality regarding low-resource, custom machine translations: a fine-grained comparative study on Turkish-to-English statistical and neural machine translation systems
dc.type info:eu-repo/semantics/article
dc.type.version info:eu-repo/semantics/publishedVersion

Collections

Articles (Departament de Traducció i Ciències del Llenguatge)