Translation quality regarding low-resource, custom machine translations: a fine-grained comparative study on Turkish-to-English statistical and neural machine translation systems

  • dc.contributor.author Dogru, Gokhan
  • dc.date.accessioned 2025-01-28T15:20:45Z
  • dc.date.available 2025-01-28T15:20:45Z
  • dc.date.issued 2022
  • dc.description.abstract Corpus-based machine translation (MT) has been the main approach to developing and implementing MT systems in both academia and industry over the last three decades. In this field, the type and size of the corpus used for training MT engines have posed problems for both statistical MT (SMT) and neural MT (NMT) systems, the two dominant corpus-based approaches. Moreover, language pairs such as Turkish-English have been understudied within this framework. This article evaluates translation quality in Turkish-to-English custom MT systems trained on corpora of different sizes and types. Two NMT engines and two SMT engines were trained on the KantanMT platform using two training corpus types: either a domain-specific cardiology corpus alone or that corpus combined with a mixed-domain corpus. The study conducted automatic evaluations with metrics including BLEU, F-Measure and TER, as well as a comprehensive human evaluation covering fluency, adequacy, and an A/B test. Lastly, the study carried out a separate, subjective terminology evaluation to investigate how differently the MT systems handle terminology, a crucial aspect of domain-specific text types such as cardiology. While the automatic evaluation results suggest that the SMT engines perform better than the NMT engines, all human evaluators rated the mixed-domain NMT engine as the highest-performing one. However, the terminology evaluation showed that SMT can still perform better and make fewer terminology errors, despite the shift in industry and academia toward NMT engines.
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Dogru G. Translation quality regarding low-resource, custom machine translations: a fine-grained comparative study on Turkish-to-English statistical and neural machine translation systems. İstanbul Üniversitesi Çeviribilim Dergisi - Istanbul University Journal of Translation Studies. 2022;17:95-115. DOI: 10.26650/iujts.2022.1182687
  • dc.identifier.doi http://dx.doi.org/10.26650/iujts.2022.1182687
  • dc.identifier.issn 2717-6959
  • dc.identifier.uri http://hdl.handle.net/10230/69350
  • dc.language.iso eng
  • dc.publisher Istanbul University Press
  • dc.relation.ispartof İstanbul Üniversitesi Çeviribilim Dergisi - Istanbul University Journal of Translation Studies. 2022;17:95-115
  • dc.rights This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri http://creativecommons.org/licenses/by-nc/4.0/
  • dc.subject.keyword Machine translation evaluation
  • dc.subject.keyword Turkish-to-English machine translation
  • dc.subject.keyword Medical translation
  • dc.subject.keyword Neural machine translation
  • dc.subject.keyword Statistical machine translation
  • dc.title Translation quality regarding low-resource, custom machine translations: a fine-grained comparative study on Turkish-to-English statistical and neural machine translation systems
  • dc.type info:eu-repo/semantics/article
  • dc.type.version info:eu-repo/semantics/publishedVersion