Translation quality regarding low-resource, custom machine translations: a fine-grained comparative study on Turkish-to-English statistical and neural machine translation systems

dc.contributor.author: Dogru, Gokhan
dc.date.accessioned: 2025-01-28T15:20:45Z
dc.date.available: 2025-01-28T15:20:45Z
dc.date.issued: 2022
dc.description.abstract: Corpus-based machine translation (MT) has been the main approach to developing and implementing MT systems in both academia and industry over the last three decades. In this field, the type and size of the corpus used to train MT engines have posed problems for both statistical MT (SMT) and neural MT (NMT) systems, the two dominant corpus-based approaches. Moreover, language pairs such as Turkish-English have been understudied within this framework. This article aims to evaluate the translation quality of Turkish-to-English custom MT systems trained on corpora of different sizes and types. Two NMT engines and two SMT engines were trained on the KantanMT platform using one of two training corpus types: either the domain-specific cardiology corpus alone, or this corpus plus a mixed-domain corpus. The study conducted both automatic evaluations with metrics including BLEU, F-Measure and TER, and a comprehensive human evaluation covering fluency, adequacy and an A/B test. Finally, the study carried out a separate, subjective terminology evaluation to investigate how differently the MT systems handle terminology, a crucial aspect of domain-specific text types such as cardiology. While the automatic evaluation results suggest that the SMT engines perform better than the NMT engines, all human evaluators rated the mixed-domain NMT engine as the best-performing one. However, the terminology evaluation showed that SMT still performed better and made fewer terminology errors, despite industry and academia shifting toward NMT engines.
dc.format.mimetype: application/pdf
dc.identifier.citation: Dogru G. Translation quality regarding low-resource, custom machine translations: a fine-grained comparative study on Turkish-to-English statistical and neural machine translation systems. İstanbul Üniversitesi Çeviribilim Dergisi - Istanbul University Journal of Translation Studies. 2022;17:95-115. DOI: 10.26650/iujts.2022.1182687
dc.identifier.doi: http://dx.doi.org/10.26650/iujts.2022.1182687
dc.identifier.issn: 2717-6959
dc.identifier.uri: http://hdl.handle.net/10230/69350
dc.language.iso: eng
dc.publisher: Istanbul University Press
dc.relation.ispartof: İstanbul Üniversitesi Çeviribilim Dergisi - Istanbul University Journal of Translation Studies. 2022;17:95-115
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject.keyword: Machine translation evaluation
dc.subject.keyword: Turkish-to-English machine translation
dc.subject.keyword: Medical translation
dc.subject.keyword: Neural machine translation
dc.subject.keyword: Statistical machine translation
dc.title: Translation quality regarding low-resource, custom machine translations: a fine-grained comparative study on Turkish-to-English statistical and neural machine translation systems
dc.type: info:eu-repo/semantics/article
dc.type.version: info:eu-repo/semantics/publishedVersion

Files

Name: Dogru_iuj_tran.pdf
Size: 648.73 KB
Format: Adobe Portable Document Format
