Statistical machine translation customization between Turkish and 11 languages

  • dc.contributor.author Dogru, Gokhan
  • dc.date.accessioned 2025-01-28T15:21:01Z
  • dc.date.available 2025-01-28T15:21:01Z
  • dc.date.issued 2020
  • dc.description.abstract Statistical Machine Translation (SMT) has been the dominant corpus-based machine translation (MT) approach for the last twenty years. While SMT has been studied in detail for European languages, it has not been studied sufficiently for language pairs with Turkish as the source or target language, and existing work has been limited mostly to the English ↔ Turkish pair. This study aims to broaden the perspective of Turkish corpus-based MT research by training MT engines between Turkish and a wide variety of languages with different features. It surveys customized SMT between Turkish and 11 different languages. Twenty-two SMT engines were trained in KantanMT on open parallel corpora, using Turkish as both source and target language. Three automatic evaluation metrics (F-Measure, BLEU, and TER) were used to evaluate MT quality. Because corpus quality and size vary, the results vary widely. The Turkish ↔ Catalan engines achieved the highest automatic evaluation scores, and the Turkish ↔ Arabic engines the lowest. Although quality varies widely across languages, the study provides baseline scores for a wide variety of languages paired with Turkish. These results may serve as a reference point for evaluating future MT systems that include Turkish.
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Dogru G. Statistical machine translation customization between Turkish and 11 languages. transLogos Translation Studies Journal. 2020;3(1):98-121. DOI: 10.29228/transLogos.23
  • dc.identifier.doi http://dx.doi.org/10.29228/transLogos.23
  • dc.identifier.issn 2667-4629
  • dc.identifier.uri http://hdl.handle.net/10230/69351
  • dc.language.iso eng
  • dc.publisher Diye Global Communications
  • dc.relation.ispartof transLogos Translation Studies Journal. 2020;3(1):98-121
  • dc.rights This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/4.0/
  • dc.subject.keyword Statistical machine translation customization
  • dc.subject.keyword Turkish
  • dc.subject.keyword Automatic evaluation metrics
  • dc.subject.keyword Translation quality evaluation
  • dc.subject.keyword Parallel corpus
  • dc.title Statistical machine translation customization between Turkish and 11 languages
  • dc.type info:eu-repo/semantics/article
  • dc.type.version info:eu-repo/semantics/publishedVersion