Multilingual lexical simplification


  • dc.contributor.author Pimienta Castillo, Jorge S.
  • dc.date.accessioned 2021-12-15T12:31:56Z
  • dc.date.available 2021-12-15T12:31:56Z
  • dc.date.issued 2021-09
  • dc.description Master's thesis for: Master in Intelligent Interactive Systems
  • dc.description Supervisor: Horacio Saggion
  • dc.description.abstract This report describes, implements, and evaluates one strategy for text simplification, namely lexical simplification, which aims to reduce the complexity of some words in a sentence. The process has two main steps: a first module identifies the complex elements, and a second module replaces them with simpler variants. For the first module, the system uses three datasets with human annotations in different languages (English, Spanish, and German) to train a classifier that detects complex words. For the second module, a pre-trained model for word prediction (BERT) generates the candidates, which are then sorted by Zipf frequency so that the one with the highest value is selected. Finally, the complete system is evaluated using a test dataset and a survey designed to collect human annotations and perceptions of fluency, meaning, and simplicity.
  • dc.format.mimetype application/pdf
  • dc.identifier.uri http://hdl.handle.net/10230/49224
  • dc.language.iso eng
  • dc.rights Attribution-NonCommercial-NoDerivs 4.0 International
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/4.0/
  • dc.subject.keyword Complex word identification
  • dc.subject.keyword Masked language model
  • dc.subject.keyword Lexical simplification
  • dc.subject.keyword Word frequency
  • dc.subject.keyword Model evaluation
  • dc.title Multilingual lexical simplification
  • dc.type info:eu-repo/semantics/masterThesis
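
The abstract's final selection step (sort the generated candidates by Zipf frequency and keep the highest) can be illustrated with a minimal sketch. This is not the thesis implementation: the candidate list stands in for the output of a masked language model such as BERT, the word counts and corpus size are invented toy values, and the function names `zipf_frequency` and `pick_simplest` are hypothetical. The only fixed convention used is the Zipf scale itself, which is the base-10 logarithm of a word's occurrences per billion words.

```python
import math

# Hypothetical toy counts (illustrative only, not data from the thesis).
TOY_COUNTS = {"use": 900, "employ": 40, "utilize": 25, "apply": 120}
TOY_CORPUS_SIZE = 1_000_000  # assumed total number of tokens in the toy corpus


def zipf_frequency(word, counts=TOY_COUNTS, corpus_size=TOY_CORPUS_SIZE):
    """Zipf scale: log10 of occurrences per billion words; 0.0 if unseen."""
    count = counts.get(word.lower(), 0)
    if count == 0:
        return 0.0
    return math.log10(count / corpus_size * 1e9)


def pick_simplest(complex_word, candidates):
    """Sort candidates by descending Zipf frequency and return the top one.

    The complex word itself is excluded; if nothing else remains,
    the original word is returned unchanged.
    """
    pool = [c for c in candidates if c.lower() != complex_word.lower()]
    if not pool:
        return complex_word
    ranked = sorted(pool, key=zipf_frequency, reverse=True)
    return ranked[0]


# Stand-in for candidates a masked language model might propose.
candidates = ["employ", "use", "utilize", "apply"]
print(pick_simplest("utilize", candidates))  # prints "use"
```

In a real system the toy table would presumably be replaced by a per-language frequency resource, since the record describes English, Spanish, and German data.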