An extensible massively multilingual lexical simplification pipeline dataset using the MultiLS framework
- dc.contributor.author Shardlow, Matthew
- dc.contributor.author Bott, Stefan Markus
- dc.contributor.author Hayakawa, Akio
- dc.contributor.author Saggion, Horacio
- dc.date.accessioned 2024-05-28T11:49:32Z
- dc.date.available 2024-05-28T11:49:32Z
- dc.date.issued 2024
- dc.description Paper presented at the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024, held on 20 May 2024 in Turin, Italy.
- dc.description.abstract We present preliminary findings on the MultiLS dataset, developed in support of the 2024 Multilingual Lexical Simplification Pipeline (MLSP) Shared Task. This dataset currently comprises 300 instances of lexical complexity prediction and lexical simplification across 10 languages. In this paper, we (1) describe the annotation protocol in support of the contribution of future datasets and (2) present summary statistics on the existing data that we have gathered. Multilingual lexical simplification can be used to help low-ability readers engage with otherwise difficult texts in their native, often low-resourced, languages.
- dc.description.sponsorship Andrea Horbach is part of the research conducted at CATALPA – Center for Advanced Technology-Assisted Learning and Predictive Analytics of the FernUniversität in Hagen, Germany. Anna Hülsing is supported by the German Federal Ministry of Education and Research (grant no. FKZ 01JA23S03C). Joseph Imperial is supported by the National University Philippines (Project ID: 2023I-1T-05-MLA-CCIT-Computer Science) and the UKRI Centre for Doctoral Training in Accountable, Responsible and Transparent AI [EP/S023437/1] of the University of Bath. Horacio Saggion, Stefan Bott and Akio Hayakawa acknowledge funding from the European Union's Horizon Europe research and innovation program under the Grant Agreement No. 101132431 (iDEM Project) – views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union.
- dc.format.mimetype application/pdf
- dc.identifier.citation Shardlow M, Alva-Manchego F, Batista-Navarro R, Bott S, Calderon Ramirez S, Cardon R, et al. An extensible massively multilingual lexical simplification pipeline dataset using the MultiLS framework. In: Wilkens R, Cardon R, Todirascu A, Gala N, editors. Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024; 2024 May 20th; Torino, Italy. Brussels: ELRA and ICCL, 2024. p. 38–46.
- dc.identifier.uri http://hdl.handle.net/10230/60269
- dc.language.iso eng
- dc.publisher ELRA (European Language Resources Association)
- dc.relation.ispartof Wilkens R, Cardon R, Todirascu A, Gala N, editors. Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024; 2024 May 20th; Torino, Italy. Brussels: ELRA and ICCL, 2024. p. 38–46.
- dc.relation.projectID info:eu-repo/grantAgreement/EC/HE/101132431
- dc.rights © 2024 ELRA - European Language Resources Association: CC BY-NC 4.0
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri https://creativecommons.org/licenses/by-nc/4.0/
- dc.subject.keyword lexical simplification
- dc.subject.keyword lexical complexity prediction
- dc.subject.keyword MultiLS
- dc.title An extensible massively multilingual lexical simplification pipeline dataset using the MultiLS framework
- dc.type info:eu-repo/semantics/conferenceObject
- dc.type.version info:eu-repo/semantics/publishedVersion