An extensible massively multilingual lexical simplification pipeline dataset using the MultiLS framework

dc.contributor.author Shardlow, Matthew
dc.contributor.author Bott, Stefan Markus
dc.contributor.author Hayakawa, Akio
dc.contributor.author Saggion, Horacio
dc.date.accessioned 2024-05-28T11:49:32Z
dc.date.available 2024-05-28T11:49:32Z
dc.date.issued 2024
dc.identifier.citation Shardlow M, Alva-Manchego F, Batista-Navarro R, Bott S, Calderon Ramirez S, Cardon R, et al. An extensible massively multilingual lexical simplification pipeline dataset using the MultiLS framework. In: Wilkens R, Cardon R, Todirascu A, Gala N, editors. Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024; 2024 May 20th; Torino, Italy. Brussels: ELRA and ICCL, 2024. p. 38–46.
dc.identifier.uri http://hdl.handle.net/10230/60269
dc.description Paper presented at the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024, held on 20 May 2024 in Turin, Italy.
dc.description.abstract We present preliminary findings on the MultiLS dataset, developed in support of the 2024 Multilingual Lexical Simplification Pipeline (MLSP) Shared Task. This dataset currently comprises 300 instances of lexical complexity prediction and lexical simplification across 10 languages. In this paper, we (1) describe the annotation protocol in support of the contribution of future datasets and (2) present summary statistics on the existing data that we have gathered. Multilingual lexical simplification can be used to support low-ability readers to engage with otherwise difficult texts in their native, often low-resourced, languages.
dc.description.sponsorship Andrea Horbach is part of the research conducted at CATALPA – Center for Advanced Technology-Assisted Learning and Predictive Analytics of the FernUniversität in Hagen, Germany. Anna Hülsing is supported by the German Federal Ministry of Education and Research (grant no. FKZ 01JA23S03C). Joseph Imperial is supported by the National University Philippines (Project ID: 2023I-1T-05-MLA-CCIT-Computer Science) and the UKRI Centre for Doctoral Training in Accountable, Responsible and Transparent AI [EP/S023437/1] of the University of Bath. Horacio Saggion, Stefan Bott and Akio Hayakawa acknowledge funding from the European Union's Horizon Europe research and innovation program under the Grant Agreement No. 101132431 (iDEM Project) – views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher ELRA (European Language Resources Association)
dc.relation.ispartof Wilkens R, Cardon R, Todirascu A, Gala N, editors. Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024; 2024 May 20th; Torino, Italy. Brussels: ELRA and ICCL, 2024. p. 38–46.
dc.rights © 2024 ELRA - European Language Resources Association: CC BY-NC 4.0
dc.rights.uri https://creativecommons.org/licenses/by-nc/4.0/
dc.title An extensible massively multilingual lexical simplification pipeline dataset using the MultiLS framework
dc.type info:eu-repo/semantics/conferenceObject
dc.subject.keyword lexical simplification
dc.subject.keyword lexical complexity prediction
dc.subject.keyword MultiLS
dc.relation.projectID info:eu-repo/grantAgreement/EC/HE/101132431
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.type.version info:eu-repo/semantics/publishedVersion
