Can LLMs solve reading comprehension tests as second language learners?
- dc.contributor.author Hayakawa, Akio
- dc.contributor.author Saggion, Horacio
- dc.date.accessioned 2025-02-25T07:24:37Z
- dc.date.available 2025-02-25T07:24:37Z
- dc.date.issued 2024
- dc.description.abstract The manual evaluation of natural language processing systems is costly and time-consuming, especially when targeting people with specific attributes as evaluators. Current large language models (LLMs) are reported to outperform humans at various tasks, and recently have been used as substitutes for human evaluators. LLMs have also shown the ability to behave as specified in a prompt. This progress raises a fundamental question: can LLMs mimic the behavior of language learners? In this study, we intentionally weaken LLMs, aiming to make them simulate language learners on multiple-choice reading comprehension tests. By comparing answer distributions from language learners and LLMs, we observe that prompts designed to weaken the LLMs indeed degrade their performance. However, this degradation does not bridge the gap between the original LLMs and language learners, thereby highlighting a critical discrepancy between them.
- dc.description.sponsorship The authors acknowledge the support from Departament de Recerca i Universitats de la Generalitat de Catalunya (ajuts SGR-Cat 2021) and from the Maria de Maeztu Units of Excellence Programme CEX2021-001195-M, funded by MCIN/AEI/10.13039/501100011033. This research is part of a project that has received funding from the European Union's Horizon Europe research and innovation program under Grant Agreement No. 101132431 (iDEM Project).
- dc.format.mimetype application/pdf
- dc.identifier.citation Hayakawa A, Saggion H. Can LLMs solve reading comprehension tests as second language learners? In: Gaur M, Tsamoura E, Raff E, Vedula N, Parthasarathy S, Asefa G, Alam M, Buscaldi D, Cochez M, Osborne F, Reforgiato Recupero D, editors. Joint Proceedings of the 4th International Workshop on Knowledge-infused Learning (KiL 2024) and the workshop on Deep Learning and Large Language Models for Knowledge Graphs (DL4KG 2024); 2024 August 26; Barcelona, Spain. CEUR Workshop Proceedings; 2024. p. 42-9.
- dc.identifier.issn 1613-0073
- dc.identifier.uri http://hdl.handle.net/10230/69721
- dc.language.iso eng
- dc.publisher CEUR Workshop Proceedings
- dc.relation.projectID info:eu-repo/grantAgreement/EC/HE/101132431
- dc.rights © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri http://creativecommons.org/licenses/by/4.0/
- dc.subject.keyword Natural language processing
- dc.subject.keyword Large language models
- dc.subject.keyword Question answering
- dc.subject.keyword Reading comprehension
- dc.title Can LLMs solve reading comprehension tests as second language learners?
- dc.type info:eu-repo/semantics/conferenceObject
- dc.type.version info:eu-repo/semantics/publishedVersion