Can LLMs solve reading comprehension tests as second language learners?

Citation

  • Hayakawa A, Saggion H. Can LLMs solve reading comprehension tests as second language learners? In: Gaur M, Tsamoura E, Raff E, Vedula N, Parthasarathy S, Asefa G, Alam M, Buscaldi D, Cochez M, Osborne F, Reforgiato Recupero D, editors. Joint Proceedings of the 4th International Workshop on Knowledge-infused Learning (KiL 2024) and the workshop on Deep Learning and Large Language Models for Knowledge Graphs (DL4KG 2024); 2024 August 26; Barcelona, Spain. CEUR Workshop Proceedings; 2024. p. 42-9.

Description

  • Abstract

    The manual evaluation of natural language processing systems is costly and time-consuming, especially when it requires evaluators with specific attributes. Current large language models (LLMs) are reported to outperform humans at various tasks and have recently been used as substitutes for human evaluators. LLMs have also shown the ability to behave as specified in a prompt. This progress raises a fundamental question: can LLMs mimic the behavior of language learners? In this study, we intentionally weaken LLMs with the aim of making them simulate language learners on multiple-choice reading comprehension tests. By comparing answer distributions from language learners and LLMs, we observe that prompts designed to weaken the LLMs do indeed degrade their performance. However, this degradation does not bridge the gap between the original LLMs and language learners, thereby highlighting a critical discrepancy between them.