Ramoneda, Pedro; Parada-Cabaleiro, Emilia; Weck, Benno; Serra, Xavier
2025-05-30
2025-05-30
2024
Ramoneda P, Parada-Cabaleiro E, Weck B, Serra X. The role of large language models in musicology: are we ready to trust the machines? In: Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA); 2024 Nov 15; San Francisco, USA. Stroudsburg: Association for Computational Linguistics (ACL); 2024. p.81-6.
http://hdl.handle.net/10230/70566
In this work, we explore the use and reliability of Large Language Models (LLMs) in musicology. From a discussion with experts and students, we assess the current acceptance and concerns regarding this, nowadays ubiquitous, technology. We aim to go one step further, proposing a semi-automatic method to create an initial benchmark using retrieval-augmented generation models and multiple-choice question generation, validated by human experts. Our evaluation on 400 human-validated questions shows that current vanilla LLMs are less reliable than retrieval augmented generation from music dictionaries. This paper suggests that the potential of LLMs in musicology requires musicology-driven research that can specialize LLMs by including accurate and reliable domain knowledge.
application/pdf
eng
© ACL, Creative Commons Attribution 4.0 License
The role of large language models in musicology: are we ready to trust the machines?
info:eu-repo/semantics/conferenceObject
Language models; Musicology
info:eu-repo/semantics/openAccess