Exploring the integration of large language models for automatic emotion labeling in speech
- dc.contributor.author Yun Chien, Yi
- dc.date.accessioned 2025-10-20T16:01:56Z
- dc.date.available 2025-10-20T16:01:56Z
- dc.date.issued 2025
- dc.description Master's thesis for the Master in Intelligent Interactive Systems
- dc.description Supervisor: Prof. María Inés Torres Barañano
- dc.description.abstract In this work, we present a comprehensive comparison of methodologies for speech emotion recognition (SER), with a focus on evaluating the effectiveness of large language models (LLMs) in this domain. Our study is structured into three parts. First, we extract audio embeddings using models such as WavLM, HuBERT, and Dasheng, and use classical machine learning classifiers, a Support Vector Machine (SVM) and a Multilayer Perceptron (MLP), for emotion prediction. This approach serves as a baseline for comparison. Second, we investigate the capacity of LLMs such as GPT-4o, Qwen2-Audio, and Amazon Nova Sonic to analyze audio features, including speaker attributes such as gender, thereby extending their application beyond traditional natural language processing. Third, we explore a more integrated approach that feeds raw audio directly into an audio-capable LLM, such as Qwen2-Audio-7B-Instruct, for end-to-end emotion classification, without the need for traditional signal-processing-based feature extraction. We evaluate and compare the performance of these methodologies using metrics such as accuracy, precision, recall, and F1-score, with a primary focus on the results obtained from LLM-based models. Our results reveal several key insights: (1) data distribution significantly affects classifier performance; (2) different audio embeddings yield different results even with the same classifier and dataset; and (3) despite their capabilities, current LLMs still underperform compared to classical classifiers such as SVM and MLP in emotion prediction tasks.
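The baseline described in the first part of the abstract (pretrained audio embeddings fed to a classical classifier) can be sketched as follows. This is a minimal illustration only: the random vectors stand in for utterance-level embeddings (e.g. mean-pooled WavLM or HuBERT frame features), and the four-class label set, embedding dimension, and separability shift are all hypothetical choices, not the thesis's actual data or configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Synthetic stand-ins for utterance-level audio embeddings
# (e.g. mean-pooled WavLM/HuBERT features, ~768-dim) with
# four hypothetical emotion classes.
rng = np.random.default_rng(0)
n_samples, dim, n_classes = 200, 768, 4
X = rng.normal(size=(n_samples, dim))
y = rng.integers(0, n_classes, size=n_samples)
# Shift each class's embeddings so the toy data is separable.
X += y[:, None] * 0.5

# Train an SVM on the embeddings and score with macro F1,
# one of the metrics mentioned in the abstract.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
score = f1_score(y_te, pred, average="macro")
print(f"macro F1: {score:.3f}")
```

In the real pipeline, `X` would come from a frozen speech encoder rather than a random generator; the classifier code itself is unchanged.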
- dc.identifier.uri http://hdl.handle.net/10230/71584
- dc.language.iso eng
- dc.rights CC Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri https://creativecommons.org/licenses/by-nc-sa/4.0/
- dc.subject.other Emotions
- dc.title Exploring the integration of large language models for automatic emotion labeling in speech
- dc.type info:eu-repo/semantics/masterThesis
