Building a dataset of emotions with distant supervision

Schaefer Trindade, Luísa

dc.contributor.author Schaefer Trindade, Luísa
dc.date.accessioned 2025-03-04T14:26:42Z
dc.date.available 2025-03-04T14:26:42Z
dc.date.issued 2024
dc.description Master's thesis in Theoretical and Applied Linguistics. Supervisor: Dr. Núria Bel
dc.description.abstract In Natural Language Processing (NLP), emotion detection is a challenging text classification problem. Tackling it with supervised machine learning requires annotated datasets, which are scarce because they are costly to produce. Moreover, emotions are subjective, and human annotators often disagree in their assessments. Recently, many methods have been proposed to reduce these costs, including distant supervision. This thesis presents a strategy for annotating emotions in literary works in Brazilian Portuguese. Using a combination of regular expressions for automatic dialogue extraction, spaCy, and a lexicon containing 26 emotions, we classify dialogue by considering the words the narrator uses to introduce and describe it. The results are mixed, given the large set of emotion labels, many of which are underrepresented in our data collection efforts. However, this strategy can still benefit the annotation of literary corpora with more common emotions such as Happiness and Dissatisfaction.
dc.identifier.uri http://hdl.handle.net/10230/69824
dc.language.iso eng
dc.rights Creative Commons license, Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/4.0/deed.ca
dc.subject.keyword Distant supervision
dc.subject.keyword Emotion lexicon
dc.subject.keyword Annotated dataset
dc.subject.keyword Portuguese
dc.title Building a dataset of emotions with distant supervision
dc.type info:eu-repo/semantics/masterThesis
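
The annotation strategy summarized in the abstract (extract dialogue with regular expressions, then label it using the narrator's words and an emotion lexicon) can be illustrated with a minimal sketch. This is not the thesis's implementation: the four-entry lexicon, the function name `label_dialogue`, and the assumption that each line of dialogue opens with a dash and carries at most one narrator comment after a second dash are all hypothetical. The thesis also uses spaCy, which is replaced here by plain lowercase token matching to keep the example dependency-free.

```python
import re

# Hypothetical mini-lexicon: narrator word -> emotion label (illustrative only;
# the thesis uses a lexicon covering 26 emotions).
LEXICON = {
    "sorriu": "Happiness",
    "alegre": "Happiness",
    "resmungou": "Dissatisfaction",
    "irritado": "Dissatisfaction",
}

DASH = "\u2014"  # dialogue dash commonly used in Portuguese-language fiction

# Assumed line shape: dash, speech, then optionally a second dash followed by
# the narrator's comment. Lines with further dashes are not handled.
DIALOGUE_RE = re.compile(
    rf"^\s*{DASH}\s*(?P<speech>[^{DASH}]+?)\s*(?:{DASH}\s*(?P<narrator>[^{DASH}]+))?$"
)


def label_dialogue(line):
    """Return (speech, emotion_labels) for a dialogue line, or None otherwise."""
    match = DIALOGUE_RE.match(line)
    if match is None:
        return None
    narrator = match.group("narrator") or ""
    # Stand-in for spaCy tokenization/lemmatization: lowercase word tokens.
    tokens = re.findall(r"\w+", narrator.lower())
    labels = sorted({LEXICON[t] for t in tokens if t in LEXICON})
    return match.group("speech"), labels


if __name__ == "__main__":
    print(label_dialogue("\u2014 Que dia lindo! \u2014 sorriu Ana."))
```

Matching on surface forms means inflected variants missing from the lexicon are not recognized, which is one reason a lemmatizer such as spaCy's is useful in this kind of pipeline.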

Collections

Master's in Theoretical and Applied Linguistics. Research master's theses