Building a dataset of emotions with distant supervision
- dc.contributor.author Schaefer Trindade, Luísa
- dc.date.accessioned 2025-03-04T14:26:42Z
- dc.date.available 2025-03-04T14:26:42Z
- dc.date.issued 2024
- dc.description Master's thesis in Theoretical and Applied Linguistics. Supervisor: Dr. Núria Bel
- dc.description.abstract In Natural Language Processing (NLP), emotion detection is a challenging text classification problem. Tackling it with supervised machine learning requires annotated datasets, which are scarce because they are costly to produce. Moreover, emotions are subjective, and human annotators often disagree in their assessments. Many methods have recently been proposed to reduce these costs, among them distant supervision. This thesis presents a strategy for annotating emotions in literary works in Brazilian Portuguese. Using a combination of regular expressions for automatic dialogue extraction, spaCy, and a lexicon containing 26 emotions, we label dialogue according to the words the narrator uses to introduce and describe it. The results are mixed because the label set is large and many of its emotions are underrepresented in the data we collected. However, the strategy can still help annotate literary corpora for more frequent emotions such as Happiness and Dissatisfaction.
- dc.identifier.uri http://hdl.handle.net/10230/69824
- dc.language.iso eng
- dc.rights Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/4.0/deed.ca
- dc.subject.keyword Distant supervision
- dc.subject.keyword Emotion lexicon
- dc.subject.keyword Annotated dataset
- dc.subject.keyword Portuguese
- dc.title Building a dataset of emotions with distant supervision
- dc.type info:eu-repo/semantics/masterThesis
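The abstract describes labeling dialogue from the narrator's words around it. The following is a minimal sketch of that idea, not the thesis's implementation. It assumes dialogue lines open with the U+2014 dash (the travessão used in Brazilian Portuguese prose) and that an optional second dash introduces the narrator's clause. The lexicon entries, the majority-vote rule, and the names `TOY_LEXICON`, `label_dialogue` and `build_dataset` are hypothetical. The spaCy step mentioned in the abstract is replaced here by plain lowercase regex tokenization, so the example runs with the standard library alone.

```python
import re
from collections import Counter

DASH = "\u2014"  # travessão: opens dialogue and separates speech from narration

# Hypothetical toy lexicon; the thesis uses a 26-emotion lexicon not reproduced here.
TOY_LEXICON = {
    "irritada": "Dissatisfaction",
    "irritado": "Dissatisfaction",
    "zangado": "Dissatisfaction",
    "sorrindo": "Happiness",
    "alegre": "Happiness",
    "feliz": "Happiness",
}

# Dash, speech up to the next dash, then an optional dash plus narrator clause.
DIALOGUE_RE = re.compile(
    rf"^\s*{DASH}\s*(?P<speech>[^{DASH}]+)(?:{DASH}\s*(?P<narration>[^{DASH}]+))?"
)


def label_dialogue(line, lexicon=TOY_LEXICON):
    """Return (speech, label) using cue words in the narration, or None."""
    m = DIALOGUE_RE.match(line)
    if not m or not m.group("narration"):
        return None  # not dialogue, or no narrator clause to draw a label from
    # Stand-in for spaCy processing: lowercase Unicode word tokens.
    tokens = re.findall(r"\w+", m.group("narration").lower())
    votes = Counter(lexicon[t] for t in tokens if t in lexicon)
    if not votes:
        return None  # no lexicon cue, so the line stays unlabeled
    label, _ = votes.most_common(1)[0]  # assumption: majority vote over cue words
    return m.group("speech").strip(), label


def build_dataset(lines, lexicon=TOY_LEXICON):
    """Keep only the lines that received a distant-supervision label."""
    labeled = (label_dialogue(line, lexicon) for line in lines)
    return [pair for pair in labeled if pair is not None]


if __name__ == "__main__":
    sample = [
        f"{DASH} Não quero ir {DASH} disse ela, irritada.",
        f"{DASH} Que bom te ver! {DASH} exclamou ele, sorrindo.",
        f"{DASH} Onde você estava?",
        "Ela saiu da sala.",
    ]
    for speech, label in build_dataset(sample):
        print(label, "|", speech)
```

Lines without a narrator clause or without a lexicon cue are simply left out. This mirrors why rarer emotions end up underrepresented: only dialogue whose narration contains a known cue word yields a training example.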