MSC+: Language pattern learning for word sense induction and disambiguation

  • dc.contributor.author Bif Goularte, Fábio
  • dc.contributor.author Sorato, Danielly
  • dc.contributor.author Modesto Nassar, Silvia
  • dc.contributor.author Fileto, Renato
  • dc.contributor.author Saggion, Horacio
  • dc.date.accessioned 2020-12-14T09:58:23Z
  • dc.date.issued 2019
  • dc.description.abstract Identifying the correct meaning of words in context or discovering new word senses is particularly useful for several tasks such as question answering, information extraction, information retrieval, and text summarization. However, especially in the context of user-generated content and online communication (e.g. Twitter), new meanings are continuously crafted by speakers as existing words are used in novel contexts. Consequently, lexical semantic inventories and systems have difficulty coping with semantic drift. In this work, we propose an approach to induce and disambiguate word senses of some target words in collections of short texts, such as tweets, through the use of fuzzy lexico-semantic patterns that we define as sequences of Morpho-semantic Components (MSC+). We learn these patterns, which we call MSC+ patterns, automatically from text data. Experimental results show that instances of some patterns arise in a number of tweets, although some of the tweets in which pattern instances appear use different words to convey the sense of the respective MSC+. The exploitation of MSC+ patterns, when they induce semantics on target words, enables effective word sense disambiguation mechanisms, leading to improvements in the state of the art.
  • dc.description.sponsorship This work was conducted during a doctorate partially supported by grants from CAPES (Brazilian Coordination of Superior Level Staff Improvement), a research support agency of the Ministry of Education of Brazil. CAPES also supported an internship for international cooperation with the TALN (Natural Language Processing Research Group) at the Pompeu Fabra University in Barcelona, Spain. The last author acknowledges support from the Spanish Government under the María de Maeztu Units of Excellence Programme (MDM-2015-0502).
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Bif Goularte F, Sorato D, Modesto Nassar S, Fileto R, Saggion H. MSC+: Language pattern learning for word sense induction and disambiguation. Knowledge-Based Systems. 2020;188:105017. DOI: 10.1016/j.knosys.2019.105017
  • dc.identifier.doi http://dx.doi.org/10.1016/j.knosys.2019.105017
  • dc.identifier.issn 0950-7051
  • dc.identifier.uri http://hdl.handle.net/10230/46030
  • dc.language.iso eng
  • dc.publisher Elsevier
  • dc.relation.ispartof Knowledge-Based Systems. 2020;188:105017.
  • dc.rights © Elsevier http://dx.doi.org/10.1016/j.knosys.2019.105017
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.subject.keyword Lexical semantics
  • dc.subject.keyword Information extraction
  • dc.subject.keyword Linguistic pattern mining
  • dc.subject.keyword Word sense induction
  • dc.subject.keyword Word sense disambiguation
  • dc.title MSC+: Language pattern learning for word sense induction and disambiguation
  • dc.type info:eu-repo/semantics/article
  • dc.type.version info:eu-repo/semantics/acceptedVersion