Evaluating language models for the retrieval and categorization of lexical collocations

Citation

Espinosa-Anke L, Codina-Filbà J, Wanner L. Evaluating language models for the retrieval and categorization of lexical collocations. In: Merlo P, Tiedemann J, Tsarfaty R, editors. The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021): proceedings of the conference; 2021 Apr 19-23; [online]. Stroudsburg: Association for Computational Linguistics; 2021. p. 1406-17. DOI: 10.18653/v1/2021.eacl-main.120



Linked Data
https://github.com/luisespinosaanke/lexicalcollocations
Abstract
Lexical collocations are idiosyncratic combinations of two syntactically bound lexical items (e.g., “heavy rain”, “take a step” or “undergo surgery”). Understanding their degree of compositionality and idiosyncrasy, as well as their underlying semantics, is crucial for language learners, lexicographers and downstream NLP applications alike. In this paper we analyse a suite of language models for collocation understanding. We first construct a dataset of occurrences of lexical collocations in context, categorized into 16 representative semantic categories. Then, we perform two experiments: (1) unsupervised collocate retrieval, and (2) supervised collocation classification in context. We find that most models perform well in distinguishing light verb constructions, especially if the collocation’s first argument acts as a subject, but often fail to distinguish, first, different syntactic structures within the same semantic category, and second, finer-grained categories which restrict the set of correct collocates.
Description
Paper presented at EACL 2021, held online from 19 to 23 April 2021.
DOI
http://dx.doi.org/10.18653/v1/2021.eacl-main.120
Collections
Conferences (Department of Information and Communication Technologies)
Documents OpenAIRE (Open Access Infrastructure for Research in Europe)
