When the whole is less than the sum of its parts: how composition affects PMI values in distributional semantic vectors

Paperno, Denis; Baroni, Marco

dc.contributor.author Paperno, Denis
dc.contributor.author Baroni, Marco
dc.date.accessioned 2020-12-02T09:07:47Z
dc.date.available 2020-12-02T09:07:47Z
dc.date.issued 2016
dc.description.abstract Distributional semantic models, deriving vector-based word representations from patterns of word usage in corpora, have many useful applications (Turney and Pantel 2010). Recently, there has been interest in compositional distributional models, which derive vectors for phrases from representations of their constituent words (Mitchell and Lapata 2010). Often, the values of distributional vectors are pointwise mutual information (PMI) scores obtained from raw co-occurrence counts. In this article we study the relation between the PMI dimensions of a phrase vector and its components in order to gain insights into which operations an adequate composition model should perform. We show mathematically that the difference between the PMI dimension of a phrase vector and the sum of PMIs in the corresponding dimensions of the phrase's parts is an independently interpretable value, namely, a quantification of the impact of the context associated with the relevant dimension on the phrase's internal cohesion, as also measured by PMI. We then explore this quantity empirically, through an analysis of adjective–noun composition.
dc.description.sponsorship We would like to thank the Computational Linguistics editor and reviewers: Yoav Goldberg, Omer Levy, Katya Tentori, Germán Kruszewski, Nghia Pham, and the other members of the Composes team for useful feedback. Our work is funded by ERC 2011 Starting Independent Research Grant n. 283554 (COMPOSES).
dc.format.mimetype application/pdf
dc.identifier.citation Paperno D, Baroni M. When the whole is less than the sum of its parts: how composition affects PMI values in distributional semantic vectors. Computational Linguistics. 2016 Jun;42(2):345-50. DOI: 10.1162/COLI_a_00250
dc.identifier.doi http://dx.doi.org/10.1162/COLI_a_00250
dc.identifier.issn 0891-2017
dc.identifier.uri http://hdl.handle.net/10230/45934
dc.language.iso eng
dc.publisher MIT Press
dc.relation.ispartof Computational Linguistics. 2016 Jun;42(2):345-50
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/283554
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.rights.uri https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title When the whole is less than the sum of its parts: how composition affects PMI values in distributional semantic vectors
dc.type info:eu-repo/semantics/article
dc.type.version info:eu-repo/semantics/publishedVersion
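
The relation stated in the abstract can be checked numerically. The sketch below is one reading of it, not the article's own formulation: it assumes the phrase's "internal cohesion" is the PMI between its two words, and that the "impact of the context" is the change in that PMI once the context is conditioned on. All probabilities and variable names are hypothetical values invented for illustration; none come from the article.

```python
import math

# Hypothetical toy probabilities (invented, not from the article):
# a = adjective, n = noun, c = a context word (one vector dimension).
p_a, p_n, p_c = 0.02, 0.03, 0.05   # marginal probabilities
p_an = 0.004                       # probability of the phrase "a n"
p_ac, p_nc = 0.003, 0.006          # each word co-occurring with c
p_anc = 0.0015                     # the phrase co-occurring with c


def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information: log of P(x, y) / (P(x) * P(y))."""
    return math.log(p_xy / (p_x * p_y))


# Phrase PMI in dimension c minus the sum of the parts' PMIs in dimension c.
deficit = pmi(p_anc, p_an, p_c) - (pmi(p_ac, p_a, p_c) + pmi(p_nc, p_n, p_c))

# Assumed interpretation: cohesion of the phrase given c, minus its
# unconditional cohesion, both measured by PMI.
cohesion = pmi(p_an, p_a, p_n)
cohesion_given_c = pmi(p_anc / p_c, p_ac / p_c, p_nc / p_c)
context_effect = cohesion_given_c - cohesion

# The two quantities agree up to floating-point error.
print(math.isclose(deficit, context_effect))
```

With these toy numbers both quantities are negative: the context lowers the cohesion of the phrase, so the phrase's PMI falls short of the sum of its parts' PMIs in that dimension.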

Collections

Articles (Department of Translation and Language Sciences)
OpenAIRE Documents (Open Access Infrastructure for Research in Europe)