Authors: Silberer, Carina; Ferrari, Vittorio; Lapata, Mirella
Date deposited: 2019-06-26
Date issued: 2017
Citation: Silberer C, Ferrari V, Lapata M. Visually grounded meaning representations. IEEE Trans Pattern Anal Mach Intell. 2017;39(11):2284-97. DOI: 10.1109/TPAMI.2016.2635138
ISSN: 0162-8828
Handle: http://hdl.handle.net/10230/41865
Abstract: In this paper we address the problem of grounding distributional representations of lexical meaning. We introduce a new model which uses stacked autoencoders to learn higher-level representations from textual and visual input. The visual modality is encoded via vectors of attributes obtained automatically from images. We create a new large-scale taxonomy of 600 visual attributes representing more than 500 concepts and 700K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We evaluate our model on its ability to simulate word similarity judgments and concept categorization. On both tasks, our model yields a better fit to behavioral data compared to baselines and related models which either rely on a single modality or do not make use of attribute-based input.
Format: application/pdf
Language: eng
Rights: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. http://dx.doi.org/10.1109/TPAMI.2016.2635138
Title: Visually grounded meaning representations
Type: info:eu-repo/semantics/article
DOI: http://dx.doi.org/10.1109/TPAMI.2016.2635138
Keywords: Cognitive simulation; Computer vision; Distributed representations; Concept learning; Connectionism and neural nets; Natural language processing
Access rights: info:eu-repo/semantics/openAccess
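Note: the abstract describes a bimodal architecture in which stacked autoencoders encode textual distributional vectors and visual attribute vectors and fuse them into a joint grounded representation. The sketch below is only a minimal illustration of that general idea, not the authors' published implementation; the class name, layer sizes, dimensions, and the joint reconstruction objective are all assumptions (the visual dimension of 600 merely mirrors the 600 attributes mentioned in the abstract).

```python
# Hypothetical sketch of a bimodal stacked autoencoder; all hyperparameters
# and module names are assumptions, not the configuration from the paper.
import torch
import torch.nn as nn

class BimodalAutoencoder(nn.Module):
    def __init__(self, text_dim=1000, visual_dim=600, hidden_dim=300, joint_dim=200):
        super().__init__()
        # Unimodal encoders: one for textual distributional vectors,
        # one for visual attribute-classifier scores.
        self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden_dim), nn.Sigmoid())
        self.vis_enc = nn.Sequential(nn.Linear(visual_dim, hidden_dim), nn.Sigmoid())
        # Joint layer fuses the two hidden codes into one grounded representation.
        self.joint_enc = nn.Sequential(nn.Linear(2 * hidden_dim, joint_dim), nn.Sigmoid())
        # Decoders reconstruct both modalities from the joint code.
        self.joint_dec = nn.Sequential(nn.Linear(joint_dim, 2 * hidden_dim), nn.Sigmoid())
        self.text_dec = nn.Linear(hidden_dim, text_dim)
        self.vis_dec = nn.Linear(hidden_dim, visual_dim)

    def forward(self, text_vec, vis_vec):
        h_text = self.text_enc(text_vec)
        h_vis = self.vis_enc(vis_vec)
        # The fused code serves as the visually grounded word representation.
        joint = self.joint_enc(torch.cat([h_text, h_vis], dim=-1))
        h_text_hat, h_vis_hat = self.joint_dec(joint).chunk(2, dim=-1)
        return joint, self.text_dec(h_text_hat), self.vis_dec(h_vis_hat)

# Toy usage: train by reconstructing both inputs, then use `joint` downstream,
# e.g. for word similarity or concept categorization evaluations.
model = BimodalAutoencoder()
text_vec = torch.rand(4, 1000)   # stand-in for distributional text features
vis_vec = torch.rand(4, 600)     # stand-in for predicted visual-attribute scores
joint, text_hat, vis_hat = model(text_vec, vis_vec)
loss = nn.functional.mse_loss(text_hat, text_vec) + nn.functional.mse_loss(vis_hat, vis_vec)
loss.backward()
```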