We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality, which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data than amodal models and word representations based on handcrafted norming data.
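As a rough illustration of the bimodal integration described above, the sketch below fuses a text-based distributional vector with a vector of visual attribute classifier scores by weighted concatenation of the normalised vectors. All names and data here (`text_vectors`, `attribute_scores`, the concatenation-based fusion, the `alpha` weight) are assumptions for illustration only; the abstract does not specify the fusion method actually used.

```python
import numpy as np

# Hypothetical toy data (not from the paper): text-based distributional
# vectors and per-concept visual attribute classifier scores.
text_vectors = {
    "dog": np.array([0.12, 0.80, 0.33, 0.05]),  # distributional features
    "car": np.array([0.70, 0.10, 0.02, 0.55]),
}
attribute_scores = {
    "dog": np.array([0.95, 0.88, 0.10]),  # e.g. "furry", "has_tail", "metallic"
    "car": np.array([0.03, 0.01, 0.97]),
}

def bimodal_vector(word, alpha=0.5):
    """Concatenate L2-normalised text and visual vectors, weighted by alpha."""
    t = text_vectors[word]
    v = attribute_scores[word]
    t = t / np.linalg.norm(t)
    v = v / np.linalg.norm(v)
    return np.concatenate([alpha * t, (1 - alpha) * v])

def cosine(a, b):
    """Cosine similarity between two fused word representations."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Word similarity under the fused (bimodal) representation
print(cosine(bimodal_vector("dog"), bimodal_vector("car")))
```

In a fusion of this kind, `alpha` controls the relative contribution of the textual and visual modalities; setting it to 1 or 0 recovers an amodal (text-only or vision-only) model, which is the kind of baseline the bimodal models are compared against.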