Grounding semantic roles in images

Citation

  • Silberer C, Pinkal M. Grounding semantic roles in images. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing; 2018 Oct 31 - Nov 4; Brussels, Belgium. Stroudsburg (PA): Association for Computational Linguistics; 2018. p. 2616–26.

Description

  • Abstract

    We address the task of visual semantic role labeling (vSRL): identifying the participants of a situation or event in a visual scene and labeling their semantic relations to that event or situation. We render candidate participants as image regions of objects and train a model which learns to ground roles in the regions that depict the corresponding participant. Experimental results demonstrate that we can train a vSRL model without relying on prohibitively expensive image-based role annotations, by utilizing noisy data which we extract automatically from image captions using a linguistic SRL system. Furthermore, our model induces frame-semantic visual representations, and comparing them to previous work on supervised visual verb sense disambiguation yields overall better results.
  • Description

    Paper presented at the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), held from 31 October to 4 November 2018 in Brussels, Belgium.
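The core idea described in the abstract (grounding a semantic role in the candidate image region that best matches it) can be illustrated with a minimal sketch. This is not the authors' code: the embeddings, region names, and cosine-similarity scoring below are all hypothetical assumptions standing in for the learned representations and scoring function of the actual model.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def ground_role(role_embedding, regions):
    # Pick the region whose visual embedding is most similar to the
    # role's embedding; a stand-in for the model's learned grounding.
    return max(regions, key=lambda rid: cosine(role_embedding, regions[rid]))

# Toy example: an Agent role embedding and three candidate object regions
# (all vectors are made up for illustration).
regions = {
    "region_person": [0.9, 0.1, 0.0],
    "region_ball":   [0.1, 0.8, 0.2],
    "region_grass":  [0.0, 0.2, 0.9],
}
agent = [1.0, 0.0, 0.1]
print(ground_role(agent, regions))  # -> region_person
```

In the paper, both sides of this comparison are learned jointly from noisy caption-derived supervision rather than fixed as above; the sketch only shows the grounding decision itself.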