Cross-domain image captioning with discriminative finetuning

  • dc.contributor.author Dessì, Roberto
  • dc.contributor.author Bevilacqua, Michele
  • dc.contributor.author Gualdoni, Eleonora
  • dc.contributor.author Rakotonirina, Nathanael Carraz
  • dc.contributor.author Franzon, Francesca
  • dc.contributor.author Baroni, Marco
  • dc.date.accessioned 2023-04-19T06:16:31Z
  • dc.date.available 2023-04-19T06:16:31Z
  • dc.date.issued 2023
  • dc.description Paper presented at: CVPR 2023. IEEE/CVF Conference on Computer Vision and Pattern Recognition, held June 18-22, 2023, in Vancouver, Canada.
  • dc.description.abstract Neural captioners are typically trained to mimic human-generated references without optimizing for any specific communication goal, leading to problems such as the generation of vague captions. In this paper, we show that fine-tuning an out-of-the-box neural captioner with a self-supervised discriminative communication objective helps to recover a plain, visually descriptive language that is more informative about image contents. Given a target image, the system must learn to produce a description that enables an out-of-the-box text-conditioned image retriever to identify that image among a set of candidates. We experiment with the popular ClipCap captioner, also replicating the main results with BLIP. In terms of similarity to ground-truth human descriptions, the captions emerging from discriminative finetuning lag slightly behind those generated by the non-finetuned model, when the latter is trained and tested on the same caption dataset. However, when the model is used without further tuning to generate captions for out-of-domain datasets, our discriminatively-finetuned captioner generates descriptions that resemble human references more than those produced by the same captioner without finetuning. We further show that, on the Conceptual Captions dataset, discriminatively finetuned captions are more helpful than either vanilla ClipCap captions or ground-truth captions for human annotators tasked with an image discrimination task. (An illustrative sketch of the discriminative objective follows this record.)
  • dc.description.sponsorship EG, NCR, FF and MB received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreements No. 715154 and No. 101019291) and the Spanish Research Agency (ref. PID2020-112602GB-I00/MICIN/AEI/10.13039/501100011033).
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Dessì R, Bevilacqua M, Gualdoni E, Rakotonirina NC, Franzon F, Baroni M. Cross-domain image captioning with discriminative finetuning. Paper presented at: CVPR 2023. Proceedings of The Thirty-Fourth IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023 Jun 18-22; Vancouver, Canada.
  • dc.identifier.uri http://hdl.handle.net/10230/56494
  • dc.language.iso eng
  • dc.publisher Institute of Electrical and Electronics Engineers (IEEE)
  • dc.relation.ispartof CVPR 2023. Proceedings of The Thirty-Fourth IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023 Jun 18-22; Vancouver, Canada.
  • dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/715154
  • dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/101019291
  • dc.relation.projectID info:eu-repo/grantAgreement/ES/2PE/PID2020-112602-GB-I00
  • dc.rights © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. http://dx.doi.org/[DOI number]
  • dc.rights.accessRights info:eu-repo/semantics/embargoedAccess
  • dc.subject.other Captioning
  • dc.title Cross-domain image captioning with discriminative finetuning
  • dc.type info:eu-repo/semantics/conferenceObject
  • dc.type.version info:eu-repo/semantics/acceptedVersion
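
Illustrative note. The abstract above describes a discriminative finetuning objective: the captioner is rewarded when a frozen text-conditioned retriever can pick the target image out of a set of candidates. The PyTorch sketch below shows one common way to set up such an objective with REINFORCE, assuming CLIP (via HuggingFace transformers) as the frozen retriever; the captioner helpers sample_captions and caption_logprobs are hypothetical placeholders, and this is a sketch of the general idea, not the authors' implementation (see the paper for the exact setup).

    # Sketch of discriminative finetuning via REINFORCE (policy gradient).
    # Assumptions (not from this record): the frozen retriever is CLIP from
    # HuggingFace transformers; `captioner` is any autoregressive model
    # exposing sample_captions() and caption_logprobs() -- hypothetical names.
    import torch
    from transformers import CLIPModel, CLIPProcessor

    retriever = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def discriminative_loss(captioner, images):
        """One training step: reward captions that let the frozen retriever
        identify the target image among the other images in the batch."""
        captions = captioner.sample_captions(images)           # hypothetical API
        logprobs = captioner.caption_logprobs(images, captions)

        with torch.no_grad():  # the retriever is never updated
            inputs = processor(text=captions, images=images,
                               return_tensors="pt", padding=True)
            sims = retriever(**inputs).logits_per_text         # (batch, batch)
            # Reward: retriever's log-probability of the correct (diagonal) image.
            reward = sims.log_softmax(dim=-1).diagonal()
            baseline = reward.mean()                           # variance reduction

        # REINFORCE: raise the likelihood of captions with above-average reward.
        return -((reward - baseline) * logprobs).mean()

As in the paper's setup, only the captioner receives gradient updates while the retriever stays frozen; the mean in-batch reward serves as a simple baseline to reduce the variance of the policy-gradient estimate.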