Uncovering the semantics of concepts using GPT-4 and other recent large language models
- dc.contributor.author Le Mens, Gaël
- dc.contributor.author Kovács, Balázs
- dc.contributor.author Hannan, Michael T.
- dc.contributor.author Pros, Guillem
- dc.contributor.other Universitat Pompeu Fabra. Departament d'Economia i Empresa
- dc.date.accessioned 2024-11-14T10:10:00Z
- dc.date.available 2024-11-14T10:10:00Z
- dc.date.issued 2023-06-02
- dc.date.modified 2024-11-14T10:08:55Z
- dc.description.abstract Recently, the world's attention has been captivated by Large Language Models (LLMs) thanks to OpenAI's ChatGPT, which rapidly proliferated as an app powered by GPT-3 and now its successor, GPT-4. If these LLMs produce human-like text, the semantic spaces they construct likely align with those used by humans for interpreting and generating language. This suggests that social scientists could use these LLMs to construct measures of semantic similarity that match human judgment. In this article, we provide an empirical test of this intuition. We use GPT-4 to construct a new measure of typicality: the similarity of a text document to a concept or category. We evaluate its performance against other model-based typicality measures in terms of their correspondence with human typicality ratings. We conduct this comparative analysis in two domains: the typicality of books in literary genres (using an existing dataset of book descriptions) and the typicality of tweets authored by US Congress members in the Democratic and Republican parties (using a novel dataset). The GPT-4 typicality measure not only meets or exceeds the current state-of-the-art but accomplishes this without any model training. This is a breakthrough because the previous state-of-the-art measure required fine-tuning a model (a BERT text classifier) on hundreds of thousands of text documents to achieve its performance. Our comparative analysis emphasizes the need for systematic empirical validation of measures based on LLMs: several measures based on other recent LLMs achieve at best a moderate correspondence with human judgments.
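The abstract notes that the GPT-4 typicality measure requires no model training, only querying the model. The sketch below is an illustration only: this record does not reproduce the paper's prompt or scoring protocol, so the prompt wording, the 0-100 rating scale, and the gpt4_typicality helper are assumptions rather than the authors' implementation. It shows one plausible way to elicit a typicality rating for a document with respect to a category via the OpenAI Python client.

```python
# Illustrative sketch only; not the paper's actual prompt or protocol.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def gpt4_typicality(text: str, category: str) -> float:
    """Ask GPT-4 how typical `text` is of `category` on a hypothetical 0-100 scale."""
    prompt = (
        f"On a scale from 0 (not at all typical) to 100 (extremely typical), "
        f"how typical is the following text of the category '{category}'? "
        f"Answer with a single number.\n\nText: {text}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # reduce randomness for a measurement task
        messages=[{"role": "user", "content": prompt}],
    )
    # Naive parsing: assumes the reply is a bare number.
    return float(response.choices[0].message.content.strip())

# Example: typicality of a book description in the 'mystery' genre
# score = gpt4_typicality("A detective untangles a web of secrets...", "mystery")
```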
- dc.format.mimetype application/pdf
- dc.identifier https://econ-papers.upf.edu/ca/paper.php?id=1864
- dc.identifier.citation Proceedings of the National Academy of Sciences (PNAS), 120(49), e2309350120, pp. 1-7. https://doi.org/10.1073/pnas.2309350120
- dc.identifier.uri http://hdl.handle.net/10230/68663
- dc.language.iso eng
- dc.relation.ispartofseries Economics and Business Working Papers Series; 1864
- dc.rights Access to the contents of this document is subject to acceptance of the terms of use established by the following Creative Commons license
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/3.0/es/
- dc.subject.keyword categories
- dc.subject.keyword concepts
- dc.subject.keyword deep learning
- dc.subject.keyword typicality
- dc.subject.keyword gpt
- dc.subject.keyword chatgpt
- dc.subject.keyword bert
- dc.subject.keyword similarity
- dc.title Uncovering the semantics of concepts using GPT-4 and other recent large language models
- dc.type info:eu-repo/semantics/workingPaper