Uncovering the semantics of concepts using GPT-4

Le Mens, Gaël; Kovács, Balázs; Hannan, Michael T.; Pros Rius, Guillem

dc.contributor.author Le Mens, Gaël
dc.contributor.author Kovács, Balázs
dc.contributor.author Hannan, Michael T.
dc.contributor.author Pros Rius, Guillem
dc.date.accessioned 2025-04-22T07:03:58Z
dc.date.available 2025-04-22T07:03:58Z
dc.date.issued 2023
dc.description Includes supplementary materials for the online appendix.
dc.description.abstract The ability of recent Large Language Models (LLMs) such as GPT-3.5 and GPT-4 to generate human-like texts suggests that social scientists could use these LLMs to construct measures of semantic similarity that match human judgment. In this article, we provide an empirical test of this intuition. We use GPT-4 to construct a measure of typicality, the similarity of a text document to a concept. We evaluate its performance against other model-based typicality measures in terms of the correlation with human typicality ratings. We conduct this comparative analysis in two domains: the typicality of books in literary genres (using an existing dataset of book descriptions) and the typicality of tweets authored by US Congress members in the Democratic and Republican parties (using a novel dataset). The typicality measure produced with GPT-4 meets or exceeds the performance of the previous state-of-the-art typicality measure we introduced in a recent paper [G. Le Mens, B. Kovács, M. T. Hannan, G. Pros Rius, Sociol. Sci. 2023, 82–117 (2023)]. It accomplishes this without any training with the research data (it is zero-shot learning). This is a breakthrough because the previous state-of-the-art measure required fine-tuning an LLM on hundreds of thousands of text documents to achieve its performance.
dc.description.sponsorship G.L.M. and G.P. received financial support from European Research Council (ERC) Consolidator Grant 772268 from the European Commission. G.L.M. also received financial support from a Catalan Institution for Research and Advanced Studies (ICREA) Academia grant, grant PID2019-105249GBI00/AEI/10.13039/501100011033 from the Spanish Ministerio de Ciencia, Innovacion y Universidades and the Agencia Estatal de Investigacion, and the Severo Ochoa Programme for Centres of Excellence in R&D (Barcelona School of Economics CEX2019-000915-S), funded by MCIN/AEI/10.13039/501100011033. B.K. was supported by Yale School of Management. M.T.H. was supported by the Stanford Graduate School of Business. We thank Ido Erev, Tim Sels, Alex Tyulyupo, and the participants in the 2023 Nagymaros Conference for insightful discussions and comments.
dc.format.mimetype application/pdf
dc.identifier.citation Le Mens G, Kovács B, Hannan MT, Pros G. Uncovering the semantics of concepts using GPT-4. Proc Natl Acad Sci USA. 2023 Dec 5;120(49):e2309350120. DOI: 10.1073/pnas.2309350120
dc.identifier.doi http://dx.doi.org/10.1073/pnas.2309350120
dc.identifier.issn 0027-8424
dc.identifier.uri http://hdl.handle.net/10230/70179
dc.language.iso eng
dc.publisher National Academy of Sciences
dc.relation.ispartof Proceedings National Academy of Sciences USA. 2023 Dec 5;120(49):e2309350120
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/772268
dc.relation.projectID info:eu-repo/grantAgreement/ES/2PE/CEX2019-000915-S
dc.rights This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.subject.keyword Categories
dc.subject.keyword chatGPT
dc.subject.keyword Deep learning
dc.subject.keyword Typicality
dc.subject.keyword LLM
dc.title Uncovering the semantics of concepts using GPT-4
dc.type info:eu-repo/semantics/article
dc.type.version info:eu-repo/semantics/publishedVersion

Collections

Articles (Departament d'Economia)
Documents OpenAIRE (Open Access Infrastructure for Research in Europe)