Uncovering the semantics of concepts using GPT-4
- dc.contributor.author Le Mens, Gaël
- dc.contributor.author Kovács, Balázs
- dc.contributor.author Hannan, Michael T.
- dc.contributor.author Pros Rius, Guillem
- dc.date.accessioned 2025-04-22T07:03:58Z
- dc.date.available 2025-04-22T07:03:58Z
- dc.date.issued 2023
- dc.description Includes supplementary materials for the online appendix.
- dc.description.abstract The ability of recent Large Language Models (LLMs) such as GPT-3.5 and GPT-4 to generate human-like texts suggests that social scientists could use these LLMs to construct measures of semantic similarity that match human judgment. In this article, we provide an empirical test of this intuition. We use GPT-4 to construct a measure of typicality (the similarity of a text document to a concept). We evaluate its performance against other model-based typicality measures in terms of the correlation with human typicality ratings. We conduct this comparative analysis in two domains: the typicality of books in literary genres (using an existing dataset of book descriptions) and the typicality of tweets authored by US Congress members in the Democratic and Republican parties (using a novel dataset). The typicality measure produced with GPT-4 meets or exceeds the performance of the previous state-of-the-art typicality measure we introduced in a recent paper [G. Le Mens, B. Kovács, M. T. Hannan, G. Pros Rius, Sociol. Sci. 2023, 82–117 (2023)]. It accomplishes this without any training on the research data (it is zero-shot learning). This is a breakthrough because the previous state-of-the-art measure required fine-tuning an LLM on hundreds of thousands of text documents to achieve its performance.
- dc.description.sponsorship G.L.M. and G.P. received financial support from European Research Council (ERC) Consolidator Grant 772268 from the European Commission. G.L.M. also received financial support from a Catalan Institution for Research and Advanced Studies (ICREA) Academia grant, grant PID2019-105249GBI00/AEI/10.13039/501100011033 from the Spanish Ministerio de Ciencia, Innovacion y Universidades and the Agencia Estatal de Investigacion, and the Severo Ochoa Programme for Centres of Excellence in R&D (Barcelona School of Economics CEX2019-000915-S), funded by MCIN/AEI/10.13039/501100011033. B.K. was supported by Yale School of Management. M.T.H. was supported by the Stanford Graduate School of Business. We thank Ido Erev, Tim Sels, Alex Tyulyupo, and the participants in the 2023 Nagymaros Conference for insightful discussions and comments.
- dc.format.mimetype application/pdf
- dc.identifier.citation Le Mens G, Kovács B, Hannan MT, Pros G. Uncovering the semantics of concepts using GPT-4. Proc Natl Acad Sci USA. 2023 Dec 5;120(49):e2309350120. DOI: 10.1073/pnas.2309350120
- dc.identifier.doi http://dx.doi.org/10.1073/pnas.2309350120
- dc.identifier.issn 0027-8424
- dc.identifier.uri http://hdl.handle.net/10230/70179
- dc.language.iso eng
- dc.publisher National Academy of Sciences
- dc.relation.ispartof Proceedings National Academy of Sciences USA. 2023 Dec 5;120(49):e2309350120
- dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/772268
- dc.relation.projectID info:eu-repo/grantAgreement/ES/2PE/CEX2019-000915-S
- dc.rights This open access article is distributed under Creative Commons Attribution License 4.0 (CC BY).
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri http://creativecommons.org/licenses/by/4.0/
- dc.subject.keyword Categories
- dc.subject.keyword chatGPT
- dc.subject.keyword Deep learning
- dc.subject.keyword Typicality
- dc.subject.keyword LLM
- dc.title Uncovering the semantics of concepts using GPT-4
- dc.type info:eu-repo/semantics/article
- dc.type.version info:eu-repo/semantics/publishedVersion