Using machine learning to uncover the semantics of concepts: how well do typicality measures extracted from a BERT text classifier match human judgments of genre typicality?

Le Mens, Gaël; Kovács, Balázs; Hannan, Michael T.; Pros Rius, Guillem

dc.contributor.author Le Mens, Gaël
dc.contributor.author Kovács, Balázs
dc.contributor.author Hannan, Michael T.
dc.contributor.author Pros Rius, Guillem
dc.date.accessioned 2023-03-06T11:27:45Z
dc.date.available 2023-03-06T11:27:45Z
dc.date.issued 2023
dc.description Includes data, material, and analysis code for all analyses.
dc.description.abstract Social scientists have long been interested in understanding the extent to which the typicalities of an object in concepts relate to its valuations by social actors. Answering this question has proven to be challenging because precise measurement requires a feature-based description of objects. Yet, such descriptions are frequently unavailable. In this article, we introduce a method to measure typicality based on text data. Our approach involves training a deep-learning text classifier based on the BERT language representation and defining the typicality of an object in a concept in terms of the categorization probability produced by the trained classifier. Model training allows for the construction of a feature space adapted to the categorization task and of a mapping between feature combination and typicality that gives more weight to feature dimensions that matter more for categorization. We validate the approach by comparing the BERT-based typicality measure of book descriptions in literary genres with average human typicality ratings. The obtained correlation is higher than 0.85. Comparisons with other typicality measures used in prior research show that our BERT-based measure better reflects human typicality judgments.
dc.description.sponsorship Pros received financial support from ERC Consolidator Grant #772268 from the European Commission. G. Le Mens also received financial support from grant PID2019-105249GB-I00/AEI/10.13039/501100011033 from the Spanish Ministerio de Ciencia, Innovación y Universidades (MCIU) and the Agencia Estatal de Investigación (AEI) and from the BBVA Foundation Grant G999088Q.
dc.format.mimetype application/pdf
dc.identifier.citation Le Mens G, Kovács B, Hannan MT, Pros G. Using machine learning to uncover the semantics of concepts: how well do typicality measures extracted from a BERT text classifier match human judgments of genre typicality? Sociological Science. 2023 March;10:82-117. DOI: 10.15195/v10.a3
dc.identifier.doi http://dx.doi.org/10.15195/v10.a3
dc.identifier.issn 2330-6696
dc.identifier.uri http://hdl.handle.net/10230/56063
dc.language.iso eng
dc.publisher Society for Sociological Science
dc.relation.ispartof Sociological Science. 2023 March;10:82-117
dc.relation.isreferencedby https://osf.io/ta273/
dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/772268
dc.relation.projectID info:eu-repo/grantAgreement/ES/2PE/PID2019-105249GB-I00
dc.rights This work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.rights.uri http://creativecommons.org/licenses/by/4.0/
dc.subject.keyword Categories
dc.subject.keyword Concepts
dc.subject.keyword Deep learning
dc.subject.keyword Typicality
dc.subject.keyword BERT
dc.subject.keyword Transformer models
dc.title Using machine learning to uncover the semantics of concepts: how well do typicality measures extracted from a BERT text classifier match human judgments of genre typicality?
dc.type info:eu-repo/semantics/article
dc.type.version info:eu-repo/semantics/publishedVersion
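
The abstract defines the typicality of an object in a concept as the categorization probability produced by a trained classifier, and validates the measure by correlating it with average human ratings. The following is a minimal sketch of those two steps, not the authors' code (which is available at the OSF link in this record). It assumes the classifier outputs one raw score (logit) per genre and that a softmax turns the scores into probabilities. It also assumes a Pearson correlation, which the abstract does not specify. The genre names, logits, and ratings are hypothetical placeholders.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def typicality(logits, genres):
    """Map each genre to its categorization probability (assumed typicality)."""
    return dict(zip(genres, softmax(logits)))


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical genre labels and classifier logits for one book description.
GENRES = ["mystery", "romance", "science fiction"]
example_logits = [2.0, 0.5, -1.0]
scores = typicality(example_logits, GENRES)

# Hypothetical model typicalities and average human ratings for four books
# in a single genre; real values would come from the study's data.
model_typicality = [0.91, 0.40, 0.15, 0.72]
human_rating = [6.2, 3.1, 1.8, 5.0]
r = pearson(model_typicality, human_rating)
```

In practice the logits would come from a fine-tuned BERT classifier applied to each book description. They are hard-coded here so the sketch runs without any model or external library.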

Collections

Articles (Departament d'Economia)
OpenAIRE Documents (Open Access Infrastructure for Research in Europe)