Exploring the capacity of large language models to assess the chronic pain experience: algorithm development and validation
- dc.contributor.author Amidei, Jacopo
- dc.contributor.author Nieto, Rubén
- dc.contributor.author Kaltenbrunner, Andreas
- dc.contributor.author Ferreira De Sá, Jose Gregorio
- dc.contributor.author Serrat, Mayte
- dc.contributor.author Albajes, Klara
- dc.date.accessioned 2025-11-11T07:11:48Z
- dc.date.available 2025-11-11T07:11:48Z
- dc.date.issued 2025
- dc.description.abstract Background: Chronic pain, affecting more than 20% of the global population, has an enormous pernicious impact on individuals as well as economic ramifications at both the health and social levels. Accordingly, tools that enhance pain assessment can considerably impact people suffering from pain and society at large. In this context, assessment methods based on individuals’ personal experiences, such as written narratives (WNs), offer relevant insights into understanding pain from a personal perspective. This approach can uncover subjective, intricate, and multifaceted aspects that standardized questionnaires can overlook. However, WNs can be time-consuming for clinicians to evaluate. Therefore, a tool that uses WNs while reducing the time required for their evaluation could have a significantly beneficial impact on people's pain assessment. Objective: This study is the first evaluation of the potential of applying large language models (LLMs) to assist clinicians in assessing patients’ pain expressed through WNs. Methods: We performed an experiment based on 43 WNs written by people with fibromyalgia and qualitatively evaluated in a prior study. Focusing on pain severity and disability, we prompted GPT-4 (with the temperature parameter set to 0 or 1) to assign scores, along with explanations of those scores, to these WNs. Then, we quantitatively compared GPT-4 scores with experts’ scores of the same narratives, using statistical measures such as Pearson correlations, root mean squared error, the weighted version of the Gwet agreement coefficient, and Krippendorff α. Additionally, 2 experts specialized in chronic pain conducted a qualitative analysis of the scores’ explanations to assess their accuracy and the potential applicability of GPT’s analysis for future pain narrative evaluations. Results: Our analysis reveals that GPT-4’s performance in assessing pain narratives yielded promising results. GPT-4 was comparable in terms of agreement with experts (with a weighted percentage agreement higher than 0.95), correlations with standardized measurements (for example, in the range of 0.43 to 0.49 between the Revised Fibromyalgia Impact Questionnaire and GPT-4 with temperature 1), and low error rates (root mean squared error of 1.20 for severity and 1.44 for disability). Moreover, experts generally deemed the ratings provided by GPT-4, as well as the scores’ explanations, to be adequate. However, we observed that GPT has a slight tendency to overestimate pain severity and disability, with a lower SD than expert estimates. Conclusions: These findings underline the potential of LLMs in facilitating the assessment of WNs of people with fibromyalgia, offering a novel approach to understanding and evaluating patient pain experiences. Integrating automated assessments through LLMs presents opportunities for streamlining and enhancing the assessment process, paving the way for improved patient care and tailored interventions in the chronic pain management field.
- dc.format.mimetype application/pdf
- dc.identifier.citation Amidei J, Nieto R, Kaltenbrunner A, Ferreira De Sá JG, Serrat M, Albajes K. Exploring the capacity of large language models to assess the chronic pain experience: algorithm development and validation. J Med Internet Res. 2025 Mar 31;27:e65903. DOI: 10.2196/65903
- dc.identifier.doi http://dx.doi.org/10.2196/65903
- dc.identifier.issn 1439-4456
- dc.identifier.uri http://hdl.handle.net/10230/71843
- dc.language.iso eng
- dc.publisher JMIR Publications
- dc.relation.ispartof Journal of Medical Internet Research. 2025 Mar 31;27:e65903
- dc.rights ©Jacopo Amidei, Rubén Nieto, Andreas Kaltenbrunner, Jose Gregorio Ferreira De Sá, Mayte Serrat, Klara Albajes. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 31.03.2025. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri http://creativecommons.org/licenses/by/4.0/
- dc.subject.keyword Large language models
- dc.subject.keyword Fibromyalgia
- dc.subject.keyword Chronic pain
- dc.subject.keyword Written narratives
- dc.subject.keyword Pain narratives
- dc.subject.keyword Automated assessment
- dc.subject.keyword Pain severity
- dc.subject.keyword Pain disability
- dc.title Exploring the capacity of large language models to assess the chronic pain experience: algorithm development and validation
- dc.type info:eu-repo/semantics/article
- dc.type.version info:eu-repo/semantics/publishedVersion
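
The abstract compares GPT-4 scores with expert scores using measures such as Pearson correlation and root mean squared error, and notes a slight tendency to overestimate. The following is a minimal sketch of how such a comparison could be computed, not the authors' code. The function names and the example score lists are hypothetical and made up for illustration only; they are not data from the study.

```python
import math


def rmse(expert, model):
    """Root mean squared error between two equal-length score lists."""
    if len(expert) != len(model) or not expert:
        raise ValueError("score lists must be non-empty and of equal length")
    squared = [(m - e) ** 2 for e, m in zip(expert, model)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(expert, model):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(expert) != len(model) or len(expert) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_e = sum(expert) / len(expert)
    mean_m = sum(model) / len(model)
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(expert, model))
    var_e = sum((e - mean_e) ** 2 for e in expert)
    var_m = sum((m - mean_m) ** 2 for m in model)
    if var_e == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_e * var_m)


def mean_bias(expert, model):
    """Average of (model - expert); a positive value means overestimation."""
    if len(expert) != len(model) or not expert:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(m - e for e, m in zip(expert, model)) / len(expert)


# Hypothetical scores, invented for illustration (NOT study data).
expert_scores = [3, 5, 7, 6, 8]
model_scores = [4, 5, 8, 7, 8]

print("RMSE:", rmse(expert_scores, model_scores))
print("Pearson r:", pearson(expert_scores, model_scores))
print("Mean bias:", mean_bias(expert_scores, model_scores))
```

The agreement coefficients named in the abstract (weighted Gwet coefficient and Krippendorff α) are omitted here because they require additional choices, such as the weighting scheme, that the record does not specify.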
