Identifying most important predictors for suicidal thoughts and behaviours among healthcare workers active during the Spain COVID-19 pandemic: a machine-learning approach
Citation
- Alayo I, Pujol O, Alonso J, Ferrer M, Amigo F, Portillo-Van Diest A, et al. Identifying most important predictors for suicidal thoughts and behaviours among healthcare workers active during the Spain COVID-19 pandemic: a machine-learning approach. Epidemiol Psychiatr Sci. 2025 May 8;34:e28. DOI: 10.1017/S2045796025000198
Description
Abstract
Aims: Studies conducted during the COVID-19 pandemic found a high occurrence of suicidal thoughts and behaviours (STBs) among healthcare workers (HCWs). The current study aimed to (1) develop a machine learning-based prediction model for future STBs using data from a large prospective cohort of Spanish HCWs and (2) identify the variables that contribute most to the model's predictive accuracy.
Methods: This is a prospective, multicentre cohort study of Spanish HCWs active during the COVID-19 pandemic. A total of 8,996 HCWs participated in the web-based baseline survey (May-July 2020) and 4,809 in the 4-month follow-up survey. A total of 219 predictor variables were derived from the baseline survey. The outcome variable was any STB at the 4-month follow-up. Variable selection was performed using an L1-regularized linear Support Vector Classifier (SVC). A random forest model was developed with 5-fold cross-validation, and two class-balancing techniques were tested: the Synthetic Minority Oversampling Technique (SMOTE) and undersampling of the majority class. The model was evaluated using the area under the Receiver Operating Characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). SHapley Additive exPlanations (SHAP) values were used to evaluate the overall contribution of each variable to the prediction of future STBs. Results were obtained separately by gender.
Results: The prevalence of STBs in HCWs at the 4-month follow-up was 7.9% (women = 7.8%, men = 8.2%). Thirty-four variables were selected by the L1-regularized linear SVC. The best results were obtained without data balancing techniques: AUROC = 0.87 (0.86 for women and 0.87 for men) and AUPRC = 0.50 (0.55 for women and 0.45 for men).
Based on SHAP values, the most important baseline predictors of any STB at the 4-month follow-up were the presence of passive suicidal ideation, the number of days in the past 30 days with passive or active suicidal ideation, the number of days in the past 30 days with binge-eating episodes, the number of panic attacks (women only) and the frequency of intrusive thoughts (men only).
Conclusions: Machine learning-based prediction models for STBs in HCWs during the COVID-19 pandemic, trained on web-based survey data, show high discrimination and classification capacity. Future clinical implementations of this model could enable early detection of the HCWs at highest risk of developing adverse mental health outcomes.
Study registration: NCT04556565.
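The abstract reports model performance as AUROC and as the area under the precision-recall curve. The following is a minimal, dependency-free sketch of how these two metrics can be computed; it is not the authors' code, and the abstract does not state which software or which precision-recall estimator was used. Here AUROC is computed as the probability that a randomly chosen positive case scores above a randomly chosen negative case, and the precision-recall area is approximated by average precision, one common step-wise estimator. The function names and the toy labels and scores are hypothetical, invented for illustration.

```python
"""Toy, dependency-free computation of AUROC and average precision (AP).

Illustrative sketch only: labels and scores are invented, and AP is one
common estimator of the area under the precision-recall curve.
"""
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg) pairwise comparison: fine for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k over the ranks k at which positives appear (step-wise PR area)."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    # Sort by descending score; Python's sort is stable, so tied scores keep
    # their input order (AP with ties therefore depends on input order here).
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision among the top-k predictions
    return total / n_pos


def prevalence(y_true: Sequence[int]) -> float:
    """Share of positives: roughly the AP expected from a classifier with no skill."""
    return sum(1 for y in y_true if y == 1) / len(y_true)


if __name__ == "__main__":
    # Hypothetical example: 10 people, 2 with the outcome, scores from some model.
    y = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    s = [0.95, 0.90, 0.85, 0.80, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05]
    print(f"AUROC      = {auroc(y, s):.3f}")              # AUROC      = 0.875
    print(f"AP (AUPRC) = {average_precision(y, s):.3f}")  # AP (AUPRC) = 0.750
    print(f"No-skill AP baseline = {prevalence(y):.3f}")  # No-skill AP baseline = 0.200
```

The two metrics have different reference points. A classifier with no skill has an AUROC of 0.5 regardless of how common the outcome is, but its expected precision-recall area is roughly the outcome prevalence. With 7.9% of HCWs reporting STBs at follow-up, the reported precision-recall area of 0.50 should therefore be read against a baseline of about 0.08, which is why it is lower than the AUROC without implying poor performance.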