Use of text mining techniques for the selection of cohorts in a clinical trial: classifying compliance with the selection criteria for a clinical trial by analyzing patient medical records

Description

  • Abstract

    Introduction: Most of the data that clinicians work with today are unstructured, such as free text. This large volume of unstructured data creates a need for analytical procedures that maximize the value extracted from it. Text Mining and Natural Language Processing (NLP) techniques arise from the combination of computational linguistics and Machine Learning. Objective: One of the most tedious and time-consuming tasks in clinical trials is the subject selection process: the records of patients who are candidates for inclusion in the study must be consulted manually to check whether they meet the defined selection criteria. The project aimed to build an automatic subject selection system for clinical trials from longitudinal patient medical records. Materials and Methods: Starting from a set of clinical histories annotated according to whether the patient meets or does not meet each of 13 selection criteria, several NLP preprocessing tasks were applied. A hybrid approach combining Machine Learning (ML) models and Rule-Based models was studied for the classification task, with the majority classifier as the baseline for comparison. Results: 10 of the selection criteria achieved their best results with ML models after a preprocessing stage; the remaining 3 achieved better classification performance with the Rule-Based approach. The overall micro F1 score of the proposed model was 0.8574. Conclusion: The proposed hybrid approach makes it possible to develop a useful tool for the automatic selection of patients for a clinical trial cohort.
  • Description

    Supervisor: Horacio Saggion
    Bachelor's thesis (Treball de fi de grau) in Biomedical Engineering
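
The evaluation summarized in the abstract (a per-criterion choice between an ML model and a Rule-Based model, compared against a majority-class baseline with a micro-averaged F1 score) can be illustrated with a minimal sketch. This is not the thesis implementation: the criterion names, the toy labels, the choice of "met" as the positive class, and the helper names (`counts`, `micro_f1`, `pick_best`, `majority_baseline`) are all assumptions made for illustration.

```python
from collections import Counter

POSITIVE = "met"  # assumption: "met" is treated as the positive class


def counts(gold, pred):
    """Return (tp, fp, fn) for one criterion."""
    pairs = list(zip(gold, pred))
    tp = sum(g == POSITIVE and p == POSITIVE for g, p in pairs)
    fp = sum(g != POSITIVE and p == POSITIVE for g, p in pairs)
    fn = sum(g == POSITIVE and p != POSITIVE for g, p in pairs)
    return tp, fp, fn


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(gold_by_crit, pred_by_crit):
    """Pool tp/fp/fn over all criteria, then compute a single F1."""
    totals = [0, 0, 0]
    for crit, gold in gold_by_crit.items():
        for i, c in enumerate(counts(gold, pred_by_crit[crit])):
            totals[i] += c
    return f1(*totals)


def pick_best(gold_by_crit, candidates):
    """For each criterion, keep the approach with the highest F1."""
    return {
        crit: max(candidates, key=lambda a: f1(*counts(gold, candidates[a][crit])))
        for crit, gold in gold_by_crit.items()
    }


def majority_baseline(gold_by_crit):
    """Predict the most frequent label of each criterion for every record."""
    return {
        crit: [Counter(gold).most_common(1)[0][0]] * len(gold)
        for crit, gold in gold_by_crit.items()
    }


# Hypothetical toy data: two criteria, four patient records each.
gold = {
    "criterion_a": ["met", "not met", "met", "met"],
    "criterion_b": ["not met", "not met", "met", "not met"],
}
candidates = {
    "ml": {
        "criterion_a": ["met", "not met", "met", "not met"],
        "criterion_b": ["met", "met", "met", "not met"],
    },
    "rules": {
        "criterion_a": ["met", "met", "not met", "not met"],
        "criterion_b": ["not met", "not met", "met", "not met"],
    },
}

chosen = pick_best(gold, candidates)
hybrid = {crit: candidates[approach][crit] for crit, approach in chosen.items()}
hybrid_score = micro_f1(gold, hybrid)
baseline_score = micro_f1(gold, majority_baseline(gold))
print(chosen, round(hybrid_score, 4), round(baseline_score, 4))
```

For brevity the sketch selects and scores on the same labels, which gives an optimistic estimate; a real setup would choose the approach for each criterion on held-out or cross-validated data and report the micro F1 on a separate test set.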