New implementation of data standards for AI in oncology: Experience from the EuCanImage project
Citation
- García-Lezana T, Bobowicz M, Frid S, Rutherford M, Recuero M, Riklund K, et al. New implementation of data standards for AI in oncology: Experience from the EuCanImage project. Gigascience. 2025 Jan 6;14:giae101. DOI: 10.1093/gigascience/giae101
Description
Abstract
Background: An unprecedented amount of personal health data, with the potential to revolutionize precision medicine, is generated at health care institutions worldwide. Exploiting such data with artificial intelligence (AI) relies on the ability to combine heterogeneous, multicentric, multimodal, and multiparametric data, as well as on thoughtful knowledge representation and data availability. Despite these possibilities, significant methodological challenges and ethicolegal constraints still impede the real-world implementation of data models.

Technical details: EuCanImage is an international consortium that aims to develop AI algorithms for precision medicine in oncology and to enable secondary use of the data, subject to the necessary ethical approvals. The use of well-defined clinical data standards to allow interoperability was a central element of the initiative. The consortium focuses on 3 different cancer types and addresses 7 unmet clinical needs. We have conceived and implemented an innovative process to capture clinical data from hospitals, transform it into the newly developed EuCanImage data models, and then store the standardized data in permanent repositories. This new workflow combines recognized software (REDCap for data capture), data standards (FHIR for data structuring), and an existing repository (EGA for permanent data storage and sharing) with newly developed custom tools for data transformation and quality control (ETL pipeline, QC scripts) that fill the remaining gaps.

Conclusion: This article synthesizes our experience and procedures for health care data interoperability, standardization, and reproducibility.
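
As an illustration of the capture, transform, and quality-control steps described in the abstract, the following is a minimal sketch and not the consortium's actual ETL pipeline or QC scripts. It maps one flat, REDCap-style record to a minimal FHIR Patient resource and runs a few basic checks. The field names (`record_id`, `sex`, `birth_year`), the sex coding, and the identifier system `urn:example:eucanimage-demo` are hypothetical; the real EuCanImage data models are defined in the article itself.

```python
import json

# Hypothetical coding of the REDCap "sex" field; the real data dictionary may differ.
SEX_TO_FHIR_GENDER = {"1": "female", "2": "male"}

# Administrative gender codes allowed by the FHIR Patient resource.
FHIR_GENDERS = {"male", "female", "other", "unknown"}


def record_to_fhir_patient(record):
    """Map one flat REDCap-style record (a dict) to a minimal FHIR Patient resource."""
    return {
        "resourceType": "Patient",
        "identifier": [
            # Hypothetical identifier system, used only for this example.
            {"system": "urn:example:eucanimage-demo", "value": record["record_id"]}
        ],
        # Unmapped or missing values fall back to the FHIR code "unknown".
        "gender": SEX_TO_FHIR_GENDER.get(record.get("sex", ""), "unknown"),
        # FHIR dates may be given as a year only (YYYY).
        "birthDate": record.get("birth_year", ""),
    }


def qc_patient(resource):
    """Return a list of QC problems; an empty list means the resource passed."""
    problems = []
    if resource.get("resourceType") != "Patient":
        problems.append("resourceType must be 'Patient'")
    if resource.get("gender") not in FHIR_GENDERS:
        problems.append("gender is not a valid FHIR administrative gender code")
    birth = resource.get("birthDate", "")
    if not (len(birth) == 4 and birth.isdigit()):
        problems.append("birthDate must be a four-digit year in this sketch")
    return problems


# Example record as it might be exported from a data-capture form (hypothetical values).
record = {"record_id": "P001", "sex": "1", "birth_year": "1962"}
patient = record_to_fhir_patient(record)
issues = qc_patient(patient)

print(json.dumps(patient, indent=2))
print("QC issues:", issues)
```

Keeping the mapping and the QC check as separate functions mirrors the split between transformation and quality control named in the abstract, so that a resource can be validated again after any later modification and before it is deposited in a repository.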