Welcome to the UPF Digital Repository

Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy from the EMIF Project.


dc.contributor.author Roberto, Giuseppe
dc.contributor.author Mayer, Miguel Ángel, 1960-
dc.contributor.author Gini, Rosa
dc.date.accessioned 2016-11-25T08:47:01Z
dc.date.available 2016-11-25T08:47:01Z
dc.date.issued 2016
dc.identifier.citation Roberto G, Leal I, Sattar N, Loomis AK, Avillach P, Egger P. et al. Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy from the EMIF Project. PLoS One. 2016 Aug 31;11(8):e0160648. doi: 10.1371/journal.pone.0160648
dc.identifier.issn 1932-6203
dc.identifier.uri http://hdl.handle.net/10230/27604
dc.description.abstract Due to the heterogeneity of existing European sources of observational healthcare data, data source-tailored choices are needed to execute multi-data source, multi-national epidemiological studies. This makes transparent documentation paramount. In this proof-of-concept study, a novel standard data derivation procedure was tested in a set of heterogeneous data sources. Identification of subjects with type 2 diabetes (T2DM) was the test case. We included three primary care data sources (PCDs), three record linkage of administrative and/or registry data sources (RLDs), one hospital and one biobank. Overall, data from 12 million subjects from six European countries were extracted. Based on a shared event definition, sixteen standard algorithms (components) useful to identify T2DM cases were generated through a top-down/bottom-up iterative approach. Each component was based on one single data domain among diagnoses, drugs, diagnostic test utilization and laboratory results. Diagnoses-based components were subclassified considering the healthcare setting (primary, secondary, inpatient care). The Unified Medical Language System was used for semantic harmonization within data domains. Individual components were extracted and proportion of population identified was compared across data sources. Drug-based components performed similarly in RLDs and PCDs, unlike diagnoses-based components. Using components as building blocks, logical combinations with AND, OR, AND NOT were tested and local experts recommended their preferred data source-tailored combination. The population identified per data source by the resulting algorithms varied from 3.5% to 15.7%; however, age-specific results were fairly comparable. The impact of individual components was assessed: diagnoses-based components identified the majority of cases in PCDs (93-100%), while drug-based components were the main contributors in RLDs (81-100%).
The proposed data derivation procedure allowed the generation of data source-tailored case-finding algorithms in a standardized fashion, facilitated transparent documentation of the process and benchmarking of data sources, and provided bases for interpretation of possible inter-data source inconsistency of findings in future studies.
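The combination step described in the abstract (components joined with AND, OR, AND NOT, then compared by the proportion of population identified) can be pictured with a minimal sketch. Everything below is an assumption made for illustration: the component names, the subject IDs and the chosen combination are hypothetical and are not taken from the study.

```python
# Toy data source: 20 subjects, identified by integer IDs (hypothetical).
population = set(range(1, 21))

# Each component is the set of subjects it flags. Names and members are
# illustrative only; the study defines sixteen components not listed here.
components = {
    "diagnosis_primary_care": {1, 2, 3, 4, 5},
    "drug_antidiabetic": {4, 5, 6, 7},
    "diagnosis_type1": {7},  # hypothetical exclusion component
}


def proportion(cases, population):
    """Share of the population identified by a set of cases."""
    return len(cases) / len(population)


# One possible data source-tailored combination:
# (diagnosis OR drug) AND NOT type-1 diagnosis.
cases = (
    components["diagnosis_primary_care"] | components["drug_antidiabetic"]
) - components["diagnosis_type1"]

# Impact of each component: share of final cases that it flags.
impact = {
    name: len(cases & members) / len(cases)
    for name, members in components.items()
}

print(sorted(cases))
print(proportion(cases, population))
print(impact)
```

Set union, intersection and difference map directly onto OR, AND and AND NOT, so a different local recommendation only changes the one expression that builds `cases`.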
dc.description.sponsorship The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking (http://www.imi.europa.eu/) under European Medical Information Framework grant agreement no. 115372, resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and European Federation of Pharmaceutical Industries and Associations companies’ in-kind contribution. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Pfizer Worldwide Research and Development, GlaxoSmithKline, Cegedim Strategic Data Medical Research Ltd and Janssen provided support in the form of salaries for AKL, PE, DA and MJS, respectively, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher Public Library of Science (PLoS) 
dc.relation.ispartof PLoS One. 2016 Aug 31;11(8):e0160648
dc.rights © 2016 Roberto et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.subject.other Diabetes
dc.subject.other Clinical protocols
dc.title Identifying Cases of Type 2 Diabetes in Heterogeneous Data Sources: Strategy from the EMIF Project.
dc.type info:eu-repo/semantics/article
dc.identifier.doi http://dx.doi.org/10.1371/journal.pone.0160648
dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/115372
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.type.version info:eu-repo/semantics/publishedVersion

