Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

  • dc.contributor.author Mieth, Bettina
  • dc.contributor.author Kloft, Marius
  • dc.contributor.author Rodríguez, Juan Antonio
  • dc.contributor.author Sonnenburg, Sören
  • dc.contributor.author Vobruba, Robin
  • dc.contributor.author Morcillo Suárez, Carlos, 1969-
  • dc.contributor.author Farré, Xavier
  • dc.contributor.author Marigorta, Urko M.
  • dc.contributor.author Fehr, Ernst
  • dc.contributor.author Dickhaus, Thorsten
  • dc.contributor.author Blanchard, Gilles
  • dc.contributor.author Schunk, Daniel
  • dc.contributor.author Navarro i Cuartiellas, Arcadi, 1969-
  • dc.contributor.author Müller, Klaus-Robert
  • dc.date.accessioned 2017-03-06T13:51:50Z
  • dc.date.available 2017-03-06T13:51:50Z
  • dc.date.issued 2016
  • dc.description.abstract The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008–2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.
  • dc.description.sponsorship EF acknowledges support from the advanced ERC grant (ERC-2011-AdG 295642-FEP) on the Foundation of Economic Preferences. MK, BM, and KRM were supported by the German National Science Foundation (DFG) under the grants MU 987/6-1 and RA 1894/1-1. TD and DS were supported by the German National Science Foundation (DFG) under the grants DI 1723/3-1 and SCHU 2828/2-1. GB and TS acknowledge support of the German National Science Foundation (DFG) under the research group grant FOR 1735. MK, DT, KRM, and GB acknowledge financial support by the FP7-ICT Programme of the European Community, under the PASCAL2 Network of Excellence. MK acknowledges a postdoctoral fellowship by the German Research Foundation (DFG), award KL 2698/2-1, and from the Federal Ministry of Science and Education (BMBF) awards 031L0023A and 031B0187B. AN acknowledges support from the Spanish Multiple Sclerosis Network (REEM), of the Instituto de Salud Carlos III (RD12/0032/0011), the Spanish National Institute for Bioinformatics (PT13/0001/0026), the Spanish Government Grant BFU2012-38236 and from FEDER. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 634143 (MedBioinformatics). MK and KRM were financially supported by the Ministry of Education, Science, and Technology, through the National Research Foundation of Korea under Grant R31-10008 (MK, KRM) and BK21 (KRM).
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Mieth B, Kloft M, Rodríguez JA, Sonnenburg S, Vobruba R, Morcillo-Suárez C, Farré X, Marigorta UM, Fehr E, Dickhaus T, Blanchard G, Schunk D, Navarro i Cuartiellas A, Müller KR. Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies. Scientific Reports. 2016; 6: 36671. DOI: 10.1038/srep36671
  • dc.identifier.doi http://dx.doi.org/10.1038/srep36671
  • dc.identifier.issn 2045-2322
  • dc.identifier.uri http://hdl.handle.net/10230/28172
  • dc.language.iso eng
  • dc.publisher Nature Publishing Group
  • dc.relation.ispartof Scientific Reports. 2016; 6: 36671
  • dc.relation.projectID info:eu-repo/grantAgreement/ES/3PN/BFU2012-38236
  • dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/634143
  • dc.rights © Nature Publishing Group. This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri http://creativecommons.org/licenses/by/4.0/
  • dc.subject.keyword Computational science
  • dc.subject.keyword Genome-wide association studies
  • dc.subject.keyword Statistical methods
  • dc.title Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies
  • dc.type info:eu-repo/semantics/article
  • dc.type.version info:eu-repo/semantics/publishedVersion