Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations

  • dc.contributor.author Pybus Oliveras, Marc, 1985-
  • dc.contributor.author Luisi, Pierre, 1985-
  • dc.contributor.author Dall'Olio, Giovanni Marco, 1983-
  • dc.contributor.author Uzkudun, Manu
  • dc.contributor.author Laayouni, Hafid, 1968-
  • dc.contributor.author Bertranpetit, Jaume, 1952-
  • dc.contributor.author Engelken, Johannes
  • dc.date.accessioned 2018-04-19T09:22:48Z
  • dc.date.available 2018-04-19T09:22:48Z
  • dc.date.issued 2015
  • dc.description.abstract MOTIVATION: Detecting positive selection in genomic regions is a recurrent topic in natural population genetic studies. However, there is little consistency among the regions detected in several genome-wide scans using different tests and/or populations. Furthermore, few methods address the challenge of classifying selective events according to specific features such as age, intensity or state (completeness). RESULTS: We have developed a machine-learning classification framework that exploits the combined ability of some selection tests to uncover different polymorphism features expected under the hard sweep model, while controlling for population-specific demography. As a result, we achieve high sensitivity toward hard selective sweeps while adding insights about their completeness (whether a selected variant is fixed or not) and age of onset. Our method also determines the relevance of the individual methods implemented so far to detect positive selection under specific selective scenarios. We calibrated and applied the method to three reference human populations from The 1000 Genomes Project to generate a genome-wide classification map of hard selective sweeps. This study improves detection of selective sweeps by overcoming the classical selection versus no-selection classification strategy, and offers an explanation for the lack of consistency observed among selection tests when applied to real data. Very few signals were observed in the African population studied, even though our method shows higher sensitivity under this population's demography. AVAILABILITY AND IMPLEMENTATION: The genome-wide results for three human populations from The 1000 Genomes Project and an R-package implementing the 'Hierarchical Boosting' framework are available at http://hsb.upf.edu/.
  • dc.description.sponsorship This work was supported by Ministerio de Economía y Competitividad (Spain) [grants BFU2010-19443, BFU2013-43726-P]; and the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat de Catalunya [GRC 2014 SGR 866] to J.B. M.P. and G.D. have been supported by a grant of the FPI program, Ministerio de Economía y Competitividad; P.L. by a grant from the Instituto de Salud Carlos III; J.E. was supported through a Postdoc scholarship from the Volkswagenstiftung [Az: I/85 198].
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Pybus M, Luisi P, Dall'Olio GM, Uzkudun M, Laayouni H, Bertranpetit J et al. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics. 2015 Dec 15;31(24):3946-52. DOI: 10.1093/bioinformatics/btv493
  • dc.identifier.doi http://dx.doi.org/10.1093/bioinformatics/btv493
  • dc.identifier.issn 1367-4803
  • dc.identifier.uri http://hdl.handle.net/10230/34403
  • dc.language.iso eng
  • dc.publisher Oxford University Press
  • dc.relation.ispartof Bioinformatics. 2015 Dec 15;31(24):3946-52
  • dc.relation.projectID info:eu-repo/grantAgreement/ES/1PE/BFU2010-19443
  • dc.relation.projectID info:eu-repo/grantAgreement/ES/1PE/BFU2013-43726-P
  • dc.rights © Oxford University Press. This is a pre-copy-editing, author-produced PDF of an article accepted for publication in Bioinformatics following peer review. The definitive publisher-authenticated version Pybus M, Luisi P, Dall'Olio GM, Uzkudun M, Laayouni H, Bertranpetit J et al. Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations. Bioinformatics. 2015 Dec 15; 31(24): 3946-52 is available online at: http://dx.doi.org/10.1093/bioinformatics/btv493
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.subject.other Population genetics
  • dc.subject.other Genomics
  • dc.subject.other Machine learning
  • dc.title Hierarchical boosting: a machine-learning framework to detect and classify hard selective sweeps in human populations
  • dc.type info:eu-repo/semantics/article
  • dc.type.version info:eu-repo/semantics/acceptedVersion