From Wet-Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing
- dc.contributor.author Laurie, Steven, 1973-
- dc.contributor.author Fernández Callejo, Marcos
- dc.contributor.author Marco Sola, Santiago
- dc.contributor.author Trotta, Jean-Remi
- dc.contributor.author Camps-Puchadas, Jordi
- dc.contributor.author Chacón, Alejandro
- dc.contributor.author Espinosa, Antonio
- dc.contributor.author Gut, Marta
- dc.contributor.author Gut, Ivo Glynne
- dc.contributor.author Heath, Simon
- dc.contributor.author Beltran, Sergi
- dc.date.accessioned 2017-04-24T10:29:29Z
- dc.date.available 2017-04-24T10:29:29Z
- dc.date.issued 2016
- dc.description.abstract As whole genome sequencing becomes cheaper and faster, it will progressively substitute targeted next-generation sequencing as standard practice in research and diagnostics. However, computing cost–performance ratio is not advancing at an equivalent rate. Therefore, it is essential to evaluate the robustness of the variant detection process taking into account the computing resources required. We have benchmarked six combinations of state-of-the-art read aligners (BWA-MEM and GEM3) and variant callers (FreeBayes, GATK HaplotypeCaller, SAMtools) on whole genome and whole exome sequencing data from the NA12878 human sample. Results have been compared between them and against the NIST Genome in a Bottle (GIAB) variants reference dataset. We report differences in speed of up to 20 times in some steps of the process and have observed that SNV, and to a lesser extent InDel, detection is highly consistent in 70% of the genome. SNV, and especially InDel, detection is less reliable in 20% of the genome, and almost unfeasible in the remaining 10%. These findings will aid in choosing the appropriate tools bearing in mind objectives, workload, and computing infrastructure available.
- dc.description.sponsorship Contract Grant Sponsors: Spanish Ministry of Economy and Competitiveness; Generalitat de Catalunya; European Regional Development Fund (ERDF); RD-Connect Project (EC FP7/2007-2013 #305444); ELIXIR-EXCELERATE (EC H2020 #676559); MICINN (TIN2014-53234-C2-1-R).
- dc.format.mimetype application/pdf
- dc.identifier.citation Laurie S, Fernandez Callejo M, Marco Sola S, Trotta J-R, Camps J, Chacón A et al. From Wet-Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing. Human Mutation. 2016;37(12):1263-71. DOI: http://dx.doi.org/10.1002/humu.23114
- dc.identifier.doi http://dx.doi.org/10.1002/humu.23114
- dc.identifier.issn 1059-7794
- dc.identifier.uri http://hdl.handle.net/10230/30879
- dc.language.iso eng
- dc.publisher Wiley
- dc.relation.ispartof Human Mutation. 2016;37(12):1263-71
- dc.relation.projectID info:eu-repo/grantAgreement/EC/FP7/305444
- dc.relation.projectID info:eu-repo/grantAgreement/EC/H2020/676559
- dc.relation.projectID info:eu-repo/grantAgreement/ES/1PE/TIN2014-53234-C2-1-R
- dc.rights © 2016 The Authors. Human Mutation published by Wiley Periodicals, Inc. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri https://creativecommons.org/licenses/by-nc/4.0/
- dc.subject.keyword Whole genome sequencing
- dc.subject.keyword Whole exome sequencing
- dc.subject.keyword NGS
- dc.subject.keyword NA12878
- dc.subject.keyword Alignment
- dc.subject.keyword Variant calling
- dc.subject.keyword Bioinformatics
- dc.subject.keyword Computing speed
- dc.subject.keyword Benchmark
- dc.title From Wet-Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing
- dc.type info:eu-repo/semantics/article
- dc.type.version info:eu-repo/semantics/publishedVersion