Analysis of the batch effect due to sequencing center in population statistics quantifying rare events in the 1000 Genomes Project
- dc.contributor.author Maceda Porto, Iago, 1986-
- dc.contributor.author Lao Grueso, Oscar, 1976-
- dc.date.accessioned 2022-04-04T06:31:08Z
- dc.date.available 2022-04-04T06:31:08Z
- dc.date.issued 2021
- dc.description.abstract The 1000 Genomes Project (1000G) is one of the most popular whole genome sequencing datasets used in different genomics fields and has boosted our knowledge in medical and population genomics, among other fields. Recent studies have reported the presence of ghost mutation signals in the 1000G. Furthermore, studies have shown that these mutations can influence the outcomes of follow-up studies based on the genetic variation of 1000G, such as single nucleotide variant (SNV) imputation. While the overall effect of these ghost mutations can be considered negligible for common genetic variants in many populations, the potential bias remains unclear when studying low-frequency genetic variants in the population. In this study, we analyze the effect of the sequencing center on predicted loss of function (LoF) alleles, the number of singletons, and the patterns of archaic introgression in the 1000G. Our results support previous studies showing that the sequencing center is associated with LoF and singletons independently of the population that is considered. Furthermore, we observed that patterns of archaic introgression were distorted for some populations depending on the sequencing center. When analyzing the frequency of SNPs showing extreme patterns of genotype differentiation among centers for CEU, YRI, CHB, and JPT, we observed that the magnitude of the sequencing batch effect was stronger at MAF < 0.2 and showed different profiles between CHB and the other populations. All these results suggest that data from 1000G must be interpreted with caution when considering statistics using variants at low frequency.
- dc.description.sponsorship O.L. and I.M. acknowledge the support from the Spanish Ministry of Science and Innovation to the EMBL partnership, the Centro de Excelencia Severo Ochoa, the CERCA Programme/Generalitat de Catalunya, the Spanish Ministry of Science and Innovation through the Instituto de Salud Carlos III, the Generalitat de Catalunya through the Departament de Salut and the Departament d’Empresa i Coneixement, co-financing with funds from the European Regional Development Fund by the Spanish Ministry of Science and Innovation corresponding to the Programa Operativo FEDER Plurirregional de España (POPE) 2014–2020, and from the Secretaria d’Universitats i Recerca, Departament d’Empresa i Coneixement of the Generalitat de Catalunya corresponding to the Programa Operatiu FEDER de Catalunya 2014–2020. O.L. gratefully acknowledges the financial support from the Ministerio de Economía y Competitividad (Ministry of Economy and Competitiveness): RYC-2013-14797, BFU2015-68759-P, and PGC2018-098574-B-I00, and from the Generalitat de Catalunya (Government of Catalonia): GRC 2017 SGR 937. I.M. gratefully acknowledges the financial support from the Government of Catalonia Agència de Gestió d’Ajuts Universitaris i de Recerca (Agency for Management of University and Research Grants): GRC 2014 SGR 615.
- dc.format.mimetype application/pdf
- dc.identifier.citation Maceda I, Lao O. Analysis of the batch effect due to sequencing center in population statistics quantifying rare events in the 1000 Genomes Project. Genes (Basel). 2021 Dec 24;13(1):44. DOI: 10.3390/genes13010044
- dc.identifier.doi http://dx.doi.org/10.3390/genes13010044
- dc.identifier.issn 2073-4425
- dc.identifier.uri http://hdl.handle.net/10230/52819
- dc.language.iso eng
- dc.publisher MDPI
- dc.relation.ispartof Genes (Basel). 2021 Dec 24;13(1):44
- dc.relation.projectID info:eu-repo/grantAgreement/ES/1PE/BFU2015-68759-P
- dc.relation.projectID info:eu-repo/grantAgreement/ES/2PE/PGC2018-098574-B-I00
- dc.rights © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri https://creativecommons.org/licenses/by/4.0/
- dc.subject.keyword 1000 Genomes Project
- dc.subject.keyword Batch effect
- dc.subject.keyword Population genetics
- dc.subject.keyword Sequencing center
- dc.title Analysis of the batch effect due to sequencing center in population statistics quantifying rare events in the 1000 Genomes Project
- dc.type info:eu-repo/semantics/article
- dc.type.version info:eu-repo/semantics/publishedVersion