Consistency of metagenomic assignment programs in simulated and real data

Garcia Etxebarria, Koldo; García Garcerà, Marc; Calafell i Majó, Francesc

dc.contributor.author Garcia Etxebarria, Koldo
dc.contributor.author García Garcerà, Marc
dc.contributor.author Calafell i Majó, Francesc
dc.date.accessioned 2015-03-19T08:38:15Z
dc.date.available 2015-03-19T08:38:15Z
dc.date.issued 2014
dc.description.abstract Background: Metagenomics is the genomic study of uncultured environmental samples, which has been greatly facilitated by the advent of shotgun-sequencing technologies. One of the main focuses of metagenomics is the discovery of previously uncultured microorganisms, which makes the assignment of sequences to a particular taxon a challenge and a crucial step. Recently, several methods have been developed to perform this task, based on different methodologies such as sequence composition or sequence similarity. The sequence composition methods have the ability to completely assign the whole dataset. However, their use in metagenomics and the study of their performance with real data is limited. In this work, we assess the consistency of three different methods (BLAST + Lowest Common Ancestor, Phymm, and Naïve Bayesian Classifier) in assigning real and simulated sequence reads. Results: Both in real and in simulated data, BLAST + Lowest Common Ancestor (BLAST+LCA), Phymm, and Naïve Bayesian Classifier consistently assign a larger number of reads in higher taxonomic levels than in lower levels. However, discrepancies increase at lower taxonomic levels. In simulated data, consistent assignments between all three methods showed greater precision than assignments based on Phymm or Bayesian Classifier alone, since the BLAST + LCA algorithm performed best. In addition, assignment consistency in real data increased with sequence read length, in agreement with previously published simulation results. Conclusions: The use and combination of different approaches is advisable to assign metagenomic reads. Although the sensitivity could be reduced, the reliability can be increased by using the reads consistently assigned to the same taxa by, at least, two methods, and by training the programs using all available information.
dc.description.sponsorship This work was financed by the MICINN (Spanish Ministry of Science and Innovation) grant SAF2010-16240. MGG was supported by a predoctoral fellowship from MICINN.
dc.format.mimetype application/pdf
dc.identifier.citation Garcia-Etxebarria K, Garcia-Garcerà M, Calafell F. Consistency of metagenomic assignment programs in simulated and real data. BMC Bioinformatics. 2014; 15: 90. DOI 10.1186/1471-2105-15-90
dc.identifier.doi http://dx.doi.org/10.1186/1471-2105-15-90
dc.identifier.issn 1471-2105
dc.identifier.uri http://hdl.handle.net/10230/23231
dc.language.iso eng
dc.publisher BioMed Central
dc.relation.ispartof BMC Bioinformatics. 2014; 15: 90
dc.relation.projectID info:eu-repo/grantAgreement/ES/3PN/SAF2010-16240
dc.rights © 2014 Garcia-Etxebarria et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.rights.uri http://creativecommons.org/licenses/by/2.0
dc.subject.keyword Metagenomics
dc.subject.keyword Assignment
dc.subject.keyword Comparison
dc.subject.other Mice (Laboratory animals)
dc.subject.other Genomics -- Research
dc.title Consistency of metagenomic assignment programs in simulated and real data
dc.type info:eu-repo/semantics/article
dc.type.version info:eu-repo/semantics/publishedVersion
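
The abstract recommends keeping only reads that at least two methods assign to the same taxon. A minimal sketch of that filtering idea follows, in Python. It is not the authors' code: the function name consensus_assignments, the method labels, the read identifiers, and the example taxa are all hypothetical, and the sketch assumes each program's output has already been parsed into a mapping from read ID to a taxon name at one fixed taxonomic level (None when the read was left unassigned).

```python
from collections import Counter
from typing import Dict, Optional


def consensus_assignments(
    per_method: Dict[str, Dict[str, Optional[str]]],
    min_agree: int = 2,
) -> Dict[str, str]:
    """Return reads whose most frequent taxon is reported by >= min_agree methods.

    per_method maps a method label to {read_id: taxon or None}.
    Hypothetical helper for illustration; not taken from the paper.
    """
    # Collect every read ID seen by any method.
    read_ids = set()
    for calls in per_method.values():
        read_ids.update(calls)

    consensus: Dict[str, str] = {}
    for read_id in sorted(read_ids):
        # Count how many methods voted for each taxon, ignoring unassigned reads.
        votes = Counter(
            calls[read_id]
            for calls in per_method.values()
            if calls.get(read_id) is not None
        )
        if not votes:
            continue
        taxon, count = votes.most_common(1)[0]
        if count >= min_agree:
            consensus[read_id] = taxon
    return consensus


# Made-up example calls at genus level (illustrative values only).
calls = {
    "blast_lca": {"read1": "Bacteroides", "read2": "Escherichia", "read3": None},
    "phymm": {"read1": "Bacteroides", "read2": "Salmonella", "read3": "Vibrio"},
    "nbc": {"read1": "Bacteroides", "read2": "Escherichia", "read3": "Clostridium"},
}

two_of_three = consensus_assignments(calls, min_agree=2)
all_three = consensus_assignments(calls, min_agree=3)
print(two_of_three)
print(all_three)
```

Raising min_agree to 3 keeps fewer reads but only those on which every method agrees, which mirrors the trade-off between sensitivity and reliability described in the abstract. With three methods and a threshold of two, ties at the threshold cannot occur; with more methods a tie-breaking rule would be needed.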

Collections

Articles (Departament de Medicina i Ciències de la Vida)