Starcode: sequence clustering based on all-pairs search

Zorita, Eduard; Cuscó Pons, Pol, 1987-; Filion, Guillaume

dc.contributor.author Zorita, Eduard
dc.contributor.author Cuscó Pons, Pol, 1987-
dc.contributor.author Filion, Guillaume
dc.date.accessioned 2015-11-11T18:44:57Z
dc.date.available 2015-11-11T18:44:57Z
dc.date.issued 2015
dc.description.abstract Motivation: The increasing throughput of sequencing technologies offers new applications and challenges for computational biology. In many of those applications, sequencing errors need to be corrected. This is particularly important when sequencing reads from an unknown reference such as random DNA barcodes. In this case, error correction can be done by performing a pairwise comparison of all the barcodes, which is a computationally complex problem. Results: Here, we address this challenge and describe an exact algorithm to determine which pairs of sequences lie within a given Levenshtein distance. For error correction or redundancy reduction purposes, matched pairs are then merged into clusters of similar sequences. The efficiency of starcode is attributable to the poucet search, a novel implementation of the Needleman–Wunsch algorithm performed on the nodes of a trie. On the task of matching random barcodes, starcode outperforms sequence clustering algorithms in both speed and precision. Availability and implementation: The C source code is available at http://github.com/gui11aume/starcode.
dc.format.mimetype application/pdf
dc.identifier.citation Zorita E, Cuscó P, Filion GJ. Starcode: sequence clustering based on all-pairs search. Bioinformatics. 2015;31(12):1913-9. DOI: 10.1093/bioinformatics/btv053
dc.identifier.doi http://dx.doi.org/10.1093/bioinformatics/btv053
dc.identifier.issn 1367-4803
dc.identifier.uri http://hdl.handle.net/10230/25055
dc.language.iso eng
dc.publisher Oxford University Press
dc.relation.ispartof Bioinformatics. 2015;31(12):1913-9
dc.rights © The Author 2014. Published by Oxford University Press. This is an open access article published under a Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.rights.uri http://creativecommons.org/licenses/by-nc/4.0/
dc.subject.other Bioinformatics
dc.subject.other Computational biology
dc.title Starcode: sequence clustering based on all-pairs search
dc.type info:eu-repo/semantics/article
dc.type.version info:eu-repo/semantics/publishedVersion
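
The abstract describes two steps: finding all pairs of sequences within a given Levenshtein distance, then merging matched pairs into clusters. The Python sketch below illustrates that idea only. It is a brute-force comparison of every pair and is not starcode's poucet search, which the abstract says runs on the nodes of a trie. The function names `levenshtein` and `cluster`, the union-find merging rule, and the sample barcodes are all assumptions made for illustration.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cluster(seqs, max_dist):
    """Group sequences linked by chains of pairs within max_dist.

    Assumption: clusters are connected components (union-find), a
    simplification chosen for this sketch.
    """
    parent = list(range(len(seqs)))

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Brute-force all-pairs comparison: quadratic in the number of sequences.
    for i, j in combinations(range(len(seqs)), 2):
        if levenshtein(seqs[i], seqs[j]) <= max_dist:
            parent[find(i)] = find(j)

    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical barcodes, made up for this example.
    barcodes = ["ACGTACGT", "ACGTACGA", "ACGTTCGA", "TTTTGGGG", "TTTTGGGC"]
    for group in cluster(barcodes, max_dist=1):
        print(group)
```

Because every pair is compared, the cost of this sketch grows quadratically with the number of sequences, which is the computational burden the abstract refers to.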

Collections

Articles (Center for Genomic Regulation (CRG))
Articles (Departament de Medicina i Ciències de la Vida)