Welcome to the UPF Digital Repository

Assembling genes from predicted exons in linear time with dynamic programming

Show simple item record

dc.contributor.author Guigó Serra, Roderic
dc.date.accessioned 2012-05-24T14:28:16Z
dc.date.available 2012-05-24T14:28:16Z
dc.date.issued 1998
dc.identifier.citation Guigó R. Assembling genes from predicted exons in linear time with dynamic programming. J Comput Biol. 1998;5(4):681-702. DOI: 10.1089/cmb.1998.5.681
dc.identifier.issn 1066-5277
dc.identifier.uri http://hdl.handle.net/10230/16482
dc.description.abstract In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in the query DNA sequence, and candidate genes are assembled from such a pool as sequences of nonoverlapping frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest scoring candidate gene is assumed to be the most likely gene encoded by the query DNA sequence. Considering additive gene scoring functions, currently available algorithms to determine such a highest scoring candidate gene run in time proportional to the square of the number of predicted exons. Here, we present an algorithm whose running time grows only linearly with the size of the set of predicted exons. Polynomial algorithms rely on the fact that, while scanning the set of predicted exons, the highest scoring gene ending in a given exon can be obtained by appending the exon to the highest scoring among the highest scoring genes ending at each compatible preceding exon. The algorithm here relies on the simple fact that such highest scoring gene can be stored and updated. This requires scanning the set of predicted exons simultaneously by increasing acceptor and donor position. On the other hand, the algorithm described here does not assume an underlying gene structure model. Indeed, the definition of valid gene structures is externally defined in the so-called Gene Model. The Gene Model specifies simply which gene features are allowed immediately upstream which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem. In particular it allows for multiple-gene two-strand predictions and for considering gene features other than coding exons (such as promoter elements) in valid gene structures.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher Mary Ann Liebert, Inc
dc.relation.ispartof Journal of Computational Biology. 1998;5(4):681-702
dc.rights This is a copy of an article published in the Journal of Computational Biology © 1998 Mary Ann Liebert, Inc.; Journal of Computational Biology is available online at: http://www.liebertonline.com
dc.subject.other Seqüències de nucleòtids
dc.subject.other Exons (Genètica)
dc.subject.other Genomes -- Anàlisi
dc.title Assembling genes from predicted exons in linear time with dynamic programming
dc.type info:eu-repo/semantics/article
dc.identifier.doi http://dx.doi.org/10.1089/cmb.1998.5.681
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.type.version info:eu-repo/semantics/publishedVersion


This item appears in the following Collection(s)

Show simple item record

Search DSpace

Advanced Search


My Account


In collaboration with Compliant to Partaking