Welcome to the UPF Digital Repository

Estimation of protein coding density in a corpus of DNA sequence data

dc.contributor.author Fickett, James W.
dc.contributor.author Guigó Serra, Roderic
dc.date.accessioned 2011-11-28T11:08:06Z
dc.date.available 2011-11-28T11:08:06Z
dc.date.issued 1993
dc.identifier.citation Fickett J W, Guigó R. Estimation of protein coding density in a corpus of DNA sequence data. Nucleic acids research. 1993;21(12):2837-44. DOI: 10.1093/nar/21.12.2837
dc.identifier.issn 0305-1048
dc.identifier.uri http://hdl.handle.net/10230/13149
dc.description.abstract A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a ‘coding statistic’ is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C. elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
dc.description.sponsorship This work was performed under the auspices of the U.S. Department of Energy and the Los Alamos Center for Human Genome Studies, with funding by DOE/OHER grant ERFWPF116, and LANL grant X16K.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher Oxford University Press
dc.relation.ispartof Nucleic acids research. 1993;21(12):2837-44
dc.rights © Oxford University Press. Published article is available online at http://nar.oxfordjournals.org/content/21/12/2837
dc.subject.other Nucleotides -- Analysis
dc.subject.other Genomics
dc.title Estimation of protein coding density in a corpus of DNA sequence data
dc.type info:eu-repo/semantics/article
dc.identifier.doi http://dx.doi.org/10.1093/nar/21.12.2837
dc.subject.keyword Animals
dc.subject.keyword Humans
dc.subject.keyword Cosmids
dc.subject.keyword Proteins
dc.subject.keyword DNA
dc.subject.keyword Caenorhabditis elegans
dc.subject.keyword Codon
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.type.version info:eu-repo/semantics/publishedVersion
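
The abstract describes decomposing the distribution of a per-window coding statistic into two normal distributions. The following is a minimal sketch of that idea, not the authors' implementation: it assumes the statistic has already been computed for each window (the record does not specify the statistic), it fits the mixture with a plain expectation-maximization (EM) loop, which may differ from the fitting procedure used in the paper, and it assumes that higher values of the statistic indicate coding windows. The function names `normal_pdf` and `fit_two_normals` and the synthetic data are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of the normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_normals(values, iterations=200):
    """Fit a two-component normal mixture to `values` by EM.

    Returns two (weight, mean, sd) tuples, ordered by increasing mean.
    """
    data = sorted(values)
    n = len(data)
    if n < 4:
        raise ValueError("need at least four values")

    # Crude initialisation: split the sorted data in half.
    half = n // 2
    means = [sum(data[:half]) / half, sum(data[half:]) / (n - half)]
    centre = sum(data) / n
    spread = math.sqrt(sum((x - centre) ** 2 for x in data) / n)
    sds = [max(spread, 1e-6)] * 2
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: probability that each value belongs to component 1.
        resp = []
        for x in data:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            total = p0 + p1
            resp.append(p1 / total if total > 0.0 else 0.5)

        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break  # one component has collapsed; keep the last estimate

        # M-step: re-estimate means, standard deviations, and weights.
        means = [
            sum((1.0 - r) * x for r, x in zip(resp, data)) / n0,
            sum(r * x for r, x in zip(resp, data)) / n1,
        ]
        var0 = sum((1.0 - r) * (x - means[0]) ** 2 for r, x in zip(resp, data)) / n0
        var1 = sum(r * (x - means[1]) ** 2 for r, x in zip(resp, data)) / n1
        sds = [max(math.sqrt(var0), 1e-6), max(math.sqrt(var1), 1e-6)]
        weights = [n0 / n, n1 / n]

    return sorted(zip(weights, means, sds), key=lambda c: c[1])


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for per-window statistics (assumed values, not real data):
    # 30% of windows drawn from a "coding" normal, 70% from a "noncoding" normal.
    windows = [rng.gauss(2.0, 0.5) for _ in range(600)]
    windows += [rng.gauss(0.0, 0.5) for _ in range(1400)]
    noncoding, coding = fit_two_normals(windows)
    print("estimated coding fraction of windows:", round(coding[0], 3))
```

Under these assumptions, the mixing weight of the higher-mean component serves as a rough estimate of the fraction of windows that are coding. Converting that fraction into a coding density in base pairs, and attaching error bounds to it, would require details from the paper that this record does not give.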