Improving genome-wide scans of positive selection by using protein isoforms of similar length

dc.contributor.author Villanueva Cañas, José Luis, 1984-
dc.contributor.author Laurie, Steven, 1973-
dc.contributor.author Albà Soler, Mar
dc.date.accessioned 2016-12-02T09:08:07Z
dc.date.available 2016-12-02T09:08:07Z
dc.date.issued 2013
dc.identifier.citation Villanueva-Cañas JL, Laurie S, Albà MM. Improving genome-wide scans of positive selection by using protein isoforms of similar length. Genome Biol Evol. 2013;5(2):457-67. DOI: 10.1093/gbe/evt017
dc.identifier.issn 1759-6653
dc.identifier.uri http://hdl.handle.net/10230/27682
dc.description.abstract Large-scale evolutionary studies often require the automated construction of alignments of a large number of homologous gene families. The majority of eukaryotic genes can produce different transcripts due to alternative splicing or transcription initiation, and many such transcripts encode different protein isoforms. As analyses tend to be gene centered, one single-protein isoform per gene is selected for the alignment, with the de facto approach being to use the longest protein isoform per gene (Longest), presumably to avoid including partial sequences and to maximize sequence information. Here, we show that this approach is problematic because it increases the number of indels in the alignments due to the inclusion of nonhomologous regions, such as those derived from species-specific exons, increasing the number of misaligned positions. With the aim of ameliorating this problem, we have developed a novel heuristic, Protein ALignment Optimizer (PALO), which, for each gene family, selects the combination of protein isoforms that are most similar in length. We examine several evolutionary parameters inferred from alignments in which the only difference is the method used to select the protein isoform combination: Longest, PALO, the combination that results in the highest sequence conservation, and a randomly selected combination. We observe that Longest tends to overestimate both nonsynonymous and synonymous substitution rates when compared with PALO, which is most likely due to an excess of misaligned positions. The estimation of the fraction of genes that have experienced positive selection by maximum likelihood is very sensitive to the method of isoform selection employed, both when alignments are constructed with MAFFT and with Prank(+F). 
Longest performs better than a random combination but still estimates up to 3 times more positively selected genes than the combination showing the highest conservation, indicating the presence of many false positives. We show that PALO can eliminate the majority of such false positives and thus that it is a more appropriate approach for large-scale analyses than Longest. A web server has been set up to facilitate the use of PALO given a user-defined set of gene families; it is available at http://evolutionarygenomics.imim.es/palo.
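The selection idea described in the abstract (for each gene family, pick one isoform per gene so that the chosen isoforms are as similar in length as possible) can be illustrated with a minimal sketch. This is not the authors' PALO implementation: the function names, the input layout, and the use of "longest minus shortest length" as the similarity score are all assumptions made for illustration, and the search is a plain brute force over every combination.

```python
from itertools import product


def select_similar_length(family):
    """Pick one isoform per gene so that the chosen lengths are as close as possible.

    `family` maps a gene name to a list of (isoform_id, protein_length) tuples.
    The score used here (max length minus min length) is an illustrative
    assumption, not necessarily the score used by PALO.
    """
    if not family or any(not isoforms for isoforms in family.values()):
        raise ValueError("every gene needs at least one isoform")

    genes = sorted(family)  # fixed gene order so results are reproducible
    best_combo, best_spread = None, None

    # Brute force: try every combination of one isoform per gene.
    for combo in product(*(family[g] for g in genes)):
        lengths = [length for _, length in combo]
        spread = max(lengths) - min(lengths)
        if best_spread is None or spread < best_spread:
            best_combo, best_spread = combo, spread

    selection = {g: iso_id for g, (iso_id, _) in zip(genes, best_combo)}
    return selection, best_spread


def select_longest(family):
    """Baseline from the abstract: the longest isoform of each gene."""
    return {
        gene: max(isoforms, key=lambda iso: iso[1])[0]
        for gene, isoforms in family.items()
    }


# Hypothetical toy family: three orthologous genes with made-up isoform lengths.
family = {
    "human": [("h1", 500), ("h2", 350)],
    "mouse": [("m1", 340), ("m2", 120)],
    "dog": [("d1", 355)],
}

print(select_similar_length(family))
print(select_longest(family))
```

In this toy family the longest-isoform baseline keeps the 500-residue human isoform, while the length-matched selection prefers the 350-residue one, which is closer to the mouse and dog proteins. The brute-force search grows multiplicatively with the number of isoforms per gene, so it is only meant to show the idea on small families.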
dc.description.sponsorship This work was funded by Ministerio de Economía y Competitividad (FPI BES-2010-038494 to J.L.V.-C., Plan Nacional BIO2009-08160 and BFU2012-36820) and Fundació ICREA to M.M.A.
dc.format.mimetype application/pdf
dc.language.iso eng
dc.publisher Oxford University Press
dc.relation.ispartof Genome Biology and Evolution. 2013;5(2):457-67
dc.rights © José Luis Villanueva-Cañas, Steve Laurie and M. Mar Albà. 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of a Creative Commons Attribution License
dc.rights.uri http://creativecommons.org/licenses/by-nc/3.0/
dc.subject.other Evolució molecular
dc.subject.other Selecció natural
dc.subject.other Genètica
dc.subject.other Proteïnes
dc.title Improving genome-wide scans of positive selection by using protein isoforms of similar length
dc.type info:eu-repo/semantics/article
dc.identifier.doi http://dx.doi.org/10.1093/gbe/evt017
dc.relation.projectID info:eu-repo/grantAgreement/ES/3PN/BES2010-038494
dc.relation.projectID info:eu-repo/grantAgreement/ES/3PN/BIO2009-08160
dc.relation.projectID info:eu-repo/grantAgreement/ES/3PN/BFU2012-36820
dc.rights.accessRights info:eu-repo/semantics/openAccess
dc.type.version info:eu-repo/semantics/publishedVersion
