ChimPipe: Accurate detection of fusion genes and transcription-induced chimeras from RNA-seq data
- dc.contributor.author Rodríguez Martín, Bernardo
- dc.contributor.author Palumbo, Emilio
- dc.contributor.author Marco Sola, Santiago
- dc.contributor.author Griebel, Thasso
- dc.contributor.author Ribeca, Paolo
- dc.contributor.author Alonso, Graciela
- dc.contributor.author Rastrojo, Alberto
- dc.contributor.author Aguado, Begoña
- dc.contributor.author Guigó Serra, Roderic
- dc.contributor.author Djebali, Sarah
- dc.date.accessioned 2017-03-17T10:38:51Z
- dc.date.available 2017-03-17T10:38:51Z
- dc.date.issued 2017
- dc.description.abstract Background: Chimeric transcripts are commonly defined as transcripts linking two or more different genes in the genome. They can be explained by various biological mechanisms, such as genomic rearrangement, read-through or trans-splicing, but also by technical or biological artefacts. Several studies have shown their importance in cancer, cell pluripotency and motility. Many programs have recently been developed to identify chimeras from Illumina RNA-seq data (mostly fusion genes in cancer). However, the outputs of different programs on the same dataset can be widely inconsistent and tend to include many false positives. Other issues relate to simulated datasets restricted to fusion genes, real datasets with limited numbers of validated cases, result inconsistencies between simulated and real datasets, and gene-level rather than junction-level assessment. Results: Here we present ChimPipe, a modular and easy-to-use method to reliably identify fusion genes and transcription-induced chimeras from paired-end Illumina RNA-seq data. We have also produced realistic simulated datasets for three different read lengths, and enhanced two gold-standard cancer datasets by associating exact junction points to validated gene fusions. Benchmarking ChimPipe together with four other state-of-the-art tools on these data showed ChimPipe to be the top program at identifying exact junction coordinates for both kinds of datasets, and the one showing the best trade-off between sensitivity and precision. Applied to 106 ENCODE human RNA-seq datasets, ChimPipe identified 137 high-confidence chimeras connecting the protein-coding sequences of their parent genes. In subsequent experiments, three out of four predicted chimeras, two of which are recurrently expressed in a large majority of the samples, could be validated. Cloning and sequencing of the three cases revealed several new chimeric transcript structures, three of which have the potential to encode a chimeric protein for which we hypothesized a new role. Applying ChimPipe to human and mouse ENCODE RNA-seq data led to the identification of 131 recurrent chimeras common to both species, and therefore potentially conserved. Conclusions: ChimPipe combines discordant paired-end reads and split reads to detect any kind of chimera, including those originating from polymerase read-through, and shows an excellent trade-off between sensitivity and precision. The chimeras found by ChimPipe can be validated in vitro with high accuracy.
- dc.description.sponsorship This project was supported by Award Number 1U54HG007004-01 from the National Human Genome Research Institute of the National Institutes of Health, by Obra Social Fundación 'la Caixa' under the Severo Ochoa 2014 program, by grant BIO2011-26205 from the Spanish Ministry of Economy and Competitiveness (MINECO), Centro de Excelencia Severo Ochoa 2013-2017 (SEV-2012-0208), and by grant BFU2009-09117 from the Spanish Ministry of Science and Education (MICINN). This publication has also been written with the support of the Agreenskills fellowship program, which has received funding from the EU's Seventh Framework Program under grant agreement No. FP7-609398, and with the support of an institutional grant from Fundación Ramón Areces attributed to CBMSO.
- dc.format.mimetype application/pdf
- dc.identifier.citation Rodríguez Martín B, Palumbo E, Marco Sola S, Griebel T, Ribeca P, Alonso G, Rastrojo A, Aguado B, Guigó Serra R, Djebali S. ChimPipe: Accurate detection of fusion genes and transcription-induced chimeras from RNA-seq data. BMC Genomics. 2017;18(1):7. DOI: 10.1186/s12864-016-3404-9
- dc.identifier.doi http://dx.doi.org/10.1186/s12864-016-3404-9
- dc.identifier.issn 1471-2164
- dc.identifier.uri http://hdl.handle.net/10230/28254
- dc.language.iso eng
- dc.publisher BioMed Central
- dc.relation.ispartof BMC Genomics. 2017;18(1):7
- dc.relation.projectID info:eu-repo/grantAgreement/ES/3PN/BIO2011-26205
- dc.relation.projectID info:eu-repo/grantAgreement/ES/3PN/SEV2012-0208
- dc.rights © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri http://creativecommons.org/licenses/by/4.0/
- dc.subject.keyword Chimera
- dc.subject.keyword Transcript
- dc.subject.keyword Fusion gene
- dc.subject.keyword RNA-seq
- dc.subject.keyword Benchmark
- dc.subject.keyword Cancer
- dc.subject.keyword Simulation
- dc.subject.keyword Isoform
- dc.subject.keyword Splice junction
- dc.title ChimPipe: Accurate detection of fusion genes and transcription-induced chimeras from RNA-seq data
- dc.type info:eu-repo/semantics/article
- dc.type.version info:eu-repo/semantics/publishedVersion