A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis

  • dc.contributor.author Zhang, Runxuan
  • dc.contributor.author Marquez, Yamile
  • dc.contributor.author Brown, John W. S.
  • dc.date.accessioned 2022-10-28T06:50:17Z
  • dc.date.available 2022-10-28T06:50:17Z
  • dc.date.issued 2022
  • dc.description.abstract Background: Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures, including alternative splicing and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis. Results: We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts, twice as many as the best current Arabidopsis transcriptome, and includes over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes, as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage. Conclusions: AtRTD3 is currently the most comprehensive Arabidopsis transcriptome. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.
  • dc.format.mimetype application/pdf
  • dc.identifier.citation Zhang R, Kuo R, Coulter M, Calixto CPG, Entizne JC, Guo W et al. A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis. Genome Biol. 2022 Jul 7;23(1):149. DOI: 10.1186/s13059-022-02711-0
  • dc.identifier.doi http://dx.doi.org/10.1186/s13059-022-02711-0
  • dc.identifier.issn 1474-7596
  • dc.identifier.uri http://hdl.handle.net/10230/54635
  • dc.language.iso eng
  • dc.publisher BioMed Central
  • dc.relation.ispartof Genome Biol. 2022 Jul 7;23(1):149
  • dc.rights © The Author(s). 2022 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
  • dc.rights.accessRights info:eu-repo/semantics/openAccess
  • dc.rights.uri http://creativecommons.org/licenses/by/4.0/
  • dc.subject.keyword Alternative polyadenylation
  • dc.subject.keyword Alternative splicing
  • dc.subject.keyword Arabidopsis
  • dc.subject.keyword Iso-seq
  • dc.subject.keyword Reference transcript dataset
  • dc.subject.keyword Splice junction
  • dc.subject.keyword Transcription start and end sites
  • dc.title A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis
  • dc.type info:eu-repo/semantics/article
  • dc.type.version info:eu-repo/semantics/publishedVersion