k-mer manifold approximation and projection for visualizing DNA sequences
- dc.contributor.author Fu, Chengbo
- dc.contributor.author Niskanen, Einari A.
- dc.contributor.author Wei, Gong-Hong
- dc.contributor.author Yang, Zhirong
- dc.contributor.author Sanvicente-García, Marta
- dc.contributor.author Güell Cargol, Marc, 1982-
- dc.contributor.author Cheng, Lu
- dc.date.accessioned 2025-06-16T05:55:00Z
- dc.date.available 2025-06-16T05:55:00Z
- dc.date.issued 2025
- dc.description.abstract Identifying and illustrating patterns in DNA sequences are crucial tasks in various biological data analyses. In this task, patterns are often represented by sets of k-mers, the fundamental building blocks of DNA sequences. To visually unveil these patterns, one could project each k-mer onto a point in two-dimensional (2D) space. However, this projection poses challenges owing to the high-dimensional nature of k-mers and their unique mathematical properties. Here, we establish a mathematical system to address the peculiarities of the k-mer manifold. Leveraging this k-mer manifold theory, we develop a statistical method named KMAP for detecting k-mer patterns and visualizing them in 2D space. We applied KMAP to three distinct data sets to showcase its utility. KMAP achieves a comparable performance to the classical method MEME, with ∼90% similarity in motif discovery from HT-SELEX data. In the analysis of H3K27ac ChIP-seq data from Ewing sarcoma (EWS), we find that BACH1, OTX2, and KCNH2 might affect EWS prognosis by binding to promoter and enhancer regions across the genome. We also observe potential colocalization of BACH1, OTX2, and the motif CCCAGGCTGGAGTGC in ∼70 bp windows in the enhancer regions. Furthermore, we find that FLI1 binds to the enhancer regions after ETV6 degradation, indicating competitive binding between ETV6 and FLI1. Moreover, KMAP identifies four prevalent patterns in gene editing data of the AAVS1 locus, aligning with findings reported in the literature. These applications underscore that KMAP can be a valuable tool across various biological contexts.
- dc.format.mimetype application/pdf
- dc.identifier.citation Fu C, Niskanen EA, Wei GH, Yang Z, Sanvicente-García M, Güell M, et al. k-mer manifold approximation and projection for visualizing DNA sequences. Genome Res. 2025 May 2;35(5):1234-46. DOI: 10.1101/gr.279458.124
- dc.identifier.doi http://dx.doi.org/10.1101/gr.279458.124
- dc.identifier.issn 1088-9051
- dc.identifier.uri http://hdl.handle.net/10230/70692
- dc.language.iso eng
- dc.publisher Cold Spring Harbor Laboratory Press (CSHL Press)
- dc.relation.ispartof Genome Res. 2025 May 2;35(5):1234-46
- dc.rights © 2025 Fu et al.; Published by Cold Spring Harbor Laboratory Press. This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri http://creativecommons.org/licenses/by/4.0/
- dc.subject.other Nucleotide sequence
- dc.title k-mer manifold approximation and projection for visualizing DNA sequences
- dc.type info:eu-repo/semantics/article
- dc.type.version info:eu-repo/semantics/publishedVersion