robustica: customizable robust independent component analysis
- dc.contributor.author Anglada-Girotto, Miquel
- dc.contributor.author Miravet Verde, Samuel, 1992-
- dc.contributor.author Serrano Pubull, Luis, 1982-
- dc.contributor.author Head, Sarah A.
- dc.date.accessioned 2023-02-23T07:03:31Z
- dc.date.available 2023-02-23T07:03:31Z
- dc.date.issued 2022
- dc.description.abstract Background: Independent Component Analysis (ICA) allows the dissection of omic datasets into modules that help to interpret global molecular signatures. The inherent randomness of this algorithm can be overcome by clustering many iterations of ICA together to obtain robust components. Existing algorithms for robust ICA are dependent on the choice of clustering method and on computing a potentially biased and large Pearson distance matrix. Results: We present robustica, a Python-based package to compute robust independent components with a fully customizable clustering algorithm and distance metric. Here, we exploited its customizability to revisit and optimize robust ICA systematically. Of the 6 popular clustering algorithms considered, DBSCAN performed the best at clustering independent components across ICA iterations. To enable using Euclidean distances, we created a subroutine that infers and corrects the components' signs across ICA iterations. Our subroutine increased the resolution, robustness, and computational efficiency of the algorithm. Finally, we show the applicability of robustica by dissecting over 500 tumor samples from low-grade glioma (LGG) patients, where we define two new gene expression modules with key modulators of tumor progression upon IDH1 and TP53 mutagenesis. Conclusion: robustica brings precise, efficient, and customizable robust ICA into the Python toolbox. Through its customizability, we explored how different clustering algorithms and distance metrics can further optimize robust ICA. Then, we showcased how robustica can be used to discover gene modules associated with combinations of features of biological interest. Taken together, given the broad applicability of ICA for omic data analysis, we envision robustica will facilitate the seamless computation and integration of robust independent components in large pipelines.
- dc.description.sponsorship This project was funded by grants from the Plan Estatal de Investigación Científica y Técnica y de Innovación to L.S. (PGC2018-101271-B-I00 and PID2021-122341NB-I00, http://www.ciencia.gob.es). We also acknowledge the support of the Spanish Ministry of Science and Innovation to the EMBL partnership, the Centro de Excelencia Severo Ochoa, and the CERCA Programme/Generalitat de Catalunya. These funding bodies have not participated in the design of the study, collection, analysis, and interpretation of the data, nor in writing the manuscript.
- dc.format.mimetype application/pdf
- dc.identifier.citation Anglada-Girotto M, Miravet-Verde S, Serrano L, Head SA. robustica: customizable robust independent component analysis. BMC Bioinformatics. 2022 Dec 5;23(1):519. DOI: 10.1186/s12859-022-05043-9
- dc.identifier.doi http://dx.doi.org/10.1186/s12859-022-05043-9
- dc.identifier.issn 1471-2105
- dc.identifier.uri http://hdl.handle.net/10230/55865
- dc.language.iso eng
- dc.publisher BioMed Central
- dc.relation.ispartof BMC Bioinformatics. 2022 Dec 5;23(1):519
- dc.relation.projectID info:eu-repo/grantAgreement/ES/2PE/PGC2018-101271-B-I00
- dc.relation.projectID info:eu-repo/grantAgreement/ES/3PE/PID2021-122341NB-I00
- dc.rights © The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
- dc.rights.accessRights info:eu-repo/semantics/openAccess
- dc.rights.uri http://creativecommons.org/licenses/by/4.0/
- dc.subject.keyword Bioinformatics
- dc.subject.keyword Clustering
- dc.subject.keyword Independent component analysis
- dc.subject.keyword Low-grade glioma
- dc.subject.keyword Python
- dc.subject.keyword Unsupervised learning
- dc.title robustica: customizable robust independent component analysis
- dc.type info:eu-repo/semantics/article
- dc.type.version info:eu-repo/semantics/publishedVersion
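
The abstract mentions a subroutine that infers and corrects component signs across ICA iterations so that Euclidean distances can be used for clustering. The snippet below is only a minimal sketch of why such a correction matters. It is not the robustica implementation. It assumes a simple heuristic (flip each component so that its largest-magnitude weight is positive), and the function name `align_signs` and the simulated data are hypothetical.

```python
import numpy as np


def align_signs(components):
    """Flip each component (row) so its largest-magnitude weight is positive.

    Assumed heuristic for illustration only; the published method may differ.
    """
    components = np.asarray(components, dtype=float)
    rows = np.arange(components.shape[0])
    # Column index of the largest absolute weight in each component.
    idx = np.argmax(np.abs(components), axis=1)
    signs = np.sign(components[rows, idx])
    signs[signs == 0] = 1.0  # leave all-zero components untouched
    return components * signs[:, None]


# Simulated data: three hypothetical components over 50 features.
rng = np.random.default_rng(0)
base = rng.laplace(size=(3, 50))

# Two "ICA runs" that recover the same components with arbitrary signs.
run_a = base * np.array([1.0, -1.0, 1.0])[:, None]
run_b = base * np.array([-1.0, -1.0, 1.0])[:, None]

# Before alignment, the first component looks far apart in Euclidean space.
dist_before = np.linalg.norm(run_a[0] - run_b[0])

# After alignment, matching components coincide regardless of original sign.
aligned_a = align_signs(run_a)
aligned_b = align_signs(run_b)
dist_after = np.linalg.norm(aligned_a[0] - aligned_b[0])

print(f"distance before: {dist_before:.3f}, after: {dist_after:.3f}")
```

Once signs are consistent, components from many runs can be passed to a Euclidean-distance clustering step such as the DBSCAN approach the abstract reports as performing best.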