Amalgams: data-driven amalgamation for the dimensionality reduction of compositional data

dc.contributor.author: Quinn, Thomas P.
dc.contributor.author: Erb, Ionas
dc.date.accessioned: 2022-05-18T10:13:55Z
dc.date.available: 2022-05-18T10:13:55Z
dc.date.issued: 2020
dc.description.abstract: Many next-generation sequencing datasets contain only relative information because of biological and technical factors that limit the total number of transcripts observed for a given sample. It is not possible to interpret any one component in isolation. The field of compositional data analysis has emerged with alternative methods for relative data based on log-ratio transforms. However, these data often contain many more features than samples, and thus require creative new ways to reduce the dimensionality of the data. The summation of parts, called amalgamation, is a practical way of reducing dimensionality, but can introduce a non-linear distortion to the data. We exploit this non-linearity to propose a powerful yet interpretable dimension reduction method called data-driven amalgamation. Our new method, implemented in the user-friendly R package amalgam, can reduce the dimensionality of compositional data by finding amalgamations that optimally (i) preserve the distance between samples, or (ii) classify samples as diseased or not. Our benchmark on 13 real datasets confirms that these amalgamations compete with state-of-the-art methods in terms of performance, but result in new features that are easily understood: they are groups of parts added together.
dc.format.mimetype: application/pdf
dc.identifier.citation: Quinn TP, Erb I. Amalgams: data-driven amalgamation for the dimensionality reduction of compositional data. NAR Genom Bioinform. 2020 Oct 2;2(4):lqaa076. DOI:10.1093/nargab/lqaa076
dc.identifier.doi: http://dx.doi.org/10.1093/nargab/lqaa076
dc.identifier.issn: 2631-9268
dc.identifier.uri: http://hdl.handle.net/10230/53139
dc.language.iso: eng
dc.publisher: Oxford University Press
dc.rights: © Thomas P. Quinn and Ionas Erb 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited
dc.rights.accessRights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject.other: Data analysis
dc.subject.other: Amalgamation
dc.subject.other: Sampling
dc.title: Amalgams: data-driven amalgamation for the dimensionality reduction of compositional data
dc.type: info:eu-repo/semantics/article
dc.type.version: info:eu-repo/semantics/publishedVersion
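
The abstract defines amalgamation as the summation of parts. As a rough illustration only, the Python sketch below sums hand-picked groups of parts in a toy composition. The data, the grouping, and the function names `closure` and `amalgamate` are hypothetical and are not taken from the R package amalgam; the data-driven method described in the abstract searches for the grouping, whereas this sketch fixes it by hand.

```python
import numpy as np

def closure(x):
    """Rescale each row so its parts sum to 1 (a composition)."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def amalgamate(x, groups):
    """Sum the parts (columns) listed in each group into one new part."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x[:, idx].sum(axis=1) for idx in groups])

# Hypothetical toy data: 2 samples (rows), 4 parts (columns).
counts = np.array([[10, 20, 30, 40],
                   [5, 5, 45, 45]])
comp = closure(counts)

# Hypothetical grouping chosen by hand: parts 0 and 1 form one amalgam,
# parts 2 and 3 form the other.
groups = [[0, 1], [2, 3]]
amalgams = amalgamate(comp, groups)
```

Each row of `amalgams` is still a composition that sums to 1, now with two parts instead of four, and each new feature is simply a named group of original parts added together.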

Files

Original bundle

Name: Quinn_2020.pdf
Size: 2.36 MB
Format: Adobe Portable Document Format
