HypercubeME: two hundred million combinatorially complete datasets from a single experiment

Authors: Esteban, Laura Avino; Lonishin, Lyubov R.; Bobrovskiy, Daniil; Leleytner, Gregory; Bogatyreva, Natalya S.; Kondrashov, Fyodor A., 1979-; Ivankov, Dmitry N.
Date deposited: 2020-11-24. Date available: 2020-11-24. Date issued: 2020.
Citation: Esteban LA, Lonishin LR, Bobrovskiy D, Leleytner G, Bogatyreva NS, Kondrashov FA, Ivankov DN. HypercubeME: two hundred million combinatorially complete datasets from a single experiment. Bioinformatics. 2020; 36(6):1960-2. DOI: 10.1093/bioinformatics/btz841
ISSN: 1367-4803
Handle: http://hdl.handle.net/10230/45879

Motivation: Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants, and the double mutant carrying both single mutations. For higher-order epistasis of order n, fitness must be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space, which together form a "combinatorially complete dataset". So far, only a handful of such datasets have been produced, all by manual curation. Meanwhile, random mutagenesis experiments have measured fitness and other phenotypes in a high-throughput manner, and their data may contain many combinatorially complete datasets.
Results: We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations, with dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data.
Availability: https://github.com/ivankovlab/HypercubeME.git
Supplementary information: Supplementary data are available at Bioinformatics online.

Format: application/pdf. Language: English.
© The Author(s) 2019. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Type: article
Publisher version: http://dx.doi.org/10.1093/bioinformatics/btz841
Subject: Genetics and population analysis
Access rights: open access
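The hypercube search summarized under Results can be illustrated with a small Python sketch. It is not the HypercubeME implementation from the repository listed above, and its helper names (merge_step, find_hypercubes, vertices) are hypothetical. It assumes genotypes are equal-length strings of single-character alleles and encodes a hypercube as a tuple with one entry per position: a plain allele where the cube is fixed, or a two-allele frozenset where it varies. Starting from the measured genotypes as 0-dimensional cubes, it builds each (n+1)-dimensional hypercube from two complete n-dimensional ones that differ only at one fixed position, so every reported hypercube has all 2^n of its genotypes present in the data.

```python
from collections import defaultdict
from itertools import combinations, product

# Hypothetical encoding (not HypercubeME's own format): a genotype is a string of
# single-character alleles; a hypercube is a tuple with one entry per position,
# either a str (allele fixed across the cube) or a frozenset of two alleles
# (a dimension along which the cube varies).


def merge_step(cubes):
    """Build every (n+1)-dimensional cube from pairs of complete n-dimensional cubes.

    Two n-cubes that agree everywhere except at one fixed position q, where they
    carry different alleles, together cover all 2**(n+1) vertices of a larger cube.
    """
    groups = defaultdict(set)  # cube with position q masked by None -> alleles seen at q
    for cube in cubes:
        for q, site in enumerate(cube):
            if isinstance(site, str):  # only fixed positions can become a new dimension
                masked = cube[:q] + (None,) + cube[q + 1:]
                groups[masked].add(site)
    merged = set()  # a set removes the repeated derivations of the same new cube
    for masked, alleles in groups.items():
        q = masked.index(None)
        for a, b in combinations(sorted(alleles), 2):
            merged.add(masked[:q] + (frozenset((a, b)),) + masked[q + 1:])
    return merged


def find_hypercubes(genotypes):
    """Return {dimension: set of cubes} for all complete hypercubes of dimension >= 2."""
    current = {tuple(g) for g in genotypes}  # 0-dimensional cubes: the genotypes
    found, dim = {}, 0
    while current:
        current = merge_step(current)
        dim += 1
        if dim >= 2 and current:
            found[dim] = current
    return found


def vertices(cube):
    """Return the 2**n genotypes spanned by a cube, as strings."""
    options = [sorted(s) if isinstance(s, frozenset) else [s] for s in cube]
    return {"".join(choice) for choice in product(*options)}


# Toy data: all eight genotypes over alleles A/B at three positions, plus one extra.
genotypes = ["".join(p) for p in product("AB", repeat=3)] + ["CAA"]
cubes = find_hypercubes(genotypes)
print({dim: len(found) for dim, found in cubes.items()})  # {2: 6, 3: 1}
```

Grouping cubes by a masked signature avoids comparing every pair of cubes directly. The extra genotype CAA forms edges with AAA and BAA but no square, because CBA and CAB were never measured. On data of the size reported above, the number of intermediate cubes would likely make memory, not correctness, the practical limit of this naive sketch.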