Accurate models describing the relationship between genotype and phenotype are necessary in order to understand and predict how mutations to biological sequences affect the fitness and evolution of living organisms. The apparent abundance of epistasis (genetic interactions), both between and within genes, complicates this task and how to build mechanistic models that incorporate epistatic coefficients (genetic interaction terms) is an open question. The Walsh-Hadamard transform represents a rigorous ...
Accurate models describing the relationship between genotype and phenotype are necessary in order to understand and predict how mutations to biological sequences affect the fitness and evolution of living organisms. The apparent abundance of epistasis (genetic interactions), both between and within genes, complicates this task and how to build mechanistic models that incorporate epistatic coefficients (genetic interaction terms) is an open question. The Walsh-Hadamard transform represents a rigorous computational framework for calculating and modeling epistatic interactions at the level of individual genotypic values (known as genetical, biological or physiological epistasis), and can therefore be used to address fundamental questions related to sequence-to-function encodings. However, one of its main limitations is that it can only accommodate two alleles (amino acid or nucleotide states) per sequence position. In this paper we provide an extension of the Walsh-Hadamard transform that allows the calculation and modeling of background-averaged epistasis (also known as ensemble epistasis) in genetic landscapes with an arbitrary number of states per position (20 for amino acids, 4 for nucleotides, etc.). We also provide a recursive formula for the inverse matrix and then derive formulae to directly extract any element of either matrix without having to rely on the computationally intensive task of constructing or inverting large matrices. Finally, we demonstrate the utility of our theory by using it to model epistasis within both simulated and empirical multiallelic fitness landscapes, revealing that both pairwise and higher-order genetic interactions are enriched between physically interacting positions.
+