Protein language models are able to predict epistasis in a zero-shot setting following a nonlinear transform

Summary

Protein language models (PLMs) can predict epistasis in a zero-shot setting, but their outputs must be nonlinearly transformed to achieve meaningful accuracy (1). The study used the ESM2 model and found the same scaling dependency observed in the prediction of other properties.

Details

Epistasis is defined as $\varepsilon^{e} = \log f_{AB}^{e} - (\log f_{A}^{e} + \log f_{B}^{e})$, where $\log f_{AB}^{e}$ is the experimental fitness of the double mutant and $\log f_{A}^{e}$ and $\log f_{B}^{e}$ are the experimental fitnesses of the single mutants $A$ and $B$. The nonlinear transform $\phi_{1}(x) = -\log(1 + \exp(-b_{1}(x + c_{1})))$ is applied with fit parameters $b_{1}$ and $c_{1}$, where $x$ is the log-likelihood prediction from the PLM and $\phi_{1}(x)$ is the transformed prediction.
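
The two formulas above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the implementation from (1): the function names `epistasis` and `phi1` are hypothetical, and the values of `b1` and `c1` in the usage lines are placeholders rather than fitted parameters.

```python
import numpy as np


def epistasis(log_f_ab, log_f_a, log_f_b):
    """Epistasis: double-mutant log fitness minus the sum of the single-mutant log fitnesses."""
    return log_f_ab - (log_f_a + log_f_b)


def phi1(x, b1, c1):
    """Nonlinear transform phi_1(x) = -log(1 + exp(-b1 * (x + c1))).

    np.logaddexp(0, z) evaluates log(1 + exp(z)) without overflow for large z.
    """
    x = np.asarray(x, dtype=float)
    return -np.logaddexp(0.0, -b1 * (x + c1))


# Placeholder inputs (assumed for illustration only, not data from the study).
eps = epistasis(log_f_ab=-1.0, log_f_a=-0.4, log_f_b=-0.3)
transformed = phi1(np.array([-8.0, -2.0, 0.0]), b1=1.0, c1=2.0)
```

The transform is a negated softplus: it is always negative, approaches 0 for large `x`, and becomes approximately linear in `x` for very negative `x`, so it saturates high log-likelihoods while leaving low ones nearly unchanged in shape.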

Figures

Ref (1)

Nambiar A, Littlefield SB, Cuellar C, Khorana R, Maslov S. Protein Language Models Capture Structural and Functional Epistasis in a Zero-Shot Setting. openRxiv; 2025. Available from: https://doi.org/10.1101/2025.09.14.676130
