Summary

Protein language models (PLMs) give poor zero-shot variant effect predictions for sequences to which they assign low log-likelihood (1,2). Fine-tuning can mitigate this (1). The opposite holds for a fine-tuned model: sequences with high log-likelihood yield poorer predictions. However, this correlation between log-likelihood and prediction quality breaks down as PLMs get larger.
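As a minimal sketch of the two quantities in the summary, the snippet below computes a sequence log-likelihood and a zero-shot score for a single substitution. It assumes the model output is available as per-position log-probabilities over the 20 standard amino acids. No real PLM is called: `toy_log_probs` is a hypothetical stand-in, and the function names are illustrative. The log-odds score shown is one commonly used zero-shot scoring scheme; the cited works may compute these quantities differently (for masked models, for example, the sum is usually a pseudo-log-likelihood).

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_log_likelihood(log_probs, sequence):
    """Sum the log-probability assigned to the observed residue at each position.

    log_probs: list of dicts (one per position) mapping residue -> log-probability.
    This format is an assumption, standing in for a PLM's per-position output.
    """
    return sum(log_probs[i][aa] for i, aa in enumerate(sequence))


def variant_log_odds(log_probs, wild_type, position, mutant_aa):
    """Zero-shot score for one substitution: log p(mutant) - log p(wild type)."""
    wt_aa = wild_type[position]
    return log_probs[position][mutant_aa] - log_probs[position][wt_aa]


def toy_log_probs(sequence, p_observed=0.6):
    """Hypothetical model output: p_observed on the observed residue, rest uniform."""
    p_other = (1.0 - p_observed) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: math.log(p_observed if aa == obs else p_other) for aa in AMINO_ACIDS}
        for obs in sequence
    ]


wild_type = "MKTAY"  # made-up sequence, for illustration only
log_probs = toy_log_probs(wild_type)

wt_ll = sequence_log_likelihood(log_probs, wild_type)
score = variant_log_odds(log_probs, wild_type, position=2, mutant_aa="A")

print(f"wild-type log-likelihood: {wt_ll:.3f}")
print(f"T3A log-odds score: {score:.3f}")
```

In this toy setup every substitution scores below zero because the stand-in model always favours the observed residue. With a real model, the wild-type log-likelihood is the quantity that the summary relates to prediction quality.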

Figures

Figures from (1)

Ref (2)

See also

1. Gordon C, Lu AX, Abbeel P. Protein Language Model Fitness Is a Matter of Preference. openRxiv; 2024. Available from: https://doi.org/10.1101/2024.10.03.616542
2. Wells J, Hooker AH, Livne M, Lin W, Miller D, Dallago C, et al. ProFam: Open-Source Protein Family Language Modelling for Fitness Prediction and Design. openRxiv; 2025. Available from: https://doi.org/10.64898/2025.12.19.695431