Sequences with lower log-likelihoods are worse for zero-shot variant effect prediction using PLMs

Created Jan 22, 2026; modified Apr 21, 2026

protein-language-models/representations

Summary

Sequences to which a protein language model (PLM) assigns low log-likelihoods yield poorer zero-shot variant effect predictions (1,2). Fine-tuning can mitigate this (1). Conversely, sequences with high log-likelihoods yield poorer predictions once the model is fine-tuned (i.e., the opposite effect). However, the correlation between sequence log-likelihood and prediction performance breaks down as PLMs get larger.
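To make the two quantities in this summary concrete, here is a minimal sketch of how a sequence log-likelihood and a zero-shot variant score can be computed from per-position log-probabilities. It assumes a common log-odds scoring scheme (mutant residue versus wild-type residue at the mutated position), which is not necessarily the exact scheme used in (1) or (2). The `log_probs` matrix is a random stand-in for real PLM output, and the function names and the example sequence are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def sequence_log_likelihood(log_probs, sequence):
    """Sum the log-probabilities of the observed residue at each position."""
    residue_idx = [AA_INDEX[aa] for aa in sequence]
    positions = np.arange(len(sequence))
    return float(log_probs[positions, residue_idx].sum())


def variant_score(log_probs, wild_type, position, mutant_aa):
    """Log-odds of the mutant vs. the wild-type residue at a 0-based position."""
    wt_aa = wild_type[position]
    return float(
        log_probs[position, AA_INDEX[mutant_aa]]
        - log_probs[position, AA_INDEX[wt_aa]]
    )


# Assumption: random logits stand in for a PLM's per-position output.
rng = np.random.default_rng(0)
wild_type = "MKTAYIAK"  # hypothetical sequence
logits = rng.normal(size=(len(wild_type), len(AMINO_ACIDS)))
# Normalize each row into log-probabilities (log-softmax).
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

wt_ll = sequence_log_likelihood(log_probs, wild_type)
score = variant_score(log_probs, wild_type, position=3, mutant_aa="G")
print(f"wild-type log-likelihood: {wt_ll:.3f}")
print(f"A4G variant score: {score:.3f}")
```

With a real model, `wt_ll` would be the quantity whose low or high values are associated with poorer predictions above, and `score` would be the zero-shot prediction compared against measured variant effects.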

Figures

Figures from (1)

Ref (2)

See also

PLMs are biased by uneven distribution of sequence data in datasets such as UniRef and UniProt
PLMs downweigh probability of sequences with multiple mutations

1. Gordon C, Lu AX, Abbeel P. Protein Language Model Fitness Is a Matter of Preference. openRxiv; 2024. Available from: https://doi.org/10.1101/2024.10.03.616542

2. Wells J, Hooker AH, Livne M, Lin W, Miller D, Dallago C, et al. ProFam: Open-Source Protein Family Language Modelling for Fitness Prediction and Design. openRxiv; 2025. Available from: https://doi.org/10.64898/2025.12.19.695431

Backlinks

Correlation between sequence log-likelihood and variant effect prediction performance breaks down as PLMs get larger
Fine-tuning almost always improves property prediction
