Summary
In variant effect prediction with protein language models (PLMs), the correlation between a sequence’s log-likelihood and prediction performance breaks down as the PLMs get larger (1). The breakdown is driven specifically by poorer performance on high-log-likelihood sequences.
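To make the two quantities in this claim concrete, here is a minimal sketch of how a sequence log-likelihood and a zero-shot variant score might be computed from per-position residue probabilities. It assumes a masked-style PLM that yields one probability distribution per position. The function names and the `toy_probs` stand-in model are hypothetical, and this is not necessarily the exact scoring used in (1).

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_probs(seq, confidence):
    """Hypothetical stand-in for a PLM: at each position, put `confidence`
    on the wild-type residue and spread the rest uniformly over the others."""
    rest = (1.0 - confidence) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: (confidence if aa == wt else rest) for aa in AMINO_ACIDS}
        for wt in seq
    ]

def sequence_log_likelihood(seq, probs):
    """Sum of log p(residue at i) over all positions (a pseudo-log-likelihood
    when each position's distribution comes from masking that position)."""
    return sum(math.log(probs[i][aa]) for i, aa in enumerate(seq))

def variant_score(seq, probs, pos, mut):
    """Zero-shot score for a substitution at 0-based `pos`:
    log p(mutant residue) - log p(wild-type residue)."""
    return math.log(probs[pos][mut]) - math.log(probs[pos][seq[pos]])

seq = "MKTAYIAK"                 # arbitrary example sequence (assumption)
probs = toy_probs(seq, 0.9)      # a "confident" toy model
pll = sequence_log_likelihood(seq, probs)
score = variant_score(seq, probs, 2, "A")  # T3A substitution
```

With a real model, variant scores like `score` would typically be compared against measured fitness across many variants, and that agreement is the "performance" the summary relates to the wild-type sequence's log-likelihood.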
Figures

Ref (1)
See also
- Sequences with lower log-likelihoods are worse for zero-shot variant effect prediction using PLMs
- Protein property prediction using PLMs does not benefit from scale except when inferring features of either structural or sparsely populated sequence families
1. Gordon C, Lu AX, Abbeel P. Protein Language Model Fitness Is a Matter of Preference. openRxiv; 2024. Available from: https://doi.org/10.1101/2024.10.03.616542