Summary

The correlation between a sequence’s log-likelihood under a protein language model (PLM) and that model’s zero-shot variant effect prediction performance on the sequence breaks down as PLMs get larger (1). The breakdown is driven specifically by poorer performance on sequences with high log-likelihood.

Figures

Ref (1)

See also

1. Gordon C, Lu AX, Abbeel P. Protein Language Model Fitness Is a Matter of Preference. openRxiv; 2024. Available from: https://doi.org/10.1101/2024.10.03.616542