Summary

Protein property prediction with protein language models (PLMs) does not benefit from scale beyond ~650M parameters, except when predicting (A) structural features or (B) features of sparsely populated protein families, such as those from viruses. This is particularly the case for zero-shot prediction (12). Some authors have reported that it also holds for fine-tuning (4,5), although aligning PLMs to data using reinforcement learning does seem to restore the scaling trend in some cases (7). The exception for structure in transfer learning was shown by (5) and corroborated by attempts to use ESM embeddings as starting points for structure prediction (8,9). Relatedly, larger PLMs are better at homolog detection and thermostability prediction, and they improve less than smaller models at fitness prediction when infused with structural information via low-rank adaptors (10). Likewise, the exception for viral proteins was observed by (11). The authors of (12) surmise that both exceptions arise because large PLMs have a greater capacity to memorize domain-specific contacts.
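For readers unfamiliar with the zero-shot setting mentioned above, one common formulation scores a substitution as the log-ratio of the model's probability for the mutant residue versus the wild-type residue at that position. The sketch below illustrates the idea with a hand-written probability table standing in for PLM output. The table `toy_probs`, the function name `zero_shot_score`, and the example sequence are all hypothetical, and the cited papers may use different scoring schemes.

```python
import math


def zero_shot_score(probs, wild_type, mutations):
    """Sum of log-odds (mutant vs. wild type) over a set of substitutions.

    probs: list of dicts, one per position, mapping residue -> probability.
           Here it is a made-up stand-in for per-position PLM output.
    wild_type: the wild-type sequence as a string.
    mutations: list of (position, mutant_residue) pairs, 0-indexed.
    """
    score = 0.0
    for pos, mut in mutations:
        wt = wild_type[pos]
        score += math.log(probs[pos][mut]) - math.log(probs[pos][wt])
    return score


# Toy per-position distributions for a 3-residue sequence (invented numbers).
toy_probs = [
    {"M": 0.90, "L": 0.05, "K": 0.05},
    {"K": 0.60, "R": 0.30, "E": 0.10},
    {"L": 0.50, "I": 0.40, "P": 0.10},
]
wild_type = "MKL"

conservative = zero_shot_score(toy_probs, wild_type, [(1, "R")])  # K -> R
disruptive = zero_shot_score(toy_probs, wild_type, [(2, "P")])    # L -> P
print(conservative > disruptive)
```

A less negative score means the model considers the substitution more plausible. No labelled fitness data are used at any point, which is what makes the prediction zero-shot.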

Details

Larger ProGen models were better able to predict the fitness of distant sequences (“wide mutational scale”) (2). Likewise, engineered CRISPR variants were reportedly better predicted by larger models (13; also mentioned at PEGS Boston 2024). Augmenting ProGen models with structural information improved fitness prediction only in smaller models, suggesting that the larger models had already learned most of that information (10).

This is also true of post-translational modification prediction (14; ESM2-650M vs. ESM2-3B).

Figures

Model type        Model name              Spearman  AUC    MCC    NDCG   Recall
Alignment-based   Site-Independent        0.361     0.697  0.288  0.746  0.201
Alignment-based   WaveNet                 0.216     0.623  0.174  0.684  0.154
Alignment-based   EVmutation              0.397     0.717  0.306  0.775  0.220
Alignment-based   DeepSequence (ens.)     0.422     0.731  0.330  0.775  0.227
Alignment-based   EVE (ens.)              0.441     0.741  0.343  0.781  0.231
Alignment-based   GEMME                   0.457     0.750  0.353  0.775  0.209
Protein language  UniRep                  0.193     0.607  0.149  0.647  0.14
Protein language  CARP (640M)             0.373     0.704  0.289  0.749  0.210
Protein language  RITA XL                 0.373     0.708  0.294  0.750  0.194
Protein language  ProGen2 XL              0.392     0.718  0.307  0.766  0.200
Protein language  ESM-1b ★                0.399     0.722  0.315  0.748  0.205
Protein language  ESM2 (15B) ★            0.405     0.723  0.318  0.759  0.210
Protein language  ESM-1v (ens.)           0.416     0.730  0.329  0.753  0.216
Protein language  VESPA                   0.437     0.743  0.348  0.774  0.201
Hybrid            UniRep evotuned         0.347     0.693  0.274  0.737  0.181
Hybrid            MSA Transformer (ens.)  0.434     0.738  0.341  0.777  0.224
Hybrid            Tranception L           0.436     0.740  0.342  0.778  0.221
Hybrid            TranceptEVE L           0.457     0.752  0.357  0.785  0.231
Inverse Folding   ProteinMPNN             0.258     0.640  0.196  0.712  0.186
Inverse Folding   MIF-ST                  0.401     0.718  0.310  0.766  0.227
Inverse Folding   ESM-IF1                 0.422     0.730  0.331  0.748  0.223
Ref (3)
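
The Spearman column in the table above is a rank correlation between model scores and experimental measurements. As a minimal illustration of how such a number is obtained, the sketch below computes Spearman's rho in pure Python by ranking both lists (averaging ranks for ties) and taking the Pearson correlation of the ranks. The scores and measurements are invented toy values, not data from (3), and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example: five variants with model scores and measured fitness.
predicted = [-0.7, -1.6, -0.1, -2.3, -0.9]
measured = [0.80, 0.40, 0.95, 0.10, 0.35]
rho = spearman(predicted, measured)
print(round(rho, 3))  # 0.9
```

Because only ranks matter, the metric rewards a model for ordering variants correctly even when its scores are on a different scale from the assay readout.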

Ref (2)

Figures 3-5 from (5)

Ref (6)

Ref (7)

Figures from (11)

1. Tan Y, Zhou B, Jiang Y, Wang YG, Hong L. Multi-level Protein Representation Learning for Blind Mutational Effect Prediction. 2023; Available from: https://arxiv.org/abs/2306.04899
2. Nijkamp E, Ruffolo JA, Weinstein EN, Naik N, Madani A. ProGen2: Exploring the boundaries of protein language models. Cell Systems. 2023;14(11):968-978.e3. Available from: https://doi.org/10.1016/j.cels.2023.10.002
3. Notin P, Kollasch AW, Ritter D, van Niekerk L, Paul S, Spinner H, et al. ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction. openRxiv; 2023. Available from: https://doi.org/10.1101/2023.12.07.570727
4. Detlefsen NS, Hauberg S, Boomsma W. Learning meaningful representations of protein sequences. Nature Communications. 2022;13(1). Available from: https://doi.org/10.1038/s41467-022-29443-w
5. Li F-Z, Amini AP, Yue Y, Yang KK, Lu AX. Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models. openRxiv; 2024. Available from: https://doi.org/10.1101/2024.02.05.578959
6. Vieira LC, Handojo ML, Wilke CO. Medium-sized protein language models perform well at transfer learning on realistic datasets. Scientific Reports. 2025;15(1). Available from: https://doi.org/10.1038/s41598-025-05674-x
7. Bhatnagar A, Jain S, Beazer J, Curran SC, Hoffnagle AM, Ching KS, et al. Scaling Unlocks Broader Generation and Deeper Functional Understanding of Proteins. openRxiv; 2025. Available from: https://doi.org/10.1101/2025.04.15.649055
8. Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379(6637):1123–30. Available from: https://doi.org/10.1126/science.ade2574
9. Lee J, Han K, Kim J, Yu H, Lee Y. Solvent: A Framework for Protein Folding. 2023; Available from: https://arxiv.org/abs/2307.04603
10. Ruffolo JA, Bhatnagar A, Beazer J, Nayfach S, Russ J, Hill E, et al. Adapting protein language models for structure-conditioned design. openRxiv; 2024. Available from: https://doi.org/10.1101/2024.08.03.606485
11. Gurev S, Youssef N, Jain N, Mehrotra A, Leung SRM, Jackson A, et al. Evaluating variant effect prediction across viruses. openRxiv; 2025. Available from: https://doi.org/10.1101/2025.08.04.668549
12. Zhang Z, Wayment-Steele HK, Brixi G, Wang H, Kern D, Ovchinnikov S. Protein language models learn evolutionary statistics of interacting sequence motifs. Proceedings of the National Academy of Sciences. 2024;121(45). Available from: https://doi.org/10.1073/pnas.2406285121
13. Ruffolo JA, Nayfach S, Gallagher J, Bhatnagar A, Beazer J, Hussain R, et al. Design of highly functional genome editors by modelling CRISPR–Cas sequences. Nature. 2025;645(8080):518–25. Available from: https://doi.org/10.1038/s41586-025-09298-z
14. Peng FZ, Wang C, Chen T, Schussheim B, Vincoff S, Chatterjee P. PTM-Mamba: a PTM-aware protein language model with bidirectional gated Mamba blocks. Nature Methods. 2025;22(5):945–9. Available from: https://doi.org/10.1038/s41592-025-02656-9