ESM is a series of protein language models trained with the BERT masked language modeling objective, introduced in (1), (2), and (3). Embeddings from a 3B-parameter ESM2 model (which was sequence-only) were used to train ESMFold (2). ESM3 instead includes structure tokens directly as inputs and prediction targets in its training objective (3).
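
To make the masking objective concrete, here is a minimal sketch of BERT-style corruption applied to a protein sequence. It assumes the standard BERT recipe (about 15% of positions chosen as targets; of those, 80% replaced by a mask token, 10% by a random residue, 10% left unchanged). The function name `bert_mask`, the `<mask>` string, and the example sequence are illustrative assumptions, not taken from the ESM code.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "<mask>"  # placeholder mask token (assumed name)


def bert_mask(sequence, mask_rate=0.15, seed=0):
    """Return (corrupted_tokens, target_positions) for one sequence.

    Assumed recipe (standard BERT): choose ~mask_rate of the positions
    as prediction targets; replace 80% of them with MASK, 10% with a
    random residue, and leave 10% unchanged.
    """
    rng = random.Random(seed)
    tokens = list(sequence)
    n_targets = max(1, round(mask_rate * len(tokens)))
    targets = sorted(rng.sample(range(len(tokens)), n_targets))
    for i in targets:
        roll = rng.random()
        if roll < 0.8:
            tokens[i] = MASK
        elif roll < 0.9:
            tokens[i] = rng.choice(AMINO_ACIDS)
        # otherwise keep the original residue; it is still a target
    return tokens, targets


sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # made-up example
corrupted, targets = bert_mask(sequence)
```

During training, the model sees `corrupted` and the cross-entropy loss is computed only at the positions in `targets`, against the original residues.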

Details

  • ESM-1b used dropout on the attention matrices during training, but ESM2 did not.
  • Verkuil et al. (4) argue that the language model generalizes beyond natural proteins, even though its training set excludes artificial proteins.
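  • ESM-1v is a variant effect prediction model trained with the masked language modeling objective on protein sequences; Meier et al. (5) score mutations zero-shot from model logits and evaluate MSA Transformer in the same way.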
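
Zero-shot variant effect prediction of the kind described in (5) can be illustrated by comparing the log-probabilities a masked language model assigns to the mutant and wild-type residues at one position. The sketch below assumes a masked-marginal style score and uses hand-written logits in place of a real model; the names `variant_score` and `AA_INDEX` and the 20-letter vocabulary ordering are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed vocabulary order
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def variant_score(masked_logits, wild_type, mutant):
    """Score a substitution as log p(mutant) - log p(wild type).

    masked_logits: shape (20,), the model output at the mutated
    position when that position is masked in the input.
    Negative scores mean the model prefers the wild-type residue.
    """
    log_probs = log_softmax(np.asarray(masked_logits, dtype=float))
    return float(log_probs[AA_INDEX[mutant]] - log_probs[AA_INDEX[wild_type]])


# Hypothetical logits standing in for a real model's output:
# every residue gets 0 except leucine, which the "model" prefers.
logits = np.zeros(len(AMINO_ACIDS))
logits[AA_INDEX["L"]] = 3.0
score = variant_score(logits, wild_type="L", mutant="P")
```

With a real model, `masked_logits` would come from a forward pass over the sequence with the position of interest masked, and scores for multiple substitutions would typically be summed.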

1. Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences. 2021;118(15). Available from: https://doi.org/10.1073/pnas.2016239118
2. Lin Z, Akin H, Rao R, Hie B, Zhu Z, Lu W, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science. 2023;379(6637):1123–30. Available from: https://doi.org/10.1126/science.ade2574
3. Hayes T, Rao R, Akin H, Sofroniew NJ, Oktay D, Lin Z, et al. Simulating 500 million years of evolution with a language model. Science. 2025;387(6736):850–8. Available from: https://doi.org/10.1126/science.ads0018
4. Verkuil R, Kabeli O, Du Y, Wicky BIM, Milles LF, Dauparas J, et al. Language models generalize beyond natural proteins. openRxiv; 2022. Available from: https://doi.org/10.1101/2022.12.21.521521
5. Meier J, Rao R, Verkuil R, Liu J, Sercu T, Rives A. Language models enable zero-shot prediction of the effects of mutations on protein function. openRxiv; 2021. Available from: https://doi.org/10.1101/2021.07.09.450648