Summary

MSA-based protein language models (PLMs) such as MSA Transformer and the Evoformer are more effective than generic PLMs at predicting structure (1) and stability (2). In addition, (3) found that MSA Transformer outperformed PLMs like ESM-2 15B on almost all benchmarks in ProteinGym.

Details

Conclusions from (1) about Evoformer representations when the Evoformer is used as a standalone ML model:

  • Structure prediction (superior to ESM-1b and MSA Transformer)
  • Miniprotein stability prediction (superior to ESM-1b and MSA Transformer)
  • Function annotation prediction (ESM-1b outperforms both Evoformer and MSA Transformer)
  • Fitness score prediction (worse than ESM-1b and MSA Transformer)
  • Residue-level prediction

Figures

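| Category | Model | Version | # Params (M) | ρ Single | ρ Double | ρ All | ρ Prokaryote | ρ Human | ρ Eukaryote | ρ Virus |
|---|---|---|---|---|---|---|---|---|---|---|
| MSA | SITEINDEP | - | - | 0.378 | 0.322 | 0.378 | 0.343 | 0.375 | 0.401 | 0.406 |
| | EVMUTATION | - | - | 0.423 | 0.401 | 0.423 | 0.499 | 0.396 | 0.429 | 0.381 |
| | WAVENET | - | - | 0.399 | 0.344 | 0.400 | 0.492 | 0.373 | 0.442 | 0.321 |
| | DEEPSEQUENCE | - | - | 0.411 | 0.357 | 0.415 | 0.497 | 0.396 | 0.461 | 0.332 |
| | MSA-TRANSFORMER | msa1 | 100 | 0.310 | 0.232 | 0.308 | 0.292 | 0.302 | 0.392 | 0.278 |
| | | msa1b | 100 | 0.291 | 0.275 | 0.290 | 0.268 | 0.282 | 0.365 | 0.279 |
| non-MSA | RITA | small | 85 | 0.324 | 0.211 | 0.329 | 0.311 | 0.314 | 0.330 | 0.372 |
| | | medium | 300 | 0.372 | 0.237 | 0.377 | 0.356 | 0.370 | 0.399 | 0.398 |
| | | large | 680 | 0.372 | 0.227 | 0.383 | 0.353 | 0.380 | 0.404 | 0.405 |
| | | xlarge | 1,200 | 0.385 | 0.234 | 0.389 | 0.405 | 0.364 | 0.393 | 0.407 |
| | PROGEN2 | small | 151 | 0.346 | 0.249 | 0.352 | 0.364 | 0.376 | 0.396 | 0.273 |
| | | medium | 764 | 0.394 | 0.274 | 0.395 | 0.434 | 0.393 | 0.411 | 0.346 |
| | | base | 764 | 0.389 | 0.323 | 0.394 | 0.426 | 0.396 | 0.427 | 0.335 |
| | | large | 2,700 | 0.396 | 0.333 | 0.396 | 0.431 | 0.396 | 0.436 | 0.336 |
| | | xlarge | 6,400 | 0.404 | 0.358 | 0.404 | 0.480 | 0.349 | 0.452 | 0.383 |
| | PROTTRANS | bert | 420 | 0.339 | 0.279 | 0.336 | 0.403 | 0.300 | 0.345 | 0.317 |
| | | bert_bfd | 420 | 0.311 | 0.336 | 0.308 | 0.471 | 0.328 | 0.338 | 0.087 |
| | | t5_xl_uniref50 | 3,000 | 0.384 | 0.284 | 0.378 | 0.485 | 0.375 | 0.369 | 0.277 |
| | | t5_xl_bfd | 3,000 | 0.355 | 0.356 | 0.351 | 0.490 | 0.399 | 0.349 | 0.131 |
| | TRANCEPTION | large | 700 | 0.399 | 0.398 | 0.406 | 0.447 | 0.369 | 0.426 | 0.407 |
| | ESM-1V | - | 650 | 0.376 | 0.290 | 0.372 | 0.496 | 0.409 | 0.398 | 0.233 |
| | ESM-1B | - | 650 | 0.371 | 0.325 | 0.366 | 0.507 | 0.416 | 0.360 | 0.150 |
| | ESM-IF1 | - | 142 | 0.359 | 0.279 | 0.368 | 0.445 | 0.358 | 0.339 | 0.322 |
| | ESM-2 | t30 | 150 | 0.345 | 0.296 | 0.344 | 0.437 | 0.419 | 0.401 | 0.045 |
| | | t33 | 650 | 0.392 | 0.317 | 0.389 | 0.515 | 0.433 | 0.454 | 0.155 |
| | | t36 | 3,000 | 0.384 | 0.261 | 0.383 | 0.495 | 0.419 | 0.429 | 0.195 |
| | | t48 | 15,000 | 0.394 | 0.313 | 0.391 | 0.457 | 0.402 | 0.442 | 0.251 |
| | P¹³LG | k20_h512 | 148 | 0.424 | 0.395 | 0.426 | 0.516 | 0.425 | 0.480 | 0.297 |

Source: Ref (2)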
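
For readers unfamiliar with the ρ columns: assuming ρ denotes Spearman's rank correlation between a model's predicted mutation scores and experimentally measured fitness (an assumption here, since the table header does not spell it out), the metric can be sketched as below. The function names and the toy scores are hypothetical and are not taken from (2); this is a minimal standard-library illustration, not the evaluation code used in that work.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(predicted, measured):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: model scores and measured fitness for four variants.
toy_scores = [-2.1, -0.3, -1.5, 0.4]
toy_fitness = [0.10, 0.80, 0.35, 0.95]
rho = spearman_rho(toy_scores, toy_fitness)
```

Because only the ordering of the scores matters, a model can reach a high ρ without its scores being on the same scale as the measured fitness values.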
1. Hu M, Yuan F, Yang K, Ju F, Su J, Wang H, et al. Exploring evolution-aware & -free protein language models as protein function predictors. Advances in Neural Information Processing Systems. 2022;35:38873–84. Available from: https://papers.nips.cc/paper_files/paper/2022/hash/fe066022bab2a6c6a3c57032a1623c70-Abstract-Conference.html
2. Tan Y, Zhou B, Jiang Y, Wang YG, Hong L. Multi-level Protein Representation Learning for Blind Mutational Effect Prediction. 2023. Available from: https://arxiv.org/abs/2306.04899
3. Notin P, Kollasch AW, Ritter D, van Niekerk L, Paul S, Spinner H, et al. ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction. openRxiv; 2023. Available from: https://doi.org/10.1101/2023.12.07.570727