Summary

Structure-based methods such as inverse folding models outperform sequence-based methods such as protein language models (PLMs) at predicting the stability of point mutants, but not of full sequences (1). Among the inverse folding models, ESM-IF1 is the most effective (2,3). Tan et al. (4) found that embedding geometry into PLMs can improve stability prediction. In contrast, Cho et al. (5) found that the PLM ESM-2 achieved better Spearman correlations with experimental stability measurements than these inverse folding models when comparing de novo sequences with the same target fold.
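As a reminder of what the Spearman comparisons above measure, here is a minimal sketch in plain Python. It ranks model scores and experimental stabilities separately, then takes the Pearson correlation of the ranks. The function names and the toy numbers are hypothetical and are not taken from any of the cited papers.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: one model score and one measured stability per sequence.
model_scores = [-1.2, -0.4, -2.5, -0.9, -3.1]
measured_stability = [2.1, 3.0, 0.7, 3.4, 0.2]

rho = spearman(model_scores, measured_stability)
print(f"Spearman rho = {rho:.3f}")
```

Because only ranks matter, the score can be on any scale (log-likelihood, pseudo-perplexity, predicted ddG) as long as higher is meant to indicate greater stability.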

Details

The comparisons in (2) used data from (6,7).

The data are ambiguous about when Rosetta or ProteinMPNN performs better (1). ProteinMPNN was substantially improved by Dieckhaus et al. (8) through transfer learning on the ddG data from (7).

Figures

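| Model name | Expression | Binding | Activity | Stability | Organismal fitness | Mean |
|---|---|---|---|---|---|---|
| N. Assays | 610316869 | | | | | |
| EVE (ens.) | 0.460 | 0.322 | 0.453 | 0.418 | 0.470 | 0.425 |
| GEMME | 0.382 | 0.361 | 0.490 | 0.518 | 0.469 | 0.444 |
| ProGen2 (ens.) | 0.459 | 0.298 | 0.417 | 0.422 | 0.446 | 0.408 |
| VESPA | 0.483 | 0.378 | 0.472 | 0.501 | 0.468 | 0.460 |
| ProteinMPNN | 0.162 | 0.148 | 0.213 | 0.555 | 0.177 | 0.251 |
| ESM-IF1 | 0.401 | 0.375 | 0.387 | 0.630 | 0.368 | 0.432 |
| Tranception L | 0.441 | 0.329 | 0.462 | 0.473 | 0.459 | 0.433 |
| TranceptEVE | 0.481 | 0.341 | 0.482 | 0.502 | 0.478 | 0.457 |
| StructSeq | 0.549 | 0.399 | 0.498 | 0.633 | 0.468 | 0.509 |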

Ref (3)
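The Mean column in the table from Ref (3) appears to be the unweighted average of the five category scores, so categories with few assays count as much as those with many. The short check below recomputes it from the tabulated values. The 0.0005 tolerance is an assumption that allows for rounding to three decimals.

```python
# Per-category scores (Expression, Binding, Activity, Stability,
# Organismal fitness) and the reported Mean, copied from the table.
rows = {
    "EVE (ens.)":     ([0.460, 0.322, 0.453, 0.418, 0.470], 0.425),
    "GEMME":          ([0.382, 0.361, 0.490, 0.518, 0.469], 0.444),
    "ProGen2 (ens.)": ([0.459, 0.298, 0.417, 0.422, 0.446], 0.408),
    "VESPA":          ([0.483, 0.378, 0.472, 0.501, 0.468], 0.460),
    "ProteinMPNN":    ([0.162, 0.148, 0.213, 0.555, 0.177], 0.251),
    "ESM-IF1":        ([0.401, 0.375, 0.387, 0.630, 0.368], 0.432),
    "Tranception L":  ([0.441, 0.329, 0.462, 0.473, 0.459], 0.433),
    "TranceptEVE":    ([0.481, 0.341, 0.482, 0.502, 0.478], 0.457),
    "StructSeq":      ([0.549, 0.399, 0.498, 0.633, 0.468], 0.509),
}

TOLERANCE = 0.0005 + 1e-9  # assumed: half a unit in the third decimal place

mismatches = []
for model, (scores, reported_mean) in rows.items():
    recomputed = sum(scores) / len(scores)
    if abs(recomputed - reported_mean) > TOLERANCE:
        mismatches.append(model)
    print(f"{model:15s} recomputed={recomputed:.4f} reported={reported_mean:.3f}")

print("rows inconsistent with an unweighted mean:", mismatches)
```

One consequence is that the two inverse folding models rank well on Stability (0.555 and 0.630) yet sit lower on the Mean because of their weaker scores in the other four categories.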

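| Design method | De Novo | Natural | All |
|---|---|---|---|
| GVP | 0.390 | 0.494 | 0.450 |
| PiFold | 0.448 | 0.556 | 0.511 |
| ProteinMPNN | 0.428 | 0.605 | 0.531 |
| ESMIF | 0.500 | 0.629 | 0.575 |
| ByProt | 0.468 | 0.586 | 0.536 |
| AF-Design | 0.354 | 0.292 | 0.318 |
| ESM-Design | 0.127 | 0.0004 | 0.053 |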

Ref (2)

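| Model | Version | TPR↑ DTm 5% | TPR↑ DTm 25% | TPR↑ DTm 50% | TPR↑ DDG 5% | TPR↑ DDG 25% | TPR↑ DDG 50% |
|---|---|---|---|---|---|---|---|
| PROGEN2 | oas | 0.033 | 0.286 | 0.537 | 0.000 | 0.339 | 0.515 |
| PROGEN2 | medium | 0.117 | 0.367 | 0.582 | 0.072 | 0.443 | 0.615 |
| PROGEN2 | base | 0.212 | 0.362 | 0.585 | 0.231 | 0.408 | 0.621 |
| PROGEN2 | large | 0.132 | 0.323 | 0.557 | 0.117 | 0.320 | 0.597 |
| PROGEN2 | BFD90 | 0.178 | 0.333 | 0.589 | 0.206 | 0.451 | 0.644 |
| PROGEN2 | xlarge | 0.118 | 0.353 | 0.578 | 0.144 | 0.383 | 0.603 |
| TRANCEPTION | medium | 0.188 | 0.359 | 0.564 | 0.083 | 0.367 | 0.527 |
| TRANCEPTION | large | 0.149 | 0.371 | 0.586 | 0.072 | 0.395 | 0.540 |
| PROTTRANS | bert | 0.131 | 0.364 | 0.586 | 0.122 | 0.424 | 0.635 |
| PROTTRANS | bert_bfd | 0.168 | 0.336 | 0.579 | 0.136 | 0.423 | 0.589 |
| PROTTRANS | t5_xl_uniref50 | 0.184 | 0.412 | 0.593 | 0.147 | 0.425 | 0.640 |
| PROTTRANS | t5_xl_bfd | 0.136 | 0.350 | 0.587 | 0.106 | 0.419 | 0.610 |
| ESM-1V | - | 0.216 | 0.386 | 0.602 | 0.231 | 0.451 | 0.622 |
| ESM-1B | - | 0.151 | 0.402 | 0.606 | 0.211 | 0.424 | 0.642 |
| ESM-IF1 | - | 0.188 | 0.418 | 0.656 | 0.258 | 0.469 | 0.641 |
| ESM-2 | t30 | 0.139 | 0.397 | 0.598 | 0.172 | 0.453 | 0.646 |
| ESM-2 | t33 | 0.239 | 0.407 | 0.601 | 0.181 | 0.438 | 0.637 |
| ESM-2 | t36 | 0.152 | 0.408 | 0.634 | 0.169 | 0.405 | 0.641 |
| ESM-2 | t48 | 0.232 | 0.430 | 0.607 | 0.189 | 0.400 | 0.606 |
| P¹³LG | k20_h1280 | 0.304 | 0.419 | 0.642 | 0.267 | 0.454 | 0.676 |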

Ref (4)
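The note does not define the "TPR at k%" columns in the table from Ref (4). The sketch below shows one plausible reading, stated here as an assumption: take the k% of mutants with the highest model scores and report the fraction of them that are experimentally positive (for example, stabilizing by DTm or DDG). The function name, the rounding-up of the cutoff, and the toy data are all hypothetical; consult (4) for the exact definition.

```python
import math


def top_fraction_positive_rate(scores, is_positive, fraction):
    """Fraction of the top-scoring `fraction` of mutants that are truly positive.

    Assumed definition, not confirmed against Ref (4). The cutoff is rounded
    up so that at least one mutant is always selected.
    """
    n_top = max(1, math.ceil(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = ranked[:n_top]
    return sum(1 for i in top if is_positive[i]) / n_top


# Hypothetical toy data: ten mutants, already sorted by score for readability.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
is_positive = [True, False, True, True, False, False, True, False, False, False]

for fraction in (0.05, 0.25, 0.50):
    rate = top_fraction_positive_rate(scores, is_positive, fraction)
    print(f"top {fraction:.0%}: {rate:.3f}")
```

Under this reading, the 5% columns reward models whose very best-scoring mutants are reliable, while the 50% columns behave more like an overall ranking quality measure.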

Ref (5)

See also

1. Reeves S, Kalyaanamoorthy S. Zero-shot transfer of protein sequence likelihood models to thermostability prediction. Nature Machine Intelligence. 2024;6(9):1063–76. Available from: https://doi.org/10.1038/s42256-024-00887-7
2. Wang C, Zhong B, Zhang Z, Chaudhary N, Misra S, Tang J. PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design. 2023. Available from: https://arxiv.org/abs/2312.00080
3. Paul S, Kollasch A, Notin P, Marks D. Combining Structure and Sequence for Superior Fitness Prediction. In: GenBio@NeurIPS2023. 2023. Available from: https://openreview.net/forum?id=8PbTU4exnV
4. Tan Y, Zhou B, Jiang Y, Wang YG, Hong L. Multi-level Protein Representation Learning for Blind Mutational Effect Prediction. 2023. Available from: https://arxiv.org/abs/2306.04899
5. Cho Y, Dauparas J, Tsuboyama K, Rocklin G, Ovchinnikov S. Implicit modeling of the conformational landscape and sequence allows scoring and generation of stable proteins. openRxiv; 2024. Available from: https://doi.org/10.1101/2024.12.20.629706
6. Rocklin GJ, Chidyausiku TM, Goreshnik I, Ford A, Houliston S, Lemak A, et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science. 2017;357(6347):168–75. Available from: https://doi.org/10.1126/science.aan0693
7. Tsuboyama K, Dauparas J, Chen J, Laine E, Mohseni Behbahani Y, Weinstein JJ, et al. Mega-scale experimental analysis of protein folding stability in biology and design. Nature. 2023;620(7973):434–44. Available from: https://doi.org/10.1038/s41586-023-06328-6
8. Dieckhaus H, Brocidiacono M, Randolph NZ, Kuhlman B. Transfer learning to leverage larger datasets for improved prediction of protein stability changes. Proceedings of the National Academy of Sciences. 2024;121(6). Available from: https://doi.org/10.1073/pnas.2314853121