Summary

Using evolutionary information for label smoothing when training inverse folding models improves variant effect prediction. Zhou et al. (2) smoothed labels with the BLOSUM62 matrix, whereas Gong et al. (1) used PSSMs. By contrast, Dauparas et al. used uniform smoothing (2,3).

Details

Zhou et al. (2) use a temperature value to balance the contribution of the WT amino acid and the BLOSUM62 matrix. Figure S7 (below) defines the smoothed label in terms of the target matrix, the BLOSUM matrix, and a nonlinear operator.
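
To make the role of the temperature concrete, here is a minimal sketch of one way a substitution-matrix row can be turned into a soft label. It is not the formula from (2): the softmax form, the function names, and the 4-letter toy score table standing in for BLOSUM62 are all assumptions chosen for illustration.

```python
import math

# Hypothetical 4-letter alphabet and toy substitution scores.
# These stand in for the 20x20 BLOSUM62 matrix and are illustrative only.
ALPHABET = "ACDW"
TOY_SCORES = {
    "A": [4, 0, -2, -3],
    "C": [0, 9, -3, -2],
    "D": [-2, -3, 6, -4],
    "W": [-3, -2, -4, 11],
}


def softmax(scores, temperature):
    """Temperature-scaled softmax (the assumed nonlinear operator)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def smoothed_label(wt, temperature):
    """Soft target for a position whose wild-type residue is `wt`."""
    return softmax(TOY_SCORES[wt], temperature)


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a soft target and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Low temperature: nearly one-hot on the WT residue.
sharp = smoothed_label("A", temperature=0.5)
# High temperature: more mass on residues the toy table scores as similar.
flat = smoothed_label("A", temperature=5.0)
```

Under this assumed form, a low temperature recovers something close to the one-hot WT label, while a higher temperature shifts probability mass toward substitutions that the matrix scores favourably.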

In contrast, Gong et al. (1) calculated amino acid propensities from MSAs and used them as soft labels.
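
As a rough illustration of MSA-derived propensities, the sketch below computes per-column amino acid frequencies from a toy alignment. It is not the procedure from (1): the function name, the pseudocount, and the choice to ignore gap characters are assumptions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_propensities(msa, pseudocount=1.0):
    """Return one {amino acid: probability} dict per alignment column.

    Gaps and non-standard characters are skipped, and a pseudocount keeps
    unobserved residues at a small non-zero probability (both assumptions).
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


# Toy alignment of four sequences; "-" marks a gap.
toy_msa = ["MKV", "MRV", "MKI", "M-V"]
profile = column_propensities(toy_msa)
```

Each entry of `profile` is a distribution over the 20 amino acids that could replace the one-hot WT target at that position in a cross-entropy loss.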

Figures

Figure S7 from (2)

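| Method | T2837 ρ↑ | T2837 AUC↑ | S669 ρ↑ | S669 AUC↑ | S-Sym ρ↑ | S-Sym AUC↑ | Myoglobin ρ↑ | Myoglobin AUC↑ | FireProtDB ρ↑ | FireProtDB AUC↑ | Gβ1 ρ↑ | Gβ1 AUC↑ | T2837 Reverse ρ↑ | T2837 Reverse AUC↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| # Mutations | 2837 | 2837 | 669 | 669 | 342 | 342 | 134 | 134 | 1764 | 1764 | 935 | 935 | 2837 | 2837 |
| RaSP* (Blaabjerg et al., 2023) | 0.58 | 0.61 | 0.39 | 0.69 | 0.64 | 0.73 | 0.68 | 0.75 | 0.56 | 0.71 | 0.72 | 0.66 | 0.23 | 0.59 |
| ThermoMPNN* (Dieckhaus et al., 2023) | 0.55 | 0.78 | 0.39 | 0.68 | 0.66 | 0.82 | 0.58 | 0.77 | 0.57 | 0.75 | 0.65 | 0.78 | 0.43 | 0.71 |
| Prostata-IFML (Diaz et al., 2023) | 0.53 | 0.75 | 0.49 | 0.76 | 0.55 | 0.75 | 0.54 | 0.67 | - | - | 0.66 | 0.82 | 0.52 | 0.75 |
| Stability Oracle (Diaz et al., 2023) | 0.59 | 0.81 | 0.52 | 0.74 | 0.72 | 0.87 | 0.68 | 0.81 | 0.61 | 0.79 | 0.71 | 0.82 | 0.59 | 0.81 |
| ESM2* (Lin et al., 2023) | 0.28 | 0.60 | 0.04 | 0.50 | 0.26 | 0.56 | 0.15 | 0.57 | 0.25 | 0.57 | 0.25 | 0.63 | 0.28 | 0.60 |
| ProteinMPNN* (Dauparas et al., 2022) | 0.36 | 0.70 | 0.25 | 0.59 | 0.32 | 0.64 | 0.35 | 0.66 | 0.31 | 0.70 | 0.35 | 0.67 | 0.36 | 0.70 |
| MutComputeXGT (Diaz et al., 2023) | 0.34 | 0.68 | 0.27 | 0.57 | 0.38 | 0.72 | 0.37 | 0.72 | 0.30 | 0.69 | 0.34 | 0.66 | 0.34 | 0.68 |
| MutComputeXGT w/ MSA soft-label (Ours) | 0.37 | 0.70 | 0.30 | 0.59 | 0.48 | 0.75 | 0.45 | 0.75 | 0.36 | 0.71 | 0.41 | 0.69 | 0.37 | 0.70 |
| MutRank (Ours) | 0.51 | 0.78 | 0.40 | 0.67 | 0.62 | 0.84 | 0.68 | 0.84 | 0.51 | 0.77 | 0.62 | 0.77 | 0.51 | 0.78 |
| SSL Improvement ↑ | 42% | 11% | 48% | 14% | 63% | 17% | 83% | 17% | 65% | 10% | 77% | 15% | 42% | 11% |
| Supervised Fine-Tuning Gap ↓ | 14% | 4% | 23% | 9% | 14% | 3% | 0% | -4% | 16% | 3% | 13% | 3% | 14% | 4% |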

Table 3 from (1)

1. Gong C, Klivans A, Loy JM, Chen T, Liu Q, Diaz DJ. Evolution-Inspired Loss Functions for Protein Representation Learning. In: Proceedings of the 41st International Conference on Machine Learning. PMLR; 2024. p. 15893–906. (Proceedings of Machine Learning Research; vol. 235). Available from: https://proceedings.mlr.press/v235/gong24e.html
2. Zhou B, Zheng L, Wu B, Tan Y, Lv O, Yi K, et al. Protein Engineering with Lightweight Graph Denoising Neural Networks. Journal of Chemical Information and Modeling. 2024;64(9):3650–61. Available from: https://doi.org/10.1021/acs.jcim.4c00036
3. Dauparas J, Anishchenko I, Bennett N, Bai H, Ragotte RJ, Milles LF, et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science. 2022;378(6615):49–56. Available from: https://doi.org/10.1126/science.add2187