Summary

Reinforcement learning (here, direct preference optimization, DPO) outperforms fine-tuning on property-guided inverse folding (1). The result was shown with ProteinMPNN as the base model, and it holds even when multiple constraints are imposed at once. As shown below, the property gains do not come at the cost of refoldability metrics (RMSD, TM score, pLDDT).
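For context on the objective named above, here is a minimal sketch of the standard single-pair DPO loss. It is not taken from (1): the multi-objective variant used there is not described in this summary. The pairing is an assumption for illustration: the "chosen" and "rejected" items are taken to be two sequences designed for the same backbone, ranked by a property score. The function name, argument names, and the default `beta` are hypothetical.

```python
import math


def dpo_loss(logp_policy_chosen, logp_policy_rejected,
             logp_ref_chosen, logp_ref_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-likelihoods under the trainable policy and
    under a frozen reference model (assumed here: the base inverse-folding
    model). beta=0.1 is an illustrative default, not a value from (1).
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # sequence over the rejected one, relative to the reference model.
    margin = ((logp_policy_chosen - logp_ref_chosen)
              - (logp_policy_rejected - logp_ref_rejected))
    z = beta * margin
    # -log(sigmoid(z)) == softplus(-z), computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# When the policy equals the reference, the margin is zero and the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
```

The loss falls below log(2) as the policy raises the likelihood of the chosen sequence relative to the rejected one, and `beta` controls how strongly the policy is allowed to drift from the reference.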

Figures

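| Method | RMSD ↓ | TM score ↑ | pLDDT ↑ | EP ↓ | Sol ↑ | Thermo ↑ | AAR ↑ |
|---|---|---|---|---|---|---|---|
| Test Dataset | 3.97 | 0.761 | 80.8 | 5.80 | 0.620 | 0.246 | 1.000 |
| ESM-IF | 4.36 | 0.737 | 78.4 | 6.11 | 0.733 | 0.719 | 0.464 |
| InstructPLM (default) | 6.81 | 0.628 | 73.4 | 7.97 | 0.653 | 0.396 | 0.574 |
| InstructPLM (T=0.1) | 6.96 | 0.632 | 74.4 | 7.31 | 0.657 | 0.455 | 0.584 |
| ProteinMPNN | 4.30 | 0.740 | 79.1 | 6.70 | 0.719 | 0.769 | 0.389 |
| ProteinDPO | 5.49 | 0.667 | 72.0 | 10.50 | 0.629 | 0.357 | 0.388 |
| SolubleMPNN | 4.48 | 0.733 | 78.8 | 6.54 | 0.794 ★ | 0.815 | 0.382 |
| Guidance [Sol] | 4.33 | 0.740 | 79.4 | 6.40 | 0.762 ★ | 0.805 | 0.393 |
| MoMPNN [Sol+TM] | 4.37 | 0.738 | 79.3 | 6.27 | 0.884 | 0.747 | 0.384 |
| MoMPNN [Sol+TM+EP] | 4.38 | 0.739 | 79.5 | 6.18 | 0.852 ★ | 0.790 | 0.387 |
| MoMPNN [Sol+IG] | 4.73 | 0.727 | 79.3 | 6.00 | 0.883 ★ | 0.751 | 0.382 |
| MoMPNN [Sol+IG+EP] | 4.61 | 0.731 | 79.3 | 5.99 | 0.856 ★ | 0.789 | 0.384 |
| HyperMPNN | 4.90 | 0.706 | 74.3 | 7.81 | 0.719 | 0.929 ★ | 0.359 |
| Guidance [Thermo] | 4.30 | 0.737 | 77.6 | 6.88 | 0.735 | 0.901 ★ | 0.386 |
| MoMPNN [Thermo+TM] | 4.30 | 0.739 | 78.4 | 6.24 | 0.704 | 0.947 ★ | 0.386 |
| MoMPNN [Thermo+TM+EP] | 4.30 | 0.742 | 78.6 | 6.12 | 0.731 | 0.946 ★ | 0.387 |
| MoMPNN [Thermo+IG] | 4.38 | 0.734 | 78.2 | 5.85 | 0.694 | 0.963 | 0.382 |
| MoMPNN [Thermo+IG+EP] | 4.37 | 0.737 | 78.5 | 5.97 | 0.723 | 0.947 ★ | 0.385 |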
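The refoldability claim can be checked with simple arithmetic on the table. The sketch below copies five rows from it and reports each method's change relative to ProteinMPNN. It assumes, since the summary does not spell it out, that SolubleMPNN and HyperMPNN are the fine-tuned comparisons and that the MoMPNN rows are the DPO-aligned models. The names `ROWS` and `delta_vs_baseline` are made up for illustration.

```python
# Values copied from the table: RMSD, TM score, pLDDT, Sol, Thermo.
ROWS = {
    "ProteinMPNN":        {"RMSD": 4.30, "TM": 0.740, "pLDDT": 79.1, "Sol": 0.719, "Thermo": 0.769},
    "SolubleMPNN":        {"RMSD": 4.48, "TM": 0.733, "pLDDT": 78.8, "Sol": 0.794, "Thermo": 0.815},
    "MoMPNN [Sol+TM]":    {"RMSD": 4.37, "TM": 0.738, "pLDDT": 79.3, "Sol": 0.884, "Thermo": 0.747},
    "HyperMPNN":          {"RMSD": 4.90, "TM": 0.706, "pLDDT": 74.3, "Sol": 0.719, "Thermo": 0.929},
    "MoMPNN [Thermo+TM]": {"RMSD": 4.30, "TM": 0.739, "pLDDT": 78.4, "Sol": 0.704, "Thermo": 0.947},
}


def delta_vs_baseline(method, baseline="ProteinMPNN"):
    """Per-metric difference (method minus baseline), rounded to 3 decimals."""
    base = ROWS[baseline]
    return {metric: round(ROWS[method][metric] - base[metric], 3) for metric in base}


for name in ROWS:
    if name != "ProteinMPNN":
        print(f"{name:20s} {delta_vs_baseline(name)}")
```

On these rows, MoMPNN [Sol+TM] raises Sol by 0.165 with RMSD up 0.07 and TM score down 0.002, whereas SolubleMPNN raises Sol by 0.075 with RMSD up 0.18 and TM score down 0.007. For thermostability, MoMPNN [Thermo+TM] gains 0.178 with RMSD unchanged, while HyperMPNN gains 0.160 with RMSD up 0.60 and pLDDT down 4.8.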
References

1. Hou X, Liu J, Shi C, Liu X, Yang Z, Tang J. Property-driven Protein Inverse Folding With Multi-Objective Preference Alignment. 2026. Available from: https://arxiv.org/abs/2603.06748