Summary

During inverse folding, the identity of about 50% of amino acids can be determined from their structural context alone, i.e., the identities of the surrounding residues are not needed (1,2). This may explain why methods such as MiniMPNN and Frame2Seq, which predict the entire sequence in a single shot, can match the performance of autoregressive methods such as ESM-IF and GVP, which decode one residue at a time conditioned on the residues already placed.
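A toy sketch of why this matters for decoding order, under loudly stated assumptions: the scorer functions below (`structure_only`, `neighbor_aware`) are hypothetical stand-ins, not the actual networks of MiniMPNN, Frame2Seq, ESM-IF or GVP, and each position's "structure features" is reduced to a single integer that indexes the residue its backbone environment favours. When a scorer ignores neighbouring residue identities, one-shot and autoregressive decoding return the same sequence. Only when neighbour identities shift the scores do the two strategies diverge. `recovery` computes native sequence recovery, the fraction of positions where the designed residue matches the native one.

```python
from typing import Callable, List, Optional, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical scorer interface: (position, structure features, partially decoded
# sequence with None for undecoded positions) -> one score per amino acid.
Scorer = Callable[[int, Sequence[int], List[Optional[str]]], List[float]]


def argmax_aa(scores: List[float]) -> str:
    """Return the highest-scoring amino acid (the first one on ties)."""
    best = max(range(len(scores)), key=lambda k: scores[k])
    return AMINO_ACIDS[best]


def decode_one_shot(scorer: Scorer, features: Sequence[int]) -> str:
    """Predict every position at once; no residue identities are visible."""
    blank: List[Optional[str]] = [None] * len(features)
    return "".join(argmax_aa(scorer(i, features, blank)) for i in range(len(features)))


def decode_autoregressive(scorer: Scorer, features: Sequence[int],
                          order: Optional[Sequence[int]] = None) -> str:
    """Decode one position at a time, conditioning on residues already placed."""
    seq: List[Optional[str]] = [None] * len(features)
    for i in (order if order is not None else range(len(features))):
        seq[i] = argmax_aa(scorer(i, features, seq))
    return "".join(seq)


def recovery(designed: str, native: str) -> float:
    """Native sequence recovery: fraction of positions that match the native residue."""
    return sum(a == b for a, b in zip(designed, native)) / len(native)


def structure_only(i, features, partial):
    # Toy: features[i] indexes the residue favoured by the local backbone environment.
    scores = [0.0] * len(AMINO_ACIDS)
    scores[features[i]] = 1.0
    return scores


def neighbor_aware(i, features, partial):
    # Toy: same structural term plus a bonus for repeating the left neighbour, if known.
    scores = structure_only(i, features, partial)
    left = partial[i - 1] if i > 0 else None
    if left is not None:
        scores[AMINO_ACIDS.index(left)] += 2.0
    return scores


if __name__ == "__main__":
    feats = [0, 17, 9, 3]  # hypothetical environment classes -> A, V, L, E
    print(decode_one_shot(structure_only, feats))        # AVLE
    print(decode_autoregressive(structure_only, feats))  # AVLE
    print(decode_autoregressive(neighbor_aware, feats))  # AAAA
    print(recovery("AAAA", "AVLE"))                      # 0.25
```

In this toy, the structure-only scorer makes decoding order irrelevant, which is the situation the ~50% figure describes for a large share of real positions; the remaining positions are where autoregressive conditioning could, in principle, still help.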

References

1. Mahajan SP, Dávila-Hernández FA, Ruffolo JA, Gray JJ. How well do contextual protein encodings learn structure, function, and evolutionary context? Cell Systems. 2025;16(3):101201. Available from: https://doi.org/10.1016/j.cels.2025.101201
2. Ding D, Shaw AY, Sinai S, Rollins N, Prywes N, Savage DF, et al. Protein design using structure-based residue preferences. Nature Communications. 2024;15(1). Available from: https://doi.org/10.1038/s41467-024-45621-4