Summary

Designs generated by inverse folding that are rich in glutamate and lysine can fail to express, an effect attributed to their adenine-rich coding sequences (1,2). Lysine is encoded only by AAA and AAG, and glutamate only by GAA and GAG. In the 2026 Bits in Bio competition (Bits to Binders; 1), expression failures of this kind came disproportionately from ProteinMPNN designs, particularly helix-rich ones. In the Proteina-Complexa paper (2), high lysine and glutamate content, along with design length, correlated with a design's absence from phage pools. ProteinMPNN's tendency to produce lysine- and glutamate-rich sequences has been reported previously (3), and the same bias appears in more recent methods (4).
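The link between K/E content and adenine-rich DNA follows from the standard genetic code. Every lysine codon carries at least two adenines, and every glutamate codon at least one, so codon optimization cannot remove them. The minimal Python sketch below is our own illustration, not code from refs (1) to (4). The helper names `ke_fraction` and `adenine_floor` and the toy sequences are hypothetical. It computes the K/E fraction of a protein and the lowest adenine fraction that any coding sequence for it can reach.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons ordered TTT, TTC, TTA, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_adenines_per_residue() -> dict[str, int]:
    """Fewest adenines any synonymous codon contributes, per amino acid."""
    best: dict[str, int] = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":  # skip stop codons
            continue
        best[aa] = min(best.get(aa, 3), codon.count("A"))
    return best


MIN_A = min_adenines_per_residue()


def ke_fraction(protein: str) -> float:
    """Fraction of residues that are lysine (K) or glutamate (E)."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return sum(aa in "KE" for aa in protein) / len(protein)


def adenine_floor(protein: str) -> float:
    """Lower bound on the adenine fraction of any CDS encoding `protein`.

    Assumes only the 20 standard one-letter residues; the stop codon is excluded.
    """
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return sum(MIN_A[aa] for aa in protein) / (3 * len(protein))


if __name__ == "__main__":
    ke_rich = "EEKKLAEEKKLA"   # hypothetical K/E-rich toy sequence
    no_ke = "SRLAGVLSRLAG"     # hypothetical sequence without K or E
    for name, seq in [("ke_rich", ke_rich), ("no_ke", no_ke)]:
        print(name, round(ke_fraction(seq), 2), round(adenine_floor(seq), 2))
    # prints:
    # ke_rich 0.67 0.33
    # no_ke 0.0 0.0
```

The floor uses the fewest-adenine synonymous codon for each residue (AAG for K, GAG for E). A K/E-rich design therefore stays adenine-rich even after aggressive codon optimization, whereas most other residues have at least one adenine-free codon.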

Figures

Figures from (1) and (2), Figure 4B from (3), and a figure from (4).

See also

1. Kosonocky CW, Abel AM, Feller AL, Cifuentes Rieffer AE, Woolley PR, Lála J, et al. Validation and analysis of 12,000 AI-driven CAR-T designs in the Bits to Binders competition. openRxiv; 2026. Available from: https://doi.org/10.64898/2026.03.03.709355
2. Didi K, Zhang Z, Zhou G, Reidenbach D, Cao Z, Cha S, et al. Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute. arXiv; 2026. Available from: https://arxiv.org/abs/2603.27950
3. Wang H, Liu D, Zhao K, Wang Y, Zhang G. SPDesign: protein sequence designer based on structural sequence profile using ultrafast shape recognition. Briefings in Bioinformatics. 2024;25(3). Available from: https://doi.org/10.1093/bib/bbae146
4. Stark H, Faltings F, Choi M, Xie Y, Hur E, O’Donnell T, et al. BoltzGen: Toward Universal Binder Design. openRxiv; 2025. Available from: https://doi.org/10.1101/2025.11.20.689494