ProGen is a family of autoregressive protein language models. Extensive work has shown that ProGen2, when fine-tuned, can design new sequences from scratch; see PLMs generate more divergent functional proteins than Potts models (1,2). ProGen3 adopts a mixture-of-experts architecture, can also infill masked spans, and can design proteins that express in vitro even without fine-tuning.

Uses

Madani et al. (3) used a fine-tuned version of ProGen to generate functional lysozymes with low sequence identity to natural proteins.

Ruffolo et al. (4) generated four million CRISPR proteins and one million Cas9 proteins using fine-tuned versions of ProGen. Half of the sequences were prompted with the N- or C-terminal 50 residues of natural enzymes, and the other half were generated fully de novo. The generated sequences were well distributed across the sub-families of these proteins, and AlphaFold2 pLDDT values were above 80 for >80% of the CRISPR proteins and >99.4% of the Cas9 proteins, respectively.
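The prompting and confidence-filtering steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in (4): the function names, the use of a per-sequence mean pLDDT, and the strict "greater than 80" comparison are all assumptions made for illustration, and any scores passed in would have to come from a separate AlphaFold2 run.

```python
from statistics import mean
from typing import List, Sequence

PROMPT_LEN = 50       # residues taken from a terminus, as described in the text
PLDDT_CUTOFF = 80.0   # confidence threshold quoted in the text


def terminal_prompt(sequence: str, terminus: str, length: int = PROMPT_LEN) -> str:
    """Return the first (N-terminal) or last (C-terminal) `length` residues.

    Hypothetical helper: sequences shorter than `length` are returned whole.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if terminus == "N":
        return sequence[:length]
    if terminus == "C":
        return sequence[-length:]
    raise ValueError("terminus must be 'N' or 'C'")


def fraction_confident(
    per_residue_plddt: Sequence[Sequence[float]],
    cutoff: float = PLDDT_CUTOFF,
) -> float:
    """Fraction of structures whose mean per-residue pLDDT exceeds `cutoff`.

    Assumption: each structure is summarized by the mean of its
    per-residue pLDDT scores.
    """
    if not per_residue_plddt:
        raise ValueError("no structures supplied")
    passing = sum(1 for scores in per_residue_plddt if mean(scores) > cutoff)
    return passing / len(per_residue_plddt)
```

A prompt built this way would be passed to the generative model as the starting context, and `fraction_confident` would be applied to the predicted structures of the generated sequences to obtain a summary comparable to the percentages quoted above.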

1. Nijkamp E, Ruffolo JA, Weinstein EN, Naik N, Madani A. ProGen2: Exploring the boundaries of protein language models. Cell Systems. 2023;14(11):968–978.e3. Available from: https://doi.org/10.1016/j.cels.2023.10.002
2. Bhatnagar A, Jain S, Beazer J, Curran SC, Hoffnagle AM, Ching KS, et al. Scaling Unlocks Broader Generation and Deeper Functional Understanding of Proteins. openRxiv; 2025. Available from: https://doi.org/10.1101/2025.04.15.649055
3. Madani A, Krause B, Greene ER, Subramanian S, Mohr BP, Holton JM, et al. Large language models generate functional protein sequences across diverse families. Nature Biotechnology. 2023;41(8):1099–106. Available from: https://doi.org/10.1038/s41587-022-01618-2
4. Ruffolo JA, Nayfach S, Gallagher J, Bhatnagar A, Beazer J, Hussain R, et al. Design of highly functional genome editors by modelling CRISPR–Cas sequences. Nature. 2025;645(8080):518–25. Available from: https://doi.org/10.1038/s41586-025-09298-z