Summary

Proteins designed using protein language models (PLMs) diverge further from natural sequences, while remaining functional, than those designed using Potts models (1). For example, β-lactamase sequences designed using Potts models by (2) were found to be active at sequence identities down to 50%, but not lower. The same limitation applies to MSA Transformer (3). In contrast, lysozyme sequences designed using a fine-tuned ProGen model retain activity at identities as low as 31% (1). In the same experiment, bmDCA-generated sequences were inactive. Other studies have found that PLM-generated sequences are also more likely to fold with a high TM-score (4).
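The identity thresholds quoted above depend on how sequence identity is computed. As a minimal sketch, assuming the designed and natural sequences are already aligned to equal length (for example, rows of the same MSA) and that gapped columns are excluded from the denominator, the maximum identity of a design to a set of natural sequences could be computed as follows. The function names `percent_identity` and `max_identity_to_naturals` are hypothetical, and the cited studies may use a different alignment procedure or denominator.

```python
from typing import Iterable


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both inputs are rows of the same alignment (equal length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0  # no comparable columns
    return 100.0 * matches / compared


def max_identity_to_naturals(design: str, naturals: Iterable[str]) -> float:
    """Identity of a design to its closest natural sequence."""
    return max(percent_identity(design, nat) for nat in naturals)


if __name__ == "__main__":
    # Toy sequences for illustration only, not real proteins.
    print(max_identity_to_naturals("ACDE", ["ACDF", "AAAA"]))
```

Under this definition, a design reported at 31% identity would match its closest natural sequence at roughly one in three ungapped aligned positions.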

Figures

Ref (2)

Ref (3)

Ref (4)

1. Madani A, Krause B, Greene ER, Subramanian S, Mohr BP, Holton JM, et al. Large language models generate functional protein sequences across diverse families. Nature Biotechnology. 2023;41(8):1099–106. Available from: https://doi.org/10.1038/s41587-022-01618-2
2. Fram B, Su Y, Truebridge I, Riesselman AJ, Ingraham JB, Passera A, et al. Simultaneous enhancement of multiple functional properties using evolution-informed protein design. Nature Communications. 2024;15(1). Available from: https://doi.org/10.1038/s41467-024-49119-x
3. Sgarbossa D, Lupo U, Bitbol A-F. Generative power of a protein language model trained on multiple sequence alignments. eLife. 2023;12. Available from: https://doi.org/10.7554/elife.79854
4. Alamdari S, Thakkar N, van den Berg R, Tenenholtz N, Strome R, Moses AM, et al. Protein generation with evolutionary diffusion: sequence is all you need. openRxiv; 2023. Available from: https://doi.org/10.1101/2023.09.11.556673