PLM-designed sequences match the distribution of fitness values, lengths, and structure prediction confidence of natural sequences

Summary

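Sequences designed by protein language models (PLMs) have the same distribution of pLDDT values, sequence lengths, and in vitro activity as naturally occurring sequences (1). This was measured in amylases. Because the designs mirror the natural distribution rather than improving on it, this acts as a limitation when the goal is to shift a property; it can be overcome by fine-tuning the model with direct preference optimization (DPO).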
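One way to check a claim like "designed and natural sequences share a distribution" is to compare the empirical distributions of a single metric (length, pLDDT, or measured activity) between the two sets. The sketch below assumes a two-sample Kolmogorov-Smirnov statistic as the comparison; the note does not say which test the authors used, and the function name `ks_statistic` and all numbers are hypothetical toy values, not data from the paper.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical empirical distributions, 1.0 means
    the samples do not overlap at all.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # Both CDFs are step functions, so the gap can only peak at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical toy sequence lengths (assumption, for illustration only).
natural_lengths = [420, 435, 450, 465, 480]
designed_lengths = [425, 440, 455, 470, 485]  # closely tracks the natural set
shifted_lengths = [600, 610, 620, 630, 640]   # clearly different distribution

print(ks_statistic(natural_lengths, designed_lengths))  # small gap
print(ks_statistic(natural_lengths, shifted_lengths))   # maximal gap
```

A small statistic is consistent with the two sets being drawn from the same distribution; on its own it is not a significance test, and a real analysis would need many more sequences per set and a p-value or confidence interval.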

Figures

Ref (1)

Stocco F, Artigues-Lleixa M, Hunklinger A, Widatalla T, Guell M, Ferruz N. Guiding Generative Protein Language Models with Reinforcement Learning. arXiv:2412.12979; 2024. Available from: https://arxiv.org/abs/2412.12979
