Summary

In (1), common ML and statistical quality metrics could not distinguish active from inactive sequences among the synthetic sequences that were generated and experimentally tested. The metrics include pLDDT, ESM-1v or MSA Transformer probabilities, BLOSUM62 distance or sequence identity to the template, Rosetta scores, ESM-IF or ProteinMPNN probabilities, and CNN-based probabilities. The sequences were either designed with MSA Transformer, ancestral sequence reconstruction, or a generative adversarial network, or were randomly chosen natural sequences (“test” in the figure below). Slightly less than half were active.
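
One common way to quantify whether a score separates active from inactive sequences is the ROC AUC: the probability that a randomly chosen active sequence receives a higher score than a randomly chosen inactive one. A value near 0.5 means the metric carries essentially no information about activity. The sketch below is only an illustration of that idea; it is not taken from (1), the function name `roc_auc` is hypothetical, and the toy scores and labels are invented for the example rather than drawn from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], active: Sequence[bool]) -> float:
    """Probability that a random active sequence outscores a random
    inactive one. Ties count as half a win."""
    pos = [s for s, a in zip(scores, active) if a]
    neg = [s for s, a in zip(scores, active) if not a]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive sequence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values, invented purely for illustration (NOT data from the paper).
# Scores could stand in for any per-sequence metric, e.g. a mean pLDDT.
toy_scores = [0.91, 0.85, 0.80, 0.78, 0.60, 0.55]
toy_active = [True, False, True, False, False, True]
toy_auc = roc_auc(toy_scores, toy_active)
print(f"toy AUC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of sequences, which is fine for a few hundred tested variants; for larger sets a rank-based implementation would be the usual choice.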

Figures

Figure 2A from (1)

See also

1. Johnson SR, Fu X, Viknander S, Goldin C, Monaco S, Zelezniak A, et al. Computational scoring and experimental evaluation of enzymes generated by neural networks. Nature Biotechnology. 2024;43(3):396–405. Available from: https://doi.org/10.1038/s41587-024-02214-2