Most ML quality metrics cannot effectively predict enzyme activity after controlling for similarity to native

Summary

Common ML and statistical quality metrics could not distinguish active from inactive sequences among the synthetic sequences generated and experimentally tested in (1). The metrics include pLDDT, ESM-1v or MSA Transformer probabilities, BLOSUM62 distance or sequence identity to the template, Rosetta scores, ESM-IF or ProteinMPNN probabilities, and CNN-based probabilities. The sequences were designed with MSA Transformer, ancestral sequence reconstruction, or a generative adversarial network, or were simply randomly chosen natural sequences (“test” in the figure below). Slightly fewer than half were active.
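One way to make "cannot distinguish" concrete is to ask how often a metric ranks an active sequence above an inactive one (AUROC), both overall and within bins of identity to the native sequence. The sketch below is only an illustration under assumptions: the function names, the 10-point identity bins, and the toy values are hypothetical, and this is not necessarily the analysis performed in (1). An AUROC near 0.5 means the metric carries little information about activity.

```python
from collections import defaultdict


def auroc(scores, labels):
    """Fraction of (active, inactive) pairs in which the active sequence
    has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive sequence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_within_identity_bins(scores, labels, identity_pct, bin_width=10):
    """AUROC computed separately inside each identity bin, so that a metric
    cannot score well merely by tracking similarity to the native sequence.
    Bins that lack either active or inactive sequences are skipped."""
    bins = defaultdict(list)
    for s, y, ident in zip(scores, labels, identity_pct):
        bins[ident // bin_width].append((s, y))
    result = {}
    for b, items in bins.items():
        ys = [y for _, y in items]
        if any(ys) and not all(ys):
            result[b] = auroc([s for s, _ in items], ys)
    return result


# Hypothetical toy values for illustration only; NOT data from (1).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]   # some quality metric
toy_active = [1, 0, 1, 0, 0, 1]               # 1 = active in the assay
toy_identity = [95, 92, 85, 83, 74, 71]       # % identity to native

overall = auroc(toy_scores, toy_active)
per_bin = auroc_within_identity_bins(toy_scores, toy_active, toy_identity)
```

Comparing `overall` with the values in `per_bin` shows whether a metric's apparent performance survives once sequences of similar identity are compared only with each other.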

Figures

Figure 2A from (1)
