Summary

Focused protein sequence libraries are poor training sets. (1) found that when sequence search is highly exploitative (i.e., targeting exclusively sequences predicted to be high-fitness), it becomes a poor training set. In contrast, more explorative libraries could more effectively learn the fitness landscape.

See also

1.
Yang J, Ducharme J, Johnston KE, Li F-Z, Yue Y, Arnold FH. DeCOIL: Optimization of Degenerate Codon Libraries for Machine Learning-Assisted Protein Engineering. ACS Synthetic Biology. 2023;12(8):2444–54. Available from: https://doi.org/10.1021/acssynbio.3c00301