Summary

Sequence-based machine learning models for variant effect prediction generalize to de novo designed proteins (1). A protein language model (PLM) with a top-off layer that was fine-tuned on bulk GEMME predictions achieved a Spearman correlation of 0.386 on the de novo proteins in the mega-scale dataset (2).
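
For readers unfamiliar with the metric, the following is a minimal sketch of how a Spearman correlation between predicted variant effects and measured stability values could be computed. It is not the evaluation code of the cited work: the function names `average_ranks` and `spearman` and the example numbers are hypothetical, and the summary does not state whether the reported value is pooled over all variants or averaged per protein.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for five variants of one protein (illustration only).
predicted_scores = [0.10, 0.40, 0.35, 0.80, 0.60]
measured_stability = [-1.2, 0.3, -0.5, 1.1, 1.5]
rho = spearman(predicted_scores, measured_stability)
print(f"Spearman correlation: {rho:.3f}")
```

Because the metric depends only on ranks, it rewards a model for ordering variants correctly even when its scores are on a different scale from the experimental measurements.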

1. Marquet C, Schlensok J, Abakarova M, Rost B, Laine E. Expert-guided protein language models enable accurate and blazingly fast fitness prediction. Bioinformatics. 2024;40(11). Available from: https://doi.org/10.1093/bioinformatics/btae621
2. Tsuboyama K, Dauparas J, Chen J, Laine E, Mohseni Behbahani Y, Weinstein JJ, et al. Mega-scale experimental analysis of protein folding stability in biology and design. Nature. 2023;620(7973):434–44. Available from: https://doi.org/10.1038/s41586-023-06328-6