Summary
Distillation, in which synthetic data are generated and used to supplement the real data during retraining from scratch, improved structure prediction quality when training AlphaFold2 (1). The improvement is likely due to exposing the Evoformer to more sequence tokens in multiple sequence alignments (MSAs). In contrast, distillation was not as effective when training ESMFold.
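The data flow described above can be sketched in a few lines. This is an illustrative sketch only, not the pipeline used for either model: the function names, the toy `teacher` callable, and the `synthetic_fraction` value are all hypothetical, and a real pipeline would use a trained structure predictor as the teacher.

```python
import random
from typing import Callable, List, Tuple

# (sequence, structure label); plain strings stand in for real data types.
Example = Tuple[str, str]


def build_distillation_set(
    unlabeled: List[str], teacher: Callable[[str], str]
) -> List[Example]:
    """Label unlabeled sequences with a teacher model's predictions."""
    return [(seq, teacher(seq)) for seq in unlabeled]


def sample_batch(
    real: List[Example],
    synthetic: List[Example],
    batch_size: int,
    synthetic_fraction: float,
    rng: random.Random,
) -> List[Example]:
    """Draw a training batch that mixes real and synthetic examples.

    Each slot is filled from the synthetic set with probability
    `synthetic_fraction` (an assumed hyperparameter), else from the real set.
    """
    batch = []
    for _ in range(batch_size):
        source = synthetic if rng.random() < synthetic_fraction else real
        batch.append(rng.choice(source))
    return batch


# Toy data: the teacher here is a placeholder, not a real predictor.
real = [(f"real_seq_{i}", f"experimental_structure_{i}") for i in range(10)]
unlabeled = [f"unlabeled_seq_{i}" for i in range(50)]
synthetic = build_distillation_set(unlabeled, lambda seq: f"predicted({seq})")

rng = random.Random(0)
batch = sample_batch(real, synthetic, batch_size=8, synthetic_fraction=0.5, rng=rng)
```

A student model retrained from scratch on batches drawn this way sees both experimentally labeled and teacher-labeled sequences; the mixing ratio is a tunable choice and the 0.5 used here is arbitrary.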
Figures
Ref (1)
1. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9. Available from: https://doi.org/10.1038/s41586-021-03819-2