Distillation of machine learning models refers to the generation of additional training data using either another neural network (cross-distillation, or student-teacher training) or an earlier version of the same network (self-distillation). The aim can be to expose the model to more training data (AlphaFold2, AlphaMissense, ESM3) or to shrink the model (DistilProtBert; (1)) for better generalization. It has also been used to turn protein language models (PLMs) into variant effect prediction models (2) by adding a top layer that is trained on predictions from larger specialized models. Finally, cross-distillation was used when training AlphaFold3 for prediction of disordered regions (see "AlphaFold3 uses cross-distillation from AlphaFold2 to avoid hallucinating secondary structure in low-pLDDT regions"; (3)).
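
The student-teacher idea described above can be sketched in a few lines of NumPy. This is only an illustrative toy under stated assumptions, not the procedure used by any of the cited models: the "teacher" is a hypothetical linear model that is close to the ground truth, the "student" is a least-squares fit, and all names (`teacher`, `fit_student`, `w_true`, `w_teacher`) are invented for this example. The teacher labels a pool of unlabeled inputs, and the student is trained on the small labeled set plus those teacher-generated labels.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 8

# Hypothetical ground truth and teacher weights (illustrative stand-ins only).
# The teacher is assumed to be a good but imperfect approximation of the truth.
w_true = rng.normal(size=(n_features, 1))
w_teacher = w_true + 0.1 * rng.normal(size=(n_features, 1))


def teacher(x):
    """Stand-in for a larger, already-trained model that can label new inputs."""
    return x @ w_teacher


def fit_student(x, y):
    """Fit a small linear student by least squares and return its weights."""
    weights, *_ = np.linalg.lstsq(x, y, rcond=None)
    return weights


def mse(weights, x, y):
    """Mean squared error of a linear model on (x, y)."""
    return float(np.mean((x @ weights - y) ** 2))


# A small set with true labels: fewer examples than features.
x_labeled = rng.normal(size=(5, n_features))
y_labeled = x_labeled @ w_true

# Distillation step: the teacher generates labels for unlabeled inputs.
x_unlabeled = rng.normal(size=(500, n_features))
y_distilled = teacher(x_unlabeled)

# The student's training set is the labeled data plus the distilled data.
x_train = np.vstack([x_labeled, x_unlabeled])
y_train = np.vstack([y_labeled, y_distilled])

student_baseline = fit_student(x_labeled, y_labeled)
student_distilled = fit_student(x_train, y_train)

# Evaluate both students against the ground truth on held-out inputs.
x_test = rng.normal(size=(200, n_features))
y_test = x_test @ w_true
mse_baseline = mse(student_baseline, x_test, y_test)
mse_distilled = mse(student_distilled, x_test, y_test)

print(f"labeled only:        MSE = {mse_baseline:.3f}")
print(f"labeled + distilled: MSE = {mse_distilled:.3f}")
```

In this toy setting the student trained only on the labeled examples is underdetermined, while the distilled student inherits most of the teacher's accuracy along with the teacher's errors. Self-distillation would follow the same pattern, with an earlier version of the student taking the place of `teacher`.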

1. Geffen Y, Ofran Y, Unger R. DistilProtBert: a distilled protein language model used to distinguish between real proteins and their randomly shuffled counterparts. Bioinformatics. 2022;38(Supplement_2):ii95–8. Available from: https://doi.org/10.1093/bioinformatics/btac474
2. Marquet C, Schlensok J, Abakarova M, Rost B, Laine E. Expert-guided protein language models enable accurate and blazingly fast fitness prediction. Bioinformatics. 2024;40(11). Available from: https://doi.org/10.1093/bioinformatics/btae621
3. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630(8016):493–500. Available from: https://doi.org/10.1038/s41586-024-07487-w