AlphaMissense is a protein language model designed for variant effect prediction that is built on top of AlphaFold2 and fine-tuned using primate data. At release, it achieved state-of-the-art performance on ClinVar data and classified about a third of possible missense variants in the human proteome as likely pathogenic and more than half as likely benign. A later paper found that, for the ion channel CFTR, its predictions had both a high false positive rate and a high correlation with in vitro functional data, and concluded that it “cannot differentiate mechanistic effects or the nature of pathophysiology” (1).
Ref (2)
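A high false positive rate and a high correlation with functional data can hold at the same time, which a toy calculation makes concrete. The sketch below uses made-up scores, made-up functional measurements, and made-up clinical labels; none of the numbers come from (1) or (2). The cutoff value and the helper names `false_positive_rate` and `pearson` are assumptions for illustration only.

```python
from statistics import mean

# Assumed score cutoff for calling a variant "likely pathogenic" (illustrative only).
PATHOGENIC_CUTOFF = 0.564


def false_positive_rate(scores, is_pathogenic, cutoff=PATHOGENIC_CUTOFF):
    """Fraction of truly benign variants whose score exceeds the cutoff."""
    benign_scores = [s for s, p in zip(scores, is_pathogenic) if not p]
    if not benign_scores:
        raise ValueError("need at least one benign variant")
    return sum(s > cutoff for s in benign_scores) / len(benign_scores)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Made-up toy data: predicted pathogenicity, hypothetical % of wild-type
# function measured in vitro, and hypothetical clinical labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.65, 0.60, 0.30, 0.10]
function_pct = [5, 10, 30, 45, 50, 60, 85, 100]
is_pathogenic = [True, True, True, False, False, False, False, False]

fpr = false_positive_rate(scores, is_pathogenic)
r = pearson(scores, function_pct)
print(f"false positive rate: {fpr:.2f}")
print(f"correlation with function: {r:.2f}")
```

In this toy set, three of the five benign variants score above the cutoff even though the scores track the functional measurements closely (a strong negative correlation, since higher scores go with lower function). A score can rank variants well by functional impact and still overcall pathogenicity at a fixed threshold.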
Notes
- Three-stage training procedure:
  - Pre-training on structure prediction and masked MSA site prediction (as in vanilla AlphaFold2), except that the BERT losses were upweighted
  - Fine-tuning on primate sequences
  - Self-distillation, then repeat
- Ablation studies suggest the entire training procedure is necessary
- Reduced performance on residues predicted to be disordered
1. McDonald EF, Oliver KE, Schlebach JP, Meiler J, Plate L. Benchmarking AlphaMissense pathogenicity predictions against cystic fibrosis variants. PLOS ONE. 2024;19(1):e0297560. Available from: https://doi.org/10.1371/journal.pone.0297560
2. Cheng J, Novati G, Pan J, Bycroft C, Žemgulytė A, Applebaum T, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science. 2023;381(6664):eadg7492. Available from: https://doi.org/10.1126/science.adg7492