Quartz 4


Masked PLMs are more sensitive to training imbalances than autoregressive PLMs


Created Jul 02, 2024 · Modified Apr 21, 2026

  • protein-language-models/training

Summary

Masked PLMs (such as ESM) are more sensitive to imbalances in their training data than autoregressive PLMs (such as ProGen) (1). The same is presumably true of transformers trained on natural language, but this is an extrapolation rather than something (1) tests.
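To make the two training objectives concrete, here is a minimal Python sketch of how training targets are built for a masked PLM and for an autoregressive PLM. It is a simplification and an assumption on my part: the function names `masked_lm_targets` and `causal_lm_targets` are hypothetical, the masking step replaces every selected position with a `<mask>` token (it omits the random/unchanged-token substitutions that BERT-style schemes typically use), the causal version is a plain next-token objective with no conditioning tags, and the 15% mask rate is only an illustrative default. It shows which positions are supervised under each objective. It does not claim to explain the sensitivity difference reported in (1).

```python
import random


def masked_lm_targets(seq: str, mask_rate: float = 0.15, seed: int = 0):
    """Masked-LM sketch (hypothetical helper): hide ~mask_rate of the residues
    behind <mask> and supervise only those hidden positions."""
    rng = random.Random(seed)
    tokens = list(seq)
    n_mask = max(1, round(mask_rate * len(tokens)))
    # Choose distinct positions to hide.
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    inputs = tokens.copy()
    for i in positions:
        inputs[i] = "<mask>"
    # The model sees the whole (partly masked) sequence in both directions
    # and is scored only on the masked positions.
    targets = {i: tokens[i] for i in positions}
    return inputs, targets


def causal_lm_targets(seq: str):
    """Autoregressive sketch (hypothetical helper): every residue after the
    first is a target, predicted from the prefix to its left only."""
    tokens = list(seq)
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


if __name__ == "__main__":
    example = "MKTAYIAKQRQISFVKSHFS"  # arbitrary 20-residue example sequence
    inputs, targets = masked_lm_targets(example)
    pairs = causal_lm_targets(example)
    print("masked-LM supervised positions:", len(targets))
    print("causal-LM supervised positions:", len(pairs))
```

On this 20-residue example, the masked objective supervises 3 positions per pass while the causal objective supervises 19. Real models differ in tokenization, special tokens, and masking details.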

Figures

Ref (1)

See also

  • PLMs are biased by uneven distribution of sequence data in datasets such as UniRef and UniProt
1. Cheng X, Chen B, Li P, Gong J, Tang J, Song L. Training Compute-Optimal Protein Language Models. openRxiv; 2024. Available from: https://doi.org/10.1101/2024.06.06.597716


Created with Quartz v4.5.2 © 2026
