Summary
Alternative noise schedules reportedly improve the training of protein language models (1). No ablation data is provided for this claim, but the authors of ESM3 state that training with the typical fixed 15% masking rate yielded poor results compared with a noise schedule in which the masking rate itself is sampled randomly during training.
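The contrast described above, a fixed masking rate versus a randomly sampled one, can be sketched in Python. This is a minimal illustration under stated assumptions, not ESM3's training code. The mask token, the helper names (`mask_sequence`, `fixed_rate`, `sampled_rate`), and the uniform distribution used for the variable rate are all hypothetical choices made for the example. Reference (1) uses its own schedule, which is not reproduced here.

```python
import random

MASK_TOKEN = "<mask>"  # hypothetical mask token, for illustration only


def mask_sequence(tokens, mask_rate, rng):
    """Replace roughly `mask_rate` of the positions with MASK_TOKEN.

    Returns the corrupted token list and the sorted masked positions,
    i.e. the positions a masked language model would be trained to predict.
    At least one position is always masked so every example gives a loss.
    """
    n_mask = max(1, round(mask_rate * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = MASK_TOKEN
    return corrupted, positions


def fixed_rate(rng):
    # BERT-style baseline: every example is masked at the same 15% rate.
    return 0.15


def sampled_rate(rng):
    # Variable schedule (assumption): draw a fresh rate per example,
    # uniformly from [0, 1). Not the exact ESM3 distribution.
    return rng.random()


if __name__ == "__main__":
    rng = random.Random(0)
    seq = list("MKTAYIAKQRQISFVKSHFSRQ")  # arbitrary 22-residue example
    for schedule in (fixed_rate, sampled_rate):
        for _ in range(3):
            rate = schedule(rng)
            corrupted, positions = mask_sequence(seq, rate, rng)
            print(f"{schedule.__name__}: rate={rate:.2f}, masked={len(positions)}")
```

Under the fixed schedule every example of this sequence has the same 3 of 22 residues masked. Under the sampled schedule, the number masked ranges from nearly none to nearly all. One plausible reason this matters, offered here as an interpretation rather than a claim from (1), is that a model trained only at 15% never sees the heavily masked inputs it would face when generating a sequence from a mostly masked prompt.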
1. Hayes T, Rao R, Akin H, Sofroniew NJ, Oktay D, Lin Z, et al. Simulating 500 million years of evolution with a language model. Science. 2025;387(6736):850–8. Available from: https://doi.org/10.1126/science.ads0018