Summary

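At a given compute budget, sparse models (e.g., mixture-of-experts models) outperform dense models (1). For example, in (1) a sparse model with 3.0 billion total parameters (866 million active) reached a lower validation loss than a 1.4-billion-parameter dense model (1.849 vs. 1.920) while using about 63% of the dense model's FLOPs.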
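The gap between total and active parameters in the table comes from sparse routing. In a mixture-of-experts layer, a router sends each token to only a few experts, so only those experts' weights are used for that token. The sketch below assumes top-k routing with a softmax over the selected experts' router scores, which is one common convention. The expert count, k, and parameter sizes are hypothetical illustration values, not the configuration used in (1).

```python
import math


def top_k_route(router_logits, k):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts.

    Gate weights are a softmax renormalized over the selected experts only
    (an assumed convention for illustration, not taken from (1)).
    """
    # Indices of the k largest logits, highest first.
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the chosen logits; subtract the max for numerical stability.
    m = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - m) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def parameter_counts(shared_params, params_per_expert, num_experts, k):
    """Total vs. per-token active parameters.

    shared_params: parameters every token uses (e.g., embeddings, attention, router).
    params_per_expert: size of one expert, summed over all MoE layers.
    Assumes all experts are the same size and each token uses exactly k of them.
    """
    total = shared_params + num_experts * params_per_expert
    active = shared_params + k * params_per_expert
    return total, active


# Hypothetical configuration: 8 experts, 2 active per token.
routes = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2], k=2)
print([i for i, _ in routes])  # prints [1, 4]
total, active = parameter_counts(shared_params=100_000_000,
                                 params_per_expert=50_000_000,
                                 num_experts=8, k=2)
print(total, active)  # prints 500000000 200000000
```

Because compute per token scales with active rather than total parameters, a sparse model can hold more parameters than a dense model at similar or lower training cost.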

Figures

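Architecture | Total parameters | Active parameters | FLOPs        | Validation loss
-------------|------------------|-------------------|--------------|----------------
Dense        | 1,386,084,352    | 1,386,084,352     | 1.10 × 10^20 | 1.920
Sparse       | 1,355,871,232    | 393,174,016       | 3.14 × 10^19 | 1.948
Sparse       | 2,989,920,000    | 866,369,280       | 6.93 × 10^19 | 1.849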
Table from (1)
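The comparison can be made explicit by expressing each sparse model's compute relative to the dense baseline. The FLOPs column also appears consistent with the common training-compute approximation C ≈ 6·N_active·D, with roughly 1.3 × 10^10 training tokens D for every model. That reading is an inference from the numbers, not something stated in (1). A minimal sketch that reproduces both checks from the table values:

```python
# Values copied from the table; FLOPs and validation losses as reported in (1).
MODELS = [
    # (name, total_params, active_params, flops, val_loss)
    ("Dense", 1_386_084_352, 1_386_084_352, 1.10e20, 1.920),
    ("Sparse (1.4B)", 1_355_871_232, 393_174_016, 3.14e19, 1.948),
    ("Sparse (3.0B)", 2_989_920_000, 866_369_280, 6.93e19, 1.849),
]


def implied_tokens(active_params, flops):
    """Training tokens D implied by the ASSUMED approximation C ~= 6 * N_active * D."""
    return flops / (6 * active_params)


def compare_to_baseline(models):
    """Per-model FLOPs relative to the first (dense) row, plus the loss difference."""
    _, _, _, base_flops, base_loss = models[0]
    rows = []
    for name, total, active, flops, loss in models:
        rows.append({
            "name": name,
            "active_fraction": active / total,
            "flops_vs_dense": flops / base_flops,
            # Negative means lower (better) validation loss than the dense model.
            "loss_delta": loss - base_loss,
            "implied_tokens": implied_tokens(active, flops),
        })
    return rows


for r in compare_to_baseline(MODELS):
    print(f"{r['name']:<14} active={r['active_fraction']:.2f} "
          f"flops={r['flops_vs_dense']:.2f}x loss_delta={r['loss_delta']:+.3f} "
          f"D~{r['implied_tokens']:.2e}")
```

With these numbers, both sparse models activate about 29% of their parameters per token. The larger sparse model reaches a validation loss about 0.071 lower than the dense model using 63% of its FLOPs, while the smaller one uses about 29% of the dense FLOPs at a loss about 0.028 higher.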
1. Bhatnagar A, Jain S, Beazer J, Curran SC, Hoffnagle AM, Ching KS, et al. Scaling Unlocks Broader Generation and Deeper Functional Understanding of Proteins. openRxiv; 2025. Available from: https://doi.org/10.1101/2025.04.15.649055