Summary
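At a fixed compute budget, sparse models (e.g., mixture-of-experts models) outperform dense models (1). In the table below, the larger sparse model reaches a lower validation loss than the dense model (1.849 vs. 1.920) while using about 37% fewer FLOPs.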
Figures
| Architecture | Parameters | Active Parameters | FLOPs | Validation Loss |
|---|---|---|---|---|
| Dense | 1,386,084,352 | 1,386,084,352 | 1.10 × 10^20 | 1.920 |
| Sparse | 1,355,871,232 | 393,174,016 | 3.14 × 10^19 | 1.948 |
| Sparse | 2,989,920,000 | 866,369,280 | 6.93 × 10^19 | 1.849 |

Table from (1).
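
The comparison in the table can be made explicit with a little arithmetic. The sketch below is only an illustration: the numbers are copied from the table, while the function name `compare_to_baseline` and the derived field names are hypothetical and do not come from (1). It computes each model's active-parameter fraction, its FLOPs relative to the dense baseline, and its change in validation loss.

```python
# Rows copied from the table:
# (architecture, total parameters, active parameters, FLOPs, validation loss)
ROWS = [
    ("Dense", 1_386_084_352, 1_386_084_352, 1.10e20, 1.920),
    ("Sparse", 1_355_871_232, 393_174_016, 3.14e19, 1.948),
    ("Sparse", 2_989_920_000, 866_369_280, 6.93e19, 1.849),
]


def compare_to_baseline(rows, baseline_index=0):
    """Express each row relative to a baseline row (hypothetical helper)."""
    _, _, _, base_flops, base_loss = rows[baseline_index]
    summary = []
    for arch, total, active, flops, loss in rows:
        summary.append({
            "architecture": arch,
            "total_params": total,
            # Share of parameters that are active
            "active_fraction": active / total,
            # FLOPs as a multiple of the baseline's FLOPs
            "flops_vs_baseline": flops / base_flops,
            # Negative means lower (better) validation loss than the baseline
            "loss_delta": loss - base_loss,
        })
    return summary


summary = compare_to_baseline(ROWS)
for row in summary:
    print(
        f"{row['architecture']:<6} "
        f"params={row['total_params']:>13,} "
        f"active={row['active_fraction']:.2f} "
        f"flops_ratio={row['flops_vs_baseline']:.2f} "
        f"loss_delta={row['loss_delta']:+.3f}"
    )
```

Both sparse rows have roughly 29% of their parameters active. The larger sparse model uses about 0.63 times the dense model's FLOPs and has a lower validation loss, while the smaller sparse model uses about 0.29 times the FLOPs and has a slightly higher loss.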
1. Bhatnagar A, Jain S, Beazer J, Curran SC, Hoffnagle AM, Ching KS, et al. Scaling Unlocks Broader Generation and Deeper Functional Understanding of Proteins. openRxiv; 2025. Available from: https://doi.org/10.1101/2025.04.15.649055