Scaling hypothesis

Summary

The scaling hypothesis says that Transformer models (and deep learning models more broadly) learn a richer understanding of their training data as both the models and their training sets grow larger. Reference (1) gives examples of this in natural language processing (see Figures below).
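One common way to make this kind of claim quantitative is to assume that loss falls as a power law in model size, loss ≈ a · N^(−b), and to fit that curve to observed (size, loss) pairs. The note itself does not commit to this functional form, so the sketch below is an illustration under that assumption only. The function name `fit_power_law`, the constants, and the data points are all hypothetical; the data are synthetic values generated from a known power law, not real measurements.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**(-b) by ordinary least squares in log-log space.

    Taking logs turns the assumed power law into a straight line:
        log(loss) = log(a) - b * log(size)
    so the slope gives -b and the intercept gives log(a).
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


# Synthetic data: made-up parameter counts and losses drawn from an
# exact power law with hypothetical constants (illustration only).
TRUE_A, TRUE_B = 10.0, 0.1
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [TRUE_A * s ** (-TRUE_B) for s in sizes]

a, b = fit_power_law(sizes, losses)

# Extrapolate the fitted curve to a model ten times larger than the largest one seen.
predicted = a * (1e10) ** (-b)
print(f"a = {a:.3f}, b = {b:.3f}, predicted loss at 1e10 parameters = {predicted:.3f}")
```

Because the synthetic points lie exactly on a power law, the fit recovers the constants used to generate them. Real measurements are noisy and need not follow a single power law, so a fit like this is a descriptive summary rather than a guarantee about larger models.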

Figures

From (1)

From (2)

1. Ruan Y, Maddison CJ, Hashimoto T. Observational Scaling Laws and the Predictability of Language Model Performance. arXiv; 2024. Available from: https://arxiv.org/abs/2405.10938

2. Chen B, Cheng X, Li P, Geng Y, Gong J, Li S, et al. xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein. openRxiv; 2023. Available from: https://doi.org/10.1101/2023.07.05.547496
