Grokking

Grokking refers to continuing to train a neural network far beyond the point of overfitting. Wang et al. (1) showed that it can improve the reasoning abilities of transformers, whereas Springer et al. (2) showed that it can make models harder to fine-tune.
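
One way to make "far beyond overfitting" concrete is to measure, from logged accuracy curves, how many steps pass between the model fitting its training data and validation accuracy catching up. The following is a minimal sketch, not a method from either cited paper: the function names, the 0.99 threshold, and the accuracy values are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def first_step_reaching(accuracies: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the first logged step whose accuracy meets the threshold."""
    for step, acc in enumerate(accuracies):
        if acc >= threshold:
            return step
    return None


def generalization_delay(
    train_acc: Sequence[float],
    val_acc: Sequence[float],
    threshold: float = 0.99,  # hypothetical cutoff, not taken from the cited papers
) -> Optional[int]:
    """Logged steps between fitting the training set and generalizing.

    Returns None if either curve never reaches the threshold.
    """
    fit_step = first_step_reaching(train_acc, threshold)
    gen_step = first_step_reaching(val_acc, threshold)
    if fit_step is None or gen_step is None:
        return None
    return gen_step - fit_step


# Made-up curves for illustration only (not results from any paper).
train_acc = [0.40, 0.90, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00]
val_acc = [0.30, 0.35, 0.36, 0.35, 0.40, 0.70, 0.99, 1.00]
delay = generalization_delay(train_acc, val_acc)
print(delay)
```

A large positive delay means training continued well after the training set was fit before validation accuracy reached the same level. A result of `None` means at least one of the two curves never crossed the threshold in the logged run.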

1. Wang B, Yue X, Su Y, Sun H. Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization. 2024. Available from: https://arxiv.org/abs/2405.15071

2. Springer JM, Goyal S, Wen K, Kumar T, Yue X, Malladi S, et al. Overtrained Language Models Are Harder to Fine-Tune. In: International Conference on Machine Learning. PMLR; 2025. p. 56719–89. Available from: https://proceedings.mlr.press/v267/springer25a.html
