Summary
Pretraining performance does not capture effectiveness on downstream tasks (1,2). Neyshabur et al. found that fine-tuning from different post-plateau pretraining checkpoints yielded drastically different downstream performance, with more heavily trained checkpoints performing better.
See also
- Protein property prediction using PLMs does not benefit from scale except when predicting features of either structural or sparsely populated sequence families
- Fine-tuning leads to small changes in parameter values in L2 space
- Fine-tuning can be detrimental to performance
1. Neyshabur B, Sedghi H, Zhang C. What is being transferred in transfer learning? Advances in Neural Information Processing Systems. 2020;33:512–23. Available from: https://proceedings.neurips.cc/paper/2020/hash/0607f4c705595b911a4f3e7a127b44e0-Abstract.html
2. Li F-Z, Amini AP, Yue Y, Yang KK, Lu AX. Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models. openRxiv; 2024. Available from: https://doi.org/10.1101/2024.02.05.578959