Summary

ML models trained for property prediction (such as fitness prediction) must trade off bias against variance (Fannjiang and Listgarten (1)). Bias here refers to systematic error, meaning predictions that are consistently inaccurate, while variance refers to uncertainty, meaning predictions that change with the particular data the model was trained on. Datasets that are small but contextually relevant tend to produce ML models with more variance, while larger but less context-specific datasets (e.g., the sequences used to train protein language models, PLMs) tend to produce ML models with more bias.
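As a toy illustration of this trade-off (not taken from the cited work), the Python sketch below estimates a single hypothetical "true fitness" value from two kinds of simulated datasets: a small one drawn from the relevant context, and a larger one drawn from a shifted, less relevant context. The Gaussian noise model, the size of the shift, the dataset sizes, and all names are assumptions chosen only for illustration.

```python
import random
import statistics


def bias_and_variance(true_value, source_mean, n_samples, n_trials, rng):
    """Bias and variance of the sample-mean estimator over repeated datasets.

    Each trial draws a fresh dataset of n_samples noisy measurements
    (unit-variance Gaussian noise around source_mean, an assumed noise model)
    and uses their mean as the prediction of true_value.
    """
    estimates = []
    for _ in range(n_trials):
        data = [rng.gauss(source_mean, 1.0) for _ in range(n_samples)]
        estimates.append(statistics.fmean(data))
    bias = statistics.fmean(estimates) - true_value   # systematic error
    variance = statistics.pvariance(estimates)        # spread across datasets
    return bias, variance


rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_FITNESS = 1.0       # hypothetical quantity we want to predict

# Small but contextually relevant: centred on the true value, only 5 points.
small_bias, small_var = bias_and_variance(TRUE_FITNESS, 1.0, 5, 500, rng)

# Large but less relevant: 500 points, centred 0.5 away from the true value.
large_bias, large_var = bias_and_variance(TRUE_FITNESS, 1.5, 500, 500, rng)

print(f"small, relevant dataset: bias={small_bias:+.3f}, variance={small_var:.4f}")
print(f"large, shifted dataset:  bias={large_bias:+.3f}, variance={large_var:.4f}")
```

Under these assumptions, the small dataset should give predictions that are roughly unbiased but vary widely from one dataset to the next, while the large dataset should give very stable predictions that are consistently offset from the true value.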

1. Fannjiang C, Listgarten J. Is novelty predictable? arXiv. 2023; Available from: https://arxiv.org/abs/2306.00872