Summary
In binary classification tasks, the composition of the negative training data affects what the model learns (1). Moreover, whether a model has learned the correct rules cannot be inferred directly from its absolute performance on either in-distribution or out-of-distribution data. The authors evaluated this on antibody-antigen binding prediction.
Details
The authors note that “biological rule discovery was not trivially associated with either in-distribution (ID) or out-of-distribution (OOD) model performance” and “performance is necessary but not sufficient for rule discovery.”
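As a toy illustration of the first point (not a reproduction of the authors' experiments), the sketch below trains the same logistic-regression model on one positive set paired with two different negative sets. Everything here is a hypothetical assumption made for illustration: the two synthetic features `x0` and `x1`, the Gaussian data, the negative sets `neg_a` and `neg_b`, and the helper names `sample`, `train_logreg`, and `accuracy`.

```python
import math
import random


def sample(n, mean_x0, mean_x1, rng, sd=0.5):
    """Draw n synthetic 2-feature examples (hypothetical data, not real sequences)."""
    return [(rng.gauss(mean_x0, sd), rng.gauss(mean_x1, sd)) for _ in range(n)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(pos, neg, epochs=300, lr=0.5):
    """Full-batch gradient descent for 2-feature logistic regression."""
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    w0 = w1 = b = 0.0
    n = len(data)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in data:
            err = sigmoid(w0 * x0 + w1 * x1 + b) - y
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n
    return w0, w1, b


def accuracy(model, pos, neg):
    """Fraction of examples classified correctly at a 0.5 probability threshold."""
    w0, w1, b = model
    correct = sum(w0 * x0 + w1 * x1 + b > 0 for x0, x1 in pos)
    correct += sum(w0 * x0 + w1 * x1 + b <= 0 for x0, x1 in neg)
    return correct / (len(pos) + len(neg))


rng = random.Random(0)

# Positives are high on both features.
pos_train, pos_test = sample(200, 1, 1, rng), sample(200, 1, 1, rng)
# Negative set A differs from the positives only on x1.
neg_a_train, neg_a_test = sample(200, 1, -1, rng), sample(200, 1, -1, rng)
# Negative set B differs from the positives only on x0.
neg_b_train, neg_b_test = sample(200, -1, 1, rng), sample(200, -1, 1, rng)

model_a = train_logreg(pos_train, neg_a_train)  # trained against negatives A
model_b = train_logreg(pos_train, neg_b_train)  # trained against negatives B

# In-distribution: test negatives come from the same set used in training.
id_acc_a = accuracy(model_a, pos_test, neg_a_test)
id_acc_b = accuracy(model_b, pos_test, neg_b_test)
# Out-of-distribution: test negatives come from the other set.
ood_acc_a = accuracy(model_a, pos_test, neg_b_test)
ood_acc_b = accuracy(model_b, pos_test, neg_a_test)

print("model A weights (w0, w1, b):", model_a)
print("model B weights (w0, w1, b):", model_b)
print("ID accuracy  A, B:", id_acc_a, id_acc_b)
print("OOD accuracy A, B:", ood_acc_a, ood_acc_b)
```

In this toy setup both models should reach similar in-distribution accuracy while putting their weight on different features, so in-distribution accuracy alone does not show which feature a model relies on. The sketch does not demonstrate the paper's further claim that out-of-distribution performance is also insufficient for judging rule discovery.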
1. Ursu E, Minnegalieva A, Rawat P, Chernigovskaya M, Tacutu R, Sandve GK, et al. Training data composition determines machine learning generalization and biological rule discovery. Nature Machine Intelligence. 2025;7(8):1206–19. Available from: https://doi.org/10.1038/s42256-025-01089-5