Summary

Protein-ligand activity values (such as IC50 and Ki) measured in different assays correlate only weakly with each other, and the correlation improves when the assays share many protein-ligand pairs (1). Combining data from multiple assays, even on the same protein, therefore introduces substantial noise. Another recent paper found that the same high-throughput measurements of protein function, carried out in two different labs with different equipment, have an average Spearman correlation of 0.73 (2). Because of these factors, training deep-learning affinity predictors on many datasets from different sources requires complex training schemes (cite Boltz-2 paper).
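A minimal sketch of the kind of comparison behind these numbers, under stated assumptions: restrict two assays to the protein-ligand pairs they both measured, then compute Spearman's rank correlation on those shared pairs. The dictionaries, the `(protein, ligand)` keys, the pIC50-like values, and the helper names are all hypothetical, and this is not the exact procedure used in either reference. The standard library is used so the block is self-contained.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5


def compare_assays(assay_a, assay_b):
    """Spearman rho over (protein, ligand) pairs present in both assays.

    Hypothetical helper: each assay is a dict mapping (protein, ligand)
    to an activity value such as pIC50.
    """
    shared = sorted(set(assay_a) & set(assay_b))
    if len(shared) < 3:
        raise ValueError("need at least 3 shared protein-ligand pairs")
    x = [assay_a[p] for p in shared]
    y = [assay_b[p] for p in shared]
    # Spearman rho is the Pearson correlation of the (average) ranks.
    return len(shared), pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: two assays on protein "P1" overlapping on 4 ligands.
assay_a = {("P1", "L1"): 6.1, ("P1", "L2"): 7.3, ("P1", "L3"): 5.2,
           ("P1", "L4"): 8.0, ("P1", "L5"): 6.8}
assay_b = {("P1", "L1"): 5.9, ("P1", "L2"): 5.5, ("P1", "L3"): 5.8,
           ("P1", "L4"): 7.1, ("P1", "L6"): 4.0}

n, rho = compare_assays(assay_a, assay_b)
print(f"{n} shared pairs, rho = {rho:.2f}")  # → 4 shared pairs, rho = 0.40
```

Ligands measured in only one assay (L5, L6) are simply dropped, so the number of shared pairs is reported alongside rho. With only a handful of shared pairs a single disagreement moves rho a lot, which is one reason overlap matters when comparing assays. In practice `scipy.stats.spearmanr` computes the same statistic.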

Figures

Ref (1)

Ref (2)

See also

1. Landrum GA, Riniker S. Combining IC50 or Ki Values from Different Sources Is a Source of Significant Noise. Journal of Chemical Information and Modeling. 2024;64(5):1560–7. Available from: https://doi.org/10.1021/acs.jcim.4c00049
2. Spinner A, Ross D, Cortade D, Ikonomova S, Baranowski C, Dhroso A, et al. GROQ-seq Enables Cross-site Reproducibility for High-Throughput Measurement of Protein Function. 2026 Apr; Available from: http://dx.doi.org/10.64898/2026.04.07.716961