Summary
Measurements of protein-ligand effects (such as IC50 and Ki) carried out in different assays correlate only weakly with each other. The correlation strengthens when the assays share many protein-ligand pairs (1). This means that combining data from multiple assays introduces substantial noise, even when the assays target the same protein. Another recent paper found that the same measurements collected in two different labs with different equipment have an average Spearman correlation of 0.73 (2). Training deep-learning affinity predictors on many datasets from different sources therefore requires complex schemes (cite Boltz-2 paper).
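As a rough illustration of how cross-assay agreement of this kind can be quantified, the sketch below computes a Spearman rank correlation over the protein-ligand pairs that two assays have in common. It is a minimal standard-library sketch, not the procedure used in (1) or (2). The function names, the (protein, ligand) dictionary layout, and the pIC50-style numbers are all hypothetical and chosen only for the example.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / (sx * sy)


def spearman_on_shared_pairs(assay_a, assay_b):
    """Spearman correlation restricted to pairs measured in both assays.

    Each assay is a dict mapping (protein, ligand) -> measured value.
    Returns (rho, number_of_shared_pairs).
    """
    shared = sorted(set(assay_a) & set(assay_b))
    if len(shared) < 3:
        raise ValueError("too few shared protein-ligand pairs")
    a = [assay_a[key] for key in shared]
    b = [assay_b[key] for key in shared]
    # Spearman's rho is the Pearson correlation of the ranks.
    return pearson(average_ranks(a), average_ranks(b)), len(shared)


# Hypothetical values for one protein measured in two assays.
assay_a = {("P1", "L1"): 6.2, ("P1", "L2"): 7.1, ("P1", "L3"): 5.4,
           ("P1", "L4"): 8.0, ("P1", "L5"): 6.8}
assay_b = {("P1", "L1"): 6.6, ("P1", "L2"): 6.5, ("P1", "L3"): 5.9,
           ("P1", "L4"): 7.7, ("P1", "L6"): 7.0}

rho, n_shared = spearman_on_shared_pairs(assay_a, assay_b)
print(f"Spearman rho = {rho:.2f} over {n_shared} shared pairs")
```

Only the four pairs present in both dictionaries enter the calculation, which is why the size of the overlap matters: with few shared pairs the estimate itself becomes unreliable.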
Figures
Ref (1)
Ref (2)
See also
- About 100k datapoints required to train accurate ddG predictor
- Spearman correlations of protein property prediction methods do not correlate perfectly with absolute error
- Kd differs from IC50, LD50, and GI50
1. Landrum GA, Riniker S. Combining IC50 or Ki Values from Different Sources Is a Source of Significant Noise. Journal of Chemical Information and Modeling. 2024;64(5):1560–7. Available from: https://doi.org/10.1021/acs.jcim.4c00049
2. Spinner A, Ross D, Cortade D, Ikonomova S, Baranowski C, Dhroso A, et al. GROQ-seq Enables Cross-site Reproducibility for High-Throughput Measurement of Protein Function. 2026 Apr; Available from: http://dx.doi.org/10.64898/2026.04.07.716961