Contrastive learning

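Contrastive learning is a representation-learning method, used in both supervised and self-supervised settings, that pulls the latent representations of similar data points towards each other and pushes dissimilar ones apart by training with custom (contrastive) losses.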

Implementations

The triplet margin loss used by Yu et al. (1):
$$L^{TM} = \max\left(\lVert z_{a} - z_{p} \rVert^{2} - \lVert z_{a} - z_{n} \rVert^{2} + \alpha,\ 0\right)$$
$z_{a}$: anchor (enzyme) embedding
$z_{p}$: positive case
$z_{n}$: negative case (an enzyme with a different EC number), selected from EC numbers that lie close in Euclidean space to the positive case, i.e. a hard negative
$\alpha$: margin; set to 1
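A minimal sketch of the triplet margin loss in its standard hinged form, $\max(\cdot, 0)$, using squared Euclidean distances and $\alpha = 1$. Embeddings are plain Python lists; the function names and toy vectors are hypothetical and are not taken from Yu et al. (1).

```python
from typing import Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Euclidean distance ||u - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_margin_loss(
    z_a: Sequence[float],
    z_p: Sequence[float],
    z_n: Sequence[float],
    alpha: float = 1.0,
) -> float:
    """Hinged triplet margin loss on squared distances (hypothetical helper).

    The loss is zero once the negative is at least `alpha` farther
    (in squared distance) from the anchor than the positive is.
    """
    gap = squared_distance(z_a, z_p) - squared_distance(z_a, z_n) + alpha
    return max(gap, 0.0)


# Toy 2-D embeddings, made up for illustration.
anchor = [0.0, 0.0]
positive = [0.5, 0.0]        # squared distance 0.25
easy_negative = [1.0, 1.0]   # squared distance 2.0 -> already beyond the margin
hard_negative = [0.6, 0.0]   # squared distance 0.36 -> still inside the margin

print(triplet_margin_loss(anchor, positive, easy_negative))  # 0.0
print(triplet_margin_loss(anchor, positive, hard_negative))
```

Only the hard negative produces a non-zero loss (and hence a gradient), which illustrates why negatives close to the positive case are the informative ones. Note that PyTorch's `torch.nn.TripletMarginLoss` applies the same hinge but to unsquared distances by default.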
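The SupCon-Hard (supervised contrastive) loss used by Yu et al. (1):
$$L^{sup} = \sum_{e \in E} \frac{-1}{\lvert P(e) \rvert} \sum_{z_{p} \in P(e)} \log \frac{\exp(z_{e} \cdot z_{p} / \tau)}{\sum_{z_{a} \in A(e)} \exp(z_{e} \cdot z_{a} / \tau)}$$
$E$: set of anchor enzymes
$P(e)$: positive embeddings for anchor $e$
$A(e)$: all embeddings contrasted against anchor $e$ (positives and negatives)
$\tau$: temperature, set to 0.1 in the paper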
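The supervised contrastive loss above can be sketched as follows. This assumes L2-normalized embeddings (usual for SupCon-style losses, though the note does not state it) and takes $A(e)$ to be the anchor's positives plus its negatives; the anchor embedding $z_e$ appears in both numerator and denominator. The dictionary layout and function names are hypothetical; this is not the CLEAN implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def supcon_loss(
    anchors: Dict[str, Vector],
    positives: Dict[str, List[Vector]],
    negatives: Dict[str, List[Vector]],
    tau: float = 0.1,
) -> float:
    """Sum over anchors e of the mean -log softmax score of each positive.

    anchors[e]   -> z_e (assumed L2-normalized)
    positives[e] -> P(e), must be non-empty
    negatives[e] -> negatives for e; A(e) = P(e) + negatives[e] (assumption)
    """
    total = 0.0
    for e, z_e in anchors.items():
        pos = positives[e]
        contrast = list(pos) + list(negatives[e])  # A(e)
        log_denom = log_sum_exp([dot(z_e, z_a) / tau for z_a in contrast])
        log_probs = [dot(z_e, z_p) / tau - log_denom for z_p in pos]
        total += -sum(log_probs) / len(pos)
    return total


# One anchor, one positive, one negative (toy unit vectors).
aligned = supcon_loss({"e1": [1.0, 0.0]}, {"e1": [[1.0, 0.0]]}, {"e1": [[0.0, 1.0]]})
swapped = supcon_loss({"e1": [1.0, 0.0]}, {"e1": [[0.0, 1.0]]}, {"e1": [[1.0, 0.0]]})
print(aligned, swapped)  # small loss vs. roughly 1 / tau = 10
```

Dividing by $\lvert P(e) \rvert$ averages each anchor's contribution over its positives, and a small temperature such as $\tau = 0.1$ sharpens the softmax so that the closest negatives dominate the denominator.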
Noise contrastive estimation (variant 1):
$$L_{B}(\theta, \gamma) = \log \sigma\left(s(x_{i}, a_{i,0}^{+}; \theta), \gamma\right) + \sum_{k=1}^{K} \log\left(1 - \sigma\left(s(x_{i}, a_{i,k}^{-}; \theta), \gamma\right)\right)$$
$x_{i}$: input text
$a_{i,k}$: another example, either the positive ($a_{i,0}^{+}$) or one of $K$ negatives ($a_{i,k}^{-}$, $k = 1, \dots, K$)
$s(x_{i}, a_{i,k}; \theta)$: scoring function, usually cosine similarity, dot product, or a logit "produced by input-sample matcher sub-network" (Rethmeier and Augenstein 2021)
$\sigma(z, \gamma)$: scaling function, usually sigmoid
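A minimal sketch of variant 1 using cosine similarity as the scoring function $s$. The note does not define $\gamma$; here it is assumed to act as a temperature, $\sigma(z, \gamma) = 1 / (1 + e^{-z/\gamma})$. The function names and vectors are hypothetical. $L_B$ is a log-likelihood (always $\le 0$) to be maximized, so a training loop would minimize $-L_B$.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_sigmoid(z: float) -> float:
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def binary_nce_objective(
    x: Vector, a_pos: Vector, a_negs: Sequence[Vector], gamma: float = 1.0
) -> float:
    """L_B: log sigma(s(x, a+)) + sum_k log(1 - sigma(s(x, a-_k)))."""
    # gamma is treated as a temperature on the score (assumption).
    pos_term = log_sigmoid(cosine(x, a_pos) / gamma)
    # log(1 - sigmoid(z)) == log sigmoid(-z), which avoids computing 1 - p.
    neg_terms = sum(log_sigmoid(-cosine(x, a) / gamma) for a in a_negs)
    return pos_term + neg_terms


x = [1.0, 0.0]
good = binary_nce_objective(x, a_pos=[1.0, 0.1], a_negs=[[-1.0, 0.2], [0.0, 1.0]])
bad = binary_nce_objective(x, a_pos=[-1.0, 0.2], a_negs=[[1.0, 0.1], [0.0, 1.0]])
print(good > bad)  # True: the positive is aligned with x only in the first case
```

Each positive and negative is scored independently through the sigmoid, so this variant treats contrastive learning as a set of binary "real vs. noise" decisions.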
Noise contrastive estimation (variant 2), which ranks a single positive pair over $K$ negative pairs:
$$L_{R} = \log \frac{\exp\left(\bar{s}^{*}(x_{i}, a_{i,0}^{+}; \theta)\right)}{\exp\left(\bar{s}^{*}(x_{i}, a_{i,0}^{+}; \theta)\right) + \sum_{k=1}^{K} \exp\left(\bar{s}^{*}(x_{i}, a_{i,k}^{-}; \theta)\right)}$$
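Variant 2 is a log-softmax over one positive and $K$ negatives, the same form as the InfoNCE objective. The sketch below takes precomputed scores $\bar{s}^{*}(\cdot)$ as plain floats, so it makes no assumption about how the scores are produced; the function name is hypothetical. As with variant 1, training would minimize $-L_R$.

```python
import math
from typing import Sequence


def ranking_nce_objective(pos_score: float, neg_scores: Sequence[float]) -> float:
    """L_R: log of the softmax probability assigned to the positive.

    pos_score  -> score of (x_i, a_{i,0}^+)
    neg_scores -> scores of (x_i, a_{i,k}^-) for k = 1..K
    """
    scores = [pos_score, *neg_scores]
    m = max(scores)  # subtract the max before exp() for numerical stability
    log_denom = m + math.log(sum(math.exp(s - m) for s in scores))
    return pos_score - log_denom


# With K = 3 negatives scored the same as the positive, the positive gets
# probability 1/4, so L_R = log(1/4).
print(ranking_nce_objective(0.5, [0.5, 0.5, 0.5]), math.log(0.25))
```

Unlike variant 1, the negatives compete with the positive inside a single softmax, so raising any negative's score lowers the probability assigned to the positive.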

(1) Yu T, Cui H, Li JC, Luo Y, Jiang G, Zhao H. Enzyme function prediction using contrastive learning. Science. 2023;379(6639):1358–63. Available from: https://doi.org/10.1126/science.adf2465
