Tag Index

Found 104 total tags.

affinity-maturation

Affinity maturation is a process by which antibodies are iteratively improved in vivo. Described as “hyper-darwinian”.

29 items with this tag. Showing first 10 tags.

Apr 28, 2026
Subset of mutations obtained during affinity maturation contribute disproportionately to affinity
- affinity-maturation
- antibody-antigen-interactions/binding-affinity
Apr 21, 2026
Affinity maturation causes antibody-antigen interfaces to be more similar to generic PPI interfaces
Apr 21, 2026
Affinity matured antibodies can adopt distinct binding poses from germline precursors
- affinity-maturation
- antibody-antigen-interactions/misc
Apr 21, 2026
Affinity-specificity tradeoff in antibodies is partially mediated by varying the magnitude of rigidification
Apr 21, 2026
Antibodies from different repertoires within a given species targeting the same epitope often have the same light chain isotype and share critical residues
- affinity-maturation/somatic-hypermutation
- antibody-antigen-interactions/misc
Apr 21, 2026
Antibody affinity can be improved by stabilization
- affinity-maturation
- antibody-antigen-interactions/binding-affinity
Apr 21, 2026
Antibody on and off rates are capped by the affinity maturation process
- affinity-maturation
- antibody-antigen-interactions/binding-affinity
Apr 21, 2026
Conformational entropy in antibodies decreases during affinity maturation
Apr 21, 2026
Germline precursors bind to their antigens with nanomolar affinity
- affinity-maturation
- antibody-antigen-interactions/binding-affinity
Apr 21, 2026
Lower antibody affinity is desirable for minimizing on-target off-tumor binding
- affinity-maturation
- antibody-antigen-interactions/binding-affinity

affinity-maturation/somatic-hypermutation

2 items with this tag.

alignment

Alignment collects notes on comparing proteins by either sequence-derived representations or structural information. The main split here is between sequence-based alignment approaches, including PLM embedding methods and homology retrieval, and structure-based alignment approaches that operate on coordinates, structural descriptors, or structure-derived embeddings.

18 items with this tag. Showing first 10 tags.

alignment/sequence-based

14 items with this tag. Showing first 10 tags.

alignment/structure-based

6 items with this tag.

alphafold2

(OpenFold redirects here)

AlphaFold2 is a protein structure prediction method from 2020 with 97 million parameters, and is the first to make extensive use of the Transformer architecture. AlphaMissense is a derivative of this method.

Architecture of AlphaFold2 from Jumper et al. (1)

Architectural and ML contributions

Evoformer
Invariant point attention
Frame aligned point error
Self-distillation for protein structures
Predicted aligned error
Triangular update

Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9. Available from: https://doi.org/10.1038/s41586-021-03819-2

26 items with this tag. Showing first 10 tags.

alphafold3

AlphaFold3 is a diffusion-based all-atom structure prediction method that is widely seen as state-of-the-art.

Architecture of AlphaFold3 from Abramson et al. (1)

Architectural and ML contributions

PairFormer
PDE: predicted distance error (replacing frame aligned point error)
Non-equivariant per-atom prediction, which leads to occasional errors when predicting chirality
Cross-distillation from AlphaFold2-Multimer v2.3 to avoid hallucination of low-pLDDT regions

Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024;630(8016):493–500. Available from: https://doi.org/10.1038/s41586-024-07487-w

17 items with this tag. Showing first 10 tags.

ancestral-sequence-reconstruction

Ancestral sequence reconstruction refers to the process of inferring ancestral sequences using extant sequences. Figure from (1)

Dishman AF, Tyler RC, Fox JC, Kleist AB, Prehoda KE, Babu MM, et al. Evolution of fold switching in a metamorphic protein. Science. 2021;371(6524):86–90. Available from: https://doi.org/10.1126/science.abd8700

6 items with this tag.

antibodies

Antibodies are proteins with two heavy chains and two light chains produced by B cells, central to the adaptive immune system. Their structure consists of a variable region (containing CDRs and a framework region) and three constant regions (CH1, CH2, CH3). The variable region and CH1 form the Fab, while the remainder forms the Fc region. B cell receptors are antibodies with an additional CH4 domain.

Types of antibodies

IgA is found in mucous membranes, cannot activate the complement system, and makes up roughly two-thirds of antibodies in healthy adults. It is notable for being double-sided.

IgE has only one binding site, mainly protects against large organisms like parasites, and is responsible for allergic reactions.

IgG is the standard antibody class used in therapeutic design. It can activate the complement system and can pass through the placenta.

IgG1 makes up 67% of all antibodies in the human body, is capable of antibody-dependent cellular phagocytosis, and has the longest hinge region. IgG1 immune repertoires consist mostly of just a few dozen dominant clones and are unique to each individual, remaining largely stable over time.
IgG2 — mice have IgG2a and IgG2b instead; some strains have IgG2c instead of IgG2a.
IgG3 makes up ~7% of IgGs and has a notably shorter half-life due to poor binding to FcRn, attributed to an H435R mutation.
IgG4 is the rarest IgG and undergoes Fab-arm exchange, dissociating into half-bodies and forming novel combinations via R409 in the hinge. Therapeutic IgG4 antibodies use the S228P substitution to stabilize the hinge and prevent this.

IgM is far less subject to affinity maturation than IgG, responds to lipids and polysaccharides, and activates the complement system.

Datasets

SAbDab is a database of all antibody and nanobody structures in the PDB. The full list can be downloaded with:

curl -s https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab/summary/all/

26 items with this tag. Showing first 10 tags.

antibodies/engineering-and-design

8 items with this tag.

antibodies/function

1 item with this tag.

Apr 21, 2026
Agonist antibodies benefit from low affinity due to antigen clustering

antibodies/nanobodies

17 items with this tag. Showing first 10 tags.

antibody-antigen-interactions

45 items with this tag. Showing first 10 tags.

antibody-antigen-interactions/binding-affinity

14 items with this tag. Showing first 10 tags.

antibody-antigen-interactions/complex-prediction

16 items with this tag. Showing first 10 tags.

antibody-antigen-interactions/misc

15 items with this tag. Showing first 10 tags.

antibody-developability

Antibody developability refers to a series of properties linked to the development of antibodies as therapeutics. Developability issues arise either because natural antibodies do not possess these properties to begin with, or because liabilities are introduced during optimization as a consequence of affinity maturation via techniques such as yeast display.

General observations

Raybould et al. (1) outlined five criteria (in a parallel to Lipinski’s rule of five):
Length of CDRs, specifically CDRH3
Hydrophobic patches
Negative patches
Positive patches
Viscosity

Thermostability

See Stability and thermostability

Immunogenicity

Glycine residues are less immunogenic than other residues (2), which makes them useful to include when designing cyclic peptides or peptide therapeutics.
Antibody humanization can lead to immunogenic reactions.

Raybould MIJ, Marks C, Krawczyk K, Taddese B, Nowak J, Lewis AP, et al. Five computational developability guidelines for therapeutic antibody profiling. Proceedings of the National Academy of Sciences. 2019;116(10):4025–30. Available from: https://doi.org/10.1073/pnas.1810576116

Aina A, Hsueh SCC, Gibbs E, Peng X, Cashman NR, Plotkin SS. De Novo Design of a β-Helix Tau Protein Scaffold: An Oligomer-Selective Vaccine Immunogen Candidate for Alzheimer’s Disease. ACS Chemical Neuroscience. 2023;14(15):2603–17. Available from: https://doi.org/10.1021/acschemneuro.3c00007

31 items with this tag. Showing first 10 tags.

antibody-developability/expression

3 items with this tag.

antibody-developability/general

10 items with this tag.

antibody-developability/humanization

2 items with this tag.

antibody-developability/hydrophobicity

5 items with this tag.

antibody-developability/immunogenicity

2 items with this tag.

antibody-developability/polyspecificity

6 items with this tag.

antibody-developability/self-association

2 items with this tag.

antibody-developability/solubility

1 item with this tag.

Apr 21, 2026
The solubilization tetrad allows camelid nanobodies to remain soluble in the absence of a light chain
- antibody-developability/solubility
- antibodies/nanobodies

antibody-drug-conjugates

1 item with this tag.

Apr 21, 2026
Antibody-drug conjugates
- antibody-drug-conjugates

antibody-framework

1 item with this tag.

Apr 21, 2026
CDR3 mediates heavy-light chain pairing preferences by making contacts to framework residues
- cdrh3
- antibody-framework

antibody-structure-prediction

The antibody structure prediction problem is a subclass of the protein structure prediction problem that is mostly focused on predicting the CDRs, specifically the CDRH3 loop that mediates antigen binding. It also sometimes includes the antibody-antigen docking problem.

Models

AbFold: a method that concatenates information from IgFold and AlphaFold2 Multimer when modeling antibodies. The two models usually predict different conformations, and combining them leads to improved prediction quality (1)
IgFold: a method that uses embeddings from the antibody-specific PLM AntiBERTy
ImmuneBuilder: Consists of ABodyBuilder2 (for Antibodies), NanoBodyBuilder2 (for Nanobodies), and TCRBuilder2 (for T-cell receptors).

Peng C, Wang Z, Zhao P, Ge W, Huang C. AbFold – an AlphaFold Based Transfer Learning Model for Accurate Antibody Structure Prediction. openRxiv; 2023. Available from: https://doi.org/10.1101/2023.04.20.537598

25 items with this tag. Showing first 10 tags.

antibody-structure-prediction/cdr

19 items with this tag. Showing first 10 tags.

antibody-structure-prediction/complex-prediction

5 items with this tag.

b-cells

B cells are a type of lymphocyte that synthesizes and secretes antibodies as well as B-cell receptors. Each cell synthesizes a single antibody sequence throughout its life cycle (a phenomenon termed allelic exclusion). They can be categorized as either naive B cells or memory B cells. Activation by antigen presentation causes them to proliferate into effector cells that secrete antibodies.

1 item with this tag.

Apr 21, 2026
B-cell proliferation is driven by both antigen binding and surface expression
- affinity-maturation
- b-cells

blosum62

The BLOSUM62 matrix quantifies the similarity of amino acids to one another for computing the evolutionary distance between sequences (1). Interestingly, its original computation contained errors that make it a more effective substitution matrix for search (2). A custom version for TCRs has also been presented (3).

Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences. 1992;89(22):10915–9. Available from: https://doi.org/10.1073/pnas.89.22.10915

Styczynski MP, Jensen KL, Rigoutsos I, Stephanopoulos G. BLOSUM62 miscalculations improve search performance. Nature Biotechnology. 2008;26(3):274–5. Available from: https://doi.org/10.1038/nbt0308-274

Postovskaya A, Vercauteren K, Meysman P, Laukens K. tcrBLOSUM: an amino acid substitution matrix for sensitive alignment of distant epitope-specific TCRs. Briefings in Bioinformatics. 2024;26(1). Available from: https://doi.org/10.1093/bib/bbae602

1 item with this tag.

Apr 21, 2026
Effectiveness of the BLOSUM62 for antibody affinity maturation varies by molecule
- blosum62

cdrh3

13 items with this tag. Showing first 10 tags.

citation-fix

25 items with this tag. Showing first 10 tags.

citation-needed

1 item with this tag.

Apr 21, 2026
Activation-induced deaminase
- citation-needed

confidence-metrics

conformational-dynamics

63 items with this tag. Showing first 10 tags.

conformational-dynamics/allostery

6 items with this tag.

conformational-dynamics/evolution

10 items with this tag.

conformational-dynamics/experimental-ensembles

5 items with this tag.

conformational-dynamics/kinetics

6 items with this tag.

conformational-dynamics/modeling

23 items with this tag. Showing first 10 tags.

conformational-dynamics/molecular-dynamics

12 items with this tag. Showing first 10 tags.

contrastive-learning

Contrastive learning is a supervised method for driving the latent representations of data points towards or away from each other by adding custom losses.

Implementations

The triplet margin loss used by Yu et al. (1): $L^{TM} = \lVert z_{a} - z_{p} \rVert^{2} - \lVert z_{a} - z_{n} \rVert^{2} + \alpha$
$z_{a}$ : Enzyme embedding
$z_{p}$ : Positive case
$z_{n}$ : Negative case, selected to have EC numbers close in Euclidean space to the positive case
$\alpha$ : Margin; set to 1
The supercon hard loss used by Yu et al. (1): $L^{sup} = \sum_{e \in E} \frac{-1}{\lvert P(e) \rvert} \sum_{z_{p} \in P(e)} \log \frac{\exp(z_{e} \cdot z_{p} / \tau)}{\sum_{z_{a} \in A(e)} \exp(z_{e} \cdot z_{a} / \tau)}$
$\tau$ : Temperature, set to 0.1 in the paper
Noise contrastive estimation (variant 1): $L_{B}(\theta, \gamma) = \log \sigma(s(x_{i}, a_{i,0}^{+}; \theta), \gamma) + \sum_{k=1}^{K} \log \left(1 - \sigma(s(x_{i}, a_{i,k}^{-}; \theta), \gamma)\right)$
$x_{i}$ : Input text
$a_{i, \circ}$ : Another example (either positive or negative)
$s(x_{i}, a_{i, \circ}; \theta)$ : Scoring function, usually cosine similarity, dot product, or logit “produced by input-sample matcher sub-network” (from Rethmeier and Augenstein 2021)
$\sigma(z, \gamma)$ : Scaling function, usually sigmoid
Noise contrastive estimation (variant 2, ranks a single positive pair over $K$ negative pairs): $L_{R} = \log \frac{\exp(\bar{s}(x_{i}, a_{i,0}^{+}; \theta))}{\exp(\bar{s}(x_{i}, a_{i,0}^{+}; \theta)) + \sum_{k=1}^{K} \exp(\bar{s}(x_{i}, a_{i,k}^{-}; \theta))}$
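The first two losses above can be sketched in plain Python to make the terms concrete. This is a minimal illustration, not the code of Yu et al.: the function names are hypothetical, embeddings are plain lists of floats, the triplet loss follows the formula as written in this note (squared distances, no clamp at zero), and `supcon_term` computes the contribution of a single anchor $e$ rather than the full sum over $E$.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def triplet_margin_loss(z_a, z_p, z_n, alpha=1.0):
    """||z_a - z_p||^2 - ||z_a - z_n||^2 + alpha, as written in the note.

    Many library implementations additionally clamp this value at zero;
    that clamp is not part of the formula above and is omitted here.
    """
    return squared_distance(z_a, z_p) - squared_distance(z_a, z_n) + alpha


def supcon_term(z_e, positives, candidates, tau=0.1):
    """Contribution of one anchor z_e to L^sup.

    positives  : embeddings in P(e)
    candidates : embeddings in A(e), the set the denominator sums over
    """
    logits = [dot(z_e, z_a) / tau for z_a in candidates]
    # log-sum-exp with the maximum subtracted for numerical stability
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    log_ratios = [dot(z_e, z_p) / tau - log_denominator for z_p in positives]
    return -sum(log_ratios) / len(positives)
```

Summing `supcon_term` over all anchors gives the full loss; the negative sign means that pulling positives closer to the anchor (larger dot products relative to the rest of `candidates`) lowers the value.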

Yu T, Cui H, Li JC, Luo Y, Jiang G, Zhao H. Enzyme function prediction using contrastive learning. Science. 2023;379(6639):1358–63. Available from: https://doi.org/10.1126/science.adf2465

5 items with this tag.

diffusion-guidance

Diffusion guidance refers to inference-time methods that steer a diffusion process toward desired properties, constraints, or observations. It is a general concept that applies to protein design, structure prediction, and sequence-based protein design; all-atom diffusion is just one application.

Details

Xie et al (1) outline three broad types of guidance used by diffusion models:

Score guidance, in which gradients from classifiers, constraints, or rewards are used to nudge the diffusion path.
Path-integral reweighting, such as Feynman-Kac potentials, which use importance weights to update trajectories.
Invariant correctors, such as Metropolis-adjusted Langevin methods, which mix within a biased marginal without changing the trajectory weights.

However, other search algorithms such as Beam search and Monte Carlo Tree Search have been used in conjunction with diffusion models (2).
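Score guidance, the first type listed above, can be illustrated with a one-dimensional toy. This sketch makes several assumptions that are not from Xie et al.: the unguided density is a standard normal, the "classifier" is a hypothetical logistic model $p(y = 1 \mid x) = \sigma(wx)$, guidance is applied at a single fixed noise level, and sampling uses unadjusted Langevin dynamics instead of a full reverse diffusion.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def base_score(x):
    """Score of the unguided toy density N(0, 1): d/dx log p(x) = -x."""
    return -x


def classifier_grad(x, w=2.0):
    """d/dx log p(y=1 | x) for the hypothetical classifier sigmoid(w * x)."""
    return w * (1.0 - sigmoid(w * x))


def guided_score(x, scale=1.0, w=2.0):
    """Unguided score plus a scaled classifier gradient (the 'nudge')."""
    return base_score(x) + scale * classifier_grad(x, w)


def langevin_sample(score_fn, n_steps=200, step=0.05, rng=random):
    """Unadjusted Langevin dynamics driven by the given score function."""
    x = rng.gauss(0.0, 1.0)
    for _ in range(n_steps):
        noise = rng.gauss(0.0, 1.0)
        x = x + step * score_fn(x) + math.sqrt(2.0 * step) * noise
    return x


rng = random.Random(0)
unguided = [langevin_sample(base_score, rng=rng) for _ in range(500)]
guided = [langevin_sample(guided_score, rng=rng) for _ in range(500)]
unguided_mean = sum(unguided) / len(unguided)
guided_mean = sum(guided) / len(guided)
```

With guidance the samples shift toward positive `x`, where the toy classifier assigns high probability to `y = 1`; the `scale` parameter controls how strongly the gradient term biases the path.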

Xie Y, Winkler L, Sun L, Lewis S, Foster AE, Luna JJ, et al. Enhanced Diffusion Sampling: Efficient Rare Event Sampling and Free Energy Calculation with Diffusion Models. 2026; Available from: https://arxiv.org/abs/2602.16634

Didi K, Zhang Z, Zhou G, Reidenbach D, Cao Z, Cha S, et al. Scaling Atomistic Protein Binder Design with Generative Pretraining and Test-Time Compute. 2026; Available from: https://arxiv.org/abs/2603.27950

12 items with this tag. Showing first 10 tags.

diffusion-guidance/protein-design

4 items with this tag.

diffusion-guidance/structure-prediction

8 items with this tag.

diffusion-models

Diffusion models are generative models whose outputs are generated by iteratively denoising Gaussian noise. During inference, the reverse process is executed, whereas during training, the forward process is carried out (converting data to noise). This process can be guided at inference time to respect specific geometric constraints.

Details

Diffusion models try to recover samples $X_{0}$ drawn from $p_{\text{data}}$ from Gaussian noise $X_{T}$ by iteratively denoising across $T$ time steps. The forward process is defined as

$$dX = \underbrace{f(X, t)}_{\text{drift}}\, dt + \underbrace{g(t)}_{\text{diffusion}}\, dB_{t} \quad \text{(total SDE)}$$

$B_{t}$ : Standard Wiener process (by which noise is added)

Typically these drift and diffusion functions are used:

$$f(X, t) = -\frac{\beta(t)}{2} X$$

$$g(t) = \sqrt{\beta(t)}$$

The reverse process:

$$dX = \left[ f(X, t) - g(t)^{2} \underbrace{\nabla_{x} \log p_{t}(X)}_{\text{scoring function}} \right] dt + g(t)\, dB_{t}$$

$$\nabla_{x} \log p_{t}(x) = \frac{E[X_{0} \mid X_{t} = x] - x}{\sigma_{t}^{2}}$$

$\sigma_{t}^{2}$ : Conditional variance of the distribution $p_{t}(X \mid X_{0} = x_{0})$

The scoring function is approximated by a neural network, $s_{θ} (x, t)$ .
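The reverse process can be simulated on a toy problem where the scoring function is known in closed form, so no neural network is needed. The sketch below assumes a constant schedule $\beta(t) = 4$, the variance-preserving choice $g(t) = \sqrt{\beta(t)}$, one-dimensional Gaussian data $N(2, 0.5^{2})$, and Euler-Maruyama integration; all constants and function names are illustrative. For this setup the marginal $p_{t}$ is Gaussian with mean $2 e^{-\beta t / 2}$ and variance $0.25\, e^{-\beta t} + 1 - e^{-\beta t}$, which gives the exact score used in place of $s_{\theta}(x, t)$.

```python
import math
import random

BETA = 4.0                       # constant noise schedule beta(t) (assumed)
T_END = 2.0                      # final diffusion time
DATA_MEAN, DATA_STD = 2.0, 0.5   # toy p_data = N(2, 0.5^2)


def drift(x, t):
    """f(X, t) = -beta(t) / 2 * X."""
    return -0.5 * BETA * x


def diffusion(t):
    """g(t) = sqrt(beta(t)) (variance-preserving choice)."""
    return math.sqrt(BETA)


def score(x, t):
    """Exact score of p_t for Gaussian data; stands in for s_theta(x, t)."""
    decay = math.exp(-BETA * t)
    mean_t = DATA_MEAN * math.sqrt(decay)
    var_t = DATA_STD ** 2 * decay + (1.0 - decay)
    return -(x - mean_t) / var_t


def sample_reverse(n_steps=200, rng=random):
    """Integrate the reverse SDE from t = T_END down to t = 0."""
    dt = T_END / n_steps
    x = rng.gauss(0.0, 1.0)  # X_T is approximately standard normal
    for i in range(n_steps):
        t = T_END - i * dt
        g = diffusion(t)
        reverse_drift = drift(x, t) - g * g * score(x, t)
        # Stepping backward in time flips the sign of the drift term.
        x = x - reverse_drift * dt + g * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
samples = [sample_reverse(rng=rng) for _ in range(2000)]
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / len(samples))
```

Starting from pure noise, the samples should end up distributed approximately like the toy data (mean near 2, standard deviation near 0.5), up to discretization and sampling error.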

Diffusion models can also be zero-shot classifiers (1).

Diffusion models have been combined with Replica-exchange molecular dynamics (2).

Types of sequence-based diffusion

For protein sequences, which are fundamentally discrete, Yang et al (3) describe three approaches to generation using diffusion models:

Diffusion in pre-trained latent space (e.g., continuous diffusion)
Diffusion in discrete space with uniform noise matrices
Diffusion in discrete space via absorbing matrices, e.g., one-at-a-time masking/unmasking of individual tokens
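The two discrete-space noising schemes differ only in what a corrupted token becomes. The following sketch is an illustration under simplifying assumptions, not the implementation of Yang et al.: `p` stands for the cumulative probability that a token has been corrupted by a given time step, `#` is a hypothetical mask symbol, and each position is corrupted independently.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # twenty canonical amino acids
MASK = "#"                          # hypothetical absorbing (mask) token


def corrupt_uniform(seq, p, rng):
    """Uniform noise: a corrupted token is redrawn uniformly from ALPHABET."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < p else aa for aa in seq
    )


def corrupt_absorbing(seq, p, rng):
    """Absorbing noise: a corrupted token is replaced by MASK and stays there."""
    return "".join(MASK if rng.random() < p else aa for aa in seq)


rng = random.Random(0)
sequence = "MKTAYIAKQR"
partially_masked = corrupt_absorbing(sequence, 0.5, rng)
partially_shuffled = corrupt_uniform(sequence, 0.5, rng)
```

At `p = 1` the absorbing scheme yields a fully masked sequence, which is the starting point for unmasking-style generation, whereas the uniform scheme yields a random sequence over the amino acid alphabet.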

Li AC, Prabhudesai M, Duggal S, Brown E, Pathak D. Your Diffusion Model is Secretly a Zero-Shot Classifier. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE; 2023. p. 2206–17. Available from: https://doi.org/10.1109/iccv51070.2023.00210

Wang Y, Herron L, Tiwary P. From data to noise to data for mixing physics across temperatures with generative artificial intelligence. Proceedings of the National Academy of Sciences. 2022;119(32). Available from: https://doi.org/10.1073/pnas.2203656119

Yang J, Chu W, Khalil D, Astudillo R, Wittmann BJ, Arnold FH, et al. Steering Generative Models with Experimental Data for Protein Fitness Optimization. 2025; Available from: https://arxiv.org/abs/2505.15093

31 items with this tag. Showing first 10 tags.

diffusion-models/implementation

2 items with this tag.

diffusion-models/protein-design

15 items with this tag. Showing first 10 tags.

diffusion-models/structure-prediction

12 items with this tag. Showing first 10 tags.

diffusion-models/training

2 items with this tag.

epistasis

Epistasis refers to the non-additivity of fitness effects arising from specific combinations of mutations. Johnston et al. (1) outline three types of epistasis: magnitude epistasis (“same direction as expected but are not perfectly additive”), sign epistasis (“effect of one of the substitutions changes direction in the context of the other”), and reciprocal sign epistasis (“effects of both substitutions change direction when they are made together”).
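The three categories can be made concrete for a pair of substitutions A and B by comparing each substitution's effect in the wild-type background with its effect in the background of the other substitution. The sketch below is one straightforward reading of the quoted definitions, not the analysis code of the cited paper; the function name, the `tol` threshold, and the treatment of exactly-zero effects (not counted as a direction change) are assumptions.

```python
def classify_pairwise_epistasis(f_wt, f_a, f_b, f_ab, tol=1e-9):
    """Classify epistasis from four fitness values.

    f_wt : wild type, f_a / f_b : single mutants, f_ab : double mutant.
    """
    effect_a_alone = f_a - f_wt    # effect of A in the wild-type background
    effect_a_with_b = f_ab - f_b   # effect of A when B is already present
    effect_b_alone = f_b - f_wt
    effect_b_with_a = f_ab - f_a

    a_flips = effect_a_alone * effect_a_with_b < 0
    b_flips = effect_b_alone * effect_b_with_a < 0

    if a_flips and b_flips:
        return "reciprocal sign"
    if a_flips or b_flips:
        return "sign"
    # No direction change: check whether the double mutant is simply additive.
    deviation = f_ab - f_a - f_b + f_wt
    if abs(deviation) > tol:
        return "magnitude"
    return "additive"
```

For example, with `f_wt = 0`, `f_a = 1`, `f_b = 2`, and `f_ab = 1.5`, substitution A is beneficial on its own but deleterious once B is present, so the pair shows sign epistasis.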

Figure from (1)

Figure from (2)

Notes

Negative epistasis is approximately 100x more common than positive epistasis (3).
Rates of magnitude, sign, and reciprocal epistasis are constant across positions (1). These differed from a null additive-only model with noise, which had 74% magnitude, 22% sign, and 4% reciprocal sign epistasis. Figure from (1)

Examples

The S373P mutation in the Spike protein is disadvantageous in pre-Omicron-variant versions of SARS-CoV-2 but advantageous in Omicron. Mentioned by Bloom and Neher (4).
The combination of N501Y and Q498R in the Spike protein of SARS-CoV-2 increases the binding affinity to ACE2 by 387-fold. This is believed to have led to the immune-evasive mutations in the Omicron variant and was observed in vitro by Zahradník et al. (5) prior to the emergence of Omicron.
One example of third-order epistasis is the J-domain, where strong non-additivity is observed among a triad, but disappears if any of the three are mutated to alanine (2). Figure from (2)

Measuring epistasis

Figure from (6)

Graph Fourier Transform: A decomposition that breaks down the signal (here, fitness) into distinct “epistatic orders”
Dirichlet energy: quantifies how variable (i.e., rugged) the landscape is, with high energy corresponding to high ruggedness: $DE = f^{T} L f = \frac{1}{2} \sum_{u \sim v} (f_{u} - f_{v})^{2}$
NK model: A linear model where $N$ is the number of sites and $K$ is the number of interacting sites; when $K = 0$ , effect of all mutations is linear, whereas larger $K$ indicates more interactions and therefore more epistasis
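The Dirichlet energy above can be computed directly on a small landscape. This sketch assumes the graph connects genotypes that differ at exactly one position (Hamming distance 1) and that $L = D - A$ is the unnormalized graph Laplacian; the two toy landscapes and all function names are made up for illustration. The second function evaluates $f^{T} L f$ explicitly as a cross-check of the edge-sum form.

```python
from itertools import combinations


def hamming_neighbors(u, v):
    """True if genotypes u and v differ at exactly one position."""
    return len(u) == len(v) and sum(a != b for a, b in zip(u, v)) == 1


def dirichlet_energy(fitness):
    """Sum of (f_u - f_v)^2 over unordered neighbor pairs.

    This equals 1/2 times the sum over ordered pairs u ~ v.
    """
    return sum(
        (fitness[u] - fitness[v]) ** 2
        for u, v in combinations(fitness, 2)
        if hamming_neighbors(u, v)
    )


def laplacian_quadratic_form(fitness):
    """f^T L f with L = D - A built entry by entry."""
    nodes = list(fitness)
    total = 0.0
    for u in nodes:
        for v in nodes:
            if u == v:
                l_uv = sum(hamming_neighbors(u, w) for w in nodes)  # degree
            elif hamming_neighbors(u, v):
                l_uv = -1
            else:
                l_uv = 0
            total += fitness[u] * l_uv * fitness[v]
    return total


# Two toy landscapes over two binary sites (values are illustrative only).
additive = {"00": 0.0, "01": 1.0, "10": 1.0, "11": 2.0}
rugged = {"00": 0.0, "01": 1.0, "10": 1.0, "11": -2.0}
```

On these toy values the additive landscape has energy 4 and the rugged one has energy 20, consistent with higher energy indicating a more rugged landscape.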

Johnston KE, Almhjell PJ, Watkins-Dulaney EJ, Liu G, Porter NJ, Yang J, et al. A combinatorially complete epistatic fitness landscape in an enzyme active site. Proceedings of the National Academy of Sciences. 2024;121(32). Available from: https://doi.org/10.1073/pnas.2400439121

Tsuboyama K, Dauparas J, Chen J, Laine E, Mohseni Behbahani Y, Weinstein JJ, et al. Mega-scale experimental analysis of protein folding stability in biology and design. Nature. 2023;620(7973):434–44. Available from: https://doi.org/10.1038/s41586-023-06328-6

Sarkisyan KS, Bolotin DA, Meer MV, Usmanova DR, Mishin AS, Sharonov GV, et al. Local fitness landscape of the green fluorescent protein. Nature. 2016;533(7603):397–401. Available from: https://doi.org/10.1038/nature17995

Bloom JD, Neher RA. Fitness effects of mutations to SARS-CoV-2 proteins. Virus Evolution. 2023;9(2):vead055. Available from: https://doi.org/10.1093/ve/vead055

Zahradník J, Marciano S, Shemesh M, Zoler E, Harari D, Chiaravalli J, et al. SARS-CoV-2 variant prediction and antiviral drug design are enabled by RBD in vitro evolution. Nature Microbiology. 2021;6(9):1188–98. Available from: https://doi.org/10.1038/s41564-021-00954-4

Sandhu M, Chen JZ, Matthews DS, Spence MA, Pulsford SB, Gall B, et al. Computational and Experimental Exploration of Protein Fitness Landscapes: Navigating Smooth and Rugged Terrains. Biochemistry. 2025;64(8):1673–84. Available from: https://doi.org/10.1021/acs.biochem.4c00673

12 items with this tag. Showing first 10 tags.

epitope-prediction

1 item with this tag.

Apr 21, 2026
Deep learning methods cannot generalize T-cell receptor binding to new epitopes
- epitope-prediction

evolution-and-natural-selection

Protein evolution is the process and result of gradual sequence changes resulting in functional and/or structural changes. See Epistasis for examples on why evolutionary trajectories are difficult to predict. This note excludes any discussion of somatic hypermutation.

Notes

Paradigms and preliminaries

Neutral theory: most observed amino acid changes are neutral (i.e., silent in fitness effects). This leads to genetic drift. Developed by Kimura (1).
Nearly neutral theory: deleterious mutations are retained and subsequently compensated for by advantageous mutations (which is consistent with the observation that most missense mutations are destabilizing). Developed by Ohta (2) to explain why the rate of protein evolution was independent of generation time, which is in turn inversely proportional to population size. Figure from (2)
The theory of punctuated equilibrium suggests that phenotypes change very little for long stretches of time, followed by abrupt rapid changes (3).
Statistical physics approach: population size is equated with inverse temperature (such that infinite population is analogous to zero degrees Kelvin), and log-fitness with energy (4). Advantageous and deleterious mutations are predicted to occur with equal frequency. This framing ignores the imbalance in sequence data (5,6).
Fisher’s geometric model: The overall fitness of a phenotype can be quantified along $N$ dimensions; Fisher postulated that phenotypes in a population were distributed as a hypersphere centered on a local maximum.
Protein evolvability refers to the ability of a protein to 1) evolve new functions in relatively few mutations and 2) be robust to mutations that lead to loss-of-function (7). These are described as contradictory statements by Tokuriki & Tawfik (8) but are described as complementary at the structural level.
“The principle of minimal frustration suggests that naturally evolved proteins with the same structure should have similar folding rates and that modulation of thermodynamic stability should occur via unfolding rates” (quoted from (9)). This has been supported by the observation that thioredoxins fold at similar rates but unfold at rates that correlate with their thermostability values.

Observations

Protein folds with high sequence diversity also have high functional diversity (7).
The sequence capacity of a protein exceeds $10^{23}$ for even small proteins (35-40 AAs), but the fraction of stable states is extremely small and inversely correlated with protein size. These values were estimated using Potts models (10). Figure from (10)

Kimura M. The Neutral Theory of Molecular Evolution. Cambridge University Press; 1985. Available from: https://books.google.com/books/about/The_Neutral_Theory_of_Molecular_Evolutio.html?id=e_HoAwAAQBAJ

Ohta T. Slightly Deleterious Mutant Substitutions in Evolution. Nature. 1973;246(5428):96–8. Available from: https://doi.org/10.1038/246096a0

Duran-Nebreda S, Bentley RA, Vidiella B, Spiridonov A, Eldredge N, O’Brien MJ, et al. On the multiscale dynamics of punctuated evolution. Trends in Ecology & Evolution. 2024;39(8):734–44. Available from: https://doi.org/10.1016/j.tree.2024.05.003

Sella G, Hirsh AE. The application of statistical physics to evolutionary biology. Proceedings of the National Academy of Sciences. 2005;102(27):9541–6. Available from: https://doi.org/10.1073/pnas.0501865102

Ding F, Steinhardt J. Protein language models are biased by unequal sequence sampling across the tree of life. openRxiv; 2024. Available from: https://doi.org/10.1101/2024.03.07.584001

Weinstein EN, Amin AN, Frazer J, Marks DS. Non-identifiability and the Blessings of Misspecification in Models of Molecular Fitness. openRxiv; 2022. Available from: https://doi.org/10.1101/2022.01.29.478324

Wagner A. Robustness and evolvability: a paradox resolved. Proceedings of the Royal Society B: Biological Sciences. 2007;275(1630):91–100. Available from: https://doi.org/10.1098/rspb.2007.1137

Tokuriki N, Tawfik DS. Protein Dynamism and Evolvability. Science. 2009;324(5924):203–7. Available from: https://doi.org/10.1126/science.1169375

Tzul FO, Vasilchuk D, Makhatadze GI. Evidence for the principle of minimal frustration in the evolution of protein folding landscapes. Proceedings of the National Academy of Sciences. 2017;114(9). Available from: https://doi.org/10.1073/pnas.1613892114

Tian P, Best RB. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis. Biophysical Journal. 2017;113(8):1719–30. Available from: https://doi.org/10.1016/j.bpj.2017.08.039

21 items with this tag. Showing first 10 tags.

evolution-and-natural-selection/structure

5 items with this tag.

fine-tuning

2 items with this tag.

immune-repertoires

Immune repertoires are the full breadth of B-cell and T-cell receptors being expressed by a human that are available for potential antigen binding. Rees (1) provides estimates suggesting that humans have naive repertoires of about $10^{9}$ sequences.

Rees AR. Understanding the human antibody repertoire. mAbs. 2020;12(1). Available from: https://doi.org/10.1080/19420862.2020.1729683

4 items with this tag.

inverse-folding

Inverse folding describes the problem of designing a sequence for a structure. Designed sequences are typically limited to the twenty canonical amino acids.

Methods

See Hybrid sequence-structure models for a list of methods that incorporate PLMs

ProteinMPNN and its derivatives
ESM-IF
GearNet
Frame2seq
GVP-GNN

Notes

Training

Training inverse folding models with backbone dihedral angles as features usually improved sequence recovery (1). Figure from (1)

Execution

Forward-folding is a stronger predictor of inverse folding success than sequence recovery ((2), citing Watson et al. (3,4)).
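For reference, sequence recovery is commonly computed as the fraction of positions at which the designed sequence matches the native one. The sketch below assumes that definition and two equal-length, pre-aligned sequences; the function name is hypothetical.

```python
def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)
```

Because this metric only counts identities to a single native sequence, a design can score low on recovery and still fold to the target structure, which is one reason forward-folding is used as a complementary check.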

Datasets

PDBench is a dataset of 595 protein structures with diverse, evenly divided topologies for benchmarking of Inverse folding methods (5). Figure 2 from Castorina et al. (5)

Jamasb AR, Morehead A, Joshi CK, Zhang Z, Didi K, Mathis SV, et al. Evaluating Representation Learning on the Protein Structure Universe. In: ICLR 2024. 2024. Available from: https://openreview.net/forum?id=sTYuRVrdK3

Yang JJ, Yim J, Barzilay R, Jaakkola T. Fast non-autoregressive inverse folding with discrete diffusion. 2023; Available from: https://arxiv.org/abs/2312.02447

Watson JL, Juergens D, Bennett NR, Trippe BL, Yim J, Eisenach HE, et al. De novo design of protein structure and function with RFdiffusion. Nature. 2023;620(7976):1089–100. Available from: https://doi.org/10.1038/s41586-023-06415-8

Dauparas J, Anishchenko I, Bennett N, Bai H, Ragotte RJ, Milles LF, et al. Robust deep learning–based protein sequence design using ProteinMPNN. Science. 2022;378(6615):49–56. Available from: https://doi.org/10.1126/science.add2187

Castorina LV, Petrenas R, Subr K, Wood CW. PDBench: evaluating computational methods for protein-sequence design. Bioinformatics. 2023;39(1). Available from: https://doi.org/10.1093/bioinformatics/btad027

44 items with this tag. Showing first 10 tags.

inverse-folding/evaluation

31 items with this tag. Showing first 10 tags.

inverse-folding/misc

2 items with this tag.

inverse-folding/training

10 items with this tag.

ligand-docking

7 items with this tag.

light-chains

The light chain of an antibody makes up part of its variable region and Fab, and is therefore involved in antigen binding. In humans, the kappa and lambda subtypes occur at a ratio of about 60:40, whereas in mice the ratio is closer to 90:10.

Kappa and lambda subtype

The ratio of kappa to lambda in circulating antibodies is about 60:40; when this falls out of balance, that can be a symptom of B cell lymphoma.
Lambda light chains are more flexible than kappa light chains due to an extra glycine in the switch region (citation needed).

6 items with this tag.

Apr 21, 2026
Kappa light chains are far more common in antibody therapeutics
- light-chains
Apr 21, 2026
Lambda light chains are less thermostable
- light-chains
Apr 21, 2026
Light chain CDR charges can shorten half-life by slowing release from FcRn
- light-chains
Apr 21, 2026
Light chain coherence
- light-chains
- cdrh3
Apr 21, 2026
Light chain determines autospecificity
- antibody-developability/self-association
- light-chains
Apr 21, 2026
Some light chain V-genes are more likely to be found in dysfunctional antibodies
- light-chains

low-rank-adaptation

7 items with this tag.

pae

Predicted aligned error (PAE) is a measurement calculated by protein structure prediction neural networks to capture positional errors between two amino acids in a computational model. It was introduced by AlphaFold2. A derivative metric, ipSAE, has been shown to be more robust at identifying potential binders.

Figure from (1)

Chow A, Chu H, Li R, Nalbant BN, Dozic AV, Kida LC, et al. Sequence and structural determinants of efficacious de novo chimeric antigen receptors. openRxiv; 2025. Available from: https://doi.org/10.64898/2025.12.12.694033

9 items with this tag.

plddt

(LDDT redirects here) pLDDT (predicted local distance difference test) is a confidence metric used by neural networks for protein structure prediction. It captures the per-residue accuracy, both in terms of neighborhood and side chain rotamer. It was first directly integrated into structure prediction by AlphaFold2 at the per-residue level and has been widely adopted since. AlphaFold3 adopted per-atom pLDDT.

Figure from (1)

Notes

When clustering predicted protein structures, sparse clusters tend to have lower pLDDT (2). This was found to be independent of MSA depth.
pLDDT correlates poorly with GDT-TS among AlphaFold2 models in CASP15. This was observed in a repeat that used deeper MSAs (3).
De novo sequences designed by inversion with high pLDDT were found by ESM to have high perplexity (4).
While the default pLDDT is not continuously differentiable and thus unsuitable for training, (5) use a modified version that can be used as a loss function.
pLDDT can be used as spatial restraints in biomolecular simulations: $E_{\mathrm{pLDDT}} = \sum_{i} \left( \frac{d_{i}}{1.5 \exp ( 4 ( 0.7 - \mathrm{pLDDT}_{i} ) )} \right)^{2}$ The equation was originally presented by Hiranuma et al. (6) and was used by del Alamo et al. (7) as coordinate constraints in Rosetta when refining AlphaFold2 models.
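
The restraint energy in the last note can be sketched in a few lines of Python. This is only an illustration, not the Rosetta implementation: it assumes that $d_i$ is the displacement of residue $i$ from its position in the predicted model (in Å) and that pLDDT is expressed on a 0 to 1 scale, as the 0.7 offset suggests. The function name is hypothetical.

```python
import math


def plddt_restraint_energy(displacements, plddts):
    """Sum of squared, pLDDT-scaled displacements (illustrative sketch).

    displacements: per-residue distance from the starting coordinate
                   (assumed to be in angstroms).
    plddts:        per-residue pLDDT, assumed to be on a 0-1 scale.
    """
    if len(displacements) != len(plddts):
        raise ValueError("displacements and plddts must have equal length")

    energy = 0.0
    for d, p in zip(displacements, plddts):
        # Restraint width shrinks as confidence rises; it equals 1.5 at pLDDT = 0.7.
        width = 1.5 * math.exp(4.0 * (0.7 - p))
        energy += (d / width) ** 2
    return energy
```

With this form, a confidently predicted residue is penalized more for the same displacement than a low-confidence one, so low-pLDDT regions are freer to move during refinement.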

Terwilliger TC, Afonine PV, Liebschner D, Croll TI, McCoy AJ, Oeffner RD, et al. Accelerating crystal structure determination with iterative AlphaFold prediction. Acta Crystallographica Section D Structural Biology. 2023;79(3):234–44. Available from: https://doi.org/10.1107/s205979832300102x

Nomburg J, Doherty EE, Price N, Bellieny-Rabelo D, Zhu YK, Doudna JA. Birth of protein folds and functions in the virome. Nature. 2024;633(8030):710–7. Available from: https://doi.org/10.1038/s41586-024-07809-y

Lee S, Kim G, Karin EL, Mirdita M, Park S, Chikhi R, et al. Petascale Homology Search for Structure Prediction. openRxiv; 2023. Available from: https://doi.org/10.1101/2023.07.10.548308

Verkuil R, Kabeli O, Du Y, Wicky BIM, Milles LF, Dauparas J, et al. Language models generalize beyond natural proteins. openRxiv; 2022. Available from: https://doi.org/10.1101/2022.12.21.521521

Trinquier J, Petti S, Park S, Herath K, van Kempen M, Feng S, et al. SoftAlign: End-to-end protein structures alignment. openRxiv; 2025. Available from: https://doi.org/10.1101/2025.05.09.653096

Hiranuma N, Park H, Baek M, Anishchenko I, Dauparas J, Baker D. Improved protein structure refinement guided by deep learning based accuracy estimation. Nature Communications. 2021;12(1). Available from: https://doi.org/10.1038/s41467-021-21511-x

del Alamo D, DeSousa L, Nair RM, Rahman S, Meiler J, Mchaourab HS. Integrated AlphaFold2 and DEER investigation of the conformational dynamics of a pH-dependent APC antiporter. Proceedings of the National Academy of Sciences. 2022;119(34). Available from: https://doi.org/10.1073/pnas.2206129119

29 items with this tag. Showing first 10 tags.

protein-backbone-design

Protein backbone design is the generation of protein backbones in three-dimensional space. This section also covers generation and design of entire protein structures in Cartesian space, but most methods uncouple design of the backbone and design of the sequence given the backbone (inverse folding). As of May 2024, the current state of the art uses diffusion.

Methods

Chroma (1)
RF-diffusion (2) and RFam (3)
Hallucination using AlphaFold2 and RosettaFold
Inpainting using RosettaFold (4)

Datasets

Verkuil et al. (5) validate on a test set of 39 PDB structures, which they attribute to an earlier source:
1QYS
2KL8
2KPO
2LN3
2LTA
2LVB
2N2T
2N2U
2N3Z
2N76
4KY3
4KYZ
5CW9
5KPE
5KPH
5L33
5TPJ
5TRV
6CZG
6CZH
6CZI
6CZJ
6D0T
6DG6
6DKM A
6DKM B
6DLM A
6DLM B
6E5C
6LLQ
6MRR
6MRS
6MSP
6NUK
6W3F
6W3W
6WI5
6WVS
7MCD

Ingraham JB, Baranov M, Costello Z, Barber KW, Wang W, Ismail A, et al. Illuminating protein space with a programmable generative model. Nature. 2023;623(7989):1070–8. Available from: https://doi.org/10.1038/s41586-023-06728-8

Kim D, Woodbury SM, Ahern W, Tischer D, Kang A, Joyce E, et al. Computational design of metallohydrolases. Nature. 2025;649(8095):246–53. Available from: https://doi.org/10.1038/s41586-025-09746-w

Wang J, Lisanza S, Juergens D, Tischer D, Watson JL, Castro KM, et al. Scaffolding protein functional sites using deep learning. Science. 2022;377(6604):387–94. Available from: https://doi.org/10.1126/science.abn2100

Verkuil R, Kabeli O, Du Y, Wicky BIM, Milles LF, Dauparas J, et al. Language models generalize beyond natural proteins. openRxiv; 2022. Available from: https://doi.org/10.1101/2022.12.21.521521

22 items with this tag. Showing first 10 tags.

protein-backbone-design/designability

20 items with this tag. Showing first 10 tags.

protein-design

Auto-generated

This page is generated automatically from notes tagged protein-design/*. Add prose above the generated marker to preserve it across regenerations.

28 items with this tag. Showing first 10 tags.

protein-design/design

19 items with this tag. Showing first 10 tags.

protein-design/enzyme-design

5 items with this tag.

protein-design/training

2 items with this tag.

protein-folding

Not to be confused with protein structure prediction

Protein folding is the process by which an amino acid polypeptide self-organizes into a 3D structure.

Prediction

26 items with this tag. Showing first 10 tags.

protein-folding/structure-prediction

10 items with this tag.

protein-folding/unfolding

16 items with this tag. Showing first 10 tags.

protein-language-models

Protein language models (PLMs) are a type of Transformer model trained on either protein sequences or multiple sequence alignments.

Methods

Single-sequence

ESM: currently the most widely-used encoder PLM
ProGen: probably the most widely-used decoder PLM
ProtBERT and DistillProtBERT (1)
ProteinNPT
xTrimoPGLM
CARP: A CNN that performs as well as transformer-based methods on both pretraining and downstream tasks. Anecdotally, these can’t indirectly calculate contact maps via the Categorical Jacobian method as well as transformer-based models.
DASM (deep amino acid sequence model), which is trained on germline-descendant point mutation pairs to learn relative mutation frequencies, after normalizing for expected mutation frequencies in the codon table (2).

Notes

General observations

PLMs are in-context learners that default to retrieving information from nearby repeats (3).

Representations

Multiple instance learning using PLM embeddings of all genes in a viral genome identifies which sequences are responsible for host tropism (4). For example, this ranked the Spike protein as the key contributor of host tropism.
Homolog detection using PLM representations can be improved by compression (5). Using the full representations worsened detection AUC by 7.4%.
PLMs with a smoother representation space are better predictors of protein function (6). Figure from (6)

Hybrid PLM-inverse folding models

From Hybrid sequence-structure models

Training

Matthews et al. (6) found that masking 0.5% of residues when training PLMs improved predictive performance (greater $R^{2}$ ) relative to 15% used by ESM.

Geffen Y, Ofran Y, Unger R. DistilProtBert: a distilled protein language model used to distinguish between real proteins and their randomly shuffled counterparts. Bioinformatics. 2022;38(Supplement_2):ii95–8. Available from: https://doi.org/10.1093/bioinformatics/btac474

Bitbol A-F. eLife Assessment: Separating selection from mutation in antibody language models. 2026; Available from: https://doi.org/10.7554/elife.109644.3.sa0

Kantroo P, Wagner GP, Machta BB. In-Context Learning can distort the relationship between sequence likelihoods and biological fitness. 2025; Available from: https://arxiv.org/abs/2504.17068

Liu D, Young F, Lamb KD, Robertson DL, Yuan K. Prediction of virus-host associations using protein language models and multiple instance learning. PLOS Computational Biology. 2024;20(11):e1012597. Available from: https://doi.org/10.1371/journal.pcbi.1012597

Kilinc M, Jia K, Jernigan RL. Improved global protein homolog detection with major gains in function identification. Proceedings of the National Academy of Sciences. 2023;120(9). Available from: https://doi.org/10.1073/pnas.2211823120

Matthews DS, Spence MA, Mater AC, Nichols J, Pulsford SB, Sandhu M, et al. Leveraging ancestral sequence reconstruction for protein representation learning. Nature Machine Intelligence. 2024;6(12):1542–55. Available from: https://doi.org/10.1038/s42256-024-00935-2

91 items with this tag. Showing first 10 tags.

protein-language-models/antibodies

21 items with this tag. Showing first 10 tags.

protein-language-models/representations

35 items with this tag. Showing first 10 tags.

protein-language-models/training

31 items with this tag. Showing first 10 tags.

protein-protein-interactions

Protein-protein interactions describe when two or more proteins bind to one another.

19 items with this tag. Showing first 10 tags.

protein-structure-prediction

1 item with this tag.

Apr 21, 2026
Coupling sidechain and main chain prediction or design does not always lead to improvements

protein-structure-prediction/limitations

1 item with this tag.

Apr 21, 2026
Coupling sidechain and main chain prediction or design does not always lead to improvements

protein-structure-tokenization

Protein structure tokenization refers to the process of discretizing protein structure using a learned codebook derived from vector-quantized variational autoencoders. It was first used for search purposes with Foldseek and has since been adopted for use with ESM3, SaProt, and others.

Notes

Structural tokenization circumvents the inability of inverse folding models to be effectively trained on exclusively computational models.

2 items with this tag.

proteinmpnn

2 items with this tag.

rosetta

1 item with this tag.

Apr 10, 2026
Rosetta
- rosetta

rosettafold

RosettaFold (sometimes stylized as RoseTTAfold) is a protein structure prediction method unveiled in mid-2021 (1). A second version, RosettaFold2, was released in early 2023 (2). Its architecture closely tracks that of AlphaFold2, with several changes, such as the use of an SE3-transformer.

Baek M, DiMaio F, Anishchenko I, Dauparas J, Ovchinnikov S, Lee GR, et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021;373(6557):871–6. Available from: https://doi.org/10.1126/science.abj8754

Baek M, Anishchenko I, Humphreys IR, Cong Q, Baker D, DiMaio F. Efficient and accurate prediction of protein structure using RoseTTAFold2. openRxiv; 2023. Available from: https://doi.org/10.1101/2023.05.24.542179

1 item with this tag.

Apr 21, 2026
AlphaFold2 and RosettaFold sometimes produce different conformations

structure-prediction

Structure prediction refers to the problem of predicting the 3D shape of a protein or nucleotide sequence without any experimental information. Common metrics used for evaluating the quality of predicted structures include LDDT (residue level), TM-score (whole-structure level), and DockQ (complex level).

Methods

MSA-based

AlphaFold2: currently viewed as the highest-accuracy method
RosettaFold
Diffold: A fine-tuned version of AlphaFold2

PLM-based

ESMFold: currently the most widely-used method, albeit probably not the most accurate model in this category
OmegaFold
xTrimoPGLM

Others

EquiFold: a method that needs to be fine-tuned on specific families of proteins
EigenFold: a method that uses diffusion to model the dynamics of proteins, albeit unsuccessfully

For antibodies

See Antibody structure prediction

Notes

Training

Figure from (1)

Sidechain prediction

Formulating the sidechain prediction problem as a classification problem by binning chi angles, rather than as a regression problem, led to improved performance (2).
Sidechain prediction methods are not sensitive to B-factor cutoffs. The outcome of the sidechain prediction model PIPPack was not strongly affected by the B-factor values of protein structures in the training set (2).
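
To make the classification framing concrete, the sketch below converts a chi angle into a class label and decodes a label back to the bin center. It is a generic illustration, not PIPPack's code: the bin count (36 bins of 10 degrees) and the function names are assumptions chosen for the example.

```python
def chi_to_bin(chi_deg, n_bins=36):
    """Map a chi angle in degrees to a class index in [0, n_bins)."""
    width = 360.0 / n_bins
    # Wrap into [0, 360) so that -180 and 180 fall into the same bin.
    wrapped = (chi_deg + 180.0) % 360.0
    # The final modulo guards against float rounding that yields exactly 360.0.
    return int(wrapped // width) % n_bins


def bin_to_chi(index, n_bins=36):
    """Decode a class index to the center of its bin, in degrees."""
    width = 360.0 / n_bins
    return -180.0 + (index + 0.5) * width
```

A classifier trained on these labels outputs a distribution over bins per chi angle, which can represent multimodal rotamer preferences; decoding to the bin center limits precision to half a bin width.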

Ahdritz G, Bouatta N, Floristean C, Kadyan S, Xia Q, Gerecke W, et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Nature Methods. 2024;21(8):1514–24. Available from: https://doi.org/10.1038/s41592-024-02272-z

Randolph NZ, Kuhlman B. Invariant point message passing for protein side chain packing. Proteins: Structure, Function, and Bioinformatics. 2024;92(10):1220–33. Available from: https://doi.org/10.1002/prot.26705

85 items with this tag. Showing first 10 tags.

structure-prediction/architecture

8 items with this tag.

structure-prediction/complex-prediction

8 items with this tag.

structure-prediction/experimental-data

2 items with this tag.

structure-prediction/limitations

19 items with this tag. Showing first 10 tags.

structure-prediction/metrics

9 items with this tag.

structure-prediction/sampling

27 items with this tag. Showing first 10 tags.

structure-prediction/training

13 items with this tag. Showing first 10 tags.

thermostability

Thermostability refers to a protein’s ability to remain folded at high temperatures or under harsh conditions. It is a highly desirable property for engineered proteins.

Prediction

Phantom epistasis refers to the inclusion of unnecessary model parameters when building biophysical/statistical fitness models (Fitness prediction). Faure et al. (1) attribute this to the epistasis mechanisms reviewed by Domingo et al. (2).
Thermodynamic reversibility can be used for expanding training sets for stability prediction/ddG prediction ML models. However, it has been shown to lead to biases that favor WT amino acids. Diaz et al. (3) claim to mitigate this.
The amount of ddG data available for a given residue for training can be expanded using thermodynamic permutation, where $n$ measurements are increased to $n * (n - 1)$ . This was used by MutComputeXGT on the Tsuboyama et al. (4) dataset. It is useful for stability prediction and improves generalization in (3).
ddG data are skewed toward hydrophobic amino acids (e.g., from alanine scans). This has been reported to increase solvation ddG by 0.8 kcal/mol in studies cited by (3). The Tsuboyama et al. (4) data does not have this bias.
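
The permutation idea can be illustrated with a short Python sketch. It relies on ddG being a difference of state functions, so that ddG(A to B) = ddG(WT to B) - ddG(WT to A) at a given position. This is not the MutComputeXGT or Stability Oracle code: the function name is hypothetical, and the counting convention is an assumption, with $n$ taken here as the number of amino acid states at the position, wild type included.

```python
from itertools import permutations


def thermodynamic_permutations(wild_type, ddg_from_wt):
    """Expand ddG values measured from the wild type into all ordered pairs.

    wild_type:   one-letter code of the wild-type residue at this position.
    ddg_from_wt: dict mapping mutant residue -> ddG(wild_type -> mutant).
    Returns a dict mapping (from_residue, to_residue) -> ddG.
    """
    if wild_type in ddg_from_wt:
        raise ValueError("wild type must not appear among the mutants")

    # Relative free energy of each state, with the wild type as the zero point.
    states = {wild_type: 0.0, **ddg_from_wt}

    # n states give n * (n - 1) ordered pairs.
    return {(a, b): states[b] - states[a] for a, b in permutations(states, 2)}


pairs = thermodynamic_permutations("L", {"A": 1.2, "V": 0.5})
print(len(pairs))  # 3 states give 6 ordered pairs
```

Thermodynamic reversibility is the special case in which each measured pair is simply mirrored, since the value for (b, a) is the negative of the value for (a, b).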

Faure AJ, Martí-Aranda A, Hidalgo-Carcedo C, Beltran A, Schmiedel JM, Lehner B. The genetic architecture of protein stability. Nature. 2024;634(8035):995–1003. Available from: https://doi.org/10.1038/s41586-024-07966-0

Domingo J, Baeza-Centurion P, Lehner B. The Causes and Consequences of Genetic Interactions (Epistasis). Annual Review of Genomics and Human Genetics. 2019;20:433–60. Available from: https://doi.org/10.1146/annurev-genom-083118-014857

Diaz DJ, Gong C, Ouyang-Zhang J, Loy JM, Wells J, Yang D, et al. Stability Oracle: A Structure-Based Graph-Transformer for Identifying Stabilizing Mutations. openRxiv; 2023. Available from: https://doi.org/10.1101/2023.05.15.540857

54 items with this tag. Showing first 10 tags.

thermostability/design

7 items with this tag.

thermostability/determinants

6 items with this tag.

thermostability/evolution

5 items with this tag.

thermostability/mutations

7 items with this tag.

thermostability/prediction

20 items with this tag. Showing first 10 tags.

tm-score

Summary

TM-score is an alignment-dependent protein structure similarity metric introduced by Zhang and Skolnick (1) that is widely used for assessing protein structure prediction methods. It is defined as:

$\text{TM-score} = \max \left[ \frac{1}{L_{target}} \sum_{i=1}^{L_{common}} \frac{1}{1 + \left( \frac{d_{i}}{d_{0} ( L_{target} )} \right)^{2}} \right]$

$L_{target}$ : length of the amino acid sequence of the target protein
$L_{common}$ : number of residues in both the target and query proteins
$d_{i}$ : distance between the $i$-th pair of residues
$d_{0} (L_{target}) = 1.24 \sqrt[3]{L_{target} - 15} - 1.8$ : distance scaling factor
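
As a rough illustration of the definition, the Python sketch below evaluates the bracketed sum for a single, already superimposed alignment. It does not perform the maximization over superpositions that the full TM-score requires, and it does not reproduce the special handling that real implementations apply to very short proteins; the function names are hypothetical.

```python
def d0(l_target):
    """Distance scaling factor d0(L_target) = 1.24 * cbrt(L_target - 15) - 1.8."""
    if l_target <= 15:
        raise ValueError("this sketch does not handle targets of 15 residues or fewer")
    scale = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    if scale <= 0.0:
        raise ValueError("d0 is not positive for such a short target")
    return scale


def tm_score_for_alignment(distances, l_target):
    """TM-score sum for one fixed superposition (no search over superpositions).

    distances: d_i for each aligned residue pair, in angstroms.
    l_target:  length of the target protein used for normalization.
    """
    scale = d0(l_target)
    total = sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances)
    return total / l_target
```

Because the sum is normalized by the target length rather than by the number of aligned pairs, a perfect superposition covering half of the target scores 0.5, not 1.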

Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins: Structure, Function, and Bioinformatics. 2004;57(4):702–10. Available from: https://doi.org/10.1002/prot.20264

20 items with this tag. Showing first 10 tags.

transformers

6 items with this tag.

variant-effect-prediction

Variant effect prediction covers the changes in properties or fitness (measured in various ways) resulting from small sequence-level changes in proteins.

21 items with this tag. Showing first 10 tags.
