Found 104 total tags.
Affinity maturation is the process by which antibodies are iteratively improved in vivo. It has been described as “hyper-darwinian”.
Alignment collects notes on comparing proteins by either sequence-derived representations or structural information. The main split here is between sequence-based alignment approaches, including PLM embedding methods and homology retrieval, and structure-based alignment approaches that operate on coordinates, structural descriptors, or structure-derived embeddings.
(OpenFold redirects here)
AlphaFold2 is a protein structure prediction method from 2020 and the first to make extensive use of the Transformer architecture. It consists of 97 million parameters. AlphaMissense is a derivative of this method.
Architecture of AlphaFold2 from Jumper et al. (1)
AlphaFold3 is a diffusion-based all-atom structure prediction method that is widely seen as state-of-the-art.
Architecture of AlphaFold3 from Abramson et al. (1)
Ancestral sequence reconstruction refers to the process of inferring ancestral sequences using extant sequences.
Figure from (1)
Antibodies are proteins with two heavy chains and two light chains produced by B cells, central to the adaptive immune system. Their structure consists of a variable region (containing CDRs and a framework region) and three constant regions (CH1, CH2, CH3). The variable region and CH1 form the Fab, while the remainder forms the Fc region. B cell receptors are antibodies with an additional CH4 domain.
IgA is found in mucous membranes, cannot activate the complement system, and makes up roughly two-thirds of antibodies in healthy adults. It is notable for being double-sided.
IgE has only one binding site, mainly protects against large organisms like parasites, and is responsible for allergic reactions.
IgG is the standard antibody class used in therapeutic design. It can activate the complement system and can pass through the placenta.
IgM is far less subject to affinity maturation than IgG, responds to lipids and polysaccharides, and activates the complement system.
SAbDab is a database of all antibody and nanobody structures in the PDB. The full list can be downloaded with:
curl -s https://opig.stats.ox.ac.uk/webapps/sabdab-sabpred/sabdab/summary/all/
Antibody developability refers to a series of properties linked to the development of antibodies as therapeutics. Developability problems arise either because natural antibodies do not possess these properties to begin with, or because liabilities are introduced during optimization, for example as a consequence of affinity maturation via techniques such as Yeast display.
See Stability and thermostability
The antibody structure prediction problem is a subclass of the protein structure prediction problem that is mostly focused on predicting the CDRs, specifically the CDRH3 loop that mediates antigen binding. It also sometimes includes the antibody-antigen docking problem.
B cells are a type of lymphocyte that synthesizes and secretes antibodies as well as B-cell receptors. Each cell synthesizes a single antibody sequence throughout its life cycle (a phenomenon termed allelic exclusion). They can be categorized as either naive B cells or memory B cells. Activation by antigen presentation causes them to proliferate into effector cells that secrete antibodies.
The BLOSUM62 matrix quantifies the similarity of amino acids to one another for computing the evolutionary distance between sequences (1). Interestingly, its original computation contains math errors that make it a more effective substitution matrix (2). A custom version for TCRs has also been presented (3).
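As a rough illustration of how a substitution score of this kind is derived, the sketch below assumes the usual log-odds formulation, in which a score is the rounded, scaled log ratio of the observed pair frequency to the frequency expected by chance (BLOSUM62 is conventionally reported in half-bit units, hence the default scale of 2). The function name and the toy frequencies are hypothetical and are not the real BLOSUM62 counts.

```python
import math

def log_odds_score(p_ij: float, q_i: float, q_j: float, scale: float = 2.0) -> int:
    """Rounded log-odds substitution score in units of 1/scale bits.

    p_ij: observed frequency of residues i and j being aligned (toy value).
    q_i, q_j: background frequencies of residues i and j (toy values).
    """
    return round(scale * math.log2(p_ij / (q_i * q_j)))

# A pair seen four times more often than chance gets a positive score,
# a pair seen four times less often than chance gets a negative one.
favoured = log_odds_score(p_ij=0.01, q_i=0.05, q_j=0.05)
disfavoured = log_odds_score(p_ij=0.000625, q_i=0.05, q_j=0.05)
```

Positive scores mark substitutions that are tolerated more often than expected by chance; negative scores mark substitutions that are selected against.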
Contrastive learning is a method (often self-supervised, though supervised variants exist) for driving the latent representations of data points towards or away from each other by adding custom losses.
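One widely used loss of this kind is InfoNCE. The sketch below is a minimal pure-Python version for a single anchor, assuming cosine similarity and a temperature of 0.1; the function names and toy vectors are illustrative and do not come from any specific method in these notes.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-probability of picking the positive
    among the positive and all negatives, given similarity logits."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]

anchor = [1.0, 0.0]
# Low loss: the positive is close to the anchor, the negatives are far away.
loss_aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# High loss: the "positive" is far away and a negative sits near the anchor.
loss_misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

Minimizing this loss pulls the anchor and positive embeddings together while pushing the negatives away.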
Diffusion guidance refers to inference-time methods that steer a diffusion process toward desired properties, constraints, or observations. It is a general concept that applies to protein design, structure prediction, and sequence-based protein design alike; all-atom diffusion is just one application.
Xie et al (1) outline three broad types of guidance used by diffusion models:
However, other search algorithms such as Beam search and Monte Carlo Tree Search have been used in conjunction with diffusion models (2).
Diffusion models are generative models whose outputs are generated by iteratively denoising Gaussian noise. During inference, the reverse process is executed, whereas during training, the forward process is carried out (converting data to noise). This process can be guided at inference time to respect specific geometric constraints.
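The forward and reverse processes can be sketched on a toy one-dimensional problem. The example below assumes a variance-preserving process with a constant noise rate and a Gaussian data distribution, for which the score is known in closed form; in a real diffusion model the score would be a trained neural network. All names and constants (`BETA`, `MU`, `SIGMA`, `reverse_sample`) are illustrative.

```python
import math
import random

BETA = 8.0             # constant noise rate (assumed for this toy example)
MU, SIGMA = 2.0, 0.5   # toy data distribution N(MU, SIGMA^2)

def marginal(t):
    """Mean and variance of the noised data distribution at time t."""
    a = math.exp(-0.5 * BETA * t)
    return MU * a, (SIGMA * a) ** 2 + 1.0 - a * a

def forward_sample(x0, t, rng):
    """Forward process: turn a data point into a noisy sample at time t."""
    a = math.exp(-0.5 * BETA * t)
    return a * x0 + math.sqrt(1.0 - a * a) * rng.gauss(0.0, 1.0)

def score(x, t):
    """Exact score of the Gaussian marginal (stands in for a neural network)."""
    mean, var = marginal(t)
    return -(x - mean) / var

def reverse_sample(rng, t_end=1.0, steps=500):
    """Reverse process: start from Gaussian noise and denoise step by step
    with an Euler-Maruyama discretization of the reverse-time SDE."""
    h = t_end / steps
    x = rng.gauss(0.0, 1.0)
    for k in range(steps):
        t = t_end - k * h
        drift = 0.5 * BETA * x + BETA * score(x, t)
        x = x + drift * h + math.sqrt(BETA * h) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
samples = [reverse_sample(rng) for _ in range(1000)]
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / len(samples))
```

With the exact score, the generated samples should have a mean and standard deviation close to `MU` and `SIGMA`, up to discretization and sampling error.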
Diffusion models try to recover samples from the data distribution, starting from Gaussian noise, by iteratively denoising across time steps. The forward process is defined as
: Standard Wiener process (by which noise is added)
Typically these drift and diffusion functions are used:
The reverse process:
The score function is approximated by a neural network.
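For reference, the terms used in this entry (drift, diffusion, Wiener process, reverse process, score) match the score-based stochastic differential equation formulation of diffusion models. The block below gives that standard form as an assumption about the intended notation; the symbols $\mathbf{f}$, $g$, $\mathbf{w}$, $\beta(t)$, and $s_\theta$ are conventional choices, and the variance-preserving drift and diffusion shown are only one common option.

```latex
% Forward process: data is gradually converted to noise
\mathrm{d}\mathbf{x} = \mathbf{f}(\mathbf{x}, t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}

% One common (variance-preserving) choice of drift and diffusion
\mathbf{f}(\mathbf{x}, t) = -\tfrac{1}{2}\beta(t)\,\mathbf{x}, \qquad g(t) = \sqrt{\beta(t)}

% Reverse process: run backward in time, with \bar{\mathbf{w}} a reverse-time Wiener process
\mathrm{d}\mathbf{x} = \left[\mathbf{f}(\mathbf{x}, t) - g(t)^2\,\nabla_{\mathbf{x}} \log p_t(\mathbf{x})\right]\mathrm{d}t + g(t)\,\mathrm{d}\bar{\mathbf{w}}

% Score approximated by a neural network
s_\theta(\mathbf{x}, t) \approx \nabla_{\mathbf{x}} \log p_t(\mathbf{x})
```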
Diffusion models can also be zero-shot classifiers (1).
Diffusion models have been combined with Replica-exchange molecular dynamics (2).
For protein sequences, which are fundamentally discrete, Yang et al (3) describe three approaches to generation using diffusion models:
Epistasis refers to the non-additivity of fitness effects arising from specific combinations of mutations. (1) outline three types of epistasis: magnitude epistasis (“same direction as expected but are not perfectly additive”), sign epistasis (“effect of one of the substitutions changes direction in the context of the other”), and reciprocal epistasis (“effects of both substitutions change direction when they are made together”).
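The three categories can be made concrete for a pair of mutations A and B. The sketch below assumes fitness is measured on a scale where independent effects add (for example log fitness), and that the four genotypes (wild type, A, B, AB) have all been measured. The function name and tolerance are hypothetical.

```python
def classify_epistasis(f_wt, f_a, f_b, f_ab, tol=1e-9):
    """Classify pairwise epistasis from four fitness values.

    Returns "none", "magnitude", "sign", or "reciprocal".
    """
    a_in_wt = f_a - f_wt    # effect of A in the wild-type background
    a_in_b = f_ab - f_b     # effect of A in the B background
    b_in_wt = f_b - f_wt    # effect of B in the wild-type background
    b_in_a = f_ab - f_a     # effect of B in the A background

    if abs(a_in_wt - a_in_b) <= tol:
        return "none"       # effects are additive

    a_flips = a_in_wt * a_in_b < 0
    b_flips = b_in_wt * b_in_a < 0
    if a_flips and b_flips:
        return "reciprocal"  # both effects change direction
    if a_flips or b_flips:
        return "sign"        # exactly one effect changes direction
    return "magnitude"       # same directions, but not additive

additive = classify_epistasis(1.0, 1.2, 1.3, 1.5)
magnitude = classify_epistasis(1.0, 1.2, 1.3, 1.9)
sign = classify_epistasis(1.0, 1.2, 0.9, 1.5)
reciprocal = classify_epistasis(1.0, 0.8, 0.9, 1.2)
```

In the last case each mutation is deleterious alone but beneficial in the presence of the other, which is why such pairs create fitness valleys that make evolutionary trajectories hard to predict.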
Figure from (1)
Figure from (2)
Figure from (1)
Figure from (2)
Figure from (6)
Protein evolution is the process and result of gradual sequence changes resulting in functional and/or structural changes. See Epistasis for examples of why evolutionary trajectories are difficult to predict. This note excludes any discussion of somatic hypermutation.
Figure from (2)
Figure from (10)
Immune repertoires are the full breadth of B-cell and T-cell receptors being expressed by a human that are available for potential antigen binding. Rees (1) provides estimates suggesting that humans have naive repertoires of about sequences.
Related:
Inverse folding describes the problem of designing a sequence for a given structure. Designed sequences are typically limited to the twenty canonical amino acids.
See Hybrid sequence-structure models for a list of methods that incorporate PLMs
Figure from (1)
Figure 2 from Castorina et al. (5)
The light chain of an antibody makes up part of its variable region and Fab, and therefore is involved in antigen binding. In humans, the Kappa and Lambda subtypes are found and split about 60:40, whereas in mice it is closer to 90:10.
Predicted aligned error (PAE) is a measurement calculated by protein structure prediction neural networks to capture positional errors between two amino acids in a computational model. It was introduced by AlphaFold2. A derivative metric, ipSAE, has been shown to be more robust at identifying potential binders.
Figure from (1)
(LDDT redirects here) pLDDT (predicted local distance difference test) is a confidence metric used by neural networks for protein structure prediction. It captures the per-residue accuracy, both in terms of neighborhood and side chain rotamer. It was first directly integrated into structure prediction by AlphaFold2 at the per-residue level and has been widely adopted since. AlphaFold3 adopted per-atom pLDDT.
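pLDDT is a prediction of the lDDT score, so it helps to see what the underlying quantity measures. The sketch below is a simplified Cα-only version, assuming the commonly used 15 Å inclusion radius and the 0.5, 1, 2, and 4 Å tolerance thresholds; the full metric also considers other atoms and stereochemical checks. The function name and toy coordinates are illustrative.

```python
import math

def per_residue_lddt(ref, model, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha lDDT per residue.

    ref, model: lists of (x, y, z) coordinates with matching residue order.
    For each residue, looks at all other residues within `cutoff` in the
    reference and reports the fraction of those distances that are
    preserved in the model, averaged over the tolerance thresholds.
    """
    n = len(ref)
    scores = []
    for i in range(n):
        preserved, total = 0, 0
        for j in range(n):
            if i == j:
                continue
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= cutoff:
                continue  # only local neighborhoods count
            diff = abs(d_ref - math.dist(model[i], model[j]))
            preserved += sum(diff < t for t in thresholds)
            total += len(thresholds)
        scores.append(preserved / total if total else 0.0)
    return scores

# Four residues on a line, 3.8 angstroms apart (toy reference structure).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
perfect = per_residue_lddt(reference, reference)
# Same structure with the last residue displaced by 10 angstroms.
distorted = reference[:3] + [(11.4, 10.0, 0.0)]
degraded = per_residue_lddt(reference, distorted)
```

Because the score depends only on internal distances, it needs no superposition of the two structures, and a local error lowers the score mainly for the residues near it.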
Figure from (1)
Protein backbone design is the generation of protein backbones in three-dimensional space. This section also covers generation and design of entire protein structures in Cartesian space, but most methods uncouple design of the backbone and design of the sequence given the backbone (inverse folding). As of May 2024, the current state of the art uses diffusion.
Not to be confused with protein structure prediction
Protein folding is the process by which an amino acid polypeptide self-organizes into a 3D structure.
Protein language models (PLMs) are a type of Transformer model trained on either protein sequences or Multiple sequence alignments.
Figure from (6)
From Hybrid sequence-structure models
Protein-protein interactions describe when two or more proteins bind to one another.
Protein structure tokenization refers to the process of discretizing protein structure using a learned codebook derived from vector-quantized variational autoencoders. It was first used for search purposes with Foldseek and has since been adopted for use with ESM3, SaProt, and others.
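The discretization step itself amounts to a nearest-neighbor lookup in the codebook. The sketch below assumes that per-residue structural descriptors have already been computed by some encoder and that the codebook has already been learned; the two-dimensional vectors are toy values, and the function names are hypothetical rather than taken from Foldseek, ESM3, or SaProt.

```python
import math

def quantize(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    return min(range(len(codebook)), key=lambda k: math.dist(descriptor, codebook[k]))

def tokenize(descriptors, codebook):
    """Map a sequence of per-residue descriptors to discrete structure tokens."""
    return [quantize(d, codebook) for d in descriptors]

toy_codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
toy_descriptors = [(0.1, -0.1), (4.0, 4.5), (0.9, 1.2)]
tokens = tokenize(toy_descriptors, toy_codebook)
```

The resulting token string can then be searched or modeled with the same tools used for amino acid sequences.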
RosettaFold (sometimes stylized as RoseTTAfold) is a protein structure prediction method unveiled in mid-2021 (1). A second version, RosettaFold2, was released in early 2023 (2). Its architecture closely tracks that of AlphaFold2, with several changes, such as the use of an SE3-transformer.
Structure prediction refers to the problem of predicting the 3D shape of a protein or nucleotide sequence without any experimental information. Common metrics used for evaluating the quality of predicted structures include LDDT (residue level), TM-score (whole-structure level), and DockQ (complex level).
See Antibody structure prediction
Figure from (1)
Thermostability refers to a protein’s ability to remain folded at high temperatures or under harsh conditions. It is a highly desirable property for engineered proteins.
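TM-score is an alignment-dependent protein structure similarity term introduced by (1) that is widely used for assessing protein structure prediction methods. It is defined as:

$$\text{TM-score} = \max\left[\frac{1}{L_\text{target}} \sum_{i=1}^{L_\text{common}} \frac{1}{1 + \left(d_i / d_0(L_\text{target})\right)^2}\right]$$

$L_\text{target}$: length of the amino acid sequence of the target protein
$L_\text{common}$: number of residues in both the target and query proteins
$d_i$: distance between the $i$-th pair of residues
$d_0(L_\text{target}) = 1.24\sqrt[3]{L_\text{target} - 15} - 1.8$: distance scaling factor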
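A minimal sketch of the score for one fixed superposition is given below. It assumes the standard definition, in which each aligned residue pair at distance d contributes 1 / (1 + (d / d0)^2), the sum is divided by the target length, and d0 = 1.24 * (L_target - 15)^(1/3) - 1.8. The full TM-score takes the maximum of this quantity over superpositions, which is omitted here; the function names are illustrative.

```python
def distance_scale(l_target):
    """Length-dependent distance scaling factor d0, in angstroms."""
    if l_target <= 15:
        raise ValueError("d0 is undefined for targets of 15 residues or fewer")
    d0 = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short for a positive d0")
    return d0

def tm_score_fixed(distances, l_target):
    """TM-score for a single, already-chosen superposition.

    distances: distance in angstroms for each aligned residue pair.
    l_target: number of residues in the target protein.
    """
    d0 = distance_scale(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target

length = 100
identical = tm_score_fixed([0.0] * length, length)
# Every pair exactly d0 apart contributes one half.
at_scale = tm_score_fixed([distance_scale(length)] * length, length)
# Only half of the target residues are aligned, all perfectly.
half_covered = tm_score_fixed([0.0] * 50, length)
```

Normalizing by the target length rather than the number of aligned residues means that unaligned residues lower the score, and the length-dependent d0 keeps scores comparable across proteins of different sizes.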
Variant effect prediction covers the changes in properties or fitness (measured in various ways) resulting from small sequence-level changes in proteins.