Protein structure tokenization

Created Apr 10, 2026; modified Apr 20, 2026

Protein structure tokenization is the process of discretizing protein structure into a sequence of tokens drawn from a learned codebook, typically one trained with a vector-quantized variational autoencoder (VQ-VAE). It was first used for structure search in Foldseek and has since been adopted by protein language models such as ESM3, SaProt, and others.
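As a rough illustration of the discretization step, the sketch below assigns each per-residue feature vector to its nearest codebook entry, which is the core lookup in vector quantization. The `quantize` function, the tiny 2-D codebook, and the feature values are all hypothetical; this is not the tokenizer used by Foldseek, ESM3, or SaProt, whose encoders and codebooks are learned and considerably larger.

```python
def quantize(features, codebook):
    """Map each per-residue feature vector to the index of its nearest codebook entry."""
    tokens = []
    for vec in features:
        # Squared Euclidean distance from this vector to every codebook vector
        dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook]
        # The token is the index of the closest codebook vector
        tokens.append(dists.index(min(dists)))
    return tokens


# Hypothetical codebook with 3 entries in 2 dimensions (assumption for illustration)
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical encoder outputs, one vector per residue
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (1.1, 0.1)]

tokens = quantize(features, codebook)
print(tokens)  # [0, 1, 2, 1]
```

The resulting integer sequence has one token per residue, so it can be handled like an amino acid sequence by search tools or language models.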

Notes

  • Structural tokenization circumvents a limitation of inverse folding models, which cannot be trained effectively on computationally predicted structures alone.

2 items with this tag.

  • Apr 21, 2026

    Protein sequence and structure compression using VQ-VAE tokens is prone to codebook collapse

    • protein-structure-tokenization
  • Apr 21, 2026

    Structural tokens improve zero-shot variant effect prediction in ESM3, but only when structures are not computationally derived

    • protein-structure-tokenization
    • variant-effect-prediction
