Quartz 4


BERT

Created Apr 10, 2026 · Modified Apr 17, 2026

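BERT is an encoder-only Transformer model, typically pretrained with a masked language modeling (MLM) objective: about 15% of input tokens are selected at random, and the model learns to predict them from the surrounding context on both sides. It is less widely used than decoder-only LMs for generative modeling.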
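A minimal Python sketch of the masking step, assuming the recipe from the original BERT paper: each selected token is replaced by the [MASK] id 80% of the time, by a random token 10% of the time, and left unchanged 10% of the time. The function name `mask_tokens`, the `IGNORE_INDEX = -100` label convention, and the per-token Bernoulli selection are illustrative assumptions, not any specific library's API. Real implementations also exclude special tokens (e.g. [CLS], [SEP], padding) from selection, which this sketch skips.

```python
import random
from typing import List, Optional, Tuple

# Assumed label value for positions the loss should ignore (a common convention).
IGNORE_INDEX = -100


def mask_tokens(
    token_ids: List[int],
    mask_id: int,
    vocab_size: int,
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: apply BERT-style MLM masking to one sequence.

    Each position is selected independently with probability `mask_prob`.
    A selected token becomes `mask_id` (80%), a random id in [0, vocab_size)
    (10%), or stays unchanged (10%). Labels hold the original token at
    selected positions and IGNORE_INDEX everywhere else.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # not selected: input kept, position ignored by the loss
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_id  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: leave the original token in place
    return inputs, labels


if __name__ == "__main__":
    rng = random.Random(0)
    ids = list(range(5, 25))  # 20 hypothetical token ids
    masked, labels = mask_tokens(ids, mask_id=4, vocab_size=30000, rng=rng)
    print(masked)
    print(labels)
```

Because only selected positions carry labels, the training loss is a cross-entropy over those positions alone. The BERT paper motivates the 80/10/10 split by noting that [MASK] never appears during fine-tuning; keeping or randomizing some selected tokens reduces that mismatch.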


Backlinks

AlphaMissense
BERT models trained on sequence clusters outperform those trained on all data
BERT-based inverse folding models generate overly repetitive sequences
ESM
Encoder- and decoder-based models have different strengths and are equally useful on generative tasks
Generic PLMs outperform antibody-specific PLMs on zero-shot predictions of affinity changes
Gibbs sampling is less effective than simpler algorithms for sequence design using PSSMs
ML optimized Ab libraries have more mutations than PSSM libraries
MSA-based structure prediction models can be retrained as pathogenicity prediction models by upweighting BERT losses
Masked LMs can be fine-tuned starting from autoregressive LMs, but not vice-versa
Transformer

