Weight averaging combines the weights of several checkpoints of a single model, saved at different points during training, into one set of averaged weights.

Types

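  • Stochastic weight averaging: Takes an equal-weight average of the same model’s weights at different points near the end of training, while keeping the learning rate relatively high (constant or cyclical) instead of decaying it toward zero
  • Exponential moving average: Keeps a running average in which the contribution of older states decays exponentially
  • Latest weight averaging: Averages the most recent few checkpoints, each saved at the end of an epoch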
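
The three schemes differ mainly in which checkpoints they include and how they weight them. The following is a minimal sketch on toy data, not a reference implementation: weights are plain lists of floats, and the function names, the decay value of 0.5, and the window size k = 2 are all illustrative assumptions. It also leaves out the learning-rate schedule that stochastic weight averaging relies on.

```python
def uniform_average(checkpoints):
    """Equal-weight average of several weight vectors (SWA-style)."""
    n = len(checkpoints)
    return [sum(ws) / n for ws in zip(*checkpoints)]


def ema_update(avg, weights, decay):
    """One exponential-moving-average step.

    The existing average is scaled by `decay`, so a state's contribution
    shrinks by that factor on every later update.
    """
    return [decay * a + (1.0 - decay) * w for a, w in zip(avg, weights)]


def latest_average(checkpoints, k):
    """Average only the k most recent checkpoints (LAWA-style)."""
    return uniform_average(checkpoints[-k:])


# Toy "checkpoints": one hypothetical weight vector per epoch.
checkpoints = [[0.0, 4.0], [2.0, 2.0], [4.0, 0.0]]

swa = uniform_average(checkpoints)

ema = checkpoints[0]
for w in checkpoints[1:]:
    ema = ema_update(ema, w, decay=0.5)

lawa = latest_average(checkpoints, k=2)

print(swa, ema, lawa)
```

In a real training loop the same updates would be applied per parameter tensor rather than to flat lists, and the averaged weights would be kept separately from the weights being optimized.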