Models trained on synthetic data tend to converge to mean

Summary

Synthetic data over-represents values near the mean of the original data distribution and either under-represents or exaggerates outliers (1). A model trained on such data sees too little of the distribution's tails. This can pose a challenge for models trained using Distillation and can contribute to Catastrophic forgetting.
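The narrowing effect can be illustrated with a toy simulation. The sketch below is not taken from the cited reference. It assumes the "model" is a Gaussian fitted to its training data, and that the generator only emits samples within 1.5 standard deviations of the mean, a stand-in for a generator that favours high-probability outputs. The names `next_generation`, `simulate`, and `keep_within` are hypothetical. It covers only the case where outliers are under-represented, not the case where they are exaggerated.

```python
import random
import statistics


def next_generation(data, n_samples, keep_within=1.5, rng=random):
    """Fit a Gaussian to `data` and draw synthetic samples from it.

    Assumption: samples further than `keep_within` standard deviations
    from the mean are discarded, mimicking a generator that
    under-represents outliers.
    """
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    synthetic = []
    while len(synthetic) < n_samples:
        x = rng.gauss(mu, sigma)
        if abs(x - mu) <= keep_within * sigma:
            synthetic.append(x)
    return synthetic


def simulate(generations=5, n_samples=2000, seed=0):
    """Train each generation only on the previous generation's output."""
    rng = random.Random(seed)
    # Generation 0 is "real" data: a standard normal sample.
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    spreads = [statistics.stdev(data)]
    for _ in range(generations):
        data = next_generation(data, n_samples, rng=rng)
        spreads.append(statistics.stdev(data))
    return spreads


spreads = simulate()
for generation, spread in enumerate(spreads):
    print(f"generation {generation}: std = {spread:.3f}")
```

Under these assumptions the standard deviation shrinks with every generation, so the data concentrates around the mean even though no single step looks drastic. Real generators and real training pipelines are far more complex, so this should be read as an intuition aid rather than a measurement.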

Figures

Ref (1)
