Dimensionality reduction refers to the process of compressing the input features into a smaller number of dimensions.

Notes

  • t-SNE and UMAP distort input values (1). There are other examples of this in the single-cell omics field (1).

Approaches

Principal component analysis

Principal component analysis (PCA) is a standard dimensionality reduction tool widely used in data science. Per (2), its interpretation requires the following assumptions:

  • The underlying patterns exist
  • The patterns exhibited in the data are independent of one another
  • The patterns combine linearly to form the data
  • The data contain exclusively these patterns and additive, uncorrelated noise
  • The observations are independent
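
As a minimal sketch of what PCA computes, the example below builds toy data that roughly follows the assumptions listed above (two independent patterns, combined linearly, plus additive uncorrelated noise) and recovers the leading components with a singular value decomposition. The function name `pca`, the toy data, and all sizes are illustrative assumptions and are not taken from (2).

```python
import numpy as np


def pca(X, k):
    """Project X (n_samples x n_features) onto its first k principal components."""
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]
    # Coordinates of each observation in the reduced space.
    scores = X_centered @ components.T
    # Fraction of total variance captured by each retained component.
    explained_ratio = (S ** 2 / np.sum(S ** 2))[:k]
    return scores, components, explained_ratio


rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))        # two independent underlying patterns
mixing = rng.normal(size=(2, 5))          # linear combination into 5 features
noise = 0.05 * rng.normal(size=(200, 5))  # additive, uncorrelated noise
X = latent @ mixing + noise

scores, components, explained_ratio = pca(X, k=2)
print(scores.shape)           # (200, 2)
print(explained_ratio.sum())  # close to 1 for this toy data
```

When the data do not meet these assumptions, the same code still returns components, which is why their interpretation needs care (2).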

Figure 1 from Shinn (2)

t-distributed Stochastic Neighbor Embedding (t-SNE)

  • Anecdotally, this is better for local clustering and patterns
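
A minimal sketch of running t-SNE on toy data, assuming scikit-learn is installed. The three Gaussian clusters, the perplexity value, and the random seed are illustrative assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Three hypothetical clusters of 20 points each in a 10-dimensional space.
centers = rng.normal(scale=10.0, size=(3, 10))
X = np.vstack([center + rng.normal(size=(20, 10)) for center in centers])
labels = np.repeat(np.arange(3), 20)

# Perplexity must be smaller than the number of samples (60 here).
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
embedding = tsne.fit_transform(X)
print(embedding.shape)  # (60, 2)
```

Plotting `embedding` colored by `labels` shows how the local neighborhoods are laid out. As the Notes section cautions, distances in the embedding should not be read as faithful to the input space (1).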

Uniform Manifold Approximation and Projection (UMAP)

  • Anecdotally, this is better for finding global clustering and patterns

Autoencoders

In practice, autoencoders are not used very often for dimensionality reduction.

Variational autoencoders

See Variational autoencoders

1. Chari T, Pachter L. The specious art of single-cell genomics. PLOS Computational Biology. 2023;19(8):e1011288. Available from: https://doi.org/10.1371/journal.pcbi.1011288
2. Shinn M. Phantom oscillations in principal component analysis. Proceedings of the National Academy of Sciences. 2023;120(48). Available from: https://doi.org/10.1073/pnas.2311420120