Dimensionality reduction, or dimension reduction, is the process of mapping a set of high-dimensional vectors to a lower-dimensional space while approximately preserving the relationships among them, such as pairwise distances. In other words, dimensionality reduction aims to shrink high-dimensional data so that it can be represented in fewer dimensions without losing the important information it carries.
There are several reasons why dimensionality reduction can be useful:
- Data visualization - It's difficult or even impossible for humans to visualize high-dimensional data. Dimensionality reduction can represent that high-dimensional data in 2D or 3D.
- Data compression - Storage space and computing power are costly resources. Dimensionality reduction makes data more efficient to store and easier to retrieve.
- Noise removal - Data can be corrupted or distorted to the point that it's difficult or impossible to interpret. Dimensionality reduction can discard some of that noise, which can improve query accuracy.
Dimensionality Reduction Techniques
Many data mining and machine learning techniques can be categorized as forms of dimensionality reduction, including:
- Non-negative matrix factorization (NMF)
- Principal component analysis (PCA)
- Kernel PCA
- Independent component analysis (ICA)
- Nonlinear dimensionality reduction (NDR)
- Linear discriminant analysis (LDA)
- Factor analysis
- Many others
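As an illustration of one technique from the list, the following is a minimal sketch of principal component analysis (PCA) that reduces 3D points to a single dimension. It uses only the Python standard library and finds the leading component by power iteration, which is one of several ways to compute it and is chosen here for brevity. The helper names (`mean_center`, `covariance`, `top_component`, `project_1d`) and the sample `points` are hypothetical and exist only for this example.

```python
import math
import random


def mean_center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(cov, iters=500, seed=0):
    """Approximate the leading eigenvector of cov by power iteration."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project_1d(rows):
    """Project each row onto the first principal component."""
    centered = mean_center(rows)
    axis = top_component(covariance(centered))
    scores = [sum(r[j] * axis[j] for j in range(len(axis))) for r in centered]
    return scores, axis


# Made-up sample data: points lying close to the line y = 2x, with small
# jitter in the third coordinate.
points = [
    [1.0, 2.0, 0.1],
    [2.0, 4.1, -0.1],
    [3.0, 5.9, 0.0],
    [4.0, 8.0, 0.1],
    [5.0, 10.1, -0.1],
]

scores, axis = project_1d(points)
print("principal axis:", axis)
print("1D coordinates:", scores)
```

Because the sample points lie almost on a line, a single coordinate per point keeps their ordering and relative spacing, while the small jitter in the third coordinate is mostly discarded. The sign of the axis is arbitrary, so the 1D coordinates may come out negated. In practice a numerical library would normally be used instead of a hand-written power iteration.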