Uniform Manifold Approximation and Projection (UMAP)
UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction method that aims to preserve both the local and the global structure of the data. Here's a brief summary of how UMAP works:
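Construct a Neighborhood Graph: For each data point, find its nearest neighbors using a distance metric (e.g., Euclidean distance). UMAP takes a local neighborhood approach, considering only a fixed number of nearest neighbors for each point (the n_neighbors parameter in the umap-learn library).
Fuzzy Simplicial Set Approximation: Convert the nearest-neighbor graph into a fuzzy simplicial set. This is a weighted graph in which each edge weight, between 0 and 1, expresses how likely it is that the two points are connected in the original high-dimensional space. Each point's distances are scaled by its own local neighborhood, so dense and sparse regions are treated comparably. The resulting directed weights are then symmetrized.
Optimize Low-Dimensional Embedding: Optimize the low-dimensional coordinates by minimizing the mismatch, measured as a cross-entropy, between the fuzzy simplicial set built in the original space and the corresponding fuzzy set built from the low-dimensional points. The optimization uses stochastic gradient descent with negative sampling: points connected in the graph are pulled together, and randomly sampled points are pushed apart.
Preserve Global and Local Structure: Local structure is preserved because points that are close in the original space are pulled close together in the low-dimensional space. Global structure is preserved to a more limited extent. The broader connectivity patterns of the neighborhood graph are retained, but distances between well-separated clusters in the embedding should not be read too literally.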
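The first three steps above can be sketched from scratch in NumPy. This is a toy illustration, not umap-learn's implementation, and the function names knn_graph, fuzzy_simplicial_set and optimize_embedding are hypothetical. Several simplifications are assumptions of this sketch:

- It uses brute-force neighbor search, whereas umap-learn uses approximate NN-descent.
- It starts from a random initialization instead of umap-learn's default spectral one.
- It uses the fixed low-dimensional kernel q = 1 / (1 + d^2) instead of 1 / (1 + a d^(2b)) with a and b fitted from min_dist.
- It uses a crude edge-sampling scheme.

```python
import numpy as np


def knn_graph(X, k):
    """Step 1 (sketch): brute-force k-nearest neighbours, Euclidean distance."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # (n, n) distances
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbour
    idx = np.argsort(d, axis=1)[:, :k]           # k closest points per row
    return idx, np.take_along_axis(d, idx, axis=1)


def fuzzy_simplicial_set(idx, dist, n_iter=64):
    """Step 2 (sketch): kNN distances -> symmetric memberships in [0, 1]."""
    n, k = dist.shape
    rho = dist[:, 0]                             # distance to the nearest neighbour
    target = np.log2(k)                          # desired sum of memberships per point
    sigma = np.ones(n)
    for i in range(n):
        lo, hi = 0.0, np.inf
        for _ in range(n_iter):                  # binary search for the local scale sigma_i
            s = np.exp(-np.maximum(dist[i] - rho[i], 0) / sigma[i]).sum()
            if s > target:                       # memberships too large: shrink sigma
                hi = sigma[i]
                sigma[i] = (lo + hi) / 2
            else:                                # memberships too small: grow sigma
                lo = sigma[i]
                sigma[i] = sigma[i] * 2 if hi == np.inf else (lo + hi) / 2
    w = np.exp(-np.maximum(dist - rho[:, None], 0) / sigma[:, None])
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = w.ravel()  # directed weights i -> j
    return A + A.T - A * A.T                     # fuzzy union of i -> j and j -> i


def optimize_embedding(P, n_components=2, n_epochs=200, lr=1.0, n_neg=5, seed=0):
    """Step 3 (sketch): SGD on the cross-entropy between P and q = 1 / (1 + d^2)."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    Y = rng.uniform(-10, 10, size=(n, n_components))  # random init (simplification)
    heads, tails = np.nonzero(P)
    weights = P[heads, tails]
    for epoch in range(n_epochs):
        alpha = lr * (1 - epoch / n_epochs)      # linearly decaying step size
        for i, j, p in zip(heads, tails, weights):
            if rng.random() > p:                 # sample each edge with probability p
                continue
            diff = Y[i] - Y[j]
            grad = np.clip(2 * diff / (1 + diff @ diff), -4, 4)
            Y[i] -= alpha * grad                 # attraction: pull i and j together
            Y[j] += alpha * grad
            for m in rng.integers(0, n, size=n_neg):  # negative sampling
                if m == i:
                    continue
                diff = Y[i] - Y[m]
                d2 = diff @ diff
                grad = np.clip(-2 * diff / ((0.001 + d2) * (1 + d2)), -4, 4)
                Y[i] -= alpha * grad             # repulsion: push i away from m
    return Y


# Usage on two well-separated synthetic blobs in 10 dimensions
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 10)), rng.normal(20, 1, (30, 10))])
idx, dist = knn_graph(X, k=5)
P = fuzzy_simplicial_set(idx, dist)
Y = optimize_embedding(P, n_components=2, n_epochs=100)
```

Because the two blobs are far apart, the neighborhood graph has no edges between them. Only the repulsive term acts across the two groups.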
For more details on how UMAP works, check here.
UMAP also supports both unsupervised and supervised dimensionality reduction! Look at the code below: the only difference is in the fit_transform() call, where the supervised version also receives the labels.

After the data has been reduced to 3 dimensions, the unsupervised output looks like this:

For the supervised output, let's look at the projections of both the training data and the testing data:

After seeing all these methods, how do you plan to do dimensionality reduction in the future? You're more than welcome to share your ideas here!