Manifold Learning Techniques: t-SNE vs UMAP vs Isomap

High-dimensional data pervades modern machine learning, from genomics with thousands of gene expressions to natural language processing with embeddings containing hundreds of dimensions. Yet humans struggle to comprehend anything beyond three dimensions. Manifold learning techniques bridge this gap by revealing the hidden structure within high-dimensional data through dimensionality reduction that preserves meaningful relationships.

Among the arsenal of manifold learning algorithms, three stand out for their distinct approaches and widespread adoption: t-SNE (t-Distributed Stochastic Neighbor Embedding), UMAP (Uniform Manifold Approximation and Projection), and Isomap (Isometric Mapping). Understanding their fundamental differences, mathematical underpinnings, and practical trade-offs empowers data scientists to select the right tool for each analytical challenge.

The Manifold Hypothesis and Why It Matters

The manifold hypothesis posits that real-world high-dimensional data often lies on or near a lower-dimensional manifold embedded within the high-dimensional space. Consider photographs: while a 1024×1024 color image technically occupies a space with over three million dimensions, meaningful images occupy a vastly smaller subspace. Not every possible combination of pixel values produces a recognizable image.

Manifold learning algorithms attempt to discover this intrinsic lower-dimensional structure. Unlike linear techniques like Principal Component Analysis (PCA) that find the best linear projection, manifold learning methods can uncover non-linear structures, making them invaluable for complex real-world datasets where relationships aren’t captured by straight lines and flat planes.

The three algorithms we’re examining take fundamentally different philosophical approaches to this challenge, leading to distinct characteristics in their outputs and appropriate use cases.

t-SNE: Probability-Based Local Structure Preservation

t-Distributed Stochastic Neighbor Embedding, introduced by Laurens van der Maaten and Geoffrey Hinton in 2008, revolutionized data visualization by producing remarkably clear cluster separations. t-SNE operates by converting high-dimensional Euclidean distances between points into conditional probabilities representing similarities.

How t-SNE Works Under the Hood

The algorithm begins by computing pairwise similarities in the high-dimensional space using a Gaussian distribution centered on each point. For points i and j, the conditional probability that i would pick j as its neighbor is proportional to a Gaussian density evaluated at the distance between them. This creates a probability distribution over all potential neighbors for each point.

In the low-dimensional embedding space, t-SNE uses a Student t-distribution (specifically with one degree of freedom, which is a Cauchy distribution) rather than a Gaussian. This heavy-tailed distribution helps address the “crowding problem”—the challenge that the low-dimensional space has insufficient room to accommodate moderately distant points from the high-dimensional space.

The optimization process uses gradient descent to minimize the Kullback-Leibler divergence between the high-dimensional and low-dimensional probability distributions. This effectively pulls similar points together while pushing dissimilar points apart in the embedding space.
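In symbols, following van der Maaten and Hinton's formulation, the high-dimensional similarities, the heavy-tailed low-dimensional similarities, and the cost being minimized are:

```latex
% High-dimensional conditional similarity (Gaussian kernel; sigma_i set by perplexity)
p_{j|i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Low-dimensional similarity (Student t with one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Objective minimized by gradient descent
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```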

from sklearn.manifold import TSNE
import numpy as np

# Example input: 500 samples with 50 features
high_dim_data = np.random.rand(500, 50)

# Apply t-SNE with key parameters
tsne = TSNE(n_components=2,
            perplexity=30,      # Balances local vs global structure
            learning_rate=200,  # Step size for optimization
            max_iter=1000,      # Optimization iterations (named n_iter before scikit-learn 1.5)
            random_state=42)

embedding = tsne.fit_transform(high_dim_data)

The Perplexity Parameter and Its Critical Role

Perplexity represents perhaps the most important hyperparameter in t-SNE, serving as a smooth measure of the effective number of neighbors considered for each point. Lower perplexity values (5-15) focus on very local structure, potentially fragmenting clusters, while higher values (30-50) consider broader neighborhoods, potentially merging distinct clusters.

The appropriate perplexity depends heavily on dataset size and intrinsic structure. For datasets with hundreds of points, perplexity between 5 and 30 works well; for thousands of points, values of 30 to 50 prove more effective. Dense datasets with clear cluster structure benefit from higher perplexity, while sparse or noisy data often requires lower values.
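A quick way to see perplexity's effect is to fit t-SNE several times over a range of values and compare the embeddings side by side. A minimal sketch on a small built-in dataset (the subsample size and perplexity values are illustrative):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# 64-dimensional digit images; keep a small subsample so repeated runs stay fast
X, y = load_digits(return_X_y=True)
X = X[:500]

# Fit one embedding per perplexity value
embeddings = {}
for perplexity in (5, 30, 50):
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
    embeddings[perplexity] = tsne.fit_transform(X)

for perplexity, emb in embeddings.items():
    print(perplexity, emb.shape)  # each embedding is (500, 2)
```

Plotting each embedding colored by label makes the fragmentation (low perplexity) versus merging (high perplexity) trade-off directly visible.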

Key characteristics of t-SNE:

  • Excels at revealing local neighborhood structure and cluster separation
  • Creates visually striking visualizations with clear cluster boundaries
  • Computationally expensive, with O(n²) complexity in standard implementations
  • Stochastic nature means different runs produce different embeddings
  • Global structure preservation is explicitly sacrificed for local accuracy
  • Distances between clusters in the embedding carry limited meaning
  • Not suitable for dimensionality reduction beyond visualization (cannot transform new data)

When t-SNE Shines

t-SNE proves invaluable for exploratory data analysis when identifying distinct groups matters more than preserving exact distances. Single-cell RNA sequencing analysis uses t-SNE extensively to visualize cell type clusters. Image embedding visualization benefits from t-SNE’s ability to group similar images tightly while clearly separating different categories. Document clustering applications leverage t-SNE to reveal topic structures in text corpora.

However, t-SNE’s computational cost becomes prohibitive beyond tens of thousands of points. The Barnes-Hut approximation reduces complexity to O(n log n) but still struggles with massive datasets. More critically, t-SNE cannot embed new points without rerunning the entire optimization, limiting its utility for production machine learning pipelines.

Quick Comparison: Core Differences

  • 📊 t-SNE: Best for visualization, emphasizes local structure, slow on large datasets
  • 🚀 UMAP: Faster, preserves more global structure, supports new point transformation
  • 🗺️ Isomap: Geodesic distances, convex manifolds, interpretable dimensions

UMAP: Topology-Based Modern Alternative

Uniform Manifold Approximation and Projection emerged in 2018 from Leland McInnes and colleagues, quickly gaining traction as a faster, more flexible alternative to t-SNE. UMAP’s foundation rests on manifold theory and topological data analysis, giving it theoretical advantages alongside practical benefits.

The Mathematical Foundation of UMAP

UMAP constructs a fuzzy topological representation of the high-dimensional data, then finds a low-dimensional representation with the closest possible equivalent fuzzy topological structure. This sounds abstract, but the intuition is straightforward: UMAP builds a graph connecting nearby points in high-dimensional space, then optimizes a similar graph in low-dimensional space to match the original as closely as possible.

The algorithm uses a distance metric (typically Euclidean) to find each point’s nearest neighbors, then constructs edge weights representing the likelihood that points are connected in the underlying manifold. These weights decrease exponentially with distance but use different rates for different points, making the representation adaptive to local density variations.

In the low-dimensional space, UMAP optimizes these connections using a force-directed graph layout approach, where attractive and repulsive forces balance to create the final embedding. The cross-entropy loss function measures the difference between high-dimensional and low-dimensional graph structures.
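In the notation of McInnes et al., the adaptive edge weights, their symmetrization, and the cross-entropy objective take the form (ρ_i is the distance from point i to its nearest neighbor, σ_i a per-point normalization, and a, b fitted constants controlled by min_dist):

```latex
% Directed edge weight: exponential decay past the nearest neighbor
w_{j|i} = \exp\!\left(-\frac{\max\!\left(0,\; d(x_i, x_j) - \rho_i\right)}{\sigma_i}\right),
\qquad
w_{ij} = w_{j|i} + w_{i|j} - w_{j|i}\, w_{i|j}

% Low-dimensional edge weight
v_{ij} = \left(1 + a \lVert y_i - y_j \rVert^{2b}\right)^{-1}

% Cross-entropy between the two fuzzy graphs
C = \sum_{i \neq j} \left[\, w_{ij} \log \frac{w_{ij}}{v_{ij}}
    + (1 - w_{ij}) \log \frac{1 - w_{ij}}{1 - v_{ij}} \,\right]
```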

Key Parameters That Control UMAP Behavior

The n_neighbors parameter determines how many neighbors UMAP considers when constructing the high-dimensional graph representation. Small values (5-15) emphasize local structure, potentially creating many small clusters. Large values (50-100) capture broader structure, potentially connecting what might appear as separate clusters. This parallels t-SNE’s perplexity but offers more intuitive interpretation.

The min_dist parameter controls how tightly UMAP packs points together in the embedding. Values near 0 allow tight clumping, creating distinct, separated clusters. Values from 0.1 to 0.5 spread points more evenly, useful when preserving continuous structure matters more than cluster separation.

import umap
import numpy as np

# Example input: 500 samples with 50 features, plus a batch of new points
high_dim_data = np.random.rand(500, 50)
new_data_points = np.random.rand(10, 50)

# UMAP with typical parameters
reducer = umap.UMAP(n_neighbors=15,     # Local neighborhood size
                    min_dist=0.1,       # Minimum distance between points
                    n_components=2,     # Target dimensions
                    metric='euclidean', # Distance metric
                    random_state=42)

embedding = reducer.fit_transform(high_dim_data)

# Transform new data points into the existing embedding
new_embedding = reducer.transform(new_data_points)

UMAP’s distinctive characteristics:

  • Significantly faster than t-SNE, especially on large datasets
  • Better preservation of global structure while maintaining local relationships
  • Supports transformation of new data points without retraining
  • More stable across different runs with the same parameters
  • Flexible distance metrics (Euclidean, Manhattan, Cosine, and many others)
  • Scales well to datasets with millions of points
  • Can reduce to dimensions beyond 2 or 3 while maintaining utility

UMAP’s Practical Advantages

UMAP’s speed advantage becomes dramatic as data size increases. What might take hours with t-SNE completes in minutes with UMAP. This computational efficiency enables iterative exploration and parameter tuning that would be impractical with t-SNE.

The ability to transform new points proves crucial for production systems. Train UMAP on a reference dataset, then efficiently project new observations into the same embedding space—something impossible with t-SNE. This enables real-time visualization dashboards, anomaly detection systems that flag unusual patterns in the embedding space, and consistent embeddings across different data batches.

UMAP also preserves more global structure than t-SNE. While cluster separation remains strong, the relative positions of clusters carry more meaning. Distances between cluster centers in UMAP embeddings correlate better with true high-dimensional distances than in t-SNE, though neither should be treated as exact distance preservers.

Isomap: Geodesic Distance Preservation

Isomap takes an entirely different philosophical approach compared to t-SNE and UMAP. Rather than focusing on probability distributions or topological structures, Isomap attempts to preserve geodesic distances—the distances along the manifold’s surface rather than through the high-dimensional space.

The Geodesic Distance Concept

Imagine a map of the Earth flattened onto paper. The straight-line distance between two cities on the flat map poorly represents travel distance because it passes through the Earth’s interior. The geodesic distance—following the surface—provides the meaningful metric. Isomap applies this principle to high-dimensional manifolds.

Isomap constructs a neighborhood graph where each point connects to its k nearest neighbors. To estimate geodesic distances, the algorithm computes shortest paths through this graph using Dijkstra’s or Floyd-Warshall algorithm. These graph distances approximate distances along the manifold surface. Finally, multidimensional scaling (MDS) finds a low-dimensional embedding that best preserves these geodesic distances.
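The three steps above can be sketched directly with scipy's graph utilities. This illustrative version (a teaching sketch, not the optimized scikit-learn implementation) builds the k-NN graph, approximates geodesics with shortest paths, and embeds them with classical MDS:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap_sketch(X, n_neighbors=10, n_components=2):
    # Step 1: k-nearest-neighbor graph with Euclidean edge weights
    graph = kneighbors_graph(X, n_neighbors=n_neighbors, mode='distance')

    # Step 2: approximate geodesic distances as graph shortest paths (Dijkstra)
    geodesic = shortest_path(graph, method='D', directed=False)

    # Step 3: classical MDS on the geodesic distance matrix
    n = geodesic.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * H @ (geodesic ** 2) @ H       # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))

X, _ = make_swiss_roll(n_samples=500, random_state=0)
embedding = isomap_sketch(X)
print(embedding.shape)  # (500, 2)
```

Note that if the neighborhood graph is disconnected, some shortest paths are infinite and the MDS step fails; this is exactly the too-few-neighbors failure mode discussed below.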

Implementation and Parameter Selection

The primary parameter, the number of neighbors (k), critically affects results. Too few neighbors create a disconnected graph where geodesic distances cannot be computed between all points. Too many neighbors create shortcuts that violate the manifold structure, making “geodesic” distances approach Euclidean distances.

from sklearn.manifold import Isomap
import numpy as np

# Example input: 500 samples with 50 features
high_dim_data = np.random.rand(500, 50)

# Isomap with neighbor-based connectivity
isomap = Isomap(n_neighbors=10,     # Neighborhood size
                n_components=2,     # Target dimensions
                metric='minkowski', # Distance metric
                p=2)                # Minkowski p-value (2 = Euclidean)

embedding = isomap.fit_transform(high_dim_data)

Isomap’s defining characteristics:

  • Preserves geodesic distances rather than local neighborhoods
  • Assumes the manifold is convex (no holes or disconnected regions)
  • Produces more interpretable embedding dimensions
  • Deterministic—same input always produces same output
  • Sensitive to noise and outliers
  • Can fail catastrophically when manifold assumptions are violated
  • Computationally moderate, with graph shortest-path as bottleneck

When Isomap Excels and When It Fails

Isomap works beautifully on synthetic manifolds like Swiss rolls or S-curves where the data lies on a smooth, connected, convex manifold. The embedding dimensions often correspond to meaningful latent variables—for instance, when applied to images of faces under different lighting conditions, Isomap dimensions might cleanly separate pose angle, lighting direction, and facial expression.
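The Swiss roll case is easy to verify with scikit-learn's built-in generator, since it returns each point's true position along the rolled-up sheet. A sketch checking that one embedding axis tracks that latent variable (the correlation check is an illustrative diagnostic, not a standard API):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# t gives each point's true position along the rolled-up sheet
X, t = make_swiss_roll(n_samples=1000, random_state=0)

embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# If the roll is correctly unrolled, one embedding axis should track t
corr = max(abs(np.corrcoef(embedding[:, i], t)[0, 1]) for i in range(2))
print(f"best |correlation| with roll position: {corr:.2f}")
```

Running PCA on the same data, by contrast, yields no axis that cleanly tracks t, since the roll's structure is not linear.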

However, Isomap’s manifold assumptions prove restrictive. Real-world data often violates convexity, contains holes or disconnected regions, or includes significant noise. When applied to data with distinct clusters, Isomap often produces poor results because it tries to maintain geodesic distances across clusters, sometimes creating misleading connections.

Short-circuiting poses another challenge. If noise creates spurious connections between distant manifold regions, geodesic distance estimates become corrupted, and the entire embedding suffers. This makes Isomap less robust than t-SNE or UMAP on noisy data.

Real-World Application Example: Single-Cell RNA-Seq Analysis

Consider analyzing 50,000 cells with 2,000 gene expression measurements each, aiming to identify cell types and developmental trajectories.

t-SNE Approach

Best for: Initial visualization revealing distinct cell types. Creates beautiful, publication-ready cluster plots. Limitation: Takes 30+ minutes to compute. Cannot easily add newly sequenced cells without recomputing everything.

UMAP Approach

Best for: Interactive exploratory analysis and production pipelines. Completes in 5 minutes. Preserves enough global structure to infer developmental relationships between cell types. Can transform newly sequenced cells into the existing embedding for immediate classification.

Isomap Approach

Best for: Continuous developmental trajectories on pre-filtered data. Works well on specific cell lineages but struggles with the full heterogeneous dataset. Sensitive to the noisy, sparse nature of single-cell data.

Performance and Scalability Considerations

Computational performance varies dramatically across these algorithms. t-SNE’s O(n²) complexity becomes prohibitive above 10,000 points without approximations. Barnes-Hut t-SNE reduces this to O(n log n) but remains slower than alternatives. Typical execution times range from seconds for hundreds of points to hours for tens of thousands.

UMAP achieves remarkable speed through approximate nearest neighbor algorithms and efficient optimization. Datasets with 100,000 points that would take hours with t-SNE complete in minutes with UMAP. The difference becomes even more pronounced at larger scales, making UMAP the only practical choice for datasets with millions of observations.

Isomap’s performance bottleneck lies in computing all-pairs shortest paths on the neighborhood graph. For moderate-sized datasets (thousands of points), this completes quickly. However, the algorithm doesn’t scale gracefully to extremely large datasets. Memory requirements also increase substantially as the distance matrix must be stored.
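A rough way to compare scaling behavior on your own hardware is simply to time each fit on a synthetic sample; absolute numbers vary with machine and library versions, so treat this as a measurement sketch rather than a benchmark:

```python
import time
import numpy as np
from sklearn.manifold import TSNE, Isomap

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))  # 1,000 samples, 50 features

# Time each algorithm's fit on the same data
timings = {}
for name, model in [("t-SNE", TSNE(n_components=2, random_state=0)),
                    ("Isomap", Isomap(n_neighbors=10, n_components=2))]:
    start = time.perf_counter()
    model.fit_transform(X)
    timings[name] = time.perf_counter() - start
    print(f"{name}: {timings[name]:.1f}s")
```

Repeating the loop at several sample sizes makes each algorithm's growth curve, rather than a single data point, the basis for comparison.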

Memory consumption patterns:

  • t-SNE: Moderate, primarily for pairwise probability distributions
  • UMAP: Low, maintains only sparse neighbor graphs
  • Isomap: High, requires full distance matrix storage

Preservation of Different Structural Properties

Each algorithm prioritizes different aspects of the high-dimensional structure, leading to distinct embedding characteristics that match different analytical goals.

t-SNE aggressively prioritizes local structure. Points close in high-dimensional space remain close in the embedding. However, global relationships suffer—the distance between clusters or the relative positions of clusters carry little meaning. This makes t-SNE ideal for identifying groups but problematic for understanding relationships between groups.

UMAP strikes a balance, preserving both local neighborhoods and substantial global structure. Clusters remain separated and internally cohesive like t-SNE, but their relative positions and distances correlate better with high-dimensional reality. This makes UMAP suitable for both cluster identification and understanding broader relationships.

Isomap focuses on geodesic distance preservation across all scales. When successful, it maintains both local and global structure accurately. The embedding dimensions often correspond to meaningful continuous variables. However, this comes at the cost of robustness—violations of the manifold assumptions lead to poor performance.
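These preservation claims can be checked numerically: scikit-learn ships a trustworthiness score that measures how well local neighborhoods survive the reduction (1.0 means perfectly preserved). A sketch comparing two of the methods on a small dataset:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE, Isomap, trustworthiness

X, _ = load_digits(return_X_y=True)
X = X[:500]  # small subsample keeps both fits fast

# Score each embedding on local-neighborhood preservation
scores = {}
for name, model in [("t-SNE", TSNE(n_components=2, random_state=0)),
                    ("Isomap", Isomap(n_neighbors=10, n_components=2))]:
    emb = model.fit_transform(X)
    scores[name] = trustworthiness(X, emb, n_neighbors=5)
    print(f"{name}: trustworthiness = {scores[name]:.3f}")
```

Trustworthiness only captures local structure; no single scalar measures global preservation equally well, which is why visual inspection remains part of the workflow.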

Choosing the Right Algorithm for Your Data

The decision between t-SNE, UMAP, and Isomap should be driven by your specific data characteristics, computational constraints, and analytical objectives.

Select t-SNE when:

  • Dataset size remains below 10,000 points
  • Visual cluster separation matters most
  • You need publication-quality exploratory visualizations
  • Global structure is irrelevant to your analysis
  • Computational time isn’t constrained
  • You won’t need to embed new points later

Select UMAP when:

  • Working with large datasets (10,000+ points)
  • You need both local and global structure preservation
  • Computational efficiency is important
  • New points must be transformed into the embedding space
  • You want stable, reproducible results
  • The embedding will be used in downstream machine learning
  • Interactive exploration requires fast iteration

Select Isomap when:

  • Data lies on a smooth, convex manifold
  • Interpretable embedding dimensions are valuable
  • Geodesic distance preservation matters
  • Data quality is high with minimal noise
  • Working with moderate-sized datasets
  • The manifold structure is well-understood
  • Deterministic results are required

For complex real-world datasets with uncertain structure, UMAP generally provides the best balance of performance, robustness, and utility. Its speed enables exploration with different parameters, and its structural preservation supports both visualization and downstream analysis.

Conclusion

t-SNE, UMAP, and Isomap represent three distinct philosophical approaches to manifold learning, each with unique strengths and limitations. t-SNE excels at creating visually striking cluster separations for exploratory analysis but sacrifices global structure and computational efficiency. UMAP offers a modern alternative that balances local and global preservation while dramatically improving speed and flexibility. Isomap provides geodesic distance preservation and interpretable dimensions but requires strict manifold assumptions that often fail on real-world data.

Understanding these trade-offs empowers practitioners to select the appropriate tool for each analytical challenge. For most contemporary applications involving substantial datasets, UMAP provides the optimal combination of speed, preservation quality, and practical utility. However, t-SNE remains valuable for small-scale exploratory visualization, while Isomap finds niche applications where its manifold assumptions hold and interpretable dimensions prove valuable.
