Stats, Volume 8, Issue 4 (December 2025) – 22 articles

  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • PDF is the official format for papers published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and use the free Adobe Reader to open it.
52 pages, 10801 KB  
Article
Silhouette-Based Evaluation of PCA, Isomap, and t-SNE on Linear and Nonlinear Data Structures
by Mostafa Zahed and Maryam Skafyan
Stats 2025, 8(4), 105; https://doi.org/10.3390/stats8040105 - 3 Nov 2025
Abstract
Dimensionality reduction is fundamental for analyzing high-dimensional data, supporting visualization, denoising, and structure discovery. We present a systematic, large-scale benchmark of three widely used methods—Principal Component Analysis (PCA), Isometric Mapping (Isomap), and t-Distributed Stochastic Neighbor Embedding (t-SNE)—evaluated by average silhouette scores to quantify cluster preservation after embedding. Our full factorial simulation varies sample size n ∈ {100, 200, 300, 400, 500}, noise variance σ² ∈ {0.25, 0.5, 0.75, 1, 1.5, 2}, and feature count p ∈ {20, 50, 100, 200, 300, 400} under four generative regimes: (1) a linear Gaussian mixture, (2) a linear Student-t mixture with heavy tails, (3) a nonlinear Swiss-roll manifold, and (4) a nonlinear concentric-spheres manifold, each replicated 1000 times per condition. Beyond empirical comparisons, we provide mathematical results that explain the observed rankings: under standard separation and sampling assumptions, PCA maximizes silhouettes for linear, low-rank structure, whereas Isomap dominates on smooth curved manifolds; t-SNE prioritizes local neighborhoods, yielding strong local separation but less reliable global geometry. Empirically, PCA consistently achieves the highest silhouettes for linear structure (Isomap second, t-SNE third); on manifolds the ordering reverses (Isomap > t-SNE > PCA). Increasing σ² and adding uninformative dimensions (larger p) degrade all methods, while larger n improves levels and stability. To our knowledge, this is the first integrated study combining a comprehensive factorial simulation across linear and nonlinear regimes with distribution-based summaries (density and violin plots) and supporting theory that predicts method orderings. The results offer clear, practice-oriented guidance: prefer PCA when structure is approximately linear; favor manifold learning—especially Isomap—when curvature is present; and use t-SNE for the exploratory visualization of local neighborhoods. Complete tables and replication materials are provided to facilitate method selection and reproducibility. Full article
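A minimal sketch of one cell of such a benchmark, assuming scikit-learn's implementations; the cluster layout, parameter values, and hyperparameters below are illustrative stand-ins for the paper's factorial design:

```python
# Embed a 3-cluster Gaussian mixture with PCA, Isomap, and t-SNE, then
# score cluster preservation by the average silhouette width.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, TSNE
from sklearn.metrics import silhouette_score

n, p, sigma2 = 300, 50, 0.5                      # one (n, p, sigma^2) cell
X, labels = make_blobs(n_samples=n, n_features=p, centers=3,
                       cluster_std=np.sqrt(sigma2), random_state=0)

embedders = {
    "PCA": PCA(n_components=2),
    "Isomap": Isomap(n_components=2, n_neighbors=10),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=0),
}
for name, emb in embedders.items():
    Z = emb.fit_transform(X)
    print(f"{name}: silhouette = {silhouette_score(Z, labels):.3f}")
```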
21 pages, 1895 KB  
Article
Computational Testing Procedure for the Overall Lifetime Performance Index of Multi-Component Exponentially Distributed Products
by Shu-Fei Wu and Chia-Chi Hsu
Stats 2025, 8(4), 104; https://doi.org/10.3390/stats8040104 - 2 Nov 2025
Abstract
In addition to products with a single component, this study examines products composed of multiple components whose lifetimes follow a one-parameter exponential distribution. An overall lifetime performance index is developed to assess products under the progressive type I interval censoring scheme. This study establishes the relationship between the overall and individual lifetime performance indices and derives the corresponding maximum likelihood estimators along with their asymptotic distributions. Based on the asymptotic distributions, the lower confidence bounds for all indices are also established. Furthermore, a hypothesis testing procedure is formulated to evaluate whether the overall lifetime performance index achieves the specified target level, utilizing the maximum likelihood estimator as the test statistic under a progressive type I interval censored sample. Moreover, a power analysis is carried out, and two numerical examples are presented to demonstrate the practical implementation of the testing procedure for the overall lifetime performance index. This research can be applied to the fields of life testing and reliability analysis. Full article
16 pages, 1461 KB  
Article
A Nonparametric Monitoring Framework Based on Order Statistics and Multiple Scans: Advances and Applications in Ocean Engineering
by Ioannis S. Triantafyllou
Stats 2025, 8(4), 103; https://doi.org/10.3390/stats8040103 - 1 Nov 2025
Viewed by 52
Abstract
In this work, we introduce a statistical framework for monitoring the performance of a breakwater structure in reducing wave impact. The proposed methodology aims to achieve diligent tracking of the underlying process and the swift detection of any potential malfunctions. The implementation of the new framework requires the construction of appropriate nonparametric Shewhart-type control charts, which rely on order statistics and scan-type decision criteria. The variance of the run length distribution of the proposed scheme is investigated, while the corresponding mean value is determined. For illustration purposes, we consider a real-life application, which aims at evaluating the effectiveness of a breakwater structure based on wave height reduction and wave energy dissipation. Full article
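The run-length behavior described here can be approximated by simulation. The sketch below is a simplified stand-in, assuming a basic nonparametric Shewhart-type chart that signals when a test sample's median leaves the band formed by reference order statistics; the paper's scan-type decision criteria are omitted:

```python
# Monte Carlo estimate of in-control and out-of-control average run length
# (ARL) for a chart with limits at the r-th smallest and r-th largest
# order statistics of an in-control reference sample.
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 100, 5, 5                       # reference size, test size, depth

def run_length(shift=0.0):
    ref = np.sort(rng.normal(size=m))     # in-control reference sample
    lcl, ucl = ref[r - 1], ref[m - r]     # r-th smallest / r-th largest
    t = 0
    while True:
        t += 1
        med = np.median(rng.normal(loc=shift, size=n))
        if med < lcl or med > ucl:
            return t

arl0 = np.mean([run_length(0.0) for _ in range(2000)])
arl1 = np.mean([run_length(1.0) for _ in range(2000)])
print(f"in-control ARL ~ {arl0:.0f}, out-of-control ARL ~ {arl1:.1f}")
```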
21 pages, 1332 KB  
Article
The Ridge-Hurdle Negative Binomial Regression Model: A Novel Solution for Zero-Inflated Counts in the Presence of Multicollinearity
by HM Nayem and B. M. Golam Kibria
Stats 2025, 8(4), 102; https://doi.org/10.3390/stats8040102 - 1 Nov 2025
Viewed by 116
Abstract
Datasets with many zero outcomes are common in real-world studies and often exhibit overdispersion and strong correlations among predictors, creating challenges for standard count models. Traditional approaches such as the Zero-Inflated Poisson (ZIP), Zero-Inflated Negative Binomial (ZINB), and Hurdle models can handle extra zeros and overdispersion but struggle when multicollinearity is present. This study introduces the Ridge-Hurdle Negative Binomial (RHNB) model, which incorporates L2 regularization into the truncated count component of the hurdle framework to jointly address zero inflation, overdispersion, and multicollinearity. Monte Carlo simulations under varying sample sizes, predictor correlations, and levels of overdispersion and zero inflation show that RHNB consistently achieves the lowest mean squared error (MSE) compared to ZIP, ZINB, Hurdle Poisson, Hurdle Negative Binomial, Ridge ZIP, and Ridge ZINB models. Applications to the Wildlife Fish and Medical Care datasets further confirm its superior predictive performance, highlighting RHNB as a robust and efficient solution for complex count data modeling. Full article
(This article belongs to the Section Statistical Methods)
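A rough sketch of the estimator's core idea under stated assumptions (NB2 parameterization, fixed dispersion alpha and ridge constant k, simulated placeholder data); the paper's estimation and tuning details may differ:

```python
# Hurdle model: logistic part for zero vs. positive, plus a zero-truncated
# negative binomial count part whose slope coefficients carry an L2 penalty.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import nbinom
from sklearn.linear_model import LogisticRegression

def ridge_truncated_nb(X, y, alpha=1.0, k=1.0):
    """Fit the positive-count part: min -loglik + k * ||slopes||^2."""
    Xp, yp = X[y > 0], y[y > 0]
    Xd = np.column_stack([np.ones(len(yp)), Xp])      # add intercept

    def nll(beta):
        mu = np.exp(Xd @ beta)
        n_, p_ = 1.0 / alpha, 1.0 / (1.0 + alpha * mu)
        ll = nbinom.logpmf(yp, n_, p_) - np.log1p(-p_ ** n_)  # truncation
        return -ll.sum() + k * np.sum(beta[1:] ** 2)  # don't penalize intercept

    return minimize(nll, x0=np.zeros(Xd.shape[1]), method="BFGS").x

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)); X[:, 2] = X[:, 1] + 0.05 * rng.normal(size=500)
y = rng.poisson(np.exp(0.5 + 0.3 * X[:, 0])) * rng.binomial(1, 0.6, size=500)

zero_part = LogisticRegression().fit(X, (y > 0).astype(int))   # hurdle part
print("count-part coefficients:", np.round(ridge_truncated_nb(X, y), 3))
```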
15 pages, 1977 KB  
Article
Robustness of the Trinormal ROC Surface Model: Formal Assessment via Goodness-of-Fit Testing
by Christos Nakas
Stats 2025, 8(4), 101; https://doi.org/10.3390/stats8040101 - 17 Oct 2025
Viewed by 358
Abstract
Receiver operating characteristic (ROC) surfaces provide a natural extension of ROC curves to three-class diagnostic problems. A key summary index is the volume under the surface (VUS), representing the probability that a randomly chosen observation from each of the three ordered groups is correctly classified. Parametric estimation of the VUS typically assumes trinormality of the class distributions. However, a formal method for verifying this composite assumption has not appeared in the literature. Our approach generalizes the two-class AUC-based GOF test of Zou et al. to the three-class setting by exploiting the parallel structure between empirical and trinormal VUS estimators. We propose a global goodness-of-fit (GOF) test for trinormal ROC models based on the difference between empirical and trinormal parametric estimates of the VUS. To improve stability, a probit transformation is applied and a bootstrap procedure is used to estimate the variance of the difference. The resulting test provides a formal diagnostic for assessing the adequacy of trinormal ROC modeling. Simulation studies illustrate the robustness of the assumption via the empirical size and power of the test under various distributional settings, including skewed and multimodal alternatives. Application of the method to COVID-19 antibody level data demonstrates its practical utility. Our findings suggest that the proposed GOF test is simple to implement, computationally feasible for moderate sample sizes, and a useful complement to existing ROC surface methodology. Full article
(This article belongs to the Section Biostatistics)
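A sketch of the test's main ingredients (empirical VUS, trinormal VUS via one-dimensional integration, and a bootstrap variance of the probit-scale difference), using simulated placeholder data; the paper's refinements are omitted:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def vus_empirical(x, y, z):
    # P(X < Y < Z) over all triples, via empirical cdfs evaluated at each y
    return np.mean([(x < yi).mean() * (z > yi).mean() for yi in y])

def vus_trinormal(x, y, z):
    m = [v.mean() for v in (x, y, z)]; s = [v.std(ddof=1) for v in (x, y, z)]
    f = lambda t: (norm.cdf((t - m[0]) / s[0]) * norm.sf((t - m[2]) / s[2])
                   * norm.pdf(t, m[1], s[1]))
    return quad(f, -np.inf, np.inf)[0]

rng = np.random.default_rng(0)
x, y, z = rng.normal(0, 1, 60), rng.normal(1, 1, 60), rng.normal(2, 1, 60)

d_hat = norm.ppf(vus_empirical(x, y, z)) - norm.ppf(vus_trinormal(x, y, z))
boot = []
for _ in range(500):                       # bootstrap the probit difference
    xb, yb, zb = (rng.choice(v, size=v.size, replace=True) for v in (x, y, z))
    boot.append(norm.ppf(vus_empirical(xb, yb, zb))
                - norm.ppf(vus_trinormal(xb, yb, zb)))
z_stat = d_hat / np.std(boot, ddof=1)      # compare to N(0, 1)
print(f"probit-scale difference {d_hat:.3f}, z-statistic {z_stat:.2f}")
```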
16 pages, 1699 KB  
Technical Note
Synthetic Hydrograph Estimation for Ungauged Basins: Exploring the Role of Statistical Distributions
by Dan Ianculescu and Cristian Gabriel Anghel
Stats 2025, 8(4), 100; https://doi.org/10.3390/stats8040100 - 17 Oct 2025
Viewed by 625
Abstract
The use of probability distribution functions in deriving synthetic hydrographs has become a robust method for modeling the response of watersheds to precipitation events. This approach leverages statistical distributions to capture the temporal structure of runoff processes, providing a flexible framework for estimating peak discharge, time to peak, and hydrograph shape. The present study explores the application of various probability distributions in constructing synthetic hydrographs. The research evaluates parameter estimation techniques, analyzing their influence on hydrograph accuracy. The results highlight the strengths and limitations of each distribution in capturing key hydrological characteristics, offering insights into the suitability of certain probability distribution functions under varying watershed conditions. The study concludes that the approach based on the Cadariu rational function enhances the adaptability and precision of synthetic hydrograph models, thereby supporting flood forecasting and watershed management. Full article
(This article belongs to the Special Issue Robust Statistics in Action II)
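As a generic illustration of the recipe, a probability density can serve as the dimensionless shape of a synthetic hydrograph, rescaled to a design peak discharge; the gamma (Nash-type) shape and the numbers below are assumptions, not the article's fitted distributions:

```python
# Scale a gamma pdf so its peak matches a design discharge Qp at the
# time-to-peak tp; other distributions would be swapped in the same way.
import numpy as np
from scipy.stats import gamma

Qp, tp, shape = 120.0, 6.0, 3.0           # illustrative design values
scale = tp / (shape - 1)                   # gamma pdf peaks at (shape-1)*scale
t = np.linspace(0.0, 48.0, 481)            # hours
q = gamma.pdf(t, a=shape, scale=scale)
Q = Qp * q / q.max()                       # rescale: peak equals Qp at t = tp
print(f"peak {Q.max():.1f} m^3/s at t = {t[np.argmax(Q)]:.1f} h")
```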
21 pages, 425 KB  
Article
Model-Free Feature Screening Based on Data Aggregation for Ultra-High-Dimensional Longitudinal Data
by Junfeng Chen, Xiaoguang Yang, Jing Dai and Yunming Li
Stats 2025, 8(4), 99; https://doi.org/10.3390/stats8040099 - 16 Oct 2025
Viewed by 278
Abstract
Feature screening procedures for ultra-high-dimensional longitudinal data are widely studied, but most require model assumptions, and their screening performance may deteriorate if the model is misspecified. To resolve this problem, a new model-free method is introduced in which feature screening is performed by sample splitting and data aggregation. Distance correlation is used to measure the association at each time point separately, while longitudinal correlation is modeled by a specific cumulative distribution function to achieve efficiency. In addition, we extend this new method to handle situations where the predictors are correlated. Both methods possess excellent asymptotic properties and are capable of handling longitudinal data with unequal numbers of repeated measurements and unequal intervals between repeated measurement time points. Compared to other model-free methods, the two new methods are relatively insensitive to within-subject correlation, and they can help reduce the computational burden when applied to longitudinal data. Finally, we use simulated and empirical examples to show that both new methods have better screening performance. Full article
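A minimal sketch of the core screening step, computing distance correlation per time point and averaging across points; the paper's sample-splitting and cumulative-distribution weighting are omitted, and the data are simulated:

```python
import numpy as np

def dcor(x, y):
    """Sample distance correlation of two 1-d arrays (double centering)."""
    def centered(v):
        d = np.abs(v[:, None] - v[None, :])
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A, B = centered(x), centered(y)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
n, p, T = 100, 200, 4                       # subjects, predictors, time points
X = rng.normal(size=(n, p))
Y = np.stack([X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(size=n)
              for _ in range(T)], axis=1)   # only predictors 0, 1 are active

scores = np.array([np.mean([dcor(X[:, j], Y[:, t]) for t in range(T)])
                   for j in range(p)])
print("top-ranked predictors:", np.argsort(scores)[::-1][:10])  # expect 0, 1
```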
25 pages, 514 KB  
Article
Expansions for the Conditional Density and Distribution of a Standard Estimate
by Christopher S. Withers
Stats 2025, 8(4), 98; https://doi.org/10.3390/stats8040098 - 14 Oct 2025
Viewed by 190
Abstract
Conditioning is a very useful way of using correlated information to reduce the variability of an estimate. Conditioning an estimate on a correlated estimate reduces its covariance, and so provides more precise inference than using an unconditioned estimate. Here we give expansions in powers of n^(−1/2) for the conditional density and distribution of any multivariate standard estimate based on a sample of size n. Standard estimates include most estimates of interest, including smooth functions of sample means and other empirical estimates. We also show that a conditional estimate is not a standard estimate, so that Edgeworth-Cornish-Fisher expansions cannot be applied directly. Full article
15 pages, 301 KB  
Article
Goodness-of-Fit Tests via Entropy-Based Density Estimation Techniques
by Luai Al-Labadi, Ruodie Yu and Kairui Bao
Stats 2025, 8(4), 97; https://doi.org/10.3390/stats8040097 - 14 Oct 2025
Viewed by 241
Abstract
Goodness-of-fit testing remains a fundamental problem in statistical inference with broad practical importance. In this paper, we introduce two new goodness-of-fit tests grounded in entropy-based density estimation techniques. The first is a boundary-corrected empirical likelihood ratio test, which refines the classic approach by addressing bias near the support boundaries, though, in practice, it yields results very similar to the uncorrected version. The second is a novel test built on Correa’s local linear entropy estimator, leveraging quantile regression to improve density estimation accuracy. We establish the theoretical properties of both test statistics and demonstrate their practical effectiveness through extensive simulation studies and real-data applications. The results show that the proposed methods deliver strong power and flexibility in assessing model adequacy in a wide range of settings. Full article
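For orientation, the classical spacing-based entropy test that both proposals refine can be sketched as follows, using Vasicek's estimator and a Monte Carlo critical value; the boundary-corrected and Correa-type estimators of the paper would replace the H computed below:

```python
# Entropy test of normality: since the normal maximizes entropy for a
# given variance, reject H0 when exp(H)/s falls below a simulated
# critical value, with H the Vasicek spacing estimator.
import numpy as np

def vasicek_entropy(x, m=None):
    x = np.sort(x); n = len(x)
    m = m or max(1, int(round(np.sqrt(n))))
    upper = x[np.minimum(np.arange(n) + m, n - 1)]   # X_(i+m), clipped
    lower = x[np.maximum(np.arange(n) - m, 0)]       # X_(i-m), clipped
    return np.mean(np.log(n / (2.0 * m) * (upper - lower)))

def normality_stat(x):
    return np.exp(vasicek_entropy(x)) / np.std(x, ddof=1)

rng = np.random.default_rng(0)
n, B = 50, 5000
null = np.sort([normality_stat(rng.normal(size=n)) for _ in range(B)])
crit = null[int(0.05 * B)]                           # 5% critical value

sample = rng.exponential(size=n)                     # a skewed alternative
print(f"statistic {normality_stat(sample):.3f}, "
      f"reject normality: {normality_stat(sample) < crit}")
```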
14 pages, 426 KB  
Article
Robust Parameter Designs Constructed from Hadamard Matrices
by Yingfu Li and Kalanka P. Jayalath
Stats 2025, 8(4), 96; https://doi.org/10.3390/stats8040096 - 11 Oct 2025
Viewed by 300
Abstract
The primary objective of robust parameter design (RPD) is to determine the optimal settings of control factors in a system to minimize response variance while achieving a desirable mean response. This article investigates fractional factorial designs constructed from Hadamard matrices of orders 12, 16, and 20 to meet RPD requirements with minimal runs. For various combinations of control and noise factors, rather than recommending a single “best” design, up to ten good candidate designs are identified. All listed designs permit the estimation of all control-by-noise interactions and the main effects of both control and noise factors. Additionally, some nonregular RPDs allow for the estimation of one or two control-by-control interactions, which may be critical for achieving an optimal mean response. These results provide practical options for efficient, resource-constrained experiments with economical run sizes. Full article
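The estimability property can be checked numerically: build the model matrix (intercept, main effects, all control-by-noise interactions) from chosen Hadamard columns and verify full column rank. The sketch below uses order 16, since scipy only constructs power-of-two orders, and an illustrative column assignment:

```python
import numpy as np
from scipy.linalg import hadamard

H = hadamard(16)                          # +/-1 matrix; column 0 is constant
control_cols, noise_cols = [1, 2, 4], [8, 15]         # illustrative choice
C, N = H[:, control_cols], H[:, noise_cols]

terms = [np.ones(16)]
terms += [C[:, i] for i in range(C.shape[1])]         # control main effects
terms += [N[:, j] for j in range(N.shape[1])]         # noise main effects
terms += [C[:, i] * N[:, j] for i in range(C.shape[1])
          for j in range(N.shape[1])]                 # control-by-noise
M = np.column_stack(terms)
print(f"{M.shape[1]} model terms, rank {np.linalg.matrix_rank(M)}")
# full column rank means all listed effects are jointly estimable
```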
11 pages, 272 KB  
Article
Bayesian Bell Regression Model for Fitting of Overdispersed Count Data with Application
by Ameer Musa Imran Alhseeni and Hossein Bevrani
Stats 2025, 8(4), 95; https://doi.org/10.3390/stats8040095 - 10 Oct 2025
Viewed by 320
Abstract
The Bell regression model (BRM) is a statistical model that is often used in the analysis of count data that exhibit overdispersion. In this study, we propose a Bayesian analysis of the BRM and offer a new perspective on its application. Specifically, we introduce a G-prior distribution for Bayesian inference in the BRM, in addition to a flat-normal prior distribution. To compare the performance of the proposed prior distributions, we conduct a simulation study and demonstrate that the G-prior distribution provides superior estimation results for the BRM. Furthermore, we apply the methodology to real data and compare the BRM to the Poisson and negative binomial regression models using various model selection criteria. Our results provide valuable insights into the use of Bayesian methods for estimation and inference in the BRM and highlight the importance of considering the choice of prior distribution in the analysis of count data. Full article
(This article belongs to the Section Computational Statistics)
15 pages, 721 KB  
Article
Rank-Based Control Charts Under Non-Overlapping Counting with Practical Applications in Logistics and Services
by Ioannis S. Triantafyllou
Stats 2025, 8(4), 94; https://doi.org/10.3390/stats8040094 - 9 Oct 2025
Viewed by 255
Abstract
In this article, we establish a constructive nonparametric scheme for monitoring the quality of services provided by a transportation company. The proposed methodology aims at achieving the diligent tracking of the underlying process and the swift detection of any potential malfunctions. The implementation of the new framework requires the construction of appropriate schemes, which follow the set-up of a Shewhart chart and are connected to ranks and multiple run decision criteria. The dispersion and the mean value of the run length distribution for the suggested distribution-free scheme are investigated for the special case k = 2. For illustration purposes, a real-data logistics environment is discussed, and the proposed approach is applied to improve the quality of the provided services. Full article
19 pages, 339 KB  
Article
Improper Priors via Expectation Measures
by Peter Harremoës
Stats 2025, 8(4), 93; https://doi.org/10.3390/stats8040093 - 9 Oct 2025
Viewed by 313
Abstract
In Bayesian statistics, the prior distributions play a key role in the inference, and there are procedures for finding prior distributions. An important problem is that these procedures often lead to improper prior distributions that cannot be normalized to probability measures. Such improper prior distributions lead to technical problems, in that certain calculations are only fully justified in the literature for probability measures or perhaps for finite measures. Recently, expectation measures were introduced as an alternative to probability measures as a foundation for a theory of uncertainty. Using expectation theory and point processes, it is possible to give a probabilistic interpretation of an improper prior distribution. This provides a rigorous formalism for calculating posterior distributions in cases where the prior distributions are not proper, without relying on approximation arguments. Full article
(This article belongs to the Section Bayesian Methods)
9 pages, 590 KB  
Article
Predictions of War Duration
by Glenn McRae
Stats 2025, 8(4), 92; https://doi.org/10.3390/stats8040092 - 9 Oct 2025
Viewed by 622
Abstract
The durations of wars fought between 1480 and 1941 A.D. were found to be well represented by random numbers chosen from a single-event Poisson distribution with a half-life of (1.25 ± 0.1) years. This result complements the work of L.F. Richardson who found that the frequency of outbreaks of wars can be described as a Poisson process. This result suggests that a quick return on investment requires a distillation of the many stressors of the day, each one of which has a small probability of being included in a convincing well-orchestrated simple call-to-arms. The half-life is a measure of how this call wanes with time. Full article
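The estimation step is elementary: for exponential durations (a single-event Poisson process in time), the rate MLE is the reciprocal mean and the half-life is ln(2)/rate. A sketch with synthetic placeholder durations, not the historical data:

```python
import numpy as np

rng = np.random.default_rng(0)
half_life_true = 1.25
durations = rng.exponential(scale=half_life_true / np.log(2), size=200)

rate_hat = 1.0 / durations.mean()                  # exponential MLE
half_life = np.log(2) / rate_hat
se = half_life / np.sqrt(len(durations))           # delta-method std. error
print(f"half-life = {half_life:.2f} +/- {se:.2f} years")
```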
10 pages, 697 KB  
Article
Benford Behavior in Stick Fragmentation Problems
by Bruce Fang, Ava Irons, Ella Lippelman and Steven J. Miller
Stats 2025, 8(4), 91; https://doi.org/10.3390/stats8040091 - 8 Oct 2025
Viewed by 925
Abstract
Benford’s law states that in many real-world datasets, the probability that the leading digit is d equals log₁₀((d+1)/d) for all 1 ≤ d ≤ 9. We call this weak Benford behavior. A dataset is said to follow strong Benford behavior if the probability that its significand (i.e., the significant digits in scientific notation) is at most s equals log₁₀(s) for all s ∈ [1, 10). We investigate Benford behavior in a multi-proportion stick fragmentation model, where a stick is split into m substicks according to fixed proportions at each stage. This generalizes previous work on the single proportion stick fragmentation model, where each stick is split into two substicks using one fixed proportion. We provide a necessary and sufficient condition under which the lengths of the stick fragments converge to strong Benford behavior in the multi-proportion model. Full article
(This article belongs to the Special Issue Benford's Law(s) and Applications (Second Edition))
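The multi-proportion model is easy to simulate; the sketch below splits every stick by fixed (illustrative) proportions for several stages and compares the significand distribution of fragment lengths with log₁₀(s):

```python
import numpy as np

props = np.array([0.17, 0.31, 0.52])       # fixed split proportions (sum to 1)
sticks = np.array([1.0])
for _ in range(12):                        # 3**12 fragments after 12 stages
    sticks = (sticks[:, None] * props[None, :]).ravel()

significands = sticks / 10.0 ** np.floor(np.log10(sticks))   # values in [1, 10)
for s in (2, 3, 5):                        # empirical vs strong Benford law
    emp = np.mean(significands <= s)
    print(f"P(significand <= {s}): empirical {emp:.3f}, Benford {np.log10(s):.3f}")
```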
12 pages, 683 KB  
Review
The Use of Double Poisson Regression for Count Data in Health and Life Science—A Narrative Review
by Sebastian Appelbaum, Julia Stronski, Uwe Konerding and Thomas Ostermann
Stats 2025, 8(4), 90; https://doi.org/10.3390/stats8040090 - 1 Oct 2025
Viewed by 716
Abstract
Count data are present in many areas of everyday life. Unfortunately, such data are often characterized by over- and under-dispersion. In 1986, Efron introduced the Double Poisson distribution to account for this problem. The aim of this work is to examine the application of this distribution in regression analyses performed in the health-related literature by means of a narrative review. The databases Science Direct, PBSC, PubMed, PsycInfo, PsycArticles, CINAHL, and Google Scholar were searched for applications. Two independent reviewers extracted data on Double Poisson regression models and their applications in the health and life sciences. From a total of 1644 hits, 84 articles were pre-selected, and after full-text screening, 13 articles remained. All these articles were published after 2011, and most of them targeted epidemiological research. Both over- and under-dispersion were present, and most of the papers used the generalized additive models for location, scale, and shape (GAMLSS) framework. In summary, this narrative review shows that the first steps in applying Efron’s idea of double exponential families to empirical count data have already been successfully taken in a variety of fields in the health and life sciences. Approaches to ease their application in clinical research should be encouraged. Full article
22 pages, 1227 KB  
Article
Theoretically Based Dynamic Regression (TDR)—A New and Novel Regression Framework for Modeling Dynamic Behavior
by Derrick K. Rollins, Marit Nilsen-Hamilton, Kendra Kreienbrink, Spencer Wolfe, Dillon Hurd and Jacob Oyler
Stats 2025, 8(4), 89; https://doi.org/10.3390/stats8040089 - 28 Sep 2025
Viewed by 431
Abstract
The theoretical modeling of a dynamic system will have derivatives of the response (y) with respect to time (t). Two common physical attributes (i.e., parameters) of dynamic systems are dead-time (θ) and lag (τ). Theoretical dynamic modeling will contain physically interpretable parameters such as τ and θ with physical constraints. In addition, the number of unknown model-based parameters can be considerably smaller than in empirically based (i.e., lagged-based) approaches. This work proposes a Theoretically based Dynamic Regression (TDR) modeling approach that overcomes critical lagged-based modeling limitations, as demonstrated on three large, multiple-input, highly dynamic, real data sets. Dynamic Regression (DR) is a lagged-based, empirical dynamic modeling approach that appears in the statistics literature. However, like all empirical approaches, its model structures do not contain first-principle interpretable parameters. Additionally, several time lags are typically needed for the output, y, and input, x, to capture significant dynamic behavior. TDR uses a simplistic theoretically based dynamic modeling approach to transform xₜ into its dynamic counterpart, vₜ, and then applies the methods and tools of static regression to vₜ. TDR is demonstrated on the following three modeling problems involving freely existing (i.e., not experimentally designed) real data sets: 1. the weight variation in a person (y) with four measured nutrient inputs (xᵢ); 2. the variation in the tray temperature (y) of a distillation column with nine inputs and eight test data sets over a three-year period; and 3. eleven extremely large, highly dynamic, subject-specific models of sensor glucose (y) with 12 inputs (xᵢ). Full article
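A sketch of the TDR idea under assumed first-order-plus-dead-time dynamics: each input is delayed by θ steps, passed through an exponential lag filter with time constant τ, and the resulting vₜ enters an ordinary static regression; the paper's model structures and its estimation of (τ, θ) may differ:

```python
import numpy as np

def tdr_transform(x, tau, d, dt=1.0):
    """First-order lag (time constant tau) applied to x delayed by d steps."""
    a = np.exp(-dt / tau)
    x_delayed = np.concatenate([np.full(d, x[0]), x[:-d]]) if d else x
    v = np.empty_like(x_delayed)
    v[0] = x_delayed[0]
    for t in range(1, len(v)):
        v[t] = a * v[t - 1] + (1.0 - a) * x_delayed[t]
    return v

rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T).cumsum() * 0.1                  # slowly varying input
y = 2.0 + 1.5 * tdr_transform(x, tau=8.0, d=3) + 0.1 * rng.normal(size=T)

v = tdr_transform(x, tau=8.0, d=3)                     # assumed (tau, theta)
beta = np.polyfit(v, y, deg=1)                         # static regression on v
print(f"slope {beta[0]:.2f}, intercept {beta[1]:.2f}")  # recovers ~1.5, ~2.0
```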
2 pages, 162 KB  
Correction
Correction: Chen et al. Scoring Individual Moral Inclination for the CNI Test. Stats 2024, 7, 894–905
by Yi Chen, Benjamin Lugu, Wenchao Ma and Hyemin Han
Stats 2025, 8(4), 88; https://doi.org/10.3390/stats8040088 - 28 Sep 2025
Viewed by 225
Abstract
Error in Table [...] Full article
14 pages, 434 KB  
Article
Energy Statistic-Based Goodness-of-Fit Test for the Lindley Distribution with Application to Lifetime Data
by Joseph Njuki and Ryan Avallone
Stats 2025, 8(4), 87; https://doi.org/10.3390/stats8040087 - 26 Sep 2025
Viewed by 547
Abstract
In this article, we propose a goodness-of-fit test for the one-parameter Lindley distribution based on energy statistics. The Lindley distribution has been widely used in reliability studies and survival analysis, especially in the applied sciences. The proposed test procedure is simple and powerful against general alternatives. Under different settings, Monte Carlo simulations show that the proposed test maintains any given nominal level. In terms of power, the proposed test outperforms other existing similar methods in different settings. We then apply the proposed test to real-life datasets to demonstrate its competitiveness and usefulness. Full article
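A sketch of the idea under stated assumptions: the fitted Lindley law (closed-form MLE) is represented by a large Monte Carlo sample, the two-sample energy statistic measures the discrepancy, and a parametric bootstrap supplies the p-value; the paper works with the one-sample energy statistic directly, so this is only an approximation of the same construction:

```python
import numpy as np

def lindley_mle(x):
    xb = x.mean()     # closed-form MLE (Ghitany et al. 2008)
    return (-(xb - 1.0) + np.sqrt((xb - 1.0) ** 2 + 8.0 * xb)) / (2.0 * xb)

def lindley_sample(theta, size, rng):
    expo = rng.binomial(1, theta / (theta + 1.0), size=size)   # mixture flag
    return np.where(expo, rng.exponential(1.0 / theta, size),
                    rng.gamma(2.0, 1.0 / theta, size))

def energy_stat(x, y):
    dxy = np.abs(x[:, None] - y[None, :]).mean()
    dxx = np.abs(x[:, None] - x[None, :]).mean()
    dyy = np.abs(y[:, None] - y[None, :]).mean()
    return 2 * dxy - dxx - dyy

rng = np.random.default_rng(0)
x = rng.weibull(2.0, size=100)                     # data from an alternative
theta_hat, m = lindley_mle(x), 500
t_obs = energy_stat(x, lindley_sample(theta_hat, m, rng))

boot = []
for _ in range(200):                               # parametric bootstrap
    xb = lindley_sample(theta_hat, x.size, rng)
    boot.append(energy_stat(xb, lindley_sample(lindley_mle(xb), m, rng)))
print(f"p-value ~ {np.mean(np.array(boot) >= t_obs):.3f}")
```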
32 pages, 1136 KB  
Article
Enhancing Diversity and Improving Prediction Performance of Subsampling-Based Ensemble Methods
by Maria Ordal and Qing Wang
Stats 2025, 8(4), 86; https://doi.org/10.3390/stats8040086 - 26 Sep 2025
Viewed by 345
Abstract
This paper investigates how diversity among training samples impacts the predictive performance of a subsampling-based ensemble. It is well known that diverse training samples improve ensemble predictions, and smaller subsampling rates naturally lead to enhanced diversity. However, this approach of achieving a higher degree of diversity often comes with the cost of a reduced training sample size, which is undesirable. This paper introduces two novel subsampling strategies—partition and shift subsampling—as alternative schemes designed to improve diversity without sacrificing the training sample size in subsampling-based ensemble methods. From a probabilistic perspective, we investigate their impact on subsample diversity when utilized with tree-based sub-ensemble learners in comparison to the benchmark random subsampling. Through extensive simulations and eight real-world examples in both regression and classification contexts, we found a significant improvement in the predictive performance of the developed methods. Notably, this gain is particularly pronounced on challenging datasets or when higher subsampling rates are employed. Full article
(This article belongs to the Section Applied Statistics and Machine Learning Methods)
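A sketch contrasting partition subsampling with the random-subsampling baseline at the same rate, using trees on a standard synthetic regression problem; the shift variant and the paper's tuning are omitted:

```python
# Both ensembles train one tree per subsample at rate 1/k; the partition
# scheme draws disjoint subsamples so every training point is used exactly
# once per pass, which increases subsample diversity.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def ensemble_mse(index_sets, Xtr, ytr, Xte, yte):
    preds = [DecisionTreeRegressor(random_state=0).fit(Xtr[idx], ytr[idx])
             .predict(Xte) for idx in index_sets]
    return np.mean((np.mean(preds, axis=0) - yte) ** 2)

rng = np.random.default_rng(0)
X, y = make_friedman1(n_samples=600, noise=1.0, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.33, random_state=0)

n, k, B = len(ytr), 4, 40                        # rate 1/k, B trees
random_sets = [rng.choice(n, size=n // k, replace=False) for _ in range(B)]
partition_sets = []
for _ in range(B // k):                          # each pass: k disjoint folds
    partition_sets += np.array_split(rng.permutation(n), k)

print(f"random subsampling MSE:    {ensemble_mse(random_sets, Xtr, ytr, Xte, yte):.3f}")
print(f"partition subsampling MSE: {ensemble_mse(partition_sets, Xtr, ytr, Xte, yte):.3f}")
```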
13 pages, 357 KB  
Review
An Overview of Economics and Econometrics Related R Packages
by Despina Michelaki, Michail Tsagris and Christos Adam
Stats 2025, 8(4), 85; https://doi.org/10.3390/stats8040085 - 26 Sep 2025
Viewed by 1031
Abstract
This study provides a systematic overview of 207 econometrics-related R packages identified through CRAN and the Econometrics Task View. Using descriptive and inferential statistics and text mining to compute the word frequency and association among words (n-grams and correlations), we evaluate the development patterns, documentation practices, publication outcomes, and methodological scope. The findings reveal that most packages are created by small-to-mid-sized teams in Europe and North America, with mid-sized collaborations and packages including vignettes being significantly more likely to achieve journal publication. While reverse dependencies indicate strong ecosystem integration, they do not predict publication, and Bayesian or dataset-only packages remain underrepresented. Growth has accelerated since 2010, but newer packages exhibit fewer updates, raising concerns about sustainability. These findings highlight both the central role of R in contemporary econometrics and the need for broader participation, methodological diversity, and long-term maintenance. Full article
19 pages, 1013 KB  
Article
A Simulation-Based Comparative Analysis of Two-Parameter Robust Ridge M-Estimators for Linear Regression Models
by Bushra Haider, Syed Muhammad Asim, Danish Wasim and B. M. Golam Kibria
Stats 2025, 8(4), 84; https://doi.org/10.3390/stats8040084 - 24 Sep 2025
Viewed by 545
Abstract
Traditional regression estimators like Ordinary Least Squares (OLS) and classical ridge regression often fail under multicollinearity and outlier contamination, respectively. Although recently developed two-parameter ridge regression (TPRR) estimators improve efficiency by introducing dual shrinkage parameters, they remain sensitive to extreme observations. This study develops a new class of Two-Parameter Robust Ridge M-Estimators (TPRRM) that integrate dual shrinkage with robust M-estimation to simultaneously address multicollinearity and outliers. A Monte Carlo simulation study, conducted under varying sample sizes, predictor dimensions, correlation levels, and contamination structures, compares the proposed estimators with OLS, ridge, and the most recent TPRR estimators. The results demonstrate that TPRRM consistently achieves the lowest Mean Squared Error (MSE), particularly in heavy-tailed and outlier-prone scenarios. Application to the Tobacco and Gasoline Consumption datasets further validates the superiority of the proposed methods in real-world conditions. The findings confirm that the proposed TPRRM fills a critical methodological gap by offering estimators that are not only efficient under multicollinearity but also robust against departures from normality. Full article
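The family's core can be sketched as iteratively reweighted least squares combining Huber weights (robustness to outliers) with a ridge penalty (stability under multicollinearity); a single shrinkage constant k stands in for the paper's two parameters, and the data are simulated:

```python
import numpy as np

def huber_ridge(X, y, k=1.0, c=1.345, n_iter=50):
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)   # ridge start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12              # robust scale
        u = r / s
        w = np.where(np.abs(u) <= c, 1.0, c / np.abs(u))       # Huber weights
        XW = X * w[:, None]
        beta = np.linalg.solve(X.T @ XW + k * np.eye(p), XW.T @ y)
    return beta

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=n)])      # collinear
y = X @ np.array([2.0, 1.0]) + rng.normal(size=n)
y[:10] += 15.0                                                 # outliers
print("huber-ridge estimate:", np.round(huber_ridge(X, y), 2))
```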