Galaxy Evolution with Manifold Learning
Abstract
1. Introduction
1.1. Galaxy Evolution in the Era of Large Galaxy Surveys
1.2. Galaxy Manifold in Multi-Wavelength Luminosity Space
2. Data
3. Methods: Quantification of the Galaxy Manifold via Manifold Learning
3.1. Galaxy Manifold in Multi-Wavelength Luminosity Space Revisited
3.2. Manifold Learning
3.3. Isomap and UMAP Algorithms
- Isomap: metric-preserving and density-preserving;
- UMAP: topology-preserving and noise-robust.
3.3.1. Isomap
- Convexity: The image of the manifold $\mathcal{M}$ under the embedding map is a geodesically convex subset of $\mathbb{R}^d$.
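- Isometry: The geodesic distance is preserved under the map $f: \mathcal{M} \to \mathbb{R}^d$. For any two points $x_i, x_j$ on the manifold $\mathcal{M}$, the geodesic distance between them is equal to the Euclidean distance between the corresponding embedded points $y_i = f(x_i)$ and $y_j = f(x_j)$ in $\mathbb{R}^d$, i.e.,
  $$d_{\mathcal{M}}(x_i, x_j) = \| y_i - y_j \| .$$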
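The two conditions can be illustrated with a toy case. The following minimal sketch assumes a circular arc of radius `radius` as a one-dimensional manifold in the plane; the function names and the sample angles are hypothetical and chosen only for illustration. The arc-length coordinate $y = R\theta$ fills an interval (a convex subset of $\mathbb{R}^1$) and reproduces the geodesic distances exactly, whereas the straight-line (chord) distance in the ambient plane is always shorter.

```python
import math

def arc_points(radius, angles):
    """Points on a circular arc: a 1-D manifold embedded in the 2-D plane."""
    return [(radius * math.cos(t), radius * math.sin(t)) for t in angles]

def geodesic_on_arc(radius, t_i, t_j):
    """Arc length between two angles, i.e., the geodesic distance along the arc."""
    return radius * abs(t_i - t_j)

radius = 2.0
angles = [0.0, 0.5, 1.2, 2.0]             # an open arc, not a closed circle
points = arc_points(radius, angles)
embedded = [radius * t for t in angles]   # 1-D coordinates y_i = R * theta_i

rows = []  # (geodesic, embedded Euclidean, ambient chord) for every pair
for i in range(len(angles)):
    for j in range(i + 1, len(angles)):
        d_geo = geodesic_on_arc(radius, angles[i], angles[j])
        d_emb = abs(embedded[i] - embedded[j])
        d_chord = math.dist(points[i], points[j])
        rows.append((d_geo, d_emb, d_chord))
        print(f"({i},{j}) geodesic={d_geo:.3f} embedded={d_emb:.3f} chord={d_chord:.3f}")
```

The gap between the chord and the geodesic is what the graph-distance step of Isomap is meant to close.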
1. Nearest-neighbor search. Choose an integer $K$ or a radius $\epsilon$. Compute the distances between all pairs of data points in the feature space $\mathbb{R}^D$:
   $$d_X(i,j) = \| x_i - x_j \| .$$
   As the distance measure, the Euclidean distance is typically used. The neighboring points on $\mathcal{M}$ are defined by connecting each point to its $K$ nearest neighbors or to all points within a ball of radius $\epsilon$. The performance of Isomap depends on the choice of $K$ or $\epsilon$. For an efficient neighborhood search, the scikit-learn implementation of Isomap uses sklearn.neighbors.BallTree.
2. Computation of graph distances. For the input data points $\{x_i\}_{i=1}^{N}$, construct a weighted neighborhood graph $G = (V, E)$. The vertex set $V$ consists of the data points, and the edge set $E$ consists of edges that represent neighborhood relations between data points. Each edge is assigned a weight corresponding to the distance $d_X(i,j)$ between the two points. If two points $x_i$ and $x_j$ are not directly connected by an edge, the weight is set to $\infty$.
   The geodesic distances $d_{\mathcal{M}}(i,j)$ on $\mathcal{M}$ between pairs of points are then estimated by the graph distances $d_G(i,j)$ on $G$. The graph distance is defined as the length of the shortest path between the two vertices on the graph $G$. Two points that are not neighbors are connected via the shortest path that links nearest neighbors, and the path length is given by the sum of the corresponding weights. This length provides an approximation to the geodesic distance between the two distant points.
   If the data points are sampled from a probability distribution defined on the manifold $\mathcal{M}$, then, for a flat manifold, the graph distance converges to the geodesic distance as $N \to \infty$ [27]. Efficient algorithms for computing the shortest paths include the Floyd–Warshall algorithm [28,29] and Dijkstra’s algorithm [30]. The former is known to be effective when the graph is dense, whereas the latter is effective when the graph is sparse (e.g., [15]).
3. Spectral embedding via MDS. Consider the distance matrix $D_G = \{ d_G(i,j) \}$, an $N \times N$ symmetric matrix. Applying classical MDS to $D_G$, we reconstruct a $d$-dimensional space such that the geodesic distances between data points on the manifold are preserved as faithfully as possible. Let $S$ be the $N \times N$ symmetric matrix whose entries are the squared graph distances, $S_{ij} = d_G(i,j)^2$. This matrix is double-centered as
   $$\tau(D_G) = -\frac{1}{2} \left( I - \frac{1}{N} \mathbf{1} \right) S \left( I - \frac{1}{N} \mathbf{1} \right) .$$
   Here, $I$ denotes the $N \times N$ identity matrix, and $\mathbf{1}$ denotes the $N \times N$ symmetric matrix with all entries equal to 1.
4. Choose the embedding vectors $y_i \in \mathbb{R}^d$ so as to minimize the cost function $E$. Here,
   $$E = \| \tau(D_G) - \tau(D_Y) \|_{L^2} ,$$
   with $D_Y = \{ d_Y(i,j) \}$ and $d_Y(i,j) = \| y_i - y_j \|$ being the Euclidean distance between $y_i$ and $y_j$. If we perform an eigendecomposition of $\tau(D_G)$ using the eigenvalue matrix $\Lambda$ and the eigenvector matrix $U$, we obtain
   $$\tau(D_G) = U \Lambda U^{\top} .$$
   The optimal solution is given by the eigenvectors corresponding to the $d$ largest eigenvalues of $\tau(D_G)$.
5. The graph is embedded into the $d$-dimensional subspace by the matrix:
   $$Y = U_d \Lambda_d^{1/2} ,$$
   where $\Lambda_d$ is the diagonal matrix of the $d$ largest eigenvalues, $U_d$ is the matrix of the corresponding eigenvectors, and the $i$-th row of $Y$ is the embedded point $y_i$.
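The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used for the results in this paper: the function name `isomap`, its parameters, and the test data (points on three quarters of a unit circle) are hypothetical, the neighbor search is brute force rather than a ball tree, and the shortest paths use a vectorized Floyd–Warshall update, which is only practical for small $N$.

```python
import numpy as np

def isomap(X, n_neighbors=8, n_components=2):
    """Minimal Isomap sketch: K-NN graph -> graph distances -> classical MDS."""
    N = X.shape[0]
    # Step 1: pairwise Euclidean distances d_X(i, j) (brute force).
    diff = X[:, None, :] - X[None, :, :]
    d_X = np.sqrt((diff ** 2).sum(axis=-1))

    # Step 2: weighted K-NN graph; non-neighbors get weight infinity.
    W = np.full((N, N), np.inf)
    np.fill_diagonal(W, 0.0)
    # Column 0 of the sorted indices is the point itself (assumes no duplicates).
    nn = np.argsort(d_X, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(N), n_neighbors)
    W[rows, nn.ravel()] = d_X[rows, nn.ravel()]
    W = np.minimum(W, W.T)  # make the graph undirected

    # Floyd-Warshall shortest paths, O(N^3): allow paths through vertex k.
    D_G = W.copy()
    for k in range(N):
        D_G = np.minimum(D_G, D_G[:, k:k + 1] + D_G[k:k + 1, :])
    if np.isinf(D_G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")

    # Step 3: double-center the squared graph distances.
    H = np.eye(N) - np.ones((N, N)) / N
    tau = -0.5 * H @ (D_G ** 2) @ H

    # Steps 4-5: the d largest eigenpairs give Y = U_d Lambda_d^{1/2}.
    eigvals, eigvecs = np.linalg.eigh(tau)          # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    lam = np.clip(eigvals[idx], 0.0, None)          # guard against tiny negative values
    return eigvecs[:, idx] * np.sqrt(lam), D_G

# Toy data: 100 points on three quarters of the unit circle (arc length = angle t).
t = np.linspace(0.0, 1.5 * np.pi, 100)
X = np.column_stack([np.cos(t), np.sin(t)])
Y, D_G = isomap(X, n_neighbors=4, n_components=1)
corr = abs(np.corrcoef(Y[:, 0], t)[0, 1])
print(f"end-to-end graph distance: {D_G[0, -1]:.3f}, arc length: {t[-1]:.3f}")
print(f"|correlation| between Isomap coordinate and arc length: {corr:.4f}")
```

For larger samples, a sparse graph with Dijkstra’s algorithm would be the more natural choice, in line with the remark in step 2.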
3.3.2. UMAP
- Estimation of the Riemannian manifold;
- Representation of the distance space using fuzzy topology;
- Dimensionality reduction.
4. Results and Discussion
4.1. Results: Galaxy Manifolds Derived with Isomap and UMAP
4.2. Galaxy Manifold and Observables
4.3. From Quantification to Formulation
5. Conclusions and Outlook
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Basics of Cosmology
Appendix A.1. The Friedmann–Lemaître–Robertson–Walker Metric and the Scale Factor
Appendix A.2. Cosmological Redshift
Appendix A.3. Friedmann Equations and Cosmological Parameters
- Hubble parameter: Although this quantity is time dependent, observationally its present value is often used. The dimensionless parameter is also frequently employed.
- Density parameter: Here, is the critical density. In the present Universe, it is measured to be .
- Dimensionless cosmological-constant parameter:
- Curvature parameter:
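For reference, one common textbook convention for the four parameters listed above is sketched below. The symbols are assumptions introduced here for illustration: $a(t)$ is the scale factor, $\rho$ the total matter density, $G$ the gravitational constant, $\Lambda$ the cosmological constant, and $K$ the curvature constant. The factors of $c$ and the sign of the curvature term differ between textbooks, so this may not match the notation adopted in this appendix.

```latex
\begin{align}
H(t) &\equiv \frac{\dot{a}}{a}, &
h &\equiv \frac{H_0}{100~\mathrm{km\,s^{-1}\,Mpc^{-1}}}, \\
\Omega_{\rm M} &\equiv \frac{\rho}{\rho_{\rm c}}, &
\rho_{\rm c} &\equiv \frac{3H^2}{8\pi G}, \\
\Omega_{\Lambda} &\equiv \frac{\Lambda c^2}{3H^2}, &
\Omega_{K} &\equiv -\frac{K c^2}{a^2 H^2}.
\end{align}
```

With these definitions, dividing the first Friedmann equation, $H^2 = 8\pi G\rho/3 - Kc^2/a^2 + \Lambda c^2/3$, by $H^2$ gives the sum rule $\Omega_{\rm M} + \Omega_{\Lambda} + \Omega_{K} = 1$.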
Appendix B. Magnitude
Appendix C. Galaxy Star Formation Histories and Optical Spectra
Appendix D. Multidimensional Scaling
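- We first introduce the kernel (Gram) matrix defined by
  $$K = X X^{\top}, \qquad K_{ij} = x_i \cdot x_j ,$$
  where the rows of the coordinate matrix $X$ are the point coordinates $x_i$.
- The kernel matrix is computed from the distance matrix as
  $$K = -\frac{1}{2} H D^{(2)} H ,$$
  where $D^{(2)}$ is the matrix of squared distances and
  $$H = I - \frac{1}{N} \mathbf{1}$$
  is the centering matrix. Here, $I$ is the $N \times N$ identity matrix, and $\mathbf{1}$ is the $N \times N$ matrix whose elements are all unity. This operation is called double centering.
- We perform the eigenvalue decomposition of the kernel matrix:
  $$K = U \Lambda U^{\top} ,$$
  where $U$ is the matrix of eigenvectors, and $\Lambda$ is the diagonal matrix of eigenvalues. Using these, we define the full coordinate matrix
  $$X = U \Lambda^{1/2} .$$
- The matrix $X$ represents an embedding in an $N$-dimensional space. To obtain a $d$-dimensional embedding, we retain the $d$ largest eigenvalues $\lambda_1 \geq \dots \geq \lambda_d$ and their corresponding eigenvectors $u_1, \dots, u_d$, and we define
  $$X_d = U_d \Lambda_d^{1/2} ,$$
  where $U_d = [u_1, \dots, u_d]$ and $\Lambda_d = \mathrm{diag}(\lambda_1, \dots, \lambda_d)$.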
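The reason double centering recovers the Gram matrix can be checked in three lines. The sketch below assumes centered coordinates ($\sum_i x_i = 0$) and Euclidean distances; the symbols $\mathbf{e}$ (the all-ones column vector, so that the all-ones matrix is $\mathbf{e}\mathbf{e}^{\top}$), $k$ (the vector of diagonal entries $K_{ii}$), and $D^{(2)}$ (the matrix of squared distances $d_{ij}^2$) are introduced here for illustration.

```latex
\begin{align}
d_{ij}^2 &= \| x_i - x_j \|^2 = K_{ii} + K_{jj} - 2K_{ij}
  \quad\Longrightarrow\quad
  D^{(2)} = k\,\mathbf{e}^{\top} + \mathbf{e}\,k^{\top} - 2K, \\
H\mathbf{e} &= \Bigl( I - \tfrac{1}{N}\,\mathbf{e}\mathbf{e}^{\top} \Bigr)\mathbf{e}
  = \mathbf{e} - \mathbf{e} = 0
  \quad\Longrightarrow\quad
  H D^{(2)} H = -2\,HKH, \\
K\mathbf{e} &= X X^{\top}\mathbf{e} = 0
  \quad\Longrightarrow\quad
  HKH = K
  \quad\Longrightarrow\quad
  K = -\tfrac{1}{2}\, H D^{(2)} H .
\end{align}
```

The first two terms of $D^{(2)}$ vanish under centering because each contains a factor $\mathbf{e}$ or $\mathbf{e}^{\top}$ next to $H$, and the last step uses $X^{\top}\mathbf{e} = \sum_i x_i = 0$.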
Appendix E. Uniform Manifold Approximation and Projection (UMAP)
- Estimation of the Riemannian manifold. UMAP assumes that the data are uniformly distributed on a Riemannian manifold $(\mathcal{M}, g)$. The Riemannian manifold on which the data lie is estimated using a K-nearest-neighbor graph, similarly to Isomap. The data are given as points in $\mathbb{R}^n$, so $\mathcal{M}$ is regarded as embedded in $\mathbb{R}^n$, but the metric $g$ is estimated separately.

  The estimation of the metric is based on the following lemma [24].

  Lemma A1 ([24]). Let $(\mathcal{M}, g)$ be a Riemannian manifold embedded in $\mathbb{R}^n$, and let $p \in \mathcal{M}$. For a sufficiently small neighborhood $U$ of $p$, assume that the metric $g$ is locally constant and diagonal in ambient coordinates. Let $B \subseteq U$ be an open ball of radius $r$ centered at $p$ in $\mathbb{R}^n$. Then, the volume of $B$ with respect to the metric $g$ is
  $$\mathrm{Vol}(B) = \int_B \sqrt{\det g}\; dx = \sqrt{\det g} \int_B dx ,$$
  where the second equality follows from the constancy of $g$ on $U$. Since the last integral is the volume of an $n$-dimensional Euclidean ball of radius $r$,
  $$\mathrm{Vol}(B) = \sqrt{\det g}\; \frac{\pi^{n/2} r^n}{\Gamma(n/2 + 1)} .$$
  By adjusting $r$, one may set $\mathrm{Vol}(B) = \pi^{n/2} / \Gamma(n/2 + 1)$, which implies $\sqrt{\det g} = r^{-n}$. Hence, distances measured with respect to $g$ are $1/r$ times the Euclidean distances, and for $q \in B$,
  $$d_{\mathcal{M}}(p, q) = \frac{1}{r}\, \| p - q \| .$$

  This lemma shows that distances near a point $p$ can be computed from Euclidean distances by choosing an appropriate local neighborhood. Given a point set sampled uniformly from $\mathcal{M}$, the local metric scale can be estimated by fixing a constant $K$ and choosing the Euclidean radius that contains exactly $K$ neighbors.
- Fuzzy topological representation of distance spaces. UMAP represents the data as a fuzzy topological representation [43], which is essentially a weighted graph whose edge weights encode the strength of connectivity. Here, we provide an informal introduction following McInnes et al. [25].
  (a) Fuzzy sets. A fuzzy set generalizes an ordinary set $A$ by allowing the membership function
      $$\mu: A \to [0, 1]$$
      to represent the membership strength $\mu(x)$ of $x$ in the set. For a threshold $a$, the associated crisp set is
      $$A_{\geq a} = \{ x \in A \mid \mu(x) \geq a \} .$$
  (b) Simplicial sets. A simplicial set $X$ assigns, for each $m$, a set $X_m$ of simplices of dimension at most $m$. For example, $X_0$ is the set of vertices, and $X_1$ includes edges.
  (c) Fuzzy simplicial sets. A fuzzy simplicial set replaces each $X_m$ by a fuzzy set. For strength $a$, we write $X_m^{\geq a}$ for the set of simplices with a membership strength of at least $a$.
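  (d) FinEPMet, FinReal, and FinSing. An extended pseudo-metric space (EPMet) allows $d(x, y) = \infty$ and $d(x, y) = 0$ for $x \neq y$. Finite EPMets are denoted FinEPMet.
      The functor FinReal maps a fuzzy simplex $\Delta^m_a$ of membership strength $a$ to an EPMet
      $$\mathrm{FinReal}(\Delta^m_a) = (\{ x_1, \dots, x_m \}, d_a)$$
      with
      $$d_a(x_i, x_j) = \begin{cases} -\log a & (i \neq j), \\ 0 & (i = j). \end{cases}$$
      Conversely, FinSing maps an EPMet $(Z, d)$ to a fuzzy simplicial set $\mathrm{FinSing}(Z)$, where
      $$\mathrm{FinSing}(Z)_m^{a} = \hom_{\mathrm{FinEPMet}}\bigl( \mathrm{FinReal}(\Delta^m_a), Z \bigr) .$$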
- Dimensionality reduction. The fuzzy topological representations of the original data $X$ and the low-dimensional embedding $Y$ are compared using a cross-entropy objective. The embedding $Y$ is obtained by minimizing this objective, typically using stochastic gradient descent.
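Two ingredients of the outline above can be sketched numerically: the local distance scale suggested by Lemma A1 and the cross-entropy comparison of two fuzzy representations. This is a simplified illustration under stated assumptions. The function names and the toy data are hypothetical; the bare K-th-neighbor radius stands in for the smoothed per-point normalization used by the umap-learn package; and the cross-entropy form, $\sum_e [\, w_e \log(w_e / v_e) + (1 - w_e) \log((1 - w_e)/(1 - v_e)) \,]$ over edges $e$, follows the fuzzy-set cross entropy described by McInnes et al. [25] and is not spelled out in the text above.

```python
import numpy as np

def knn_radius(X, k):
    """Pairwise distances and the Euclidean radius r_i reaching the k-th neighbor of x_i."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    return d, np.sort(d, axis=1)[:, k]      # column 0 is the point itself

def local_distances(d, r):
    """Distances measured from each x_i in units of its own radius r_i (Lemma A1)."""
    return d / r[:, None]

def fuzzy_cross_entropy(w_high, w_low, eps=1e-12):
    """Cross entropy between two sets of fuzzy edge weights with values in [0, 1]."""
    p = np.clip(w_high, eps, 1.0 - eps)
    q = np.clip(w_low, eps, 1.0 - eps)
    return float(np.sum(p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))))

# Toy data: a dense clump and a diffuse clump in the plane.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
               rng.normal(5.0, 1.0, size=(50, 2))])
k = 5
d, r = knn_radius(X, k)
local = local_distances(d, r)
print("mean radius, dense clump:  ", r[:50].mean())
print("mean radius, diffuse clump:", r[50:].mean())

# Identical weight sets give zero cross entropy; differing sets give a positive value.
w = np.array([0.9, 0.5, 0.1])
print("cross entropy (same):     ", fuzzy_cross_entropy(w, w))
print("cross entropy (different):", fuzzy_cross_entropy(w, w[::-1]))
```

After rescaling, the K-th neighbor of every point sits at unit distance regardless of the local density, which is the sense in which the data are treated as uniformly distributed on the manifold.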
References
1. Sáez, D.; Ballester, V.J. Topological defects and large-scale structure. Phys. Rev. D 1990, 42, 3321–3328.
2. Tinsley, B.M. Evolution of the Stars and Gas in Galaxies. Fundam. Cosm. Phys. 1980, 5, 287–388.
3. Brosche, P. The Manifold of Galaxies. Galaxies with known Dynamical Parameters. Astron. Astrophys. 1973, 23, 259–268.
4. Djorgovski, S. Galaxy Manifolds and Galaxy Formation. In Morphological and Physical Classification of Galaxies; Longo, G., Capaccioli, M., Busarello, G., Eds.; Springer: Dordrecht, The Netherlands, 1992; pp. 337–356.
5. Hunt, L.; Magrini, L.; Galli, D.; Schneider, R.; Bianchi, S.; Maiolino, R.; Romano, D.; Tosi, M.; Valiante, R. Scaling relations of metallicity, stellar mass and star formation rate in metal-poor starbursts—I. A Fundamental Plane. Mon. Not. R. Astron. Soc. 2012, 427, 906–918.
6. Zhang, H.; Zaritsky, D. Examining early-type galaxy scaling relations using simple dynamical models. Mon. Not. R. Astron. Soc. 2016, 455, 1364–1374.
7. Ginolfi, M.; Hunt, L.K.; Tortora, C.; Schneider, R.; Cresci, G. Scaling relations and baryonic cycling in local star-forming galaxies. I. The sample. Astron. Astrophys. 2020, 638, A4.
8. Siudek, M.; Małek, K.; Pollo, A.; Krakowski, T.; Iovino, A.; Scodeggio, M.; Moutard, T.; Zamorani, G.; Guzzo, L.; Garilli, B.; et al. The VIMOS Public Extragalactic Redshift Survey (VIPERS). The complexity of galaxy populations at 0.4 < z < 1.3 revealed with unsupervised machine-learning algorithms. Astron. Astrophys. 2018, 617, A70.
9. Bouveyron, C.; Brunet, C. Simultaneous model-based clustering and visualization in the Fisher discriminative subspace. Stat. Comput. 2012, 22, 301–324.
10. Ryan, S.G.; Norton, A.J. Stellar Evolution and Nucleosynthesis; Cambridge University Press: Cambridge, UK, 2010.
11. Takeuchi, T.T. Physics of the Formation and Evolution of Galaxies: Multiwavelength Point of View; Springer Series in Astrophysics and Cosmology; Springer Nature: Singapore, 2025.
12. Blanton, M.R. Galaxies in SDSS and DEEP2: A Quiet Life on the Blue Sequence? Astrophys. J. 2006, 648, 268–280.
13. Faber, S.M.; Willmer, C.N.A.; Wolf, C.; Koo, D.C.; Weiner, B.J.; Newman, J.A.; Im, M.; Coil, A.L.; Conroy, C.; Cooper, M.C.; et al. Galaxy Luminosity Functions to z~1 from DEEP2 and COMBO-17: Implications for Red Galaxy Formation. Astrophys. J. 2007, 665, 265–294.
14. Chilingarian, I.V.; Zolotukhin, I.Y. A universal ultraviolet-optical colour-colour-magnitude relation of galaxies. Mon. Not. R. Astron. Soc. 2012, 419, 1727–1739.
15. Ma, Y.; Fu, Y. Manifold Learning Theory and Applications; CRC Press: Boca Raton, FL, USA, 2012.
16. Takeuchi, T.T. Applications of Big Data and Machine Learning in Galaxy Formation and Evolution; Series in Astronomy and Astrophysics; CRC Press: Boca Raton, FL, USA, 2025.
17. Chilingarian, I.V.; Zolotukhin, I.Y.; Katkov, I.Y.; Melchior, A.L.; Rubtsov, E.V.; Grishin, K.A. RCSED—A Value-added Reference Catalog of Spectral Energy Distributions of 800,299 Galaxies in 11 Ultraviolet, Optical, and Near-infrared Bands: Morphologies, Colors, Ionized Gas, and Stellar Population Properties. Astrophys. J. Suppl. Ser. 2017, 228, 14.
18. Abazajian, K.N.; Adelman-McCarthy, J.K.; Agüeros, M.A.; Allam, S.S.; Allende Prieto, C.; An, D.; Anderson, K.S.J.; Anderson, S.F.; Annis, J.; Bahcall, N.A.; et al. The Seventh Data Release of the Sloan Digital Sky Survey. Astrophys. J. Suppl. Ser. 2009, 182, 543–558.
19. Lin, L.; St. Thomas, B.; Zhu, H.; Dunson, D.B. Extrinsic Local Regression on Manifold-Valued Data. J. Am. Stat. Assoc. 2017, 112, 1261–1273.
20. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Available online: http://www.deeplearningbook.org (accessed on 23 February 2026).
21. Roweis, S.T.; Saul, L.K. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science 2000, 290, 2323–2326.
22. Tenenbaum, J.B.; Silva, V.d.; Langford, J.C. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 2000, 290, 2319–2323.
23. Liu, S.; Maljovec, D.; Wang, B.; Bremer, P.T.; Pascucci, V. Visualizing High-Dimensional Data: Advances in the Past Decade. IEEE Trans. Vis. Comput. Graph. 2017, 23, 1249–1268.
24. McInnes, L.; Healy, J.; Saul, N.; Großberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 2018, 3, 861.
25. McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv 2020, arXiv:1802.03426.
26. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
27. Bernstein, M.; Silva, V.D.; Langford, J.C.; Tenenbaum, J.B. Graph Approximations to Geodesics on Embedded Manifolds; Department of Psychology, Stanford University: Stanford, CA, USA, 2001.
28. Floyd, R.W. Algorithm 97: Shortest path. Commun. ACM 1962, 5, 345.
29. Warshall, S. A Theorem on Boolean Matrices. J. ACM 1962, 9, 11–12.
30. Dijkstra, E.W. A Note on Two Problems in Connexion with Graphs. Numer. Math. 1959, 1, 269–271.
31. Farahmand, A.m.; Szepesvári, C.; Audibert, J.Y. Manifold-adaptive dimension estimation. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; ICML ’07; ACM: New York, NY, USA, 2007; pp. 265–272.
32. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
33. Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464.
34. Cooray, S.; Takeuchi, T.T.; Kashino, D.; Yoshida, S.A.; Ma, H.X.; Kono, K.T. Characterizing and understanding galaxies with two parameters. Mon. Not. R. Astron. Soc. 2023, 524, 4976–4995.
35. Lilly, S.J.; Carollo, C.M.; Pipino, A.; Renzini, A.; Peng, Y. Gas Regulation of Galaxies: The Evolution of the Cosmic Specific Star Formation Rate, the Metallicity-Mass-Star-formation Rate Relation, and the Stellar Content of Halos. Astrophys. J. 2013, 772, 119.
36. Cranmer, M. Pysr: Fast and interpretable symbolic regression. J. Open Source Softw. 2023, 8, 5300.
37. Foster, J.; Nightingale, J. A Short Course in General Relativity; Springer: New York, NY, USA, 2010.
38. Planck Collaboration; Aghanim, N.; Akrami, Y.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Ballardini, M.; Banday, A.J.; Barreiro, R.B.; Bartolo, N.; et al. Planck 2018 results. VI. Cosmological parameters. Astron. Astrophys. 2020, 641, A6.
39. Bessell, M.S. Standard Photometric Systems. Annu. Rev. Astron. Astrophys. 2005, 43, 293–336.
40. Oke, J.B.; Gunn, J.E. Secondary standard stars for absolute spectrophotometry. Astrophys. J. 1983, 266, 713–717.
41. Peebles, P.J.E. Principles of Physical Cosmology; Princeton University Press: Princeton, NJ, USA, 1993.
42. Oke, J.B.; Sandage, A. Energy Distributions, K Corrections, and the Stebbins-Whitford Effect for Giant Elliptical Galaxies. Astrophys. J. 1968, 154, 21.
43. Spivak, D.I. Metric realization of fuzzy simplicial sets. arXiv 2009, arXiv:0906.4992.









© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Takeuchi, T.T.; Cooray, S.; Kano, R.R. Galaxy Evolution with Manifold Learning. Entropy 2026, 28, 288. https://doi.org/10.3390/e28030288

