A Brief Review of Unsupervised Machine Learning Algorithms in Astronomy: Dimensionality Reduction and Clustering
Abstract
1. Introduction
2. Dimensionality Reduction
2.1. Linear Methods
Principal Component Analysis (PCA) and Kernel PCA
2.2. Non-Linear Methods
2.2.1. Multi-Dimensional Scaling and Isometric Feature Mapping
2.2.2. Locally Linear Embedding
2.2.3. t-Distributed Stochastic Neighbor Embedding
2.3. Neural Network-Based Dimensionality Reduction
2.3.1. Self-Organizing Map
2.3.2. Auto-Encoder and Variational Auto-Encoder
2.4. Examples of Applications of Different Dimensionality Reduction Algorithms
2.5. Benchmarking of Dimensionality Reduction Algorithms
3. Clustering
3.1. Gaussian Mixture Model
3.2. K-Means
3.3. Hierarchical Clustering and Friends-of-Friends Clustering
3.4. Density-Based Spatial Clustering of Applications with Noise
3.5. Hierarchical Density-Based Spatial Clustering of Applications with Noise
3.6. Fuzzy C-Means Clustering
3.7. Examples of Applications of Different Clustering Algorithms
3.8. Benchmarking of Clustering Algorithms
4. Other Applications of Unsupervised Machine Learning
4.1. Anomaly Detection
4.2. Symbolic Regression
5. Future Directions
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| ACORNS | Agglomerative clustering for organising nested structures |
| AE | Auto-encoder |
| BIC | Bayesian information criterion |
| CADET | Cavity Detection Tool |
| CAE | Convolutional auto-encoder |
| DBSCAN | Density-based spatial clustering of applications with noise |
| DESOM | Deep embedded self-organizing map |
| dim red | Dimensionality reduction |
| DIR | Direct calibration |
| DPGMM | Dirichlet process Gaussian mixture model |
| EM | Expectation maximization |
| FCM | Fuzzy C-means clustering |
| FOF | Friends-of-Friends |
| GMM | Gaussian mixture model |
| GPU | Graphics processing unit |
| HC | Hierarchical clustering |
| HDBSCAN | Hierarchical density-based spatial clustering of applications with noise |
| HDBSCAN-MC | Hierarchical density-based spatial clustering of applications with noise–Monte Carlo |
| HMAC | Hierarchical mode association clustering |
| iForest | Isolation Forest |
| Isomap | Isometric feature mapping |
| LLE | Locally linear embedding |
| MDS | Multi-dimensional scaling |
| ML | Machine learning |
| OC | Open cluster |
| PC | Principal component |
| PCA | Principal component analysis |
| SNR | Signal-to-noise ratio |
| SOM | Self-organizing map |
| SR | Symbolic regression |
| StarGO | Stars’ Galactic Origin |
| SVD | Singular value decomposition |
| SVM | Support vector machine |
| t-SNE | t-distributed stochastic neighbor embedding |
| UMAP | Uniform manifold approximation and projection |
| VAE | Variational auto-encoder |
| VBGMM | Variational Bayesian Gaussian mixture model |
| XDGMM | Extreme deconvolution Gaussian mixture model |
| XGBoost | Extreme gradient boosting |
Appendix A. Supplementary Discussion: Algorithm Performance on Synthetic Datasets
Appendix A.1. Synthetic Datasets


Appendix A.2. Performance of Dimensionality Reduction Algorithms on a Synthetic Dataset


Accuracy (%)

| Algorithm | Rep1 | Rep2 | Rep3 | Rep4 | Rep5 | Rep6 | Rep7 | Rep8 | Avg | SD |
|---|---|---|---|---|---|---|---|---|---|---|
| PCA | 93.9 | 94.3 | 94.3 | 94.1 | 94.2 | 93.9 | 94.1 | 94.6 | 94.2 | 0.2 |
| LLE | 93.9 | 95.5 | 94.7 | 95.4 | 94.8 | 94.7 | 94.8 | 95.5 | 94.9 | 0.6 |
| Isomap | 96.5 | 96.5 | 96.3 | 96.3 | 96.0 | 95.9 | 96.2 | 96.3 | 96.3 | 0.2 |
| t-SNE | 96.0 | 97.1 | 96.9 | 97.1 | 97.1 | 96.9 | 97.5 | 97.1 | 97.0 | 0.4 |

Accuracy (%)

| Algorithm | Rep1 | Rep2 | Rep3 | Rep4 | Rep5 | Rep6 | Rep7 | Rep8 | Avg | SD |
|---|---|---|---|---|---|---|---|---|---|---|
| PCA | 97.8 | 97.7 | 97.7 | 97.9 | 97.7 | 97.7 | 97.7 | 98.1 | 97.8 | 0.1 |
| LLE | 97.0 | 93.7 | 93.8 | 91.1 | 95.0 | 90.6 | 96.0 | 96.6 | 94.2 | 2.4 |
| Isomap | 98.7 | 99.0 | 98.7 | 98.7 | 98.7 | 99.0 | 98.7 | 98.6 | 98.8 | 0.1 |
| t-SNE | 98.2 | 98.3 | 98.5 | 98.3 | 98.2 | 98.0 | 98.7 | 97.8 | 98.3 | 0.3 |

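
The Avg and SD columns in the tables above appear to be the arithmetic mean and the standard deviation over the eight repetitions. As a quick check, the sketch below recomputes both for the LLE row of the second table. It assumes the sample (n − 1) convention for SD, which the tables do not state; the variable names are illustrative only.

```python
from statistics import mean, stdev

# Accuracy (%) of LLE over the eight repetitions, copied from the second table above
lle_accuracy = [97.0, 93.7, 93.8, 91.1, 95.0, 90.6, 96.0, 96.6]

avg = mean(lle_accuracy)   # arithmetic mean over the repetitions
sd = stdev(lle_accuracy)   # sample standard deviation (n - 1 denominator), an assumption

print(f"Avg = {avg:.1f}, SD = {sd:.1f}")  # prints "Avg = 94.2, SD = 2.4"
```

With the population convention (`statistics.pstdev`) the same row would round to 2.2 rather than the tabulated 2.4, which is why the sample convention is assumed here.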
Appendix A.3. Performance of Clustering Algorithms on a Synthetic Dataset


Accuracy (%)

| Algorithm | Rep1 | Rep2 | Rep3 | Rep4 | Rep5 | Rep6 | Rep7 | Rep8 | Avg | SD |
|---|---|---|---|---|---|---|---|---|---|---|
| GMM | 96.8 | 97.5 | 97.8 | 97.2 | 97.9 | 98.0 | 98.1 | 97.3 | 97.6 | 0.5 |
| K-means | 95.0 | 95.6 | 95.5 | 95.3 | 95.6 | 95.3 | 94.9 | 95.9 | 95.4 | 0.4 |
| HC | 96.8 | 96.2 | 96.4 | 97.0 | 97.1 | 97.4 | 97.0 | 97.0 | 96.9 | 0.4 |
| FCM | 93.6 | 94.0 | 93.7 | 93.5 | 93.9 | 93.6 | 93.4 | 94.0 | 93.7 | 0.2 |
| SOM | 68.7 | 67.7 | 66.2 | 69.5 | 66.9 | 66.7 | 66.8 | 69.0 | 67.7 | 1.2 |

Accuracy (%)

| Algorithm | Rep1 | Rep2 | Rep3 | Rep4 | Rep5 | Rep6 | Rep7 | Rep8 | Avg | SD |
|---|---|---|---|---|---|---|---|---|---|---|
| GMM | 98.4 | 98.3 | 99.5 | 98.3 | 98.1 | 99.2 | 99.3 | 99.3 | 98.8 | 0.6 |
| K-means | 98.3 | 98.1 | 98.0 | 98.3 | 98.0 | 98.4 | 98.3 | 98.4 | 98.2 | 0.1 |
| HC | 97.1 | 97.5 | 97.8 | 97.7 | 97.2 | 98.1 | 97.2 | 97.8 | 97.6 | 0.3 |
| FCM | 92.7 | 92.5 | 93.0 | 92.9 | 92.9 | 92.5 | 92.4 | 93.4 | 92.8 | 0.3 |
| SOM | 42.3 | 47.0 | 44.4 | 41.7 | 41.2 | 42.3 | 43.5 | 42.8 | 43.2 | 1.9 |

| 1 | Manifold: A topological space that locally resembles Euclidean space (e.g., Cartesian space) around each point but may have a more complex global structure. |
| 2 | Embedding space: A lower-dimensional representation of high-dimensional data that typically captures the essential structure or features of the data. |
| 3 | Continuous interpolation means that similar latent vectors produce similar outputs when decoded. |
| 4 | A distortion curve plots the estimated distortion against the number of clusters K, where the distortion for K clusters is defined as the squared distance of points from their assigned cluster centers. Further details on the equation and its formulation are provided by Chattopadhyay et al. [6]. |
| 5 | Legacy Survey of Space and Time: https://rubinobservatory.org/explore/how-rubin-works/lsst, accessed on 8 September 2025. |
| 6 | Ensemble learning is a technique that combines the predictions of multiple models with the same objective, using methods such as voting and averaging [16]. |
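
Footnote 4 describes a distortion curve in terms of the squared distance of points from their assigned cluster centers. The following is a minimal pure-Python sketch of that idea, not the formulation used by Chattopadhyay et al. [6]. The function names (`sq_dist`, `distortion`, `kmeans`), the farthest-point initialisation, and the toy three-blob dataset are all assumptions introduced here for illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def distortion(points, centers):
    """Mean squared distance of each point to its nearest (assigned) center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)

def kmeans(points, k, n_iter=100):
    """Plain Lloyd's K-means with a deterministic farthest-point start (an assumption)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # add the point that is farthest from all centers chosen so far
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # keep the old center if a cluster loses all of its members
            updated.append([sum(c) / len(members) for c in zip(*members)] if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return centers

# Toy data: three well-separated 2D blobs (illustrative only, not from the paper)
rng = random.Random(0)
blob_means = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [[rng.gauss(mx, 0.5), rng.gauss(my, 0.5)] for mx, my in blob_means for _ in range(50)]

# Distortion as a function of the number of clusters K
curve = {k: distortion(data, kmeans(data, k)) for k in range(1, 7)}
for k, d in curve.items():
    print(k, round(d, 3))
```

For data like this, the curve should drop steeply up to K = 3 and flatten afterwards, which is the kind of feature one looks for when reading a suitable K off a distortion curve.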
References
- Almeida, J.S.; Aguerri, J.A.L.; Muñoz-Tuñón, C.; de Vicente, A. Automatic Unsupervised Classification of All Sloan Digital Sky Survey Data Release 7 Galaxy Spectra. Astrophys. J. 2010, 714, 487–504.
- Boersma, C.; Bregman, J.; Allamandola, L.J. Properties of Polycyclic Aromatic Hydrocarbons in the Northwest Photon Dominated Region of NGC 7023. II. Traditional PAH analysis using k-means as a Visualization tool. Astrophys. J. 2014, 795, 110.
- Panos, B.; Kleint, L.; Huwyler, C.; Krucker, S.; Melchior, M.; Ullmann, D.; Voloshynovskiy, S. Identifying Typical Mg ii Flare Spectra Using Machine Learning. Astrophys. J. 2018, 861, 62.
- Rodrigo, C.; Cruz, P.; Aguilar, J.F.; Aller, A.; Solano, E.; Gálvez-Ortiz, M.C.; Jiménez-Esteban, F.; Mas-Buitrago, P.; Bayo, A.; Cortés-Contreras, M.; et al. Photometric segregation of dwarf and giant FGK stars using the SVO Filter Profile Service and photometric tools. Astron. Astrophys. 2024, 689, A93.
- Zhang, H.; Ardern-Arentsen, A.; Belokurov, V. On the existence of a very metal-poor disc in the Milky Way. Mon. Not. R. Astron. Soc. 2024, 533, 889–907.
- Chattopadhyay, T.; Misra, R.; Chattopadhyay, A.K.; Naskar, M. Statistical Evidence for Three Classes of Gamma-Ray Bursts. Astrophys. J. 2007, 667, 1017–1023.
- Matijevič, G.; Prša, A.; Orosz, J.A.; Welsh, W.F.; Bloemen, S.; Barclay, T. Kepler Eclipsing Binary Stars. III. Classification of Kepler Eclipsing Binary Light Curves with Locally Linear Embedding. Astron. J. 2012, 143, 123.
- Steinhardt, C.L.; Mann, W.J.; Rusakov, V.; Jespersen, C.K. Classification of BATSE, Swift, and Fermi Gamma-Ray Bursts from Prompt Emission Alone. Astrophys. J. 2023, 945, 67.
- Froebrich, D.; Campbell-White, J.; Scholz, A.; Eislöffel, J.; Zegmott, T.; Billington, S.J.; Donohoe, J.; Makin, S.V.; Hibbert, R.; Newport, R.J.; et al. A survey for variable young stars with small telescopes: First results from HOYS-CAPS. Mon. Not. R. Astron. Soc. 2018, 478, 5091–5103.
- Paraficz, D.; Courbin, F.; Tramacere, A.; Joseph, R.; Metcalf, R.B.; Kneib, J.P.; Dubath, P.; Droz, D.; Filleul, F.; Ringeisen, D.; et al. The PCA Lens-Finder: Application to CFHTLS. Astron. Astrophys. 2016, 592, A75.
- Mesa, D.; Gratton, R.; Zurlo, A.; Vigan, A.; Claudi, R.U.; Alberi, M.; Antichi, J.; Baruffolo, A.; Beuzit, J.L.; Boccaletti, A.; et al. Performance of the VLT Planet Finder SPHERE. II. Data analysis and results for IFS in laboratory. Astron. Astrophys. 2015, 576, A121.
- Banda, J.M.; Angryk, R.A.; Martens, P.C.H. Steps Toward a Large-Scale Solar Image Data Analysis to Differentiate Solar Phenomena. Sol. Phys. 2013, 288, 435–462.
- Koza, J.R.; Bennett, F.H.; Andre, D.; Keane, M.A. Automated Design of Both the Topology and Sizing of Analog Electrical Circuits Using Genetic Programming. In Artificial Intelligence in Design ’96; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1996; pp. 151–170.
- Baron, D. Machine Learning in Astronomy: A practical overview. arXiv 2019, arXiv:1904.07248.
- Fotopoulou, S. A review of unsupervised learning in astronomy. Astron. Comput. 2024, 48, 100851.
- Ivezić, Z.; Connolly, A.; Vanderplas, J.T.; Gray, A. Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data; Princeton University Press: Princeton, NJ, USA, 2020.
- Pearson, K. LIII. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philos. Mag. J. Sci. 1901, 2, 559–572.
- Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 1933, 24, 417–441, 498–520.
- Hotelling, H. Relations Between Two Sets of Variates. Biometrika 1936, 28, 321–377.
- Follette, K.B. An Introduction to High Contrast Differential Imaging of Exoplanets and Disks. Publ. Astron. Soc. Pac. 2023, 135, 093001.
- Çakir, U.; Buck, T. MEGS: Morphological Evaluation of Galactic Structure—Principal component analysis as a galaxy morphology model. Astron. Astrophys. 2024, 691, A320.
- Schölkopf, B.; Smola, A.; Müller, K.R. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Comput. 1998, 10, 1299–1319.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Dale, D.A.; de Paz, A.G.; Gordon, K.D.; Hanson, H.M.; Armus, L.; Bendo, G.J.; Bianchi, L.; Block, M.; Boissier, S.; Boselli, A.; et al. An Ultraviolet-to-Radio Broadband Spectral Atlas of Nearby Galaxies. Astrophys. J. 2007, 655, 863–884.
- Ellingson, E.; Lin, H.; Yee, H.K.C.; Carlberg, R.G. The Evolution of Population Gradients in Galaxy Clusters: The Butcher-Oemler Effect and Cluster Infall. Astrophys. J. 2001, 547, 609–622.
- Francis, P.J.; Hewett, P.C.; Foltz, C.B.; Chaffee, F.H. An Objective Classification Scheme for QSO Spectra. Astrophys. J. 1992, 398, 476.
- Osmer, P.S.; Porter, A.C.; Green, R.F. Luminosity Effects and the Emission-Line Properties of Quasars with 0 < Z < 3.8. Astrophys. J. 1994, 436, 678.
- Brotherton, M.S.; Wills, B.J.; Francis, P.J.; Steidel, C.C. The Intermediate Line Region of QSOs. Astrophys. J. 1994, 430, 495.
- Cowan, N.B.; Agol, E.; Meadows, V.S.; Robinson, T.; Livengood, T.A.; Deming, D.; Lisse, C.M.; A’Hearn, M.F.; Wellnitz, D.D.; Seager, S.; et al. Alien Maps of an Ocean-bearing World. Astrophys. J. 2009, 700, 915–923.
- Whitmore, B.C. An objective classification system for spiral galaxies. I. The two dominant dimensions. Astrophys. J. 1984, 278, 61–80.
- Borg, I.; Groenen, P.J.F. Modern Multidimensional Scaling—Theory and Applications; Springer: Berlin/Heidelberg, Germany, 2005.
- Genest, C.; Nešlehová, J.G.; Ramsay, J.O. A Conversation with James O. Ramsay. Int. Stat. Rev. 2014, 82, 161–183.
- Kruskal, J.B. Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 1964, 29, 1–27.
- Kruskal, J.B. Nonmetric multidimensional scaling: A numerical method. Psychometrika 1964, 29, 115–129.
- Tenenbaum, J.B.; de Silva, V.; Langford, J.C. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science 2000, 290, 2319–2323.
- Bu, Y.; Chen, F.; Pan, J. Stellar spectral subclasses classification based on Isomap and SVM. New Astron. 2014, 28, 35–43.
- Floyd, R.W. Algorithm 97: Shortest path. Commun. ACM 1962, 5, 345.
- Fredman, M.L.; Tarjan, R.E. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 1987, 34, 596–615.
- Ward, J.L.; Lumsden, S.L. Locally linear embedding: Dimension reduction of massive protostellar spectra. Mon. Not. R. Astron. Soc. 2016, 461, 2250–2256.
- Thorsen, T.; Zhou, J.; Wu, Y. Comparison of Stellar Classification Accuracies Using Automated Algorithms. In Proceedings of the American Astronomical Society Meeting Abstracts #227, Kissimmee, FL, USA, 4–8 January 2016; Volume 227, p. 348.18.
- Pearson, W.J.; Rodriguez-Gomez, V.; Kruk, S.; Margalef-Bentabol, B. Determining the time before or after a galaxy merger event. Astron. Astrophys. 2024, 687, A45.
- Roweis, S.T.; Saul, L.K. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science 2000, 290, 2323–2326.
- Lehoucq, R.B.; Sorensen, D.C.; Yang, C. ARPACK Users’ Guide; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1998.
- Vanderplas, J.; Connolly, A. Reducing the Dimensionality of Data: Locally Linear Embedding of Sloan Galaxy Spectra. Astron. J. 2009, 138, 1365–1379.
- Bu, Y.; Zhao, G.; Luo, A.l.; Pan, J.; Chen, Y. Restricted Boltzmann machine: A non-linear substitute for PCA in spectral processing. Astron. Astrophys. 2015, 576, A96.
- Kao, W.B.; Zhang, Y.; Wu, X.B. Efficient identification of broad absorption line quasars using dimensionality reduction and machine learning. Publ. Astron. Soc. Jpn. 2024, 76, 653–665.
- Matijevič, G.; Zwitter, T.; Bienaymé, O.; Bland-Hawthorn, J.; Boeche, C.; Freeman, K.C.; Gibson, B.K.; Gilmore, G.; Grebel, E.K.; Helmi, A.; et al. Exploring the Morphology of RAVE Stellar Spectra. Astrophys. J. Suppl. Ser. 2012, 200, 14.
- Daniel, S.F.; Connolly, A.; Schneider, J.; Vanderplas, J.; Xiong, L. Classification of Stellar Spectra with Local Linear Embedding. Astron. J. 2011, 142, 203.
- Yang, M.; Zhang, H.; Wang, S.; Zhou, J.L.; Zhou, X.; Wang, L.; Wang, L.; Wittenmyer, R.A.; Liu, H.G.; Meng, Z.; et al. Eclipsing Binaries From the CSTAR Project at Dome A, Antarctica. Astrophys. J. Suppl. Ser. 2015, 217, 28.
- Hinton, G.E.; Roweis, S. Stochastic Neighbor Embedding. In Advances in Neural Information Processing Systems; Becker, S., Thrun, S., Obermayer, K., Eds.; MIT Press: Cambridge, MA, USA, 2002; Volume 15.
- van der Maaten, L.; Hinton, G. Visualizing Data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
- Peruzzi, T.; Pasquato, M.; Ciroi, S.; Berton, M.; Marziani, P.; Nardini, E. Interpreting automatic AGN classifiers with saliency maps. Astron. Astrophys. 2021, 652, A19.
- van der Maaten, L. Barnes-Hut-SNE. arXiv 2013, arXiv:1301.3342.
- Nakoneczny, S.; Bilicki, M.; Solarz, A.; Pollo, A.; Maddox, N.; Spiniello, C.; Brescia, M.; Napolitano, N.R. Catalog of quasars from the Kilo-Degree Survey Data Release 3. Astron. Astrophys. 2019, 624, A13.
- Zhang, X.; Feng, Y.; Chen, H.; Yuan, Q. Powerful t-SNE Technique Leading to Clear Separation of Type-2 AGN and H II Galaxies in BPT Diagrams. Astrophys. J. 2020, 905, 97.
- Queiroz, A.B.A.; Anders, F.; Chiappini, C.; Khalatyan, A.; Santiago, B.X.; Nepal, S.; Steinmetz, M.; Gallart, C.; Valentini, M.; Dal Ponte, M.; et al. StarHorse results for spectroscopic surveys and Gaia DR3: Chrono-chemical populations in the solar vicinity, the genuine thick disk, and young alpha-rich stars. Astron. Astrophys. 2023, 673, A155.
- Traven, G.; Feltzing, S.; Merle, T.; Van der Swaelmen, M.; Čotar, K.; Church, R.; Zwitter, T.; Ting, Y.S.; Sahlholdt, C.; Asplund, M.; et al. The GALAH survey: Multiple stars and our Galaxy. I. A comprehensive method for deriving properties of FGK binary stars. Astron. Astrophys. 2020, 638, A145.
- Steinhardt, C.L.; Weaver, J.R.; Maxfield, J.; Davidzon, I.; Faisst, A.L.; Masters, D.; Schemel, M.; Toft, S. A Method to Distinguish Quiescent and Dusty Star-forming Galaxies with Machine Learning. Astrophys. J. 2020, 891, 136.
- Garcia-Cifuentes, K.; Becerra, R.L.; De Colle, F.; Cabrera, J.I.; Del Burgo, C. Identification of Extended Emission Gamma-Ray Burst Candidates Using Machine Learning. Astrophys. J. 2023, 951, 4.
- Kohonen, T. Self-organized formation of topologically correct feature maps. Biol. Cybern. 1982, 43, 59–69.
- Masters, D.; Capak, P.; Stern, D.; Ilbert, O.; Salvato, M.; Schmidt, S.; Longo, G.; Rhodes, J.; Paltani, S.; Mobasher, B.; et al. Mapping the Galaxy Color-Redshift Relation: Optimal Photometric Redshift Calibration Strategies for Cosmology Surveys. Astrophys. J. 2015, 813, 53.
- Hildebrandt, H.; van den Busch, J.L.; Wright, A.H.; Blake, C.; Joachimi, B.; Kuijken, K.; Tröster, T.; Asgari, M.; Bilicki, M.; de Jong, J.T.A.; et al. KiDS-1000 catalogue: Redshift distributions and their calibration. Astron. Astrophys. 2021, 647, A124.
- Wright, A.H.; Hildebrandt, H.; van den Busch, J.L.; Heymans, C. Photometric redshift calibration with self-organising maps. Astron. Astrophys. 2020, 637, A100.
- Carrasco Kind, M.; Brunner, R.J. SOMz: Photometric redshift PDFs with self-organizing maps and random atlas. Mon. Not. R. Astron. Soc. 2014, 438, 3409–3421.
- Yuan, Z.; Myeong, G.C.; Beers, T.C.; Evans, N.W.; Lee, Y.S.; Banerjee, P.; Gudin, D.; Hattori, K.; Li, H.; Matsuno, T.; et al. Dynamical Relics of the Ancient Galactic Halo. Astrophys. J. 2020, 891, 39.
- Armstrong, D.J.; Kirk, J.; Lam, K.W.F.; McCormac, J.; Osborn, H.P.; Spake, J.; Walker, S.; Brown, D.J.A.; Kristiansen, M.H.; Pollacco, D.; et al. K2 variable catalogue—II. Machine learning classification of variable stars and eclipsing binaries in K2 fields 0-4. Mon. Not. R. Astron. Soc. 2016, 456, 2260–2272.
- Brett, D.R.; West, R.G.; Wheatley, P.J. The automated classification of astronomical light curves using Kohonen self-organizing maps. Mon. Not. R. Astron. Soc. 2004, 353, 369–376.
- Kramer, M.A. Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 1991, 37, 233–243.
- Kramer, M. Autoassociative neural networks. Comput. Chem. Eng. 1992, 16, 313–328.
- Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2013, arXiv:1312.6114.
- Ralph, N.O.; Norris, R.P.; Fang, G.; Park, L.A.F.; Galvin, T.J.; Alger, M.J.; Andernach, H.; Lintott, C.; Rudnick, L.; Shabala, S.; et al. Radio Galaxy Zoo: Unsupervised Clustering of Convolutionally Auto-encoded Radio-astronomical Images. Publ. Astron. Soc. Pac. 2019, 131, 108011.
- Savary, E.; Rojas, K.; Maus, M.; Clément, B.; Courbin, F.; Gavazzi, R.; Chan, J.H.H.; Lemon, C.; Vernardos, G.; Cañameras, R.; et al. Strong lensing in UNIONS: Toward a pipeline from discovery to modeling. Astron. Astrophys. 2022, 666, A1.
- Ganeshaiah Veena, P.; Lilow, R.; Nusser, A. Large-scale density and velocity field reconstructions with neural networks. Mon. Not. R. Astron. Soc. 2023, 522, 5291–5307.
- Shen, H.; George, D.; Huerta, E.A.; Zhao, Z. Denoising Gravitational Waves with Enhanced Deep Recurrent Denoising Auto-Encoders. arXiv 2019, arXiv:1903.03105.
- Ichinohe, Y.; Yamada, S. Neural network-based anomaly detection for high-resolution X-ray spectroscopy. Mon. Not. R. Astron. Soc. 2019, 487, 2874–2880.
- Bayley, J.; Messenger, C.; Woan, G. Rapid parameter estimation for an all-sky continuous gravitational wave search using conditional varitational auto-encoders. Phys. Rev. D 2022, 106, 083022.
- Wenger, M.; Ochsenbein, F.; Egret, D.; Dubois, P.; Bonnarel, F.; Borde, S.; Genova, F.; Jasniewicz, G.; Laloë, S.; Lesteven, S.; et al. The SIMBAD astronomical database. The CDS reference database for astronomical objects. Astron. Astrophys. Suppl. Ser. 2000, 143, 9–22.
- Allen, M.G. CDS—Strasbourg Astronomical Data Centre. In Astronomical Data Analysis Software and Systems XXIX; Pizzo, R., Deul, E.R., Mol, J.D., de Plaa, J., Verkouter, H., Eds.; Astronomical Society of the Pacific: San Francisco, CA, USA, 2020; Volume 527, p. 751.
- Tamuz, O.; Mazeh, T.; Zucker, S. Correcting systematic effects in a large set of photometric light curves. Mon. Not. R. Astron. Soc. 2005, 356, 1466–1470.
- Norberg, P.; Baugh, C.M.; Hawkins, E.; Maddox, S.; Madgwick, D.; Lahav, O.; Cole, S.; Frenk, C.S.; Baldry, I.; Bland-Hawthorn, J.; et al. The 2dF Galaxy Redshift Survey: The dependence of galaxy clustering on luminosity and spectral type. Mon. Not. R. Astron. Soc. 2002, 332, 827–838.
- Lyke, B.W.; Higley, A.N.; McLane, J.N.; Schurhammer, D.P.; Myers, A.D.; Ross, A.J.; Dawson, K.; Chabanier, S.; Martini, P.; Busca, N.G.; et al. The Sloan Digital Sky Survey Quasar Catalog: Sixteenth Data Release. Astrophys. J. Suppl. Ser. 2020, 250, 8.
- Newton-Bosch, J.; González, L.X.; Valdés-Galicia, J.F.; Monterde-Andrade, F.; Morales-Olivares, O.G.; Sergeeva, M.A.; Muraki, Y.; Shibata, S.; Matsubara, Y.; Sako, T.; et al. Atmospheric pressure and temperature effects on the Solar Neutron Telescope at Sierra Negra. Adv. Space Res. 2025, 75, 6543–6552.
- Boroson, T.A.; Green, R.F. The Emission-Line Properties of Low-Redshift Quasi-stellar Objects. Astrophys. J. Suppl. Ser. 1992, 80, 109.
- DeMeo, F.E.; Binzel, R.P.; Slivan, S.M.; Bus, S.J. An extension of the Bus asteroid taxonomy into the near-infrared. Icarus 2009, 202, 160–180.
- Tous, J.L.; Solanes, J.M.; Perea, J.D. Fully comprehensive diagnostic of galaxy activity using principal components of visible spectra: Implementation on nearby S0s. Mon. Not. R. Astron. Soc. 2025, 537, 1459–1469.
- Xu, M.; Fu, X.; Chen, Y.; Li, L.; Fang, M.; Zhao, H.; Liu, P.; Zuo, Y. Nearby open clusters with tidal features: Golden sample selection and 3D structure. Astron. Astrophys. 2025, 698, A156.
- Bu, Y.D.; Pan, J.C.; Chen, F.Q. Stellar spectral outliers detection based on Isomap. Guang Pu Xue Yu Guang Pu Fen Xi 2014, 34, 267–273.
- Sasdelli, M.; Ishida, E.E.O.; Vilalta, R.; Aguena, M.; Busti, V.C.; Camacho, H.; Trindade, A.M.M.; Gieseke, F.; de Souza, R.S.; Fantaye, Y.T.; et al. Exploring the spectroscopic diversity of Type Ia supernovae with DRACULA: A machine learning approach. Mon. Not. R. Astron. Soc. 2016, 461, 2044–2059.
- Žerjal, M.; Zwitter, T.; Matijevič, G.; Strassmeier, K.G. Chromospherically Active Stars in the RAVE Survey. In Setting the scene for Gaia and LAMOST; Feltzing, S., Zhao, G., Walton, N.A., Whitelock, P., Eds.; IAU Symposium; Cambridge University Press: Cambridge, UK, 2014; Volume 298, pp. 298–303.
- Bódi, A.; Hajdu, T. Classification of OGLE Eclipsing Binary Stars Based on Their Morphology Type with Locally Linear Embedding. Astrophys. J. Suppl. Ser. 2021, 255, 1.
- Bu, Y.; Pan, J.; Jiang, B.; Wei, P. Stellar Spectral Subclass Classification Based on Locally Linear Embedding. Publ. Astron. Soc. Jpn. 2013, 65, 81.
- Chang, H.; Yeung, D.Y. Robust locally linear embedding. Pattern Recognit. 2006, 39, 1053–1065.
- Matijevič, G.; Chiappini, C.; Grebel, E.K.; Wyse, R.F.G.; Zwitter, T.; Bienaymé, O.; Bland-Hawthorn, J.; Freeman, K.C.; Gibson, B.K.; Gilmore, G.; et al. Very metal-poor stars observed by the RAVE survey. Astron. Astrophys. 2017, 603, A19.
- Hughes, A.C.N.; Spitler, L.R.; Zucker, D.B.; Nordlander, T.; Simpson, J.; da Costa, G.S.; Ting, Y.S.; Li, C.; Bland-Hawthorn, J.; Buder, S.; et al. The GALAH Survey: A New Sample of Extremely Metal-poor Stars Using a Machine-learning Classification Algorithm. Astrophys. J. 2022, 930, 47.
- Anders, F.; Chiappini, C.; Santiago, B.X.; Matijevič, G.; Queiroz, A.B.; Steinmetz, M.; Guiglion, G. Dissecting stellar chemical abundance space with t-SNE. Astron. Astrophys. 2018, 619, A125.
- Kos, J.; Bland-Hawthorn, J.; Freeman, K.; Buder, S.; Traven, G.; De Silva, G.M.; Sharma, S.; Asplund, M.; Duong, L.; Lin, J.; et al. The GALAH survey: Chemical tagging of star clusters and new members in the Pleiades. Mon. Not. R. Astron. Soc. 2018, 473, 4612–4633.
- Zhu, S.Y.; Sun, W.P.; Ma, D.L.; Zhang, F.W. Classification of Fermi gamma-ray bursts based on machine learning. Mon. Not. R. Astron. Soc. 2024, 532, 1434–1443.
- Chen, J.M.; Zhu, K.R.; Peng, Z.Y.; Zhang, L. Classification and Physical Characteristic Analysis of Fermi-GBM Gamma-Ray Bursts Based on Deep Learning. Astrophys. J. Suppl. Ser. 2025, 276, 62.
- Quispe-Huaynasi, F.; Roig, F.; Holanda, N.; Loaiza-Tacuri, V.; Eleutério, R.; Pereira, C.B.; Daflon, S.; Placco, V.M.; Lopes de Oliveira, R.; Sestito, F.; et al. An Unsupervised Machine Learning Approach to Identify Spectral Energy Distribution Outliers: Application to the S-PLUS DR4 Data. Astron. J. 2025, 169, 332.
- Yan, Z.; Wright, A.H.; Elisa Chisari, N.; Georgiou, C.; Joudaki, S.; Loureiro, A.; Reischke, R.; Asgari, M.; Bilicki, M.; Dvornik, A.; et al. KiDS-Legacy: Angular galaxy clustering from deep surveys with complex selection effects. Astron. Astrophys. 2025, 694, A259.
- Buchs, R.; Davis, C.; Gruen, D.; DeRose, J.; Alarcon, A.; Bernstein, G.M.; Sánchez, C.; Myles, J.; Roodman, A.; Allen, S.; et al. Phenotypic redshifts with self-organizing maps: A novel method to characterize redshift distributions of source galaxies for weak lensing. Mon. Not. R. Astron. Soc. 2019, 489, 820–841.
- van den Busch, J.L.; Wright, A.H.; Hildebrandt, H.; Bilicki, M.; Asgari, M.; Joudaki, S.; Blake, C.; Heymans, C.; Kannawadi, A.; Shan, H.Y.; et al. KiDS-1000: Cosmic shear with enhanced redshift calibration. Astron. Astrophys. 2022, 664, A170.
- Beck, R.; Szapudi, I.; Flewelling, H.; Holmberg, C.; Magnier, E.; Chambers, K.C. PS1-STRM: Neural network source classification and photometric redshift catalogue for PS1 3π DR1. Mon. Not. R. Astron. Soc. 2021, 500, 1633–1644.
- Galvin, T.J.; Huynh, M.T.; Norris, R.P.; Wang, X.R.; Hopkins, E.; Polsterer, K.; Ralph, N.O.; O’Brien, A.N.; Heald, G.H. Cataloguing the radio-sky with unsupervised machine learning: A new approach for the SKA era. Mon. Not. R. Astron. Soc. 2020, 497, 2730–2758.
- Polsterer, K.L.; Gieseke, F.; Igel, C. Automatic Galaxy Classification via Machine Learning Techniques: Parallelized Rotation/Flipping INvariant Kohonen Maps (PINK). In Astronomical Data Analysis Software and Systems XXIV (ADASS XXIV); Taylor, A.R., Rosolowsky, E., Eds.; Astronomical Society of the Pacific Conference Series; Astronomical Society of the Pacific: San Francisco, CA, USA, 2015; Volume 495, p. 81.
- Kollasch, F.; Polsterer, K. UltraPINK—New possibilities to explore Self-Organizing Kohonen Maps. In Astronomical Data Analysis Software and Systems XXXI; Hugo, B.V., Van Rooyen, R., Smirnov, O.M., Eds.; Astronomical Society of the Pacific Conference Series; Astronomical Society of the Pacific: San Francisco, CA, USA, 2024; Volume 535, p. 49.
- Viscasillas Vázquez, C.; Solano, E.; Ulla, A.; Ambrosch, M.; Álvarez, M.A.; Manteiga, M.; Magrini, L.; Santoveña-Gómez, R.; Dafonte, C.; Pérez-Fernández, E.; et al. Advanced classification of hot subdwarf binaries using artificial intelligence techniques and Gaia DR3 data. Astron. Astrophys. 2024, 691, A223.
- Jarrett, K.; Kavukcuoglu, K.; Ranzato, M.; LeCun, Y. What is the best multi-stage architecture for object recognition? In Proceedings of the IEEE International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 2146–2153.
- Han, Y.; Zou, Z.; Li, N.; Chen, Y. Identifying Outliers in Astronomical Images with Unsupervised Machine Learning. Res. Astron. Astrophys. 2022, 22, 085006.
- Sharma, K.; Kembhavi, A.; Kembhavi, A.; Sivarani, T.; Abraham, S. Detecting Outliers in SDSS using Convolutional Neural Network. Bull. Soc. R. Sci. Liege 2019, 88, 174–181.
- Tröster, T.; Ferguson, C.; Harnois-Déraps, J.; McCarthy, I.G. Painting with baryons: Augmenting N-body simulations with gas using deep generative models. Mon. Not. R. Astron. Soc. 2019, 487, L24–L29.
- Jadhav, S.; Shrivastava, M.; Mitra, S. Towards a robust and reliable deep learning approach for detection of compact binary mergers in gravitational wave data. Mach. Learn. Sci. Technol. 2023, 4, 045028.
- Laroche, A.; Speagle, J.S. Closing the Stellar Labels Gap: Stellar Label independent Evidence for [α/M] Information in Gaia BP/RP Spectra. Astrophys. J. 2025, 979, 5.
- Andrianomena, S.; Tang, H. Radio Galaxy Zoo: Leveraging latent space representations from variational autoencoder. J. Cosmol. Astropart. Phys. 2024, 2024, 034.
- Ferragamo, A.; de Andres, D.; Sbriglio, A.; Cui, W.; De Petris, M.; Yepes, G.; Dupuis, R.; Jarraya, M.; Lahouli, I.; De Luca, F.; et al. THE THREE HUNDRED project: A machine learning method to infer clusters of galaxy mass radial profiles from mock Sunyaev-Zel’dovich maps. Mon. Not. R. Astron. Soc. 2023, 520, 4000–4008.
- Zhou, C.; Gu, Y.; Fang, G.; Lin, Z. Automatic Morphological Classification of Galaxies: Convolutional Autoencoder and Bagging-based Multiclustering Model. Astron. J. 2022, 163, 86.
- Alemi, A.; Poole, B.; Fischer, I.; Dillon, J.; Saurous, R.A.; Murphy, K. Fixing a Broken ELBO. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Dy, J., Krause, A., Eds.; PMLR (Proceedings of Machine Learning Research): London, UK, 2018; Volume 80, pp. 159–168.
- Semenov, V.; Tymchyshyn, V.; Bezguba, V.; Tsizh, M.; Khlevniuk, A. Galaxy morphological classification with manifold learning. Astron. Comput. 2025, 52, 100963.
- Zubieta, E.; Missel, R.; Sosa Fiscella, V.; Lousto, C.O.; del Palacio, S.; López Armengol, F.G.; García, F.; Combi, J.A.; Wang, L.; Combi, L.; et al. First results of the glitching pulsar monitoring programme at the Argentine Institute of Radioastronomy. Mon. Not. R. Astron. Soc. 2023, 521, 4504–4521.
- Zubieta, E.; Missel, R.; Araujo Furlan, S.B.; Lousto, C.O.; García, F.; del Palacio, S.; Gancio, G.; Combi, J.A.; Wang, L. Study of the 2024 major Vela glitch at the Argentine Institute of Radioastronomy. Astron. Astrophys. 2025, 698, A72.
- Lousto, C.O.; Missel, R.; Prajapati, H.; Sosa Fiscella, V.; Armengol, F.G.L.; Gyawali, P.K.; Wang, L.; Cahill, N.D.; Combi, L.; Palacio, S.d.; et al. Vela pulsar: Single pulses analysis with machine learning techniques. Mon. Not. R. Astron. Soc. 2022, 509, 5790–5808.
- Amaya, J.; Dupuis, R.; Innocenti, M.E.; Lapenta, G. Visualizing and Interpreting Unsupervised Solar Wind Classifications. Front. Astron. Space Sci. 2020, 7, 66.
- Teimoorinia, H.; Shishehchi, S.; Tazwar, A.; Lin, P.; Archinuk, F.; Gwyn, S.D.J.; Kavelaars, J.J. An Astronomical Image Content-based Recommendation System Using Combined Deep Learning Models in a Fully Unsupervised Mode. Astron. J. 2021, 161, 227.
- Forest, F.; Lebbah, M.; Azzag, H.; Lacaille, J. Deep Embedded SOM: Joint representation learning and self-organization. In Proceedings of the European Symposium on Artificial Neural Networks, Bruges, Belgium, 24–26 April 2019. [Google Scholar]
- Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
- Blei, D.M.; Jordan, M.I. Variational inference for Dirichlet process mixtures. Bayesian Anal. 2006, 1, 121–143. [Google Scholar] [CrossRef]
- Hao, J.; McKay, T.A.; Koester, B.P.; Rykoff, E.S.; Rozo, E.; Annis, J.; Wechsler, R.H.; Evrard, A.; Siegel, S.R.; Becker, M.; et al. A GMBCG Galaxy Cluster Catalog of 55,424 Rich Clusters from SDSS DR7. Astrophys. J. Suppl. Ser. 2010, 191, 254–274. [Google Scholar] [CrossRef]
- Duncan, K.J. All-purpose, all-sky photometric redshifts for the Legacy Imaging Surveys Data Release 8. Mon. Not. R. Astron. Soc. 2022, 512, 3662–3683. [Google Scholar] [CrossRef]
- Das, P.; Hawkins, K.; Jofré, P. Ages and kinematics of chemically selected, accreted Milky Way halo stars. Mon. Not. R. Astron. Soc. 2020, 493, 5195–5207. [Google Scholar] [CrossRef]
- D’Isanto, A.; Polsterer, K.L. Photometric redshift estimation via deep learning. Generalized and pre-classification-less, image based, fully probabilistic redshifts. Astron. Astrophys. 2018, 609, A111. [Google Scholar] [CrossRef]
- Lee, K.J.; Guillemot, L.; Yue, Y.L.; Kramer, M.; Champion, D.J. Application of the Gaussian mixture model in pulsar astronomy - pulsar classification and candidates ranking for the Fermi 2FGL catalogue. Mon. Not. R. Astron. Soc. 2012, 424, 2832–2840. [Google Scholar] [CrossRef]
- Cheng, T.Y.; Li, N.; Conselice, C.J.; Aragón-Salamanca, A.; Dye, S.; Metcalf, R.B. Identifying strong lenses with unsupervised machine learning using convolutional autoencoder. Mon. Not. R. Astron. Soc. 2020, 494, 3750–3765. [Google Scholar] [CrossRef]
- Tarricq, Y.; Soubiran, C.; Casamiquela, L.; Castro-Ginard, A.; Olivares, J.; Miret-Roig, N.; Galli, P.A.B. Structural parameters of 389 local open clusters. Astron. Astrophys. 2022, 659, A59. [Google Scholar] [CrossRef]
- MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 27 December 1965–7 January 1966; Volume 1, pp. 281–297. [Google Scholar]
- Arthur, D.; Vassilvitskii, S. K-Means++: The Advantages of Careful Seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–9 January 2007; Volume 8, pp. 1027–1035. [Google Scholar]
- Viticchié, B.; Sánchez Almeida, J. Asymmetries of the Stokes V profiles observed by HINODE SOT/SP in the quiet Sun. Astron. Astrophys. 2011, 530, A14. [Google Scholar] [CrossRef]
- Johnson, S.C. Hierarchical clustering schemes. Psychometrika 1967, 32, 241–254. [Google Scholar] [CrossRef]
- Dantas, M.L.L.; Smiljanic, R.; Boesso, R.; Rocha-Pinto, H.J.; Magrini, L.; Guiglion, G.; Tautvaišienė, G.; Gilmore, G.; Randich, S.; Bensby, T.; et al. The Gaia-ESO Survey: Old super-metal-rich visitors from the inner Galaxy. Astron. Astrophys. 2023, 669, A96. [Google Scholar] [CrossRef]
- Galli, P.A.B.; Loinard, L.; Bouy, H.; Sarro, L.M.; Ortiz-León, G.N.; Dzib, S.A.; Olivares, J.; Heyer, M.; Hernandez, J.; Román-Zúñiga, C.; et al. Structure and kinematics of the Taurus star-forming region from Gaia-DR2 and VLBI astrometry. Astron. Astrophys. 2019, 630, A137. [Google Scholar] [CrossRef]
- Kounkel, M.; Covey, K.; Suárez, G.; Román-Zúñiga, C.; Hernandez, J.; Stassun, K.; Jaehnig, K.O.; Feigelson, E.D.; Peña Ramírez, K.; Roman-Lopes, A.; et al. The APOGEE-2 Survey of the Orion Star-forming Complex. II. Six-dimensional Structure. Astron. J. 2018, 156, 84. [Google Scholar] [CrossRef]
- Hojnacki, S.M.; Kastner, J.H.; Micela, G.; Feigelson, E.D.; LaLonde, S.M. An X-Ray Spectral Classification Algorithm with Application to Young Stellar Clusters. Astrophys. J. 2007, 659, 585–598. [Google Scholar] [CrossRef]
- Tinker, J.; Kravtsov, A.V.; Klypin, A.; Abazajian, K.; Warren, M.; Yepes, G.; Gottlöber, S.; Holz, D.E. Toward a Halo Mass Function for Precision Cosmology: The Limits of Universality. Astrophys. J. 2008, 688, 709–728. [Google Scholar] [CrossRef]
- Behroozi, P.S.; Wechsler, R.H.; Wu, H.Y. The ROCKSTAR Phase-space Temporal Halo Finder and the Velocity Offsets of Cluster Cores. Astrophys. J. 2013, 762, 109. [Google Scholar] [CrossRef]
- Howlett, C.; Ross, A.J.; Samushia, L.; Percival, W.J.; Manera, M. The clustering of the SDSS main galaxy sample—II. Mock galaxy catalogues and a measurement of the growth of structure from redshift space distortions at z = 0.15. Mon. Not. R. Astron. Soc. 2015, 449, 848–866. [Google Scholar] [CrossRef]
- Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD; The Association for the Advancement of Artificial Intelligence: Washington, DC, USA, 1996; Volume 96, pp. 226–231. [Google Scholar]
- Castro-Ginard, A.; Jordi, C.; Luri, X.; Julbe, F.; Morvan, M.; Balaguer-Núñez, L.; Cantat-Gaudin, T. A new method for unveiling open clusters in Gaia. New nearby open clusters confirmed by DR2. Astron. Astrophys. 2018, 618, A59. [Google Scholar] [CrossRef]
- Zari, E.; Brown, A.G.A.; de Zeeuw, P.T. Structure, kinematics, and ages of the young stellar populations in the Orion region. Astron. Astrophys. 2019, 628, A123. [Google Scholar] [CrossRef]
- Yan, Q.Z.; Yang, J.; Su, Y.; Sun, Y.; Wang, C. Distances and Statistics of Local Molecular Clouds in the First Galactic Quadrant. Astrophys. J. 2020, 898, 80. [Google Scholar] [CrossRef]
- Price-Jones, N.; Bovy, J. Blind chemical tagging with DBSCAN: Prospects for spectroscopic surveys. Mon. Not. R. Astron. Soc. 2019, 487, 871–886. [Google Scholar] [CrossRef]
- Castro-Ginard, A.; Jordi, C.; Luri, X.; Cantat-Gaudin, T.; Carrasco, J.M.; Casamiquela, L.; Anders, F.; Balaguer-Núñez, L.; Badia, R.M. Hunting for open clusters in Gaia EDR3: 628 new open clusters found with OCfinder. Astron. Astrophys. 2022, 661, A118. [Google Scholar] [CrossRef]
- Hunt, E.L.; Reffert, S. Improving the open cluster census. I. Comparison of clustering algorithms applied to Gaia DR2 data. Astron. Astrophys. 2021, 646, A104. [Google Scholar] [CrossRef]
- Campello, R.J.G.B.; Moulavi, D.; Sander, J. Density-Based Clustering Based on Hierarchical Density Estimates. In Advances in Knowledge Discovery and Data Mining; Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 160–172. [Google Scholar]
- McInnes, L.; Healy, J.; Astels, S. hdbscan: Hierarchical density based clustering. J. Open Source Softw. 2017, 2, 205. [Google Scholar] [CrossRef]
- Koppelman, H.H.; Helmi, A.; Massari, D.; Price-Whelan, A.M.; Starkenburg, T.K. Multiple retrograde substructures in the Galactic halo: A shattered view of Galactic history. Astron. Astrophys. 2019, 631, L9. [Google Scholar] [CrossRef]
- Hunt, E.L.; Reffert, S. Improving the open cluster census. II. An all-sky cluster catalogue with Gaia DR3. Astron. Astrophys. 2023, 673, A114. [Google Scholar] [CrossRef]
- Kerr, R.M.P.; Rizzuto, A.C.; Kraus, A.L.; Offner, S.S.R. Stars with Photometrically Young Gaia Luminosities Around the Solar System (SPYGLASS). I. Mapping Young Stellar Structures and Their Star Formation Histories. Astrophys. J. 2021, 917, 23. [Google Scholar] [CrossRef]
- Webb, S.; Lochner, M.; Muthukrishna, D.; Cooke, J.; Flynn, C.; Mahabal, A.; Goode, S.; Andreoni, I.; Pritchard, T.; Abbott, T.M.C. Unsupervised machine learning for transient discovery in deeper, wider, faster light curves. Mon. Not. R. Astron. Soc. 2020, 498, 3077–3094. [Google Scholar] [CrossRef]
- Moranta, L.; Gagné, J.; Couture, D.; Faherty, J.K. New Coronae and Stellar Associations Revealed by a Clustering Analysis of the Solar Neighborhood. Astrophys. J. 2022, 939, 94. [Google Scholar] [CrossRef]
- Shank, D.; Komater, D.; Beers, T.C.; Placco, V.M.; Huang, Y. Dynamically Tagged Groups of Metal-poor Stars. II. The Radial Velocity Experiment Data Release 6. Astrophys. J. Suppl. Ser. 2022, 261, 19. [Google Scholar] [CrossRef]
- Cabrera Garcia, J.; Beers, T.C.; Huang, Y.; Li, X.Y.; Liu, G.; Zhang, H.; Hong, J.; Lee, Y.S.; Shank, D.; Gudin, D.; et al. Probing the Galactic halo with RR Lyrae stars—V. Chemistry, kinematics, and dynamically tagged groups. Mon. Not. R. Astron. Soc. 2024, 527, 8973–8990. [Google Scholar] [CrossRef]
- Dunn, J.C. A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters. J. Cybern. 1973, 3, 32–57. [Google Scholar] [CrossRef]
- Bezdek, J. Pattern Recognition With Fuzzy Objective Function Algorithms; Springer: New York, NY, USA, 1981. [Google Scholar] [CrossRef]
- Kruse, R.; Döring, C.; Lesot, M.J. Fundamentals of Fuzzy Clustering. In Advances in Fuzzy Clustering and its Applications; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2007; Chapter 1; pp. 1–30. [Google Scholar] [CrossRef]
- Colazo, M.; Alvarez-Candal, A.; Duffard, R. Zero-phase angle asteroid taxonomy classification using unsupervised machine learning algorithms. Astron. Astrophys. 2022, 666, A77. [Google Scholar] [CrossRef]
- Shi, L.; He, P. A Fast Fuzzy Clustering Algorithm for Large-Scale Datasets. In Advanced Data Mining and Applications; Li, X., Wang, S., Dong, Z.Y., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; pp. 203–208. [Google Scholar]
- Cheng, T.W.; Goldgof, D.B.; Hall, L.O. Fast fuzzy clustering. Fuzzy Sets Syst. 1998, 93, 49–56. [Google Scholar] [CrossRef]
- Szabó, G.M.; Kálmán, S.; Borsato, L.; Hegedűs, V.; Mészáros, S.; Szabó, R. Sub-Jovian desert of exoplanets at its boundaries. Parameter dependence along the main sequence. Astron. Astrophys. 2023, 671, A132. [Google Scholar] [CrossRef]
- Modak, S. Distinction of groups of gamma-ray bursts in the BATSE catalog through fuzzy clustering. Astron. Comput. 2021, 34, 100441. [Google Scholar] [CrossRef]
- Li, H. Fuzzy Cluster Analysis: Application to Determining Metallicities for Very Metal-poor Stars. Astrophys. J. 2021, 923, 183. [Google Scholar] [CrossRef]
- Barra, V.; Delouille, V.; Hochedez, J.F. Segmentation of extreme ultraviolet solar images via multichannel fuzzy clustering. Adv. Space Res. 2008, 42, 917–925. [Google Scholar] [CrossRef]
- Anilkumar, B.T.; Sabarinath, A. Grouping and long term prediction of sunspot cycle characteristics-A fuzzy clustering approach. Astron. Comput. 2024, 48, 100836. [Google Scholar] [CrossRef]
- Offner, S.S.R.; Taylor, J.; Markey, C.; Chen, H.H.H.; Pineda, J.E.; Goodman, A.A.; Burkert, A.; Ginsburg, A.; Choudhury, S. Turbulence, coherence, and collapse: Three phases for core evolution. Mon. Not. R. Astron. Soc. 2022, 517, 885–909. [Google Scholar] [CrossRef]
- Shan, H.; Wang, X.; Chen, X.; Yuan, J.; Nie, J.; Zhang, H.; Liu, N.; Wang, N. Wavelet based recognition for pulsar signals. Astron. Comput. 2015, 11, 55–63. [Google Scholar] [CrossRef]
- Bandyopadhyay, S.; Das, S.; Datta, A. Comparative Study and Development of Two Contour-Based Image Segmentation Techniques for Coronal Hole Detection in Solar Images. Sol. Phys. 2020, 295, 110. [Google Scholar] [CrossRef]
- Revathy, K.; Lekshmi, S.; Nayar, S.R.P. Fractal-Based Fuzzy Technique For Detection Of Active Regions From Solar Images. Sol. Phys. 2005, 228, 43–53. [Google Scholar] [CrossRef]
- Castro-Ginard, A.; McMillan, P.J.; Luri, X.; Jordi, C.; Romero-Gómez, M.; Cantat-Gaudin, T.; Casamiquela, L.; Tarricq, Y.; Soubiran, C.; Anders, F. Milky Way spiral arms from open clusters in Gaia EDR3. Astron. Astrophys. 2021, 652, A162. [Google Scholar] [CrossRef]
- Buder, S.; Lind, K.; Ness, M.K.; Feuillet, D.K.; Horta, D.; Monty, S.; Buck, T.; Nordlander, T.; Bland-Hawthorn, J.; Casey, A.R.; et al. The GALAH Survey: Chemical tagging and chrono-chemodynamics of accreted halo stars with GALAH+ DR3 and Gaia eDR3. Mon. Not. R. Astron. Soc. 2022, 510, 2407–2436. [Google Scholar] [CrossRef]
- Talbot, C.; Thrane, E. Flexible and Accurate Evaluation of Gravitational-wave Malmquist Bias with Machine Learning. Astrophys. J. 2022, 927, 76. [Google Scholar] [CrossRef]
- Myeong, G.C.; Belokurov, V.; Aguado, D.S.; Evans, N.W.; Caldwell, N.; Bradley, J. Milky Way’s Eccentric Constituents with Gaia, APOGEE, and GALAH. Astrophys. J. 2022, 938, 21. [Google Scholar] [CrossRef]
- Morales-Luis, A.B.; Sánchez Almeida, J.; Aguerri, J.A.L.; Muñoz-Tuñón, C. Systematic Search for Extremely Metal-poor Galaxies in the Sloan Digital Sky Survey. Astrophys. J. 2011, 743, 77. [Google Scholar] [CrossRef]
- Hasselquist, S.; Carlin, J.L.; Holtzman, J.A.; Shetrone, M.; Hayes, C.R.; Cunha, K.; Smith, V.; Beaton, R.L.; Sobeck, J.; Allende Prieto, C.; et al. Identifying Sagittarius Stream Stars by Their APOGEE Chemical Abundance Signatures. Astrophys. J. 2019, 872, 58. [Google Scholar] [CrossRef]
- Mackereth, J.T.; Schiavon, R.P.; Pfeffer, J.; Hayes, C.R.; Bovy, J.; Anguiano, B.; Allende Prieto, C.; Hasselquist, S.; Holtzman, J.; Johnson, J.A.; et al. The origin of accreted stellar halo populations in the Milky Way using APOGEE, Gaia, and the EAGLE simulations. Mon. Not. R. Astron. Soc. 2019, 482, 3426–3442. [Google Scholar] [CrossRef]
- Asadi, V.; Haghi, H.; Zonoozi, A.H. Semi-supervised classification of stars, galaxies and quasars using K-means and random-forest approaches. arXiv 2025, arXiv:2507.14072. [Google Scholar] [CrossRef]
- Hogg, D.W.; Casey, A.R.; Ness, M.; Rix, H.W.; Foreman-Mackey, D.; Hasselquist, S.; Ho, A.Y.Q.; Holtzman, J.A.; Majewski, S.R.; Martell, S.L.; et al. Chemical Tagging Can Work: Identification of Stellar Phase-space Structures Purely by Chemical-abundance Similarity. Astrophys. J. 2016, 833, 262. [Google Scholar] [CrossRef]
- Bose, S.; Joshi, J.; Henriques, V.M.J.; Rouppe van der Voort, L. Spicules and downflows in the solar chromosphere. Astron. Astrophys. 2021, 647, A147. [Google Scholar] [CrossRef]
- Hayes, J.J.C.; Kerins, E.; Awiphan, S.; McDonald, I.; Morgan, J.S.; Chuanraksasat, P.; Komonjinda, S.; Sanguansak, N.; Kittara, P.; SPEARNet Collaboration. Optimizing exoplanet atmosphere retrieval using unsupervised machine-learning classification. Mon. Not. R. Astron. Soc. 2020. [Google Scholar] [CrossRef]
- Sreehari, H.; Nandi, A. A machine learning approach for classification of accretion states of black hole binaries. Mon. Not. R. Astron. Soc. 2021, 502, 1334–1343. [Google Scholar] [CrossRef]
- Barchi, P.H.; da Costa, F.G.; Sautter, R.; Moura, T.C.; Stalder, D.H.; Rosa, R.R.; de Carvalho, R.R. Improving galaxy morphology with machine learning. arXiv 2017, arXiv:1705.06818. [Google Scholar] [CrossRef]
- Henshaw, J.D.; Ginsburg, A.; Haworth, T.J.; Longmore, S.N.; Kruijssen, J.M.D.; Mills, E.A.C.; Sokolov, V.; Walker, D.L.; Barnes, A.T.; Contreras, Y.; et al. ‘The Brick’ is not a brick: A comprehensive study of the structure and dynamics of the central molecular zone cloud G0.253+0.016. Mon. Not. R. Astron. Soc. 2019, 485, 2457–2485. [Google Scholar] [CrossRef]
- Carruba, V.; Aljbaae, S.; Lucchini, A. Machine-learning identification of asteroid groups. Mon. Not. R. Astron. Soc. 2019, 488, 1377–1386. [Google Scholar] [CrossRef]
- Barnes, A.T.; Henshaw, J.D.; Caselli, P.; Jiménez-Serra, I.; Tan, J.C.; Fontani, F.; Pon, A.; Ragan, S. Similar complex kinematics within two massive, filamentary infrared dark clouds. Mon. Not. R. Astron. Soc. 2018, 475, 5268–5289. [Google Scholar] [CrossRef]
- Castro-Ginard, A.; Jordi, C.; Luri, X.; Álvarez Cid-Fuentes, J.; Casamiquela, L.; Anders, F.; Cantat-Gaudin, T.; Monguió, M.; Balaguer-Núñez, L.; Solà, S.; et al. Hunting for open clusters in Gaia DR2: 582 new open clusters in the Galactic disc. Astron. Astrophys. 2020, 635, A45. [Google Scholar] [CrossRef]
- Prisinzano, L.; Damiani, F.; Sciortino, S.; Flaccomio, E.; Guarcello, M.G.; Micela, G.; Tognelli, E.; Jeffries, R.D.; Alcalá, J.M. Low-mass young stars in the Milky Way unveiled by DBSCAN and Gaia EDR3: Mapping the star forming regions within 1.5 kpc. Astron. Astrophys. 2022, 664, A175. [Google Scholar] [CrossRef]
- Plšek, T.; Werner, N.; Topinka, M.; Simionescu, A. CAvity DEtection Tool (CADET): Pipeline for detection of X-ray cavities in hot galactic and cluster atmospheres. Mon. Not. R. Astron. Soc. 2024, 527, 3315–3346. [Google Scholar] [CrossRef]
- Nidever, D.L.; Dey, A.; Fasbender, K.; Juneau, S.; Meisner, A.M.; Wishart, J.; Scott, A.; Matt, K.; Nikutta, R.; Pucha, R. Second Data Release of the All-sky NOIRLab Source Catalog. Astron. J. 2021, 161, 192. [Google Scholar] [CrossRef]
- Kounkel, M.; Covey, K. Untangling the Galaxy. I. Local Structure and Star Formation History of the Milky Way. Astron. J. 2019, 158, 122. [Google Scholar] [CrossRef]
- Patel, V.; Hora, J.L.; Ashby, M.L.N.; Vig, S. Identification of Outer Galaxy Cluster Members Using Gaia DR3 and Multidimensional Simulation. arXiv 2025, arXiv:2507.09721. [Google Scholar] [CrossRef]
- Hendy, Y.H.M.; Shokry, A.; Takey, A.; Aboueisha, M.S. The first CCD photometric study of the member eclipsing binary ZTF J060425.73+365000.1 in the newly discovered young open cluster UBC 68. New Astron. 2025, 119, 102392. [Google Scholar] [CrossRef]
- Thakur, P.S.; Verma, R.K.; Tiwari, R. Analysis of Time Complexity of K-Means and Fuzzy C-Means Clustering Algorithm. Eng. Math. Lett. 2024, 2024, 4. [Google Scholar] [CrossRef]
- Benvenuto, F.; Piana, M.; Campi, C.; Massone, A.M. A Hybrid Supervised/Unsupervised Machine Learning Approach to Solar Flare Prediction. Astrophys. J. 2018, 853, 90. [Google Scholar] [CrossRef]
- Richards, J.W.; Starr, D.L.; Miller, A.A.; Bloom, J.S.; Butler, N.R.; Brink, H.; Crellin-Quick, A. Construction of a Calibrated Probabilistic Classification Catalog: Application to 50k Variable Sources in the All-Sky Automated Survey. Astrophys. J. Suppl. Ser. 2012, 203, 32. [Google Scholar] [CrossRef]
- Villaescusa-Navarro, F.; Anglés-Alcázar, D.; Genel, S.; Spergel, D.N.; Somerville, R.S.; Dave, R.; Pillepich, A.; Hernquist, L.; Nelson, D.; Torrey, P.; et al. The CAMELS Project: Cosmology and Astrophysics with Machine-learning Simulations. Astrophys. J. 2021, 915, 71. [Google Scholar] [CrossRef]
- Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation Forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422. [Google Scholar] [CrossRef]
- Wen, J.; Ahmadzadeh, A.; Georgoulis, M.K.; Sadykov, V.M.; Angryk, R.A. Outlier Detection and Removal in Multivariate Time Series for a More Robust Machine Learning–based Solar Flare Prediction. Astrophys. J. Suppl. Ser. 2025, 277, 60. [Google Scholar] [CrossRef]
- Pruzhinskaya, M.V.; Malanchev, K.L.; Kornilov, M.V.; Ishida, E.E.O.; Mondon, F.; Volnova, A.A.; Korolev, V.S. Anomaly detection in the Open Supernova Catalog. Mon. Not. R. Astron. Soc. 2019, 489, 3591–3608. [Google Scholar] [CrossRef]
- Villar, V.A.; Cranmer, M.; Berger, E.; Contardo, G.; Ho, S.; Hosseinzadeh, G.; Lin, J.Y.Y. A Deep-learning Approach for Live Anomaly Detection of Extragalactic Transients. Astrophys. J. Suppl. Ser. 2021, 255, 24. [Google Scholar] [CrossRef]
- Sánchez-Sáez, P.; Lira, H.; Martí, L.; Sánchez-Pi, N.; Arredondo, J.; Bauer, F.E.; Bayo, A.; Cabrera-Vives, G.; Donoso-Oliva, C.; Estévez, P.A.; et al. Searching for Changing-state AGNs in Massive Data Sets. I. Applying Deep Learning and Anomaly-detection Techniques to Find AGNs with Anomalous Variability Behaviors. Astron. J. 2021, 162, 206. [Google Scholar] [CrossRef]
- Chan, H.S.; Villar, V.A.; Cheung, S.H.; Ho, S.; O’Grady, A.J.G.; Drout, M.R.; Renzo, M. Searching for Anomalies in the ZTF Catalog of Periodic Variable Stars. Astrophys. J. 2022, 932, 118. [Google Scholar] [CrossRef]
- Lochner, M.; Bassett, B.A. ASTRONOMALY: Personalised active anomaly detection in astronomical data. Astron. Comput. 2021, 36, 100481. [Google Scholar] [CrossRef]
- Angelis, D.; Sofos, F.; Karakasidis, T.E. Artificial Intelligence in Physical Sciences: Symbolic Regression Trends and Perspectives. Arch. Comput. Methods Eng. 2023, 30, 3845–3865. [Google Scholar] [CrossRef]
- Schmidt, M.; Lipson, H. Symbolic Regression of Implicit Equations. In Genetic Programming Theory and Practice VII; Riolo, R., O’Reilly, U.M., McConaghy, T., Eds.; Springer: Boston, MA, USA, 2010; pp. 73–85. [Google Scholar] [CrossRef]
- Llorella, F.R.; Cebrian, J.A. Exploring Symbolic Regression and Genetic Algorithms for Astronomical Object Classification. Open J. Astrophys. 2025, 8, 27. [Google Scholar] [CrossRef]
- Tan, B. Neural infalling cloud equations (NICE): Increasing the efficacy of subgrid models and scientific equation discovery using neural ODEs and symbolic regression. Mon. Not. R. Astron. Soc. 2025, 537, 3383–3395. [Google Scholar] [CrossRef]
- Lemos, P.; Jeffrey, N.; Cranmer, M.; Ho, S.; Battaglia, P. Rediscovering orbital mechanics with machine learning. Mach. Learn. Sci. Technol. 2023, 4, 045002. [Google Scholar] [CrossRef]
- Delgado, A.M.; Wadekar, D.; Hadzhiyska, B.; Bose, S.; Hernquist, L.; Ho, S. Modelling the galaxy-halo connection with machine learning. Mon. Not. R. Astron. Soc. 2022, 515, 2733–2746. [Google Scholar] [CrossRef]
- Gebhardt, M.; Anglés-Alcázar, D.; Borrow, J.; Genel, S.; Villaescusa-Navarro, F.; Ni, Y.; Lovell, C.C.; Nagai, D.; Davé, R.; Marinacci, F.; et al. Cosmological baryon spread and impact on matter clustering in CAMELS. Mon. Not. R. Astron. Soc. 2024, 529, 4896–4913. [Google Scholar] [CrossRef]
- Kuang, Y.; Wang, J.; Huang, H.; Ye, M.; Zhu, F.; Li, X.; Hao, J.; Wu, F. Advancing Symbolic Discovery on Unsupervised Data: A Pre-training Framework for Non-degenerate Implicit Equation Discovery. arXiv 2025, arXiv:2505.03130. [Google Scholar]
- Lima, M.; Cunha, C.E.; Oyaizu, H.; Frieman, J.; Lin, H.; Sheldon, E.S. Estimating the redshift distribution of photometric galaxy samples. Mon. Not. R. Astron. Soc. 2008, 390, 118–130. [Google Scholar] [CrossRef]
- Balakrishnan, V.; Champion, D.; Barr, E.; Kramer, M.; Sengar, R.; Bailes, M. Pulsar candidate identification using semi-supervised generative adversarial networks. Mon. Not. R. Astron. Soc. 2021, 505, 1180–1194. [Google Scholar] [CrossRef]
- Rahmani, S.; Teimoorinia, H.; Barmby, P. Classifying galaxy spectra at 0.5 < z < 1 with self-organizing maps. Mon. Not. R. Astron. Soc. 2018, 478, 4416–4432. [Google Scholar] [CrossRef]
- Stein, G.; Blaum, J.; Harrington, P.; Medan, T.; Lukić, Z. Mining for Strong Gravitational Lenses with Self-supervised Learning. Astrophys. J. 2022, 932, 107. [Google Scholar] [CrossRef]
- Kumar, A.; Kumar, V. Hybrid-Ensemble Deep-Learning Models to Enhance the Sunspot Prediction and Forecasting of Solar Cycle 26. Sol. Phys. 2025, 300, 100. [Google Scholar] [CrossRef]
- Angarita, Y.; Chaparro, G.; Lumsden, S.L.; Walsh, C.; Avison, A.; Asabre Frimpong, N.; Fuller, G.A. Pattern finding in millimetre-wave spectra of massive young stellar objects. Astron. Astrophys. 2025, 694, A20. [Google Scholar] [CrossRef]
- Shojaeian, A.; Shafizadeh-Moghadam, H.; Sharafati, A.; Shahabi, H. Extreme flash flood susceptibility mapping using a novel PCA-based model stacking approach. Adv. Space Res. 2024, 74, 5371–5382. [Google Scholar] [CrossRef]
- Bussov, M.; Nättilä, J. Segmentation of turbulent computational fluid dynamics simulations with unsupervised ensemble learning. Signal Process. Image Commun. 2021, 99, 116450. [Google Scholar] [CrossRef]
- Yuan, Z.; Chang, J.; Banerjee, P.; Han, J.; Kang, X.; Smith, M.C. StarGO: A New Method to Identify the Galactic Origins of Halo Stars. Astrophys. J. 2018, 863, 26. [Google Scholar] [CrossRef]
- Li, J.; Ray, S.; Lindsay, B.G. A Nonparametric Statistical Approach to Clustering via Mode Identification. J. Mach. Learn. Res. 2007, 8, 1687–1723. [Google Scholar]
- Bovy, J.; Hogg, D.W.; Roweis, S.T. Extreme deconvolution: Inferring complete distribution functions from noisy, heterogeneous and incomplete observations. Ann. Appl. Stat. 2011, 5, 1657–1677. [Google Scholar] [CrossRef]
- Bhave, A.; Kulkarni, S.; Desai, S.; Srijith, P.K. Two dimensional clustering of Gamma-Ray Bursts using durations and hardness. Astrophys. Space Sci. 2022, 367, 39. [Google Scholar] [CrossRef]

| Algorithm | Runtime Scaling 1 | Classic Applications | Novel Applications | Popular Datasets and Data Types | Strengths | Weaknesses and Failure Modes | Overall Evaluation |
|---|---|---|---|---|---|---|---|
| PCA | [23] | Denoising and PCA correction 2 applied to remove unwanted systematic variations [79,80,81,82] Finding the major driver of variation, on which subsequent classification is performed [83,84,85] | Given a stellar catalog, selecting high-quality samples used for further ML training or analysis [86]. | SDSS [81,85] Spectra [80,83,85] | Can be automated and thus can be applied to large datasets [81]. Results are highly interpretable. Very fast. | Linear and thus may not reveal differences between clusters. Denoising is unreliable when the systematic uncertainty varies case by case [79]. | Simple and widely applied. Fast ⇔ Linear and simple. |
| Isomap | Traditional: 3. Floyd–Warshall: Dijkstra: [16,23] | Reducing the dimensionality of data before applying supervised learning (e.g., XGBoost, Random Forest, SVM) [36,40,46] | Spectral outlier detection, which finds mostly the spectra of spectroscopic binaries, but the method becomes less effective at low signal-to-noise ratio (SNR), where outliers lie close to the main clusters [87]. | SDSS [36,40,46,87] Stellar spectra [36,40,46,87] and latent space [41,88] | Isomap, being a non-linear method, projects clusters more distinctly and can thus be used for clearer visualization [88]. | When the SNR is low, the clusters may be projected close to each other, reducing the accuracy of clustering [87]. The program needs to be re-run from the start if new data points are added. | Moderate cost; one can reduce cost by selecting a good training set 4. Separation of clusters ⇔ Sensitive to noise. |
| LLE | [16] | Serving as a component of automated classification | Vanderplas and Connolly [44] were the first to systematically apply LLE to SDSS spectral classification. | RAVE [47,89] and SDSS [44,48] Light curves [7,49,90], stellar and galactic spectra [44,47,48,91] | Better preserves the relations between similar data points, which is valuable information for classification [44,47,48]. Once the lower-dimensional space is defined, classification based on LLE is fast and can be automated [7]. Low number of free parameters [7]. | Computational cost limits its use on large datasets. Sensitive to outliers; one solution is to use robust LLE [44,92]. May not generate meaningful results for small datasets [48]. Requires trial and error to find the best parameters [47]. The program needs to be re-run from the start if new data points are added [48]. Ward and Lumsden [39] note that LLE can fail to distinguish protostar spectra when continuum emission dominates; removing the continuum improves the performance of LLE. | Moderate cost; one can reduce cost by selecting a good training set. Better suited to large, low-dimensional datasets. |
| t-SNE | Traditional: Barnes–Hut: [23] | From a dataset, creating a subset of data points of research interest [57,93,94] Performing chemical tagging or very-similar classification [56,95,96] Investigating Gamma-ray bursts [58,97,98] | Given stellar photometric data, identifying stellar population and facilitating the classification of outlier spectral energy distributions [99]. | Multi-source dataset [56,57] and GALAH [94,96] Catalog [55,56] | Performs well on high-dimensional datasets [54,55]. | Relatively high computational cost and relatively slow [54]. Requires parameter fine-tuning (especially perplexity) by inspecting the results. Choosing suboptimal values can distort the low-dimensional embedding, potentially producing misleading clusters or relationships that do not reflect the true structure of the data [56,58]. The program needs to be re-run from the start if new data points are added [54]. | Similar to LLE’s. |
| SOM | [100] | Redshift calibration [62,63,101,102] Dimensionality reduction (and clustering) by projecting the feature space onto a 2D grid of neurons [63,101,103,104] Galaxy visualization and classification [105,106] | Viscasillas Vázquez et al. [107] use SOM to cluster Gaia spectral data to identify hot subdwarf binaries. A small dataset is used. | KiDS [62,63,102] Photometric [62,63,103] data, typically galactic | SOM not only reduces the dimension but also clusters the data points into cells, and thus can be used to identify gold samples 5 [63,102] or generate prototypes that represent the cells [104]. Once trained, other catalogs can be mapped onto the same projection [65]. | A small number of neurons tends to be dominated by the most common features of the dataset, so complex and rare morphologies may be lost. Representing the features better requires more neurons, which increases the computational cost. To reduce the computational cost, one may adopt hierarchical SOM layers [104]. | SOM is usually used to reduce the dimension to 2D. Projecting the data in 3D has little impact on the results [101]. Large computational cost limits its application to large datasets [104]. |
| AE | Depends on the number of epochs required, the data size, and hardware | Convolutional AE (CAE, [108]) [71,109,110] and VAE [75,76,111,112,113,114] Anomaly detection [75,99,109,110] Feature extraction and dim red [71,109,115,116] Generative model: Producing new data points that resemble the original dataset by decoding the latent space learned [111,114] | Scatter VAE, which can generate a Gaia BP/RP spectrum and simultaneously estimate intrinsic scatter for individual spectra [113]. | Galaxy Zoo [71,109,114] Simulations [76,111,115] and images [71,109,112,114,115,116] The images are mostly galactic | VAE is interpretable and stable and thus ideal among generative models [111]. | Training the VAE takes a long time [75]. The result of anomaly detection is very sensitive to the hyperparameters. E.g., if the AE is trained using brighter objects, and dimmer objects are introduced, the AE may view dimmer objects as anomalies [99]. For the VAE, the reconstructed feature space may not be very similar to the input feature space, which suggests that the decoder may lack sufficient power. However, if the decoder power increases, the VAE may experience posterior collapse, where the latent space is ignored [117]. | The VAE is more computationally expensive than the AE. The AE and CAE are widely applied to reduce the size of data, decreasing the computational cost of further data mining. Low output quality ⇔ High stability [111]. Takes a long time to train ⇔ Becomes fast once trained. |
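
The PCA denoising listed in the table above can be illustrated with a minimal sketch. It is not taken from any of the cited works: the synthetic "spectra", the noise level, the two sinusoidal templates, and the helper name `pca_denoise` are all assumptions made for illustration. The idea is to keep only the leading principal components and reconstruct the data from them, which discards the variance carried by the remaining (mostly noise) components.

```python
import numpy as np


def pca_denoise(X, n_components):
    """Reconstruct X from its leading principal components (hypothetical helper)."""
    mean = X.mean(axis=0)
    Xc = X - mean                          # center each feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:n_components]                # keep the leading axes only
    scores = Xc @ V_k.T                    # low-dimensional representation
    return scores @ V_k + mean             # map back to the original space


# Synthetic data (assumption): 200 "spectra" built from 2 templates plus noise.
rng = np.random.default_rng(0)
n_spectra, n_pixels = 200, 50
wave = np.linspace(0.0, 1.0, n_pixels)
templates = np.vstack([np.sin(2 * np.pi * wave), np.cos(2 * np.pi * wave)])
clean = rng.normal(size=(n_spectra, 2)) @ templates
noisy = clean + 0.3 * rng.normal(size=clean.shape)

denoised = pca_denoise(noisy, n_components=2)

# Root-mean-square deviation from the noise-free data, before and after.
rms_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rms_denoised = np.sqrt(np.mean((denoised - clean) ** 2))
print(f"RMS error before: {rms_noisy:.3f}, after: {rms_denoised:.3f}")
```

In this toy setting the number of retained components is known in advance. With real data it has to be chosen, and, as the table notes, the approach may be unreliable when the systematic variation differs from case to case.
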
| Data Type | PCA | MDS | Isomap | LLE | t-SNE | SOM | AE | Total |
|---|---|---|---|---|---|---|---|---|
| No. | | | | | | | | |
| Spectral data | 251 | 1 | 2 | 5 | 18 | 41 | 22 | 340 |
| Spectroscopic data | 256 | 0 | 1 | 1 | 28 | 30 | 6 | 322 |
| Image | 106 | 0 | 1 | 0 | 7 | 26 | 16 | 156 |
| Photometry data | 46 | 0 | 0 | 2 | 7 | 51 | 1 | 107 |
| Catalogs | 35 | 0 | 1 | 5 | 14 | 38 | 3 | 96 |
| Light curves | 31 | 0 | 0 | 3 | 4 | 9 | 4 | 51 |
| Astrometric data | 9 | 3 | 0 | 0 | 2 | 4 | 0 | 18 |
| Radio-astronomy | 4 | 0 | 0 | 0 | 1 | 4 | 3 | 12 |
| Polarimetric data | 9 | 0 | 0 | 0 | 0 | 0 | 1 | 10 |
| Latent space | 3 | 0 | 1 | 0 | 0 | / | / | 4 |
| Total No. | 750 | 4 | 6 | 16 | 81 | 203 | 56 | 1116 |
| Algorithm | Runtime Scaling 1 | Classic Applications | Novel Applications | Popular Datasets and Data Types | Strengths | Weaknesses and Failure Modes | Overall Evaluation |
|---|---|---|---|---|---|---|---|
| GMM | [151] | Studying open clusters (OCs) [133,151,176] Clustering the objects such that one or more clusters exhibit properties of scientific interest [131,176,177] | A novel pre-processing method [178] that enhances the performance of the GMM (and potentially other density estimation algorithms), with applications so far limited to gravitational wave and black hole astronomy. | Gaia [133,151,176] Stellar catalog [133,151,176] and photometry data [127,128] | Fast and simple. Effective at finding membership lists of OCs [151]. | Does not deal with noise ⇒ Expensive and ineffective in the presence of noise [151]. Struggles to represent a non-Gaussian cluster using a single Gaussian distribution and may split the cluster into multiple separate but related clusters [179]. Hard-boundary cuts applied to the data may degrade the performance. A solution is to exclude the dimensions that were used for hard-boundary selection of the data [179]. Sensitive to the number of clusters [151]. The GMM fails when there are severe outliers in a large dataset. The GMM splits field stars into numerous clusters; when there are more field stars and a greater number of components, the size of the routine () increases drastically with the size of the dataset 2 [151]. This is improved by applying the GMM to sub-partitions. | Inefficient for blind search of a large dataset with a lot of noise [151]. Fast and simple ⇔ More sensitive to convex-shaped clusters than to other shapes. |
| K-means | [151] | Clustering the objects such that one or more clusters exhibit properties of scientific interest [180,181,182] | K-means is used as a component of a semi-supervised learning framework (Section 5), used in combination with Random Forest to classify spectroscopic data of stars, galaxies, and quasars [183]. | APOGEE [181,182,184] Spectroscopic data, where most examples use chemical abundance [180,181,182,183,184,185,186] | Fast and simple [180,186]. Applicable to large datasets [180]. Effective for near-normally distributed isotropic clusters in high-dim space [184]. | Greatly depends on initialization and thus needs to be repeated with different initializations [180,183]. Requires a known number of clusters [184]. More sensitive to spherical clusters than other shapes [183,184]. Does not deal with noise [184]. | Does not deal with noise ⇒ More effective if the clusters are dense [184]. Fast and simple ⇔ Assumes clusters are spherical and of similar sizes [183]. |
| HC | Traditional: FOF: [100] | Agglomerative HC [100,139,187,188,189] Clustering the objects so one can study the groups separately [100,140,187,190] Clustering to reveal the structure of the object (i.e., the dataset) [188,189,191] | Data points are first clustered into SOM cells (i.e., neurons), and the cells are further clustered using HC. Compared with HC alone, this reduces the computational cost. Compared with SOM alone, it is more flexible and makes it faster to view the results for different numbers of clusters [100]. | Different databases are used (e.g., KiDS [100], Gaia [139], ALMA [189], and multi-source datasets [140,187]) Photometric data [100,187], astrometric data [139,140], and catalog [188,190] | Robust, flexible, and interpretable [100]. Dendrograms can be very informative and interpretable, providing information on how close or similar two clusters are [139] or showing the hierarchical structure of the objects [189,191]. | High computational cost [100]. Performs poorly on overlapping clusters [190]. | High computational cost ⇔ Provides informative dendrogram and does not require re-running to explore different numbers of clusters. Best used to study hierarchically structured large clusters. |
| DBSCAN | [151] | Searching for OC candidates [146,151,192] and molecular cloud candidates [148] In broader terms: providing membership lists for clusters, with one or multiple clusters subsequently analyzed [146,147,148,151,192,193] | DBSCAN is a component of the Cavity Detection Tool (CADET), an automated machine learning pipeline for detecting and measuring X-ray cavities. DBSCAN is used to cluster processed pixels [194]. | Gaia [147] Astrometric data [146,147,151,192] and catalog [193,195] | Good scalability 3 and can be used on large datasets [146,151]. Can be used for blind searching [151]. Can discard small clusters [151]. Does not require an a priori number of clusters [192,193]. Can detect arbitrarily shaped clusters [146,192,193]. | Sensitive to hyperparameters [151]. Applies the same requirements to all clusters (i.e., has global parameters) and thus may not detect clusters of different densities [146,147,151]. A solution is to divide the feature space into multiple smaller regions for clustering [146]. Does not handle uncertainty [195]. Low sensitivity: May fail to cluster diffuse clusters [147]. Given photometric and astrometric data, DBSCAN may fail to differentiate two groups that have different kinematics but are spatially close [147]. | Can be applied to large datasets but may require dividing the feature space into regions for separate clustering [147]. More sensitive to distant OCs [151]. |
| HDBSCAN | [151] | Providing membership lists for clusters, with each cluster subsequently analyzed [56,133,151,196] Providing information about the substructures of a group [154,156] | HDBSCAN-MC: Integrating Monte Carlo simulation with HDBSCAN to better handle astrometric data with significant uncertainties [197]. | Gaia [133,151,156,196,197] Astrometric data [133,151,156,196,197] and photometric data [156,198] | Able to reveal the substructures [151,156]. High sensitivity: Can find clusters of different densities [133,151] and small-scale structure [154]. Parameters are relatively straightforward to interpret when applied to detecting open clusters [151]. | Low precision: Requires careful post-processing to select valid clusters and remove false positives [151]. Prone to finding clusters in the densest region [151]. Not effective for diffuse clusters with a low SNR [151]. Given astrometric data, HDBSCAN may fail to detect both nearby and distant clusters at the same time when the parallax range is large [196]. | High sensitivity ⇔ Low precision [151]. Higher sensitivity to nearby OCs [151]. Acceptable runtime: Does not increase significantly compared to DBSCAN [151]. |
| FCM | [199] | Visualizing the structure of the dataset or the shape of the clusters [172,174,175] Studying the Sun [171,174,175,200] | Application in astronomy is limited. A newer application is to cluster sunspot cycles [171]. There are many cases where fuzzy logic is applied to clustering sunspot cycles, and [171] is the first to use FCM. | Various databases (e.g., simulation [172], SDSS [164], The ATNF Pulsar Catalogue [173], and APOGEE [167]) Image [174,175], catalog [167,173], and time series [171,172] | Shows the membership probability of each data point, which can be used to compile a list of ambiguous targets that require re-classification [164]. Useful if the cluster boundaries are not clear [174]. | Not applicable to large datasets that have a lot of clusters. The runtime () increases drastically with the number of clusters m [199]. | Should be applied if the clusters overlap. Soft clustering and thus informative ⇔ High computational cost. |
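
The DBSCAN row above notes that the algorithm needs no a priori number of clusters and can discard noise, but that its global parameters may miss clusters of different densities. The sketch below illustrates this with a compact pure-Python DBSCAN. It is an illustration under stated assumptions, not the code used in the cited studies: the function name `dbscan`, the brute-force neighbor search, and the three synthetic grids (two dense groups close together and one sparse group far away) are hypothetical.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE, using brute-force DBSCAN."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i (the point itself is included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:    # j is a core point: keep expanding
                queue.extend(reach)
    return labels


# Synthetic data (hypothetical): dense groups A and B, separated by a gap of 1.0,
# and a sparse group C with a point spacing of 1.0.
dense_a = [(x / 10, y / 10) for x in range(4) for y in range(4)]
dense_b = [((13 + x) / 10, y / 10) for x in range(4) for y in range(4)]
sparse_c = [(20.0 + x, 20.0 + y) for x in range(4) for y in range(4)]
points = dense_a + dense_b + sparse_c

# A small eps separates A and B but labels every point of C as noise.
tight = dbscan(points, eps=0.15, min_pts=4)
# A large eps recovers C but merges A and B into a single cluster.
loose = dbscan(points, eps=1.5, min_pts=4)
```

In this toy case no single `eps` recovers all three groups, which is the motivation for the workaround mentioned in the table of dividing the feature space into smaller regions and clustering them separately.
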
| Data Type | GMM | K-Means | HC | DBSCAN | HDBSCAN | FCM | Total |
|---|---|---|---|---|---|---|---|
| No. | | | | | | | |
| Spectroscopic data | 59 | 54 | 24 | 9 | 24 | 0 | 170 |
| Spectral data | 43 | 76 | 19 | 18 | 11 | 1 | 168 |
| Catalogs | 48 | 27 | 22 | 31 | 22 | 1 | 151 |
| Image | 36 | 70 | 13 | 13 | 2 | 3 | 137 |
| Photometric data | 51 | 14 | 8 | 14 | 26 | 1 | 114 |
| Astrometric data | 4 | 1 | 6 | 16 | 22 | 1 | 50 |
| Bivariate data | 11 | 11 | 12 | 1 | 1 | 0 | 36 |
| Light curves | 10 | 9 | 3 | 3 | 3 | 0 | 28 |
| Polarimetric data | 2 | 7 | 0 | 0 | 0 | 0 | 9 |
| Latent space | 3 | 0 | 0 | 0 | 0 | 0 | 3 |
| Total No. | 267 | 269 | 107 | 105 | 111 | 7 | 866 |
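
The clustering comparison table credits FCM with providing a membership probability for each data point, which can be used to list ambiguous targets. As a rough sketch of how such soft memberships arise, the code below evaluates the standard FCM membership formula for fixed cluster centers. It covers only the membership step, not the full iterative algorithm. The names `fcm_memberships` and `ambiguous`, the `fuzziness` exponent of 2 (distinct from the number of clusters m in the table), the 0.6 threshold, and the example coordinates are assumptions made for illustration.

```python
import math


def fcm_memberships(point, centers, fuzziness=2.0):
    """Return the FCM membership of `point` in each cluster center."""
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (fuzziness - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (fuzziness - 1)); memberships sum to 1.
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


def ambiguous(points, centers, threshold=0.6):
    """Indices of points whose largest membership is below `threshold`."""
    return [i for i, p in enumerate(points)
            if max(fcm_memberships(p, centers)) < threshold]


# Hypothetical cluster centers and targets in a 2D feature space.
centers = [(0.0, 0.0), (4.0, 0.0)]
targets = [(0.5, 0.0), (2.0, 0.0), (3.8, 0.1)]

midpoint_membership = fcm_memberships(targets[1], centers)
flagged = ambiguous(targets, centers)
```

The target halfway between the two centers receives equal membership in both clusters and is the only one flagged, whereas a hard-assignment method such as K-means would place it in one cluster without signaling the ambiguity.
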
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kuo, C.-T.; Xu, D.; Friesen, R. A Brief Review of Unsupervised Machine Learning Algorithms in Astronomy: Dimensionality Reduction and Clustering. Universe 2025, 11, 412. https://doi.org/10.3390/universe11120412

