The miniJPAS and J-NEP Surveys: Machine Learning for Star-Galaxy Separation
Abstract
1. Introduction
2. Data
- SDSS DR12 (Section 2.1): the final Data Release (DR12) of the third-generation Sloan Digital Sky Survey (SDSS-III [6]), which collected imaging and spectroscopy from 2008 to 2014, providing wide-area multiband photometry and spectra of millions of sources.
- Gaia EDR3 (Section 2.2): the Early Data Release 3 of the Gaia mission [7], delivering high-precision astrometry and broad-band photometry for over 1.5 billion sources in the Milky Way, along with radial velocities for a much smaller bright subset.
- DEEP3 DR4 (Section 2.3): a spectroscopic redshift survey conducted with DEIMOS on the Keck II telescope in the Extended Groth Strip (EGS) [8]. It provides secure galaxy redshifts and spectra and forms part of the AEGIS survey program.
- Binospec (Section 2.4): spectroscopic data from Binospec [9], a high-throughput optical spectrograph on the 6.5-m MMT that covers 370–1000 nm with a wide field of view and is used to obtain redshifts for faint galaxies.
- DESI DR1 (Section 2.5): the first data release of the Dark Energy Spectroscopic Instrument [10], based on the first 13 months of the main survey. It provides high-confidence redshifts for 18.7 million objects (13.1 million galaxies, 1.6 million quasars, 4.0 million stars) over more than 9000 deg², making it the largest extragalactic spectroscopic sample to date and a key resource for cosmology and large-scale structure studies.
- HSC-SSP PDR2 (Section 2.6): the second public data release of the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP [11]), providing deep, wide-field imaging in five broad bands (g, r, i, z, y) taken under excellent seeing conditions at the Subaru telescope.
2.1. Crossmatch with SDSS DR12
2.2. Crossmatch with Gaia EDR3
2.3. Crossmatch Between miniJPAS and DEEP3 DR4
2.4. Crossmatch Between J-NEP and Binospec Data
2.5. Crossmatch Between miniJPAS and DESI DR1
2.6. Crossmatch Between miniJPAS and HSC-SSP PDR2
2.7. Final Labeled Set
3. Feature Configuration
3.1. Observational Features
- Point Spread Function (PSF) of the tile,
- Galactic extinction ,
- error,
- Object flags indicating observational issues.
3.2. Photometric Features
3.3. Morphological Features
- Ellipticity A/B = A_WORLD/B_WORLD, where A_WORLD and B_WORLD are the root mean square values of the light distribution along the major and minor axes of the source, respectively.
- Concentration cr = MAG_APER_1_5 − MAG_APER_3_0 (in the r-band), calculated from magnitudes measured within circular apertures of 1.5 and 3.0 arcsec.
- Full Width at Half Maximum FWHM_WORLD, defined assuming a Gaussian intensity distribution.
- Normalized peak surface brightness μ = MU_MAX/MAG_APER_3_0 (in the r-band), where MU_MAX represents the peak surface brightness above the local background.
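The four morphological quantities listed above can be derived directly from SExtractor-style catalog columns. The following is a minimal sketch, assuming each source is available as a dictionary keyed by the r-band column names used in the text; the function name, the output keys, and the example values are hypothetical and serve only as an illustration.

```python
def morphological_features(src):
    """Derive the four morphological features from SExtractor-style columns.

    `src` is assumed to be a mapping holding the r-band columns named in the text.
    """
    return {
        # Ratio of the RMS extents along the major and minor axes
        "ellipticity": src["A_WORLD"] / src["B_WORLD"],
        # Magnitude difference between the 1.5 and 3.0 arcsec apertures
        "concentration": src["MAG_APER_1_5"] - src["MAG_APER_3_0"],
        # FWHM as reported by the catalog (Gaussian assumption)
        "fwhm": src["FWHM_WORLD"],
        # Peak surface brightness normalized by the 3.0 arcsec aperture magnitude
        "mu_norm": src["MU_MAX"] / src["MAG_APER_3_0"],
    }


# Hypothetical source, with made-up values for illustration only
example = {
    "A_WORLD": 2.0e-4,
    "B_WORLD": 1.0e-4,
    "MAG_APER_1_5": 20.5,
    "MAG_APER_3_0": 20.0,
    "FWHM_WORLD": 3.5e-4,
    "MU_MAX": 20.5,
}
features = morphological_features(example)
print(features)
```

Point-like sources are expected to cluster in a narrow region of this feature space, which is what makes these quantities useful inputs for a classifier.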
3.4. Feature Sets
4. Supervised Classification
4.1. Representativeness
- the internal consistency between the labeled miniJPAS and labeled J-NEP subsets;
- the external coverage of the full miniJPAS and J-NEP datasets by the combined labeled set.
4.2. TPOT Pipeline for XGBoost Optimization
4.3. Performance Metrics
5. Results
5.1. Metrics
5.2. Misclassifications
5.3. Stellar Locus
5.4. Feature Importance
6. Conclusions
- The XGBoost model combining photometric and morphological features achieves a higher area under the ROC curve and outperforms both the SGLC and CLASS_STAR classifiers across all magnitude ranges.
- Using photometry alone yields competitive performance, slightly surpassing SGLC, but leads to misclassifications of faint stars as galaxies at . This ambiguity is reduced when morphological features are included.
- The permutation importance analysis identifies the concentration, normalized peak surface brightness, and PSF as critical morphological parameters for classification. Photometric bands at approximately 3900, 4600, and 6800 Å are consistently among the most informative, highlighting their relevance for characterizing stellar and galactic spectral slopes.
- The UMAP representativeness analysis demonstrates the robustness and consistency of the labeled training set, validating its applicability for reliable classification across the full miniJPAS and J-NEP datasets.
- Our ML pipeline, publicly released as a value-added catalog (VAC), offers reliable star-galaxy classifications that can directly support downstream scientific analyses, including studies of galaxy evolution and cosmological investigations reliant on precise object classifications.
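The area under the ROC curve quoted in the first conclusion has a simple probabilistic reading: it is the probability that a randomly chosen object of the positive class receives a higher score than a randomly chosen object of the negative class. The sketch below computes it by direct pair counting. It is an illustration only, not the evaluation code used in this work; the function name, the choice of which class is labeled 1, and the toy scores are assumptions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one object of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: class 1 is the positive class, scores are classifier probabilities
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75: three of the four positive-negative pairs are ordered correctly
```

The pairwise loop scales quadratically with sample size, so for catalogs of realistic size a rank-based implementation from a standard library would be preferable.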
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Queries
Appendix A.1. miniJPAS and J-NEP Sources
Appendix A.2. SDSS DR12 Photometric Crossmatch
Appendix A.3. SDSS DR12 Spectroscopic Crossmatch
Appendix A.4. Gaia EDR3 Crossmatch
Appendix A.5. HSC PDR2 Crossmatch
Appendix B. Hyperparameter Configuration
- booster = gbtree: specifies the tree-based boosting model, suitable for structured tabular data.
- colsample_bylevel = 0.6: fraction of features sampled for each tree level, reducing overfitting by introducing randomness.
- colsample_bynode = 1.0: fraction of features sampled for each split, set to 1 (all features used).
- colsample_bytree = 0.9: fraction of features sampled for each tree, promoting diversity among trees.
- gamma = 0.5: minimum loss reduction required for further partitioning; larger values make the model more conservative.
- learning_rate = 0.05: step size shrinkage, balancing learning speed with generalization.
- max_depth = 7: maximum depth of each tree, controlling model complexity.
- min_child_weight = 5: minimum sum of instance weights in a child node; higher values prevent overfitting small fluctuations.
- n_estimators = 600: number of boosting rounds (trees).
- objective = binary:logistic: specifies binary classification with probabilistic outputs.
- reg_alpha = 0: L1 regularization term, promoting sparsity in weights; set to zero here.
- reg_lambda = 1: L2 regularization term, penalizing large weights to reduce overfitting.
- scale_pos_weight = 3.46: weighting factor to address class imbalance between stars and galaxies.
- subsample = 0.95: fraction of training instances sampled for each tree, preventing overfitting by adding stochasticity.
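For convenience, the configuration listed above can be collected into a single dictionary. The sketch below assumes the scikit-learn style interface of the `xgboost` package (`XGBClassifier`); the names `XGB_PARAMS` and `build_classifier` are hypothetical, and the import is guarded so that the dictionary remains usable when the package is not installed.

```python
# Hyperparameters as listed in this appendix
XGB_PARAMS = {
    "booster": "gbtree",
    "colsample_bylevel": 0.6,
    "colsample_bynode": 1.0,
    "colsample_bytree": 0.9,
    "gamma": 0.5,
    "learning_rate": 0.05,
    "max_depth": 7,
    "min_child_weight": 5,
    "n_estimators": 600,
    "objective": "binary:logistic",
    "reg_alpha": 0,
    "reg_lambda": 1,
    "scale_pos_weight": 3.46,
    "subsample": 0.95,
}


def build_classifier(params=XGB_PARAMS):
    """Return an XGBClassifier with the listed settings, or None if xgboost is missing."""
    try:
        from xgboost import XGBClassifier
    except ImportError:
        return None
    return XGBClassifier(**params)
```

A classifier built this way would then be trained with the usual `fit(X, y)` call on the chosen feature set.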
Appendix C. Treatment of Ambiguous Crossmatches

Appendix D. Visual Inspection of Misclassifications


References
- Bonoli, S.; Marín-Franch, A.; Varela, J.; Ramió, H.V.; Abramo, L.R.; Cenarro, A.J.; Dupke, R.A.; Vílchez, J.M.; Cristóbal-Hornillos, D.; Delgado, R.M.G.; et al. The miniJPAS survey: A preview of the Universe in 56 colors. Astron. Astrophys. 2021, 653, A31.
- Hernán-Caballero, A.; Willmer, C.N.A.; Varela, J.; López-Sanjuan, C.; Marín-Franch, A.; Ramió, H.V.; Civera, T.; Ederoclite, A.; Muniesa, D.; Cenarro, J.; et al. J-NEP: 60-band photometry and photometric redshifts for the James Webb Space Telescope North Ecliptic Pole Time-Domain Field. Astron. Astrophys. 2023, 671, A71.
- Jansen, R.A.; Windhorst, R.A. The James Webb Space Telescope North Ecliptic Pole Time-domain Field. I. Field Selection of a JWST Community Field for Time-domain Studies. Publ. Astron. Soc. Pac. 2018, 130, 124001.
- López-Sanjuan, C.; Vázquez Ramió, H.; Varela, J.; Spinoso, D.; Angulo, R.E.; Muniesa, D.; Viironen, K.; Cristóbal-Hornillos, D.; Cenarro, A.J.; Ederoclite, A.; et al. J-PLUS: Morphological star/galaxy classification by PDF analysis. Astron. Astrophys. 2019, 622, A177.
- Baqui, P.; Marra, V.; Casarini, L.; Angulo, R.; Díaz-García, L.A.; Hernández-Monteagudo, C.; Lopes, P.A.A.; López-Sanjuan, C.; Muniesa, D.; Placco, V.M.; et al. The miniJPAS survey: Star-galaxy classification using machine learning. Astron. Astrophys. 2021, 645, A87.
- Alam, S.; Albareti, F.D.; Allende Prieto, C.; Anders, F.; Anderson, S.F.; Anderton, T.; Andrews, B.H.; Armengaud, E.; Aubourg, É.; Bailey, S.; et al. The Eleventh and Twelfth Data Releases of the Sloan Digital Sky Survey: Final Data from SDSS-III. Astrophys. J. Suppl. Ser. 2015, 219, 12.
- Gaia Collaboration; Brown, A.G.A.; Vallenari, A.; Prusti, T.; de Bruijne, J.H.J.; Babusiaux, C.; Biermann, M.; Creevey, O.L.; Evans, D.W.; Eyer, L.; et al. Gaia Early Data Release 3. Summary of the contents and survey properties. Astron. Astrophys. 2021, 649, A1.
- Cooper, M.C.; Aird, J.A.; Coil, A.L.; Davis, M.; Faber, S.M.; Juneau, S.; Lotz, J.M.; Nandra, K.; Newman, J.A.; Willmer, C.N.A.; et al. The DEEP3 Galaxy Redshift Survey: Keck/DEIMOS Spectroscopy in the GOODS-N Field. Astrophys. J. Suppl. Ser. 2011, 193, 14.
- Fabricant, D.; Fata, R.; Epps, H.; Gauron, T.; Mueller, M.; Zajac, J.; Amato, S.; Barberis, J.; Bergner, H.; Brennan, P.; et al. Binospec: A Wide-field Imaging Spectrograph for the MMT. Publ. Astron. Soc. Pac. 2019, 131, 075004.
- Abdul Karim, M.; Adame, A.G.; Aguado, D.; Aguilar, J.; Ahlen, S.; Alam, S.; Aldering, G.; Alexander, D.M.; Alfarsy, R.; Allen, L.; et al. Data Release 1 of the Dark Energy Spectroscopic Instrument. arXiv 2025, arXiv:2503.14745.
- Aihara, H.; AlSayyad, Y.; Ando, M.; Armstrong, R.; Bosch, J.; Egami, E.; Furusawa, H.; Furusawa, J.; Goulding, A.; Harikane, Y.; et al. Second data release of the Hyper Suprime-Cam Subaru Strategic Program. Publ. Astron. Soc. Jpn. 2019, 71, 114.
- Marra, V. Onexmatch. 2025. Available online: https://zenodo.org/records/17148385 (accessed on 23 November 2025).
- von Marttens, R.; Marra, V.; Quartin, M.; Casarini, L.; Baqui, P.O.; Alvarez-Candal, A.; Galindo-Guil, F.J.; Fernández-Ontiveros, J.A.; del Pino, A.; Díaz-García, L.A.; et al. J-PLUS DR3: Galaxy-Star-Quasar classification. Mon. Not. Roy. Astron. Soc. 2023, 527, 3347–3365.
- Chaini, S.; Bagul, A.; Deshpande, A.; Gondkar, R.; Sharma, K.; Vivek, M.; Kembhavi, A. Photometric identification of compact galaxies, stars, and quasars using multiple neural networks. Mon. Not. Roy. Astron. Soc. 2023, 518, 3123–3136.
- del Pino, A.; López-Sanjuan, C.; Hernán-Caballero, A.; Domínguez-Sánchez, H.; von Marttens, R.; Fernández-Ontiveros, J.A.; Coelho, P.R.T.; Lumbreras-Calle, A.; Vega-Ferrero, J.; Jimenez-Esteban, F.; et al. J-PLUS: Bayesian object classification with a strum of BANNJOS. Astron. Astrophys. 2024, 691, A221.
- Zeraatgari, F.Z.; Hafezianzadeh, F.; Zhang, Y.; Mei, L.; Ayubinia, A.; Mosallanezhad, A.; Zhang, J. Machine learning-based photometric classification of galaxies, quasars, emission-line galaxies, and stars. Mon. Not. Roy. Astron. Soc. 2024, 527, 4677–4689.
- Asadi, V.; Haghi, H.; Zonoozi, A.H. Semi-supervised classification of stars, galaxies and quasars using K-means and random-forest approaches. Astron. Astrophys. 2025, 700, A259.
- Bertin, E. Automated Morphometry with SExtractor and PSFEx. In Astronomical Society of the Pacific Conference Series, Proceedings of the Astronomical Data Analysis Software and Systems XX, Boston, MA, USA, 7–11 November 2010; Evans, I.N., Accomazzi, A., Mink, D.J., Rots, A.H., Eds.; Astronomical Society of the Pacific: San Francisco, CA, USA, 2011; Volume 442, p. 435.
- Green, G.M.; Schlafly, E.F.; Finkbeiner, D.; Rix, H.W.; Martin, N.; Burgett, W.; Draper, P.W.; Flewelling, H.; Hodapp, K.; Kaiser, N.; et al. Galactic reddening in 3D from stellar photometry—An improved map. Mon. Not. Roy. Astron. Soc. 2018, 478, 651–666.
- López-Sanjuan, C.; Varela, J.; Cristóbal-Hornillos, D.; Vázquez Ramió, H.; Carrasco, J.M.; Tremblay, P.E.; Whitten, D.D.; Placco, V.M.; Marín-Franch, A.; Cenarro, A.J.; et al. J-PLUS: Photometric calibration of large-area multi-filter surveys with stellar and white dwarf loci. Astron. Astrophys. 2019, 631, A119.
- McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv 2018, arXiv:1802.03426.
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
- Le, T.T.; Fu, W.; Moore, J.H. Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics 2020, 36, 250–256.
- Bertin, E.; Arnouts, S. SExtractor: Software for source extraction. Astron. Astrophys. Suppl. Ser. 1996, 117, 393–404.

| True class | Pred. Stars (18.5–20.5) | Pred. Gal. (18.5–20.5) | Pred. Stars (20.5–22.5) | Pred. Gal. (20.5–22.5) | Pred. Stars (22.5–23.5) | Pred. Gal. (22.5–23.5) |
|---|---|---|---|---|---|---|
| True stars | 79 | 10 | 92 | 21 | 6 | 4 |
| True gal. | 0 | 385 | 5 | 1523 | 9 | 2350 |
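The confusion matrix above can be turned into per-bin summary statistics. The sketch below computes, for each magnitude bin, the completeness and purity of the star class together with the overall accuracy, using only the counts in the table. The metric definitions are the standard ones and are an assumption here, since they may differ from those adopted in Section 4.3; all names are hypothetical.

```python
# (true stars -> pred. stars, true stars -> pred. gal.,
#  true gal.  -> pred. stars, true gal.  -> pred. gal.) per magnitude bin
CONFUSION = {
    "18.5-20.5": (79, 10, 0, 385),
    "20.5-22.5": (92, 21, 5, 1523),
    "22.5-23.5": (6, 4, 9, 2350),
}


def star_metrics(ss, sg, gs, gg):
    """Completeness and purity of the star class, plus overall accuracy."""
    return {
        # Fraction of true stars recovered as stars
        "completeness": ss / (ss + sg),
        # Fraction of predicted stars that are true stars
        "purity": ss / (ss + gs),
        # Fraction of all objects classified correctly
        "accuracy": (ss + gg) / (ss + sg + gs + gg),
    }


metrics = {mag_bin: star_metrics(*counts) for mag_bin, counts in CONFUSION.items()}
for mag_bin, m in metrics.items():
    print(mag_bin, {name: round(value, 3) for name, value in m.items()})
```

In the faintest bin the accuracy stays high while the stellar purity and completeness drop, because only 10 of the 2369 objects in that bin are true stars; accuracy alone is therefore a poor summary at faint magnitudes.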
| Crossmatch Source | miniJPAS (Galaxies/Stars) | J-NEP (Galaxies/Stars) |
|---|---|---|
| DESI DR1 | 2307/333 (2307/333) | – |
| Binospec | – | 593/53 (593/53) |
| DEEP3 DR4 | 5748/16 (6583/36) | – |
| SDSS DR12 (spectro) | 189/124 (355/218) | – |
| Gaia EDR3 | –/826 (–/989) | –/565 (–/570) |
| HSC-SSP PDR2 | 2323/765 (4406/946) | – |
| SDSS DR12 (photo) | 157/261 (802/1228) | 105/229 (156/724) |
| Total | 10,724/2325 | 698/847 |
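As a consistency check, the Total row of the table above can be reproduced by summing the counts that appear outside the parentheses. The sketch below does this for both fields; the variable and function names are hypothetical, and entries shown as "–" in the table are treated as zero.

```python
# (galaxies, stars) per crossmatch source, taken from the counts outside parentheses
MINIJPAS = {
    "DESI DR1": (2307, 333),
    "DEEP3 DR4": (5748, 16),
    "SDSS DR12 (spectro)": (189, 124),
    "Gaia EDR3": (0, 826),
    "HSC-SSP PDR2": (2323, 765),
    "SDSS DR12 (photo)": (157, 261),
}
JNEP = {
    "Binospec": (593, 53),
    "Gaia EDR3": (0, 565),
    "SDSS DR12 (photo)": (105, 229),
}


def totals(counts):
    """Sum galaxies and stars over all crossmatch sources."""
    galaxies = sum(g for g, _ in counts.values())
    stars = sum(s for _, s in counts.values())
    return galaxies, stars


minijpas_total = totals(MINIJPAS)
jnep_total = totals(JNEP)
print("miniJPAS:", minijpas_total)  # (10724, 2325)
print("J-NEP:", jnep_total)         # (698, 847)
```

Both sums agree with the Total row, which confirms that the final labeled set is built from the counts outside the parentheses.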
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Jeakel, A.P.; Vieira dos Santos, G.; Marra, V.; von Marttens, R.; Gurung-López, S.; Abramo, R.; Alcaniz, J.; Benitez, N.; Bonoli, S.; Cenarro, J.; et al. The miniJPAS and J-NEP Surveys: Machine Learning for Star-Galaxy Separation. Galaxies 2026, 14, 6. https://doi.org/10.3390/galaxies14010006

