Estimating H I Mass Fraction in Galaxies with Bayesian Neural Networks
Abstract
1. Introduction
2. Materials and Methods
2.1. Photometric Estimators of H I Gas Fraction
2.2. From Deterministic ML to Probabilistic Prediction
2.3. Surveys and Value-Added Catalogs
2.4. Cross-Matching and De-Duplication
2.5. Target Definition and Feature Set
2.6. Quality Controls and Final Sample
- Full (38 cols): hydromassnet_full_dataset_all_columns.csv with 31,501 galaxies. Key availability: (100%), (89.7%), (89.7%), (89.7%), SFR proxy logSFR22 (75.9%).
- Processed (9 cols): hydromassnet_processed.csv with the same 31,501 galaxies and no missing values. It retains only fully observed predictors (iMAG, Ag, Ai, logMsT, e_logMsT, logSFR22, surface_brightness_proxy, e_iMAG) plus the ALFALFA H I mass for target construction. Distance-encoding quantities (e.g., catalog distance or recession velocity/redshift) are intentionally excluded so that no explicit distance information reaches the models and selection-driven correlations (e.g., Malmquist bias) are not encoded. The retained predictors are limited to optical photometry and simple structural proxies that are uniformly available across the sample.
2.7. Exploratory Characterization of the Feature Space
2.8. Dataset Biases and Their Impact
- Systematic uncertainties in stellar-mass surface density at low surface brightness. For diffuse, low-surface-density galaxies, surface-brightness corrections, aperture definitions, and sky-subtraction systematics can introduce large uncertainties in stellar-mass surface density estimates. These effects broaden the apparent relation between optical predictors and the H I target, inflating the empirical scatter in the bluest regions of parameter space.
- Assumptions about star-formation histories. The MPA–JHU stellar population models rely on parameterized star-formation histories; for young or bursty galaxies, these assumptions propagate into the stellar mass and consequently into the H I mass fraction. This contributes additional variance that is not purely astrophysical.
- Training-set bias. Because the dataset is anchored to ALFALFA detections, the training distribution is biased toward gas-rich systems. As a result, part of the dispersion seen in predicted vs. true —particularly in low– and blue galaxies—may reflect dataset imbalance rather than limitations of the Bayesian model itself.
2.9. Empirical Trends in the Optical Parameter Space
2.10. Data Partitions and Leakage Prevention
2.11. Targets and Baseline Formulations
2.12. Deterministic Models and Baseline (Vanilla)
- Vanilla (deterministic baseline): fully connected feed-forward network (hidden layers [128, 64], dropout 0.2), early stopping on validation MAE/RMSE. Reports point metrics only.
- Gradient-boosted trees (GBT): modern boosting library with tuned depth/learning rate/subsample/estimators on the validation split [27]. Provides a strong deterministic model but is not considered the baseline.
- All inputs are standardized (mean/variance) using training-set statistics only; the learned transforms are applied to validation/test/sky-holdout sets to avoid information leakage.
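
The train-only standardization described above can be sketched as follows. This is a minimal illustration in plain Python, not the released pipeline: the helper names `fit_standardizer` and `apply_standardizer` and the toy feature values are assumptions made for the example.

```python
import math

def fit_standardizer(train_rows):
    """Compute per-column mean and standard deviation from TRAINING rows only."""
    n = len(train_rows)
    n_cols = len(train_rows[0])
    means = [sum(row[j] for row in train_rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((row[j] - means[j]) ** 2 for row in train_rows) / n
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 for constant columns
    return means, stds

def apply_standardizer(rows, means, stds):
    """Apply the already-fitted transform to any split (train, validation, test, sky-holdout)."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Toy two-feature example (values are illustrative, not catalog data).
train = [[15.0, 9.5], [16.0, 10.0], [17.0, 10.5]]
test = [[18.0, 11.0]]

means, stds = fit_standardizer(train)      # statistics come from the training split only
train_z = apply_standardizer(train, means, stds)
test_z = apply_standardizer(test, means, stds)  # test split reuses the training statistics
```

Because the test rows never enter `fit_standardizer`, a test galaxy that lies outside the training range keeps a standardized value beyond the training extremes instead of being silently recentred.
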
2.13. Bayesian Neural Networks (BNNs)
2.14. Training, Tuning, and Reproducibility Protocol
- 1. Splits. Fixed, non-overlapping train/validation/test partitions at the galaxy level, plus an additional sky-holdout test set built from disjoint HEALPix tiles to probe domain shift [29]. Random seeds are fixed and released.
- 2. Feature audit. Exclude any predictor directly or indirectly encoding the observed H I measurement or match quality; all transformations (scaling, PCA if used) are fitted on training only and applied consistently to other splits.
- 3. Hyperparameters. We use random search [30] over learning rate, depth/width, dropout rate, weight decay, and (for GBT) tree depth/learning rate/subsample/estimators. Random search means that we test many randomly chosen combinations of these hyperparameters and keep the one that performs best on the validation set. Selection is based on validation NLL (for BNN) or RMSE/MAE (for deterministic models), with early stopping (interrupting training once the validation metric stops improving).
- 4.
- 5.
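
The random-search and early-stopping steps in the protocol above can be sketched as follows. This is a hedged illustration only: the search ranges, the function names, and `toy_validation_score` (a stand-in for training a model and scoring it on the validation split) are assumptions, not the settings used in the study.

```python
import math
import random

def random_search(sample_config, validation_score, n_trials=50, seed=0):
    """Draw n_trials random configurations and keep the one with the lowest validation score."""
    rng = random.Random(seed)  # fixed seed so the search is reproducible
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = validation_score(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def early_stopping_epoch(val_history, patience=3):
    """Return the epoch of the best validation value, stopping after `patience` epochs without improvement."""
    best, best_epoch = float("inf"), -1
    for epoch, value in enumerate(val_history):
        if value < best:
            best, best_epoch = value, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch

def sample_config(rng):
    # Illustrative ranges only; the learning rate is sampled log-uniformly.
    return {
        "learning_rate": 10 ** rng.uniform(-4, -2),
        "width": rng.choice([32, 64, 128, 256]),
        "dropout": rng.uniform(0.0, 0.5),
    }

def toy_validation_score(cfg):
    # Hypothetical stand-in for "train the model, then evaluate validation NLL or RMSE".
    return (math.log10(cfg["learning_rate"]) + 3) ** 2 + (cfg["dropout"] - 0.2) ** 2

best_cfg, best_score = random_search(sample_config, toy_validation_score)
stop_at = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9], patience=3)
```

In a real run, `validation_score` would return validation NLL for the BNN or RMSE/MAE for the deterministic models, matching the selection rule stated in the list.
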
2.15. Metrics and Probabilistic Diagnostics
- Negative log-likelihood (NLL) on held-out data, computed under the predictive distribution returned by the probabilistic models (Equation (4)). For Gaussian heteroscedastic models, we evaluate the per-object NLL using the predicted mean and variance and report the average NLL over the split; lower values indicate better calibrated and sharper predictive distributions.
- Reliability diagrams, which compare the predicted quantiles to observed frequencies and check whether, for example, a 68% interval actually contains the true H I mass about 68% of the time.
- Probability integral transform (PIT) histograms, which test whether the cumulative distribution functions are statistically well calibrated.
- Empirical coverage at 68% and 95%, summarised by the coverage gap, i.e., the difference between the observed and nominal coverage.
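
For a Gaussian predictive distribution, the diagnostics listed above reduce to a few lines. The sketch below uses only the Python standard library and simulated data; the function names and the simulated scatter of 0.3 dex are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist

def gaussian_nll(y, mu, var):
    """Per-object negative log-likelihood under N(mu, var)."""
    return 0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)

def mean_nll(ys, mus, variances):
    """Average NLL over a split."""
    return sum(gaussian_nll(y, m, v) for y, m, v in zip(ys, mus, variances)) / len(ys)

def pit_values(ys, mus, variances):
    """Probability integral transform: predictive CDF evaluated at the true value."""
    return [NormalDist(m, math.sqrt(v)).cdf(y) for y, m, v in zip(ys, mus, variances)]

def coverage_gap(ys, mus, variances, nominal=0.68):
    """Observed minus nominal coverage of the central `nominal` predictive interval."""
    z = NormalDist().inv_cdf(0.5 + nominal / 2)
    inside = sum(abs(y - m) <= z * math.sqrt(v) for y, m, v in zip(ys, mus, variances))
    return inside / len(ys) - nominal

# Simulated check: true scatter 0.3, compared with a calibrated and an overconfident model.
rng = random.Random(1)
n = 5000
true_sigma = 0.3
mus = [0.0] * n
ys = [rng.gauss(0.0, true_sigma) for _ in range(n)]
calibrated = [true_sigma ** 2] * n
overconfident = [(0.5 * true_sigma) ** 2] * n

gap_calibrated = coverage_gap(ys, mus, calibrated)
gap_overconfident = coverage_gap(ys, mus, overconfident)
pit = pit_values(ys, mus, calibrated)
```

A calibrated model gives a coverage gap near zero and PIT values that are close to uniform on [0, 1]; an overconfident model gives a negative gap and a higher mean NLL. A reliability diagram repeats the coverage computation over a grid of nominal levels.
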
2.16. Robustness Tests: Sky-Holdout and Noise Injection
2.17. Implementation Details
- BNN (heteroscedastic; two heads: mean + variance): [iMAG, Ag, Ai, logMsT, e_logMsT, logSFR22, surface_brightness_proxy, e_iMAG]
- DBNN (heteroscedastic; decoupled heads): [iMAG, Ag, Ai, logMsT, e_logMsT, logSFR22, surface_brightness_proxy, e_iMAG]
- Vanilla (deterministic baseline): [iMAG, Ag, Ai, logMsT, e_logMsT, logSFR22, surface_brightness_proxy, e_iMAG]
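
One common way to turn several stochastic forward passes of a two-head (mean + variance) network into a single predictive distribution is the law of total variance. The sketch below assumes this combination rule for illustration; the source does not state which rule the models use, and the function name and numbers are hypothetical.

```python
from statistics import fmean, pvariance

def combine_stochastic_passes(pass_means, pass_variances):
    """Combine T stochastic forward passes for one galaxy.

    pass_means:     mean-head outputs, one per pass
    pass_variances: variance-head outputs, one per pass
    Returns (predictive mean, aleatoric variance, epistemic variance, total variance).
    """
    mu = fmean(pass_means)
    aleatoric = fmean(pass_variances)   # average predicted noise variance
    epistemic = pvariance(pass_means)   # spread of the mean head across passes
    return mu, aleatoric, epistemic, aleatoric + epistemic

# Three illustrative passes for a single object (values are made up).
mu, aleatoric, epistemic, total = combine_stochastic_passes(
    [9.8, 10.0, 10.2], [0.04, 0.05, 0.06]
)
```

The aleatoric term reflects input-dependent noise learned by the variance head, while the epistemic term grows where the passes disagree, which is the behaviour expected in sparsely sampled regions of feature space.
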
3. Results
3.1. Overall Accuracy Across Models
3.2. Learning Dynamics
3.3. Prediction Quality and Per-Object Uncertainty
3.4. Comparative Accuracy Summaries
3.5. Robustness to Domain Shift and Injected Noise
- Key takeaways: (i) The deterministic Vanilla baseline achieves strong point accuracy; (ii) the two-headed BNN matches that accuracy while providing calibrated predictive intervals that adapt to data support; and (iii) distributional fidelity and risk-awareness improve with the BNN, which is critical for prioritising 21 cm follow-up and for unbiased population-level inferences.
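
A noise-injection test of the kind named in this section's title can be sketched as follows. The linear `toy_model`, the noise amplitudes, and the unit per-feature scales are assumptions used only to make the example self-contained; they are not the trained networks or the perturbation levels of the study.

```python
import random

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def inject_noise(rows, sigmas, amplitude, rng):
    """Add zero-mean Gaussian noise to each feature, scaled by amplitude * per-feature sigma."""
    return [[x + rng.gauss(0.0, amplitude * s) for x, s in zip(row, sigmas)] for row in rows]

def toy_model(row):
    # Hypothetical stand-in for a trained predictor.
    return 0.5 * row[0] - 0.3 * row[1]

rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(2000)]
y = [toy_model(row) for row in X]   # noise-free reference targets
sigmas = [1.0, 1.0]                 # per-feature scale used to size the perturbation

results = {}
for amplitude in (0.0, 0.1, 0.5):
    X_noisy = inject_noise(X, sigmas, amplitude, rng)
    results[amplitude] = mae(y, [toy_model(row) for row in X_noisy])
```

Plotting `results` against the amplitude gives the degradation curve; for a probabilistic model the same loop would also record NLL and interval width.
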
4. Discussion
4.1. Comparison with PGF Relations
4.2. Utility of Predictive Uncertainty
4.3. Interpretability of Learned Relations
4.4. Robustness Under Domain Shift and Noise
4.5. Limitations
4.6. Implications and Outlook
5. Conclusions
- Main findings
- 1. Point accuracy: Deterministic learners (GBT and the Vanilla baseline) achieve strong predictions for the H I target. The two-headed BNN matches this accuracy within uncertainties while additionally supplying predictive intervals; as expected, a mean-only BNN underperforms when variance is not explicitly modelled.
- 2. Calibrated uncertainty: The heteroscedastic BNN yields per-object predictive distributions with near-nominal empirical coverage (68%/95%) and well-behaved reliability/PIT diagnostics, enabling galaxy-level uncertainty estimates rather than relying on a single global scatter term.
- 3.
- 4. Robustness: Under the sky-holdout split, scatter increases modestly and BNN intervals widen in the withheld region, which is the desirable behaviour under domain shift, i.e., when the test galaxies occupy a part of parameter space that is less well represented in the training set. Controlled input perturbations yield steady degradation in MAE/RMSE and a corresponding rise in NLL consistent with the noise amplitude.
- 5. Physical consistency: Predictors known to trace gas content, such as optical colors and structural parameters like the surface-brightness proxy, emerge as key drivers, in line with the PGF literature. Gains arise without target leakage thanks to explicit feature auditing and train-only preprocessing.
- Limitations: The training set is anchored to H I detections with reliable optical counterparts, biasing the learned mapping toward gas-rich systems; flux-limit and line-width–dependent completeness in ALFALFA imposes residual selection effects. Inputs are restricted to optical/UV photometry and simple structural proxies to preserve portability; richer features (e.g., environmental indicators, refined dust corrections) could reduce the remaining scatter but may limit universality across surveys.
- Outlook: Immediate extensions include (i) incorporating H I non-detections, either by using loss functions that explicitly account for censored data (upper limits), so that galaxies without 21 cm detections still contribute information to the training, or by jointly modelling the H I content and the probability of detection, and (ii) exploring domain-adaptation or hierarchical Bayesian approaches to stabilise performance across sky regions and survey releases. Applying the present pipeline to deeper imaging and higher redshift will test scalability and inform 21 cm survey strategies.
- Reproducibility: We release fixed train/validation/test and sky-holdout IDs, preprocessing and cross-match scripts, model configurations and seeds, and notebooks to regenerate all tables and figures. A versioned archive with DOI ensures long-term access and transparency.
- Summary: Uncertainty-aware machine learning, here in the form of heteroscedastic BNNs, provides a robust and scalable route to infer galactic H I content from optical data. By combining competitive point accuracy with calibrated predictive intervals, the method enables efficient 21 cm target prioritisation and unbiased population-level inference with transparent uncertainty propagation.
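
The censored-data loss mentioned in the Outlook can be illustrated with a Tobit-style Gaussian likelihood: detections contribute the usual density term, and upper limits contribute the probability that the true value lies below the limit. This is a sketch of one standard formulation under a Gaussian assumption, not a method implemented in the paper; the function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def censored_gaussian_nll(y, mu, sigma, is_upper_limit):
    """Negative log-likelihood for one object under N(mu, sigma^2).

    If is_upper_limit is False, y is a measured value (detection).
    If is_upper_limit is True, y is an upper limit and the likelihood is P(value <= y).
    """
    if is_upper_limit:
        p_below = NormalDist(mu, sigma).cdf(y)
        return -math.log(max(p_below, 1e-300))  # floor avoids log(0) for extreme limits
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

# A detection at the predicted mean, and three upper limits relative to a prediction of 10.0.
nll_detection = censored_gaussian_nll(10.0, 10.0, 1.0, False)
nll_limit_above = censored_gaussian_nll(12.0, 10.0, 1.0, True)   # prediction well below the limit
nll_limit_equal = censored_gaussian_nll(10.0, 10.0, 1.0, True)   # prediction exactly at the limit
nll_limit_below = censored_gaussian_nll(9.0, 10.0, 1.0, True)    # prediction above the limit
```

The loss is small when the predicted distribution sits below the upper limit and grows as the prediction moves above it, so non-detections push the model away from overestimating the H I content.
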
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| PGF | Photometric Gas Fraction (photometric gas fraction relations). |
| SDSS | Sloan Digital Sky Survey. |
| ALFALFA | Arecibo Legacy Fast ALFA (blind 21 cm H I survey). |
| BNN | Bayesian Neural Network (heteroscedastic, uncertainty-aware). |
| Vanilla | Deterministic Baseline (feed-forward DNN). |
| DNN | Deep Neural Network (feed-forward, non-probabilistic). |
| GBT | Gradient-Boosted Trees (ML ensemble; not the Green Bank Telescope). |
| NLL | Negative Log-Likelihood (proper scoring rule used for training/evaluation). |
| PIT | Probability Integral Transform (calibration diagnostic). |
| HEALPix | Hierarchical Equal Area isoLatitude Pixelization (sky tiling for holdout). |

| Note | Text |
|---|---|
| 1 | https://github.com/JoelsonSartoriJr/HydroMassNet. Accessed on 25 January 2026. |
| 2 | Values mirror the draft baseline without re-fitting, to preserve traceability. |
References
1. Giovanelli, R.; Haynes, M.P. Extragalactic neutral hydrogen. In Galactic and Extragalactic Radio Astronomy; Springer: Berlin/Heidelberg, Germany, 1988; pp. 522–562.
2. Kennicutt, R.C., Jr. Star formation in galaxies along the Hubble sequence. Annu. Rev. Astron. Astrophys. 1998, 36, 189–231.
3. Haynes, M.P.; Giovanelli, R.; Kent, B.R.; Adams, E.A.; Balonek, T.J.; Craig, D.W.; Fertig, D.; Finn, R.; Giovanardi, C.; Hallenbeck, G.; et al. The Arecibo Legacy Fast ALFA Survey: The ALFALFA Extragalactic H I Source Catalog. Astrophys. J. 2018, 861, 49.
4. Kannappan, S.J. Linking Gas Fractions to Bimodalities in Galaxy Properties. Astrophys. J. Lett. 2004, 611, L89–L92.
5. Zhang, W.; Li, C.; Kauffmann, G.; Xiao, T. Estimating the H I gas fractions of galaxies in the local Universe. Mon. Not. R. Astron. Soc. 2009, 397, 1243–1253.
6. Catinella, B.; Schiminovich, D.; Kauffmann, G.; Fabello, S.; Hummels, C.; Lemonias, J.; Moran, S.M.; Wu, R.; Cooper, A.; Wang, J. The GALEX Arecibo SDSS Survey. VI. Second Data Release and Updated Gas Fraction Scaling Relations. Astron. Astrophys. 2012, 544, A65.
7. Catinella, B.; Saintonge, A.; Janowiecki, S.; Cortese, L.; Davé, R.; Lemonias, J.J.; Cooper, A.P.; Schiminovich, D.; Hummels, C.B.; Fabello, S.; et al. xGASS: Total cold gas scaling relations and molecular-to-atomic gas ratios of galaxies in the local Universe. Mon. Not. R. Astron. Soc. 2018, 476, 875–895.
8. Teimoorinia, H.; Ellison, S.L.; Patton, D.R. Pattern Recognition in the ALFALFA.70 and Sloan Digital Sky Surveys: A Catalog of ∼500,000 H I Gas Fraction Estimates Based on Artificial Neural Networks. Mon. Not. R. Astron. Soc. 2017, 464, 3796–3813.
9. Wu, J.F. Connecting Optical Morphology, Environment, and H I Mass Fraction for Low-Redshift Galaxies Using Deep Learning. Astrophys. J. 2020, 900, 142.
10. Andrianomena, S.; Rafieferantsoa, M.; Davé, R. Classifying Galaxies According to Their H I Content. Mon. Not. R. Astron. Soc. 2020, 492, 5743–5753.
11. Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On Calibration of Modern Neural Networks. In Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; JMLR.org: Norfolk, MA, USA, 2017; Volume 70, pp. 1321–1330.
12. Kuleshov, V.; Fenner, N.; Ermon, S. Accurate Uncertainties for Deep Learning Using Calibrated Regression. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; PMLR: Cambridge, MA, USA, 2018; Volume 80, pp. 2796–2804.
13. Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017); Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 6405–6416.
14. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA, 19–24 June 2016; JMLR.org: Norfolk, MA, USA, 2016; Volume 48, pp. 1050–1059.
15. Alam, S.; Albareti, F.D.; Allende Prieto, C.; Anders, F.; Anderson, S.F.; Anderton, T.; Andrews, B.H.; Armengaud, E.; Aubourg, É.; Bailey, S.; et al. The Eleventh and Twelfth Data Releases of the Sloan Digital Sky Survey: Final Data from SDSS-III. Astrophys. J. Suppl. Ser. 2015, 219, 12.
16. Brinchmann, J.; Charlot, S.; White, S.D.M.; Tremonti, C.; Kauffmann, G.; Heckman, T.; Brinkmann, J. The physical properties of star-forming galaxies in the low-redshift Universe. Mon. Not. R. Astron. Soc. 2004, 351, 1151–1179.
17. Rafieferantsoa, M.; Andrianomena, S.; Davé, R. Predicting the neutral hydrogen content of galaxies from optical data using machine learning. Mon. Not. R. Astron. Soc. 2018, 479, 4509–4525.
18. Adams, E.A.K.; Adebahr, B.; de Blok, W.J.G.; Dénes, H.; Hess, K.M.; van der Hulst, J.M.; Kutkin, A.; Lucero, D.M.; Morganti, R.; Moss, V.A.; et al. First release of Apertif imaging survey data. Astron. Astrophys. 2022, 667, A38.
19. Koribalski, B.S.; Staveley-Smith, L.; Westmeier, T.; Serra, P.; Spekkens, K.; Wong, O.I.; Lee-Waddell, K.; Lagos, C.D.P.; Obreschkow, D.; Ryan-Weber, E.V.; et al. WALLABY: An SKA Pathfinder H I survey. Astrophys. Space Sci. 2020, 365, 118.
20. Hutchens, Z.L.; Kannappan, S.J.; Berlind, A.A.; Asad, M.; Eckert, K.D.; Stark, D.V.; Carr, D.S.; Castelloe, E.R.; Baker, A.J.; Hess, K.M.; et al. The RESOLVE and ECO Gas in Galaxy Groups Initiative: The Group Finder and the Group H I–Halo Mass Relation. Astrophys. J. 2023, 956, 51.
21. O’Neil, K.; Bothun, G.D.; Schombert, J. Red, Gas-Rich Low Surface Brightness Galaxies and Enigmatic Deviations from the Tully-Fisher Relation. Astron. J. 2000, 119, 136–152.
22. Carr, D.S.; Kannappan, S.J.; Hutchens, Z.L.; Polimera, M.S.; Norris, M.A.; Eckert, K.D.; Moffett, A.J. Using Machine Learning to Estimate Near-Ultraviolet Magnitudes and Probe Quenching Mechanisms of z = 0 Nuggets in the RESOLVE and ECO Surveys. Astrophys. J. 2025, 985, 25.
23. Budavári, T.; Szalay, A.S. Probabilistic cross-identification of astronomical sources. Astrophys. J. 2008, 679, 301.
24. Strateva, I.; Ivezić, Ž.; Knapp, G.R.; Narayanan, V.K.; Strauss, M.A.; Gunn, J.E.; Lupton, R.H.; Schlegel, D.; Bahcall, N.A.; Brinkmann, J.; et al. Color Separation of Galaxy Types in the Sloan Digital Sky Survey Imaging Data. Astron. J. 2001, 122, 1861–1874.
25. Baldry, I.K.; Glazebrook, K.; Brinkmann, J.; Ivezić, Ž.; Lupton, R.H.; Nichol, R.C.; Szalay, A.S. Quantifying the Bimodal Color-Magnitude Distribution of Galaxies. Astrophys. J. 2004, 600, 681–706.
26. Schawinski, K.; Urry, C.M.; Simmons, B.D.; Fortson, L.; Kaviraj, S.; Keel, W.C.; Lintott, C.J.; Masters, K.L.; Nichol, R.C.; Sarzi, M.; et al. The green valley is a red herring: Galaxy Zoo reveals two evolutionary pathways towards quenching of star formation in early- and late-type galaxies. Mon. Not. R. Astron. Soc. 2014, 440, 889–907.
27. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2016; pp. 785–794.
28. Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight Uncertainty in Neural Networks. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; JMLR.org: Norfolk, MA, USA, 2015; pp. 1613–1622.
29. Gorski, K.M.; Hivon, E.; Banday, A.J.; Wandelt, B.D.; Hansen, F.K.; Reinecke, M.; Bartelmann, M. HEALPix: A framework for high-resolution discretization and fast analysis of data distributed on the sphere. Astrophys. J. 2005, 622, 759.
30. Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
31. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), Sardinia, Italy, 13–15 May 2010; PMLR: Cambridge, MA, USA, 2010; pp. 249–256.
32. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
33. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
34. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A System for Large-Scale Machine Learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), Savannah, GA, USA, 2–4 November 2016; USENIX Association: Berkeley, CA, USA, 2016; pp. 265–283.
35. Salvatier, J.; Wiecki, T.V.; Fonnesbeck, C. Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. 2016, 2, e55.
36. Gneiting, T.; Raftery, A.E. Strictly Proper Scoring Rules, Prediction, and Estimation. J. Am. Stat. Assoc. 2007, 102, 359–378.

| Variable (Processed) | Meaning/Notes |
|---|---|
| iMAG | SDSS i-band apparent magnitude (AB) used as an input feature in the processed table. |
| Ag, Ai | Galactic extinction terms in the g and i bands (used as predictors). |
| logMsT, e_logMsT | Stellar mass (MPA–JHU, total) and its reported uncertainty. |
| SFR proxy (logSFR22) | Scalar SFR indicator available in the catalog; used as a predictor, not as a target. |
| surface_brightness_proxy | Magnitude-like scalar derived from SDSS photometry; traces stellar-mass surface density. |
| e_iMAG | Uncertainty on iMAG. |
| Target fields (not used as predictors) | |
| log H I mass | H I mass from ALFALFA (base-10 logarithm). |

| Model | Key Hyperparameters (Range) |
|---|---|
| GBT | depth (3–10), learning rate (–), subsample (0.5–1.0), estimators (100–2000) |
| DNN | layers (1–5), width (32–1024), dropout (0.0–0.5), weight decay (–), LR (–) |
| BNN | as DNN + MC-dropout (0.05–0.3), VI prior scale (–) |

| Model | MAE | RMSE | R² |
|---|---|---|---|
| Dummy regressor | 0.137 ± 0.005 | — | — |
| CatBoost (GBT) | 0.069 ± 0.003 | — | — |
| Vanilla (deterministic baseline) | 0.061 ± 0.002 | 0.304 ± 0.010 | 0.705 ± 0.008 |
| BNN (two-headed; mean + variance) | 0.068 ± 0.002 | 0.282 ± 0.009 | 0.746 ± 0.007 |
| BNN (mean head only) | 0.075 ± 0.003 | — | 0.685 ± 0.010 |
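
Assuming the last column of the table reports the coefficient of determination R², and that both rows were evaluated on the same test set, the tabulated RMSE and R² values can be cross-checked with the identity R² = 1 − RMSE²/Var(y). The short calculation below is only a consistency check on the published numbers; the variable names are ours.

```python
# Tabulated values for the two models that report both RMSE and the last column.
rmse_vanilla, r2_vanilla = 0.304, 0.705
rmse_bnn, r2_bnn = 0.282, 0.746

# Target variance implied by the Vanilla row: Var(y) = RMSE^2 / (1 - R^2).
implied_var = rmse_vanilla ** 2 / (1.0 - r2_vanilla)

# R^2 that the BNN row should show if it shares the same target variance.
r2_bnn_implied = 1.0 - rmse_bnn ** 2 / implied_var
```

The implied value agrees with the tabulated 0.746 to within rounding, which supports reading the two rows as evaluated on a common test set.
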
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Sartori, J.; Bernal, C.G.; Frajuca, C. Estimating H I Mass Fraction in Galaxies with Bayesian Neural Networks. Galaxies 2026, 14, 10. https://doi.org/10.3390/galaxies14010010