Flow Matching for Simulation-Based Inference: Design Choices and Implications
Abstract
1. Introduction
- We conduct an extensive experimental campaign on the Ariel Data Challenge (ADC) 2023 dataset [16], considering thousands of CNF configurations with variations in input data, preprocessing strategies, network architectures, and optimization parameters.
- We evaluate the performance of CNFs trained with FMPE and the estimated posterior distributions through an extensive posterior evaluation framework encompassing prediction errors, calibration, uncertainty quantification, and coverage analysis.
- We perform a systematic comparison to assess the influence of sensitive modeling decisions on model performance. These include (i) min–max scaling versus Z-score normalization; (ii) noise conditioning; and (iii) robustness under perturbation noise (a normalization sketch is given after this list).
- We identify principled design choices prioritizing key predictive qualities, thereby enhancing the robustness and reliability of simulation-based posterior inference using CNFs.
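For concreteness, the following minimal sketch contrasts the two normalization schemes compared in item (i). The array names, shapes, and the synthetic data are illustrative assumptions rather than the paper's exact preprocessing pipeline; in both cases the statistics are assumed to be computed on the training split only and re-used for validation and test.

```python
import numpy as np

def minmax_scale(x, x_min, x_max):
    """Map each feature to [0, 1] using training-set minima and maxima."""
    return (x - x_min) / (x_max - x_min)

def zscore_normalize(x, mean, std):
    """Center each feature and divide by the training-set standard deviation."""
    return (x - mean) / std

# Illustrative usage with synthetic targets (e.g., 7 atmospheric parameters).
rng = np.random.default_rng(0)
train_targets = rng.normal(size=(1000, 7))
x_min, x_max = train_targets.min(axis=0), train_targets.max(axis=0)
mean, std = train_targets.mean(axis=0), train_targets.std(axis=0)

scaled = minmax_scale(train_targets, x_min, x_max)      # values in [0, 1]
normalized = zscore_normalize(train_targets, mean, std)  # zero mean, unit variance
```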
2. Methods
2.1. Simulation-Based Atmospheric Retrieval
2.2. Flow Matching Posterior Estimation
2.3. Identification of Sensitive Modeling Choices
2.3.1. Data Normalization
2.3.2. Explicit Incorporation of Input Data Uncertainty
2.3.3. Training CNFs on Synthetic Data
3. Experiments
3.1. Dataset
- Spectral data, comprising a transmission spectrum and the associated per-bin uncertainty measurements, whose dimensionality equals the number of discretized wavelengths.
- Auxiliary data, encompassing eight additional stellar and planetary parameters: star distance, stellar mass, stellar radius, stellar temperature, planet mass, orbital period, semi-major axis, and surface gravity.
- Input parameters, describing the seven atmospheric parameters that generate the simulated observations: the planet radius (in Jupiter radii), the temperature (in Kelvin), and the log-abundances of five atmospheric gases, namely H₂O (water), CO₂ (carbon dioxide), CO (carbon monoxide), CH₄ (methane), and NH₃ (ammonia). A sketch of how these inputs can be combined into a conditioning vector is given below.
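The following minimal sketch illustrates one way the three input groups could be assembled into a single conditioning vector for the CNF. The function name, the dimensionalities, and the plain concatenation strategy are illustrative assumptions, not the exact architecture described in Section 3.2.

```python
import numpy as np

def build_context(spectrum, noise, auxiliary):
    """Concatenate the spectrum, its per-bin uncertainty, and the auxiliary
    stellar/planetary parameters into one conditioning vector for the flow."""
    return np.concatenate([spectrum, noise, auxiliary])

# Illustrative shapes: an Ariel-like spectrum with per-bin uncertainties
# (here 52 bins, an assumed value) plus 8 auxiliary parameters.
spectrum = np.zeros(52)
noise = np.full(52, 1e-4)
auxiliary = np.zeros(8)

context = build_context(spectrum, noise, auxiliary)
print(context.shape)  # (112,)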
3.2. Model Architecture, Training, and Inference
3.3. Evaluation
- ADC2023 Scores. We quantify retrieval performance with respect to a ground-truth posterior distribution computed with nested sampling (NS) under the ADC2023 scoring system, which includes the posterior score (PS), assessing the fidelity of predicted posterior distributions via the Kolmogorov–Smirnov test, and the spectral score (SS), measuring the spectral consistency of the median predictive spectra and interquartile ranges. Both scores range from 0 to 1000. The final score (FS) is a weighted combination of these metrics, FS = 0.8 · PS + 0.2 · SS, using the same weighting coefficients defined in the original ADC scoring framework (a worked sketch follows this list of metrics).
- Prediction Errors. We quantify the error between input parameters (our targets) and posterior samples (our predictions) by measuring Mean Absolute Error (MAE), Median Absolute Error (MedAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE).
- Uncertainty Quantification. Exploiting the generative nature of CNFs, we estimate the predictive uncertainty directly as the per-parameter standard deviation of the posterior samples.
- Calibration. To verify whether the predicted posterior distribution fits the empirical distribution of the data, we quantify the calibration error using proper calibration metrics, including Negative Log-Likelihood (NLL), Pinball Loss, Quantile Calibration Error (QCE), Uncertainty Calibration Error (UCE), and Expected Normalized Calibration Error (ENCE) [25]. These metrics assess calibration from different perspectives (prediction quality, quantiles, and variance), in both the univariate and multivariate cases.
- Ground-Truth Benchmarking. The posterior distribution predicted by a probabilistic regression model should, at a minimum, cover the input parameters fed to the simulator that generated the observations. To evaluate this, we compute the Marginal Coverage Ratio (MCR) and the Joint Coverage Ratio (JCR) at multiple coverage levels, up to full support. For a given coverage level, MCR and JCR measure the average fraction of target values falling within the corresponding marginal posterior credible sets, either separately or jointly.
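To make these sample-based metrics concrete, the sketch below computes the weighted final score together with point errors, predictive uncertainty, and a marginal coverage ratio from posterior samples. The 0.8/0.2 weights follow the ADC scoring framework (and are consistent with the scores reported in Section 4.2), while the function names and the central-interval construction are illustrative assumptions.

```python
import numpy as np

def final_score(posterior_score, spectral_score, w_ps=0.8, w_ss=0.2):
    """ADC-style final score as a weighted combination of the two sub-scores."""
    return w_ps * posterior_score + w_ss * spectral_score

def point_errors(theta_true, samples):
    """MAE and RMSE between the true parameters and the posterior mean."""
    pred = samples.mean(axis=0)
    mae = np.abs(pred - theta_true).mean()
    rmse = np.sqrt(((pred - theta_true) ** 2).mean())
    return mae, rmse

def predictive_uncertainty(samples):
    """Per-parameter predictive uncertainty as the posterior-sample standard deviation."""
    return samples.std(axis=0)

def marginal_coverage(theta_true, samples, level=0.95):
    """Fraction of parameters whose true value lies inside the central
    `level` credible interval of the corresponding marginal posterior."""
    alpha = 1.0 - level
    lo, hi = np.quantile(samples, [alpha / 2, 1.0 - alpha / 2], axis=0)
    return np.mean((theta_true >= lo) & (theta_true <= hi))

# Illustrative usage with synthetic posterior samples for 7 parameters.
rng = np.random.default_rng(0)
theta_true = rng.normal(size=7)
post_mean = theta_true + 0.1 * rng.normal(size=7)       # posterior slightly off the truth
samples = post_mean + 0.1 * rng.normal(size=(5000, 7))  # 5000 posterior draws

print(final_score(483.874, 865.843))                 # ~560.27, matching the CNF-FMPE row in Section 4.2
print(point_errors(theta_true, samples))             # (MAE, RMSE)
print(predictive_uncertainty(samples))               # per-parameter standard deviations
print(marginal_coverage(theta_true, samples, 0.95))  # fraction of the 7 parameters covered
```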
4. Results and Discussion
4.1. Preliminary Data Analysis
4.2. Scores of the Ariel Data Challenge
4.3. Min–Max Scaling Versus Z-Score Normalization
4.4. Effects of Noise Conditioning
4.5. Robustness to Noisy Observations
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Cranmer, K.; Brehmer, J.; Louppe, G. The frontier of simulation-based inference. Proc. Natl. Acad. Sci. USA 2020, 117, 30055–30062.
- Zammit-Mangion, A.; Sainsbury-Dale, M.; Huser, R. Neural Methods for Amortized Inference. Annu. Rev. Stat. Its Appl. 2025, 12, 311–335.
- Ganguly, A.; Jain, S.; Watchareeruetai, U. Amortized Variational Inference: A Systematic Review. J. Artif. Intell. Res. 2023, 78, 167–215.
- Wildberger, J.B.; Dax, M.; Buchholz, S.; Green, S.R.; Macke, J.H.; Schölkopf, B. Flow Matching for Scalable Simulation-Based Inference. In Proceedings of the Thirty-Seventh Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023.
- Chen, R.T.Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D.K. Neural Ordinary Differential Equations. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31.
- Papamakarios, G.; Nalisnick, E.; Rezende, D.J.; Mohamed, S.; Lakshminarayanan, B. Normalizing flows for probabilistic modeling and inference. J. Mach. Learn. Res. 2021, 22, 57:2617–57:2680.
- Lipman, Y.; Chen, R.T.Q.; Ben-Hamu, H.; Nickel, M.; Le, M. Flow Matching for Generative Modeling. arXiv 2023, arXiv:2210.02747.
- Barret, D.; Dupourqué, S. Simulation-based inference with neural posterior estimation applied to X-ray spectral fitting: Demonstration of working principles down to the Poisson regime. Astron. Astrophys. 2024, 686, A133.
- Ward, D.; Cannon, P.; Beaumont, M.; Fasiolo, M.; Schmon, S.M. Robust Neural Posterior Estimation and Statistical Model Criticism. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022.
- Fiscale, S.; Ferone, A.; Ciaramella, A.; Inno, L.; Giordano Orsini, M.; Covone, G.; Rotundi, A. Detection of Exoplanets in Transit Light Curves with Conditional Flow Matching and XGBoost. Electronics 2025, 14, 1738.
- Vasist, M.; Rozet, F.; Absil, O.; Mollière, P.; Nasedkin, E.; Louppe, G. Neural posterior estimation for exoplanetary atmospheric retrieval. Astron. Astrophys. 2023, 672, A147.
- Yip, K.H.; Changeat, Q.; Al-Refaie, A.; Waldmann, I.P. To Sample or Not to Sample: Retrieving Exoplanetary Spectra with Variational Inference and Normalizing Flows. Astrophys. J. 2024, 961, 30.
- Gebhard, T.D.; Angerhausen, D.; Konrad, B.S.; Alei, E.; Quanz, S.P.; Schölkopf, B. Parameterizing pressure–temperature profiles of exoplanet atmospheres with neural networks. Astron. Astrophys. 2024, 681, A3.
- Gebhard, T.D.; Wildberger, J.; Dax, M.; Kofler, A.; Angerhausen, D.; Quanz, S.P.; Schölkopf, B. Flow matching for atmospheric retrieval of exoplanets: Where reliability meets adaptive noise levels. Astron. Astrophys. 2025, 693, A42.
- Giordano Orsini, M.; Ferone, A.; Inno, L.; Casolaro, A.; Maratea, A. Flow Matching Posterior Estimation for Simulation-based Atmospheric Retrieval of Exoplanets. IEEE Access 2025, 13, 137773–137792.
- Changeat, Q.; Yip, K.H. ESA-Ariel Data Challenge NeurIPS 2022: Introduction to exo-atmospheric studies and presentation of the Atmospheric Big Challenge (ABC) Database. RAS Tech. Instruments 2023, 2, 45–61.
- Sisson, S.A.; Fan, Y.; Beaumont, M. (Eds.) Handbook of Approximate Bayesian Computation; Chapman and Hall/CRC: New York, NY, USA, 2018.
- Madhusudhan, N. Exoplanetary Atmospheres: Key Insights, Challenges, and Prospects. Annu. Rev. Astron. Astrophys. 2019, 57, 617–663.
- Giordano Orsini, M.; Ferone, A.; Inno, L.; Giacobbe, P.; Maratea, A.; Ciaramella, A.; Bonomo, A.S.; Rotundi, A. A data-driven approach for extracting exoplanetary atmospheric features. Astron. Comput. 2025, 52, 100964.
- Hüllermeier, E.; Waegeman, W. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Mach. Learn. 2021, 110, 457–506.
- Valdenegro-Toro, M.; de Jong, I.P.; Zullich, M. Unified Uncertainties: Combining Input, Data and Model Uncertainty into a Single Formulation. arXiv 2024, arXiv:2406.18787.
- Mugnai, L.V.; Al-Refaie, A.; Bocchieri, A.; Changeat, Q.; Pascale, E.; Tinetti, G. Alfnoor: Assessing the Information Content of Ariel’s Low-resolution Spectra with Planetary Population Studies. Astron. J. 2021, 162, 288.
- Steinhoff, J.; Hind, S. Simulation and the reality gap: Moments in a prehistory of synthetic data. Big Data Soc. 2025, 12, 20539517241309884.
- Gargaud, M.; Irvine, W.M.; Amils, R.; Claeys, P.; Cleaves, H.J.; Gerin, M.; Rouan, D.; Spohn, T.; Tirard, S.; Viso, M. (Eds.) Atmospheric Remote-Sensing Infrared Exoplanet Large-Survey. In Encyclopedia of Astrobiology; Springer: Berlin/Heidelberg, Germany, 2023; p. 275.
- Küppers, F.; Schneider, J.; Haselhoff, A. Parametric and Multivariate Uncertainty Calibration for Regression and Object Detection. In Proceedings of the Computer Vision—ECCV 2022 Workshops, Tel Aviv, Israel, 23–27 October 2022; Karlinsky, L., Michaeli, T., Nishino, K., Eds.; Springer: Cham, Switzerland, 2023; pp. 426–442.
- Yip, K.H.; Changeat, Q.; Nikolaou, N.; Morvan, M.; Edwards, B.; Waldmann, I.P.; Tinetti, G. Peeking inside the Black Box: Interpreting Deep-learning Models for Exoplanet Atmospheric Retrievals. Astron. J. 2021, 162, 195.
- Aubin, M.; Cuesta-Lazaro, C.; Tregidga, E.; Viaña, J.; Garraffo, C.; Gordon, I.E.; López-Morales, M.; Hargreaves, R.J.; Makhnev, V.Y.; Drake, J.J. Simulation-Based Inference for Exoplanet Atmospheric Retrieval: Insights from Winning the Ariel Data Challenge 2023 Using Normalizing Flows. In Proceedings of the Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Turin, Italy, 18–22 September 2023; Springer: Cham, Switzerland, 2025; pp. 113–131.
| Hyperparameter | Sweep Values |
|---|---|
| | [0.0, 0.1] |
| | [1, 2] |
| | [−0.75, −0.5, −0.25, 0.0, 0.5, 1.0, 2.0, 4.0] |
| h | [128, 256, 512, 1024, 512, 256, 128], [512, 512, 512, 512, 512, 512, 512] |
| | [0.001, 0.0005, 0.0001] |
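As an illustration of how such a sweep could be enumerated, the sketch below builds the full grid of configurations. The dictionary keys are hypothetical placeholder names; only the value lists mirror the table above.

```python
from itertools import product

# Hypothetical placeholder names for the swept hyperparameters; only the
# value lists are taken from the sweep table above.
sweep = {
    "hyperparam_a": [0.0, 0.1],
    "hyperparam_b": [1, 2],
    "hyperparam_c": [-0.75, -0.5, -0.25, 0.0, 0.5, 1.0, 2.0, 4.0],
    "h": [
        [128, 256, 512, 1024, 512, 256, 128],
        [512, 512, 512, 512, 512, 512, 512],
    ],
    "learning_rate_like": [0.001, 0.0005, 0.0001],
}

# Cartesian product of all sweep values, one dict per configuration.
configs = [dict(zip(sweep, values)) for values in product(*sweep.values())]
print(len(configs))  # 2 * 2 * 8 * 2 * 3 = 192 configurations
```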
| Estimators | Posterior Score | Spectral Score | Final Score |
|---|---|---|---|
| Baseline (CNN) [26] | 216.038 | 606.170 | 294.064 |
| NSF [27] | 424.904 | 759.602 | 491.844 |
| CNF-FMPE | 483.874 | 865.843 | 560.268 |