1. Introduction
Bayesian inference is a cornerstone of modern cosmology and astrophysics. It is frequently employed to derive constraints on key signal parameters from data sets such as the Dark Energy Survey (DES, [1,2]), Planck [3], REACH [4], and SARAS2 [5], among others. Often, experiments are sensitive to different aspects of the same physics, and by combining constraints across probes we can improve our understanding of the Universe or reveal tensions between different experiments.
However, combining constraints can become computationally expensive, as many experiments feature systematic and instrumental effects, as well as contamination from other physical signals, that must be modelled alongside the signal or parameters of interest. For individual experiments, this can lead to high-dimensional problems with ≳20 parameters, the majority of which can be considered ‘nuisance’ parameters. The problem is compounded when combining data sets with different models for common nuisance components and different systematics or instrumental effects that have to be modelled.
In this work, we demonstrate that density estimators, such as kernel density estimators [6,7] and masked autoregressive flows [8], can be used to rapidly calculate reliable and reusable representations of marginal probability densities and marginal Bayesian summary statistics for key signal or cosmological parameters. This gives us access to nuisance-free likelihood functions and allows us to combine parameter constraints from different data sets in a computationally efficient manner, given marginal posterior samples from the different experiments. We use the publicly available code margarine (https://github.com/htjb/margarine, accessed on 26 October 2022) [9] to generate density estimators.
In Section 2, we mathematically demonstrate that applying margarine to the problem of combining the marginal posteriors from two data sets is equivalent to running a full nested sampling run including all ‘nuisance’ parameters; we also define the nuisance-free likelihood in that section. Section 3 briefly discusses the methodology behind margarine with reference to previously published work [9]. Finally, we show the results of combining samples from DES and Planck in Section 4 and conclude in Section 5.
3. Methods
margarine was first introduced in [9] and uses density estimation to approximate probability distributions, such as the marginal posterior and the marginal prior, given sets of representative samples. The code was initially developed to calculate marginal Kullback–Leibler (KL) divergences [10] and Bayesian model dimensionalities (BMDs) [11]; however, as discussed in Section 2, it can also be used to calculate nuisance-free likelihoods. This in turn means that we can use margarine alongside an implementation of the nested sampling algorithm to sample the product of the nuisance-free likelihoods from the different data sets. In this manner, margarine allows us to combine constraints on common parameters across different data sets, as sketched below.
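Schematically, and assuming hypothetical emulated log-likelihood callables logL_des and logL_planck (illustrative names, not margarine's API), the joint sampling step reduces to handing a nested sampler the sum of the nuisance-free log-likelihoods:

```python
# Sketch only: logL_des and logL_planck are hypothetical nuisance-free
# log-likelihoods emulated from each experiment's marginal samples.
def joint_loglike(theta):
    # the joint likelihood a nested sampler would explore
    return logL_des(theta) + logL_planck(theta)
```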
We refer the reader to [9] for a complete discussion of how margarine works; however, we briefly discuss the density estimation here. margarine uses two different types of density estimators to model posterior and prior samples, namely masked autoregressive flows (MAFs, [8]) and kernel density estimators (KDEs, [6,7]).
MAFs transform a multivariate base distribution, the standard normal, into a target distribution via a series of shifts and scalings, which are estimated by autoregressive neural networks. To improve the performance of the MAF, the samples representing the target distribution, in our case the marginal posterior and prior samples, are first transformed into a Gaussianized space. We implement the MAFs using tensorflow and the keras backend [12].
KDEs use a kernel to approximate the multivariate probability density of a set of samples. In our case, the kernel is Gaussian, and the probability density is a sum of Gaussians centred on the sample points with a given bandwidth. Again, we transform the target samples into a Gaussianized parameter space, allowing the KDE to better capture the distribution. The KDEs are implemented with SciPy in margarine [6,7,13].
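For the KDE, a minimal sketch with SciPy's gaussian_kde follows; the Gaussianizing transform used in margarine is omitted here for brevity, and the samples are stand-ins:

```python
# A minimal sketch of Gaussian kernel density estimation with SciPy.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
samples = rng.normal(size=(2, 1000))  # stand-in samples, shape (dim, n)

kde = gaussian_kde(samples)        # sum of Gaussians centred on the samples
print(kde.logpdf(samples[:, :5]))  # approximate log densities
```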
Since both types of density estimator build approximations to the target distribution using known distributions, the approximate log probabilities of the target distribution can be easily calculated. The evaluation of normalised log probabilities for the marginal posterior and marginal prior allows us to calculate the nuisance-free likelihoods, as discussed, along with the marginal Kullback–Leibler divergence,
$$\mathcal{D}_{KL} = \int \mathcal{P}(\theta)\,\log\frac{\mathcal{P}(\theta)}{\pi(\theta)}\,\mathrm{d}\theta,$$
which quantifies the amount of information gained when moving from the marginal prior to the marginal posterior.
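Putting the pieces together, a sketch of these two marginal quantities follows; log_post, log_prior, and log_Z are assumptions (normalised log-pdf callables for the trained density estimators and the evidence from the original run), not margarine's actual API:

```python
# Sketch of the marginal quantities built from the density estimators.
import numpy as np

def nuisance_free_loglike(theta, log_post, log_prior, log_Z):
    # Bayes' theorem rearranged: L(theta) = P(theta|D) * Z / pi(theta)
    return log_post(theta) + log_Z - log_prior(theta)

def marginal_kl(posterior_samples, log_post, log_prior):
    # D_KL ~ < log P(theta) - log pi(theta) > over posterior samples
    return np.mean(log_post(posterior_samples) - log_prior(posterior_samples))
```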
4. Cosmological Example
It has previously been demonstrated that margarine is capable of replicating complex probability distributions and approximating marginal Bayesian statistics such as the KL divergence and the BMD [9]. Here, we demonstrate the theory discussed in Section 2 by combining samples from the Dark Energy Survey (DES) Year 1 posterior [1] and the Planck posterior [3], using margarine to estimate the nuisance-free likelihoods. DES surveys supernovae, galaxies, and large-scale cosmic structure in an effort to measure the dark matter and dark energy densities and to model the dark energy equation of state. In contrast, Planck mapped the anisotropies in the cosmic microwave background (CMB) and correspondingly provided constraints on key cosmological parameters.
The constraints from DES and Planck have previously been combined using a full nested sampling run over all parameters, including a multitude of ‘nuisance’ parameters, in a computationally expensive exercise [14]. This corresponds to the flow chart in Figure 1, and the previous analysis gives us access to the combined DES and Planck evidence, against which we compare our results. In Figure 3, we show the DES, Planck, and joint posteriors for the six cosmological parameters derived in this work using margarine and the flow chart in Figure 2. The constrained parameters are the baryon and dark matter density parameters, $\Omega_b h^2$ and $\Omega_c h^2$, the angular size of the sound horizon at recombination, $\theta_{MC}$, the CMB optical depth, $\tau$, the amplitude of the power spectrum, $A_s$, and the corresponding spectral index, $n_s$. These make up the set $\theta = \{\Omega_b h^2, \Omega_c h^2, \theta_{MC}, \tau, A_s, n_s\}$. We use the nested sampling algorithm polychord in our analysis [15,16].
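For concreteness, a sketch of such a run with the pypolychord wrapper follows; the prior bounds, settings, and the use of the joint nuisance-free likelihood sketched in Section 3 are illustrative placeholders:

```python
# A sketch of a polychord run over the six cosmological parameters;
# bounds, settings, and joint_loglike are illustrative placeholders.
import pypolychord
from pypolychord.settings import PolyChordSettings
from pypolychord.priors import UniformPrior

nDims, nDerived = 6, 0  # six cosmological parameters, no derived ones

def likelihood(theta):
    return joint_loglike(theta), []  # joint nuisance-free log-likelihood

def prior(hypercube):
    # map the unit hypercube to the uniform prior box (placeholder bounds)
    return UniformPrior(0.0, 1.0)(hypercube)

settings = PolyChordSettings(nDims, nDerived)
settings.file_root = 'des_planck_joint'

output = pypolychord.run_polychord(likelihood, nDims, nDerived,
                                   settings, prior)
```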
We use a uniform prior that is defined to be three sigma around the Planck posterior mean. This is done to improve the efficiency of our nested sampling run. However, we subsequently have to re-weight the samples and correct the evidence for the difference between the prior used here and that used in the previous full nested sampling run [14] for comparison. If we define the prior ratio
$$ r(\theta) = \frac{\pi_B(\theta)}{\pi_A(\theta)}, $$
where A is our uniform prior space and B is our target prior space from the previous work [14], then
$$ \mathcal{Z}_B = \int \mathcal{L}(\theta)\,\pi_B(\theta)\,\mathrm{d}\theta = \mathcal{Z}_A \int r(\theta)\,\mathcal{P}_A(\theta)\,\mathrm{d}\theta, $$
giving
$$ \log \mathcal{Z}_B = \log \mathcal{Z}_A + \log \left\langle r(\theta) \right\rangle_{\mathcal{P}_A}. $$
Following similar arguments, we can transform our posteriors by re-weighting the distributions with the weights
$$ w(\theta) \propto \frac{\pi_B(\theta)}{\pi_A(\theta)}. $$
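In code, assuming arrays log_pi_A and log_pi_B of log-prior densities evaluated at the posterior samples (with the target prior emulated by a density estimator, as in our analysis), the re-weighting and evidence correction read:

```python
# Sketch of the re-weighting and evidence correction above;
# log_pi_A and log_pi_B are illustrative names.
import numpy as np

def posterior_weights(log_pi_A, log_pi_B):
    # w_i proportional to pi_B(theta_i) / pi_A(theta_i)
    log_w = log_pi_B - log_pi_A
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    return w / w.sum()

def corrected_log_evidence(log_Z_A, log_pi_A, log_pi_B):
    # log Z_B = log Z_A + log < pi_B / pi_A >_{P_A}
    log_r = log_pi_B - log_pi_A
    return log_Z_A + log_r.max() + np.log(np.mean(np.exp(log_r - log_r.max())))
```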
We see from the figure and corresponding table that, with our joint analysis, we are able to derive a log-evidence that is approximately consistent with that found in [14], validating the theory discussed and its implementation with margarine. We note that the re-weighting described above relies on the calculation of the two prior log probabilities, for which we use margarine and currently do not have an error estimate. As a result, the quoted error in the combined evidence is given by the error from the nested samples alone and is likely underestimated. Using margarine [9], we can also derive the combined KL divergence, also reported in Figure 3, which we find is consistent with the result in the literature [11]. Similarly, we derive marginal KL divergences for the DES and Planck cosmological parameters using margarine. A full discussion of the implications of combining the two data sets for our understanding of cosmology can be found in the literature (e.g., [11,14]).
By reducing the number of parameters that need to be sampled, we significantly reduce the nested sampling runtime. For polychord, the runtime scales as the cube of the number of dimensions [17]. This can be seen by assessing the time complexity of the algorithm, where
$$ T \propto T_{\mathrm{like}} \times n_{\mathrm{live}} \times \mathcal{D}_{KL} \times f_{\mathrm{sampler}}. $$
Here, the number of live points, $n_{\mathrm{live}}$, scales with the number of dimensions, $d$, as does the Kullback–Leibler divergence. For polychord, the specific implementation time complexity factor, $f_{\mathrm{sampler}}$, representing the impact on the runtime of replacing dead points with higher-likelihood live points, scales linearly with $d$. Together, this gives $T \propto T_{\mathrm{like}}\, d^3$. Therefore, by using nuisance-free likelihoods and sampling over 6 parameters rather than 41 (6 cosmological plus 20 nuisance parameters for DES and 15 different nuisance parameters for Planck), we reduce the runtime by a factor of approximately $(41/6)^3 \approx 320$, with further improvements in $T_{\mathrm{like}}$. Using margarine, $T_{\mathrm{like}}$ is typically reduced, since analytic likelihoods are computationally more expensive to evaluate than emulated likelihoods.
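As a quick sanity check on the quoted factor under the $d^3$ scaling above:

```python
# Back-of-the-envelope check of the runtime reduction (arithmetic only).
d_full, d_marginal = 41, 6
print((d_full / d_marginal) ** 3)  # approximately 320
```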
5. Conclusions
In this paper, we demonstrated the consistency between combining constraints from different experiments in a marginal framework, using density estimators and the code margarine, and a full nested sampling run over all parameters, including those describing ‘nuisance’ components of the data. We have shown this consistency mathematically and with a cosmological example. For the combination of Planck and DES, we found a Bayesian evidence and KL divergence that were consistent with previous results [11,14].
The analysis in this paper is only possible because (a) we were able to estimate densities in the (much smaller) cosmological parameter space using margarine, and (b) we have the evidence, $\mathcal{Z}$, from our original nested sampling runs. It is this unique combination that allows us to compress away or discard nuisance parameters once they have been used. We note also that working in the marginal space results in a compression that is lossless in information on $\theta$, as it recovers an identical marginal posterior and total evidence to those found during a full Bayesian inference. Finally, through the nuisance-free likelihood we can significantly reduce the dimensionality of our problems, and since it is faster to emulate a likelihood than to evaluate it analytically, margarine offers a much more computationally efficient path to combined Bayesian analysis.
In principle, our work paves the way for the development of a publicly available library of cosmological density estimators modelled with margarine, which could be combined with new data sets using the proposed method more efficiently than with currently implemented techniques. However, the work also has implications outside of cosmology, in any field where multiple experiments probe different aspects of the same physics.