1. Introduction
Bayesian inference is a cornerstone of modern cosmology and astrophysics. It is frequently employed to derive constraints on key signal parameters from data sets such as the Dark Energy Survey (DES, [1,2]), Planck [3], REACH [4], and SARAS2 [5], among others. Often, experiments are sensitive to different aspects of the same physics, and by combining constraints across probes we can improve our understanding of the Universe or reveal tensions between different experiments.
However, combining constraints can become computationally expensive, as many experiments feature systematic and instrumental effects, as well as contamination from other physical signals, that must be modelled alongside the signal or parameters of interest. For individual experiments, this can lead to high-dimensional problems with ≳20 parameters, the majority of which can be considered ‘nuisance’ parameters. The problem is compounded when combining data sets with different models for common nuisance components and different systematics or instrumental effects that have to be modelled.
In this work, we demonstrate that density estimators, such as kernel density estimators [6,7] and masked autoregressive flows [8], can be used to rapidly calculate reliable and reusable representations of marginal probability densities and marginal Bayesian summary statistics for key signal or cosmological parameters. This gives us access to nuisance-free likelihood functions and allows us to combine parameter constraints from different data sets in a computationally efficient manner, given marginal posterior samples from the different experiments. We use the publicly available code margarine (https://github.com/htjb/margarine, accessed on 26 October 2022) [9] to generate density estimators.
In Section 2, we mathematically demonstrate that applying margarine to the problem of combining the marginal posteriors from two data sets is equivalent to running a full nested sampling run including all ‘nuisance’ parameters; we also define the nuisance-free likelihood in that section. Section 3 briefly discusses the methodology behind margarine with reference to previously published work [9]. Finally, we show the results of combining samples from DES and Planck in Section 4 and conclude in Section 5.
3. Methods
margarine was first introduced in [9] and uses density estimation to approximate probability distributions, such as the marginal posterior and the marginal prior, given sets of representative samples. The code was initially developed to calculate marginal Kullback–Leibler (KL) divergences [10] and Bayesian model dimensionalities (BMDs) [11]; however, as discussed in Section 2, it can also be used to calculate nuisance-free likelihoods. This in turn means that we can use margarine alongside an implementation of the nested sampling algorithm to sample the product of the nuisance-free likelihoods from the different data sets. In this manner, margarine allows us to combine constraints on common parameters across different data sets, as sketched below.
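Schematically, and assuming hypothetical emulated log-likelihood callables logL_des and logL_planck (illustrative names, not margarine's API), the joint sampling step reduces to handing a nested sampler the sum of the nuisance-free log-likelihoods:

```python
# Sketch only: logL_des and logL_planck are hypothetical nuisance-free
# log-likelihoods emulated from each experiment's marginal samples.
def joint_loglike(theta):
    # the joint likelihood a nested sampler would explore
    return logL_des(theta) + logL_planck(theta)
```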
We refer the reader to [9] for a complete discussion of how margarine works; however, we briefly discuss the density estimation here. margarine uses two different types of density estimators to model posterior and prior samples, namely masked autoregressive flows (MAFs, [8]) and kernel density estimators (KDEs, [6,7]).
MAFs transform a multivariate base distribution, the standard normal, into a target distribution via a series of shifts and scalings, which are estimated by autoregressive neural networks. To improve the performance of the MAF, the samples representing the target distribution, in our case the marginal posterior and prior samples, are first transformed into a Gaussianized space. We implement the MAFs using tensorflow and the keras backend [12].
KDEs use a kernel to approximate the multivariate probability density of a set of samples. In our case, the kernel is Gaussian, and the probability density is a sum of Gaussians centred on the sample points with a given bandwidth. Again, we transform the target samples into a Gaussianized parameter space, allowing the KDE to better capture the distribution. The KDEs are implemented with SciPy in margarine [6,7,13].
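For the KDE, a minimal sketch with SciPy's gaussian_kde follows; the Gaussianizing transform used in margarine is omitted here for brevity, and the samples are stand-ins:

```python
# A minimal sketch of Gaussian kernel density estimation with SciPy.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
samples = rng.normal(size=(2, 1000))  # stand-in samples, shape (dim, n)

kde = gaussian_kde(samples)        # sum of Gaussians centred on the samples
print(kde.logpdf(samples[:, :5]))  # approximate log densities
```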
Since both types of density estimator build approximations to the target distribution using known distributions, the approximate log probabilities of the target distribution can be easily calculated. The evaluation of normalised log probabilities for the marginal posterior and marginal prior allows us to calculate the nuisance-free likelihoods, as discussed, along with the marginal Kullback–Leibler divergence,
$$\mathcal{D}_{KL} = \int \mathcal{P}(\theta)\,\log\frac{\mathcal{P}(\theta)}{\pi(\theta)}\,\mathrm{d}\theta,$$
which quantifies the amount of information gained when moving from the marginal prior to the marginal posterior.
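Putting the pieces together, a sketch of these two marginal quantities follows; log_post, log_prior, and log_Z are assumptions (normalised log-pdf callables for the trained density estimators and the evidence from the original run), not margarine's actual API:

```python
# Sketch of the marginal quantities built from the density estimators.
import numpy as np

def nuisance_free_loglike(theta, log_post, log_prior, log_Z):
    # Bayes' theorem rearranged: L(theta) = P(theta|D) * Z / pi(theta)
    return log_post(theta) + log_Z - log_prior(theta)

def marginal_kl(posterior_samples, log_post, log_prior):
    # D_KL ~ < log P(theta) - log pi(theta) > over posterior samples
    return np.mean(log_post(posterior_samples) - log_prior(posterior_samples))
```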
4. Cosmological Example
It has previously been demonstrated that margarine is capable of replicating complex probability distributions and approximating marginal Bayesian statistics such as the KL divergence and the BMD [9]. Here, we demonstrate the theory discussed in Section 2 by combining samples from the Dark Energy Survey (DES) Year 1 posterior [1] and the Planck posterior [3], using margarine to estimate the nuisance-free likelihoods. DES surveys supernovae, galaxies, and large-scale cosmic structure in an effort to measure the dark matter and dark energy densities and to model the dark energy equation of state. In contrast, Planck mapped the anisotropies in the cosmic microwave background (CMB) and correspondingly provided constraints on key cosmological parameters.
The constraints from DES and Planck have previously been combined using a full nested sampling run over all parameters, including a multitude of ‘nuisance’ parameters, in a computationally expensive exercise [14]. This corresponds to the flow chart in Figure 1, and the previous analysis gives us access to the combined DES and Planck evidence, against which we compare our results. In Figure 3, we show the DES, Planck, and joint posteriors for the six cosmological parameters derived in this work using margarine and the flow chart in Figure 2. The constrained parameters are the baryon and dark matter density parameters, $\Omega_b h^2$ and $\Omega_c h^2$, the angular size of the sound horizon at recombination, $\theta_{MC}$, the CMB optical depth, $\tau$, the amplitude of the power spectrum, $A_s$, and the corresponding spectral index, $n_s$. These make up the set $\theta = \{\Omega_b h^2, \Omega_c h^2, \theta_{MC}, \tau, A_s, n_s\}$. We use the nested sampling algorithm polychord in our analysis [15,16].
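For concreteness, a sketch of such a run with the pypolychord wrapper follows; the prior bounds, settings, and the use of the joint nuisance-free likelihood sketched in Section 3 are illustrative placeholders:

```python
# A sketch of a polychord run over the six cosmological parameters;
# bounds, settings, and joint_loglike are illustrative placeholders.
import pypolychord
from pypolychord.settings import PolyChordSettings
from pypolychord.priors import UniformPrior

nDims, nDerived = 6, 0  # six cosmological parameters, no derived ones

def likelihood(theta):
    return joint_loglike(theta), []  # joint nuisance-free log-likelihood

def prior(hypercube):
    # map the unit hypercube to the uniform prior box (placeholder bounds)
    return UniformPrior(0.0, 1.0)(hypercube)

settings = PolyChordSettings(nDims, nDerived)
settings.file_root = 'des_planck_joint'

output = pypolychord.run_polychord(likelihood, nDims, nDerived,
                                   settings, prior)
```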
We use a uniform prior that is defined to be three sigma around the Planck posterior mean. This is done to improve the efficiency of our nested sampling run. However, we subsequently have to re-weight the samples and correct the evidence for the difference between the prior used here and that used in the previous full nested sampling run [14] for comparison. If we define the prior ratio
$$ r(\theta) = \frac{\pi_B(\theta)}{\pi_A(\theta)}, $$
where A is our uniform prior space and B is our target prior space from the previous work [14], then
$$ \mathcal{Z}_B = \int \mathcal{L}(\theta)\,\pi_B(\theta)\,\mathrm{d}\theta = \mathcal{Z}_A \int r(\theta)\,\mathcal{P}_A(\theta)\,\mathrm{d}\theta, $$
giving
$$ \log \mathcal{Z}_B = \log \mathcal{Z}_A + \log \left\langle r(\theta) \right\rangle_{\mathcal{P}_A}. $$
Following similar arguments, we can transform our posteriors by re-weighting the distributions with the weights
$$ w(\theta) \propto \frac{\pi_B(\theta)}{\pi_A(\theta)}. $$
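In code, assuming arrays log_pi_A and log_pi_B of log-prior densities evaluated at the posterior samples (with the target prior emulated by a density estimator, as in our analysis), the re-weighting and evidence correction read:

```python
# Sketch of the re-weighting and evidence correction above;
# log_pi_A and log_pi_B are illustrative names.
import numpy as np

def posterior_weights(log_pi_A, log_pi_B):
    # w_i proportional to pi_B(theta_i) / pi_A(theta_i)
    log_w = log_pi_B - log_pi_A
    w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
    return w / w.sum()

def corrected_log_evidence(log_Z_A, log_pi_A, log_pi_B):
    # log Z_B = log Z_A + log < pi_B / pi_A >_{P_A}
    log_r = log_pi_B - log_pi_A
    return log_Z_A + log_r.max() + np.log(np.mean(np.exp(log_r - log_r.max())))
```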
We see from the figure and corresponding table that, with our joint analysis, we are able to derive a log-evidence that is approximately consistent with that found in [14], validating the theory discussed and its implementation with margarine. We note that the re-weighting described above relies on the calculation of the two prior log probabilities, for which we use margarine and currently do not have an error estimate. As a result, the quoted error in the combined evidence is given by the error from the nested samples alone and is likely underestimated. Using margarine [9], we can also derive the combined KL divergence, also reported in Figure 3, which we find is consistent with the result in the literature [11]. Similarly, we derive marginal KL divergences for the DES and Planck cosmological parameters using margarine. A full discussion of the implications of combining the two data sets for our understanding of cosmology can be found in the literature (e.g., [11,14]).
By reducing the number of parameters that need to be sampled, we significantly reduce the nested sampling runtime. For polychord, the runtime scales as the cube of the number of dimensions [17]. This can be seen by assessing the time complexity of the algorithm, where
$$ T \propto T_{\mathrm{like}} \times n_{\mathrm{live}} \times \mathcal{D}_{KL} \times f_{\mathrm{sampler}}. $$
Here, the number of live points, $n_{\mathrm{live}}$, scales with the number of dimensions, $d$, as does the Kullback–Leibler divergence. For polychord, the specific implementation time complexity factor, $f_{\mathrm{sampler}}$, representing the impact on the runtime of replacing dead points with higher-likelihood live points, scales linearly with $d$. Together, this gives $T \propto T_{\mathrm{like}}\, d^3$. Therefore, by using nuisance-free likelihoods and sampling over 6 parameters rather than 41 (6 cosmological plus 20 nuisance parameters for DES and 15 different nuisance parameters for Planck), we reduce the runtime by a factor of approximately $(41/6)^3 \approx 320$, with further improvements in $T_{\mathrm{like}}$. Using margarine, $T_{\mathrm{like}}$ is typically reduced, since analytic likelihoods are computationally more expensive to evaluate than emulated likelihoods.
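As a quick sanity check on the quoted factor under the $d^3$ scaling above:

```python
# Back-of-the-envelope check of the runtime reduction (arithmetic only).
d_full, d_marginal = 41, 6
print((d_full / d_marginal) ** 3)  # approximately 320
```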
5. Conclusions
In this paper, we demonstrated the consistency between combining constraints from different experiments in a marginal framework, using density estimators and the code margarine, and a full nested sampling run over all parameters, including those describing ‘nuisance’ components of the data. We have shown this consistency mathematically and with a cosmological example. For the combination of Planck and DES, we found a Bayesian evidence and KL divergence that were consistent with previous results [11,14].
The analysis in this paper is only possible because (a) we were able to estimate densities in the (much smaller) cosmological parameter space using margarine, and (b) we have the evidence, $\mathcal{Z}$, from our original nested sampling runs. It is this unique combination that allows us to compress away or discard nuisance parameters once they have been used. We note also that working in the marginal space results in a compression that is lossless in information on $\theta$, as it recovers an identical marginal posterior and total evidence to those found during a full Bayesian inference. Finally, through the nuisance-free likelihood we can significantly reduce the dimensionality of our problems, and since it is faster to emulate a likelihood than to evaluate it analytically, margarine offers a much more computationally efficient path to combined Bayesian analysis.
In principle, our work paves the way for the development of a publicly available library of cosmological density estimators modelled with margarine, which could be combined with new data sets using the proposed method more efficiently than with currently implemented techniques. However, the work also has implications outside of cosmology, in any field where multiple experiments probe different aspects of the same physics.