Snowballing Nested Sampling

A new way to run nested sampling, combined with realistic MCMC proposals to generate new live points, is presented. Nested sampling is run with a fixed number of MCMC steps. Subsequently, snowballing nested sampling extends the run to more and more live points. This stabilizes MCMC proposals over time, and leads to pleasant properties, including that the number of live points and number of MCMC steps do not have to be calibrated, that the evidence and posterior approximation improves as more compute is added and can be diagnosed with convergence diagnostics from the MCMC literature. Snowballing nested sampling converges to a ``perfect'' nested sampling run with infinite number of MCMC steps.


Introduction
Nested sampling [1] is a popular [2] Monte Carlo algorithm for Bayesian inference, estimating the posterior distribution and the Bayesian evidence.A set of K live points are initially drawn from the prior, and their likelihoods evaluated.In each iteration of the algorithm, the live point with the lowest likelihood is discarded and becomes a dead point.It is replaced with a new point sampled from the prior, under the constraint that the likelihood of the discarded point is exceeded.This likelihood-restricted prior sampling (LRPS) can be achieved in various ways (see Buchner [3] for an extensive review).
Skilling [1] originally proposed Markov Chain Monte Carlo (MCMC) random walk algorithms for the LRPS.This is still the most popular choice for high-dimensional problems, including slice sampling variants (e.g.[4]), recently compared for efficiency and reliability in [5].Such algorithms start from a randomly chosen live point and make M transitions to yield the new live point.To avoid a collapse of the live point set, and sufficiently explore the likelihood-restricted prior, M has to be chosen large enough.However, as Salomone et al. [6] discusses, mild correlations between live points are acceptable.Thus, a suitably efficient LRPS algorithm has to be chosen by the user, as well as the size of the live points, K, which regulates the resolution of the sampling and the uncertainty on the outputs.Since M is not known before-hand, to get reliable outputs [7] recommended to repeatedly run nested sampling with ever-doubling M until the Bayesian evidence stabilizes.However, this may discard information built with substantial computational resources.
This work highlights that there is an interplay between M and K: If the number of live points is large, nested sampling discards only a small fraction of the prior in each iteration (1/K of its probability mass), and a large set of live points is representing the likelihood distribution of the restricted prior.The higher K, and the slower nested sampling traverses the likelihood, the more acceptable correlations between points should be.
At termination, nested sampling provides posterior samples from the dead points.Each sampled point is weighted by the likelihood times the prior volume discarded at that iteration.Because nested sampling iterates with a fixed speed through the prior, the number of samples with non-negligible weight may be low.Two solutions have been proposed for this: (1) to keep intermediate "phantom" points [4] from the MCMC chains to represent the dead point, instead of a single estimate.A drawback here is memory storage.(2) to dynamically widen K in a selected likelihood interval (dynamic nested arXiv:2308.05816v1[stat.CO] 10 Aug 2023 sampling, 8).The latter can rerun likelihood intervals where most of the posterior weight is, bulking the posterior samples.The uncertainty on the evidence is dominated by the stochastic sampling of a limited number of K live points up to this point.Therefore, for enhanced precision on the evidence, all iterations have to be extended to more live points.
In this work, we present a relatively derivative form of nested sampling which (1) avoids choosing M and K carefully beforehand, (2) does not discard previous computations, (3) achieves arbitrarily large effective number of posterior samples as compute increases, and (4) can be diagnosed for convergence despite an imperfect LRPS.

Method
Instead of improving nested sampling convergence with a limited Markov Chain by increasing the number of steps M, we increase the number of live points K.

Basic algorithm
The algorithm proposed here, Snowballing Nested Sampling, is: At each iteration, this algorithm produces nested sampling results (evidence, posterior samples)."Running nested sampling" means running with a MCMC LRPS with a fixed number of steps, M, until termination.For example, one can adopt a nested sampling termination criterion that makes the integral contribution remaining in the live points negligible (e.g., Z live /Z i < 10 −6 ).

Convergence analysis
The rate of compression in (vanilla) nested sampling is K−1 K [9].Thus, after t iterations, the likelihood-restricted prior has volume K−1 K t .The same restricted prior volume can be reached after times the number of live points.With twice the number of iterations, there are twice as many dead points.Loosely speaking, each dead point from a run with K is split into two dead points from a run with K ′ live points.The noise in the nested sampling estimate due to more dead points is also reduced, because the variance is inversely proportional to K [10,11].However, here, we focus on analysing the convergence requirements of the LRPS.
We assume an MCMC random walk with a fixed number of steps M is adopted as an LRPS method.With the restricted prior compressing twice as slowly, and each dead point being split into two dead points, naturally, we find the following equivalence: A nested sampling run with M steps and K live points is comparable to a run with K × 2 live points but M/2 steps, in terms of correlations between live points that could be problematic for nested sampling.Based on this, K → ∞ with fixed M achieves the properties of M → ∞ with fixed K, i.e., a perfect LRPS algorithm for vanilla nested sampling.For the latter case, [9] proved unbiasedness and convergence of Z and the posterior distribution,.A perfect LRPS algorithm is reached with a proposal kernel that is unbounded (or has the support of the prior), as M → ∞.Therefore, the results of [9] also hold for snowballing NS, as K → ∞.

Improved algorithm
For practical purposes, the algorithm above can be improved slightly.Some LRPS methods need at least a few times the dimensionality of the parameter to build effective transition kernels.Therefore the iterations with the lowest K can be skipped, and we start at K 0 .Since there may be some computation overhead beyond model evaluations in nested sampling, K can be incremented in steps of K inc .
It is also not necessary to discard runs from previous iterations and start from scratch.Firstly, a subsequent nested sampling run can be initialised with the same initial prior samples as in the previous iterations, and K inc additional ones.Subsequently in each nested sampling iteration, a live point is discarded, and replaced with a new point.In continuous likelihood functions without plateaus, the call to the LRPS at each dead point, creating a new prior sample based on a threshold L min , has a uniquely identifiable input, L min .Therefore, the LRPS call can be memoized.That is, the result of the function is stored with the argument, L min , as its identifier, and when later the function evaluation with the same argument is needed, the result is returned based on a look up.Only if the identifier is not stored yet, the LRPS call at that L min needs to be made.See [5,7,12,13] for previous work on resuming or combining nested sampling runs.
Put together, the improved algorithm is: resume Nested Sampling with K live points

Results
We test the algorithm on the Rosenbrock function over a flat prior from -10 to +10 in d = 20 dimensions.The likelihood is: With K 0 = 20, and K inc = 20, and M = 20, we observe that ln(Z) evolves as shown in Figure 1.From a initial high value, the estimate declines approximately proportional to 1/(M × K) towards the (presumably true) value if nested sampling was run with K = ∞.

Discussion
We have presented an iterative form of applying nested sampling.It can be understood as a kind of dynamic nested sampling.The proposed algorithm has several amicable properties: 1. the number of live points and the number of steps does not have to be chosen carefully, 2. an initial quick look result is available, 3. the evidence estimate improves as more compute is added, 4. in multi-modal settings, additional peaks are discovered as more compute is added, 5. the convergence can be observed and tested with conventional methods based on MCMC, 6. the posterior sample size increases proportionally to the compute, 7. it is easy to implement.
The convergence as more compute is added, the revisiting and further exploration of missed peaks, and the possibility to use conventional MCMC convergence diagnostics, are properties shared with diffusive nested sampling [14].That algorithm is based on likelihood levels.Instead of monotonically switching to higher likelihoods, the sampler can randomly go up or down in levels, and add more samples (live points) there.Unlike diffusive nested sampling, snowballing nested sampling provides an uncertainty estimate on the evidence Z.
Finally, we make some remarks on adaptive MCMC algorithms.Salomone et al. [6] pointed out that on-the-fly adaptation of MCMC algorithms is problematic, and a warm-up nested sampling run should be performed, the adapted proposal kernel (parameters) at each iteration stored, and applied without adaptation in a second, fresh nested sampling run.If the MCMC transition kernel is created based on estimating a sample covariance matrix of the live point population, this estimate stabilizes as K increases.Newly added K inc live points only mildly alter the estimate as K → ∞.Another popular adaptation is to alter the proposal scale in Metropolis random walks to target a certain acceptance rate.If the proposal scale is chosen based on the acceptance rate of the previous K points and their proposal scale, through a weighted mean targeting an acceptance rate of 23.4 per cent, then we have a situation of a vanishing adaptation [see 15] with convergence to the true posterior.Furthermore, as the sampling is more faithful, the live points are also more faithfully distributed, which in turn improves the sampling, leading to a positive feedback effect.This accelerates the convergence further.

Figure 1 .
Figure 1.Estimate of ln(Z) for each algorithm iteration.