Flow Annealed Kalman Inversion for Gradient-Free Inference in Bayesian Inverse Problems

For many scientific inverse problems we are required to evaluate an expensive forward model. Moreover, the model is often given in such a form that it is unrealistic to access its gradients. In such a scenario, standard Markov Chain Monte Carlo algorithms quickly become impractical, requiring a large number of serial model evaluations to converge on the target distribution. In this paper we introduce Flow Annealed Kalman Inversion (FAKI). This is a generalization of Ensemble Kalman Inversion (EKI), where we embed the Kalman filter updates in a temperature annealing scheme, and use normalizing flows (NF) to map the intermediate measures corresponding to each temperature level to the standard Gaussian. In doing so, we relax the Gaussian ansatz for the intermediate measures used in standard EKI, allowing us to achieve higher fidelity approximations to non-Gaussian targets. We demonstrate the performance of FAKI on two numerical benchmarks, showing dramatic improvements over standard EKI in terms of accuracy whilst accelerating its already rapid convergence properties (typically in $\mathcal{O}(10)$ steps).


Introduction
Many scientific inference tasks are concerned with inverse problems of the form where y ∈ R dy are the data, x ∈ R d are the model parameters, G is the forward map, and η is the observation noise.Throughout this work we will assume that we do not have access to gradients of G with respect to the parameters, and that η ∼ N (0, Γ) where Γ is a fixed noise covariance.The assumption of additive Gaussian noise is the standard setting for Ensemble Kalman Inversion (EKI) [1,2,3,4,5,6,7,8], and whilst we are restricted to problems with Gaussian likelihoods, this covers a large family of scientific inverse problems.The goal of the Bayesian inverse problem is then to recover the posterior distribution over the model parameters given our observations, p(x|y).Typical gradient-free inference methods often involve some variant on Markov Chain Monte Carlo (MCMC) algorithms e.g., random walk Metropolis [9,10,11], or Sequential Monte Carlo (SMC) [12].However, these methods typically require ≳ 10 3 serial model evaluations to achieve convergence, making them intractable for problems with expensive forward models.EKI by contrast utilizes embarrassingly parallel model evaluations to update parameter estimates, typically converging to an approximate solution in O (10) iterations [2,6,7,8].
EKI leverages ideas originally developed in the context of Ensemble Kalman Filtering (EKF) for data assimilation [13].Since its development, EKI has seen applications across a range of disciplines, including studies of fluid flow [14], climate models [15] and machine learning tasks [16].EKI can be understood in the context of annealing, where seek to move from the prior to the posterior through a sequence of intermediate measures.In standard EKI, this involves constructing a sequence of Gaussian approximations to the intermediate measures.In the regime where we have a Gaussian prior π0(x) = N (m0, C0) and a linear forward model G(x) = Gx, the particle distribution obtained via EKI converges to the true posterior in the limit where the ensemble size J → ∞.However, outside this linear, Gaussian regime EKI is an uncontrolled approximation to the posterior that is constructed on the basis of matching first and second moments of the target distribution.Nonetheless, EKI has been shown to perform well on problems with nonlinear forward models and slightly non-Gaussian targets [1,2,6].
In this work we propose the application of normalizing flows (NF) [17,18,19,20] to relax the Gaussian ansatz made by standard EKI for the intermediate measures.Instead of assuming a Gaussian particle distribution at each iteration, the NF is used to fit for the empirical particle distribution and map to a Gaussian latent space, where the EKI updates are performed.In doing so, we are better able to capture non-Gaussian target geometries.The structure of this paper is as follows: in Section 2 we describe the Flow Annealed Kalman Inversion (FAKI) algorithm, in Section 3 we demonstrate the performance of the method on two Bayesian inference tasks with non-Gaussian target geometries and we summarize our work in Section 4.

Regularized Ensemble Kalman Inversion
A number of versions of EKI have been proposed in the literature.Of interest here is the regularized, perturbed observation form of EKI [6].Starting with an ensemble of particles drawn from the prior, {x j 0 } J j=1 , the particles are updated at each iteration according to The empirical covariances C xG n and C GG n are given by At each iteration we perturb the forward model evaluations with the Gaussian observation noise ξ j n ∼ N (0, Γ).The parameter αn is a Tikhonov regularisation parameter, which can be viewed as an inverse step size in the Bayesian annealing context.In particular, given a set of annealing parameters β0 ≡ 0 < β1 < . . .< βN < βN+1 ≡ 1, we have the corresponding set of target distributions with αn = βn+1 − βn.
EKI proceeds by constructing a sequence of ensemble approximations to Gaussian distributions that approximate the intermediate targets.
The choice of the regularization parameter, αn controls the transition from the prior to the posterior.Previous proposals for an adaptive choice have taken inspiration from SMC by using a threshold on the effective sample size (ESS) of the particles at each temeperature level [21,22].In this work we adopt the same approach, calculating pseudo-importance weights at each temperature given by The next temperature level can then be selected by solving using the bisection method, where 0 < τ < 1 is the target fractional ESS threshold.Throughout our work we set τ = 0.5.Full pseudocode for EKI is given in Algorithm 1.

Normalizing Flows
As discussed above, standard EKI proceeds by constructing a sequence of ensemble approximations to Gaussian distributions.The procedure works well in the situation where the target and all the intermediate measures are close to Gaussian.However, when any of these measures are far from Gaussian, EKI can dramatically fail to capture the final target geometry.
To address this shortcoming we propose the use of NFs to approximate each intermediate target, instead of using the Gaussian ansatz of standard EKI.NFs are powerful generative models that can be used for flexible density estimation and sampling [17,18,19,20].An NF model maps from the original space x ∈ R d to a latent space z ∈ R d , through a sequence of invertible transformations f = f1 • f2 • . . .• fL, such that we have a bijective mapping z = f (x).The mapping is such that the latent variables are mapped to some simple base distribution, typically chosen to be the standard Normal distribution, giving z ∼ pz(z) = N (0, I).
The NF density can be evaluated through the change of variables formula, where Df (x) = ∂f (x)/∂x denotes the Jacobian of f .Efficient evaluation of this density requires the Jacobian of the transformation to be easy to evaluate, and efficient sampling requires the inverse of the mapping f to be easy to calculate.In this work we use Masked Autoregressive Flows (MAF) [18], which have previously been found to perform well in the context of preconditioned MCMC sampling within SMC without the need for expensive hyper-parameter searches during sampling [23].

Flow Annealed Kalman Inversion
Given particles distributed as πn(x), the subsequent target can be written as We may therefore view πn(x) i.e., the posterior at the temperature level βn, as an effective prior for πn+1(x), with a data likelihood annealed by α −1 n .By fitting an NF to the particles {x j n } J j=1 , we obtain an approximate map from the intermediate target πn(x) to N (z|0, I).The latent space target is then given by the change of variables formula as By controlling the choice of αn, we control the distance between the Gaussianized effective prior and this latent space target density.For FAKI, we therefore perform the EKI updates in the NF latent space at each temperature level, allowing us to relax the Gaussian ansatz of standard EKI by constructing an approximate map from each πn(x) to a Gaussian latent space.It is worth noting that, whilst this method relaxes the Gaussianity assumptions of standard EKI, it does not address the linearity assumptions used in deriving EKI.
The FAKI update for the latent space particle locations is given by where the latent space empirical covariances are given by Full pseudocode for FAKI is given in Algorithm 2.

5:
Solve for β n+1 using the bisection method with Equation 86: Fit NF map f n to current samples {x j n } J j=1 8: Map particles to latent space z j n = f n (x j n ), j ∈ {1, . . ., J}

Results
In this section we demonstrate the performance of FAKI compared to standard EKI on two numerical benchmarks, a two dimensional Rosenbrock distribution and a stochastic Lorenz system [24,25].Both models display significant non-Gaussianity at some point during the transition from prior to posterior, severely frustrating the performance of EKI.This manifests in both reduced fidelity of the final ensemble approximations to the posterior, and in a larger number of iterations being required for convergence following the ESS-based annealing scheme described in Section 2.1.
In Table 1 we provide statistics summarizing the performance of EKI and FAKI on our numerical benchmarks.We measure the quality of the posterior approximations by computing the 1-Wasserstein distance, W1 [26,27] between the samples obtained through FAKI and EKI, against reference posterior samples obtained via long runs of Hamiltonian Monte Carlo (HMC) [28,29].These reference samples are thinned to be approximately independent when computing the 1-Wasserstein distances 1 .The 1-Wasserstein distance may be interpreted as the cost involved in rearranging one probability measure to look like another, with lower values indicating the two probability measures are closer to one another.In addition to this assessment of the approximation quality, we report the number of iterations, Niter required by FAKI and EKI for convergence.For both quantities we report the median and median absolute deviation (MAD), estimated over 10 independent runs using different random seeds.

Model
Algorithm  1: Median and MAD values for N iter and 1-Wasserstein distances for each model and algorithm combination, calculated over 10 independent runs using different random seeds.For both the numerical benchmarks we see that FAKI results in a reduced number of iterations for convergence, and a lower value of the 1-Wasserstein distance between the converged samples and the ground truth.

d = 2 Rosenbrock
In our first numerical experiment we consider the two dimensional Rosenbrock distribution.This toy model allows us to clearly see the impact of non-Gaussianity on the performance of EKI, and how FAKI is able to alleviate these issues.For the Rosenbrock model we assume a Gaussian prior over the parameters x ∈ R 2 , x ∼ N (0, 10 2 I).
The data, y ∈ R 2 are distributed according to the likelihood, To generate simulated data we evaluate y = G((1, 1) ⊺ ) + η, where η ∼ N (0, diag(0.01 2 , 1 2 )).The large difference in noise scales results in a highly non-Gaussian posterior geometry that poses a significant challenge for EKI.For each run of EKI and FAKI we use 100 particles.
In Figure 1 we show pair-plots comparing the final particle distributions obtained with EKI and FAKI against samples obtained through a long run of HMC.The NF mapping means that the ensemble approximation obtained by FAKI is able to capture the highly nonlinear target geometry.In comparison, EKI struggles to fill the tails of the Rosenbrock target.Moreover, whilst FAKI converges within ∼ 34 iterations, EKI required a median number of ∼ 100 iterations to converge using the ESS-based annealing scheme.

Stochastic Lorenz System
The Lorenz equations are a set of coupled differential equations used as a simple model of atmospheric convection.Notably, for certain parameter values the Lorenz equations are known to exhibit chaotic behaviour [24].In this work we follow [25] and consider the stochastic Lorenz system, dXt = 10(Yt − Xt)dt + dW x t , where W x t , W y t and W z t are Gaussian white noise processes with standard deviation σ0 = 0.1.To generate simulated data we integrated these equations using an Euler-Maruyama scheme with dt = 0.02 for 30 steps, with initial conditions X0, Y0, Z0 ∼ N (0, 1 2 ).The observations are then taken to be the Xt values over these 30 time steps, with Gaussian observational noise ηt ∼ N (0, σ 2 = 1 2 ).The goal of our inference here is to recover the initial conditions, the trajectories (Xt, Yt, Zt) and the innovation noise scale σ0, giving a parameter space of d = 94 dimensions.We assign priors over these parameters as, X0, Y0, Z0 ∼ N (0, 1 2 ), ( 21) where fX , fY and fZ are the transition functions corresponding to Equations 17-19 respectively.The Gaussian likelihood has the form Xt ∼ N (Xt, σ 2 ), t ∈ {1, . . ., 30}, (25) where Xt are the observations of the Xt trajectory.The chaotic dynamics of the Lorenz system results in a highly non-Gaussian prior distribution, with the inversion having to proceed through a sequence of highly non-Gaussian intermediate measures towards the posterior.This severely frustrates the performance of EKI, with the Gaussian ansatz failing to describe the geometry of the intermediate measures.For each run of EKI and FAKI we use 940 particles.
In Figure 2 we show the ensemble estimates for the mean and standard deviation along each dimension obtained by EKI and FAKI, compared to reference estimates obtained through long runs of HMC.FAKI is able to obtain accurate mean estimates along each dimension, whereas EKI is unable to obtain the correct means for much of the Zt trajectory.EKI severely overestimates the marginal standard deviations along many dimensions.This situation is alleviated by the NF mappings learned by FAKI.The greater fidelity of the FAKI posterior approximations are reflected in the median estimates for the 1-Wasserstein distances, with a value of 5.65 for FAKI and 69.8 for EKI.

Conclusions
In this work we have introduced Flow Annealed Kalman Inversion (FAKI), a gradient-free inference algorithm for Bayesian inverse problems with expensive forward models.This is a generalization of Ensemble Kalman Inversion (EKI), where we utilize Normalizing Flows (NF) to replace the Gaussian ansatz made in EKI.Instead of constructing a sequence of ensemble approximations to Gaussian measures that approximate a sequence of intermediate measures, as we move from the prior to the posterior, we learn an NF mapping at each iteration to a Gaussian latent space.Provided the transition between temperature levels is controlled, we can perform Kalman inversion updates in the NF latent space.In the NF latent space, the Gaussianity assumptions of EKI are more closely satisfied, resulting in a more stable inversion at each temperature level.
We demonstrate the performance of FAKI on two numerical benchmarks, a d = 2 Rosenbrock distribution and a d = 94 stochastic Lorenz system.Both examples exhibit significant non-Gaussianity in the transition from prior to posterior that frustrate standard EKI.In the presence of strong non-Gaussianity, we find FAKI produces higher fidelity posterior approximations compared to EKI, as measured by the 1-Wasserstein distance between FAKI/EKI samples and reference HMC samples.In addition to the improved fidelity of the posterior approximations, we find FAKI tends to reduce the number of iterations required for convergence.
Whilst the application of NFs is able to relax the Gaussian ansatz of EKI, it does not address the linearity assumptions used in deriving EKI.As such, FAKI is still not exact for general forward models.In future work, it will be interesting to explore methods to address this, for example the combination of FAKI with unbiased MCMC or importance sampling methods.It would also be interesting to consider generalizations of FAKI that are able to accommodate non-Gaussian likelihoods and/or parameter-dependent noise covariances.The use of NFs means that we typically require ensemble sizes J ≳ 10d to learn accurate NF maps with the MAF architecture employed in this work.It would be useful to explore alternative NF architectures and regularization schemes that are able to learn accurate NF maps with smaller ensemble sizes, in order to enable FAKI to scale to higher dimensions.In this work, we have found that the MAF architecture is able to capture a wide range of target geometries without the need for expensive NF hyper-parameter searches.However, it may be possible to exploit NF architectures with inductive biases that are particularly suited to common target geometries e.g., the nonlinear correlations that often appear in hierarchical models.

Figure 1 :
Figure 1: Pair-plots for the Rosenbrock target.Panel (a): Pair-plot comparison of samples from EKI and a long HMC run.Panel (b): Pair-plot comparison of samples from FAKI and a long HMC run.Samples from FAKI are able to correctly capture the highly nonlinear target geometry.Standard EKI struggles to fill the tails of the target, and requires ∼ 100 iterations to converge, compared to ∼ 34 iterations for FAKI.

Figure 2 :
Figure 2: Comparison of first and second moment estimates along each dimension for the stochastic Lorenz system.Panel (a): Comparison between the mean estimates from EKI and a long HMC run.Panel (b): Comparison between the mean estimates from FAKI and a long HMC run.Panel (c): Comparison between the standard deviation estimates from EKI and a long HMC run.Panel (d): Comparison between the standard deviation estimates from FAKI and a long HMC run.Blue bars indicate the moment estimates obtained via HMC along each dimension, with the adjacent orange bars showing the estimates obtained through EKI/FAKI.EKI is unable to obtain accurate mean estimates for much of the Z t trajectory, whilst FAKI is able to obtain accurate mean estimates for each dimension.FAKI outperforms EKI in its estimates of the marginal standard deviations, with EKI drastically overestimating the standard deviations along many dimensions.