Implicit Equal-Weights Variational Particle Smoother

Abstract: Motivated by the great success of four-dimensional variational (4D-Var) data assimilation methods and the advantages of ensemble methods (e.g., ensemble Kalman filters and particle filters) in numerical weather prediction systems, we introduce the implicit equal-weights particle filter scheme into the weak constraint 4D-Var framework, which avoids filter degeneracy in high-dimensional settings through implicit sampling. The new variational particle smoother (varPS) method has been tested and explored using the Lorenz96 model with dimensions N_x = 40, 100, 250, and 400. The results show that the new varPS method does not suffer from the curse of dimensionality by construction and that its root mean square error (RMSE) is comparable with that of the ensemble 4D-Var method. As a combination of the implicit equal-weights particle filter and weak constraint 4D-Var, the new method reduces the RMSE compared with the implicit equal-weights particle filter and the local ensemble transform Kalman filter (LETKF), and enlarges the ensemble spread compared with the ensemble 4D-Var scheme. To overcome the difficulty of applying the implicit equal-weights particle filter to real geophysical problems, the posterior error covariance matrix is estimated using a limited ensemble and can be calculated in parallel. In general, the new varPS performs slightly better in ensemble quality (the balance between the RMSE and ensemble spread) than the ensemble 4D-Var and has the potential to be applied to real geophysical systems.


Introduction
The particle filter (PF) is a sequential Monte Carlo method, which uses Monte Carlo sampling particles to estimate and represent the posterior probability density functions (PDFs). The advantage of PFs over variational methods and the EnKF is that PFs estimate the posterior without linear or Gaussian assumptions. PFs have been successfully applied in low-dimensional systems, e.g., [1,2]. A PF uses a set of weighted particles to estimate the posterior PDF of the model state. The weight of each particle is proportional to the likelihood of the observations, i.e., the conditional PDF of the observations given the model state. However, in a high-dimensional model with a large number of independent observations, typically only one particle obtains a weight close to one while all others have weights close to zero, a problem known as filter degeneracy or the collapse of the filter. To prevent filter degeneracy, the PF requires the number of particles to scale exponentially with the number of independent observations [3,4]. In practice, the perfect-model assumption can be relaxed by allowing for Gaussian additive model error in the dynamical model, which makes the problem fully four-dimensional. The most appropriate initial condition and model error are then found by minimizing the errors in the initial state, the observations, and the model over a time window. This is the weak constraint 4D-Var.
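The collapse described above is easy to reproduce numerically. The following sketch (illustrative only; the particle cloud, observation, and unit-variance errors are assumed, not taken from any experiment in this paper) measures degeneracy through the effective sample size N_eff = 1/Σ w_i², which drops toward 1 as the number of independent observations grows:

```python
import numpy as np

def effective_sample_size(log_weights):
    """N_eff = 1 / sum(w_i^2) for normalized weights; computed stably in log space."""
    w = np.exp(log_weights - np.max(log_weights))
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

rng = np.random.default_rng(0)
n_particles = 100
ess = {}
for dim in (1, 10, 100):
    particles = rng.standard_normal((n_particles, dim))  # prior samples
    y = rng.standard_normal(dim)                         # one observation per component
    # Gaussian log-likelihood of y given each particle (unit observation-error variance)
    log_w = -0.5 * np.sum((y - particles) ** 2, axis=1)
    ess[dim] = effective_sample_size(log_w)
    print(f"dim = {dim:4d}  N_eff = {ess[dim]:.1f}")
```

With 100 particles, the effective sample size is healthy in one dimension but collapses toward a single particle once ~100 components are observed independently, which is exactly the degeneracy the equal-weights construction is designed to avoid.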
The new algorithm is designed as a compromise between weak constraint 4D-Var and the IEWPF, and inherits the merits of both. Efforts have been made before to introduce a particle filter scheme into a 4D-Var framework. Morzfeld et al. [11] applied a localized particle smoother in strong constraint 4D-Var, which prevents the collapse of the variational particle smoother and yields results comparable with those of the ensemble version of the 4D-Var method. The particle smoother has a natural connection to the weak constraint 4D-Var formulation, and implicit sampling by Monte Carlo methods can easily be applied in a 4D-Var data assimilation system through the proposal transition density. The major obstacle to implementing the IEWPF in a real dynamical system is the calculation of the covariance of the proposal density (the P matrix), so the key point in implementing the IEWPF in the weak constraint 4D-Var framework is the expression of the P matrix. As shown in Morzfeld et al. [11], the P matrix is connected with the Hessian matrix of the weak constraint 4D-Var cost function, and weak constraint 4D-Var provides an effective way of calculating this Hessian. To avoid explicit estimation of the P matrix, we estimate the random part of each updated model state using ensembles. We introduce implicit sampling and the proposal transition density into the weak constraint 4D-Var framework, using a scale factor α to slightly adjust the covariance of the proposal density (the P matrix) so as to fulfill the equal-weights condition.
This paper is organized as follows. In Section 2, we describe the implicit equal-weights particle smoother algorithm in detail. Section 3 illustrates the performance of the new scheme on the nonlinear Lorenz96 model. Conclusions and discussion are presented in Section 4.

Implicit Equal-Weights Variational Particle Smoother
In this section, we describe the basic idea of the new algorithm, referred to from now on as the implicit equal-weights variational particle smoother (IEWVPS). Two key points, the selection of α and estimation of the P matrix, are introduced in detail in the subsections.

The Basic Idea
Before introducing the IEWVPS algorithm, some assumptions must be made. As in Zhu et al. [16], it is assumed that the observation errors have a Gaussian distribution and that the observations are spatially and temporally independent. The observation operator H is linear, and the model errors are Gaussian. The dynamical model is Markovian, which means that the model state at time t depends only on the model state at the previous time t − 1.
According to Bayes' theorem, the posterior PDF of the model state x_{0:n} given the observations y_{1:n} during the time window [0, n] can be written as:

p(x_{0:n} | y_{1:n}) = p(x_{0:n}) p(y_{1:n} | x_{0:n}) / p(y_{1:n})    (1)

Applying the idea of proposal density, one can multiply the numerator and the denominator of Equation (1) by the same factor q(x_{0:n} | y_{1:n}), leading to:

p(x_{0:n} | y_{1:n}) = [p(x_{0:n}) p(y_{1:n} | x_{0:n}) / (p(y_{1:n}) q(x_{0:n} | y_{1:n}))] q(x_{0:n} | y_{1:n})    (2)

The support of q(x_{0:n} | y_{1:n}) should be equal to or larger than that of p(x_{0:n}). Instead of drawing samples from the original transition density p(x_{0:n}) = ∏_{k=1}^{n} p(x_k | x_{k−1}) p(x_0), we can now draw samples from the proposal transition density q(x_{0:n} | y_{1:n}), so that the posterior PDF is expressed as:

p(x_{0:n} | y_{1:n}) ≈ ∑_{i=1}^{N_e} w_i δ(x_{0:n} − x_{0:n}^i)    (3)

where N_e is the number of particles and w_i is the weight of particle i:

w_i ∝ p(y_{1:n} | x_{0:n}^i) p(x_{0:n}^i) / q(x_{0:n}^i | y_{1:n})    (4)

As the observation error and model error are assumed Gaussian, the numerator of Equation (4) can be expressed in terms of the cost function of weak constraint 4D-Var of particle i:

p(y_{1:n} | x_{0:n}^i) p(x_{0:n}^i) ∝ exp[−J_i(x_{0:n}^i)]    (5)

where J_i(x_{0:n}^i) is the cost function of weak constraint 4D-Var of particle i:

J_i(x_{0:n}) = 1/2 ‖x_0 − x_b‖²_{B⁻¹} + 1/2 ∑_{k=1}^{n} ‖y_k − H(x_k)‖²_{R⁻¹} + 1/2 ∑_{k=1}^{n} ‖x_k − M(x_{k−1})‖²_{Q⁻¹}    (6)

where ‖a‖²_{A⁻¹} ≡ aᵀ A⁻¹ a, x_b is the initial background state, and Q is the model error covariance. Morzfeld et al. [11] described the connection between the variational particle smoother and the ensemble 4D-Var. According to this connection, the numerator of Equation (4) can be written as:

p(y_{1:n} | x_{0:n}^i) p(x_{0:n}^i) ∝ exp(−φ_i/2) N(x_a^{i,0:n}, P_i)    (7)

where N(x_{a,0:n}, P) = A exp[−1/2 (x_{0:n} − x_{a,0:n})ᵀ P⁻¹ (x_{0:n} − x_{a,0:n})], A is a constant, x_a^{i,0:n} = argmin J_i(x_{0:n}^i) is the minimizer of the cost function, and φ_i/2 = min J_i(x_{0:n}^i) is the minimum of the cost function. P_i is the inverse of the Hessian of J_i(x_{0:n}^i) and can be estimated from the Gauss-Newton approximation of the Hessian, which requires the second derivative of J_i(x_{0:n}^i), computed with the tangent linear and adjoint models of the dynamical model.
In this study, the proposal transition density can now be chosen as Gaussian with mean x_a^{i,0:n} and covariance P_i, so that:

q(x_{0:n}^i | y_{1:n}) = N(x_a^{i,0:n}, P_i)    (8)

By implementing the implicit equal-weights particle filter (IEWPF) scheme of Zhu et al. [16] in the weak constraint 4D-Var framework, the updated model state of particle i is computed as:

x_{0:n}^i = x_a^{i,0:n} + α_i^{1/2} P_i^{1/2} ξ_{0:n}^i    (9)

where ξ_{0:n}^i is a random vector drawn from the implicit proposal density q(ξ_{0:n}) = N(0, I), and α_i is a scalar chosen by setting the weights of all particles to the same target weight [16]. In the original formulation of Zhu et al. [16], α_i has an analytical solution in terms of the Lambert W function [25]. Pseudo-code for the IEWVPS is provided in Algorithm 1 and a schematic diagram of the IEWVPS is shown in Figure 1.
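As a concrete illustration, the sketch below draws samples with an update of the implicit-sampling form x = x_a + α^{1/2} P^{1/2} ξ used here; the analysis state x_a, the covariance factor P^{1/2}, and α are small assumed toy values, not quantities from the paper's experiments. The sample mean approaches x_a and the sample covariance approaches α P:

```python
import numpy as np

def implicit_update(x_a, P_sqrt, alpha, rng):
    """Draw one updated state: x = x_a + sqrt(alpha) * P^{1/2} xi, with xi ~ N(0, I)."""
    xi = rng.standard_normal(x_a.size)
    return x_a + np.sqrt(alpha) * P_sqrt @ xi

rng = np.random.default_rng(1)
# toy analysis state and proposal covariance square root (assumed for illustration)
x_a = np.array([1.0, -0.5, 2.0])
P = np.array([[1.0, 0.3, 0.0],
              [0.3, 1.0, 0.3],
              [0.0, 0.3, 1.0]])
P_sqrt = np.linalg.cholesky(P)

samples = np.array([implicit_update(x_a, P_sqrt, alpha=0.9, rng=rng)
                    for _ in range(20000)])
print(samples.mean(axis=0))   # close to x_a
print(np.cov(samples.T))      # close to 0.9 * P
```

The deterministic part of the update comes from the analysis mean, while the random part is scaled by α; this separation is what lets α absorb the equal-weights constraint.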

The Scale Factor α
In the IEWPF method, α is an important parameter, which is selected by setting all particle weights to the target weight. In this section, we extend the calculation of α in Zhu et al. [16] to the 4D model state.
As mentioned above, we draw samples implicitly from the Gaussian proposal density q(ξ_{0:n}) instead of the original proposal density q(x_{0:n} | y_{1:n}). These two PDFs are related by:

q(x_{0:n} | y_{1:n}) = q(ξ_{0:n}) / |dx_{0:n}/dξ_{0:n}|    (10)

where |dx_{0:n}/dξ_{0:n}| denotes the absolute value of the determinant of the Jacobian matrix of the transformation x_{0:n} = g(ξ_{0:n}), and the function g(·) is defined by Equation (9). The weight of particle i is then given by:

w_i = [p(y_{1:n} | x_{0:n}^i) p(x_{0:n}^i) / (p(y_{1:n}) q(ξ_{0:n}^i))] |dx_{0:n}/dξ_{0:n}|    (11)

Neglecting p(y_{1:n}) and taking −2 log of both sides gives:

−2 log w_i = −2 log[p(y_{1:n} | x_{0:n}^i) p(x_{0:n}^i)] + 2 log q(ξ_{0:n}^i) − 2 log |dx_{0:n}/dξ_{0:n}|    (12)

According to Zhu et al. [16], |dx_{0:n}/dξ_{0:n}| can be reduced to:

|dx_{0:n}/dξ_{0:n}| = α_i^{N_z/2} |P_i|^{1/2}    (13)

where N_z is the dimension of the 4D model state. Using p(y_{1:n} | x_{0:n}^i) p(x_{0:n}^i) ∝ exp(−φ_i/2) N(x_a^{i,0:n}, P_i) together with Equation (9), Equation (12) becomes, up to a constant:

−2 log w_i = φ_i + (α_i − 1) ξ_{0:n}^{i,T} ξ_{0:n}^i − N_z log α_i − 2 log |P_i|^{1/2}    (14)

Setting the weights of all particles to the target weight is equivalent to setting the right-hand side of Equation (14) to a constant C:

φ_i + (α_i − 1) ξ_{0:n}^{i,T} ξ_{0:n}^i − N_z log α_i − 2 log |P_i|^{1/2} = C    (15)

which leads to:

(α_i − 1) ξ_{0:n}^{i,T} ξ_{0:n}^i − N_z log α_i = c_i    (16)

where c_i = C − φ_i and 2 log |P_i|^{1/2} has been absorbed into C. Although Equation (15) or (16) has an analytical solution, it is solved with the Newton iterative method for practical reasons. As discussed in Zhu et al. [16] and Skauvold et al. [17], there exist two branches of α, and sampling α from different branches yields different performance. In this study, as in Zhu et al. [16], a random choice of 50% α > 1 and 50% α < 1 has been adopted.
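The Newton solve for the equal-weights condition can be sketched as follows, using the scalar equation of the form (α − 1)γ − N_z log α = c from Zhu et al. [16], with γ = ξᵀξ. The values of γ, c, and N_z below are illustrative only (γ ≈ N_z on average for ξ ~ N(0, I)); they are not the paper's configuration. Starting the iteration on either side of α = 1 recovers the two branches:

```python
import numpy as np

def solve_alpha(gamma, c, n_z, alpha0, n_iter=100):
    """Newton iteration for f(alpha) = (alpha - 1)*gamma - n_z*log(alpha) - c = 0."""
    alpha = alpha0
    for _ in range(n_iter):
        f = (alpha - 1.0) * gamma - n_z * np.log(alpha) - c
        fp = gamma - n_z / alpha   # f'(alpha); f is convex with its minimum at n_z/gamma
        alpha -= f / fp
    return alpha

# illustrative values; gamma = xi^T xi is about N_z on average for xi ~ N(0, I)
n_z, gamma, c = 100, 100.0, 5.0
alpha_lo = solve_alpha(gamma, c, n_z, alpha0=0.5)   # branch with alpha < 1
alpha_hi = solve_alpha(gamma, c, n_z, alpha0=2.0)   # branch with alpha > 1
print(alpha_lo, alpha_hi)
```

Because the function is convex with roots on either side of its minimum, the iteration converges monotonically from each starting point; a scheme that randomly picks each branch 50% of the time then mirrors the sampling strategy described above.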

The Expression of the P Matrix
To implement the new algorithm or the IEWPF, one major problem is the calculation of the covariance of the proposal density (the P matrix). In the IEWPF, P is given explicitly, which is difficult to calculate in real geophysical systems. In our method, as in the method of Morzfeld et al. [11], P is connected with the Hessian matrix of the 4D-Var cost function, which is also difficult to calculate explicitly in high-dimensional problems. In this study, we estimate the P matrix implicitly using a limited ensemble.
Given the model state at time t_{n−1}, the state at time t_n is given by:

x_n = M(x_{n−1}) + η_n

where M denotes forward integration of the dynamical model and η_n ∼ N(0, Q) represents additive Gaussian model error with mean zero and covariance Q. Following the notation in El-Said [26], we define the four-dimensional model state and model error as:

z = (x_0ᵀ, x_1ᵀ, …, x_nᵀ)ᵀ,   p = (x_0ᵀ, η_1ᵀ, …, η_nᵀ)ᵀ

z and p are linked by the 4D model operator L, p = L(z), where L is the 4D model operator:

L(z) = (x_0ᵀ, (x_1 − M(x_0))ᵀ, …, (x_n − M(x_{n−1}))ᵀ)ᵀ

With this formulation, the cost function can be written as:

J(z) = 1/2 (L(z) − p_b)ᵀ D⁻¹ (L(z) − p_b) + 1/2 (y_{1:n} − H(z))ᵀ R⁻¹ (y_{1:n} − H(z))

where p_b = (x_bᵀ, 0, …, 0)ᵀ and D = diag(B, Q, …, Q). The Hessian of the cost function is:

S = Lᵀ D⁻¹ L + Hᵀ R⁻¹ H

where the linearized operator L is block lower bidiagonal, with identity blocks on the diagonal and −M on the first subdiagonal, and H applies the block H to x_1, …, x_n; here M is a linearisation of the forward model M and H is a linearisation of H. Therefore, P in Equation (9) is related to S by:

P = S⁻¹ = (Lᵀ D⁻¹ L + Hᵀ R⁻¹ H)⁻¹

Extracting Lᵀ D^{−1/2} (and its transpose) out of the parentheses, we get:

P = L⁻¹ D^{1/2} [I + (R^{−1/2} H L⁻¹ D^{1/2})ᵀ (R^{−1/2} H L⁻¹ D^{1/2})]⁻¹ D^{1/2} L⁻ᵀ

Performing a singular value decomposition (SVD) of R^{−1/2} H L⁻¹ D^{1/2} = U Σ Vᵀ, we can estimate P^{1/2} directly:

P^{1/2} = L⁻¹ D^{1/2} V (I + ΣᵀΣ)^{−1/2}

In this formulation, L⁻¹ and L⁻ᵀ are expressed explicitly, which is applicable to low-dimensional problems. If the model dimension is N_x and the length of the assimilation window is n, the dimension of the 4D model state is N_z = N_x × (n + 1); thus, the dimension of P is N_z × N_z. When the model dimension is high, or the assimilation window is long, it is difficult to express P and P^{1/2} explicitly. Since what we need is P^{1/2} ξ_{0:n}, we do not need to calculate the whole Hessian of J(z); P^{1/2} ξ_{0:n} can be estimated using the tangent linear and adjoint models.
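The SVD-based square root can be checked on a small linear toy problem. The sketch below uses toy sizes and an assumed linear model M (not the Lorenz96 setup), builds L, D, and H explicitly, and verifies that the factor obtained from the SVD of R^{−1/2} H L⁻¹ D^{1/2} reproduces the inverse Hessian, i.e., P^{1/2} (P^{1/2})ᵀ = S⁻¹:

```python
import numpy as np

rng = np.random.default_rng(2)
n_x, n_steps = 3, 2                     # toy sizes (illustrative only)
n_z = n_x * (n_steps + 1)

# assumed toy linearized model, observation operator, and error covariances
M = 0.9 * np.eye(n_x) + 0.05 * rng.standard_normal((n_x, n_x))
H_k = np.eye(n_x)                       # observe every component at each analysis time
B, Q, R_k = 2.0 * np.eye(n_x), 1.0 * np.eye(n_x), 1.6 * np.eye(n_x)

# L: identity diagonal, -M subdiagonal (L z = p); D = blockdiag(B, Q, Q)
L = np.eye(n_z)
D = np.zeros((n_z, n_z)); D[:n_x, :n_x] = B
H = np.zeros((n_steps * n_x, n_z))
for k in range(1, n_steps + 1):
    L[k*n_x:(k+1)*n_x, (k-1)*n_x:k*n_x] = -M
    D[k*n_x:(k+1)*n_x, k*n_x:(k+1)*n_x] = Q
    H[(k-1)*n_x:k*n_x, k*n_x:(k+1)*n_x] = H_k
R = np.kron(np.eye(n_steps), R_k)

# Hessian S = L^T D^-1 L + H^T R^-1 H, whose inverse is P
S = L.T @ np.linalg.inv(D) @ L + H.T @ np.linalg.inv(R) @ H

# factorized square root via the SVD of Vhat = R^-1/2 H L^-1 D^1/2
D_half = np.sqrt(D)                     # D and R are diagonal here
Vhat = np.linalg.inv(np.sqrt(R)) @ H @ np.linalg.inv(L) @ D_half
_, s, Vt = np.linalg.svd(Vhat, full_matrices=True)
s_full = np.concatenate([s, np.zeros(n_z - s.size)])
P_half = np.linalg.inv(L) @ D_half @ Vt.T @ np.diag(1.0 / np.sqrt(1.0 + s_full**2))
```

In a realistic system one would never form L⁻¹ densely like this; the point of the check is only that the algebra of the factorization is consistent, so that P^{1/2} ξ can later be applied with tangent linear and adjoint models instead.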

Numerical Experiments
In this section, the new scheme is tested on the Lorenz96 model [27], which is given by:

dx_k/dt = (x_{k+1} − x_{k−2}) x_{k−1} − x_k + F,   k = 1, …, N_x

where the indices wrap around, so that x_{N_x+k} = x_k and x_{−k} = x_{N_x−k} for k ≥ 0. F is often set to 8 to yield chaotic behavior. One feature of the Lorenz96 model is that it is a chaotic system, like the atmospheric and oceanic systems. Another is that the model dimension can easily be extended; thus, the performance of the new scheme can be tested on low- to high-dimensional problems. The performance of the IEWVPS is compared with ensemble 4D-Var (En4DVar), which is a Monte Carlo method [28]. The En4DVar is an ensemble of weak constraint 4D-Var analysis cycles that accounts for observation, boundary, forcing, and model error sources. With a sufficiently large ensemble size, it can handle non-Gaussian PDFs [24]. In our study, only the initial conditions and observations are perturbed, according to their pre-specified error covariance matrices, to account for the uncertainties. There is no information exchange between ensemble members; thus, the model state of each ensemble member (or particle) depends only on itself. In the IEWVPS, by contrast, the model state of each particle depends on all particles. In this study, the weak constraint 4D-Var scheme is from El-Said [26]. The results are also compared with the IEWPF and LETKF methods.
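For reference, a standard implementation of the Lorenz96 tendency and a fourth-order Runge-Kutta step might look like the sketch below; the step size dt = 0.05 and the small initial perturbation are conventional choices for this model, not necessarily those of the paper's experiments:

```python
import numpy as np

def lorenz96_tendency(x, F=8.0):
    """dx_k/dt = (x_{k+1} - x_{k-2}) * x_{k-1} - x_k + F, with cyclic indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    """One fourth-order Runge-Kutta step of the Lorenz96 model."""
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz96_tendency(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

x = 8.0 * np.ones(40)   # N_x = 40, at the unstable fixed point x_k = F
x[0] += 0.01            # small perturbation; chaos amplifies it
for _ in range(50):     # short spin-up onto the attractor
    x = rk4_step(x)
```

Extending the dimension only means changing the length of the state vector, which is why this model is convenient for testing from N_x = 40 up to N_x = 400.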

Comparison on Different Model Dimensions
In this study, the new scheme has been tested with 40, 100, 250, and 400 dimensions, representing low- to high-dimensional problems. All IEWVPS experiments use 50 particles (except the 400-dimensional experiments, which use 20 particles) and the same assimilation window (10 model steps). The truth and perturbed observations are generated as follows: first, the model is integrated for a spin-up period of 50 steps, and the final spin-up state is used as the true initial condition (x_0); then, the model is integrated for 200 assimilation windows to generate the truth. Perturbed observations are sampled from the truth every fifth step. The IEWVPS and En4DVar are smoother methods with an assimilation window: all observations during the 10 steps are assimilated in one assimilation window. The IEWPF and LETKF, in contrast, are filter methods, so observations are assimilated only at an analysis time step. Despite this difference in the assimilation process within the window, the observation frequency is the same in all experiments, which means that the total number and positions of the observations are identical in the smoother and filter experiments. An adaptive localization covariance has been used in the LETKF. In these experiments, the model error covariance matrix Q and the background error covariance matrix B are specified as tridiagonal matrices, while the observation error covariance matrix R is a diagonal matrix. It should be noted that the error variance in this study is larger than in Zhu et al. [16] and Skauvold et al. [17]. Although the RMSE is smaller with R = 0.16I than with R = 1.6I, as expected, the ratio of RMSE to ensemble spread does not change much, as shown in Table 1. Figure 2 compares rank histograms of one run for the different error variances; a similar shape is found for R = 1.6I and R = 0.16I. Thus, in what follows, we set the variances of B, Q, and R to 2.0, 1.0, and 1.6, respectively.
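The error covariances of this setup can be sketched as follows. The paper specifies only that B and Q are tridiagonal with diagonal variances 2.0 and 1.0 and that R = 1.6I; the off-diagonal values below are assumed purely for illustration:

```python
import numpy as np

def tridiagonal_cov(n, variance, off_diag):
    """Tridiagonal covariance: constant main diagonal and first off-diagonals."""
    return variance * np.eye(n) + off_diag * (np.eye(n, k=1) + np.eye(n, k=-1))

n_x = 40
B = tridiagonal_cov(n_x, 2.0, 0.5)    # background error covariance (off_diag assumed)
Q = tridiagonal_cov(n_x, 1.0, 0.25)   # model error covariance (off_diag assumed)
R = 1.6 * np.eye(n_x)                 # diagonal observation error covariance

# perturbed observations: each particle assimilates its own noisy copy of the obs
rng = np.random.default_rng(3)
truth = rng.standard_normal(n_x)      # stand-in for a spun-up true state
obs = truth + rng.multivariate_normal(np.zeros(n_x), R)
```

The chosen off-diagonal values keep both matrices symmetric positive definite, which any valid B and Q must be.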
The ratio of the RMSE to the ensemble spread, averaged over 2000 model time steps, has been used to evaluate the performance for different dimensions. A ratio close to 1 indicates that the RMSE and ensemble spread are well matched. Figure 3 illustrates the averaged ratio of the RMSE to ensemble spread as a function of model dimension, and the mean ratio is listed in Table 2. In general, the ratio of the RMSE to ensemble spread increases as the model dimension grows. In the IEWVPS experiments, the ratio is closer to 1.0 than in the En4DVar experiments but larger than in the IEWPF and LETKF experiments. Although the LETKF and IEWPF provide the best performance in terms of the ratio, their RMSE is not the smallest.
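This diagnostic can be sketched as below on synthetic data (a toy Gaussian ensemble, not output from the paper's experiments); when the truth is statistically indistinguishable from the ensemble members, the ratio comes out close to 1:

```python
import numpy as np

def rmse_spread_ratio(ensemble, truth):
    """ensemble: (N_e, N_x) members; truth: (N_x,).
    RMSE of the ensemble mean divided by the mean ensemble spread."""
    mean = ensemble.mean(axis=0)
    rmse = np.sqrt(np.mean((mean - truth) ** 2))
    spread = np.sqrt(np.mean(ensemble.var(axis=0, ddof=1)))
    return rmse / spread

rng = np.random.default_rng(4)
center = np.zeros(100)
ensemble = center + rng.standard_normal((50, 100))  # members ~ N(center, I)
truth = center + rng.standard_normal(100)           # truth from the same distribution
print(rmse_spread_ratio(ensemble, truth))           # near 1 for a calibrated ensemble
```

An underdispersed ensemble (members clustered too tightly around their mean) drives this ratio above 1, which is the signature seen in the high-dimensional experiments.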
Since the deterministic part in Equation (9) is provided by weak constraint 4D-Var, the mean RMSE in the IEWVPS experiments is similar to that in the En4DVar experiments, as shown in Figure 4. Compared with the IEWPF and LETKF experiments, a remarkable improvement is seen in the IEWVPS experiments at the same observation frequency: the mean RMSE has been reduced from ∼2.0 to ∼1.0 (Figure 4 and Table 2). With the additional random part introduced in Equation (9), the IEWVPS increases the ensemble spread relative to the En4DVar (Figure 5). This means that the IEWVPS can maintain the same level of RMSE as the En4DVar while increasing the ensemble spread. Figure 6 compares the rank histograms of one run for different model sizes. A rank histogram is generated by ranking the truth or observation within the sorted ensemble members over a period of time. It can be used to evaluate the reliability of the ensemble forecast qualitatively and to diagnose the ensemble spread. Usually, a uniform histogram is desirable, meaning that the ensemble spread matches the RMSE. A tilted shape indicates that a systematic bias exists, a U-shaped histogram indicates too little spread, and a humped histogram indicates that the ensemble spread is too large [29]. As shown in Figure 6, U-shaped rank histograms occur when the model dimension is larger than 100 in the IEWVPS and En4DVar experiments, indicating that the ensemble spread is smaller than the RMSE.
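A rank histogram at a single grid point can be computed as in this sketch (synthetic Gaussian data, not experiment output); an underdispersed ensemble is produced simply by shrinking the member spread, which yields the U shape described above:

```python
import numpy as np

def rank_histogram(members, truths):
    """members: (T, N_e) ensemble values at one point over T times; truths: (T,).
    For each time, count how many members lie below the truth (N_e + 1 bins)."""
    ranks = np.sum(members < truths[:, None], axis=1)
    return np.bincount(ranks, minlength=members.shape[1] + 1)

rng = np.random.default_rng(5)
T, N_e = 5000, 9
members = rng.standard_normal((T, N_e))
truths = rng.standard_normal(T)

flat = rank_histogram(members, truths)             # calibrated -> roughly uniform
u_shape = rank_histogram(0.5 * members, truths)    # spread too small -> U shape
```

When the truth repeatedly falls outside the envelope of the members, the outermost ranks fill up, giving the U-shaped histogram that signals too little spread.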

Influence of Ensemble
In the standard PF approach, the number of particles must increase exponentially with the number of independent observations to prevent collapse [3]. To test the performance for different ensemble sizes, we set the model dimension to N_x = 100 and increase the ensemble size from 50 to 100, 150, 200, and 400. Results are shown in Figure 7. In all experiments, the ratio of RMSE to spread is not sensitive to the ensemble size. The LETKF and IEWPF outperform the IEWVPS and En4DVar in maintaining the balance between the RMSE and ensemble spread. The RMSE is also not sensitive to the ensemble size with the pre-specified background and observation error covariance matrices of Section 3.1 (not shown). Although increasing the ensemble size yields no significant improvement in the IEWVPS or the other experiments, it shows that a small ensemble (fewer than 100 members) yields results comparable to those of a large ensemble (N_e = 400). In real geophysical systems, the ensemble size is usually 50-100; thus, the ensemble size needed to guarantee good performance of the IEWVPS is adequate for real geophysical applications. Generally speaking, even with a small ensemble (fewer than 100 members), the new approach performs well and does not degenerate.

Deterministic Observation Experiments
In the above experiments, perturbed observations are assimilated to describe the uncertainties; that is, the observation ensemble size is the same as the number of particles. In atmospheric and oceanic data assimilation, noise is often not added to the observations; that is, in variational methods and in the EnKF, only one set of observations is used. We run experiments in this way to see whether the IEWVPS maintains the capability of describing uncertainty. The results of the first 100 and last 100 model steps are shown in Figures 8 and 9. When the observations are deterministic, the En4DVar experiment collapses after one assimilation window (10 model steps), while the IEWVPS experiment does not.

Discussions and Conclusions
In this study, we use the weak constraint 4D-Var as the proposal density in a standard IEWPF framework [16]. To ensure good performance of the IEWPF, a relaxation term, which nudges the model state towards the future observations [13,16], must be included. There is no need for such a relaxation scheme in the new approach, since the deterministic component is provided by the weak constraint 4D-Var analysis, which guarantees accuracy.
The new approach is tested on the Lorenz96 model with different model dimensions. As shown in Section 3, it is applicable to both low- and high-dimensional problems. A comparison with the En4DVar reveals that the ensemble spread is larger in the IEWVPS experiments than in the En4DVar experiments while the RMSE is on the same level. If the observations are considered deterministic, as is usually done in 4D-Var and for the EnKF, the ensemble 4D-Var collapses quickly after one assimilation window, whereas the IEWVPS performs well and does not degenerate. Compared with the IEWPF and LETKF experiments, the RMSE in the IEWVPS is much smaller at the same observation frequency: with the pre-specified error covariances, the RMSE of the IEWVPS is about 0.93, while the RMSE of the IEWPF and LETKF is about 1.9. Not only is the RMSE reduced in the IEWVPS experiments, but the ensemble spread (∼0.8) is also reduced. As a result, the ensemble spread is smaller than the RMSE in both the IEWVPS and En4DVar experiments, although perturbed observations are used.
As pointed out by Snyder et al. [3], the number of particles must increase exponentially with the number of independent observations to prevent filter degeneracy. We test the performance on the 100-dimensional Lorenz96 model with different ensemble sizes. It is shown that, even with a small ensemble (fewer than 100 members), the IEWVPS performs well and yields results comparable to those of a large ensemble (N_e = 400). In real atmospheric or oceanic applications, the ensemble size is usually less than 100. Thus, the curse of dimensionality does not arise in the application of the IEWVPS to real atmosphere or ocean systems.
Our method is implemented in the standard IEWPF framework. As shown by Skauvold et al. [17], the gap in the IEWPF leads to systematic bias in the predictions; this bias can be eliminated by using a two-stage IEWPF method [17]. U-shaped rank histograms are seen for both low- and high-dimensional Lorenz96 models, indicating that the ensemble spread is smaller than the RMSE. One possible reason for the U-shaped rank histograms is this systematic underestimation of variance. Another is the estimation of the P matrix: to avoid direct calculation of the Hessian matrix and its inverse, we directly estimate P^{1/2} ξ_{0:n} using a limited ensemble, and although the P matrix is specific to each particle, we assume it is the same for all particles to reduce computational cost. These factors lead to a smaller ensemble spread. However, the direct estimation of P^{1/2} ξ_{0:n} with a limited ensemble has the advantage that it can be implemented in parallel. Benefiting from this, the estimation of P^{1/2} ξ_{0:n} is computationally cheap compared with the cost of ensemble 4D-Var. Thus, since even ensemble 4D-Var can be implemented in parallel, the new approach has the potential to be applied to practical geophysical systems.
To ensure good performance, other techniques could be tested and used, such as posterior inflation [30] and kernel density distribution mapping (KDDM; McGinnis et al. [31]). We plan to implement the IEWVPS in the Regional Ocean Modeling System (ROMS) or the Weather Research and Forecasting (WRF) model in the near future.