1. Introduction
The particle filter (PF) is a sequential Monte Carlo method that uses weighted sampling particles to estimate and represent posterior probability density functions (PDFs). The advantage of PFs over variational methods and the EnKF is that PFs estimate the posteriors without linear or Gaussian assumptions. PFs have been successfully applied to low-dimensional systems, e.g., [1,2]. A PF uses a set of weighted particles to estimate the posterior PDFs of the model state. The weight of each particle is proportional to the observation likelihood, i.e., the conditional PDF of the observations given the model state. However, for a high-dimensional model with a large number of independent observations, typically only one particle receives a weight close to one while the others have weights close to zero, a problem known as filter degeneracy or the collapse of the filter. To prevent filter degeneracy, the PF requires the number of particles to scale exponentially with the number of independent observations [3,4]. In practical numerical weather prediction (NWP) systems, the number of ensemble members is typically on the order of 10–100 [5] because of limited computational resources, which makes degeneracy an inevitable obstacle for PFs.
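The mechanism behind this degeneracy is easy to reproduce. The following minimal Python sketch (an illustration only, not part of the schemes discussed here; the linear observation operator, diagonal observation errors, and all numerical values are assumptions) computes standard importance weights and the effective sample size, which collapses towards one as the number of independent observations grows:

import numpy as np

def importance_weights(particles, y, H, r_var):
    # Gaussian observation likelihood weights for a particle ensemble.
    # particles: (N_p, n_x), y: (n_y,), H: (n_y, n_x) linear operator (assumed),
    # r_var: observation error variance (diagonal R assumed).
    innov = y - particles @ H.T                      # innovations, shape (N_p, n_y)
    log_w = -0.5 * np.sum(innov**2, axis=1) / r_var  # log-likelihood up to a constant
    log_w -= log_w.max()                             # guard against underflow
    w = np.exp(log_w)
    return w / w.sum()

def effective_sample_size(w):
    # N_eff = 1 / sum(w_i^2); values near 1 indicate filter degeneracy.
    return 1.0 / np.sum(w**2)

rng = np.random.default_rng(0)
n_x = n_y = 100
N_p = 50
particles = rng.normal(size=(N_p, n_x))
y = rng.normal(size=n_y)
w = importance_weights(particles, y, np.eye(n_y, n_x), r_var=1.0)
print(effective_sample_size(w))  # typically very close to 1 in this setting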
One potential way to avoid filter degeneracy is to draw samples from a proposal transition density rather than the original transition density [6]. The restrictions on the proposal transition density are mild: its support must cover that of the original density, i.e., it must be non-zero wherever the original density is non-zero. Doucet et al. [6] proposed an optimal proposal density which minimizes the variance of the weights. In the weighted ensemble Kalman filter (WEKF) [7], the stochastic EnKF [8] is used as the proposal density. Morzfeld et al. [9] discussed the behavior of the WEKF in high-dimensional systems and suggested localization to make the WEKF effective. Chen et al. [10] extended the localization approach to the WEKF, providing better performance in nonlinear systems than the local particle filter (LPF). Like the EnKF, 4D-Var can also serve as a proposal density, by using the particle filter in a 4D-Var framework. Morzfeld et al. [11] proposed a variational particle smoother method and introduced localization to eliminate particle collapse. The equivalent-weights particle filter (EWPF) allows the proposal density to depend on all particles at the previous time and gives equal weights to most particles to avoid degeneracy [12,13,14,15]. Combining the ideas of implicit sampling and equivalent weights, Zhu et al. [16] proposed the implicit equal-weights particle filter (IEWPF), which requires no parameter tuning. To remedy the bias in the IEWPF of Zhu et al. [16], Skauvold et al. [17] proposed a two-stage IEWPF method. Other techniques to eliminate filter degeneracy, including transformation, localization, and hybridization, have been reviewed by van Leeuwen et al. [18].
The IEWPF [16] is an indirect method based on the ideas of implicit sampling and the proposal transition density [19]. The basic idea of implicit sampling is to locate the regions of high probability and to draw samples within them. The scheme uses a proposal transition density in which each particle is drawn implicitly from a slightly different proposal, obtained by introducing a factor in front of the covariance of the proposal transition density. For each particle, this factor depends on the proposal densities of all particles and is calculated to fulfill the equal-weights property.
The most successful data assimilation (DA) methods implemented in operational NWP centers around the world are variational methods (e.g., 3D-Var and 4D-Var [20]) and the ensemble Kalman filter (EnKF) and its variants [8,21,22,23]. Variational methods search for the peak of the posterior PDF by minimizing a cost function, but the result is not guaranteed to be the global optimum because the search may stop at a local mode; it is also difficult to estimate uncertainties with variational methods. EnKF methods estimate only the mean and covariance of the posterior PDF because the EnKF implicitly assumes that the model is linear and that the posterior PDF is Gaussian. Under those assumptions the estimation of the posterior PDF is greatly simplified: the mean is close to the peak and the covariance becomes much easier to calculate. However, these implicit assumptions of the variational and EnKF methods are unlikely to be satisfied in most real geophysical systems. Neither of these two DA approaches can accurately describe nonlinear and non-Gaussian posterior PDFs, and it is unclear what they represent for multi-modal posterior PDFs. Therefore, variational and EnKF methods cannot fully meet the needs of increasingly complex and strongly nonlinear models.
The success of 4D-Var data assimilation in operational NWP centers is due to the following four aspects [24]. (1) Variational methods handle the increasing quantity of asynchronous satellite observations effectively and are consistent with the model dynamics. (2) Variational methods can accommodate weak nonlinearities in the model and observation operators. (3) Variational methods avoid localization and perform the data assimilation globally. (4) Variational methods allow additional terms in the cost function, e.g., variational bias correction, variational quality control, digital filter initialization, a weak constraint term, etc. If the dynamical model is assumed to be perfect, the objective is to seek the best initial condition, which minimizes the errors in the background and observations over a time window; this is the so-called strong constraint 4D-Var. Relaxing the perfect-model assumption by allowing a Gaussian additive model error in the dynamical model makes the problem fully four-dimensional: the most appropriate initial condition and model error are found by minimizing the errors in the initial state, observations, and model over a time window. This is the weak constraint 4D-Var.
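For reference, a standard form of the weak constraint 4D-Var cost function (a generic textbook statement rather than the specific configuration used later in this paper) is

J(x_0, \eta_1, \dots, \eta_K) = \tfrac{1}{2}(x_0 - x_b)^{T} B^{-1}(x_0 - x_b) + \tfrac{1}{2}\sum_{k=0}^{K}\big(y_k - H_k(x_k)\big)^{T} R_k^{-1}\big(y_k - H_k(x_k)\big) + \tfrac{1}{2}\sum_{k=1}^{K}\eta_k^{T} Q_k^{-1}\eta_k, \qquad x_k = M_{k-1}(x_{k-1}) + \eta_k,

where x_b is the background state, B, Q_k, and R_k are the background, model error, and observation error covariances, H_k is the observation operator, M_{k-1} is the model, and \eta_k is the additive model error. In strong constraint 4D-Var the \eta_k terms are absent.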
The new algorithm is designed as a compromise between weak constraint 4D-Var and the IEWPF, and inherits the merits of both. Efforts have already been made to introduce particle filter schemes into a 4D-Var framework. Morzfeld et al. [11] applied a localized particle smoother within strong constraint 4D-Var, which prevents the collapse of the variational particle smoother and yields results comparable with those of the ensemble version of 4D-Var. The particle smoother has a natural connection to the weak constraint 4D-Var formulation, and implicit sampling by Monte Carlo methods can readily be applied in a 4D-Var data assimilation system through the proposal transition density. The major obstacle to implementing the IEWPF in a real dynamical system is the calculation of the covariance of the proposal density (the P matrix). To implement the IEWPF in the weak constraint 4D-Var framework, the key point is the expression of the P matrix. As shown in Morzfeld et al. [11], the P matrix is connected to the Hessian matrix of the weak constraint 4D-Var cost function, and weak constraint 4D-Var provides an effective way of calculating this Hessian. To avoid explicit estimation of the P matrix, we estimate the random part of each updated model state using ensembles. We introduce implicit sampling and the proposal transition density into the weak constraint 4D-Var framework, using a scale factor to slightly adjust the covariance of the proposal density (the P matrix) so as to fulfill the equal-weights property.
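One way this ensemble-based estimate of the random part could be realized (a minimal sketch under the assumption that P is approximated by the sample covariance of the ensemble perturbations; the function and variable names are placeholders, not the exact implementation of this paper) is:

import numpy as np

def ensemble_random_part(ensemble, rng):
    # Approximate a draw of P^(1/2) xi using ensemble perturbations.
    # ensemble: (N_e, n_x) array of ensemble states.
    # The returned perturbation has covariance close to the sample
    # covariance P = A A^T / (N_e - 1), where A holds the perturbations.
    N_e = ensemble.shape[0]
    A = ensemble - ensemble.mean(axis=0)       # perturbations about the mean
    zeta = rng.standard_normal(N_e)            # zeta ~ N(0, I)
    return (A.T @ zeta) / np.sqrt(N_e - 1)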
This paper is organized as follows. In Section 2, we describe the implicit equal-weights variational particle smoother (IEWVPS) algorithm in detail. Section 3 illustrates the performance of the new scheme on the nonlinear Lorenz96 model. Conclusions and discussion are given in Section 4.
3. Numerical Experiments
In this section, the new scheme is tested on the Lorenz96 model [27], which is given by:
\frac{dx_i}{dt} = (x_{i+1} - x_{i-2})\,x_{i-1} - x_i + F, \qquad i = 1, \dots, N,
where the indices wrap around, so that x_{-1} = x_{N-1}, x_0 = x_N, and x_{N+1} = x_1.
F is often set to 8 for chaotic behavior. One feature of the Lorenz96 model is that it is a chaotic system, like the atmosphere and ocean. Another is that the model dimension can easily be extended. Thus, the performance of the new scheme can be tested on problems ranging from low to high dimension. The performance of the IEWVPS is compared with that of ensemble 4D-Var (En4DVar), which is a Monte Carlo method [28]. The En4DVar is an ensemble of weak constraint 4D-Var analysis cycles that accounts for observation, boundary, forcing, and model error sources. With a sufficiently large ensemble size, it can handle non-Gaussian PDFs [24]. In our study, only the initial conditions and observations are perturbed, according to their pre-specified error covariance matrices, to account for the uncertainties. There is no information exchange between ensemble members; thus, the model state of each ensemble member (or particle) depends only on itself. In the IEWVPS, by contrast, the model state of each particle depends on all particles. In this study, the weak constraint 4D-Var scheme is taken from El-Said [26]. The results are also compared with the IEWPF and LETKF methods.
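For concreteness, a minimal Lorenz96 implementation with a fourth-order Runge-Kutta step might look as follows (an illustrative sketch only; the step size and the integration scheme are assumptions, not necessarily the settings used in the experiments):

import numpy as np

def lorenz96_tendency(x, F=8.0):
    # dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F with cyclic indices.
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    # One fourth-order Runge-Kutta step of the Lorenz96 model.
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz96_tendency(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Example: spin up a 40-variable state.
x = 8.0 + 0.01 * np.random.default_rng(0).standard_normal(40)
for _ in range(50):
    x = rk4_step(x)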
3.1. Comparison on Different Model Dimensions
In this study, the new scheme has been tested with 40, 100, 250, and 400 dimensions, representing low- to high-dimensional problems. All IEWVPS experiments use 50 particles (except the 400-dimensional experiments, which use 20 particles) and the same assimilation window (10 model steps). The truth and perturbed observations are generated as follows: first, the model is integrated for a spin-up period of 50 steps, and the final spin-up state is used as the true initial condition. Then, the model is integrated for 200 assimilation windows to generate the truth, and perturbed observations are sampled from the truth every fifth step. The IEWVPS and En4DVar are smoother methods with an assimilation window; all observations within the 10-step window are assimilated in one data assimilation process. The IEWPF and LETKF, in contrast, are filter methods, so observations are assimilated only at the analysis time step. Despite this difference within the assimilation window, the observation frequency is the same in all experiments, meaning that the total number and the positions of the observations are identical in the smoother and filter experiments. Adaptive covariance localization is used in the LETKF. In these experiments, the model error covariance matrix Q and the background error covariance matrix B are specified as tridiagonal matrices, while the observation error covariance matrix R is a diagonal matrix.
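For illustration, such covariance matrices could be assembled as follows (a sketch only: the off-diagonal value and the observation dimension are assumptions, not the actual matrices used here; the variances 2.0, 1.0, and 1.6 for B, Q, and R are the values adopted later in this section):

import numpy as np

def tridiagonal_cov(n, variance, off_diag_frac=0.5):
    # Tridiagonal covariance: 'variance' on the diagonal and an assumed
    # fraction of it on the first off-diagonals.
    C = variance * np.eye(n)
    C += off_diag_frac * variance * (np.eye(n, k=1) + np.eye(n, k=-1))
    return C

n = 40                                  # model dimension (example value)
B = tridiagonal_cov(n, variance=2.0)    # background error covariance
Q = tridiagonal_cov(n, variance=1.0)    # model error covariance
R = 1.6 * np.eye(n)                     # diagonal observation error covariance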
It should be noted that the error variances in this study are larger than in Zhu et al. [16] and Skauvold et al. [17]. Although the RMSE is smaller with the smaller error variances, as expected, the ratio of RMSE to ensemble spread does not change much, as shown in Table 1.
Figure 2 compares rank histograms of one run for the different error variances. A similar shape is found for both settings. Thus, in what follows, we set the variances of B, Q, and R to 2.0, 1.0, and 1.6, respectively.
The averaged ratio of the RMSE to ensemble spread over 2000 model time steps is used to evaluate the performance for different dimensions. A ratio close to 1 indicates that the RMSE and ensemble spread are well matched.
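These diagnostics can be computed directly from the ensemble and the truth; a minimal sketch (with illustrative variable names) is:

import numpy as np

def rmse_spread_ratio(ensemble, truth):
    # ensemble: (N_e, n_x) array, truth: (n_x,) array.
    # Returns the RMSE of the ensemble mean, the ensemble spread, and their ratio.
    mean = ensemble.mean(axis=0)
    rmse = np.sqrt(np.mean((mean - truth) ** 2))
    spread = np.sqrt(np.mean(ensemble.var(axis=0, ddof=1)))
    return rmse, spread, rmse / spread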
Figure 3 illustrates the averaged ratio of the RMSE to ensemble spread as a function of model dimension, and the mean ratio is listed in Table 2. In general, the ratio increases as the model dimension grows. In the IEWVPS experiments, the ratio is closer to 1.0 than in the En4DVar experiments, but larger than in the IEWPF and LETKF experiments. Although the LETKF and IEWPF provide the best performance in terms of this ratio, their RMSE is not the smallest.
Since the deterministic part in Equation (9) is provided by weak constraint 4D-Var, the mean RMSE in the IEWVPS experiments is similar to that in the En4DVar experiments, as shown in Figure 4. Compared with the IEWPF and LETKF experiments, a substantial improvement is seen in the IEWVPS experiments with the same observation frequency: the mean RMSE is reduced from ∼2.0 to ∼1.0 (Figure 4 and Table 2). With the additional random part introduced in Equation (9), the IEWVPS increases the ensemble spread relative to the En4DVar (Figure 5). This means that the IEWVPS maintains the same level of RMSE as the En4DVar while increasing the ensemble spread.
Figure 6 compares the rank histograms of one run for different model sizes. A rank histogram is generated by ranking the truth (or observation) within the sorted ensemble members over a period of time. It can be used to qualitatively evaluate the reliability of the ensemble forecast and to diagnose the ensemble spread. Usually, a uniform histogram is desirable, meaning the ensemble spread matches the RMSE. A tilted shape indicates a systematic bias, a U-shaped histogram indicates too little spread, and a hump-shaped histogram indicates that the ensemble spread is too large [29]. As shown in Figure 6, U-shaped rank histograms occur when the model dimension is larger than 100 in the IEWVPS and En4DVar experiments, indicating that the ensemble spread is smaller than the RMSE.
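The rank computation itself is straightforward; a minimal sketch (assuming one rank per scalar truth value and per verification time, and ignoring ties) is:

import numpy as np

def rank_histogram(ensembles, truths, n_members):
    # ensembles: (T, N_e) ensemble values of one variable at T times.
    # truths:    (T,) corresponding true (or observed) values.
    # Returns counts of the truth's rank among the sorted members (0..N_e).
    ranks = np.sum(ensembles < truths[:, None], axis=1)
    return np.bincount(ranks, minlength=n_members + 1)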
3.2. Influence of Ensemble Size
In the standard PF approach, the number of particles must increase exponentially with the number of independent observations to prevent collapse [3]. To test the performance for different ensemble sizes, we set the model dimension to 100 and increase the ensemble size from 50 to 100, 150, 200, and 400. The results are shown in Figure 7. In all experiments, the ratio of RMSE to spread is not sensitive to the ensemble size. The LETKF and IEWPF outperform the IEWVPS and En4DVar in maintaining the balance between the RMSE and ensemble spread. The RMSE is also insensitive to the ensemble size with the pre-specified background and observation error covariance matrices of Section 3.1 (not shown). Although increasing the ensemble size does not significantly improve the IEWVPS or the other experiments, it shows that a small ensemble size (fewer than 100) yields results comparable to those of a large ensemble size (400). In a real geophysical system, the ensemble size is usually 50–100. Thus, the ensemble size needed for good IEWVPS performance (fewer than 100) is affordable for real geophysical applications. Generally speaking, even with a small ensemble size (fewer than 100), the new approach performs well and does not degenerate.
3.3. Deterministic Observation Experiments
In the above experiments, perturbed observations are assimilated to describe the uncertainties; that is, the observation ensemble size is the same as the number of particles. In atmospheric or oceanic data assimilation, however, noise is often not added to the observations; that is, in variational methods and in the EnKF, only one set of observations is used. We perform experiments in this way to see whether the IEWVPS retains its ability to describe uncertainty. The results of the first 100 and last 100 model steps are shown in Figure 8 and Figure 9. When the observations are deterministic, the En4DVar experiment collapses after one assimilation window (10 model steps), while the IEWVPS experiment does not.
4. Discussions and Conclusions
In this study, we use the weak constraint 4D-Var as the proposal density in a standard IEWPF framework [16]. To ensure good performance of the IEWPF, a relaxation term that nudges the model state towards the future observation [13,16] must be included. There is no need for such a relaxation scheme in the new approach, since the deterministic component is provided by the weak constraint 4D-Var analysis, which ensures its accuracy.
The new approach is tested on the Lorenz96 model with different model dimensions. As shown in Section 3, it is applicable to both low- and high-dimensional problems. A comparison with the En4DVar reveals that the ensemble spread is larger in the IEWVPS experiments than in the En4DVar experiments, while the RMSE is at the same level. If observations are treated as deterministic, as is usually done in 4D-Var and in the EnKF, the ensemble 4D-Var collapses quickly after one assimilation window, whereas the IEWVPS performs well and does not degenerate. Compared with the IEWPF and LETKF experiments, the RMSE of the IEWVPS is much smaller with the same observation frequency: with the pre-specified error covariances, the RMSE of the IEWVPS is about 0.93, while the RMSE of the IEWPF and LETKF is about 1.9. Not only is the RMSE reduced in the IEWVPS experiments, but the ensemble spread (∼0.8) is also reduced. As a result, the ensemble spread is smaller than the RMSE in both the IEWVPS and En4DVar experiments, even though perturbed observations are used.
As pointed out by Snyder et al. [3], the number of particles must increase exponentially with the number of independent observations to prevent filter degeneracy. We test the performance on the 100-dimensional Lorenz96 model with different ensemble sizes. The results show that, even with a small ensemble size (fewer than 100), the IEWVPS performs well and yields results comparable to those of a large ensemble size (400). In real atmospheric or oceanic applications, the ensemble size is usually less than 100. Thus, the curse of dimensionality should not prevent the application of the IEWVPS to real atmospheric or oceanic systems.
Our method is implemented in the standard IEWPF framework. As shown by Skauvold et al. [17], the gap in the IEWPF leads to a systematic bias in the predictions; this bias can be eliminated by using a two-stage IEWPF method [17]. U-shaped rank histograms are seen for both low- and high-dimensional Lorenz96 models, indicating that the ensemble spread is smaller than the RMSE. One possible reason for the U-shaped rank histograms is the systematic underestimation of the variance. Another possible reason is the estimation of the P matrix. To avoid direct calculation of the Hessian matrix and its inverse, we estimate the random part directly using a limited ensemble. The P matrix is specific to each particle, but, to reduce the computational cost, we assume that it is the same for all particles. These factors lead to a smaller ensemble spread. However, an advantage of estimating the random part directly from a limited ensemble is that this process can be implemented in parallel. Benefiting from this, the estimation of the random part is computationally cheap compared with the cost of ensemble 4D-Var. Thus, the new approach has the potential to be applied to practical geophysical systems, since even ensemble 4D-Var can be implemented in parallel.
To further improve performance, other techniques could be tested, such as posterior inflation [30] and kernel density distribution mapping (KDDM; McGinnis et al. [31]). We plan to implement the IEWVPS in the Regional Ocean Modeling System (ROMS) or the Weather Research and Forecasting (WRF) model in the near future.