Article

A Robust Non-Gaussian Data Assimilation Method for Highly Non-Linear Models

Elias D. Nino-Ruiz, Haiyan Cheng and Rolando Beltran
1 Applied Math and Computational Science Laboratory, Department of Computer Science, Universidad del Norte, Barranquilla 080001, Colombia
2 Department of Computer Science, Willamette University, 900 State Street, Salem, OR 97301, USA
* Author to whom correspondence should be addressed.
Atmosphere 2018, 9(4), 126; https://doi.org/10.3390/atmos9040126
Submission received: 5 January 2018 / Revised: 14 March 2018 / Accepted: 20 March 2018 / Published: 26 March 2018
(This article belongs to the Special Issue Efficient Formulation and Implementation of Data Assimilation Methods)

Abstract

In this paper, we propose an efficient ensemble Kalman filter (EnKF) implementation for non-Gaussian data assimilation based on Gaussian Mixture Models and Markov-Chain-Monte-Carlo (MCMC) methods. The proposed method works as follows: based on an ensemble of model realizations, prior errors are estimated via a Gaussian Mixture density whose parameters are approximated by means of an Expectation Maximization method. Then, by using an iterative method, observation operators are linearized about current solutions and posterior modes are estimated via an MCMC implementation. The acceptance/rejection criterion is similar to that of the Metropolis-Hastings rule. Experimental tests are performed on the Lorenz-96 model. The results show that the proposed method can decrease prior errors by several orders of magnitude in a root-mean-square-error sense for sparse as well as dense observational networks.

1. Introduction

Data Assimilation (DA) is the process of optimally combining imperfect numerical forecast states and imperfect observations to better estimate the state $x^* \in \mathbb{R}^{n \times 1}$ of a system that evolves according to some model operator [1,2,3],
$$x^*_{\text{next}} = \mathcal{M}_{t_{\text{current}} \rightarrow t_{\text{next}}}\left( x^*_{\text{current}} \right),$$
where $\mathcal{M}: \mathbb{R}^{n \times 1} \rightarrow \mathbb{R}^{n \times 1}$ is an imperfect numerical model that evolves the states, and $n$ is the number of model components. One example of $\mathcal{M}$ is a model that mimics the ocean and/or the atmosphere dynamics. In traditional DA settings, prior errors are described by Gaussian distributions,
$$x \sim \mathcal{N}\left( x^b, \mathbf{B} \right),$$
where $x^b \in \mathbb{R}^{n \times 1}$ and $\mathbf{B} \in \mathbb{R}^{n \times n}$ are the background state and the background error covariance matrix, respectively. This assumption has plenty of computational benefits; for instance, by assuming Gaussian errors in the prior and the observations, Kalman-like updates can be performed in order to compute posterior (error) moments. However, in the context of geophysics, model dynamics can be highly non-linear and therefore Gaussian Mixture Models (GMM) [4,5] can be used to capture our prior knowledge about the error dynamics:
$$x \sim \sum_{k=1}^{K} \alpha^b_k \cdot \mathcal{N}\left( x^b_k, \mathbf{B}_k \right), \quad \text{with } \sum_{k=1}^{K} \alpha^b_k = 1,$$
where, for the $k$-th mixture component, $x^b_k \in \mathbb{R}^{n \times 1}$ is the mean, $\mathbf{B}_k \in \mathbb{R}^{n \times n}$ is the background error covariance, and $\alpha^b_k$ is the prior weight, for $1 \le k \le K$. These weights can be estimated, for instance, by using the Expectation Maximization algorithm [6,7,8]. In sequential methods, it is common to assume Gaussian errors in the observations $y \in \mathbb{R}^{m \times 1}$,
$$y \sim \mathcal{N}\left( \mathcal{H}(x), \mathbf{R} \right),$$
where $m$ is the number of observed components from the model state, $\mathbf{R} \in \mathbb{R}^{m \times m}$ is the data error covariance matrix, and $\mathcal{H}: \mathbb{R}^{n \times 1} \rightarrow \mathbb{R}^{m \times 1}$ is the observation mapping operator. When the operator $\mathcal{H}$ is linear, the posterior error distribution can be described by a GMM [9] as well:
$$x|y \sim \sum_{k=1}^{K} \alpha^a_k \cdot \mathcal{N}\left( x^a_k, \mathbf{A}_k \right),$$
with updated weights $\alpha^a_k$, centroids $x^a_k$, and covariances $\mathbf{A}_k$, for $1 \le k \le K$, which account for the observation (3). Note that, for the $k$-th mixture component, the posterior mode $x^a_k \in \mathbb{R}^{n \times 1}$ can be obtained by means of a 3D-Var optimization problem:
$$x^a_k = \arg\min_{x} \, \mathcal{J}_k(x),$$
where
$$\mathcal{J}_k(x) = \frac{1}{2} \cdot \left\| x - x^b_k \right\|^2_{\mathbf{B}^{-1}_k} + \frac{1}{2} \cdot \left\| y - \mathcal{H}(x) \right\|^2_{\mathbf{R}^{-1}}.$$
Under linear assumptions, $x^a_k$ can be estimated by means of, for instance, an EnKF updating formula. However, for non-linear observation operators, such an expression can fail to obtain reasonable estimates of the posterior modes and therefore other alternatives, such as sampling methods, are employed. For instance, Monte-Carlo based methods are commonly utilized in order to relax the Gaussian assumption on the observational errors. Thus, we propose an efficient sampling method to draw samples from the posterior distribution (4) using cost functions of the form (6). In general, the method works as follows: for a fixed number of clusters, a GMM is fitted with the EM method; then, for each prior component, samples are drawn along steepest descent approximations of the 3D-Var cost function; these samples are accepted/rejected based on the Metropolis–Hastings rule. Besides, background error correlations are estimated based on a modified Cholesky decomposition in order to perform implicit localization and to reduce the impact of sampling errors, a major concern in the context of DA.
This paper is organized as follows: Section 2 discusses EnKF formulations and sampling methods for relaxing Gaussian assumptions in prior and observation errors. Section 3 presents an EnKF formulation based on GMM and MCMC (Markov-Chain-Monte-Carlo) for computing posterior modes together with error statistics; the computational cost of the method is estimated as well. In Section 4, experimental tests are performed on the Lorenz-96 model with non-linear observation operators; conclusions are finally stated in Section 5.

2. Preliminaries

2.1. Ensemble Kalman Filters Based on Modified Cholesky Decomposition

In sequential Data Assimilation (DA), under prior Gaussian assumptions, a well-known filter is the ensemble Kalman filter (EnKF) [10,11]. In EnKF, using an ensemble of model realizations,
$$\mathbf{X}^b = \left[ x^{b[1]}, x^{b[2]}, \ldots, x^{b[N]} \right] \in \mathbb{R}^{n \times N},$$
the hyper-parameters of the error distribution,
$$x \sim \mathcal{N}\left( x^b, \mathbf{B} \right),$$
are estimated as:
$$x^b \approx \overline{x}^b = \frac{1}{N} \cdot \sum_{e=1}^{N} x^{b[e]} \in \mathbb{R}^{n \times 1},$$
and
$$\mathbf{B} \approx \mathbf{P}^b = \frac{1}{N-1} \cdot \Delta \mathbf{X} \cdot \Delta \mathbf{X}^T \in \mathbb{R}^{n \times n},$$
where $x^b \in \mathbb{R}^{n \times 1}$ is the background state, $\mathbf{B} \in \mathbb{R}^{n \times n}$ is the background error covariance matrix, $N$ is the ensemble size, and $x^{b[e]}$ denotes the $e$-th ensemble member, for $1 \le e \le N$. $\overline{x}^b$ is known as the ensemble mean, and $\mathbf{P}^b$ is the ensemble covariance matrix. The matrix of member deviations $\Delta \mathbf{X} \in \mathbb{R}^{n \times N}$ reads
$$\Delta \mathbf{X} = \mathbf{X}^b - \overline{x}^b \cdot \mathbf{1}^T,$$
and $\mathbf{1}$ is a vector of consistent dimension whose components are all ones. When an observation $y$ becomes available, in the stochastic EnKF, the analysis ensemble can be estimated as follows:
$$\mathbf{X}^a = \mathbf{X}^b + \mathbf{P}^a \cdot \Delta \mathbf{Y} \in \mathbb{R}^{n \times N},$$
where $\Delta \mathbf{Y} = \mathbf{H}^T \cdot \mathbf{R}^{-1} \cdot \left( y \cdot \mathbf{1}^T - \mathbf{H} \cdot \mathbf{X}^b + \mathbf{E} \right) \in \mathbb{R}^{n \times N}$ is the matrix of scaled innovations on the observations. The columns of $\mathbf{E} \in \mathbb{R}^{m \times N}$ are formed by samples from an $m$-dimensional standard Normal distribution. $\mathbf{H} \in \mathbb{R}^{m \times n}$ is the linearized observation operator (with the linearization performed about the background state), also known as the Jacobian matrix of $\mathcal{H}$ at the background state, and $\mathbf{P}^a \in \mathbb{R}^{n \times n}$ is the analysis covariance matrix:
$$\mathbf{P}^a = \left[ \left( \mathbf{P}^b \right)^{-1} + \mathbf{H}^T \cdot \mathbf{R}^{-1} \cdot \mathbf{H} \right]^{-1}.$$
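For concreteness, the following minimal sketch (not part of the original formulation, and with illustrative variable names) implements the stochastic EnKF analysis step (7c)-(7g) in Python/NumPy, assuming the observation operator is given explicitly as a matrix H; a small ridge term is added before inverting the ensemble covariance since, for N < n, the sample covariance is rank-deficient.

import numpy as np

def stochastic_enkf_analysis(Xb, H, R, y, rng=np.random.default_rng(0)):
    """Stochastic EnKF analysis step for a linear observation operator H (m x n).

    Xb: n x N background ensemble, R: m x m observation error covariance,
    y: m-vector of observations. Returns the n x N analysis ensemble.
    """
    n, N = Xb.shape
    xb_mean = Xb.mean(axis=1, keepdims=True)              # ensemble mean (7c)
    DX = Xb - xb_mean                                      # member deviations (7e)
    Pb = DX @ DX.T / (N - 1)                               # ensemble covariance (7d)
    Rinv = np.linalg.inv(R)
    # Analysis covariance (7g); the ridge keeps Pb invertible when N << n.
    Pa = np.linalg.inv(np.linalg.inv(Pb + 1e-8 * np.eye(n)) + H.T @ Rinv @ H)
    # Scaled innovations with one perturbed observation per member.
    E = rng.multivariate_normal(np.zeros(len(y)), R, size=N).T   # m x N perturbations
    DY = H.T @ Rinv @ (y[:, None] - H @ Xb + E)            # n x N scaled innovations
    return Xb + Pa @ DY                                    # analysis ensemble (7f)

# Toy usage: n = 40 state variables, every other component observed, N = 30 members.
rng = np.random.default_rng(1)
n, N = 40, 30
Xb = rng.normal(size=(n, N))
H = np.eye(n)[::2]
R = 0.01 * np.eye(len(H))
y = rng.normal(size=len(H))
Xa = stochastic_enkf_analysis(Xb, H, R, y)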
In operational DA, ensemble sizes are several orders of magnitude smaller than the model dimension ($N \ll n$) and, as a consequence, the covariance matrix $\mathbf{P}^b$ is commonly rank-deficient. This implies that (7g) cannot be directly computed and, even though equivalent updating formulas which avoid the $\left[ \mathbf{P}^b \right]^{-1}$ calculation can be found in the literature, sampling errors can still impact the quality of the analysis corrections. In practice, localization methods are often used to artificially increase the degrees of freedom of $\mathbf{P}^b$ and to mitigate the impact of sampling errors [12,13,14,15]. An efficient EnKF implementation which accounts for implicit localization during the assimilation step is the EnKF method based on a modified Cholesky decomposition (EnKF-MC) [16]. In this filter, the sample covariance (7d) is replaced by the precision estimator of Bickel and Levina [17]:
$$\mathbf{B}^{-1} \approx \widehat{\mathbf{B}}^{-1} = \mathbf{L}^T \cdot \mathbf{D} \cdot \mathbf{L} \in \mathbb{R}^{n \times n},$$
where $\mathbf{L} \in \mathbb{R}^{n \times n}$ is a lower triangular matrix whose diagonal elements are all ones. Its non-zero sub-diagonal elements are computed by fitting models of the form:
$$x_{[i]}^T = \sum_{j \in P(i,\,r)} x_{[j]}^T \cdot \{\mathbf{L}\}_{i,j} + \varsigma_i \in \mathbb{R}^{N \times 1}, \quad \text{for } 1 \le i \le n,$$
in which $P(i,r)$ denotes the predecessors of the $i$-th grid component, for some labelling of the model components and radius of influence $r$; $x_{[i]} \in \mathbb{R}^{1 \times N}$ stands for the $i$-th row of the matrix (7e); and, as an assumption, $\varsigma_i \in \mathbb{R}^{N \times 1}$ follows a zero-mean Normal distribution with uncorrelated errors of unknown variance $\sigma^2$. Likewise, $\mathbf{D} \in \mathbb{R}^{n \times n}$ is a diagonal matrix whose diagonal elements are given by the reciprocals of the residual variances in (8b):
$$\{\mathbf{D}\}_{i,i} = \mathrm{var}\left( x_{[i]}^T - \sum_{j \in P(i,\,r)} x_{[j]}^T \cdot \{\mathbf{L}\}_{i,j} \right)^{-1} \approx \frac{1}{\sigma^2}.$$
Similar to (7f), the analysis ensemble can be built as,
$$\mathbf{X}^a = \mathbf{X}^b + \widehat{\mathbf{A}} \cdot \Delta \mathbf{Y},$$
where an estimate of the analysis covariance is:
$$\widehat{\mathbf{A}} = \left[ \widehat{\mathbf{B}}^{-1} + \mathbf{H}^T \cdot \mathbf{R}^{-1} \cdot \mathbf{H} \right]^{-1} \in \mathbb{R}^{n \times n}.$$
Matrix-free implementations of the EnKF-MC have been proposed in the literature; for instance, the Posterior EnKF (P-EnKF) [18,19] exploits the special structure of (8a) in order to estimate Cholesky factors of the posterior precision matrix (8e). Consider
$$\widehat{\mathbf{A}}^{-1} = \widehat{\mathbf{B}}^{-1} + \mathbf{H}^T \cdot \mathbf{R}^{-1} \cdot \mathbf{H} = \widehat{\mathbf{B}}^{-1} + \mathbf{Z} \cdot \mathbf{Z}^T = \widehat{\mathbf{B}}^{-1} + \sum_{o=1}^{m} z^{[o]} \cdot z^{[o]\,T} \in \mathbb{R}^{n \times n},$$
where $z^{[o]} \in \mathbb{R}^{n \times 1}$, for $1 \le o \le m$, is the $o$-th column of the matrix $\mathbf{Z} = \mathbf{H}^T \cdot \mathbf{R}^{-1/2} \in \mathbb{R}^{n \times m}$; the updating process can then be done via a sequence of rank-one updates over the prior Cholesky factors:
$$\begin{aligned}
\widehat{\mathbf{A}}^{(0)} &= \mathbf{L}^{(0)\,T} \cdot \mathbf{D}^{(0)} \cdot \mathbf{L}^{(0)} = \mathbf{L}^T \cdot \mathbf{D} \cdot \mathbf{L} = \widehat{\mathbf{B}}^{-1}, \\
\widehat{\mathbf{A}}^{(1)} &= \widehat{\mathbf{A}}^{(0)} + z^{[1]} \cdot z^{[1]\,T} = \mathbf{L}^{(1)\,T} \cdot \mathbf{D}^{(1)} \cdot \mathbf{L}^{(1)}, \\
\widehat{\mathbf{A}}^{(2)} &= \widehat{\mathbf{A}}^{(1)} + z^{[2]} \cdot z^{[2]\,T} = \mathbf{L}^{(2)\,T} \cdot \mathbf{D}^{(2)} \cdot \mathbf{L}^{(2)}, \\
&\;\;\vdots \\
\widehat{\mathbf{A}}^{(m)} &= \widehat{\mathbf{A}}^{(m-1)} + z^{[m]} \cdot z^{[m]\,T} = \mathbf{L}^{(m)\,T} \cdot \mathbf{D}^{(m)} \cdot \mathbf{L}^{(m)} = \widehat{\mathbf{L}}^T \cdot \widehat{\mathbf{D}} \cdot \widehat{\mathbf{L}} = \widehat{\mathbf{A}}^{-1},
\end{aligned}$$
where $\mathbf{L}^{(0)} \in \mathbb{R}^{n \times n}$ and $\mathbf{D}^{(0)} \in \mathbb{R}^{n \times n}$ are the Cholesky factors of $\widehat{\mathbf{B}}^{-1}$. Having a posterior precision matrix in the form $\widehat{\mathbf{A}}^{-1} = \widehat{\mathbf{L}}^T \cdot \widehat{\mathbf{D}} \cdot \widehat{\mathbf{L}}$, an estimate of the posterior ensemble can be easily built:
$$\mathbf{X}^a = \mathbf{X}^b + \mathbf{Q} \in \mathbb{R}^{n \times N},$$
where $\mathbf{Q} \in \mathbb{R}^{n \times N}$ is given by the solution of the upper triangular linear system:
$$\widehat{\mathbf{L}}^T \cdot \widehat{\mathbf{D}}^{1/2} \cdot \mathbf{Q} = \Delta \mathbf{Y}.$$
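The sketch below (an illustration, not the authors' code; all names are assumptions) shows how the analysis ensemble can be assembled once factors of the posterior precision are available, by solving the triangular system (8g) as stated, using SciPy; the factors and innovations are generated randomly only to keep the snippet self-contained.

import numpy as np
from scipy.linalg import solve_triangular

def build_analysis(Xb, L_hat, D_hat, DY):
    """Build X^a = X^b + Q, where Q solves L_hat^T D_hat^{1/2} Q = DY (Eq. 8g).

    L_hat: n x n unit lower-triangular factor, D_hat: length-n vector of
    positive diagonal entries, DY: n x N matrix of scaled innovations.
    """
    U = L_hat.T * np.sqrt(D_hat)               # upper-triangular matrix L^T D^{1/2}
    Q = solve_triangular(U, DY, lower=False)   # back substitution, column by column
    return Xb + Q

# Self-contained toy data (illustrative only).
rng = np.random.default_rng(0)
n, N = 40, 30
L_hat = np.tril(0.1 * rng.normal(size=(n, n)), k=-1) + np.eye(n)   # unit lower triangular
D_hat = rng.uniform(0.5, 2.0, size=n)                              # positive diagonal
Xb = rng.normal(size=(n, N))
DY = rng.normal(size=(n, N))
Xa = build_analysis(Xb, L_hat, D_hat, DY)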
Although EnKF implementations are well recognized by the DA community, the Gaussian assumption (7b) can easily be broken when the numerical model dynamics (1) are highly non-linear. For this reason, one prefers to use a different model to describe the prior error distribution; a Gaussian Mixture Model is frequently used to relax the Gaussian assumption over the forecast distribution [20,21,22,23].

2.2. Gaussian Mixture Models Based Filters

In EnKF formulations based on Gaussian Mixture Models (GMM), the prior error distribution (7b) is replaced by a mixture of Gaussian distributions (2). Based on this general idea, many methods have been proposed in the current literature in order to deal with the highly non-linear dynamics of, for instance, operational DA models. These methods rely on the assumption that GMM models can approximate arbitrary density functions. GMM components are commonly fitted by using the Expectation Maximization (EM) algorithm [24]. As the name implies, this iterative method alternates two steps: Expectation (E) and Maximization (M). For a fixed number of clusters K, at iteration p, during the E-step, the probability of representativeness of each ensemble member is computed with regard to each cluster:
$$w^{(p)}_{e,k} \propto \exp\left( -\frac{1}{2} \cdot \left\| x^{b[e]} - x^{b\,(p)}_k \right\|^2_{\left[ \mathbf{B}^{(p)}_k \right]^{-1}} \right), \quad \text{for } 1 \le e \le N \text{ and } 1 \le k \le K,$$
which allows us to approximate the k-th weight for the prior mixture component:
$$\alpha^{b\,(p)}_k = \left[ \sum_{r=1}^{K} \sum_{e=1}^{N} w^{(p)}_{e,r} \right]^{-1} \cdot \sum_{e=1}^{N} w^{(p)}_{e,k}.$$
GMM components are updated during the M-step:
$$x^{b\,(p+1)}_k = \eta^{(p)}_k \cdot \sum_{e=1}^{N} w^{(p)}_{e,k} \cdot x^{b[e]} \in \mathbb{R}^{n \times 1}, \quad \text{and} \quad \mathbf{B}^{(p+1)}_k = \eta^{(p)}_k \cdot \left[ \Delta \mathbf{X}^{(p)}_k \cdot \boldsymbol{\Gamma}^{(p)}_k \right] \cdot \left[ \Delta \mathbf{X}^{(p)}_k \cdot \boldsymbol{\Gamma}^{(p)}_k \right]^T \in \mathbb{R}^{n \times n},$$
where $\eta^{(p)}_k = \left[ \sum_{e=1}^{N} w^{(p)}_{e,k} \right]^{-1} \in \mathbb{R}$, the matrix of intra-cluster deviations reads
$$\Delta \mathbf{X}^{(p)}_k = \mathbf{X}^b - x^{b\,(p)}_k \cdot \mathbf{1}^T \in \mathbb{R}^{n \times N}, \quad \text{and}$$
$$\boldsymbol{\Gamma}^{(p)}_k = \mathrm{diag}\left( w^{(p)}_{1,k},\, w^{(p)}_{2,k},\, \ldots,\, w^{(p)}_{N,k} \right) \in \mathbb{R}^{N \times N}.$$
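To make the E and M steps concrete, the following sketch (illustrative only; it operates on dense covariances and uses the standard EM updates of weighted means and weighted covariances, which may differ in minor details from the formulation above) fits a K-component GMM to the columns of an ensemble matrix with NumPy.

import numpy as np

def em_gmm(Xb, K, iters=20, reg=1e-6, rng=np.random.default_rng(0)):
    """Plain EM for a K-component GMM over the columns of Xb (n x N)."""
    n, N = Xb.shape
    means = Xb[:, rng.choice(N, size=K, replace=False)].copy()
    covs = np.array([np.cov(Xb) + reg * np.eye(n) for _ in range(K)])
    alpha = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities w[e, k], computed via log-densities for stability.
        logw = np.empty((N, K))
        for k in range(K):
            d = Xb - means[:, [k]]
            _, logdet = np.linalg.slogdet(covs[k])
            logw[:, k] = (np.log(alpha[k]) - 0.5 * logdet
                          - 0.5 * np.einsum('ie,ie->e', d, np.linalg.solve(covs[k], d)))
        logw -= logw.max(axis=1, keepdims=True)
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: mixture weights, centroids, and covariances.
        alpha = w.sum(axis=0) / N
        for k in range(K):
            eta = 1.0 / w[:, k].sum()
            means[:, k] = eta * (Xb @ w[:, k])
            d = Xb - means[:, [k]]
            covs[k] = eta * (d * w[:, k]) @ d.T + reg * np.eye(n)
    return alpha, means, covs

# Toy usage: bimodal ensemble in n = 3 dimensions, N = 100 members.
rng = np.random.default_rng(1)
Xb = np.hstack([rng.normal(-2.0, 0.3, size=(3, 50)), rng.normal(2.0, 0.3, size=(3, 50))])
alpha, means, covs = em_gmm(Xb, K=2)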
Under linear assumptions, GMM-EnKF formulations exploit the fact that a Gaussian observation likelihood is conjugate to a GMM prior. Thus, posterior components can be computed as the result of performing Kalman-like updates over the prior weights, centroids, and covariances as follows:
$$\alpha^a_k \propto \alpha^b_k \cdot \exp\left( -\frac{1}{2} \cdot \left\| y - \mathbf{H} \cdot x^b_k \right\|^2_{\mathbf{Q}^{-1}_k} \right), \quad \text{for } 1 \le k \le K,$$
$$x^a_k = x^b_k + \mathbf{A}_k \cdot \mathbf{H}^T \cdot \mathbf{R}^{-1} \cdot \left( y - \mathbf{H} \cdot x^b_k \right) \in \mathbb{R}^{n \times 1},$$
and
$$\mathbf{A}_k = \left[ \mathbf{B}^{-1}_k + \mathbf{H}^T \cdot \mathbf{R}^{-1} \cdot \mathbf{H} \right]^{-1} \in \mathbb{R}^{n \times n},$$
where $\mathbf{Q}_k = \mathbf{H} \cdot \mathbf{B}_k \cdot \mathbf{H}^T + \mathbf{R} \in \mathbb{R}^{m \times m}$, and $\sum_{k=1}^{K} \alpha^a_k = 1$. This computationally friendly property is exploited by filters such as the Non-linear Bayesian Estimator (N-BE) [25] and the Gaussian Mixture Ensemble Filter [26,27]. In [28], the N-BE is enhanced by using the Bayesian Information Criterion (BIC)
$$\mathrm{BIC} = -2 \cdot \sum_{e=1}^{N} \log\left[ \sum_{k=1}^{K} \alpha^b_k \cdot \phi\left( x^{b[e]};\, x^b_k, \mathbf{B}_k \right) \right] + \log(N) \cdot (3 \cdot N - 1)$$
in order to choose the number of components for the GMM, where $\phi\left( x;\, x^b_k, \mathbf{B}_k \right)$ denotes a Normal probability density function with parameters $x^b_k$ and $\mathbf{B}_k$, respectively. This is possible by other means as well; for instance, in the GMM-EnKF filter proposed by Smith in [29], the number of mixture components relies on Akaike's Information Criterion (AIC)
$$\mathrm{AIC} = -2 \cdot \sum_{e=1}^{N} \log\left[ \sum_{k=1}^{K} \alpha^b_k \cdot \phi\left( x^{b[e]};\, x^b_k, \mathbf{B}_k \right) \right] + 2 \cdot (3 \cdot N - 1).$$
The BIC and the AIC are common methods for choosing the number of parameters in a GMM; however, information-criteria-based methods are poorly suited for selecting a model with a good out-of-sample fit in model-rich environments [30,31]. Besides, these methods are only valid for a sample size N much larger than the number of parameters, which can be difficult to guarantee in the DA context.
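As a small illustration of information-criterion-based selection of K (not taken from the paper; the data and parameter choices below are assumptions), scikit-learn's GaussianMixture class exposes standard bic() and aic() implementations that can be evaluated for several candidate values of K:

import numpy as np
from sklearn.mixture import GaussianMixture

# Ensemble members as rows (N samples of dimension n); toy bimodal data.
rng = np.random.default_rng(0)
samples = np.vstack([rng.normal(-2.0, 0.3, size=(60, 5)),
                     rng.normal(2.0, 0.3, size=(60, 5))])

scores = []
for K in range(1, 6):
    gmm = GaussianMixture(n_components=K, covariance_type='full',
                          random_state=0).fit(samples)
    scores.append((K, gmm.bic(samples), gmm.aic(samples)))

# Pick the number of components that minimizes the chosen criterion.
best_K_bic = min(scores, key=lambda t: t[1])[0]
best_K_aic = min(scores, key=lambda t: t[2])[0]
print(scores, best_K_bic, best_K_aic)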
Other methods, such as Particle Filters (PFs) [32], are available in the literature in order to attack non-Gaussian DA problems. These methods are a good choice from a theoretical point of view. Unfortunately, in practice, PFs suffer from a degeneracy problem [33] (Section 1.4) and, even more, many challenges have to be overcome before they can be considered under operational DA scenarios [34,35]. For these reasons, PFs are not considered any further in this paper.

3. Proposed Method

In this section, we develop an efficient Gaussian mixture ensemble Kalman filter implementation based on Markov-Chain-Monte-Carlo (GMM-EnKF-MCMC) for non-Gaussian data assimilation. We describe in detail how the hyper-parameters are computed as well as how observations are digested by the GMM-EnKF-MCMC. Lastly, we briefly discuss how posterior ensembles are built.

3.1. Estimation of Hyper-Parameters—EM Method

Consider an ensemble of model realizations
$$\mathbf{X}^b = \left[ x^{b[1]}, x^{b[2]}, \ldots, x^{b[N]} \right] \in \mathbb{R}^{n \times N},$$
where $x^{b[e]} \in \mathbb{R}^{n \times 1}$ stands for the $e$-th ensemble member, for $1 \le e \le N$. To estimate the hyper-parameters of the prior error distribution, we use the EM method to fit a GMM with K components. Instead of estimating the background error covariances $\mathbf{B}_k$, for $1 \le k \le K$, of the mixture components, we prefer to estimate the precision matrices $\mathbf{B}^{-1}_k$ owing to their computational benefits. For instance, in the context of Normal distributions, probabilities are computed based on precision matrices, which would require the inversion of large matrices if background error covariances were fitted during the EM steps. Besides, by using the modified Cholesky decomposition to compute the precision matrices, we can exploit their special structure in order to obtain large savings in terms of memory usage and to reduce the computational cost of matrix-vector products across iterations.
The proposed EM method works as follows: for a given number of clusters K, we choose K random members from the ensemble (13) in order to set the initial mixture centroids, while the initial precision matrices are all set equal to the precision matrix of the ensemble (8a). During the E-step, we compute the representativeness of each ensemble member with regard to each cluster:
$$\widehat{w}^{(p)}_{e,k} \propto \exp\left( -\frac{1}{2} \cdot \left\| x^{b[e]} - x^{b\,(p)}_k \right\|^2_{\left[ \widehat{\mathbf{B}}^{(p)}_k \right]^{-1}} \right), \quad \text{for } 1 \le e \le N \text{ and } 1 \le k \le K,$$
where
$$\left[ \widehat{\mathbf{B}}^{(p)}_k \right]^{-1} = \mathbf{L}^{(p)\,T}_k \cdot \mathbf{D}^{(p)}_k \cdot \mathbf{L}^{(p)}_k \in \mathbb{R}^{n \times n}.$$
The M-step updates the precision matrices as well as the centroids of each cluster. Consider the diagonal matrix of fuzzy weights,
$$\widehat{\boldsymbol{\Gamma}}^{(p)}_k = \mathrm{diag}\left( \widehat{w}^{(p)}_{1,k},\, \widehat{w}^{(p)}_{2,k},\, \ldots,\, \widehat{w}^{(p)}_{N,k} \right) \in \mathbb{R}^{N \times N};$$
the GMM centroids are updated as follows:
$$\overline{x}^{b\,(p)}_k = \widehat{\eta}^{(p)}_k \cdot \mathbf{X}^b \cdot \widehat{\boldsymbol{\Gamma}}^{(p)}_k \cdot \mathbf{1} \in \mathbb{R}^{n \times 1},$$
where $\widehat{\eta}^{(p)}_k = \left[ \sum_{e=1}^{N} \widehat{w}^{(p)}_{e,k} \right]^{-1} \in \mathbb{R}$. Likewise, the non-zero elements of $\mathbf{L}^{(p+1)}_k \in \mathbb{R}^{n \times n}$ in (14b) are revised by fitting models of the form:
$$x^{(p)\,T}_{[i]\,k} = \sum_{q \in P(i,\,r)} x^{(p)\,T}_{[q]\,k} \cdot \left\{ \mathbf{L}^{(p+1)}_k \right\}_{i,q} + \gamma_{i,k}, \quad \text{for } 1 \le i \le n \text{ and } 1 \le k \le K,$$
where $x^{(p)}_{[i]\,k} \in \mathbb{R}^{1 \times N}$ is the $i$-th row of the matrix $\Delta \mathbf{X}^{(p)}_k \in \mathbb{R}^{n \times N}$,
$$\Delta \mathbf{X}^{(p)}_k = \mathbf{X}^b - \overline{x}^{b\,(p)}_k \cdot \mathbf{1}^T,$$
and the errors $\gamma_{i,k} \in \mathbb{R}^{N \times 1}$ are assumed to follow a Normal distribution with zero mean and uncorrelated components of variance $\sigma^2$. Moreover, the diagonal entries of $\mathbf{D}^{(p)}_k \in \mathbb{R}^{n \times n}$ are estimated via the residuals of model (15c):
$$\left\{ \mathbf{D}^{(p)}_k \right\}_{i,i} = \mathrm{var}\left( x^{(p)\,T}_{[i]\,k} - \sum_{q \in P(i,\,r)} x^{(p)\,T}_{[q]\,k} \cdot \left\{ \mathbf{L}^{(p+1)}_k \right\}_{i,q} \right)^{-1}, \quad \text{for } 2 \le i \le n, \text{ and } 1 \le k \le K,$$
with $\left\{ \mathbf{D}^{(p)}_k \right\}_{1,1} = \mathrm{var}\left( x^{(p)}_{[1]\,k} \right)^{-1}$. Note that, whenever covariance inflation is desired before the assimilation steps, the inflation factor can be applied to the matrix of member deviations with regard to the centroids (15d); in this manner, the estimated precision matrices come already inflated. Once the EM steps are concluded (i.e., when a maximum number of iterations is reached), the prior error distribution is described by the GMM:
$$x \sim \sum_{k=1}^{K} \alpha^b_k \cdot \mathcal{N}\left( \overline{x}^b_k,\, \left[ \mathbf{L}^T_k \cdot \mathbf{D}_k \cdot \mathbf{L}_k \right]^{-1} \right).$$
The special structure of the resulting precision matrices $\widehat{\mathbf{B}}^{-1}_k = \mathbf{L}^T_k \cdot \mathbf{D}_k \cdot \mathbf{L}_k$ can be exploited in order to reduce the computational effort of drawing samples from the posterior error distribution, as discussed in the next section.
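For illustration, the following sketch (not the authors' implementation; it assumes a simple one-dimensional labelling of the grid, unweighted regressions, and the standard Bickel-Levina convention in which the sub-diagonal entries of the factor store the negated regression coefficients) fits the factors L and D for one cluster from its matrix of deviations.

import numpy as np

def modified_cholesky(DX, radius=1):
    """Fit B^{-1} ~= L^T diag(d) L from deviations DX (n x N) via local regressions.

    Row i of DX is regressed on its (up to `radius`) predecessor rows; d holds
    the reciprocal residual variances.
    """
    n, N = DX.shape
    L = np.eye(n)
    d = np.empty(n)
    d[0] = 1.0 / np.var(DX[0])
    for i in range(1, n):
        pred = list(range(max(0, i - radius), i))         # predecessors P(i, r)
        A = DX[pred].T                                     # N x |pred| design matrix
        coef, *_ = np.linalg.lstsq(A, DX[i], rcond=None)   # least-squares fit
        L[i, pred] = -coef                                 # negated regression coefficients
        resid = DX[i] - A @ coef
        d[i] = 1.0 / np.var(resid)                         # reciprocal residual variance
    return L, d

# Toy usage with one cluster's deviations (n = 40 grid points, N = 30 members).
rng = np.random.default_rng(2)
DX = rng.normal(size=(40, 30))
L, d = modified_cholesky(DX, radius=1)
Binv = L.T @ np.diag(d) @ L       # dense precision matrix, formed here only for checking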

3.2. Sampling Method—Approaching the Posterior

In order to approximate samples from the posterior distribution (4), for each mixture component $1 \le k \le K$ in (16), we use a Markov-Chain-Monte-Carlo (MCMC) approach. Traditionally, Normal distributions are good candidates for proposing states; for instance, in our context, starting with $u = 0$ and $x^{(u)} = \overline{x}^b_k \in \mathbb{R}^{n \times 1}$, at iteration $0 \le u \le v$, where $v$ is a user-defined number of iterations, a state can be proposed as follows:
$$z^{(u)} \sim \mathcal{N}\left( x^{(u)},\, \left[ \widehat{\mathbf{B}}^{-1}_k \right]^{-1} \right),$$
Or simply,
$$z^{(u)} = x^{(u)} + \nu \in \mathbb{R}^{n \times 1},$$
where $\nu \in \mathbb{R}^{n \times 1}$ is given by the solution of an upper triangular linear system:
$$\mathbf{L}^T_k \cdot \mathbf{D}^{1/2}_k \cdot \nu = \varepsilon \in \mathbb{R}^{n \times 1},$$
where $\varepsilon \in \mathbb{R}^{n \times 1}$ follows a standard Normal distribution. Nevertheless, in high-dimensional spaces such as those found in the context of operational DA, high-probability zones of posterior error distributions may only be reached after a huge number of iterations. Thus, to overcome this situation, we proceed as follows: we linearize the observation operator about the current state $x^{(u)}$,
$$\mathcal{H}(x) \approx \mathcal{G}_u(x) = \mathcal{H}\left( x^{(u)} \right) + \mathbf{H}_{x^{(u)}} \cdot \left( x - x^{(u)} \right),$$
where $\mathbf{H}_{x^{(u)}} \in \mathbb{R}^{m \times n}$ is the Jacobian matrix of $\mathcal{H}(x)$ at $x^{(u)}$. The cost function (6) can then be approximated by the quadratic cost function:
$$\widehat{\mathcal{J}}_k(x) = \frac{1}{2} \cdot \left\| x - \overline{x}^b_k \right\|^2_{\mathbf{B}^{-1}_k} + \frac{1}{2} \cdot \left\| y - \mathcal{G}_u(x) \right\|^2_{\mathbf{R}^{-1}},$$
whose gradient reads,
$$\nabla \widehat{\mathcal{J}}_k(x) = \widehat{\mathbf{B}}^{-1}_k \cdot \left( x - \overline{x}^b_k \right) - \mathbf{H}^T_{x^{(u)}} \cdot \mathbf{R}^{-1} \cdot \left[ y - \mathcal{G}_u(x) \right] \in \mathbb{R}^{n \times 1}.$$
By using this gradient, the proposal distribution (17) can be modified in such a manner that samples along the (approximated) steepest descent direction $-\nabla \widehat{\mathcal{J}}_k\left( x^{(u)} \right)$ of (6) have a high probability of occurrence, that is,
$$z^{(u)} = x^{(u)} - \psi \cdot \frac{\nabla \widehat{\mathcal{J}}_k\left( x^{(u)} \right)}{\left\| \nabla \widehat{\mathcal{J}}_k\left( x^{(u)} \right) \right\|}, \quad \text{with } \psi \sim \mathcal{U}\left( 0, \beta \right),$$
where $\mathcal{U}\left( 0, \beta \right)$ stands for a Uniform distribution on the interval $(0, \beta)$. The value of $\beta$ can be tuned according to the degree of the observation operator. For instance, for a linear observation operator, $\beta$ can be set to $\left\| \nabla \widehat{\mathcal{J}}\left( x^{(u)} \right) \right\|$, since the gradients of (6) and (18a) coincide. When observation operators are highly non-linear, since Taylor-based approximations suffer from myopia, only a small step along this gradient should be taken and therefore a good choice under this consideration is 1. Thus, an intuitive range for $\beta$ is
$$\beta \in \left[ 1,\, \left\| \nabla \widehat{\mathcal{J}}\left( x^{(u)} \right) \right\| \right].$$
In the absence of prior information about $\beta$, one can choose 1. The main motivation for using gradient approximations is that the subset of samples along the descent direction (19) can provide states which potentially maximize the posterior probability. Computationally speaking, no matrix inversion is needed in this context in order to propose states. The acceptance/rejection rule for the states (19) relies on the Metropolis-Hastings criterion, which can be adapted as follows:
$$x^{(u+1)} = \begin{cases} x^{(u)}, & y > \dfrac{\mathcal{J}_k\left( x^{(u)} \right)}{\mathcal{J}_k\left( z^{(u)} \right)}, \text{ for } y \sim \mathcal{U}(0, 1), \\ z^{(u)}, & \text{otherwise.} \end{cases}$$
The observation operator is then linearized about $x^{(u+1)}$, and the overall process is repeated until some number of iterations is satisfied (or any other numerical condition is met). Putting it all together, the sampling procedure to compute the posterior modes is as follows:
Step 1
Let $k = 1$, set $u = 0$, and go to Step 2.
Step 2
Set $x^{(u)} = \overline{x}^b_k$.
Step 3
Linearize $\mathcal{H}$ about $x^{(u)}$ and compute the direction (18b).
Step 4
Compute $z^{(u)}$ via Equation (19).
Step 5
Set $x^{(u+1)}$ according to Equation (20).
Step 6
If $u < v$, set $u = u + 1$ and go to Step 3; otherwise, set $\overline{x}^a_k = x^{(v)}$ and go to Step 7.
Step 7
If $k < K$, set $k = k + 1$, set $u = 0$, and go to Step 2; otherwise, go to Step 8.
Step 8
The posterior mode approximations read $\left\{ \overline{x}^a_k \right\}_{k=1}^{K}$.
Note that, unlike MCMC methods based on Random-Walk proposals, for each prior mode this procedure does not return a chain but only the last state $x^{(v)}$. This sampling step can be replaced by an optimization method such as Trust Region [36,37,38] or Line Search [39,40]. However, in order for those to work, some regularity conditions must be satisfied by the gradient approximation (18b) [41,42,43,44,45], for instance smoothness, which in practice is not necessarily the case.
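A compact sketch of Steps 1-8 for a single mixture component is given below (illustrative only; it uses a dense precision matrix, a finite-difference Jacobian, the cost-ratio acceptance rule (20) as stated, and invented variable names).

import numpy as np

def sample_posterior_mode(xb_mean, Binv, y, Hop, Rinv, beta=1.0, v=200,
                          rng=np.random.default_rng(0), eps=1e-6):
    """Gradient-informed MCMC search for one posterior mode (Section 3.2 sketch)."""
    def cost(x):            # 3D-Var cost function (6)
        dx, dy = x - xb_mean, y - Hop(x)
        return 0.5 * dx @ Binv @ dx + 0.5 * dy @ Rinv @ dy

    def jacobian(x):        # central-difference Jacobian of the observation operator
        m, n = len(Hop(x)), len(x)
        J = np.empty((m, n))
        for j in range(n):
            e = np.zeros(n)
            e[j] = eps
            J[:, j] = (Hop(x + e) - Hop(x - e)) / (2 * eps)
        return J

    x = xb_mean.copy()
    for _ in range(v):
        J = jacobian(x)                                             # linearize H (18a)
        grad = Binv @ (x - xb_mean) - J.T @ Rinv @ (y - Hop(x))     # gradient (18b)
        step = rng.uniform(0.0, beta)
        z = x - step * grad / np.linalg.norm(grad)                  # proposal (19)
        if rng.uniform() <= cost(x) / cost(z):                      # acceptance rule (20)
            x = z
    return x

# Toy usage: n = 5 state components, all observed through the gamma = 3 operator.
rng = np.random.default_rng(1)
xb_mean = rng.normal(size=5)
Binv = np.eye(5)
Rinv = np.eye(5) / 0.01
Hop = lambda x: 0.5 * x * ((0.5 * x) ** 2 + 1.0)
y = Hop(xb_mean + 0.1 * rng.normal(size=5))
x_mode = sample_posterior_mode(xb_mean, Binv, y, Hop, Rinv)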

3.3. Building the Posterior Ensemble

Once the posterior modes are computed, posterior covariances within clusters can be estimated as follows:
$$\widehat{\mathbf{A}}_k = \left[ \widehat{\mathbf{B}}^{-1}_k + \mathbf{H}^T_{\overline{x}^a_k} \cdot \mathbf{R}^{-1} \cdot \mathbf{H}_{\overline{x}^a_k} \right]^{-1} \in \mathbb{R}^{n \times n}.$$
Therefore, the analysis members can be computed as:
$$x^{a[e]} \sim \mathcal{N}\left( \overline{x}^a_k,\, \widehat{\mathbf{A}}_k \right), \quad \text{with probability } \alpha^a_k, \text{ for } 1 \le e \le N,$$
where α k a is estimated via the likelihood ratio:
$$\alpha^a_k = \left[ \sum_{j=1}^{K} \phi\left( y;\, \mathcal{H}\left( \overline{x}^a_j \right), \mathbf{R} \right) \right]^{-1} \cdot \phi\left( y;\, \mathcal{H}\left( \overline{x}^a_k \right), \mathbf{R} \right).$$
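A short sketch of this step follows (illustrative; dense covariances and invented toy data), drawing analysis members from the posterior mixture and computing the likelihood-ratio weights (21b).

import numpy as np

def posterior_weights(y, modes, Hop, R):
    """Likelihood-ratio weights alpha_k^a of Equation (21b)."""
    Rinv = np.linalg.inv(R)
    # Unnormalized Gaussian likelihoods phi(y; H(mode_k), R); the common
    # normalization constant cancels in the ratio.
    lik = np.array([np.exp(-0.5 * (y - Hop(m)) @ Rinv @ (y - Hop(m))) for m in modes])
    return lik / lik.sum()

def build_posterior_ensemble(modes, covs, weights, N, rng=np.random.default_rng(0)):
    """Draw N analysis members from the mixture sum_k alpha_k N(mode_k, A_k)."""
    K = len(weights)
    labels = rng.choice(K, size=N, p=weights)        # pick a component per member
    return np.column_stack([rng.multivariate_normal(modes[k], covs[k])
                            for k in labels])        # n x N analysis ensemble

# Toy usage with two posterior modes in n = 3 dimensions and an identity operator.
rng = np.random.default_rng(3)
modes = [np.array([-1.0, 0.0, 1.0]), np.array([1.0, 0.5, -1.0])]
covs = [0.05 * np.eye(3), 0.08 * np.eye(3)]
Hop = lambda x: x
R = 0.01 * np.eye(3)
y = np.array([-0.9, 0.1, 0.9])
alpha = posterior_weights(y, modes, Hop, R)
Xa = build_posterior_ensemble(modes, covs, alpha, N=20)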

3.4. Computational Complexity

In this section, we estimate the number of long computations performed by the proposed method in order to assess its computational effort. We detail the number of computations of the GMM-EnKF-MCMC below; we avoid the use of iteration indexes for ease of reading:
  • During the E-step, the computation of the weights (14a) depends on the calculation:
    $$\frac{1}{2} \cdot \left\| x^{b[e]} - x^{b\,(p)}_k \right\|^2_{\left[ \widehat{\mathbf{B}}^{(p)}_k \right]^{-1}} = \frac{1}{2} \cdot s^T_k \cdot \mathbf{L}^T_k \cdot \mathbf{D}_k \cdot \mathbf{L}_k \cdot s_k = \frac{1}{2} \cdot \left[ \mathbf{D}^{1/2}_k \cdot \widehat{s}_k \right]^T \cdot \left[ \mathbf{D}^{1/2}_k \cdot \widehat{s}_k \right] = \frac{1}{2} \cdot \widetilde{s}^T_k \cdot \widetilde{s}_k = \frac{1}{2} \cdot \left\| \widetilde{s}_k \right\|^2_2,$$
    with $s_k = x^{b[e]} - x^{b\,(p)}_k$, $\widehat{s}_k = \mathbf{L}_k \cdot s_k$, and $\widetilde{s}_k = \mathbf{D}^{1/2}_k \cdot \widehat{s}_k$. From this step, given the special structure of $\mathbf{L}_k$, $\widehat{s}_k \in \mathbb{R}^{n \times 1}$ can be computed with no more than $\mathcal{O}\left( \theta^2 \cdot n \right)$ long computations, where $\theta$ denotes the maximum number of non-zero elements across all rows in $\mathbf{L}_k$, with $\theta \ll n$. Likewise, the number of long computations needed to obtain $\widetilde{s}_k \in \mathbb{R}^{n \times 1}$ is bounded by $\mathcal{O}(n)$ since $\mathbf{D}_k$ is diagonal. Thus, since there are K clusters, each E-step has the following operation count:
    $$\mathcal{O}\left( K \cdot \left[ n + \theta^2 \cdot n \right] \right).$$
  • During the M-step, updating the centroids (15b) can be performed with no more than $\mathcal{O}\left( N^2 \cdot n \right)$ long computations since $\mathbf{D}_k$ has only n components different from zero (the diagonal ones); the least-squares solution of (15c) is bounded by $\mathcal{O}\left( \theta^2 \cdot n \right)$ calculations since there are n model components; and the cost of (15e) is bounded by $\mathcal{O}\left( n \cdot \theta^2 \right)$ since the multiplication of coefficients and model components is constrained to the neighbourhood of each model component. The computational effort of this step is then estimated as follows:
    $$\mathcal{O}\left( K \cdot \left[ \theta^2 \cdot n + N^2 \cdot n \right] \right).$$
  • During the sampling procedure, the gradient (18b) can be efficiently computed as follows:
    $$\nabla \widehat{\mathcal{J}}_k(x) = \widehat{\mathbf{B}}^{-1}_k \cdot \underbrace{\left( x - \overline{x}^b_k \right)}_{g_k} - \mathbf{H}^T_{x^{(u)}} \cdot \mathbf{R}^{-1} \cdot \underbrace{\left[ y - \mathcal{G}_u(x) \right]}_{f_k} = \mathbf{L}^T_k \cdot \mathbf{D}_k \cdot \underbrace{\mathbf{L}_k \cdot g_k}_{\widehat{g}_k} - \mathbf{H}^T_{x^{(u)}} \cdot \underbrace{\mathbf{R}^{-1} \cdot f_k}_{\widehat{f}_k} = \mathbf{L}^T_k \cdot \underbrace{\mathbf{D}_k \cdot \widehat{g}_k}_{\widetilde{g}_k} - \underbrace{\mathbf{H}^T_{x^{(u)}} \cdot \widehat{f}_k}_{\widetilde{f}_k} = \underbrace{\mathbf{L}^T_k \cdot \widetilde{g}_k}_{\overline{g}_k} - \widetilde{f}_k.$$
    Given the special structure of $\mathbf{L}_k$, $\widehat{g}_k \in \mathbb{R}^{n \times 1}$ and $\overline{g}_k \in \mathbb{R}^{n \times 1}$ can be computed with no more than $\mathcal{O}\left( \theta^2 \cdot n \right)$ long computations, and the computation of $\widetilde{g}_k \in \mathbb{R}^{n \times 1}$ is bounded by $\mathcal{O}(n)$ since $\mathbf{D}_k$ is diagonal. Thus, since this sampling method is performed v times, the computational effort of the sampling procedure reads
    $$\mathcal{O}\left( v \cdot \left[ n + \theta^2 \cdot n \right] \right).$$
  • The posterior ensemble can be built (Section 3.3 [19]) with no more than
    $$\mathcal{O}\left( n \cdot \theta^2 + m \cdot N \right)$$
    long computations.
Assuming that the number of clusters and the number of iterations in the sampling process are much lower than the model dimension, based on Equation (22) the computational effort of the GMM-EnKF-MCMC reads,
$$\mathcal{O}\left( n \cdot \theta^2 + m \cdot N \right),$$
which is linear with regard to the model resolution n.
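The linear dependence on n hinges on the factors L_k being sparse (at most θ non-zero entries per row). The following small sketch (illustrative; the banded factor below is an assumption, not output of the filter) evaluates the E-step quadratic form through the factors with SciPy sparse matrices, never forming the precision matrix explicitly.

import numpy as np
import scipy.sparse as sp

def quadratic_form(L, d, s):
    """Evaluate 0.5 * ||s||^2 in the B^{-1} = L^T diag(d) L norm via the factors."""
    s_hat = L @ s                       # sparse matrix-vector product: O(theta * n)
    s_tilde = np.sqrt(d) * s_hat        # diagonal scaling: O(n)
    return 0.5 * np.dot(s_tilde, s_tilde)

# Banded (sparse) unit lower-triangular factor for n = 10000, bandwidth theta = 3.
n, theta = 10000, 3
rng = np.random.default_rng(0)
diagonals = [np.ones(n)] + [0.1 * rng.normal(size=n - k) for k in range(1, theta + 1)]
offsets = [0] + [-k for k in range(1, theta + 1)]
L = sp.diags(diagonals, offsets=offsets, format='csr')
d = rng.uniform(0.5, 2.0, size=n)
s = rng.normal(size=n)
val = quadratic_form(L, d, s)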

3.5. Comparison of GM-EnKF-MCMC with Other Sampling Methods

In this section, we briefly compare the GMM-EnKF-MCMC method with well-known filters from the literature: the Cluster Sampling Filter (CSF) [21], the Cluster Monte Carlo Implementation (CMCI) [46], and the Cluster Ensemble Kalman Filter (CEnKF) [29].
The CSF exploits the evolution of a system under Hamiltonian dynamics; for instance, based on Newton's laws, it is possible to describe the dynamics of particles. Each particle is fully described by two components: the position and the velocity coordinates, which are associated with the model state $x \in \mathbb{R}^{n \times 1}$ and the momentum $p \in \mathbb{R}^{n \times 1}$ (an auxiliary variable), respectively. The performance of the CSF relies on the proper choice of a mass matrix $\mathbf{M} \in \mathbb{R}^{n \times n}$ in order to describe the probability distribution of $p$ during the sampling process. Besides, numerical integrators which preserve Liouville's Theorem are a must in order to obtain a proposal state $(p^*, x^*)$; some of those are the Verlet integrator [47] and the leap-frog scheme [48]. A main difference between the GMM-EnKF-MCMC and the CSF is that no auxiliary n-dimensional vectors are required in our method during the sampling process. This can be convenient, for instance, under current operational DA systems. Besides, during the sampling process, the only parameter to be tuned in the GMM-EnKF-MCMC is $\beta$ in (19), while in the CSF method, as we mentioned before, proper choices of $\mathbf{M}$ are required in order to speed up the convergence of the MCMC towards high-probability zones of the posterior. Moreover, extra parameters may need to be tuned depending on the numerical integrator chosen to propose states during the sampling process.
In the CMCI, the background error distribution is approximated by a sum of Gaussian kernels. These are selected using the method of Fukunaga [49] to form a continuous approximation to the random ensemble. By assuming linear observation operators, the posterior ensemble is Gaussian as well. Posterior members are drawn by sampling each of the components/kernels based on a set of calculated weights which account for the observation. As pointed out by the authors, the CMCI is a very promising filter, but additional work is needed in order to extend its capabilities to more realistic scenarios. Moreover, this filter has been developed under Gaussian assumptions on the observations; such an assumption is not required in the GMM-EnKF-MCMC.
The CEnKF is developed by assuming linear observation operators. Prior error distributions are described by GMMs, and their components are fitted by using the EM method. This filter can fail to obtain reasonable estimates of posterior ensembles when non-linear observation operators are present during the assimilation of observations, as is typical in practice.

4. Experimental Settings

In this section, numerical tests are performed to assess the accuracy of the proposed filter. We use the Lorenz-96 model [50] as our surrogate model. The Lorenz-96 model is described by the following set of ordinary differential equations [51]:
$$\frac{dx_j}{dt} = \begin{cases} \left( x_2 - x_{n-1} \right) \cdot x_n - x_1 + F, & \text{for } j = 1, \\ \left( x_{j+1} - x_{j-2} \right) \cdot x_{j-1} - x_j + F, & \text{for } 2 \le j \le n-1, \\ \left( x_1 - x_{n-2} \right) \cdot x_{n-1} - x_n + F, & \text{for } j = n, \end{cases}$$
where F is an external forcing and n = 40 is the number of model components. Periodic boundary conditions are assumed. When F = 8, the model exhibits chaotic behavior, which makes it a relevant surrogate problem for atmospheric dynamics [52,53]. One time unit in the Lorenz-96 model represents 7 days in the real atmosphere.
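The sketch below (illustrative; the integration step size is an assumption, not a value stated in the paper) implements the Lorenz-96 right-hand side above together with a fourth-order Runge-Kutta integrator, as used to spin up states and ensemble members in the experiments.

import numpy as np

def lorenz96(x, F=8.0):
    """Right-hand side of the Lorenz-96 model with periodic boundary conditions."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F=8.0):
    """One fourth-order Runge-Kutta step of length dt."""
    k1 = lorenz96(x, F)
    k2 = lorenz96(x + 0.5 * dt * k1, F)
    k3 = lorenz96(x + 0.5 * dt * k2, F)
    k4 = lorenz96(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(x, t_units, dt=0.01, F=8.0):
    """Propagate a state for t_units time units with a fixed step dt (an assumption)."""
    for _ in range(int(round(t_units / dt))):
        x = rk4_step(x, dt, F)
    return x

# Spin-up of a random initial state for 20 time units (n = 40 components).
rng = np.random.default_rng(0)
x_spun_up = integrate(rng.normal(size=40), t_units=20.0)

The experimental settings are described below: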
  • Starting with an initial random solution, a 4th order Runge-Kutta method is employed in order to integrate it over a long time period, from which the initial condition $x^*_{-2} \in \mathbb{R}^{n \times 1}$ is obtained.
  • A perturbed background solution $\widetilde{x}^b_{-2}$ is formed at time $t_{-2}$ by drawing a sample from the Normal distribution,
    $$\widetilde{x}^b_{-2} \sim \mathcal{N}\left( x^*_{-2},\, 0.05^2 \cdot \mathbf{I} \right).$$
    This solution is then integrated for 10 time units (equivalent to 70 days) in order to obtain a background solution $x^b_{-1}$ consistent with the dynamics of the numerical model.
  • An initial perturbed ensemble is built about the background state by taking samples from the distribution,
    $$\widetilde{x}^{b[\widehat{e}]}_{-1} \sim \mathcal{N}\left( x^b_{-1},\, 0.05^2 \cdot \mathbf{I} \right), \quad \text{for } 1 \le \widehat{e} \le \widehat{N}.$$
    In order to make them consistent with the model dynamics, the ensemble members are propagated for 10 time units, from which the initial pool $\widehat{\mathbf{X}}^b_0$ of $\widehat{N} = 10^4$ members is obtained. The actual solution is integrated over 20 more time units in order to place it at the beginning of the assimilation window.
  • Two assimilation windows are proposed for the tests; both of them consist of M = 15 observations. In the first assimilation window, observations are taken every 16 h (a time step of 0.1 time units), while in the second one, observations are available every 50 h (a time step of 0.3 time units). We denote by $\delta t \in \{16, 50\}$ the elapsed time (in hours) between two consecutive observations.
  • The observational errors are described by the probability distribution,
    $$y_\ell \sim \mathcal{N}\left( \mathcal{H}_\ell\left( x^*_\ell \right),\, \epsilon_o^2 \cdot \mathbf{I} \right), \quad \text{for } 1 \le \ell \le M,$$
    where the standard deviation of the observational errors is $\epsilon_o = 10^{-2}$, and $\ell$ should be interpreted as a time index. Random observation networks are formed at the different assimilation cycles. The spacing between observations depends on the step size; for instance, in the first window observations are available every 0.1 time units (16 h), while in the second one observations are taken every 0.3 time units (50 h).
  • We consider the non-linear observation operator [32]:
    $$\left\{ \mathcal{H}(x) \right\}_j = \frac{x_j}{2} \cdot \left[ \left( \frac{x_j}{2} \right)^{\gamma-1} + 1 \right],$$
    where j denotes the j-th observed component from the model state, and $\gamma \in \{1, 3, 5\}$.
  • We consider two percentages s of observed components from the model state: $s \in \{70\%, 100\%\}$.
  • The radius of influence is set to r = 1 while the inflation factor is set to 1.02 (a typical value).
  • We propose two ensemble sizes for the benchmark: $N \in \{20, 80\}$. These members are randomly chosen from the pool $\widehat{\mathbf{X}}^b_0$ for the different experiments in order to form the initial ensemble $\mathbf{X}^b_0$ for the assimilation window. Evidently, $\mathbf{X}^b_0 \subset \widehat{\mathbf{X}}^b_0$.
  • The $L_2$ norm of the error is utilized as a measure of accuracy at assimilation step $\ell$,
    $$\lambda_\ell = \sqrt{ \left( x^*_\ell - x^a_\ell \right)^T \cdot \left( x^*_\ell - x^a_\ell \right) },$$
    where $x^*_\ell$ and $x^a_\ell$ are the reference and the analysis solutions at step $\ell$, respectively. The analysis state is obtained as a weighted combination of the posterior centroids via the likelihood ratios (21b), in lieu of the posterior mean.
  • The Root-Mean-Square-Error (RMSE) over a given assimilation window is used as an overall measure of performance,
    $$\lambda = \sqrt{ \frac{1}{M} \cdot \sum_{\ell=1}^{M} \lambda_\ell^2 }.$$
    A small computational sketch of the observation operator and of these error metrics is given after this list.
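The following sketch (illustrative only; the random observation network and toy vectors are assumptions) gathers the non-linear observation operator and the error metrics defined above.

import numpy as np

def obs_operator(x, gamma, obs_idx):
    """Non-linear observation operator applied to the observed components obs_idx."""
    xo = x[obs_idx]
    return 0.5 * xo * ((0.5 * xo) ** (gamma - 1) + 1.0)

def step_error(x_true, x_analysis):
    """L2 error norm at a single assimilation step."""
    d = x_true - x_analysis
    return np.sqrt(d @ d)

def window_rmse(step_errors):
    """Root-mean-square of the per-step errors over an assimilation window."""
    step_errors = np.asarray(step_errors)
    return np.sqrt(np.mean(step_errors ** 2))

# Toy usage: gamma = 3, 70% of the n = 40 components observed at random.
rng = np.random.default_rng(0)
n = 40
obs_idx = np.sort(rng.choice(n, size=int(0.7 * n), replace=False))
x_true = rng.normal(size=n)
y = obs_operator(x_true, gamma=3, obs_idx=obs_idx) + 1e-2 * rng.normal(size=len(obs_idx))
lam = step_error(x_true, x_true + 0.01 * rng.normal(size=n))
rmse = window_rmse([lam] * 15)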
Results for $\delta t = 16$ h, different ensemble sizes, and different values of $\gamma$ are shown in Figure 1 in terms of the $L_2$ error norms $\lambda_\ell$ per assimilation step (25). Notably, the performance of the filter improves whenever prior errors are described by a GMM ($K > 1$). This can be expected owing to the non-linear dynamics of the Lorenz-96 model. In addition, the performance of the GMM-EnKF-MCMC is not impacted by the degree $\gamma$ of the observation operator. Both features make the proposed formulation attractive for DA systems where non-linear observation operators are the bridge for digesting observations and, besides, non-linear dynamics are encapsulated in the numerical model. Moreover, for $K > 1$, posterior errors are smaller than the prior and observational errors; this, of course, must be expected for full observational networks. In some cases, filter divergence is possible for $K = 1$ (a single-mode prior distribution). Assuming that spurious correlations are not impacting the analysis corrections, in this set of experiments, filter divergence can be seen as a consequence of one of two things: prior errors are not well fitted by a Gaussian distribution (the actual error distribution is multimodal) or inflation factors are not properly tuned. Regardless of the main cause, filter convergence is evident when a GMM describes the background error distribution. In Figure 2, results for $\delta t = 50$ h, different ensemble sizes, and different values of $\gamma$ are shown as well. Under this configuration, the accuracy of the GMM-EnKF-MCMC is slightly impacted by increments in $\gamma$. Filter convergence is rapidly achieved when the number of prior modes is larger than one. Despite the fact that filter divergence can still occur for $K = 1$, interestingly, the proposed method provides a reasonable estimation of the actual state of the system. In the linear case, this is not surprising: EnKF formulations are widely utilized in practice wherein Gaussian assumptions are commonly broken during assimilation steps. For non-linear observation operators ($\gamma > 1$) and Gaussian assumptions on the prior ($K = 1$), the accuracy of the filter relies on the sampling procedure. As can be seen, under such assumptions, the proposed method is capable of obtaining good estimates of the posterior error modes.
RMSE values $\lambda$ (26) in the log-scale are shown in Table 1 and Table 2 for N = 20 and N = 80, respectively. We group RMSE values by the elapsed time between observations $\delta t$ and the number of observed components s. Notice that the performance of the filter can be improved when the number of prior modes is larger than one. This aligns with the highly non-linear (and chaotic) dynamics exhibited by the Lorenz-96 model, which make a single-mode distribution insufficient to encapsulate all prior information. However, the GMM-EnKF-MCMC can be negatively impacted when a large number of mixture components is utilized and, in some cases, overfitting is possible, for instance, in Table 1 for $\gamma = 5$ and N = 20.
Yet another interesting analysis is the behaviour of the sampling process. In Figure 3, we show a two-dimensional projection of the sampling steps for some particular choices of the parameters $\gamma$, N, s, and K. We use the two leading components of the space generated by the accepted samples across iterations. Note that, in all cases, the sampling method obtains good estimates of the actual state of the system $x^*$; even more, assuming N = K (which is equivalent to assuming Gaussian errors for each ensemble member), the sampling steps converge to $x^*$.

5. Conclusions

In this paper, we propose an EnKF implementation based on a modified Cholesky decomposition and a Markov-Chain-Monte-Carlo (MCMC) method, the GMM-EnKF-MCMC. During the assimilation of observations, the method proceeds as follows: prior error distributions are fitted by using Gaussian Mixture Models, whose components are estimated making use of the Expectation Maximization algorithm. Based on MCMC, posterior components are then individually estimated; this is done by sampling states along an approximated steepest descent direction of the well-known Three-Dimensional Variational (3D-Var) cost function. Experimental tests are performed on the Lorenz-96 model. The results reveal that the proposed method is able to handle non-linearities produced by the model dynamics as well as those generated by the non-linear observation operator and, even more, in a root-mean-square-error sense, the accuracy of the filter is not highly impacted as the degree of the observation operator is increased.

Acknowledgments

This work was supported by the Applied Math and Computer Science Laboratory at Universidad del Norte, Barranquilla, COL.

Author Contributions

Elias D. Nino-Ruiz and Haiyan Cheng conceived and designed the experiments; Elias D. Nino-Ruiz and Rolando Beltran performed the experiments; Elias D. Nino-Ruiz and Haiyan Cheng analyzed the data; Rolando Beltran contributed analysis tools; Elias D. Nino-Ruiz and Haiyan Cheng wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hoar, T.; Anderson, J.; Collins, N.; Kershaw, H.; Hendricks, J.; Raeder, K.; Mizzi, A.; Barré, J.; Gaubert, B.; Madaus, L.; et al. DART: A Community Facility Providing State-Of-The-Art, Efficient Ensemble Data Assimilation for Large (Coupled) Geophysical Models; AGU: Washington, DC, USA, 2016. [Google Scholar]
  2. Clayton, A.; Lorenc, A.C.; Barker, D.M. Operational implementation of a hybrid ensemble/4D-Var global data assimilation system at the Met Office. Q. J. R. Meteorol. Soc. 2013, 139, 1445–1461. [Google Scholar] [CrossRef]
  3. Fairbairn, D.; Pring, S.; Lorenc, A.; Roulstone, I. A comparison of 4DVar with ensemble data assimilation methods. Q. J. R. Meteorol. Soc. 2014, 140, 281–294. [Google Scholar] [CrossRef]
  4. Reynolds, D. Gaussian mixture models. Encycl. Biom. 2015, 827–832. [Google Scholar] [CrossRef]
  5. Nguyen, T.M.; Wu, Q.J.; Zhang, H. Bounded generalized Gaussian mixture model. Pattern Recogn. 2014, 47, 3132–3142. [Google Scholar] [CrossRef]
  6. Witten, I.H.; Frank, E.; Hall, M.A.; Pal, C.J. Data Mining: Practical Machine Learning Tools and Techniques; Morgan Kaufmann: Burlington, MA, USA, 2016. [Google Scholar]
  7. Mandel, M.I.; Weiss, R.J.; Ellis, D.P. Model-based expectation-maximization source separation and localization. IEEE Trans. Audio Speech Lang. Proc. 2010, 18, 382–394. [Google Scholar] [CrossRef]
  8. Vila, J.P.; Schniter, P. Expectation-maximization Gaussian-mixture approximate message passing. IEEE Trans. Signal Process. 2013, 61, 4658–4672. [Google Scholar] [CrossRef]
  9. Dovera, L.; Della Rossa, E. Multimodal ensemble Kalman filtering using Gaussian mixture models. Comput. Geosci. 2011, 15, 307–323. [Google Scholar] [CrossRef]
  10. Evensen, G. Data Assimilation: The Ensemble Kalman Filter; Springer: Secaucus, NJ, USA, 2006. [Google Scholar]
  11. Evensen, G. The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn. 2003, 53, 343–367. [Google Scholar] [CrossRef]
  12. Greybush, S.J.; Kalnay, E.; Miyoshi, T.; Ide, K.; Hunt, B.R. Balance and ensemble Kalman filter localization techniques. Mon. Weather Rev. 2011, 139, 511–522. [Google Scholar] [CrossRef]
  13. Buehner, M. Evaluation of a Spatial/Spectral Covariance Localization Approach for Atmospheric Data Assimilation. Mon. Weather Rev. 2011, 140, 617–636. [Google Scholar] [CrossRef]
  14. Anderson, J.L. Localization and Sampling Error Correction in Ensemble Kalman Filter Data Assimilation. Mon. Weather Rev. 2012, 140, 2359–2371. [Google Scholar] [CrossRef]
  15. Nino-Ruiz, E.D.; Sandu, A. Efficient parallel implementation of DDDAS inference using an ensemble Kalman filter with shrinkage covariance matrix estimation. Clust. Comput. 2017, 1–11. [Google Scholar] [CrossRef]
  16. Nino-Ruiz, E.D.; Sandu, A.; Deng, X. A parallel implementation of the ensemble Kalman filter based on modified Cholesky decomposition. J. Comput. Sci. 2017, in press. [Google Scholar] [CrossRef]
  17. Bickel, P.J.; Levina, E. Regularized estimation of large covariance matrices. Ann. Statist. 2008, 36, 199–227. [Google Scholar] [CrossRef]
  18. Nino-Ruiz, E.D.; Mancilla, A.; Calabria, J.C. A Posterior Ensemble Kalman Filter Based on A Modified Cholesky Decomposition. Procedia Comput. Sci. 2017, 108, 2049–2058. [Google Scholar] [CrossRef]
  19. Nino-Ruiz, E.D. A Matrix-Free Posterior Ensemble Kalman Filter Implementation Based on a Modified Cholesky Decomposition. Atmosphere 2017, 8, 125. [Google Scholar] [CrossRef]
  20. Attia, A.; Sandu, A. A hybrid Monte Carlo sampling filter for non-gaussian data assimilation. AIMS Geosci. 2015, 1, 41–78. [Google Scholar] [CrossRef]
  21. Attia, A.; Moosavi, A.; Sandu, A. Cluster Sampling Filters for Non-Gaussian Data Assimilation. arXiv, 2016; arXiv:1607.03592. [Google Scholar]
  22. Kotecha, J.H.; Djuric, P.M. Gaussian sum particle filtering. IEEE Trans. Signal Process. 2003, 51, 2602–2612. [Google Scholar] [CrossRef]
  23. Rings, J.; Vrugt, J.A.; Schoups, G.; Huisman, J.A.; Vereecken, H. Bayesian model averaging using particle filtering and Gaussian mixture modeling: Theory, concepts, and simulation experiments. Water Resour. Res. 2012, 48. [Google Scholar] [CrossRef]
  24. Moon, T.K. The expectation-maximization algorithm. IEEE Signal Process. Mag. 1996, 13, 47–60. [Google Scholar] [CrossRef]
  25. Alspach, D.; Sorenson, H. Nonlinear Bayesian estimation using Gaussian sum approximations. IEEE Trans. Autom. Control 1972, 17, 439–448. [Google Scholar] [CrossRef]
  26. Frei, M.; Künsch, H.R. Mixture ensemble Kalman filters. Comput. Stat. Data Anal. 2013, 58, 127–138. [Google Scholar] [CrossRef]
  27. Tagade, P.; Seybold, H.; Ravela, S. Mixture Ensembles for Data Assimilation in Dynamic Data-driven Environmental Systems1. Procedia Comput. Sci. 2014, 29, 1266–1276. [Google Scholar] [CrossRef]
  28. Sondergaard, T.; Lermusiaux, P.F. Data assimilation with Gaussian mixture models using the dynamically orthogonal field equations. Part I: Theory and scheme. Mon. Weather Rev. 2013, 141, 1737–1760. [Google Scholar] [CrossRef]
  29. Smith, K.W. Cluster ensemble Kalman filter. Tellus A 2007, 59, 749–757. [Google Scholar] [CrossRef]
  30. Hansen, P.R. A Winner’s Curse for Econometric Models: On the Joint Distribution of In-Sample Fit and Out-Of-Sample Fit and Its Implications for Model Selection; Research Paper; Stanford University: Stanford, CA, USA, 2010; pp. 1–39. [Google Scholar]
  31. Vrieze, S.I. Model selection and psychological theory: A discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychol. Methods 2012, 17, 228. [Google Scholar] [CrossRef] [PubMed]
  32. Van Leeuwen, P.J.; Cheng, Y.; Reich, S. Nonlinear Data Assimilation; Springer: Cham, Switzerland, 2015; Volume 2. [Google Scholar]
  33. Bannister, R. A review of operational methods of variational and ensemble-variational data assimilation. Q. J. R. Meteorol. Soc. 2017, 143, 607–633. [Google Scholar] [CrossRef]
  34. Snyder, C.; Bengtsson, T.; Bickel, P.; Anderson, J. Obstacles to high-dimensional particle filtering. Mon. Weather Rev. 2008, 136, 4629–4640. [Google Scholar] [CrossRef]
  35. Rebeschini, P.; Van Handel, R. Can local particle filters beat the curse of dimensionality? Ann. Appl. Probab. 2015, 25, 2809–2866. [Google Scholar] [CrossRef]
  36. Moré, J.J.; Sorensen, D.C. Computing a trust region step. SIAM J. Sci. Stat. Comput. 1983, 4, 553–572. [Google Scholar] [CrossRef]
  37. Conn, A.R.; Gould, N.I.; Toint, P.L. Trust Region Methods; SIAM: Philadelphia, PA, USA, 2000; Volume 1. [Google Scholar]
  38. Nino-Ruiz, E.D. Implicit Surrogate Models for Trust Region Based Methods. J. Comput. Sci. 2018, in press. [Google Scholar] [CrossRef]
  39. Wächter, A.; Biegler, L.T. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 2006, 106, 25–57. [Google Scholar] [CrossRef]
  40. Grippo, L.; Lampariello, F.; Lucidi, S. A nonmonotone line search technique for Newton’s method. SIAM J. Numer. Anal. 1986, 23, 707–716. [Google Scholar] [CrossRef]
  41. Kramer, O.; Ciaurri, D.E.; Koziel, S. Derivative-free optimization. In Computational Optimization, Methods and Algorithms; Springer: Berlin/Heidelberg, Germany, 2011; pp. 61–83. [Google Scholar]
  42. Conn, A.R.; Scheinberg, K.; Vicente, L.N. Introduction to Derivative-Free Optimization; SIAM: Philadelphia, PA, USA, 2009; Volume 8. [Google Scholar]
  43. Rios, L.M.; Sahinidis, N.V. Derivative-free optimization: A review of algorithms and comparison of software implementations. J. Glob. Optim. 2013, 56, 1247–1293. [Google Scholar] [CrossRef]
  44. Jang, J.S.R. Derivative-Free Optimization. In Neuro-Fuzzy Soft Computing; Prentice Hall: Englewood Cliffs, NJ, USA, 1997; pp. 173–196. [Google Scholar]
  45. Nino-Ruiz, E.D.; Ardila, C.; Capacho, R. Local search methods for the solution of implicit inverse problems. In Soft Computing; Springer: Berlin/Heidelberg, Germany, 2017; pp. 1–14. [Google Scholar]
  46. Anderson, J.L.; Anderson, S.L. A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Weather Rev. 1999, 127, 2741–2758. [Google Scholar] [CrossRef]
  47. Grubmüller, H.; Heller, H.; Windemuth, A.; Schulten, K. Generalized Verlet algorithm for efficient molecular dynamics simulations with long-range interactions. Mol. Simul. 1991, 6, 121–142. [Google Scholar] [CrossRef]
  48. Van Gunsteren, W.F.; Berendsen, H. A leap-frog algorithm for stochastic dynamics. Mol. Simul. 1988, 1, 173–185. [Google Scholar] [CrossRef]
  49. Koontz, W.L.; Fukunaga, K. A nonparametric valley-seeking technique for cluster analysis. IEEE Trans. Comput. 1972, 100, 171–178. [Google Scholar] [CrossRef]
  50. Lorenz, E.N. Designing Chaotic Models. J. Atmos. Sci. 2005, 62, 1574–1587. [Google Scholar] [CrossRef]
  51. Fertig, E.J.; Harlim, J.; Hunt, B.R. A comparative study of 4D-VAR and a 4D ensemble Kalman filter: Perfect model simulations with Lorenz-96. Tellus A 2007, 59, 96–100. [Google Scholar] [CrossRef]
  52. Karimi, A.; Paul, M.R. Extensive chaos in the Lorenz-96 model. Chaos 2010, 20, 043105. [Google Scholar] [CrossRef] [PubMed]
  53. Gottwald, G.A.; Melbourne, I. Testing for chaos in deterministic systems with noise. Physica D 2005, 212, 100–110. [Google Scholar] [CrossRef]
Figure 1. Experimental results with the Lorenz-96 model (29). The time evolution of the mean analysis errors and their standard deviation across the 10 different experimental configurations is reported for $\gamma \in \{1, 3, 5\}$ and $\delta t = 16$ h.
Figure 2. Experimental results with the Lorenz-96 model (29). The time evolution of the mean analysis errors and their standard deviation across the 10 different experimental configurations is reported for $\gamma \in \{1, 3, 5\}$ and $\delta t = 50$ h.
Figure 3. Two-dimensional projections of the steps performed by the sampling method. Blue dots denote prior ensemble members $x^{b[e]}$, large blue dots stand for the centroids $\overline{x}^b_k$, dashed black lines together with black dots denote accepted states $x^{(u)}$, and the red dot stands for the actual state of the system $x^*$. Even for K = N, the method is able to obtain reasonable estimates of the actual state of the system. The variance explained in such plots is larger than 90%.
Table 1. Experimental results with the Lorenz-96 model. Mean RMSE values are reported across the 10 different experimental configurations for different settings of the parameters $\delta t$, s, K, and $\gamma$. The number of ensemble members is N = 20.
(Table layout: rows s ∈ {70%, 100%} subdivided by δt ∈ {16 h, 50 h}; columns γ ∈ {1, 3, 5}; each cell contains the corresponding RMSE plot, not reproduced here.)
Table 2. Experimental results with the Lorenz-96 model. Mean RMSE values are reported across the 10 different experimental configurations for different settings of the parameters $\delta t$, s, K, and $\gamma$. The number of ensemble members is N = 80.
(Table layout: rows s ∈ {70%, 100%} subdivided by δt ∈ {16 h, 50 h}; columns γ ∈ {1, 3, 5}; each cell contains the corresponding RMSE plot, not reproduced here.)
