Article

PyDA: A Hands-On Introduction to Dynamical Data Assimilation with Python

School of Mechanical and Aerospace Engineering, Oklahoma State University, Stillwater, OK 74078, USA
* Author to whom correspondence should be addressed.
Fluids 2020, 5(4), 225; https://doi.org/10.3390/fluids5040225
Submission received: 1 November 2020 / Revised: 25 November 2020 / Accepted: 26 November 2020 / Published: 29 November 2020
(This article belongs to the Special Issue Teaching and Learning of Fluid Mechanics, Volume II)

Abstract

Dynamic data assimilation offers a suite of algorithms that merge measurement data with numerical simulations to predict accurate state trajectories. Meteorological centers rely heavily on data assimilation to achieve trustworthy weather forecasts. With advances in measurement systems, as well as the reduction in sensor prices, data assimilation (DA) techniques are applicable to various fields other than meteorology. However, beginners usually struggle to digest the core ideas from the available resources, which are often sophisticated and require a steep learning curve. In this tutorial, we lay out the mathematical principles behind DA with easy-to-follow Python module implementations so that newcomers can quickly grasp the essence of DA algorithms. We explore a series of common variational and sequential techniques, and highlight major differences and potential extensions. We demonstrate the presented approaches using an array of fluid flow applications with varying levels of complexity.

1. Introduction

Data assimilation (DA) refers to a class of techniques that lie at the interface between computational sciences and real measurements, and aim at fusing information from both sides to provide better estimates of the system’s state. One of the most mature applications that significantly utilizes DA is the weather forecast that we rely on in our daily life. In order to predict the weather (or the state of any system) in the future, a model has to be solved, most often by numerical simulations. However, a few problems arise at this point, and we refer to only a few of them here. First, for accurate predictions, these simulations need to be initiated from the true initial condition, which is never known exactly. For large-scale systems, it is almost impossible to experimentally measure the full state of the system at a given time. For example, imagine simulating the atmospheric or oceanic flow; then you need to measure the velocity, temperature, density, etc. at every location corresponding to your numerical grid! Even in the hypothetical case when this is possible, measurements are always contaminated by noise, reducing the fidelity of your estimation. Second, the mathematical model that completely describes all the underlying processes and dynamics of the system is either unknown or hard to deal with. Then, approximate and simplified models are adopted instead. Third, the computational resources always constrain the level of accuracy in the employed schemes and enforce numerical approximations. Luckily, DA appears at the intersection of all these efforts and introduces a variety of approaches to mitigate these problems, or at least reduce their effects. In particular, DA techniques combine possibly incomplete dynamical models, prior information about the system’s initial state and parameterization, and sparse and corrupted measurement data to yield an optimized trajectory that describes the system’s dynamics and evolution.
As indicated above, dynamical data assimilation techniques have a long history in computational meteorology and geophysical fluid dynamics [1,2,3]. Then comes the question: why do we write such an introductory tutorial about a historical topic? Before answering this question, we highlight a few points. DA borrows ideas from numerical modeling and analysis, linear algebra, optimization, and control. Although these topics are taught separately in almost every engineering discipline, their combination is rarely presented. We believe that incorporating a DA course in engineering curricula is important nowadays, as it provides a variety of global tools and ideas that can potentially be applied in many areas, not just meteorology. This was evident while administering a graduate class on “Data Assimilation in Science and Engineering” at Oklahoma State University, as students from different disciplines and backgrounds were astonished by the feasibility and utility of DA techniques for solving numerous inverse problems they were working on, unrelated to weather forecasting. Nonetheless, the availability of beginner-friendly resources has been the major gap that students suffered from. The majority of textbooks either derive DA algorithms from their very deep roots or survey their historical developments, without focusing on actual implementations. On the other hand, available packages are presented in a sophisticated way that optimizes data storage and handling, computational cost, and convergence. However, this level of sophistication requires a steep learning curve to understand the computational pipeline as well as the algorithmic steps, and many learners lose hope during this journey.
Therefore, the main objective of this tutorial paper is to familiarize beginning researchers and practitioners with basic DA ideas along with easy-to-follow pieces of code, so that they can feel the essence of DA and trigger the priceless “aha” moments. With this in mind, we choose Python as the coding language, being a popular, interpreted language that is easy to understand even with a minimal programming background, although it is not the most computationally favored language in high performance computing (HPC) environments. Moreover, whenever possible, we utilize built-in functions and libraries to minimize the user coding effort. In other words, the provided codes are presented for demonstrative purposes only, using an array of academic test problems, and significant modifications should be incorporated before dealing with complex applications. Meanwhile, this tutorial will give the reader a jump-start that hopefully shortens the learning curve of more advanced packages. A Python-based DA testing suite has also been designed to compare different methodologies [4].
Dynamical data assimilation techniques can be generally classified into variational DA and sequential DA. Variational data assimilation works by setting up an optimization problem defined by a cost functional along with constraints that collectively incorporate our knowledge about the system. The minimizer of the cost functional represents the DA estimate of the unknown system variables and/or parameters. On the other hand, in sequential methods (also known as statistical methods), the system state is evolved in time using background information until observations become available. At this instant, an update (correction) to the system variables and/or parameters is estimated, and the solver is re-initialized with this new updated information until new measurements are collected, and so on. We give an overview of both approaches as well as basic implementations. In particular, we briefly discuss the three dimensional variational data assimilation (3DVAR) [5,6], the four dimensional variational data assimilation (4DVAR) [7,8,9,10,11,12], and the forward sensitivity method (FSM) [13,14] as examples of variational approaches. Kalman filtering and its variants [15,16,17,18,19,20,21,22] are the most popular examples of sequential methods. We introduce the main ideas behind the standard Kalman filter and its extensions for nonlinear and high-dimensional problems. The famous Lorenz 63 model is utilized to illustrate the merit of all presented algorithms, being a simple low-order dynamical system that exhibits interesting dynamics. This is to help readers digest the different pieces of code and follow the computational pipeline. Then, the paper is concluded with a section that describes the deployment of selected DA approaches for dynamical systems with increasing levels of dimensionality and complexity. We highlight here that the primary purpose of this paper is to provide an introductory tutorial on data assimilation for educational purposes. All Python implementations of the presented algorithms as well as the test cases are made publicly accessible at our GitHub repository https://github.com/Shady-Ahmed/PyDA.

2. Preliminaries

2.1. Notation

Before we dive into the technical details of dynamical data assimilation approaches, we briefly present and describe our notation and assumptions. In general, we assume that all vector-valued functions or variables are written as column vectors. Unless stated otherwise, boldfaced lowercase letters are used to denote vectors and boldfaced uppercase letters are reserved for matrices. We suppose that the system state at any time $t$ is denoted as $\mathbf{u}(t) = [u_1(t), u_2(t), \ldots, u_n(t)]^T \in \mathbb{R}^n$, where $n$ is the state-space dimension. The dynamics of the system are governed by the following differential equation
$\dfrac{d\mathbf{u}}{dt} = f(\mathbf{u}; \boldsymbol{\theta}),$  (1)
where $f: \mathbb{R}^n \times \mathbb{R}^p \to \mathbb{R}^n$ encapsulates the model's dynamics, with $\boldsymbol{\theta} \in \mathbb{R}^p$ being the vector of model parameters and $p$ being the number of these parameters. With a time-integration scheme applied, the discrete-time model can be written as follows,
$\mathbf{u}(t_{k+1}) = M(\mathbf{u}(t_k); \boldsymbol{\theta}),$  (2)
where $M$ is the one-time-step transition map that evolves the state at time $t_k$ to time $t_{k+1} = t_k + \Delta t$, with $\Delta t$ being the time step length.
We denote the true value of the state variable as $\mathbf{u}_t$, which is assumed to be unknown, and a good approximation of it is sought. Our prior information about the state $\mathbf{u}$ is called the background, denoted with a subscript $b$ as $\mathbf{u}_b$. This represents our beginning knowledge, which might come from historical data, numerical simulations, or just an intelligent guess. The discrepancy between this background information and the true state is denoted as $\boldsymbol{\xi}_b = \mathbf{u}_t - \mathbf{u}_b$, resulting from an imperfect model, inaccurate model initialization, incorrect parameterization, numerical approximations, etc. From a probabilistic point of view, we suppose that the background error has a zero mean and a covariance matrix of $\mathbf{B}$. This can be represented as $\mathbb{E}[\boldsymbol{\xi}_b] = \mathbf{0}$ and $\mathbb{E}[\boldsymbol{\xi}_b \boldsymbol{\xi}_b^T] = \mathbf{B}$, where $\mathbf{B} \in \mathbb{R}^{n \times n}$ is a symmetric and positive-definite matrix and the superscript $T$ refers to the transpose operation. Moreover, we assume that the unknown true state has a multivariate Gaussian distribution with mean $\mathbf{u}_b$ and covariance matrix $\mathbf{B}$ (i.e., $\mathbf{u}_t \sim \mathcal{N}(\mathbf{u}_b, \mathbf{B})$).
We define the set of collected measurements at a specific time $t_k$ as $\mathbf{w}(t_k) \in \mathbb{R}^m$, where $m$ is the dimension of the observation space. We highlight that the observed quantity need not be the same as the state variable. For instance, if the state variable that we are trying to resolve is the sea surface temperature, we may have access only to radiance measurements from satellites. However, the two can be related to each other, through Planck's law for instance. Formally, we can relate the observables and the state variables as
$\mathbf{w}(t_k) = h(\mathbf{u}_t(t_k)) + \boldsymbol{\xi}_m,$  (3)
where $h: \mathbb{R}^n \to \mathbb{R}^m$ defines the mapping from state space to measurement space and $\boldsymbol{\xi}_m \in \mathbb{R}^m$ denotes the measurement noise. The mapping $h$ can refer to the sampling (and possibly interpolation) of state variables at the measurement locations, to relating different quantities of interest (e.g., relating sea surface temperature to emitted radiance), or both! The model map $M$ and the observation operator $h$ can be linear, nonlinear, or a combination of them. Similar to the background error, the observation noise $\boldsymbol{\xi}_m$ is assumed to possess a multivariate normal distribution, with zero mean and covariance matrix $\mathbf{R} \in \mathbb{R}^{m \times m}$, i.e., $\boldsymbol{\xi}_m \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$. An extra grounding assumption is that the measurement noise and the state variables (either true or background) are uncorrelated. Furthermore, all noises are assumed to be temporally uncorrelated (i.e., white noise). Even though we consider only Gaussian distributions for the background error and observation noise, we emphasize that there have been many studies dealing with non-Gaussian data assimilation [23,24,25,26].
The objective of data assimilation is to provide an algorithm that fuses our prior information $\mathbf{u}_b$ and measurement data $\mathbf{w}$ to yield a better approximation of the unknown true state. This better approximation is called the analysis, and is denoted as $\mathbf{u}_a$. The difference between this approximation and the true state is denoted as $\boldsymbol{\xi}_a = \mathbf{u}_t - \mathbf{u}_a$.

2.2. Twin Experiment Framework

In a realistic situation, the true state values are unknown and noisy measurements are collected by sensing devices. However, for testing ideas, the ground truth needs to be known beforehand so that the convergence and accuracy of the developed algorithm can be evaluated. In this sense, the concept of the twin experiment has become popular in data assimilation (and inverse problems, in general) studies. First, a prototypical test case (also called a toy problem!) is selected based on the similarities between its dynamics and real situations. Similar to your first “Hello World!” program, the Lorenz 63 and Lorenz 96 models are often used in numerical weather forecast investigations, the one-dimensional Burgers equation is explored in computational fluid dynamics developments, the two-dimensional Kraichnan turbulence and the three-dimensional Taylor–Green vortex are analyzed in turbulence studies, and so on. A reference true trajectory is computed by fixing all parameters and running the forward solver until some final time is reached. Synthetic measurements are then collected by sampling the true trajectory at some points in space and time. A mapping can be applied to the true state variables and arbitrary random noise is artificially added (e.g., white Gaussian noise). Finally, the data assimilation technique of interest is implemented starting from false values of the state variables or the model’s parameters along with the synthetic measurement data. The output trajectory of the algorithm is then compared against the reference solution, and the performance can be evaluated. It is always recommended that researchers get familiar with twin experiment frameworks as they provide well-structured and controlled environments for testing ideas. For instance, the influence of different measurement sparsity and/or noise levels can be cheaply assessed, without the need to locate or modify sensors.

3. Three Dimensional Variational Data Assimilation

The three dimensional variational data assimilation (3DVAR) framework can be derived from either an optimal control or a Bayesian analysis point of view. Interested readers are referred to other resources for the mathematical foundations (e.g., [27]). In order to compute a good approximation of the system state, the following cost functional can be defined,
$J(\mathbf{u}) = \frac{1}{2}\big(\mathbf{w} - h(\mathbf{u})\big)^T \mathbf{R}^{-1}\big(\mathbf{w} - h(\mathbf{u})\big) + \frac{1}{2}\big(\mathbf{u} - \mathbf{u}_b\big)^T \mathbf{B}^{-1}\big(\mathbf{u} - \mathbf{u}_b\big),$  (4)
where the first term penalizes the discrepancy between the actual measurement w and the state variable mapped into the observation space h ( u ) (also called the model predicted measurement). The second term aims at incorporating the prior information, weighted by the inverse of the covariance matrix to reflect our confidence in this background. We highlight that all terms in Equation (4) are evaluated at the same time, and thus the 3DVAR can be referred to as a stationary case.
The minimizer of J ( u ) (i.e., the analysis) can be obtained by setting the gradient of the cost functional to zero as follows,
$\nabla J(\mathbf{u}) = -D_h^T(\mathbf{u}_a)\, \mathbf{R}^{-1}\big(\mathbf{w} - h(\mathbf{u}_a)\big) + \mathbf{B}^{-1}\big(\mathbf{u}_a - \mathbf{u}_b\big) = \mathbf{0},$  (5)
where $D_h(\mathbf{u}) \in \mathbb{R}^{m \times n}$ is the Jacobian matrix of the operator $h(\mathbf{u})$. The difficulty of solving Equation (5) depends on the form of $h(\mathbf{u}_a)$, as it can be either linear or highly nonlinear.

3.1. Linear Case

For a linear observation operator (i.e., $h(\mathbf{u}) = \mathbf{H}\mathbf{u}$ and $D_h(\mathbf{u}) = \mathbf{H}$, where $\mathbf{H}$ is an $m \times n$ matrix), the evaluation of the analysis $\mathbf{u}_a$ in Equation (5) reduces to solving the following linear system of equations
$\big(\mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H}\big)\, \mathbf{u}_a = \mathbf{B}^{-1} \mathbf{u}_b + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{w}.$  (6)
We note that $(\mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H})$ on the left-hand side is an $n \times n$ matrix, and hence this is called the model-space approach to 3DVAR. Furthermore, a popular incremental form can be derived from Equation (6) by adding and subtracting $\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H}\, \mathbf{u}_b$ to/from the right-hand side and rearranging to get the following form,
$\mathbf{u}_a = \mathbf{u}_b + \big(\mathbf{B}^{-1} + \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H}\big)^{-1} \mathbf{H}^T \mathbf{R}^{-1}\big(\mathbf{w} - \mathbf{H}\mathbf{u}_b\big).$  (7)
Moreover, the Sherman–Morrison–Woodbury inversion formula can be used to derive an observation-space solution to the 3DVAR problem (for details, see [27], page 327) as follows,
$\mathbf{u}_a = \mathbf{u}_b + \mathbf{B}\mathbf{H}^T\big(\mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^T\big)^{-1}\big(\mathbf{w} - \mathbf{H}\mathbf{u}_b\big).$  (8)
Note that $(\mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^T)$ is an $m \times m$ matrix, compared to $(\mathbf{B}^{-1} + \mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})$ being an $n \times n$ matrix. Thus, Equation (7) or Equation (8) might be computationally favored depending on the values of $n$ and $m$. We also highlight that, in either case, matrix inversion is rarely (almost never) computed directly, and efficient linear system solvers should be utilized instead. An example of a Python function for the implementation of the 3DVAR algorithm with a linear operator is shown in Listing 1.
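As a concrete illustration, the following is a minimal sketch of such a function (the name lin_3dvar and its argument list are illustrative and need not match Listing 1 exactly); it applies the incremental form of Equation (7), solving the linear system with NumPy rather than forming the inverse explicitly.

import numpy as np

def lin_3dvar(ub, w, H, R, B):
    # 3DVAR analysis for a linear observation operator, incremental form of Equation (7):
    # ua = ub + (B^-1 + H^T R^-1 H)^-1 H^T R^-1 (w - H ub)
    Bi = np.linalg.inv(B)
    Ri = np.linalg.inv(R)
    A = Bi + H.T @ Ri @ H                 # n x n system matrix
    b = H.T @ Ri @ (w - H @ ub)           # weighted innovation
    return ub + np.linalg.solve(A, b)     # analysis state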

3.2. Nonlinear Case

On the other hand, if $h(\mathbf{u})$ is a nonlinear function, Equation (5) implies the solution of a system of nonlinear equations. Unlike linear systems, few algorithms are available to directly solve nonlinear systems, and their convergence and stability are usually questionable. Alternatively, we can use a Taylor series to expand $h(\mathbf{u})$ around an initial estimate of $\mathbf{u}_a$, denoted as $\mathbf{u}_c$, where $\mathbf{u}_a = \mathbf{u}_c + \Delta\mathbf{u}$. The first-order approximation of $h(\mathbf{u}_a)$ can be written as
$h(\mathbf{u}_a) \approx h(\mathbf{u}_c) + D_h(\mathbf{u}_c)\, \Delta\mathbf{u},$  (9)
and Equation (5) can be approximated as
$D_h^T(\mathbf{u}_c)\, \mathbf{R}^{-1}\big(\mathbf{w} - h(\mathbf{u}_c) - D_h(\mathbf{u}_c)\, \Delta\mathbf{u}\big) = \mathbf{B}^{-1}\big(\mathbf{u}_c + \Delta\mathbf{u} - \mathbf{u}_b\big).$  (10)
Thus, the correction to the initial guess of u a can be computed by solving the following system of linear equations
$\big(\mathbf{B}^{-1} + D_h^T(\mathbf{u}_c)\, \mathbf{R}^{-1} D_h(\mathbf{u}_c)\big)\, \Delta\mathbf{u} = \mathbf{B}^{-1}\big(\mathbf{u}_b - \mathbf{u}_c\big) + D_h^T(\mathbf{u}_c)\, \mathbf{R}^{-1}\big(\mathbf{w} - h(\mathbf{u}_c)\big),$  (11)
and a new guess of $\mathbf{u}_a$ is estimated as $\mathbf{u}_c + \Delta\mathbf{u}$, which is then plugged back into Equation (11), and the computations are repeated until convergence is reached. A Python implementation of 3DVAR with a nonlinear observation operator is presented in Listing 2. Although we only present the first-order approximation of $h(\mathbf{u})$, higher order expansions can be utilized for increased accuracy [27].
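A minimal sketch of such an iterative update is given below (function and argument names are illustrative, not necessarily those of Listing 2); it repeatedly solves Equation (11) for the correction and updates the current guess until the correction becomes negligible.

import numpy as np

def nonlin_3dvar(ub, w, h, Dh, R, B, max_iter=50, tol=1e-6):
    # Iterative 3DVAR for a nonlinear observation operator h(u), with Jacobian Dh(u).
    Bi = np.linalg.inv(B)
    Ri = np.linalg.inv(R)
    uc = ub.copy()                            # initial guess of the analysis
    for _ in range(max_iter):
        J = Dh(uc)
        A = Bi + J.T @ Ri @ J                 # left-hand side of Equation (11)
        b = Bi @ (ub - uc) + J.T @ Ri @ (w - h(uc))
        du = np.linalg.solve(A, b)            # correction to the current guess
        uc = uc + du
        if np.linalg.norm(du) < tol:          # stop when the correction is negligible
            break
    return uc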

3.3. Example: Lorenz 63 System

The Lorenz 63 equations have been utilized as a toy problem in data assimilation studies, capturing some of the interesting mechanisms of weather systems. The three-equation model can be written as
$\dfrac{dx}{dt} = \sigma(y - x), \qquad \dfrac{dy}{dt} = x(\rho - z) - y, \qquad \dfrac{dz}{dt} = xy - \beta z,$  (12)
where the values $\sigma = 10$, $\beta = 8/3$, and $\rho = 28$ are usually used to exhibit chaotic behavior. If we would like to express Equation (12) in the notation introduced in Section 2, we can write $\mathbf{u} = [x, y, z]^T$ with $n = 3$, and $\boldsymbol{\theta} = [\sigma, \beta, \rho]^T$ with $p = 3$. A Python function describing the dynamics of the Lorenz 63 system is given in Listing 3.
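A sketch of such a right-hand side function is shown below (the exact signature in Listing 3 may differ).

import numpy as np

def lorenz63(u, sigma=10.0, beta=8.0/3.0, rho=28.0):
    # Continuous-time right-hand side f(u; theta) of the Lorenz 63 system, Equation (12).
    x, y, z = u
    return np.array([sigma * (y - x),
                     x * (rho - z) - y,
                     x * y - beta * z])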
Equation (12) describes the continuous-time evolution of the Lorenz system. In order to obtain the discrete-time mapping $M(\cdot;\cdot)$, a temporal integration scheme has to be applied. In Listing 4, one-step time integration functions are provided in Python using the first-order Euler and the fourth-order Runge–Kutta schemes. Note that these functions require a right-hand side function as input; this is simply the continuous-time model $f(\mathbf{u})$ (e.g., Listing 3).
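The two one-step integrators can be sketched as follows (a sketch consistent with, though not necessarily identical to, Listing 4; the rhs argument is assumed to follow the signature of the lorenz63 sketch above).

def euler(rhs, u, dt, *args):
    # One step of the first-order (forward) Euler scheme: u_{k+1} = u_k + dt*f(u_k).
    return u + dt * rhs(u, *args)

def rk4(rhs, u, dt, *args):
    # One step of the classical fourth-order Runge-Kutta scheme.
    k1 = rhs(u, *args)
    k2 = rhs(u + 0.5 * dt * k1, *args)
    k3 = rhs(u + 0.5 * dt * k2, *args)
    k4 = rhs(u + dt * k3, *args)
    return u + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)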
For twin experiment testing, we suppose a true initial condition of $\mathbf{u}_t(0) = [1, 1, 1]^T$ and measurements are collected every 0.2 time units for a total time of 2. We suppose that we measure the full system state (i.e., $h(\mathbf{u}) = \mathbf{u}$, $m = 3$, and $\mathbf{H} = \mathbf{I}_3$, where $\mathbf{I}_3$ is the $3 \times 3$ identity matrix). Measurements are considered to be contaminated by white Gaussian noise with zero mean and a covariance matrix $\mathbf{R} = \mathrm{Diag}(\sigma_1^2, \sigma_2^2, \sigma_3^2)$. For simplicity, we let $\sigma_1 = \sigma_2 = \sigma_3 = 0.15$. For data assimilation testing, we assume that we begin with a perturbed initial condition of $\mathbf{u}(0) = [2, 3, 4]^T$. Then, background state values are computed at $t = 0.2$ by time integration of Equation (12) starting from this false initial condition. Observations at $t = 0.2$ are assimilated to provide the analysis at $t = 0.2$. After that, background state values are computed at $t = 0.4$ by time integration of Equation (12) starting from the analysis at $t = 0.2$, and so on. A sample implementation of the 3DVAR framework is presented in Listing 5, where a fixed background covariance matrix $\mathbf{B} = \mathrm{Diag}(0.01, 0.01, 0.01)$ is assumed. Solution trajectories are presented in Figure 1 for a total time of 10, where observations are only available up to $t = 2$.
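Putting the pieces together, a stripped-down twin-experiment driver might look like the following sketch; it reuses the hypothetical lorenz63, rk4, and lin_3dvar functions sketched earlier and is not a verbatim copy of Listing 5.

import numpy as np

dt, tmax, t_obs = 0.01, 2.0, 0.2             # model step, assimilation window, observation interval
nt, freq = int(tmax / dt), int(t_obs / dt)   # observations every `freq` model steps

H = np.eye(3)                                # full-state (linear) observation operator
R = (0.15 ** 2) * np.eye(3)                  # measurement noise covariance
B = 0.01 * np.eye(3)                         # fixed background covariance

# reference (true) trajectory and synthetic noisy observations
ut, obs = np.array([1.0, 1.0, 1.0]), {}
for k in range(1, nt + 1):
    ut = rk4(lorenz63, ut, dt)
    if k % freq == 0:
        obs[k] = H @ ut + np.random.normal(0.0, 0.15, 3)

# assimilation run starting from an erroneous initial condition
u = np.array([2.0, 3.0, 4.0])
for k in range(1, nt + 1):
    u = rk4(lorenz63, u, dt)                 # background forecast
    if k in obs:
        u = lin_3dvar(u, obs[k], H, R, B)    # analysis update at observation times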

4. Four Dimensional Variational Data Assimilation

We highlighted in Section 3 that the 3DVAR can be referred to as a stationary case, since the observations, background, and analysis all correspond to a fixed time instant. In other words, the optimization problem that minimizes Equation (4) takes place in the spatial state space only. As an extension, the four dimensional variational data assimilation (4DVAR) aims to solve the optimization in both space and time, providing a non-stationary framework. In particular, the model's dynamics are incorporated into the optimization problem to relate different points in time to each other. The cost functional for the 4DVAR can be written as follows,
$J(\mathbf{u}(t_0)) = \sum_{t_k \in T} \frac{1}{2}\big(\mathbf{w}(t_k) - h(\mathbf{u}(t_k))\big)^T \mathbf{R}^{-1}(t_k)\big(\mathbf{w}(t_k) - h(\mathbf{u}(t_k))\big),$  (13)
where $\mathbf{w}(t_k)$ is the measurement at time $t_k$ and $T$ defines the set of time instants where observations are available. Note that the argument of this cost functional is the initial condition $\mathbf{u}(t_0)$. In other words, the purpose of the 4DVAR algorithm is to evaluate an initial state estimate which, if evolved in time, would produce a trajectory that is as close to the collected measurements as possible (weighted by the inverse of the covariance matrix of the interfering noise). This is where the model's dynamics come into play, relating the initial condition to future predictions when measurements are accessible. In other words, the values of $\mathbf{u}(t_k)$ are constrained by the underlying model. Instead of presenting the linear and nonlinear mappings separately, we will focus on the general case of both a nonlinear model mapping and a nonlinear observation operator, where simplification to the linear cases should be straightforward.
In Equation (2), we introduced the one-step transition map and here, we can extend it to the k-step transition case by applying Equation (2) recursively as
$\mathbf{u}(t_k) = M^{(k)}(\mathbf{u}(t_0); \boldsymbol{\theta}) = M\big(M^{(k-1)}(\mathbf{u}(t_0); \boldsymbol{\theta}); \boldsymbol{\theta}\big),$  (14)
where $M^{(1)}(\mathbf{u}(t_0); \boldsymbol{\theta}) = M(\mathbf{u}(t_0); \boldsymbol{\theta})$. Now, we consider a base trajectory $\bar{\mathbf{u}}(t_k)$ for $k = 1, 2, \ldots$, generated from an initial condition $\bar{\mathbf{u}}(t_0)$. A perturbed trajectory ($\mathbf{u}(t_k)$ for $k = 1, 2, \ldots$) can be obtained by correcting the initial condition as $\mathbf{u}(t_0) = \bar{\mathbf{u}}(t_0) + \Delta\mathbf{u}_0$, and the difference between the perturbed and base trajectories can be written as
$\mathbf{u}(t_k) - \bar{\mathbf{u}}(t_k) = M^{(k)}(\bar{\mathbf{u}}(t_0) + \Delta\mathbf{u}_0; \boldsymbol{\theta}) - M^{(k)}(\bar{\mathbf{u}}(t_0); \boldsymbol{\theta}).$  (15)
A first-order Taylor expansion of $M(\bar{\mathbf{u}}(t_0) + \Delta\mathbf{u}_0; \boldsymbol{\theta})$ around $\bar{\mathbf{u}}(t_0)$ can be given as follows,
$M(\bar{\mathbf{u}}(t_0) + \Delta\mathbf{u}_0; \boldsymbol{\theta}) \approx M(\bar{\mathbf{u}}(t_0); \boldsymbol{\theta}) + D_M(\bar{\mathbf{u}}(t_0))\, \Delta\mathbf{u}_0,$  (16)
where $D_M(\mathbf{u}(t_k))$ is the Jacobian of the model $M(\mathbf{u}; \boldsymbol{\theta})$ evaluated at $\mathbf{u}(t_k)$, also known as the tangent linear operator. Note that $M(\bar{\mathbf{u}}(t_0); \boldsymbol{\theta}) = \bar{\mathbf{u}}(t_1)$ and $M(\bar{\mathbf{u}}(t_0) + \Delta\mathbf{u}_0; \boldsymbol{\theta}) = \mathbf{u}(t_1)$; thus $\Delta\mathbf{u}_1 = \mathbf{u}(t_1) - \bar{\mathbf{u}}(t_1) \approx D_M(\bar{\mathbf{u}}(t_0))\, \Delta\mathbf{u}_0$. Similarly, we can expand $M(\bar{\mathbf{u}}(t_1) + \Delta\mathbf{u}_1; \boldsymbol{\theta})$ around $\bar{\mathbf{u}}(t_1)$ as follows,
$M(\bar{\mathbf{u}}(t_1) + \Delta\mathbf{u}_1; \boldsymbol{\theta}) \approx M(\bar{\mathbf{u}}(t_1); \boldsymbol{\theta}) + D_M(\bar{\mathbf{u}}(t_1))\, \Delta\mathbf{u}_1,$  (17)
where $\mathbf{u}(t_2) = M(\mathbf{u}(t_1); \boldsymbol{\theta}) = M(\bar{\mathbf{u}}(t_1) + \Delta\mathbf{u}_1; \boldsymbol{\theta})$ and $\bar{\mathbf{u}}(t_2) = M(\bar{\mathbf{u}}(t_1); \boldsymbol{\theta})$. Consequently, $\Delta\mathbf{u}_2 = \mathbf{u}(t_2) - \bar{\mathbf{u}}(t_2) \approx D_M(\bar{\mathbf{u}}(t_1))\, \Delta\mathbf{u}_1$, which can be generalized as,
$\Delta\mathbf{u}_{k+1} \approx D_M(\bar{\mathbf{u}}(t_k))\, \Delta\mathbf{u}_k,$  (18)
with $\mathbf{u}(t_k) \approx \Delta\mathbf{u}_k + \bar{\mathbf{u}}(t_k)$. It is customary to call Equation (18) the perturbation equation, or the tangent linear system (TLS). Equation (18) can be related to $\Delta\mathbf{u}_0$ by recursion as follows,
$\Delta\mathbf{u}_{k+1} \approx D_M(\bar{\mathbf{u}}(t_k))\, \Delta\mathbf{u}_k \approx D_M(\bar{\mathbf{u}}(t_k))\, D_M(\bar{\mathbf{u}}(t_{k-1}))\, \Delta\mathbf{u}_{k-1} \approx \cdots \approx D_M(\bar{\mathbf{u}}(t_k))\, D_M(\bar{\mathbf{u}}(t_{k-1})) \cdots D_M(\bar{\mathbf{u}}(t_0))\, \Delta\mathbf{u}_0,$
which can be shorthanded as $\Delta\mathbf{u}_{k+1} \approx D_M(\bar{\mathbf{u}}(t_{k:0}))\, \Delta\mathbf{u}_0$ (please notice the order of matrix multiplication and the subscript “$k{:}0$”).
Now, we investigate the first order variation Δ J of the cost functional J ( ū ( t 0 ) ) induced by the perturbation Δ u 0 in the initial condition. This can be approximated as below,
$\Delta J = \Delta\mathbf{u}_0^T\, \nabla J(\bar{\mathbf{u}}(t_0))$
$\qquad = -\sum_{t_k \in T} \Delta\mathbf{u}_k^T\, D_h^T(\bar{\mathbf{u}}(t_k))\, \mathbf{R}^{-1}(t_k)\big(\mathbf{w}(t_k) - h(\bar{\mathbf{u}}(t_k))\big).$
Given that $\Delta\mathbf{u}_k \approx D_M(\bar{\mathbf{u}}(t_{k-1:0}))\, \Delta\mathbf{u}_0$, then $\Delta\mathbf{u}_k^T \approx \Delta\mathbf{u}_0^T\, D_M^T(\bar{\mathbf{u}}(t_0))\, D_M^T(\bar{\mathbf{u}}(t_1)) \cdots D_M^T(\bar{\mathbf{u}}(t_{k-1})) = \Delta\mathbf{u}_0^T\, D_M^T(\bar{\mathbf{u}}(t_{0:k-1}))$, and Equation (20) can be rewritten as
$\Delta J = -\sum_{t_k \in T} \Delta\mathbf{u}_0^T\, D_M^T(\bar{\mathbf{u}}(t_{0:k-1}))\, D_h^T(\bar{\mathbf{u}}(t_k))\, \mathbf{R}^{-1}(t_k)\big(\mathbf{w}(t_k) - h(\bar{\mathbf{u}}(t_k))\big).$
By comparing Equations (19) and (21), the gradient of the cost functional can be approximated as
$\nabla J(\bar{\mathbf{u}}(t_0)) = -\sum_{t_k \in T} D_M^T(\bar{\mathbf{u}}(t_{0:k-1}))\, D_h^T(\bar{\mathbf{u}}(t_k))\, \mathbf{R}^{-1}(t_k)\big(\mathbf{w}(t_k) - h(\bar{\mathbf{u}}(t_k))\big)$
$\qquad\qquad\ = -\sum_{t_k \in T} D_M^T(\bar{\mathbf{u}}(t_{0:k-1}))\, \mathbf{f}(t_k),$
where $\mathbf{f}(t_k) = D_h^T(\bar{\mathbf{u}}(t_k))\, \mathbf{R}^{-1}(t_k)\big(\mathbf{w}(t_k) - h(\bar{\mathbf{u}}(t_k))\big)$. If we denote the time instants at which measurements are available as $T = \{t_{O_1}, t_{O_2}, \ldots, t_{O_N}\}$, Equation (23) can be expanded as
$\nabla J(\bar{\mathbf{u}}(t_0)) = -\Big[ D_M^T(\bar{\mathbf{u}}(t_{0:O_1-1}))\, \mathbf{f}(t_{O_1}) + D_M^T(\bar{\mathbf{u}}(t_{0:O_2-1}))\, \mathbf{f}(t_{O_2}) + \cdots + D_M^T(\bar{\mathbf{u}}(t_{0:O_N-1}))\, \mathbf{f}(t_{O_N}) \Big].$
Now, we define a sequence $\boldsymbol{\lambda}_k \in \mathbb{R}^n$ as below,
$\boldsymbol{\lambda}_k = \begin{cases} \mathbf{f}_k, & \text{if } t_k = t_{O_N}, \\ D_M^T(\bar{\mathbf{u}}(t_k))\, \boldsymbol{\lambda}_{k+1} + \mathbf{f}_k, & \text{if } t_k \in \{t_{O_1}, t_{O_2}, \ldots, t_{O_{N-1}}\}, \\ D_M^T(\bar{\mathbf{u}}(t_k))\, \boldsymbol{\lambda}_{k+1}, & \text{otherwise}. \end{cases}$  (25)
It can be verified that $\nabla J(\bar{\mathbf{u}}(t_0)) = -\boldsymbol{\lambda}_0$ (assume some numbers and you can see that this relation holds!). Therefore, in order to obtain the gradient of the cost functional, $\boldsymbol{\lambda}_0$ has to be computed, which depends on the evaluation of $\boldsymbol{\lambda}_1$. In turn, the computation of $\boldsymbol{\lambda}_1$ requires $\boldsymbol{\lambda}_2$, and so on. Equation (25) is known as the first-order adjoint equation, as it implies the evaluation of the $\boldsymbol{\lambda}_k$ sequence from $t_{k+1}$ to $t_k$ (i.e., in reverse order).
Therefore, the first-order approximation of the 4DVAR works as follows. Starting from a prior guess of the initial condition, the base trajectory is computed by solving the model forward in time until the final time corresponding to the last observation point (i.e., $k = 0, 1, 2, \ldots, O_N$). Then, the value of $\boldsymbol{\lambda}_{O_N} = \mathbf{f}_{O_N}$ is evaluated at the final time. After that, $\boldsymbol{\lambda}_k$ is evolved backward in time using Equation (25) until $\boldsymbol{\lambda}_0$, and hence the gradient $\nabla J(\bar{\mathbf{u}}(t_0)) = -\boldsymbol{\lambda}_0$, is obtained. This value of the gradient is then utilized to update the initial condition, and a new base trajectory is generated. The solution of the 4DVAR problem thus requires solving the model dynamics forward in time and the adjoint problem backward in time in order to compute the gradient of the cost functional and update the initial condition. The process is repeated until convergence takes place. Listing 6 shows a sample function to compute the gradient of the cost functional $J(\bar{\mathbf{u}}(t_0))$ corresponding to a base trajectory generated from a guess of the initial condition $\bar{\mathbf{u}}(t_0)$. We highlight that, in practice, the storage of the base trajectory as well as the $\boldsymbol{\lambda}$ sequence at every time instant might be overwhelming. However, we are not addressing such issues in this introductory tutorial to data assimilation techniques.
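As an illustration of this forward–backward procedure, the following sketch computes the gradient for the special case of an identity observation operator ($h(\mathbf{u}) = \mathbf{u}$); the function name and signature are ours, not necessarily those of Listing 6, and a first-order Euler step is used so that its tangent linear operator is simply $\mathbf{I} + \Delta t\, D_f(\mathbf{u})$.

import numpy as np

def fourdvar_gradient(u0, obs, R, dt, nt, rhs, jac_rhs):
    # Gradient of the 4DVAR cost functional via the first-order adjoint, Equation (25),
    # assuming h(u) = u. obs is a dict {k: w_k} of observations indexed by step number;
    # rhs(u) is f(u; theta) and jac_rhs(u) its Jacobian Df(u).
    n = u0.shape[0]
    ub = np.zeros((nt + 1, n))
    ub[0] = u0
    for k in range(nt):                            # forward sweep: base trajectory (Euler)
        ub[k + 1] = ub[k] + dt * rhs(ub[k])
    Ri = np.linalg.inv(R)
    lam = np.zeros(n)
    for k in range(nt, -1, -1):                    # backward (adjoint) sweep, Equation (25)
        if k < nt:
            DM = np.eye(n) + dt * jac_rhs(ub[k])   # tangent linear of the step t_k -> t_{k+1}
            lam = DM.T @ lam
        if k in obs:
            lam = lam + Ri @ (obs[k] - ub[k])      # forcing f_k at observation times
    return -lam                                    # gradient of J with respect to u(t0)

The returned gradient can then be plugged into the descent update described next.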
The gradient $\nabla J(\bar{\mathbf{u}}(t_0))$ should be used in a minimization algorithm to update the initial condition for the next iteration. One simple algorithm is gradient descent, where an updated value of the initial state is computed as $\bar{\mathbf{u}}(t_0)_{new} = \bar{\mathbf{u}}(t_0)_{old} - \beta_n\, \nabla J(\bar{\mathbf{u}}(t_0)_{old})$, where $\beta_n$ is some step parameter. This can be normalized as $\bar{\mathbf{u}}(t_0)_{new} = \bar{\mathbf{u}}(t_0)_{old} - \beta\, \nabla J(\bar{\mathbf{u}}(t_0)_{old}) / \|\nabla J(\bar{\mathbf{u}}(t_0)_{old})\|$. The value of $\beta$ might be predefined or, more efficiently, updated at each iteration using an additional optimization algorithm (e.g., line search). For the sake of completeness, we present a line-search routine in Listing 7 using the golden-section search algorithm. This is based on the definition of the cost functional in Listing 8.
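A compact golden-section search might look as follows; this is a sketch under the assumption that a scalar helper cost_of_step(beta) (hypothetical, defined by the user) evaluates the 4DVAR cost for a trial step length beta, and the routine in Listing 7 may be organized differently.

import numpy as np

def golden_search(func, a=0.0, b=1.0, tol=1e-4):
    # Golden-section search for the step length minimizing func(beta) on [a, b].
    gr = (np.sqrt(5.0) - 1.0) / 2.0           # inverse golden ratio
    c, d = b - gr * (b - a), a + gr * (b - a)
    while abs(b - a) > tol:
        if func(c) < func(d):
            b = d
        else:
            a = c
        c, d = b - gr * (b - a), a + gr * (b - a)
    return 0.5 * (a + b)

# illustrative use inside a gradient-descent iteration:
# beta = golden_search(lambda beta: cost_of_step(beta))
# u0 = u0 - beta * grad / np.linalg.norm(grad)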

Example: Lorenz 63 System

Similar to the 3DVAR demonstration, we apply the described 4DVAR using the first-order adjoint method to the Lorenz 63 system. We also begin with the same erroneous initial condition of $\mathbf{u}(0) = [2, 3, 4]^T$, and observations are collected every 0.2 time units, contaminated with Gaussian noise with a diagonal covariance matrix defined as $\mathbf{R} = \sigma_m^2\, \mathbf{I}_3$, where $\sigma_m = 0.15$ is the standard deviation of the measurement noise. Moreover, we simply define a linear observation operator as $h(\mathbf{u}) = \mathbf{u}$, with the identity matrix as its Jacobian. We utilize simple gradient descent for minimizing the cost functional, equipped with a golden-section search for learning rate optimization. The maximum number of iterations is set to 1000, but we highlight that this is highly dependent on the adopted minimization algorithm as well as the line-search technique. In practice, the evaluation of each iteration might be too computationally expensive, so the number of iterations needs to be as low as possible. We define two criteria for convergence, and iterations stop whenever either of them is achieved. The first one is based on the change in the value of the cost (loss) functional, and the second one is based on the magnitude of its gradient. Extra criteria might be supplied as well.
Results from running Listing 9 are shown in Figure 2, where we can notice the significant improvement of the predictions compared to the background trajectories. Moreover, we highlight the correction to the initial conditions in Figure 2, which resulted in the analysis trajectory. This is opposed to the 3DVAR implementation, where the correction is applied locally at measurement instants only, as seen in Figure 1.
Before we move on to other data assimilation techniques, we highlight a few remarks regarding our presentation of the 4DVAR:
  • In Listing 9, we utilize the gradient descent approach to minimize the cost function. Readers are encouraged to apply other optimization techniques (e.g., conjugate gradient) that achieve higher convergence rate.
  • The determination of the learning rate can be further optimized using more efficient line-search methods, rather than the simple Golden search.
  • The Lagrangian multiplier method can be applied to solve the 4DVAR problem instead of the adjoint method; similar results should be obtained.
  • The presented algorithm relies on the definition of the cost functional given in Equation (13), based on the discrepancy between measurements and model’s predictions. When extra information is available, it can be incorporated into the cost functional. For instance, similar to Equation (4), a term that penalizes the correction magnitude can be added, weighted by the background covariance matrix. Furthermore, symmetries or other physical knowledge can be enforced as hard or weak constraints.
  • The first-order adjoint algorithm requires the computation of the Jacobian $D_M(\mathbf{u})$ of the discrete-time model map $M(\mathbf{u}; \boldsymbol{\theta})$. This can be computed by plugging the model $f(\mathbf{u}; \boldsymbol{\theta})$ into a time integration scheme, rearranging everything to rewrite $M(\mathbf{u}; \boldsymbol{\theta})$ as an explicit function of $\mathbf{u}$, and differentiating with respect to the components of $\mathbf{u}$. For Lorenz 63 and the first-order Euler scheme, this is an easy task. However, for a higher dimensional system and more accurate time integrators, this would be cumbersome. Instead, the chain rule can be utilized to compute $D_M(\mathbf{u})$, as presented in Listing 10, which takes as input the right-hand side of the continuous-time dynamics $f(\cdot;\cdot)$ (described in Listing 3 for the Lorenz 63 system) as well as its Jacobian (given in Listing 11 for Lorenz 63); a brief sketch of these ingredients is given after this list.
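For reference, the two ingredients can be sketched as follows for the Lorenz 63 case: the first function is the analytical Jacobian of Equation (12), and the second composes it, via the chain rule, into the Jacobian of a single forward-Euler step (Listings 10 and 11 build analogous quantities; the Euler-based composition here is our simplifying assumption).

import numpy as np

def jac_lorenz63(u, sigma=10.0, beta=8.0/3.0, rho=28.0):
    # Jacobian Df(u) of the continuous-time Lorenz 63 right-hand side, Equation (12).
    x, y, z = u
    return np.array([[-sigma,  sigma,   0.0],
                     [rho - z,  -1.0,    -x],
                     [y,          x,  -beta]])

def jac_euler_step(u, dt, jac_rhs=jac_lorenz63):
    # Jacobian DM(u) of the discrete map for one forward-Euler step:
    # M(u) = u + dt*f(u)  =>  DM(u) = I + dt*Df(u)  (chain rule through the integrator).
    n = u.shape[0]
    return np.eye(n) + dt * jac_rhs(u)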

5. Forward Sensitivity Method

We have seen in Section 4 that the minimization of the cost functional via the 4DVAR algorithm requires the solution of the adjoint problem at each iteration, which incurs a significant computational burden for high dimensional systems. Alternatively, Lakshmivarahan and Lewis [13] proposed the forward sensitivity method (FSM) to derive an expression for the correction vector in terms of the forward sensitivity matrices [14]. In their development, simultaneous correction of the initial condition $\mathbf{u}(t_0)$ and the model parameters $\boldsymbol{\theta}$ is treated. For conciseness and consistency with the methods introduced here, we only consider erroneous initial conditions and assume the model parameters are perfectly known. Given the discrete-time model map $M(\mathbf{u}; \boldsymbol{\theta})$ in Equation (2), the forecast sensitivity at time $t_{k+1}$ to the initial conditions $\mathbf{u}(t_0)$ can be defined as follows,
$\dfrac{\partial u_i(t_{k+1})}{\partial u_j(t_0)} = \sum_{q=1}^{n} \dfrac{\partial M_i(\mathbf{u}(t_k); \boldsymbol{\theta})}{\partial u_q(t_k)}\, \dfrac{\partial u_q(t_k)}{\partial u_j(t_0)}, \qquad 1 \le i, j \le n,$  (26)
where $M(\mathbf{u}(t_k); \boldsymbol{\theta}) = [M_1(\mathbf{u}(t_k); \boldsymbol{\theta}), M_2(\mathbf{u}(t_k); \boldsymbol{\theta}), \ldots, M_n(\mathbf{u}(t_k); \boldsymbol{\theta})]^T$. Recall that the Jacobian of the model $M(\mathbf{u}(t_k); \boldsymbol{\theta})$ is defined by the matrix $D_M(\mathbf{u}(t_k)) \in \mathbb{R}^{n \times n}$ whose $(i,j)$th entry is $\partial M_i(\mathbf{u}(t_k); \boldsymbol{\theta}) / \partial u_j(t_k)$. We also define $\mathbf{U}(t_k) \in \mathbb{R}^{n \times n}$ as the forward sensitivity matrix of $\mathbf{u}(t_k)$ with respect to the initial state $\mathbf{u}(t_0)$, where $[\mathbf{U}(t_k)]_{i,j} = \partial u_i(t_k) / \partial u_j(t_0)$ for $1 \le i, j \le n$. Thus, Equation (26) can be rewritten in matrix form as,
$\mathbf{U}(t_{k+1}) = D_M(\mathbf{u}(t_k))\, \mathbf{U}(t_k).$  (27)
Equation (27) provides the dynamic evolution of the forward sensitivity matrix in a recursive manner, initialized with $\mathbf{U}(t_0) = \mathbf{I}_n$, and it can be used to relate the prediction error at any time step to the initial condition.
Given the measurement $\mathbf{w}(t_k)$ at time $t_k \in T$, the forecast error $\mathbf{e}(t_k)$ is defined as the difference between the measurements and the model forecast as
$\mathbf{e}(t_k) = \mathbf{w}(t_k) - h(\mathbf{u}(t_k)).$  (28)
This is commonly called the innovation in DA terminology. The cost functional in Equation (13) can be rewritten as
$J(\mathbf{u}(t_0)) = \sum_{t_k \in T} \frac{1}{2}\, \|\mathbf{e}(t_k)\|^2_{\mathbf{R}^{-1}(t_k)} = \sum_{t_k \in T} \frac{1}{2}\, \mathbf{e}(t_k)^T\, \mathbf{R}^{-1}(t_k)\, \mathbf{e}(t_k).$  (29)
With the assumption that the dynamical model is perfect (i.e., it correctly encapsulates all the relevant processes) and the model parameters are known, the deterministic part of the forecast error can be attributed to the inaccuracy in the initial condition $\mathbf{u}(t_0)$, defined as $\Delta\mathbf{u}_0 = \mathbf{u}_t(t_0) - \mathbf{u}_b(t_0)$, where $\mathbf{u}_t(t_0)$ denotes the true initial condition.
Considering a base trajectory $\bar{\mathbf{u}}(t_k)$ for $k = 1, 2, \ldots$, generated from the initial condition $\bar{\mathbf{u}}(t_0)$, and the corrected trajectory ($\mathbf{u}(t_k)$ for $k = 1, 2, \ldots$) obtained by correcting the initial condition as $\mathbf{u}(t_0) = \bar{\mathbf{u}}(t_0) + \Delta\mathbf{u}_0$, we define the difference between the two trajectories at any time $t_k$ as $\Delta\mathbf{u}_k = \mathbf{u}(t_k) - \bar{\mathbf{u}}(t_k)$. Since $\mathbf{u}(t_k)$ is a function of the initial condition (with the model parameters being known), the first-order Taylor expansion of $\mathbf{u}(t_k)$ around the base trajectory can be written as $\mathbf{u}(t_k) \approx \bar{\mathbf{u}}(t_k) + \mathbf{U}(t_k)\, \Delta\mathbf{u}_0$, leading to the following relation
$\Delta\mathbf{u}_k \approx \mathbf{U}(t_k)\, \Delta\mathbf{u}_0.$  (30)
If we let the perturbed (corrected) trajectory be the sought true trajectory, Equation (3) can be rewritten as
$\mathbf{w}(t_k) = h(\bar{\mathbf{u}}(t_k) + \Delta\mathbf{u}_k) + \boldsymbol{\xi}_m,$  (31)
and a first-order expansion of $\mathbf{w}(t_k)$ (neglecting the measurement noise) is as follows,
$\mathbf{w}(t_k) \approx h(\bar{\mathbf{u}}(t_k)) + D_h(\bar{\mathbf{u}}(t_k))\, \Delta\mathbf{u}_k,$  (32)
and the forecast error at the base trajectory can be approximated as
$\mathbf{e}(t_k) = D_h(\bar{\mathbf{u}}(t_k))\, \Delta\mathbf{u}_k.$  (33)
Equations (30) and (33) can be combined to yield the following,
$\mathbf{e}(t_k) = D_h(\bar{\mathbf{u}}(t_k))\, \mathbf{U}(t_k)\, \Delta\mathbf{u}_0,$  (34)
which relates the forecast error at any time $t_k$ to the discrepancy between the true and erroneous initial conditions in a linear relationship. In order to account for all the time instants at which observations are available ($T = \{t_{O_1}, t_{O_2}, \ldots, t_{O_N}\}$), Equation (34) can be concatenated at different times and written as a linear system of equations as follows,
$\mathbf{Q}\, \Delta\mathbf{u}_0 = \mathbf{e}_F,$  (35)
where the matrix $\mathbf{Q} \in \mathbb{R}^{Nm \times n}$ and the vector $\mathbf{e}_F \in \mathbb{R}^{Nm}$ are computed as,
$\mathbf{Q} = \begin{bmatrix} D_h(\bar{\mathbf{u}}(t_{O_1}))\, \mathbf{U}(t_{O_1}) \\ D_h(\bar{\mathbf{u}}(t_{O_2}))\, \mathbf{U}(t_{O_2}) \\ \vdots \\ D_h(\bar{\mathbf{u}}(t_{O_N}))\, \mathbf{U}(t_{O_N}) \end{bmatrix}, \qquad \mathbf{e}_F = \begin{bmatrix} \mathbf{e}(t_{O_1}) \\ \mathbf{e}(t_{O_2}) \\ \vdots \\ \mathbf{e}(t_{O_N}) \end{bmatrix}.$  (36)
Depending on the value of $Nm$ relative to $n$, Equation (35) can give rise to either an over-determined or an under-determined linear inverse problem. In either case, the inverse problem can be solved in a weighted least-squares sense to find a first-order estimate of the optimal correction (perturbation) to the initial condition $\Delta\mathbf{u}_0$, with $\mathbf{R}^{-1}$ being the weighting matrix, where $\mathbf{R}$ is a block-diagonal matrix constructed as follows,
$\mathbf{R} = \mathrm{blockdiag}\big(\mathbf{R}(t_{O_1}),\, \mathbf{R}(t_{O_2}),\, \ldots,\, \mathbf{R}(t_{O_N})\big),$  (37)
and the solution of Equation (35) can be written as $\Delta\mathbf{u}_0 = \big(\mathbf{Q}^T \mathbf{R}^{-1} \mathbf{Q}\big)^{-1} \mathbf{Q}^T \mathbf{R}^{-1}\, \mathbf{e}_F$ for the over-determined case. This first-order approximation progressively yields better results by repeating the entire process for multiple iterations until convergence within a certain tolerance [13]. In essence, the FSM is an alternative to the 4DVAR algorithm, replacing the solution of the adjoint problem backward in time (i.e., Equation (25)) with the successive matrix evaluations in Equation (27). However, we highlight that in the 4DVAR approach the actual forecast error is computed as $\mathbf{e}_k = \mathbf{w}(t_k) - h(\bar{\mathbf{u}}(t_k))$, while its first-order approximation is utilized in the FSM development. The duality between the two approaches is further discussed in [13].
A Python implementation of the FSM approach is presented in Listing 12. We note that we solve Equation (35) using the built-in NumPy least-squares function. However, more efficient iterative schemes can be adopted in practice.
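A condensed FSM pass can be sketched as follows; it assumes an identity observation operator, a forward-Euler model step, and right-hand-side/Jacobian helpers in the style of the earlier sketches, and it is not a verbatim copy of Listing 12.

import numpy as np

def fsm_correction(u0, obs, R, dt, nt, rhs, jac_rhs):
    # One FSM pass: propagate U(t_k) via Equation (27), assemble the system of Equation (35),
    # and solve it in a weighted least-squares sense for the initial-condition correction.
    n = u0.shape[0]
    u, U = u0.copy(), np.eye(n)                # state and forward sensitivity matrix, U(t0) = I
    Qmat, eF = [], []
    for k in range(1, nt + 1):
        DM = np.eye(n) + dt * jac_rhs(u)       # tangent linear of the step t_{k-1} -> t_k
        U = DM @ U                             # U(t_k) = DM(u(t_{k-1})) U(t_{k-1})
        u = u + dt * rhs(u)                    # forward (Euler) model step
        if k in obs:
            Qmat.append(U)                     # with D_h = I the block is just U(t_k)
            eF.append(obs[k] - u)              # forecast error (innovation) at t_k
    Qmat, eF = np.vstack(Qmat), np.concatenate(eF)
    Wi = np.kron(np.eye(len(eF) // n), np.linalg.inv(R))   # block-diagonal R^{-1} weighting
    du0 = np.linalg.solve(Qmat.T @ Wi @ Qmat, Qmat.T @ Wi @ eF)
    return du0                                 # first-order correction to u(t0)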

Example: Lorenz 63 System

We apply the described FSM to estimate the initial conditions for the Lorenz 63 system, using the same parameters and setup as described in Section 4. Sample code is presented in Listing 13. We highlight that, instead of adding the correction vector $\Delta\mathbf{u}_0$ directly to the base value $\bar{\mathbf{u}}(t_0)$, we multiply it by a learning rate to mitigate the effects of the first-order approximations. We utilize the golden-section search method to update this learning rate at each iteration.
Prediction results are provided in Figure 3, where we notice a large discrepancy in the estimated initial conditions. However, the predicted trajectory matches the true one almost perfectly for the rest of the testing time window. This is largely affected by the nature of the Lorenz system itself and the attraction to its attractor. Furthermore, this can be partially attributed to the lack of background information and its contribution to the cost functional. Moreover, this can be greatly improved by adding more observations close to the initial time, since the correction vector is estimated based on the forecast error computed at observation times. In any case, we see that the analysis trajectory is significantly more accurate than the background one, with iterative first-order approximations of the forward sensitivity method, even for long time predictions.

6. Kalman Filtering

The idea behind Kalman filtering techniques is to propagate the mean as well as the covariance matrix of the system’s state sequentially in time. That is, in addition to providing an improved state estimate (i.e., the analysis), the filter also gives some information about the statistical properties of this state estimate. This is one main difference between Kalman filtering and variational methods, which often assume fixed (stationary) background covariance matrices. Kalman filters are also very popular in systems engineering, robotics, navigation, and control. Almost all modern control systems use the Kalman filter. It assisted the guidance of the Apollo 11 lunar module to the moon’s surface, and will most probably do the same for the next generations of aircraft as well.
Although most applications in fluid dynamics involve nonlinear systems, we first describe the standard Kalman filter developed for the linear dynamical system case with a linear observation operator, described as
$\mathbf{u}_t(t_{k+1}) = \mathbf{M}_k\, \mathbf{u}_t(t_k) + \boldsymbol{\xi}_p(t_{k+1}),$
$\mathbf{w}(t_k) = \mathbf{H}_k\, \mathbf{u}_t(t_k) + \boldsymbol{\xi}_m(t_k),$
where $\mathbf{M}_k \in \mathbb{R}^{n \times n}$ is a non-singular system matrix defining the underlying governing processes and $\boldsymbol{\xi}_p \in \mathbb{R}^n$ describes the process noise (or model error). $\mathbf{H}_k \in \mathbb{R}^{m \times n}$ represents the measurement system with a measurement noise of $\boldsymbol{\xi}_m \in \mathbb{R}^m$.
As presented in Section 2, the true state $\mathbf{u}_t(t_k)$ is assumed to be a random variable with known mean $\mathbb{E}[\mathbf{u}_t(t_k)] = \mathbf{u}_b(t_k)$ and covariance matrix $\mathbb{E}[(\mathbf{u}_t(t_k) - \mathbf{u}_b(t_k))(\mathbf{u}_t(t_k) - \mathbf{u}_b(t_k))^T] = \mathbf{B}_k$. In Kalman filtering, we note that the covariance matrix evolves in time, hence the subscript. We also assume that the process noise is unbiased with zero mean and a covariance matrix $\mathbf{Q}_k$; that is, $\mathbb{E}[\boldsymbol{\xi}_p(t_k)] = \mathbf{0}$ and $\mathbb{E}[\boldsymbol{\xi}_p(t_k)\, \boldsymbol{\xi}_p(t_k)^T] = \mathbf{Q}_k$.
Thus, the goal of the filtering problem is to find a good estimate (analysis) $\mathbf{u}_a(t_k)$ of the true system state $\mathbf{u}_t(t_k)$, given a dynamical model and a set of noisy observations $\{\mathbf{w}(t_i)\}$ collected at some time instants $t_i \in (0, t_k]$. The optimal estimate $\mathbf{u}_a(t_k)$ is defined as the one that minimizes $\mathbb{E}[(\mathbf{u}_t(t_k) - \mathbf{u}_a(t_k))^T(\mathbf{u}_t(t_k) - \mathbf{u}_a(t_k))]$. This filtering process generally consists of two steps: the forecast step and the data assimilation (analysis) step.
The forecast step is performed using the predictable part of the given dynamical model, starting from the best known information at time $t_k$ (denoted as $\hat{\mathbf{u}}_b(t_k)$), to produce a forecast or background estimate $\mathbf{u}_b(t_{k+1}) = \mathbf{M}_k\, \hat{\mathbf{u}}_b(t_k)$. The difference between the background forecast and the true state at $t_{k+1}$ can be written as follows,
$\boldsymbol{\xi}_b(t_{k+1}) = \mathbf{u}_t(t_{k+1}) - \mathbf{u}_b(t_{k+1}) = \big(\mathbf{M}_k\, \mathbf{u}_t(t_k) + \boldsymbol{\xi}_p(t_{k+1})\big) - \mathbf{M}_k\, \hat{\mathbf{u}}_b(t_k) = \mathbf{M}_k\big(\mathbf{u}_t(t_k) - \hat{\mathbf{u}}_b(t_k)\big) + \boldsymbol{\xi}_p(t_{k+1}) = \mathbf{M}_k\, \hat{\boldsymbol{\xi}}_b(t_k) + \boldsymbol{\xi}_p(t_{k+1}),$
where $\hat{\boldsymbol{\xi}}_b(t_k) = \mathbf{u}_t(t_k) - \hat{\mathbf{u}}_b(t_k)$ is the error in the estimate at $t_k$, with zero mean and covariance matrix $\hat{\mathbf{B}}_k$.
The covariance matrix of the background estimate at $t_{k+1}$ can be evaluated as $\mathbf{B}_{k+1} = \mathbb{E}[\boldsymbol{\xi}_b(t_{k+1})\, \boldsymbol{\xi}_b(t_{k+1})^T] = \mathbb{E}\big[\big(\mathbf{M}_k\, \hat{\boldsymbol{\xi}}_b(t_k) + \boldsymbol{\xi}_p(t_{k+1})\big)\big(\mathbf{M}_k\, \hat{\boldsymbol{\xi}}_b(t_k) + \boldsymbol{\xi}_p(t_{k+1})\big)^T\big]$. Since $\hat{\boldsymbol{\xi}}_b(t_k)$ and $\boldsymbol{\xi}_p(t_{k+1})$ are assumed to be uncorrelated (i.e., $\mathbb{E}[\hat{\boldsymbol{\xi}}_b(t_k)\, \boldsymbol{\xi}_p(t_{k+1})^T] = \mathbf{0}$), the background covariance matrix at $t_{k+1}$ can be computed as follows,
$\mathbf{B}_{k+1} = \mathbf{M}_k\, \hat{\mathbf{B}}_k\, \mathbf{M}_k^T + \mathbf{Q}_{k+1}.$
Now, with the forecast step, we have a background estimate at $t_{k+1}$ defined as $\mathbf{u}_b(t_{k+1})$ with a covariance matrix $\mathbf{B}_{k+1}$. Then, measurements $\mathbf{w}(t_{k+1})$ are collected at $t_{k+1}$ with a linear operator $\mathbf{H}_{k+1}$ and measurement noise $\boldsymbol{\xi}_m(t_{k+1})$ with zero mean and a covariance matrix $\mathbf{R}_{k+1}$. Thus, we would like to fuse these pieces of information to create an optimal unbiased estimate (analysis) $\mathbf{u}_a(t_{k+1})$ with a covariance matrix $\mathbf{P}_{k+1}$. This can be defined as a linear function of $\mathbf{u}_b(t_{k+1})$ and $\mathbf{w}(t_{k+1})$ as follows,
$\mathbf{u}_a(t_{k+1}) = \mathbf{u}_b(t_{k+1}) + \mathbf{K}_{k+1}\big(\mathbf{w}(t_{k+1}) - \mathbf{H}_{k+1}\, \mathbf{u}_b(t_{k+1})\big),$
where $\mathbf{w}(t_{k+1}) - \mathbf{H}_{k+1}\, \mathbf{u}_b(t_{k+1})$ is the innovation vector and $\mathbf{K} \in \mathbb{R}^{n \times m}$ is called the Kalman gain matrix. We highlight that the Kalman gain matrix is defined in such a way as to minimize $\mathbb{E}[(\mathbf{u}_t(t_{k+1}) - \mathbf{u}_a(t_{k+1}))^T(\mathbf{u}_t(t_{k+1}) - \mathbf{u}_a(t_{k+1}))] = \mathrm{tr}(\mathbf{P}_{k+1})$. This can be written as [27]
$\mathbf{K}_{k+1} = \mathbf{B}_{k+1}\, \mathbf{H}_{k+1}^T\big(\mathbf{H}_{k+1}\, \mathbf{B}_{k+1}\, \mathbf{H}_{k+1}^T + \mathbf{R}_{k+1}\big)^{-1},$
resulting in an analysis covariance matrix defined as
$\mathbf{P}_{k+1} = (\mathbf{I}_n - \mathbf{K}_{k+1}\, \mathbf{H}_{k+1})\, \mathbf{B}_{k+1},$
where $\mathbf{I}_n$ is the $n \times n$ identity matrix. The resulting analysis $\mathbf{u}_a(t_{k+1})$ is known as the best linear unbiased estimate (BLUE). We highlight that the information at $t_k$ might correspond to the analysis (i.e., $\hat{\mathbf{u}}_b(t_k) = \mathbf{u}_a(t_k)$) obtained from the last data assimilation step, or just to the previous forecast if no other information is available. Thus, the Kalman filtering process can be summarized as follows,
Inputs: $\hat{\mathbf{u}}_b(t_k),\ \hat{\mathbf{B}}_k,\ \mathbf{M}_k,\ \mathbf{Q}_{k+1},\ \mathbf{w}(t_{k+1}),\ \mathbf{R}_{k+1},\ \mathbf{H}_{k+1}$.
Forecast: $\mathbf{u}_b(t_{k+1}) = \mathbf{M}_k\, \hat{\mathbf{u}}_b(t_k)$, $\quad \mathbf{B}_{k+1} = \mathbf{M}_k\, \hat{\mathbf{B}}_k\, \mathbf{M}_k^T + \mathbf{Q}_{k+1}$.
Kalman gain: $\mathbf{K}_{k+1} = \mathbf{B}_{k+1}\, \mathbf{H}_{k+1}^T\big(\mathbf{H}_{k+1}\, \mathbf{B}_{k+1}\, \mathbf{H}_{k+1}^T + \mathbf{R}_{k+1}\big)^{-1}$.
Analysis: $\mathbf{u}_a(t_{k+1}) = \mathbf{u}_b(t_{k+1}) + \mathbf{K}_{k+1}\big(\mathbf{w}(t_{k+1}) - \mathbf{H}_{k+1}\, \mathbf{u}_b(t_{k+1})\big)$, $\quad \mathbf{P}_{k+1} = (\mathbf{I}_n - \mathbf{K}_{k+1}\, \mathbf{H}_{k+1})\, \mathbf{B}_{k+1}$,
where the inputs at t k are defined as
$(\hat{\mathbf{u}}_b(t_k), \hat{\mathbf{B}}_k) = (\mathbf{u}_a(t_k), \mathbf{P}_k)$ if $\mathbf{w}(t_k)$ is available, and $(\hat{\mathbf{u}}_b(t_k), \hat{\mathbf{B}}_k) = (\mathbf{u}_b(t_k), \mathbf{B}_k)$ otherwise.
Listing 14 describes a basic Python implementation of the data assimilation step using the KF algorithm described above. Although efficient matrix inversion routines that benefit from specific matrix properties can be utilized, we use the standard built-in NumPy matrix inversion function.
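For concreteness, the analysis step can be sketched as below (function name ours; Listing 14 may differ in details such as how the inverse is computed).

import numpy as np

def kf_analysis(ub, B, w, H, R):
    # Data assimilation (analysis) step of the standard Kalman filter.
    S = H @ B @ H.T + R                        # innovation covariance
    K = B @ H.T @ np.linalg.inv(S)             # Kalman gain
    ua = ub + K @ (w - H @ ub)                 # analysis state (BLUE)
    P = (np.eye(ub.shape[0]) - K @ H) @ B      # analysis error covariance
    return ua, P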
Different forms for evaluating the Kalman gain and the covariance matrices are presented in the literature. Some of them are favored for computational cost reasons, while others maintain desirable properties (e.g., symmetry and positive definiteness) for numerically stable implementations [27]. Since we are more interested in nonlinear dynamical models, we discuss extensions of the standard Kalman filter to account for nonlinearity in the following sections.

7. Extended Kalman Filter

Instead of dealing with linear stochastic dynamics, we now look at the nonlinear case with a general (nonlinear) observation operator, written as
$\mathbf{u}_t(t_{k+1}) = M(\mathbf{u}_t(t_k); \boldsymbol{\theta}) + \boldsymbol{\xi}_p(t_{k+1}),$
$\mathbf{w}(t_k) = h(\mathbf{u}_t(t_k)) + \boldsymbol{\xi}_m(t_k).$
The first challenge in applying the Kalman filter to this system is the propagation of the background covariance matrix in the forecast step. The main idea behind the extended Kalman filter (EKF) to address this issue is to locally linearize $M(\mathbf{u}(t_k))$ by expanding it around the estimate $\hat{\mathbf{u}}_b(t_k)$ at $t_k$ using a first-order Taylor series as follows,
$M(\mathbf{u}_t(t_k); \boldsymbol{\theta}) \approx M(\hat{\mathbf{u}}_b(t_k); \boldsymbol{\theta}) + D_M(\hat{\mathbf{u}}_b(t_k))\, \hat{\boldsymbol{\xi}}_b(t_k),$
where $D_M(\hat{\mathbf{u}}_b(t_k))$ is the Jacobian (also known as the tangent linear operator) of the forward model $M(\cdot;\cdot)$ evaluated at $\hat{\mathbf{u}}_b(t_k)$, and $\hat{\boldsymbol{\xi}}_b(t_k) = \mathbf{u}_t(t_k) - \hat{\mathbf{u}}_b(t_k)$ is the error in the estimate at $t_k$, with zero mean and covariance matrix $\hat{\mathbf{B}}_k$. Thus, the difference between the background forecast and the true state at $t_{k+1}$ can be written as follows,
$\boldsymbol{\xi}_b(t_{k+1}) = \mathbf{u}_t(t_{k+1}) - \mathbf{u}_b(t_{k+1}) = M(\mathbf{u}_t(t_k); \boldsymbol{\theta}) + \boldsymbol{\xi}_p(t_{k+1}) - M(\hat{\mathbf{u}}_b(t_k); \boldsymbol{\theta}) \approx M(\hat{\mathbf{u}}_b(t_k); \boldsymbol{\theta}) + D_M(\hat{\mathbf{u}}_b(t_k))\, \hat{\boldsymbol{\xi}}_b(t_k) + \boldsymbol{\xi}_p(t_{k+1}) - M(\hat{\mathbf{u}}_b(t_k); \boldsymbol{\theta}) = D_M(\hat{\mathbf{u}}_b(t_k))\, \hat{\boldsymbol{\xi}}_b(t_k) + \boldsymbol{\xi}_p(t_{k+1}).$
Similar to the derivation in the linear case, with the assumption that $\hat{\boldsymbol{\xi}}_b(t_k)$ and $\boldsymbol{\xi}_p(t_{k+1})$ are uncorrelated, the background covariance matrix at $t_{k+1}$ can be computed as follows,
$\mathbf{B}_{k+1} = D_M(\hat{\mathbf{u}}_b(t_k))\, \hat{\mathbf{B}}_k\, D_M(\hat{\mathbf{u}}_b(t_k))^T + \mathbf{Q}_{k+1}.$
The next challenge, regarding the analysis step, is the computation of the Kalman gain in the case of a nonlinear observation operator. Again, $h(\mathbf{u}_t(t_{k+1}))$ is linearized using a Taylor series expansion around $\mathbf{u}_b(t_{k+1})$ (i.e., the background forecast) as follows,
$h(\mathbf{u}_t(t_{k+1})) \approx h(\mathbf{u}_b(t_{k+1})) + D_h(\mathbf{u}_b(t_{k+1}))\, \boldsymbol{\xi}_b(t_{k+1}),$
where $D_h(\mathbf{u}_b(t_{k+1}))$ is the Jacobian of the observation operator $h$, computed at the forecast $\mathbf{u}_b(t_{k+1})$. The Kalman gain is thus computed using this first-order approximation of $h$ as follows,
$\mathbf{K}_{k+1} = \mathbf{B}_{k+1}\, D_h(\mathbf{u}_b(t_{k+1}))^T\big(D_h(\mathbf{u}_b(t_{k+1}))\, \mathbf{B}_{k+1}\, D_h(\mathbf{u}_b(t_{k+1}))^T + \mathbf{R}_{k+1}\big)^{-1},$
with an analysis estimate and analysis covariance matrix defined as
$\mathbf{u}_a(t_{k+1}) = \mathbf{u}_b(t_{k+1}) + \mathbf{K}_{k+1}\big(\mathbf{w}(t_{k+1}) - h(\mathbf{u}_b(t_{k+1}))\big),$
$\mathbf{P}_{k+1} = \big(\mathbf{I}_n - \mathbf{K}_{k+1}\, D_h(\mathbf{u}_b(t_{k+1}))\big)\, \mathbf{B}_{k+1}.$
A summary of the EKF algorithm is described as follows,
Inputs: $\hat{\mathbf{u}}_b(t_k),\ \hat{\mathbf{B}}_k,\ M(\cdot;\cdot),\ \mathbf{Q}_{k+1},\ \mathbf{w}(t_{k+1}),\ \mathbf{R}_{k+1},\ h(\cdot)$.
Forecast: $\mathbf{u}_b(t_{k+1}) = M(\hat{\mathbf{u}}_b(t_k); \boldsymbol{\theta})$, $\quad \mathbf{B}_{k+1} = D_M(\hat{\mathbf{u}}_b(t_k))\, \hat{\mathbf{B}}_k\, D_M(\hat{\mathbf{u}}_b(t_k))^T + \mathbf{Q}_{k+1}$.
Kalman gain: $\mathbf{K}_{k+1} = \mathbf{B}_{k+1}\, D_h(\mathbf{u}_b(t_{k+1}))^T\big(D_h(\mathbf{u}_b(t_{k+1}))\, \mathbf{B}_{k+1}\, D_h(\mathbf{u}_b(t_{k+1}))^T + \mathbf{R}_{k+1}\big)^{-1}$.
Analysis: $\mathbf{u}_a(t_{k+1}) = \mathbf{u}_b(t_{k+1}) + \mathbf{K}_{k+1}\big(\mathbf{w}(t_{k+1}) - h(\mathbf{u}_b(t_{k+1}))\big)$, $\quad \mathbf{P}_{k+1} = \big(\mathbf{I}_n - \mathbf{K}_{k+1}\, D_h(\mathbf{u}_b(t_{k+1}))\big)\, \mathbf{B}_{k+1}$,
and a Python implementation of the data assimilation step is presented in Listing 15.
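A compact sketch of one EKF cycle for an identity observation operator is shown below (it assumes the hypothetical Euler and Jacobian helpers introduced earlier; Listing 15 may be structured differently).

import numpy as np

def ekf_step(ub, B, w, R, Q, dt, rhs, jac_rhs):
    # One forecast + analysis cycle of the extended Kalman filter with h(u) = u.
    n = ub.shape[0]
    DM = np.eye(n) + dt * jac_rhs(ub)          # tangent linear of one (Euler) model step
    uf = ub + dt * rhs(ub)                     # background forecast
    Bf = DM @ B @ DM.T + Q                     # forecast error covariance
    Dh = np.eye(n)                             # Jacobian of the identity observation operator
    K = Bf @ Dh.T @ np.linalg.inv(Dh @ Bf @ Dh.T + R)   # Kalman gain
    ua = uf + K @ (w - uf)                     # analysis state
    P = (np.eye(n) - K @ Dh) @ Bf              # analysis error covariance
    return ua, P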

Example: Lorenz 63 System

The first-order approximation of the Kalman filter in the nonlinear case, known as the extended Kalman filter, is applied to the test case of the Lorenz 63 system. The computation of the model Jacobian $D_M(\cdot)$ is presented in Listings 10 and 11 in Section 4. We use the same parameters and initial conditions for the twin experiment framework as before. The sequential implementation of the forecast and analysis steps is shown in Listing 16, and the results are illustrated in Figure 4. We adopt the fourth-order Runge–Kutta scheme for time integration. For demonstration purposes, we consider zero process noise (i.e., $\mathbf{Q} = \mathbf{0}$). However, we have found that assuming non-zero process noise (e.g., $\mathbf{Q} = 0.01\, \mathbf{I}_3$) yields better performance.

8. Ensemble Kalman Filter

Despite the sound mathematical and theoretical foundation of Kalman filters (in both the linear and nonlinear cases), they are not widely utilized in geophysical sciences. The major bottleneck in the computational pipeline of Kalman filtering is the update of the background covariance matrix. In a typical implementation, the cost of this step is $O(n^3)$, where $n$ is the size of the state vector. For systems governed by ordinary differential equations (ODEs), $n$ can be manageable (e.g., 3 in the Lorenz 63 model). However, fluid flows are often governed by partial differential equations. Thus, spatial discretization schemes (e.g., finite difference, finite volume, and finite element) are applied, resulting in a semi-discrete system of ODEs. In geophysical flow dynamics applications (e.g., weather forecasting), a dimension of millions or even billions is not uncommon, which hinders the feasible implementation of standard Kalman filtering techniques.
Alternatively, reduced rank algorithms that provide low-order approximations of the covariance matrices are usually adopted. A very popular approach is the ensemble Kalman filter (EnKF), introduced by Evensen [17,28,29] based on Monte Carlo estimation methods.
The main procedure for these methods is to create an ensemble of size $N$ of the system state, denoted as $\{\mathbf{u}^{(i)}(t_k)\,|\, 1 \le i \le N\}$, and apply the filtering algorithm to each member of the established ensemble. The statistical properties of the forecast and analysis are then extracted from the ensemble using the standard Monte Carlo framework. In the previous discussions, the forecast (background) and analysis covariances are defined as follows,
$\mathbf{B} = \mathbb{E}[\boldsymbol{\xi}_b\, \boldsymbol{\xi}_b^T] = \mathbb{E}[(\mathbf{u}_t - \mathbf{u}_b)(\mathbf{u}_t - \mathbf{u}_b)^T], \qquad \mathbf{P} = \mathbb{E}[\boldsymbol{\xi}_a\, \boldsymbol{\xi}_a^T] = \mathbb{E}[(\mathbf{u}_t - \mathbf{u}_a)(\mathbf{u}_t - \mathbf{u}_a)^T].$
Alternatively, those can be approximated by the ensemble covariances, given as
$\mathbf{B} \approx \dfrac{1}{N-1} \sum_{i=1}^{N} (\mathbf{u}_b^{(i)} - \bar{\mathbf{u}}_b)(\mathbf{u}_b^{(i)} - \bar{\mathbf{u}}_b)^T, \qquad \mathbf{P} \approx \dfrac{1}{N-1} \sum_{i=1}^{N} (\mathbf{u}_a^{(i)} - \bar{\mathbf{u}}_a)(\mathbf{u}_a^{(i)} - \bar{\mathbf{u}}_a)^T,$
where the bar denotes the ensemble average defined as $\bar{\mathbf{u}} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{u}^{(i)}$. Thus, one interpretation of the EnKF is that the ensemble mean is the best estimate of the state and the spread of the ensemble around the mean is a measure of the error in this estimate. A larger ensemble size $N$ yields a better approximation of the state estimate and its covariance matrix. In the following, we describe the typical steps for applying the EnKF algorithm.
We begin by creating an initial ensemble $\{\hat{\mathbf{u}}_b^{(i)}(t_k)\,|\, 1 \le i \le N\}$ at time $t_k$, drawn from the distribution $\mathcal{N}(\hat{\mathbf{u}}_b(t_k), \hat{\mathbf{B}}_k)$, where $\hat{\mathbf{u}}_b(t_k)$ represents our best known estimate at $t_k$. It can be verified that the ensemble mean and covariance converge to $\hat{\mathbf{u}}_b(t_k)$ and $\hat{\mathbf{B}}_k$ as $N \to \infty$. Then, the forecast step is applied to each member of the ensemble as follows,
$\mathbf{u}_b^{(i)}(t_{k+1}) = M(\hat{\mathbf{u}}_b^{(i)}(t_k); \boldsymbol{\theta}) + \boldsymbol{\xi}_p^{(i)}(t_{k+1}),$
where $\boldsymbol{\xi}_p^{(i)}(t_{k+1})$ is drawn from a multivariate Gaussian distribution with zero mean and covariance matrix $\mathbf{Q}_{k+1}$, representing the process noise applied to each member. The sample mean of the forecast ensemble can then be computed, along with the corresponding covariance matrix, as
$\mathbf{u}_b(t_{k+1}) \approx \bar{\mathbf{u}}_b(t_{k+1}) = \dfrac{1}{N}\sum_{i=1}^{N} \mathbf{u}_b^{(i)}(t_{k+1}),$
$\boldsymbol{\xi}_b^{(i)}(t_{k+1}) = \mathbf{u}_b^{(i)}(t_{k+1}) - \bar{\mathbf{u}}_b(t_{k+1}),$
$\mathbf{B}_{k+1} \approx \dfrac{1}{N-1}\sum_{i=1}^{N} \boldsymbol{\xi}_b^{(i)}(t_{k+1})\, \boldsymbol{\xi}_b^{(i)}(t_{k+1})^T,$
which provides an approximation for the background covariance at t k + 1 without actually propagating the covariance matrix, as is the case in standard Kalman filtering.
An ensemble of observations $\{\mathbf{w}^{(i)}(t_{k+1})\,|\, 1 \le i \le N\}$, also called virtual observations, is created assuming a Gaussian distribution with a mean equal to the actual observation $\mathbf{w}(t_{k+1})$ and a covariance matrix $\mathbf{R}_{k+1}$. In other words, random Gaussian perturbations with zero mean and covariance matrix $\mathbf{R}_{k+1}$ are added to the actual measurements to create perturbed measurements. The Kalman gain matrix can be computed as before (repeated here for completeness),
$\mathbf{K}_{k+1} = \mathbf{B}_{k+1}\, D_h(\mathbf{u}_b(t_{k+1}))^T\big(D_h(\mathbf{u}_b(t_{k+1}))\, \mathbf{B}_{k+1}\, D_h(\mathbf{u}_b(t_{k+1}))^T + \mathbf{R}_{k+1}\big)^{-1}.$
Then, the analysis step is applied for each member in the ensemble cloud as below,
$\mathbf{u}_a^{(i)} = \mathbf{u}_b^{(i)}(t_{k+1}) + \mathbf{K}_{k+1}\big(\mathbf{w}^{(i)}(t_{k+1}) - h(\mathbf{u}_b^{(i)}(t_{k+1}))\big),$
and the analyzed estimate at t k + 1 is computed as the sample mean of the analysis ensemble along with its covariance matrix as follows,
$\mathbf{u}_a(t_{k+1}) \approx \bar{\mathbf{u}}_a(t_{k+1}) = \dfrac{1}{N}\sum_{i=1}^{N} \mathbf{u}_a^{(i)}(t_{k+1}),$
$\boldsymbol{\xi}_a^{(i)}(t_{k+1}) = \mathbf{u}_a^{(i)}(t_{k+1}) - \bar{\mathbf{u}}_a(t_{k+1}),$
$\mathbf{P}_{k+1} \approx \dfrac{1}{N-1}\sum_{i=1}^{N} \boldsymbol{\xi}_a^{(i)}(t_{k+1})\, \boldsymbol{\xi}_a^{(i)}(t_{k+1})^T.$
We observe that the EnKF algorithm provides approximations of the background and analysis covariance matrices without the need to evaluate the computationally expensive propagation equations. This comes at the expense of having to evolve an ensemble of system states. However, the ensemble size $N$ is usually much smaller than the system dimension $n$. Moreover, with parallelization and high performance computing (HPC) frameworks, the forecast step can be distributed efficiently and computational speed-ups can be achieved. A summary of the EnKF algorithm is described as follows,
Inputs: $\hat{\mathbf{u}}_b(t_k),\ \hat{\mathbf{B}}_k,\ M(\cdot;\cdot),\ \mathbf{Q}_{k+1},\ \mathbf{w}(t_{k+1}),\ \mathbf{R}_{k+1},\ h(\cdot)$.
Ensemble initialization: $\hat{\mathbf{u}}_b^{(i)}(t_k) = \hat{\mathbf{u}}_b(t_k) + \mathbf{e}_b^{(i)}$, $\quad \mathbf{e}_b^{(i)} \sim \mathcal{N}(\mathbf{0}, \hat{\mathbf{B}}_k)$.
Virtual observations: $\mathbf{w}^{(i)}(t_{k+1}) = \mathbf{w}(t_{k+1}) + \mathbf{e}_m^{(i)}$, $\quad \mathbf{e}_m^{(i)} \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_{k+1})$.
Forecast: $\mathbf{u}_b^{(i)}(t_{k+1}) = M(\hat{\mathbf{u}}_b^{(i)}(t_k); \boldsymbol{\theta}) + \boldsymbol{\xi}_p^{(i)}(t_{k+1})$, $\quad \boldsymbol{\xi}_b^{(i)}(t_{k+1}) = \mathbf{u}_b^{(i)}(t_{k+1}) - \bar{\mathbf{u}}_b(t_{k+1})$, $\quad \mathbf{B}_{k+1} \approx \frac{1}{N-1}\sum_{i=1}^{N} \boldsymbol{\xi}_b^{(i)}(t_{k+1})\, \boldsymbol{\xi}_b^{(i)}(t_{k+1})^T$.
Kalman gain: $\mathbf{K}_{k+1} = \mathbf{B}_{k+1}\, D_h(\mathbf{u}_b(t_{k+1}))^T\big(D_h(\mathbf{u}_b(t_{k+1}))\, \mathbf{B}_{k+1}\, D_h(\mathbf{u}_b(t_{k+1}))^T + \mathbf{R}_{k+1}\big)^{-1}$.
Analysis: $\mathbf{u}_a^{(i)}(t_{k+1}) = \mathbf{u}_b^{(i)}(t_{k+1}) + \mathbf{K}_{k+1}\big(\mathbf{w}^{(i)}(t_{k+1}) - h(\mathbf{u}_b^{(i)}(t_{k+1}))\big)$, $\quad \mathbf{u}_a(t_{k+1}) \approx \bar{\mathbf{u}}_a(t_{k+1})$, $\quad \boldsymbol{\xi}_a^{(i)}(t_{k+1}) = \mathbf{u}_a^{(i)}(t_{k+1}) - \bar{\mathbf{u}}_a(t_{k+1})$, $\quad \mathbf{P}_{k+1} \approx \frac{1}{N-1}\sum_{i=1}^{N} \boldsymbol{\xi}_a^{(i)}(t_{k+1})\, \boldsymbol{\xi}_a^{(i)}(t_{k+1})^T$,
and Listing 17 provides a Python implementation of the presented EnKF approach.
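The following is a condensed sketch of one (stochastic) EnKF cycle for an identity observation operator, using the hypothetical rk4/lorenz63-style helpers from the earlier sketches; it is not a verbatim copy of Listing 17.

import numpy as np

def enkf_step(ue, w, R, Q, dt, rhs):
    # One forecast + analysis cycle of the stochastic EnKF with h(u) = u.
    # ue is an (N, n) array holding the current ensemble.
    N, n = ue.shape
    uf = np.zeros((N, n))
    for i in range(N):                         # forecast each member with additive process noise
        uf[i] = rk4(rhs, ue[i], dt) + np.random.multivariate_normal(np.zeros(n), Q)
    ub = uf.mean(axis=0)                       # forecast ensemble mean
    Xb = (uf - ub).T                           # forecast anomalies, shape (n, N)
    B = Xb @ Xb.T / (N - 1)                    # ensemble background covariance
    K = B @ np.linalg.inv(B + R)               # Kalman gain with D_h = I
    ua = np.zeros((N, n))
    for i in range(N):                         # update each member with a perturbed (virtual) observation
        wi = w + np.random.multivariate_normal(np.zeros(n), R)
        ua[i] = uf[i] + K @ (wi - uf[i])
    return ua, ua.mean(axis=0)                 # analysis ensemble and its mean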
We highlight a few remarks regarding the EnKF as below,
  • Ensemble methods have gained significant popularity because of their simple conceptual formulation and relative ease of implementation. No optimization problem is required to be solved. They are considered non-intrusive in the sense that current solvers can be easily incorporated with minimal modification, as there is no need to derive model Jacobians or adjoint equations.
  • The analysis ensemble can be used as the initial ensemble for the next assimilation cycle (in which case, we need not compute $P_{k+1}$). Alternatively, a new ensemble can be built by sampling from a multivariate Gaussian distribution with a mean of $\bar{u}_a(t_{k+1})$ and a covariance matrix of $P_{k+1}$ (i.e., using $\bar{u}_a(t_{k+1})$ and $P_{k+1}$ in lieu of $\hat{u}_b(t_{k+1})$ and $\hat{B}_{k+1}$, respectively).
  • After the virtual observations are generated, an ensemble-based measurement error covariance matrix can be computed from the applied perturbations and used as an alternative to the prescribed one [17]. This is especially valuable when the actual measurement noise covariance matrix is poorly known.
  • Perturbed observations are needed in the EnKF derivation and guarantee that the posterior (analysis) covariance is not underestimated. For instance, in the case of small corrections to the forecast, the traditional EnKF without virtual observations yields an error covariance that is about half of what is needed to match the Kalman filter [30]. In other words, the use of virtual observations forces the ensemble posterior covariance to match that of the standard Kalman filter in the limit of very large N. Thus, the same Kalman gain matrix relation is borrowed from the standard Kalman filter.
  • Instead of assuming virtual observations, alternative formulations of ensemble Kalman filters have been proposed in the literature, giving a family of deterministic ensemble Kalman filters (DEnKF), as opposed to the aforementioned (stochastic) ensemble Kalman filter (EnKF). One such variant is briefly discussed in Section 8.1.

8.1. Deterministic Ensemble Kalman Filter

The use of an ensemble of perturbed observations in the EnKF leads to a match between the analysis error covariance and its theoretical value given by the Kalman filter. However, this match holds only in a statistical sense and only when the ensemble size is large. Unfortunately, the perturbation of observations introduces sampling error, which renders the filter suboptimal, particularly for small ensembles [31]. Alternative formulations that do not require virtual observations can be found in the literature, including ensemble square root filters [31,32]. We focus here on a simple formulation proposed by Sakov and Oke [30] that maintains the numerical effectiveness and simplicity of the EnKF without the need for virtual observations, denoted as the deterministic ensemble Kalman filter (DEnKF).
Without measurement perturbation, it can be derived that the resulting analysis error covariance matrix is given as follows,
$$ P_{k+1} = \left( I_n - K_{k+1} Dh(u_b(t_{k+1})) \right) B_{k+1} \left( I_n - K_{k+1} Dh(u_b(t_{k+1})) \right)^T = B_{k+1} - K_{k+1} Dh(u_b(t_{k+1})) B_{k+1} - B_{k+1} Dh(u_b(t_{k+1}))^T K_{k+1}^T + K_{k+1} Dh(u_b(t_{k+1})) B_{k+1} Dh(u_b(t_{k+1}))^T K_{k+1}^T. $$
With the definition of the Kalman gain, it can be seen that $K_{k+1} Dh(u_b(t_{k+1})) B_{k+1} = B_{k+1} Dh(u_b(t_{k+1}))^T K_{k+1}^T$. Thus,
$$ P_{k+1} = B_{k+1} - 2 K_{k+1} Dh(u_b(t_{k+1})) B_{k+1} + K_{k+1} Dh(u_b(t_{k+1})) B_{k+1} Dh(u_b(t_{k+1}))^T K_{k+1}^T. $$
For small values of $K_{k+1} Dh(u_b(t_{k+1}))$, neglecting the quadratic term, this form reduces to $P_{k+1} = B_{k+1} - 2 K_{k+1} Dh(u_b(t_{k+1})) B_{k+1}$. By halving the Kalman gain, this asymptotically matches the theoretical value of the analysis covariance matrix in standard Kalman filtering (i.e., $P_{k+1} = (I_n - K_{k+1} Dh(u_b(t_{k+1}))) B_{k+1} = B_{k+1} - K_{k+1} Dh(u_b(t_{k+1})) B_{k+1}$). Therefore, it can be argued that the DEnKF linearly recovers the theoretical analysis error covariance matrix. This is achieved by applying the analysis equation to the forecast mean $u_b(t_{k+1}) \approx \bar{u}_b(t_{k+1})$ with the standard Kalman gain matrix, and separately to the ensemble of anomalies $\xi_b^{(i)}(t_{k+1}) = u_b^{(i)}(t_{k+1}) - \bar{u}_b(t_{k+1})$ using half of the standard Kalman gain matrix. These steps are summarized as follows,
Inputs: $\hat{u}_b(t_k)$, $\hat{B}_k$, $M(\cdot\,;\cdot)$, $Q_{k+1}$, $w(t_{k+1})$, $R_{k+1}$, $h(\cdot)$
Ensemble initialization: $\hat{u}_b^{(i)}(t_k) = \hat{u}_b(t_k) + e_b^{(i)}$, where $e_b^{(i)} \sim \mathcal{N}(0, \hat{B}_k)$
Forecast:
$u_b^{(i)}(t_{k+1}) = M(\hat{u}_b^{(i)}(t_k); \theta) + \xi_p^{(i)}(t_{k+1})$
$u_b(t_{k+1}) \approx \bar{u}_b(t_{k+1}) = \frac{1}{N} \sum_{i=1}^{N} u_b^{(i)}(t_{k+1})$
$\xi_b^{(i)}(t_{k+1}) = u_b^{(i)}(t_{k+1}) - \bar{u}_b(t_{k+1})$
$B_{k+1} \approx \frac{1}{N-1} \sum_{i=1}^{N} \xi_b^{(i)}(t_{k+1}) \, \xi_b^{(i)}(t_{k+1})^T$
Kalman gain: $K_{k+1} = B_{k+1} Dh(u_b(t_{k+1}))^T \left[ Dh(u_b(t_{k+1})) B_{k+1} Dh(u_b(t_{k+1}))^T + R_{k+1} \right]^{-1}$
Analysis:
$u_a(t_{k+1}) = u_b(t_{k+1}) + K_{k+1} \left[ w(t_{k+1}) - h(u_b(t_{k+1})) \right]$
$\xi_a^{(i)}(t_{k+1}) = \xi_b^{(i)}(t_{k+1}) - \frac{1}{2} K_{k+1} \, h(\xi_b^{(i)}(t_{k+1}))$
$u_a^{(i)}(t_{k+1}) = u_a(t_{k+1}) + \xi_a^{(i)}(t_{k+1})$
$P_{k+1} \approx \frac{1}{N-1} \sum_{i=1}^{N} \xi_a^{(i)}(t_{k+1}) \, \xi_a^{(i)}(t_{k+1})^T$
and Listing 18 provides a Python implementation of the presented DEnKF approach. Note that, in contrast to the EnKF, no ensemble of observations is created in this case.
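As a complement to the listing, a minimal sketch of the DEnKF analysis step for a linear observation operator H is shown below; the names are illustrative, and the forecast step is identical to that of the stochastic EnKF.

```python
import numpy as np

def denkf_analysis(uf, w, R, H):
    """DEnKF analysis (a sketch): update the mean with the full Kalman gain and
    the anomalies with half the gain, so no virtual observations are needed.
    uf: (n, N) forecast ensemble, w: (m,) observation,
    R : (m, m) measurement noise covariance, H: (m, n) linear observation operator."""
    n, N = uf.shape
    ufm = uf.mean(axis=1)
    Xf = uf - ufm.reshape(-1, 1)                 # ensemble anomalies
    B = (Xf @ Xf.T) / (N - 1)                    # background covariance

    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)

    uam = ufm + K @ (w - H @ ufm)                # analyzed mean (full gain)
    Xa = Xf - 0.5 * K @ (H @ Xf)                 # analyzed anomalies (half gain)
    return uam.reshape(-1, 1) + Xa, uam          # analysis ensemble and its mean
```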

8.2. Example: Lorenz 63 System

The same Lorenz 63 system is used to showcase the performance of both the EnKF and DEnKF. In Listing 19, we show the Python application of the EnKF algorithm. In general, the size of the ensemble N is much smaller than the state dimension n for the implementation of the EnKF to be computationally feasible. However, the state dimension of the Lorenz 63 system is 3, and an ensemble of size 3 or less is trivial. The uncertainty in the covariance approximation via the Monte Carlo framework with such a small ensemble becomes very high, and the resulting predictions are unreliable. There exist various approaches that help to increase the fidelity of small ensembles, including localization and inflation. In Section 9.2, we describe a simple application of an inflation factor and its impact with small ensembles. Here, we stick with the basic implementation with an ensemble size of 10 for both the EnKF and DEnKF.
Figure 5 shows the EnKF results for the Lorenz 63 system. We see that the analysis trajectory is close to the true one and more accurate than the background. Readers are encouraged to play with the codes to explore the effect of increasing or decreasing the ensemble size with different levels of noise. Furthermore, different observation operators can be defined (for instance, observe only 1 or 2 variables, or assume some nonlinear function h ( · ) ).
For the sake of completeness, we also plot the DEnKF predictions in Figure 6. The same implementation in Listing 19 can be adopted, but calling the DEnKF function framed in Listing 18 instead of the EnKF. Although different approaches might give slightly dissimilar results, we are not trying to benchmark them in this introductory presentation, since we only show very simple implementations with idealized twin experiments.

9. Applications

In this section, we gradually increase the complexity of the test cases using fluid dynamics applications. In Section 9.1, we slightly increase the dimensionality of the system from 3 (as in the Lorenz 63 system) to 36 using the Lorenz 96 system and demonstrate the capability of DA algorithms to treat uncertainty in initial conditions. This is further extended in Section 9.2, where we show that DA can recover the hidden underlying processes and provide closure effects using the two-level variant of the Lorenz 96 model. We also introduce the use of an inflation factor and its role in mitigating ensemble collapse and accounting for the slight under-representation of covariance caused by the use of a small ensemble in the EnKF and DEnKF. In Section 9.3, we illustrate the application of DA to systems governed by partial differential equations (PDEs) using the Kuramoto–Sivashinsky equation. We highlight that, for each application, we only show results for a few selected algorithms; extensions to the other approaches covered in this tutorial are left to readers as computer projects. We also emphasize that we are demonstrating the implementation and capabilities of the presented DA algorithms, not assessing their performance nor benchmarking different approaches against each other.

9.1. Lorenz 96 System

The Lorenz 96 model [33] is a system of ordinary differential equations that describes an arbitrary atmospheric quantity as it evolves on a circular array of sites, undergoing forcing, dissipation, and rotation invariant advection [34]. The Lorenz 96 dynamical model can be written as
$$ \frac{dX_i}{dt} = \left( X_{i+1} - X_{i-2} \right) X_{i-1} - X_i + F, \quad i = 1, 2, \ldots, n, $$
where $X_i$ is the state of the system at the $i$th location and $F$ represents a forcing constant. Periodicity is enforced by assuming that $X_{-1} = X_{n-1}$, $X_0 = X_n$, and $X_{n+1} = X_1$. In the present study, we use $n = 36$ and a forcing term of $F = 8$. In order to obtain a valid initial condition, we begin at $t = -5$ with the equilibrium condition ($X_i = F$ for $i = 1, 2, \ldots, n$) and add a small perturbation to the 20th state variable as $X_{20} = F + 0.01$. Then, the ODE integrator is run up to $t = 0$, and the solution at $t = 0$ is treated as the true initial condition for our twin experiment. We assume an erroneous background initial condition by contaminating the true one with Gaussian noise with zero mean and a standard deviation of 1.
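A compact sketch of the Lorenz 96 right-hand side and an RK4 time stepper, with periodicity handled through array rolling, might look as follows; the names are illustrative and may differ from those used in our repository.

```python
import numpy as np

def lorenz96(X, F=8.0):
    """Lorenz 96 right-hand side with periodic indexing via np.roll."""
    return (np.roll(X, -1) - np.roll(X, 2)) * np.roll(X, 1) - X + F

def rk4_step(f, X, dt, *args):
    """One classical fourth-order Runge-Kutta step for dX/dt = f(X)."""
    k1 = f(X, *args)
    k2 = f(X + 0.5 * dt * k1, *args)
    k3 = f(X + 0.5 * dt * k2, *args)
    k4 = f(X + dt * k3, *args)
    return X + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

# Spin-up sketch: start from the perturbed equilibrium at t = -5 and integrate
# 500 steps of size 0.01 to reach the true initial condition at t = 0.
n, F, dt = 36, 8.0, 0.01
X = F * np.ones(n)
X[19] = F + 0.01                      # perturb the 20th state variable
for _ in range(500):
    X = rk4_step(lorenz96, X, dt, F)
```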
A total time window of 20 time units is considered with a time step of $\Delta t = 0.01$, and the fourth-order Runge–Kutta (RK4) scheme is adopted for time integration. Synthetic measurements are collected every 0.2 time units (i.e., every 20 time integration steps), sampled at 9 equidistant locations (i.e., at $i \in \{4, 8, 12, \ldots, 36\}$) from the true trajectory, assuming that the sensors add white noise with zero mean and a standard deviation of 0.1. We also assume a process noise drawn from a multivariate Gaussian distribution with zero mean and covariance matrix $Q = 0.1^2 I_{36}$. We first apply the EKF approach to correct the solution trajectory, which yields very good results as shown in Figure 7. For visualization, we only plot the time evolution of $X_9$, $X_{18}$, and $X_{36}$. We see that the Lorenz 96 system is sensitive to the initial conditions, and a small perturbation is sufficient to produce a very different trajectory (e.g., the background solution).
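The sparse, noisy measurements of this twin experiment can be mimicked with a simple linear selection operator; the snippet below is a sketch with illustrative names.

```python
import numpy as np

n = 36
obs_idx = np.arange(3, n, 4)            # 0-based indices of X_4, X_8, ..., X_36
H = np.eye(n)[obs_idx, :]               # linear observation operator, shape (9, 36)
sig_m = 0.1                             # measurement noise standard deviation
rng = np.random.default_rng()

def observe(X_true):
    """Sample the true state at 9 equidistant locations and add white noise."""
    return H @ X_true + sig_m * rng.normal(size=obs_idx.size)
```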
The second approach we test is the stochastic version of the EnKF. We create an ensemble of 50 members to approximate the covariance matrices. Results are depicted in Figure 8 for $X_9$, $X_{18}$, and $X_{36}$. We highlight that observations appear only in the $X_{36}$ plot because observations are collected at $i = 4, 8, 12, \ldots, 36$ (neither $X_9$ nor $X_{18}$ is measured). We note that, generally speaking, increasing the size of the ensemble improves the predictions. However, this comes at the expense of solving the forward nonlinear model for each added member. Thus, a compromise between accuracy and computational burden has to be made.

9.2. Two-Level Lorenz 96 System

In this section, we describe the two-level variant of the Lorenz 96 model proposed by Lorenz [33]. The two-level Lorenz 96 model can be written as
$$ \frac{dX_i}{dt} = -X_{i-1} \left( X_{i-2} - X_{i+1} \right) - X_i - \frac{hc}{b} \sum_{j=1}^{J} Y_{j,i} + F, $$
$$ \frac{dY_{j,i}}{dt} = -cb \, Y_{j+1,i} \left( Y_{j+2,i} - Y_{j-1,i} \right) - c \, Y_{j,i} + \frac{hc}{b} X_i, $$
where Equation (62) represents the evolution of slow, high-amplitude variables X i ( i = 1 , , n ) , and Equation (63) provides the evolution of a coupled fast, low-amplitude variable Y j , i ( j = 1 , , J ) . We use n = 36 and J = 10 in our computational experiments. We utilize c = 10 and b = 10 , which implies that the small scales fluctuate 10 times faster than the larger scales. Furthermore, the coupling coefficient h between two scales is equal to 1 and the forcing is set at F = 10 to make both variables exhibit the chaotic behavior.
We utilize the fourth-order Runge–Kutta numerical scheme with a time step $\Delta t = 0.001$ for temporal integration of the two-level Lorenz 96 model. We apply the periodic boundary condition for the slow variables, i.e., $X_{i-n} = X_{i+n} = X_i$. The fast variables are extended by letting $Y_{j,i-n} = Y_{j,i+n} = Y_{j,i}$, $Y_{j-J,i} = Y_{j,i-1}$, and $Y_{j+J,i} = Y_{j,i+1}$. The physical initial condition is computed by starting with an equilibrium condition at time $t = -5$ for the slow variables, given by $X_i = F$ for $i \in \{1, 2, \ldots, n\}$. We perturb the equilibrium solution for the 18th state variable as $X_{18} = F + 0.01$. At time $t = -5$, the fast variables are assigned random numbers between $-F/10$ and $F/10$. We integrate the two-level Lorenz 96 model by solving both Equations (62) and (63) in a coupled manner up to time $t = 0$. The solution at time $t = 0$ represents the true initial condition. For our twin experiment, we obtain observations by adding noise drawn from a Gaussian distribution with zero mean and $\sigma_o^2 = 1.0$. We assume that the observations are sparse in space and are collected at every 10th time step.
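A sketch of the coupled right-hand sides, with the fast variables stored as a single flattened vector so that the periodic extension rules above reduce to cyclic indexing, could read as follows (illustrative names).

```python
import numpy as np

def two_level_l96(X, Y, F=10.0, h=1.0, c=10.0, b=10.0):
    """Right-hand sides of the two-level Lorenz 96 model (a sketch).
    X: (n,) slow variables; Y: (n*J,) fast variables flattened so that the J
    fast variables attached to X_1 come first, then those of X_2, and so on.
    With this layout, the extension rules above make Y a single cyclic chain."""
    n = X.size
    J = Y.size // n
    coupling = h * c / b

    # Slow variables: advection, damping, forcing, and coupling to the fast scales
    Ysum = Y.reshape(n, J).sum(axis=1)
    dX = (np.roll(X, -1) - np.roll(X, 2)) * np.roll(X, 1) - X - coupling * Ysum + F

    # Fast variables on the cyclic chain of length n*J
    dY = (-c * b * np.roll(Y, -1) * (np.roll(Y, -2) - np.roll(Y, 1))
          - c * Y + coupling * np.repeat(X, J))
    return dX, dY
```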
The motivation behind this example is to demonstrate how covariance inflation can be utilized to account for model error. Usually, imperfections in the forecast model are taken into account by adding Gaussian noise to the forecast model. Another method to account for model error is covariance inflation. It also helps in alleviating the effect of a finite number of ensemble members in practical data assimilation and addresses the problem of covariance underestimation in the EnKF algorithm. We use multiplicative inflation [35], where the ensemble members are pushed away from the ensemble mean by a given inflation factor; mathematically, it can be expressed as
$$ u_b^{(i)}(t_{k+1}) \leftarrow \bar{u}_b(t_{k+1}) + \lambda \left( u_b^{(i)}(t_{k+1}) - \bar{u}_b(t_{k+1}) \right), $$
where $\lambda$ is the inflation factor. The inflation factor can be a constant scalar over the entire domain at all time steps, or it can be space and time dependent.
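A minimal sketch of multiplicative inflation with a constant scalar factor, applied to the forecast ensemble before the analysis update (applying it to the analysis ensemble instead is an equally common choice), is given below.

```python
import numpy as np

def inflate(ensemble, lam=1.04):
    """Multiplicative covariance inflation (a sketch): push the members away
    from the ensemble mean by the factor lam >= 1. ensemble has shape (n, N)."""
    mean = ensemble.mean(axis=1, keepdims=True)
    return mean + lam * (ensemble - mean)
```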
In this example, we discard the parameterization of the fast variables in the forecast model. The forecast model for the two-level Lorenz system with no parameterization is equivalent to setting the coupling coefficient $h = 0$ in Equation (62), which reduces it to the one-level Lorenz 96 model presented in Section 9.1. We note here that the observations used for data assimilation are obtained by solving the two-level Lorenz 96 model in a coupled manner (i.e., without discarding the fast variables). Therefore, the effect of the unresolved scales is embedded in the observations. The parameterization of the fast variables (i.e., the $\frac{hc}{b} \sum_{j=1}^{J} Y_{j,i}$ term in Equation (62)) can be considered as added noise to the true state of the one-level Lorenz 96 model. Figure 9 displays the RMSE for the two-level Lorenz system when 18 observations are used for DA with different numbers of ensemble members and inflation factors for the EnKF and DEnKF algorithms. Figure 10 shows the full state trajectory of the two-level Lorenz system corresponding to the minimum RMSE, which is obtained with 50 ensemble members and an inflation factor $\lambda = 1.04$ for the EnKF algorithm. The parameters corresponding to the minimum RMSE for the DEnKF algorithm are 45 ensemble members and $\lambda = 1.05$.

9.3. Kuramoto–Sivashinsky Equation

In this section, we describe the Kuramoto–Sivashinsky (K-S) equation derived by Kuramoto [36], which is used as a turbulence model for different flows. The one-dimensional K-S equation can be written as
$$ \frac{\partial u}{\partial t} = -\nu \frac{\partial^4 u}{\partial x^4} - \frac{\partial^2 u}{\partial x^2} - u \frac{\partial u}{\partial x}, $$
where $\nu$ is the viscosity coefficient. The K-S equation is characterized by the second-order unstable diffusion term responsible for an instability at large scales, the fourth-order stabilizing viscous term that provides damping at small scales, and a quadratic nonlinear term which transfers energy between large and small scales. We use a computational domain extending from 0 to $L$, i.e., $x \in [0, L]$, and time $t \in [0, \infty)$. We impose the Dirichlet and Neumann boundary conditions given below,
$$ u(0, t) = u(L, t) = 0, $$
$$ \left. \frac{\partial u}{\partial x} \right|_{x=0} = \left. \frac{\partial u}{\partial x} \right|_{x=L} = 0. $$
We spatially discretize the domain with the grid size $\Delta x = L/(n-1)$, where $n$ is the number of degrees of freedom. We set $L = 50$ and $n = 129$ for our numerical experiments. The state of the system on the discretized grid is denoted as $u_i = u((i-1)\Delta x)$ for $i = 1, \ldots, n$. Using a second-order finite difference discretization, the discretized K-S equation can be written as
$$ \frac{du_i}{dt} = -\nu \frac{u_{i+2} - 4u_{i+1} + 6u_i - 4u_{i-1} + u_{i-2}}{\Delta x^4} - \frac{u_{i+1} - 2u_i + u_{i-1}}{\Delta x^2} - \frac{1}{2} \frac{u_{i+1}^2 - u_{i-1}^2}{2\Delta x}. $$
The first term on the right-hand side is computed by utilizing ghost nodes, whose values are assigned from the Neumann boundary condition. We impose $u_0 = u_2$ and $u_{n+1} = u_{n-1}$ using the second-order discretization of Equation (67) at the boundary points $u_1$ and $u_n$, respectively.
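The semi-discrete right-hand side implied by the discretization above, with the Dirichlet values held at the walls and ghost nodes supplying the Neumann condition, could be sketched as follows (illustrative names).

```python
import numpy as np

def ks_rhs(u, dx, nu=0.5):
    """Kuramoto-Sivashinsky right-hand side (a sketch) on a grid u_1, ..., u_n
    with u_1 = u_n = 0 held fixed and ghost nodes u_0 = u_2, u_{n+1} = u_{n-1}."""
    n = u.size
    ue = np.empty(n + 2)
    ue[1:-1] = u
    ue[0] = ue[2]                      # ghost node from du/dx = 0 at x = 0
    ue[-1] = ue[-3]                    # ghost node from du/dx = 0 at x = L

    r = np.zeros(n)
    i = np.arange(2, n)                # interior points u_2, ..., u_{n-1}
    lin4 = (ue[i + 2] - 4 * ue[i + 1] + 6 * ue[i] - 4 * ue[i - 1] + ue[i - 2]) / dx**4
    lin2 = (ue[i + 1] - 2 * ue[i] + ue[i - 1]) / dx**2
    nonl = (ue[i + 1]**2 - ue[i - 1]**2) / (4.0 * dx)
    r[1:-1] = -nu * lin4 - lin2 - nonl
    return r                           # boundary entries stay zero (Dirichlet)
```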
We use the fourth-order Runge–Kutta scheme for time integration with a time step $\Delta t = 0.25$. To generate an initial condition for the forward run, we start with an equilibrium condition at time $t = -50$ and integrate up to time $t = 0$. The equilibrium condition for the model is $u_i = 0.1$ for $i \in \{1, \ldots, n\}$. Once the true initial condition is generated, we run the forward solver up to time $t = 50$ and test the prediction capability of the sequential data assimilation algorithms over this forecast window. The K-S equation exhibits different levels of chaotic behavior depending on the value of the viscosity coefficient $\nu$; the chaos depends upon the bifurcation parameter $\tilde{L} = L/(2\pi\nu)$. We utilize $\nu = 1/2$, which represents less chaotic behavior. The observations for the twin experiments are obtained by adding noise to the true state of the system to account for experimental uncertainties and measurement errors. The observations are also sparse in time, meaning that the time interval between two observations can be different from the time step of the forecast model. For our twin experiments, we assume that observations are recorded at every 10th time step of the model, so the time difference between two observations is $\delta t = 2.5$ for the K-S equation. We present results for the EnKF algorithm with three sets of observations. The first set of observations is very sparse, covering only 12.5% of the full state of the system, and utilizes observations of the states $[u_8, u_{16}, \ldots, u_{128}] \in \mathbb{R}^{16}$. In the second set, we employ observations at $[u_4, u_8, \ldots, u_{128}] \in \mathbb{R}^{32}$ for the assimilation. The third set of observations covers 50% of the full state of the system, i.e., observations at the states $[u_2, u_4, \ldots, u_{128}] \in \mathbb{R}^{64}$. We apply $\sigma_o^2 = 1.0 \times 10^{-2}$ and $\sigma_i^2 = 1.0 \times 10^{-2}$ as the variances of the observation noise and the initial condition uncertainty, respectively.
In Figure 11, we present the time evolution of selected states for the three different numbers of observations included in the assimilation with the EnKF algorithm. There is an excellent agreement between the true and assimilated states $u_{51}$ and $u_{101}$, for which observations are not present. We also provide the full state trajectory of the K-S equation in Figure 12. The results obtained clearly indicate that the EnKF algorithm is able to determine the correct state trajectory even when the observation data are very sparse, i.e., $m = 16$. With an increase in the number of observations, the prediction of the full state trajectory gets smoother, and almost the exact state is recovered with 50% observations.

9.4. Quasi-Geostrophic (QG) Ocean Circulation Model

We consider a simple single-layer QG model to illustrate the application of sequential data assimilation to two-dimensional flows. Specifically, we use the deterministic ensemble Kalman filter (DEnKF) algorithm discussed in Section 8.1 to improve the prediction of the single-layer QG model. Wind-driven oceanic flows exhibit a vast range of spatio-temporal scales, and modeling these scales with all the relevant physics has always been challenging. The barotropic vorticity equation (BVE) with various dissipative and forcing terms is one of the most commonly used models for geostrophic flows [37,38]. The dimensionless vorticity-streamfunction formulation of the BVE [39] with forcing and dissipative terms can be written as
$$ \frac{\partial \omega}{\partial t} + J(\omega, \psi) - \frac{1}{Ro} \frac{\partial \psi}{\partial x} = \frac{1}{Re} \nabla^2 \omega + \frac{1}{Ro} \sin(\pi y), $$
where $\omega$ is the vorticity, $\psi$ is the streamfunction, $\nabla^2$ is the standard two-dimensional Laplacian operator, $Re$ is the Reynolds number, and $Ro$ is the Rossby number. The kinematic relation between vorticity and streamfunction is given by the following Poisson equation,
$$ \nabla^2 \psi = -\omega. $$
The nonlinear convection term is given by the Jacobian as follows
$$ J(\omega, \psi) = \frac{\partial \psi}{\partial y} \frac{\partial \omega}{\partial x} - \frac{\partial \psi}{\partial x} \frac{\partial \omega}{\partial y}. $$
The computational domain for the QG model is $(x, y) \in [0, 1] \times [-1, 1]$ and is discretized using a $128 \times 256$ grid resolution. Therefore, the QG model has a dimension of about $3.2 \times 10^4$. We utilize homogeneous Dirichlet boundary conditions for the vorticity and streamfunction at all boundaries. The vorticity and streamfunction are initialized from a quiescent state, i.e., $\omega|_{t=0} = \psi|_{t=0} = 0$. The QG model is numerically solved by discretizing Equation (69) using a second-order finite difference scheme. The nonlinear Jacobian term is discretized with the energy-conserving Arakawa scheme [40]. A third-order total-variation-diminishing Runge–Kutta scheme is used for the temporal integration, and a fast sine transform Poisson solver is utilized to update the streamfunction from the vorticity [41]. For the physical parameters, we use $Re = 100$ and $Ro = 1.75 \times 10^{-3}$.
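As an illustration of the energy- and enstrophy-conserving discretization mentioned above, a sketch of the second-order Arakawa Jacobian evaluated at interior points is given below, assuming the fields are stored with the first array index along x and the second along y; the names and layout are illustrative assumptions rather than the exact implementation used for the figures.

```python
import numpy as np

def arakawa_jacobian(w, s, dx, dy):
    """Second-order Arakawa discretization of J(w, s) = s_y w_x - s_x w_y
    (a sketch). Interior points only; boundary values are left at zero,
    consistent with the homogeneous Dirichlet conditions of the QG setup."""
    jac = np.zeros_like(w)
    c = slice(1, -1)                                    # interior slice
    j1 = ((w[2:, c] - w[:-2, c]) * (s[c, 2:] - s[c, :-2])
          - (w[c, 2:] - w[c, :-2]) * (s[2:, c] - s[:-2, c]))
    j2 = (w[2:, c] * (s[2:, 2:] - s[2:, :-2])
          - w[:-2, c] * (s[:-2, 2:] - s[:-2, :-2])
          - w[c, 2:] * (s[2:, 2:] - s[:-2, 2:])
          + w[c, :-2] * (s[2:, :-2] - s[:-2, :-2]))
    j3 = (w[2:, 2:] * (s[c, 2:] - s[2:, c])
          - w[:-2, :-2] * (s[:-2, c] - s[c, :-2])
          - w[:-2, 2:] * (s[c, 2:] - s[:-2, c])
          + w[2:, :-2] * (s[2:, c] - s[c, :-2]))
    # Average of the three forms, each scaled by 1/(4 dx dy)
    jac[c, c] = (j1 + j2 + j3) / (12.0 * dx * dy)
    return jac
```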
The QG model is integrated with a constant time step of $5 \times 10^{-5}$ from time $t = 0$ to $t = 0.25$ to generate the true initial condition at time $t = 0.25$. The data assimilation is then conducted from $t = 0.25$ to $t = 0.4$, with observations assimilated at every tenth time step. The synthetic observations are generated by sampling the vorticity field at 16 and 32 equidistant points in the $x$ and $y$ directions, respectively, and then adding Gaussian noise, i.e., $v_k \sim \mathcal{N}(0, R_k)$, where $R_k = \sigma_b^2 I$. We set the observation noise variance at $\sigma_b^2 = 5$. A typical vorticity and streamfunction field, along with the locations of the measurements, is shown in Figure 13.
We employ 20 ensemble members for the DEnKF algorithm. The initialization of the ensemble members is an important step to obtain accurate predictions with any type of EnKF algorithm. We initialize the different ensemble members by randomly selecting vorticity field snapshots between time $t = 0.24$ and $t = 0.25$. Other methods, such as adding random perturbations drawn from a Gaussian distribution to the true initial condition, can also be adopted. Figure 14 displays the true vorticity field and the predicted vorticity field at three different time instances, along with the difference (error) between the two. We can see that the true and analysis fields are similar at all time instances, and the magnitude of the error is also small. We recall that we observe only around 2% of the system (i.e., observations at $16 \times 32$ locations). With more observations, the quality of the results can be further improved.
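The snapshot-based ensemble initialization described above can be sketched in a couple of lines; the snapshot array layout and names here are assumptions.

```python
import numpy as np

def init_ensemble_from_snapshots(snapshots, N, rng=np.random.default_rng()):
    """Draw N initial members by randomly selecting stored vorticity snapshots
    (e.g., those saved between t = 0.24 and t = 0.25).
    snapshots: array of shape (num_snapshots, nx, ny)."""
    idx = rng.choice(snapshots.shape[0], size=N, replace=True)
    return snapshots[idx]
```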

10. Concluding Remarks

In this tutorial paper, we provided a 101 introduction to common data assimilation techniques. In particular, we briefly covered the relevant mathematical foundation and the algorithmic steps for three-dimensional variational (3DVAR), four-dimensional variational (4DVAR), forward sensitivity method (FSM), and Kalman filtering approaches. Since this is intended as a first exposure to DA, we focused on the simplest implementations that anybody can easily follow. For example, to treat nonlinearity (e.g., in 3DVAR, 4DVAR, FSM, and EKF), we only presented first-order Taylor expansions. We demonstrated the execution of the covered approaches with a series of Python modules that can be linked to each other easily. Again, we preferred to keep our codes as concise and simple as possible, even if this comes at the expense of computational efficiency. The Python codes used to generate this tutorial are publicly available through our GitHub repository at https://github.com/Shady-Ahmed/PyDA.
Since this is an introductory exploration, we should admit that we have bypassed a few important analyses and shortcut some key derivations. Interested readers are referred to well-established textbooks that offer in-depth discussions of various DA techniques [14,27,29,42,43,44,45,46,47]. Likewise, more advanced topics such as particle filters [48], maximum likelihood ensemble filters [49,50], optimal sensor placement [51,52], higher-order analysis of variational methods [53], and hybrid methods [54,55,56,57,58,59] are omitted from the current presentation.

Author Contributions

Data curation, S.E.A., S.P., and O.S.; Supervision, O.S.; Writing—original draft, S.E.A., and S.P.; and Writing—review and editing, S.E.A., S.P., and O.S. All authors have read and agreed to the published version of the manuscript.

Funding

This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Number DE-SC0019290. Omer San gratefully acknowledges their support. Disclaimer: This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Acknowledgments

We thank Sivaramakrishnan Lakshmivarahan for his insightful comments as well as his archival NPTEL lectures [60] on dynamic data assimilation that greatly helped us in developing the PyDA module. Special thanks go to Ionel Michael Navon for providing his lecture notes on data assimilation.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Navon, I.M. Data assimilation for numerical weather prediction: A review. In Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications; Springer: Berlin/Heidelberg, Germany, 2009; pp. 21–65.
  2. Blum, J.; Le Dimet, F.X.; Navon, I.M. Data assimilation for geophysical fluids. Handb. Numer. Anal. 2009, 14, 385–441.
  3. Le Dimet, F.X.; Navon, I.M.; Ştefănescu, R. Variational data assimilation: Optimization and optimal control. In Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications (Vol. III); Springer: Berlin/Heidelberg, Germany, 2017; pp. 1–53.
  4. Attia, A.; Sandu, A. DATeS: A highly extensible data assimilation testing suite v1.0. Geosci. Model Dev. 2019, 12, 629–649.
  5. Lorenc, A.C. Analysis methods for numerical weather prediction. Q. J. R. Meteorol. Soc. 1986, 112, 1177–1194.
  6. Parrish, D.F.; Derber, J.C. The National Meteorological Center’s spectral statistical-interpolation analysis system. Mon. Weather Rev. 1992, 120, 1747–1763.
  7. Courtier, P. Dual formulation of four-dimensional variational assimilation. Q. J. R. Meteorol. Soc. 1997, 123, 2449–2461.
  8. Rabier, F.; Järvinen, H.; Klinker, E.; Mahfouf, J.F.; Simmons, A. The ECMWF operational implementation of four-dimensional variational assimilation. I: Experimental results with simplified physics. Q. J. R. Meteorol. Soc. 2000, 126, 1143–1170.
  9. Elbern, H.; Schmidt, H.; Talagrand, O.; Ebel, A. 4D-variational data assimilation with an adjoint air quality model for emission analysis. Environ. Model. Softw. 2000, 15, 539–548.
  10. Courtier, P.; Thépaut, J.N.; Hollingsworth, A. A strategy for operational implementation of 4D-Var, using an incremental approach. Q. J. R. Meteorol. Soc. 1994, 120, 1367–1387.
  11. Lorenc, A.C.; Rawlins, F. Why does 4D-Var beat 3D-Var? Q. J. R. Meteorol. Soc. 2005, 131, 3247–3257.
  12. Gauthier, P.; Tanguay, M.; Laroche, S.; Pellerin, S.; Morneau, J. Extension of 3DVAR to 4DVAR: Implementation of 4DVAR at the Meteorological Service of Canada. Mon. Weather Rev. 2007, 135, 2339–2354.
  13. Lakshmivarahan, S.; Lewis, J.M. Forward sensitivity approach to dynamic data assimilation. Adv. Meteorol. 2010, 2010, 375615.
  14. Lakshmivarahan, S.; Lewis, J.M.; Jabrzemski, R. Forecast Error Correction Using Dynamic Data Assimilation; Springer: Cham, Switzerland, 2017.
  15. Houtekamer, P.L.; Mitchell, H.L. Data assimilation using an ensemble Kalman filter technique. Mon. Weather Rev. 1998, 126, 796–811.
  16. Burgers, G.; Jan van Leeuwen, P.; Evensen, G. Analysis scheme in the ensemble Kalman filter. Mon. Weather Rev. 1998, 126, 1719–1724.
  17. Evensen, G. The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn. 2003, 53, 343–367.
  18. Houtekamer, P.L.; Mitchell, H.L. A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Weather Rev. 2001, 129, 123–137.
  19. Houtekamer, P.L.; Mitchell, H.L. Ensemble Kalman filtering. Q. J. R. Meteorol. Soc. 2005, 131, 3269–3289.
  20. Treebushny, D.; Madsen, H. A new reduced rank square root Kalman filter for data assimilation in mathematical models. In Proceedings of the International Conference on Computational Science, Melbourne, Australia, 2 June 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 482–491.
  21. Buehner, M.; Malanotte-Rizzoli, P. Reduced-rank Kalman filters applied to an idealized model of the wind-driven ocean circulation. J. Geophys. Res. Ocean. 2003, 108.
  22. Lakshmivarahan, S.; Stensrud, D.J. Ensemble Kalman filter. IEEE Control Syst. Mag. 2009, 29, 34–46.
  23. Apte, A.; Hairer, M.; Stuart, A.; Voss, J. Sampling the posterior: An approach to non-Gaussian data assimilation. Phys. Nonlinear Phenom. 2007, 230, 50–64.
  24. Bocquet, M.; Pires, C.A.; Wu, L. Beyond Gaussian statistical modeling in geophysical data assimilation. Mon. Weather Rev. 2010, 138, 2997–3023.
  25. Vetra-Carvalho, S.; Van Leeuwen, P.J.; Nerger, L.; Barth, A.; Altaf, M.U.; Brasseur, P.; Kirchgessner, P.; Beckers, J.M. State-of-the-art stochastic data assimilation methods for high-dimensional non-Gaussian problems. Tellus Dyn. Meteorol. Oceanogr. 2018, 70, 1–43.
  26. Attia, A.; Moosavi, A.; Sandu, A. Cluster sampling filters for non-Gaussian data assimilation. Atmosphere 2018, 9, 213.
  27. Lewis, J.M.; Lakshmivarahan, S.; Dhall, S. Dynamic Data Assimilation: A Least Squares Approach; Cambridge University Press: Cambridge, UK, 2006; Volume 104.
  28. Evensen, G. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. Ocean. 1994, 99, 10143–10162.
  29. Evensen, G. Data Assimilation: The Ensemble Kalman Filter; Springer: Berlin/Heidelberg, Germany, 2009.
  30. Sakov, P.; Oke, P.R. A deterministic formulation of the ensemble Kalman filter: An alternative to ensemble square root filters. Tellus Dyn. Meteorol. Oceanogr. 2008, 60, 361–371.
  31. Whitaker, J.S.; Hamill, T.M. Ensemble data assimilation without perturbed observations. Mon. Weather Rev. 2002, 130, 1913–1924.
  32. Tippett, M.K.; Anderson, J.L.; Bishop, C.H.; Hamill, T.M.; Whitaker, J.S. Ensemble square root filters. Mon. Weather Rev. 2003, 131, 1485–1490.
  33. Lorenz, E.N. Predictability: A problem partly solved. In Proceedings of the Seminar on Predictability, Reading, UK, 9–11 September 1996; Volume 1.
  34. Kerin, J.; Engler, H. On the Lorenz ’96 Model and Some Generalizations. arXiv 2020, arXiv:2005.07767.
  35. Anderson, J.L.; Anderson, S.L. A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Weather Rev. 1999, 127, 2741–2758.
  36. Kuramoto, Y. Diffusion-induced chaos in reaction systems. Prog. Theor. Phys. Suppl. 1978, 64, 346–367.
  37. Majda, A.; Wang, X. Nonlinear Dynamics and Statistical Theories for Basic Geophysical Flows; Cambridge University Press: New York, NY, USA, 2006.
  38. Greatbatch, R.J.; Nadiga, B.T. Four-gyre circulation in a barotropic model with double-gyre wind forcing. J. Phys. Oceanogr. 2000, 30, 1461–1471.
  39. San, O.; Staples, A.E.; Wang, Z.; Iliescu, T. Approximate deconvolution large eddy simulation of a barotropic ocean circulation model. Ocean Model. 2011, 40, 120–132.
  40. Arakawa, A. Computational design for long-term numerical integration of the equations of fluid motion: Two-dimensional incompressible flow. Part I. J. Comput. Phys. 1997, 135, 103–114.
  41. Press, W.H.; Flannery, B.P.; Teukolsky, S.A.; Vetterling, W.T. Numerical Recipes; Cambridge University Press: New York, NY, USA, 1989.
  42. Cacuci, D.G.; Navon, I.M.; Ionescu-Bujor, M. Computational Methods for Data Evaluation and Assimilation; CRC Press: New York, NY, USA, 2013.
  43. Kalnay, E. Atmospheric Modeling, Data Assimilation and Predictability; Cambridge University Press: New York, NY, USA, 2003.
  44. Law, K.; Stuart, A.; Zygalakis, K. Data Assimilation: A Mathematical Introduction; Springer: Cham, Switzerland, 2015.
  45. Asch, M.; Bocquet, M.; Nodet, M. Data Assimilation: Methods, Algorithms, and Applications; SIAM: Philadelphia, PA, USA, 2016.
  46. Simon, D. Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches; John Wiley & Sons: Hoboken, NJ, USA, 2006.
  47. Labbe, R. Kalman and Bayesian Filters in Python; 2014.
  48. Van Leeuwen, P.J.; Künsch, H.R.; Nerger, L.; Potthast, R.; Reich, S. Particle filters for high-dimensional geoscience applications: A review. Q. J. R. Meteorol. Soc. 2019, 145, 2335–2365.
  49. Zupanski, M. Maximum likelihood ensemble filter: Theoretical aspects. Mon. Weather Rev. 2005, 133, 1710–1726.
  50. Zupanski, M.; Navon, I.M.; Zupanski, D. The Maximum Likelihood Ensemble Filter as a non-differentiable minimization algorithm. Q. J. R. Meteorol. Soc. 2008, 134, 1039–1050.
  51. Kang, W.; Xu, L. Optimal placement of mobile sensors for data assimilations. Tellus Dyn. Meteorol. Oceanogr. 2012, 64, 17133.
  52. Mons, V.; Chassaing, J.C.; Sagaut, P. Optimal sensor placement for variational data assimilation of unsteady flows past a rotationally oscillating cylinder. J. Fluid Mech. 2017, 823, 230–277.
  53. Le Dimet, F.X.; Navon, I.M.; Daescu, D.N. Second-order information in data assimilation. Mon. Weather Rev. 2002, 130, 629–648.
  54. Lorenc, A.C.; Bowler, N.E.; Clayton, A.M.; Pring, S.R.; Fairbairn, D. Comparison of hybrid-4DEnVar and hybrid-4DVar data assimilation methods for global NWP. Mon. Weather Rev. 2015, 143, 212–229.
  55. Desroziers, G.; Camino, J.T.; Berre, L. 4DEnVar: Link with 4D state formulation of variational assimilation and different possible implementations. Q. J. R. Meteorol. Soc. 2014, 140, 2097–2110.
  56. Wang, X.; Barker, D.M.; Snyder, C.; Hamill, T.M. A hybrid ETKF–3DVAR data assimilation scheme for the WRF model. Part I: Observing system simulation experiment. Mon. Weather Rev. 2008, 136, 5116–5131.
  57. Buehner, M.; Morneau, J.; Charette, C. Four-dimensional ensemble-variational data assimilation for global deterministic weather prediction. Nonlinear Process. Geophys. 2013, 20, 669–682.
  58. Kleist, D.T.; Ide, K. An OSSE-based evaluation of hybrid variational-ensemble data assimilation for the NCEP GFS. Part I: System description and 3D-hybrid results. Mon. Weather Rev. 2015, 143, 433–451.
  59. Kleist, D.T.; Ide, K. An OSSE-based evaluation of hybrid variational-ensemble data assimilation for the NCEP GFS. Part II: 4DEnVar and hybrid variants. Mon. Weather Rev. 2015, 143, 452–470.
  60. Lakshmivarahan, S. Video Lectures on Dynamic Data Assimilation; NPTEL Program; IIT Madras: Chennai, India, 2016; Available online: https://nptel.ac.in/courses/111/106/111106082/ (accessed on 28 November 2020).
Figure 1. Results of 3DVAR implementation for the Lorenz 63 system.
Figure 2. Results of 4DVAR implementation for the Lorenz 63 system.
Figure 3. Results of FSM implementation for the Lorenz 63 system.
Figure 4. EKF results for the Lorenz 63 system with the assumption of zero process noise.
Figure 5. EnKF results for the Lorenz 63 system with virtual observations.
Figure 6. DEnKF results for the Lorenz 63 system without virtual observations.
Figure 7. EKF results for the Lorenz 96 system. The trajectories of $X_9$, $X_{18}$, and $X_{36}$ are shown.
Figure 8. EnKF results for the Lorenz 96 system using an ensemble of 50 members.
Figure 9. RMSE for the two-level Lorenz model for different combinations of the number of ensemble members and the inflation factor.
Figure 10. Full state trajectory of the multiscale Lorenz 96 model with no parameterization in the forecast model. The EnKF algorithm uses the inflation factor $\lambda = 1.04$ and $N = 50$, and the DEnKF uses the inflation factor $\lambda = 1.05$ and $N = 45$. The observation data for both the EnKF and DEnKF algorithms are obtained by adding measurement noise to the exact solution of the two-level Lorenz 96 system.
Figure 11. Selected trajectories of the Kuramoto–Sivashinsky model ($\nu = 1/2$) with the analysis performed by the ensemble Kalman filter (EnKF) using observations from $m = 16$ (left), $m = 32$ (middle), and $m = 64$ (right) state variables at every 10th time step.
Figure 12. Full state trajectory of the Kuramoto–Sivashinsky model ($\nu = 1/2$) with the analysis performed by the ensemble Kalman filter (EnKF) using observations from $m = 16$ (left), $m = 32$ (middle), and $m = 64$ (right) state variables at every 10th time step.
Figure 13. A typical vorticity (left) and streamfunction (right) field for the single-layer QG model. The dots show the locations of observations.
Figure 14. Snapshots of the true vorticity field (left), the analysis estimate of the DEnKF algorithm (middle), and the difference (error) between the two fields (right) obtained for a particular run of the single-layer QG model. The snapshots of the vorticity field are plotted at $t = 0.3, 0.35, 0.4$ (from top to bottom).
Listing 1. Implementation of 3DVAR with a linear observation operator.
Listing 2. Implementation of 3DVAR with a nonlinear observation operator, using a first-order approximation.
Listing 3. A Python function for the Lorenz 63 dynamics.
Listing 4. Python functions for time integration using the first-order Euler and the fourth-order Runge–Kutta schemes.
Listing 5. Implementation of 3DVAR for the Lorenz 63 system.
Listing 6. Computation of the gradient of the cost functional in 4DVAR using the first-order adjoint algorithm.
Listing 7. A line-search Python function using the golden section search method.
Listing 8. Computation of the cost functional defined in Equation (13).
Listing 9. Implementation of 4DVAR using the first-order adjoint method for the Lorenz 63 system.
Listing 10. Python functions for computing the Jacobian $DM(u)$ of the discrete-time model map $M(u; \theta)$ using the first-order Euler and the fourth-order Runge–Kutta schemes with the chain rule.
Listing 11. A Python function for the Jacobian of the continuous-time Lorenz 63 dynamics.
Listing 12. Python function for computing the correction vector $\Delta u_0$ using the forward sensitivity method with a first-order approximation.
Listing 13. Implementation of the FSM for the Lorenz 63 system.
Listing 14. Implementation of the KF with linear dynamics and a linear observation operator.
Listing 15. Implementation of the (first-order) EKF with nonlinear dynamics and a nonlinear observation operator.
Listing 16. Implementation of the EKF for the Lorenz 63 system.
Listing 17. Implementation of the EnKF with virtual observations.
Listing 18. Implementation of the DEnKF without virtual observations.
Listing 19. Implementation of the EnKF for the Lorenz 63 system.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
