# PyDA: A Hands-On Introduction to Dynamical Data Assimilation with Python

## Abstract

## 1. Introduction

## 2. Preliminaries

#### 2.1. Notation

#### 2.2. Twin Experiment Framework

## 3. Three Dimensional Variational Data Assimilation

#### 3.1. Linear Case

#### 3.2. Nonlinear Case

#### 3.3. Example: Lorenz 63 System

## 4. Four Dimensional Variational Data Assimilation

#### Example: Lorenz 63 System

- In Listing 9, we use gradient descent to minimize the cost function. Readers are encouraged to try other optimization techniques (e.g., conjugate gradient) that converge faster.
- The learning rate can be determined with more efficient line-search methods than the simple golden-section search.
- The Lagrange multiplier method can be applied to solve the 4DVAR problem instead of the adjoint method; similar results should be obtained.
- The presented algorithm relies on the definition of the cost functional given in Equation (13), which is based on the discrepancy between the measurements and the model's predictions. When extra information is available, it can be incorporated into the cost functional. For instance, similar to Equation (4), a term that penalizes the correction magnitude can be added, weighted by the background covariance matrix. Furthermore, symmetries or other physical knowledge can be enforced as hard or weak constraints.
- The first-order adjoint algorithm requires the Jacobian ${\mathbf{D}}_{M}\left(\mathbf{u}\right)$ of the discrete-time model map $M(\mathbf{u};\theta )$. This can be obtained by substituting the model $f(\mathbf{u};\theta )$ into a time integration scheme, rearranging to write $M(\mathbf{u};\theta )$ as an explicit function of $\mathbf{u}$, and differentiating with respect to the components of $\mathbf{u}$. For the Lorenz 63 system with the first-order Euler scheme, this is an easy task. However, for higher-dimensional systems and more accurate time integrators, it becomes cumbersome. Instead, the chain rule can be used to compute ${\mathbf{D}}_{M}\left(\mathbf{u}\right)$ as presented in Listing 10, which takes as input the right-hand side of the continuous-time dynamics $f(\cdot ;\cdot )$ (described in Listing 3 for the Lorenz 63 system) as well as its Jacobian (given in Listing 11 for Lorenz 63).
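
The chain-rule construction of ${\mathbf{D}}_{M}\left(\mathbf{u}\right)$ mentioned in the last remark can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of Listings 3, 10, or 11: the function names (`lorenz63`, `lorenz63_jac`, `euler_map_jac`, `rk4_map`, `rk4_map_jac`, `fd_jac`) are hypothetical, and the Lorenz 63 parameters $\sigma=10$, $\rho=28$, $\beta=8/3$ are the commonly used values, assumed here. A central finite-difference Jacobian is included only as a sanity check.

```python
import numpy as np

# Assumed (standard) Lorenz 63 parameters; not taken from the article's listings.
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0

def lorenz63(u):
    """Right-hand side f(u) of the continuous-time Lorenz 63 system."""
    x, y, z = u
    return np.array([SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z])

def lorenz63_jac(u):
    """Jacobian of f with respect to u."""
    x, y, z = u
    return np.array([[-SIGMA, SIGMA, 0.0],
                     [RHO - z, -1.0, -x],
                     [y, x, -BETA]])

def euler_map_jac(df, u, dt):
    """Jacobian of the Euler map M(u) = u + dt*f(u)."""
    return np.eye(len(u)) + dt * df(u)

def rk4_map(f, u, dt):
    """One step of the classical fourth-order Runge-Kutta scheme."""
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def rk4_map_jac(f, df, u, dt):
    """Jacobian of the RK4 map, built stage by stage with the chain rule."""
    I = np.eye(len(u))
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    # d(k_i)/du = Df(stage argument) @ d(stage argument)/du
    dk1 = df(u)
    dk2 = df(u + 0.5 * dt * k1) @ (I + 0.5 * dt * dk1)
    dk3 = df(u + 0.5 * dt * k2) @ (I + 0.5 * dt * dk2)
    dk4 = df(u + dt * k3) @ (I + dt * dk3)
    return I + dt / 6.0 * (dk1 + 2.0 * dk2 + 2.0 * dk3 + dk4)

def fd_jac(M, u, eps=1e-6):
    """Central finite-difference Jacobian of a map M, used only for checking."""
    n = len(u)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (M(u + e) - M(u - e)) / (2.0 * eps)
    return J

u0 = np.array([1.0, 1.0, 1.0])
dt = 0.01
DM_rk4 = rk4_map_jac(lorenz63, lorenz63_jac, u0, dt)
DM_fd = fd_jac(lambda v: rk4_map(lorenz63, v, dt), u0)
print("max |chain rule - finite difference| =", np.abs(DM_rk4 - DM_fd).max())
```

The stage-by-stage form needs only $f$ and its Jacobian, so the same two functions work for any system once those are supplied; no explicit expression for $M(\mathbf{u};\theta )$ has to be derived by hand.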

## 5. Forward Sensitivity Method

#### Example: Lorenz 63 System

## 6. Kalman Filtering

## 7. Extended Kalman Filter

#### Example: Lorenz 63 System

## 8. Ensemble Kalman Filter

- Ensemble methods have gained significant popularity because of their simple conceptual formulation and relative ease of implementation. No optimization problem needs to be solved. They are considered non-intrusive in the sense that existing solvers can be incorporated with minimal modification, as there is no need to derive model Jacobians or adjoint equations.
- The analysis ensemble can be used as the initial ensemble for the next assimilation cycle (in which case, we need not compute ${\mathbf{P}}_{k+1}$). Alternatively, a new ensemble can be built by sampling from a multivariate Gaussian distribution with mean ${\mathbf{u}}_{a}\left({t}_{k+1}\right)$ and covariance matrix ${\mathbf{P}}_{k+1}$ (i.e., using ${\mathbf{u}}_{a}\left({t}_{k+1}\right)$ and ${\mathbf{P}}_{k+1}$ in lieu of ${\widehat{\mathbf{u}}}_{b}\left({t}_{k+1}\right)$ and ${\widehat{\mathbf{B}}}_{k+1}$, respectively).
- Once the virtual observations are generated, an ensemble measurement error covariance matrix can be computed from them and used as an alternative to the actual one [17]. This is especially valuable when the actual measurement noise covariance matrix is poorly known.
- Perturbed observations are needed in the EnKF derivation and guarantee that the posterior (analysis) covariance is not underestimated. For instance, in the case of small corrections to the forecast, the traditional EnKF without virtual observations reduces the error covariance by about twice the amount needed to match the Kalman filter [30]. In other words, the use of virtual observations forces the ensemble posterior covariance to match that of the standard Kalman filter in the limit of very large $N$. Thus, the same Kalman gain relation is borrowed from the standard Kalman filter.
- Instead of assuming virtual observations, alternative formulations of the ensemble Kalman filter have been proposed in the literature, giving a family of deterministic ensemble Kalman filters (DEnKF), as opposed to the aforementioned (stochastic) ensemble Kalman filter (EnKF). One such variant is briefly discussed in Section 8.1.
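
The role of the perturbed (virtual) observations described in the remarks above can be illustrated with a small numerical sketch. It is not the article's EnKF listing: the function name `enkf_analysis`, the linear observation operator `H`, the noise covariance `R`, the ensemble size, and the synthetic forecast ensemble are all assumptions chosen for illustration. The sketch applies the same ensemble-based Kalman gain with and without perturbing the observation and compares the resulting analysis covariances with the Kalman filter value $(\mathbf{I}-\mathbf{K}\mathbf{H})\mathbf{B}$.

```python
import numpy as np

def enkf_analysis(Xb, w, H, R, rng, perturb=True):
    """One EnKF analysis step with a linear observation operator (illustrative).

    Xb : (n, N) forecast ensemble, one member per column
    w  : (m,)   observation vector
    H  : (m, n) linear observation operator (assumed linear here)
    R  : (m, m) measurement noise covariance
    """
    N = Xb.shape[1]
    A = Xb - Xb.mean(axis=1, keepdims=True)       # forecast anomalies
    B = A @ A.T / (N - 1)                         # ensemble forecast covariance
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)  # Kalman gain from ensemble statistics
    if perturb:
        # virtual observations: one noisy copy of w per ensemble member
        noise = rng.multivariate_normal(np.zeros(len(w)), R, size=N).T
        W = w[:, None] + noise
    else:
        # every member assimilates the same, unperturbed observation
        W = np.tile(w[:, None], (1, N))
    Xa = Xb + K @ (W - H @ Xb)                    # member-wise analysis update
    return Xa, K, B

def ensemble_cov(X):
    """Sample covariance of an ensemble stored column-wise."""
    A = X - X.mean(axis=1, keepdims=True)
    return A @ A.T / (X.shape[1] - 1)

rng = np.random.default_rng(0)
n, N = 3, 20000                                   # large N to approach the limit
Xb = rng.standard_normal((n, N))                  # synthetic forecast ensemble
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])                   # observe first and third states
R = 0.5 * np.eye(2)
w = np.array([0.3, -0.2])                         # made-up observation

Xa_pert, K, B = enkf_analysis(Xb, w, H, R, rng, perturb=True)
Xa_det, _, _ = enkf_analysis(Xb, w, H, R, rng, perturb=False)

P_kf = (np.eye(n) - K @ H) @ B                    # Kalman filter analysis covariance
print("trace, Kalman filter        :", np.trace(P_kf))
print("trace, perturbed obs        :", np.trace(ensemble_cov(Xa_pert)))
print("trace, no perturbed obs     :", np.trace(ensemble_cov(Xa_det)))
```

Without perturbations, the analysis anomalies are $(\mathbf{I}-\mathbf{K}\mathbf{H})$ times the forecast anomalies, so the ensemble covariance becomes $(\mathbf{I}-\mathbf{K}\mathbf{H})\mathbf{B}{(\mathbf{I}-\mathbf{K}\mathbf{H})}^{T}$, which falls below the Kalman filter value. With perturbations, the missing $\mathbf{K}\mathbf{R}{\mathbf{K}}^{T}$ contribution is restored up to sampling error.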

#### 8.1. Deterministic Ensemble Kalman Filter

#### 8.2. Example: Lorenz 63 System

## 9. Applications

#### 9.1. Lorenz 96 System

#### 9.2. Two-Level Lorenz 96 System

#### 9.3. Kuramoto–Sivashinsky

#### 9.4. Quasi-Geostrophic (QG) Ocean Circulation Model

## 10. Concluding Remarks

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

1. Navon, I.M. Data assimilation for numerical weather prediction: A review. In Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications; Springer: Berlin/Heidelberg, Germany, 2009; pp. 21–65.
2. Blum, J.; Le Dimet, F.X.; Navon, I.M. Data assimilation for geophysical fluids. Handb. Numer. Anal. **2009**, 14, 385–441.
3. Le Dimet, F.X.; Navon, I.M.; Ştefănescu, R. Variational data assimilation: Optimization and optimal control. In Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications (Vol. III); Springer: Berlin/Heidelberg, Germany, 2017; pp. 1–53.
4. Attia, A.; Sandu, A. DATeS: A highly extensible data assimilation testing suite v1.0. Geosci. Model Dev. **2019**, 12, 629–649.
5. Lorenc, A.C. Analysis methods for numerical weather prediction. Q. J. R. Meteorol. Soc. **1986**, 112, 1177–1194.
6. Parrish, D.F.; Derber, J.C. The National Meteorological Center’s spectral statistical-interpolation analysis system. Mon. Weather Rev. **1992**, 120, 1747–1763.
7. Courtier, P. Dual formulation of four-dimensional variational assimilation. Q. J. R. Meteorol. Soc. **1997**, 123, 2449–2461.
8. Rabier, F.; Järvinen, H.; Klinker, E.; Mahfouf, J.F.; Simmons, A. The ECMWF operational implementation of four-dimensional variational assimilation. I: Experimental results with simplified physics. Q. J. R. Meteorol. Soc. **2000**, 126, 1143–1170.
9. Elbern, H.; Schmidt, H.; Talagrand, O.; Ebel, A. 4D-variational data assimilation with an adjoint air quality model for emission analysis. Environ. Model. Softw. **2000**, 15, 539–548.
10. Courtier, P.; Thépaut, J.N.; Hollingsworth, A. A strategy for operational implementation of 4D-Var, using an incremental approach. Q. J. R. Meteorol. Soc. **1994**, 120, 1367–1387.
11. Lorenc, A.C.; Rawlins, F. Why does 4D-Var beat 3D-Var? Q. J. R. Meteorol. Soc. **2005**, 131, 3247–3257.
12. Gauthier, P.; Tanguay, M.; Laroche, S.; Pellerin, S.; Morneau, J. Extension of 3DVAR to 4DVAR: Implementation of 4DVAR at the Meteorological Service of Canada. Mon. Weather Rev. **2007**, 135, 2339–2354.
13. Lakshmivarahan, S.; Lewis, J.M. Forward sensitivity approach to dynamic data assimilation. Adv. Meteorol. **2010**, 2010, 375615.
14. Lakshmivarahan, S.; Lewis, J.M.; Jabrzemski, R. Forecast Error Correction Using Dynamic Data Assimilation; Springer: Cham, Switzerland, 2017.
15. Houtekamer, P.L.; Mitchell, H.L. Data assimilation using an ensemble Kalman filter technique. Mon. Weather Rev. **1998**, 126, 796–811.
16. Burgers, G.; Jan van Leeuwen, P.; Evensen, G. Analysis scheme in the ensemble Kalman filter. Mon. Weather Rev. **1998**, 126, 1719–1724.
17. Evensen, G. The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean. Dyn. **2003**, 53, 343–367.
18. Houtekamer, P.L.; Mitchell, H.L. A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Weather Rev. **2001**, 129, 123–137.
19. Houtekamer, P.L.; Mitchell, H.L. Ensemble Kalman filtering. Q. J. R. Meteorol. Soc. **2005**, 131, 3269–3289.
20. Treebushny, D.; Madsen, H. A new reduced rank square root Kalman filter for data assimilation in mathematical models. In Proceedings of the International Conference on Computational Science, Melbourne, Australia, 2 June 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 482–491.
21. Buehner, M.; Malanotte-Rizzoli, P. Reduced-rank Kalman filters applied to an idealized model of the wind-driven ocean circulation. J. Geophys. Res. Ocean. **2003**, 108.
22. Lakshmivarahan, S.; Stensrud, D.J. Ensemble Kalman filter. IEEE Control. Syst. Mag. **2009**, 29, 34–46.
23. Apte, A.; Hairer, M.; Stuart, A.; Voss, J. Sampling the posterior: An approach to non-Gaussian data assimilation. Phys. D Nonlinear Phenom. **2007**, 230, 50–64.
24. Bocquet, M.; Pires, C.A.; Wu, L. Beyond Gaussian statistical modeling in geophysical data assimilation. Mon. Weather Rev. **2010**, 138, 2997–3023.
25. Vetra-Carvalho, S.; Van Leeuwen, P.J.; Nerger, L.; Barth, A.; Altaf, M.U.; Brasseur, P.; Kirchgessner, P.; Beckers, J.M. State-of-the-art stochastic data assimilation methods for high-dimensional non-Gaussian problems. Tellus A Dyn. Meteorol. Oceanogr. **2018**, 70, 1–43.
26. Attia, A.; Moosavi, A.; Sandu, A. Cluster sampling filters for non-Gaussian data assimilation. Atmosphere **2018**, 9, 213.
27. Lewis, J.M.; Lakshmivarahan, S.; Dhall, S. Dynamic Data Assimilation: A Least Squares Approach; Cambridge University Press: Cambridge, UK, 2006; Volume 104.
28. Evensen, G. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. Ocean. **1994**, 99, 10143–10162.
29. Evensen, G. Data Assimilation: The Ensemble Kalman Filter; Springer: Berlin/Heidelberg, Germany, 2009.
30. Sakov, P.; Oke, P.R. A deterministic formulation of the ensemble Kalman filter: An alternative to ensemble square root filters. Tellus A Dyn. Meteorol. Oceanogr. **2008**, 60, 361–371.
31. Whitaker, J.S.; Hamill, T.M. Ensemble data assimilation without perturbed observations. Mon. Weather Rev. **2002**, 130, 1913–1924.
32. Tippett, M.K.; Anderson, J.L.; Bishop, C.H.; Hamill, T.M.; Whitaker, J.S. Ensemble square root filters. Mon. Weather Rev. **2003**, 131, 1485–1490.
33. Lorenz, E.N. Predictability: A problem partly solved. In Proceedings of the Seminar on Predictability, Reading, UK, 9–11 September 1996; Volume 1.
34. Kerin, J.; Engler, H. On the Lorenz’96 Model and Some Generalizations. arXiv **2020**, arXiv:2005.07767.
35. Anderson, J.L.; Anderson, S.L. A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Weather Rev. **1999**, 127, 2741–2758.
36. Kuramoto, Y. Diffusion-induced chaos in reaction systems. Prog. Theor. Phys. Suppl. **1978**, 64, 346–367.
37. Majda, A.; Wang, X. Nonlinear Dynamics and Statistical Theories for Basic Geophysical Flows; Cambridge University Press: New York, NY, USA, 2006.
38. Greatbatch, R.J.; Nadiga, B.T. Four-gyre circulation in a barotropic model with double-gyre wind forcing. J. Phys. Oceanogr. **2000**, 30, 1461–1471.
39. San, O.; Staples, A.E.; Wang, Z.; Iliescu, T. Approximate deconvolution large eddy simulation of a barotropic ocean circulation model. Ocean. Model. **2011**, 40, 120–132.
40. Arakawa, A. Computational design for long-term numerical integration of the equations of fluid motion: Two-dimensional incompressible flow. Part I. J. Comput. Phys. **1997**, 135, 103–114.
41. Press, W.H.; Flannery, B.P.; Teukolsky, S.A.; Vetterling, W.T. Numerical Recipes; Cambridge University Press: New York, NY, USA, 1989.
42. Cacuci, D.G.; Navon, I.M.; Ionescu-Bujor, M. Computational Methods for Data Evaluation and Assimilation; CRC Press: New York, NY, USA, 2013.
43. Kalnay, E. Atmospheric Modeling, Data Assimilation and Predictability; Cambridge University Press: New York, NY, USA, 2003.
44. Law, K.; Stuart, A.; Zygalakis, K. Data Assimilation: A Mathematical Introduction; Springer: Cham, Switzerland, 2015.
45. Asch, M.; Bocquet, M.; Nodet, M. Data Assimilation: Methods, Algorithms, and Applications; SIAM: Philadelphia, PA, USA, 2016.
46. Simon, D. Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches; John Wiley & Sons: Hoboken, NJ, USA, 2006.
47. Labbe, R. Kalman and bayesian filters in Python. Chap **2014**, 7, 246.
48. Van Leeuwen, P.J.; Künsch, H.R.; Nerger, L.; Potthast, R.; Reich, S. Particle filters for high-dimensional geoscience applications: A review. Q. J. R. Meteorol. Soc. **2019**, 145, 2335–2365.
49. Zupanski, M. Maximum likelihood ensemble filter: Theoretical aspects. Mon. Weather Rev. **2005**, 133, 1710–1726.
50. Zupanski, M.; Navon, I.M.; Zupanski, D. The Maximum Likelihood Ensemble Filter as a non-differentiable minimization algorithm. Q. J. R. Meteorol. Soc. **2008**, 134, 1039–1050.
51. Kang, W.; Xu, L. Optimal placement of mobile sensors for data assimilations. Tellus A Dyn. Meteorol. Oceanogr. **2012**, 64, 17133.
52. Mons, V.; Chassaing, J.C.; Sagaut, P. Optimal sensor placement for variational data assimilation of unsteady flows past a rotationally oscillating cylinder. J. Fluid Mech. **2017**, 823, 230–277.
53. Le Dimet, F.X.; Navon, I.M.; Daescu, D.N. Second-order information in data assimilation. Mon. Weather Rev. **2002**, 130, 629–648.
54. Lorenc, A.C.; Bowler, N.E.; Clayton, A.M.; Pring, S.R.; Fairbairn, D. Comparison of hybrid-4DEnVar and hybrid-4DVar data assimilation methods for global NWP. Mon. Weather Rev. **2015**, 143, 212–229.
55. Desroziers, G.; Camino, J.T.; Berre, L. 4DEnVar: Link with 4D state formulation of variational assimilation and different possible implementations. Q. J. R. Meteorol. Soc. **2014**, 140, 2097–2110.
56. Wang, X.; Barker, D.M.; Snyder, C.; Hamill, T.M. A hybrid ETKF–3DVAR data assimilation scheme for the WRF model. Part I: Observing system simulation experiment. Mon. Weather Rev. **2008**, 136, 5116–5131.
57. Buehner, M.; Morneau, J.; Charette, C. Four-dimensional ensemble-variational data assimilation for global deterministic weather prediction. Nonlinear Process. Geophys. **2013**, 20, 669–682.
58. Kleist, D.T.; Ide, K. An OSSE-based evaluation of hybrid variational-ensemble data assimilation for the NCEP GFS. Part I: System description and 3D-hybrid results. Mon. Weather Rev. **2015**, 143, 433–451.
59. Kleist, D.T.; Ide, K. An OSSE-based evaluation of hybrid variational-ensemble data assimilation for the NCEP GFS. Part II: 4DEnVar and hybrid variants. Mon. Weather Rev. **2015**, 143, 452–470.
60. Lakshmivarahan, S. Video Lectures on Dynamic Data Assimilation; NPTEL Program; IIT Madras: Chennai, India, 2016; Available online: https://nptel.ac.in/courses/111/106/111106082/ (accessed on 28 November 2020).

**Figure 7.** EKF results for the Lorenz 96 system. The trajectories of ${X}_{9}$, ${X}_{18}$, and ${X}_{36}$ are shown.

**Figure 9.** RMSE for the two-level Lorenz 96 model for different combinations of ensemble size and inflation factor.

**Figure 10.** Full state trajectory of the multiscale Lorenz 96 model with no parameterizations in the forecast model. The EnKF algorithm uses the inflation factor $\lambda =1.04$ and $N=50$, and the DEnKF uses the inflation factor $\lambda =1.05$ and $N=45$. The observation data for both the EnKF and DEnKF algorithms are obtained by adding measurement noise to the exact solution of the two-level Lorenz 96 system.

**Figure 11.** Selected trajectories of the Kuramoto–Sivashinsky model ($\nu =1/2$) with the analysis performed by the ensemble Kalman filter (EnKF) using observations from $m=16$ (**left**), $m=32$ (**middle**), and $m=64$ (**right**) state variables at every 10 time steps.

**Figure 12.** Full state trajectory of the Kuramoto–Sivashinsky model ($\nu =1/2$) with the analysis performed by the ensemble Kalman filter (EnKF) using observations from $m=16$ (**left**), $m=32$ (**middle**), and $m=64$ (**right**) state variables at every 10 time steps.

**Figure 13.** A typical vorticity (**left**) and streamfunction (**right**) field for the single-layer QG model. The dots show the locations of observations.

**Figure 14.** Snapshots of the true vorticity field (**left**), the analysis estimate of the DEnKF algorithm (**middle**), and the difference (error) between the two fields (**right**) obtained for a particular run of the single-layer QG model. The snapshots of the vorticity field are plotted at $t=0.3,\ 0.35,\ 0.4$ (from top to bottom).

**Listing 2.** Implementation of the 3DVAR with a nonlinear observation operator, using first-order approximation.

**Listing 4.** Python functions for the time integration using the first-order Euler and the fourth-order Runge–Kutta schemes.

**Listing 6.** Computation of the gradient of the cost functional with the 4DVAR using the first-order adjoint algorithm.

**Listing 8.** Computation of the cost functional defined in Equation (13).

**Listing 10.** Python functions for computing the Jacobian ${\mathbf{D}}_{M}\left(\mathbf{u}\right)$ of the discrete-time model map $M(\mathbf{u};\theta )$ using the first-order Euler and the fourth-order Runge–Kutta schemes with the chain rule.

**Listing 12.** Python function for computing the correction vector $\mathsf{\Delta}{\mathbf{u}}_{0}$ using the forward sensitivity method with first-order approximation.

**Listing 15.** Implementation of the (first-order) EKF with nonlinear dynamics and nonlinear observation operator.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Ahmed, S.E.; Pawar, S.; San, O.
PyDA: A Hands-On Introduction to Dynamical Data Assimilation with Python. *Fluids* **2020**, *5*, 225.
https://doi.org/10.3390/fluids5040225
