Article

Deep Data Assimilation: Integrating Deep Learning with Data Assimilation

Rossella Arcucci, Jiangcheng Zhu, Shuang Hu and Yi-Ke Guo
1 Data Science Institute, Imperial College London, London SW7 2AZ, UK
2 State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou 310027, China
3 Ningbo Joynext Technology Inc., Ningbo 315000, China
4 Department of Computer Science, Hong Kong Baptist University, Hong Kong
* Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(3), 1114; https://doi.org/10.3390/app11031114
Submission received: 8 December 2020 / Revised: 19 January 2021 / Accepted: 21 January 2021 / Published: 26 January 2021
(This article belongs to the Special Issue Artificial Intelligence Applications and Innovation)

Abstract:
In this paper, we propose Deep Data Assimilation (DDA), an integration of Data Assimilation (DA) with Machine Learning (ML). DA is the Bayesian approximation of the true state of some physical system at a given time by combining time-distributed observations with a dynamic model in an optimal way. We use an ML model in order to learn the assimilation process. In particular, a recurrent neural network, trained with the state of the dynamical system and the results of the DA process, is applied for this purpose. At each iteration, we learn a function that accumulates the misfit between the results of the forecasting model and the results of the DA. Subsequently, we compose this function with the dynamic model. The resulting composition is a dynamic model that includes the features of the DA process and that can be used for future prediction without the necessity of the DA. In fact, we prove that the DDA approach implies a reduction of the model error, which decreases at each iteration; this is achieved thanks to the use of DA in the training process. DDA is very useful in those cases where observations are not available for some time steps and DA cannot be applied to reduce the model error. The effectiveness of this method is validated by examples and a sensitivity study. In this paper, the DDA technology is applied to two different applications: the double integral mass dot system and the Lorenz system. However, the algorithm and numerical methods that are proposed in this work can be applied to other physics problems that involve other equations and/or state variables.

1. Introduction and Motivations

The present work is placed in the context of the design of reliable algorithms for solving Data Assimilation (DA) for dynamic systems applications. DA is an uncertainty quantification technique by which measurements and model predictions are combined in order to obtain an accurate estimation of the state of the modelled system [1]. In the past 20 years, DA methodologies, originally used mainly in atmospheric models, have become a main component in the development and validation of mathematical models in meteorology, climatology, geophysics, geology, and hydrology (these models are often referred to as predictive to underline that they are dynamical systems). Recently, DA has also been applied to numerical simulations of geophysical applications [2], medicine, and biological science [3,4] to improve the reliability of the numerical simulations. In practice, DA provides the answers to questions such as: how well does the model under consideration represent the physical phenomena? What confidence can one have that the numerical results produced by the model are correct? How far can the calculated results be extrapolated? How can the predictability and/or extrapolation limits be extended and/or improved?
Accuracy in numerical models of dynamic systems does not come easily. The simulation errors come from three main sources: inaccurate input data, inaccurate physics models, and the limited accuracy of the numerical solutions of the governing equations. Our ability to predict any physical phenomenon is determined by the accuracy of our input data and our modelling approach [5]. Errors arise at the different stages of the solution process, namely, the uncertainty in the mathematical model, in the model's solution, and in the measurements. These are the errors intrinsic to the DA problem. Moreover, there are the approximation errors that are introduced by linearization, discretization, and model reduction. These errors occur when infinite-dimensional equations are replaced by a finite-dimensional system (that is, the process of discretization), or when simpler approximations to the equations are developed (e.g., by model reduction). Finally, given the numerical problem, an algorithm is developed and implemented as mathematical software. At this stage, the inevitable rounding errors introduced by working in finite-precision arithmetic occur. Despite improvements in the complexity of DA models to better match their application requirements and circumvent their implementation problems, these approaches are unable to fully overcome their unrealistic assumptions, particularly linearity, normality, and zero error covariances [6].
With its rapid development in recent years, Deep Learning (DL) has shown great capability in approximating nonlinear systems and extracting high-dimensional features. As the foundation and main driving force of deep learning, Deep Neural Networks (DNNs) are less concerned with numerical modelling: they take a data-driven approach, where models are built by learning from data. Instead of being deterministic [7,8], DL models such as neural networks (NNs) are stochastic. Thus, they can succeed when applied to deterministic systems, yet without ever learning the relationships between the variables. For example, ML algorithms do not know when they violate physical laws in weather forecasting [9,10]. Before they begin to produce accurate results, machine learning methods often need large quantities of data, and the bigger the architecture, the more data are required. In a wide variety of applications, DL is successfully used when the conditions allow. However, in cases where the dimensions are very large, the data are noisy, or the data do not cover the entire domain adequately, these approaches still fail. Furthermore, the network will not perform well if the available information is too noisy, too scarce, or if there is a lack of prominent features reflecting the problem. This can occur either when good features are lacking or when good ground-truth data are lacking. For this reason, good data quality (smaller errors) can help to generate a more reliable DNN. DA is the Bayesian approximation of the true state of some physical system at a given time, obtained by combining time-distributed observations with a dynamic model in an optimal way. In some sense, DA increases the meaningfulness of the data and reduces the forecasting errors. Therefore, it is interesting to investigate a mechanism by which the two data-driven paradigms, DA and DNNs, can be integrated.
In this context, we developed the Deep Data Assimilation (DDA) model. DDA is a new concept that combines DNNs and DA. It addresses the problem of reducing both the input data error (by DA) and the modelling error (by ML).

Related Works and Contribution of the Present Work

This paper is placed in the context of imperfect models and sensitivity analysis for data assimilation and deep neural networks. Sensitivity Analysis (SA) refers to the determination of the contributions of individual uncertainties in the data to the uncertainty in the solution [11]. The sensitivity of variational DA models has been studied in [12], where adjoint modelling is used in order to obtain first- and second-order derivative information. In [13], sensitivity analysis is based on Backward Error Analysis (B.E.A.), which quantifies how much the error propagation in the DA process depends on the condition number of the background error covariance matrices. Reduced-order approaches are formulated in [12] in order to alleviate the computational cost associated with the sensitivity estimation. Although this method makes rerunning less expensive, the parameters must still be selected a priori and, consequently, important sensitivities may be missed [14]. For imperfect models, weak constraint DA (WCDA) can be employed [15], in which the model error term in the covariance evolution equation acts to reduce the dependence of the estimate on observations and prior states that are well separated in time. However, the time complexity of WCDA models is a major limitation that often makes them unusable.
ML algorithms are capable of assisting or replacing the aforementioned conventional methods in assimilating data and producing forecasts, without their traditional assumptions. Because any linear or nonlinear function can be approximated by neural networks, they have been integrated as a supplement to DA in various applications. The technology and the model presented in this paper are very general and can be applied to any kind of dynamical system. The use of ML in correcting model forecasts is promising in several geophysics applications [16,17]. Babovic [18,19,20] applies neural networks to error correction in forecasting. However, in this literature, the error-correction neural network does not have a direct relation with the system update model at each step, and it is not trained on the results of the assimilation process. In [21], DA and ML are also integrated, and a surrogate model is used to replace the dynamical system. In this paper, we do not replace the dynamical system with a surrogate model, as this would add approximation errors. In fact, in real case scenarios (i.e., in some operational centres), data-driven models are welcome to support Computational Fluid Dynamics simulations [9,10,22], but the systems of Partial Differential Equations that are stably implemented to predict (weather, climate, ocean, air pollution, etc.) dynamics are not replaced, due to the large approximation errors that this replacement would introduce. The technology that we propose in this paper integrates DA with ML in a way that can be used in real case scenarios for real world applications without changing the already implemented dynamical systems.
Other works have already explored the relationship between the learning, analysis, and implementation of data in ML and DA. These studies have concentrated on fusing methods from the two fields instead of taking a modular approach like the one implemented in this paper, thereby developing methods that are unique to those approaches. The integration of the DA and ML frameworks from a Bayesian perspective illustrates the interface between probabilistic ML methods and differential equations. The equivalence is formally presented in [23], which illustrates the parallels between the two fields, showing the equivalence of four-dimensional VarDA (4D-Var) and recurrent NNs, and how approximate Bayesian inverse methods (i.e., VarDA in DA and back-propagation in ML) can be used to merge the two fields. These methods are also especially suited to systems involving Gaussian processes, as presented in [24,25,26], which develop data-driven algorithms that are capable, under a unified approach, of learning nonlinear, space-dependent cross-correlations and of estimating model statistics. A DA method can be used via the DL framework to help the dynamic model (in this case, the ML model) to find optimal initial conditions. Although DA can be computationally costly, recent dimensionality reduction developments using ML substantially reduce this expense, while retaining much of the variance of the original model [27]. Therefore, DA helps to develop the ML model, while it also benefits from ML techniques [28,29]. Methods that merge DA models, such as Kalman filters, and ML exist to overcome noise or to integrate time series knowledge. In [30], the authors account for temporal details in human motion analysis by incorporating a Kalman filter into an LSTM neural network. In [31], the authors suggest a methodology that combines NNs and DA for model error correction. Instead, in [32], the authors use DA, in particular Kalman tracking, to speed up a learning-based motion tracking method to real time and to correct some common inconsistencies in camera-based motion tracking methods. Finally, in [33], the authors introduce a new neural network for speed and replace the whole DA process.
In this paper, we propose a new concept, called DDA, which combines DL with conventional DA. In recent years, DL has gained great success in both academic and industrial areas. It shows great benefit in function approximation problems with unknown models and high nonlinearity. Here, we use a DNN to describe the model uncertainty and the assimilation process. In an end-to-end approach, the neural networks are introduced and their parameters are iteratively modified by applying the DA methods with upcoming observations. The resulting DNN model includes the features of the DA process. We prove that this implies a reduction of the model error, which decreases at each iteration.
We also prove that the DDA approach introduces an improvement in both DA and the DNN. We introduce a sensitivity analysis based on backward error analysis to compare the error in the DDA result with the error in DA. We prove that the error in the results obtained by DDA is reduced with respect to DA. We also prove that the use of the results of DA to train the DL model introduces a novelty in the SA techniques used to evaluate which inputs are important in the NN. The implemented approach can be compared to the 'Weights' method [34]. The distinction from the classical 'Weights' method is that, in our approach, the weights between the nodes are given by the error covariance matrices that are determined in the DA phase.
In summary, we prove that the DDA approach
  • includes the features of the DA process into the DNN model with a consequent reduction of the model error;
  • introduces a reduction in the overall error in the assimilation solution with respect to the DA solution; and,
  • increases the reliability of the DNN model, including the error covariance matrices as weight in the loss function.
This paper is structured as follows. Section 2 provides preliminaries on DA. We propose the DDA methodology in Section 3, where we introduce the integration of deep learning with data assimilation. In Section 4, numerical examples are provided in order to implement and verify the DDA algorithm and, in Section 5, conclusions and future work are summarized.

2. Data Assimilation Process

Let
$\dot{u} = \mathcal{M}(u, t, \theta)$
be a forecasting model, where u denotes the state, $\mathcal{M}$ is a nonlinear map, $\theta$ is a parameter of interest, and $t \in [0, T]$ denotes the time. Let
$v = \mathcal{H}(u) + \epsilon$
be an observation of the state u, where $\mathcal{H}$ is an observation function and $\epsilon$ denotes the measurement error.
Data Assimilation is concerned with how to use the observations in (2) and the model in (1) in order to obtain the best possible knowledge of the system as a function of time. The Bayes theorem leads to the estimation of a function $u^{DA}$, which maximizes a probability density function, given the observation v and a prior from u [1,35].
For a fixed time step $t_k \in [0, T]$, let $u_k$ denote the estimated system state at time $t_k$:
$u_k = M(u_{k-1}),$
where the operator M represents a discretization of a first-order approximation of $\mathcal{M}$ in (1). Let $v_k$ be an observation of the state at time $t_k$ and let
$H : u_k \mapsto v_k$
be the discretization of a first-order approximation of the linearization of $\mathcal{H}$ in (2). The DA process consists in finding $u^{DA}$ (called the analysis) as an optimal tradeoff between the prediction based on the estimated system state (called the background) and the available observation. The state u is then replaced by the function $u^{DA}$, which includes the information from the observation v in a prediction-correction cycle, as shown in Figure 1.
The analysis $u^{DA}$ is computed as the inverse solution of
$v_k = H u^{DA} + e_{R_k},$
subject to the constraint that
$u^{DA} = u_k + e_{B_k},$
where $e_{R_k} \sim \mathcal{N}(\mu_R, R_k)$ and $e_{B_k} \sim \mathcal{N}(\mu_B, B_k)$ denote the observation error and model error, respectively, $\mathcal{N}(\cdot)$ denotes a Gaussian distribution [1], and $e_{R_k}$ includes the measurement error $\epsilon$ introduced in (2). Both errors include the discretization, approximation, and other numerical errors.
Because H is typically rank deficient, (5) is an ill-posed inverse problem [36,37]. The Tikhonov formulation [38] leads to an unconstrained least squares problem, where the term in (6) ensures the existence of a unique solution of (5). The DA process can then be described as in Algorithm 1, where:
$u^{DA} = \arg\min_u \|u - u_k\|^2_{B_k^{-1}} + \|v_k - H u\|^2_{R_k^{-1}},$
where $R_k$ and $B_k$ are the observation and model error covariance matrices, respectively:
$R_k := \sigma_0^2 I,$
with $0 \le \sigma_0^2 \le 1$ and I the identity matrix, and
$B_k = V_k V_k^T,$
with $V_k$ the background error deviation matrix [13], computed from a set of $n_k$ historical data $S = \{u_j\}_{j=1,\dots,n_k}$ available at time $t_k$ and such that
$V_k = \{V_{jk}\}_{j=1,\dots,n_k},$
where
$V_{jk} = u_j - \bar{u}$
and
$\bar{u} = \mathrm{mean}_{j=1,\dots,n_k}\{u_j\}.$
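For concreteness, the following Python sketch shows how the background error covariance could be assembled from historical snapshots as described above. It is a minimal illustration with assumed array shapes and variable names, not the paper's implementation (which uses Matlab/Simulink).

```python
import numpy as np

def background_covariance(snapshots):
    """Assemble B_k = V_k V_k^T from n_k historical state snapshots u_j.

    snapshots: array of shape (n_k, state_dim), one snapshot per row.
    """
    u_bar = snapshots.mean(axis=0)        # ensemble mean of the historical states
    V = (snapshots - u_bar).T             # deviation matrix V_k, shape (state_dim, n_k)
    return V @ V.T                        # background error covariance B_k

# illustrative usage: 50 snapshots of a 3-dimensional state
B_k = background_covariance(np.random.randn(50, 3))
```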
Algorithm 1 DA
The DA process, as defined in (7), can be solved by several methods. Two main approaches that have gained acceptance as powerful methods for data assimilation in recent years are the variational approach and the Kalman Filter (KF). The variational approach [39] is based on the minimization of a functional, which estimates the discrepancy between numerical results and measures. The Kalman Filter [40] is instead a recursive filter. In both cases we seek an optimal solution. Statistically, KF seeks a solution with minimum variance. Variational methods seek a solution that minimizes a suitable cost function. In certain cases, the two approaches are identical and they provide exactly the same solution [1]. However, the statistical approach, though often complex and time-consuming, can provide a richer information structure, i.e., an average and some characteristics of its variability (probability distribution).
Assuming the acquisition of the observations at fixed time steps, the variational approach is named 3DVar [1,6], and it consists in computing the minimum of the cost function:
$J(u) = (u - u_k)^T B_k^{-1} (u - u_k) + (v_k - H u)^T R_k^{-1} (v_k - H u),$
where
$u^{DA} = \arg\min_u J(u).$
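As an illustration of the variational route, the sketch below minimizes the 3DVar cost function numerically with SciPy. All matrices, dimensions, and values are illustrative assumptions, not the configuration used in the experiments.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative dimensions and error statistics (not the paper's configuration).
u_b = np.array([1.0, 0.0, 0.5])               # background state u_k
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])               # linearized observation operator
B_inv = np.linalg.inv(0.5 * np.eye(3))        # inverse background error covariance
R_inv = np.linalg.inv(0.1 * np.eye(2))        # inverse observation error covariance
v = np.array([1.2, -0.1])                     # observation v_k

def J(u):
    """3DVar cost: background misfit plus observation misfit."""
    db = u - u_b
    do = v - H @ u
    return db @ B_inv @ db + do @ R_inv @ do

u_da = minimize(J, x0=u_b).x                  # analysis u^DA = argmin_u J(u)
```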
The potential problems of a variational method are mainly threefold: it relies heavily on correct $B_k$ matrices, it does not take the system evolution into account, and it can end up in local minima.
The Kalman Filter consists in computing the explicit solution of the normal equations:
$u^{DA} = u_k + K_k (v_k - H u_k),$
where the matrix
$K_k = B_k H^T (H B_k H^T + R_k)^{-1}$
is called the Kalman gain matrix, and where $B_k$ is updated at each time step [6]:
$B_k = M (I - K_{k-1} H) B_{k-1} M^T.$
A potential problem of the Kalman Filter is that $K_k$ is too large to store for large-dimensional problems.
A common point of 3DVar and KF is that the observations defined in the window $[0, T]$ (circle points in Figure 2) are assimilated at the temporal point where the state is defined (grey rhombus in Figure 2 at time $t_k$). After the assimilation, a new temporal window $[T, 2T]$ is considered and new observations are assimilated. Each process starts at the end point of the previous process, and the two assimilation processes are defined in almost independent temporal windows. The result of the assimilation process $u^{DA}$ (green pentagons in Figure 2), as computed by 3DVar or KF, is called the analysis. The explicit expression for the analysis error covariance matrix, denoted as $A_k$, is [1]:
$A_k = (I - K_k H) B_k,$
where I denotes the identity matrix.
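The Kalman filter analysis step above can be summarized in a few lines of Python. This is a minimal sketch with assumed shapes and names, intended only to make the update concrete.

```python
import numpy as np

def kf_analysis(u_k, B_k, v_k, H, R_k):
    """One Kalman filter analysis step: gain, analysis state, analysis covariance."""
    S = H @ B_k @ H.T + R_k                       # innovation covariance
    K = B_k @ H.T @ np.linalg.inv(S)              # Kalman gain K_k
    u_da = u_k + K @ (v_k - H @ u_k)              # analysis u^DA
    A_k = (np.eye(len(u_k)) - K @ H) @ B_k        # analysis error covariance A_k
    return u_da, K, A_k

# illustrative usage with a 3-dimensional state and 2 observations
u_da, K, A_k = kf_analysis(np.zeros(3), np.eye(3),
                           np.array([0.3, -0.2]),
                           np.array([[1.0, 0.0, 0.0],
                                     [0.0, 1.0, 0.0]]),
                           0.1 * np.eye(2))
```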
Even if the DA process takes information in a temporal window into account, it does not really take the past into account. This is due to the complexity of real-application forecasting models, for which a long time window would be too computationally expensive. In the next section, we introduce the Deep Data Assimilation (DDA) model, in which we replace the forecasting model M (black line in Figure 3) with a new Machine Learning model $\mathbf{M}$ (red line in Figure 3), trained using DA results and taking the past into account.

3. The Deep Data Assimilation (DDA) Model

Data assimilation (DA) methodologies improve the levels of confidence in computational predictions (i.e., they improve numerically forecasted results) by incorporating observational data into a prediction model. However, the error propagation in the forecasting model is not improved by DA, so that, at each step, corrections have to be made from scratch without learning from previous experience of error correction. The strongly nonlinear character of many physical processes of interest can result in the dramatic amplification of even small uncertainties in the input, so that they produce large uncertainties in the system behavior [5]. Because of this instability, as many observations as possible are assimilated, to the point where a strong requirement for DA is to enable real-time utilization of data to improve predictions. Therefore, it is desirable to use a machine learning method to learn a function that accumulates the previous assimilation processes. We use an NN to model this process. The key concept is to record each step of state correction during an assimilation period and then learn a function that captures this updating mechanism. The system model is then revised by composing this learned updating function with the current system model. Such a process continues by further learning the assimilation process with the updated system model.
The resulting NN-forecasting model is then a forecasting model with an intrinsic assimilation process.
We introduce a neural network $G_\omega$, starting from a composition with the model M, as described in Figure 4. The training target is:
$u_k = G_\omega \circ M(u_{k-1}).$
We train $G_\omega$ by exploiting the results of DA, as described in Algorithm 2. The $G_\omega$ model can take the iterative form $G_{\omega,i} \circ G_{\omega,i-1} \circ \dots \circ G_{\omega,1} \circ M$, where $G_{\omega,i}$ is the network trained in the $i$-th iteration and $\circ$ denotes function composition. Let $\mathbf{M}_i$ be the model implemented in the $i$-th iteration; it is:
$\mathbf{M}_i = G_{\omega,i} \circ G_{\omega,i-1} \circ \dots \circ G_{\omega,1} \circ M,$
from which $\mathbf{M}_i = G_{\omega,i} \circ \mathbf{M}_{i-1}$, with $\mathbf{M}_0 = M$.
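As a small illustration of this iterative composition, the following Python sketch folds hypothetical trained corrections into a stand-in forecasting model; all functions and names here are toy assumptions, not the paper's implementation.

```python
import numpy as np

def compose(g, model):
    """Return the updated model: propagate with the current model, then apply
    the learned assimilation correction g, i.e. u -> g(model(u))."""
    return lambda u: g(model(u))

# toy illustration: a linear stand-in for the physical model M and two
# hypothetical trained corrections G_{omega,1}, G_{omega,2}
M0 = lambda u: 0.9 * u
corrections = [lambda u: u + 0.01, lambda u: 1.02 * u]

M_i = M0
for g in corrections:
    M_i = compose(g, M_i)      # after the loop: M_2 = G_{omega,2} o G_{omega,1} o M

print(M_i(np.array([1.0, 2.0])))
```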
For iteration $i > 1$, the training set for $G_{\omega,i}$ is $\{(u_k, u_k^{DDA})\}$, which exploits the pairs $\{(u_{k-1}^{DDA}, u_k^{DDA})\}$ from the last updated model $\mathbf{M}_i$. In this paper, we implement a Recurrent Neural Network (RNN) based on Elman networks [41]. It is a basic RNN structure that loops the hidden layer output back as an internal variable (the Jordan network [42], by contrast, feeds the output back as the internal variable). As shown in Figure 5, the RNN is specifically expressed as
$h_t = \tanh(w_1 [u_k, h_{t-1}] + b_1),$
$u_k^{DDA} = w_2 \left( \tanh(w_1 [u_k, h_{t-1}] + b_1) \right) + b_2.$
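A minimal sketch of this Elman-style recurrent cell is given below; the layer sizes and weight initialization are illustrative assumptions rather than the architecture used in the experiments.

```python
import numpy as np

def elman_step(u_k, h_prev, w1, b1, w2, b2):
    """One step of the Elman-style recurrent cell used as G_omega."""
    x = np.concatenate([u_k, h_prev])     # concatenated input [u_k, h_{t-1}]
    h_t = np.tanh(w1 @ x + b1)            # hidden state update
    u_dda = w2 @ h_t + b2                 # corrected state estimate u_k^DDA
    return u_dda, h_t

# illustrative sizes: 3-dimensional state, 8 hidden units
rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(8, 3 + 8)), np.zeros(8)
w2, b2 = rng.normal(size=(3, 8)), np.zeros(3)
u_dda, h = elman_step(np.zeros(3), np.zeros(8), w1, b1, w2, b2)
```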
Algorithm 2 A(DDA)
This choice of training the NN using data from DA introduces a reduction of the model error at each time step, which can be quantified as a reduction of the background error covariance matrix, as proved in the next theorem.
Theorem 1.
Let $B_{k-1}$ and $B_k$ be the error covariance matrices at step $k-1$ and step $k$ in Algorithm 2, respectively. We prove that
$\|B_k\| \le \|B_{k-1}\|,$
i.e., the error in the background data is reduced at each iteration.
Proof. 
We prove the thesis by reductio ad absurdum. We assume
$\|B_k\| > \|B_{k-1}\|.$
At step $k-1$, from Step 10 of Algorithm 2 we have $B_k = A_{k-1}$, i.e., from (15) and (13), the (21) gives
$\left\| \left( I - H_{k-1} B_{k-1} H_{k-1}^T \left( H_{k-1} B_{k-1} H_{k-1}^T + R_{k-1} \right)^{-1} \right) B_{k-1} \right\| > \|B_{k-1}\|,$
i.e., from the property of $\|\cdot\|$ that $\|ab\| \le \|a\| \, \|b\|$, we have
$\left\| I - H_{k-1} B_{k-1} H_{k-1}^T \left( H_{k-1} B_{k-1} H_{k-1}^T + R_{k-1} \right)^{-1} \right\| \, \|B_{k-1}\| > \|B_{k-1}\|,$
which means that
$\left\| I - H_{k-1} B_{k-1} H_{k-1}^T \left( H_{k-1} B_{k-1} H_{k-1}^T + R_{k-1} \right)^{-1} \right\| > 1.$
Because of the definition of $\|\cdot\|$, we can then assume that there exist $i, j$ such that
$\left| 1 - \frac{b_{ij}}{b_{ij} + r_{ij}} \right| > 1,$
i.e., such that
$\frac{b_{ij}}{b_{ij} + r_{ij}} < 0,$
which is absurd. In fact, consider the two possible cases $i = j$ and $i \ne j$. In the case $i = j$, $b_{ii}$ and $r_{ii}$ are diagonal elements of the error covariance matrices; in other words, they are variances, hence positive values, and (26) is not satisfied. In the case $i \ne j$, $r_{ij} = 0$, since the error covariance matrix $R_k$ is diagonal, as defined in (8); we then have $\frac{b_{ij}}{b_{ij} + r_{ij}} = \frac{b_{ij}}{b_{ij}} = 1$, and (26) is again not satisfied. Consequently, (20) is proven. □
Another important aspect is that the solution of the DDA Algorithm 2 is more accurate than the solution of a standard DA Algorithm 1.
In fact, even if some sources of error cannot be ignored (e.g., the round-off error), the introduction of the DDA method reduces the error in the numerical forecasting model (denoted below as $\xi_k$). The following result holds.
Theorem 2.
Let $\delta_k$ denote the error in the DA solution:
$u_k^{DA} = u_k^{true} + \delta_k,$
and let $\hat{\delta}_k$ be the error in the DDA solution:
$u_k^{DDA} = u_k^{true} + \hat{\delta}_k;$
it is:
$\|\hat{\delta}_k\| \le \|\delta_k\|.$
Proof. 
Let $\xi_k$ be the error in the solution of the dynamic system M in (3), i.e., $u_k = M(u_{k-1}) + \xi_k$. From the backward error analysis applied to the DA algorithm [13], we have:
$\|\delta_k\| = \mu_{DA} \|\xi_k\|$
and
$\|\hat{\delta}_k\| = \mu_{DDA} \|\xi_k\|,$
where $\mu_{DA}$ and $\mu_{DDA}$ denote the condition numbers of the DA numerical model and the DDA numerical model, respectively.
It has been proved in [37] that the condition numbers of the DA and DDA numerical models are directly proportional to the condition numbers of the related background error covariance matrices $B_k$. Subsequently, thanks to (20), we have $\mu_{DDA} \le \mu_{DA}$, from which:
$\|\hat{\delta}_k\| = \mu_{DDA} \|\xi_k\| \le \mu_{DA} \|\xi_k\| = \|\delta_k\|.$ □
The DDA framework in this article implements a Recurrent Neural Network (RNN) structure as $G_\omega$. The loss function is defined as described in Theorem 3, and the network is updated at each step by the gradient of the loss function with respect to its weights and parameters:
$\omega_{i+1} \leftarrow \omega_i + \alpha \frac{\partial L_i}{\partial \omega}.$
Theorem 3.
Let $\{(u_k, u_k^{DDA})\}$ be the training set for $G_{\omega,i}$, which exploits the data $\{(u_{k-1}^{DDA}, u_k^{DDA})\}$ from the past updated model $\mathbf{M}_i$ in (17). The loss function for the back-propagation, defined as the mean square error (MSE) for the DA training set, is such that
$L_k(\omega) = \left\| B_k H^T O_k^{-1} d_k - G_{\omega,k} \left( B_{k-1} H^T O_{k-1}^{-1} d_{k-1} \right) \right\|^2,$
where $d_k = v_k - H_k u_k$ is the misfit between the observed data and the background data, and $O_k = H_k B_k H_k^T + R_k$ is the Hessian of the DA system.
Proof. 
The mean square error (MSE)-based loss function for the back-propagation is such that
$L_i = \| u_k^{DDA} - G_{\omega,i} \circ \mathbf{M}_i (u_{k-1}^{DDA}) \|^2.$
The solution $u_k^{DDA}$ of the DDA system, obtained by a DA function, can be expressed as [1]:
$u_k^{DDA} = B_k H^T (H B_k H^T + R_k)^{-1} (v_k - H u_k).$
Setting $d_k = v_k - H u_k$ and $O_k = H B_k H^T + R_k$, (34) gives:
$u_k^{DDA} = B_k H^T O_k^{-1} d_k.$
Substituting in (33) the expression of $u_k^{DDA}$ given by (35), the (32) is proven. □
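The following sketch evaluates this DA-informed loss for given covariance matrices and observation operators; it assumes a time-invariant H for simplicity, and the names, shapes, and toy "network" are illustrative assumptions only.

```python
import numpy as np

def dda_loss(G, B_k, B_km1, H, R_k, R_km1, d_k, d_km1):
    """MSE loss of Theorem 3: the target is the DA increment at step k and the
    prediction is the network G applied to the DA increment at step k-1.
    H is assumed time-invariant here for simplicity."""
    O_k = H @ B_k @ H.T + R_k                                # Hessian of the DA system at step k
    O_km1 = H @ B_km1 @ H.T + R_km1                          # Hessian at step k-1
    target = B_k @ H.T @ np.linalg.solve(O_k, d_k)           # B_k H^T O_k^{-1} d_k
    prediction = G(B_km1 @ H.T @ np.linalg.solve(O_km1, d_km1))
    return np.sum((target - prediction) ** 2)

# illustrative usage with an identity "network" and small random-looking inputs
H = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
loss = dda_loss(lambda x: x, np.eye(3), np.eye(3), H,
                0.1 * np.eye(2), 0.1 * np.eye(2),
                np.array([0.2, -0.1]), np.array([0.1, 0.05]))
```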

4. Experiments

In this section, we apply the DDA technology to two different applications: the Double integral mass dot system and the Lorenz system:
  • Double integral mass dot system:
    For the physical representation of a generalized motion system, the double-integral particle system is commonly used. Under the influence of an acceleration control input, the system can be seen as the motion of a particle. The position and velocity are the state variables at each time step. Due to the presence of a PID controller in the feedback control system, the controlled state gradually converges to a relatively stable value. The output of the controller is the acceleration acting on the system, which can also be interpreted as a force linearly related to the acceleration.
    The double integral system is a mathematical abstraction of Newton’s Second Law that is often used as a simplified model of several controlled systems. It can be represented as a continuous model:
    $\dot{u} = A u + B z + \xi, \quad v = C u + r,$
    where the state $u = [u_1, u_2]^T$ is a two-dimensional vector containing the position $u_1$ and the velocity $u_2$, and where $z = \dot{u}_2$ is the controlled input. The coefficient matrices A, B, and C are the time-invariant system and observation matrices. r is the observation noise, which is Gaussian and two dimensional. The system disturbance $\xi$ comprises two aspects: random system noise and structural model uncertainty.
  • Lorenz system:
    Lorenz developed a simplified mathematical model for atmospheric convection in 1963. It is a common test case for DA algorithms. The model is a system of three ordinary differential equations, named the Lorenz equations. For some parameter values and initial conditions, it is noteworthy for its chaotic behavior. The Lorenz equations are given by the nonlinear system:
    $\frac{dp}{dt} = \sigma (q - p), \quad \frac{dq}{dt} = \rho p - q - p r, \quad \frac{dr}{dt} = p q - \beta r,$
    where p, q, and r are coordinates, and $\sigma$, $\rho$, and $\beta$ are parameters. In our implementation, the model has been discretized by the second-order Runge–Kutta method [43] and, for this test case, we set $\sigma = 10$, $\rho = 8/3$, and $\beta = 28$.
We mainly provide tests in order to validate the results that we proved in the previous section. In particular, we focus on:
  • DDA accuracy (Section 4.1): we show that the accuracy of the forecasting result increases as the number of time steps increases, as proved in Theorem 1;
  • DDA vs. DA (Section 4.2): in order to address true models, we consider circumstances where DA is not sufficient to reduce the modelling uncertainty, and we show that the DDA algorithm we have proposed is applicable to this issue. These findings validate what we showed in Theorem 2; and,
  • R-DDA vs. F-DDA (Section 4.3): we demonstrate that the accuracy of R-DDA is better than that of F-DDA. This is also due to the way the weights are calculated, i.e., by including the DA covariance matrices, as shown in Theorem 3.
We implemented the algorithms using Simulink (Simulation and Model-Based Design) in Matlab 2016a.

4.1. DDA Accuracy

4.1.1. Double Integral Mass Dot System

When considering model uncertainty, a system disturbance is introduced in (36) so that:
$\xi_k = \xi_{smu}(u_k) + \xi_{iid,k},$
where $\xi_{smu}(\cdot)$ denotes the structural model uncertainty and $\xi_{iid}$ denotes the i.i.d. random disturbance.
One typical model error comes from the parallel composition:
$\xi_{smu}(u_k) = D \, \frac{e^{u_k + 1}}{1 + e^{u_k + 1}},$
where the amplitude vector $D = [d_1, d_2]$ is adjustable according to the real system.
To carry out the model simulation, a cascade controller is designed to prevent the system from diverging. The tracking signal is set as a sinusoidal function. 10,000 samples are collected from a 100 s simulation with a 100 Hz sampling frequency. The training set for the DDA Algorithm 2 is made of the time series $\{(u_k, u_k^{DDA})\}$ of m pairs. First of all, we run the framework for the dynamic system in Equation (36) and record the observations v and control inputs z. A corresponding DA process runs alongside the system updates and outputs a prediction sequence $u^{DA}$. The network $G_{\omega,1}$ is trained on this dataset of m steps.
Subsequently, the system and the DA update the data from step $m+1$ to step $2m$, but the system update process in DA now incorporates the trained neural network $G_{\omega,1}$, as:
$u_k = G_{\omega,1}(M u_{k-1} + G u_k).$
Figure 6 shows what we expected regarding the accuracy of DDA: the mean forecasting error decreases as the training window length increases. The figure also shows a comparison of the mean forecasting error values with respect to the results obtained from DA. Figure 7 is the convergence curve obtained when Matlab trains the network using the Levenberg–Marquardt backpropagation method [44,45]. It can be seen that, with just a few epochs, the best value of the mean square error is achieved.

4.1.2. Lorenz System

We implement a Lorenz system with structural model uncertainty as:
$u_k = M_k(u_{k-1}) + \xi_{smu,k},$
where $u_k = [p_k, q_k, r_k]^T$ denotes the state and $M_k(\cdot)$ denotes the discrete function. The structural model uncertainty is denoted here as $\xi_{smu,k} = c \, u_{k-1}$, with $c = 0.01$, and $M_k(\cdot)$ is such that:
$p_{k+1} = p_k + \frac{\sigma \Delta t}{2} \left[ 2(q_k - p_k) + \Delta t (\rho p_k - q_k - p_k r_k) - \sigma \Delta t (q_k - p_k) \right],$
$q_{k+1} = q_k + \frac{\Delta t}{2} \left[ \rho p_k - q_k - p_k r_k + \rho \left(p_k + \sigma \Delta t (q_k - p_k)\right) - q_k - \Delta t (\rho p_k - q_k - p_k r_k) - \left(p_k + \sigma \Delta t (q_k - p_k)\right) \left(r_k + \Delta t (p_k q_k - \beta r_k)\right) \right],$
$r_{k+1} = r_k + \frac{\Delta t}{2} \left[ p_k q_k - \beta r_k + \left(p_k + \Delta t \sigma (q_k - p_k)\right) \left(q_k + \Delta t (\rho p_k - q_k - p_k r_k)\right) - \beta \left(r_k + \Delta t (p_k q_k - \beta r_k)\right) \right].$
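The expanded update above corresponds to one Heun (second-order Runge–Kutta) step applied to the Lorenz right-hand side, as the compact sketch below illustrates. The parameter values in the usage example are the canonical chaotic Lorenz-63 choices and are given purely for illustration.

```python
import numpy as np

def lorenz_rhs(u, sigma, rho, beta):
    """Right-hand side of the Lorenz equations for the state u = [p, q, r]."""
    p, q, r = u
    return np.array([sigma * (q - p), rho * p - q - p * r, p * q - beta * r])

def heun_step(u, dt, sigma, rho, beta):
    """One second-order Runge-Kutta (Heun) step, which expands to the update above."""
    k1 = lorenz_rhs(u, sigma, rho, beta)
    k2 = lorenz_rhs(u + dt * k1, sigma, rho, beta)
    return u + 0.5 * dt * (k1 + k2)

# illustrative usage with the canonical chaotic Lorenz-63 parameters
u = np.array([1.0, 1.0, 1.0])
for _ in range(100):
    u = heun_step(u, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0)
```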
First of all, for the dynamic system in Equation (37), we run the system and record its true state u. Subsequently, v is created by adding Gaussian noise to the true value of u. The DA algorithm is then applied to produce a sequence of DA results $u^{DA}$ of length m. The training set is made of the time series $\{(M_k(u_{k-1}^{DA}), u_k^{DA})\}$. Afterwards, the neural network $G_\omega$ is trained.
Figure 8 shows the accuracy results of DDA for the Lorenz test case. In this case as well, the results confirm our expectation: the mean forecasting error decreases as the training window length increases. The figure also shows a comparison of the mean forecasting error values with respect to the results obtained from DA. Figure 9 is the convergence curve obtained when Matlab trains the network using the Levenberg–Marquardt backpropagation method; also in this case, the best value of the mean square error is achieved with just a few epochs.

4.2. DDA vs. DA

In this section, we compare the forecasting results based on the DA model and the DDA model, respectively. For DA, the system model update takes the form:
$u_k = M_k(u_{k-1}),$
where $M_k$ is the discrete forecasting system in (3). For DDA, the system update model is:
$u_k = \mathbf{M}_k(u_{k-1}),$
where $\mathbf{M}_k$ is defined in (17).

4.2.1. Double Integral Mass Dot System

Figure 10 shows the results of the DA and DDA models on the double integral system when considering the model error. The two vertical dashed lines indicate where the neural network is trained. The forecasting state value based on DDA is clearly closer to the true value than that of DA, which results in a larger residual. We also calculate the ratio of the residual to the true value as a forecasting error metric. The average forecasting error is significantly reduced from 16.7% to 1.5% by applying the DDA model. Table 1 lists the run-time cost of every training window in Figure 10.

4.2.2. Lorenz System

We compare the forecasting results shown in Figure 11. The training set, generated for $k \in [0, 1000]$, is not plotted in full. The forecast starts at $k = 1000$. We can see that the DA model's forecasting errors on all three axes are large; the DA model trajectories cannot track the real state, while the DDA model trajectories track the true value very well. Figure 12 is a 3D plot of the forecasting part of the Lorenz system. It demonstrates that the DA model fails to predict the right-hand part of the butterfly. In this test, for the right-wing trajectory of the butterfly, the DDA model outperforms the DA model in forecasting.

4.3. F-DDA vs. R-DDA

In this section, we introduce the DDA using a feedforward neural network (which we name F-DDA) and compare the results with R-DDA, i.e., the DDA described in this paper that uses a recurrent neural network. The fully connected (FC) feedforward network is the earliest and most basic neural network structure: the output of each neuron in one layer becomes an input of each neuron in the next layer, with a corresponding weight value, and there are no cycles or loops inside the network. Fully connected neural networks are widely used for function fitting (especially for continuous-valued outputs). A multilayer perceptron (MLP), which contains three hidden layers, is used in this paper.
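A minimal sketch of such a fully connected forward pass, standing in for the F-DDA correction network, is given below. The layer sizes and the tanh activation are assumptions for illustration, not the exact architecture used in the experiments.

```python
import numpy as np

def mlp_forward(u, weights, biases):
    """Forward pass of a fully connected network with tanh hidden layers and a
    linear output layer, used here as a stand-in for the F-DDA correction."""
    h = u
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(W @ h + b)            # hidden layers
    return weights[-1] @ h + biases[-1]   # linear output layer

# illustrative sizes: 3-dimensional state, three hidden layers of 16 units
rng = np.random.default_rng(0)
sizes = [3, 16, 16, 16, 3]
weights = [rng.normal(size=(m, n)) * 0.1 for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
u_out = mlp_forward(np.zeros(3), weights, biases)
```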
In the training step of the DDA framework, the training set is the DDA result over a period of time. Take the first cycle ($i = 0$) as an example: the training set is $u_0^{DDA}, u_1^{DDA}, u_2^{DDA}, \dots, u_m^{DDA}$ (the results of consecutive time steps), and this data set generates the model. Because this network is a feedforward neural network and is unable to manage time series, we treat the training samples as independent of each other.

4.3.1. Double Integral Mass Dot System

In this section, we compare the performance of F-DDA and R-DDA on the double integral mass dot system. Because our data were previously generated by Simulink, the random noise and observation noise are fixed, and the results can be compared.
Figure 13 shows that F-DDA obtains a result similar to R-DDA in this case. This is mainly because the double integral mass dot system does not depend strongly on the historical states. Meanwhile, considering the execution time cost shown in Table 1, F-DDA is preferable, as it requires a shorter training time than R-DDA.

4.3.2. Lorenz System

Figure 14 compares the forecasting results of F-DDA and R-DDA for the Lorenz system. It can be seen that R-DDA (black dashed line) provides a more accurate prediction of the true value (blue solid line), and the curve fits better. In contrast, F-DDA (red dotted line) shows a certain time lag and is also relatively inaccurate in amplitude. This is mainly because the Lorenz system is strongly time-dependent. Table 2 provides the run-time cost for both F-DDA and R-DDA in training and forecasting. It is worth spending more training time on R-DDA to achieve a better prediction than F-DDA.

5. Conclusions and Future Work

In this article, we discussed DDA, the integration of DL and DA, and we validated the algorithm with numerical examples. DA methods have increased strongly in complexity in order to better suit their application requirements and circumvent their implementation problems. However, the approaches to DA are unable to fully overcome their unrealistic assumptions, particularly of zero error covariances, linearity, and normality. DL shows great capability in approximating nonlinear systems and extracting high-dimensional features. Together with the DA methods, DL is capable of helping traditional methods to make forecasts without the conventional methods' assumptions. On the other side, the training data provided to DL technologies include several numerical, approximation, and round-off errors that are trained into the DL forecasting model. DA can increase the reliability of DL models, reducing errors by including information on physical meanings from the observed data.
This paper showed how DL and DA can be blended into the future generation of technology used in support of predictive models.
So far, the DDA algorithm still remains to be implemented and verified in various specific domains, which have huge state spaces and more complex internal and external mechanisms. Future work includes adding more data to the systems. A generalization of DDA could be developed by applying it to different dynamical systems.
The numerical solution of dynamic systems could be replaced by DL algorithms, such as Generative Adversarial Networks or Convolutional Neural Networks combined with LSTM, in order to make the runs faster. This will accelerate the forecast process towards a solution in real time. When combined with DDA, this future work has the potential to be very fast.
DDA is encouraging, as speed and accuracy are typically mutually exclusive. The results shown here are promising and demonstrate the benefit of merging DA and ML models, especially in computationally demanding physical models.

Author Contributions

R.A. and Y.-K.G. conceived and designed the experiments; J.Z. and S.H. performed the experiments; all the authors analyzed the data; all the authors contributed reagents/materials/analysis tools. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Imperial College-Zhejiang University Joint Applied Data Science Lab. The work is supported by the EP/T000414/1 PREdictive Modelling with QuantIfication of UncERtainty for MultiphasE Systems (PREMIERE) and the EP/T003189/1 Health assessment across biological length scales for personal pollution exposure and its mitigation (INHALE).

Acknowledgments

This work is supported by the EP/T000414/1 PREdictive Modelling with QuantIfication of UncERtainty for MultiphasE Systems (PREMIERE) and the EP/T003189/1 Health assessment across biological length scales for personal pollution exposure and its mitigation (INHALE).

Conflicts of Interest

The authors declare no conflict of interest.

Acronyms

DA      Data Assimilation
DDA     Deep Data Assimilation
DL      Deep Learning
DNN     Deep Neural Network
LSTM    Long Short-Term Memory
ML      Machine Learning
NN      Neural Network
VarDA   Variational Data Assimilation

References

  1. Kalnay, E. Atmospheric Modeling, Data Assimilation and Predictability; Cambridge University Press: Cambridge, MA, USA, 2003. [Google Scholar]
  2. Blum, J.; Le Dimet, F.X.; Navon, I.M. Data assimilation for geophysical fluids. In Handbook of Numerical Analysis; Elsevier: Amsterdam, The Netherlands, 2009; Volume 14, pp. 385–441. [Google Scholar]
  3. D’Elia, M.; Perego, M.; Veneziani, A. A variational data assimilation procedure for the incompressible Navier-Stokes equations in hemodynamics. J. Sci. Comput. 2012, 52, 340–359. [Google Scholar] [CrossRef]
  4. Potthast, R.; Graben, P.B. Inverse problems in neural field theory. SIAM J. Appl. Dyn. Syst. 2009, 8, 1405–1433. [Google Scholar] [CrossRef]
  5. Christie, M.A.; Glimm, J.; Grove, J.W.; Higdon, D.M.; Sharp, D.H.; Wood-Schultz, M.M. Error analysis and simulations of complex phenomena. Los Alamos Sci. 2005, 29, 6–25. [Google Scholar]
  6. Asch, M.; Bocquet, M.; Nodet, M. Data Assimilation: Methods, Algorithms, and Applications; SIAM: Philadelphia, PA, USA, 2016; Volume 11. [Google Scholar]
  7. Weinan, E. A proposal on machine learning via dynamical systems. Commun. Math. Stat. 2017, 5, 1–11. [Google Scholar]
  8. Li, Q.; Chen, L.; Tai, C.; Weinan, E. Maximum principle based algorithms for deep learning. J. Mach. Learn. Res. 2017, 18, 5998–6026. [Google Scholar]
  9. Boukabara, S.A.; Krasnopolsky, V.; Stewart, J.Q.; Maddy, E.S.; Shahroudi, N.; Hoffman, R.N. Leveraging Modern Artificial Intelligence for Remote Sensing and NWP: Benefits and Challenges. Bull. Am. Meteorol. Soc. 2019, 100, ES473–ES491. [Google Scholar] [CrossRef]
  10. Dueben, P.D.; Bauer, P. Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev. 2018, 11, 3999–4009. [Google Scholar] [CrossRef] [Green Version]
  11. Cacuci, D. Sensitivity and Uncertainty Analysis; Chapman & Hall/CRC: New York, NY, USA, 2003. [Google Scholar]
  12. Daescu, D.; Navon, I. Sensitivity analysis in nonlinear variational data assimilation: Theoretical aspects and applications. In Advanced Numerical Methods for Complex Environmental Models: Needs and Availability; Farago, I., Zlatev, Z., Eds.; Bentham Science Publishers: Sharjah, UAE, 2013. [Google Scholar]
  13. Arcucci, R.; D’Amore, L.; Pistoia, J.; Toumi, R.; Murli, A. On the variational data assimilation problem solving and sensitivity analysis. J. Comput. Phys. 2017, 335, 311–326. [Google Scholar] [CrossRef] [Green Version]
  14. Cacuci, D.; Navon, I.; Ionescu-Bujor, M. Computational Methods for Data Evaluation and Assimilation; CRC Press: Boca Raton, FL, USA, 2013. [Google Scholar]
  15. Fisher, M.; Leutbecher, M.; Kelly, G.A. On the equivalence between Kalman smoothing and weak-constraint four-dimensional variational data assimilation. Q. J. R. Meteorol. Soc. 2005, 131, 3235–3246. [Google Scholar] [CrossRef]
  16. Gagne, D.J.; McGovern, A.; Haupt, S.E.; Sobash, R.A.; Williams, J.K.; Xue, M. Storm-based probabilistic hail forecasting with machine learning applied to convection-allowing ensembles. Weather Forecast. 2017, 32, 1819–1840. [Google Scholar] [CrossRef]
  17. Campos, R.M.; Krasnopolsky, V.; Alves, J.H.G.; Penny, S.G. Nonlinear wave ensemble averaging in the Gulf of Mexico using neural networks. J. Atmos. Ocean. Technol. 2019, 36, 113–127. [Google Scholar] [CrossRef]
  18. Babovic, V.; Keijzer, M.; Bundzel, M. From global to local modelling: A case study in error correction of deterministic models. In Proceedings of the Fourth International Conference on Hydro Informatics, Cedar Rapids, IA, USA, 23–27 July 2000. [Google Scholar]
  19. Babovic, V.; Caňizares, R.; Jensen, H.R.; Klinting, A. Neural networks as routine for error updating of numerical models. J. Hydraul. Eng. 2001, 127, 181–193. [Google Scholar] [CrossRef]
  20. Babovic, V.; Fuhrman, D.R. Data assimilation of local model error forecasts in a deterministic model. Int. J. Numer. Methods Fluids 2002, 39, 887–918. [Google Scholar] [CrossRef]
  21. Brajard, J.; Carassi, A.; Bocquet, M.; Bertino, L. Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: A case study with the Lorenz 96 model. arXiv 2020, arXiv:2001.01520. [Google Scholar] [CrossRef]
  22. Rasp, S.; Dueben, P.D.; Scher, S.; Weyn, J.A.; Mouatadid, S.; Thuerey, N. WeatherBench: A benchmark dataset for data-driven weather forecasting. arXiv 2020, arXiv:2002.00469. [Google Scholar] [CrossRef]
  23. Geer, A.J. Learning Earth System Models from Observations: Machine Learning or Data Assimilation? Technical Report 863; ECMWF: Reading, UK, 2020. [Google Scholar] [CrossRef]
  24. Bocquet, M.; Brajard, J.; Carrassi, A.; Bertino, L. Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization. Found. Data Sci. 2020, 2, 55–80. [Google Scholar] [CrossRef] [Green Version]
  25. Raissi, M.; Perdikaris, P.; Karniadakis, G. Inferring solutions of differential equations using noisy multi-fidelity data. J. Comput. Phys. 2016, 335. [Google Scholar] [CrossRef] [Green Version]
  26. Perdikaris, P.; Raissi, M.; Damianou, A.; Lawrence, N.; Karniadakis, G. Nonlinear information fusion algorithms for data-efficient multi-fidelity modelling. Proc. R. Soc. A Math. Phys. Eng. Sci. 2017, 473, 20160751. [Google Scholar] [CrossRef]
  27. Chao, G.; Luo, Y.; Ding, W. Recent advances in supervised dimension reduction: A survey. Mach. Learn. Knowl. Extr. 2019, 1, 341–358. [Google Scholar] [CrossRef] [Green Version]
  28. Quilodrán Casas, C.; Arcucci, R.; Wu, P.; Pain, C.; Guo, Y.K. A Reduced Order Deep Data Assimilation model. Phys. D Nonlinear Phenom. 2020, 412, 132615. [Google Scholar] [CrossRef]
  29. Makarynskyy, O. Improving wave predictions with artificial neural networks. Ocean Eng. 2004, 31, 709–724. [Google Scholar] [CrossRef]
  30. Coskun, H.; Achilles, F.; Dipietro, R.; Navab, N.; Tombari, F. Long Short-Term Memory Kalman Filters: Recurrent Neural Estimators for Pose Regularization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5525–5533. [Google Scholar] [CrossRef] [Green Version]
  31. Zhu, J.; Hu, S.; Arcucci, R.; Xu, C.; Zhu, J.; Guo, Y.k. Model error correction in data assimilation by integrating neural networks. Big Data Min. Anal. 2019, 2, 83–91. [Google Scholar] [CrossRef]
  32. Buizza, C.; Fischer, T.; Demiris, Y. Real-Time Multi-Person Pose Tracking using Data Assimilation. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1–5 March 2020; pp. 1–8. [Google Scholar]
  33. Arcucci, R.; Moutiq, L.; Guo, Y.K. Neural assimilation. In International Conference on Computational Science; Springer: Berlin/Heidelberg, Germany, 2020; pp. 155–168. [Google Scholar]
  34. Foo, Y.W.; Goh, C.; Li, Y. Machine learning with sensitivity analysis to determine key factors contributing to energy consumption in cloud data centers. In Proceedings of the 2016 International Conference on Cloud Computing Research and Innovations (ICCCRI), Singapore, 4–5 May 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 107–113. [Google Scholar]
  35. Purser, R. A new approach to the optimal assimilation of meteorological data by iterative Bayesian analysis. In Proceedings of the Preprint of the 10th AMS Conference on Weather Forecasting and Analysis, Clearwater Beach, FL, USA, 25–29 June 1984; American Meteorological Society: Boston, MA, USA, 1984; pp. 102–105. [Google Scholar]
  36. Engl, H.W.; Hanke, M.; Neubauer, A. Regularization of Inverse Problems; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1996; Volume 375. [Google Scholar]
  37. Nichols, N. Mathematical concepts in data assimilation. In Data Assimilation; Lahoz, W., Khattatov, B., Menard, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
  38. Hansen, P. Rank Deficient and Discrete Ill-Posed Problems; SIAM: Philadelphia, PA, USA, 1998. [Google Scholar]
  39. Le Dimet, F.; Talagrand, O. Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. Tellus 1986, 38A, 97–110. [Google Scholar] [CrossRef]
  40. Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef] [Green Version]
  41. Elman, J.L. Finding structure in time. Cogn. Sci. 1990, 14, 179–211. [Google Scholar] [CrossRef]
  42. Jordan, M.I. Serial order: A parallel distributed processing approach. In Advances in Psychology; Elsevier: Amsterdam, The Netherlands, 1997; Volume 121, pp. 471–495. [Google Scholar]
  43. Lawless, A.S. Data Assimiliation with the Lorenz Equations; University of Reading: Reading, UK, 2002. [Google Scholar]
  44. Levenberg, K. A method for the solution of certain non-linear problems in least squares. Q. Appl. Math. 1944, 2, 164–168. [Google Scholar] [CrossRef] [Green Version]
  45. Marquardt, D.W. An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math. 1963, 11, 431–441. [Google Scholar] [CrossRef]
Figure 1. The prediction-correction cycle.
Figure 2. The Data Assimilation (DA) process.
Figure 3. The Deep Data Assimilation (DDA) process.
Figure 4. The schematic diagram of DDA.
Figure 5. Recurrent Neural Network.
Figure 6. Mean forecasting error, first test case.
Figure 7. Training process, first test case.
Figure 8. Mean forecasting error, second test case.
Figure 9. Training process, second test case.
Figure 10. Simulation result of DDA on the double integral system considering the model error. The vertical dashed lines refer to the training windows.
Figure 11. DA and DDA trajectories on the three axes.
Figure 12. Forecasts of the DA and DDA trajectories: the three-dimensional (3D) plot of Figure 11 for timesteps (1000, 1176).
Figure 13. The simulation result of DDA on the double integral system. The vertical dashed lines refer to the training windows.
Figure 14. Trajectories of DA and DDA forecasting, for the same timesteps as Figure 11 (1000, 1176). R-DDA (black dashed line) vs. F-DDA (red dotted line).
Table 1. The execution time cost for both F-DDA and R-DDA in every training and forecasting window when considering the model error.

Train       t1           t2          t3
F-DDA       0.9605 s     0.5723 s    0.4181 s
R-DDA       102.2533 s   16.4835 s   15.0160 s

Forecast    t1           t2          t3
F-DDA       17.9610 s    36.2358 s   56.4286 s
R-DDA       18.3527 s    35.9934 s   53.8951 s
Table 2. Run-time cost for F-DDA and R-DDA in training and forecasting of the Lorenz system.

            Train Time   Forecast Time
F-DDA       3.7812 s     3.8581 s
R-DDA       24.2487 s    3.2901 s