1. Introduction
The high-resolution spatio-temporal monitoring of sea surface geophysical parameters (e.g., temperature, salinity, ocean colour) is of key interest for a variety of scientific fields [1,2,3]. Direct observations of these geophysical tracers are provided by satellite remote sensing and in-situ networks. However, due to sensor characteristics (e.g., space-time sampling, sensor type) and their sensitivity to atmospheric conditions (e.g., rain, clouds), only partial and possibly noisy observations, with potentially high missing data rates, are available. As a consequence, producing high-resolution gap-free fields in both space and time from these observations has long been a crucial challenge, which has motivated the development of several spatio-temporal interpolation tools.
Within the satellite ocean community, Optimal Interpolation (OI) is a standard technique used in several operational products [4,5,6,7,8,9,10]. Given a covariance model of the spatio-temporal dynamics, the interpolated field results from a linear combination of the observations. In general, stationary covariance hypotheses are considered, which prove relevant for the reconstruction of horizontal scales above 100 km. Fine-scale components, on the other hand, can hardly be retrieved with such approaches, and a variety of research studies aim to improve the reconstruction of the high-resolution components of spatio-temporal fields.
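The linear-combination form of OI can be illustrated with a minimal one-dimensional sketch. It assumes a stationary Gaussian covariance with a 100 km length scale and zero-mean anomalies; the function names, the length scale and the noise variance are illustrative assumptions, not the settings of the operational products cited above.

```python
import numpy as np

def gaussian_cov(a, b, sigma2=1.0, length=100.0):
    """Stationary Gaussian covariance between 1-D positions a and b (in km)."""
    d = a[:, None] - b[None, :]
    return sigma2 * np.exp(-0.5 * (d / length) ** 2)

def optimal_interpolation(x_obs, y_obs, x_grid, noise_var=0.01, **cov_kw):
    """OI estimate on x_grid, written as a linear combination of the observations."""
    # Covariance between observations, including observation noise.
    C_oo = gaussian_cov(x_obs, x_obs, **cov_kw) + noise_var * np.eye(x_obs.size)
    # Covariance between grid points and observations.
    C_go = gaussian_cov(x_grid, x_obs, **cov_kw)
    # weights = C_go @ inv(C_oo); C_oo is symmetric, so a solve is enough.
    weights = np.linalg.solve(C_oo, C_go.T).T
    return weights @ y_obs, weights

x_obs = np.array([0.0, 150.0, 400.0])      # observation positions (km)
y_obs = np.array([0.5, -0.2, 0.8])         # observed anomalies
x_grid = np.linspace(0.0, 400.0, 9)        # interpolation grid, 50 km spacing
estimate, weights = optimal_interpolation(x_obs, y_obs, x_grid)
```

Each row of `weights` holds the coefficients applied to the observations for one grid point. With a stationary covariance these coefficients depend only on the distances involved, which is one way to see why scales below the chosen length scale are smoothed out.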
Empirical Orthogonal Function (EOF) based interpolation is another category widely used in geosciences [11,12,13]. It relies on a Singular Value Decomposition (SVD) to compute an EOF basis; the field is then reconstructed by iteratively projecting the observations onto the EOF subspace until a convergence criterion is reached [14]. Unfortunately, high missing data rates decrease the variability encoded in the EOF components, which results in the smoothing of fine-scale structures.
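The iterative projection described above can be sketched as follows. This is a simplified illustration that assumes a fixed number of retained modes and a zero first guess in the gaps; the function name `eof_fill` and the synthetic rank-2 field are hypothetical, and the cited schemes include further steps (e.g., choosing the number of modes) that are not reproduced here.

```python
import numpy as np

def eof_fill(data, mask, n_modes=2, tol=1e-6, max_iter=500):
    """Fill entries where mask is False using a truncated-SVD (EOF) reconstruction.

    data: (time, space) matrix; mask: boolean array, True where observed.
    """
    filled = np.where(mask, data, 0.0)          # first guess: zeros in the gaps
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        recon = (U[:, :n_modes] * s[:n_modes]) @ Vt[:n_modes]
        # Convergence is monitored on the gap values only.
        change = np.sqrt(np.mean((recon[~mask] - filled[~mask]) ** 2))
        filled = np.where(mask, data, recon)    # keep observations, update gaps
        if change < tol:
            break
    return filled

rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 40)
field = np.outer(np.sin(t), rng.standard_normal(30)) \
      + np.outer(np.cos(t), rng.standard_normal(30))   # synthetic rank-2 field
mask = rng.random(field.shape) > 0.3                    # about 30% missing data
filled = eof_fill(field, mask, n_modes=2)

rmse_gap = np.sqrt(np.mean((filled - field)[~mask] ** 2))
rmse_zero = np.sqrt(np.mean(field[~mask] ** 2))
```

On this synthetic low-rank field the gaps are recovered well. On real fields, the truncation to a few modes is what removes the fine-scale variability mentioned above.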
Data assimilation is the state-of-the-art framework for the reconstruction of dynamical systems from partial observations based on a given numerical model [15,16]. Statistical data assimilation schemes, especially ensemble Kalman filters, have become particularly popular due to their trade-off between computational efficiency and modeling flexibility. Unlike OI and EOF-based techniques, these schemes explicitly rely on dynamical priors to address interpolation issues, resulting in a better representation of fine-scale components. However, when dealing with sea surface dynamics, the analytical derivation of these priors involves simplifying assumptions which may not be satisfied by real observations [17]. By contrast, realistic analytical parameterizations may lead to computationally demanding numerical models associated with modeling and inversion uncertainties [18], which may limit their relevance for the interpolation of a single sea surface tracer.
Recently, data-driven approaches [11,19] have emerged as relevant alternatives to model-driven schemes. They benefit from the increasing availability of remote sensing observations and simulation data to derive computationally efficient [20] dynamical priors. Analog methods were among the first data-driven techniques developed within a data assimilation framework [19]. In our recent studies [20,21,22], we showed the relevance of such data-driven approaches for the spatio-temporal interpolation of sea surface geophysical tracers. Combining analog data assimilation (AnDA) with a patch-based representation performed well compared with the state-of-the-art OI and EOF-based schemes. However, the proposed framework involves tuning several parameters, principally due to the data-driven formulation of the dynamical prior based on analog forecasting. The implementation of this dynamical prior in an ensemble filtering scheme also limits the representativity of the model, as a trade-off between the method's parameters and the ensemble size needs to be addressed carefully to limit the computational complexity of the assimilation method. From this point of view, several works [23,24] have tried to formulate stochastic representations of dynamical operators for their optimal use in sequential filtering schemes. Methods based on prior knowledge of the variability of dynamical models have already been used to infer probabilistic representations. However, such techniques are limited to systems with available dynamical priors. Complex dynamical models, on the other hand, may require complex priors which may be unavailable or hard to derive.
In recent years, neural networks have enriched the state-of-the-art in probabilistic modelling. This is principally due to advances in deep learning models, which allow a better understanding of complicated systems. Probabilistic representations such as structured inference [25] and deep Gaussian processes [26] have rapidly become very popular in applications such as generative modeling and dynamical inference. From this point of view, the stochastic modelling of spatio-temporal fields is an interesting open challenge that may benefit from these advances and may allow the representation of complex stochastic dynamics without any prior knowledge of the system.
In this paper, we investigate data-driven interpolation approaches within a statistical data assimilation framework. We aim to derive stochastic data-driven representations of complex geophysical tracers. Among other representations [27], neural networks are particularly appealing due to their efficient trade-off between modeling abilities and interpretability of the learnt models. These models have rapidly become the state-of-the-art in machine learning for a wide range of applications, including inverse imaging issues [28]. Recent applications to the assimilation of low-dimensional dynamical systems [27] and to the forecasting of geophysical dynamics [29,30] have been developed. However, to our knowledge, the design of neural-network-based assimilation models for the spatio-temporal interpolation of geophysical dynamics remains an open challenge, which may greatly benefit from the ability of deep learning models to learn computationally efficient representations from available ocean observation and simulation datasets. In this study, we address this challenge and propose a novel NN-based Kalman filtering scheme applied to the spatio-temporal interpolation of satellite-derived sea surface temperature (SST). We aim to propose a parametric data-driven framework that embeds a stochastic representation of spatio-temporal dynamics. This architecture conveys a probabilistic representation through the prediction of a mean component and a covariance pattern. The latter may be regarded as an NN-based representation of the covariance patterns issued from Monte Carlo approximations in ensemble assimilation schemes [31]. Our model may then be directly exploited in sequential filtering schemes, which allows us to overcome the issues encountered in both analog data assimilation and parametric stochastic representations based on prior knowledge, namely numerical complexity and the availability of dynamical priors. Overall, the methodological contributions of this work are two-fold: (i) we propose a new probabilistic NN-based representation of 2D geophysical dynamics; (ii) we derive the associated NN-based Kalman filtering scheme for spatio-temporal interpolation issues. We demonstrate the relevance of these contributions with respect to state-of-the-art approaches [5,11,22] for the spatio-temporal interpolation of satellite-derived SST fields in a case study region off South Africa. This region involves complex fine-scale SST dynamics (e.g., fronts, filaments) which cannot be retrieved using classical state-of-the-art techniques.
This paper is organized as follows. Section 2 reviews data assimilation schemes. Section 3 describes the proposed neural-network-based data assimilation framework. Section 4 presents the SST dataset used in our experiments as well as the parametrization chosen for the proposed model and benchmark techniques. Section 5 presents the results of the numerical experiments. We further discuss our contributions in Section 6.
2. Problem Statement and Related Work
Regarding ocean remote sensing data, spatio-temporal interpolation issues (also referred to as data assimilation in geoscience) can be regarded as the reconstruction of some hidden states from partial and/or noisy observation series [31]. Data assimilation techniques usually involve a state-space evolution model [31]:
$$x_{t+dt} = \mathcal{M}(x_t) + \eta_t \tag{1}$$
$$y_t = \mathcal{H}(x_t) + \epsilon_t \tag{2}$$
where $dt$ represents the temporal resolution of our time series and $\mathcal{M}$ the dynamical model describing the temporal evolution of the physical variables $x$. The observation model $\mathcal{H}$ links the observation $y$ to the physical variable $x$. $\eta$ and $\epsilon$ are random processes accounting for the uncertainties in the dynamical and observation models. They are usually defined as centered Gaussian processes with covariances $Q$ and $R$ respectively.
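From a probabilistic point of view, the spatio-temporal interpolation problem can be seen as a Bayesian filtering problem where the main goal is to evaluate the conditional probabilities $p(x_{t+dt} \mid y_{1:t})$ (prediction distribution of the state $x_{t+dt}$ given observations up to time $t$) and $p(x_{t+dt} \mid y_{1:t+dt})$ (posterior distribution of $x_{t+dt}$ given observations up to time $t+dt$). Under certain assumptions on the state-space model (the dynamical and observation models are linear with Gaussian uncertainties), the prediction and posterior distributions are also Gaussian and can be written as:
$$p(x_{t+dt} \mid y_{1:t}) = \mathcal{N}(x^{-}_{t+dt}, P^{-}_{t+dt}) \tag{3}$$
$$p(x_{t+dt} \mid y_{1:t+dt}) = \mathcal{N}(x^{+}_{t+dt}, P^{+}_{t+dt}) \tag{4}$$
with the means and covariances computed for each time $t$ using the well-known Kalman recursion
$$x^{-}_{t+dt} = F x^{+}_{t} \tag{5}$$
$$P^{-}_{t+dt} = F P^{+}_{t} F^{\top} + Q \tag{6}$$
$$x^{+}_{t+dt} = x^{-}_{t+dt} + K_{t+dt}\,(y_{t+dt} - H x^{-}_{t+dt}) \tag{7}$$
$$P^{+}_{t+dt} = (I - K_{t+dt} H)\, P^{-}_{t+dt} \tag{8}$$
with
$$K_{t+dt} = P^{-}_{t+dt} H^{\top} (H P^{-}_{t+dt} H^{\top} + R)^{-1} \tag{9}$$
Here $F$ and $H$ correspond respectively to some linear dynamical and observation models, and $Q$ and $R$ to the covariances of the model and observation errors. The superscript (-) refers to the forecast of the mean of the state variable $x^{-}_{t+dt}$ and of its covariance matrix $P^{-}_{t+dt}$ given observations up to time $t$ but without the new observation at time $t+dt$. The superscript (+) refers, on the other hand, to the mean of the state variable $x^{+}_{t+dt}$ and to the covariance matrix $P^{+}_{t+dt}$ given all observations up to time $t+dt$. They are referred to as the assimilated mean and covariance.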
$K_{t+dt}$ is the Kalman gain. Kalman filters provide a sequential formulation of Optimal Interpolation (OI) [15], which may also be solved directly knowing the space-time covariance of processes $x$ and $y$. For non-linear and high-dimensional dynamical systems, the Probability Density Functions (PDFs) are no longer Gaussian and the above Kalman recursion no longer defines their means and covariances. Ensemble Kalman methods have been proposed to address these issues. The ensemble Kalman filter (EnKF) and smoother [31] were the first sequential filtering techniques used reliably in the reconstruction of geophysical fields. The key idea here is to approximate the forecasting mean $x^{-}_{t+dt}$ and covariance $P^{-}_{t+dt}$ by a sample mean and covariance matrix computed by propagating an ensemble of $M$ members, $\{x_{i,t}\}_{i=1}^{M}$, using the dynamical model $\mathcal{M}$.
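The linear Kalman recursion recalled above can be sketched as a single forecast/analysis cycle. This is a minimal illustration under the linear-Gaussian assumptions of this section; the function name `kalman_step`, the handling of missing observations through NaN values, and the toy matrices are assumptions made for the example.

```python
import numpy as np

def kalman_step(x_post, P_post, y, F, H, Q, R):
    """One forecast/analysis cycle; NaN entries of y are treated as missing."""
    # Forecast step (superscript -).
    x_pred = F @ x_post
    P_pred = F @ P_post @ F.T + Q
    obs = ~np.isnan(y)
    if not obs.any():
        return x_pred, P_pred               # nothing to assimilate
    Ho, Ro = H[obs], R[np.ix_(obs, obs)]    # restrict to observed components
    # Kalman gain K = P_pred Ho^T (Ho P_pred Ho^T + Ro)^{-1}.
    S = Ho @ P_pred @ Ho.T + Ro
    K = np.linalg.solve(S, Ho @ P_pred).T
    # Analysis step (superscript +).
    x_new = x_pred + K @ (y[obs] - Ho @ x_pred)
    P_new = (np.eye(x_post.size) - K @ Ho) @ P_pred
    return x_new, P_new

F = np.array([[1.0, 0.1], [0.0, 1.0]])      # toy linear dynamics
H = np.eye(2)
Q, R = 0.01 * np.eye(2), 0.1 * np.eye(2)
x0, P0 = np.zeros(2), np.eye(2)

# Only the first component is observed at this time step.
x1, P1 = kalman_step(x0, P0, np.array([1.0, np.nan]), F, H, Q, R)
# No observation at all: the cycle reduces to the forecast.
x2, P2 = kalman_step(x0, P0, np.array([np.nan, np.nan]), F, H, Q, R)
```

Dropping the rows of `H` that correspond to missing pixels is one simple way to handle the partial observations discussed in the Introduction; other conventions are possible.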
Despite its advantages, the EnKF does not escape the curse of dimensionality. High-dimensional systems require large ensemble sizes $M$, which may lead to a very high computational complexity. The use of small ensemble sizes, on the other hand, may result in undersampling the covariance matrix (the considered ensemble is not representative of the system's dynamics), which may in turn result in poor reconstruction performance, including for instance filter divergence and spurious long-range correlations. Proposed solutions such as inflation [32], cross-validation [33] and localization methods [34,35,36] may require thorough tuning experiments. An alternative strategy based on a model-driven propagation of parametric covariance models [23,24] seems appealing. Using advection priors [37], it propagates parametric covariance structures, which leads to the implementation of the classic Kalman recursion. Accounting for more complex dynamical priors for the covariance structure remains an open question, which may limit the applicability of this approach to complex geophysical systems. Inspired by the latter parametric framework, we aim to design an efficient sequential filtering technique for the reconstruction of geophysical fields. Rather than considering a model-driven prior to propagate Gaussian states as in [23,24], we investigate NN-based priors, which may be fitted from training data. The resulting NN-based Gaussian representations provide computationally efficient approximations of the dynamical priors that should prevent undersampling issues within a Kalman recursion.
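The undersampling issue mentioned above can be illustrated numerically. The sketch below assumes a 50-dimensional state whose components are truly uncorrelated, and compares the sample covariance obtained from a small and a large ensemble; the dimensions, ensemble sizes and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                    # state dimension (assumed)

def sample_forecast_stats(members):
    """Sample mean and covariance of an ensemble stored as rows."""
    mean = members.mean(axis=0)
    anomalies = members - mean
    cov = anomalies.T @ anomalies / (members.shape[0] - 1)
    return mean, cov

def max_spurious_covariance(M):
    """Largest off-diagonal sample covariance when the true covariance is the identity."""
    members = rng.standard_normal((M, n))   # true off-diagonal terms are zero
    _, cov = sample_forecast_stats(members)
    off_diag = cov - np.diag(np.diag(cov))
    return np.abs(off_diag).max()

small = max_spurious_covariance(10)       # small ensemble: large spurious terms
large = max_spurious_covariance(2000)     # large ensemble: spurious terms shrink
```

With 10 members the sample covariance also has rank at most 9, far below the state dimension, which is the kind of deficiency that inflation and localization try to compensate for.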
3. Proposed Interpolation Model
3.1. Neural-Network Gaussian Dynamical Prior
Our key idea is to exploit neural-network (NN) representations for the time propagation of a Gaussian approximation of the distribution of the physical variable $x$. Compared with dynamical priors in the assimilation model (1), which states the conditional distribution $p(x_{t+dt} \mid x_t)$, we consider neural-network representations to extend the prediction step of the Kalman recursion (5) and (6) to non-linear dynamics. Formally, this amounts to defining:
$$x^{-}_{t+dt} = \mathcal{F}_{\mu}(x^{+}_{t}; \theta_{\mu}) \tag{16}$$
$$P^{-}_{t+dt} = \mathcal{F}_{\Sigma}(x^{+}_{t}, P^{+}_{t}; \theta_{\Sigma}) \tag{17}$$
with $x^{-}_{t+dt}$ and $P^{-}_{t+dt}$ the predicted mean and covariance of the Gaussian approximation of the state at time $t+dt$ given the assimilated mean $x^{+}_{t}$ and covariance $P^{+}_{t}$ at time $t$. Functions $\mathcal{F}_{\mu}$ and $\mathcal{F}_{\Sigma}$ are neural networks to be defined, with parameter vectors $\theta_{\mu}$ and $\theta_{\Sigma}$. It may be noted that our parameterization follows (5) and (6), such that the update of the mean component in (16) only depends on the mean at the previous time step, whereas the update of the covariance depends on both the mean and the covariance at the previous time step. Given this NN-based representation of the prediction step of the Kalman filter, we apply the classic Kalman-based filtering under the assumption that the observation model is linear and Gaussian.
Such a formulation does not require forecasting an ensemble to compute a sample covariance matrix. It results in a significant reduction of the computational complexity. The same holds when compared to the computational complexity of the analog data assimilation which involves ensemble forecasting and repeated nearest-neighbor search.
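The structure of this NN-based prediction step can be sketched as follows. The two small networks below are untrained and purely illustrative: their sizes, the random weights, and the diagonal, log-variance form of the covariance output are assumptions made for the example, not the architecture detailed in Section 3.2.

```python
import numpy as np

rng = np.random.default_rng(0)
n, hidden = 4, 16                         # toy state and hidden sizes (assumed)

# Hypothetical, untrained parameters standing in for the learnt networks.
theta_mu = {"W1": 0.1 * rng.standard_normal((hidden, n)),
            "W2": 0.1 * rng.standard_normal((n, hidden))}
theta_sigma = {"W1": 0.1 * rng.standard_normal((hidden, 2 * n)),
               "W2": 0.1 * rng.standard_normal((n, hidden))}

def predict_mean(x_post, theta):
    """Mean forecast: depends only on the previous assimilated mean."""
    return x_post + theta["W2"] @ np.tanh(theta["W1"] @ x_post)

def predict_cov(x_post, P_post, theta):
    """Covariance forecast: depends on the previous mean and covariance.

    Only a diagonal covariance is modelled here; the exponential keeps
    the predicted variances strictly positive.
    """
    features = np.concatenate([x_post, np.log(np.diag(P_post))])
    log_var = theta["W2"] @ np.tanh(theta["W1"] @ features)
    return np.diag(np.exp(log_var))

def nn_forecast(x_post, P_post):
    """Deterministic Gaussian forecast: no ensemble is propagated."""
    return (predict_mean(x_post, theta_mu),
            predict_cov(x_post, P_post, theta_sigma))

x_pred, P_pred = nn_forecast(np.ones(n), 0.5 * np.eye(n))
```

A single forward pass of each network replaces the propagation of an ensemble, which is where the reduction in computational complexity comes from.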
3.2. Patch-Based NN Architecture
When considering spatio-temporal fields, the application of the model defined by (16) and (17) should be considered with care to account for the underlying dimensionality, especially for the covariance model. For this reason, a global representation of the spatio-temporal field is most likely to fail due to computational limitations. Following our previous works on analog data assimilation [21,22], we consider a patch-based representation as sketched in Figure 1 (a patch is a $P \times P$ subregion of a 2D field, with $P$ the width and the height of the patch). This patch-based representation is fully embedded in the considered NN architecture to make explicit both the extraction of the patches from a 2D field and the reconstruction of a 2D field from the collection of patches. The latter involves a reconstruction operator which is learnt from data.
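The patch extraction and recombination steps can be sketched as below. Note that the recombination used here is a plain average over overlapping patches, which is only a stand-in for the learnt reconstruction operator; the stride, the field size and the function names are assumptions made for the example.

```python
import numpy as np

def extract_patches(field, P, stride):
    """Return patches of shape (n_patches, P, P) and their top-left corners."""
    rows = range(0, field.shape[0] - P + 1, stride)
    cols = range(0, field.shape[1] - P + 1, stride)
    corners = [(i, j) for i in rows for j in cols]
    patches = np.stack([field[i:i + P, j:j + P] for i, j in corners])
    return patches, corners

def average_patches(patches, corners, shape):
    """Stand-in for the learnt reconstruction: average overlapping patches."""
    total = np.zeros(shape)
    count = np.zeros(shape)
    P = patches.shape[1]
    for patch, (i, j) in zip(patches, corners):
        total[i:i + P, j:j + P] += patch
        count[i:i + P, j:j + P] += 1
    return total / np.maximum(count, 1)     # avoid division by zero

field = np.arange(64, dtype=float).reshape(8, 8)
patches, corners = extract_patches(field, P=4, stride=2)
rebuilt = average_patches(patches, corners, field.shape)
```

With unmodified patches, averaging returns the original field exactly. Once patches are forecast or updated independently, neighbouring patches disagree in their overlaps, which is the situation a learnt reconstruction operator is meant to handle.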
Regarding the mean model of (16), the proposed architecture proceeds as follows:
At a given time $t$, the first layer of the network, which is parameter-free in terms of training, decomposes an input field $x_t$ into a collection of $P \times P$ patches $x_{s,t}$, where $P$ is the width and height of each patch and $s$ the patch location in the global field. Each patch is decomposed onto an EOF basis $B$ according to:
$$z_{s,t} = B^{\top} x_{s,t} \tag{18}$$
with $z_{s,t}$ the EOF decomposition of the patch $x_{s,t}$. The EOF decomposition matrix $B$ is trained offline as a preprocessing step;
For each patch location $s$, we predict $z_{s,t+dt}$ using an EOF-patch-based model $\mathcal{F}_{\mu,s}$. This model is implemented based on a residual architecture to mimic a numerical integration scheme (typically, an Euler or 4th-order Runge-Kutta scheme) of an approximate Ordinary Differential Equation (ODE) parametrized by the residual block of our residual network. In contrast to other neural network models, this architecture guarantees the physical interpretability of our dynamical model, as stated in [27]. In order to enhance the modeling capabilities of our approximate model, the residual block is a classic Multilayer Perceptron (MLP) network with bilinear layers;
The third layer is a reconstruction network $\mathcal{F}_{r}$. It combines the predicted patches $x_{s,t+dt}$ to reconstruct the output field $x_{t+dt}$. This reconstruction network involves a convolutional neural network [38].
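The link between a residual architecture and a numerical integration scheme, used in the second layer above, can be sketched as follows. The residual block below combines one linear and one bilinear (element-wise product) term with random, untrained weights; its size and form are assumptions made for illustration and do not reproduce the parameterization given in Section 4.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3                                           # EOF coefficients per patch (assumed)
A = 0.1 * rng.standard_normal((k, k))           # linear layer
B1 = 0.1 * rng.standard_normal((k, k))          # bilinear layer, first factor
B2 = 0.1 * rng.standard_normal((k, k))          # bilinear layer, second factor

def residual_block(z):
    """Approximate ODE right-hand side dz/dt = f(z): linear plus bilinear terms."""
    return A @ z + (B1 @ z) * (B2 @ z)

def euler_step(z, dt):
    """Plain residual connection: z + dt * f(z)."""
    return z + dt * residual_block(z)

def rk4_step(z, dt):
    """4th-order Runge-Kutta step built from four evaluations of the same block."""
    k1 = residual_block(z)
    k2 = residual_block(z + 0.5 * dt * k1)
    k3 = residual_block(z + 0.5 * dt * k2)
    k4 = residual_block(z + dt * k3)
    return z + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

z0 = np.ones(k)
z_euler = euler_step(z0, 0.1)
z_rk4 = rk4_step(z0, 0.1)
```

In both schemes the trainable part is the residual block alone, so the learnt function can be read as the right-hand side of an ODE on the EOF coefficients.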
The details of the considered parameterizations for the second and third layers are given in Section 4. To train the mean dynamical model, we apply a two-step procedure. We first learn the local dynamical models by minimizing the EOF-patch-based forecasting error. The reconstruction network is then optimized using the same criterion over the global field. This training procedure allows the patch-based models to be interpreted as local dynamical models and the reconstruction network as a post-processing operator. Other training configurations could be envisaged; for example, the whole model could be trained according to a forecasting error over the global field. However, this results in inconsistent patch models that cannot be used in assimilation experiments because of patch reconstruction issues.
Regarding the covariance model of (17), we also consider a patch-based representation of the spatial domain. More precisely, we adopt a block-diagonal parameterization of the covariance model, obtained by training diagonal patch-level covariance models in the EOF space. It may be noted that a diagonal parameterization of the covariance in the EOF space yields a full covariance matrix in the original patch space.
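The last remark can be checked numerically. The sketch below assumes flattened $4 \times 4$ patches, an orthonormal EOF basis estimated by SVD from synthetic samples, and five retained modes with arbitrary variances; all of these values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic, spatially correlated 4x4 patches, flattened to length 16.
samples = rng.standard_normal((200, 16)) @ rng.standard_normal((16, 16))
samples -= samples.mean(axis=0)

# EOF basis: leading right singular vectors of the centred sample matrix.
_, _, Vt = np.linalg.svd(samples, full_matrices=False)
n_eof = 5
basis = Vt[:n_eof].T                              # (16, n_eof), orthonormal columns

var_eof = np.array([4.0, 2.0, 1.0, 0.5, 0.25])    # diagonal covariance in EOF space
cov_patch = basis @ np.diag(var_eof) @ basis.T    # covariance in patch space

off_diag = cov_patch - np.diag(np.diag(cov_patch))
n_offdiag = np.count_nonzero(np.abs(off_diag) > 1e-12)
```

Only `n_eof` numbers are stored per patch, yet `cov_patch` has non-zero off-diagonal terms, so pixel-to-pixel correlations inside a patch are represented. When the basis is truncated, as here, the resulting matrix has rank `n_eof`.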
Each patch-based covariance model is learnt according to a Maximum Likelihood (ML) criterion. The associated training dataset comprises patch-based EOF decompositions of the states forecasted by the mean model from states of the training dataset corrupted by an additive Gaussian perturbation, whose covariance is given by the empirical covariance of the EOF patches over the entire training dataset. Overall, for a given patch, the restriction of the covariance onto this patch is parameterized (Equation (19)) by a diagonal covariance model in the EOF space, itself parametrized by a neural network, together with a scaling function. Among different parameterizations, a constant scaling function led to the best performance in our numerical experiments. Regarding the diagonal covariance model, details on its parametrization are given in the next section.
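The ML criterion for a diagonal Gaussian model can be sketched as a negative log-likelihood over forecast residuals in the EOF space. In this simplified illustration the variances are constant rather than predicted by a neural network, and the residuals are synthetic; the function name and all numerical values are assumptions made for the example.

```python
import numpy as np

def gaussian_nll_diag(residuals, variances):
    """Average negative log-likelihood of residuals under N(0, diag(variances))."""
    residuals = np.atleast_2d(residuals)
    per_sample = 0.5 * np.sum(
        np.log(2.0 * np.pi * variances) + residuals ** 2 / variances, axis=-1
    )
    return per_sample.mean()

rng = np.random.default_rng(3)
true_var = np.array([4.0, 1.0, 0.25])
# Synthetic residuals between forecasted and true EOF coefficients.
residuals = rng.standard_normal((5000, 3)) * np.sqrt(true_var)

# For constant variances, the ML estimate is the mean squared residual.
ml_var = np.mean(residuals ** 2, axis=0)
nll_ml = gaussian_nll_diag(residuals, ml_var)
nll_inflated = gaussian_nll_diag(residuals, 2.0 * ml_var)
nll_deflated = gaussian_nll_diag(residuals, 0.5 * ml_var)
```

Both over- and under-estimated variances increase the criterion, which is what makes it suitable for training a network whose outputs are the predicted variances.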
To illustrate the relevance of the proposed block-diagonal covariance matrix parametrization (based on a patch-based projection onto the EOF space, as in Equation (19)), we also investigate a diagonal covariance matrix model in the patch space.
3.3. Data Assimilation Procedure
Given a trained patch-based NN representation as described in the previous section, we derive the associated Kalman-like filtering procedure. As summarized in Algorithm 1, at time step $t$, given the mean and covariance of the Gaussian approximation of the posterior distribution at the previous time step, we first compute the forecasted Gaussian approximation at time $t$, defined by its mean field and its patch-based covariance. The assimilation of the new observation is performed at the patch level. For each patch $s$, we update the patch-level mean and covariance using the Kalman recursion (8) with the patch-level observation. We then combine these patch-level updates to obtain the global mean and covariance. Whereas the global mean is computed using the trained reconstruction network, the global covariance simply stores the collection of patch-level covariances. This procedure is iterated up to the end of the observation sequence.
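Algorithm 1 Patch-based NNKF reconstruction
1: procedure PB-NNKF(…, …, y, R)
2:   for t in the observation sequence:
3-6:     forecast the global mean field and the patch-level covariances from the previous assimilated mean and covariances
7:     for s in the collection of patches:
8-11:       update the patch-level mean and covariance with the Kalman recursion, using the patch-level observation
12:     reconstruct the global mean from the updated patches with the trained reconstruction network and store the collection of patch-level covariances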
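The overall loop of the procedure described in this section can be sketched as follows. This is a toy illustration under strong assumptions: the mean and covariance forecasts are passed in as callables (here a persistence forecast and a simple additive inflation), the observation operator is the identity on observed pixels, and the learnt reconstruction network is replaced by an average over overlapping patches. All names and values are hypothetical.

```python
import numpy as np

def patch_update(x_pred, P_pred, y, obs_var):
    """Kalman update of one flattened patch; NaNs in y mark missing pixels."""
    obs = ~np.isnan(y)
    if not obs.any():
        return x_pred, P_pred
    H = np.eye(x_pred.size)[obs]
    S = H @ P_pred @ H.T + obs_var * np.eye(int(obs.sum()))
    K = np.linalg.solve(S, H @ P_pred).T
    x_post = x_pred + K @ (y[obs] - H @ x_pred)
    P_post = (np.eye(x_pred.size) - K @ H) @ P_pred
    return x_post, P_post

def pb_nnkf(x0, P0_patches, ys, corners, P, obs_var, mean_model, cov_model):
    """x0: (H, W) mean; P0_patches: list of (P*P, P*P) covariances;
    ys: (T, H, W) observations with NaNs; mean_model, cov_model: forecast callables."""
    x_post, P_post, means = x0, P0_patches, []
    for y in ys:
        x_pred = mean_model(x_post)                       # global mean forecast
        total, count = np.zeros_like(x_pred), np.zeros_like(x_pred)
        new_P = []
        for (i, j), P_s in zip(corners, P_post):
            sl = (slice(i, i + P), slice(j, j + P))
            P_pred_s = cov_model(x_post[sl].ravel(), P_s)  # patch covariance forecast
            x_s, P_s_post = patch_update(x_pred[sl].ravel(), P_pred_s,
                                         y[sl].ravel(), obs_var)
            total[sl] += x_s.reshape(P, P)
            count[sl] += 1
            new_P.append(P_s_post)
        # Averaging stands in for the trained reconstruction network.
        x_post, P_post = total / np.maximum(count, 1), new_P
        means.append(x_post)
    return np.stack(means), P_post

rng = np.random.default_rng(4)
T, size, P = 20, 8, 4
truth = np.ones((size, size))
ys = truth + 0.1 * rng.standard_normal((T, size, size))
ys[rng.random(ys.shape) < 0.7] = np.nan                   # about 70% missing data
corners = [(i, j) for i in range(0, size - P + 1, 2) for j in range(0, size - P + 1, 2)]
means, P_final = pb_nnkf(
    np.zeros((size, size)), [np.eye(P * P) for _ in corners], ys, corners, P, 0.01,
    mean_model=lambda x: x,                               # persistence forecast
    cov_model=lambda x_s, P_s: P_s + 0.01 * np.eye(P_s.shape[0]),
)
```

Because the global mean is rebuilt after every time step and then fed back to all patches, an update made in one patch can influence its neighbours at the next step through the overlaps.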
Compared with the patch-based analog data assimilation [22], it may be noted that we iterate patch-level assimilation steps and global reconstruction steps thanks to the NN-based propagation of the patch-based covariance structure. This procedure potentially allows information to propagate from one patch to neighboring ones after each assimilation step. By contrast, in the patch-based analog data assimilation, each patch is processed independently, so that no such information propagation can occur. We regard this as a key feature to account for the propagation of geophysical structures (e.g., fronts, eddies, filaments).
We refer to the patch-based NNKF reconstruction model using the EOF block-diagonal parameterization of the covariance model as PB-NNKF-EOF. The model using the diagonal parameterization of the covariance model in the patch space is referred to as PB-NNKF.
6. Conclusions
In this work, we addressed neural-network-based models for the spatio-temporal interpolation of satellite-derived SST fields. We introduced a novel probabilistic NN-based representation of geophysical dynamics. This representation, which relies on a patch-level and EOF-based decomposition, allows us to propagate in time a mean component and the covariance of the SST field. It makes the derivation of an associated Kalman filter for the spatio-temporal interpolation of SST dynamics straightforward. The relevance of the proposed framework is demonstrated in our numerical experiments with respect to state-of-the-art approaches. Our method clearly outperforms the optimal interpolation and DINEOF-based schemes, which fail to retrieve fine-scale structures due to the high missing data rate in our observations. Comparing our data-driven data assimilation scheme to the analog data assimilation framework reveals the importance of investigating such filtering representations. Our numerical experiments show an important gain with respect to analog-forecasting-based schemes, which is principally due to the formulation of our stochastic dynamical model. The patch-based identification procedure significantly reduces the identification complexity while still giving good priors. Recombining the patches to form the global output removes the need for a finely tuned post-processing step, which can degrade the results as illustrated in our experiments. Finally, the stochastic formulation of our dynamical model allows the propagation of a parametric PDF of our transition function in a Kalman-like assimilation scheme. This stochastic formulation is completely learnt from data and removes the need for the ensemble formulation, which may cause limitations in terms of numerical complexity.
We believe that this study opens a new research avenue for the design of stochastic dynamical representations of spatio-temporal fields. The application of the proposed framework to other sea surface geophysical tracers, including multi-source and multi-modal interpolation issues, is our first priority. Sea Level Anomaly (SLA) fields could provide an interesting case study, as the associated space-time sampling is particularly scarce and multi-source strategies are of key interest [40]. Improving the formulation and training of the covariance model is also an important issue. Learning our covariance model based on one-step-ahead ensemble forecasting is most likely to fail in sequential assimilation frameworks when provided with observations with highly irregular temporal sampling. Optimizing our covariance model based on the spatio-temporal sampling of our observations is an interesting path to investigate in future work.
The use of the RMSE both for training our data-driven models and as a diagnostic tool raises the question of the relevance of this criterion, although the qualitative visual analysis of our reconstructed fields supported the relevance of the proposed technique. The development of more rigorous diagnostic and training criteria based on structure matching is an appealing research avenue. Exploiting stability analysis tools such as Lyapunov exponents is an interesting approach that may increase the modeling capabilities of our data-driven framework.
Finally, the interpretation of the parametrization of the reconstruction network is an open issue. In our work, the reconstruction network was tuned to give the best forecasting performance at a low computational complexity. However, defining a relationship between the reconstruction network parameters (e.g., number of filters, kernel size, activation function) and the physical system (e.g., identification of fine-scale structures, patch boundaries) is an open research topic that might be addressed in the coming years thanks to advances in deep learning interpretability.