Data-driven Interpolation of Sea Level Anomalies using Analog Data Assimilation

Despite the well-known limitations of Optimal Interpolation (OI), it remains the conventional method to interpolate Sea Level Anomalies (SLA) from altimeter-derived along-track data. In view of recent developments in data-driven methods as a means to better exploit large-scale observation, simulation and reanalysis datasets for solving inverse problems, this study aims to improve the reconstruction of higher-resolution SLA fields using analog strategies. The reconstruction is stated as an analog data assimilation problem, where the analog models rely on patch-based and EOF-based representations to circumvent the curse of dimensionality. We implement an Observation System Simulation Experiment (OSSE) in the South China Sea. The reported results show the relevance of the proposed framework, with a significant gain in terms of root mean square error (RMSE) for scales below 100 km. We further discuss the usefulness of the proposed analog model as a means to exploit high-resolution model simulations for the processing and analysis of current and future satellite-derived altimetric data.


Introduction
The past twenty years have witnessed a deluge of ocean satellite data, such as sea surface height, sea surface temperature, ocean color, ocean currents, sea ice, etc. This has helped build large databases of valuable information and represents a major opportunity for the interplay of ideas between the ocean remote sensing community and the data science community. Exploring machine learning methods in general, and non-parametric methods in particular, is now feasible and is increasingly drawing the attention of many researchers […]

More specifically, analog forecasting (Lorenz, 1969), which is among the earliest statistical methods explored in geoscience, benefits from recent advances in data science. In short, analog forecasting is based on the assumption that the future state of a system can be predicted throughout the succes[…]

In this work, we build upon our recent advances in analog data assimilation […]

The remainder of the paper is organized as follows: Section 2 presents the different datasets used in this paper to design an OSSE, Section 3 gives insights into the classical methods used for mapping SLA from along-track data, and Section 4 introduces the proposed analog data assimilation model. Experimental results for the considered OSSE are shown in Section 5, and Section 6 further discusses the key aspects of this work.

[…] fields. An example of these fields is given in Figure 1.

[…] considered here to be Gaussian, centered and of covariance R. We assume that the observation error and η are independent and that Q and R are known. Two main approaches are generally considered for the mathematical resolution of the system (1)-[…]

They differ in the way they infer the analyzed state $x^a$: the first is based on […]

[…] covariance $P^a$ can be calculated using the following OI set of equations: […]

It is worth mentioning that Lorenc (1986) showed that OI is closely related to the 3D-Var variational data assimilation algorithm, which obtains $x^a$ by minimizing the following cost function: […]

While OI has been shown to successfully retrieve large-scale structures in the ocean (≥ 150 km), a well-known limitation of OI is that the Gaussian-like error covariance matrices smooth out the small-scale information (e.g. […]
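The OI equations and the 3D-Var cost function did not survive extraction here. As a reference point only, the sketch below uses the textbook linear-Gaussian (BLUE) form, with a background state `xb`, a background error covariance `B`, a linear observation operator `H` and the observation error covariance R. The names `xb`, `B`, `H` and the Gaussian-shaped covariance on a 1D grid are assumptions for illustration, not the paper's configuration. The sketch also checks numerically the OI/3D-Var equivalence attributed to Lorenc (1986), taking the usual quadratic cost $J(x) = (x - x_b)^\top B^{-1}(x - x_b) + (y - Hx)^\top R^{-1}(y - Hx)$ (up to a factor 1/2).

```python
import numpy as np

def gaussian_cov(n, length_scale, variance=1.0, nugget=1e-2):
    """Gaussian-shaped covariance on a regular 1D grid (illustrative choice).

    A small diagonal 'nugget' is added only to keep the matrix well conditioned.
    """
    d = np.arange(n)[:, None] - np.arange(n)[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2) + nugget * np.eye(n)

def optimal_interpolation(xb, B, H, R, y):
    """OI / BLUE analysis: returns the analysis x_a and its covariance P_a."""
    S = H @ B @ H.T + R                   # innovation covariance
    K = B @ H.T @ np.linalg.inv(S)        # gain matrix
    xa = xb + K @ (y - H @ xb)            # analysis state
    Pa = (np.eye(len(xb)) - K @ H) @ B    # analysis error covariance
    return xa, Pa

def three_dvar(xb, B, H, R, y):
    """Closed-form minimizer of the quadratic 3D-Var cost J(x) for a linear H."""
    Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
    # Setting grad J = 0 gives (B^-1 + H^T R^-1 H) x = B^-1 xb + H^T R^-1 y
    return np.linalg.solve(Binv + H.T @ Rinv @ H, Binv @ xb + H.T @ Rinv @ y)

# Toy setup (hypothetical): 50 grid points, 3 point observations, zero background
n = 50
B = gaussian_cov(n, length_scale=3.0)
xb = np.zeros(n)
obs_idx = np.array([10, 25, 40])
H = np.zeros((obs_idx.size, n))
H[np.arange(obs_idx.size), obs_idx] = 1.0   # H samples the grid at obs_idx
R = 0.01 * np.eye(obs_idx.size)
y = np.array([1.0, -0.5, 0.8])

xa_oi, Pa = optimal_interpolation(xb, B, H, R, y)
xa_var = three_dvar(xb, B, H, R, y)
```

Both routes give the same analysis. Each observation is spread over neighbouring grid points with the shape of the covariance, so structures narrower than the assumed length scale cannot be recovered, which is the smoothing limitation mentioned above.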
The difference between AnDA and classical data assimilation resides in […] where $\mathcal{N}(\mu_t, \Sigma_t)$ is a Gaussian distribution of mean $\mu_t$ and covariance $\Sigma_t$.

These parameters of the Gaussian distribution are calculated using the result […] to each pair $(A_k, S_k)$ are used to calculate $\mu_t$ and $\Sigma_t$; the forecast state $x(t)$ is then sampled from $\mathcal{N}(\mu_t, \Sigma_t)$. The weights are defined using a Gaussian […]. The scale parameter $\sigma$ is locally adapted to the median value of the $K$ distances $\|x(t-1) - A_k\|^2$ to the $K$ analogs. Other types of kernels might be considered […]

[…] with $\mathrm{EOF}_k$ the $k$-th EOF basis and $\alpha_k(s,t)$ the corresponding coefficient for patch $P_s$ at time $t$. Let us denote by $\Phi(P_s, t)$ the vector of the $N_E$ coefficients $\alpha_k(s,t)$. This vector represents the projection of $dX(P_s, t)$ onto the lower-dimensional EOF space.
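The kernel weighting and Gaussian sampling described in the analog forecasting paragraph above can be sketched as follows for the locally-constant case. The exact kernel expression is truncated in this excerpt, so the form $\exp(-\|x(t-1) - A_k\|^2/\sigma)$ with $\sigma$ equal to the median of the $K$ squared distances is an assumption, as are the function names and the toy catalog (successors generated by a made-up linear map).

```python
import numpy as np

def analog_weights(x_prev, analogs):
    """Normalized Gaussian-kernel weights of K analogs A_k of the state x(t-1).

    Assumed kernel: exp(-||x(t-1) - A_k||^2 / sigma), with sigma set to the
    median of the K squared distances (the locally adapted scale).
    """
    d2 = np.sum((analogs - x_prev) ** 2, axis=1)      # ||x(t-1) - A_k||^2
    sigma = max(np.median(d2), np.finfo(float).tiny)  # guard against sigma = 0
    w = np.exp(-d2 / sigma)
    return w / w.sum()

def locally_constant_forecast(x_prev, catalog_A, catalog_S, K, rng):
    """Sample x(t) ~ N(mu_t, Sigma_t) from the K nearest analogs and their successors."""
    d2_all = np.sum((catalog_A - x_prev) ** 2, axis=1)
    nn = np.argsort(d2_all)[:K]                       # K nearest analogs in the catalog
    A, S = catalog_A[nn], catalog_S[nn]
    w = analog_weights(x_prev, A)
    mu = w @ S                                        # weighted mean of the successors
    dev = S - mu
    Sigma = (w[:, None] * dev).T @ dev                # weighted covariance of the successors
    Sigma = 0.5 * (Sigma + Sigma.T)                   # enforce exact symmetry
    return rng.multivariate_normal(mu, Sigma), mu, Sigma, w

rng = np.random.default_rng(0)
catalog_A = rng.uniform(-1.0, 1.0, size=(5000, 2))    # hypothetical catalog of past states
catalog_S = 0.9 * catalog_A                           # their successors under toy linear dynamics
x_prev = np.array([0.3, -0.2])
x_next, mu, Sigma, w = locally_constant_forecast(x_prev, catalog_A, catalog_S, K=20, rng=rng)
```

Tying $\sigma$ to the median distance keeps the kernel scale-free: at least half of the analogs receive a weight of at least $e^{-1}$ relative to a perfect match, whatever the local density of the catalog.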
We consider the three analog forecasting operators presented in Section 3.2, namely the locally-constant, the locally-incremental and the locally-linear operators.
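The definitions from Section 3.2 are not part of this excerpt. The sketch below uses one common formulation of the three operators, given weights $w_k$ on analog-successor pairs $(A_k, S_k)$: the locally-constant operator averages the successors, the locally-incremental operator adds the averaged increments $S_k - A_k$ to the current state, and the locally-linear operator fits a weighted affine regression of the successors on the analogs and evaluates it at the current state. The function name and toy data are hypothetical; $\Sigma_t$ would typically be estimated from the weighted residuals, which is omitted here.

```python
import numpy as np

def forecast_mean(x, A, S, w, operator="constant"):
    """Mean forecast mu_t of one analog operator (sketch of a common formulation).

    x: (n,) state x(t-1); A: (K, n) analogs; S: (K, n) successors;
    w: (K,) kernel weights summing to 1.
    """
    if operator == "constant":
        # Weighted mean of the successors
        return w @ S
    if operator == "incremental":
        # Current state plus the weighted mean increment S_k - A_k
        return x + w @ (S - A)
    if operator == "linear":
        # Weighted least squares S_k ~ c + (A_k - x) M; the intercept c is the fit at x
        X = np.hstack([np.ones((A.shape[0], 1)), A - x])
        sw = np.sqrt(w)[:, None]
        coef, *_ = np.linalg.lstsq(sw * X, sw * S, rcond=None)
        return coef[0]
    raise ValueError(f"unknown operator: {operator}")

rng = np.random.default_rng(1)
x = np.array([0.5, -1.0])
A = x + 0.1 * rng.standard_normal((15, 2))   # hypothetical analogs close to x
S = 0.9 * A + 0.1                            # successors under toy affine dynamics
w = np.full(15, 1.0 / 15)                    # uniform weights for simplicity
means = {op: forecast_mean(x, A, S, w, op) for op in ("constant", "incremental", "linear")}
```

On these affine toy dynamics the locally-linear operator recovers $0.9x + 0.1$ exactly, while the locally-constant estimate is biased by the spread of the analogs around $x$; the incremental operator partly cancels that bias by working with differences.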

The calculation of the weights associated with each analog-successor pair relies […]

[…] a patchwise EOF-based decomposition-reconstruction with a smaller patch size (here, 17 × 17 patches) to remove these blocky artifacts.

• the reconstruction of fields X as $\bar{X} + dX$.
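The patch-level EOF representation, with $dX(P_s,t) \approx \sum_k \alpha_k(s,t)\,\mathrm{EOF}_k$ and the coefficient vector $\Phi(P_s,t)$, together with the final recombination of a large-scale component and the detail field dX, can be sketched as below. Computing the EOFs by an SVD of flattened training patches (assumed to be anomalies) is a standard choice but an assumption here; the synthetic low-rank patches, the constant large-scale patch and all names are hypothetical. Overlapping patches and the blending step that removes blocky artifacts are not shown.

```python
import numpy as np

def learn_eofs(patches, n_eofs):
    """EOF basis from training patches via SVD.

    patches: (N, p*p) flattened detail patches dX, assumed to be anomalies.
    Returns an (n_eofs, p*p) array whose rows are EOF_1 .. EOF_{N_E}.
    """
    _, _, vt = np.linalg.svd(patches, full_matrices=False)
    return vt[:n_eofs]

def project(patch, eofs):
    """Phi(P_s, t): the N_E coefficients alpha_k(s, t) of a flattened patch."""
    return eofs @ patch

def reconstruct(coeffs, eofs):
    """dX(P_s, t) ~ sum_k alpha_k(s, t) EOF_k."""
    return coeffs @ eofs

rng = np.random.default_rng(2)
p = 17                                         # patch size quoted in the text
basis = rng.standard_normal((3, p * p))        # hypothetical low-rank "true" patterns
train = rng.standard_normal((500, 3)) @ basis  # synthetic training patches of dX
eofs = learn_eofs(train, n_eofs=3)

dX_patch = rng.standard_normal(3) @ basis      # a new detail patch in the same subspace
phi = project(dX_patch, eofs)                  # N_E = 3 coefficients
Xbar_patch = np.full(p * p, 0.1)               # hypothetical large-scale component on the patch
X_patch = Xbar_patch + reconstruct(phi, eofs)  # large-scale component + detail
```

Analog searches then operate on the short vectors `phi` (here 3 values) instead of the 289 pixels of each patch, which is how the EOF representation sidesteps the curse of dimensionality.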

We evaluate the proposed PB-AnDA approach using the OSSE presented in Section 2. We perform a qualitative and quantitative comparison to state-of-the-art […]

We also evaluated the proposed approach for noisy along-track data.

Here, we run two experiments with an additive zero-mean Gaussian noise applied to the simulated along-track data. We consider a noise covariance of R = 0.01 (Experiment A) and of R = 0.03 (Experiment B), the latter being closer to the instrumental error of conventional altimeters. Given the resulting noisy along-track dataset, we apply the same methods as for the noise-free case study.

We run PB-AnDA using different values for R. For Experiment A, Table 2 shows that the minimum is reached using the true value of the error, R = 0.01, whereas for Experiment B, Table 3 shows that the minimum is counter-[…]

[…] Table 4. PB-AnDA still outperforms OI in terms of RMSE and correlation statistics […]
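Generating the noisy along-track data of Experiments A and B amounts to adding independent Gaussian draws whose variance is R, i.e. a standard deviation of $\sqrt{R}$ (0.1 for R = 0.01 and about 0.173 for R = 0.03, in the squared units of the SLA field, which the excerpt does not state). The sketch below assumes a flat array of along-track samples; the zero-valued "true" track and the function name are placeholders, not the OSSE data.

```python
import numpy as np

def add_obs_noise(y_true, R, rng):
    """Additive zero-mean Gaussian noise of variance R on along-track samples."""
    return y_true + rng.normal(0.0, np.sqrt(R), size=y_true.shape)

rng = np.random.default_rng(3)
y_true = np.zeros(200_000)                  # placeholder for noise-free along-track SLA samples
y_A = add_obs_noise(y_true, 0.01, rng)      # Experiment A
y_B = add_obs_noise(y_true, 0.03, rng)      # Experiment B
```

Note that `numpy.random.Generator.normal` takes a standard deviation, not a variance, hence the square root.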

We report mean RMSE and correlation statistics for these four PB-AnDA parameterizations in Table 5 for the noisy case study. Considering […]
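The two scores can be computed as below. How the paper aggregates them is not stated in this excerpt: this sketch assumes a `(T, ny, nx)` sequence of daily fields, computes RMSE and Pearson correlation per time step and then averages over time. Land masking (e.g. NaN pixels) is not handled, and the synthetic fields are placeholders.

```python
import numpy as np

def rmse(x_est, x_true):
    """Root mean square error between two fields."""
    return float(np.sqrt(np.mean((x_est - x_true) ** 2)))

def correlation(x_est, x_true):
    """Pearson correlation between two fields, flattened."""
    return float(np.corrcoef(x_est.ravel(), x_true.ravel())[0, 1])

def mean_daily_stats(est, truth):
    """Time-averaged per-day RMSE and correlation over (T, ny, nx) sequences (assumed layout)."""
    r = [rmse(e, t) for e, t in zip(est, truth)]
    c = [correlation(e, t) for e, t in zip(est, truth)]
    return float(np.mean(r)), float(np.mean(c))

rng = np.random.default_rng(4)
truth = rng.standard_normal((5, 20, 20))               # placeholder "true" SLA fields
est = truth + 0.1 * rng.standard_normal(truth.shape)   # placeholder reconstruction
m_rmse, m_corr = mean_daily_stats(est, truth)
```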