Facilitating Inversion of the Error Covariance Models for the Wide-Swath Altimeters

: Wide-swath satellite altimeter observations are contaminated by errors caused by the uncertainties in the geometry and orientation of the on-board interferometer. These errors are strongly correlated across the track, while also having similar error structures in the along-track direction. We describe a method for modifying the geometric component of the error covariance matrix which facilitates accuracy in the removal of the respective error modes from the signal and improves computational efﬁciency of the data assimilation schemes involving wide-swath altimeter observations. The method has been tested using the Surface Water and Ocean Topography simulator. We show substantial computer cost savings in the pseudo-inversion of the respective error covariance matrix. This efﬁciency improvement comes with a few per cent error in the approximation of the original covariance model simulating uncertainties in the geometry and orientation of the on-board interferometer. speciﬁc information on the correlation structure of the swath altimetry to produce an accurate and computationally efﬁcient inverse covariance approximation. The presented study could be treated as a step towards this ultimate goal.


Introduction
The upcoming Surface Water and Ocean Topography (SWOT, [1,2]) and Coastal and Ocean Measurement with Precise and Innovative Radar Altimeter (COMPIRA, [3]) missions are planned to deliver high resolution maps of ocean surface topography using radar interfereometers. In order to achieve the target sea surface height (SSH) accuracy of a few cm at kilometer resolutions, the new remote sensing technology requires taking into the account multiple spatially correlated errors. In recent years, the SWOT error covariance model [2], developed by the Jet Propulsion Laboratory (JPL), has served as a basis for numerous experiments (e.g., [4][5][6]) assessing the added value of the mission in monitoring the global ocean. SSH errors caused by the uncertainties in the interferometer geometry and orientation (hereinafter referred to as geometric and orientation errors, GOEs) provide one of the two major contributions to the overall error budget. Recently, attempts have been made to remove these errors in data assimilation experiments using simulated SWOT data [7,8]. Success of the approach largely depends on the accuracy of the projection of the SSH innovations on the range of GOE covariance matrix as well as on the relative contribution of other types of errors to this projection. In a more general perspective, the problem of introducing non-local observation error covariances into data assimilation systems requires an efficient algorithm for statistically consistent normalization of the innovations (multiplication of the innovations by the inverse square root of the respective observational error covariance). Regarding SWOT altimetry, some progress has been made in this direction in recent years [5,9].
In this note, we consider an approximation to the GOE model which facilitates the accuracy of the above mentioned projection and provides a computationally efficient algorithm for the pseudo-inversion of the respective error covariance matrix and its symmetric square root.

The Goe Model for Swot Altimetry
In the following, we denote the cross-swath (CS) direction by x and the along-swath (LS) direction by y, assume that SWOT observations are already projected on a regular grid, and enumerate CS grid points by the indices 1 ≤ i, j ≤ n x and along-track grid points by the indices 1 ≤ k, l ≤ n y . Matrices and vectors are denoted by bold capital and lowercase letters, respectively. To distinguish between matrices (vectors) defined on the swath and matrices (vectors) defined on either across-or along-swath directions, we italisize the latter.

Formulation
The JPL error model [2,10] takes into account the following four GOE components e q , 1 ≤ q ≤ 4 caused by the uncertainties in: (a) satellite roll angle (roll error e 1 ), (b) relative postioning of the two interferometer antennas (phase error e 2 ); (c) thermal expansion of the antennas (dilation error e 3 ), and (d) orbit height (timing error e 4 ). A realization of the GOE field e ik is given by the sum of the four error components: where C q kl are the elements of the square root of the along-track covariance matrix of the qth error component, n q = {n q l } is the respective vector of normal noise in the LS direction, and f q = { f q i } are the CS structure functions (Figure 1a) for each of the four components (roll, phase, dilation and timing). The LS square root GOE covariance matrices C q y = {C q kl } are assumed to have a diagonal represention in terms of the along-track Fourier components: where F y is the matrix of the along-track Fourier transform and S q are the positive-definite diagonal marices with the square roots of the LS GOE power spectra shown in Figure 1b. Similar to [7,8], we consider regional applications involving finite pieces of the satellite swath and, therefore, utilize the LS Fourier transform representation by the n y × n y unitary matrix F y .
Note that in practice one has to deal with GOE noise realizations in the form of 2d normal error fields n q jk defined on a regular grid of the SWOT swath [10]. Therefore, it is convenient to rewrite Equation (1) in the equivalent form involving four 2d noise realizations n q ≡ n q jk . To do this, we supplement the structure functions with the operators a j which perform summation of n q in the CS direction and keep the unit variance of the output. As a result, the vectors f q are converted into n x × n x matrices C q x whose elementsĈ q ij are given by or in the matrix/vector notation, where ⊗ denotes the Kronecker product. Using this notation, Equation (1) can be rewritten in the form At this point it is necessary to note that representation (5) can be treated in a more general way as if the matrices C q x were operating in the entire subspace spanned by the structure functions f q , and not by the individual ones implied by the definition (4). In the particular case of SWOT, the CS constituent of the phase error covariance involves two orthogonal structure functions shown by gray lines in Figure 1b, whereas the structure function of the roll error is defined by their difference (dashed line in Figure 1b). Hereinafter we shall imply that C q x are known a priori and have this general form.
The n x n y × n x n y GOE covariance matrix C ≡ ee T is given by In deriving (7) we used the expression for the normal noise covariance: where δ pq is the Kronecker delta, denote the ensemble average and I x,y are the identity matrices in the across-and along-track directions.

Modification of the Goe Covariance Model
Although the GOE covariance model given by (6) exhibits a lot of symmetries, one cannot take advantage of exposed factorizations when computing the pseudo-inverse matrix C + , which is a necessary prerequisite for projecting the innovation field δz onto the subspace spanned by the eigenmodes of C: This type of projection was used in [7,8] to remove GOEs from simulated SWOT observations under a rather restrictive assumption that the GOE field does not change along the track.
The above restriction, however, can be removed by a slight (several percent) modification of the GOE covariance model. The modification is likely to be within the uncertainty limits of the model's formulation and it is based on the similarity of the LS spectra shown in Figure 1b. In particular, JPL documentation [2,10] implies that for swath lengths less than 3000 km the along section spectra have similar forms, so that they can be approximated by some universal spectrum S S q ≈ α q S; α q > 0; q = 1, ..., 4 within the above mentioned degree of accuracy. In this case, the projection can be further simplified by employing the symmetric square root C * of the above approximation to the GOE covariance matrix.
Introducing the notations C y = F T y SF y and C x for the square root of ∑ q α 2 q C q x C qT x , the expression for the projector C * C T * takes the form Here we took into account that C y C + y = I y . Furthermore, since by the definition the rhs of (11) is a n x × n x matrix of rank 4, so its svd decomposition F x Λ 2 F T x (where F x is the n x × 4 matrix with orthonormal columns and Λ 2 x is the diagonal 4 × 4 matrix of the respective eigenvalues) can be easily computed. As a consequence, the symmetric square root of C is given by and can be readily used to compute the approximate projector Keeping in mind that the simplified expression (13) is valid within the accuracy of the approximation (9), we examine it in the next section.

Approximation Errors
To find the best approximation of the symmetric square root of the GOE covariance C (Equation (12)), adopt the notations and consider the following optimization problem where || · || F is the Frobenius norm. Setting the derivatives of (15) with respect to S, α q to zero, using Parceval's identity and properties of the Kronecker product yields the following equations with respect to n y + 4 unknowns α q , q = 1, ..., 4 and diag S = {S k }, k = 1, ..., n y : where Multiplying Equation (16) by Φ −1 x , Equations (16) and (17) can be further simplified to The system of Equations (18) and (19) is non-linear and has multiple solutions (S = α q = 0 is one of those). To avoid this ambiguity, a good first guess for the control vector {S, α q } can be prepared by first rescaling the spectral components S q to minimize the distance between them by choosing a suboptimal α q , and then averaging the resulting scaled approximations of S q in order to get a plausible first guess for S. In the simple case when the along-track spectra can be rescaled with zero error (S q = α q S), Equations (16) and (17) are reduced to identities.
Since the exact expressions for the cost function (15) and gradients (16) and (17) are available, minimization of (15) can be done at relatively low expense by a quasi-Newtonian algorithm. The above expressions depend only on the four structure functions f q and four along-track spectra S q , that are easy to manipulate.
After optimization of the the GOE model parameters α q , S the approximation error can be estimated as = 2J/||C|| 2 F . The results of the optimization are shown in Table 1 for four different resolutions δx of a 1024 kilometer long piece of swath.  As it is seen, the optimal rescaling parameters weakly depend on swath resolution, but the value of the approximation error is roughly proportional to the inverse of the grid step. Figure 2 demonstrates the relative approximation errors |S k q − S k |/|S k q | of the roll and phase components of GOE spectra as a function of the wavenumber k. For the dominant spectral components with k −1 < 10 −2 (cf. Figure 1) the errors are generally less than a few percent and yield the mean approximation errors √ of 2-7% depending on the grid resolution (Table 1). We assume that these values are small enough to be comparable with the modelling uncertainties of the LS error spectra. In that respect it might be useful to take into account the similarities of the LS GOE spectra at the stage of their estimation from the basic principles [2], because the respective "universal" LS GOE model (12) provides much better computational efficiency of the pseudo-inversion and projection matrices.

Computational Savings
Computation of C * is a costly task because it involves a singular value decomposition of a large n x n y × n x n y matrix (e.g., at 1 km swath resolution and 1000 km long piece of swath, the matrix size is N = n x n y ∼ 100 × 1000 = 10 5 ). However, this task is not insurmountable because we need to find only 4n y ∼ 4000 eigenvectors corresponding to all non-zero eigenvalues. After they are found offline, the entire projection of e requires approximately O(4n y N) = 4 × 10 8 floating point multiplications (FPMs), which may still appear computationally expensive.
On the other hand, Equation (13) shows that the approximate projection corresponding to the GOE model (12) with universal LS error spectrum can be obtained at the expense of only O(4N) ∼ 10 5 FPMs, i.e. n y times faster.
Similarity of the LS GOE spectra also provides a small parameter that can be employed to improve the accuracy of the projector via Taylor expansion in the vicinity of P a (Equation (13)). In particular, adopting the notation C = C * + δC, we get where δQ = δCC + * and P a ⊥ = I − P a . As we have already seen (Equation (10)), application of P a is reduced to projecting SWOT observations on the GOE eigenmodes in every point in the along-track direction. Furthermore, since the matrix δQ is symmetric, and, as it can be seen from the rhs of (21), its action on a vector is performed in four sweeps, each containing the following two steps: 1. perform spectral transform in the along-track direction n x times (for every along-track section) using the qth diagonal matrix δS q S −1 at the expense of O(2N log 2 n y ) FPMs.

2.
project the result using P q x = C q x C q+ x on the qth column of F x n y times (for every across-track section) which is approximately equivalent to N multiplications.
After that, perform summation of the results over all four cross-track GOE components q. The computational complexity of the above described first-order correction is estimated as 4(2N log 2 n y + N) ≈ O(8N log 2 n y ), still a considerable (n y /2 log 2 n y times) saving compared to the exact projection.

Discussion
In this study we observed that similarity of the LS GOE spectra can potentially bring considerable computational savings in assimilating wide-swath altimetry data into numerical models. These savings are due to the possibility of accurately approximating the symmetric square root of the n x n y × n x n y GOE covariance matrix C by factorizing it as a Kronecker product of two smaller (n x × n x and n y × n y ) matrices operating in the CS and LS directions. As a consequence, (pseudo)inversion operations on C are reduced to (pseudo)inversions of the respective factors in the Kronecker product. The spectral similarity also provides a small parameter that can be used to improve the accuracy of the above approximations via Taylor expansions of the exact GOE covariance matrix in the vicinity of its computationally efficient estimate.
In view of the above, the GOE covariance model [2] derived from the first principles, could be supplemented by a version featuring the universal LS spectrum. This development may bring extra practical benefits to the data assimilation community.
An ultimate goal of incorporation of the wide-swath altimetry into the data assimilation systems is providing a computationally efficient algorithm of statistically consistent rescaling of the innovations. Conventionally, it was done under the assumption of spatially uncorrelated observation errors whose covariance matrix is diagonal and is easily inverted. That is not the case for the wide-swath altimetry characterized by substantial error correlations at distances larger than the sampling scale, making the inversion of C a cahllenging task. A few attempts have been made to do so [5,9] using purely heuristic assumptions on the sparsity pattern of C −1 . In the presented study, we explicitly use information on the structure of the GOE part of the SWOT error covariance matrix to derive computationally efficient approximations to its pseudo-inverse and the respective projection operator P a .
The presented effort sheds more light on the structure of P a and provides an opportunity to go beyond the simple assumptions used in [7,8] for filtering of the GOEs from innovations. It is necessary to note that the success of the filtering technique largely depends on the relative magnitudes of other types of errors (such as KaRIN noise, wet/dry atmosphere and model error) that contribute to the innovations. In practice, GOEs are often dominated by other error components, and in this case the "LS longwave approximation" of the type used in [7,8] and additional ensemble information on the model errors may improve the filter quality. An alternative filtering approach [11] has demonstrated a certain superiority of a two-parameter family of isotropic smoothing kernels generated by quadratic polynomial of the Laplacian operator ∆ over the kernel generated by e ∆ .
Numerical assimilation experiments with simulated SWOT data demonstrate that wide-swath SSH observations are especially useful for monitoring oceanic small scale phenomena at wavelengths below 50 km [12,13]. At these scales, the KaRIN noise, whose error covariance is diagonal, dominates over all other SWOT error components [10], thus providing an efficient rescaling algorithm for the submesoscale innovations. In this case, GOE and wet atmosphere error covariances can be treated as small perturbations to the KaRIN noise covariance matrix, so the rescaling matrix can be well approximated by the expansion of the inverse square root of the KaRIN noise in the respective perturbations. Equations (12), (20) and (21) provide numerically efficient GOE contributions to this expansion. At the same time, the obtained low-rank approximation of the square root of the GOE covariance (12) provides an efficient way of computing the joint impact of KaRIN noise and GOEs using the approach of [14].
In a more general perspective, we assume that obtaining a universal and numerically efficient algorithm for rescaling the wide-swath altimetry observations is a challenging problem that could be solved by jointly analyzing real data from the forthcoming missions and ensemble simulations of the ocean surface topography by state-of-the-art numerical models. Highly structured observation error covariances of the wide-swath altimetry provide an opportunity to apply deep learning techniques (e.g., [15,16]) for retrieval of both more realistic estimates of the covariance matrices and their inverses. Extensive literature (e.g., [17][18][19][20]) also exists on other methods of approximating the inverse covariances. These methods are more general, as they do not take into account the specific structure of the covariance model, as it was done in the present study. Of particular interest are applications of the modified Cholesky decomposition [18], under the sparsifying assumption of conditional independence of the observation errors beyond a given localization scale. This approach is suitable for efficient parallelization [19] and guarantees the positive definite property of the inverse covariance.
In the SWOT application, one may think of combining the above mentioned methods with the specific information on the correlation structure of the swath altimetry to produce an accurate and computationally efficient inverse covariance approximation. The presented study could be treated as a step towards this ultimate goal.
Author Contributions: Conceptualization, software, methology, writing-original draft preparation M.Y.; software, validation, writing-review and editing, J.M.D.; project administration, funding acquisition, writing-review and editing, G.J. All authors have read and agreed to the published version of the manuscript.