Multiscale Stuart-Landau Emulators : Application to Wind-Driven Ocean Gyres

The multiscale variability of the ocean circulation due to its nonlinear dynamics remains a big challenge for theoretical understanding and practical ocean modeling. This paper demonstrates how the data-adaptive harmonic (DAH) decomposition and inverse stochastic modeling techniques introduced in (Chekroun and Kondrashov, (2017), Chaos, 27), allow for reproducing with high fidelity the main statistical properties of multiscale variability in a coarse-grained eddy-resolving ocean flow. This fully-data-driven approach relies on extraction of frequency-ranked time-dependent coefficients describing the evolution of spatio-temporal DAH modes (DAHMs) in the oceanic flow data. In turn, the time series of these coefficients are efficiently modeled by a family of low-order stochastic differential equations (SDEs) stacked per frequency, involving a fixed set of predictor functions and a small number of model coefficients. These SDEs take the form of stochastic oscillators, identified as multilayer Stuart–Landau models (MSLMs), and their use is justified by relying on the theory of Ruelle–Pollicott resonances. The good modeling skills shown by the resulting DAH-MSLM emulators demonstrates the feasibility of using a network of stochastic oscillators for the modeling of geophysical turbulence. In a certain sense, the original quasiperiodic Landau view of turbulence, with the amendment of the inclusion of stochasticity, may be well suited to describe turbulence.


Introduction
The turbulent oceanic flows consist of complex motions, jets, vortices and waves that co-exist on very different spatio-temporal scales, but also without clear scale separation.Mesoscale oceanic eddies populate nearly all parts of the global ocean and play important roles in maintaining the oceanic general circulation.The most straightforward, but also the most computationally intensive and, thus, unfeasible, way of accounting for the eddy effects on the large-scale circulation is resolving them dynamically with eddy-resolving ocean general circulation models (GCMs).This brute-force approach requires a computational grid resolution of about 1 km, which makes it feasible only for relatively short time simulations, whereas the Earth system and climate change modeling routinely require much longer simulations over centuries and millennia.One way to afford these time scales, while simulating the ocean in a qualitatively correct way, is to parameterize the important eddy effects with simple and affordable, but still accurate models embedded in the prognostic ocean circulation models.
Thus, the search for suitable eddy parameterizations remains a challenging theoretical topic with clear practical dimension.More often, the embedded prognostic models are dynamical, in the sense that they solve some coarse-grained approximations of the governing primitive equations, but computationally still fairly expensive, because they solve for many degrees of freedom.Here, the parameterizations of the eddy effects can be deterministic, where eddy diffusion is by far the most common approach (e.g., [1,2]), stochastic (e.g., [3][4][5][6][7][8]), or a combination of both.The latter approach is attracting increasing attention because it can, not only account for nondiffusive and antidiffusive eddy effects (e.g., negative eddy viscosity, eddy backscatter, etc.), but also takes into account the inherent "randomness" of complicated turbulent processes.
Taking a broader view, the exact form of such reduced models can be found rigorously from the full model equations only in special cases, by adopting, for instance, the Mori-Zwanzig (MZ) projection approach [9][10][11][12][13] or methods rooted in the approximation theory of local invariant manifolds, deterministic [14][15][16][17] or stochastic ( [18,19] and the references therein).Alternatively, when a rigorous derivation of the reduced dynamics is mathematically intractable or when the governing equations are not known, a data-driven inverse modeling approach naturally leads to the development of emulators, i.e., purely statistical, severely truncated in terms of degrees of freedom, prognostic models of reduced complexity that can reproduce the whole complexity of turbulent oceanic motions across scales.Such inverse models can be trained on the data from model simulations, or from observations, and they can be viewed as inexpensive statistical oceanic emulators.Their potential applications range from representing the ocean in climate process studies to providing a framework for analyses of multiscale flow interactions.
For successive inverse modeling, one of the key steps is to find relevant data-adaptive basis functions.Numerous techniques are known for decomposition of time-evolving datasets into data-adaptive modes, which include: (i) variance-based decomposition, such as principal component analysis (PCA) [20,21], its probabilistic formulation [22], as well as its nonlinear [23,24] and time-embedded extensions [25]; (ii) eigenfunctions of Markov matrices that reflect the local geometry and density of the data [26][27][28][29]; and (iii) approaches that rely on Koopman operator theory [30][31][32][33][34][35].While working in a suitable subspace spanned by the leading modes, one needs then to derive evolution equations that characterize the long-term dynamics of the full dynamical system that has produced the dataset.
This study demonstrates how to overcome such hurdles by applying the data-adaptive harmonic decomposition (DAHD) techniques introduced in [55] to emulate the full spectrum of dynamically important scales in the coarse-grained eddy-resolving ocean model solution, including mesoscale eddies.As discussed below, this is made possible by the ability of DAHD to extract spatio-temporal modes whose time-evolution can be efficiently modeled by a universal family of frequency-based low-order stochastic differential equations (SDEs) involving a fixed set of predictor functions, namely coupled stochastic oscillators made of multilayer Stuart-Landau models (MSLMs) ( [55], Section VIII).
To make the expository as self-contained as possible, the DAHD technique and the MSLMs are summarized in Sections 4.2 and 4.3, respectively.It is noteworthy that, after prerequisites from the theory of Ruelle-Pollicott resonances [56] recalled in Appendices A and B, the Appendices C and D below communicate, in the context of the oceanic model [8], on the mathematical foundations underlying the modeling by the class of Stuart-Landau models (of type (25) used hereafter), for the time-evolution of the corresponding DAH modes (DAHMs).The resulting DAH-MSLM approach has already shown its flexibility for datasets such as analyzed ( [55], Section IX) and a great promise for the stochastic data-driven modeling of challenging datasets in different branches of geophysics where modeling and prediction is hampered, on the one hand, by the shortness of observational records and, on the other, by the shortcomings of physics-based models: Arctic sea ice extent and concentrations [57,58] and solar wind-magnetosphere interactions [59].This study demonstrates, in the context of ocean modeling, that the DAH-MSLM approach allows for efficiently simulating the time-evolution of the spatio-temporal DAHMs across the full range of temporal scales covered by the oceanic model of [8], reproducing fairly accurately, in particular, the key statistical properties of the flow's nonlinear dynamics.

Oceanic Dataset
We consider a model of mid-latitude ocean gyres where flow dynamics is simulated by the three-layer quasi-geostrophic (QG) equations forced by the wind stress in the upper layer, in which merging western boundary currents form powerful eastward jet extension, similar to the Gulf Stream and Kuroshio; see Section 4.1.The ocean circulation of eddy-resolving simulation with ≈10 6 spatial degrees of freedom (d.o.f.) at reference model parameters [8] is characterized by a robust large-scale decadal low-frequency variability (LFV) at ≈17 years and involving coherent meridional shifts of the eastward jet extension separating the gyres.This LFV is driven by transient mesoscale eddies, qualitatively similar to the turbulent oscillator studied by [60,61]; see [62][63][64][65][66][67][68] for alternative theories.Still, there is no clear scale separation between the eddies and large-scale flow, and in fact, the LFV accounts for a relatively small fraction of the total variability, as demonstrated below in the detailed DAH analysis.
First, we consider the upper-ocean stream function anomalies, given by the dataset of 6.5 × 10 5 days, sampled 10 days apart and coarse-grained in the spatial region centered around the eastward jet (see Figure 1) with 64 points in the x-direction and 26 points in the y-direction.Then, the resulting field having N = 3.25 × 10 5 time snapshots and sampled 20 days apart is compressed by applying a standard PCA [21].The leading d = 30 empirical orthogonal function (EOF) modes (from a total of 1664 modes), which capture ≈70% of the total variance, are retained for the subsequent analysis.Discarded residual variability accounts for small spatial scales away from the jet; the latter being shown in Figure 1.The corresponding time series of principal components (PCs) are obtained by projecting the upper-ocean stream function anomalies onto the retained EOFs.

DAHD, DAH Power Spectrum and DAHMs
Next, we apply DAHD (Section 4.2) for the spectral analysis of the dataset constituted by the 30 corresponding PCs to diagnose, in particular, the multiscale ocean variability such as simulated by the model studied here; see Section 4.1 below for a description.Section 4.2 below recalls from [55] the main mathematical ingredients needed for the computations of the DAH spectra and modes.
To compute the latter, we first compute all the combinations of two-sided cross-correlations among the PCs time series up to lag M = 150 (in sampling units) and formed the grand matrix C given by (12) below with d = 30 and M = 2M − 1 = 299 (≈17 y), allowing us thus to resolve the decadal variability.Given the temporal sampling of 20 days used here, the Nyquist interval of resolved frequencies is therefore f < 9.1 y −1 .After computing the DAH eigenvalues (λ j ) and associated eigenvectors (W j ) of the grand matrix C, we re-arrange them according to the procedure described Section 4.2.1 in order to form the DAH power spectrum P given in (15), over each Fourier frequency f ; see (16) below.
Figure 2 shows the resulting DAH power spectrum with the corresponding |λ j | plotted as red filled circles.The DAH power spectrum is evenly spaced in frequency and the number of frequency bins B f = (M + 1)/2 = 150.There are exactly d = 30 pairs of |λ j | at each equidistant frequency f , except at f = 0, where there are d unpaired values [55].Note that in Figure 2 and the figures hereafter, the frequency values f that appear therein correspond to f /(2π).The DAH power spectrum exhibits a complicated "bumpy" plateau-like shape stretching across most of the frequencies and without a clear distinction between fast and slow timescales; a bumpy plateau that coexists with a pronounced sharp peak at low frequencies corresponding to decadal periodicity ≈ 17 y(0.061y −1 ) as shown in Figure 2 (cyan dots in Figure 2).A closer look at the details of the DAH power spectrum indicates that the bumpy part of the plateau is actually made of much broader, but less pronounced peaks spanning over an intermediate range of frequencies, 3 y −1 < f < 6 y −1 , as well as over a range of very high frequencies near f ≈ 8.5 y −1 , thus close to the Nyquist frequency.Recall from [55] that a separation in the DAH power spectrum between the DAH pairs at a given frequency f reflects how the singular values of the cross-spectral matrix S( f ) defined by (13) are distributed.For the dataset analyzed here, while there is a large gap between dominant power values at decadal peak, other frequencies tend to have several pairs in one group that is separated above the continuous background.
Figures 3 and 4 show space-time patterns (with the "space-coordinate" taken in the EOF-phase space) for the four pairs of DAHMs (W j ; see (17) and ( 18)), corresponding to the largest DAH spectral power at decadal and interannual frequency, respectively (cyan and green dots in Figure 2).The dominant modes, i.e., those corresponding to the pair having largest |λ j | at these frequencies, have largest magnitude in the channels corresponding to leading PCs.As |λ j | decreases, channels of higher-ranked PCs prevail with complex and non-trivial patterns.Figures 5 and 6 show the oscillation phases of the dominant (associated with the largest |λ j |) DAHM patterns, when transformed back to the physical space by using EOFs.Since the DAHMs shown on the left and center columns of Figures 5 and 6 have a time-component corresponding to the embedding dimension (≈17 y) [55], the transformation consists here of multiplying the 30 "spatial" channels of the DAHMs by the EOFs and then plotting in the physical space the resulting patterns as time evolves, within the embedding window.By doing so, the decadal mode reveals mostly coherent meridional shifts of the eastward jet extension, while the intraseasonal one represents largely standing oscillation of small eddies along the jet extension.corresponding to the four largest |λ j | (in descending order) at decadal frequency f = 0.061 y −1 (cyan dots in the data-adaptive harmonic (DAH) power spectrum of Figure 2).Each of the modes in a pair is time-shifted by a quarter of a period, i.e., in exact phase quadrature; x-axis, time embedding dimension (in years); y-axis, spacial dimension (rank of principal component (PC)).Right panels: DAHCs obtained by projection of the input dataset of 30 PCs onto the DAHMs; see (20).A DAHC pair consists of narrowband time series at the same temporal frequency of the associated DAHMs, but modulated in amplitude.5, but for the pair of DAHMs associated with the largest |λ j | (top green dot in Figure 2) at the interannual frequency f I = 0.674 y −1 (with period ≈ 1.5 y) (cf. Figure 4).

DAH-MSLM Oceanic Emulators
According to (20), the DAHCs are obtained by projection of the input dataset of retained PCs onto DAHMs and thus have N = 32,500 − 299 + 1 = 32,202 data points (21).Right columns of Figures 3 and 4 show that the time series of DAHCs are narrowband at the characteristic frequency associated with the respective DAHM pairs, as predicted by the theory (Section 4.2.2).Interannual DAHCs (Figure 4) appear to be very well in phase-quadrature (i.e., having a shift of a quarter of their period), since window M is sufficiently long to resolve this periodicity.On the other hand, the phase-quadrature relationship is less precise for decadal DAHCs (Figure 3) since M is comparable to such periodicity.Furthermore, the magnitudes of DAHCs are higher for the dominant spectral pair and tend to diminish as |λ j | decreases.
To model the time-evolution of the DAHCs, we apply now the MSLM modeling approach of ( [55], Section VIII) and summarized in Section 4.3 below, for the reader's convenience.For each pair of DAHCs, the resulting MSLM model was trained to estimate 3 + 4(d − 1) = 119 coefficients for the main layer of Equations ( 26).The model is then integrated from initial conditions corresponding to the start of the record and is forced by a white noise realization to obtain an ensemble of simulated DAHCs.The latter are convolved with DAHMs according to (22) to obtain harmonic reconstruction components (HRCs) (23) that are added in the frequency bands of interest.
The DAH-MSLM model was estimated and run in parallel at B f = 150 frequencies in the full range f ≤ 9.1 y −1 (see Section 4.3), taking ≈30 min of CPU time on a four-core 2.9 GHz Intel Core i7 MacBook Pro laptop to obtain a simulation of 1780 years long.On the other hand, the reference solution of the QG model (Section 4.1) takes about 1000 single-CPU hours to obtain the full data record [69].
Figures A7 and A8 of Appendix E below show good modeling skills of the resulting DAH-MSLM emulator in reproducing key statistical properties of leading PCs, namely their probability density functions (PDFs) and their autocorrelation function (ACFs), in a partial frequency band f ≤ 5.27 y −1 .Furthermore, good statistical modeling skill is also obtained for the full range of temporal frequencies (i.e., in the Nyquist interval f ≤ 9.1 y −1 ), and respective results are shown in Figures A5 and A6.Due in part to the long length of the simulation used here, these statistical skills do not show a marked dependence on the particular realization of the white noise used to drive our DAH-MSLM emulators.This observation is actually consistent with the ergodic property that an MSLM of the form (26) must satisfy; a property that will be communicated elsewhere.Note that for an optimal simulation of the high-frequency band 5.3 y −1 ≤ f ≤ 9.1 y −1 , it was beneficial to have b x ij and b y ij set to zero in (26).While the PDFs are mostly Gaussian, the autocorrelation functions exhibit a complicated mixture of temporal scales and thus do not allow for a clear ranking between the different PCs in terms of variability.For example, a slower decay is dominant in ACFs of PC1, 6, 8 and 11, while other PCs exhibit fast decay with superimposed oscillations.This is where a key advantage of the DAH-MSLM approach lies, i.e., in its ability to distinguish distinct temporal scales from a mixture of frequencies embedded in the data allowing in turn for an effective modeling; the distinction being under the monitoring of the DAHD, while the modeling itself is made efficient by the use of MSLMs.PC1 is shown as the black curve in Panel (a) of Figure 7. There, one can observe already in this raw time series the mixture of frequencies superimposed on a modulated oscillation of decadal dominant period.PC1 as obtained from the DAH-MSLM emulator is also shown as the black curve in Panel (b) of Figure 7.A comparison between these two curves with the naked eye shows already that the main dominant period, as well as the time series modulations and the higher frequency part of the signal are well reproduced.To better assess how the LFV associated with the decadal oscillation is captured, we proceeded in two steps.First, for each PC1, as simulated from the QG model and from the DAH-MSLM emulator, we calculated the first four HRCs that we sum up to span a range of frequencies corresponding to the decadal variability of the flow as identified by the DAH power spectrum.The latter sum of HRCs is shown as the red and light blue curves in Panels (a) and (b) for PC1 as simulated by the QG model and the DAH-MSLM emulator, respectively.In both cases, these curves provide a low-pass filter, smoothed version, of the signal (here PC1) corresponding to the decadal variability of the flow.
In a second step, we estimated the reduced Ruelle-Pollicott resonances (Appendices A and B) associated with the reduced phase space V constituted by these sums of HRCs and their lagged versions; the lag being taken here to be equal to 1 y.Essentially, these resonances are obtained from the spectrum of the transition matrix associated with these sums of HRCs in the reduced phase space V; see Appendix B for more details.As outlined in Appendices A and B (see also Appendix C), these resonances are characteristic of the underlying dynamics, and the arrangement of these resonances in the complex plane constitutes a signature of key dynamical features.In particular, they allow for precise decomposition formulas of correlation functions and power spectral densities (PSDs) in term of these resonances; see (A5) and (A6) in Appendix A below.See also [56,70,71] and the references therein.In this sense, their estimation goes beyond the estimation of ACFs and PSDs: the latter statistical quantities could indeed share similar features without sharing necessarily the same underlying resonances.In terms of modeling, an agreement of these resonances is thus more demanding, but also more indicative of a good capture of the dynamics than by looking simply at the reproduction of ACFs and PSDs.
Panel (c) of Figure 7 shows a good agreement of the arrangements of the resulting reduced RP resonances estimated from the HRCs calculated from the QG simulations, on the one hand, and from the DAH-MSLM emulator, on the other.Such good modeling skills as diagnosed by (reduced) RP resonances are not only the attribute of the decadal variability, they are actually observed across other frequency ranges such as corresponding to interannual variability and even subannual and for most of the PCs (not shown).These results obtained in the EOF-phase space or its reduction indicate that a good agreement is expected to take place in the physical space, as well, between the simulations of the DAH-MSLM emulator and the original QG model from which it is derived.
To assess the modeling skills in the physical space, the DAH-MSLM simulations (in selected frequency bands) are transformed into the physical space by using EOFs.Comparison with harmonic reconstructions of QG data in low, intermediate, high and full frequency bands shows realistic instantaneous flow patterns obtained by the DAH-MSLM emulator (compare Panel (c) (resp.(g)) with Panel (d) (resp.(h)) of Figures 8 and 9), albeit this similarity is necessarily qualitative due to the stochastic nature of the latter.
In addition, the DAH-MSLM emulator reproduces very well both the spatial patterns and magnitude of variance in these key frequency bands; compare Panel (a) (resp.(e)) with Panel (b) (resp.(f)) of Figures 8 and 9. Despite the pronounced decadal peak, the LFV range f < 0.18 y −1 accounts only for 17% of the total variance in the full range of resolved frequencies f < 9.1 y −1 (in the truncated subspace of 30 leading EOFs), while most of the variance is captured by the intermediate range of temporal scales.

Discussion
Thus, the DAHD of the QG model's stream function anomalies, after projection onto the space of EOFs, allows for the efficient modeling of the main statistical features of the flow and its variability across a broad range of frequencies, by a network of stochastic oscillators, namely the MSLMs.In mathematical terms, if one denotes by ψ 1 the anomalies of the upper-layer stream function, the DAHD provides the following representation: where Φ k denotes the k-th EOF and R k (t; f ) the harmonic reconstructed component, at the frequency f for this EOF; see (23) in Section 4.2.2 below.Since the R k (t; f ) are themselves linear combinations of the DAHC pairs at the same frequency (see (22)), the modeling of ψ 1 boils down to the modeling of these pairs, due to (1).As shown above and further discussed from a theoretical viewpoint in Appendices B-D below, the efficient modeling of these DAHC pairs by MLSMs offers thus an interesting representation/modeling of a turbulent stream function in terms of stochastic oscillators.
In a certain sense, the original quasiperiodic Landau view of turbulence [72,73], with the amendment of the inclusion of stochasticity, may be well suited to describe turbulence.Unlike the data-driven emulator of [69], where the time-lagged multivariate singular spectrum analysis (M-SSA) basis and linear stochastic MSM formulation [54] are used to extract and simulate decadal LFV only, the DAH-MSLM approach allows us to capture the full spectrum of temporal scales, from decadal to mesoscale eddies.The key and unique features that make it possible and that are not available in other data-adaptive decompositions including M-SSA are the rigorous extraction of the spatio-temporal modes (i.e., DAHMs) unambiguously associated with a single temporal frequency (i.e., without mixing of scales) and a well-defined dynamical mechanism to describe their time evolution (i.e., respective DAHCs) using coupled Stuart-Landau stochastic oscillators.Both DAH power and phase spectra hold great promise for the dynamical diagnostics and intercomparison of multiscale datasets and models, focusing on specific frequency bands.
The future research directions include utilizing DAH-MSLM emulators for dynamical analyses, material transport and eddy parameterizations.For example, [5,8,74] approximated actual eddy effects in non-eddy-resolving models by applying transient forcings with spatio-temporal correlations.Such forcings excite flow fluctuations that evolve dynamically and eventually become rectified by the nonlinearity into the eddy-driven large-scale flow anomalies.A major interim weakness of this approach is that it treats transient forcing in the simplest way.
The DAH-MSLM modeling offers great advantages, because not only does it provide vehicles for statistically accurate approximations of the eddy field and its eddy forcing, but also it makes the above parameterization approach much more general and versatile.Instead of applying some random stochastic noise as a replacement for the eddy effects, DAH-MSLM can provide highly constrained forcing functions with spatio-temporal correlations fitted to the actual observed statistics of the eddies.The emulated eddy forcings can be embedded in the corresponding non-eddy-resolving ocean models as data-driven replacements for the eddy effects with direct [5][6][7] and indirect [8,75] implementations of the embedding.

Mid-Latitude Ocean Model
In this study, the data are generated from the integration of the classical eddy-resolving double-gyre ocean model of [8].The flow dynamics is governed by the quasi-geostrophic potential-vorticity (QG PV) equations for 3 stacked isopycnal layers: where the layer index starts from the top; J(•, •) is the Jacobian operator; ρ 1 = 10 3 kg•m −3 is the upper layer density; β = 2×10 −11 m −1 •s −1 is the planetary vorticity gradient; ν = 20 m 2 •s −1 is the eddy viscosity coefficient; and γ = 4 × 10 −8 s −1 is the bottom friction parameter.The basin size is 2L = 3840 km, so that −L ≤ x ≤ L and −L ≤ y ≤ L. The isopycnal layer depths are H 1 = 250, H 2 = 750 and H 3 = 3000 m.The PV anomalies q i and the velocity stream functions ψ i are related as: where the stratification parameters S 1 , S 21 , S 22 , S 3 are such that the first and second Rossby deformation radii are Rd 1 = 40 km and Rd 2 = 20.6 km, respectively.The flow is forced by the prescribed wind stress curl W(x, y).The double-gyre Ekman pumping W(x, y) is asymmetric in order to avoid artificial symmetrization of the gyres: where the asymmetry parameter is A = 0.9, the non-zonal tilt parameter is B = 0.2, and the wind stress amplitude is There are no-flow-through and partial-slip boundary conditions on the lateral walls, augmented by the integral mass conservation constraints.The model is solved on the uniform 513 2 grid with 7.5-km nominal resolution.The solution is saved every 20 days over 1780 years; this time interval is unprecedentedly long for an eddy-resolving ocean model, but our study requires this length to obtain highly accurate statistics of the flow.Each snapshot of the solution was coarse-grained by a local running-window averaging with a 120-km width to a 65 2 grid with 60-km spacing.We checked that the outcome is not sensitive to moderate variations of the running-window width.

Data-Adaptive Harmonic Decomposition
The data-adaptive harmonic decomposition (DAHD) [55] is a signal-processing technique that allows for a decomposition of the power and phase spectra via data-adaptive modes within a time-embedded phase space.Unlike other techniques exploiting time-embedding, such as M-SSA [25] or its nonlinear version [28], DAHD exploits a combination of integral operator and semigroup techniques [76] that help decompose the original signal into elementary signals that are narrowband for each separate discrete Fourier frequency, while being data-adaptive.
The mathematical details of the approach are provided in [55] within a general framework, including the case of multivariate time series issued from a mixing dynamical system, either stochastic or deterministic.Central to the approach is the spectral analysis of a class of integral operators whose kernels are built from correlation functions in a quite different way than found in principal component analysis (PCA) and its generalizations.For the sake of simplicity, we recall first from [55] how such an integral operator is constructed in the case of a one-dimensional time series X(t).Given the two-sided autocorrelation function (ACF), ρ (of X(t)), estimated on the interval I = [−τ/2, τ/2], such an operator takes the form: and acts on any square-integrable function Ψ on the interval I.The parameter τ > 0 characterizes the embedding window, but is chosen in practice so that ρ(t) sufficiently decays over [−τ/2, τ/2].At a practical level, the discretization of the operator L ρ defined by (10) leads to Hankel matrices built from temporal correlations fundamentally different than in M-SSA and alike.For multivariate time series, the ACF, ρ, is replaced by time-lagged cross-correlations, and operators such as given by ( 10) are grouped into a block operator whose discretization results into block-Hankel matrices; see (12) below and ( [55], Sec.VI-D).The aforementioned DAHMs are then obtained as eigenvectors of such a block-Hankel matrix, while the corresponding eigenvalues provide a notion of energy contained into the signal that, while allowing for a reconstruction of the signal, is not equivalent to variance; see ( [55], Remark V.1-(ii)).The main properties of DAH spectral elements are given below.

DAH Eigenelements and Power Spectrum
Given a multivariate time series formed of d "channels" sampled uniformly at a unit of time δt, i.e., X(t n ) = (X 1 (t n ), . . ., X d (t n )), with t n = nδt, n = 1, . . ., N, the first step to determine the DAH spectra consists of estimating the two-sided cross-correlation coefficients ρ (p,q) k between channels p and q at lag k up to a maximum lag M − 1, i.e., −M As shown in ( [55], Sec.VI-D), the discretization of the operator L ρ given by (10) with ρ = ρ (p,q) leads to the following Hankel matrix H (p,q) : By forming such a Hankel matrix for each (p, q) in {1, . . ., d} 2 , one can assemble the following block-Hankel matrix C composed by d 2 blocks of size (2M − 1) × (2M − 1), each given according to: Note that because each building block, H (p,q) , is symmetric and because C (p,q) = C (q,p) , the grand matrix C is itself symmetric, and therefore, its eigenvalues are necessarily real while the eigenvectors form an orthogonal set.Furthermore, Theorem V.1 of [55] shows that the corresponding eigenvalues of C come in pairs of equal absolute values, but with the opposite sign.The eigenvalues can be grouped per Fourier frequency f ( = 0) and are actually determined by the singular values of a cross-spectral matrix at each frequency.In particular, denoting by ρ p,q ( f ) the Fourier transform at the frequency f of the cross-correlation function ρ p,q , we consider the following d × d cross-spectral matrix S( f ) whose entries are given by: Then, Theorem V.1 of [55] shows that for each singular value σ k ( f ) of S( f ) there exists, when f = 0, a pair of negative-positive eigenvalues i.e., 2d eigenvalues are associated with each Fourier frequency f = 0.The same theorem shows that d (but not paired) eigenvalues are associated with the frequency f = 0.This property allows one to rearrange the eigenvalues into the DAH power spectrum that consists of forming, for each ranging from 1-(M + 1)/2, the discrete set: where: Hereafter, we use M = 2M − 1 for concision, re-indexing the string {−M + 1, . . ., M − 1} to run from 1-M as necessary.
Furthermore, Theorem V.1 of [55] shows that the eigenvectors W j of C can also be grouped per Fourier frequency f by using the following representation: where DAHM snippet E j k is a M -long row vector that is explicitly associated with a Fourier frequency f according to: where the amplitudes B j k and the phases θ j k are both data-dependent, for each k in {1, . . ., d}.Thus, we can easily sort the eigenvectors according to a given Fourier frequency f in (16), by determining the following subset of indices in {1, . . ., dM }: Note that J ( f ) is composed of 2d indices when = 0 and of d indices if = 0, and therefore, J ( f )'s form a partition of the total set of indices, {1, . . ., dM }.

DAH Coefficients
Another useful property concerns the pair of DAHMs associated with a pair of DAH eigenvalues (λ j , λ j ), such that λ j = −λ j with j and j that belong thus to the same subset J ( f ).For such a DAHM pair, the theory shows indeed that their corresponding phases satisfy θ j k = θ j k + π/2, i.e., in each DAHM pair, the modes are shifted by one fourth of the period; see Theorem IV.1 of [55].The DAHMs are thus always in exact phase quadrature, as for a sine-and-cosine pair in Fourier analysis, but in a data-adaptive fashion as encapsulated in the θ j k 's and the B j k 's.
By analogy with M-SSA [25], the multivariate dataset X can be projected onto the orthogonal set formed by the W j 's, in order to obtain the following DAH expansion coefficients (DAHCs): where t varies from one to: Although the DAHCs are not formally orthogonal in time, the DAHC pair (ξ j (t), ξ j (t)) associated with a DAHM pair (W j , W j ), is composed of time series that are nearly in phase quadrature; a property that is all the more pronounced when the embedding window parameter M can be made sufficiently large to resolve the decay of temporal correlations contained in X; see [55].In other words, the larger M (subject to the length of the record), the more apparent is the phase quadrature property exhibited by ξ j (t) and ξ j (t) constituting a DAHC pair.
Furthermore, any subset of DAHCs, as well as the their full set, can be convolved with its corresponding set of DAHMs, to produce a partial or full reconstruction of the original dataset, respectively.Thus, the following j-th reconstructed component (RC) at time t and for channel k is defined as: where L t (resp.U t ) is a lower (resp.upper) bound in {1, . . ., M }, that is allowed to depend on time.
The normalization factor M t equals M , except near the ends of the time series, as in M-SSA [25], and the sum of all the RCs recovers the original time series.Due to the ordering of the DAHMs in terms of Fourier frequency, the harmonic reconstruction component (HRC) can be formed from the sum of the RC pairs associated with a same Fourier frequency f = 0, namely: where J ( f ) denotes the set of indices given in (19).HRC provides an unambiguous and natural way to determine how variability at a particular frequency f is expressed in the time domain.For instance, Panels (a) and (b) of Figure 7 show the sum of the first four HRCs on PC1 for the upper-layer stream function anomalies as simulated from the QG model and its DAH-MSLM emulator, respectively.

Multilayer Stuart-Landau Models
DAHD facilitates efficient inverse modeling of a given multivariate dataset X in the transformed coordinates of DAHCs (20) and then utilizes the reconstruction Formulas ( 22) and ( 23) for transformation into the space of the original variables of X.As we explain below, the modeling of DAHCs is simplified due to (i) the near-phase quadrature property satisfied by any DAHC pair and (ii) its narrowband character.
Let us consider a DAHC pair (ξ j (t), ξ j (t)) associated with a pair of DAH eigenvalues (λ j , λ j ), such that λ j = −λ j with j and j that belong thus to the same subset J ( f ) associated with a frequency f .Moreover, without loss of generality, we can assume that λ j is positive and λ j is negative.Hereafter, we also assume the time t to be a continuous parameter.For such a DAHC pair, we form the complex time series, ζ j (t) = ξ j (t) + iξ j (t) where i 2 = −1.As shown in Section VII of [55], we can infer that: Since the convolution term in the RHS of ( 24) involves a cosine function oscillating at the frequency f , the frequencies g = f contained in the time series X k (t + τ/2) − X k (t − τ/2) are filtered out, and the resulting DAHC, ξ j (t), is narrowband about the frequency f .This latter property is shared by all DAHCs, ξ j (t), with j in J ( f ), and in particular for ξ j (t).Although the RHS of ( 24) provides an exact representation of the DAHC (in the case of an infinite time series), its usage in practice is limited since we desire a closure model of the ξ j (t)'s that avoids an explicit dependence on the data, X k , as found in (24).
Guided by the fact that a DAHC pair (ξ j (t), ξ j (t)) is composed of narrowband time series that are nearly in phase quadrature, Chekroun and Kondrashov [55] have shown that the class of Stuart-Landau (SL) oscillators driven by an additive noise [77,78] represent a natural class of closure models to capture amplitude modulations of ξ j (t) and ξ j (t): where µ, γ and β are real parameters, t is a noise term and z(t) = ξ j (t) + iξ j (t).Complementarily, Appendices A-D below justify the modeling of DAHCs within the class of SL oscillators by the theory of Ruelle-Pollicott (RP) resonances and their estimation from the time series.Furthermore, the collective behavior of all DAHC pairs associated with the same frequency f must be also taken into account by using an appropriate dynamical coupling between the corresponding individual SL oscillators, as well as by considering temporal and spatial cross-pair correlations in the noise term t .
The multilayer MSM framework of [54] is particularly suited to deal with these issues and applied to (25), it leads to MSLMs such as introduced in [55].For this study, the optimal MSLM has only a main layer according to the stopping criterion described in Appendix A of [54], and it is given as the following system of SDEs driven by pairwise-correlated white noise: Appendices C and D below provide, in the context of the data collected from the oceanic model [8], numerical evidences and theoretical arguments for the modeling of the corresponding DAHCs by MSLMs such as (26).As mentioned above, the theoretical apparatus is in particular based on the theory of Ruelle-Pollicott resonances and their estimation from time series [56], recalled in Appendices A and B below.
Here, (x j (t), y j (t)) is aimed at approximating a DAHC pair (ξ j (t), ξ j (t)) associated with a frequency f = f , and the index j belongs to the subset of indices J ( f ) associated with d distinct pairs.The W j k 's with k = 1 or k = 2, and 1 ≤ j ≤ d, form 2d independent Brownian motions.Following Kondrashov et al. [54], all model coefficients are estimated for each (x j , y j )-pair by (multiple) linear regression (MLR).Linear constraints on α j ( f ), β j ( f ) and σ j ( f ) are imposed to ensure antisymmetry for the linear part of each (x j , y j )-pair, as well as equal and positive values σ j ( f ) > 0 to ensure asymptotic stability.The resulting regression residuals yield time series of ( d x j , d y j ) that are well approximated by pairwise-correlated white noise in the sums involving the W j k 's; see Appendix D for more details.Note that an MSLM, as any MSM, may include more layers to accurately model the regression residual; see Section VIII-B of [55].It turned out that for the simulations reported in this study, an MSLM of the form (26) was sufficient to reach satisfactory modeling skills; see Figures 8 and 9 above, as well as Appendix E. Note also that the DAHCs associated with f ≡ 0 are not paired and resemble red noises in practice (see [55]); thus, only a linear MSM is used to model the DAHCs associated with the zero-frequency; see [54].
Because the SL oscillators are uncoupled across the frequencies, the DAH-MSLM approach is computationally efficient, totally parallelizable and, for a variety of datasets, laptop-enabled in practice.First, the model coefficients can be estimated in parallel for each frequency, and the overall number of independent coefficients to estimate remains small and fixed for each (x j , y j )-pair.Once all the resulting (few) MSLM coefficients have been estimated, for the simulation, as well, no extra coupling across the frequencies is needed other than running the MSLMs across the frequencies by the same noise realization, which can be also done in parallel.
To simulate the DAHC pairs, Equations ( 26) are discretized in time and integrated numerically forward using a Euler-Maruyama scheme from initial states obtained according to the initialization procedure described in Appendix B of [54], followed by a convolution with the associated DAHMs according to (22) to transform into the space of original variables.The simulated RCs are then added in each frequency bin to yield the corresponding HRCs given by ( 23) that, in turn, are added across the frequencies in the range of interest.For instance, Panel (b) of Figure 7 shows the sum of the first four HRCs on PC1 for the upper-layer stream function anomalies as simulated for the DAH-MSLM emulator of the QG model considered here.theory, in particular regarding the fundamental role that RP resonances play in the decomposition of power spectral density (PSD) and correlation functions and that we apply to the case of time series constituted by the dominant pairs of DAHCs for the three frequency bands of interest in this study, namely decadal, interannual and subannual.
Given a system of stochastic differential equations (SDEs) in R p , dX = F(X) dt + G(X) dW t .(A1) Here, W t = (W 1 t , . . ., W q t ) denotes an R q -valued Wiener process whose components are mutually independent standard Brownian motions.
In Equation (A1), the drift part is provided by a (possibly nonlinear) vector field F of R p , and the (also possibly nonlinear) stochastic diffusion in its Itô version, given by G(X) dW t , has its i-th-component (1 ≤ i ≤ p) given by: In what follows, we assume that the vector field F and the matrix-valued function: satisfy regularity conditions that guarantee the existence and the uniqueness of mild solutions, as well as the continuity of the trajectories; see, e.g., [79,80] for such conditions in the case of locally Lipschitz coefficients.
It is well known that the evolution of the probability density of the random R p -valued variable, X t (solving Equation (A1)), is governed by the Fokker-Planck equation: with Σ(X) = G(X)G(X) T .What is less-known however is that the spectral properties of the second-order differential operator, A, informs about fundamental objects such as the power spectra or correlation functions computed typically along a (random) trajectory of Equation (A1).We briefly recall the main elements hereafter and refer to [56,71] for more details.
First, given an observable ϕ : R p → R for the system (A1), we recall that the correlation spectrum S ϕ ( f ) is obtained by taking the Fourier transform of the correlation function C ϕ (τ), namely: where X x t denotes the stochastic process solving (A1) and satisfying X x t = x at time t = 0.As shown in [71], for a broad class of SDEs that possess an ergodic probability distribution µ, the spectrum, σ(A), of the Fokker-Planck operator, A, is typically contained in the left-half complex plane, {z ∈ C s.t.(z) ≤ 0}, and its resolvent R(z) = (zId − A) −1 , is a well-defined linear operator that satisfies: Here, f lies in the complex plane C, and the poles of the resolvent R(i f ), which correspond to the RP resonances, introduce singularities into S ϕ ( f ).Once the PSD is calculated, i.e., once |S ϕ ( f )| is computed with f taken to be real, these poles manifest themselves as peaks that stand out over a continuous background at the frequency f if the corresponding RP resonances with imaginary part f (or nearby) are close enough to the imaginary axis.The continuous background has its origin in In practice, only partial observations of the solutions to (A1) are available, e.g., few solution components.Speaking roughly, it is shown in [56] that given partial observations of a complex system that lie in a reduced phase space V, a Markov operator T with state space V can be inferred from these observations such that (i) T characterizes the coarse-graining in V of the transition probabilities in the full state space and (ii) the spectrum of T relates to the RP resonances, but in an averaged sense; see also [29,87].In practice, the dimension of V is kept low so that T can be efficiently estimated via a maximum likelihood estimator (MLE).We detail below this procedure and what (ii) means, in the context of DAHCs.
Our standing assumption is that for each frequency f = 0, there exists a model of the form (A1) of which a solution is given by the set of DAHC-pairs, (ξ j (t), ξ j (t)), at frequency f .Under this working assumption, since there is a total of d pairs, our phase space is here R p with p = 2d.Our observations are made out of a single pair of DAHCs, (ξ j (t), ξ j (t)), so that dim(V) = 2 here.We denote by ϕ : R p → V the corresponding mapping and take t = t n , i.e., discrete time instants, as given by the multiple of the sampling time at which the DAHCs are computed, here 20 days.
The Markov operator T is approximated by the transition matrix P whose entries are given: where the J k 's form a partition (of size M × M) of a domain D in V containing the set of discrete points (ξ k (t n ), ξ k (t n )), for n = 1, . . ., N; see, e.g., [56,87,88].
Assuming that the sought system of SDEs, (A1), possesses an ergodic statistical equilibrium, µ, one can show that in the limit N → ∞, the spectrum of P (after application of the (principal value of) logarithm) provides an approximation of the point spectrum of the following averaged Fokker-Planck operator, given formally for all Ψ in C 2 (V) by: see [89].The point spectrum σ p (A) of A provides what we call hereafter the reduced RP resonances of A. In [71], it is called the reduced mixing spectrum, terminology that we have purposely changed here to avoid any confusion with the notion of mixing in the physical space; the mixing referred to in [71] takes place in the phase space and is typically manifested by decay of correlations; see also [56].The point spectrum σ p (A) characterizes the mixing properties in the reduced phase space, once the effects of the variables lying in the unobserved factor Z have been averaged out.Thus, in short, log(σ(P)) approximates σ p (A) (A9) More precisely, in (A8), the space Z denotes the supplement of V in R p , i.e., R p = V ⊕ Z, and µ v denotes the disintegration of µ above v in V, along the unobserved factor space Z. Mathematically, for all Borel sets B and F of Z and V, respectively, with m = Π V µ; see [54] and the references therein.Thus, the operator A corresponds to the conditional expectation of A obtained after averaging along the unobserved variables lying in Z. Once the Markov matrix P is estimated from time series via (A7), its spectrum provides in turn an estimation of the reduced RP resonances and thus may inform us about the nature of the coefficients of A provided that the spectrum of P exhibits interpretable features in terms of known SDEs.
In polar coordinates (r, θ) with x = r cos θ and y = r sin θ, this system becomes by applying the Itô's formula: where W r and W θ are two Wiener processes satisfying the relationships The Fokker-Planck equation associated with (A14) is given by: We assume now that β is sufficiently largely positive and σ is sufficiently small, so that roughly speaking, the dynamics of (x(t), y(t)) settles near a circle of radius R * , corresponding to an average radius of the stochastic limit cycle [90].Under this assumption, it is expected thus that the azimuthal component of the solutions of (A15) can be well approximated by the solutions of the following advection-diffusion equation with periodic boundary conditions: The solutions of (A16) are formally given by: It is then easy to obtain the following approximation formula (up to a rescaling) for the eigenvalues, λ k , of the Fokker-Planck operator's azimuthal part associated with Equation (A15): We refer to [89] for more details; see also [91].Note that here, T in (A17) denotes the period associated with the azimuthal velocity ω such that ω = 2π/T.
The analytic approximations given by (A17) are shown by the green dots in the right panel of Figure A2 for σ = 1.02 × 10 −2 and T = 540 days.To such analytic approximation of the RP resonances associated with (A13), the parabolas of resonances that are shifted to the left of the complex plane by γ > 0, namely: constitute parabolas of harmonics; see [71].These parabolas of harmonics, shown in green here for one of them to avoid overload, provide good approximations of most of the resonances shown (in red) in the right panel of Figure A2.Note that in practice, the shift γ relates to the eigenvalues of the radial part A r of the Fokker-Planck operator associated with (A15), namely: uncoupled SL models of the form (A13).Even for frequencies f (such as in Appendix B) for which individual SL models provide very plausible SDE generators of any given pair of DAHCs at the frequency f (such as explained in Appendix C), it is unlikely to obtain in this simplified fashion a coherent simulation of the DAHCs (associated with frequency f ) that would share phase relationships consistent with those exhibited by the original DAHCs (at the same frequency).It is one of the main motivations for the presence of such coupling terms in (A21) and thus in (A24).
In the presence of coupling terms, it is noteworthy that the averaged Fokker-Planck operator associated with (A24) differs from that associated with an individual SL model, i.e., associated with: by first-order and second-order partial differential operators involving averaging over a combination of coefficients in C( f ) and G( f ), respectively.
The analysis conducted in Appendix C suggests thus that any "deviation" from parabolas of the arrangement of resonances could serve as a good indication that the presence of such derivatives in the averaged Fokker-Planck operator is non-negligible (emphasizing thus the presence of coupling terms), although such a statement should be taken with care due not only to possible estimation errors that could lie behind such discrepancies, but also to other factors governed e.g., by the ratio between β and σ, adopting the notations of Appendix C; see [89] for more details.
Discrepancies with respect to an array of parabolas are observed in the right panel of Figure A3 for the RP resonances estimated from the dominant pair of DAHCs shown in the left panel of Figure A3, for the frequency f = 0.061 cycle/y −1 , i.e., corresponding to decadal variability (T ≈ 16.39y).The reason here lies presumably in the less precise phase-quadrature relationship thatthe time series composing such a pair satisfies, as pointed out already in Section 2.3 for the pairs of DAHCs shown in Figure 3 associated with decadal variability.Actually, not only the phase-quadrature relationship among the time series constituting a DAHC pair is altered at low-frequency, but also their narrowband character unlike for the interannual and subannual cases (not shown).This alteration of such narrowband features is actually rooted into windowing effects since the 16.4-y-period is actually pretty close to the size of the window (≈17 y) used to estimate the correlations underlying the DAH grand matrix, C, in (12).
Thus, we may naturally expect that a single SL model (for which in particular such phase-quadratures are typically observed) is insufficient for modeling purposes.Nevertheless, the outer envelope of these resonances still follows closely a parabola as shown by the green dots in Figure A3 obtained by application of the analytic Formula (A18) with T = 17 y, γ = 0 and σ = 3 × 10 −3 .The coupling terms C( f ) in (A24) and the noise matrix G( f ) are aimed at modeling the differences from an array of parabolas within the envelop's convex hull.
The subannual case supports also the relevance of the analytic prediction (A17) and thus of the class of Stuart-Landau models used to model the DAHCs for the corresponding frequencies, albeit in a less direct way.Here, the frequency associated with the pair of DAHCs analyzed hereafter is determined via the DAH spectrum given in Figure 2. The corresponding value of this frequency is f ≈ 5.27 cycle/y −1 .It lies close to the Nyquist frequency, and therefore, sampling effects are expected to manifest.For instance, whereas the dominant period should be thus T p ≈ 69 days as predicted by the theory, an estimation of the resonances as eigenvalues of the Markov matrix P with entries given by (A7) shows an estimated period T s of 1/7 y −1 corresponding approximately to 52 days.The latter period is clearly visible by displaying the eigenvalues of P without applying the logarithm, unlike in Figures A2 and A3: seven strings of resonances are indeed apparent in the right panel of Figure A4 within the unit disc of the complex plane, i.e., by adopting the same mode of representation as in [56].These resonances strings are very well approximated by rays of resonances obtained by application of the analytic Formula (A17), this time with T = T s (and not with T = T p ), γ = 0 and σ = 3 × 10 −3 ; see the right panel of Figure A4.Note that a cluster of resonances has formed near

Figure 1 .
Figure 1.Upper-level stream function anomalies.(Left) Top: standard deviation; bottom: snapshot of instantaneous flow.(Right) Same, but in the truncated subspace of 30 leading spatial empirical orthogonal functions (EOFs).The nondimensional units are arbitrary, but are the same for all panels.

Figure 2 .
Figure 2. DAH power spectrum P of 30 leading PCs of the upper layer stream function anomalies.Left panel: Each discrete set P consists of 30 eigenelements (equal to number of input dataset's channels), corresponding to the pairs of DAH eigenvalues |λ j | and eigenvectors W j at a given temporal frequency f (see Section 4.2.1 and (15)-(17)).Figures3 and 4show DAHMs associated with four largest |λ j | at two selected frequencies: cyan, decadal LFV peak f D = 0.061 y −1 (≈ 17 y); green, interannual f I = 0.674 y −1 (≈ 1.5 y).The right panel shows a magnification of the low-frequency part of the spectrum.

Figure 3 .
Figure 3. Left and center panels: Space-time patterns of data-adaptive harmonic mode (DAHM) pairs

Figure 4 .
Figure 4. Same as in Figure3, but for f = 0.674 y −1 ; associated here with the green dots in the DAH power spectrum shown in Figure2.

Figure 5 .
Figure 5. Manifestation in the physical domain of the leading pair of DAHMs (see Figure 3) at the decadal variability f = 0.061 y −1 ; i.e., corresponding to the largest |λ j | (top cyan dot) in Figure 2. The resulting pattern is periodic (with period ≈ 16.39 y).Here, eight oscillation phases labeled by time are shown.

Figure 6 .
Figure 6.Same as in Figure5, but for the pair of DAHMs associated with the largest |λ j | (top green dot in Figure2) at the interannual frequency f I = 0.674 y −1 (with period ≈ 1.5 y) (cf.Figure4).

Figure 7 .
Figure 7. Decadal harmonic reconstruction component (HRC) and its reduced RP resonances for PC1: quasi-geostrophic (QG) model and multilayer Stuart-Landau model (MSLM).(a,b) show the sum of the first four HRCs on PC1 for the upper-layer stream function anomalies as simulated from the QG model (red curve) and its DAH-MSLM emulator (blue curve).In both panels, the PC1 is shown in black: in (b), PC1 is obtained after simulation of the DAH-MSLM emulator, whereas in (a), PC1 is obtained from simulation of the QG model.The HRCs are computed according to (23) in Section 4.2.2 below, from the corresponding simulated data.(c) shows the corresponding reduced RP resonances (see Appendix B) as estimated from HRCs shown in (a,b); colors match across panels (a-c).

Figure A5 .
Figure A5.Same format as in Figure A7, but the comparison of harmonic reconstruction (red) and DAH-MSLM stochastic simulation (blue) in a full frequency band f < 9.1 y −1 .

Figure A6 .
Figure A6.Same format as in Figure A8, but the comparison of harmonic reconstruction (red) and DAH-MSLM stochastic simulation (blue) in a full frequency band f < 9.1 y −1 .