#### 4.1. Mid-Latitude Ocean Model

In this study, the data are generated from the integration of the classical eddy-resolving double-gyre ocean model of [8]. The flow dynamics is governed by the quasi-geostrophic potential-vorticity (QG PV) equations for 3 stacked isopycnal layers:

where the layer index starts from the top; $J(\cdot,\cdot)$ is the Jacobian operator; ${\rho}_{1}={10}^{3}$ kg·m${}^{-3}$ is the upper layer density; $\beta =2\times {10}^{-11}$ m${}^{-1}$·s${}^{-1}$ is the planetary vorticity gradient; $\nu =20$ m${}^{2}$·s${}^{-1}$ is the eddy viscosity coefficient; and $\gamma =4\times {10}^{-8}$ s${}^{-1}$ is the bottom friction parameter. The basin size is $2L=3840$ km, so that $-L\le x\le L$ and $-L\le y\le L$. The isopycnal layer depths are ${H}_{1}=250$, ${H}_{2}=750$ and ${H}_{3}=3000$ m. The PV anomalies ${q}_{i}$ and the velocity stream functions ${\psi}_{i}$ are related as:

where the stratification parameters ${S}_{1},{S}_{21},{S}_{22},{S}_{3}$ are such that the first and second Rossby deformation radii are $R{d}_{1}=40$ km and $R{d}_{2}=20.6$ km, respectively. The flow is forced by the prescribed wind stress curl $W(x,y)$. The double-gyre Ekman pumping $W(x,y)$ is asymmetric in order to avoid artificial symmetrization of the gyres:

where the asymmetry parameter is $A=0.9$, the non-zonal tilt parameter is $B=0.2$, and the wind stress amplitude is ${\tau}_{0}=0.8$ N·m${}^{-2}$.

There are no-flow-through and partial-slip boundary conditions on the lateral walls, augmented by the integral mass conservation constraints. The model is solved on the uniform ${513}^{2}$ grid with 7.5-km nominal resolution. The solution is saved every 20 days over 1780 years; this time interval is unprecedentedly long for an eddy-resolving ocean model, but our study requires this length to obtain highly accurate statistics of the flow. Each snapshot of the solution was coarse-grained by a local running-window averaging with a 120-km width to a ${65}^{2}$ grid with 60-km spacing. We checked that the outcome is not sensitive to moderate variations of the running-window width.
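The coarse-graining step can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `coarse_grain`, the 17-point window (120 km at 7.5-km spacing, counting both end points), the stride of 8 grid points and the truncation of windows at the walls are all assumptions made for the example.

```python
import numpy as np

def coarse_grain(field, stride=8, half_width=8):
    """Running-window average of a square 2D field, sampled every `stride` points.

    For a 513 x 513 input, stride=8 yields a 65 x 65 output (60-km spacing
    when the input spacing is 7.5 km). Windows are truncated at the domain
    walls; that boundary treatment is an assumption of this sketch.
    """
    field = np.asarray(field, dtype=float)
    n = field.shape[0]
    centers = np.arange(0, n, stride)
    out = np.empty((centers.size, centers.size))
    for a, i in enumerate(centers):
        i0, i1 = max(i - half_width, 0), min(i + half_width + 1, n)
        for b, j in enumerate(centers):
            j0, j1 = max(j - half_width, 0), min(j + half_width + 1, n)
            out[a, b] = field[i0:i1, j0:j1].mean()
    return out

# Hypothetical usage on a synthetic stand-in for one model snapshot
snapshot = np.random.default_rng(0).standard_normal((513, 513))
coarse = coarse_grain(snapshot)
```

Changing `half_width` is one way to probe the sensitivity to the running-window width mentioned above.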

#### 4.2. Data-Adaptive Harmonic Decomposition

The data-adaptive harmonic decomposition (DAHD) [55] is a signal-processing technique that allows for a decomposition of the power and phase spectra via data-adaptive modes within a time-embedded phase space. Unlike other techniques exploiting time-embedding, such as M-SSA [25] or its nonlinear version [28], DAHD exploits a combination of integral operator and semigroup techniques [76] that help decompose the original signal into elementary signals that are narrowband for each separate discrete Fourier frequency, while being data-adaptive.

The mathematical details of the approach are provided in [55] within a general framework, including the case of multivariate time series issued from a mixing dynamical system, either stochastic or deterministic. Central to the approach is the spectral analysis of a class of integral operators whose kernels are built from correlation functions in a quite different way than found in principal component analysis (PCA) and its generalizations. For the sake of simplicity, we recall first from [55] how such an integral operator is constructed in the case of a one-dimensional time series $X\left(t\right)$. Given the two-sided autocorrelation function (ACF), $\rho $ (of $X\left(t\right)$), estimated on the interval $I=[-\tau /2,\tau /2]$, such an operator takes the form:

and acts on any square-integrable function $\mathsf{\Psi}$ on the interval $I$. The parameter $\tau >0$ characterizes the embedding window, but is chosen in practice so that $\rho \left(t\right)$ sufficiently decays over $[-\tau /2,\tau /2]$.

At a practical level, the discretization of the operator ${\mathcal{L}}_{\rho}$ defined by (10) leads to Hankel matrices that are built from temporal correlations in a way fundamentally different from M-SSA and similar methods. For multivariate time series, the ACF, $\rho $, is replaced by time-lagged cross-correlations, and operators such as the one given by (10) are grouped into a block operator whose discretization results in block-Hankel matrices; see (12) below and ([55], Sec. VI-D). The data-adaptive harmonic modes (DAHMs) are then obtained as eigenvectors of such a block-Hankel matrix, while the corresponding eigenvalues provide a notion of energy contained in the signal that, while allowing for a reconstruction of the signal, is not equivalent to variance; see ([55], Remark V.1-(ii)). The main properties of DAH spectral elements are given below.

#### 4.2.1. DAH Eigenelements and Power Spectrum

Given a multivariate time series formed of $d$ “channels” sampled uniformly at a unit of time $\delta t$, i.e., $X\left({t}_{n}\right)=({X}_{1}\left({t}_{n}\right),\dots ,{X}_{d}\left({t}_{n}\right))$, with ${t}_{n}=n\delta t$, $n=1,\dots ,N$, the first step to determine the DAH spectra consists of estimating the two-sided cross-correlation coefficients ${\rho}_{k}^{(p,q)}$ between channels $p$ and $q$ at lag $k$ up to a maximum lag $M-1$, i.e., $-M+1\le k\le M-1$.
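As an illustration of this first step, the sketch below estimates two-sided lagged cross-correlations between two channels. It is only a minimal example under stated assumptions: the function name `lagged_cross_correlation`, the sign convention (lag $k$ pairs $X_p(t+k)$ with $X_q(t)$) and the normalization by the full record length $N$ are choices made here, not taken from [55].

```python
import numpy as np

def lagged_cross_correlation(xp, xq, M):
    """Return lags k = -M+1..M-1 and estimates of corr(X_p(t+k), X_q(t)).

    Both series are standardized first; each lagged product sum is divided
    by the record length N (an assumed, biased normalization).
    """
    xp = np.asarray(xp, dtype=float)
    xq = np.asarray(xq, dtype=float)
    N = xp.size
    xp = (xp - xp.mean()) / xp.std()
    xq = (xq - xq.mean()) / xq.std()
    lags = np.arange(-M + 1, M)
    rho = np.empty(lags.size)
    for i, k in enumerate(lags):
        if k >= 0:
            rho[i] = np.dot(xp[k:], xq[:N - k]) / N
        else:
            rho[i] = np.dot(xp[:N + k], xq[-k:]) / N
    return lags, rho

# Hypothetical usage: autocorrelation of a synthetic channel with M = 10
series = np.random.default_rng(0).standard_normal(500)
lags, rho_auto = lagged_cross_correlation(series, series, M=10)
```

The returned array has $2M-1$ entries, one per lag, which is the length of the correlation sequences used in the rest of this section.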

As shown in ([55], Sec. VI-D), the discretization of the operator ${\mathcal{L}}_{\rho}$ given by (10) with $\rho ={\rho}^{(p,q)}$ leads to the following Hankel matrix ${\mathbf{H}}^{(p,q)}$:

By forming such a Hankel matrix for each $(p,q)$ in ${\{1,\cdots ,d\}}^{2}$, one can assemble the following block-Hankel matrix $\mathfrak{C}$ composed of ${d}^{2}$ blocks of size $(2M-1)\times (2M-1)$, each given according to:

Note that because each building block, ${\mathbf{H}}^{(p,q)}$, is symmetric and because ${\mathfrak{C}}^{(p,q)}={\mathfrak{C}}^{(q,p)}$, the grand matrix $\mathfrak{C}$ is itself symmetric, and therefore, its eigenvalues are necessarily real while the eigenvectors form an orthogonal set.

Furthermore, Theorem V.1 of [55] shows that the corresponding eigenvalues of $\mathfrak{C}$ come in pairs of equal absolute values, but with the opposite sign. The eigenvalues can be grouped per Fourier frequency $f$ ($\ne 0$) and are actually determined by the singular values of a cross-spectral matrix at each frequency. In particular, denoting by $\widehat{{\rho}^{p,q}}\left(f\right)$ the Fourier transform at the frequency $f$ of the cross-correlation function ${\rho}^{p,q}$, we consider the following $d\times d$ cross-spectral matrix $\mathfrak{S}\left(f\right)$ whose entries are given by:

Then, Theorem V.1 of [55] shows that for each singular value ${\sigma}_{k}\left(f\right)$ of $\mathfrak{S}\left(f\right)$ there exists, when $f\ne 0$, a pair of negative-positive eigenvalues $({\lambda}_{-}^{k}\left(f\right),{\lambda}_{+}^{k}\left(f\right))$ of $\mathfrak{C}$ such that:

$${\lambda}_{+}^{k}\left(f\right)=-{\lambda}_{-}^{k}\left(f\right)={\sigma}_{k}\left(f\right),\qquad 1\le k\le d,$$

i.e., $2d$ eigenvalues are associated with each Fourier frequency $f\ne 0$. The same theorem shows that $d$ (but not paired) eigenvalues are associated with the frequency $f=0$.
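The pairing of eigenvalues can be checked numerically. The sketch below is an illustration under explicit assumptions and does not reproduce the definitions of [55]: it assumes that each Hankel block is left-circulant (every row is the previous one cyclically shifted to the left), that block $(p,q)$ uses the sequence indexed by $(\min (p,q),\max (p,q))$, and that the cross-spectral matrix is filled in the same symmetric way. The helper names are hypothetical, and random numbers stand in for the correlation sequences.

```python
import numpy as np

def left_circulant(c):
    """Symmetric Hankel matrix whose (i, j) entry is c[(i + j) mod n]."""
    c = np.asarray(c, dtype=float)
    n = c.size
    idx = (np.arange(n)[:, None] + np.arange(n)[None, :]) % n
    return c[idx]

def block_hankel(rho):
    """Assemble the (d*M') x (d*M') block matrix from rho[p, q, :] (p <= q used)."""
    d = rho.shape[0]
    return np.block([[left_circulant(rho[min(p, q), max(p, q)])
                      for q in range(d)] for p in range(d)])

def predicted_spectrum(rho):
    """Eigenvalues predicted from the d x d cross-spectral matrices S(f)."""
    d, _, n = rho.shape
    rho_hat = np.fft.fft(rho, axis=-1)
    values = []
    for k in range((n + 1) // 2):          # frequencies 0, 1/n, ..., (n-1)/(2n)
        S = np.array([[rho_hat[min(p, q), max(p, q), k]
                       for q in range(d)] for p in range(d)])
        if k == 0:
            # zero frequency: d unpaired real eigenvalues
            values.extend(np.linalg.eigvalsh(S.real))
        else:
            # nonzero frequency: plus/minus each singular value
            sv = np.linalg.svd(S, compute_uv=False)
            values.extend(sv)
            values.extend(-sv)
    return np.sort(np.array(values))

# Synthetic stand-in for the correlation sequences, with d = 2 and M = 4
rng = np.random.default_rng(1)
d, M = 2, 4
rho = rng.standard_normal((d, d, 2 * M - 1))
C = block_hankel(rho)
eigs = np.sort(np.linalg.eigvalsh(C))
```

Under these assumptions, `eigs` agrees with `predicted_spectrum(rho)` to rounding error: $2d$ paired eigenvalues for each of the $M-1$ nonzero frequencies and $d$ unpaired ones at zero frequency, for a total of $d(2M-1)$.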

This property allows one to rearrange the eigenvalues into the DAH power spectrum, which consists of forming, for each $\ell$ ranging from 1 to $({M}^{\prime}+1)/2$, the discrete set:

where:

Hereafter, we use ${M}^{\prime}=2M-1$ for concision, re-indexing the string $\{-M+1,\cdots ,M-1\}$ to run from 1 to ${M}^{\prime}$ as necessary.

Furthermore, Theorem V.1 of [55] shows that the eigenvectors ${\mathbf{W}}_{j}$ of $\mathfrak{C}$ can also be grouped per Fourier frequency $f$ by using the following representation:

where the DAHM snippet ${\mathbf{E}}_{k}^{j}$ is an ${M}^{\prime}$-long row vector that is explicitly associated with a Fourier frequency $f$ according to:

where the amplitudes ${B}_{k}^{j}$ and the phases ${\theta}_{k}^{j}$ are both data-dependent, for each $k$ in $\{1,\cdots ,d\}$. Thus, we can easily sort the eigenvectors according to a given Fourier frequency ${f}_{\ell}$ in (16), by determining the following subset of indices in $\{1,\cdots ,d{M}^{\prime}\}$:

Note that $\mathcal{J}\left({f}_{\ell}\right)$ is composed of $2d$ indices when ${f}_{\ell}\ne 0$ and of $d$ indices when ${f}_{\ell}=0$, and therefore, the $\mathcal{J}\left({f}_{\ell}\right)$’s form a partition of the total set of indices, $\{1,\cdots ,d{M}^{\prime}\}$.

#### 4.2.2. DAH Coefficients

Another useful property concerns the pair of DAHMs associated with a pair of DAH eigenvalues $({\lambda}_{j},{\lambda}_{{j}^{\prime}})$, such that ${\lambda}_{{j}^{\prime}}=-{\lambda}_{j}$, with $j$ and ${j}^{\prime}$ thus belonging to the same subset $\mathcal{J}\left(f\right)$. For such a DAHM pair, the theory shows that the corresponding phases satisfy ${\theta}_{k}^{{j}^{\prime}}={\theta}_{k}^{j}+\pi /2$, i.e., in each DAHM pair, the modes are shifted by one fourth of the period; see Theorem IV.1 of [55]. The DAHMs are thus always in exact phase quadrature, as for a sine-and-cosine pair in Fourier analysis, but in a data-adaptive fashion as encapsulated in the ${\theta}_{k}^{j}$’s and the ${B}_{k}^{j}$’s.

By analogy with M-SSA [25], the multivariate dataset $X$ can be projected onto the orthogonal set formed by the ${\mathbf{W}}_{j}$’s, in order to obtain the following DAH expansion coefficients (DAHCs):

where $t$ varies from one to:

Although the DAHCs are not formally orthogonal in time, the DAHC pair $({\xi}_{j}\left(t\right),{\xi}_{{j}^{\prime}}\left(t\right))$ associated with a DAHM pair $({\mathbf{W}}_{j},{\mathbf{W}}_{{j}^{\prime}})$ is composed of time series that are nearly in phase quadrature, a property that is all the more pronounced when the embedding window parameter $M$ can be made sufficiently large to resolve the decay of temporal correlations contained in $X$; see [55]. In other words, the larger $M$ (subject to the length of the record), the more apparent is the phase quadrature property exhibited by ${\xi}_{j}\left(t\right)$ and ${\xi}_{{j}^{\prime}}\left(t\right)$ constituting a DAHC pair.

Furthermore, any subset of DAHCs, as well as their full set, can be convolved with its corresponding set of DAHMs to produce a partial or full reconstruction of the original dataset, respectively. Thus, the following $j$-th reconstructed component (RC) at time $t$ and for channel $k$ is defined as:

where ${L}_{t}$ (resp. ${U}_{t}$) is a lower (resp. upper) bound in $\{1,\cdots ,{M}^{\prime}\}$ that is allowed to depend on time. The normalization factor ${M}_{t}$ equals ${M}^{\prime}$, except near the ends of the time series, as in M-SSA [25], and the sum of all the RCs recovers the original time series.

Due to the ordering of the DAHMs in terms of Fourier frequency, the harmonic reconstruction component (HRC) can be formed from the sum of the RC pairs associated with the same Fourier frequency $f\ne 0$, namely:

where $\mathcal{J}\left(f\right)$ denotes the set of indices given in (19). The HRC provides an unambiguous and natural way to determine how variability at a particular frequency $f$ is expressed in the time domain. For instance, Panels (a) and (b) of Figure 7 show the sum of the first four HRCs on PC1 for the upper-layer stream function anomalies as simulated from the QG model and its DAH-MSLM emulator, respectively.

#### 4.3. Multilayer Stuart–Landau Models

DAHD facilitates efficient inverse modeling of a given multivariate dataset $\mathbf{X}$ in the transformed coordinates of DAHCs (20) and then utilizes the reconstruction Formulas (22) and (23) for transformation into the space of the original variables of $\mathbf{X}$. As we explain below, the modeling of DAHCs is simplified due to (i) the near-phase quadrature property satisfied by any DAHC pair and (ii) its narrowband character.

Let us consider a DAHC pair $({\xi}_{j}\left(t\right),{\xi}_{{j}^{\prime}}\left(t\right))$ associated with a pair of DAH eigenvalues $({\lambda}_{j},{\lambda}_{{j}^{\prime}})$, such that ${\lambda}_{{j}^{\prime}}=-{\lambda}_{j}$, with $j$ and ${j}^{\prime}$ thus belonging to the same subset $\mathcal{J}\left(f\right)$ associated with a frequency $f$. Moreover, without loss of generality, we can assume that ${\lambda}_{j}$ is positive and ${\lambda}_{{j}^{\prime}}$ is negative. Hereafter, we also assume the time $t$ to be a continuous parameter. For such a DAHC pair, we form the complex time series, ${\zeta}_{j}\left(t\right)={\xi}_{j}\left(t\right)+i{\xi}_{{j}^{\prime}}\left(t\right)$ where ${i}^{2}=-1$. As shown in Section VII of [55], we can infer that:

Since the convolution term in the RHS of (24) involves a cosine function oscillating at the frequency $f$, the frequencies $g\ne f$ contained in the time series ${X}_{k}(t+\tau /2)-{X}_{k}(t-\tau /2)$ are filtered out, and the resulting DAHC, ${\xi}_{j}\left(t\right)$, is narrowband about the frequency $f$. This latter property is shared by all DAHCs, ${\xi}_{j}\left(t\right)$, with $j$ in $\mathcal{J}\left(f\right)$, and in particular by ${\xi}_{{j}^{\prime}}\left(t\right)$. Although the RHS of (24) provides an exact representation of the DAHC (in the case of an infinite time series), its usage in practice is limited since we desire a closure model of the ${\xi}_{j}\left(t\right)$’s that avoids an explicit dependence on the data, ${X}_{k}$, as found in (24).

Guided by the fact that a DAHC pair $({\xi}_{j}\left(t\right),{\xi}_{{j}^{\prime}}\left(t\right))$ is composed of narrowband time series that are nearly in phase quadrature, Chekroun and Kondrashov [55] have shown that the class of Stuart–Landau (SL) oscillators driven by an additive noise [77,78] represents a natural class of closure models to capture amplitude modulations of ${\xi}_{j}\left(t\right)$ and ${\xi}_{{j}^{\prime}}\left(t\right)$:

where $\mu ,\gamma $ and $\beta $ are real parameters, ${\epsilon}_{t}$ is a noise term and $z\left(t\right)={\xi}_{j}\left(t\right)+i{\xi}_{{j}^{\prime}}\left(t\right)$. Complementarily, Appendix A, Appendix B, Appendix C and Appendix D below justify the modeling of DAHCs within the class of SL oscillators by the theory of Ruelle–Pollicott (RP) resonances and their estimation from the time series. Furthermore, the collective behavior of all DAHC pairs associated with the same frequency $f$ must also be taken into account by using an appropriate dynamical coupling between the corresponding individual SL oscillators, as well as by considering temporal and spatial cross-pair correlations in the noise term ${\epsilon}_{t}$.

The multilayer MSM framework of [54] is particularly suited to deal with these issues; applied to (25), it leads to MSLMs such as those introduced in [55]. For this study, the optimal MSLM has only a main layer according to the stopping criterion described in Appendix A of [54], and it is given as the following system of SDEs driven by pairwise-correlated white noise:

Appendix C and Appendix D below provide, in the context of the data collected from the oceanic model [8], numerical evidence and theoretical arguments for the modeling of the corresponding DAHCs by MSLMs such as (26). As mentioned above, the theoretical apparatus is based in particular on the theory of Ruelle–Pollicott resonances and their estimation from time series [56], recalled in Appendix A and Appendix B below.

Here, $({x}_{j}\left(t\right),{y}_{j}\left(t\right))$ is aimed at approximating a DAHC pair $({\xi}_{j}\left(t\right),{\xi}_{{j}^{\prime}}\left(t\right))$ associated with a frequency $f={f}_{\ell}$, and the index $j$ belongs to the subset of indices $\mathcal{J}\left({f}_{\ell}\right)$ associated with $d$ distinct pairs. The ${W}_{k}^{j}$’s with $k=1$ or $k=2$, and $1\le j\le d$, form $2d$ independent Brownian motions.

Following Kondrashov et al. [54], all model coefficients are estimated for each $({x}_{j},{y}_{j})$-pair by (multiple) linear regression (MLR). Linear constraints on ${\alpha}_{j}\left(f\right),{\beta}_{j}\left(f\right)$ and ${\sigma}_{j}\left(f\right)$ are imposed to ensure antisymmetry for the linear part of each $({x}_{j},{y}_{j})$-pair, as well as equal and positive values ${\sigma}_{j}\left(f\right)>0$ to ensure asymptotic stability. The resulting regression residuals yield time series of $(\mathrm{d}{\epsilon}_{j}^{x},\mathrm{d}{\epsilon}_{j}^{y})$ that are well approximated by pairwise-correlated white noise in the sums involving the ${W}_{k}^{j}$’s; see Appendix D for more details. Note that an MSLM, as any MSM, may include more layers to accurately model the regression residual; see Section VIII-B of [55]. It turned out that for the simulations reported in this study, an MSLM of the form (26) was sufficient to reach satisfactory modeling skills; see Figure 8 and Figure 9 above, as well as Appendix E. Note also that the DAHCs associated with $f\equiv 0$ are not paired and resemble red noise in practice (see [55]); thus, only a linear MSM is used to model the DAHCs associated with the zero frequency; see [54].
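To make the idea of a constrained regression concrete, the sketch below fits a single uncoupled oscillator pair by stacked least squares. It is a simplified illustration and not the estimation procedure of [54]: the drift form in `sl_drift`, the parameter names `growth`, `omega` and `damping`, and the omission of coupling terms and of any positivity constraint are all assumptions. Sharing one coefficient vector between the $x$- and $y$-equations is what enforces the antisymmetric linear part and the equal cubic coefficient.

```python
import numpy as np

def sl_drift(x, y, growth, omega, damping):
    """Assumed drift of one oscillator pair (hypothetical parameter names)."""
    r2 = x**2 + y**2
    fx = growth * x - omega * y - damping * r2 * x
    fy = omega * x + growth * y - damping * r2 * y
    return fx, fy

def simulate_pair(growth, omega, damping, noise, dt, n_steps, rng):
    """Generate synthetic training data with an explicit Euler-type scheme."""
    x = np.empty(n_steps + 1)
    y = np.empty(n_steps + 1)
    x[0], y[0] = 0.1, 0.0
    for n in range(n_steps):
        fx, fy = sl_drift(x[n], y[n], growth, omega, damping)
        x[n + 1] = x[n] + fx * dt + noise * np.sqrt(dt) * rng.standard_normal()
        y[n + 1] = y[n] + fy * dt + noise * np.sqrt(dt) * rng.standard_normal()
    return x, y

def fit_pair(x, y, dt):
    """Stacked least squares for (growth, omega, damping) shared by both equations."""
    x0, y0 = x[:-1], y[:-1]
    r2 = x0**2 + y0**2
    # Rows for the x-equation followed by rows for the y-equation.
    A = np.concatenate([np.column_stack([x0, -y0, -r2 * x0]),
                        np.column_stack([y0, x0, -r2 * y0])])
    b = np.concatenate([np.diff(x), np.diff(y)]) / dt
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    residual = b - A @ coef   # plays the role of the noise increments
    return coef, residual

# Noise-free synthetic data, so the fit recovers the parameters to rounding error
rng = np.random.default_rng(2)
x, y = simulate_pair(0.5, 2.0, 1.0, noise=0.0, dt=0.01, n_steps=2000, rng=rng)
coef, residual = fit_pair(x, y, dt=0.01)
```

With `noise > 0`, the estimates scatter around the true values and `residual` is the series whose correlation structure would then have to be modeled.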

Because the SL oscillators are uncoupled across the frequencies, the DAH-MSLM approach is computationally efficient, totally parallelizable and, for a variety of datasets, laptop-enabled in practice. First, the model coefficients can be estimated in parallel for each frequency, and the overall number of independent coefficients to estimate remains small and fixed for each $({x}_{j},{y}_{j})$-pair. Second, once the (few) MSLM coefficients have been estimated, the simulation likewise requires no extra coupling across the frequencies other than driving the MSLMs at all frequencies by the same noise realization, which can also be done in parallel.

To simulate the DAHC pairs, Equations (26) are discretized in time and integrated numerically forward using an Euler–Maruyama scheme from initial states obtained according to the initialization procedure described in Appendix B of [54], followed by a convolution with the associated DAHMs according to (22) to transform into the space of original variables. The simulated RCs are then added in each frequency bin to yield the corresponding HRCs given by (23) that, in turn, are added across the frequencies in the range of interest. For instance, Panel (b) of Figure 7 shows the sum of the first four HRCs on PC1 for the upper-layer stream function anomalies as simulated for the DAH-MSLM emulator of the QG model considered here.
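The time stepping can be illustrated with a minimal Euler–Maruyama sketch for several uncoupled SL oscillators driven by one shared noise path. This is not the MSLM (26): the normal form $\dot{z}=(\mu +i\gamma )z-(1+i\beta ){\left|z\right|}^{2}z$ plus additive complex white noise is assumed here for illustration, and the function name, the noise amplitude `sigma` and all numerical values are hypothetical.

```python
import numpy as np

def euler_maruyama_sl(mu, gamma, beta, sigma, z0, dt, n_steps, rng):
    """Integrate dz = [(mu + i*gamma) z - (1 + i*beta) |z|^2 z] dt + sigma dW.

    mu, gamma, beta hold one entry per oscillator. A single complex noise
    increment is drawn per step and shared by all oscillators.
    """
    mu, gamma, beta = (np.asarray(a, dtype=float) for a in (mu, gamma, beta))
    z = np.empty((n_steps + 1, mu.size), dtype=complex)
    z[0] = z0
    for n in range(n_steps):
        dW = np.sqrt(dt) * (rng.standard_normal() + 1j * rng.standard_normal())
        drift = (mu + 1j * gamma) * z[n] - (1 + 1j * beta) * np.abs(z[n])**2 * z[n]
        z[n + 1] = z[n] + drift * dt + sigma * dW
    return z

rng = np.random.default_rng(3)
mu, gamma, beta = [1.0, 0.5], [2.0, 3.0], [0.5, 0.0]

# Without noise, each amplitude settles near sqrt(mu) (up to an O(dt) bias)
z_det = euler_maruyama_sl(mu, gamma, beta, sigma=0.0, z0=0.1,
                          dt=1e-3, n_steps=30000, rng=rng)

# With noise, the amplitudes are modulated around that limit cycle
z_noisy = euler_maruyama_sl(mu, gamma, beta, sigma=0.2, z0=0.1,
                            dt=1e-3, n_steps=5000, rng=rng)
```

The real and imaginary parts of each column play the role of a simulated pair $({x}_{j},{y}_{j})$; since the oscillators share only the noise path, each column could equally be integrated in a separate process given the same random draws.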