A simple stochastic parameterization for reduced models of multiscale dynamics

Multiscale dynamics are frequently present in real-world processes, such as the atmosphere-ocean and climate science. Because of time scale separation between a small set of slowly evolving variables and much larger set of rapidly changing variables, direct numerical simulations of such systems are difficult to carry out due to many dynamical variables and the need for an extremely small time discretization step to resolve fast dynamics. One of the common remedies for that is to approximate a multiscale dynamical systems by a closed approximate model for slow variables alone, which reduces the total effective dimension of the phase space of dynamics, as well as allows for a longer time discretization step. Recently we developed a new method for constructing a deterministic reduced model of multiscale dynamics where coupling terms were parameterized via the Fluctuation-Dissipation theorem. In this work we further improve this previously developed method for deterministic reduced models of multiscale dynamics by introducing a new method for parameterizing slow-fast interactions through additive stochastic noise in a systematic fashion. For the two-scale Lorenz 96 system with linear coupling, we demonstrate that the new method is able to recover additional features of multiscale dynamics in a stochastically forced reduced model, which the previously developed deterministic method could not reproduce.


Introduction
Multiscale dynamics are common in applications of contemporary science, such as geophysical science and climate change prediction [12,13,19,21,24,28,35]. Multiscale dynamics are typically characterized by the time and space scale separation of patterns of motion, with fewer slowly evolving variables and much larger set of faster evolving variables. This time-space scale separation often made direct numerical computation of the dynamics on the slow time scale quite difficult in realworld applications, which led to the development of multiscale computational methods [15,16]. These methods make use of the averaging formalism [36,41,42] to allow for large time discretization steps for the computation of the slow part of the dynamics. However, in very large systems with many fast variables even these methods are computationally expensive.
As a different alternative to direct numerical simulation of the complete multiscale model with all variables, it has long been recognized that, if a closed simplified model for the slow variables alone was available, one could use this closed slow-variable model instead to simulate the statistics of the slow variables. In order to derive such a reduced model, one usually represents the process as a general two-scale dynamical system of the form where x ∈ R N x and y ∈ R N y are the state vectors, and F and G are nonlinear differentiable functions. The scaling parameter ε ≪ 1 is used to separate the time scales in (1.1) into slow (that is, x), and fast (that is, y). The scale separation parameter ε above is introduced for convenience of presentation, because, as shown below, the method developed here does not require its explicit presence to function. Besides, often real-world processes do not have any distinct time-scale separation parameters, and the fast and slow variables are known empirically from observations. Under the assumption of "infinitely fast" y-variables, one writes the reduced averaged system for slow variables alone as

F(x, y)dµx(y),
where µx is the invariant distribution measure of the uncoupled fast dynamics wherex is a fixed parameter: Under the assumption of ergodic µx, the measure integral in (1.2) can be replaced with the time average along a single trajectory zx(t) of (1.3) for specified parameterx:

F(x, zx(t))dt.
From (1.2)-(1.4), it is clear that the computation of the averaged functionF(x) is not a simple task; in fact, it is rarely available explicitly (except for some special cases where either the invariant distribution measure µx or the solution zx(t) are known as explicit formulas). A good example of the system where the invariant distribution measure is known explicitly is the Ornstein-Uhlenbeck process [40]; the invariant distribution measure of the Ornstein-Uhlenbeck process is a Gaussian distribution with explicitly known mean state and covariance matrix. Because of this convenient property, the Ornstein-Uhlenbeck process is popular in the area of stochastic modeling of geophysical and other real-world processes.
The "upgrade" from the deterministic reduced model in (1.2) is a stochastic model of the form where W t is a Wiener process, and σ(x) is a matrix. The stochastic reduced model of the form (1.5) is generally considered to be more advantageous to the deterministic reduced model in (1.2), because the random noise can be used to parameterize "noise-like" influence from fast variables onto slow dynamics, which is entirely lacking in the deterministic reduced model in (1.2). Sometimes, deterministic reduced models of atmospheric processes fail to adequately represent atmospheric variability; it is thought that a significant portion of atmospheric variability is essentially noise-like, and fitting it with deterministic forcing terms does not seem to be a good way to produce an adequate approximation. Additionally, the deterministic slow dynamics in (1.2) tend to produce the attractor of the slow dynamics with significantly lower dimension than that for the full two-scale dynamics in (1.1), especially for weakly chaotic slow variables. At the same time, if the stochastic forcing in (1.5) is strong enough, it should "diffuse" the invariant manifolds of (1.2), thus inflating the dimension of the set containing limiting dynamics. Nonetheless, in many applications, the averaged dynamics of the deterministic type (1.2) or stochastic type (1.5) are not available explicitly. As a result, numerous approximate closure schemes were developed for multiscale dynamical systems [14,16,[30][31][32][33], which are all based on the averaging principle over the fast variables [36,41,42]. Some of the methods (such as those in [30][31][32][33]) replace the fast nonlinear dynamics with suitable stochastic processes [44], discontinuous Markov jump processes [22], or conditional Markov chains [14], while others [16] provide direct closure by suitable tabulation and curve fitting. Reduced stochastic dynamics were used to model global circulation patterns [11,18,34,43,45], and large-scale features of tropical convection [23,29]. However, it seems that all these approaches require either extensive computations to produce a closed model (for example, [14,16] require multiple simulations of fast variables alone with different fixed states of slow variables), or somewhat ad hoc determination of closure coefficients by matching areas under the time correlation functions [30][31][32][33]. Another interesting method was recently developed in [10], however, again, somewhat ad hoc approach was used to compute the closure (namely, slow variables of a multiscale system were treated as if their dynamics were generated by a running average of an Ornstein-Uhlenbeck process).
In the two recent works [4,6] the author developed a relatively simple and straightforward method of constructing the deterministic reduced model for slow variables of a multiscale model with nonlinear and multiplicative coupling, which required only a single computation of certain statistics of the fast dynamics with a fixed state of the slow variables, located in the region where the slow dynamics usually evolve. The method was based on the first-order Taylor expansion of the averaged coupling term with respect to the slow variables, which was computed using the Fluctuation-Dissipation theorem [1-3, 7-9, 27, 38]. It was demonstrated through the computations with the appropriately rescaled two-scale Lorenz 96 model [25,26] that, with nonlinear and multiplicative coupling in both slow and fast variables, the developed reduced model produced good approximation to the statistics of the full two-scale Lorenz 96 model. Among the advantages of the developed method were its simplicity and explicit formulation. Additionally, existing zero-order models of this kind for the Earth's atmosphere (such as the T21 barotropic model [9,17,39]) can be retrofitted with the new deterministic correction term emerging from the theory in [4].
In the current work, we introduce a method for computing a consistent approximation of the form (1.5) to the multiscale dynamics in (1.1). The method is aimed at stochastic parameterization of general complex nonlinear multiscale dynamics with many variables, and has essentially the same implementation restrictions as the method for deterministic reduced models we developed previously in [4,6]. We test the new method on the two-scale Lorenz 96 model, where only the linear part of the coupling is enabled for the simplicity of presentation. We demonstrate through direct numerical simulations that the new method generally improves properties of statistics of the reduced model, and, in particular, the injection of random noise into a deterministic reduced model can make it both more or less chaotic and mixing, depending on the difference between the dynamical regimes of the reduced and full two-scale models.
The manuscript is organized as follows. In Section 2 we present the general description of the new method, following the homogenization theory of [37]. In Section 3 we lay out the step-by-step computational implementation of the new method for a general two-scale dynamical system with linear coupling between the slow and fast variables, which does not contain any explicit time-scale separation parameters. In Section 4 we test the new method on the two-scale Lorenz 96 model in a range of dynamical regimes with varying chaos, mixing, and time scale separation between the slow and fast variables. Section 5 summarizes the results of this work.

General description of the method
Here we introduce a new method for the stochastic correction of the form (1.5) for the deterministic reduced model (1.2). To derive the method, we use the theoretical framework similar to that applied to homogenization problems in [37]. To simplify presentation, we assume that the deterministic averaged functionF(x) from (1.2) is already available, either as an explicit formula, or as an approximation we developed previously in [4,6]. Below, x is used to denote the state of the slow variables from the multiscale system (1.1), whilex denotes the state of the slow variables from the deterministic reduced model (1.2). The difference between x andx is denoted as q = x −x. Then, one can rewrite the multiscale system in (1.1) in the new variables as What we see above is that the first equation is already a closed system from (1.2), and given a suitable approximation forF from [4,6], it can be solved on its own. Thus,x(t) can be treated as a given function of time. This, in effect, leaves q and y as the unknown variables, and the first equation in (2.1) can be dropped. Now, the idea is to apply the averaging formalism to q, obtainingq as the next order correction tox. However, the straightforward application of what was done in (1.2) to (2.1) leads toq being identically zero for all times as long as its starting value is zero. In this situation, the reduced model forq should be derived as the Itô diffusion process of the corresponding backward Kolmogorov equation restricted to slow time scale (see [37] for details), under the condition that q (and, therefore, dq/dt) in (2.1) is O(ε). To do that, first we bring the time scale of q in (2.1) to O(1) by rescaling the time t as τ = εt. For the rescaled time τ, from (2.1) we obtain (2.2b) dy dτ = 1 ε 2 G(x + q, y). Above, the evolution equation forx is no longer needed, asx(τ) is a given function of rescaled time τ. Now we write the backward Kolmogorov equation for (2.2). For that, let v(τ, τ ′ ,x, q, y), τ ′ ≥ τ, be the value of a test function h(q(τ ′ ), y(τ ′ )), given q(τ) = q, and y(τ) = y. Then, v(τ, τ ′ ,x, q, y) obeys the following backward Kolmogorov equation (for reference, see, for example, [20]): Below, we drop the τ ′ -dependence from v as it is of no consequence to what is presented. Observe that the terms in the backward Kolmogorov equation above are multiplied by different powers of ε, which leads to the perturbation expansion of the solution v(τ,x, q, y) in powers of ε, and derivation of the backward Kolmogorov equation for the term which has the lowest power of ε in the closed form. Then, its corresponding Itô diffusion process will be the evolution equation for dq/dt, as long asq is small enough. The expansion of v(τ,x, q, y) in powers of ε is Plugging the expansion above back into the Kolmogorov equation in (2.3) and collecting the terms with matching powers of ε, we obtain the following relations for each power of ε: Below we consider each of the relations above separately.
Here we use the relation in (2.5b) to express v 1 in terms of v 0 .
We denote the flow, generated by (1.3), by φ sx , so that φ sx y is the solution of (1.3) forward in time s with the initial condition y, with the obvious identity

Now, consider the integral
where the group property of φ sx is used in the second equality. Then, from the first identity in (2.7), it follows that and from the second identity in (2.7) it follows that u(s,x, q, y) is in fact an explicit function of φ sx +q y, that is, u(s,x, q, y) ≡ U(x, q, φ sx +q y). However, any U(x, q, φ sx +q y) must obey the transport equation where the second identity is due to (2.6), and which holds for any s including s = 0. Combining (2.8) and (2.9) at s = 0, we obtain From the comparison with (2.5b) it follows that v 1 = u(0,x, q, y), that is, • Order ε 0 . Here observe that v 0 does not depend on y, as pointed out above. This means that the average of v 0 with respect to the invariant distribution measure µx +q of (1.3) is the identity operation, and the same holds for its τ-derivative. Then, averaging out (2.5c) with respect to µx +q yields, For the first integral above we express the gradient term as a time derivative of the function along the flow (in the same way as above for v 1 ): The invariant distribution measure µx +q preserves the averages of functions of φ sx +q y: (2.14) This, in turn, results in and, therefore, At this point, substituting the expression for v 1 from (2.11), and rescaling time τ back to t yields where ":" denotes the Frobenius product of two matrices, and " Sym" denotes the symmetric part of a matrix (the skew-symmetric part is canceled out by the Frobenius product with a symmetric matrix). As a result, the next-order correctionq to the averaged dynamics in (1.2), generated by the backward Kolmogorov equation above, is given by the Itô diffusion process as long asq is small enough. Above, W t is a K-dimensional Wiener process for some positive integer K, Q(x +q,x) is the O(ε) N y -vector drift term, and σ(x +q,x) is the O( √ ε) N y × K stochastic diffusion matrix, given (non-uniquely) by (2.19) σσ T = S. Thus far, in (2.18) we obtained the equations for two sets of variables,x andq, while the evolution of the stochastic reduced dynamics for (1.1) is given by the sum (x +q). However, what we actually need is an approximate reduced model for (x +q) directly. For that, we add the two equations in (2.18) to obtain (2.20) Above, observe that the first term in the right-hand side is O(1), the second term is O( √ ε), and the rest of the terms are O(ε) or higher (taking into account thatq is O(ε)). Additionally, the terms in the second and third line of (2.20) are very difficult to compute in practice for complex nonlinear dynamics. Therefore, we delete the terms in the second and third lines from (2.20), thus completely decoupling (2.20) from (2.18). Then, after replacing (x +q) →x, we obtain the stochastic reduced model in (1.5), where the diffusion matrix σ is computed according to (2.17c) and (2.19) with a = b =x (further we denote S(x) ≡ S(x,x)).
Remark: σ(x) does not depend on ε. It is important to note that S(x), and, therefore, σ(x), do not in fact depend on ε. Indeed, observe that where φ s/ε x y is the solution of the fast dynamics from (1.1) (with ε −1 still in front of G(x, y)) with x =x fixed as a constant parameter. Thus, there is no need to know what ε is to compute S(x), and, in fact, ε does not have to be an explicit scaling parameter in the multiscale dynamics in (1.1). This makes the new method practical in realistic applications, where time-scale difference might not be explicitly available in the form of a parameter.
3. Practical implementation of the reduced stochastic model for a general multiscale process with linear coupling As formulated above in Section 2, the new method is applicable for a broad range of dynamical systems with general forms of coupling. However, for the clarity of presentation, we dedicate a separate section to the implementation of the developed method for a multiscale process with linear coupling. The linear coupling is the most basic form of coupling in physical processes, however, because of that it is also probably the most common form of coupling. Below we describe the complete step-by-step assembly of the reduced model, with both the deterministic and stochastic terms which parameterize coupling.
Here we consider the special setting of (1.1) with linear coupling between x and y: where f and g are nonlinear vector functions of x and y, respectively, and L x and L y are constant matrices of suitable sizes. As before, x and y denote the slow and fast variables, respectively. However, one important distinction between (3.1) and (1.1) is that we no longer make use of the time scale separation parameter ε; here the assumption is that it is somehow known that x-variables are slow, and y-variables are fast, but no further information about time scale separation is available beyond that, and, in particular, no explicit time scale separation parameter is known. As before, we introduce the corresponding fast limiting system with x specified as a constant parameter. It is easy to see that for the multiscale system with linear coupling in (3.1), the matrix S(x) from (2.17c) is computed as the time-lag correlation matrix where φ sz is the solution of (3.2), with x fixed at constant parameterx, forward in time s from an initial condition y, andz(x) is the mean state of (3.2). Before constructing the reduced model, we need realistic assumptions on what we can compute in the multiscale system in (3.1), which we take directly from [4,6]: • First, we are going to presume that the multiscale dynamical system in (3.1) is not necessarily computable at will for arbitrarily long time intervals. The reason is that if it is possible, then the need for a reduced model becomes somewhat difficult to justify. • Even if the full multiscale model is not computable at will, we still need some statistical information about it to formulate the reduced model. Here, we presume that some typical state x * of the slow variables x is available, such that the dynamics evolve in the proximity of x * . For example, a rough estimate of the mean state of the slow variables of the full multiscale system can be taken as x * , or a nearby state. • We presume that the limiting fast dynamics in (3.2) is computable beyond the mixing time scale, so that time averages of (3.2) can be computed, at least for a single given value x = x * . • The assumptions above are the same as in [4,6], in order to ensure compatibility of the proposed stochastic reduced model method with what we have already developed in [4,6] for deterministic reduced models of multiscale dynamics with nonlinear and multiplicative coupling. Under the above assumptions and the ergodicity hypothesis, we compute the mean state z and the time covariance matrix C(τ) from the long-term time series of the solution z(t) of (3.2) with fixed parameter x = x * : Then, we assemble the reduced stochastic model for (3.1) in the explicit form as where the constant terms R and σ are computed as Above, the deterministic part in (3.5) is computed as described in [4] using the quasi-Gaussian linear response approximation [27], while the stochastic part is computed according to (3.3) (where the measure average is replaced with the time average) withx = x * . Observe that the matrix σ will be computed only once, for the particular value x * . That, in effect, makes it a constant matrix approximation to the exact σ(x) from (2.19), provided that the trajectoryx(t) of the reduced model in (3.5) is in the vicinity of x * . This type of stochastic parameterization with a constant diffusion matrix is called the additive noise parameterization (as opposed to the multiplicative noise parameterization, where σ is a function of x). In the future work, we plan to extend the method described here onto the multiplicative noise parameterization.
The choice of σ. Observe that σ is not determined uniquely by (3.6b), as multiple decompositions of S into σσ T are available (the Cholesky decomposition into the product of a lower-triangular matrix with its own transpose being one of the examples). Here we use the following algorithm to compute σ: first, solve the eigenvalue problem where X is the matrix of eigenvectors, and Λ is the diagonal matrix of eigenvalues. Since S is symmetric and nonnegative-definite [37], then all eigenvalues in Λ are guaranteed to be positive, and all eigenvectors in X are guaranteed to be orthonormal (so that X −1 = X T ). Then, we compute σ as (3.8) σ = XΛ 1/2 X T . This method uniquely determines σ as a square, symmetric, and positive-definite diffusion matrix for the Wiener process W t .

Computational study: the two-scale Lorenz 96 model with linear coupling
The test multiscale system used to study the new method of stochastic parameterization here is the rescaled two-scale Lorenz 96 system with linear coupling, previously used in [4] to study the deterministic reduced model parameterization. The Lorenz 96 system is given by The model has periodic boundary conditions: x i+N x = x i , y i,j+J = y i+1,j , and y i+N x ,j = y i,j . The parameter ε ≪ 1 sets the time scale separation between the slow variables x i , and the fast variables y i,j . The parameters (x, β x ) and (ȳ, β y ) are the (mean, standard deviation) pairs for the corresponding uncoupled and unrescaled Lorenz models with the same periodic boundary conditions. The rescaling above ensures that the Lorenz 96 model in (4.1) has zero mean state and unit standard deviation for both slow and fast variables in the absence of coupling (λ x = λ y = 0), and remain near these values when λ x and λ y are nonzero. The linear coupling above preserves the energy of the form Below we display the results of a numerical study of the stochastic reduced model for slow variables of multiscale dynamics, using the rescaled Lorenz system in (4.1) as the test model. We compare the statistical properties of the slow variables for the four following systems: (1) The complete rescaled Lorenz system from (4.1); (2) The stochastic reduced model from (3.5); (3) The deterministic reduced model obtained from (3.5) by removing the stochastic forcing (this model was previously developed in [4]); (4) The poor man's version of (3.5) with no stochastic forcing and the firstorder linear deterministic correction term R set to zero (further referred to as the "zero-order" system, same as in [4]). This zero-order system represents the simplest reduced model with constant parameterization of coupling terms. The fixed parameter x * for the computation of z , R and σ was set to the longterm mean state x of the full multiscale model in (4.1) (in computationally expensive models, a rough estimate could be used). Like in [4,6], for all computational statistical results presented below, the averaging time window T av equals 10000 time units.
Due to translational invariance of the studied models, the statistics are invariant with respect to the index shift for the variables x i . For diagnostics, we monitor the following long-term statistical quantities of x i : a. The distribution density functions, computed by bin-counting. A distribution density function gives the best information about the one-point statistics of x i , as it shows the statistical distribution of x i in the phase space. b. The time auto-correlation functions x i (t)x i (t + s) , where the time average is over t, normalized by the variance x 2 i (so that it always starts with 1). c. The time cross-correlation functions x i (t)x i+1 (t + s) , also normalized by the variance x 2 i . d. The energy auto-correlation function This energy auto-correlation function measures the non-Gaussianity of the process (it is identically 1 for all s if the process is Gaussian, such as the Ornstein-Uhlenbeck process). For details, see [32].
The following dynamical regimes are studied: • N x = 20, J = 4 (so that N y = 80). Thus, the number of the fast variables is four times greater than the number of the slow variables. • ε = 0.01, 0.1. The time scale separation of two orders of magnitude (ε = 0.01) is consistent with typical real-world geophysical processes (for example, the annual and diurnal cycles of the Earth's atmosphere). Also, in large-scale atmospheric processes the time scale separation between slow and fast variables can be weaker than that (for example, typical time scale of equatorial Kelvin waves is about 70 days, and that of Yanai waves is about 30 days, which is well in between the annual and diurnal cycles), so we additionally test the dynamical regimes with weak time scale separation ε = 0.1. • λ x = λ y = 0.3, 0.35. These values of coupling are chosen so that they are neither too weak, nor too strong (although 0.3 is weaker, and 0.35 is stronger). However, this small variation in coupling changes the dynamical regime in the slow variables between moderately (ε = 0.3) and weakly (ε = 0.35) chaotic and mixing, due to the suppression of chaos effect previously studied in [5]. • F x = 6. The slow forcing F x adjusts the chaos and mixing properties of the slow variables, and in this work it is set to a weak-to-moderate chaotic regime F x = 6. The reason for that is that it was found previously in [4] that in strongly chaotic and mixing regimes at slow variables there is not much of a difference between the multiscale dynamics and reduced models.  Table 4.1. Relative errors between the slow variables of the full multiscale system, and different reduced models, computed for the plots in  [4], and the zero-order reduced model with constant parameterization of coupling terms (which is given for reference as a simplest/poorest form of parameterization). Observe that the difference between the new stochastic and deterministic models here are rather small, however, it does look like the additional stochastic term improves the statistics. In particular, the stochastic terms makes the distribution density more spiky towards the multiscale density, and appears to produce better fits for correlation functions. The probable reason why there is not much of a difference between the deterministic and stochastic reduced models is that the slow dynamics are not sufficiently weakly chaotic for the stochastic term to make a significant difference. Table 4.1 confirms that there is some improvement errorwise in the distribution density and energy auto-correlation, but no improvement in auto-and cross-correlations (probably due to oscillations getting out of sync with increased lag time).

Moderate mixing at slow variables with strong time scale separation.
Here we present the comparison of different statistics for the dynamical regime with moderate chaos and mixing at slow variables (achieved by setting λ x = λ y = 0.3) and strong time scale separation (achieved by setting ε = 0.01). In Figure 4.2 we show the distribution density functions, time auto-correlation functions, time cross-correlation functions, and energy auto-correlation functions for the multiscale dynamics in (4.1), and three different kinds of the reduced models: the new stochastic reduced model, the deterministic reduced model from [4], and the zeroorder reduced model with constant parameterization of coupling terms. Observe Density function x N x =20, N y =80, λ x =0.3, λ y =0.3, F x =6, F y =16, ε=0.
that the difference between the new stochastic and deterministic models here are even smaller than for the regime with weak time-scale separation, probably due to the fact that the contribution of the stochastic term scales as √ ε. However, again, it looks like the additional stochastic term improves the statistics somewhat. Table  4.2 show that there is some improvement error-wise in all presented statistics, but not enough to be discernible visually in Figure 4 Density function x N x =20, N y =80, λ x =0.3, λ y =0.  9.309 · 10 −2 9.627 · 10 −2 0.1923 Cross-corr. 9.57 · 10 −2 9.99 · 10 −2 0.2129 Energy corr. 9.042 · 10 −3 1.209 · 10 −2 2.776 · 10 −2 Table 4.2. Relative errors between the slow variables of the full multiscale system, and different reduced models, computed for the plots in Density function x N x =20, N y =80, λ x =0. 35 Table 4.4. Relative errors between the slow variables of the full multiscale system, and different reduced models, computed for the plots in Figure 4 time scale separation (achieved by setting ε = 0.1). In Figure 4.3 we show the distribution density functions, time auto-correlation functions, time cross-correlation functions, and energy auto-correlation functions for the multiscale dynamics in (4.1), and three different kinds of the reduced models: the new stochastic reduced model, the deterministic reduced model from [4], and the zero-order reduced model with constant parameterization of coupling terms. Here we can see a significant improvement between the deterministic reduced model from [4] and the new stochastic reduced model. In particular, what apparently happens here is that the deterministic reduced model from [4] turns out to be less chaotic and mixing than the original multiscale dynamics at slow variables (observe that the distribution density is very spiky for the deterministic reduced model, while the time auto-and cross-correlation functions are less mixing, and the energy autocorrelation function is more sub-Gaussian than those for the multiscale dynamics). Then, the introduction of the stochastic term results in improvement of mixing and sub-Gaussianity, and also smoothens out the spikes on the distribution density, resulting in better approximation of the multiscale dynamics. Table 4.3 confirms that there is improvement of statistics error-wise with the introduction of the stochastic term in the new reduced model. 4.4. Weak mixing at slow variables with strong time scale separation. Here we present the comparison of different statistics for the dynamical regime with weak chaos and mixing at slow variables (achieved by setting λ x = λ y = 0.35) and strong time scale separation (achieved by setting ε = 0.01). In Figure 4.4 we show the distribution density functions, time auto-correlation functions, time cross-correlation functions, and energy auto-correlation functions for the multiscale dynamics in (4.1), and three different kinds of the reduced models: the new stochastic reduced model, the deterministic reduced model from [4], and the zeroorder reduced model with constant parameterization of coupling terms. Here, again, we can see a significant improvement between the deterministic reduced model from [4] and the new stochastic reduced model, however, in a somewhat reverse way if compared to the previous set-up with weak time scale separation. Density function x N x =20, N y =80, λ x =0. 35  Here observe that the deterministic reduced model is more chaotic and mixing than the slow variables of the multiscale dynamics, and the stochastic term makes the new model less chaotic and mixing to better match the multiscale dynamics. While at first seeming counter-intuitive, this effect can happen due to the stochastic noise pushing the solution off the unstable manifold of the deterministic system (where the chaotic and mixing motion occurs) into the dissipative absorbing region surrounding the system's attractor while not having enough strength to compensate for the lack of chaos and mixing with its own random forcing. In fact, the same (but somewhat weaker) effect can also be observed in Figures 4.1 and 4.2 for weaker coupling and stronger chaos and mixing, so one can presume that it should be generally common in stochastic reduced models, especially if the time scale separation is strong and, because of that, the stochastic noise is weak enough to produce its own mixing. Table 4.4 confirms that there is some improvement of statistics error-wise with the introduction of the stochastic term in the new reduced model.

Conclusions
In the current work we improve the recently developed method [4,6] for deterministic reduced models of multiscale dynamics with the higher-order additive stochastic term which parameterizes the slow-fast interactions with random noise. The method is based on the homogenization techniques for multiscale systems [37] and offers a practical way of creating stochastic reduced models for a broad range of general multiscale processes. As demonstrated above in Section 3, the stochastic term upgrade comes at no additional computational cost for systems with linear coupling between slow and fast variables, as it uses the same time correlation matrix of the fast variables for a fixed state of the slow variables as the response term in the deterministic part of coupling parameterization. Another practical advantage of the new method is that it does not require an explicit time-scale separation parameter between the slow and fast variables of multiscale dynamics. We tested the new method numerically using the two-scale Lorenz 96 model with linear coupling between the slow and fast variables in a range of dynamical regimes with weak/strong time-scale separation, chaos and mixing at the slow variables. The new stochastic reduced model consistently improved the results of the previously developed deterministic approach [4,6] in the two different dynamical regimes: (1) In the situation where the deterministic reduced model was less chaotic and weaker mixing than the slow variables of the full multiscale dynamics, the stochastic model was more chaotic and stronger mixing, due to stochastic forcing smoothening out spikes in distribution density and introducing random decorrelation in time auto-and cross-correlation functions. (2) In the situation where the deterministic reduced model was more chaotic and stronger mixing than the slow variables of the full multiscale dynamics, the stochastic model suppressed chaos and mixing in the reduced dynamics. While this effect seems somewhat counter-intuitive, it can be explained by the stochastic noise pushing the solution off the unstable manifold of the deterministic system (where the chaotic and mixing motion occurs) into the dissipative region around the system's attractor, while not having enough random force to increase chaos and mixing on its own. In the future work, we plan to extend the new method of stochastic parameterization of reduced slow dynamics onto multiplicative stochastic forcing parameterization. Observe that the additive stochastic forcing parameterization in the current work emerges from the constant diffusion matrix approximation above in Section 3. However, if the constant diffusion matrix approximation is improved by including its higher order Taylor expansion terms (as was done to the deterministic coupling parameterization in [4,6]), the inclusion of the higher-order terms will result in the multiplicative coupling with the diffusion matrix of the reduced model being a function of the slow variables. For the multiplicative diffusion matrix approximation, we plan to use an approach similar to that in [4,6].