Article

Data-Driven Model Reduction for Stochastic Burgers Equations

Department of Mathematics, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD 21218, USA
Entropy 2020, 22(12), 1360; https://doi.org/10.3390/e22121360
Submission received: 1 October 2020 / Revised: 29 November 2020 / Accepted: 30 November 2020 / Published: 30 November 2020

Abstract

We present a class of efficient parametric closure models for 1D stochastic Burgers equations. Casting it as statistical learning of the flow map, we derive the parametric form by representing the unresolved high wavenumber Fourier modes as functionals of the resolved variable’s trajectory. The reduced models are nonlinear autoregression (NAR) time series models, with coefficients estimated from data by least squares. The NAR models can accurately reproduce the energy spectrum, the invariant densities, and the autocorrelations. Taking advantage of the simplicity of the NAR models, we investigate maximal space-time reduction. Reduction in space dimension is unlimited, and NAR models with two Fourier modes can perform well. The NAR model’s stability limits time reduction, with a maximal time step smaller than that of the K-mode Galerkin system. We report a potential criterion for optimal space-time reduction: the NAR models achieve minimal relative error in the energy spectrum at the time step where the K-mode Galerkin system’s mean Courant–Friedrichs–Lewy (CFL) number agrees with that of the full model.

1. Introduction

Closure modeling aims for computationally efficient reduced models for tasks requiring repeated simulations, such as Bayesian uncertainty quantification [1,2] and data assimilation [3,4]. Consisting of low-dimensional resolved variables, the closure model must take into account the non-negligible effects of the unresolved variables so as to capture both the short-time dynamics and the large-time statistics. As suggested by the Mori–Zwanzig formalism [5,6,7], a trajectory-wise approximation is no longer appropriate; the approximation is in a statistical sense. That is, the reduced model aims to generate a process that approximates the target process in distribution or, at least, reproduces the key statistics and dynamics of the quantities of interest. For general nonlinear systems, such a reduced closure model is out of reach of direct derivation from first principles.
Data-driven approaches, which are based on statistical learning methods, provide useful and practical tools for model reduction. The past decades have witnessed revolutionary developments of data-driven strategies, ranging from parametric models (see, e.g., [8,9,10,11,12,13,14] and the references therein) to nonparametric and machine learning methods (see, e.g., [15,16,17,18]). These developments demand a systematic understanding of model reduction from the perspectives of dynamical systems (see, e.g., [7,19,20]), numerical approximation [21,22], and statistical learning [17,23].
With the 1D stochastic Burgers equation as a prototype model, we aim to further the understanding of model reduction from an interpretable statistical inference perspective. More specifically, we consider a stochastic Burgers equation with a periodic solution on $[0, 2\pi]$:
$$u_t = \nu u_{xx} - u u_x + f(x,t), \quad 0 < x < 2\pi,\; t > 0, \qquad u(0,t) = u(2\pi,t), \quad u_x(0,t) = u_x(2\pi,t), \tag{1}$$
from an initial condition $u(\cdot, 0)$. We consider a stochastic force $f(x,t)$ that is smooth in space, residing on the $K_0$ low wavenumber Fourier modes, and white in time, given by
$$f(x,t) = \sigma \sum_{m=1}^{K_0} \big[ \sin(mx)\, \dot W_m(t) + \cos(mx)\, \dot W_{-m}(t) \big], \tag{2}$$
where $\{W_m, W_{-m}\}_{m=1}^{K_0}$ are independent Brownian motions. Here, $\nu > 0$ is the viscosity constant and $\sigma > 0$ represents the strength of the stochastic force.
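As an illustration, one increment of this force over a single time step can be sampled on a spatial grid in a few lines. The following sketch is ours; the function name, the grid, and the default arguments are illustrative, not from the original implementation.

```python
import numpy as np

def force_increment(x, K0=4, sigma=1.0, dt=1e-3, rng=np.random.default_rng()):
    """Sample f(x, t) dt over one step:
    sigma * sum_{m=1}^{K0} [sin(m x) dW_m + cos(m x) dW_{-m}],
    with dW_m, dW_{-m} ~ N(0, dt), independent across m and across steps."""
    dW = rng.normal(0.0, np.sqrt(dt), size=(K0, 2))  # increments of W_m and W_{-m}
    m = np.arange(1, K0 + 1)[:, None]                # wavenumbers m = 1, ..., K0
    return sigma * (np.sin(m * x) * dW[:, :1] + np.cos(m * x) * dW[:, 1:]).sum(axis=0)

# Example: one force increment on a 256-point grid over [0, 2*pi).
x = np.arange(256) * (2 * np.pi / 256)
df = force_increment(x)  # array of shape (256,)
```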
Our goal is to find a discrete-time closure model for the first K Fourier modes, so as to efficiently reproduce the energy spectrum and other statistics of these modes.
We present a class of efficient parametric reduced closure models for 1D stochastic Burgers equations. The key idea is to approximate the discrete-in-time flow map statistically, in particular, to represent the unresolved high wavenumber Fourier modes as functionals of the resolved variable’s trajectory. The reduced models are nonlinear autoregression (NAR) time series models, with coefficients estimated from data simply by least squares. We test the NAR models in four settings: reduction of deterministic responses ( K > K 0 ) vs. reduction involving unresolved stochastic force ( K < K 0 ), and small vs. large scales of stochastic force (with σ = 0.2 and σ = 1 ), where K 0 is the number of Fourier modes of the white-in-time stochastic force and σ is the scale of the force. In all these settings, the NAR models can accurately reproduce the energy spectrum, invariant densities, and autocorrelation functions (ACF). We also discuss model selection, consistency of estimators, and memory length of the reduced models.
Taking advantage of our NAR models’ simplicity, we further investigate a critical issue in the model reduction of (stochastic) partial differential equations: maximal space-time reduction. The space dimension can be reduced arbitrarily in our parametric inference approach: NAR models with two Fourier modes perform well. The time reduction is another story. The maximal time step is limited by the NAR model’s stability and is smaller than that of the K-mode Galerkin system. Numerical tests indicate that the NAR models achieve the minimal relative error at the time step where the K-mode Galerkin system’s mean CFL (Courant–Friedrichs–Lewy) number agrees with the full model’s, suggesting a potential criterion for optimal space-time reduction.
One can readily extend our parametric closure modeling strategy to general nonlinear dissipative systems beyond quadratic nonlinearities. Along with [14], we may view it as a parametric inference extension of the nonlinear Galerkin methods [24,25,26,27]. However, it does not require the existence of an inertial manifold (and the stochastic Burgers equation does not satisfy the spectral gap condition that is sufficient for the existence of an inertial manifold [28]), and it applies to resolved variables of any dimension (e.g., lower than the dimension of the inertial manifold if it exists [14]). Notably, one may use NAR models that are linear in parameters and estimate them by least squares. Therefore, the algorithm is computationally efficient and is scalable for large systems.
The limitation of the parametric modeling approach is its reliance on the derivation of a parametric form using the Picard iteration, which depends on the nonlinearity of the unresolved variables (see Section 3.1). When the nonlinearity is complicated, a linear-in-parameter ansatz may be out of reach. One can overcome this limitation by nonparametric techniques [23,29] and machine learning methods (see, e.g., [16,17,30]).
The stochastic Burgers equation is a prototype model for developing closure modeling techniques for turbulence (see e.g., [31,32,33,34,35,36,37]). In particular, Dolaptchiev et al. [37] propose a closure model for the stochastic Burgers equation in a similar setting, based on local averages of a finite-difference discretization, which reproduces an accurate energy spectrum, similar to this study. We directly construct a simple yet effective NAR model for the Fourier modes, providing the basis for a statistical inference examination of model reduction.
We note that the closure reduced models based on parametric inference are different from the widely studied proper orthogonal decomposition (POD)-based reduced order models (ROMs) for parametric full models [38,39]. These POD-ROMs seek new effective bases to capture the effective dynamics by a linear system for the whole family of parametric full models. The inference-based closure models focus on nonlinear dynamics in a given basis and aim to capture both short-time dynamics and large-time statistics. From a probabilistic perspective, both approaches approximate the target stochastic process: the POD-ROMs are based on the Karhunen–Loève expansion, while the inference-based closure models aim to learn the nonlinear flow map. One may potentially combine the two and find nonlinear closure models for the nonlinear dynamics in the POD basis.
The exposition of our study proceeds as follows. We first summarize the notations in Table 1. Following a brief review of the basic properties of the stochastic Burgers equation and its numerical integration, we introduce in Section 2 the inference approach to closure modeling and compare it with the nonlinear Galerkin methods. Section 3 presents the inference of NAR models: derivation of the parametric form, parameter estimation, and model selection. Examining the NAR models’ performance in four settings in Section 4, we investigate the space-time reduction. Section 5 summarizes our main findings and outlines possible future research.

2. Space-Time Reduction for the Stochastic Burgers Equation

In this section, we first review basic properties of the stochastic Burgers equation and its numerical integration. Then, we introduce inference-based model reduction and compare it with the nonlinear Galerkin methods.

2.1. The Stochastic Burgers Equation

A Fourier transform of Equation (1) leads to
$$\frac{d}{dt}\hat u_k = -\nu q_k^2 \hat u_k - \frac{i q_k}{2} \sum_{l=-\infty}^{\infty} \hat u_l \hat u_{k-l} + \hat f_k(t), \tag{3}$$
with $q_k = k$, $k \in \mathbb{Z}$, where the $\hat u_k$ are the Fourier modes:
$$\hat u_k(t) = \mathcal{F}[u]_k = \frac{1}{2\pi}\int_0^{2\pi} u(x,t)\, e^{-i q_k x}\, dx, \qquad u(x,t) = \mathcal{F}^{-1}[\hat u] = \sum_k \hat u_k(t)\, e^{i q_k x}.$$
The system has the following properties. First, it is Galilean invariant: if $u(x,t)$ is a solution, then $u(x-ct, t) + c$, with $c$ an arbitrary constant speed, is also a solution. To see this, let $v(x,t) = u(x-ct, t) + c$. Then $v_t = -c u_x + u_t$, $v_x = u_x$, and
$$v_t = -c v_x + \nu u_{xx} - u u_x + f = -c v_x + \nu v_{xx} - (v - c) v_x + f = \nu v_{xx} - v v_x + f.$$
Without loss of generality, we set $\int_0^{2\pi} u(x,0)\,dx = 0$, which implies that $\hat u_0(0) = 0$. In this study, we only consider forces with mean zero, i.e., $\int_0^{2\pi} f(x,t)\,dx = 0$; therefore, from Equation (3), we see that $\hat u_0(t) \equiv 0$ or, equivalently, $\int_0^{2\pi} u(x,t)\,dx \equiv 0$. Second, the system has an invariant measure [31,40,41], due to a balance between the diffusion term, which dissipates energy, and the stochastic force, which injects energy. In particular, the initial condition does not affect the large-time statistical properties of the solution. Third, since $u$ is real, the Fourier modes satisfy $\hat u_{-k} = \hat u_k^*$, where $\hat u_k^*$ is the complex conjugate of $\hat u_k$.

2.2. Galerkin Spectral Method

We consider the Galerkin spectral method for numerical solutions of the Burgers equation. The system is approximated as follows: the function $u(x,t)$ is represented at the grid points $x_i = i \Delta x$ with $i = 0, \ldots, 2N-1$ and $\Delta x = \frac{2\pi}{2N}$. The Fourier transform $\mathcal{F}$ is replaced by the discrete Fourier transform
$$\hat u_k(t) = \mathcal{F}_{2N}[u]_k = \sum_{i=0}^{2N-1} u(x_i, t)\, e^{-i q_k x_i}, \qquad u(x_i, t) = \mathcal{F}_{2N}^{-1}[\hat u]_i = \frac{1}{2N} \sum_{k=-N+1}^{N} \hat u_k\, e^{i q_k x_i}.$$
For simplicity of notation, we abuse the notation $u(x_i, t)$ so that it denotes either the true solution or its high-resolution $2N$-mode approximation. Since $u$ is real, we have $\hat u_{-k} = \hat u_k^*$. Noticing further that $\hat u_0 = 0$ due to Galilean invariance, and setting $\hat u_N = 0$, we obtain the truncated system
$$\frac{d}{dt}\hat u_k = -\nu q_k^2 \hat u_k - \frac{i q_k}{2} \sum_{|k-l| \le N,\, |l| \le N} \hat u_l \hat u_{k-l} + \hat f_k, \quad \text{with } |k| = 1, \ldots, N. \tag{4}$$
We solve Equation (4) using the exponential time differencing fourth-order Runge–Kutta method (ETDRK4) (see [42,43]) with standard 3/2 zero-padding for dealiasing (see e.g., [44]), with the force term $\hat f_k$ treated as a constant in each time step. Such a mixed scheme is of strong order 1, but it has the advantage of preserving both the numerical stability of ETDRK4 and the simplicity of Euler–Maruyama.
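As a rough illustration of this splitting (exact propagation of the linear term, explicit quadratic term, force held constant over the step), here is a first-order integrating-factor (Lawson) Euler step for the $2N$-mode system (4). This is a simplified stand-in for the ETDRK4 integrator used in the paper, and it omits the 3/2 dealiasing for brevity.

```python
import numpy as np

def step_exp_euler(u_hat, f_hat_dt, nu=0.02, dt=1e-3):
    """One integrating-factor Euler step for system (4): the linear term
    -nu k^2 u_k is integrated exactly, the quadratic term is explicit,
    and the force increment f_hat_dt is frozen over the step.
    u_hat: complex array of the 2N Fourier modes, in numpy.fft ordering."""
    two_N = u_hat.size
    k = np.fft.fftfreq(two_N, d=1.0 / two_N)     # integer wavenumbers
    u = np.fft.ifft(u_hat).real                  # back to the grid
    nonlin = -0.5j * k * np.fft.fft(u * u)       # modes of -u u_x = -(u^2)_x / 2
    return np.exp(-nu * k**2 * dt) * (u_hat + dt * nonlin + f_hat_dt)
```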
We consider a relatively small viscosity $\nu = 0.02$, so that random shocks are about to emerge in the solution. In general, a smaller viscosity constant demands a higher resolution in space-time to resolve the solution, particularly the shocks that emerge as $\nu$ vanishes. To sufficiently resolve the solution, we set $N = 128$ and $dt = 0.001$. The solution is accurately resolved, with mean Courant–Friedrichs–Lewy (CFL) numbers of 0.139 and 0.045 for $\sigma = 1$ and $\sigma = 0.2$, respectively. Here, the mean CFL number is computed as the average along a trajectory with $N_t = 10^5$ steps:
$$\text{Mean CFL number} = \frac{1}{N_t} \sum_{n=1}^{N_t} \sup_x |u(x, t_n)|\, \frac{\Delta t}{\Delta x},$$
where $\Delta t$ and $\Delta x$ are the time step and space step, respectively. Furthermore, numerical tests show that the marginal densities converge as the trajectory length increases.
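As a small sketch, the mean CFL number above can be computed from a stored trajectory as follows; the array layout is our assumption.

```python
import numpy as np

def mean_cfl(u_traj, dt, dx):
    """Mean CFL number: the average over steps of sup_x |u(x, t_n)| * dt / dx.
    u_traj: array of shape (Nt, 2N), the real-space solution along a trajectory."""
    return np.mean(np.max(np.abs(u_traj), axis=1)) * dt / dx
```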

2.3. Nonlinear Galerkin and Inferential Model Reduction

For simplicity of notation, we write the Burgers equation in operator form as
$$\partial_t u + Au = B(u) + f, \qquad u(0) = u_0, \tag{5}$$
with a linear operator $A: H_0^1(0, 2\pi) \to L^2(0, 2\pi)$ and a nonlinear operator $B: H_0^1(0, 2\pi) \to L^2(0, 2\pi)$ given by
$$A = -\nu \partial_{xx}, \qquad B(u) = -(u^2)_x / 2.$$
We first decompose the Fourier modes of $u$ into resolved and unresolved variables. Recall that the goal of model reduction is to derive a closed system that faithfully describes the dynamics of the coefficients $\{\hat u_k(t)\}_{|k|=1}^{K}$, or equivalently, of the low-dimensional process $v(x,t) = \sum_{|k|=1}^{K} \hat u_k(t)\, e^{i q_k x}$.
Denote by $P$ the projection operator from $H_0^1(0, 2\pi)$ to $\mathrm{span}\{e^{i q_k x}\}_{|k|=1}^{K}$, and let $Q := I - P$ (for simplicity of notation, we will also denote by $P$ and $Q$ the projections on the corresponding vector spaces of Fourier modes). With $u = Pu + Qu = v + w$, we can write the system (5) as
$$\frac{dv}{dt} = -PAv + PB(v) + Pf + [PB(v+w) - PB(v)], \tag{6}$$
$$\frac{dw}{dt} = -QAw + QB(v+w) + Qf. \tag{7}$$
To find a closed system for $v$, we quantify the truncation error $PB(v+w) - PB(v)$ in (6), which represents the nonlinear interaction between the low and high wavenumber modes, by either a function of $v$ or a functional of the trajectory of $v$. In particular, in the nonlinear Galerkin methods based on inertial manifold theory (see e.g., [24,25,26,27]), one aims to represent the high modes $w$ as a function of the low modes $v$ (hence obtaining an approximate inertial manifold). In the simplest implementation, one neglects the time derivative in Equation (7) and solves $w = \psi(v)$ from
$$w \approx (QA)^{-1}\big[QB(v+w) + Qf\big]$$
by fixed point iterations: $\psi_0 = 0$, $\psi_{i+1} = (QA)^{-1}[QB(v + \psi_i) + Qf]$. This leads to an approximation of $w$ as a function of $v$, which exists if $K$ is large enough and if the system satisfies a gap condition (so that an inertial manifold exists). However, among the many dissipative systems with a global attractor, only a few have been proven to satisfy the gap condition (see [28] for a recent review). More importantly, we cannot always expect $K$ to be larger than the dimension of an inertial manifold, which is unknown in general. Therefore, such a nonlinear Galerkin approach works neither for a system without an inertial manifold nor for a $K$ smaller than the dimension of the inertial manifold.
We take a different perspective on the reduction. Unlike the nonlinear Galerkin methods, which aim for a trajectory-wise approximation, we aim for a probabilistic approximation of the distribution of the stochastic process $(v(\cdot,t), t \ge 0)$. The randomness of the process $v$ can come from random initial conditions and/or from the stochastic force. We emphasize that the key is to represent the dependence of the model error $PB(v+w) - PB(v)$ on the process $v$, not simply to construct a stochastic process with the same distribution as $PB(v+w) - PB(v)$, which may be independent of the process $v$.
In a data-driven approach, such a probabilistic approximation leads naturally to statistical inference of the underlying process, aiming to represent the model error $[PB(v+w) - PB(v)](t)$ as a functional of the past trajectory $(v(\cdot,s), s \le t)$. This inferential reduction approach works flexibly in general settings: there is no need for an inertial manifold, and the dimension $K$ can be arbitrary (e.g., less than the dimension of the inertial manifold, as shown in [14]).
Space-time reduction. To achieve a space-time reduction for practical computation, the reduced model should be a time series model with a time step $\delta > dt$ for time reduction, instead of a differential system. It approximates the flow map (with $t_n = n\delta$)
$$\hat u_k(t_{n+1}) = F\big(\hat u_\cdot(t_n), \hat f_\cdot([t_n, t_{n+1}])\big)_k, \quad |k| \le K, \tag{8}$$
where $\hat u_\cdot(t_n) = (\hat u_k(t_n), |k| \ge 0)$ is the vector of all Fourier modes; thus, the above map is not a closed system for the low modes. Recall that for $|k| \le K$,
$$\frac{d}{dt}\hat u_k = \underbrace{-\nu q_k^2 \hat u_k - \frac{i q_k}{2} \sum_{|k-l| \le K,\, |l| \le K} \hat u_l \hat u_{k-l}}_{K\text{-mode truncation}} \;\underbrace{-\, \frac{i q_k}{2} \sum_{|k-l| > K \text{ or } |l| > K} \hat u_l \hat u_{k-l}}_{\text{truncation error}} \;+\; \hat f_k(t). \tag{9}$$
Clearly, the $K$-mode truncated Galerkin system provides an immediate approximation to $F$ in (8). Making use of it, we propose a time series model for $\{\hat u_k(t_n)\}_{|k|=1}^{K}$ of the form
$$u_k^{n+1} = u_k^n + \delta\big[R_k^\delta(u^n) + f_k^n + \Phi_k^n\big] + g_k^{n+1}, \quad |k| \le K, \tag{10}$$
where $R_\cdot^\delta(u^n)$ comes from a one-step forward integrator, with time step-size $\delta$, of the deterministic $K$-mode Galerkin system, and $f_k^n = \hat f_k(t_n)$ is the white noise in the $k$th Fourier mode of the stochastic force at time $t_n$. Here, the term $\Phi^n$ and the noise $g^{n+1}$ aim to represent the truncation error as well as the discretization error. Together with the other terms in (10), they provide a statistical approximation to the flow map $F$ in (8). In particular, the term $\Phi^n$ approximates $F$ based on the information up to time $n$ (e.g., the conditional expectation), and the noise $g^{n+1}$ aims to statistically represent the residual of the approximation. Since the truncation error depends on the past history of the low wavenumber modes, as suggested by the Mori–Zwanzig formalism [6,7], we let $\Phi^n$ depend on the trajectory $u^{1:n}$ of the state process, as well as on the trajectories $f^{1:n}$ and $g^{1:n}$:
$$\Phi^n := \Phi(u^{1:n}, f^{1:n}, g^{1:n}). \tag{11}$$
For simplicity, we assume the noise $\{g^n\}$ to be iid Gaussian, and the resulting time series model in (10) is a nonlinear autoregression moving average (NARMA) model [13,45,46].
The right-hand side of Equation (10), together with $\Phi^n$ defined in Equation (11), aims for a statistical approximation of the discrete-time map (8). However, the general form in Equation (11) leads to a high-dimensional function to be learned from data, which is intractable for regression methods using either global or local polynomial bases, due to the well-known curse of dimensionality. Fortunately, the physical model provides informative structures to reduce the dimension, and we can obtain effective approximations based on only a few basis functions with finite memory. In the next section, we derive from the physical model a parametric form for the reduced model, whose coefficients can be efficiently estimated from data.
To avoid confusion between notations, we summarize the correspondence of the variables between the full and reduced models in Table 2.

3. Inference of Reduced Models

We present here the parametric inference of NAR models: derivation of parametric forms, estimation of the parameters, and model selection.

3.1. Derivation of Parametric Reduced Models

We derive parametric reduced models by extracting basis functions from a numerical integration of Equation (6). The combination of these basis functions gives us $\Phi(u^{1:n}, f^{1:n}, g^{1:n})$ in (11), which approximates the flow maps $\{F(\hat u_\cdot(t_n), \hat f_\cdot([t_n, t_{n+1}]))_k,\; |k| \le K\}$ in (8) in a statistical sense.
We first write a closed integro-differential system for the low-mode process $(v(\cdot,t), t \ge 0)$. In view of Equation (6), this can be done simply by integrating the equation for the high modes $w$ in Equation (7):
$$\frac{dv}{dt} = -PAv + PB(v) + Pf + [PB(v+w) - PB(v)], \qquad w(t) = e^{-QA\tau} w(t-\tau) + \int_{t-\tau}^{t} e^{-QA(t-s)} \big[QB(v(s)+w(s)) + Qf(s)\big]\, ds, \tag{12}$$
where $\tau \in [0, t]$. Note that in addition to the trajectories $(v(\cdot,s), s \in [t-\tau, t])$ and $(Qf(s), s \in [t-\tau, t])$, which we may assume to be known, the state $w(\cdot,t)$ also depends on the initial condition $w(\cdot, t-\tau)$. Therefore, this equation is not strictly closed. However, as $\tau$ increases, the effect of the initial condition decays exponentially, allowing for a possible finite-time approximate closure. Given $w(\cdot, t-\tau)$ and $(Qf(s), s \in [t-\tau, t])$, the Picard iteration can provide an approximation of $w$ as a functional of the trajectory of $v$. That is, the sequence of functions $\{w^{(l)}\}$, defined by
$$w^{(l+1)}(t) = e^{-QA\tau} w^{(l)}(t-\tau) + \int_{t-\tau}^{t} e^{-QA(t-s)} \big[QB(v(s)+w^{(l)}(s)) + Qf(s)\big]\, ds, \tag{13}$$
with $w^{(0)}(s) = 0$ for $s \in [t-\tau, t]$, will converge to $w$ as $l \to \infty$. In particular, the first Picard iteration
$$w^{(1)}(t) = \int_{t-\tau}^{t} e^{-QA(t-s)} \big[QB(v(s)) + Qf(s)\big]\, ds$$
provides a closed representation: from its numerical integrator, we can derive parametric terms for the reduced model. We emphasize that the goal is to derive parametric terms for statistical inference, not to obtain a trajectory-wise approximation. Thus, high-order numerical integrators or high-order Picard iterations are helpful but may complicate the parametrization. For simplicity, we consider only the first Picard iteration and a Riemann sum approximation of its integral.
We can now propose parametric numerical reduced models from the above integro-differential equations. In a simple form, we parametrize both the Riemann sum approximation of the first Picard iteration and a numerical scheme of the differential equation to obtain
$$v(t_n) \approx v(t_{n-1}) + a_1 \delta R^\delta(v(t_{n-1})) + a_2 \delta P f(t_{n-1}) + \delta\,[PB(v+w) - PB(v)](t_{n-1}), \qquad w(t_{n-1}) \approx \sum_{j=0}^{p} c_j\, e^{-QA j\delta} \big[QB(v(t_{n-j})) + Qf(t_{n-j})\big].$$
Here, $\delta = t_n - t_{n-1}$ denotes the time step-size; the nonlinear function $R^\delta(\cdot)$ comes from a numerical integration of the deterministic truncated Galerkin equation $\frac{dv}{dt} = -PAv + PB(v)$ at time $t_{n-1}$ with time step-size $\delta$; and the coefficients $(a_1, a_2, c_j)$ are to be estimated by fitting to data in a statistical sense. To distinguish the approximate process in the reduced model from the original process, we denote it by $v^n$ and write the reduced model as
$$v^n = v^{n-1} + a_1 \delta R^\delta(v^{n-1}) + a_2 \delta P f(t_{n-1}) + \delta\,[PB(v^{n-1}+w^{n-1}) - PB(v^{n-1})] + g^n, \tag{14a}$$
$$w^{n-1} = \sum_{j=1}^{p} c_j\, e^{-QA j\delta} \big[QB(v^{n-j}) + Qf(t_{n-j})\big], \tag{14b}$$
where $\{g^n\}$ is a process representing the residual; it can be assumed to be a white noise for simplicity, but it can also be given a moving average structure to better capture the time correlation, as in [13,46]. The second Equation (14b) does not have a residual term, as its goal is to provide a set of basis functions for the approximation of the forward map $v(t_n) = F(v(t_{n-1}), w(t_{n-1}), f)$ as in Equation (8), not to model the high modes.
Note that the time step-size $\delta$ can be relatively large, as long as the truncated Galerkin equation $\frac{dv}{dt} = -PAv + PB(v)$ of the slow variable $v$ can be reasonably resolved. In general, such a step-size can be much larger than the time step-size needed to resolve the fast process $w$, because the effect of the unresolved fast process is "averaged" statistically when fitting the coefficients $(a_1, a_2, c_j)$ to data. Furthermore, the numerical error of the discretization is taken into account statistically.
Theoretically, the right-hand side of Equation (14a) is an approximation of the conditional expectation $\mathbb{E}\big[v(t_n) \,\big|\, v(t_{n-p:n-1}), Pf(t_{n-p:n-1})\big]$, which is the optimal $L^2$ estimator of the forward map conditional on the information up to time $t_{n-1}$. Here, the $L^2$ norm is with respect to the joint measure of the vector $(v(t_{\cdot-p:\cdot-1}), Pf(t_{\cdot-p:\cdot-1}))$, which is approximated by the joint empirical measure when fitting to data.
To avoid nonlinear optimization, the parametric form may be further simplified to depend linearly on the coefficients by dropping the terms that are quadratic in the parameters. In fact, recall that in the Burgers equation $B(u) = -u u_x$, so $PB(v+w) - PB(v) = -P[v_x w + v w_x + w w_x]$. By dropping the interaction $w w_x$ between the high modes and approximating
$$PB(v^{n-1}+w^{n-1}) - PB(v^{n-1}) \approx -P\big[v_x^{n-1} w^{n-1} + v^{n-1} w_x^{n-1}\big] \tag{15}$$
in (14a), we obtain a reduced model that depends linearly on the coefficients $\{a_j, c_j\}$.

3.2. The Numerical Reduced Model in Fourier Modes

We now write the reduced model in terms of the Fourier modes as in Equation (10).
As discussed in the above section, the major task is to parametrize the truncation error $PB(v+w)_k - PB(v)_k$. Recall that the operator $P$ projects $u$ to the modes with wavenumbers $1 \le |k| \le K$ and that the bilinear function $PB(v)_k = -\frac{i q_k}{2} \sum_{|l| \le K,\, |k-l| \le K} \hat u_l \hat u_{k-l}$ (hereafter, to simplify notation, we also denote by $P$ and $Q$ the projections on the corresponding vector spaces of Fourier modes). Then
$$PB(v+w)_k - PB(v)_k = -\frac{i q_k}{2} \sum_{|l| > K \text{ or } |k-l| > K} \hat u_l \hat u_{k-l}.$$
Since the quadratic term $B(v)$ can only propagate energy from $(\hat u_k, 1 \le |k| \le K)$ to modes with wavenumbers less than $2K+1$, we get only the high modes with wavenumbers $K < |k| \le 2K$ when we compute $w$ by a single iteration of $QB(v)$. (We use a single iteration for simplicity, but one can reach higher wavenumbers by multiple iterations at the price of more complicated parametric forms.) Therefore, in a single-iteration approximation, the truncation error involves the first $2K$ Fourier modes:
$$PB(v+w)_k - PB(v)_k \approx -\frac{i q_k}{2} \sum_{K<|k-l| \le 2K \text{ or } K<|l| \le 2K} \hat u_l \hat u_{k-l}.$$
Dropping the interaction between the high modes to avoid nonlinear optimization in the parameter estimation, we have
$$PB(v+w)_k - PB(v)_k \approx -\frac{i q_k}{2} \sum_{\substack{|k-l| \le K,\; K<|l| \le 2K \\ \text{or } |l| \le K,\; K<|k-l| \le 2K}} \hat u_l \hat u_{k-l}.$$
We approximate the high modes $(\hat u_k, K < |k| \le 2K)$ by a functional of the low modes as in (14b):
$$\hat u_k(t_{n-1}) \approx \sum_{j=1}^{p} c_{k,j}\, e^{-QA j\delta} \big[\tilde u_k(t_{n-j}) + \hat f_k(t_{n-1})\big], \quad K < |k| \le 2K,$$
where $\tilde u_k$ denotes the high modes of the nonlinear function $B(v)$:
$$\tilde u_k = QB(v)_k = -\frac{i q_k}{2} \sum_{|l| \le K,\, |k-l| \le K} \hat u_l \hat u_{k-l}, \quad \text{for } K < |k| \le 2K.$$
Here, $QB(v)$ only represents the modes up to wavenumber $2K$, because the quadratic nonlinearity can at most double the wavenumbers involved. One can reach higher wavenumbers by iterating the quadratic interaction.
The truncation error term can now be linearly parametrized as
$$[PB(v+w) - PB(v)]_k(t_n) \approx -\frac{i q_k}{2} \sum_{j=0}^{p} c_{k,j}\, e^{-QA j\delta} \sum_{\substack{|k-l| \le K,\; K<|l| \le 2K \\ \text{or } |l| \le K,\; K<|k-l| \le 2K}} \tilde u_l(t_n)\, \tilde u_{k-l}(t_{n-j}), \tag{16}$$
where we also denote $\tilde u_k = \hat u_k$ for $|k| \le K$, for simplicity of notation.
We have now reached a parametric numerical reduced model for the Fourier modes. Denote by $u^n = (u_k^n, |k| \le K) \in \mathbb{C}^K$ the low modes in the reduced model that approximate the original low modes $(\hat u_k(t_n), |k| \le K)$. The reduced model is
$$u_k^n = u_k^{n-1} + \delta\big[R^\delta(u_\cdot^{n-1})_k + f_k^n + \Phi_k^n\big] + g_k^n, \quad 1 \le k \le K, \tag{17a}$$
$$\Phi_k^n = \sum_{j=1}^{p} \Bigg( c_{k,j}^v u_k^{n-j} + c_{k,j}^R R^\delta(u_\cdot^{n-j})_k + c_{k,j}^f f_k^{n-j} + c_{k,j}^w \sum_{\substack{|k-l| \le K,\; K<|l| \le 2K \\ \text{or } |l| \le K,\; K<|k-l| \le 2K}} \tilde u_l^{n-1}\, \tilde u_{k-l}^{n-j} \Bigg), \tag{17b}$$
with the convention that $u_{-k}^n = (u_k^n)^*$ (the superscript $*$ denoting the complex conjugate), and where the notation $\tilde u_l^{n-j}$ represents the high modes, defined by
$$\tilde u_k^{n-j} = \begin{cases} u_k^{n-j}, & 1 \le k \le K; \\[2pt] -\dfrac{i q_k}{2}\, e^{-\nu q_k^2 j \delta} \displaystyle\sum_{|l| \le K,\, |k-l| \le K} u_{k-l}^{n-j} u_l^{n-j}, & K < k \le 2K. \end{cases} \tag{18}$$
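To illustrate Equation (18), the following is a minimal sketch of the extension of the $K$ resolved modes to the wavenumbers $K < k \le 2K$. The indexing conventions (arrays indexed by wavenumber, negative modes via conjugation) are our assumptions, not the original implementation.

```python
import numpy as np

def u_tilde_high(u, K, nu, j, delta):
    """High modes tilde-u_k for K < k <= 2K, following Equation (18).
    u: complex array with u[m] = u_m^{n-j} for m = 1..K (entry u[0] unused)."""
    def mode(m):  # u_m with the conventions u_{-m} = conj(u_m), u_0 = 0, 0 outside
        if m == 0 or abs(m) > K:
            return 0.0
        return u[m] if m > 0 else np.conj(u[-m])

    out = np.zeros(2 * K + 1, dtype=complex)
    for k in range(K + 1, 2 * K + 1):
        conv = sum(mode(l) * mode(k - l) for l in range(-K, K + 1))
        out[k] = -0.5j * k * np.exp(-nu * k**2 * j * delta) * conv
    return out[K + 1:]  # the modes k = K+1, ..., 2K
```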
The reduced model is in the form of a nonlinear autoregression moving average (NARMA) model, with the following components (a schematic assembly of $\Phi_k^n$ is sketched after this list):
  • The map $R^\delta(\cdot): \mathbb{C}^K \to \mathbb{C}^K$ is the one-step forward map of the deterministic $K$-mode Galerkin truncated equation $\frac{dv}{dt} = -PAv + PB(v)$, using a numerical integration scheme with time step-size $\delta$, i.e., $v^{n+1} = v^n + \delta R^\delta(v^n)$. We use the ETDRK4 scheme.
  • The term $f_k^n$ denotes the increment of the $k$th Fourier mode of the stochastic force in the time interval $[t_{n-1}, t_n]$, scaled by $1/\delta$; it is separated from $R^\delta$ so that the reduced model can linearly quantify the response of the low modes to the stochastic force.
  • The function $\Phi_k^n := \Phi_k^n(u^{n-p:n-1}, f^{n-p:n-1})$ is a function from $\mathbb{C}^{Kp} \times \mathbb{C}^{Kp}$ to $\mathbb{C}^K$, with parameters $\theta = (c^v, c^R, c^f, c^w) \in \mathbb{R}^{4Kp}$ to be estimated from data. In particular, the coefficients $c_{k,1}^v$ and $c_{k,1}^R$ act as a correction to the integration of the truncated equation.
  • The new noise terms $\{g^n \in \mathbb{C}^K\}$ are assumed, for simplicity, to be a white noise independent of the original stochastic force $(f^n)$. That is, we assume that $\{g^n\}$ is a sequence of independent identically distributed (iid) Gaussian random vectors, with independent real and imaginary parts, distributed as $\mathcal{N}(0, \mathrm{Diag}(\sigma_k^g))$, with the $\sigma_k^g$ to be estimated from data. Under such a white noise assumption, the parameters can be estimated simply by least squares (see the next section). In general, one can also assume other distributions for $g^n$, or other structures such as a moving average $g^n := \xi^n + \sum_{j=1}^{q} c_j^g \xi^{n-j}$, with $\{\xi^n\}$ a white noise sequence [13,46].
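The schematic assembly of $\Phi_k^n$ announced above is sketched below. The data layout (lists of past states indexed by wavenumber, with the extended states $\tilde u^{n-j}$ precomputed, e.g., with the sketch after Equation (18)) is our assumption.

```python
import numpy as np

def phi_n(u_hist, f_hist, R_hist, ut_hist, coeffs, K):
    """Assemble Phi^n of (17b) for k = 1..K (a sketch; conventions are ours).
    u_hist, f_hist, R_hist: lists (newest last) of the past p resolved states,
        force terms, and Galerkin one-step maps R^delta(u^{n-j}), each a
        complex array indexed by wavenumber 1..K (entry 0 unused).
    ut_hist: list of extended states tilde-u^{n-j}, indexed 1..2K (entry 0 unused).
    coeffs = (c_v, c_R, c_f, c_w), real arrays of shape (K, p)."""
    c_v, c_R, c_f, c_w = coeffs
    p = c_v.shape[1]
    K2 = 2 * K

    def ext(ut, m):  # tilde-u_m with tilde-u_{-m} = conj(tilde-u_m), 0 outside
        if m == 0 or abs(m) > K2:
            return 0.0
        return ut[m] if m > 0 else np.conj(ut[-m])

    Phi = np.zeros(K + 1, dtype=complex)
    for k in range(1, K + 1):
        for j in range(1, p + 1):
            # sum over (|k-l|<=K, K<|l|<=2K) or (|l|<=K, K<|k-l|<=2K), as in (17b)
            conv = sum(ext(ut_hist[-1], l) * ext(ut_hist[-j], k - l)
                       for l in range(k - K2, K2 + 1)
                       if (abs(l) <= K < abs(k - l)) or (abs(k - l) <= K < abs(l)))
            Phi[k] += (c_v[k-1, j-1] * u_hist[-j][k]
                       + c_R[k-1, j-1] * R_hist[-j][k]
                       + c_f[k-1, j-1] * f_hist[-j][k]
                       + c_w[k-1, j-1] * conv)
    return Phi[1:]  # Phi_k^n for k = 1..K
```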

3.3. Data Generation and Parameter Estimation

We estimate the parameters of the NAR model by maximizing the likelihood of the data.
Data for the NAR model. To infer a reduced model in the form of Equation (17), we generate the relevant data from a numerical scheme that sufficiently resolves the system in space and time, as introduced in Section 2.2. The relevant data are trajectories of the low modes of the state and the stochastic force, i.e., $\{\hat u_k(t_n), \hat f_k(t_n)\}$ for $|k| \le K$ and $n \ge 0$, which are taken as $\{u_k^n, f_k^n\}$ in the reduced model. Here, the time instants are $t_n = n\delta$, where $\delta$ can be much larger than the time step-size $dt$ needed to resolve the system. Furthermore, the data do not include the high modes. In short, the data are generated by downsampling, in both space and time, the high-resolution solutions of the system.
The data can be either a long trajectory or many independent short trajectories. We denote the data consisting of $M$ independent trajectories by
$$\text{Data: } \{u_k^{1:N_t,m}, f_k^{1:N_t,m}\}_{m=1,k=1}^{M,\,K}, \quad\text{with } u_k^{1:N_t,m} = \hat u_k(t_{1:N_t})^{(m)}, \quad f_k^{1:N_t,m} = \hat f_k(t_{1:N_t})^{(m)}, \tag{19}$$
where $m$ indexes the trajectories, $t_n = n\delta$ with $\delta$ the time interval between two observations, and $N_t$ denotes the number of steps in each trajectory.
Parameter estimation. The parameters in the discrete-time reduced model Equation (17) are estimated by maximum likelihood methods. Our discrete-time reduced model has a few attractive features: (i) the likelihood function can be computed exactly, avoiding possible approximation errors that could lead to biases in the estimators; (ii) the maximum likelihood estimator (MLE) may be computed by least squares under the assumption that the process $\{g^n\}$ is white noise, avoiding time-consuming nonlinear optimizations.
Under the assumption that $\{g^n\}$ is white noise, the parameters can be estimated simply by least squares, because the reduced model in Equation (17) depends linearly on the parameters. More precisely, the log-likelihood of the data $\{u^{1:N_t,m}, f^{1:N_t,m}\}_{m=1}^{M}$ in (19) can be written as
$$l(\theta, \sigma^g) = -\sum_{|k| \le K} \left( \log \sigma_k^g + \sum_{n,m=1}^{T,M} \frac{\big| u_k^{n,m} - u_k^{n-1,m} - \delta\big[R^\delta(u^{n-1,m})_k + f_k^{n,m} + \Phi_k^{n,m}(\theta)\big] \big|^2}{2MT\sigma_k^g} \right), \tag{20}$$
where $|\cdot|$ denotes the absolute value of a complex number, $\theta = (c^v, c^R, c^f, c^w) \in \mathbb{R}^{4Kp}$, and $\sigma^g = (\sigma_1^g, \ldots, \sigma_K^g) \in \mathbb{R}^K$. To compute the maximum likelihood estimator (MLE) of the parameters $(\theta, \sigma^g)$, we note that $\Phi_k^n(\theta)$ in (17b) depends linearly on the parameter $\theta$. Therefore, the estimators of $\theta$ and $\sigma^g$ can be computed analytically by finding a zero of the gradient of the likelihood function. More precisely, writing
$$\Phi_k^n(\theta) = \sum_{j=1}^{4p} \theta_j \Phi_{k,j}^n,$$
with $\Phi_{k,j}^n$ denoting the parameterized terms in (17b), we compute the MLE as
$$\hat\theta_k = (A_k)^{-1} b_k, \quad 1 \le k \le K, \qquad \hat\sigma_k^g = \frac{1}{MT} \sum_{n,m=1}^{T,M} \Big| u_k^{n,m} - u_k^{n-1,m} - \delta\big[R^\delta(u^{n-1,m})_k + f_k^{n,m} + \Phi_k^{n,m}(\hat\theta)\big] \Big|^2, \tag{21}$$
where the normal matrix $A_k$ and the vector $b_k$ are defined by
$$A_k(j,j') = \frac{\delta}{MT} \sum_{n,m=1}^{T,M} \big\langle \Phi_{k,j}^{n,m}, \Phi_{k,j'}^{n,m} \big\rangle, \quad 1 \le j, j' \le 4p, \qquad b_k(j) = \frac{1}{MT} \sum_{n,m=1}^{T,M} \big\langle u_k^{n,m} - u_k^{n-1,m} - \delta\big[R^\delta(u^{n-1,m})_k + f_k^{n,m}\big],\; \Phi_{k,j}^{n,m} \big\rangle. \tag{22}$$
In practice, $A_k$ may be singular; this can be dealt with by a pseudo-inverse or by regularization. We assume for simplicity that the noise $g$ has independent components, so that the coefficients can be estimated by simple least squares regression. One may further improve the NAR model by considering spatial correlations between the components of $g$ or by using moving average models [13,46,47].
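A minimal sketch of the per-mode least squares estimator (21)–(22) follows, assuming the increments and the basis terms $\Phi_{k,j}^{n,m}$ have been stacked over all samples $(n,m)$; the data layout is of our choosing.

```python
import numpy as np

def estimate_mode_k(du, Phi, delta):
    """Least squares MLE (21)-(22) for one mode k, assuming white noise g.
    du:  complex array of shape (L,): the increments
         u_k^{n,m} - u_k^{n-1,m} - delta * (R^delta(u^{n-1,m})_k + f_k^{n,m}),
         stacked over all (n, m), L = M*T.
    Phi: complex array of shape (L, 4p): columns are the terms Phi_{k,j}^{n,m}.
    Returns (theta_k, sigma_k^g)."""
    L = len(du)
    A = delta * (Phi.conj().T @ Phi).real / L      # normal matrix A_k in (22)
    b = (Phi.conj().T @ du).real / L               # vector b_k in (22)
    theta = np.linalg.lstsq(A, b, rcond=None)[0]   # pseudo-inverse if A_k is singular
    resid = du - delta * (Phi @ theta)
    return theta, np.mean(np.abs(resid) ** 2)      # sigma_k^g as in (21)
```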

3.4. Model Selection

The parametric form in Equation (17b) leaves a family of reduced models with many degrees of freedom undetermined, such as the time lag p and possibly redundant terms. To avoid overfitting and redundancy, we propose to select the reduced model by the following criteria.
  • Cross validation: the reduced model should be stable and should reproduce the distribution of the resolved process, particularly the main dynamical-statistical properties. We consider the energy spectrum, the marginal invariant densities, and the temporal correlations (a sketch of their empirical computation is given at the end of this subsection):
$$\text{Energy spectrum: } E|\hat u_k|^2 = \lim_{N_t M \to \infty} \frac{1}{N_t M} \sum_{m,n=1}^{M,N_t} |\hat u_k(t_n)^{(m)}|^2;$$
$$\text{Invariant density of } \operatorname{Re}(\hat u_k): \; p_k(z)\, dz = \lim_{N_t M \to \infty} \frac{1}{N_t M} \sum_{m,n=1}^{M,N_t} \mathbf{1}_{(z, z+dz)}\big(\operatorname{Re}(\hat u_k(t_n)^{(m)})\big);$$
$$\text{Autocorrelation function: } \mathrm{ACF}_k(\tau) = E\big[\operatorname{Re}\hat u_k(t+\tau)\operatorname{Re}\hat u_k(t)\big] \approx \frac{1}{N_t M} \sum_{m,n=1}^{M,N_t} \operatorname{Re}\big(\hat u_k(t_{n+\tau})^{(m)}\big)\operatorname{Re}\big(\hat u_k(t_n)^{(m)}\big), \tag{23}$$
    for $k = 1, \ldots, K$.
  • Consistency of the estimators: if the model were perfect and the data were either independent trajectories or a long trajectory from an ergodic measure, the estimators would converge as the data size increases (see e.g., [45,48]). While our parametric model may not be perfect, the estimators should still become less oscillatory as the data size increases, so that the algorithm is robust and yields similar reduced models from different data sets.
  • Simplicity and sparsity: when multiple reduced models perform similarly, we prefer the simplest one. We remove redundant terms and enforce sparsity by LASSO (least absolute shrinkage and selection operator) regression [49]. In particular, a singular normal matrix in (22) indicates redundancy of the terms and the need to remove strongly correlated terms.
These criteria are by no means exhaustive. Other methods, such as the Bayesian information criterion (BIC; see, e.g., [50]) and the error reduction ratio [51], may be applied, but in our experience, they provide limited help for the selection of reduced models [7,14,46].
In view of statistical learning of the high-dimensional nonlinear flow map in (8), each linear-in-parameter reduced model provides an optimal approximation to the flow map in the hypothesis space spanned by the proposed terms. A possible future direction is to select adaptive-to-data hypothesis spaces in a nonparametric fashion [23] and analyze the distance between the flow map and the hypothesis space [52,53].
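To make the cross-validation statistics in (23) concrete, here is a minimal sketch of their empirical computation from a single long trajectory; the data layout is hypothetical.

```python
import numpy as np

def empirical_stats(u, max_lag, bins=50):
    """Empirical statistics (23) from one trajectory u of shape (Nt, K) of
    complex resolved modes; lags are in units of the sampling step delta."""
    energy = np.mean(np.abs(u) ** 2, axis=0)                # E|u_k|^2
    x = u.real
    pdfs = [np.histogram(x[:, k], bins=bins, density=True)  # marginal density of Re(u_k)
            for k in range(u.shape[1])]
    acf = np.array([np.mean(x[:len(x) - lag] * x[lag:], axis=0)
                    for lag in range(max_lag + 1)])         # ACF_k, as in (23)
    return energy, pdfs, acf
```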

4. Numerical Study on Space-Time Reduction

We examine the inference and performance of NAR models for the stochastic Burgers equation in (1) and (2). We consider two settings of the full model: the stochastic force has a scale of either $\sigma = 1$ or $\sigma = 0.2$, representing a stochastic force that either dominates or is subordinate to the dynamics, respectively. We also consider two settings for the reduction: the number of Fourier modes of the reduced model is either $K > K_0$ or $K < K_0$, representing a reduction of the deterministic response and a reduction involving unresolved stochastic force, respectively.

4.1. Settings

As reviewed in Section 2.2, we integrate Equation (4) for the $2N$ Fourier modes by ETDRK4 with a time step $dt$ such that the solution is accurately resolved. We call this discretized system the full model; its configuration is specified in Table 3. We consider two different scales for the stochastic force: standard deviation $\sigma = 1$, leading to dynamics dominated by the stochastic force, and $\sigma = 0.2$, representing dynamics dominated by the deterministic drift.
We generate the data in (19) from the full model as described in Section 3.3. We generate an ensemble of initial conditions by first integrating the system for $10^4$ time units from the initial condition $u_0(x) = \sin(x) + 2\cos(x)$ and drawing $10^3$ samples uniformly from this long trajectory. Then, we generate either a long trajectory or an ensemble of trajectories starting from randomly picked initial conditions, and we save the data with time step $\delta$. Numerical tests show that the invariant densities and the correlation functions vary little when the data are generated from different initial conditions.
We then infer NAR models for the first $K$ Fourier modes with a time step $\delta$. We consider two values for $K$ (recall that $K_0$ is the number of Fourier modes in the stochastic force):
  • $K = 8 > K_0 = 4$. In this case, $Qf = 0$, i.e., the stochastic force does not act on the unresolved Fourier modes $w$ in (7), so $w$ is a deterministic functional of the history of the resolved Fourier modes. In view of (14b), the reduced model mainly quantifies this deterministic map. We call this case "reduction of the deterministic response" and present the results in Section 4.3.
  • $K = 2 < K_0$. In this case, $Qf \neq 0$, and $w$ in (7) depends on the unobserved Fourier modes of the stochastic force. Thus, the reduced model has to quantify the effects of the unresolved Fourier modes of both the solution and the stochastic force. We call this case "reduction involving unresolved stochastic force" and present the results in Section 4.4.
In either case, we explore the maximal time step that NAR models can reach by testing the time steps $\delta = dt \times \{5, 10, 20, 30, 40, 50, 80, 160\}$.
We summarize the configurations and notations in Table 3.

4.2. Model Selection and Memory Length

We demonstrate model selection and the effect of memory length for reduced models with time step $\delta = 5\,dt$. We aim to select a universal parametric form of the NAR model for the different settings of $(K, \sigma)$, where $K \in \{8, 2\}$ is the number of Fourier modes in the NAR model and $\sigma \in \{1, 0.2\}$ is the standard deviation of the full model's stochastic force. This parametric form will then be used to explore the maximal time reduction by NAR models in the next sections.
We select the model according to Section 3.4: for each pair $(K, \sigma)$, we test a pool of NAR models and select the simplest model that best reproduces the statistics and has consistent estimators. The statistics are computed along a long trajectory of $T = 2000$ time units. We say that an NAR model is numerically unstable if it blows up (e.g., $|u^n|$ exceeding $10^5$) before reaching $T = 2000$ time units.
We estimate the coefficients in (17b) for a few values of the time lag $p$. Numerical tests show that the normal matrix in the regression is almost singular, either when the stochastic force terms $f_k^{n-j}$ are present or when the lag of $u_k^{n-j}$ or $R^\delta(u^{n-j})$ is larger than two. Thus, for simplicity, we remove these terms by setting
$$c_{k,j}^f = 0 \text{ for all } 1 \le j \le p, \qquad c_{k,j}^v = c_{k,j}^R = 0 \text{ for all } 1 < j \le p,$$
and estimate only $c_{k,1}^v$, $c_{k,1}^R$, and $c_{k,j}^w$ for $1 \le j \le p$.
That is, in (17b), the terms $u_k^{n-j}$ and $R^\delta(u^{n-j})$ have time lag 1, the stochastic force term $f_k^{n-j}$ is removed, and only the high-order (the fourth) term has time lag $p$. The memory length is $p\delta$.
Memory length. To select a memory length, we test NAR models with time lags $p \in \{1, 5, 10, 20\}$ and consider their reproduction of the energy spectrum in (23). Figure 1 shows the relative errors in the energy spectrum of these NAR models. It shows that, as $p$ increases: (1) when the scale of the stochastic force is large ($\sigma = 1$), the error oscillates without a clear pattern; (2) when $\sigma = 0.2$, the error first decreases and then increases. Thus, a longer memory does not necessarily lead to a better reduced model when the stochastic force dominates the dynamics, but when the deterministic flow dominates the dynamics, a proper memory length can be helpful.
In all four settings, the simplest NAR models, with $p = 1$, consistently reproduce the energy spectrum with relative errors within 5%. Remarkably, the accuracy persists when the true energy spectrum is at the scale of $10^{-2}$, for the modes with $k = 7, 8$ in Figure 2a,b and $k = 2$ in Figure 2d. Figure 2 also shows that the truncated $K$-mode Galerkin systems cannot reproduce the true energy spectrum in any of these settings, exhibiting upward tails due to the lack of fast energy dissipation from the high modes. Thus, the NAR model has introduced additional energy dissipation through $\Phi^n$.
Consistency of estimators. The estimators of the NAR models tend to converge as the data size increases. Figure 3 shows the estimated coefficients of the NAR model with $p = 1$ from data consisting of $M$ trajectories, each of length $T$, with $M \in \{2, 8, 32, 128, 512\}$ and $T \in \{40, 80, 160, 320, 640, 1280\}$. As $T \times M$ increases, all the estimators tend to converge (note that the coefficients $c_{k,1}^w$ are at the scale of $10^{-4}$ or $10^{-3}$). In particular, they converge faster when $\sigma = 1$ than when $\sigma = 0.2$: the estimators in panels (a,c) oscillate little once $T \times M > 10^3$, indicating that different trajectories lead to similar estimators, while the estimators in panels (b,d) (take $c_{K,1}^R$, for example) oscillate until $T \times M > 10^5$. This agrees with the fact that a larger stochastic force makes the system mix faster, so each trajectory provides more effective samples, driving the estimators to converge faster.
Numerical tests also show that an NAR model can be numerically unstable while its coefficient estimator is consistent (i.e., tending to converge as above). Thus, consistency is not sufficient for the selection of an NAR model.
In our tests, sparse regression algorithms such as LASSO (see e.g., [49]) or sequential thresholding (see e.g., [54,55]) have difficulty finding a proper threshold, because the coefficients $c^w$ of the high-order terms are small and vary in scale across settings, yet these high-order terms are important for the NAR model.
Since the NAR models with $p = 1$ perform well in all four settings, and since they are the simplest, we use them in the next sections to explore the maximal time reduction.

4.3. Reduction of the Deterministic Response

We explore in this and the next section the maximal time step δ that the NAR models can reach. We consider only the simplest models with time lag p = 1 .
We consider first the models with $K = 8$ Fourier modes. Since the stochastic force acts directly only on the first $K_0 = 4$ Fourier modes, the unresolved variable $w$ in (12) is a deterministic functional of the path of the $K$ modes, and so is the truncation error $PB(v+w) - PB(v)$ in (14b). Thus, the NAR model mainly reduces the deterministic response of the resolved variables to the unresolved variables. In particular, the term $\Phi^n$ in the NAR model (17a) optimally approximates this deterministic response on the function space linearly spanned by the terms in (17b).
We consider time steps $\delta = dt \times \mathrm{Gap}$ with $\mathrm{Gap} \in \{5, 10, 20, 30, 40, 50\}$. For each $\delta$, we first estimate the coefficients $(c_{k,1}^v, c_{k,1}^R, c_{k,1}^w)$ of the NAR model from data with the same time step. We then validate the estimated NAR model by its statistics.
Numerical tests show that the NAR models with $\mathrm{Gap} \ge 20$ are numerically unstable in the setting $(K = 8, \sigma = 1)$, and the corresponding threshold is $\mathrm{Gap} = 50$ in the setting $(K = 8, \sigma = 0.2)$. Figure 4a,b shows the relative error in the energy spectrum reproduced by NAR models with the stable time steps. The relative errors increase as the Gap increases. Note that the relative errors for the modes $k = 1, 2$ change little, while those for $k \in \{3, 4, 5, 6\}$ increase significantly. In particular, in (b), the relative errors at $k = 8$ are about 8% for $\mathrm{Gap} \in \{20, 30, 40\}$, but the relative errors at $k \in \{3, 4, 5, 6\}$ increase sharply, forming a peak at $k = 6$ when $\mathrm{Gap} = 40$. We discuss connections with CFL numbers in Section 4.5.
These NAR models reproduce the PDFs and ACFs relatively accurately. Figure 5 shows the marginal PDFs of the real parts of the modes. The top row shows the marginal PDFs for the NAR models with $\mathrm{Gap} = 5$, in comparison with those of the full model and the Galerkin truncated system (solved with time step $dt$). For the modes with wavenumbers $k \in \{1, 2, 3, 4\}$, the NAR model captures the shape and spread of the PDFs almost perfectly, improving on those of the Galerkin truncated system. For the modes with $k \in \{5, 6, 7, 8\}$, the NAR model still performs well, significantly improving on those of the truncated Galerkin system. The discrepancy between the PDFs becomes larger as the wavenumber increases, because these modes are affected more by the unresolved modes. The bottom row shows that the Kolmogorov–Smirnov statistics (the maximal difference between the cumulative distribution functions) increase slightly as the Gap increases. Figure 6 shows the ACFs. The top row shows that both the NAR model (with $\mathrm{Gap} = 5$) and the Galerkin system reproduce the ACFs accurately. The bottom row shows that the relative error of the ACF, in the $L^2([0,3])$ norm, increases as the Gap increases (particularly in the case $\sigma = 0.2$). Recall that the truncated Galerkin system produces PDFs with support much wider than the truth for the high modes (see Figure 5) and that $R^\delta$ becomes less accurate as $\delta$ increases. Thus, the terms $u$ and $R^\delta(u)$ in the NAR model (17) preserve the temporal correlation, while the high-order term helps dissipate energy and preserve the invariant measure.
In summary, when $K = 8$, the maximal time steps are $\delta \in dt \times [10, 20) = [0.01, 0.02)$ and $\delta \in dt \times [40, 50) = [0.04, 0.05)$ when $\sigma = 1$ and $\sigma = 0.2$, respectively, for NAR models with $p = 1$. All these NAR models can accurately reproduce the energy spectrum, the invariant measure, and the temporal autocorrelation.

4.4. Reduction Involving Unresolved Stochastic Force

We consider next the NAR models with $K = 2$. In this case, the unresolved variable $w$ in (12) is a functional of both the path of the $K$ modes and the unresolved stochastic force. Thus, in view of (14b), (16), and (17), the NAR model quantifies the response of the $K$ modes to both the unresolved Fourier modes and the unresolved stochastic force.
Note first that $K = 2$ is too small for the $K$-mode Galerkin system to meaningfully reproduce any of the statistical or dynamical properties; see Figure 2c,d for the energy spectrum, Figure 5c,d for the marginal PDFs, and Figure 6c,d for the ACFs. In contrast, the NAR models with $\delta = 5\,dt$, whose term $R^\delta$ comes from the $K$-mode Galerkin system, reproduce these statistics accurately. Remarkably, the NAR models remain accurate even when the time step is as large as $\delta = 80\,dt$, with the K–S statistics less than 0.025 in Figure 5c,d and the relative errors in the ACFs less than 6% in Figure 6c,d.
To explore the maximal time step that NAR models can reach, we consider time steps $\delta = dt \times \mathrm{Gap}$ with $\mathrm{Gap} \in \{5, 10, 20, 40, 80, 160\}$. Numerical tests show that the NAR models are numerically stable for all of them, in both settings $\sigma = 1$ and $\sigma = 0.2$. Figure 4c,d shows the relative error in the energy spectrum reproduced by NAR models with these time steps. The relative error first decreases and then increases as the Gap increases, reaching its lowest value at $\mathrm{Gap} = 10$ and $\mathrm{Gap} = 20$ for the settings $\sigma = 1$ and $\sigma = 0.2$, respectively. In particular, all of these relative errors remain less than 9%, except when $\mathrm{Gap} = 160$ in the setting $\sigma = 1$.
In summary, when $K = 2$, the NAR models can tolerate large time steps. The maximal time steps are at least $\delta = dt \times 80 = 0.08$ and $\delta = dt \times 160 = 0.16$ when $\sigma = 1$ and $\sigma = 0.2$, respectively, for the NAR models to reproduce the energy spectrum with a relative error less than 9%.

4.5. Discussion on Space-Time Reduction

Since model reduction aims for space-time reduction, it is natural to consider the maximal reduction in space-time, in other words, the minimal "spatial" dimension $K$ and the maximal time step $\delta = dt \times \mathrm{Gap}$. We have the following observations from the previous sections:
  • Space dimension reduction, the memory length of the reduced model, and the stochastic force are closely related. As suggested by the discrete Mori–Zwanzig formalism for random dynamics (see e.g., [7]), space dimension reduction leads to non-Markovian closure models. Figure 1 suggests that a medium memory length leads to the best NAR model. It also suggests that the scale of the white-in-time stochastic force can affect the memory length, with a larger-scale stochastic force leading to a shorter memory. We leave it as future work to investigate the relations between the memory length, the stochastic force (colored or white in time), and energy dissipation.
  • The maximal time step depends on the space dimension and on the scale of the stochastic force, and it is mainly limited by the stability of the nonlinear reduced model. Figure 4 shows that the maximal time step when $K = 2$ is at least $\delta = dt \times \mathrm{Gap}$ with $\mathrm{Gap} = 160$, much larger than in the case $K = 8$. It also shows that as the scale of the stochastic force increases from $\sigma = 0.2$ to $\sigma = 1$, the NAR models' maximal time step decreases (because the NAR models either become unstable or exhibit larger errors in the energy spectrum). It is noteworthy that these maximal time steps of the NAR models are smaller than those the $K$-mode Galerkin system can tolerate. Figure 7 shows that the $K$-mode Galerkin system can be stable for time steps much larger than those of the NAR models: the maximal time step of the $K$-mode Galerkin system is reached when its mean CFL number (which increases linearly) reaches 1, while the maximal time step for the NAR models to be stable is smaller. For example, in the setting $(K = 8, \sigma = 0.2)$, the maximal time gap for the Galerkin system is $\mathrm{Gap} = 80$ (the end of the red diamond line), but the maximal time gap for the NAR model is about $\mathrm{Gap} = 10$. The increased numerical instability of the NAR model is likely due to the nonlinear terms in $\Phi^n$, which are important for the NAR model to preserve energy dissipation and the energy spectrum (see Figure 2 and the coefficients in Figure 3).
Beyond maximal reduction, an intriguing question arises: when does the reduced model perform best (i.e., achieve the least relative error in the energy spectrum)? We call this the optimality of space-time reduction. It is more interesting and relevant to model reduction than the maximal space-time reduction, because one may achieve a large time step or a small space dimension at the price of a large error in the NAR model, as we have seen in Figure 4. We note that the relative errors in the energy spectrum in Figure 4c,d are smallest when the Gaps are closest to the squares in Figure 7, where the full model's mean CFL numbers agree with those of the $K$-mode Galerkin system. We conjecture that the optimal space-time reduction is achieved by an NAR model when the $K$-mode Galerkin system preserves the CFL number of the full model.

5. Conclusions

We consider a data-driven model reduction for stochastic Burgers equations, casting it as a statistical learning problem of approximating the flow map of the low-wavenumber Fourier modes. We derive a class of efficient parametric reduced closure models, based on representing the high modes as functionals of the resolved variables' trajectory. The reduced models are nonlinear autoregression (NAR) time series models, with coefficients estimated from data by least squares. In various settings, the NAR models can accurately reproduce statistics such as the energy spectrum, the invariant densities, and the autocorrelations.
Using the simplest NAR model, we investigate the maximal space-time reduction in four settings: reduction of the deterministic response ($K > K_0$) vs. reduction involving unresolved stochastic force ($K < K_0$), and small vs. large scales of the stochastic force (with $\sigma = 0.2$ and $\sigma = 1$), where $K_0$ is the number of Fourier modes of the white-in-time stochastic force and $\sigma$ is the scale of the force. Reduction in the space dimension is unlimited, and NAR models with $K = 2$ Fourier modes can reproduce the energy spectrum with relative errors less than 5%. The time reduction is another story. The maximal time reduction depends on both the dimension reduction and the stochastic force's scale, as they affect the stability of the NAR model. The NAR model's stability limits the maximal time step to be smaller than that of the $K$-mode Galerkin system. Numerical tests indicate that the NAR models achieve the minimal relative error at the time step where the $K$-mode Galerkin system's mean CFL number agrees with the full model's. This is a potential criterion for optimal space-time reduction.
The simplicity of our NAR model structure opens various fronts for a further understanding of data-driven model reduction. Future directions include: (1) studying the connection between optimal space-time reduction, the CFL number, and quantification of the accuracy of reduced models; (2) investigating the relation between memory length, dimension reduction, the stochastic force, and the energy dissipation of the system; (3) developing post-processing techniques to efficiently recover information of the high Fourier modes, so as to predict the shocks using the reduced models.

Funding

This research was funded by NSF-1913243 and NSF-1821211.

Acknowledgments

The author would like to thank the anonymous reviewers for valuable feedback that helped significantly improve the manuscript. He is grateful to Alexandre Chorin for introducing him to this problem. This study is part of our joint project on renormalization group methods. The author would like to thank Kevin Lin, Panos Stinis, John Harlim, Xiantao Li, Mauro Maggioni, and Felix Ye for helpful discussions.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ETDRK4: exponential time differencing fourth-order Runge–Kutta method
CFL number: Courant–Friedrichs–Lewy number
NAR: nonlinear autoregression
PDF: probability density function
ACF: autocorrelation function

References

  1. Stinis, P. Mori-Zwanzig Reduced Models for Uncertainty Quantification II: Initial Condition Uncertainty. arXiv 2012, arXiv:1212.6360.
  2. Li, Z.; Bian, X.; Li, X.; Karniadakis, G.E. Incorporation of Memory Effects in Coarse-Grained Modeling via the Mori-Zwanzig Formalism. J. Chem. Phys. 2015, 143, 243128.
  3. Lu, F.; Tu, X.; Chorin, A.J. Accounting for Model Error from Unresolved Scales in Ensemble Kalman Filters by Stochastic Parameterization. Mon. Weather Rev. 2017, 145, 3709–3723.
  4. Lu, F.; Weitzel, N.; Monahan, A. Joint state-parameter estimation of a nonlinear stochastic energy balance model from sparse noisy data. Nonlinear Process. Geophys. 2019, 26, 227–250.
  5. Zwanzig, R. Nonequilibrium Statistical Mechanics; Oxford University Press: New York, NY, USA, 2001.
  6. Chorin, A.J.; Hald, O.H. Stochastic Tools in Mathematics and Science, 3rd ed.; Springer: New York, NY, USA, 2013.
  7. Lin, K.K.; Lu, F. Data-driven model reduction, Wiener projections, and the Koopman-Mori-Zwanzig formalism. J. Comput. Phys. 2020, 424, 109864.
  8. Kondrashov, D.; Chekroun, M.D.; Ghil, M. Data-Driven Non-Markovian Closure Models. Physica D 2015, 297, 33–55.
  9. Harlim, J.; Li, X. Parametric Reduced Models for the Nonlinear Schrödinger Equation. Phys. Rev. E 2015, 91, 053306.
  10. Lei, H.; Baker, N.A.; Li, X. Data-Driven Parameterization of the Generalized Langevin Equation. Proc. Natl. Acad. Sci. USA 2016, 113, 14183–14188.
  11. Xie, X.; Mohebujjaman, M.; Rebholz, L.G.; Iliescu, T. Data-Driven Filtered Reduced Order Modeling of Fluid Flows. SIAM J. Sci. Comput. 2018, 40, B834–B857.
  12. Chekroun, M.D.; Kondrashov, D. Data-Adaptive Harmonic Spectra and Multilayer Stuart-Landau Models. Chaos Interdiscip. J. Nonlinear Sci. 2017, 27, 093110.
  13. Chorin, A.J.; Lu, F. Discrete approach to stochastic parametrization and dimension reduction in nonlinear dynamics. Proc. Natl. Acad. Sci. USA 2015, 112, 9804–9809.
  14. Lu, F.; Lin, K.K.; Chorin, A.J. Data-based stochastic model reduction for the Kuramoto–Sivashinsky equation. Physica D 2017, 340, 46–57.
  15. Pathak, J.; Hunt, B.; Girvan, M.; Lu, Z.; Ott, E. Model-Free Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach. Phys. Rev. Lett. 2018, 120, 024102.
  16. Ma, C.; Wang, J.; E, W. Model Reduction with Memory and the Machine Learning of Dynamical Systems. Commun. Comput. Phys. 2018, 25, 947–962.
  17. Harlim, J.; Jiang, S.W.; Liang, S.; Yang, H. Machine learning for prediction with missing dynamics. J. Comput. Phys. 2020.
  18. Parish, E.J.; Duraisamy, K. A Paradigm for Data-Driven Predictive Modeling Using Field Inversion and Machine Learning. J. Comput. Phys. 2016, 305, 758–774.
  19. Duan, J.; Wei, W. Effective Dynamics of Stochastic Partial Differential Equations; Elsevier: Amsterdam, The Netherlands, 2014.
  20. Stinis, P. Renormalized Mori-Zwanzig-Reduced Models for Systems without Scale Separation. Proc. R. Soc. A 2015, 471, 20140446.
  21. Hudson, T.; Li, X.H. Coarse-Graining of Overdamped Langevin Dynamics via the Mori-Zwanzig Formalism. Multiscale Model. Simul. 2020, 18, 1113–1135.
  22. Choi, Y.; Carlberg, K. Space–Time Least-Squares Petrov-Galerkin Projection for Nonlinear Model Reduction. SIAM J. Sci. Comput. 2019, 41, A26–A58.
  23. Jiang, S.W.; Harlim, J. Modeling of missing dynamical systems: Deriving parametric models using a nonparametric framework. Res. Math. Sci. 2020, 7, 1–25.
  24. Marion, M.; Temam, R. Nonlinear Galerkin methods. SIAM J. Numer. Anal. 1989, 26, 1139–1157.
  25. Jolly, M.S.; Kevrekidis, I.G.; Titi, E.S. Approximate inertial manifolds for the Kuramoto-Sivashinsky equation: Analysis and computations. Physica D 1990, 44, 38–60.
  26. Rosa, R. Approximate inertial manifolds of exponential order. Discrete Contin. Dynam. Syst. 1995, 3, 421–448.
  27. Novo, J.; Titi, E.S.; Wynne, S. Efficient methods using high accuracy approximate inertial manifolds. Numer. Math. 2001, 87, 523–554.
  28. Zelik, S. Inertial manifolds and finite-dimensional reduction for dissipative PDEs. Proc. R. Soc. Edinb. A 2014, 144, 1245–1327.
  29. Zhang, H.; Harlim, J.; Li, X. Computing linear response statistics using orthogonal polynomial based estimators: An RKHS formulation. arXiv 2019, arXiv:1912.11110.
  30. Pan, S.; Duraisamy, K. Data-driven discovery of closure models. SIAM J. Appl. Dyn. Syst. 2018, 17, 2381–2413.
  31. E, W.; Khanin, K.; Mazel, A.; Sinai, Y.G. Invariant Measures for Burgers Equation with Stochastic Forcing. Ann. Math. 2000, 151, 877–960.
  32. Chorin, A.J. Averaging and Renormalization for the Korteveg-deVries-Burgers Equation. Proc. Natl. Acad. Sci. USA 2003, 100, 9674–9679.
  33. Chorin, A.J.; Hald, O.H. Viscosity-Dependent Inertial Spectra of the Burgers and Korteweg-deVries-Burgers Equations. Proc. Natl. Acad. Sci. USA 2005, 102, 3921–3923.
  34. Bec, J.; Khanin, K. Burgers Turbulence. Phys. Rep. 2007, 447, 1–66.
  35. Beck, M.; Wayne, C.E. Using Global Invariant Manifolds to Understand Metastability in the Burgers Equation With Small Viscosity. SIAM J. Appl. Dyn. Syst. 2009, 8, 1043–1065.
  36. Wang, Z.; Akhtar, I.; Borggaard, J.; Iliescu, T. Two-Level Discretizations of Nonlinear Closure Models for Proper Orthogonal Decomposition. J. Comput. Phys. 2011, 230, 126–146.
  37. Dolaptchiev, S.; Achatz, U.; Timofeyev, I. Stochastic closure for local averages in the finite-difference discretization of the forced Burgers equation. Theor. Comput. Fluid Dyn. 2013, 27, 297–317.
  38. Benner, P.; Gugercin, S.; Willcox, K. A Survey of Projection-Based Model Reduction Methods for Parametric Dynamical Systems. SIAM Rev. 2015, 57, 483–531.
  39. Quarteroni, A.; Manzoni, A.; Negri, F. Reduced Basis Methods for Partial Differential Equations: An Introduction; Springer: Berlin/Heidelberg, Germany, 2015; Volume 92.
  40. Sinai, Y.G. Two results concerning asymptotic behavior of solutions of the Burgers equation with force. J. Stat. Phys. 1991, 64, 1–12.
  41. Da Prato, G. An Introduction to Infinite-Dimensional Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006.
  42. Cox, S.M.; Matthews, P.C. Exponential time differencing for stiff systems. J. Comput. Phys. 2002, 176, 430–455.
  43. Kassam, A.K.; Trefethen, L.N. Fourth-order time stepping for stiff PDEs. SIAM J. Sci. Comput. 2005, 26, 1214–1233.
  44. Gottlieb, D.; Orszag, S. Numerical Analysis of Spectral Methods: Theory and Applications; SIAM: Philadelphia, PA, USA, 1977.
  45. Fan, J.; Yao, Q. Nonlinear Time Series: Nonparametric and Parametric Methods; Springer: New York, NY, USA, 2003.
  46. Lu, F.; Lin, K.K.; Chorin, A.J. Comparison of continuous and discrete-time data-based modeling for hypoelliptic systems. Commun. Appl. Math. Comput. Sci. 2016, 11, 187–216.
  47. Verheul, N.; Crommelin, D. Stochastic parameterization with VARX processes. arXiv 2020, arXiv:2010.03293.
  48. Kutoyants, Y.A. Statistical Inference for Ergodic Diffusion Processes; Springer: Berlin/Heidelberg, Germany, 2004.
  49. Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
  50. Brockwell, P.; Davis, R. Introduction to Time Series and Forecasting; Springer: New York, NY, USA, 2002.
  51. Billings, S.A. Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatiotemporal Domains; John Wiley and Sons: Hoboken, NJ, USA, 2013.
  52. Györfi, L.; Kohler, M.; Krzyzak, A.; Walk, H. A Distribution-Free Theory of Nonparametric Regression; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006.
  53. Lu, F.; Zhong, M.; Tang, S.; Maggioni, M. Nonparametric inference of interaction laws in systems of agents from trajectory data. Proc. Natl. Acad. Sci. USA 2019, 116, 14424–14433.
  54. She, Y. Thresholding-Based Iterative Selection Procedures for Model Selection and Shrinkage. Electron. J. Stat. 2009, 3, 384–415.
  55. Quade, M.; Abel, M.; Kutz, N.J.; Brunton, S.L. Sparse Identification of Nonlinear Dynamics for Rapid Model Recovery. Chaos 2018, 28, 063116.
Figure 1. Relative error in energy spectrum reproduced by the NAR models with different memory lengths p, in four settings of (K, σ). As the time lag p increases, the relative error tends to first decrease and then increase, particularly in (b,d) with σ = 0.2.
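For reference, the energy spectrum and the relative-error measure reported in Figures 1, 2 and 4 can be computed along the following lines (a schematic Python/NumPy sketch; the exact norm of the relative error is not restated here, so the l1-type ratio below is an assumption).

import numpy as np

def energy_spectrum(u_hat_traj):
    """Time-averaged energy spectrum E_k = <|u_k|^2>.

    u_hat_traj : complex array (n_steps, n_modes) of Fourier modes.
    """
    return (np.abs(u_hat_traj) ** 2).mean(axis=0)

def spectrum_rel_error(spec_model, spec_true):
    """Relative error of a reproduced spectrum over the resolved modes
    (an l1-type ratio; the paper's exact norm may differ)."""
    return np.abs(spec_model - spec_true).sum() / np.abs(spec_true).sum()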
Figure 2. Energy spectrum of NAR models with p = 1 and the K-mode Galerkin systems in four settings of (K, σ). The time step is δ = 5dt for the NAR models and dt for the Galerkin models. The NAR models accurately reproduce the true energy spectrum in all settings.
Figure 3. Estimated coefficients (c_{k,1}^v, c_{k,1}^R, c_{k,j}^w) in NAR models with p = 1 and δ = 5dt in four settings of (K, σ). The estimators tend to converge quickly as the trajectory length T and the number of trajectories M increase; note that the coefficients c_{k,1}^w are at the scale of 10^{-4} or 10^{-3}.
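The coefficients in Figure 3 are obtained by least squares. A hedged sketch of such a fit for a single mode follows (Python/NumPy; the feature columns stand in for the terms v_k^n, R_k^δ, and the lagged w-terms in (10) and (17), and their assembly here is schematic, not the paper's implementation).

import numpy as np

def fit_nar_mode(increments, features):
    """Ordinary least squares for the NAR coefficients of one Fourier mode.

    increments : complex array (n_samples,), e.g. u_k^{n+1} - u_k^n
    features   : complex array (n_samples, n_terms), columns such as
                 v_k^n, R_k^delta(u^n), and lagged w-type terms
    Returns the coefficient vector (c_k^v, c_k^R, c_k^w, ...) and the
    residuals, whose empirical variance calibrates the noise g^n.
    """
    coeffs, *_ = np.linalg.lstsq(features, increments, rcond=None)
    residuals = increments - features @ coeffs
    return coeffs, residuals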
Figure 4. Relative error in energy spectrum reproduced by the NAR models with time steps δ = dt × Gap for Gap ∈ {5, 10, 20, 30, 40, 50} in four settings of (K, σ). All NAR models have time lag p = 1. The missing values of Gap in (a,b) lead to numerically unstable NAR models; thus, the maximal δ that an NAR model can reach is δ ∈ [0.01, 0.02) for (a) and δ ∈ [0.04, 0.05) for (b), while δ ≥ 0.16 for (c,d).
Figure 5. Marginal PDFs and K-S (Kolmogorov–Smirnov) statistics; the K-S statistic is the maximum difference between two cumulative distribution functions. In each of (a–d), the top panels show the empirical marginal PDFs of the real parts of the Fourier modes, from data (True), the K-mode Galerkin system (Galerkin), and the NAR models with p = 1 and δ = dt × Gap with Gap = 5; the bottom panels show the K-S statistics of NAR models with different time steps δ = dt × Gap, up to the largest Gap such that the NAR model is numerically stable.
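The K-S statistic in Figure 5 is simply the maximum gap between two empirical cumulative distribution functions; a minimal two-sample implementation (equivalent in spirit to scipy.stats.ks_2samp) reads:

import numpy as np

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of samples x and y."""
    grid = np.sort(np.concatenate([x, y]))
    cdf_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return np.abs(cdf_x - cdf_y).max()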
Figure 6. ACFs (autocorrelation functions). In each of (a–d), the top panels show the ACFs of the real parts of the Fourier modes when Gap = 5; the bottom panels show the relative errors (in L^2([0,3])-norm) of the NAR models with different time steps δ = dt × Gap, up to the largest Gap such that the NAR model is numerically stable.
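The ACFs and their relative L^2([0,3]) errors in Figure 6 can be computed as in the following sketch (Python/NumPy; the biased ACF normalization is a common convention and an assumption here, not a restatement of the paper's code).

import numpy as np

def acf(x, max_lag):
    """Empirical autocorrelation of a real, stationary series up to max_lag
    (biased estimator, normalized so that acf[0] = 1)."""
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[: len(x) - l], x[l:]) / (len(x) * var)
                     for l in range(max_lag + 1)])

def acf_rel_l2_error(acf_model, acf_true, dlag):
    """Relative L2 error over the lag window, e.g. [0, 3] with dlag = delta."""
    err = np.sqrt(np.sum((acf_model - acf_true) ** 2) * dlag)
    ref = np.sqrt(np.sum(acf_true ** 2) * dlag)
    return err / ref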
Figure 7. The mean CFL numbers of the full models and the K-mode Galerkin systems. The mean CFL number is computed along a trajectory with 10^5 steps. The time step is dt = 0.001 for the full model and δ = dt × Gap for the K-mode Galerkin system. When (σ = 1, K = 8), the K-mode Galerkin system blows up for Gap > 80, so its CFL number is missing afterwards. The stars (✩) mark the largest Gap such that the NAR model is numerically stable. The red and blue squares mark where the full model's mean CFL numbers agree with those of the K-mode Galerkin systems; the relative errors in energy spectrum in Figure 4c,d are smallest when the Gap is closest to these squares.
Table 1. Notations: the variables in the full and reduced models.

Model | Notation | Description
Full model | u(x,t) = Σ_{|k|≥1} û_k(t) e^{i q_k x} | solution of (1) in its Fourier series
 | f(x,t) = Σ_{1≤|k|≤K_0} f̂_k(t) e^{i q_k x} | stochastic force in (2) in its Fourier series
 | v(x,t) = Σ_{|k|≤K} û_k(t) e^{i q_k x} | the resolved variable, the target process for closure modeling
 | w(x,t) = Σ_{|k|>K} û_k(t) e^{i q_k x} | the unresolved variable; u = v + w in (12)
 | ν, σ | the viscosity in (1) and the strength of the stochastic force
 | N, dt | number of modes and time step-size in the numerical solution
Reduced models | K | number of modes in the reduced (NAR) models in (17)
 | (u_k^n)_{|k|≤K} | state variable in the reduced model, corresponding to û_k(t_n)
 | δ = dt × Gap | observation time interval
 | R_k^δ, Φ^n, g^n | parametric terms in the NAR model in (10) and (17)
Table 2. Correspondence of the variables between the full and reduced models.

 | Full Model in (4) | Reduced Model in (10) or (17)
State variables | û_k(t_n) or û(t_n) in (4) and (9) | u_k^n or u^n in (10)
Resolved variable | v(x, t_n) or v in (6) and (12) | the vector (u_{-K}^n, …, u_K^n) in (17)
Unresolved variable | w(x, t) or w in (7) and (12) | NA
Stochastic force | white noise f̂_k(t_n) in (9) | white noise f_k^n in (10)
Noise introduced in inference | NA | g^n in (10)
Flow map of resolved variable | F in Equation (8) | Equation (10)
Table 3. Settings of the full and reduced models.

Full model | ν = 0.02, L = 1 | viscosity and interval length of the equation
 | N = 128, dt = 0.001 | number of modes and time step-size
 | K_0 = 4 | number of modes in the stochastic force
 | σ = 1 or 0.2 | standard deviation of the stochastic force
Reduced models | K = 8 or 2 | number of modes in the reduced model
 | δ = dt × Gap | observation time interval
 | Gap ∈ {5, 10, 20, 30, 40, 50, 80, 160} | gap of time steps