# Data-Driven Model Reduction for Stochastic Burgers Equations

## Abstract


## 1. Introduction

## 2. Space-Time Reduction for Stochastic Burgers Equations

#### 2.1. The Stochastic Burgers Equation

#### 2.2. Galerkin Spectral Method

#### 2.3. Nonlinear Galerkin and Inferential Model Reduction

**Space-time reduction.** To achieve a space-time reduction for practical computation, the reduced model should be a time-series model with a time step $\delta > dt$, rather than a differential system, so as to achieve time reduction. It approximates the flow map (with ${t}_{n}=n\delta $)

## 3. Inference of Reduced Models

#### 3.1. Derivation of Parametric Reduced Models

#### 3.2. The Numerical Reduced Model in Fourier Modes

- The map ${R}^{\delta}(\cdot):{\mathbb{C}}^{K}\to {\mathbb{C}}^{K}$ is the one-step forward map of the deterministic K-mode Galerkin truncation $\frac{dv}{dt}=-PAv+PB\left(v\right)$, computed by a numerical integration scheme with time step-size $\delta $, i.e., ${v}^{n+1}={v}^{n}+\delta {R}^{\delta}\left({v}^{n}\right)$. We use the ETDRK4 scheme.
- The term ${f}_{k}^{n}$ denotes the increment of the k-th Fourier mode of the stochastic force in the time interval $[{t}_{n-1},{t}_{n}]$, scaled by $1/\delta $. It is separated from ${R}^{\delta}$ so that the reduced model can linearly quantify the response of the low modes to the stochastic force.
- The term ${\Phi}_{k}^{n}:={\Phi}_{k}^{n}({u}^{n-p:n-1},{f}^{n-p:n-1})$ is a function ${\mathbb{C}}^{2Kp}\to {\mathbb{C}}^{K}$ with parameters $\theta =({c}^{v},{c}^{R},{c}^{f},{c}^{w})\in {\mathbb{R}}^{4Kp}$ to be estimated from data. In particular, the coefficients ${c}_{k,1}^{v}$ and ${c}_{k,1}^{R}$ act as corrections to the integration of the truncated equation.
- The new noise terms $\{{g}^{n}\in {\mathbb{C}}^{K}\}$ are assumed for simplicity to be white noise independent of the original stochastic force $\left({f}^{n}\right)$. That is, we assume that $\left\{{g}^{n}\right\}$ is a sequence of independent identically distributed (iid) Gaussian random vectors, with independent real and imaginary parts, distributed as $\mathcal{N}(0,\mathrm{Diag}\left({\sigma}_{k}^{g}\right))$ with ${\sigma}_{k}^{g}$ to be estimated from data. Under such a white noise assumption, the parameters can be estimated simply by least squares (see next section). In general, one can also assume other distributions for ${g}^{n}$, or other structures such as the moving average $\{{g}^{n}:={\xi}_{n}+{\sum}_{j=1}^{q}{c}_{j}^{g}{\xi}_{n-j}\}$ with $\left\{{\xi}_{n}\right\}$ being a white noise sequence [13,46].
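The structure above can be sketched as a one-step update. This is a minimal illustration, not the paper's Equation (17): the memory term `phi` keeps only linear lagged terms with coefficient arrays `c_v` and `c_f` (the $c^R$ and $c^w$ terms are omitted), and `R_delta` stands in for the ETDRK4 one-step increment of the Galerkin truncation.

```python
import numpy as np

rng = np.random.default_rng(0)

def nar_step(u_hist, f_hist, R_delta, c_v, c_f, sigma_g, delta):
    """One step of a simplified NAR reduced model in the spirit of Eq. (17).

    u_hist, f_hist: complex arrays (p, K) holding the last p states/forces
    of the K low modes, newest last.
    R_delta: callable giving the one-step increment of the K-mode Galerkin
    truncation (an ETDRK4 step in the paper).
    c_v, c_f: real coefficient arrays (p, K); a simplified memory term.
    """
    u_n = u_hist[-1]
    drift = delta * R_delta(u_n)  # deterministic Galerkin increment
    # linear-in-parameters memory term Phi^n over the lagged states/forces
    phi = (c_v * u_hist).sum(axis=0) + (c_f * f_hist).sum(axis=0)
    # iid complex Gaussian noise g^n with independent real and imaginary parts
    g = sigma_g * (rng.standard_normal(u_n.shape)
                   + 1j * rng.standard_normal(u_n.shape))
    return u_n + drift + delta * f_hist[-1] + delta * phi + g
```

Because $\Phi^n$ is linear in the parameters, each step is cheap and the fit below reduces to least squares.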

#### 3.3. Data Generation and Parameter Estimation

**Data for the NAR model.** To infer a reduced model in the form of Equation (17), we generate relevant data from a numerical scheme that sufficiently resolves the system in space and time, as introduced in Section 2.2. The relevant data are trajectories of the low modes of the state and the stochastic force, i.e., $\{{\widehat{u}}_{k}\left({t}_{n}\right),{\widehat{f}}_{k}\left({t}_{n}\right)\}$ for $\left|k\right|\le K$ and $n\ge 0$, which are taken as $\{{u}_{k}^{n},{f}_{k}^{n}\}$ in the reduced model. Here, the time instants are ${t}_{n}=n\delta $, where $\delta $ can be much larger than the time step-size $dt$ needed to resolve the system. Furthermore, the data do not include the high modes. In short, the data are generated by downsampling, in both space and time, the high-resolution solutions of the system.
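The space-time downsampling amounts to a simple slicing of the high-resolution Fourier trajectories. A minimal sketch, assuming the modes are stored with the K lowest modes in the first K columns (storage layout and argument names are assumptions):

```python
import numpy as np

def downsample(u_hat, f_hat, dt, delta, K):
    """Down-sample high-resolution Fourier trajectories in space and time.

    u_hat, f_hat: complex arrays (n_steps, n_modes) sampled every dt,
    with the K lowest Fourier modes in the first K columns.
    Returns the low modes at the observation times t_n = n * delta.
    """
    gap = int(round(delta / dt))       # Gap = delta / dt
    return u_hat[::gap, :K], f_hat[::gap, :K]
```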

**Parameter estimation.** The parameters in the discrete-time reduced model, Equation (17), are estimated by maximum likelihood methods. Our discrete-time reduced model has a few attractive features: (i) the likelihood function can be computed exactly, avoiding approximation errors that could bias the estimators; (ii) the maximum likelihood estimator (MLE) can be computed by least squares under the assumption that the process $\left\{{g}^{n}\right\}$ is white noise, avoiding time-consuming nonlinear optimization.
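Under the white-noise assumption on $\{g^n\}$, the MLE for one Fourier mode reduces to an ordinary least-squares problem. A sketch, where the targets and regressors are simplified stand-ins for the terms of Equation (17):

```python
import numpy as np

def fit_nar_ls(y, X):
    """Least-squares (= MLE under white noise g^n) fit for one Fourier mode.

    y: (n,) complex targets, e.g. the one-step residuals of the state after
       subtracting the Galerkin and force terms.
    X: (n, m) complex regressors built from the lagged states and forces.
    Returns (theta, sigma_g): the coefficients and the noise level
    estimated from the residuals.
    """
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta
    # g^n has independent real/imag parts ~ N(0, sigma_g^2), so
    # E|resid|^2 = 2 * sigma_g^2
    sigma_g = np.sqrt(np.mean(np.abs(resid) ** 2) / 2)
    return theta, sigma_g
```

One such regression is solved per mode k; no iterative optimization is needed.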

#### 3.4. Model Selection

- Cross validation: the reduced model should be stable and reproduce the distribution of the resolved process, particularly the main dynamical-statistical properties. We consider the energy spectrum, the marginal invariant densities, and the temporal correlations:$$\begin{aligned} \text{Energy spectrum:}\quad & \mathbb{E}|\widehat{u}_k|^2 = \lim_{N_t M\to\infty}\frac{1}{N_t M}\sum_{m,n=1}^{M,N_t}\left|\widehat{u}_k(t_n)^{(m)}\right|^2; \\ \text{Invariant density of } \mathrm{Re}(\widehat{u}_k):\quad & p_k(z)\,dz = \lim_{N_t M\to\infty}\frac{1}{N_t M}\sum_{m,n=1}^{M,N_t}\mathbf{1}_{(z,z+dz)}\big(\mathrm{Re}(\widehat{u}_k(t_n)^{(m)})\big); \\ \text{Autocorrelation function:}\quad & \mathrm{ACF}_k(\tau) = \mathbb{E}\left[\mathrm{Re}\,\widehat{u}_k(t+\tau)\,\mathrm{Re}\,\widehat{u}_k(t)\right] \approx \frac{1}{N_t M}\sum_{m,n=1}^{M,N_t}\mathrm{Re}\big(\widehat{u}_k(t_n+\tau)^{(m)}\big)\,\mathrm{Re}\big(\widehat{u}_k(t_n)^{(m)}\big). \end{aligned}$$
- Consistency of the estimators. If the model is perfect and the data are either independent trajectories or a long trajectory from an ergodic measure, the estimators should converge as the data size increases (see e.g., [45,48]). While our parametric model may not be perfect, the estimators should also become less oscillatory as the data size increases, so that the algorithm is robust and can yield similar reduced models from different data sets.
- Simplicity and sparsity. When multiple reduced models perform similarly, we prefer the simplest. We remove redundant terms and enforce sparsity by LASSO (least absolute shrinkage and selection operator) regression [49]. In particular, a singular normal matrix in (22) indicates redundancy among the terms and the need to remove strongly correlated ones.
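The cross-validation diagnostics above can be estimated directly from trajectory ensembles. A minimal sketch of the two simplest estimators (array layouts are assumptions):

```python
import numpy as np

def energy_spectrum(u):
    """Empirical energy spectrum E|u_k|^2.

    u: complex array (M, N_t, K) of M trajectories, N_t times, K modes;
    averages over trajectories and times, as in the displayed estimator.
    """
    return np.mean(np.abs(u) ** 2, axis=(0, 1))

def acf(x, max_lag):
    """Empirical (uncentered) autocorrelation of a real 1-D trajectory x,
    mirroring the ACF estimator above for a single long trajectory."""
    n = len(x)
    return np.array([np.mean(x[:n - lag] * x[lag:])
                     for lag in range(max_lag + 1)])
```

The invariant density would be estimated analogously with a histogram of $\mathrm{Re}(\widehat{u}_k)$.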

## 4. Numerical Study on Space-Time Reduction

#### 4.1. Settings

- $K=8>{K}_{0}=4$. In this case, $Qf=0$, i.e., the stochastic force does not act on the unresolved Fourier modes w in (7), so w is a deterministic functional of the history of the resolved Fourier modes. In view of (14b), the reduced model mainly quantifies this deterministic map. We call this case “reduction of the deterministic response” and present the results in Section 4.3.
- $K=2<{K}_{0}$. In this case, $Qf\ne 0$, and w in (7) depends on the unobserved Fourier modes of the stochastic force. Thus, the reduced model has to quantify the effects of the unresolved Fourier modes of both the solution and the stochastic force. We call this case “reduction involving unresolved stochastic force” and present the results in Section 4.4.

#### 4.2. Model Selection and Memory Length

**Memory length.** To select a memory length, we test NAR models with time lags $p\in \{1,5,10,20\}$ and consider their reproduction of the energy spectrum in (23). Figure 1 shows the relative error in the energy spectrum of these NAR models. As p increases: (1) when the scale of the stochastic force is large ($\sigma =1$), the error oscillates without a clear pattern; (2) when $\sigma =0.2$, the error first decreases and then increases. Thus, a longer memory does not necessarily lead to a better reduced model when the stochastic force dominates the dynamics; but when the deterministic flow dominates, a proper memory can be helpful.
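Ranking the candidate lags requires a scalar score. A sketch of the relative error in energy spectrum used here; the Euclidean norm is an assumption about the paper's exact choice:

```python
import numpy as np

def rel_spectrum_error(spec_model, spec_true):
    """Relative error in energy spectrum, used to compare NAR models
    with different memory lengths p: ||E_model - E_true|| / ||E_true||."""
    return np.linalg.norm(spec_model - spec_true) / np.linalg.norm(spec_true)
```

One would compute this score for each $p\in\{1,5,10,20\}$ and pick the lag with the smallest error among stable models.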

**Consistency of estimators.** The estimators of the NAR models tend to converge as the data size increases. Figure 3 shows the estimated coefficients of the NAR model with $p=1$ from data consisting of M trajectories, each of length T, where $M\in \{2,8,32,128,512\}$ and $T\in \{40,80,160,320,640,1280\}$. As $T\times M$ increases, all the estimators tend to converge (note that the coefficients ${c}_{k,1}^{w}$ are at the scale of ${10}^{-4}$ or ${10}^{-3}$). In particular, they converge faster when $\sigma =1$ than when $\sigma =0.2$: the estimators in (a,c) oscillate little after $T\times M>{10}^{3}$, indicating that different trajectories lead to similar estimators, while the estimators (take ${c}_{K,1}^{R}$ for example) in (b,d) oscillate until $T\times M>{10}^{5}$. This agrees with the fact that a larger stochastic force makes the system mix faster, so each trajectory provides more effective samples, driving the estimators to converge faster.

#### 4.3. Reduction of the Deterministic Response

#### 4.4. Reduction Involving Unresolved Stochastic Force

#### 4.5. Discussion on Space-Time Reduction

- Space dimension reduction, the memory length of the reduced model, and the stochastic force are closely related. As suggested by the discrete Mori–Zwanzig formalism for random dynamics (see e.g., [7]), space dimension reduction leads to non-Markovian closure models. Figure 1 suggests that a medium memory length leads to the best NAR model. It also suggests that the scale of the white-in-time stochastic force can affect the memory length: a larger stochastic force leads to a shorter memory. We leave the investigation of the relations between memory length, the stochastic force (colored or white in time), and energy dissipation as future work.
- The maximal time step depends on the space dimension and the scale of the stochastic force, and is mainly limited by the stability of the nonlinear reduced model. Figure 4 shows that the maximal time step when $K=2$ is at least $\delta =dt\times \mathrm{Gap}$ with $\mathrm{Gap}=160$, much larger than in the case $K=8$. It also shows that as the scale of the stochastic force increases from $\sigma =0.2$ to $\sigma =1$, the NAR models’ maximal time step decreases (the NAR models either become unstable or have larger errors in energy spectrum). Notably, these maximal time steps of the NAR models are smaller than those the K-mode Galerkin system can tolerate. Figure 7 shows that the K-mode Galerkin system can be stable for time steps much larger than those of the NAR models: the maximal time step for the K-mode Galerkin system is reached when the mean CFL number (which increases linearly) reaches 1, while the maximal time step for the NAR models to be stable is smaller. For example, in the setting $(K=8,\sigma =0.2)$, the maximal time gap for the Galerkin system is $\mathrm{Gap}=80$ (the end of the red diamond line), but the maximal time gap for the NAR model is about $\mathrm{Gap}=10$. The increased numerical instability of the NAR model is likely due to the nonlinear terms ${\Phi}^{n}$, which are important for the NAR model to preserve energy dissipation and the energy spectrum (see Figure 2 and the coefficients in Figure 3).
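For reference, the mean CFL diagnostic of Figure 7 can be computed along a trajectory roughly as follows; the advective definition $\max_x|u|\,dt/dx$ is an assumption about the paper's exact convention:

```python
import numpy as np

def mean_cfl(u_traj, dt, dx):
    """Mean advective CFL number along a trajectory:
    mean_n( max_x |u(x, t_n)| ) * dt / dx.

    u_traj: real array (n_steps, n_x) of physical-space snapshots.
    A rough stand-in for the diagnostic plotted in Figure 7.
    """
    return np.mean(np.max(np.abs(u_traj), axis=1)) * dt / dx
```

Stability of the Galerkin system then corresponds to keeping this number below 1.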

## 5. Conclusions

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| ETDRK4 | exponential time differencing fourth-order Runge–Kutta method |
| CFL number | Courant–Friedrichs–Lewy number |
| NAR | nonlinear autoregression |
| PDF | probability density function |
| ACF | autocorrelation function |

## References

1. Stinis, P. Mori–Zwanzig Reduced Models for Uncertainty Quantification II: Initial Condition Uncertainty. arXiv 2012, arXiv:1212.6360.
2. Li, Z.; Bian, X.; Li, X.; Karniadakis, G.E. Incorporation of Memory Effects in Coarse-Grained Modeling via the Mori–Zwanzig Formalism. J. Chem. Phys. 2015, 143, 243128.
3. Lu, F.; Tu, X.; Chorin, A.J. Accounting for Model Error from Unresolved Scales in Ensemble Kalman Filters by Stochastic Parameterization. Mon. Weather Rev. 2017, 145, 3709–3723.
4. Lu, F.; Weitzel, N.; Monahan, A. Joint state-parameter estimation of a nonlinear stochastic energy balance model from sparse noisy data. Nonlinear Process. Geophys. 2019, 26, 227–250.
5. Zwanzig, R. Nonequilibrium Statistical Mechanics; Oxford University Press: New York, NY, USA, 2001.
6. Chorin, A.J.; Hald, O.H. Stochastic Tools in Mathematics and Science, 3rd ed.; Springer: New York, NY, USA, 2013.
7. Lin, K.K.; Lu, F. Data-driven model reduction, Wiener projections, and the Koopman–Mori–Zwanzig formalism. J. Comput. Phys. 2020, 424, 109864.
8. Kondrashov, D.; Chekroun, M.D.; Ghil, M. Data-Driven Non-Markovian Closure Models. Physica D 2015, 297, 33–55.
9. Harlim, J.; Li, X. Parametric Reduced Models for the Nonlinear Schrödinger Equation. Phys. Rev. E 2015, 91, 053306.
10. Lei, H.; Baker, N.A.; Li, X. Data-Driven Parameterization of the Generalized Langevin Equation. Proc. Natl. Acad. Sci. USA 2016, 113, 14183–14188.
11. Xie, X.; Mohebujjaman, M.; Rebholz, L.G.; Iliescu, T. Data-Driven Filtered Reduced Order Modeling of Fluid Flows. SIAM J. Sci. Comput. 2018, 40, B834–B857.
12. Chekroun, M.D.; Kondrashov, D. Data-Adaptive Harmonic Spectra and Multilayer Stuart–Landau Models. Chaos 2017, 27, 093110.
13. Chorin, A.J.; Lu, F. Discrete approach to stochastic parametrization and dimension reduction in nonlinear dynamics. Proc. Natl. Acad. Sci. USA 2015, 112, 9804–9809.
14. Lu, F.; Lin, K.K.; Chorin, A.J. Data-based stochastic model reduction for the Kuramoto–Sivashinsky equation. Physica D 2017, 340, 46–57.
15. Pathak, J.; Hunt, B.; Girvan, M.; Lu, Z.; Ott, E. Model-Free Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach. Phys. Rev. Lett. 2018, 120, 024102.
16. Ma, C.; Wang, J.; E, W. Model Reduction with Memory and the Machine Learning of Dynamical Systems. Commun. Comput. Phys. 2018, 25, 947–962.
17. Harlim, J.; Jiang, S.W.; Liang, S.; Yang, H. Machine learning for prediction with missing dynamics. J. Comput. Phys. 2020.
18. Parish, E.J.; Duraisamy, K. A Paradigm for Data-Driven Predictive Modeling Using Field Inversion and Machine Learning. J. Comput. Phys. 2016, 305, 758–774.
19. Duan, J.; Wei, W. Effective Dynamics of Stochastic Partial Differential Equations; Elsevier: Amsterdam, The Netherlands, 2014.
20. Stinis, P. Renormalized Mori–Zwanzig-Reduced Models for Systems without Scale Separation. Proc. R. Soc. A 2015, 471, 20140446.
21. Hudson, T.; Li, X.H. Coarse-Graining of Overdamped Langevin Dynamics via the Mori–Zwanzig Formalism. Multiscale Model. Simul. 2020, 18, 1113–1135.
22. Choi, Y.; Carlberg, K. Space–Time Least-Squares Petrov–Galerkin Projection for Nonlinear Model Reduction. SIAM J. Sci. Comput. 2019, 41, A26–A58.
23. Jiang, S.W.; Harlim, J. Modeling of missing dynamical systems: Deriving parametric models using a nonparametric framework. Res. Math. Sci. 2020, 7, 1–25.
24. Marion, M.; Temam, R. Nonlinear Galerkin methods. SIAM J. Numer. Anal. 1989, 26, 1139–1157.
25. Jolly, M.S.; Kevrekidis, I.G.; Titi, E.S. Approximate inertial manifolds for the Kuramoto–Sivashinsky equation: Analysis and computations. Physica D 1990, 44, 38–60.
26. Rosa, R. Approximate inertial manifolds of exponential order. Discrete Contin. Dynam. Syst. 1995, 3, 421–448.
27. Novo, J.; Titi, E.S.; Wynne, S. Efficient methods using high accuracy approximate inertial manifolds. Numer. Math. 2001, 87, 523–554.
28. Zelik, S. Inertial manifolds and finite-dimensional reduction for dissipative PDEs. Proc. R. Soc. Edinb. A 2014, 144, 1245–1327.
29. Zhang, H.; Harlim, J.; Li, X. Computing linear response statistics using orthogonal polynomial based estimators: An RKHS formulation. arXiv 2019, arXiv:1912.11110.
30. Pan, S.; Duraisamy, K. Data-driven discovery of closure models. SIAM J. Appl. Dyn. Syst. 2018, 17, 2381–2413.
31. E, W.; Khanin, K.; Mazel, A.; Sinai, Y.G. Invariant Measures for Burgers Equation with Stochastic Forcing. Ann. Math. 2000, 151, 877–960.
32. Chorin, A.J. Averaging and Renormalization for the Korteveg–deVries–Burgers Equation. Proc. Natl. Acad. Sci. USA 2003, 100, 9674–9679.
33. Chorin, A.J.; Hald, O.H. Viscosity-Dependent Inertial Spectra of the Burgers and Korteweg–deVries–Burgers Equations. Proc. Natl. Acad. Sci. USA 2005, 102, 3921–3923.
34. Bec, J.; Khanin, K. Burgers Turbulence. Phys. Rep. 2007, 447, 1–66.
35. Beck, M.; Wayne, C.E. Using Global Invariant Manifolds to Understand Metastability in the Burgers Equation with Small Viscosity. SIAM J. Appl. Dyn. Syst. 2009, 8, 1043–1065.
36. Wang, Z.; Akhtar, I.; Borggaard, J.; Iliescu, T. Two-Level Discretizations of Nonlinear Closure Models for Proper Orthogonal Decomposition. J. Comput. Phys. 2011, 230, 126–146.
37. Dolaptchiev, S.; Achatz, U.; Timofeyev, I. Stochastic closure for local averages in the finite-difference discretization of the forced Burgers equation. Theor. Comput. Fluid Dyn. 2013, 27, 297–317.
38. Benner, P.; Gugercin, S.; Willcox, K. A Survey of Projection-Based Model Reduction Methods for Parametric Dynamical Systems. SIAM Rev. 2015, 57, 483–531.
39. Quarteroni, A.; Manzoni, A.; Negri, F. Reduced Basis Methods for Partial Differential Equations: An Introduction; Springer: Berlin/Heidelberg, Germany, 2015; Volume 92.
40. Sinai, Y.G. Two results concerning asymptotic behavior of solutions of the Burgers equation with force. J. Stat. Phys. 1991, 64, 1–12.
41. Da Prato, G. An Introduction to Infinite-Dimensional Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006.
42. Cox, S.M.; Matthews, P.C. Exponential time differencing for stiff systems. J. Comput. Phys. 2002, 176, 430–455.
43. Kassam, A.K.; Trefethen, L.N. Fourth-order time stepping for stiff PDEs. SIAM J. Sci. Comput. 2005, 26, 1214–1233.
44. Gottlieb, D.; Orszag, S. Numerical Analysis of Spectral Methods: Theory and Applications; SIAM: Philadelphia, PA, USA, 1977.
45. Fan, J.; Yao, Q. Nonlinear Time Series: Nonparametric and Parametric Methods; Springer: New York, NY, USA, 2003.
46. Lu, F.; Lin, K.K.; Chorin, A.J. Comparison of continuous and discrete-time data-based modeling for hypoelliptic systems. Commun. Appl. Math. Comput. Sci. 2016, 11, 187–216.
47. Verheul, N.; Crommelin, D. Stochastic parameterization with VARX processes. arXiv 2020, arXiv:2010.03293.
48. Kutoyants, Y.A. Statistical Inference for Ergodic Diffusion Processes; Springer: Berlin/Heidelberg, Germany, 2004.
49. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
50. Brockwell, P.; Davis, R. Introduction to Time Series and Forecasting; Springer: New York, NY, USA, 2002.
51. Billings, S.A. Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatiotemporal Domains; John Wiley and Sons: Hoboken, NJ, USA, 2013.
52. Györfi, L.; Kohler, M.; Krzyzak, A.; Walk, H. A Distribution-Free Theory of Nonparametric Regression; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006.
53. Lu, F.; Zhong, M.; Tang, S.; Maggioni, M. Nonparametric inference of interaction laws in systems of agents from trajectory data. Proc. Natl. Acad. Sci. USA 2019, 116, 14424–14433.
54. She, Y. Thresholding-Based Iterative Selection Procedures for Model Selection and Shrinkage. Electron. J. Statist. 2009, 3, 384–415.
55. Quade, M.; Abel, M.; Kutz, N.J.; Brunton, S.L. Sparse Identification of Nonlinear Dynamics for Rapid Model Recovery. Chaos 2018, 28, 063116.

**Figure 1.** Relative error in energy spectrum reproduced by the NAR models with different memory lengths p, in four settings of $(K,\sigma )$. As the time lag p increases, the relative error tends to first decrease and then increase, particularly in (**b**,**d**) with $\sigma =0.2$.

**Figure 2.** Energy spectrum of NAR models with $p=1$ and the K-mode Galerkin systems in four settings of $(K,\sigma )$. The time step is $\delta =5dt$ for the NAR models and $dt$ for the Galerkin models. The NAR models accurately reproduce the true energy spectrum in all settings.

**Figure 3.** Estimated coefficients $({c}_{k,1}^{v},{c}_{k,1}^{R},{c}_{k,j}^{w})$ in NAR models with $p=1$ and $\delta =5dt$ in four settings of $(K,\sigma )$. The estimators tend to converge fast as the trajectory length T and number M increase; note that the coefficients ${c}_{k,1}^{w}$ are at the scale of ${10}^{-4}$ or ${10}^{-3}$.

**Figure 4.** Relative error in energy spectrum reproduced by the NAR models with time steps $\delta =dt\times \mathrm{Gap}$ for $\mathrm{Gap}\in \{5,10,20,30,40,50\}$ in four settings of $(K,\sigma )$. All NAR models have time lag $p=1$. The missing $\mathrm{Gap}$s in (**a**,**b**) lead to numerically unstable NAR models. Thus, the maximal $\delta $s that an NAR model can reach are $\delta \in [0.01,0.02)$ and $\delta \in [0.04,0.05)$ for (**a**,**b**) respectively, and $\delta \ge 0.16$ for (**c**,**d**).

**Figure 5.** Marginal PDFs and K-S statistics (Kolmogorov–Smirnov statistics, the maximum difference between the cumulative distribution functions). In each of (**a**–**d**), the top panels plot the empirical marginal PDFs of the real parts of the Fourier modes, from data (True), the K-mode Galerkin system (Galerkin), and the NAR models with $p=1$ and $\delta =\mathrm{Gap}\times dt$ with $\mathrm{Gap}=5$; the bottom panels are the K-S statistics of NAR models with different time steps $\delta =dt\times \mathrm{Gap}$, up to the largest $\mathrm{Gap}$ such that the NAR model is numerically stable.

**Figure 6.** ACFs (autocorrelation functions). In each of (**a**–**d**), the top panels show the ACFs of the real parts of the Fourier modes when $\mathrm{Gap}=5$; the bottom panels are the relative errors (in ${L}^{2}([0,3])$-norm) of the NAR models with different time steps $\delta =dt\times \mathrm{Gap}$, up to the largest $\mathrm{Gap}$ such that the NAR model is numerically stable.

**Figure 7.** The mean CFL numbers of the full models and the K-mode Galerkin systems. The mean CFL number is computed along a trajectory with ${10}^{5}$ steps. The time step is $dt=0.001$ for the full model and $\delta =dt\times \mathrm{Gap}$ for the K-mode Galerkin system. When $(\sigma =1,K=8)$, the K-mode Galerkin system blows up for $\mathrm{Gap}>80$, so its CFL number is missing afterwards. The stars (✩) mark the largest $\mathrm{Gap}$ such that the NAR model is numerically stable. The red and blue squares are where the full model’s mean CFL numbers agree with those of the K-mode Galerkin systems. The relative errors in energy spectrum in Figure 4c,d are smallest when the $\mathrm{Gap}$’s are closest to these squares.

| Model | Notation | Description |
|---|---|---|
| Full model | $u(x,t)={\sum}_{\vert k\vert \ge 1}{\widehat{u}}_{k}\left(t\right){e}^{i{q}_{k}x}$ | solution of (1) in its Fourier series |
| | $f(x,t)={\sum}_{1\le \vert k\vert \le {K}_{0}}{\widehat{f}}_{k}\left(t\right){e}^{i{q}_{k}x}$ | stochastic force in (2) in its Fourier series |
| | $v(x,t)={\sum}_{\vert k\vert \le K}{\widehat{u}}_{k}\left(t\right){e}^{i{q}_{k}x}$ | the resolved variable, the target process for closure modeling |
| | $w(x,t)={\sum}_{\vert k\vert >K}{\widehat{u}}_{k}\left(t\right){e}^{i{q}_{k}x}$ | the unresolved variable; $u=v+w$ in (12) |
| | $\nu $, $\sigma $ | the viscosity in (1) and the strength of the stochastic force |
| | $N$, $dt$ | number of modes and time step-size in numerical solution |
| Reduced models | $K$ | number of modes in reduced (NAR) models in (17) |
| | ${\left({u}_{k}^{n}\right)}_{\vert k\vert \le K}$ | state variable in reduced model, corresponding to ${\widehat{u}}_{k}\left({t}_{n}\right)$ |
| | $\delta =dt\times \mathrm{Gap}$ | observation time interval |
| | ${R}_{k}^{\delta}$, ${\Phi}^{n}$, ${g}^{n}$ | parametric terms in the NAR model in (10) and (17) |

| | Full Model in (4) | Reduced Model in (10) or (17) |
|---|---|---|
| State variables | ${\widehat{u}}_{k}\left({t}_{n}\right)$ or $\widehat{u}\left({t}_{n}\right)$ in (4) and (9) | ${u}_{k}^{n}$ or ${u}^{n}$ in (10) |
| Resolved variable | $v(x,{t}_{n})$ or $v$, in (6) and (12) | the vector $({u}_{-K}^{n},\dots ,{u}_{K}^{n})$ in (17) |
| Unresolved variable | $w(x,t)$ or $w$ in (7) and (12) | NA |
| Stochastic force | white noise ${\widehat{f}}_{k}\left({t}_{n}\right)$ in (9) | white noise ${f}_{k}^{n}$ in (10) |
| Noise introduced in inference | NA | ${g}^{n}$ in (10) |
| Flow map of resolved variable | $F$ in Equation (8) | Equation (10) |

| Model | Parameter | Description |
|---|---|---|
| Full model | $\nu =0.02$, $L=1$ | viscosity, interval length of the equation |
| | $N=128$, $dt=0.001$ | number of modes, time step-size |
| | ${K}_{0}=4$ | number of modes in the stochastic force |
| | $\sigma =1$ or $0.2$ | standard deviation of the stochastic force |
| Reduced models | $K=8$ or $2$ | number of modes in the reduced model |
| | $\delta =dt\times \mathrm{Gap}$ | observation time interval |
| | $\mathrm{Gap}\in \{5,10,20,30,40,50,80,160\}$ | gap of time steps |


© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lu, F. Data-Driven Model Reduction for Stochastic Burgers Equations. *Entropy* **2020**, *22*, 1360.
https://doi.org/10.3390/e22121360
