Parameter Estimation of Partially Observed Turbulent Systems Using Conditional Gaussian Path-Wise Sampler

: Parameter estimation of complex nonlinear turbulent dynamical systems using only partially observed time series is a challenging topic. The nonlinearity and partial observations often impede using closed analytic formulae to recover the model parameters. In this paper, an exact path-wise sampling method is developed, which is incorporated into a Bayesian Markov chain Monte Carlo (MCMC) algorithm in light of data augmentation to efﬁciently estimate the parameters in a rich class of nonlinear and non-Gaussian turbulent systems using partial observations. This path-wise sampling method exploits closed analytic formulae to sample the trajectories of the unobserved variables, which avoid the numerical errors in the general sampling approaches and signiﬁcantly increase the overall parameter estimation efﬁciency. The unknown parameters and the missing trajectories are estimated in an alternating fashion in an adaptive MCMC iteration algorithm with rapid convergence. It is shown based on the noisy Lorenz 63 model and a stochastically coupled FitzHugh–Nagumo model that the new algorithm is very skillful in estimating the parameters in highly nonlinear turbulent models. The model with the estimated parameters succeeds in recovering the nonlinear and non-Gaussian features of the truth, including capturing the intermittency and extreme events, in both test examples.


Introduction
Complex nonlinear turbulent dynamical systems are ubiquitous in many areas, including geophysics, engineering, material science, and neural science [1][2][3][4]. Estimating the model parameters is a crucial prerequisite for data assimilation, uncertainty quantification, and prediction [5,6]. However, parameter estimation of these nonlinear systems is a big challenge for the following reasons. First, these systems often have strong non-Gaussian features [7,8]. The intermittent instability, the extreme events, and the associated non-Gaussian probability density functions (PDFs) with significant skewness or fat tails can easily break the conditions that are required in many existing parameter estimation algorithms designed for more stabilized or laminar systems. Second, the dimension of many complex turbulent systems is quite large [9][10][11][12], which requires the development of efficient parameter estimation algorithms that can overcome the curse of dimensionality [13,14]. Finally, one of the key features of many turbulent dynamical systems is the availability of only the partial observations [15][16][17]. Despite the lack of observational data, the unresolved variables nevertheless play a crucial role in transferring energy with the observed or resolved variables in a highly nonlinear way [18,19]. These unobserved variables are also able to trigger various non-Gaussian phenomena including extreme events in both the resolved and unresolved scales [6,20]. Unfortunately, many desirable mathematical formulations for the parameter estimation algorithms in the presence of full observations are no longer applicable to the partially observed turbulent systems, which leads to a fundamental difficulty in recovering the model parameters.
Several parameter estimation algorithms have been developed for turbulent systems. One simple and frequently adopted parameter estimation approach is based on computing maximum a posteriori or maximum likelihood estimates [21][22][23]. Another commonly used method is the expectation-maximization algorithms [24][25][26][27], where the target energy function is evaluated recursively. Other parameter estimation algorithms based on specific cost functions are also routinely seen depending on the applications [28][29][30][31][32][33].
Among all the parameter estimation algorithms, the Bayesian approaches are widely applied in many practical problems. One of the commonly used Bayesian techniques is the Markov chain Monte Carlo (MCMC) method [34][35][36], which samples the so-called posterior distribution of the parameters conditioned on the given observations. In the situation with partial observations, data augmentation [37] is often incorporated into the MCMC algorithm [38,39]. Then, the MCMC algorithm is applied to sample the parameters and the unobserved trajectories in an alternating fashion. The main challenge in simultaneously sampling the parameters and the hidden trajectories comes from the high-dimensional representation of the latter even in the presence of a low-dimensional dynamical system. In fact, the degree of freedom needed to characterize a trajectory equals the total time instants. Even with the time discretization approximation, a large number of the temporal points are required to fully represent a trajectory. This raises a challenge in the sampling procedure. To this end, several innovative updating strategies have been developed in the context of multivariate diffusion models [40][41][42][43][44][45][46][47][48]. While these methods are designed for general nonlinear models, specific methods can be developed in light of the model structures, which then have the potential to further facilitate sampling and parameter estimation.
The focus of this paper is on a rich class of complex nonlinear turbulent systems [49][50][51]. This rich class of the complex turbulent models is highly nonlinear and non-Gaussian but certain conditional distributions are Gaussian. These systems have wide applications in modeling and predicting many important phenomena in geophysics [52][53][54], climate science [55][56][57], and neural science [58,59]. Specific examples include the Lorenz models (Lorenz 63, 84, and two-layered 96 models), Boussinesq equations with noise, and a variety of the stochastically coupled FitzHugh-Nagumo model. Despite being highly nonlinear, this paper aims at showing that the conditional Gaussian structure can be explored to the development of an efficient path-wise sampling strategy, which is then incorporated into an MCMC algorithm to advance the parameter estimation using only partial observations. The advantage of such a path-wise sampling algorithm is that the sampled trajectories can be written using closed analytic formulae in light of the conditional Gaussian statistics, which are not achievable for general nonlinear systems. Therefore, the new parameter estimation algorithm is extremely efficient and avoids potential numerical errors. It also has the potential to be applied in high-dimensional systems due to its analytically solvable characteristics.
The rest of the paper is organized as follows. The rich class of complex nonlinear turbulent systems with conditional statistics is introduced in Section 2. Section 3 focuses on the development of the sampling and parameter estimation algorithm. Section 4 includes the numerical test experiments. The paper is concluded in Section 5.

The Nonlinear Turbulent Models with Conditional Gaussian Statistics
The general framework of the nonlinear and non-Gaussian turbulent dynamical models considered in this paper is as follows [49,51]: In the coupled system (1a) and (1b), the state variables are X and Y, which are both column vectors representing the multi-dimensional states. On the right-hand side of (1a) and (1b), A 0 , and a 0 are column vectors while A 1 , a 1 , b, and B are matrices. All of them can depend nonlinearly on the state variables X and time t but not the state variable Y. The vectors W 1 and W 2 are independent white noise sources. For complex nonlinear turbulent systems with partial observations, X can be regarded as the collection of the observed variables while Y contains the variables that are not directly observed.
Because of the strong nonlinearity in A 0 (X, t), A 1 (X, t), a 0 (X, t) and a 0 (X, t) and their coupling with Y, its interaction with the white noise can easily induce non-Gaussian statistics for both the marginal and the joint distributions of the coupled system (1a) and (1b). Extreme events, intermittency, and complex nonlinear interactions between different variables all appear in such systems. Despite being highly nonlinear and non-Gaussian, the system in (1a) and (1b) has a desirable analytic feature. That is, conditioned on a given realization of X(s) for s ≤ t, the distribution of Y(t) is Gaussian. Therefore, the coupled system is called the conditional Gaussian nonlinear system.
The conditional Gaussian nonlinear modeling framework (1a) and (1b) includes many physics-constrained nonlinear stochastic models, large-scale dynamical models in turbulence, fluids, and geophysical flows, as well as stochastically coupled reaction-diffusion models in neuroscience and ecology. A gallery of examples of conditional Gaussian systems in engineering, neuroscience, ecology, fluids, and geophysical flows can be found in [49].

The General Bayesian MCMC Parameter Estimation Framework with Partial Observations
Denote θ as the collection of the parameters to be estimated in the coupled system (1a) and (1b), where θ can appear in A 0 , A 1 , a 0 , a 1 , b, and B. Assume X obs is the given trajectory of the state variable X in (1a) and (1b) within the time interval [0, T]. The goal of parameter estimation in the Bayesian framework is to find the conditional distribution, which is also known as the posterior distribution, p(θ|X obs ) ∝ p(θ)p(X obs |θ). ( In (2), p(θ) is the prior distribution, which is based on the empirical knowledge of choosing the parameter θ, while p(X obs |θ) is the likelihood function. Since the nonlinear system (1a) and (1b) is only partially observed, the closed analytic formula of solving (2) is in general not available. One strategy to solve (2) is to apply the data augmentation technique [37], which augments the target state space variables from θ to (θ, Y hidden ). Here, Y hidden is the trajectories of the unobserved variable Y. In such a way, the Bayesian framework in (2) is modified to It is clear from (3a)-(3c) that, if p(θ, Y hidden |X obs ) is solved, then p(θ|X obs ) can be reached by marginalizing over the auxiliary variable Y hidden . Therefore, the goal here is to develop a suitable MCMC algorithm to compute the posterior distribution p(θ, Y hidden |X obs ). In addition to the prior distribution p(θ), the other two components on the right-hand side of (3c) represent the probability of the sampled trajectory of the hidden variable given the parameters p(Y hidden |θ) and the likelihood of the observed trajectory conditioned on both the sampled one and parameters p(X obs |Y hidden , θ).
Comparing with (2), the most significant improvement in (3a)-(3c) is the solvability of the likelihood function. In fact, if the unobserved trajectory Y hidden is given, then a closed analytic formula of the likelihood p(X obs |Y hidden , θ) is available. Specifically, assume a time discretization of both the trajectories X obs and Y hidden denoted bỹ Here, the time interval is discretized with small subintervals 0 = t 0 < t 1 < . . . , < t n = T. The notation with tilde is for discrete approximation while that without tilde is for the original continuous time series. With such notations, the discrete version of the likelihood p(X obs |Ỹ hidden , θ) can be written down explicitly: according to the Markov property. With a small time increment, each probability on the right-hand side of (5) can be computed using a Gaussian approximation based on a forward Euler scheme p(X obs i |X obs i−1 ,Ỹ hidden i−1 , θ). The product of these Gaussian distributions finishes the calculation of the likelihood.
On the other hand, the discrete approximation of p(Y hidden |θ), i.e., p(Ỹ hidden |θ) can be solved via (7) Formulas (6)-(7) basically follow the Girsanov formula [60]. In other words, the Equation (1b) is rescaled by where Y hidden scaled is the rescaled state variable of Y hidden by dividing the noise coefficient b componentwise, assuming b is a diagonal matrix as in most of the applications. The reason to introduce the rescaled variable with an identity noise coefficient in (8) is the following. The probability L(Ỹ hidden ; θ) can be viewed as the Radon-Nikodym derivative of the measure induced by (1b) with respect to the Wiener measure scaled by the diffusion coefficient b. Given that two such dominating measures (with different b) are singular, a direct MCMC implementation will result in a reducible algorithm [38]. Note that the function α(·) depends on X as well. However, for the notation simplicity here, we drop such an explicit dependence.
It is also noticeable that the left-hand side is only proportional instead of equaling to the right-hand side in (6). Nevertheless, this is sufficient for the MCMC algorithm. The reason is that each step in the MCMC algorithm is to compute the ratio between the two p(Ỹ hidden |θ) in the previous and current steps. The pre-constants will cancel with each other in computing the ratio.

The Path-Wise Sampler of the Unobserved Trajectories
To apply the Bayesian MCMC parameter estimation algorithm, what remains is to develop an efficient path-wise sampler of the unobserved trajectory Y hidden (t) given the parameters θ, i.e., sampling from p(Y hidden |θ) appearing in (3a)-(3c). In what follows, the goal is to develop an efficient sampling algorithm for the conditional distribution p(Y hidden |X obs , θ). Note that p(Y hidden |X obs , θ) is only a subset of p(Y hidden |θ). Nevertheless, since the MCMC procedure also involves sampling from p(X obs |Y hidden , θ) in (3a)-(3c), the trajectories of Y hidden sampled from p(Y hidden |X obs , θ) will lead to a higher probability for p(X obs |Y hidden , θ). This means that only those Y hidden that are sampled from p(Y hidden |X obs , θ) will have a higher acceptance in the MCMC algorithm. Thus, sampling from p(Y hidden |X obs , θ) is a more efficient way than sampling from p(Y hidden |θ). Notably, the path-wise sampler here exploits the closed analytic properties of the conditional statistics associated with the coupled nonlinear systems in (1a) and (1b). Data assimilation is utilized in the development of such a path-wise sampler. Throughout this subsection (Section 3.2), we use p(Y hidden |X obs , θ) to denote p(Y hidden |X obs ) since the parameters here are fixed.
As a prerequisite of the path-wise sampler, let us start with solving the conditional (or posterior) distribution p(Y hidden (t)|X obs (s), s ≤ t) at a fixed time instant t. This is also known as the filtering. For the conditional nonlinear Gaussian systems (1a) and (1b), the solution of the filtering can be written down with closed analytic formulae. Theorem 1. For the conditional Gaussian nonlinear systems (1a) and (1b), given one realization of X obs (s) for s ∈ [0, t], the conditional distribution p(Y hidden (t)|X obs (s), is Gaussian. The conditional mean µ f and the conditional covariance R f are given by the following explicit formulae: where · * is the conjugate transpose.
The filtering exploits the observational information up to the current time instant for improving the initialization of real-time prediction. On the other hand, the optimal offline point-wise statistical state estimation can be carried out by making use of the observational information in the entire training period. This leads to more accurate state estimation, achieved by a forward pass of filtering and a backward pass of smoothing. Such a smoothing state estimation is directly associated with the path-wise sampler to be developed shortly.

Theorem 2.
For the conditional Gaussian nonlinear systems (1a) and (1b), given one realization of X obs (t) for t ∈ [0, T], the optimal nonlinear smoother estimate p(Y hidden (t)|X obs (s), s ∈ [0, T]) ∼ N (µ s (t), R s (t)) is conditional Gaussian, where the conditional mean µ s (t) and the conditional covariance R s (t) satisfy the following equations: The backward notations on the left-hand side of (10a) and (10b) are understood as ← − − dµ s = lim ∆t→0 µ s (t) − µ s (t + ∆t). The starting value of the nonlinear smoother (µ s (T), R s (T)) is the same as the filter estimate at the endpoint (µ f (T), R f (T)).
With these tools in hand, we are ready to show the efficient path-wise sampler of the hidden trajectory Y hidden . In fact, the path-wise sampler can be achieved by applying a conditional sampling formula, which is carried out in an extremely efficient way by solving a simple SDE.

Theorem 3.
For the conditional Gaussian nonlinear systems (1a) and (1b), conditioned on one realization of X(t) for t ∈ [0, T], the optimal conditional sampling formula of the trajectories of Y(t) in the same time interval satisfies the following explicit formula: where W Y hidden is an independent white noise source and the square root of the positive definite matrix bb * is unique. The initial value of Y hidden is drawn from N (µ f (T), R f (T)).
The detailed proofs of these formulae can be found in [51,61]. Thanks to the explicit formula in (11), the conditional sampling procedure using (11) for sampling Y hidden is computationally much cheaper than the traditional numerical methods. Such a desirable feature facilitates the efficiency and accuracy of the path-wise sampler required in the parameter estimation.

The Setup of the MCMC Algorithm
A Metropolis-Hasting algorithm is adopted as the MCMC scheme for calculating the posterior distribution (3a)-(3c), which alternates between updating θ conditionally on Y hidden and X obs and sampling the unobserved trajectory Y hidden conditionally on θ and X obs . To overcome the issue of low acceptance ratio in the standard MCMC algorithm, an adaptive Metropolis algorithm [62] is adopted here for proposing the new parameter values. This adaptive MCMC algorithm combines both the shape and the scale adaptations of the Metropolis and the acceptance probability optimization in light of a Cholesky factorization. More specifically, the target acceptance ratio for this adaptive algorithm is set to be 35% for numerical experiments done in the next section.
The entire procedure of the MCMC algorithm is as follows: 1.
Assign an initial guess of the parameters θ. Setup the iteration index k = 1 in the MCMC.

2a.
Sample a trajectory of the unobserved variable Y hidden using the current parameter values. Compute the prior distribution and the likelihood, i.e., the right-hand side of (6). 2b.
Propose a new parameter candidate for θ based on the adaptive MCMC algorithm. 2c.
Compute the ratio of the product of the prior and the likelihood functions associated with the parameter values used in the current and the previous steps, based on the observed trajectory and the hidden trajectory generated in (2a). 2d.
Accept or reject the proposed parameter values.

Numerical Test Experiments
The numerical integration time step is ∆t = 0.005 for the test done in Section 4.1, while that for Section 4.2 is ∆t = 0.001, which are also the time step for the discretization in computing the likelihood. A forward Euler scheme is used for computing the likelihood while the stochastic system is solved via the Euler-Maruyama scheme [63]. In all the tests below, a non-informative prior distribution p(θ) is used for all the parameters in the drift part of the model, which means the same probability is used for θ as the prior information regardless of its value. For the parameters in the diffusion part of the model, the only knowledge in the prior information is their non-negativity.

The Noisy Lorenz 63 Model
The first test model considered here is the noisy version of the Lorenz 63 model: The deterministic Lorenz 63 model [64] is a simplified mathematical model for atmospheric convection. These equations relate the properties of a two-dimensional fluid layer uniformly warmed from below and cooled from above. These equations describe the rate of change of three quantities with respect to time. Physically, x is proportional to the rate of convection, y to the horizontal temperature variation, and z to the vertical temperature variation. The constants σ, ρ, and β are system parameters proportional to the non-dimensional numbers: Prandtl number, Rayleigh number, and certain physical dimensions of the layer itself [65]. In addition to the applications in atmospheric science, the Lorenz 63 model is also widely used as a test model in many other areas, including lasers, dynamos, and chemical reactions [66][67][68][69][70][71][72]. The noisy version of the Lorenz 63 includes more turbulent and small-scale features coming from the unresolved scales. The time series of the noisy Lorenz 63 model thus contains more fluctuations, but they retain the characteristics in the original Lorenz 63. The parameters adopted here are the classical values for the deterministic part together with moderate noise strengths, Figure 1 shows one realization of the model simulation. It is clear that the onedimensional trajectories of the noisy Lorenz 63 model are similar to those from the original deterministic version but with more fluctuations. The chaotic features are illustrated in these time series. The three-dimensional phase plot also clearly indicates a butterfly shape of the model solution.
Assume one realization of the time series y and z is available as the observations. In such a situation, the noisy Lorenz 63 model is a conditional Gaussian system. The new parameter estimation algorithm including the efficient path-wise sampling of the unobserved variable x is thus applicable. To test the parameter estimation skill, the length of the observed time series for y and z is assumed to have 500 time units. Figure 2 shows the trace plots of the estimated parameters. The initial values of the trace plots start from random numbers that are very different from the truth. After about 200 iterations, all the parameters (except σ x ) converge to the truth, indicating a skillful estimation of these parameters. As a further sanity check, the trace plots of the loglikelihood are also included, which confirm the skill of the algorithm. To understand the behavior of the estimated value of σ x , it is important to note that σ x is the noise coefficient in the unobserved variable. In fact, it has only weak contributions to the fluctuation of the observed time series y and z in the presence of the additional random noise from σ y and σ z . In other words, the parameter σ x is only weakly identified in such a chaotic system. Therefore, it is anticipated that the estimation of σ x is not as accurate as other parameters. Nevertheless, as is shown in Figure 3, the model with the estimated parameters succeeds in capturing the key statistics of the true signal. In particular, the non-Gaussian PDF of x and the oscillating temporal autocorrelation function (ACF), which measures the memory, of z are both recovered with high accuracy. These results confirm the skill of the estimated parameters. Figure 1. The noisy Lorenz 63 model (12) with parameters in (13). (a-c) one realization of the onedimensional trajectories of x, y, and z, respectively; (d) the associated three-dimensional phase plot of (x, y, z).

A Stochastically Coupled FitzHugh-Nagumo (FHN) Model
The FitzHugh-Nagumo model (FHN) is a prototype system that describes the excitable medium. It aims at characterizing the activation and deactivation dynamics of a spiking neuron [73]. Stochastic versions of the FHN model mimic more realistic situations, where noise-induced limit cycles were widely studied and applied in the context of stochastic resonance [74][75][76][77]. With these desirable dynamical and statistical features, FHN models have applications in many other areas [78][79][80][81].
The stochastically coupled FHN model considered here is as follows: In (14), the parameter e is a crucial one that represents the time scale ratio. The time scale here is characterized by := 1/e, which is much smaller than one (e.g., = 10 −2 ), implying that u is a fast variable while v is a slow variable. The parameter a is another key parameter that controls the dynamical behavior of the FHN model. The value a > 1 is required in order to guarantee that the system has a global attractor in the absence of noise and diffusion. However, the random noise is able to drive the system above the threshold level of global stability and triggers limit cycles intermittently that enriches the dynamical features.
The stochastically coupled FHN model (14) is used as the test model for the parameter estimation algorithm. The model (14) has a single neuron, and it contains the model families with both coherence resonance and self-induced stochastic resonance. The model fits into the conditional Gaussian nonlinear modeling framework, where the observed variable is u and the hidden variable is v. With different choices of the noise strength σ v , (14) exhibits rich dynamical behaviors. The following two regimes are considered for the parameter estimation experiments.
Regime 1: Coherent temporal pattern and nearly regular oscillations. In this dynamical regime, the temporal patterns are highly coherent due to the choice of a weak noise σ v . The time series of both u and v have nearly regular oscillations with large bursts in u around every fixed number of units. The associated statistical equilibrium PDF of u is bimodal while that of v is skewed. The following parameters are adopted to generate the true time series for regime 1: Regime 2: Mixed temporal pattern and irregular oscillations. In this dynamical regime, the temporal patterns are strongly mixed due to the choice of strong noise σ v . The oscillation period of u becomes more irregular as in regime 1. Correspondingly, the PDF of v becomes nearly Gaussian with a large variance. In addition, a small a is adopted in this regime, leading to a relatively short time between two consecutive excitations of u. The following parameters are adopted to generate the true time series for regime 2: Figure 4 includes the time series, the PDFs, and the phase diagrams of the two dynamical regimes. It is clear that both dynamical regimes have strong non-Gaussian features, where a bimodal distribution is seen in u. Such a bimodal distribution corresponds to a highly intermittent time series with strong extreme events. Regime 1 also has a highly skewed PDF of v, as was described above.
In the setup of parameter estimation, only a single realization of u is available. The total length of the observed u is only 30 time units, representing a moderately short observational period. Figure 5 shows the trace plots of the parameter estimation. Despite being started from a random initial condition, most of the parameters converge quickly to the truth. The only exception is σ v in regime 2. This is again understandable because regime 2 is more turbulent and the parameter σ v is hard to be identified from the noisy observations. Nevertheless, such a weak identification property also means the feedback from σ v to the observed process u is not strong, and therefore the estimated parameters can still recover the significant nonlinear and non-Gaussian features of the true dynamics, especially for the observed variable u. Figures 6 and 7 show the comparison of the model with the true parameters and that with the estimated parameters. Note that the trajectories are computed with different random noise seeds in the two runs. Therefore, the trajectories are not expected to be point-wise consistent with each other. Nevertheless, the overall behavior of the trajectories from the model with the estimated parameters resembles that of the truth. In particular, the model with the estimated parameters generates similar intermittent time series with strong extreme events as the truth. In addition, the statistics from the model with the estimated parameters are also very close to the truth. In regime 1, the highly non-Gaussian PDFs and the irregularly oscillating-decaying ACFs are both perfectly captured by the model with the estimated parameters. In the even tougher dynamical regime 2, despite the small error, the ACFs and the PDFs are still recovered with high accuracy.  (14), where the parameters of the two regimes are listed in (15) and (16). The three columns show the model trajectories, the PDFs, and the phase diagram of both dynamical regimes.

Conclusions
In this paper, a parameter estimation algorithm is developed for a rich class of complex nonlinear and non-Gaussian turbulent dynamical models using only partial observations. The algorithm follows a data augmentation technique, which allows updating the estimated parameters and the trajectories of the unobserved variables alternatively within a Bayesian MCMC framework. Different from the traditional path-wise sampling methods, the method proposed here exploits the mathematical structure of the class of nonlinear systems, which facilitates the development of a closed analytic formula to sample the trajectories of the unobserved variables. Therefore, the sampling approach is mathematically exact and computationally efficient. The parameter estimation algorithm is applied to both the noisy Lorenz 63 model, which is a highly chaotic system, and a stochastically coupled FHN model, which is a highly nonlinear model with strong non-Gaussian statistics and extreme events.
The test models used in this study are still simple low-dimensional models. Nevertheless, the closed analytic formula of the sampling technique should allow an efficient path-wise sampling for even high-dimensional systems. This will be left as a future work. The rigorous mathematical study of the convergence of the algorithm is another future work.  Data Availability Statement: The study did not report any data.

Conflicts of Interest:
The authors declare no conflict of interest.