Bayesian Identification of Dynamical Systems

Abstract: Many inference problems relate to a dynamical system, as represented by dx/dt = f(x), where x ∈ R^n is the state vector and f is the (in general nonlinear) system function or model. Since the time of Newton, researchers have pondered the problem of system identification: how should the user accurately and efficiently identify the model f, including its functional family or parameter values, from discrete time-series data? For linear models, many methods are available, including linear regression, the Kalman filter and autoregressive moving averages. For nonlinear models, an assortment of machine learning tools have been developed in recent years, usually based on neural network methods, or various classification or order reduction schemes. The first group, while very useful, provide "black box" solutions which are not readily adaptable to new situations, while the second group necessarily involve the sacrificing of resolution to achieve order reduction. To address this problem, we propose the use of an inverse Bayesian method for system identification from time-series data. For a system represented by a set of basis functions, this is shown to be mathematically identical to Tikhonov regularization, albeit with a clear theoretical justification for the residual and regularization terms, respectively as the negative logarithms of the likelihood and prior functions. This insight justifies the choice of regularization method, and can also be extended to access the full apparatus of the Bayesian inverse solution. Two Bayesian methods, based on the joint maximum a posteriori (JMAP) and variational Bayesian approximation (VBA), are demonstrated for the Lorenz equation system with added Gaussian noise, in comparison to the regularization method of least squares regression with thresholding (the SINDy algorithm). The Bayesian methods are also used to estimate the variances of the inferred parameters, thereby giving the estimated model error; this is an important advantage of the Bayesian approach over traditional regularization methods.


Introduction
Many problems of inference involve a dynamical system, as represented by:

dx/dt = f(x)  (1)

where x ∈ R^n is the observable state vector, a function of time t (and/or some other parameters), and f ∈ R^n is the (in general nonlinear) system function or model. Given a set of discrete time series data [x(t_1), x(t_2), x(t_3), ...] from such a system, how should a user accurately and efficiently identify the model f? In dynamical systems theory, this is referred to as system identification, although for many problems of known mathematical structure, it can be simplified into a problem of parameter identification. The question then leads into deeper questions concerning the purpose of the prediction of f, and whether it is desired to reproduce a time series exactly, or more simply to extract its important mathematical and/or statistical properties. For linear models, many methods are available for identification of the dynamical system (1), including linear regression, the Kalman filter and autoregressive moving averages. For nonlinear models, an assortment of machine learning tools have been developed in recent years, usually based on neural networks or evolutionary computational methods, or various classification or order reduction schemes. The first group, while very useful, provide "black box" solutions which are not readily adaptable to new situations, while the second group necessarily involve the sacrificing of resolution to achieve order reduction.
Very recently, a number of researchers in dynamical and fluid flow systems have applied sparse regression methods for system identification from time series data [e.g. 1-3]. The regression is used to determine a matrix of coefficients which, when multiplied by a matrix of functional operations, can be used to reproduce the time series. Such methods generally involve a regularization technique to conduct the sparse regression. However, both the regularization term and its coefficient are usually implemented in a heuristic or ad hoc manner, without much fundamental guidance on how they should be selected for any particular dynamical system.
In this study, we present a Bayesian framework for the system identification (or parameter identification) of a dynamical system using the Bayesian maximum a posteriori (MAP) estimate, which is shown to be equivalent to a variant of Tikhonov regularization. This Bayesian reinterpretation provides a rational justification for the choices of the residual and regularization terms, respectively as the negative logarithms of the likelihood and prior functions. The Bayesian approach can be readily extended to the full apparatus of the Bayesian inverse solution, for example to quantify the uncertainty in the model parameters, or even to explore the functional form of the posterior. In this study, we compare the prominent regularization method of least squares regression with thresholding (the SINDy algorithm) to two Bayesian methods, by application to the Lorenz system with added Gaussian noise. We demonstrate an advantage of the Bayesian methods, in their ability to calculate the variances of the inferred parameters, thereby giving the estimated model errors.

Theoretical Foundations
In recent years, a number of researchers have implemented sparse regression methods for the system identification of a variety of dynamical systems [e.g. 1-3]. The method proceeds from a recorded time series, which for m time steps of an n-dimensional parameter x is assembled into the m × n matrix:

X = [x(t_1), x(t_2), ..., x(t_m)]^T  (2)

in which row i contains the state x^T(t_i),
and similarly for the time derivative:

Ẋ = [ẋ(t_1), ẋ(t_2), ..., ẋ(t_m)]^T  (3)

The user then chooses an alphabet of c functions, which are applied to X to populate an m × c matrix library, for example of the form:

Θ(X) = [1, X, X^2, ..., sin(X), cos(X), ...]  (4)

in this case based on polynomial and trigonometric functions. The time series data for the dynamical system (1) are then analyzed by the matrix product:

Ẋ = Θ(X) Ξ  (5)

in which Ξ is a c × n matrix of coefficients ξ_ij ∈ R. The matrix Ξ is commonly computed by inversion of (5) using sparse regression. This generally involves a minimization equation of the form:

Ξ̂ = argmin_Ξ J(Ξ)  (6)

where ^ indicates an inferred value, based on an objective function consisting of residual and regularization terms:

J(Ξ) = ||Ẋ − Θ(X) Ξ||_α^β + λ ||Ξ||_γ  (7)

where ||·||_p is the p-norm, λ ∈ R is the regularization coefficient and α, β, γ ∈ R are constants. For dynamical system identification, (6)-(7) have been variously implemented with different choices of α, β and γ. Instead of (7), to enforce a sparse solution, some authors have implemented least squares regression with iterative thresholding, known as the sparse identification of nonlinear dynamics (SINDy) method [1]:

Ξ̂ = argmin_Ξ ||Ẋ − Θ(X) Ξ||_2^2, with coefficients below a threshold iteratively set to zero and the remainder refitted  (8)

This has been shown to converge to (7) with α = β = 2 and γ = 0 [7]. Other authors have implemented an objective function containing an information criterion, to preferentially select models with fewer parameters [2]. The above methods have been shown to have strong connections to the mathematical methods of singular value decomposition (SVD), dynamic mode decomposition (DMD) and Koopman analysis using various Koopman operators [e.g. 8-10].
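The library construction (4) and the thresholded least squares iteration (8) are straightforward to sketch. The following Python/NumPy illustration is our own minimal stand-in (the study itself used the published Matlab code of [2]); the quadratic-only library, the threshold value and the function names are illustrative assumptions, not the reference implementation:

```python
import numpy as np

def build_library(X):
    """A minimal stand-in for the alphabet Theta(X): constant, linear
    and quadratic terms only (no trigonometric columns)."""
    m, n = X.shape
    cols = [np.ones((m, 1)), X]
    for i in range(n):                      # quadratic terms x_i * x_j, i <= j
        for j in range(i, n):
            cols.append((X[:, i] * X[:, j]).reshape(-1, 1))
    return np.hstack(cols)

def stlsq(Theta, dXdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares, the iteration behind (8)."""
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold      # zero the small coefficients...
        Xi[small] = 0.0
        for k in range(dXdt.shape[1]):      # ...and refit the survivors,
            big = ~small[:, k]              # one state variable at a time
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dXdt[:, k],
                                             rcond=None)[0]
    return Xi
```

Each pass zeroes coefficients below the threshold and refits the surviving columns by ordinary least squares, which is what drives the solution toward sparsity.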
In the Bayesian approach to this problem [e.g. 11-13], it is recognized that instead of (5), the time series decomposition should be written explicitly as:

Ẋ = Θ(X) Ξ + E  (9)

where E is a noise or error term, representing the uncertainty in the measurement data. The variables Ẋ, X, Ξ and E are considered to be probabilistic, each represented by a probability density function (pdf) defined over its applicable domain. Instead of trying to invert (9), the Bayesian considers the posterior probability of Ξ given the data, as given by Bayes' rule:

p(Ξ | Ẋ, X) = p(Ẋ | X, Ξ) p(Ξ) / p(Ẋ | X)  (10)

The simplest Bayesian method is to consider the maximum a posteriori (MAP) estimate of Ξ, given by maximization of (10):

Ξ̂ = argmax_Ξ p(Ξ | Ẋ, X)  (11)

For greater fidelity, it is convenient to consider the logarithmic maximum instead of (11), hence from (10):

Ξ̂ = argmin_Ξ [− ln p(Ẋ | X, Ξ) − ln p(Ξ)]  (12)

If we now make the simple assumption of unbiased multivariate Gaussian noise with covariance matrix Γ, we have:

p(E) ∝ (det Γ)^(−1/2) exp(−½ ||E||²_(Γ⁻¹))  (13)

where det is the determinant and ||A||²_B = Aᵀ B A is the norm defined by the B bilinear product [13]. From (9), this gives the likelihood

p(Ẋ | X, Ξ) ∝ (det Γ)^(−1/2) exp(−½ ||Ẋ − Θ(X) Ξ||²_(Γ⁻¹))  (14)

If we also assign a multivariate Gaussian prior with covariance matrix Σ,

p(Ξ) ∝ (det Σ)^(−1/2) exp(−½ ||Ξ||²_(Σ⁻¹))  (16)

then the MAP estimator (12) becomes [13]:

Ξ̂ = argmin_Ξ [½ ||Ẋ − Θ(X) Ξ||²_(Γ⁻¹) + ½ ||Ξ||²_(Σ⁻¹)]  (17)

We see that the Bayesian MAP provides a minimization formula based on an objective function, which is remarkably similar to that used in the regularization method (6)-(7). Indeed, for isotropic variances of the noise Γ = σ²I and prior Σ = σ_Ξ²I, where I is the identity matrix, (17) reduces to the common regularization formula (6)-(7) with α = β = γ = 2 and λ = σ²/σ_Ξ² [11]. In Bayesian inference, any additional parameters can also be incorporated into the inferred posterior pdf. In the present study, the covariance matrices Γ of the noise in (14) and Σ of the prior in (16) are unknown. It is desirable to determine these directly from the Bayesian inversion process. Using the above simple model of isotropic variances, the posterior can be written as:

p(Ξ, σ², σ_Ξ² | Ẋ, X) ∝ p(Ẋ | X, Ξ, σ²) p(Ξ | σ_Ξ²) p(σ²) p(σ_Ξ²)  (18)

In the Bayesian joint maximum a posteriori (JMAP) algorithm, (18) is maximized with respect to Ξ, σ² and σ_Ξ², to give the estimated parameters Ξ̂, σ̂² and σ̂_Ξ². In the variational Bayesian approximation (VBA), the posterior in (18) is approximated by q(Ξ, σ², σ_Ξ²) = q₁(Ξ) q₂(σ²) q₃(σ_Ξ²). The individual MAP estimates of each parameter are then calculated iteratively, using the Kullback-Leibler divergence K = ∫ q ln(q/p) dΞ dσ² dσ_Ξ² as the convergence criterion.
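For the isotropic case Γ = σ²I, Σ = σ_Ξ²I, the MAP estimator (17) has the closed form Ξ̂ = (ΘᵀΘ + λI)⁻¹ ΘᵀẊ with λ = σ²/σ_Ξ², i.e. exactly ridge (Tikhonov) regression, and the Gaussian posterior covariance σ²(ΘᵀΘ + λI)⁻¹ is what supplies parameter variances. The following NumPy sketch is our own illustration, assuming the two variances are known rather than estimated by the JMAP/VBA hyperparameter iterations:

```python
import numpy as np

def map_estimate(Theta, dXdt, sigma2, sigma2_xi):
    """Closed-form Gaussian MAP estimate for isotropic noise (Gamma = sigma2*I)
    and prior (Sigma = sigma2_xi*I): ridge regression with
    lam = sigma2 / sigma2_xi, as in the reduction of (17) to (6)-(7)."""
    lam = sigma2 / sigma2_xi
    c = Theta.shape[1]
    return np.linalg.solve(Theta.T @ Theta + lam * np.eye(c), Theta.T @ dXdt)

def posterior_covariance(Theta, sigma2, sigma2_xi):
    """Posterior covariance of each column of Xi under the same Gaussian
    model; its diagonal gives the parameter variances (hence error bars)."""
    lam = sigma2 / sigma2_xi
    c = Theta.shape[1]
    return sigma2 * np.linalg.inv(Theta.T @ Theta + lam * np.eye(c))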
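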

Application
To compare the traditional and Bayesian methods for dynamical system identification, a number of time series of the Lorenz system were generated and analyzed by several regularization methods, including SINDy, JMAP and VBA.The Lorenz system is described by the nonlinear equation [14]: with parameter values [σ, ρ, β] commonly assigned to [10, 8  3 , 28] to generate chaotic behavior with a strange attractor.The analyses were conducted in Matlab 2018a on a MacBook Pro with 2.8 GHz Intel Core i7, with numerical integration by the ode45 function, using a time step of 0.01 and total time of 100.The calculated position data X were then augmented by additive random noise, drawn from the standard normal distribution multiplied by a scaling parameter of 0.2.The regularization processes were then executed using a modified version of the published SINDy code and other utility functions [2], and modified forms of the JMAP and VBA functions implemented previously [11] with parameters a 0 = 10 8 and b 0 = 10 −8 .For comparisons, the inferred parameters were then used to recalculate the time series and derivatives by a further function call.In the Bayesian algorithms, the estimated variances of the parameters and the prior were also calculated, assuming inverse gamma distributions for the variance priors; for JMAP this has an analytical solution, while for VBA the solution is found iteratively using a minimum Kullback-Leibler convergence criterion [11].

Results
The calculated noisy data for the Lorenz system are illustrated in Figure 1a,b, respectively for the parameter values and their derivatives.The calculated regularization results are then presented in Figures 2-4, respectively for the SINDy, JMAP and VBA methods.In each of these plots, the first subplot illustrates the difference in each inferred parameter (i.e., ξ ij − ξij ), while the second subplot gives the inferred time series of the parameters X, showing the noisy time series x(t), the inferred series x(t) and their differences.
As evident in these plots, the three methods were approximately as effective in selection of the coefficients to recreate the Lorenz system.Of the other regularization methods published by [2], the iterative hard thresholding least squares and orthogonal matching pursuit also performed well, while the LASSO algorithm was unsuccessful for any system examined.
As noted, the two Bayesian methods also provided the variances of the predicted parameters, shown in Figures 3a and 4a as error bars corresponding to the standard deviations.These calculations indicate the inferred parameter errors to be larger than previously appreciated, for example ±1.878 × 10 −10 in the coefficient of x in all three series predicted by both JMAP and VBA.These values give a more realistic estimate of the inherent errors in the system identification method than suggested by the SINDy regularization.

Conclusions
We examine the problem of system identification of a dynamical system, represented by a nonlinear equation system dx/dt = f (x), from discrete time series data.For this, we present a Bayesian inference framework based on the Bayesian maximum a posteriori (MAP) estimate, which for the assumption of Gaussian likelihood and prior functions, is shown to be equivalent to a variant of Tikhonov regularization.This Bayesian reinterpretation provides a clear theoretical justification for the choices of the residual and regularization terms, respectively as the negative logarithms of the likelihood and prior functions.The Bayesian approach is readily extended to the full apparatus of the Bayesian inverse solution, for example to quantify the uncertainty in the model parameters, or even to explore the functional form of the posterior pdf.
In this study, we compare the regularization method of least squares regression with thresholding (the SINDy algorithm) to two Bayesian methods JMAP and VBA, by application to the Lorenz system with added Gaussian noise.The Bayesian methods are shown to perform almost as effectively as SINDy for parameter estimation and reconstruction of the Lorenz time series.More importantly, the Bayesian methods also provide the variances -hence the standard deviations -of the inferred parameters, thereby giving a mathematical estimate of the system identification error.This is an important advantage of the Bayesian approach over traditional regularization methods.

Figure 1 .
Figure 1.Calculated noisy data for the Lorenz system: (a) parameters X, and (b) derivatives Ẋ.