A Matlab Toolbox for Extended Dynamic Mode Decomposition Based on Orthogonal Polynomials and p-q Quasi-Norm Order Reduction

Abstract: Extended Dynamic Mode Decomposition (EDMD) allows an approximation of the Koopman operator to be derived in the form of a truncated (finite-dimensional) linear operator in a lifted space of (nonlinear) observable functions. EDMD can operate in a purely data-driven way using either data generated by a numerical simulator of arbitrary complexity or actual experimental data. An important question at this stage is the selection of basis functions to construct the observable functions, which in turn determines the sparsity and efficiency of the approximation. In this study, attention is focused on orthogonal polynomial expansions and an order-reduction procedure called p-q quasi-norm reduction. The objective of this article is to present a Matlab library to automate the computation of the EDMD based on the above-mentioned tools and to illustrate the performance of this library with a few representative examples.


Introduction
In contrast to traditional modeling approaches, in which it is necessary to formulate a general nonlinear model that depends on a set of parameters to replicate the dynamics of a system under consideration, EDMD [1] builds upon numerical data (simulation or actual experiments) to provide a finite-dimensional (truncated) linear representation of the system dynamics in a lifted space of nonlinear observable functions, making it akin to a black-box modeling paradigm, e.g., transfer functions or autoregressive models [2][3][4]. While the approximation provided by EDMD therefore remains nonlinear in the original state variables, it is linear in a transformed space, which is called the observable space, function space, or vector-valued function of observables (VVFO), among others (for the remainder of this article, we use observables when referring to this space, while the example codes use the VVFO terminology). In other words, the EDMD formulation does not represent the system by a linearized representation of the form x(k + 1) = A x(k) (where A is a Jacobian); rather, it describes the evolution of observables f(x(k)) through a linear operator U, i.e.,

f(x(k + 1)) = U f(x(k)).    (1)

EDMD is closely related to other decompositions such as the Karhunen-Loève decomposition (KLD) [5], singular value decomposition (SVD) [6], proper orthogonal decomposition (POD) [7], and its direct precursor, dynamic mode decomposition (DMD) [8]. These decompositions all produce linear approximations of the behavior of the system near a fixed point, offering the possibility of using linear system analysis tools. EDMD extends this possibility to a region of the state space that is larger than the neighborhood of the fixed point. Indeed, EDMD describes the nonlinear dynamics while being linear in the function space, and therefore provides more than local information while preserving the linear characteristics of the above-mentioned decompositions (KLD, SVD, POD, DMD).
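The distinction drawn in Equation (1) can be made concrete with a toy example that is not part of the library: a minimal sketch, assuming a simple quadratic discrete-time system (all names and values below are illustrative) for which a finite set of observables happens to evolve exactly linearly, with no truncation error.

```matlab
% Toy illustration of Eq. (1): for the discrete-time system
%   x1(k+1) = a*x1(k)
%   x2(k+1) = b*x2(k) + c*x1(k)^2
% the observables f(x) = [x1; x2; x1^2] evolve EXACTLY linearly,
% f(x(k+1)) = U*f(x(k)), even though the system is nonlinear in x.
a = 0.9; b = 0.5; c = 0.3;
U = [a 0 0;
     0 b c;
     0 0 a^2];                    % linear operator on the lifted state
f = @(x) [x(1); x(2); x(1)^2];    % vector of observables
x = [1.2; -0.7];                  % arbitrary state
xnext = [a*x(1); b*x(2) + c*x(1)^2];
% the lifted image of the next state equals U applied to the lifted state
err = norm(f(xnext) - U*f(x));    % zero up to round-off
```

In general, a finite set of observables only approximately closes under the dynamics, which is why EDMD minimizes a residual; this toy case is one of the rare exactly closed liftings.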
As such, it is sometimes called a linearization in the large.
Koopman mode decomposition (KMD) [9][10][11] exploits the linearity of these decompositions, using their spectrum (eigenvalues and eigenvectors) to obtain an approximation of the Koopman operator [12]. From an approximation of the Koopman operator, it is possible to analyze nonlinear systems in terms of their stability and regions of attraction [9,[13][14][15]. Additionally, EDMD (or the approximate Koopman operator) can be used in the context of optimal control and model predictive control [16][17][18][19]. These developments show the importance of having accurate EDMD approximations for analysis and control.
There are several variants of the EDMD algorithm, which use norm-based expansions, radial-basis functions, kernel-based representations [20], orthogonal polynomials, and their variations [21,22]. These representations provide tools for analyzing nonlinear systems via spectral decomposition, and represent the fundamentals for developing synthesis algorithms such as EDMD for control [23].
In this paper, attention is focused on the use of orthogonal polynomials for the expression of the observable functions and an order reduction method based on p-q quasi norm [24,25]. Several application examples are described together with Matlab codes which constitute a practical library for users interested in applying EDMD to engineering and scientific problems.
The library provides several Matlab functions to compute a pq-EDMD approximation of a dynamical system based on one of three possible algorithms. The first algorithm is the original least squares solution, which is suitable for data with a high signal-to-noise ratio. For data with higher levels of noise, a maximum likelihood approximation is proposed, which is valid for unimodal Gaussian distributions (i.e., Gaussian noise for a system with a unique stable equilibrium point). Finally, a solution based on regularized least squares is provided, which promotes sparsity in the regression matrix.
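The sparsity-promoting third option can be illustrated with a generic sketch; this is not the library's RegularizedDecomposition code, only an illustration of row-wise l1-regularized regression using the Statistics and Machine Learning Toolbox lasso function, with placeholder data and an assumed regularization weight.

```matlab
% Generic sketch of a sparsity-promoting alternative to ordinary least
% squares (illustrative only; not the library's implementation).
% Px, Py: N-by-d matrices of observables evaluated on x and y snapshots.
rng(0);
Px = randn(200, 5);                    % placeholder lifted snapshots
Utrue = diag([0.9 0.5 0 0 0.2]);       % sparse ground-truth operator
Py = Px * Utrue.' + 0.01*randn(200, 5);
lambda = 1e-3;                         % regularization weight (assumed)
U = zeros(5, 5);
for i = 1:5
    % fit the i-th row of U independently: Py(:,i) ~ Px * U(i,:).'
    U(i, :) = lasso(Px, Py(:, i), 'Lambda', lambda).';
end
% small entries of U are driven to exactly zero by the l1 penalty
```

As the text notes, the sparser operator is cheaper to use downstream but typically less accurate than the unpenalized least-squares solution.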

Extended Dynamic Mode Decomposition
This section starts with an introduction to the traditional EDMD formulation to identify nonlinear models of dynamical systems. The procedure is exemplified by the Duffing equation, a benchmark problem in the literature for testing the reliability of the algorithm.
The core idea of the EDMD algorithm is to transform a nonlinear system into an augmented linear system. The first proponent of this idea was Takata [26], who describes the method as a formal linearization. Much later, the method emerged in its current form after the development of the dynamic mode decomposition algorithm [8] and its several extensions. In the following, the original EDMD algorithm [1] is first presented, followed by pq-EDMD, which makes use of orthogonal polynomials and order reduction based on the p-q quasi-norm. This latter version is particularly interesting, as it yields increased numerical accuracy and allows systematic application.

The Basic EDMD Formulation
Consider as an example the unforced Duffing oscillator, which is a nonlinear spring that has different behaviors depending on the parameterization. The set of differential equations that governs this system is

ẋ1 = x2,    (2)
ẋ2 = −δ x2 − α x1 − β x1³,    (3)

where state x1 is the displacement, state x2 is the velocity, α is the stiffness of the spring, which is related to the linear force of the spring according to Hooke's law, δ is the amount of damping in the system, and β is the proportion of nonlinearity present in the stiffness of the spring. Figure 1 shows the phase plane of the system for three different sets of parameters and six random initial conditions (i.e., six initial conditions are generated randomly within the range ics ∈ [−2, 2]², starting from a known seed, rng(1)). This choice of initial conditions produces an appropriate set of trajectories for calculating the approximation and testing its accuracy. The system of Equations (2) and (3) is integrated with a Matlab ODE solver, e.g., ode23s, and the results are collected at a constant sample period ∆t = 0.1 s for a total of 20 s. The result of this numerical integration is a set of six trajectories of two state variables with 201 points per variable. Each of these trajectories is an element of a structure array in Matlab with the fields "Time" and "SV". The choice of a structure array instead of a tensor comes from the possibility of having trajectories of different lengths, e.g., experimental data of different lengths, a feature that becomes important in systems where having redundant data near the asymptotically stable attractors has a negative impact on the approximation. As EDMD is a data-driven algorithm, certain trajectories serve as a training set while others serve as a testing set. The amount of data necessary to obtain an accurate approximation depends on the system under consideration as well as its information content (large data sets can bear little information content if experiments are not properly designed).
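The data-generation step described above can be sketched as follows; the parameter values are illustrative placeholders (not necessarily those of the paper's three scenarios), while the solver, sampling, seed, and Time/SV field names follow the text.

```matlab
% Sketch of the Duffing data generation described in the text.
% alpha, delta, beta values below are placeholders for illustration.
alpha = -1; delta = 0.5; beta = 1;
duffing = @(t, x) [x(2); -delta*x(2) - alpha*x(1) - beta*x(1)^3];
rng(1);                                  % reproducible initial conditions
n_traj = 6;
tspan = 0:0.1:20;                        % dt = 0.1 s over 20 s -> 201 samples
system = struct('Time', cell(n_traj, 1), 'SV', cell(n_traj, 1));
for trj = 1:n_traj
    ics = 4*rand(2, 1) - 2;              % uniform in [-2, 2]^2
    [t, sv] = ode23s(duffing, tspan, ics);
    system(trj).Time = t;                % 201-by-1 time vector
    system(trj).SV = sv;                 % 201-by-2 state trajectory
end
```

Storing each orbit as one element of the structure array is what later allows trajectories of different lengths to coexist in the same data set.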
The EDMD algorithm captures the dynamic of the system on the portion of the state space covered by the trajectories in the training set. Therefore, designing experiments that maximize the coverage of the state space can reduce the amount of data while having a positive effect on accuracy.
Each trajectory of the training set is in discrete time, i.e., x(k + 1) = T(x(k)), where x ∈ Rⁿ is the state of the system, k ∈ Z⁺₀ is the non-negative discrete time, and T : Rⁿ → Rⁿ is an unknown nonlinear mapping that provides the evolution of the discrete-time trajectories. To construct the database, the training trajectories are organized in so-called snapshot pairs {(x_m, y_m)}, m = 1, . . . , N, where y_m = T(x_m). The snapshots function presented in Listing 1 handles the available trajectories, dividing them into training and testing sets of the appropriate type; the training set consists of matrices containing the x and y data, while the testing set is a cell array containing one orbit per index of the cell. The choice of cell arrays instead of a tensor is to offer the possibility of testing trajectories of different lengths. The tr_ts argument is a Matlab structure containing the indexes of the original set of orbits, which serve as the training and testing sets. The fields of this structure must be tr_index and ts_index, respectively. In addition, there is a normalization flag to use when necessary, e.g., when the order of magnitude of different states is dissimilar. The testing-set construction in Listing 1 reads:

[xts, yts] = deal(cell(testing_number, 1));
for trj = 1:testing_number
    xts{trj} = normalize(system(tr_ts.ts_index(trj)).SV(1:end-2, :), ...
        'scale', scale(1:size(system(tr_ts.ts_index(trj)).SV, 2)));
end

Notice that the generation of the snapshots avoids the last element in each trajectory, SV(1:end-2) for x and SV(2:end-1) for y. As stated before, avoiding redundant data at the asymptotically stable attractors improves the performance of the algorithm. In Matlab, stopping the simulation early, e.g., once convergence towards the attractor has been achieved, causes the last output interval to be shorter than ∆t = 0.1 s. This small difference can increase the error in the construction of the approximation. Conversely, if the numerical integration of the system is not stopped near the steady state, it is not necessary to eliminate the last element in the trajectories. The next step in the development of the EDMD is the definition of the observable space as a set of functions f_i(x) : Rⁿ → C for i = 1, . . . , d, which represent a transformation from the state space into an arbitrary function space. This transformation of the state is equivalent to a change of variables z = f(x), where z ∈ C^d. In the Matlab library, the observables are described by orthogonal polynomials, where each element of the set of observables is the tensor product of n univariate polynomials up to order p ∈ N⁺.
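The training half of this bookkeeping can be sketched in a few lines; this is a minimal illustration of assembling the snapshot pairs, not the library's snapshots function (which adds normalization and the testing-set handling on top). The toy structure below mimics the SV layout described earlier.

```matlab
% Minimal sketch of assembling snapshot pairs {(x_m, y_m)} from training
% trajectories. The toy 'system' structure imitates the SV field layout.
rng(0);
system = struct('SV', {cumsum(randn(50, 2)), cumsum(randn(40, 2))});
tr_index = [1 2];                  % indexes of the training trajectories
xtr = []; ytr = [];
for trj = tr_index
    sv = system(trj).SV;
    % drop the final sample, as recommended when the integration is
    % stopped near an attractor
    xtr = [xtr; sv(1:end-2, :)];   % x_m
    ytr = [ytr; sv(2:end-1, :)];   % y_m = T(x_m)
end
% each row of ytr is the one-step successor of the same row of xtr
```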
For example, in the Duffing oscillator, a set of observables with p = 2 and a Hermite basis of orthogonal polynomials is obtained by taking all products of univariate Hermite polynomials of orders 0 to 2 in the two state variables. Note that the first entry is the product of a zero-order polynomial in both of the state variables; the orders of the polynomial basis in the two state variables can be summarized by a matrix of polynomial orders (hpm in the code below), with one row per state variable and one column per observable:

% Preallocate the matrix of symbolic variables
sym_univariate = sym(ones(size(hpm)));
% Loop over the state variables to assign the polynomial
% according to the order and variable
for state_variable = 1:n
    sym_univariate(state_variable, 1:end) = ...
        hermiteH(hpm(state_variable, 1:end), xsym(state_variable));
end
base = prod(sym_univariate, 1);
% The function omits the intercept (first element).
% Otherwise, the evaluation of the whole training matrix
% is not possible at once, and the calculation should be
% achieved in a loop.

The function f can evaluate the complete set of training trajectories at once with the omission of the first observable that corresponds to the intercept or constant value (the consideration of this observable would require another programming strategy involving loops, resulting in higher computational time and memory allocation). Notice the versatility of using orthogonal polynomials, as the whole realm of available orthogonal polynomials in Matlab is a valid choice, e.g., Laguerre, Legendre, Jacobi, etc. Note that the code snippet defines the function of observables as a row vector instead of the column vector notation used in the theoretical descriptions.
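One way the order matrix behind the snippet above can be generated, together with the vectorized evaluation mentioned in the text, is sketched below; variable names other than hpm, xsym, and base are illustrative, and this is not necessarily how the library builds its indexes.

```matlab
% Sketch: build the order matrix for n = 2, p = 2, the symbolic Hermite
% basis, and a vectorized numeric evaluator (Symbolic Math Toolbox).
n = 2; p = 2;
xsym = sym('x', [n 1]);
[o1, o2] = ndgrid(0:p);               % all order combinations for n = 2
hpm = [o1(:).'; o2(:).'];             % one column of orders per observable
sym_univariate = sym(ones(size(hpm)));
for state_variable = 1:n
    sym_univariate(state_variable, :) = ...
        hermiteH(hpm(state_variable, :), xsym(state_variable));
end
base = prod(sym_univariate, 1);       % 9 observables; the first is constant
% omitting the intercept lets one call evaluate a whole N-by-n data matrix
Psi = matlabFunction(base(2:end), 'Vars', {xsym.'});
Z = Psi(randn(100, n));               % 100-by-8 matrix of lifted data
```

Grouping the symbolic variables in 'Vars' makes matlabFunction index the columns of a single input matrix, which is what avoids evaluation loops over the training set.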
After the observables have been defined, their time evolution can be computed according to

f(x(k + 1)) = U f(x(k)) + r(x(k)),    (6)

where U ∈ R^{d×d} is the matrix that provides the linear evolution of the observables and r(x) is the error in the approximation. One of the main advantages of the EDMD algorithm resides in the fact that the system description is linear in the function space. The solution to (6) is the matrix U that minimizes the residual term r(x), which can be expressed as a least-squares criterion:

U = arg min_U Σ_{m=1}^{N} ‖ f(y_m) − U f(x_m) ‖²,    (7)

where N is the total number of samples in the training set. The ordinary least-squares (OLS) solution is provided by

U = A/G,    (8)

where the A/G notation replaces the inverse of the design matrix G, as even when using a basis formed by the products of orthogonal polynomials, the design matrix can be close to ill-conditioned (i.e., close to singular). This notation, particularly in Matlab, specifies that a more robust algorithm compared to the inverse or pseudo-inverse is necessary to obtain the approximation. For the solution of (8), the matrices G, A ∈ R^{d×d} are defined by

G = (1/N) Σ_{m=1}^{N} f(x_m) f(x_m)ᵀ,    A = (1/N) Σ_{m=1}^{N} f(y_m) f(x_m)ᵀ.    (9)

Setting the observables as products of univariate orthogonal polynomials is an improvement, as it generally avoids the need to use a pseudo-inverse approach. Even though the sequence of polynomials in the set of observables is no longer orthogonal, it is less likely to have co-linear columns in the design matrix, improving the numerical stability of the solution. With the training set and the observables, the method for calculating the regression matrix U is shown in Listing 3. Notice that this code defines and uses all the arrays as their transpose. This change is related to the approximation of the Koopman operator, where it is necessary to calculate the right and left eigenvectors of U. The eigenfunctions of the Koopman operator are determined from the left eigenvectors of the spectral decomposition.
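The least-squares construction of G, A, and U can be sketched on synthetic data as follows; matrix names are illustrative, and the library's Listing 3 works with the transposed arrays as explained in the text.

```matlab
% Sketch of the OLS step of Eqs. (8)-(9). Psi_x and Psi_y hold the
% observables evaluated on the x and y snapshots, one snapshot per row.
rng(0);
Psi_x = randn(100, 4);                 % placeholder lifted snapshots
Utrue = 0.5*eye(4) + 0.1*randn(4);     % ground-truth operator for the demo
Psi_y = Psi_x * Utrue.';               % exact one-step images
N = size(Psi_x, 1);
G = (Psi_x.' * Psi_x) / N;             % design matrix, Eq. (9)
A = (Psi_y.' * Psi_x) / N;             % cross matrix, Eq. (9)
U = A / G;                             % mrdivide: more robust than inv(G)
% U now reproduces Utrue up to round-off, and Psi_y ~ Psi_x * U.'
```

The slash operator is exactly the A/G notation of Equation (8): Matlab solves the linear system by a factorization rather than forming an explicit inverse.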
In Matlab, the left eigenvectors result from additional algebraic manipulations of the right eigenvectors and the diagonal matrix of eigenvalues, thereby decreasing the numerical precision of the eigenfunctions. This problem is alleviated by computing the transpose of U and its spectral decomposition, so that the left eigenvectors are immediately available. In general, if U is diagonalizable, obtaining the left eigenvectors from the right ones involves a matrix inverse; considering again that the problem is close to being ill-conditioned, this inversion reduces the accuracy of the eigenfunctions.

Numerical Results with EDMD
Here, the EDMD algorithm is tested with the second case scenario for the Duffing oscillator with hard damping. The EDMD algorithm can capture the dynamics of the portion of the state space covered by the training set, which is therefore selected as the outermost trajectory in Figure 2. The five remaining trajectories are used for testing. Table 1 provides the parameters of the original EDMD algorithm. In Figure 2, the graph on the left displays the phase plane of the system and shows the training trajectory and testing trajectories along with their approximation by the EDMD algorithm with the Laguerre polynomial basis. EDMD achieves a good approximation while using only a small amount of data. However, notice that the discrete-time approximation of a system of order 2 is of dimension twenty-five. In view of this dimensionality explosion with regard to the original dimension of the state and the complexity of the system, it is necessary to introduce reduction techniques that decrease the necessary number of observables and increase the accuracy of the algorithm [24].

pqEDMD Algorithm
The extension of the EDMD algorithm is the result of speculation about the good performance of the original algorithm when coupled with a set of observables based on products of univariate orthogonal polynomials. The idea is to introduce a reduction method based on p-q quasi-norms, first introduced by Konakli and Sudret [27] in the context of polynomial chaos problems. The reduction proceeds in the following way: if the q-quasi-norm of the indexes that provide the orders of the univariate orthogonal polynomials exceeds the maximum order p of a particular observable, then this observable is eliminated from the set. To implement this procedure, the orders of an observable are defined as α_i, and the q-quasi-norm of these orders is

‖α_i‖_q = ( Σ_{j=1}^{n} α_{i,j}^q )^{1/q},    (11)

where q ∈ R⁺; Equation (11) represents a norm only when q ≥ 1, and a quasi-norm for 0 < q < 1. When p is redefined as the maximum order of a particular multivariate polynomial instead of the maximum order of the univariate elements, the sets of polynomial orders that remain in the basis are those that satisfy

‖α_i‖_q ≤ p.    (12)

The code snippet used to generate a set of observables based on Laguerre polynomials with a maximum multivariate order p = 4 and a q-quasi-norm q = 0.7 is provided in Listing 4.
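The truncation rule of Equations (11) and (12) is independent of the polynomial family, so it can be sketched on the index set alone; this is an illustrative sketch, not the library's Listing 4, and the variable names are assumptions.

```matlab
% Sketch of the p-q quasi-norm truncation (Eqs. (11)-(12)): enumerate all
% multi-indices alpha in {0,...,p}^n and keep those with
% (sum_j alpha_j^q)^(1/q) <= p.
p = 4; q = 0.7; n = 2;
[o1, o2] = ndgrid(0:p);                   % all order combinations for n = 2
alpha = [o1(:), o2(:)];                   % one multi-index per row
quasi_norm = sum(alpha.^q, 2).^(1/q);     % 0^q = 0, so zero orders are safe
alpha = alpha(quasi_norm <= p, :);        % surviving observable orders
n_observables = size(alpha, 1);           % far fewer than (p+1)^n = 25
```

Pure univariate terms such as (4, 0) survive (their quasi-norm equals p), while strongly mixed terms such as (2, 2) are pruned, which is why the reduction also lowers the effective order of the multivariate basis.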
The reduction of the basis is not only dimensional, as the p-q quasi-norm reduction also reduces the maximum order of the observables. As a rule of thumb (considering various application examples), the higher-order observables usually have a negative contribution to the accuracy of the solution.

Numerical Results with the pqEDMD
The algorithm is now applied to the Duffing oscillator with soft damping. The pqEDMD algorithm can capture the dynamics of the two attractors provided that the training set has at least one trajectory that converges to each of them. Additionally, as is the case for hard damping, each of these trajectories should be the outermost (see Figure 3). Table 2 lists the parameters of the pqEDMD algorithm. Even though the full basis achieves a low approximation error of 2.3770 × 10⁻⁴, the reduction of the observables' order reduces the empirical error by two orders of magnitude. Comparing the dimension of the full basis to the reduced one does not represent a large improvement. However, this result is due to the comparison being made against the best result after performing a sweep over several p-q values. Imposing lower p-q values on the approximation has the potential to provide smaller sets of observables while sacrificing accuracy. Next, the first case scenario is considered; here, the damping parameter is zero and the system oscillates around the fixed point at the origin. The pqEDMD algorithm can capture the dynamics of the system if the innermost and outermost limit cycles compose the training set; otherwise, the algorithm cannot capture the dynamics. Table 3 shows a summary of the simulations along with the results; it is apparent that even though the empirical error is higher than in the other two case scenarios, the approximation is accurate (see Figure 4). The sweep over different p-q values provides a reduced basis with lower error than the full basis.
Even though p-q quasi-norm reduction produces more accurate and tractable solutions, having products of orthogonal univariate polynomials does not necessarily produce an orthogonal basis. In certain scenarios, the evaluation of the observables produces an ill-conditioned design matrix G. Therefore, the next section proposes a way to eliminate even more observables from the basis, improving the numerical stability of the solution.

Improving Numerical Stability via QR Decomposition
QR decomposition [28] can be used to improve the numerical stability and reduce the number of observables even further. If we assume that the design matrix G ∈ R^{d×d} in Equation (9) is obtained based on the products of orthogonal polynomials and that there are no co-linear columns, or, in other words, that rank(G) = d holds, then it is possible to decompose this matrix into the product

G = QR,    (13)

where Q ∈ R^{d×d} is orthogonal, i.e., QᵀQ = I_d, and R ∈ R^{d×d} is upper triangular. Column-pivoting methods for QR decomposition rely on exchanging the columns of G such that, in every step of the triangularization of R and the subsequent calculation of the orthogonal columns of Q, the procedure starts with a column that is as independent as possible from the columns of G already processed. This method yields a permutation matrix P ∈ R^{d×d} such that

GP = QR,    (14)

where the permutation of columns makes the absolute values of the diagonal elements of R non-increasing, i.e., |r_{1,1}| ≥ |r_{2,2}| ≥ · · · ≥ |r_{d,d}|. Furthermore, considering that the permutation process selects the most linearly independent column of G in every step of the process, the last columns in the analysis are the ones that are close to being co-linear. Therefore, eliminating the observables related to the last columns improves the resulting condition number of G. The modified function for the calculation of the regression matrix U is provided in Listing 5.
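The pruning idea can be sketched with Matlab's built-in pivoted qr on a toy design matrix; the tolerance and variable names are illustrative choices, not the library's Listing 5.

```matlab
% Sketch of pivoted-QR pruning: small trailing diagonal entries of R flag
% observables whose columns in G are close to colinear.
G = [1 0 1.001;
     0 1 0;
     1 0 1];                       % toy design matrix; column 3 ~ column 1
[Q, R, P] = qr(G);                 % column pivoting: G*P = Q*R,
                                   % |diag(R)| non-increasing
[perm, ~] = find(P);               % perm(j): original column in position j
tol = 1e-2 * abs(R(1, 1));         % drop threshold (assumed)
keep = sort(perm(abs(diag(R)) > tol)).';   % observables worth keeping
% here keep == [2 3]: column 1 is discarded as nearly colinear with 3
```

Dropping the flagged columns (and the corresponding observables) before solving Equation (8) is what improves the conditioning of the regression.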

Matlab Package
pqEDMD() is the main class of the Matlab package; it provides an array of decompositions based on the pqEDMD algorithm in the attribute pqEDMD_array. The cardinality of the array of solutions may be less than the product of the cardinalities of p and q, as certain p-q pairs produce the same set of indices, i.e., the algorithm would compute the same decomposition more than once. In addition, the class calculates the empirical error of the approximations based on the test set and returns the best-performing approximation from the array as a separate attribute, best_pqEDMD. The code provides the complete set of solutions, as a user may opt for a compact solution that is not as accurate as the best one for tractability reasons, e.g., an MPC controller, where a smaller basis guarantees feasibility for longer horizons and has a lower computational cost.
The only required input for pqEDMD() is the system argument, where it is necessary to provide a structure array with the Time and SV fields with at least two trajectories in the array, one for training and one for testing. The remaining arguments are optional, e.g., the array of positive-integer values p, the array of positive values q, the structure of training and testing trajectories with the fields tr_index and ts_index, the string specifying the type of polynomial, the array of polynomial parameters (if the polynomial type is either "Jacobi" or "Gegenbauer"), the boolean flag of normalization, and the string indicating the decomposition method. For example, Listing 6 shows a call to the algorithm with a complete set of arguments. To provide the different approximations in the main class, pqVVFO() handles the observables for different values of p, q, and the polynomial type. Its output is the matrix of polynomial indexes, a symbolic vector of observable functions, and a matlabFunction Psi to evaluate the observables arithmetically and efficiently; it accepts a matrix of values, avoiding evaluation with loops.
The remaining classes are the implementations of decompositions based on different algorithms. ExtendedDecomposition() is the traditional least-squares method described in this article. In addition, two other decompositions are available. MaxLikeDecomposition() is used for data with noise, where the maximum likelihood algorithm assumes that the transformation of the states into the function space preserves a unimodal Gaussian distribution of the noise in the state space (this is a work in progress; preliminary results can be found in [29]). These properties of the distribution of noise in the function space are a strong assumption; nonetheless, it is sometimes possible to identify dynamical systems corrupted with noise. The last decomposition, RegularizedDecomposition(), leverages the advantages of regularized lasso regression to produce sparse solutions. Even though its solutions are more tractable, the regularized method sacrifices accuracy. Figure 5 shows the architecture of the solution with the relationships between classes. The current functionality requires the user to call pqEDMD() with the appropriate inputs and options in order to obtain an array of decompositions. This class handles the creation of the necessary pqVVFO() objects to feed into the required decomposition. It is possible to use and extend the observable class in other types of decompositions without the use of the main class. The code is available for download in the Supplementary Materials section of this paper.

An Additional Application Example
To conclude this article, an additional case study is discussed involving a set of reactions occurring in a continuously stirred tank reactor, described by the reaction network (15),

Figure 5. Package architecture.

where there are two types of substrate, s1 and s2. The first substrate is the only component in the inflow, and is the only component necessary for the replication of the first species s3 according to the replication rate constant r1. In addition to the replication of s3, the product of the first reaction is the second substrate s2, which in turn is necessary for the replication of the second species s4 according to the replication rate constant r2. The remaining variable s5 is the combination of the dead species from the two groups, where each group dies according to the reaction rates r3 and r4, respectively. The ordinary differential equations that describe the dynamics of network (15) according to the polynomial formulation of mass action kinetics [30] and the material exchange with the environment are provided by the ODE system (16), where d = 0.5 is the in/out-flow (dilution rate) of the system and the values of the reaction rates are r = [7 5 0.3 0.05]ᵀ.
With these rate constants, the system has three asymptotically stable points: the working point, where the two species s3 and s4 coexist; a point where species s3 thrives and species s4 washes out; and a wash-out point, where the concentrations of both species vanish. To construct the database, the strategy is to generate a set of orbits with an even distribution of initial conditions converging to each of the equilibrium points. Certain trajectories converging to each point are used as the training set to produce a linear expanded approximation of the system.
The set of orbits is taken from the numerical integration of the ODE (16) via the ode23s method with an output sampling ∆t = 0.1 for an arbitrary number of initial conditions until the full set of orbits has a total of 20 trajectories that converge to each point, resulting in 60 trajectories in total for the execution of the algorithm.
From each of the sets of orbits that converge to the fixed points, 50% are used for the approximation and the rest for testing the solution. It is important to have a training set with sufficient information about the trajectories of the system and, in a similar way as for the second and third scenarios of the Duffing equation, to select the trajectories that are far away from the equilibrium point. For system (16), the choice of trajectories for the training set is the ten trajectories that at any given time are furthest away from the equilibrium point to which they converge. Figure 6 shows a selection of training and testing trajectories. Assuming that the orbits of the system are in a structure array with the appropriate fields named system, and that the training and testing indexes for the approximation have been carefully selected and placed in a structure named tr_ts, the call to the pqEDMD() class that provides an accurate approximation is shown in Listing 7, where the solution is obtained through the default decomposition method (ordinary least squares), the default polynomial type (Legendre), and without normalization.

Listing 7. Class call to approximate the reaction network dynamics.
% the pqEDMD !!!
p = [3 4 5];
q = [0.5 0.7 0.9];
mak_net_approx = pqEDMD(system, p, q, 'tr_ts', tr_ts);

Table 4 shows a summary of the simulation parameters used to generate the orbits and to obtain the approximation. The clear advantage of using the reduction method lies in the comparison between the full basis of polynomials and the reduced one, i.e., from 3125 observables for p = 4 and a system of five state variables to a basis of 51 polynomials for q = 0.7. Although the computation with a full basis is computationally intensive, it leads to a solution that is not satisfactory, as the state matrix is not Hurwitz and the trajectories diverge, leading to a result with an infinite error metric. Figure 7 depicts the comparison of several testing trajectories with their corresponding approximations. It is clear that certain trajectories converge to a different fixed point than the one they are supposed to. This phenomenon causes the empirical error to grow while remaining bounded. The reason for this behavior is the lack of training trajectories near the boundary of the attraction regions of the asymptotically stable equilibrium points. For better performance of the algorithm in terms of the number of orbits necessary for the approximation, and possibly the dimension of the observable basis, an experimental design procedure is required.

Figure 7. Comparison between the testing and approximation trajectories of the biochemical reaction system.

Conclusions
This paper presents a methodology to derive discrete-time approximations of nonlinear dynamical systems via the pqEDMD algorithm and proposes a Matlab library that can hopefully help popularize the use of the method by non-expert users. The discussion of the methodology and codes is illustrated with several case studies related to the Duffing oscillator and by an example involving a biochemical reaction network.