Algebraic Method for the Reconstruction of Partially Observed Nonlinear Systems Using Differential and Integral Embedding

The identification of partially observed continuous nonlinear systems from noisy and incomplete data series is a topical problem in many branches of science, such as biology, chemistry, and physics. Two stages are needed to reconstruct a partially observed dynamical system. First, one should reconstruct the entire phase space to restore the unobserved state variables. For this purpose, integration or differentiation of the observed data series can be performed. Then, a fast algebraic method can be used to obtain a nonlinear system in the form of a polynomial dynamical system. In this paper, we extend the algebraic method proposed by Kera and Hasegawa to Laurent polynomials, which, unlike ordinary polynomials, contain negative powers of variables. We provide a theoretical basis and experimental evidence that the integration of a data series can give more accurate results than the widely used differentiation. With this technique, we reconstruct the Lorenz attractor from a one-dimensional data series and B. Muthuswamy's circuit equations from a three-dimensional data series.


Introduction
The problem of nonlinear dynamical system reconstruction from data series can be described as follows: for a given data series, a system must be found that produces data series close, in a certain sense, to the initial series. Obtaining system models from a limited set of observations is crucial for the reverse engineering of gene regulatory networks [1], the analysis of chemical oscillations [2], studies of biological circadian rhythms [3], the reconstruction of neuronal models [4], etc.
When systems described by ordinary differential equations (ODEs) are considered, one can specify the problem statement as follows: for a given data series $\{x_i\}$ and its time derivatives $\dot{x}_i$, a system of ODEs must be reconstructed as
$$\dot{x} = f(x), \quad (1)$$
where f is a vector function. Usually, some information about f is known a priori, for instance, the possible elementary operations constituting it. These operations usually form a group or, more specifically, a ring. One such ring is the polynomial ring, which consists of polynomials in several variables with coefficients belonging to a certain field, such as the field of real or complex numbers [5,6]. In this case, the reconstructed system takes the form of a polynomial dynamical system (PDS)
$$\dot{x} = h \otimes t(x), \quad (2)$$
where h is the vector of unknown coefficients of the known nonlinear terms (monomials) t, and ⊗ denotes element-wise multiplication. The problem usually consists of finding all the entries of h, taking into account that the system can be sparse and many of these coefficients can equal zero. The set of monomials t is known a priori. Nonlinear dynamical models of Type (2) are used quite often; examples can be found in the classical works of E. Lorenz, O. Rössler, and J.C. Sprott [7]. System reconstruction in terms of PDS has been investigated since the early 1980s by many authors, for example, by J. P. Crutchfield and B. S. McNamara [8], who proposed one of the classical formulations of the problem, and J. Cremers and A. Hübler, who used Legendre polynomials as a basis for the construction of the derivative function [9]. Later, Taylor-like expansions were introduced by J. L. Breeden et al. [10]. Many alternatives to PDS models have been proposed, for example, neural networks, which also include nonlinearities and coefficients found from experimental data [11]. A neural network can sometimes give more precise results and is preferable in cases of complex systems. The disadvantage of the neural-network technique is that it yields large-scale and computationally costly models. Neural networks and the PDS approach can also be combined, for example, for gene regulatory network identification. The problem of finding a reliable algorithm for determining the coefficients of PDS (2) is difficult, especially in the case of noisy and incomplete data. Determining the coefficient vector h can be considered as the optimization problem
$$h^* = \arg\min_{h} \| v - f(X) \|, \quad (3)$$
where v is the set of observed derivatives $v_i = \dot{x}_i$, and $f(X) = f(x_i)$ is the set of function values evaluated on $\{x_i\}$. The size of the unknown vector h is equal to M × L, where M is the system dimension and L is the number of available monomials.
Problem (3) can be treated as a problem of nonlinear regression. Several researchers, for example, H. Iba and D. P. Searson et al. [12,13], developed specialized versions of genetic algorithms for solving it. The main disadvantage of genetic algorithms is that their run times are, in the general case, much longer in comparison with other approaches. Meanwhile, classical optimization methods such as Newton or quasi-Newton methods are inefficient for this problem because the optimization problem is discontinuous: when one or several coefficients become equal to zero, the nonlinear regression result changes significantly.
Fortunately, Problem (3) can also be considered as a linear regression problem. Indeed, consider $\|\cdot\|$ as the 2-norm and notice that, since the set X is already known, the function values f(X) can be evaluated. Therefore, the least squares method (LSM) can be applied in order to obtain $h^*$:
$$h^* = (E^T E)^{-1} E^T v, \quad (4)$$
where $E = f(X)$.
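As a minimal illustration of the least-squares step in Equation (4), the following sketch fits the coefficients of a small monomial basis on synthetic data. The basis {1, x, y, xy}, the data, and all names here are illustrative assumptions, not taken from the paper; the solver avoids forming $(E^T E)^{-1}$ explicitly, which is numerically preferable.

```python
import numpy as np

# Sketch of the LSM step (Equation (4)) on a hypothetical monomial basis
# {1, x, y, xy}; h_true plays the role of the sparse coefficient vector h.
rng = np.random.default_rng(0)
x, y = rng.normal(size=200), rng.normal(size=200)
E = np.column_stack([np.ones_like(x), x, y, x * y])  # evaluated monomials, E = f(X)

h_true = np.array([0.5, -1.0, 0.0, 2.0])
v = E @ h_true + 0.01 * rng.normal(size=200)         # noisy "derivatives" v

# h* = argmin ||E h - v||_2, solved without the explicit normal-equations inverse
h_star, *_ = np.linalg.lstsq(E, v, rcond=None)
print(np.round(h_star, 2))
```

Even with mild noise, the recovered coefficients are close to the true ones, including the vanishing third entry, which motivates the sparsity-promoting refinements discussed below.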
The disadvantage of LSM is that Problem (3) is often ill-conditioned since some of the monomials vanish on the given data series and cause the singularity of the matrix E T E. To avoid this, Laubenbacher and Stigler proposed division by Gröbner basis to simplify the initial set of monomials used for regression [1], but this approach is not applicable to noisy data series.
Even if monomials do not vanish, or the monomial set was somehow simplified to remove all vanishing elementary functions, the matrix $E^T E$ usually has a large condition number, which makes Problem (3) difficult to solve. Recently, D. Yogatama and N. Smith considered the application of $L_1$-regularized LSM to these types of problems, which converges better than the classical version of LSM [14]. Later, H. Kera and Y. Hasegawa proposed excluding vanishing polynomials using the approximate Buchberger-Möller algorithm and, then, deleting minor terms from the regression using a specialized procedure [15]. The latter algorithm, further referred to as the Kera-Hasegawa (KH) method, underlies the reconstruction procedure developed in this paper.
From a physical point of view, some state variables needed to perform the reconstruction procedure can be unknown. The simplest case is when only the system states are known and the derivatives are not, so numerical differentiation should be applied. However, if the system is partially observed, phase space reconstruction must be performed. For example, when a circuit with a memristive element is considered, one of its state variables, the magnetic flux or charge in the memristive device, cannot be directly measured on a macroscopic scale [16]. The problem is more complicated in the domain of physical chemistry, where usually only one variable is observed; for example, in the Belousov-Zhabotinsky reaction this is the Br-ISE electric potential [17]. Using the well-known Takens embedding theorem [17], we can reconstruct the discrete-time system phase space using a shifted time series. For continuous-time reconstruction, we can use differentiation or integration. The latter technique is used quite rarely because it can cause the emergence of linear trends in the integrated data, but we will show how to avoid this and why it is preferable to differentiation.
The main aim of this paper is to establish an efficient and reliable technique for identifying chaotic circuits, especially ones containing nonlinear elements with unknown parameters. The problem arises from the fact that real circuits, even with simple nonlinearities and known parameters, need certain parameter optimization for simulation, despite the theoretical prediction that the circuit should accurately correspond to the ODE [18]. When the circuit contains a nonlinear element, the problem becomes more complicated, since the relevant model of this element may also differ from the prediction. For example, it has recently been shown that the original HP models are irrelevant to actual resistance switching devices with memory effect [16]. The relevant model proposed in [16] contains exponents, hyperbolic functions, and negative powers. This paper is organized as follows. First, we describe how a PDS can be reconstructed from complete data using the extended Kera-Hasegawa method. Second, we explain how these results can be applied to data series lacking some state variables. Third, we carry out computational experiments showing the validity of the approach: we reconstruct the Lorenz system from a 1-dimensional data series and a 4-dimensional memristive circuit proposed by B. Muthuswamy [19] from incomplete 3-dimensional data, and we show that the accuracy of numerical integration and differentiation plays an important role in the success of the reconstruction, even though the reconstruction method is noise-tolerant. Finally, we discuss the limitations of the proposed approach and give a conclusion.

Algebraic Method for ODE Reconstruction
In this section, we describe an algebraic approach to system reconstruction based on the subsequent application of the approximate Buchberger-Möller method, the least-squares method, and the deleting-minor-terms method, close to the method proposed by Kera and Hasegawa, but we extend this approach to Laurent polynomials, which, unlike ordinary polynomials with nonnegative powers only, can contain negative powers of variables.

Polynomial Rings, Ideals, ABM Algorithm
Consider a nonlinear function in Equation (1) as
$$f_k(x) = \sum_{i} h_{ki} \prod_{j} x_j^{\alpha_{ij}}, \quad (6)$$
where $\alpha_{ij}$ are the powers of each variable. The polynomial functions form a polynomial ring P over the field F, where the operations of addition and multiplication are defined as usual. More precisely, P is an Abelian group under addition, since addition is associative and commutative, has an additive inverse, and has the zero element f(x) = 0; P is a monoid under multiplication, since multiplication is associative and has the identity element f(x) = 1; finally, multiplication is distributive over addition. The field over which the ring is defined determines the properties of the coefficients $h_{ki} \in F$. In this paper, we consider P over the field of real numbers R. When $\alpha_{ij} \in \mathbb{N}$, the functions of Equation (6) form ordinary polynomials, or Taylor-like expressions; e.g., for a two-variable function,
$$f(x, y) = \sum_{i} h_i x^{\alpha_{i1}} y^{\alpha_{i2}}, \quad \alpha_{i1}, \alpha_{i2} \in \mathbb{N}. \quad (7)$$
When $\alpha_{ij} \in \mathbb{Z}$, the functions of Equation (6) are called Laurent polynomials
$$f_k(x) = \sum_{i} h_{ki} \prod_{j} x_j^{\alpha_{ij}}, \quad \alpha_{ij} \in \mathbb{Z}, \quad (8)$$
where the powers are conventionally defined within a symmetric interval $\alpha_{ij} \in [-M, M]_{\mathbb{Z}}$, but we will not require this property further. The terms in Equation (8) can be ordered in various ways. We will consider a degree-lexicographic order, where we primarily consider the degree of the entire monomial, defined as
$$\deg t = \sum_{j} |\alpha_j|.$$
Second, we consider the powers of variables selected alphabetically, organized in ascending order of the absolute value of the power. This may be exemplified for a set of Laurent monomials of 2 variables with powers in $[-1, 1]$. Originally, only positive powers are considered in the degree-lexicographic order, so we introduce this ordering here by analogy; for discussion, see Section 4.
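The ordering above can be sketched in a few lines. The following snippet enumerates the Laurent monomials of 2 variables with powers in $[-1, 1]$ and sorts them degree-lexicographically, taking the degree of a monomial as the sum of the absolute values of its powers; the secondary tie-breaking is one possible reading of the ordering described in the text, not the paper's exact convention.

```python
from itertools import product

# Enumerate exponent tuples (alpha_1, alpha_2) of Laurent monomials
# x^alpha_1 * y^alpha_2 with powers in [-1, 1].
M, lo, hi = 2, -1, 1
monomials = list(product(range(lo, hi + 1), repeat=M))

def deg(alpha):
    # "degree" of a Laurent monomial: sum of absolute values of the powers
    return sum(abs(a) for a in alpha)

# primary key: total degree; secondary: powers compared by absolute value
# (an assumed tie-breaker), then the raw tuple for determinism
monomials.sort(key=lambda a: (deg(a), [abs(p) for p in a], a))
print(monomials[:4])  # lowest-degree monomials first, starting with (0, 0) = 1
```

For two variables this yields 9 monomials, from the constant 1 up to degree-2 terms such as $x y$, $x^{-1} y$, and $x^{-1} y^{-1}$.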
The order ideal O of the polynomial ring P is a finite set of monomials in T that is closed under divisors, i.e., satisfying the condition: for each monomial t ∈ O, all monomials $t_j$ dividing it also belong to O. A "border" of the order ideal, ∂O, in an ordinary polynomial ring is a set of monomials in T defined as
$$\partial O = (x_1 O \cup \ldots \cup x_M O) \setminus O, \quad (9)$$
where "minus" denotes set subtraction. If the monomial set O = {1, x, y} is an order ideal in an ordinary polynomial ring, its border is ∂O = {x², xy, y²}.

In a Laurent polynomial ring, the border is defined similarly to Equation (9) but also includes the negative powers $x_i^{-1}$, i = 1 … M:
$$\partial O = \Big( \bigcup_{i=1}^{M} x_i O \ \cup \ \bigcup_{i=1}^{M} x_i^{-1} O \Big) \setminus O.$$
The vanishing ideal of a set of data points X is the set of all polynomials vanishing (i.e., equal to zero) on X:
$$I(X) = \{ f \in P : f(x) = 0 \ \forall x \in X \}.$$
The vanishing ideal I(X) can be spanned by a finite set of polynomials called a basis of I(X). Let O = {t₁ … t_L} be an order ideal with a border ∂O = {b₁ … b_ν}. Let G = {g₁ … g_ν} ⊂ P be a set of polynomials, and let there be an ideal I ⊆ P. The set G is an O-border prebasis if the polynomials in it have the form
$$g_j = b_j - \sum_{i=1}^{L} c_{ij} t_i, \quad c_{ij} \in F.$$
The set G is an O-border basis of I if G generates I and if the residue classes of the terms in O constitute a vector basis of P/I. Several other bases are known, such as the Gröbner basis and the Macaulay basis. To find a border basis, Buchberger-Möller (BM) algorithms can be used [15]. A border basis G is called approximate if $\|g(X)\|_2 < \varepsilon$, where g ∈ G and ε is a small positive number. A modification of the BM algorithm, known as the approximate Buchberger-Möller (ABM) algorithm, can be used to compute an approximate border basis on a noisy dataset X.
We also define the evaluation map $E_O$, which transforms a given set of monomials into the matrix of their values on the dataset X. The ABM algorithm can be described as follows. Suppose we have a dataset X, a tolerance ε, and a set of monomials σ ⊂ T in degree-lexicographic order. We should obtain a border basis G and an order ideal O. We start with degree d = 1 and O = {1}. Let L ⊂ σ initially contain all terms of degree 1. Then, the process runs while the degree d is not greater than the maximal degree, defined as max deg σ, and L is not empty. At the current step, all monomials of degree d − 1 have already been processed. We form the matrix A for the next monomial t in σ by appending its evaluation to the evaluations of the monomials already accepted,
$$A = [\, E_O(X) \ \ t(X) \,],$$
and, then, remove t from L. Let $\lambda_{\min}$ be the smallest eigenvalue of the matrix $A^T A$; if it satisfies the condition $\lambda_{\min} > \varepsilon$, then t is appended to O and the monomials of degree d + 1 derived from it are added to ∂O; otherwise, a polynomial with coefficients defined by an eigenvector of $A^T A$ corresponding to $\lambda_{\min}$ is added to G. Application of the ABM algorithm to the dataset X and the set of initial monomials σ allows obtaining O and ensures that any polynomial combination of the monomials $o_i \in O$ does not vanish on X, even though O can be, strictly speaking, not always an order ideal [15].
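The eigenvalue test at the heart of a single ABM step can be sketched as follows. This is a deliberately minimal illustration, not the full algorithm: the dataset, the basis, and the function names are assumptions, and the decision rule is exactly the $\lambda_{\min}$ criterion described above.

```python
import numpy as np

def abm_step(E_O, t_X, eps):
    """One ABM acceptance test: append the candidate monomial's evaluation
    column t_X to the current basis evaluations E_O and inspect the smallest
    eigenvalue of A^T A. Above eps -> keep the monomial in O; otherwise the
    corresponding eigenvector gives an (approximately) vanishing polynomial
    for the border basis G."""
    A = np.column_stack([E_O, t_X])
    lam, vec = np.linalg.eigh(A.T @ A)   # eigenvalues in ascending order
    if lam[0] > eps:
        return "append_to_O", None
    return "append_to_G", vec[:, 0]      # coefficients of a vanishing polynomial

# Data lying on the curve y = x^2: the polynomial y - x^2 vanishes on X,
# so the candidate monomial y must be rejected from O.
x = np.linspace(-1.0, 1.0, 50)
E_O = np.column_stack([np.ones_like(x), x, x**2])  # evaluations of {1, x, x^2}
decision, coeffs = abm_step(E_O, x**2, eps=1e-8)   # candidate t = y, with y = x^2 on X
print(decision)
```

On this dataset the candidate is sent to G, and the returned coefficient vector evaluates to (numerically) zero on X, which is precisely the vanishing-polynomial property the ABM algorithm exploits.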

KH Algorithm for Polynomial Function Reconstruction
Consider a dataset X = {x₁, …, x_M}, which can be represented as observation vectors $x_i$ for the i-th state variable, where M is the system dimension. Each data vector has N observation points. Using the order ideal O as a set of basis monomials, one can evaluate them on a given dataset X and then build an interpolation using the derivative set V = {v₁, …, v_M}, which is also given or can be derived from X by the relation $v_i = \dot{x}_i$:
$$v_k \approx \sum_{i} h_{ki}\, o_i(X). \quad (13)$$
In each dimension, Equation (13) can be reconstructed using the least squares method. Denote $E = E_O(X)$ and transform Equation (13) into a linear equation:
$$E h_k = v_k. \quad (14)$$
Equation (14) is usually considered overdetermined, since the number of estimated points can be arbitrary and, in the presence of noise, a large number of observations is a necessary condition for obtaining a reliable solution. The overdetermined Equation (14) can be solved using the ordinary least squares method, obtaining the minimum of the residual (15) by the analytical Formula (4). Due to the presence of noise, and since the vector $h_k$ is supposed to be sparse, $L_1$-regularization is recommended to obtain better results:
$$h_k^* = \arg\min_{h_k} \| E h_k - v_k \|_2^2 + \lambda \| h_k \|_1, \quad (16)$$
where λ is a regularization parameter. Increasing λ provides a sparser solution. Problem (16) is also known as the basis pursuit denoising (BPDN) problem, and several special optimization methods have been proposed for solving it, e.g., the Nesterov method [20,21]. If it is assumed that $h^*$ is dense, $L_2$-regularization, or Tikhonov regularization [22], can be applied:
$$h_k^* = \arg\min_{h_k} \| E h_k - v_k \|_2^2 + \lambda \| h_k \|_2^2, \quad (17)$$
with a known solution in terms of matrices:
$$h_k^* = (E^T E + \lambda I)^{-1} E^T v_k, \quad (18)$$
where I is an identity matrix [23]. Kera and Hasegawa [15] proposed removing minor terms from the interpolation polynomial using a special procedure based on an estimation of each term's weight. For each term $h_{ki} o_i(x)$ in a polynomial, its weight (20) is estimated, and the terms with minimal weight are subsequently removed from the interpolation while the error of approximation remains within a given tolerance η. Following Kera and Hasegawa, we call this procedure delMinorTerms. It ensures that the solution is sparse enough even if the data is contaminated with noise.
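The greedy removal loop of delMinorTerms can be sketched as follows. The weight formula $|h_i| \cdot \|o_i(X)\|$ used here is our reading of the estimation in the text, and the data and names are illustrative; this is a sketch of the idea, not the authors' implementation.

```python
import numpy as np

def del_minor_terms(E, v, eta):
    """Greedy sketch of delMinorTerms: repeatedly drop the term whose weight
    |h_i| * ||o_i(X)|| is smallest, refit by least squares, and stop once the
    relative fit error would exceed the tolerance eta."""
    keep = list(range(E.shape[1]))
    h, *_ = np.linalg.lstsq(E, v, rcond=None)
    while len(keep) > 1:
        w = [abs(h[i]) * np.linalg.norm(E[:, k]) for i, k in enumerate(keep)]
        trial = [k for i, k in enumerate(keep) if i != int(np.argmin(w))]
        h_try, *_ = np.linalg.lstsq(E[:, trial], v, rcond=None)
        if np.linalg.norm(E[:, trial] @ h_try - v) > eta * np.linalg.norm(v):
            break                         # removal would spoil the fit; stop
        keep, h = trial, h_try
    return keep, h

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))             # 5 candidate monomial columns
v = 2.0 * X[:, 0] - 1.5 * X[:, 3]         # only two terms are truly present
keep, h = del_minor_terms(X, v, eta=1e-6)
print(sorted(keep))
```

On this toy problem the three spurious columns are pruned and only the two truly present terms survive, with their coefficients recovered by the final refit.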
The Kera-Hasegawa (KH) PDS reconstruction method finds a set of interpolating polynomials H in P = (T × R) of Type (19), where × is the Cartesian product, satisfying (H ⊂ P : $H_k(X) \approx v_k$) for given X and V. First, it performs the ABM algorithm to remove from T the monomials vanishing on X, yielding an order ideal O of T. Next, it applies the $L_1$-regularized LSM to find an interpolation polynomial on O × R and then runs delMinorTerms to ensure that the interpolated polynomials are sparse while the accuracy criterion, Equation (20), is satisfied.

Phase Space Reconstruction Using Integration Embedding
The KH method can be performed only if X and V are complete, i.e., the dimension M of the reconstructed system corresponds to the dimensions of X and V. In the general case, we can observe only one or several state variables, and thus we need to reconstruct the entire phase space. The well-known Takens theorem [23] states that, having a time-delayed series of a single variable, we can reconstruct the M-dimensional phase space due to time delay embedding; similarly, N. Packard et al. [24] proposed reconstructing the M-dimensional phase space using M − 1 subsequent derivatives, due to differential embedding. Recently, J. Lekscha and R. Donner [25] have shown that reconstruction is possible in the case of non-uniformly sampled and noisy time series as well, which is an important result for practical use. Unlike the differential embedding approach, integral embedding did not become popular in the scientific literature, despite being more useful in several practical cases.

Lemma 1. The M-dimensional phase space can be reconstructed due to integral embedding, and a trajectory $\tilde{F}$ of the system F reconstructed from the time series x[n] can be written as
$$\tilde{F} = \Big( x,\ \int x\,dt + C_1(t),\ \ldots,\ \underbrace{\int\!\!\cdots\!\!\int}_{M-1} x\,(dt)^{M-1} + C_{M-1}(t) \Big), \quad (22)$$
where $C_{M-1}(t)$ is a residual polynomial whose general form is
$$C_{M-1}(t) = \sum_{k=0}^{K-1} c_k t^k, \quad (23)$$
where K is a natural number.

Proof.
A system of ODEs in the Cauchy normal form is usually organized so that
$$\dot{x}_j = x_{j+1}, \quad j = 1 \ldots M - 1,$$
i.e., the state variable $x_j$ is related to the variable of the next index through differentiation. It can also be written through integration:
$$x_j = \int x_{j+1}\,dt + C_j, \quad (24)$$
where $C_j$ is a constant depending on the initial condition. Equation (24) can be used for the subsequent integration of state variables to reconstruct the trajectory in the full phase space when the variables $x_1 \ldots x_{j-1}$ are known. After K integrations of the observed time series, we obtain the remaining state variables together with an additive polynomial of degree K − 1 produced by the K integration constants, from which Equations (22) and (23) follow obviously.

Corollary 1.
Since the polynomial residual of Equation (23) grows very fast when $t^{K-1}$ is large enough, it is undesirable if we consider a computer implementation of the reconstruction algorithm; otherwise, the accuracy of the data representation may suffer. We propose the following procedure for obtaining $x_j$. First, we choose a zero initial condition and integrate $x_{j+1}$ on a given time interval $[t_0, t_1]$:
$$\tilde{x}_j(t) = \int_{t_0}^{t} x_{j+1}(\tau)\,d\tau.$$
Then, we estimate its mean value $\bar{x}_j$ and, supposing $C_j = \bar{x}_j$, subtract it, obtaining the desired solution
$$x_j(t) = \tilde{x}_j(t) - \bar{x}_j.$$
Another way is to de-trend the second integral containing $C_j t$ with a specialized procedure, e.g., the detrend function in the MATLAB environment.
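The integrate-then-subtract-the-mean procedure above can be sketched numerically. In this hypothetical example we take $x_{j+1}(t) = \cos t$, so the recovered $x_j(t)$ should be close to $\sin t$, which has zero mean over whole periods; the trapezoidal rule stands in for whatever quadrature an implementation would actually use.

```python
import numpy as np

# Observed "next" state variable x_{j+1}(t) = cos(t) on two full periods
t = np.linspace(0.0, 4.0 * np.pi, 4001)
x_next = np.cos(t)

# Cumulative trapezoidal integration with zero initial condition
dt = t[1] - t[0]
raw = np.concatenate([[0.0], np.cumsum((x_next[1:] + x_next[:-1]) / 2) * dt])

# Estimate the integration constant C_j as the mean and subtract it
x_j = raw - raw.mean()

print(float(np.max(np.abs(x_j - np.sin(t)))))  # small reconstruction error
```

Over an interval that is not a whole number of periods the mean is only an approximation of $C_j$, which is why the text also mentions de-trending as an alternative.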
The system reconstructed using integration embedding is equivalent to the system reconstructed using differential embedding up to M − 1 differentiations. More strictly, this can be formulated as Lemma 2.

Lemma 2.
Suppose there is a map G(t, y): R × R^M → R^M, t ∈ R, y ∈ R^M, whose trajectory projection $\tilde{G}$ onto the phase space R^M can be obtained via subsequent differentiation of the one-dimensional trajectory $\tilde{G}_1$ corresponding to the state variable $y_1$. Then, a dynamical system F(t, x): R × R^M → R^M, x ∈ R^M, exists whose trajectory projection $\tilde{F}$ onto the phase space R^M can be obtained via subsequent integration of the one-dimensional trajectory $\tilde{F}_M = \tilde{G}_1$ corresponding to the state variable $x_M$.
Proof. The following ODE describes a dynamical system G(t, y), due to the condition that subsequent differentiation of the one-dimensional trajectory $\tilde{G}_1$ generates $\tilde{G}$:
$$\dot{y}_j = y_{j+1}, \quad j = 1 \ldots M - 1, \qquad \dot{y}_M = g(t, y).$$
Since $\tilde{G}_1 = \tilde{F}_M$, the corresponding state variables are equal: $y_1 = x_M$. The condition that the projection $\tilde{F}$ of the map F onto the entire phase space can be obtained via subsequent integration of its one-dimensional component $\tilde{F}_M$ can be expressed in the form of Equation (24). Consequently, the following ODE describes F(t, x):
$$\dot{x}_j = x_{j+1}, \quad j = 1 \ldots M - 1, \qquad \dot{x}_M = f(t, x),$$
where f(t, x) is a differentiable function; it is related to $y_1$ as $f(t, x) = \dot{y}_1$.

Corollary 2. A dynamical system $\tilde{F}$ which can be reconstructed via integration embedding exists if the system G which can be reconstructed via differential embedding exists.
The ODE representation of a partially observed system with K observed variables and an entire M-dimensional phase space consists of K equations with scalar right-hand sides $f_j(x)$ for the observed variables and a chain relating the remaining variables through differentiation. The missing derivatives of orders {K + 1, …, M} can be found numerically.
Finite-difference-based numerical differentiation is the most common approach, but some other techniques can be used as well, e.g., numerical differentiation based on Legendre polynomials [25].
If the known state variables are denoted $x_{K+1}, \ldots, x_M$, the system reconstructed via integration takes an alternative form of Equation (26), with the chain relating the remaining variables through integration as in Equation (24). The number of dimensions M can be selected based on its physical meaning or with one of several algorithms, e.g., the false nearest neighbor (FNN) algorithm [26,27], or by estimating one of the fractal dimensions or entropies [28].
The embedded integration approach is more useful in some practical cases due to the following two theorems. Theorem 1 considers a case, appearing in many practical applications, when the differential embedding approach cannot give a valid reconstruction due to the non-uniqueness of the solution corresponding to the reconstructed trajectory $\tilde{F}$.

Theorem 1. A function f(t) determined on R exists, satisfying the following conditions: (1) f(t) is a function that produces a time series $u = f(t_i)$ being a one-dimensional projection of an M-dimensional trajectory $\tilde{F}$ of the dynamical system F; (2) f(t) is continuous and satisfies the Lipschitz condition; (3) an M-dimensional trajectory $\tilde{F}$ reconstructed from the time series u using differentiation does not satisfy the condition of the uniqueness of the solution; (4) an M-dimensional trajectory $\tilde{F}$ reconstructed from the time series u using integration satisfies the condition of uniqueness of the solution and thus is topologically valid.
Proof. Consider a continuous function that satisfies the following conditions. First, there exists an infinite set of time points $t_i^*$ at which the function f and its derivatives equal an arbitrary number a ∈ R:
$$f(t_i^*) = \dot{f}(t_i^*) = \ldots = a.$$
At other time points, the function f differs from a. Second, for a sufficiently large number Q ∈ N, M ≤ Q, there exist two time points $t_i$ and $t_j$ at which the function changes its value in opposite directions relative to a. Then, under these conditions, reconstruction by differentiation yields a trajectory in a phase space where the point $x_a = (a, a, \ldots, a)^T$ is a point at which at least two trajectory loops are connected, and therefore the uniqueness of the solution is violated: starting from this point, and having no a priori information on which branch should be selected, we can neither predict the next state of the system nor determine its previous state. Therefore, the trajectory $\tilde{F}$ reconstructed from the time series u using differentiation does not satisfy the condition of uniqueness of the solution. Meanwhile, due to Lemmas 1 and 2, it is possible to obtain an M-dimensional trajectory $\tilde{F}$ from the time series u using integration. Due to Corollary 1, it is possible to select such integration constants $C_j$ that u and its integrals are not equal to a, so the reconstruction may satisfy the condition of the uniqueness of the solution, and thus $\tilde{F}$ is topologically valid. Figure 1 illustrates the phase space geometry.

Time series $u = f(t_i)$ satisfying conditions 2-4 of Theorem 1 with a = 0 are rather common in biology, medicine, genetics, and neuroscience.
Theorem 2 is dedicated to the noise-amplifying properties of differentiation operators.

Theorem 2. In the presence of additive white Gaussian noise, when the sampling frequency is high enough and the spectral density of the signal is mostly low-frequency: (1) after differentiation, the SNR of $\frac{d}{dt}u$ will be smaller than the SNR of u; (2) after integration, the SNR of $\int u\,dt$ will be at least not smaller than the SNR of u.
Proof. Recall that the signal-to-noise ratio is determined as
$$\mathrm{SNR} = \frac{A_u}{A_\xi},$$
where $A_u$ is the root-mean-square (RMS) amplitude of the signal, with its square equal to
$$A_u^2 = \frac{1}{T}\int_0^T u^2(t)\,dt,$$
where T is the time interval, and $A_\xi$ is the RMS amplitude of the noise. Denote the sampling frequency as $\omega_s$ and the Fourier transform of the noise ξ(t) as Ξ(ω). Then, the power spectral density of the noise calculated within the frequency interval $\omega \in [0, \omega_s/2]$ is
$$S_{\xi\xi}(\omega) = \frac{1}{T}\,|\Xi(\omega)|^2.$$
According to Parseval's theorem,
$$A_\xi^2 = \frac{1}{\pi}\int_0^{\omega_s/2} S_{\xi\xi}(\omega)\,d\omega.$$
Note that AWGN has a uniform power spectral density, i.e., $\forall \omega \in [0, \omega_s/2]: S_{\xi\xi}(\omega) = S_{\xi\xi}$.
The condition that the power spectrum of u lies almost entirely in a frequency interval $[\omega_0, \omega_1]$ means that its power spectrum $S_{uu}(\omega)$ satisfies
$$\int_0^{\omega_1} S_{uu}(\omega)\,d\omega \ \ge\ q \int_0^{\omega_s/2} S_{uu}(\omega)\,d\omega,$$
where 0 < q < 1 is a real number close to 1, and $\omega_1$ is the frequency up to which almost the whole spectrum is contained; usually, $\omega_1$ is 2 to 10 times smaller than $\omega_s/2$. The gap between $\omega_1$ and $\omega_s/2$ is occupied almost entirely by the noise; see Figure 2a. Consequently, in this case, the formula for the SNR can be rewritten as:
$$\mathrm{SNR}^2 \approx \frac{\int_0^{\omega_1} S_{uu}(\omega)\,d\omega}{\int_0^{\omega_s/2} S_{\xi\xi}(\omega)\,d\omega} = \frac{2\int_0^{\omega_1} S_{uu}(\omega)\,d\omega}{S_{\xi\xi}\,\omega_s}. \quad (31)$$
Consider integration and differentiation acting on u. Taking the Fourier transform, we obtain
$$\mathcal{F}\Big[\frac{d}{dt}u\Big](\omega) = j\omega\,U(\omega), \qquad \mathcal{F}\Big[\int u\,dt\Big](\omega) = \frac{U(\omega)}{j\omega},$$
where j is the imaginary unit. The same is true for the noise. Denote $\frac{d}{dt}u(t) = \dot{u}(t)$; then, the corresponding spectral densities are:
$$S_{\dot{u}\dot{u}}(\omega) = \omega^2 S_{uu}(\omega), \qquad S_{\int u \int u}(\omega) = \omega^{-2} S_{uu}(\omega). \quad (32)$$
From Equation (31) and Equations (32), the SNR of the differentiated signal is:
$$\mathrm{SNR}_D^2 = \frac{\int_0^{\omega_1} \omega^2 S_{uu}(\omega)\,d\omega}{\int_0^{\omega_s/2} \omega^2 S_{\xi\xi}\,d\omega} = \frac{3\int_0^{\omega_1} \omega^2 S_{uu}(\omega)\,d\omega}{S_{\xi\xi}\,(\omega_s/2)^3}.$$
Due to Hölder's inequality, and the fact that both $\omega^2$ and $S_{uu}(\omega)$ are nonnegative,
$$\int_0^{\omega_1} \omega^2 S_{uu}(\omega)\,d\omega \ \le\ \omega_1^2 \int_0^{\omega_1} S_{uu}(\omega)\,d\omega,$$
so that
$$\mathrm{SNR}_D^2 \ \le\ 3\Big(\frac{2\omega_1}{\omega_s}\Big)^2\,\mathrm{SNR}^2.$$
Since $2\omega_1/\omega_s \ll 1$ under the assumptions made above, $\mathrm{SNR}_D$ is never greater than the SNR of the original signal.
For the integration operator, from Equation (31) and Equations (32),
$$\mathrm{SNR}_I^2 = \frac{\int_0^{\omega_1} \omega^{-2} S_{uu}(\omega)\,d\omega}{\int_0^{\omega_s/2} \omega^{-2} S_{\xi\xi}\,d\omega}.$$
Again, due to Hölder's inequality,
$$\int_0^{\omega_1} \omega^{-2} S_{uu}(\omega)\,d\omega \ \ge\ \omega_1^{-2} \int_0^{\omega_1} S_{uu}(\omega)\,d\omega.$$
Therefore, while $\mathrm{SNR}_I$ is not necessarily greater than the SNR of the original signal, it is not necessarily smaller than it either, in contrast to the case of differentiation. Theorem 2 is in good correspondence with signal processing theory [29], from which it is known that the differentiation operator acts as a high-pass filter and amplifies high-frequency noise, while the integration operator acts as a low-pass filter.
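Theorem 2 is easy to check numerically. The following sketch, with made-up parameters, applies finite-difference differentiation and cumulative-sum integration to a low-frequency sine contaminated by additive white Gaussian noise and compares the resulting SNRs.

```python
import numpy as np

rng = np.random.default_rng(2)
dt = 1e-3
t = np.arange(0.0, 20.0, dt)
s = np.sin(2 * np.pi * 1.0 * t)        # signal well below the Nyquist frequency
n = 0.1 * rng.normal(size=t.size)      # additive white Gaussian noise

# SNR as the ratio of RMS amplitudes of the signal and noise parts
snr = lambda a, b: np.sqrt(np.mean(a**2)) / np.sqrt(np.mean(b**2))

snr0 = snr(s, n)
snr_d = snr(np.gradient(s, dt), np.gradient(n, dt))   # after differentiation
snr_i = snr(np.cumsum(s) * dt, np.cumsum(n) * dt)     # after integration

print(round(snr0, 2), round(snr_d, 2), round(snr_i, 2))
```

With these parameters, differentiation collapses the SNR by orders of magnitude because the white noise spectrum is amplified up to the Nyquist frequency, while integration leaves the SNR of the same order or better, in line with the theorem.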

Description of the Proposed Reconstruction Technique

The overall proposed technique is described as follows: First, we consider the time series set U of a partially observed dynamical system and its derivative W. Using either integral or differential embedding, preferring the former due to Theorems 1 and 2, we reconstruct the phase space and the M-dimensional phase space trajectory X. In this paper, we suppose that M is already known. If not, the FNN algorithm or other dimension-finding algorithms can be applied. For the reasons why we do not reject differential embedding, see the Limitations section.
Then, we generate a degree-lexicographically organized set of monomials σ of order $D = [d^-; d^+]$, where the degree of each monomial lies within D and the power of each variable in a monomial also lies within D. From a practical point of view, we assume that $d^- \le 0$ and $d^+ \ge 0$. The algorithm for generating σ is given in pseudocode in Listing 1.
The subroutine generateV returns a special indexing matrix V, in which each row contains a single nonzero element.
The subroutine adds produces the column-wise sum of V as a candidate vector $\alpha = (\alpha_1, \ldots, \alpha_M)^T$ of monomial term powers. The monomial t(x) is added to σ if it is not already contained in σ. The order of multiplication is neglected; e.g., the term $x_1 x_2 x_1$ is considered identical to the term $x_1^2 x_2$. The subroutine shiftForward produces the next state of the matrix, $V_{k+1} = V_k \oplus 1$, considering V as a D-digit number in the M-ary numeral system, where each digit has M states defined by the position of the nonzero element in each row, the first row is the least significant digit of the number, and $V_k \oplus 1$ denotes summation with the column vector $1 = (1, 0, \ldots, 0)^T$.
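The M-ary counter behind shiftForward can be sketched as follows. The representation (one nonzero entry per row of V, first row as the least significant digit) follows the description above; the function name and matrix layout details are otherwise our own assumptions.

```python
import numpy as np

def shift_forward(V):
    """Advance the indexing matrix V, read as a D-digit base-M number:
    each row holds one nonzero entry whose column index is the digit value;
    the first row is the least significant digit. Adding 1 moves the nonzero
    entry one column to the right, with carry into the next row."""
    V = V.copy()
    for d in range(V.shape[0]):            # carry from least significant row
        j = int(np.argmax(V[d]))
        V[d, j] = 0
        if j + 1 < V.shape[1]:
            V[d, j + 1] = 1
            return V
        V[d, 0] = 1                        # wrap this digit, carry to next row
    return V                               # counter overflowed and wrapped

D, M = 2, 3
V = np.zeros((D, M), dtype=int)
V[:, 0] = 1                                # counter at (0, 0)
seen = [tuple(np.argmax(V, axis=1))]
for _ in range(M**D - 1):
    V = shift_forward(V)
    seen.append(tuple(np.argmax(V, axis=1)))
print(len(set(seen)))                      # all M**D digit combinations visited
```

Iterating the counter enumerates every combination of digit values exactly once, which is what guarantees that every candidate power vector α is eventually produced.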

When the degree-lexicographically ordered set of monomials σ is computed, the Kera-Hasegawa method is applied to σ, V, and X. The obtained reconstructed PDS corresponds to the original dynamical system if the noise level in the data is sufficiently low and the system is representable in a form of PDS in a given set of coordinates.
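A minimal sketch of enumerating a degree-lexicographically ordered candidate set of exponent vectors is given below. Note the assumption: "degree" is taken here as the plain sum of exponents, which is only one possible convention for Laurent monomials, so this illustrates the idea of the ordering rather than the paper's exact deglexord procedure:

```python
from itertools import product

def deglex_monomials(nvars, dmin, dmax):
    """Illustrative sketch: enumerate Laurent-monomial exponent vectors
    with each exponent in [dmin, dmax], ordered degree-lexicographically
    (by total degree first, then lexicographically). The total degree of
    a Laurent monomial is taken as the sum of its exponents here, which
    is an assumed convention."""
    exps = list(product(range(dmin, dmax + 1), repeat=nvars))
    return sorted(exps, key=lambda a: (sum(a), a))

sigma = deglex_monomials(2, -1, 1)
print(len(sigma))   # 3**2 = 9 exponent vectors
print(sigma[0])     # (-1, -1): the lowest total degree comes first
```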

Experimental Results
In this section, we first illustrate the advantages of integral embedding with two examples and then show how the proposed approach can be used for the reconstruction of the Lorenz attractor from a one-dimensional time series and of chaotic memristive circuit equations from a three-dimensional time series.

Illustrations of the Theorems
The following example illustrates Theorem 1.

Example 1. Consider the ODE:
The function z(t) in (31) satisfies the conditions of Theorem 1. Its unresolvable point is x 0 = (0, 0, 0) T . Figure 3 shows two variants of three-dimensional phase spaces reconstructed via differentiation and integration from the time series u = {z(t i )}.

Example 2 shows how the embedded differentiation and integration approaches can be applied to ECG signals [30,31], as shown in Figure 4a. In this example, the conditions of both Theorems are satisfied, i.e., differentiation produces a noisy trajectory reconstruction with a singular point, while integration does not have these drawbacks. After applying the embedded differentiation approach, we obtain a trajectory that does not satisfy the uniqueness criterion and is contaminated with noise. After applying the embedded integration approach, we obtain a rather smooth trajectory with no singular points.
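The contrast between the two embeddings can be reproduced with a small numerical experiment; the signal, noise level, and stepsize below are illustrative choices, not the ECG data of Figure 4:

```python
import numpy as np

# Embed a noisy sampled signal once by finite differences and once by
# cumulative trapezoidal integration, then compare how far each
# reconstruction drifts from its noise-free counterpart.
rng = np.random.default_rng(0)
h = 1e-3
t = np.arange(0.0, 10.0, h)
clean = np.sin(t)
noisy = clean + 1e-4 * rng.standard_normal(t.size)

def derivative(u, h):
    return np.gradient(u, h)            # 2nd-order central differences

def integral(u, h):
    # cumulative trapezoidal rule, anchored at zero
    return np.concatenate(([0.0], np.cumsum((u[1:] + u[:-1]) * h / 2)))

err_diff = np.max(np.abs(derivative(noisy, h) - derivative(clean, h)))
err_int = np.max(np.abs(integral(noisy, h) - integral(clean, h)))
print(err_diff > err_int)   # True: differentiation amplifies the noise
```

Differentiation divides the noise by the small stepsize, while integration averages it out, which is the effect the two examples describe.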

Reconstruction of Lorenz Attractor
Consider the classical Lorenz attractor with a standard parameter set: s = 10, r = 28, b = 8/3 [32]. Suppose that only one variable z can be observed, so we should transform the Lorenz system to obtain a model in new coordinates: Equation (37) cannot be reconstructed in terms of ordinary polynomial functions due to the negative powers of the variable x 1 , but it can be reconstructed as a Laurent polynomial function. The proposed algorithm can be applied using both the differentiation and the integration approaches. The deglexord procedure (see Listing 1) generates a set of L = 50 monomials of degrees D = [−1; 3]. To ensure that the system of equations is overdetermined, we should take no fewer than 51 points in X, since we can never be sure that some term will vanish on X and be removed by the ABM algorithm. An example of the reconstructed system is: To reach a high quality of reconstruction, we generate the data in the MATLAB 2019b environment (Campus license No. 40502181, ETU "LETI") using the function ode113 with parameters RelTol = 10 −13 and AbsTol = 10 −15 , a constant stepsize h = 10 −4 , and a simulation time T = 45, and use the 4th order integration formula. Note that Equation (37) can be reconstructed from the data obtained from the original Lorenz system using the time series corresponding to the variable z and its two integrals.
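This data-generation step can be sketched in Python as follows; a fixed-step RK4 integrator stands in for MATLAB's ode113, and the stepsize and simulation time are shortened for illustration:

```python
import numpy as np

# Simulate the Lorenz system (s = 10, r = 28, b = 8/3) and build the
# integral embedding from the observed variable z and its two
# successive trapezoidal integrals.
def lorenz(u, s=10.0, r=28.0, b=8.0 / 3.0):
    x, y, z = u
    return np.array([s * (y - x), x * (r - z) - y, x * y - b * z])

def rk4_step(f, u, h):
    k1 = f(u)
    k2 = f(u + h / 2 * k1)
    k3 = f(u + h / 2 * k2)
    k4 = f(u + h * k3)
    return u + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

h, n = 1e-3, 20000
u = np.array([1.0, 1.0, 1.0])
z = np.empty(n)
for i in range(n):
    z[i] = u[2]                         # only z is observed
    u = rk4_step(lorenz, u, h)

def cumtrapz(v, h):
    return np.concatenate(([0.0], np.cumsum((v[1:] + v[:-1]) * h / 2)))

x1 = z                                  # observed coordinate
x2 = cumtrapz(x1, h)                    # its first integral
x3 = cumtrapz(x2, h)                    # its second integral
X = np.column_stack([x1, x2, x3])       # embedded phase-space trajectory
print(X.shape)                          # (20000, 3)
```

The matrix X plays the role of the data series on which the candidate Laurent monomials are evaluated.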
Two projections of the attractor Equation (37) and its reconstructed version are given in Figure 5. The initial conditions were (0.1, 0, −0.1) T , and the parameter η = 10 −2 . Truncation and round-off errors can be considered as a sort of noise [33], which makes it possible to compare differential embedding and integral embedding in terms of sensitivity to the reconstruction error induced by the stepsize. The results are given in Figure 6.
In Figure 6, we denote the phase space reconstruction method based on differentiation of the 2nd order of accuracy as DiffOrd2 (central differences) and of the 4th order of accuracy as DiffOrd4 (4th order finite differences) [34]. The method based on 2nd order integration is denoted as IntOrd2 (trapezoidal rule), and the method of the 4th order of accuracy is denoted as IntOrd4 (Simpson's rule). The two plots summarize the results on the mean error E = ‖v k − H k (X)‖ across the three state variables and the overall number of terms L.
We see from both plots that for small stepsizes, the integration-based reconstruction method performs well, while the differentiation approach suffers from numerical noise, in accordance with the proof of Theorem 2: the higher the sampling frequency, the more noise is added to the data and the more likely the algorithm is to fail. Large stepsizes, in contrast, are more likely to corrupt the integration-based approach, but this happens at h > 10 −2 , when both methods give inaccurate results even with the 4th order formulae. The reconstruction techniques based on the second-order formulae became inaccurate at h > 2.5 · 10 −4 , and those based on the fourth-order formulae became inaccurate at h > 3.5 · 10 −3 . By "inaccurate" we mean that Equation (37) was reconstructed with some missing or excessive terms.
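The order-of-accuracy behavior behind IntOrd2 and IntOrd4 can be checked on a toy integrand; the function and interval below are arbitrary illustrative choices:

```python
import numpy as np

# Integrate sin(t) over [0, 1] with the trapezoidal rule (2nd order)
# and Simpson's rule (4th order), and check how the error shrinks when
# the stepsize is halved.
def trapz(f, a, b, n):
    t = np.linspace(a, b, n + 1)
    y = f(t)
    h = (b - a) / n
    return h * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

def simpson(f, a, b, n):                # n must be even
    t = np.linspace(a, b, n + 1)
    y = f(t)
    h = (b - a) / n
    return h / 3 * (y[0] + 4 * y[1:-1:2].sum() + 2 * y[2:-2:2].sum() + y[-1])

exact = 1.0 - np.cos(1.0)
e2 = [abs(trapz(np.sin, 0, 1, n) - exact) for n in (64, 128)]
e4 = [abs(simpson(np.sin, 0, 1, n) - exact) for n in (64, 128)]
print(round(e2[0] / e2[1]))   # 4: halving h cuts a 2nd-order error 4x
print(round(e4[0] / e4[1]))   # 16: a 4th-order error drops 16x
```

The same convergence rates govern how fast the embedding error decays with the stepsize in Figure 6.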

Reconstruction of Muthuswamy's Memristive Circuit
Replacing the nonlinear negative resistor (Chua diode) in the well-known Chua's circuit with a flux-controlled memristor, one obtains a circuit able to exhibit chaotic behavior [19]. The circuit proposed by B. Muthuswamy includes five elements (Figure 7): a linear passive inductor, two linear passive capacitors, a linear passive resistor, and a nonlinear active memristor.

Figure 7. Schematics of the five-element chaotic circuit [19].
After scaling and transformation, we obtain the following system of ODEs: Parameters for chaotic behavior are α = −0.667 · 10 −3 ; β = 0.029 · 10 −3 ; p 1 = 2.594 · 10 −3 ; p 2 = 73.53 · 10 3 ; p 3 = 147 · 10 6 ; p 4 = 7.353 · 10 3 ; p 5 = 14.7 · 10 6 ; p 6 = 55.55. We simulated the system by the function ode113 with parameters RelTol = 10 −13 and AbsTol = 10 −15 , a constant stepsize h = 10 −8 , and a final time T = 0.01. In this experiment, we used data obtained from the original Equation (39) and reconstructed the state variable x 1 by integrating the variable x 2 . This is motivated by two facts: first, the variable x 1 stands for the magnetic flux between the terminals of the memristive device [19] and therefore cannot be measured on a macroscopic scale, so it is assumed to be unobservable. Second, integrating x 2 is very natural for the system (39) due to the relation ẋ 1 = p 1 x 2 and therefore needs no complicated transformations.
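The quadrature step that recovers x 1 can be sketched as follows; a synthetic test signal with a known antiderivative replaces the simulated x 2 here, so this only illustrates the idea:

```python
import numpy as np

# The unobservable flux x1 obeys dx1/dt = p1 * x2, so x1 can be
# recovered from samples of x2 by quadrature. A synthetic signal
# x2(t) = cos(t) with known antiderivative sin(t) checks the idea;
# p1 = 1 matches the rescaled system (40).
p1 = 1.0
h = 1e-4
t = np.arange(0.0, 1.0, h)
x2 = np.cos(t)

# cumulative trapezoidal rule, anchored at x1(0) = 0
x1 = p1 * np.concatenate(([0.0], np.cumsum((x2[1:] + x2[:-1]) * h / 2)))
exact = np.sin(t)                  # the true flux for this test signal
print(np.max(np.abs(x1 - exact)) < 1e-8)   # True: error scales as h**2
```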
The degrees of terms in the monomials were placed within the interval D = [0; 3], and the tolerance parameter of the ABM algorithm was 10 −2 . The reconstructed equations are Note that the Equations in (40) are a rescaled version of the Equations in (39) in which p 1 = 1 instead of the original value of p 1 .
Two projections of the attractors of the original and reconstructed systems are given in Figure 8. The initial conditions were (0, 0.1, 0.1, 0) T . We compared three integration methods by changing the tolerance parameter η in the delMinorTerms algorithm in the range η ∈ [10 −3 , 10 4 ]. Figure 9 depicts the results of the experiment. Using higher-order integration formulae allows for more precise and more compact solutions. This experiment shows the difficulty of choosing η: its low values force the algorithm to generate equations with many excessive terms, making the least-squares solution more precise. But, as one can see in Figure 9, the accuracy of the proper reconstruction is rather high due to the high absolute values of the state variables (see Figure 6), and thus the truncation and numerical errors in the reconstructed data for x 1 are large as well.
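The role of η can be illustrated with a hypothetical pruning step in the spirit of delMinorTerms; the function prune, the candidate basis, and the synthetic data below are our own illustrative choices, not the algorithm from the paper:

```python
import numpy as np

# Fit candidate monomial terms by least squares, then discard terms
# whose coefficients fall below the tolerance eta. A tiny eta keeps
# spurious terms; a larger one yields a sparser model.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
v = 3.0 * x + 0.5 * x**3 + 1e-6 * rng.standard_normal(x.size)  # "data"

A = np.column_stack([np.ones_like(x), x, x**2, x**3])  # candidate terms
coef, *_ = np.linalg.lstsq(A, v, rcond=None)

def prune(coef, eta):
    """Indices of the terms that survive the tolerance eta."""
    return {i for i, c in enumerate(coef) if abs(c) >= eta}

print(prune(coef, 1e-2))              # {1, 3}: only x and x**3 survive
print(len(prune(coef, 1e-12)) == 4)   # too small an eta keeps everything
```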
To summarize, the experiment shows that higher-order integration is preferable, and the value of η should be adjusted with respect to the properties of the problem.


Discussion and Conclusions
Two main findings are highlighted in this paper. First, we show that the integral embedding approach has several advantages over the differential approach often used in cases of incompletely observed systems. Second, we extend the Kera-Hasegawa method to Laurent polynomials and show by example its performance.
Still, some limitations of this method exist. The first and most prominent one is that not every system can be represented as a PDS; furthermore, there are some PDSs that do not allow transformation to the form of Equation (22) or Equation (24) or their combinations. Consider the following system with an unobservable variable z: where a, b, c are parameters. The natural way to reconstruct Equations (41) in this case is to transform it into where a new variable u is reconstructed from the observable variable y through differentiation. Equation (42) cannot be represented in terms of polynomial functions and, moreover, needs a special technique to resolve the ambiguity in the ± sign appearing when the variable z is expressed from the second Equation of (41).
The second limitation of the proposed reconstruction method can be exemplified by Muthuswamy's memristive circuit. In many cases, the algorithm found a PDS different from the Equations in (40) but one that is also correct on the limited data series X, V and has a stable attractor similar to that of the system Equations in (40): Moreover, the third Equation in (43) is parametric, so infinitely many variants of it exist. Thus, if similar data can be produced by several PDSs, the method may find any one of them.
The third limitation of the proposed reconstruction method was found in both experiments, with the Lorenz system and with Muthuswamy's memristive circuit: the method is sensitive to the algorithm parameters, such as the stepsize, the order of accuracy of the reconstruction method, its type (differentiation or integration), and the tolerance η of the delMinorTerms algorithm. Obviously, sensitivity to the ABM algorithm tolerance also takes place, but we did not investigate its influence.
The fourth limitation is that the dimension of the search space grows dramatically when the number of terms or the system dimension grows. For example, a four-dimensional PDS in complete Laurent polynomials containing degrees D ∈ [−3, 3] can be constructed out of 471 unique monomials, a five-dimensional PDS out of 1281 unique monomials, and a six-dimensional PDS out of 3067 unique monomials. Even if the ABM algorithm sufficiently reduces these numbers, very poorly conditioned large-scale regression problems are likely to arise, and it is not obvious whether the algorithm remains useful in these cases. This question needs further investigation.
The Laurent monomial ordering, the deglexord procedure, and the BM algorithm for Laurent monomials also need the development of a stricter theoretical basis, as in the case of well-established ordinary monomials [35]. Moreover, from Equation (4) it is clear that the reconstruction approach based on the LSM can handle not only PDSs but also functions containing other elementary nonlinearities (e.g., trigonometric functions). This also needs further development.

Funding: The reported study was supported by RFBR, research project no. 19-07-00496.