A Spline Kernel-Based Approach for Nonlinear System Identification with Dimensionality Reduction

Abstract: This paper proposes a novel approach for the identification of nonlinear systems. By transforming the data space into a feature space, kernel methods can be used to model nonlinear systems, and the spline kernel is adopted to produce a Hilbert space. However, the spline kernel-based identification method handles high-dimensional data poorly, incurring a huge computational cost and slow estimation. Moreover, owing to the large number of parameters to be estimated, a large amount of training data is required for accurate identification so that the persistence-of-excitation conditions are satisfied. To solve this problem, a dimensionality reduction strategy is proposed: a coordinate transformation is constructed with tools from differential geometry so that no output-relevant information is shared between different new states, while the states with no impact on the output are isolated and then discarded when constructing the model. The dimension of the kernel-based model is thereby reduced, along with the number of parameters to be estimated. Finally, the proposed identification approach was validated by simulations performed on experimental data from wind tunnel tests. The identification results are accurate and effective at lower dimensions.


Introduction
System identification plays an important role in control systems, with applications in many fields, e.g., biology, chemistry, physics, electrical engineering, aeronautical engineering, computer engineering, and marine systems [1][2][3][4][5][6][7]. When the model of a dynamic system is unknown, identification finds a representation of the system with estimated parameters, which is the basis for designing the control law. System identification is divided into two steps: model selection [8] and parameter estimation [9]. Model selection depends on model validation techniques [10], the most common being cross validation [11]. For parameter estimation, the most widely used methods are prediction error-based parametric methods [12,13], such as the least squares and maximum likelihood algorithms.
Recently, system identification has attracted much interest from researchers in the machine learning field. Given the uncertainty and nonlinearity involved, machine learning provides tools for modeling control systems and supports efficient models for system identification. Many existing modeling frameworks incorporate elements of artificial intelligence and can be integrated into the control loop, and combining artificial intelligence techniques to solve system identification problems produces effective results.

Related Work
In most cases, the intended use of system identification is control design. From the perspective of control systems, the state is denoted by the vector x = [x_1, x_2, x_3, ...]^T, while the input and output of the system are denoted by u and y, respectively. The goal of system identification is to construct the relationship between u and y, given the set of pairs (x_k, y_k) ∈ (R^n, R), where x_k = [x_{k1}, x_{k2}, ..., x_{kn}]^T is the value of the state vector at time k and n is the number of states.
Then, the relationship between u and y can be given by a state-space model. A simple example in the model set is the linear state-space equations

ẋ = Ax + Bu,  y = Cx,  (1)

where A, B, and C are matrices with estimated parameters. With the parametric form in Equation (1), the problem of system identification is transformed into the problem of parameter estimation from sampled observations. The idea extends to identifying nonlinear systems. Consider a nonlinear state-space model [24][25][26]

ẋ = f(x) + g(x)u,  y = h(x),  (2)

where f(x) = [f_1(x), f_2(x), ..., f_n(x)]^T and g(x) = [g_1(x), g_2(x), ..., g_n(x)]^T are sets of R^n → R mappings defined on the state space, usually obtained from the physical mechanism and prior knowledge of the system, and h(x) is a nonlinear real-valued function defined on the state space. For a parametric model, the function h : R^n → R is determined by a parameter vector θ once the model structure is selected. The identification method seeks the function h(x) that approximates the output well. When constructing model structures, the function is often assumed to be linear in θ [27], so the relationship between y and x can be represented as

h(x) = ∑_{i=1}^m θ_i φ_i(x),

where θ_i is the ith element of the parameter vector θ, {φ_1, φ_2, ..., φ_m} is the basis function set, and m is the number of functions in the set. Many choices exist for the form of the basis functions. Among them, the polynomial model is the most widely used [28], with φ_i(x) = (λ^T x)^i, where λ is a weight vector; m is determined by the model validation technique. Then, the least squares algorithm can be used to estimate θ by minimizing

∑_{i=1}^N (y_i − h(x_i))²,

where N is the number of data. However, generalization ability cannot be guaranteed when the objective function is determined solely by the error between the prediction and the measured value. Overfitting also arises, meaning that small noise in the data has a significant effect on the final model.
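The linear-in-parameters construction above can be sketched numerically. In the sketch below, the weight vector λ, the basis size m, and the data are all made-up illustrative values; θ is estimated by ordinary least squares.

```python
import numpy as np

# Illustrative sketch of the model h(x) = sum_i theta_i * phi_i(x) with
# polynomial basis functions phi_i(x) = (lambda^T x)^i, fitted by least squares.
rng = np.random.default_rng(0)

n, m, N = 3, 4, 200               # state dimension, number of basis functions, data size
lam = np.array([0.5, -1.0, 2.0])  # assumed weight vector lambda (made up)

def phi(x):
    """Stack the m polynomial basis functions evaluated at one state vector x."""
    s = lam @ x
    return np.array([s**i for i in range(1, m + 1)])

# Synthetic data: an assumed true parameter vector plus small observation noise.
theta_true = np.array([1.0, 0.3, -0.2, 0.05])
X = rng.normal(size=(N, n))
y = np.array([phi(x) @ theta_true for x in X]) + 0.01 * rng.normal(size=N)

# Least squares: minimize sum_i (y_i - Phi(x_i) theta)^2 over theta.
Phi = np.vstack([phi(x) for x in X])            # N x m regressor matrix
theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

With essentially noise-free data the estimate recovers the assumed parameters closely; with realistic noise, and without regularization, this plain objective is exactly what the text warns can overfit.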
A method usually used to avoid overfitting is regularization [29]. A regularization term is added to the objective function in the minimization process [22]:

min_h ∑_{i=1}^N (y_i − h(x_i))² + γ r(h),  (3)

where r(·), a function of h, is the regularizer, which penalizes unexpected features of h, and γ is a weight parameter adjusting the relative importance of the error term ∑_{i=1}^N (y_i − h(x_i))² and the regularization term r(h).
To reduce the sensitivity to noise, the high-frequency components of h are suppressed by penalizing the energy of high-order derivatives [30]:

r(h) = ∫ (h^(p)(x))² dx,  (4)

where h^(p)(x) is the pth-order derivative of h(x), and p is determined by the noise properties in practical applications.
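As a hedged illustration of this derivative-energy penalty, the sketch below discretizes a scalar function on a grid and minimizes the regularized objective with p = 2, approximating the second derivative by finite differences; the signal, noise level, and γ are made up for the example.

```python
import numpy as np

# Regularized least squares on a grid: minimize ||y - h||^2 + gamma * ||D h||^2,
# where D is a finite-difference approximation of the second derivative, so the
# penalty approximates the integral of (h''(x))^2. All values are illustrative.
rng = np.random.default_rng(1)

N = 100
x = np.linspace(0.0, 1.0, N)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=N)   # noisy samples of a smooth signal

# Second-difference operator approximating h'' on the grid.
D = np.diff(np.eye(N), n=2, axis=0) / (x[1] - x[0])**2

def smooth(y, gamma):
    # Setting the gradient to zero gives (I + gamma D^T D) h = y.
    return np.linalg.solve(np.eye(N) + gamma * D.T @ D, y)

h_hat = smooth(y, gamma=1e-7)
# Larger gamma yields a smoother (lower-curvature) estimate.
```

The closed form shows the trade-off explicitly: γ = 0 reproduces the noisy data exactly, while a large γ forces the curvature penalty ||Dh||² toward zero.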

Problem Formulation
In recent years, the kernel method has been adopted to construct the basis functions. With a defined kernel, h(x) is represented as

h(x) = ∑_{i=1}^N θ_i K_{x_i}(x),  (5)

where K_{x_i}(·) denotes the kernel function centered at x_i, which can also be written as K_{x_i}(·) = K(x_i, ·). Based on the definition of the kernel, a Hilbert space H can be constructed by proving that Cauchy sequences under the induced norm converge in the space [31,32]. According to the expected properties of the final model, we set the regularizer r(·) in Equation (3) to be

r(h) = ‖h‖²_H,  (6)

from which, for h in the form of Equation (5), it can be easily derived that r(h) = ∑_{i=1}^N ∑_{j=1}^N θ_i θ_j K(x_i, x_j) = θ^T K θ.

Spline Kernel-Based Identification Method
The most widely used kernels include linear kernels, radial basis kernels, and spline kernels. As discussed, the regularizer takes the form of Equation (6) within the constructed Hilbert space. It is always desirable to reduce the impact of noise in identification, so there is great significance in designing r(h) in the form of Equation (4), which penalizes the energy of high-order derivatives. If Equations (4) and (6) match each other, the kernel has huge advantages in modeling h(x). The spline kernel is chosen because it satisfies this matching relation: the norm it induces on the Hilbert space coincides with the derivative-energy penalty. With the least squares support vector machine method [33], the estimate of θ is obtained as

θ̂ = (K + γI)^{-1} y,  (10)

where K is the kernel matrix with K_ij = K(x_i, x_j), I is an N × N identity matrix, and y = [y_1, y_2, ..., y_N]^T is the output vector.
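The estimate above can be sketched in a few lines. The snippet uses the one-dimensional first-order spline kernel K(s, t) = min(s, t) on [0, 1] as a stand-in (its induced norm penalizes the energy of the first derivative, with h(0) = 0); the data and γ are illustrative assumptions, not the paper's wind tunnel setting.

```python
import numpy as np

# Kernel-based estimate theta = (K + gamma I)^{-1} y with the first-order
# spline kernel K(s, t) = min(s, t). Data and gamma are made up.
rng = np.random.default_rng(2)

N, gamma = 50, 1e-3
x = np.sort(rng.uniform(0.0, 1.0, N))
y = np.sin(3 * x) + 0.05 * rng.normal(size=N)

K = np.minimum.outer(x, x)                     # Gram matrix, K_ij = min(x_i, x_j)
theta = np.linalg.solve(K + gamma * np.eye(N), y)

def h(x_new):
    """Model output h(x) = sum_i theta_i K(x_i, x), i.e. the kernel expansion."""
    return np.minimum.outer(np.atleast_1d(x_new), x) @ theta
```

At the training points the fit reproduces the data up to the regularization, and between them the min-kernel expansion interpolates with the smallest first-derivative energy.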

Dimensionality Reduction
From Equation (5), we can see that the dimension of the state vector x, i.e., the number of states, has a huge influence on the model complexity, which also determines the computational complexity of the estimation in Equation (10). However, it is often the case that some states have little impact on the output of the system, either because the output has no relationship with those states or because the output-relevant information they carry is already contained in other states. Consequently, the dimension of the output model can be reduced.
Since the information carried by different states may overlap, leading to redundancy in modeling, we want to concentrate the information relevant to the output and discard, in the modeling process of Equation (5), the variables that provide no additional information on the output beyond the existing variables. A coordinate transformation is proposed in this paper, discussed in detail in the next section. The new state vector is denoted by z ∈ R^n, and the function realizing the transformation by

z = t(x) = [z_1^T, z_2^T]^T,  (11)

where each element of z_1 is related to y, while the elements of z_2 have no effect on y.
Since the definition of the kernel does not depend on the dimension of the point, the form of the kernel function remains unchanged. The purpose of the transformation is to reduce the dimension of Equation (5) to

h(z_1) = ∑_{i=1}^N β_i K_{z_1^i}(z_1),  (13)

where β is the parameter vector to be estimated, z_1 is the retained part of the transformed state vector z with z_1 ∈ R^d, d ≤ n, and N is the number of data for modeling after the transformation. Figure 1 shows a graphical illustration of the proposed framework. The coordinate transformation is easy to complete for a linear system; for a nonlinear system, however, it is difficult, especially when the transformation itself is nonlinear.
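A toy sketch of the idea: for a made-up system whose output depends only on x_1 + x_2, the linear change of coordinates z_1 = x_1 + x_2, z_2 = x_1 − x_2 concentrates all output-relevant information in z_1, so a kernel model built on z_1 alone suffices. A Gaussian kernel is used here purely for simplicity; it is not the paper's spline kernel.

```python
import numpy as np

# Dimensionality reduction by coordinate transformation: the output ignores the
# direction x1 - x2, so the transformed coordinate z2 can be discarded.
rng = np.random.default_rng(4)

N = 200
X = rng.normal(size=(N, 2))
y = np.tanh(X[:, 0] + X[:, 1])        # output depends only on x1 + x2

z1 = X[:, 0] + X[:, 1]                # kept coordinate (relevant to y)
z2 = X[:, 0] - X[:, 1]                # discarded coordinate (irrelevant to y)

# A one-dimensional kernel ridge model on z1 alone.
gamma = 1e-3
K = np.exp(-0.5 * (z1[:, None] - z1[None, :])**2)   # Gaussian Gram matrix
beta = np.linalg.solve(K + gamma * np.eye(N), y)
y_hat = K @ beta
```

The reduced model has the same number of kernel coefficients but operates on a one-dimensional state, mirroring the drop from n to d in the text.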

Differential Manifold-Based Transformation
As discussed in the previous section, some of the states in x may carry similar information related to y, and some may have little impact on y. To remove the correlation of the different states with y, we adopt a coordinate transformation to reduce the dimension. Coordinate transformations have been used to obtain more efficient solutions and to accelerate convergence in optimization problems [34,35]. However, the linear coordinate transformation, which is the most widely used, cannot handle this problem.
We address the problem with tools from differential geometry [36] to obtain a nonlinear coordinate transformation. A smooth distribution of dimension n − d is first defined, denoted ∆, on the differentiable manifold. The distribution ∆ is required to be invariant under the vector fields f and g, i.e., f(x) ∈ ∆ and g(x) ∈ ∆ for all x in ∆. A vector field on ∆ assigns one tangent vector to each x in ∆, so f and g can be regarded as mappings from ∆ into the tangent bundle.
To find the transformation t(x), a vector of n real-valued functions of n variables, denoted as t(x) = [t_1(x), t_2(x), ..., t_n(x)]^T, two assumptions are made:
• The inverse function t^{-1}(z) of t(x) exists: for any x in ∆, t^{-1}(t(x)) = x.
• The functions t(x) and t^{-1}(z) are both continuous and analytic on their domains of definition.
Based on the division of z in Equation (11), the components of z_2 carry no information on the output:

∂h/∂z_i = 0,  d < i ≤ n.

Then, ∆ is contained in the distribution (span{dh})^⊥. Let Ω denote the orthogonal complement of ∆, Ω = ∆^⊥. For any x in ∆, the values of Ω lie in the dual space (R^n)^*, whose elements a^* are covectors. If b is a column vector, a point in ∆, and a^* = [a_1, a_2, ..., a_n] is a row vector, a point in Ω, the inner product ⟨a^*, b⟩ is given by

⟨a^*, b⟩ = ∑_{i=1}^n a_i b_i.

Then, Ω contains span{dh} and is also invariant under the vector fields f and g.
Aiming at the maximal dimension reduction of Equation (5), we need to find the smallest distribution that contains Ω and is invariant under the vector fields f and g, denoted by ⟨f, g | Ω⟩. A nondecreasing sequence of distributions is defined by

Ω_0 = Ω = span{dh},  Ω_{k+1} = Ω_k + [f, Ω_k] + [g, Ω_k],  (20)

where [·, ·] is the Lie bracket of vector fields, an operator that assigns a third vector field to any two given vector fields on a smooth manifold: for any vector fields f and g,

[f, g](x) = (∂g/∂x) f(x) − (∂f/∂x) g(x).

According to Lemma 1.9.2 in [36], the iteration of Equation (20) eventually reaches Ω_{k*} = Ω_{k*+1} for some integer k*, at which point ⟨f, g | Ω⟩ = Ω_{k*}. Finally, we find the real-valued functions z_1 = [t_1(x), t_2(x), ..., t_d(x)]^T, and the model of Equation (2) is transformed to

dz_1/dt = f̃(z_1) + g̃(z_1)u,  y = h̃(z_1).

Compared with the original description, the dimension of the new description is reduced, as is the number of data required for accurate identification, which corresponds to N in Equation (13). Consequently, the number of parameters to be estimated is greatly reduced.
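The basic operation in the iteration of Equation (20) can be checked numerically. The snippet below evaluates the Lie bracket [f, g](x) = (∂g/∂x)f(x) − (∂f/∂x)g(x) by central-difference Jacobians for two made-up vector fields on R^2.

```python
import numpy as np

# Numerical Lie bracket of two vector fields on R^2 (fields are illustrative).
def jacobian(F, x, eps=1e-6):
    """Central-difference Jacobian of a vector field F at the point x."""
    n = len(x)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (F(x + e) - F(x - e)) / (2 * eps)
    return J

def lie_bracket(f, g, x):
    """[f, g](x) = Jg(x) f(x) - Jf(x) g(x)."""
    return jacobian(g, x) @ f(x) - jacobian(f, x) @ g(x)

f = lambda x: np.array([x[1], -np.sin(x[0])])   # pendulum-like drift (made up)
g = lambda x: np.array([0.0, np.cos(x[0])])     # input vector field (made up)

x0 = np.array([0.3, -1.2])
fg = lie_bracket(f, g, x0)
# Analytically, [f, g](x) = [-cos(x1), -x2*sin(x1)] for these two fields.
```

In the actual algorithm this bracket is applied repeatedly to the spanning fields of Ω_k until the sequence of distributions stops growing.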

Simulation and Evaluation
In this section, the proposed identification approach is evaluated and compared with the commonly used methods. A brief introduction to the experimental settings is first given. Comparison of the evaluation results with other methods is then made.

Background Information
To validate the proposed method on nonlinear systems, aerodynamic coefficients at large angles of attack were considered, which exhibit significant nonlinear phenomena in wind tunnel experiments. Large-amplitude forced oscillation tests were performed in the FL-8 wind tunnel, with a fourth-generation fighter configuration as the test model [37]. Table 1 gives the main physical parameters of the experimental model. In the tests, the flow speed was 30 m/s and the sampling period was 0.02 s. The state vector for aerodynamic identification was constructed from the angle of attack α, the pitch rate q, the elevator control surface deflection δ_e, and the Mach number M. The state equation for x_1(k) = α(k), x_2(k) = α̇(k), x_3(k) = α̈(k) can be given by

x_1(k+1) = x_1(k) + ∆t x_2(k) + w_1(k),
x_2(k+1) = x_2(k) + ∆t x_3(k) + w_2(k),

where ∆t denotes the time period of the measurements, and w(k) is the state noise.
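The discrete integrator chain for α can be sketched as follows; the initial values and the noise-free propagation are illustrative assumptions, not the wind tunnel data.

```python
import numpy as np

# Euler-type propagation of the chain x1 = alpha, x2 = alpha_dot, x3 = alpha_ddot,
# with additive state noise w(k). Initial state is made up for the example.
rng = np.random.default_rng(3)
dt = 0.02                           # sampling period from the tests

def step(x, w):
    x1, x2, x3 = x
    return np.array([x1 + dt * x2, x2 + dt * x3, x3]) + w

x = np.array([0.7, 0.0, 0.1])       # illustrative initial [alpha, alpha_dot, alpha_ddot]
for k in range(100):
    x = step(x, np.zeros(3))        # noise-free propagation here; pass w(k) to add noise
```

With a constant α̈ of 0.1 rad/s², 100 noise-free steps raise α̇ from 0 to 0.2 rad/s, consistent with the linear first elements of f(x) noted below.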
Corresponding to the form of Equation (2), the first three elements of f (x) and g(x) are linear. The construction for the remaining part of f (x) and g(x) can be obtained according to the physical mechanism, e.g., through table interpolation.

Comparison Results and Discussions
The nonlinear aerodynamic coefficients in the large-amplitude pitch oscillation tests at three frequencies are shown in Figure 2. The test was conducted at a balanced angle of 40° with an amplitude of 40°, and the sideslip angle was −30°. When the test data are insufficient, data-driven models, including step response models and neural networks, cannot be identified. We compared the proposed identification approach with the standard spline kernel-based method, choosing the experimental condition with a frequency of 0.8 Hz.
The estimation results are shown in Figure 3. For the lift coefficient, which varies slowly in time, the proposed approach and the standard spline kernel method demonstrate almost the same tracking ability; the improvement is not obvious because of the locally linear features of the curve. For the drag coefficient, which contains more complex nonlinear characteristics, the performance of the proposed method is clearly better: given the same amount of training data, more features are captured by the proposed approach. The comparison of the objective function values in the form of Equation (3) is given in Table 2, which shows that the proposed algorithm tracks the hysteresis with higher accuracy. As discussed above, in addition to accuracy, the construction of Equation (3) also takes into account the generalization ability of the model; therefore, the proposed method generalizes better and is less sensitive to noise in the observations. Additionally, with the dimensionality reduction strategy, the new state vector z transformed from x is divided into z_1 and z_2. The dimension of the state vector z_1 finally fed to the spline kernel-based model is 3, so the dimension of the final output model h(z_1) is 3. Compared with the standard spline kernel-based model, the proposed approach constructs a model with fewer parameters; estimation is performed at lower computational cost, while the information in the data is used more efficiently.

Conclusions
In this work, a novel spline kernel-based approach with dimensionality reduction is designed for nonlinear system identification. A regularization term based on the produced Hilbert space is added to the objective function to guarantee the generalization ability of the model. Dimensionality reduction is proposed to solve the problem kernel methods face when applied to high-dimensional data. By reducing the dimension of the state vector, the number of parameters in the estimation is significantly reduced. With the tool of the Lie bracket, a nonlinear coordinate transformation is constructed from the defined distribution on the differentiable manifold. It has a clear physical meaning: the transformation concentrates the output-relevant information in a subset of the states and abandons the redundant ones. After the transformation, the new state vector reduces the dimension of the new description. Evaluation on the nonlinear aerodynamic coefficients from the wind tunnel tests was performed. The experimental results show that the proposed method successfully approximates the data of the nonlinear system and achieves higher accuracy and better generalization ability through dimensionality reduction. Future work will include analyzing the effect of the regularization parameter γ on the performance of the identification method, and subsequently incorporating the estimation of γ into the identification framework.