Auxiliary Model-Based Multi-Innovation Fractional Stochastic Gradient Algorithm for Hammerstein Output-Error Systems

Abstract: This paper focuses on the nonlinear system identification problem, which is a basic premise of control and fault diagnosis. For Hammerstein output-error nonlinear systems, we propose an auxiliary model-based multi-innovation fractional stochastic gradient method. Based on the multi-innovation identification theory, the scalar innovation is extended to an innovation vector to increase the data utilization. By establishing appropriate auxiliary models, the unknown variables are estimated, and the performance of the parameter estimation is improved by means of the fractional-order calculus theory. Simulation results validate that the proposed method achieves better estimation accuracy than the conventional multi-innovation stochastic gradient algorithm.


Introduction
The accuracy of a system model affects the performance and safety of industrial control systems [1][2][3][4][5], and system identification provides the theory and methods for constructing mathematical models of systems; it has been widely implemented in practice [6][7][8][9]. The behavior of most modern industrial control systems and synthetic systems is nonlinear by nature. At present, an important research field in modern signal processing is parameter identification for nonlinear systems, in which block-structured systems, such as the Hammerstein model, are among the most widely used nonlinear models due to their efficiency and accuracy in modeling complex nonlinear systems [10][11][12]. The representative feature of a Hammerstein model is that its architecture consists of two blocks: a static nonlinear model followed by a linear dynamic model. This structural simplicity provides a good compromise between the accuracy of nonlinear systems and the tractability of linear systems, thus promoting its use in different nonlinear applications such as automatic control [13][14][15], fault detection and diagnosis [16][17][18], and so on.
Recently, several new system identification methods and theories have been developed for nonlinear models in the literature, including the least squares methods [19], the gradient-based methods [20], the iterative methods [21], the subspace identification methods [22], the hierarchical identification theory [23], and the auxiliary model and multi-innovation (MI) identification theories [24]. One well-known algorithm is the stochastic gradient (SG) algorithm, which has lower computational cost and complexity than the recursive least squares algorithm, although slow convergence is often observed. Therefore, different modifications of the SG algorithm were developed to enhance its performance [25][26][27][28][29][30]. In particular, by extending the scalar innovation into an innovation vector, the MI identification theory was proposed in [31] to improve the convergence speed and estimation accuracy, and the fractional-order calculus method was introduced in [32,33], where it was shown to achieve more satisfactory performance.
To the best of our knowledge, various fractional-order gradient methods have been developed [34][35][36]. For example, in [37], a fractional-order SG algorithm was designed to identify Hammerstein nonlinear ARMAX systems via an improved fractional-order gradient method. Based on the MI theory and fractional-order calculus, an MI fractional least mean squares identification algorithm was presented for Hammerstein controlled autoregressive systems, where the update mechanism combines the first-order gradient and the fractional gradient [38]. However, the above-discussed papers only consider Hammerstein equation-error systems, and the cross-products between the parameters of the linear block and the nonlinear block can introduce many redundant parameters. When the dimensions of the parameter vectors are large, this causes high computational complexity and deteriorates the identification accuracy.
In this work, we study the identification problem of Hammerstein output-error moving average (OEMA) systems, which have been less studied due to the difficulty of their identification [39,40]. To avoid estimating redundant parameters, the Hammerstein model is parameterized using the key-term separation principle [41]. Furthermore, based on the identification model, the fractional-order SG algorithm is extended to the identification of Hammerstein OEMA systems, and an auxiliary model-based multi-innovation fractional stochastic gradient (AM-MIFSG) algorithm is presented based on the auxiliary model identification idea. The proposed algorithm achieves higher estimation accuracy than the common multi-innovation stochastic gradient (MISG) algorithm, with fewer parameters to be estimated.
The paper is structured as follows. Section 2 gives a description of the Hammerstein OEMA systems. Section 3 introduces the multi-innovation identification theory and derives an auxiliary model-based multi-innovation stochastic gradient (AM-MISG) identification algorithm for comparison purposes. Section 4 presents the AM-MIFSG identification algorithm for the Hammerstein OEMA systems. Section 5 gives the convergence analysis of the proposed AM-MIFSG algorithm. Section 6 verifies the results in this paper using a simulation example. Finally, concluding remarks are given in Section 7.

The System Description
Consider the Hammerstein OEMA system shown in Figure 1, where {u_k} and {y_k} are the input and output sequences of the system, {v_k} is a stochastic white noise sequence with zero mean and variance σ^2, and {ū_k} is the output sequence of the nonlinear block, which can be represented as a linear combination of known basis functions f(u_k) := [f_1(u_k), f_2(u_k), ..., f_m(u_k)]. The system is described by

y_k = [B(z)/A(z)] ū_k + D(z) v_k, (1)
ū_k = f(u_k) c = c_1 f_1(u_k) + c_2 f_2(u_k) + ... + c_m f_m(u_k), (2)

where A(z), B(z) and D(z) are polynomials in the unit backward shift operator z^{-1} [z^{-1} y_k = y_{k-1}], defined as

A(z) := 1 + a_1 z^{-1} + a_2 z^{-2} + ... + a_{n_a} z^{-n_a},
B(z) := 1 + b_1 z^{-1} + b_2 z^{-2} + ... + b_{n_b} z^{-n_b},
D(z) := 1 + d_1 z^{-1} + d_2 z^{-2} + ... + d_{n_d} z^{-n_d}.

Assume that the orders n_a, n_b and n_d of these polynomials are known and that u_k = 0, y_k = 0 and v_k = 0 for k ≤ 0. Define the intermediate variables x_k and w_k as follows:

x_k := [B(z)/A(z)] ū_k = ū_k + Σ_{i=1}^{n_b} b_i ū_{k-i} − Σ_{i=1}^{n_a} a_i x_{k-i}, (3)
w_k := D(z) v_k = v_k + Σ_{i=1}^{n_d} d_i v_{k-i}. (4)

Take the first variable ū_k on the right-hand side of (3) as the separated key term. Based on the principle of key-term separation [42,43], substituting ū_k in (2) into (3) gives

x_k = Σ_{j=1}^{m} c_j f_j(u_k) + Σ_{i=1}^{n_b} b_i ū_{k-i} − Σ_{i=1}^{n_a} a_i x_{k-i}.

Define the following related parameter vectors:

θ := [θ_s^T, d^T]^T ∈ R^n, n := n_a + n_b + n_d + m,
θ_s := [a^T, b^T, c^T]^T ∈ R^{n_a + n_b + m},
a := [a_1, a_2, ..., a_{n_a}]^T ∈ R^{n_a},
b := [b_1, b_2, ..., b_{n_b}]^T ∈ R^{n_b},
c := [c_1, c_2, ..., c_m]^T ∈ R^m,
d := [d_1, d_2, ..., d_{n_d}]^T ∈ R^{n_d},

and the information vectors:

φ_k := [φ_{s,k}^T, φ_{n,k}^T]^T ∈ R^n,
φ_{s,k} := [−x_{k-1}, ..., −x_{k-n_a}, ū_{k-1}, ..., ū_{k-n_b}, f_1(u_k), ..., f_m(u_k)]^T ∈ R^{n_a + n_b + m}, (5)
φ_{n,k} := [v_{k-1}, v_{k-2}, ..., v_{k-n_d}]^T ∈ R^{n_d}.

From (1)-(5), we have

y_k = φ_k^T θ + v_k. (6)

Equation (6) is the identification model of the Hammerstein OEMA system. Please note that the parameter vector θ contains all the parameters of the system in (1)-(2), and the parameters in the linear and nonlinear blocks are separated, so there is no need to identify redundant parameters. This paper aims to present an AM-MIFSG algorithm for Hammerstein OEMA systems to improve the parameter estimation accuracy.
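To make the model structure concrete, the following Python sketch simulates a first-order instance of the identification model (6). The orders, coefficient values, noise level, and the polynomial basis f(u) = [u, u^2] are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Illustrative first-order Hammerstein OEMA system (all values assumed):
# A(z) = 1 + a1 z^-1, B(z) = 1 + b1 z^-1, D(z) = 1 + d1 z^-1,
# nonlinear block: u_bar_k = c1*u_k + c2*u_k^2 with basis f(u) = [u, u^2].
rng = np.random.default_rng(0)
a1, b1, d1, c1, c2 = -0.5, 0.8, 0.3, 1.0, 0.5

L = 200
u = rng.uniform(-1.0, 1.0, L)
v = 0.1 * rng.standard_normal(L)

u_bar = c1 * u + c2 * u**2                 # static nonlinear block, Eq. (2)
x = np.zeros(L)                            # intermediate variable x_k, Eq. (3)
y = np.zeros(L)
for k in range(L):
    x_km1 = x[k - 1] if k >= 1 else 0.0    # x_k = 0 for k <= 0
    ub_km1 = u_bar[k - 1] if k >= 1 else 0.0
    v_km1 = v[k - 1] if k >= 1 else 0.0
    x[k] = u_bar[k] + b1 * ub_km1 - a1 * x_km1
    y[k] = x[k] + v[k] + d1 * v_km1        # y_k = x_k + D(z) v_k, Eq. (6)
```

In this toy instance, the parameter vector to identify is θ = [a1, b1, c1, c2, d1]^T, matching n = n_a + n_b + n_d + m = 1 + 1 + 1 + 2 = 5.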

The AM-MISG Algorithm
In this section, we briefly introduce the auxiliary model and multi-innovation identification theories and derive the AM-MISG algorithm for the Hammerstein OEMA system. Let θ̂_k denote the estimate of θ. Define the cost function

J(θ) := (1/2) [y_k − φ_k^T θ]^2.

Minimizing J(θ) along the negative gradient direction gives the following SG algorithm for estimating the parameter vector θ:

θ̂_k = θ̂_{k-1} + µ_1 φ_k [y_k − φ_k^T θ̂_{k-1}], (7)
µ_1 = 1/s_k, (8)
s_k = s_{k-1} + ||φ_k||^2, s_0 = 1, (9)

where µ_1 is the step size of the SG algorithm. However, it is worth noting that the variables x_{k-i}, ū_{k-i} and v_{k-i} in φ_k are unknown, and thus the algorithm in (7)-(9) cannot be implemented directly. The solution is to use the idea of the auxiliary model and build the following auxiliary models based on the parameter estimate θ̂_k:

x̂_k = φ̂_{s,k}^T θ̂_{s,k},
û_k = f(u_k) ĉ_k = ĉ_1 f_1(u_k) + ... + ĉ_m f_m(u_k),
v̂_k = y_k − φ̂_k^T θ̂_k,

where û_k denotes the estimate of ū_k, and to use the outputs x̂_{k-i}, û_{k-i} and v̂_{k-i} of the auxiliary models instead of the unknown variables x_{k-i}, ū_{k-i} and v_{k-i} to construct the estimates of the information vectors:

φ̂_{s,k} := [−x̂_{k-1}, ..., −x̂_{k-n_a}, û_{k-1}, ..., û_{k-n_b}, f_1(u_k), ..., f_m(u_k)]^T,
φ̂_{n,k} := [v̂_{k-1}, v̂_{k-2}, ..., v̂_{k-n_d}]^T,
φ̂_k := [φ̂_{s,k}^T, φ̂_{n,k}^T]^T.

The SG algorithm updates the parameter estimate using only the current data, so its computational complexity is low, but its estimation accuracy needs to be improved. Based on the multi-innovation identification theory [44,45], a sliding window of length p (i.e., the innovation length) is built to improve the estimation performance of the SG algorithm; it contains the data information from the current time k back to time k − p + 1. Define the stacked output vector Y_{p,k} and the stacked information matrix Φ̂_{p,k} as

Y_{p,k} := [y_k, y_{k-1}, ..., y_{k-p+1}]^T ∈ R^p,
Φ̂_{p,k} := [φ̂_k, φ̂_{k-1}, ..., φ̂_{k-p+1}] ∈ R^{n×p}.

In principle, the innovations in the window should be computed with the past estimates θ̂_{k-i}; since the estimate θ̂_{k-1} is closer to the optimal value θ than θ̂_{k-i} for i = 2, ..., p, the scalar innovation in (7) can be extended to the innovation vector

E_{p,k} := Y_{p,k} − Φ̂_{p,k}^T θ̂_{k-1} ∈ R^p. (10)

In summary, we can obtain the AM-MISG algorithm as follows:

θ̂_k = θ̂_{k-1} + (Φ̂_{p,k}/s_k) E_{p,k},
E_{p,k} = Y_{p,k} − Φ̂_{p,k}^T θ̂_{k-1},
s_k = s_{k-1} + ||φ̂_k||^2, s_0 = 1,
x̂_k = φ̂_{s,k}^T θ̂_{s,k},
û_k = f(u_k) ĉ_k,
v̂_k = y_k − φ̂_k^T θ̂_k.

Please note that the AM-MISG algorithm reduces to the auxiliary model-based stochastic gradient (AM-SG) algorithm when p = 1.
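As a concrete illustration, the following Python sketch runs the AM-MISG recursion on a simulated first-order example. The system coefficients, noise level, data length, and window length are our own assumptions, and the code follows the update equations above rather than any reference implementation.

```python
import numpy as np

# Assumed first-order example: A(z)=1+a1 z^-1, B(z)=1+b1 z^-1,
# D(z)=1+d1 z^-1, basis f(u)=[u, u^2]; values are illustrative only.
rng = np.random.default_rng(1)
a1, b1, d1, c = -0.4, 0.6, 0.2, np.array([1.0, 0.5])

L, p = 3000, 3
u = rng.uniform(-1.0, 1.0, L)
v = 0.05 * rng.standard_normal(L)
ub = c[0] * u + c[1] * u**2
x = np.zeros(L); y = np.zeros(L)
for k in range(1, L):
    x[k] = ub[k] + b1 * ub[k - 1] - a1 * x[k - 1]
    y[k] = x[k] + v[k] + d1 * v[k - 1]

theta = np.zeros(5)                 # estimate of [a1, b1, c1, c2, d1]
s = 1.0                             # s_0 = 1
xh = np.zeros(L); ubh = np.zeros(L); vh = np.zeros(L)   # auxiliary-model outputs
Phi, Y = [], []                     # sliding window buffers
for k in range(1, L):
    phi = np.array([-xh[k - 1], ubh[k - 1], u[k], u[k]**2, vh[k - 1]])
    Phi.append(phi); Y.append(y[k])
    Phi_p = np.array(Phi[-p:]).T                # n x p information matrix
    Y_p = np.array(Y[-p:])
    E_p = Y_p - Phi_p.T @ theta                 # innovation vector E_{p,k}
    s += phi @ phi                              # s_k = s_{k-1} + ||phi_k||^2
    theta = theta + (Phi_p @ E_p) / s           # AM-MISG update
    # auxiliary models replace the unknown internal variables
    ubh[k] = theta[2] * u[k] + theta[3] * u[k]**2
    xh[k] = phi[:4] @ theta[:4]                 # x_hat_k = phi_s,k^T theta_s,k
    vh[k] = y[k] - phi @ theta                  # v_hat_k = y_k - phi_k^T theta_k
```

As the paper notes, the 1/s_k gain makes convergence slow; the MI window (p > 1) reuses the last p data pairs at every step to speed it up.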

The AM-MIFSG Algorithm
This section deduces an AM-MIFSG algorithm to improve the parameter estimation performance of the above AM-MISG identification algorithm.
In (7), the first-order gradient is used to update the parameter vector. In contrast to the integer-order derivative, the fractional-order derivative of the quadratic objective function near a point is nonlocal in nature. This property enables the fractional-order gradient method to escape local optima and reach the global minimum more quickly. Here, we propose to add a fractional-order gradient to the first-order gradient, and the update relation becomes

θ̂_k = θ̂_{k-1} − µ_1 ∂J(θ)/∂θ − µ_α ∂^α J(θ)/∂θ^α, (25)

where µ_α is the step size for the fractional-order derivative ∂^α. According to the Caputo and Riemann-Liouville definitions [46,47], the fractional derivative of a power function f(t) = t^n (n > −1) is

D_t^α t^n = [Γ(n + 1)/Γ(n + 1 − α)] t^{n−α}, (26)

where D_t^α is the fractional derivative operator of order α and Γ is the gamma function, which satisfies Γ(n) = (n − 1)! for positive integers n.
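The power-function rule (26) can be checked numerically. The helper below is a small sketch (the function name is ours); as α → 1 it recovers the ordinary derivative n t^(n−1).

```python
from math import gamma

def caputo_power(n, alpha, t):
    # Fractional derivative of f(t) = t^n of order alpha, Eq. (26):
    # D^alpha t^n = Gamma(n+1) / Gamma(n+1-alpha) * t^(n-alpha)
    return gamma(n + 1) / gamma(n + 1 - alpha) * t ** (n - alpha)

print(caputo_power(2, 0.999, 1.5))  # close to the ordinary derivative 2*1.5 = 3.0
print(caputo_power(2, 1.0, 1.5))    # exactly n*t^(n-1) = 3.0
```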
According to (26), the fractional-order gradient in Equation (25) can be written as

∂^α J(θ)/∂θ^α = −Γ(2) φ_k [y_k − φ_k^T θ] ⊙ |θ|^{1−α}/Γ(2 − α), (27)

where Γ(2) = 1, ⊙ denotes the element-wise product, and |θ|^{1−α} is computed element-wise. Then Equation (25) can be approximated as

θ̂_k = θ̂_{k-1} + µ_1 φ_k e_k + µ_α φ_k e_k ⊙ |θ̂_{k-1}|^{1−α}/Γ(2 − α), e_k := y_k − φ_k^T θ̂_{k-1}. (28)

Please note that the absolute value of θ is used to avoid complex values; this is a common way of dealing with fractional-order gradients [38]. The introduction of the fractional-order parameter α provides additional degrees of freedom and increases the flexibility of the parameter estimation.
Similar to the AM-MISG algorithm in Section 3, expanding the information vector φ̂_k to the information matrix Φ̂_{p,k} and applying the auxiliary model identification idea, we can obtain the following AM-MIFSG algorithm:

θ̂_k = θ̂_{k-1} + (Φ̂_{p,k}/s_k) E_{p,k} + (µ_α/s_k) [Φ̂_{p,k} E_{p,k}] ⊙ |θ̂_{k-1}|^{1−α}/Γ(2 − α),
E_{p,k} = Y_{p,k} − Φ̂_{p,k}^T θ̂_{k-1},
s_k = s_{k-1} + ||φ̂_k||^2, s_0 = 1,
Y_{p,k} = [y_k, y_{k-1}, ..., y_{k-p+1}]^T,
Φ̂_{p,k} = [φ̂_k, φ̂_{k-1}, ..., φ̂_{k-p+1}],
f(u_k) = [f_1(u_k), f_2(u_k), ..., f_m(u_k)],
φ̂_{s,k} = [−x̂_{k-1}, ..., −x̂_{k-n_a}, û_{k-1}, ..., û_{k-n_b}, f_1(u_k), ..., f_m(u_k)]^T,
φ̂_{n,k} = [v̂_{k-1}, v̂_{k-2}, ..., v̂_{k-n_d}]^T,
φ̂_k = [φ̂_{s,k}^T, φ̂_{n,k}^T]^T,
x̂_k = φ̂_{s,k}^T θ̂_{s,k},
û_k = f(u_k) ĉ_k,
v̂_k = y_k − φ̂_k^T θ̂_k.

The above AM-MIFSG algorithm reduces to the auxiliary model-based fractional stochastic gradient (AM-FSG) algorithm when p = 1.
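The update above can be sketched as a single function. This is our own compact rendering, not a reference implementation: the function name and signature are ours, and the element-wise |θ|^(1−α)/Γ(2−α) factor follows the approximation in (28).

```python
import numpy as np
from math import gamma

def mifsg_update(theta, Phi_p, Y_p, s, mu_alpha, alpha):
    """One AM-MIFSG parameter update (a sketch; names are ours).

    Combines the first-order direction Phi_p @ E_p with a fractional-order
    direction scaled element-wise by |theta|^(1-alpha) / Gamma(2-alpha).
    """
    E_p = Y_p - Phi_p.T @ theta                       # innovation vector E_{p,k}
    grad1 = Phi_p @ E_p                               # first-order direction
    frac = np.abs(theta) ** (1.0 - alpha) / gamma(2.0 - alpha)
    return theta + grad1 / s + mu_alpha * (grad1 * frac) / s

# Toy call: n = 3 parameters, window length p = 2 (numbers arbitrary)
Phi_p = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y_p = np.array([1.0, 2.0])
theta = np.array([0.5, -0.5, 0.1])
theta_new = mifsg_update(theta, Phi_p, Y_p, s=2.0, mu_alpha=0.1, alpha=0.9)
```

Note that with α = 1 the fractional factor becomes 1 element-wise, so the step reduces to a rescaled first-order (AM-MISG-style) update, which is a convenient sanity check.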

Remark 1. In general, as the innovation length p increases, the collected data are used more fully, and the estimation accuracy is therefore gradually improved. However, the computational load increases at the same time. How to choose the optimal innovation length p is an open problem; in practice, we often choose p < n.

Remark 2. The differential order α is chosen in the range (0, 1). The order may show different characteristics for different systems and can be adjusted during the procedure as needed.
The implementation of the AM-MIFSG algorithm is listed as follows.

1. Initialize: let k = 1, set the initial values θ̂_0 and s_0 = 1, and choose the innovation length p and the fractional order α.
2. Collect the input-output data u_k and y_k, form the basis function vector f(u_k) by (42), and form the information vectors φ̂_k by (43), φ̂_{s,k} by (44) and φ̂_{n,k} by (45).
3. Compute the innovation vector E_{p,k} and update the parameter estimate θ̂_k.
4. Compute the auxiliary-model outputs x̂_k, û_k and v̂_k.
5. Increase k by 1 and go to step 2.
Convergence Analysis

Proof. Define the parameter estimation error θ̃_k := θ̂_k − θ ∈ R^n. To simplify the proof, let s_{α,k} := s_k/Γ(2 − α). Inserting (32) into (31) and rearranging yields a recursive relation for the error θ̃_k. The rest can be proved in a way similar to that in [66].

Examples
Consider the following Hammerstein OEMA system: In this example, the input {u_k} is a persistently exciting signal sequence and {v_k} is a white noise sequence with zero mean and variance σ^2 = 0.80^2. The data length is taken as L = 4000, where the first 3500 samples are used for system identification and the remaining 500 samples are used for prediction and validation. The details are as follows.
1. First, we apply the AM-MISG algorithm and the AM-MIFSG algorithm with α = 0.94 to estimate the parameters of the considered system. Tables 1 and 2 show the parameter estimates and their errors with p = 1, 2, 4 and 6. Figures 2 and 3 plot the parameter estimation error δ := ||θ̂_k − θ||/||θ|| versus k.
2. Second, to examine the influence of the fractional order α in the AM-MIFSG algorithm, we take p = 5 and 6 and α = 0.80, 0.90 and 0.92, respectively. The simulation results are shown in Tables 3 and 4 and Figures 4 and 5.
3. Finally, a different data set (L_e = 500 samples from k = 3501 to 4000) and the estimated model obtained by the AM-MIFSG algorithm with p = 6 and α = 0.92 are used for model validation. The predicted output and the true output are plotted in Figure 6 (from k = 3501 to 3700) and Figure 7 (from k = 3501 to 4000), where the dotted line is the output ŷ_k of the estimated model, the solid line is the true output y_k, and the average predicted output error is computed over the L_e validation samples.

From Tables 1-4 and Figures 2-7, we can draw the following conclusions: (1) as the innovation length p increases, both the AM-MISG and the AM-MIFSG algorithms give higher parameter estimation accuracy; (2) in general, the AM-MIFSG algorithm has a faster convergence rate than the AM-MISG algorithm in the same situation, and the introduction of the fractional order improves the parameter estimation accuracy; (3) the convergence rate of the AM-MIFSG algorithm increases as the fractional order α increases, and an α within the range [0.90, 0.95] appears to be an appropriate choice that gives better estimation results for the Hammerstein output-error systems; (4) the estimated model obtained by the AM-MIFSG algorithm can capture the system dynamics well.
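For reference, the relative estimation error δ := ||θ̂_k − θ||/||θ|| reported in the tables and plotted in Figures 2-5 is computed as below; the numerical values here are illustrative placeholders, not the paper's estimates.

```python
import numpy as np

# Illustrative true parameters and one estimate (placeholder values)
theta_true = np.array([-0.4, 0.6, 1.0, 0.5, 0.2])
theta_hat = np.array([-0.38, 0.57, 0.96, 0.52, 0.23])

# delta := ||theta_hat - theta|| / ||theta||, as in Figures 2-5
delta = np.linalg.norm(theta_hat - theta_true) / np.linalg.norm(theta_true)
print(f"relative error: {100 * delta:.2f}%")
```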

Conclusions
This paper derives an AM-MIFSG estimation algorithm for Hammerstein output-error systems based on the key-term separation principle and the auxiliary model identification idea. By means of the key-term separation principle, the parameters in the linear and nonlinear blocks are separated, and the unknown variables in the identification model are replaced by the outputs of the auxiliary models. The analysis of the simulation results shows that the proposed algorithm achieves better parameter estimation performance than the AM-MISG algorithm. However, several topics need to be discussed further. For example, is this algorithm still effective for systems with missing data? And can the performance of the algorithm be improved by introducing a time-varying differential order α? These questions remain open problems for future studies.