Kernel Identification of Non-Linear Systems with General Structure

In the paper we deal with the problem of non-linear dynamic system identification in the presence of random noise. The class of considered systems is relatively general, in the sense that it is not limited to block-oriented structures such as Hammerstein or Wiener models. It is shown that the proposed algorithm can be generalized for two-stage strategy. In step 1 (non-parametric) the system is approximated by multi-dimensional regression functions for a given set of excitations, treated as representative set of points in multi-dimensional space. ‘Curse of dimensionality problem’ is solved by using specific (quantized or periodic) input sequences. Next, in step 2, non-parametric estimates can be plugged into least squares criterion and support model selection and estimation of system parameters. The proposed strategy allows decomposition of the identification problem, which can be of crucial meaning from the numerical point of view. The “estimation points” in step 1 are selected to ensure good task conditioning in step 2. Moreover, non-parametric procedure plays the role of data compression. We discuss the problem of selection of the scale of non-parametric model, and analyze asymptotic properties of the method. Also, the results of simple simulation are presented, to illustrate functioning of the method. Finally, the proposed method is successfully applied in Differential Scanning Calorimeter (DSC) to analyze aging processes in chalcogenide glasses.


Introduction
The problem of non-linear system modeling has been intensively examined over the past four decades. Owing to many potential applications (see [1]) and interdisciplinary scope of the topic, both scientists and engineers look for more precise and numerically efficient identification algorithms. First attempts at generalization of linear system identification theory for non-linear models were based on Volterra series representation ( [2]). Traditional Volterra series-based approach leads to relatively high numerical complexity, which is often not acceptable from practical point of view. To cope with this problem regularization or tensor network techniques have been proposed recently ( [3,4]). However, strong restrictions are imposed on the system characteristics (e.g., smoothness of nonlinearity, and short memory of the dynamics). Alternatively, the concept of block-oriented models was introduced ([5]). It was assumed that the system can be represented, or approximated with satisfactory accuracy, with the use of structural models including interconnected elementary blocks of two types-linear dynamics and static nonlinearities ( [6]). The most popular structures in this class are Hammerstein system (see e.g., [7][8][9][10][11][12]) with static nonlinearity followed by linear dynamics, and the Wiener system [13][14][15][16][17][18]) including the same blocks connected reversely.
Traditionally, identification method needs two kinds of knowledge-set of input-output measurements and a priori parametric formula describing system characteristics and including finite number of unknown parameters ( [6,19]). Usually, the polynomial model of static characteristics and the difference equation with known orders are assumed. This convention leads to relatively fast convergence of parameter estimates, but the risk of systematic approximation error appears, if the assumed model is not correct. As an alternative, the theory of non-parametric system identification ( [20][21][22][23]) was proposed to solve this problem. The algorithms work under mild prior restrictions, such as stability of linear dynamic block and local continuity of static non-linear characteristics. Although the estimates converge to the true characteristics, the rate of convergence is relatively slow in practice, as a consequence of assumptions relaxation. This paper represents the idea of combined, i.e., parametric-non-parametric, approach (see e.g., [24][25][26][27][28][29][30]), in which both parametric and non-parametric algorithms support each other to achieve the best possible results of identification for moderate number of measurements and to guarantee the asymptotic consistency (i.e., convergence to the true system characteristics), when the number of observations tends to infinity. Since the preliminary step of structure selection is treated rather cursorily in the literature, generalizations of the approach towards wider classes of systems seem to be of high importance from practical point of view. The paper was also motivated by the project of Differential Scanning Calorimeter ( [31]) developed in the team, and particularly, modeling of heating process for examination the properties of chalcogenide glasses.
Main contribution of the paper lays in the following aspects: • proposed identification algorithm is run without any prior knowledge about the system structure and parametric representation of nonlinearity, • non-parametric multi-dimensional kernel regression estimate was generalized for modeling of non-linear dynamic systems, and the dimensionality problem was solved by using special input sequences, • the scheme elaborated in the paper was successfully applied in Differential Scanning Calorimeter for testing parameters of chalcogenide glasses.
The paper is organized as follows. In Section 2 the general class of considered systems and the identification problem is formulated in detail. Next, in Section 3.1, purely non-parametric, regression-based approach is presented, and its disadvantages are discussed. Then, to cope with dimensionality problem, the idea of some specific input sequences is presented in Section 3.2. Owing to that, the system characteristics are identified only for some selected points, but the convergence is much faster. The idea of combined, two-stage strategy is introduced in Section 4. It allows use of prior knowledge to expand the model on the whole input space. Also, the results of simple simulation are included in Section 5 to illustrate and discuss some practical aspects of the approach. Finally, in Section 6, the algorithm is successfully applied in Differential Scanning Calorimeter to model aging properties of modern materials (chalcogenide glasses).

Class of Systems
We consider discrete-time non-linear dynamic system with general representation where {u k−i } ∞ i=0 is bounded random input sequence (|u k | < u max ), z k is zero mean random disturbance. The transformation F () is Lipschitz with respect to all arguments, and has the property of exponential forgetting (fading memory), i.e., if we put u k−i = 0 for i ≥ s and define the cut-off sequence as We assume that it holds that with some unknown c = const, and 0 < λ < 1. Similar class of fading memory systems, in which the output depends less and less on historical inputs, was considered in [32]. The goal is to identify the system (build the model F ()) using the sequence of N input-output measurements {(u k , y k )} N k=1 . In considered system class, hysteresis is not admitted.

Examples
In this section, we show that some systems (popular in applications) fall into above description as special cases.

Hammerstein System
For Hammerstein system (see Figure 1), described by the equation with the Lipschitz non-linear characteristic, i.e., such that and asymptotically stable dynamics, i.e., |γ i | ≤ c H λ i , we get { } ∞

Wiener System
Analogously, for Wiener system (Figure 2), where the stable linear dynamics is followed by the Lipschitz static non-linear block We get

Finite Memory Bilinear System
Another important and often met in application special case is the bilinear system with finite order m. It is described by the formula i.e., for m = 1 we get for m = 2 we have that Since Considered example falls into the more general class of Volterra representation. Presented approach works without the knowledge of parametric representation. As regards applicability to the class (1), for m < ∞ we have finite memory system, i.e., ∆(s) = 0, as s > m. Moreover, since the input is assumed to be bounded (|u k | < u max ), resulting mapping F () fulfills Lipschitz condition (as ordinary polynomial on compact support).

General Overview
Let us introduce the s-dimensional input regressor and the regression function with the argument vector In particular, for s = 1 we get and for s = 2 The non-parametric kernel estimate ([20-23,33]) of R s x (s) has the following form where · denotes Euclidean norm, K () plays the role of kernel function, e.g., and h is a bandwidth parameter, responsible for the balance between bias and variance of the estimate. The class of possible kernels can obviously be generalized. Nevertheless, previous experiences shows that the kind of kernel function used for estimation is of secondary importance, whereas behavior of h(N) with respect to N is fundamental. We limited the presentation to Parzen (window) kernel for clarity of exposition. It fulfills all general assumptions made for kernels, i.e., it is even, non-negative and square integrable. The system (1) is thus approximated by the model and s can be interpreted as its complexity. Obviously, both h = h(N) and s = s(N) need to be set depending on the number of measurements N. Observing that the mean squared error of the model R s u (s) k can be expressed as follows and introducing the true finite-dimensional regression function R s u → 0 as N → ∞, these components can be set arbitrarily small by using appropriate scale s. Owing to above, for fixed s we focus on the first component of the MSE error of the form where bias R s (u var It can simply be shown that The bias order follows directly from Lipschitz condition, and the fact that u For window kernel, Lipschitz function F , and strictly positive input probability density function around the estimation point, probability of selection in s-dimensional space can be obviously evaluated from below by ch s , where c is some constant. Hence, expected number of successes is proportional to Nch s (not less than). The variance order is thus a simple consequence of Wald's identity. Hence, to assure the convergence R s x (s) → R s x (s) , as N → ∞ in the mean square sense, the following conditions must be fulfilled h → 0 and Nh s → ∞, as N → ∞, which leads to typical setting To obtain the best asymptotic trade-off between squared bias and variance and comparing its orders we get Moreover, to assure the balance between the estimation error and approximation error of order o(λ s ), connected with neglecting the tail {u k−i } ∞ i=s we get where 2 log 1 λ = const. Owing to above, the scale s (N) must not be faster than log N, i.e., which illustrates slowness of admissible model complexity increasing in general case. The property (38) commonly known as "curse of dimensionality problem" illustrates the main drawback of multi-dimensional non-parametric regression approach to system modeling in traditional form. The reason is that probability of kernel selection decreases rapidly when s grows large. We also refer the reader to the proof of Theorem 3 in [26], where a detailed discussion concerning an analogous problem can be found.

Dimension Reduction
To cope with the problem shown in (39) we consider two cases of some specific input excitation processes to speed up the rate of convergence.

Discrete Input
In case 1 we assume that in each s-element input sub-sequence u k , u k+1 , ..., u k+s−1 , there exists d inputs with discrete distribution on finite set of possible realizations. Consequently, all the points u (s) k ∈ R s lay on a finite number of separable and compact subspaces with the internal dimension and for x (s) = u (s) For each measurement point probability of kernel selection behaves like ch s * , where s * , (s * < s), is internal dimension of this subspace. In particular, for d = s (all input variables quantized) we get ERR ∼ N −1 . The sets of possible input sequences for s = 2 are illustrated in Figure 3 and 4.

Periodic Input
In case 2 we assume the input is periodic with the period N 0 , i.e., u k = u k+N 0 for each k = 1, 2, ..., N − N 0 . Then, the value of the regressor u (s) k (see (13)) evaluates to one of N 0 distinct points in R s , n ∈ Z, x (s) Measurements are uniformly distributed on the finite set of N 0 distinct points Narrowing of h(N) does not affect the kernel estimator asymptotically (i.e., s * = 0). Consequently, we get the best possible convergence rate ERR ∼ N −1 . However, it must be admitted that estimators are calculated only for finite number of points, and, increasing N 0 causes increase of variance of the regression estimator for particular points x

Hybrid/Combined Parametric-Non-Parametric Approach
Since the special input excitations allows for fast recovering the system characteristics only in some points, additional prior knowledge is needed to extend the model for arbitrary process {u k }. In this section we assume that the transformation F {u k−i } ∞ i=0 belongs to the given (a priori known), finite dimensional class of systems F u (s) with unknown parameter vector θ. In the proposed methodology, one of the input excitations described in Section 4 is applied. The system is identified on the finite set of N 0 representative points x (s) [1] , x (s) [2] , ..., x (s) [p] ∈ R s , and p = 1, 2, ..., N 0 . Let us denote by θ * -the true and unknown vector of system parameters. We assume that θ * is identifiable, i.e., the following property holds Moreover, let the quality index be convex for θ ∈ Ξ, where Ξ is some neighborhood of the true θ * The following two-step algorithm is proposed.

Simulation Example
To illustrate the proposed method, we simulated simple Wiener system (see Figure 2) with and uniformly distributed output disturbance For N = 10 4 simulated input-output pairs {(u k , y k )} N k=1 , the non-parametric models R s (u k ) were computed for s = 1, 2, 3 and compared with respect to the following empirical error (61) The results are presented in Table 1 and Figures 5 and 6. Explicit derivation of the true finite order (2D or 3D) regression function is problematic, owing to the fact that the neglected part of input signal, i.e., the 'tail' connected with terms {u k−τ } ∞ τ=s+1 , is transferred through the non-linear characteristic arctg(). Figures 5 and 6 illustrate non-parametric character and non-linear properties of the model, and give a general view on the shape of input-output relationship, which can be helpful for eventual parametrization. Quantized input (59) can be a good choice, when s = const < ∞, and the non-parametric estimate R s u     Table 1 illustrates reduction of error with respect to scale of the regression. The results are also compared with the best linear approximations FIR(s) of Wiener system. We emphasize that improvement is achieved under mild prior restrictions, i.e., the non-linear model is built based on measurements knowledge only.

Application in Testing of Chalcogenide Glasses with the Use of DSC Method
In this section, we apply the proposed algorithm for identification of heating process in Differential Scanning Calorimeter [31].

Chalcogenide Glasses
Materials with non-linear optical properties play a key role in frequency conversion and optical switching. One of the most promising materials in this area are chalcogenide glasses, because of good non-linear, passive and active properties. They are considered to be optical medium for the fibers of the 21st century. Chalcogenide glass fibers transmit into the IR, hence they can have numerous potential applications in the civil, medical and military areas. The IR light sources, lasers and amplifiers developed using these phenomena will be very useful in many areas of industry. High-speed optical communication requires ultra-fast all-optical processing and switching capabilities. In DSC experiment energy (heat flow) in function of time or temperature could be established. The energy from an external source is required to set to zero the difference in temperature of the tested and reference samples. Both samples are heated or cooled in a controlled mode and both techniques enable the detection of thermal events observed in the physical or chemical transformation under the influence of the changing temperature in a specific manner. Owing to that, many thermodynamically important parameters can be established, e.g., a glass transition or softening temperature, melting temperature, and melting enthalpy. The results also allow observation of physical aging processes. The goal is to control temperature of heating module precisely and ensure linearity of it. It is planned to design Model Following Control (MFC) structure of system to optimize quality indexes of temperature controlling. Below we present the results of identification experiment.

Results of Experiment
Treating the sample temperature as system output y k , and the power of the heating element as input u k , the non-parametric multi-dimension regression model R s u (s) k was computed for s = 1, 2, 3, 4. The results are shown in Figures 7-10.
Differential Scanning Calorimeter for chalcogenide glasses (built by members of the team), was first approximated by the linear model, and the results were not satisfying. To improve accuracy, Hammerstein model was applied, and the decision of model structure was made arbitrarily. To avoid the risk of bad parameterization the general approach presented in the paper was applied. The results are comparable, although obtained without making any restrictive assumptions about the block-oriented structure of the model.    In Table 2 resulting mean squared errors for various scales s = 1, 2, 3, 4 are shown. The strong point of the method is that asymptotically, as s(N) → ∞, the model becomes free of approximation error, on the contrary to linear or block-oriented representation.  [20], their algorithms suffer from correlation of input, and they are not applicable here. On the other hand, for parametric Hammerstein model, the results are strongly dependent on the arbitrarily selected basis functions of nonlinearity. In our experiment we applied 3rd order polynomial function µ() connected in cascade with FIR(s) linear dynamic filter. Table 2 shows that our method is more accurate, emphasizing that it works under mild prior restrictions.

Conclusions
The main contribution of the work lays in the fact that the model is built with lack of prior knowledge about the structure of the system and its characteristics. No decision of using particular Hammerstein or Wiener model is needed at the beginning to start the procedure. Obtained non-parametric estimators R s x (s) [p] can be eventually plugged into the least squares optimization criterion in step 2, to provide parametric representation of the relationship. Both parametric and non-parametric methods can be combined to design strategy, which includes advantages of both approaches.
Step 1 (non-parametric) is run to estimate selected points of system characteristic. It is done effectively thanks to generation of specific input excitation (discrete or periodic), which allows to avoid the problem of high dimensionality. Moreover, appropriate selection of estimation points can significantly decrease level of difficulty of the non-linear optimization task in step 2. The rate of convergence of parameter estimate is the same as for non-parametric ones.
The scheme presented in the paper is universal for a broad class of systems including Hammerstein and Wiener structures, and their interconnections. Non-parametric data pre-filtering plays also the role of compression algorithm and the result of step 1 can be treated as the simplified pattern of system for eventual structure detection and selection of its best parametric model. Regression-based non-parametric model can be computed only for the set of selected points, and the resulting pairs