Kernel Identification of Non-Linear Systems with General Structure

Mzyk, Grzegorz; Hasiewicz, Zygmunt; Mielcarek, Paweł

doi:10.3390/a13120328


by Grzegorz Mzyk *, Zygmunt Hasiewicz and Paweł Mielcarek

Faculty of Electronics, Wrocław University of Science and Technology, 50-372 Wrocław, Poland

* Author to whom correspondence should be addressed.

Algorithms 2020, 13(12), 328; https://doi.org/10.3390/a13120328

Submission received: 27 October 2020 / Revised: 3 December 2020 / Accepted: 4 December 2020 / Published: 6 December 2020


Abstract

In this paper we deal with the problem of non-linear dynamic system identification in the presence of random noise. The class of considered systems is relatively general, in the sense that it is not limited to block-oriented structures such as Hammerstein or Wiener models. It is shown that the proposed algorithm can be generalized to a two-stage strategy. In step 1 (non-parametric) the system is approximated by multi-dimensional regression functions for a given set of excitations, treated as a representative set of points in multi-dimensional space. The ‘curse of dimensionality’ problem is solved by using specific (quantized or periodic) input sequences. Next, in step 2, the non-parametric estimates can be plugged into the least squares criterion to support model selection and estimation of system parameters. The proposed strategy allows decomposition of the identification problem, which can be crucial from the numerical point of view. The “estimation points” in step 1 are selected to ensure good conditioning of the task in step 2. Moreover, the non-parametric procedure plays the role of data compression. We discuss the problem of selecting the scale of the non-parametric model and analyze asymptotic properties of the method. The results of a simple simulation are also presented to illustrate how the method works. Finally, the proposed method is successfully applied in a Differential Scanning Calorimeter (DSC) to analyze aging processes in chalcogenide glasses.

Keywords: system identification; Hammerstein system; Wiener system; non-parametric methods; kernel regression

1. Introduction

The problem of non-linear system modeling has been intensively examined over the past four decades. Owing to many potential applications (see [1]) and the interdisciplinary scope of the topic, both scientists and engineers look for more precise and numerically efficient identification algorithms. The first attempts at generalizing linear system identification theory to non-linear models were based on the Volterra series representation ([2]). The traditional Volterra series-based approach leads to relatively high numerical complexity, which is often not acceptable from a practical point of view. To cope with this problem, regularization or tensor network techniques have been proposed recently ([3,4]). However, strong restrictions are imposed on the system characteristics (e.g., smoothness of the nonlinearity and short memory of the dynamics). Alternatively, the concept of block-oriented models was introduced ([5]). It was assumed that the system can be represented, or approximated with satisfactory accuracy, by structural models consisting of interconnected elementary blocks of two types: linear dynamics and static nonlinearities ([6]). The most popular structures in this class are the Hammerstein system (see e.g., [7,8,9,10,11,12]), with a static nonlinearity followed by linear dynamics, and the Wiener system ([13,14,15,16,17,18]), which includes the same blocks connected in reverse order.

Traditionally, an identification method needs two kinds of knowledge: a set of input–output measurements and an a priori parametric formula describing the system characteristics and including a finite number of unknown parameters ([6,19]). Usually, a polynomial model of the static characteristic and a difference equation with known orders are assumed. This convention leads to relatively fast convergence of parameter estimates, but there is a risk of systematic approximation error if the assumed model is not correct. As an alternative, the theory of non-parametric system identification ([20,21,22,23]) was proposed to solve this problem. The algorithms work under mild prior restrictions, such as stability of the linear dynamic block and local continuity of the static non-linear characteristic. Although the estimates converge to the true characteristics, the rate of convergence is relatively slow in practice, as a consequence of relaxing the assumptions.

This paper presents the idea of a combined, i.e., parametric-non-parametric, approach (see e.g., [24,25,26,27,28,29,30]), in which parametric and non-parametric algorithms support each other to achieve the best possible identification results for a moderate number of measurements and to guarantee asymptotic consistency (i.e., convergence to the true system characteristics) when the number of observations tends to infinity. Since the preliminary step of structure selection is treated rather cursorily in the literature, generalizations of the approach towards wider classes of systems seem to be of high practical importance. The paper was also motivated by the Differential Scanning Calorimeter project ([31]) developed in the team, and particularly by the modeling of the heating process used to examine the properties of chalcogenide glasses.

The main contribution of the paper lies in the following aspects:

proposed identification algorithm is run without any prior knowledge about the system structure and parametric representation of nonlinearity,
non-parametric multi-dimensional kernel regression estimate was generalized for modeling of non-linear dynamic systems, and the dimensionality problem was solved by using special input sequences,
the scheme elaborated in the paper was successfully applied in Differential Scanning Calorimeter for testing parameters of chalcogenide glasses.

The paper is organized as follows. In Section 2 the general class of considered systems and the identification problem is formulated in detail. Next, in Section 3.1, purely non-parametric, regression-based approach is presented, and its disadvantages are discussed. Then, to cope with dimensionality problem, the idea of some specific input sequences is presented in Section 3.2. Owing to that, the system characteristics are identified only for some selected points, but the convergence is much faster. The idea of combined, two-stage strategy is introduced in Section 4. It allows use of prior knowledge to expand the model on the whole input space. Also, the results of simple simulation are included in Section 5 to illustrate and discuss some practical aspects of the approach. Finally, in Section 6, the algorithm is successfully applied in Differential Scanning Calorimeter to model aging properties of modern materials (chalcogenide glasses).

2. Problem Statement

2.1. Class of Systems

We consider a discrete-time non-linear dynamic system with the general representation

y_{k} = F (\{u_{k - i}\}_{i = 0}^{\infty}) + z_{k},

(1)

where $\{u_{k - i}\}_{i = 0}^{\infty}$ is a bounded random input sequence ($|u_{k}| < u_{max}$) and $z_{k}$ is a zero-mean random disturbance. The transformation $F ()$ is Lipschitz with respect to all arguments and has the property of exponential forgetting (fading memory), i.e., if we put $u_{k - i} = 0$ for $i \geq s$ and define the cut-off sequence as

\bar{u}_{k - i} ≜ \begin{cases} u_{k - i}, & i = 0, 1, \ldots, s - 1 \\ 0, & i \geq s \end{cases},

(2)

then

Δ (s) ≜ |F (\{u_{k - i}\}_{i = 0}^{\infty}) - F (\{\bar{u}_{k - i}\}_{i = 0}^{\infty})| \leq c λ^{s},

(3)

with some unknown $c = const$ and $0 < λ < 1$. A similar class of fading memory systems, in which the output depends less and less on historical inputs, was considered in [32]. The goal is to identify the system (build the model $\hat{F} ()$) using the sequence of N input–output measurements $\{(u_{k}, y_{k})\}_{k = 1}^{N}$.

Hysteresis is not admitted in the considered class of systems.

2.2. Examples

In this section, we show that some systems popular in applications fall into the above description as special cases.

2.2.1. Hammerstein System

For the Hammerstein system (see Figure 1), described by the equation

y_{k} = \sum_{i = 0}^{\infty} γ_{i} μ (u_{k - i}) + z_{k}

(4)

with a Lipschitz non-linear characteristic, i.e., such that

|μ (u_{a}) - μ (u_{b})| \leq l |u_{a} - u_{b}|,

(5)

and asymptotically stable dynamics, i.e., $|γ_{i}| \leq c_{H} λ^{i}$, we get

Δ (s) = |\sum_{i = s}^{\infty} γ_{i} (μ (u_{k - i}) - μ (0))| \leq l u_{max} \sum_{i = s}^{\infty} |γ_{i}| \leq \frac{l u_{max} c_{H}}{1 - λ} λ^{s} = c λ^{s} .

(6)
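The bound (6) can be checked numerically. The sketch below is only an illustration under assumed values: the impulse response $γ_{i} = λ^{i}$, the nonlinearity $μ = \tanh$ (Lipschitz with $l = 1$), the constants `lam`, `c_h`, `u_max`, and the finite horizon `n_lags` standing in for the infinite sum are all hypothetical choices, not taken from the paper.

```python
import numpy as np

def hammerstein_truncation_error(u_hist, gamma, mu, s):
    """|F(u) - F(u_bar)|, where u_bar zeroes every input with lag i >= s."""
    full = np.sum(gamma * mu(u_hist))
    u_cut = u_hist.copy()
    u_cut[s:] = 0.0
    cut = np.sum(gamma * mu(u_cut))
    return abs(full - cut)

rng = np.random.default_rng(0)
lam, c_h, u_max, n_lags = 0.5, 1.0, 1.0, 60   # assumed constants; n_lags truncates the infinite sum
gamma = c_h * lam ** np.arange(n_lags)         # satisfies |gamma_i| <= c_H * lam**i
mu = np.tanh                                   # assumed nonlinearity, Lipschitz constant l = 1
u_hist = rng.uniform(-u_max, u_max, n_lags)    # u_hist[i] plays the role of u_{k-i}

lip = 1.0
for s in range(1, 9):
    delta = hammerstein_truncation_error(u_hist, gamma, mu, s)
    bound = lip * u_max * c_h * lam ** s / (1.0 - lam)
    print(s, delta, bound)                     # delta should never exceed bound
```

For each s the printed truncation error should stay below the geometric bound, which is the fading-memory property (3) for this particular system.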

2.2.2. Wiener System

Analogously, for the Wiener system (Figure 2), where stable linear dynamics is followed by a Lipschitz static non-linear block,

y_{k} = μ (\sum_{i = 0}^{\infty} γ_{i} u_{k - i}) + z_{k},

(7)

we get

Δ (s) = |μ (\sum_{i = 0}^{\infty} γ_{i} u_{k - i}) - μ (\sum_{i = 0}^{s - 1} γ_{i} u_{k - i})|

(8)

\leq l |\sum_{i = s}^{\infty} γ_{i} u_{k - i}| < c λ^{s} .

(9)

Remark 1.

Analogously, it can be simply shown that also Wiener–Hammerstein (L–N–L) and Hammerstein–Wiener (N–L–N) sandwich systems belong to the assumed class.

2.2.3. Finite Memory Bilinear System

Another important special case, often met in applications, is the bilinear system of finite order m. It is described by the formula

y_{k} = \sum_{l_{0}, l_{1}, \ldots, l_{m - 1} : l_{i} \geq 0, \sum_{i} l_{i} \leq m} c_{l_{0}, l_{1}, \ldots, l_{m - 1}} \prod_{i = 0}^{m - 1} u_{k - i}^{l_{i}} + z_{k}

(10)

i.e., for $m = 1$ we get

y_{k} = c_{0, 0} + c_{1, 0} u_{k} + c_{0, 1} u_{k - 1},

(11)

and for $m = 2$ we have

y_{k} = c_{0, 0} + c_{1, 0} u_{k} + c_{0, 1} u_{k - 1} + c_{1, 1} u_{k} u_{k - 1} + c_{2, 0} u_{k}^{2} + c_{0, 2} u_{k - 1}^{2} .

(12)

Since $y_{k}$ does not depend on $u_{k - m}, u_{k - m - 1}, \ldots$, for $s \geq m$ we get $F (\{u_{k - i}\}_{i = 0}^{\infty}) = F (\{\bar{u}_{k - i}\}_{i = 0}^{\infty})$, and obviously $Δ (s) = 0$.

The considered example falls into the more general class of Volterra representations. The presented approach works without knowledge of the parametric representation. As regards applicability to the class (1), for $m < \infty$ we have a finite memory system, i.e., $Δ (s) = 0$ as $s > m$. Moreover, since the input is assumed to be bounded ($|u_{k}| < u_{max}$), the resulting mapping $F ()$ fulfills the Lipschitz condition (as an ordinary polynomial on a compact support).

3. Non-Parametric Regression

3.1. General Overview

Let us introduce the s-dimensional input regressor

u_{k}^{(s)} = (u_{k}, u_{k - 1}, \ldots, u_{k - (s - 1)})^{T},

(13)

and the regression function

R_{s} (x^{(s)}) = E (y_{k} | u_{k}^{(s)} = x^{(s)}),

(14)

with the argument vector

x^{(s)} = (x_{0}, x_{1}, \ldots, x_{s - 1})^{T} .

(15)

In particular, for $s = 1$ we get

R_{1} (x_{0}) = E (y_{k} | u_{k} = x_{0}),

(16)

and for $s = 2$

R_{2} (x_{0}, x_{1}) = E (y_{k} | u_{k} = x_{0}, u_{k - 1} = x_{1}) .

(17)

The non-parametric kernel estimate ([20,21,22,23,33]) of $R_{s} (x^{(s)})$ has the following form

\hat{R}_{s} (x^{(s)}) = \frac{\sum_{k = 1}^{N} y_{k} K (\frac{∥u_{k}^{(s)} - x^{(s)}∥}{h})}{\sum_{k = 1}^{N} K (\frac{∥u_{k}^{(s)} - x^{(s)}∥}{h})},

(18)

where $∥\cdot∥$ denotes the Euclidean norm, $K ()$ plays the role of the kernel function, e.g.,

K (v) = \begin{cases} 1, & |v| \leq 1 \\ 0, & |v| > 1 \end{cases},

(19)

and h is a bandwidth parameter, responsible for the balance between bias and variance of the estimate. The class of possible kernels can obviously be generalized. Nevertheless, previous experience shows that the kind of kernel function used for estimation is of secondary importance, whereas the behavior of $h (N)$ with respect to N is fundamental. We limit the presentation to the Parzen (window) kernel for clarity of exposition. It fulfills all general assumptions made for kernels, i.e., it is even, non-negative and square integrable. The system (1) is thus approximated by the model

\hat{R}_{s} (u_{k}, u_{k - 1}, \ldots, u_{k - (s - 1)}),

(20)

and s can be interpreted as its complexity. Obviously, both $h = h (N)$ and $s = s (N)$ need to be set depending on the number of measurements N. Observing that

F (\{u_{k - i}\}_{i = 0}^{\infty}) = R_{\infty} (u_{k}^{(\infty)}),

(21)

the mean squared error of the model $\hat{R}_{s} (u_{k}^{(s)})$ can be expressed as follows

MSE (\hat{R}_{s} (u_{k}^{(s)})) = E (\hat{R}_{s} (u_{k}^{(s)}) - F (\{u_{k - i}\}_{i = 0}^{\infty}))^{2} = E (\hat{R}_{s} (u_{k}^{(s)}) - R_{\infty} (u_{k}^{(\infty)}))^{2},

and introducing the true finite-dimensional regression function $R_{s} (u_{k}^{(s)})$ we get

\begin{aligned} MSE (\hat{R}_{s} (u_{k}^{(s)})) &= E \{(\hat{R}_{s} (u_{k}^{(s)}) - R_{s} (u_{k}^{(s)})) + (R_{s} (u_{k}^{(s)}) - R_{\infty} (u_{k}^{(\infty)}))\}^{2} \\ &= E \{\hat{R}_{s} (u_{k}^{(s)}) - R_{s} (u_{k}^{(s)})\}^{2} + E \{R_{s} (u_{k}^{(s)}) - R_{\infty} (u_{k}^{(\infty)})\}^{2} \\ &\quad + 2 E \{(\hat{R}_{s} (u_{k}^{(s)}) - R_{s} (u_{k}^{(s)})) (R_{s} (u_{k}^{(s)}) - R_{\infty} (u_{k}^{(\infty)}))\} . \end{aligned}

Since both $E \{R_{s} (u_{k}^{(s)}) - R_{\infty} (u_{k}^{(\infty)})\}^{2} \to 0$ and $(R_{s} (u_{k}^{(s)}) - R_{\infty} (u_{k}^{(\infty)})) \to 0$ as $s \to \infty$ (a consequence of the fading memory property (3)), these components can be made arbitrarily small by choosing an appropriate scale s. Consequently, for fixed s we focus on the first component of the MSE, of the form

ERR = E \{\hat{R}_{s} (u_{k}^{(s)}) - R_{s} (u_{k}^{(s)})\}^{2} = bias^{2} \hat{R}_{s} (u_{k}^{(s)}) + var \hat{R}_{s} (u_{k}^{(s)}),

(22)

where

bias \hat{R}_{s} (u_{k}^{(s)}) ≜ E \hat{R}_{s} (u_{k}^{(s)}) - R_{s} (u_{k}^{(s)}),

(23)

var \hat{R}_{s} (u_{k}^{(s)}) ≜ E \{\hat{R}_{s} (u_{k}^{(s)}) - E \hat{R}_{s} (u_{k}^{(s)})\}^{2} .

(24)

It can simply be shown that

bias \hat{R}_{s} (x^{(s)}) = O (h (N)),

(25)

bias^{2} \hat{R}_{s} (x^{(s)}) = O (h^{2} (N)) .

(26)

The bias order follows directly from the Lipschitz condition and the fact that $∥u_{k}^{(s)} - x^{(s)}∥ \leq h$ for all k selected by the kernel. Moreover,

var \hat{R}_{s} (x^{(s)}) = O (\frac{1}{N h^{s} (N)}) .

(27)

For the window kernel, a Lipschitz function $F$, and a strictly positive input probability density function around the estimation point, the probability of selection in s-dimensional space can be bounded from below by $c h^{s}$, where c is some constant. Hence, the expected number of selected measurements is not less than $N c h^{s}$. The variance order is thus a simple consequence of Wald’s identity. Hence, to assure the convergence $\hat{R}_{s} (x^{(s)}) \to R_{s} (x^{(s)})$ as $N \to \infty$ in the mean square sense, the following conditions must be fulfilled

h \to 0 and N h^{s} \to \infty, as N \to \infty,

(28)

which leads to the typical setting

h (N) \sim N^{- α}, where α \in (0, \frac{1}{s}) .

(29)

Equating the orders of the squared bias and the variance, to obtain the best asymptotic trade-off between them, we get

h^{2} (N) = \frac{1}{N h^{s} (N)},

(30)

h^{s + 2} (N) = \frac{1}{N},

(31)

h_{opt} (N) \sim N^{- \frac{1}{s + 2}},

(32)

ERR \sim N^{- \frac{2}{s + 2}} .

(33)
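The estimate (18) with the window kernel (19) reduces to averaging the outputs whose regressors fall within distance h of the estimation point. The following is a minimal sketch of that computation, not the authors' implementation. The function names `regressors` and `window_kernel_estimate`, the test system, and the proportionality constant 1 in the bandwidth rule (32) are assumptions made for illustration.

```python
import numpy as np

def regressors(u, s):
    """Rows are u_k^(s) = (u_k, u_{k-1}, ..., u_{k-s+1}) for k = s-1, ..., N-1 (0-based)."""
    n = len(u)
    return np.column_stack([u[s - 1 - i : n - i] for i in range(s)])

def window_kernel_estimate(u, y, x, h):
    """Estimate R_s(x) as the mean of y_k over regressors within distance h of x."""
    x = np.asarray(x, dtype=float)
    s = len(x)
    phi = regressors(np.asarray(u, dtype=float), s)
    y_aligned = np.asarray(y, dtype=float)[s - 1:]
    selected = np.linalg.norm(phi - x, axis=1) <= h
    if not selected.any():
        return np.nan   # empty window: the ratio in (18) is 0/0
    return y_aligned[selected].mean()

# Hypothetical test system y_k = u_k + 0.5 * u_{k-1}**2 + z_k, so R_2(x0, x1) = x0 + 0.5 * x1**2
rng = np.random.default_rng(1)
N, s = 20000, 2
u = rng.uniform(-1.0, 1.0, N)
u_prev = np.concatenate(([0.0], u[:-1]))
y = u + 0.5 * u_prev ** 2 + rng.uniform(-0.1, 0.1, N)

h = N ** (-1.0 / (s + 2))   # rule (32) with the constant set to 1 (assumption)
est = window_kernel_estimate(u, y, [0.5, 0.5], h)
print(est)                  # should be close to 0.5 + 0.5 * 0.25 = 0.625
```

In practice the constant multiplying $N^{-1/(s+2)}$ matters for finite N and would have to be tuned; only the rate is prescribed by (32).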

Moreover, to assure the balance between the estimation error and the approximation error of order $O (λ^{s})$, connected with neglecting the tail $\{u_{k - i}\}_{i = s}^{\infty}$, we get

N^{- \frac{2}{s (N) + 2}} = λ^{s (N)},

(34)

- \frac{2}{s (N) + 2} \log N = s (N) \log λ,

(35)

\frac{- 2}{s (N) (s (N) + 2)} = \frac{\log λ}{\log N},

(36)

s^{2} (N) + 2 s (N) = \frac{2}{\log \frac{1}{λ}} \log N,

(37)

where $\frac{2}{\log \frac{1}{λ}} = const$. Owing to the above, the scale $s (N)$ must not grow faster than $\sqrt{\log N}$, i.e.,

s_{opt} (N) = O (\sqrt{\log N}),

(38)

which illustrates how slowly the admissible model complexity can increase in the general case. The property (38), commonly known as the “curse of dimensionality”, illustrates the main drawback of the multi-dimensional non-parametric regression approach to system modeling in its traditional form. The reason is that the probability of kernel selection

P \{K (\frac{∥u_{k}^{(s)} - x^{(s)}∥}{h}) = 1\} = P \{∥u_{k}^{(s)} - x^{(s)}∥ \leq h\} \sim h^{s}

(39)

decreases rapidly when s grows large. We also refer the reader to the proof of Theorem 3 in [26], where a detailed discussion concerning an analogous problem can be found.
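To get a feeling for how slowly the balanced scale grows, the quadratic (37) can be solved for its positive root, $s = -1 + \sqrt{1 + \frac{2 \log N}{\log (1 / λ)}}$. The short sketch below evaluates it; the function name `balanced_scale` and the value $λ = 0.5$ are illustrative assumptions.

```python
import math

def balanced_scale(n, lam):
    """Positive root of s**2 + 2*s = 2*log(n) / log(1/lam), cf. (37)."""
    c = 2.0 / math.log(1.0 / lam)
    return -1.0 + math.sqrt(1.0 + c * math.log(n))

for n in (10**3, 10**6, 10**9, 10**12):
    # the scale grows only like sqrt(log N)
    print(n, round(balanced_scale(n, lam=0.5), 2))
```

Even for very large N the balanced scale stays in the single digits for this λ, which is the practical face of (38).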

3.2. Dimension Reduction

To cope with the problem shown in (39) we consider two cases of some specific input excitation processes to speed up the rate of convergence.

3.2.1. Discrete Input

In case 1 we assume that in each s-element input sub-sequence $u_{k}, u_{k + 1}, \ldots, u_{k + s - 1}$ there exist d inputs with a discrete distribution on a finite set of possible realizations. Consequently, all the points $u_{k}^{(s)} \in R^{s}$ lie on a finite number of separable and compact subspaces with the internal dimension

s^{*} = s - d,

(40)

and for $x^{(s)} = u_{k}^{(s)}$ we have

ERR \sim N^{- \frac{2}{s^{*} + 2}} .

(41)

For each measurement point, the probability of kernel selection behaves like $c h^{s^{*}}$, where $s^{*}$ ($s^{*} < s$) is the internal dimension of this subspace. In particular, for $d = s$ (all input variables quantized) we get $ERR \sim N^{- 1}$. The sets of possible input sequences for $s = 2$ are illustrated in Figure 3 and Figure 4.
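When all inputs are quantized ($d = s$), the regressor takes finitely many values, and a window kernel with h smaller than the grid spacing simply averages the outputs that share an identical regressor. The sketch below illustrates this special case; the helper `cell_means`, the nine-level grid and the test system $y_{k} = u_{k} u_{k - 1} + z_{k}$ are assumptions for illustration only.

```python
import numpy as np
from collections import defaultdict

def cell_means(u, y, s):
    """Average y_k over all k sharing the same regressor (u_k, ..., u_{k-s+1})."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k in range(s - 1, len(u)):
        key = tuple(u[k - i] for i in range(s))
        sums[key] += y[k]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}, dict(counts)

rng = np.random.default_rng(2)
N = 9000
levels = np.linspace(-1.0, 1.0, 9)              # quantization levels (assumed grid)
u = rng.choice(levels, N)
u_prev = np.concatenate(([0.0], u[:-1]))
y = u * u_prev + rng.uniform(-0.1, 0.1, N)      # hypothetical system

means, counts = cell_means(u, y, s=2)
print(len(means))                                # at most 9**2 distinct regressors
print(means[(0.5, -0.25)])                       # should be close to 0.5 * (-0.25)
```

Each of the at most $9^{s}$ cells receives roughly $N / 9^{s}$ measurements, so the per-point variance decays like $1 / N$ for fixed s.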

3.2.2. Periodic Input

In case 2 we assume that the input is periodic with the period $N_{0}$, i.e., $u_{k} = u_{k + N_{0}}$ for each $k = 1, 2, \ldots, N - N_{0}$. Then, the value of the regressor $u_{k}^{(s)}$ (see (13)) evaluates to one of $N_{0}$ distinct points in $R^{s}$,

x_{[1]}^{(s)} = (u_{s}, u_{s - 1}, \ldots, u_{1})^{T}, \ldots, x_{[N_{0}]}^{(s)} = (u_{N_{0} + s - 1}, u_{N_{0} + s - 2}, \ldots, u_{N_{0}})^{T},

(42)

with probabilities

P \{u_{k}^{(s)} = x_{[p]}^{(s)}\} = \frac{1}{N_{0}}, p = 1, 2, \ldots, N_{0} .

(43)

Measurements are uniformly distributed on the finite set of $N_{0}$ distinct points

x_{[1]}^{(s)}, \ldots, x_{[p]}^{(s)}, \ldots, x_{[N_{0}]}^{(s)} .

(44)

Narrowing of $h (N)$ does not affect the kernel estimator asymptotically (i.e., $s^{*} = 0$). Consequently, we get the best possible convergence rate $ERR \sim N^{- 1}$. However, it must be admitted that the estimators are calculated only for a finite number of points, and increasing $N_{0}$ increases the variance of the regression estimator at particular points $x_{[p]}^{(s)}$ (as the number of selected data is of order $N / N_{0}$).
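For a periodic input, the estimate at each of the $N_{0}$ points is just the average of the outputs observed at the same phase of the period. A minimal sketch of this follows. The function `periodic_estimates`, the period length, and the test system $y_{k} = \tanh (u_{k} + 0.5 u_{k - 1}) + z_{k}$ are hypothetical and chosen only for illustration.

```python
import numpy as np

def periodic_estimates(u, y, s, n0):
    """For an input of period n0, average the outputs phase by phase.

    Returns (points, estimates); points[p] is the regressor seen at phase p.
    """
    n = len(u)
    points, estimates = [], []
    for p in range(n0):
        ks = np.arange(s - 1 + p, n, n0)             # time indices sharing one regressor
        k0 = ks[0]
        points.append(u[k0 - s + 1 : k0 + 1][::-1])  # (u_k, u_{k-1}, ..., u_{k-s+1})
        estimates.append(y[ks].mean())
    return np.array(points), np.array(estimates)

rng = np.random.default_rng(4)
n0, reps, s = 10, 500, 2
period = rng.uniform(-1.0, 1.0, n0)
u = np.tile(period, reps)                            # N = n0 * reps samples
u_prev = np.roll(u, 1)                               # valid because u is exactly periodic
y = np.tanh(u + 0.5 * u_prev) + rng.uniform(-0.1, 0.1, len(u))

pts, est = periodic_estimates(u, y, s, n0)
true_vals = np.tanh(pts[:, 0] + 0.5 * pts[:, 1])
print(np.max(np.abs(est - true_vals)))               # averaging error over about N / n0 samples per point
```

Increasing `n0` with N fixed leaves fewer samples per point, which is the variance trade-off noted above.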

4. Hybrid/Combined Parametric-Non-Parametric Approach

Since the special input excitations allow fast recovery of the system characteristics only at some points, additional prior knowledge is needed to extend the model to an arbitrary process $\{u_{k}\}$. In this section we assume that the transformation $F (\{u_{k - i}\}_{i = 0}^{\infty})$ belongs to a given (a priori known), finite-dimensional class of systems $F (u_{k}^{(s)}, θ)$,

F (\{u_{k - i}\}_{i = 0}^{\infty}) \in \{F (u_{k}^{(s)}, θ)\},

(45)

with unknown parameter vector $θ$. In the proposed methodology, one of the input excitations described in Section 3.2 is applied. The system is identified on the finite set of $N_{0}$ representative points $x_{[1]}^{(s)}, x_{[2]}^{(s)}, \ldots, x_{[p]}^{(s)}, \ldots, x_{[N_{0}]}^{(s)}$, where $x_{[p]}^{(s)} \in R^{s}$ and $p = 1, 2, \ldots, N_{0}$. Let us denote by $θ^{*}$ the true and unknown vector of system parameters. We assume that $θ^{*}$ is identifiable, i.e., the following property holds

F (\{u_{k - i}\}_{i = 0}^{\infty}) = F (u_{k}^{(s)}, θ) ⟺ θ = θ^{*} .

(46)

Moreover, let the quality index

Q (θ) = E [y_{k} - F (u_{k}^{(s)}, θ)]^{2}

(47)

be convex for $θ \in Ξ$, where $Ξ$ is some neighborhood of the true $θ^{*}$, and

θ^{*} = arg min_{θ} Q (θ) .

(48)

The following two-step algorithm is proposed.

Step 1. (non-parametric) Using the input–output observations $\{(u_{k}, y_{k})\}_{k = 1}^{N}$, for $p = 1, 2, \ldots, N_{0}$ compute the estimates

\hat{R}_{s} (x_{[1]}^{(s)}), \ldots, \hat{R}_{s} (x_{[p]}^{(s)}), \ldots, \hat{R}_{s} (x_{[N_{0}]}^{(s)}) .

(49)

Step 2. (parametric) Minimize the empirical version of the least squares criterion (47),

\hat{θ} = arg min_{θ} \frac{1}{N_{0}} \sum_{p = 1}^{N_{0}} (\hat{R}_{s} (x_{[p]}^{(s)}) - F (x_{[p]}^{(s)}, θ))^{2} .

(50)
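Step 2 is an ordinary non-linear least squares problem over the $N_{0}$ points, and the paper does not prescribe a solver. The sketch below uses a damped Gauss-Newton iteration with a finite-difference Jacobian purely as one possible choice. The function `fit_parameters`, the model, the grid of points and the noise-free values standing in for the step 1 estimates are all assumptions made for illustration.

```python
import numpy as np

def fit_parameters(points, r_hat, model, theta0, n_iter=50, eps=1e-6):
    """Gauss-Newton minimisation of sum_p (r_hat[p] - model(points[p], theta))**2."""
    theta = np.asarray(theta0, dtype=float)

    def residuals(th):
        return np.array([model(x, th) for x in points]) - r_hat

    for _ in range(n_iter):
        r = residuals(theta)
        # forward-difference Jacobian of the residuals with respect to theta
        jac = np.column_stack([
            (residuals(theta + eps * e) - r) / eps for e in np.eye(len(theta))
        ])
        step = np.linalg.lstsq(jac, -r, rcond=None)[0]
        # halve the step until the criterion does not increase
        t = 1.0
        while t > 1e-8 and np.sum(residuals(theta + t * step) ** 2) > np.sum(r ** 2):
            t *= 0.5
        theta = theta + t * step
    return theta

# Hypothetical model class F(x, theta) and representative points
def model(x, th):
    return np.exp(th[0] * x[0]) + th[1] * x[0] * x[1]

points = np.array([(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)])
theta_true = np.array([0.5, -1.0])
r_hat = np.array([model(x, theta_true) for x in points])  # stand-in for the step 1 estimates

theta_hat = fit_parameters(points, r_hat, model, theta0=[0.0, 0.0])
print(theta_hat)
```

With noisy estimates from step 1 in place of `r_hat`, the same routine would return parameters whose accuracy is governed by Lemma 1; any other non-linear least squares solver could be substituted.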

Lemma 1.

Let $F (u_{k}^{(s)}, θ)$ be Lipschitz with respect to all $u_{k - l}$’s included in $u_{k}^{(s)}$ and all parameters in θ. If the error of the non-parametric estimate behaves like

∥\hat{R}_{s} (x_{[p]}^{(s)}) - F (x_{[p]}^{(s)}, θ^{*})∥ = O (N^{- τ})

in the mean square sense, for all $p = 1, 2, \ldots, N_{0}$, then

∥\hat{θ} - θ^{*}∥ = O (N^{- τ})

(51)

in the parametric step 2.

Proof.

The property (51) can be proven following the lines of the proof of Theorem 1 in [25]. □

Remark 2.

Fulfillment of (46) and the method of non-linear optimization in (50) depend strictly on the specifics of the problem. In an active experiment, when the input can be generated arbitrarily, appropriate selection of the points $\{x_{[p]}^{(s)}\}_{p = 1}^{N_{0}}$ can significantly simplify operations in Step 2 (see example below).

Example 1.

For the system

y_{k} = e^{θ_{1} u_{k}} + θ_{2} u_{k} u_{k - 1} + z_{k},

(52)

in step 1 we can estimate the two-dimensional ($s = 2$) regression function $\hat{R}_{2} (x^{(2)})$ at $N_{0} = 2$ representative points $x_{[1]}^{(2)} = (1, 0)^{T}$ and $x_{[2]}^{(2)} = (1, 1)^{T}$, i.e., compute the pattern

\hat{R}_{2} (x_{[1]}^{(2)}), \hat{R}_{2} (x_{[2]}^{(2)}) .

(53)

Since the true values of the regression function are respectively $R_{2} (x_{[1]}^{(2)}) = e^{θ_{1}}$ and $R_{2} (x_{[2]}^{(2)}) = e^{θ_{1}} + θ_{2}$, in step 2 we get trivial estimates of the parameters

\hat{θ}_{1} = \log \hat{R}_{2} (x_{[1]}^{(2)}),

(54)

\hat{θ}_{2} = \hat{R}_{2} (x_{[2]}^{(2)}) - \hat{R}_{2} (x_{[1]}^{(2)}) .

(55)
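Example 1 can be reproduced in a few lines. The sketch below assumes a binary random input on {0, 1}, so that both representative points occur in the data, together with hypothetical true parameters and a uniform disturbance; none of these values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
theta1, theta2, N = 0.5, -1.0, 20000          # hypothetical true parameters and sample size
u = rng.integers(0, 2, N).astype(float)       # binary input (assumed excitation)
z = rng.uniform(-0.1, 0.1, N)                 # zero-mean disturbance
u_prev = np.concatenate(([0.0], u[:-1]))
y = np.exp(theta1 * u) + theta2 * u * u_prev + z   # system (52)

# Step 1: regression estimates at x_[1] = (1, 0) and x_[2] = (1, 1)
k = np.arange(1, N)
r1 = y[k][(u[k] == 1) & (u[k - 1] == 0)].mean()
r2 = y[k][(u[k] == 1) & (u[k - 1] == 1)].mean()

# Step 2: closed-form estimates (54)-(55)
theta1_hat = np.log(r1)
theta2_hat = r2 - r1
print(theta1_hat, theta2_hat)
```

Because the two points were chosen to isolate the parameters, no iterative optimization is needed in step 2.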

5. Simulation Example

To illustrate the proposed method, we simulated a simple Wiener system (see Figure 2) with

x_{k} = 0.5 x_{k - 1} + u_{k},

(56)

v_{k} = \operatorname{arctg} (x_{k}),

(57)

y_{k} = v_{k} + z_{k},

(58)

excited by a random process uniformly distributed on an equidistant set of points

u_{k} \sim \{- 1, - 0.75, - 0.5, - 0.25, 0, 0.25, 0.5, 0.75, 1\},

(59)

and a uniformly distributed output disturbance

z_{k} \sim U [- 0.1, 0.1] .

(60)

For $N = 10^{4}$ simulated input–output pairs $\{(u_{k}, y_{k})\}_{k = 1}^{N}$, the non-parametric models $\hat{R}_{s} (u_{k}^{(s)})$ were computed for $s = 1, 2, 3$ and compared with respect to the following empirical error

δ_{s} = \frac{1}{N - s + 1} \sum_{k = s}^{N} (y_{k} - \hat{R}_{s} (u_{k}^{(s)}))^{2} .

(61)

The results are presented in Table 1 and Figure 5 and Figure 6. Explicit derivation of the true finite-order (2D or 3D) regression function is problematic, because the neglected part of the input signal, i.e., the ‘tail’ connected with the terms $\{u_{k - τ}\}_{τ = s}^{\infty}$, is transferred through the non-linear characteristic $\operatorname{arctg} ()$. Figure 5 and Figure 6 illustrate the non-parametric character and non-linear properties of the model and give a general view of the shape of the input–output relationship, which can be helpful for eventual parametrization. The quantized input (59) can be a good choice when $s = const < \infty$ and the non-parametric estimate $\hat{R}_{s} (u_{k}^{(s)})$ plays a supporting role for non-linear least squares-based parameter estimation in step 2. Nevertheless, in the purely non-parametric approach, i.e., for $s (N) \to \infty$, the number of possible realizations of $u_{k}^{(s)}$ increases exponentially. In the considered example, for 9 points in (59), the probability of kernel selection for each $x^{(s)} = u_{k}^{(s)}$ behaves like $9^{- s}$ and the estimate becomes sensitive to the noise $z_{k}$.

Table 1 illustrates the reduction of error with respect to the scale of the regression. The results are also compared with the best linear approximations $FIR (s)$ of the Wiener system. We emphasize that the improvement is achieved under mild prior restrictions, i.e., the non-linear model is built from measurements only.
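The experiment of this section can be approximated with the sketch below. It is not the authors' code and will not reproduce the numbers in Table 1: the random seed, the zero initial state, the use of cell means as $\hat{R}_{s}$ (equivalent to a window kernel with h below the grid spacing) and the least-squares FIR(s) fit are all assumptions. Both errors are computed in-sample, which favors the cell-mean estimate.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(5)
N = 10**4
levels = np.linspace(-1.0, 1.0, 9)          # the nine points in (59)
u = rng.choice(levels, N)
z = rng.uniform(-0.1, 0.1, N)               # disturbance (60)

x = np.zeros(N)
for k in range(N):
    x[k] = (0.5 * x[k - 1] if k > 0 else 0.0) + u[k]   # (56) with zero initial state
y = np.arctan(x) + z                        # (57)-(58)

def kernel_error(u, y, s):
    """Empirical error (61) with R_hat taken as the mean of y over each distinct regressor."""
    cells = defaultdict(list)
    for k in range(s - 1, len(u)):
        cells[tuple(u[k - i] for i in range(s))].append(k)
    sq = 0.0
    for ks in cells.values():
        sq += np.sum((y[ks] - y[ks].mean()) ** 2)
    return sq / (len(u) - s + 1)

def fir_error(u, y, s):
    """Same error for the least-squares FIR(s) model y_k ~ sum_i b_i * u_{k-i}."""
    phi = np.column_stack([u[s - 1 - i : len(u) - i] for i in range(s)])
    b = np.linalg.lstsq(phi, y[s - 1:], rcond=None)[0]
    return np.mean((y[s - 1:] - phi @ b) ** 2)

for s in (1, 2, 3):
    print(s, kernel_error(u, y, s), fir_error(u, y, s))
```

The kernel error should decrease with s as more of the system memory is captured, while for larger s the $9^{s}$ cells become sparsely populated and the estimate starts to fit the noise.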

6. Application in Testing of Chalcogenide Glasses with the Use of DSC Method

In this section, we apply the proposed algorithm for identification of heating process in Differential Scanning Calorimeter [31].

6.1. Chalcogenide Glasses

Materials with non-linear optical properties play a key role in frequency conversion and optical switching. Chalcogenide glasses are among the most promising materials in this area because of their good non-linear, passive and active properties. They are considered to be an optical medium for the fibers of the 21st century. Chalcogenide glass fibers transmit into the IR, hence they have numerous potential applications in the civil, medical and military areas. The IR light sources, lasers and amplifiers developed using these phenomena will be very useful in many areas of industry. High-speed optical communication requires ultra-fast all-optical processing and switching capabilities.

In a DSC experiment, the energy (heat flow) can be established as a function of time or temperature. Energy from an external source is required to bring the temperature difference between the tested and reference samples to zero. Both samples are heated or cooled in a controlled mode, which enables the detection of thermal events accompanying physical or chemical transformations under a temperature that changes in a specific manner. Owing to that, many thermodynamically important parameters can be established, e.g., the glass transition or softening temperature, the melting temperature, and the melting enthalpy. The results also allow observation of physical aging processes. The goal is to control the temperature of the heating module precisely and to ensure its linearity. A Model Following Control (MFC) structure is planned in order to optimize the quality indexes of temperature control. Below we present the results of the identification experiment.

6.2. Results of Experiment

Treating the sample temperature as the system output $y_{k}$, and the power of the heating element as the input $u_{k}$, the non-parametric multi-dimensional regression model $\hat{R}_{s} (u_{k}^{(s)})$ was computed for $s = 1, 2, 3, 4$. The results are shown in Figure 7, Figure 8, Figure 9 and Figure 10.

The Differential Scanning Calorimeter for chalcogenide glasses (built by members of the team) was first approximated by a linear model, and the results were not satisfactory. To improve accuracy, a Hammerstein model was applied, with the model structure chosen arbitrarily. To avoid the risk of bad parameterization, the general approach presented in the paper was applied. The results are comparable, although obtained without making any restrictive assumptions about the block-oriented structure of the model.

In Table 2 the resulting mean squared errors for various scales $s = 1, 2, 3, 4$ are shown. The strong point of the method is that asymptotically, as $s (N) \to \infty$, the model becomes free of approximation error, in contrast to linear or block-oriented representations.

The results have been compared to the FIR(s) linear model and a parametric Hammerstein model. Regarding non-parametric modeling of Hammerstein systems, proposed by Greblicki and Pawlak in the 1980s [20], their algorithms suffer from correlation of the input, and they are not applicable here. On the other hand, for the parametric Hammerstein model, the results are strongly dependent on the arbitrarily selected basis functions of the nonlinearity. In our experiment we applied a 3rd order polynomial function $μ ()$ connected in cascade with an FIR(s) linear dynamic filter. Table 2 shows that our method is more accurate, even though it works under mild prior restrictions.

7. Conclusions

The main contribution of the work lies in the fact that the model is built without prior knowledge about the structure of the system and its characteristics. No decision to use a particular Hammerstein or Wiener model is needed to start the procedure. The obtained non-parametric estimators $\hat{R}_{s} (x_{[p]}^{(s)})$ can eventually be plugged into the least squares optimization criterion in step 2 to provide a parametric representation of the relationship. Parametric and non-parametric methods can thus be combined into a strategy that has the advantages of both approaches. Step 1 (non-parametric) is run to estimate selected points of the system characteristic. It is done effectively thanks to the generation of a specific input excitation (discrete or periodic), which allows the problem of high dimensionality to be avoided. Moreover, appropriate selection of estimation points can significantly decrease the difficulty of the non-linear optimization task in step 2. The rate of convergence of the parameter estimates is the same as for the non-parametric ones.

The scheme presented in the paper is universal for a broad class of systems including Hammerstein and Wiener structures and their interconnections. Non-parametric data pre-filtering also plays the role of a compression algorithm, and the result of step 1 can be treated as a simplified pattern of the system for eventual structure detection and selection of its best parametric model. The regression-based non-parametric model can be computed only for the set of selected points, and the resulting pairs $\{(x_{[p]}^{(s)}, \hat{R}_{s} (x_{[p]}^{(s)}))\}_{p = 1}^{N_{0}}$ can be used as a compressed pattern (as $N_{0} ≪ N$) of the system, instead of the N data points $\{(u_{k}^{(s)}, \hat{R}_{s} (u_{k}^{(s)}))\}_{k = 1}^{N}$.

The non-parametric pattern $\hat{R}_{s} (x_{[1]}^{(s)}), \ldots, \hat{R}_{s} (x_{[N_{0}]}^{(s)})$ can support the decision of model selection from the list of potential candidates, and model competitions can be performed in user-defined regions of interest, e.g., in the working points.

Author Contributions

Formal analysis, G.M.; methodology, Z.H.; software, P.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research (including APC) was funded and supported by the National Science Centre, Poland, Grant No. 2016/21/B/ST7/02284.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Hammerstein system.

Figure 2. Wiener System.

Figure 3. Input space for $s = 2$ and $d = 1$: (a) k odd, (b) k even.

Figure 4. Input space for $s = 2$ and $d = 2$.

Figure 5. Two-dimensional regression-based model $\hat{R}_{2}(x_{0}, x_{1})$.

Figure 6. Three-dimensional regression-based model $\hat{R}_{3}(x_{0}, x_{1}, x_{2})$.

Figure 7. Experimental data—1-dimensional regression.

Figure 8. Experimental data—2-dimensional regression.

Figure 9. Experimental data—3-dimensional regression.

Figure 10. Experimental data—4-dimensional regression.

Table 1. Mean squared errors $δ_{s}$ of model outputs for $s = 1, 2, 3$, compared to best linear approximation (BLA).

Model	$δ_{1}$	$δ_{2}$	$δ_{3}$
$\hat{R}_{s}$	0.0656	0.0279	0.0137
BLA	0.0660	0.0304	0.0176

Table 2. Mean squared errors $δ_{s}$ of model outputs for $s = 1, 2, 3, 4$.

Model	$δ_{1}$	$δ_{2}$	$δ_{3}$	$δ_{4}$
$\hat{R}_{s}$ (presented method)	1017	477	232	169
BLA (Linear FIR(s))	1710	1331	1165	1114
Hammerstein polynomial (3rd order + FIR(s))	1102	553	296	202

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
