1. Introduction
The continuous development of mathematical knowledge, together with the constantly renewed and growing need to study, represent, and analyze ever more complex physical phenomena and systems, is at the origin of new mathematical objects and models. In particular, the notion of matrices, introduced by Gauss in 1810 to solve systems of linear algebraic equations, with the foundations of matrix computation developed during the 19th century by Sylvester (1814–1897) and Cayley (1821–1895), among several other mathematicians, later gave rise to the notion of tensors. Tensors of order higher than two, i.e., mathematical objects indexed by more than two indices, are multidimensional generalizations of vectors and matrices, which are tensors of orders one and two, respectively. Such objects are well suited to represent and process multidimensional and multimodal signals and data, as in computer vision [1], pattern recognition [2], array processing [3], machine learning [4], recommender systems [5], ECG applications [6], bioinformatics [7], and wireless communications [8], among many other fields of application. Today, with the ever-growing volume of big data (texts, images, audio, and videos) to manage in multimedia applications and social networks, tensor tools are well adapted to fuse, classify, analyze, and process digital information [9].
The purpose of this paper is to present an overview of tensor-based methods for modeling and identifying nonlinear and multilinear systems using input–output data, as encountered in signal processing applications, with a focus on truncated Volterra models and block-oriented nonlinear models, and an introduction to memoryless input–output tensor systems. With a detailed reminder of the tensor tools needed to make the presentation as self-contained as possible, and a review of the main nonlinear models and their applications, this paper should be of interest to researchers and engineers concerned with signal processing applications.
First developed as computational and representation tools in physics and geometry, tensors were the subject of mathematical developments related to polyadic decomposition [10], aiming to generalize dyadic decompositions, i.e., matrix decompositions such as the singular value decomposition (SVD), discovered independently by Beltrami (1835–1900) and Jordan (1838–1922) in 1873 and 1874, respectively. Tensors were then used for the analysis of three-dimensional data, generalizing matrix analysis to sets of matrices seen as arrays of data characterized by three indices, in the fields of psychometrics and chemometrics [11,12,13,14]. This explains why tensors are also called multiway arrays in the context of data analysis and data mining [15].
Matrix decompositions, such as the SVD, have thus been generalized into tensor decompositions, such as the PARAFAC decomposition [13], also called canonical polyadic decomposition (CPD), and the Tucker decomposition (TD) [12]. Tensor decompositions consist in representing a high-order tensor by means of factor matrices and lower-order tensors, called core tensors. In the context of data analysis, such decompositions make it possible to highlight hidden structures of the data while preserving their multilinear structure, which is not the case when stacking the data in the form of vectors or matrices. Tensor decompositions can be used to reduce data dimensionality [16], merge coupled data tensors [17], handle missing data through the application of tensor completion methods [18,19], and design semi-blind receivers for tensor-based wireless communication systems [8].
In Table 1, we present basic and very useful matrix and third-order tensor decompositions, namely the reduced SVD, also known as the compact SVD, PARAFAC/CPD, and TD, in a comparative way. A detailed presentation of PARAFAC and Tucker decompositions is given in Section 4.2. Note that the matrix factors $\mathbf{U}$ and $\mathbf{V}$, which are column-orthonormal, contain the left and right singular vectors, respectively, whereas the diagonal matrix $\mathbf{\Sigma}$ contains the nonzero singular values, and $R$ denotes the rank of the matrix.
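As an illustration, the following minimal numpy sketch computes a compact SVD and checks the properties just mentioned (column-orthonormal factors, nonzero singular values, exact rank-$R$ reconstruction); the matrix sizes are arbitrary choices for this sketch.

```python
import numpy as np

# Compact (reduced) SVD sketch: X = U @ diag(s) @ Vh, with U and V
# column-orthonormal and R = rank(X) nonzero singular values.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))  # rank <= 4

U, s, Vh = np.linalg.svd(X, full_matrices=False)
R = int(np.sum(s > 1e-10 * s[0]))         # numerical rank
U, s, Vh = U[:, :R], s[:R], Vh[:R, :]     # keep only the nonzero singular values

assert np.allclose(U.T @ U, np.eye(R))       # column-orthonormal left factor
assert np.allclose(Vh @ Vh.T, np.eye(R))     # column-orthonormal right factor
assert np.allclose(X, U @ np.diag(s) @ Vh)   # exact rank-R reconstruction
```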
A historical review of the theory of matrices and tensors, with basic decompositions and applications, can be found in [20].
Similarly, from the system modeling point of view, linear models of dynamic systems in the form of input–output relationships or state-space equations have given rise to nonlinear and multilinear models to take into account the nonlinearities inherent in physical systems, which is why nonlinear models are appropriate in many engineering applications. Consequently, standard parameter estimation and filtering methods for linear systems, such as the least-squares (LS) algorithm and the Kalman filter (KF), first proposed by Legendre in 1805 [21] and Kalman in 1960 [22], respectively, were extended for parameter and state estimation of nonlinear systems. Thus, the alternating least-squares (ALS) algorithm [13] and the extended Kalman filter (EKF) [23] were developed, respectively, for estimating the parameters of a PARAFAC decomposition and applying the KF to nonlinear systems.
In Table 2, we present two examples of standard linear models, namely the single-input single-output (SISO) finite impulse response (FIR) model and the memoryless multi-input multi-output (MIMO) model, often used for modeling a communication channel between $N_T$ transmit antennas and $N_R$ receive antennas, where $h_{i,j}$ is the fading coefficient between the $j$th transmit antenna and the $i$th receive antenna. The FIR model is one of the most used for modeling linear time-invariant (LTI) systems, i.e., systems which satisfy the constraints of linearity and time invariance, which means that the system output $y(k)$ can be obtained from the input via the convolution $y(k) = h(k) \star u(k)$, where $h(k)$ is the system's impulse response (IR), and $\star$ denotes the convolution operator.
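As a minimal sketch (with an arbitrary illustrative impulse response), the FIR input–output relation and its equivalent linear-in-parameters regression form can be checked numerically as follows.

```python
import numpy as np

# SISO FIR model: y(k) = sum_{m=0}^{M-1} h(m) u(k-m), i.e., y = h * u.
h = np.array([1.0, 0.5, -0.2])                      # illustrative IR, memory M = 3
u = np.random.default_rng(1).standard_normal(100)   # input signal

y = np.convolve(u, h)[: len(u)]                     # output via linear convolution

# Equivalent regression form y = U @ h, where row k of U holds the delayed
# inputs [u(k), u(k-1), u(k-2)] (zero initial conditions assumed).
U = np.column_stack(
    [np.concatenate([np.zeros(m), u[: len(u) - m]]) for m in range(len(h))]
)
assert np.allclose(y, U @ h)
```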
The notion of linear dynamical system has been generalized to multilinear dynamical systems in [24] to model tensor time series, i.e., time series in which input and output data are tensors. In that paper, the multilinear operator is chosen in the form of a Kronecker product of matrices, and the parameters are estimated by means of an expectation-maximization algorithm, with application to various real datasets. The notion of LTI system has then been extended to multilinear LTI (MLTI) systems in [25] using the Einstein product of even-order paired tensors, with an extension of the classical stability, reachability, and observability criteria to the case of MLTI systems. In Table 2, four examples of nonlinear (NL) and multilinear (ML) models are introduced, namely the polynomial, truncated Volterra, tensor-input tensor-output (TITO), and multilinear tensor-input single-output (TISO) models, which will be studied in more detail in Section 5 and Section 6, as mentioned in Table 2.
System modeling and identification is a fundamental problem in engineering applications. Since real-life systems are often nonlinear in nature, NL models are very useful in various application areas. Parameter estimation using measurements of input and output (I/O) signals is at the heart of identification methods. In this paper, two main families of NL models are considered: (i) discrete-time Volterra models, also called truncated Volterra series expansions; (ii) block-oriented (Wiener, Hammerstein, Wiener–Hammerstein) models. In the sequel, we assume that the systems to be modeled are time-invariant, i.e., their properties, and consequently the parameters of their models, do not depend on time.
Volterra models are frequently used owing to the fact that they can approximate any fading-memory nonlinear system with arbitrary precision, as shown in [26]. They represent a direct nonlinear extension of the very popular FIR linear model, with guaranteed stability in the bounded-input bounded-output (BIBO) sense, and they have the advantage of being linear in their parameters, the kernel coefficients [27]. The nonlinearity of a $P$th-order truncated Volterra model is due to products of up to $P$ samples of delayed inputs. Moreover, they can be interpreted in terms of multidimensional convolutions, which makes the derivation of their z-transform and Fourier transform representations easy [28].
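The following sketch simulates a second-order ($P = 2$) truncated Volterra model with arbitrary illustrative kernels; it makes explicit both the products of delayed inputs and the linearity in the kernel coefficients.

```python
import numpy as np

# Truncated second-order (P = 2) Volterra model with memory M:
#   y(k) = sum_m h1(m) u(k-m) + sum_{m1,m2} h2(m1,m2) u(k-m1) u(k-m2).
rng = np.random.default_rng(2)
M = 3
h1 = rng.standard_normal(M)            # first-order kernel (vector)
h2 = rng.standard_normal((M, M))       # second-order kernel (matrix)
u = rng.standard_normal(50)

y = np.zeros(len(u))
for k in range(len(u)):
    # delayed inputs [u(k), ..., u(k-M+1)], zero initial conditions
    uk = np.array([u[k - m] if k - m >= 0 else 0.0 for m in range(M)])
    y[k] = h1 @ uk + uk @ h2 @ uk      # linear + quadratic contributions

# Linearity in the parameters: with regressor [uk; uk kron uk], the output is
# linear in the stacked coefficients [h1; vec(h2)], which enables LS estimation.
theta = np.concatenate([h1, h2.reshape(-1)])
k = 10
uk = np.array([u[k - m] for m in range(M)])
assert np.isclose(y[k], np.concatenate([uk, np.kron(uk, uk)]) @ theta)
```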
Among the numerous application areas of Volterra models, we can mention chemical and biochemical processes [29], radio-over-fiber (RoF) wireless communication systems (due to optical/electrical (O/E) conversion) [30,31], high-power amplifiers (HPA) in satellite communications [32,33], physiological systems [34], vibrating structures and, more generally, mechatronic systems like robots [35], and acoustic echo cancellation [36].
The main drawback of Volterra models is their parametric complexity: the number of parameters to be estimated grows as $M^P$ for a $P$th-order kernel with memory $M$, i.e., exponentially with the kernel order. Several complexity reduction approaches for Volterra models have therefore been developed using symmetrization or triangularization of the Volterra kernels, or their expansion on orthogonal bases like Laguerre and Kautz ones, or generalized orthogonal bases (GOB). Viewing Volterra kernels as tensors, they can also be decomposed using a PARAFAC decomposition or a tensor train (TT). These approaches lead to the so-called Volterra–Laguerre, Volterra–GOB–Tucker, Volterra–PARAFAC, and Volterra–TT models [37,38,39,40,41,42]. In Section 5.3 and Section 5.4, we review the Volterra–PARAFAC and Volterra–GOB–Tucker models. Note that a model-pruning approach can also be employed to adjust the complexity reduction by keeping only the nearly diagonal coefficients of the kernels and removing the other ones, which correspond to more delayed input values whose influence decreases as the delay increases [43].
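To fix ideas about the complexity reduction achieved by the Volterra–PARAFAC approach, here is a minimal sketch, with notation assumed here ($\mathbf{A}^{(i)} \in \mathbb{R}^{M \times R}$ denoting the CPD factor matrices of the $p$th-order kernel $h_p$ with memory $M$):
$$h_p(m_1,\dots,m_p) = \sum_{r=1}^{R} \prod_{i=1}^{p} a_{m_i,r}^{(i)}, \qquad m_i \in \{1,\dots,M\},$$
so that a rank-$R$ CPD replaces the $M^p$ kernel coefficients with the $pMR$ entries of the factor matrices.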
Another approach for ensuring reduced parametric complexity consists in considering block-oriented NL models, composed of two types of blocks: linear time-invariant (LTI) dynamic blocks and static NL blocks. The linear blocks may be parametric (transfer functions, FIR models, state-space representations) or nonparametric (impulse responses), whereas the NL blocks may be with memory or memoryless. The different blocks are concatenated in series, leading to the so-called Hammerstein (NL-LTI) and Wiener (LTI-NL) models, extended to the Wiener–Hammerstein (LTI-NL-LTI) and Hammerstein–Wiener (NL-LTI-NL) models, abbreviated as W-H and H-W, respectively. To extend the modeling potential of block-oriented models, several W-H and H-W models can also be interconnected in parallel. Although such models are simpler and less general than Volterra models, they allow us to represent numerous nonlinear systems. One of the first applications of block-oriented NL models was the modeling of biological systems [44]. Many papers have been devoted to the identification of block-oriented models and their applications; for more details, the reader is referred to the book [45] and the survey papers [46,47].
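The cascade structure is easy to make concrete. In the sketch below, the FIR impulse responses and the polynomial nonlinearity are arbitrary illustrative choices, not blocks taken from the cited works.

```python
import numpy as np

# Illustrative block-oriented cascades; the FIR responses and the polynomial
# nonlinearity are arbitrary choices for this sketch.
rng = np.random.default_rng(3)
h1 = np.array([1.0, 0.4, 0.1])                # first LTI block (FIR)
h2 = np.array([0.8, -0.3])                    # second LTI block (FIR)
f = lambda x: x + 0.3 * x**2 - 0.1 * x**3     # static (memoryless) NL block
u = rng.standard_normal(200)

fir = lambda x, h: np.convolve(x, h)[: len(x)]

y_hammerstein = fir(f(u), h1)                 # NL-LTI
y_wiener = f(fir(u, h1))                      # LTI-NL
y_wh = fir(f(fir(u, h1)), h2)                 # LTI-NL-LTI (W-H)
```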
In Section 5.5, we show that the Wiener, Hammerstein, and W-H models are equivalent to structured Volterra models. This equivalence is at the origin of the structure identification method for block-oriented systems, which will be presented in Section 5.5.4. Tensor-based methods using this equivalence have been developed to estimate the parameters of block-oriented nonlinear systems [48,49,50,51]. These methods are generally composed of two steps. In the first one, the Volterra kernel associated with a particular block-oriented system is used to estimate the LTI component(s). Note that there exist closed-form solutions for estimating only the Volterra kernel of interest; such solutions are proposed in [52,53] for third-order and fifth-order kernels, respectively. Then, in a second step, the nonlinear block is estimated using the LS method. An example of a tensor-based method for identifying a nonlinear communication channel represented by means of a W-H model was proposed in [54] using the associated third-order Volterra kernel.
On the other hand, multilinear models are useful for modeling coupled dynamical systems in engineering, biology, and physics. Tensor-based approaches have been proposed for solving and identifying multilinear systems [24,55,56]. Using the Einstein product of tensors, we first introduce a new class of systems, the so-called memoryless tensor-input tensor-output (TITO) systems, in which the multidimensional input and output signals define two tensors. The LS method is applied to estimate the tensor transfer of such a system. Then, the case of a tensor-input single-output (TISO) system is considered, assuming the system transfer is a rank-one $N$th-order tensor, which leads to a multilinear system with respect to the impulse responses (IRs) of the $N$ subsystems associated with the $N$ modes of the input tensor. The non-recursive weighted least-squares (WLS) method is used to estimate the multilinear impulse response (MIR) in vectorized form. A closed-form method is also proposed to estimate the IR of each subsystem from the estimated MIR.
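As a minimal sketch, and with notation assumed here rather than taken from Section 6, the rank-one TISO model can be written with transfer tensor $\mathcal{H} = \mathbf{h}^{(1)} \circ \cdots \circ \mathbf{h}^{(N)}$ as
$$y(k) = \langle \mathcal{X}(k), \mathcal{H} \rangle = \sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} x_{i_1,\dots,i_N}(k) \prod_{n=1}^{N} h_{i_n}^{(n)},$$
which is linear in each IR $\mathbf{h}^{(n)}$ taken separately, hence multilinear in $(\mathbf{h}^{(1)},\dots,\mathbf{h}^{(N)})$, and linear in the vectorized MIR, which is what the WLS estimator exploits.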
The rest of the paper is structured as follows. In Section 2, we present the notation, with the index convention used throughout the paper. In Section 3, we introduce some tensor sets in connection with multilinear forms. In Section 4, we briefly recall basic tensor operations and decompositions. Section 5 and Section 6 are devoted to tensor-based approaches for nonlinear and multilinear system modeling and identification, respectively. Finally, Section 7 concludes the paper, with some perspectives for future work.
Many books and survey papers discuss estimation theory and system identification. In the field of engineering sciences, we can cite the fundamental contributions of [57,58,59,60,61,62,63] for linear systems and [27,28,29,47,64,65,66,67,68,69] for nonlinear systems. In the case of multilinear systems, the reader is referred to [55,56] for more details.
2. Notation and Index Convention
Scalars, column vectors, matrices, and tensors are denoted by lower-case, boldface lower-case, boldface upper-case, and calligraphic letters, e.g., $x$, $\mathbf{x}$, $\mathbf{X}$, $\mathcal{X}$, respectively. We denote by $x_{i,j}$ the $(i,j)$th element and by $\mathbf{X}_{.r}$ (resp. $\mathbf{X}_{i.}$) the $r$th column (resp. $i$th row) of $\mathbf{X}$. $\mathbf{I}_N$ denotes the identity matrix of size $N \times N$.
The transpose, complex conjugate, transconjugate, and Moore–Penrose pseudo-inverse operators are represented by $(\cdot)^T$, $(\cdot)^*$, $(\cdot)^H$, and $(\cdot)^\dagger$, respectively.
The operator $\mathrm{diag}(\cdot)$ forms a diagonal matrix from its vector argument, while $D_i(\mathbf{X})$ stands for the diagonal matrix holding the $i$th row of $\mathbf{X}$ on its diagonal.
The operator $\mathrm{Toep}(\cdot)$ forms a Toeplitz matrix from its vector argument, whose first column and first row are built from the entries of that vector.
Given $\mathbf{X} \in \mathbb{K}^{I \times J}$, the vec and unvec operators are defined such that $\mathbf{x} = \mathrm{vec}(\mathbf{X}) \in \mathbb{K}^{IJ}$ and $\mathbf{X} = \mathrm{unvec}(\mathbf{x})$, where the order of the dimensions in the product $IJ$ is linked to the order of variation of the indices, with the column index $j$ varying more slowly than the row index $i$.
The outer, Kronecker and Khatri–Rao products are denoted by ∘, ⊗ and ⋄, respectively.
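As a small numerical illustration (with arbitrary matrices), the following sketch checks the column-major vec convention, builds the Khatri–Rao product column by column, and verifies the classical identity $\mathrm{vec}(\mathbf{B}\,\mathrm{diag}(\mathbf{d})\,\mathbf{A}^T) = (\mathbf{A} \diamond \mathbf{B})\,\mathbf{d}$.

```python
import numpy as np

# vec(.) with the column index varying more slowly than the row index is
# column-major (Fortran-order) vectorization.
X = np.arange(6).reshape(2, 3)           # I x J with I = 2, J = 3
x = X.flatten(order="F")                 # vec(X): stacks the columns
assert np.array_equal(X, x.reshape(2, 3, order="F"))   # unvec recovers X

# Kronecker product (np.kron) and Khatri-Rao product (column-wise Kronecker).
A = np.arange(6).reshape(3, 2)
B = np.arange(8).reshape(4, 2)
kr = np.stack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])], axis=1)

# Identity linking vec, diag, and the Khatri-Rao product:
d = np.array([2.0, -1.0])
lhs = (B @ np.diag(d) @ A.T).flatten(order="F")
assert np.allclose(lhs, kr @ d)
```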
Table 3 summarizes the notation used for sets of indices and dimensions [70].
We now introduce the index convention, which allows eliminating the summation symbols in formulae involving multi-index variables. For example, $\sum_{j=1}^{J} a_{i,j} x_j$ is simply written as $a_{i,j} x_j$. Note that there are two differences relative to Einstein's summation convention.
The index convention can be interpreted in terms of two types of summation, the first associated with the row indices (superscripts) and the second associated with the column indices (subscripts), following rules detailed in [70].
In Table 4, we give some examples of vector and matrix products using the index convention, where the vectors and matrices involved are assumed to have compatible dimensions.
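Since numpy's einsum follows the same rule of summing over repeated indices, such products can be illustrated directly; the arrays below are arbitrary.

```python
import numpy as np

# The index convention drops explicit summation symbols; np.einsum uses the
# same idea: a repeated index is summed over.
rng = np.random.default_rng(4)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((4, 5))
x, y = rng.standard_normal(4), rng.standard_normal(3)

assert np.allclose(np.einsum("ij,j->i", A, x), A @ x)     # y_i = a_ij x_j
assert np.allclose(np.einsum("ik,kj->ij", A, B), A @ B)   # c_ij = a_ik b_kj
assert np.allclose(np.einsum("i,i->", y, y), y @ y)       # inner product y_i y_i
outer = np.einsum("i,j->ij", y, x)    # no repeated index: outer product
```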
Using the index convention, the multiple sum over the indices of $x_{m_1,\dots,m_P}$, i.e., $\sum_{m_1=1}^{M_1} \cdots \sum_{m_P=1}^{M_P}$, will be abbreviated to $\sum_{\mathbf{m}_P = \mathbf{1}_P}^{\mathbf{M}_P}$, where $\mathbf{1}_P$ denotes a set of ones whose number is fixed by the index $P$ of the set $\mathbf{M}_P$. The notation $\mathbf{m}_P$ and $\mathbf{M}_P$ allows us to simplify the expression of the multiple sum into a single sum over an index set, which is further simplified by using the index convention.
3. Tensors and Multilinear Forms
In signal processing applications, a tensor $\mathcal{X}$ of order $N$ and size $I_1 \times I_2 \times \cdots \times I_N$ is typically viewed as an array of numbers $x_{i_1,\dots,i_N}$. The order corresponds to the number $N$ of indices $i_1,\dots,i_N$ that characterize its elements, also denoted $[\mathcal{X}]_{i_1,\dots,i_N}$ or $\mathcal{X}(i_1,\dots,i_N)$. Each index $i_n$, for $n \in \{1,\dots,N\}$, is associated with a mode, also called a way, and $I_n$ denotes the dimension of the $n$th mode. The number of elements in $\mathcal{X}$ is equal to $\prod_{n=1}^{N} I_n$. For instance, in a wireless communication system [8], each index of a signal tensor corresponds to a different form of diversity (in the time, space, frequency, code, etc., domains), and the dimensions $I_n$ are the numbers of time samples, receive antennas, subcarriers, the code length, etc.
The tensor is said to be real (resp. complex) if its elements are real (resp. complex) numbers, which corresponds to $\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ (resp. $\mathcal{X} \in \mathbb{C}^{I_1 \times \cdots \times I_N}$). It is said to have even order (resp. odd order) if $N$ is even (resp. odd). The special cases $N = 2$ and $N = 1$ correspond to the sets of matrices $\mathbf{X} \in \mathbb{K}^{I_1 \times I_2}$ and column vectors $\mathbf{x} \in \mathbb{K}^{I_1}$, respectively, with $\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$.
If $I_n = I$ for all $n \in \{1,\dots,N\}$, the $N$th-order tensor is said to be hypercubic, of dimensions $I$. The number of elements in $\mathcal{X}$ is then equal to $I^N$. The set of (real or complex) hypercubic tensors of order $N$ and dimensions $I$ will be denoted $\mathbb{K}^{[N;I]}$.
A hypercubic tensor of order $N$ and dimensions $I$ is said to be symmetric if it is invariant under any permutation $\pi$ of its modes, i.e., $x_{i_{\pi(1)},\dots,i_{\pi(N)}} = x_{i_1,\dots,i_N}$ for all $(i_1,\dots,i_N)$.
The identity tensor of order $N$ and dimensions $I$ is denoted $\mathcal{I}_{N,I}$, or simply $\mathcal{I}_N$. It is a hypercubic tensor whose elements are defined using the generalized Kronecker delta $\delta_{i_1,\dots,i_N}$, equal to 1 if $i_1 = \cdots = i_N$ and to 0 otherwise. It is a diagonal tensor whose diagonal elements are equal to 1 and the other elements to zero, which can be written as the sum of $I$ outer products of $N$ canonical basis vectors $\mathbf{e}_i^{(I)}$ of the space $\mathbb{R}^I$:
$$\mathcal{I}_{N,I} = \sum_{i=1}^{I} \underbrace{\mathbf{e}_i^{(I)} \circ \cdots \circ \mathbf{e}_i^{(I)}}_{N\ \text{terms}},$$
where the outer product operation $\circ$ is defined later in Table 9.
A diagonal tensor $\mathcal{D}$ of order $N$, whose diagonal elements are the entries of a vector $\mathbf{d}$, will be written as $\mathcal{D} = \mathrm{diag}_N(\mathbf{d})$, i.e., $[\mathcal{D}]_{i,\dots,i} = d_i$.
Different matricizations, also called matrix unfoldings, can be defined for a tensor $\mathcal{X} \in \mathbb{K}^{I_1 \times \cdots \times I_N}$. Consider a partitioning of the set of modes $\{1,\dots,N\}$ into two disjoint ordered subsets $\mathbb{S}_1$ and $\mathbb{S}_2$, composed of $p$ and $N-p$ modes, respectively, with $1 \le p < N$. A general matrix unfolding formula was given by [71] as follows:
$$\mathbf{X}_{\mathbb{S}_1;\mathbb{S}_2} = \sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} x_{i_1,\dots,i_N} \left( \underset{n \in \mathbb{S}_1}{\otimes} \mathbf{e}_{i_n}^{(I_n)} \right) \left( \underset{n \in \mathbb{S}_2}{\otimes} \mathbf{e}_{i_n}^{(I_n)} \right)^T,$$
where $\mathbf{e}_{i_n}^{(I_n)}$ is the $i_n$th vector of the canonical basis of $\mathbb{R}^{I_n}$, for $n \in \{1,\dots,N\}$. We say that $\mathbf{X}_{\mathbb{S}_1;\mathbb{S}_2}$ is a matrix unfolding of $\mathcal{X}$ along the modes of $\mathbb{S}_1$ for the rows and along the modes of $\mathbb{S}_2$ for the columns, with $\mathbf{X}_{\mathbb{S}_1;\mathbb{S}_2} \in \mathbb{K}^{J_1 \times J_2}$, $J_1 = \prod_{n \in \mathbb{S}_1} I_n$ and $J_2 = \prod_{n \in \mathbb{S}_2} I_n$.
For instance, in the case of a third-order tensor $\mathcal{X} \in \mathbb{K}^{I_1 \times I_2 \times I_3}$, we have six flat unfoldings and six tall unfoldings. For $\mathbb{S}_1 = \{1\}$ and $\mathbb{S}_2 = \{2,3\}$, we have the mode-1 flat unfolding $\mathbf{X}_{1;23} \in \mathbb{K}^{I_1 \times I_2 I_3}$, while for $\mathbb{S}_1 = \{2,3\}$ and $\mathbb{S}_2 = \{1\}$ we obtain the mode-1 tall unfolding $\mathbf{X}_{23;1} \in \mathbb{K}^{I_2 I_3 \times I_1}$.
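In numpy, such unfoldings amount to a mode permutation followed by a reshape. A minimal sketch for the two mode-1 unfoldings above, using a hypothetical helper unfold:

```python
import numpy as np

# Mode-1 flat unfolding X_{1;23}: rows indexed by i1, columns by the
# combination (i2, i3) with i2 varying more slowly than i3 (C-order reshape).
I1, I2, I3 = 2, 3, 4
X = np.arange(I1 * I2 * I3).reshape(I1, I2, I3)

X1_flat = X.reshape(I1, I2 * I3)      # X_{1;23}, size I1 x (I2*I3)
X1_tall = X1_flat.T                   # X_{23;1}, size (I2*I3) x I1

# Generic unfolding along an arbitrary ordered mode split (hypothetical helper):
def unfold(X, row_modes, col_modes):
    perm = list(row_modes) + list(col_modes)   # reorder the modes, then merge
    rows = int(np.prod([X.shape[m] for m in row_modes]))
    return np.transpose(X, perm).reshape(rows, -1)

assert np.array_equal(unfold(X, [0], [1, 2]), X1_flat)
assert np.array_equal(unfold(X, [1, 2], [0]), X1_tall)
```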
Vectorized forms of $\mathcal{X}$ are obtained by combining the modes in a given order. Thus, a lexicographical vectorization gives the vector $\mathbf{x} \in \mathbb{K}^{I_1 I_2 \cdots I_N}$ with the element $x_{i_1,\dots,i_N}$ at the position $\overline{i_1 i_2 \cdots i_N}$ in $\mathbf{x}$, i.e., $x_{\overline{i_1 i_2 \cdots i_N}} = x_{i_1,\dots,i_N}$, with [72]
$$\overline{i_1 i_2 \cdots i_N} = i_N + \sum_{n=1}^{N-1} (i_n - 1) \prod_{m=n+1}^{N} I_m.$$
By convention, the order of the dimensions in a product associated with the index combination $\overline{i_1 i_2 \cdots i_N}$ follows the order of variation of the indices, with $i_1$ varying more slowly than $i_2$, which in turn varies more slowly than $i_3$, etc.
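With this convention ($i_1$ varying the most slowly), the lexicographical vectorization coincides with row-major (C-order) raveling, which the following sketch checks against the position formula above; position is a hypothetical helper.

```python
import numpy as np

# Check that the 1-based lexicographical position formula matches C-order raveling.
I = (2, 3, 4)                       # dimensions I1, I2, I3
X = np.arange(np.prod(I)).reshape(I)
x = X.reshape(-1)                   # lexicographical vectorization (C order)

def position(idx, dims):            # idx and the returned position are 1-based
    pos = idx[-1]
    for n in range(len(dims) - 1):
        pos += (idx[n] - 1) * int(np.prod(dims[n + 1:]))
    return pos

for i1 in range(1, I[0] + 1):
    for i2 in range(1, I[1] + 1):
        for i3 in range(1, I[2] + 1):
            assert x[position((i1, i2, i3), I) - 1] == X[i1 - 1, i2 - 1, i3 - 1]
```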
The Frobenius norm of $\mathcal{X}$ is the square root of the inner product of the tensor with itself, i.e.,
$$\|\mathcal{X}\|_F = \sqrt{\langle \mathcal{X}, \mathcal{X} \rangle} = \left( \sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} |x_{i_1,\dots,i_N}|^2 \right)^{1/2}.$$
Table 5 presents various sets of tensors that will be considered in this paper, with the notation introduced in [70].
We can make the following remarks about the sets of tensors defined in Table 5:
- For $N = 2$, the set $\mathbb{K}^{I_1 \times I_2}$ is the set of (real or complex) matrices of size $I_1 \times I_2$.
- The set $\mathbb{K}^{I_1 \times \cdots \times I_N}$ is also denoted $\mathbb{T}$ or T by some authors.
- The set $\mathbb{K}^{I_1 \times \cdots \times I_N \times I_1 \times \cdots \times I_N}$ is called the set of even-order (or square) tensors of order $2N$ and size $(I_1 \times \cdots \times I_N) \times (I_1 \times \cdots \times I_N)$. The name of square tensor comes from the fact that the index set is divided into two identical subsets of dimensions $I_1, \dots, I_N$.
- Analogously to matrices, tensors in the sets $\mathbb{K}^{I_1 \times \cdots \times I_P \times J_1 \times \cdots \times J_Q}$, with $P, Q \ge 1$, are said to be rectangular. This set is called the set of rectangular tensors with index blocks of dimensions $I_1, \dots, I_P$ and $J_1, \dots, J_Q$.
The various tensor sets introduced above can be associated with scalar real-valued multilinear forms in vector variables and with homogeneous polynomials. Like in the matrix case, we will distinguish between homogeneous polynomials of degree P that depend on the components of P vector variables and those that depend on just one vector variable.
A real-valued multilinear form, also called a $P$-linear form, is a map $f: \mathbb{R}^{I_1} \times \cdots \times \mathbb{R}^{I_P} \to \mathbb{R}$ that is separately linear with respect to each vector variable $\mathbf{x}^{(p)}$ when the other variables $\mathbf{x}^{(q)}$, for $q \ne p$, are fixed. Using the index convention, the multilinear form can be written, for $\mathbf{x}^{(p)} \in \mathbb{R}^{I_p}$ and $p \in \{1,\dots,P\}$, as
$$f(\mathbf{x}^{(1)},\dots,\mathbf{x}^{(P)}) = a_{i_1,\dots,i_P}\, x_{i_1}^{(1)} \cdots x_{i_P}^{(P)}.$$
The tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_P}$ is called the tensor associated with the multilinear form $f$.
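As an illustration, a trilinear form ($P = 3$) and its separate linearity in the first argument can be checked with einsum, which implements exactly the summation pattern of the index convention; the tensor and vectors below are arbitrary.

```python
import numpy as np

# Trilinear form f(x1, x2, x3) = a_{ijk} x1_i x2_j x3_k with associated tensor A.
rng = np.random.default_rng(5)
A = rng.standard_normal((2, 3, 4))      # tensor associated with f
x1, x2, x3 = rng.standard_normal(2), rng.standard_normal(3), rng.standard_normal(4)

f = np.einsum("ijk,i,j,k->", A, x1, x2, x3)

# Separate linearity in the first argument, the others being fixed:
a, b = 1.7, -0.4
z = rng.standard_normal(2)
lhs = np.einsum("ijk,i,j,k->", A, a * x1 + b * z, x2, x3)
assert np.isclose(lhs, a * f + b * np.einsum("ijk,i,j,k->", A, z, x2, x3))
```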
Two multilinear forms are presented in Table 6, which also states the transformation corresponding to each of them, as well as the associated tensor. Table 7 recalls the definitions of bilinear/quadratic forms using the index convention, then presents the multilinear forms defined in Table 6, as well as the associated tensors from Table 5 and the corresponding homogeneous polynomials.
We can make the following remarks:
In the same way that bilinear forms depend on two variables that do not necessarily belong to the same vector space, general real multilinear forms depend on $P$ variables that may belong to different vector spaces: $\mathbf{x}^{(p)} \in \mathbb{R}^{I_p}$, for $p \in \{1,\dots,P\}$.
Analogously to quadratic forms, obtained from bilinear forms by replacing the pair $(\mathbf{x}, \mathbf{y})$ with the single vector $\mathbf{x}$, real multilinear forms can be expressed using just one vector variable $\mathbf{x}$. In the same way that symmetric quadratic forms lead to the notion of symmetric matrices, the symmetry of multilinear forms is directly linked to the symmetry of their associated tensors.