## 2. Problem Description

Consider an MISO-OE system

where $u_i(t)$ and $d_i$ are the input and the time-delay of the $i$th input channel, $y(t)$ is the output, $v(t)$ is a white noise sequence with zero mean and variance $\sigma^2$, and $A_i(z)$ and $B_i(z)$ are time-invariant polynomials with constant coefficients in the unit backward shift operator $z^{-1}$ [i.e., $u(t)z^{-1}=u(t-1)$]. Assume that the orders $n_{ai}$ and $n_{bi}$ are known, and that $y(t)=0$, $u_i(t)=0$ and $v(t)=0$ for $t<0$.

To form the identification model, an intermediate variable is introduced [24],

Since the time-delay $d_i$ of each input channel is unknown, an over-parameterization method is applied by setting a maximum input regression length $l$ which satisfies $l \ge \max(d_i + n_{bi})$ [20]. Then, $x_i(t)$ can be written in a compact form

where

The identification model of the system in Equation (1) can be formed as

where

It can be seen from Equations (3) and (6) that the parameter vector $\mathit{\theta}$ contains many zeros; therefore, $\mathit{\theta}$ is a sparse vector and the system in Equation (4) is a sparse system. The sparsity level can be measured by $K := \sum_{i=1}^{r}(n_{ai}+n_{bi})$, which denotes the number of non-zero elements in $\mathit{\theta}$. The identification objective is to estimate the unknown parameters $a_{ij}$ and $b_{ij}$ as well as the time-delays $d_i$ from the observations.
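To make this sparse structure concrete, the following toy sketch builds one channel's over-parameterized coefficient segment: $d_i$ leading zeros, then the $n_{bi}$ coefficients, then trailing zeros. The delay, coefficient values, and regression length below are all illustrative choices, not values from the paper.

```python
import numpy as np

def overparam_segment(b, d, l):
    """Embed the n_b coefficients b of one channel into a length-l
    segment: d leading zeros (the unknown time-delay), then b,
    then trailing zeros. Illustrative layout only."""
    seg = np.zeros(l)
    seg[d:d + len(b)] = b
    return seg

# hypothetical channel: delay d=3, two coefficients, regression length l=10
seg = overparam_segment(np.array([0.5, -0.2]), d=3, l=10)
print(seg)  # zeros everywhere except positions 3 and 4
```

Only the $n_{bi}$ coefficients are non-zero, which is why $K=\sum_i(n_{ai}+n_{bi})$ counts the non-zeros while the segment length grows with $l$.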

## 3. Identification Algorithm

If we have $m$ observations from $t=1$ to $t=m$, Equation (4) can be written in a stacked form

where

From Equations (2), (5) and (10), we can see that the information matrix $\Phi$ contains many unknown intermediate terms. Therefore, it is difficult to perform the identification directly. According to the auxiliary model identification idea [24,27], the information matrix $\Phi$ can be replaced with its estimate

where

Note that the unmeasurable terms $x_i(t-j)$ are replaced with their auxiliary model output estimates $\hat{x}_i(t-j)$. Then, the parameter vector $\mathit{\theta}$ can be estimated by the auxiliary model based least squares iterative (AM-LSI) algorithm [28],

where $\hat{\mathit{\theta}}_k$ denotes the parameter vector estimate at the $k$th iteration. According to least squares (LS) theory, the AM-LSI algorithm is efficient only if $m \gg n$. However, as Equation (7) shows, the dimension of the system in Equation (8) is high, so collecting enough observations to meet this identification requirement would take considerable time and effort. Moreover, Equation (14) shows that the AM-LSI algorithm requires computing the inverse matrix $[\hat{\mathbf{\Phi}}_k^T \hat{\mathbf{\Phi}}_k]^{-1}$ at each iteration, which leads to a heavy computational burden. Furthermore, the sparse solution cannot be obtained [23], and the time-delays cannot be effectively estimated. Thus, the AM-LSI algorithm is infeasible for high-dimensional and sparse system identification.
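For illustration, the normal-equation step of Equation (14) can be sketched with stand-in data (random matrices of arbitrary size, not the paper's model; the real $\hat{\mathbf{\Phi}}_k$ is rebuilt from auxiliary-model outputs at each iteration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 8                          # toy sizes chosen so that m >> n holds
Phi_hat = rng.standard_normal((m, n))  # stand-in for the estimated information matrix
theta_true = rng.standard_normal(n)
Y = Phi_hat @ theta_true + 0.1 * rng.standard_normal(m)

# One LS step of the AM-LSI form: theta = [Phi^T Phi]^{-1} Phi^T Y,
# computed via a linear solve rather than an explicit inverse
theta_ls = np.linalg.solve(Phi_hat.T @ Phi_hat, Phi_hat.T @ Y)
print(np.linalg.norm(theta_ls - theta_true))  # small only because m >> n here
```

When $n$ is large (as in the over-parameterized model) the $n \times n$ solve and the data requirement $m \gg n$ are exactly the two costs the paragraph above describes.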

Inspired by compressive sensing theory, the identification of the sparse system in Equation (8) can be further expressed as an optimization problem

where $\|\cdot\|_0$ represents the $l_0$ norm, $\|\cdot\|_2$ the $l_2$ norm, and $\epsilon$ the error tolerance, which is chosen a priori. However, this is a non-convex problem that is difficult to solve in practice. A commonly used alternative is to replace the $l_0$ norm with the relaxed convex $l_1$ norm [9],

where $\|\cdot\|_1$ is the $l_1$ norm. It has been proved that minimizing the $l_1$ norm is equivalent to minimizing the $l_0$ norm when the Restricted Isometry Property (RIP) is satisfied [13]. The convex optimization problem in Equation (17) is then solvable with the BPDN criterion [16], which effectively reduces the interference of the noise and has good robustness. Inspired by the auxiliary model and the iterative identification idea of the AM-LSI algorithm, we modify the AM-LSI algorithm by replacing its LS step with the BPDN criterion: in each iteration, the parameters are estimated by the BPDN criterion and the unmeasurable terms are replaced by the outputs of the auxiliary model.

Let $k=1,2,\cdots$ denote the iteration number. To obtain an accurate reconstruction from Equation (17), the information matrix should satisfy certain conditions, such as the RIP [29] or the exact recovery condition (ERC) [9]. The ERC guarantee and the consistency properties for the identification of controlled autoregressive models have been investigated in [12]. To meet the ERC, we normalize the information matrix $\hat{\mathbf{\Phi}}_k$ defined in Equations (11)–(13) by dividing the elements in each column by the $l_2$ norm of that column [30,31]. Denote the element in the $i$th row and $j$th column of $\hat{\mathbf{\Phi}}_k$ by $\hat{\Phi}_{k,ij}$; the normalized information matrix $\hat{\mathbf{\Phi}}_{k,n}$ is then constructed by

where $\hat{\Phi}_k(j) := \sqrt{\sum_{i=1}^{m} (\hat{\Phi}_{k,ij})^2}$. Similarly, the normalized parameter vector $\mathit{\theta}_{k,n}$ can be defined as

Note that the locations of the non-zeros in $\mathit{\theta}_n$ are identical to those in $\mathit{\theta}$. Accordingly, the constrained optimization problem in Equation (17) is equivalent to

The problem in Equation (20) is closely related to the following unconstrained convex optimization problem

where $\lambda$ is a nonnegative parameter. Since the information matrix $\hat{\mathbf{\Phi}}_k$ is normalized, we can set $\lambda = \sigma\sqrt{2\log(n)}$ [16].
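A minimal sketch of this column normalization and the corresponding choice of $\lambda$ (the matrix is a random stand-in, and the sizes and $\sigma$ are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 20
Phi_hat = rng.standard_normal((m, n))   # stand-in information matrix

# Divide each column by its l2 norm, as in the Equation-(18) step
col_norms = np.sqrt((Phi_hat ** 2).sum(axis=0))
Phi_n = Phi_hat / col_norms

# Regularization weight suggested for a normalized information matrix
sigma = 0.1
lam = sigma * np.sqrt(2 * np.log(n))
print(np.linalg.norm(Phi_n, axis=0))    # every column now has unit norm
```

The unit-norm columns are what make the single scalar $\lambda$ weigh all coefficients comparably.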

The key step of the BPDN approach is to express Equation (21) as a quadratic program (QP) [32]. To begin with, two nonnegative vectors $\mathbf{u}_n$ and $\mathbf{v}_n$ are introduced to express $\mathit{\theta}_n$. Let $\theta_{nj}$, $u_{nj}$ and $v_{nj}$ be the $j$th elements of the vectors $\mathit{\theta}_n$, $\mathbf{u}_n$ and $\mathbf{v}_n$, respectively, where $u_{nj}\ge 0$ and $v_{nj}\ge 0$ for all $j=1,2,\cdots,n$. Define

Then, $\mathit{\theta}_n$ can be rewritten as

Accordingly, the $l_1$ regularization term $\|\mathit{\theta}_n\|_1$ can be expressed as

where $\mathbf{1}_n^T := [1,\cdots,1] \in \mathbb{R}^n$, $\mathbf{1}_{2n}^T := [1,\cdots,1] \in \mathbb{R}^{2n}$, and

Note that all elements in $\mathbf{z}_n$ are nonnegative. Similarly, the quadratic error term can be written as

Since $\mathbf{Y}^T[\hat{\mathbf{\Phi}}_{k,n},-\hat{\mathbf{\Phi}}_{k,n}]\mathbf{z}_n$ is a scalar, it follows that

Equation (24) can be further written as

where

From Equations (22) and (26), we have

where

Since $\frac{1}{2}\mathbf{Y}^T\mathbf{Y}$ is a constant, Equation (28) can be cast in a standard QP framework,

For the inequality-constrained QP problem in Equation (30), a common solution method is the active set method [33]. For simplicity, however, the QP problem can be solved directly by calling the relevant function from a standard scientific software toolbox; for example, MATLAB provides the function “quadprog”. Then, the parameter vector $\hat{\mathit{\theta}}_n$ can be obtained from the optimal solution $\hat{\mathbf{z}}_n$,

where $\hat{\mathbf{z}}_n(1:n)$ denotes the vector formed by the first $n$ elements of $\hat{\mathbf{z}}_n$, and $\hat{\mathbf{z}}_n(n+1:2n)$ the vector formed by the last $n$ elements. Considering that the system in Equation (8) is contaminated with noise, the parameter estimation error can be large. To further reduce the estimation error, a small threshold $\mathrm{TH}=\epsilon$ can be set to filter the elements close to zero in $\hat{\mathit{\theta}}_n$: let $\hat{\theta}_{nj}=0$ if $|\hat{\theta}_{nj}|<\epsilon$, and denote the filtered parameter vector by $\hat{\mathit{\theta}}_{n,\epsilon}$. Then, the parameter vector estimate $\hat{\mathit{\theta}}_k$ can be recovered according to Equation (19),

The estimates $\hat{x}_{i,k}(t)$ of the intermediate variables can then be refreshed with $\hat{\mathit{\theta}}_k$, as shown in Equations (15) and (16).
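The core of one iteration — forming the QP data for $\mathbf{z}_n=[\mathbf{u}_n;\mathbf{v}_n]\ge 0$, solving it, and applying the threshold — can be sketched as follows. This is an illustrative stand-in: the data are synthetic, the sizes and support are invented, and a simple projected-gradient loop replaces a dedicated QP solver such as MATLAB's quadprog.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, sigma = 60, 100, 0.01            # fewer observations than parameters (m < n)

# Toy sparse problem: stand-ins for the normalized quantities in the paper
theta_true = np.zeros(n)
theta_true[[5, 6, 40, 41]] = [0.8, -0.5, 0.6, 0.3]
Phi = rng.standard_normal((m, n))
Phi /= np.linalg.norm(Phi, axis=0)     # column-normalized information matrix
Y = Phi @ theta_true + sigma * rng.standard_normal(m)
lam = sigma * np.sqrt(2 * np.log(n))   # lambda = sigma * sqrt(2 log n)

# QP data for z = [u; v] >= 0 with theta = u - v:
#   minimize 0.5 z^T C z + b^T z  subject to  z >= 0
A = np.hstack([Phi, -Phi])
C = A.T @ A
b = lam * np.ones(2 * n) - A.T @ Y

# Projected gradient descent stands in for a QP solver here
z = np.zeros(2 * n)
eta = 1.0 / np.linalg.norm(C, 2)       # step size from the largest eigenvalue of C
for _ in range(5000):
    z = np.maximum(z - eta * (C @ z + b), 0.0)

theta_hat = z[:n] - z[n:]                  # recover theta from the split
theta_hat[np.abs(theta_hat) < 0.05] = 0.0  # small-threshold filtering (TH step)
print(np.nonzero(theta_hat)[0])            # recovered support
```

Even with $m<n$, the $l_1$ penalty concentrates the estimate on a few coordinates, and the threshold removes the residual near-zero entries, which is what makes the subsequent zero-block reading of the time-delays possible.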

Equations (9), (11)–(13), (15), (16), (18), (23), (25), (27), and (29)–(32) form the auxiliary model-basis pursuit de-noising iterative (AM-BPDNI) algorithm for the MISO-OE system. The implementation steps are listed as follows:

Step 1: Collect the input–output data {${u}_{i}\left(t\right)$, $y\left(t\right)$: $i=1,2,\cdots ,r$; $t=1,2,\cdots ,m$} and set the parameter estimation accuracy ${\epsilon}_{0}$.

Step 2: Construct the stacked output vector **Y** by Equation (9).

Step 3: Initialize the iteration: let $k=1$ and ${\widehat{x}}_{i,0}\left(t\right)$ be random sequences.

Step 4: Construct the information matrix $\hat{\mathbf{\Phi}}_k$ by Equations (11)–(13) and normalize $\hat{\mathbf{\Phi}}_k$ by Equation (18).

Step 5: Form the vectors $\mathbf{z}_n$ and **b** and the matrix **C** by Equations (23), (25), (27), and (29), and formulate the QP problem by Equation (30).

Step 6: Call the QP function to obtain the optimal solution $\hat{\mathbf{z}}_n$ and compute $\hat{\mathit{\theta}}_n$ by Equation (31).

Step 7: Set the threshold to obtain $\hat{\mathit{\theta}}_{n,\epsilon}$ and recover the parameter vector estimate $\hat{\mathit{\theta}}_k$ by Equation (32).

Step 8: Compare ${\widehat{\mathit{\theta}}}_{k}$ with ${\widehat{\mathit{\theta}}}_{k-1}$: if $\parallel {\widehat{\mathit{\theta}}}_{k}-{\widehat{\mathit{\theta}}}_{k-1}\parallel >{\epsilon}_{0}$, update the auxiliary model outputs $\hat{x}_{i,k}\left(t\right)$ by Equations (15) and (16), increase $k$ by 1 and go to Step 4. Otherwise, stop the iteration and obtain the parameter vector estimate $\widehat{\mathit{\theta}}$.
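The outer iteration and the stopping test of the last step can be sketched as follows. This is a structural stand-in only: the update inside the loop is a damped LS step on synthetic data, not the auxiliary-model rebuild and QP solve of the actual algorithm, so only the loop-and-convergence skeleton carries over.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, eps0 = 40, 6, 1e-6               # toy sizes and stopping accuracy
Phi = rng.standard_normal((m, n))
Y = Phi @ rng.standard_normal(n) + 0.01 * rng.standard_normal(m)

theta_prev = np.zeros(n)
for k in range(1, 101):
    # Stand-in for Steps 4-7: a damped LS step; the real algorithm rebuilds
    # the information matrix from auxiliary-model outputs and solves the QP.
    theta_k = 0.5 * theta_prev + 0.5 * np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)
    if np.linalg.norm(theta_k - theta_prev) <= eps0:   # stopping test
        break
    theta_prev = theta_k
print(k)   # number of iterations until successive estimates agree within eps0
```

The stopping criterion compares successive parameter estimates, so the loop terminates as soon as the iteration has effectively converged rather than after a fixed iteration budget.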

The unknown time-delay of each input channel can be estimated according to the location of the zero-blocks and the number of zeros in $\hat{\mathit{\theta}}$. It can be seen from Equations (3) and (6) that there are $2r$ zero-blocks in $\hat{\mathit{\theta}}$. Denote the number of zeros in each zero-block by $z_i$ $(i=1,2,\cdots,2r)$. Then, the time-delays can be estimated by

## 4. Simulation Example

**Example 1.** Consider the following MISO-OE system with time-delays, The system in Equation (34) is a second-order system with three inputs and one output. The inputs $\{u_i(t)\}$, $i=1,2,3$, are taken as uncorrelated random signal sequences with zero mean and unit variance, and $\{v(t)\}$ as a white noise sequence with zero mean and variance $\sigma^2$. Let the maximum input regression length be $l=50$. Then, the parameter vector to be identified is

where $\mathbf{0}_i$ denotes a zero-block with $i$ zeros. Note that the number of non-zero elements is $K = \|\mathit{\theta}\|_0 = \sum_{i=1}^{3}(n_{ai}+n_{bi}) = 12$.

Taking $m=130$ and $\mathrm{TH}=0.001$, apply the AM-LSI algorithm and the AM-BPDNI algorithm to perform the identification, respectively. The parameter estimation errors $\delta := \|\mathit{\theta}-\hat{\mathit{\theta}}\|/\|\mathit{\theta}\|$ versus different noise levels are shown in Table 1. When $\sigma^2 = 0.10^2$, the estimation errors $\delta$ versus the iteration number $k$ are shown in Figure 1. It can be seen that the AM-BPDNI algorithm performs better than the AM-LSI algorithm and is insensitive to noise.

Let the variance of $\{v(t)\}$ be $\sigma^2=0.10^2$. Apply the AM-BPDNI algorithm to obtain the estimated model of the system in Equation (34) with the first $m=130$ data points. Then, validate the estimated model using $m_e=200$ samples from $t=131$ to 330. The predicted output of the estimated model, the true output of the system and their errors are plotted in Figure 2. It is shown that the predicted outputs $\hat{y}(t)$ are close to the true outputs $y(t)$. Moreover, the average predicted output error

is small and close to the standard deviation of the noise, $\sigma=0.10$. It follows that the estimated model captures the system dynamics well.

Let $m=130$ and $\sigma^2=0.10^2$. Using the AM-BPDNI algorithm to estimate the sparse parameter vector $\mathit{\theta}$, the non-zero parameter estimates versus $k$ are shown in Table 2 and Figure 3.

After 10 iterations, the estimated parameter vector is

It can be seen from Equation (35) that there are six zero-blocks in $\hat{\mathit{\theta}}$, and the numbers of zeros in the zero-blocks are $z_1=20$, $z_2=28$, $z_3=10$, $z_4=38$, $z_5=30$ and $z_6=18$. Then, the time-delay of each input channel can be estimated according to Equation (33),

Obviously, the time-delay estimates agree with the true time-delays.
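Reading each delay off as the number of leading zeros in a channel's length-$l$ coefficient block can be sketched as follows, using the zero-block counts of Example 1. The coefficient values are placeholders, and the per-channel block layout (leading zeros, then the two coefficients, then trailing zeros) is an illustrative assumption; the exact rule is Equation (33).

```python
import numpy as np

def estimate_delay(segment):
    """Delay estimate = number of leading zeros before the first
    non-zero coefficient in a channel's length-l block."""
    nz = np.nonzero(segment)[0]
    return int(nz[0]) if nz.size else len(segment)

l = 50
blocks = []
# Rebuild the three channel blocks implied by z1=20, z3=10, z5=30
for d, coeffs in [(20, [0.4, 0.3]), (10, [0.2, 0.6]), (30, [0.5, 0.1])]:
    seg = np.zeros(l)
    seg[d:d + len(coeffs)] = coeffs   # placeholder coefficient values
    blocks.append(seg)

print([estimate_delay(seg) for seg in blocks])  # → [20, 10, 30]
```

The trailing zero-block counts ($z_2=28$, $z_4=38$, $z_6=18$) are then simply $l-d_i-n_{bi}$, consistent with the numbers above.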

**Example 2.** Consider the following MISO-OE system with time-delays, Compared with the system in Equation (34), the system in Equation (36) has one more input; thus, the number of parameters is increased. Let $l=50$; the true parameter vector is

Taking $m=130$, $\sigma^2=0.10^2$, and $\mathrm{TH}=0.001$, employ the AM-BPDNI algorithm to identify the system in Equation (36). The estimated parameter vector is

The time-delay estimates are

which are identical to the true time-delays.

The parameter estimation errors of the systems in Equations (34) and (36) versus $k$ are shown in Figure 4.

The running times of the proposed method for the two systems are ${t}_{1}=1.374637$ s and ${t}_{2}=2.398602$ s. It can be concluded that the computational burden increases as the dimension of the parameter vector increases.

The simulation results show that for the MISO-OE model, the proposed AM-BPDNI algorithm can obtain efficient estimation of parameters from few observations ($m<n$) with good robustness. Moreover, the AM-BPDNI algorithm can effectively estimate the time-delays according to the sparse characteristic of the estimated parameter vector. However, as the number of the input channels increases, the computational burden of the proposed algorithm increases.