Article

Modeling of Nonlinear Systems: Method of Optimal Injections

by
Anatoli Torokhti
1 and
Pablo Soto-Quiros
2,*
1
STEM Discipline, University of South Australia, Mawson Lakes, GPO Box 2471, Adelaide, SA 5001, Australia
2
Escuela de Matemática, Instituto Tecnológico de Costa Rica, Cartago 30101, Costa Rica
*
Author to whom correspondence should be addressed.
Math. Comput. Appl. 2025, 30(2), 26; https://doi.org/10.3390/mca30020026
Submission received: 5 September 2024 / Revised: 26 February 2025 / Accepted: 4 March 2025 / Published: 7 March 2025

Abstract: In this paper, a nonlinear system is interpreted as an operator $F$ transforming random vectors. It is assumed that the operator is unknown and that the random vectors are available. It is required to find a model of the system represented by a best constructive approximation of the operator $F$. While the theory of operator approximation with any given accuracy has been well elaborated, the theory of best constrained constructive operator approximation is not so well developed. Despite increasing demands from various applications, this subject is minimally tractable because of intrinsic difficulties with the associated approximation techniques. This paper concerns the best constrained approximation of a nonlinear operator in probability spaces. The main conceptual novelty of the proposed approach is that, unlike the known techniques, it targets a constructive optimal determination of all $3p+2$ ingredients of the approximating operator, where $p$ is a nonnegative integer. The solution to the associated problem is represented by a combination of new best approximation techniques with a special iterative procedure. The proposed approximating model of the system has several degrees of freedom to minimize the associated error. In particular, one of the specific features of the developed approximating technique is special random vectors called injections. It is shown that the desired injection is determined from the solution of a special Fredholm integral equation of the second kind. Its solution is called the optimal injection. The determination of optimal injections in this way allows us to further minimize the associated error.

1. Introduction

1.1. Motivation

Over the last few decades, the problem of constructive approximation of nonlinear operators has been a topic of profound research. A number of fundamental papers have appeared, which have established significant advances in this research area. Some relevant references can be found, in particular, in [1,2,3,4,5,6,7,8,9,10,11,12,13,14].
The known related results mainly concern proving the existence and uniqueness of operators approximating a given map, and justifying the bounds of errors arising from the approximation methods. The assumptions are that preimages and images are deterministic and can be represented in an analytical form, that is, by equations. At the same time, in many applications, the sets of preimages and images are stochastic and cannot be described by equations. Nevertheless, it is possible to represent these sets in terms of their numerical characteristics, such as the expectation and covariance matrices. Typical examples include stochastic signal processing [15,16,17,18,19], statistics [20,21,22,23,24], engineering [25,26,27,28], and image processing [29,30]; in the latter case, a digitized image, presented by a matrix, is often interpreted as the sample of a stochastic signal.
While the theory of operator approximation with any given accuracy is well elaborated (see, e.g., [1,2,3,4,5,6,7,8,9,10,11,12,13,14]), the theory of best constrained constructive operator approximation is not particularly well developed, although this is an area of intensive recent research (see, e.g., [31,32,33,34,35,36]). Despite increasing demands from applications [17,18,19,21,22,23,25,26,27,28,30,31,32,33,34,36,37,38,39,40,41,42,43,44,45,46], this subject is minimally tractable because of intrinsic difficulties in best approximation techniques, especially when the approximating operator should have a specific structure implied by the underlying problem. Recent studies related to this topic can be found in [47,48,49,50,51,52,53,54,55]. In particular, in [47], an approach to modeling of nonlinear systems was proposed based on observed input-output data. The proposed method constructs an approximate model by decomposing the unknown function into multiple components, aiming to minimize the associated error. Special random vectors, called injections, are used to refine the approximation while specific transformations simplify the optimization process. The approach provides flexibility in adjusting parameters to improve accuracy.
We wish to extend the known results in this area to the case when the sets of preimages and images of the map $F$ are stochastic, and the approximating operator we seek is constructive in the sense that it can be realized numerically and is therefore applicable to practical problems.

1.2. Short Description of the Method

Let $\Omega$ be a set of outcomes in the probability space $(\Omega, \Sigma, \mu)$, for which $\Sigma$ is a $\sigma$-field of measurable subsets of $\Omega$ and $\mu: \Sigma \to [0,1]$ is an associated probability measure. Let $x \in L^2(\Omega, \mathbb{R}^m)$ and $y \in L^2(\Omega, \mathbb{R}^n)$ be random vectors, $F: L^2(\Omega, \mathbb{R}^n) \to L^2(\Omega, \mathbb{R}^m)$ be a nonlinear map, and $x = F(y)$.
A nonlinear system is interpreted as the map $F$, where $x$ and $y$ are random output and input signals, respectively. It is assumed that $F$ is unknown, and that $x$ and $y$ are available. We propose and justify a new method for the constructive approximation of $F$ such that, first, an associated error is minimized and, second, the structure of the approximating operator satisfies a special constraint related to the dimensionality reduction of vector $y$. More specifically, we develop a new approach to the best constructive approximation of the map $F$ in probability spaces subject to a specialized criterion associated with the dimensionality reduction of random preimages. The latter constraint follows from the requirements in applications such as those considered in [20,21,22,23,24,26,41,42]. In particular, in signal processing and system theory, a dimensionality reduction of random signals is used to optimize the cost of signal transmission. It is assumed that the only available information on $F$ is given by certain covariance matrices formed from the preimages and images. This is a typical assumption used in applications such as those considered, e.g., in [15,16,17,18,19,20,21,22,23,37,38,44,45,46]. Here, we adopt that assumption. As mentioned, in particular, in [56,57,58,59,60,61], a priori knowledge of the covariances can come either from specific data models or after sample estimation during a training phase.
The problem we consider (see (7) below) concerns finding the best approximating operator that depends on $2p+2$ unknown matrices $G_j$ and $H_j$, for $j = 0, \ldots, p$, and $p$ more unknown random vectors $v_1, \ldots, v_p$. We call $v_1, \ldots, v_p$ the injections. Here, $p$ is a nonnegative integer. The injections $v_1, \ldots, v_p$ aim to further diminish the associated error. The difficulty is that all $3p+2$ unknowns should be determined from a minimization of the single cost function given in (7).
The main difference between the approach in [47] and the method proposed in this paper lies in how the model of the nonlinear system is constructed. The method in [47] constructs the model $T_p$ as a sum of $p+1$ specific components, and the original optimization problem is decomposed into $p+1$ simpler sub-problems. That method empirically determines the special random vectors, called injections, to reduce the associated error. In contrast, the method proposed here solves the problem of a best constrained approximation of a nonlinear operator in probability spaces. It introduces the concept of an optimal injection, which is determined by solving a Fredholm integral equation of the second kind. While the method in [47] focuses on a structured numerical implementation, the proposed method provides a more general theoretical framework that optimally determines all $3p+2$ parameters through a combination of best approximation techniques and an iterative procedure.
The solution is represented in Section 3 and Section 4, and is based on the following observation. Methods for best approximation are aimed at obtaining the best solution within a certain class; the accuracy of the solution is limited by the extent to which the class is satisfactory. By contrast, iterative methods are normally convergent, but the convergence can be quite slow. Moreover, in practice, only a finite number of iteration loops can be carried out, and therefore, the final approximate solution is often unsatisfactorily inaccurate. A natural idea is to combine the methods for best approximation and iterative techniques to exploit their advantageous features.
In Section 3 and Section 4, we present an approach which realizes this. First, a special iterative procedure is proposed which aims to improve the accuracy of approximation with each consequent iteration loop. Secondly, the best approximation problem is solved providing the smallest associated error within the chosen class of approximants for each iteration loop. In Section 4, we show that the combination of these techniques allows us to build a computationally efficient and flexible method. In particular, we prove that the error in approximating F by the proposed method decreases with an increase in the number of iterations. An application is made to the optimal filtering of stochastic signals.

1.3. Novelty and Advantages

The novelty of the proposed method is in its approach to the best constructive approximation of nonlinear operators in probability spaces. Unlike methods that primarily focus on the existence and uniqueness of approximating operators, the proposed method aims to achieve an optimal constructive approximation by targeting a specific structure for the approximating operator, which is constrained by dimensionality reduction of random preimages. The focus on dimensionality reduction and the optimization of random vectors, known as injections, distinguishes the proposed method from existing approaches.
The advantages of the proposed method are twofold. First, it allows us to determine optimal injections which are derived from solving a Fredholm integral equation of the second kind, thereby providing a more robust and theoretically sound approach to approximating nonlinear systems. Second, the method combines best approximation techniques with a special iterative procedure, ensuring that the accuracy of the approximation improves with each iteration. This combination not only enhances the computational efficiency of the model but also provides a flexible framework that can be adapted to various applications, particularly those associated with signal processing and system theory, where dimensionality reduction plays a crucial role in optimizing signal transmission costs. By approximating F using the best constrained constructive operator approximation, one can design efficient algorithms for signal reconstruction or denoising. In system identification, this approach can be used to model and estimate dynamic systems with stochastic inputs, enabling more accurate predictions and adaptive control strategies.

2. The Proposed Approach

2.1. Some Special Notation

Let us write $x = [x^{(1)}, \ldots, x^{(m)}]^T$ and $y = [y^{(1)}, \ldots, y^{(n)}]^T$, where $x^{(i)}, y^{(j)} \in L^2(\Omega, \mathbb{R})$, for $i = 1, \ldots, m$ and $j = 1, \ldots, n$, and $x(\omega) \in \mathbb{R}^m$ and $y(\omega) \in \mathbb{R}^n$ for all $\omega \in \Omega$. Each matrix $A \in \mathbb{R}^{m \times n}$ defines a bounded linear transformation $\mathcal{A}: L^2(\Omega, \mathbb{R}^n) \to L^2(\Omega, \mathbb{R}^m)$. It is customary to write $A$ rather than $\mathcal{A}$ since $[\mathcal{A}(x)](\omega) = A[x(\omega)]$, for each $\omega \in \Omega$. Let us also write
$$\|x\|_\Omega^2 = \int_\Omega \sum_{j=1}^m [x^{(j)}(\omega)]^2 \, d\mu(\omega) < \infty.$$
The covariance matrix formed from $x$ and $y$ is denoted by $E_{xy}$, so that
$$E_{xy} = \int_\Omega x(\omega)[y(\omega)]^T d\mu(\omega) = \left\{ \int_\Omega x^{(i)}(\omega) y^{(j)}(\omega) \, d\mu(\omega) \right\}_{i,j=1}^{m,n}.$$
The Moore–Penrose pseudo-inverse [62] of a matrix $M$ is denoted by $M^\dagger$.

2.2. Generic Structure of Approximating Operator

Let $v_1, \ldots, v_p$ be random vectors such that $v_j \in L^2(\Omega, \mathbb{R}^{q_j})$, for $j = 1, \ldots, p$. We write $y = v_0$ and $q_0 = n$. As mentioned before, we call $v_1, \ldots, v_p$ the injections. This is because $v_1, \ldots, v_p$ contribute to the decrease of the associated error, as shown in Section 4.4 below. The choice of $v_1, \ldots, v_p$ is considered in Section 4.2, where each $v_j$, for $j = 1, \ldots, p$, is defined by a nonlinear transformation $\varphi_j$ of $y$, i.e., $v_j = \varphi_j(y)$. To facilitate the numerical implementation of the approximating technique introduced below, each vector $v_j$, for $j = 1, \ldots, p$, is transformed to a vector $z_j \in L^2(\Omega, \mathbb{R}^{q_j})$ by a transformation $Q_j$ so that
$$z_j = Q_j(v_j, Z_{j-1}),$$
where $Z_{j-1} = \{z_0, \ldots, z_{j-1}\}$. The choice of $Q_j$ is considered in Section 3.1.
Further, for $i = 0, 1, \ldots, p$, let $G_i \in \mathbb{R}^{m \times r_i}$ and $H_i \in \mathbb{R}^{r_i \times q_i}$, where $r_i$ is given, $0 < r_i < r$, and
$$r = r_0 + \cdots + r_p.$$
Here, $r$ is a positive integer such that $r \leq \min\{m, n\}$.
It is convenient to set $Q_0 = I$ and $z_0 = v_0 = y$. To approximate $F$ for a given reduction ratio
$$c = r/\min\{m, n\},$$
we consider the operator $T_p: L^2(\Omega, \mathbb{R}^{q_0}) \times \cdots \times L^2(\Omega, \mathbb{R}^{q_p}) \to L^2(\Omega, \mathbb{R}^m)$ represented by
$$T_p(v_0, \ldots, v_p) = G_0 H_0 z_0 + \cdots + G_p H_p z_p,$$
where $G_j: L^2(\Omega, \mathbb{R}^{r_j}) \to L^2(\Omega, \mathbb{R}^m)$ and $H_j: L^2(\Omega, \mathbb{R}^{q_j}) \to L^2(\Omega, \mathbb{R}^{r_j})$, for $j = 0, 1, \ldots, p$, are linear operators (i.e., $G_j$ and $H_j$ are represented by $m \times r_j$ and $r_j \times q_j$ matrices, respectively; recall that we use the same symbol for a matrix and the associated linear operator).
Importantly, the operators $H_0, \ldots, H_p$ imply the dimensionality reduction of vector $y$. This is because $H_i z_i \in L^2(\Omega, \mathbb{R}^{r_i})$, where $0 < r_i < r \leq \min\{m, n\}$, for $i = 0, \ldots, p$.
We call $p$ the degree of $T_p$. It is shown below that $T_p$ approximates an operator of interest $F: L^2(\Omega, \mathbb{R}^n) \to L^2(\Omega, \mathbb{R}^m)$, with the accuracy represented by the theorems in Section 3.3, Section 3.4, and Section 4.4.
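To make the structure of $T_p$ concrete, the following minimal sketch (in Python, with illustrative names that are not part of the paper's formalism) evaluates $T_p(v_0, \ldots, v_p) = \sum_j G_j H_j z_j$ on sampled data, where each random vector is represented by $s$ realizations stored column-wise:

```python
import numpy as np

def apply_Tp(G, H, Z):
    """Evaluate T_p on samples: G[j] is m x r_j, H[j] is r_j x q_j, and
    Z[j] is a q_j x s sample matrix of z_j. Returns an m x s sample
    matrix of the model output sum_j G_j H_j z_j."""
    return sum(Gj @ (Hj @ Zj) for Gj, Hj, Zj in zip(G, H, Z))
```

Note that each term first compresses $z_j$ to dimension $r_j$ (by $H_j$) and then maps the result back to $\mathbb{R}^m$ (by $G_j$).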

2.3. Statement of the Problem

Let $F: L^2(\Omega, \mathbb{R}^n) \to L^2(\Omega, \mathbb{R}^m)$ be a continuous operator. We consider the following problem: Given $x$, $y$ and $r_0, \ldots, r_p$, find matrices $G_0, H_0, \ldots, G_p, H_p$ and vectors $v_1, \ldots, v_p$ that solve
$$\min_{v_1, \ldots, v_p} \ \min_{G_0, H_0, \ldots, G_p, H_p} \left\| F(y) - \sum_{j=0}^p G_j H_j z_j \right\|_\Omega^2$$
subject to
$$G_j \in \mathbb{R}^{m \times r_j} \quad \text{and} \quad H_j \in \mathbb{R}^{r_j \times q_j},$$
and
$$E_{z_i z_j} = O, \quad \text{for } i \neq j,$$
where $i, j = 0, \ldots, p$ and $O$ denotes the zero matrix (and the zero vector).
It will be shown in Section 3 below that the solution of problem (7)–(9) is determined under a special condition imposed on the vectors $v_1, \ldots, v_p$.

2.4. Related Work

2.4.1. Low-Rank Approximations

For $p = 0$ and under the assumption that the matrix $E_{xy}$ is invertible, the particular case of the problem in (7)–(9),
$$\min_{G_0, H_0} \left\| x - G_0 H_0 y \right\|_\Omega^2,$$
has been solved, in particular, in [20,21,63,64,65]. Under the assumption that $E_{xy}$ is singular, it has been solved in [24,66,67,68,69]. Note that this is a quite simplified case of the problem in (7)–(9).
For $p = 1$, $G: L^2(\Omega, \mathbb{R}^{r_0}) \to L^2(\Omega, \mathbb{R}^m)$ and $H_j: L^2(\Omega, \mathbb{R}^n) \to L^2(\Omega, \mathbb{R}^{r_0})$, where $j = 0, 1$, a particular case of the problem in (7)–(9),
$$\min_{G, H_0, H_1} \left\| x - G(H_0 y + H_1 v_1) \right\|_\Omega^2,$$
was solved in [20]. The choice of $v_1$ considered in [20] is not optimal.

2.4.2. Tensor Methods

In [70,71,72], the problem in (10) is generalized and studied in terms of tensors. Together with methods formulated in terms of matrices and mentioned in Section 1 and Section 2.4.1, the tensor methods represent an important research subject both in the theoretical and applied sense. In this paper, the problems in (10) and (11) are generalized in a different way, which was described in Section 2.2 and Section 2.3, and will be justified in Section 3 and Section 4.4 below.

2.4.3. System Identification and Modeling

The problem we consider can also be represented as a black-box problem [73,74] where y and x are an available random input and output, respectively, and F is an unknown system. Then, the approximating operator T p identifies a model of the system. Its particular features are detailed in Section 4.

2.5. Contribution

2.5.1. Challenges of High-Dimensionality

The proposed approach achieves the dimensionality reduction of preimages in the following way. In (6), let us write $T_p(v_0, \ldots, v_p)$ as
$$T_p(v_0, \ldots, v_p) = [G_0, \ldots, G_p] \begin{bmatrix} H_0 z_0 \\ \vdots \\ H_p z_p \end{bmatrix} = G H z,$$
where $G = [G_0, \ldots, G_p]$, $H = \operatorname{diag}[H_0, \ldots, H_p]$ and $z = [z_0^T, \ldots, z_p^T]^T$. Here, $H$ is the block-diagonal matrix with $H_0, \ldots, H_p$ on the main diagonal, and the dimensionality of the vector $u = Hz$ is $r$, which is defined by (4). Therefore, $H$ realizes the dimensionality reduction of vector $y$ with the reduction ratio defined by (5). Unlike the known techniques [20,21,22,23,24,26,41,42] for dimensionality reduction, the proposed approach achieves this by using $p+1$ terms, $H_0, \ldots, H_p$. This allows us to increase the accuracy associated with the optimal determination of the approximating operator $T_p$; a sketch of this block factorization is given below. More details can be found in Section 2.5.3 and Section 4.
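The factorization above can be assembled as in the following illustrative Python fragment (not the paper's code; `scipy.linalg.block_diag` is used here to build $H = \operatorname{diag}[H_0, \ldots, H_p]$):

```python
import numpy as np
from scipy.linalg import block_diag

def reduce_and_reconstruct(G_list, H_list, z_list):
    """Form G = [G_0, ..., G_p], H = diag[H_0, ..., H_p] and apply GHz
    to one realization z = [z_0; ...; z_p]."""
    G = np.hstack(G_list)              # m x r, with r = r_0 + ... + r_p
    H = block_diag(*H_list)            # r x (q_0 + ... + q_p)
    z = np.concatenate(z_list)         # stacked realization of z_0, ..., z_p
    u = H @ z                          # reduced vector of dimension r
    return G @ u                       # reconstruction in R^m
```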

2.5.2. Challenges of Accuracy

It is shown in Theorems 3, 4, and 5 below that, for the same reduction ratio, the accuracy associated with the approximating operator $T_p$ improves if the degree $p$ or the dimensions of the injections $v_1, \ldots, v_p$ increase. For the case of the optimal determination of the injections $v_1, \ldots, v_p$, the associated error is further improved. This is established in Theorems 10 and 11 in Section 4.4.

2.5.3. Novelties and Relation to Existing Concepts

Commonly, an approximating operator is represented by $\sum_{j=0}^p P_j v_j$, where $v_0, \ldots, v_p$ are known basis functions and $P_0, \ldots, P_p$ are scalars or matrices which should be determined from (desirably) an error minimization.
The main conceptual novelty of the proposed approach is that, unlike the known techniques, it targets a constructive optimal determination of all $3p+2$ ingredients of the approximating operator in (6). They are $v_1, \ldots, v_p$ and $H_0, G_0, \ldots, H_p, G_p$. The solution of the associated problem (7)–(9) is provided in Section 3.1, Section 3.2, Section 4.1, and Section 4.2, and is represented by a combination of a new best approximation technique with a special iterative procedure.
A basic idea of the solution is to reduce the original problem (7)–(9) with $3p+2$ unknowns to $p+1$ simpler problems in (25) and (63) so that each of them, for $j = 1, \ldots, p$, contains only three unknowns, $v_j$, $H_j$, and $G_j$. For $j = 0$, there are only two unknowns, $H_0$ and $G_0$. This is achieved by exploiting the operators $Q_0, \ldots, Q_p$ determined in Theorem 1.
The iteration procedure represented in Section 4.1 is based on the idea of the maximum block improvement method [75], which is an efficient technique for solving spherically constrained homogeneous polynomial optimization problems. The associated novelties are in the techniques for the determination of the matrices $H_0, G_0, \ldots, H_p, G_p$ given in Section 3.2 and of the injections $v_1, \ldots, v_p$ represented in Section 4.2, and in the error analysis given in Section 3.3 and Section 4.4. In particular, it is shown in Theorem 8 in Section 4.2 that the desired vector $v_j$ is determined from the solution of a special Fredholm integral equation of the second kind (71). Its solution is called the optimal injection.
Further, unlike the known techniques, the proposed approximating operator $T_p$ has several degrees of freedom to minimize the associated error. They are:
  • the degree $p$ of $T_p$,
  • the matrices $G_0, H_0, \ldots, G_p, H_p$,
  • the optimal injections $v_1, \ldots, v_p$,
  • the values of $r_0, \ldots, r_p$ in (4), and
  • the dimensions $q_1, \ldots, q_p$ of the injections $v_1, \ldots, v_p$.
It is shown in Section 3.3 and Section 4.4 below that both the optimal choice of $G_0, H_0, \ldots, G_p, H_p$ and of the injections $v_1, \ldots, v_p$, and the increase in $r_0, \ldots, r_p$ and $q_1, \ldots, q_p$, lead to a decrease in the error associated with the approximating operator $T_p$. The injections $v_1, \ldots, v_p$ represent a new special feature of the proposed technique.
In terms of multilayer networks, the novelties are as follows. First, the associated network consists of four hidden layers, i.e., more than in the known networks. Second, it exploits the interaction between the hidden layers (see Figure 1, where a particular case of $T_p(v_0, \ldots, v_p)$ for $p = 3$ is illustrated). These particular features imply the improvement in the associated accuracy (see Section 3.3 and Section 4.4). In Figure 1, $\tilde{x}$ denotes the approximation of $x = F(y)$ determined by the proposed technique, i.e., $\tilde{x} = T_p(v_0, \ldots, v_p)$. Details are provided in Section 4.
In terms of system identification, side by side with the black-box problem as in [73,74], the problem in (7)–(9) can be interpreted as a quite wide generalization of the blind system identification problem considered, in particular, in [17,76,77,78]. This is because the vectors $v_1, \ldots, v_p$ are assumed to be unknown. Unlike the existing work on blind system identification, the problem in (7)–(9) is stated in terms of random vectors. The other novelties associated with system identification are similar to those considered above, i.e., the proposed model $T_p(v_0, \ldots, v_p)$ of the system contains $2p+1$ unknowns (the $p+1$ products $P_j = G_j H_j$ and the $p$ injections), the method of their determination is different from the existing techniques, and the associated accuracy is improved by a variation in more degrees of freedom.

3. Preliminary Results

Here, we consider the determination of the vectors $z_1, \ldots, z_p$ (in Definition 1 below, they are called pairwise uncorrelated vectors) and the solution of a particular case of the problem in (7), (8), (9), where the minimization with respect to $v_1, \ldots, v_p$ is not included. These preliminary results will be used in Section 4, where the solution to the original problem represented by (7), (8), (9) is provided.
Definition 1.
Random vectors $z_0, \ldots, z_p$ are called pairwise uncorrelated if the condition in (9) holds for any pair of vectors $z_i$ and $z_j$, for $i \neq j$, where $i, j = 0, \ldots, p$. Two vectors $z_i$ and $z_j$ belonging to the set of pairwise uncorrelated vectors are called uncorrelated.
For $j = 0, \ldots, p$, let $\mathcal{N}(M^{(j)})$ be the null space of a matrix $M^{(j)} \in \mathbb{R}^{q_j \times q_j}$.
Definition 2.
Random vectors $v_0, \ldots, v_p$ are called linearly independent in the generalized sense if, for every collection of matrices $M^{(0)}, \ldots, M^{(p)}$,
$$M^{(0)} v_0(\omega) + \cdots + M^{(p)} v_p(\omega) = O,$$
for almost all $\omega \in \Omega$, implies that $v_j(\omega) \in \mathcal{N}(M^{(j)})$, for each $j = 0, \ldots, p$, and almost all $\omega \in \Omega$.
Definition 3.
A random vector $v_j$, for $j = 0, \ldots, p$, is called a well-defined injection if
$$\Gamma_{z_j} = E_{x z_j} E_{z_j z_j}^\dagger E_{z_j x} \neq O,$$
where $z_j$ is defined by (3). Otherwise, the injection $v_j$ is called ill-defined.
An explanation for introducing Definition 3 is provided by Remark 1 at the end of Section 3.2 below.

3.1. Determination of Pairwise Uncorrelated Vectors

Theorem 1.
Let random vectors $v_0, \ldots, v_p$ be linearly independent in the generalized sense. Then they are transformed to the pairwise uncorrelated vectors $z_0, \ldots, z_p$ by transformations $Q_0, \ldots, Q_p$ as follows: $z_0 = Q_0(v_0) = v_0$ and, for $j = 1, \ldots, p$,
$$z_j = Q_j(v_j, Z_{j-1}) = v_j - \sum_{k=0}^{j-1} E_{v_j z_k} E_{z_k z_k}^\dagger z_k.$$
Proof. 
Suppose that the condition in (9) holds for $z_0, \ldots, z_{i-1}$. Then, for $\ell = 0, \ldots, i-1$,
$$E_{z_i z_\ell} = E\left[\left(v_i - \sum_{l=0}^{i-1} E_{v_i z_l} E_{z_l z_l}^\dagger z_l\right) z_\ell^T\right] = E_{v_i z_\ell} - \sum_{l=0}^{i-1} E_{v_i z_l} E_{z_l z_l}^\dagger E_{z_l z_\ell} = E_{v_i z_\ell} - E_{v_i z_\ell} E_{z_\ell z_\ell}^\dagger E_{z_\ell z_\ell} = O.$$
The latter is true because, by Lemma 1 in [79],
$$E_{v_i z_\ell} E_{z_\ell z_\ell}^\dagger E_{z_\ell z_\ell} = E_{v_i z_\ell}.$$
Thus, by induction, (9) holds for any $i = 0, \ldots, p$. □
In terms of a multilayer network, this procedure is illustrated diagrammatically in Figure 1.
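A minimal numerical sketch of Theorem 1 follows, with covariances estimated from samples as $E_{ab} \approx \frac{1}{s} A B^T$ (the sample-based setting used in Examples 1 and 2 below; the function name is illustrative):

```python
import numpy as np

def decorrelate(V):
    """Transform samples of v_0, ..., v_p (each V[j] is q_j x s) into
    samples of pairwise uncorrelated z_0, ..., z_p via (14)."""
    s = V[0].shape[1]
    Z = [V[0].astype(float).copy()]                    # z_0 = v_0
    for vj in V[1:]:
        zj = vj.astype(float).copy()
        for zk in Z:
            E_vz = vj @ zk.T / s                       # E_{v_j z_k}
            E_zz = zk @ zk.T / s                       # E_{z_k z_k}
            zj -= E_vz @ np.linalg.pinv(E_zz) @ zk     # subtract the correlated part
        Z.append(zj)
    return Z
```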
The device of the solution of the problem in (7)–(9) is based, in particular, on the solution of the problem
$$\min_{G_0, H_0, \ldots, G_p, H_p} \left\| F(y) - \sum_{j=0}^p G_j H_j z_j \right\|_\Omega^2$$
subject to (8) and (9). In Section 3.2 below, the matrices $G_0, H_0, \ldots, G_p, H_p$ that solve this problem are given.

3.2. Determination of Matrices $G_0, H_0, \ldots, G_p, H_p$ That Solve the Problem in (16), (8) and (9)

First, recall the definition of the truncated singular value decomposition (SVD). Let the SVD of a matrix $A \in \mathbb{R}^{m \times n}$ be given by $A = U_A \Sigma_A V_A^T$, where $U_A = [u_1 \ u_2 \ \cdots \ u_m] \in \mathbb{R}^{m \times m}$ and $V_A = [v_1 \ v_2 \ \cdots \ v_n] \in \mathbb{R}^{n \times n}$ are unitary matrices, and $\Sigma_A = \operatorname{diag}(\sigma_1(A), \ldots, \sigma_{\min(m,n)}(A)) \in \mathbb{R}^{m \times n}$ is a generalized diagonal matrix with the singular values $\sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq 0$ on the main diagonal. For $k < m$, $j < n$ and $\ell < \min(m, n)$, we denote
$$U_{A,k} = [u_1 \ u_2 \ \cdots \ u_k], \quad V_{A,j} = [v_1 \ v_2 \ \cdots \ v_j], \quad \Sigma_{A,\ell} = \operatorname{diag}(\sigma_1(A), \ldots, \sigma_\ell(A)),$$
and write
$$\Pi_{A,L} = \sum_{k=1}^{\operatorname{rank}(A)} u_k u_k^T \quad \text{and} \quad \Pi_{A,R} = \sum_{j=1}^{\operatorname{rank}(A)} v_j v_j^T.$$
For $r = 1, \ldots, \operatorname{rank}(A)$,
$$[A]_r = \sum_{i=1}^r \sigma_i(A) u_i v_i^T \in \mathbb{R}^{m \times n}$$
is the truncated SVD of $A$. For $r \geq \operatorname{rank}(A)$, we write $[A]_r = A \ (= A_{\operatorname{rank}(A)})$.
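For reference, the truncated SVD $[A]_r$ can be computed as in the following sketch (standard numerical linear algebra, not specific to this paper):

```python
import numpy as np

def truncated_svd(A, r):
    """Return [A]_r, the best rank-r approximation of A."""
    U, sig, Vt = np.linalg.svd(A, full_matrices=False)
    r = min(r, int(np.sum(sig > 1e-12)))   # cap r at the numerical rank of A
    return (U[:, :r] * sig[:r]) @ Vt[:r, :]
```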
Proposition 1.
For any random vector $x \in L^2(\Omega, \mathbb{R}^m)$,
$$\|x\|_\Omega^2 = \operatorname{tr}\{E_{xx}\}.$$
Proof. 
We have
$$\|x\|_\Omega^2 = \int_\Omega \|x(\omega)\|^2 d\mu(\omega) = \int_\Omega \operatorname{tr}\{x(\omega)[x(\omega)]^T\} d\mu(\omega) = \sum_{j=1}^m \int_\Omega [x^{(j)}(\omega)]^2 d\mu(\omega) = \operatorname{tr}\{E_{xx}\}.$$
Thus, (18) is true. □
Theorem 2.
Let $v_0, \ldots, v_p$ be well-defined injections and let the vectors $z_0, \ldots, z_p$ be pairwise uncorrelated. Then the minimal Frobenius norm solution to the problem in (16) is given, for $j = 0, \ldots, p$, by
$$G_j = U_{\Gamma_{z_j}, r_j} \quad \text{and} \quad H_j = U_{\Gamma_{z_j}, r_j}^T E_{x z_j} E_{z_j z_j}^\dagger.$$
Proof. 
Let $P_j = G_j H_j$, $P = [P_0, \ldots, P_p]$ and $w = [z_0^T, \ldots, z_p^T]^T$. Then, for $x = F(y)$,
$$\left\| F(y) - \sum_{j=0}^p P_j z_j \right\|_\Omega^2 = \operatorname{tr}\left\{ E_{xx} - E_{xw} P^T - P E_{wx} + P E_{ww} P^T \right\},$$
where $E_{xw} = [E_{x z_0}, \ldots, E_{x z_p}]$ and, by Theorem 1, the matrix $E_{ww}$ is block-diagonal, $E_{ww} = \operatorname{diag}[E_{z_0 z_0}, \ldots, E_{z_p z_p}]$. Thus,
$$P E_{ww} P^T = \sum_{j=0}^p P_j E_{z_j z_j} P_j^T \quad \text{and} \quad P E_{wx} = \sum_{j=0}^p P_j E_{z_j x},$$
and then
$$\left\| F(y) - \sum_{j=0}^p P_j z_j \right\|_\Omega^2 = \operatorname{tr}\left\{ E_{xx} - \sum_{j=0}^p E_{x z_j} P_j^T - \sum_{j=0}^p P_j E_{z_j x} + \sum_{j=0}^p P_j E_{z_j z_j} P_j^T \right\}.$$
At the same time,
$$\left\| F(y) - P_j z_j \right\|_\Omega^2 = \operatorname{tr}\{ E_{xx} - E_{x z_j} P_j^T - P_j E_{z_j x} + P_j E_{z_j z_j} P_j^T \}.$$
Therefore, (20) and (21) imply
$$\left\| F(y) - \sum_{j=0}^p P_j z_j \right\|_\Omega^2 = \sum_{j=0}^p \left\| F(y) - P_j z_j \right\|_\Omega^2 - \operatorname{tr}\{ p E_{xx} \}.$$
The RHS in (22) is nonnegative. Indeed, for $R = -\sum_{j=0}^p E_{x z_j} P_j^T - \sum_{j=0}^p P_j E_{z_j x} + \sum_{j=0}^p P_j E_{z_j z_j} P_j^T$, we have $\| F(y) - \sum_{j=0}^p P_j z_j \|_\Omega^2 = \operatorname{tr}\{E_{xx} + R\} \geq 0$, i.e., $\operatorname{tr}\{E_{xx} + (p E_{xx} - p E_{xx}) + R\} \geq 0$. Here, $\operatorname{tr}\{E_{xx} + (p E_{xx} - p E_{xx}) + R\} = \sum_{j=0}^p \| F(y) - P_j z_j \|_\Omega^2 - \operatorname{tr}\{ p E_{xx} \}$. Further, the case $F(y) = P_j z_j$ is not possible since the matrix $P_j$ is singular. Further,
$$\left\| F(y) - P_j z_j \right\|_\Omega^2 = \| E_{xx}^{1/2} \|^2 - \| E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger \|^2 + \| (P_j - E_{x z_j} E_{z_j z_j}^\dagger) E_{z_j z_j}^{1/2} \|^2 = \| E_{xx}^{1/2} \|^2 - \| E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger \|^2 + \| P_j E_{z_j z_j}^{1/2} - E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger \|^2,$$
because $E_{z_j z_j}^\dagger E_{z_j z_j}^{1/2} = (E_{z_j z_j}^{1/2})^\dagger$ and
$$E_{x z_j} E_{z_j z_j}^\dagger E_{z_j z_j} = E_{x z_j}$$
(see [80]).
Let us now denote by $\mathcal{R}_{r_j}^{m \times n}$ the set of all $m \times n$ matrices of rank at most $r_j$. In the RHS of (23), only the last term depends on $P_j$. Therefore, on the basis of [24,68,81,82,83,84], the minimal Frobenius norm solution to the problem
$$\min_{P_j \in \mathcal{R}_{r_j}^{m \times n}} \left\| F(y) - P_j z_j \right\|_\Omega^2,$$
for $j = 0, \ldots, p$, is given by
$$P_j = G_j H_j = U_{\Gamma_{z_j}, r_j} U_{\Gamma_{z_j}, r_j}^T E_{x z_j} E_{z_j z_j}^\dagger.$$
Then, (19) follows from (26). □
Remark 1.
Definition 3 of the well-defined injections is motivated by the following observation. It follows from (19) that if, for all $j = 0, \ldots, p$, the vector $v_j$ is such that $\Gamma_{z_j} = O$, then $G_j = O$ and $H_j = O$. In other words, the approximating operator then degenerates to $T_p = O$.
Therefore, in Theorem 2 above and in the theorems below, the vectors $v_0, \ldots, v_p$ are assumed to be well-defined.
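Under the sample-covariance setting used in Example 1 below, the pair $(G_j, H_j)$ in (19) can be computed as in the following sketch (illustrative names; the identity $E_{z_j x} = E_{x z_j}^T$ is used):

```python
import numpy as np

def best_pair(X, Zj, rj):
    """Compute (G_j, H_j) of Theorem 2 from samples X (m x s) of x and
    Zj (q_j x s) of z_j, for a given rank r_j."""
    s = X.shape[1]
    E_xz = X @ Zj.T / s
    E_zz_pinv = np.linalg.pinv(Zj @ Zj.T / s)
    Gamma = E_xz @ E_zz_pinv @ E_xz.T     # Gamma_{z_j} = E_{xz_j} E_{z_jz_j}^† E_{z_jx}
    U, _, _ = np.linalg.svd(Gamma)
    Gj = U[:, :rj]                        # U_{Gamma_{z_j}, r_j}
    Hj = Gj.T @ E_xz @ E_zz_pinv
    return Gj, Hj
```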

3.3. Error Analysis Associated with the Solution of Problem in (16), (8) and (9)

In Theorem 3 of this section, we obtain a constructive representation of the error associated with the solution of the problem in (16), (8), and (9). In Theorem 4, we show that the error can be improved by an increase in the dimensions of the injections $v_1, \ldots, v_p$. Further, Theorems 3, 4, and 5 establish that the error is also diminished by an increase in the degree of the approximating operator $T_p$.
The error associated with $T_p(v_0, \ldots, v_p) = \sum_{j=0}^p G_j H_j z_j$ is denoted by
$$\varepsilon_{GH}^{(p)} = \min_{G_0, H_0, \ldots, G_p, H_p} \left\| F(y) - \sum_{j=0}^p G_j H_j z_j \right\|_\Omega^2.$$
Let us denote the Frobenius norm by $\|\cdot\|$.
Theorem 3.
For $j = 0, \ldots, p$, let $A_j = E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger$, $\operatorname{rank}(A_j) = s_j$ and $s_j \geq r_j + 1$. For $k = 1, \ldots, s_j$, let $\sigma_k(A_j)$ be a singular value of $A_j$. Let $G_0, H_0, \ldots, G_p, H_p$ be determined by (19). Then
$$\varepsilon_{GH}^{(p)} = \operatorname{tr}\{E_{xx}\} - \sum_{j=0}^p \sum_{k=1}^{r_j} \sigma_k^2(A_j).$$
In particular, the error decreases as $p$ increases.
Proof. 
In the notation introduced in (17), the matrix $G_j H_j$ in (19) is represented as $G_j H_j = [E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger]_{r_j} (E_{z_j z_j}^{1/2})^\dagger$. Therefore, in (23),
$$\| G_j H_j E_{z_j z_j}^{1/2} - E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger \|^2 = \| [E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger]_{r_j} (E_{z_j z_j}^{1/2})^\dagger E_{z_j z_j}^{1/2} - E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger \|^2 = \| [A_j]_{r_j} - A_j \|^2 = \sum_{k=r_j+1}^{s_j} \sigma_k^2(A_j),$$
because $[E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger]_{r_j} (E_{z_j z_j}^{1/2})^\dagger E_{z_j z_j}^{1/2} = [E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger]_{r_j}$. Further, since
$$\| E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger \|^2 = \| A_j \|^2 = \sum_{k=1}^{s_j} \sigma_k^2(A_j),$$
then (22), (23), (28), and (29) imply (27). □
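The error formula (27) can be checked numerically; the sketch below forms $A_j = E_{x z_j}(E_{z_j z_j}^{1/2})^\dagger$ via an eigendecomposition of the symmetric matrix $E_{z_j z_j}$ (an illustrative fragment under the same sample-based assumptions as above):

```python
import numpy as np

def error_GH(E_xx, E_xz_list, E_zz_list, ranks, tol=1e-12):
    """Evaluate (27): tr{E_xx} - sum_j sum_{k<=r_j} sigma_k^2(A_j)."""
    err = np.trace(E_xx)
    for E_xz, E_zz, r in zip(E_xz_list, E_zz_list, ranks):
        w, Q = np.linalg.eigh(E_zz)                    # E_zz is symmetric PSD
        inv_sqrt = np.where(w > tol, 1.0 / np.sqrt(np.clip(w, tol, None)), 0.0)
        A = E_xz @ (Q * inv_sqrt) @ Q.T                # A_j = E_xz (E_zz^{1/2})^†
        sig = np.linalg.svd(A, compute_uv=False)
        err -= np.sum(sig[:r] ** 2)
    return err
```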
Let us write
$$A_j = \{a_{ki}^{(j)}\}_{k,i=1}^{m, q_j} \quad \text{and} \quad A_j - [A_j]_{r_j} = \{b_{ki}^{(j)}\}_{k,i=1}^{m, q_j},$$
where $a_{ki}^{(j)}$ and $b_{ki}^{(j)}$ are the entries of the matrices $A_j$ and $A_j - [A_j]_{r_j}$, respectively. Let us also denote
$$\gamma_k^{(j)} = \sum_{i=1}^m \left( [a_{ki}^{(j)}]^2 - [b_{ki}^{(j)}]^2 \right), \quad \gamma^{(j)} = \max\{\gamma_1^{(j)}, \ldots, \gamma_{q_j}^{(j)}\},$$
$$\gamma = \max_{j=1,\ldots,p} \gamma^{(j)}, \quad \alpha_0 = \operatorname{tr}\left\{ E_{xx} - \sum_{j=0}^p A_j A_j^T \right\} \quad \text{and} \quad q = q_1 + \cdots + q_p.$$
In the following theorem, we show that the injections $v_1, \ldots, v_p$ are useful in the sense that, as their dimensions increase, the error associated with the solution to the problem in (16), (8), and (9) diminishes.
Theorem 4.
Let $v_1, \ldots, v_p$ be well-defined injections and let the matrices $G_0, H_0, \ldots, G_p, H_p$ be defined by Theorem 2. Then the associated error decreases as the sum $q$ of the dimensions of the injections $v_1, \ldots, v_p$ increases. In particular, there is $\beta \in (0, \gamma]$ such that, given $\alpha \geq \alpha_0$, then
$$\alpha_0 \leq \varepsilon_{GH}^{(p)} \leq \alpha$$
if
$$q \geq \frac{\operatorname{tr}\{E_{xx}\} - \sum_{k=1}^{r_0} \sigma_k^2(A_0) - \alpha}{\beta}.$$
Proof. 
It follows from (22), (27), (23), and (28) that
$$\varepsilon_{GH}^{(p)} = \| F(y) - G_0 H_0 z_0 \|_\Omega^2 + \sum_{j=1}^p \| F(y) - G_j H_j z_j \|_\Omega^2 - \operatorname{tr}\{ p E_{xx} \} = \operatorname{tr}\{E_{xx}\} - \sum_{k=1}^{r_0} \sigma_k^2(A_0) + \sum_{j=1}^p \| F(y) - G_j H_j z_j \|_\Omega^2 - \operatorname{tr}\{ p E_{xx} \},$$
where
$$\| F(y) - G_j H_j z_j \|_\Omega^2 = \operatorname{tr}\{E_{xx}\} - \left( \|A_j\|^2 - \| [A_j]_{r_j} - A_j \|^2 \right) = \operatorname{tr}\{E_{xx}\} - \sum_{k=1}^{q_j} \sum_{i=1}^m \left( [a_{ki}^{(j)}]^2 - [b_{ki}^{(j)}]^2 \right).$$
Therefore,
$$\varepsilon_{GH}^{(p)} = \operatorname{tr}\{E_{xx}\} - \sum_{k=1}^{r_0} \sigma_k^2(A_0) - \sum_{j=1}^p \sum_{k=1}^{q_j} \sum_{i=1}^m \left( [a_{ki}^{(j)}]^2 - [b_{ki}^{(j)}]^2 \right).$$
Here, $\sum_{k=1}^{q_j} \sum_{i=1}^m ( [a_{ki}^{(j)}]^2 - [b_{ki}^{(j)}]^2 ) > 0$, since by (28),
$$\|A_j\|^2 - \| [A_j]_{r_j} - A_j \|^2 = \sum_{k=1}^{r_j} \sigma_k^2(A_j) > 0.$$
Thus, (32)–(34) imply that $\varepsilon_{GH}^{(p)}$ decreases as $q_j$ increases, for $j = 1, \ldots, p$, and as $p$ increases.
Further, (27) implies
$$\varepsilon_{GH}^{(p)} \geq \operatorname{tr}\{E_{xx}\} - \sum_{j=0}^p \sum_{k=1}^{s_j} \sigma_k^2(A_j) = \operatorname{tr}\left\{ E_{xx} - \sum_{j=0}^p A_j A_j^T \right\} = \alpha_0.$$
Since
$$0 < \sum_{j=1}^p \sum_{k=1}^{q_j} \sum_{i=1}^m \left( [a_{ki}^{(j)}]^2 - [b_{ki}^{(j)}]^2 \right) \leq \gamma \sum_{j=1}^p q_j = \gamma q,$$
there is $\beta \in (0, \gamma]$ such that
$$\sum_{j=1}^p \sum_{k=1}^{q_j} \sum_{i=1}^m \left( [a_{ki}^{(j)}]^2 - [b_{ki}^{(j)}]^2 \right) = q \beta.$$
Therefore, (33), (35), and (36) imply
$$\alpha_0 \leq \varepsilon_{GH}^{(p)} = \operatorname{tr}\{E_{xx}\} - \sum_{k=1}^{r_0} \sigma_k^2(A_0) - q \beta.$$
Thus, if $\varepsilon_{GH}^{(p)} \leq \alpha$, then (31) is true. Conversely, if the latter is true, then $\varepsilon_{GH}^{(p)} \leq \alpha$. □
Remark 2.
An empirical explanation of Theorem 4 is that the increase in $q$ implies an increase in the dimensions of the matrices $H_1, \ldots, H_p$ in (6) and (19). Hence, it implies an increase in the number of parameters to optimize. As a result, for a fixed parameter $r$ given by (4), the accuracy associated with the approximating operator $T_p$ improves. Further, it follows from (31) that, as $q$ increases, $\varepsilon_{GH}^{(p)}$ tends to $\alpha_0$, which is the error associated with the full rank approximating operator $S_h$ (see (40) and (46) below).
Remark 3.
By Theorem 3, the error associated with the solution of problem (16) decreases as the degree $p$ of the approximating operator $T_p$ increases. At the same time, an increase in the degree $p$ of the approximating operator $T_p$ may involve an increase in the parameter $r$ (see (4)). However, by a condition of the applied problem at hand, $r$ must be fixed. In the following Theorem 5, under the condition of fixed $r$, the decrease in the error as the degree $p$ of the approximating operator increases is detailed.
Theorem 5.
Let $r$ and $r_j$, for $j = 0, \ldots, p$, be given. Let $g$ be a nonnegative integer such that $g < p$, and let $\ell_g = r_g + r_{g+1} + \cdots + r_p$. If
$$\sum_{k=r_g+1}^{r_g + r_{g+1} + \cdots + r_p} \sigma_k^2(A_g) < \sum_{j=g+1}^p \sum_{k=1}^{r_j} \sigma_k^2(A_j),$$
where $\sum_{j=g+1}^p \sum_{k=1}^{r_j} \sigma_k^2(A_j) = \sum_{j=g+1}^p \sum_{k=1}^{q_j} \sum_{i=1}^m ( [a_{ki}^{(j)}]^2 - [b_{ki}^{(j)}]^2 )$, then
$$\varepsilon_{GH}^{(p)} < \varepsilon_{GH}^{(g)},$$
i.e., for the same $r$, the error associated with the approximating operator of higher degree $p$ is less than the error associated with the approximating operator of lower degree $g$.
Proof. 
We write $r = r_0 + \cdots + r_{g-1} + \ell_g$. Then,
$$\varepsilon_{GH}^{(g)} = \operatorname{tr}\{E_{xx}\} - \sum_{j=0}^{g-1} \sum_{k=1}^{r_j} \sigma_k^2(A_j) - \sum_{k=1}^{\ell_g} \sigma_k^2(A_g) = \operatorname{tr}\{E_{xx}\} - \sum_{j=0}^{g-1} \sum_{k=1}^{r_j} \sigma_k^2(A_j) - \sum_{k=1}^{r_g} \sigma_k^2(A_g) - \sum_{k=r_g+1}^{r_g + r_{g+1} + \cdots + r_p} \sigma_k^2(A_g).$$
Thus, (27) and (39) imply (37) and (38). □
Remark 4.
The RHS in (37) increases as the dimension $q_j$ of at least a single injection $v_j$, for $j = g+1, \ldots, p$, increases, while the LHS does not depend on $q_j$. In other words, one can always find $q_j$, for $j = g+1, \ldots, p$, such that the inequality in (37) is true.
Example 1.
Here, we wish to numerically illustrate Theorems 3, 4, and 5. To this end, we assume that $x \in L^2(\Omega, \mathbb{R}^m)$ and $y \in L^2(\Omega, \mathbb{R}^m)$ are uniformly and normally distributed random vectors, respectively. The injections $v_1 \in L^2(\Omega, \mathbb{R}^{q_1})$ and $v_2 \in L^2(\Omega, \mathbb{R}^{q_2})$ are here chosen as uniformly distributed random vectors. The covariance matrices $E_{x v_j}$ and $E_{v_i v_j}$, for $i, j = 0, 1, 2$, are represented by $E_{x v_j} = \frac{1}{s} X V_j^T$ and $E_{v_i v_j} = \frac{1}{s} V_i V_j^T$, where $X \in \mathbb{R}^{m \times s}$ and $V_j \in \mathbb{R}^{q_j \times s}$ are sample matrices of $x$ and $v_j$, respectively, for $j = 0, 1, 2$.
We choose $m = 100$ and $r = 50$. It follows from Theorems 3, 4, and 5 that the error associated with the approximating operator $T_p(v_0, \ldots, v_p)$ varies if the values of $p$, $q = q_0 + \cdots + q_p$ and $r = r_0 + \cdots + r_p$ vary. We wish to illustrate the decrease in the error when, for the same $r$, the values of $p$ and $q$ increase. To this end, we provide Table 1 and Table 2, where the values of the errors are given for different values of $q_j$, $r_j$, for $j = 0, 1, 2$ and $p = 0, 1, 2$. In Table 1 and Table 2, the abbreviation MSE (mean square error) is used to unite the notations $\varepsilon_{GH}^{(p)}$ and $\varepsilon_{GH}^{(g)}$ used in Theorems 3, 4, and 5. In the tables, Cases 1 and 2 for specific values of $q_j$, $r_j$ are considered.
In Figure 2, the MSE values are illustrated diagrammatically.
It follows from Table 1 and Table 2 and Figure 2 that, for the same $r$, the error associated with the proposed system model decreases if the degree $p$ or the sum $q$ of the injection dimensions increases. This is because the increase in the number of parameters to optimize in the operator $T_p$ results in a decrease in the error, as stated in Theorem 5.
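For concreteness, an experiment of this kind could be assembled as in the sketch below, reusing the illustrative helpers `decorrelate` and `best_pair` given earlier; the particular dimensions $q_j$, $r_j$ here are placeholders, not the Cases 1 and 2 of Tables 1 and 2:

```python
import numpy as np

rng = np.random.default_rng(0)
m, s = 100, 2000
X = rng.uniform(-1.0, 1.0, (m, s))          # samples of x (uniform)
V = [rng.normal(size=(m, s)),               # v_0 = y (normal)
     rng.uniform(-1.0, 1.0, (60, s)),       # injection v_1 (uniform), q_1 = 60
     rng.uniform(-1.0, 1.0, (60, s))]       # injection v_2 (uniform), q_2 = 60

Z = decorrelate(V)                          # Theorem 1
ranks = [30, 10, 10]                        # r_0 + r_1 + r_2 = r = 50
X_hat = np.zeros_like(X)
for Zj, rj in zip(Z, ranks):
    Gj, Hj = best_pair(X, Zj, rj)           # Theorem 2
    X_hat += Gj @ (Hj @ Zj)
mse = np.sum((X - X_hat) ** 2) / s          # sample estimate of the MSE
print(f"MSE for p = 2: {mse:.4f}")
```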

3.4. Particular Case: No Reduction of Vector Dimensionality

An important particular case of the problem in (16) and (9) is when the matrix $G_j H_j$ is replaced with a full rank matrix $M_j \in \mathbb{R}^{m \times q_j}$, for $j = 0, \ldots, h$. The operator $S_h$ given by
$$S_h(v_0, \ldots, v_h) = \sum_{k=0}^h M_k z_k$$
is called the full rank approximating operator. The problem then is to find matrices $M_0, \ldots, M_h$ that solve
$$\min_{M_0, \ldots, M_h} \left\| F(y) - \sum_{k=0}^h M_k z_k \right\|_\Omega^2,$$
subject to condition (9).
In particular, the searched matrices $M_0, \ldots, M_h$ could be found, for $\omega \in \Omega$, from the equations
$$M_j z_j(\omega) = [F(y)](\omega), \quad \text{for } j = 0, \ldots, h,$$
and
$$\sum_{k=0}^h M_k z_k(\omega) = [F(y)](\omega),$$
respectively. Those equations may have an infinite number of solutions or no solutions at all. Therefore, instead, the following theorem provides the solution to the problem in (41), where the cases (42) and (43) are excluded.
Theorem 6.
Let $v_0, \ldots, v_p$ be well-defined injections and let the vectors $z_0, \ldots, z_p$ be pairwise uncorrelated. Let $F(y) \neq M_j z_j$, for $j = 0, \ldots, h$, and $F(y) \neq \sum_{k=0}^h M_k z_k$. Then, the minimal Frobenius norm solution to the problem (41) is given, for $k = 0, \ldots, h$, by
$$M_k = E_{x z_k} E_{z_k z_k}^\dagger.$$
Proof. 
Similarly to (22),
$$\left\| F(y) - \sum_{j=0}^h M_j z_j \right\|_\Omega^2 = \sum_{j=0}^h \left\| x - M_j z_j \right\|_\Omega^2 - \operatorname{tr}\{ h E_{xx} \}.$$
It is known (see, for example, [16,20,21]) that the minimal Frobenius norm solution to the problem
$$\min_{M_j} \left\| x - M_j z_j \right\|_\Omega^2$$
is given by (44). □
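Under sample-estimated covariances, the full rank solution (44) reduces to a single pseudo-inverse per term, as in this illustrative sketch:

```python
import numpy as np

def full_rank_M(X, Zk):
    """M_k = E_{x z_k} E_{z_k z_k}^† of Theorem 6, computed from samples
    X (m x s) of x and Zk (q_k x s) of z_k."""
    s = X.shape[1]
    return (X @ Zk.T / s) @ np.linalg.pinv(Zk @ Zk.T / s)
```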
Theorem 7.
Let $A_j = E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger$. The error associated with the minimal Frobenius norm solution to the problem in (41) is represented by
$$\min_{M_0, \ldots, M_h} \left\| F(y) - \sum_{j=0}^h M_j z_j \right\|_\Omega^2 = \operatorname{tr}\{E_{xx}\} - \sum_{j=0}^h \|A_j\|^2.$$
Proof. 
For $M_j$ determined by (44),
$$\left\| F(y) - M_j z_j \right\|_\Omega^2 = \operatorname{tr}\{E_{xx}\} - \| E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger \|^2 + \| M_j E_{z_j z_j}^{1/2} - E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger \|^2.$$
Here,
$$\| M_j E_{z_j z_j}^{1/2} - E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger \|^2 = \| E_{x z_j} E_{z_j z_j}^\dagger E_{z_j z_j}^{1/2} - E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger \|^2 = \| E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger - E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger \|^2 = 0.$$
That is,
$$\left\| F(y) - M_j z_j \right\|_\Omega^2 = \operatorname{tr}\{E_{xx}\} - \| E_{x z_j} (E_{z_j z_j}^{1/2})^\dagger \|^2.$$
Then, (46) follows from (22) and (48). □

4. Solution of Problem Given by (7), (8), (9)

Now we are in a position to consider the solution to the original problem in (7), (8), (9).

4.1. Device of Solution

In comparison with the problem in (16), (8), (9), a specific difficulty of the original problem in (7), (8), (9) is the determination of $p$ additional unknowns, the injections $v_1, \ldots, v_p$.
The device of the proposed solution is as follows. First, in (13) and (14), arbitrary vectors $v_0, \ldots, v_p$ that are linearly independent in the generalized sense are denoted by $v_0^{(0)}, \ldots, v_p^{(0)}$, and the pairwise uncorrelated random vectors $z_0, \ldots, z_p$ are denoted by $z_0(v_0^{(0)}), \ldots, z_p(v_p^{(0)})$. In (19), the matrices $G_j$ and $H_j$, for $j = 0, \ldots, p$, are denoted by $G_j^{(0)}$ and $H_j^{(0)}$, respectively. The associated error is still represented by (27). We also write
$$\varepsilon^{(0)} = \left\| F(y) - \sum_{k=0}^p G_k^{(0)} H_k^{(0)} z_k(v_k^{(0)}) \right\|_\Omega^2.$$
Then, for $i = 0, 1, \ldots$, the searched injections $v_1^{(i+1)}, \ldots, v_p^{(i+1)}$ and matrices $G_1^{(i+1)}, H_1^{(i+1)}, \ldots, G_p^{(i+1)}, H_p^{(i+1)}$ are determined by the iterative procedure represented below. It will be shown in Theorem 10 below that $v_1^{(i+1)}, \ldots, v_p^{(i+1)}$ and $G_1^{(i+1)}, H_1^{(i+1)}, \ldots, G_p^{(i+1)}, H_p^{(i+1)}$ further minimize the associated error. The $i$-th loop of the iterative procedure consists of the following steps.
The $i$-th iterative loop, for $i = 0, 1, \ldots$.
Step 1. Given $v_0 = y$ and $G_0^{(i)}, H_0^{(i)}, \ldots, G_p^{(i)}, H_p^{(i)}$, find $v_1, \ldots, v_p$ that solve
$$\min_{v_1, \ldots, v_p} \left\| F(y) - \sum_{k=1}^p G_k^{(i)} H_k^{(i)} z_k(v_k) \right\|_\Omega^2,$$
where
$$z_j(v_j) = v_j - \sum_{k=0}^{j-1} E_{v_j z_k(v_k)} E_{z_k(v_k) z_k(v_k)}^\dagger z_k(v_k).$$
The solution is denoted by $v_1^{(i+1)}, \ldots, v_p^{(i+1)}$. We call $v_1^{(i+1)}, \ldots, v_p^{(i+1)}$ the optimal injections.
Step 2. Given $v_1^{(i+1)}, \ldots, v_p^{(i+1)}$, find the pairwise uncorrelated random vectors $z_1(v_1^{(i+1)}), \ldots, z_p(v_p^{(i+1)})$ and denote
$$\varepsilon_v^{(i+1)} = \left\| F(y) - \sum_{k=0}^p G_k^{(i)} H_k^{(i)} z_k(v_k^{(i+1)}) \right\|_\Omega^2,$$
where, for $k = 0$, we set $G_0^{(i)} = G_0^{(0)}$ and $H_0^{(i)} = H_0^{(0)}$, for all $i = 1, 2, \ldots$, and we write
$$z_j(v_j^{(i)}) = v_j^{(i)} - \sum_{k=0}^{j-1} E_{v_j^{(i)} z_k(v_k^{(i)})} E_{z_k(v_k^{(i)}) z_k(v_k^{(i)})}^\dagger z_k(v_k^{(i)}).$$
Step 3. Given $z_1(v_1^{(i)}), \ldots, z_p(v_p^{(i)})$, find $G_1, H_1, \ldots, G_p, H_p$ that solve
$$\min_{G_1, H_1, \ldots, G_p, H_p} \left\| F(y) - \sum_{k=1}^p G_k H_k z_k(v_k^{(i)}) \right\|_\Omega^2.$$
The solution of problem (55) is denoted by $G_1^{(i+1)}, H_1^{(i+1)}, \ldots, G_p^{(i+1)}, H_p^{(i+1)}$. Further, denote
$$\varepsilon_{GH}^{(i+1)} = \left\| F(y) - \sum_{k=0}^p G_k^{(i+1)} H_k^{(i+1)} z_k(v_k^{(i)}) \right\|_\Omega^2,$$
where, as before, for $k = 0$, we set $G_0^{(i+1)} = G_0^{(0)}$ and $H_0^{(i+1)} = H_0^{(0)}$, for all $i = 0, 1, \ldots$.
Step 4. Denote
$$\varepsilon^{(i+1)} = \min\{ \varepsilon_v^{(i+1)}, \varepsilon_{GH}^{(i+1)} \}.$$
Step 5. If, for a given tolerance $\delta$,
$$| \varepsilon^{(i+1)} - \varepsilon^{(i)} | \leq \delta,$$
the iterations are stopped. If not, then Steps 1–4 are repeated to form the next iterative loop.
The above steps of the solution device are detailed in the following subsections; a schematic of the whole loop is sketched below.
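The loop of Steps 1–5 can be organized as in the following schematic (a sketch only: `initialize`, `update_injections`, `update_matrices` and `error_of` are hypothetical placeholders standing for the constructions of Theorems 2, 8, and 9 and the errors (53) and (56); they are not functions defined in this paper):

```python
def iterate(p, delta, max_iter=100):
    """Schematic of the iterative procedure of Section 4.1."""
    state = initialize(p)                      # v^{(0)}, G^{(0)}, H^{(0)} (hypothetical helper)
    err_prev = error_of(state)
    for i in range(max_iter):
        cand_v = update_injections(state)      # Steps 1-2: new injections, old G, H (Theorem 8)
        cand_gh = update_matrices(state)       # Step 3: old injections, new G, H (Theorem 9)
        err_v, err_gh = error_of(cand_v), error_of(cand_gh)
        state = cand_v if err_v <= err_gh else cand_gh
        err = min(err_v, err_gh)               # Step 4
        if abs(err - err_prev) <= delta:       # Step 5: stopping criterion
            return state
        err_prev = err
    return state
```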

4.2. Determination of $v_j^{(i+1)}$ in Step 1

Let us denote
$$z_k^{(i+1)} = z_k(v_k^{(i+1)}), \quad P_j^{(i)} = G_j^{(i)} H_j^{(i)}, \quad B_{jk}^{(i+1)} = (P_j^{(i)})^\dagger E_{x z_k^{(i+1)}},$$
$$\gamma_\ell^{(i+1)} = E_{z_\ell^{(i+1)} z_\ell^{(i+1)}}^\dagger z_\ell^{(i+1)} \quad \text{and} \quad A_{\ell k}^{(i+1)} = E_{z_\ell^{(i+1)} z_\ell^{(i+1)}}^\dagger E_{z_\ell^{(i+1)} z_k^{(i+1)}}.$$
Here, $\gamma_\ell^{(i+1)}(\omega) \in \mathbb{R}^{q_\ell}$, $B_{jk}^{(i+1)} \in \mathbb{R}^{q_j \times q_k}$ and $A_{\ell k}^{(i+1)} \in \mathbb{R}^{q_\ell \times q_k}$.
Theorem 8.
Let
$$B_j^{(i)} = [B_{j0}^{(i)}, \ldots, B_{j,j-1}^{(i)}] \quad and \quad A_j^{(i)} = \{A_{ks}^{(i)}\}_{k,s=0}^{j-1},$$
where $B_j^{(i)} \in \mathbb{R}^{q_j \times q}$, $A_j^{(i)} \in \mathbb{R}^{q \times q}$ and $q = q_0 + \cdots + q_{j-1}$. Then the optimal injection $v_j^{(i+1)}$, for $j = 1, \ldots, p$, that solves the problem in (51) is determined by
$$v_j^{(i+1)}(\omega) = (G_j^{(i)} H_j^{(i)})^\dagger [F(y)](\omega) + \sum_{\ell=0}^{j-1} C_{j\ell}^{(i)} \gamma_\ell^{(i)}(\omega),$$
where the matrices $C_{j0}^{(i)}, \ldots, C_{j,j-1}^{(i)}$ are defined by
$$[C_{j0}^{(i)}, \ldots, C_{j,j-1}^{(i)}] = B_j^{(i)} (I - A_j^{(i)})^\dagger.$$
Here, $C_{j\ell}^{(i)} \in \mathbb{R}^{q_j \times q_\ell}$, for $\ell = 0, \ldots, j-1$.
Proof. 
Based on (22), the problem in (51) is represented as follows:
$$\min_{v_1, \ldots, v_p} \left\| F(y) - \sum_{j=0}^p P_j^{(i)} z_j(v_j) \right\|_\Omega^2 = \left\| F(y) - P_0^{(0)} z_0(v_0) \right\|_\Omega^2 + \sum_{j=1}^p \min_{v_j} \left\| F(y) - P_j^{(i)} z_j(v_j) \right\|_\Omega^2 - \operatorname{tr}\{ p E_{xx} \}.$$
Then
$$\min_{v_j} \left\| F(y) - P_j^{(i)} z_j(v_j) \right\|_\Omega^2 = \min_{v_j} \left\| F(y) - P_j^{(i)} \left( v_j - \sum_{k=0}^{j-1} E_{v_j z_k} E_{z_k z_k}^\dagger z_k(v_k) \right) \right\|_\Omega^2.$$
Here,
$$\left\| F(y) - P_j^{(i)} \left( v_j - \sum_{k=0}^{j-1} E_{v_j z_k} E_{z_k z_k}^\dagger z_k(v_k) \right) \right\|_\Omega^2 = \int_\Omega \left\| [F(y)](\omega) - P_j^{(i)} \left( v_j(\omega) - \sum_{k=0}^{j-1} E_{v_j z_k} E_{z_k z_k}^\dagger z_k(v_k(\omega)) \right) \right\|^2 d\mu(\omega) = \int_\Omega \left\| [F(y)](\omega) - P_j^{(i)} z_j(v_j(\omega)) \right\|^2 d\mu(\omega).$$
For $u = z_j(v_j)$, let us denote $U_v = \{ u \in L^2(\Omega, \mathbb{R}^{q_j}) \mid u = z_j(v_j) \}$. For an arbitrary $u$, let us write $U = \{ u \in L^2(\Omega, \mathbb{R}^{q_j}) \}$. For $j = 1, \ldots, p$, the vector $\tilde{u}(\omega) \in \mathbb{R}^{q_j}$ of the smallest Euclidean norm among all minimizers that solve
$$\min_{u(\omega)} \left\| [F(y)](\omega) - P_j^{(i)} u(\omega) \right\|^2$$
is given by (see [85], p. 257)
$$\tilde{u}(\omega) = (P_j^{(i)})^\dagger [F(y)](\omega).$$
Let $v_j^{(i+1)}$ be such that
$$\min_{v_j(\omega)} \left\| [F(y)](\omega) - P_j^{(i)} z_j(v_j(\omega)) \right\|^2 = \left\| [F(y)](\omega) - P_j^{(i)} z_j(v_j^{(i+1)}(\omega)) \right\|^2.$$
Since $U_v \subseteq U$, then
$$\min_{v_j(\omega)} \left\| [F(y)](\omega) - P_j^{(i)} z_j(v_j(\omega)) \right\|^2 \geq \min_{u(\omega)} \left\| [F(y)](\omega) - P_j^{(i)} u(\omega) \right\|^2,$$
i.e.,
$$\min_{v_j(\omega)} \left\| [F(y)](\omega) - P_j^{(i)} z_j(v_j(\omega)) \right\|^2 \geq \left\| [F(y)](\omega) - P_j^{(i)} \tilde{u}(\omega) \right\|^2 = \left\| [F(y)](\omega) - P_j^{(i)} (P_j^{(i)})^\dagger [F(y)](\omega) \right\|^2.$$
Then (66) implies that, for all $v_j$,
$$\left\| [F(y)](\omega) - P_j^{(i)} z_j(v_j(\omega)) \right\|^2 \geq \left\| [F(y)](\omega) - P_j^{(i)} (P_j^{(i)})^\dagger [F(y)](\omega) \right\|^2.$$
Since this is true for all $v_j$, then
$$\left\| [F(y)](\omega) - P_j^{(i)} (P_j^{(i)})^\dagger [F(y)](\omega) \right\|^2 = \min_{v_j(\omega)} \left\| [F(y)](\omega) - P_j^{(i)} z_j(v_j(\omega)) \right\|^2,$$
i.e.,
$$\left\| [F(y)](\omega) - P_j^{(i)} (P_j^{(i)})^\dagger [F(y)](\omega) \right\|^2 = \left\| [F(y)](\omega) - P_j^{(i)} z_j(v_j^{(i+1)}(\omega)) \right\|^2.$$
Therefore,
$$z_j(v_j^{(i+1)}(\omega)) = (P_j^{(i)})^\dagger [F(y)](\omega).$$
Taking into account (14), Equation (69) is written as
$$v_j^{(i+1)}(\omega) = (P_j^{(i)})^\dagger [F(y)](\omega) + \sum_{k=0}^{j-1} E_{v_j^{(i+1)} z_k^{(i+1)}} E_{z_k^{(i+1)} z_k^{(i+1)}}^\dagger z_k(v_k^{(i+1)}(\omega)),$$
where we denote
$$E_{v_j^{(i+1)} z_k^{(i+1)}} = \int_\Omega v_j^{(i+1)}(\xi) \, z_k[v_k^{(i+1)}(\xi)]^T d\mu(\xi) \quad \text{and} \quad E_{z_k^{(i+1)} z_k^{(i+1)}} = \int_\Omega z_k[v_k^{(i+1)}(\xi)] \, z_k[v_k^{(i+1)}(\xi)]^T d\mu(\xi).$$
Thus,
$$v_j^{(i+1)}(\omega) = (P_j^{(i)})^\dagger [F(y)](\omega) + \sum_{\ell=0}^{j-1} \left( \int_\Omega v_j^{(i+1)}(\xi) [z_\ell(v_\ell^{(i+1)}(\xi))]^T d\mu(\xi) \right) \gamma_\ell^{(i+1)}(\omega).$$
Let us now write Equation (70) as follows:
$$v_j^{(i+1)}(\omega) = \alpha_j^{(i)}(\omega) + \int_\Omega K_j^{(i+1)}(\omega, \xi) \, v_j^{(i+1)}(\xi) \, d\mu(\xi),$$
where
$$\alpha_j^{(i)}(\omega) = (P_j^{(i)})^\dagger [F(y)](\omega) \quad \text{and} \quad K_j^{(i+1)}(\omega, \xi) = \sum_{\ell=0}^{j-1} [z_\ell(v_\ell^{(i+1)}(\xi))]^T \gamma_\ell^{(i+1)}(\omega).$$
Recall that in (72), the matrix $P_j^{(i)} = G_j^{(i)} H_j^{(i)}$ depends on $v_j^{(i)}(\omega)$, not on $v_j^{(i+1)}(\omega)$. Therefore, Equation (71) is a vector version of the Fredholm integral equation of the second kind [86] with respect to $v_j^{(i+1)}(\omega)$. Its solution is provided as follows. Write Equation (71) as
$$v_j^{(i+1)}(\omega) = \alpha_j^{(i)}(\omega) + \sum_{\ell=0}^{j-1} C_{j\ell}^{(i+1)} \gamma_\ell^{(i+1)}(\omega),$$
where
$$C_{j\ell}^{(i+1)} = \int_\Omega v_j^{(i+1)}(\xi) [z_\ell(v_\ell^{(i+1)}(\xi))]^T d\mu(\xi).$$
Let us now multiply both sides of (73) by $[z_k(v_k^{(i+1)}(\omega))]^T$, for $k = 0, \ldots, j-1$, and integrate. This implies
$$\int_\Omega v_j^{(i+1)}(\omega) [z_k(v_k^{(i+1)}(\omega))]^T d\mu(\omega) = \int_\Omega \alpha_j^{(i)}(\omega) [z_k(v_k^{(i+1)}(\omega))]^T d\mu(\omega) + \sum_{\ell=0}^{j-1} C_{j\ell}^{(i+1)} \int_\Omega \gamma_\ell^{(i+1)}(\omega) [z_k(v_k^{(i+1)}(\omega))]^T d\mu(\omega)$$
and
$$C_{jk}^{(i+1)} = B_{jk}^{(i+1)} + \sum_{\ell=0}^{j-1} C_{j\ell}^{(i+1)} A_{\ell k}^{(i+1)},$$
where, for $k = 0, \ldots, j-1$,
$$B_{jk}^{(i+1)} = \int_\Omega \alpha_j^{(i)}(\omega) [z_k(v_k^{(i+1)}(\omega))]^T d\mu(\omega) = (P_j^{(i)})^\dagger \int_\Omega [F(y)](\omega) [z_k(v_k^{(i+1)}(\omega))]^T d\mu(\omega) = (P_j^{(i)})^\dagger E_{x z_k^{(i+1)}} \in \mathbb{R}^{q_j \times q_k}$$
and
$$A_{\ell k}^{(i+1)} = \int_\Omega \gamma_\ell^{(i+1)}(\omega) [z_k(v_k^{(i+1)}(\omega))]^T d\mu(\omega) = E_{z_\ell^{(i+1)} z_\ell^{(i+1)}}^\dagger E_{z_\ell^{(i+1)} z_k^{(i+1)}} \in \mathbb{R}^{q_\ell \times q_k}.$$
Let $C_j^{(i+1)} = [C_{j0}^{(i+1)}, \ldots, C_{j,j-1}^{(i+1)}] \in \mathbb{R}^{q_j \times q}$. Then the set of matrix equations in (74) can be written as the single equation
$$C_j^{(i+1)} = B_j^{(i+1)} + C_j^{(i+1)} A_j^{(i+1)}, \quad \text{or} \quad B_j^{(i+1)} = C_j^{(i+1)} (I - A_j^{(i+1)}).$$
If the matrix $I - A_j^{(i+1)}$ is invertible, then (75) implies
$$C_j^{(i+1)} = B_j^{(i+1)} (I - A_j^{(i+1)})^{-1}.$$
If the matrix $I - A_j^{(i+1)}$ is singular, then instead of Equation (75), we consider the problem
$$\min_{C_j^{(i+1)}} \left\| B_j^{(i+1)} - C_j^{(i+1)} (I - A_j^{(i+1)}) \right\|^2.$$
Its minimal Frobenius norm solution is given by [62]:
$$C_j^{(i+1)} = B_j^{(i+1)} (I - A_j^{(i+1)})^\dagger.$$
Thus, the injection $v_j^{(i+1)}$ follows from (73), (76), and (78). □
In terms of a multilayer network, this procedure is illustrated diagrammatically in Figure 1. To simplify the notation, superscripts ( i ) and ( i + 1 ) are omitted in Figure 1.
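The final step of Theorem 8 is a finite-dimensional linear problem; a sketch of assembling $B_j$, $A_j$ from their blocks and solving $B_j = C_j(I - A_j)$ by (78) is given below (illustrative names; `np.block` stitches the blocks $A_{\ell k}$ together):

```python
import numpy as np

def injection_coefficients(B_blocks, A_blocks):
    """Given the B_{jk} (list over k) and A_{lk} (nested list over l, k),
    return C_j = B_j (I - A_j)^† as in (78)."""
    Bj = np.hstack(B_blocks)            # B_j = [B_{j0}, ..., B_{j,j-1}]
    Aj = np.block(A_blocks)             # A_j = {A_{lk}}, a q x q matrix
    I = np.eye(Aj.shape[0])
    return Bj @ np.linalg.pinv(I - Aj)  # minimal Frobenius norm solution
```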

4.3. Determination of Matrices $G_1^{(i+1)}, H_1^{(i+1)}, \ldots, G_p^{(i+1)}, H_p^{(i+1)}$ in Step 3

In Step 3, the matrices $G_1^{(i+1)}, H_1^{(i+1)}, \ldots, G_p^{(i+1)}, H_p^{(i+1)}$ solve problem (55). At the same time, the problems in (16) and (55) differ in notation only. Therefore, in Step 3, the matrices $G_1^{(i+1)}, H_1^{(i+1)}, \ldots, G_p^{(i+1)}, H_p^{(i+1)}$ are determined, in fact, by Theorem 2, where only the notation should be changed.
Nevertheless, to avoid any confusion, we provide below Theorem 9, where the matrices $G_1^{(i+1)}, H_1^{(i+1)}, \ldots, G_p^{(i+1)}, H_p^{(i+1)}$ are represented. To this end, we denote $\Gamma_{z_j}^{(i)} = E_{x z_j^{(i)}} E_{z_j^{(i)} z_j^{(i)}}^\dagger E_{z_j^{(i)} x}$. To simplify the notation, we set $\Gamma_{z_j} := \Gamma_{z_j}^{(i)}$.
Theorem 9.
Let $v_0, v_1^{(i)}, \ldots, v_p^{(i)}$ be well-defined injections and let the vectors $z_1^{(i)}, \ldots, z_p^{(i)}$ be pairwise uncorrelated. Then, the minimal Frobenius norm solution to the problem in (55) is given, for $j = 1, \ldots, p$, by
$$G_j^{(i+1)} = U_{\Gamma_{z_j}, r_j} \quad and \quad H_j^{(i+1)} = U_{\Gamma_{z_j}, r_j}^T E_{x z_j^{(i)}} E_{z_j^{(i)} z_j^{(i)}}^\dagger.$$
Proof. 
The proof follows from the proof of Theorem 2. □

4.4. Error Analysis of the Solution of Problem in (7), (8), (9)

Theorem 10.
Let $G_0^{(0)}, H_0^{(0)}, G_k^{(i)}, H_k^{(i)}$, and $v_k^{(i)}$, for $k = 1, \ldots, p$ and $i = 0, 1, \ldots$, be determined by Theorems 2, 8, and 9. Let $\varepsilon^{(i)}$ be the associated error defined by (50) and (57). Then, an increase in the number of iterations $i$ of the procedure represented in Section 4.1 implies a decrease in the associated error, i.e.,
$$\varepsilon^{(i+1)} \leq \varepsilon^{(i)}.$$
Proof. 
Let us consider the initial case of the proposed technique, when $i = 0$.
  • The case $i = 0$. For $i = 0$, the $i$-th iteration loop represented in Section 4.1 implies
    $$\varepsilon_v^{(1)} = \min_{v_1, \ldots, v_p} \left\| F(y) - \sum_{k=1}^p G_k^{(0)} H_k^{(0)} z_k(v_k) \right\|_\Omega^2.$$
    Here, for $j = 1, \ldots, p$, $G_j^{(1)} = G_j^{(0)}$ and $H_j^{(1)} = H_j^{(0)}$. Therefore,
    $$\varepsilon_v^{(1)} = \min_{v_1, \ldots, v_p} \left\| F(y) - \sum_{k=1}^p G_k^{(1)} H_k^{(1)} z_k(v_k) \right\|_\Omega^2,$$
    i.e., for any $v_j$ and, in particular, for $v_j = v_j^{(0)}$,
    $$\varepsilon_v^{(1)} \leq \left\| F(y) - \sum_{k=1}^p G_k^{(1)} H_k^{(1)} z_k(v_k^{(0)}) \right\|_\Omega^2 = \varepsilon_{GH}^{(1)}$$
    and
    $$\varepsilon_v^{(1)} \leq \left\| F(y) - \sum_{k=1}^p G_k^{(0)} H_k^{(0)} z_k(v_k^{(0)}) \right\|_\Omega^2 = \varepsilon^{(0)}.$$
    Taking into account (81), we denote
    $$\varepsilon^{(1)} = \varepsilon_v^{(1)}.$$
    Then, by (82),
    $$\varepsilon^{(1)} \leq \varepsilon^{(0)}.$$
    For $i = 1, 2, \ldots$, let us prove inequality (80) by induction. To this end, we first consider the basis step of the induction, which consists of the cases $i = 1$ and $i = 2$.
  • The basis step: Case $i = 1$. If $i = 1$, then the $i$-th iteration loop (see Section 4.1) implies
    $$\varepsilon_{GH}^{(2)} = \min_{G_1, H_1, \ldots, G_p, H_p} \left\| F(y) - \sum_{k=1}^p G_k H_k z_k(v_k^{(1)}) \right\|_\Omega^2 = \left\| F(y) - \sum_{k=1}^p G_k^{(2)} H_k^{(2)} z_k(v_k^{(1)}) \right\|_\Omega^2,$$
    i.e., for all $G_k$ and $H_k$ with $k = 1, \ldots, p$,
    $$\varepsilon_{GH}^{(2)} \leq \left\| F(y) - \sum_{k=1}^p G_k H_k z_k(v_k^{(1)}) \right\|_\Omega^2.$$
    In particular, for $G_k = G_k^{(1)}$ and $H_k = H_k^{(1)}$ with $k = 1, \ldots, p$,
    $$\varepsilon_{GH}^{(2)} \leq \left\| F(y) - \sum_{k=1}^p G_k^{(1)} H_k^{(1)} z_k(v_k^{(1)}) \right\|_\Omega^2.$$
    Further, because for $k = 1, \ldots, p$, $G_k^{(1)} = G_k^{(0)}$ and $H_k^{(1)} = H_k^{(0)}$, then
    $$\min_{v_1, \ldots, v_p} \left\| F(y) - \sum_{k=1}^p G_k^{(1)} H_k^{(1)} z_k(v_k) \right\|_\Omega^2 = \min_{v_1, \ldots, v_p} \left\| F(y) - \sum_{k=1}^p G_k^{(0)} H_k^{(0)} z_k(v_k) \right\|_\Omega^2.$$
    Therefore, for $k = 1, \ldots, p$,
    $$v_k^{(2)} = v_k^{(1)} \quad \text{and} \quad z_k(v_k^{(2)}) = z_k(v_k^{(1)})$$
    (see (51)). Thus, (85) and (87) imply
    $$\varepsilon_{GH}^{(2)} \leq \left\| F(y) - \sum_{k=1}^p G_k^{(1)} H_k^{(1)} z_k(v_k^{(2)}) \right\|_\Omega^2 = \varepsilon_v^{(2)}.$$
    But since $G_k^{(1)} = G_k^{(0)}$, $H_k^{(1)} = H_k^{(0)}$ and $z_k(v_k^{(2)}) = z_k(v_k^{(1)})$, for $k = 1, \ldots, p$, then
    $$\varepsilon_v^{(2)} = \varepsilon_v^{(1)}.$$
    Then, by (83), (88), and (89),
    $$\varepsilon_{GH}^{(2)} \leq \varepsilon_v^{(1)} = \varepsilon^{(1)}.$$
    Taking into account (88), we denote
    $$\varepsilon^{(2)} = \varepsilon_{GH}^{(2)}.$$
    Then, (90) and (91) imply
    $$\varepsilon^{(2)} \leq \varepsilon^{(1)}.$$
  • The basis step: Case $i = 2$. In this case, $\varepsilon_v^{(i+1)}$ in (53) is written as
    $$\varepsilon_v^{(3)} = \min_{v_1, \ldots, v_p} \left\| F(y) - \sum_{k=0}^p G_k^{(2)} H_k^{(2)} z_k(v_k) \right\|_\Omega^2 = \left\| F(y) - \sum_{k=1}^p G_k^{(2)} H_k^{(2)} z_k(v_k^{(3)}) \right\|_\Omega^2.$$
    That is, for any $v_1, \ldots, v_p$,
    $$\varepsilon_v^{(3)} \leq \left\| F(y) - \sum_{k=0}^p G_k^{(2)} H_k^{(2)} z_k(v_k) \right\|_\Omega^2.$$
    At the same time,
    $$\varepsilon_{GH}^{(3)} = \min_{G_1, H_1, \ldots, G_p, H_p} \left\| F(y) - \sum_{k=1}^p G_k H_k z_k(v_k^{(2)}) \right\|_\Omega^2 = \left\| F(y) - \sum_{k=1}^p G_k^{(3)} H_k^{(3)} z_k(v_k^{(2)}) \right\|_\Omega^2.$$
    By (87), $z_k(v_k^{(2)}) = z_k(v_k^{(1)})$; therefore, $G_k^{(3)}, H_k^{(3)}$ solve, in fact,
    $$\min_{G_1, H_1, \ldots, G_p, H_p} \left\| F(y) - \sum_{k=1}^p G_k H_k z_k(v_k^{(1)}) \right\|_\Omega^2,$$
    i.e., $G_k^{(3)} = G_k^{(2)}$, $H_k^{(3)} = H_k^{(2)}$. Therefore, (94) implies
    $$\varepsilon_{GH}^{(3)} = \left\| F(y) - \sum_{k=1}^p G_k^{(2)} H_k^{(2)} z_k(v_k^{(1)}) \right\|_\Omega^2 = \varepsilon_{GH}^{(2)}.$$
    Further, (93) is true for all $v_k$ and, in particular, for $v_k^{(1)}$. Therefore, (93) and (95) imply
    $$\varepsilon_v^{(3)} \leq \varepsilon_{GH}^{(3)} = \varepsilon_{GH}^{(2)}.$$
    Denote $\varepsilon^{(3)} = \varepsilon_v^{(3)}$. By (91), $\varepsilon^{(2)} = \varepsilon_{GH}^{(2)}$. Thus,
    $$\varepsilon^{(3)} \leq \varepsilon^{(2)}.$$
  • The inductive step. For $s = 1, 2, \ldots$, let us suppose that if $z_k(v_k^{(2s)}) = z_k(v_k^{(2s-1)})$, $\varepsilon^{(2s)} = \varepsilon_{GH}^{(2s)}$ and $\varepsilon^{(2s-1)} = \varepsilon_v^{(2s-1)}$, then
    $$\varepsilon^{(2s)} \leq \varepsilon^{(2s-1)}.$$
    Below, we show that then $\varepsilon^{(2s+1)} \leq \varepsilon^{(2s)}$ and $\varepsilon^{(2s+2)} \leq \varepsilon^{(2s+1)}$, i.e., that (80) is true.
    To this end, for the $i$-th iterative loop, let us consider the case $i = 2s$, where $s = 1, 2, \ldots$. We have
    $$\varepsilon_v^{(2s+1)} = \min_{v_1, \ldots, v_p} \left\| F(y) - \sum_{k=1}^p G_k^{(2s)} H_k^{(2s)} z_k(v_k) \right\|_\Omega^2 = \left\| F(y) - \sum_{k=1}^p G_k^{(2s)} H_k^{(2s)} z_k(v_k^{(2s+1)}) \right\|_\Omega^2.$$
    That is, for any $v_1, \ldots, v_p$,
    $$\varepsilon_v^{(2s+1)} \leq \left\| F(y) - \sum_{k=1}^p G_k^{(2s)} H_k^{(2s)} z_k(v_k) \right\|_\Omega^2.$$
    At the same time,
    $$\varepsilon_{GH}^{(2s+1)} = \min_{G_1, H_1, \ldots, G_p, H_p} \left\| F(y) - \sum_{k=1}^p G_k H_k z_k(v_k^{(2s)}) \right\|_\Omega^2 = \left\| F(y) - \sum_{k=1}^p G_k^{(2s+1)} H_k^{(2s+1)} z_k(v_k^{(2s)}) \right\|_\Omega^2.$$
    By the assumption $z_k(v_k^{(2s)}) = z_k(v_k^{(2s-1)})$, the matrices $G_k^{(2s+1)}, H_k^{(2s+1)}$ solve, in fact,
    $$\min_{G_1, H_1, \ldots, G_p, H_p} \left\| F(y) - \sum_{k=1}^p G_k H_k z_k(v_k^{(2s-1)}) \right\|_\Omega^2,$$
    i.e., $G_k^{(2s+1)} = G_k^{(2s)}$, $H_k^{(2s+1)} = H_k^{(2s)}$. Therefore, (100) implies
    $$\varepsilon_{GH}^{(2s+1)} = \left\| F(y) - \sum_{k=1}^p G_k^{(2s)} H_k^{(2s)} z_k(v_k^{(2s-1)}) \right\|_\Omega^2 = \varepsilon_{GH}^{(2s)}.$$
    Further, (99) is true for all $v_k$ and, in particular, for $v_k^{(2s-1)}$. Therefore, (99) and (101) imply
    $$\varepsilon_v^{(2s+1)} \leq \varepsilon_{GH}^{(2s+1)} = \varepsilon_{GH}^{(2s)}.$$
    Denote $\varepsilon^{(2s+1)} = \varepsilon_v^{(2s+1)}$. By the assumption, $\varepsilon^{(2s)} = \varepsilon_{GH}^{(2s)}$; then
    $$\varepsilon^{(2s+1)} \leq \varepsilon^{(2s)}.$$
    Further, for $s = 1, 2, \ldots$, let us now consider the case $i = 2s + 1$. Then
    $$\varepsilon_{GH}^{(2s+2)} = \min_{G_1, H_1, \ldots, G_p, H_p} \left\| F(y) - \sum_{k=1}^p G_k H_k z_k(v_k^{(2s+1)}) \right\|_\Omega^2 = \left\| F(y) - \sum_{k=1}^p G_k^{(2s+2)} H_k^{(2s+2)} z_k(v_k^{(2s+1)}) \right\|_\Omega^2.$$
    Therefore, for any $G_k$ and $H_k$,
    $$\varepsilon_{GH}^{(2s+2)} \leq \left\| F(y) - \sum_{k=1}^p G_k H_k z_k(v_k^{(2s+1)}) \right\|_\Omega^2.$$
    We also need the following. By (55), $G_k^{(2s+1)}$ and $H_k^{(2s+1)}$ solve
    $$\min_{G_1, H_1, \ldots, G_p, H_p} \left\| F(y) - \sum_{k=1}^p G_k H_k z_k(v_k^{(2s)}) \right\|_\Omega^2.$$
    Since, by the assumption, $z_k(v_k^{(2s)}) = z_k(v_k^{(2s-1)})$, then (105) is equivalent to
    $$\min_{G_1, H_1, \ldots, G_p, H_p} \left\| F(y) - \sum_{k=1}^p G_k H_k z_k(v_k^{(2s-1)}) \right\|_\Omega^2.$$
    Thus,
    $$G_k^{(2s+1)} = G_k^{(2s)} \quad \text{and} \quad H_k^{(2s+1)} = H_k^{(2s)}.$$
    For $G_k = G_k^{(2s+1)}$ and $H_k = H_k^{(2s+1)}$, (104) implies
    $$\varepsilon_{GH}^{(2s+2)} \leq \left\| F(y) - \sum_{k=1}^p G_k^{(2s+1)} H_k^{(2s+1)} z_k(v_k^{(2s+1)}) \right\|_\Omega^2,$$
    and then, by (107),
    $$\varepsilon_{GH}^{(2s+2)} \leq \left\| F(y) - \sum_{k=1}^p G_k^{(2s)} H_k^{(2s)} z_k(v_k^{(2s+1)}) \right\|_\Omega^2 = \varepsilon_v^{(2s+1)}.$$
    Because of (109), denote $\varepsilon^{(2s+2)} = \varepsilon_{GH}^{(2s+2)}$. Recall that, according to the above, $\varepsilon^{(2s+1)} = \varepsilon_v^{(2s+1)}$. Therefore, (109) implies
    $$\varepsilon^{(2s+2)} \leq \varepsilon^{(2s+1)}.$$
    Thus, (80) is true. □
Further, we wish to show that the proposed procedure for the solution of the problem in (7), (8), (9) converges to a so-called coordinate-wise minimum, which is defined in Theorem 11 below. To this end, let us denote
$$f(P, v) = \left\| F(y) - \sum_{j=0}^p P_j z_j(v_j) \right\|_\Omega^2,$$
where, for $j = 0, 1, \ldots, p$, as in (20), $P_j = G_j H_j$, $P = [P_0, \ldots, P_p]$, $v = [v_0, \ldots, v_p]^T$ and $v_0 = y$.
For $i = 1, 2, \ldots$, we also write $P_j^{(i)} = G_j^{(i)} H_j^{(i)}$, $P^{(i)} = [P_0^{(0)}, P_1^{(i)}, \ldots, P_p^{(i)}]$ and $v^{(i)} = [v_0^{(i)}, \ldots, v_p^{(i)}]^T$. As before, $v_0^{(i)} = y$.
Let us now define compact sets $K_1$ and $K_2$ such that
$$K_1 = \{ P : 0 \leq f(P, v^{(0)}) \leq f(P^{(0)}, v^{(0)}) \}$$
and
$$K_2 = \{ v : 0 \leq f(P^{(0)}, v) \leq f(P^{(0)}, v^{(1)}) \}.$$
Theorem 11.
Let P K 1 and v K 2 . Let v ( i ) = [ v 0 ( i ) , ,   v p ( i ) ] T and P ( i ) be determined by Theorems 2, 8, and 9, respectively, and let S ( i ) = ( P ( i ) , v ( i ) ) . Then, any cluster point of sequence { S ( i ) } , say S = ( P , v ) is a coordinate-wise minimum point of f ( P , v ) , i.e.,
P arg min P K 1 f ( P , v ) a n d v arg min v K 2 f ( P , v ) .
Proof. 
Since each K j , for j = 1 , 2 , is compact, then there is a subsequence { S ( j t ) } = { ( P ( j t ) , v ( j t ) ) } such that S ( j t ) S when t . Let P ¯ and P ¯ ( j t ) be defined by
P ¯ arg min P K 1 f ( P , v ) and P ¯ ( j t ) arg min P K 1 f ( P , v ( j t ) ) .
Similar to [75], P ¯ and P ¯ ( j t ) are called the best responses to P associated with v and v ( j t ) , respectively. Consider entry P ( j t ) of S ( j t ) . Then, on the basis of the proof of Theorem 3.1 in [75], we have
f ( P ¯ , v ( j t ) ) f ( P ¯ ( j t ) , v ( j t ) ) f ( P ( j t + 1 ) , v ( j t + 1 ) ) f ( P ( j t + 1 ) , v ( j t + 1 ) ) .
By continuity, as $t \to \infty$,
$$f(\bar{P}, v^*) \ge f(P^*, v^*). \tag{114}$$
This implies that (114) must hold as an equality, since the strict inequality is impossible by the definition of the best response $\bar{P}$. Thus, $P^*$ is the best response to $v^*$ or, equivalently, $P^*$ solves the problem $\arg\min_{P \in K_1} f(P, v^*)$.
The proof is similar if we consider the entry $v^{(j_t)}$ of $S^{(j_t)}$. □
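The coordinate-wise minimum property of Theorem 11 can be observed on a toy objective. In the sketch below (an illustration under our own assumptions, not the paper's $f(P, v)$), two scalar blocks are updated by exact best responses; the limit point then satisfies both arg min conditions simultaneously.

```python
# Coordinate-wise minimization for f(a, b) = (a - 1)^2 + (b + 2)^2 + a*b/2,
# a toy stand-in for f(P, v): each "best response" minimizes f exactly in
# one block while the other block is frozen.

def best_a(b):   # argmin over a: solves 2(a - 1) + b/2 = 0
    return 1.0 - b / 4.0

def best_b(a):   # argmin over b: solves 2(b + 2) + a/2 = 0
    return -2.0 - a / 4.0

a, b = 0.0, 0.0                 # the analogue of the initial point (P^(0), v^(0))
for _ in range(50):             # alternating best responses, as in the procedure
    a = best_a(b)
    b = best_b(a)

# At the cluster point, each block is its own best response (Theorem 11):
assert abs(a - best_a(b)) < 1e-12 and abs(b - best_b(a)) < 1e-12
print(a, b)                     # converges to (1.6, -2.4)
```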
Example 2.
In Example 1 above, we illustrated the decrease in the error associated with the approximating operator $T_p(v_0, \ldots, v_p)$ as the parameters $p$ and $q$ increase. In Example 1, the injections were not optimal.
Here, for the case of optimal injections, we numerically illustrate Theorem 10, i.e., that the error $\varepsilon^{(i)}$ decreases as the number of iterations $i$ increases.
For $i = 0, 1, \ldots$, we denote
$$T_p^{(i,\, i+1)}\big(v_0, v_1^{(i+1)}, \ldots, v_p^{(i+1)}\big) = \sum_{k=0}^{p} G_k^{(i)} H_k^{(i)} z_k\big(v_k^{(i+1)}\big)$$
and
$$T_p^{(i+1,\, i)}\big(v_0, v_1^{(i)}, \ldots, v_p^{(i)}\big) = \sum_{k=0}^{p} G_k^{(i+1)} H_k^{(i+1)} z_k\big(v_k^{(i)}\big).$$
According to the procedure described in Section 4, at each $i$-th iteration, $i = 0, 1, \ldots$, the proposed method is represented by either $T_p^{(i,\, i+1)}\big(v_0, v_1^{(i+1)}, \ldots, v_p^{(i+1)}\big)$ or $T_p^{(i+1,\, i)}\big(v_0, v_1^{(i)}, \ldots, v_p^{(i)}\big)$. To simplify the notation, both operators are denoted by $T_p^{(i)}\big(v^{(i)}\big)$.
We assume that $x \in L^2(\Omega, \mathbb{R}^m)$ and $y \in L^2(\Omega, \mathbb{R}^m)$ are uniformly and normally distributed random vectors, respectively. The initial injections $v_1^{(0)} \in L^2(\Omega, \mathbb{R}^{q_1}), \ldots, v_p^{(0)} \in L^2(\Omega, \mathbb{R}^{q_p})$ are chosen here as uniformly distributed random vectors. It is assumed that the covariance matrices $E_{xv_j}$ and $E_{v_iv_j}$, for $i, j = 0, 1, 2$, are given by $E_{xv_j} = \frac{1}{s} X V_j^T$ and $E_{v_iv_j} = \frac{1}{s} V_i V_j^T$, where $X \in \mathbb{R}^{m \times s}$ and $V_j \in \mathbb{R}^{q_j \times s}$ are samples of $x$ and $v_j$, respectively, for $j = 0, 1, 2$.
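As an illustration of this setup, the covariance matrices can be assembled directly from the sample matrices. The sketch below uses the dimensions of Table 3; the sample size $s$ and the variable names are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, s = 400, 1000                       # m as in the example; sample size s assumed
q0, q1, q2 = 400, 400, 400             # injection dimensions as in Table 3

X  = rng.uniform(-1.0, 1.0, (m, s))    # sample of the uniformly distributed x
V0 = rng.standard_normal((q0, s))      # sample of the normally distributed y = v_0
V1 = rng.uniform(-1.0, 1.0, (q1, s))   # initial injections, uniformly distributed
V2 = rng.uniform(-1.0, 1.0, (q2, s))

samples = [V0, V1, V2]
# Sample covariances E_{x v_j} = (1/s) X V_j^T and E_{v_i v_j} = (1/s) V_i V_j^T.
E_xv = [X @ Vj.T / s for Vj in samples]
E_vv = [[Vi @ Vj.T / s for Vj in samples] for Vi in samples]
print(E_xv[0].shape, E_vv[1][2].shape)  # (400, 400) (400, 400)
```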
We choose $m = 400$ and $r_0 = 200$. Table 3 presents the values of the error $\varepsilon^{(i)}$ associated with the operators $T_0^{(i)}\big(v^{(i)}\big)$, $T_1^{(i)}\big(v^{(i)}\big)$, and $T_2^{(i)}\big(v^{(i)}\big)$ for $i = 14$.
Figure 3 shows diagrams of the errors. It follows from Table 3 and Figure 3 that the error decreases as the number of iterations $i$ increases; in particular, the error remains the same after $i = 14$.
As explained in Example 1, the proposed method is more effective because increasing the number of parameters to be optimized in the operator $T_p$ reduces the associated error, as stated in Theorem 10.

Author Contributions

Conceptualization, P.S.-Q. and A.T.; methodology, A.T.; software, P.S.-Q.; validation, A.T.; writing—original draft, A.T.; visualization, P.S.-Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

This work was financially supported by Vicerrectoría de Investigación y Extensión from Instituto Tecnológico de Costa Rica (Research #1440054).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Amat, S.; Busquier, S.; Negra, M. Adaptive Approximation of Nonlinear Operators. Numer. Funct. Anal. Optim. 2004, 25, 397–405.
2. Bruno, V.I. An approximate Weierstrass theorem in topological vector space. J. Approx. Theory 1984, 42, 1–3.
3. Chen, T.; Chen, H. Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems. IEEE Trans. Neural Netw. 1995, 6, 911–917.
4. Dingle, K.; Camargo, C.Q.; Louis, A.A. Input–output maps are strongly biased towards simple outputs. Nat. Commun. 2018, 9, 761.
5. Gallman, P.G.; Narendra, K. Representations of nonlinear systems via the Stone-Weierstrass theorem. Automatica 1976, 12, 619–622.
6. Howlett, P.G.; Torokhti, A.P.; Pearce, C.E.M. A Philosophy for the Modelling of Realistic Non-linear Systems. Proc. Am. Math. Soc. 2003, 131, 353–363.
7. Istrăţescu, V.I. A Weierstrass theorem for real Banach spaces. J. Approx. Theory 1977, 19, 118–122.
8. Prenter, P.M. A Weierstrass Theorem for Real, Separable Hilbert Spaces. J. Approx. Theory 1970, 4, 341–357.
9. Prolla, J.B.; Machado, S. Weierstrass-Stone theorems for set-valued mappings. J. Approx. Theory 1982, 36, 1–15.
10. Rao, N.V. Stone-Weierstrass theorem revisited. Am. Math. Mon. 2005, 112, 726–729.
11. Sandberg, I.W. Notes on uniform approximation of time-varying systems on finite time intervals. IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 1998, 45, 863–864.
12. Sandberg, I.W. Time delay polynomial networks and quality of approximation. IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 2000, 47, 40–49.
13. Sandberg, I.W. R+ Fading Memory and Extensions of Input-Output Maps. IEEE Trans. Circuits Syst. I Fundam. Theory Appl. 2002, 49, 1586–1591.
14. Timofte, V. Stone-Weierstrass theorem revisited. J. Approx. Theory 2005, 136, 45–59.
15. Piotrowski, T.; Cavalcante, R.; Yamada, I. Stochastic MV-PURE Estimator-Robust Reduced-Rank Estimator for Stochastic Linear Model. IEEE Trans. Signal Process. 2009, 57, 1293–1303.
16. Hua, Y.; Nikpour, M.; Stoica, P. Optimal reduced-rank estimation and filtering. IEEE Trans. Signal Process. 2001, 49, 457–469.
17. Zhu, X.L.; Zhang, X.D.; Ding, Z.Z.; Jia, Y. Adaptive nonlinear PCA algorithms for blind source separation without prewhitening. IEEE Trans. Circuits Syst. I Regul. Pap. 2006, 53, 745–753.
18. Wang, Q.; Jing, Y. New rank detection methods for reduced-rank MIMO systems. EURASIP J. Wirel. Commun. Netw. 2015, 2015, 1–16.
19. Bai, L.; Dou, S.; Xiao, Z.; Choi, J. Doubly iterative multiple-input-multiple-output-bit-interleaved coded modulation receiver with joint channel estimation and randomised sampling detection. IET Signal Process. 2016, 10, 335–341.
20. Brillinger, D.R. Time Series: Data Analysis and Theory; SIAM: Philadelphia, PA, USA, 2001.
21. Jolliffe, I. Principal Component Analysis, 2nd ed.; Springer: New York, NY, USA, 2002.
22. Bühlmann, P.; van de Geer, S. Statistics for High-Dimensional Data; Springer: New York, NY, USA, 2011.
23. She, Y.; Li, S.; Wu, D. Robust Orthogonal Complement Principal Component Analysis. J. Am. Stat. Assoc. 2016, 111, 763–771.
24. Torokhti, A.; Friedland, S. Towards theory of generic Principal Component Analysis. J. Multivar. Anal. 2009, 100, 661–669.
25. Saghri, J.A.; Schroeder, S.; Tescher, A.G. Adaptive two-stage Karhunen-Loeve-transform scheme for spectral decorrelation in hyperspectral bandwidth compression. Opt. Eng. 2010, 49, 057001.
26. Kountchev, R.; Kountcheva, R. Hierarchical Adaptive KL-Based Transform: Algorithms and Applications. In Computer Vision in Control Systems-1; Favorskaya, M.N., Jain, L.C., Eds.; Springer: Cham, Switzerland, 2015; Volume 73, pp. 91–136.
27. de Moura, E.P.; de Abreu Melo Junior, F.; Damasceno, F.; Figueiredo, L.; de Andrade, C.; de Almeida, M. Classification of imbalance levels in a scaled wind turbine through detrended fluctuation analysis of vibration signals. Renew. Energy 2016, 96 Pt A, 993–1002.
28. Azimi, R.; Ghayekhloo, M.; Ghofrani, M.; Sajedi, H. A novel clustering algorithm based on data transformation approaches. Expert Syst. Appl. 2017, 76, 59–70.
29. Burge, M.J.; Burger, W. Digital Image Processing: An Algorithmic Introduction Using Java; Springer: London, UK, 2006.
30. Phophalia, A.; Rajwade, A.; Mitra, S.K. Rough set based image denoising for brain MR images. Signal Process. 2014, 103, 24–35.
31. Chen, S.; Billings, S.A.; Grant, P. Non-linear system identification using neural networks. Int. J. Control 1990, 51, 1191–1214.
32. Chen, S.; Billings, S.A. Neural networks for nonlinear dynamic system modelling and identification. Int. J. Control 1992, 56, 319–346.
33. Fomin, V.N.; Ruzhansky, M.V. Abstract Optimal Linear Filtering. SIAM J. Control Optim. 2000, 38, 1334–1352.
34. Temlyakov, V.N. Nonlinear Methods of Approximation. Found. Comput. Math. 2003, 3, 33–107.
35. Torokhti, A.; Howlett, P. Best approximation of identity mapping: The case of variable memory. J. Approx. Theory 2006, 143, 111–123.
36. Liu, W.; Zhang, H.; Yu, K.; Tan, X. Optimal linear filtering for networked systems with communication constraints, fading measurements, and multiplicative noises. Int. J. Adapt. Control Signal Process. 2016.
37. Aminzare, Z.; Sontag, E.D. Synchronization of Diffusively-Connected Nonlinear Systems: Results Based on Contractions with Respect to General Norms. IEEE Trans. Netw. Sci. Eng. 2014, 1, 91–106.
38. Schneider, M.K.; Willsky, A.S. A Krylov Subspace Method for Covariance Approximation and Simulation of Random Processes and Fields. Multidimens. Syst. Signal Process. 2003, 14, 295–318.
39. Chen, S.; Billings, S.A. Representations of non-linear systems: The NARMAX model. Int. J. Control 1989, 49, 1013–1032.
40. Piroddi, L.; Spinelli, W. An identification algorithm for polynomial NARX models based on simulation error minimization. Int. J. Control 2003, 76, 1767–1781.
41. Alter, O.; Golub, G.H. Singular value decomposition of genome-scale mRNA lengths distribution reveals asymmetry in RNA gel electrophoresis band broadening. Proc. Natl. Acad. Sci. USA 2006, 103, 11828–11833.
42. Gianfelici, F.; Turchetti, C.; Crippa, P. A non-probabilistic recognizer of stochastic signals based on KLT. Signal Process. 2009, 89, 422–437.
43. Formaggia, L.; Quarteroni, A.; Veneziani, A. (Eds.) Cardiovascular Mathematics: Modeling and Simulation of the Circulatory System; Springer: Milano, Italy, 2009; Volume 1.
44. Ambrosi, D.; Quarteroni, A.; Rozza, G. (Eds.) Modelling of Physiological Flows; Springer: Milano, Italy, 2011; Volume V.
45. Piotrowski, T.; Yamada, I. Performance of the stochastic MV-PURE estimator in highly noisy settings. J. Frankl. Inst. 2014, 351, 3339–3350.
46. Poor, H.V. An Introduction to Signal Detection and Estimation, 2nd ed.; Springer: New York, NY, USA, 2001.
47. Torokhti, A.; Soto-Quiros, P. Optimal modeling of nonlinear systems: Method of variable injections. Proyecciones 2024, 43, 189–224.
48. Yang, S.; Xing, T.; Ke, C.; Liang, J.; Ke, X. Effect of wavefront distortion on the performance of coherent detection systems: Theoretical analysis and experimental research. Photonics 2023, 10, 493.
49. Stanković, L.; Brajović, M.; Stanković, I.; Lerga, J.; Daković, M. RANSAC-based signal denoising using compressive sensing. Circuits Syst. Signal Process. 2021, 40, 3907–3928.
50. Wang, Y.; Cheng, K.; Zhao, S.; Xu, E. Human ear image recognition method using PCA and Fisherface complementary double feature extraction. J. Artif. Intell. Technol. 2023, 3, 18–24.
51. Al-Saffar, N.F.H.; Al-Saiq, I.R. Symmetric text encryption scheme based Karhunen Loeve transform. J. Discret. Math. Sci. Cryptogr. 2022, 25, 2773–2781.
52. Soto-Quiros, P.; Torokhti, A. Fast random vector transforms in terms of pseudo-inverse within the Wiener filtering paradigm. J. Comput. Appl. Math. 2024, 448, 115927.
53. Howlett, P.; Torokhti, A. Optimal approximation of a large matrix by a sum of projected linear mappings on prescribed subspaces. Electron. J. Linear Algebra 2024, 40, 585–605.
54. Torokhti, A.; Howlett, P. Optimal estimation of distributed highly noisy signals within KLT-Wiener archetype. Digit. Signal Process. 2023, 143, 104225.
55. Howlett, P.; Torokhti, A. An optimal linear filter for estimation of random functions in Hilbert space. ANZIAM J. 2020, 62, 274–301.
56. Ledoit, O.; Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 2004, 88, 365–411.
57. Ledoit, O.; Wolf, M. Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Stat. 2012, 40, 1024–1060.
58. Adamczak, R.; Litvak, A.E.; Pajor, A.; Tomczak-Jaegermann, N. Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles. J. Am. Math. Soc. 2010, 23, 535–561.
59. Vershynin, R. How Close is the Sample Covariance Matrix to the Actual Covariance Matrix? J. Theor. Probab. 2012, 25, 655–686.
60. Won, J.-H.; Lim, J.; Kim, S.-J.; Rajaratnam, B. Condition-number-regularized covariance estimation. J. R. Stat. Soc. Ser. B 2013, 75, 427–450.
61. Yang, R.; Berger, J.O. Estimation of a covariance matrix using the reference prior. Ann. Stat. 1994, 22, 1195–1211.
62. Ben-Israel, A.; Greville, T.N.E. Generalized Inverses: Theory and Applications, 2nd ed.; Springer: New York, NY, USA, 2003.
63. Hotelling, H. Analysis of a Complex of Statistical Variables into Principal Components. J. Educ. Psychol. 1933, 24, 417–441 and 498–520.
64. Karhunen, K. Über lineare Methoden in der Wahrscheinlichkeitsrechnung. Ann. Acad. Sci. Fenn. Ser. A I Math.-Phys. 1947, 37, 1–79.
65. Loève, M. Fonctions aléatoires de second ordre. In Processus Stochastiques et Mouvement Brownien; Lévy, P., Ed.; Hermann: Paris, France, 1948.
66. Scharf, L. The SVD and reduced rank signal processing. Signal Process. 1991, 25, 113–133.
67. Hua, Y.; Liu, W. Generalized Karhunen-Loeve transform. IEEE Signal Process. Lett. 1998, 5, 141–142.
68. Torokhti, A.; Soto-Quiros, P. Generalized Brillinger-Like Transforms. IEEE Signal Process. Lett. 2016, 23, 843–847.
69. Torokhti, A.; Miklavcic, S. Data compression under constraints of causality and variable finite memory. Signal Process. 2010, 90, 2822–2834.
70. Lathauwer, L.D.; Moor, B.D.; Vandewalle, J. On the Best Rank-1 and Rank-(R1, R2, ..., RN) Approximation of Higher-Order Tensors. SIAM J. Matrix Anal. Appl. 2000, 21, 1324–1342.
71. Grasedyck, L.; Kressner, D.; Tobler, C. A literature survey of low-rank tensor approximation techniques. GAMM-Mitteilungen 2013, 36, 53–78.
72. Friedland, S.; Tammali, V. Low-Rank Approximation of Tensors. In Numerical Algebra, Matrix Theory, Differential-Algebraic Equations and Control Theory; Benner, P., Bollhöfer, M., Kressner, D., Mehl, C., Stykel, T., Eds.; Springer: Cham, Switzerland, 2015; pp. 377–411.
73. Billings, S.A. Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains; John Wiley and Sons, Ltd.: Hoboken, NJ, USA, 2013.
74. Schoukens, M.; Tiels, K. Identification of Block-oriented Nonlinear Systems Starting from Linear Approximations: A Survey. Automatica 2017, 85, 272–292.
75. Chen, B.; He, S.; Li, Z.; Zhang, S. Maximum Block Improvement and Polynomial Optimization. SIAM J. Optim. 2012, 22, 87–107.
76. Abed-Meraim, K.; Qiu, W.; Hua, Y. Blind system identification. Proc. IEEE 1997, 85, 1310–1322.
77. Hua, Y. Blind methods of system identification. Circuits Syst. Signal Process. 2002, 21, 91–108.
78. Shi, X. Mathematical Description of Blind Signal Processing. In Blind Signal Processing: Theory and Practice; Springer: Berlin/Heidelberg, Germany, 2011; pp. 27–59.
79. Torokhti, A.; Howlett, P. Optimal Fixed Rank Transform of the Second Degree. IEEE Trans. Circuits Syst. II Analog Digit. Signal Process. 2001, 48, 309–315.
80. Torokhti, A.; Howlett, P. Computational Methods for Modelling of Nonlinear Systems; Elsevier: Amsterdam, The Netherlands, 2007.
81. Friedland, S.; Torokhti, A. Generalized Rank-Constrained Matrix Approximations. SIAM J. Matrix Anal. Appl. 2007, 29, 656–659.
82. Liu, X.; Li, W.; Wang, H. Rank constrained matrix best approximation problem with respect to (skew) Hermitian matrices. J. Comput. Appl. Math. 2017, 319, 77–86.
83. Boutsidis, C.; Woodruff, D.P. Optimal CUR Matrix Decompositions. SIAM J. Comput. 2017, 46, 543–589.
84. Wang, H. Rank constrained matrix best approximation problem. Appl. Math. Lett. 2015, 50, 98–104.
85. Golub, G.; Van Loan, C.F. Matrix Computations; Johns Hopkins University Press: Baltimore, MD, USA, 1996.
86. Kanwal, R. Linear Integral Equations; Birkhäuser: Boston, MA, USA, 1996.
Figure 1. Diagrammatic representation of the proposed technique.
Figure 2. Example 1: Diagrams of the errors associated with $T_0(v_0)$, $T_1(v_0, v_1)$, and $T_2(v_0, v_1, v_2)$.
Figure 3. Example 2: Diagrams of the errors associated with $T_0 := T_0^{(i)}(v^{(i)})$, $T_1 := T_1^{(i)}(v^{(i)})$, and $T_2 := T_2^{(i)}(v^{(i)})$.

Table 1. Numerical characterizations of approximating operators $T_0(v_0)$, $T_1(v_0, v_1)$, and $T_2(v_0, v_1, v_2)$ in Case 1.

| Approximating Operator | $q_0$ | $q_1$ | $q_2$ | $r_0$ | $r_1$ | $r_2$ | MSE |
|---|---|---|---|---|---|---|---|
| $T_0(v_0)$ | 100 | N/A | N/A | 50 | N/A | N/A | 8.30 |
| $T_1(v_0, v_1)$ | 100 | 25 | N/A | 25 | 25 | N/A | 7.93 |
| $T_2(v_0, v_1, v_2)$ | 100 | 25 | 500 | 17 | 17 | 16 | 7.03 |

Table 2. Numerical characterizations of operators $T_0(v_0)$, $T_1(v_0, v_1)$, and $T_2(v_0, v_1, v_2)$ in Case 2.

| Approximating Operator | $q_0$ | $q_1$ | $q_2$ | $r_0$ | $r_1$ | $r_2$ | MSE |
|---|---|---|---|---|---|---|---|
| $T_0(v_0)$ | 100 | N/A | N/A | 50 | N/A | N/A | 8.30 |
| $T_1(v_0, v_1)$ | 100 | 200 | N/A | 25 | 25 | N/A | 7.61 |
| $T_2(v_0, v_1, v_2)$ | 100 | 200 | 500 | 17 | 17 | 16 | 6.28 |

Table 3. Numerical characterizations of approximating operators $T_0^{(i)}(v^{(i)})$, $T_1^{(i)}(v^{(i)})$, and $T_2^{(i)}(v^{(i)})$.

| $T_p^{(i)}(v^{(i)})$ | $q_0$ | $q_1$ | $q_2$ | $r_0$ | $r_1$ | $r_2$ | $i$ | $\varepsilon^{(i)}$ |
|---|---|---|---|---|---|---|---|---|
| $T_0^{(i)}(v^{(i)})$ | 400 | N/A | N/A | 200 | N/A | N/A | N/A | 30.9 |
| $T_1^{(i)}(v^{(i)})$ | 400 | 400 | 400 | 100 | 100 | N/A | 14 | 21.8 |
| $T_2^{(i)}(v^{(i)})$ | 400 | 400 | 400 | 67 | 67 | 66 | 14 | 18.9 |