A Recursive Least-Squares Algorithm for the Identification of Trilinear Forms

Abstract: High-dimensional system identification problems can be efficiently addressed based on tensor decompositions and modelling. In this paper, we design a recursive least-squares (RLS) algorithm tailored for the identification of trilinear forms, namely RLS-TF. In our framework, the trilinear form is related to the decomposition of a third-order tensor (of rank one). The proposed RLS-TF algorithm acts on the individual components of the global impulse response, thus being efficient in terms of both performance and complexity. Simulation results indicate that the proposed solution outperforms not only the conventional RLS algorithm (which handles only the global impulse response), but also the previously developed trilinear counterparts based on the least-mean-squares algorithm.


Introduction
Currently, there is an increasing interest in developing methods and algorithms that exploit tensor decompositions and modelling [1,2]. These techniques become of significant importance in many real-world scenarios, e.g., when dealing with large amounts of data, processing multidimensional signals, or solving high-dimensional system identification problems. Many important applications rely on such tensor-based techniques, which can be successfully used in the fields of big data [3], source separation [4], machine learning [5], multiple-input multiple-output (MIMO) communication systems [6], and beamforming [7].
Tensor decompositions and their related applications are frequently addressed based on multilinear signal processing techniques [8,9]. For example, in the context of system identification scenarios, the problems can be formulated in terms of identifying multilinear forms. As particular cases, we can mention the bilinear and trilinear forms, where the decomposition is performed using two and three components, respectively. Since the Wiener filter and adaptive algorithms represent popular methods to address system identification problems, their applicability was also extended to the multilinear framework. Among the recent related works, we can mention the iterative Wiener filter for bilinear forms [10] and the subsequent adaptive filtering methods [11][12][13], together with their extensions to trilinear forms [14][15][16][17].
In this context, the work in [17] provided a system identification framework based on tensor decomposition, which was suitable for the trilinear approach. This work presents the iterative Wiener filter and least-mean-squares (LMS) adaptive algorithms tailored for trilinear forms. Among those,

Third-Order Tensors
In this section, we provide a brief summary on tensors, outlining the main definitions and operations, while also establishing the notation. A tensor can be defined as a multidimensional array of data [22,23]. For example, a matrix and a vector can be referred to as second- and first-order tensors, respectively, while tensors of order three or higher are called higher-order tensors. In the following, a tensor, a matrix, a vector, and a scalar are denoted by A (calligraphic), A (bold uppercase), a (bold lowercase), and a (italic), respectively. In this work, we focus only on real-valued third-order tensors, i.e., A ∈ R^{L1×L2×L3}, so that the array dimension is L1 × L2 × L3.
The entries of a tensor can be referred to by using multiple indices. In our case, for a third-order tensor, the first and second indices l1 (with l1 = 1, 2, . . . , L1) and l2 (with l2 = 1, 2, . . . , L2) correspond to the row and column, respectively (like in a matrix), while the third index l3 (with l3 = 1, 2, . . . , L3) corresponds to the tube and describes the depth of the tensor. Consequently, these three indices describe the three different modes. In terms of notation, the entries of tensors, matrices, and vectors are denoted by (A)_{l1 l2 l3} = a_{l1 l2 l3}, (A)_{l1 l2} = a_{l1 l2}, and (a)_{l1} = a_{l1}, respectively.
In the case of a matrix, the vectorization operation leads to a vector that contains the concatenated columns of the original matrix. In the case of a third-order tensor, the so-called matricization operation concatenates the "slices" of the tensor and produces a large matrix. Of course, the result depends on which index is traversed first; thus, the matricization can be performed along three different modes [8,9]. For example, fixing the row mode and then varying the columns and the tubes, we obtain

A_[1] = A_{:,1:L2,1:L3} = [ A_{::1} · · · A_{::L3} ],

where A_[1] ∈ R^{L1×L2L3} and the matrices A_{::l3} ∈ R^{L1×L2}, with l3 = 1, 2, . . . , L3, denote the frontal slices. Similarly, we can take the column mode and then vary the rows and the tubes, which results in A_[2] = A_{1:L1,:,1:L3}, with A_[2] ∈ R^{L2×L1L3}. Finally, we can take the tube mode and then vary the rows and the columns, in order to obtain A_[3] = A_{1:L1,1:L2,:}, where A_[3] ∈ R^{L3×L1L2}. The ranks of A_[1], A_[2], and A_[3] represent the mode-1, mode-2, and mode-3 ranks of the tensor, respectively. Furthermore, the vectorization of a tensor (e.g., following mode-1) is

vec(A) = vec(A_[1]) ∈ R^{L1L2L3}.

Nevertheless, there are some fundamental differences between the rank of a matrix A ∈ R^{L1×L2} and the rank of a tensor A ∈ R^{L1×L2×L3}. For example, the rank of A can never be larger than min{L1, L2}, while the rank of A can be greater than min{L1, L2, L3}. A rank-1 tensor (of dimension L1 × L2 × L3) is defined as

B = b1 ∘ b2 ∘ b3,    (1)

where b1, b2, and b3 are vectors of lengths L1, L2, and L3, respectively, ∘ is the vector outer product, and the elements of B are given by (B)_{l1 l2 l3} = b_{1,l1} b_{2,l2} b_{3,l3}, where b_{i,li} are the elements of the vector b_i, with i = 1, 2, 3 and l_i = 1, 2, . . . , L_i. In this case, it can be easily verified that

vec(B) = b3 ⊗ b2 ⊗ b1,    (2)

where ⊗ denotes the Kronecker product [24].
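These identities are easy to verify numerically. The following sketch (using hypothetical dimensions 3 × 4 × 2 and NumPy's column-major ordering, which matches the frontal-slice convention above) builds a rank-1 tensor and checks the mode-1 matricization and the Kronecker identity for vec(B):

```python
import numpy as np

# Hypothetical 3x4x2 rank-1 tensor B = b1 ∘ b2 ∘ b3.
rng = np.random.default_rng(0)
L1, L2, L3 = 3, 4, 2
b1, b2, b3 = rng.standard_normal(L1), rng.standard_normal(L2), rng.standard_normal(L3)

# (B)_{l1 l2 l3} = b1[l1] * b2[l2] * b3[l3]
B = np.einsum('i,j,k->ijk', b1, b2, b3)

# Mode-1 matricization: the frontal slices B[:, :, l3] concatenated side by side.
B_mat1 = B.reshape(L1, L2 * L3, order='F')
assert np.allclose(B_mat1, np.hstack([B[:, :, k] for k in range(L3)]))

# vec(B) = b3 ⊗ b2 ⊗ b1, with column-major (mode-1) vectorization.
vecB = B.reshape(-1, order='F')
assert np.allclose(vecB, np.kron(b3, np.kron(b2, b1)))
```

As expected for a rank-1 tensor, the matricized form `B_mat1` also has matrix rank one.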
In this context, the rank of a tensor A, denoted rank(A), is defined as the minimum number of rank-1 tensors that generate A as their sum. For example, if

A = Σ_{r=1}^{R} a_{1r} ∘ a_{2r} ∘ a_{3r},    (3)

where a_{1r}, a_{2r}, and a_{3r} are vectors of lengths L1, L2, and L3, respectively, then rank(A) = R when R is minimal, and (3) is called the canonical polyadic decomposition (CPD) of A. Another important operation is the multiplication of a tensor with a matrix [1,2], which can also be defined in different ways. For example, the mode-1 product between the tensor A ∈ R^{L1×L2×L3} and the matrix M1 ∈ R^{M1×L1} gives the tensor

U = A ×1 M1,    (4)

whose entries are u_{m1 l2 l3} = Σ_{l1=1}^{L1} a_{l1 l2 l3} m_{m1 l1} (for m1 = 1, 2, . . . , M1), so that U_[1] = M1 A_[1]. Similarly, the mode-2 product between the same tensor A and the matrix M2 ∈ R^{M2×L2} leads to the tensor

U = A ×2 M2,    (5)

with the entries u_{l1 m2 l3} = Σ_{l2=1}^{L2} a_{l1 l2 l3} m_{m2 l2} (for m2 = 1, 2, . . . , M2), so that U_[2] = M2 A_[2]. Finally, the mode-3 product between the tensor A and the matrix M3 ∈ R^{M3×L3} results in the tensor

U = A ×3 M3,    (6)

having the entries u_{l1 l2 m3} = Σ_{l3=1}^{L3} a_{l1 l2 l3} m_{m3 l3} (for m3 = 1, 2, . . . , M3), while U_[3] = M3 A_[3]. Clearly, we can multiply the tensor A with the three previously defined matrices M1, M2, and M3. In this case, we get the tensor

T = A ×1 M1 ×2 M2 ×3 M3,    (7)

where T ∈ R^{M1×M2×M3}. As a consequence, considering b1, b2, and b3 three vectors of lengths L1, L2, and L3, respectively, the multiplication of the tensor A with the transposed vectors gives the scalar

A ×1 b1^T ×2 b2^T ×3 b3^T = Σ_{l1=1}^{L1} Σ_{l2=1}^{L2} Σ_{l3=1}^{L3} a_{l1 l2 l3} b_{1,l1} b_{2,l2} b_{3,l3},    (8)

where ^T is the transpose operator. It is easy to check that (8) is trilinear with respect to b1, b2, and b3. At this point, we can also define the inner product between two tensors A and B of the same dimension (L1 × L2 × L3), which is

⟨A, B⟩ = Σ_{l1=1}^{L1} Σ_{l2=1}^{L2} Σ_{l3=1}^{L3} a_{l1 l2 l3} b_{l1 l2 l3}.    (9)

Therefore, Expression (8) can also be written in a more convenient way, i.e.,

A ×1 b1^T ×2 b2^T ×3 b3^T = ⟨A, B⟩,

where B = b1 ∘ b2 ∘ b3 (cf. (1)).
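The mode-n product can likewise be checked against its matricized form; the sketch below (with arbitrary small dimensions) verifies that the mode-1 product satisfies U_[1] = M1 A_[1] and computes the inner product of two tensors:

```python
import numpy as np

rng = np.random.default_rng(1)
L1, L2, L3, M = 3, 4, 2, 5
A = rng.standard_normal((L1, L2, L3))
M1 = rng.standard_normal((M, L1))

# Mode-1 product: u_{m l2 l3} = sum_{l1} a_{l1 l2 l3} m_{m l1}
U = np.einsum('ml,ljk->mjk', M1, A)

# Check the matricized identity U_[1] = M1 A_[1].
A1 = A.reshape(L1, L2 * L3, order='F')
U1 = U.reshape(M, L2 * L3, order='F')
assert np.allclose(U1, M1 @ A1)

# Inner product of two tensors of the same dimension: sum of element-wise products.
B = rng.standard_normal((L1, L2, L3))
inner = np.sum(A * B)
```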
Moreover, if A is also a rank-1 tensor, i.e., A = a1 ∘ a2 ∘ a3, where the vectors a_i have the lengths L_i (with i = 1, 2, 3), then

⟨A, B⟩ = (a1^T b1)(a2^T b2)(a3^T b3).

Furthermore, it is easy to check that

A ×1 M1 ×2 M2 ×3 M3 = (M1 a1) ∘ (M2 a2) ∘ (M3 a3),

where the matrices M1, M2, and M3 were previously defined (related to (4)-(6)). The short background on tensors provided before and the main related operations (e.g., matricization, vectorization, rank, and different types of product) aim to facilitate the development that follows in Section 3. It is also important to outline that the trilinear forms result in the context of the decomposition of third-order tensors. Extension to higher-order tensors and multilinear forms could be straightforward when dealing with rank-1 tensors. Otherwise, in the general case, decomposing higher rank higher-order tensors (see (3)) raises additional difficulties, as will be briefly pointed out at the end of Section 5.
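The rank-1 inner-product identity can also be confirmed numerically; the following short check (with arbitrary small dimensions) verifies that ⟨A, B⟩ reduces to the product of the three vector inner products when both tensors have rank 1:

```python
import numpy as np

rng = np.random.default_rng(2)
L1, L2, L3 = 3, 4, 2
a1, a2, a3 = rng.standard_normal(L1), rng.standard_normal(L2), rng.standard_normal(L3)
b1, b2, b3 = rng.standard_normal(L1), rng.standard_normal(L2), rng.standard_normal(L3)

A = np.einsum('i,j,k->ijk', a1, a2, a3)   # rank-1 tensor a1 ∘ a2 ∘ a3
B = np.einsum('i,j,k->ijk', b1, b2, b3)   # rank-1 tensor b1 ∘ b2 ∘ b3

# <A, B> = (a1^T b1)(a2^T b2)(a3^T b3)
lhs = np.sum(A * B)
rhs = (a1 @ b1) * (a2 @ b2) * (a3 @ b3)
assert np.isclose(lhs, rhs)
```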

RLS Algorithm for Trilinear Forms
In the following, for the sake of consistency with the development of the NLMS-TF algorithm, we keep the framework and notation from [17]. Therefore, let us consider the output of a multiple-input single-output (MISO) system (with real-valued data) at the discrete-time index n, defined as

y(n) = X(n) ×1 h1^T ×2 h2^T ×3 h3^T,    (12)

where the tensor X(n) ∈ R^{L1×L2×L3} groups the input signals, and the vectors h_i, i = 1, 2, 3 (of lengths L1, L2, and L3, respectively) define the three impulse responses, i.e., h_i = [h_{i,1} h_{i,2} · · · h_{i,L_i}]^T. As we can notice, y(n) represents a trilinear form (see (12) as compared to (8)), because it is a linear function of each of the vectors h_i, i = 1, 2, 3, when the other two are fixed.
Next, we can also introduce a rank-1 tensor of dimension L1 × L2 × L3, using the three impulse responses of the MISO system:

H = h1 ∘ h2 ∘ h3,    (13)

whose elements are (H)_{l1 l2 l3} = h_{1,l1} h_{2,l2} h_{3,l3}. Consequently, the output signal results in

y(n) = ⟨H, X(n)⟩ = Σ_{l3=1}^{L3} [vec(H_{::l3})]^T vec(X_{::l3}(n)),

with H_{::l3} and X_{::l3}(n) (l3 = 1, 2, . . . , L3) being the frontal slices of H and X(n), respectively. At this point, we can introduce the notation

h = vec(H) = h3 ⊗ h2 ⊗ h1,    (17)
x(n) = vec(X(n)),

where h and x(n) are two long vectors, each of them having L1L2L3 elements. Thus, the output signal can also be expressed as

y(n) = h^T x(n).

The main goal is to estimate the output of the MISO system, which is usually corrupted by an additive noise. Hence, the reference signal results in

d(n) = y(n) + w(n),    (20)

where w(n) is a zero-mean additive noise, which is uncorrelated with the input signals. Alternatively, we can estimate the global impulse response h using an adaptive filter ĥ(n) (of length L1L2L3).
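As a quick numerical check of the equivalence between the tensorial and the long-vector forms of the output signal (with hypothetical small lengths), one can verify that contracting X(n) with the three short impulse responses gives the same value as h^T x(n) with h = h3 ⊗ h2 ⊗ h1:

```python
import numpy as np

rng = np.random.default_rng(3)
L1, L2, L3 = 4, 3, 2
h1, h2, h3 = rng.standard_normal(L1), rng.standard_normal(L2), rng.standard_normal(L3)
X = rng.standard_normal((L1, L2, L3))   # stands in for the input tensor X(n)

# Trilinear output: contract X(n) with each short impulse response.
y_tensor = np.einsum('ijk,i,j,k->', X, h1, h2, h3)

# Equivalent long-vector form: y(n) = h^T x(n).
h = np.kron(h3, np.kron(h2, h1))
x = X.reshape(-1, order='F')            # mode-1 vectorization
y_vector = h @ x
assert np.isclose(y_tensor, y_vector)
```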
At this point, we may also define the error signal

e(n) = d(n) − ŷ(n) = d(n) − ĥ^T(n − 1) x(n),    (21)

which represents the difference between the reference signal and the estimated signal, ŷ(n). Based on (17), we can notice that the global impulse response h (of length L1L2L3) results from a combination of the shorter impulse responses h_i, i = 1, 2, 3, of lengths L1, L2, and L3, respectively. Consequently, the estimated impulse response can also be decomposed as

ĥ(n) = ĥ3(n) ⊗ ĥ2(n) ⊗ ĥ1(n),    (22)

where ĥ_i(n), i = 1, 2, 3 are three adaptive filters (with L1, L2, and L3 coefficients, respectively), which aim to model the individual impulse responses h_i, i = 1, 2, 3. Nevertheless, we should notice that the decomposition in (22) has no unique solution. It is obvious that, for any constants η1, η2, and η3, with η1η2η3 = 1, we have

h = η3h3 ⊗ η2h2 ⊗ η1h1.    (23)

Hence, η_i h_i, i = 1, 2, 3 also represent a set of solutions. However, the global impulse response h is identified with no scaling ambiguity.
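The scaling ambiguity (and the fact that it does not affect the global impulse response) can be illustrated with a short check; the η values below are arbitrary, chosen so that η1η2η3 = 1:

```python
import numpy as np

rng = np.random.default_rng(4)
h1, h2, h3 = rng.standard_normal(4), rng.standard_normal(3), rng.standard_normal(2)

# Arbitrary scaling constants with eta1 * eta2 * eta3 = 1.
eta1, eta2 = 2.0, -0.25
eta3 = 1.0 / (eta1 * eta2)

# The global response is invariant to this component-wise rescaling.
h = np.kron(h3, np.kron(h2, h1))
h_scaled = np.kron(eta3 * h3, np.kron(eta2 * h2, eta1 * h1))
assert np.allclose(h, h_scaled)
```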
Following the decomposition from (22), we can easily verify that

ĥ(n) = Ĥ32(n) ĥ1(n) = Ĥ31(n) ĥ2(n) = Ĥ21(n) ĥ3(n),    (24)-(26)

where

Ĥ32(n) = ĥ3(n) ⊗ ĥ2(n) ⊗ I_{L1},    (27)
Ĥ31(n) = ĥ3(n) ⊗ I_{L2} ⊗ ĥ1(n),    (28)
Ĥ21(n) = I_{L3} ⊗ ĥ2(n) ⊗ ĥ1(n),    (29)

and I_{Li}, i = 1, 2, 3 are the identity matrices of sizes L1 × L1, L2 × L2, and L3 × L3, respectively. Moreover, introducing the notation

x32(n) = Ĥ32^T(n − 1) x(n),    (30)
x31(n) = Ĥ31^T(n − 1) x(n),    (31)
x21(n) = Ĥ21^T(n − 1) x(n),    (32)

the error signal from (21) can be expressed in three equivalent ways as

e(n) = d(n) − ĥ1^T(n − 1) x32(n)    (33)
     = d(n) − ĥ2^T(n − 1) x31(n)    (34)
     = d(n) − ĥ3^T(n − 1) x21(n).    (35)

Based on the least-squares error criterion [18] applied in the context of (20) and (21), the conventional RLS algorithm is derived from the minimization of the cost function

J(ĥ(n)) = Σ_{k=1}^{n} λ^{n−k} [d(k) − ĥ^T(n) x(k)]²,    (36)

where 0 < λ ≤ 1 is a positive constant known as the forgetting factor. On the other hand, based on (24)-(26), the cost function from (36) can be expressed in three different ways, targeting the optimization of the individual components, i.e.,

J1(ĥ1(n)) = Σ_{k=1}^{n} λ1^{n−k} [d(k) − ĥ1^T(n) x32(k)]²,    (37)
J2(ĥ2(n)) = Σ_{k=1}^{n} λ2^{n−k} [d(k) − ĥ2^T(n) x31(k)]²,    (38)
J3(ĥ3(n)) = Σ_{k=1}^{n} λ3^{n−k} [d(k) − ĥ3^T(n) x21(k)]²,    (39)

where λ1, λ2, and λ3 are the individual forgetting factors. The previous cost functions suggest a "trilinear" optimization strategy [25], where we assume that two components are fixed during the optimization of the third one. Consequently, based on the minimization of (37)-(39) with respect to ĥ1(n), ĥ2(n), and ĥ3(n), respectively, the following set of normal equations is obtained:

R32(n) ĥ1(n) = p32(n),    (40)
R31(n) ĥ2(n) = p31(n),    (41)
R21(n) ĥ3(n) = p21(n),    (42)

where

R32(n) = Σ_{k=1}^{n} λ1^{n−k} x32(k) x32^T(k),   p32(n) = Σ_{k=1}^{n} λ1^{n−k} x32(k) d(k),

while R31(n), p31(n) and R21(n), p21(n) are defined similarly, using λ2 and λ3, respectively. Solving (40)-(42), the updates of the individual filters result in

ĥ1(n) = ĥ1(n − 1) + k32(n) e(n),
ĥ2(n) = ĥ2(n − 1) + k31(n) e(n),
ĥ3(n) = ĥ3(n − 1) + k21(n) e(n),

where k32(n) = R32^{−1}(n) x32(n), k31(n) = R31^{−1}(n) x31(n), and k21(n) = R21^{−1}(n) x21(n) are the Kalman gain vectors, while the error signal can be computed based on (33). At this point, the main task is to update the inverses of the matrices R32(n), R31(n), and R21(n) efficiently. The solution relies on the matrix inversion lemma [18], which leads to the following updates:

R32^{−1}(n) = λ1^{−1} [R32^{−1}(n − 1) − k32(n) x32^T(n) R32^{−1}(n − 1)],
R31^{−1}(n) = λ2^{−1} [R31^{−1}(n − 1) − k31(n) x31^T(n) R31^{−1}(n − 1)],
R21^{−1}(n) = λ3^{−1} [R21^{−1}(n − 1) − k21(n) x21^T(n) R21^{−1}(n − 1)].

Therefore, the Kalman gain vectors are evaluated as

k32(n) = λ1^{−1} R32^{−1}(n − 1) x32(n) / [1 + λ1^{−1} x32^T(n) R32^{−1}(n − 1) x32(n)],
k31(n) = λ2^{−1} R31^{−1}(n − 1) x31(n) / [1 + λ2^{−1} x31^T(n) R31^{−1}(n − 1) x31(n)],
k21(n) = λ3^{−1} R21^{−1}(n − 1) x21(n) / [1 + λ3^{−1} x21^T(n) R21^{−1}(n − 1) x21(n)].

For initialization, we can choose

ĥ1(0) = [1  0_{L1−1}^T]^T,    (52)
ĥ2(0) = (1/L2) 1_{L2},    (53)
ĥ3(0) = (1/L3) 1_{L3},    (54)

where 0_N and 1_N are two vectors of length N, all elements of which are equal to zero and one, respectively. The proposed RLS algorithm for trilinear forms, namely RLS-TF, is summarized in Table 1. It can also be interpreted as an extension of the RLS-based algorithm tailored for bilinear forms, which was presented in [12].
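A minimal simulation sketch of the RLS-TF recursions (as summarized in Table 1) is given below. This is not the authors' implementation: it uses white Gaussian inputs (instead of the AR(1) signals from Section 4), small hypothetical lengths, a simple diagonal initialization of the inverse correlation matrices, and the rule-of-thumb forgetting factors with K = 20:

```python
import numpy as np

rng = np.random.default_rng(5)
L1, L2, L3 = 4, 3, 2
h1, h2, h3 = rng.standard_normal(L1), rng.standard_normal(L2), rng.standard_normal(L3)
h = np.kron(h3, np.kron(h2, h1))                      # true global response

K = 20
lam = [1 - 1 / (K * L1), 1 - 1 / (K * L2), 1 - 1 / (K * L3)]

# Initialization in the spirit of (52)-(54): impulse-like first filter, averaged others.
he = [np.zeros(L1), np.full(L2, 1.0 / L2), np.full(L3, 1.0 / L3)]
he[0][0] = 1.0
P = [1e3 * np.eye(L1), 1e3 * np.eye(L2), 1e3 * np.eye(L3)]   # inverse correlation matrices

def global_nm(he):
    hhat = np.kron(he[2], np.kron(he[1], he[0]))
    return np.sum((h - hhat) ** 2) / np.sum(h ** 2)

nm_init = global_nm(he)
for n in range(1000):
    X = rng.standard_normal((L1, L2, L3))             # input tensor X(n)
    d = h @ X.reshape(-1, order='F') + 0.01 * rng.standard_normal()

    # Component-wise "input" signals, cf. (30)-(32).
    x32 = np.einsum('ijk,j,k->i', X, he[1], he[2])
    x31 = np.einsum('ijk,i,k->j', X, he[0], he[2])
    x21 = np.einsum('ijk,i,j->k', X, he[0], he[1])
    e = d - he[0] @ x32                               # a priori error, cf. (33)

    for i, xi in enumerate((x32, x31, x21)):
        k = P[i] @ xi / (lam[i] + xi @ P[i] @ xi)     # Kalman gain vector
        P[i] = (P[i] - np.outer(k, xi @ P[i])) / lam[i]   # matrix inversion lemma
        he[i] = he[i] + k * e                         # individual filter update

nm_final = global_nm(he)
assert nm_final < nm_init                             # global misalignment decreased
```

Each component filter is updated with the same a priori error e(n), while its Kalman gain is computed from its own "input" signal, so the per-iteration cost scales with the squares of the individual lengths rather than the square of their product.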
However, if the MISO system identification problem results based on (12), it is natural to use the RLS-TF algorithm, which is designed for the identification of third-order (rank-1) tensors.
In terms of computational complexity, it can be noticed that the proposed RLS-TF algorithm combines the solutions provided by three RLS-based filters, i.e., ĥ1(n) (of length L1), ĥ2(n) (of length L2), and ĥ3(n) (of length L3). Since the complexity of a regular RLS-based algorithm is proportional to the square of the filter length, the overall complexity of the RLS-TF algorithm is roughly of the order of O(L1² + L2² + L3²). On the other hand, the system identification problem can also be handled based on the conventional RLS algorithm, which results following (20) and (21), together with the cost function from (36). However, in this case, there is a single adaptive filter ĥ(n), of length L = L1L2L3, so that the computational complexity is of the order of O(L²). This could be much more computationally expensive as compared to the proposed RLS-TF algorithm. Basically, the RLS-TF algorithm "transforms" a large system identification problem of length L = L1L2L3 into three "smaller" problems of lengths L1, L2, and L3, respectively, with advantages in terms of both performance (as will be shown in simulations) and complexity. As outlined before, the proposed RLS-TF algorithm combines the solutions provided by three adaptive filters of lengths L1, L2, and L3, respectively, while the conventional RLS algorithm deals with a single filter of length L = L1L2L3, which is usually much longer. Since the length of the filter highly influences the main performance criteria, i.e., convergence rate and misadjustment [18], the proposed algorithm is able to outperform the conventional one in terms of both criteria: the shorter the length, the faster the convergence and the lower the misadjustment. This expected behaviour will be supported by the simulation results provided in the next section.
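For the lengths used later in the simulations, the gap between the two quadratic costs can be made concrete (a rough count, ignoring constant factors, since each RLS filter costs on the order of the square of its length):

```python
# Rough per-iteration cost comparison for L1 = 64, L2 = 8, L3 = 4.
L1, L2, L3 = 64, 8, 4
L = L1 * L2 * L3                        # global filter length: 2048
cost_tf = L1**2 + L2**2 + L3**2         # RLS-TF: three short filters
cost_rls = L**2                         # conventional RLS: one long filter
print(cost_tf, cost_rls, cost_rls / cost_tf)
```

With these lengths, the conventional RLS algorithm requires roughly a thousand times more operations per iteration than the three short filters combined (4,194,304 versus 4176).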
Finally, we should observe that there are some extra operations specific to the RLS-TF algorithm. For example, the "input" signals x32(n), x31(n), and x21(n) result based on (30)-(32), which rely on (27)-(29). Furthermore, the global impulse response (if required within the application) can be evaluated based on (22). These operations require Kronecker products, but the related computational complexity is moderate, i.e., of the order of O(L1L2L3) = O(L).
The detailed computational complexities of the proposed RLS-TF algorithm and other benchmark algorithms (i.e., the conventional RLS and NLMS algorithms) are summarized in Table 2. For a better visualization, the computational complexities are also illustrated in Figure 1, in terms of the number of multiplications and additions (per iteration), for different values of L1; the other lengths are fixed to L2 = 8 and L3 = 4 (similar to the experimental setup from Section 4). As we can notice, the computational complexity of the conventional RLS algorithm is significantly greater, while the cost of the proposed RLS-TF algorithm is closer to that of the conventional NLMS algorithm, especially for higher lengths.

Simulation Results
Simulations were performed in the framework of a tensor-based system identification problem, which resulted following the MISO model defined by (12) and (20) and was similar to the setup used in [17]. The input signals that form the third-order tensor X(n) are AR(1) processes; they are generated by filtering white Gaussian noises through a first-order system 1/(1 − 0.9z^{−1}). The additive noise w(n) is white and Gaussian, with the variance set to σ²_w = 0.01. The third-order system used in the simulations and its components (h1, h2, and h3) are depicted in Figure 2. First, the component h1 is an impulse response from the G168 Recommendation [26], of length L1 = 64; it is provided in Figure 2a. Second, in Figure 2b, the component h2 is a random impulse response (with Gaussian distribution) of length L2 = 8. Third, the coefficients of the last component, i.e., the impulse response h3, are depicted in Figure 2c; they were evaluated as h_{3,l3} = 0.5^{l3−1}, l3 = 1, 2, . . . , L3, using the length L3 = 4. Therefore, the global impulse response from Figure 2d resulted as h = h3 ⊗ h2 ⊗ h1, and its length was L = L1L2L3 = 2048. This global impulse response resembles a channel with echoes, e.g., like an acoustic echo path [27]. Finally, the third-order (rank-1) tensor H of dimension L1 × L2 × L3 can be formed according to (13). In order to test the tracking capabilities of the algorithms, an abrupt change of the system was introduced in the middle of each experiment, by changing the sign of the coefficients of each impulse response. As shown in Section 3, the proposed RLS-TF algorithm was designed to estimate the individual components of the global system. However, we can identify h1, h2, and h3 only up to some scaling factors, as explained using (23).
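The setup above can be sketched as follows (the G168 impulse response is not reproduced here, so a random vector of the same length stands in for h1; everything else follows the stated definitions):

```python
import numpy as np

rng = np.random.default_rng(6)
L1, L2, L3 = 64, 8, 4
h1 = rng.standard_normal(L1)             # placeholder for the G168 echo path
h2 = rng.standard_normal(L2)             # random Gaussian component
h3 = 0.5 ** np.arange(L3)                # h_{3,l3} = 0.5^(l3-1)
h = np.kron(h3, np.kron(h2, h1))         # global impulse response, length 2048

# AR(1) input: white Gaussian noise filtered through 1/(1 - 0.9 z^-1).
N = 5000
w = rng.standard_normal(N)
x = np.empty(N)
x[0] = w[0]
for n in range(1, N):
    x[n] = 0.9 * x[n - 1] + w[n]
```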
Therefore, to evaluate the identification of these individual impulse responses, a proper performance measure is the normalized projection misalignment (NPM) [28]:

NPM_i(n) = (1/||h_i||_2) · || h_i − [h_i^T ĥ_i(n) / (ĥ_i^T(n) ĥ_i(n))] ĥ_i(n) ||_2,   i = 1, 2, 3,

where ||·||_2 denotes the Euclidean norm. On the other hand, the global impulse response h results without any scaling ambiguity. Consequently, we can use a performance measure based on the regular normalized misalignment (NM):

NM(n) = ||h − ĥ(n)||_2² / ||h||_2²,

which is also equivalent to ||H − Ĥ(n)||_F² / ||H||_F², where Ĥ(n) = ĥ1(n) ∘ ĥ2(n) ∘ ĥ3(n) and ||·||_F denotes the Frobenius norm (the Frobenius norm of a third-order tensor A is defined as ||A||_F = sqrt(⟨A, A⟩)). The simulation results should provide answers to several important questions, as follows: (i) What is the influence of the forgetting factors on the performance of the proposed RLS-TF algorithm? (ii) What are the advantages of the RLS-TF algorithm over the previously developed NLMS-TF counterpart [17]? (iii) What are the advantages of the RLS-TF algorithm over the conventional RLS benchmark? The following three experiments are designed to address these issues.
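Both performance measures can be implemented in a few lines; the NPM below follows one common projection-based form of the definition from [28], which is invariant to the scaling of the estimate:

```python
import numpy as np

def npm(h_true, h_est):
    """Normalized projection misalignment: insensitive to the scaling of h_est."""
    proj = (h_true @ h_est) / (h_est @ h_est) * h_est   # projection of h_true onto h_est
    return np.linalg.norm(h_true - proj) / np.linalg.norm(h_true)

def nm(h_true, h_est):
    """Regular normalized misalignment for the (ambiguity-free) global response."""
    return np.linalg.norm(h_true - h_est) ** 2 / np.linalg.norm(h_true) ** 2

h = np.array([1.0, -2.0, 0.5])
assert np.isclose(npm(h, 3.0 * h), 0.0)   # scaling the estimate does not affect NPM
assert np.isclose(nm(h, h), 0.0)
```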
In the first experiment, the performance of the proposed RLS-TF algorithm was analysed with respect to its main parameters, i.e., the forgetting factors λ1, λ2, and λ3. In the case of an RLS-based algorithm, the value of the forgetting factor is usually related to the filter length, following a well-known rule of thumb, as shown in Table 1 (see "Initialization"). In our case, the forgetting factors of the RLS-TF algorithm were set to λ1 = 1 − 1/(KL1), λ2 = 1 − 1/(KL2), and λ3 = 1 − 1/(KL3). As we can notice, the value of each forgetting factor depends on the length of its related filter (i.e., L1, L2, or L3), but also on the constant K. This tuning parameter can be used to adjust the values of the forgetting factors, as indicated in Figures 3 and 4. Clearly, a higher value of K results in a higher value of the forgetting factor (i.e., closer to one). We can expect that a higher value of the forgetting factor reduces the misalignment, while slowing down the convergence/tracking [29]. On the other hand, reducing the forgetting factor improves the convergence/tracking, but increases the misalignment. This behaviour is supported by the results depicted in Figures 3 and 4, in terms of NPM and NM, respectively. As we can notice, the value K = 20 led to a good compromise between the performance criteria, so that it was used in the following experiments. Next, we compare the performance of the proposed RLS-TF algorithm with its previously developed counterpart based on the NLMS algorithm, i.e., NLMS-TF [17]. The overall performance of this algorithm is mainly controlled by its normalized step-sizes, which are positive constants smaller than one. Using notation similar to that involved in [17], we set equal values for these parameters, i.e., α1 = α2 = α3 = α. In the case of the NLMS-TF algorithm, the fastest convergence mode is obtained when α1 + α2 + α3 ≈ 1, so that we could use α = 0.33.
Smaller values of the normalized step-sizes (e.g., α = 0.1) slowed down the convergence/tracking, but led to a lower misalignment. As shown in Figures 5 and 6 (in terms of NPM and NM, respectively), the RLS-TF algorithm clearly outperformed the NLMS-TF counterpart, achieving a faster convergence rate and tracking, together with a lower misalignment.
Finally, in the last experiment, we investigated the comparison between the RLS-TF solution and the conventional RLS algorithm. As explained in the last part of Section 3 (related to the computational complexity), the conventional RLS algorithm can also be used for the identification of the global impulse response of length L, based on the cost function from (36). This algorithm uses a single forgetting factor, which can also be set as λ = 1 − 1/(KL), where K is the same tuning constant. The influence of the value of λ on the performance of the algorithm is also related to the well-known compromise between low misalignment and fast tracking. In the experiment reported in Figure 7, the conventional RLS algorithm used two values of the forgetting factor, which were set by varying the tuning constant to K = 1 and K = 20. As we can notice, even if the larger value of the forgetting factor (obtained for K = 20) led to a lower misalignment, the tracking capability of the conventional RLS algorithm was significantly reduced. Clearly, the tracking was improved when using a smaller forgetting factor (corresponding to K = 1), but the misalignment of the conventional RLS algorithm was much higher in this case. On the other hand, the RLS-TF algorithm outperformed its conventional counterpart by far, in terms of both performance criteria. Moreover, the complexity of the conventional RLS algorithm, i.e., O(L²), is prohibitive for practical implementations, due to the long length of the global filter (L = 2048). On the other hand, the RLS-TF algorithm works on the individual components and combines the solutions of three shorter filters, of lengths L1, L2, and L3 (with L1L2L3 = L); thus, it is much more computationally efficient.
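The rule of thumb λ = 1 − 1/(KL) gives concrete values for this setup (a trivial computation, shown for both the RLS-TF component filters and the conventional RLS algorithm):

```python
# Forgetting factors from λ = 1 - 1/(K·L_filter), for L1 = 64, L2 = 8, L3 = 4.
L1, L2, L3 = 64, 8, 4
K = 20
lams = [1 - 1 / (K * L) for L in (L1, L2, L3)]   # RLS-TF: one per component filter
L = L1 * L2 * L3                                 # conventional RLS filter length (2048)
lam_rls = 1 - 1 / (K * L)                        # K = 20: very close to one
lam_rls_fast = 1 - 1 / (1 * L)                   # K = 1: faster tracking
print(lams, lam_rls, lam_rls_fast)
```

For the component filters, this gives λ1 ≈ 0.9992, λ2 ≈ 0.9938, and λ3 = 0.9875, while the conventional RLS forgetting factor stays much closer to one for the same K.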
As a trivial example related to the last experiment given in Figure 7, we could mention that the simulation time (using MATLAB) of the RLS-TF algorithm was less than one minute, while the conventional RLS algorithm took hours to reach the final result.

Conclusions and Future Works
In this paper, we explored the identification of trilinear forms using the RLS algorithm. The approach was developed in the framework of an MISO system identification problem, based on the decomposition and modelling of third-order tensors. The resulting RLS-TF algorithm was tailored for the identification of such trilinear forms in a more efficient way as compared to the conventional RLS algorithm. Moreover, the proposed RLS-TF algorithm outperformed its previously developed NLMS-TF counterpart in terms of the main performance criteria, providing a faster convergence rate and tracking, together with low misalignment.
The ideas presented in this paper could be further exploited in an extended framework, aiming to identify more general forms of global impulse responses, which cannot be decomposed as rank-1 tensors. Several preliminary results can be found in [30][31][32], but they are applicable only in the bilinear context (i.e., second-order tensors). The extension to the trilinear case represents a very challenging problem, since finding (and approximating) the rank of a higher order tensor is a much more sensitive task. Furthermore, it would be interesting to extend other versions of the NLMS and RLS algorithms (e.g., based on variable step-sizes [33] and variable forgetting factors [34], respectively) to the trilinear forms.
Finally, it would be useful to evaluate how the proposed algorithm could benefit (in terms of implementation) from the current technology of the tensor processing unit (TPU) and the TensorFlow software [35]. This aspect could bring additional advantages, especially in the framework of specific applications related to machine learning/vision, neural networks, and artificial intelligence.

Conflicts of Interest:
The authors declare no conflict of interest.