A Kalman Filter for Multilinear Forms and Its Connection with Tensorial Adaptive Filters †

The Kalman filter represents a very popular signal processing tool, with a wide range of applications within many fields. Following a Bayesian framework, the Kalman filter recursively provides an optimal estimate of a set of unknown variables based on a set of noisy observations. Therefore, it fits system identification problems very well. Nevertheless, such scenarios become more challenging (in terms of the convergence and accuracy of the solution) when the parameter space becomes larger. In this context, the identification of linearly separable systems can be efficiently addressed by exploiting tensor-based decomposition techniques. Such multilinear forms can be modeled as rank-1 tensors, while the final solution is obtained by solving and combining low-dimension system identification problems related to the individual components of the tensor. Recently, the identification of multilinear forms was addressed based on the Wiener filter and most well-known adaptive algorithms. In this work, we propose a tensorial Kalman filter tailored to the identification of multilinear forms. Furthermore, we also show the connection between the proposed algorithm and other tensor-based adaptive filters. Simulation results support the theoretical findings and show the appealing performance features of the proposed Kalman filter for multilinear forms.


Introduction
The identification of multilinear forms (or linearly separable systems) can be efficiently exploited in the framework of different applications, like channel equalization [1,2], nonlinear acoustic echo cancellation [3,4], source separation [5,6], array beamforming [7,8], and object recognition [9,10]. In these contexts, the basic approach relies on tensor decomposition and modeling techniques [11][12][13][14], since the multilinear forms can be modeled as rank-1 tensors. The main idea is to combine (i.e., "tensorize") the solutions to low-dimension problems, in order to efficiently solve a multidimensional system identification problem, which is usually characterized by a large parameter space. Such scenarios can appear in the framework of multichannel systems, e.g., those with a large number of sensors and actuators, such as in active noise control systems [15], adaptive beamforming [16], microphones arrays [17], etc.
The particular cases of bilinear and trilinear forms have been previously addressed in the literature from a system identification perspective. In [18], an iterative Wiener filter for bilinear forms was developed in the framework of a multiple-input/single-output (MISO) system. Compared to the conventional Wiener solution, the iterative version can obtain a good accuracy of the solution, even when a few data are available for the estimation of the

Multilinear Framework for MISO System Identification
Let us consider a real-valued MISO system with N individual channels, which are modeled as finite-impulse-response (FIR) filters of lengths L n , n = 1, 2, . . . , N. The impulse responses of these channels at the discrete-time index t are characterized by the vectors h n (t) = [h n,1 (t) h n,2 (t) · · · h n,L n (t)] T , n = 1, 2, . . . , N, where the superscript T denotes the transpose operator. Furthermore, we assume that h n (t), n = 1, 2, . . . , N are zero-mean random vectors, which follow a simplified first-order Markov model h n (t) = h n (t − 1) + w n (t), (1) where w n (t), n = 1, 2, . . . , N are zero-mean white Gaussian noise vectors [uncorrelated with h n (t − 1)], whose correlation matrices are R w n = σ 2 w n I L n , n = 1, 2, . . . , N, with I L n being an identity matrix of size L n × L n . The variances σ 2 w n , n = 1, 2, . . . , N capture the uncertainties in the corresponding channels h n (t), n = 1, 2, . . . , N.
The impulse responses of the channels can be grouped in a tensorial form, i.e., H(t) ∈ R L 1 ×L 2 ×···×L N , such that where • denotes the outer product. Therefore, the elements of this rank-1 tensor are (H) l 1 ,l 2 ,...,l N (t) = h 1,l 1 (t)h 2,l 2 (t) · · · h N,l N (t). In addition, using the vectorization operation, vec(·), we can write where ⊗ denotes the Kronecker product. The real-valued input signals that feed into the MISO system are described in the tensorial form X (t) ∈ R L 1 ×L 2 ×···×L N , having the elements (X ) l 1 l 2 ...l N (t) = x l 1 l 2 ...l N (t). Consequently, the output signal at the discrete-time index t results in x l 1 l 2 ...l N (t)h 1,l 1 (t)h 2,l 2 (t) · · · h N,l N (t) where × n denotes the mode-n product [5]. This represents a multilinear form, since it is a linear function of each of the vectors h n (t), n = 1, 2, . . . , N, considering that the other N − 1 components are fixed. The last line in (4) represents a particular case of the MISO system identification problem, which resembles a single-input, single-output (SISO) scenario. Using the notation x(t) = vec[X (t)] and h(t) = vec[H(t)], the output signal from (4) becomes where the vector h(t) represents the global impulse response of the system. Therefore, in this multilinear framework, the system identification problem can be formulated in two ways. First, this can be approached in terms of estimating the individual channels, h n (t), n = 1, 2, . . . , N. Alternatively, we can focus on the identification of the global impulse response, h(t), as in a conventional SISO system identification problem. At this point, there are two important aspects that should be outlined. The global impulse response, h(t), is of length L = ∏ N n=1 L n , but this results as a combination of ∑ N n=1 L n elements, which are the coefficients of the individual impulse responses, h n (t), n = 1, 2, . . . , N. Additionally, according to (3), we have with η n ∈ R, η n = 0 (for any n = 1, 2, . . . , N), and ∏ N n=1 η n = 1, so that the decomposition of h(t) is not unique. Consequently, from a system identification perspective, it would be more advantageous to approach the problem in terms of estimating the individual impulse responses h n (t), n = 1, 2, . . . , N, while the estimated global impulse response results similar to (6). On the other hand, the Kronecker product-based decomposition of h(t) does not lead to a unique set of estimates for the individual channels. However, there is no scaling ambiguity when identifying the global impulse response. In sum, the main idea behind the decomposition-based approach is to reformulate a high-dimension system identification problem as a combination of low-dimension solutions, which are "tensorized" together.
In realistic system identification scenarios, the output signal y(t) is usually corrupted by an additive noise, v(t), which is uncorrelated with the input signals. The variance of this noise signal is denotes mathematical expectation. Thus, the reference (or desired) signal at the discrete-time index t results in The goal is to estimate the output of the system (or, equivalently, the impulse responses of the channels), given the reference and the input signals. In this context, the conventional Wiener filter provides the well-known solution [20][21][22] where h W is an estimate of the global impulse response, while R = E x(t)x T (t) and represent the covariance matrix and the cross-correlation vector, respectively. Recently, an iterative Wiener filter [38] was developed in the framework of multilinear forms. It exploits the decomposition of the global impulse response, while the optimization criterion is applied, following a block coordinate descent approach, to the individual components [41]. As compared to the conventional Wiener filter from (8), the iterative version of multilinear forms leads to a superior performance, especially when a small amount of data are available for the estimation of R and p. Nevertheless, both previously mentioned Wiener solutions present several limitations, such as the time-invariant framework, the matrix inversion operation, and the estimation of the statistics. These could make them unsuitable in real-world scenarios, e.g., working in nonstationary environments and/or requiring real-time processing.
In this context, a more appropriate approach is adaptive filtering. Since the LMS algorithm represents one of the simplest and most practical solutions, the LMS-based algorithms tailored to the identification of multilinear forms were developed in [39]. Furthermore, the RLS algorithm for multilinear forms was introduced in [40]. These versions are also referred to as tensor-based adaptive algorithms. They rely on the minimization of cost functions that depend on the error signal, i.e., the difference between the reference signal and the estimated output of the system. Nevertheless, in a realistic system identification framework, [i.e., in the presence of the system noise, according to (7)], the goal of the adaptive filter is not to make the error signal reach zero. The objective, instead, is to recover this system noise from the error signal of the adaptive filter, after it converges to the true solution. Consequently, it makes more sense to minimize the system misalignment, i.e., a measure of the difference between the true impulse response of the system and the estimated one. This is the optimization approach behind Kalman filtering. Moreover, the Kalman filter uses a specific parameter that captures the uncertainties in the system to be identified, as outlined in the discussion that follows (1), related to σ 2 w n , n = 1, 2, . . . , N. These parameters could act as control factors. On the other hand, the LMS and RLS adaptive filters do not depend on these uncertainties, since the impulse responses of the channels are considered as deterministic in the derivation of these algorithms. Therefore, the specific parameters of the Kalman filter would allow for better control.

Kalman Filter for the Identification of Multilinear Forms
Following (5), we can introduce the a priori error signal where y(t) is the estimated output signal and h(t − 1) represents an estimate of the global impulse response at the discrete-time index t − 1. Since the global impulse response can be deconstructed based on (6), we can also consider that h(t) similarly results in a combination of the estimated impulse responses of the channels, denoted by h n (t), n = 1, 2, . . . , N. Therefore, we have Alternatively, using the properties of the Kronecker product [42], (10) can be expressed in N equivalent ways, i.e., Consequently, the a priori error signal from (9) can also be rewritten in N equivalent forms (targeting the individual filters), as follows . . . where . . .
In a similar manner, the reference signal from (7) can be expressed in N equivalent ways, which can be summarized as where with n = 1, 2, . . . , N. For any value of n = 1, 2, . . . , N, expression (17) plays the role of an observation equation, while (1) represents a state equation. In the framework of multilinear forms, having N such pairs of state and observation equations, the objective is to find the optimal recursive estimator of h n (t), for n = 1, 2, . . . , N, i.e., h n (t). To this end, we will follow a multilinear optimization approach [41], by considering that N − 1 impulse responses are fixed for all the previous time indices, while optimizing the remaining one at the current time index. In this case, within the optimization criterion of h n (t), n = 1, 2, . . . , N, we may assume that Under these considerations and based on (17), we can introduce the a posteriori errors related to the N individual filters, which result in where is the a posteriori misalignment (or the state estimation error) of the nth individual filter, with n = 1, 2, . . . , N. Similarly, we can define the a priori misalignments and, consequently, represent the correlation matrices of the a priori and a posteriori misalignments, respectively.
As explained in Section 2, according to (6), we can only identify the individual impulse responses up to some arbitrary scaling factors, η n , n = 1, 2, . . . , N. However, in terms of identifying the global impulse response (or, equivalently, the output signal), the group Thus, in order to simplify the notation, the scaling factors do not appear explicitly in the misalignments.
In the context of the multilinear optimization strategy and the linear sequential Bayesian approach [43], the optimum estimates of the state vectors, h n (t), n = 1, 2, . . . , N, have the recursive forms . . .
where tr[·] denotes the trace of a square matrix. This leads to . . . and . . .
An estimation of the global impulse response can be obtained based on (10). Alternatively, the conventional Kalman filter can be used to find h(t), based on the observation (7) and a state equation for the global impulse response h(t) [similar to (1)]. In this case, the computational complexity of the conventional Kalman filter would be proportional to O(L 2 ), where L = ∏ N n=1 L n (i.e., the length of the global impulse response, as explained in Section 2). On the other hand, the KF-T algorithm combines the solutions of N Kalmanbased filters of shorter lengths (i.e., L n , n = 1, 2, . . . , N), so that its computational cost is proportional to ∑ N n=1 O(L 2 n ). Moreover, even if they are interconnected, these N individual filters can work in parallel, since the update of each filter at the discrete-time index t depends on the coefficients of all the other filters from the previous time index, i.e., h n (t − 1), n = 1, 2, . . . , N. Inputs:

Connection with Tensor-Based Adaptive Filters
The KF-T algorithm developed in the previous section represents a generalization of the Kalman filter for bilinear forms (i.e., the particular case N = 2) presented in [27]. Nevertheless, it is also connected with other tensor-based adaptive algorithms, as will be shown in this section.
In [40], the tensor-based RLS (RLS-T) algorithm was introduced in the context of multilinear forms. At first, the RLS-T algorithm with the forgetting factors equal to one [44] has a striking resemblance to the KF-T using σ 2 w n = 0, n = 1, 2, . . . , N. However, the KF-T does not rely on any matrix inversion, which is not the case for the RLS-T algorithm. Moreover, the RLS-T depends on the correlation matrices of the input signals, while the KF-T is related to the correlation matrices of the misalignments. This is a more proper approach in system identification scenarios, as was outlined at the end of Section 2. Additionally, the RLS-T algorithm does not depend on the variance of the additive noise (i.e., the parameter σ 2 v ), or on the uncertainties in the system (i.e., the parameters σ 2 w n , n = 1, 2, . . . , N), since h n (t), n = 1, 2, . . . , N are considered as deterministic in its derivation. Nevertheless, these specific parameters of the KF-T allow for better control of the algorithm. For example, large values of σ 2 w n , n = 1, 2, . . . , N are suitable when the uncertainties in the system are high, in which case a good tracking behavior of the algorithm is needed; usually, the price is a lower accuracy, i.e., a higher misalignment. On the other hand, lower values of these parameters lead to improved accuracy, but reduce the tracking capability.
An interesting connection can be established between the KF-T and the tensor-based NLMS (NLMS-T) algorithm [39,40], which was also developed in the framework of multilinear forms. Let us consider that the KF-T has started to converge. Consequently, in the steady-state of the algorithm, we may also consider that R m hn (t), n = 1, 2, . . . , N tend to become diagonal matrices. This approximation is reasonable, taking into account that the misalignment of the individual coefficients tend to become uncorrelated. Hence, we can use where σ 2 m hn (t), n = 1, 2, . . . , N represent the elements from the main diagonal of the respective matrices. Furthermore, using the notation the expressions of the Kalman gain vectors from (27)-(29) simplify to . . .
These updates are specific to a tensor-based variable-regularized NLMS algorithm tailored for multilinear forms, namely VR-NLMS-T. Such an algorithm is defined by the error signals from (11)-(13) and the updates (38)- (40). In this case, the control parameters are grouped into the variable regularization factors δ h n (t), n = 1, 2, . . . , N.

Simulation Results
In this section, simulation results are provided in order to support the performance of the proposed KF-T and the main theoretical findings related to this algorithm. The goal of this analysis is fourfold, as follows. First, we evaluate the influence of the uncertainty parameters (i.e., σ 2 w n , n = 1, 2, . . . , N) on the performance of the KF-T. Second, we outline the connection between the proposed Kalman-based algorithm and its tensor-based counterparts, as discussed in Section 4. Third, we assess the performance of the KF-T as compared to the conventional Kalman filter. Finally, we analyze the behavior of the KF-T in a more general framework, for the identification of nonseparable systems.
The experiments are performed in the context of the multilinear framework presented in Section 2. The order of the system is N = 4 and the input signals that form the tensor X (t) are AR(1) processes in most of the experiments. These inputs are obtained by filtering white Gaussian noises through an AR(1) model with a pole at 0.99. Such a high correlation degree (due to the pole close to 1) represents a challenge for most adaptive filters [20][21][22], in terms of their convergence features. In the last two experiments, we also used realworld speech sequences (corrupted by background noise) as input signals, which are also challenging due to their nonstationary character.
The MISO system used in simulations is characterized by four individual channels (i.e., N = 4), where their impulse responses are chosen as follows. The impulse response h 1 (t) contains the first L 1 = 16 coefficients of the first network echo path from the ITU-T G168 Recommendation [45] (which is a standard for digital network echo cancellers). The impulse response h 2 (t) is randomly generated (with Gaussian distribution), using the length L 2 = 8. The lengths of the other two impulse responses, i.e., h 3 (t) and h 4 (t), are set to L 3 = 4 and L 4 = 2, respectively. Their coefficients follow an exponential decay based on the rule a l k −1 k , l k = 1, . . . , L k , k = 3, 4, using a 3 = 0.8 and a 4 = 0.3, respectively. The length of the global impulse response h(t) is L = ∏ N n=1 L n , which, in our case, results in L = 1024. Such a length could be prohibiting in terms of implementation, especially for the Kalman-based and RLS-based adaptive filters. On the other hand, the tensor-based algorithms combine the solutions of N shorter filters of length L n , n = 1, 2, . . . , N, which is much more advantageous, since, usually, L n L (as in the current setup). The output signal y(n) from (4) is corrupted by an additive white Gaussian noise, v(n); its variance is set to σ 2 v = 0.01. We assume that this parameter is available in simulations. In practice, it can be estimated in several ways, e.g., [46,47]. Nevertheless, the influence of these different estimates on the performance of the proposed KF-T is beyond the scope of this paper. Finally, the reference signal results based on (7).
Two performance measures are used in simulations. First, the identification of the global impulse response is evaluated based on the normalized misalignment (NM), in dB, which is computed as where · stands for the Euclidean norm. Second, since the identification of the individual impulse responses is influenced by the scaling factors [according to the discussion related to (6)], a proper performance measure is the normalized projection misalignment (NPM) [48], which is evaluated as for n = 1, 2, . . . , N.
In the first set of experiments, we evaluate the impact of the uncertainty parameters on the performance of the proposed KF-T. The values of σ 2 w n , n = 1, 2, . . . , N are subject to a compromise between the main performance criteria, i.e., fast convergence/tracking versus low misadjustment (i.e., good accuracy). The best accuracy of the solution is obtained for σ 2 w n = 0, n = 1, 2, . . . , N, i.e., when there are no uncertainties in the system. Such a setup is suitable in stationary environments. As can be seen in Figures 1 and 2 (in terms of the NM and NPM, respectively), the lower the values of σ 2 w n , n = 1, 2, . . . , N, the lower the misalignment of KF-T. On the other hand, larger values of these parameters improve the tracking capability of the algorithm, but achieve a higher misadjustment, reducing the accuracy of the solution. This behavior is supported in Figure 3, where an abrupt change in the system is considered in the middle of the experiments, by changing the sign of the coefficients of h 1 (t). Since the initial convergence rate is not relevant in this case, and for a better visualization, we focused only on the tracking behavior in Figure 3. Despite having the lowest misalignment (i.e., the best accuracy), the KF-T using σ 2 w n = 0, n = 1, 2, . . . , N has the slowest tracking capability. Nevertheless, larger values of σ 2 w n , n = 1, 2, . . . , N lead to a significantly better performance in terms of tracking, while slightly sacrificing the accuracy of the solution (i.e., achieving a slightly higher misalignment level).   The connections between the proposed KF-T and other tensor-based adaptive algorithms were shown in Section 4. For example, the KF-T with σ 2 w n = 0, n = 1, 2, . . . , N resembles the RLS-T algorithm [40] using the forgetting factors λ n = 1, n = 1, 2, . . . , N. This aspect is supported in Figures 4 and 5, in terms of the NM and NPM, respectively. Both algorithms achieve similar initial convergence rates. However, the KF-T reaches a lower misalignment level and outperforms the RLS-T algorithm, thus supporting the discussion from Section 4. Moreover, in Figure 4, we also introduce comparisons with the solutions provided by the conventional and iterative Wiener filters (WFs) [38]. Both tensor-based algorithms outperform the conventional WF in terms of the accuracy of their solutions (i.e., lower misalignment). The KF-T converges (faster than the RLS-T) to the solution obtained by the iterative WF for multilinear forms [38]. Nevertheless, as mentioned in Section 2 [in the discussion that follows (8)], the KF-T overcomes the inherent limitations of the iterative WF (e.g., time-invariant framework, estimation of the statistics, and the matrix inversion operation).   Based on the assumption from (33), it was also shown in Section 4 that the KF-T behaves like a VR-NLMS-T algorithm, which is defined by the relations (34)- (40). This behavior is supported in Figures 6-8, when using different values of σ 2 w n , n = 1, 2, . . . , N.
The KF-T and the VR-NLMS-T reach the same misalignment level for the same values of the uncertainty parameters. On the other hand, due to the assumption in (33), the VR-NLMS-T algorithm experiences a slower convergence rate as compared to the KF-T. In fact, as outlined in Section 4, the VR-NLMS-T algorithm represents a simplified version of the KF-T, with a lower computational complexity (but paying a price in terms of the convergence features).
In Figures 9 and 10, the NM and NPM performance of the proposed KF-T are compared to its tensor-based counterparts, i.e., the NLMS-T and the RLS-T algorithms [40]. The specific parameters of the KF-T are set to σ 2 w n = 10 −9 , n = 1, 2, . . . , N, while the RLS-T algorithm uses the forgetting factors λ n = 0.999, n = 1, 2, . . . , N, in order to target a similar convergence behavior. The normalized step-sizes of the NLMS-T algorithm are set to α n = 0.25, n = 1, 2, . . . , N, which represent the fastest convergence mode [39,40]. The NLMS-T and RLS-T algorithms reach similar misalignment levels, while the KF-T outperforms its counterparts in terms of both convergence rate and misalignment. As expected, the RLS-T is significantly faster (in terms of the convergence rate) as compared to the NLMS-T algorithm. Nevertheless, the KF-T provides an initial convergence rate that is slightly better as compared to the RLS-T algorithm, while also achieving a lower misalignment level (i.e., a better accuracy of the solution).
The conventional Kalman filter (KF) can also be used for the identification of the global impulse response, h(t), as explained in the end of Section 3. In this case, there is a single adaptive filter (of length L) that has to be updated, while the overall computational complexity is proportional to O(L 2 ). Due to the large number of coefficients, dealing with such a long adaptive filter raises significant challenges in terms of the complexity, convergence, and accuracy of the solution. On the other hand, the proposed KF-T can obtain the estimate of the global impulse response by combining the solutions of much shorter adaptive filters of lengths L n , n = 1, 2, . . . , N, with ∏ N n=1 L n = L. Therefore, the expected gain is twofold, in terms of both performance and complexity. This is supported in Figure 11, where the performance of the proposed KF-T is compared to the conventional KF. The specific parameters are set to target the best accuracy of the solution, i.e., σ 2 w n = 0, n = 1, 2, . . . , N in case of the KF-T, while the conventional KF uses the same null value for its uncertainty parameter. The proposed KF-T outperforms the conventional KF in terms of accuracy, achieving a significantly lower misalignment level. Moreover, the complexity of the KF-T is proportional to ∑ N n=1 O(L 2 n ), which is much more advantageous compared to the conventional KF, especially for L n L, with n = 1, 2, . . . , N. For example, related to the experiment given in Figure 11, we could mention that the simulation time (using MATLAB R2018b) of the proposed KF-T was less than one minute, while the conventional KF took almost one hour to reach the final result. The experiment was performed on an Asus GL552VX device (Windows 10 OS), having an Intel Core i7-6700HQ CPU@2.60 GHz, with 4 Cores, 8 Logical Processors, and 16 GB of RAM. In the last set of experiments, we focus on a more challenging scenario, when the global impulse response of the system is not separable. In this case, we target the identification of h(t) + u(t), where h(t) is specified in the beginning of this section [i.e., h(t) = h 4 , while u(t) is randomly generated, with Gaussian distribution, and its variance is set to υ h(t) 2 /L (using different values of υ). Clearly, the higher the value of υ, the more challenging the decomposition of the global impulse response. Since the KF-T is based on the decomposition in (10), it cannot model the noisy part u(t). Nevertheless, as can be seen in Figures 12 and 13 (in terms of the NM and NPM, respectively), the KF-T is able to achieve a reasonable attenuation of the misalignment (i.e., a good accuracy of the estimate) even for larger values of υ. Consequently, the proposed KF-T has uses beyond the identification of rank-1 tensors, when the global impulse response contains a dominant separable (i.e., decomposable) part.
The previous experiment is repeated in Figure 14, but using real-world speech sequences (corrupted by background noise) as input signals. Due to the nonstationary nature and highly correlated character of the speech signals, the performance of any adaptive algorithm is influenced in such a scenario. This is also the case for the KF-T, which pays with a slower convergence rate, as compared to the case analyzed in Figure 12. Nevertheless, this performance criterion can be improved by using higher values for the uncertainty parameters, σ 2 w n , n = 1, 2, . . . , N. This is further supported in Figure 15, where we can notice a higher convergence rate of the KF-T when increasing the values of σ 2 w n , while causing a slight increase in the misalignment level.

Conclusions
In this paper, we have presented a tensorial Kalman filter tailored to the identification of multilinear forms. The solution was developed in the framework of a MISO system, while the identification problem was reformulated based on linearly separable systems modeled as rank-1 tensors. We have also shown how the resulting KF-T algorithm is connected to the main categories of tensor-based adaptive filters, i.e., the NLMS-T and the RLS-T algorithms. In this context, the specific uncertainty parameters of the KF-T allow for better control of this, as compared to its counterparts. The simulation results indicated the good performance features of the KF-T and also supported the discussion related to its connection with other tensorial algorithms. Future works will focus on finding a more practical way to evaluate the uncertainty parameters, e.g., following a similar approach to variable adaptation factors, which act as time-dependent parameters. In addition, we aim to extend the decomposition-based technique to the identification of nonseparable systems (which could be modeled as higher rank tensors), in order to develop a more general and efficient version of the tensorial Kalman filter.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.