A Connection Between the Kalman Filter and an Optimized LMS Algorithm for Bilinear Forms

The system identification problem becomes more challenging when the parameter space increases. Recently, several works have focused on the identification of bilinear forms, which are related to the impulse responses of a spatiotemporal model, in the context of a multiple-input/single-output system. In this framework, the problem was addressed in terms of the Wiener filter and different basic adaptive algorithms. This paper studies two types of algorithms tailored for the identification of such bilinear forms, i.e., the Kalman filter (along with its simplified version) and an optimized least-mean-square (LMS) algorithm. Also, a comparison between them is performed, which shows interesting similarities. In addition to the mathematical derivation of the algorithms, we also provide extensive experimental results, which support the theoretical findings and indicate the good performance of the proposed solutions.


Introduction
Bilinear systems were previously treated in various contexts [1], such as the approximation of nonlinear systems. The applications derived from this framework are numerous, among which we can mention system identification [2][3][4][5], design of digital filters [6], echo cancellation [7], chaotic communications [8], active noise control [9], neural networks [10], etc. In all these papers, the bilinear term is understood in terms of an input-output relation (with respect to the data).
More recently, a new approach was studied in [11], where the bilinear term was defined in the context of a multiple-input/single-output (MISO) system, with respect to the impulse responses of a spatiotemporal model. In [11], the Wiener filter solution for the identification of such bilinear forms was proposed, and then, in [12][13][14], some adaptive solutions based on different basic algorithms were provided. Similar frameworks can be found in [15][16][17][18], in conjunction with specific applications, such as channel equalization and nonlinear acoustic echo cancellation; however, these works were not associated with the identification of bilinear forms or analyzed in this context.
The Kalman filter was first introduced in [19] and it has a wide range of applications in technology, as well as in signal processing [20]. Some of its applications in the automotive field are mentioned in [21,22] and the references therein, while [23] also provides a computational complexity analysis.
In this paper, we focus on a comparative study of two different types of algorithms tailored for the identification of bilinear forms. The first one is based on the Kalman filter [19], whose bilinear form was introduced in our previous work [13], together with a simplified (i.e., low-complexity) version, which could be more suitable in real-world applications. The second algorithm is a version of the least-mean-square (LMS) adaptive filter in which the step-size parameter has been optimized in order to achieve a proper compromise between the convergence rate and misadjustment. In addition, as will be shown, there is a strong similarity between the simplified Kalman filter and the optimized LMS algorithm. Simulations show the good performance of the proposed solutions in comparison with other related works based on the conventional algorithms. Moreover, the gain could be twofold, in terms of both convergence performance and computational complexity.
The paper is structured as follows. Section 2 introduces the system model in the context of bilinear forms. In this framework, we present the Kalman filter together with its simplified version in Section 3. The optimized LMS algorithm is derived in Section 4. Next, a comparison between the simplified Kalman filter and the optimized LMS algorithm for bilinear forms is presented in Section 5. Section 6 provides some practical considerations. Simulations are performed in the framework of system identification and the results are given in Section 7. Finally, some conclusions are drawn in Section 8. Also, Appendix A provides detailed computations of some terms required within the optimized LMS algorithm, while Appendix B summarizes and briefly explains the main parameters used in the paper, in order to facilitate the reading. For the sake of clarity, the main notation used in this work is provided in Table 1.

System Model
The signal model that will be used throughout the paper is given by [11]

d(n) = h^T(n)X(n)g(n) + v(n) = y(n) + v(n),    (1)

where d(n) is the zero-mean desired (or reference) signal at the discrete-time index n, h(n) and g(n) are the two impulse responses of the system, of lengths L and M, respectively, the superscript T denotes the transpose operator, X(n) is the zero-mean multiple-input signal matrix, y(n) = h^T(n)X(n)g(n) is the output signal and represents a bilinear form, and v(n) is a zero-mean additive noise (with variance σ_v²). It is assumed that all signals are real-valued, and that X(n) and v(n) are independent. As we can notice, y(n) is a bilinear function of h(n) and g(n), because for every fixed h(n), y(n) is a linear function of g(n), and for every fixed g(n), it is a linear function of h(n).
We can rewrite the matrix X(n), of size L × M, as a vector of length ML, by using the vectorization operation:

x(n) = vec[X(n)].    (2)

Therefore, the output signal can be expressed as

y(n) = [g(n) ⊗ h(n)]^T x(n) = f^T(n)x(n),    (3)

where ⊗ denotes the Kronecker product between the individual impulse responses, while the vector f(n) = g(n) ⊗ h(n), of length ML, represents the spatiotemporal (i.e., global) impulse response of the system. Consequently, the signal model in (1) becomes

d(n) = f^T(n)x(n) + v(n).    (4)

The major difference with respect to a general MISO system is that, in this bilinear context, f(n) is formed with only M + L different elements, despite being of length ML.
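As a quick sanity check, the equivalence between the bilinear form and its vectorized counterpart can be verified numerically. The sketch below uses arbitrary dimensions and random data, and assumes column-wise vectorization (matching the definition of vec(·) in Table 1):

```python
import numpy as np

rng = np.random.default_rng(0)
L, M = 4, 3                          # arbitrary illustrative lengths

h = rng.standard_normal(L)           # temporal impulse response
g = rng.standard_normal(M)           # spatial impulse response
X = rng.standard_normal((L, M))      # multiple-input signal matrix

# Bilinear output: y = h^T X g
y_bilinear = h @ X @ g

# Equivalent linear form: y = f^T x, with f = g (Kronecker) h and x = vec[X]
f = np.kron(g, h)                    # global impulse response, length ML
x = X.flatten(order="F")             # column-wise vectorization, vec[X]
y_linear = f @ x

print(np.isclose(y_bilinear, y_linear))  # True: the two forms coincide
```

Note that np.kron(g, h) stacks the blocks g_m·h, which is exactly the ordering produced by column-wise vectorization of X(n).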
The goal is the identification of the two impulse responses h(n) and g(n) and, in this way, of the spatiotemporal impulse response f(n). For this aim, we can use two adaptive filters, ĥ(n) and ĝ(n); hence, the global impulse response can be evaluated as

f̂(n) = ĝ(n) ⊗ ĥ(n).    (5)

Let η ≠ 0 be a real-valued constant. We can see from (1) that

y(n) = [h(n)/η]^T X(n)[ηg(n)] = h^T(n)X(n)g(n),    (6)

meaning that the pairs [h(n)/η, ηg(n)] and [h(n), g(n)] are equivalent in the bilinear form. This implies that we can only identify h(n) and g(n) up to a scaling factor. A similar discussion can be found in [15,18] in the framework of blind identification/equalization and nonlinear acoustic echo cancellation, respectively. Nevertheless, because

f(n) = g(n) ⊗ h(n) = [ηg(n)] ⊗ [h(n)/η],    (7)

the global impulse response can be identified with no scaling ambiguity. Consequently, for the performance evaluation of the identification of the temporal and spatial filters, we can use the normalized projection misalignment (NPM), defined in [24]:

NPM[h(n), ĥ(n)] = ||h(n) − [h^T(n)ĥ(n)/||ĥ(n)||²]ĥ(n)|| / ||h(n)||,    (8)
NPM[g(n), ĝ(n)] = ||g(n) − [g^T(n)ĝ(n)/||ĝ(n)||²]ĝ(n)|| / ||g(n)||,    (9)

where ||•|| denotes the Euclidean norm. On the other hand, for the identification of the global filter, f(n), we should use the normalized misalignment:

NM[f(n), f̂(n)] = ||f(n) − f̂(n)|| / ||f(n)||.    (10)

In [11], this bilinear system identification problem has been studied in terms of the Wiener filter. Therefore, the assumption was that the two impulse responses that need to be identified are time-invariant (which represents a basic assumption in the context of the Wiener filter). In practice, however, these systems could vary in time. For this reason, in this paper, we approach the system identification problem considering that the systems that need to be identified vary in time. Thus, we assume that h(n) and g(n) are zero-mean random vectors, following a simplified first-order Markov model, i.e.,

h(n) = h(n − 1) + w_h(n),    (11)
g(n) = g(n − 1) + w_g(n),    (12)

where w_h(n) and w_g(n) are zero-mean white Gaussian noise vectors, with correlation matrices R_{w_h}(n) = σ²_{w_h} I_L and R_{w_g}(n) = σ²_{w_g} I_M, respectively (with I_L and I_M being the identity matrices of sizes L × L and M × M, respectively). It is considered that w_h(n) is uncorrelated with h(n − 1) and v(n), while w_g(n) is uncorrelated with g(n − 1) and v(n). The variances σ²_{w_h} and σ²_{w_g} capture the uncertainties in h(n) and g(n), respectively.
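Both the scaling ambiguity and the normalized misalignment of the global filter can be illustrated in a few lines. This is a minimal sketch: nm_db is our own helper name, and the 20·log10 form is the usual dB convention for the misalignment:

```python
import numpy as np

rng = np.random.default_rng(1)
L, M, eta = 4, 3, 2.5

h = rng.standard_normal(L)
g = rng.standard_normal(M)

# The pairs [h/eta, eta*g] and [h, g] yield the same global response,
# so f(n) can be identified with no scaling ambiguity.
f = np.kron(g, h)
print(np.allclose(f, np.kron(eta * g, h / eta)))  # True

# Normalized misalignment (in dB) of an estimate f_hat of f
def nm_db(f_true, f_hat):
    return 20.0 * np.log10(np.linalg.norm(f_true - f_hat) / np.linalg.norm(f_true))

f_hat = f + 0.01 * rng.standard_normal(L * M)  # a slightly perturbed estimate
print(nm_db(f, f_hat) < -10.0)                 # a large negative NM => good identification
```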

Kalman Filter for Bilinear Forms
In this section, we address the previously described system identification problem in terms of the Kalman filter, summarizing the findings from [13]. In this context, the signal model from (1) may be interpreted as the observation equation, while the system impulse responses can be considered as state equations. Given the two adaptive filters ĥ(n) and ĝ(n), the estimated signal is given by

ŷ(n) = ĥ^T(n − 1)X(n)ĝ(n − 1).    (13)

As a result, the a priori error signal between the desired and estimated signals can be defined as

e(n) = d(n) − ŷ(n) = d(n) − ĝ^T(n − 1)x_h(n) = d(n) − ĥ^T(n − 1)x_g(n),    (14)

where

x_h(n) = X^T(n)ĥ(n − 1),    (15)
x_g(n) = X(n)ĝ(n − 1)    (16)

denote the equivalent input vectors (see also Figures 1 and 2). In the context of the linear sequential Bayesian approach, the optimal estimates of the state vectors have the forms [25]:

ĥ(n) = ĥ(n − 1) + k_h(n)e(n),    (17)
ĝ(n) = ĝ(n − 1) + k_g(n)e(n),    (18)

where k_h(n) and k_g(n) are the Kalman gain vectors. Next, we can define the a posteriori misalignments (which represent the state estimation errors) related to the temporal and spatial impulse responses:

c_h(n) = h(n)/η − ĥ(n),    (19)
c_g(n) = ηg(n) − ĝ(n),    (20)

whose correlation matrices are R_{c_h}(n) = E[c_h(n)c_h^T(n)] and R_{c_g}(n) = E[c_g(n)c_g^T(n)], respectively, where E[•] denotes mathematical expectation. As mentioned in Section 2, we can only identify the impulse responses up to this arbitrary scaling factor η; however, the pair h(n)/η and ηg(n) is equivalent to the pair h(n) and g(n) in the bilinear form. Let us also define the a priori misalignments related to the two impulse responses:

c_{ha}(n) = h(n)/η − ĥ(n − 1),    (21)
c_{ga}(n) = ηg(n) − ĝ(n − 1),    (22)

whose correlation matrices are R_{c_{ha}}(n) = E[c_{ha}(n)c_{ha}^T(n)] and R_{c_{ga}}(n) = E[c_{ga}(n)c_{ga}^T(n)], respectively. For the sake of simplicity of the upcoming developments, let us multiply w_h(n) and w_g(n) by 1/η and η, respectively, and introduce the notation:

w'_h(n) = w_h(n)/η,    (23)
w'_g(n) = ηw_g(n).    (24)

These new terms are also zero-mean white Gaussian noise vectors, having the correlation matrices R_{w'_h}(n) = σ²_{w'_h} I_L and R_{w'_g}(n) = σ²_{w'_g} I_M, respectively. Clearly, we have σ²_{w'_h} = σ²_{w_h}/η² and σ²_{w'_g} = η²σ²_{w_g}. Consequently, using this notation in (21) and (22), we obtain

c_{ha}(n) = c_h(n − 1) + w'_h(n),    (25)
c_{ga}(n) = c_g(n − 1) + w'_g(n).    (26)

In this context, the Kalman gain vectors are computed from the minimization of the criteria:

J_h(n) = tr[R_{c_h}(n)],    (27)
J_g(n) = tr[R_{c_g}(n)],    (28)

with respect to k_h(n) and k_g(n), respectively, where tr[•] denotes the trace of a square matrix. From these minimizations, we find that

k_h(n) = R_{c_{ha}}(n)x_g(n)[x_g^T(n)R_{c_{ha}}(n)x_g(n) + σ_v²]^{−1},    (29)
R_{c_h}(n) = [I_L − k_h(n)x_g^T(n)]R_{c_{ha}}(n),    (30)

and

k_g(n) = R_{c_{ga}}(n)x_h(n)[x_h^T(n)R_{c_{ga}}(n)x_h(n) + σ_v²]^{−1},    (31)
R_{c_g}(n) = [I_M − k_g(n)x_h^T(n)]R_{c_{ga}}(n).    (32)

Summarizing, the Kalman filter for bilinear forms (namely KF-BF) is
defined by Equations (17), (18), (25), (26), and (29)-(32). As we can notice, the computational complexity of this algorithm is proportional to O(L² + M²). For the purpose of reducing the computational complexity of the KF-BF, a simplified version of this algorithm is derived. The idea of this low-complexity version was inspired by the work presented in [26], in the context of echo cancellation. First, let us assume that the KF-BF has reached its steady-state convergence. Consequently, R_{c_{ha}}(n) and R_{c_{ga}}(n) tend to become diagonal matrices, having all the elements on the main diagonal equal to small positive numbers, σ²_{c_{ha}}(n) and σ²_{c_{ga}}(n), respectively. Therefore, we can use the approximations:

R_{c_{ha}}(n) ≈ σ²_{c_{ha}}(n)I_L,    (33)
R_{c_{ga}}(n) ≈ σ²_{c_{ga}}(n)I_M.    (34)

Hence, the Kalman gain vectors for the temporal and spatial impulse responses become

k_h(n) = x_g(n)[x_g^T(n)x_g(n) + σ_v²/σ²_{c_{ha}}(n)]^{−1},    (35)
k_g(n) = x_h(n)[x_h^T(n)x_h(n) + σ_v²/σ²_{c_{ga}}(n)]^{−1},    (36)

where σ_v²/σ²_{c_{ha}}(n) and σ_v²/σ²_{c_{ga}}(n) can be seen as variable regularization parameters. Then, we use the Kalman gain vectors from (35) and (36) in the updates (17) and (18), respectively.
Next, a new simplification can be made, by considering that the matrices appearing in the updates of R_{c_h}(n) and R_{c_g}(n) can be approximated as

R_{c_h}(n) ≈ σ²_{c_h}(n)I_L,    (37)
R_{c_g}(n) ≈ σ²_{c_g}(n)I_M.    (38)

We can perform these approximations because, as the filters start to converge, the misalignments of the individual coefficients tend to become uncorrelated; due to this fact, the matrices R_{c_h}(n) and R_{c_g}(n) tend to become diagonal. Using the notation

σ²_{c_{ha}}(n) = σ²_{c_h}(n − 1) + σ²_{w'_h},    (39)
σ²_{c_{ga}}(n) = σ²_{c_g}(n − 1) + σ²_{w'_g},    (40)

together with

σ²_{c_h}(n) = [1 − x_g^T(n)k_h(n)/L]σ²_{c_{ha}}(n),    (41)
σ²_{c_g}(n) = [1 − x_h^T(n)k_g(n)/M]σ²_{c_{ga}}(n),    (42)

we can summarize the simplified Kalman filter for bilinear forms (SKF-BF) in Table 2. As we can notice, its computational complexity is proportional to O(L + M), which represents an important gain as compared to the KF-BF.
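To make the structure of this recursion concrete, the following sketch implements a simplified Kalman filter of the kind described above: scalar misalignment variances and normalized gains with variable regularization. It is an illustrative reading of the SKF-BF, not a verbatim transcription of Table 2; the variable names, initialization choices, and exact scalings are assumptions here:

```python
import numpy as np

def skf_bf(d, X_seq, L, M, sigma_v2, sigma_wh2, sigma_wg2):
    """Sketch of a simplified Kalman filter for bilinear forms (SKF-BF-like).

    Each filter is updated with a normalized gain whose regularization
    term sigma_v2 / sigma_c2 varies over time, as described in the text.
    """
    h = np.zeros(L); h[0] = 1.0          # a common initialization choice
    g = np.full(M, 1.0 / M)
    ch2, cg2 = 1.0, 1.0                  # scalar misalignment variances
    for n in range(len(d)):
        u_h = X_seq[n] @ g               # equivalent input seen by the temporal filter
        u_g = X_seq[n].T @ h             # equivalent input seen by the spatial filter
        e = d[n] - h @ X_seq[n] @ g      # a priori error
        mh = ch2 + sigma_wh2             # predicted (a priori) variance
        k_h = u_h / (u_h @ u_h + sigma_v2 / mh)
        h = h + k_h * e
        ch2 = (1.0 - (k_h @ u_h) / L) * mh
        mg = cg2 + sigma_wg2
        k_g = u_g / (u_g @ u_g + sigma_v2 / mg)
        g = g + k_g * (d[n] - h @ X_seq[n] @ g)
        cg2 = (1.0 - (k_g @ u_g) / M) * mg
    return h, g

# Identify a random bilinear system (noiseless data, for illustration only)
rng = np.random.default_rng(3)
L, M, N = 8, 4, 2000
h_true = rng.standard_normal(L)
g_true = rng.standard_normal(M)
X_seq = rng.standard_normal((N, L, M))
d = np.array([h_true @ X_seq[n] @ g_true for n in range(N)])

h_hat, g_hat = skf_bf(d, X_seq, L, M, 1e-4, 1e-9, 1e-9)
f_true = np.kron(g_true, h_true)
rel_nm = np.linalg.norm(np.kron(g_hat, h_hat) - f_true) / np.linalg.norm(f_true)
print(rel_nm)  # small value: the global response is recovered (no scaling ambiguity)
```

Comparing the Kronecker products sidesteps the scaling ambiguity of the individual filters, since f(n) is identified without it.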

Optimized LMS Algorithm for Bilinear Forms
In this section, we approach the system identification problem based on the LMS algorithm, aiming to optimize its step-size parameter in order to address the compromise between the main performance criteria, i.e., convergence rate versus misadjustment [27]. In the following, the proposed optimized LMS algorithm for bilinear forms (namely OLMS-BF) is derived based on the same system model given in Section 2. As will be explained in Section 5, this algorithm has striking resemblances with the SKF-BF, even if their derivations follow different patterns.
Let us consider the two estimated impulse responses ĥ(n) and ĝ(n), such that the estimated signal is given by (13). As a consequence, the a priori error signal between the desired signal and the estimated one can be defined following (14), where the equivalent input vectors x_h(n) and x_g(n) were defined in (15) and (16), respectively. The desired signal d(n) may also be expressed by isolating the contribution of the temporal filter, which leaves an additional "noise" term, v_g(n), of variance σ²_{v_g}(n), introduced by the system g. The terms c_h and c_{ha} were previously defined in (19) and (21), respectively. For the sake of simplicity, the scaling factor η does not appear explicitly in the following. As explained before, this parameter is included in the expression of the uncertainty parameters, thus leading to the notation from (23) and (24).
In a similar way, for the second system, d(n) can be expressed using an additional "noise" term, v_h(n), of variance σ²_{v_h}(n), related to the system h. Here, the a posteriori and a priori misalignments corresponding to the system g were defined in (20) and (22), respectively. In Figures 1 and 2, the equivalent system identification scheme is represented in terms of the two components, g(n) and h(n), respectively; it can be observed that each system influences the other one through the additional "noise" term. In the framework of the LMS algorithm for bilinear forms, namely LMS-BF [12], the updates are the following:

ĥ(n) = ĥ(n − 1) + μ_h x_g(n)e(n),    (47)
ĝ(n) = ĝ(n − 1) + μ_g x_h(n)e(n),    (48)

where μ_g and μ_h are the step-size parameters. In this context, the vectors corresponding to the a posteriori misalignments become

c_h(n) = c_{ha}(n) − μ_h x_g(n)e(n),    (49)
c_g(n) = c_{ga}(n) − μ_g x_h(n)e(n).    (50)

At this point, let us introduce the notation m_h(n) = E[||c_h(n)||²] and m_g(n) = E[||c_g(n)||²]. Taking the squared ℓ2 norms on both sides of (49) and (50), respectively, we can recursively evaluate m_h(n) and m_g(n) in (51) and (52), where the involved expectation terms are denoted in (53)-(56). It is very difficult to further process the expectation terms from (53)-(56) (and, consequently, (51) and (52)) without any supporting assumptions on the character of the input signals. Hence, let us consider that the covariance matrices of the inputs are close to diagonal ones. This is a fairly restrictive assumption on the input signals, which has been widely used to simplify the convergence analysis of many adaptive algorithms [27,28]. Also, let us consider that the input signals are independent and have the same power. In this context, the computations of the expectation terms from (53)-(56) are detailed in Appendix A. Summarizing the results from this appendix, these terms lead to the quantities A_h(n), B_h(n), A_g(n), and B_g(n) in (57)-(60), where σ_x² = E[x²(n)] denotes the input signal power, while the terms denoted by p_g(n) and p_h(n) are evaluated in (61) and (62). The expressions of the variances σ²_{v_g}(n) and σ²_{v_h}(n) are also provided in Appendix A.
Consequently, using (57)-(60) in (51) and (52), we obtain the recursions (63) and (64). In the context of system identification problems, the main goal is to reduce the system misalignment, which basically represents the difference between the true impulse response and the estimated one. Therefore, in our framework, the optimal step-size parameters (denoted in the following by μ_{g,o} and μ_{h,o}) can be found by minimizing (63) and (64). This is done by canceling the derivatives of (63) and (64) with respect to the step-sizes, which results in (65) and (66). By replacing A_g, B_g, A_h, and B_h with their expressions (see (57)-(60)), the step-size parameters of the proposed optimized LMS algorithm for bilinear forms (namely OLMS-BF) are found. Finally, introducing these parameters in (47) and (48), the updates of the OLMS-BF algorithm become (67) and (68). The most problematic terms in (67) and (68) are p_g(n) and p_h(n) (from (61) and (62), respectively), which depend on the true impulse responses. However, as shown in the next section, these terms could be omitted in practice.
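The step-size optimization step can be illustrated generically. The misalignment recursions are quadratic in the step-size, so canceling the derivative yields a minimizer of the ratio form A/B; the following numeric sketch (with arbitrary stand-in values for the expectation terms A and B) checks this:

```python
import numpy as np

# Generic form of the misalignment recursion as a function of the step-size:
#   m(mu) = m_prev - 2*A*mu + B*mu**2,
# where A and B stand for the expectation terms of the derivation.
# Canceling dm/dmu = -2*A + 2*B*mu gives the optimal step-size mu_o = A / B.
A, B, m_prev = 0.3, 2.0, 1.0                 # arbitrary illustrative values
mu = np.linspace(0.0, 0.5, 10001)
m = m_prev - 2.0 * A * mu + B * mu ** 2
mu_numeric = mu[np.argmin(m)]
print(abs(mu_numeric - A / B) < 1e-3)        # True: the minimizer is A/B
```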

SKF-BF versus OLMS-BF
The SKF-BF and OLMS-BF algorithms were developed following different theoretical patterns.However, there are strong similarities between these two algorithms, as will be explained in this section.
The update equations of the SKF-BF are given by (17) and (18), where the Kalman gain vectors have the expressions in (35) and (36). It can be noticed that the updates of the SKF-BF can be expressed in an LMS-like form, where the Kalman step-size parameters play the role of the (normalized) step-sizes. Comparing these parameters with the optimal step-sizes from (67) and (68) (also taking (A7) and (A8) into account; see Appendix A), we can notice striking resemblances between the SKF-BF and OLMS-BF. In fact, these two algorithms are very similar when

p_h(n) ≈ 0 and p_g(n) ≈ 0.    (73)

On the other hand, as indicated in [12], this could represent a reasonable assumption, since, in the steady-state of the algorithm, the influence of the terms p_h(n) and p_g(n) on the step-size parameters diminishes. As will be supported in simulations, (73) can be fairly imposed within the OLMS-BF algorithm, while still leading to a very good compromise between the performance criteria (e.g., convergence rate versus misadjustment). Under these considerations, the OLMS-BF algorithm is summarized in Table 3 (in a practical form that facilitates its implementation).

Practical Considerations
The previously developed algorithms are designed to identify the individual impulse responses of the bilinear form. The global (spatiotemporal) impulse response can then be computed based on the Kronecker product between them. An alternative solution is to use the regular Kalman filter to identify the spatiotemporal impulse response directly, relying on the observation Equation (4) and the state equation:

f(n) = f(n − 1) + w(n),    (75)

where w(n) is a zero-mean white Gaussian noise signal vector. The covariance matrix of w(n) is R_w(n) = σ_w² I_{ML}, where I_{ML} is the identity matrix of size ML × ML and the variance σ_w² captures the uncertainties in f(n).
In this way, following the approach from [26], we can easily derive the regular Kalman filter (KF) and its simplified version (namely SKF), which identify the global impulse response using a single adaptive filter f̂(n); for further details, please see Sections VI and VII in [26]. However, we need to mention that the solution found using the regular KF and SKF involves an adaptive filter of length ML, whereas their counterparts tailored for bilinear forms (i.e., KF-BF and SKF-BF) use two shorter filters of lengths L and M, respectively. As a consequence, besides a lower computational complexity, a much faster convergence rate and better tracking are expected for the bilinear algorithms with respect to the conventional ones. The same ideas apply for the OLMS-BF algorithm, as compared to its regular counterpart, i.e., the joint-optimized normalized LMS (JO-NLMS) algorithm [29], which could be used to identify the global impulse response f(n).
The computational complexity of the previously discussed algorithms is summarized in Table 4. It can be easily seen that the SKF-BF offers a great reduction in terms of complexity with respect to the KF-BF. Also, the SKF-BF and OLMS-BF differ only by a small number of operations, thus confirming the similarity that was highlighted in Section 5. Finally, when ML ≫ M + L (which is usually the case in practice), we can notice that the algorithms tailored for bilinear forms (namely KF-BF, SKF-BF, and OLMS-BF) offer lower computational complexities as compared to their regular counterparts (i.e., KF, SKF, and JO-NLMS, respectively).

Next, a few important observations ought to be made regarding the specific parameters that must be set within the algorithms. Here, the noise power σ_v² is required in order to compute the Kalman gain vectors (for KF-BF and SKF-BF) or the optimal step-sizes (for OLMS-BF). In practice, we can estimate this parameter in different ways; some simple and efficient methods for this purpose are presented in [30,31]. Although there are various other methods that can be used to estimate the noise power, the analysis of their influence on the performance of the algorithms lies beyond the scope of this paper.
The parameters related to the uncertainties in the unknown systems also need to be set or estimated, i.e., σ²_{w_h} and σ²_{w_g}. Choosing small values for these parameters yields a low misalignment, but at the same time a poor tracking. On the other hand, large values (meaning that there are high uncertainties in the unknown systems) lead to a good tracking but also a high misalignment. This means that we always need to reach a good compromise between fast tracking and low misalignment. In practice, if we have some a priori information about the systems that we need to identify, we can take it into consideration when setting the values of these parameters. For example, if we assume the spatial impulse response to be time-invariant, we could fix σ²_{w_g} = 0 and tune only the parameter related to the temporal impulse response. Thus, based on the state equation related to h(n), together with the approximation ||w_h(n)||² ≈ Lσ²_{w_h} (which is valid when L ≫ 1), and replacing h(n) and h(n − 1) by their estimates, we can evaluate

σ̂²_{w_h}(n) = ||ĥ(n) − ĥ(n − 1)||²/L.    (76)

It can be noticed that the estimation from (76) is designed to achieve a proper compromise between good tracking and low misalignment. When the algorithm starts to converge or when there is an abrupt change of the system, the difference between ĥ(n) and ĥ(n − 1) is significant, leading to large values of the parameter σ̂²_{w_h}(n), therefore providing fast convergence and tracking. On the contrary, when the algorithm is converging to its steady-state, the difference between ĥ(n) and ĥ(n − 1) reduces, thus leading to the parameter σ̂²_{w_h}(n) taking small values and, consequently, to a low misalignment.
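The recursive estimate in (76) reduces to a scaled norm of the difference between consecutive filter estimates; a minimal sketch (the function name is ours):

```python
import numpy as np

def estimate_sigma_wh2(h_cur, h_prev):
    """Uncertainty estimate in the spirit of (76): relies on the approximation
    ||w_h(n)||^2 ~ L * sigma_wh^2 (valid for L >> 1), with the true impulse
    responses replaced by their estimates."""
    return np.sum((h_cur - h_prev) ** 2) / len(h_cur)

L = 64
h_prev = np.zeros(L)
h_abrupt = np.ones(L)            # abrupt system change  -> large estimate
h_settled = 1e-4 * np.ones(L)    # near steady state     -> small estimate
print(estimate_sigma_wh2(h_abrupt, h_prev) > estimate_sigma_wh2(h_settled, h_prev))  # True
```

This is exactly the behavior described above: a large value after an abrupt change (fast tracking), a small value near convergence (low misalignment).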

Results
Experiments are performed in the context of system identification, in order to highlight the performance of the Kalman-based algorithms for bilinear forms (referred to as KF-BF and SKF-BF), in comparison with their regular counterparts (KF and SKF, as mentioned in Section 6). Also, we aim to evaluate the features of the OLMS-BF algorithm, as compared to other existing solutions, e.g., the normalized LMS-BF (NLMS-BF) and the JO-NLMS algorithms, which were introduced in [12] and [29], respectively.
In most of the experiments, both the temporal and the spatial impulse responses are randomly generated from a Gaussian distribution, having lengths equal to L = 64 and M = 8, respectively. This leads to a length of the spatiotemporal impulse response equal to ML = 8 × 64 = 512. It is also useful to evaluate the tracking capabilities of the algorithms; for this purpose, a sudden change of the temporal impulse response is applied in the middle of the simulations, by generating a new random vector of length L = 64, also from a Gaussian distribution. Only in the last experiment, the impulse response h(n) is an acoustic echo path of length L = 512, while the coefficients of g(n) are computed as g_m(n) = 0.5^m, with m = 1, . . ., M and M = 4; in this case, the length of the global system is ML = 4 × 512 = 2048.
The input signals x_m(n), m = 1, 2, . . ., M, are either white Gaussian noises (WGNs) or AR(1) processes (obtained by passing a white Gaussian noise through a first-order system with the transfer function 1/(1 − 0.8z^{−1})). The additive noise v(n) is white and Gaussian, having the variance σ_v² = 0.01; we assume that this parameter is available in the experiments. In most of the simulations, the performance measure is the NM (in dB) (see (10)), which evaluates the identification of the global impulse response. In addition, in the second set of experiments (focusing on the OLMS-BF algorithm), we also involve the NPMs (based on (8) and (9)), related to the individual impulse responses.
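The inputs and the exponentially decaying spatial response from the experiments can be generated as follows (a sketch of the setup, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(2)

# AR(1) input: white Gaussian noise through the system 1/(1 - 0.8 z^-1)
N = 10000
w = rng.standard_normal(N)
x = np.empty(N)
x[0] = w[0]
for n in range(1, N):
    x[n] = 0.8 * x[n - 1] + w[n]

# The lag-1 autocorrelation should be close to the AR coefficient 0.8
rho = np.corrcoef(x[:-1], x[1:])[0, 1]
print(abs(rho - 0.8) < 0.05)  # True (up to estimation error)

# Spatial impulse response used in the last experiment: g_m = 0.5^m, m = 1..M
M = 4
g = 0.5 ** np.arange(1, M + 1)  # values 0.5, 0.25, 0.125, 0.0625
```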
In Figures 3 and 4, the KF-BF is compared to the regular KF for WGN and AR(1) input signals, respectively. The specific parameters of the algorithms are set to σ²_{w_h} = σ²_{w_g} = σ_w² = 10^{−9}. It can be noticed from both figures that the KF-BF achieves a faster convergence rate as compared to the regular KF, for both types of input signals, while also providing a better tracking capability. The gain is even more apparent in the case of AR(1) inputs.
The previous experiment is repeated (for the same two types of inputs) in Figures 5 and 6, this time comparing the SKF-BF with the regular SKF [26]. As can be observed, the simplified versions (SKF-BF and SKF) yield a slower convergence rate (especially in the case of AR(1) inputs) as compared to the full versions (KF-BF and KF, respectively); however, the computational complexities of these simplified versions are much lower. As expected, the SKF-BF outperforms the regular SKF in terms of the convergence rate; the improvement is much more visible in the case of AR(1) inputs.
Next, the performance of the SKF-BF is evaluated in Figures 7 and 8, but using the recursive estimate σ̂²_{w_h}(n) from (76) (instead of a constant value, as in the previous experiments). The spatial impulse response is assumed to be time-invariant, so that we can set σ²_{w_g} = 0. The regular SKF is considered for comparison, using a similar way to estimate its specific parameter, i.e., σ̂_w²(n) [26]. Because of the nature of these estimators (as explained in Section 6), the algorithms behave like variable step-size adaptive filters, achieving both low misalignment and fast convergence/tracking. Moreover, as we can notice from these two figures, the proposed SKF-BF still outperforms the regular SKF in terms of both performance criteria.

As outlined in Section 5, there are strong similarities between the SKF-BF and OLMS-BF algorithms. In Figures 9 and 10, we compare the performances of these algorithms using two types of input signals, i.e., WGNs and AR(1) processes, respectively. Both algorithms use the recursive estimate σ̂²_{w_h}(n) (from (76)) and σ²_{w_g} = 0. As we can notice, the SKF-BF and OLMS-BF algorithms behave quite similarly, especially when the input signals are WGNs (Figure 9). When the input signals are AR(1) processes (Figure 10), the SKF-BF outperforms the OLMS-BF in terms of the initial convergence rate, at the cost of a slower tracking reaction. Nevertheless, the overall performances of these algorithms are very similar, as supported by the comparison provided in Section 5.

In the second set of experiments, the behavior of the OLMS-BF algorithm is analyzed in comparison with the NLMS-BF algorithm [12]. The NLMS-BF algorithm uses different values of its step-size parameters, α_h and α_g. The performances are now evaluated in terms of both NPMs and NM, using both types of input signals as before (WGNs and AR(1) processes). The results are presented in Figures 11 and 12, using WGNs as inputs, and in Figures 13 and 14, where the input signals are AR(1) processes. It
can be noticed that the proposed solution achieves a similar convergence rate but a much lower misalignment level than the NLMS-BF algorithm with α_h = α_g = 0.5 (which provides the fastest convergence rate [12]). On the other hand, if we target a lower misalignment and set the step-sizes of the NLMS-BF to smaller values (i.e., α_h = α_g = 0.1 and α_h = α_g = 0.01), the convergence rate also decreases. However, the OLMS-BF algorithm leads to a misalignment level similar to the NLMS-BF algorithm using the smallest step-sizes. In addition, when the input signals are AR(1) processes, the improvement offered by the OLMS-BF algorithm is even more apparent.

Next, the performance of the OLMS-BF algorithm is evaluated along with the JO-NLMS algorithm [29], which is applied for the identification of the global impulse response of length ML = 512. The results are presented in Figures 15 and 16, using WGN and AR(1) input signals, respectively. As specified in Section 6, the JO-NLMS algorithm is the regular counterpart of the OLMS-BF in a classical (one-dimensional) system identification scenario. We can see that the proposed solution (tailored for bilinear forms, i.e., exploiting the two-dimensional decomposition) offers both faster convergence and tracking, as well as a lower misalignment, as compared to the JO-NLMS algorithm. The performance improvement is even more important in the case of AR(1) input signals.

Finally, to validate our approach, we assess the performance of the OLMS-BF algorithm in a context which is closer to a real scenario. The temporal impulse response h(n) is a real-world echo path of length L = 512. The spatial impulse response g(n), of length M = 4, is generated using an exponential decay, with the elements g_m = 0.5^m, m = 1, . . ., M. Both impulse responses are then normalized such that ||h(n)|| = ||g(n)|| = 1. The input signal is an AR(1) process and we compare the behaviors of the OLMS-BF and NLMS-BF algorithms. The performance is illustrated in Figures 17 and 18. We can notice that the proposed solution slightly outperforms the fastest convergence rate of the NLMS-BF, given by α_h = α_g = 0.5, while at the same time offering a much lower value of the misalignment. If, however, we use the NLMS-BF algorithm with the smaller step-sizes (in order to obtain a better misalignment), the resulting convergence rate is much lower than that of the OLMS-BF algorithm.

Discussion
In this paper, we have focused on the Kalman filter tailored for the identification of bilinear forms (KF-BF), together with its simplified version (SKF-BF). Also, we have developed an optimized version of the LMS algorithm for bilinear forms, namely OLMS-BF. In addition, a comparison between the SKF-BF and OLMS-BF algorithms has been outlined, indicating strong similarities between these two solutions. In our framework, the bilinear term has been defined with respect to the impulse responses of the spatiotemporal model.
The SKF-BF provides a reduced computational complexity as compared to the KF-BF; the downside is a slower convergence rate, which is more visible for correlated inputs. On the other hand, the SKF-BF and OLMS-BF algorithms perform very similarly. Experimental results also indicate that the algorithms tailored for bilinear forms outperform their regular counterparts (in such two-dimensional system identification scenarios), in terms of convergence rate and tracking, as well as steady-state misalignment. Adding to that the reduced computational amount provided by the use of two shorter adaptive filters instead of a single (much longer) one, we conclude that the proposed algorithms could represent appealing solutions for the identification of bilinear forms.
Appendix A

where p_g(n) is given in (61) and we took into account that the misalignment vector at time n − 1 and x_h(n) are uncorrelated.
Next, we should concentrate on the last expectation term in (A2). The main diagonal terms of the corresponding matrix are evaluated for m = 1, . . ., M. In the following, we consider the assumption that the input signals are independent and have the same power, while their covariance matrices are close to diagonal ones [27,28]; consequently, the term simplifies accordingly. Finally, using (A3) and (A4) in (A2), we obtain (A5). In a similar manner, the corresponding term from (55) is derived as (A6). The terms p_g(n) and p_h(n) are given in (61) and (62), respectively. Further, we detail the evaluation of the expectation term from (54). To begin, let us focus on the product x_h^T(n)x_h(n). Relying on the same considerations and assumptions as in (A3) and (A4), we obtain (A7) and, similarly, (A8). Hence, considering some degree of stationarity of the input signals, (A7) can be seen as a deterministic quantity. Let us now focus on the computation of the expectation term E[e²(n)]. Using (A1), this term can be expanded; at this point, we need to evaluate the variance of v_g(n), which is developed as (A11). For the corresponding term from (56), we use the dual expression for e(n) (see (14)), which leads to a similar development, with the variance of v_h(n) obtained in a manner similar to (A11). Thus, we finally obtain (A16). Summarizing, we can use (A5), (A6), (A13), and (A16) in (51) and (52), in order to obtain the recursive relations (63) and (64), which are further used in the development of the OLMS-BF algorithm.

Appendix B
We provide here (summarized in Table A1) the main parameters that were used throughout the paper, in order to facilitate the reading.

Figure 1. Equivalent system identification scheme when considering the system g(n) and the input x_h(n).

Figure 2. Equivalent system identification scheme when considering the system h(n) and the input x_g(n).

Figure 3. Normalized misalignment of the KF-BF and regular KF using WGNs as input signals. The length of the global impulse response is ML = 512. The specific parameters are set to σ²_{w_h} = σ²_{w_g} = σ_w² = 10^{−9}.

Figure 5. Normalized misalignment of the SKF-BF and regular SKF using WGNs as input signals. The length of the global impulse response is ML = 512. The specific parameters are set to σ²_{w_h} = σ²_{w_g} = σ_w² = 10^{−9}.

Figure 7. Normalized misalignment of the SKF-BF and regular SKF (for WGN input signals), using the recursive estimates σ̂²_{w_h}(n) and σ̂_w²(n), respectively; the SKF-BF uses σ²_{w_g} = 0. The length of the global impulse response is ML = 512.

Figure 8. Normalized misalignment of the SKF-BF and regular SKF (for AR(1) input signals), using the recursive estimates σ̂²_{w_h}(n) and σ̂_w²(n), respectively; the SKF-BF uses σ²_{w_g} = 0. The length of the global impulse response is ML = 512.

Figure 9. Normalized misalignment of the SKF-BF and OLMS-BF algorithms using WGNs as input signals. Both algorithms use the recursive estimate σ̂²_{w_h}(n) and σ²_{w_g} = 0. The length of the global impulse response is ML = 512.

Figure 17. Normalized projection misalignment of the OLMS-BF and NLMS-BF (using different step-size parameters): (Top) identification of the temporal impulse response h(n); (Bottom) identification of the spatial impulse response g(n). The input signals are AR(1) processes, L = 512, and M = 4.

Table A1. The main parameters used throughout the paper.

X(n)    zero-mean multiple-input signal matrix
x(n) = vec[X(n)]    input signal vector
y(n)    output signal at time n (i.e., the bilinear form)
d(n)    zero-mean desired signal at time n
v(n)    zero-mean additive noise, of variance σ_v²
h(n), g(n)    true impulse responses of the system, of lengths L and M, respectively
η    arbitrary scaling parameter
f(n)    true spatiotemporal (i.e., global) impulse response of the system, of length ML
ĥ(n), ĝ(n)    estimated impulse responses of the system
f̂(n) = ĝ(n) ⊗ ĥ(n)    estimated spatiotemporal impulse response of the system
x_h(n), x_g(n)    the input signals from Figures 1 and 2
ŷ(n)    estimated output signal at time n
e(n)    error signal between the desired and estimated signals at time n
w_h(n), w_g(n)    zero-mean white Gaussian noise vectors, with correlation matrices R_{w_h}(n) = σ²_{w_h} I_L and R_{w_g}(n) = σ²_{w_g} I_M, respectively
k_h(n), k_g(n)    Kalman gain vectors
c_h(n), c_g(n)    a posteriori misalignments corresponding to the two impulse responses, with correlation matrices R_{c_h}(n) and R_{c_g}(n), respectively
c_{ha}(n), c_{ga}(n)    a priori misalignments corresponding to the two impulse responses, with correlation matrices R_{c_{ha}}(n) and R_{c_{ga}}(n), respectively
m_h(n) = E[||c_h(n)||²], m_g(n) = E[||c_g(n)||²]    squared norms of the a posteriori misalignments
μ_{h,o}, μ_{g,o}    optimal step-size parameters corresponding to the two adaptive filters

Table 1. Notation used throughout the paper.

vec(C) = c    vectorization operation, i.e., conversion of a matrix (C of size L × M) into a vector (c of length ML)

Table 4. Computational complexity of the algorithms.