1. Introduction
Bilinear systems have been treated in various contexts [1], for example in the approximation of nonlinear systems. The resulting applications are numerous and include system identification [2,3,4,5], the design of digital filters [6], echo cancellation [7], chaotic communications [8], active noise control [9], and neural networks [10]. In all these works, the bilinear term is understood in terms of an input-output relation (with respect to the data).
More recently, a new approach was studied in [11], where the bilinear term was defined in the context of a multiple-input/single-output (MISO) system, with respect to the impulse responses of a spatiotemporal model. In [11], the Wiener filter solution for the identification of such bilinear forms was proposed; subsequently, in [12,13,14], adaptive solutions based on different basic algorithms were provided. Similar frameworks can be found in [15,16,17,18], in conjunction with specific applications such as channel equalization and nonlinear acoustic echo cancellation; however, these works were not associated with the identification of bilinear forms or analyzed in this context.
The Kalman filter was first introduced in [19] and has a wide range of applications in technology, as well as in signal processing [20]. Some of its applications in the automotive field are mentioned in [21,22] and the references therein, while [23] also provides a computational complexity analysis.
In this paper, we focus on a comparative study of two types of algorithms tailored for the identification of bilinear forms. The first one is based on the Kalman filter [19], whose bilinear form was introduced in our previous work [13], together with a simplified (i.e., low-complexity) version, which could be more suitable in real-world applications. The second algorithm is a version of the least-mean-square (LMS) adaptive filter in which the step-size parameter is optimized to achieve a proper compromise between convergence rate and misadjustment. In addition, as will be shown, there is a strong similarity between the simplified Kalman filter and the optimized LMS algorithm. Simulations show the good performance of the proposed solutions in comparison with related works based on conventional algorithms. Moreover, the gain could be twofold, in terms of both convergence performance and computational complexity.
The paper is structured as follows. Section 2 introduces the system model in the context of bilinear forms. In this framework, we present the Kalman filter together with its simplified version in Section 3. The optimized LMS algorithm is derived in Section 4. Next, a comparison between the simplified Kalman filter and the optimized LMS algorithm for bilinear forms is presented in Section 5. Section 6 provides some practical considerations. Simulations are performed in the framework of system identification and the results are given in Section 7. Finally, some conclusions are drawn in Section 8. In addition, Appendix A provides detailed computations of some terms required within the optimized LMS algorithm, while Appendix B summarizes and briefly explains the main parameters used in the paper, in order to facilitate the reading. For the sake of clarity, the main notation used in this work is provided in Table 1.
2. System Model
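The signal model used throughout the paper is given by [11]

$$d(n) = \mathbf{h}^T \mathbf{X}(n) \mathbf{g} + w(n) = y(n) + w(n), \tag{1}$$

where $d(n)$ is the zero-mean desired (or reference) signal at the discrete-time index $n$, $\mathbf{h}$ and $\mathbf{g}$ are the two impulse responses of the system, of lengths $L$ and $M$, respectively, the superscript $^T$ is the transpose operator, $\mathbf{X}(n) = [\mathbf{x}_1(n)\ \mathbf{x}_2(n)\ \cdots\ \mathbf{x}_M(n)]$ is the zero-mean multiple-input signal matrix, $\mathbf{x}_m(n) = [x_m(n)\ x_m(n-1)\ \cdots\ x_m(n-L+1)]^T$ is a vector containing the $L$ most recent time samples of the $m$th ($m = 1, 2, \ldots, M$) input signal, $y(n) = \mathbf{h}^T \mathbf{X}(n) \mathbf{g}$ is the output signal and it represents a bilinear form, and $w(n)$ is a zero-mean additive noise (with the variance $\sigma_w^2$). It is assumed that all signals are real-valued, and that $\mathbf{X}(n)$ and $w(n)$ are independent. As we can notice, $y(n)$ is a bilinear function of $\mathbf{h}$ and $\mathbf{g}$, because for every fixed $\mathbf{h}$, $y(n)$ is a linear function of $\mathbf{g}$, and for every fixed $\mathbf{g}$, it is a linear function of $\mathbf{h}$.

We can rewrite the matrix $\mathbf{X}(n)$, of size $L \times M$, as a vector of length $ML$, by using the vectorization operation:

$$\widetilde{\mathbf{x}}(n) = \mathrm{vec}[\mathbf{X}(n)] = [\mathbf{x}_1^T(n)\ \mathbf{x}_2^T(n)\ \cdots\ \mathbf{x}_M^T(n)]^T. \tag{2}$$

Therefore, the output signal can be expressed as

$$y(n) = \mathbf{h}^T \mathbf{X}(n) \mathbf{g} = (\mathbf{g} \otimes \mathbf{h})^T \widetilde{\mathbf{x}}(n) = \mathbf{f}^T \widetilde{\mathbf{x}}(n), \tag{3}$$

where $\otimes$ denotes the Kronecker product between the individual impulse responses, while the vector $\mathbf{f} = \mathbf{g} \otimes \mathbf{h}$, of length $ML$, represents the spatiotemporal (i.e., global) impulse response of the system. Consequently, the signal model in (1) becomes

$$d(n) = \mathbf{f}^T \widetilde{\mathbf{x}}(n) + w(n). \tag{4}$$

The major difference compared to a general MISO system is that, in this bilinear context, $\mathbf{f}$ is formed with only $M + L$ different elements, despite being of length $ML$.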
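The equivalence between the bilinear form and the Kronecker-structured linear model can be checked numerically. The following is a minimal sketch, assuming NumPy, small arbitrary dimensions, and illustrative variable names (`h`, `g`, `X`) for the temporal response, the spatial response, and the input matrix; none of these values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
L, M = 5, 3                       # illustrative lengths (assumed values)

h = rng.standard_normal(L)        # temporal impulse response, length L
g = rng.standard_normal(M)        # spatial impulse response, length M
X = rng.standard_normal((L, M))   # input matrix; column m is the m-th input vector

# Bilinear form: y = h^T X g
y_bilinear = h @ X @ g

# Equivalent linear form: y = (g kron h)^T vec(X),
# where vec() stacks the columns of X (column-major order).
f = np.kron(g, h)                 # global impulse response, length M*L
x_tilde = X.flatten(order="F")    # vec(X)
y_linear = f @ x_tilde

print(np.isclose(y_bilinear, y_linear))
print(f.size, "coefficients built from", L + M, "elements")
```

The column-major flattening matters here: with row-major stacking, the matching global response would be the Kronecker product taken in the opposite order.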
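The goal is the identification of the two impulse responses $\mathbf{h}$ and $\mathbf{g}$ and, in this way, of the spatiotemporal impulse response $\mathbf{f}$. For this aim, we can use two adaptive filters, $\widehat{\mathbf{h}}(n)$ and $\widehat{\mathbf{g}}(n)$; hence, the global impulse response can be evaluated as

$$\widehat{\mathbf{f}}(n) = \widehat{\mathbf{g}}(n) \otimes \widehat{\mathbf{h}}(n). \tag{5}$$

Let $\eta \neq 0$ be a real-valued constant. We can see from (1) that

$$y(n) = \mathbf{h}^T \mathbf{X}(n) \mathbf{g} = \left(\frac{1}{\eta}\mathbf{h}\right)^T \mathbf{X}(n) \left(\eta \mathbf{g}\right), \tag{6}$$

meaning that the pairs $(\mathbf{h}/\eta,\ \eta\mathbf{g})$ and $(\mathbf{h},\ \mathbf{g})$ are equivalent in the bilinear form. This implies that we can only identify $\widehat{\mathbf{h}}(n)$ and $\widehat{\mathbf{g}}(n)$ up to a scaling factor. A similar discussion can be found in [15,18] in the framework of blind identification/equalization and nonlinear acoustic echo cancellation, respectively. Nevertheless, because

$$\mathbf{f} = \mathbf{g} \otimes \mathbf{h} = (\eta \mathbf{g}) \otimes \left(\frac{1}{\eta}\mathbf{h}\right), \tag{7}$$

the global impulse response can be identified with no scaling ambiguity. Consequently, for the performance evaluation of the identification of the temporal and spatial filters, we can use the normalized projection misalignment (NPM), defined in [24]:

$$\mathrm{NPM}[\mathbf{h}, \widehat{\mathbf{h}}(n)] = 1 - \left[\frac{\mathbf{h}^T \widehat{\mathbf{h}}(n)}{\|\mathbf{h}\|\,\|\widehat{\mathbf{h}}(n)\|}\right]^2, \tag{8}$$

$$\mathrm{NPM}[\mathbf{g}, \widehat{\mathbf{g}}(n)] = 1 - \left[\frac{\mathbf{g}^T \widehat{\mathbf{g}}(n)}{\|\mathbf{g}\|\,\|\widehat{\mathbf{g}}(n)\|}\right]^2, \tag{9}$$

where $\|\cdot\|$ denotes the Euclidean norm. On the other hand, for the identification of the global filter, $\mathbf{f}$, we should use the normalized misalignment:

$$\mathrm{NM}[\mathbf{f}, \widehat{\mathbf{f}}(n)] = \frac{\|\mathbf{f} - \widehat{\mathbf{f}}(n)\|^2}{\|\mathbf{f}\|^2}. \tag{10}$$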
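To illustrate why a projection-based measure is used for the individual filters while a plain normalized misalignment suffices for the global one, the sketch below builds a pair of estimates that differ from the true responses only by a scaling factor. It assumes NumPy; the function names `npm` and `nm` and the value of `eta` are chosen here for illustration only.

```python
import numpy as np

def npm(true, est):
    """Normalized projection misalignment: 1 - (normalized correlation)^2."""
    c = (true @ est) / (np.linalg.norm(true) * np.linalg.norm(est))
    return 1.0 - c ** 2

def nm(true, est):
    """Normalized misalignment: squared error norm over squared true norm."""
    return np.linalg.norm(true - est) ** 2 / np.linalg.norm(true) ** 2

rng = np.random.default_rng(1)
h = rng.standard_normal(6)        # true temporal response
g = rng.standard_normal(4)        # true spatial response

eta = 2.5                         # arbitrary nonzero scaling (assumed value)
h_hat = h / eta                   # estimates that are "perfect up to scaling"
g_hat = eta * g

print(npm(h, h_hat), npm(g, g_hat))              # both close to 0
print(nm(h, h_hat))                              # approximately 0.36, penalizes the scaling
print(nm(np.kron(g, h), np.kron(g_hat, h_hat)))  # close to 0, the scaling cancels
```

The plain normalized misalignment of the individual filter is nonzero here even though the bilinear form is reproduced exactly, which is the reason for evaluating the individual filters with the NPM.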
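In [11], this bilinear system identification problem was studied in terms of the Wiener filter. The assumption there was that the two impulse responses to be identified are time-invariant (a basic assumption in the context of the Wiener filter). In practice, however, these systems could vary in time. For this reason, in this paper, we approach the system identification problem considering that the systems to be identified vary in time. Thus, we assume that $\mathbf{h}(n)$ and $\mathbf{g}(n)$ are zero-mean random vectors, following a simplified first-order Markov model, i.e.,

$$\mathbf{h}(n) = \mathbf{h}(n-1) + \mathbf{w}_{\mathbf{h}}(n), \tag{11}$$

$$\mathbf{g}(n) = \mathbf{g}(n-1) + \mathbf{w}_{\mathbf{g}}(n), \tag{12}$$

where $\mathbf{w}_{\mathbf{h}}(n)$ and $\mathbf{w}_{\mathbf{g}}(n)$ are zero-mean white Gaussian noise vectors, with correlation matrices $\mathbf{R}_{\mathbf{w}_{\mathbf{h}}} = \sigma^2_{w_{\mathbf{h}}} \mathbf{I}_L$ and $\mathbf{R}_{\mathbf{w}_{\mathbf{g}}} = \sigma^2_{w_{\mathbf{g}}} \mathbf{I}_M$, respectively (with $\mathbf{I}_L$ and $\mathbf{I}_M$ being the identity matrices of sizes $L \times L$ and $M \times M$, respectively). It is considered that $\mathbf{w}_{\mathbf{h}}(n)$ is uncorrelated with $\mathbf{h}(n-1)$ and $w(n)$, while $\mathbf{w}_{\mathbf{g}}(n)$ is uncorrelated with $\mathbf{g}(n-1)$ and $w(n)$. The variances $\sigma^2_{w_{\mathbf{h}}}$ and $\sigma^2_{w_{\mathbf{g}}}$ capture the uncertainties in $\mathbf{h}(n)$ and $\mathbf{g}(n)$, respectively.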
3. Kalman Filter for Bilinear Forms
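In this section, we address the previously described system identification problem in terms of the Kalman filter, summarizing the findings from [13]. In this context, the signal model from (1) may be interpreted as the observation equation, while the system impulse responses can be considered as state equations. Given the two adaptive filters $\widehat{\mathbf{h}}(n)$ and $\widehat{\mathbf{g}}(n)$, the estimated signal is given by

$$\widehat{y}(n) = \widehat{\mathbf{h}}^T(n-1)\, \mathbf{X}(n)\, \widehat{\mathbf{g}}(n-1). \tag{13}$$

As a result, the a priori error signal between the desired and estimated signals can be defined as

$$e(n) = d(n) - \widehat{y}(n) = d(n) - \widehat{\mathbf{h}}^T(n-1)\, \mathbf{x}_{\widehat{\mathbf{g}}}(n) = d(n) - \widehat{\mathbf{g}}^T(n-1)\, \mathbf{x}_{\widehat{\mathbf{h}}}(n), \tag{14}$$

where

$$\mathbf{x}_{\widehat{\mathbf{g}}}(n) = [\widehat{\mathbf{g}}(n-1) \otimes \mathbf{I}_L]^T\, \widetilde{\mathbf{x}}(n) = \mathbf{X}(n)\, \widehat{\mathbf{g}}(n-1), \tag{15}$$

$$\mathbf{x}_{\widehat{\mathbf{h}}}(n) = [\mathbf{I}_M \otimes \widehat{\mathbf{h}}(n-1)]^T\, \widetilde{\mathbf{x}}(n) = \mathbf{X}^T(n)\, \widehat{\mathbf{h}}(n-1). \tag{16}$$

In the context of the linear sequential Bayesian approach, the optimal estimates of the state vectors have the forms [25]:

$$\widehat{\mathbf{h}}(n) = \widehat{\mathbf{h}}(n-1) + \mathbf{k}_{\mathbf{h}}(n)\, e(n), \tag{17}$$

$$\widehat{\mathbf{g}}(n) = \widehat{\mathbf{g}}(n-1) + \mathbf{k}_{\mathbf{g}}(n)\, e(n), \tag{18}$$

where $\mathbf{k}_{\mathbf{h}}(n)$ and $\mathbf{k}_{\mathbf{g}}(n)$ are the Kalman gain vectors. Next, we can define the a posteriori misalignments (which represent the state estimation errors) related to the temporal and spatial impulse responses:

$$\mathbf{c}_{\mathbf{h}}(n) = \frac{1}{\eta}\mathbf{h}(n) - \widehat{\mathbf{h}}(n), \tag{19}$$

$$\mathbf{c}_{\mathbf{g}}(n) = \eta\, \mathbf{g}(n) - \widehat{\mathbf{g}}(n), \tag{20}$$

for which the correlation matrices are $\mathbf{R}_{\mathbf{c}_{\mathbf{h}}}(n) = E[\mathbf{c}_{\mathbf{h}}(n)\mathbf{c}_{\mathbf{h}}^T(n)]$ and $\mathbf{R}_{\mathbf{c}_{\mathbf{g}}}(n) = E[\mathbf{c}_{\mathbf{g}}(n)\mathbf{c}_{\mathbf{g}}^T(n)]$, respectively, where $E[\cdot]$ denotes mathematical expectation. As mentioned in Section 2, we can only identify the impulse responses up to the arbitrary scaling factor $\eta$; however, the pair $\mathbf{h}(n)/\eta$ and $\eta\mathbf{g}(n)$ is equivalent to the pair $\mathbf{h}(n)$ and $\mathbf{g}(n)$ in the bilinear form. Let us also define the a priori misalignments related to the two impulse responses:

$$\mathbf{c}_{\mathbf{h}_a}(n) = \frac{1}{\eta}\mathbf{h}(n) - \widehat{\mathbf{h}}(n-1) = \mathbf{c}_{\mathbf{h}}(n-1) + \frac{1}{\eta}\mathbf{w}_{\mathbf{h}}(n), \tag{21}$$

$$\mathbf{c}_{\mathbf{g}_a}(n) = \eta\, \mathbf{g}(n) - \widehat{\mathbf{g}}(n-1) = \mathbf{c}_{\mathbf{g}}(n-1) + \eta\, \mathbf{w}_{\mathbf{g}}(n), \tag{22}$$

whose correlation matrices are $\mathbf{R}_{\mathbf{c}_{\mathbf{h}_a}}(n) = E[\mathbf{c}_{\mathbf{h}_a}(n)\mathbf{c}_{\mathbf{h}_a}^T(n)]$ and $\mathbf{R}_{\mathbf{c}_{\mathbf{g}_a}}(n) = E[\mathbf{c}_{\mathbf{g}_a}(n)\mathbf{c}_{\mathbf{g}_a}^T(n)]$, respectively.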
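To simplify the upcoming developments, let us multiply $\mathbf{w}_{\mathbf{h}}(n)$ and $\mathbf{w}_{\mathbf{g}}(n)$ by $1/\eta$ and $\eta$, respectively, and introduce the notation:

$$\widetilde{\mathbf{w}}_{\mathbf{h}}(n) = \frac{1}{\eta}\mathbf{w}_{\mathbf{h}}(n), \tag{23}$$

$$\widetilde{\mathbf{w}}_{\mathbf{g}}(n) = \eta\, \mathbf{w}_{\mathbf{g}}(n). \tag{24}$$

These new terms are also zero-mean white Gaussian noise vectors, having the correlation matrices $\mathbf{R}_{\widetilde{\mathbf{w}}_{\mathbf{h}}} = \sigma^2_{\widetilde{w}_{\mathbf{h}}} \mathbf{I}_L$ and $\mathbf{R}_{\widetilde{\mathbf{w}}_{\mathbf{g}}} = \sigma^2_{\widetilde{w}_{\mathbf{g}}} \mathbf{I}_M$, respectively. Clearly, we have $\sigma^2_{\widetilde{w}_{\mathbf{h}}} = \sigma^2_{w_{\mathbf{h}}}/\eta^2$ and $\sigma^2_{\widetilde{w}_{\mathbf{g}}} = \eta^2 \sigma^2_{w_{\mathbf{g}}}$. Consequently, using this notation in (21) and (22), we obtain

$$\mathbf{R}_{\mathbf{c}_{\mathbf{h}_a}}(n) = \mathbf{R}_{\mathbf{c}_{\mathbf{h}}}(n-1) + \sigma^2_{\widetilde{w}_{\mathbf{h}}} \mathbf{I}_L, \tag{25}$$

$$\mathbf{R}_{\mathbf{c}_{\mathbf{g}_a}}(n) = \mathbf{R}_{\mathbf{c}_{\mathbf{g}}}(n-1) + \sigma^2_{\widetilde{w}_{\mathbf{g}}} \mathbf{I}_M. \tag{26}$$

In this context, the Kalman gain vectors are computed from the minimization of the criteria:

$$J_{\mathbf{h}}(n) = \frac{1}{L}\mathrm{tr}[\mathbf{R}_{\mathbf{c}_{\mathbf{h}}}(n)], \tag{27}$$

$$J_{\mathbf{g}}(n) = \frac{1}{M}\mathrm{tr}[\mathbf{R}_{\mathbf{c}_{\mathbf{g}}}(n)], \tag{28}$$

with respect to $\mathbf{k}_{\mathbf{h}}(n)$ and $\mathbf{k}_{\mathbf{g}}(n)$, respectively, where $\mathrm{tr}[\cdot]$ denotes the trace of a square matrix. From these minimizations, we find that

$$\mathbf{k}_{\mathbf{h}}(n) = \frac{\mathbf{R}_{\mathbf{c}_{\mathbf{h}_a}}(n)\, \mathbf{x}_{\widehat{\mathbf{g}}}(n)}{\mathbf{x}_{\widehat{\mathbf{g}}}^T(n)\, \mathbf{R}_{\mathbf{c}_{\mathbf{h}_a}}(n)\, \mathbf{x}_{\widehat{\mathbf{g}}}(n) + \sigma_w^2}, \tag{29}$$

$$\mathbf{k}_{\mathbf{g}}(n) = \frac{\mathbf{R}_{\mathbf{c}_{\mathbf{g}_a}}(n)\, \mathbf{x}_{\widehat{\mathbf{h}}}(n)}{\mathbf{x}_{\widehat{\mathbf{h}}}^T(n)\, \mathbf{R}_{\mathbf{c}_{\mathbf{g}_a}}(n)\, \mathbf{x}_{\widehat{\mathbf{h}}}(n) + \sigma_w^2}, \tag{30}$$

and

$$\mathbf{R}_{\mathbf{c}_{\mathbf{h}}}(n) = [\mathbf{I}_L - \mathbf{k}_{\mathbf{h}}(n)\, \mathbf{x}_{\widehat{\mathbf{g}}}^T(n)]\, \mathbf{R}_{\mathbf{c}_{\mathbf{h}_a}}(n), \tag{31}$$

$$\mathbf{R}_{\mathbf{c}_{\mathbf{g}}}(n) = [\mathbf{I}_M - \mathbf{k}_{\mathbf{g}}(n)\, \mathbf{x}_{\widehat{\mathbf{h}}}^T(n)]\, \mathbf{R}_{\mathbf{c}_{\mathbf{g}_a}}(n). \tag{32}$$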
Summarizing, the Kalman filter for bilinear forms (namely KF-BF) is defined by Equations (17), (18), (25), (26), and (29)–(32). As we can notice, the computational complexity of this algorithm is proportional to
. For the purpose of reducing the computational complexity of the KF-BF, a simplified version of this algorithm is derived. The idea of this low-complexity version was inspired by the work presented in [
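26], in the context of echo cancellation. First, let us assume that the KF-BF has reached steady-state convergence. Consequently, $\mathbf{R}_{\mathbf{c}_{\mathbf{h}_a}}(n)$ and $\mathbf{R}_{\mathbf{c}_{\mathbf{g}_a}}(n)$ tend to become diagonal matrices, with all the elements on the main diagonal equal to small positive numbers, $r_{\mathbf{c}_{\mathbf{h}_a}}(n)$ and $r_{\mathbf{c}_{\mathbf{g}_a}}(n)$, respectively. Therefore, we can use the approximations:

$$\mathbf{R}_{\mathbf{c}_{\mathbf{h}_a}}(n) \approx r_{\mathbf{c}_{\mathbf{h}_a}}(n)\, \mathbf{I}_L, \tag{33}$$

$$\mathbf{R}_{\mathbf{c}_{\mathbf{g}_a}}(n) \approx r_{\mathbf{c}_{\mathbf{g}_a}}(n)\, \mathbf{I}_M. \tag{34}$$

Hence, the Kalman gain vectors for the temporal and spatial impulse responses become

$$\mathbf{k}_{\mathbf{h}}(n) = \frac{\mathbf{x}_{\widehat{\mathbf{g}}}(n)}{\mathbf{x}_{\widehat{\mathbf{g}}}^T(n)\, \mathbf{x}_{\widehat{\mathbf{g}}}(n) + \delta_{\mathbf{h}}(n)}, \tag{35}$$

$$\mathbf{k}_{\mathbf{g}}(n) = \frac{\mathbf{x}_{\widehat{\mathbf{h}}}(n)}{\mathbf{x}_{\widehat{\mathbf{h}}}^T(n)\, \mathbf{x}_{\widehat{\mathbf{h}}}(n) + \delta_{\mathbf{g}}(n)}, \tag{36}$$

where $\delta_{\mathbf{h}}(n) = \sigma_w^2 / r_{\mathbf{c}_{\mathbf{h}_a}}(n)$ and $\delta_{\mathbf{g}}(n) = \sigma_w^2 / r_{\mathbf{c}_{\mathbf{g}_a}}(n)$ can be seen as variable regularization parameters. Then, we use the Kalman gain vectors from (35) and (36) in the updates (17) and (18), respectively.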
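The step from the full Kalman gain to the simplified, regularized one can be verified with a few lines of code. The sketch below assumes the standard Kalman gain for a scalar observation, k = R x / (x^T R x + sigma_w^2), and an a priori correlation matrix that is exactly a scaled identity; the variable names and numerical values are illustrative assumptions, not settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
L = 8
x_g = rng.standard_normal(L)   # input vector seen by the temporal filter
sigma_w2 = 0.01                # additive noise variance (assumed value)
r = 1e-3                       # common diagonal value of the a priori matrix (assumed value)

# Full gain with a scaled-identity correlation matrix R = r * I
R = r * np.eye(L)
k_full = R @ x_g / (x_g @ R @ x_g + sigma_w2)

# Simplified gain: normalized input with regularization delta = sigma_w^2 / r
delta = sigma_w2 / r
k_simplified = x_g / (x_g @ x_g + delta)

print(np.allclose(k_full, k_simplified))
```

Dividing numerator and denominator of the full gain by `r` yields the simplified expression, so the matrix-vector products of cost proportional to the square of the filter length are replaced by a single inner product. A small `r` (confident estimate) gives a large regularization and hence small updates.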
Next, a new simplification can be made, by considering that the matrices appearing in the updates of
and
can be approximated as
We can perform these approximations because, as the filters start to converge, the misalignments of the individual coefficients tend to become uncorrelated; due to this fact, the matrices
and
tend to become diagonal. Using the notation:
together with
we can summarize the simplified Kalman filter for bilinear forms (SKF-BF) in
Table 2. As we can notice, its computational complexity is proportional to
, which represents an important gain as compared to KF-BF.
4. Optimized LMS Algorithm for Bilinear Forms
In this section, we approach the system identification problem based on the LMS algorithm, aiming to optimize its step-size parameter in order to address the compromise between the main performance criteria, i.e., convergence rate versus misadjustment [27]. In the following, the proposed optimized LMS algorithm for bilinear forms (namely OLMS-BF) is derived based on the same system model given in Section 2. As will be explained in Section 5, this algorithm strongly resembles the SKF-BF, even though their derivations follow different paths.
Let us consider the two estimated impulse responses
and
, such that the estimated signal is given by (13). As a consequence, the a priori error signal between the desired signal and the estimated one can be defined following (14), i.e.,
where
and
, while
and
are defined in (15) and (16), respectively.
The desired signal
may also be expressed as
where the term:
can be seen as an additional “noise” term, of variance
, introduced by the system
. The terms
and
were previously defined in (19) and (21), respectively. For the sake of simplicity, the scaling factor
does not appear explicitly in the following. As explained before, this parameter is included in the expression of the uncertainty parameter, thus leading to the notation from (23) and (24).
In a similar way, for the second system we have
where
can be interpreted as an additional “noise” term, of variance
, related to the system
. Here, the a posteriori and a priori misalignments corresponding to the system
were defined in (20) and (22), respectively. In
Figure 1 and
Figure 2, the equivalent system identification scheme is represented in terms of the two components,
and
, respectively; it can be observed that each system influences the other one through the additional “noise” term.
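In the framework of the LMS algorithm for bilinear forms, namely LMS-BF [12], the updates are the following:

$$\widehat{\mathbf{h}}(n) = \widehat{\mathbf{h}}(n-1) + \mu_{\mathbf{h}}\, \mathbf{x}_{\widehat{\mathbf{g}}}(n)\, e(n), \tag{47}$$

$$\widehat{\mathbf{g}}(n) = \widehat{\mathbf{g}}(n-1) + \mu_{\mathbf{g}}\, \mathbf{x}_{\widehat{\mathbf{h}}}(n)\, e(n), \tag{48}$$

where $\mu_{\mathbf{h}}$ and $\mu_{\mathbf{g}}$ are the step-size parameters. In this context, the vectors corresponding to the a posteriori misalignments become

$$\mathbf{c}_{\mathbf{h}}(n) = \mathbf{c}_{\mathbf{h}}(n-1) + \widetilde{\mathbf{w}}_{\mathbf{h}}(n) - \mu_{\mathbf{h}}\, \mathbf{x}_{\widehat{\mathbf{g}}}(n)\, e(n), \tag{49}$$

$$\mathbf{c}_{\mathbf{g}}(n) = \mathbf{c}_{\mathbf{g}}(n-1) + \widetilde{\mathbf{w}}_{\mathbf{g}}(n) - \mu_{\mathbf{g}}\, \mathbf{x}_{\widehat{\mathbf{h}}}(n)\, e(n). \tag{50}$$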
At this point, let us introduce the notation
and
. Taking the square
norms in both sides of (49) and (50), respectively, we can recursively evaluate
where
It is very difficult to further process the expectation terms from (53)–(56) (and, consequently, (51) and (52)) without any supporting assumptions on the character of the input signals. Hence, let us consider that the covariance matrices of the inputs are close to diagonal. This is a fairly restrictive assumption on the input signals, but it has been widely used to simplify the convergence analysis of many adaptive algorithms [27,28]. Also, let us consider that the input signals are independent and have the same power. In this context, the computations of the expectation terms from (53)–(56) are detailed in Appendix A. Summarizing the results from this appendix, these terms result in
where
and the terms denoted by
and
are evaluated as
The expressions of the variances
and
are also provided in
Appendix A. Consequently, using (57)–(60) in (51) and (52), we obtain
In the context of system identification problems, the main goal is to reduce the system misalignment, which basically represents the difference between the true impulse response and the estimated one. Therefore, in our framework, the optimal step-size parameters (denoted in the following by
and
) can be found by minimizing (63) and (64). This is done by canceling the derivatives of (63) and (64) with respect to the step-sizes, which result in:
By replacing
, and
with their expressions (see (57)–(60)), the step-size parameters of the proposed optimized LMS algorithm for bilinear forms (namely OLMS-BF) are found. Finally, introducing these parameters in (47) and (48), the updates of the OLMS-BF algorithm become
The most problematic terms in (67) and (68) are and (from (61) and (62), respectively), which depend on the true impulse responses. However, as shown in the next section, these terms could be omitted in practice.
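For orientation, the structure shared by the LMS-type updates for bilinear forms (one error signal driving two coupled filters) can be sketched as below. This is not the OLMS-BF itself: it uses fixed normalized step sizes, and the normalization form, the regularization `delta`, the initialization, and all numerical values are assumptions made for illustration. For brevity, each input matrix is drawn independently rather than built from tapped delay lines.

```python
import numpy as np

def nlms_bf(X_seq, d, L, M, alpha_h=0.25, alpha_g=0.25, delta=1e-6):
    """Normalized LMS-type recursion for a bilinear form (illustrative sketch)."""
    # Nonzero initialization is needed: with both filters at zero,
    # both update directions vanish and the recursion never starts.
    h_hat = np.zeros(L)
    h_hat[0] = 1.0
    g_hat = np.full(M, 1.0 / M)
    for X, d_n in zip(X_seq, d):
        x_g = X @ g_hat            # input seen by the temporal filter
        x_h = X.T @ h_hat          # input seen by the spatial filter
        e = d_n - h_hat @ x_g      # a priori error, common to both filters
        h_hat = h_hat + alpha_h * e * x_g / (x_g @ x_g + delta)
        g_hat = g_hat + alpha_g * e * x_h / (x_h @ x_h + delta)
    return h_hat, g_hat

rng = np.random.default_rng(3)
L, M, N = 8, 4, 5000               # illustrative sizes (assumed values)
h = rng.standard_normal(L)
g = rng.standard_normal(M)
f = np.kron(g, h)                  # true global impulse response

X_seq = rng.standard_normal((N, L, M))
d = np.array([h @ X @ g for X in X_seq]) + 0.01 * rng.standard_normal(N)

# Normalized misalignment of the global response before and after adaptation
f_init = np.kron(np.full(M, 1.0 / M), np.eye(L)[0])
nm_start = np.sum((f - f_init) ** 2) / np.sum(f ** 2)

h_hat, g_hat = nlms_bf(X_seq, d, L, M)
f_hat = np.kron(g_hat, h_hat)
nm_end = np.sum((f - f_hat) ** 2) / np.sum(f ** 2)

print("NM before (dB):", 10 * np.log10(nm_start))
print("NM after  (dB):", 10 * np.log10(nm_end))
```

With fixed step sizes, a larger `alpha_h` or `alpha_g` speeds up convergence at the price of a higher steady-state misalignment; this is the compromise that the optimized step sizes are meant to resolve.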
6. Practical Considerations
The previously developed algorithms are designed to identify the individual impulse responses of the bilinear form. The global (spatiotemporal) impulse response can then be computed as the Kronecker product between them. An alternative solution is to use the regular Kalman filter to identify the spatiotemporal impulse response directly, relying on the observation Equation (4) together with the state equation:

$$\mathbf{f}(n) = \mathbf{f}(n-1) + \mathbf{w}_{\mathbf{f}}(n),$$

where $\mathbf{w}_{\mathbf{f}}(n)$ is a zero-mean white Gaussian noise signal vector. The covariance matrix of $\mathbf{w}_{\mathbf{f}}(n)$ is $\mathbf{R}_{\mathbf{w}_{\mathbf{f}}} = \sigma^2_{w_{\mathbf{f}}} \mathbf{I}_{ML}$, where $\mathbf{I}_{ML}$ is the identity matrix of size $ML \times ML$ and the variance $\sigma^2_{w_{\mathbf{f}}}$ captures the uncertainties in $\mathbf{f}(n)$.
In this way, following the approach from [26], we can easily derive the regular Kalman filter (KF) and its simplified version (namely SKF), which can identify the global impulse response using a single adaptive filter $\widehat{\mathbf{f}}(n)$; for further details, see Sections VI and VII in [26]. However, the solution found using the regular KF and SKF involves an adaptive filter of length $ML$, whereas their counterparts tailored for bilinear forms (i.e., KF-BF and SKF-BF) use two shorter filters of lengths $L$ and $M$, respectively. As a consequence, besides a lower computational complexity, a much faster convergence rate and tracking are expected for the bilinear algorithms with respect to the conventional ones. The same ideas apply to the OLMS-BF algorithm, as compared to its regular counterpart, i.e., the joint-optimized normalized LMS (JO-NLMS) algorithm [29], which could be used to identify the global impulse response $\mathbf{f}(n)$.
The computational complexity of the previously discussed algorithms is summarized in Table 4. It can easily be seen that the SKF-BF offers a great reduction in complexity with respect to the KF-BF. Also, the SKF-BF and OLMS-BF differ only by a small number of operations, thus confirming the similarity that was highlighted in
Section 5. Finally, when
(which is usually the case in practice), we can notice that the algorithms tailored for bilinear forms (namely KF-BF, SKF-BF, and OLMS-BF) offer lower computational complexities as compared to their regular counterparts (i.e., KF, SKF, and JO-NLMS, respectively).
Next, a few important observations ought to be made regarding the specific parameters that must be set within the algorithms. The noise power $\sigma_w^2$ is required in order to compute the Kalman gain vectors (for KF-BF and SKF-BF) or the optimal step-sizes (for OLMS-BF). In practice, we can estimate this parameter in different ways; some simple and efficient methods for this purpose are presented in [30,31]. Although various other methods can be used to estimate the noise power, the analysis of their influence on the performance of the algorithms lies beyond the scope of this paper.
The parameters related to the uncertainties in the unknown systems, i.e., $\sigma^2_{\widetilde{w}_{\mathbf{h}}}$ and $\sigma^2_{\widetilde{w}_{\mathbf{g}}}$, also need to be set or estimated. Choosing small values for these parameters yields a small misalignment, but at the same time poor tracking. On the other hand, large values (meaning that there are high uncertainties in the unknown systems) lead to good tracking but also a high misalignment. This means that we always need a good compromise between fast tracking and low misalignment. In practice, if some a priori information about the systems to be identified is available, we can take it into consideration when setting the values of these parameters. For example, if we assume the spatial impulse response to be time-invariant, we could fix $\sigma^2_{\widetilde{w}_{\mathbf{g}}} = 0$ and tune only the parameter related to the temporal impulse response. Thus, based on the state equation related to $\mathbf{h}(n)$, together with the approximation $\|\widetilde{\mathbf{w}}_{\mathbf{h}}(n)\|^2 \approx L \sigma^2_{\widetilde{w}_{\mathbf{h}}}$ (which is valid when $L \gg 1$), and replacing $\mathbf{h}(n)$ and $\mathbf{h}(n-1)$ by their estimates, we can evaluate

$$\widehat{\sigma}^2_{\widetilde{w}_{\mathbf{h}}}(n) = \frac{1}{L}\left\|\widehat{\mathbf{h}}(n) - \widehat{\mathbf{h}}(n-1)\right\|^2. \tag{76}$$

It can be noticed that the estimate from (76) is designed to achieve a proper compromise between good tracking and low misalignment. When the algorithm starts to converge or when there is an abrupt change of the system, the difference between $\widehat{\mathbf{h}}(n)$ and $\widehat{\mathbf{h}}(n-1)$ is significant, leading to large values of the parameter $\widehat{\sigma}^2_{\widetilde{w}_{\mathbf{h}}}(n)$, therefore providing fast convergence and tracking. On the contrary, when the algorithm is converging to its steady state, the difference between $\widehat{\mathbf{h}}(n)$ and $\widehat{\mathbf{h}}(n-1)$ reduces, thus leading to small values of $\widehat{\sigma}^2_{\widetilde{w}_{\mathbf{h}}}(n)$ and, consequently, to a low misalignment.
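The behavior described above can be illustrated with a short sketch. It assumes that the estimator is the squared Euclidean distance between two consecutive filter estimates divided by the filter length, as the surrounding text suggests; the function name `estimate_uncertainty` and the test vectors are hypothetical.

```python
import numpy as np

def estimate_uncertainty(h_hat_new, h_hat_old):
    """Average squared change per coefficient between consecutive estimates."""
    diff = h_hat_new - h_hat_old
    return (diff @ diff) / diff.size

h_old = np.zeros(4)
h_far = np.array([1.0, -1.0, 1.0, -1.0])   # large update, e.g., after an abrupt system change
h_near = h_old + 1e-3                       # small update, close to steady state

print(estimate_uncertainty(h_far, h_old))   # 1.0
print(estimate_uncertainty(h_near, h_old))  # on the order of 1e-6
```

A large value of the estimate lowers the regularization in the gain and speeds up adaptation, while a small value does the opposite, so the filter behaves like a variable step-size algorithm.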
7. Results
Experiments are performed in the context of system identification, in order to highlight the performance of the Kalman-based algorithms for bilinear forms (referred to as KF-BF and SKF-BF), in comparison with their regular counterparts (KF and SKF, as mentioned in Section 6). Also, we aim to evaluate the features of the OLMS-BF algorithm, as compared to other existing solutions, e.g., the normalized LMS-BF (NLMS-BF) and the JO-NLMS algorithms, which were introduced in [12] and [29], respectively.
In most of the experiments, both the temporal and the spatial impulse responses are randomly generated from a Gaussian distribution, having the lengths equal to and , respectively. This leads to a length of the spatiotemporal impulse response equal to . It is also useful to evaluate the tracking capabilities of the algorithms; to this purpose, a sudden change in the temporal impulse response is applied in the middle of simulations, by generating a new random vector of length , also from a Gaussian distribution. Only in the last experiment, the impulse response is an acoustic echo path of length , while the coefficients of are computed as , with and ; in this case, the length of the global system is .
The input signals are either white Gaussian noises (WGNs) or AR(1) processes (which were obtained after passing a white Gaussian noise through a first-order system with the transfer function ). The additive noise is white and Gaussian, having the variance ; we assume that this parameter is available in the experiments. In most of simulations, the measure of the performance is the NM (in dB) (see (10)), to evaluate the identification of the global impulse response. In addition, in the second set of experiments (focusing on the OLMS-BF algorithm), we also involve the NPMs (based on (8) and (9)), related to the individual impulse responses.
In
Figure 3 and
Figure 4, the KF-BF is compared to the regular KF for WGN and AR(1) input signals, respectively. The specific parameters of the algorithms are set to
. It can be noticed from both figures that the KF-BF achieves a faster convergence rate as compared to the regular KF, for both types of input signals, providing also a better tracking capability. The gain is even more apparent in case of AR(1) inputs.
The previous experiment is repeated (for the same two types of inputs) in Figure 5 and Figure 6, this time comparing the SKF-BF with the regular SKF [26]. As can be observed, the simplified versions (SKF-BF and SKF) yield a slower convergence rate (especially in the case of AR(1) inputs) as compared to the full versions (KF-BF and KF, respectively); however, the computational complexities of these simplified versions are much lower. As expected, the SKF-BF outperforms the regular SKF in terms of the convergence rate; the improvement is much more visible in the case of AR(1) inputs.
Next, the performance of the SKF-BF is evaluated in Figure 7 and Figure 8, but using the recursive estimate $\widehat{\sigma}^2_{\widetilde{w}_{\mathbf{h}}}(n)$ from (76) (not a constant value, as in the previous experiments). The spatial impulse response is assumed to be time-invariant, so that we can set $\sigma^2_{\widetilde{w}_{\mathbf{g}}} = 0$. The regular SKF is considered for comparison, using a similar way to estimate its specific parameter, i.e., $\widehat{\sigma}^2_{w_{\mathbf{f}}}(n)$ [26]. Because of the nature of the estimators (as explained in Section 6), the algorithms behave like variable step-size adaptive filters, achieving both low misalignment and fast convergence/tracking. Moreover, as we can notice from these two figures, the proposed SKF-BF still outperforms the regular SKF in terms of both performance criteria.
As outlined in Section 5, there are strong similarities between the SKF-BF and OLMS-BF algorithms. In Figure 9 and Figure 10, we compare the performances of these algorithms using two types of input signals, i.e., WGNs and AR(1) processes, respectively. Both algorithms use the recursive estimate $\widehat{\sigma}^2_{\widetilde{w}_{\mathbf{h}}}(n)$ (from (76)) and $\sigma^2_{\widetilde{w}_{\mathbf{g}}} = 0$. As we can notice, the SKF-BF and OLMS-BF algorithms behave quite similarly, especially when the input signals are WGNs (Figure 9). When the input signals are AR(1) processes (Figure 10), the SKF-BF outperforms the OLMS-BF in terms of the initial convergence rate, at the cost of a slower tracking reaction. Nevertheless, the overall performances of these algorithms are very similar, as supported by the comparison provided in Section 5.
In the second set of experiments, the behavior of the OLMS-BF algorithm is analyzed, as compared to the NLMS-BF algorithm [
12]. The NLMS-BF algorithm uses different values of its step-size parameters,
and
. The performances are now evaluated in terms of both NPMs and NM, using both types of input signals as before (WGNs and AR(1) processes). The results are presented in
Figure 11 and
Figure 12, using WGNs as inputs, and in
Figure 13 and
Figure 14, where the input signals are AR(1) processes. It can be noticed that the proposed solution achieves similar convergence rate but a much lower misalignment level than the NLMS-BF algorithm with
(which provides the fastest convergence rate [
12]). On the other hand, if we target a lower misalignment and set the step-sizes of the NLMS-BF to smaller values (i.e.,
and
), the convergence rate also decreases. However, the OLMS-BF algorithm leads to a misalignment level similar to the NLMS-BF algorithm using the smallest step-sizes. In addition, when the input signals are AR(1) processes, the improvement offered by the OLMS-BF algorithm is even more apparent.
Next, the performance of the OLMS-BF algorithm is evaluated along with the JO-NLMS algorithm [29], which is applied for the identification of the global impulse response of length $ML$. The results are presented in Figure 15 and Figure 16, using WGNs and AR(1) input signals, respectively. As specified in Section 6, the JO-NLMS algorithm is the regular counterpart of the OLMS-BF in a classical (one-dimensional) system identification scenario. We can see that the proposed solution (tailored for bilinear forms, i.e., exploiting the two-dimensional decomposition) offers both faster convergence and tracking, as well as a lower misalignment, as compared to the JO-NLMS algorithm. The performance improvement is even more pronounced in the case of AR(1) input signals.
Finally, to validate our approach, we assess the performance of the OLMS-BF algorithm when applying it in a context which is closer to a real scenario. The temporal impulse response
is a real-world echo path of length
. The spatial impulse response
, of length
, is generated using an exponential decay with the elements
,
. Both impulse responses are then normalized such that
. The input signal is an AR(1) process and we compare the behaviors of the OLMS-BF and NLMS-BF algorithms. The performance is illustrated in
Figure 17 and
Figure 18. We can notice that the proposed solution slightly outperforms the fastest convergence rate of NLMS-BF, given by
, but at the same time offering a much lower value of the misalignment. If, however, we use the NLMS-BF algorithm with the smaller step-sizes (in order to obtain a better misalignment), the resulting convergence rate is much lower than the one of the OLMS-BF algorithm.