An Insightful Overview of the Wiener Filter for System Identiﬁcation

: Efﬁciently solving a system identiﬁcation problem represents an important step in nu-merous important applications. In this framework, some of the most popular solutions rely on the Wiener ﬁlter, which is widely used in practice. Moreover, it also represents a benchmark for other related optimization problems. In this paper, new insights into the regularization of the Wiener ﬁlter are provided, which is a must in real-world scenarios. A proper regularization technique is of great importance, especially in challenging conditions, e.g., when operating in noisy environments and/or when only a low quantity of data is available for the estimation of the statistics. Different regularization methods are investigated in this paper, including several new solutions that ﬁt very well for the identiﬁcation of sparse and low-rank systems. Experimental results support the theoretical developments and indicate the efﬁciency of the proposed techniques.


Introduction
The Wiener filter has been an extremely useful tool since its invention by Norbert Wiener [1]. It represents one of the most popular solutions for system identification problems and finds important applications within a wide range of domains [2][3][4][5]. Basically, the Wiener filter is able to estimate a reference (or desired) random process by filtering an observed noisy process, which is somehow related (or correlated) to the reference signal. In this framework, the filtering process is linear and time invariant, all the signals (input, reference, and noise) are considered stationary processes, and the external (additive) noise that usually corrupts the reference signal is uncorrelated with the input signal. Overall, the Wiener filter is a statistical-based approach, where the solution results following an optimization criterion that relies on the minimization of the mean-squared error. In this context, the error is defined as the difference between the reference signal and the output of the filter.
The optimal solution (i.e., the optimal coefficients) corresponding to the Wiener filter can be simply found by solving a linear system of L equations, where L denotes the length of the impulse response. In other words, the vector (of length L) containing the optimal coefficients is obtained by multiplying the inverse of the covariance matrix of the input signal (of size L × L) with the cross-correlation vector (of length L) between the input and reference signals.
Despite this straightforward mathematical formulation, there are several major challenges related to the use of the Wiener filter in real-world applications. First, the accuracy of the conventional Wiener solution relies on the accuracy of the statistical estimates-namely, the covariance matrix and the cross-correlation vector mentioned previously. In order to reliably estimate these statistics, a large number of data samples (i.e., much larger than L) should be available. However, this is not always the case in practice, since in many applications, a low quantity of data is available, or there is incomplete information about the system. This challenge becomes even more apparent when facing the identification of long-length impulse responses. Second, the matrix inversion operation is also a critical step that should be carefully handled [6]. The basic idea is to avoid an ill-conditioning or a rank-deficient problem, which could result in a bad matrix inversion. For this purpose, the covariance matrix of the input signal needs to be regularized before inversion, usually by adding a positive constant to its main diagonal. It is known that the value of this regularization parameter becomes important in noisy conditions [7]. In other words, the additive noise that usually corrupts the reference signal could significantly influence the overall performance of the Wiener filter, especially in heavy noise environments.
The accuracy of the statistical estimates depends on specific constraints of particular applications, in terms of the availability of the quantity of data. Nevertheless, reformulating a conventional system identification problem using smaller data structures represents a potential solution to improve the behavior of the Wiener filter when dealing with a small quantity of data [8]. In addition, the matrix inversion operation could be avoided by using alternative coordinate descent methods [9]. However, the problem of handling an ill-conditioned matrix remains a critical issue, which can be addressed in terms of using a proper regularization technique, as mentioned before.
The previously mentioned challenges related to the Wiener filter have been addressed in the context of different applications. Recently, in [10], the Kronecker product decomposition of the impulse response was exploited in order to improve the accuracy of the Wiener filter for long-length impulse responses identification. Due to this feature, the resulting solution fits very well in the framework of echo cancellation, for example. In addition, several regularization methods were developed for denoising applications and image restoration, e.g., [11][12][13] and the references therein. Nevertheless, there is still room for performance improvement, especially when using the Wiener filter in challenging conditions, as explained above.
The goal of this paper is to provide new insights into the Wiener filter for system identification problems. The main focus concerns the important practical aspects related to the regularization of this optimal filter in difficult conditions, e.g., working in noisy environments and/or handling a limited set of available data/information. Following this introduction, Sections 2 and 3 briefly present the system model behind the conventional Wiener filter and some useful background, respectively. Next, Section 4 details different regularization methods of the Wiener filter, also showing how the sparse character of the system can be exploited in this context. Furthermore, a solution obtained using the Kronecker product decomposition of the impulse response is developed in Section 5, which is suitable for low-rank systems identification. In Section 6, the quadratic eigenvalue problem is used for the regularization of the Wiener filter, which can be successfully applied when some a priori knowledge on the impulse response is available. All these techniques and theoretical findings are supported by the simulation results provided in Section 7. Finally, the main conclusions are summarized in Section 8.

System Model and the Conventional Wiener Filter
Let us consider a linear single-input/single-output (SISO) system [14], whose output at the discrete-time index k is given by where d(k) is also known as the reference or desired signal, T is a vector containing the most recent L time samples of the zero-mean input signal x(k), the superscript T denotes the transpose operator, h is the unknown system impulse response of length L, and w(k) is the zero-mean additive noise. It is assumed that the signals take real values and that x(k) and w(k) are uncorrelated. Then, the objective of the so-called SISO system identification problem is to identify or estimate h, in the best possible way, given the input signal vector, x(k), and the output signal, d(k). For this purpose, the celebrated Wiener technique is the most appropriate [15][16][17].
From (1), it is clear that the variance of d(k) is where denoting mathematical expectation, and σ 2 w is the variance of w(k). It can be deduced from (2) that the signal-to-noise ratio (SNR) of the studied SISO system as presented in (1) is Even though this SNR depends explicitly on the channel impulse response of the system, in many practical identification problems, this SNR is roughly known even if h is not available [18].
Let h be a real-valued linear filter of length L, which is also an estimate of h, so that y(k) = h T x(k) is an estimate of d(k). Then, from the minimization of the popular mean-squared error (MSE) criterion the conventional Wiener filter [3,5] is found where p = E[x(k)d(k)] represents the cross-correlation vector (of length L) between x(k) and d(k). The block diagram of the conventional Wiener filter is depicted in Figure 1. While h W leads to satisfactory results in many important applications [18], there is still room for improvement.  It should be outlined that in different real-world applications, the presence of outliers (in the input signal and/or external noises) could bias the MSE criterion and, consequently, the solution from (5). This is due to the fact that the basic theory behind the conventional Wiener filter relies on the assumption of zero-mean and stationary signals. In order to cope with different outliers, the conventional approach should further involve additional control and robustness mechanisms, e.g., see [19] and the references therein. The investigation of such particular cases is beyond the scope of this paper.

Useful Definitions
In this section, many definitions that will be useful in the rest of this paper are presented. Let be a real-value vector of length L. The 2 (or Euclidean) norm can be defined as The 1 norm is given by Another very useful vector norm is the 0 norm: where card{·} stands for cardinality. In other words, the 0 norm of h is simply the number of nonzero elements of h. Let H be a rectangular matrix of size L 1 × L 2 containing real values, where without loss of generality, it is always assumed that L 1 ≥ L 2 . The Frobenius norm of H is defined as where tr(·) denotes the trace of a square matrix. The singular value decomposition (SVD) technique [20] is a very powerful tool that can decompose any matrix H as where U 1 = u 1,1 u 1,2 · · · u 1,L 1 and U 2 = u 2,1 u 2,2 · · · u 2,L 2 are two orthogonal matrices of sizes L 1 × L 1 and L 2 × L 2 , respectively, and Σ is an L 1 × L 2 rectangular diagonal matrix having on its main diagonal nonnegative real numbers. The columns of U 1 (resp. U 2 ) are known as the left-singular (resp. right-singular) vectors of H, whereas the diagonal entries σ l , l = 1, 2, . . . , L 2 of Σ are called the singular values of H with σ 1 ≥ σ 2 ≥ · · · ≥ σ L 2 ≥ 0. In the case when the rank of H is equal to P ≤ L 2 , i.e., rank(H) = P, then this matrix may be decomposed as From (10), it is deduced that the nuclear norm of H is Let L = L 1 L 2 . h can be decomposed as where s l , l = 1, 2, . . . , L 2 are L 2 short vectors of length L 1 each. As a result, the vector h of length L 1 L 2 can be transformed into a rectangular matrix of size L 1 × L 2 : Therefore, h = vec(H), where by vec(·) it is denoted the vectorization operation, consisting of converting a matrix into a vector. Furthermore, H = ivec(h), where ivec(·) is the inverse of the vectorization operation.
Let h be the system impulse response of length L. A good measure of the sparseness of h is [21,22] The closer ξ 12 (h) is to 1, the sparser is h.
or, equivalently, Let C be a convex set. The convex envelope of a function f : C → R is defined as the largest convex function g such that g(x) ≤ f (x) for all x ∈ C [23].

Wiener Filter with Different Kinds of Regularization
The Wiener filter can definitely be improved with an appropriate regularization, which should depend on the properties of the system that needs to be identified. In this section, the most useful regularization techniques are exposed.
The most well-known regularization approach is the so-called Tikhonov regularization, which is based on the 2 norm. The Wiener problem is now where δ ≥ 0 is the regularization parameter, which controls the tradeoff between the fidelity of the original data and large values of the coefficients of the filter. The optimization problem in (18) has obviously a closed-form solution, which is where I L is the L × L identity matrix. The filter h W,δ is much more reliable than h W , especially when R is ill conditioned, and SNR(h) is low.
If the system is sparse, the following optimization problem is more appropriate [24]: This problem finds the sparsest solution, but unfortunately, it is NP-hard [24]. Clearly, a large value of δ leads to a sparse filter. Using the fact that the convex envelop of h which is now a convex problem [25] and is roughly equivalent to (20). The optimization problem in (21) does not have a closed-form solution, but many iterative/adaptive algorithms exist in the literature that converge to the optimal solution, including proportionatetype adaptive filters [22]. The main advantage of (21) is that even if R is very ill conditioned, or its estimate is performed with a limited number of observations, if the system is sparse, then a better solution than the conventional Wiener filter can be expected.
Taking the gradient of J h + 2δ h 1 with respect to h and equating the result to zero, it can be found that where is a diagonal matrix and h l , l = 1, 2, . . . , L are the elements of h. As a result, (22) can be rewritten as From the previous expression, it is possible to derive a simple iterative algorithm. At iteration 0, one may take Then, at iteration i, it can be written where δ 1 > 0 is a regularization parameter that is not necessarily equal to δ. The iterations continue until convergence.
In the same way, H = ivec h . In many applications, H is far to be a full-column rank matrix and this important information should be somehow exploited. In other words, H is a low-rank system. Therefore, in the MSE criterion, it makes more sense to apply a regularization based on the rank of H, i.e., [26] Unfortunately, the matrix rank minimization in (27) is NP-hard, in general, due to the combinatorial nature of the function rank(·) [26]. Interestingly, when the matrix H is diagonal, the previous problem reduces to the optimization problem in (20). Consequently, the sparse problem is taken into account in (27). Using the fact that the convex envelop [27] of rank H is H * , (27) is roughly equivalent to the optimization problem [26]: Some iterative algorithms exist to solve this problem, but they do not seem to work well for long impulse responses; furthermore, complexity may be prohibitive since an approximate SVD needs to be computed at each iteration. Recent insights can be found in [28] and the references therein.

Wiener Filter via Kronecker Product Decomposition
In this section, the investigation on the identification of low-rank systems with the Wiener filter is further discussed. An attempt to minimize the nuclear norm of the system in the regularization part was presented in the previous section, but this approach does not seem very practical, and complexity may be extremely high, especially for large matrices.
Here, instead, it is assumed that the rank of the L 1 × L 2 matrix H is known and is much smaller than L 2 , i.e., rank(H) = P L 2 . In this case, H can also be expressed as where h 1,p and h 2,p , of lengths L 1 and L 2 , respectively, are impulse responses. This decomposition of H can be explicitly used in the MSE criterion as explained below. As a result where ⊗ is the Kronecker product. Furthermore, the filter h can also be decomposed as where h 1,p and h 2,p are filters having lengths L 1 and L 2 , respectively. By using the following relationships: where by I L 1 and I L 2 are denoted the identity matrices of sizes L 1 × L 1 and L 2 × L 2 , respectively, into (31), it is obtained that where respectively. The two expressions from (33) may be rewritten as where are matrices of sizes L 1 L 2 × PL 1 and L 1 L 2 × PL 2 , respectively, and T are vectors of lengths PL 1 and PL 2 , respectively. With this Kronecker product decomposition, it can be seen that h 1 and h 2 need to be identified instead of h, i.e., P(L 1 + L 2 ) components instead of L = L 1 L 2 .
Assume that h 2 (or, equivalently, H 2 ) is fixed. Then, the MSE criterion is The minimization of J h 1 | h 2 with respect to h 1 leads to In the same way, assume that h 1 (or, equivalently, H 1 ) is fixed. Then, the MSE criterion is The minimization of J h 2 | h 1 with respect to h 2 leads to By alternatively iterating between h 1,W and h 2,W , an iterative Wiener filter is obtained, that converges to the true solution [10]. This approach, as shown in [10], performs better than the conventional Wiener filter in various contexts of noise and available data for the estimation of the involved second-order statistics.
A more robust iterative Wiener filter must include regularization based on the 2 norm, i.e., from which the following optimal solutions are found: which is equivalent to regularize the covariance matrix R. Again, by alternatively iterating between h 1,W,δ and h 2,W,δ , a robust iterative Wiener filter is obtained, that converges to the true solution. In order to avoid any potential numerical issues, a very small positive constant should be added to the diagonal of the matrices in (41) and (42) before inversion.

Wiener Filter with the Quadratic Eigenvalue Problem
Assume that some knowledge is available on the impulse response, h, i.e., h 2 2 = α 2 with α 2 > 0. This information can be used to exactly regularize (with the Tikhonov technique) the Wiener filter with the help of the so-called quadratic eigenvalue problem (QEP) [29,30]. This approach is explained in this section.
The criterion is From the above criterion, the Lagrange function can be written: where λ is a Lagrange multiplier. Differentiating L h, λ with respect to h and λ and equating to zero yields the equations: It can be shown [31,32] that the smallest solution λ of the two previous equations is needed to solve (43).
Assume that λ is not an eigenvalue of R and let In fact, because R > 0, all the eigenvalues of R are real and strictly positive. To be in accordance with the regularization technique, R − λI L also needs to be positive definite. Consequently, λ needs to always be negative.
Relations (45) and (46) can be written in terms of h as follows: It is easily deduced from this second equation that Developing (48) and using the previous result, it is obtained that As a consequence, (51) can be rewritten as or alternatively, where and Q(λ) represents the λ-matrix, which is a degree 2 L × L matrix polynomial. In (53), the QEP [29] can be recognized, where h is the right eigenvector corresponding to the eigenvalue λ. Consequently, the solution of (43) is h = (R − λI L ) −1 p, where λ is the smallest eigenvalue of (53). The QEP represents a generalization of the standard eigenvalue problem (SEP), Au = λu, and the generalized eigenvalue problem (GEP), Au = λBu. Having a λ-matrix of size L × L, the QEP has 2L eigenvalues (finite or infinite) containing up to 2L right and 2L left eigenvectors. If there are more than L eigenvectors, then they cannot, of course, be linearly independent [29]. In the current context, since A 2 = I L is nonsingular, the QEP has 2L finite eigenvalues. Because A 0 , A 1 , and A 2 are symmetric, the eigenvalues are real or come in pairs (λ, λ * ) [29], where the superscript * denotes the complex conjugate operator.
An important observation regarding the QEP is that the linearization process is not unique. It is said that a 2L × 2L matrix A − λB represents a linearization of the L × L λ-matrix Q(λ) if the eigenvalues of these matrices are identical. Due to this process, the QEP can be solved with the GEP. One option is to take It can be checked that if t is an eigenvector of Q(λ), then is an eigenvector of A − λB. This process corresponds to the symmetric linearization since both A and B are symmetric. The accuracy of this approach will depend on the conditioning of A 0 and A 2 . Another possibility is to take where I 2L is the 2L × 2L identity matrix. This linearization is not symmetric. Different linearizations may yield different results; therefore, choosing the right one is important. The eigenvalues of Q(λ) can be ordered as where (·) denotes the real part of a complex number. Then, the regularized Wiener filter with the QEP is Now, if the eigenvalues with the P smallest real parts are considered, another version of the Wiener filter is obtained where

Simulations
In this section, simulation results are provided to support the main theoretical findings related to the regularization of the Wiener filter, which were presented in Sections 4-6. All the experiments were performed using MATLAB R2018b on an Asus GL552VX device (Windows 10 OS), having an Intel Core i7-6700HQ CPU @ 2.60 GHz, with 4 Cores, 8 Logical Processors, and 16 GB of RAM.

Conventional Wiener Filter with 2 -Norm Regularization
First, it is important to assess the performance of the conventional Wiener filter, which depends on several important factors, such as the accuracy of the statistical estimates, the SNR, and the regularization parameter. The performance measure is the normalized misalignment (in dB), which is evaluated based on (16). The input signal, x(k), is an AR(1) process with a pole at 0.9, while the additive noise, w(k), is white and Gaussian, and different values of the SNR are used. The unknown system, h, is the first impulse response from G168 Recommendation [33], which is padded with zeros up to the length L = 500. It represents a network echo path, which is sparse in nature. Its sparsity measure can be evaluated based on (15) and results in ξ 12 (h) = 0.8957.
The first critical factor that influences the overall performance of the Wiener filter is the accuracy of the statistical estimates-namely, the covariance matrix, R, and the cross-correlation vector, p. Let us consider that N data samples are used to estimate these quantities, which result in A higher value of N (as compared to the filter length L) leads to a reliable set of estimates. On the other hand, when a low quantity of data samples is used for the estimation of the statistics (i.e., N ≈ L), the conventional Wiener filter is not able to provide an accurate solution. Such a scenario is very challenging in different practical applications, where only an incomplete set of data/information is available. Moreover, in the framework of a low SNR environment, the challenges become even more apparent.
In order to highlight these aspects, the performance of the conventional Wiener filter depending on N and for different SNRs is illustrated. Having available the statistical estimates from (63) and (64), the conventional Wiener solution is obtained based on (19). In this simulation, a very small value of the regularization parameter is used, i.e., δ = 10 −8 . The results are presented in Figure 2. As expected, a larger value of N improves the Wiener solution, while the accuracy is also influenced by a lower SNR. The critical scenario appears for N ≈ L, where the conventional Wiener filter is not able to obtain a reliable solution even for a high SNR.  Next, the influence of the regularization parameter, δ, on the performance of the conventional Wiener filter is evaluated in different SNR conditions. First, in Figure 3a,b, good SNR conditions are considered, by setting SNR = 30 dB and SNR = 20 dB, respectively, while choosing different values for δ. As it can be noticed, the influence of the regulariza-tion parameter becomes more apparent when using a low quantity of data (e.g., N ≈ L) to estimate the statistics. In this case, a larger value of δ can improve the accuracy of the solution. The influence of the regularization parameter becomes less important for N L; in these conditions, a larger value of δ can even bias the Wiener solution. Nevertheless, in low SNR environments, a larger value of the regularization parameter is required. This is supported in Figure 3c,d, in which the SNR is set to 10 dB and 0 dB, respectively. As it can be observed, the largest value of δ improves the performance for any value of N. However, the 2 -norm regularization used in (19) does not exploit the sparse character of the system, which is taken into account within the 1 -norm framework presented in Section 4. Details of these results [focusing on the initial parts of the plots from Figure 3a-d] are shown in Figure 4a-d.

Iterative Wiener Filter with 1 -Norm Regularization
The iterative 1 -norm regularization from (26) can improve the Wiener solution for sparse system identification. This is supported in the next set of experiments, where the iterative 1 -norm regularization is compared to the conventional 2 -norm regularization of the Wiener filter.
In Figure 5, different SNRs are set and a large quantity of data is used to estimate the statistics, i.e., N = 5000. The conventional Wiener filter uses δ = 10 −8 , while the iterative 1 -norm regularization involves different values of the parameter δ 1 from (26). As it can be observed in Figure 5 (where N L), for some values of δ 1 , the iterative 1 -norm regularization leads to better performance. Details of these results [focusing on the overlapping curves in Figure 5a-d] are shown in Figure 6a-d. This is more apparent for low values of N, e.g., when N = L, as illustrated in Figure 7. In this case, the conventional 2 -norm regularization is clearly outperformed by the iterative 1 -norm version, for all the values of δ 1 .
The SNR also influences the performance of the iterative 1 -norm regularization, and the values of δ 1 from (26) depend on the noise level. As it can be observed in Figures 5-7, a higher value of δ 1 is required when SNR decreases. On the other hand, the gain of the iterative 1 -norm regularization is more apparent for low SNRs, where it performs much better with respect to the conventional Wiener filter using the 2 -norm regularization. Finding an optimal value of δ 1 is beyond the scope of this paper, in which the main purpose is to outline the features and the potential of the 1 -norm regularization for the identification of sparse systems. However, based on the previous experimental results, some empirical ranges for the values of this parameter could be recommended, depending on the SNR. For example, in good SNR conditions (e.g., SNR ≥ 20 dB), a reliable choice could be On the other hand, in moderate or low SNR environments (e.g., 0 ≤ SNR < 20 dB), another option could be δ 1 | SNR(dB)<20dB = 10 −(2+log 10  The validity of these recommended choices are verified in Figure 8, in which the iterative 1 -norm regularization is compared with the conventional 2 -norm regularization in various SNR conditions and for different values of N. Here, δ = δ 1 , where δ 1 is computed based on (65) and (66), depending on the SNR. As it can be inferred, the iterative 1 -norm regularization works significantly better in all the cases, but its gain is more apparent for low SNRs. Furthermore, the results indicate a good behavior for low values of N, especially for the critical case when N ≈ L.

Wiener Filter Based on the Kronecker Product Decomposition
The Wiener filter based on the Kronecker product decomposition (KPD) presented in Section 5 involves smaller data structures with respect to the conventional Wiener filter. While the conventional solution is obtained by directly solving (19), the KPD-based Wiener filter iteratively combines the solutions from (41) and (42). In other words, instead of dealing with a single filter of length L = L 1 L 2 (as in the conventional case), the KPD approach handles two shorter filters of lengths PL 1 and PL 2 , with P L 2 . The advantages become more apparent for the identification of long-length impulse responses, such as in echo cancellation [10].
In Figure 9, the conventional Wiener filter is compared with the KPD-based solution using different values of the decomposition parameter P. The regularization parameter is set to δ = 10 −8 for both algorithms. In this simulation, N is set to 5000 and different SNRs are used. As shown in Section 5, the value of P is related to the rank of the matrix H, which reshapes the impulse response h. In this scenario, the length of h is L = 500, so that the values L 1 = 25 and L 2 = 20 can be used. In this case, rank(H) = 3. As it can be observed in Figure 9a-c, the performance of the KPD-based Wiener filter is improved as the value of P increases up to the rank of H. The advantages become more apparent for lower SNRs, where the KPD solution outperforms the conventional Wiener filter for most of the values of P. It is interesting to notice that in heavy noise conditions (e.g., SNR = 0 dB), a lower value of P could be used, as indicated in Figure 9d.
In terms of the quantity of data available for the estimation of the statistics, the value N = 5000 corresponds to a favorable scenario, since L = 500, so that N = 10L in this case. Therefore, a reliable set of statistical estimates [see (63) and (64)] are available. However, even in this favorable scenario, the performance of the conventional Wiener filter is still influenced by the SNR level, as already supported in Figure 2. In other words, the accuracy of its solution is worse when the SNR decreases, as can also be inferred from Figure 9. On the other hand, the KPD-based Wiener filter operates with smaller data structures, and it is less influenced by the accuracy of the statistical estimates, as compared to the conventional Wiener filter. Consequently, it is also more robust to the SNR level. Especially in noisy conditions, i.e., with moderate and low SNRs (e.g., 10 dB and 0 dB, respectively), the KPD-based Wiener filter (for any value of P) outperforms the conventional Wiener solution. Next, the performance of the KPD-based Wiener filter (using P = 3) is evaluated for different values of N and in different SNR conditions, using the conventional Wiener filter as a benchmark for comparison. As it can be observed in Figure 10, the KPD approach outperforms the conventional solution for all the values of N and in all SNR conditions. The advantages become more apparent for low values of N, i.e., N ≈ L.

Wiener Filter with QEP Regularization
Finally, the last experiment is dedicated to the QEP regularization from (60), which is explained in Section 6. The linearization methods from (54), (55), (57), and (58) are referred in Figure 11 as QEP 1 and QEP 2, respectively. The conventional Wiener filter using δ = 10 −8 is considered for comparison. In Figure 11, different values of N and different SNRs are used. It can be inferred that the QEP regularization outperforms the regular 2 -norm regularization, especially for low values of N. In heavy noise conditions (e.g., SNR = 0 dB), the QEP approach leads to better results for all the values of N, as indicated in Figure 11d. It can also be observed that the two linearization methods perform in a very similar way. It is interesting to compare the results from Figure 11 with the plots from Figure 3. In this context, it can be observed that the QEP regularization is close to the optimal 2 -norm regularization of the Wiener filter.

Conclusions
The regularization process is a critical step in any ill-posed problem, especially in noisy environments. This important step is also required in conjunction with the wellknown Wiener filter, to avoid an ill-conditioned matrix inversion, as well as to control and improve its performance in low SNR conditions. In this paper, the focus was on several regularization techniques of the Wiener filter.
As shown in Section 4, the classical 2 -norm (or Tikhonov) regularization performs reasonably well only in good or moderate SNR environments. Nevertheless, for sparse systems, the overall performance of the Wiener filter can be improved by using a regularization technique based on the 1 norm. The iterative solution proposed in this paper outperforms the 2 -norm regularization, especially in challenging conditions, i.e., for low SNRs and/or when using less accurate statistical estimates.
The KPD-based approach is also suitable when only a small quantity of data is available for the estimation of the statistics, as supported in Section 5. The resulting iterative KPD-based Wiener filter can lead to significantly better results, as compared to the conventional Wiener filter for the identification of low-rank systems. This KPD-based version combines the solutions provided by two shorter filters, while the conventional Wiener filter operates with a single filter of much longer length. Consequently, the KPDbased approach fits very well for the identification of long-length impulse responses.
Finally, the QEP regularization technique presented in Section 6 improves the performance of the Wiener filter (as compared to the 2 -norm regularization), especially in noisy environments (i.e., low SNRs), but also when using less accurate estimates of the statistics. Experimental results have supported the main theoretical developments and indicated the robust performance of these regularization techniques.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.