A Recursive Least-Squares with a Time-Varying Regularization Parameter

: Recursive least-squares (RLS) algorithms are widely used in many applications, such as real-time signal processing, control and communications. In some applications, regularization of the least-squares provides robustness and enhances performance. Interestingly, updating the regularization parameter as processing data continuously in time is a desirable strategy to improve performance in applications such as beamforming. While many of the presented works in the literature assume non-time-varying regularized RLS (RRLS) techniques, this paper deals with a time-varying RRLS as the parameter varies during the update. The paper proposes a novel and efﬁcient technique that uses an approximate recursive formula, assuming a slight variation in the regularization parameter to provide a low-complexity update method. Simulation results illustrate the feasibility of the derived formula and the superiority of the time-varying RRLS strategy over the ﬁxed one.


Introduction
Least-squares (LS) estimation methods emerge crucially in many applied sciences and engineering applications such as geodesy, adaptive filters, statistics, signal processing and control theory, to name a few. It is widespread due to its simple and efficient implementation. The standard LS problem dates back to 1795 AD when Gauss used it to study planetary motions [1]. An obvious advantage of this estimation method is that it does not require prior information on data; only a signal model is assumed [2]. In many applications, data are increased progressively in time. In this case, data can be processed for leastsquares problems in two manners, either waiting for all the available data to be collected or processing them sequentially in time. The latter is known as recursive least-squares (RLS), or sequential least squares [2]. The RLS algorithm is one of the most significant advancements of the LS estimation in the twentieth century, which Gauss also conceived in his work on celestial bodies. The original work on the RLS algorithm in modern times is often credited to Plackett [3].
When working on LS problems, there are some cases where regularization is needed to compromise solution biasing and algorithmic robustness [4]. Different regularized LS and regularized RLS algorithms have emerged. One of these algorithms is the standard regularized exponentially weighted RLS that employs a regularization matrix whose elements fade exponentially to zero. However, some applications require maintaining regularization throughout the adaptive estimation process, such as beamforming. In addition, regularization ensures the existence of the sample covariance matrix inverse when the matrix has fewer rows than columns or is rank deficient. Regularization has a rich history in matrix computation and numerical analysis methods [5]. Tikhonov regularization methods are among the famous and earliest references on the regularization topic [6].
The literature concerning regularized RLS is quite limited because of the complexity associated with implementation [7]. In many research works, the regularization term does not fade but is treated fixed in all recursion steps, e.g., [8,9]. However, in some cases, such as adaptive beamforming, this regularization term needs to be adjusted, i.e., time-varying, to cope with changes in the data or noise [10,11]. The work in [12] proposes a special regularization structure that is updated in time using rank-1 matrices. The structure allows inflation or deflation of the regularization matrix and can ensure, on average, the invertibility of the sample covariance matrix with reasonable complexity. However, the sample covariance matrix can be positive-semidefinite, i.e., its smallest eigenvalue can become zero. The work presented in [4] proposes a time-varying regularized RLS with a generic structure. However, its implementation is complex.
Different variations of the regularized least squares for many applications are presented in the literature. For acoustic echo cancellation, the problems of acoustic impulse response estimation and nonlinearity modeling are addressed in [13] with a regularized RLS solution to handle these problems. A variable regularization technique is presented in [14] for the fast affine projection algorithm. The work in [15] introduced an algorithm that accounts for a cost function with a time-varying weighted regularization RLS using a sliding window approach.
Another parameter that is related to the regularization parameter is the forgetting factor parameter. In some cases, it is preferable to keep the newer data and ignore the older ones by using a forgetting factor parameter which applies higher weighting to recent data [16]. The problem of updating the forgetting factor parameter recursively in time can be found, for example, in [17][18][19].
In this paper, we consider the problem of solving regularized least-squares recursively. The recursion is viewed as a time-update process, which means that the number of observations increases as time progresses. The recursive solution helps in reducing the complexity associated with computations and memory storage requirements.

The Least-Squares Estimation Problem
This paper deals with estimating a vector x ∈ R n×1 from a linear model of the form where y ∈ R m×1 is the observation vector, A ∈ R m×n is a full-rank data matrix, (i.e., rank (A) = m if m ≤ n or rank (A) = n if n ≤ m), and z ∈ R m×1 is a Gaussian noise vector, (i.e., each element of z follows a Gaussian distribution). The LS estimation solves the following unconstrained minimization problem [20]: where . is the Euclidean norm. That is, we seek an estimate of x, (i.e.,x) that minimizes the l 2 -norm of the residual error. Two cases can be identified based on dimensions m and n. The first case is the over-determined least-squares, where the number of rows of data matrix A is at least equal to the number of its columns, i.e., m ≥ n; therefore, Equation (2) either has a unique solution or an infinite number of solutions [20]. In this case, the solution of Equation (2) is given byx The second case is the under-determined least-squares, where the number of columns of A exceeds the number of rows, i.e., m < n. In this case, there is an infinite number of solutions [20].
A variation of the standard least-squares optimization problem involves a regularization parameter which servers two purposes. First, it allows the incorporation of some a priori information about the solution into the problem formulation. Second, it alleviates problems associated with rank-deficiency or ill-conditioning of the data matrix A [20]. Different regularization methods have been presented in the literature which include Tikhonov regularization, the shrunken estimator, the covariance shaping LS estimator [21] and ridge regression [22]. In Tikhonov regularization methods [6], a penalty term is added to Equation (2) to shrink the elements ofx towards zero. A variety of penalty terms have been proposed; a popular form is the l 2 penalization defined as follows [23]: where γ ∈ R + is a regularization parameter. The solution of (4) is given bŷ where I n is the identity matrix of dimension n. It is shown that there always exists γ > 0 for which (5) produces lower mean squared-error (MSE) than the LS estimator in (3) [23].

Regularization Parameter Selection
The optimal regularization parameter can be set to minimize the MSE error of the estimate as follows [23][24][25]: where ]. Practically, γ opt does not have a closed-form solution because the actual vector x is unknown. Some methods are proposed in the literature that assume a specific structure of the noise vector z and/or data vector x. For example, if the noise vector z is zero-mean with i.i.d. elements, and x is of zero-mean and independent components, the optimal regularization parameter can be approximated as follows [23]: where σ 2 z is the variance of z and R xx E[xx T ] is the covariance matrix of x. From Equation (7), we observe that optimal tuning of the regularization parameter should take into account any variation in the noise of signal second-order statistics. In RLS, computing a new regularization parameter at each iteration will increase the computational complexity.
In the following section, we present a method for recursively updating regularization parameter in a computationally efficient manner.

Recursive Least-Squares
We assume that the current dimension of y and the number of rows of A is m. We denote this by y m and A m , respectively. Moreover, we can denote the individual entries of the observation vector y by {d i }, and the individual rows of A by {u T i }. As time progresses, the dimension of y m and the rows of A m increase by 1 and become y m+1 and A m+1 , which can be written as follows: where d m+1 is the new observed data and u T m+1 is the new row of A. Note that the subscript m can be viewed as a time index as well. The estimatex at dimension m can be written as follows:x and for the time-updated estimate,x m+1 , Going from Equation (10) to (11) is known as a time-update step because it amounts to employing new data {d m+1 , u T m+1 } in addition to all previous data [20]. Performing timeupdates based on Equations (10) and (11) is costly in terms of computation since it requires inverting two n × n matrices, in addition to memory storage, required for storing the entries of y m and A m [20]. Hence, it is important to seek a low-cost update method. The following section introduces two regularized recursive least-squares (RRLS) algorithms. The first one is well-established in the literature, which assumes a fixed regularization parameter through the update process [2,20]. The second (proposed) algorithm handles the case when the regularization parameter varies during the update process.

Recursive Least-Squares with Fixed Regularization
To obtain a recursive formula of the estimate before the update (x m ) and after the update, (x m+1 ), we can writex where The initial condition is P 0 = 1 γ I n . Note that P m can be viewed as the inverse of scaled regularized sample covariance matrix. We can write P m+1 as follows: Using the matrix inversion lemma (Equation (30.11) in [20]), we obtain This recursion updates P m to obtain P m+1 . By substituting Equation (17) in Equation (13) and rearranging, the estimatex m+1 can be obtained by updating x m as follows [20]: withx 0 = 0. The RRLS method with fixed regularization is summarized in Algorithm 1.

The Proposed Regularization Recursive Least-Squares
This section aims to derive a recursive formula of the RRLS estimator that takes into consideration the requirement to also update the regularization parameter. If the regularization parameter before the update is γ m and after the update is γ m+1 , then Equations (14) and (15) become and with an initial condition P 0 = 1 γ 0 I n . Now, P m+1 becomes Manipulating Equation (21) similar to Equation (17) using the matrix inversion lemma for Equation (21) is not helpful because the presence of (γ m+1 − γ m )I n . To obtain a desirable update formula, we rely on the assumption that ( γ m+1 − γ m ) is small compared to P −1 m + u m+1 u T m+1 . Hence, the following approximation can be utilized [26]: where With this assumption, the following recursions can be easily derived as: (22) is based on the first-order Taylor's series expansion of matrices. Consequently, more accurate results can be obtained by expanding to higher orders.

Remark 1. The approximation Equation
The proposed method for RRLS is summarized in Algorithm 2.

2.
For each new observed data, d m+1 , and new row, u T m+1 do the following: • Estimate the new regularization parameter, γ m+1 using a suitable regularization parameter selection method (e.g., Equation (7)).

Simulation Results
The results in this section simulate model (1), where A is generated independently from a Gaussian distribution with zero-mean and unit variance. The noise, z, and the data vector, x, are zero-mean Gaussian vectors. Having these assumptions on hand suggests the optimal regularization parameter can be obtained from Equation (7). Initially, the matrix A has 100 columns (n = 100) and 90 rows (m = 90). We simulate three scenarios with uncertainties in calculating the signal-to-noise ratio (SNR). The SNR varies uniformly during the update process within the three intervals [10.5, 11] dB, [10.25, 11] dB and [10,11] dB, i.e., the uncertainties are 0.5 dB, 0.75 dB, and 1 dB, respectively. We consider two competitive methods: the fixed RRLS and the time-varying RRLS. In the fixed regularization setting, the regularization parameter does not account for this variation in the SNR, while, in the varying regularization parameter, the parameter is changed according to Equation (7). Figure 1 compares the MSE performance for the fixed regularization and the varying regularization parameter methods when the number of observations is increased one at a time. Figure 1a shows the result obtained when the uncertainty is 0.5 dB. When applying the fixed regularized recursive formula at (m = 91), the MSE scores 17.  Figure 2 plots the error in estimating the data vector recursively using the proposed time-varying RRLS versus the number of rows, m. As expected, the recursive formula's error increases in each iteration because the derived formula relies on some approximations. Figure 2a plots the error curve when the uncertainty in the SNR is 0.5 dB. The minimum error is 9 × 10 −6 at m = 91 and the maximum error is 2 × 10 −3 at m = 100. Figure 2b illustrates that the recursive update's minimum and maximum errors are between 1 × 10 −6 and 8 × 10 −2 , respectively, for 0.75 dB uncertainty. Finally, for 1 dB uncertainty in calculating the SNR in Figure 2c, the minimum error is 7 × 10 −5 at m = 91 and the maximum error is 1 × 10 −2 at m = 100.

Conclusions
Recursive least-squares algorithms are widely used in many applications. This paper proposed a recursive algorithm for computing a time-varying non-fading regularized RLS. An approximate formula of the regularized RLS was derived, assuming slight deviations in the parameter during the time-update process to cope with noise and data statistics variations. Simulation results provided in this paper considered different uncertainties in calculating the SNR. The results demonstrated the benefit of the proposed approach in providing an efficient means of tracking the regularization parameter changes compared to the fixed RRLS method. Because of the approximation involved in deriving the proposed method, its efficiency is limited to the cases where the change in the regularization parameter is small. Hence, more accurate solutions with efficient implementation are needed for future work.