Improved Convergence Speed of a DCD-Based Algorithm for Sparse Solutions

: To solve a system of equations that needs few updates, such as sparse systems, the leading dichotomous coordinate descent (DCD) algorithm is better than the cyclic DCD algorithm because of its fast speed of convergence. In the case of sparse systems requiring a large number of updates, the cyclic DCD algorithm converges faster and has a lower error level than the leading DCD algorithm. However, the leading DCD algorithm has a faster convergence speed in the initial updates. In this paper, we propose a combination of leading and cyclic DCD iterations, the leading-cyclic DCD algorithm, to improve the convergence speed of the cyclic DCD algorithm. The proposed algorithm involves two steps. First, by properly selecting the number of updates of the solution vector used in the leading DCD algorithm, a solution is obtained from the leading DCD algorithm. Second, taking the output of the leading DCD algorithm as the initial values, an improved soft output is generated by the cyclic DCD algorithm with a large number of iterations. Numerical results demonstrate that when the solution sparsity γ is in the interval [ 1/8, 6/8 ] , the proposed leading-cyclic DCD algorithm outperforms both the existing cyclic and leading DCD algorithms for all iterations.


Introduction
With the development of information technology, the number of participated devices and data transmission rate have substantially increased in recent years. Solving the problems in a wide range of signal processing applications is equivalent to getting the solution of a linear least squares (LS) problem [1]. These applications include adaptive antenna array applications [2], multi-user detection [3], multiple-input multiple-output (MIMO) detection [4], echo cancellation [5], equalization [6], and system identification [1,[7][8][9]. If the channel information is known, zero forcing (ZF) algorithm and minimum mean-square error (MMSE) algorithm are popular to be used in these applications. They are simple to implement but require the operation of matrix inversion.
The complexity of matrix inversion requires O(U 3 ) arithmetic operations, where U is system size [10]. When the system size is large, the complexity of the matrix inversion is prohibitively high. As a result, there are some techniques proposed to solve systems of equations without inverting the matrix. Direct methods, for example Gaussian elimination, Cholesky factorization, and QR decomposition attain a complexity of O(U 3 ) [10,11]. Therefore, it is difficult for direct methods to be implemented in real-time signal processing and hardware applications. Using these methods to solve the linear equations of large sparse systems may be prohibitively expensive and are infeasible for practical applications. Iterative methods, for example, the conjugate gradient method and the steepest descent method could achieve fast convergence. In each iteration, they require O(U 2 ) complexity. Other iterative methods, for example, the Southwell's relaxation [12], Jacobi [10] and Gauss-Seidel [10], are coordinate descent approaches. At each iteration, they only require O(U) complexity but have a slower convergence speed. The number of iterations performed decides the computational complexity of these techniques. Iterative methods can solve problems with sparse structures more efficiently compared with direct methods [13]. Moreover, iterative methods have the potential to achieve a good initial guess for the expected results, which can reduce the number of iterations required to obtain a solution [14]. However, some iterative methods include the numerical operations of multiplying or dividing, which are difficult for hardware implementation.
Prior research work investigates the dichotomous coordinate descent (DCD) algorithms that use a reduced number of operations to solve systems of equations. DCD-based algorithms are well known in many applications because the DCD algorithm avoids the operations of multiplying and dividing, which are expensive to implement. In [15][16][17], by applying the DCD iterations, the numerical complexity of affine projection algorithms for active noise control is reduced. In [18], DCD iterations solve the recursive least squares (RLS) matrix inversion problem by using bit-shift. The DCD algorithm [14,19] is a non-stationary iterative method based on stationary CD techniques. Only O(U) additions or O(1) additions are required in each iteration [11]. The DCD algorithm, therefore, is a good choice for real-time hardware implementations. The DCD algorithm variants, cyclic DCD and leading DCD algorithm, were presented in [20]. When scenarios with sparse true solutions are considered, such as multi-path channel estimation and detection of a multi-user system with several unknown active users, the leading DCD algorithm converges faster than the cyclic DCD algorithm. The cyclic DCD algorithm is ideal for solving systems of equations that require a large number of iterations because it converges faster and has a lower error level than the leading DCD algorithm. However, the leading DCD algorithm has a faster convergence rate than the cyclic DCD algorithm in the initial iterations if the solution is not very sparse.
In this paper, we considered a combination of leading and cyclic DCD iterations and proposed a leading-cyclic DCD algorithm for obtaining a sparse solution in systems requiring a large number of updates. The idea is that the leading DCD algorithm uses a small number of iterations to obtain the initial input for the cyclic DCD algorithm. The number of iterations used in the leading DCD algorithm has been thoroughly investigated under different system matrix conditions and solution sparsities. In the proposed algorithm, a range of the number of iterations is determined that will produce an optimal number of updates for the leading DCD algorithm. The results show that the proposed leading-cyclic DCD algorithm improves the convergence speed of the cyclic DCD algorithm and lowers the steady-state level.
Notation: Boldface uppercase and boldface lowercase letters represent matrices and vectors (e.g., R and Γ). Elements of matrix and vector are denoted as R p,n and Γ n , respectively. The n-th column of R is denoted as R :,n . (·) T denotes a matrix transpose, and (·) H denotes the Hermitian transpose.

Preliminaries
We start by introducing the system model used in this work. We then discuss the DCD algorithms used to solve the linear model.

System Model
The system linear model can be modeled as where A is a Q × U matrix, Z is a Q × 1 received vector, and x is a U × 1 unknown vector. We assume that U < Q and the matrix and vectors are real-valued. In addition, K (K < U) elements of vector x are non-zero. For example, x is a sparse vector, and the index set of non-zero values is not given. There are many applications to solve problems with sparse true solutions. The multi-path communication channels, concerning the uncertainty delay interval, several multi-path components can be very small. The maximum likelihood (ML) estimation is usually applied to solve normal equations [21], to achieve the sparse true solutions x 0 in sparse multi-path channels. Multi-user detection can be considered to be another example when the number of active users K is less than the expected number of users U [22]. Whether the matrix R = A H A can be pre-computed determines the different computational complexity of sparse solution estimators. Calculation of the matrix R is complex for the algorithm. The pre-computed matrix R leads to the low complexity of the algorithm. In this work, we assume that matrix R is known. The vector of the signal is calculated as: where β = A T Z. It costs O(U 3 ) complexity for computing the matrix inversion directly. The matrix inversion computing becomes prohibitively high when the system size U increases. To avoid such a high computational complexity, we can solve the normal Equation (1) by minimizing the quadratic function: To solve (3), an iterative descent search method can be applied as a possible choice. The descent search method updates the solution vectorx is the update direction of the solution. The direction is designed to be non-orthogonal to the residual vector Γ (Γ (i) = β − Rx (i) ). When currentx (i) is not the exact point, i.e., J(x (i+1) ) < J(x (i) ), it shows that the descent search method converges to the original problem solution with descent objective values in the convex area.

DCD Algorithms
The DCD algorithm is one of many iterative techniques for solving the linear LS problem [19]. There are M b bits to represent the elements of the solution vector. The amplitude of the element is limited to [−ζ, ζ]. The iterative search begins with the most significant bits of the solution x and ends with the least significant bits updated. The execution is controlled by the step size σ (σ > 0). σ starts at σ = ζ and decreases as σ → σ/2 for less significant bits. DCD-based techniques usually use bit-shift to replace the multiplication/division operations, which is attractive for hardware implementation. The variants of the DCD algorithm: cyclic DCD algorithm and leading DCD algorithm, have different properties and been applied to different applications. These algorithms are described below.

Cyclic DCD Algorithm
The cyclic DCD algorithm [20], which is described in Table 1, updates a solution vector x in cyclic order (n = 1, 2, . . . , U). In each iteration, when an element of the solution vector is updated, we call it a "successful" update. When σ updates, the algorithm executes the successful updates until the elements in the residual vector Γ cyclic are too small to meet the condition in step 3. Then, σ is updated in step 1. The number of successful iterations p and the number of bits M b are the essential parameters of the algorithm. They determine the main computational complexity of the cyclic DCD algorithm. The maximum number of successful iterations, N u cyclic , is predetermined. It can be used as a stopping condition for the algorithm. For a given N u cyclic and M b , the computational complexity of the cyclic DCD algorithm is restricted to an upper limit U(2N u cyclic + M b − 1) + N u cyclic additions. However, the cyclic order update is not an efficient choice for solving a system of equations that needs a few updates. Table 1. Cyclic DCD algorithm. Step Input β, R, M b , ζ, N u cyclic ; Output: x, Γ cyclic Addition

Leading DCD Algorithm
A good method of index selection may speed up the convergence. The leading DCD algorithm [20] is briefly described in Table 2. The index n in the leading DCD algorithm is chosen via n = arg max j=1,2,···U {|Γ leading j |}. The (n-th) element in x corresponds to the (n-th) element in the residual vector Γ leading that has the largest absolute value. Given several iterations N u leading , the computational complexity of the leading DCD algorithm is restricted to an upper limit (2U + 1)N u leading + M b additions. Table 2. Leading DCD algorithm. Step Input β, R, M b , ζ, N u leading ; output: x, Γ leading Addition Table 1 shows that when N u cyclic is much larger than M b , then the computational complexity of the cyclic DCD is dominated by 2UN u cyclic additions. When the N u cyclic value is small, then the computational complexity is upper bounded by the term U M b .

Numerical Results
In this section, numerical results show that cyclic and leading DCD algorithms deal with system matrices R with different condition numbers. We consider real-valued systems here. The condition number of the matrix R is attained by modifying the ratio between Q and U. Usually when U is closer to Q, the matrix condition number is higher. For example, performing a highly loaded multi-user detection is equivalent to solving a linear equation β = Rx, where β is the output vector of the matched filter. x is a randomly generated U × 1 transmitted data vector whose elements are uniformly distributed on [−1, +1].
Using the DCD algorithm to solve (1), an estimated solutionx is obtained. We determine the misalignment betweenx and x as The misalignments are averaged by 100 simulation trials and plotted (in decibels) against the number of updates. Figure 1 depicts the misalignments of the cyclic and leading DCD algorithms. A system matrix R is designed with small condition numbers in the interval [2,5]. It is seen that the cyclic DCD algorithm has a slightly slower convergence than the leading DCD algorithm.  Figure 2 depicts the misalignments of the cyclic and leading DCD algorithms in the system with high condition numbers in the range of [300, 400]. It shows that the cyclic DCD algorithm has a faster convergence than the leading DCD algorithm. Also, the convergence speed of the two DCD algorithms is slower than that in Figure 1.
When we consider a scenario with a sparse true solution, the number of updates N u is expected to be reduced. Figures 3 and 4 display misalignments of the DCD algorithms in the case of a system matrix with high condition numbers. It is seen that the solution is sparser, the convergence of the DCD algorithm is faster.
From the results in Figures 3 and 4, we can see that in the case of K = 16, compared to the non-sparse system, the number of iterations required by the leading DCD algorithm to attain a misalignment of −50 dB is approximately reduced by a factor of 20. However, for the cyclic DCD algorithm, the number of iterations reduction is not significant. The number of updates required in the leading DCD algorithm is approximately reduced by a factor of 30 in the case of K = 8. For the cyclic DCD algorithm, the number of iterations is approximately reduced by a factor of 1.5. Therefore, the leading DCD algorithm is preferable for highly sparse scenarios because it could achieve a significant reduction in the number of updates. When K = 32, we can see that the cyclic DCD algorithm has a lower steady-state misalignment; however, the leading DCD algorithm has a faster convergence speed in the initial iterations. When K = 64 (non-sparse system), the cyclic DCD algorithm has faster convergence and lower steady-state misalignment than the leading DCD algorithm.

Proposed Leading-Cyclic DCD Algorithm
From the results above, we can see that the cyclic DCD algorithm is more suitable for solving a non-sparse system with a large condition number. However, it is noted that when the percentage of non-zero elements of the solution is approximately 50%, the leading DCD algorithm provides a faster convergence in the initial updates. To further improve the convergence when the solution is not highly sparse, we consider combining the leading and cyclic DCD algorithms and propose the leading-cyclic DCD algorithm shown in Table 3. Table 3. Leading-cyclic DCD algorithm. Step Input The proposed algorithm involves two steps. In the first step, a data vector is obtained using the leading DCD algorithm with a considerably smaller number of iterations N u leading than in the standard version of the leading DCD algorithm. In the second step, an improved soft output is generated by applying the cyclic DCD algorithm with a large number of iterations N u . The updated vectors of x and Γ and the step size of σ obtained from the leading DCD algorithm are the initial values for the cyclic DCD algorithm. There are two factors that determine the worst-case computational complexity of the leading-cyclic DCD algorithm. The cyclic DCD algorithm starts at m leading − 1 bit for updating an element of the solution. The upper computational complexity of the cyclic DCD algorithm is calculated as follows. For the m-th bit, m = m leading − 1, 2, 3, · · · , M b − 1 has (N u − N u leading ) successful iterations. That is, executing the initial (M b − m leading ) bits does not include a successful iteration and thus requires (M b − m leading )U additions. Among the U iterations (n = 1, 2, · · · , U), if there is only one successful iteration, then calculating the last bit (m = M b ) requires U additions for the comparison and (U + 1) additions for updating the residual vector Γ and elements x n . In total, (N u − N u leading ) successful iterations require N u − N u leading (2U + 1) additions. The upper bound of computational complexity of the cyclic DCD algorithm is {U[2(N u − N u leading ) + (M b − m leading ) − 1] + (N u − N u leading )} additions. The worst-case scenario for the complexity of the leading DCD algorithm occurs when the algorithm uses all N u leading updates. By using N u leading iterations, the computational complexity of the leading DCD algorithm is {(2U + 1)N u leading + m leading } additions. Therefore, the computational complexity of the leading-cyclic DCD algorithm has an upper limit of {(2U + 1)N u + (1 − U)m leading + (M b − 1)U} real-valued additions. In a practical situation, in each pass there should be several successful updates, and the average complexity is close to 2UN u . N u cyclic is approximately equal to N u ; therefore, the average complexity of the leading-cyclic DCD algorithm is close to that of the cyclic DCD algorithm.
The selection of N u leading varies based on the system matrix condition and the solution sparsity. The leading-cyclic algorithm is evaluated by using different N u leading values in the range of [U(1 − γ), Q + U]. Among these values of N u leading , the one that allows the proposed algorithm to achieve the fastest convergence to the steady-state level is chosen as the optimal N u leading . Figure 5 shows the optimal N u leading in the leading-cyclic DCD algorithm for matrices with different sparsities γ. The misalignments compared to the number of successful iterations for large system matrices with condition numbers (≥100) and sparse solutions are shown in Figures 6-11. By comparing the results from Figures 6 and 7, we can see that the lower the condition number of the system matrix is, the faster the convergence speed and the lower the steady-state level of these DCD algorithms. From these figures, we can see that when γ ≤ 1/8, the leading DCD algorithm has a significantly faster convergence speed and a lower steady-state misalignment than the cyclic DCD algorithm. The leading-cyclic DCD algorithm has approximately the same steady-state level as the leading DCD algorithm. As γ increases, the cyclic DCD algorithm gradually outperforms the leading DCD algorithm. The leading-cyclic DCD algorithm with a given N u leading (shown in Figure 5) has an increased convergence speed and a lower steady-state misalignment compared to the leading DCD algorithm. When γ = 2/8 increases to γ = 6/8, the leading-cyclic DCD algorithm exhibits an insignificantly increased convergence speed compared to the cyclic DCD algorithm. Figures 8-11 show the misalignments of the DCD algorithms with conditional numbers of the system matrix in the interval [300, 400]. We can see that when ξ = Q/U increases, the convergence speed of the DCD algorithms decreases. According to the numerical results above, we can see that the leading-cyclic DCD algorithm is an effective approach for solving a sparse system with a large number of updates. For example, in code division multiple access (CDMA)with a spreading factor Q, there are approximately half of the users are active in the U user system. The columns of A are represented as spreading sequences. The matrix R is represented as a correlation matrix of the spreading sequence. Figure 12 shows the misalignments of the RLS algorithm and DCD leading−cyclic -RLS (forgetting factor λ = 0.96, the length of the filter U = 16, and γ = 1/2). When system noise energy increases by 10 times between 1000 iterations and 1100 iterations, the DCD leading−cyclic -RLS maintains a low error rate. The proposed algorithm-based RLS provides a lower error level and better tracking performance than the RLS algorithm.

Discussion
When γ increases, the convergence speed of the proposed algorithm (e.g., at the misalignment −50 dB) decreases. The optimal N u leading required for solving a system of equations varies depending on the system size, the solution sparseness and the condition number of the system matrix. In this work, we choose the value that allows the proposed algorithm to achieve the fastest convergence to the steady-state level as the optimal N u leading . The results show that the optimal N u leading varies around Q and approximately remains stable with different γ (except for in the case of 32 × 37). The ratio between Q and U in the case of 32 × 37 is lower than those in the other cases with the same condition number range, and the leading DCD algorithm might contribute fewer iterations in the proposed algorithm to achieving the steady state when γ increases. Investigation of the general theory of induction for choosing the proper N u leading in the combined algorithm is left for future work.

Conclusions
In this paper, we have demonstrated that the leading DCD algorithm and the cyclic DCD algorithm are useful in different applications. In particular, we consider a scenario in which the sparsity of the true solution is γ ∈ [1/8, 6/8]. We propose a leading-cyclic DCD algorithm that solves normal equations with sparse solutions. The results show that the proposed leading-cyclic DCD algorithm has an optimal N u leading value in the range of [U(1 − γ), Q + U] and exhibits improved convergence compared to the cyclic DCD algorithm and the leading DCD algorithm. In addition, within the same range of the system matrix condition number, the larger Q/U is, the slower the convergence of the DCD algorithms.