Adaptive Sparse Cyclic Coordinate Descent for Sparse Frequency Estimation

: The frequency estimation of multiple complex sinusoids in the presence of noise is impor-tant for many signal processing applications. As already discussed in the literature, this problem can be reformulated as a sparse representation problem. In this letter, such a formulation is derived and an algorithm based on sparse cyclic coordinate descent (SCCD) for estimating the frequency parameters is proposed. The algorithm adaptively reduces the size of the used frequency grid, which eases the computational burden. Simulation results revealed that the proposed algorithm achieves similar performance to the original formulation and the Root-multiple signal classiﬁcation (MUSIC) algorithm in terms of the mean square error (MSE), with signiﬁcantly less complexity.


Introduction
Parametric frequency estimation has been an active research area for many years. As a consequence, several methods have been proposed in the literature for addressing this problem. Among the most classic estimation strategies are the multiple signal classification (MUSIC) [1], its extension Root-MUSIC [2], and the estimation of signal parameters via rotational invariance techniques (ESPRIT) [3]. The principal drawback of these algorithms is their high computational complexity due to the required computation of the eigenvalue decomposition (EVD) of the measurement signal's covariance matrix [4,5]. Furthermore, a degradation of the performance in scenarios with low signal to noise ratios (SNRs) can be also observed. Some approaches found in the literature, e.g., the work proposed in [6] for source localization, inspired the application of sparse representations in frequency estimation as an alternative method for overcoming the limitations of the classic techniques. As a consequence, some algorithms that exploit the sparse nature of the signals in the frequency domain have been introduced [7][8][9][10] in the literature. These methods rely on the sparse representation of the model. This means that the model assumes that the signal can be well-approximated as a linear combination of just a few underlying functions from a known basis or dictionary. Previous works based on these concepts using structured and dynamic dictionaries [11] have also been proposed. These algorithms have significant advantages, such as better performance compared with classic methods, less computational complexity, and more accurate results in scenarios with fewer available measurements.
The frequency estimation problem can be modeled as a linear combination of complex exponentials in the presence of noise via the following model: x q e j2π f q m + w [m] (1) In applications such as frequency estimation where the measurement matrix is composed of complex exponentials that are highly correlated, conditions such as the one enunciated above are violated. For example, in a scenario with M = 50, ∆ f = 1/(20M) and p = 5, the mutual coherence of the matrix A is equal to 0.9959. Therefore this condition cannot be satisfied (i.e., µ = 0.9959 > 1/(2p − 1) = 0.111).
On the other hand, convex optimization methods have very good performance guarantees, which make them a reliable tool for sparse signal recovery but with the drawback that these methods typically have a higher computational cost for large-scale problems. Therefore, iterative approaches such as sparse cyclic coordinate descent (SCCD) [21] were introduced for solving these problems. SCCD is an optimization technique that sequentially minimizes a cost function in the direction of a single coordinate while leaving all other coordinates fixed. This is particularly useful in cases where the single parameter problem is easy to solve [22]. It is also especially suitable for problems where the gradient of the cost function does not have a closed-form solution. Moreover, efficient storage in a sparse vector format can be implemented since SCCD only updates one entry of the sparse vector at each iteration. SCCD also offers other advantages in terms of memory requirements compared to alternative algorithms such as linearized Bregman iterations. SCCD only uses a sparse vector of length N, and storing such a vector requires only a memory size depending on the number of non-zero values, whereas LBI requires an additional vector of length N which is dense and requires a most costly update [23,24]. Based on these characteristics, SCCD can be considered a more suitable candidate for the frequency estimation problem [12].
In this letter, a new version of SCCD based on an adaptive frequency grid is proposed. The proposed method inherits the characteristics of the original formulation and in addition employs several strategies for accelerating the convergence and reducing the number of required mathematical operations. The main idea of the proposed method is to effectively reduce the size of the grid (i.e., to reduce the number of columns of the matrix A) for estimating the frequency parameters. This will allow one to reduce the computational complexity, and subsequently the number of required mathematical operations in comparison to the original formulation. Therefore, it enables a significantly better complexity/precision trade-off. As our approach mainly consists of complex rotations, it is easily implementable in fixed-point digital hardware with only moderate precision requirements. The proposed method also achieves similar performance to the Root-MUSIC algorithm, but with significantly less computational complexity.

Adaptive-Sparse Coordinate Descent Algorithm
The SCCD algorithm solves the following cost function: arg min where the regularization parameter λ is used to adjust the sparsity of the estimation result. Higher values of λ lead to more sparse estimation results [12]. An interesting phenomenon occurs over the iterations in SCCD: most coordinates at each update that are zero never become nonzero [22]. This characteristic allows using different strategies to speed up the convergence of the algorithm. For example, one can use a vector of lambda in descending order instead of a fixed λ [25]. Then, the estimation result of a higher λ is used as the start value for the next estimation with a lower λ. Such a hot-starting strategy reduces the computational effort to obtain a whole solution path because the result of such a previous estimation provides a start vector for the next estimation that is typically closer to its solution than the all-zero vector. As a consequence, the computation of the final solution is considerably faster than computing the solution using a fixed arbitrary λ.
In [12], we proposed a version of SCCD for frequency estimation, where a decreasing strategy for updating the regularization parameter and a peak search are used for detecting the p frequencies exactly. This is repeated in Algorithm 1 for the reader's convenience.
Algorithm 1 starts with a λ-value that would lead to the all-zero solution (i.e., λ 1 = λ max where λ max = max i |a H * ,i y|) [12] and reduces λ until the number of estimated frequencies matches the desired number of frequencies p. For this, we use a vector of descending entries: λ = [λ 1 = λ max , λ 2 , . . . , λ L ]. The iteration process basically consists of two principal steps. In the first step (line 8), the inner product between the corresponding column a * ,i and the residual r is computed and added to the currently estimated vector. Then, the resulting value is passed as the argument to the shrink function. The shrink function outputs zero if its input is smaller than a defined threshold (i.e., in our case smaller than µ i λ l ). If the input is larger than the threshold, the magnitude of the input is reduced by the threshold value. Figure 1 graphically depicts the shrink function. After applying the shrink function, the second step (line 9) of Algorithm 1 updates the residual. In order to find an estimated vector with exactly p nonzero values that correspond to the true frequencies, a very fine grid of λ values are required. This also implies a considerable increase in the number of iterations. In order to address this issue, a peak search on the estimated vector can be performed (line 11 of Algorithm 1). This allows detecting p frequencies, even when the number of nonzero elements of the estimated vector is larger than p. This strategy also allows using a relatively coarse vector of λ values while still providing good estimation performance. The vector a * ,i in Algorithm 1 is composed of complex exponentials. Therefore, the inner product of such a vector with the vector r and the multiplication by a scalar (complex) number µ i (line 8 of Algorithm 1) can be easily performed by the CORDIC algorithm without explicitly calculating and storing a * ,i . Hence, the vector a * ,i will only exist virtually in the hardware implementation [12].
Based on Algorithm 1, we propose an extension of the work published in [12]. In the proposed version, we continue to exploit the fact that most coordinates that are zero remain unchanging over the iterations. This characteristic allows discarding the areas of the frequency grid that remain zero over the iterations. Then, the frequency grid can be redefined to the areas of interest where the values are nonzero. This dimensional reduction of the frequency grid considerably reduces the number of mathematical operations needed per iteration and speeds up the convergence of the algorithm. The proposed algorithm is called Adaptive-SCCD and consists of two main steps: • In the first step, Algorithm 1 is run using a coarse grid. This means that the defined frequency spacing of the grid ∆ f coarse uses a coarse grid spacing, which leads to a moderate number of columns of A. After running the algorithm, the non-zero positions are detected. These positions are located in the non-zero intervals around the peaks of the estimated vectorx. This is the reason why we performed a peak search and detected the beginnings and ends of these intervals. • In the second step, we again run Algorithm 1, but in this case, the frequency grid is constructed only considering the intervals detected in the first step and using a frequency spacing ∆ f fine . This considerably reduces the number of grid points (i.e., number of columns of the matrix A) that need to be considered, resulting in a reduction of the required number of mathematical operations.
Algorithm 1 Sparse cyclic coordinate descent (SCCD) for frequency estimation [12] Input if |P| < p then 13: l ← min(L, l + 1) 14: else return frequencies corresponding to P 15: end if 16: end if 17: end for The two main steps of the proposed algorithm are illustrated in Figure 2. The main advantage of this approach is that the first estimation (coarse search) allows finding intervals centered around the peaks. These intervals can be interpreted as the potential regions of the grid where the true frequencies can be located. Then, the size of the frequency grid is effectively reduced in the second step (fine search) by only considering the intervals detected before. As a consequence, the number of points of interest that need to be evaluated in the second step is reduced and the number of required mathematical operations also decreases.
The pseudocode of Adaptive-SCCD is described in Algorithm 2, where FindNonZe-roPositions and BuildGrid are auxiliary functions. Note that in Figure 2, the intervals centered around the peaks of the absolute value of the estimated vectorx correspond to the non-zero positions. The function FindNonZeroPositions finds the start and end values of these peaks and stores them in the vectors s and e. The function is described in Algorithm 3. The function BuildGrid is used to build a fine frequency grid in the intervals found by the coarse estimation. The fine grid uses the frequency spacing ∆ f fine . It is described in more detail in Algorithm 4. Finally, the function SCCD_fine is used for estimating the frequency parameters using the fine frequency grid. This function is similar to the original one described in Algorithm 1, only the inner product of line 8 is now performed by only considering the non-zero intervals detected in the first run of SCCD. Then, for implementing the function SCCD_fine, the lines 5 and 6 in Algorithm 1 are replaced by  10: if isempty(start) then 11: start=1 12: end if 13: s(k) = start 14: end = find(( f grid > f grid (loc(k))) & (dx >= 0)) 15: if isempty(end) then 16:

Results
The performance of our method in the following estimation scenarios is evaluated in terms of the mean square error (MSE).
between the correct frequencies f i , i = 1, 2, . . . , p and their estimatesf i,r in R = 1000 runs. The measurement noise samples were drawn from an i.i.d. complex Gaussian random process with zero mean and variance σ 2 . The measurements have been created by summing p real sinusoids with phase φ uniformly distributed at random in the interval (0, 2π) and the amplitudes were chosen randomly to be −1 or 1 for all the simulations.
The vector λ is defined as λ = [λ max , λ max , 2 λ max , . . . , (L−1) λ max ] where = 0.99 and L = 50. In the case of SCCD, we considered a frequency grid with a spacing of ∆ f = 1/(20M), and for Adaptive-SCCD we consider a coarse grid with ∆ f coarse = 1/(2M) for the first step and in the second step a spacing equal to the original version (i.e., ∆ f fine = ∆ f = 1/(20M)). We also included the following algorithms for comparison: Root-MUSIC; ESPRIT, whose code is available in [26]; discrete-time Fourier transform (DTFT); and the two-step iterative shrinkage/thresholding (TwIST) algorithm [27], whose code is available in [28]. The simulation results for Root-MUSIC were obtained using the function rootmusic built in MATLAB and considering autocorrelation matrices of size M/2 × M/2 for Root-MUSIC and also for ESPRIT. In the case of the algorithm TwIST, the parameter τ was set to 5. Figure 3 depicts a scenario with M = 50 and five frequencies selected at random from the interval (0, 0.5). Please note that Adaptive-SCCD showed about the same performance as the original version of SCCD. This validates our affirmation that the adaptive reduction of the frequency grid size does not affect the final performance. Both versions of SCCD also outperformed the Root-MUSIC, ESPRIT, DTFT, and TwIST algorithms for this test case. In the case of the DTFT, it is known that in the presence of multiple frequencies, the DTFT suffers from spectral leakage due to interference among the sinusoidal components [29], which causes it to become biased [30]. Its inferior performance in this scenario was expected. In the case of the subspace techniques, Root-MUSIC showed the best performance compared to ESPRIT and is comparable in performance to Adaptive-SCCD. The performance gap between Root-MUSIC and ESPRIT is due to the fact that they work in different subspaces: Root-MUSIC works in the noise subspace with dimensions M − p, and ESPRIT works in the signal subspace with dimension p. Moreover, TwIST is an interesting approach based on the iterative shrinkage/thresholding (IST) algorithm that was originally proposed for image restoration, but we can notice that its performance for the simulated scenario of frequency estimation was inferior to Root-MUSIC and the proposed method. We also analyzed how the selection of ∆ f coarse affects the final performance. We experimentally determined that a minimum spacing ∆ f coarse = (1/2M) is necessary to guarantee the same final performance than SCCD. Figure 4 shows the performance of the original formulation versus the adaptive version for different values of ∆ f coarse . In the simulations, ∆ f coarse = factor · (1/(20M)). This is the reason why the figure is represented as a function of the factors. In the case of SCCD, the spacing of the grid ∆ f = 1/(20M) was always the same for all simulations.
Note that for values of ∆ f coarse > (1/2M), i.e., factor values greater than 10, the performance of Adaptive-SCCD was worse than the performance of SCCD. This behavior was due to the fact that large values of ∆ f coarse lead to a very coarse frequency grid. This produces scenarios with M > N or even N ≤ p, where the coarse estimation fails to find all the intervals of interest (i.e., some parts of the grid where the true frequencies are located can not be detected in the coarse search). Therefore, some of these intervals of interest were excluded in the fine frequency grid. As a consequence, the Adaptive-SCCD algorithm was unable to estimate the true frequencies in the fine search. As expected, we also observed in the simulation results that for larger values of M, the final performance improved, but we must consider that the increase of M also had an impact on the number of overall required computations. In [12], we outlined that the Algorithm 1 can be efficiently implemented in hardware using the CORDIC algorithm. In order to evaluate how the increase of the number of measurements impacts the complexity of such an implementation, we provide Figure 5.  Figure 5 shows the average numbers of required CORDIC operations of SCCD and Adaptive-SCCD for different numbers of measurements. We considered the total number of iterations until the algorithms finished. A second-order polynomial in M has been used to fit the points with resulting R 2 (R-squared) values of 0.9988 and 0.9997 for Adaptive-SCCD and SCCD, respectively. In addition, a first-degree polynomial fitting for Adaptive-SCCD is also shown with an R 2 (R-squared) value of 0.9834. Note that the complexity of Adaptive-SCCD is significantly smaller than SCCD whenever the number of measurements increases. This result confirms that the proposed technique significantly reduces the computational complexity of the original formulation and provides a more efficient alternative for hardware implementations. Moreover, the subspace method MUSIC has a computational complexity of O(M 2 N + M 2 ) [31]. As typically the number of grid points N M, the complexity can be approximated to O(M 3 ). This implies that the computational cost of MUSIC scales cubically with respect to the number of measurements while the complexity of Adaptive-SCCD only scales quadratically in the number of measurements. Therefore, a significant complexity gap should be expected between an implementation of the MUSIC algorithm in digital hardware and the proposed method in scenarios where the number of measurements increases.

Materials and Methods
The implementations were written in MATLAB 2008a and simulations were run on an HP laptop running Windows 10 64-bit with an Intel(R) Core(TM) i7-7600 CPU and 16 GB of memory.

Conclusions
We proposed an adaptive algorithm based on SCCD for the frequency estimation problem that requires significantly less computational effort than its basic version. The proposed method relies on the sparse representation of the signal model, where the signal is represented as a multiplication of a measurement matrix A with a sparse vector. In order to estimate these frequencies, we proposed a method that effectively reduces the size of the frequency grid and considerably reduces the overall computational complexity without loss in performance. Moreover, the complexity of the method only scales quadratically with the number of measurements, while for other alternative methods, such as Root-MUSIC, it scales cubically. Basically, an implementation of Adaptive-SCCD in digital hardware will mostly consist of CORDIC operations and additions. This will result in a significant complexity gap between digital hardware implementations of Root-MUSIC and the proposed method. Due to the structure of the algorithm, designs can be efficiently implemented using sparse vector formats. Simulations have also shown that the proposed estimator achieves similar results in terms of the MSE to the original version, and outperforms other techniques such as Root-MUSIC.
The use-cases of a low-complexity and high-precision sparse frequency estimation method are manifold, including real-time applications in areas such as audio signal processing, radar or communications.
Author Contributions: All authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.
Funding: This work has been supported by the COMET-K2 "Center for Symbiotic Mechatronics" of the Linz Center of Mechatronics (LCM) funded by the Austrian federal government and the federal state of Upper Austria. This work is also supported by the Austrian ministries BMVIT and BMDW within the COMET program.

Conflicts of Interest:
The authors declare no conflict of interest.