On Performance of Sparse Fast Fourier Transform Algorithms Using the Aliasing Filter

Computing the sparse fast Fourier transform (sFFT) of a K-sparse signal of size N has been an important topic for a long time. The sFFT has two main stages: frequency bucketization and spectrum reconstruction. Frequency bucketization hashes the frequency coefficients into B buckets using one of several filters: the Dirichlet kernel filter, the flat filter, the aliasing filter, etc. Spectrum reconstruction then identifies the frequencies that are isolated in their buckets. So far, more than forty different sFFT algorithms compute the discrete Fourier transform (DFT) by their own methods. In order to use them properly, a pressing question is how to analyze and evaluate the performance of these algorithms in theory and in practice. This paper mainly discusses the sFFT algorithms that use the aliasing filter. In the first part, the paper introduces the techniques of three frameworks: the one-shot framework based on a compressed sensing (CS) solver, the peeling framework based on a bipartite graph, and the iterative framework based on binary tree search. We then draw theoretical conclusions about the performance of the six corresponding algorithms: sFFT-DT1.0, sFFT-DT2.0, sFFT-DT3.0, FFAST, R-FFAST, and DSFFT. In the second part, we carry out two categories of experiments on a standard testing platform, computing signals of different SNR, different N, and different K, and record the run time, the percentage of the signal sampled, and the L0, L1, and L2 errors in both the exactly sparse case and the general sparse case. The experimental results agree with the inferences obtained in theory.


Introduction
The most widely used algorithm to compute the DFT is the fast Fourier transform (FFT) invented by Cooley and Tukey, which can compute a signal of size N in O(N log N) time using O(N) samples. Nowadays, the demand for big-data computing and low sampling ratios motivates new algorithms that can compute the DFT in sublinear time using only a subset of the input data. The new sFFT algorithms exploit the inherent characteristic of many signals that they are sparse in the frequency domain: only K (K << N) frequencies are non-zero or significantly large. Under this assumption, one can reconstruct the spectrum with high accuracy using only the K most significant frequencies. Because of its excellent performance and generally satisfied assumptions, sFFT was named one of the ten breakthrough technologies in the MIT Technology Review in 2012.
Through frequency bucketization, the N frequency coefficients are hashed into B buckets, and the length of one bucket is denoted by L. The Dirichlet kernel filter multiplies the signal by a rectangular window in the time domain, which is equivalent to convolving the spectrum with a Dirichlet kernel of size L (L << N) in the frequency domain. The flat window filter multiplies the signal by a mixed window in the time domain, which is equivalent to convolving the spectrum with a flat window of size L (L << N) in the frequency domain. The aliasing filter multiplies the signal by a comb window in the time domain, which is equivalent to convolving the spectrum with a comb window of size B (B ≈ K) in the frequency domain. After bucketization, the algorithm only needs to focus on the non-empty buckets and locate the positions and estimate the values of the large frequency coefficients in those buckets, in what is called identifying frequencies or spectrum reconstruction. Up to now, there are more than forty sFFT algorithms, about ten of which use the aliasing filter. Users are concerned with the performance of these algorithms, including runtime complexity, sampling complexity, and robustness. The results of such performance analyses guide us in optimizing these algorithms and using them selectively. This paper gives complete answers in theory and in practice.
The first sFFT algorithm, the Ann Arbor fast Fourier transform (AAFFT), uses the Dirichlet kernel filter and is a randomized algorithm with runtime and sampling complexity O(K^2 poly(log N)). The performance of the AAFFT0.5 [1] algorithm was later improved to O(K poly(log N)) in the AAFFT0.9 [2], [3] algorithm through the use of unequally-spaced FFTs and a binary search technique for spectrum reconstruction.
The sFFT algorithms using the flat window filter can compute an exactly K-sparse signal with runtime complexity O(K log N) and a general K-sparse signal with runtime complexity O(K log N log(N/K)). In the one-shot framework, the sFFT1.0 [4] and sFFT2.0 [4] algorithms locate and estimate the K largest coefficients in one shot. In the iterative framework, the sFFT3.0 [5] and sFFT4.0 [5] algorithms locate each position using only 2B or more samples of the filtered signal, inspired by frequency offset estimation, in both the exactly sparse case and the general sparse case. Later, a new robust algorithm, the so-called matrix pencil FFT (MPFFT) algorithm, was proposed in [6] based on the sFFT3.0 algorithm. The paper [7] summarizes the two frameworks and the five reconstruction methods of these five algorithms, and then analyzes and evaluates their performance both in theory and in practice.
There are three frameworks for sFFT algorithms using the aliasing filter. The algorithms of the one-shot framework are the so-called sFFT by downsampling in the time domain (sFFT-DT) 1.0 [8], sFFT-DT2.0 [9], and sFFT-DT3.0 algorithms. The algorithms of the peeling framework are the so-called Fast Fourier Aliasing-based Sparse Transform (FFAST) [10], [11] and R-FFAST (Robust FFAST) [12], [13] algorithms. The algorithm of the iterative framework is the so-called Deterministic Sparse FFT (DSFFT) [14] algorithm. This paper mainly discusses the technology and performance of these six algorithms; all the details will be given later in the paper.
Under the assumption of arbitrary sampling (even with a sampling interval less than 1), the Gopher Fast Fourier Transform (GFFT) [15], [16] algorithm and the Christlieb Lawlor Wang Sparse Fourier Transform (CLW-SFT) [17], [18] algorithm can compute an exactly K-sparse signal with runtime complexity O(K log K). They are aliasing-based deterministic search algorithms guided by the Chinese Remainder Theorem (CRT). The DMSFT [19] (derived from GFFT) and CLW-DSFT [19] (derived from CLW-SFT) algorithms can compute a general K-sparse signal with runtime complexity O(K^2 log K); they use a multiscale error-correcting method to cope with high-level noise.
The characteristics of the different sFFT algorithms can be seen from the brief analysis above. The algorithms using the Dirichlet kernel filter are not efficient because the filter bins only some of the frequency coefficients into one bucket at a time. The algorithms using the flat filter are probabilistic algorithms that suffer from spectrum leakage. In comparison, the algorithms using the aliasing filter are convenient and free of spectrum leakage, whether N is a product of several co-prime numbers or a power of two. This type of algorithm is suitable for both the exactly sparse case and the general sparse case, but the worst case is hard to handle because many frequency coefficients may accidentally fall into the same bucket, and the scaling operation is of no use for the filtered signal.
The paper is structured as follows. Section 2 provides a brief overview of the basic sFFT technique using the aliasing filter. Section 3 introduces and analyzes three frameworks and the six corresponding spectrum reconstruction methods. In Section 4, we analyze and evaluate the performance of the six corresponding algorithms in theory. In the one-shot framework, the sFFT-DT1.0, sFFT-DT2.0, and sFFT-DT3.0 algorithms use a CS solver with the help of the moment preserving problem (MPP) method. In the peeling framework, the FFAST and R-FFAST algorithms use a bipartite graph with the help of the packet erasure channel method. In the iterative framework, the DSFFT algorithm uses binary tree search with the help of statistical characteristics. In Section 5, we perform two categories of comparison experiments. The first kind of experiment compares these algorithms with each other; the second compares them with other algorithms. The analysis of the experimental results agrees with the inferences obtained in theory.

Preliminaries
In this section, we present some notation and basic definitions for the sFFT.

Notation
The N-th root of unity is denoted by ω_N = e^{-2πi/N}. The DFT matrix of size N is denoted by F_N ∈ C^{N×N}, with entries F_N[j, k] = ω_N^{jk}. The DFT of a vector x ∈ C^N (a signal of size N) is the vector x̂ ∈ C^N defined by x̂ = F_N x. In the exactly sparse case, the spectrum x̂ is exactly K-sparse if it has exactly K non-zero frequency coefficients while the remaining N − K coefficients are zero. In the general sparse case, the spectrum x̂ is general K-sparse if it has K significant frequency coefficients while the remaining N − K coefficients are approximately equal to zero. The goal of the sFFT is to recover a K-sparse approximation of x̂ by locating the K frequency positions f_0, ..., f_{K−1} and estimating the K largest frequency coefficients x̂_{f_0}, ..., x̂_{f_{K−1}}.
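As a concrete illustration of these definitions, the following sketch (a minimal example, assuming NumPy's unnormalized FFT convention, which matches x̂ = F_N x; the sizes and support are made up) builds an exactly K-sparse spectrum and recovers its support from the K largest DFT coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 64, 4

# Build an exactly K-sparse spectrum x_hat (K non-zero coefficients)
x_hat = np.zeros(N, dtype=complex)
support = rng.choice(N, size=K, replace=False)
x_hat[support] = rng.normal(size=K) + 1j * rng.normal(size=K)

# Time-domain signal whose DFT is x_hat (np.fft.fft matches x_hat = F_N x)
x = np.fft.ifft(x_hat)

# The goal of sFFT: locate the K positions f_0..f_{K-1} and their coefficients
full_dft = np.fft.fft(x)
top_k = np.argsort(-np.abs(full_dft))[:K]
assert set(top_k) == set(support)
assert np.allclose(full_dft[top_k], x_hat[top_k])
```

Of course, this baseline computes the full size-N FFT; the point of the sFFT is to recover the same K positions and values in sublinear time from a subset of the samples.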

Techniques of frequency bucketization using the aliasing filter
The first stage of the sFFT is encoding by frequency bucketization. The process of frequency bucketization using the aliasing filter is achieved through the shift operation and the subsampling operation.
The first technique is the shift operation. The offset parameter is denoted by τ ∈ Z. The shift operation multiplies the original signal by the matrix S_τ ∈ R^{N×N}, defined by (S_τ x)[n] = x[(n + τ) mod N]. For a vector x ∈ C^N, let x′ = S_τ x and x̂′ = F_N S_τ x, so that x̂′_f = ω_N^{−τf} x̂_f. The second technique is the aliasing filter. The signal in the time domain is subsampled so that the corresponding spectrum in the frequency domain is aliased; this realizes the frequency bucketization. The subsampling factor is denoted by L ∈ Z^+, representing how many frequencies alias into one bucket. The subsampling number is denoted by B ∈ Z^+, representing the number of buckets (B = N/L). The subsampling operation multiplies the original signal by the matrix D_L ∈ R^{B×N}, defined by (D_L x)[m] = x[mL]. Let the vector ŷ_{B,τ} = F_B D_L S_τ x ∈ C^B be the filtered spectrum obtained by the shift and subsampling operations.
As seen above, frequency bucketization involves three steps: the shift operation (x′ = S_τ x, which costs no runtime), the subsampling operation (y_{B,τ} = D_L S_τ x, which costs B samples), and the Fourier transform (ŷ_{B,τ} = F_B D_L S_τ x, which costs B log B runtime). In total, one round of frequency bucketization costs B log B runtime and B samples.
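The three steps above can be sketched in NumPy (a minimal illustration, assuming the convention x̂ = F_N x used by np.fft.fft and the shift (S_τ x)[n] = x[(n + τ) mod N]; under this normalization each bucket carries a 1/L factor):

```python
import numpy as np

N, B = 20, 4                  # illustrative sizes; L frequencies alias per bucket
L = N // B

# An exactly sparse spectrum and the matching time-domain signal
x_hat = np.zeros(N, dtype=complex)
x_hat[[1, 3, 10]] = [5.0, 2.0, 7.0]
x = np.fft.ifft(x_hat)

def bucketize(x, B, tau):
    """Shift, subsample every L-th sample, then take a size-B FFT."""
    N = len(x)
    L = N // B
    y = x[(tau + np.arange(B) * L) % N]   # y = D_L S_tau x : B samples
    return np.fft.fft(y)                  # y_hat = F_B D_L S_tau x : B log B time

# With tau = 0, bucket i holds (1/L) * sum_k x_hat[i + k*B]
y_hat = bucketize(x, B, tau=0)
expected = x_hat.reshape(L, B).sum(axis=0) / L
assert np.allclose(y_hat, expected)

# A non-zero shift multiplies each alias term by a known phase
tau = 1
y_hat_tau = bucketize(x, B, tau)
phase = np.exp(2j * np.pi * np.arange(N) * tau / N)
expected_tau = (x_hat * phase).reshape(L, B).sum(axis=0) / L
assert np.allclose(y_hat_tau, expected_tau)
```

The shift-dependent phase on each alias term is exactly what the reconstruction frameworks below exploit to separate the frequencies that collide in one bucket.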

Techniques
In this section, we give an overview of the techniques and frameworks used in the sFFT algorithms based on the aliasing filter.
As mentioned above, frequency bucketization decreases runtime and sampling complexity because all operations are performed in B dimensions (B = O(K), B << N). After frequency bucketization, spectrum reconstruction identifies the frequencies that are isolated in their buckets. The aliasing filter may lead to frequency aliasing, where more than one significant frequency falls into the same bucket. This increases the difficulty of recovery because locating the frequency positions and estimating the frequency values become indistinguishable under aliasing. There are three frameworks to overcome this problem: the one-shot framework, the peeling framework, and the iterative framework.

One-shot framework based on the CS solver
Firstly, we introduce the one-shot framework, which can recover all K significant frequencies in one shot. The block diagram of the one-shot framework is shown in Figure 1. The first stage is encoding by frequency bucketization. Suppose there are at most a_m significant frequencies aliased in every bucket. After running 3a_m rounds for the set τ = {τ_0 = −a_m, τ_1 = −a_m + 1, ..., τ_{a_m} = 0, ..., τ_{3a_m−1} = 2a_m − 1}, we calculate ŷ_{B,τ} = F_B D_L S_τ x, representing the filtered spectrum after encoding. Suppose that in bucket i the number of significant frequencies is a; with high probability a ≤ a_m, and we obtain the simplified Equation (9) from Equation (7) and Equation (8). In Equation (8) and Equation (9), p_j = x̂_{f_j} represents the effective frequency values, with |p_0| ≥ |p_1| ≥ ... ≥ |p_{a−1}| greater than the other values of p_j; z_j = ω_N^{f_j} represents the effective frequency positions, with f_j ∈ {i, i + B, ..., i + (L − 1)B}; and m_k = ŷ_{B,k}[i] represents the filtered spectrum in bucket i for k ∈ {−a_m, ..., 2a_m − 1}. In most cases a = 0, representing sparsity. In a small number of cases a = 1, representing only one significant frequency in the bucket. Only in very few cases is a ≥ 2, representing frequencies aliased in the bucket; it is very unlikely that a > a_m. In the exactly sparse case, the approximately-equal signs become equal signs in Equations (7), (8), (9), (11), (14), and (18). The spectrum reconstruction problem in bucket i amounts to obtaining the unknown variables a, z_0, ..., z_{a−1}, p_0, ..., p_{a−1} from the known variables m_{−a_m}, ..., m_{2a_m−1}. The aliasing problem is reformulated as a moment preserving problem (MPP). The MPP formulated by Bose-Chaudhuri-Hocquenghem (BCH) codes can be divided into three subproblems: how to obtain a, how to obtain the z_j's, and how to obtain the p_j's in every bucket. We solve these three subproblems step by step below.
Step 1: Obtain the number of significant frequencies in bucket i. Solution: Suppose a ≤ a_m; this means m_k is composed of at most a_m Prony components p_j. Let the vector ẑ_j ∈ C^{a_m×1} be defined as ẑ_j = [z_j^0, z_j^1, ..., z_j^{a_m−1}]^T and the matrix M_{a_m} ∈ C^{a_m×a_m} be defined as in Equation (10); then the relationship between M_{a_m}, ẑ_j, and ẑ_j^T satisfies Theorem 1.
Proof (Proof of Theorem 1) Based on the properties mentioned above, we get Equation (11).
Equation (11) is similar to the symmetric singular value decomposition (SSVD). Nevertheless, there are some differences: 1) the p_j's are complex, not real; 2) the columns of [ẑ_0, ..., ẑ_{a_m−1}] are not mutually orthogonal normalized vectors. It is easy to apply a transformation M_{a_m} ≈ Σ_{j=0}^{a_m−1} p′_j ẑ′_j ẑ′_j^T, where the p′_j's are real, |p′_j| is directly proportional to |p_j|, and the columns of [ẑ′_0, ..., ẑ′_{a_m−1}] are mutually orthogonal normalized vectors. The paper [34] proved that for a symmetric matrix, the Σ obtained from the SVD is equal to the Σ obtained from the SSVD; for example, the 2×2 exchange matrix [0 1; 1 0] has the same singular values under both decompositions. Among the a_m singular values σ_j, the number of large singular values indicates the number of effective Prony components p_j, and hence the number of significant frequencies in bucket i.
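Step 1 can be sketched numerically as follows (an illustration with made-up sizes and coefficients; we assume a Hankel-structured moment matrix M[s, t] = m_{s+t}, one plausible reading of Equation (10)). The number of large singular values of the moment matrix reveals the number of aliased components:

```python
import numpy as np

rng = np.random.default_rng(1)
a_m = 4                       # assumed upper bound on aliasing in one bucket
a_true = 2                    # actual number of components (to be detected)

# Prony components: moments m_k = sum_j p_j * z_j**k with z_j on the unit circle
f = rng.choice(50, size=a_true, replace=False)
z = np.exp(-2j * np.pi * f / 50)
p = np.array([5.0, 3.0])
m = np.array([np.sum(p * z**k) for k in range(2 * a_m)])

# One plausible reading of Eq. (10): a Hankel moment matrix M[s, t] = m_{s+t}
M = np.array([[m[s + t] for t in range(a_m)] for s in range(a_m)])

# Count the large singular values: their number is the estimate of a
sigma = np.linalg.svd(M, compute_uv=False)
a_est = int(np.sum(sigma > 1e-8 * sigma[0]))
assert a_est == a_true
```

The rank argument is the one in Theorem 1: with distinct z_j's the Vandermonde vectors ẑ_j are linearly independent, so the moment matrix has exactly a large singular values; in the noisy case the hard threshold would be replaced by a gap test.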
Step 2: Obtain the effective frequency positions f_j in bucket i. Solution: Let the orthogonal polynomial P(z) be defined as in Equation (12), with P(z) ≈ 0. Let the matrix M_a ∈ C^{a×a} be defined as in Equation (13), and let the vector C of polynomial coefficients be defined as in Equation (14), satisfying Theorem 2 (M_a C ≈ M_s). The first element of M_s has been proved, and the other elements of M_s can be proved similarly.
For the convenience of matrix calculation: for a matrix, the superscript "T" denotes the transpose, "+" denotes the Moore-Penrose pseudoinverse, "*" denotes the adjoint, and "−1" denotes the inverse. Through Theorem 2, we can obtain C ≈ M_a^{−1} M_s. After obtaining C, there are three ways to obtain the z_j's through Equation (12): the polynomial method, the enumeration method, and the matrix pencil method. After knowing the z_j's, we can obtain the approximate positions f_j through z_j = ω_N^{f_j}.
Method 1) Polynomial method: In the exactly sparse case, the a roots of P(z) = 0 are the solutions for the z_j's; for example, if a = 1, z_0 is obtained directly from Equation (12). Method 2) Enumeration method: Among the L candidates, the a values with the smallest |P(z)| are taken as the z_j's; L attempts are needed for one solution.
Method 3) Matrix pencil method: This method was proposed in paper [35]. For our problem, let the Toeplitz matrix Y be defined as in Equation (15); let Y_1 be Y with the rightmost column removed, as in Equation (16); and let Y_2 be Y with the leftmost column removed, as in Equation (17). The set of generalized eigenvalues of the pencil Y_2 − λY_1 satisfies Theorem 3.
Theorem 3 The set of generalized eigenvalues of Y_2 − λY_1 is the set of z_j's we seek.
Proof (Proof of Theorem 3) Let the diagonal matrix C ∈ C^{a×a} be defined as C = diag(p_j), and let the Vandermonde matrix U_{a+1} ∈ C^{(a+1)×a} be defined as follows; then Theorem 3 can be proved.
If rank(Y) = a, the set of generalized eigenvalues of Y_2 − λY_1 is equal to the set of nonzero ordinary eigenvalues of Y_1^+ Y_2. It is quite possible that rank(Y) < a; in that case it is necessary to compute the SVD of Y, Y = VΣV^*, and then apply the matrix pencil method to the matrix V afterwards. For details, please refer to papers [35], [6].
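A numerical sketch of the matrix pencil step (hypothetical frequencies and coefficients; for compactness we build the pencil from a Hankel arrangement of the moments, which carries the same shift structure as the Toeplitz form of Equation (15)):

```python
import numpy as np

N = 64
f_true = np.array([3, 19, 40])           # hypothetical aliased positions
p = np.array([4.0, 2.5, 1.5])            # their coefficients
a = len(f_true)
z_true = np.exp(-2j * np.pi * f_true / N)

# Moments m_k = sum_j p_j z_j^k for k = 0 .. 2a
m = np.array([np.sum(p * z_true**k) for k in range(2 * a + 1)])

# Pencil from a Hankel arrangement: Y[s, t] = m_{s+t};
# Y1 drops the rightmost column, Y2 drops the leftmost one
Y = np.array([[m[s + t] for t in range(a + 1)] for s in range(a + 1)])
Y1, Y2 = Y[:, :-1], Y[:, 1:]

# Generalized eigenvalues of the pencil Y2 - lambda*Y1 are the z_j's
z_est = np.linalg.eigvals(np.linalg.pinv(Y1) @ Y2)
f_est = np.sort(np.round(np.angle(z_est) * N / (-2 * np.pi)).astype(int) % N)
assert np.array_equal(f_est, np.sort(f_true))
```

The proof idea of Theorem 3 is visible here: Y_1 = U C U_a^T and Y_2 = U C Z U_a^T with Z = diag(z_j), so the pencil's generalized eigenvalues are exactly the diagonal of Z.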
Step 3: Obtain the effective frequency values p_j in bucket i. Solution: In order to use the CS method, we need several random samplings. For P random numbers r_1, ..., r_P, we calculate ŷ_{B,τ} = F_B D_L S_τ x for the set τ = {τ_0 = r_1, τ_1 = r_2, ..., τ_{P−1} = r_P} in P rounds. Suppose that in bucket i the number of significant frequencies a and the approximate effective frequency positions z_j have been obtained by steps 1 and 2; we then get Equation (18). (There may be errors in the z_j's obtained in step 2 because of interference from the other L − a Prony components.) Equation (18) is very similar to the CS formulation. The model of CS is y ≈ ΦS, where S is a sparse signal, Φ is a sensing matrix, and y is the vector of measurements. In Equation (18), y is a P × 1 vector of P measurements, Φ is a P × L matrix, and S is an L × 1 vector that is a-sparse. Note that Φ must satisfy either the restricted isometry property (RIP) or the mutual incoherence property (MIP) for successful recovery with high probability. It is known that Gaussian random matrices and partial Fourier matrices are good candidates for Φ, so the Φ of Equation (18) meets the criteria. Furthermore, the number of measurements P should be more than a log10(L), so that the measurements are sufficient to recover the signal.
In order to obtain the p_j's by a CS solver, we use the subspace pursuit method. The process is as follows: 1) Through the positions f_j gained in step 2, obtain the possible values of the z_j's. An (over-complete) dictionary can be characterized by a matrix D containing the 3a vectors listed above (one expects one third of these vectors to form a basis); each (column) vector in the dictionary is called an atom. 2) From the 3a atoms of the dictionary matrix D, find the a atoms that best match the residual error of the measurements, and select these a atoms to construct a new sensing matrix Φ′. 3) Obtain Ŝ on this support by Ŝ = argmin ||y − Φ′Ŝ||_2 through the least squares method (LSM). 4) If the residual error ||y − Φ′Ŝ||_2 meets the requirement, or the number of iterations reaches the preset limit, or the residual error becomes larger, quit the iteration; otherwise continue from step 2. After computing, we get the final sensing matrix Φ′ of size P × a and the a-sparse signal Ŝ, exactly the right-hand side of Equation (18). So we get the effective frequency positions z_j and the effective frequency values p_j in bucket i.
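Once the support is fixed, the estimation step reduces to a small least-squares problem. The sketch below (hypothetical sizes; it replaces the full subspace pursuit with a single least-squares solve on a known support) illustrates the structure of Equation (18):

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 64, 12                            # signal size, number of random shifts
f = np.array([3, 19, 40])                # support assumed found in steps 1 and 2
p_true = np.array([4.0, 2.5, 1.5])       # values we want to recover
z = np.exp(-2j * np.pi * f / N)

tau = rng.choice(N, size=P, replace=False)          # random shift offsets
y = np.array([np.sum(p_true * z**t) for t in tau])  # bucket measurements (Eq. 18)

# P x a sensing matrix on the chosen atoms: Phi[t, j] = z_j ** tau_t
Phi = z[None, :] ** tau[:, None]
p_est, *_ = np.linalg.lstsq(Phi, y, rcond=None)
assert np.allclose(p_est, p_true)
```

In the real algorithm, the support is uncertain, so subspace pursuit iterates this solve over candidate atom sets, keeping the set with the smallest residual.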

Peeling framework based on the bipartite graph
Secondly, we introduce the peeling framework, which recovers all K significant frequencies layer by layer. The block diagram of the peeling framework is shown in Figure 2. In the peeling framework, we require the size N of the signal x to be a product of a few (typically three or more) co-prime numbers p_i, for example N = p_1 p_2 p_3 where p_1, p_2, p_3 are co-prime. With this precondition, we determine the L_i's and B_i's obtained from the p_i's in each bucketization cycle. In every cycle, we use the same set τ = {τ_0 = r_1, τ_1 = r_2, ..., τ_{R−1} = r_R}, chosen according to the spectrum reconstruction methods introduced later, to calculate ŷ_{B,τ} = F_B D_L S_τ x in R rounds (this also means R delay chains per bucket). Suppose there are d cycles; after the d cycles, stage 1, encoding by bucketization, is complete. In order to solve the aliasing problem, the filtered spectrum in bucket i of the j-th cycle is denoted by the vector y⃗_{B_j,i} = [ŷ_{B_j,τ_0}[i], ..., ŷ_{B_j,τ_{R−1}}[i]]^T ∈ C^{R×1}.
We use a simple example to illustrate the process of encoding by bucketization. Consider a signal x of size twenty (N = 20) that has only five (K = 5) significant coefficients x̂_1, x̂_3, x̂_5, x̂_10, x̂_13 >> 0, while the rest of the coefficients are approximately equal to zero. With this precondition, there are two bucketization cycles. In the first cycle, with B_1 = 4 and L_1 = 5, we get four vectors {y⃗_{4,0}, y⃗_{4,1}, y⃗_{4,2}, y⃗_{4,3}} representing the filtered spectrum in four buckets for the set τ in R rounds. In the second cycle, with B_2 = 5 and L_2 = 4, we get five vectors {y⃗_{5,0}, y⃗_{5,1}, y⃗_{5,2}, y⃗_{5,3}, y⃗_{5,4}}. After the bucketization, we can construct the bipartite graph shown in Figure 3 through Equation (19). In Figure 3, there are N = 20 variable nodes on the left (referring to the twenty coefficients of x̂) and N_b = B_1 + B_2 = 4 + 5 = 9 parity check nodes on the right (referring to the nine buckets in two cycles). The value of each parity check node is approximately equal to the complex sum of the values of its left-neighbor variable nodes, through Equation (19). Among these check nodes, one with no significant variable node as a left neighbor is called a 'zero-ton' bucket (three blue-colored check nodes); one with exactly one significant variable node as a left neighbor is called a 'single-ton' bucket (three green-colored check nodes); and one with more than one significant variable node as left neighbors is called a 'multi-ton' bucket (three red-colored check nodes). Fig. 3 Variable nodes on the left and parity check nodes on the right (a blue square with "N" means a 'zero-ton' bucket, a green square with "S" means a 'single-ton' bucket, a red square with "M" means a 'multi-ton' bucket).
After bucketization, the subsequent problem is how to recover the spectrum from all the buckets gained from the several cycles. Through identification of the vector y⃗_{B_j,i}, we can determine the character of bucket B_j[i]. If the bucket is a 'zero-ton' bucket, then x̂_{(kB_j+i)} = 0 for k ∈ [0, L_j − 1], so the frequency recovery in this bucket is solved. If the bucket is a 'single-ton' bucket, suppose the one effective frequency position f_0 and the one effective frequency value x̂_{f_0} can be obtained by the methods described below; then the frequency recovery in this bucket is solved. If the bucket is a 'multi-ton' bucket, it is necessary to reduce the 'multi-ton' bucket to a 'single-ton' bucket to realize its decoding. For example, after the first peeling, in bucket i of the j-th cycle, suppose x̂_{f_0}, ..., x̂_{f_{n_1−1}} have been obtained from other 'single-ton' buckets; subtracting their contributions from y⃗_{B_j,i} leaves the contribution of the remaining frequencies in the bucket. Through identification of this residual vector, we can analyze the remaining elements in this bucket: if it is now a 'zero-ton' or a 'single-ton' bucket, we can stop peeling and obtain all frequencies in this bucket. If not, we continue with the second peeling, supposing x̂_{f_{n_1}}, ..., x̂_{f_{n_2−1}} can be obtained from other 'single-ton' buckets through the new peeling, and identify the residual of y⃗_{B_j,i} again to continue analyzing the bucket. After q rounds of peeling, the problem of frequency recovery in the 'multi-ton' bucket is solved. If the frequency recovery in all buckets can be solved, the spectrum reconstruction is complete. The peeling decoder successfully recovers all the frequencies with high probability under the following three assumptions: 1) 'zero-ton', 'single-ton', and 'multi-ton' buckets can be identified correctly; 2) if the bucket is a 'single-ton' bucket, the decoder can locate the effective frequency position f_0 and estimate the effective frequency value x̂_{f_0} correctly; 3) the peeling platform is sufficient to cope with the 'multi-ton' buckets.
Subproblem 1: How to identify the vector y⃗_{B_j,i} to distinguish the bucket B_j[i]? Solution: In the exactly sparse case, if ||y⃗_{B_j,i}||_2 = 0, the bucket is a 'zero-ton' bucket. If the bucket is not a 'zero-ton' bucket, make a second judgment: the bucket is a 'single-ton' bucket if and only if ||y⃗_{B_j,i} − poly(x̂_{f_0})||_2 = 0, where x̂_{f_0} is gained from the solution of subproblem 2. If the bucket B_j[i] is neither a 'zero-ton' bucket nor a 'single-ton' bucket, it is a 'multi-ton' bucket. In the general sparse case, the two equations become ||y⃗_{B_j,i}||_2 ≤ T_0 and ||y⃗_{B_j,i} − poly(x̂_{f_0})||_2 ≤ T_1, where T_0 and T_1 are identification thresholds. It costs O(R) runtime to identify the vector y⃗_{B_j,i} once x̂_{f_0} is known. After d cycles, it costs O(R(B_1 + ... + B_d)) runtime for the identification in the first peeling.
Subproblem 2: Suppose the target is a 'single-ton' bucket; how can the one frequency in this bucket be recovered?
Solution: In the exactly sparse case, we use R = 3 and the set τ = {τ_0 = 0, τ_1 = 1, τ_2 = 2}; if the three measurements are consistent with a single complex exponential, it is a single-ton bucket. If it is a single-ton bucket, the position f_0 can be obtained from the phase of the ratio ŷ_{B_j,1}[i] / ŷ_{B_j,0}[i], and the value x̂_{f_0} can be obtained by x̂_{f_0} = ŷ_{B_j,0}[i]. In all, it costs three samples and constant runtime to recover the one significant frequency in one bucket.
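In the exactly sparse case the single-ton recovery can be sketched as follows (assuming the same FFT and shift conventions as the earlier bucketization sketch; under that normalization the recovered value carries an extra factor of L, and the position and value are hypothetical):

```python
import numpy as np

N, B = 20, 4
L = N // B

# One significant frequency that lands alone in bucket i = 10 mod 4 = 2
f0, v0 = 10, 7.0
x_hat = np.zeros(N, dtype=complex)
x_hat[f0] = v0
x = np.fft.ifft(x_hat)

def bucketize(x, B, tau):
    N = len(x)
    L = N // B
    return np.fft.fft(x[(tau + np.arange(B) * L) % N])

y0, y1 = bucketize(x, B, 0), bucketize(x, B, 1)
i = f0 % B

# Position from the phase of the ratio of the two shifted measurements
r = y1[i] / y0[i]
f_est = int(round(np.angle(r) * N / (2 * np.pi))) % N
# Value from the unshifted measurement (times L for this FFT normalization)
v_est = L * y0[i]
assert f_est == f0 and np.isclose(v_est, v0)
```

The third measurement (τ_2 = 2) is what makes the single-ton test possible: a single tone must satisfy ŷ_{B,2}[i] / ŷ_{B,1}[i] = ŷ_{B,1}[i] / ŷ_{B,0}[i], which generically fails for a multi-ton bucket.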
In the general sparse case, the frequency recovery method uses the optimal whitening-filter coefficients of the minimum mean squared error (MMSE) estimator and a sinusoidally structured bin-measurement matrix for speedy recovery. At first, the set of candidates for f_0 is f_0 ∈ {i, i + B, ..., i + (L − 1)B}, so there are L possible choices. In the first iteration of the binary search, we choose a random number r_0, then calculate ŷ_{B_j,r_0}[i], ŷ_{B_j,r_0+1}[i], ..., ŷ_{B_j,r_0+m−1}[i] in m rounds. In paper [36], the maximum likelihood estimate (MLE) of ∠ω^{f_0} is calculated by Equation (21), where w_t is defined as in Equation (22). After obtaining the estimate of ∠ω^{f_0}, make a binary search judgment: if ∠ω^{f_0} ∈ [0, π], we get the new restriction f_0 ∈ [0, N/2 − 1]; otherwise the restriction is f_0 ∈ [N/2, N − 1], the complement of [0, N/2 − 1]. The next iteration is very similar: we choose a random number r_1, then calculate ŷ_{B_j,r_1}[i], ŷ_{B_j,r_1+2}[i], ..., ŷ_{B_j,r_1+2(m−1)}[i] in bucket i, and obtain ∠ω^{2f_0} through Equation (21) and Equation (22), which halves the candidate set again; continuing in this way determines f_0.

Subproblem 3: How to solve the 'multi-ton' buckets by the peeling platform? Solution with method 1: The genie-assisted peeling decoder based on the bipartite graph is useful. As shown in Figure 3, the bipartite graph represents the characteristics of the bucketization: the variable nodes on the left represent the frequencies, and the parity check nodes on the right represent the buckets. Every efficient variable node connects to d different parity check nodes as its neighbors in the d cycles; these are the d edges connected to each efficient variable node. An example of the process of the genie-assisted peeling decoder is shown in Figure 4, and the steps are as follows: Step 1: We identify all the buckets. If a bucket is a 'zero-ton' bucket, the frequency recovery in this bucket is finished.
If the bucket is a 'single-ton' bucket, we can obtain the frequency in the bucket through the solution of subproblem 2. In the graph, these operations select and remove all right nodes with degree 0 or degree 1, and moreover remove the edges connected to these right nodes and the corresponding left nodes.
Step 2: We remove the contributions of the frequencies gained in step 1 from the other buckets. For example, the first peeling in bucket i means we calculate a new residual vector after x̂_{f_0}, ..., x̂_{f_{n_1−1}} have been gained in step 1. In the graph, these operations remove all the other edges connected to the left nodes removed in step 1; when an edge is removed, its contribution is subtracted from its right node. In the new graph, the degree of some right nodes decreases accordingly.
Step 3: If all buckets have been identified, we have successfully reconstructed the spectrum x̂. Otherwise, we return to steps 1 and 2 to continue the identifying and peeling operations. In the graph, this means that if all right nodes have been removed, the decoding is finished; otherwise, return to steps 1 and 2. Fig. 4 An example of the process of the genie-assisted peeling decoder.
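The three steps of the peeling decoder can be sketched on the N = 20, K = 5 example of Figure 3 (a simplified exactly-sparse sketch with two shifted measurements per bucket; the coefficient values are made up for illustration, and the conventions match the earlier bucketization sketch):

```python
import numpy as np

N, B1, B2 = 20, 4, 5
x_hat = np.zeros(N, dtype=complex)
x_hat[[1, 3, 5, 10, 13]] = [5.0, 3.0, 2.0, 7.0, 4.0]   # hypothetical values
x = np.fft.ifft(x_hat)

def bucketize(x, B, tau):
    N = len(x)
    L = N // B
    return np.fft.fft(x[(tau + np.arange(B) * L) % N])

# Two shifted measurements per bucket; multiplying by L undoes the 1/L alias factor
meas = {B: (N // B * bucketize(x, B, 0), N // B * bucketize(x, B, 1))
        for B in (B1, B2)}

found = {}                               # recovered position -> value
for _ in range(5):                       # a few peeling rounds suffice here
    for B, (y0, y1) in meas.items():
        for i in range(B):
            # Step 2: subtract contributions of already-recovered frequencies
            r0 = y0[i] - sum(v for f, v in found.items() if f % B == i)
            r1 = y1[i] - sum(v * np.exp(2j * np.pi * f / N)
                             for f, v in found.items() if f % B == i)
            if abs(r0) < 1e-9:           # zero-ton: nothing left in this bucket
                continue
            # Step 1: single-ton test via the phase of the measurement ratio
            f = int(round(np.angle(r1 / r0) * N / (2 * np.pi))) % N
            if f % B == i and np.isclose(r1, r0 * np.exp(2j * np.pi * f / N)):
                found[f] = r0            # single-ton: record position and value

assert sorted(found) == [1, 3, 5, 10, 13]
```

In this run the multi-ton buckets (e.g. the bucket containing x̂_1, x̂_5, x̂_13 in the first cycle) fail the single-ton test at first, become single-tons once their co-residents are peeled off via the other cycle, and are then decoded.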
Solution with method 2: The sparse-graph decoder using the packet erasure channel method is more efficient. From Figure 4, we see that the whole process of method 1 needs thirteen (9+3+1) identifications in three peelings, which is not very efficient. As we can see, spectrum recovery can be transformed into a problem of decoding over sparse bipartite graphs using the peeling platform. This problem has been well studied in the coding theory literature, from which we know that several sparse-graph code constructions are low-complexity and capacity-achieving when decoded with the erasure channel method. So we use the erasure channel method to improve efficiency.
We first give some definitions. If one left node is selected, its d neighboring right nodes are determined. If one right node is selected, its L neighboring left nodes are determined as well. Two corresponding nodes are connected through an edge (in the graph, if the variable node on the left is an efficient node, the edge is a solid line; otherwise, the edge is a dotted line or omitted). A directed edge e⃗ in the graph is represented as an ordered pair of nodes, e⃗ = {V, C} or e⃗ = {C, V}, where V is a variable node and C is a check node. A path in the graph is a directed sequence of directed edges {e⃗_1, ..., e⃗_l}, where the end node of each edge of the path is the start node of the next edge. The depth l is the length of the path, i.e., the number of directed edges in the path. The induced subgraph N_V^l is the directed neighborhood of depth l of left node V; it contains all the edges and nodes on paths e⃗_1, ..., e⃗_l starting at node V. It is a tree-like graph, as shown in Figure 5, which shows the subgraph N_V^l starting at x̂_2 with depth l equal to two, three, four, five, and six respectively, and the subgraph N_V^l starting at x̂_7 with depth l equal to six. Under these definitions, the steps of the packet erasure channel method are as follows: Step 1: Take O(K) random left nodes as start-points, and draw the O(K) trees N_V^1 from these start-points. For example, in Figure 5 we choose two left nodes x̂_2 and x̂_7 as start-points. The endpoints of N_V^1 are check nodes; we then identify the characteristics of these check nodes. If a check node is a 'zero-ton' bucket, such as y⃗_{5,2}, the path stops extending. If a check node is a 'multi-ton' bucket, we wait until its left neighboring nodes are identified by other paths. If a check node is a 'single-ton' bucket, such as y⃗_{4,2} and y⃗_{4,3}, we continue by connecting its single efficient left neighboring node.
We then get the tree N_V^2 by expanding these left nodes. Their (d − 1) new right neighboring nodes are determined through these left nodes, giving the tree N_V^3 by expanding these right nodes.
Step p: For each start-point V, we have the tree N_V^{2p−1} from the last step. The endpoints of N_V^{2p−1} are check nodes. We must first remove the contributions of the frequencies recovered along previous paths, then identify the characteristics of the modified check nodes. For example, before identifying ŷ_{5,0}, the endpoint of N_{x̂_2}^3, we must remove the contribution of x̂_10. Furthermore, before identifying ŷ_{4,1}, the endpoint of N_{x̂_2}^5, we must remove the contributions of x̂_13 and x̂_5. If a modified check node is a 'zero-ton' bucket, the path stops extending. If it is a 'multi-ton' bucket, wait until its left neighboring nodes are identified by other paths. If it is a 'single-ton' bucket, connect its only efficient left neighboring node. Then the tree N_V^{2p} is obtained by expanding these left nodes. Each of them determines (d − 1) new right neighboring nodes, which yields the tree N_V^{2p+1}.
If the number of left nodes identified equals K, the spectrum recovery has succeeded. An example is shown in Figure 5: starting from x̂_2 and x̂_7, we obtain two trees N^6_{x̂_2} and N^6_{x̂_7} (depth = 6) after three expansion steps. From the graph, we obtain all five variable nodes. The whole process needs six (4 + 4 − 2) identifications, far fewer than the thirteen identifications of method 1.
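The peeling procedure above can be sketched in code. The following is a minimal toy decoder, assuming an exactly sparse signal and a measurement model in which each bucket stores two numbers, an unshifted sum and a sum with time shift τ = 1, so that a 'single-ton' reveals its frequency through the phase of the ratio. All function and variable names are illustrative, not taken from a reference implementation:

```python
import numpy as np

def peel(N, stages, spectrum):
    """Toy peeling decoder (exactly sparse case, assumed setup).

    Each stage with B buckets hashes frequency f into bucket f % B.
    Per bucket we keep y0 = sum of coefficients and
    y1 = sum of coefficient * exp(2j*pi*f/N)  (time shift tau = 1),
    so a single-ton reveals its frequency from the phase of y1/y0.
    """
    # build the bucket measurements for every stage from the planted spectrum
    y0 = [np.zeros(B, complex) for B in stages]
    y1 = [np.zeros(B, complex) for B in stages]
    for f, v in spectrum.items():
        for s, B in enumerate(stages):
            y0[s][f % B] += v
            y1[s][f % B] += v * np.exp(2j * np.pi * f / N)

    recovered = {}
    for _ in range(2 * len(spectrum)):            # enough peeling rounds
        progress = False
        for s, B in enumerate(stages):
            for b in range(B):
                if abs(y0[s][b]) < 1e-9:          # 'zero-ton': stop this path
                    continue
                # single-ton test: the ratio must be an N-th root of unity
                # consistent with this bucket index
                f = round(np.angle(y1[s][b] / y0[s][b]) * N / (2 * np.pi)) % N
                v = y0[s][b]
                ok = (f % B == b and
                      abs(y1[s][b] - v * np.exp(2j * np.pi * f / N)) < 1e-6)
                if ok and f not in recovered:
                    recovered[f] = v
                    # peel: remove the contribution from every stage
                    for s2, B2 in enumerate(stages):
                        y0[s2][f % B2] -= v
                        y1[s2][f % B2] -= v * np.exp(2j * np.pi * f / N)
                    progress = True
        if not progress:                          # only multi-tons remain
            break
    return recovered
```

For example, with N = 20 and bucket counts {4, 5}, three planted frequencies are recovered by alternately peeling single-tons, as in the Figure 5 walkthrough: a bucket that is a multi-ton in one stage becomes a single-ton in another once its neighbors are peeled.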

Iterative framework based on the binary tree search method
Thirdly, we introduce the iterative framework based on the binary tree search method. The example is shown in Figure 6. The steps of the binary tree search are as follows: Step 1: Calculate the node ŷ_{1,0}[0] of the first layer, and obtain m_0 and the distribution of efficient nodes in this layer.
Step 2: According to the result of the last step, selectively calculate the nodes ŷ_{2,0}[0] and ŷ_{2,0}[1] of the second layer, and obtain m_1 and the distribution of efficient nodes in this layer.
Step p: We do not need to start from step 1; we can start directly from step p (p = O(log K)). With the binary tree search, if m_d obtained in step d + 1 is approximately equal to the sparse number K, the search is finished. In Figure 6, m_3 = 5 = K, so the search finishes after the fourth layer in step 4. According to the distribution of efficient nodes in layer d + 1, we can solve the frequency recovery problem of each 'single-ton' bucket. Finally, the K-frequency recovery problem is solved by combining the efficient frequencies of all buckets. In Figure 6, ŷ_{8,0}[4], ŷ_{8,0}[6], ŷ_{8,0}[1], ŷ_{8,0}[5] and ŷ_{8,0}[3] exist, which means each of the five buckets contains exactly one effective frequency. Through the frequency recovery of each 'single-ton' bucket, we obtain x̂_1, x̂_6, x̂_13, x̂_20 and x̂_59 individually, and finally x̂ with five elements. This algorithm involves three subproblems. One of them is that some frequencies may still be aliased until d = log N; therefore, a threshold of 0.75K can be considered, which means the K frequencies are put into 0.75K buckets. This aliasing problem can be solved with the SVD decomposition described in the last paragraph, and its extra sampling and computation cost may be far less than that of continuing the expansion search when the next layer may separate no more frequencies.
Subproblem 3: The frequency recovery problem of an approximately 'single-ton' bucket.
Solution: In the exactly sparse case, the problem can be solved by the phase encoding method described in the last paragraph. In the general sparse case, it can be solved by the CS method or the MMSE estimator described in the last paragraph.
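As a concrete illustration of the phase encoding method in the exactly sparse case, the sketch below assumes the measurement model y_τ = x̂_f · e^{2πifτ/N} for shifts τ = 0, 1, so the frequency is read off the phase of the ratio of the two measurements and the value is the unshifted measurement itself (the function name is hypothetical):

```python
import numpy as np

def singleton_recover(y_tau0, y_tau1, N):
    """Phase-encoding recovery for an exactly sparse 'single-ton' bucket.

    Assumed model: y_tau = x_f * exp(2j*pi*f*tau/N) for tau = 0, 1.
    The phase of y_tau1 / y_tau0 is 2*pi*f/N, which yields f directly.
    """
    f = round(np.angle(y_tau1 / y_tau0) * N / (2 * np.pi)) % N
    return f, y_tau0

# a single-ton bucket produced by x_f = 3 - 1j at f = 13 with N = 64
N, f, v = 64, 13, 3 - 1j
y0 = v
y1 = v * np.exp(2j * np.pi * f / N)
```

In the noisy (general sparse) case this single phase estimate becomes unreliable, which is exactly why the text falls back on the CS method or the MMSE estimator there.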
The binary tree search method is especially suitable for small K or small support. In the case of small support, the frequencies must be scattered into every different bucket. For example, suppose only eight consecutive frequencies of the 1024 frequencies x̂_0, ..., x̂_1023 are significant; their eight locations are then equal to 0 mod 8, 1 mod 8, 2 mod 8, ..., 7 mod 8, respectively. In this way, m_3 = 8 is obtained in the fourth layer, and the binary tree search stops at step 4. The method also has the advantage of being deterministic: with the layers' expansion, the distribution of at most one frequency per bucket must eventually occur, unlike in probabilistic algorithms such as the sFFT-DT algorithms.
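The layer-doubling search can be sketched as follows, under the assumptions that N is a power of two and the signal is exactly sparse. Subsampling the time signal by N/B and taking a B-point FFT realizes the aliasing filter, folding frequency f into bucket f mod B; doubling B per layer is the tree expansion:

```python
import numpy as np

def binary_tree_buckets(x, K, rel_tol=1e-8):
    """Sketch of the binary-tree bucket search (exactly sparse case assumed).

    Layer d uses B = 2^d buckets; a length-B FFT of x subsampled by N/B
    gives the aliased spectrum y[b] = (B/N) * sum_{f = b mod B} X[f].
    Stop once the number of efficient (non-empty) buckets m_d reaches K:
    every efficient bucket is then a 'single-ton'.
    """
    N = len(x)
    B = 1
    while B < N:
        B *= 2
        y = np.fft.fft(x[::N // B])                  # B-point aliased spectrum
        efficient = np.abs(y) > rel_tol * np.abs(y).max()
        m_d = int(efficient.sum())
        if m_d == K:                                 # all single-tons: stop
            return B, np.nonzero(efficient)[0]
    raise ValueError("frequencies never fully separated")
```

Note that in this sketch three frequencies that collide modulo small B keep the search expanding until a layer separates them, matching the text's observation that aliasing can persist until d = log N.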

Algorithms analysis in theory
As mentioned above, the goal of frequency bucketization is to decrease runtime and sampling complexity by taking advantage of low dimensions. After bucketization, the filtered spectrum ŷ_{L,τ,σ} is obtained from the original signal x. Then, through the three frameworks introduced above, it is sufficient to recover the spectrum x̂ from the filtered spectrum ŷ_{L,τ,σ}, each framework in its own way. In this section, we introduce and analyze the performance of the corresponding algorithms: the sFFT-DT1.0, sFFT-DT2.0, sFFT-DT3.0, FFAST, R-FFAST and DSFFT algorithms.

The sFFT-DT algorithms by the one-shot framework
The block diagram of the sFFT algorithms by the one-shot framework is shown in Figure 1. We explain the details and analyze the performance in theory below. Stage 2-Step 2 Locate the positions by method 3: In a bucket waiting to be solved, first compute the SVD of the matrix Y: Y = UΣV*, then obtain the positions from the set of nonzero ordinary eigenvalues of (V_1)^+ V_2. The sFFT-DT3.0 algorithm uses this method, and it costs O(a³mK) runtime in this step. Stage 2-Step 3 Estimate the values by a CS solver: In a bucket waiting to be solved, 1) through the positions f_j obtained in step 2, compute three candidate values for each z_j, obtaining 3a vectors; the dictionary D contains these 3a vectors. 2) From the 3a atoms of the dictionary matrix D, find the atoms that best match the residual error of the measurements, and select a of them to construct a new sensing matrix Φ̂. 3) Obtain Ŝ from Ŝ = argmin ||y − Φ̂Ŝ||₂ through the LSM. 4) If the residual error meets the requirement, the number of iterations reaches the assumed limit, or the residual error becomes larger, quit the iteration; otherwise return to step 2. It costs O(3a²mPK) runtime in this step.
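The matching-pursuit style value estimation of Stage 2-Step 3 can be illustrated with a generic orthogonal matching pursuit routine. This is a sketch of the general CS solver pattern (pick the atoms best correlated with the residual, refit by least squares, check the residual), not the exact sFFT-DT implementation; the dictionary here is a hypothetical random one:

```python
import numpy as np

def omp(Phi, y, a, max_iter=None, tol=1e-9):
    """Generic orthogonal matching pursuit, the kind of CS solver the
    one-shot framework uses to estimate values inside a bucket.

    Phi : dictionary matrix (measurements x atoms), illustrative here
    y   : bucket measurement vector
    a   : number of atoms to select (sparsity inside the bucket)
    """
    residual = y.copy()
    support = []
    coef = None
    for _ in range(max_iter or a):
        # pick the atom most correlated with the current residual
        scores = np.abs(Phi.conj().T @ residual)
        scores[support] = -1                       # never reselect an atom
        support.append(int(np.argmax(scores)))
        # least-squares refit on the selected atoms (the LSM step)
        sub = Phi[:, support]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        residual = y - sub @ coef
        if np.linalg.norm(residual) < tol:         # residual small: quit
            break
    return support, coef
```

The stopping rules mirror the text: the loop ends when the residual is small enough or the iteration budget is exhausted.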

The FFAST algorithms by the peeling framework
The block diagram of the sFFT algorithms by the peeling framework is shown in Figure 2. We explain the details and analyze the performance in theory below. Stage 1 Encoding by bucketization: Under the assumption K < N^{1/3}, we run three bucketization cycles, and each cycle needs R rounds. The numbers of buckets in the three cycles are B_1, B_2, B_3, and correspondingly the bucket sizes are L_1, L_2, L_3. In the exactly sparse case, in the first peeling, we use the FFAST algorithm with R = 3 and the set τ = {τ_0 = 0, τ_1 = 1, τ_2 = 2} in three rounds. In the general sparse case, in the first peeling, we use the R-FFAST algorithm with R = Cm = O(log^{4/3} N) and the set τ = {r_0, r_0 + 1, ..., r_1, r_1 + 2, ..., r_{C−1}, r_{C−1} + 2^{C−1}, ..., r_{C−1} + 2^{C−1}(m − 1)} in R rounds. After bucketization, we obtain the vectors ŷ_{B_1,0}, ...
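The bucketization stage can be illustrated in code: a circular time shift by τ followed by subsampling to length B aliases frequency f into bucket f mod B, which is exactly the measurement the peeling decoder consumes. A minimal sketch, assuming B divides N (the bucket counts and shifts are illustrative, not FFAST's actual parameter choices):

```python
import numpy as np

def bucketize(x, B, tau=0):
    """Aliasing-filter bucketization: shift x circularly by tau, subsample
    to length B, and take a B-point FFT.  With the DFT convention
    X[k] = sum_n x[n] e^{-2j*pi*kn/N}, this yields
        y[b] = (B/N) * sum_{f = b mod B} X[f] * e^{2j*pi*f*tau/N},
    i.e. frequency f is hashed into bucket f mod B with phase tau*f.
    """
    N = len(x)
    shifted = np.roll(x, -tau)          # shifted[n] = x[(n + tau) mod N]
    return np.fft.fft(shifted[::N // B])

# one frequency at f = 7, N = 20; buckets of two coprime sizes 4 and 5
N = 20
X = np.zeros(N, dtype=complex)
X[7] = 1.0
x = np.fft.ifft(X)
y4 = bucketize(x, 4)            # f = 7 lands in bucket 7 mod 4 = 3
y5_0 = bucketize(x, 5)          # f = 7 lands in bucket 7 mod 5 = 2
y5_1 = bucketize(x, 5, tau=1)   # same bucket, phase-encoded by tau = 1
```

Comparing the same bucket across two shifts (τ = 0, 1) then gives the phase-encoding frequency recovery used for single-ton buckets.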

The DSFFT algorithm by the binary tree search framework
The first stage of the DSFFT algorithm finds exactly O(K) efficient buckets through the binary tree search method. When m_d obtained in step (d + 1) equals the sparse number K, the binary tree search is finished. We now calculate the probability of this case. The total number of buckets is 2^d = B, and the number of aliasing frequencies per bucket is L = N/B = N/2^d. In other words, the desired case is that the K large frequencies among the N frequencies are allocated to B buckets with no aliasing in the distributed buckets. The probability P is calculated as follows: first, for the first bucket, the probability P_1 is the number of non-aliasing cases divided by the total number of cases. Table 1 shows the probabilities P_B^0, P_B^1, P_B^2, P_B^3 for different K, with B ranging from K to 8K. As we can see from Table 1, the DSFFT algorithm is efficient only when K is very small. The total complexity is the sum over cases of the probability times the conditional complexity, which yields the total runtime complexity and the total sampling complexity.
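The non-aliasing probability admits a simple counting argument, shown below with a Monte-Carlo cross-check. This sketch assumes K distinct frequencies drawn uniformly from the N positions and buckets of equal size L = N/B; the i-th frequency must then land in one of the (B − i) still-empty buckets:

```python
import numpy as np
from math import prod

def p_no_alias(N, B, K):
    """Probability that K distinct frequencies, uniform over N positions,
    fall into K different buckets of size L = N/B.

    Placing the frequencies one by one: after i non-colliding placements,
    (B - i) buckets with L positions each remain empty, out of (N - i)
    free positions, so the i-th factor is (B - i) * L / (N - i).
    """
    L = N // B
    return prod((B - i) * L / (N - i) for i in range(K))

def p_no_alias_mc(N, B, K, trials=20000, seed=1):
    """Monte-Carlo estimate of the same probability."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        freqs = rng.choice(N, size=K, replace=False)
        hits += len(set(freqs % B)) == K       # all buckets distinct?
    return hits / trials
```

For N = 1024, K = 8 and B = 8K = 64 the formula gives roughly 0.65, which matches the qualitative message of Table 1: even a generous B leaves a sizable aliasing probability unless K is very small.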

Summary of six algorithms in theory
After analyzing the three frameworks and their six corresponding algorithms, Table 2 can be compiled, with additional information on other sFFT algorithms and the fftw algorithm. Table 2 lists, for each algorithm, its runtime complexity, sampling complexity and robustness in theory. From Table 2, we can see that the sFFT-DT1.0 algorithm and the FFAST algorithm have the lowest runtime and sampling complexity, but they are not robust.
The other algorithms using the aliasing filter have good sampling complexity, but compared with other sFFT algorithms they have no advantage in runtime complexity, except the sFFT-DT3.0 algorithm. In addition, the use of the aliasing filter is limited in some respects. For the algorithms of the one-shot framework, the performance is not very efficient if N/K ≤ 32000, because of the number of measurements necessary for CS recovery. For the algorithms of the peeling framework, the assumption K < N^{1/3} is required, because four or more cycles would be very complicated. For the algorithm of the binary tree search framework, K must be very small, because only then is there no need to expand a large binary tree.

Algorithms analysis in practice
In this section, we evaluate the performance of six sFFT algorithms using the aliasing filter: the sFFT-DT1.0, sFFT-DT2.0, sFFT-DT3.0, FFAST, R-FFAST and DSFFT algorithms. We first compare these algorithms' runtime, percentage of the signal sampled and robustness with each other. Then we compare some of them with other algorithms: the fftw, sFFT1.0, sFFT2.0, sFFT3.0, sFFT4.0 and AAFFT algorithms. All experiments are run on a Linux CentOS computer with a 4-core Intel(R) Core(TM) i5 CPU and 8 GB of RAM.

Experimental Setup
In the experiments, the test signals are generated by randomly selecting K of the N frequencies and assigning each a magnitude of 1 and a uniformly random phase. The remaining frequencies are set to zero in the exactly sparse case, or combined with additive white Gaussian noise in the general sparse case, whose variance depends on the required SNR. The parameters of the algorithms are chosen to balance time efficiency against robustness.
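The signal generation just described can be sketched as follows (the function name and parameters are illustrative, not the platform's actual API):

```python
import numpy as np

def make_test_signal(N, K, snr_db=None, seed=0):
    """Generate a test signal as the experiments describe: K random
    frequencies of magnitude 1 with uniformly random phase, the rest zero
    (exactly sparse) or buried in complex white Gaussian noise whose
    variance is set by the requested SNR (general sparse).
    """
    rng = np.random.default_rng(seed)
    X = np.zeros(N, dtype=complex)
    support = rng.choice(N, size=K, replace=False)
    X[support] = np.exp(2j * np.pi * rng.random(K))   # |X| = 1, random phase
    x = np.fft.ifft(X)
    if snr_db is not None:
        signal_power = np.mean(np.abs(x) ** 2)
        noise_power = signal_power / (10 ** (snr_db / 10))
        noise = rng.standard_normal(N) + 1j * rng.standard_normal(N)
        x = x + noise * np.sqrt(noise_power / 2)
    return x, support
```

Omitting `snr_db` yields the exactly sparse case; passing e.g. `snr_db=20` yields the general sparse case used in the robustness experiments.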

Comparison experiment about different algorithms using the aliasing filter of themselves
We plot Figures 7(a) and 7(b), representing run time vs. signal size and vs. signal sparsity for the sFFT-DT2.0, sFFT-DT3.0, R-FFAST and DSFFT algorithms in the general sparse case. As mentioned above, the runtime is determined by two factors: how many buckets must be dealt with, and how much time it costs to identify the frequencies in the efficient buckets. From Figure 7 we can see: 1) The runtime of these four algorithms is approximately linear in the log scale as a function of N and non-linear as a function of K. 2) Ranking the four algorithms by time performance gives sFFT-DT3.0 > sFFT-DT2.0 > DSFFT > R-FFAST. The reason is that the method of the R-FFAST algorithm is the most time-consuming, the method of the DSFFT algorithm is also not efficient, the method of sFFT-DT2.0 is much better, and the method of sFFT-DT3.0 has the highest time performance. 3) K has certain limitations: the DSFFT algorithm is very inefficient when K is greater than 50, and the R-FFAST algorithm does not work when K is greater than N^{1/3}. We plot Figures 8(a) and 8(b), representing the percentage of the signal sampled vs. signal size and vs. signal sparsity for the sFFT-DT2.0, sFFT-DT3.0 and R-FFAST algorithms in the general sparse case. As mentioned above, the percentage of the signal sampled is also determined by two factors: how many buckets there are and how many samples are taken per bucket. From Figure 8 we can see: 1) The percentage of the signal sampled by these three algorithms is approximately linear in the log scale as a function of N and non-linear as a function of K. 2) Ranking the three algorithms by sampling complexity gives R-FFAST > sFFT-DT2.0 = sFFT-DT3.0, because the R-FFAST algorithm uses the principle of the CRT, its number of buckets is independent of K, and the proportion decreases as N increases. 3) There is a limit for the sFFT-DT2.0 and sFFT-DT3.0 algorithms: when N/K is too large, B cannot maintain linearity in the sparsity K.
For the R-FFAST algorithm, there is a limit: K cannot be greater than N^{1/3}. We plot Figures 9(a) and 9(b), representing run time and L₁-error vs. SNR for the sFFT-DT2.0, sFFT-DT3.0 and R-FFAST algorithms. From Figure 9 we can see: 1) The runtime of sFFT-DT2.0 and sFFT-DT3.0 is approximately constant vs. SNR, but the runtime of the R-FFAST algorithm increases with the noise. 2) To a certain extent, these three algorithms are all robust. When the SNR is low, only the R-FFAST algorithm satisfies the guarantee of robustness. When the SNR is medium, the sFFT-DT3.0 algorithm also meets the guarantee of robustness. Only when the SNR is greater than 20 dB can the sFFT-DT2.0 algorithm deal with the noise interference. The reason is that the binary search approach is better than the other approaches in terms of robustness.

Comparison experiment about algorithms using the aliasing filter and other algorithms
We plot Figures 10(a) and 10(b), representing run time vs. signal size and vs. signal sparsity for the sFFT-DT3.0, R-FFAST, sFFT2.0, sFFT4.0, AAFFT and fftw algorithms in the general sparse case. From Figure 10, we can see: 1) These algorithms are approximately linear in the log scale as a function of N, except the fftw algorithm, and approximately linear in the standard scale as a function of K, except the fftw and sFFT-DT3.0 algorithms. 2) Ranking the six algorithms by runtime performance gives sFFT2.0 > sFFT4.0 > AAFFT > sFFT-DT3.0 > fftw > R-FFAST when N is large. The reason is that the Least Absolute Shrinkage and Selection Operator (LASSO) method used in R-FFAST costs a lot of time; the SVD method and the CS method used in sFFT-DT also cost a lot of time. 3) Ranking the six algorithms by runtime performance gives fftw > sFFT-DT3.0 > sFFT4.0 > sFFT2.0 > AAFFT > R-FFAST when K is large, because algorithms using the aliasing filter need far fewer buckets than algorithms using the flat filter when K is large. We plot Figures 11(a) and 11(b), representing the percentage of the signal sampled vs. signal size and vs. signal sparsity for the sFFT-DT3.0, R-FFAST, sFFT2.0, sFFT4.0, AAFFT and fftw algorithms in the general sparse case. From Figure 11, we can see: 1) These algorithms are approximately linear in the log scale as a function of N, except the fftw and sFFT-DT algorithms. The reason is that sampling in low dimensions in sFFT algorithms decreases sampling complexity, while the CS method limits the bucket size in the sFFT-DT algorithm. They are approximately linear in the standard scale as a function of K, except the R-FFAST, sFFT-DT and fftw algorithms, because algorithms using the aliasing filter save samples by using fewer buckets. 2) Ranking the six algorithms by sampling complexity gives R-FFAST > sFFT4.0 > AAFFT > sFFT2.0 > sFFT-DT3.0 > fftw when N is large.
3) Ranking by sampling complexity gives sFFT-DT3.0 > sFFT4.0 > AAFFT > sFFT2.0 > fftw when K is large. We plot Figures 12(a) and 12(b), representing run time and L₁-error vs. SNR for the sFFT-DT3.0, R-FFAST, sFFT2.0, sFFT4.0, AAFFT and fftw algorithms. From Figure 12, we can see: 1) The runtime is approximately constant vs. SNR, except for the R-FFAST algorithm. 2) To a certain extent, these six algorithms are all robust, but when the SNR is low, only the fftw and R-FFAST algorithms satisfy the guarantee of robustness. When the SNR is medium, the sFFT2.0, AAFFT and sFFT-DT3.0 algorithms also meet the guarantee of robustness. Only when the SNR is greater than 20 dB can the sFFT4.0 algorithm deal with the noise interference.

Conclusion
In the first part, the paper introduces the techniques used in the sFFT algorithms, including the time-shift operation and the subsampling operation. In the second part, we analyze six typical algorithms using the aliasing filter in detail. In the one-shot framework based on the CS solver, the sFFT-DT1.0 algorithm uses the polynomial method, the sFFT-DT2.0 algorithm uses the enumeration method and the sFFT-DT3.0 algorithm uses the matrix pencil method. In the peeling framework based on the bipartite graph, the FFAST algorithm uses the phase encoding method and the R-FFAST algorithm uses the MMSE estimator method. In the iterative framework, the DSFFT algorithm uses the binary tree search method. We summarize the performance of these algorithms, including runtime complexity, sampling complexity and robustness in theory, in Table 2. In the third part, we make two categories of experiments for computing signals of different SNR, different N and different K on a standard testing platform, and record the run time, percentage of the signal sampled and L₀, L₁, L₂ errors using nine different sFFT algorithms in every situation, both in the exactly sparse case and the general sparse case. The analysis of the experimental results satisfies the theoretical inference.
The main contributions of this paper are: 1) developing a standard testing platform that can test more than ten typical sFFT algorithms in different situations on the basis of an old platform; 2) drawing conclusions on the character and performance, in theory and practice, of the six typical sFFT algorithms using the aliasing filter: the sFFT-DT1.0, sFFT-DT2.0, sFFT-DT3.0, FFAST, R-FFAST and DSFFT algorithms.