In this section, we present an overview of the techniques and frameworks used in the sFFT algorithms based on the aliasing filter.
As mentioned above, frequency bucketization decreases runtime and sampling complexity because all subsequent operations are performed in B dimensions instead of N. After frequency bucketization, spectrum reconstruction is performed by identifying the frequencies that are isolated in their buckets. The aliasing filter, however, may lead to frequency aliasing, in which more than one significant frequency falls into the same bucket. This increases the difficulty of recovery because locating frequency positions and estimating frequency values become entangled within an aliased bucket. There are three frameworks to overcome this problem: the one-shot framework, the peeling framework, and the iterative framework.
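To make the bucketization step concrete, the following minimal sketch (in Python with NumPy; the sizes N and B and the sparsity are illustrative choices of ours, not values fixed by the algorithms) verifies the basic aliasing identity behind all three frameworks: subsampling a length-N signal by a factor L and taking a length-B FFT (with B = N/L) produces, in each bucket i, the scaled sum of all spectrum coefficients congruent to i modulo B.

```python
import numpy as np

# Hypothetical sizes for illustration: N = B * L, bucketization by subsampling.
N, B = 1024, 16
L = N // B

rng = np.random.default_rng(0)
x_hat = np.zeros(N, dtype=complex)
x_hat[rng.choice(N, size=5, replace=False)] = rng.standard_normal(5) + 1j * rng.standard_normal(5)
x = np.fft.ifft(x_hat)                  # time-domain signal with 5 significant frequencies

b = x[::L]                              # subsample by L: only B samples of x are touched
b_hat = np.fft.fft(b)                   # length-B FFT: O(B log B) instead of O(N log N)

# Aliasing identity: b_hat[i] == (1/L) * sum_j x_hat[i + j*B]
expected = x_hat.reshape(L, B).sum(axis=0) / L
assert np.allclose(b_hat, expected)
```

All operations touch only B time samples and B buckets, which is the source of the savings in runtime and sampling complexity.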
  3.1. One-Shot Framework Based on the CS Solver
Firstly, we introduce the one-shot framework, which can recover all K significant frequencies in one shot. The block diagram of the one-shot framework is shown in Figure 1. The concepts, technology, and framework involved in this section were proposed by researchers at National Taiwan University in the papers [9,10].
The first stage of sFFT is encoding by frequency bucketization. Suppose there are at most $a_{max}$ significant frequencies aliasing in every bucket; after running the bucketization $3a_{max}$ times for a set of time offsets $\tau$, we calculate $\hat{x}_{B,\tau}$, representing the filtered spectrum obtained by encoding. Suppose that in bucket $i$ the number of significant frequencies is denoted by $a$; there is a high probability that $a \le a_{max}$, and we obtain the simplified Equation (9) from Equations (7) and (8). In Equations (8) and (9), $v_j$ denotes an effective frequency value, $f_j$ denotes an effective frequency position, and $\hat{x}_{B,\tau}[i]$ denotes the filtered spectrum in bucket $i$ at offset $\tau$, so that $\hat{x}_{B,\tau}[i] \approx \sum_{j=1}^{a} v_j \omega^{f_j \tau}$, where $\omega = e^{2\pi \mathrm{i}/N}$. In most cases $a = 0$, reflecting sparsity; in a small number of cases $a = 1$, meaning only one significant frequency lies in the bucket; only in very few cases $a \ge 2$, meaning frequencies are aliased in the bucket; and it is very unlikely that $a > a_{max}$. In the exactly sparse case, the approximately equal sign becomes an equal sign in Equations (7)–(9), (11), (14), and (18).
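Under our reading of Equation (9), the shifted bucketization yields Prony-type moments: the value of bucket i at offset τ is, up to the 1/L aliasing scale, the sum of the terms $v_j \omega^{f_j \tau}$ over the frequencies aliased into that bucket. A minimal NumPy check of this identity, with all names our own:

```python
import numpy as np

N, B = 1024, 16
L = N // B

rng = np.random.default_rng(1)
x_hat = np.zeros(N, dtype=complex)
support = rng.choice(N, size=5, replace=False)
x_hat[support] = rng.standard_normal(5) + 1j * rng.standard_normal(5)
x = np.fft.ifft(x_hat)

def bucketize(x, L, tau):
    # Time shift by tau, then subsample by L and take a length-B FFT.
    return np.fft.fft(np.roll(x, -tau)[::L])

i = int(support[0]) % B                 # a bucket containing at least one frequency
for tau in range(4):
    lhs = bucketize(x, L, tau)[i]
    # Prony form: sum of v_j * omega^(f_j * tau) over frequencies f_j = i (mod B).
    fs = [f for f in support if f % B == i]
    rhs = sum(x_hat[f] * np.exp(2j * np.pi * f * tau / N) for f in fs) / L
    assert np.allclose(lhs, rhs)
```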
The spectrum reconstruction problem in bucket $i$ is thus equivalent to obtaining the unknown variables $a$, $f_j$'s, and $v_j$'s from the known variables $\hat{x}_{B,\tau}[i]$'s. The aliasing problem is reformulated as a moment preserving problem (MPP). The MPP, formulated by Bose–Chaudhuri–Hocquenghem (BCH) codes [35], can be divided into three subproblems: how to obtain $a$, how to obtain the $f_j$'s, and how to obtain the $v_j$'s in every bucket. Below, we solve these three subproblems step by step.
        
Step 1: Obtain the number of significant frequencies.
Solution: Suppose $m_\tau = \hat{x}_{B,\tau}[i]$; this means $m_\tau$ is composed of at most $a_{max}$ Prony components $v_j w_j^{\tau}$, where $w_j = \omega^{f_j}$. Let the vector $m$ be defined from the moments $m_\tau$, and let the matrix $M$ be defined as Equation (10); then the relationship between $M$, the $w_j$'s, and the $v_j$'s satisfies Theorem 1.
        
Proof of Theorem 1.  Based on the properties mentioned above, we obtain Equation (11).    □
Equation (11) is similar to the symmetric singular value decomposition (SSVD). Nevertheless, there are some differences: (1) the coefficients are complex rather than real, and (2) the columns of the decomposition matrix are not mutually orthogonal normalized vectors. It is easy to perform a transformation into an SSVD form $M = U \Sigma U^T$, where the singular values $\sigma_j$ are real, the absolute value of each $\sigma_j$ is directly proportional to the magnitude of the corresponding frequency value, and the columns of $U$ are mutually orthogonal normalized vectors. The paper [36] proved that for a symmetric matrix, the singular values obtained from the SVD are equal to those obtained from the SSVD. For example, if the SVD of $M$ is $U \Sigma V^*$ and the SSVD of $M$ is $U \Sigma U^T$, the $\Sigma$ gained via these two methods is the same. Knowing this, we can compute the SVD of $M$, obtain its singular values, and then perform principal component analysis (PCA). Among these singular values, the number of large singular values indicates the number of effective Prony components, which in turn indicates how many significant frequencies are in bucket $i$.
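A minimal sketch of this counting step under the reading above: we build a Hankel matrix of the bucket moments and count its dominant singular values; the matrix size and the PCA threshold are illustrative assumptions of ours.

```python
import numpy as np

rng = np.random.default_rng(2)
N, a_max = 1024, 4

# Ground truth for one bucket: a = 2 aliased frequencies (unknown to the decoder).
f = np.array([37, 549])
v = np.array([1.5 - 0.3j, -0.8 + 1.1j])
w = np.exp(2j * np.pi * f / N)

# Moments m_tau = sum_j v_j * w_j**tau, tau = 0 .. 2*a_max, from the bucketizations.
taus = np.arange(2 * a_max + 1)
m = (v[None, :] * w[None, :] ** taus[:, None]).sum(axis=1)
m += 1e-6 * (rng.standard_normal(m.shape) + 1j * rng.standard_normal(m.shape))  # mild noise

# Hankel moment matrix M[p, q] = m[p + q] of size (a_max+1) x (a_max+1); its rank is a.
M = np.array([[m[p + q] for q in range(a_max + 1)] for p in range(a_max + 1)])

sigma = np.linalg.svd(M, compute_uv=False)
a = int((sigma > 0.01 * sigma[0]).sum())    # PCA-style threshold (illustrative)
print(sigma, "=> a =", a)                   # two dominant singular values => a = 2
```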
Step 2: Obtain the effective frequency positions $f_j$'s in bucket $i$.
Solution: Let the orthogonal polynomial formula $P(z)$ be defined as Equation (12), whose roots are the $w_j$'s. Let the matrix $W$ be defined as Equation (13), let the vector $C$ be defined from the coefficients of $P(z)$, and let the moment vector be defined from the $m_\tau$'s. The moments' formula satisfies Theorem 2.
        
Proof of Theorem 2.  The first element of the identity has been proven, and the other elements can be proven in the same way.    □
For the convenience of matrix calculation, for a matrix, the superscript “T” denotes the transpose, the superscript “+” denotes the Moore–Penrose inverse or pseudoinverse, the superscript “*” denotes the adjoint matrix, and the superscript “−1” denotes the inverse. Through Theorem 2, we can solve for $C$ using the pseudoinverse. After gaining $C$, there are three ways to obtain the $w_j$'s through Equation (12): the first approach is the polynomial method, the second approach is the enumeration method, and the last approach is the matrix pencil method. After knowing the $w_j$'s, we can obtain the approximate positions $f_j$'s through $f_j = \frac{N}{2\pi} \angle w_j$.
(Method 1) Polynomial method: In the exactly sparse case, the $a$ roots of $P(z)$ are the solutions for the $w_j$'s. For example, if $a = 1$, $P(z)$ is linear and its single root gives $w_1$; if $a = 2$, $P(z)$ is quadratic and its two roots give $w_1$ and $w_2$.
(Method 2) Enumeration method: Try every possible position $f \in \{i, i+B, \dots, i+(L-1)B\}$ in Equation (12); the $a$ candidates with the smallest $|P(\omega^f)|$ are the solutions for the $w_j$'s. $L$ attempts are needed for one solution.
(Method 3) Matrix pencil method: The method was proposed in paper [37]. The matrix pencil method, like the Prony method, is a standard technique for mode frequency identification, computing the maximum likelihood signal under Gaussian noise and evenly spaced samples. For our problem, let the Toeplitz matrix $T$ be defined as Equation (15), let $T_L$ be $T$ with the rightmost column removed, as defined in Equation (16), and let $T_R$ be $T$ with the leftmost column removed, as defined in Equation (17). The set of generalized eigenvalues of the pencil $(T_R, T_L)$ satisfies Theorem 3.
        
Theorem 3.  The set of generalized eigenvalues of the pencil $(T_R, T_L)$ are the $w_j$'s we seek.
 Proof of Theorem 3.  Let the diagonal matrix $D$ be defined from the values $v_j$, and let the Vandermonde matrix $V$ be defined from the $w_j$'s. $T$ has a Vandermonde decomposition, from which $T_L$ and $T_R$ both factor through $V$ and $D$, with $T_R$ carrying an extra diagonal factor $\mathrm{diag}(w_1, \dots, w_a)$. For example, if $a = 1$ the decomposition has a single term, and if $a = 2$ it has two terms. Using the Vandermonde decomposition, the generalized eigenvalue problem $T_R q = \lambda T_L q$ is solved exactly by $\lambda \in \{w_1, \dots, w_a\}$, so Theorem 3 can be proven.    □
If the rank of $T_L$ is full, the set of generalized eigenvalues of the pencil is equal to the set of nonzero ordinary eigenvalues of $T_L^{+} T_R$. It is most likely that the rank is deficient; in that case, it is necessary to compute the SVD of $T$, reduce it to its dominant singular subspace, and then use the matrix pencil method to deal with the reduced matrix afterward. For details, please refer to papers [7,37].
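A minimal sketch of the matrix pencil step under our reading: two shifted moment matrices are formed, and the generalized eigenvalues of the pencil recover the $w_j$'s and hence the positions. The exact matrix layout of Equations (15)–(17) is an assumption here, not the paper's definition.

```python
import numpy as np

N = 1024
f_true = np.array([37, 549, 805])           # aliased positions in one bucket (illustrative)
v_true = np.array([1.0 + 0.5j, -0.7 + 0.2j, 0.3 - 0.9j])
w_true = np.exp(2j * np.pi * f_true / N)
a = len(f_true)

# Moments m_tau, tau = 0 .. 2a - 1, as produced by the shifted bucketizations.
taus = np.arange(2 * a)
m = (v_true[None, :] * w_true[None, :] ** taus[:, None]).sum(axis=1)

# Two shifted a x a moment matrices: T_R is T_L with every index advanced by one.
T_L = np.array([[m[p + q] for q in range(a)] for p in range(a)])
T_R = np.array([[m[p + q + 1] for q in range(a)] for p in range(a)])

# Generalized eigenvalues of the pencil (T_R, T_L) recover the w_j's.
w_est = np.linalg.eigvals(np.linalg.pinv(T_L) @ T_R)
f_est = np.sort(np.round(np.angle(w_est) * N / (2 * np.pi)).astype(int) % N)
print(f_est)                                 # -> [ 37 549 805]
```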
Step 3: Obtain the effective frequency values $v_j$'s in bucket $i$.
Solution: In order to use the CS method, we need several random samplings. Thus, for $P$ random offsets $\tau_1, \dots, \tau_P$, we calculate $\hat{x}_{B,\tau_p}$ for $p \in \{1, \dots, P\}$ in a round of $P$ runs. Suppose that in bucket $i$ the number of significant frequencies $a$ and the approximate effective frequency positions $f_j$'s have been obtained by step 1 and step 2; we can then obtain Equation (18). (There may be errors in the $f_j$'s obtained by step 2 because of the interference of the other Prony components.)
        
Equation (18) is very similar to the CS formulation. The model of CS is formulated as $y = \Phi S$, where $S$ is a sparse signal, $\Phi$ is a sensing matrix, and $y$ is the vector of measurements. In Equation (18), $y$ is a vector of $P$ measurements, $\Phi$ is a matrix of size $P \times 3a$, and $S$ is a vector of length $3a$ that is $a$-sparse. It should be noted that $\Phi$ must satisfy either the restricted isometry property (RIP) or the mutual incoherence property (MIP) for a successful recovery with high probability. It is known that the Gaussian random matrix and the partial Fourier matrix are good candidates for $\Phi$, so the $\Phi$ of Equation (18) meets the criteria. Furthermore, the number of measurements $P$ one collects should be sufficiently large relative to the sparsity $a$, so that these measurements are sufficient to recover the signal.
In order to obtain the $v_j$'s using the CS solver, we use the subspace pursuit method. The process is as follows: (1) Through the positions $f_j$'s gained by step 2, obtain the possible positions $f_j - 1$, $f_j$, and $f_j + 1$, and then obtain the corresponding $3a$ candidate vectors. An (over-complete) dictionary can be characterized by a matrix $\Phi$, and it contains the $3a$ vectors listed above. (One hopes that one-third of these vectors form a basis.) Each (column) vector in the dictionary is called an atom. (2) From the $3a$ atoms of the dictionary matrix $\Phi$, find the $a$ atoms that best match the measurements' residual error, and select these $a$ atoms to construct a new sensing matrix. (3) Obtain the values supported on these atoms through the least squares method (LSM). (4) If the residual error meets the requirement, the number of iterations reaches the preset limit, or the residual error becomes larger, quit the iteration; otherwise, continue from step 2. After computing, we obtain the final sensing matrix of size $P \times a$ and the $a$-element sparse signal on the right side of Equation (18). Thus we obtain the effective frequency positions $f_j$'s and the effective frequency values $v_j$'s in bucket $i$.
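The following simplified sketch illustrates the dictionary construction and the LSM step; for brevity it performs a single atom-selection pass rather than the full iterative subspace pursuit, and all sizes and names are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
N, P = 1024, 32
f_true = np.array([300, 549])                # true positions in this bucket
v_true = np.array([1.2 - 0.4j, -0.6 + 0.8j])
f_step2 = np.array([301, 549])               # step-2 estimates, possibly off by one
a = len(f_step2)

taus = rng.choice(N, size=P, replace=False)  # P random time offsets
atom = lambda f: np.exp(2j * np.pi * f * taus / N)
y = sum(v * atom(f) for f, v in zip(f_true, v_true))   # measurements (Equation (18))

cands = np.unique(np.concatenate([f_step2 - 1, f_step2, f_step2 + 1])) % N
Phi = np.stack([atom(f) for f in cands], axis=1)       # dictionary of <= 3a atoms

# One subspace-pursuit-style pass: pick the a best-matching atoms, then solve by LSM.
support = np.argsort(-np.abs(Phi.conj().T @ y))[:a]
v_est, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
print(cands[support], np.round(v_est, 3))    # recovered positions and values
```

In the full algorithm, steps (2)–(4) are iterated, with the residual updated after each LSM solve.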
  3.2. Peeling Framework Based on the Bipartite Graph
Secondly, we introduce the peeling framework, which can recover all K significant frequencies layer by layer. The block diagram of the peeling framework is shown in Figure 2. The concepts, technology, and framework involved in this section were proposed by researchers at the University of California, Berkeley in the papers [11,12,13,14].
In the peeling framework, we require the size $N$ of signal $x$ to be a product of a few (typically three or more) co-prime numbers; for example, $N = P_1 P_2 P_3$, where the $P_j$'s are pairwise co-prime. With this precondition, the bucket counts $B_j$'s and subsampling factors $L_j$'s used in each bucketization cycle are derived from the $P_j$'s. In every cycle, we use the same set of delays, chosen to suit the different spectrum reconstruction methods introduced later, to calculate the filtered spectra in $R$ rounds (this also means there are $R$ delay chains for one bucket). Suppose there are $d$ cycles; after the $d$ cycles, stage 1, encoding by bucketization, is completed. In order to solve the aliasing problems, the filtered spectrum in bucket $i$ of cycle $j$ is denoted by a vector of its $R$ delay measurements as follows:
We use a simple example to illustrate the process of encoding by bucketization. Consider a signal $x$ of size $N = 20$ that has only five ($K = 5$) significant coefficients, while the rest of the coefficients are approximately equal to zero. With this precondition, there are two bucketization cycles. In the first cycle, with four buckets, we obtain four vectors representing the filtered spectrum in the four buckets for the chosen delay set in $R$ rounds. In the second cycle, with five buckets, we obtain five vectors. After the bucketization, we can construct the bipartite graph shown in Figure 3 through Equation (19). In Figure 3, there are 20 variable nodes on the left (referring to the 20 coefficients of $\hat{x}$) and 9 parity check nodes on the right (referring to the nine buckets in the two cycles). The value of each parity check node on the right is approximately equal to the complex sum of the values of the variable nodes that are its left neighbors, through Equation (19). Among these check nodes, some have no significant variable node as a left neighbor; such a bucket is called a “zero-ton” bucket (the three blue-colored check nodes). Some have exactly one significant variable node as a left neighbor; such a bucket is called a “single-ton” bucket (the three green-colored check nodes). The others have more than one significant variable node as left neighbors; such a bucket is called a “multi-ton” bucket (the three red-colored check nodes).
After bucketization, the subsequent problem is how to recover the spectrum from all of the buckets gained in the several cycles. Through identification of the measurement vector of a bucket, we can determine the characteristics of that bucket. If the bucket is a “zero-ton” bucket, it contains no significant frequency, and the problem of frequency recovery in this bucket is solved. If the bucket is a “single-ton” bucket, suppose the one effective frequency position and the one effective frequency value can be obtained by the methods described afterward; the frequency recovery in this bucket is then solved as well. If the bucket is a “multi-ton” bucket, it is necessary to reduce the “multi-ton” bucket to a “single-ton” bucket to realize its decoding. For example, after the first peeling, in bucket $i$ of cycle $j$, suppose one frequency has been obtained via another “single-ton” bucket; the vector representing the original bucket then changes to a new vector with the contribution of that frequency subtracted, representing the remaining frequencies in the bucket. Through identification of this new vector, we can analyze the remaining elements in this bucket; if it has become a “zero-ton” bucket or a “single-ton” bucket, we can stop peeling and obtain all of the frequencies in this bucket. If not, we continue with a second peeling: suppose another frequency can be obtained via another “single-ton” bucket through the new peeling; we can then identify the further-reduced vector to continue analyzing the bucket. After $q$ rounds of peeling, the problem of frequency recovery in the “multi-ton” bucket can be solved as follows:
If the frequency recovery in all buckets can be solved, we can finish the spectrum reconstruction. The peeling decoder successfully recovers all of the frequencies with high probability under the three following assumptions: (1) “zero-ton”, “single-ton”, and “multi-ton” buckets can be identified correctly; (2) if the bucket is a “single-ton” bucket, the decoder can locate the effective frequency position and estimate the effective frequency value correctly; and (3) the peeling platform is sufficient to cope with “multi-ton” buckets.
Subproblem 1: How to identify the measurement vector of a bucket in order to distinguish its type?
Solution: In the exactly sparse case, if the measurement vector is zero, the bucket is a “zero-ton” bucket. If the bucket is not a “zero-ton” bucket, we make a second judgment: the bucket is a “single-ton” bucket if its measurements are exactly consistent with the one frequency gained from the solution of subproblem 2. If the bucket is neither a “zero-ton” bucket nor a “single-ton” bucket, it is a “multi-ton” bucket. In the general sparse case, the two exact tests are translated into threshold tests against identification thresholds. Identifying one measurement vector costs only a constant number of operations per round once the candidate frequency is known in advance; after $d$ cycles, the identification in the first peeling therefore costs time proportional to the total number of buckets.
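A small sketch of this three-way classification, written for the exactly sparse case with tolerances added for mild numerical noise (the energy tests and thresholds are our own illustrative choices, not the paper's equations):

```python
import numpy as np

def classify_bucket(z, N, eps=1e-8):
    """Classify one bucket from its delay measurements z[r], delays r = 0, 1, ..., R-1.

    Returns ("zero",), ("single", f, v), or ("multi",), under the illustrative
    model z[r] = v * exp(2j*pi*f*r/N) for a single-ton bucket.
    """
    energy = np.sum(np.abs(z) ** 2)
    if energy < eps:                               # zero-ton: (almost) no energy
        return ("zero",)
    if abs(z[0]) ** 2 < eps:                       # degenerate: cannot be a clean single-ton
        return ("multi",)
    # Single-ton hypothesis: read the position from the phase of adjacent delays.
    f = int(round(np.angle(z[1] / z[0]) * N / (2 * np.pi))) % N
    v = z[0]
    model = v * np.exp(2j * np.pi * f * np.arange(len(z)) / N)
    if np.sum(np.abs(z - model) ** 2) < eps * energy:   # all rounds consistent
        return ("single", f, v)
    return ("multi",)
```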
Subproblem 2: Suppose the target is a “single-ton” bucket, how to recover the one frequency in this bucket?
Solution: In the exactly sparse case, we use the delay set $\{0, 1, 2\}$ to calculate the bucket measurements in three rounds. Suppose the bucket is a “single-ton” bucket with position $f$ and value $v$; then the three elements of the measurement vector are $v$, $v\omega^{f}$, and $v\omega^{2f}$. Thus, only if the two ratios of consecutive elements agree is it a “single-ton” bucket. If it is a “single-ton” bucket, the position $f$ can be obtained from the phase of the ratio, and the value $v$ can be obtained from the first element. In all, it costs three samples and four operations of runtime to recover the one significant frequency in one bucket.
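A minimal end-to-end check of this exactly sparse single-ton recovery on the running example sizes (N = 20, four buckets) with three delay chains τ = 0, 1, 2; the signal content is our own illustration:

```python
import numpy as np

N, B = 20, 4                      # the example sizes from the text: 20 coefficients, 4 buckets
L = N // B

x_hat = np.zeros(N, dtype=complex)
x_hat[3] = 2.0 - 1.0j             # a frequency that is alone in bucket 3 mod 4
x = np.fft.ifft(x_hat)

# Three delay chains tau = 0, 1, 2: subsample x[n*L + tau] and take length-B FFTs.
z = np.stack([np.fft.fft(np.roll(x, -tau)[::L]) for tau in range(3)])

i = 3 % B
ratio = z[1, i] / z[0, i]                                  # = exp(2j*pi*f/N) for a single-ton
assert np.isclose(z[2, i] / z[1, i], ratio)                # consistency check: single-ton
f = int(round(np.angle(ratio) * N / (2 * np.pi))) % N
v = z[0, i] * L                                            # undo the 1/L aliasing scale
print(f, np.round(v, 6))                                   # -> 3 (2-1j)
```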
In the general sparse case, the frequency recovery method uses the optimal whitening-filter coefficients of the minimum mean squared error (MMSE) estimator and a sinusoidal structured bin-measurement matrix for speedy recovery. At first, the candidate set for the one position is the set of the $L$ positions $\{i, i+B, \dots, i+(L-1)B\}$ aliased into the bucket, so there are $L$ possible choices. In the first iteration of the binary search, we choose a random delay and then calculate the bucket measurements in $m$ rounds. In effect, we obtain noisy phases: the phase difference between successive measurements is determined by the unknown position plus a term related to the real error in round $t$. In paper [38], we can see that the maximum likelihood estimate (MLE) of this phase is calculated by Equation (21), where the weighting term is defined as Equation (22). After obtaining the MLE, we make a binary-search judgment: if the estimated phase falls into the indicated half, there is a new restriction confining the candidate set to one half; otherwise, the restriction is the complement of that set. The next iteration is very similar: we choose another random delay and calculate the measurements in bucket $i$; after obtaining the new MLE through Equations (21) and (22), we make another binary-search judgment, restricting the candidate set to one half or to its complement. After $C$ iterations, with the benefit of these accumulated restrictions, we can locate the only position out of the original set of $L$ candidates. From the paper [13], if the number of iterations $C$ and the number of rounds per iteration $m$ are sufficiently large, the singleton-estimator algorithm can correctly identify the unknown frequency with high probability; if the signal-to-noise ratio (SNR) is low, $m$ and $C$ must be increased. After knowing the position, we can obtain the value by applying the LSM. In all, recovering the one approximately significant frequency in one bucket costs $O(mC)$ samples and a runtime dominated by calculating all of the MLEs and all of the binary-search judgments. The final residual can be used to judge whether the bucket is a “single-ton” bucket or not through the discriminant threshold.
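The following sketch shows the statistical core of this estimator in a simplified form: given m noisy delay measurements of a single-ton bucket, it scores each of the L candidate positions by matched-filter correlation, which is the ML rule under Gaussian noise, and then estimates the value by LSM. For clarity it enumerates the candidates directly instead of performing the paper's binary search, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
N, B, m = 1024, 16, 12
L = N // B

i = 5                                        # bucket index
f_true = i + 7 * B                           # true single-ton position, f = i (mod B)
v_true = 1.0 + 0.5j

taus = rng.integers(0, N, size=m)            # m random delays for this bucket
z = v_true * np.exp(2j * np.pi * f_true * taus / N)
z += 0.3 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))   # measurement noise

# Score each of the L candidate positions f = i + j*B by matched-filter correlation;
# under Gaussian noise, the ML candidate maximizes |<atom(f), z>|.
cands = i + B * np.arange(L)
scores = [np.abs(np.exp(-2j * np.pi * f * taus / N) @ z) for f in cands]
f_est = cands[int(np.argmax(scores))]
v_est = (np.exp(-2j * np.pi * f_est * taus / N) @ z) / m      # LSM value estimate
print(f_est == f_true, np.round(v_est, 2))
```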
        
Subproblem 3: How to solve the “multi-ton” buckets by the peeling platform?
Solution with method 1: The genie-assisted peeling decoder based on the bipartite graph is useful. As shown in Figure 3, the bipartite graph represents the characteristics of the bucketization. The variable nodes on the left represent the characteristics of the frequencies, and the parity check nodes on the right represent the characteristics of the buckets. Every efficient variable node connects to $d$ different parity check nodes as its neighbors in the $d$ cycles, corresponding to the $d$ edges connected to each efficient variable node. An example of the process of the genie-assisted peeling decoder is shown in Figure 4, and the steps are as follows:
Step 1: We identify all of the buckets that can be resolved. If a bucket is a “zero-ton” bucket, the frequency recovery in this bucket is finished. If a bucket is a “single-ton” bucket, we can obtain the frequency in the bucket through the solution of subproblem 2. In the graph, these operations amount to selecting and removing all right nodes with degree 0 or degree 1, and moreover, removing the edges connected to these right nodes and the corresponding left nodes.
Step 2: We remove the contributions of the frequencies gained in step 1 from the other buckets. For example, the first peeling in bucket $i$ means that we calculate a new measurement vector with the contribution of a frequency recovered in step 1 subtracted. In the graph, these operations amount to removing all of the other edges connected to the left nodes removed in step 1. When the edges are removed, their contributions are subtracted from their right nodes. In the new graph, the degree of some right nodes decreases as well.
Step 3: If all buckets have been identified, we have successfully reconstructed the spectrum. Otherwise, we turn back to step 1 and step 2 to continue the identifying and peeling operations. In the graph, this means that if all right nodes have been removed, the decoding is finished; otherwise, turn back to step 1 and step 2.
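A compact sketch of the peeling decoder on the running example (N = 20, two co-prime cycles with four and five buckets, R = 2 delay chains); the support and the single-ton test are our own illustrative choices, and the genie's role is played by the explicit consistency check:

```python
import numpy as np

# Peeling over two co-prime bucketizations of the text's example: N = 20, B in {4, 5}.
N, Bs = 20, (4, 5)
rng = np.random.default_rng(6)
x_hat = np.zeros(N, dtype=complex)
support = [0, 1, 3, 7, 10]                       # K = 5 significant positions (illustrative)
x_hat[support] = rng.standard_normal(5) + 1j * rng.standard_normal(5)
x = np.fft.ifft(x_hat)

# Each bucket keeps R = 2 delay measurements (tau = 0, 1) for position/value recovery.
def measure(B):
    L = N // B
    return np.stack([np.fft.fft(np.roll(x, -tau)[::L]) * L for tau in range(2)])

z = {B: measure(B) for B in Bs}
recovered = {}
for _ in range(10):                              # peeling iterations
    for B in Bs:
        for i in range(B):
            u0, u1 = z[B][0, i], z[B][1, i]
            if abs(u0) < 1e-9:                   # zero-ton: nothing left here
                continue
            f = int(round(np.angle(u1 / u0) * N / (2 * np.pi))) % N
            if f % B != i or not np.isclose(u1, u0 * np.exp(2j * np.pi * f / N)):
                continue                         # multi-ton: wait for more peeling
            recovered[f] = u0                    # single-ton: peel it everywhere
            for B2 in Bs:                        # subtract its contribution in both cycles
                for tau in range(2):
                    z[B2][tau, f % B2] -= u0 * np.exp(2j * np.pi * f * tau / N)

print(sorted(recovered) == sorted(support))      # -> True
```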
Solution with method 2: The sparse-graph decoder using the packet erasure channel method is more efficient. From Figure 4, we see that the whole process of method 1 needs 13 (9 + 3 + 1) identifications in three peelings, so it is not very efficient. As we can see, spectrum recovery can be transformed into a problem of decoding over sparse bipartite graphs using the peeling platform. This problem has been well studied in the coding theory literature, from which we know that several sparse-graph code constructions are of low complexity and capacity-achieving when the erasure channel method is used. Thus we use the erasure channel method to improve efficiency.
We construct a bipartite graph with $K$ variable nodes on the left and $\sum_j B_j$ check nodes on the right. Each efficient left node connects to exactly $d$ right nodes in the $d$ cycles, and the set of check nodes is assigned to $d$ subsets, with the $j$-th subset having $B_j$ check nodes. The example of the “balls-and-bins” model defined above is shown in Figure 2. In Figure 3, there are $K = 5$ efficient variable nodes on the left and nine check nodes on the right; each efficient left node connects to exactly $d = 2$ right nodes, and the set of check nodes is assigned to two subsets (of four and five nodes). If one left node is selected, the $d$ right nodes that are its neighbors are determined. If one right node is selected, the $L$ left nodes that are its neighbors are determined as well. Corresponding nodes are connected through edges (in the graph, if the left variable node is an efficient node, the edge is a solid line; otherwise, the edge is a dotted line or is omitted). A directed edge in the graph is represented as an ordered pair of nodes, either $(V \to C)$ or $(C \to V)$, where $V$ is a variable node and $C$ is a check node. A path in the graph is a directed sequence of directed edges in which the end node of each edge of the path is the start node of the next edge of the path. The depth $l$ is defined as the length of the path, which also indicates the number of directed edges in the path. The induced subgraph $\mathcal{N}_V^l$ is the directed neighborhood of depth $l$ of left node $V$; it contains all of the edges and nodes on paths of length $l$ starting at node $V$, and it is a tree-like graph, as can be seen in Figure 5. In Figure 5, the subgraph starting at the first chosen left node is shown for depths $l$ equal to two, three, four, five, and six, and the subgraph starting at the second chosen left node is shown for depth $l$ equal to six. Under these definitions, the steps of the packet erasure channel method are as follows:
Step 1: Take several random left nodes as starting points, and draw the depth-one trees from these starting points. As shown in Figure 5, we choose two left nodes as starting points. The endpoints of these trees are check nodes; we then identify the characteristics of these check nodes. If a check node is a “zero-ton” bucket, this path stops extending. If the check node is a “multi-ton” bucket, we continue waiting until its left neighboring node is identified by other paths. If the check node is a “single-ton” bucket, we continue by connecting its only efficient left neighboring node. We then obtain the next tree by expanding these left nodes; their new right neighboring nodes are determined through these left nodes, and we then obtain the following tree by expanding these right nodes.
…
Step p: For each starting point $V$, we have obtained a tree from the previous step, whose endpoints are check nodes. We should first remove the contributions of the frequencies gained by the previous paths, and then identify the characteristics of these modified check nodes. For example, before identifying the endpoint of one path, we should remove the contributions of the frequencies already recovered along that path. If the modified check node is a “zero-ton” bucket, this path stops extending. If the modified check node is a “multi-ton” bucket, we continue waiting until its left neighboring node is identified by other paths. If the modified check node is a “single-ton” bucket, we continue by connecting its only efficient left neighboring node. We then obtain the deeper tree by expanding these left nodes; their new right neighboring nodes are determined through these left nodes, and we then obtain the next tree by expanding these right nodes.
If the number of identified left nodes is equal to $K$, the spectrum recovery has been successful. The example is shown in Figure 5: beginning from the two starting points, we obtain two trees of depth six through three expansions in three steps. From the graph, we can obtain all five variable nodes. The whole process needs six (4 + 4 − 2) identifications, which is far fewer than the thirteen identifications of method 1.
  3.3. Iterative Framework Based on the Binary Tree Search Method
Thirdly, we introduce the iterative framework based on the binary tree search method. The example is shown in Figure 6. The concepts, technology, and framework involved in this section were proposed by researchers at Georg-August University in the paper [15].
Consider a signal $x$ of size $N$ (a power of two) that has only five ($K = 5$) significant coefficients, while the rest of the coefficients are approximately equal to zero. The node of the first layer of the binary tree is a single bucket with $N$ aliasing frequencies. The nodes of the second layer of the binary tree are two buckets, each with $N/2$ aliasing frequencies. The nodes of the third layer are four buckets, each with $N/4$ aliasing frequencies. In general, the nodes of the $d$-th layer are $2^{d-1}$ buckets, each with $N/2^{d-1}$ aliasing frequencies.
The insight of the binary tree search is as follows: (1) The aliasing of frequencies is gradually dispersed with the expansion of the binary tree layers. The number of efficient nodes of the $d$-th layer is denoted by $K_d$, and with the expansion, this number finally becomes approximately equal to the sparse number $K$. In Figure 6, the number of efficient nodes grows layer by layer, and the final number of efficient nodes of the fourth layer, $K_4$, is equal to the sparse number $K$. (2) If a parent node exists, there may be one or two child nodes; if a parent node does not exist, there is no need to continue the binary search below it. In the example of Figure 6, one parent node exists, so at least one of its two child nodes exists; by the same reasoning, two further parent nodes exist, so at least one child node of each of them exists. On the contrary, one parent node does not exist, so its two child nodes do not exist either. Inspired by these two ideas, the steps of the binary tree search are as follows:
Step 1: Calculate the single node of the first layer, and obtain $K_1$ and the efficient nodes' distribution in this layer.
Step 2: According to the last result, calculate the two nodes of the second layer selectively, and then obtain $K_2$ and the efficient nodes' distribution in this layer.
…
Step $d$: According to the last result, calculate the nodes of the $d$-th layer selectively, and then obtain $K_d$ and the efficient nodes' distribution in this layer.
We do not need to start from step 1; we can start from a later step. With the binary tree search, if the $K_d$ gained by step $d$ is approximately equal to the sparse number $K$, the binary tree search is finished. In Figure 6, $K_4 = K = 5$, so the binary tree search is finished after the fourth layer by step 4. According to the efficient nodes' distribution of the $d$-th layer, we can solve the frequency recovery problem of each “single-ton” bucket. Finally, the $K$-frequency recovery problem is solved by combining the efficient frequencies of all of the buckets. In Figure 6, each of the five efficient buckets has exactly one effective frequency; through the frequency recovery of each “single-ton” bucket, we obtain the position and value individually in every bucket, and finally we obtain the recovered spectrum of five elements. This algorithm involves three subproblems, as follows:
Subproblem 1: How to calculate the nodes of a layer selectively according to the last result?
Solution: The runtime of calculating all $2^{d-1}$ nodes of layer $d$ is $O(2^{d-1} \log 2^{d-1})$ by using the FFT algorithm. The number of effective nodes in the upper layer is $K_{d-1}$, so the maximum number of effective nodes in this layer is $2K_{d-1}$. The formula for calculating each effective node is Equation (23), so the runtime of calculating these at most $2K_{d-1}$ candidate nodes is proportional to $2K_{d-1}$ times the per-node cost. Therefore, when this selective cost is smaller than the cost of the full FFT, use Equation (23) to calculate only the candidate nodes; otherwise, use the FFT algorithm to calculate all nodes.
        
Subproblem 2: The condition of stopping the binary tree search.
Solution: If the stop condition is defined as $K_d = K$, in the worst case the condition cannot be satisfied until $d$ is very large. For example, if two significant frequencies differ by $N/2$, then after the decomposition of the first layer's node, the second layer's nodes, the third layer's nodes, and so on, the pair still falls into one common child node until the deepest layer. Therefore, a relaxed threshold can be considered, meaning that the $K$ frequencies are put into slightly fewer buckets, some of which still contain mild aliasing. The aliasing problem can then be solved by using the SVD decomposition described earlier. Its extra sampling and calculation cost may be less than that of continuing the expansion search when no more frequencies would be separated by the next layer.
Subproblem 3: The frequency recovery problem of an approximately “single-ton” bucket.
Solution: In the exactly sparse case, the problem can be solved by using the phase encoding method described earlier. In the general sparse case, the problem can be solved by the CS method or the MMSE estimator described earlier.
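A compact sketch of the whole binary tree search under the description above (power-of-two N, nodes of each layer computed selectively from the parents, detection by a simple energy threshold; all sizes and thresholds are illustrative):

```python
import numpy as np

N, K = 1024, 5
rng = np.random.default_rng(7)
x_hat = np.zeros(N, dtype=complex)
support = rng.choice(N, size=K, replace=False)
x_hat[support] = rng.standard_normal(K) + 1j * rng.standard_normal(K)
x = np.fft.ifft(x_hat)

def layer_nodes(d, indices):
    """Selectively compute the requested bucket values of layer d (2**(d-1) buckets).

    Node i of layer d aliases all frequencies f with f = i (mod 2**(d-1)).
    """
    B = 2 ** (d - 1)
    sub = x[:: N // B]                         # B time samples of x
    return {i: sub @ np.exp(-2j * np.pi * i * np.arange(B) / B) for i in indices}

# Expand the tree: a child node of the next layer can be efficient only if its parent is.
active = [0]                                   # layer 1: a single all-aliasing node
d = 1
while len(active) < K and d < int(np.log2(N)) + 1:
    d += 1
    children = {c for p in active for c in (p, p + 2 ** (d - 2))}
    vals = layer_nodes(d, children)
    active = [i for i, v in vals.items() if abs(v) > 1e-6]   # energy test (illustrative)

print(d, sorted(active), sorted(int(f) % 2 ** (d - 1) for f in support))
```

When the loop stops with $K$ active nodes, each active node is a “single-ton” bucket, and its one frequency can be recovered as in subproblem 3.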
The binary tree search method is especially suitable for small $K$ or small support. In the case of small support, the frequencies are quickly scattered into different buckets. For example, suppose only eight consecutive frequencies out of $N = 1024$ frequencies are significant; their eight locations are congruent to 0 mod 8, 1 mod 8, 2 mod 8, …, 7 mod 8. In this way, $K_4 = 8$ is obtained in the fourth layer, and the binary tree search can be stopped by step 4. This method also has the advantage of being a deterministic algorithm: the distribution of at most one frequency per bucket must eventually happen as the layers expand, unlike in some probabilistic algorithms such as the sFFT-DT algorithms.