Prony Method for Two-Generator Sparse Expansion Problem

: In data analysis and signal processing, the recovery of structured functions from the given sampling values is a fundamental problem. Many methods generalized from the Prony method have been developed to solve this problem; however, the current research mainly deals with the functions represented in sparse expansions using a single generating function . In this paper, we generalize the Prony method to solve the sparse expansion problem for two generating functions , so that more types of functions can be recovered by Prony-type methods. The two-generator sparse expansion problem has some special properties. For example, the two sets of frequencies need to be separated from the zeros of the Prony polynomial. We propose a two-stage least-square detection method to solve this problem effectively.


Introduction
The Prony method is a popular tool used to recover the functions represented in sparse expansions using one generating function. For example, the function with the following form can be recovered from 2M equispaced sampling values f (lh), l = 0, ..., 2M − 1 for an appropriate positive constant h; however, in many real-world applications, we need to deal with the functions represented by more than one generating functions. For example, the harmonic signals with the form are generated by two generating functions (or simply generators): cos(φx) and sin(βx), where φ and β are generic parameters used as the placeholders for the real parameters {φ j } M j=1 and {β j } M j=1 to generate the specific terms in the expansion. In this system, we have two sets of coefficients {c j } M j=1 and {d j } M j=1 and two sets of frequencies {φ j } M j=1 and {β j } M j=1 . Analogous to the original Prony method, we expect to use 4M equispaced sampling values f (lh), l = 0, ..., 4M − 1 to recover those four sets of parameters.
There are some existing methods to solve this problem. The first one is to convert it to a single-generator problem by the following formulas cos x = 1 2 (e ix + e −ix ) and sin x = 1 2i (e ix − e −ix ), which results in problem (1) (see [1]). Another way using the same idea is based on the even/odd properties for cos x and sin x (see [2]) as follows However, this approach is very restrictive, because the chance that one can make this kind of conversion is very small. In this paper, we are interested in solving the general two-generator sparse expansion problem by a new way of generalized Prony method. More specifically, we study the functions with the following two-generator sparse expansion where u(φx) and v(βx) are two different functions used as the generators. In order to make the Prony method work, we need a critical condition for our special technique: There exists a linear operator, such that u(φx) and v(βx) are both eigenfunctions of this operator.
Another situation that could lead to the two-generator expansion problem is when we apply some special transforms on a sparse expansion. For example, when we apply the short time Fourier transform (STFT), i.e., we would obtain a two-generator sparse expansion as follows, In this example, the two generators are e −β(φ−x) 2 and e −β(φ+x) 2 with β = 0. Actually, the original single-generator problem (6) can be solved directly. For example, one can convert cos(φx) to 1 2 (e iφx + e −iφx ) (see [1]), or use a method based on the Chebyshev polynomials (see [3]). When we solve problem (6) directly, we use the sampling values in the time domain; when we solve the problem in the form of (7), we use the sampling values in the frequency domain. (See [4] for a discussion on sampling values in the frequency domain.) In this paper, we use this example to study the special properties of the twogenerator sparse expansion problem.
Since the signals could take various forms, not necessarily in the exponential form studied in the classical Prony method, many researchers generalized the Prony method to handle different types of signals. For example, many results in [1,3,[5][6][7][8][9][10][11][12] have been developed over the last few years. In particular, Peter and Plonka in [1,8] generalized the Prony method to reconstruct M-sparse expansions in terms of eigenfunctions of some special linear operators. In [3], Plonka and others reconstructed different signals by exploiting the generalized shift operator. These results provide us the building blocks for our method in this paper.
We organize our presentation in the remaining sections as follows. In Section 2, we quickly review the classical Prony method and one of its generalizations for the Gaussian generating function to establish the foundation of our method. In Section 3, we describe the details of our method using the example with two generators: cosine and sine functions. In Section 4, we apply our method on two different types of Gaussian generating functions, so that we can study an interesting property: When the Hankel matrix for finding the coefficients of the Prony polynomial is singular, what does it really mean? In Section 5, we show two examples that correspond to the two problems solved in Sections 3 and 4, respectively. Finally, we make conclusions in Section 6 and describe two related research problems to be solved in the future.
is a Vandermonde matrix, which is non-singular for distinct φ j 's and hφ j ∈ (−π, π] for j = 1, ..., M. The frequencies can be extracted from the zeros of Λ(z) (in the form of z j = e −ihφ j ) using the formula Finally, the coefficients c j , j = 1, ..., M can be determined by solving the following overdetermined linear system (with M unknowns and 2M equations) The redundant equations in this overdetermined linear system will play a critical role in our two-generator method to help us separate the frequencies associated with the two generators (see Section 3).

Sparse Expansions on Shifted Gaussian
In order to solve the two-generator sparse expansion problem (7), we need to apply the technique presented in [3], which solves a single-generator sparse expansion problem with the following form where β ∈ C\{0}. The technique relies on the following generalized shift operator where h = 0, and K(·, ·) has the property The K(x, h) function in (16) is chosen to be e βh(2x+h) , so that we have the following critical property which means that e −β(φ j −x) 's are eigenfunctions of S K,h for all φ j ∈ R.
The sparse expansion f (x) in (15) can be reconstructed using 2M sampling values f (x 0 + hk), k = 0, ..., 2M − 1, and x 0 is an arbitrary real number. If Re β = 0, then h ∈ R\{0}; while if Re β = 0, then 0 < h ≤ π 2|Imβ|L with φ j ∈ (−L, L) for j = 1, ..., M for some given L. (See [3].) The Prony polynomial for the problem in (15) can be defined as: with λ M = 1. Then, we have the following linear system for m = 0, 1, ..., M − 1, which can be represented as an inhomogeneous system where G : . This H matrix is a Hankel matrix, and it has the following structure  Finally, the coefficients c j 's in the expansion (15) can be computed by solving the following overdetermined linear system:

The Sparse Expansion Problem with Two Generators: Cosine and Sine
In this section, we present our method for solving the two-generator sparse expansion problem in the following form through a modified Prony method. We present our method in the following theorem.
Proof. First, we choose an appropriate linear operator, such that our two generating functions cos(φx) and sin(βx) in (23) are both the eigenfunctions of this operator. We consider the symmetric shift operator (see [3]) When we apply this operator on cos(φx) and sin(βx), we obtain where cos(φh) and cos(βh) are the eigenvalues. Now we define the Prony polynomial for problem (23) using all the eigenvalues {cos(φ j h)} M 1 j=1 and {cos(β l h)} M 2 l=1 as follows: which can be written in terms of the Chebyshev polynomials as where T k (z) := cos(k cos −1 (z)). (See [3] for more information on this technique.) Since the leading coefficient of the Chebyshev polynomial (27) has the leading coefficient 1. This Prony polynomial has the following critical property: for j = 1, 2, . . . , M 1 and l = 1, 2, . . . , M 2 , respectively, which is essential to help us derive the following linear system. To derive a linear system for {λ k } M 1 +M 2 −1 k=0 , we need to calculate the following expression which can be shown to be zero. That is, for m = 0, 1, . . . , M 1 + M 2 − 1. Indeed, using the right-hand-side expression in (23) for f (x) in (28) and for a fixed m ∈ {0, 1, . . . , We can reformulate the system (28) as for m = 0, 1, . . . , In order to see that the linear system in (29) has a unique solution, we study the (29), which we denote as H. As in the classical Prony method, we can factorize H in the following structure where the Vandermonde Block matrix V h can be written as with and and the diagonal block matrix D can be written as where and Thus, H is guaranteed to be invertible by the conditions 2 • and 3 • of the theorem. Then, we can find the unique solution for {λ k } M 1 +M 2 −1 k=0 from the linear system (29). With these λ k values for Λ(z) as in (26), we can determine φ j 's and β l 's from the zeros of Λ(z); however, this step is non-trivial, because we do not know what zeros correspond to φ j 's and what zeros correspond to β l 's. In order to resolve this ambiguity, we consider all the possible cases: Among M 1 + M 2 zeros of Λ(z), M 1 of them correspond to φ j 's. Thus, there are a total ( M 1 +M 2 M 1 ) possible choices for φ j 's, among which there is exactly one choice for the solution; however, how do we select the right one? We need to go to the next overdetermined linear system for the answer.
When we determine the coefficients c j 's and d l 's in (23), we have the following linear system are calculated using the original φ j 's and β l 's (corresponding to the true-solution case), which means that all the 4(M 1 + M 2 ) − 1 equations in (36) are completely satisfied for the true-solution case. In other words, the least-square solution of (36) for the true-solution case should have this property: Its error term is zero theoretically (or very close to zero due to rounding errors in computation). While the least-square solution for any non-solution case would have a significant (with respect to the rounding errors) nonzero error term, which makes the true solution stand out clearly.
Our experiments have verified this phenomenon. Based on this observation, we develop a two-stage least-square detection method to minimize the computing cost, and in Section 5, we demonstrate the effectiveness of this method using a simple example. Remark 1. The overdetermined linear system (36) plays an important role in determining the true solution from certain number of possible cases. Typically, this situation happens in the multigenerator sparse expansion problem. For the single-generator case, we can select same number of linearly independent equations from the overdetermined system as the number of unknowns to find the solution; however, for the multi-generator case, the redundant equations are very useful in the least-square method.

The Sparse Expansion Problem with Two Gaussian Generators
In this section, we solve another two-generator sparse expansion problem as in (7) that uses the two Gaussian generating functions, e −β(φ−x) 2 and e −β(φ+x) 2 , in the form of for some constant β ∈ C\{0}. In order to recover the coefficients c j ∈ C\{0} and the parameters φ j 's, we need 4M sampling values f (x 0 + kh), k = 0, ..., 4M − 1, where x 0 ∈ R, and h satisfies the same condition as in Section 2.2.
This two-generator sparse expansion problem has a special property: When φ j 0 = 0 for some j 0 ∈ {1, ..., M}, the two functions e −β(φ j 0 −x) 2 and e −β(φ j 0 +x) 2 are the same. This property would cause some problem for our method presented in the previous section. In order to make the discussion easier, we separate these two cases, and consider the case that φ j ∈ R\{0} for all j = 1, ..., M first.

Theorem 2.
Assume that a function f (x) has the two-generator sparse expansion form of (37), where the number of terms M and the constant β ∈ C\{0} are known, but the coefficients in {c 1 , . . . , c M } and the parameters in {φ 1 , . . . , φ M } are unknown. If 4M equispaced sampling values of the form f (x 0 + kh) for k = 0, 1, . . . , 4M − 1 are provided, then the original function f (x) can be uniquely reconstructed under the following conditions: Proof. Our method relies on existence of some critical linear operator, such that both generating functions are its eigenfunctions. Here we use the operator S K,h as defined in (16) with K(x, h) := e βh(2x+h) , which has the following properties: Clearly e −β(φ j −·) 2 and e −β(φ j +·) 2 are eigenfunctions of S K,h for all φ j ∈ R\{0} with corresponding eigenvalues e 2βφ j h and e −2βφ j h , respectively, for j = 1, ..., M. Hence we can define the Prony polynomial using all these eigenvalues: with λ 2M = 1. Since the real number φ j = 0, we can assume that φ j > 0 for all j = 1, ..., M based on the structure in (37) to improve the certainty without loss of generality.
Then for m = 0, 1, ..., 2M − 1, we calculate which can be written as the following linear system for m = 0, 1, ..., 2M − 1. To solve this system, we need 4M sampling values: f (x 0 + kh) for = 0, 1, . . . , 4M − 1. To study existence of the solution for this linear system, we would like to simplify it with respect to the unknown vector λ := [λ 0 , ..., λ 2M−1 ] T as follows, and The invertibility of H can be seen from the following matrix factorization: where the Vandermonde block matrix V h has the following form and the diagonal block matrix D is given by and From this structure, we can see that the Vandermonde matrix V h in (44) is invertible by conditions 2 • and 3 • of the theorem, and hence H in (42) is also invertible by condition 1 • , which results in the unique solution for λ.
With all the λ l values found from the above linear system, we can find all the φ j values by calculating the zeros of the Prony polynomial of (39). In this case, we do not need to deal with the ambiguity that we encountered in the previous section due to the special structure of the pairs (φ j , −φ j )'s. Finally, the coefficients c j 's of the sparse expansion (37) can be computed by solving the following overdetermined linear system: for l = 0, ..., 4M − 1.

Remark 2.
Our method above only works for the case when φ j = 0 for all j in {1, 2, . . . , M}; however, in the real-world situation, when we solve a problem of (37) using 4M sampling values, how do we know if there exists any φ j = 0 in it or not? We need a detection method to tell us if all the φ j 's are nonzero before we apply the above method.
Let us investigate the existence of a solution for the linear system (41), which is determined by the invertibility of H in (42). We notice that when φ 1 = 0, the first column of (45) and the first column of (46) are the same, which causes the matrix V h in (44) to be singular. Then, we conclude that H in (43) is singular if any φ j = 0. In other words, by checking the invertibility of H, we can tell if there is any φ j = 0 for problem (37). If H in (42) is singular, our current method does not work. Fortunately we can modify our method to solve the problem for this special situation.
After we solve the linear system of (55), we obtain the Prony polynomial that contains one zero at z = 1 and the remaining zeros appear in pairs of (z j , z −1 j )'s, which correspond to the parameter values 0 and (φ j , −φ j ) pairs. Finally, we will solve the following overdetermined linear system for c 0 , c 1 , . . . , c M values for l = 0, ..., 4M + 1. From this example, we can see that the value of det(H) can give us some important information, that is, which of the two systems in (37) and (51) we should work on. This property could be useful when we consider a problem in which the M value in (51) is unknown, but restricted in certain range. (See discussion in Section 6).

Numerical Experiments
In this section, we use two simple examples to illustrate the implementation details of our method for the two-generator sparse expansion problem described in the previous sections. The first example is for version (23) in Section 3. The second example is for version (37) in Section 4.
To separate the φ j 's from β l 's, we consider the following overdetermined linear system: where we use the shorthand notations for n = 0, 1, . . . , 19. Note: In this linear system, we only use 20 out of 39 original sampling values, which is adequate for this particular example. It is a trade-off issue between the accuracy of computation and the cost of computation (in time). In general, the more redundant equations we use, the more accuracy we can achieve in searching for the true solution. In other words, if we can obtain adequate accuracy, we focus on cutting the computation cost to the minimum. We do not solve this overdetermined linear system by the least-square method directly. We split these 20 equations into two parts: In the first part, we approximate the coefficients {c 1 , . . . , c 5 , d 1 , . . . , d 5 } in (60) by the least-square method.
Then we apply these derived coefficients to the equations in the second part so as to filter out the true solution. Among the 10 values in (59), every time we select 5 of them for {φ 1 , . . . , φ 5 }, the remaining 5 numbers are automatically for {β 1 , . . . β 5 }. We will have total 252 possible choices (which is the combinatorial number ( 10 5 )) as the candidates for the solution. Notice that this combinatorial number is a relatively big number. In order to speed up the pro-cessing, we reduce the redundant computation to the minimum. Let us use the notations {φ i 1 , . . . , φ i 5 , β i 1 , . . . β i 5 } with i = 1, 2, . . . , 252 representing those 252 candidates. Our method is based on the property that the information given in the sampling values has a lot of redundancy for selecting the true solution, and we only use just enough information from the given sampling values so as to save the computation time.
First, when we calculate the coefficients {c 1 , . . . , c 5 , d 1 , . . . , d 5 } by the least-square method, we use exactly 10 equations (the same number of the coefficients) out of the 20 equations in (60). Based on our experiments, we do not have to use an overdetermined system for a good approximation by the least-square method. A determined system can give us excellent approximation for the least-square problem, while any underdetermined system usually does not approximate the data well through the least-square solution. For convenience, we select 10 consecutive equations in (60) somewhere in the middle, which we call the least-square block in our discussion, to approximate the coefficients {c 1 , . . . , c 5 , d 1 , . . . , d 5 }. Specifically, our least-square block takes the subscripts from 6 through 15, and the corresponding sampling values { f 6 , f 7 , . . . , f 15 } should be selected as a reduced linear system given below, Even if our new linear system (61) is a determined system, we still solve it for a leastsquare solution, because the determinant of the square matrix in (61) could be very close to zero. Then the remaining equations in (60) together with the coefficients derived from (61) will be used to detect which candidate is the true solution based on the error information.
which is in general different from the original sampling vector [ f 0 , f 1 , . . . , f 19 ] T . Then we will calculate the difference of these two vectors, and see how close they are. We define the error vector as follows: To search for the true solution among the 252 candidates, we discover an intrinsic property, shown in Figures 2 and 3, that can clearly separate the true solution from other candidates.
In Figure 2, we plot the error vector for one of the 252 candidates to view its typical behavior. The error values in the least-square block (with subscripts from 6 to 15) are very close to zero for a typical candidate; however, the error values that are out of the least-square block (with subscripts from 0 to 5 and from 16 to 19) are not close to zero in general for a candidate that is not the true solution.
This behavior can be explained in this way: The errors in the least-square block are usually very small due to the fact that the least-square solution of the determined system approximates the targeting sampling values { f 6 , f 7 , . . . , f 15 } quite well; however, when we consider an error for a sampling value out of the least-square block, since the corresponding equation is not involved in the least-square approximation, there is no reason for this equation to generate a value that is very close to the targeting sampling value.
While for the true solution case, the behavior is different in the sense that the errors for all the equations in the linear system (60) are very close to zero (see Figure 3, and ignore the two reference points at the ends). Let us summarize the key property that helps us to find out the true solution among all the candidates: For a candidate, if the coefficients generated from the determined linear system (61) by the least-square method cannot approximate just one sampling value out of the least-square block well, then it cannot be the true solution.
However, if the coefficients for one candidate can approximate one particular sampling value out of the least-square block well, we can only say that it is highly likely that this candidate could be the true solution, because the probability for a non-solution candidate to approximate some sampling value out of the least-square block well is very small. Based on this observation from our experiments, we design the following strategy for the solution search.
Strategy: Eliminate as many as possible candidates in the first round filtering in two steps: Step 1. Select a determined linear system from the overdetermined linear system in (60) (as the least-square block), and approximate the coefficients {c 1 , . . . , c 5 , d 1 , . . . , d 5 } by the least-square method for each of the 252 candidates.
Step 2. Apply the derived coefficients in Step 1 on one of the linear equations out of the least-square block to approximate the targeting sampling value and calculate the error with the targeting sampling value. If the error is greater than certain threshold (we use 0.1 as our threshold), we drop this candidate from the consideration; otherwise, this candidate passes the first round filtering. If only one candidate survives the first round filtering, it must be the true solution. If more than one candidates pass the first round filtering, we need to do the second round filtering. In the second round filtering, we simply apply the derived coefficients on another linear equation out of the least-square block, and calculate the error for the targeting sampling value. If the error is greater than the threshold, we eliminate this candidate. We keep doing these cycles until we identify the true solution. Since we have plenty of redundant equations out of the least-square block, we should be able to determine the true solution without going through too many cycles in general. Furthermore, those linear equations corresponding to the original sampling values that are not included in the linear system (60) can still be used for the above steps when necessary, but the probability to use those equations out of the linear system (60) will be extremely small. This simple strategy is designed to allow us to detect the true solution without unnecessary computation, while we still preserve the option to use the redundant information when necessary.  Here we would like to point out that as soon as we select values in φ-group or β-group, the order of those values in each group is not important, because their corresponding coefficients (c j 's or d l 's) will also be aligned with them accordingly when we solve the determined linear system (61) using the least-square method.

Example 2.
Our second function to be recovered has the following form which is derived by applying the STFT on the following function with the parameters of (64) listed in the following Table 2. To solve this problem, we need to use 12 (i.e., 4M) sampling values. After we applied the method described in Section 4, we solved a linear system with 6 unknowns, and derived the Prony polynomial of degree 6 as follows Λ(z) = 1.0000(z 6 + 1) − 14.4845(z 5 + z) + 65.9809(z 4 + z 2 ) + 108.8070z 3 .
The symmetric structure of this polynomial tells us that its zeros appear in (z j , z −1 j ) pairs for j = 1, 2, 3, which correspond to three pairs of parameters: (1.0000, −1.0000), (3.0000, −3.0000), and (4.0000, −4.0000) for (φ j , −φ j ), j = 1, 2, 3. Finally, we can solve another linear system for the coefficients c j 's with the errors listed in the Table 3. Table 3. Parameters of the function f (x) in (64) and approximate errors using 12 sampling values with h = 0.5.

Conclusions
In this paper, we introduce a method that extends the Prony method to solve the two-generator sparse expansion problem. This method relies on the existence of a special linear operator for which the two generators must be the eigenfunctions. This two-generator problem has a special property: The zeros of its Prony polynomial correspond to two sets of parameters, and there is no straightforward way to separate them. We propose a two-stage least-square detection method on an overdetermined linear system for each candidate to extract the true solution, which relies on an intrinsic property for the true solution: Only the true solution can use the coefficients derived from the least-square block to approximate the targeting sampling values out of the least-square block well. Our method is designed to minimize the computation cost, while still maintain the computation accuracy.
It seems that the idea can be extended to the k-generator sparse expansion problem for k > 2; however, for the general k-generator case, the requirement that there exists a linear operator such that all the generators must be its eigenfunctions becomes extremely hard to achieve. For example, in the following sparse expansion problem, it is not easy to find a linear operator, such that both cos(φx) and e βx are its eigenfunctions. One may argue that the problem could be solved by converting cos(φx) to 1 2 (e iφx + e −iφx ), and then it becomes a one-generator problem. Notice that converting a two-generator problem to a one-generator problem may not work most of the time. We are interested in developing a general method that can solve the two-generator sparse expansion problem including the one in (65). We can see that there are many difficult problems to be solved in this multi-generator sparse expansion problem, and we would like to see more researchers contribute in this direction.
Our method for the two-generator sparse expansion problem can handle certain degree of uncertainty. For example, in problem (23), if we know the total number of terms (i.e., the value of M 1 + M 2 ), but we do not know the number of terms in each summation (i.e., the individual values of M 1 and M 2 ), we can still solve the problem using our two-stage least-square detection method described in Sections 3 and 5. If we increase the uncertainty a little more, can we still solve the problem?
For example, in the problem we considered in Section 4, if we do not know the exact number of terms (it is referred to unknown order of sparsity M in [1]) in the following expansion, c j e −β(φ j +x) 2 , and we are given K equispaced sampling values for some positive integer K. If we are told that these sampling values are sufficient to recover the signal, how do we recover it?
In other words, we know that the number of terms M is in the range 1 ≤ M ≤ K/4 , but we do not know the exact number M, can we solve the problem? The answer is yes, because we can try all the possible cases: M = 1, 2, . . . , K/4 , and for each case, we apply our two-stage least-square detection method to tell us if the true solution can be extracted.
However, we are not satisfied with this kind of exhaustive search type solution due to its high cost. We plan to develop an efficient term number detection method, so that when we make a term number prediction, this method can tell us if it is correct or not immediately. In [1], two methods are proposed: One is based on the rank of the H matrix, and the other is based on the singular values of the H matrix. The main issue is: How to obtain a reliable method to determine the M value in the sparse expansion? Only after we obtain the correct term number we will pay the computation cost to go through all the necessary details to find the solution. Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.