Decoding ‘Maximum Entropy’ Deconvolution

For over five decades, the mathematical procedure termed “maximum entropy” (M-E) has been used to deconvolve structure in spectra, optical and otherwise, although quantitative measures of performance remain unknown. Here, we examine this procedure analytically for the lowest two orders for a Lorentzian feature, obtaining expressions for the amount of sharpening and identifying how spurious structures appear. Illustrative examples are provided. These results enhance the utility of this widely used deconvolution approach to spectral analysis.

While numerical studies have outlined the general characteristics of the formalism [1,20–22], many details are not understood. For example, quantitative predictions of the degree of sharpening do not exist, even for special cases. In this work, we address this lack of understanding by solving the first- and second-order Burg equations analytically for a Lorentzian feature whose Fourier coefficients are those (e^{−|n|Γ}) of the continuum. This analysis parallels our previous work on noise reduction using the corrected maximum-entropy (CME) procedure [23], where a similar calculation was found to yield an exact analytic result. The results obtained here are also exact, and are illustrated with representative applications. Surprisingly, the degree of sharpening depends on the ratio of the width of the feature to the width of the spectral segment being analyzed.

Theory
The path through the origin, development, and reduction to practice of Burg's deconvolution theory began with the mid-century goal of extracting weak harmonic signals from stationary time series in which the signals are buried in Gaussian noise. The procedure started with autocorrelation, an operation incompatible with spectroscopy. Applications evolved through multiple steps, including forward prediction [24]. We do not attempt to trace that path here; we consider only the result, cast in a form suitable for the present application. This is Andersen's approach [22], later extended to complex coefficients by Kesler and Haykin [25].
In spectroscopy, the procedure is based on the Fourier coefficients of the spectrum being analyzed. To define these procedures, let P_o(θ_j) = p_j be a positive-definite spectrum consisting of (2N + 1) real values p_j, −N ≤ j ≤ N, projected onto the range −π ≤ θ ≤ π according to

θ_j = 2πj/(2N + 1). (1)

The restriction to an odd number (2N + 1) of data is required by the maximum-entropy derivation, and is also mathematically convenient. Next, let R_n, −N ≤ n ≤ N, be the set of digital Fourier coefficients associated with the p_j, determined as

R_n = [1/(2N + 1)] Σ_{j=−N}^{N} p_j e^{−inθ_j}. (2)

Because the p_j are real, R*_n = R_{−n}. Then, the CME representation P_M(θ) of P_o(θ) of order M is given by [23]

P_M(θ) = 1/|Σ_{n=0}^{M} a_{Mn} e^{inθ}|², (3)

where the a_{Mn} are solutions of

Σ_{k=0}^{M} R_{j−k} a_{Mk} = δ_{j0}/a_{M0}, 0 ≤ j ≤ M. (4)

The Toeplitz (diagonal-constant) form of Equation (4) originated with the need to perform autocorrelation in time-sequence applications, and is retained for spectroscopy because maximum-entropy theory requires it. The theory is also best suited to Lorentzian features because, as shown below, Equation (3) reduces to Lorentzian form under small-term conditions.
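These definitions can be checked numerically. The following sketch assumes the grid θ_j = 2πj/(2N + 1) and a 1/(2N + 1) normalization of the digital transform; both conventions are assumptions of the sketch rather than prescriptions of the theory.

```python
import numpy as np

# Sketch of digital Fourier coefficients for a sampled spectrum. The grid
# theta_j = 2*pi*j/(2N+1) and the 1/(2N+1) normalization are assumptions here.
N = 400
j = np.arange(-N, N + 1)
theta = 2 * np.pi * j / (2 * N + 1)

gamma = 0.25
p = (gamma / np.pi) / (theta**2 + gamma**2)   # normalized Lorentzian P_o(theta_j)

def R(n):
    """Digital Fourier coefficient R_n of the sampled spectrum p_j."""
    return np.sum(p * np.exp(-1j * n * theta)) / (2 * N + 1)

assert np.allclose(np.conj(R(3)), R(-3))      # R*_n = R_{-n} because the p_j are real
assert abs(R(2)) < abs(R(1)) < abs(R(0))      # coefficients decay for a smooth line
```

The Hermitian symmetry R*_n = R_{−n} is the property used repeatedly below; the decay of |R_n| with n is what both the CME and B/A procedures operate on.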
The solution of these equations is readily obtained either by Levinson recursion [26,27] or by inverting the Toeplitz matrix. The uppermost equation is solved for a_M0, after which the remaining coefficients are calculated. When the resulting spectrum P_M(θ) of Equation (3) is Fourier analyzed, the original coefficients R_0, R_1, . . ., R_M are recovered exactly, while those for n > M continue the established trend. As shown previously in [23], it follows that this approach possesses all the advantages of the brick-wall filter with none of the disadvantages, yielding the most accurate noise-free representation of spectral data with no apodization (filter-cutoff) errors.
Burg/Andersen (B/A) deconvolution [22,25] is entirely different, sharing only the starting Toeplitz matrix and the final pseudo-Lorentzian form, Equation (3). Here, the objective is to obtain coefficients a_Mn such that the denominator of Equation (3) approaches zero as closely as possible at the locations θ_j of the features in the spectrum. The result is a sharpened, or "whitened," version of the original. The calculation is a Levinson-like recursion that generates a spectrum P(θ) whose Fourier coefficients replace the original R_n with new values having a smaller decay coefficient, as described below.
The specific B/A procedure is as follows. Assume that the elements a_Mn of the solution vector of the Mth-order Toeplitz-matrix equation are known, with the vector normalized to a_M0 = 1. The term 1/a_M0 in the right-side vector of Equation (4) is replaced by a "power" P_M that is also determined from the Mth-order solution. (The terminology "power" is relevant for stationary time series but has no meaning in spectroscopy.) Next, increase the size of R by adding 1 row and 1 column, so M → M + 1. Now evaluate

b^f_{Mκ} = Σ_{n=0}^{M} a_{Mn} R_{κ−n}, (5)

b^b_{Mκ} = Σ_{n=0}^{M} a_{Mn} R_{κ−M+n}. (6)

From a stationary-time-sequence perspective, Equation (5) is a filter running in the forward direction, i.e., predicting the error b^f_{Mκ} for increasing κ, whereas Equation (6) is the same filter operating in the reverse direction, predicting b^b_{Mκ} for decreasing κ. Next, evaluate

a_{M+1,M+1} = −2 Σ_κ b^f_{Mκ} b^b_{M,κ−1} / Σ_κ [(b^f_{Mκ})² + (b^b_{M,κ−1})²]. (7)

The (M + 1)-order solutions a_{M+1,n}, P_{M+1} are then given by

a_{M+1,n} = a_{Mn} + a_{M+1,M+1} a_{M,M+1−n}, 1 ≤ n ≤ M, (8)

P_{M+1} = P_M (1 − a²_{M+1,M+1}), (9)

b^f_{M+1,κ} = b^f_{Mκ} + a_{M+1,M+1} b^b_{M,κ−1}, (10)

b^b_{M+1,κ} = b^b_{M,κ−1} + a_{M+1,M+1} b^f_{Mκ}. (11)

The recursion calculation is initiated with the starting values

b^f_{0κ} = b^b_{0κ} = R_κ (12)

and a_00 = a_10 = 1. The starting value of a_11 is determined by minimizing the total squared prediction error

E_1 = Σ_κ [(R_κ + a_11 R_{κ−1})² + (R_{κ−1} + a_11 R_κ)²] (13)

with respect to a_11. The result is

a_11 = −2 Σ_κ R_κ R_{κ−1} / Σ_κ (R_κ² + R_{κ−1}²). (14)

The calculation then proceeds with Equations (7)–(11). The bidirectional averaging implicit in Equations (5) and (6) minimizes odd-harmonic contributions, leading to deconvolution.
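The recursion can be sketched as a standard real-valued Burg loop operating on the coefficient sequence R_n. The starting-power convention and variable names below are assumptions of this sketch:

```python
import numpy as np

def burg_ba(R, M):
    """Sketch of the B/A recursion on a real coefficient sequence R_0..R_N.

    Returns the order-M prediction coefficients a (a[0] = 1) and power P.
    This is the standard Burg forward/backward error recursion; the starting
    power convention is a guess of this sketch.
    """
    f = R[1:].astype(float)            # forward prediction errors  b^f
    b = R[:-1].astype(float)           # backward prediction errors b^b
    a = np.array([1.0])
    P = np.mean(R.astype(float)**2)    # starting "power" (convention assumed)
    for _ in range(M):
        k = -2.0 * (f @ b) / (f @ f + b @ b)   # reflection coefficient a_{M+1,M+1}
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
        P *= 1.0 - k**2
        f, b = (f + k * b)[1:], (b + k * f)[:-1]  # error updates, then shift
    return a, P

# For R_n = e^{-n*Gamma}, the first reflection coefficient is -sech(Gamma),
# independent of the number of coefficients used.
g = 0.25
R = np.exp(-g * np.arange(60))
a1, _ = burg_ba(R, 1)
assert np.isclose(a1[1], -1.0 / np.cosh(g))
```

The symmetric combination of forward and backward errors in the reflection-coefficient numerator and denominator is the "bidirectional averaging" referred to above.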

Analytic Investigation
We now consider analytic solutions, with the objectives of obtaining information about the deconvolution process, the properties of the solutions, and useful estimates of the amount of sharpening that can be expected in specific situations. As in our previous work [23], we base our analysis on the Fourier coefficients e^{−|n|Γ} of the continuum normalized Lorentzian

L(θ) = (1/π) Γ/(θ² + Γ²), (16)

which for mathematical simplicity we place at the center of the range −π ≤ θ ≤ π defined by Equation (1). The digital Fourier transform P_D(θ) of e^{−|n|Γ} is

P_D(θ) = Σ_{n=−N}^{N} e^{−|n|Γ} e^{inθ} = [1 − e^{−2Γ} − 2e^{−(N+1)Γ}(cos (N+1)θ − e^{−Γ} cos Nθ)]/(1 − 2e^{−Γ} cos θ + e^{−2Γ}); (18)

P_D(θ) ≈ (1 − e^{−2Γ})/(1 − 2e^{−Γ} cos θ + e^{−2Γ}). (19)

In Equation (19), it is assumed that (N + 1)Γ is large enough that the apodization ("ringing") term in Equation (18) can be discarded. The M = 1 CME solution P_1(θ) of Equations (3) and (4),

P_1(θ) = (1 − e^{−2Γ})/(1 − 2e^{−Γ} cos θ + e^{−2Γ}), (20)

is identical to Equation (19), except that it is exact: as shown in [23], with R_n = e^{−|n|Γ}, all higher-order terms vanish. For small Γ and θ, Equation (20) reduces to

P_1(θ) ≈ 2Γ/(θ² + Γ²), (21)

which differs from Equation (16) only by the (1/2π) normalizing factor of the continuum Fourier transform. Thus, the broadening is the square root of the closest approach of the denominator to zero. The full width at half maximum (FWHM) is 2Γ. We now consider the equivalent B/A M = 1 solution using Equations (3) and (14), providing detail as necessary. Substituting R_n = e^{−nΓ} in Equation (14) yields

a_11 = −2 Σ_n e^{−nΓ} e^{−(n−1)Γ} / Σ_n [e^{−2nΓ} + e^{−2(n−1)Γ}] (23)

= −2e^{−Γ}/(1 + e^{−2Γ}) (24)

= −2/(e^{Γ} + e^{−Γ}) = −sech Γ, (25)

where Equation (24) follows from Equation (23) by canceling the common sums, and Equation (25) from Equation (24) by multiplying the numerator and denominator by e^{Γ}. As a result of the special property of R_n = e^{−|n|Γ}, the sums in Equation (23) cancel regardless of the initial and final values of n, so the resulting Equation (25) is independent of N.
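The N-independence of Equation (25) can be checked directly by evaluating the Burg-type ratio for a_11 at several truncation lengths:

```python
import numpy as np

# For R_n = e^{-n*Gamma}, the common geometric sums cancel, so
# a_11 = -2 sum R_n R_{n-1} / sum (R_n^2 + R_{n-1}^2) = -sech(Gamma)
# regardless of where the sums are truncated.
g = 0.4
for N in (3, 10, 100):
    R = np.exp(-g * np.arange(N + 1))
    a11 = -2.0 * np.sum(R[1:] * R[:-1]) / np.sum(R[1:]**2 + R[:-1]**2)
    assert np.isclose(a11, -1.0 / np.cosh(g), atol=1e-12)
```

Even N = 3 reproduces −sech Γ to machine precision, which is the cancellation property claimed above.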
The remaining mathematics is more efficiently carried out in two parts, evaluating first the power prefactor P_1 and then the denominator D_1. For the prefactor, we find

P_1 = (1 − a_11²) Σ_{n=n_0} e^{−2nΓ}, (26)

assuming that N is large enough that the apodization term in the numerator can be ignored.

Here, the sum depends on the starting value n_0. We can remove this ambiguity by noting that the digital transform Equation (2) uses all terms −N ≤ n ≤ N, whereas Equation (26) is single-ended. Consistency is achieved by taking the average of the sums starting at n_0 = 0 and n_0 = 1, in which case each term is counted once and only once. The result is

P_1 = (1 − sech²Γ)(1 + e^{−2Γ})/[2(1 − e^{−2Γ})] = (1/2) tanh Γ. (27)

The contribution of the denominator is

D_1(θ) = |1 + a_11 e^{iθ}|² = 1 + sech²Γ − 2 sech Γ cos θ. (28)

Combining the two expressions yields the M = 1 lineshape P_1(θ), i.e., the M = 1 deconvolved version of P_o(θ):

P_1(θ) = sinh Γ cosh Γ/[2(1 + cosh²Γ − 2 cosh Γ cos θ)]. (32)

The structural similarity between Equations (19) and (32) is evident. Equation (32) is best understood by repeating the small-term expansion leading to Equation (21) in the CME case. Performing the series expansions of sinh Γ, cosh Γ, and cos θ for small Γ and θ, we find

P_1(θ) ≈ (Γ/2)/(θ² + Γ⁴/4 − θ⁴/12). (34)

Ignoring the small θ⁴ term in the denominator, the lineshape is seen to be Lorentzian, with a FWHM of

FWHM_1 = Γ². (35)

This can be compared to the FWHM of 2Γ for the original Lorentzian. The relative narrowing is more relevant, so we divide Equation (35) by 2Γ, which yields

FWHM_1/FWHM_o = Γ²/(2Γ) = Γ/2, (37)

where terms of higher order in Γ have been dropped. Since the dimensionless reference for Γ is the Fourier period 2π, we find the surprising result that the relative sharpening is also a function of the width of the structure relative to the width of the spectral segment being analyzed. For structures with values of Γ that are small compared to the intrinsic reference scale of 2π, the sharpening can be significant.
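The narrowing estimate can be checked numerically, assuming the M = 1 result a_11 = −sech Γ: for Γ = 0.25 the measured FWHM should be close to Γ², i.e., a narrowing factor of about 8.1.

```python
import numpy as np

# Numerical check of the M = 1 narrowing, assuming a_11 = -sech(Gamma).
# The prefactor is irrelevant for the width, so only the denominator is used.
g = 0.25
theta = np.linspace(-np.pi, np.pi, 400001)
D = 1.0 + 1.0 / np.cosh(g)**2 - 2.0 * np.cos(theta) / np.cosh(g)  # |1 + a_11 e^{i theta}|^2
S = 1.0 / D

half = theta[S >= 0.5 * S.max()]   # region above half maximum
fwhm = half.max() - half.min()

assert np.isclose(fwhm, g**2, rtol=0.02)        # small-Gamma prediction Gamma^2
assert np.isclose(2 * g / fwhm, 8.1, atol=0.1)  # narrowing factor ~8.1 for Gamma = 0.25
```

The measured factor, 2Γ/FWHM ≈ 2/Γ, makes explicit that the sharpening grows as Γ shrinks relative to the 2π analysis window.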
While the M = 1 B/A FWHM is obviously less than the original for Γ < 1, a short calculation shows that this is always the case provided that

sin(Γ/2) > (cosh Γ − 1)/(2 √cosh Γ). (38)

Because this inequality is satisfied for any Γ of practical interest, the B/A process for M = 1 always results in a narrower Lorentzian line.
Because the width and peak values of the Lorentzian are related, narrowing is equivalent to moving the singularity in the denominator closer to zero. At θ = 0, we find

P_1(0) = sinh Γ cosh Γ/[2(cosh Γ − 1)²] ≈ 2/Γ³, (39)

compared to 1/(πΓ) for the original normalized Lorentzian. The pole has thus moved two inverse powers of Γ closer to zero, consistent with the decreased width. At this point we note the importance of averaging: had a_11 been determined from the forward-prediction errors alone (the first terms in the sums of Equation (23)), the result would be a_11 = −e^{−Γ}, yielding the CME result, Equation (19). Had it been determined from the backward errors alone, the result would be a_11 = −e^{Γ}, whose magnitude e^{Γ} > 1 violates a fundamental constraint of the theory. In essence, the B/A approach works because of averaging.
It is straightforward to take the calculation one step further, determining the result for M = 2. Using the same strategy, we find

P_2(θ) = P_2/[1 + a_21² + a_22² + 2a_21(1 + a_22) cos θ + 2a_22 cos 2θ], (45)

where a_21 = −sech Γ (1 + sech 2Γ), a_22 = sech 2Γ, and P_2 = (1/2) tanh Γ tanh² 2Γ. This expression is significantly more complicated than that for P_1(θ), although its properties are easily determined. We consider first the extrema, which can be found by setting the derivative of the denominator with respect to θ equal to zero. The calculation yields

sin θ [a_21(1 + a_22) + 4a_22 cos θ] = 0. (47)

The factored term sin θ shows that one extremum occurs at θ = 0, as expected by symmetry. However, Equation (47) also has second and third extrema at locations given by

cos θ = −a_21(1 + a_22)/(4a_22) = cosh³Γ/cosh 2Γ. (48)

For small Γ, the right side of Equation (48) is approximately 1 − Γ²/2 < 1, so by symmetry Equation (48) yields a pair of solutions at θ ≈ ±Γ. Thus for M = 2, the reconstruction of the single Lorentzian is split into two peaks.
We next consider the small-term expansion for M = 2. Expanding Equation (45) to fourth order in Γ and θ, we find

P_2(θ) ≈ 2Γ³/(θ² − Γ²)². (49)

The denominator is the square of a parabola with singularities at θ = ±Γ. This is consistent with the above analysis: the single peak of the original Lorentzian has split into two, with each new singularity located a distance Γ from the symmetry axis. The infinities at θ = ±Γ in Equation (49) are eliminated in a 6th-order expansion of the denominator, which yields the more accurate small-term results

P_2(±Γ) ≈ 1/(2Γ³); P_2(0) ≈ 2/Γ. (50)

Because P_2(±Γ) ≈ 1/(2Γ³) is much larger than P_2(0) ≈ 2/Γ, the location θ = 0 is a local minimum. Thus, for M = 2, the singularities are moved even closer to zero, although the price paid is that two peaks are present instead of one. While it is possible to obtain an expression yielding the FWHM of each peak, the splitting has made the concept of sharpening largely irrelevant. We note that in going from M = 1 to M = 2, the peak value went from being the reciprocal of a fourth-order quantity to the reciprocal of a sixth-order quantity in Γ.
Next, we note that small-term expansions may be expected to have limited utility. As an alternative, replacing e^{iθ} → z in the denominator of Equation (45) and setting the result equal to zero yields the exact pole locations

z_± = cosh Γ ± i sinh Γ. (52)

Again, we see that the result contains two peaks, now located at θ = ±tan^{−1}(tanh Γ), i.e., separated by ∆θ = 2 tan^{−1}(tanh Γ) ≈ 2Γ for small Γ.
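The splitting can also be seen numerically. The sketch below applies a standard real Burg recursion (an assumed implementation of the B/A procedure above) to R_n = e^{−nΓ} for M = 2 and locates the resulting peaks:

```python
import numpy as np

# Order M = 2 B/A reconstruction of a single Lorentzian: the single peak splits
# in two. Peak locations are compared with cos(theta) = cosh^3(G)/cosh(2G),
# the extremum condition derived above.
g = 0.25
R = np.exp(-g * np.arange(80))
f, b = R[1:].copy(), R[:-1].copy()
a = np.array([1.0])
for _ in range(2):                                  # standard Burg recursion
    k = -2.0 * (f @ b) / (f @ f + b @ b)
    a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
    f, b = (f + k * b)[1:], (b + k * f)[:-1]

theta = np.linspace(-np.pi, np.pi, 200001)
z = np.exp(1j * theta)
S = 1.0 / np.abs(a[0] + a[1] * z + a[2] * z**2)**2  # B/A lineshape (prefactor omitted)

peak = theta[np.argmax(S)]
mid = theta.size // 2                               # index of theta ~ 0
assert np.isclose(np.cos(abs(peak)), np.cosh(g)**3 / np.cosh(2 * g), atol=1e-3)
assert S[mid] < 0.3 * S.max()                       # theta = 0 is a local minimum
```

For Γ = 0.25 the peaks fall near θ ≈ ±0.23, close to ±Γ, with a clear dip at the origin, in agreement with the small-term analysis.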
The above calculations are too crude to have significant quantitative value, but they illustrate a basic mechanism that we expect to be valid in other situations: when too many peaks (too large a value of M) are requested, the existing peaks are not only shifted, but extra features appear. We expect this to apply to orders of M beyond 2 as well.

Discussion
To better visualize the nature of the solutions, we turn to numerical methods. Figure 1 compares the normalized deconvolved spectrum for M = 1, calculated from Equation (32), to the original Lorentzian and the associated pseudo-Lorentzian calculated from Equations (16) and (19), respectively. The curves are shown on the expanded range −1.5 ≤ θ ≤ 1.5 to better display differences. On this scale, the Lorentzian and the pseudo-Lorentzian are essentially identical, but the deconvolved result is larger than the others by a factor of about 2.5. From Equation (35), the linewidth is reduced by a factor of 8.1, in agreement with Figure 1. It can also be noted from Equation (35) that the extreme sharpening seen here is a direct consequence of the relatively small value Γ = 0.25 compared to the ±π range of θ. Had we chosen a larger value of Γ, the relative amount of sharpening would have been less.

As noted above, when M exceeds the number of singularities, the system generates extra poles. Figure 2 illustrates this in detail. Here, we show the results of the B/A procedure for M = 1, 2, and 3, comparing the results to each other and to the original Lorentzian at the bottom. The enhancement of the amplitude of the M = 1 spectrum with respect to the original Lorentzian is clear. Another striking feature is that the curve for M = 2 has two peaks instead of one, with the peaks located approximately at θ = ±Γ, as shown in the previous section. The curve for M = 3 recovers the single dominant peak, but its amplitude is reduced by about a factor of 2. In addition, spurious satellites appear at θ = ±2Γ. Thus, if M exceeds the number of legitimate critical points in the spectrum, the result is not only extra features but also a significantly reduced peak height, even when a reconstructed feature coincides with genuine structure in the spectrum.
Because spectra usually contain more than one feature, the question arises as to how much of the simple single-line theory is transferable to spectra containing multiple structures. Figure 3 shows a model spectrum consisting of a pair of lines with Γ = 0.25, symmetrically located about the origin at θ = ±π/10 = ±0.314. This separation is sufficient for the peaks to remain distinct while retaining some overlap. It is seen that the M = 2 lineshapes are much sharper than the original, so the B/A procedure is effective. However, an examination of the numerical values shows that the broadening is about 0.042, leading to a ratio of 0.17, compared to the predicted value of 0.123; the difference is about 30%. In addition, the singularities are found to lie at about ±0.38 from the symmetry axis instead of ±0.314, a difference of 17%. Although no measures of broadening have appeared before, errors in the locations of singularities are well known. We shall discuss this in more detail elsewhere.

Figure 4 compares the capabilities of the CME and B/A approaches in dealing with singularities separated essentially by the broadening parameter. The synthetic spectrum consists of two poles, the first at θ_1 = −0.20 with an amplitude of 0.5 and the second at θ_2 = +0.10 with an amplitude of 1.0. The broadening parameter Γ = 0.25 is the same in both cases, and both calculations were performed to order M = 2. The CME functions as it should, generating a replica of the data with white-noise coefficients replaced with most-probable values. The B/A response is qualitatively different, with the sharpening evident and the two singularities clearly identified. However, the B/A procedure places these at −0.267 and +0.258, which is only qualitatively correct. The CME performs better, but only at higher M: for M = 20, two poles are also identified, located at −0.233 and +0.110, in better agreement with the correct values.

Conclusions
In this work, we investigate the B/A procedure analytically for a single Lorentzian line and B/A orders M = 1 and 2. We take advantage of the form e^{−|n|Γ} of the continuum Fourier coefficients of the Lorentz lineshape to obtain exact analytic expressions. In both cases, the Lorentz structure is deconvolved, with the amount of sharpening greater for M = 2 than for M = 1. The M = 1 case yields a simple analytic expression that can be used to estimate the amount of deconvolution to be expected in given situations. Surprisingly, the relative amount of deconvolution is not intrinsic to the structure alone: it depends on the ratio of the width of the original structure to the width of the spectral segment being analyzed.
The origin of spurious peaks is demonstrated to be the selection of an order that does not match the number of singularities in the structure, with degradation appearing first as the splitting of a single peak, followed at higher orders by the appearance of satellite structures and reductions in the enhancement of the main peak. In applications to a model spectrum and data, we find that when two peaks are present, B/A processing at order 2 highlights the structures, in contrast to the CME, although for large M the CME returns more accurate values of the singularities.