Stieltjes and Hamburger Reduced Moment Problem When MaxEnt Solution Does Not Exist

For a given set of moments whose predetermined values represent the available information, we consider the case where the Maximum Entropy (MaxEnt) solutions for the Stieltjes and Hamburger reduced moment problems do not exist. Relying genuinely upon the MaxEnt rationale, we find the distribution with the largest entropy, and we prove that this distribution gives the best approximation of the true but unknown underlying distribution. Despite the nice properties just listed, the suggested approximation suffers from some numerical drawbacks, which we discuss in detail in the paper.


Problem Formulation and MaxEnt Rationale
In the context of testable information (that is, a statement about a probability distribution whose truth or falsity is well-defined), the principle of maximum entropy states that the probability distribution which best represents the current state of knowledge is the one with the largest entropy. In this spirit, Maximum Entropy (MaxEnt) methods are traditionally used to select a probability distribution when some (prior) knowledge about the true probability distribution is available and several (possibly infinitely many) different probability distributions are consistent with it. In such a situation, MaxEnt methods represent correct methods for drawing inference about the true but unknown underlying distribution generating the observed data.
Suppose that X is an absolutely continuous random variable having probability density function (pdf) f defined on an unbounded support S_X, and that {µ*_k}_{k=1}^{M}, with µ*_0 = 1, are M finite integer moments whose values are predetermined, that is,

µ*_k = ∫_{S_X} x^k f(x) dx, k = 0, 1, ..., M, (1)

for an arbitrary M ∈ N. Quantities such as in (1) may be taken to represent the available (predetermined) information about X. The Stieltjes (Hamburger) reduced moment problem [1] consists of recovering an unknown pdf f, having support S_X = R+ (S_X = R), from the knowledge of the prefixed moment set {µ*_k}_{k=1}^{M}. Owing to the non-uniqueness of the recovered density, the best choice among the (potentially infinite) competitors may be made by invoking the Maximum Entropy (MaxEnt) principle [2], which consists in maximizing the Shannon entropy under the constraints (1). Since entropy may be regarded as an objective measure of the uncertainty in a distribution, "... the MaxEnt distribution is uniquely determined as the one which is maximally noncommittal with regard to missing information". Throughout the paper we use the following notation:
1. µ*_j for prescribed moments;
2. µ_j for variable (free to vary) moments;
3. µ_{j, f_{j−1}} for the j-th moment of f_{j−1}, that is µ_{j, f_{j−1}} = ∫_{S_X} x^j f_{j−1}(x) dx (in general µ_j ≠ µ*_j);
4. f_M for the MaxEnt density constrained by the first M + 1 moments, which takes the form f_M(x) = exp(−∑_{j=0}^{M} λ_j x^j), with λ_M ≥ 0. (2)
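As a small illustration of the moment functional µ_{j,f} introduced above, the integer moments of a candidate density can be evaluated by numerical quadrature. The exponential density below is only a stand-in example (not one of the paper's test cases), chosen because its moments are known in closed form:

```python
import numpy as np

def trapz(y, x):
    # plain trapezoidal rule (avoids version-specific numpy helpers)
    return float(np.sum((y[1:] + y[:-1]) * (x[1:] - x[:-1])) / 2.0)

# Stand-in example density on S_X = R+: a unit-rate exponential,
# whose integer moments are known in closed form (mu_k = k!).
x = np.linspace(0.0, 100.0, 200001)   # truncated grid standing in for R+
f = np.exp(-x)

# mu_{k,f} = int_{S_X} x^k f(x) dx, for k = 0, ..., 4
moments = [trapz(x**k * f, x) for k in range(5)]   # ~ 1, 1, 2, 6, 24
```

The truncation of R+ at x = 100 is adequate here because the integrands decay exponentially; for heavier-tailed densities a change of variables onto a finite interval (as discussed later in the paper) is preferable.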
The above non-negativity condition on λ_M, which is a consequence of the unbounded support S_X, is crucial, and it renders the moment problem solvable only under certain restrictive assumptions on the prescribed moment vector (µ*_1, ..., µ*_M). This is the ultimate reason upon which the present paper relies.
The existence conditions of the MaxEnt solution f_M have been deeply investigated in the literature ([5][6][7][8][9], just to mention some widely cited papers); over the years an intense debate, combining the results of the above papers, has established the correct existence conditions underlying the Stieltjes and Hamburger moment problems (more details on this topic may be found in Appendix A).
On the other hand, when the existence conditions for f_M are not satisfied, the nonexistence of the MaxEnt solution for the Stieltjes and Hamburger reduced moment problems poses a series of interesting and important questions about how to find an approximant of the unknown density f least committed to the information not given to us (still obeying Jaynes' principle). This problem is addressed in the present paper.
More formally, take C_M to be the set of density functions satisfying the M + 1 moment constraints (that is, sharing the same M + 1 predetermined moments) and let µ(C_M) be the moment space associated with C_M; whenever C_M contains more than one element, the indeterminacy of the moment problem (1) follows.
A common way to regularize the problem, as recalled before, consists in applying the MaxEnt principle, obtaining E_M, the set of MaxEnt density functions, which is a subset of C_M; consequently, let µ(E_M) be the moment space relative to the set of MaxEnt density functions E_M. Because, in general, µ(C_M) strictly includes µ(E_M), there are admissible moment vectors in Int(µ(C_M)), the interior of µ(C_M), for which the moment problem (1) is solvable but the MaxEnt problem (3) has no solution; the usual regularization based on the MaxEnt strategy is therefore precluded.
The implications of this issue are often understated in practical applications, where the usual procedure limits itself to one of two ad hoc workarounds. However, this is not completely coherent with the MaxEnt principle, which prescribes using not only available but all the available information; discarding available information seems conceptually in contrast with the MaxEnt spirit. From the point of view of practical applications, however, considering or not considering the prefixed moment µ*_M seems to have negligible effects on the summarizing quantities of the underlying distribution (mostly expected values of suitable functions) in which we may be interested. We will resume this issue, after having carefully motivated and proved the proposed solution, in the last section of the paper, devoted to discussion and conclusions.
We call the solutions 1. and 2. "forced" pseudo-solutions; they might indeed lead to the unpleasant fact that a MaxEnt solution always exists, although the original Stieltjes (Hamburger) moment problem does not admit any solution. Hence the crucial question is: does there exist a way to regularize the (indeterminate) moment problem (1), coherently with all and only the available information, exploiting the MaxEnt rationale without resorting to unnatural solutions, i.e., to a totally inappropriate application of the MaxEnt principle?
Before proceeding, recall C_M and define the following class of density functions: C̃_M ⊂ C_M, whose entries satisfy the given constraints expressed in terms of the M + 1 assigned integer moments µ*_k = E(X^k), k = 0, 1, ..., M. Now the question is: once {µ*_k}_{k=1}^{M} ∈ µ(C_M) \ µ(E_M) are predetermined (that is, the MaxEnt problem does not admit a solution), what is the optimal choice of the pdf that we can select in place of f_M? Relying upon the MaxEnt rationale, the best substitute for the missing f_M should be given by a suitable f̃_M ∈ C̃_M having the overall largest entropy; that is, select f̃_M ∈ C̃_M satisfying H_{f̃_M} ≥ sup_{f ∈ C_M} H_f − ε (7) for an arbitrarily small ε.
We aim to find sup_{f ∈ C_M} H_f, the corresponding f̃_M, and its entropy H_{f̃_M}, proving that this may be accomplished by the MaxEnt machinery (see Equations (9)-(11) below).
The remainder of the paper is organized as follows. Sections 2 and 3 are devoted to evaluating the best pdf in the Stieltjes and Hamburger cases, respectively. We devote Section 4 to numerical aspects, and in Section 5 we round up with some concluding remarks. In Appendix A the existence conditions of MaxEnt distributions in the Stieltjes and Hamburger cases are briefly reviewed.

Stieltjes Case
In this section we provide a formal justification of the motivation (rationale) and optimality of the proposed substitute f̃_M for the MaxEnt density f_M. We deal with the issue of selecting the "best" pdf that both satisfies the constraints (given by the predetermined integer moments) and has the overall largest entropy.
Before starting, some relevant facts need to be collected. Although the MaxEnt density f_M does not exist, the MaxEnt density f_{M+1}(µ_{M+1}), constrained by the moments (µ*_1, ..., µ*_M, µ_{M+1}), exists for any value µ_{M+1} > µ⁻_{M+1} (see Appendix A for more details); the procedure adopted here remains valid for each such value, and consequently as µ_{M+1} → +∞ too. Since the MaxEnt density f_M does not exist, some additional information not given to us must be added; of course, µ_{M+1} is the most suitable candidate to represent it.
Once this is established, the relevant question is: what value should be chosen for µ_{M+1}? Recall that the entropy H_{f_{M+1}(µ_{M+1})} should assume the overall largest value, so that the decrease of entropy is as small as possible. Consider the augmented moment set

(µ*_1, ..., µ*_M, µ_{M+1}). (8)

If f_{M+1}(µ_{M+1}) is the MaxEnt density corresponding to the set of moments (8), the following theorem holds.

Theorem 1. The following two relationships hold:

sup_{f ∈ C_M} H_f = H_{f_{M−1}} (9)

and

lim_{µ_{M+1} → +∞} H_{f_{M+1}(µ_{M+1})} = H_{f_{M−1}}, (10)

so that f̃_M may be identified with f_{M+1}(µ̄_{M+1}), where µ̄_{M+1} satisfies

H_{f_{M−1}} − H_{f_{M+1}(µ̄_{M+1})} ≤ ε (11)

and ε indicates a fixed tolerance. Proof.
Let us now consider the suitable class of densities {f_{M+1}(µ_{M+1})}; its entries belong to C_M and, primarily, they all have analytically tractable entropy. From (5), (6) and (12), the entropy H_{f_{M+1}(µ_{M+1})}, bounded by H_{f_{M−1}} from above, is a differentiable monotonically increasing function of µ_{M+1}, and hence it tends to a finite limit. In analogy with (12), let us introduce the following class. Collecting together the results achieved in the above items (a), (b), (c) and taking into account (9), one concludes that Equation (10) is also proved.
Equation (10) may be restated as follows: if ε indicates a fixed tolerance, there exists µ̄_{M+1} such that H_{f_{M−1}} − H_{f_{M+1}(µ̄_{M+1})} ≤ ε, from which (7) follows. As a consequence, f̃_M = f_{M+1}(µ̄_{M+1}) is the proposed substitute for f_M and Equation (11) is proved.
In conclusion: 1. Although f_M does not exist, so that the current use of MaxEnt fails, a solution is found by going back to (9)-(11), from which the desired result (7) follows. 2. The existence of the MaxEnt density f_M implies its uniqueness, unlike f̃_M, which depends on the assumed tolerance. In the numerical examples below, precisely this remark will be used.

Hamburger Case
The non-symmetric Hamburger case with M even is disregarded here because the existence of the MaxEnt solution f_M is guaranteed. We will concentrate our attention on the symmetric case with M ≥ 4 even and on the non-symmetric case with M ≥ 3 odd. In both cases, thanks to the MaxEnt formalism, the procedure used in the Stieltjes case can be extended to the Hamburger one (see [9]); this fact represents one of the main advantages of the MaxEnt machinery.

Symmetric Case with M ≥ 4 Even
We recall that f_M is a symmetric function for every even M, so that the odd Lagrange multipliers satisfy λ_{2j−1} = 0.
Here the parameter µ_{M+2} is introduced and, thanks to the MaxEnt machinery, the proof continues analogously to the Stieltjes case.

Non-Symmetric Case with M ≥ 3 Odd
If M is odd, f_M does not exist for any set of moments belonging to µ(C_M), because the integrability of f_M on S_X = R would require λ_M = 0. The non-symmetric case, with M odd and for every set of moments belonging to µ(C_M), is then solved analogously to the Stieltjes case, and Theorem 1 holds true. A consequence of the results achieved in this section is the following. Consider a non-symmetric Hamburger moment problem. In Theorem 1 we proved that H_{f_{M+1}(µ_{M+1})} → H_{f_{M−1}} as µ_{M+1} → +∞. Since the entropy is monotonically non-increasing as M increases, the latter equalities enable us to set H_{f̃_M} ≃ H_{f_{M−1}}.

Numerical Aspects
The procedure described above, rooted in the MaxEnt machinery, suffers from some numerical drawbacks, which we now discuss. It is worth recalling that similar drawbacks had previously been found ([11,12]), although for the special value M = 4 in the Hamburger case, while exploring special regions of the moment space. Essentially, numerical troubles arise because the expected solution f_{M+1} is contaminated with a small wiggle that (a) moves to infinity, (b) is scaled in such a way that its contribution to the (M + 1)-th order moment µ_{M+1} is always O(1), and (c) may become invisible to numerical quadrature methods. We now provide some theoretical ground to justify the above heuristics, which holds true in both the Hamburger and Stieltjes cases thanks to the MaxEnt formalism. First of all, under the constraints (µ*_1, ..., µ*_M, µ_{M+1}), we prove that the wiggle exists. To this purpose, both the relationships λ_M < 0 for each µ_{M+1} and λ_{M+1} → 0 as µ_{M+1} → ∞ have to be proved.
We are ready to prove the statement that f_{M+1}(µ_{M+1}) exhibits a small wiggle at some x_wig > 0 (analogously, in the symmetric Hamburger case the wiggle appears symmetrically, so that f_{M+1} admits a corresponding local maximum). As µ_{M+1} increases, we proved the relationships λ_M < 0 and λ_{M+1} → 0, so that x_wig > 0 moves to infinity (from numerical evidence, as µ_{M+1} increases, |λ_M| → 0 much more slowly than λ_{M+1}). Since f_{M+1} has finite moments (µ*_1, ..., µ*_M) for each µ_{M+1}, it follows that the wiggle, concentrated in a compact packet, is scaled in such a way that its contribution to the (M + 1)-th order moment µ_{M+1} is always O(1) (whilst for all higher moments the contribution due to this maximum obviously grows without bound, as a consequence of Lyapunov's inequality).
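The scaling argument above can be mimicked with a toy computation, idealizing the wiggle as a small probability mass w placed at a large location x0 (the constants M, c and the locations below are illustrative assumptions, not values from the paper):

```python
# Idealize the wiggle as a probability mass w at location x0, with
# w = c / x0**(M+1): its contribution to the j-th moment is w * x0**j.
M, c = 4, 0.5

def wiggle_contributions(x0, M=M, c=c):
    w = c / x0 ** (M + 1)
    return [w * x0 ** j for j in range(M + 2)]

far = wiggle_contributions(1e6)
# far[M+1] stays exactly c = O(1), while far[j] -> 0 for every j <= M,
# i.e. the packet is invisible in the lower moments but not in mu_{M+1}
```

For moments of order higher than M + 1 the contribution w * x0**j grows without bound as x0 → ∞, matching the Lyapunov-inequality remark in the text.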
An additional complication comes from the fact that the height and position of the wiggle are extremely sensitive to the parameters λ_M and λ_{M+1}, so that it becomes progressively smaller until it is "invisible" if an unsuitable numerical quadrature method is adopted. As a consequence, the procedure becomes increasingly ill-conditioned, to such a degree that numerical error precludes finding a suitable solution. As a remedy, for instance, the quadrature on the unbounded domain has to be mapped onto a finite interval, and an adaptive quadrature is required. Since the wiggle moves along the x-axis as µ_{M+1} increases, a fixed-node quadrature formula could be unsuitable, as the wiggle could become invisible for some values of µ_{M+1}.
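The mapping remedy can be sketched as follows, assuming for concreteness the substitution x = t/(1−t), which maps R+ onto (0, 1); a unit-rate exponential is used as a stand-in integrand, not one of the paper's densities:

```python
import numpy as np

def trapz(y, x):
    # plain trapezoidal rule (avoids version-specific numpy helpers)
    return float(np.sum((y[1:] + y[:-1]) * (x[1:] - x[:-1])) / 2.0)

# Change of variables x = t/(1-t) maps R+ onto (0,1):
#   int_0^inf f(x) dx = int_0^1 f(t/(1-t)) / (1-t)**2 dt
def integrate_halfline(f, n=20000):
    t = np.linspace(0.0, 1.0, n + 2)[1:-1]   # open interval (0,1)
    x = t / (1.0 - t)
    return integrate_grid(f, t, x)

def integrate_grid(f, t, x):
    return trapz(f(x) / (1.0 - t) ** 2, t)

total = integrate_halfline(lambda x: np.exp(-x))      # ~ 1 (total mass)
mean = integrate_halfline(lambda x: x * np.exp(-x))   # ~ 1 (first moment)
```

A fixed uniform grid in t, as above, is only adequate for well-behaved integrands; for the wiggle-contaminated densities discussed in the text, the grid (or an adaptive rule) must resolve the region where the packet currently sits.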
The above remedies are just numerical devices, not a reduction of the Stieltjes or Hamburger problem to a Hausdorff one. Indeed, all the subsequent numerical examples consider and use random variables X having unbounded support R+ or R.
Likewise, the dual formulation, which evaluates (λ_1, ..., λ_{M+1}) by minimizing the potential function, avoids the computation of higher moments, as required by Newton-type methods solving (3). The drawbacks just illustrated lead us to complement the stopping criterion (7), based on entropy, with a further one based on the moments, which ensures that the relationship µ_{j, f_{M+1}} = µ*_j, j = 1, ..., M, holds true up to a prescribed accuracy. That is,

max_{1≤j≤M} |µ_{j, f_{M+1}(µ_{M+1})} − µ*_j| / µ*_j ≤ ε_1 (13)

(or the analogous criterion involving the absolute error) for a proper ε_1.
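A minimal sketch of the dual route for an assumed toy problem (M = 1 on S_X = R+, with a single prescribed moment µ*_1 = 2, so that the exact minimizer is λ_1 = 1/µ*_1 = 0.5); here the stationarity condition of the dual potential Γ(λ_1) = ln Z(λ_1) + λ_1 µ*_1 is solved by bisection:

```python
import numpy as np

def trapz(y, x):
    return float(np.sum((y[1:] + y[:-1]) * (x[1:] - x[:-1])) / 2.0)

mu1 = 2.0                            # assumed prescribed first moment (toy data)
x = np.linspace(0.0, 200.0, 200001)  # truncated grid standing in for R+

def dual_gradient(lam1):
    # d/d(lam1) [ln Z(lam1) + lam1*mu1] = mu1 - E_lam[X],
    # with Z(lam1) = int_0^inf exp(-lam1*x) dx
    w = np.exp(-lam1 * x)
    return mu1 - trapz(x * w, x) / trapz(w, x)

lo, hi = 0.1, 5.0                    # bracket: the gradient changes sign here
for _ in range(60):                  # bisection on the stationarity condition
    mid = 0.5 * (lo + hi)
    if dual_gradient(mid) < 0.0:
        lo = mid
    else:
        hi = mid
lam1 = 0.5 * (lo + hi)               # analytically 1/mu1 = 0.5
```

Only the prescribed moments enter the gradient of the potential, which is the point of the dual formulation: no higher-order moments of the current iterate are needed.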
The following question arises: is f̃_M, here identified with f_{M+1}(µ̄_{M+1}), where µ̄_{M+1} is chosen so that the stopping criteria (7) and (13) are verified, an acceptable approximation of the underlying unknown density? Although the wiggle has no physical meaning, from the approximate density one would nevertheless like to calculate accurate and interesting quantities. We will resume this issue in the final part of the paper.
For practical purposes, in both the Stieltjes and Hamburger cases f̃_M is calculated according to (9)-(11), uniquely by means of the MaxEnt machinery, following these two distinct steps: 1. First, the sequence {µ*_k}_{k=0}^{M} is prescribed and f_M does not exist; we then know that f_{M−1} exists, with entropy H_{f_{M−1}}. 2. The next step relies upon the monotonicity of H_{f_{M+1}(µ_{M+1})} in µ_{M+1}. Before illustrating some numerical examples that confirm the goodness of the proposed method, it is worth spending some words on the outlined procedure. The calculation of f̃_M is obtained through an approximate procedure and hence has a limited range of applicability. The main problem is the presence of wiggles; in the end, to contain their detrimental effect, it is necessary that the convergence of H_{f_{M+1}(µ_{M+1})} to H_{f_{M−1}} be fast.
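Operationally, the two steps reduce to scanning µ_{M+1} until the entropy gap falls below the tolerance ε. The sketch below isolates the stopping rule only: computing the true curve H_{f_{M+1}}(µ_{M+1}) requires a full MaxEnt solver, so an assumed stand-in monotone curve with a finite limit plays its role here:

```python
# H_inf stands in for H_{f_{M-1}}; H(mu) is an assumed monotone increasing
# entropy curve with limit H_inf, mimicking H_{f_{M+1}}(mu_{M+1}).
H_inf = 1.4189

def H(mu):
    return H_inf - 1.0 / mu

def smallest_admissible(H, H_inf, eps, mu_grid):
    # first mu on the grid satisfying the stopping criterion (11):
    # H_inf - H(mu) <= eps
    for mu in mu_grid:
        if H_inf - H(mu) <= eps:
            return mu
    return None

mu_bar = smallest_admissible(H, H_inf, eps=1.5e-4, mu_grid=range(1, 100001))
```

In practice each evaluation of H(µ) is expensive (one MaxEnt fit), so a coarse grid or a bracketing search over µ_{M+1} is preferable to the dense scan used in this illustration.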
Taking ε = 10^−4, Equation (11) is satisfied starting from µ̄_{M+1} = 20. Then f̃_M, which is identified with f_{M+1}(µ̄_{M+1}), jointly with f_{M−1}, is displayed in Figure 1 (top). The difference between f̃_M and f_{M−1} is insignificant, since µ*_M − µ_{M, f_{M−1}} = 0.1 was chosen to avoid the detrimental effect of the wiggle. In Figure 1 (bottom) the same f_{M+1}(µ̄_{M+1}) is reported on a logarithmic scale and on an extended x-axis scale, to evidence the presence of the small wiggle. It can be concluded that f̃_M ≡ f_{M+1}(µ̄_{M+1}) satisfies all the expected theoretical properties and can be considered the "best" substitute for the missing f_M.

Remark 1.
It is worth noting that the non-symmetric Hamburger case with M = 3 has been discussed in [13], pp. 413-415, Equation (12.32), but solely on the basis of a simple heuristic reasoning; the authors use a tricky example to observe that, even if the Lagrange multipliers cannot be chosen to satisfy the given constraints, the "maximum" entropy can be found and is equal to that of the Normal distribution matching the first two moments, concluding that in this situation the entropy may only be ε-achievable. To give a simple illustration of this (but not a formal justification), the authors consider the case in which a Normal distribution is contaminated with a small "wiggle" at a very large value of x; consequently the moments of the new distribution are almost the same as those of the uncontaminated Normal, the biggest change being in the third moment (the new distribution is no longer symmetric). However, by adding new wiggles at opportune positions to balance the changes caused by the original wiggle, we can bring the first and second moments back to their original values and also obtain any value of the third moment, without reducing the entropy significantly below that of the associated uncontaminated Normal (whence the conclusion about the ε-achievability of the entropy).
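This heuristic can be reproduced numerically. Below the wiggle is modeled, by assumption, as a narrow Gaussian component of weight p = c/a^3 centered at a large abscissa a (all constants are illustrative, not taken from [13]); the raw moments of the mixture follow in closed form from E[X] = µ, E[X^2] = µ^2 + σ^2, E[X^3] = µ^3 + 3µσ^2 for a Normal N(µ, σ^2):

```python
# Contaminate N(0,1) with a narrow Gaussian "wiggle" N(a, s^2) of weight
# p = c / a**3: the third moment picks up ~c = O(1) while the first two
# raw moments stay almost unchanged.
def mixture_moments(a, c=1.0, s=0.1):
    p = c / a**3
    m1 = p * a                                   # E[X]
    m2 = (1.0 - p) * 1.0 + p * (a**2 + s**2)     # E[X^2]
    m3 = p * (a**3 + 3.0 * a * s**2)             # E[X^3]
    return m1, m2, m3

m1, m2, m3 = mixture_moments(a=1e4)
# m3 ~ 1 (an O(1) change), while m1 and m2 - 1 both vanish as a grows
```

Pushing a further out drives m1 and m2 − 1 to zero while m3 stays pinned near c, which is exactly the mechanism behind the ε-achievability argument.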
The above heuristic procedure is displayed in Figure 2, and may be interpreted by saying that f̃_M can be identified with the Normal distribution on which some wiggles are superimposed.
This result is a particular case of the more general result covered by this paper and coincides with the above (9)-(11) when, in this case, f M−1 is the density function of a Normal distribution.
Lastly, all the above heuristics agree with the general mathematical result that two continuous density functions having the same first M + 1 moments (including µ_0 = 1) cross each other in at least M + 1 points ([14], Vol. 1, No. 140, p. 83). In our case, f_{M+1} and the Normal density plotted in Figure 2 share the first M + 1 = 3 moments, and they cross each other at three points, as inspection of the figure suggests. In each of the previous three examples we have assumed that µ*_M and µ_{M, f_{M−1}} differ by a small amount, so as to avoid the detrimental effect due to the wiggle; consequently, the difference between f̃_M and f_{M−1} becomes insignificant too. As a result: 1. the convergence of H_{f_{M+1}} to H_{f_{M−1}} is fast, which avoids the formation of small evanescent wiggles at a great distance; 2. numerical quadrature problems do not arise.
If g is a bounded function of X, then f̃_M and f_{M−1} lead to similar values, as Pinsker's inequality ([15], p. 390) and (11) yield

|E_{f̃_M}[g(X)] − E_{f_{M−1}}[g(X)]| ≤ sup_x |g(x)| ∫_{S_X} |f̃_M(x) − f_{M−1}(x)| dx ≤ sup_x |g(x)| √(2ε).

As a consequence, although we settle for a density constrained by fewer moments, which is conceptually in contrast with the MaxEnt spirit, the results remain unaltered.
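This chain of inequalities can be checked numerically on a pair of stand-in densities; below we use two unit-variance Normals with a small mean shift and the bounded function g = tanh (none of these choices comes from the paper, and KL(f‖h) here plays the role of the entropy gap ε):

```python
import numpy as np

def trapz(y, x):
    return float(np.sum((y[1:] + y[:-1]) * (x[1:] - x[:-1])) / 2.0)

xs = np.linspace(-10.0, 10.0, 200001)
f = np.exp(-xs**2 / 2.0) / np.sqrt(2.0 * np.pi)          # stand-in for f~_M
h = np.exp(-(xs - 0.3)**2 / 2.0) / np.sqrt(2.0 * np.pi)  # stand-in for f_{M-1}

kl = trapz(f * np.log(f / h), xs)     # KL(f || h); analytically 0.3**2/2 = 0.045
g = np.tanh(xs)                       # bounded function: sup|g| = 1
lhs = abs(trapz(g * (f - h), xs))     # |E_f[g(X)] - E_h[g(X)]|
rhs = float(np.sqrt(2.0 * kl))        # sup|g| * sqrt(2*KL), via Pinsker
```

The bound is not tight, but it confirms that a small entropy (Kullback-Leibler) gap forces the two expectations of any bounded g to be close.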
The matter runs similarly when quantiles have to be calculated. They may be configured as expected values of proper bounded functions: indeed, for fixed x, F(x) = E[g(T)] with g(t) = 1 if t ∈ [0, x] and g(t) = 0 if t ∈ (x, ∞). Then, if F̃_M and F_{M−1} denote the distribution functions corresponding to f̃_M and f_{M−1}, respectively, we have in the Stieltjes case (and, mutatis mutandis, in the Hamburger case)

sup_x |F̃_M(x) − F_{M−1}(x)| ≤ √(2ε).

Again, although we settle for a density constrained by fewer moments, which goes conceptually against the spirit of Jaynes, the results concerning expected values of g remain unaltered. However, if g is an arbitrary unbounded function of X, then the above sequence of inequalities does not hold, and the calculation of expected values of g could lead to different results, i.e., E_{f_{M−1}}[g(X)] ≠ E_{f̃_M}[g(X)]. In conclusion: if the maximum entropy distribution does not exist, being guided by the spirit of maximum entropy could still turn out to be the best choice.

Nonexistence of the MaxEnt solution occurs when (i) λ_M = 0 (see Equation (2)) in the Stieltjes case (for the special case M = 2, see [5], Theorem 2 or Example 2) and (ii) λ_{M−1} = λ_M = 0 (see Equation (2)) in the symmetric Hamburger case.
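The CDF version of the bound can be verified on the same kind of stand-in pair (again two shifted Normals, an illustrative assumption; the Kullback-Leibler divergence again plays the role of ε):

```python
import numpy as np

xs = np.linspace(-10.0, 10.0, 200001)
dx = xs[1] - xs[0]
f = np.exp(-xs**2 / 2.0) / np.sqrt(2.0 * np.pi)
h = np.exp(-(xs - 0.3)**2 / 2.0) / np.sqrt(2.0 * np.pi)

F = np.cumsum(f) * dx                        # CDF of f (rectangle rule)
G = np.cumsum(h) * dx                        # CDF of h
kl = float(np.sum(f * np.log(f / h)) * dx)   # KL(f || h), rectangle rule
sup_diff = float(np.max(np.abs(F - G)))      # sup_x |F(x) - G(x)|
bound = float(np.sqrt(2.0 * kl))             # sqrt(2*KL) bound, via Pinsker
```

Since the indicator function defining F is bounded by 1, the supremum distance between the two distribution functions is controlled by the same square-root-of-entropy-gap quantity as before.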