Density Reconstructions with Errors in the Data

The maximum entropy method was originally proposed as a variational technique to determine probability densities from the knowledge of a few expected values. The applications of the method beyond its original role in statistical physics are manifold. An interesting feature of the method is its potential to incorporate errors in the data. Here, we examine two possible ways of doing that. The two approaches have different intuitive interpretations, and one of them allows for error estimation. Our motivating example comes from the field of risk analysis, but the statement of the problem might as well come from any branch of applied sciences. We apply the methodology to a problem consisting of the determination of a probability density from a few values of its numerically-determined Laplace transform. This problem can be mapped onto a problem consisting of the determination of a probability density on [0, 1] from the knowledge of a few of its fractional moments up to some measurement errors stemming from insufficient data.


Introduction
An important problem in many applications of probability is the determination of the probability density of a positive random variable when the only information available consists of an observed sample. For example, the variable can be an exit time or a reaction time, accumulated losses or accumulated damage, and so on. A standard technique, related to a variety of branches of analysis, consists of the use of the Laplace transform. However, sometimes, such a technique may fail, because the transform cannot be determined in closed form, as in the case of the lognormal variable. In this regard, see the efforts in [1] to determine the Fourier-Laplace transform of the lognormal variable. One is then led to search for techniques to invert a Laplace transform from a few of its values determined numerically. That is also the reason why we chose to use a sample from the lognormal as data to test the methods that we propose.
To state our problem: we are interested in a method to obtain a probability density f_S(s) from the knowledge of the values of the Laplace transform:

μ(α_k) = E[e^{−α_k S}] = ∫_0^∞ e^{−α_k s} f_S(s) ds,  k = 1, ..., K.  (1)

To be specific, the positive random variable S may denote the severity of some kind of losses accumulated during a given time interval, and the density f_S(s) is the object that we are after. Due to the importance of this problem for the insurance industry, a large amount of effort has been devoted to finding systematic ways to compute f_S(s) from the knowledge of the ingredients of some model relating S to more basic quantities, like the frequency of losses and the individual severities. See [2], for example, for a relatively recent update on methods to deal with that problem.
We should mention at the outset that if the Laplace transform E[e^{−αS}] were known as a function on the positive real axis, a variety of methods to determine f_S exist. Among them are maximum entropy techniques, which bypass the need to extend the Laplace transform into a complex half-plane. The standard maximum entropy (SME, for short) method to solve this problem is simple to implement. See [3] for a comparative study of methods (including maximum entropy) that can be used to determine f_S when E[e^{−αS}] can be computed analytically. There, we showed that with eight fractional moments (corresponding to eight values of the Laplace transform), we could obtain quite accurate inversions. That is the reason why we consider eight moments in this paper.
However, in many cases, E[e^{−αS}] has to be estimated from observed values s_1, ..., s_N of S; that is, the only knowledge that may be available to us is the total loss in a given period. It is at this point where errors come in, because in order to determine μ(α), we have to use the random sample and average over it. If we were somehow determining μ(α) by means of some experimental procedure, then an error might come in through the measurement process.
Thus, the problem that we want to address can now be restated as: find f_S such that

∫_0^∞ e^{−α_i s} f_S(s) ds ∈ C_i,  i = 1, ..., K,  (2)

where C_i is some interval enclosing the true value of μ(α_i) for i = 1, ..., K. These intervals are related to the uncertainty (error) in the data. For us, these will be the statistical confidence intervals, but, as we are not using them for the statistical estimation of a mean, but as a measure of some experimental error, we adjust the width of the interval to our convenience.
To transform the problem into a fractional moment problem, note that, since S is positive, we may think of Y = e^{−S} as a variable in [0, 1], whose density f_Y(y) we want to infer from the knowledge of an interval in which its fractional moments fall, that is, from:

∫_0^1 y^{α_i} f_Y(y) dy ∈ C_i,  i = 1, ..., K,  (3)

where C_i now denotes an interval around the true, but unknown, moments μ(α_i) of f_Y. Note that y^{α_i} = e^{−α_i s} when y = e^{−s}, so the two formulations carry the same information.
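To make the change of variables concrete, here is a minimal sketch (in Python) of how the fractional moments of Y = e^{−S} are estimated from a sample of S. The exponents are illustrative placeholders of our own choosing; the exponential sample is used only because its transform, E[e^{−αS}] = 1/(1 + α), is known in closed form and provides a check:

```python
import numpy as np

def fractional_moments(sample, alphas):
    """Estimate mu(alpha) = E[Y^alpha] = E[exp(-alpha*S)], where Y = exp(-S)
    maps the positive variable S into [0, 1]."""
    y = np.exp(-np.asarray(sample, dtype=float))   # change of variables Y = e^{-S}
    return np.array([np.mean(y ** a) for a in alphas])

# Sanity check on a case with known answer: for S ~ Exp(1),
# E[exp(-alpha*S)] = 1/(1 + alpha).
rng = np.random.default_rng(0)
s = rng.exponential(1.0, size=200_000)
mu = fractional_moments(s, [0.5, 1.0, 2.0])        # ~ [2/3, 1/2, 1/3]
```

The same routine, applied to a lognormal sample with the eight exponents used later in the paper, produces the inputs for all of the maxentropic procedures.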
The SME method has been used for a long time to deal with problems like Equation (2). See [4] for a rigorous proof of the basic existence and representation results and for applications in statistical mechanics. See, also, [5] and [6] for different rigorous proofs of these results. See, also, [7] for an interesting collection of applications in a large variety of fields.
However, possible extensions of the maximum entropy method to handle errors in the data do not seem to have received much attention, despite their potential applicability. Two such possible extensions, set in the framework of the method of maximum entropy in the mean as applied to linear inverse problems with convex constraints, were explored in [8]. Here, we want to provide alternative ways to incorporate errors in the data and solve Equation (3) within the framework of the SME method, without bringing in the method of maximum entropy in the mean as in [8]. The difference between the two methods that we analyze lies in the fact that one of them provides us with an estimator of the additive error.
The remainder of the paper is organized as follows. In the next section, we present the two extensions of the SME, and in the third section, we apply them to obtain the probability density f_S(s) from the knowledge of the interval in which the fractional moments of Y = e^{−S} fall. We point out two features of our simulations at this stage: we consider one with a relatively small sample and one with a larger sample. The exact probability density from which the data is sampled is used as a benchmark against which the output of the procedures is compared. We should mention that the methods we present here are a direct alternative to those based on the method of maximum entropy in the mean. For an application of that technique to the determination of a risk measure from the knowledge of mispriced risks, see [9].

The Maxentropic Approaches
As mentioned in the Introduction, in each subsection below, we consider a different way of extending the SME. In the first, we present an extension of the method of maximum entropy that includes errors in the data, while in the second, we present a version that also allows for the estimation of the additive error in the data.

Extension of the Standard Maxent Approach without Error Estimation
Here, we present an extension of the variational method originally proposed by Jaynes in [10], based on an idea proposed in [11], to solve the (inverse) problem consisting of finding a probability density f_Y(y) (on [0, 1] in this case) satisfying the following integral constraints:

∫_0^1 y^{α_k} g(y) dy ∈ C_k,  k = 0, 1, ..., M,  (4)

where the interval around the true, but unknown, μ_Y(α_k) is determined from the statistical analysis of the data for each of the moments. For k = 0 only, we set C_0 = {1}, since for α_0 = 0, we have μ_0 = 1; this takes care of the natural normalization requirement on f_Y(y). To state the extension, denote by D the class of probability densities g(y) satisfying Equation (4). This class is convex. On this class, define the entropy by S(g) = −∫_0^1 g(y) ln(g(y)) dy, whenever the integral is finite (or −∞, if not). Now, to solve Equation (4), we extend Jaynes' method ([10]) as follows. The problem now is:

Find g* = argmax{S(g) : g ∈ D}.  (5)

To dig further into this problem, let us introduce some notation. Let g denote any density, and denote by μ_g(α) the vector of α-moments of g. Set C = C_0 × ... × C_M. Additionally, for c ∈ C, let us denote by D(c) the collection of densities having μ_g(α) = c. With these notations, and following the proposal in [10], we carry out the maximization process sequentially and restate the previous problem as:

sup{S(g) : g ∈ D} = sup_{c ∈ C} sup{S(g) : g ∈ D(c)}.  (6)

The idea behind the proposal is clear: first, solve a maximum entropy problem for each c ∈ C to determine a g*_c, and then, maximize over c ∈ C to determine the c* such that g*_{c*} yields the maximum entropy S(g*_{c*}) over all possible moments in the confidence set.
Invoking the standard argument, we know that when the inner problem has a solution, it is of the type:

g*_c(y) = exp(−∑_{k=1}^M λ_k y^{α_k}) / Z(λ),  (7)

in which the number of moments M appears explicitly. It is customary to write e^{−λ_0} = 1/Z(λ). Recall, as well, that the normalization factor is given by:

Z(λ) = ∫_0^1 exp(−∑_{k=1}^M λ_k y^{α_k}) dy.  (8)

With this notation, the generic form of the solution looks like Equation (7). To complete the description, it remains to specify how the vector λ* can be found. For that, one has to minimize the dual entropy:

Σ(λ, c) = ln Z(λ) + <λ, c>,  (9)

where <a, b> denotes the standard Euclidean scalar product. It is also a standard result that S(g*_c) = inf{Σ(λ, c) : λ ∈ R^M}. With this, the double optimization process can be restated as:

sup_{c ∈ C} sup{S(g) : g ∈ D(c)} = sup_{c ∈ C} inf_λ Σ(λ, c),

and, invoking the standard minimax argument, restated as:

inf_λ sup_{c ∈ C} Σ(λ, c) = inf_λ {ln Z(λ) + sup{<λ, c> | c ∈ C}}.

Now, due to the special form of Σ(λ, c), it suffices to compute sup{<λ, c> | c ∈ C}. For that, we make use of the simple-to-verify fact that sup{λ_i c_i | c_i ∈ [a_i, b_i]} = max{λ_i a_i, λ_i b_i}. With this, it is easy to see that:

δ*_C(λ) := sup{<λ, c> | c ∈ C} = ∑_{i=1}^M max{λ_i a_i, λ_i b_i}.  (10)

As a first step towards a solution, we have the following simple result. With the notations introduced above, set:

Σ(λ) = ln Z(λ) + δ*_C(λ).

Then, Σ(λ) is strictly convex in λ.
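For concreteness, here is a minimal numerical sketch of the SME step itself: solve for λ by minimizing the dual entropy ln Z(λ) + <λ, c>. The exponents and moment values below are illustrative placeholders; the "data" c are taken as the exact moments of the uniform density on [0, 1], so the known answer is λ* = 0 and g*(y) = 1:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

# Illustrative placeholders: four fractional exponents, and as "data" the
# exact moments of the uniform density on [0, 1].
alphas = np.array([0.5, 1.0, 1.5, 2.0])
c = 1.0 / (1.0 + alphas)                 # c_k = int_0^1 y^{alpha_k} dy

def Z(lam):
    # normalization factor Z(lambda) = int_0^1 exp(-sum_k lam_k y^{alpha_k}) dy
    return quad(lambda y: np.exp(-np.dot(lam, y ** alphas)), 0.0, 1.0)[0]

def Sigma(lam):
    # dual entropy: ln Z(lambda) + <lambda, c>
    return np.log(Z(lam)) + np.dot(lam, c)

lam = minimize(Sigma, np.zeros(alphas.size), method="BFGS").x
g_star = lambda y: np.exp(-np.dot(lam, y ** alphas)) / Z(lam)
moments = np.array([quad(lambda y: y ** a * g_star(y), 0.0, 1.0)[0] for a in alphas])
```

With genuine data, c is replaced by the estimated fractional moments, and the reconstructed moments of g* then match c at the minimizer.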
Observe that ∂δ*_C(λ)/∂λ_i is defined except at λ_i = 0, where δ*_C is sub-differentiable (see [12]). Actually:

∂δ*_C(λ)/∂λ_i = b_i if λ_i > 0,  ∂δ*_C(λ)/∂λ_i = a_i if λ_i < 0.  (11)

Additionally, to close up, we have:

Theorem 1. Suppose that the infimum λ* of Σ(λ) is reached in the interior of the set {λ ∈ R^M | Z(λ) < ∞}. Then, the solution to the maximum entropy problem (6) is:

g*(y) = exp(−∑_{k=1}^M λ*_k y^{α_k}) / Z(λ*).  (12)

Due to the computation above, it is clear that, at the minimum, μ_{g*}(α_i) = ∂δ*_C(λ*)/∂λ_i whenever λ*_i ≠ 0.

Comment: This is a rather curious result. Intuitively, at a minimum, we have ∇_λ Σ(λ) = 0; then, according to Equation (11), if all λ_i ≠ 0, the maxentropic density g*(y) has moments equal to one of the end points of the confidence intervals. If all λ_i = 0, the density g*(y) is uniform, and the reconstructed moments are those of the uniform density, namely 1/(1 + α_i).
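A numerical sketch of this first extension: minimize Σ(λ) = ln Z(λ) + δ*_C(λ) directly. The intervals below are illustrative placeholders, chosen to contain the moments of the uniform density so that the known solution is λ* = 0 and g* ≡ 1; a derivative-free method is used because δ*_C is only sub-differentiable at λ_i = 0:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

# Illustrative intervals [a_k, b_k], chosen here to contain the moments
# 1/(1 + alpha_k) of the uniform density.
alphas = np.array([0.5, 1.0, 1.5, 2.0])
a = np.array([0.63, 0.48, 0.38, 0.30])
b = np.array([0.67, 0.52, 0.42, 0.34])

def Z(lam):
    return quad(lambda y: np.exp(-np.dot(lam, y ** alphas)), 0.0, 1.0)[0]

def Sigma(lam):
    # Sigma(lambda) = ln Z(lambda) + delta*_C(lambda), with the support function
    # delta*_C(lambda) = sum_k max(lam_k a_k, lam_k b_k)
    return np.log(Z(lam)) + np.sum(np.maximum(lam * a, lam * b))

# delta*_C is non-smooth at lam_k = 0, so use a derivative-free method.
res = minimize(Sigma, np.zeros(alphas.size), method="Nelder-Mead",
               options={"xatol": 1e-9, "fatol": 1e-12, "maxiter": 10_000})
g_star = lambda y: np.exp(-np.dot(res.x, y ** alphas)) / Z(res.x)
```

With intervals coming from real data, the minimizer generally has λ*_i ≠ 0, and the reconstructed moments sit at the interval end points, as the comment above explains.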

Extension of the Standard Maxent Approach with Error Estimation
In this section, we present an extension of the method of maximum entropy that allows us to estimate the errors in the measurement and to circumvent the concluding comments of the previous section, that is, to obtain estimated moments different from the end points of the confidence intervals. Instead of supposing that f_Y(y) satisfies Equation (4), we present a totally different approach. To restate the problem, consider the following argument. If we suppose that the measurement error in the determination of the k-th moment lies within the confidence interval C_k = [a_k, b_k], centered at the measured moment μ_k, then the unknown measurement error can be written as a convex combination of the end points of the error interval:

ε_k = p_k (a_k − μ_k) + (1 − p_k)(b_k − μ_k),  0 < p_k < 1.

We propose to extend the original problem to (if the extension is not clear, see the Appendix at the end of this section):

∫_0^1 y^{α_k} f_Y(y) dy + ε_k = μ_k,  k = 1, ..., M.  (13)

The idea behind the proposal is clear. In order to obtain the μ_k's, either experimentally or numerically, we average over a collection of observations (simulations), and the errors enter each average additively. Thus, the observed value of μ_k consists of the true (unknown) moment, which is to be determined, plus an error term that has to be estimated as well. This time, we search for a density f_Y(y), with y ∈ [0, 1], and numbers 0 < p_k < 1 (k = 1, ..., M), such that Equation (13) holds. This computational simplification is possible when the support of the distribution of errors is bounded. To compress the notation, we think of the error in the k-th moment as a probability distribution concentrated on the two points {a_k − μ_k, b_k − μ_k}, with weights p_k and 1 − p_k; the probability that we are after is then a mixture of a continuous and a discrete distribution. To determine it, we define, on the appropriate space of product probabilities (see the Appendix to this section below), the entropy:

S(g, p) = −∫_0^1 g(y) ln(g(y)) dy − ∑_{k=1}^M [p_k ln p_k + (1 − p_k) ln(1 − p_k)].  (14)

Needless to say, 0 < p_k < 1, for k = 1, ..., M.
With all of these notations, our problem becomes: find a probability density g*(y) and numbers 0 < p_k < 1 maximizing the entropy (14) subject to the constraints (13) and the normalization constraint ∫_0^1 g(y) dy = 1. The usual variational argument of [9], or the more rigorous proofs in [4–6], yields:

g*(y) = exp(−∑_{k=1}^M λ*_k y^{α_k}) / Z(λ*),  p*_k = e^{−λ*_k (a_k − μ_k)} / (e^{−λ*_k (a_k − μ_k)} + e^{−λ*_k (b_k − μ_k)}).  (16)

Here, the normalization factor Z(λ) is as above. This time, the vector λ* of Lagrange multipliers is to be found by minimizing the dual entropy:

Σ(λ) = ln Z(λ) + ∑_{k=1}^M ln(e^{−λ_k (a_k − μ_k)} + e^{−λ_k (b_k − μ_k)}) + <λ, μ>.  (17)

Once λ* is found, the estimator of the measurement error is, as implicit in Equation (13), given by:

ε̂_k = p*_k (a_k − μ_k) + (1 − p*_k)(b_k − μ_k).  (18)

Notice that, although the formal expression for g*(y) is the same as that of the first method, the result is different, because λ* is found by minimizing a different functional. So as not to interrupt the flow of ideas, we collect the basic model behind the results just presented, as well as some simple but necessary computations, in the Appendix.
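A numerical sketch of the second method, under the two-point representation of the error described above. The measured moments and interval half-widths below are illustrative placeholders (chosen near the moments of the uniform density so the optimization is well behaved); the dual being minimized is the form of Equation (17):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

# Illustrative placeholders: measured moments mu_k and error intervals of
# half-width 0.02 centered at mu_k, so eps_k lies in [lo_k, hi_k] = [-0.02, 0.02].
alphas = np.array([0.5, 1.0, 1.5, 2.0])
mu = np.array([0.668, 0.501, 0.401, 0.335])
lo, hi = -0.02 * np.ones(4), 0.02 * np.ones(4)

def Z(lam):
    return quad(lambda y: np.exp(-np.dot(lam, y ** alphas)), 0.0, 1.0)[0]

def Sigma(lam):
    # dual entropy of Equation (17): ln Z + sum_k ln zeta_k + <lam, mu>,
    # with zeta_k = exp(-lam_k * lo_k) + exp(-lam_k * hi_k)
    zeta = np.exp(-lam * lo) + np.exp(-lam * hi)
    return np.log(Z(lam)) + np.sum(np.log(zeta)) + np.dot(lam, mu)

lam = minimize(Sigma, np.zeros(4), method="BFGS").x
p = np.exp(-lam * lo) / (np.exp(-lam * lo) + np.exp(-lam * hi))
eps_hat = p * lo + (1 - p) * hi                      # estimated additive errors
g_star = lambda y: np.exp(-np.dot(lam, y ** alphas)) / Z(lam)
m_rec = np.array([quad(lambda y: y ** a * g_star(y), 0.0, 1.0)[0] for a in alphas])
# at the optimum, reconstructed moment + estimated error = measured moment
```

The design point to notice is that the reconstructed moments m_rec now fall strictly inside the error intervals (rather than at their end points), with the residual absorbed by the estimated errors ε̂_k.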

Numerical Implementations
Here, we suppose that the random variable of interest follows a lognormal distribution with known parameters μ = 1 and σ = 0.1. Even though this is a simple example, it is rather representative of the type of distributions appearing in applications. As said, our data consist of simulated data, and we shall consider two data sets of different sizes.
To produce the two examples, we do the following: (1) Simulate two samples {s_1, ..., s_N} of sizes N = 200 and N = 1000 from the lognormal distribution.
(2) For each sample, compute its α-moments and their confidence intervals using the standard statistical definitions. That is, we compute

μ̂(α_i) = (1/N) ∑_{j=1}^N e^{−α_i s_j},  i = 1, ..., 8,

and the confidence interval as specified below.
(3) After obtaining each maxentropic density, we use standard statistical tests to measure the quality of the density reconstruction procedure.
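Steps (1) and (2) can be sketched as follows. The eight exponents and the value z = 0.126 (approximately the two-sided 10% normal quantile) are illustrative choices of our own; as noted earlier, the paper adjusts the interval width to convenience:

```python
import numpy as np

rng = np.random.default_rng(42)
alphas = np.arange(1, 9) / 8.0        # a hypothetical choice of 8 fractional exponents

def moments_and_intervals(N, z=0.126):
    """Sample fractional moments of Y = exp(-S) and intervals mu_hat +/- z*sd/sqrt(N),
    for S lognormal with mu = 1, sigma = 0.1; z tunes the interval width."""
    s = rng.lognormal(mean=1.0, sigma=0.1, size=N)
    y = np.exp(-s)
    powers = y[:, None] ** alphas[None, :]            # N x 8 matrix of y^alpha_i
    mu_hat = powers.mean(axis=0)
    half = z * powers.std(axis=0, ddof=1) / np.sqrt(N)
    return mu_hat, np.column_stack([mu_hat - half, mu_hat + half])

mu200, ci200 = moments_and_intervals(200)
mu1000, ci1000 = moments_and_intervals(1000)
```

The intervals shrink like 1/√N, which is why the larger sample yields tighter constraints and, eventually, better reconstructions.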
Table 1 shows the error intervals, which we take to be the 10% confidence interval for the mean obtained using the standard definition, that is, as

[μ̂_i − z sd_i/√N, μ̂_i + z sd_i/√N],

where z is the normal quantile corresponding to the chosen confidence level, sd_i is the sample standard deviation and μ̂_i is the sample mean of Y^{α_i} over the simulated samples of sizes 200 and 1000. In Table 2, we list the moments of S for the two sample sizes. As mentioned before, the error intervals are the only inputs for the maxentropic method without error estimation, whereas the moments are needed both for the SME and for the maxentropic method with error estimation. Recall, as well, that the second method forces the estimation error in the i-th moment to lie in the corresponding error interval, which is centered at the observed sample moment.

Reconstruction without Error Estimation
We now present the results of implementing the first approach, namely the density reconstruction without estimating the additive noise.
The two panels of Figure 1 display the real (true) density from which the data was sampled, the histogram that was obtained, the density reconstructed according to the SME method, as well as the density obtained according to the first extension of the SME (labeled SMEE). The left panel contains the results obtained for a sample of size 200, whereas the right panel contains the reconstruction obtained for a sample of size 1000.
Even though the moments of the density obtained with the SME method coincide with the empirical moments, the moments of the density obtained by the first reconstruction method need not coincide with the sample moments. They only need to fall within (or on the boundary of) the error interval, which is centered at the sample moments. In Table 3, we list the moments that those densities determine for each sample size.

Table 3. Moments of the maxentropic densities reconstructed according to the first procedure.

Let us now run some simple quality-of-reconstruction tests. An experimentalist would be most interested in the comparison of the histogram (real data) to the reconstructed density. However, as we have the real density from which the data was sampled, we can perform three comparisons. We compute the actual L_1 and L_2 distances between the densities, and those between the densities and the histograms. Admittedly, the latter is a bin-dependent computation and not a truly good measure of the quality of reconstruction, but we carry it out as a consistency test. In Table 4, we display the results of the computations. The distances between the continuous densities are computed using standard numerical integrators, and the distances between the empirical densities and the continuous densities according to a bin-based sum

where the b_k are the positions of the bins, which enter as limits of integration. Note that the distances between the true density and the maxentropic reconstructions are much smaller than those between the maxentropic (or the true) density and the histogram, and that the distances from the true density and from the maxentropic densities to the histogram are similar. Thus, the reconstruction methods are performing well. Another measure of the quality of reconstruction is given by the L_1 and L_2 distances between cumulative distribution functions. A simple way to compute them, which accommodates the histogram as well, is given by:

MAE = (1/N) ∑_{j=1}^N |F_1(s_j) − F_2(s_j)|,  RMSE = ((1/N) ∑_{j=1}^N (F_1(s_j) − F_2(s_j))^2)^{1/2},

where we may take the s_j to be the ordered values of the sample, without loss of generality, and F_1, F_2 are the two distribution functions being compared. To distinguish them from the standard distances, MAE stands for "mean average error" and RMSE stands for "root mean square error". The results are displayed in Table 5. It is intuitively clear, and confirmed by the results displayed in the table, that the larger the size of the sample, the better the estimation results for both methods, that is, the SME and the SMEE.
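The CDF-based distances can be sketched as follows; the comparison of a lognormal sample against its exact CDF is an illustrative check of our own (the names and parameters are assumptions, not the paper's code):

```python
import numpy as np
from scipy.stats import lognorm

def cdf_distances(sample, cdf):
    """MAE and RMSE between a model CDF and the empirical CDF, both evaluated
    at the ordered sample values s_(1) <= ... <= s_(N)."""
    s = np.sort(np.asarray(sample, dtype=float))
    F_emp = np.arange(1, s.size + 1) / s.size     # empirical CDF at order statistics
    d = cdf(s) - F_emp
    return np.mean(np.abs(d)), np.sqrt(np.mean(d ** 2))

# Compare a lognormal(mu=1, sigma=0.1) sample with its exact CDF.
rng = np.random.default_rng(1)
s = rng.lognormal(mean=1.0, sigma=0.1, size=1000)
mae, rmse = cdf_distances(s, lambda x: lognorm.cdf(x, s=0.1, scale=np.exp(1.0)))
```

Unlike the bin-based density distances, these quantities do not depend on any histogram binning, which is why they are the more reliable figures of merit here.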
In Table 6, we can see the details of the convergence of the methods according to the sample size. Clearly, the SMEE method requires fewer iterations and less machine time, while also reaching a lower value of the gradient norm.
To close, let us consider Table 3 once more. We commented after Theorem 1 that when the multipliers are non-zero, the reconstructed moments are the end points of the confidence intervals. This is not exactly borne out by the results in that table because, at the numerical minimum, the norm of the gradient is ∼10^{−4} and not exactly zero, and this tiny error explains the slight differences. Moreover, since the first method yields the boundary points of the confidence intervals as reconstructed moments, the corresponding maxentropic density is expected to differ more from the true one than a density whose moments sit at the center of the confidence interval.

Density Reconstruction with Error Estimation
Recall that, this time, we are provided with moments measured (estimated) with error and that we have prior knowledge about the range of the error around each moment. We are now interested in determining a density, the moments that it determines and an estimate of the additive measurement error.
In each panel of Figure 2, we again display four plots: along with the true (real) density and the histogram generated by sampling from it, we show the maxentropic density (labeled SME) determined by the original sample moments and the maxentropic densities (labeled SMEE) obtained by the second procedure, for each error interval and each sample size. Visually, the SMEE densities are closer to the SME density this time. The optimal Lagrange multipliers λ*_i, i = 1, ..., 8, determine both the maximum entropy density and the weights of the end points of the confidence intervals used for the estimation of the noise. With the multipliers, one obtains the maximum entropy density, from which the reconstructed moments can be computed as μ̂_k = ∫_0^1 y^{α_k} f*_Y(y) dy. The values obtained for each type of confidence interval and for each sample size are presented in Table 7. To measure the quality of the reconstructions, we again compute the L_1 and L_2 distances between densities, as well as the distances between distribution functions. These are displayed in Tables 9 and 10 below. Again, the distances between densities and histograms depend on the bin sizes. For a sample size of 1000, the results of the SME and SMEE methods nearly coincide and are closer to the lognormal curve than the histogram (simulated data). To finish, consider Table 11, which shows the details of the convergence of the SMEE versus the SME in the second case. The two leftmost columns compare the SMEE versus the SME for a sample of size 200, whereas the two rightmost columns compare the performance for a sample of size 1000. All things considered, it seems that the second method, that is, the simultaneous determination of the density and the measurement errors, has a better performance.
max{S(g, p) : Equation (13) holds and ∫_0^1 g(y) dy = 1}.  (19)

Now, the procedure is standard, and the result is stated in the following easy-to-prove, but important:

Theorem 2. Suppose that the infimum λ* of Σ(λ) given by Equation (17) is reached in the interior of the set {λ ∈ R^M | Z(λ) < ∞}. Then, the solution to the maximum entropy problem in Equation (19) is given by Equation (16).
Proof. All it takes is to differentiate Equation (17) with respect to each λ_i, equate the result to zero, and read off the desired conclusion.

Comment: This is a rather diluted version of the general result presented in [4] or [10], but it is enough to keep us going.
In the next two subsections, we add the explicit computation of the gradient of Σ(λ) for those who use gradient-based methods for its minimization.

B. Derivative of Σ(λ) when Reconstructing with Data in a Confidence Interval
Having determined the confidence interval [a_i, b_i] for each i = 1, ..., 8, the next step is to minimize Σ(λ). For the first method, in case of need, here is the derivative; it invokes Equation (11):

∂Σ(λ)/∂λ_i = −(1/Z(λ)) ∫_0^1 y^{α_i} e^{−∑_{k=1}^8 λ_k y^{α_k}} dy + k_i,

where k_i = b_i if λ_i > 0 and k_i = a_i if λ_i < 0; in the rare case that λ_i = 0, choose k_i ∼ U(a_i, b_i). Additionally, as above, Z(λ) = ∫_0^1 e^{−∑_{i=1}^8 λ_i y^{α_i}} dy. Once the minimizer λ* has been found, the maxentropic density in the original variable is:

f*(t) = e^{−t} g*(e^{−t}),

via the change of variables Y = e^{−S}.
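The change of variables back to the original variable can be sketched as follows; the uniform placeholder for g* is an assumption made only so the snippet is self-contained:

```python
import numpy as np
from scipy.integrate import quad

# g_star stands for a maxentropic density on [0, 1]; the uniform density is
# used here as a placeholder so that the snippet is self-contained.
g_star = lambda y: np.ones_like(np.asarray(y, dtype=float))

def f_star(t):
    """Density of S recovered from the density of Y = exp(-S) via
    f*(t) = exp(-t) * g*(exp(-t)), t > 0."""
    return np.exp(-t) * g_star(np.exp(-t))

total = quad(f_star, 0.0, np.inf)[0]   # a density: should integrate to 1
```

The Jacobian factor e^{−t} guarantees that f* integrates to one whenever g* does, so no renormalization is needed after the change of variables.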

C. Derivative of Σ(λ) when Reconstructing with Error Estimation
This time, the derivatives of Σ(λ) are a bit different. From Equation (17):

∂Σ(λ)/∂λ_k = −(1/Z(λ)) ∫_0^1 y^{α_k} e^{−∑_j λ_j y^{α_j}} dy − [p_k (a_k − μ_k) + (1 − p_k)(b_k − μ_k)] + μ_k,

with p_k as in Equation (16). Once the minimizing λ* has been found, the routine is the same as above; that is, use Equation (16) to obtain the density and plot it along with the result obtained in the previous section.

Figure 1. Histograms, true density and maxentropic densities for different sample sizes. (a) Results with a sample of size 200; (b) results with a sample of size 1000.

Figure 2. Density of the individual losses obtained by the SME and the SME with errors (SMEE) for different sample sizes. (a) Results for a sample of size 200; (b) results for a sample of size 1000.

Table 1. Error intervals for S for sample sizes of 200 and 1000.

Table 2. Moments of S for different sample sizes.

Table 4. L_1 and L_2 distances between densities and between histograms and densities. SMEE, extended standard maximum entropy.

Table 5. The MAE and RMSE values between the reconstructed densities, the original histogram and the densities.

Table 6. Convergence of the first method and of the SME for different sample sizes.

Table 7. Moments determined by the second procedure for samples of 200 and 1000.

Table 8 displays the estimated errors, as well as the corresponding weights of the end points of the error intervals. Keep in mind that the estimated error in the determination of each moment and the estimated moment add up to the measured moment.

Table 8. Weights and estimated errors.

Table 9. L_1 and L_2 distances between the reconstructed densities, the original histogram and the densities.

Table 10. MAE and RMSE distances between the reconstructed densities, the original histogram and the densities.