Probability Distributions Approximation via Fractional Moments and Maximum Entropy: Theoretical and Computational Aspects
Abstract
1. Introduction
- The solution to the problem is unique.
- The entire moment curve from which to pick up a finite number of arbitrary fractional moments is known.
2. The Role of Fractional Moments in MaxEnt Setup
3. A Reminder about T-Systems
- .
- The starting point is to consider that the set of continuous linearly independent real-valued functions , defined on the interval , constitutes a T-system of order n if any polynomial
- (a)
- for each
- (b)
- for each
- (c)
- if the set is a T-system, then is a T-system too.
- The space of moments, given by the convex hull generated by the points , has a nonempty interior. This set is convex but has a complex geometry. A good deal of the geometry of the classical moment spaces induced by the special T-system can be generalized to the case of the investigated T-system . If the sequence of prescribed moments is an inner point of , then there are uncountably many probability measures having such prescribed moments, one of them being . Otherwise, if the sequence of prescribed moments belongs to , the boundary of , a unique measure supported on a finite set of points exists (the so-called lower principal representation) and the determinant of the Gram matrix defined below becomes zero. For an arbitrary n, let . For notational convenience we set
- Let us now consider the probability measure . Then , where, as usual
- Thus the matrix is the positive definite Gram matrix.
- The following Markov–Krein theorem ([14] Thm 5.1, p. 157; [15] Thm 1.1, p. 177) is fundamental for proving the convergence in entropy of the MaxEnt distribution; here we adapt it to fractional moments.
- Theorem 3 (Markov–Krein theorem). Given values of the first fractional moments such that the Gram matrix is positive definite, the integral over all the distributions having the assigned moments , has a minimum value where
- 2.
- .
- In this case the procedure proposed for runs more or less similarly since the involved functions , are T-systems too and an analogous Markov-Krein theorem ([14] Thm 1.1, p. 80; [15] Thm. 1.1, p. 109) is available. It should only be recalled that, in analogy with Theorem 3, given , the moment admits minimum and maximum value , , respectively, where
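The role of the Gram matrix in this section can be illustrated numerically. The sketch below is only an illustration under stated assumptions: since the defining formula is not reproduced here, it assumes the Gram matrix has entries G[i, j] = E[X^(α_i + α_j)] for a set of distinct exponents α_i, and it uses a Uniform(0, 1) variable and a point mass at 0.5 as hypothetical test cases. The names `gram_matrix`, `uniform_moment` and `alphas` are introduced for this example only.

```python
import numpy as np

def gram_matrix(moment, alphas):
    """Assumed form: G[i, j] = E[X**(alphas[i] + alphas[j])]."""
    a = np.asarray(alphas, dtype=float)
    return moment(a[:, None] + a[None, :])

def uniform_moment(a):
    # E[X**a] for X ~ Uniform(0, 1), used as a moment vector
    # generated by a measure with a density (interior case)
    return 1.0 / (a + 1.0)

alphas = [0.0, 0.3, 0.7, 1.2]          # hypothetical exponents, alpha_0 = 0

# Interior case: all eigenvalues of the Gram matrix are strictly positive
G = gram_matrix(uniform_moment, alphas)
eigvals = np.linalg.eigvalsh(G)
is_pd = bool(eigvals.min() > 0)

# Boundary case: a measure concentrated on a single point gives a
# rank-one (hence singular) Gram matrix
G_dirac = gram_matrix(lambda a: 0.5 ** a, alphas)
rank_dirac = int(np.linalg.matrix_rank(G_dirac))

print("positive definite:", is_pd, "| rank for point mass:", rank_dirac)
```

The second case mirrors the statement that the determinant of the Gram matrix vanishes when the prescribed moments lie on the boundary of the moment space, where only a finitely supported measure is available.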
4. MaxEnt Solution of the Fractional Moment Problem
4.1. Existence of MaxEnt Distribution
- Suppose that X has unbounded support, , and the first moments have been assigned; has to be to guarantee integrability of . In analogy with the case of integer moments, since both integer and fractional powers form T-systems, the above nonnegativity condition on is crucial and renders the moment problem solvable only under certain restrictive assumptions on the prescribed moment vector . Consider (11) with n replaced by , the first moments held constant, whilst varies continuously, so that the Lagrange multipliers , depend on . Differentiating both sides with respect to , one has
- From now on, for the sake of brevity, in the arguments of and we will mention only those that take continuously varying values.
- (i)
- Assume exists. Once are assigned and varies continuously, combine the following facts: is a monotonically decreasing function, and take into account (9) and (15). One concludes that, if exists, the necessary and sufficient condition for the existence of is , in analogy with the previously investigated case of integer moments ([16], Appendix A).
- (ii)
- Assume does not exist. In such a case . Indeed, if it were , then we would have both and then , contradicting the fact that does not exist. Consequently, exists for every set . For practical purposes, if does not exist, both and exist. We can state that the problem of the non-existence of the MaxEnt density can be easily bypassed.
- Collecting together the items (i) and (ii) we conclude that the existence of is iteratively and numerically determined, starting from which exists.
- In proving the conditions of existence of the MaxEnt distribution, we noted the close analogy between the cases of fractional moments and integer moments. It is reasonable to expect similar analogies to arise when an entropy value is to be attributed to the density in the case in which it does not exist, so that the sequence of entropies is defined for every n. The issue was addressed in ([16], Thm. 1) and, given the length of the proof, we limit ourselves to illustrating the tools involved and the results obtained.
- Some relevant facts need to be collected together. Since the MaxEnt density does not exist, both and exist with entropies and , respectively. Introduce now the following class of densities all having the same first moments
- In particular, we direct our attention to the density , which thanks to Theorem 4 exists for any value . As in the integer moments case, may not exist, so that is meaningless. In ([16], Thm. 1) the relationship was proved, from which , although the usual MaxEnt procedure fails (here the last recalled is the analog of (16) with replaced by ). Since the entropy is non-increasing as n increases, the latter equality enables us to set , filling the gap left by the nonexistence of the density . We reformulate such a result in terms of fractional moments as , from which . This leads us to conclude that, whenever does not exist, the missing entropy is replaced with , so that the sequence of entropies is defined for every n.
- We can thus reformulate the conditions of existence, in analogy with integer moments, by means of the following theorem.
- Theorem 4. Once the moment set is prescribed, suppose exists with its n-th moment .
- (i)
- If , then exists; conversely if does not exist. Thus the existence of is iteratively (and numerically only) determined from starting from which exists.
- (ii)
- If does not exist, both and exist for every and , respectively. In addition, can be set.
- 2.
- Suppose now that : the procedure employed in the unbounded support case 1. runs similarly since the involved functions , form a T-system too and an analogous Markov–Krein theorem ([15], Thm. 1.1, p. 109) is available. It should only be recalled that, in analogy with the above Theorem 4, given , the moment admits minimum and maximum values and , respectively. The corresponding measures and , in the form of weighted sums of Dirac delta functions, are uniquely determined and they are the so-called lower and upper principal representations, respectively, and the points . Thanks to the MaxEnt formalism, Equation (13) continues to hold. Once the first moments have been assigned, the bounded support does not imply any restriction on the Lagrange multipliers; in particular, can take on any real value. From (13), as varies within the bounded range of its admissible values , is bounded. As , coincides with the measures and . As a consequence , from which and then follows.
- Analogous conclusions hold for the remaining Lagrange multipliers, pre- and post-multiplying in , with , the matrix at the numerator by a suitable permutation matrix.
- In conclusion, given and assuming exists, exists if . Equivalently, the existence of is iteratively determined, starting from (the uniform distribution), which exists. On the other hand, thanks again to the MaxEnt formalism, the previous proof of existence continues to hold. The restriction that the problem be solvable only under certain assumptions on the prescribed moment vector no longer applies, and consequently the following theorem holds:
- Theorem 5. If , a necessary and sufficient condition for the existence of the MaxEnt distribution is that the vector of moments is internal to the space of moments, that is .
4.2. Entropy Convergence of MaxEnt Distribution
- 1.
- Suppose .
- As varies, both (equivalently , ) and then hold.
- Considering and collecting together (12) and the first equation of (13), we have
- Enter Markov–Krein's Theorem. From Theorem 4 and its consequences, as , can be assimilated to a set of Dirac deltas, equivalently to a discrete distribution, the so-called lower principal representation.
- We recall that, for consistency between the differential entropy of a continuous random variable and the entropy of its discretization, the differential entropy of any discrete measure (being compared to the Dirac delta function) is assumed to be ([18], pp. 247–249). As a consequence, can be set. On the other hand, as takes its own prescribed value, holds. Then, with being a continuous function, there exists a value, say , such that . Summarizing, we have seen that:
- (i)
- If are assigned and is the corresponding MaxEnt density with entropy , the sequence is monotonically decreasing and then convergent, with ;
- (ii)
- for each n, is a concave function in ; as , ;
- (iii)
- there exists such that .
- Enter Lin's Theorem. Consider Theorem 1 and, without loss of generality, assume that the sequence is asymptotically monotonically increasing. From Theorem 1, the sequence is convergent and, under the above assumption, is asymptotically monotonically increasing. As , from both relationships and , it follows
- (iv)
- both and .
Combining items (i)–(iv) above, drawn from Theorem 1 and Theorem 4, respectively, it follows
The methodology employed in the proof clearly suggests that convergence in entropy holds true in both cases, finite and . Indeed, assuming , as , tends to , so that Equation (18) leads to too. Previously we proved that, whenever does not exist, the missing entropy is replaced with , so that the sequence of entropies is defined for every n. That fact gives full significance to (18).
- 2.
- Suppose .
- The procedure previously employed in the case is likewise extended to , since the involved functions , form a T-system too and both an analogous Markov–Krein theorem ([15], Thm. 1.1, p. 109) and Lin's Theorem 6 are available (in the latter case, although the sequence has to be monotonically decreasing, the proof is similar). Thanks to the MaxEnt formalism, both Equation (13) and the methodology used to prove Theorem 6 hold true.
4.3. Further Convergence Modes for Finite
- (a)
- the monotonically non-increasing sequence converges to and then it is a Cauchy sequence
- (b)
- the Kullback–Leibler distance between and that share the same first n fractional moments given by
- By replacing with f, letting , recalling Theorem 6 and the completeness of the space, it holds
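The link between entropy convergence and the other convergence modes can be made explicit with a short derivation. The following is a sketch under assumptions, since the numbered equations are not reproduced here: it assumes the MaxEnt density has the usual exponential form with multipliers \lambda_j and exponents \alpha_j (with \alpha_0 = 0 and \mu_0 = 1), and that f and f_n share the moments \mu_j. The symbols are chosen for this illustration and may differ from the original notation.

```latex
% Assumed MaxEnt form and shared moment constraints
f_n(x) = \exp\Big(-\sum_{j=0}^{n} \lambda_j x^{\alpha_j}\Big), \qquad
\int x^{\alpha_j} f(x)\,dx = \int x^{\alpha_j} f_n(x)\,dx = \mu_j, \quad j = 0,\dots,n.

% Entropy of the MaxEnt density
H[f_n] = -\int f_n \ln f_n \,dx = \sum_{j=0}^{n} \lambda_j \mu_j .

% Kullback--Leibler divergence reduces to an entropy difference
D(f \,\|\, f_n) = \int f \ln\frac{f}{f_n}\,dx
  = -H[f] + \sum_{j=0}^{n} \lambda_j \int x^{\alpha_j} f(x)\,dx
  = H[f_n] - H[f].

% Pinsker's inequality then gives convergence in L^1
\int |f - f_n|\,dx \;\le\; \sqrt{2\,D(f \,\|\, f_n)} \;=\; \sqrt{2\,\big(H[f_n] - H[f]\big)} .
```

Under these assumptions, entropy convergence H[f_n] → H[f] forces the Kullback–Leibler divergence to zero, and the L^1 distance follows through Pinsker's inequality.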
5. Optimal Choice and Optimal Number of ’s
5.1. The Choice of
- We shall denote by found in (10) to make explicit its dependence on n and, implicitly, on the . These will be chosen so as to minimize the Kullback–Leibler divergence (19) between the "true" but unknown density f and the maxentropic solution . From (12), and this quantity equals because f and satisfy the same moment constraints. Therefore, minimizing (19) amounts to
- In other words, is obtained through two consecutive minimization procedures with respect to , namely
- Since (28) is a multivariable, highly nonlinear, unconstrained and non-convex optimization problem, the uniqueness of the MaxEnt solution may not be guaranteed, and the results depend strongly on the initial condition: different initial conditions may give different MaxEnt solutions. Even if the algorithm converges, there is no assurance that it has converged to a global rather than a local optimum, since conventional algorithms cannot distinguish between the two.
- For problems where finding an approximate global optimum is more important than finding a precise local optimum in a fixed amount of time, the Simulated Annealing method may be preferable to exact algorithms. It explores the function's entire surface and tries to optimize the function while moving both uphill and downhill. Thus, it is largely independent of the starting values, often a critical input in conventional algorithms. Further, it can escape from local optima and go on to find the global optimum.
- In conclusion, the crucial issue consists of solving the nested minimization , which ranges over two distinct sets of variables, and . While each takes its values in the interval , where relies upon physical or numerical reasons, each may assume any real value.
- Alternatively, taking into account that for each fixed set the inner admits a unique solution, being a convex function, the outer one could be calculated by a Monte Carlo technique, replacing with , that is
- Having illustrated the selection criteria in the distribution calculation procedure, we can return to the previously introduced problem of the choice of constraints in the construction of the MaxEnt distribution. As a constraint, we can also include the choice of the range in which to place in the minimization procedure (29). As an example, if for physical reasons we know the underlying f has a hazard rate function with prescribed properties (for instance, asymptotically decreasing to zero), the approximation would preserve that property. As a consequence we choose once more (29), but, as is easy to verify taking into account (10), with exponents (therefore, not optimal for the purposes of rapid convergence in entropy). Conversely, with the hazard rate asymptotically increasing to , with accomplish that request. Consequently, in both cases, it is important that entropy convergence is ensured.
- In conclusion, the criterion (29) for the calculation of is flexible and lends itself to correctly describing multiple scenarios.
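The nested minimization described above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it assumes support [0, 1], a MaxEnt density proportional to exp(-Σ λ_j x^(α_j)), and an inner convex potential ln Z(λ) + Σ λ_j μ_j whose minimum value is the entropy of the MaxEnt density. The "true" density is taken to be a Beta(2, 3) purely so that exact fractional moments are available; the function names, the quadrature rule, the sampling range for the exponents and the number of draws are all hypothetical choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import beta as beta_fn, logsumexp

# Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1]
_t, _w = np.polynomial.legendre.leggauss(200)
X, W = 0.5 * (_t + 1.0), 0.5 * _w

def true_moment(alpha, a=2.0, b=3.0):
    # E[X**alpha] for X ~ Beta(a, b), the illustrative "true" density
    return beta_fn(a + alpha, b) / beta_fn(a, b)

def inner_min(alphas):
    """Inner step: minimise the convex potential over lambda for fixed exponents."""
    alphas = np.asarray(alphas, dtype=float)
    mu = true_moment(alphas)
    P = X[:, None] ** alphas[None, :]        # x**alpha_j at the quadrature nodes

    def potential(lam):
        log_z = logsumexp(-P @ lam, b=W)     # log of the normalising integral
        return log_z + lam @ mu

    def grad(lam):
        logits = -P @ lam
        dens = W * np.exp(logits - logsumexp(logits, b=W))
        return mu - dens @ P                 # prescribed minus fitted moments

    res = minimize(potential, np.zeros(alphas.size), jac=grad, method="BFGS")
    return res.fun, res.x                    # res.fun approximates the entropy

def outer_random_search(n=2, alpha_max=3.0, draws=30, seed=0):
    """Outer step (Monte Carlo): keep the exponents giving the lowest entropy."""
    rng = np.random.default_rng(seed)
    best = (np.inf, None, None)
    for _ in range(draws):
        alphas = np.sort(rng.uniform(0.05, alpha_max, size=n))
        h, lam = inner_min(alphas)
        if h < best[0]:
            best = (h, alphas, lam)
    return best

entropy, alphas, lam = outer_random_search()
print("selected exponents:", alphas, "| entropy:", entropy)
```

A random search is used for the outer loop because it is the simplest instance of the Monte Carlo alternative mentioned above; a simulated annealing routine could be substituted for it without changing the inner step.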
5.2. A Single-Loop Strategy for Approximating with
- Once is fixed, and , are set, are obtained by solving the linear system (31);
- as integrates to one, is given by
- Case : it is enough to transform the original sample data, for instance through , to obtain a transformed sample in and apply the simplified procedure of Section 4.2.
- Case : in a similar way, the transformation can be applied to the original data, obtaining a transformed sample once again in the interval , and the simplified procedure of Section 4.2 is immediately applicable. Note that the empirical fractional moments of Y coincide with the empirical Laplace transform of X, that is , which turns out to be the empirical version of the relationship . This last relation leads us to conclude that the numerical inversion of the Laplace transform can also be reduced to a fractional moment problem in .
- More recently, [23] investigated the properties of an estimator relying upon the fractional moments for random variables supported on by allowing the fractional powers to be complex numbers. Unlike other authors, they deal with the case in which the negative values of a random variable are not negligible at all.
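The identity between empirical fractional moments of the transformed sample and the empirical Laplace transform of the original one can be checked directly. The sketch below assumes the transformation is Y = exp(-X), which is inferred from the Laplace transform statement rather than quoted from the text, and uses an exponential sample with mean 2 as a hypothetical example whose Laplace transform 1/(1 + 2s) is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=10_000)   # sample of X on [0, inf)
y = np.exp(-x)                                # assumed transformation, Y in (0, 1]

alphas = np.array([0.25, 0.5, 1.0, 1.5])      # hypothetical fractional exponents

# Empirical fractional moments of Y
frac_moments_y = np.array([np.mean(y ** a) for a in alphas])

# Empirical Laplace transform of X evaluated at the same points
laplace_x = np.array([np.mean(np.exp(-a * x)) for a in alphas])

# Closed-form Laplace transform of an exponential variable with mean 2
exact = 1.0 / (1.0 + 2.0 * alphas)

print(np.column_stack([alphas, frac_moments_y, laplace_x, exact]))
```

The first two columns of numbers agree up to floating-point rounding, because y**a and exp(-a*x) are the same quantity; both differ from the closed form only by sampling error.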
5.3. The Choice of n
6. Conclusions
- the conditions of existence of the density ;
- the convergence theorem in entropy from which other modes of convergence follow;
- an optimal choice and optimal number of the fractional exponents ;
- assuming , a single-loop algorithm for approximating .
References
- [1] Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. 1957, 106, 620.
- [2] Jaynes, E.T. Information theory and statistical mechanics II. Phys. Rev. 1957, 108, 171.
- [3] Akhiezer, N.I. The Classical Moment Problem and Some Related Questions in Analysis; Oliver and Boyd: Edinburgh, UK, 1965.
- [4] Shohat, J.A.; Tamarkin, J.D. The Problem of Moments; Mathematical Surveys and Monographs, Volume I; American Mathematical Society: Providence, RI, USA, 1943.
- [5] Olteanu, O. Symmetry and asymmetry in moment, functional equations and optimization problems. Symmetry 2023, 15, 1471.
- [6] Papalexiou, S.M.; Koutsoyiannis, D. Entropy based derivation of probability distributions: A case study to daily rainfall. Adv. Water Resour. 2012, 45, 51–57.
- [7] Novi Inverardi, P.L.; Tagliani, A. Maximum Entropy Density Estimation from Fractional Moments. Commun. Stat. Theory Methods 2003, 32, 327–345.
- [8] Novi Inverardi, P.L.; Petri, A.; Pontuale, G.; Tagliani, A. Stieltjes moment problem via fractional moments. Appl. Math. Comput. 2005, 166, 664–677.
- [9] Xu, J.; Zhu, S. An efficient approach for high-dimensional structural reliability analysis. Mech. Syst. Signal Process. 2019, 122, 152–170.
- [10] Zhang, X.; Pandey, M.D. Structural reliability analysis based on the concepts of entropy, fractional moment and dimensional reduction method. Struct. Saf. 2017, 43, 28–40.
- [11] Zhang, X.; He, W.; Zhang, Y.; Pandey, M.D. An effective approach for probabilistic lifetime modelling based on the principle of maximum entropy with fractional moments. Appl. Math. Model. 2017, 51, 626–642.
- [12] Ferreira de Lima, A.R.; Ferreira Batista, J.L.; Prado, P.I. Modelling Tree Diameter Distributions in Natural Forests: An Evaluation of 10 Statistical Models. Forest Sci. 2015, 61, 320–327.
- [13] Lin, G.D. Characterizations of Distributions via moments. Sankhya Indian J. Stat. 1992, 54, 128–132.
- [14] Karlin, S.; Studden, W.J. Tchebycheff Systems: With Applications in Analysis and Statistics; Wiley Interscience: New York, NY, USA, 1966.
- [15] Krein, M.G.; Nudelman, A.A. The Markov Moment Problem and Extremal Problems; American Mathematical Society: Providence, RI, USA, 1977.
- [16] Novi Inverardi, P.L.; Tagliani, A. Stieltjes and Hamburger Reduced Moment Problem When MaxEnt Solution Does Not Exist. Mathematics 2021, 9, 309.
- [17] Alibrandi, U.; Mosalam, K.M. Kernel density maximum entropy method with generalized moments for evaluating probability distributions, including tails, from a small sample of data. Int. J. Numer. Methods Eng. 2017, 113, 1904–1928.
- [18] Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 2006.
- [19] Gzyl, H. Super resolution in the maximum entropy approach to invert Laplace transforms. Inverse Probl. Sci. Eng. 2017, 25, 1536–1545.
- [20] Kullback, S. Information Theory and Statistics; Dover: New York, NY, USA, 1967.
- [21] Tagliani, A. Hausdorff moment problem and fractional moments: A simplified procedure. Appl. Math. Comput. 2011, 218, 4423–4432.
- [22] Gzyl, H.; Novi Inverardi, P.L.; Tagliani, A. Fractional moments and maximum entropy: Geometric meaning. Commun. Stat. Theory Methods 2014, 43, 3596–3601.
- [23] Akaoka, Y.; Okamura, K.; Otobe, Y. Properties of complex-valued power means of random variables and their applications. Acta Math. Acad. Sci. Hung. 2023, 171, 124–175.
Families of Distribution | Density | Characterizing Moments |
---|---|---|
n | n | n | |||
---|---|---|---|---|---|
2 | 1 | 1 | |||
4 | 2 | 2 | |||
6 | 3 | 3 | |||
8 | 4 | 4 | |||
10 | 5 | 5 | |||
12 | 6 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Novi Inverardi, P.L.; Tagliani, A. Probability Distributions Approximation via Fractional Moments and Maximum Entropy: Theoretical and Computational Aspects. Axioms 2024, 13, 28. https://doi.org/10.3390/axioms13010028