# Best Probability Density Function for Random Sampled Data

## Abstract


## 1. Introduction

## 2. Method

#### 2.1. Transforming the sampled data

#### 2.2. Maximum entropy method applied to level-functions

#### 2.3. Least squares error method and integral evaluation

#### 2.4. Boundary conditions

#### 2.5. Smoothing the pdf

#### 2.6. Procedure to determine expansion coefficients

#### 2.7. Funnel diffusion: A surrogate for simulated annealing

**Initialization:** Set the current position equal to an initial guess: $\overline{\lambda}={\overline{\lambda}}_{o}$. Define the initial standard deviation of a zero-mean Gaussian random step for each component as ${\sigma}_{j}^{\left(0\right)}=1+\frac{{\lambda}_{j}}{10}$. Set the decay rate, $r$, which controls how quickly the random step size decreases; in this work, $r=\frac{\sqrt{2}}{2}$. As funnel diffusion proceeds, the step size at the i-th iteration is given by ${\sigma}_{j}^{\left(i\right)}={r}^{i}{\sigma}_{j}^{\left(0\right)}$. In vector notation, the standard deviation for each component is written ${\overline{\sigma}}^{\left(i\right)}$. The criterion for decreasing the step size is that the error fails to decrease over many consecutive attempts. Initialize the number of consecutive failed steps, ${N}_{fail}$, to zero. The step size is decreased only after ${N}_{fail}$ exceeds a maximum number of consecutive failed steps, ${M}_{fail}$; in this work, ${M}_{fail}=100$.

**Random step:** Generate an independent random step in each of the $m$ directions, characterized by the corresponding standard deviation ${\overline{\sigma}}^{\left(i\right)}$, to arrive at the vector displacement $\delta \overline{\lambda}$. Define a new test position, ${\overline{\lambda}}^{\prime}=\overline{\lambda}+\delta \overline{\lambda}$, and evaluate $E\left({\overline{\lambda}}^{\prime}\right)$.

**Acceptance criteria:** If $E\left({\overline{\lambda}}^{\prime}\right)>E\left(\overline{\lambda}\right)$, the current position does not change and ${N}_{fail}$ is incremented by 1. Otherwise, accept the move so that the test position becomes the current position; in addition, reset ${N}_{fail}=0$, and also reset ${\sigma}_{j}^{\left(0\right)}=1+\frac{{\lambda}_{j}}{10}$ to reflect the new current position. Notice that ${\overline{\sigma}}^{\left(0\right)}$ is updated on each successful move to provide an automated adaptive scale for the step size in each component. Consequently, ${\overline{\sigma}}^{\left(i\right)}$ is also updated, although the iteration index, $i$, remains the same.

**Funneling:** If ${N}_{fail}\le {M}_{fail}$, continue without doing anything. Conversely, if ${N}_{fail}>{M}_{fail}$, the current step size is too large; therefore, decrease the step size by setting ${\overline{\sigma}}^{(i+1)}=r{\overline{\sigma}}^{\left(i\right)}$. To reflect the continual decrease in step size as the bottom of the funnel is approached, the index $i$ is incremented by 1. Finally, reset ${N}_{fail}=0$.

**Convergence:** If ${r}^{i}<tol$, the current position, $\overline{\lambda}$, is returned as the final answer; in this work, the tolerance is set to $tol=2\times {10}^{-4}$. Otherwise, take the next **Random step**.
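The steps above can be sketched as a short routine. This is an illustrative implementation, not the author's code: the name `funnel_diffusion`, the generic objective `E`, and the quadratic used below are all hypothetical, and the absolute value guarding the step scale is an added assumption (the prescription ${\sigma}_{j}^{(0)}=1+\lambda_j/10$ becomes negative for $\lambda_j<-10$).

```python
import numpy as np

def funnel_diffusion(E, lam0, r=np.sqrt(2) / 2, M_fail=100, tol=2e-4, rng=None):
    """Minimize E(lam) by funnel diffusion: adaptive random search whose
    Gaussian step sizes shrink geometrically on repeated failure."""
    rng = np.random.default_rng(rng)
    lam = np.asarray(lam0, dtype=float)
    E_cur = E(lam)
    # Base step scale; abs() is a practical guard not stated in the paper.
    sigma0 = np.abs(1.0 + lam / 10.0)
    i = 0                                 # funnel level: sigma^(i) = r^i * sigma^(0)
    n_fail = 0                            # consecutive rejected steps
    while r**i >= tol:                    # convergence test: stop once r^i < tol
        # Random step: independent Gaussian displacement in each component.
        trial = lam + rng.normal(0.0, sigma0 * r**i)
        E_try = E(trial)
        if E_try > E_cur:                 # rejected move
            n_fail += 1
            if n_fail > M_fail:           # funneling: step size too large, shrink it
                i += 1
                n_fail = 0
        else:                             # accepted move: re-adapt the base scale
            lam, E_cur = trial, E_try
            sigma0 = np.abs(1.0 + lam / 10.0)
            n_fail = 0
    return lam
```

For example, `funnel_diffusion(lambda x: np.sum((x - 3.0) ** 2), [0.0, 0.0])` walks a two-component position toward the minimum near $(3, 3)$, shrinking its steps only after 100 consecutive rejections, as in the procedure above.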

## 3. Results

#### 3.1. Test example 1

**Figure 1.** Example results for test case 1. The first column shows the exact pdf (black) and four predicted pdfs (red, green, blue, magenta), each obtained from an independent random sample. The x-axis displays the range of the random variable in arbitrary units, while the y-axis is dimensionless. From top to bottom, the rows correspond to samples of 64, 256, 1024, 4096, and 1048576 random events. The second column is similar to the first, except that it repeats the magenta result from the first column and compares it with four additional results for the same sample, each from a different funnel diffusion run (black, red, green, and blue). The third column compares 80 different level-function moments calculated from the empirical data (x-axis) with the theoretical prediction (y-axis), as defined in Equation 5. Perfect agreement would fall along the red line ($y=x$).

#### 3.2. Test example 2

**Figure 2.** Example results for test case 2. The first column shows the exact pdf (black) and four predicted pdfs (red, green, blue, magenta), each obtained from an independent random sample. The x-axis displays the range of the random variable in arbitrary units, while the y-axis is dimensionless. From top to bottom, the rows correspond to samples of 64, 256, 1024, 4096, and 1048576 random events. The second column is similar to the first, except that it repeats the magenta result from the first column and compares it with four additional results for the same sample, each from a different funnel diffusion run (black, red, green, and blue). The third column compares 80 different level-function moments calculated from the empirical data (x-axis) with the theoretical prediction (y-axis), as defined in Equation 5. Perfect agreement would fall along the red line ($y=x$).

#### 3.3. Test example 3

#### 3.4. Test example 4

**Figure 3.** Example results for test case 3. Top panel: The first column shows the exact pdf (black) and four predicted pdfs (red, green, blue, magenta), each obtained from an independent random sample. The x-axis displays the range of the random variable in arbitrary units, while the y-axis is dimensionless. The top and bottom rows respectively show the results using 256 and 1024 random events. The second column is similar to the first, except that it repeats the magenta result from the first column and compares it with four additional results for the same sample, each from a different funnel diffusion run (black, red, green, and blue). The third column compares 80 different level-function moments calculated from the empirical data (x-axis) with the theoretical prediction (y-axis), as defined in Equation 5. Perfect agreement would fall along the red line ($y=x$). Bottom panel: The same description as the top panel, except that the numbers of events sampled in the top and bottom rows are 4096 and 1048576, respectively.

**Figure 4.** Example results for test case 3. This re-plots one of the results from Figure 3, shown there in magenta for the sample of 1048576 random events. Here, the semi-log scale makes the accuracy easier to assess.

## 4. Discussion

**Figure 5.** Example results for test case 4. The first column shows the exact pdf (black) and four predicted pdfs (red, green, blue, magenta), each obtained from an independent random sample. The x-axis displays the range of the random variable in arbitrary units, while the y-axis is dimensionless. From top to bottom, the rows correspond to samples of 64, 256, 1024, 4096, and 1048576 random events. The second column is similar to the first, except that it repeats the magenta result from the first column and compares it with four additional results for the same sample, each from a different funnel diffusion run (black, red, green, and blue). The third column compares 80 different level-function moments calculated from the empirical data (x-axis) with the theoretical prediction (y-axis), as defined in Equation 5. Perfect agreement would fall along the red line ($y=x$).

**Figure 6.** Example results using smoothing on test case 4. The first column shows the exact pdf (black) and four predicted pdfs (red, green, blue, magenta) using smoothing level $s=10$ (defined in Equation 16), and each case is drawn from an independent random sample. The x-axis displays the range of the random variable in arbitrary units, while the y-axis is dimensionless. The top and bottom rows contain 256 and 1024 random events, respectively. The second column is similar to the first, except that it repeats the magenta result from the first column and compares it with four additional results for the same sample, using different smoothing requests with $s=2,4,6,8$ (black, blue, green, red). Note that the red curve is essentially indistinguishable from the magenta curve. Of course, since the objective function changes with the smoothing level, each case also implies a different funnel diffusion run. The third column compares 80 different level-function moments calculated from the empirical data (x-axis) with the theoretical prediction (y-axis), as defined in Equation 5; these results correspond to the $s=10$ smoothing case shown by the magenta curve in the first column. Perfect agreement would fall along the red line ($y=x$).

## 5. Conclusions

## Acknowledgements

## References


© 2009 by the author; licensee Molecular Diversity Preservation International, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license http://creativecommons.org/licenses/by/3.0/.

## Share and Cite

**MDPI and ACS Style**

Jacobs, D.J. Best Probability Density Function for Random Sampled Data. *Entropy* **2009**, *11*, 1001-1024.
https://doi.org/10.3390/e11041001
