# Pushing for the Extreme: Estimation of Poisson Distribution from Low Count Unreplicated Data—How Close Can We Get?

## Abstract


## 1. Introduction

- Equal a priori weighting (a flat prior) over the possible (unknown) Poisson sources is unrealistic. Typical observed counts are usually bounded by the nature of the problem (e.g., the gene amplification setting used in the experiments, or the time window on the photon streams). One may therefore have a good initial (a priori) guess as to what ranges of observed counts can reasonably be expected; here we are particularly interested in the low count regimes. In such cases it is desirable to incorporate this prior knowledge into the inference mechanism, which we do in the Bayesian framework through a prior distribution over the expected counts.
- To understand the potential benefits of the proposed learning/inference method (here, the Bayesian approach), it is important to compare it with a simple, straightforward baseline (here, maximum likelihood estimation). We contrast the expected Kullback–Leibler divergences from the true (unknown) Poisson distribution to its Bayesian and maximum likelihood estimates, each inferred from a single realization.
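The Kullback–Leibler divergence between two Poisson distributions, which underlies all comparisons below, has the closed form $D(P(\cdot|\lambda)\,\|\,P(\cdot|\mu)) = \lambda\ln(\lambda/\mu) + \mu - \lambda$ nats. A minimal Python sketch (ours, not from the paper) cross-checks this closed form against a truncated summation, reporting the divergence in bits:

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability mass P(k | lam), computed via logs for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def kl_poisson_numeric(lam, mu, kmax=200):
    """KL divergence D(P(.|lam) || P(.|mu)) in bits, by truncated summation."""
    total = 0.0
    for k in range(kmax + 1):
        p = poisson_pmf(k, lam)
        if p > 0.0:  # skip terms that underflowed to zero
            total += p * math.log2(p / poisson_pmf(k, mu))
    return total

def kl_poisson_closed(lam, mu):
    """Closed form lam*ln(lam/mu) + mu - lam (nats), converted to bits."""
    return (lam * math.log(lam / mu) + mu - lam) / math.log(2)
```

For low-count rates the truncated sum and the closed form agree to high precision, so either can serve as the divergence oracle in the experiments.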

## 2. Single Count Data—Bayesian and Maximum Likelihood Approaches

#### 2.1. Bayesian Averaging in the Audic–Claverie Approach

#### 2.2. Information Theory of $P_{AC}(y|x)$

**Theorem 1** ([11]). Consider an underlying Poisson distribution $P(\cdot|\lambda)$ parameterized by some $\lambda>0$. Then

#### 2.3. $P_{AC}(y|x)$ vs. Maximum Likelihood

**Theorem 2** ([12]). Consider an underlying Poisson distribution $P(\cdot|\lambda)$ parameterized by some $\lambda>0$ and a regularization constant $\epsilon\in(0,1]$. The expected divergence in bits $\Upsilon(\lambda,\epsilon)$ between the true Poisson source and its (regularized) maximum likelihood estimate based on a single observation,

**Figure 1.** Expected divergence (in bits) $\Upsilon(\lambda,\epsilon=1)$ of the ML estimate (zero counts regularized with $\epsilon=1$) (solid line). Also shown is the expected divergence $\mathcal{E}(\lambda)$ of $P_{AC}(y|x)$ (dashed line).
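The quantity $\Upsilon(\lambda,\epsilon)$ can be evaluated numerically under our reading of Theorem 2: the single-observation ML estimate is $\hat\lambda = x$, with zero counts regularized to $\epsilon$. A sketch (function names are ours):

```python
import math

def log_poisson_pmf(k, lam):
    """Log of the Poisson probability mass P(k | lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def kl_poisson_bits(lam, mu):
    """Closed-form KL divergence between Poisson(lam) and Poisson(mu), in bits."""
    return (lam * math.log(lam / mu) + mu - lam) / math.log(2)

def expected_ml_divergence(lam, eps=1.0, xmax=200):
    """Upsilon(lam, eps): expected KL from the true source to the regularized
    single-observation ML estimate (lam_hat = x if x > 0, else eps)."""
    total = 0.0
    for x in range(xmax + 1):
        weight = math.exp(log_poisson_pmf(x, lam))  # probability of observing x
        lam_hat = x if x > 0 else eps
        total += weight * kl_poisson_bits(lam, lam_hat)
    return total
```

Plotting `expected_ml_divergence(lam, 1.0)` over a grid of $\lambda$ values reproduces the shape of the solid curve in Figure 1.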

## 3. Generalized $P_{AC}(y|x)$ with Gamma Prior

**Figure 2.** Gamma prior $P(\lambda|\alpha=1,\beta)$. Shown are the priors for three values of the parameter $\beta$, $\beta\in\{1,0.1,0.05\}$.
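Under a $\mathrm{Gamma}(\alpha,\beta)$ prior (shape $\alpha$, rate $\beta$), the posterior after observing a single count $x$ is $\mathrm{Gamma}(x+\alpha,\,1+\beta)$, and the posterior predictive $P_G(y|x,\alpha,\beta)$ is a negative binomial. A sketch of this standard Gamma–Poisson derivation (our code; in the limit $\alpha=1$, $\beta\to 0$ it should recover $P_{AC}(y|x)$):

```python
import math

def p_g(y, x, alpha, beta):
    """Posterior predictive P_G(y|x, alpha, beta): Gamma(alpha, rate=beta) prior,
    one observed count x -> posterior Gamma(x+alpha, 1+beta) -> negative binomial
    with r = x + alpha and success probability (1+beta)/(2+beta)."""
    r = x + alpha
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log((1 + beta) / (2 + beta))
             - y * math.log(2 + beta))
    return math.exp(log_p)

def p_ac(y, x):
    """Audic-Claverie statistic (x+y)! / (x! y! 2^(x+y+1)): the flat-prior
    special case alpha = 1, beta -> 0."""
    return math.exp(math.lgamma(x + y + 1) - math.lgamma(x + 1)
                    - math.lgamma(y + 1) - (x + y + 1) * math.log(2))
```

The sanity checks are that `p_g` sums to one over $y$ and collapses to `p_ac` as $\beta\to 0$ with $\alpha=1$.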

## 4. First and Second Moments of the Generalized $P_{AC}(y|x)$

**Theorem 3.** Consider a non-negative integer $x$ and the associated generalized model $P_G(y|x,\alpha,\beta)$. Then,
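The moments can be cross-checked numerically against the closed forms implied by the Gamma–Poisson derivation, $E[y|x] = (x+\alpha)/(1+\beta)$ and $\mathrm{Var}[y|x] = (x+\alpha)(2+\beta)/(1+\beta)^2$ (our expressions for the negative-binomial predictive; Theorem 3 gives the exact statement):

```python
import math

def p_g(y, x, alpha, beta):
    """Negative-binomial predictive under a Gamma(alpha, rate=beta) prior."""
    r = x + alpha
    return math.exp(math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                    + r * math.log((1 + beta) / (2 + beta))
                    - y * math.log(2 + beta))

def moments(x, alpha, beta, ymax=2000):
    """First two moments of P_G(y|x, alpha, beta) by truncated summation."""
    probs = [p_g(y, x, alpha, beta) for y in range(ymax + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var
```

For example, with $x=3$, $\alpha=1$, $\beta=0.5$ the truncated sums agree with $(x+\alpha)/(1+\beta)=8/3$ and $(x+\alpha)(2+\beta)/(1+\beta)^2=40/9$.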

## 5. Expected Divergence of the Generalized $P_{AC}(y|x)$ from the True Underlying Poisson Distribution

**Theorem 4.** Consider an underlying Poisson distribution $P(\cdot|\lambda)$ parameterized by some $\lambda>0$. Then for $\beta\ge 0$,

**Theorem 5.** For Poisson sources with mean rates

**Figure 3.** Graph of $\kappa(\beta)$. For Poisson sources with mean rates $\lambda<\kappa(\beta)$, $\mathcal{E}(\lambda)>\mathcal{E}_G(\lambda;\beta)$, and hence $P_R(y|x,\beta)$ is on average guaranteed to approximate the underlying source better than the original $P_{AC}(y|x)$.

**Figure 4.** Expected divergences $\mathcal{E}_G(\lambda;\beta)$ (solid line) and $\mathcal{E}(\lambda)$ (dashed line) for $\beta=0.2$ (left) and $\beta=0.01$ (right).

## 6. Empirical Investigations

**Figure 5.** ROC curves for the test distributions $P_{AC}(y|x)=P_R(y|x,\beta\to 0)$ (solid black line), $P_R(y|x,\beta=1/100)$ (solid blue line), $P_R(y|x,\beta=1/50)$ (solid green line), and $P_{ML}(y|x)$ with $\epsilon=1$ (dashed red line). The mean rate of the underlying Poisson source was fixed at $\lambda=5$.

**Figure 6.** ROC curves for the test distributions $P_{AC}(y|x)=P_R(y|x,\beta\to 0)$ (solid black line) and $P_G(y|x,\alpha_*,\beta_*)$ (dashed red line). The mean rate of the underlying Poisson source was fixed at $\lambda=5$.
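ROC curves such as those in Figures 5 and 6 are obtained by sweeping a threshold over a detection score (e.g., $-\log P_{AC}(y|x)$) computed on null pairs ($\lambda_1=\lambda_2$) and on differentially expressed pairs. A generic sketch of the thresholding step (our code; the paper's exact scoring protocol may differ):

```python
def roc_points(null_scores, alt_scores):
    """ROC curve from detection scores, where a higher score means more
    evidence of a difference. Returns (FPR, TPR) pairs, one per threshold."""
    thresholds = sorted(set(null_scores + alt_scores), reverse=True)
    points = []
    for t in thresholds:
        fpr = sum(s >= t for s in null_scores) / len(null_scores)
        tpr = sum(s >= t for s in alt_scores) / len(alt_scores)
        points.append((fpr, tpr))
    return points
```

A well-separating score pushes the curve toward the top-left corner; the curves in Figure 5 compare how well each test distribution separates the two populations at $\lambda=5$.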

## 7. Discussion and Conclusion

## Acknowledgements

## Appendix A

- The sampling rate $\lambda_{2,j}$ for the treatment group $T_2$ is obtained as
  $$\lambda_{2,j} = 2^{(\log_2 \lambda_1) - LFC_j}, \qquad LFC_j \sim \mathrm{Uniform}\{-2.0, -1.5, -1.0, \ldots, 1.5, 2.0\}$$
- A pair of gene counts $(y_{1,j}, y_{2,j})$ is sampled from $\mathrm{Poisson}(\lambda_1)$ and $\mathrm{Poisson}(\lambda_{2,j})$, respectively:
  $$y_{1,j} \sim \mathrm{Poisson}(\lambda_1), \qquad y_{2,j} \sim \mathrm{Poisson}(\lambda_{2,j})$$
- Zero-mean Gaussian noise is then added to each gene count (rounding to the nearest integer using the rounding operator $[\cdot]$):
  $$y'_{i,j} = y_{i,j} + [\eta_j], \quad i=1,2, \qquad \eta_j \sim N\!\left(0, \sigma_j = \frac{v_j}{\psi}\right), \qquad v_j = \frac{\lambda_1 + \lambda_{2,j}}{2}$$
- The batch and lane effects are simulated as follows. Batch effects are accounted for by adding Gaussian noise to each noisy count $y'_{i,j}$; lane effects are imposed through a sampling-depth factor $\delta_j$:
  $$y''_{i,j} = y'_{i,j} + [\eta'_{i,j}], \qquad \eta'_{i,j} \sim N\!\left(0, \frac{y'_{i,j}}{10}\right), \qquad x_{i,j} \sim \mathrm{Poisson}(\delta_j \cdot y''_{i,j}), \qquad \delta_j \sim \mathrm{Uniform}\{0.65, 0.8, 0.95\}$$
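The steps above can be sketched as follows (our code; we read $N(0,s)$ as parameterized by the standard deviation $s$, use Knuth's Poisson sampler, and leave the noise-scaling parameter $\psi$ to the caller since its value is not given here):

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplicative method; adequate for the low rates used here."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def simulate_pair(lam1, psi, rng):
    """One synthetic count pair (x_1j, x_2j) following the Appendix A recipe."""
    lfc = rng.choice([i / 2.0 for i in range(-4, 5)])   # LFC_j ~ Uniform{-2.0, ..., 2.0}
    lam2 = 2.0 ** (math.log2(lam1) - lfc)               # treatment-group rate
    y = [sample_poisson(lam1, rng), sample_poisson(lam2, rng)]
    v = (lam1 + lam2) / 2.0
    eta = round(rng.gauss(0.0, v / psi))                # shared noise term [eta_j]
    y_prime = [max(0, yi + eta) for yi in y]
    delta = rng.choice([0.65, 0.8, 0.95])               # lane (sequencing-depth) factor delta_j
    x = []
    for yp in y_prime:
        ypp = max(0, yp + round(rng.gauss(0.0, yp / 10.0)))  # batch effect [eta'_ij]
        x.append(sample_poisson(delta * ypp, rng))
    return x[0], x[1]
```

Counts are clipped at zero after adding the rounded Gaussian noise so that the final Poisson rates $\delta_j \cdot y''_{i,j}$ remain non-negative.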

## References

1. Varuzza, L.; Gruber, A.; de B. Pereira, C. Significance tests for comparing digital gene expression profiles. Nat. Preced. 2008.
2. Audic, S.; Claverie, J. The significance of digital expression profiles. Genome Res. 1997, 7, 986–995.
3. Medina, C.; Rotter, B.; Horres, R.; Udupa, S.; Besser, B.; Bellarmino, L.; Baum, M.; Matsumura, H.; Terauchi, R.; Kahl, G.; et al. SuperSAGE: The drought stress-responsive transcriptome of chickpea roots. BMC Genomics 2008, 9, e553.
4. Kim, H.; Baek, K.; Lee, S.; Kim, J.; Lee, B.; Cho, H.; Kim, W.; Choi, D.; Hur, C. Pepper EST database: Comprehensive in silico tool for analyzing the chili pepper (Capsicum annuum) transcriptome. BMC Plant Biol. 2008, 8, e101.
5. Cervigni, G.; Paniego, N.; Pessino, S.; Selva, J.; Diaz, M.; Spangenberg, G.; Echenique, V. Gene expression in diplosporous and sexual Eragrostis curvula genotypes with differing ploidy levels. BMC Plant Biol. 2008, 67, e11.
6. Miles, J.; Blomberg, A.; Krisher, R.; Everts, R.; Sonstegard, T.; Tassell, C.V.; Zeulke, K. Comparative transcriptome analysis of in vivo- and in vitro-produced porcine blastocysts by small amplified RNA-serial analysis of gene expression (SAR-SAGE). Mol. Reprod. Dev. 2008, 75, 976–988.
7. Cuevas-Tello, J.C.; Tiňo, P.; Raychaudhury, S. How accurate are the time delay estimates in gravitational lensing? Astron. Astrophys. 2006, 454, 695–706.
8. Cuevas-Tello, J.C.; Tiňo, P.; Raychaudhury, S.; Yao, X.; Harva, M. Uncovering delayed patterns in noisy and irregularly sampled time series: An astronomy application. Pattern Recognit. 2010, 43, 1165–1179.
9. Pelt, J.; Hjorth, J.; Refsdal, S.; Schild, R.; Stabell, R. Estimation of multiple time delays in complex gravitational lens systems. Astron. Astrophys. 1998, 337, 681–684.
10. Press, W.; Rybicki, G.; Hewitt, J. The time delay of gravitational lens 0957+561, I. Methodology and analysis of optical photometric data. Astrophys. J. 1992, 385, 404–415.
11. Tiňo, P. Basic properties and information theory of Audic–Claverie statistic for analyzing cDNA arrays. BMC Bioinform. 2009, 10, e310.
12. Tiňo, P. One-shot Learning of Poisson Distributions in cDNA Array Analysis. In Advances in Neural Networks, Proceedings of the 8th International Symposium on Neural Networks (ISNN 2011), Guilin, China, 29 May–1 June 2011; Liu, D.; Zhang, H.; Polycarpou, M.; Alippi, C.; He, H., Eds.; Lecture Notes in Computer Science 6676; Springer-Verlag: Berlin/Heidelberg, Germany, 2011; pp. 37–46.
13. Auer, P.; Doerge, R. Statistical design and analysis of RNA sequencing data. Genetics 2010, 185, 405–416.

© 2013 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Tiňo, P.
Pushing for the Extreme: Estimation of Poisson Distribution from Low Count Unreplicated Data—How Close Can We Get? *Entropy* **2013**, *15*, 1202-1220.
https://doi.org/10.3390/e15041202
