A Flexible Multivariate Distribution for Correlated Count Data
Abstract
1. Introduction
2. Conway–Maxwell–Poisson Distribution
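For reference, the univariate CMP distribution in its usual parameterization (Shmueli et al., 2005, in the reference list) has probability mass function P(X = x) = λ^x / ((x!)^ν Z(λ, ν)) for x = 0, 1, 2, …, with normalizing constant Z(λ, ν) = Σ_{j≥0} λ^j / (j!)^ν. Setting ν = 1 recovers the Poisson, ν = 0 with λ < 1 the geometric, and ν → ∞ the Bernoulli with success probability λ/(1 + λ). The Python sketch below evaluates the pmf by truncating the series for Z once its terms become negligible. The function names, the tolerance, and the term cap are illustrative choices, not the article's implementation.

```python
import math

def cmp_log_normalizer(lam: float, nu: float, tol: float = 1e-15, max_terms: int = 100_000) -> float:
    """log Z(lam, nu), where Z = sum_{j>=0} lam**j / (j!)**nu.

    Assumes lam > 0 and nu > 0, or nu == 0 with lam < 1 (otherwise the series diverges).
    """
    log_terms = []
    peak = -math.inf
    for j in range(max_terms):
        t = j * math.log(lam) - nu * math.lgamma(j + 1)  # log of the j-th series term
        log_terms.append(t)
        peak = max(peak, t)
        # Log-terms are concave in j (nu >= 0), so once a term is negligible next to
        # the peak, every later term is smaller still and the tail can be dropped.
        if t - peak < math.log(tol):
            break
    # log-sum-exp keeps the sum stable when individual terms are huge or tiny
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

def cmp_pmf(x: int, lam: float, nu: float) -> float:
    """P(X = x) = lam**x / ((x!)**nu * Z(lam, nu))."""
    return math.exp(x * math.log(lam) - nu * math.lgamma(x + 1) - cmp_log_normalizer(lam, nu))

print(cmp_pmf(3, 2.0, 1.0))  # nu = 1: should match the Poisson(2) probability, about 0.1804
```

With ν = 1 the result agrees with e^{−2}·2³/3! ≈ 0.1804. Working on the log scale with `math.lgamma` keeps (x!)^ν from overflowing for large counts.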
3. Multivariate Conway–Maxwell–Poisson Distribution
3.1. Parameter Estimation
3.2. Hypothesis Testing
4. Examples
4.1. Simulated Data
4.2. Real Data: Corporación Favorita Grocery Sales
4.3. Real Data: NBA All-Star
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| MB | multivariate binomial |
| fmgf | factorial moment generating function |
| MP | multivariate Poisson |
| pgf | probability generating function |
| NB | negative binomial |
| MNB | multivariate negative binomial |
| CMP | Conway–Maxwell–Poisson |
| MCMP | multivariate Conway–Maxwell–Poisson |
| mgf | moment generating function |
| MLEs | maximum likelihood estimates |
| pmf | probability mass function |
| ML | maximum likelihood |
| LRT | likelihood ratio test |
| AIC | Akaike Information Criterion |
| MLE | maximum likelihood estimate |
| NBA | National Basketball Association |
| C | Center |
| F | Forward |
| FC | Forward-center |
| sCMP | sum of CMPs |
| MSCMP | multivariate version of the sum of CMPs |
Appendix A. Deriving the Probability Mass Function
Appendix B. Derivations of Moments
Appendix C. Introduction to the Multivariate sCMP Model
| Model | Estimated Parameters | Log Likelihood | No. of Parameters | AIC |
|---|---|---|---|---|
| CMP | | −804.9 | 9 | 1627.9 |
| sCMP () | | −804.0 | 9 | 1626.0 |
| sCMP () | | −803.5 | 9 | 1625.0 |
| NB | | −802.8 | 8 | 1621.7 |
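As a consistency check on the table above, AIC = 2k − 2 log L, where k is the number of parameters and log L the maximized log-likelihood; smaller values are preferred. The Python sketch below recomputes AIC from the reported log-likelihoods. Because those are rounded to one decimal, a recomputed value can differ from the reported AIC by about 0.1. The two sCMP rows are labeled by their position in the table; those labels are placeholders, not the article's notation.

```python
# (model, maximized log-likelihood, number of parameters, reported AIC) from the table above.
# The two sCMP labels are positional placeholders.
ROWS = [
    ("CMP", -804.9, 9, 1627.9),
    ("sCMP (row 2)", -804.0, 9, 1626.0),
    ("sCMP (row 3)", -803.5, 9, 1625.0),
    ("NB", -802.8, 8, 1621.7),
]

def aic(log_lik: float, k: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 log L (smaller is better)."""
    return 2 * k - 2 * log_lik

for model, log_lik, k, reported in ROWS:
    recomputed = aic(log_lik, k)
    print(f"{model:13s} recomputed AIC = {recomputed:7.1f}, reported = {reported:7.1f}")
```

The recomputed CMP and NB values come out at 1627.8 and 1621.6, a 0.1 gap from the table that is consistent with rounding of the log-likelihoods; the ordering, with NB smallest, is unchanged.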
Appendix D. Real Datasets
| Day | Store 1 | Store 2 | Store 3 | Day | Store 1 | Store 2 | Store 3 | Day | Store 1 | Store 2 | Store 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 5 | 6 | 35 | 2 | 3 | 3 | 68 | 1 | 3 | 5 |
| 2 | 3 | 8 | 23 | 36 | 4 | 2 | 6 | 69 | 1 | 2 | 3 |
| 3 | 2 | 8 | 21 | 37 | 3 | 2 | 16 | 70 | 0 | 4 | 4 |
| 4 | 4 | 5 | 8 | 38 | 7 | 12 | 12 | 71 | 1 | 4 | 10 |
| 5 | 2 | 7 | 13 | 39 | 6 | 0 | 13 | 72 | 1 | 0 | 14 |
| 6 | 1 | 4 | 15 | 40 | 2 | 2 | 4 | 73 | 0 | 1 | 14 |
| 7 | 0 | 0 | 11 | 41 | 1 | 0 | 13 | 74 | 0 | 2 | 19 |
| 8 | 1 | 6 | 2 | 42 | 3 | 6 | 4 | 75 | 0 | 4 | 6 |
| 9 | 6 | 6 | 7 | 43 | 2 | 0 | 3 | 76 | 7 | 2 | 12 |
| 10 | 3 | 8 | 13 | 44 | 7 | 2 | 4 | 77 | 1 | 4 | 7 |
| 11 | 3 | 8 | 16 | 45 | 5 | 7 | 3 | 78 | 1 | 0 | 7 |
| 12 | 0 | 1 | 7 | 46 | 8 | 1 | 19 | 79 | 3 | 1 | 8 |
| 13 | 0 | 5 | 11 | 47 | 1 | 6 | 17 | 80 | 4 | 3 | 17 |
| 14 | 6 | 8 | 7 | 48 | 1 | 2 | 5 | 81 | 4 | 7 | 7 |
| 15 | 1 | 2 | 4 | 49 | 3 | 6 | 5 | 82 | 3 | 3 | 13 |
| 16 | 8 | 4 | 4 | 50 | 5 | 2 | 9 | 83 | 1 | 0 | 9 |
| 17 | 3 | 10 | 20 | 51 | 10 | 1 | 0 | 84 | 1 | 1 | 11 |
| 18 | 6 | 6 | 12 | 52 | 4 | 4 | 11 | 85 | 0 | 2 | 8 |
| 19 | 0 | 1 | 6 | 53 | 1 | 5 | 25 | 86 | 6 | 5 | 11 |
| 20 | 3 | 7 | 10 | 54 | 3 | 5 | 4 | 87 | 0 | 0 | 9 |
| 21 | 0 | 2 | 5 | 55 | 1 | 7 | 3 | 88 | 0 | 4 | 8 |
| 22 | 3 | 4 | 1 | 56 | 2 | 1 | 4 | 89 | 1 | 2 | 15 |
| 23 | 3 | 4 | 3 | 57 | 1 | 5 | 2 | 90 | 1 | 6 | 4 |
| 24 | 7 | 17 | 13 | 58 | 0 | 3 | 7 | 91 | 0 | 2 | 11 |
| 25 | 2 | 4 | 11 | 59 | 6 | 4 | 12 | 92 | 0 | 2 | 5 |
| 26 | 0 | 5 | 9 | 60 | 8 | 1 | 11 | 93 | 8 | 2 | 11 |
| 27 | 3 | 1 | 2 | 61 | 3 | 4 | 12 | 94 | 3 | 3 | 21 |
| 28 | 2 | 3 | 1 | 62 | 1 | 2 | 5 | 95 | 3 | 9 | 21 |
| 29 | 2 | 9 | 4 | 63 | 5 | 0 | 3 | 96 | 0 | 8 | 8 |
| 30 | 4 | 2 | 7 | 64 | 2 | 1 | 3 | 97 | 4 | 5 | 27 |
| 31 | 6 | 4 | 12 | 65 | 0 | 3 | 7 | 98 | 5 | 3 | 15 |
| 32 | 1 | 6 | 18 | 66 | 1 | 2 | 26 | 99 | 1 | 2 | 4 |
| 33 | 0 | 11 | 15 | 67 | 0 | 1 | 17 | 100 | 4 | 3 | 6 |
| 34 | 2 | 7 | 12 |
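A quick descriptive check of the store counts above is the index of dispersion, the sample variance divided by the sample mean: it is about 1 for Poisson data, above 1 under overdispersion, and below 1 under underdispersion. The Python sketch below transcribes the three store columns (100 days each) and prints that ratio. It is a descriptive aid only, not part of the article's model fitting.

```python
import statistics

# Daily counts transcribed from the table above; each list continues days 1-34, 35-67, 68-100.
STORES = {
    "Store 1": [2, 3, 2, 4, 2, 1, 0, 1, 6, 3, 3, 0, 0, 6, 1, 8, 3, 6, 0, 3, 0, 3, 3, 7, 2, 0, 3, 2, 2, 4, 6, 1, 0, 2,
                2, 4, 3, 7, 6, 2, 1, 3, 2, 7, 5, 8, 1, 1, 3, 5, 10, 4, 1, 3, 1, 2, 1, 0, 6, 8, 3, 1, 5, 2, 0, 1, 0,
                1, 1, 0, 1, 1, 0, 0, 0, 7, 1, 1, 3, 4, 4, 3, 1, 1, 0, 6, 0, 0, 1, 1, 0, 0, 8, 3, 3, 0, 4, 5, 1, 4],
    "Store 2": [5, 8, 8, 5, 7, 4, 0, 6, 6, 8, 8, 1, 5, 8, 2, 4, 10, 6, 1, 7, 2, 4, 4, 17, 4, 5, 1, 3, 9, 2, 4, 6, 11, 7,
                3, 2, 2, 12, 0, 2, 0, 6, 0, 2, 7, 1, 6, 2, 6, 2, 1, 4, 5, 5, 7, 1, 5, 3, 4, 1, 4, 2, 0, 1, 3, 2, 1,
                3, 2, 4, 4, 0, 1, 2, 4, 2, 4, 0, 1, 3, 7, 3, 0, 1, 2, 5, 0, 4, 2, 6, 2, 2, 2, 3, 9, 8, 5, 3, 2, 3],
    "Store 3": [6, 23, 21, 8, 13, 15, 11, 2, 7, 13, 16, 7, 11, 7, 4, 4, 20, 12, 6, 10, 5, 1, 3, 13, 11, 9, 2, 1, 4, 7,
                12, 18, 15, 12,
                3, 6, 16, 12, 13, 4, 13, 4, 3, 4, 3, 19, 17, 5, 5, 9, 0, 11, 25, 4, 3, 4, 2, 7, 12, 11, 12, 5, 3, 3,
                7, 26, 17,
                5, 3, 4, 10, 14, 14, 19, 6, 12, 7, 7, 8, 17, 7, 13, 9, 11, 8, 11, 9, 8, 15, 4, 11, 5, 11, 21, 21, 8,
                27, 15, 4, 6],
}

for name, counts in STORES.items():
    mean = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance (n - 1 denominator)
    print(f"{name}: mean={mean:.2f} variance={var:.2f} variance/mean={var / mean:.2f}")
```

Every store comes out overdispersed by this measure; Store 1, for example, has mean 2.62 and sample variance about 5.75, a ratio of roughly 2.2.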
| Year | C | F | FC | Year | C | F | FC | Year | C | F | FC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | 6 | 1 | 4 | 2006 | 3 | 3 | 4 | 2012 | 3 | 2 | 4 |
| 2001 | 3 | 4 | 3 | 2007 | 2 | 3 | 3 | 2013 | 2 | 2 | 4 |
| 2002 | 4 | 3 | 4 | 2008 | 3 | 2 | 3 | 2014 | 2 | 2 | 5 |
| 2003 | 4 | 2 | 3 | 2009 | 2 | 3 | 5 | 2015 | 2 | 4 | 4 |
| 2004 | 3 | 4 | 4 | 2010 | 2 | 2 | 5 | 2016 | 3 | 4 | 2 |
| 2005 | 2 | 2 | 5 | 2011 | 4 | 2 | 2 |
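The same variance-to-mean check on the All-Star counts above (years 2000–2016) points the other way: all three positions are underdispersed, with ratios of about 0.40 (C), 0.33 (F), and 0.25 (FC). Neither a Poisson model (variance equal to the mean) nor an NB model (variance at least the mean) can represent that. The Python sketch below also prints pairwise sample correlations between positions, using a small hypothetical helper `pearson_r` so that it does not depend on `statistics.correlation` (Python 3.10+). It is descriptive only.

```python
import math
import statistics
from itertools import combinations

# Yearly counts transcribed from the table above, 2000-2016 (C = Center, F = Forward, FC = Forward-center).
NBA = {
    "C": [6, 3, 4, 4, 3, 2, 3, 2, 3, 2, 2, 4, 3, 2, 2, 2, 3],
    "F": [1, 4, 3, 2, 4, 2, 3, 3, 2, 3, 2, 2, 2, 2, 2, 4, 4],
    "FC": [4, 3, 4, 3, 4, 5, 4, 3, 3, 5, 5, 2, 4, 4, 5, 4, 2],
}

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

for pos, counts in NBA.items():
    mean, var = statistics.mean(counts), statistics.variance(counts)
    print(f"{pos:2s}: mean={mean:.2f} variance={var:.2f} variance/mean={var / mean:.2f}")

for a, b in combinations(NBA, 2):
    print(f"corr({a}, {b}) = {pearson_r(NBA[a], NBA[b]):.2f}")
```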
References
- Johnson, N.; Kotz, S.; Balakrishnan, N. Discrete Multivariate Distributions; John Wiley & Sons: New York, NY, USA, 1997.
- Krishnamoorthy, A.S. Multivariate binomial and Poisson distributions. Sankhyā Indian J. Stat. 1951, 11, 117–124.
- Mahamunulu, D.M. A note on regression in the multivariate Poisson distribution. J. Am. Stat. Assoc. 1967, 62, 251–258.
- Teicher, H. On the Multivariate Poisson distribution. Skand. Aktuarietidskr. 1954, 37, 1–9.
- Hilbe, J.M. Modeling Count Data; Cambridge University Press: New York, NY, USA, 2014.
- Doss, D.C. Definition and characterization of multivariate negative binomial distribution. J. Multivar. Anal. 1979, 9, 460–464.
- Conway, R.W.; Maxwell, W.L. A queuing model with state dependent service rates. J. Ind. Eng. 1962, 12, 132–136.
- Sellers, K.F.; Shmueli, G.; Borle, S. The COM-Poisson model for count data: A survey of methods and applications. Appl. Stoch. Model. Bus. Ind. 2011, 28, 104–116.
- Shmueli, G.; Minka, T.P.; Kadane, J.B.; Borle, S.; Boatwright, P. A useful distribution for fitting discrete data: Revival of the Conway-Maxwell-Poisson distribution. Appl. Stat. 2005, 54, 127–142.
- Guikema, S.D.; Coffelt, J.P. A Flexible Count Data Regression Model for Risk Analysis. Risk Anal. 2008, 28, 213–223.
- Sellers, K.F.; Morris, D.S.; Balakrishnan, N. Bivariate Conway-Maxwell-Poisson distribution: Formulation, properties, and inference. J. Multivar. Anal. 2016, 150, 152–168.
- Kocherlakota, S.; Kocherlakota, K. Bivariate Discrete Distributions; Marcel Dekker: New York, NY, USA, 1992.
- Lai, C.D. Constructions of discrete bivariate distributions. In Advances in Distribution Theory, Order Statistics and Inference, Part I; Balakrishnan, N., Sarabia, J.M., Castillo, E., Eds.; Birkhauser: Boston, MA, USA, 2006; pp. 29–58.
- Marshall, A.W.; Olkin, I. A family of bivariate distributions generated by the bivariate Bernoulli distribution. J. Am. Stat. Assoc. 1985, 80, 332–338.
- R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2017.
- Balakrishnan, N.; Pal, S. Lognormal lifetimes and likelihood-based inference for flexible cure rate models based on COM-Poisson family. Comput. Stat. Data Anal. 2013, 67, 41–67.
- Burnham, K.P.; Anderson, D.R. Model Selection and Multimodel Inference; Springer: New York, NY, USA, 2002.
- Corporación Favorita. Grocery Sales Data. 2018. Available online: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/data (accessed on 26 April 2020).
- Voinov, V.; Nikulin, M.; Balakrishnan, N. Chi-Squared Goodness of Fit Tests with Applications; Academic Press: Boston, MA, USA, 2013.
- NBA. NBA All-Star Game, 2000–2016. Available online: https://www.kaggle.com/fmejia21/nba-all-star-game-20002016? (accessed on 22 April 2020).
- Inouye, D.I.; Yang, E.; Allen, G.I.; Ravikumar, P. A review of multivariate distributions for count data derived from the Poisson distribution. WIREs Comput. Stat. 2017, 9, e1398.
- Genest, C.; Nešlehová, J. A Primer on Copulas for Count Data. ASTIN Bull. 2007, 37, 475–515.
- Trivedi, P.; Zimmer, D. A Note on Identification of Bivariate Copulas for Discrete Count Data. Econometrics 2017, 5, 10.
- Sellers, K.F.; Swift, A.W.; Weems, K.S. A flexible distribution class for count data. J. Stat. Distrib. Appl. 2017, 4, 1–21.

| Δ_i | Empirical Support Level for Model i |
|---|---|
| 0–2 | Substantial |
| 4–7 | Considerably less |
| > 10 | Essentially none |
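Burnham and Anderson (2002) suggest reading the AIC difference Δ_i = AIC_i − AIC_min as follows: 0–2 means substantial support for model i, 4–7 considerably less, and above 10 essentially none. The Python sketch below encodes that rule of thumb and applies it to the four-model comparison with AIC values 1627.9 (CMP), 1748.5 (Poisson), 1703.2 (geometric), and 1621.7 (NB). Values falling in the gaps (between 2 and 4, or between 7 and 10) are not assigned a level by the rule, and the sketch reports that explicitly rather than guessing. The function name is hypothetical.

```python
def support_level(delta: float) -> str:
    """Burnham & Anderson (2002) rule of thumb for delta = AIC_i - AIC_min."""
    if delta < 0:
        raise ValueError("delta must be non-negative (AIC_i minus the minimum AIC)")
    if delta <= 2:
        return "substantial"
    if 4 <= delta <= 7:
        return "considerably less"
    if delta > 10:
        return "essentially none"
    return "not covered by the rule of thumb"  # 2 < delta < 4 or 7 < delta <= 10

# AIC values from the four-model comparison (CMP, Poisson, geometric, NB).
AIC = {"CMP": 1627.9, "Poisson": 1748.5, "geometric": 1703.2, "NB": 1621.7}
best = min(AIC.values())
for model, value in AIC.items():
    delta = value - best
    print(f"{model:10s} delta = {delta:6.1f}  support: {support_level(delta)}")
```

For those values the NB model has the smallest AIC, the CMP model sits at Δ ≈ 6.2 (considerably less support), and the Poisson and geometric models have Δ above 80 (essentially none).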
| Sample Size | Within | Within |
|---|---|---|
| 100 | 93.0% | 99.2% |
| 250 | 93.4% | 99.4% |
| 500 | 94.2% | 98.8% |
| 1000 | 94.8% | 98.6% |
| Sample Size | Geometric | CMP () | CMP () | Bernoulli |
|---|---|---|---|---|
| 100 | 100% | 37.0% | 82.8% | 100% |
| 250 | 100% | 62.6% | 99.4% | 100% |
| 500 | 100% | 75.4% | 100.0% | 100% |
| 1000 | 100% | 99.2% | 100.0% | 100% |
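Assuming the percentages above are empirical rejection rates of H0: ν = 1 (the Poisson special case) for data generated from each listed distribution, they can be computed as sketched below in Python: for each simulated data set, form the likelihood ratio statistic −2(log L0 − log L1), refer it to a χ² distribution with one degree of freedom (one restricted parameter, ν), and record how often the p-value falls below α. The single degree of freedom, α = 0.05, and the helper names are assumptions, and fitting the null and alternative models to each replicate is left out. For one degree of freedom the upper tail has the closed form P(χ²_1 > x) = erfc(√(x/2)), so only the standard library is needed.

```python
import math

def lrt_statistic(loglik_null: float, loglik_alt: float) -> float:
    """Likelihood ratio statistic -2 (log L0 - log L1); non-negative for nested ML fits."""
    return -2.0 * (loglik_null - loglik_alt)

def chi2_df1_sf(x: float) -> float:
    """Upper-tail probability P(chi^2_1 > x); since chi^2_1 = Z^2, P = erfc(sqrt(x / 2))."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

def rejection_rate(lrt_values, alpha: float = 0.05) -> float:
    """Fraction of replicates whose LRT p-value is below alpha (empirical size or power)."""
    rejected = sum(1 for s in lrt_values if chi2_df1_sf(s) < alpha)
    return rejected / len(lrt_values)

# Usage with hypothetical LRT statistics from four simulated replicates:
# two of the four exceed the 5% critical value 3.84.
print(rejection_rate([0.5, 5.0, 10.0, 1.0]))  # -> 0.5
```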
| Model | Estimated Parameters | Log Likelihood | No. of Free Parameters | AIC |
|---|---|---|---|---|
| CMP | | −804.9 | 9 | 1627.9 |
| Poisson | | −867.3 | 7 | 1748.5 |
| geometric | | −844.6 | 7 | 1703.2 |
| NB | | −802.8 | 8 | 1621.7 |
| Model | Store 1 | Store 2 | Store 3 |
|---|---|---|---|
| CMP | 26.0 | 8.9 | 30.6 |
| NB | 26.4 | 9.7 | 30.2 |
| Poisson | 82.7 | 57.7 | 857.0 |
| Geometric | 16.1 | 22.7 | 51.6 |
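Assuming the per-store statistics above are Pearson χ² goodness-of-fit statistics (see Voinov, Nikulin, and Balakrishnan, 2013, in the reference list), each is Σ (O − E)²/E over cells of observed (O) and expected (E) frequencies. The Python sketch below computes that statistic and builds cells 0, 1, …, k − 1 plus a pooled tail ≥ k from a sample and a candidate pmf. The cell layout, the function names, and the toy sample are illustrative assumptions; the article's own binning is not given here, so the sketch is not expected to reproduce the table's values.

```python
import math
from collections import Counter

def pearson_chi_square(observed, expected):
    """Pearson statistic: sum over cells of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def binned_counts(sample, pmf, k):
    """Observed and expected frequencies for cells 0, 1, ..., k - 1 and a pooled tail >= k."""
    n = len(sample)
    freq = Counter(min(x, k) for x in sample)  # values >= k all land in cell k
    observed = [freq.get(c, 0) for c in range(k + 1)]
    probs = [pmf(c) for c in range(k)]
    probs.append(1.0 - sum(probs))  # tail probability P(X >= k)
    expected = [n * p for p in probs]
    return observed, expected

def poisson_pmf(x, lam):
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Toy usage: a small hypothetical sample checked against a Poisson(2) pmf.
sample = [0, 1, 1, 2, 3, 6]
observed, expected = binned_counts(sample, lambda x: poisson_pmf(x, 2.0), 3)
print(observed, round(pearson_chi_square(observed, expected), 3))
```

The toy sample is far too small for the χ² approximation to be trustworthy; with real data, cells with small expected frequencies are usually pooled before the statistic is compared with a χ² reference distribution.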
| Model | Estimated Parameters | Log Likelihood | No. of Free Parameters | AIC |
|---|---|---|---|---|
| CMP | | −68.2 | 9 | 154.4 |
| Poisson | | −83.3 | 7 | 180.6 |
| NB | | −83.3 | 8 | 182.7 |
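Burnham and Anderson (2002) also convert AIC differences into Akaike weights, w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), which can be read as the relative weight of evidence for each candidate model. The Python sketch below applies that formula to the AIC values 154.4 (CMP), 180.6 (Poisson), and 182.7 (NB) in the table above. It is an illustration layered on the reported numbers, not a computation taken from the article, and the function name is hypothetical.

```python
import math

def akaike_weights(aic_values: dict) -> dict:
    """Akaike weights w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2)."""
    best = min(aic_values.values())
    relative = {m: math.exp(-(a - best) / 2.0) for m, a in aic_values.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}

weights = akaike_weights({"CMP": 154.4, "Poisson": 180.6, "NB": 182.7})
for model, w in weights.items():
    print(f"{model:8s} weight = {w:.6f}")
```

Here Δ is 0, 26.2, and 28.3, so essentially all of the weight (more than 0.9999) falls on the CMP model.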
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sellers, K.F.; Li, T.; Wu, Y.; Balakrishnan, N. A Flexible Multivariate Distribution for Correlated Count Data. Stats 2021, 4, 308-326. https://doi.org/10.3390/stats4020021
Sellers KF, Li T, Wu Y, Balakrishnan N. A Flexible Multivariate Distribution for Correlated Count Data. Stats. 2021; 4(2):308-326. https://doi.org/10.3390/stats4020021
Chicago/Turabian Style: Sellers, Kimberly F., Tong Li, Yixuan Wu, and Narayanaswamy Balakrishnan. 2021. "A Flexible Multivariate Distribution for Correlated Count Data" Stats 4, no. 2: 308-326. https://doi.org/10.3390/stats4020021
APA Style: Sellers, K. F., Li, T., Wu, Y., & Balakrishnan, N. (2021). A Flexible Multivariate Distribution for Correlated Count Data. Stats, 4(2), 308-326. https://doi.org/10.3390/stats4020021

