A Flexible Multivariate Distribution for Correlated Count Data
Abstract
1. Introduction
2. Conway–Maxwell–Poisson Distribution
3. Multivariate Conway–Maxwell–Poisson Distribution
3.1. Parameter Estimation
3.2. Hypothesis Testing
4. Examples
4.1. Simulated Data
4.2. Real Data: Corporación Favorita Grocery Sales
4.3. Real Data: NBA All-Star
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
MB | multivariate binomial |
fmgf | factorial moment generating function |
MP | multivariate Poisson |
pgf | probability generating function |
NB | negative binomial |
MNB | multivariate negative binomial |
CMP | Conway–Maxwell–Poisson |
MCMP | multivariate Conway–Maxwell–Poisson |
mgf | moment generating function |
MLEs | maximum likelihood estimates |
pmf | probability mass function |
ML | maximum likelihood |
LRT | likelihood ratio test |
AIC | Akaike Information Criterion |
MLE | maximum likelihood estimate |
NBA | National Basketball Association |
C | Center |
F | Forward |
FC | Forward-center |
sCMP | sum of CMPs |
MSCMP | multivariate version of the sum of CMPs |
Appendix A. Deriving the Probability Mass Function
Appendix B. Derivations of Moments
Appendix C. Introduction to the Multivariate sCMP Model
Model | Estimated Parameters | Log Likelihood | No. of Parameters | AIC |
---|---|---|---|---|
CMP | | −804.9 | 9 | 1627.9 |
sCMP () | | −804.0 | 9 | 1626.0 |
sCMP () | | −803.5 | 9 | 1625.0 |
NB | | −802.8 | 8 | 1621.7 |
Appendix D. Real Datasets
Day | Store 1 | Store 2 | Store 3 | Day | Store 1 | Store 2 | Store 3 | Day | Store 1 | Store 2 | Store 3 |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 5 | 6 | 35 | 2 | 3 | 3 | 68 | 1 | 3 | 5 |
2 | 3 | 8 | 23 | 36 | 4 | 2 | 6 | 69 | 1 | 2 | 3 |
3 | 2 | 8 | 21 | 37 | 3 | 2 | 16 | 70 | 0 | 4 | 4 |
4 | 4 | 5 | 8 | 38 | 7 | 12 | 12 | 71 | 1 | 4 | 10 |
5 | 2 | 7 | 13 | 39 | 6 | 0 | 13 | 72 | 1 | 0 | 14 |
6 | 1 | 4 | 15 | 40 | 2 | 2 | 4 | 73 | 0 | 1 | 14 |
7 | 0 | 0 | 11 | 41 | 1 | 0 | 13 | 74 | 0 | 2 | 19 |
8 | 1 | 6 | 2 | 42 | 3 | 6 | 4 | 75 | 0 | 4 | 6 |
9 | 6 | 6 | 7 | 43 | 2 | 0 | 3 | 76 | 7 | 2 | 12 |
10 | 3 | 8 | 13 | 44 | 7 | 2 | 4 | 77 | 1 | 4 | 7 |
11 | 3 | 8 | 16 | 45 | 5 | 7 | 3 | 78 | 1 | 0 | 7 |
12 | 0 | 1 | 7 | 46 | 8 | 1 | 19 | 79 | 3 | 1 | 8 |
13 | 0 | 5 | 11 | 47 | 1 | 6 | 17 | 80 | 4 | 3 | 17 |
14 | 6 | 8 | 7 | 48 | 1 | 2 | 5 | 81 | 4 | 7 | 7 |
15 | 1 | 2 | 4 | 49 | 3 | 6 | 5 | 82 | 3 | 3 | 13 |
16 | 8 | 4 | 4 | 50 | 5 | 2 | 9 | 83 | 1 | 0 | 9 |
17 | 3 | 10 | 20 | 51 | 10 | 1 | 0 | 84 | 1 | 1 | 11 |
18 | 6 | 6 | 12 | 52 | 4 | 4 | 11 | 85 | 0 | 2 | 8 |
19 | 0 | 1 | 6 | 53 | 1 | 5 | 25 | 86 | 6 | 5 | 11 |
20 | 3 | 7 | 10 | 54 | 3 | 5 | 4 | 87 | 0 | 0 | 9 |
21 | 0 | 2 | 5 | 55 | 1 | 7 | 3 | 88 | 0 | 4 | 8 |
22 | 3 | 4 | 1 | 56 | 2 | 1 | 4 | 89 | 1 | 2 | 15 |
23 | 3 | 4 | 3 | 57 | 1 | 5 | 2 | 90 | 1 | 6 | 4 |
24 | 7 | 17 | 13 | 58 | 0 | 3 | 7 | 91 | 0 | 2 | 11 |
25 | 2 | 4 | 11 | 59 | 6 | 4 | 12 | 92 | 0 | 2 | 5 |
26 | 0 | 5 | 9 | 60 | 8 | 1 | 11 | 93 | 8 | 2 | 11 |
27 | 3 | 1 | 2 | 61 | 3 | 4 | 12 | 94 | 3 | 3 | 21 |
28 | 2 | 3 | 1 | 62 | 1 | 2 | 5 | 95 | 3 | 9 | 21 |
29 | 2 | 9 | 4 | 63 | 5 | 0 | 3 | 96 | 0 | 8 | 8 |
30 | 4 | 2 | 7 | 64 | 2 | 1 | 3 | 97 | 4 | 5 | 27 |
31 | 6 | 4 | 12 | 65 | 0 | 3 | 7 | 98 | 5 | 3 | 15 |
32 | 1 | 6 | 18 | 66 | 1 | 2 | 26 | 99 | 1 | 2 | 4 |
33 | 0 | 11 | 15 | 67 | 0 | 1 | 17 | 100 | 4 | 3 | 6 |
34 | 2 | 7 | 12 |
Year | C | F | FC | Year | C | F | FC | Year | C | F | FC |
---|---|---|---|---|---|---|---|---|---|---|---|
2000 | 6 | 1 | 4 | 2006 | 3 | 3 | 4 | 2012 | 3 | 2 | 4 |
2001 | 3 | 4 | 3 | 2007 | 2 | 3 | 3 | 2013 | 2 | 2 | 4 |
2002 | 4 | 3 | 4 | 2008 | 3 | 2 | 3 | 2014 | 2 | 2 | 5 |
2003 | 4 | 2 | 3 | 2009 | 2 | 3 | 5 | 2015 | 2 | 4 | 4 |
2004 | 3 | 4 | 4 | 2010 | 2 | 2 | 5 | 2016 | 3 | 4 | 2 |
2005 | 2 | 2 | 5 | 2011 | 4 | 2 | 2 |
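A quick way to see why a dispersion-flexible model is of interest for this table is to compare each column's sample variance with its sample mean. The sketch below is illustrative only and is not the authors' analysis; the variable names are hypothetical, and the counts are transcribed from the table above in year order (2000 to 2016).

```python
from statistics import mean, variance

# Yearly counts by position (C, F, FC), 2000-2016, transcribed from the table above.
counts = {
    "C":  [6, 3, 4, 4, 3, 2, 3, 2, 3, 2, 2, 4, 3, 2, 2, 2, 3],
    "F":  [1, 4, 3, 2, 4, 2, 3, 3, 2, 3, 2, 2, 2, 2, 2, 4, 4],
    "FC": [4, 3, 4, 3, 4, 5, 4, 3, 3, 5, 5, 2, 4, 4, 5, 4, 2],
}

# Dispersion index = sample variance / sample mean.
# A value of 1 is what a Poisson model implies; below 1 suggests under-dispersion.
dispersion = {}
for position, values in counts.items():
    m = mean(values)
    v = variance(values)  # unbiased sample variance (divides by n - 1)
    dispersion[position] = v / m
    print(f"{position}: mean={m:.3f}, variance={v:.3f}, index={v / m:.3f}")
```

All three indices come out below 1, which is the situation that neither the Poisson nor the NB distribution can represent and that the CMP family is designed to accommodate.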
Δ_i | Empirical Support Level for Model i |
---|---|
0 ≤ Δ_i ≤ 2 | Substantial |
4 ≤ Δ_i ≤ 7 | Considerably less |
Δ_i > 10 | Essentially none |
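The support levels in this table are conventionally tied to the AIC difference Δ_i = AIC_i − AIC_min. The sketch below maps a difference to a label using the commonly quoted Burnham and Anderson rule of thumb (0 to 2, 4 to 7, above 10); those cut-offs and the function name `support_level` are assumptions made for illustration and should be checked against the paper's own table. The AIC values are the ones reported for the NBA All-Star models elsewhere in this article.

```python
def support_level(delta):
    """Map an AIC difference to a support label (assumed rule-of-thumb cut-offs)."""
    if delta < 0:
        raise ValueError("delta must be non-negative")
    if delta <= 2:
        return "Substantial"
    if 4 <= delta <= 7:
        return "Considerably less"
    if delta > 10:
        return "Essentially none"
    # The rule of thumb leaves the ranges (2, 4) and (7, 10] unlabelled.
    return "Not covered by the listed ranges"

# AIC values as reported for the NBA All-Star models.
reported_aic = {"CMP": 154.4, "Poisson": 180.6, "NB": 182.7}
aic_min = min(reported_aic.values())

levels = {}
for model, value in reported_aic.items():
    delta = value - aic_min
    levels[model] = support_level(delta)
    print(f"{model}: delta={delta:.1f}, support={levels[model]}")
```

Under these assumed cut-offs, only the CMP-based model retains substantial support for the NBA data.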
Sample Size | Within | Within |
---|---|---|
100 | 93.0% | 99.2% |
250 | 93.4% | 99.4% |
500 | 94.2% | 98.8% |
1000 | 94.8% | 98.6% |
Sample Size | Geometric | CMP () | CMP () | Bernoulli |
---|---|---|---|---|
100 | 100% | 37.0% | 82.8% | 100% |
250 | 100% | 62.6% | 99.4% | 100% |
500 | 100% | 75.4% | 100.0% | 100% |
1000 | 100% | 99.2% | 100.0% | 100% |
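The column labels above (Geometric, CMP, Bernoulli) name members of the CMP family. As a minimal sketch, assuming the standard univariate CMP form P(X = x) = λ^x / ((x!)^ν Z(λ, ν)) with Z(λ, ν) = Σ_j λ^j / (j!)^ν, the code below evaluates the pmf with a truncated normalizing series and checks the three well-known special cases: ν = 1 (Poisson), ν = 0 with λ < 1 (geometric), and large ν (approaching Bernoulli). The function names and the truncation length `max_terms` are hypothetical choices, not the authors' implementation.

```python
import math

def cmp_log_pmf(x, lam, nu, max_terms=500):
    """Log pmf of CMP(lam, nu); the normalizing series is truncated at max_terms."""
    log_lam = math.log(lam)
    # log of each series term: j*log(lam) - nu*log(j!)
    log_terms = [j * log_lam - nu * math.lgamma(j + 1) for j in range(max_terms)]
    # log-sum-exp for numerical stability
    m = max(log_terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return x * log_lam - nu * math.lgamma(x + 1) - log_z

def cmp_pmf(x, lam, nu, max_terms=500):
    return math.exp(cmp_log_pmf(x, lam, nu, max_terms))

# nu = 1 recovers the Poisson pmf.
poisson_3 = math.exp(-2.5) * 2.5**3 / math.factorial(3)
print(cmp_pmf(3, 2.5, 1.0), poisson_3)

# nu = 0 with lam < 1 recovers the geometric pmf (1 - lam) * lam**x.
print(cmp_pmf(4, 0.4, 0.0), (1 - 0.4) * 0.4**4)

# Large nu pushes nearly all mass onto {0, 1}, with P(X = 1) close to lam / (1 + lam).
print(cmp_pmf(1, 1.5, 50.0), 1.5 / 2.5)
```

The truncation is adequate here because the series terms decay quickly for the parameter values shown; values of λ close to 1 with ν = 0 would need more terms.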
Model | Estimated Parameters | Log Likelihood | No. of Free Parameters | AIC |
---|---|---|---|---|
CMP | | −804.9 | 9 | 1627.9 |
Poisson | | −867.3 | 7 | 1748.5 |
Geometric | | −844.6 | 7 | 1703.2 |
NB | | −802.8 | 8 | 1621.7 |
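The AIC column follows from the other two numeric columns through the usual definition AIC = 2k − 2ℓ, where k is the number of free parameters and ℓ is the maximized log likelihood. The sketch below recomputes it from the rounded values in the table; the helper name `aic` is hypothetical. Because the log likelihoods are reported to one decimal place, the recomputed values can differ from the reported ones by about 0.1.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*loglik."""
    return 2 * n_params - 2 * log_likelihood

# (log likelihood, number of free parameters, reported AIC) from the table above.
fits = {
    "CMP":       (-804.9, 9, 1627.9),
    "Poisson":   (-867.3, 7, 1748.5),
    "geometric": (-844.6, 7, 1703.2),
    "NB":        (-802.8, 8, 1621.7),
}

recomputed = {name: aic(ll, k) for name, (ll, k, _) in fits.items()}
best = min(recomputed, key=recomputed.get)

for name, (ll, k, reported) in fits.items():
    print(f"{name}: recomputed={recomputed[name]:.1f}, reported={reported}")
print("lowest AIC:", best)
```

The ranking agrees with the reported column: the NB-based model has the lowest AIC for the grocery sales data, with the CMP-based model close behind.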
Model | Store 1 | Store 2 | Store 3 |
---|---|---|---|
CMP | 26.0 | 8.9 | 30.6 |
NB | 26.4 | 9.7 | 30.2 |
Poisson | 82.7 | 57.7 | 857.0 |
Geometric | 16.1 | 22.7 | 51.6 |
Model | Estimated Parameters | Log Likelihood | No. of Free Parameters | AIC |
---|---|---|---|---|
CMP | | −68.2 | 9 | 154.4 |
Poisson | | −83.3 | 7 | 180.6 |
NB | | −83.3 | 8 | 182.7 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sellers, K.F.; Li, T.; Wu, Y.; Balakrishnan, N. A Flexible Multivariate Distribution for Correlated Count Data. Stats 2021, 4, 308-326. https://doi.org/10.3390/stats4020021