Heterogeneous Overdispersed Count Data Regressions via Double-Penalized Estimations
Abstract
1. Introduction
2. Double-Penalized NBR
2.1. Heterogeneous Overdispersed Count Data Regressions
2.2. Heterogeneous Overdispersed NBR via Double Penalty
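Algorithm 1 Double-Penalized Optimization
Input: the set of tuning parameters
Output: the estimate
for each tuning parameter in the set, do
    fix that tuning parameter;
    solve the penalized optimization problem;
    obtain the estimate;
    compute the selection criterion;
end for
find the tuning parameter that optimizes the criterion;
return the corresponding estimate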
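The grid search in Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes log links for both the mean and the dispersion of the negative binomial model, solves each penalized problem by proximal gradient descent (ISTA) with an unpenalized intercept, and ranks tuning parameters by a BIC-type score. The function names (`fit`, `select`, and the helpers), the step size, the grid values, and the synthetic data are all hypothetical.

```python
import math
import random
from itertools import product

def soft_threshold(z, t):
    """Proximal map of t*|z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def exp_lin(coef, xi):
    """exp(x_i' coef); the linear predictor is clipped for numerical safety."""
    s = sum(c * x for c, x in zip(coef, xi))
    return math.exp(max(-20.0, min(20.0, s)))

def neg_loglik(beta, alpha, X, y):
    """Average NB negative log-likelihood (ASSUMED: log links for mean mu and dispersion k)."""
    total = 0.0
    for xi, yi in zip(X, y):
        mu, k = exp_lin(beta, xi), exp_lin(alpha, xi)
        total -= (math.lgamma(yi + k) - math.lgamma(k) - math.lgamma(yi + 1)
                  + k * math.log(k / (k + mu)) + yi * math.log(mu / (k + mu)))
    return total / len(y)

def gradients(beta, alpha, X, y):
    """Gradient of neg_loglik in beta and alpha (the clipping is ignored)."""
    n, p = len(y), len(beta)
    g_b, g_a = [0.0] * p, [0.0] * p
    for xi, yi in zip(X, y):
        mu, k = exp_lin(beta, xi), exp_lin(alpha, xi)
        d_eta = k * (yi - mu) / (k + mu)                   # d loglik / d log(mu)
        psi_diff = sum(1.0 / (k + j) for j in range(yi))   # digamma(y+k) - digamma(k)
        d_zeta = k * (psi_diff + math.log(k / (k + mu)) + (mu - yi) / (k + mu))
        for j in range(p):
            g_b[j] -= d_eta * xi[j] / n
            g_a[j] -= d_zeta * xi[j] / n
    return g_b, g_a

def fit(X, y, lam_beta, lam_alpha, step=0.05, n_iter=200):
    """Proximal gradient descent (ISTA); column 0 is an unpenalized intercept."""
    p = len(X[0])
    beta, alpha = [0.0] * p, [0.0] * p
    for _ in range(n_iter):
        g_b, g_a = gradients(beta, alpha, X, y)
        for j in range(p):
            b, a = beta[j] - step * g_b[j], alpha[j] - step * g_a[j]
            beta[j] = b if j == 0 else soft_threshold(b, step * lam_beta)
            alpha[j] = a if j == 0 else soft_threshold(a, step * lam_alpha)
    return beta, alpha

def select(X, y, grid_beta, grid_alpha):
    """Fit on every pair of tuning parameters; keep the pair with the smallest BIC-type score."""
    n, best = len(y), None
    for lam_b, lam_a in product(grid_beta, grid_alpha):
        beta, alpha = fit(X, y, lam_b, lam_a)
        df = sum(c != 0.0 for c in beta + alpha)           # number of nonzero coefficients
        score = 2.0 * n * neg_loglik(beta, alpha, X, y) + math.log(n) * df
        if best is None or score < best[0]:
            best = (score, (lam_b, lam_a), beta, alpha)
    return best

def poisson(rng, lam):
    """Knuth's multiplication method for Poisson sampling (adequate for small lam)."""
    limit, count, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return count
        count += 1

# Synthetic data, for illustration only: NB counts drawn as a gamma-Poisson mixture.
rng = random.Random(0)
X, y = [], []
for _ in range(60):
    xi = [1.0] + [rng.uniform(-1.0, 1.0) for _ in range(3)]
    mu, k = math.exp(0.5 + xi[1]), math.exp(0.5 - 0.8 * xi[2])
    X.append(xi)
    y.append(poisson(rng, rng.gammavariate(k, mu / k)))

grid = [0.01, 0.05, 0.2]
score, (lam_b, lam_a), beta_hat, alpha_hat = select(X, y, grid, grid)
print("selected tuning parameters:", lam_b, lam_a)
print("beta_hat:", [round(b, 3) for b in beta_hat])
print("alpha_hat:", [round(a, 3) for a in alpha_hat])
```

The two penalty levels are searched jointly here because the mean and dispersion coefficients may need different amounts of shrinkage. The BIC-type score is a stand-in; cross-validation or another criterion could be substituted in `select` without changing the loop structure.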
3. Main Results
3.1. Stochastic Lipschitz Conditions
3.2. Estimation Error Oracle Inequalities under RE Conditions
4. Numerical Studies
4.1. Simulations
4.2. A Real Data Example
5. Conclusions and Future Study
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Proofs
- To avoid ill behavior of the Hessian, impose the restricted eigenvalue condition or an analogous condition on the design matrix.
- Choose the tuning parameter so that a high-probability event, i.e., the KKT conditions, holds.
- Under the restricted eigenvalue assumptions and this choice of tuning parameter, derive the oracle inequalities from the optimality of the lasso estimator, the definition of the minimizer of the unknown expected risk function, and some basic inequalities. There are three sub-steps:
  - (i) Under the KKT conditions, show that the error vector lies in a restricted set with a sparsity structure, and check that it lies in a large compact set;
  - (ii) Show that the likelihood-based divergence between the estimator and the true parameter is bounded below by a quadratic distance between them;
  - (iii) Using some elementary inequalities and (ii), show that the error vector lies in a smaller compact set whose radius attains the optimal rate.
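The three sub-steps follow the standard lasso argument, which can be written out as a short worked derivation. The notation below is assumed for illustration and is not taken from the source: $\ell_n$ is a convex empirical loss, $\theta^*$ is the true parameter with support $S$ and $|S| = s$, $\hat\theta$ is the penalized estimator, $\delta = \hat\theta - \theta^*$, $\lambda$ is the tuning parameter, and $\kappa > 0$ is a restricted-eigenvalue-type constant. The negative binomial loss with covariate-dependent dispersion need not be convex, so the paper's actual argument may differ in its details.

```latex
\begin{align*}
&\text{Event (KKT-type condition): } \|\nabla \ell_n(\theta^*)\|_\infty \le \lambda/2. \\[4pt]
&\text{(i) Optimality and convexity: }
  \ell_n(\hat\theta) + \lambda\|\hat\theta\|_1 \le \ell_n(\theta^*) + \lambda\|\theta^*\|_1,
  \qquad
  \ell_n(\hat\theta) - \ell_n(\theta^*) \ge \langle \nabla \ell_n(\theta^*), \delta \rangle
  \ge -\tfrac{\lambda}{2}\|\delta\|_1 . \\
&\quad \text{Since } \|\theta^*\|_1 - \|\hat\theta\|_1 \le \|\delta_S\|_1 - \|\delta_{S^c}\|_1 :
  \quad 0 \le \tfrac{3\lambda}{2}\|\delta_S\|_1 - \tfrac{\lambda}{2}\|\delta_{S^c}\|_1
  \ \Longrightarrow\ \|\delta_{S^c}\|_1 \le 3\,\|\delta_S\|_1 . \\[4pt]
&\text{(ii) Quadratic lower bound on the cone: }
  \ell_n(\hat\theta) - \ell_n(\theta^*) - \langle \nabla \ell_n(\theta^*), \delta \rangle
  \ge \kappa \|\delta\|_2^2 . \\[4pt]
&\text{(iii) Combining (i) and (ii): }
  \kappa\|\delta\|_2^2 \le \tfrac{3\lambda}{2}\|\delta_S\|_1
  \le \tfrac{3\lambda}{2}\sqrt{s}\,\|\delta\|_2
  \ \Longrightarrow\
  \|\delta\|_2 \le \frac{3\lambda\sqrt{s}}{2\kappa},
  \qquad
  \|\delta\|_1 \le 4\|\delta_S\|_1 \le \frac{6 s \lambda}{\kappa}.
\end{align*}
```

In this generic form both error bounds scale linearly with $\lambda$, so the rate is driven by how small the tuning parameter can be while the event in the first line still holds with high probability.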
n | ||||||
---|---|---|---|---|---|---|
100 | 0.1597 | 0.0335 | 0.72414 | 0.1809 | 0.0397 | 0.68904 |
200 | 0.0862 | 0.01 | 0.22149 | 0.0837 | 0.0169 | 0.33048 |
400 | 0.05 | 0.0047 | 0.08847 | 0.0619 | 0.0067 | 0.15066 |
n | p | Previous Method | | | | Proposed Method | | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | Other s | | | | Other s | | | | Other s |
100 | 25 | 173 | 198 | 171 | 2.33 | 192 | 200 | 190 | 0.37 | 180 | 184 | 180 | 0.32 |
100 | 50 | 164 | 197 | 147 | 2.885 | 196 | 200 | 193 | 0.52 | 182 | 180 | 188 | 0.41 |
100 | 150 | 136 | 182 | 111 | 2.725 | 194 | 194 | 192 | 1.02 | 188 | 182 | 186 | 0.41 |
200 | 50 | 196 | 200 | 192 | 1.435 | 200 | 200 | 200 | 0.59 | 200 | 190 | 198 | 0.53 |
200 | 100 | 193 | 200 | 193 | 2.05 | 200 | 200 | 200 | 0.91 | 196 | 186 | 196 | 0.69 |
200 | 250 | 162 | 198 | 155 | 1.5 | 199 | 199 | 198 | 1.18 | 198 | 198 | 198 | 0.69 |
400 | 100 | 200 | 200 | 200 | 0.605 | 200 | 200 | 200 | 0.4 | 200 | 198 | 200 | 0.55 |
400 | 200 | 200 | 200 | 199 | 0.88 | 200 | 200 | 200 | 0.6 | 200 | 200 | 200 | 0.51 |
400 | 500 | 197 | 200 | 198 | 1.29 | 200 | 200 | 200 | 1.21 | 200 | 200 | 200 | 0.61 |
100 | 25 | 183 | 199 | 179 | 2.3 | 194 | 198 | 194 | 0.41 | 179 | 184 | 180 | 0.35 |
100 | 50 | 172 | 197 | 150 | 2.66 | 196 | 196 | 190 | 0.63 | 178 | 182 | 180 | 0.42 |
100 | 150 | 134 | 191 | 99 | 2.32 | 194 | 196 | 192 | 1.01 | 180 | 184 | 182 | 0.43 |
200 | 50 | 195 | 200 | 197 | 1.48 | 200 | 200 | 198 | 0.38 | 196 | 183 | 190 | 0.32 |
200 | 100 | 189 | 200 | 179 | 1.52 | 199 | 200 | 198 | 0.53 | 194 | 186 | 194 | 0.44 |
200 | 250 | 178 | 200 | 154 | 1.39 | 196 | 198 | 196 | 1.1 | 196 | 196 | 194 | 0.55 |
400 | 100 | 200 | 200 | 200 | 0.435 | 200 | 200 | 200 | 0.28 | 200 | 199 | 194 | 0.34 |
400 | 200 | 200 | 200 | 199 | 0.675 | 200 | 200 | 198 | 0.47 | 200 | 198 | 196 | 0.36 |
400 | 500 | 199 | 200 | 194 | 1.12 | 200 | 200 | 198 | 1.07 | 200 | 198 | 196 | 0.56 |
Variables | 1984 | | | 1985 | | | 1986 | | | 1987 | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
| | NBR | HNBR | | NBR | HNBR | | NBR | HNBR | | NBR | HNBR | |
Female | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Age | −0.013 | −0.013 | −0.012 | −0.009 | −0.01 | −0.007 | −0.006 | −0.006 | −0.013 | −0.002 | −0.001 | −0.018 |
Hsat | −0.205 | −0.2 | −0.025 | −0.244 | −0.237 | 0 | −0.188 | −0.195 | −0.045 | −0.158 | −0.153 | −0.043 |
Handdum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Handper | 0.005 | 0.005 | 0.004 | 0.007 | 0.006 | 0.007 | 0.007 | 0.007 | 0 | 0.007 | 0.007 | 0.01 |
Hhninc | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Hhkids | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Educ | 0 | 0 | −0.027 | 0 | 0 | −0.064 | −0.035 | −0.038 | 0 | −0.095 | −0.106 | −0.003 |
Others | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
FE | 0.798 | 0.602 | | 2.203 | 1.874 | | 0.735 | 0.581 | | 1.314 | 1.027 | |

Variables | 1988 | | | 1991 | | | 1994 | | |
---|---|---|---|---|---|---|---|---|---|
| | NBR | HNBR | | NBR | HNBR | | NBR | HNBR | |
Female | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Age | −0.015 | −0.014 | −0.012 | −0.022 | −0.019 | −0.003 | −0.005 | −0.004 | −0.011 |
Hsat | −0.191 | −0.187 | −0.015 | −0.112 | −0.132 | −0.049 | −0.226 | −0.224 | −0.06 |
Handdum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Handper | 0.011 | 0.009 | 0.006 | 0.014 | 0.013 | 0 | 0.007 | 0.008 | 0.004 |
Hhninc | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Hhkids | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Educ | −0.016 | −0.023 | −0.002 | −0.074 | −0.068 | 0 | −0.064 | −0.069 | 0 |
Others | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
FE | 1.144 | 0.912 | | 1.007 | 0.787 | | 0.713 | 0.58 | |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, S.; Wei, H.; Lei, X. Heterogeneous Overdispersed Count Data Regressions via Double-Penalized Estimations. Mathematics 2022, 10, 1700. https://doi.org/10.3390/math10101700