# Towards Improving Transparency of Count Data Regression Models for Health Impacts of Air Pollution


## Abstract


## 1. Introduction

$R^{2}$, which varies from 0 to 1 and measures the portion of the variation in the response variable accounted for by the conditional mean model, is visually represented in the plot.
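Below is a minimal sketch of a bounded goodness-of-fit measure of this kind. It computes $R^{2}$ as the squared Pearson correlation between the observed counts and the CMM's predicted means. This definition is an assumption, chosen because it always lies between 0 and 1; the article may use a different formula. The function name `r_squared` and the toy values are hypothetical.

```python
def r_squared(observed, predicted):
    """Squared Pearson correlation between observed counts and CMM predictions.

    Assumed definition: always in [0, 1], matching the range stated in the
    text, though the article's exact formula may differ.
    """
    n = len(observed)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_y = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(observed, predicted))
    var_y = sum((y - mean_y) ** 2 for y in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_y == 0 or var_p == 0:
        return 0.0  # nothing to explain, or a constant prediction
    return cov * cov / (var_y * var_p)


# Toy observed counts vs. predicted means from a hypothetical CMM
print(r_squared([1, 3, 2], [1, 2, 3]))  # 0.25
```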

… $R^{2}$ or other simple measures to help detect the presence of inappropriate covariates. Finally, because such covariates may elude even k-fold cross-validation, the final CMM is applied to the testing data, where, again, reductions in $R^{2}$ or other simple measures further aid in detecting false inference and overfitting.
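The k-fold procedure described above can be sketched as follows. The sketch makes three assumptions: a Poisson log-linear CMM with a single covariate, fitted by Newton-Raphson; $R^{2}$ defined as the squared correlation between held-out counts and predictions; and the hypothetical names `fit_poisson` and `k_fold_r2`. It is not the authors' R workflow. A fold whose $R^{2}$ falls well below the others, or below the in-sample value, is the kind of warning sign the text describes.

```python
import math
import random

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed counts and predicted means."""
    n = len(observed)
    my, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((y - my) * (p - mp) for y, p in zip(observed, predicted))
    vy = sum((y - my) ** 2 for y in observed)
    vp = sum((p - mp) ** 2 for p in predicted)
    return 0.0 if vy == 0 or vp == 0 else cov * cov / (vy * vp)

def fit_poisson(xs, ys, iters=100, tol=1e-10):
    """Newton-Raphson fit of the Poisson CMM log(mu) = b0 + b1 * x."""
    b0, b1 = math.log(max(sum(ys) / len(ys), 1e-8)), 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu            # score for b0
            g1 += (y - mu) * x      # score for b1
            h00 += mu               # Fisher information entries
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0:
            break
        # Newton step: (d0, d1) = I^{-1} * score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

def k_fold_r2(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score R^2 on the held-out fold; one score per fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        b0, b1 = fit_poisson([xs[i] for i in train], [ys[i] for i in train])
        preds = [math.exp(b0 + b1 * xs[i]) for i in held_out]
        scores.append(r_squared([ys[i] for i in held_out], preds))
    return scores
```

The same `fit_poisson` and `r_squared` pair can then score the final CMM on the separate testing data.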

… $R^{2}$ or other intuitively appealing measures of goodness-of-fit that can be conveniently used in k-fold cross-validation or applied to test data to help warn against overfitting. There are only various forms of the more difficult-to-interpret pseudo-$R^{2}$ and other measures, which depend on the representation of the residuals [2]. This may explain why authoritative "how-to" guides on data analysis in R demonstrate k-fold cross-validation for various model types but not for count data regression [3,4]. In our literature review of the impact of air quality on respiratory health, we found that k-fold cross-validation and application of testing data were used [5], but never for a count data response variable in a CMM.
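The text notes that pseudo-$R^{2}$ depends on how the residuals are represented. The sketch below computes two variants for a Poisson CMM and compares each against an intercept-only model. One is built from deviance residuals and the other from Pearson residuals; both are commonly cited forms, and [2] treats count-model fit measures more fully. The function names are hypothetical. On the same toy data the two variants give different values, which is part of why they are harder to interpret than a single $R^{2}$.

```python
import math

def poisson_deviance(ys, mus):
    """Poisson deviance 2 * sum[y*ln(y/mu) - (y - mu)], with y*ln(y/mu) = 0 at y = 0."""
    total = 0.0
    for y, mu in zip(ys, mus):
        total += (y * math.log(y / mu) if y > 0 else 0.0) - (y - mu)
    return 2.0 * total

def pseudo_r2_deviance(ys, mus):
    """1 - D(model) / D(intercept-only model), based on deviance residuals."""
    ybar = sum(ys) / len(ys)
    return 1.0 - poisson_deviance(ys, mus) / poisson_deviance(ys, [ybar] * len(ys))

def pseudo_r2_pearson(ys, mus):
    """1 - (sum of squared Pearson residuals) / (same for the intercept-only model)."""
    ybar = sum(ys) / len(ys)
    num = sum((y - mu) ** 2 / mu for y, mu in zip(ys, mus))
    den = sum((y - ybar) ** 2 / ybar for y in ys)
    return 1.0 - num / den

ys, mus = [0, 1, 2, 5], [0.5, 1.0, 2.0, 4.0]  # toy counts and fitted means
print(round(pseudo_r2_deviance(ys, mus), 3), round(pseudo_r2_pearson(ys, mus), 3))  # 0.842 0.893
```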

## 2. Illustration of False Inference and Overfitting Due to pmf Misspecification

## 3. The Predicted-and-Observed Count Histogram

## 4. Conclusions

## Author Contributions

## Funding

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Hilbe, J.M. Modeling Count Data; Cambridge University Press: New York, NY, USA, 2014.
2. Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data, 2nd ed.; Cambridge University Press: New York, NY, USA, 2013.
3. Kabacoff, R.I. R in Action: Data Analysis and Graphics with R; Manning Publications Co.: Shelter Island, NY, USA, 2015.
4. Rigby, R.A.; Stasinopoulos, D.M.; Heller, G.Z.; De Bastiani, F. Distributions for Modeling Location, Scale, and Shape: Using GAMLSS in R; CRC Press/Taylor & Francis Group: Boca Raton, FL, USA, 2020.
5. Vitolo, C.; Scutari, M.; Ghalaieny, M.; Tucker, A.; Russell, A. Modeling air pollution, climate, and health data using Bayesian networks: A case study of the English regions. Earth Space Sci. **2018**, 5, 76–88.
6. Akaike, H. Information Theory and an Extension of the Maximum Likelihood Principle. In Proceedings of the Second International Symposium on Information Theory, Tsahkadsor, Armenia, 2–8 September 1971; Petrov, B.N., Csáki, F., Eds.; Akadémiai Kiadó: Budapest, Hungary, 1973; pp. 267–281.
7. Rigby, R.A.; Stasinopoulos, D.M. Generalized additive models for location, scale and shape (with discussion). Appl. Stat. **2005**, 54, 507–554.
8. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020. Available online: https://www.R-project.org/ (accessed on 1 February 2021).
9. Wedderburn, R.W.M. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika **1974**, 61, 439–447.
10. Li, W.K. Testing model adequacy for some Markov regression models for time series. Biometrika **1991**, 78, 83–89.
11. Choi, M.; Curriero, F.C.; Johantgen, M.; Mills, M.E.C.; Sattler, B.; Lipscomb, J. Association between ozone and emergency department visits: An ecological study. Int. J. Environ. Health Res. **2011**, 21, 201–221.
12. Hyrkas-Palmu, H.; Ikäheimo, T.M.; Laatikainen, T.; Jousilahti, P.; Jaakkola, M.S.; Jaakkola, J.J.K. Cold weather increases respiratory symptoms and functional disability especially among patients with asthma and allergic rhinitis. Sci. Rep. **2018**, 8, 10131.
13. Lam, H.C.; Li, A.M.; Chan, E.Y.; Goggins, W.B., III. The short-term association between asthma hospitalisations, ambient temperature, other meteorological factors and air pollutants in Hong Kong: A time-series study. Thorax **2016**, 71, 1097–1109.
14. Lin, Y.; Chang, S.; Lin, C.; Chen, Y.; Wang, Y. Comparing ozone metrics on associations with outpatient visits for respiratory diseases in Taipei Metropolitan area. Environ. Pollut. **2013**, 177, 177–184.
15. O’Lenick, C.R.; Winquist, A.; Chang, H.H.; Kramer, M.R.; Mulholland, J.A.; Grundstein, A.; Sarnat, S.E. Evaluation of individual and area-level factors as modifiers of the association between warm-season temperature and pediatric asthma morbidity in Atlanta, GA. Environ. Res. **2017**, 156, 132–144.
16. Rublee, C.S.; Sorensen, C.J.; Lemery, J.; Wade, T.J.; Sams, E.A.; Hilborn, E.D.; Crooks, J.L. Associations between dust storms and intensive care unit admissions in the United States, 2000–2015. GeoHealth **2020**, 3, e2020GH000260.
17. Xu, Z.; Huang, C.; Su, H.; Turner, L.R.; Qiao, Z.; Tong, S. Diurnal temperature range and childhood asthma: A time-series study. Environ. Health **2013**, 12, 12. Available online: http://www.ehjournal.net/content/12/1/12 (accessed on 1 February 2021).
18. Zhang, H.; Liu, S.; Chen, Z.; Zu, B.; Zhao, Y. Effects of variations in meteorological factors on daily hospital visits for asthma: A time-series study. Environ. Res. **2020**, 182, 109115.

**Figure 1.** Emergency department childhood asthma arrivals in response to mold during the summers of 2003–2011 in Houston, Texas.

**Figure 2.** Schematic for the generation of synthetic data with count data as the response variable. pmf: probability mass function; CMM: conditional mean model.
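Figure 2 is only a schematic, so the sketch below is one plausible way to generate such data, under stated assumptions. It uses a log-linear CMM with hypothetical coefficients and draws Poisson counts with Knuth's method. Negative binomial counts come from a gamma-Poisson mixture, which gives variance $\hat{y}_i + \alpha \hat{y}_i^{2}$, the same form as in the $\sigma_i$ row of Table 1. The names `poisson_draw` and `generate_counts` are hypothetical.

```python
import math
import random

def poisson_draw(rng, lam):
    """Knuth's multiplication method; intended for small means (exp(-lam) underflows near lam = 745)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def generate_counts(xs, beta, pmf="poisson", alpha=0.5, seed=0):
    """Draw one count per covariate row, with mean mu_i = exp(b0 + b1*x_i1 + ...)."""
    if pmf not in ("poisson", "negbin"):
        raise ValueError("pmf must be 'poisson' or 'negbin'")
    rng = random.Random(seed)
    ys = []
    for row in xs:
        mu = math.exp(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
        if pmf == "negbin":
            # Gamma-Poisson mixture: the multiplier has mean 1 and variance alpha,
            # so y has mean mu and variance mu + alpha * mu**2.
            mu *= rng.gammavariate(1.0 / alpha, alpha)
        ys.append(poisson_draw(rng, mu))
    return ys

# Hypothetical covariate and coefficients, for illustration only
xs = [[x / 10] for x in range(100)]
ys = generate_counts(xs, [0.5, 0.3], pmf="negbin", alpha=0.5, seed=42)
```

Fitting both a Poisson and a negative binomial CMM to data generated this way, with known coefficients, makes it possible to check which pmf recovers them.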

**Figure 3.** Predicted-and-Observed Count Histogram for modeling emergency department arrivals with mold as the covariate, for summers from 2003–2011 in Houston, Texas.
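One common way to build a histogram like Figure 3 has two steps. First, count how many observations take each value $k$. Second, compare that with the predicted frequency $\sum_i P(Y = k \mid \hat{y}_i)$, the fitted pmf at $k$ summed over all observations. This construction is an assumption here; the article's histogram may differ in detail. The sketch covers the Poisson pmf and the NB2 negative binomial pmf, whose variance $\hat{y}_i + \alpha \hat{y}_i^{2}$ matches Table 1. The function names are hypothetical.

```python
import math
from collections import Counter

def poisson_pmf(k, mu):
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def negbin_pmf(k, mu, alpha):
    """NB2 pmf with mean mu and variance mu + alpha * mu**2 (r = 1/alpha)."""
    r = 1.0 / alpha
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

def count_histogram(ys, mus, pmf, max_count=None):
    """Rows of (k, observed frequency, predicted frequency) for k = 0..max_count."""
    if max_count is None:
        max_count = max(ys)
    observed = Counter(ys)
    rows = []
    for k in range(max_count + 1):
        predicted = sum(pmf(k, mu) for mu in mus)  # expected number of y_i equal to k
        rows.append((k, observed.get(k, 0), predicted))
    return rows
```

For a negative binomial fit, pass a function such as `lambda k, mu: negbin_pmf(k, mu, alpha)` as `pmf`. Plotting the observed and predicted columns side by side then shows where the fitted pmf over- or under-predicts particular counts.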

**Table 1.** Results for regression analysis using the negative binomial (Neg. bin.) pmf ($\mathcal{L}_{optimal}$ columns) and the Poisson pmf ($\mathcal{L}_{Poisson}$ columns).

| Conditional mean model | $\hat{y}=e^{\hat{\beta}_0+\hat{\beta}_1 x_1}$ | | $\hat{y}=e^{\hat{\beta}_0+\hat{\beta}_1 x_1+\hat{\beta}_2 x_2}$ | | $\hat{y}=e^{\hat{\beta}_0+\hat{\beta}_1 x_1+\hat{\beta}_2 x_2+\hat{\beta}_3 x_3}$ | |
|---|---|---|---|---|---|---|
| | $\mathcal{L}_{Poisson}$ | $\mathcal{L}_{optimal}$ | $\mathcal{L}_{Poisson}$ | $\mathcal{L}_{optimal}$ | $\mathcal{L}_{Poisson}$ | $\mathcal{L}_{optimal}$ |
| pmf | Poisson | Neg. bin. | Poisson | Neg. bin. | Poisson | Neg. bin. |
| $\sigma_i$ | $\sqrt{\hat{y}_i}$ | $\sqrt{\hat{y}_i+\alpha\hat{y}_i^{2}}$ | $\sqrt{\hat{y}_i}$ | $\sqrt{\hat{y}_i+\alpha\hat{y}_i^{2}}$ | $\sqrt{\hat{y}_i}$ | $\sqrt{\hat{y}_i+\alpha\hat{y}_i^{2}}$ |
| $\alpha$ | NA | 0.515 | NA | 0.512 | NA | 0.511 |
| AIC | 15,054.0 | 7905.6 | 14,973.3 | 7900.9 | 14,960.1 | 7901.3 |
| $\hat{\beta}_0$ (p-value) | 1.52 ($<2\times10^{-16}$) | 1.47 ($7.6\times10^{-9}$) | 0.85 ($7.2\times10^{-16}$) | 0.84 (0.017) | 1.15 ($<2\times10^{-16}$) | 1.17 (0.0072) |
| $\hat{\beta}_1$ (p-value) | 0.15 ($<2\times10^{-16}$) | 0.15 ($1.2\times10^{-9}$) | 0.15 ($<2\times10^{-16}$) | 0.15 ($1.1\times10^{-9}$) | 0.15 ($<2\times10^{-16}$) | 0.15 ($9.3\times10^{-10}$) |
| $\hat{\beta}_2$ (p-value) | NA | NA | 0.065 ($<2\times10^{-16}$) | 0.063 (0.0099) | 0.064 ($<2\times10^{-16}$) | 0.063 (0.011) |
| $\hat{\beta}_3$ (p-value) | NA | NA | NA | NA | −0.029 (0.00010) | −0.033 (0.20) |
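Table 1 reports $\sigma_i$ and AIC for each fit. The sketch below evaluates the two $\sigma_i$ formulas from the table, the Poisson and NB2 log-likelihoods, and $\mathrm{AIC} = 2k - 2\ln\hat{\mathcal{L}}$ [6]. It assumes that $\alpha$ counts as one extra parameter for the negative binomial fit, a common convention. The function names are hypothetical, and the code is not meant to reproduce the authors' AIC values, which came from their own software.

```python
import math

def sigma_poisson(mu):
    return math.sqrt(mu)

def sigma_negbin(mu, alpha):
    return math.sqrt(mu + alpha * mu * mu)

def loglik_poisson(ys, mus):
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y, mu in zip(ys, mus))

def loglik_negbin(ys, mus, alpha):
    """NB2 log-likelihood with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    return sum(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
               for y, mu in zip(ys, mus))

def aic(loglik, n_params):
    # Assumed count: the betas, plus alpha for the negative binomial fit
    return 2 * n_params - 2 * loglik
```

For example, at $\hat{y}_i = 4$ with $\alpha = 0.515$ (the first negative binomial column), `sigma_negbin` gives $\sqrt{12.24} \approx 3.50$, compared with 2 under the Poisson pmf. A Poisson fit that understates the spread this much is consistent with the much smaller Poisson p-values in the table.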

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Joseph, J.F.; Furl, C.; Sharif, H.O.; Sunil, T.; Macias, C.G. Towards Improving Transparency of Count Data Regression Models for Health Impacts of Air Pollution. *Appl. Sci.* **2021**, *11*, 3375.
https://doi.org/10.3390/app11083375

**AMA Style**

Joseph JF, Furl C, Sharif HO, Sunil T, Macias CG. Towards Improving Transparency of Count Data Regression Models for Health Impacts of Air Pollution. *Applied Sciences*. 2021; 11(8):3375.
https://doi.org/10.3390/app11083375

**Chicago/Turabian Style**

Joseph, John F., Chad Furl, Hatim O. Sharif, Thankam Sunil, and Charles G. Macias. 2021. "Towards Improving Transparency of Count Data Regression Models for Health Impacts of Air Pollution" *Applied Sciences* 11, no. 8: 3375.
https://doi.org/10.3390/app11083375