# Comprehensive Evaluation of a Sparse Dataset, Assessment and Selection of Competing Models


## Abstract


## 1. Introduction

#### 1.1. Problem Statement

#### 1.2. Literature Review

## 2. Method

## 3. Data

## 4. Performance Evaluations of Models

_{i} from the corresponding predicted means. This approach is used to verify that, after the data (e.g., the response) are transformed, the model residuals follow a normal distribution. Normality can be checked with a quantile–quantile (Q-Q) plot: a scatterplot of two sets of quantiles against one another, which shows whether both sets come from the same distribution.
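As a rough illustration of the Q-Q check described above, the following sketch pairs sorted, standardized residuals with theoretical normal quantiles and summarizes how straight the resulting plot is with a correlation coefficient. The residuals are simulated, the function names `qq_points` and `qq_correlation` are hypothetical, and the correlation summary is a convenience added here, not a step taken from the study.

```python
import random
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair each theoretical normal quantile with a sorted, standardized residual."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    sample_q = sorted((r - m) / s for r in residuals)
    # Plotting positions (i + 0.5) / n keep the probabilities strictly inside (0, 1).
    theory_q = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theory_q, sample_q))


def qq_correlation(points):
    """Pearson correlation of the Q-Q points; values near 1 indicate a nearly straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the illustration is repeatable
# Simulated residuals (assumption): one normal sample and one right-skewed sample.
normal_resid = [rng.gauss(0.0, 1.0) for _ in range(500)]
skewed_resid = [rng.expovariate(1.0) for _ in range(500)]

r_normal = qq_correlation(qq_points(normal_resid))
r_skewed = qq_correlation(qq_points(skewed_resid))
print(f"normal residuals: r = {r_normal:.3f}")
print(f"skewed residuals: r = {r_skewed:.3f}")
```

Points that fall close to a straight line (correlation near 1) are consistent with normal residuals; a skewed sample bends away from the line and lowers the correlation.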

- The red line shows the expected counts under the identified model.
- The bars show the real (observed) counts.
- The x-axis represents the counts.
- The y-axis represents the square root of the expected/observed counts.
- The first line in the figure corresponds to the height of the observations with a value of zero (EPDO = 0).
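The quantities plotted in such a figure can be sketched as follows. This is a minimal example under simplifying assumptions: the data are made up, and a Poisson distribution with a single mean stands in for the fitted model, whereas a real rootogram would sum the fitted probabilities over all observations. The names `poisson_pmf` and `rootogram_rows` are hypothetical.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Probability of observing count k under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def rootogram_rows(counts, lam):
    """Square roots of observed and expected frequencies for each count value."""
    n = len(counts)
    observed = Counter(counts)
    rows = []
    for k in range(max(counts) + 1):
        expected = n * poisson_pmf(k, lam)  # expected frequency of count k
        sqrt_obs = math.sqrt(observed.get(k, 0))
        sqrt_exp = math.sqrt(expected)
        rows.append({
            "count": k,
            "sqrt_observed": sqrt_obs,   # bar height
            "sqrt_expected": sqrt_exp,   # height of the red line
            "gap": sqrt_exp - sqrt_obs,  # positive when the model over-predicts
        })
    return rows


toy_counts = [0, 0, 0, 1, 1, 2, 3, 0, 5, 0]    # made-up data, not the study's EPDO values
lam_hat = sum(toy_counts) / len(toy_counts)    # Poisson mean estimated by the sample mean
rows = rootogram_rows(toy_counts, lam_hat)
for row in rows:
    print(row["count"], round(row["sqrt_observed"], 3), round(row["sqrt_expected"], 3))
```

In this toy example the observed frequency at zero exceeds the expected frequency, which is the kind of excess-zero pattern a rootogram is meant to expose.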

## 5. Results

#### 5.1. Model Comparisons

(AIC_{NB} = 6955 versus AIC_{poisson} = 29,630). In summary, merely switching to a different GLM family completely changes the signs and magnitudes of some predictors.
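The AIC comparison can be checked with the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and ln L is the maximized log-likelihood. The sketch below assumes that the "Degree of freedom" row of the model table gives k; the inputs are the rounded values reported in that table, and the helper name `aic` is hypothetical.

```python
def aic(n_params, log_lik):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik


# (degrees of freedom, log-likelihood) as reported in the model table.
# Assumption: the degrees of freedom equal the number of estimated parameters k.
models = {
    "Poisson": (10, -14805),
    "Negative binomial": (11, -3466),
    "Hurdle": (20, -3138),
    "Zero-inflated": (19, -3425),
}
reported_aic = {
    "Poisson": 29630,
    "Negative binomial": 6955,
    "Hurdle": 6315,
    "Zero-inflated": 6890,
}

computed_aic = {name: aic(k, ll) for name, (k, ll) in models.items()}
for name, value in computed_aic.items():
    print(f"{name}: computed {value}, reported {reported_aic[name]}")

best_model = min(computed_aic, key=computed_aic.get)
print("lowest AIC:", best_model)
```

The Poisson value is reproduced exactly, and the other three differ from the reported values by at most 2, which presumably reflects rounding in the reported log-likelihoods. On these figures the hurdle model has the lowest AIC.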

#### 5.2. Comparison Between Hurdle Models, Truncated Count Component, with Different Distributions

#### 5.3. Hurdle Model Results

## 6. Summary and Conclusions

## Author Contributions

## Funding

## Conflicts of Interest


| Variable | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|
| **Barriers with crashes** | | | | |
| EPDO | 9.3 | 32.898 | 1 | 302 |
| Shoulder width (categorical, cut point at 5 ft) | 0.5 | 0.4999 | 0 | 1 |
| Barrier height (in) | 30.0 | 2.969 | <12 | 56 |
| Barrier length (ft) | 683.7 | 1181.515 | 20 | 10,276 |
| Barrier type: box beam | Frequency: 883 | | | |
| Barrier type: cable barrier | Frequency: 5 | | | |
| Barrier type: concrete barrier | Frequency: 51 | | | |
| Barrier type: W beam barrier | Frequency: 129 | | | |
| Restraint condition (0 = restrained, 1 = otherwise) | 0.110 | 0.258 | 0 | 1 |
| Speed compliance (0 = complied with the speed limit, 1 = otherwise) | 0.0969 | 0.229 | 0 | 1 |
| **Barriers with no crash** | | | | |
| EPDO | 0 | – | – | – |
| Shoulder width (ft) | 7.1 | 2.977 | 2 | 18 |
| Barrier height (in) | 27.9 | 5.501 | <12 | 44 |
| AADT (average annual daily traffic) | 3049 | 1240.090 | 750 | 6019 |
| Barrier length (ft) | 358 | 318.185 | 32 | 2711 |
| Barrier type: box beam | Frequency: 91 | | | |
| Barrier type: cable barrier | Frequency: 172 | | | |
| Barrier type: concrete barrier | Frequency: 7 | | | |
| Barrier type: W beam barrier | Frequency: 43 | | | |

| Variable | Poisson (GLM) | Quasi-Poisson (GLM) | Negative binomial (GLM) | Hurdle (zero-augmented) | Zero-inflated (zero-augmented) |
|---|---|---|---|---|---|
| **Count component** | | | | | |
| Intercept | 4 × 10^{−1} (<0.005) | 3.736 × 10^{−1} (0.722) | −2.138 (<0.005) | −1.3 × 10 (0.9704) | 1.683 (0.006) |
| Shoulder width | −1.455 (<0.005) | −1.455 (0.243) | 1.158 (0.111) | −2.40 (0.0833) | −1.479 (0.047) |
| Barrier height | 3.6 × 10^{−2} (<0.005) | 3.603 × 10^{−2} (0.301) | 1.136 × 10^{−1} (<0.005) | −6.61 × 10^{−2} (0.0982) | −1.6 × 10^{−2} (0.427) |
| Barrier length | 2.6 × 10^{−4} (<0.005) | 2.235 × 10^{−4} (<0.005) | 2.224 × 10^{−4} (<0.005) | 3.93 × 10^{−4} (<0.005) | 2.926 × 10^{−4} (<0.005) |
| Cable barrier | −2.491 (<0.005) | −2.491 (0.092) | −1.432 (<0.005) | 7.51 × 10^{−2} (0.943) | −4.82 × 10^{−1} (0.363) |
| Concrete barrier | 1.9 × 10^{−1} (<0.005) | 1.991 × 10^{−1} (0.617) | −3.539 × 10^{−1} (0.074) | 4.05 × 10^{−1} (0.3006) | 1.302 × 10^{−1} (0.545) |
| W beam barrier | 1.6 × 10^{−1} (<0.005) | 1.571 × 10^{−1} (0.541) | −3.283 × 10^{−1} (0.006) | −2.23 × 10^{−1} (0.3514) | −2.50 × 10^{−1} (0.043) |
| Restraint condition | 1.425 (<0.005) | 1.425 (<0.005) | 2.892 (<0.005) | 5.64 (<0.005) | 2.842 (<0.005) |
| Speed compliance | 3.7 × 10^{−1} (<0.005) | 3.754 × 10^{−1} (0.331) | 1.218 (<0.005) | 9.51 × 10^{−1} (0.063) | 8.690 × 10^{−1} (<0.005) |
| Shoulder width × barrier height | 5.6 × 10^{−2} (<0.005) | 5.620 × 10^{−2} (0.159) | −2.913 × 10^{−2} (0.232) | 9.18 × 10^{−2} (0.045) | 5.895 × 10^{−2} (0.017) |
| **Zero component** | | | | | |
| Intercept | – | – | – | −1.351 (<0.005) | 20.516 (0.336) |
| Shoulder width | – | – | – | 1.04 (<0.005) | −6.300 (0.006) |
| Barrier height | – | – | – | 5.239 (<0.005) | −3.289 (<0.005) |
| Traffic (AADT) | – | – | – | 2.37 (<0.005) | −0.0002 (0.005) |
| Barrier length | – | – | – | 1.44 (<0.005) | −0.001 (0.011) |
| Cable barrier | – | – | – | −6.849 (<0.005) | 84.698 (<0.005) |
| Concrete barrier | – | – | – | −2.496 (0.636) | 18.452 (0.507) |
| W beam barrier | – | – | – | −1.101 (<0.005) | 48.66 (<0.005) |
| Shoulder width × barrier height | – | – | – | −1.351 (<0.005) | 0.223 (0.012) |
| **Model fit** | | | | | |
| Degrees of freedom | 10 | 10 | 11 | 20 | 19 |
| Log-likelihood | −14,805 | – | −3466 | −3138 | −3425 |
| AIC | 29,630 | – | 6955 | 6315 | 6890 |


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Rezapour, M.; Ksaibati, K. Comprehensive Evaluation of a Sparse Dataset, Assessment and Selection of Competing Models. *Signals* **2020**, *1*, 157-169.
https://doi.org/10.3390/signals1020009
