# A Copula-Based Approach for Accommodating the Underreporting Effect in Wildlife‒Vehicle Crash Analysis

## Abstract


## 1. Introduction

## 2. Data Description

## 3. Methodology

#### 3.1. The Wildlife‒Vehicle Collision Model

#### 3.2. The Underreporting Outcome Model

#### 3.3. Linking the Wildlife‒Vehicle Collision Model and the Underreporting Outcome Model

## 4. Modeling Results

#### 4.1. Variables Affecting the Number of Reported Wildlife‒Vehicle Collisions and the Underreporting Outcome

#### 4.2. Comparison of the Hotspot Identification Results Using the Gaussian Copula-Based EB Method and NB-Based EB Method

#### Measure I

#### Measure II

#### Measure III

## 5. Discussion and Conclusions

Variables | Minimum | Maximum | Mean | S.D. ^{a} |
---|---|---|---|---|
Number of reported wildlife‒vehicle collisions per road segment | 0 | 22 | 0.24 | 0.81 |
Number of carcasses per road segment | 0 | 95 | 0.94 | 3.88 |
Underreporting indicator (underreporting: 1; otherwise: 0) | underreporting: 16%; otherwise: 84% | | | |
Annual average daily traffic (AADT) over the years 2002 to 2006 | 0.31 | 148.8 | 13.85 | 19.76 |
Restrictive access control (yes: 1; no: 0) ^{b} | yes: 24%; no: 76% | | | |
Posted speed limit (mph) | 20 | 70 | 52.76 | 10.79 |
Truck percentage (%) | 0 | 52.28 | 14.05 | 8.29 |
Median width (feet) | 0 | 60 | 7.9 | 15.62 |
Total number of lanes for both directions ^{c} | 1 | 9 | 2.79 | 1.24 |
Roadway length (mile) | 0.01 | 6.99 | 0.22 | 0.4 |
Terrain type (rolling: 1; otherwise: 0) | rolling: 72%; otherwise: 28% | | | |
Terrain type (mountainous: 1; otherwise: 0) | mountainous: 9.6%; otherwise: 90.4% | | | |
Lane width (feet) | 10 | 20 | 12.5 | 1.88 |
Left shoulder width (feet) | 0 | 18 | 2.44 | 2.04 |
Right shoulder width (feet) | 0 | 20 | 4.03 | 3.52 |
Rural or urban (urban: 0; rural: 1) | urban: 75.8%; rural: 24.2% | | | |
White-tailed deer habitat (yes: 1; no: 0) | yes: 31%; no: 69% | | | |
Mule deer habitat (yes: 1; no: 0) | yes: 51%; no: 49% | | | |
Elk habitat (yes: 1; no: 0) | yes: 31%; no: 69% | | | |

^{a} S.D. means standard deviation;

^{b} Restrictive access control means that access to the roadway is fully controlled;

^{c} 6 out of 10,475 road segments have one lane.

Name | Copula $C(u,v;\theta)$ ^{a} | Parameter Range of $\theta$ | Parameter Range of Kendall’s tau |
---|---|---|---|
Gaussian | $\Phi_{2}\left(\Phi^{-1}(u),\Phi^{-1}(v),\theta\right)$ ^{b} | $\theta \in (-1,1)$, $\theta = 0$ is independence | $\tau = (2/\pi)\sin^{-1}(\theta)$, $\tau \in (-1,1)$ |
Farlie-Gumbel-Morgenstern | $uv[1+\theta(1-u)(1-v)]$ | $\theta \in (-1,1)$, $\theta = 0$ is independence | $\tau = \frac{2}{9}\theta$, $\tau \in \left(-\frac{2}{9},\frac{2}{9}\right)$ |
Ali-Mikhail-Haq | $\frac{uv}{1-\theta(1-u)(1-v)}$ | $\theta \in [-1,1)$, $\theta \to 0$ is independence | $\tau = \frac{3\theta-2}{3\theta}-\frac{2(1-\theta)^{2}}{3\theta^{2}}\ln(1-\theta)$, $-0.182<\tau<0.333$ |
Clayton | $\left(u^{-\theta}+v^{-\theta}-1\right)^{-1/\theta}$ | $\theta \in (0,\infty)$, $\theta \to 0$ is independence | $\tau = \frac{\theta}{\theta+2}$, $0<\tau<1$ |
Frank | $-\frac{1}{\theta}\ln\left(1+\frac{\left(e^{-\theta u}-1\right)\left(e^{-\theta v}-1\right)}{e^{-\theta}-1}\right)$ | $\theta \in (-\infty,\infty)\setminus\{0\}$, $\theta \to 0$ is independence | $\tau = 1-\frac{4}{\theta}[1-D_{1}(\theta)]$ ^{c}, $\tau \in (-1,1)$ |
Gumbel | $\exp\left(-\left[(-\ln u)^{\theta}+(-\ln v)^{\theta}\right]^{1/\theta}\right)$ | $\theta \in [1,\infty)$, $\theta = 1$ is independence | $\tau = 1-\theta^{-1}$, $0<\tau<1$ |
Joe | $1-\left[(1-u)^{\theta}+(1-v)^{\theta}-(1-u)^{\theta}(1-v)^{\theta}\right]^{1/\theta}$ | $\theta \in [1,\infty)$, $\theta = 1$ is independence | $\tau = 1+\frac{4}{\theta}D_{J}(\theta)$ ^{d}, $0<\tau<1$ |

^{a} $u$ and $v$ represent the marginal cdfs of $Y$ and $Z$, respectively;

^{b} $\Phi_{2}$ represents the cdf of the standard bivariate normal distribution and $\Phi^{-1}$ denotes the inverse cdf of the standard univariate normal distribution;

^{c} $D_{1}(\theta)$ is the first-order Debye function; and,

^{d} $D_{J}(\theta)=\int_{t=0}^{1}\frac{\left[\ln\left(1-t^{\theta}\right)\right]\left(1-t^{\theta}\right)}{t^{\theta-1}}\,dt$.
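To make the relationship between the dependence parameter $\theta$ and Kendall's tau concrete, the closed-form entries of the table above can be evaluated directly. The following is a minimal sketch in Python using only the standard library. The function names are illustrative assumptions and are not taken from the paper, and the Frank and Joe families are omitted because their tau expressions require numerical integration.

```python
import math

# Copula cdfs C(u, v; theta) for three of the families listed above.
# Function names are hypothetical; formulas follow the table entries.

def fgm_cdf(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))

def clayton_cdf(u, v, theta):
    """Clayton copula, theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def gumbel_cdf(u, v, theta):
    """Gumbel copula, theta >= 1 (theta = 1 gives independence, C = u*v)."""
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))

# Kendall's tau as a function of theta (closed-form families only).

def tau_gaussian(theta):
    return (2.0 / math.pi) * math.asin(theta)

def tau_fgm(theta):
    return 2.0 * theta / 9.0

def tau_amh(theta):
    """Ali-Mikhail-Haq, theta in [-1, 1); tau -> 0 as theta -> 0."""
    if theta == 0.0:
        return 0.0  # limiting (independence) case
    return ((3.0 * theta - 2.0) / (3.0 * theta)
            - 2.0 * (1.0 - theta) ** 2 / (3.0 * theta ** 2)
            * math.log(1.0 - theta))

def tau_clayton(theta):
    return theta / (theta + 2.0)

def tau_gumbel(theta):
    return 1.0 - 1.0 / theta

if __name__ == "__main__":
    for name, fn, theta in [
        ("Gaussian", tau_gaussian, 0.5),
        ("FGM", tau_fgm, 0.5),
        ("AMH", tau_amh, -1.0),
        ("Clayton", tau_clayton, 2.0),
        ("Gumbel", tau_gumbel, 2.0),
    ]:
        print(f"{name}: theta={theta}, tau={fn(theta):.4f}")
```

Evaluating `tau_amh` at the lower bound $\theta = -1$ reproduces the lower limit of about $-0.182$ given in the table, which is one quick way to check a formula against its stated range.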

**Table 3.** Modeling results for underreporting outcome and reported wildlife‒vehicle collisions using the Gaussian copula model and the independent copula model.

Variable | Gaussian Copula Model | Independent Copula Model |
---|---|---|
**Underreporting indicator variable** | **Estimate (Std. Error)** | **Estimate (Std. Error)** |
Intercept | −4.103 (0.355) | −4.221 (0.355) |
Average daily traffic | −1.181 × 10^{−5} (4.77 × 10^{−6}) | -* |
Restrictive access control | −0.780 (0.160) | −0.874 (0.157) |
Posted speed limit | 0.034 (0.006) | 0.039 (0.006) |
Total number of lanes for both directions | −0.159 (0.063) | −0.249 (0.052) |
Segment length | 1.187 (0.091) | 1.229 (0.085) |
Terrain type: rolling | 0.588 (0.115) | 0.575 (0.115) |
Terrain type: mountainous | 0.304 (0.156) | 0.315 (0.157) |
Left shoulder width | 0.085 (0.012) | 0.080 (0.012) |
White-tailed deer habitat | 1.274 (0.082) | 1.250 (0.082) |
Elk habitat | 0.491 (0.078) | 0.503 (0.078) |
Mule deer habitat | −0.288 (0.084) | −0.274 (0.083) |
**Number of reported wildlife‒vehicle collisions variable** | **Estimate (Std. Error)** | **Estimate (Std. Error)** |
Intercept | −6.240 (0.811) | −8.718 (0.596) |
Ln (Average daily traffic) | 0.497 (0.057) | 0.690 (0.052) |
Restrictive access control | −1.050 (0.141) | −0.958 (0.127) |
Posted speed limit | 0.059 (0.007) | 0.028 (0.006) |
Truck percentage | −0.036 (0.005) | −0.036 (0.005) |
Total number of lanes for both directions | −0.252 (0.048) | −0.177 (0.043) |
Terrain type: rolling | −0.244 (0.094) | −0.213 (0.084) |
Terrain type: mountainous | −0.742 (0.154) | −0.680 (0.140) |
Lane width | −0.132 (0.045) | -* |
Left shoulder width | 0.057 (0.011) | 0.057 (0.010) |
White-tailed deer habitat | 0.583 (0.075) | 0.523 (0.067) |
Elk habitat | 0.654 (0.075) | 0.705 (0.066) |

**Table 4.** Test statistics of three measures using the Gaussian copula-based EB method and NB-based EB method.

Measure | Model | Threshold c = 0.01 | Threshold c = 0.05 | Threshold c = 0.10 |
---|---|---|---|---|
Measure I | Copula model | 1,031 | 2,842 | 4,056 |
Measure I | NB model | 921 | 2,624 | 3,822 |
Measure II | Copula model | 13 | 86 | 213 |
Measure II | NB model | 12 | 75 | 189 |
Measure III | Copula model | 8,337 | 90,657 | 236,760 |
Measure III | NB model | 11,490 | 114,093 | 294,874 |

