Factors Contributing to the Relationship between Driving Mileage and Crash Frequency of Older Drivers

As a characteristic of senior drivers aged 65+, the low-mileage bias has been reported in previous studies. While it is thought to be a well-known phenomenon caused by aging, the characteristics of urban environments create more opportunities for crashes. This calls for investigating the low-mileage bias and scrutinizing whether it has the same impact on other age groups, such as young and middle-aged drivers. We use a crash database from the Ohio Department of Public Safety from 2006 to 2011 and adopt a macro approach using Negative Binomial models and Conditional Autoregressive (CAR) models to deal with a spatial autocorrelation issue. Aside from the low-mileage bias issue, we examine the association between the number of crashes and the built environment and socio-economic and demographic factors. We confirm that the number of crashes is negatively associated with vehicle miles traveled, which suggests that more accumulated driving miles result in a lower likelihood of being involved in a crash. This implies that drivers in the low mileage group are involved in crashes more often, regardless of the driver’s age. The results also confirm that more complex urban environments have a higher number of crashes than rural environments.


Introduction
Traffic safety has increased with the development of technology that can enhance safety and protect people on the road. However, over six million traffic crashes still occur every year in the USA. Among the driver age groups, the share of seniors aged 65+ has increased, and so has the number of crashes involving older drivers [1]. The vulnerability of senior drivers to car crashes has been characterized by frailty and low mileage. According to Cioca and Ivascu [2], the 65+ driver group has the highest risk of death among all driver groups. It was also long thought that driving more increases crash risk [3,4]. However, this idea has begun to change since Janke [5] introduced the concept of low-mileage bias, the idea that people who drive less have more crashes per vehicle distance traveled [6-15].
Focusing on senior-related crashes, Langford et al. [10] found that low-mileage drivers performed relatively poorly in a wide range of measures and perceived their own driving ability as being poorer. Older drivers were most often associated with low mileage. Dumbaugh and Zhang [16] investigated the causality between urban form and crashes involving senior drivers and found that community design features, such as arterial thoroughfares, commercial strips, and big box stores, were related to crashes for older adults, and that pedestrian-scaled retail areas ensured mobility for older adults.
groups. Dumbaugh and Li [25] explored whether a crash is the product of random errors or whether crashes are affected by the characteristics of the built environment. Hadayeghi et al. [26] developed crash prediction models with geographically weighted Poisson regressions and investigated local spatial variations in the relationship between the number of crashes and potential transportation planning predictors, such as land use, socio-economic and demographic features, traffic volume, road network characteristics, dwelling units, and employment type. Guo et al. [27] investigated the determinants of crashes involving cyclists, using a comprehensive list of covariates at the TAZ level in Vancouver, Canada. Using the macro-level crash approach with Bayesian statistical models, they found that bicycle and vehicle exposure measures, households, and commercial area density were positively associated with cyclist crashes. Average edge length, average zonal slope, and off-street bicycle links were found to be negatively associated with cyclist crashes.
After this review, we summarize several points mentioned in the literature. Since low-mileage bias issues have usually been addressed in individual-level analyses through individual surveys, only a few studies have considered the socio-economic characteristics of the geographic location. As Rolison and Moutari [15] stated, multiple components of travel that could affect driver risk, in addition to mileage driven, need to be considered simultaneously. Whereas some studies have addressed the relationship between the number of crashes and the attributes of the built environment using macro-level analysis, only a few studies have tried to assess the low-mileage bias at the macro level. Few crash analyses involving low-mileage bias have been completed using spatial econometric models. In particular, we found that many area-level analyses were conducted based on crash severity without considering the driver's age. Based on these points, we asked the following three research questions. First, does low-mileage bias have the same impact on other age groups, such as young drivers and mature drivers? Second, what is the relationship between the number of crashes and the built environment, and socio-economic and demographic factors? Third, as spatial autocorrelation could violate the statistical assumption that the residuals are independent, does spatial autocorrelation appear in the study area? If so, how can this issue be addressed? This study contributes to the literature by answering these questions. We not only evaluated the misunderstanding of senior driving behavior regarding low-mileage bias but also compared this behavior with that of younger drivers at the area level. This study also bridges a gap regarding the relationship between the built environment and the driving behaviors of drivers.

Study Area and Data
As Figure 1 indicates, the study area covers seven counties in the Columbus Metropolitan Area (CMA), the third-largest in Ohio, USA. All seven CMA counties were considered, but due to incomplete crash data, only four of them were included. Based on the 2010 census, the study area covered 2320 square miles, with a population of 1,654,374. The geographical unit is the traffic analysis zone (TAZ), with 1805 TAZs in the study area. A TAZ is one of the spatial units most commonly used for transportation planning models in the USA. Zones usually consist of one or more census blocks, block groups, or census tracts. The size of a zone varies according to traffic-related data and socio-economic data. Every year, around 300,000 crashes occur in Ohio, and about 15% of them occur in the study area.
The dataset was obtained from various sources. Table 1 shows the data source of each variable used in the study. The crash database was downloaded from the Ohio Department of Public Safety (ODPS), and categories were created for crash location, driver's age, etc. To assess the impact of vehicle miles traveled (VMT) on crashes, person trips per household per weekday data were drawn from the National Household Travel Survey (NHTS). The dependent variables are the number of crashes caused by young drivers, mature drivers, and older drivers per TAZ. Because of the need for comparability with other data, we used crash data over the period of 2006-2011. We determined the at-fault driver in a crash by an indicator of crash causation in the crash police report. To investigate the relationship between crashes and the built environment, and socio-economic and demographic factors, we used a large volume of data on land use, demographics of current residents, employment, school enrollment, bus stops, average speed limit, etc. Table 2 provides a description of the variables and basic statistics.
A total of 161,501 crashes occurred in the CMA during the period 2006-2011. The number of drivers, classified by age group, is presented in Table 3. We categorized drivers into three age groups. We designated young drivers as drivers aged between 15 and 24, based on an Organization for Economic Cooperation and Development (OECD) report [28], and older drivers as drivers aged 65+, according to an American Automobile Association (AAA) report [29]. The other drivers, aged 25-64, were included in the mature driver group. Table 3 shows that 32.3% of all crashes were caused by young drivers, 61.0% by mature drivers, and 6.7% by senior drivers.
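The three-way age grouping described above can be illustrated with a minimal sketch. The function names and the sample ages are hypothetical and are not taken from the ODPS records; only the age thresholds (15-24, 25-64, 65+) come from the text.

```python
from collections import Counter


def age_group(age):
    """Map an at-fault driver's age to the three groups used in the text."""
    if 15 <= age <= 24:
        return "young"
    if 25 <= age <= 64:
        return "mature"
    if age >= 65:
        return "older"
    return None  # under 15: outside the groups defined in the text


def group_shares(ages):
    """Percentage share of each age group among classifiable drivers."""
    counts = Counter(g for g in map(age_group, ages) if g is not None)
    total = sum(counts.values())
    return {g: 100.0 * n / total for g, n in counts.items()}


# Hypothetical ages, for illustration only.
sample_ages = [17, 22, 35, 48, 51, 63, 70, 19, 29, 81]
shares = group_shares(sample_ages)
print(shares)
```

Applied to the at-fault driver of each crash record, a tabulation of this kind would yield group shares such as those reported in Table 3.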

Statistical Analyses
We used statistical analyses to understand the relationship between the number of crashes and various factors at the macro level. Because the dependent variable is a count, count-data models were used. We used SAS (Statistical Analysis System), R, and WinBUGS software to analyze the count data. Based on the driver age group and statistical method, six models were developed.
Ordinary least squares (OLS), Poisson, and negative binomial (NB) models have commonly been used in studies involving collision analysis. The OLS model is not appropriate in the case of nonlinearities, and zero values must be dropped when using logarithms. The Poisson model assumes that the mean and the variance of the collision frequency are equal ($E[y_i] = \mathrm{Var}[y_i]$), but this assumption is often violated. As a result, we adopted a negative binomial model to control for the problems of overdispersion (presence of greater variability) or underdispersion (presence of less variability) based on the observed variance. This model has been the preferred approach in other crash frequency studies [24,25,30-38]. In our data, the Poisson model proved inadequate, whereas the negative binomial model proved to be appropriate. Therefore, we used the negative binomial model to regress the number of crashes on environmental and socio-economic factors. In the negative binomial model, the variance of $y_i$ is expressed as a function of the mean crash frequency for entity $i$, with $\alpha$ as the overdispersion parameter and $c$ as a constant. The negative binomial model is expressed as $\ln y_i = \beta' x_i + \varepsilon$, where $x_i$ is the vector of independent variables. The estimated coefficient $\beta'$ represents the proportional change in the expected crash frequency from a unit change in the independent variables. However, this model does not account for any spatial dependence among spatial units such as TAZs.
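The overdispersion argument can be illustrated with a short simulation. This is a sketch under stated assumptions, not the authors' procedure: the variance expression is not shown in the text, so the common NB2 form Var[y] = mu + alpha * mu^2 is assumed here, and the counts are synthetic rather than TAZ crash counts. The function name `dispersion_summary` is hypothetical.

```python
import numpy as np


def dispersion_summary(counts):
    """Return mean, variance, variance-to-mean ratio, and a moment estimate of alpha."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()
    var = counts.var(ddof=1)
    # Under a Poisson model the variance-to-mean ratio should be close to 1.
    ratio = var / mean
    # Method-of-moments estimate under the ASSUMED NB2 form
    # Var[y] = mu + alpha * mu**2; clipped at 0 when no overdispersion is seen.
    alpha = max((var - mean) / mean**2, 0.0)
    return mean, var, ratio, alpha


# Synthetic overdispersed counts from a gamma-Poisson mixture (illustration only).
rng = np.random.default_rng(0)
mu, alpha_true = 20.0, 0.5
lam = rng.gamma(shape=1.0 / alpha_true, scale=alpha_true * mu, size=5000)
y = rng.poisson(lam)

mean, var, ratio, alpha_hat = dispersion_summary(y)
print(f"mean={mean:.2f} var={var:.2f} var/mean={ratio:.2f} alpha_hat={alpha_hat:.3f}")
```

A variance-to-mean ratio well above 1, as in this synthetic sample, is the kind of evidence that leads to preferring a negative binomial model over a Poisson model.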
To check for the existence of spatial autocorrelation (SA), the Moran's I test has been widely used [30] and is used here to evaluate SA. In building the neighborhood matrix W, two locations (TAZs) that share an edge or a corner are considered neighbors, based on the queen adjacency of the first order.
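As an illustration of the test described above, the following sketch computes Moran's I with a binary first-order queen contiguity matrix. A regular grid is used as a stand-in for TAZ polygons, which is an assumption made only to keep the example self-contained; the function names are hypothetical, and the values passed in could be crash counts or model residuals.

```python
import numpy as np


def queen_weights(n_rows, n_cols):
    """Binary first-order queen contiguity matrix for a regular grid.

    Cells that share an edge or a corner are treated as neighbors.
    """
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr or dc) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        W[i, rr * n_cols + cc] = 1.0
    return W


def morans_i(x, W):
    """Global Moran's I for values x and spatial weights matrix W."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)


W = queen_weights(5, 5)
# A smooth gradient: neighboring cells have similar values.
gradient = np.add.outer(np.arange(5), np.arange(5)).ravel()
# A checkerboard: edge-sharing neighbors have opposite values.
checkerboard = (np.indices((5, 5)).sum(axis=0) % 2).ravel()

print("gradient:", morans_i(gradient, W))
print("checkerboard:", morans_i(checkerboard, W))
```

Values of I above zero indicate that similar values cluster in space, which is the pattern that violates the independence assumption of the NB model. Significance is usually assessed with a permutation or normal-approximation test, which this sketch omits.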
The results indicate that SA exists in all three models, as the Moran's I indicator is significant at the 1% level in all models (see Table 4). Since the NB model cannot deal with the spatial dependency issue among adjacent spatial units, this study adopts the conditional autoregressive (CAR) model to deal with it. CAR models are widely used to represent spatial autocorrelation in various applications, including crash analysis [39]. Usually, the data are related to a set of non-overlapping areal units [40]. In the model specification, $\beta$ is a vector of estimable coefficients representing the effects of the covariates, $EV_i$ is the exposure variable, and $\beta_0$ is the intercept term. $SC_i$ represents the spatial random effects (i.e., spatial correlation) as a conditional autoregressive (CAR) prior. $UH_i$ represents the unobserved (i.e., uncorrelated) heterogeneity. The CAR model was implemented in a Bayesian setting, and inference was based on the Markov chain Monte Carlo (MCMC) simulation [40,41].
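The model equation itself does not appear in this text. For orientation only, a commonly used Poisson-lognormal CAR specification that is consistent with the symbols defined above can be sketched as follows. The logarithmic treatment of the exposure variable, its coefficient $\beta_1$, the intrinsic form of the CAR prior, and the variance symbols $\sigma_{SC}^2$ and $\sigma_{UH}^2$ are assumptions, not the authors' confirmed specification; $w_{ij}$ denotes the entries of the queen adjacency matrix W.

```latex
\begin{align}
y_i \mid \lambda_i &\sim \mathrm{Poisson}(\lambda_i), \\
\ln \lambda_i &= \beta_0 + \beta_1 \ln(EV_i) + \boldsymbol{\beta}' \mathbf{x}_i + SC_i + UH_i, \\
SC_i \mid SC_{j \neq i} &\sim N\!\left( \frac{\sum_j w_{ij}\, SC_j}{\sum_j w_{ij}},\; \frac{\sigma_{SC}^2}{\sum_j w_{ij}} \right), \\
UH_i &\sim N\!\left(0, \sigma_{UH}^2\right).
\end{align}
```

Under this form, the spatial effect of a zone is centered on the average of its neighbors' effects, so unexplained crash risk is smoothed across adjacent TAZs, while $UH_i$ absorbs zone-specific variation that is not spatially structured.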

Analysis and Results
The number of observations is 1805 (number of TAZs in the study area). We expected the following directions of impacts of the independent variables: demographics (population density (+); number of households (+); proportions of the population between 15-24 (+), 50-64 (+), and over 65 (+)); socio-economic (high school enrollment (+)); built environment (proportion of commercial land use (+), number of bus stops (+), length of road (+), distance from the center of city (-)); average speed of roads (-); and the time variables Y06 (+), and Y07 (+), which show the number of crashes that occurred in 2006 and 2007.
Young driver, mature driver, and senior driver models were estimated with the negative binomial model. The results in Table 5 indicate that 12 variables are significant for the young driver model, whereas 10 and 9 variables are significant for the mature driver and senior driver models, respectively. For the young drivers, demographic variables, such as the number of households, office employment, high school enrollment, proportion of population between 15 and 24, and proportion of population between 50 and 64, had significant impacts. Among the other variables, the proportion of commercial land use, length of road, distance from the center of Columbus (miles), average maximum speed of roads, and crashes that occurred in 2006 and 2007 were also significant. The vehicle miles per travel rate per weekday had a negative effect on the number of crashes caused by young drivers. For mature drivers, the population density, office employment, and proportion of population between 50 and 64 had a negative effect. Other variables display the same effects as for the young driver model, including VMT. Finally, the senior driver model shows that the number of crashes is positively influenced by office employment, the proportion of the population over 65, and the number of bus stops in a TAZ.
Other variables showed the same trend as the previous two models, except for the distance from the center of Columbus variable. The number of crashes by senior drivers was negatively associated with VMT in the study area. Table 5 shows the significant variables at the 95% level. According to Hadayeghi et al. [22], $R^2_\alpha = 1 - \alpha / \alpha_{\text{intercept}}$ indicates the goodness-of-fit of the NB model. Notably, VMT is an important variable in the models for assessing the low-mileage bias. The VMT variable is strongly statistically significant in all models. In addition, the sign of VMT is negative not only in the senior driver model but also in the other two models. The number of households, office employment, high school enrollment, the proportion of population between 15 and 24, the proportion of commercial land use, the length of road, and the year variables for 2006 and 2007 are significant with a positive sign. The proportion of population between 50 and 64, the distance from the center of Columbus, and the average maximum speed of roads are negatively associated with the number of crashes.
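The goodness-of-fit measure and the reading of a negative VMT coefficient can be made concrete with a small sketch. Here the intercept subscript is read as the dispersion parameter of an intercept-only model, which is an interpretation; the numeric inputs are hypothetical and are not estimates from Table 5, and the function names are illustrative.

```python
import math


def r2_alpha(alpha_model, alpha_intercept):
    """Dispersion-based goodness of fit: 1 - alpha / alpha_intercept."""
    if alpha_intercept <= 0:
        raise ValueError("alpha_intercept must be positive")
    return 1.0 - alpha_model / alpha_intercept


def percent_change(beta, delta=1.0):
    """Percent change in expected crashes for a `delta` increase in a covariate.

    Assumes a log link, so the expected count is multiplied by exp(beta * delta).
    """
    return 100.0 * (math.exp(beta * delta) - 1.0)


# Hypothetical values, for illustration only.
fit = r2_alpha(alpha_model=0.30, alpha_intercept=1.20)
effect = percent_change(beta=-0.05)

print(f"R2_alpha = {fit:.2f}")
print(f"Effect of a one-unit increase: {effect:.2f}% change in expected crashes")
```

With a log link, a negative coefficient such as the one on VMT corresponds to a multiplicative decrease in expected crashes, which is how the low-mileage pattern appears in these models.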
In the CAR models (Table 6), which address the spatial dependency issue, the results showed that there were few differences from those of the NB models. The results of the CAR models indicate that the signs of the coefficients of the independent variables are consistent with those of the NB models. The VMT variable in each model has a negative sign, consistent with that of the NB models. Demographic variables, socio-economic variables, and other variables are also the same in terms of the direction of effect.

Discussion
As our literature review has emphasized the existence of low-mileage bias, we have focused on the relationship between the number of crashes and VMT. The NB and CAR model results show that this relationship is negative: the number of crashes increases as VMT decreases. In other words, people who drive less may be more likely to be involved in crashes than people who drive more. This is consistent with Antin et al. [12] and Langford [8], who showed that low-mileage driver groups have higher crash rates than medium- and high-mileage groups. As shown in Tables 5 and 6, the low-mileage bias is found not only in the senior driver group but also in the other driver age groups. This result is not consistent with Pljakić et al. [42], who found that increasing Daily Vehicle-Kilometers Traveled increases the number of crashes.
Our results suggest that low-mileage issues exist in all age groups. The low-mileage bias has been argued to be due to aging, as prior literature has examined the nature of low-mileage bias only in terms of the characteristics of senior drivers. Plausible reasons may exist as to why driving less could lead to higher crash opportunities. In urban areas, everything is densely concentrated within the designated areas due to strict regulations and codes. Since many things are located close to residents' homes, this may discourage people from driving long distances. Losing the opportunity to drive could decrease the ability to drive for all drivers. The complexity of the urban environment and the large number of vehicles can contribute to more crashes. However, several studies have attributed this phenomenon to the expedited loss of cognition through aging, although this might not only be true for senior drivers.
Aside from the biological characteristics of humans, this phenomenon might be related to urban road environments. Hence, we also examined the influence of the urban built environment on crash frequencies. The results confirm that demographic, socio-economic, and built-environment factors may also influence the chances of a crash occurring. The overall results support the idea that a more complex urban environment increases the likelihood of crashes. Population density contributes significantly to the number of crashes caused by drivers aged between 25 and 64. These results confirm the results of previous studies [24,31,35,43], except for Dumbaugh and Rae [33], who found that population density had a negative effect on the number of injury crashes. The number of household-occupied housing units has a positive and significant impact on crashes by young drivers. The larger number of housing units may be the result of higher housing density, fewer vacancies, or the exclusive use of dwellings compared to alternative land uses. In any of these cases, there is more local residential activity, which explains the positive sign for this variable. High school enrollment might impact the driver composition by generating young drivers. This is the age at which students learn to drive, purchase their first car, drive to school, and attend other activities (such as parties). These results support Shope [44], who suggested that teenaged novice drivers were easily distracted and that their risk of crashing is higher than that of other age groups. High school environments involve high levels of bus and pedestrian activities and unexpected changes in speed that might impact drivers.
Our results show that the proportion of 15-24-year-old residents increases the crashes for young drivers. Dumbaugh and Rae [33] found that the population aged between 18 and 24 increased the number of collisions. A population below 15 may have a positive impact on the severity of pedestrian crashes, as shown by Clifton et al. [45]. This effect can be directly accounted for by the contribution of the resident population to this particular group of drivers. The literature is ambiguous about the impact of the aging population on crashes, perhaps because of the lack of age-specific categorization and because few studies have focused on at-fault drivers. Quddus [38] showed that the share of the population aged over 60 years negatively impacts the number of fatal injuries and minor injuries, but Wier et al. [46] showed that this group contributed to vehicle-pedestrian collisions.
The results show that a higher proportion of people over 65 years in a TAZ may lead to more crashes involving the elderly. This may be related to the contribution of this group of residents to the number of at-fault elderly drivers. Crashes in most age groups are positively and significantly affected by commercial land use. This result is consistent with Wier et al. [46], Pulugurtha et al. [36], and Kim et al. [24].
The number of bus stops positively impacts crashes across most age groups. As suggested by Lee et al. [17], bus stops may increase crash opportunities by increasing pedestrian flows and reducing visibility. According to Kang [47], bus stops can also substantially increase pedestrian volume. The road length in the TAZ has a positive and significant impact in all models. Longer road networks carry more vehicles, which raises the likelihood of collisions. Previous studies [21-33,39] reported similar results.
The results indicate that the distance from the center of Columbus has a negative impact on crashes caused by young and mature drivers. Central areas in cities can provide more collision targets (pedestrians, cars, etc.) than areas on the outskirts. However, Shi et al. [48] highlighted that high crash rates on rural roads could occur due to poorly designed road curves and the lack of traffic signs in China. The average speed limit of the TAZ has a negative and significant effect on the likelihood of crashes. This is an unexpected result because people usually expect that the number of crashes would decrease with the reduction of speed. One plausible reason is that this study analyzed factors focusing on the number of crashes, not on their severity. Similar results have been reported in the literature. Blanco et al. [49] reached a comparable finding by using the Second Strategic Highway Research Program (SHRP2) Naturalistic Driving Study (NDS) data and stratified analysis. Quddus [38] also found that the mean speed (km/h) had a negative impact on the number of fatalities, injuries, and minor injuries. Bindra et al. [34] found that speed limits of less than 40 miles per hour had a negative impact in rural areas, but speed limits of less than 35 miles per hour had a positive impact in urban and suburban areas. Siddiqui et al. [37] found that a speed limit of 15 mph had a negative impact on pedestrian collisions.
We found spatial autocorrelation in the study area. Crashes that occur in the study area are spatially dependent and can be influenced by neighboring environments. There may be several reasons for this. Xu and Huang [39] suggested that the relationship between crash counts and surrounding environments is influenced by location due to intrinsically different relationships across the region, misspecification of reality, omitted relevant variables, and an inappropriately represented functional form. Lee et al. [17] suggested that several unobserved factors, such as road congestion and the design and materials of roads that were built in the same period and location, were spatially correlated. Also, Li et al. [50] discovered that the compactness of the urban environment is positively associated with traffic congestion.

Conclusions
We developed statistical models to investigate the relationship between driver groups classified by age and vehicle miles traveled. The outcomes from the structured models provided some reasonable answers to the research questions, which focused on the low-mileage bias. First, we confirmed that age groups besides older drivers were also associated with VMT, and that low mileage may contribute to more crashes regardless of driver age. Due to impairments caused by aging, such as visual and cognitive impairment, senior drivers have previously been targeted for this phenomenon, as compared with middle-aged and young drivers. However, the low-mileage bias was revealed not only in the crash model for senior drivers but also in the crash models for the other age groups. The results of this study suggest that low-mileage bias exists for senior drivers, drivers aged 25-64, as well as young drivers. The results imply that, regardless of age, driving less could lead to more chances of being involved in crashes.
Second, the results of the models support the hypothesis that the number of crashes and surrounding factors, such as the built environment and socio-economic and demographic factors, are closely related to each other. This was revealed not only in the results of the senior crash models but also in the models for the other age groups. We found that urban environment factors (e.g., demographics, socio-economic factors, and built environment) influence crashes, and the results were consistent with the literature in many cases, although some variables, such as the composition of the population by age and high school enrollment, could not be applied to all age groups. One plausible reason might be that a driver's driving behavior is influenced by characteristics of the same or close age groups in the area. For example, in areas where high school enrollment is high, teenage novice drivers might contribute to more crashes.
Third, we found spatial autocorrelation in the study area, and we used CAR models to deal with this issue. We found few differences between the results of the NB models and those of the CAR models regarding the signs of the variables and their coefficients.
The average age of the total population is increasing due to an aging society. The above findings provide a better understanding of low-mileage bias and the determinants of crashes. This might assist decision makers in urban policy and planning to devise regulations and policies based on the behavior of drivers by age. Our findings also encourage more safety-emphasized planning, such as road management, to improve sustainable traffic safety by reducing the complexity of the driving environment and increasing overall neighborhood safety.
There are avenues to be addressed by future research. First, further analysis might be conducted based on crash severity and the driver's age. Different types of damage caused by crashes may be differently associated with low-mileage bias. Second, follow-up research regarding traffic safety policies should be conducted based on the characteristics of driver age and sex in addition to low-mileage bias, which will bridge the gap between the built environment and driving behaviors.