1. Introduction
The 2018 World Health Organization report showed that every year more than 1.25 million people die because of road traffic crashes. Non-fatal injuries were also recorded among about 50 million people, and these injuries usually result in permanent disabilities [1]. For the fiscal year 2010, approximately $240 billion worth of loss resulted from motor-vehicle crashes, representing about 2% of the United States (U.S.) Gross Domestic Product [1]. It is projected that by the year 2030, if no preventive measures are implemented, road traffic crashes will become the seventh leading cause of death [1].
In the U.S., statistical investigations have shown a reduction in traffic collisions between the years 2003 and 2011. Nevertheless, traffic fatalities and fatality rates increased in the year 2012 [1], which emphasizes the need to examine not only collision frequency but also injury severity and fatalities. In 2015, there were approximately 6.3 million police-reported motor vehicle traffic crashes, of which 0.6% (32,166) were fatal, 27.2% (1,715,000) were injury crashes, and 72.2% (4,548,000) were property-damage-only crashes [2]. Compared to 2015, records showed that the year 2016 had even more fatal crashes (34,439) and more fatalities [2].
The Centers for Disease Control and Prevention reported that motor-vehicle crashes were the leading cause of death for individuals aged 10 and aged 16–23 in 2015 [2]. According to the injury facts produced by the National Safety Council in 2017, motor vehicle crashes account for the most deaths among individuals aged 1–24 and are the second leading cause of death for individuals aged 25–84. Furthermore, six deaths are recorded every day for teens between the ages of 16 and 19 as a result of motor vehicle collision injuries. The likelihood of teen drivers being involved in fatal crashes is almost three times that of drivers aged 20 and older [3].
In 2015, hospital emergency departments treated approximately 250,000 older adults for motor vehicle collision injuries, with an additional 7000 individuals dying from their injuries. This means that about 19 older adults are killed and about 700 are injured in crashes on a daily basis [3]. More specifically, in 2014, California ranked as the second-highest state for young driver (aged 15–20 years old) crashes in the United States, after Texas [2]. It is also important to know that road crashes cost the U.S. an average of $820 per person per year.
With these facts, the frequency and severity analyses of collisions have been the two major branches for assessing traffic safety [1]. Collision frequency analyses are conducted at a macroscopic level (county, state, traffic analysis zones, area-wide, etc.), while severity analyses are done at a microscopic level (driver, occupant, pedestrian, etc.). Frequency analysis considers social, economic, and demographic factors, which relate to the large population of people in an area.
Severity analysis is conducted with individual crash victim characteristics. Factors considered include the age, gender, driving ability, and other variables personal to the victim. Information related to this is usually reported by the police. Using accident reconstruction strategies, law enforcement agents can give a near-accurate description of events that occurred right before the crash [2]. Along with testimonies from witnesses, an overview of the crash causes and resulting injuries can be provided in official reports.
In modeling crashes, different explanatory variables have been used in the past. Roadway (speed limit, road surface, number of lanes, lane width, curvature, grade, paving, roadway class, roadway function, alignment, AADT, shoulder width, barrier, etc.), driver (age, gender, seatbelt use, seating position, driver alcohol involvement, familiarity with road, etc.) and environmental (lighting condition, weather, time of day, day of the week, etc.) factors have been a point of focus for studies [3,4,5].
Existing studies have observed that factors such as inadequate roadway lighting conditions, bad weather conditions, and the number of vehicles involved can significantly contribute to an increased injury severity for drivers [6]. In addition, Uddin and Huynh [7] established that truck drivers carrying hazardous materials have higher injury severity outcomes compared to passengers inside the same truck. The same researchers also found that on non-interstate highways, truck occupants are about 50% more likely not to sustain any injury in the event of a traffic collision [7].
A taxi–pedestrian crash analysis [2] provided insights on factors affecting the severity of injuries. Results showed that speed and the absence of a roadway median were among the major contributors to increased injury severity. The study also pointed out that collisions between a vehicle moving in a straight path and a pedestrian are usually less severe.
Haleem et al. analyzed pedestrian–intersection crash-related injuries using the mixed logit model. Two severity levels (fatal and non-fatal) were considered in crash modeling. Results showed that the truck composition in the traffic, posted speed limit, the age of the pedestrian, the annual average daily traffic, time of crash, lighting, and weather conditions were among the significant predictor variables. At unsignalized intersections, they found that the type of crosswalk, bad lighting, age of the pedestrian, and condition of the road surface were significant in crash injury severity prediction [4].
Additionally, Behnood et al. used the latent class model to analyze the effect of age, gender and alcohol consumption on driver-injury severities. It was found that the blood alcohol content of drivers had a significant effect on injury severity outcomes among different genders [8]. Kim et al. [9] also found that the likelihood of fatal injury increased for male drivers. They also found that an older driver driving an older vehicle can potentially increase crash injury severity.
Human factors are among the most relevant determinants of crash severity. In most cases, crashes occur due to human driving errors. This has led to many studies which consider human-related characteristics in determining injury severity outcomes. Some studies [10,11] have considered severity prediction in terms of the at-fault driver, while others have considered age [12,13], sex [14,15,16], and experience [17]. All these studies have pointed to the significance of human factors in injury severity outcomes.
Specifically, age-related differences have been established. Ulak et al. [18] analyzed crashes in northern Florida based on different age groups. The study found that older driver crashes differ completely from crashes involving other age groups, both spatially and temporally. They found that older driver crashes are more frequent in downtown areas that are highly populated with older individuals. Considering novice teenage drivers, Braitman et al. [19] discovered that the teenage driver was at fault in 75% of crashes involving teenagers. Donmez and Liu [12] observed that driver distractions, such as texting on phones and chatting with passengers, resulted in increased injury severity for young and middle-aged drivers. They also found a closer relationship between crash characteristics for young and middle-aged drivers than between older drivers and any other age group.
The investigation of crash injury severity has not been limited to motor-vehicles. It has also been established that there are age-related differences in crash characteristics even for cyclists [20,21] and pedestrians [22]. For example, the cycling intensity of younger cyclists was found to be a significant predictor of risky behavior. On the other hand, cycling intensity was not associated with risky behavior for older cyclists [20]. Deb et al. [22] found that older pedestrians showed more positive behaviors than younger pedestrians. In fact, they found that there was not much difference between young and middle-aged pedestrian behaviors, but there was a significant difference between older pedestrians and other age groups.
With all these studies reviewed, the first inference is that human factors are major considerations in examining crash characteristics and considering possible solutions to achieve the objective of safer roadways. More importantly, a recent study [23] provided the most up-to-date investigation of aging crash risks in the U.S. Although it focused on intersections, the study found that older drivers are facing numerous challenges as a result of natural aging and decline in sensory, perception, cognitive, and vehicle navigation skills. This calls for intervention, as the U.S. population is continually aging.
Specifically for the state of California, it was noted earlier that the state was recently ranked as the second-highest state for young driver crashes. With this, it is important to find a means for improving the roadway safety performance for all age groups. This calls for further investigation. As a result, the objective of this study was to obtain insights on factors that are significant determinants of crash severity for different age groups and to provide some recommendations on how to possibly improve the roadways equally for all age groups. To the best of our knowledge, this study is the most recent study that considers age-related differences, specifically for the state of California. In 2013, Kim et al. [9] conducted a similar study but only investigated single-vehicle crashes; they did not consider each age group differently.
Using the Highway Safety Information System (HSIS) data, this study sought to identify factors contributing to motor-vehicle injury severities by driver age in the state of California. Three different age groups were identified: Young drivers (aged 16–25), middle-aged drivers (aged 26–64) and older drivers (aged 65 and above). We examined the effect of different characteristics on the likelihood of injury fatalities for each age group. We also examined the difference across each of the age groups.
This report continues with a methodology section that describes the data used, the variables used in developing the model, the model formulation, and model development. The result section follows the methodology section, and it gives the results obtained for each age group and provides a detailed discussion of the results. Finally, the conclusion section provides a general explanation of the research objective, the results obtained, and recommendations in line with this study. The last section provides the limitations of this study suggested for future research.
2. Materials and Methods
2.1. Data
The HSIS is a coordinated effort by the U.S. Federal Highway Administration (FHWA) in which nine (9) member states (California, North Carolina, Illinois, Ohio, Maine, Utah, Michigan, Washington, and Minnesota) voluntarily provide the FHWA with crash data. It is a system that provides rich data on roadway and accident variables. The roadway file contains information on roadway type, including the number of lanes, shoulder type, median and shoulder width, and other relevant geometric characteristics. The accident file contains the vehicle and occupant information, which includes the type of vehicle, occupant age, sex, accident severity, weather condition, and other environmental variables. The HSIS database does not provide personal and/or confidential details about the crash victims. Therefore, there are no risks to the privacy of the victims. Since the HSIS only provides previously published data, this study raised no ethical concerns.
Crashes for California from the years 2005–2014 were obtained from the HSIS database. The data contain only police-reported crashes for the state of California. The crash files were first merged for accidents, vehicles, and roadway characteristics. The merging was done using case numbers, and then the roadway characteristics were incorporated using the milepost variable from the accident and roadway files. Once the 10-year data were successfully merged, crashes involving motor-vehicles were extracted as the required data for this study. There was a total of 1,638,013 crashes over the 10-year period.
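The merging step described above can be illustrated with a minimal sketch. All field names (`case_no`, `route`, `milepost`, `beg_mp`, `end_mp`, `lanes`) and the toy records are hypothetical and are not the actual HSIS variable names; matching a crash to a roadway segment by route and milepost range is likewise an assumption about how the milepost variable was used.

```python
# Toy records with hypothetical field names (not the real HSIS variables).
accidents = [
    {"case_no": "A1", "route": "005", "milepost": 12.4, "severity": "O"},
    {"case_no": "A2", "route": "005", "milepost": 30.1, "severity": "B"},
]
vehicles = [
    {"case_no": "A1", "driver_age": 19},
    {"case_no": "A1", "driver_age": 47},
    {"case_no": "A2", "driver_age": 71},
]
roadway = [  # assumed: each segment covers [beg_mp, end_mp) on a route
    {"route": "005", "beg_mp": 0.0, "end_mp": 20.0, "lanes": 2},
    {"route": "005", "beg_mp": 20.0, "end_mp": 40.0, "lanes": 4},
]


def find_segment(route, milepost, segments):
    """Return the roadway segment containing the milepost, or None."""
    for seg in segments:
        if seg["route"] == route and seg["beg_mp"] <= milepost < seg["end_mp"]:
            return seg
    return None


def merge_crash_files(accidents, vehicles, roadway):
    """Join vehicles to accidents by case number, then attach roadway data."""
    by_case = {acc["case_no"]: acc for acc in accidents}
    merged = []
    for veh in vehicles:
        acc = by_case.get(veh["case_no"])
        if acc is None:
            continue  # vehicle record without a matching accident record
        seg = find_segment(acc["route"], acc["milepost"], roadway)
        if seg is None:
            continue  # crash location not covered by any roadway segment
        merged.append({**acc, **veh, "lanes": seg["lanes"]})
    return merged


merged = merge_crash_files(accidents, vehicles, roadway)
```

Each merged row then describes one driver, which is the unit needed for a driver-level severity model.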
Rows containing empty data points were removed from the overall crash data. In cases where a variable had too many empty data points, the variable was removed entirely. After data cleaning and filtering, there were 17 variables left for model development. The total remaining number of crashes was 1,520,465. The data were then grouped into crashes involving young (16–25 years old), middle-aged (26–64), and older (above 64 years) drivers. All 17 variables obtained after cleaning and filtering were used to develop the model at the initial stages. The variables fall into four major categories: Roadway, driver, accident, and environmental characteristics. The variables were converted to categorical variables.
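As a rough illustration of the cleaning and grouping steps, the sketch below drops sparsely populated variables, then drops incomplete rows, and finally assigns each driver to an age group. The 50% missing-data cutoff is an assumption (the study does not state the threshold used), and the field names and records are hypothetical.

```python
MAX_MISSING_SHARE = 0.5  # assumed cutoff; the study does not report one


def clean(rows, max_missing_share=MAX_MISSING_SHARE):
    """Drop variables that are mostly empty, then rows with any empty value."""
    if not rows:
        return []
    n = len(rows)
    keep = [
        col for col in rows[0]
        if sum(r.get(col) is None for r in rows) / n <= max_missing_share
    ]
    return [
        {col: r[col] for col in keep}
        for r in rows
        if all(r.get(col) is not None for col in keep)
    ]


def age_group(age):
    """Age bands as defined in this study."""
    if 16 <= age <= 25:
        return "young"
    if 26 <= age <= 64:
        return "middle-aged"
    if age >= 65:
        return "older"
    return None  # drivers under 16 fall outside the study groups


rows = [
    {"driver_age": 19, "weather": "clear", "shoulder": None},
    {"driver_age": 47, "weather": None, "shoulder": None},
    {"driver_age": 71, "weather": "rainy", "shoulder": None},
    {"driver_age": 33, "weather": "clear", "shoulder": "paved"},
]
cleaned = clean(rows)
groups = [age_group(r["driver_age"]) for r in cleaned]
```

Removing sparse variables before removing incomplete rows matters: doing it in the opposite order would discard many crashes only because of a variable that is later dropped anyway.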
Injury severity in the HSIS database is classified into five (5) levels using the KABCO scale: fatal injury (K) and four non-fatal levels, namely severe/incapacitating injury (level 1, A), non-incapacitating/other visible injury (level 2, B), possible injury (level 3, C), and no injury (level 4, O). Each of these crash severity levels was accounted for in modeling severity factors in this study.
2.2. Model Formation
To identify factors contributing to the crash severity for each age group, this study used the multinomial logit (MNL) modeling technique. There are numerous models used in crash severity analysis in the literature. However, MNL is the most commonly used statistical approach for establishing the significance of predictor variables in determining the outcome of response variables [24]. The MNL model does not put a restriction on the coefficient of predictor variables in different severity functions, thereby making it possible to observe the impact of a single variable on different injury outcomes [25].
The dependent variable (injury severity) was coded as a polytomous nominal response, which is suitable for the MNL model. The MNL model explains the odds of an outcome being in one injury severity category relative to the reference category, k, by assuming that the log-odds are a linear combination of the independent variables. As a result, the probability of each outcome is a nonlinear function of the p predictor variables. The MNL model is given as:

$$\ln\left(\frac{\pi_j}{\pi_k}\right) = \beta_{j0} + \sum_{i=1}^{p} \beta_{ji} x_i, \qquad j = 1, \ldots, k-1$$

in which $\pi_j$ is the probability of the resulting prediction being in category j, k specifies the number of categories of the dependent variable, and p provides the number of independent variables. The total of $k-1$ equations are solved using the iteratively reweighted least squares optimization procedure, which finds the maximum likelihood estimates of the coefficient of each independent variable. The resulting coefficient provides the effect of the predictor variable on the odds of being in one injury severity category j against the selected reference category k. The resulting outcome is given as:

$$\frac{\pi_j}{\pi_k} = \exp\left(\beta_{j0} + \sum_{i=1}^{p} \beta_{ji} x_i\right)$$
The model development considered only the crashes related to motor-vehicles for all age groups. The MNL models were developed using MATLAB statistical software. The HSIS data used the KABCO scale as described earlier, and a baseline injury severity level had to be selected for the MNL. This study used the “no-injury” severity level as the baseline for the other four severity levels. Therefore, the MNL provided a set of four coefficients, representing the other four severity levels versus the base category (no-injury).
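To make the role of the baseline concrete, the following sketch converts one set of coefficients per non-baseline severity level into predicted probabilities for all five KABCO levels, assuming the standard MNL form in which the log-odds of each level against no-injury (O) are linear in the predictors. The coefficient values and the two predictors are invented for illustration only and are not results from this study; the study itself used MATLAB, and Python is used here purely as an example.

```python
import math


def mnl_probabilities(coefs, x):
    """Predicted probabilities for each severity level, baseline "O".

    coefs maps each non-baseline level to (intercept, [betas]);
    x is the list of predictor values in the same order as the betas.
    """
    # Linear predictor = log-odds of each level versus the baseline.
    log_odds = {
        level: b0 + sum(b * xi for b, xi in zip(betas, x))
        for level, (b0, betas) in coefs.items()
    }
    # The baseline contributes exp(0) = 1 to the normalizing sum.
    denom = 1.0 + sum(math.exp(v) for v in log_odds.values())
    probs = {level: math.exp(v) / denom for level, v in log_odds.items()}
    probs["O"] = 1.0 / denom
    return probs


# Invented coefficients for two hypothetical indicator predictors.
coefs = {
    "K": (-5.0, [0.8, 1.2]),
    "A": (-3.5, [0.5, 0.9]),
    "B": (-2.0, [0.3, 0.4]),
    "C": (-1.5, [0.1, 0.2]),
}
probs = mnl_probabilities(coefs, [1, 0])
```

Because each level has its own coefficient vector, the same predictor can raise the odds of one severity level while lowering the odds of another, which is the flexibility the MNL offers.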
3. Results
The distribution of frequency for each crash level is shown in Figure 1 for all age groups under consideration. Similar trends were observed in the crash frequency distribution for all age groups. Older drivers had a higher percentage of fatal crashes over the 10-year period compared to young drivers. They also had fewer non-incapacitating crashes compared to young and middle-aged drivers. This result is reasonable considering the physical frailty of older drivers. Older drivers are more susceptible to external forces than other age groups, which results in greater injury severity.
Table 1, Table 2 and Table 3 give the proportions (prop.) of the categorical variables used. The tables show the frequency distribution of crash severity levels for each age group based on each of the variables. In terms of environmental factors, for young drivers, 80% of daylight crashes were no-injury crashes, while 76% of crashes in dark, street-lighted conditions were no-injury (Table 1). Similarly, middle-aged drivers had 80% no-injury crashes in daylight and 78% in dark, street-lighted conditions (Table 2). On the other hand, older drivers had 77% no-injury crashes in daylight and 79% no-injury crashes in dark, street-lighted conditions (Table 3). The same trends were observed for young and middle-aged drivers, and a different trend was observed for older drivers. Older drivers possibly take more precaution while driving at night than they do in the day. Clear weather conditions provided the highest no-injury crashes for all age groups with 78%, 79%, and 77% for young, middle-aged, and older drivers, respectively (Table 1, Table 2 and Table 3).
For roadway factors, divided and paved medians gave the highest no-injury crashes for all age groups. This is logical, as paved medians provide proper separation between two opposing traffic streams, and can reduce the risk of vehicle-to-vehicle head-on collisions, potentially reducing injury severity. Furthermore, all age groups had more injury-involved crashes on rolling terrains compared to flat terrains. When the road surface was wet, 76% of crashes were no-injury for young and middle-aged drivers (Table 1 and Table 2), while 74% were no-injury for older drivers (Table 3). This indicates that more injury-involved crashes were recorded for older drivers in such conditions.
Accident characteristics reveal that head-on collisions produced the most injury-involved crashes compared to sideswipe and rear-end collisions: 45%, 47%, and 54% of head-on collisions had some level of injury for young, middle-aged, and older drivers, respectively. Finally, driver characteristics reveal that the nonuse of seatbelts and left-turn maneuvers resulted in more injury for drivers. In summary, the frequency distribution of the crashes reveals already-established facts and also points out the potential similarities for crashes involving young and middle-aged drivers. More insights can be obtained using the modeling technique already described in the methodology section.
The modeling results for all crashes involving all age groups are provided in Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9. To explore insights from these results, we will explain the effect of each variable on the injury severity outcome based on the category of the variable. The variables were already categorized into four: Driver, accident, environmental, and roadway characteristics. Each consecutive pair of tables provides the results for each age group, resulting in six tables for the three age groups.
We will explain the results in terms of the coefficient for each predictor variable, the p-value, and the odds ratio (O.R.). The coefficient values determine the weight of a specific predictor variable on the probability of a crash injury severity level relative to the baseline. The p-values indicate the significance of a predictor variable in explaining a crash outcome at a specific confidence level. In this study, we used a confidence level of 90%. Therefore, a p-value ≤ 0.1 denotes a significant variable. The O.R. is the exponential of the logit coefficient, which states by what magnitude a unit increase in a variable changes the odds of a crash severity outcome when all other factors are kept constant.
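The relationship between a logit coefficient, its odds ratio, and the 90% significance rule can be expressed in a few lines. The function name and the example values below are illustrative only and are not taken from the result tables.

```python
import math

ALPHA = 0.10  # significance threshold corresponding to the 90% level


def summarize(coef, p_value, alpha=ALPHA):
    """Odds ratio and significance flag for one logit coefficient."""
    return {
        "odds_ratio": math.exp(coef),
        "significant": p_value <= alpha,
    }


# Illustrative values: a coefficient of ln(2) doubles the odds of the
# severity level relative to the baseline, but p = 0.2 is not significant.
example = summarize(math.log(2), 0.2)
neutral = summarize(0.0, 0.05)
```

An odds ratio above 1 means the variable increases the odds of that severity level relative to no-injury, an odds ratio below 1 means it decreases them, and an odds ratio of exactly 1 (coefficient of 0) means no effect.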
From applying the MNL, the resulting mathematical prediction equations are provided in Equations (5)–(7) for young, middle-aged, and older drivers, respectively. Only the mathematical equations for fatality prediction are provided here. Equations for other injury severity levels take the same form as provided by the coefficients in the table.
where:
= Lighting (1 if daylight, 0 otherwise)
= Lighting (1 if dark, street lighted, 0 otherwise)
= Day (1 if weekend, 0 otherwise)
= Median type (1 if undivided, not striped, 0 otherwise)
= Median type (1 if undivided, striped, 0 otherwise)
= Median type (1 if divided, paved, 0 otherwise)
= Median type (1 if divided, unpaved, 0 otherwise)
= Terrain type (1 if flat, 0 otherwise)
= Terrain type (1 if flat, 0 otherwise)
= Roadway type (1 if urban, 0 otherwise)
= Functional class (1 if minor arterial, 0 otherwise)
= Functional class (1 if major collector, 0 otherwise)
= Accident type (1 if head-on, 0 otherwise)
= Accident type (1 if sideswipe, 0 otherwise)
= Accident type (1 if rear-end, 0 otherwise)
= Weather (1 if clear, 0 otherwise)
= Weather (1 if rainy, 0 otherwise)
= Weather (1 if cloudy, 0 otherwise)
= Weather (1 if windy, 0 otherwise)
= Road surface (1 if dry, 0 otherwise)
= Road surface (1 if wet, 0 otherwise)
= Vehicle deficiency (1 if none, 0 otherwise)
= Gender (1 if male, 0 otherwise)
= Seatbelt (1 if not used, 0 otherwise)
= Maneuver type (1 if left turn, 0 otherwise)
= Sobriety (1 if not driving under influence, 0 otherwise)
= Number of lanes (1 if 2 lanes, 0 otherwise)
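The indicator definitions above follow a common dummy-coding pattern: each modeled level of a categorical variable becomes a 0/1 variable, and a record whose level is not listed falls into the reference category with all indicators equal to 0. A minimal sketch of that coding, using a subset of the variables listed and a hypothetical helper name, is:

```python
def to_indicators(record, categories):
    """Dummy-code a crash record.

    categories maps each variable to the levels that get their own
    indicator; any other level is the reference (all indicators 0).
    """
    coded = {}
    for variable, levels in categories.items():
        for level in levels:
            coded[f"{variable}={level}"] = 1 if record.get(variable) == level else 0
    return coded


categories = {
    "lighting": ["daylight", "dark, street lighted"],
    "weather": ["clear", "rainy", "cloudy", "windy"],
}
# "other" is a hypothetical unlisted weather level used to show the
# reference-category case.
record = {"lighting": "daylight", "weather": "other"}
coded = to_indicators(record, categories)
```

Leaving one level without its own indicator avoids perfect collinearity among the indicators of a variable, which would otherwise make the coefficients unidentifiable.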
5. Conclusions
This study presents an injury severity modeling of motor-vehicle crashes in the state of California. This was done based on age groups. Three age groups were identified for the analysis: Young drivers (aged 16–25), middle-aged drivers (aged 26–64) and older drivers (aged 65 and above). The data used were obtained from the Federal Highway Administration’s Highway Safety Information System. Certain variables were selected from the data after cleaning. The data include roadway characteristics, environmental characteristics, driver characteristics, and accident characteristics.
The results showed similar trends between young and middle-aged driver injury severity outcomes. The driving behavior for both age groups tended to be similar, resulting in a closely related impact of predictors on the injury severity for both groups. Older drivers, on the other hand, had distinct results. The results showed that older drivers tend to have a regular driving behavior regardless of changes in roadway characteristics. As a result, changes in such factors did not readily have a significant effect on the injury severity outcome. The factors that had more influence on their driving ability were accident type and weather conditions.
There are some contributions that can be directly obtained from the findings in this study. As the results indicated, environmental conditions, especially weather conditions, influence older driver crash injury severity the most. This could inform transportation professionals of the possibility of reducing older driver injury severity by providing policies that put some level of restriction on aging drivers in adverse weather and bad lighting conditions. These policies could be implemented in the form of variable message signs (VMS) that are only active during bad environmental conditions. These VMS methods would not in any way affect the driving of other age groups.
Furthermore, the study found that young and middle-aged drivers’ injury severities have very close relationships. This implies that safety-improvement strategies can be implemented for one of these age groups and have an equal impact on the other age group. Emerging transportation systems management and operations (TSMO) strategies can be directed towards the safety of young and middle-aged drivers to simultaneously reduce injury severity for both age groups.
Some limitations in this study should be considered for future exploration. First, only crashes involving motor-vehicles were considered, thereby limiting the scope of the conclusions to motor-vehicles alone. Future studies can investigate the insights that can be obtained by including crashes involving cyclists and pedestrians. Secondly, the age group distribution considered three sets, but it would be interesting to perform a more micro-level analysis by having crash subsets for each age group, similar to the one performed by Ulak et al. [32] for older drivers. Thirdly, instead of the multinomial logit model utilized in this study, a more advanced modeling technique could be implemented to account for possible model or data bias. Lastly, this study used the HSIS database, whose use in safety research is presently being encouraged by the U.S. FHWA. Other data sources can be used to perform the same analysis and compare the results. This can be a form of validation for the HSIS database.