Socioeconomic and Demographic Factors Effect in Association with Driver’s Medical Services after Crashes

Motor vehicle crashes are the third leading cause of preventable-injury deaths in the United States. Previous research has found links between the socioeconomic characteristics of driver residence zip codes and crash frequencies. The objective of the study is to extend earlier work by investigating whether the socioeconomic characteristics of a driver’s residence zip code influence their likelihood of resulting in post-crash medical services. Data were drawn from General Use Model (GUM) data for police crash reports linked to hospital records in Kentucky, Utah, and Ohio. Zip-code-level socioeconomic data from the American Community Survey were also incorporated into analyses. Logistic regression models were developed for each state and showed that the socioeconomic variables such as educational attainment, median housing value, gender, and age have p-values < 0.001 when tested against the odds of seeking post-crash medical services. Models for Kentucky and Utah also include the employment-to-population ratio. The results show that in addition to age and gender, educational attainment, median housing value and rurality percentage at the zip code level are associated with the likelihood of a driver seeking follow-up medical services after a crash. It is concluded that drivers from areas with lower household income and lower educational attainment are more likely to seek post-crash medical services, primarily in emergency departments. Female drivers are also more likely to seek post-crash medical services.


Introduction
Motor vehicle crashes in the US are the third leading cause of preventable injuryrelated deaths. In Kentucky, they rank as the second leading cause of injury-related deaths, and in Utah and Ohio, third. Deaths per 100 million vehicle miles traveled (VMT) in Kentucky have exceeded the national average since 1986 [1,2]. Previous research has demonstrated that socioeconomic and demographic characteristics of the zip codes in which drivers reside influence their involvement in crashes [3,4]. Past research investigated the effect of socioeconomic and demographic factors associated with the residence location of the drivers involved in a crash, using them as a surrogate descriptor [3][4][5] and using the socioeconomic data from the drivers' residence zip codes [6]. Income, poverty, employment, education, rurality, and driver age all appear to influence crash propensity [4,[7][8][9][10][11]. Previous research showed a well-defined relationship between educational attainment and crashes [3,7,12]. Race is a potential contributing factor as well [7]; however, research on the link between race and crashes is sparse. Another robust predictor of crash propensity is driver marital status [11]. Drivers who live in lower household income areas are more likely to be involved in a crash and be found as the at-fault party [4]. Age is another influence on crash propensity. Chen et al. [13] and Factor et al. [11] showed that young or new drivers are involved in more crashes and have higher fatality rates than other age groups.
Several research efforts have identified variables that can potentially impact crash occurrence and severity. While police-issued crash reports include data on severity, they are completed at the scene before full medical examinations are performed. This has motivated efforts to link crash records with hospital data to obtain more accurate and complete information on injuries and crash severity. For example, individuals involved in what police characterize as a possible injury crash (i.e., complaints of pain without visible injury) could result in post-crash medical services after experiencing discomfort. Had police known this information, they may have upgraded the crash severity. Integrating linked crash and hospital records into safety analyses can provide a more comprehensive view of how resources need to be allocated to prevent fatalities, reduce injury severity, and lower healthcare costs.
Such efforts began in 1996 when the National Highway Traffic Safety Administration (NHTSA) established the Crash Outcome Data Evaluation System (CODES) program [14]. The CODES Data Network developed a standardization model that maps state-specific crash files onto a standardized format called the General Use Model (GUM). Probabilistic linkages were used to combine information from motor vehicle crash reports and several types of medical data to develop GUM datasets. Integrating linked crash and hospital records into safety analyses can provide a more comprehensive view of how resources need to be allocated to prevent fatalities, reduce injury severity, and lower healthcare costs. Several previous research efforts utilized CODES data for studies such as injury assessments for bicycle and pedestrian crashes [15], seat belt studies [16] and helmet usage [17]. These research efforts have demonstrated the benefits of using linked data to estimate crash severity and improve driver safety.
While socioeconomic status is clearly associated with health condition and healthcare utilization, there has been limited research along this line. The current study investigates the socioeconomic factors associated with the driver's residence zip code that could influence the likelihood of the driver seeking post-crash medical services. Using data from Kentucky, Utah, and Ohio, this study investigates whether the socioeconomic conditions of the zip code in which a driver resides influence their likelihood of being involved in a crash that results in seeking post-crash medical services. Data were drawn from the GUM dataset that has linked police crash reports and hospital records. Measuring correlations between driver zip code socioeconomic characteristics and the need for seeking post-crash medical attention will improve our understanding of the factors most likely to identify those seeking post-crash medical services.
A person's socioeconomic status significantly impacts their healthcare service utilization [17]. Although healthcare delivery in the US has been transformed in recent years, the availability of newer and improved healthcare services does not guarantee equal availability of services to all people. A person's socioeconomic status significantly impacts their healthcare service utilization [18,19]. For example, Medicaid recipients are more likely to use emergency departments (EDs) than people who have other coverage as they have less access to ambulatory care [19]. Some research has found that poorer individuals are more likely to have a sedentary or unhealthy lifestyle and therefore are more likely to suffer poor health outcomes or endure disabling conditions than wealthier people who have greater access to resources [18,19]. This research investigates the association between a person's socioeconomic status and healthcare utilization post involvement in a crash. Practitioners can utilize the output of this research to propose potential improvements in health insurance policies.

Dataset Development
This study leveraged the GUM data from Kentucky, Ohio and Utah that linked motor vehicle crash, ED, and hospital discharge records. The analysis encompasses single and multi-vehicle crashes during the study period. Study periods varied by state-Kentucky, 2008-2014; Ohio 2009-2016; and Utah 2008-2018.
Information on driver residence zip codes was extracted from the state police databases and merged with GUM datasets. Data were cleaned to remove invalid entries. The final dataset consisted of 928,692 driver records for Kentucky, 3,078,229 for Ohio and 1,069,777 for Utah and included drivers between 15 and 90 years old. For this analysis, drivers were divided into six age groups-<20, 20-24, 25-39, 40-64, 65-74 and >74 years. The drivers were grouped in these groups to conform to the socioeconomic data. A representative control dataset was created by drawing from the unlinked records a random sample equal to the number of hospital-linked records. The sampling procedure ensured an equal distribution of crash severities between the linked and unlinked data. Table 1 summarizes the data from each state and indicates the number of cases for linked and unlinked driver records.

Socioeconomic Data
As discussed above, several studies show that socioeconomic and demographic factors associated with the residence location of the drivers can be used as a surrogate measure to describe crash occurrence. Income, poverty, employment, education, rurality, and driver age are some of the factors that influence crash propensity. As an attempt to investigate the association of these factors on healthcare service utilization post involvement in a crash, several socioeconomic and demographic factors were selected based on the recommendations from prior research findings. The list of variables selected and examined in this study is presented in Table A1 in Appendix A. Several zip-code-level socioeconomic data for all three states were extracted from the American Community Survey [20] data. The data collected were variables associated with Race, Housing, Marital Status, Education, Income and Others (such as employment by population ratio, percent rural etc.). The data were combined with each state's GUM dataset. Additional socioeconomic variables were derived from Census data. The proportion of the population eligible to drive was calculated and incorporated as an explanatory variable. Driver population density for each zip code was calculated using measured zip code areas. Data on driver age and gender were extracted from crash data.

Statistical Analysis
The response variable of the study is whether a driver sought medical services post involvement in a crash, while the predictor variables are the socioeconomic characteristics of the driver's residence zip code. Logistic regression was selected because the response variable is whether a driver's crash record is linked to a hospital database and is categorical.
In logistic regression, the response variable's expected values are modeled based on the probability-or odds-of it taking a particular value based on combinations of predictor values. The natural logarithm of the odds (i.e., log-odds or logit function) is defined in Equation (1). Here, p is the probability of a driver seeking post-crash medical services The probability takes its final form as such: where f (X) = a + b 1 X 1 + b 2 X 2 + . . . + b n X n is the regression model, Xi is the ith explanatory variable, a is the intercept and bi is the ith coefficient estimated using the maximum likelihood method.

Model Development Process
Correlation and recursive partitioning analysis served as the starting point for logistic regression by winnowing potential explanatory variables. Correlation was used to explore relationships between the response variable and socioeconomic variables. Recursive partitioning analysis was also used to understand the association between explanatory and response variables. The method generates an importance score that captures the relative importance of explanatory variables in predicting the response variable. When building logistic regression models, the outputs from the two analyses were utilized.
Several potential interactions between the socioeconomic variables might influence crash occurrence. Potential interactions were identified using R's Shiny package, which employs the Feasible Solution Algorithm (FSA) to detect statistical interactions in large datasets [21]. This approach enables the formulation of new models or improvements to existing models. FSA allows higher-order interactions; however, for this study, two-way interactions are used.

Correlation Test
Kentucky All racial categories, except for the percent of zip code identifying as white, were significantly but weakly correlated with the response variable. Median housing value, marital status, and educational attainment were also significantly correlated. Drivers from zip codes with lower educational attainment were more likely to be involved in a crash and seek post-crash medical services; percentage of residents in a zip code with less than a high school degree was identified as a potential candidate for describing educationrelated effects. All income levels were significantly related to the response variable, as were all variables in the Other category, although employment by population showed a stronger association. Driver age and gender have a well-established relationship with crash occurrence, qualifying them as potential predictor variables.
Ohio All explanatory variables were weakly correlated with the response variable. Education variables returned the highest correlations with a negative correlation with the response variable. Three candidate explanatory variables were not statistically significantly correlated with seeking post-crash medical services-driver population, renter occupied housing units and area of zip code.
Utah Data from Utah yielded results similar to Kentucky. Weak correlations were observed between the candidate and the response variables. The variables displaying the highest absolute correlations were percent bachelor's degree or higher, median housing value, and mean individual income. Most correlations were statistically significant. Only three lacked a statistically significant relationship with the response variable-percent white, percent other races, and renter-occupied housing units. All variables in the marital, education, income and Other categories were statistically significantly correlated with the response variable.

Recursive Partitioning Analysis
Recursive partitioning analysis identified several factors, including education, housing value, rurality, employment, and income, as possible explanatory variables. Table A2 in the Appendix A lists the recursive partitioning analysis for the variables deemed most important for each state.
In all states, gender ranked as the highest or second highest predictive variable. Age was also of high importance, along with some socioeconomic variables. Among the zipcode-level economic predictors, median individual income ranked highly in all states, while median housing value was highly important in Kentucky and Ohio. Two predictors related to educational attainment-the percentage of residents with at least a bachelor's degree and the percentage with at least some high school education-had high importance. In all states, the percentage of rural areas in each zip code was of high importance. In short, the recursive partitioning analysis suggested the potential for developing state-specific models using common variables, thus facilitating comparisons.

Final Model
For each state, several models were tested and evaluated (using goodness-of-fit measures such as AIC, BIC, AUC and percent correctly predicted) to select a final model. A training and validation approach was used for the model development to estimate accuracy in predicting the response variable-20% of observations were placed in the training dataset and 80% in the validation dataset. Along with the main effects, possible interactions were explored to improve each model's mathematical stability. Table 2 lists the parameters of each state's final model. All variables were significant (p < 0.001) except those noted in bold. Models were tested for potential interactions between variables. No statistically significant interactions were identified. Training and validation analysis found that the models predict approximately 57% of the data correctly in each state.
All state models include zip-code-level socioeconomic variables such as percent with bachelor's degree (BS), median housing value (HVL), and percentage rural (RUR). Gender and age are common variables as well. Models for Kentucky and Utah also incorporate employment-to-population ratio (EMP). Socioeconomic predictors (e.g., BS, HVL, EMP) are negatively related to the response variable, while RUR exhibits a positive association. The impact of these socioeconomic factors on crashes is supported by several re-search studies but is not limited to Hasselberg [7]. Table A3 in Appendix A shows the summary statistics of the continuous predictor variables in the models for all three states.
The odds ratio for gender indicates that female drivers are more likely to be involved in a crash seeking post-crash medical services. Figure 1 shows the odds ratio for different age groups in each of the three states. Odds ratios suggest that drivers in the age category of 20-24 years are more likely to be involved in a crash resulting in medical services. This trend decreases with age before increasing for older groups. The inflection point differs by state. In Kentucky and Ohio, the reversal occurs for drivers 65 and older, while in Utah, the threshold is 40 years of age. All state models include zip-code-level socioeconomic variables such as percent with bachelor's degree (BS), median housing value (HVL), and percentage rural (RUR). Gender and age are common variables as well. Models for Kentucky and Utah also incorporate employment-to-population ratio (EMP). Socioeconomic predictors (e.g., BS, HVL, EMP) are negatively related to the response variable, while RUR exhibits a positive association. The impact of these socioeconomic factors on crashes is supported by several re-search studies but is not limited to Hasselberg [7]. Table A3 in Appendix A shows the summary statistics of the continuous predictor variables in the models for all three states.
The odds ratio for gender indicates that female drivers are more likely to be involved in a crash seeking post-crash medical services. Figure 1 shows the odds ratio for different age groups in each of the three states. Odds ratios suggest that drivers in the age category of 20-24 years are more likely to be involved in a crash resulting in medical services. This trend decreases with age before increasing for older groups. The inflection point differs by state. In Kentucky and Ohio, the reversal occurs for drivers 65 and older, while in Utah, the threshold is 40 years of age.

Discussion
This research is one of the first studies that attempts to correlate the socioeconomic factors of a driver involved in a crash seeking post-crash medical services. This study confirms that drivers living in zip codes with lower housing values are more likely to seek post-crash medical services.
One explanation is that residents of lower-income zip codes are more likely to have more pre-existing conditions [22] and therefore suffer greater impacts when involved in a motor vehicle crash. Ukert et al. [23] discuss the disparities in healthcare use among lowand high-income communities. They concluded that HDHPs (High Deductible Health Plans) discourage routine checkups among low-salary employees. It is scientifically established that the disparities in healthcare utilization differ between communities of different

Discussion
This research is one of the first studies that attempts to correlate the socioeconomic factors of a driver involved in a crash seeking post-crash medical services. This study confirms that drivers living in zip codes with lower housing values are more likely to seek post-crash medical services.
One explanation is that residents of lower-income zip codes are more likely to have more pre-existing conditions [22] and therefore suffer greater impacts when involved in a motor vehicle crash. Ukert et al. [23] discuss the disparities in healthcare use among lowand high-income communities. They concluded that HDHPs (High Deductible Health Plans) discourage routine checkups among low-salary employees. It is scientifically established that the disparities in healthcare utilization differ between communities of different socioeconomic statuses [24][25][26]. The impact of these socioeconomic factors on crashes is supported by several research studies but is not limited to Hasselberg [5], and Adanu et al., 2017 [7].
Another reason is that persons with lower socioeconomic status are more likely to use EDs. Current healthcare delivery in the US does not guarantee equal availability of services to all people. A person's socioeconomic status significantly impacts their healthcare service utilization [18,19]. Research has documented that Medicaid recipients are more likely to use EDs than people who have other coverage [19]. In the Kentucky and Utah GUM data, most of the linked records (85% for Kentucky, 91% for Utah) are from ED admissions, which substantiates the assertion that drivers from lower economic areas are over-represented in the database.
Overreliance on ED services is a serious issue in the US. Many patients who visit EDs do not require urgent services [27]. Ukert et al. [23] showed that lower-salary employees in HDHPs underutilize outpatient care and overutilize EDs. These individuals may lack or be unaware of other care options and depend on ED services for non-urgent problems. Bilings et al. [28] suggested that changing the way people use EDs will require improvements to the primary care system. Potential solutions to this dilemma include expanding the availability of affordable health insurance coverage, improving access to primary healthcare services, and making telemedicine consultations more accessible. Primary care clinics can be better rewarded for providing a lower-cost alternative to ED use and for preventing emergency situations from developing [28].
Discrepancies in age trends between Utah, Kentucky and Ohio could reflect differences in driver populations and characteristics. By area, Utah is the thirteenth largest state in the US; however, beyond the Interstate 15 corridor, the state is sparsely populated (ACS) and has a population density of 39.01 people per square mile. The state has a low percentage of older drivers (11.9%) and the youngest median age in the US. Conversely, Kentucky's population density is 113.12 persons per square mile, and 16.9% of the population is classified as older, while Ohio's population density is 286.13 persons per square mile, and 17.5% of the population is older. These differences between Utah and the other two states could explain the dissimilar age trends shown in Figure 1.

Conclusions
The mathematical model developed in the study attempted to correlate the socioeconomic factors of a driver involved in crash seeking post-crash medical services. The study concludes that (1) drivers residing in areas with lower socioeconomic status and (2) female drivers are more likely to seek post-crash medical services.
This study is one of the few studies that have investigated the relationship of the drivers' socioeconomic factors with their seeking of post-crash medical services. However, the study carries several limitations. First is its dependence on GUM data, which primarily uses police-reported crash records. Upwards of 10 million crashes go unreported each year [29]. Furthermore, some drivers might seek medical attention regardless of injury severity. Crashes that do not appear in crash records are not traceable and thus excluded from analysis, which could potentially bias the findings. However, this is unavoidable. The linkage approach used to develop the GUMs data is probabilistic and imperfect. Furthermore, this study used socioeconomic data derived from the ACS at the zip code level and assumed that information observed at the zip code level also holds true at the individual level. When studies are conducted at the group level, this ecological fallacy is common and irresolvable. Crash types were not factored into the analysis presented here, and future research should take a closer look at the relationship between crash type and crash severity, as this may contribute to whether a driver seeks medical attention.