1. Introduction
Car accidents continue to pose a significant public safety concern, with far-reaching impacts on communities, individuals, and infrastructure across the US. US Transportation Secretary Pete Buttigieg stated, “We continue to face a national crisis of traffic deaths on our roadways, and everyone has a role to play in reversing the rise that we experienced in recent years” [
1]. In 2022, 42,795 people died in motor vehicle traffic crashes in the US [
1]. The grim statistics surrounding car accidents underscore the pressing need for advanced analytical tools and predictive models to mitigate their devastating consequences. In response to this urgent societal challenge, this paper presents a comprehensive study that delves into the intricate dynamics of car accidents, aiming to provide novel insights and predictive capabilities that can contribute to a safer driving environment.
The prevalence of car accidents on American roads has spurred numerous efforts to understand their underlying causes, patterns, and trends [
2,
3,
4,
5]. These accidents result in human casualties, economic losses, and emotional distress. Additionally, they strain emergency response systems, healthcare facilities, and law enforcement agencies. To address this multifaceted challenge, it is helpful to use the power of advanced data analysis [
6], machine learning techniques [
7,
8,
9,
10], and geospatial visualization [
11,
12] to unravel the complex interactions that lead to accidents. Doing so can pave the way for proactive safety measures and targeted interventions, ultimately reducing the toll of casualties on society.
While it is standard practice to calculate the shortest route between a starting point and a destination, an often-overlooked aspect is the risk associated with each potential route. Existing navigation systems adopt the dangerous assumption that all roads are equally safe [
13]. This gap in analysis presents an opportunity for study, as it could greatly benefit drivers by providing them with essential information regarding the potential risks involved in their chosen routes. By integrating risk assessment into route planning, drivers can make more informed decisions, enhancing overall safety and efficiency on the road. Therefore, investigating and addressing this gap could contribute significantly to improving transportation systems and ensuring the well-being of drivers and passengers alike.
This research addresses the following question: Can the safest car routes across the US be determined? To answer it, two objectives were defined: (1) to forecast car accidents and (2) to determine the safest route between any two locations. The first objective was to predict the number of car accidents for the following years (2023 to 2025) by examining historical car accident data from 2016 to 2022. Statistical analyses, data mining techniques, and machine learning algorithms were applied to uncover patterns and trends, and neural network (NN) techniques were then used to predict car accident occurrences in subsequent years. The second objective was to build a model that quantifies the risk of an accident along different routes or within different areas of the US. This involved developing a joint probability density function (PDF) model over latitude and longitude using a convolutional neural network (CNN). The PDF quantifies risk by accounting for the spatial distribution of accidents. This approach provides a tool for policymakers, law enforcement agencies, and stakeholders to allocate resources and proactively implement adequate safety measures, and it offers drivers a way to evaluate the risk of different possible routes. Overall, the research analyzes accident data to enable safer roads around the country.
The proposed model focuses on predicting and analyzing car accidents across the US to determine the safest routes. The prior literature covers a broader range of safe/secure routing problems, including hazardous materials transport, patrol routing, and cash-in-transit operations [
14]. The proposed joint PDF quantifies accident risk based on geographical location. This is a more sophisticated probabilistic approach compared to approaches in most of the surveyed literature, which tend to use simpler risk measures like population exposure or binary coverage [
15]. A vital contribution of the proposed work is using NNs and CNNs to forecast future accidents and model risk distributions. The literature generally does not incorporate predictive modeling to this extent. Instead, it focuses on optimization given static risk measures [
14,
16]. The analysis covers multiple years of historical data and makes predictions for future years. Many surveyed papers focus on single-period or static routing problems [
15]. The prior literature often uses smaller datasets or artificial instances, whereas we leverage large-scale, real-world accident data across the US [
15]. The developed research provides insights that could be directly applied to general route planning and navigation systems. Much of the prior work is more specialized (e.g., hazmat transport and police patrols) [
15,
17]. In summary, the proposed research represents a novel application of data science and machine learning techniques to the broader problem of safe routing, with a unique focus on general car accident risk prediction and analysis across a large geographical area. This distinguishes it from the more specialized research-oriented approaches in the prior literature on safe and secure vehicle routing. The contributions of this study can be listed as follows:
Proposes an NN to forecast car accidents using a dataset with millions of accident cases;
Investigates the evolution of car accidents from 2016 to 2022 in the US;
Develops a tool to quantify the probability of an accident across the country; this enables the calculation of the probability of accidents around different areas or routes.
2. Methods
2.1. Measured Car Accidents
The dataset comprises car accident information spanning 49 US states. It covers February 2016 to March 2023 (the 2023 data were excluded from this research because they cover only three months) and was curated using a range of application programming interfaces (APIs) designed to stream traffic incident data [
18,
19]. The APIs act as channels for real-time traffic updates obtained from various sources such as the US and state transportation departments, law enforcement agencies, traffic cameras, and road network sensors. This dataset currently contains a vast compilation of approximately 7.7 million different accident records, making it an invaluable resource for studying and analyzing traffic trends and road safety. The US Accidents dataset provides information about accident locations, timestamps, weather conditions, road conditions, driver demographics, and other attributes.
2.2. Forecasting
Forecasting can be performed using statistical analysis [
20] or AI and is widely used across various domains, from weather to car accident predictions [
21]. The prediction of car accidents for the following years by examining historical car accident data was accomplished using statistical analyses, data mining techniques, and machine learning algorithms. By doing this, it was possible to predict future car accident occurrences using NN techniques. The effectiveness of the artificial neural network (ANN) model in predicting car accidents by states over the years heavily relies on the quality and structure of the input data. A data preprocessing procedure prepared the 7.7 million distinct accident records for utilization within the NN framework. A key aspect of this procedure was the segmentation of the data based on both years and states. This segregation was strategic, aligning with the core principle of the NN model—predicting car accidents with a state-specific focus across varying timeframes. The data preprocessing phase laid the foundation for the NN’s subsequent predictive capabilities. The preprocessing procedure captured the car accident patterns’ temporal and spatial intricacies by segmenting the extensive accident records by years and states. This data preparation enhances the model’s capacity to predict car accidents by states over the years, thereby contributing to informed decision making for road safety initiatives.
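The segmentation by year and state described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function names, the `(state, timestamp)` record layout, and the sample records are assumptions made for the example.

```python
from collections import Counter

def count_by_state_year(records):
    """Return {(state, year): accident_count} from (state, timestamp) pairs."""
    counts = Counter()
    for state, timestamp in records:
        year = int(timestamp[:4])  # assumes "YYYY-MM-DD ..." style timestamps
        counts[(state, year)] += 1
    return counts

def to_state_series(counts, years):
    """Pivot the counts into {state: [count per year]}, using 0 for missing years."""
    states = sorted({state for state, _ in counts})
    return {s: [counts.get((s, y), 0) for y in years] for s in states}

# Hypothetical records for illustration only; not taken from the dataset.
records = [
    ("CA", "2016-02-08 05:46:00"),
    ("CA", "2016-03-01 12:00:00"),
    ("TX", "2016-07-15 08:30:00"),
    ("CA", "2017-01-02 09:10:00"),
]
series = to_state_series(count_by_state_year(records), years=[2016, 2017])
print(series)
```

Each state then contributes one yearly count series, which is the shape of input a state-level regression model would consume.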
The NN model was built as a sequential architecture using the Keras library in Python. The architecture consisted of three layers: an initial dense layer with 64 neurons and a ReLU activation function, a subsequent dense layer with 32 neurons and ReLU activation, and an output layer with seven neurons (one for each year of the input data, 2016–2022). The model was compiled with the mean squared error (MSE) as the loss function. The Adam optimizer managed the optimization process, adapting learning rates during training. The model was subsequently trained iteratively, improving its fit with each epoch.
Table 1 presents detailed information on the parameters related to the NN.
The MSE was employed as the performance metric, measuring the difference between predicted and actual values. Training continued until the model’s error fell below a pre-established threshold. Before predicting the following years, the model’s performance was evaluated against the actual data for 2022. For this validation study, the inputs used to train the NN were the number of car accidents per state per year from 2016 to 2021. The model performed a regression analysis on these data and predicted accident numbers for 2022, which were then compared with the actual 2022 data. Models were trained with different maximum error thresholds, and each was used to predict results for 2022. The comparison was performed state by state, and the average error was calculated. The model that produced the lowest average error was selected. The average error of each model is summarized in
Figure 1. The selected threshold error was 10%, resulting in a model with a 39% error. This means the model’s accuracy is around 60%.
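The state-by-state validation and model selection can be illustrated with a short sketch. It assumes that the error is the mean absolute percentage error across states, which the text does not state explicitly. The function names and all numbers below are hypothetical.

```python
def mean_pct_error(predicted, actual):
    """Average absolute percentage error across states (same keys in both dicts)."""
    errors = [abs(predicted[s] - actual[s]) / actual[s] * 100.0 for s in actual]
    return sum(errors) / len(errors)

def select_best(candidates, actual):
    """candidates maps a training threshold to {state: predicted_count}.

    Returns the threshold whose predictions have the lowest average error,
    together with that error.
    """
    scored = {t: mean_pct_error(p, actual) for t, p in candidates.items()}
    best = min(scored, key=scored.get)
    return best, scored[best]

# Hypothetical held-out counts and predictions, for illustration only.
actual_2022 = {"CA": 1000, "TX": 500}
candidates = {
    0.05: {"CA": 1500, "TX": 800},
    0.10: {"CA": 1200, "TX": 550},
    0.20: {"CA": 1400, "TX": 700},
}
best_threshold, best_error = select_best(candidates, actual_2022)
print(best_threshold, round(best_error, 1))
```

In this toy setup the tightest training threshold does not give the best held-out error, which is the kind of trade-off the threshold sweep is meant to expose.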
Figure 1 concisely represents the interplay between model error and prediction accuracy. The x-axis quantifies the extent of model error, indicating deviations between the model’s predictions and actual values. On the y-axis, the prediction error is depicted, showcasing the disparity between predicted and exact outcomes. Each point represents a model. The graph effectively demonstrates how, as model error increases, prediction error follows suit, exemplifying the direct relationship between the two. This visualization underscores the need to minimize model error to ensure accurate predictions. The prediction modeling can be explained through the framework in
Figure 2. The raw data are preprocessed to train the NN and generate different predictions. The error condition then compares each model’s error with its error threshold and its predictions with the real values. Adjusting the threshold in this way allowed the model with the best predictions to be selected.
The study also examined the relationship between model error and prediction accuracy. Models with minimal training error tended to overfit the available data, resulting in larger prediction errors on unseen data. This outcome emphasized the importance of balancing model complexity and predictive performance. Training proceeded iteratively, with each epoch using a batch size of 16, until the MSE fell below a 10% threshold. Subsequently, the selected model was used to forecast car accident patterns for 2023, 2024, and 2025.
2.3. Safest Route
Risk modeling consists of developing a model to quantify the risk of an accident in a specific area or route in the US territory. Risk quantification involves assessing the likelihood of certain events or outcomes occurring within a specified context [
22]. One powerful approach is to use a PDF: the integral of the PDF over a region (the volume under its surface) corresponds to the probability of an accident occurring in that region and, therefore, to the risk of an accident within those bounds.
In mathematical terms, the integral of the PDF over a particular region gives the probability of an accident occurring within that region [
23]. This integral computes the “volume” under the PDF surface over the specified region. The larger the integral value, the higher the probability of accidents in that area.
This concept holds significant importance in risk assessment. By calculating the integral of the PDF over different geographic areas, it is possible to compare the risk levels of accidents occurring in those areas. Areas with larger integral values have a higher probability of accidents, indicating higher risk, while areas with smaller integral values have a lower probability of accidents, indicating lower risk. The same concept can be applied to routes. The integral of the PDF in a route gives the probability of a car accident on that route, making it feasible to evaluate the safest possible route. In summary, probability calculation serves as a tool, offering a mathematical representation of the likelihood of accidents transpiring within a specific geographic area or route. This concept is part of risk quantification within the realm of accident analysis, serving as an instrument for gaining insights into the intricate interplay of factors that contribute to the probability and distribution of accidents across diverse regions and locales.
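The area integral described above can be written out explicitly. The notation is assumed for illustration: $f(\phi, \lambda)$ denotes the joint PDF over latitude $\phi$ and longitude $\lambda$, and $A$ is a region bounded by the coordinates $[\phi_1, \phi_2] \times [\lambda_1, \lambda_2]$.

```latex
P(A) = \int_{\phi_1}^{\phi_2} \int_{\lambda_1}^{\lambda_2} f(\phi, \lambda)\, d\lambda\, d\phi,
\qquad
\iint_{\text{US}} f(\phi, \lambda)\, d\lambda\, d\phi = 1
```

For a route $R$, one possible reading, stated here as an assumption rather than the paper's definition, is a line integral of the same density along the path, which can be compared across candidate routes:

```latex
\mathrm{risk}(R) = \int_{R} f(\phi, \lambda)\, ds
```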
A PDF was created based on accident data to quantify risk, and then the PDF was modeled using an ANN. A kernel density estimate (KDE) [
24] was used to represent the PDF of accident occurrences as a function of latitude and longitude. A 3D CNN (Conv3D) [
25,
26] model was designed to model the PDF curve. The model architecture comprised convolutional layers, followed by flattening and dense layers. The output layer employed a linear activation function to predict the number of accidents. The grid size of
was used to discretize the geographical space. The model was trained using the Adam optimizer and MSE loss function. Training was conducted until the maximum error between the CNN model’s output and the calculated PDF was less than 10%. The model’s performance was evaluated on a validation set, measuring the MSE between the predicted PDF and the actual PDF for each grid point.
Table 2 presents detailed information on the parameters related to the 3D CNN.
The first dense layer has 64 neurons and 192 parameters; the second dense layer has 32 neurons with 2080 parameters, and the output layer consists of a single neuron with 33 parameters. The network has 2389 parameters, all trainable during training, contributing to the model’s capacity to learn and make predictions. The predictive model’s performance was assessed based on the MSE between the predicted and actual PDF values. The framework is represented in
Figure 3, which provides a visual overview of the methodology architecture and components.
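The dense-layer parameter counts quoted above follow from the usual rule for a fully connected layer: (inputs + 1) × units, with the extra term accounting for the biases. The check below assumes the first dense layer receives two inputs, which is inferred from 192 = 64 × (2 + 1) and is not stated in the text.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (n_inputs + 1) * n_units

# (inputs, units) per dense layer; the first input width of 2 is an inference.
layers = [(2, 64), (64, 32), (32, 1)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layers]
dense_total = sum(counts)

# Reported network total minus the dense layers.
remainder = 2389 - dense_total
print(counts, dense_total, remainder)
```

The three dense layers account for 2305 of the 2389 reported parameters. The remaining 84 presumably belong to the convolutional layers, although the text does not break that figure down.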
Using a KDE approach to create a joint PDF offers a tool for visualizing data distribution and unveiling underlying patterns. However, it is essential to note that the KDE-generated PDF, while providing a smooth curve approximating the data distribution, does not inherently yield a model function. It represents the data but lacks the explicit functional form needed for precise mathematical analysis, such as determining maximum and minimum values or conducting intricate calculations. In contrast, using a 3D CNN to model the curve as a function offers an advantage. The 3D CNN not only captures the data distribution but also formulates it as an explicit mathematical function, thereby enabling a deeper level of analysis that includes extracting maximum and minimum values and performing mathematical computations. This contrast highlights the complementary nature of these two approaches, where KDE excels in data visualization and initial exploration. At the same time, the 3D CNN extends the scope to a model and mathematical analysis of the underlying data distribution.
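To make the KDE step and the area integral concrete, the sketch below builds a two-dimensional Gaussian KDE from a few points and integrates it over rectangular regions with a midpoint Riemann sum. It is a simplified stand-in, not the paper's pipeline: the bandwidth, the sample points, the bounding boxes, and the treatment of latitude and longitude as planar coordinates are all assumptions.

```python
import math

def gaussian_kde_2d(points, bandwidth):
    """Return pdf(lat, lon), an isotropic Gaussian kernel density estimate."""
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def pdf(lat, lon):
        total = 0.0
        for p_lat, p_lon in points:
            d2 = (lat - p_lat) ** 2 + (lon - p_lon) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        return norm * total

    return pdf

def region_probability(pdf, lat_min, lat_max, lon_min, lon_max, steps=60):
    """Midpoint Riemann sum of the PDF over a latitude/longitude rectangle."""
    d_lat = (lat_max - lat_min) / steps
    d_lon = (lon_max - lon_min) / steps
    total = 0.0
    for i in range(steps):
        lat = lat_min + (i + 0.5) * d_lat
        for j in range(steps):
            lon = lon_min + (j + 0.5) * d_lon
            total += pdf(lat, lon)
    return total * d_lat * d_lon

# Hypothetical accident locations: two near Los Angeles, one near Houston.
points = [(34.0, -118.2), (34.1, -118.3), (29.8, -95.4)]
pdf = gaussian_kde_2d(points, bandwidth=0.5)

p_all = region_probability(pdf, 25.0, 40.0, -125.0, -90.0, steps=200)
p_la = region_probability(pdf, 32.0, 36.0, -120.5, -116.0)
p_houston = region_probability(pdf, 27.8, 31.8, -97.4, -93.4)
print(round(p_all, 3), round(p_la, 3), round(p_houston, 3))
```

Integrating over a box that contains all the data returns a value close to 1, and the ratio between two boxes gives a relative probability of the kind reported for pairs of cities.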
Google Maps served as the primary tool for gathering coordinates (latitude and longitude) for the analysis. These coordinates were extracted to delineate boundary coordinates for different cities under examination. Subsequently, Google Maps API generated multiple routes between travel points to assess the risk associated with various travel routes. These routes were exported as keyhole markup language (KML) files for integration into Python-based GIS tools. Employing the model, one can quantify the risk inherent in traversing each route. Finally, the safest route was determined based on minimizing quantified risk, ensuring optimal route selection for enhanced travel safety.
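The route comparison can be sketched as a numerical line integral of the PDF along each route's waypoints, followed by picking the minimum. This is an illustrative sketch under stated assumptions: `toy_pdf` is a hypothetical density standing in for the trained model, the waypoints are invented rather than read from KML files, and distances are measured in planar degrees for simplicity.

```python
import math

def route_risk(pdf, waypoints, samples_per_leg=50):
    """Approximate the line integral of pdf along a polyline of (lat, lon) points."""
    risk = 0.0
    for (lat0, lon0), (lat1, lon1) in zip(waypoints, waypoints[1:]):
        leg_length = math.hypot(lat1 - lat0, lon1 - lon0)  # planar degrees
        ds = leg_length / samples_per_leg
        for k in range(samples_per_leg):
            t = (k + 0.5) / samples_per_leg  # midpoint of each sub-segment
            lat = lat0 + t * (lat1 - lat0)
            lon = lon0 + t * (lon1 - lon0)
            risk += pdf(lat, lon) * ds
    return risk

def safest_route(pdf, routes):
    """Return the name of the lowest-risk route and all route scores."""
    scores = {name: route_risk(pdf, pts) for name, pts in routes.items()}
    return min(scores, key=scores.get), scores

def toy_pdf(lat, lon):
    """Hypothetical unnormalized density with one hotspot at (33, -100)."""
    return math.exp(-((lat - 33.0) ** 2 + (lon + 100.0) ** 2) / 2.0)

# Invented waypoints for two alternatives between the same endpoints.
routes = {
    "through_hotspot": [(34.0, -118.0), (33.0, -100.0), (29.8, -95.4)],
    "around_hotspot": [(34.0, -118.0), (37.0, -100.0), (29.8, -95.4)],
}
best, scores = safest_route(toy_pdf, routes)
print(best)
```

Because every route is scored with the same density, only the relative values matter for choosing between alternatives.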
3. Results
3.1. Data Preparation
The positions of accidents were selected by latitude and longitude. This data preparation was used to train the NN and also provided insight into how car accidents have evolved. Heatmaps with a gradient color map were created to visualize the spatial distribution of car accidents for 2016 and 2022 in
Figure 4 and
Figure 5, respectively.
Figure 4 presents a 2D heatmap illustrating the spatial distribution of car accidents throughout the US in 2016. The color intensity corresponds to the frequency of accidents, with green regions indicating higher incident rates. By visually mapping these incidents, the heatmap provides valuable insights into accident-prone areas, facilitating data-driven strategies for enhanced road safety nationwide.
Figure 5 presents a 2D heatmap illustrating the spatial distribution of car accidents throughout the US in 2022. The color intensity corresponds to the frequency of accidents, with yellow regions indicating higher incident rates.
When observing the scale, it becomes apparent that there has been a noticeable increase in the maximum value. This suggests that the occurrence of car accidents has been on the rise over time. It is also worth noting that regions that previously had low car accident rates are now reporting more incidents, particularly in the northern part of the country. Despite this, states such as California, Florida, and New York continue to present some of the highest values of car accidents. These findings indicate a pressing need for intervention to address car accidents and reduce their frequency.
3.2. Forecasting
Table 3 offers a comprehensive glimpse into the anticipated number of accidents by state for 2023, 2024, and 2025. It encapsulates the projected accident figures and the underlying factors influencing these predictions. By extrapolating from historical data, this table provides insights into the potential trends and challenges in road safety, serving as a cornerstone for informed decision making and proactive accident prevention strategies at the state level. It is important to note that certain states have significantly higher rates of car accidents than others. According to recent data, California, Florida, Virginia, Texas, and New York have the highest number of accidents. These states are also expected to see changes in accident rates in future years. Specifically, the projections indicate that from 2022 to 2025, accident rates in these states are expected to increase by varying percentages: 2.6%, 2.4%, 4.1%, 1.2%, and a striking 73.4%, respectively. This information suggests that certain states may require more attention and resources when it comes to reducing the number of car accidents. In particular, the high rate of increase in car accidents in New York is a matter of concern and may require further study and intervention to address this issue.
3.3. Safest Route
Figure 6 presents the joint PDFs of accidents across the US as a function of latitude and longitude. The left plot shows the PDF derived from the raw data, while the right plot shows the result of the CNN modeling that surface. Together, the plots provide a spatial perspective on the data and on the CNN-driven modeling results, enabling analysis of geographic patterns. The PDF has three significant peaks; the highest one, in red, corresponds to California, which has the largest number of car accidents. The two other peaks are New York and Florida. Because the CNN models the 3D surface as a function of latitude and longitude, it enables the calculation of the probability of an accident occurring in a given area.
This plot enables us to assess the probability of accidents across various cities, encompassing major urban centers (e.g., New York and Los Angeles) and smaller municipalities (e.g., Lubbock).
Table 4 presents the probability of accidents within these cities, together with their respective population sizes, areas, and population densities. To calculate the probability, the boundary coordinates were used to delimit the integration area in Equation (1). The calculated probabilities of accidents are presented as percentages. The probability of an accident in a small city can be 1000 times smaller than in larger cities. Population density directly affects the probability of an accident. The results of the developed tool agree with other research [
27], which shows the relative probability of accidents in Los Angeles and Houston as 2.1522, while the result from the developed tool is 2.1010.
In addition to the probability in different areas, it is also possible to calculate the probability of an accident on various routes around the US. Four routes were explored between Los Angeles, CA, and Houston, TX. Route #1 passes through Flagstaff, Albuquerque, Amarillo, and Dallas. Route #2 is similar to Route #1 but passes through Lubbock. Route #3 goes through Phoenix, Tucson, and Albine (see
Figure 7). Finally, Route #4 is an alternative to #3, passing through Dallas. By evaluating the probability of an accident across each route, we found the safest route and compared the probability among the four options. As can be seen in
Table 5, Route #2, which passes through Lubbock, is the safest route. The other options are around 40% more likely to involve an accident. This example could reduce the number of accidents on this route by almost half. If the risk evaluation is shown to drivers, it could reduce the accidents in the entire country by around 40%.
The main reason Route #2 is the safest is that the joint PDF values along the way are lower than those of the other routes. This is associated with the population density of the cities along the way; this result is presented in
Table 4. The safest direction to travel from a given origin can also be pinpointed. The gradient of the PDF at any point indicates the direction of steepest increase, pointing toward areas with higher accident probabilities. Moving in the opposite direction from this gradient therefore leads away from higher-risk zones and toward areas with lower accident probabilities.
Table 6 illustrates the practical application of this method across various cities, offering insights into the safest routes to take. For instance, in Lubbock, heading northwest is the optimal choice, while in New York and Los Angeles, driving north is recommended for safer travel. Conversely, in Houston, moving southwest leads to decreased accident probabilities. By integrating such analytical techniques into navigation systems, road safety can be enhanced and risks can be mitigated for drivers everywhere.
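The direction-finding idea can be sketched with a central-difference gradient: the negative gradient of the PDF points toward the steepest local decrease in accident probability, and its bearing can be mapped to a compass direction. The density below is a hypothetical stand-in for the trained model, and the helper names are assumptions.

```python
import math

def gradient(pdf, lat, lon, h=1e-3):
    """Central-difference estimate of (d pdf / d lat, d pdf / d lon)."""
    d_lat = (pdf(lat + h, lon) - pdf(lat - h, lon)) / (2.0 * h)
    d_lon = (pdf(lat, lon + h) - pdf(lat, lon - h)) / (2.0 * h)
    return d_lat, d_lon

def safest_heading(pdf, lat, lon):
    """Compass direction of steepest decrease, i.e. the negative gradient."""
    d_lat, d_lon = gradient(pdf, lat, lon)
    # Bearing clockwise from north: north is +latitude, east is +longitude.
    bearing = math.degrees(math.atan2(-d_lon, -d_lat)) % 360.0
    names = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return names[int((bearing + 22.5) // 45) % 8]

def toy_pdf(lat, lon):
    """Hypothetical unnormalized density with one hotspot at (30, -95)."""
    return math.exp(-((lat - 30.0) ** 2 + (lon + 95.0) ** 2) / 2.0)

print(safest_heading(toy_pdf, 31.0, -95.0))  # start north of the hotspot
print(safest_heading(toy_pdf, 29.0, -96.0))  # start southwest of the hotspot
```

The heading is purely local: it describes the steepest descent at the origin and says nothing about whether a road actually runs in that direction.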
4. Discussion
4.1. Forecasting
The computational demands of running the ANN model posed a challenge for the personal computer (an Alienware M15 R6) due to its limited RAM capacity (16 GB). Despite its robust specifications, including an Intel Core i7 processor and a high-performance graphics card (RTX 3070) with 8 GB of GDDR6 memory, the ANN’s memory-intensive processes strained the computer’s resources. Consequently, to overcome these limitations, the code was moved to the High-Performance Computing Center (HPCC) on the Texas Tech University campus, using the Nocona CPU partition. This computing system, with a theoretical maximum of 983 TFLOPS and a benchmarked performance of 804 TFLOPS, was equipped with 240 CPU nodes of the Dell PowerEdge C6525 model. Powered by AMD EPYC™ 7702 processors with 64 cores each, totaling 30,720 cores within the partition, Nocona provided the computational resources needed to execute the ANN tasks efficiently. The supercomputer’s larger memory capacity, with 512 GB per node and 4 GB per core, mitigated the memory-related challenges encountered on the personal computer. Moreover, its high-speed fabric, Mellanox HDR 200 InfiniBand, operating at 200 Gbps, ensured rapid data exchange and optimal node communication [
28]. One node and eight GPUs were necessary to run the required code effectively.
The primary aim was to discern alterations in spatial patterns of car accidents and identify potential trends over this period [
5]. The heatmap representing 2016,
Figure 4 exhibited localized clusters of car accidents with varying degrees of intensity. Conversely, the heatmap for 2022,
Figure 5, displayed perceptible shifts in these clusters, indicating potential changes in accident distribution patterns. The data analysis showed a conspicuous increase in the overall count of car accidents, from 410,821 in 2016 to 1,762,452 in 2022. The heatmap for 2022 depicted elevated accident density across several regions, signifying an escalation in traffic-related safety concerns compared to 2016. To contextualize the findings, the study factored in travel volume statistics for the 12 months ending in December of both years. In 2016, the collective vehicle miles traveled across all roads and streets totaled approximately 3,174,408 million miles. In contrast, this value slightly decreased to 3,169,434 million miles in 2022 [
29]. Despite the minor reduction in travel volume, the pronounced surge in the total number of car accidents suggests a disproportionate increase in accident occurrences. The observed surge in the overall count of car accidents from 2016 to 2022 prompts concern and warrants the attention of traffic management authorities and policymakers. The increase in accidents despite a slight reduction in overall travel volume implies the presence of underlying factors contributing to this phenomenon. The alterations in spatial patterns of car accidents are also of notable significance. Changes in land use, urban development, and traffic flow could contribute to the shifting clusters of accident incidents. Identifying these patterns and understanding the localized determinants influencing them can facilitate targeted interventions to enhance road safety. Distinctive patterns emerged across various regions of the US, each offering unique insights into the dynamics of road safety. In the Northeast, states such as Connecticut and Delaware displayed notable upticks in accident rates, potentially attributable to their proximity to bustling urban hubs and elevated population densities [
30]. Meanwhile, in the Midwest, states like Minnesota and Missouri witnessed increases, signaling potential road safety challenges in tandem with population growth [
30]. The South, encompassing states like Florida and Georgia, exhibited substantial spikes in accidents, underscoring the pressing necessity for implementing robust traffic management strategies to mitigate these alarming trends. In conclusion, the analysis underscores the importance of sustained monitoring and evaluation of road safety protocols. The escalation in car accidents between 2016 and 2022, along with modifications in spatial distribution, accentuates the necessity for adaptable and holistic strategies to improve road safety conditions. Mitigating the increase in accidents necessitates an approach that considers the aggregate accident count and examines the contributing factors behind this trend.
The predictions were generated using the selected NN model [
5], recognized as the optimal performer among various evaluated models. This model showcased the highest degree of accuracy and reliability in projecting the anticipated count of car accidents, offering valuable insights into potential trends and patterns within specific states over the upcoming years. The subsequent discussion delves into the key findings, shedding light on the predictive capabilities of the employed NN model and the implications of its outcomes for enhancing road safety strategies and policy formulation [
31].
Table 3 offers a comprehensive glimpse into the anticipated number of accidents by state for 2023, 2024, and 2025. These forecasts are not merely statistical extrapolations but are informed by advanced modeling techniques considering various influencing factors. The projections reflect the anticipated consequences of continued urbanization, population growth, infrastructure developments, and changing traffic patterns. For instance, states like California and Florida, which are the states with the highest number of cars [
32], are expected to see further increases in accidents due to their ongoing population growth and complex traffic dynamics. States that witnessed fluctuations in the past, such as Illinois and Indiana, may continue to experience variations, potentially influenced by evolving road safety policies and infrastructure projects. Urban areas tend to exhibit higher accident rates due to the concentration of population and more intricate traffic patterns. States with large metropolitan areas, like New York, Maryland, and New Jersey, will likely continue to experience elevated accident figures. In contrast, rural areas, like those in the Midwest and Mountain states, may see more stable or declining accident rates. New York State currently ranks fifth in accidents and, according to the prediction, is about to see a significant increase in the rate of car accidents. New York State has roughly one-third the area of California but half as many people, which results in a higher population density that contributes to the occurrence of accidents. The increase in New York State’s car accident rate is a concern that reinforces the need for ways to reduce accidents in the country. Machine learning can significantly improve crash prediction accuracy [
33,
34]. In summary, the historical evolution of car accidents provides context for understanding the present, while predictive modeling offers a glimpse into the future. These insights are invaluable for crafting targeted road safety policies, infrastructure investments, and public awareness campaigns to address each state and region’s challenges and opportunities. Road safety is dynamic, and continuous analysis and adaptation are crucial to mitigating risks and improving roadway outcomes.
4.2. Safest Route
The risk modeling aimed to quantify the risk associated with car accidents across the US using a data-driven approach. This objective required a PDF model to calculate risk within specific geographic areas. A CNN model was constructed to reproduce the complex curve of the PDF while maintaining a predefined margin of error of 10%. The resulting trained network has been made available through the project’s dedicated GitHub repository [
35], enabling stakeholders to access a powerful tool for comprehensive risk assessment.
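The 10% margin of error mentioned above can be checked with a simple comparison between a reference PDF and its learned approximation. The following is a minimal sketch, not the study's code: `reference_pdf` and `surrogate_pdf` are hypothetical stand-ins (a bivariate Gaussian and a slightly distorted copy of it), and measuring the margin as the maximum pointwise relative error on a grid is an assumption, since the text does not state which error metric was used.

```python
import math


def reference_pdf(x, y):
    """Hypothetical stand-in for the data-derived joint PDF (standard bivariate Gaussian)."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)


def surrogate_pdf(x, y):
    """Hypothetical stand-in for the trained network: the reference with a smooth 5% distortion."""
    return reference_pdf(x, y) * (1.0 + 0.05 * math.sin(x) * math.cos(y))


def max_relative_error(ref, model, points, floor=1e-12):
    """Largest |model - ref| / ref over the evaluation points (floor avoids division by zero)."""
    worst = 0.0
    for x, y in points:
        r = ref(x, y)
        worst = max(worst, abs(model(x, y) - r) / max(r, floor))
    return worst


# Regular 25 x 25 evaluation grid covering [-3, 3] x [-3, 3].
grid = [(-3.0 + 0.25 * i, -3.0 + 0.25 * j) for i in range(25) for j in range(25)]

error = max_relative_error(reference_pdf, surrogate_pdf, grid)
within_margin = error <= 0.10  # assumed reading of the 10% margin
print(f"max relative error = {error:.3f}, within 10% margin: {within_margin}")
```

Other metrics, such as an error normalized by the PDF peak, would be equally plausible readings of the margin and would give different numbers in low-density regions.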
The visual representation presented in
Figure 6 illustrates the process. The diagram has two components: a 3D plot of the PDF on the left and the CNN-modeled PDF on the right. The plot on the left, while visually informative, is a static image. The PDF on the right is the one computed by the trained model, and the predictions and projections of car accident risks are drawn from this modeled curve. This model makes it possible to calculate the risk of a car accident in different areas (see
Table 4) and routes (see
Table 5) and study the curve.
The highest peak in
Figure 6 corresponds to the area of California, which is the area that has the highest number of cars in the US [
32]; this state has almost double the number of cars of Florida, which is the state with the second-highest number of cars. The next-highest peaks are New York and Florida. The number of accidents in Florida can also be related to its total number of cars, but New York ranks only ninth in number of vehicles. However, New York is a comparatively small state (it ranks 27th in size), so its concentration of cars is high, which contributes to accidents. Creating the ANN model for PDF modeling marked a pivotal phase in the study. The model replicated the PDF curve while staying within the 10% margin of error. This degree of accuracy is paramount in risk assessment, as modeling inaccuracies could lead to flawed decision making and suboptimal resource allocation. It positions the ANN as a valuable tool for policymakers, urban planners, and stakeholders in proactively addressing road safety concerns. The model was used to quantify the risk in four different cities (see
Table 4). The findings reveal a clear correlation between population density and accident probability, indicating that smaller cities consistently experience fewer accidents than their larger urban counterparts. Higher population density leads to more cars per area, increasing the likelihood of car accidents. The PDF model was also used to evaluate the probability of an accident on different routes. After evaluating accident probabilities for various routes, it became clear that specific routes are significantly less risky than others (see
Table 5). Compared with the safest route, the alternative routes carry a 40% higher risk of accidents. This underscores a tangible opportunity to lower the accident risk along this route alone. If such risk assessments were made available to drivers nationwide, there is potential to mitigate accidents on a broader scale and reduce the overall countrywide accident rate. Route #2 is the safest route in the example because its path goes primarily through smaller cities, unlike the other routes. Smaller cities tend to have a lower probability of accidents since they tend to have lower population densities. Additionally, the analysis proposes a safer overall direction by employing gradient analysis, which facilitates the identification of routes leading toward safer regions. By leveraging gradient analysis, decision makers can implement targeted interventions or adjustments to transportation routes, thereby enhancing overall safety measures and potentially reducing the incidence of accidents in the designated areas.
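The route comparison and the gradient analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: `risk_density` is a hypothetical surface with a single peak standing in for the modeled PDF, route risk is assumed to be the line integral of that surface along the path, and the safer direction is assumed to be the negative gradient of the surface.

```python
import math


def risk_density(lon, lat):
    """Hypothetical accident-density surface with one 'large city' peak at the origin."""
    return math.exp(-(lon ** 2 + lat ** 2) / 2.0)


def route_risk(route, density, samples_per_leg=50):
    """Approximate the line integral of density along a polyline (midpoint rule per leg)."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(route, route[1:]):
        step = math.hypot(x1 - x0, y1 - y0) / samples_per_leg
        for k in range(samples_per_leg):
            t = (k + 0.5) / samples_per_leg
            total += density(x0 + t * (x1 - x0), y0 + t * (y1 - y0)) * step
    return total


def safer_direction(lon, lat, density, h=1e-4):
    """Unit vector along the negative gradient, estimated with central differences."""
    gx = (density(lon + h, lat) - density(lon - h, lat)) / (2.0 * h)
    gy = (density(lon, lat + h) - density(lon, lat - h)) / (2.0 * h)
    norm = math.hypot(gx, gy)
    if norm == 0.0:
        return (0.0, 0.0)  # flat surface: no preferred direction
    return (-gx / norm, -gy / norm)


# Two hypothetical routes between the same endpoints.
direct = [(-3.0, 0.0), (3.0, 0.0)]              # passes through the peak
detour = [(-3.0, 0.0), (0.0, 3.0), (3.0, 0.0)]  # longer, but skirts the peak

risk_direct = route_risk(direct, risk_density)
risk_detour = route_risk(detour, risk_density)
print(f"direct: {risk_direct:.3f}, detour: {risk_detour:.3f}")
print("safer direction at (1, 0):", safer_direction(1.0, 0.0, risk_density))
```

Because the integral accumulates risk per unit of distance traveled, a longer route is only preferred when it stays in sufficiently low-density regions, as the detour does in this toy example.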
In conclusion, this work produced three results: the construction of a PDF modeled by a CNN, the accurate estimation of car accident risks, and a repository that makes the model accessible. By quantifying the risk of car accidents in the US through this methodology, the study contributes to a deeper understanding of road safety dynamics. It offers a tool to inform decision making and interventions to reduce car accident incidents [
36].
4.3. Generalizability and Limitations
While the current work utilizes the car accident dataset to validate the framework, it is worth noting that the same tools can be applied to predict future accidents and joint PDFs using various datasets. The framework can accommodate a range of data characteristics, which include the following:
- Accident/failure data: any accident or failure is supported, regardless of its domain of origin;
- Data length: the proposed framework can be used on datasets of any size, but larger datasets will result in higher computational costs;
- High-dimensional data: it is possible to use datasets that contain more than two input variables to predict future accidents. However, the joint PDF cannot then be plotted because the plot would require more than three dimensions.
The limitations of the proposed framework are also listed to facilitate its applicability to other datasets. They are as follows:
- Availability of accident/failure datasets: in some cases, there is insufficient available data;
- Dataset structure: to create the joint PDF, it is necessary to have at least two variables related to the accident/failure to model the risk;
- Tuning of the framework: the machine learning hyperparameters were selected considering the error result; however, further analysis may be performed to choose those hyperparameters using additional criteria besides the error.
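To make the dataset-structure requirement and the note on high-dimensional data more concrete, the sketch below estimates a joint PDF from records with three variables and then integrates one variable out, leaving a two-variable PDF that can be drawn as a surface. It is an illustration only: the histogram estimator is a simple stand-in rather than the estimator used in the study, the sample records are hypothetical, and a single bin width is assumed for every axis.

```python
from collections import Counter, defaultdict


def joint_histogram_pdf(samples, bin_width):
    """Histogram estimate of a joint PDF on a regular grid, for any number of variables."""
    counts = Counter(tuple(int(v // bin_width) for v in s) for s in samples)
    cell_volume = bin_width ** len(samples[0])
    n = len(samples)
    # Density per cell = fraction of samples in the cell / cell volume.
    return {cell: c / (n * cell_volume) for cell, c in counts.items()}


def marginalize(pdf, keep_axes, bin_width):
    """Integrate out every axis that is not listed in keep_axes."""
    dropped = len(next(iter(pdf))) - len(keep_axes)
    marginal = defaultdict(float)
    for cell, density in pdf.items():
        key = tuple(cell[a] for a in keep_axes)
        marginal[key] += density * bin_width ** dropped
    return dict(marginal)


# Hypothetical records: (latitude offset, longitude offset, scaled time of day).
samples = [
    (0.2, 0.3, 0.1),
    (0.4, 0.1, 1.5),
    (1.2, 0.3, 0.7),
    (1.7, 1.1, 1.9),
]
bin_width = 1.0

pdf_3d = joint_histogram_pdf(samples, bin_width)
pdf_2d = marginalize(pdf_3d, keep_axes=(0, 1), bin_width=bin_width)

print("3-variable cells:", len(pdf_3d))
print("2-variable marginal:", pdf_2d)
```

Marginalizing discards the dependence on the dropped variable, so it helps with visualization but does not replace the full joint PDF for risk calculation.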
5. Conclusions
This paper answered the following question: Can the safest car routes across the US be determined? The prediction modeling analyzed accident data from previous years to identify underlying trends and patterns. Data analysis was conducted to predict the number of accidents per state. Risk modeling produced a novel model, incorporating the Conv3D technique, to quantify the risk of accidents in the US.
Data collection, data analysis, and machine learning techniques yielded valuable insights into the dynamics of road incidents across different states, regions, and periods. The use of an ANN allowed the construction of a model and its extrapolation to predict car accidents for the following years. Additionally, the ANN was able to model a PDF curve that accurately characterized the distribution of accidents, facilitating informed risk assessment in specific geographical areas and along diverse travel routes.
One of the outcomes of this study is the empowerment of stakeholders, from individuals to policymakers, with the tools to make informed decisions regarding road safety. The accurate joint PDF and risk assessment capabilities provide a foundation for designing effective strategies to reduce accidents and improve overall traffic safety. Individuals can reduce their exposure to potential accidents by avoiding high-risk areas and adopting safer routes. For policymakers and urban planners, the predictive model and risk assessment developed in this study offer actionable information to guide targeted infrastructure improvements. Once high-risk routes and accident-prone areas are identified, the following data-driven interventions can be implemented: enhanced signage, road design modifications, traffic calming measures (speed bumps and crosswalks), increased law enforcement presence, improved road maintenance, and enhanced lighting.
The analysis also identified several high-crash-frequency areas as accident hotspots, including California, New York, and Florida. In California, densely populated areas and heavy traffic contribute to frequent collisions. In New York, there is a clear need for enhanced safety measures in urban environments. Florida, particularly in urban areas like Miami and Orlando, experiences a high number of accidents, partly due to the state’s large population and significant tourist activity. Developing customized interventions for high-risk routes is recommended to decrease accidents and enhance road safety. For rural highways, the focus can be on vehicle-centric safety measures like rumble strips and widened shoulders. Urban roads need multimodal safety improvements such as pedestrian crossings and dedicated bicycle lanes. Suburban streets require a balanced approach, including improved lighting and shared bike lanes. Tailoring these interventions to each route type can effectively reduce accidents.
The risk modeling in this study could be integrated into GPS navigation systems and mobile apps to offer drivers a tool for safer route planning. It could provide alternative route suggestions prioritizing safety, display color-coded risk overlays on maps, and deliver real-time alerts for high-risk zones to improve road safety and prevent accidents. The developed model can also support public engagement by providing data-driven insights for targeted public awareness campaigns and driver education. It is possible to identify specific risks in different regions, allowing authorities to develop customized educational content addressing relevant safety concerns. For example, campaigns could focus on the dangers of using mobile devices while driving in areas with high incidences of distracted driving accidents. Similarly, driver education programs could emphasize adhering to speed limits and adjusting driving behavior to road conditions for routes with frequent speeding-related crashes.
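One way a navigation system could prioritize safety when suggesting routes is to fold a risk score into the edge cost of a shortest-path search. The sketch below is an illustration under assumptions and is not part of the study: the road graph, the per-edge risk scores in [0, 1], and the cost formula `length * (1 + risk_weight * risk)` are all hypothetical, and Dijkstra's algorithm is used because the resulting costs are non-negative.

```python
import heapq


def safest_path(graph, start, goal, risk_weight=1.0):
    """Dijkstra search with edge cost = length * (1 + risk_weight * risk)."""
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, length, risk in graph.get(node, []):
            new_cost = cost + length * (1.0 + risk_weight * risk)
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in best:
        return None  # goal unreachable
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1]


# Hypothetical road graph: node -> list of (neighbor, length, risk score).
graph = {
    "A": [("City", 5.0, 0.9), ("Town", 6.0, 0.1)],
    "City": [("B", 5.0, 0.9)],
    "Town": [("B", 6.0, 0.1)],
}

shortest = safest_path(graph, "A", "B", risk_weight=0.0)  # distance only
safest = safest_path(graph, "A", "B", risk_weight=1.0)    # distance and risk
print("shortest:", shortest)
print("safest:", safest)
```

With `risk_weight=0.0` the search reduces to an ordinary shortest-route query, so the same parameter could let a driver choose how much extra distance to accept in exchange for lower risk.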
Identifying accident-prone areas and trends can guide the allocation of resources for infrastructure improvements, traffic management initiatives, and targeted interventions. With the ability to predict accidents, authorities can deploy emergency services strategically, ultimately saving lives and reducing the societal impact of accidents. Addressing the gap in route analysis by integrating risk evaluation represents a step toward improving road safety. Future work aims to deploy this tool directly to drivers, allowing for an assessment of the impact of risk awareness on driver behavior. Additionally, quantifying the tool’s effectiveness in preventing accidents through real-world case studies will provide valuable insights into its practical utility. Such initiatives can promote safer and more informed driving practices, ultimately fostering a safer transportation environment for all road users.
Author Contributions
Conceptualization, N.L.G. and S.E.-O.; methodology, N.L.G. and S.E.-O.; software, N.L.G.; validation, N.L.G., S.E.-O., J.R., O.P. and G.F.; formal analysis, N.L.G.; investigation, N.L.G.; resources, N.L.G.; data curation, N.L.G.; writing—original draft preparation, N.L.G. and S.E.-O.; writing—review and editing, N.L.G., J.R., O.P. and G.F.; visualization, N.L.G. and S.E.-O.; supervision, S.E.-O. and J.R.; project administration, S.E.-O. and J.R. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The original data presented in the study are openly available in [
14,
15].
Conflicts of Interest
The authors declare no conflicts of interest.
References
- NHTSA. NHTSA Estimates for 2022 Show Roadway Fatalities Remain Flat after Two Years of Dramatic Increases. Available online: https://www.nhtsa.gov/press-releases/traffic-crash-death-estimates-2022 (accessed on 2 November 2023).
- Rabbani, M.B.A.; Musarat, M.A.; Alaloul, W.S.; Ayub, S.; Bukhari, H.; Altaf, M. Road Accident Data Collection Systems in Developing and Developed Countries: A Review. Int. J. Integr. Eng. 2022, 14, 336–352.
- Hafeez, F.; Sheikh, U.U.; Al-Shammari, S.; Hamid, M.; Khakwani, A.B.K.; Arfeen, Z.A. Comparative Analysis of Influencing Factors on Pedestrian Road Accidents. Bull. Electr. Eng. Inform. 2023, 12, 257–267.
- Frej, D.; Szumska, E. Analysis of the Length of Highways and the Number of Motor Vehicles Impact on the Intensity of Road Accidents in Selected European Countries in 2010–2020. Commun.-Sci. Lett. Univ. Žilina 2023, 25, A40–A60.
- Boyagoda, L.S.; Nawarathna, L.S. Analysis and Prediction of Severity of United States Countrywide Car Accidents Based on Machine Learning Techniques. In Proceedings of the 7th International Conference on Information Technology Research: Digital Resilience and Reinvention (ICITR 2022), Moratuwa, Sri Lanka, 7–8 December 2022; IEEE: Moratuwa, Sri Lanka, 2022; pp. 1–5.
- Lin, Y.; Li, R. Real-Time Traffic Accidents Post-Impact Prediction: Based on Crowdsourcing Data. Accid. Anal. Prev. 2020, 145, 105696.
- Li, P.; Abdel-Aty, M. A Hybrid Machine Learning Model for Predicting Real-Time Secondary Crash Likelihood. Accid. Anal. Prev. 2022, 165, 106504.
- Wen, X.; Xie, Y.; Jiang, L.; Li, Y.; Ge, T. On the Interpretability of Machine Learning Methods in Crash Frequency Modeling and Crash Modification Factor Development. Accid. Anal. Prev. 2022, 168, 106617.
- Alif, M.A.R.; Hussain, M. Lightweight Convolutional Network with Integrated Attention Mechanism for Missing Bolt Detection in Railways. Metrology 2024, 4, 254–278.
- Li, M.; Huang, L. An Artificial Neural Network-Based Approach to Improve Non-Destructive Asphalt Pavement Density Measurement with an Electrical Density Gauge. Metrology 2024, 4, 304–322.
- Krueger, R.; Bansal, P.; Buddhavarapu, P. A New Spatial Count Data Model with Bayesian Additive Regression Trees for Accident Hot Spot Identification. Accid. Anal. Prev. 2020, 144, 105623.
- Man, C.K.; Quddus, M.; Theofilatos, A. Transfer Learning for Spatio-Temporal Transferability of Real-Time Crash Prediction Models. Accid. Anal. Prev. 2022, 165, 106511.
- Ghoul, T.; Sayed, T.; Fu, C. Real-Time Safest Route Identification: Examining the Trade-off between Safest and Fastest Routes. Anal. Methods Accid. Res. 2023, 39, 100277.
- Sohrabi, S.; Weng, Y.; Das, S.; German Paal, S. Safe Route-Finding: A Review of Literature and Future Directions. Accid. Anal. Prev. 2022, 177, 106816.
- Pourroostaei Ardakani, S.; Liang, X.; Mengistu, K.T.; So, R.S.; Wei, X.; He, B.; Cheshmehzangi, A. Road Car Accident Prediction Using a Machine-Learning-Enabled Data Analysis. Sustainability 2023, 15, 5939.
- Goldberg, D.M.; Hong, S. Minimizing the Risks of Highway Transport of Hazardous Materials. Sustainability 2019, 11, 6300.
- Fröhlich, G.E.A.; Gansterer, M.; Doerner, K.F. Safe and Secure Vehicle Routing: A Survey on Minimization of Risk Exposure. Int. Trans. Oper. Res. 2023, 30, 3087–3121.
- Moosavi, S.; Samavatian, M.H.; Parthasarathy, S.; Ramnath, R. A Countrywide Traffic Accident Dataset. In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Chicago, IL, USA, 5–8 November 2019; Association for Computing Machinery: Chicago, IL, USA, 2019.
- Moosavi, S.; Samavatian, M.H.; Parthasarathy, S.; Teodorescu, R.; Ramnath, R. Accident Risk Prediction Based on Heterogeneous Sparse Data: New Dataset and Insights. In SIGSPATIAL’19: Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, Chicago, IL, USA, 5–8 November 2019; Association for Computing Machinery: New York, NY, USA; pp. 33–42.
- Haddad, A.; Mondal, A.; Eluru, N.; Bhat, C.R. A Novel Integrated Approach to Modeling and Predicting Crash Frequency by Crash Event State. Anal. Methods Accid. Res. 2024, 41, 100319.
- Hussain, F.; Ali, Y.; Li, Y.; Haque, M.M. Real-Time Crash Risk Forecasting Using Artificial-Intelligence Based Video Analytics: A Unified Framework of Generalised Extreme Value Theory and Autoregressive Integrated Moving Average Model. Anal. Methods Accid. Res. 2023, 40, 100302.
- Joo, Y.J.; Kim, E.J.; Kim, D.K.; Park, P.Y. A Generalized Driving Risk Assessment on High-Speed Highways Using Field Theory. Anal. Methods Accid. Res. 2023, 40, 100303.
- Hernandez, H. Multivariate Probability Theory: Determination of Probability Density Functions. ForsChem Res. Rep. 2017, 2, 1–20.
- Chen, Y.C. A Tutorial on Kernel Density Estimation and Recent Advances. Biostat. Epidemiol. 2017, 1, 161–187.
- Garcia-Garcia, A.; Gomez-Donoso, F.; Garcia-Rodriguez, J.; Orts-Escolano, S.; Cazorla, M.; Azorin-Lopez, J. PointNet: A 3D Convolutional Neural Network for Real-Time Object Class Recognition. In Proceedings of the International Joint Conference on Neural Networks, Vancouver, BC, Canada, 24–29 July 2016; Volume 2016, pp. 1578–1584.
- Ji, S.; Xu, W.; Yang, M.; Yu, K. 3D Convolutional Neural Networks for Human Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 221–231.
- Ramirez, R.; Bieber, C. The Cities Where You’re Most Likely to Get in a Car Accident. Available online: https://www.forbes.com/advisor/legal/auto-accident/cities-most-car-accidents/ (accessed on 29 January 2024).
- Beebe, M.; Williams, B.; Devaney, S.; Leidel, J.; Chen, Y.; Poole, S. RaiderSTREAM: Adapting the STREAM Benchmark to Modern HPC Systems. In Proceedings of the 2022 IEEE High Performance Extreme Computing Conference, HPEC 2022, Virtual, 19–23 September 2022; IEEE: Waltham, MA, USA, 2022; pp. 1–7.
- U.S. Department of Transportation. Travel Monitoring. Available online: https://www.fhwa.dot.gov/policyinformation/travel_monitoring/tvt.cfm (accessed on 2 November 2023).
- United Nations, Department of Economic and Social Affairs, Population Division. World Population Prospects 2022: File Gen/01/Fev1: Demographic Indicators by Region, Subregion and Country, Annually for 1950–2100. Online Edition. 2022. Available online: https://www.un.org/development/desa/pd/sites/www.un.org.development.desa.pd/files/wpp2022_summary_of_results.pdf (accessed on 26 September 2024).
- Broughton, J. Forecasting Road Accident Casualties in Great Britain. Accid. Anal. Prev. 1991, 23, 353–362.
- Carlier, M. Automobile Registrations in the United States in 2021, by State. Available online: https://www.statista.com/statistics/196010/total-number-of-registered-automobiles-in-the-us-by-state/ (accessed on 2 November 2023).
- Cai, Q.; Abdel-Aty, M.; Sun, Y.; Lee, J.; Yuan, J. Applying a Deep Learning Approach for Transportation Safety Planning by Using High-Resolution Transportation and Land Use Data. Transp. Res. Part A Policy Pract. 2019, 127, 71–85.
- Cai, Q.; Abdel-Aty, M.; Zheng, O.; Wu, Y. Applying Machine Learning and Google Street View to Explore Effects of Drivers’ Visual Environment on Traffic Safety. Transp. Res. Part C Emerg. Technol. 2022, 135, 103541.
- Gandur, N.L. Accidents. GitHub. Available online: https://github.com/NazirGandur/Accidents (accessed on 2 November 2023).
- Zhao, J.; Liu, P.; Li, Z. Exploring the Impact of Trip Patterns on Spatially Aggregated Crashes Using Floating Vehicle Trajectory Data and Graph Convolutional Networks. Accid. Anal. Prev. 2024, 194, 107340.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).