1. Introduction
Traffic accidents were estimated to cost Australia A$33.15 billion in 2016 [1]. This figure comes from estimations of the “value of a statistical life” [2] and costs associated with loss of economic output as a result of injury, as well as the repair of property [3]. In the United States, fatalities from traffic accidents surpassed the combined toll taken by the two most deadly diseases, cancer and heart disease, and close to half of the deaths of 19-year-olds were a result of traffic accidents [4]. While annual road fatalities per 100,000 people in Australia in 2013 were one-fifth of the 1975 rate [5], there were still over 1100 fatalities in 2018 [6]. To maintain this reduction in fatalities and reduce accidents overall, understanding the range of causative factors that influence traffic accidents is critical.
Influencing factors include environmental conditions [7,8,9,10,11,12,13], vehicle factors [4,14], driver characteristics and behaviour [15,16,17,18], and road design [19,20].
Of particular interest is the effect of traffic conditions on accidents. While on the surface it seems desirable to reduce congestion, if congestion correlates negatively with serious injury or fatal accident frequency, a reduction may negatively affect road safety [21]. A strong understanding of this relationship is necessary to improve traffic management and reduce accident frequency. Research stems from the 1930s [11], with relationships between accident occurrence and traffic volume/congestion falling into one of two broad categories: linear and non-linear [22].
Veh [11] found a positive correlation between accident rates and average daily traffic (ADT), before accident rates gradually declined at higher traffic volumes; a trend also found by Raff [23]. Other studies using ADT and annual average daily traffic (AADT) reported simple positive linear correlations [24,25,26,27,28]. Gwynn [29] suggested that the higher temporal resolution of hourly traffic data may give a stronger relationship. Using hourly volumes, both Gwynn [29] and Ceder [30] found a U-shaped curve, with the highest accident rates occurring at the lowest and highest traffic volumes. Martin [31] also found a U-shaped response, as did Frantzeskakis and Iordanis [32] when considering the relationship to level of service. Shefer [33] hypothesized that the relationship between the volume/capacity (v/c) ratio and fatal accident frequency would form a bell-shaped curve. This hypothesis was supported by Martin [31] when looking at overall accident frequencies (not only fatal) and hourly traffic, using 6-minute traffic volume measurements for the periods in which accidents occurred.
Fortunately, accidents are rare events. From a statistical perspective, however, this rarity requires analysis over long time periods and broad spatial scales to ensure sufficient sample sizes. Large datasets improve the sensitivity of model response to variables of interest. Advanced modelling approaches use high temporal resolution traffic volume data in combination with multiple covariates to predict accident frequencies, as in a study by Theofilatos [34]. However, these approaches cannot be easily adopted elsewhere if the data required to include these covariates (e.g., road geometry, moisture conditions or light levels) is not available, and more parsimonious models may be necessary [35]. It is important to note that a parsimonious approach could lead to issues relating to unobserved heterogeneity in unincluded factors [35] between different accidents and intersections. Approaches such as the v/c ratio exist to standardise traffic volume in relation to intersection capacity [36] and allow detailed analysis at a limited number of locations by taking differences in intersection characteristics into account. However, the application of this method across broad spatial scales is difficult if road geometry, directional traffic volume, or traffic signal data is unavailable.
This study aims to analyse how traffic volumes affect accident frequency, addressing the lack of consensus between the linear and non-linear hypotheses in past research. Large datasets of high temporal frequency traffic volumes are used, and the response of accident occurrence to congestion is analysed across 120 intersections. Separate analyses examine the effect of congestion on accident severity and the effect of rainfall on accident risk across these congestion levels. The City of Adelaide was chosen due to the overlapping availability of high temporal frequency, spatially explicit traffic data and detailed accident records.
2. Materials and Methods
This analysis combines detailed spatio-temporal traffic accident records and hourly intersection traffic volume data. Normalising traffic volumes to each intersection produced a congestion index that allowed traffic conditions to be compared between intersections irrespective of differences in intersection characteristics.
Two further factors were investigated. Accident risks at different congestion levels were analysed in relation to rainfall, and accidents were disaggregated by severity level to uncover any influence of congestion on accident severity.
Data was processed and analysed using the R programming language [37] with the RStudio integrated development environment [38].
2.1. Study Area
The study is constrained to the Adelaide City Council (ACC) area in South Australia, Australia—chosen based on the extent of the hourly intersection traffic volumes dataset.
Figure 1 shows the location relative to wider Adelaide and Australia as a whole.
2.2. Data Processing Workflow
Figure 2 summarises the process by which intersection traffic volumes were joined to traffic accident records with reference to the relevant methods sections.
2.3. Accident Data
Traffic accident records were obtained from the Department of Planning, Transport and Infrastructure’s (DPTI) “Road Crash Data” dataset [39]. While this is publicly available, the dates and times of individual accidents are omitted for privacy reasons. The DPTI provided a version with dates and times included for use in this research. The dataset contains information about each accident, including the date and time, coordinates, weather conditions and accident severity.
A separate “units” table provides additional information about the units (including cars, cyclists and fixed objects) involved in each accident.
2.4. Processing Accident Data
Accidents that included unit types such as cyclists, pedestrians, wheelchairs and animals were removed as these units are not affected by traffic in the same way as vehicles in the main traffic stream.
The date-times of each accident were formatted into the ISO 8601 date-time format with the Adelaide time-zone specified. Standardising date-times between datasets ensured the accurate temporal joining of accidents and traffic volume measurements.
As traffic volume data only exists for intersections within the ACC between the years 2010 and 2014, the accidents were filtered to fit these parameters, leaving 2336 accidents (Table 1). Accident times were rounded down to the nearest hour to match the hourly timestamps of the traffic volume data. It was essential to round times down to the previous hour to ensure the traffic volume used was not affected by the accident itself. This practice is used in previous studies looking at the relationship between traffic volume and accident frequencies [34,40,41].
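The study's processing was carried out in R [37]; as an illustrative sketch only (in Python, with a made-up timestamp rather than a real accident record), flooring an accident time to the previous hour in the Adelaide time zone looks like:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical accident time; the real records carry their own fields.
accident_time = datetime(2012, 3, 14, 8, 47, tzinfo=ZoneInfo("Australia/Adelaide"))

# Round DOWN to the previous hour so the matched traffic volume
# measurement cannot have been affected by the accident itself.
floored = accident_time.replace(minute=0, second=0, microsecond=0)

print(floored.isoformat())  # ISO 8601 string with the Adelaide UTC offset
```

Flooring (rather than rounding to the nearest hour) is what guarantees the matched volume measurement ends before the accident occurred.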
2.5. Intersection Traffic Volume Data
Traffic intersection volumes from 2010 to 2014 [42] are publicly available through data.sa.gov.au. The dataset consists of hourly traffic volume measurements for 122 intersections in the ACC; recorded using the Sydney Coordinated Adaptive Traffic System (SCATS). Traffic volumes represent the total number of vehicles to pass through an intersection in each hour. Directional traffic data was not available—subsequent methods detail the approach used to address this. Each hourly measurement includes the coordinates of its corresponding intersection, meaning that every measurement at each intersection has a separate spatial data point. Over five million hourly traffic volume measurements are available to be paired to individual accidents (Table 2). This is important due to the rarity of accident events. A large traffic volume dataset increases the probability that any individual accident will have associated traffic data and increases the number of accidents useable in the analysis.
2.6. Processing Intersection Traffic Volume Data
Upon investigation of the data, an intersection on Anzac Highway and one on Wakefield Street were found to have median traffic volumes of zero vehicles per hour. Just over half of the traffic volume measurements at the Wakefield Street site were zero and nearly all measurements at the Anzac Highway site were zero; this is unrealistic for two major roads in the ACC. Volume measurements from these two intersections were removed from the dataset, leaving a total of 120 intersections (Table 2). There were also large groups of consecutive zero vehicles per hour readings—often during hours of the day when volumes above zero would be expected—and these are likewise errors. To address this, groups of traffic volume measurements that remained the same for more than five consecutive hourly periods, including values above zero, were removed. Overall, removing error measurements reduced the number of traffic volume measurements by approximately 150,000 (Table 2).
Hour of day and date columns were combined into one date-time column and formatted in the ISO 8601 date-time format [43] with the Adelaide time-zone specified.
Traffic volume measurements were also corrected using the provided error ratio, which indicates the proportion of vehicle counts in each hourly period that were made in error. The complement of this ratio is the “valid ratio”: the proportion of vehicles that were counted correctly in any given hour. Each hourly traffic volume measurement was multiplied by its valid ratio to give a corrected measurement, accounting for error in the SCATS sensors. SCATS was developed by the New South Wales Government in Australia and uses data collected from detectors at intersections to manage traffic signals.
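The valid-ratio correction is a simple element-wise scaling. A minimal sketch (in Python rather than the study's R, with invented volumes and error ratios; the real dataset's column names may differ):

```python
import pandas as pd

# Illustrative hourly volumes with an error ratio column (made-up values).
volumes = pd.DataFrame({
    "intersection": ["A", "A", "B"],
    "volume": [1000, 800, 2400],
    "error_ratio": [0.25, 0.00, 0.50],  # proportion of counts made in error
})

# valid ratio = 1 - error ratio; scaling by it removes the erroneous counts
volumes["valid_ratio"] = 1 - volumes["error_ratio"]
volumes["corrected_volume"] = volumes["volume"] * volumes["valid_ratio"]

print(volumes["corrected_volume"].tolist())  # [750.0, 800.0, 1200.0]
```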
2.7. Joining Accident and Traffic Volume Datasets
Before analysis of the effects of traffic volumes on the frequency of accidents could be conducted, it was necessary to know the volume of traffic passing through an intersection immediately before each accident. This required joining the accident and intersection traffic volume datasets.
Using the coordinates of each accident and each traffic volume measurement, the two datasets were spatially joined with a distance parameter of 20 m; joining each accident to the traffic volume data for any intersection within 20 m. Because every hourly traffic volume measurement has its own spatial data point, each accident record was duplicated across every traffic volume measurement at its intersection. This large dataset was filtered to only include rows where the date-time of the accident matched with the date-time of the traffic volume measurement. This resulted in a total of 1629 accidents (Table 3) with associated traffic volumes. This new table will be referred to as the accident volumes dataset.
The accidents in this dataset were then used to analyse the effects of traffic volumes on the occurrence of accidents.
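The temporal filtering step can be sketched as an equi-join on intersection and hour. This Python toy example assumes the 20 m spatial join has already resolved each accident to an intersection identifier (column names here are invented, not the study's):

```python
import pandas as pd

# Toy stand-ins for the two datasets after the spatial join.
accidents = pd.DataFrame({
    "accident_id": [1, 2],
    "intersection": ["A", "B"],
    "hour": pd.to_datetime(["2012-03-14 08:00", "2012-03-14 09:00"]),
})
volumes = pd.DataFrame({
    "intersection": ["A", "A", "B"],
    "hour": pd.to_datetime(["2012-03-14 08:00", "2012-03-14 09:00", "2012-03-14 10:00"]),
    "volume": [950, 1010, 2160],
})

# Keep only rows where the accident hour matches a volume measurement hour;
# accident 2 has no volume reading for its hour and so drops out.
accident_volumes = accidents.merge(volumes, on=["intersection", "hour"], how="inner")
print(accident_volumes[["accident_id", "volume"]].to_dict("records"))
```

An inner join here mirrors the paper's filtering: accidents without a matching hourly volume measurement are excluded, which is why only 1629 of the 2336 accidents carry traffic data.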
2.8. Rainfall Data
High temporal resolution rainfall data was purchased from the Bureau of Meteorology (BOM) [44]. Data for the “Adelaide (Kent Town)” rainfall station was available from 1995 to 2015, with a total of 353,439 measurements. Rainfall was recorded in increments of 0.2 mm at a temporal resolution of 30 minutes.
2.9. Accounting for Variability in Intersection Capacity
As each intersection has a different capacity, traffic volumes are not comparable between them. For example, 1000 vehicles per hour may be close to the capacity of one intersection but easily within the capacity of another.
To account for this, traffic volumes must be normalised. Traditionally, v/c ratios are used, with methods derived from the Highway Capacity Manual [36]. If this method were to be used, the capacity at signalized intersections would be calculated individually for different lane groups using their capacity, adjusted saturation flow rate, effective green traffic signal ratio and cycle length [36]. However, the broad spatial scale of the study area makes the use of this method difficult. Signal timing data for each intersection was not easily accessible and intersection geometry information would have been difficult to ascertain and use over 120 intersections. Additionally, traffic volume data was only available as a total count for each intersection. The lack of directional vehicle counts makes the calculation of v/c ratios for different lane groups impossible. As a result, a novel approach to standardising traffic conditions was taken.
This was achieved by assigning each measurement into one of 15 bins in a quantile classification based on other measurements at the same intersection (Appendix A explains the choice of 15 bins). A traffic volume measurement of 300 vehicles per hour may be assigned to bin 15 at a low-volume intersection, while a measurement of 5000 vehicles per hour may be assigned to bin 15 at a high-volume intersection. Looking at the two traffic volumes alone, they seem incomparable; however, they both fall among the highest volume measurements for their respective intersections. These bins effectively act as an index for congestion by representing traffic volumes relative to the overall range of volumes at an intersection.
2.10. Analysing the Relationship Between Traffic Volume and Accident Frequency
Intersections were grouped into three different sizes based on their median traffic volumes. Accidents in the accident volumes dataset were then grouped by the size of the intersection they occurred at and the congestion level at the time of the accident. This results in 45 groups (three intersection size ranks × 15 congestion levels). The number of accidents that occurred in each of these 45 groups was counted.
However, plotting accident frequencies against the congestion index on a linear scale distorts the response of accident frequency, because the 15 congestion levels are not distributed evenly throughout the traffic volume distributions at intersections. To address this, the median traffic volume of each congestion level was calculated, as explained in Appendix B. Accident frequencies were then plotted against these median values, allowing the linear hypothesis to be tested.
To determine whether the response of accident frequency to traffic volume was linear, or whether a significant non-linear effect was present, various models were fitted to accident frequency for each intersection size. As the data is non-negative count data, regular linear models are not appropriate.
Initially, Poisson generalized linear models (GLMs) were fitted with a single linear explanatory term. These models were then tested for overdispersion to determine whether the Poisson distribution was appropriate; if a Poisson model is overdispersed, the negative binomial model is more appropriate. The following formulae were used for either the Poisson or negative binomial models:
- Linear: accident frequency ~ traffic volume
- Quadratic: accident frequency ~ traffic volume + (traffic volume)^2
- Natural spline: accident frequency ~ natural spline(traffic volume, 4 d.f.)
The most preferable of these three models for each intersection rank was determined using the corrected Akaike Information Criterion (AICc) [45].
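AICc adds a small-sample penalty to the ordinary AIC: AICc = AIC + 2k(k + 1)/(n − k − 1), where k is the number of fitted parameters and n the number of observations. As a sketch (Python, with invented log-likelihoods rather than the study's fitted models), model selection reduces to:

```python
def aicc(log_lik: float, k: int, n: int) -> float:
    """Corrected Akaike Information Criterion: AIC plus a small-sample penalty."""
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical fitted log-likelihoods for the three candidate forms at one
# intersection rank (n = 15 congestion-level observations); k counts the
# intercept plus the explanatory terms.
candidates = {
    "linear":    aicc(log_lik=-40.0, k=2, n=15),
    "quadratic": aicc(log_lik=-35.0, k=3, n=15),
    "spline":    aicc(log_lik=-33.0, k=5, n=15),  # 4 d.f. spline + intercept
}
best = min(candidates, key=candidates.get)
print(best, round(candidates[best], 2))  # quadratic 78.18
```

With only 15 observations per rank, the extra penalty term matters: the spline fits best by raw likelihood here but loses under AICc because of its additional parameters.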
2.11. Accident Severity Analysis
For analysis of how congestion affects accident severity, the accident volumes dataset was filtered into three subsets, containing property damage only (PDO), minor injury (MI) and serious injury (SI) accidents (there were no fatal accidents at intersections in the ACC during the study period). As there were only 20 SI accidents with traffic volume data, there was too much noise for a clear response to be observed and SI accidents were not considered further.
The frequencies of PDO and MI traffic accidents in each congestion level were then plotted, allowing any difference in the response of accident frequency to be seen. Normalised frequencies are the proportion of total PDO or MI accident counts in each congestion level. The accident frequency ratio is the ratio of PDO to MI frequencies in each congestion level.
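As an illustrative sketch of the normalised-frequency and frequency-ratio calculations (Python rather than the study's R, with made-up counts for three congestion levels only):

```python
import numpy as np

# Hypothetical accident counts per congestion level for each severity class.
pdo = np.array([40, 55, 80])   # property-damage-only accidents
mi = np.array([10, 11, 40])    # minor-injury accidents

# Normalised frequencies: each level's share of that severity's total count.
pdo_norm = pdo / pdo.sum()
mi_norm = mi / mi.sum()

# Accident frequency ratio of PDO to MI accidents in each level.
ratio = pdo / mi
print(ratio.tolist())  # [4.0, 5.0, 2.0]
```

Normalising within each severity class lets the shapes of the two responses be compared directly even though PDO accidents are far more numerous than MI accidents.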
2.12. Rainfall Risk Analysis
To determine the effect of rain on accident occurrence, the accidents in the accident volumes table were separated into accidents occurring while it was raining and while it was not raining. For each of these filtered datasets, the accident frequency in each congestion level was counted.
Using these not-raining and raining accident frequencies, the risk of not-raining and raining accidents in each congestion level was calculated. Accident risk is the probability of an accident occurring within a period. Using raining accident risk as an example, risk was calculated using Equation (1):

raining accident risk = (number of raining accidents)/(total number of raining periods),(1)

where a period refers to the hourly traffic volume periods.
To understand how rainfall risk changes with increasing congestion, this calculation was applied to each of the 15 congestion levels (Equation (2)):

raining accident risk (Cx) = (number of raining accidents in Cx)/(number of raining periods in Cx),(2)

where Cx is the congestion level (1–15).
The process was repeated for not-raining accidents.
To determine the total number of hourly periods in which it was raining (or not raining), the BOM rainfall data was joined to the intersection volumes table. The number of traffic volume measurement periods in which it was raining and not raining were counted for each of the 15 levels of congestion, allowing risks to be calculated for each level.
Relative risk (RR) is the ratio of the risk of an event occurring under exposed conditions to the risk of an event occurring under control conditions [46]. In the context of accident risks, RR was calculated for each congestion level (Cx) using the risk values calculated using Equations (1) and (2):

RR (Cx) = raining accident risk (Cx)/not-raining accident risk (Cx).
RR was then plotted against the congestion index to allow any changes with increasing congestion to be observed. A change in RR would indicate a change in how rainfall affects the risk of an accident. An RR greater than one means that the risk of an accident occurring is higher when it is raining, while an RR less than one means that the risk of an accident occurring is greater while it is not raining.
5. Conclusions
This study has demonstrated the ability of high temporal frequency traffic volume data to be used in parsimonious models for predicting accident frequencies at intersections. A total of 1629 motor vehicle accidents were linked to traffic volume data from a pool of over five million hourly traffic volume data points.
Results show that accident frequency increases non-linearly at the highest levels of congestion, suggesting that managing traffic to avoid such high levels of congestion would have the greatest impact on reducing accident occurrence. Importantly, there is no observable increase in accident frequency as congestion decreases, meaning that reducing congestion would not negatively impact public health.
Change in the severity of accidents between congestion levels was also considered. However, no relationship was found, possibly due to the lack of SI and fatal accidents in the data.
Rainfall risks were compared individually for each of the 15 levels of congestion, showing an increased influence of rainfall on accident occurrence when levels of congestion are low and indicating an increased importance of rainfall risk management in these conditions.
This analysis has demonstrated the benefit of using long-term, broad-scale, temporally detailed data for accident risk analysis.