Crash Classification by Congestion Type for Highways

Effective management of highway networks requires a thorough understanding of the conditions under which vehicular crashes occur. Such an understanding can and should inform related operational and resource allocation decisions. This paper presents an easily implementable methodology that can classify all reported crashes in terms of the operational conditions under which each crash occurred. The classification methodology uses link-based speed data. Unlike previous secondary collision identification schemes, it requires neither an a priori identification of the precipitating incident nor a definition of the precipitating incident's impact area. To accomplish this objective, the methodology makes use of a novel scheme for distinguishing between recurrent and non-recurrent congestion. A 500-crash case study was performed using a 274 km section of I-40 in North Carolina. Twelve percent of the case study crashes were classified as occurring in non-recurrent congestion, and thirty-seven percent of those crashes were identified as lying within the influence area of an unreported primary incident or crash. The remaining crashes were classified as primary crashes occurring in either uncongested conditions (84%) or recurrent congestion (4%). The methodology can be implemented in any advanced traffic management system for which crash times and link locations are available, along with corresponding archived link speed data.


Introduction
The primary role of an Advanced Traffic Management System (ATMS) is to improve reliability and safety through active real-time traffic management and control. Vehicular crashes endanger lives, damage property, and cause congestion, presenting an obstacle to the goal of improving the safety, efficiency, and sustainability of the transportation system. To manage a system well, it is important to understand the conditions under which crashes happen. This knowledge can inform crash management and resource allocation for incident response. For instance, Variable Speed Limits (VSL) may be an effective countermeasure to prevent crashes during recurring congestion. Conversely, reducing incident response time may be the most effective strategy for mitigating the impacts of crashes during congested periods. However, crash records in a crash analysis database do not indicate congestion conditions at the crash scene but only environmental, crash, roadway, and driver characteristics [1][2][3].
Another motivation for understanding and classifying crashes is that such information helps improve reliability and safety [4][5][6][7][8][9][10]. Many previous studies have focused on identifying the relationship between crashes and the traffic flow rate. The relationship between reliability and safety is less well understood, beyond the fact that vehicular collisions and other unplanned incidents increase travel time variability and decrease reliability. Operating jurisdictions tend to note simply whether recurring or non-recurring congestion was extant at the time of the crash [11,12]. Conversely, many traffic safety researchers have studied secondary collisions and the factors affecting their occurrence on freeways [13,14].
This analysis focuses on classifying crashes, especially whether they occurred during recurrent or non-recurrent congestion. The implicit objective is to help operating agencies understand how to reduce the number of secondary collisions and mitigate their risk. A necessary precursor is a method to classify each crash in terms of whether or not it occurred during congested conditions, and if so, whether the congestion was the cause of, or was impacted by, the event. This calls for an integrated methodology to classify crashes by congestion type.
With these considerations in mind, this paper presents an easily implementable methodology that can classify all reported crashes in terms of the operational conditions under which each crash occurred. It classifies crashes into three cases: (1) crash not during congested conditions, (2) crash during non-recurrent congestion, and (3) crash during recurring congestion. Unlike previous secondary collision identification schemes, it requires neither identification of the precipitating incident nor a definition of the precipitating incident's impact area. It supports decision-makers in their efforts to implement both safety and mobility treatments that are precisely targeted and effective.
In what follows, relevant studies are reviewed and knowledge gaps are highlighted. Then, the proposed methodology is described and applied to a 274 km section of I-40 in North Carolina in the United States. The paper concludes with a presentation of the findings, conclusions, and recommendations for further research.

Literature Review
Several studies have tested the relationship between crash rates and flow rates or density [4][5][6][7][8][9][10]. However, the results have been inconclusive. Zhou and Sisiopiku [4] found a U-shaped relationship between V/C ratio and crash rate on freeways. On the other hand, Lord et al. [8] did not find a relationship between congestion and either crash rate or crash severity.
Insofar as crash types and congestion are concerned, there are two important questions: (1) does congestion influence the crash type, and (2) vice versa. Most studies have focused on the former, but this one is concerned with the latter. Golob et al. [9] and Lee et al. [10] found that rear-end crashes were more likely under unstable traffic flow conditions. Elvik et al. [15] identified the main factors influencing accidents on road bridges and found that traffic volume was the most influential factor. Wang et al. [6] found that traffic congestion had little or no impact on crash rates; however, their statistical model failed to pass significance tests. The reason may be that they used a congestion index based on the average congestion level across an entire year. More recent studies have focused on identifying the relationship between road accidents and traffic volume. Xu et al. [16] found that high traffic volume was responsible for 25.6% of serious casualty crashes, indicating a positive relationship between traffic volume and road accidents. Zhan-Moodie [17] concluded that congestion can be linked with crashes by superimposing crash areas on top of congested areas using GIS shapefiles. Retallack and Ostendorf [18] reviewed this work and noted that the method was not tested using congestion information that pertained at the time of the crash. The study presented here addresses this issue by using a congestion measure predicated on the traffic conditions at the time of the crash.
An extant traffic condition can be classified as either recurrent or non-recurrent congestion. Non-recurrent congestion can be defined as delay caused by an incident, a work zone, adverse weather, or another non-repetitive event [12,19–25]. Chung [21] defines non-recurrent congestion as the extra delay caused by incidents compared with the annual average section travel speed. For instance, if the free-flow speed is 60 mi/h and the annual average section travel speed during peak periods is 30 mi/h, then recurrent congestion is assumed to occur during those periods, and any additional delay is non-recurrent.
Defining recurring congestion is more problematic [19–22,26]. The Oxford Dictionary defines "recurrent" as something that occurs often or repeatedly. In other words, recurrent congestion should be "predictable" in both location and time; drivers should be able to say that "this area, at this time, is often congested". Most previous studies use either the mean or the median of a speed distribution during a specified time of day to measure recurrent delay, and treat the extra delay caused by incidents as non-recurrent congestion on a freeway. Caltrans [27] defines recurring congestion as a combination of location, time, and speed: for example, an average speed of 35 miles per hour or less for 15 min or more on a specific freeway segment. Schaefer [25] defines it as a Travel Time Index (TTI) of 1.5 or more for a given segment. Song et al. [11] recently employed a data-driven approach using link speeds; their approach defined "recurrent" consistently with the dictionary definition and identified recurrent bottleneck locations and their time spans.
Collision classification tends to rely on deterministic queuing theory. If a specific freeway segment is under study, the rules are often as follows. First, crashes which occur during times of non-recurrent congestion are classified as such. These become the "primary" incidents. Then, the boundaries of the impact area are determined. The rules for doing this may be either static or dynamic. In the static case, one might assume that a secondary incident has arisen if it occurs within 15 min of the primary event and within 1 mile upstream [28]. This rule might be applied even though the primary incident impact area is longer. That is, incidents occurring more than a mile upstream are not classified as secondary.
Dynamic thresholds overcome some of these limitations [23,24,29,30]. Chou and Miller-Hooks [29] created a "simulation-based secondary incident filtering" method (SPSIF). They implemented a regression model to identify the boundaries of the primary incident impact area; all subsequent incidents within that area were classified as secondary. Zhang and Khattak [31] used queuing analysis to study single-pair events (one primary and one secondary incident) and large-scale events (one primary and many secondary incidents); their objective was to determine the "back of queue" location. Yang et al. [32] proposed the use of historical virtual sensor measurements to identify secondary crashes, using a Representative Speed Contour Map (RSCM) built from percentile speeds of historical incident-free virtual sensor measurements for each spatiotemporal cell. Moreover, Goodall [13] used private-sector INRIX TMC speed data in the first analysis of secondary crash occurrence to integrate incident timelines. However, that approach still requires identifying the precise primary incidents.
The work presented here differs from previous work in four important ways.
(1) Each crash is linked with an incident timeline, which is a necessary precursor to investigating the relationship between reliability and safety; previous studies have instead conducted statistical tests relating traffic volume to accident frequency. (2) A novel methodology is proposed to classify crashes in recurrent congestion as well as crashes in non-recurrent congestion. (3) A recurrent bottleneck identification approach is used to identify recurrent bottleneck locations and their impact areas. (4) The methodology does not require any information on primary incidents (crashes and non-crash incidents) to identify crashes in non-recurrent congestion.

Methodology
A four-step methodology is employed, as depicted in Figure 1. In Step 1, both mobility and crash data are directionally and temporally linked to freeway segments using a GIS-based shapefile. Fifteen-minute speed data are used as a mobility surrogate. In Step 2, the crashes are classified based on whether they occurred during normally congested or uncongested periods; a cut-off threshold is employed to differentiate between these conditions. In Step 3, for the crashes that occur during normally uncongested periods, it is determined whether non-recurring congestion was present at the time of the crash. Crashes occurring in non-recurring congestion, where a bottleneck is rarely or never activated, are identified using a specific cut-off threshold derived from a historic spatiotemporal congestion index that identifies recurrent congestion. Finally, in Step 4, the remaining crashes are classified as occurring during recurrent or non-recurrent conditions. First, crashes are classified as being in recurrent congestion if they fall within a recurrent congestion impact area whose bottleneck activates frequently. Crashes that cannot be classified this way are treated as special cases and classified with a supplemental methodology.

Mobility Data
Speed data from INRIX.com were employed in the study. INRIX.com uses GPS-enabled probe vehicles to collect this information. The geocoding is based on Traffic Message Channel (TMC) codes, as defined by Tele Atlas and Navteq. Each TMC corresponds to a directional roadway segment with geolocated begin and end points. INRIX reports average travel times, average speeds, reference speeds, and scores by time of day and day of the week. The score indicates whether the reported speed is based on historical data, real-time data, or a blend of the two.
In the study area, the Regional Integrated Transportation Information System (RITIS) provides an integrated spatiotemporal contour map of traffic congestion for each TMC segment by time of day [33]. RITIS also provides contour maps for comparative speeds, congestion, a travel time index, and so forth. This study used 15 min aggregated congestion data. RITIS defines the congestion value, C(i,t,m), for segment i at time t on day m, where i is a TMC segment, t is a specified 15 min time interval in a day (e.g., 8:00–8:15), typically varying from 1 to 96 for a single day, and m is the index of a (week)day in the study period. Finally, MS(i,t,m), the measured speed (mi/h), and FFS(i), the free-flow speed (mi/h), are specified for each segment.
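As a concrete sketch of these definitions, the congestion value and the interval index t can be derived as follows. The exact RITIS formula is not reproduced above, so the ratio-of-speeds form below is an assumption, and all names are illustrative:

```python
# Hedged sketch: congestion value for one TMC segment and 15-min interval.
# Assumption: C(i, t, m) is the measured speed MS(i, t, m) expressed as a
# percentage of the free-flow speed FFS(i), capped at 100%.

def congestion_value(ms_mph: float, ffs_mph: float) -> float:
    """C = 100 * MS / FFS (%), capped at 100."""
    if ffs_mph <= 0:
        raise ValueError("free-flow speed must be positive")
    return min(100.0, 100.0 * ms_mph / ffs_mph)

def time_interval_index(hhmm: str) -> int:
    """Map a clock time 'HH:MM' to the 15-min interval index t in 1..96."""
    hh, mm = map(int, hhmm.split(":"))
    return (hh * 60 + mm) // 15 + 1
```

For example, a measured speed of 30 mi/h against a 60 mi/h free-flow speed gives a congestion value of 50%, and the 8:00–8:15 interval corresponds to t = 33.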

Crash Data

Table 1 presents two examples of the TEAAS crash data.


Congestion Assessment Methodology
A method was developed to determine the level of congestion that was extant at the time of each crash. As mentioned earlier, the recurrent bottleneck identification approach developed by Song et al. [11] was employed. It makes use of a Congestion Index (CI), an Average Historic Congestion Index (AHCI), and a Recurrent Bottleneck Location Identification (RBLI).
The congestion index, CI(i, t, m), labels each segment i as being congested or uncongested at time t, as illustrated in Figure 2b. The congestion value for each segment is determined; if it is below a threshold α, segment i is classified as being congested at time t. Here, a value of 80% was used as the threshold. This value is consistent with the ratio of the speed at capacity to the free-flow speed presented in the US Highway Capacity Manual [34].
The Average Historic Congestion Index, AHCI(i, t), is defined as the fraction of (week)days in the study period T (typically one or two years) on which segment i was congested at time t, based on the congestion index CI(i, t, m). AHCI is the key parameter for identifying recurrent congestion; it denotes the probability that segment i is congested at time t.
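The CI and AHCI definitions above can be sketched in code as follows. This is a minimal illustration assuming speeds are stored per (segment, interval, day); the data layout and function names are hypothetical, not the authors' implementation:

```python
# Sketch of CI(i, t, m) and AHCI(i, t). Assumes speeds[(i, t, m)] in mi/h
# and free-flow speeds ffs[i]; alpha is the 80% congestion-value threshold.

ALPHA = 80.0

def ci(speeds, ffs, i, t, m, alpha=ALPHA):
    """CI(i, t, m): 1 if segment i is congested at time t on day m, else 0."""
    congestion_value = 100.0 * speeds[(i, t, m)] / ffs[i]
    return 1 if congestion_value < alpha else 0

def ahci(speeds, ffs, i, t, days, alpha=ALPHA):
    """AHCI(i, t): percentage of study days on which segment i was congested at t."""
    congested_days = sum(ci(speeds, ffs, i, t, m, alpha) for m in days)
    return 100.0 * congested_days / len(days)
```

With five study days on which the measured speed was 30, 40, 55, 58, and 60 mi/h against a 60 mi/h free-flow speed, only the first two days fall below 80% of free flow, so AHCI = 40%.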
AHCI contour maps are used to define the recurrent bottlenecks as well as their influence areas. To illustrate, Figure 3 shows three different patterns that were observed in the Spring and Fall seasons of 2013 in North Carolina. Each block corresponds to an AHCI value at a TMC segment at a certain time of day. A TMC with the highest AHCI value (more than 50%) at a given time of day is defined as a recurrent bottleneck segment, β, as proposed by Song et al. [11].
The question then is: what other nearby TMC segments are also in the recurrent bottleneck area? In reference to Figure 3a, the TMC with the highest AHCI above 50% (TMC A) is located just upstream of a significant drop in AHCI value. Therefore, this TMC segment is clearly in a recurrent bottleneck, and this pattern is defined as Pattern 1. In Figure 3b, the TMC segment with the highest AHCI value (TMC B) is not located immediately upstream of a significant drop in the AHCI value. However, TMC B is still part of the bottleneck area; this pattern is defined as Pattern 2. In Figure 3c, the TMC segment with the highest AHCI value (greater than 50%) is in the bottleneck area (TMC C); however, a significant drop is not observed in the AHCI value for this TMC segment or the one downstream of it. In addition, the downstream TMC segment has an AHCI value below 50%. Therefore, the bottleneck area is defined by TMC C alone, and this pattern is defined as Pattern 3. In this study, the bottleneck influence area includes all the TMC segments upstream of a bottleneck area with AHCI values greater than 20%, with the assumption that congestion occurs on at least one of every five weekdays.
The cut-off threshold γ for the boundary of the bottleneck influence area is reasonable in that, across the spatial patterns, the TMC segments located upstream have greater AHCI values than the downstream TMCs, as seen in Figure 3a-c.
The recurrent bottleneck identification methodology developed by Song et al. [11] employs an exhaustive search algorithm to identify the recurrent bottleneck locations. Two constraints are imposed on this search. First, to be included in a recurrent bottleneck location, all contiguous TMC segments must have AHCI values exceeding 50%. Second, at most two of the TMC segments may be included under spatial pattern 2. A threshold, δ, is defined as the allowable ratio between the AHCI of a candidate bottleneck segment and that of its downstream TMCs; the δ value used in this study is 2. In Figure 4, the TMC U segment produces an AHCI value more than δ times the AHCI value of the downstream TMCs, so this segment is included in the recurrent bottleneck. Song et al. [11] recommended δ values of 2.0, 2.4, and 2.5, which transportation jurisdictions may select as appropriate. The procedure is given in Algorithm 1.

Algorithm 1. The recurrent bottleneck identification methodology.

For each spatiotemporal cell AHCI(i, t):
    If AHCI(i, t) ≥ 50%:
        Spatial patterns 1, 3: y1 = AHCI(i, t) − 2·AHCI(i+1, t)
        Spatial pattern 2(1): y2 = AHCI(i, t) − 2·AHCI(i+2, t)
        Spatial pattern 2(2): y3 = AHCI(i, t) − AHCI(i+1, t)
        If y3 ≥ 0 and (y1 ≥ 0 or y2 ≥ 0):
            Segment (i) is flagged as a possible bottleneck
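A literal Python rendering of Algorithm 1 might look like the sketch below, where index i+1 denotes the next TMC downstream and the factor 2 corresponds to δ; the function name and the list-based data layout are illustrative:

```python
DELTA = 2.0  # delta: allowable AHCI ratio between a bottleneck and downstream TMCs

def possible_bottlenecks(ahci_row, delta=DELTA):
    """ahci_row: AHCI values (%) at one time t, ordered upstream to downstream.
    Returns the indices flagged as possible recurrent bottleneck segments."""
    flagged = []
    for i in range(len(ahci_row) - 2):          # needs i+1 and i+2 downstream
        a = ahci_row[i]
        if a < 50.0:                            # bottleneck candidates only
            continue
        y1 = a - delta * ahci_row[i + 1]        # spatial patterns 1 and 3
        y2 = a - delta * ahci_row[i + 2]        # spatial pattern 2(1)
        y3 = a - ahci_row[i + 1]                # spatial pattern 2(2)
        if y3 >= 0 and (y1 >= 0 or y2 >= 0):
            flagged.append(i)
    return flagged
```

For an AHCI row such as [60, 55, 10, 5], the first two segments are flagged: each has AHCI above 50% and at least double the AHCI of a segment two positions downstream.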

Collision Classification Methodology by Type of Congestion
A four-step method is used to add a congestion level indicator to the crash records, as shown in Figure 5. This section provides detailed explanations for each step.


Step1: Locate the Crash in GIS
First, a TMC code is associated with each crash. As the crash records include longitude and latitude information, the TMC codes can be identified easily. A link shapefile of TMC segments is used to join the two datasets. Thus, each crash becomes associated with a specific TMC segment, which is defined as "TMCc."
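In practice this join is a GIS point-to-line match. A simplified sketch, under the assumption that each directional TMC segment has been reduced to a milepost range along the route and the crash coordinates have been projected to a milepost, is:

```python
# Hypothetical simplification of the GIS join: linear referencing by milepost.
# Segment tuples and TMC codes below are illustrative, not real TMC data.

def find_tmc(segments, crash_milepost):
    """segments: list of (tmc_code, start_milepost, end_milepost) tuples.
    Returns the TMC code whose [start, end) range contains the crash, or None."""
    for tmc_code, start_mp, end_mp in segments:
        if start_mp <= crash_milepost < end_mp:
            return tmc_code
    return None  # crash falls outside the study corridor
```

A half-open range per segment ensures that a crash at a segment boundary is assigned to exactly one TMC.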

Step2: Crashes in Uncongested Conditions
The second step is to identify the crashes that occur during uncongested conditions. As shown in Figure 5, if the crash occurs on a TMC segment that is uncongested at the time of the crash, then the crash is labeled as "Case 1-Crash not in congested conditions". Otherwise, the analysis continues to Step 3. As mentioned earlier, if CI(i,t,m) = 0, then TMC segment i is considered to be uncongested at time t, and congested otherwise.


Step3: Crashes in Congested Conditions
For those crashes that do not occur during uncongested conditions, the third step is to determine the congestion status of the TMC segment at the time of the crash. Two main metrics are employed: CI(i,t,m) and AHCI(i,t). The analysis focuses on the pattern of AHCI(i,t) values during the crash versus "normally". As mentioned earlier, two breakpoints, β and γ, are used to evaluate AHCI(i,t). If AHCI(i,t) < γ, then TMC segment i is considered to be historically uncongested at time t; otherwise, if AHCI(i,t) ≥ β, then it is historically congested.
The values of β and γ are determined through a sensitivity analysis. Figure 6 shows the sensitivity analysis for crashes labeled as "Case 2-Crash in non-recurrent congestion". If the upper bound is between 50 and 70 and the lower bound is between 10 and 20, the number of Case 2 selections is quite stable. However, if the upper bound drops below 50 or the lower bound is increased beyond 30, the number of Case 2 classifications changes significantly. This leads to the conclusion that the upper and lower bounds used in the selection process are a good combination to employ. Accordingly, values of 60% for β and 20% for γ were adopted. Therefore, if AHCI(i,t) < 20%, TMC segment i is congested at time t less than once in every five days, and if AHCI(i,t) ≥ 60%, it is congested at least every other day. If AHCI(i,t) < γ, then the crash is labeled as "Case 2". Otherwise, it is passed to Step 4.
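The sensitivity analysis can be mimicked with a short sketch: for candidate breakpoints, split the crashes that were congested at the crash time into Case 2, Case 3, and the uncertainty band that proceeds to the Step 4 sub-test. The AHCI values used below are illustrative, not the case study data:

```python
def split_by_ahci(crash_ahci, gamma, beta):
    """crash_ahci: AHCI (%) at each crash's segment and time, restricted to
    crashes that were congested at the crash time.
    Returns (case2, case3, subtest) counts."""
    case2 = sum(1 for a in crash_ahci if a < gamma)       # historically uncongested
    case3 = sum(1 for a in crash_ahci if a >= beta)       # historically congested
    subtest = len(crash_ahci) - case2 - case3             # area of uncertainty
    return case2, case3, subtest
```

Sweeping gamma over 10–30% and beta over 50–70% and plotting the Case 2 count reproduces the kind of stability check shown in Figure 6.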
Step4: Crashes in Recurrent or Non-Recurrent Congestion

In Step 4, if AHCI(i,t) ≥ β, then the crash is labeled "Case 3-Crash in recurrent congestion". Otherwise (that is, 20% ≤ AHCI(i,t) < 60%), a sub-test is performed to see whether recurrent congestion is typically present at the time of the crash. If recurring congestion is typically present, the crash is labeled "Case 3"; otherwise, it is labeled "Case 2". In this area of uncertainty, a supplemental test checks whether there is a recurrent bottleneck TMC downstream of the crash location, and whether the queue from that TMC typically spills back to the crash location. This test is given in Algorithm 2.

Algorithm 2. Classifying Remaining Crashes (Supplemental Methodology).

If TMCc is in a recurrent bottleneck segment
    Go to Case 3
Elseif TMCc is not in a recurrent bottleneck segment but the queue from a downstream recurrent bottleneck typically spills back to TMCc
    Go to Case 3
Else
    Go to Case 2
In the algorithm, AHCITMCc(i,t) is the AHCI of TMCc, and AHCITMCc(i+b, t) is the AHCI of the recurrent bottleneck location downstream of TMCc.
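Putting Steps 2–4 and Algorithm 2 together, the decision logic for a single crash can be sketched as follows. The boolean inputs are assumed to have been derived beforehand from the CI and AHCI contour maps; the names are illustrative, not from the authors' implementation:

```python
GAMMA, BETA = 20.0, 60.0  # AHCI breakpoints (%) from the sensitivity analysis

def classify_crash(ci_at_crash, ahci_at_crash,
                   in_recurrent_bottleneck, downstream_queue_spills_back):
    """Return 'Case 1', 'Case 2', or 'Case 3' for one crash on segment TMCc."""
    if ci_at_crash == 0:                          # Step 2: uncongested at crash time
        return "Case 1"
    if ahci_at_crash < GAMMA:                     # Step 3: rarely congested here/now
        return "Case 2"
    if ahci_at_crash >= BETA:                     # Step 4: routinely congested
        return "Case 3"
    # Algorithm 2: area of uncertainty (GAMMA <= AHCI < BETA)
    if in_recurrent_bottleneck or downstream_queue_spills_back:
        return "Case 3"
    return "Case 2"
```

The early returns mirror the flowchart order of Figure 5: each crash is resolved at the first step whose condition it satisfies.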



Illustrated Case Study
This section illustrates the use of the classification methodology. Data from the 274 km "test site" is employed.

Study Site
The 274 km section of I-40 used to test the methodology extends from Exit 259 (at the split with I-85 in Durham) to Exit 420 (Gordon Rd, at the eastern end of I-40 outside of Wilmington). This section contains 98 TMC segments, with an average length of 1.6 miles and a standard deviation of 2.05 miles. The first quartile, median, and third quartile lengths are 0.51, 0.67, and 1.56 miles, respectively. The posted speed limit is either 65 or 70 mph. The data employed were for Tuesdays, Wednesdays, and Thursdays in April, May, September, and October of 2012 and 2013 (a total of 105 days). Aggregated 15-min TMC segment data were used to create the congestion contour plots. The crash data covered both directions; the dataset comprised 500 records (234 crashes westbound and 266 eastbound).

Crash Classification Results
Crashes on the TMC segments were classified using the methodology described in Section 3. Figure 7 shows the contour map for one of the crashes classified as "Case 1-Crash not in congested conditions". The number of Case 1 crashes was 419 out of 500 (i.e., 84%).


In Step 3, the remaining 81 crashes were further analyzed. Of them, 62 were classified as belonging to Case 2 (a crash that occurs in non-recurrent congestion). Panels (a) and (b) of Figure 8 show a crash that was identified as belonging to Case 2. Figure 8e shows an instance where the Case 3 classification was based on the CI contour map, and Figure 8f presents an instance where it was based on the AHCI contour map.
The remaining 19 crashes were passed to Step 4. Of these, nine had an AHCI value of more than 60% and were placed in Case 3. Panels (c) and (d) of Figure 8 show an example of a TMCc with AHCI(t) ≥ 60% that was identified as belonging to Case 3. The remaining 10 crashes moved on to the sub-test in Step 4 and were subsequently also classified as Case 3.
The results of the crash classifications are summarized in Table 2. As can be seen, the proportions of Cases 1, 2, and 3 were 84%, 12%, and 4%, respectively.
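The step-by-step counts reported above can be tallied into the Table 2 proportions; a minimal sketch of that bookkeeping, using only the counts stated in this section:

```python
# Counts from the case study: Step 2 leaves 419 Case 1 crashes, Step 3
# classifies 62 of the remaining 81 as Case 2, and Step 4 places the final
# 19 (9 via AHCI > 60%, 10 via the bottleneck sub-test) in Case 3.
counts = {"Case 1": 419, "Case 2": 62, "Case 3": 9 + 10}
total = sum(counts.values())
assert total == 500  # all reported crashes are classified

proportions = {case: round(100 * n / total) for case, n in counts.items()}
print(proportions)  # {'Case 1': 84, 'Case 2': 12, 'Case 3': 4}
```

The assertion that the counts sum to 500 is the key check: the methodology classifies every reported crash into exactly one of the three cases.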

Conclusions
This paper has presented a method for classifying crashes based on the type of congestion in which they occur. The methodology was tested using crash data for 274 centerline kilometers of I-40 in North Carolina for Tuesdays, Wednesdays, and Thursdays in April, May, September, and October of 2012 and 2013. In addition, an approach for identifying "recurrent" congestion was used as part of the procedure. Unlike previous studies that used the mean and median of the speed distribution to distinguish recurrent and non-recurrent congestion patterns, the method used here employs a recurring congestion definition that is based on an average congestion history computed from probe-based speeds.
The proportion of secondary crashes (a surrogate for crashes in non-recurrent congestion) identified in the case study is in line with results from previous classification studies, where the secondary crash percentage ranged from 2.2% to 15.5% [28,35-41]. However, the study found a secondary crash proportion that is near the upper end of the range reported in these earlier studies. This is to be expected because the proposed methodology classifies crashes as secondary whenever they occur in atypical congestion, without the need to identify the primary crash or incident event.
There are some limitations to the study. The most important is the need to validate the secondary incident and crash identifications, which calls for identifying and defining real-world secondary events using more detailed approaches. Another issue is that link-based traffic data provide uniform traffic performance information for an entire TMC. Thus, this study employed the starting and ending points of the TMCs as the beginning and ending points of the congestion. This may introduce errors in identifying non-recurrent congestion conditions because of the limitation of the TMC segment itself (different link lengths). However, this limitation can be addressed as vendors provide speed data for sub-segments. Finally, data quality depends on the probe vehicle percentage and should improve as probe penetration increases.
Although only 4% of the total crashes in this case study were identified as a "Crash in recurrent congestion", the percentage of "secondary" crashes caused by primary incidents in recurrent congestion compared well with the results of previous studies. Identifying these crashes calls for a more detailed classification methodology, which the authors are presently investigating using more accurate crash data from state-of-the-art in-vehicle technologies such as black boxes and event data recorders. Finally, the impact of rubbernecking was not considered, because the objective of this study was to develop a robust and easily implementable methodology for classifying crashes in different types of congestion.