Empirical Analysis and Modeling of Stop-Line Crossing Time and Speed at Signalized Intersections

In China, a flashing green (FG) indication of 3 s followed by a yellow (Y) indication of 3 s is commonly applied to end the green phase at signalized intersections. Drivers' stop-line crossing behavior during such a phase transition period significantly influences the safety performance of signalized intersections. The objective of this study is thus to empirically analyze and model drivers' stop-line crossing time and speed in response to this specific FG and Y phase transition period. High-resolution trajectories for 1465 vehicles were collected in Shanghai at three rural high-speed intersections with a speed limit of 80 km/h and two urban intersections with a speed limit of 50 km/h. Using the vehicle trajectory data, statistical analyses were performed to examine the general characteristics of stop-line crossing time and speed at the two types of intersections. A multinomial logit model and a multiple linear regression model were then developed to predict the stop-line crossing patterns and speeds, respectively. It was found that, compared with the rural intersections, the urban intersections had a remarkably higher percentage of stop-line crossings during the Y interval and an approximately 0.7 s longer stop-line crossing time. In addition, approaching speed and distance to the stop-line at the onset of FG, as well as area type, significantly affect the percentages of stop-line crossings during the FG and Y intervals. Vehicle type and stop-line crossing pattern were found to significantly influence the stop-line crossing speed, in addition to the above factors. Red-light-running seems to occur more frequently at large intersections with long cycle lengths.


Introduction
Despite the availability of traffic control devices and traffic signal warrants in China, the lack of a universal signal timing guideline has led to inappropriate signal timings being applied arbitrarily across different signal control devices. Among the signal timing schemes implemented by local authorities, the most widely used is a flashing green (FG) of 3 s, followed by a constant yellow (Y) of 3 s and an all-red (AR) of 1 s or 2 s. This sequence serves as the transition period between phases, and the AR duration depends mainly on the intersection size [1,2].

Past Research
The following section summarizes pertinent past research regarding the impacts of FG on drivers' stop-line crossing behavior as well as modelling approaches for drivers' stop-or-go decisions at the end of the green phase at signalized intersections.
The use of FG before the Y indication considerably increases the complexity and dynamics of drivers' decision-making at signalized intersections [17,18]. A few previous studies have addressed the impacts of FG on drivers' stop-or-go decisions and on dilemma zone (DZ) occurrence, and they have reported differing results. On the positive side, intersection approaches with an FG signal appear to have a lower proportion of drivers crossing during the red than those without one [19,20]. On the other hand, the presence of an FG signal markedly increases the proportion of stopping decisions [19,21,22]. In addition, the FG signal significantly reduced the area of the Type I DZ at the onset of Y while enlarging the option zone and the Type II DZ, and it substantially increased conservative stops and slightly increased aggressive passes [21,23]. The severity of maximum accelerations and decelerations was also found to be reduced in the presence of an FG signal; in this respect, adding an FG signal has an effect similar to lengthening the Y interval [1,20].
Many researchers have also modeled drivers' stop-or-go decisions at the end of the green phase and found that these decisions are random and follow a certain probability distribution [24]. Using logistic regression or logit models, variables such as approaching speed, distance or travel time to the stop-line at the onset of Y, vehicle type, and driver characteristics were found to contribute to the stopping probability [20,21,25-27]. In addition, one study used a classification tree model to relate the probabilities of a stop-or-go decision and of red-light-running to traffic parameters [28]. Fuzzy logic and hidden Markov models have also been used to interpret drivers' stop-or-go decision-making [18,29-31].
In summary, much attention has been given to the impacts of FG on drivers' stop-or-go decisions and DZ occurrence. Both positive and negative impacts of FG have been reported and debated in the literature. This diversity of conclusions might be due to differences in the length of the FG interval, culture, intersection geometry, signal operations, and other local conditions. In contrast, little research has addressed the impacts of FG on stop-line crossing speed and time in response to the specific phase transition period of a 3 s FG followed by a 3 s Y. In addition, comparisons of such impacts between rural high-speed intersections and urban intersections are scarce. This study was therefore intended to fill this research gap.

Site Descriptions
Five intersections in Shanghai, all operating with an FG of 3 s, a Y of 3 s, and an AR of 1 or 2 s, were selected to collect driver behavior and traffic operation data, as shown in Figure 1. The selected intersections fall into two groups: three rural intersections with a speed limit of 80 km/h and two urban intersections with a speed limit of 50 km/h. The three rural intersections are located on Cao'an Road, a highway that serves as a major corridor connecting the city center and Jiading District. It carries a large amount of commuting traffic during peak hours, including a high percentage of large trucks. The two urban intersections are Siping Road and Dalian Road, and Rende Road and Jipu Road, where the vast majority of traffic consists of passenger cars. All the selected intersections are equipped with red-light-running enforcement cameras.
It should be noted that, according to the Manual on Uniform Traffic Control Devices (MUTCD) in the United States [3], the required Y time is approximately 5.0 s for the rural intersections and 3.0 s for the urban intersections. Hence, the group of rural intersections represents intersections with an FG signal and a theoretically insufficient Y interval, whereas the group of urban intersections represents intersections with an FG signal and an appropriate Y interval.
A summary of site conditions at the observed intersections and approaches is provided in Table 1.
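As a rough cross-check of the required Y durations quoted above, the sketch below evaluates the widely used ITE kinematic formula for the yellow change interval, Y = t + v / (2a + 2Gg). The inputs are assumptions, not values stated in this study: a perception-reaction time t = 1.0 s, a deceleration a = 3.05 m/s² (10 ft/s²), level grade, and the posted speed limit as the approach speed v. The helper name is illustrative, and the results approximate rather than reproduce the quoted values.

```python
G = 9.81  # gravitational acceleration (m/s^2)


def kinematic_yellow_time(speed_kmh: float,
                          reaction_s: float = 1.0,
                          decel_ms2: float = 3.05,
                          grade: float = 0.0) -> float:
    """ITE kinematic yellow interval Y = t + v / (2a + 2Gg).

    Default inputs are assumed typical values, not parameters taken from the study.
    grade is a decimal fraction (uphill positive).
    """
    v = speed_kmh / 3.6  # km/h -> m/s
    return reaction_s + v / (2 * decel_ms2 + 2 * grade * G)


for limit_kmh in (80, 50):  # rural and urban speed limits
    print(f"{limit_kmh} km/h -> Y = {kinematic_yellow_time(limit_kmh):.1f} s")
```

Under these assumptions the rural value (about 4.6 s) is well above the 3 s Y in use, while the urban value (about 3.3 s) is close to it, which is in line with the grouping of the intersections.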

Data Collection and Reduction
The video-taping method was used to collect the traffic operation and driver behavior data in this study. Video surveys were conducted in sunny weather on normal weekdays in 2013 and 2014. At each intersection approach, two high-resolution video cameras were set up on nearby buildings, one recording the upstream traffic conditions of the approach and the other recording the signal states. Image-processing software with a temporal resolution of 1/30 s (George 2.1, developed by Nagoya University) was used for data extraction and reduction to ensure high data accuracy. After the video data were time-synchronized using a stopwatch, the captured global positions were calibrated against several reference positions, which showed that the spatial and temporal trajectory errors were smaller than 0.15 m and 0.1 s, respectively. Vehicle positions were then recorded manually in George 2.1 every 1/30 s from the moment each vehicle entered the cameras' field of view. Through this preprocessing of the raw trajectory data, complete vehicle trajectories were reconstructed together with the corresponding signal states, both at a time step of 1/30 s.
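The text does not spell out how the crossing instant and crossing speed are derived from the 1/30 s position samples. A minimal sketch, assuming each trajectory is stored as (time since FG onset, distance to the stop-line) pairs and that speed is constant within a frame, is to interpolate linearly across the frame in which the distance changes sign. The function name and data layout are hypothetical.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (seconds since FG onset, metres to stop-line; <= 0 once past)


def stop_line_crossing(traj: List[Sample]) -> Optional[Tuple[float, float]]:
    """Return (crossing_time_s, crossing_speed_kmh), or None if the stop-line is never crossed.

    Assumes samples are time-ordered and that speed is constant within each frame,
    so the crossing instant is found by linear interpolation.
    """
    for (t0, d0), (t1, d1) in zip(traj, traj[1:]):
        if d0 > 0 >= d1:                              # stop-line passed during this frame
            frac = d0 / (d0 - d1)                     # share of the frame spent before the line
            t_cross = t0 + frac * (t1 - t0)
            speed_kmh = (d0 - d1) / (t1 - t0) * 3.6   # mean speed over the frame
            return t_cross, speed_kmh
    return None                                       # e.g., a STOP vehicle


# Usage: a hypothetical vehicle 40 m upstream at FG onset, travelling at a steady 15 m/s,
# sampled every 1/30 s as in the extracted trajectories.
dt = 1.0 / 30.0
traj = [(k * dt, 40.0 - 15.0 * k * dt) for k in range(150)]
t_cross, v_cross = stop_line_crossing(traj)
print(f"crosses at {t_cross:.2f} s, {v_cross:.1f} km/h")
```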
Only the first-to-stop and last-to-go through vehicles after the onset of FG with a following distance greater than 5 s were selected for the analysis, to exclude the influence of leading vehicles. The first-to-stop vehicle is the first vehicle in its lane to stop after the onset of FG; the last-to-go vehicle is the last vehicle in its lane to pass the stop-line after the onset of FG. Valid sample sizes for the first-to-stop and last-to-go vehicles, as well as for the stop-line crossing patterns (i.e., crossing during the FG interval (FGC), crossing during the Y interval (YC), and red-light-running (RLR)) at each intersection, are provided in Table 1. In total, 1465 vehicle trajectories (377 trucks and 1088 passenger cars) that encountered the onset of FG were obtained, as shown in Figure 2, and used in the subsequent statistical analysis and model development.
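The sample-selection rule above can be sketched per lane and per cycle as follows. The record layout (a `stopped` flag and the following distance in seconds) and the assumption that vehicles are listed from the stop-line upstream are illustrative choices, not the authors' data format; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

MIN_FOLLOWING_S = 5.0  # following-distance threshold stated in the text (s)


@dataclass
class LaneVehicle:  # hypothetical record layout
    vehicle_id: str
    stopped: bool        # True if the vehicle stopped at the stop-line after FG onset
    following_s: float   # following distance to the leading vehicle (s)


def select_samples(lane: List[LaneVehicle]) -> List[LaneVehicle]:
    """Return the last-to-go and first-to-stop vehicles of one lane in one cycle,
    keeping only those whose following distance exceeds the threshold.

    `lane` lists the vehicles still upstream at FG onset, nearest to the stop-line first.
    """
    last_to_go = None
    first_to_stop = None
    for veh in lane:
        if veh.stopped:
            first_to_stop = veh
            break            # vehicles behind the first stopper are queued behind it
        last_to_go = veh     # keeps the most recent passing vehicle
    candidates = [v for v in (last_to_go, first_to_stop) if v is not None]
    return [v for v in candidates if v.following_s > MIN_FOLLOWING_S]
```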

Statistical Characteristics of Stop-Line Crossing Time
In this study, stop-line crossing time is defined as the time elapsed, in seconds, from the onset of FG until the vehicle crosses the stop-line. It is thus a time relative to the onset of FG rather than an absolute time. A stop-line crossing time greater than 6 s (i.e., the sum of the FG and Y durations) corresponds to an RLR. The observed frequencies and cumulative probabilities of stop-line crossing time are presented in Figure 3 for the rural and urban intersections, respectively. To further examine the characteristics of stop-line crossing time, four distinct patterns are defined based on the phase transition intervals: crossing during the FG interval (FGC), crossing during the Y interval (YC), red-light-running (RLR), and stopping (STOP). Descriptive statistics of the observed approaching speeds and distances to the stop-line at the onset of FG for each pattern are presented in Table 2.
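Because the FG and Y intervals are both 3 s, the four patterns follow directly from the crossing time. The sketch below represents a stopped vehicle as `None` and assumes that a crossing exactly at 3 s or 6 s belongs to the later interval, since the boundary handling is not specified; the function name is illustrative.

```python
from typing import Optional

FG_S = 3.0  # flashing green duration (s)
Y_S = 3.0   # yellow duration (s)


def crossing_pattern(crossing_time_s: Optional[float]) -> str:
    """Classify a vehicle by its stop-line crossing time (s after FG onset).

    None denotes a vehicle that stopped. A crossing exactly at 3 s or 6 s is
    assigned to the later interval (an assumption; the text does not say).
    """
    if crossing_time_s is None:
        return "STOP"
    if crossing_time_s < FG_S:
        return "FGC"
    if crossing_time_s < FG_S + Y_S:
        return "YC"
    return "RLR"   # crossed at or after FG + Y = 6 s, i.e., during red


print([crossing_pattern(t) for t in (1.2, 4.5, 6.3, None)])  # ['FGC', 'YC', 'RLR', 'STOP']
```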
The STOP pattern had the largest proportions, 44.6% and 46.9% at the rural and urban intersections, respectively, whereas the RLR pattern had the lowest, 1.8% and 3.7%, respectively. In addition, the proportion of the YC pattern at the urban intersections (approximately 1/3 of all samples) is considerably larger than that at the rural intersections (around 1/4 of all samples). This could be explained by the hypothesis stated earlier in the paper that FG may be more effective in inducing conservative decisions by drivers at rural high-speed intersections. More specifically, FG appears to lead to earlier entries (i.e., FGC) or to STOP at these intersections, since drivers would encounter more severe traffic conflicts if they chose to go but could not clear the intersection within the remaining time.
Furthermore, the mean speed of the RLR pattern at the rural intersections (50.4 km/h) is significantly lower than those of the other patterns. This is because all the RLR vehicles (a very small proportion, 1.8%) were trucks, which have relatively low running speeds and whose drivers are generally more aggressive than passenger car drivers. Table 2 also shows that, at the rural intersections, the mean speed at the onset of FG (V_FG) was highest for the FGC pattern, followed by the YC, STOP, and RLR patterns, with the RLR mean about 17% lower than those of the other patterns. At this lower speed, the RLR drivers should, on average, have had more time to stop than the FGC and YC drivers, yet they chose to cross. This implies that most of the RLRs observed at the rural intersections may have been intentional or caused by decision errors, rather than by drivers' inability to stop. This finding is particularly interesting because it suggests that the RLR enforcement cameras are probably less influential at the rural intersections. At the urban intersections, the differences in mean V_FG among the four patterns were comparatively small. The mean V_FG of the STOP pattern was approximately 10% lower than those of the other patterns, which is plausible since drivers are generally more likely to stop when approaching at lower speeds.
In terms of the distance to the stop-line at the onset of FG (D_FG), the mean D_FG rose substantially (almost tripling) from the FGC pattern to the STOP pattern at both types of intersections. This reveals that drivers' stopping probability is strongly associated with D_FG. D_FG may also be a good explanatory variable for predicting the probabilities of YC and RLR.

Statistical Characteristics of Stop-Line Crossing Speed
Figure 4 presents the distributions of stop-line crossing speed for all the stop-line crossing patterns except STOP. As shown in the figure, the mean stop-line crossing speeds of the FGC and YC patterns were very similar at both the rural and the urban intersections. At the rural intersections, the mean stop-line crossing speeds for the FGC and YC patterns were 63.2 km/h and 61.0 km/h, respectively, about 21% and 24% below the posted speed limit of 80 km/h. The corresponding mean speeds at the urban intersections were 47.4 km/h and 45.7 km/h, much closer to the posted speed limit of 50 km/h. The mean stop-line crossing speeds for the RLR pattern were almost the same at the two types of intersections, even though the posted speed limits differ greatly. Moreover, at the rural intersections, the mean stop-line crossing speed of the RLR pattern (44.6 km/h) differed significantly from those of the other two patterns (63.2 km/h and 61.0 km/h), whereas at the urban intersections the difference was very minor (44.7 km/h versus 47.4 km/h and 45.7 km/h).
On the other hand, the variance of the stop-line crossing speed was considerably larger at the rural intersections, which could be partly attributed to the traffic composition, including the high percentage of trucks mentioned earlier.
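The speed-limit comparisons above can be checked directly from the reported means. The dictionaries below simply restate the mean stop-line crossing speeds given in the text; the helper name is illustrative.

```python
SPEED_LIMIT_KMH = {"rural": 80.0, "urban": 50.0}
# Mean stop-line crossing speeds (km/h) as reported in the text.
MEAN_CROSSING_KMH = {
    "rural": {"FGC": 63.2, "YC": 61.0, "RLR": 44.6},
    "urban": {"FGC": 47.4, "YC": 45.7, "RLR": 44.7},
}


def pct_below_limit(speed_kmh: float, limit_kmh: float) -> float:
    """Percentage by which a mean speed falls below the posted limit."""
    return 100.0 * (limit_kmh - speed_kmh) / limit_kmh


for area, by_pattern in MEAN_CROSSING_KMH.items():
    limit = SPEED_LIMIT_KMH[area]
    for pattern, speed in by_pattern.items():
        print(f"{area:5} {pattern:3}: {pct_below_limit(speed, limit):4.1f}% below {limit:.0f} km/h")
```

The rural FGC and YC means come out about 21% and 24% below the limit, against roughly 5% and 9% at the urban sites.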

Prediction of Stop-Line Crossing Time
The stop-line crossing time defined in this study is, on the one hand, a continuous variable in nature. On the other hand, it can also be categorized into four stop-line crossing patterns, i.e., FGC, YC, RLR, and STOP. One modelling approach is therefore to treat it as a continuous dependent variable and apply multiple linear regression, which is appropriate if it increases monotonically within each stop-line crossing pattern. The other approach is to convert it into a multinomial variable and apply discrete choice models, which is appropriate if it is random and scattered within each pattern. To select a proper approach, a preliminary analysis of the distribution of stop-line crossing time was first conducted to examine whether it differed significantly among the stop-line crossing patterns. The results showed that it was rather random and not monotonically increasing within each pattern. Therefore, a multinomial logit (MNL) model was chosen to predict the stop-line crossing time by treating it as a multinomial variable, i.e., the stop-line crossing patterns.
Site-specific factors are likely to influence driver decision-making. To avoid correlations among the factors, a backward selection process was used to screen the independent variables. In addition to the driver-specific factors (vehicle type, approaching speed at the onset of FG, and distance to the stop-line at the onset of FG), all the site-specific factors (area type, intersection type, observation period, and speed limit) and their interactions were first included in the model, and the insignificant variables were then eliminated one by one. Based on the results, the following explanatory variables were eventually selected: vehicle type (VT), area type (AT), intersection type (IT), approaching speed at the onset of FG (V_FG), and distance to the stop-line at the onset of FG (D_FG). With respect to IT, intersections larger than 50 m with a cycle length greater than 150 s were classified as large intersections in this study (i.e., Cao'an Road and Jiasongbei Road, Cao'an Road and Xiangjiang Road, and Siping Road and Dalian Road, as listed in Table 1); the remaining intersections were classified as small intersections (i.e., Cao'an Road and Caofeng Road, and Rende Road and Jipu Road, as listed in Table 1). The details of the selected independent variables are explained below. The MNL model estimation results, with STOP as the reference category, are presented in Table 3. The model statistics indicate that the MNL model is generally acceptable: the log-likelihood at constant is 3335.696, the log-likelihood at convergence is 1497.164, the McFadden R² is 0.551, and the hit ratio is 87.6%. Based on the estimated coefficients, the probabilities of the FGC, YC, and RLR patterns relative to the STOP pattern can be calculated by Equations (1)-(3).
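Equations (1)-(3) are not reproduced in this text. Assuming the standard MNL specification with STOP as the reference category and the five explanatory variables listed above, they share the general form below, where j and k range over FGC, YC, and RLR and the β coefficients stand for the Table 3 estimates (not restated here). The reported McFadden R² can also be recovered from the two log-likelihood values; the ratio is the same whether they are read as log-likelihoods or as their magnitudes.

```latex
\begin{aligned}
g_k &\equiv \ln\frac{P_k}{P_{\mathrm{STOP}}}
     = \beta_{0k} + \beta_{1k}\,VT + \beta_{2k}\,AT + \beta_{3k}\,IT
       + \beta_{4k}\,V_{FG} + \beta_{5k}\,D_{FG},
     \qquad k \in \{\mathrm{FGC}, \mathrm{YC}, \mathrm{RLR}\} \\
P_k &= \frac{e^{g_k}}{1 + \sum_{j} e^{g_j}}, \qquad
P_{\mathrm{STOP}} = \frac{1}{1 + \sum_{j} e^{g_j}} \\
\rho^2_{\mathrm{McF}} &= 1 - \frac{LL_{\mathrm{conv}}}{LL_{\mathrm{const}}}
  = 1 - \frac{1497.164}{3335.696} \approx 0.551
\end{aligned}
```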
These equations can then be used to predict the ratios of various stop-line crossing patterns for a particular vehicle at a particular intersection.
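A minimal sketch of how such equations turn into pattern probabilities for one vehicle is shown below. The coefficient values are hypothetical placeholders, chosen only so that V_FG has a positive and D_FG a negative effect as reported; they are not the Table 3 estimates, other predictors are omitted for brevity, and the helper names are illustrative.

```python
import math
from typing import Dict


def linear_index(beta: Dict[str, float], x: Dict[str, float]) -> float:
    """g_k = beta_0k + sum_i beta_ik * x_i for one non-reference pattern."""
    return beta["const"] + sum(beta[name] * value for name, value in x.items())


def pattern_probabilities(g: Dict[str, float]) -> Dict[str, float]:
    """Convert log-odds g_k = ln(P_k / P_STOP) into probabilities of all patterns (STOP has g = 0)."""
    denom = 1.0 + sum(math.exp(v) for v in g.values())
    probs = {k: math.exp(v) / denom for k, v in g.items()}
    probs["STOP"] = 1.0 / denom
    return probs


# HYPOTHETICAL coefficients for illustration only (not the Table 3 estimates).
betas = {
    "FGC": {"const": 1.0, "V_FG": 0.10, "D_FG": -0.15},
    "YC": {"const": 0.5, "V_FG": 0.08, "D_FG": -0.10},
    "RLR": {"const": -4.0, "V_FG": 0.0, "D_FG": 0.0},
}
vehicle = {"V_FG": 55.0, "D_FG": 45.0}  # km/h and m at FG onset (hypothetical)
g = {k: linear_index(b, vehicle) for k, b in betas.items()}
print({k: round(p, 3) for k, p in pattern_probabilities(g).items()})
```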
Vehicle type (VT) was found not to be a significant factor for any of the patterns. V_FG and D_FG affected the probabilities of the FGC and YC patterns at a significance level of 0.01. V_FG had a positive effect and D_FG had a negative effect; that is, the probabilities of the FGC and YC patterns relative to the STOP pattern increased as V_FG increased and as D_FG decreased. In addition, AT positively contributed to the ratio of the YC pattern to the STOP pattern at a significance level of 0.05. This implies that, at the urban intersections, drivers are more likely to cross during the Y interval than to stop, which is consistent with the findings presented in Table 2.
Meanwhile, the RLR pattern appears to be unrelated to most of the factors except intersection type (IT). One major reason may be that only a small number of RLR samples were collected in this study, as shown in Table 1. The ratio of RLR to STOP was found to be higher at the large intersections. This is probably because, to avoid the long waiting time caused by the long cycle lengths at the large intersections, drivers chose to enter the intersection even if they could not reach the stop-line before the start of the red signal.

Prediction of Stop-Line Crossing Speed
Unlike the stop-line crossing time, the stop-line crossing speed is simply a continuous variable. The statistical analysis results presented in Figure 4 indicate that it is fairly randomly distributed within each of the FGC, YC, and RLR patterns, particularly at the rural intersections. Its means and variances also appear to be related to area type and crossing pattern.
Therefore, a multiple linear regression (MLR) model was developed to predict the stop-line crossing speed. In addition to the determinants used in the MNL model (VT, AT, IT, V_FG, and D_FG), a multinomial variable CP representing the stop-line crossing pattern was incorporated into the MLR model. A preliminary analysis showed that IT did not significantly influence the stop-line crossing speed, so it was excluded from the model.
The estimated coefficients of the final model, which is based on a total sample size of 802 and has an R² of 0.579, are summarized in Table 4. Based on the estimated coefficients, the stop-line crossing speed can be formulated as Equation (4).
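Equation (4) is not reproduced in this text either. Written in general form with the five retained predictors, it is a linear function whose β coefficients stand for the Table 4 estimates (not restated here), with the signs reported for each variable in the text:

```latex
y = \beta_0 + \beta_1\,VT + \beta_2\,AT + \beta_3\,CP + \beta_4\,V_{FG} + \beta_5\,D_{FG} + \varepsilon,
\qquad \beta_1, \beta_2, \beta_3 < 0, \quad \beta_4, \beta_5 > 0
```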
where y = stop-line crossing speed (km/h); CP = stop-line crossing pattern, a multinomial variable coded 1 = FGC, 2 = YC, and 3 = RLR; and the other variables are as defined before. All five independent variables were associated with the stop-line crossing speed at a significance level of 0.01. The effects of V_FG and D_FG were positive, whereas those of VT, AT, and CP were negative. The results indicate that the greater the approaching speed and the distance to the stop-line, the higher the stop-line crossing speed. Furthermore, the stop-line crossing speed tends to be significantly higher at rural intersections, for passenger cars, and for the FGC pattern. These findings are consistent with the site conditions presented in Table 1.
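The MLR estimation itself can be sketched with ordinary least squares in NumPy. The column order, the synthetic demonstration data, and the coefficients used to generate it are all hypothetical and are not the study's observations or Table 4 estimates; CP is entered as a single numeric code (1 = FGC, 2 = YC, 3 = RLR), mirroring the definition above.

```python
import numpy as np


def fit_mlr(X: np.ndarray, y: np.ndarray):
    """OLS fit of y on X plus an intercept; returns (coefficients, R^2).

    Assumed column order of X: VT, AT, CP, V_FG, D_FG (one row per crossing vehicle).
    """
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # least-squares estimates
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


# Synthetic demonstration data (HYPOTHETICAL, not the study's observations).
rng = np.random.default_rng(0)
n = 300
X = np.column_stack([
    rng.integers(0, 2, n),        # VT: 0/1 vehicle-type indicator
    rng.integers(0, 2, n),        # AT: 0/1 area-type indicator
    rng.integers(1, 4, n),        # CP: 1 = FGC, 2 = YC, 3 = RLR
    rng.uniform(30, 80, n),       # V_FG (km/h)
    rng.uniform(10, 120, n),      # D_FG (m)
])
true_beta = np.array([20.0, -3.0, -8.0, -1.5, 0.6, 0.05])  # intercept first
y = np.column_stack([np.ones(n), X]) @ true_beta + rng.normal(0, 3, n)
coef, r2 = fit_mlr(X, y)
print(np.round(coef, 2), round(r2, 3))
```

Entering CP as one numeric column imposes equal spacing between FGC, YC, and RLR; dummy-coding the pattern (two indicator columns with FGC as the base) would relax that assumption at the cost of one extra coefficient.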

Conclusions
This study empirically analyzed and modeled drivers' stop-line crossing behavior at signalized intersections in China, where a phase transition period consisting of a 3 s FG and a 3 s Y is commonly applied. Comprehensive statistical analyses were conducted to examine the characteristics of stop-line crossing time and speed, based on 1465 high-resolution vehicle trajectories collected at five intersections in Shanghai. An MNL model and an MLR model were then developed to predict the stop-line crossing patterns (i.e., FGC, YC, RLR, and STOP) and speed, respectively. The major findings of this study are summarized as follows.

• Compared with the rural intersections, the urban intersections had a higher ratio of stop-line crossings during the Y interval and an approximately 0.7 s longer stop-line crossing time, defined as the time elapsed after the onset of FG.
• Approaching speed and distance to the stop-line at the onset of FG significantly influenced the ratios of the FGC and YC patterns to the STOP pattern. Area type positively contributed to the ratio of the YC pattern to the STOP pattern; in addition, the ratio of RLR to STOP was higher at the large intersections with a long cycle length.
• The higher the approaching speed and the greater the distance to the stop-line, the higher the stop-line crossing speed. Stop-line crossing speed was also significantly higher at the rural intersections, for passenger cars, and for the FGC pattern.
The findings of this study are useful for the evaluation of the operational and safety performance during the phase transition period at signalized intersections implemented with a FG signal. They could also contribute to the development of safety warning systems as well as dynamic signal control strategies mitigating traffic conflicts.
However, the conclusions based on these five intersections may not hold for all urban and rural intersections. Different driver behavior might be observed at other types of intersections with different traffic demand, size, speed limits, and phase transition intervals. Moreover, the effect of time of day on drivers' stop-line crossing behavior was not considered in this study. It is therefore essential to collect empirical data under other traffic conditions to investigate stop-line crossing behavior more comprehensively. In addition, other modelling approaches could be studied in the future to integrate the prediction of stop-line crossing time and speed and to improve prediction accuracy. For instance, a nested logit model, with an upper level distinguishing stop from go and a lower level distinguishing among the three "go" alternatives, might provide a good alternative. Random effects models or panel structures could also be investigated to account for correlations among the site-specific factors.
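The nested structure suggested above can be sketched as follows, assuming a two-level nested logit in which STOP forms its own degenerate nest and FGC, YC, and RLR share a GO nest with a log-sum (dissimilarity) parameter λ in (0, 1]. The utilities, λ, and the function name are hypothetical; with λ = 1 the model collapses to an MNL of the kind used in this study, which the test below checks.

```python
import math
from typing import Dict


def nested_logit_probs(v_stop: float, v_go: Dict[str, float], lam: float) -> Dict[str, float]:
    """Two-level nested logit: STOP (its own nest) versus a GO nest of crossing patterns.

    v_stop: systematic utility of STOP; v_go: utilities of FGC, YC and RLR;
    lam: log-sum (dissimilarity) parameter of the GO nest, 0 < lam <= 1.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam should lie in (0, 1] for consistency with utility maximisation")
    scaled = {k: math.exp(v / lam) for k, v in v_go.items()}
    total = sum(scaled.values())
    inclusive = math.log(total)                      # log-sum of the GO nest
    go_term = math.exp(lam * inclusive)
    p_go = go_term / (math.exp(v_stop) + go_term)    # upper level: go vs. stop
    probs = {k: p_go * s / total for k, s in scaled.items()}  # P(GO) * P(k | GO)
    probs["STOP"] = 1.0 - p_go
    return probs


# Hypothetical utilities; with lam = 1 the result equals the MNL probabilities.
v_go = {"FGC": 0.5, "YC": 0.2, "RLR": -2.0}
for lam in (1.0, 0.5):
    print(lam, {k: round(p, 3) for k, p in nested_logit_probs(0.0, v_go, lam).items()})
```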