Real Time Safety Model for Pedestrian Red-Light Running at Signalized Intersections in China
Sustainability 2021, 13, 1695

Abstract: The traditional way to evaluate pedestrian safety is a reactive approach that uses data at an aggregate level. The objective of this study is to develop real-time safety models for pedestrian red-light running using signal cycle-level traffic data. Traffic data for 464 signal cycles over 16 h were collected at eight crosswalks at two intersections in the city of Nanjing, China. Several real-time safety models of pedestrian red-light running were developed from different combinations of explanatory variables using the Bayesian Poisson-lognormal (PLN) model. The models were estimated with a Bayesian approach based on Markov chain Monte Carlo simulation. The model comparison shows that the model incorporating exposure, pedestrians' characteristics and crossing maneuver, and traffic control and crosswalk design outperforms both the model incorporating exposure only and the model incorporating exposure, pedestrians' characteristics, and crossing maneuver. This result indicates that including more variables in the real-time safety model could improve the model fit. The estimation results show that pedestrian volume, ratio of males, ratio of pedestrians talking on the phone, pedestrian waiting time, green ratio, signal type, and length of crosswalk are statistically significantly associated with pedestrians' red-light running. The findings from this study could be useful in real-time pedestrian safety evaluation as well as in crosswalk design and pedestrian signal optimization.


Introduction
Walking has been increasingly promoted as a sustainable transport mode in many cities around the world. It is regarded as a healthy alternative to driving. In addition, walking helps to relieve traffic congestion, energy consumption, and vehicle emissions, improving the well-being of society. However, pedestrians have always faced an elevated risk of collisions, because they are vulnerable road users who are directly exposed to traffic [1]. For example, the Insurance Institute for Highway Safety (IIHS) reported 846 fatalities in crashes that involved red-light running in 2018 in the U.S., and about half of those killed were pedestrians, bicyclists, and people in other vehicles. The risk is particularly high at crosswalks of signalized intersections, where pedestrians are frequently involved in violations such as red-light running [2,3]. In China, pedestrians are required to comply with the traffic signal when using crosswalks at signalized intersections. However, many pedestrians are commonly observed violating the signal. Therefore, it is important to investigate pedestrians' red-light running to understand their safety at crosswalks.
Pedestrian safety has commonly been evaluated by safety models based on collision data. The traditional way to develop pedestrian safety models is to use aggregate historical collision records, where annual pedestrian collisions are modelled by various count data models. However, this is a reactive approach that requires collisions to occur before they can be prevented [4]. Moreover, this approach requires a long time to collect the collision data. To address this issue, a real-time pedestrian safety model is proposed in this study. With the development of traffic surveillance systems and detection technologies, high-resolution traffic data, such as data in a short time window, are now available for developing such a real-time model.
The objective of this study is to develop a real-time pedestrian safety model using signal cycle-level traffic data. Several real-time safety models of pedestrians' red-light running were developed using the Bayesian Poisson-lognormal (PLN) model. The study is useful to transport planners and engineers in the safety evaluation of pedestrian behaviors at signalized intersections, as well as in crosswalk design and pedestrian signal optimization.

Proactive Safety Studies on Pedestrians
With the development of information technology, high-resolution traffic data have become easy to access and apply in transportation systems. Recently, many studies have applied ambient sensing-based driving data in proactive safety studies [3,23–25]. For example, Zhang et al. [23] predicted pedestrian-vehicle conflicts at signalized intersections using the trajectories of pedestrians and vehicles. Li et al. [24] presented a surrogate safety measure for pedestrian crashes based on GPS trajectory data. Furthermore, previous studies linked traffic volatility with crashes [26–28]. Wali et al. [26,27] explored the relationship between driving volatility and crashes at intersections and segments. Zheng et al. [28] connected traffic conflicts to crash risk to predict crashes at signalized intersections. These proactive safety studies focused on the interactions between vehicles and pedestrians. Pedestrians' red-light running behavior, which is an important safety concern at intersections, is largely ignored in these studies. The present study fills this gap.

Pedestrian Red-Light Running
Pedestrians' red-light running behavior has been investigated in many previous studies [29–36]. A general finding of these studies is that pedestrians' red-light running rate in developing countries is higher than that in developed countries. Of particular interest are the contributing factors associated with this violation behavior. These factors can be summarized as individual characteristics, traffic conditions, and built environment, among others [30–32,34,35,37–41]. These studies provide a reference for selecting potential explanatory variables in pedestrian safety models.
As for individual characteristics, age and gender were found to be significantly related to pedestrians' red-light running. Razzaghi and Zolala [40] found that males were more prone to violate traffic lights than females because males were risk-prone when crossing the streets. Guo et al. [41] showed that older pedestrians tended to comply more with traffic lights than young pedestrians, because they could bear a longer waiting time at crosswalks. Similarly, Tiwari et al. [42] found that young pedestrians were more likely to be involved in violations than adult pedestrians based on a survival analysis. In addition, group size, where a group is a set of people walking at a close distance and similar speed, was found to influence pedestrians' red-light running [30,42]. Traffic conditions such as pedestrian volume, conflicting traffic volume, pedestrians' waiting time, parked vehicles, and traffic density were found to be significantly associated with pedestrians' red-light running violations [31,42–44]. Furthermore, several studies showed that pedestrian crossing facilities, pedestrian signal type (countdown and flashing), scramble phase, land use, and crosswalk design had impacts on pedestrians' crossing behaviors [30,45].

Data Preparation
Field data were collected at two signalized intersections in the city of Nanjing: North Taiping Road at Zhujiang Road, and Zhongshan Road at Zhujiang Road. For each intersection, four crosswalks were selected for video recording. The video data were collected in October 2016. All the pedestrian crosswalks were equipped with pedestrian signals, of which four are flashing pedestrian signals and the other four are countdown timers. The length of the crosswalks varies from 35 to 40 m and the width varies from 4 to 4.5 m. The cycle length of the pedestrian signals ranges from 105 to 140 s with an average of 125 s. The signal and design characteristics of the selected crosswalks are shown in Table 1. At each intersection, two cameras were used to cover the entire intersection for video recording. The cameras were mounted on poles near the crosswalks to record pedestrians' crossing behaviors and pedestrian volume. Figure 1 shows the view of the cameras. Figure 2 is the zoomed-in view of the traffic lights. Two hours of video were recorded at each crosswalk, resulting in a total of 16 h of video data. To eliminate the impact of weather conditions on pedestrians' crossing behaviors, field data were collected during weekday morning peak hours in fine weather. The videos were viewed in the laboratory for data reduction. Each video (2 h for one crosswalk) was reviewed by two trained graduate students for data processing and cleaning. The videos were processed using the software VirtualDub, which can play videos frame by frame, so that detailed information could be obtained from the video. In this study, the traffic data were collected at the signal cycle level.
The outputs for each cycle are: (a) pedestrian volume; (b) pedestrians' crossing behavior (red-light running or not); (c) pedestrians' individual characteristics (age and gender); (d) pedestrian distractions (phone talking, texting/reading, headphone use); (e) pedestrian crossing speed; (f) pedestrians' waiting time before crossing; and (g) signal green ratio. For each pedestrian, the crossing speed was estimated as the length of the crosswalk divided by the crossing time, which was obtained from VirtualDub. The pedestrian-related variables were extracted by comprehensive observation of pedestrians' features such as walking features, gestures, walking speed, appearance characteristics, and the color/style of clothes. Gender was differentiated by appearance characteristics such as hair and the color/style of clothes. Walking features were also used, since females usually have a higher step frequency than males, and the swing amplitude of a male's arm is greater than that of a female. Figure 3 shows an example of males and females. Age is an estimate rather than an exact value. It is a binary variable classified as young or old. It was estimated from appearance characteristics and further confirmed by walking speed, since young pedestrians generally walk faster than older pedestrians. Figure 4 shows examples of young and older pedestrians.
The distractions were detected by carefully examining pedestrians' gestures. Phone talking is distinguished by the pedestrian lifting his/her arm to his/her ear. Reading/texting is identified by the pedestrian staring at the phone in his/her hand, and headphone use is discerned by whether a headphone was on his/her head. Each video was reviewed by one investigator and then double-checked by the other investigator for data filtering. Figure 5 shows an example of pedestrians talking on the phone. A total of 464 signal cycles were extracted from the videos. Candidate variables were then extracted from the output data. Table 2 shows summary statistics of the collected data.
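The data reduction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' processing pipeline: the `PedestrianRecord` fields and the `aggregate_cycle` helper are hypothetical names, and treating crossing speed (CS) and waiting time (WT) as per-cycle means is an assumption, since the text does not state how per-pedestrian values were summarized for each cycle.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PedestrianRecord:
    """One pedestrian observed in one signal cycle (hypothetical schema)."""
    is_male: bool
    on_phone: bool
    ran_red: bool
    crossing_time_s: float   # read off the video frame by frame
    waiting_time_s: float


def crossing_speed(crosswalk_length_m: float, crossing_time_s: float) -> float:
    """Crossing speed = crosswalk length / crossing time, as in the text."""
    return crosswalk_length_m / crossing_time_s


def aggregate_cycle(peds: list[PedestrianRecord], crosswalk_length_m: float) -> dict:
    """Collapse the pedestrian records of one cycle into cycle-level variables."""
    n = len(peds)
    if n == 0:
        raise ValueError("a cycle needs at least one pedestrian record")
    return {
        "PV": n,                                        # pedestrian volume (exposure)
        "violations": sum(p.ran_red for p in peds),     # count to be modelled
        "RM": sum(p.is_male for p in peds) / n,         # ratio of males
        "RP": sum(p.on_phone for p in peds) / n,        # ratio talking on the phone
        # Assumption: CS and WT are summarized as cycle means.
        "CS": mean(crossing_speed(crosswalk_length_m, p.crossing_time_s) for p in peds),
        "WT": mean(p.waiting_time_s for p in peds),
    }


# Made-up example cycle on a 36 m crosswalk (values are illustrative only).
example = [
    PedestrianRecord(True, False, False, 30.0, 40.0),
    PedestrianRecord(True, True, True, 24.0, 65.0),
    PedestrianRecord(False, False, False, 36.0, 20.0),
    PedestrianRecord(True, False, False, 40.0, 35.0),
]
cycle = aggregate_cycle(example, crosswalk_length_m=36.0)
print(cycle)
```

Each of the 464 cycles would yield one such row, with `violations` as the response and the remaining entries as candidate explanatory variables.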

Poisson-Lognormal (PLN) Model
The real-time pedestrian safety model was developed to relate the explanatory variables of each cycle to the number of pedestrians' red-light running events in the same cycle. Count-data modeling techniques were used for the model development. The Poisson-lognormal (PLN) model was adopted because it can account for over-dispersion in the count data [46,47]. Let $Y_i$ represent the number of pedestrians' red-light running events at signal cycle $i$, and assume that $Y_i$ independently follows a Poisson distribution:
$$Y_i \mid \lambda_i \sim \text{Poisson}(\lambda_i)$$
where $\lambda_i$ is the model parameter, the expected number of pedestrians' red-light running events at cycle $i$. Theoretically, the Poisson distribution assumes that the mean equals the variance. However, this assumption does not always hold: when the variance of the red-light running counts is greater than the mean, the data are said to be over-dispersed. To address the over-dispersion issue, it is assumed that
$$\lambda_i = \mu_i \exp(u_i)$$
A logarithmic link function that relates $\mu_i$ to a linear predictor is given by
$$\ln(\mu_i) = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$
where $x_1, x_2, \ldots, x_k$ are explanatory variables and $\beta_1, \beta_2, \ldots, \beta_k$ are model parameters. It should be noted that pedestrian volume (PV) is the exposure that must be included in the model, and the log-transformed variable ln(PV) was used as the exposure variable. The PLN model is then obtained by the following assumption:
$$u_i \sim N(0, \sigma_u^2)$$
where $\sigma_u^2$ is the extra-Poisson variance.
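To make the over-dispersion argument concrete, the following sketch simulates counts from a PLN process and compares the sample mean and variance with the standard closed-form PLN moments, $E[Y] = \mu e^{\sigma_u^2/2}$ and $\mathrm{Var}[Y] = E[Y] + E[Y]^2 (e^{\sigma_u^2} - 1)$. The values $\mu = 2$ and $\sigma_u^2 = 0.5$ are arbitrary illustrative choices, not estimates from this study, and the function names are hypothetical.

```python
import numpy as np


def simulate_pln(mu: float, sigma2_u: float, n: int, seed: int = 0) -> np.ndarray:
    """Draw n counts with lambda_i = mu * exp(u_i), u_i ~ N(0, sigma2_u)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, np.sqrt(sigma2_u), size=n)   # lognormal error term
    lam = mu * np.exp(u)                              # cycle-specific Poisson mean
    return rng.poisson(lam)


def pln_moments(mu: float, sigma2_u: float) -> tuple[float, float]:
    """Marginal mean and variance of a Poisson-lognormal count."""
    mean = mu * np.exp(sigma2_u / 2.0)
    var = mean + mean**2 * (np.exp(sigma2_u) - 1.0)
    return float(mean), float(var)


# Illustrative values only (assumptions, not results from the paper).
y = simulate_pln(mu=2.0, sigma2_u=0.5, n=200_000)
theory_mean, theory_var = pln_moments(2.0, 0.5)

print(f"sample mean {y.mean():.3f} vs theory {theory_mean:.3f}")
print(f"sample var  {y.var():.3f} vs theory {theory_var:.3f}")
```

Because the second term of the variance is strictly positive whenever $\sigma_u^2 > 0$, the variance always exceeds the mean, which is the behavior a plain Poisson model cannot reproduce.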

Full Bayesian Estimation
A forward stepwise procedure was used for variable selection. Specifically, the variables were added to the model one by one. A variable was kept in the model when it was significant and improved the goodness-of-fit of the model. A full Bayesian approach based on Markov chain Monte Carlo (MCMC) was used to estimate the real-time safety model. The posterior distributions of the model parameters can be estimated by MCMC sampling techniques. First, prior distributions of the model parameters are required. In this study, due to the lack of prior information, non-informative priors were used. Specifically, diffuse normal distributions $N(0, 10^6)$ were assigned as priors for the regression parameters $\beta_1, \beta_2, \ldots, \beta_k$. The diffuse gamma distribution Gamma(0.001, 0.001) was used as the prior for the precision hyper-parameter $\sigma_u^{-2}$ [46]. The software WinBUGS was used to repeatedly sample from the posterior distribution of the model parameters using MCMC techniques. Two independent Markov chains for each parameter were generated and run for 30,000 iterations. The first 10,000 iterations were used for monitoring convergence and were then excluded as a burn-in sample. The remaining iterations were used to estimate the posterior mean and standard deviation of the model parameters. Convergence was checked by visual inspection and was considered reached when the Markov chains were well mixed. The Brooks-Gelman-Rubin (BGR) statistic was also used as a quantitative convergence check, with convergence indicated by a BGR value of less than 1.2.
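The quantitative convergence check can be illustrated with a short sketch. The function below computes the classic variance-based Gelman-Rubin potential scale reduction factor for one parameter after discarding the burn-in; this is an assumption made for illustration, and the BGR diagnostic built into WinBUGS may be computed differently. The chains are synthetic independent normal draws standing in for MCMC output, and all names and numbers are hypothetical.

```python
import numpy as np


def gelman_rubin(chains: np.ndarray, burn_in: int) -> float:
    """Potential scale reduction factor for one parameter.

    chains: array of shape (n_chains, n_iterations); burn_in draws are dropped.
    """
    x = np.asarray(chains, dtype=float)[:, burn_in:]
    _, n = x.shape
    within = x.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between = n * x.mean(axis=1).var(ddof=1)     # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n  # pooled variance estimate
    return float(np.sqrt(var_hat / within))


# Synthetic stand-ins for two chains of 30,000 iterations each (assumption).
rng = np.random.default_rng(1)
mixed = rng.normal(0.4, 0.1, size=(2, 30_000))   # both chains explore the same region
stuck = mixed + np.array([[0.0], [1.0]])          # second chain shifted away

r_mixed = gelman_rubin(mixed, burn_in=10_000)
r_stuck = gelman_rubin(stuck, burn_in=10_000)
print(f"well-mixed chains: {r_mixed:.3f}  (below 1.2, treated as converged)")
print(f"separated chains:  {r_stuck:.3f}  (above 1.2, not converged)")
```

A value close to 1 means the pooled variance across chains matches the within-chain variance, which is the numerical counterpart of the chains looking well mixed in a trace plot.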

Model Results and Discussion
A multicollinearity test was performed before entering the variables into the model. If two variables were found to be significantly correlated in the correlation analysis, they were entered into the model one at a time, while monitoring the overall model fit and the significance of the variable. Different combinations of explanatory variables were considered in the model development. The basic idea is to add more variables gradually to the model form. Three real-time safety models of pedestrian red-light running were developed. The first model (M1) had only the exposure variable (ln(PV)). The second model (M2) incorporated exposure (ln(PV)), pedestrians' characteristics (RM, RY), and crossing maneuver (RP, RT, RH, CS, ln(WT)) variables. The last model (M3) included exposure (ln(PV)), pedestrians' characteristics (RM, RY), crossing maneuver (RP, RT, RH, CS, ln(WT)), traffic control (ST, GR), and crosswalk design (ln(CL)) variables. Table 3 lists the model estimation results. Only variables that are significant at the 95% confidence level were kept in the models. Generally, the estimated models fit the observed data well. The parameter $\sigma_u^2$ is significant in all models, showing that the over-dispersion was accounted for by the PLN model. Moreover, M1 has the largest $\sigma_u^2$ and M3 has the lowest $\sigma_u^2$, indicating that incorporating more variables in the model could reduce the over-dispersion caused by unobserved or unmeasured heterogeneity. As for the coefficients, a positive coefficient indicates that pedestrians' red-light running increases as the corresponding variable increases, whereas a negative coefficient indicates that red-light running decreases as the corresponding variable increases. In general, all the covariate coefficients in the models have logical signs.
The coefficient of ln(PV) is positive and statistically significant at the 95% confidence level in all the real-time safety models, indicating that pedestrians' red-light running violations increase with pedestrian volume. The finding is intuitive and consistent with many studies [30,42,44]. For M2, another three variables, RM, RP, and ln(WT), were found to be significant in the real-time safety model. The ratio of males (RM) was positively associated with pedestrians' red-light running violations. The finding is consistent with Wu et al. [47], who indicated that males have a higher probability of running red lights than females. The ratio of pedestrians talking on the phone (RP) had a positive sign in the real-time safety model, suggesting that pedestrians' red-light running increases with the share of pedestrians distracted by cell phones. As pointed out by Alsaleh et al. [48], distracted pedestrians could be involved in violations. The pedestrians' waiting time (ln(WT)) was found to be positively related to pedestrians' red-light running. This result is reasonable: pedestrians who wait longer tend to lose patience while waiting for the green light, leading them to run the red light.
For M3, in addition to the above-mentioned significant variables, the traffic control (ST, GR) and crosswalk design (ln(CL)) variables were found to be significant in the real-time safety model. The coefficient associated with pedestrian signal type (ST) was found to be 0.353, meaning a decrease of 42.3% (exp(0.353) − 1 = 0.423) in pedestrians' red-light running when a countdown signal is replaced with a flashing signal. The green ratio (GR) was negatively related to pedestrians' red-light running, which is reasonable, as a longer green time allows more pedestrians to cross the intersection legally. The length of crosswalk (ln(CL)) was also found to be significant in the real-time safety model. The positive sign suggests that pedestrians' red-light running increases with the length of the crosswalk. This could be explained by pedestrians running the red light to save time when they cross a long crosswalk. However, pedestrians may also judge that crossing a longer crosswalk on red takes more time and carries a higher crash risk; under this assumption, the likelihood of red-light running would decrease with the length of the crosswalk. This inconsistency between the assumption and the finding should be studied further using more data from more sites.
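The percentage reported for signal type follows from the logarithmic link: a change of $\Delta$ in a covariate multiplies the expected count by $\exp(\beta \Delta)$. The sketch below reproduces that arithmetic for the reported coefficient of 0.353. Which signal type is coded as 1 is not stated in this section, so the direction labels in the comments are assumptions, and `relative_change` is a hypothetical helper name.

```python
import math


def relative_change(beta: float, delta: float = 1.0) -> float:
    """Relative change in the expected count when a covariate moves by delta."""
    return math.exp(beta * delta) - 1.0


beta_st = 0.353  # reported coefficient of pedestrian signal type (ST)

# Moving the indicator from 0 to 1 (whichever signal type that encodes).
up = relative_change(beta_st)
# Moving the indicator from 1 back to 0, relative to the level at 1.
down = -relative_change(beta_st, delta=-1.0)

print(f"0 -> 1: expected count rises by {up:.1%}")     # about 42.3%
print(f"1 -> 0: expected count falls by {down:.1%}")   # about 29.7%
```

The two figures differ because they use different baselines: 42.3% is measured relative to the lower level, while 29.7% is measured relative to the higher one. For log-transformed covariates such as ln(PV), the coefficient acts as an elasticity instead: multiplying PV by a factor $c$ multiplies the expected count by $c^{\beta}$.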

Conclusions
This study developed real-time safety models for pedestrians' red-light running at signalized intersections. Bayesian statistical techniques were applied to model the number of pedestrians' red-light running events at a signalized intersection. Data were collected at eight crosswalks at two intersections in the city of Nanjing and were then aggregated at the signal cycle level. In total, 464 signal cycles were collected over 16 h. A PLN model was used to account for the over-dispersion of the count data by adding a lognormally distributed error term to the original Poisson regression model. Explanatory variables including exposure, pedestrians' characteristics, pedestrians' crossing maneuver, traffic control characteristics, and crosswalk design were considered in the safety models. Three real-time safety models were developed based on different combinations of the explanatory variables and were estimated under a Bayesian framework.
The results from this study showed that the over-dispersion in the data was well captured by the PLN model, resulting in a well-fitted real-time pedestrian safety model. Moreover, the over-dispersion in the real-time safety model could be reduced by incorporating more variables in the model. In addition, the estimation results showed that the cycle-level variables including pedestrian volume, ratio of males, ratio of pedestrians talking on the phone, pedestrian waiting time, green ratio, signal type, and length of crosswalk were statistically significant in the real-time safety model (M3). These significant variables cover pedestrians' characteristics and crossing maneuver, traffic control, and crosswalk design. As such, the real-time pedestrian safety model could be used to predict the expected number of pedestrians' red-light running events at the cycle level, given information such as the real-time pedestrian volume, crosswalk design, and traffic control. Furthermore, the real-time safety models could be integrated with signal timing optimization aimed at improving both efficiency and safety at signalized intersections. The findings from this study could also be helpful in understanding the risk factors associated with pedestrians' red-light violation behavior, so that countermeasures for reducing the number of pedestrians' red-light running events could be proposed. For example, the flashing pedestrian signal has a significant effect in reducing pedestrians' red-light violations. It is therefore suggested that flashing pedestrian signals, rather than countdown pedestrian signals, be installed at signalized intersections.
There are some limitations in this study. The data used in this study were processed manually from the video footage. Thus, the real-time safety model cannot be directly used for online prediction of pedestrians' red-light running. A future study could develop a video-based automated data collection platform to support the real-time pedestrian safety model. Count data modelling techniques form a large family, and it would be valuable to compare different model forms in developing the real-time pedestrian safety model. This study focused on pedestrians' red-light running, which is a temporal violation, and a future study could simultaneously model pedestrians' temporal and spatial violations. The findings of the study are drawn from one day of data collected at two intersections in Nanjing. There might be other potential factors, such as time of day, day of week, and further crosswalk design features, associated with pedestrians' red-light running behaviors. Therefore, the results should be verified using a larger dataset collected over a longer time from more intersections with various design features. Pedestrian behaviors are likely to be characterized by heterogeneity. However, the unobserved heterogeneity was not accounted for in the study. Future studies could apply the random parameters PLN model to address this issue. Gamma priors were used for the hyper-parameters in this study; other priors, such as the uniform prior, could be an alternative in further study. Moreover, the transferability of the developed real-time pedestrian safety model should be tested using data from more intersection crosswalks. In addition, pedestrian red-light running behavior should be connected to pedestrian crashes to verify the findings of the study. Finally, the evaluation of pedestrian safety countermeasures, such as leading pedestrian intervals, by traffic conflict techniques is also a valuable direction for future study [49,50].