Bicycle-Vehicle Conflict Risk Based on Cyclist Perceptions: Misestimations of Various Risk Factors

Abstract: Cycling is a sustainable but vulnerable mode of transportation, and bicycle-vehicle crashes at intersections are particularly dangerous. This paper explores the discordance between empirical evidence and cyclists' perceptions of the various risk factors of cycling. Ridge regression was adopted to identify risk factors from bicycle-vehicle conflict data. A questionnaire was distributed to assess cyclists' perceptions of safety and danger over the same candidate risk factors. The conflict data and the questionnaire results were indeed discordant: cyclists appear to misestimate the risk associated with certain factors, such as bus stops and subway stations. Understanding these misestimations can provide a foundation for safety improvements and for promoting cycling as a sustainable mode of transportation.


Introduction
Cycling is a sustainable mode of transportation that reduces traffic congestion and carbon emissions [1,2]. Cycling has been greatly encouraged to promote sustainable development [2]. However, a major deterrent to cycling is the risk of collision with vehicles [3,4]. Cyclists experience higher rates of injury and death than motor vehicle drivers [5,6]. For example, in Canada, cycling accidents account for 2.2% and 4.6% of all road fatalities and injuries, respectively, despite the low proportion of cyclists on the road [7]. Collisions between bicycles and motor vehicles account for the majority of reported cyclist fatalities and serious injuries [8]. Many bicycle-related collisions occur at intersections [9,10]. Therefore, improving cycling safety at intersections has become one of the key issues in promoting sustainable transportation.
Researchers have identified various risk factors in this context, among which exposure is known to be the most important [11,12]. Other risks can mainly be divided into three dimensions: infrastructural, traffic-related, and environmental. The number of nearby bus stops, for example, increases the risk of cycling [13]. Employment, number of schools, subway stations, land use mixtures, commercial retail properties, and the presence and proximity of bicycle facilities have positive effects on risk, while average street length and the presence of parking entrances have negative effects [14][15][16]. Network coverage and recreational density have negative direct effects on cycling accidents, while their total indirect effect is positive [17].
Few researchers have attempted to incorporate cyclists' behavioral perceptions into this context. It is possible that there are notable disparities between the risk factors perceived by cyclists and the risk factors observed empirically. How, then, do these disparities affect cyclists' safety when passing through intersections? Some studies have indicated that risk perception is not entirely aligned with the actual risk of collision [18,19]. Certain risk factors are recognized similarly by both researchers and cyclists, such as multi-lane corridors [15,20,21] and mixed traffic infrastructures [22][23][24]. Certain environmental risks that have been identified by researchers, however, are largely neglected by cyclists; these factors include land use patterns [14,15]. Cyclists perceive the volume of bicycles and vehicles on the road as risk factors [19,25], but statistics reveal a "safety in numbers" (SIN) effect [11,26]. Under the SIN effect, the number of cycling accidents grows less than proportionally with the number of cyclists. Accordingly, effective improvements may not increase cyclists' perceptions of safety and therefore may not encourage citizens to commute by cycling.
Multicollinearity and overfitting are other noteworthy problems in this context. Regression analysis is a common approach to identifying risk factors; techniques include generalized linear regression [27], Poisson regression [28], and negative binomial regression [29]. Multicollinearity and overfitting are common in regression analysis and can significantly reduce a model's accuracy, even distorting its results. We adopted a ridge regression technique in this study to identify risk factors while mitigating multicollinearity and overfitting [30][31][32]. By introducing bias into the coefficient estimation, ridge regression provides more robust and realistic estimations than other similar analysis techniques [31]. At present, ridge regression is mainly utilized in the mathematics field; there is potential for further development in terms of risk identification.
The first objective of this work was to compare the differences in risk factors between cyclists' perceptions and bicycle-vehicle conflict data. The second objective was to determine whether and why some effective safety measures do not increase cyclists' perceptions of safety. Perceived safety can influence modal choices [21], so our findings may be useful in promoting sustainable and safe cycling. We also hope to enrich the literature with regard to risk identification in cycling.
The remainder of this paper is organized as follows. Section 2 describes our methodology; Section 3 presents our collected data; and Section 4 presents the results of our analysis. Section 5 provides a deeper discussion and concluding remarks.

Regularization Technique: Ridge Regression
In this study, we adopted a ridge regression technique to identify risk factors. Ridge regression is a linear regularization method which penalizes the size of regression coefficients to mitigate overfitting and the effects of multicollinearity [31,33].
A general regression problem can be defined as follows [34]:

$$\min_{\omega \in \mathbb{R}^m} L(\omega) \tag{1}$$

where $L(\omega)$ is the loss function to be minimized (e.g., the squared error $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$); $\mathbb{R}^m$ is the attribute set; and $m$ is the number of attributes. Given a set of $n$ vectors $x_1, \ldots, x_n$ in $\mathbb{R}^m$, $y_i$ denotes the real values and $\hat{y}_i$ denotes the predicted values, with $\hat{y}_i = \omega^{\top} x_i$, $\omega \in \mathbb{R}^m$. New samples are recorded as $y_{n+1} = \omega^{\top} x_{n+1}$. Ridge regression adds an L2 regularization term to the least-squares loss to reduce the variance:

$$\min_{\omega \in \mathbb{R}^m} \sum_{i=1}^{n} \left(y_i - \omega^{\top} x_i\right)^2 + \lambda \lVert \omega \rVert_2^2 \tag{2}$$

where the L2 term is a norm bound controlled by the shrinkage parameter $\lambda$ ($\lambda > 0$).

There are two main advantages to the ridge regression approach. First, when the sample dataset is small, least-squares fitting may result in high-variance regressions. The estimated importance of different risk factors can vary greatly when the sample data change slightly, so ordinary least squares may not effectively explain risk factors. By introducing a small bias, ridge regression can reduce the variance of the fit as well as the sensitivity of the result [35]. The dataset we used in this study is limited by the relatively small number of intersections, so ridge regression is more reasonable, stable, and explanatory.
Second, as shown in Equation (2), regularization L2 is a squared term and ridge regression does not eliminate any variables. Instead, less important variables are assigned smaller parameters. Therefore, ridge regression retains nearly all parameter information and can clearly show the importance of each variable. This helped us to compare the differences in each variable between the statistical data and cyclists' perceptions.
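The shrinkage behavior described above can be illustrated with a minimal sketch. It assumes the standard closed-form ridge estimate $(X^{\top}X + \lambda I)^{-1}X^{\top}y$ without an intercept; the function name `ridge_fit`, the synthetic data, and the value $\lambda = 1$ are hypothetical and are not the data or settings used in this study.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^-1 X'y, no intercept.

    lam = 0 reduces to ordinary least squares.
    """
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)


# Synthetic data (hypothetical): 20 observations, 3 attributes,
# where the second attribute is nearly collinear with the first.
rng = np.random.default_rng(0)
x1 = rng.normal(size=20)
x2 = x1 + rng.normal(scale=0.01, size=20)
x3 = rng.normal(size=20)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 0.5 * x3 + rng.normal(scale=0.1, size=20)

w_ols = ridge_fit(X, y, lam=0.0)    # ordinary least squares
w_ridge = ridge_fit(X, y, lam=1.0)  # ridge with shrinkage parameter 1

print("OLS coefficients:  ", w_ols)
print("Ridge coefficients:", w_ridge)
```

In this sketch the ridge coefficients have a smaller norm than the least-squares ones, yet none is set exactly to zero, which is the property that allows every candidate factor to be compared.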

Bicycle-Vehicle Conflicts
Traffic accident data generally take a long time to collect. Traffic conflicts, conversely, are fairly frequent and of minor social cost [36], making conflict data a promising alternative measure in safety research. Two indicators are mainly used to measure the probability of a collision: time-to-collision (TTC) and post-encroachment time (PET). TTC reflects "the time required for two vehicles to collide if they continue at their present speed and on the same path" [37]. Vehicles' paths change continuously, so the TTC value changes continuously as well; TTC therefore requires real-time measurement and depends on motion predictions. PET is defined as the time difference between the moment the first road user leaves the potential conflict point and the moment the second road user reaches it [38,39]. Compared with TTC, PET requires only two time stamps to measure a conflict. Its values are precisely defined, with no need to account for vehicle path choices or changes. PET measures the relative closeness to a collision, which makes it well suited to analyzing conflicts at intersections, where many trajectories intersect. TTC is more useful on road segments and can only be applied when there is a definite collision course. We used PET as a basic measure in this study and adapted it to the needs of our cycling safety analysis.
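As a minimal sketch of the PET definition above, the measure reduces to a difference of two time stamps. The function name and the example time stamps are hypothetical; rejecting negative values is an assumption of this sketch, not a rule stated in the text.

```python
def post_encroachment_time(t_first_leaves, t_second_arrives):
    """Return PET in seconds.

    t_first_leaves:   time at which the first road user leaves the conflict point
    t_second_arrives: time at which the second road user reaches the same point
    """
    if t_second_arrives < t_first_leaves:
        # Assumption of this sketch: the caller has already ordered the
        # two road users, so a negative PET signals bad input.
        raise ValueError("second road user arrives before the first one leaves")
    return t_second_arrives - t_first_leaves


# Hypothetical example: a bicycle clears the conflict point at 12.5 s
# and a vehicle reaches it at 14.0 s.
pet = post_encroachment_time(12.5, 14.0)
print(pet)
```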

Video Conflict Data Collection Method
Many previous researchers have collected video traffic data [40][41][42]. The open-source software Traffic Intelligence [43] is often utilized for object tracking; it can automatically extract the trajectories of road users from video recordings. Trajectories show each road user's position and speed over a series of individual video frames. The speed, appearance, and other features of road users shown in the videos can then be used to characterize the pedestrians, cyclists, and vehicles. In this study, we used Traffic Intelligence to extract all vehicle and cyclist trajectories in the investigated intersections. We identified each bicycle/vehicle pair sharing a crossed trajectory and calculated their PET values. Generally, when PET > 3 s the probability of a collision is very small [44,45], so all PET values higher than 3 s were excluded.
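The 3 s exclusion rule can be sketched as a simple filter over PET values. The record layout (intersection identifier, PET in seconds) and the sample values are hypothetical, and this sketch does not use the Traffic Intelligence API.

```python
from collections import Counter

PET_THRESHOLD_S = 3.0  # interactions with PET above this value are excluded


def count_conflicts(pet_records, threshold=PET_THRESHOLD_S):
    """Count retained conflicts (PET <= threshold) per intersection.

    pet_records: iterable of (intersection_id, pet_seconds) pairs.
    """
    counts = Counter()
    for intersection_id, pet in pet_records:
        if pet <= threshold:
            counts[intersection_id] += 1
    return dict(counts)


# Hypothetical PET values for two intersections.
records = [("A", 1.2), ("A", 3.4), ("B", 2.9), ("B", 0.8), ("B", 5.0)]
conflicts = count_conflicts(records)
print(conflicts)
```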

Conflict Data from Video Recordings
We collected conflict data from 20 intersections in Nanjing, China. All intersections had more than one bicycle-vehicle accident within the past two years and were not undergoing construction at the time. All intersections also had common intersection shapes; 30% were unsignalized while 70% were signalized, and four were T-shaped while the other 16 were cross-shaped. No grade-separated intersections were included. The intersections had sufficient space and other qualities (legal allowances, weak electromagnetic conditions, low wind speeds) for the launching and landing of unmanned aerial vehicles (UAVs). The specific features of each intersection are shown in Figure 1. We collected data for seven days at the end of November 2019, between 11:00 A.M. and 12:00 A.M. Two UAVs were alternately deployed at each intersection to record traffic conditions. We avoided the morning peak hours (normally between 7:00 A.M. and 12:00 A.M.) because intersections tend to be congested during that time, vehicles tend to pass more slowly, and traffic police are more likely to be present. These qualities make the intersection deviate from its usual state. All data were acquired from late October to early November in 2019, during which time there were almost no inclement weather events (e.g., fog, rain) in the surveyed area. This allowed us to minimize any environmental disturbances.
After recording videos, we used Traffic Intelligence to extract all vehicle and cyclist trajectories, then calculated the PET values (≤ 3 s) of all conflicts. Figure 2 shows the PET and total conflicts collected at each intersection. We observed a total of 1401 conflicts and an average of 70 conflicts per intersection (a maximum of 171 and a minimum of 27). The total variance in PET values was 0.16 and the maximum variance was 0.22 in the intersection grouping.

Candidate Risk Factors
Bicycle-vehicle accidents in intersections involve various factors related to road users and the built environment [46]. For the purposes of this study, we collected candidate risk factors in traffic and environmental dimensions based on the literature [11][12][13][14][15][16].
The traffic variables we collected include the average flow of bicycles, vehicles, and pedestrians. We used Annual Average Daily Traffic (AADT) to define these traffic characteristics. AADT data are widely used in accident analyses [13,47,48]. They are provided by the road management department based on their in situ traffic flow monitoring equipment. However, pedestrian and bicycle traffic is generally not counted. We calculated the traffic of bicycles and pedestrians during the investigation period based on our UAV traffic recordings.
The environmental datasets were collected from Nanjing's Urban Plan (2018-2035) [49] and an open data source, Baidu Maps [50]. A radius of 500 m was considered to be a proper walking distance for taking buses and subways [51,52]. For a transfer between a bus and subway, the search area likewise fell within a 500 m radius [53,54]. We therefore applied the 500 m search radius rule in calculating the influence area of buses and subways. Detailed descriptions, statistical analysis results, and the values of dependent variables for all intersections are shown in Table 1.
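One way to apply the 500 m search radius is to count the stops whose great-circle distance from the intersection is within the radius. The sketch below assumes latitude/longitude inputs and the haversine formula; the coordinates are hypothetical and the study's actual GIS procedure may differ.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_within_radius(center, points, radius_m=500.0):
    """Count the points lying within radius_m of center; all are (lat, lon) pairs."""
    return sum(
        1
        for lat, lon in points
        if haversine_m(center[0], center[1], lat, lon) <= radius_m
    )


# Hypothetical coordinates: one intersection and three bus stops.
intersection = (32.0600, 118.7900)
bus_stops = [(32.0610, 118.7905), (32.0640, 118.7900), (32.0700, 118.8000)]
n_bus_stops = count_within_radius(intersection, bus_stops)
print(n_bus_stops)
```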

Cyclists' Safety Perception from Questionnaire Survey
From June to July 2020, we distributed questionnaires to cyclists at the investigated intersections. The respondents were asked to self-report their perceptions of a series of listed risk factors. For each risk factor, respondents answered two questions. First, if this factor exists or increases, will your cycling be safer or more dangerous? Second, please rank the level of your perception with a minimum of 0 (very low) and a maximum of 10 (extremely strong).
The listed risk factors are consistent with the candidate risk factors in Section 3.2. Sociodemographic information (gender, age, and income level) was also collected in this survey. We determined the safety grade and the danger grade of each factor by averaging the reported perception levels for the corresponding responses. A total of 400 questionnaires were distributed, with an average of 20 at each intersection. We retained 312 valid questionnaires (78%) after excluding incomplete or inaccurate ones (22%), with at least 12 questionnaires retained for each intersection. In six of the intersections, 100% of the questionnaires were valid. The minimum number of respondents for our investigation can be calculated using Equation (3) [55]:

$$n = \left(\frac{z \cdot cv}{E}\right)^2 \tag{3}$$

where $n$ is the sample quantity, $z$ is the standard constant (1.96 under 95% confidence), $E$ is the allowable error margin (10%), and $cv$ is the coefficient of variation, which is the ratio of the standard deviation of the number of cyclists to its mean. Based on the standard deviation and mean value of V 1 (Figure 1), $cv$ was equal to 0.58. The minimum number of respondents under 95% confidence was therefore 129, and our sample size (n = 312) exceeded this minimum.

We used Myers' index [56] to evaluate the representativeness of our sample. Assuming a population with no data preferences, the age mantissa (the terminal digit of the reported age) should have a uniform distribution. We calculated the difference between the actual distribution and the theoretical distribution of the age mantissa in our investigated population; the sum of the absolute values of these differences is the Myers' index. When the Myers' index is higher than 60, the sample is unrepresentative. Figure 3 shows the actual distribution of the age mantissa in our case, where the Myers' index is 10.40; this is lower than 60, so our sample was fairly representative.
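The minimum sample size can be checked numerically. The sketch below assumes that Equation (3) has the common form $n = (z \cdot cv / E)^2$, which reproduces the reported minimum of 129 from the stated inputs; the function name is hypothetical.

```python
def min_sample_size(z, cv, error):
    """Minimum sample size n = (z * cv / error)^2.

    z:     standard constant for the confidence level (1.96 for 95%)
    cv:    coefficient of variation (standard deviation divided by mean)
    error: allowable relative error margin (0.10 for 10%)
    """
    return (z * cv / error) ** 2


# Inputs reported in the text: z = 1.96, cv = 0.58, E = 10%.
n_min = min_sample_size(z=1.96, cv=0.58, error=0.10)
print(round(n_min))
```

With these inputs the formula gives roughly 129.2, in line with the reported minimum and well below the 312 questionnaires retained.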

Risk Factors from Ridge Regression and PET Data
The ridge-regression risk factor identification results are shown in Table 2. The ridge regression parameter was chosen by an automatic method [57]. As can be observed in Table 2, annual average daily traffic of vehicles, average hourly traffic of pedestrians, number of subway (metro) stations, density of commercial land use, and minimum crossing angle show significant correlation with bicycle-vehicle conflicts.
Notably, the annual average daily traffic of vehicles shows the highest positive influence on conflicts. The number of subway (metro) stations shows a very large negative effect on conflicts, as indicated by its large negative parameter estimate (−322.67). Interestingly, though the bicycle is a crucial participant in these conflicts, bicycle traffic appears to have no statistically significant correlation with bicycle-vehicle conflicts (p = 0.68).

Risk Factors from Cyclist Perceptions
The cyclists' perceptions of various traffic and environmental factors are shown in Figure 4. As shown in Figure 4, nearly all the cyclists who responded to our survey stated that vehicle traffic flow, density of commercial land, intersection approaches, and crossing angles negatively affect their perceptions of safety, almost without objection. These four factors consistently scored highest for danger, though some cyclists stated that residential land density, number of bus stops, pedestrian traffic flow, and number of subway stations also have certain effects on safety. Understandably, pedestrian signals, the presence of bicycle lanes, and signalized infrastructure appear to provide the cyclists with a sense of security.
Compared with the statistical identifications presented in Section 4.1, the risk factors reported by cyclists show higher danger grades, and vehicle traffic consistently shows the highest risk. The cyclists who responded to our questionnaire perceive substantial danger in areas with more bus stops and at certain intersection approaches, while these two factors showed little significance in the statistical analysis. The cyclists gave a high danger grade to bus stops (6.20) and subway stations (4.65). However, statistically, bus stops show no significant relation to cyclist-vehicle collision probability and subway stations are actually negatively related to cyclist-vehicle conflicts. These effects merit further research.

Discussion and Conclusions
This study was conducted to explore the differences in risk factors between statistical bicycle-vehicle conflict data and cyclists' perceptions of risk. Ridge regression was adopted to identify risk factors from in situ conflict data and a questionnaire was distributed to evaluate cyclists' self-reported perceptions of safety and danger based on numerical aggregation. Many previous researchers have investigated the risk factors of cycling, but few have compared statistical data with cyclists' real-world perceptions. Our results suggest that cyclists may misestimate the risk of certain factors. In other words, improvements in certain risk factors may not improve cyclists' perceptions of safety and therefore may not promote cycling as an effective mode of transportation [58].
We find that cyclists may overestimate the risk of bus stops and subway stations, which is consistent with previously published research [59]. One possible explanation is that an increasing number of bus stops leads to more interactions between buses/passengers and cyclists. Buses also interfere with cycling routes when they stop, and in certain cases force passengers to cross over the bicycle lane when alighting. The number of bus stops is indeed an important risk factor for cycling [13,15,17,60,61]. In our investigated intersections, however, either few buses pass by or the bus transit lane is separate; it is possible that buses have relatively little influence on cycling safety in these cases. Our questionnaires show that cyclists are concerned for their safety at bus stops (danger grade of 6.20), which indicates that cyclists' perceptions may depend on certain aspects of the cycling process itself rather than the specific characteristics of intersections. Improving a single intersection's safety may not improve the cyclist-perceived safety there, which is consistent with previous research as well [62].
Few researchers have used subway stations as a candidate variable when exploring cycling safety due to a general scarcity of relevant data. We tested the number of subway stations as a variable affecting cyclist perceptions and found that cyclists appear to misestimate its effects on safety. Statistically, subway stations show a positive relation with cycling safety; cyclists perceive the opposite. The presence of a subway line leads to a reduction of other motor vehicles on the road [63], which eases nearby vehicle traffic. Subway pedestrians can cross the street through underpasses, reducing pedestrian traffic at these intersections as well. Both phenomena indirectly cause a decline in the risk of cycling. On the other hand, subway stations continually attract bike-shares and pedestrians [63], which may give cyclists a sense of danger due to higher flows of other cyclists.
Vehicle traffic, pedestrian traffic, and commercial land density are the most important risk factors for cycling as reflected in both accident data [13,17,60,64] and cyclists' self-reported perceptions [19,25]. Bicycle traffic, interestingly, is not significantly related to bicycle-vehicle conflict probability. The cyclists' perceptions are equivocal, with a safety grade of 5.61 and danger grade of 2.85. The SIN effect may account for this. As shown in Table 1, the variance of average hourly traffic of bicycles is relatively large. This makes the effects of bicycle traffic flow vary at different intersections. In some intersections, the risk increases when there are more bicycles. In other intersections, the opposite is true.
Our results offer some noteworthy suggestions for promoting cycling. Intersections may be improved at the regional level, for example, as improving individual intersections may not enhance cyclist-perceived safety. The layout of bus stops and subway stations at intersections may be adjusted for uniformity. Reducing the interactions between buses (or bus passengers) and cyclists can be helpful, such as by using integrated bay bus stops (i.e., stops on the right of the intersection's vehicle entrance, where passengers use crosswalks). Bike boxes and staggering the rush hours of buses and cyclists may also be helpful, in addition to setting shared bike stops as far away as possible from bicycle lanes and bus stops. We also recommend that subway stations be designed with convenient underground crossings and additional entrances, allowing pedestrians to disperse more quickly and reducing the danger that cyclists may otherwise perceive there.
Further, safety factors should be weighted properly when choosing improvements to a given intersection's safety. The space for traffic to flow through any intersection is limited, so numerous measures may not be possible to implement simultaneously. The objective is generally to maximize the effects of a certain improvement rather than the quantity of improvement measures taken at once. However, "maximum improvement" does not necessarily yield the maximum level of perceived safety. It is also important to improve safety perceptions, not solely the statistical measures of safety, when determining improvement measures. Using safety weights/grades can be a feasible approach to this, but the specific effects still need further study.
This work is not without limitations. First, though we selected intersections with bicycle collision records, we could not obtain specific accident records due to the confidentiality needs of management departments. This may have harmed the credibility and persuasiveness of our results. Second, our sample size is relatively small; further research is yet needed to investigate a greater proportion of the intersections in the investigated district. Some intersections could not be covered here due to environmental factors (e.g., high winds, strong electromagnetic fields). Despite these limitations, we believe our results lend meaningful suggestions for promoting cycling as a mode of transportation. Our results also may be helpful in improving intersection safety. The revealed problems expand previous research on risk factor identification and the analysis of safety perceptions.
Author Contributions: D.C. contributed to the conception of the study; C.W. contributed significantly to analysis, manuscript preparation, data analyses, and manuscript writing; Y.C. contributed to investigation. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.