Applying Machine Learning to Explore Feelings about Sharing the Road with Autonomous Vehicles as a Bicyclist or as a Pedestrian

Abstract: The current literature on public perceptions of autonomous vehicles focuses on potential users and the target market. However, autonomous vehicles need to operate in mixed traffic conditions, and it is essential to consider the perceptions of road users, especially vulnerable road users. This paper builds explicitly on the limitations of previous studies that did not include a wide range of road users, especially vulnerable road users who often receive less priority. Therefore, this paper considers the perceptions of vulnerable road users towards sharing roads with autonomous vehicles. The data were collected from 795 people. Extreme gradient boosting (XGBoost) and random forests are used to select the most influential independent variables. Then, a decision tree-based model is used to explore the effects of the selected variables on the respondents who approve the use of public streets as a proving ground for autonomous vehicles. The results show that the effect of autonomous vehicles on traffic injuries and fatalities, being safe to share the road with autonomous vehicles, the Elaine Herzberg accident and its outcome, and maximum speed when operating in autonomous mode are the most influential variables. The results can be used by authorities, companies, policymakers, planners, and other stakeholders.


Introduction
Most of the studies related to public perceptions of autonomous vehicles focus on potential users. For example, Silberg et al. [1] conducted a survey in California, New Jersey, and found the elderly and young people (from 18 to 25 years old) as the most potential users. They also found that providing incentives, such as designated lanes, was an important factor for adopting autonomous vehicles. Some of these studies explored the real presence of autonomous vehicles as a mobility option. Begg [2] explored the opinions of transportation experts in the U.K. about the real presence of autonomous vehicles on public roads. The experts suggested 2025 for level 4 and 2040 for level 5 (level 0: no driving automation; level 1: driver assistance; level 2: partial driving automation; level 3: conditional driving automation; level 4: high driving automation; level 5: full driving automation). This study also proposed safety as an important factor. Safety-related factors, such as physical threats, privacy, and trust, are among the important factors in this type of study [3].
Some studies considered the effects of technology on perceptions of autonomous vehicles. Young adults and men are two groups that are more interested in autonomous vehicles than other demographic groups [4,5] since they are more interested in using new technologies [6]. Kyriakidis et al. [7] conducted a survey in different countries and found a positive association between the use of new driving technologies, such as cruise control, and willingness to buy autonomous vehicles. They also found that respondents would be willing to pay more to have fully automated vehicles. However, Seapine Software [8] found equipment failures, liability issues, and hacking to be important concerns for potential users.
Few studies focused on the potentially shared mobility that can be provided by autonomous vehicles. Autonomous vehicles can be easily adopted for shared mobility, but people still prefer to have a private autonomous vehicle [9]. However, Haboucha et al. [10] found that men in Israel prefer shared autonomous vehicles. This is in line with other studies that found that public perceptions of autonomous vehicles and related effective factors can vary among different countries [11]. For example, autonomous vehicles were perceived as scary among 42% of respondents in a study in Japan, while this rate was 66% among U.S. respondents [12]. Therefore, Americans seem to have more safety concerns than other nationalities, such as the Japanese.
Desirability and willingness to buy are another focus of the few current studies on public perceptions of autonomous vehicles. Casley et al. [13] found safety, legal issues, and cost to be important for the desirability of autonomous vehicles. Jiang et al. [14] found household size, age, and trip purpose to be influential factors for willingness to buy autonomous vehicles, and Shabanpour et al. [15] added price, incentives, and policies to these factors.
Autonomous vehicles make activities such as eating, working, and sleeping possible during daily travel time [16]. They can increase safety by reducing distractions and human errors [17,18]. Moreover, current and future autonomous vehicles offer further safety benefits, such as intelligent speed assistance and advanced emergency braking. Public perception, in addition to the technology and road infrastructure, is an important factor in determining the effects of autonomous vehicles on travel behavior. However, most of the current studies related to autonomous vehicles mainly focus on motor vehicles and connectivity between vehicles and infrastructure [19], and only a few studies focus on the effects of these technologies on public perception. Schoettle and Sivak [12] found that most respondents are not familiar with autonomous vehicles, but they expect fewer distractions and fewer accidents with autonomous vehicles. Some studies focus on safety as one of the most significant factors for public perceptions of autonomous vehicles that can change travel behavior and mode, e.g., [20][21][22][23][24]. However, based on a literature review by Gkartzonikas et al. [25], only Hulse et al. [5] focused on the perceptions of pedestrians and Penmetsa et al. [26] focused on the perceptions of pedestrians and bicyclists. This is another important gap since autonomous vehicles need to operate in mixed traffic conditions that include a wide range of road users. It is critical to consider the perceptions of vulnerable road users who often feel that they have less priority. If vulnerable road users, such as pedestrians and cyclists, do not feel comfortable sharing the roads with autonomous vehicles, using this new technology can negatively affect active travel options. This paper explores the perceptions of bicyclists and pedestrians to fill the gap left by previous studies that did not include a wide range of road users, especially vulnerable road users.
Furthermore, most of the studies that examined perceptions and attitudes use descriptive analysis, and prediction models for the perceptions of sharing the road with autonomous vehicles have not been developed to date. This paper explores road users' perceptions, including those of vulnerable road users, towards autonomous vehicles and develops prediction models using machine learning techniques to explore feelings about sharing the road with autonomous vehicles as a bicyclist or as a pedestrian.
Using machine learning and non-parametric techniques provides some advantages for this study. For example, these techniques do not need special assumptions or predefined functions that traditional parametric techniques need. In addition, non-parametric techniques can handle multicollinearity issues better than traditional parametric techniques. Because of high potential correlations between variables in this study, non-parametric techniques can be better options. Finally, these non-parametric models can be presented graphically, making them easy to interpret.

Materials and Methods
Bike Pittsburgh (BikePGH), a registered non-profit company, works to make the city safe and accessible for bicyclists. BikePGH launched two surveys, in 2017 and 2019, to explore the feelings of pedestrians and bicyclists about sharing the road with autonomous vehicles, and this paper used the data collected in the latest one. In total, the data were collected from 795 people using the BikePGH blog, website, and email list. The feeling about sharing the road with autonomous vehicles was the dependent variable in this paper. The independent variables included paying attention to autonomous vehicles, familiarity with the technology behind autonomous vehicles, the experience of sharing the road with autonomous vehicles while riding a bicycle or walking, feeling safe while sharing the road with autonomous vehicles and human-driven cars, the effects of autonomous vehicles on traffic injuries and fatalities, the maximum speed when operating in autonomous mode, having full-time employees (pilot and co-pilot) at all times, operating in manual mode while in an active school zone, sharing some non-personal data, reporting all safety-related incidents, and the effects of previous accidents. In addition, some socio-demographic factors, such as postal address, being an active member of BikePGH, car ownership, having a smartphone, and age, were also considered. Table 1 shows the description of the dependent and independent variables in this paper.

Table 1. Description of dependent and independent variables.

DV. What do you think about using public streets as a proving ground for autonomous vehicles (AVs)? Coding: approve (1); somewhat approve, neutral, somewhat disapprove, or disapprove (0)
IV1. To what extent have you been paying attention to the subject of AVs in the news? Coding: 1-5
IV2. How familiar are you with the technology behind AVs? Coding: 1-4
IV3. Have you shared the road with an AV while riding your bicycle? Have you been near an AV while walking or using a mobility device (wheelchair, etc.)? Coding: yes (1), no (0), not sure (2)
IV10. Do you think that AVs should operate in manual mode while in an active school zone? Coding: yes (1), no (0), not sure (2)
IV11. Should AV companies be required to share some non-personal data with the proper authorities? Coding: yes (1), no (0), not sure (2)
IV12. Should AV companies be required to disclose information and data as to the limitations, capabilities, and real-world performance of their cars with the proper authorities? Coding: yes (1), no (0), not sure (2)
IV13. Should AV companies be required to report all safety-related incidents with the proper authorities, even if a police report is not required? Coding: yes (1), no (0), not sure (2)
IV14. In March of 2018, an AV struck and killed Elaine Herzberg, a pedestrian, in Tempe, AZ, U.S.A. As a pedestrian and/or bicyclist, how did this event and its outcome change your opinion about sharing the road with AVs?
Are you currently an active member of BikePGH? Coding: yes (1), no (0), not sure (2)

In the first step, the most effective variables among the independent variables for predicting feelings about the use of public streets as a proving ground for autonomous vehicles were identified. In the next step, the identified variables were used as the selected independent variables to explore their effects on the dependent variable. Extreme gradient boosting (XGBoost) and random forest were used to select the most effective independent variables. This is in line with recent related studies that deal with a high number of independent variables [27][28][29][30][31]. The random forest aggregates many binary decision trees. Each tree is grown on a bootstrap sample, with a random subset of explanatory variables considered at each node. XGBoost [32] also generates multiple trees to improve accuracy. XGBoost and random forest are better options than other feature selection techniques, in which the importance ranking can be negatively affected by other associated inputs [33].
Ten-fold cross-validation, a resampling method, was applied to estimate accuracy given the limited amount of data. Cross-validation generally gives a less biased estimate of model performance than other methods, such as a single train and test split. After applying random forest and XGBoost, SHAP (SHapley Additive exPlanations) values [34] were used to select the most effective variables. A SHAP value explains the contribution of each predictor to the prediction for an individual observation. Therefore, it is possible to have local interpretability, whereas traditional importance values are related to each predictor and are based on the entire population. In addition, SHAP values can be estimated for each class (for nominal data) in the dependent variable.
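The 10-fold procedure can be illustrated with a minimal sketch. It is not the authors' code: the function names, the fixed seed, and the majority-class stand-in learner are all assumptions made for illustration, and any classifier with matching `fit` and `predict` callables could be substituted.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]

def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Average held-out accuracy over k folds."""
    folds = k_fold_indices(len(y), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k

# Hypothetical stand-in learner: always predicts the majority training class.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, x):
    return model
```

With 795 respondents, each fold holds 79 or 80 observations, so every model is trained on roughly 90% of the data and scored on the remaining 10%.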
All independent variables were included in the random forest and XGBoost models, and then the unimportant variables were excluded one by one based on the SHAP values. The accuracy rate and the number of input variables were used to find the threshold for SHAP values. This threshold was then applied to the selected model (XGBoost or random forest) to find the most effective variables.
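The SHAP ranking used above rests on Shapley values. The sketch below computes exact Shapley values for one observation of a small toy model by enumerating all feature subsets. It is only an illustration under stated assumptions: the toy model, the single all-zero baseline, and the function names are hypothetical, and SHAP implementations for tree ensembles use faster algorithms and a background dataset rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one observation x relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the observed value; the rest stay at baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy scoring model (not the paper's XGBoost model).
def toy_model(z):
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[2]

phi = shapley_values(toy_model, x=[1, 1, 1], baseline=[0, 0, 0])
```

The contributions sum to the difference between the prediction for the observation and the prediction for the baseline, which is what allows them to be read as a per-observation breakdown and then aggregated into a variable ranking.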
A C5.0 model was used in this study to explore the effects of the selected variables on the dependent variable. C5.0 is an improved version of C4.5, which is itself an extension of the ID3 algorithm [35][36][37][38]. In this C5.0 model, 2 and 75 are used as the minimum number of records per child branch and the pruning severity, respectively. Local and global pruning are used to collapse weak subtrees. Before modelling, the winnow attributes technique evaluates the relevance of the predictors and excludes the irrelevant ones.

Table 2 shows that more than 47% of respondents approve the use of public streets as a proving ground for autonomous vehicles. As mentioned, the SHAP values can be estimated for each class in the dependent variable. Therefore, these values were used to find the most effective variables for respondents who approve the use of public streets as a proving ground for autonomous vehicles. Table 3 shows that both total and breakdown accuracy values are higher for the XGBoost model than for the random forest model. In addition, with XGBoost, 80% accuracy is achievable after including only four effective variables based on SHAP values, and including more variables does not significantly enhance the accuracy. In the random forest model, the accuracy after including four effective variables based on SHAP values is 76%. Therefore, the XGBoost model was chosen to find the effective variables. Table 4 shows the selected effective variables based on SHAP values resulting from the XGBoost model for respondents who approve the use of public streets as a proving ground for autonomous vehicles.
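To give a sense of how a tree from the ID3 and C4.5 family chooses a splitter, the following sketch computes the entropy-based gain ratio that C4.5 uses for categorical predictors. Whether the C5.0 implementation used in this study applies exactly this formula is an assumption, and the function names are illustrative only.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain_ratio(feature_values, labels):
    """Information gain of splitting on a categorical feature, divided by split info."""
    total = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split info penalises features with many small branches.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0
```

A predictor that separates the classes perfectly scores highest, while a predictor whose branches have the same class mix as the parent node scores zero and would not be chosen as a splitter.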
Table 4 shows that the effect of autonomous vehicles on traffic injuries and fatalities, being safe to share the road with autonomous vehicles, the Elaine Herzberg accident and its outcome, and the speed of autonomous vehicles when operating in autonomous mode are the most effective factors for respondents who approve the use of public streets as a proving ground for autonomous vehicles. This table also indicates the most effective classes or attributes for these variables. In the next step, the C5.0 model was used to explore the effects of these selected variables on the dependent variable. Figure 1 shows the proposed C5.0 decision tree. The frequency and percentage of each class of the dependent variable are presented for each node. The overall accuracy is more than 79%, and the breakdown prediction accuracies are around 78% and 81% for classes 0 (somewhat approve, neutral, somewhat disapprove, or disapprove) and 1 (approve), respectively. There are five terminal nodes (the bottom nodes of the decision tree), and this model has four splitters, i.e., the effect of autonomous vehicles on traffic injuries and fatalities, being safe to share the road with autonomous vehicles, the Elaine Herzberg accident and its outcome, and the speed of autonomous vehicles when operating in autonomous mode.
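The overall and breakdown accuracies reported above can be computed as in the following sketch. It assumes that "breakdown accuracy" means the share of correctly predicted observations within each true class; the function name and the sample labels are illustrative and are not taken from the survey data.

```python
def accuracy_breakdown(y_true, y_pred):
    """Return overall accuracy and the accuracy within each true class."""
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = {}
    for cls in sorted(set(y_true)):
        # Positions of the observations whose true label is this class.
        idx = [i for i, t in enumerate(y_true) if t == cls]
        per_class[cls] = sum(y_pred[i] == cls for i in idx) / len(idx)
    return overall, per_class

# Made-up labels for illustration: 1 = approve, 0 = all other answers.
overall, per_class = accuracy_breakdown([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
```

Reporting the per-class values alongside the overall figure shows whether a model predicts both classes comparably well or only the more frequent one.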

Results
The model prediction is 1 for respondents who think that autonomous vehicles make the traffic injuries and fatalities situation significantly better (refer to node 8 in Figure 1). The model prediction is 0 for respondents who do not think so and whose opinions about sharing the road with autonomous vehicles were changed by the Elaine Herzberg accident (refer to node 2 in Figure 1). For respondents whose opinions were not changed by the Herzberg accident, speed when operating in autonomous mode and being safe to share the road with autonomous vehicles are important factors. For these respondents, the model prediction is 1 (refer to node 7 in Figure 1) if they do not believe in a maximum speed of 25 mph when operating in autonomous mode. If they do believe in a maximum speed of 25 mph, the model prediction is 0 (refer to node 5 in Figure 1) for respondents who do not think that it is very safe to share the road with autonomous vehicles and 1 (refer to node 6 in Figure 1) for respondents who think that it is very safe.
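The decision rules described above can be written as a short function. This is a sketch of the rules as stated in the text, not an export of the fitted C5.0 model; the function name and the boolean argument names are hypothetical simplifications of the survey answers.

```python
def predict_approval(significantly_better, herzberg_changed_opinion,
                     supports_25_mph_cap, very_safe_to_share):
    """Return (prediction, terminal node) following the rules stated in the text."""
    if significantly_better:
        return 1, 8   # AVs make injuries and fatalities significantly better
    if herzberg_changed_opinion:
        return 0, 2   # the Herzberg accident changed the respondent's opinion
    if not supports_25_mph_cap:
        return 1, 7   # does not believe in a 25 mph maximum speed
    if very_safe_to_share:
        return 1, 6   # believes in the cap and feels very safe sharing the road
    return 0, 5       # believes in the cap but does not feel very safe

# Example: opinion unchanged by the accident, supports the cap, feels very safe.
prediction, node = predict_approval(False, False, True, True)
```

The five return statements correspond to the five terminal nodes of the tree, and the four conditions correspond to its four splitters.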

Discussion and Conclusions
This study explores the perceived feelings of sharing roads with autonomous vehicles. The paper expands on the scope of previous studies by exploring the perceptions of bicyclists and pedestrians. Moreover, this paper builds explicitly on the limitation of previous studies that did not include a wide range of road users, especially vulnerable road users who often receive less priority. The findings suggest that the XGBoost model can find the most influential variables. In addition, the analysis suggests that the effect of autonomous vehicles on traffic injuries and fatalities, being safe to share the road with autonomous vehicles, the Elaine Herzberg accident and its outcome, and the maximum speed when operating in autonomous mode are effective variables for predicting approval of the use of public streets as a proving ground for autonomous vehicles.
There are some other variables included in the model that are not related to safety (e.g., paying attention to the subject of autonomous vehicles in the news, familiarity with the technology behind autonomous vehicles, the experience of sharing roads with autonomous vehicles and human-driven cars, data sharing, related policies, and some variables related to socio-demographic data), but the most effective variables are related to safety. However, some of these variables, such as familiarity and awareness, are significant in other studies. For example, Schoettle and Sivak [12], Silberg et al. [1], and Sanbonmatsu et al. [39] found a positive association between level of awareness and the intention to adopt autonomous vehicles. Nordhoff et al. [40] also found a similar association for driverless shuttles.
The dominance of safety-related variables is not a surprising result since safety is more important for vulnerable road users than for drivers, who are better protected. This point is further confirmed by the effects of the Elaine Herzberg accident, which is among the most effective variables. The findings are in line with previous studies that consider safety as an important factor (e.g., [21][22][23][24][25]). However, most of these studies focus on safety as a significant factor for changing travel behavior and mode. In addition, among these studies, only two considered the perceptions of pedestrians and bicyclists [5,26].
The policy relevance of this paper is underlined by the fact that, at the individual level, we found safety to be a very important factor, and the authorities need to be sure that autonomous vehicles are safe enough to share the streets with other road users. Therefore, autonomous vehicle companies need to consider special procedures and cautions during their testing, and authorities need to provide related policies. Public perception, in this case, can be used both directly and indirectly. In addition, planners and other stakeholders need to provide more information to decrease public confusion about autonomous vehicles.
Non-parametric models, such as the proposed C5.0, have some advantages that make them preferable to the traditional parametric models. Chang and Wang [41] highlighted that non-parametric models (such as the proposed C5.0 model) do not need specific assumptions or a functional form and can handle multicollinearity problems, which are a common issue for independent variables in these data because of potentially high correlations between these variables. The results are also more useful since these models focus on a reduced set of the most significant factors [41].
Despite these advantages mentioned above, these models have some disadvantages. For example, they do not have formal statistical inference procedures [41]. These models also do not have confidence intervals for the splitters and predictions [41]. Generally, it is not recommended to generalize the results based on the non-parametric techniques since these models are not very stable. Furthermore, the accuracy and structure may change significantly if different partitioning and stratified random sampling are used. Therefore, these models are usually used to find important variables and further techniques are needed to find final models. Since sampling and partitioning are not used in the proposed C5.0 model development, this disadvantage is not a significant concern for this study.
In addition to the advantages mentioned above, machine learning has applications in various engineering fields (e.g., [42][43][44][45]). The increasing interest in machine learning is due to the variety of available data and to better computational tools and processing, which make computation cheaper and more powerful. This means that applying machine learning can help us to develop more accurate models and to analyze bigger and more complex data faster than with traditional techniques.
Some extensions of this study are essential. For example, consistent data collection for different regions needs to be considered since other areas are very different in terms of regulations and people's experience with autonomous vehicles. Frequent additional data collection can be used to evaluate the effects of autonomous vehicles on public perception in addition to the evolution of public perceptions of autonomous vehicles. Additional questions, especially related to socio-demographic data, can be used to gain more detailed insights. For example, gender-related data are not included in the BikePGH survey, and there is a very low response rate among the age groups that may have different ideas (just around 4% for 18-24 and around 12% for the elderly). Finally, the target population in our study is bicyclists and pedestrians, who represent these specific mode users. Adding a general population sample can be useful to provide a baseline and a useful comparison. Future studies can also develop questionnaires following a scientific approach to avoid the gaps and potential biases in the questions of the BikePGH survey, which was developed by an interest group.

Data Availability Statement: Data are freely available to everyone to use and republish at https://data.wprdc.org/dataset/autonomous-vehicle-survey-of-bicyclists-and-pedestrians (accessed on 19 July 2020).

Conflicts of Interest:
The authors declare no conflicts of interest.