1. Introduction
The transport sector has undergone a significant transformation in recent decades, driven by rapid technological advancements that have introduced novel mobility solutions [
1,
2]. Among these innovations, carsharing (CS) services and fully autonomous vehicles (AVs) have gained substantial attention due to their potential to reshape urban mobility [
3,
4]. These emerging transport services offer alternatives to traditional car ownership by enhancing efficiency, safety, sustainability, and flexibility. CS and AVs are expected to be essential in creating sustainable and intelligent transportation systems as urbanization increases, traffic congestion grows, and environmental concerns rise [
5,
6]. Furthermore, safety concerns are among the most important factors affecting AV user acceptance, and they continue to shape travel behavior and decision-making [
7,
8].
CS services provide users with on-demand vehicle access, thereby reducing the number of private vehicles on the road, alleviating congestion, and mitigating environmental impacts, especially at city centers [
9,
10]. Meanwhile, AVs have the potential to significantly increase road safety, fuel economy, and traffic efficiency [
11,
12]. By minimizing human error, one of the leading causes of road accidents, autonomous driving technology has the potential to significantly reduce traffic fatalities and injuries [
13]. However, the perceived safety of AVs remains a major barrier to widespread acceptance, as users express concerns regarding system reliability, cybersecurity threats, and the ability of automated systems to handle complex driving scenarios [
14,
15].
Numerous studies have examined the factors influencing the acceptance of CS and AVs, emphasizing elements such as convenience, cost, environmental awareness, and trust in technology [
16,
17]. However, the interaction between safety perceptions, behavioral tendencies, and demographic characteristics in shaping mode choice remains underexplored. Conventional discrete choice models have provided valuable insights into user preferences [
18,
19]. However, these models often fail to capture the complexity of decision making in the evolving transport landscape. To overcome these constraints, machine learning (ML) has become a powerful analytical technique that makes it possible to analyze large, diverse datasets and find hidden patterns and interdependencies among the variables driving transport decisions [
20,
21,
22]. Previous research has demonstrated the effectiveness of ML models, ranging from traditional classifiers such as logistic regression and decision trees to advanced algorithms like Random Forest, gradient boosting, and Neural Networks, in improving prediction accuracy for mode choice behavior [
21,
23,
24]. Brahimi et al. [
25] use ML models and deep learning to predict the usage of carsharing vehicles at stations. Hu, Tang, Tong and Zhao [
21] examine the spatiotemporal properties of electric vehicles in carsharing using ML models. They found a connection between land use and the use of electric carsharing vehicles.
AVs may be deployed in different ways, for example as shared autonomous vehicles (SAVs), where riders share the same vehicle or the AV fleet is shared by the public [
26]. In this study, a privately shared autonomous vehicle (PSAV) is an AV owned by a company or another relevant representative body that can be used like a private taxi (i.e., the car is shared, but the ride is private) [
27]. PSAVs represent full automation (Level 5) according to the SAE [
28], allowing users to experience autonomous mobility, whereas traditional CS services rely entirely on human drivers. Travelers choosing between these two services must therefore weigh influential factors such as safety concerns about the technology, which makes people’s preferences towards the two services worth studying.
ML has proven to be a powerful analytical method in different fields, as shown in
Table 1. However, few studies apply ML to analyze how travel variables, sociodemographic variables, and safety-related concerns influence the acceptance of CS and PSAV services. Previous studies have mostly relied on statistical and mathematical models, whose predictive power is limited by the complexity of the problem. This research adds to the literature by applying three high-performance ML models (CatBoost, XGBoost, and LightGBM) to classify user choices between CS and PSAVs, an approach that helps overcome the limitations of other statistical methods. To the best of the authors’ knowledge, the factors that affect travelers’ acceptance of CS and PSAVs have not previously been examined using ML models.
This study aims to identify the key determinants influencing transport mode selection, assess the role of safety-related concerns in shaping user preferences, and evaluate the comparative performance of machine learning algorithms in predicting travel behavior. By integrating demographic, behavioral, and trip-related variables, this research seeks to provide actionable insights for researchers, urban planners, vehicle manufacturers, and mobility service providers. This research answers the following questions:
Q1: Does people’s selection of PSAVs and CS vary by demographic and traveling variables?
Q2: How do travelers inside cities perceive the safety of CS and PSAVs?
Q3: Can machine learning models predict whether a traveler is likely to accept CS or PSAVs based on safety perceptions?
Q4: What are the factors that impact the acceptance of CS and PSAVs?
This research article is organized as follows:
Section 1 presents the introduction, which covers the gaps in previous research, the contributions of this research, and its aims. The literature review of previous related research is presented in
Section 2. The methodology, which explains the tools and techniques used, is presented in
Section 3.
Section 4 presents the results and discussions.
Section 5 presents this study’s summary and conclusions.
2. Related Work
Vehicle technology is developing fast. Although AVs are not yet available on the market, pilot projects have been launched to study the feasibility, safety, and user acceptance of AVs and autonomous shuttles, such as the Smart Columbus EasyMile shuttle program in Ohio [
29], the May Mobility deployments in Ann Arbor, Michigan [
30], and the CAVForth autonomous bus service in Scotland [
31]. However, the general public lacks knowledge about these pilot projects and is more exposed to Level 4 automation services, such as Cruise and Waymo in San Francisco. In this section, related research works are discussed.
In recent years, CS and AV services have attracted the attention of researchers and city planners as a practical way to address issues with transport in cities, such as traffic, parking restrictions, and environmental concerns [
32,
33]. Safety considerations play a crucial role in shaping travel behavior, particularly in emerging mobility services such as CS and PSAVs [
34]. Extensive research has explored the impact of travel behavior variables, sociodemographic variables, and safety perceptions on transport mode choices, addressing concerns related to technological reliability, accident risks, cybersecurity, and personal security [
35,
36]. This section provides an overview of previous relevant studies on traveler preferences towards CS and AVs, and safety-related factors influencing CS and AV adoption while also examining the role of machine learning in predicting travel behavior.
A study by Kyriakidis et al. [
37] was conducted to analyze the public perceptions of AVs based on survey responses from eight European countries. The authors found that safety remains a primary determinant of AV adoption, with demographic factors such as age, gender, education, and household size influencing willingness to use AVs. Vulnerable road users, including the elderly and individuals with disabilities, expressed a preference for human supervision in AVs, highlighting broader concerns related to reliability, cost, and driving experience. These insights suggest that regulatory frameworks should consider both safety and user comfort in AV implementation. A study by Stoiber et al. [
38] explored user preferences for pooled AVs through an online choice experiment with 709 participants in Switzerland. The study assessed both short- and long-term mobility decisions based on a scenario of full AV market penetration. The study results indicate that 61% of respondents preferred shared AVs over private autonomous cars, reinforcing the potential of pooled AV services to reduce private vehicle dependence. Additionally, integrated measures addressing cost, travel time, and comfort were identified as critical factors in promoting shared mobility solutions [
38]. Zhou et al. [
39] conducted a study to examine consumer preferences toward CS and the potential adoption of shared automated vehicles (SAVs) through a stated preference survey in Australia. A mixed logit model was applied, and the study reveals substantial preference heterogeneity, with prior experience in carsharing increasing multimodal travel choices while reducing private vehicle reliance. The authors show that elderly people, women, and non-drivers—who are generally viewed as major SAV beneficiaries—show lower levels of acceptability, underlining potential challenges to broad adoption and the importance of focused policy measures.
In a study by Kolarova et al. [
40], travelers were more likely to use AVs than conventional cars. Hao and Yamamoto [
32] found that car owners are less likely to use CS than people who do not own cars in urban areas. Because AVs are still not on the market, researchers use mathematical models based on questionnaires and surveys to understand people’s behavior towards AVs as part of a transport system. Personal experiences were found to affect the acceptance of CS and AVs, as stated by Müller [
41]. Schoettle and Sivak [
42] examined how travelers in the USA, UK, and Australia deal with the availability of AVs in the market. The authors found that preference towards AVs changes across gender; for example, women are more willing to use AVs than men. Moreover, a study by Howard and Dai [
43] states that men who are highly educated, own luxury cars, and are high-income earners are more willing to use AVs compared to other groups. Additionally, studies by Bansal et al. [
44] and Stoma et al. [
45] state that men and high-income people are more likely to use SAVs. Women hesitate to buy AVs due to safety factors; therefore, they are less likely to pay more money for automation, as stated by Louw et al. [
46].
Scholars use mixed logit models to understand the expected behavior of people when AVs are in the market. Chee et al. [
47] find that AVs are accepted by users if the trip time, trip cost, and waiting time are competitive. In Japan, a study by Das et al. [
48] shows that around 20–30% of trips might change when AVs are in the market, based on the results of a nested logit model. Moreover, the authors state that AV use depends on job type; for example, part-time workers are more willing to use AVs. In shared mobility, CS is considered an option that attracts travelers in urban areas. CS is considered more cost-effective in cities than privately owned cars because the cost of parking and using infrastructure is eliminated [
49,
50]. Pawełoszek [
51] shows that CS is a solution to traffic congestion in city centers where people can rent a car for a short period, parking is utilized, and traffic congestion is alleviated. Efthymiou et al. [
52] show that CS users tend to be among the youngest and most educated travelers. Meanwhile, the study of [
53] shows that CS is used by students who own a driving license and have low income, and men are the main users of CS.
Lee [
54] analyzes the changing dynamics of transportation mode choice in the AV era through a combination of discrete choice modeling (DCM) and machine learning (ML) techniques. A stated choice experiment in the U.S. reveals that AV market shares are influenced by a range of socio-demographic and behavioral factors. The study utilizes stochastic gradient boosting to enhance feature interpretability, uncovering non-linear relationships between user characteristics and mode choice. Additionally, methodological limitations in ML-based mode choice modeling were critically assessed, highlighting areas for future refinement. Pineda-Jaramillo et al. [
55] compare traditional multinomial logit models with ML approaches in travel mode choice prediction. In their study, based on household survey data from the Aburrá Valley, Colombia, they find that an optimized gradient boosting model outperformed both logit and Random Forest models. Key determinants of mode choice included travel time, parking availability, vehicle ownership, age, and gender, demonstrating the potential of machine learning as a policy tool for promoting sustainable transport options. Teusch et al. [
56] prepare a systematic literature review of machine learning applications in shared mobility, covering methods, datasets, and decision-support systems. The authors highlight the gaps in ML studies on carsharing and ride hailing. Brahimi, Zhang, Dai and Zhang [
25] study the carsharing data to predict the usage of vehicles at stations. The authors compare ML models and deep learning models. They find that CNN-LSTM achieved the highest prediction accuracy; weather conditions are more influential than time-based variables. Baumgarte et al. [
57] compare 20 groups of users of carsharing, and they find that these groups are different in the spatial and temporal behavior usage of carsharing. The results support a tailored business model. Wang and Ross [
58] compare the multinomial logit model and the XGBoost in predicting travel mode choice. The machine learning model XGBoost outperforms the multinomial logit model in prediction and accuracy. A study by Zhao et al. [
59] proposed ML techniques to examine response heterogeneity, together with a high-accuracy classifier to predict mode-switching behavior. The study emphasizes that drivers are more sensitive to additional pickups on the road than people who use other transport modes. Hu, Tang, Tong and Zhao [
21] examined the spatiotemporal characteristics of electric vehicles in carsharing. The authors apply ML techniques, and the findings demonstrate a connection between the behavior of carsharing users and land use. Qin et al. [
60] studied commuters’ preferences for AVs, and they found that travel experience improves AV perception. Huang et al. [
61] applied ML techniques to study the behavior of SAV users in relation to supply. The authors found that zone-based reinforcement learning (RL) relocation outperforms car-based relocation, which aligns with urban travel behavior and heterogeneity.
In line with the power of the ML model in the transport field, Alencar et al. [
62] examined demand forecasting in carsharing services using LSTM, Prophet, and ensemble models (e.g., XGBoost, CatBoost). Multivariate LSTM with weather data significantly reduces error. Boosting models perform best for short-term (12 h) forecasts, while Prophet and SARIMA excel in long-term (7-day) predictions. The study highlights the benefits of incorporating external variables. Martín-Baos et al. [
63] conduct a systematic and methodologically rigorous comparison of machine learning (ML) models—including XGBoost, Random Forests, and Deep Neural Networks (DNNs)—against multinomial logit models across both real-world and synthetic datasets. The authors advocate for hybrid approaches that combine the strengths of both ML and RUM frameworks and propose AutoML for efficient algorithm selection and the integrated estimation of behavioral parameters. This study emphasizes the complementary role of ML in enhancing, rather than replacing, traditional econometric models in travel mode choice analysis. Zhao et al. [
64] conduct a methodologically rigorous comparison of machine learning (ML) and logit models for travel mode choice, using SP survey data. While Random Forest achieved superior predictive accuracy over multinomial and mixed logit models, behavioral outputs (e.g., marginal effects, arc elasticities) from ML—particularly tree-based models—are often inconsistent or behaviorally implausible unless adjusted. Both approaches generally aligned in variable influence direction and importance. The study highlights a tradeoff between predictive performance and behavioral interpretability, suggesting that ML may serve as an exploratory complement to logit models in travel behavior research. Fafoutellis et al. [
65] examined the user acceptability of Autonomous Mobility-on-Demand services with ride sharing using interpretable machine learning. Employing Random Forests and gradient boosting on survey data from Athens, the study achieved over 80% prediction accuracy. Key determinants include travel attributes, mobility, AV-perception profiles, and demographics. Findings reveal that weather, schedule flexibility, and commuter status influence willingness to share and pay, supporting the utility of explainable ML in transport behavior modeling. Zhu et al. [
66] proposed AMGC-Seq2Seq, a novel deep learning model integrating multi-graph convolution and attention mechanisms to predict multistep flows in carsharing systems. By simultaneously capturing spatial and temporal dependencies, the model outperforms existing methods on a large-scale real-world dataset. The study underscores the efficacy of joint spatiotemporal modeling for accurate demand forecasting.
In terms of sustainability, AVs and CS contribute to the reduction in greenhouse gas emissions and promote eco-friendly transport modes, as stated by Hao and Yamamoto [
32]. Particularly, the transport sector is the cause of almost one-third of the greenhouse gas emissions in the world, as stated by the WHO, because of the use of internal combustion engines, traffic congestion, and the rise in car ownership worldwide. Furthermore, using CS and AVs, which are on-demand transport systems, can help alleviate the negative impact of conventional transport modes as well as promote sustainable travel behavior in cities [
33]. CS and AVs reduce pollution by alleviating traffic congestion, increasing fuel efficiency, supporting electric vehicles, reducing parking demand, benefiting from engine development, and optimizing the fleet size of AVs and CS [
33,
67].
Table 1 summarizes the methods and the main variables used in the relevant research.
Table 1.
Previous relevant studies.
Reference | Description | Methods | Main Variables |
---|---|---|---|
Zhou, Zheng, Whitehead, Washington, Perrons, Page and Practice [39] | Examination of consumer preferences toward CS and the potential adoption of SAVs | Discrete choice modeling | Travel time attributes, trip cost attributes, transport mode variables, sociodemographic variables |
Kolarova, Steck and Bahamonde-Birke [40] | Assessment of the effect of autonomous driving on the value of travel time savings | Discrete choice modeling | Travel time attributes, trip cost attributes, perceptions towards AVs, transport mode variables, sociodemographic variables |
Hao and Yamamoto [32] | Review study on SAV, AV, and car sharing | Systematic review | User preferences, safety, user experiences |
Müller [41] | Technology acceptance of autonomous vehicles, battery electric vehicles, and carsharing | Technology Acceptance Model, Partial Least Squares Structural Equation Modeling | Objective usability, innovations, perceived usefulness of the vehicle, environmental protection, and enjoyment |
Schoettle and Sivak [42] | The acceptance of people to AVs | Descriptive statistical analysis | Safety, willing to pay, attitudinal perception |
Howard and Dai [43] | Exploration of the perceptions of people towards AVs | Descriptive statistical analysis | Sociodemographic variables, attitudinal variables |
Bansal, Kockelman and Singh [44] | Perception of people of AVs | Discrete choice modeling | Sociodemographic, travel behavior, technological familiarity, and attitudinal variables |
Stoma, Dudziak, Caban and Droździel [45] | Automotive industry users and their opinions on the AVs | Survey-based research design | Sociodemographic, awareness, perception, and willingness to use new technology |
Louw, Madigan, Lee, Nordhoff, Lehtonen, Innamaa, Malin, Bjorvatn and Merat [46] | The willingness of travelers to use different functions of AVs | Survey design and statistical analysis | Sociodemographic, experience with Advanced Driver Assistance Systems (ADAS) |
Chee, Susilo, Wong and Pernestål [47] | The willingness to pay for using AVs | Structural equation modeling | Sociodemographic, travel behavior variables, and attitudinal variables |
Das, Sekar, Chen, Kim, Wallington and Williams [48] | The impact of AVs on travel time | Discrete choice modeling | Sociodemographic, travel behavior variables, attitudinal variables, time use variables |
Pawełoszek [51] | The barriers facing carsharing users | Descriptive statistics and sentiment analysis | User rating, gender, response time, application updates |
Lee [54] | Analysis of the changing dynamics of transportation mode choice in the AV | Machine learning and discrete choice modeling | Sociodemographic variables, traveling behavior variables, perception and attitude towards AVs, transport mode variables |
Pineda-Jaramillo, Arbeláez-Arenas and Development [55] | Investigation of predicting transport mode choice in urban areas | ML models (gradient boosting model), and discrete choice modeling | Travel time, parking type at the destination, age, the number of motorized vehicles per household, and gender |
Brahimi, Zhang, Dai and Zhang [25] | Predicting the usage of vehicles of carsharing fleets at stations | Machine learning and deep learning | Car usage history, environmental conditions, and temporal information |
Baumgarte, Brandt, Keller, Röhrich and Schmidt [57] | Carsharing users and analyzing usage patterns | Clustering and segmentation of carsharing user data; spatial usage pattern analysis | User characteristics, spatial behavior, frequency, usage mode (station-based vs. free-floating) |
Qin, Yu and Zhang [60] | Exploration of the commuter preferences towards AVs | Mixed logit model; sensitivity analysis | Travel time/cost, AV perception, ridesharing, commuting behavior, AV familiarity |
Hu, Tang, Tong and Zhao [21] | The relationship between the behavior of electric vehicle carsharing and land use | Interpretable ML on EVCARD data (50k EVs); clustering of temporal patterns and land-use influence | Trip records, land use (healthcare, lifestyle, culture, business), temporal usage |
Huang, Liu, Zhang, Liu and Hu [61] | The behavior of SAV users based on the parking space | RL-based SAV relocation with car/zone agents | Parking use, relocation profit, historical demand, zone classification (residential, industrial, etc.) |
Alencar, Pessamilio, Rooke, Bernardino and Borges Vieira [62] | Demand forecasting for different carsharing models (station-based and free-floating) | LSTM (uni- and multivariate), Prophet, XGBoost, CatBoost, LightGBM, SARIMA | Historical carsharing demand; meteorological data (temperature, precipitation) |
Martín-Baos, López-Gómez, Rodriguez-Benitez, Hillel and García-Ródenas [63] | A systematic comparison of ML and Random Utility Models (RUMs) for travel mode choice prediction and behavioral analysis | Multinomial logit model, DNN, RF, XGBoost, SHAP, WTP estimation; AutoML; synthetic data benchmarking | Transport mode attributes, socio-demographics |
Zhao, Yan, Yu and Van Hentenryck [64] | A comparison for travel mode choice modeling | Multinomial logit, mixed logit; Random Forest; partial dependence plots; marginal effects; arc elasticities. | Transport mode-specific attributes; socio-demographic variables |
Fafoutellis, Mantouka, Vlahogianni and Oprea [65] | Modeling of user acceptability and mode choice for Autonomous Mobility-on-Demand (AMoD) services with on-board ridesharing under different weather scenarios | Interpretable machine learning (Random Forest, gradient boosting), permutation feature importance, partial dependence plots | Travel time, cost, walking time, weather, user mobility profile, AV perception, demographic variables |
Zhu, Luo, Liu, Fan, Song, Yu and Du [66] | Addressing multistep flow prediction in carsharing systems | Attention-based multi-graph convolutional sequence-to-sequence (AMGC-Seq2Seq) | Temporal and spatial flow data, graph-based relational data between stations |
In summary, this research applies a novel approach in which CS and PSAVs are studied using ML models. Similar studies that have used machine learning are limited, and few studies focus solely on CS and AVs. Previous studies have mainly used statistical and mathematical models, particularly discrete choice modeling and survey designs, to study people’s preferences towards CS and AVs and travel mode choice. These traditional methods have limitations in prediction and accuracy because of the complexity of the problem, and many previous studies show that ML models outperform discrete choice modeling and other statistical methods. Machine learning models can therefore help overcome these limitations in research problems concerning AVs. In this study, the safety perceptions, traveling factors, and sociodemographic variables of travelers in urban areas of Budapest, Hungary, are studied, and the travelers’ behaviors towards CS and PSAVs are examined. This study fills a gap in the literature, where research that uses machine learning to find the impact of specific factors on the choice of CS and PSAV services is scarce.
This study employs machine learning techniques—CatBoost, XGBoost, and LightGBM—to analyze the factors influencing mode choice between CS and PSAVs. By integrating demographic, behavioral, and safety-related variables, this approach aims to provide a comprehensive understanding of the complex decision-making processes involved in adopting these emerging mobility solutions.
4. Results and Discussion
This section presents the findings of the machine learning models applied to predict user mode choice between CS and PSAVs, focusing on traffic safety perceptions and behavioral-demographic influences. The performance evaluation of the models, classification metrics, and feature importance is discussed in detail.
4.1. Model Performance and Comparative Analysis
The predictive performance of the applied machine learning models—XGBoost, CatBoost, and LightGBM—is evaluated using test accuracy, precision, and F1-score.
Table 2 presents a comparative summary of the models’ performance metrics.
The ML models demonstrate a high level of accuracy in modeling the preference of people towards CS and PSAVs. The model metrics and classification performance are shown in
Table 6 and
Table 7.
The three models predict travelers’ acceptance of CS and PSAV based on the model metrics mentioned above. Among the tested models, XGBoost achieved the highest predictive performance, with an accuracy of 77.17%, followed by CatBoost (76.36%) and LightGBM (73.10%). The high accuracy and precision of XGBoost suggest its superior ability to capture the complex relationships within the dataset.
Figure 12 illustrates the comparative performance of the models, demonstrating the slight variation in classification accuracy. While all models performed competitively, the relatively lower performance of LightGBM may indicate its reduced capacity to generalize under the given data distribution.
4.2. Classification Metrics
A detailed evaluation of the models’ classification performance is provided in terms of precision, recall, and F1-score for both CS and PSAVs, as shown in
Table 7. The results reveal that all models exhibit higher recall for CS users than for PSAV users, suggesting a stronger ability to correctly classify CS users.
Table 7.
Evaluation of the models’ classification performance.
Metric | XGBoost (CS) | XGBoost (PSAV) | LightGBM (CS) | LightGBM (PSAV) | CatBoost (CS) | CatBoost (PSAV) |
---|---|---|---|---|---|---|
Precision | 0.77 | 0.77 | 0.73 | 0.73 | 0.77 | 0.76 |
Recall | 0.81 | 0.73 | 0.78 | 0.68 | 0.79 | 0.73 |
F1-score | 0.79 | 0.75 | 0.75 | 0.70 | 0.78 | 0.74 |
XGBoost achieves a precision of 0.77 for both CS and PSAVs, with an F1-score of 0.79 for CS and 0.75 for PSAVs, indicating balanced classification performance. CatBoost performs slightly lower, with an F1-score of 0.78 for CS and 0.74 for PSAVs, showing a small reduction in predictive power compared to XGBoost. LightGBM exhibits the lowest recall for PSAVs (0.68), which may imply a greater tendency to misclassify PSAV users compared to the other models.
The classification results indicate that, while all models perform well, XGBoost consistently outperforms the others in detecting and correctly classifying users of both transport modes.
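As a consistency check on the reported metrics, the F1-score can be recomputed as the harmonic mean of precision and recall. The following minimal sketch uses only the values reported in Table 7; the function names and the rounding tolerance are illustrative assumptions and are not part of the study's pipeline.

```python
# (precision, recall, reported F1) per model and class, copied from Table 7.
TABLE_7 = {
    ("XGBoost", "CS"): (0.77, 0.81, 0.79),
    ("XGBoost", "PSAV"): (0.77, 0.73, 0.75),
    ("LightGBM", "CS"): (0.73, 0.78, 0.75),
    ("LightGBM", "PSAV"): (0.73, 0.68, 0.70),
    ("CatBoost", "CS"): (0.77, 0.79, 0.78),
    ("CatBoost", "PSAV"): (0.76, 0.73, 0.74),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check_table(table, tolerance=0.005):
    """Return the entries whose reported F1 differs from the recomputed one.

    The tolerance is an assumption: it allows for two-decimal rounding
    of the reported F1 values.
    """
    mismatches = {}
    for key, (precision, recall, reported) in table.items():
        recomputed = f1_score(precision, recall)
        if abs(recomputed - reported) > tolerance:
            mismatches[key] = (reported, recomputed)
    return mismatches


if __name__ == "__main__":
    for (model, mode), (p, r, reported) in TABLE_7.items():
        print(f"{model:9s} {mode:5s} F1 = {f1_score(p, r):.4f} (reported {reported:.2f})")
    print("mismatches:", check_table(TABLE_7))
```

Because the inputs are themselves rounded to two decimals, the recomputed values can only be expected to agree with the reported F1-scores up to rounding.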
4.3. Feature Importance Analysis
Feature importance analysis measures how much each variable improves model accuracy. As the trees split the data into smaller groups, the model records the improvement achieved by each split and how often each variable is chosen for splitting. Feature importance is used to explain the model, to select the variables with the greatest impact on the prediction, to drop the variables with the lowest importance, and to specify the driving features.
To understand the key factors influencing users’ mode choice decisions, feature importance analysis is conducted using the XGBoost model, which demonstrates the highest predictive performance.
Figure 13 presents the ranked importance of input features, highlighting the most influential variables in mode choice prediction. The feature importance in XGBoost is computed based on the Gain, which quantifies the relative contribution of each feature by measuring the improvement in the model’s objective function brought by splits involving that feature across all decision trees in the ensemble.
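The Gain-based importance described above can be illustrated with a small, library-free sketch. It assumes the standard second-order split gain used by gradient-boosted trees (sums of gradients `g` and Hessians `h` on each side of a split, with regularization parameters `reg_lambda` and `gamma`) and assumes that a feature's importance is its average gain per split, normalized to percentages. The feature names and gradient statistics below are hypothetical and are not the study's data; the exact normalization used to produce Figure 13 is not stated in the text.

```python
from collections import defaultdict


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Objective improvement of one split from gradient/Hessian sums."""
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    children = score(g_left, h_left) + score(g_right, h_right)
    return 0.5 * (children - parent) - gamma


def gain_importance(splits):
    """Average gain per feature, normalized to percentages summing to 100."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for feature, gain in splits:
        totals[feature] += gain
        counts[feature] += 1
    mean_gain = {f: totals[f] / counts[f] for f in totals}
    norm = sum(mean_gain.values())
    if norm == 0:
        return {f: 0.0 for f in mean_gain}
    return {f: 100.0 * g / norm for f, g in mean_gain.items()}


# Hypothetical splits (feature, gain) gathered across all trees in an ensemble.
splits = [
    ("trip_time", split_gain(-4.0, 3.0, 5.0, 4.0)),
    ("trip_time", split_gain(-2.0, 2.0, 3.0, 3.0)),
    ("car_ownership", split_gain(-3.0, 3.0, 3.0, 3.0)),
    ("job", split_gain(-1.0, 2.0, 1.0, 2.0)),
]

if __name__ == "__main__":
    ranked = sorted(gain_importance(splits).items(), key=lambda kv: -kv[1])
    for feature, share in ranked:
        print(f"{feature:14s} {share:5.1f}%")
```

A feature that is used in splits producing large improvements in the objective therefore receives a large share, regardless of how many observations pass through those splits.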
The results provide valuable insights into the behavioral and demographic factors influencing user preferences in CS and PSAVs. The superior performance of XGBoost suggests that decision tree-based ensemble methods are well-suited for capturing the complexity of travel behavior. The classification results indicate that CS users are more accurately identified, whereas PSAV users present greater classification challenges, likely due to variations in safety perceptions and technological acceptance.
The factors that influence the selection of CS and PSAVs are shown in
Figure 13 with different levels of impact; for example, trip time is the most influential factor in the use of CS and PSAVs, while the job variable has the least impact.
The influential factors are divided into three groups for ease of understanding: group 1 is traveling variables, group 2 is sociodemographic variables, and group 3 is safety variables. Each variable in each group shows an importance value different than the others. This demonstrates that every variable has an impact on the acceptance or the use of CS and PSAV.
Group 1 (travel variables) includes trip time, waiting time for the PSAV or walking time to and from CS, trip purpose by CS, trip purpose by the PSAV, trip cost, traffic congestion, current waiting time, current transport mode, usual transport cost, usual main trip time, and camera on board CS and the PSAV. According to Figure 13, the most influential factor is the trip time using CS and PSAVs (6.68%); the duration of the trip plays a critical role in users’ mode choice. The third most influential factor in the adoption of CS and PSAVs is walking and waiting time (6.12%). Trip cost (4.66%) ranks sixth, after the trip purpose of using CS (5.09%). Meanwhile, the trip purpose by PSAVs ranks tenth, which suggests that people are still weighing whether to use PSAVs depending on the destination. Traffic congestion ranks ninth among the factors that people consider when using CS and PSAVs. In addition, the availability of a camera on board CS and PSAVs has only a slight impact on their use relative to the other 24 factors.
Moreover, people consider their current travel characteristics when deciding whether to choose CS and PSAVs: the waiting time and the type of transport mode that a traveler currently uses affect the use of PSAVs and CS, as do the amount the traveler pays for the current transport mode and the current trip time. This leads to the conclusion that current travel patterns influence the future use of other transport modes, such as CS and PSAVs. However, these variables are among the least important factors, as shown in Figure 13.
Group 2 (sociodemographic variables) includes car ownership, monthly public transport pass ownership, gender, age, income, education, and job. Car ownership (6.12%) is the second most influential factor; users with private vehicles may be less inclined to use CS or PSAVs. Public transport users are more likely to use shared mobility, as indicated in this study, where monthly public transport pass ownership is the fourth most important factor influencing the use of PSAVs and CS (5.55%). Gender, age, and income show medium importance in Figure 13, with values of 4.49%, 4.20%, and 3.90%, respectively. Education level and job have the lowest importance values in Figure 13: 2.89% for education level and 2.60% for job.
For Group 3 (safety variables), Figure 13 shows the following percentages: companion (4.58%), privacy concerns (4.07%), safety perceptions (3.84%), and cybersecurity awareness (3.81%). All of these highlight users’ concerns regarding personal security and data protection in shared mobility services. Meanwhile, reliability is among the least important factors in the model.
These findings emphasize that trip-related factors (Group 1), personal attributes (Group 2), and safety-related factors (Group 3) all influence mode choice. Notably, the safety factors in Group 3 emerge as significant determinants, reflecting users’ hesitations regarding data protection in PSAV services.
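As a rough cross-check of the three groups, the importance values quoted in the text can be tallied per group. The sketch below uses only the percentages stated above; features whose values are not quoted (for example, traffic congestion, trip purpose by the PSAV, camera on board, and reliability) are left out, so each subtotal is a lower bound for its group. It also assumes that the importances in Figure 13 sum to 100%, which the text does not state explicitly.

```python
# Importance values (%) quoted in the text for Figure 13. Features whose
# values are not quoted in the text are deliberately left out.
reported = {
    "travel": {
        "trip time": 6.68,
        "walking/waiting time": 6.12,
        "trip purpose (CS)": 5.09,
        "trip cost": 4.66,
    },
    "sociodemographic": {
        "car ownership": 6.12,
        "public transport pass": 5.55,
        "gender": 4.49,
        "age": 4.20,
        "income": 3.90,
        "education": 2.89,
        "job": 2.60,
    },
    "safety": {
        "companion": 4.58,
        "privacy concerns": 4.07,
        "safety perceptions": 3.84,
        "cybersecurity awareness": 3.81,
    },
}

# Subtotal per group over the quoted values only (a lower bound per group).
subtotals = {group: sum(feats.values()) for group, feats in reported.items()}

# Share not covered by the quoted values, ASSUMING importances sum to 100%.
unreported = 100.0 - sum(subtotals.values())

for group, subtotal in subtotals.items():
    n = len(reported[group])
    print(f"{group}: {subtotal:.2f}% over {n} quoted features")
print(f"not quoted in the text: {unreported:.2f}%")
```

Since the number of quoted features differs between groups, the subtotals should not be read as a ranking of the groups themselves.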
From a traffic safety perspective, concerns regarding privacy, cybersecurity, and the presence of surveillance cameras emerged as notable determinants of PSAV acceptance. These findings align with previous research suggesting that trust in autonomous vehicle systems is a critical factor in acceptance decisions [27, 79]. Additionally, the strong influence of trip time, cost, and congestion highlights the importance of service efficiency in shaping user choices.
Consistent with previous studies, the findings of this study demonstrate that trip time and trip cost are among the most significant factors affecting transport mode selection, as also shown by Krueger et al. [80], Yap et al. [81], and Becker and Axhausen [82]. Car owners are less willing to use alternative modes of transport, as reported by Krueger et al. [80] and Becker et al. [83]. The other features shown in Figure 13 have also been examined by other scholars, with varying impacts on transport mode choice.
4.4. Limitations and Recommendations
While shared autonomous mobility offers promising benefits in terms of congestion reduction and environmental efficiency, ensuring public trust through robust safety regulations and transparency in data security remains imperative. One limitation of this study is that the participants are not representative of the population of Hungary. To generalize the findings, the dataset needs to be extended, or at least the gaps in the sample proportions need to be filled, for example, the share of participants aged 65 and over. Moreover, this study focuses only on urban areas and examines one use of AVs (i.e., PSAVs).
Future studies can focus on further variables that affect the use of AVs and CS, such as internal vehicle design, rural areas, and the fleet size of AVs and CS. Moreover, once AVs are available on the market, real data will make it possible to study whether AV adoption is actually achieved. Further research could explore the role of real-world safety incidents and regulatory measures in shaping public perception and acceptance trends, as well as the long-term impact of real-world PSAV deployment, integrating empirical safety data and user feedback to refine predictive models and enhance the public acceptance of autonomous mobility solutions. This study focuses on gradient boosting algorithms without empirical comparison to other established classification approaches, such as support vector machines or neural networks. Future research is encouraged to undertake comprehensive benchmarking across diverse algorithmic paradigms and to incorporate advanced explainability frameworks, such as SHAP values, to enhance model transparency, interpretability, and generalizability.
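To indicate what the SHAP-based analysis suggested above would involve, the following minimal sketch computes exact Shapley values, the quantity that SHAP approximates, for a toy value function. The function `shapley_values`, the feature names, and `toy_value` are hypothetical illustrations and were not part of this study. In SHAP, the value of a feature subset is the model's expected output when only those features are known; here a hand-written function stands in for it.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley value of each feature for a set function `value`.

    features : list of feature names.
    value    : function mapping a frozenset of features to a number.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(subset):
    """Hypothetical value function: additive effects plus one interaction."""
    v = 0.0
    if "trip_time" in subset:
        v += 2.0
    if "trip_cost" in subset:
        v += 1.0
    if "trip_time" in subset and "trip_cost" in subset:
        v += 1.0  # interaction term, shared equally between the two features
    return v


features = ["trip_time", "trip_cost", "car_ownership"]
phi = shapley_values(features, toy_value)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.2f}")
```

Unlike Gain-based importance, Shapley values are additive: they sum to the difference between the value of the full feature set and that of the empty set, which is what makes per-prediction explanations possible.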