Micro-Mobility Sharing System Accident Case Analysis by Statistical Machine Learning Algorithms

İnaç, Hakan

doi:10.3390/su15032097


Directorate for Strategy Development, Head of Investment Management & Control Department, Ministry of Transport and Infrastructure, Ankara 06338, Turkey

Sustainability 2023, 15(3), 2097; https://doi.org/10.3390/su15032097

Submission received: 26 December 2022 / Revised: 11 January 2023 / Accepted: 13 January 2023 / Published: 22 January 2023

(This article belongs to the Special Issue New Techniques to Promote Sustainable Mobility: Evaluation, Optimization and Behavioral Adaptation)


Abstract

This study analyzes the variables that affect the accidents experienced by e-scooter users and estimates the probability of an accident during travel with an e-scooter. The data of e-scooter drivers using vehicles offered through a rental application in 15 different cities of Turkey were used in this study. The methodology consists of testing the effects of the input parameters through statistical analysis of the data, estimating the probability of an e-scooter accident with machine learning (ML), and calculating the optimum values of the input parameters to minimize e-scooter accidents. By running the SVM, RF, AB, kNN, and NN algorithms, four statuses (completed, injured, material damage, and nonapplicable) likely to be encountered by shared e-scooter drivers during the journey are estimated. The F1 scores of the SVM, RF, kNN, AB, and NN algorithms were calculated as 0.821, 0.907, 0.839, 0.928, and 0.821, respectively. The AB algorithm showed the best performance. In addition, the highest consistency ratio among the ML algorithms belongs to the AB algorithm, with a mean value of 0.930 and a standard deviation of 0.178. As a result, for shared e-scooter drivers to complete their journeys without any problems, the optimum values of rental experience, distance, driving time, and driving speed for female drivers were calculated as 100, 10.44 km, 48.33 min, and 13.38 km/h, respectively. The corresponding optimum values for male drivers were computed as 120, 11.49 km, 52.20 min, and 17.28 km/h, respectively. Finally, this study provides a guide for authorized institutions so that customers who use shared and rentable micro-mobility e-scooter vehicles do not have problems during travel.

Keywords:

micro-mobility; e-scooter; machine learning algorithms; statistical analysis; accident situations; driving status

1. Introduction

The complex construction, road structures, travel time, transportation costs, and the search for safe transportation in cosmopolitan cities encourage people to use micro-mobility vehicles [1]. Environmentally conscious people have increased the demand for electric and micro-mobility vehicles as a green transportation alternative. E-scooters, the most popular among green and electric micro-mobility vehicles, are prevalent for daily rides. Shared, rented, or privately owned e-scooters can be driven at speeds of up to 15–30 km/h [2]. Drivers can end the ride and leave the vehicle wherever they want once they reach their destination. People used e-scooters for nearly 86 million trips in the U.S. in 2019 [3]. As a result of the restrictions on public transportation due to the recent COVID-19 epidemic, people turned to micro-mobility vehicles for reasons of cost, environment, and comfort. As of 2020, e-scooter producers had spread the vehicle to about 53 countries [4]. One company that provides e-scooters enabled more than 10 million trips in more than 100 U.S. cities within one year as of 2017 [5]. A study has reported that a total of 4.6 million shared, rented, or private e-scooters will be in the transportation network worldwide by 2024 [6].

The strong preference for e-scooters among the available micro-mobility vehicles increases their rate of use and, in turn, causes some problems. Because the legal regulations for electric micro-mobility vehicles are not yet sufficiently comprehensive, problems arise for both drivers and vehicles. In particular, drivers who use micro-mobility vehicles face injuries, material damage, or other accidents in densely settled areas with complex road networks. Many studies have presented approaches regarding the kinds of accidents experienced by people who use micro-mobility vehicles, their causes, and their consequences. One study reported that approximately 3331 drivers who used an e-scooter for transportation visited an emergency service unit due to an accident over a period of 34 months [7]. Another study dealt with the data of 76 drivers who used e-scooters in Germany in 2019 and visited an emergency service unit for different reasons [8]. Gan-El et al. stated that 170 drivers visited an emergency service unit due to e-scooter accidents in the period from 1 June 2019 to 30 June 2020 [9]. Brauner et al., using the Louvain algorithm, analyzed 1936 accidents related to the use of e-scooters in Germany over a two-year period [10]. Another study retrospectively examined patients who had an e-scooter accident and visited the emergency department of a first-level trauma center in Germany during a six-month period from June to December 2019 [11]. These studies emphasize that the authorities’ detailed datasets on e-scooter-related accidents, especially data on safety measures such as helmets, are recorded only to a limited extent [12,13].

Broadly, there are two outcomes for drivers who use shared and rentable e-scooters for their daily work or other reasons: they can either complete their journeys without problems or leave them unfinished due to an accident. This study discusses four different driving statuses of e-scooter drivers: completing the journey without any problems, an accident with injury, an accident with material damage, and an interruption of the journey by an accident of unspecified type. A significant part of the literature has addressed the problems faced by e-scooter drivers in injury accidents. Generally, researchers have related the severity of injury of e-scooter drivers to the helmet use rate. Most studies have suggested that the majority of e-scooter drivers involved in injury accidents were not wearing helmets [14,15]. The injuries of those who have accidents with e-scooters are generally concentrated on the head. One study found that 94.3% of 193 e-scooter drivers who used micro-mobility vehicles for transportation did not wear helmets [16]. Another study emphasized that the results of 477 radiological studies of 192 e-scooter drivers were attributable to e-scooter-related injuries [2].

The most crucial factor that makes this study unique is that, unlike studies that focus on the post-accident medical consequences for drivers involved in e-scooter accidents, it takes into account the geographical, demographic, and technological factors that cause e-scooter accidents and focuses on drivers completing their trips. This study provides guidance on the speed limit, travel distance, travel time, and technical information of the vehicle, considering in particular the region, the rental time, and the driver’s gender. Many studies have discussed the energy-saving and environmental advantages of e-scooters over other vehicles. One study emphasized that e-scooters cause less CO₂ emissions than other vehicles [1,17]. Ayözen et al. calculated that using micro-mobility vehicles for mail distribution provides significant energy savings compared to other distribution vehicles [18]. However, few studies deal in detail with the safety and comfort of traveling with e-scooters. In this study, whether people who use shared and rented e-scooters arrive at their destination without any problems or have their trip interrupted by an accident is examined using statistical and estimation methods. The methods used in the literature to examine the driving status of e-scooters, and the differences between this study and other studies, are presented in Table 1.

Different methods have been used to research subjects such as the purpose of use of shared, nonshared, or rentable e-scooters, the advantages of these vehicles (such as energy costs and environmental benefits), and the accidents they cause. Generally, researchers have applied basic statistical methods such as descriptive statistics, correlation, and degree of importance to subjects related to e-scooters [31,32]. In this study, the effects of the independent variables on the dependent variable were tested by linear regression analysis using actual data from drivers who use e-scooters for transportation in Turkey. The independent-variable dataset was created from the region where the e-scooters are used, rental date, rental time, driving information (speed, distance, duration), driver information (gender, age), and vehicle technical information. Four output categories showing the driving status were defined: smooth driving, material damage accident, injury accident, and unspecified accident. The dependent variable was then estimated from the independent variables using ML (machine learning) algorithms.

ML is regarded as one of the most important and accurate approaches for forecasting data [33]. Many studies have preferred ML algorithms for estimating categorical or continuous data [34]. Unlike traditional statistical methods, in which a specific mathematical model is created according to the data, ML algorithms discover the common connections in the data and produce predictions according to those connections [1,35]. ML is particularly prevalent for evaluating big data. In this study, the SVM (support vector machine), RF (random forest), kNN (k-nearest neighbor), AB (adaptive boosting, or AdaBoost), and NN (neural network) algorithms were used because they perform well when the dependent variable is categorical. The performance measures of these algorithms, namely the AUC (area under the ROC curve), CA (classification accuracy), F1, recall, and precision, were calculated and compared [36], and the ML algorithm that gave the best prediction result was determined. This study aims to determine the limits of the independent variables by estimating, with ML algorithms, the driving status of drivers who use shared and rentable e-scooters. In the last stage, the optimum values of the independent variables were calculated for a comfortable and safe driving status. The lack of comprehensive legal regulations regarding micro-mobility vehicles, and e-scooters in particular, puts e-scooter drivers and transportation networks in a difficult situation. For this reason, this study presents scientific evidence to assist authorities in creating legal regulations for e-scooters.

This study includes five main sections. The introduction reviews the literature and describes the novelty that this study adds to it. The second section describes the characteristics of the data and the theoretical background of the methods. The third section reports the numerical results of the statistical analysis and the ML algorithms. The discussion section presents the optimum values of the independent variables, which form the third stage of the methodology, and discusses how the results differ from those of other studies. Finally, the conclusion summarizes the method, the results obtained, and the contributions to both the literature and the relevant organizations.

2. Methodology

In this study, data from 2026 drivers were used to analyze the driving statuses of the shared e-scooter application, determine the factors affecting the driving status, and obtain forecast data for the future. The data collection, descriptive statistics, and theoretical information about algorithms used to calculate the forecast data are given in the subsections of this part.

2.1. Data Compilation

For reasons of energy cost, environment, and time, people today turn to shared micro-mobility vehicles for their daily routines. This study deals specifically with e-scooters, which are micro-mobility vehicles. The e-scooters used in Turkey are referred to as ‘binbin’. People rent these vehicles for a time-based fee.

The data used in this study are from 15 different regions (cities) in Turkey. The dataset contains driving and driver information of different drivers at different times and in different cities. After preprocessing, the records of 932 drivers were used for the analysis and for calculating the forecast data. This study consists of two stages: the statistical analysis and the estimation process. The flow diagram of the methodology created for this study is shown in Figure 1.

The data in the dataset of the study were categorized as dependent and independent variables. Nine independent variables were defined, covering the drivers’ demographic and driving information in numeric and text data types. The dependent variable records whether the trip was completed or an accident occurred at any point after the driver rented the e-scooter. About 25% of the driver data concern trips interrupted by accidents. Four different statuses describe the travel status of an e-scooter driver. These are:

Completion of rides using an e-scooter without any problems;
Failure to complete the rides using an e-scooter as a result of an injury accident;
Failure to complete the rides using an e-scooter as a result of an accident with material damage;
Failure to complete the rides using an e-scooter due to reasons for which the type of injury or damage could not be determined as a result of the accident.

The data of these four statuses encountered by e-scooter drivers were defined as the dependent variable. All driving information of the drivers who completed their trips or interrupted them due to an accident was analyzed. The driving status is coded as 0 for normally completed trips, 1 for injury accidents, 2 for material damage accidents, and 3 for cases where the accident type is not specified but the trip was not completed. E-scooter rental times are recorded in 30-min intervals, and each interval is assigned a discrete integer. Thus, 48 integers cover 24 h. For example, 1 represents the 12:00–12:30 am time slot and 48 represents the 11:30 pm–12:00 am time slot.
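The coding scheme described above can be sketched in a few lines of Python. This is only an illustration of the stated rules; the names `DRIVING_STATUS` and `rental_slot` are hypothetical and do not come from the study's dataset or software.

```python
from datetime import time

# Status codes as described in the text (labels are paraphrased).
DRIVING_STATUS = {
    0: "completed without problems",
    1: "injury accident",
    2: "material damage accident",
    3: "unspecified accident type, trip not completed",
}

def rental_slot(t: time) -> int:
    """Map a clock time to its 30-minute slot index in the range 1..48."""
    minutes_since_midnight = t.hour * 60 + t.minute
    return minutes_since_midnight // 30 + 1

print(rental_slot(time(0, 0)))    # 1  (first slot of the day)
print(rental_slot(time(16, 15)))  # 33 (4:00-4:30 pm)
print(rental_slot(time(23, 45)))  # 48 (last slot of the day)
```

Under this encoding, the 4:00–5:30 pm rental peak corresponds to slots 33 to 35.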

The distribution of the data belonging to the dependent variable e-scooter driving statuses is visualized in Figure 2 based on the rental time, speed, gender type, driver ages, rental numbers, and driving times.

The data of 2026 drivers benefiting from the shared e-scooter application were pre-processed, and after cleaning and sorting, 932 records were subjected to statistical and predictive analysis. Shared e-scooters are mainly rented between 4:00 and 5:30 pm, after work; 18.92% of the 932 drivers rented e-scooters in this period. Female and male customers made up 21.56% and 78.44% of the e-scooter drivers, respectively. The 932 drivers traveled for an average of 12.32 min and covered an average of 1.87 km with their shared e-scooters, including the trips that involved accidents. The average of the measured speeds of the 932 drivers was 0.18 km/min. However, the speed calculated from the average distance and time is 0.15 km/min. A driver completes an average distance of 1 km in 9 min, considering factors such as region, road, weather, and traffic. One study emphasized that a distance of 1.17 km is covered in 7.3 min with e-scooters (6.24 min for 1 km) [37]. Another study suggested that covering 0.77 miles takes 7.55 min with e-scooters (6.03 min for 1 km) [38]. This subsection provides detailed information by evaluating the data distributions of the independent variables of shared rental e-scooters.
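The speed and pace figures quoted above follow from simple ratios. As a quick check, the sketch below recomputes two of them from the reported averages; the variable names are illustrative only.

```python
# Reported averages for the 932 drivers.
mean_distance_km = 1.87
mean_duration_min = 12.32

# Speed implied by the average distance and time (km/min).
speed_km_per_min = mean_distance_km / mean_duration_min
print(round(speed_km_per_min, 2))  # 0.15

# Pace reported for the comparison study [37]: 1.17 km in 7.3 min.
pace_ref37_min_per_km = 7.3 / 1.17
print(round(pace_ref37_min_per_km, 2))  # 6.24
```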

2.2. Descriptive Statistics

Descriptive statistics of the dependent and independent variables in the dataset of this study were calculated, and detailed information about the dataset was provided. Descriptive statistics results, such as sample number, mean, standard deviation, variance, maximum value, minimum value, kurtosis, and skewness, are given in Table 2. Rental dates and regions are not included in the descriptive statistics. By assigning integer values to the driver statuses, the dependent variable was made numeric.

The rentals of shared e-scooters in the dataset comprise 201 female and 731 male customers. Approximately 20.17% of the trips interrupted by accidents or other reasons belong to female customers. The accident rates of male customers using shared e-scooters were calculated to be higher than those of female customers. The average travel distance of the female customers is 1.82 km (with a maximum of 18.1 km), while that of the male customers is 1.90 km (with a maximum of 19.6 km). The male customers travel for an average of 11.91 min, while the female customers travel for an average of 13.26 min with the e-scooter. The dataset shows that the male drivers traveled 8.88% faster than the female drivers.

By collecting the study data in different provinces of Turkey, the study aims to analyze the effects of indirect parameters such as geography and population on the dependent variable. The rental numbers and driving conditions of shared e-scooters in the 15 different cities are given in Figure 3.

Figure 4 shows the statistical data on the rental of shared e-scooters in the regions considered for this study and the incomplete journeys of the drivers due to injuries, material damage, or other reasons. Trips with shared and rentable e-scooters that could not be completed due to accidents or other reasons mainly occurred in Istanbul (which constitutes 60% of the dataset). The fewest accident incidents related to shared e-scooters occurred in the Diyarbakir, Kayseri, and Usak regions, with a rate of 0.87%.

The Minitab 19 program was used for the statistical analysis of the data of this study, both to measure the effect of the independent factors on the dependent factor and to obtain optimum values.

2.3. Machine Learning

Machine learning (ML) algorithms were run to predict the driving status of shared e-scooter drivers using the independent variable data of the dataset of this study. ML algorithms generally learn the connections in statistical data and use them to provide predictions [39]. In this study, the Orange Data Mining 3.34 software, which runs on Python in the background, was used for the ML algorithms. Among the ML algorithms, the random forest (RF), AdaBoost (AB), support vector machine (SVM), k-nearest neighbor (kNN), and neural network (NN) models were preferred in the present study. The sequence of these models in the software is shown in Figure 5.

After the data types in the dataset were defined as dependent and independent variables, the data were distributed to the phases of the ML method with the Data Sampler module. The ML algorithms were run with 85% of the dataset in the training phase and 15% in the test (prediction) phase. The performance of the ML algorithms was then measured and compared. The validity of the predictions was verified using the confusion matrix and ROC results of the ML algorithms. The data used in this study were not normalized, in order to keep the estimation results of the ML algorithms from being biased. In one study, to avoid bias in the estimation results obtained from unbalanced datasets, three different approaches were proposed: selecting negative samples with distributions close to some of the positive samples, selecting negative samples relative to the positive samples, and selecting a certain number of closest negative samples for each positive sample [40]. The ML algorithms used in this study set their parameters to suit unbalanced data distributions according to data types. Oversampling of the data was prevented by cross-validating the feature selection in the ML algorithms, and randomness was added to the dataset by repeatedly resampling the data to prevent overfitting. Detailed information about the parameters of the ML algorithms is given in the subsections of this section.
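To make the 85%/15% partition concrete, the following is a minimal sketch of a random train/test split using only the Python standard library. It is not the Data Sampler module of the software used in the study; the function name `split_train_test`, the seed, and the rounding rule are assumptions made for illustration.

```python
import random

def split_train_test(rows, train_fraction=0.85, seed=0):
    """Shuffle the rows reproducibly and cut them into train and test parts."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = list(rows)          # copy, so the input is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder record ids standing in for the 932 preprocessed drivers.
records = list(range(932))
train, test = split_train_test(records)
print(len(train), len(test))  # 792 140
```

With 932 records, this rounding yields 792 training and 140 test records; a different sampler may round or stratify differently.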

2.3.1. Support Vector Machine (SVM)

Support vector machine (SVM) is an ML method that separates the feature space with a hyperplane and maximizes the margin between the sample data of different classes or class values. This algorithm often provides superior predictive performance results [41]. Adopting the regression approach, SVM performs linear regression on a high-dimensional feature space using an ε-insensitive loss. Class predictions are generated in an SVM algorithm based on its regression. The accuracy of the predicted data depends on a good set of C, ε, and kernel parameters in this model. The model setting information of the SVM algorithm used in this study is given in Table 3.

2.3.2. Neural Network (NN)

The neural network algorithm uses sklearn’s multi-layer perceptron algorithm, which can learn both linear and nonlinear models. Since no additional preprocessor is provided for the NN algorithm (as with the default SVM algorithm), it applies its own default preprocessing. In preprocessing, the NN algorithm performs the following operations sequentially:

Removes instances with unknown target values;
Converts categorical input variables to continuous ones;
Removes empty columns from the analysis;
Imputes missing values with the mean of the available data.

Finally, the NN algorithm normalizes the data by centering them on the mean and scaling them to a standard deviation of 1. The model setting information of the NN algorithm used in this study is given in Table 4.
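Two of the preprocessing operations described above, mean imputation and scaling to a standard deviation of 1, can be sketched as follows. This is a plain-Python illustration under simplifying assumptions (one numeric column, missing values marked as `None`), not the software's internal code; the function names and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def impute_mean(column):
    """Replace missing entries (None) with the mean of the known values."""
    known = [v for v in column if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in column]

def standardize(column):
    """Center on the mean and scale to unit (population) standard deviation.

    Assumes the column is not constant, so the standard deviation is nonzero.
    """
    mu = mean(column)
    sigma = pstdev(column)
    return [(v - mu) / sigma for v in column]

speeds = [10.0, None, 14.0, 18.0]   # made-up speeds with one missing value
filled = impute_mean(speeds)
scaled = standardize(filled)
print(filled)  # [10.0, 14.0, 14.0, 18.0]
```

After these two steps the column has mean 0 and standard deviation 1, which is the form the multi-layer perceptron is trained on.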

2.3.3. Random Forest (RF)

The random forest (RF) algorithm builds many decision trees, each grown from a bootstrap sample of the training data. As each tree is developed, a random subset of features is drawn, from which the best feature is selected for the split [42]. The final prediction is based on the majority vote of the individually developed trees in the forest. The model setting information of the RF algorithm used in this study is given in Table 5.

Since no additional preprocessor is provided for the RF algorithm, its own default preprocessing is applied. The steps taken in the preprocessing of this algorithm are as follows:

Removes instances with unknown output variable values;
Allows the use of categorical variables;
Associates missing data with the mean values of other data.
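The two ideas behind the RF algorithm described in this subsection, bootstrap sampling of the training data and combining the trees by majority, can be illustrated with a short sketch. The tree-growing step itself is omitted; the function names and the example votes are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
rows = list(range(10))                 # placeholder training records
sample = bootstrap_sample(rows, rng)   # some records repeat, some are left out

# Hypothetical status codes predicted by five trees for one trip.
tree_votes = [0, 0, 1, 0, 3]
print(majority_vote(tree_votes))  # 0
```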

2.3.4. k-Nearest Neighbor (kNN)

The k-nearest neighbor (kNN) algorithm searches for the k closest training examples in the feature space and uses the average of their values as the prediction [33]. Because this algorithm assumes that a sampling point’s closest neighbors have a more significant influence than its farther neighbors, it uses the number of nearest neighbors, the distance parameter (metric), and the weights as the model criteria [43]. Four different distance parameters are available. These parameters are:

Euclidean (distance between two points as a straight line);
Manhattan (sum of absolute differences of features);
Maximal (greatest of the absolute differences between attributes);
Mahalanobis (distance between point and distribution).

The Euclidean metric is the most commonly preferred distance parameter for the kNN algorithm, and with uniform weights all points in each neighborhood are weighted equally. The model setting information of the kNN algorithm used in this study is given in Table 6.
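Three of the distance parameters listed above, together with a uniformly weighted kNN vote, can be sketched in plain Python. The Mahalanobis distance is left out because it needs a covariance matrix. The function names and the toy training points are assumptions for illustration, not the study's data or settings.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def maximal(a, b):
    """Greatest absolute feature difference (Chebyshev distance)."""
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, query, k=3, metric=euclidean):
    """Uniform-weight kNN: each of the k nearest neighbors casts one vote."""
    nearest = sorted(train, key=lambda item: metric(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

a, b = (1.0, 2.0), (4.0, 6.0)
print(euclidean(a, b), manhattan(a, b), maximal(a, b))  # 5.0 7.0 4.0

# Made-up (distance km, duration min) points with made-up status labels.
train = [((1.0, 10.0), 0), ((1.2, 11.0), 0), ((1.1, 9.0), 0),
         ((9.0, 40.0), 1), ((8.5, 42.0), 1)]
print(knn_predict(train, (1.0, 9.5)))  # 0
```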

2.3.5. AdaBoost

The AdaBoost (AB) algorithm, short for adaptive boosting, is a machine learning algorithm formulated by Yoav Freund and Robert Schapire [44]. It is used with other learning methods and adjusts weak learners during training to improve prediction performance [45]. The model setting information of the AB algorithm used in this study is given in Table 7.

The loss function setting of the AB algorithm is based on the linear regression method. For classification, the SAMME.R algorithm, as implemented in Python, is used [46] (see Algorithm 1).

Algorithm 1. SAMME.R

Step 1. Initialize the observation weights

w_{i} = \frac{1}{n}, \quad i = 1, 2, \dots, n.

Step 2. For m = 1 to M:

Step 2.1. Fit a classifier T^{(m)}(x) to the training data using weights w_{i}.

Step 2.2. Obtain the weighted class probability estimates:

p_{k}^{(m)}(x) = \mathrm{Prob}_{w}(c = k \mid x), \quad k = 1, 2, \dots, K.

Step 2.3. Set:

h_{k}^{(m)}(x) \leftarrow (K - 1) \left( \log p_{k}^{(m)}(x) - \frac{1}{K} \sum_{k'} \log p_{k'}^{(m)}(x) \right), \quad k = 1, 2, \dots, K.

Step 2.4. Set:

w_{i} \leftarrow w_{i} \cdot \exp\left( - \frac{K - 1}{K} \, y_{i}^{T} \log p^{(m)}(x_{i}) \right), \quad i = 1, 2, \dots, n.

Step 2.5. Re-normalize w_{i}.

Step 3. Output

C(x) = \arg\max_{k} \sum_{m = 1}^{M} h_{k}^{(m)}(x).

where the independent factor is denoted by x_{i} and the value of the dependent factor by c_{i}. C(x) denotes the class predicted by the combined classifier. A weak multi-class classifier is denoted by T(x). The symbol α^{(m)} denotes the coefficient with which the weights of the training dataset are updated. In the two-class setting, the dependent variable y is coded as y = (\mathbb{I}(c = 1) - \mathbb{I}(c = 2)) \in \{-1, 1\}. h_{k}^{(m)}(x) represents the improved prediction obtained by minimizing the loss at each x [33].
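Steps 2.3 and 2.4 of the SAMME.R procedure can be traced numerically with a small sketch. It assumes the multi-class label coding of the original SAMME formulation, in which the entry of y for the true class is 1 and every other entry is −1/(K − 1); this coding is not stated in the text above. The function names and the probability vector are hypothetical, and all probabilities must be strictly positive for the logarithm to be defined.

```python
import math

def samme_r_scores(probs):
    """Step 2.3: h_k = (K - 1) * (log p_k - (1/K) * sum over k' of log p_k')."""
    K = len(probs)
    logs = [math.log(p) for p in probs]
    mean_log = sum(logs) / K
    return [(K - 1) * (lp - mean_log) for lp in logs]

def update_weight(w, probs, true_class):
    """Step 2.4: w <- w * exp(-(K - 1)/K * y^T log p).

    Assumed coding: y_k = 1 for the true class, -1/(K - 1) otherwise.
    """
    K = len(probs)
    y = [1.0 if k == true_class else -1.0 / (K - 1) for k in range(K)]
    dot = sum(yk * math.log(pk) for yk, pk in zip(y, probs))
    return w * math.exp(-(K - 1) / K * dot)

# Made-up class probabilities from one weak learner for the four statuses.
probs = [0.7, 0.1, 0.1, 0.1]
scores = samme_r_scores(probs)
predicted = max(range(len(scores)), key=lambda k: scores[k])
print(predicted)  # 0

w_correct = update_weight(0.25, probs, true_class=0)  # weight shrinks
w_wrong = update_weight(0.25, probs, true_class=1)    # weight grows
```

The scores sum to zero across classes, and an observation whose true class received a low probability gains weight before the re-normalization in Step 2.5.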

2.4. Performance Criteria Measurements of Machine Learning

In this study, several performance measures were used to verify the validity of the estimates of the SVM, RF, kNN, AB, and NN models, which are the preferred ML algorithms. The area under the ROC curve (AUC), classification accuracy (CA), precision, recall, and F1 criteria are used to discuss the sharpness of the estimation results of these algorithms.

A confusion matrix is considered to evaluate a binary classification problem in an ML algorithm, where the columns represent the actual values and the rows represent the predicted values [47]. The confusion matrix used in the ML algorithms is shown in Table 8.

The AUC of an ML algorithm is the area between the ROC curve, which plots sensitivity against 1 − specificity, and the x-axis. The AUC is used to evaluate the accuracy of the prediction results of an ML algorithm. Usually, the AUC value ranges from 0.5 to 1.0. If the AUC value of an ML algorithm is 0.5, the predictions are weak (no better than random ranking); if the AUC value is close to 1, the predictions are close to perfect [48]. The AUC of a model provides an aggregated measure of performance across all possible classification thresholds. That is, the AUC measures the probability that the model ranks a random positive sample higher than a random negative sample.
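The ranking interpretation of the AUC can be checked directly by comparing every positive sample with every negative sample. The sketch below does this for a handful of made-up scores; the function name `auc_by_ranking` is hypothetical, and ties are counted as half a win by convention.

```python
def auc_by_ranking(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs in which the positive ranks higher."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(scores_pos) * len(scores_neg))

# Made-up model scores for positive and negative samples.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
print(round(auc_by_ranking(pos, neg), 3))  # 0.889
```

Here 8 of the 9 pairs are ordered correctly, so the AUC is 8/9.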

The CA criterion of an ML algorithm is measured by the ratio of the number of correct observations predicted during the test phase to the total number of predicted observations. The higher the CA value of an ML algorithm is, the higher the consistency of the prediction data for future samples. The following formula is used for the CA criteria of an ML algorithm:

CA = \frac{\text{number of true (correct) predictions}}{\text{total number of predictions}}

(1)

For binary classification performed in an ML algorithm, the CA criterion is calculated as follows:

CA = \frac{TP + TN}{TP + TN + FP + FN}

(2)

where the number of true positives and negatives are denoted by TP and TN, respectively. FP and FN represent the number of false positives and negatives.

The measurement value of the precision criterion of an ML algorithm is calculated by dividing the number of correctly predicted positive observations by the total predicted positive observations. The precision measurement value of an ML algorithm is calculated using the following equation:

Precision = \frac{TP}{TP + FP}

(3)

The recall measurement value of an ML algorithm, on the other hand, offers a similar but different approach to the precision measurement criterion, and the recall value is calculated by dividing the number of correctly predicted positive observations by the sum of the correctly predicted positive observations and incorrectly predicted negative observations. The recall formula is as follows:

Recall = \frac{TP}{TP + FN}

(4)

Examining both the precision and recall criteria of an ML algorithm allows the accuracy of the predictions to be assessed more thoroughly. However, there is typically a trade-off between recall and precision. In other words, while the recall value decreases, the precision may increase, and vice versa.

The metric used for the classification technique of the algorithms preferred in the ML method is the F1 score. The F1 score of an ML algorithm is the harmonic mean of the precision and recall metrics. The F1 formula is as follows:

F1 = 2 \cdot \frac{Recall \cdot Precision}{Recall + Precision}

(5)

Generally, if the F1 score of an ML algorithm is greater than 0.7, the algorithm is considered applicable for prediction. However, for an ML algorithm to be regarded as high-performing, it must have an F1 score of 0.9 or higher [49]. There are three cases for the F1 score of an ML algorithm:

A high F1 score depends on high recall and precision values;
Low recall and precision values cause a low F1 score for an ML algorithm;
An inverse relationship between the recall and precision values ensures that the F1 score has an average value.
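The criteria in Equations (2)–(5) can be checked numerically with a short sketch. The function name and the confusion counts below are hypothetical and chosen only for illustration; they are not taken from the data of this study, and the sketch covers the binary case only.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return CA, precision, recall, and F1 from binary confusion counts."""
    ca = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"CA": ca, "precision": precision, "recall": recall, "F1": f1}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=80, tn=10, fp=5, fn=5)
```

Because precision and recall are equal in this example, the F1 score coincides with both, which illustrates the first of the three cases listed above.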

2.5. Argument Data Suggestions for Driver Statuses

The effects of the independent variables affecting the driving status of people who use shared e-scooter vehicles are analyzed, and solution suggestions are derived. In the driving status, it is desired that the drivers complete their journeys without any problems during their travel time. In this study, e-scooter accident rates are minimized by applying certain limitations to the independent variables that cause the drivers to interrupt their journeys due to injuries, material damage, and other reasons. The limits of the independent variables were calculated by targeting the integer 0, which represents smooth travel, for the dependent variable. Minitab 19 program was used to optimize the results of the dependent and independent variables. This program obtained solution suggestions with the integer optimization model technique. Solution suggestions are discussed in the discussion part of this study.

3. Results

The numerical results of the methods applied in this study are discussed in different subsections in this part of the study. First, the effectiveness levels of the independent variables on the dependent variable representing the driving statuses of e-scooter drivers were tested by linear regression analysis, a statistical analysis. For the dependent variable to have a definite data type, the ML algorithms were analyzed in models with high-performance values for the categorical data type, and prediction results were obtained. In addition, the performance criteria results were compared to verify the validity of the predicted results. Finally, by calculating the optimum values of the independent variables that affect the driving statuses of shared e-scooter drivers, comments have been put forward to ensure that e-scooter drivers complete their journeys without any problems.

3.1. Statistical Analysis

The present method is a powerful statistical tool to determine the effect of the power and direction of the independent variables on the dependent variable based on the linear regression method [50]. This study analyzed the effects of ten independent variables in different categories on the dependent variable. The regression model’s R² and adjusted R² values were calculated as 88.01% and 87.70%, respectively, to verify the validity of the regression analysis performed for this study. Table 9 shows the statistical results of the independent factors that affect the driving status of e-scooter vehicles used in daily routine work. Table 9 shows the regression model’s independent variable coefficients, standard error coefficients, t-value, p-value, and VIF values.
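For reference, the adjusted R² is conventionally defined as below. The symbols n (number of observations) and p (number of predictors) are generic and are not values reported in this study.

```latex
R^2_{adj} = 1 - (1 - R^2) \, \frac{n - 1}{n - p - 1}
```

Because the correction term grows with the number of predictors, the adjusted value (87.70%) lies slightly below the unadjusted R² (88.01%).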

Among the independent variables, the rental dates of shared e-scooter vehicles significantly affect the dependent variable with a significance level of 0.01. Thus, the rental date independent variable directly affects the driving statuses in terms of different years, months, weeks, and days. However, it has been determined that the rental time (the starting time of the driving time of the e-scooter vehicle during the day) has little or no effect on the driver’s behavior. The p-value, which indicates the significance of this independent variable, was calculated as 0.248.

The significance levels of the distance, speed, and time factors, which have a high impact on driving statuses, were measured as 0.001, 0.001, and 0.005, respectively. The common feature of these factors is that the user can adjust them throughout the journey. However, although the speeds of e-scooter vehicles are limited, differences were observed between driver genders: female drivers have longer travel times with their e-scooter vehicles than male drivers, but their speed is lower. While driver gender significantly affected the driving status, the effect of driver age on the driving status was low, with a significance level of 0.452. The gender of the drivers who choose shared e-scooter vehicles affects their driving status with a significance level of 0.036 (ignoring rental date, rental time, region, and driver age factors). The equations of the regression models of driving behaviors are discussed in terms of gender and e-scooter vehicle models and shared in Table A2 in the appendix of this study.

It has been determined that the model factor of shared e-scooter vehicles is very influential on the driving status, with a significance level of 0.001. In this study, considering the ages of e-scooter vehicles, M2.1, M2.2, and M2.3 codes represent the years 2000, 2001, and 2002, respectively. Models with new technological and reliability features offer drivers a safer driving opportunity.

Shared and rentable (not owned by individuals but used for personal trips) e-scooter vehicles are operated by the BinBin company in different regions of Turkey and offered to drivers. In this study, these regions were determined, and their effects on the dependent variable were examined. According to the regression analysis, the region factor was calculated to have an effect on the driving status according to the 0.001 significance level. Among these regions, the provinces of Istanbul, Bursa, and Trabzon significantly impact driver behavior. The significance levels of these provinces were calculated as 0.013, 0.002, and 0.001, respectively. Regression models of driver behaviors were obtained separately for the regions selected for this study and shared in Table A3 in the appendix of this study. Thus, information about the values of the independent variables is provided to ensure that the e-scooter driving status for each region is trouble-free. However, driver gender, age, and rental time variables, which are independent factors that have little or no effect on the dependent variable, are not considered in these equations.

The regression analysis offers the possibility of the standardized effect graph of the variables to show the power of the effect of the independent variables on the dependent variable. The strength of the impact of the ten independent variables used in this study on the dependent variable, the e-scooter driving status, is visualized in Figure 6.

In the standardized effect graph, the independent variables are expressed with codes. Factors that lie to the right of the threshold value of 1.96 in the standardized effect graph are assumed to influence the dependent variable, and an independent variable whose value exceeds this threshold is considered to have a strong effect on the dependent variable. In this study, the driving time variable has the greatest effect on the dependent variable. The independent variable with the weakest effect on the dependent variable is the age of the driver. Considering all factors, driver gender was taken into account in this study because its effect according to the regression analysis results is close to the threshold value.
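The threshold of 1.96 matches the two-sided 5% critical value of the standard normal distribution. The sketch below reproduces that value and applies it to standardized effects. It assumes a large-sample normal approximation, which the text does not state explicitly, and the helper name `is_significant` is hypothetical.

```python
from statistics import NormalDist

# Two-sided critical value at a 5% significance level (assumed level).
ALPHA = 0.05
critical_value = NormalDist().inv_cdf(1 - ALPHA / 2)


def is_significant(standardized_effect, threshold=critical_value):
    """Return True when the absolute standardized effect exceeds the threshold."""
    return abs(standardized_effect) > threshold
```

The sign of the effect is ignored in the comparison because the standardized effect graph reports magnitudes.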

The statistical analysis determined that all factors other than driver age and e-scooter rental time significantly influenced the dependent variable’s driving status factor. The rental dates of the shared e-scooter vehicles, the regions where the rental is made, and the driving time, especially, had a strong effect on their driving status.

3.2. Results of Machine Learning

The performance measurement values should be high to verify the validity of the predictive data provided by the ML algorithms. In this study, a performance comparison of the ML algorithms was made by calculating five different performance measurement values. The ML algorithms selected for this study perform almost perfectly on the driving status evaluated by the AUC, CA, F1, precision, and recall metrics. The AUC, CA, F1, precision, and recall values of the ML algorithms are given in Table 10.

In the training phase, the AUC values of the SVM, RF, kNN, AB, and NN algorithms were calculated as 0.875, 0.944, 0.763, 0.959, and 0.500, respectively. Of the ML algorithms, the one with the highest AUC value has the best performance. For this reason, while the AB algorithm provided the best performance in the training phase of the ML algorithms, the NN algorithm showed the worst performance. In the test phase, the AUC values of the SVM, RF, kNN, AB, and NN algorithms were calculated as 0.723, 0.922, 0.804, 0.800, and 0.500, respectively. While the RF algorithm provided the best performance in the test phase of the ML algorithms, the NN algorithm again showed the worst performance. In addition, the ROC curves of these algorithms are visualized in Figure 7. The ROC tool displays the ROC curves of the ML algorithms tested and the corresponding convex hull. ROC analysis, an effective ML tool, allows classification models to be compared among the ML algorithms. The larger the area under the ROC curve, the more accurate the prediction data of an ML algorithm. The ROC curves of the ML algorithms confirm their excellent consistency in predicting driving statuses. The AB algorithm has the largest area under the ROC curve in this study.
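To make the AUC scale concrete, the following sketch computes the AUC of a binary classifier as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. It is a simplified binary illustration with hypothetical labels and scores, not the multi-class procedure used to produce Table 10.

```python
def auc_from_scores(labels, scores):
    """AUC via pairwise ranking: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data, for illustration only.
labels = [1, 1, 0, 0]
perfect = auc_from_scores(labels, [0.9, 0.8, 0.2, 0.1])   # scores separate the classes
constant = auc_from_scores(labels, [0.5, 0.5, 0.5, 0.5])  # scores carry no information
```

A classifier whose scores do not distinguish the classes yields an AUC of 0.5, which is the value reported for the NN algorithm in both phases.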

The CA metric is used as a statistical measure of how correctly an ML algorithm identifies a condition in classification testing. The CA is formulated as the ratio of the number of correct predictions (true positives and true negatives) to the total number of samples in the dataset under test. A high CA value means that the algorithm predicts the classes with high accuracy. In this study, the CA values of the SVM, RF, kNN, AB, and NN algorithms during the training phase of the ML method were calculated as 0.871, 0.964, 0.885, 0.950, and 0.878, respectively. In the training phase of the ML algorithms, the RF algorithm has the highest CA value, while the SVM algorithm has the lowest CA value. In the testing phase of the ML method, the CA values of the SVM, RF, kNN, AB, and NN algorithms were calculated as 0.878, 0.928, 0.863, 0.935, and 0.878, respectively. In the testing phase of the ML algorithms, the AB algorithm has the highest CA value, while the kNN algorithm has the lowest CA value.

Precision, another performance criterion of the ML algorithms, reflects the classifier’s ability not to label negative samples as positive. In the training phase, the precision values of the SVM, RF, kNN, AB, and NN algorithms were calculated as 0.770, 0.931, 0.841, 0.922, and 0.770, respectively. In the training phase of the ML algorithms, the RF algorithm has the highest precision, while the SVM and NN algorithms have the lowest precision. During the test phase, the precision values of the SVM, RF, kNN, AB, and NN algorithms were calculated as 0.770, 0.891, 0.821, 0.924, and 0.770, respectively. In the testing phase of the ML algorithms, the AB algorithm has the highest precision, while the SVM and NN algorithms have the lowest precision. There is an inverse relationship between recall and precision, both of which are performance measures of the ML algorithms. The precision-recall pair is a critical measure of the prediction success of the ML algorithms because the classes are unbalanced. Precision measures how relevant the returned results are, while recall measures how many of the truly relevant results are returned. Generally, a balance between the recall and precision values is sought in the ML algorithms. In the training phase, the recall values of the SVM, RF, kNN, AB, and NN algorithms were calculated as 0.871, 0.964, 0.885, 0.950, and 0.878, respectively. In the training phase of the ML algorithms, the RF algorithm has the highest recall value, while the SVM algorithm has the lowest recall value. During the test phase, the recall values of the SVM, RF, kNN, AB, and NN algorithms were calculated as 0.878, 0.928, 0.863, 0.935, and 0.878, respectively. In the testing phase of the ML algorithms, the AB algorithm has the highest recall value, while the kNN algorithm has the lowest recall value.

The F1 score criterion is among the most critical performance measures to verify the validity of classification-based prediction data in ML algorithms. The F1 score of ML algorithms emerges as the harmonic mean of the precision and recall values. Suppose an additional weight value is applied to one of these criteria to strengthen the interaction of the precision and recall values. In that case, the resulting criterion measure is expressed with Fb. In ML algorithms, the recall and precision criteria generally perform excellently as an F1 score approaches 1, or the recall and precision criteria perform poorly as an F1 score approaches 0. During the training phase, the F1 score values of the SVM, RF, kNN, AB, and NN algorithms were calculated as 0.817, 0.947, 0.856, 0.939, and 0.821, respectively. Among these algorithms, the RF model has the highest F1 score, while the SVM algorithm has the lowest F1 score. During the test phase, the F1 score values of the SVM, RF, kNN, AB, and NN algorithms were calculated as 0.821, 0.907, 0.839, 0.928, and 0.821, respectively. Among these algorithms, the AB model has the highest F1 score, while the SVM and NN algorithms have the lowest F1 scores.
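The weighted criterion referred to as Fb is commonly written as the F-beta score, in which beta controls how much more weight recall receives than precision. The sketch below is a minimal illustration under that assumption; the function name and the input values are hypothetical and not taken from Table 10.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta > 1 favours recall, beta < 1 favours precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical classifier with low precision and perfect recall.
f1 = f_beta(0.5, 1.0)              # beta = 1 gives the harmonic mean (F1)
f2 = f_beta(0.5, 1.0, beta=2.0)    # weights recall more heavily
f_half = f_beta(0.5, 1.0, beta=0.5)  # weights precision more heavily
```

With beta set to 1, the expression reduces to the harmonic mean of precision and recall, that is, the F1 score.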

The accuracy rates obtained in the estimation data of the e-scooter driving status of the five different ML algorithms discussed in this study are shown in Figure 8. Generally, the prediction accuracy margins of the ML algorithms approach 1, and the consistency of the ML algorithms in the prediction data is very high. The mean and standard deviation of the ML algorithms were calculated to deliver consistency between the data selected for the test and the actual data. The highest consistency ratio in the ML algorithms belongs to the AB algorithm, which has a mean value of 0.930 and a standard deviation value of 0.178. The mean values of the SVM, RF, kNN, and NN algorithms were calculated as 0.795, 0.884, 0.884, and 0.842, respectively. The standard deviation data of these algorithms were computed as 0.266, 0.258, 0.191, and 0.239, respectively. A box plot of the ML algorithms was created to show the distribution of the attribute values of the output variable in the ML algorithms. Box plots of the ML algorithms are shown in Figure A1 in Appendix A of this study.

The confusion matrix of the ML algorithms provides the number or ratio of coherence between the predicted data (class) and the actual data (class). Generally, it gives excellent results in the samples with classification features in the ML algorithms. The confusion matrix provides an easy understanding of whether the classification is wrong or correct by comparing the estimated data with the actual samples to which it corresponds. In this study, four different classifications were made, as there were four situations of the driving status, which is the output factor. The estimation results of the classifications of the output factor of each ML algorithm are shared in Table 11.
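A four-class confusion matrix of this kind can be sketched as follows. The status names follow the four driving statuses of this study, but the label sequences are hypothetical, and the row-wise normalization (share of each actual class predicted correctly) is an assumption; the rates in Table 11 may have been normalized differently.

```python
from collections import Counter

STATUSES = ["completed", "injured", "material damage", "nonapplicable"]


def confusion_matrix(actual, predicted, classes=STATUSES):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def per_class_rate(matrix):
    """Share of each actual class (row) predicted correctly; None for empty rows."""
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row)
        rates.append(row[i] / total if total else None)
    return rates


# Hypothetical labels, for illustration only.
actual = ["completed", "completed", "completed", "injured",
          "nonapplicable", "nonapplicable"]
predicted = ["completed", "completed", "nonapplicable", "completed",
             "nonapplicable", "nonapplicable"]

matrix = confusion_matrix(actual, predicted)
rates = per_class_rate(matrix)
```

The diagonal of the matrix holds the correctly classified samples, so a class with few samples, such as the accident statuses, can show a rate of zero even when the overall accuracy is high.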

There are four different statuses that e-scooter drivers encounter during their journey. Most journeys end with the completed status. For this status, the prediction rates of the SVM, RF, kNN, AB, and NN algorithms were calculated as 87.7%, 98.14%, 90.2%, 96.0%, and 87.8%, respectively. The RF algorithm has the highest prediction rate for this status, while the SVM algorithm has the lowest prediction rate. No ML algorithm gives a high prediction rate for statuses 2 and 3, which refer to e-scooter drivers interrupting their journey due to an injury or material damage accident. For status 4, which covers drivers who interrupt their journey for other reasons (nonapplicable), the prediction rates of the SVM, RF, kNN, AB, and NN algorithms were calculated as 0.0%, 84.6%, 57.1%, 91.7%, and 0.0%, respectively. For this status, the AB algorithm has the highest prediction rate. As a result, in this study, the ML algorithms provided solid results for estimating one of the four different statuses a customer may encounter if they use shared e-scooter vehicles for travel.

4. Discussion

Many factors, such as dense settlements, complex construction, parking problems, energy costs, time, and environmental factors, are requiring solutions to the transportation problem in people’s daily lives by directing them to shared rental or private electric micro-mobility vehicles. Micro-mobility is an essential mode of transport that needs to be developed and supported for urban mobility, just as the development and use of the public transport network are encouraged. In particular, it minimizes the carbon footprint of vehicles with electric motors included in the sharing economy within the scope of micro-mobility. In this study, the behaviors of drivers who use shared and durable e-scooter vehicles in different regions of Turkey are analyzed, and the factors affecting their driving status are discussed. After the statistical analysis of the variables representing the dependent and independent factors, the ML algorithm was used to show the future effects of shared e-scooter vehicles. Finally, in addition to the impact of the independent variables on the dependent variable, the optimum values of the independent variables were calculated so that the drivers who use e-scooter vehicles can complete their journey without any problems. The main effect graphs of the four most critical independent factors affecting the travel status of the drivers who use shared e-scooter vehicles are visualized in Figure 9.

The increase in driving times and driving speeds of drivers who use shared e-scooter vehicles for travel causes drivers to make more mistakes and interrupt their driving due to injury, material damage, or other reasons. The e-scooter speed limit significantly impacts the driving status, but according to one study, companies impose speed limits because the e-scooter speed causes significant injuries [29,51]. This study calculated the average speed limit as 0.256 km/min (15.36 km/h) for both male and female e-scooter drivers for a safe and problem-free ride. E-scooter speed limits are limited to an average of 25 km/h in the U.K. and 20 km/h in the U.S. and Germany [51]. However, it has been determined that reducing the speed limit from 30 km/h to 20 km/h in e-scooters minimizes the severity of injuries by 23% in injury accidents.

In the provinces of Istanbul, Trabzon, and Kocaeli, which are included in the regional factor, it is frequently experienced that the journeys are not completed due to injury, material damage, or other reasons, among the negative options for the driving status of shared e-scooter vehicles. Finally, as the technology and reliability, which represent shared and rentable e-scooter vehicle models, increase, drivers will travel more safely. In one study, the e-scooter accident rate for drivers aged 18 and under was calculated as 8.4% [52], but in this study, it was assumed that the age factor of e-scooter drivers had little effect on the driving status.

As a result of a small number of studies, there are limited data on the accidents faced by drivers who use e-scooter vehicles. In addition, one of the most important reasons for the scarcity of data on accidents originating from e-scooters is the absence of unique labels (only injury, material damage, and other types of accidents) for accident reports [52]. Micro-mobility vehicles are at a lower level than other means of transport due to injury, material damage, or other reasons due to accidents. In one study, the annual incidence of injuries was calculated as 120 people per 100,000 people (0.12%) for e-scooters [53]. Some studies have found that less than 5% of e-scooter drivers who are injured, especially in injury accidents, wear helmets [54,55]. The safety precautions taken by the drivers are insufficient because the legal regulations for the use of e-scooters have not been completed yet, and the authorities cannot inspect e-scooter drivers.

For the ML algorithms, four different cases are recognized for the dependent (or target) variable, corresponding to the four statuses that can occur during the journey of drivers who use shared and rentable e-scooter vehicles. In this study, the target variable expresses either the status where a driver does not encounter any problems with the e-scooter vehicle from the starting point to the finish location or the statuses where drivers do not reach the destination because they interrupt their journeys due to an accident (injured, material damage, or nonapplicable). The validity of the estimation results of the dependent variable was verified by considering the performance criteria values of the SVM, RF, kNN, AB, and NN algorithms. According to the AUC, CA, F1, precision, and recall values, the AB algorithm provided the best performance. The AUC, CA, F1, precision, and recall mean values of the AB algorithm belonging to both the test and training phases of the ML method were calculated as 0.879, 0.9425, 0.9335, 0.923, and 0.9425, respectively. According to the confusion matrix, the false estimation error rate of the AB algorithm was calculated as 0.022 (2.2%).
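The mean values quoted for the AB algorithm can be reproduced by averaging the training-phase and test-phase values reported in Section 3.2. The dictionary names in this sketch are illustrative.

```python
# AB algorithm values as reported in Section 3.2 of this study.
ab_training = {"AUC": 0.959, "CA": 0.950, "F1": 0.939,
               "precision": 0.922, "recall": 0.950}
ab_testing = {"AUC": 0.800, "CA": 0.935, "F1": 0.928,
              "precision": 0.924, "recall": 0.935}

# Simple arithmetic mean of the two phases for each metric.
ab_means = {name: (ab_training[name] + ab_testing[name]) / 2
            for name in ab_training}
```

The computed AUC mean is 0.8795, which appears in the text as 0.879; the other four means match the reported values exactly.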

The optimum values of the independent variables have been analyzed and calculated so that shared e-scooter drivers can complete their journeys without problems. For a female driver with no more than 100 e-scooter rental experiences, they should also be limited to a maximum distance of 10.44 km. If a female driver is to travel for 48.33 min, her speed should be a maximum of 0.223 km/min (13.38 km/h). During the travel period, drivers should use the new version of e-scooter vehicles that include new technologies and provide ease of use. Male drivers using the shared e-scooter vehicle must have at least 120 rental experiences to complete their rides without any problems. Speed, time, distance, and model values, other adjustment data of male drivers were computed as 0.288 km/min (17.28 km/h), 11.49 km, 52.20 min, and M2.2 (or the latest version of the e-scooter vehicle), respectively.
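The speeds in the Discussion are given both in km/min and in km/h. The conversion is a multiplication by 60, as the following sketch shows; the function and dictionary names are illustrative.

```python
def km_per_min_to_km_per_h(speed_km_per_min):
    """Convert a speed from km/min to km/h."""
    return speed_km_per_min * 60


# Speeds in km/min as reported in the Discussion of this study.
reported_km_per_min = {"female": 0.223, "male": 0.288, "both": 0.256}

converted_km_per_h = {
    group: round(km_per_min_to_km_per_h(speed), 2)
    for group, speed in reported_km_per_min.items()
}
```

The converted values agree with the km/h figures quoted in the text: 13.38 km/h for female drivers, 17.28 km/h for male drivers, and 15.36 km/h for both groups combined.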

In the present study, the effect of the independent variables on the dependent variable was calculated according to the estimation results according to the ML algorithms and statistical analyses. Among the suggestions, drivers who use shared e-scooter vehicles can have a safe driving experience by creating driver-specific driving information according to their demographic, geographical, and time variables. Other factors that cause material or injury accidents are expressed as the load on passengers and the vehicle structure used in other studies among the driving experiences of drivers who use shared e-scooter vehicles. In a study, different methods have been developed to analyze biomechanical behaviors in various crash scenarios to reduce permanent lesions, extensive/expensive rehabilitation, and social health effects in accidents by redesigning the structure of micro-mobility vehicles [56]. Drivers are directly affected by accidents that occur in shared micro-mobility vehicles compared to other vehicles. In particular, different approaches have been considered to design the vehicle structure as a passive element to prevent/reduce the transmission of crash energy from the accident and to absorb and disperse it [56]. Another study used Bayesian neural network models to predict motor vehicle collisions: factors related to the vehicle, driver, pedestrian, traffic flow, and highway variables before, during, and after the accident event [57]. Xiang developed an algorithm based on the Apriori algorithm and analyzed the car crash test in C-NCAP [58]. This study focuses on providing usage permits for drivers who use e-scooter vehicles according to their driving planning and their demographic, geographical, and rental time factors for the safe travel process.

In this study, e-scooters, which are among the shared and rentable micro-mobility vehicles available for personal travel, offer a solution that takes into account the gender factors of the drivers to ensure that the journeys between the starting and destination locations are sustainable without any problems. In particular, this study will support policy initiatives to ensure that e-scooters can be used safely together with other micro-mobility vehicles in harmony with other travel modes. There are two critical factors affecting the travel experience of drivers who use micro-mobility vehicles for travel, such as a lack of education and legislation. Drivers who use micro-mobility vehicles lack the technical and physical knowledge of the vehicles they use and have limited information on how to move in traffic. Since drivers who use micro-mobility vehicles generally use the roads and intersections used by other vehicles, many problems arise due to the lack of legal regulations or their belief that they do not have to comply with traffic rules in order not to be affected by their driving behavior. Some approaches are presented on how e-scooters work in urban transportation systems to solve many safety and legal issues, especially in urban traffic conditions [59]. The recommended solution for a safe travel experience usually includes education, demographic characteristics such as drivers’ age, and criminal legislation.

Micro-mobility has become common in cities, as using private vehicles increases carbon footprints, parking problems, and town inaccessibility (due to traffic intensity). Developments in electric motor vehicle technologies have created an opportunity for e-scooters in the development of micro-mobility in cities. However, micro-mobility vehicles that intersect with both road vehicle traffic and pedestrian traffic cause many accidents due to the uncertainty or novelty of the rules for drivers. In many countries, preventive accident analyses for micro-mobility systems that are not included in a driver’s license rule system but require driving and driver safety will guide regulatory authorities. Moreover, it will contribute to developing artificial intelligence-assisted accident detection and warning systems as an intelligent transportation system component. For example, it may be possible to use infrastructures such as fatigue detection systems used in vehicle technologies for micro-mobility systems. With the analysis of big data of current driving and the evaluation of accidents, micro-mobility-oriented rules can be controlled with intelligent transportation system technologies in the future.

This study has some limitations. This study contains information about e-scooter (nonpersonal) vehicles among the shared and rentable micro-mobility vehicles used for daily travel. Although this study has information about accident types, it has been evaluated in three categories: material damage, injury, and other accident types. Among the independent variables used in this study, parameters such as temperature, precipitation rate, and humidity rate, which express air parameters, were not considered. Problems without an accident cause, which are among the dependent variables of this study, were considered as a single factor. Demographic information about drivers does not include physical characteristics of drivers such as weight and height. In this study, since the provinces where e-scooter applications are made have large locations, accident locations are considered cities instead of small areas. In a study, a geographic information systems (GIS)-based multi-criteria decision analysis (MCDA) method was developed using the detailed location data of shared bicycles from micro-mobile vehicles implemented in the City of Zurich in February 2020 [60]. In another study, driving distances were calculated by determining the actual trips of the drivers who use micro-mobility vehicles using the GPS data of the micro-mobility vehicles [61]. This study used direct driving distance data instead of geographical distances and location information, such as the road, intersection, and street where e-scooter-related accidents occur, is not included. Only the regions (cities) where the accidents occurred were included in this study. Finally, technical features such as the batteries of shared and rentable e-scooter vehicles are not taken into account, and only e-scooter models are used in terms of the production years.

5. Conclusions

This study examined the data of e-scooter drivers in 15 different cities in Turkey and discussed their driving status. There are four different statuses that customers who use e-scooter vehicles, which are among the available types of micro-mobility vehicles for daily transportation, encounter during their journey. Ten various independent factors affecting the driver’s completion of their journey without any problems during the trip or the interruption of the travel time of the drivers due to injury, material damage, or any other reason (nonapplicable) were examined. The independent factors are evaluated within the scope of the demographic characteristics of the driver, the regions where the trips take place, the rental times and dates, and the e-scooter models used for travel. The data in this study belong to shared and rentable e-scooter vehicles (nonpersonal) belonging to the BinBin company.

After the data used in the study were preprocessed, statistical analyses were performed, and the effects of the independent factors on the e-scooter driving status were examined. The significance levels of the driver gender, region, rental date, number of rentals, travel time, average speed and distance of the e-scooter during the ride, and e-scooter production year factors were calculated as 0.036, 0.001, 0.002, 0.059, 0.001, 0.005, 0.001, and 0.001, respectively. In terms of statistical significance, these factors significantly affect the driving status. While travel time has the strongest effect on driving status, long trips negatively affect drivers, causing them to interrupt their journeys.

In the next step of this study, the driving status of the drivers who use shared e-scooter vehicles was estimated using SVM, RF, kNN, AB, and NN algorithms from ML models, which is a powerful estimation technique. The AUC, CA, F1, precision, and recall values, the performance measurement criteria of the ML models, were calculated and compared. Performance criteria are essential to verify the validity of the prediction results of ML algorithms. Although different algorithms showed excellent performance according to the performance measurement results, the AB algorithm showed the best performance for the F1 score. At the last stage of this study, the optimum values of the independent variables were calculated for a smooth journey according to the gender of the drivers who use shared e-scooter vehicles. According to the numerical results, a female driver’s rental experience, distance, driving time, and driving speed were calculated as 100, 10.44 km, 48.33 min, and 13.38 km/h, respectively. The optimum values of the rental experience, distance, driving time, and driving speed for male drivers were calculated as 120, 11.49 km, 52.20 min, and 17.28 km/h, respectively.

The scientific novelty of this study is to obtain predictable optimum results to provide a safe travel process by considering the driver characteristics of shared and rentable e-scooter vehicles, geographical, technological, and factors related to the travel process. This study includes a detailed analysis of the factors affecting the driving status of shared and rentable e-scooter vehicles to provide a safe journey. This study includes a numerical analysis of the driving statuses of drivers who use shared e-scooter vehicles for short-distance trips. However, for drivers who use e-scooter vehicles over micro-mobility vehicles, numerical analysis and studies should be conducted to create comprehensive legislation for safe driving. Especially for shared micro-mobility applications in some regions of Turkey, some sanctions should be implemented to help solve the problem of the proper behavior of e-scooter drivers and penalize inappropriate communication behavior. In addition, there are doubts that insufficient micro-mobility use training of e-scooter users significantly impacts driving behavior. In one study, many issues such as inadequate knowledge of micro-mobility users, charging problems of electric vehicles, driving electric cars in bad weather conditions, not respecting other vehicles in the traffic, and acting suddenly without following the traffic rules were discussed for drivers to have a safe travel experience in micro-mobility applications [62]. This study provides an essential guide for both drivers who use e-scooter vehicles for travel and institutions that provide this service, especially for regions with no legal rules for shared e-scooter vehicles.

In future studies, different approaches are intended that use information such as location, street, and intersection data in shared micro-mobility applications, so that drivers who travel with micro-mobility vehicles can have a safe driving experience. In addition, we plan to address the time, cost, and environmental factors of shared e-scooter vehicles in light of cities’ livability indexes.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

The correlation values of the dependent and independent variables in this study are given in Table A1. The correlations were evaluated and tested in three categories. In the literature, correlation values between 0.1 and 0.3 are reported as weak, values between 0.3 and 0.5 as moderate, and values between 0.5 and 0.9 as strong [63].

Table A1. The values of correlations based on the Pairwise Pearson Correlation Test.

Sample 1	Sample 2	N	Correlation	95% CI for ρ	p-Value
Tenure	Model	263	0.112	(−0.009, 0.230)	0.070
Distance	Model	263	0.087	(−0.034, 0.206)	0.158
Rental Time	Model	263	−0.023	(−0.144, 0.098)	0.710
Driving Time	Model	263	−0.005	(−0.126, 0.116)	0.940
Distance	Tenure	263	−0.053	(−0.173, 0.068)	0.390
Rental Time	Tenure	263	0.008	(−0.113, 0.129)	0.899
Driving Time	Tenure	263	−0.112	(−0.230, 0.009)	0.070
Rental Time	Distance	263	−0.112	(−0.230, 0.009)	0.069
Driving Time	Distance	263	0.565	(0.476, 0.642)	0.000
Driving Time	Rental Time	263	−0.139	(−0.255, −0.018)	0.024

Abbreviation: CI, confidence interval; ρ, correlation coefficient.
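
The confidence intervals in Table A1 can be checked with the Fisher z-transformation, a standard way to build an interval for a Pearson correlation. The following is a minimal sketch, assuming this is how the intervals were obtained (the source does not state the method). The function names `pearson_ci` and `strength` are illustrative, the strength labels follow the thresholds quoted from [63], and the "negligible" label for values below 0.1 is an added assumption.

```python
import math

def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via Fisher's z-transform."""
    z = math.atanh(r)              # Fisher z of the sample correlation
    se = 1.0 / math.sqrt(n - 3)    # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def strength(r):
    """Label |r| with the thresholds quoted in the text (0.1, 0.3, 0.5)."""
    a = abs(r)
    if a >= 0.5:
        return "strong"
    if a >= 0.3:
        return "moderate"
    if a >= 0.1:
        return "weak"
    return "negligible"            # assumption: not a category named in the text

# Driving Time vs. Distance row of Table A1: r = 0.565, N = 263
low, high = pearson_ci(0.565, 263)
print(f"({low:.3f}, {high:.3f})", strength(0.565))
```

For this row the sketch gives an interval close to the reported (0.476, 0.642); differences in the third decimal can arise because the tabulated correlation is itself rounded.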

The regression models of the dependent variable for each region, in terms of the number of rental experiences, driving time, distance, and driving speed, are given in Table A2.

Table A2. Regression models based on the region factor affecting the dependent variable.

Region	Response Variable		Formulas	Eq. Ref.
Adana	Driving Status Numeric	=	−0.0122 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A1)
Antalya	Driving Status Numeric	=	−0.0533 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A2)
Bursa	Driving Status Numeric	=	−0.1583 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A3)
Diyarbakır	Driving Status Numeric	=	−0.0524 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A4)
Eskişehir	Driving Status Numeric	=	−0.1175 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A5)
Gaziantep	Driving Status Numeric	=	0.0243 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A6)
İstanbul	Driving Status Numeric	=	2.5050 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A7)
İzmir	Driving Status Numeric	=	−0.0514 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A8)
Kayseri	Driving Status Numeric	=	−0.0110 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A9)
Kocaeli	Driving Status Numeric	=	0.107 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A10)
Konya	Driving Status Numeric	=	−0.0981 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A11)
Sakarya	Driving Status Numeric	=	0.0158 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A12)
Samsun	Driving Status Numeric	=	−0.1157 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A13)
Trabzon	Driving Status Numeric	=	2.908 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A14)
Uşak	Driving Status Numeric	=	0.0015 + 0.000123 Rental Numbers + 0.02668 Driving Time (min) − 0.13252 Distance (km) + 0.499 Speed (km/min)	(A15)
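
The regional models in Table A2 share the same slopes and differ only in the intercept, so evaluating one of them is a single linear expression. The sketch below illustrates this for a subset of regions; the distance coefficient is taken as negative, consistent with the negative Distance coefficient in Table 9. The function name `driving_status_numeric` and the trip values in the example are hypothetical and serve only to show the arithmetic.

```python
# Region intercepts copied from Table A2 (a subset is enough for illustration).
INTERCEPTS = {"Adana": -0.0122, "Bursa": -0.1583, "İstanbul": 2.5050}

# Slopes shared by all regional models in Table A2.
SLOPES = {
    "rental_numbers": 0.000123,
    "driving_time_min": 0.02668,
    "distance_km": -0.13252,       # taken as negative, as in Table 9
    "speed_km_per_min": 0.499,
}

def driving_status_numeric(region, rental_numbers, driving_time_min,
                           distance_km, speed_km_per_min):
    """Evaluate the Table A2 regression equation for one trip."""
    return (INTERCEPTS[region]
            + SLOPES["rental_numbers"] * rental_numbers
            + SLOPES["driving_time_min"] * driving_time_min
            + SLOPES["distance_km"] * distance_km
            + SLOPES["speed_km_per_min"] * speed_km_per_min)

# Hypothetical trip: 100 previous rentals, 10 min, 2 km, 0.2 km/min.
adana = driving_status_numeric("Adana", 100, 10, 2, 0.2)
istanbul = driving_status_numeric("İstanbul", 100, 10, 2, 0.2)
print(round(adana, 5), round(istanbul - adana, 4))
```

Because only the intercept changes, the difference between two regions for the same trip equals the difference of their intercepts (here 2.5050 − (−0.0122) = 2.5172 for İstanbul versus Adana).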

The regression models of the dependent variable in terms of the number of rental experiences, driving time, distance, and driving speed, for each driver gender and e-scooter model, are given in Table A3.

Table A3. Regression models based on the gender and e-scooter model factors affecting the dependent variable.

Gender	Model	Response Variable		Formulas	Eq. Ref.
Female	M2.1	Driving Status Numeric	=	1.913 + 0.000264 Rental Numbers + 0.02650 Driving Time (min) − 0.1077 Distance (km) − 1.638 Speed (km/min)	(A16)
Female	M2.2	Driving Status Numeric	=	0.1822 + 0.000264 Rental Numbers + 0.02650 Driving Time (min) − 0.1077 Distance (km) − 1.638 Speed (km/min)	(A17)
Female	M2.3	Driving Status Numeric	=	0.7852 + 0.000264 Rental Numbers + 0.02650 Driving Time (min) − 0.1077 Distance (km) − 1.638 Speed (km/min)	(A18)
Male	M2.1	Driving Status Numeric	=	2.027 + 0.000264 Rental Numbers + 0.02650 Driving Time (min) − 0.1077 Distance (km) − 1.638 Speed (km/min)	(A19)
Male	M2.2	Driving Status Numeric	=	0.2960 + 0.000264 Rental Numbers + 0.02650 Driving Time (min) − 0.1077 Distance (km) − 1.638 Speed (km/min)	(A20)
Male	M2.3	Driving Status Numeric	=	0.8990 + 0.000264 Rental Numbers + 0.02650 Driving Time (min) − 0.1077 Distance (km) − 1.638 Speed (km/min)	(A21)

Box plots showing the distribution of the attribute values of the output variable in the ML algorithms are shown in Figure A1. Box plots are a useful tool for checking new data, especially for quickly detecting anomalies such as repeated values and outliers.

Figure A1. Box plots of the distribution of attribute values of the output variable in ML algorithms: (a) SVM algorithm; (b) AB algorithm; (c) kNN algorithm; (d) RF algorithm.

References

1. İnaç, H.; Ayözen, Y.; Atalan, A.; Dönmez, C.Ç. Estimation of Postal Service Delivery Time and Energy Cost with E-Scooter by Machine Learning Algorithms. Appl. Sci. 2022, 12, 12266.
2. Mukhtar, M.; Ashraf, A.; Frank, M.; Steenburg, S.D. Injury incidence and patterns associated with electric scooter accidents in a major metropolitan city. Clin. Imaging 2021, 74, 163–168.
3. Shah, N.R.; Aryal, S.; Wen, Y.; Cherry, C.R. Comparison of motor vehicle-involved e-scooter and bicycle crashes using standardized crash typology. J. Safety Res. 2021, 77, 217–228.
4. Glavić, D.; Trpković, A.; Milenković, M.; Jevremović, S. The E-Scooter Potential to Change Urban Mobility—Belgrade Case Study. Sustainability 2021, 13, 5948.
5. Button, K.; Frye, H.; Reaves, D. Economic regulation and E-scooter networks in the USA. Res. Transp. Econ. 2020, 84, 100973.
6. Insight, B. The Bike and Scootersharing Telematics Market; Berg Insight: Stockholm, Sweden, 2020.
7. Shichman, I.; Shaked, O.; Factor, S.; Weiss-Meilik, A.; Khoury, A. Emergency department electric scooter injuries after the introduction of shared e-scooter services: A retrospective review of 3331 cases. World J. Emerg. Med. 2022, 13, 5.
8. Störmann, P.; Klug, A.; Nau, C.; Verboket, R.D.; Leiblein, M.; Müller, D.; Schweigkofler, U.; Hoffmann, R.; Marzi, I.; Lustenberger, T. Characteristics and Injury Patterns in Electric-Scooter Related Accidents—A Prospective Two-Center Report from Germany. J. Clin. Med. 2020, 9, 1569.
9. Gan-El, E.; Djomo, W.N.; Ciobanu, A.M.P.; Kaufman, L.; Djiélé, F.N.; Ulrix, M.; Kreps, B.; Plumacker, A.; Malinverni, S.; Bartiaux, M.; et al. Risk assessment, consequences, and epidemiology of electric scooter accidents admitted to an emergency department: A prospective observational study. Eur. J. Trauma Emerg. Surg. 2022, 48, 4847–4855.
10. Brauner, T.; Heumann, M.; Kraschewski, T.; Prahlow, O.; Rehse, J.; Kiehne, C.; Breitner, M.H. Web content mining analysis of e-scooter crash causes and implications in Germany. Accid. Anal. Prev. 2022, 178, 106833.
11. Graef, F.; Doll, C.; Niemann, M.; Tsitsilonis, S.; Stöckle, U.; Braun, K.F.; Wüster, J.; Märdian, S. Epidemiology, Injury Severity, and Pattern of Standing E-Scooter Accidents: 6-Month Experience from a German Level I Trauma Center. Clin. Orthop. Surg. 2021, 13, 443.
12. Mayhew, L.J.; Bergin, C. Impact of e-scooter injuries on Emergency Department imaging. J. Med. Imaging Radiat. Oncol. 2019, 63, 461–466.
13. Beck, S.; Barker, L.; Chan, A.; Stanbridge, S. Emergency department impact following the introduction of an electric scooter sharing service. Emerg. Med. Australas. 2020, 32, 409–415.
14. Azimian, A.; Jiao, J. Modeling factors contributing to dockless e-scooter injury accidents in Austin, Texas. Traffic Inj. Prev. 2022, 23, 107–111.
15. Wüster, J.; Voß, J.; Koerdt, S.; Beck-Broichsitter, B.; Kreutzer, K.; Märdian, S.; Lindner, T.; Heiland, M.; Doll, C. Impact of the Rising Number of Rentable E-scooter Accidents on Emergency Care in Berlin 6 Months after the Introduction: A Maxillofacial Perspective. Craniomaxillofacial Trauma Reconstr. 2021, 14, 43–48.
16. Ishmael, C.R.; Hsiue, P.P.; Zoller, S.D.; Wang, P.; Hori, K.R.; Gatto, J.D.; Li, R.; Jeffcoat, D.M.; Johnson, E.E.; Bernthal, N.M. An Early Look at Operative Orthopaedic Injuries Associated with Electric Scooter Accidents: Bringing High-Energy Trauma to a Wider Audience. J. Bone Jt. Surg. 2020, 102, e18.
17. Reck, D.J.; Martin, H.; Axhausen, K.W. Mode choice, substitution patterns and environmental impacts of shared and personal micro-mobility. Transp. Res. Part D Transp. Environ. 2022, 102, 103134.
18. Ayözen, Y.E.; İnaç, H.; Atalan, A.; Dönmez, C.Ç. E-Scooter Micro-Mobility Application for Postal Service: The Case of Turkey for Energy, Environment, and Economy Perspectives. Energies 2022, 15, 7587.
19. Arslan, E.; Uyulan, Ç. Analysis of an e-scooter and rider system dynamic response to curb traversing through physics-informed machine learning methods. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2022, 0, 095440702211001.
20. Kim, S.; Choo, S.; Lee, G.; Kim, S. Predicting Demand for Shared E-Scooter Using Community Structure and Deep Learning Method. Sustainability 2022, 14, 2564.
21. Kleinertz, H.; Ntalos, D.; Hennes, F.; Nüchtern, J.; Frosch, K.-H.; Thiesen, D.M. Accident Mechanisms and Injury Patterns in E-Scooter Users. Dtsch. Arztebl. Int. 2021, 118, 117–121.
22. Badeau, A.; Carman, C.; Newman, M.; Steenblik, J.; Carlson, M.; Madsen, T. Emergency department visits for electric scooter-related injuries after introduction of an urban rental program. Am. J. Emerg. Med. 2019, 37, 1531–1533.
23. Cittadini, F.; Aulino, G.; Petrucci, M.; Valentini, S.; Covino, M. Electric scooter–related accidents: A possible protective effect of helmet use on the head injury severity. Forensic Sci. Med. Pathol. 2022, 1–6.
24. Ahluwalia, R.; Grainger, C.; Coffey, D.; Malhotra, P.-S.; Sommerville, C.; Tan, P.S.; Johal, K.; Sivaprakasam, M.; Almousa, O.; Janakan, G.; et al. The e-scooter pandemic at a UK Major Trauma Centre: A cost-based cohort analysis of injury presentation and treatment. Surgeon 2022.
25. Severengiz, S.; Schelte, N.; Bracke, S. Analysis of the environmental impact of e-scooter sharing services considering product reliability characteristics and durability. Procedia CIRP 2021, 96, 181–188.
26. Yang, H.; Ma, Q.; Wang, Z.; Cai, Q.; Xie, K.; Yang, D. Safety of micro-mobility: Analysis of E-Scooter crashes by mining news reports. Accid. Anal. Prev. 2020, 143, 105608.
27. Gioldasis, C.; Christoforou, Z.; Seidowsky, R. Risk-taking behaviors of e-scooter users: A survey in Paris. Accid. Anal. Prev. 2021, 163, 106427.
28. Stigson, H.; Malakuti, I.; Klingegård, M. Electric scooters accidents: Analyses of two Swedish accident data sets. Accid. Anal. Prev. 2021, 163, 106466.
29. Posirisuk, P.; Baker, C.; Ghajari, M. Computational prediction of head-ground impact kinematics in e-scooter falls. Accid. Anal. Prev. 2022, 167, 106567.
30. Haworth, N.; Schramm, A.; Twisk, D. Comparing the risky behaviours of shared and private e-scooter and bicycle riders in downtown Brisbane, Australia. Accid. Anal. Prev. 2021, 152, 105981.
31. Nikiforiadis, A.; Paschalidis, E.; Stamatiadis, N.; Raptopoulou, A.; Kostareli, A.; Basbas, S. Analysis of attitudes and engagement of shared e-scooter users. Transp. Res. Part D Transp. Environ. 2021, 94, 102790.
32. Pazzini, M.; Cameli, L.; Lantieri, C.; Vignali, V.; Dondi, G.; Jonsson, T. New Micromobility Means of Transport: An Analysis of E-Scooter Users’ Behaviour in Trondheim. Int. J. Environ. Res. Public Health 2022, 19, 7374.
33. Atalan, A.; Şahin, H.; Atalan, Y.A. Integration of Machine Learning Algorithms and Discrete-Event Simulation for the Cost of Healthcare Resources. Healthcare 2022, 10, 1920.
34. Ceylan, Z.; Atalan, A. Estimation of healthcare expenditure per capita of Turkey using artificial intelligence techniques with genetic algorithm-based feature selection. J. Forecast. 2021, 40, 279–290.
35. Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260.
36. Akbulut, A.; Ertugrul, E.; Topcu, V. Fetal health status prediction based on maternal clinical history using machine learning techniques. Comput. Methods Programs Biomed. 2018, 163, 87–100.
37. Buehler, R.; Broaddus, A.; Sweeney, T.; Zhang, W.; White, E.; Mollenhauer, M. Changes in Travel Behavior, Attitudes, and Preferences among E-Scooter Riders and Nonriders: First Look at Results from Pre and Post E-Scooter System Launch Surveys at Virginia Tech. Transp. Res. Rec. J. Transp. Res. Board 2021, 2675, 335–345.
38. Jiao, J.; Bai, S. Understanding the Shared E-scooter Travels in Austin, TX. ISPRS Int. J. Geo-Inf. 2020, 9, 135.
39. Atalan, A. Forecasting drinking milk price based on economic, social, and environmental factors using machine learning algorithms. Agribusiness 2023, 39, 214–241.
40. Mani, I.; Zhang, I. kNN approach to unbalanced data distributions: A case study involving information extraction. In Proceedings of the Workshop on Learning from Imbalanced Datasets, Washington, DC, USA, 21 August 2003; Volume 126, pp. 1–7.
41. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
42. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
43. Rachmani, E.; Hsu, C.-Y.; Nurjanah, N.; Chang, P.W.; Shidik, G.F.; Noersasongko, E.; Jumanto, J.; Fuad, A.; Ningrum, D.N.A.; Kurniadi, A.; et al. Developing an Indonesia’s health literacy short-form survey questionnaire (HLS-EU-SQ10-IDN) using the feature selection and genetic algorithm. Comput. Methods Programs Biomed. 2019, 182, 105047.
44. Freund, Y.; Schapire, R.; Abe, N. A short introduction to boosting. J.-Jpn. Soc. Artif. Intell. 1999, 14, 1612.
45. Schapire, R.E. Explaining AdaBoost. In Empirical Inference; Springer: Berlin/Heidelberg, Germany, 2013; pp. 37–52.
46. Hastie, T.; Rosset, S.; Zhu, J.; Zou, H. Multi-class adaboost. Stat. Interface 2009, 2, 349–360.
47. Zhou, X.; Tian, S.; An, J.; Yang, J.; Zhou, Y.; Yan, D.; Wu, J.; Shi, X.; Jin, X. Comparison of different machine learning algorithms for predicting air-conditioning operating behavior in open-plan offices. Energy Build. 2021, 251, 111347.
48. Cook, N.R. Statistical Evaluation of Prognostic versus Diagnostic Models: Beyond the ROC Curve. Clin. Chem. 2008, 54, 17–23.
49. Mun, S.-H.; Kwak, Y.; Huh, J.-H. A case-centered behavior analysis and operation prediction of AC use in residential buildings. Energy Build. 2019, 188–189, 137–148.
50. Atalan, A. Central Composite Design Optimization Using Computer Simulation Approach. Flexsim Q. Publ. 2014, 5, 19. Available online: https://www.researchgate.net/publication/321748315_Central_Composite_Design_Optimization_Using_Computer_Simulation_Approach (accessed on 20 November 2022).
51. Santacreu, A.; Yannis, G.; de Saint Leon, O.; Crist, P. Safe micromobility. Sci. Engl. Med. 2020, 1–98.
52. Latinopoulos, C.; Patrier, A.; Sivakumar, A. Planning for e-scooter use in metropolitan cities: A case study for Paris. Transp. Res. Part D Transp. Environ. 2021, 100, 103037.
53. Stray, A.V.; Siverts, H.; Melhuus, K.; Enger, M.; Galteland, P.; Næss, I.; Helseth, E.; Ramm-Pettersen, J. Characteristics of Electric Scooter and Bicycle Injuries after Introduction of Electric Scooter Rentals in Oslo, Norway. JAMA Netw. Open 2022, 5, e2226701.
54. Brownson, A.B.S.; Fagan, P.; Dickson, S.; Civil, I.D.S. Electric scooter injuries at Auckland City Hospital. NZ Med. J. 2019, 132, 62–72.
55. Suominen, E.N.; Sajanti, A.J.; Silver, E.A.; Koivunen, V.; Bondfolk, A.S.; Koskimäki, J.; Saarinen, A.J. Alcohol intoxication and lack of helmet use are common in electric scooter-related traumatic brain injuries: A consecutive patient series from a tertiary university hospital. Acta Neurochir. 2022, 164, 643–653.
56. Jimenez-Martinez, M. Artificial Neural Networks for Passive Safety Assessment. Eng. Lett. 2022, 30, 1–9.
57. Xie, Y.; Lord, D.; Zhang, Y. Predicting motor vehicle collisions using Bayesian neural network models: An empirical analysis. Accid. Anal. Prev. 2007, 39, 922–933.
58. Xiang, L. Simulation System of Car Crash Test in C-NCAP Analysis Based on an Improved Apriori Algorithm. Phys. Procedia 2012, 25, 2066–2071.
59. Turoń, K.; Czech, P. The Concept of Rules and Recommendations for Riding Shared and Private E-Scooters in the Road Network in the Light of Global Problems. In Modern Traffic Engineering in the System Approach to the Development of Traffic Networks; Springer Nature: Berlin/Heidelberg, Germany, 2020; pp. 275–284.
60. Mangold, M.; Zhao, P.; Haitao, H.; Mansourian, A. Geo-fence planning for dockless bike-sharing systems: A GIS-based multi-criteria decision analysis framework. Urban Inform. 2022, 1, 17.
61. Zhao, P.; Haitao, H.; Li, A.; Mansourian, A. Impact of data processing on deriving micro-mobility patterns from vehicle availability data. Transp. Res. Part D Transp. Environ. 2021, 97, 102913.
62. Turoń, K.; Czech, P.; Tóth, J. Safety and security aspects in shared mobility systems. Sci. J. Silesian Univ. Technol. Ser. Transp. 2019, 104, 169–175.
63. Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Lawrence Erlbaum Associates: Hillsdale, NJ, USA, 1988.

Figure 1. The flowchart of the methodology.

Figure 2. Data distributions of e-scooter drivers: (a) based on the rental times of the e-scooter, (b) based on the speed of the e-scooter vehicle during the trip, (c) distribution of driving status by driver gender, (d) distribution of driving status by driver age, (e) distribution of the e-scooter vehicle rental (tenure) by drivers, and (f) distribution of drivers from the time of the e-scooter vehicle rental to the end of the trip (with or without problems).

Figure 3. Data distributions of e-scooter drivers based on the region selected: (a) driving status, (b) rental numbers by female and male customers. *, data unavailable.

Figure 4. Distribution of accident data of shared e-scooter vehicles by regions.

Figure 5. The sequence of ML algorithms in the software.

Figure 6. The standardized effect graph of driving time (minute).

Figure 7. ROC curves of ML algorithms.

Figure 8. Accuracy rates of test data according to ML algorithms.

Figure 9. Main effects of independent factors acting on dependent factors: (a) the driving time; (b) speed km/min; (c) regions selected; (d) model of e-scooter vehicles.

Table 1. Literature review of e-scooter vehicles’ usage purposes, advantages, and driving statuses such as accidents and the purpose of this study.

Proposed	Method	Location/Data Source	Ref.
Analysis of the dynamic response of the e-scooter and driver system	Physics-informed ML algorithms	Nonapplicable, virtual	[19]
Estimating shared e-scooter demand in terms of community structure	DL	Korea	[20]
Analysis of accident mechanisms and injury patterns of e-scooter drivers	Statistical analysis	Germany	[21]
Estimation of energy and time cost in mail delivery with e-scooter vehicles	ML algorithms	Turkey	[1]
Analysis of injuries related to e-scooter drivers arriving in the emergency room	Descriptive statistics	U.S.	[22]
Analysis of helmet use in accidents involving electric scooters	Chi-square test, Yates’s correction, Fisher’s exact test, the Mann–Whitney U test	Italy	[23]
Cost-based cohort analysis for e-scooter-induced injury treatment	Statistical analysis	U.K.	[24]
Analysis of the environmental impact of e-scooter sharing services	LCA	Nonapplicable, simulation	[25]
Analysis of e-scooter accidents in terms of micro-mobility safety	Descriptive analysis and CTA	U.S.	[26]
Analysis of e-scooter vehicles in terms of environment, economy, and cost in mail delivery	Poisson regression model	Turkey	[18]
Risk-taking behavior of e-scooter drivers	A survey analysis	France	[27]
Analysis of e-scooter accidents	Descriptive statistics	Sweden	[28]
Analysis of factors that cause e-scooter injury accidents	ZIP and ZINB models	U.S.	[14]
Estimation of head-to-ground impact kinematics in falls of e-scooter riders	MDA	Nonapplicable, simulation	[29]
Analyzing the risky behavior of e-scooter and bicycle users	Statistical analysis	Australia	[30]
Estimating the driving status of e-scooter vehicles and calculating the optimum values of the variables	Statistical ML algorithms	Turkey	This Study

Abbreviations: Ref., reference; ML, machine learning; DL, deep learning; LCA, life cycle assessment; CTA, cross-tabulation analysis; ZIP, zero-inflated Poisson; ZINB, zero-inflated negative binomial; MDA, multi-body dynamics approach.

Table 2. The values of descriptive statistics of independent and dependent variables based on the gender of drivers.

Variable	Gender	N	Mean	StDev	Variance	Min	Max	Skewness	Kurtosis
Rental Numbers	Female	201	58.960	111.08	12,338.3	1.000	945.000	4.930	29.880
Rental Numbers	Male	731	70.260	108.01	11,665.9	1.000	1009.000	3.980	22.160
Driving Time (min)	Female	201	13.260	18.40	338.49	1.000	135.000	4.000	20.200
Driving Time (min)	Male	731	11.912	15.991	255.71	1.000	148.000	4.160	22.430
Distance (km)	Female	201	1.815	2.146	4.607	0.100	18.100	3.600	18.800
Distance (km)	Male	731	1.895	2.084	4.343	0.100	19.600	3.320	16.500
Speed (km/min)	Female	201	0.169	0.074	0.005	0.004	0.342	−0.050	−0.440
Speed (km/min)	Male	731	0.184	0.077	0.006	0.005	0.356	−0.120	−0.520
Start Time of Rental	Female	201	29.751	10.71	114.65	1.000	47.000	−1.290	1.240
Start Time of Rental	Male	731	29.330	12.179	148.339	1.000	48.000	−1.030	0.270
Driver Age	Female	201	26.057	7.955	63.283	16.500	40.000	0.750	−0.930
Driver Age	Male	731	23.576	6.847	46.885	16.500	40.000	1.060	0.070
Driving Status *	Female	201	0.259	0.789	0.623	0.000	3.000	3.020	7.620
Driving Status *	Male	731	0.302	0.862	0.743	0.000	3.000	2.700	5.540

Abbreviation: N, sample size; StDev, standard deviation; Min, minimum value; Max, maximum value. * Numeric and integer values (0, 1, 2, 3).

Table 3. The setting of SVM algorithms.

Model Parameters	Settings
Type	SVM
Cost (C)	1.00
Regression loss epsilon (ε)	0.10
Kernel	RBF, exp(-auto|x-y|²)
Numerical tolerance	0.001
Iteration limits	100

Table 4. The setting of NN algorithms.

Model Parameters	Settings
Type	Neural network
Hidden Layer	100
Activation	ReLU
Solver	Adam
Alpha (regularization)	0.0001
Iteration limits (Maximum)	200
Replicable training	True

Table 5. The setting of RF algorithms.

Model Parameters	Settings
Type	Random forest
Number of trees	10
Maximal number of considered features	Unlimited
Replicable training	No
Maximal tree depth	Unlimited
Stop splitting nodes with maximum instances	5

Table 6. The setting of kNN algorithms.

Model Parameters	Settings
Type	k-nearest neighbor
Number of neighbors	5
Metric (distance between two points)	Euclidean
Weight	Uniform

Table 7. The setting of AB algorithms.

Model Parameters	Settings
Type	AdaBoost
Base estimator	Tree
The number of estimators	50
Algorithm (classification)	SAMME.R
Loss (regression)	Linear

Table 8. A confusion matrix for the dependent variable with the categorical data type of ML algorithms.

Total Number of Observations (P + N)		Predicted Observation
Total Number of Observations (P + N)		Positive (PP)	Negative (PN)
Actual Observation	Positive (P)	True positive (TP)	False negative (FN)
Actual Observation	Negative (N)	False positive (FP)	True negative (TN)

Table 9. Statistical values of the effects of independent variables on driving status.

Term	Coef	SE Coef	T-Value	p-Value	VIF
Constant	23.2	6.87	3.38	0.001
Rental Date	−0.00047	0.000154	−3.08	0.002	1.19
Rental Numbers	0.000174	0.000092	1.89	0.059	1.06
Driving Time (min)	0.02556	0.00108	23.7	0.001	3.36
Distance (km)	−0.12536	0.00844	−14.85	0.001	3.31
Speed (km/min)	0.49	0.174	2.82	0.005	1.87
Start Time of Rental	−0.00107	0.000929	−1.16	0.248	1.29
Age	−0.00105	0.00139	−0.75	0.452	1.05
Region				0.001
Antalya	−0.0569	0.0542	−1.05	0.294	5.36
Bursa	−0.1287	0.0518	−2.48	0.013	4.02
Diyarbakır	−0.1097	0.0814	−1.35	0.178	2.46
Eskişehir	−0.0914	0.0598	−1.53	0.127	2.31
Gaziantep	0.0369	0.0623	0.59	0.554	2.12
İstanbul	2.4984	0.0731	34.17	0.001	3.87
İzmir	−0.0535	0.0679	−0.79	0.431	1.76
Kayseri	−0.0203	0.0976	−0.21	0.835	1.28
Kocaeli	0.154	0.179	0.86	0.388	1.08
Konya	−0.0838	0.0539	−1.55	0.121	3.32
Sakarya	0.0301	0.0775	0.39	0.698	1.53
Samsun	−0.1164	0.071	−1.64	0.101	5.28
Trabzon	2.893	0.216	13.38	0.001	1.06
Uşak	0.0489	0.0923	0.53	0.596	1.33
Model				0.001
M2.2	−1.899	0.309	−6.15	0	209.21
M2.3	−1.889	0.306	−6.18	0	204.65
Gender				0.109
Male	−0.0394	0.0245	−1.61	0.109	1.08

Abbreviations: Coef, coefficient; SE Coef, standard error of the coefficient; T-Value and p-Value, test statistic and significance level of each term; VIF, variance inflation factor.

Table 10. Average performance score data of ML algorithms.

Stages	Model	AUC	CA	F1	Precision	Recall
Training	SVM	0.875	0.871	0.817	0.770	0.871
	RF	0.944	0.964	0.947	0.931	0.964
	kNN	0.763	0.885	0.856	0.841	0.885
	AB	0.959	0.950	0.939	0.922	0.950
	NN	0.500	0.878	0.821	0.770	0.878
Testing	SVM	0.723	0.878	0.821	0.770	0.878
	RF	0.922	0.928	0.907	0.891	0.928
	kNN	0.804	0.863	0.839	0.821	0.863
	AB	0.800	0.935	0.928	0.924	0.935
	NN	0.500	0.878	0.821	0.770	0.878

Table 11. Confusion matrix of ML algorithms.

Model		Predicted Observation				Sum
Model	Actual Observations	Completed	Injured	Material Damage	Nonapplicable	Sum
SVM	Completed	121.0	0.00	0.00	1.00	122.0
	Injured	5.000	0.00	0.00	0.00	5.000
	Material Damage	0.000	0.00	0.00	0.00	0.000
	Nonapplicable	12.00	0.00	0.00	0.00	12.00
NN	Completed	122.0	0.00	0.00	0.00	122.0
	Injured	5.000	0.00	0.00	0.00	5.000
	Material Damage	0.000	0.00	0.00	0.00	0.000
	Nonapplicable	12.00	0.00	0.00	0.00	12.00
AB	Completed	121.0	1.00	0.00	0.00	122.0
	Injured	4.000	0.00	0.00	1.00	5.000
	Material Damage	0.000	0.00	0.00	0.00	0.000
	Nonapplicable	1.000	0.00	0.00	11.0	12.00
kNN	Completed	119.0	0.00	0.00	3.00	122.0
	Injured	5.000	0.00	0.00	0.00	5.000
	Material Damage	0.000	0.00	0.00	0.00	0.000
	Nonapplicable	8.000	0.00	0.00	4.00	12.00
RF	Completed	122.0	0.00	0.00	0.00	122.0
	Injured	3.000	0.00	0.00	2.00	5.000
	Material Damage	0.000	0.00	0.00	0.00	0.000
	Nonapplicable	4.000	0.00	0.00	8.00	12.00
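
The summary scores in Table 10 can be related to the confusion matrices in Table 11. The sketch below assumes that precision, recall, and F1 are support-weighted averages over the four classes (the source does not state the averaging scheme) and that a class with no predictions or no instances contributes a score of zero. The function name `weighted_scores` is illustrative.

```python
def weighted_scores(matrix):
    """Return (accuracy, precision, recall, F1) from a confusion matrix.

    matrix[i][j] is the count of actual class i predicted as class j.
    Precision, recall, and F1 are averaged over classes, weighted by support.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(k)) / total
    precision = recall = f1 = 0.0
    for i in range(k):
        tp = matrix[i][i]
        support = sum(matrix[i])                          # actual instances of class i
        predicted = sum(matrix[r][i] for r in range(k))   # predictions of class i
        p = tp / predicted if predicted else 0.0          # assumption: 0 when undefined
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1

# NN block of Table 11.
# Rows and columns: Completed, Injured, Material Damage, Nonapplicable.
nn = [
    [122, 0, 0, 0],
    [5, 0, 0, 0],
    [0, 0, 0, 0],
    [12, 0, 0, 0],
]
acc, prec, rec, f1 = weighted_scores(nn)
print(f"CA={acc:.3f} F1={f1:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Under these assumptions the NN matrix yields CA 0.878, F1 0.821, precision 0.770, and recall 0.878, which agrees with the NN row of Table 10. It also shows why a model that assigns every trip to the Completed class can still score well on these averages when that class dominates the test data.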

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

İnaç, H. Micro-Mobility Sharing System Accident Case Analysis by Statistical Machine Learning Algorithms. Sustainability 2023, 15, 2097. https://doi.org/10.3390/su15032097

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
