1. Introduction
Climate change is one of the most serious threats of our time, leading to more frequent extreme weather events and rising global temperatures and sea levels, driven by increased greenhouse gas emissions and deteriorating air quality. Whatever its causes, climate change undeniably has devastating effects on all living things [1,2]. Decision makers have focused on sectors such as transportation that can be regulated or improved to reduce the effects of climate change. In particular, particulate matter (PM2.5 and PM10) produced by burning fossil fuels, which are widely used as energy sources in the transportation sector, absorbs sunlight, contributes to global warming, and disrupts the climate balance [3,4]. According to data from the United Nations Framework Convention on Climate Change, the transportation sector accounted for 28% of greenhouse gas emissions among economic sectors in the USA in 2022 [5]. According to another climate change inventory report, the corresponding share in the European Union was 29% [5]. Reducing these emissions and mitigating the effects of climate change requires reducing dependence on fossil fuels [6]. This, in turn, requires sustainable strategies involving radical technological, structural, and behavioral changes in transportation modes, vehicle technologies, and energy sources [7,8]. Sustainable transport plays a vital role in combating climate change in cities, enabling more economically and socially livable urban areas to exist and be maintained in the long term. One way to increase the sustainability of the transportation sector is to expand and promote new types of vehicles and fuels [8,9]. The main objectives of sustainable transportation systems include favoring vehicles powered by renewable energy for long distances, encouraging walking for short distances, and favoring micro-mobility vehicles (bicycles, e-bikes, e-scooters, hoverboards, skates, etc.) for medium distances. Micro-mobility sharing systems, especially bicycles and scooters, are known to contribute significantly to the spread of sustainable transportation [10,11]. Individuals who use micro-mobility vehicles for transportation can complete the remainder of their trip on foot [12,13]. Cycling has been emphasized as a fundamental means of reducing air pollutants from the transportation sector [9,14,15]. However, micro-mobility users, along with pedestrians, are considered the most fragile and vulnerable road users [13]. Micro-mobility vehicles are less likely to be used for transportation because of users' perceived safety concerns and serious conflicts with other traffic participants [16,17,18,19]. A study conducted in Germany [20] highlighted the importance of addressing the safety of electric micro-mobility vehicles, reporting serious injury rates of 13.2% and 17.7% among e-scooter and e-bike users, respectively, compared with 5.3% among users of human-powered bicycles. A study [21] examining George Washington Hospital Emergency Department data found injury rates of 12.9% and 3.42% per million kilometers for e-scooter and bicycle users, respectively. A study conducted in Singapore [22], using Singapore National Trauma Registry data from 2015 to 2017, found that users of electric motorized devices (such as e-scooters and e-bikes) had a three times higher risk of serious injury than users of non-motorized devices (such as kick-scooters and skateboards). In general, researchers have conducted various studies on the safety of micro-mobility users and have revealed significant challenges in improving it [23,24,25,26,27,28,29,30].
Researchers are applying various techniques to reduce or eliminate safety concerns about micro-mobility vehicles. Machine learning, deep learning, and statistical methods are frequently used to investigate and predict conflicts between micro-mobility vehicles and other vehicles [31,32,33,34,35], and computer vision methods are frequently used to recognize micro-mobility vehicles and detect protective equipment in order to examine micro-mobility safety under real-world conditions [36,37,38,39,40,41,42,43,44,45]. Computer vision applications, which account for current physical conditions given the dynamic nature of traffic, allow instant intervention, and enable violations to be detected very quickly, can help alleviate the safety concerns of micro-mobility users. However, computer vision applications, including object detection as one of their sub-areas, face serious challenges [46,47,48]. Multi-scale training, the detection of relatively small objects, and the need for large datasets when only small ones are available are among the challenges that can impose stringent performance constraints on object recognition algorithms [49]. To increase the success of algorithms that perform real-time object recognition, the dataset used must be sufficiently large and of high quality [48,50].
To autonomously identify micro-mobility vehicles and instantly detect the use of protective equipment, the datasets used by object recognition algorithms must overcome many limitations. Because traffic accidents that threaten human life are irreversible, it is imperative that the success rate of object recognition, especially in this area, be as high as possible. In this context, data augmentation techniques are frequently used to overcome dataset limitations in object recognition [51,52,53,54,55,56]. Many studies in the literature use data augmentation, and some data augmentation methods have been developed with complex software [52,57,58,59,60]. However, in most multidisciplinary studies, traditional data augmentation techniques remain the accepted approach [61,62,63,64,65]. Traditional techniques may be more advantageous than complex ones in terms of the ease and speed of processing the dataset. Various studies have examined the effect of a single complex data augmentation technique on model performance [66,67,68,69]. Studies in various fields (health, environment, agriculture, transportation, etc.) [70,71,72,73,74,75,76,77,78,79,80,81,82] observed that applying data augmentation techniques in object detection improved model performance. However, these studies did not examine traditional data augmentation techniques one by one, and they did not clearly reveal which technique had the greatest effect on model performance or how a Multi-Model Ensemble (MME) built from the most effective data augmentation techniques contributed. This leaves researchers asking “which data augmentation technique should I use?” and forces them to augment their datasets manually, increasing the time and workload of research and possibly even reducing motivation.
Traditional data augmentation techniques fall under two main headings, Image Level Augmentation (ILA) and Bounding Box Level Augmentation (BBLA), and comprise 22 data augmentation approaches (Flip, 90° Rotate, Crop, Rotation, Shear, GrayScale, Hue, Saturation, Brightness, Exposure, Blur, Noise, Cutout, Mosaic, etc.). Examining each traditional technique against model performance outputs (mAP, F1-Score, Recall, Precision, inference speed, GFLOPs, IoU, etc.) is challenging for researchers because the outcome depends on many interacting, dynamic variables. Evaluating traditional data augmentation techniques on a scientific basis according to more than one performance metric is a multi-dimensional and complex process that requires solving a multi-criteria problem. In addition, determining the MME of the most effective data augmentation techniques involves stages that require in-depth analysis.
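The distinction between the two families can be made concrete with a minimal NumPy sketch. It assumes, for illustration only, that ILA transforms the whole image (updating box coordinates when the geometry changes) while BBLA applies a photometric change only to the pixels inside each bounding box. The function names and the pixel (x_min, y_min, x_max, y_max) box format are hypothetical and are not the tooling used in this study.

```python
import numpy as np

def ila_hflip(image, boxes):
    """Image-level horizontal flip: mirror every pixel and the box coordinates.

    image: H x W x C array; boxes: N x 4 array of (x_min, y_min, x_max, y_max) in pixels.
    """
    width = image.shape[1]
    flipped = image[:, ::-1].copy()
    new_boxes = boxes.astype(float)
    # After mirroring, the old x_max becomes the new x_min and vice versa.
    new_boxes[:, 0] = width - boxes[:, 2]
    new_boxes[:, 2] = width - boxes[:, 0]
    return flipped, new_boxes

def ila_brightness(image, factor):
    """Image-level brightness: scale all pixels; box coordinates are unchanged."""
    return np.clip(image.astype(float) * factor, 0, 255).astype(np.uint8)

def bbla_brightness(image, boxes, factor):
    """Box-level brightness: scale only the pixels inside each bounding box.

    In this simple sketch, overlapping regions are scaled once per box.
    """
    out = image.astype(float)
    for x0, y0, x1, y1 in boxes.astype(int):
        out[y0:y1, x0:x1] *= factor
    return np.clip(out, 0, 255).astype(np.uint8)
```

Geometric ILA operations must keep the labels consistent with the transformed image, whereas box-level photometric operations leave coordinates untouched and change only the object appearance.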
This study addresses the critical role of data augmentation techniques in increasing the use and user safety of micro-mobility vehicles. A You Only Look Once (YOLO) model that can detect micro-mobility vehicles and protective equipment was developed, and 22 ILA and BBLA data augmentation approaches were applied one by one to improve the model's performance and reduce the workload of dataset creation and labeling. To scientifically determine which data augmentation approach improves model performance the most, the Preference Ranking Organization Method for Enrichment Evaluations (PROMETHEE), a Multi-Criteria Decision-Making (MCDM) method, was used. The data augmentation techniques with the greatest impact on model performance were then identified with the K-means clustering–Elbow method, yielding the MME. This identifies which data augmentation MME should be used in YOLO-based object detection studies concerned with the safety of micro-mobility vehicles. The main motivation of the study is that, although micro-mobility vehicles are increasingly used in urban transportation owing to changing travel habits, the difficulties in vehicle identification and user safety continue to grow. A further motivation is that current object detection algorithms cannot detect commonly used micro-mobility vehicles and protective equipment use in a holistic way. This study aims to contribute to the safer and more effective use of micro-mobility vehicles in urban transportation.
3. Results
This section presents the performance analysis of the YOLOv12 model, the latest development in the YOLO family, which offers state-of-the-art object detection through various architectural improvements. A total of 46 models (for each of D1_Micro-mobility and D2_Helmet_Detection, one without augmentation and 22 with individual data augmentation techniques) were evaluated using the basic metrics Precision, Recall, F1-Score, mAP@0.5, and mAP@0.5:0.95. To determine the effectiveness of the data augmentation techniques on model performance, the results were evaluated with the PROMETHEE method. Based on the resulting ranking, the optimal number of clusters was determined with the K-means clustering–Elbow method, and the MME most effective for YOLOv12 was identified.
3.1. Evaluation of Models
All YOLOv12 models were trained and evaluated in Google Colab using the Python 3 runtime and the freely available T4 GPU hardware accelerator. The models were trained for 100 epochs with a batch size of 8, using the Adaptive Moment Estimation (Adam) optimizer with lr = 0.001, momentum = 0.937, and decay = 0.0005. The models were evaluated at an IoU threshold of 0.5 and a confidence threshold of 0.25, and the results are presented in Table 3 and Table 4.
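As a hedged illustration of what evaluating at a 0.5 IoU threshold and a 0.25 confidence threshold typically means, the sketch below discards detections below the confidence threshold and counts a remaining detection as a true positive when it overlaps an unmatched ground-truth box of the same class with IoU ≥ 0.5. Greedy matching in descending confidence order is a common convention assumed here; the exact matching rules of the evaluation code used in the study may differ, and the function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_detections(detections, ground_truths, conf_thr=0.25, iou_thr=0.5):
    """Greedy matching of detections to same-class ground-truth boxes.

    detections: list of (class_id, confidence, box); ground_truths: list of (class_id, box).
    Returns (tp, fp, fn) counts.
    """
    kept = sorted((d for d in detections if d[1] >= conf_thr),
                  key=lambda d: d[1], reverse=True)
    matched = set()
    tp = fp = 0
    for cls, _, box in kept:
        best_j, best_iou = None, iou_thr
        for j, (gt_cls, gt_box) in enumerate(ground_truths):
            if j in matched or gt_cls != cls:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            fp += 1  # no sufficiently overlapping, unmatched ground truth
        else:
            matched.add(best_j)
            tp += 1
    fn = len(ground_truths) - len(matched)  # objects the detector missed
    return tp, fp, fn
```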
Examining the model performance outputs in Table 3 and Table 4 separately for each dataset, the best Precision was obtained with the D1_ILA_Hue dataset (Hue technique) and the D2_ILA_Rotation dataset (Rotation technique), and the worst with the D1_BBLA_Noise dataset (Noise technique) and the D2_ILA_Cutout dataset (Cutout technique). For Recall, the best performance belongs to the D1_BBLA_Exposure and D2_ILA_Flip datasets, and the worst to the D1_No Augmentation and D2_ILA_90’Rotate datasets. For mAP@0.5, the best-performing datasets are D1_BBLA_Noise and D2_ILA_Saturation, and the worst are D1_No Augmentation and D2_ILA_90’Rotate. For mAP@0.5:0.95, the best performance belongs to D1_ILA_Noise and D2_BBLA_Blur, and the worst to the D1_No Augmentation and D2_ILA_Blur datasets. For F1-Score, the best performance for D1_Micro-mobility belongs to the D1_ILA_Hue and D1_BBLA_Exposure datasets and the worst to D1_No Augmentation; for D2_Helmet detection, the best belongs to D2_ILA_GrayScale and the worst to D2_ILA_Blur.
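Because different datasets win on different metrics, a compact reminder of how the metrics relate may help. The sketch below uses the standard definitions Precision = TP/(TP+FP) and Recall = TP/(TP+FN), with F1 as their harmonic mean, and computes mAP@0.5:0.95 as the mean AP over the ten IoU thresholds 0.50 to 0.95 in steps of 0.05 (the COCO-style convention, assumed here). The `ap_at_threshold` callable is a hypothetical placeholder for whatever evaluator produces AP values.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean: high only when Precision and Recall are both high.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def map_50_95(ap_at_threshold):
    """Mean of AP over IoU thresholds 0.50, 0.55, ..., 0.95.

    ap_at_threshold: hypothetical callable mapping an IoU threshold to mAP at that threshold.
    """
    thresholds = [0.5 + 0.05 * i for i in range(10)]
    return sum(ap_at_threshold(t) for t in thresholds) / len(thresholds)
```

This is why a dataset can lead on Precision or Recall alone yet not on F1-Score, and why mAP@0.5:0.95, which rewards tight localization, can rank datasets differently from mAP@0.5.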
The results must be evaluated holistically, considering all performance metrics together. This makes determining the best-performing dataset a Multi-Criteria Decision-Making problem. Accordingly, all performance results were evaluated with the PROMETHEE method.
3.2. Comparison of YOLO Detection Results
Model performance on each dataset was evaluated with the PROMETHEE method, using the Precision, Recall, F1-Score, mAP@0.5, and mAP@0.5:0.95 metrics to rank the alternatives. All metrics were treated as criteria to be maximized. PROMETHEE II was used to obtain a complete ranking. The preference thresholds (p) and indifference thresholds (q) were set as follows:
p = 0.025–0.030; q = 0.025–0.013 for the Precision metric,
p = 0.062–0.094; q = 0.025–0.040 for the Recall metric,
p = 0.053–0.071; q = 0.022–0.031 for the mAP@0.5 metric,
p = 0.053–0.053; q = 0.022–0.023 for the mAP@0.5:0.95 metric, and
p = 0.056–0.056; q = N/A for the F1-Score metric.
The V-shape preference function was used for the F1-Score metric and the Linear preference function for all other metrics. The q and p values were determined by taking into account the distribution of the alternative models' performance values. The ranking results according to net flow values are presented in Table 5.
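The PROMETHEE II procedure described above can be sketched as follows. The sketch assumes equal criterion weights (weights are not stated in this section) and the standard textbook forms of the Linear preference function (indifference threshold q, preference threshold p) and the V-shape preference function (preference threshold p only). For each ordered pair of alternatives, per-criterion differences are mapped to preference degrees, aggregated with the weights, and turned into positive, negative, and net outranking flows; sorting by net flow gives the complete ranking. Function names are illustrative, not the software used in the study.

```python
import numpy as np

def linear_pref(d, q, p):
    """Linear preference: 0 up to q, rising linearly to 1 at p (handles q == p)."""
    if d <= q:
        return 0.0
    if d >= p:
        return 1.0
    return (d - q) / (p - q)

def v_shape_pref(d, p):
    """V-shape preference: rises linearly from 0 at d = 0 to 1 at d = p."""
    if d <= 0:
        return 0.0
    return min(d / p, 1.0)

def promethee_ii(scores, weights, pref_funcs):
    """Net outranking flows for alternatives (rows) on criteria to be maximized (columns)."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    n = scores.shape[0]
    pi = np.zeros((n, n))  # pi[a, b]: aggregated preference of alternative a over b
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            diffs = scores[a] - scores[b]
            pi[a, b] = sum(w * f(d) for w, f, d in zip(weights, pref_funcs, diffs))
    phi_plus = pi.sum(axis=1) / (n - 1)   # how strongly a outranks the others
    phi_minus = pi.sum(axis=0) / (n - 1)  # how strongly a is outranked
    return phi_plus - phi_minus

if __name__ == "__main__":
    # Three hypothetical alternatives scored on two hypothetical criteria.
    scores = [[0.80, 0.70], [0.75, 0.72], [0.70, 0.60]]
    prefs = [lambda d: linear_pref(d, q=0.01, p=0.05), lambda d: v_shape_pref(d, p=0.05)]
    net = promethee_ii(scores, [1, 1], prefs)
    print(net)                   # approximately [0.65, 0.35, -1.0]
    print(np.argsort(-net))      # complete ranking, best first
```

Net flows always sum to zero, so they express relative standing among the alternatives rather than absolute quality.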
According to the ranking, the dataset that improved model performance the most was D1_BBLA_Exposure for D1_Micro-mobility and D2_ILA_Saturation for D2_Helmet detection. The dataset with the least impact on performance for D1_Micro-mobility was D1_BBLA_Crop. For D2_Helmet detection, some data augmentation techniques worsened model performance compared with the D2_No Augmentation dataset. For D1_Micro-mobility, the model trained on D1_BBLA_Exposure improved on the D1_No Augmentation model by 9.23% in Precision, 18.26% in Recall, 17.18% in mAP@0.5, 17.84% in mAP@0.5:0.95, and 16.67% in F1-Score. For D2_Helmet detection, the model trained on D2_ILA_Saturation improved on the D2_No Augmentation model by 0.3% in Precision, 5.05% in mAP@0.5, 5.99% in mAP@0.5:0.95, and 9.88% in F1-Score, while Recall did not improve.
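The improvement percentages above are stated relative to the no-augmentation baseline, but the formula is not given. The sketch below assumes relative change, (new − baseline) / baseline × 100, rather than a difference in percentage points; the metric values in the test are hypothetical.

```python
def relative_change_percent(baseline, new):
    """Percentage change of a metric relative to the baseline (no-augmentation) value."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0

def compare_to_baseline(baseline_metrics, new_metrics):
    """Relative change (rounded to 2 decimals) for every metric present in the baseline."""
    return {name: round(relative_change_percent(baseline_metrics[name], new_metrics[name]), 2)
            for name in baseline_metrics}
```

Under this assumption, a metric rising from 0.50 to 0.60 counts as a 20% improvement, whereas a percentage-point reading would report it as 10 points; stating the convention explicitly avoids that ambiguity.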
The effect of a dataset built with a single data augmentation technique may be limited. Therefore, the most effective combined dataset should be formed from the group of datasets with the greatest effect on model performance, which requires determining the number of clusters. Accordingly, the optimal number of clusters of data augmentation techniques was determined within the scope of the study.
3.3. Determining the MME of Data Augmentation Techniques That Improve YOLO Model Performance
Based on the YOLOv12 model, the net flow values obtained with the PROMETHEE method were analyzed in detail to evaluate the relative advantages of the 25 datasets and to select the datasets that increased model performance the most. To determine the MME, the K-means clustering–Elbow method was applied to find the optimal number of clusters of data augmentation techniques. By analyzing the relationship between the total clustering error and the number of clusters, the “elbow” point, where the decrease in error slows down significantly, was identified, giving the optimal number of data augmentation technique groups. The error graph obtained with the Elbow method is presented in Figure 2.
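The clustering step can be sketched as below: a plain NumPy k-means (Lloyd's algorithm) on the one-dimensional PROMETHEE net flows, the within-cluster sum of squares (WCSS) as the clustering error, and an automated elbow pick. The study located the elbow on the graph; the farthest-point-from-the-chord rule used here is one common heuristic assumed for illustration, and treating the cluster with the highest mean net flow as the MME is our reading of the procedure. All names and values are hypothetical.

```python
import numpy as np

def kmeans_1d(values, k, n_iter=100):
    """Lloyd's k-means on 1-D data; returns (labels, centroids, wcss)."""
    x = np.asarray(values, dtype=float)
    # Deterministic start: centroids at the centres of k equal-probability quantile bins.
    centroids = np.quantile(x, np.linspace(0, 1, 2 * k + 1)[1::2])
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        # Keep a centroid in place if its cluster becomes empty.
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
    wcss = float(np.sum((x - centroids[labels]) ** 2))
    return labels, centroids, wcss

def elbow_k(wcss_by_k):
    """k (1-based) farthest from the line joining the first and last points of the
    (k, WCSS) curve, after scaling both axes to [0, 1]. Needs >= 3 non-constant values."""
    w = np.asarray(wcss_by_k, dtype=float)
    x = np.linspace(0.0, 1.0, len(w))
    y = (w - w.min()) / (w.max() - w.min())
    # Vertical gap to the chord, proportional to the perpendicular distance.
    dist = np.abs(y - (y[0] + (y[-1] - y[0]) * x))
    return int(np.argmax(dist)) + 1

def top_cluster(names, values, k):
    """Names of the items in the cluster with the highest mean net flow."""
    labels, centroids, _ = kmeans_1d(values, k)
    best = int(np.argmax(centroids))
    return [name for name, lab in zip(names, labels) if lab == best]
```

In use, one would compute `[kmeans_1d(net_flows, k)[2] for k in range(1, K + 1)]`, pass that WCSS curve to `elbow_k`, and then call `top_cluster` with the chosen k.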
Based on Figure 2, the D1_BBLA_Exposure, D1_ILA_Noise, and D1_BBLA_Blur datasets were selected as the MME for D1_Micro-mobility, and the D2_ILA_Saturation and D2_ILA_GrayScale datasets were selected as the MME for D2_Helmet detection. The dataset created from the selected MME was then retrained with YOLOv12 using the same settings. The performance outputs of the model trained on the MME dataset are presented in Table 6.
Figure 3 shows that the D1_Micro-mobility_MME dataset outperforms the D1_No Augmentation dataset on all performance metrics, with gains in Precision (+15.29%), Recall (+16.52%), mAP@0.5 (+18.54%), mAP@0.5:0.95 (+26.50%), and F1-Score (+19.7%). Figure 4 shows that the D2_Helmet detection_MME dataset improves only Precision, by 2.36%, while Recall (−5.59%), mAP@0.5 (−5.15%), and mAP@0.5:0.95 (−3.68%) decrease and the F1-Score is unchanged. Thus, after data augmentation, Precision increased by 15.29% for the D1 dataset and 2.36% for the D2 dataset, while Recall decreased by 5.59% for the D2 dataset. This indicates that in real applications, the balance between Precision and Recall (sensitivity) should be tuned carefully to the intended use of the system. In safety-oriented scenarios in particular, Recall should be prioritized while the false alarm rate is kept under control.
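To illustrate the trade-off discussed above, the following sketch shows how the confidence threshold (0.25 in this study) moves a detector between Precision and Recall: lowering it keeps more detections, which raises Recall at the cost of more false alarms. The detections are hypothetical and assumed to be already matched to ground truth, and the function name is illustrative.

```python
def precision_recall_at(scored, threshold, n_positives):
    """Precision and Recall when keeping detections with confidence >= threshold.

    scored: list of (confidence, is_true_positive) for detections already matched to
    ground truth; n_positives: number of ground-truth objects.
    """
    kept = [is_tp for conf, is_tp in scored if conf >= threshold]
    tp = sum(kept)
    # Convention assumed here: with no detections kept, Precision is reported as 1.0.
    precision = tp / len(kept) if kept else 1.0
    recall = tp / n_positives
    return precision, recall

if __name__ == "__main__":
    scored = [(0.95, True), (0.90, True), (0.80, False),
              (0.60, True), (0.40, False), (0.30, True)]
    for thr in (0.25, 0.5, 0.85):
        print(thr, precision_recall_at(scored, thr, n_positives=5))
```

For a safety-oriented deployment, one would typically choose the lowest threshold whose false alarm rate is still acceptable, rather than the threshold that maximizes Precision.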
4. Discussion
In recent years, YOLO object detection models have been increasingly used in transportation, agriculture, environment, health, and other fields, and have become an important tool in computer vision internationally [113,114]. Before applying any object detection approach, it is essential to determine which data augmentation techniques to apply to the dataset. Accordingly, the main motivation of this study is to rank the data augmentation techniques to be used with the YOLO family, regarded as the state of the art in object detection, and to determine the MME. To this end, the effects of 22 data augmentation approaches on object detection performance were systematically investigated on two datasets (D1-Micro-mobility detection and D2-Helmet detection) using the YOLOv12 model. The model outputs were analyzed with performance metrics widely used in the literature, such as Precision, Recall, F1-Score, and mAP. The multi-dimensional performance data were evaluated holistically with the PROMETHEE method, and the most appropriate data augmentation MME was then determined with the K-means clustering–Elbow method.
The analysis results show that data augmentation techniques generally have significant, class-specific effects on model performance. Some techniques contribute consistently and positively to all performance metrics, whereas others contribute only to some. This shows that the effects of data augmentation are not limited to increasing the number of images; they also influence the model's ability to distinguish between classes, its focus on specific classes, and its generalization. Unlike the traditional single-metric evaluation approach, the PROMETHEE ranking enables a multi-dimensional performance analysis and quantifies which data augmentation approach contributes most to overall success. The K-means clustering–Elbow method provides useful guidance on building an MME for researchers and decision makers who want to combine more than one data augmentation technique.
The results show that data augmentation techniques generally improve model performance. In the MCDM ranking based on the considered performance metrics, the data augmentation techniques that increased model performance the most were D1_BBLA_Exposure and D2_ILA_Saturation. Similarly, the study by [115] showed the effectiveness of the Exposure and Saturation techniques among the data augmentation techniques considered. In this study, not all classes were detected with the same success rate in either the D1_Micro-mobility or the D2_Helmet detection dataset. Possible reasons for this difference include class imbalance in the dataset, the difficulty of visual features, similarity between classes, labeling quality, model capacity, weight sharing, and training parameters.
Limitations of the study include certain visual limitations of the datasets, the use of a single object detection approach, and the application of data augmentation techniques with fixed parameters. The dataset used in this study specifically addresses micro-mobility vehicles and safety equipment, which limits comparison with public datasets (e.g., Cityscapes, BDD100K). Such a comparison would not yield fair or meaningful results, as these datasets do not adequately represent the target objects of this study. However, comparative analysis is suggested as a future research direction, particularly as more comprehensive public micro-mobility datasets become available. In addition, this study discusses the advantages and disadvantages of the data augmentation methods used. The main advantages of data augmentation include better generalization, greater robustness to environmental changes, and a lower risk of overfitting. However, some disadvantages should also be taken into account, such as longer training time, the possibility of introducing unrealistic changes into the data, and limited effectiveness on datasets that are already large or very diverse.
Data imbalance can significantly affect model performance, especially in limited and heterogeneous datasets such as those of micro-mobility vehicles. However, the scientific literature shows that data resampling methods (oversampling, undersampling) and advanced loss functions such as class-weighted loss or focal loss can improve model performance even on imbalanced datasets. Integrating these techniques in future studies could further improve the model's generalizability and its success under real-world conditions.
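As a minimal sketch of the two loss-side remedies mentioned above, the code computes inverse-frequency class weights (total / (number of classes × class count), one common weighting scheme assumed here) and the binary focal loss of Lin et al., FL(p_t) = −α_t (1 − p_t)^γ log(p_t), which down-weights easy, well-classified examples. It is written in NumPy for clarity and is not tied to the YOLOv12 training code; α = 0.25 and γ = 2 are the commonly used defaults, not values from this study.

```python
import numpy as np

def inverse_frequency_weights(class_counts):
    """Per-class weights total / (n_classes * count): rare classes get larger weights."""
    counts = np.asarray(class_counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def binary_focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss FL = -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    y = np.asarray(targets, dtype=float)
    p_t = y * p + (1 - y) * (1 - p)               # probability of the true class
    alpha_t = y * alpha + (1 - y) * (1 - alpha)   # class-balancing factor
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))
```

With γ = 0 and α = 0.5 the focal loss reduces to half the ordinary binary cross-entropy; increasing γ shifts the training signal toward hard, often minority-class, examples.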
5. Conclusions
In this study, the effects of 22 data augmentation techniques on model performance (46 models in total: for each of D1_Micro-mobility and D2_Helmet_Detection, one model without data augmentation and 22 with individual techniques) were analyzed with a multi-criteria approach on two datasets using YOLOv12. Model outputs were evaluated with common metrics such as Precision, Recall, F1-Score, and mAP. Net flow values were obtained from the model performance values of the data augmentation techniques with the PROMETHEE method and used to rank them. The MME, consisting of the techniques with the greatest effect according to their net flow values, was determined with the K-means clustering–Elbow method.
The analysis showed that no data augmentation technique provided uniform success across all performance metrics and training classes. For D1_Micro-mobility, the most successful model was obtained with the D1_BBLA_Exposure dataset and the least successful with the D1_No Augmentation dataset. Similarly, for D2_Helmet detection, the most successful model was obtained with the D2_ILA_Saturation dataset and the least successful with the D2_ILA_90’Rotate dataset. This shows that the effectiveness of data augmentation techniques varies with the dataset and training class.
For the D1_Micro-mobility dataset, the model trained on the MME optimal set (D1_BBLA_Exposure, D1_ILA_Noise, and D1_BBLA_Blur), determined from the net flow values obtained with the PROMETHEE method, achieved an F1-Score of 0.79 and an mAP of 0.844, improvements of 19.7% in F1-Score and 18.54% in mAP over the model without augmentation. Similarly, for the D2_Helmet_Detection dataset, the model trained on the MME optimal set (D2_ILA_Saturation and D2_ILA_GrayScale) achieved an F1-Score of 0.81 and an mAP of 0.871; however, an improvement (2.36%) was observed only in the Precision metric. The study therefore goes beyond an experimental evaluation of success: by structuring the data augmentation strategy for the YOLOv12 object detection algorithm and considering the performance metrics as a whole, it provides a structured framework for decision makers and researchers.
Future studies will address these limitations by increasing the diversity of the dataset, evaluating different object detection approaches in an integrated manner, and analyzing parametric data augmentation techniques.