Author Contributions
Conceptualization, R.R.-M. and J.M.C.-P.; methodology, R.R.-M.; software, R.R.-M.; validation, R.R.-M., H.G.-R., E.S.-F., J.S.-P., T.I.-P., L.C.R.-G., O.A.G.-B., J.I.G.-T., C.E.G.-T. and H.L.-G.; formal analysis, R.R.-M.; investigation, R.R.-M.; resources, R.R.-M. and J.M.C.-P.; data curation, R.R.-M., H.G.-R., E.S.-F., J.S.-P., T.I.-P., L.C.R.-G., O.A.G.-B., J.I.G.-T., C.E.G.-T. and H.L.-G.; writing–original draft preparation, R.R.-M.; writing–review and editing, R.R.-M. and J.M.C.-P.; visualization, R.R.-M.; supervision, J.M.C.-P.; project administration, J.M.C.-P. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Visual examples of the traffic sign classes used in the dataset: (a) Right Turn, (b) Straight Ahead, (c) Left Turn, (d) No Entry, (e) Dead End, (f) Stop.
Figure 2.
Track layout used for data collection. Traffic signs were positioned at key intersections and corners. The environment simulates a reduced-scale urban driving scenario with realistic perception conditions.
Figure 3.
Hardware architecture of the 1:10 autonomous vehicle.
Figure 4.
Global mean Average Precision (mAP@50–95) of YOLOv8–YOLOv11 models on COCO. Each bar shows the mean detection accuracy over all traffic-sign categories on the validation set. YOLOv10B obtains the best overall performance, offering the best trade-off between accuracy and computational cost on both datasets under identical training settings.
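For reference, the mAP@50–95 metric reported in Figures 4–11 follows the COCO convention of averaging the per-class Average Precision over ten IoU thresholds:

\[
\mathrm{mAP@50\text{–}95} \;=\; \frac{1}{|C|}\sum_{c \in C}\;\frac{1}{10}\sum_{t \in \{0.50,\,0.55,\,\ldots,\,0.95\}} \mathrm{AP}_c(t),
\]

where \(C\) is the set of traffic-sign classes and \(\mathrm{AP}_c(t)\) is the Average Precision of class \(c\) at IoU threshold \(t\).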
Figure 5.
Average mAP@50–95 across all traffic sign classes for each YOLO variant (YOLOv8–YOLOv11, all scales).
Figure 6.
Per-model detection performance for Class 0 (Left Turn). Bars indicate mean Average Precision (mAP@50–95) for each model. Most architectures reach values above 0.94, with YOLOv10B standing out for its strong generalization in visually consistent symbols.
Figure 7.
Per-model detection performance for Class 1 (Forward). Lower mAP@50–95 values show that this category remains the most challenging. Visual similarity with Classes 0 and 2 leads to frequent confusion among models, suggesting that additional data augmentation may help.
Figure 8.
Per-model performance for Class 2 (Right Turn). The results show larger fluctuations across mid-sized models due to the rotational symmetry of this sign. Specialized augmentation strategies could further improve robustness for these cases.
Figure 9.
Per-model performance for Class 3 (Dead End). Almost all YOLO versions achieve very high precision, confirming the robustness of detection for signs with distinctive geometry and strong color contrast.
Figure 10.
Per-model performance for Class 4 (No Entry). All architectures exhibit high mAP@50–95, typically above 0.97. This strong performance arises from the clear color separation and stable shape of the sign.
Figure 11.
Per-model detection accuracy for Class 5 (Stop). The majority of models reach saturation near 1.0 mAP@50–95, showing that this symbol—with its distinctive red octagon and white text—is easily recognized under various conditions.
Figure 12.
Average preprocessing time across YOLOv8–YOLOv11 architectures. This stage includes image normalization, resizing, and data loading operations before inference. Preprocessing time remains below 0.6 s for the compact models YOLOv11N and YOLOv8M, while it exceeds 1.3 s for heavier versions such as YOLOv10B and YOLOv8X due to the overhead introduced by tensor reshaping.
Figure 13.
Inference time comparison of YOLOv8–YOLOv11 models. YOLOv10N and YOLOv8N achieve sub-second inference, suitable for real-time scenarios. YOLOv8X and YOLOv10L exhibit slower performance, exceeding 7 s per frame. Inference remains the main latency source in embedded deployment.
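The per-stage latencies summarized in Figures 12–14 can be read directly from the detector's output. The following is a minimal sketch assuming the Ultralytics Python API; the weights path and test image below are placeholders, not the exact evaluation script.

```python
from ultralytics import YOLO

# Load a trained detector (path is a placeholder).
model = YOLO("runs/detect/train/weights/best.pt")

# Run prediction on a sample frame; Results.speed reports per-image
# times in milliseconds for the three pipeline stages.
results = model.predict("sample_frame.jpg", imgsz=640, verbose=False)
speed = results[0].speed  # {'preprocess': ..., 'inference': ..., 'postprocess': ...}

print(f"preprocess:  {speed['preprocess']:.2f} ms")
print(f"inference:   {speed['inference']:.2f} ms")
print(f"postprocess: {speed['postprocess']:.2f} ms")
```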
Figure 14.
Postprocessing time distribution per model. This phase includes confidence filtering and non-maximum suppression (NMS) to refine bounding boxes. Smaller models such as YOLOv10N and YOLOv8S remain under 0.25 s, while larger ones like YOLOv11X and YOLOv9S approach 0.52 s. Overall, postprocessing contributes minimally to total delay compared with inference time.
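As a minimal sketch of the confidence filtering and NMS step referenced in Figure 14 (assuming torchvision is available; the thresholds shown are illustrative defaults, not the values used in this work):

```python
import torch
from torchvision.ops import nms

def filter_detections(boxes, scores, conf_thres=0.25, iou_thres=0.45):
    """Keep boxes above a confidence threshold, then apply NMS.

    boxes:  (N, 4) tensor in (x1, y1, x2, y2) format
    scores: (N,) tensor of confidence scores
    """
    keep = scores > conf_thres           # confidence filtering
    boxes, scores = boxes[keep], scores[keep]
    idx = nms(boxes, scores, iou_thres)  # non-maximum suppression
    return boxes[idx], scores[idx]
```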
Figure 15.
Complementary metrics describing detection reliability. Left: the F1-score for each model, which balances precision and recall. Right: false-positive rate per model. All tested YOLO architectures achieve F1-scores over 0.99 and false-positive rates below 0.015. This demonstrates uniform detection performance across versions, suggesting that even compact models maintain robustness under the same evaluation settings.
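For clarity, the F1-score shown in Figure 15 combines precision and recall in the usual way, with TP, FP, and FN denoting true positives, false positives, and false negatives:

\[
\mathrm{Precision}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}},\qquad
\mathrm{Recall}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}},\qquad
F_1=\frac{2\cdot\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}.
\]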
Figure 16.
Final training loss achieved after convergence. Lower values indicate smoother optimization and stronger generalization. YOLOv10B and YOLOv9S yield the lowest residual losses, while YOLOv8M and YOLOv10L exhibit higher remaining error, suggesting sensitivity to overfitting in larger parameter spaces.
Figure 17.
Total training time required for each YOLO version under identical experimental conditions. Lightweight models (YOLOv8N, YOLOv10N) complete training in less than one hour, whereas high-capacity models (YOLOv8X, YOLOv10B) exceed 1.2 h. This contrast reflects the computational trade-off between scalability and training efficiency.
Figure 18.
Experimental setup used for traffic sign recognition. The figure shows a 1:10 scale autonomous vehicle equipped with an Intel RealSense D435i camera mounted on a vertical support. The test track is made of a black surface with white lane markings, and includes printed traffic signs such as “Proceed Forward” and “Stop”, positioned at different locations. The system operates in real time, with detections displayed on the monitor.
Figure 19.
Performance metrics during detection of the “Left Turn” traffic sign: (a) FPS vs. Time, (b) CPU Usage vs. Time, (c) RAM Usage vs. Time, (d) Inference Time vs. Time, (e) CPU Frequency vs. Time. Across all performance measures, the system operated satisfactorily and maintained robust detection.
Figure 20.
Performance metrics during detection of the “Forward” traffic sign: (a) FPS vs. Time, (b) CPU Usage vs. Time, (c) RAM Usage vs. Time, (d) Inference Time vs. Time, (e) CPU Frequency vs. Time. During real-time embedded testing, the system performed well in terms of detection accuracy and stability.
Figure 21.
Performance metrics during detection of the “Right Turn” traffic sign: (a) FPS vs. Time, (b) CPU Usage vs. Time, (c) RAM Usage vs. Time, (d) Inference Time vs. Time, (e) CPU Frequency vs. Time. The system maintained real-time detection with high reliability and hardware stability.
Figure 22.
Performance metrics during detection of the “Dead End” traffic sign: (a) FPS vs. Time, (b) CPU Usage vs. Time, (c) RAM Usage vs. Time, (d) Inference Time vs. Time, (e) CPU Frequency vs. Time. Stable hardware utilization and consistent detection were observed.
Figure 23.
Performance metrics during detection of the “No Entry” traffic sign: (a) FPS vs. Time, (b) CPU Usage vs. Time, (c) RAM Usage vs. Time, (d) Inference Time vs. Time, (e) CPU Frequency vs. Time. Results confirm stable inference and precise classification despite visual similarity to other classes.
Figure 24.
Performance metrics during detection of the “Stop” traffic sign: (a) FPS vs. Time, (b) CPU Usage vs. Time, (c) RAM Usage vs. Time, (d) Inference Time vs. Time, (e) CPU Frequency vs. Time. Results show consistent behavior and perfect classification accuracy, affirming the model’s capability to detect highly distinctive signs.
Figure 25.
Power consumption of the embedded system during a continuous 180 s experiment (watts).
Table 1.
Technical specifications of the dataset acquisition.
| Parameter | Value |
|---|---|
| Camera model | Intel RealSense D435i |
| Color format | BGR (8-bit per channel) |
| Image resolution | 640 × 480 pixels |
| Capture method | OpenCV (Python API) |
| Frame rate | 30 FPS |
| Number of images | 9000 |
| Traffic sign classes | 6 (official European designs) |
| Angles of view | , , |
| Distances from camera | 20 cm, 40 cm |
| Track type | Indoor modular, 1:10 scale urban layout |
| Vehicle state | Static during acquisition |
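Table 1 states that frames were captured with OpenCV's Python API at 640 × 480 and 30 FPS. A minimal sketch consistent with those settings is shown below; the device index and output directory are assumptions for illustration, not the exact acquisition script.

```python
import cv2

# Open the RealSense D435i as a standard color camera (device index is an assumption).
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
cap.set(cv2.CAP_PROP_FPS, 30)

frame_id = 0
while frame_id < 9000:               # number of images listed in Table 1
    ok, frame = cap.read()           # frame is BGR, 8-bit per channel
    if not ok:
        break
    cv2.imwrite(f"dataset/img_{frame_id:05d}.png", frame)
    frame_id += 1

cap.release()
```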
Table 2.
Traffic sign classes used in the dataset.
| Class ID | Description |
|---|---|
| 0 | Left Turn |
| 1 | Straight Ahead |
| 2 | Right Turn |
| 3 | Dead End |
| 4 | No Entry |
| 5 | Stop |
Table 3.
Cloud-based training environment specifications.
| Component | Specification |
|---|---|
| Platform | RunPod.io |
| GPU | NVIDIA RTX 4090 (24 GB VRAM) |
| CPU | Intel Xeon Platinum 8375C |
| RAM | 64 GB DDR4 |
| OS | Ubuntu 22.04 LTS (64-bit) |
| Frameworks | PyTorch 2.0.1, Python 3.10 |
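As an illustrative sketch of how each YOLO variant can be trained in this environment with the Ultralytics API (the dataset YAML path, epoch count, and batch size are placeholders, not the paper's exact hyperparameters):

```python
from ultralytics import YOLO

# The same call applies to the other scales/versions
# (e.g., "yolov9s.pt", "yolov10b.pt", "yolo11n.pt").
model = YOLO("yolov8n.pt")

model.train(
    data="traffic_signs.yaml",  # dataset definition with the 6 classes of Table 2 (placeholder path)
    imgsz=640,
    epochs=100,                 # placeholder value
    batch=16,                   # placeholder value
    device=0,                   # single NVIDIA RTX 4090
)
```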
Table 4.
Onboard embedded system specifications.
| Component | Specification |
|---|---|
| Model | TRIGKEY S5 Mini PC |
| CPU | AMD Ryzen 7 5700U (8 cores, 1.8–4.3 GHz) |
| RAM | 12.6 GB DDR4 |
| GPU | Integrated AMD Radeon |
| OS | Ubuntu 20.04.6 LTS |
Table 5.
Real-time embedded inference evaluation metrics.
| Metric | Description |
|---|---|
| Inference Latency | Processing time per frame |
| Live FPS | Actual frames per second from camera feed |
| CPU Load | Average CPU utilization |
| Memory Usage | RAM consumption of the full pipeline |
| Recognition Rate | Percentage of correct sign detections |
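A minimal sketch of how the runtime metrics in Table 5 (and the traces in Figures 19–24) can be logged on the embedded computer, assuming the psutil package; the sampling interval and duration are illustrative. Per-frame inference latency and live FPS are measured inside the detection loop itself, e.g., by timing each model call.

```python
import time
import psutil

def log_system_metrics(duration_s=180, interval_s=1.0):
    """Sample CPU load, RAM usage, and CPU frequency at a fixed interval."""
    samples = []
    start = time.time()
    while time.time() - start < duration_s:
        samples.append({
            "t": time.time() - start,
            "cpu_percent": psutil.cpu_percent(interval=None),
            "ram_percent": psutil.virtual_memory().percent,
            "cpu_freq_mhz": psutil.cpu_freq().current,
        })
        time.sleep(interval_s)
    return samples
```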
Table 6.
Core hardware components of the vehicle.
| Component | Description |
|---|---|
| Chassis | 1:10 autonomous platform |
| TRIGKEY S5 | Embedded computer |
| RealSense D435i | RGB-D camera |
| FSESC 5.7 Pro | Motor controller (VESC) |
| Step-Down Regulator | Voltage conversion module |
| LiPo Battery | 4S, power source |
Table 7.
Technical specifications of the TRIGKEY S5.
| Component | Specification |
|---|---|
| Processor | AMD Ryzen 7 5700U (8 cores) |
| Graphics | AMD Radeon (Integrated) |
| RAM | 12.6 GB DDR4 |
| Storage | 1 TB NVMe SSD |
| OS | Ubuntu 20.04.6 LTS |
| Connectivity | WiFi 6/Bluetooth 5.2 |
Table 8.
Summary of main detection metrics (mAP@50–95, F1-score, and Inference FPS) for the best-performing YOLO models.
| Model | mAP@50–95 | F1-Score | FPS |
|---|---|---|---|
| YOLOv10B | 0.9797 | 0.972 | 17.6 |
| YOLOv9S | 0.9796 | 0.969 | 18.4 |
| YOLOv8M | 0.9782 | 0.966 | 20.1 |