Optimizing Pothole Detection in Pavements: A Comparative Analysis of Deep Learning Models

Abstract: Advancements in computer vision applications have led to improved object detection (OD) in terms of accuracy and processing time, enabling real-time solutions across various fields. In pavement engineering, detecting visual defects such as potholes, cracking, and rutting is of particular interest. This study evaluates YOLO models on a dataset of 665 road pavement images labeled with potholes for OD. Pre-trained deep learning models were customized for pothole detection using transfer learning techniques. The assessed models include You Only Look Once (YOLO) versions 3, 4, and 5. YOLOv4 achieved the highest mean average precision (mAP), while its reduced version, YOLOv4-tiny, offered the shortest inference time, making it well suited to mobile applications. Furthermore, the YOLOv5s model shows potential, attaining good results and standing out for its ease of implementation and scalability.


Introduction
This paper investigates state-of-the-art computer vision (CV) techniques for detecting pavement potholes, comparing the performance of various deep learning (DL) models. Object detection (OD) methods, which identify and locate objects in images or videos, have evolved from traditional image processing techniques, such as the Viola-Jones Detector [1] and Histogram of Oriented Gradients, to DL implementations [2,3]. These DL implementations have demonstrated better performance, particularly in complex scenarios, due to their supervised learning approach and the availability of data. Hence, the community effort to create massive datasets such as MS COCO [4], PASCAL [5], and ImageNet [6] has helped the field to evolve. In addition, computational power, mainly from GPUs, increases rapidly year by year [7].
One-stage and two-stage detectors are the main categories of DL applications for OD. Two-stage detectors typically exhibit higher accuracy but are slower, while one-stage detectors are faster and more suitable for real-time applications. This article focuses on one of the best-known families of one-stage detectors: You Only Look Once (YOLO) [8]. The YOLO algorithm is a fast and accurate object detection model. It divides input images into grids for simultaneous object detection and classification. Despite lower average precision than some competitors, YOLO's detection speed makes it ideal for low-latency applications.
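As a minimal sketch of the grid idea described above, the snippet below maps the centre of a bounding box to the grid cell that would be responsible for predicting it. The grid size of 13 and the function name `responsible_cell` are illustrative assumptions, not values taken from this study; actual YOLO versions predict at several grid scales and also rely on anchor boxes, which are omitted here.

```python
def responsible_cell(x_center, y_center, grid_size=13):
    """Return (row, col) of the grid cell containing a normalized box centre.

    x_center and y_center are assumed to be normalized to [0, 1].
    grid_size is a hypothetical value chosen only for illustration.
    """
    if not (0.0 <= x_center <= 1.0 and 0.0 <= y_center <= 1.0):
        raise ValueError("centre coordinates must be normalized to [0, 1]")
    # Clamp so that a centre lying exactly on the right or bottom edge
    # falls into the last cell instead of an out-of-range index.
    col = min(int(x_center * grid_size), grid_size - 1)
    row = min(int(y_center * grid_size), grid_size - 1)
    return row, col
```

For example, a pothole whose box is centred in the middle of the image would be assigned to the central cell of the grid under these assumptions.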
This article is organized into four sections, with a brief review of the background, a description of the data and methods used, a presentation of the results, and a conclusion with future recommendations. By comparing the performance of YOLO models, this research aims to determine the most effective method to detect potholes in road pavements, contributing to a safer and well-maintained infrastructure.

Data and Methods
This study aims to identify the best model for pothole detection using YOLO-based implementations. Six deep learning models were compared: YOLOv3-tiny [9], YOLOv3 [9], YOLOv4-tiny [10], YOLOv4 [10], YOLOv5s [11], and YOLOv5x [11]. All models were pre-trained on the Common Objects in Context (COCO) dataset, and transfer learning was used, so the developed models build on these pre-trained models and adapt them to pothole detection.
A dataset created by Rahman Atikur [12] containing 665 road pavement images with labeled potholes was used, with a 70/20/10 split for training, validation, and testing. An example of labeled images can be seen in Figure 1.
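The 70/20/10 split can be sketched as follows. This is only an illustration under stated assumptions: the file names are hypothetical placeholders, the shuffle seed is arbitrary, and the study does not say how the split was actually produced or how fractional counts were rounded.

```python
import random


def split_dataset(items, train=0.7, val=0.2, seed=0):
    """Shuffle items and split them into train/validation/test lists.

    The test set receives whatever remains after the train and
    validation portions have been taken.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


# Hypothetical file names standing in for the 665 images of the dataset.
images = [f"img_{i:03d}.jpg" for i in range(665)]
train_set, val_set, test_set = split_dataset(images)
print(len(train_set), len(val_set), len(test_set))
```

Because 665 is not divisible by ten, the three subsets can only approximate the nominal proportions; in this sketch the remainder goes to the test set.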

This experiment used a computer specifically assembled to perform high-demand computing tasks. The models ran on Python version 3.8.13 and PyTorch 1.10.2. Furthermore, instructions for installation are in the repositories of YOLOv3, YOLOv4, and YOLOv5. In addition, the customized models were trained using the PyTorch and Darknet frameworks, specifically PyTorch for YOLOv3 and YOLOv5 and Darknet for YOLOv4. The base models used in this study are as follows: YOLOv3-tiny (PyTorch, yolov3-tiny.pt), YOLOv3 (PyTorch, yolov3.pt), YOLOv4-tiny (Darknet, yolov4-tiny.conv.29), YOLOv4 (Darknet, yolov4.conv.137), YOLOv5s (PyTorch, yolov5s.pt), and YOLOv5x (PyTorch, yolov5x.pt).
Likewise, the same training hyperparameters were used for all models. No data augmentation technique was used, and the default parameters of each model were maintained.
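The pairing of base models, frameworks, and pre-trained weight files listed above could be kept in a small lookup table such as the one below. The dictionary layout and the helper `models_for` are hypothetical conveniences for illustration; only the model names, frameworks, and file names come from the text.

```python
# Base model name -> (framework, pre-trained weights file), as listed in the text.
BASE_MODELS = {
    "YOLOv3-tiny": ("PyTorch", "yolov3-tiny.pt"),
    "YOLOv3": ("PyTorch", "yolov3.pt"),
    "YOLOv4-tiny": ("Darknet", "yolov4-tiny.conv.29"),
    "YOLOv4": ("Darknet", "yolov4.conv.137"),
    "YOLOv5s": ("PyTorch", "yolov5s.pt"),
    "YOLOv5x": ("PyTorch", "yolov5x.pt"),
}


def models_for(framework):
    """Return the names of the base models trained with the given framework."""
    return [name for name, (fw, _) in BASE_MODELS.items() if fw == framework]
```

Grouping the models this way makes it explicit that two separate training pipelines are involved, one for the Darknet weights and one for the PyTorch weights.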

Results
The models were compared based on their mean average precision (mAP) and the time to infer one image (Figure 2). The goal is to find a highly precise model for detecting and locating potholes while maintaining a short inference time. The YOLOv4, YOLOv4-tiny, and YOLOv5s models stood out as the best options.
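For readers unfamiliar with the metric, the sketch below shows one common way to compute average precision for a single class on a single image; with one class (pothole), mAP reduces to this AP averaged over the evaluation set. The IoU threshold of 0.5, the greedy matching rule, and the function names are assumptions made for illustration, since the exact evaluation protocol of the compared frameworks is not described here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_threshold=0.5):
    """AP for one class on one image.

    detections: list of (confidence, box); ground_truths: list of boxes.
    Each ground-truth box can be matched by at most one detection.
    """
    if not ground_truths:
        return 0.0
    matched = set()
    hits = []  # 1 = true positive, 0 = false positive, in confidence order
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            hits.append(1)
        else:
            hits.append(0)
    # Precision and recall after each detection in the ranked list.
    precisions, recalls, true_positives = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        true_positives += hit
        precisions.append(true_positives / rank)
        recalls.append(true_positives / len(ground_truths))
    # Interpolate: make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

A detector that ranks its correct boxes above its false alarms scores higher under this measure, which is why confidence levels matter in addition to the set of detected objects.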
Eng. Proc. 2023, 36, x
YOLOv4 demonstrated greater confidence in predicting small potholes compared to YOLOv5-based models. The detected objects were similar in both YOLOv4 versions, and their confidence levels were better than those of the YOLOv5 models.
The detailed results are shown in Table 1, with YOLOv4 and YOLOv4-tiny presenting excellent results in terms of mAP, model size, and detection time. In addition, mAP could likely be improved by improving label quality, increasing the training data, tuning hyperparameters, and using data augmentation. More precise labeling, such as polygonal segmentation, could also help improve the results. Lastly, limitations of this study include the small dataset of 665 images, the limited quality of the labels, the lack of hyperparameter tuning, and no direct testing of the algorithm's performance in real time. Furthermore, only YOLO implementations were evaluated.
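As an illustration of the data augmentation suggested above, which was not applied in this study, the sketch below mirrors bounding-box labels to match a horizontally flipped image. It assumes the labels are stored in the normalized YOLO text format (class, x_center, y_center, width, height); the function name `hflip_yolo_labels` is hypothetical.

```python
def hflip_yolo_labels(labels):
    """Mirror YOLO-format labels for a horizontally flipped image.

    Each label is (class_id, x_center, y_center, width, height), with all
    coordinates normalized to [0, 1]. Only the horizontal centre changes;
    the box size and vertical position stay the same.
    """
    return [(cls, 1.0 - xc, yc, w, h) for cls, xc, yc, w, h in labels]
```

The image itself would have to be flipped with an image library in the same step, so that each mirrored box still covers its pothole.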

Conclusions
The best result for pothole detection on the dataset used was obtained with YOLOv4, reaching an mAP of 83.2%. Still, the YOLOv4-tiny implementation shows good potential for mobile applications or devices with less computational power. However, training a custom model with YOLOv4, and using it afterwards, is more complex because of the Darknet framework. This becomes an obstacle to putting the model into production and to the solution's scalability. On the other hand, version 5 could achieve better results with some tuning, and its PyTorch-based implementation is a plus. Consequently, it is recommended to keep the YOLOv4, YOLOv4-tiny, and YOLOv5s models in mind, depending on the application.
As a future research direction, the goal is to expand these custom models to detect more classes, such as alligator cracking, block cracking, longitudinal or transverse cracking, slippage cracks, and rutting. Additionally, more attention will be given to the data, which will be expanded and revised. The ultimate goal is to develop a real-time model capable of detecting various visual defects in road pavements, improving the management of road assets, reducing costs, and improving road safety.

Figure 1. Example image of potholes with labels.

Figure 2. Mean average precision vs. time to infer one image.

Table 1. Comparison of YOLO models.