Intelligent Detection of Asphalt Pavement Cracks Based on Improved YOLOv8s

Su, Jinfei; Xu, Jicong; Shi, Chuqiao; Wang, Yuhan; Dong, Shihao; Zhang, Xue

doi:10.3390/coatings16030359

by Jinfei Su, Jicong Xu, Chuqiao Shi, Yuhan Wang, Shihao Dong * and Xue Zhang *

College of Transportation, Shandong University of Science and Technology, Qingdao 266590, China

* Authors to whom correspondence should be addressed.

Coatings 2026, 16(3), 359; https://doi.org/10.3390/coatings16030359

Submission received: 23 January 2026 / Revised: 11 February 2026 / Accepted: 11 March 2026 / Published: 12 March 2026

(This article belongs to the Special Issue Pavement Surface Status Evaluation and Smart Perception)

Abstract

The intelligent detection of asphalt pavement cracks has become increasingly important for ensuring the service performance of road infrastructure. Traditional manual detection poses significant safety hazards and offers insufficient accuracy. Furthermore, existing deep learning models still face challenges, including missed detections, false alarms, and poor performance in small-target detection under complex conditions. This investigation adopts unmanned aerial vehicles (UAVs) to acquire pavement distress information and develops an intelligent detection approach for asphalt pavement cracks based on an improved YOLOv8s. First, the Spatial Pyramid Pooling Fast (SPPF) module in the backbone network is replaced with the Spatial Pyramid Pooling Fast with Cross Stage Partial Connections (SPPFCSPC) module to enhance the multi-scale feature fusion capability. Second, the Convolutional Block Attention Module (CBAM) is introduced into the neck network to optimize the feature weights in both the channel and spatial dimensions. Third, the Efficient Intersection over Union (EIoU) loss is adopted to improve bounding box regression accuracy. Finally, the Crack_Dataset is established, and ablation experiments are conducted to verify the reliability of the detection model. The results indicate that the improved model achieves Precision, Recall, and mAP@0.5 of 83.9%, 79.6%, and 83.9%, respectively, representing increases of 1.5, 1.3, and 1.4 percentage points over the baseline model. In comparison with mainstream object detection algorithms such as YOLOv5s and YOLOv8s, the proposed method attains an F1-score, mAP@0.5, and mAP@[0.5–0.95] of 0.82, 83.9%, and 46.6%, respectively, demonstrating a performance improvement. Based on the improved detection model, a pavement crack detection system was designed and implemented using PyQt5. This system supports detection on image, video, and real-time camera input.

Keywords:

asphalt pavement; YOLOv8s; crack detection; attention mechanism; EIoU loss

1. Introduction

Pavement distresses have become increasingly frequent and severe, imposing substantial burdens on maintenance operations and heightening traffic safety risks. As the most prevalent type of pavement distress, cracks significantly influence load-bearing capacity, durability, driving speed, fuel consumption, driving safety, and ride comfort. Furthermore, crack propagation accelerates pavement deterioration and increases maintenance and repair costs. Therefore, the prompt and accurate acquisition of pavement distress information has emerged as a critical issue in road maintenance.

With its outstanding advantages of high efficiency and reliability, pavement crack detection based on machine vision [1] has received extensive attention from both the academic and engineering communities. Its technical routes fall mainly into two categories. The first is detection based on traditional digital image processing [2]. The collected pavement images are first preprocessed (gray-scale conversion, denoising, binarization, image enhancement, etc.), and the representative features of the distress target are then extracted by combining computer vision and image processing techniques. Chou et al. [3] used fuzzy technology to eliminate the noise caused by illumination variations and achieved reliable detection accuracy. To improve the accuracy of pavement crack recognition, Jia et al. [4] used an improved CV model to preprocess the segmented images, overcoming the interference of various environmental factors and making the images acquired by the camera clearer. Li et al. [5] improved the traditional edge detection operator, filtering algorithm, and image processing algorithm to obtain continuous and clear crack edge feature images and applied the improved algorithm to the detection of various cracks, achieving good results. However, because pavement distresses are diverse and complex, these feature extraction procedures are not universally applicable. Furthermore, the entire procedure requires manual processing of pavement image data, which is time-consuming and increases the likelihood of reduced accuracy.

The second is pavement crack detection based on deep learning [6]. Compared with traditional methods, these approaches reduce manual intervention and improve the accuracy and robustness of crack detection. Given the special morphological characteristics of cracks, scholars mainly adopt pixel-level semantic segmentation frameworks [7] to achieve fine-grained delineation. Cha et al. [8] partitioned road images into 256 × 256-pixel local samples to form a training set and trained a convolutional neural network on it. The trained network was then tested on images outside the training set and achieved 98% accuracy on the test set. Tong et al. [9] randomly divided gray-scale pavement images into a training set and a test set, designed a DCNN architecture, and successfully applied the deep convolutional neural network to automatic pavement crack detection. He et al. [10] proposed the ResNet structure with cross-layer connections, which alleviates the gradient dispersion problem in deep networks. Sha et al. [11] combined two CNNs to extract the characteristics of pavement distresses, achieving the identification and measurement of pavement cracks. With the development of deep learning technology, object detection algorithms [12] have also been increasingly introduced into road engineering. Object detection uses deep neural networks to generate bounding boxes automatically, determining both the position and the type of each object. Sun et al. [13] proposed a pavement potting crack detection method based on an improved Faster R-CNN [14]. This algorithm fuses the feature extraction layers of multiple networks with Faster R-CNN to improve detection accuracy. Moreover, candidate box aspect ratios were introduced into the model, which further improved its detection and localization performance. Zhang et al. [15] introduced Multi-level Attention Blocks (MLAB) into YOLOv3 and trained the model with road surface images taken from the perspective of unmanned aerial vehicles, achieving satisfactory results. To adapt detection to the characteristics of road surface defects, He et al. [16] proposed an improved YOLOv5 detection model, Pavement Damage–YOLO (PD-YOLO), which enhances feature extraction and multi-scale feature fusion. Based on the YOLOv8 model, Hou et al. [17] introduced the CPCA attention mechanism and replaced the neck network with a weighted BiFPN. This not only made the model more lightweight but also improved its Precision and Recall for pavement crack detection.

Although the aforementioned research has achieved significant advancements, challenges remain in achieving robust and precise detection of highly diverse and irregular pavement cracks in real-world scenarios. First, while individual techniques like attention mechanisms, advanced pooling modules, and improved loss functions have been explored separately in pavement crack detection, there is a lack of systematic investigation into their synergistic integration and optimization within a unified framework like YOLOv8s. The interaction effects and potential trade-offs among these modules for this specific task are not well understood. Second, many studies focus on algorithmic improvements but pay less attention to practical engineering constraints, such as the balance between accuracy and computational efficiency for potential deployment on mobile platforms like UAVs. Third, the performance evaluation often lacks comprehensive efficiency metrics and failure analysis under challenging conditions. Therefore, this study focuses on pavement cracks as the primary target and utilizes pavement images as the research subject. It proposes an intelligent detection method for asphalt pavement cracks based on an enhanced version of YOLOv8s. The main contributions are: (1) A systematically improved YOLOv8s model that integrates SPPFCSPC for multi-scale feature enhancement, CBAM for targeted feature refinement, and EIoU loss for precise bounding box regression, specifically optimized for pavement crack characteristics. (2) A thorough experimental analysis including ablation studies, comparisons with state-of-the-art models, computational complexity evaluation, and visual failure case analysis. (3) The construction of a balanced Crack_Dataset and the development of a user-friendly detection system based on PyQt5, bridging the gap between algorithm research and practical application.

2. The Improved Structural Design of the YOLOv8s Algorithm

As shown in Figure 1, the YOLOv8 algorithm is an advanced iteration within the YOLO series of object detection methods. It is composed of four components: the input, the backbone, the neck, and the head. Building on its predecessors, YOLOv8 introduces a range of new features and enhancements, resulting in state-of-the-art accuracy and detection speed. Nevertheless, its detection accuracy and Recall in road crack detection applications still require further improvement. In this study, we adopt the small-scale (S-scale) YOLOv8 model and tailor it to the specific characteristics of cracks to enhance its performance.

2.1. SPPFCSPC Module

Given the complexity of feature extraction and the substantial background noise present in the dataset, the original SPPF module in the YOLOv8s backbone was replaced with the SPPFCSPC module. As shown in Figure 2, SPPFCSPC combines the structural advantages of SPPF and SPPCSPC by splitting the input into two branches, each performing independent convolution operations. The lower branch preserves features associated with fine cracks and small targets. Finally, the original features are fused with the processed features from the various stages to achieve multi-scale feature integration, thereby improving representational capacity.

The SPPFCSPC module applies three max pooling operations with 5 × 5 kernels in sequence and fuses the outputs of the three pooling stages. This enlarges the receptive field, improves the extraction of multi-scale crack features, and strengthens the detection of targets of varying scales.
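The effect of stacking 5 × 5 poolings can be checked on a toy example. The sketch below is not the authors' implementation: it uses plain Python lists for a single-channel map, assumes stride 1 with same-size padding, and the helper names (`max_pool_1d`, `max_pool_2d`, `sequential_pools`) are hypothetical. It illustrates that two and three stacked 5 × 5 max poolings cover the same windows as single 9 × 9 and 13 × 13 poolings, which is why fusing the intermediate outputs yields features at several scales.

```python
def max_pool_1d(values, kernel):
    """Stride-1 max pooling that keeps the length; cells outside the input are ignored."""
    half = kernel // 2
    n = len(values)
    return [max(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def max_pool_2d(grid, kernel):
    """Square max pooling done separably: along each row, then along each column."""
    pooled_rows = [max_pool_1d(row, kernel) for row in grid]
    pooled_cols = [max_pool_1d(list(col), kernel) for col in zip(*pooled_rows)]
    return [list(row) for row in zip(*pooled_cols)]  # transpose back to rows


def sequential_pools(grid, kernel=5, stages=3):
    """Return [input, pool1, pool2, pool3], i.e. the maps that are later fused."""
    outputs = [grid]
    for _ in range(stages):
        outputs.append(max_pool_2d(outputs[-1], kernel))
    return outputs


# Toy 16 x 16 single-channel map with deterministic values.
feature_map = [[(7 * r + 13 * c) % 17 for c in range(16)] for r in range(16)]
_, p1, p2, p3 = sequential_pools(feature_map)
print(p2 == max_pool_2d(feature_map, 9))   # two stacked 5 x 5 pools vs one 9 x 9 pool
print(p3 == max_pool_2d(feature_map, 13))  # three stacked 5 x 5 pools vs one 13 x 13 pool
```

Reusing the previous pooling result is cheaper than running three pools with growing kernels in parallel, which is the usual motivation for the sequential arrangement.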

2.2. Attention Mechanism

The human visual system demonstrates selective perceptual capabilities during information processing. By optimizing the allocation of attentional resources, it prioritizes the processing of essential information [18]. This cognitive mechanism enhances the efficiency and accuracy of information processing. Drawing inspiration from this biological principle, the attention mechanism in deep learning dynamically adjusts feature weights to strengthen the representation of key features while suppressing non-key features. In this study, the CBAM attention module [19] was incorporated into the neck network of YOLOv8s to enhance the model’s ability to represent features of pavement cracks.

CBAM consists of a channel attention module (CAM) and a spatial attention module (SAM), which are arranged in series. It was selected over other attention mechanisms (e.g., SE, ECA) for two main reasons. First, its combination of channel and spatial attention is particularly suited for capturing the long, thin, and irregular shapes of pavement cracks, which require both channel-wise feature recalibration and spatial region emphasis. Second, CBAM introduces minimal computational overhead compared to more complex attention modules, making it a practical choice for maintaining a favorable speed-accuracy trade-off, which is crucial for real-time applications. The specific structure of this module is shown in Figure 3. For the input feature map F ∈ ℝ^{C×H×W}, the calculation process of this module is represented by Equation (1):

\begin{array}{l} F' = M_{c}(F) \otimes F \\ \hat{F} = M_{s}(F') \otimes F' \end{array}

(1)

where F′ is the channel-refined feature map; F̂ is the final output feature map; M_c represents the channel attention weights; M_s represents the spatial attention weights; and ⊗ represents element-wise multiplication.

In the channel attention module, global average pooling and global max pooling are applied to each channel of the input feature map. The two pooled vectors are passed through a shared fully connected network, and the results are summed. The channel attention weights are then obtained with the Sigmoid activation function. In the spatial attention module, average pooling and max pooling are applied along the channel dimension of the channel-refined feature map F′. The resulting maps are passed through a convolution followed by a Sigmoid activation function to obtain the spatial attention weights.

Within the CBAM framework, the channel attention module evaluates the importance of feature channels, enhancing those corresponding to crack texture and edges while suppressing irrelevant information. The spatial attention module emphasizes the spatial features, including the shape and position of cracks. It mitigates the influence of the road surface background and enhances focus on the target of road surface cracks. Both the fully connected layers in the channel attention module and the convolutional layers in the spatial attention module introduce a limited number of additional parameters, resulting in only a modest increase in computational cost. In this study, the CBAM is integrated after the C2f module of the neck network and prior to feature fusion. This arrangement ensures that high-resolution features obtained from upsampling are initially processed by the attention mechanism, followed by feature fusion and other operations before being forwarded to the model’s detection head. By optimizing dual-dimensional features in both channel and spatial dimensions, we enhance the detection accuracy of pavement cracks.
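To make the data flow of Equation (1) concrete, the following is a minimal pure-Python sketch of a CBAM forward pass on nested lists. It is illustrative only and not the code used in this study: the weights `w1`, `w2`, and `kernel` stand in for learned parameters, all function names are hypothetical, and a real model would use tensor operations in a deep learning framework.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    """Two-layer fully connected network with ReLU; w1 is (C/r x C), w2 is (C x C/r)."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat, w1, w2):
    """M_c: Sigmoid(MLP(global avg pool) + MLP(global max pool)); one weight per channel."""
    flat = [[v for row in ch for v in row] for ch in feat]
    avg = [sum(f) / len(f) for f in flat]
    mx = [max(f) for f in flat]
    pairs = zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))
    return [sigmoid(a + m) for a, m in pairs]


def spatial_attention(feat, kernel):
    """M_s: Sigmoid(conv([channel-wise avg; channel-wise max])); one weight per pixel.
    kernel has shape 2 x k x k (k odd); zero padding keeps the H x W size."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    avg = [[sum(feat[ch][y][x] for ch in range(c)) / c for x in range(w)] for y in range(h)]
    mx = [[max(feat[ch][y][x] for ch in range(c)) for x in range(w)] for y in range(h)]
    k = len(kernel[0])
    half = k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for plane, kern in zip((avg, mx), kernel):
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - half, x + dx - half
                        if 0 <= yy < h and 0 <= xx < w:
                            acc += kern[dy][dx] * plane[yy][xx]
            row.append(sigmoid(acc))
        out.append(row)
    return out


def cbam(feat, w1, w2, kernel):
    """Equation (1): F' = M_c(F) * F, then F_hat = M_s(F') * F'."""
    mc = channel_attention(feat, w1, w2)
    refined = [[[mc[ch] * v for v in row] for row in feat[ch]] for ch in range(len(feat))]
    ms = spatial_attention(refined, kernel)
    return [[[ms[y][x] * v for x, v in enumerate(row)] for y, row in enumerate(ch)]
            for ch in refined]
```

With all weights set to zero, both attention maps equal 0.5 everywhere, so the output is the input scaled by 0.25. This is a convenient sanity check that the two stages are applied in series.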

2.3. The Improved Loss Function

In object detection, Intersection over Union (IoU) measures the overlap between a predicted bounding box and a ground-truth box and is used to evaluate localization accuracy. An IoU threshold of 0.5 is typically applied both for evaluation and for non-maximum suppression (NMS) during post-processing. In NMS, the threshold serves to eliminate redundant prediction boxes: a smaller threshold suppresses more overlapping boxes and therefore removes duplicates more aggressively. The schematic diagram of IoU is shown in Figure 4.

The calculation formula is as follows:

IoU = \frac{\text{Area of Intersection}}{\text{Area of Union}}

(2)
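Equation (2) and the role of the NMS threshold can be illustrated with a short sketch. It assumes axis-aligned boxes given as corner coordinates (x1, y1, x2, y2) and uses a simple greedy NMS; the function names and the toy boxes are hypothetical and are not taken from the YOLOv8s code base.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(iou(boxes[0], boxes[1]))                 # 81 / 119, about 0.68
print(nms(boxes, scores, iou_threshold=0.5))   # the overlapping second box is removed
print(nms(boxes, scores, iou_threshold=0.7))   # a looser threshold keeps all three
```

The first two boxes overlap with an IoU of about 0.68, so the second is treated as a duplicate at a threshold of 0.5 but survives at 0.7.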

In YOLOv8, the matching of both positive and negative samples, as well as the calculation of loss values, employs Complete Intersection over Union (CIoU) to quantify the degree of overlap between the predicted bounding box and the ground-truth box. CIoU introduces a penalty term for aspect ratio in addition to Distance Intersection over Union (DIoU), while simultaneously considering both the deviation in center point positions and the length of the diagonal. Consequently, it provides a more accurate measurement of similarity between two bounding boxes. The calculation formula is as follows:

CIoU = IoU - \frac{\rho^{2}(b, b^{gt})}{C^{2}} - \alpha v

(3)

v = \frac{4}{\pi^{2}} \left( \arctan \frac{w^{gt}}{h^{gt}} - \arctan \frac{w}{h} \right)^{2}

(4)

\alpha = \frac{v}{(1 - IoU) + v}

(5)

where b and b^gt represent the center points of the predicted box and the ground-truth box, respectively; ρ²(b, b^gt) represents the squared Euclidean distance between the two center points; C represents the diagonal length of the minimum bounding rectangle enclosing the two boxes; α is the weight function; v measures the consistency of the aspect ratio; w and w^gt represent the widths of the predicted box and the ground-truth box, respectively; and h and h^gt represent their heights. The final CIoU loss is defined as:

L_{CIoU} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{C^{2}} + \alpha v

(6)
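Equations (3) to (6) can be traced numerically with the sketch below. It is a plain-Python illustration for a single pair of boxes in (x1, y1, x2, y2) form, not the batched tensor implementation used in training; the function name `ciou_loss` and the guard for identical boxes are assumptions made for this example.

```python
import math


def ciou_loss(pred, gt):
    """CIoU loss for two boxes given as (x1, y1, x2, y2)."""
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]

    # Overlap term (IoU).
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter)

    # Squared distance between the two center points, rho^2(b, b_gt).
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2

    # Squared diagonal C^2 of the minimum rectangle enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2

    # Aspect-ratio consistency v and its weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0  # guard: avoids 0/0 for identical boxes

    return 1 - iou + rho2 / c2 + alpha * v


print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes give zero loss
print(ciou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 - 1/7 + 2/18, since v = 0 for equal ratios
```

In the second call both boxes are squares, so v vanishes and only the overlap and center-distance terms contribute.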

However, the term v in the CIoU formula reflects the difference in aspect ratio between the predicted box and the ground-truth box, rather than the differences in their widths and heights. The model can therefore learn the size characteristics of distresses only indirectly, through IoU and the aspect ratio, which reduces learning efficiency and somewhat impedes effective optimization. To address this issue, the EIoU loss was introduced to establish an absolute size matching mechanism between the predicted box and the ground-truth box. In our improved model, only the bounding box regression loss is replaced by the EIoU loss; all other loss terms, including the classification loss, remain unchanged from the original YOLOv8s implementation. This allows the impact of EIoU on localization accuracy to be evaluated directly. EIoU replaces the aspect-ratio penalty of CIoU with separate penalties on the width and height differences between the predicted box and the ground-truth box. This modification aligns the size of the predicted box with that of the ground-truth box more directly. The gradient direction is more clearly defined, enabling the model to learn the shape features of pavement cracks more effectively. The resulting localization loss incorporates three constraints: the overlap loss, the center point distance loss, and the width and height loss. The first two terms of EIoU are inherited from CIoU, while the third term improves the model’s learning of crack size features and accelerates convergence. Compared with CIoU, EIoU does not require trigonometric computations, resulting in faster calculation and more efficient model training. The EIoU loss formula is as follows:

\begin{array}{l} L_{EIoU} = L_{IoU} + L_{dis} + L_{asp} \\ = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{(w^{c})^{2} + (h^{c})^{2}} + \frac{\rho^{2}(w, w^{gt})}{(w^{c})^{2}} + \frac{\rho^{2}(h, h^{gt})}{(h^{c})^{2}} \end{array}

(7)

where w^c and h^c are the width and height, respectively, of the minimum bounding rectangle enclosing the predicted box and the ground-truth box.
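The behavior of Equation (7) can be seen in a small numerical sketch. As before, this is a single-pair, plain-Python illustration with boxes in (x1, y1, x2, y2) form and a hypothetical function name, not the training implementation. The example pair shares the same center and the same 2:1 aspect ratio, so the center-distance term and the CIoU aspect-ratio term v are both zero, yet EIoU still penalizes the size mismatch.

```python
def eiou_loss(pred, gt):
    """EIoU loss for two boxes given as (x1, y1, x2, y2)."""
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]

    # Overlap term L_IoU = 1 - IoU.
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter)

    # Width w_c and height h_c of the minimum rectangle enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])

    # Center-distance term L_dis.
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    l_dis = rho2 / (cw ** 2 + ch ** 2)

    # Width and height term L_asp: direct penalties on the size differences.
    l_asp = (pw - gw) ** 2 / cw ** 2 + (ph - gh) ** 2 / ch ** 2

    return (1 - iou) + l_dis + l_asp


pred = (0, 0, 8, 4)   # 8 x 4 box centered at (4, 2)
gt = (2, 1, 6, 3)     # 4 x 2 box centered at (4, 2), same 2:1 aspect ratio
print(eiou_loss(pred, gt))  # 0.75 + 0 + (0.25 + 0.25) = 1.25
```

For this pair, a loss built only from IoU, center distance, and aspect ratio would stop at 1 - IoU = 0.75, whereas the width and height terms add a further 0.5 that pushes the prediction toward the correct size.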

2.4. The Improved Structure of YOLOv8s

Considering the specific characteristics of pavement crack detection targets [20], three modifications were applied to the YOLOv8s architecture: the SPPF module in the backbone was replaced with SPPFCSPC, the CBAM was added to the neck, and the CIoU loss in the head was substituted with EIoU loss. The resulting improved network structure is illustrated in Figure 5.

3. Data Training and Parameter Setting

3.1. Data Acquisition and Preprocessing

Choosing an appropriate dataset is crucial for the development of a pavement distress detection system. A representative dataset containing sufficient samples of pavement distress is essential for reliable model training and evaluation, and it improves adaptability and accuracy under varying road conditions. To enhance the robustness of the model, this study utilizes training data sourced from the public Road Damage Dataset 2022 (RDD2022). This dataset primarily encompasses three types of pavement distress: longitudinal cracks, transverse cracks, and mesh cracks. The dataset also contains information on blurred white lines and indistinct pedestrian crossings; however, this investigation focuses exclusively on the three crack types.

The dataset was built from the China_MotorBike and United_States subsets of RDD2022, along with self-collected images obtained via unmanned aerial vehicles. Notably, the self-collected images of Kunlun Mountain Road were used exclusively in the validation set. After data annotation, data cleaning, and data augmentation, the resulting dataset was named Crack_Dataset. It comprises a total of 4385 images, divided into a training set and a validation set at an approximate ratio of 8:2.

3.1.1. A Method for Collecting Pavement Disease Data Based on Unmanned Aerial Vehicles

As shown in Figure 6, a DJI Mavic 3E unmanned aerial vehicle (produced in Shenzhen, China) was employed to capture road surface images of specific sections of Kunlun Mountain Road in the Huangdao District of Qingdao City. The technical parameters of this unmanned aerial vehicle are presented in Table 1. A total of 57 images were collected and incorporated into the validation set. An example of the collected road surface images is shown in Figure 7.

3.1.2. Data Annotation

This investigation primarily focuses on three types of pavement distress: longitudinal cracks, transverse cracks, and mesh cracks. The location, size, category, and other relevant information of the crack targets are labeled in the pavement images collected from Kunlun Mountain Road using unmanned aerial vehicles. For this purpose, LabelImg (Version 1.8.6) is employed to annotate the pavement crack targets. LabelImg is an open-source data annotation tool that supports three labeling formats: PascalVOC, YOLO, and CreateML. The interface of LabelImg is shown in Figure 8.

3.1.3. Data Cleaning

Data cleaning operations represent a critical phase in the creation of datasets. In this study, such processes are primarily applied to image data derived from the RDD2022 dataset. The objective is to rectify or eliminate any errors, as well as address missing or duplicate annotations that may be present within the dataset, thereby ensuring the integrity and quality of the data utilized for model training.

The RDD2022 public dataset was released with annotations in VOC format. In this study, a Python program was written to convert the VOC annotations into the YOLO annotation format, which is stored as plain-text (.txt) files. During initial model training, certain distress labels were found to be missing from the RDD2022 dataset. Missing labels can reduce training efficacy and compromise prediction accuracy. To address this issue, LabelImg was used for secondary annotation to supplement the missing annotations and improve the Crack_Dataset.
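The conversion program itself is not reproduced in the article. As a rough sketch of what such a VOC-to-YOLO conversion typically involves, the example below parses one VOC XML annotation and emits YOLO lines of the form `class x_center y_center width height`, normalized by the image size. The class mapping `CLASS_IDS` (RDD2022 codes D00, D10, D20 to indices 0, 1, 2) and the function name are assumptions made for illustration, not the authors' settings.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from RDD2022 class codes to class indices (hypothetical):
# D00 longitudinal crack, D10 transverse crack, D20 mesh crack.
CLASS_IDS = {"D00": 0, "D10": 1, "D20": 2}


def voc_to_yolo(xml_text, class_ids=CLASS_IDS):
    """Convert one VOC annotation (XML string) to a list of YOLO label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text.strip()
        if name not in class_ids:
            continue  # skip categories outside the three crack types
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # YOLO stores the box center and size, normalized to [0, 1].
        xc = (xmin + xmax) / 2 / img_w
        yc = (ymin + ymax) / 2 / img_h
        bw = (xmax - xmin) / img_w
        bh = (ymax - ymin) / img_h
        lines.append(f"{class_ids[name]} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    return lines


sample = """
<annotation>
  <size><width>1000</width><height>500</height><depth>3</depth></size>
  <object><name>D00</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox>
  </object>
  <object><name>D40</name>
    <bndbox><xmin>10</xmin><ymin>10</ymin><xmax>20</xmax><ymax>20</ymax></bndbox>
  </object>
</annotation>
"""
print(voc_to_yolo(sample))
```

In practice the function would be applied to every XML file and each result written to a `.txt` file with the same base name as the image.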

3.1.4. Data Augmentation

The training set of the Crack_Dataset contains three types of pavement cracks. The data distribution of the training set is illustrated in Figure 9a. Longitudinal crack samples constitute 60.15% of the training set, while transverse crack samples account for 28.11%, and mesh crack samples represent only 9.03%. The proportion of longitudinal cracks exceeds 50%, indicating that this category has a significantly higher sample count compared to other types, particularly when contrasted with mesh cracks. This imbalanced distribution may lead to an overemphasis on longitudinal cracks during model training, which could adversely affect the identification capabilities for transverse cracks. Furthermore, it increases the likelihood of overlooking important yet rare defect types such as mesh cracks, ultimately resulting in suboptimal detection performance by the model.

Based on this, the present investigation applies data augmentation primarily to images containing transverse and mesh cracks to increase their sample size and proportion, while preserving all original longitudinal crack samples to maintain the real-world distribution. As depicted in Figure 10, the augmentation methods include horizontal flipping, vertical flipping, random brightness adjustment, and the addition of Gaussian and pepper noise. As depicted in Figure 9b, after data augmentation, longitudinal crack samples constitute 37.22%, transverse crack samples account for 27.66%, and mesh crack samples represent 26.19% of the training set. Notably, the proportion and sample size of mesh cracks have increased approximately threefold.
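When images are flipped, the YOLO labels must be transformed with them, otherwise the boxes no longer cover the cracks. The sketch below shows the augmentation operations named above on a grayscale image stored as nested lists of 0 to 255 values, together with the matching label updates. All function names, the grayscale simplification, and the parameter values are assumptions for illustration; the article does not specify the implementation.

```python
import random


def hflip(image, boxes):
    """Horizontal flip; YOLO boxes are (class, xc, yc, w, h) with normalized values."""
    flipped = [row[::-1] for row in image]
    return flipped, [(c, 1.0 - xc, yc, w, h) for c, xc, yc, w, h in boxes]


def vflip(image, boxes):
    """Vertical flip; only the normalized y center changes."""
    flipped = [list(row) for row in image[::-1]]
    return flipped, [(c, xc, 1.0 - yc, w, h) for c, xc, yc, w, h in boxes]


def clip(value):
    return min(255, max(0, int(round(value))))


def adjust_brightness(image, factor):
    """Scale pixel intensities; labels are unaffected."""
    return [[clip(p * factor) for p in row] for row in image]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in image]


def add_pepper_noise(image, prob, rng):
    """Set each pixel to black with probability prob."""
    return [[0 if rng.random() < prob else p for p in row] for row in image]


rng = random.Random(0)                      # fixed seed for repeatability
image = [[10, 20, 30], [40, 50, 60]]
boxes = [(2, 0.25, 0.40, 0.20, 0.10)]       # one mesh-crack box (class index assumed)

flipped, flipped_boxes = hflip(image, boxes)
print(flipped, flipped_boxes)               # x center moves from 0.25 to 0.75
print(adjust_brightness(image, rng.uniform(0.7, 1.3)))
print(add_pepper_noise(add_gaussian_noise(image, 5.0, rng), 0.05, rng))
```

Brightness and noise changes leave the labels untouched, so only the two flips need the coordinate update.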

3.2. Experimental Environment and Parameter Setting

3.2.1. Experimental Environment Setup

To ensure the reliability of the experimental results, all experiments were conducted in a fixed software and hardware environment. The hardware was selected to meet the training and inference requirements of the YOLOv8s model, and the operating system, programming language, and deep learning framework were selected for their suitability for deep learning tasks. The detailed configuration is shown in Table 2.

3.2.2. Parameter Setting

To ensure the rigor of the experiments and the comparability of model training results, the hyperparameter settings for all models remain consistent, with the exception of the number of training epochs, as detailed in Table 3. The batch_size was set to 8 due to GPU memory constraints. To mitigate potential training instability associated with small batch sizes, a cosine annealing learning rate scheduler was employed. All training hyperparameters were determined based on both model performance and the training environment. Where necessary, the number of training epochs was increased to ensure that training converged.
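For readers unfamiliar with cosine annealing, the sketch below shows the standard form of the schedule, in which the learning rate decays smoothly from a maximum to a minimum value along half a cosine period. The function name and the numeric values (`lr_max`, `lr_min`, 200 epochs) are placeholders chosen for illustration and are not the settings of Table 3.

```python
import math


def cosine_annealing_lr(epoch, total_epochs, lr_max, lr_min):
    """Learning rate at a given epoch under cosine annealing (no restarts)."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


# Placeholder settings, for illustration only.
total_epochs, lr_max, lr_min = 200, 0.01, 0.0001
for epoch in (0, 50, 100, 150, 200):
    print(epoch, round(cosine_annealing_lr(epoch, total_epochs, lr_max, lr_min), 6))
```

The rate changes slowly at the start and end of training and fastest in the middle, which tends to give gentler updates late in training than a step schedule would.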

3.3. Design of Road Crack Detection and Identification System

An interactive interface for the automatic pavement crack detection system was developed using PyQt5 to enable user-friendly operation of the Python-based improved YOLOv8s model. This interface provides a convenient connection between the detection model and the user, supporting efficient defect detection.

The visual user interface built with PyQt5 supports the input and detection of images, videos, and real-time camera feeds, offering an effective solution for the automated identification of pavement cracks. Recognition results can also be stored for subsequent data analysis.

4. Analysis of Detection Accuracy for Pavement Cracking Damage

4.1. Model Accuracy Evaluation Index

To assess the detection accuracy of pavement cracks using the improved YOLOv8s model in this study, we employed mean Average Precision (mAP), Precision (P), Recall (R), and F1-score as evaluation metrics for model performance. The calculation method is as follows:

Precision = \frac{TP}{TP + FP}

(8)

Recall = \frac{TP}{TP + FN}

(9)

mAP = \frac{1}{N} \sum_{i=1}^{N} AP_{i}

(10)

F1\text{-}score = 2 \times \frac{Precision \times Recall}{Precision + Recall}

(11)

In multi-class object detection, Precision (P), Recall (R), and Average Precision (AP) are calculated independently for each category; N is the number of categories, and AP_i is the Average Precision of category i. For a given category (e.g., transverse cracks), the definitions are as follows:

(1) True Positive (TP): A ground-truth bounding box of transverse cracks is correctly matched with a predicted bounding box that has the same class label and an Intersection over Union (IoU) greater than a predefined threshold.

(2) False Positive (FP): A predicted bounding box is either (a) incorrectly assigned the label of transverse cracks, or (b) a duplicate detection of the same ground-truth transverse crack.

(3) False Negative (FN): A ground-truth bounding box of transverse cracks is not matched with any predicted bounding box.

(4) True Negative (TN): In object detection, TN is typically not computed explicitly for each class, as it would encompass all areas of the image not containing the target class and all correctly rejected proposals for other classes. Therefore, TN is not used in the calculation of Precision and Recall.
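The definitions above can be turned into a small counting routine. The sketch below handles one category at a time: predictions are visited in order of decreasing confidence and greedily matched to the unmatched ground-truth box with the highest IoU. The greedy matching rule, the function names, and the toy boxes are assumptions for illustration; evaluation toolkits differ in details. The last line recomputes the F1-score from the Precision and Recall reported for the improved model.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp_fn(preds, gts, iou_threshold=0.5):
    """preds: list of (score, box) for one class; gts: list of boxes of that class."""
    matched = set()
    tp = fp = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if j not in matched and overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_iou > iou_threshold:
            matched.add(best_j)   # correct detection of a not-yet-matched crack
            tp += 1
        else:
            fp += 1               # wrong location/label or duplicate detection
    fn = len(gts) - len(matched)  # ground-truth cracks that were never detected
    return tp, fp, fn


def f1_score(precision, recall):
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall, f1_score(precision, recall)


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)), (0.6, (50, 50, 60, 60))]
print(count_tp_fp_fn(preds, gts))            # one match, one duplicate, one stray box
print(round(f1_score(0.839, 0.796), 2))      # Precision 83.9%, Recall 79.6%
```

The second prediction overlaps an already matched crack and is therefore counted as a duplicate FP, which corresponds to case (b) of the False Positive definition.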

To quantify processing efficiency, frames per second (FPS) is employed as the evaluation metric for the model’s real-time performance. FPS indicates the number of images processed per second. The total time required to predict an image encompasses preprocessing time, inference time, and post-processing time. The formula is as follows:

FPS = \frac{1}{\text{Total time}}

(12)

Because the measured FPS fluctuates significantly, FPS is measured 10 consecutive times to mitigate randomness. After the maximum and minimum values are excluded, the average of the remaining values is reported as the final indicator.
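The averaging procedure amounts to a trimmed mean of the FPS values. A minimal sketch follows, assuming each measurement is the total time per image in seconds (preprocessing plus inference plus post-processing); the function name and the sample timings are invented for illustration.

```python
def trimmed_mean_fps(total_times_s):
    """Convert per-image total times to FPS, drop the max and min, average the rest."""
    if len(total_times_s) < 3:
        raise ValueError("need at least three measurements to drop max and min")
    fps_values = sorted(1.0 / t for t in total_times_s)
    kept = fps_values[1:-1]          # exclude the smallest and largest FPS
    return sum(kept) / len(kept)


# Ten hypothetical measurements of total time per image, in seconds.
times = [0.0100, 0.0102, 0.0098, 0.0150, 0.0101, 0.0099, 0.0080, 0.0100, 0.0103, 0.0097]
print(round(trimmed_mean_fps(times), 1))
```

Dropping the two extremes keeps a single slow or fast outlier, such as the 0.0150 s and 0.0080 s samples here, from dominating the reported value.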

4.2. Analysis of Road Crack Detection Accuracy

4.2.1. Ablation Experiments

To evaluate the effectiveness of the identified model improvement points and quantify their contributions to the overall performance of the model, this study conducts a comparative analysis through ablation experiments that focus on these enhancements using the Crack_Dataset test set.

In the ablation experiments, YOLOv8s_1 replaces the SPPF module in the backbone network of the original YOLOv8s with the SPPFCSPC module, YOLOv8s_2 integrates the CBAM attention mechanism into the neck network of the original YOLOv8s, and YOLOv8s_3 substitutes the CIoU loss of the original YOLOv8s with the EIoU loss. The improved YOLOv8s combines all three modifications. The results of the ablation experiments are presented in Table 4.

While modules like Atrous Spatial Pyramid Pooling (ASPP) or Receptive Field Block (RFB) could also expand the receptive field, SPPFCSPC was chosen for its balance between performance gain and computational cost and for its effective use of the CSP structure to reduce redundancy. Future work could include a more extensive comparison with these alternatives. The SPPFCSPC module in YOLOv8s_1 enhanced the model’s Precision by expanding the receptive field and facilitating multi-scale feature fusion: Precision improved from 82.4% to 84.5%. Both Recall and mAP also improved slightly, demonstrating that this enhancement enables the model to better extract complex features and strengthens its feature fusion capabilities. The CBAM attention mechanism in YOLOv8s_2, which introduces channel and spatial attention, raised Precision from 82.4% to 83.9% compared with the original model; however, Recall decreased by 3.3%. CBAM allows the model to concentrate on critical areas and reduces false detections; nevertheless, this added focus on local regions can lead to information loss in less significant areas, resulting in more missed detections. After the bounding box regression loss was replaced with the EIoU loss in YOLOv8s_3, Recall rose from 78.3% for the original model to 79.1%, accompanied by a notable improvement in mAP. This optimization enhanced the model’s localization accuracy for fuzzy and small-scale cracks, confirming that the EIoU loss positively impacts localization precision.

After incorporating the three enhancements, the improved YOLOv8s model demonstrates superior performance compared to the original YOLOv8s model in terms of Precision, Recall, and mean Average Precision (mAP). Notably, while the combined model achieves the highest mAP@0.5 (83.9%), its mAP@[0.5–0.95] (46.6%) is lower than that of the model using only EIoU loss (YOLOv8s_3, 50.5%). This suggests that while SPPFCSPC and CBAM improve detection under the standard IoU threshold, they may introduce some localization variance that slightly degrades performance under stricter IoU criteria. The EIoU loss alone excels at precise box regression. This indicates a complex interaction between the modules, where optimizing for one aspect (feature representation and focus) might not perfectly align with optimizing for another (strict localization accuracy). Future work could explore adaptive weighting or more harmonious integration of these components.
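The difference between the two mAP variants can be illustrated with a short sketch. It assumes the usual convention that mAP@[0.5–0.95] averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05; the boxes are hypothetical and the helper names are illustrative only.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """Thresholds at which the prediction would count as a true positive."""
    iou = box_iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if iou >= t]


gt = (0, 0, 100, 20)      # hypothetical thin transverse crack
loose = (0, 0, 100, 30)   # box that is too tall, IoU = 2000 / 3000
tight = (0, 0, 100, 21)   # box that is nearly exact, IoU = 2000 / 2100

print(len(thresholds_passed(loose, gt)))  # passes 4 of 10 thresholds
print(len(thresholds_passed(tight, gt)))  # passes all 10 thresholds
```

Both predictions count as correct at IoU 0.5, so they contribute equally to mAP@0.5, but only the tighter box keeps counting as the threshold rises. A model can therefore lead on mAP@0.5 while trailing on mAP@[0.5–0.95].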

The ablation experiments validate the effectiveness of each modification. The results indicate that the SPPFCSPC module improves the model’s feature expression capability, while the CBAM attention mechanism raises Precision at the cost of Recall. The EIoU loss function improves target localization accuracy. The improved YOLOv8s model outperforms the baseline model on all four evaluation metrics in Table 4. Nevertheless, further optimization research is required to address compatibility issues between modules and to ensure stability during training.
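The changes relative to the baseline can be recomputed directly from Table 4. The sketch below copies the table values and reports differences in percentage points; the dictionary keys and the `deltas` helper are illustrative names, and the last entry stands for the final row of Table 4 (all three modifications combined).

```python
METRICS = ("Precision", "Recall", "mAP@0.5", "mAP@[0.5-0.95]")

# Values in % copied from Table 4
ABLATION = {
    "YOLOv8s": (82.4, 78.3, 82.5, 44.4),
    "YOLOv8s_1": (84.5, 78.9, 83.0, 45.5),   # SPPFCSPC only
    "YOLOv8s_2": (83.9, 75.0, 81.6, 44.6),   # CBAM only
    "YOLOv8s_3": (82.5, 79.1, 83.6, 50.5),   # EIoU only
    "Combined": (83.9, 79.6, 83.9, 46.6),    # all three modifications
}


def deltas(model, baseline="YOLOv8s"):
    """Difference to the baseline in percentage points, per metric."""
    return {
        metric: round(value - base, 1)
        for metric, value, base in zip(METRICS, ABLATION[model], ABLATION[baseline])
    }


for name in ABLATION:
    if name != "YOLOv8s":
        print(name, deltas(name))
```

For the combined model this yields +1.5, +1.3, and +1.4 percentage points for Precision, Recall, and mAP@0.5, and for YOLOv8s_2 it yields a Recall change of -3.3 percentage points.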

4.2.2. Contrast Experiment

To systematically evaluate the performance of the improved YOLOv8s model in pavement crack detection, it was compared with several current mainstream object detection models, namely YOLOv5s, YOLOv8s, YOLOv11s, and YOLOv12s. The experiments were carried out on the Crack_Dataset. Detailed results are given in Table 5.

As shown in Table 5, the improved model achieved mAP@0.5, mAP@[0.5–0.95], and F1-score values of 83.9%, 46.6%, and 0.82, respectively, surpassing all other YOLO models in the comparison. Its inference speed (81.2 FPS) was lower than that of YOLOv5s, YOLOv8s, and YOLOv11s, and approximately 17.7% lower than that of YOLOv5s (98.7 FPS). This reduction in speed can be attributed to the CBAM attention mechanism and the SPPFCSPC module, both of which increase the number of parameters in the model. This investigation did not aim to produce a lightweight model, and consequently the frames per second (FPS) decreased. Nevertheless, for real-time detection applications in which an FPS above 30 suffices, the improved model remains capable of real-time detection.
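As a quick consistency check, the F1-score and the relative speed reduction can be recomputed from the reported numbers. The sketch assumes the standard definition of F1 as the harmonic mean of Precision and Recall; the helper names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def relative_drop(reference, value):
    """Fractional decrease of value with respect to reference."""
    return (reference - value) / reference


# Precision and Recall of the improved model, FPS values from Table 5
f1 = f1_score(0.839, 0.796)
fps_drop = relative_drop(98.7, 81.2)

print(f"F1 = {f1:.2f}")                         # F1 = 0.82
print(f"FPS drop vs. YOLOv5s = {fps_drop:.1%}")  # FPS drop vs. YOLOv5s = 17.7%
```

The F1 value obtained from the reported Precision and Recall agrees with the 0.82 listed in Table 5.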

Based on the above experimental preparations, the training results of the improved YOLOv8s model are shown in Figure 11, Figure 12 and Figure 13.

Figure 11 shows the Precision–Recall (PR) curve, which plots Precision against Recall across confidence thresholds. This curve is an intuitive indicator of the model’s combined localization and classification capability: the closer the curve lies to the upper-right corner of the graph, the better the overall performance of the model. Figure 12 presents the normalized confusion matrix. The horizontal axis represents the true categories, while the vertical axis represents the categories predicted by the model. Values on the main diagonal are therefore the proportion of samples in each target category that were correctly identified, whereas off-diagonal values are proportions of incorrect predictions. Figure 13 displays the F1-score curve, where the horizontal axis is the confidence threshold and the vertical axis is the corresponding F1-score. This curve visualizes the trade-off between Precision and Recall. As shown in Figure 13, the improved YOLOv8s model reaches an F1-score of 0.82 at its optimal confidence threshold.
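The two derived quantities described here, a confusion matrix normalized per true category and the peak of an F1-confidence curve, can be sketched as follows. All counts and curve samples are hypothetical and chosen only for illustration; they are not the data behind Figures 12 and 13, and the function names are assumptions.

```python
def normalize_by_true_class(counts):
    """Divide every column by its sum so that each true class sums to 1.

    counts[i][j] is the number of samples predicted as class i whose true
    class is j (rows = predicted, columns = true).
    """
    n_rows, n_cols = len(counts), len(counts[0])
    col_sums = [sum(counts[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [
        [counts[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n_cols)]
        for i in range(n_rows)
    ]


def f1_score(precision, recall):
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def best_threshold(curve):
    """Return (confidence, F1) at the maximum of an F1-confidence curve.

    curve is a list of (confidence, precision, recall) tuples.
    """
    conf, p, r = max(curve, key=lambda point: f1_score(point[1], point[2]))
    return conf, f1_score(p, r)


# Hypothetical counts for two crack classes plus background
counts = [
    [80, 5, 10],
    [10, 90, 5],
    [10, 5, 0],
]
normalized = normalize_by_true_class(counts)

# Hypothetical (confidence, precision, recall) samples
curve = [(0.1, 0.60, 0.90), (0.3, 0.80, 0.82), (0.5, 0.88, 0.70), (0.7, 0.95, 0.45)]
best_conf, best_f1 = best_threshold(curve)
```

In this toy example the diagonal entries 0.8 and 0.9 are the fractions of each true class that were recognized correctly, and the F1 curve peaks at the confidence threshold 0.3, where Precision and Recall are most balanced.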

4.2.3. Visual Analysis

The improved YOLOv8s model is a convolutional neural network implemented in Python. To make it usable by practitioners without programming experience, an interactive interface for the automatic detection of pavement cracks was developed using PyQt5. The YOLO model serves as the detection backend of the interface, which makes defect detection more convenient and intuitive.

The PyQt5-based interface supports input from images, videos, and real-time camera feeds for automated pavement crack detection. It enables real-time tracking and analysis of images, videos, and camera inputs while also allowing users to save recognition results for subsequent data analysis. The corresponding system interface is shown in Figure 14.

To empirically validate the effectiveness of algorithmic improvements made in this study, two images were randomly selected from the test set to illustrate the differences between the detection outcomes produced by the original YOLOv8s model and those generated by the enhanced YOLOv8s model under investigation. The respective detection results are presented in Figure 15 and Figure 16.

Comparing Figure 15a and Figure 16a shows that the improved model identifies shallow cracks more accurately, with higher confidence than the original model. Comparing Figure 15b and Figure 16b shows that the original YOLOv8s model produced redundant detections of the same transverse crack. Although the improved YOLOv8s model yields a lower confidence here, it substantially reduces the number of duplicate bounding boxes, which aligns more closely with engineering application requirements.
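Duplicate boxes on the same crack are normally filtered by non-maximum suppression (NMS), for which Table 3 lists an IoU threshold of 0.35. The sketch below is a plain greedy NMS on hypothetical boxes, meant only to show how that threshold decides whether two overlapping detections are merged; it is not the post-processing code of this study and does not by itself explain the difference between Figure 15b and Figure 16b.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thres=0.35):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two boxes on one transverse crack, one elsewhere
boxes = [(10, 50, 200, 80), (20, 52, 210, 82), (300, 300, 350, 400)]
scores = [0.8, 0.6, 0.5]

print(nms(boxes, scores))                 # [0, 2]
print(nms(boxes, scores, iou_thres=0.9))  # [0, 1, 2]
```

With the threshold of 0.35 the second box (IoU of about 0.79 with the first) is suppressed, whereas a very permissive threshold keeps both boxes and reproduces the kind of redundant detection described above.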

5. Conclusions

This investigation aims to enhance the efficiency of intelligent detection of pavement defects and proposes an advanced method for detecting asphalt pavement cracks based on an improved YOLOv8s framework. The main conclusions are as follows:

(1) To improve detection accuracy, this study replaces the SPPF module in the YOLOv8s backbone network with the SPPFCSPC module, which improves both complex feature extraction and multi-scale feature fusion. In addition, a CBAM attention mechanism is integrated into the neck network of YOLOv8s to strengthen the model’s focus on critical features, and the CIoU loss in the head network is replaced with EIoU loss to improve positioning accuracy. Together, these modifications form the improved YOLOv8s pavement crack detection model.

(2) The dataset used in the experiments comprises self-collected road images together with selected images from the public RDD2022 dataset, providing high-quality training data that improves model generalization. Reasonable evaluation metrics are also defined.

(3) The detection performance of the improved YOLOv8s model was verified and quantified through ablation and comparison experiments. The improved model outperforms the baseline YOLOv8s in Precision, Recall, and mAP@0.5, and it outperforms the other mainstream YOLO-series models in the comparison in mAP@0.5, mAP@[0.5–0.95], and F1-score.

6. Limitation

The training data primarily comes from the RDD2022 dataset, with UAV images used only for validation. This may limit the model’s generalization to pavement types, damage patterns, and environmental conditions not represented in these sources. Future work will involve collecting a larger, more diverse multi-region dataset and conducting cross-dataset validation to better assess generalization. Domain adaptation techniques will also be explored.

In subsequent research, the convolutional network structure will be further optimized to reduce the computational complexity of the model, and pruning or distillation techniques will be used to obtain a lightweight model while maintaining its performance advantages. More distress types will be added as detection targets to improve detection performance across multiple scenarios and to support the development of smart road maintenance systems.

Author Contributions

Conceptualization, S.D. and X.Z.; Methodology, J.S.; Software, J.X.; Validation, J.X. and X.Z.; Formal analysis, C.S.; Investigation, Y.W.; Resources, Y.W.; Data curation, C.S.; Writing—original draft, J.S.; Visualization, J.X.; Project administration, J.S. and S.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China [grant number 52508508], the Natural Science Basic Research Plan in Shandong Province [grant number ZR2023QE122, ZR2024QE025], the Natural Science Foundation of Henan Province [grant number 252300421565], the Fundamental Research Funds for the Central Universities [grant number 300102213530] and the Open Research Project of Shandong Key Laboratory of Highway Technology and Safety Assessment [grant number SH202302].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article.

Conflicts of Interest

The authors declare no conflict of interest.


Figure 1. YOLOv8 network architecture.

Figure 2. Structure of SPPCSPC module and SPPFCSPC module.

Figure 3. CBAM structure.

Figure 4. Schematic diagram of IoU calculation.

Figure 5. The improved YOLOv8s network structure.

Figure 6. DJI Mavic 3E drone.

Figure 7. Sample image acquisition of Kunlun Mountain Road surface.

Figure 8. LabelImg interface.

Figure 9. The quantity and proportion of the three distress types. (a) Before data augmentation; (b) after data augmentation.

Figure 10. Data augmentation. (a) Original image; (b) horizontal flip; (c) vertical flip; (d) random brightness adjustment; (e) added Gaussian noise; (f) added pepper noise.

Figure 11. PR curve graph.

Figure 12. Confusion matrix.

Figure 13. F1-score curve graph.

Figure 14. Interface of the road crack and pothole detection system.

Figure 15. The detection results of the original YOLOv8s. (a) Test set road surface 1; (b) test set road surface 2.

Figure 16. The detection results of the improved YOLOv8s. (a) Test set road surface 1; (b) test set road surface 2.

Table 1. Technical parameters of the UAV.

Parameter	Value
Maximum takeoff weight	1050 g
Maximum angular velocity	200°/s
Maximum ascent speed	6 m/s (Normal mode); 8 m/s (Sport mode)
Maximum descent speed	6 m/s
Maximum wind speed resistance	12 m/s
Maximum horizontal speed (no wind)	15 m/s (Normal mode); forward 21 m/s, sideways 20 m/s, backward 19 m/s (Sport mode)
Maximum tilt angle	30° (Normal mode); 35° (Sport mode)
GNSS	GPS + Galileo + BeiDou + GLONASS
Image sensor	4/3 CMOS, 20 million effective pixels
Stabilization system	Three-axis mechanical gimbal

Table 2. Experimental environment configuration.

Experimental Environment	Name	Model/Version
Hardware environment	GPU	NVIDIA GeForce RTX 3050 Ti Laptop GPU
	GPU memory	4 GB
	CPU	AMD Ryzen 5 5600H with Radeon Graphics
	RAM	16 GB
Software environment	Operating system	Windows 10
	Python	3.10.16
	PyTorch	2.2.0
	CUDA	11.8.0
	cuDNN	8.9.7

Table 3. Experimental hyperparameter settings.

Parameter Name	Parameter Value/Algorithm
Training epochs	400
batch_size	8
Pretrained	True
Data augmentation	Mosaic
Optimizer	Stochastic gradient descent (SGD)
Initial learning rate	0.01
Learning rate schedule	Cosine annealing
Cosine annealing hyperparameters	0.01
Cosine annealing hyperparameters	0.937
Weight decay coefficient	0.0005
IoU_thres in NMS	0.35

Note: Mosaic augmentation is a default training augmentation in YOLO series that stitches four images together, enriching the context and improving small object detection.

Table 4. Results of ablation experiment.

Model	SPPFCSPC	CBAM	EIoU	Precision/%	Recall/%	mAP@0.5/%	mAP@[0.5–0.95]/%
YOLOv8s	×	×	×	82.4	78.3	82.5	44.4
YOLOv8s_1	√	×	×	84.5	78.9	83.0	45.5
YOLOv8s_2	×	√	×	83.9	75.0	81.6	44.6
YOLOv8s_3	×	×	√	82.5	79.1	83.6	50.5
Improved YOLOv8s	√	√	√	83.9	79.6	83.9	46.6

Table 5. Comparison of experimental results of different models.

Model	mAP@0.5/%	mAP@[0.5–0.95]/%	F1-Score	FPS
YOLOv5s	76.0	40.3	0.76	98.7
YOLOv8s	81.5	43.5	0.79	89.9
YOLOv11s	82.2	44.6	0.80	84.1
YOLOv12s	82.7	45.2	0.81	65.9
Improved model	83.9	46.6	0.82	81.2
