A Lightweight YOLOv11n-Based Framework for Highway Pavement Distress Detection Under Occlusion Conditions

Wei Li; Xiao Luo; Changhao Yang; Miao Fang; Weiyu Liu

doi:10.3390/app15179664

School of Electronics and Control Engineering, Chang’an University, Xi’an 710064, China

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Appl. Sci. 2025, 15(17), 9664; https://doi.org/10.3390/app15179664

Featured Application

The proposed RS-YOLOv11n-based lightweight pavement distress detection model can be deployed on vehicle-mounted mobile inspection terminals. It achieves high-precision distress identification even in complex highway scenarios characterized by high dynamics and severe occlusion. Its heterogeneous feature distillation backbone (RHGNetv2) directionally extracts edge features of elongated distresses, while the occlusion-aware module (SEAM) employs a spatial-channel cooperative attention mechanism to enhance contextual reasoning capabilities for partially occluded defects. This enables real-time, accurate intelligent pavement inspection on edge devices.

Abstract

In response to the three main challenges in lightweight road pavement defect detection models—insufficient feature discriminability, weak environmental robustness, and low edge deployment efficiency—this paper proposes an innovative architecture, RS-YOLOv11n, based on YOLOv11n. Experimental results demonstrate significant improvements of RS-YOLOv11n over YOLOv11n on the RDD2022_Mix dataset: model parameters are reduced by 21.0%, computational complexity is decreased by 17.5%, mAP@0.5 is increased by 0.64%, and recall rate is improved by 1.03%. Firstly, a heterogeneous feature distillation backbone, RHGNetv2, is designed, incorporating RepConv reparameterized convolution to optimize computational efficiency. Secondly, a lightweight occlusion-aware module, SEAM, is introduced, significantly enhancing detection performance in occluded scenarios. RS-YOLOv11n provides a high-precision, low-resource, lightweight solution for intelligent road inspection.

Keywords:

pavement distress detection; YOLOv11n; lightweight; HGNetv2; attention mechanism

1. Introduction

Highway transportation networks serve as vital lifelines for national economic development and societal operation, where safety and unimpeded flow are paramount. With the continuous expansion of China’s highway mileage and increasing service duration, pavements inevitably develop various distresses—such as cracks, potholes, and alligator cracking—due to combined factors including vehicular loads, harsh climates, and material aging. Untimely detection and repair of these defects not only significantly reduces road serviceability and lifespan but also poses severe traffic safety hazards, endangering lives and property.

Traditional pavement distress detection primarily relies on manual inspection or semi-automated specialized vehicles, suffering from inherent limitations such as inefficiency, subjectivity, high costs, susceptibility to weather/illumination conditions, and inability to achieve all-weather real-time monitoring. Against the backdrop of China’s national strategy of “building a leading transportation country” and new infrastructure initiatives, developing intelligent, automated road inspection and maintenance technologies has become an urgent industry demand. Computer vision and deep-learning-based object detection techniques, particularly those utilizing widely deployed roadside surveillance cameras or vehicle-mounted mobile platforms for real-time distress identification, offer revolutionary solutions for efficient, precise, and automated pavement condition monitoring.

Research on automated pavement distress detection has evolved from traditional image processing methods to deep learning approaches, achieving significant progress in detection accuracy and efficiency in recent years.

Traditional image-processing-based distress detection methods have achieved significant advancements through continuous research efforts by scholars worldwide. Recognizing the substantial grayscale differences between distressed areas and background regions in images, researchers have conducted systematic explorations leveraging this characteristic. Cheng et al. [] proposed a real-time threshold segmentation method using image differencing, enabling efficient pavement distress segmentation with limited samples. Yamaguchi et al. [] developed an efficient high-speed crack detection approach employing percolation-based image processing. Their method reduces computation time through termination and skip-adding procedures, halting the percolation process by calculating cyclicity during processing. Tsai et al. [] introduced an adaptive threshold segmentation technique to mitigate threshold-setting impacts on segmentation quality, enhancing pavement distress segmentation accuracy. Lee et al. [] utilized morphological techniques to correct background illumination inhomogeneity, coupled with enhanced binarization and shape analysis to improve detection performance. Li et al. [] designed a long-distance image acquisition device with integrated processing for precise crack extraction. Their solution incorporates a modified C–V model-based segmentation algorithm optimized for crack detection. Xu Xuejun et al. [] conducted comprehensive evaluations of image processing algorithms including grayscale conversion, checkerboard corner pixel rate calculation, noise-reduction filtering, and edge detection. This framework enabled video-/image-based bridge crack width quantification and software implementation. Wang Xin et al. [] established a threshold-based pavement distress segmentation method that isolates defects through optimal threshold selection, subsequently enhancing results via morphological erosion and dilation operations.

Compared to traditional image processing methods, deep-learning-based two-stage detection approaches possess the capability for autonomous learning of distress features while effectively filtering background noise. Zheng et al. [] developed crack detection models using Fully Convolutional Networks (FCNs), Regions with CNN features (R-CNN), and Richer FCN (RFCN) architectures. These amplify and extract features from concrete structures through convolutional neural networks. Zhang Yuefei et al. [] proposed an enhanced Mask R-CNN pavement crack detection method incorporating an adaptive loss function and an auxiliary prediction branch. This design directs network attention toward fine-grained crack features, improving detection accuracy. Sun Chaoyun et al. [] introduced a modified Faster R-CNN approach for sealing crack detection. To address missed detections and localization inaccuracies in conventional Faster R-CNN, their model integrates feature extraction layers from VGG16, ZFNet, and ResNet50 architectures. Xu et al. [] achieved effective detection with limited training data by implementing a combined Faster R-CNN and Mask R-CNN training strategy, demonstrating robustness under data scarcity. Zhao et al. [] established a deep-learning-based crack detection framework utilizing Faster R-CNN for feature extraction and classification. Their method exhibits strong robustness, maintaining high accuracy for shallow, multiple, and blurred cracks. Du et al. [] presented a novel Mask R-CNN application for fracture detection in electrical image logs, automating identification in geological exploration contexts.

In order to improve detection speed and meet real-time detection requirements, some researchers have started to explore the use of deep-learning-based single-stage detection algorithms for road pavement distress identification. Qiwen et al. [] proposed integrating You Only Look Once (YOLO) into Unmanned Aerial Vehicles (UAVs) for real-time crack detection on tiled sidewalks, aiming to identify cracks from the many grid-like elements of the tiles. To further improve the accuracy of pavement distress detection, He Tiejun et al. [] proposed the Pavement Damage–YOLO (PD-YOLO) model, which is based on YOLOv5 and designed to enhance the model’s detection of pavement distress features. Sike et al. [] introduced an improved YOLOv5 model integrated with a Vision Transformer (ViT), which can compute attention weights for image regions and generate a new feature map with weighted features. By adding the ViT module to the neck of the YOLOv5 model, they achieved improvements in both speed and accuracy. Ding et al. [] constructed a model based on the YOLOv8n architecture, seamlessly integrating the Shift-Wise Convolution Feature Extraction Module (SWC2f), the Cross-Scale Attention Fusion Module (CCAFM) for feature fusion and the Dynamic Head (Dyhead) Detection Module to enhance the model’s structure. Han et al. [] proposed an enhanced pavement distress recognition algorithm, MS-YOLOv8, which modifies the YOLOv8 model by incorporating three novel mechanisms: Deformable Large Kernel Attention (DLKA), Large Separable Kernel Attention (LSKA), and Spatially Weighted Dilated Convolution (SWDA), thereby improving detection accuracy and adaptability to varying pavement conditions. Hu Xiaowei et al. [] introduced a lightweight pavement distress detection method called YOLOM, which incorporates a state-space model. 
The method first designs a multi-scan mode visual Mamba layer and adjusts data normalization to better suit the extraction of image features from smaller batch sizes during training. Additionally, they designed the Mamba Aggregated Feature Extraction Network Layer (MELAN) and the Spatial Pyramid Mamba Layer (SPMELAN), both based on the SSM core mechanism, to mine global features of distress images, thereby enhancing the model’s ability to express local details and global semantics. Yinghong et al. [] proposed the RBDN-YOLO-E model, based on YOLOv8, to address the high labor costs and low detection accuracy and efficiency in asphalt pavement distress detection. The model is designed for multi-scale target detection in deep images of asphalt pavements. Sun et al. [] proposed a new framework called DSWMamba, which includes a Selective Scanning Module (SSM) optimized to meet the unique needs of hazardous detection. Wang et al. [] introduced ShuttleNetS3, an improved architecture based on ShuttleNet, which incorporates learnable down-sampling and up-sampling modules, as well as position-based slicing for sampling. Zhang Bei et al. [] proposed a method for loose distress recognition based on an improved YOLOv8 algorithm (YOLOv8-DN). This approach designed the DN module, which combines dynamic deformable convolution and multi-scale feature fusion modules to adapt the receptive field of dynamic deformable convolutions to the morphological complexity of distress features. By utilizing multi-scale feature fusion paths, the model improves its ability to capture small and ambiguous distress areas. Liu Qi et al. [] introduced an improved lightweight YOLOv8 pavement distress detection algorithm (MSG-YOLO). This model replaces the backbone network of YOLOv8 with MobileNetV3-large to reduce the number of parameters and computational load. They also designed a new C2f_SE module to enhance the network’s focus on crack features. Liu Wenhao et al. 
[] proposed a novel detection method based on YOLOv9, which incorporates the Intra-Scale Feature Interaction (AIFI) module for more comprehensive information understanding and deeper feature extraction, as well as the Cross-Scale Feature Fusion (CCFF) module to improve the model’s adaptability to target size variations. Liu Pengyu et al. [] proposed an improved YOLOv5-based pavement distress detection method. This method combines attention mechanisms and lightweight structural components to enhance detection accuracy while reducing the number of parameters. It enables the detection and accurate identification of cracks and potholes in road surfaces under various interference conditions. Ruggieri et al. [] proposed an improved YOLOv11-based method for detecting reinforced concrete bridges. This approach incorporates an attention mechanism, enabling the network to focus on the most relevant details in each image. It demonstrates overall improvements in quantitative metrics such as precision and recall while maintaining sufficient computational efficiency, allowing for real-time implementation on resource-constrained devices. Luo Zhen et al. [] introduced an enhanced YOLOv11n-based model for road defect detection. The method integrates a CGBlock (Context Guided Block) to effectively extract both local and global feature information, thereby expanding the receptive field and significantly enhancing detection accuracy. Additionally, a DRCGN (Cross-Scale Feature Fusion Module) is designed and incorporated to capture feature information across different scales effectively. Furthermore, building upon the MPDIoU loss function, a scale factor learning mechanism is integrated to accelerate convergence speed.

Despite the high localization and recognition accuracy achieved by the aforementioned detection methods and their ability to meet real-time detection requirements, most models suffer from excessively large network parameters and size. Deploying these high-performance detection models on edge devices, which have severe limitations in computational resources, memory, and power consumption, poses a significant challenge. Therefore, researching lightweight pavement distress detection methods that are high in accuracy and speed and low in resource consumption is of great theoretical value and practical significance for advancing intelligent highway maintenance, improving infrastructure management efficiency, ensuring public travel safety, and optimizing resource allocation.

To address this, this paper proposes an innovative architecture, RS-YOLOv11n, based on YOLOv11n, targeting three major challenges in lightweight pavement distress detection models: insufficient feature discriminability, weak environmental robustness, and low efficiency in edge deployment. The architecture incorporates the heterogeneous feature distillation backbone RHGNetv2, which extracts distress features through multi-granularity decoupling convolution and improves small target detection rates by combining a dynamic receptive field selection mechanism. Additionally, an improved lightweight occlusion-aware module (SEAM) is introduced, significantly enhancing detection capability in occlusion scenarios. This approach ultimately achieves model lightweighting, offering a high-precision, low-latency, and highly generalized solution for intelligent highway inspection.

2. Related Work

YOLOv11

YOLOv11 was unveiled at YOLO Vision 2024 (YV24) as a new step in real-time object detection. It introduces architectural and training enhancements to improve accuracy, speed, and efficiency. Beyond traditional detection, YOLOv11 also supports tasks such as pose estimation and instance segmentation, which broadens its applicability. Its design balances capability and practicality for industry challenges [].

The YOLO framework revolutionized object detection with a unified neural network for simultaneous bounding box regression and classification []. This end-to-end approach differs from traditional two-stage detectors. YOLOv11’s architecture (Figure 1) has three core components. First, the backbone is the main feature extractor, which uses convolutional neural networks to convert the original image data into multi-scale feature maps. Second, the neck serves as an intermediate processing stage, using dedicated layers to aggregate and enhance feature representations at different scales. Third, the head is the prediction mechanism, which generates the final object locations and classifications from the refined feature maps. Building on this established architecture, YOLOv11 extends YOLOv8 with architectural innovations and parameter optimizations to improve detection performance.

Figure 1. YOLOv11 architecture diagram.

In the backbone, YOLOv11 abandons fixed C2f/C3 modules and introduces the C3k2 module. This integrates the cross-stage dense connections of C2f (promoting gradient flow and feature reuse) with the structural simplicity of C3. Its core innovation is the configurable ‘k’ parameter, which dynamically controls the number of branches, connection topology, and computational complexity within the module. By adjusting ‘k’ (e.g., different values for n/s/m/l/x scale models), developers achieve fine-grained trade-offs between representational capacity (accuracy) and computational cost (speed/size), enabling backbone customization for diverse scenarios. In the neck, YOLOv11 enhances the classic FPN (top-down semantic information flow) and PAN (bottom-up positional information flow) multi-scale fusion architecture. Beyond optimizing connection paths to reduce redundancy, a key upgrade is the optional integration of a high-resolution prediction layer ‘P2’. Located earlier in the backbone, ‘P2’ preserves rich spatial details, specifically addressing the challenge of detecting extremely small objects and significantly reducing missed detections, which is a core improvement for pavement distress detection. In the head, YOLOv11 undergoes thorough lightweighting, extensively replacing standard convolutions with depthwise separable convolutions (DSC). DSC decomposes a standard convolution into a depthwise convolution followed by a pointwise convolution. Additionally, YOLOv11 introduces the C2PSA module, which selectively incorporates self-attention mechanisms locally or on specific channels/branches to enhance long-range dependency and global context capture at controlled computational cost, boosting feature representation and robustness in complex environments. Collectively, these designs enable YOLOv11 to achieve higher accuracy while maintaining high inference speed and significantly reducing resource consumption, making it a suitable base model for edge deployment.
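As a rough illustration of why the depthwise separable factorization reduces cost, the sketch below counts the weights of a standard k × k convolution and of its depthwise-plus-pointwise counterpart. The function names and the 64-to-128-channel layer are illustrative assumptions, not values taken from YOLOv11; biases and normalization layers are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dsc_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise separable convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    c_in, c_out, k = 64, 128, 3  # hypothetical layer sizes
    std = standard_conv_params(c_in, c_out, k)
    dsc = dsc_params(c_in, c_out, k)
    print(f"standard: {std}, separable: {dsc}, ratio: {dsc / std:.3f}")
```

For these sizes the ratio equals 1/c_out + 1/k², so the saving grows with the number of output channels and the kernel size.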

3. Methodology

3.1. Improved YOLOv11n

RS-YOLOv11n is a lightweight road pavement defect detection model based on YOLOv11n, designed to combine a small model footprint with high detection precision. The network consists of three main components: the backbone, neck, and head, as shown in Figure 2.

Figure 2. RS-YOLOv11n architecture diagram.

Considering the common challenges in lightweight backbone networks, such as the loss of low-level features and insufficient cross-scale information fusion, which result in a significant reduction in the ability to detect fine cracks and irregular potholes, the RS-YOLOv11n model introduces improvements. Although YOLOv11n’s C3k2 module offers structural flexibility, its uniform branching mechanism struggles to accommodate the heterogeneous feature expressions required for road defects like cracks and potholes. To address this, the model replaces the YOLOv11n backbone with the improved lightweight HGNetV2 and reconfigures the feature extraction path using a triple heterogeneous feature distillation mechanism.

Furthermore, road defect detection often faces dynamic occlusion interference, which breaks the continuity of the feature space. The spatially invariant convolution operations in YOLOv11n’s detection head are inadequate for modeling the contextual dependencies of partially visible targets, which results in severe missed detections, particularly for potholes and network-like cracks. To overcome this, the detection head is improved with the SEAM (Separated and Enhancement Attention Module) from YOLO-FaceV2, enhancing the model’s ability to handle occlusions and improving detection performance in these challenging scenarios.

3.2. RHGNetV2

HGNetV2 serves as the backbone network for RT-DETR [], designed as an efficient and lightweight backbone for visual tasks at the edge. The overall architecture consists of several key components: the initial preprocessing layer HGStem, the core component HGBlock, the learnable downsampling layer LDS, the global average pooling layer GAP, and the final convolutional and fully connected layers FC. The HGBlock, which extensively employs depthwise separable convolutions (DWConv), offers advantages in computational efficiency. However, its static single-branch structure still exhibits limitations in road defect detection tasks. The fixed 3 × 3 convolution kernel struggles to adapt to the diverse scales of defects. Furthermore, the separation between depthwise and pointwise convolutions weakens the continuity of shallow features, leading to fragmentation of crack textures. Traditional solutions, such as dilated convolutions or attention mechanisms, introduce additional parameters, which contradict the lightweight design principle. Therefore, this paper replaces the DWConv in the HGBlock with reparameterizable convolutions (RepConv), which enhance the representational capacity through multi-branch heterogeneous feature learning during the training phase. The improved HGBlock is illustrated in Figure 3.
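The idea behind reparameterizable convolution can be sketched on a single channel: during training the output is the sum of a 3 × 3 branch, a 1 × 1 branch, and an identity branch, and because convolution is linear these branches can be folded into one 3 × 3 kernel for inference. The code below is a simplified, hypothetical illustration in plain Python. It omits batch normalization and multi-channel handling, which a real RepConv layer also fuses, and it is not the authors' implementation.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (same output size)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch (scalar k1) + identity branch."""
    y3 = conv2d_same(image, k3)
    return [[y3[i][j] + k1 * image[i][j] + image[i][j]
             for j in range(len(image[0]))] for i in range(len(image))]


def merge_branches(k3, k1, use_identity=True):
    """Fold the 1x1 and identity branches into a single 3x3 kernel."""
    merged = [row[:] for row in k3]  # copy so the input kernel is not modified
    merged[1][1] += k1               # a 1x1 kernel is a 3x3 kernel with only a centre tap
    if use_identity:
        merged[1][1] += 1.0          # identity is a 1x1 kernel with weight 1
    return merged


if __name__ == "__main__":
    image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
    train_out = multi_branch(image, k3, 0.25)
    infer_out = conv2d_same(image, merge_branches(k3, 0.25))
    max_diff = max(abs(a - b) for ra, rb in zip(train_out, infer_out)
                   for a, b in zip(ra, rb))
    print(f"max difference between the two forms: {max_diff:.2e}")
```

The merged kernel reproduces the multi-branch output up to floating-point rounding, which is why the extra branches add no cost at inference time.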

Figure 3. RHGBlock structure diagram.

3.3. Detect_SEAM

The SEAM module [] is an attention-based module aimed at enhancing object detection accuracy in occluded scenes, particularly addressing the issue of feature loss during partial occlusion. SEAM leverages attention mechanisms in both spatial and channel dimensions to emphasize the important regions of the image, especially focusing on areas that are not occluded. The model becomes more precise in capturing the features of these non-occluded regions. This enhancement ensures that, in practical detection tasks, even if part of the target is occluded, the model can still perform accurate detection by utilizing the features from the unoccluded areas. The SEAM architecture comprises multiple CSMM modules, each employing patches of different sizes to extract multi-scale features. Although this design allows for the capture of richer multi-scale information, it also increases computational complexity. To reduce the computational load and optimize model performance, this paper proposes an improvement to the SEAM architecture by using a single CSMM module with a patch size set to 3. This modification effectively reduces the computational burden while maintaining the ability to extract multi-scale features with an appropriate patch size. By sharing information across different scales, this approach significantly enhances computational efficiency without compromising model accuracy, making it more suitable for applications with limited computational resources. The improved SEAM is illustrated in Figure 4.

Figure 4. Improved SEAM Structure. (Left) Modified SEAM architecture; (Right) CSMM structure.

4. Experimental Design and Results Analysis

4.1. Experimental Environment

To ensure a fair comparison, all experiments were conducted on the same computer with identical hardware specifications and uniform parameter configurations. The computational environment is detailed in Table 1, while the selected training hyperparameters are specified in Table 2.

Table 1. Experimental environment.

Table 2. Training parameters.

4.2. Dataset

The dataset used in this paper is derived from the publicly available road damage dataset RDD2022, primarily consisting of high-resolution pavement images captured by on-vehicle camera equipment. The RDD2022 (Road Damage Detection 2022) dataset is a publicly accessible resource for pavement damage detection tasks, aimed at promoting the application of computer vision and deep learning models in automatic road damage detection. This dataset comprises images of various types of pavement damage from different regions, making it suitable for training and testing machine learning and deep learning models for automatic pavement damage detection. The RDD2022 dataset covers road scenes from countries such as Japan, India, and the Czech Republic, with a total of 47,420 road images. Its damage classification labels are standardized based on the Japanese road maintenance standards, with all countries using a unified label system. The dataset annotates over 55,000 road damage instances, including four typical damage types: longitudinal cracks, transverse cracks, alligator cracks, and potholes.

To enhance the model’s cross-regional generalization ability and ensure data representativeness, a full sample selection was made based on road infrastructure characteristics from three countries: China (mainly asphalt pavements), the Czech Republic (composite pavements), and the United States (high-traffic highways). By parsing the annotation files using automated scripts and statistically analyzing the instance counts for each damage type across countries, it was observed that the “Block crack” category had only three instances. To prevent the model from overfitting due to the small sample size of this category, which could reduce generalization ability and exacerbate class imbalance issues, categories with fewer than 50 instances were excluded. This ensured effective learning for each damage type. After this filtering process, a core dataset containing 12,012 high-quality annotated images was constructed, referred to as the RDD2022_Mix dataset. The dataset was then split into training, validation, and testing sets with a ratio of 7:1:2. The statistical distribution of the damage categories in the RDD2022_Mix dataset is shown in Figure 5. Among the damage types, D10 represents transverse cracks, D00 represents longitudinal cracks, D20 represents alligator cracks, D40 represents potholes, and Repair represents repairs. An example of the dataset image is shown in Figure 6.
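The filtering and splitting steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: annotations are represented as a dictionary from image id to a list of class labels, images left with no labels after filtering are dropped, and the split is a seeded random shuffle. The text does not specify how the authors handled these details, and the function names are hypothetical.

```python
import random
from collections import Counter

MIN_INSTANCES = 50  # threshold stated in the text


def filter_rare_classes(annotations, min_instances=MIN_INSTANCES):
    """Drop labels of classes with fewer than `min_instances` instances.

    `annotations` maps an image id to the list of class labels in that image.
    Images left without any label are removed (an assumption of this sketch).
    """
    counts = Counter(label for labels in annotations.values() for label in labels)
    kept = {name for name, n in counts.items() if n >= min_instances}
    filtered = {}
    for image_id, labels in annotations.items():
        remaining = [label for label in labels if label in kept]
        if remaining:
            filtered[image_id] = remaining
    return filtered


def split_7_1_2(image_ids, seed=0):
    """Shuffle image ids and split them into train/val/test at a 7:1:2 ratio."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * 7 // 10  # integer arithmetic avoids float rounding
    n_val = len(ids) // 10
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


if __name__ == "__main__":
    # Toy data: 60 images with D00, one of which also carries a rare label.
    toy = {f"img{i}": ["D00"] for i in range(60)}
    toy["img0"] = ["D00", "Block crack"]
    clean = filter_rare_classes(toy)
    train, val, test = split_7_1_2(clean.keys())
    print(len(clean), len(train), len(val), len(test))
```

Applied to 12,012 images, this integer split would give 8408 training, 1201 validation, and 2403 test images; the authors' exact counts may differ depending on rounding.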

Figure 5. Distress category distribution in RDD2022_Mix dataset.

Figure 6. An example of the RDD2022_Mix dataset image.

4.3. Evaluation Metrics

To evaluate the performance of the proposed model in pavement distress detection, a comprehensive set of metrics was selected: precision (P), recall (R), mean average precision (mAP), number of parameters (Parameters), giga floating-point operations (GFLOPs), model size (Size), and frames per second (FPS). The formulas for P, R, AP, mAP, and FPS are given in Equations (1)–(5):

P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}

(1)

R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}

(2)

\mathrm{AP} = \int_{0}^{1} P \, dR

(3)

\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_{i}

(4)

\mathrm{FPS} = \frac{1}{\mathrm{DetectionTime}}

(5)

In Equations (1) and (2), TP (true positives) denotes the number of samples correctly identified as positive (i.e., actual distress regions detected as distress); FP (false positives) denotes the number of samples incorrectly identified as positive (i.e., non-distress regions misclassified as distress); and FN (false negatives) denotes the number of samples incorrectly identified as negative (i.e., actual distress regions missed by the detector). For completeness, TN (true negatives) denotes the number of samples correctly identified as negative (i.e., non-distress regions correctly classified as non-distress), although it does not appear in these equations. In Equation (3), AP is the average precision of a single category, computed as the area under its precision-recall curve. In Equation (4), N represents the total number of distinct distress classes and AP_i represents the average precision for the i-th class. In Equation (5), DetectionTime represents the average inference time required by the model to process a single input frame, typically measured as the mean inference time across the test dataset.
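Equations (1)–(5) can be written out as a short sketch. The function names are illustrative, and the AP integral is approximated here with the trapezoidal rule over sampled precision-recall points; common evaluation toolkits instead integrate an interpolated precision envelope, so their AP values can differ slightly from this simplification.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Equations (1) and (2): precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(recalls, precisions):
    """Equation (3): area under the precision-recall curve (trapezoidal rule).

    `recalls` must be sorted in increasing order and aligned with `precisions`.
    """
    area = 0.0
    for i in range(1, len(recalls)):
        width = recalls[i] - recalls[i - 1]
        area += width * (precisions[i] + precisions[i - 1]) / 2
    return area


def mean_average_precision(ap_per_class):
    """Equation (4): mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


def frames_per_second(detection_time_s: float) -> float:
    """Equation (5): FPS from the mean per-frame detection time in seconds."""
    return 1.0 / detection_time_s


if __name__ == "__main__":
    p, r = precision_recall(tp=8, fp=2, fn=2)   # toy counts
    ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.0])
    print(p, r, ap, mean_average_precision([ap, 1.0]), frames_per_second(0.02))
```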

4.4. Ablation Study

To systematically validate the performance gains of the proposed improvement strategies and the efficacy of model lightweighting, this paper conducted ablation experiments using YOLOv11n as the baseline model. YOLOv11n+RHGNetv2 replaces the backbone network of YOLOv11n with the proposed RHGNetv2, which is an enhanced version of HGNetv2. YOLOv11n+SEAM integrates the improved SEAM attention module into the detection head of YOLOv11n. RS-YOLOv11n represents the full proposed algorithm, incorporating both improvements. All experimental groups strictly adhered to identical hyperparameter configurations and training strategies, with ablation studies performed on the RDD2022_Mix dataset. Quantitative evaluation metrics included mean average precision at a 50% Intersection over Union (IoU) threshold (mAP@0.5), number of parameters (Parameters), model size (Size), and floating-point operations (GFLOPs).
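The IoU threshold behind mAP@0.5 can be illustrated with a small sketch. Boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates, and the helper names are hypothetical. In a full evaluation a prediction must also carry the correct class label and each ground-truth box can be matched only once, which this fragment does not model.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a match when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    shifted = (5, 0, 15, 10)     # overlaps half of the ground-truth box
    print(iou(gt, shifted), is_true_positive(shifted, gt))
```

In the toy example the shifted box overlaps half of the ground truth, yet its IoU is only one third, so it would not count as a true positive at the 0.5 threshold.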

The ablation results are presented in Table 3. They show that the proposed model improves both detection accuracy and model efficiency. Replacing the original backbone of YOLOv11n with RHGNetv2 increases mAP@0.5 on the test set by 1.31% while reducing parameters by 17.44%, model size by 0.7 MB, and computational cost by 0.6 GFLOPs. With the improved SEAM detection head also integrated, the full model’s mAP@0.5 is 0.64% higher than that of the original YOLOv11n, with parameters reduced by 21.0%, model size by 0.9 MB, and computation by 1.1 GFLOPs. This indicates that the proposed model improves accuracy over the baseline while substantially reducing model complexity, validating the effectiveness of the lightweight design. It provides a high-accuracy, low-resource real-time detection solution suitable for edge-device-based road inspection systems.
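The reported reductions can be cross-checked with simple arithmetic. The sketch below combines the absolute figures stated in the Conclusions for the full model (2.04 million parameters, 5.2 GFLOPs) with the reductions quoted above to recover the implied baseline values. The helper names are illustrative, and the derived baseline figures are implied estimates, not values read from Table 3.

```python
def baseline_from_relative(reduced: float, relative_reduction: float) -> float:
    """Baseline value implied by a reduced value and a fractional reduction."""
    return reduced / (1.0 - relative_reduction)


def relative_reduction(baseline: float, reduced: float) -> float:
    """Fractional reduction from `baseline` to `reduced`."""
    return (baseline - reduced) / baseline


if __name__ == "__main__":
    # Parameters: 2.04 M after a 21.0% reduction.
    implied_params = baseline_from_relative(2.04, 0.21)
    # Computation: 5.2 GFLOPs after an absolute reduction of 1.1 GFLOPs.
    implied_gflops = 5.2 + 1.1
    flops_cut = relative_reduction(implied_gflops, 5.2)
    print(f"implied baseline parameters: {implied_params:.2f} M")
    print(f"implied baseline computation: {implied_gflops:.1f} GFLOPs")
    print(f"relative computation reduction: {flops_cut:.1%}")
```

Under these assumptions the implied baseline is roughly 2.58 million parameters and 6.3 GFLOPs, and the relative computation reduction comes out at about 17.5%, which agrees with the figure given in the Abstract.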

Table 3. Ablation study results.

Concurrently, Figure 7 illustrates the training progression curves for precision, recall, mAP@0.5, and mAP@0.5:0.95 of both the original and improved models. The figure reveals that RS-YOLOv11n maintains a significant advantage in precision throughout the training process, while the precision of YOLOv11n exhibits a fluctuating downward trend. The performance gap of the improved model widens progressively after approximately 50 epochs, suggesting that the optimization strategies accelerate feature convergence. Furthermore, RS-YOLOv11n reaches a higher plateau more rapidly. The original model displays pronounced overfitting (characterized by oscillating upward curves) after 150 epochs, whereas the improved model maintains smooth convergence. RS-YOLOv11n demonstrates faster convergence speed, superior stability, and a higher accuracy ceiling during the entire training cycle.

Figure 7. Training curves for precision, recall, mAP@0.5, and mAP@0.5:0.95.

Figure 8 presents the training loss and validation loss curves for the original and improved models. Analysis shows that compared to the YOLOv11n baseline, the proposed RS-YOLOv11n achieves synergistic optimization of the bounding box regression loss (box_loss), distribution focal loss (dfl_loss), and classification loss (cls_loss) during training. This optimization accelerates training convergence, mitigates overfitting risk, and reduces loss fluctuation. It establishes a training paradigm offering high stability and strong generalization capability for lightweight detectors deployed on edge devices.

Figure 8. Training and validation loss curves.

4.5. Comparative Experiments

To systematically validate the overall performance advantages of the proposed RS-YOLOv11n algorithm, this study designed comparative experiments with lightweight object detection models from the YOLO series. The effectiveness of the improved model was evaluated from two aspects: detection accuracy and lightweight capabilities. As shown in Table 4, compared to other models, the proposed model achieves significant improvements in both detection accuracy and lightweight performance. Specifically, relative to YOLOv5n, YOLOv6n, YOLOv8n, and YOLOv10n, it increases detection accuracy (mAP@0.5) by 2.82%, 4.12%, 0.4%, and 2.5%, respectively; reduces the number of parameters by 18.4%, 51.77%, 32.23%, and 10.13%; decreases model size by 0.7 MB, 4 MB, 1.7 MB, and 1.2 MB; and lowers computational complexity by 1.9 GFLOPs, 6.6 GFLOPs, 2.9 GFLOPs, and 1.3 GFLOPs, respectively. In conclusion, RS-YOLOv11n not only improves detection accuracy but also reduces the number of parameters and computational costs, offering significant advantages for future deployment. Its efficient network structure can significantly reduce computational resource consumption while maintaining detection accuracy, making it suitable for a wide range of application scenarios, especially on resource-constrained edge devices and mobile platforms.

Table 4. Comparative experiment results.

To further validate the advantages of the proposed lightweight pavement defect detection algorithm, this study visually compared the detection results of RS-YOLOv11n with those of YOLOv5n, YOLOv6n, YOLOv8n, YOLOv10n, and YOLOv11n. The results are shown in Figure 9, where each predicted category is labeled on the left of the box annotation, the corresponding confidence is shown on the right, and different colors distinguish the categories. The comparison in Figure 9 shows that, for the pavement defect detection task, the proposed algorithm better preserves the integrity and continuity of defects than the other five models, with fewer incomplete and missed detections.

Figure 9. Detection results comparison using different models.

5. Conclusions

This paper addresses a key bottleneck in lightweight pavement distress detection by proposing RS-YOLOv11n, a lightweight model based on YOLOv11n. Firstly, the original backbone of YOLOv11n is replaced with a novel backbone, RHGNetv2, to reduce missed detections of slender cracks and small targets. On its own, this modification raises mAP@0.5 from 70.85% to 72.16%. Secondly, an enhanced SEAM module is introduced, which uses attention mechanisms to strengthen contextual reasoning for partially visible distresses and reduce false positives in occlusion scenarios. Integrating the two modules compresses the model to 2.04 million parameters, reduces the model size to 4.3 MB, and lowers computational complexity to 5.2 GFLOPs, while reaching an mAP@0.5 of 71.49% and meeting the real-time requirements of mobile inspection devices. Finally, comparative experiments confirm that RS-YOLOv11n outperforms other lightweight YOLO series variants (YOLOv5n, YOLOv6n, YOLOv8n, and YOLOv10n) in both mAP@0.5 and lightweight metrics. This provides a reliable technical foundation for intelligent maintenance of highway infrastructure.

6. Future Work

As pavement distress detection technology continues to develop, lightweight models will be applied more widely. RS-YOLOv11n can be further optimized to improve its adaptability in complex environments, for example under varying weather, lighting, and other challenging background conditions. With the advancement of 5G technology and edge computing, RS-YOLOv11n is also expected to be integrated into intelligent inspection devices, where stronger real-time data processing can support more efficient and precise early warning and maintenance management of road distress. Future research will focus on enhancing the model's multi-task capabilities, such as simultaneously detecting multiple types of road distress. It will also include comprehensive comparative experiments with more advanced detection algorithms to further validate and improve the model's overall performance, thereby extending its application potential in intelligent transportation and smart city construction.

Author Contributions

Conceptualization, W.L. (Wei Li); Software, M.F.; Validation, X.L.; Investigation, W.L. (Wei Li) and C.Y.; Resources, W.L. (Wei Li); Writing—original draft, X.L.; Writing—review and editing, W.L. (Weiyu Liu); Project administration, W.L. (Weiyu Liu); Funding acquisition, W.L. (Weiyu Liu). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 12172064), the Key Research and Development Program of Shaanxi Province (No. 2022GY-208), and the Fundamental Research Funds for the Central Universities, CHD (No. 300102322201).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Cheng, D.H.; Shi, J.X.; Glazier, C. Real-Time Image Thresholding Based on Sample Space Reduction and Interpolation Approach. J. Comput. Civ. Eng. 2003, 17, 264–272.
Yamaguchi, T.; Hashimoto, S. Fast crack detection method for large-size concrete surface images using percolation-based image processing. Mach. Vis. Appl. 2010, 21, 797–809.
Tsai, Y.; Kaul, V.; Mersereau, M.R. Critical Assessment of Pavement Distress Segmentation Methods. J. Transp. Eng. 2010, 136, 11–19.
Lee, Y.B.; Kim, Y.Y.; Yi, S.; Kim, J.-K. Automated image processing technique for detecting and analysing concrete surface cracks. Struct. Infrastruct. Eng. 2013, 9, 567–577.
Li, G.; He, S.; Ju, Y.; Du, K. Long-distance precision inspection method for bridge cracks with image processing. Autom. Constr. 2014, 41, 83–95.
Xu, X.; Zhang, X. Concrete Bridge Crack Detection Technology Based on Digital Images. J. Hunan Univ. (Nat. Sci. Ed.) 2013, 40, 34–40.
Wang, X.; Feng, D.; Li, W. Research and Implementation of Pavement Crack Detection Algorithm. J. Beihua Univ. (Nat. Sci. Ed.) 2017, 27, 9–10+13.
Zheng, M.; Lei, Z.; Zhang, K. Intelligent detection of building cracks based on deep learning. Image Vis. Comput. 2020, 103, 103987.
Zhang, Y.; Wang, J.; Chen, B.; Feng, T.; Chen, Z. Road Crack Detection Algorithm Based on Improved Mask R-CNN. J. Comput. Appl. 2020, 40, 162–165.
Sun, C.; Pei, L.; Li, W.; Hao, X.; Chen, Y. Pavement Crack Sealing Detection Method Based on Improved Faster R-CNN. J. South China Univ. Technol. (Nat. Sci. Ed.) 2020, 48, 84–93.
Xu, X.; Zhao, M.; Shi, P.; Ren, R.; He, X.; Wei, X.; Yang, H. Crack Detection and Comparison Study Based on Faster R-CNN and Mask R-CNN. Sensors 2022, 22, 1215.
Zhao, M.; Shi, P.; Xu, X.; Xu, X.; Liu, W.; Yang, H. Improving the Accuracy of an R-CNN-Based Crack Identification System Using Different Preprocessing Algorithms. Sensors 2022, 22, 7089.
Du, L.; Lu, X.; Li, H. Automatic fracture detection from the images of electrical image logs using Mask R-CNN. Fuel 2023, 351, 128992.
Qiwen, Q.; Denvid, L. Real-time detection of cracks in tiled sidewalks using YOLO-based method applied to unmanned aerial vehicle (UAV) images. Autom. Constr. 2023, 147, 104745.
He, T.; Li, H. Pavement Disease Detection Model Based on Improved YOLOv5. J. Civ. Eng. 2024, 57, 96–106.
Sike, W.; Xueqin, C.; Qiao, D. Detection of Asphalt Pavement Cracks Based on Vision Transformer Improved YOLO V5. J. Transp. Eng. Part B Pavements 2023, 149, 04023004.
Ding, K.; Ding, Z.; Zhang, Z.; Yuan, M.; Ma, G.; Lv, G. Scd-yolo: A novel object detection method for efficient road crack detection. Multimed. Syst. 2024, 30, 351.
Han, Z.; Cai, Y.; Liu, A.; Zhao, Y.; Lin, C. MS-YOLOv8-Based Object Detection Method for Pavement Diseases. Sensors 2024, 24, 4569.
Hu, X.; Yan, Y.; Wang, D.; Zhang, Y. Lightweight Pavement Disease Detection Method Based on YOLOM Algorithm. China J. Highw. Transp. 2024, 37, 381–391.
Ying, H.; Qin, Y.; Liu, X.; Zhu, J.; Chen, W. Asphalt Pavement Deep Image Disease Detection Algorithm Based on Improved YOLOv8. J. Hunan Univ. Sci. Technol. (Nat. Sci. Ed.) 2025, 40, 88–101.
Sun, P.; Yang, L.; Yang, H.; Yan, B.; Wu, T.; Li, J. DSWMamba: A deep feature fusion mamba network for detection of asphalt pavement distress. Constr. Build. Mater. 2025, 469, 140393.
Wang, D.; Zhang, A.A.; Peng, Y.; Wei, Y.; Cheng, H.; Shang, J. Adaptive learning network for detecting pavement distresses in complex environments. Eng. Appl. Artif. Intell. 2025, 152, 110784.
Zhang, B.; Xu, S.; Zhong, Y.; Cai, H.; Zang, Q.; Li, X. Method for Identifying Loose Disease in Semi-Rigid Base Layers Based on Improved YOLOv8 Algorithm. J. Zhengzhou Univ. (Eng. Ed.) 2025, 46, 122–129.
Liu, Q.; Liang, J.; Wang, X.; Liang, Y.; Fang, W.; Hu, W. MSG-YOLO: A Lightweight Pavement Disease Detection Algorithm. Highw. Eng. 2025, 50, 91–101.
Liu, W.; Zhang, D. Pavement Disease Detection Model Based on Improved YOLOv9. China Test. 2025, 51, 19–29.
Liu, P.; Yuan, J.; Gao, Q.; Chen, S. Pavement Disease Detection Method Based on Improved YOLOv5. J. Beijing Univ. Technol. 2025, 51, 552–559.
Ruggieri, S.; Cardellicchio, A.; Nettis, A.; Renò, V.; Uva, G. Using Attention for Improving Defect Detection in Existing RC Bridges. IEEE Access 2025, 13, 18994–19015.
Luo, Z.; Jiang, Y.; Li, W. An Improved YOLOv11n-Based Model for Road Defect Detection. Microelectron. Comput. 2025, 1–13. Available online: https://link.cnki.net/urlid/61.1123.TN.20250225.1018.010 (accessed on 31 August 2025).
Khanam, R.; Hussain, M. Yolov11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725.
Rahima, K.; Muhammad, H.; Richard, H.; Paul, A. A comprehensive review of convolutional neural networks for defect detection in industrial applications. IEEE Access 2024, 12, 94250–94295.
Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. Detrs beat yolos on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 16965–16974.
Yu, Z.; Huang, H.; Chen, W.; Su, Y.; Liu, Y.; Wang, X. Yolo-facev2: A scale and occlusion aware face detector. Pattern Recognit. 2024, 155, 110714.

Figure 1. YOLOv11 architecture diagram.

Figure 2. RS-YOLOv11n architecture diagram.

Figure 3. RHGBlock structure diagram.

Figure 4. Improved SEAM Structure. (Left) Modified SEAM architecture; (Right) CSMM structure.

Figure 5. Distress category distribution in RDD2022_Mix dataset.

Figure 6. An example of the RDD2022_Mix dataset image.

Figure 7. Training curves for precision, recall, mAP@0.5, and mAP@0.5:0.95.

Table 1. Experimental environment.

Category	Version
Operating System	Windows 11
CPU	Intel(R) Core(TM) i7-14650HX 2.20 GHz
GPU	NVIDIA GeForce RTX 4050
PyTorch Version	PyTorch 2.2.2
Python Version	Python 3.10.14
CUDA Version	CUDA 12.1

Table 2. Training parameters.

Parameter	Value
imgsz	640
epochs	200
batch	32
workers	4
optimizer	SGD
iou	0.7
lr0	0.01
lrf	0.01
momentum	0.937
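
The parameter names in Table 2 (imgsz, lr0, lrf, and so on) match the training arguments of the Ultralytics YOLO package, so the configuration can be sketched as a keyword dictionary. This is a minimal sketch under that assumption: the file names `rs-yolo11n.yaml` and `rdd2022_mix.yaml` are hypothetical placeholders, and the `train` helper is illustrative rather than the authors' training script. Under the Ultralytics convention, lrf is a fraction of lr0, so the final learning rate is lr0 × lrf.

```python
# Hyperparameters copied from Table 2.
TRAIN_ARGS = {
    "imgsz": 640,        # input image size in pixels
    "epochs": 200,
    "batch": 32,
    "workers": 4,        # data-loading worker processes
    "optimizer": "SGD",
    "iou": 0.7,          # IoU threshold used for non-maximum suppression
    "lr0": 0.01,         # initial learning rate
    "lrf": 0.01,         # final learning rate as a fraction of lr0
    "momentum": 0.937,
}


def final_learning_rate(args=TRAIN_ARGS):
    """Final learning rate, assuming the Ultralytics convention lr0 * lrf."""
    return args["lr0"] * args["lrf"]


def train(model_cfg="rs-yolo11n.yaml", data_cfg="rdd2022_mix.yaml"):
    """Launch training; both file names are hypothetical placeholders."""
    # Imported lazily so the configuration can be inspected without
    # the ultralytics package installed.
    from ultralytics import YOLO

    model = YOLO(model_cfg)
    return model.train(data=data_cfg, **TRAIN_ARGS)


if __name__ == "__main__":
    print("final learning rate:", final_learning_rate())
```

Keeping the hyperparameters in one dictionary makes it straightforward to reuse identical settings for the baseline and the improved model, which is what a fair ablation requires.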

Table 3. Ablation study results.

Model	P/%	R/%	F1/%	mAP@0.5/%	Parameters/(×10⁶)	Size/MB	FPS/(frames/s)	GFLOPs
YOLOv11n	74.96	63.40	67.84	70.85	2.58	5.20	288.87	6.30
YOLOv11n+RHGNetv2	74.43	65.41	68.81	72.16	2.13	4.50	240.58	5.70
YOLOv11n+SEAM	74.38	63.19	67.55	70.88	2.49	5.10	267.02	5.80
RS-YOLOv11n	75.56	64.43	68.45	71.49	2.04	4.30	228.07	5.20
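
The contribution of each module can be read off Table 3 by differencing against the YOLOv11n baseline. The sketch below does this for a subset of columns; the dictionary keys and the helper `delta_vs_baseline` are illustrative names, not taken from the paper. Note that the rounded table entries give a parameter reduction of about 20.9% for RS-YOLOv11n, which agrees with the 21.0% quoted in the abstract up to rounding of the underlying parameter counts.

```python
# Selected columns copied from Table 3.
ABLATION = {
    "YOLOv11n": {"R": 63.40, "mAP50": 70.85, "params_M": 2.58, "gflops": 6.3, "fps": 288.87},
    "YOLOv11n+RHGNetv2": {"R": 65.41, "mAP50": 72.16, "params_M": 2.13, "gflops": 5.7, "fps": 240.58},
    "YOLOv11n+SEAM": {"R": 63.19, "mAP50": 70.88, "params_M": 2.49, "gflops": 5.8, "fps": 267.02},
    "RS-YOLOv11n": {"R": 64.43, "mAP50": 71.49, "params_M": 2.04, "gflops": 5.2, "fps": 228.07},
}


def relative_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100 * (new - old) / old


def delta_vs_baseline(variant, baseline="YOLOv11n", table=ABLATION):
    """Changes of `variant` relative to `baseline` (hypothetical helper)."""
    b, v = table[baseline], table[variant]
    return {
        # Accuracy metrics: absolute change in percentage points.
        "recall_pp": round(v["R"] - b["R"], 2),
        "map50_pp": round(v["mAP50"] - b["mAP50"], 2),
        # Cost and speed metrics: relative change in percent.
        "params_pct": round(relative_change(v["params_M"], b["params_M"]), 1),
        "gflops_pct": round(relative_change(v["gflops"], b["gflops"]), 1),
        "fps_pct": round(relative_change(v["fps"], b["fps"]), 1),
    }


if __name__ == "__main__":
    for name in ABLATION:
        if name != "YOLOv11n":
            print(name, delta_vs_baseline(name))
```

Negative values for parameters and GFLOPs indicate a lighter model, while a negative FPS change indicates lower throughput on the test hardware, so the table should be read as a trade-off rather than a uniform improvement.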

Table 4. Comparative experiment results.

Model	P/%	R/%	F1/%	mAP@0.5/%	Parameters/(×10⁶)	Size/MB	FPS/(frames/s)	GFLOPs
YOLOv5n	72.09	62.19	65.87	68.67	2.50	5.00	324.89	7.10
YOLOv6n	67.20	62.90	64.09	67.37	4.23	8.30	343.37	11.80
YOLOv8n	76.06	61.24	67.15	71.09	3.01	6.00	307.99	8.10
YOLOv10n	68.59	65.47	66.52	68.99	2.27	5.50	228.65	6.50
YOLOv11n	74.96	63.40	67.84	70.85	2.58	5.20	288.87	6.30
RS-YOLOv11n	75.56	64.43	68.45	71.49	2.04	4.30	228.07	5.20

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
