DMR-YOLO: A Lightweight Visual Inspection Method for Surface Defect Detection of Aero-Engine Components

Tong, Jinwu; Cao, Han; Lu, Xinyun; Zhang, Xin; Gao, Bingbing

doi:10.3390/aerospace13040360

by Jinwu Tong ¹,*,†, Han Cao ¹,†, Xinyun Lu ¹, Xin Zhang ¹ and Bingbing Gao ²,³,*

¹ Engineering Training Center & School of Applied Technology, Nanjing Institute of Technology, Nanjing 211167, China

² School of Automation, Northwestern Polytechnical University, Xi’an 710072, China

³ Shenzhen Research Institute of Northwestern Polytechnical University, Shenzhen 518057, China

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Aerospace 2026, 13(4), 360; https://doi.org/10.3390/aerospace13040360

Submission received: 15 March 2026 / Revised: 7 April 2026 / Accepted: 7 April 2026 / Published: 13 April 2026

(This article belongs to the Special Issue Intelligent Assembly and Measurement Technologies for Next-Generation Aero-Engines)

Abstract

Accurate surface defect detection is essential for ensuring the measurement accuracy and assembly reliability of aero-engine components during manufacturing and assembly processes. Bearings, as critical rotating components in aero-engines, are highly sensitive to surface defects that may lead to stress concentration and premature failure. However, complex defect types, low-contrast textures, and multi-scale characteristics pose significant challenges for existing lightweight visual inspection models. To address these issues, this paper proposes an improved lightweight detection model, termed DMR-YOLO, based on YOLOv8n. A Diverse Branch Block (DBB) is introduced to enhance multi-scale feature extraction and improve the representation of complex defect patterns. A Multi-Level Channel Attention (MLCA) mechanism is embedded to strengthen discriminative feature channels and suppress background interference caused by low-contrast textures. In addition, a ResidualADown module is designed to preserve critical feature information during downsampling, improving the detection of subtle defects. Experimental results on a bearing surface defect dataset show that the proposed model achieves an mAP of 89.3%, representing a 2.8% improvement over YOLOv8n while maintaining real-time inference at 138.6 FPS. Moreover, generalization tests conducted on a steel surface defect dataset demonstrate the robustness and transferability of the proposed method across different datasets.

Keywords: aero-engine components; surface defect detection; bearing defects; intelligent visual inspection; lightweight object detection; YOLOv8; DMR-YOLO

1. Introduction

In modern aerospace manufacturing, aero-engine components require extremely high standards of dimensional accuracy, structural reliability, and operational safety. Bearings, as one of the critical rotating components in aero-engine systems, play an essential role in supporting high-speed rotating shafts and ensuring stable power transmission [1]. In aerospace propulsion systems, accurate monitoring and diagnosis of bearing conditions are crucial for ensuring operational safety and preventing catastrophic failures. Recent studies have explored intelligent diagnostic methods based on deep learning to improve the robustness of bearing fault identification in complex operating environments [2]. Minor surface defects—such as cracks, scratches, burrs, pitting, and poor polishing—may originate from complex machining processes, assembly stresses, or harsh operating environments. Although these defects are usually small in size, they can act as stress concentration points, triggering early fatigue failure and significantly reducing the reliability of aero-engine components. Therefore, achieving high-accuracy, automated, and real-time detection of bearing surface defects is crucial for intelligent inspection and precision measurement in advanced manufacturing.

Traditional bearing surface defect detection methods mainly rely on manual inspection by human operators or conventional image processing-based algorithms [3], such as threshold segmentation, edge detection, and texture analysis. These approaches are relatively effective when dealing with homogeneous backgrounds or single-type defects; however, their capability to identify defects under complex backgrounds, low-contrast conditions, or multi-scale scenarios remains limited. Moreover, such traditional methods generally exhibit poor robustness and are highly sensitive to variations in illumination, noise, and texture interference, making them inadequate for the diverse and high-speed inspection requirements of modern industrial production [4]. With the continuous progress of deep learning and computer vision, surface defect detection has increasingly relied on CNN (Convolutional Neural Network)-based object detection methods, which benefit from unified feature extraction and prediction within a single learning framework [5]. For instance, Xia et al. [6] proposed an improved Faster R-CNN-based surface defect detection algorithm, in which a feature pyramid network built upon a ResNet-50 backbone with deformable convolutions was employed to enhance the representation of multi-scale defect features, thereby addressing challenges arising from diverse defect types and complex geometries. Zhang et al. [7] introduced an improved YOLOv5 model that integrates a multi-scale feature fusion strategy with a CSPLayer Res2Attention residual module, significantly strengthening defect feature extraction and aggregation, and consequently improving classification and localization accuracy. In aerospace applications, intelligent diagnosis frameworks have also been proposed to analyze bearing health conditions under complex working environments. 
For example, federated learning combined with self-attention mechanisms has been explored to improve the accuracy and robustness of aerospace bearing fault diagnosis [8]. Among these approaches, the YOLO (You Only Look Once) series models, characterized by lightweight architectures, high detection speed, and competitive accuracy, have been widely adopted in industrial visual inspection tasks [9]. These intelligent visual inspection methods provide an important technical foundation for automated defect measurement and quality control in modern aerospace manufacturing.

Despite the remarkable success of the YOLO series in general object detection, several challenges remain when applying these models to bearing surface defect detection. First, bearing defects are typically characterized by small scales, irregular shapes, and low contrast [10], making it difficult for conventional convolutional structures to effectively capture fine-grained edge and texture information. Second, although lightweight variants such as YOLOv8n offer high inference speed, the fusion between deep semantic features and shallow texture features is often insufficient [11], limiting detection performance in complex multi-scale scenarios. Furthermore, the lack of targeted enhancement mechanisms for critical features in deep networks makes them vulnerable to background interference in complex industrial environments, thereby reducing defect discriminability [12]. In addition, conventional downsampling strategies tend to cause the loss of subtle defect information during feature map compression [13], which is detrimental to subsequent multi-scale feature fusion and precise localization. Consequently, further improving detection accuracy and model robustness while maintaining real-time inference speed [14] remains an urgent challenge.

In response to the above limitations, an improved lightweight bearing surface defect detection model, named DMR-YOLO, was developed in this study. The proposed model is built on YOLOv8n and incorporates multi-level structural optimizations. Specifically, the main contributions of this work are summarized as follows:

(1) A task-oriented re-parameterized feature extraction module (C2f-DBB) is designed by embedding the Diverse Branch Block into the YOLOv8 backbone and neck. Unlike conventional static convolution structures, the proposed design enables adaptive multi-branch feature aggregation during training while maintaining single-path efficiency during inference. This improves the representation of irregular and small-scale defect patterns under complex industrial backgrounds.

(2) A lightweight multi-level channel attention module (C2f-MLCA) is constructed and integrated into selected backbone layers. Different from conventional channel attention mechanisms, the proposed design jointly models local and global channel dependencies, specifically enhancing the discrimination of low-contrast defect features against noisy metallic textures.

(3) A ResidualADown module is proposed to address information loss in conventional downsampling. By introducing a residual information preservation path, the module improves the retention of fine-grained spatial details, which is critical for detecting subtle defects such as micro-cracks.

(4) More importantly, a collaborative optimization strategy is developed by integrating DBB, MLCA, and ResidualADown into a unified lightweight framework. The three modules are designed to complement each other in feature extraction, feature selection, and information preservation, respectively, forming a synergistic mechanism that enhances detection performance beyond simple incremental improvements.

Through these improvements, the proposed DMR-YOLO model maintains a lightweight design and high inference speed while enhancing multi-scale feature representation and small-target detection performance. Experimental results show that DMR-YOLO achieves higher detection accuracy and stability than existing lightweight models on the bearing surface defect dataset. These results indicate its potential for intelligent visual inspection in surface defect detection tasks, including applications in aero-engine component inspection. In aerospace manufacturing, defect detection is not only required to identify defect categories, but also to support reliability-oriented inspection processes. In practical scenarios, automated visual inspection systems are typically used as a preliminary screening tool to assist human experts, where high recall is critical to avoid missing potential defects, while real-time performance is required for production efficiency. Therefore, developing a lightweight and high-sensitivity detection model is an important step toward intelligent inspection in aerospace manufacturing pipelines, although it does not replace subsequent quantitative evaluation and certification procedures.

The remainder of this paper is organized as follows: Section 2 briefly reviews the research status of bearing defect detection technologies and introduces the fundamental concepts of the YOLOv8n model. Section 3 presents the overall architecture of the proposed DMR-YOLO model and elaborates on the key improvement strategies. Section 4 describes the experimental setup and provides a detailed analysis of the experimental results to validate the performance of the proposed model. Section 5 conducts generalization experiments of the improved model on additional datasets. Finally, the conclusions are drawn, and potential directions for future research are discussed.

2. Related Work

2.1. Defect Detection Based on Deep Learning

Existing studies on bearing surface defect detection can be broadly categorized into two main directions: feature representation enhancement and lightweight model design.

In terms of feature representation, several approaches aim to improve the ability of detection models to capture multi-scale and fine-grained features. Hu and Tong [15] proposed the C2f_EMSCP module, which was integrated into both the backbone and neck networks. This module effectively combines multi-scale convolutional networks with a position-aware mechanism, enhancing the model’s sensitivity and detection capability for objects at multiple scales through the fusion of feature maps from different resolutions. Han et al. [16] proposed an intelligent feature concentration (IFC) module in their self-designed BED-YOLO model to preserve critical defect features, and further developed an efficient feature fusion module with scalable convolutions (EFFSC), which significantly improved the model’s representational capability. Hu et al. [17] optimized the coordinate attention mechanism and introduced a novel deformable attention mechanism (DAM) specifically tailored for small-size defect detection. These studies mainly focus on improving the feature extraction and discrimination capability of detection models, with particular emphasis on small-scale defects in complex background environments. However, although such approaches achieve notable improvements in detection accuracy, they often lead to increased computational overhead, resulting in reduced inference speed.

To address efficiency issues, recent research has focused on lightweight model design. Fu et al. [18] developed a YOLOv5-oriented bearing defect detection framework by introducing MobileNetV3 as a lightweight backbone, replacing the original YOLOv5 feature extractor and significantly reducing inference time. Zhang et al. [19] developed a lightweight bearing defect detection model termed LARD-YOLOv8 based on YOLOv8n. In this work, a LiteShiftHead detection head integrating SPConv, REG, and CLS modules was designed to ensure lightweight architecture while enabling efficient feature extraction and accurate classification and regression. An ARConv module was further introduced to enhance adaptability to multi-directional defects, while the RepNCSPELAN4 module was employed to optimize computational efficiency. In addition, the Inner-DIoU loss function was improved by dynamically adjusting auxiliary bounding boxes, thereby enhancing localization accuracy and convergence speed. Xu et al. [10] proposed an improved YOLOv5-based bearing defect detection method by replacing the C3 modules in the backbone with C2f modules, which reduced the number of parameters and computational complexity to improve both speed and accuracy. Furthermore, SPD modules were incorporated into the backbone and neck networks to strengthen the handling of low-resolution features and small targets, while a lightweight CARAFE operator was adopted to replace nearest-neighbor upsampling, enriching contextual information, reducing information loss, and improving model diversity and robustness. As a result, the detection speed reached up to 100 FPS. Liu et al. [20] introduced a YOLOv8n-based framework that integrates a VanillaNet backbone, the Lion optimizer, a CFP-EVC module, and the Shape-IoU loss function, achieving significant improvements in detection efficiency and accuracy while reducing computational complexity and maintaining a mean average precision of 86.5%. 
Although existing lightweight models have demonstrated enhanced object recognition capabilities, there remains considerable room for further optimization in terms of detection accuracy and inference speed, particularly for small-scale defect detection under complex background textures.

2.2. Overview of the YOLOv8 Algorithm

YOLOv8, developed by the Ultralytics team, is a new-generation single-stage detection framework that extends beyond conventional object detection to support a broad range of vision perception tasks under a unified architecture. The YOLOv8 family comprises five model scales—YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x—enabling flexible adaptation to diverse application scenarios ranging from lightweight deployment on mobile devices to high-precision detection on server platforms. On mainstream benchmarks such as the COCO dataset, YOLOv8 consistently outperforms previous versions, including YOLOv5 and YOLOv7, in terms of both detection accuracy and inference speed, thereby maintaining the core YOLO philosophy of efficiency and practicality.

Building upon and optimizing the architectural design of earlier YOLO versions, YOLOv8 incorporates a series of innovative upgrades that further enhance model generalization capability and engineering applicability. The main improvements can be summarized as follows:

(1) An anchor-free detection mechanism is introduced to replace the conventional anchor-based design, which simplifies the label matching process, reduces the model’s dependence on data distribution, and results in more stable detection performance with improved generalization capability.
(2) An improved C2f (Cross-Stage Partial Fusion) structure is adopted to replace the C3 module used in YOLOv5. By leveraging branch connections and feature reuse, the C2f module further reduces computational complexity while enhancing feature representation capability.
(3) A dynamic task assignment strategy, namely the Task-Aligned Assigner, is introduced to enable more flexible and efficient positive–negative sample assignment, thereby accelerating model convergence and improving detection accuracy.

The network architecture of YOLOv8 follows the standard three-part design, consisting of a backbone network, a neck network, and a head network, as illustrated in Figure 1.

Backbone Network: The backbone network of YOLOv8 is constructed based on the CSPDarknet paradigm, where multi-layer convolutional operations are combined with C2f modules to extract features at different scales. The C2f module enhances information flow through cross-stage connections and feature fusion mechanisms, while incorporating a lightweight design to reduce computational overhead. In addition, YOLOv8 employs a Spatial Pyramid Pooling-Fast (SPPF) module at the end of the feature extraction stage to enlarge the multi-scale receptive field, enabling effective capture of contextual information for objects of various sizes.

Neck Network: The neck network of YOLOv8 is built upon a PAN-FPN (Path Aggregation Network and Feature Pyramid Network) feature fusion framework. Through bidirectional multi-scale feature propagation along both top-down and bottom-up paths, deep semantic information is effectively fused with shallow spatial details. On this basis, YOLOv8 further improves feature fusion efficiency by adopting lightweight convolutional operations and efficient connection strategies.

Head Network: YOLOv8 adopts an anchor-free detection head, in which decoupled classification and regression branches are employed to predict class probabilities and bounding box parameters, respectively. Unlike traditional anchor-based detectors, YOLOv8 performs predictions directly at pixel locations on the feature maps, significantly reducing the complexity of hyperparameter design. Furthermore, IoU-related constraints are incorporated into both bounding box regression and sample assignment processes, enabling the model to focus more on the overlap quality between predicted boxes and ground-truth boxes during training. As a result, target localization accuracy and detection robustness are effectively enhanced [21].
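The overlap quality referred to above is conventionally measured by the intersection over union (IoU) of a predicted box and a ground-truth box. The following is a minimal sketch of that measure only, assuming boxes in (x1, y1, x2, y2) corner format; the function name `box_iou` is illustrative and is not taken from the paper, and the IoU-related terms actually used for regression and assignment may be more elaborate variants.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Width and height of the intersection rectangle (zero if the boxes are disjoint).
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Guard against degenerate boxes with zero area.
    return inter / union if union > 0 else 0.0
```

For two 2 × 2 boxes offset by one unit in each direction, the intersection is 1 and the union is 7, so the IoU is 1/7.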

3. Improvements of YOLOv8 Algorithm

The architectural design of DMR-YOLO is fundamentally driven by the unique challenges of industrial bearing defect detection, such as low contrast, fine-grained features (e.g., micro-cracks), and the demand for real-time edge deployment. The improved YOLOv8n algorithm is illustrated in Figure 2. In the backbone network, the C2f-DBB module replaces the original C2f modules at the 2nd and 4th layers, while in the neck network, it is applied at the 12th, 15th, 18th, and 21st layers. This choice stems from the need to capture irregular defect geometries; by incorporating multiple branch convolutions during training and fusing them via structural re-parameterization for inference, the C2f-DBB module enhances feature diversity and multi-scale perception capability without increasing the computational burden during deployment.

Additionally, a Multi-Level Channel Attention (MLCA) mechanism is introduced at the 6th and 8th layers of the backbone network. The original C2f modules at these layers are replaced with C2f-MLCA, which adaptively adjusts feature weights across multiple channel levels. This is specifically designed to handle the complex optical conditions of metallic bearing surfaces; by recalibrating channel-wise importance, the mechanism strengthens the model’s focus on critical defect signatures while effectively suppressing background noise caused by specular reflections and uneven lighting.

Finally, the specially designed ResidualADown module is integrated into the 3rd, 5th, and 7th layers. Unlike standard downsampling that often acts as an information bottleneck, this module reduces spatial resolution while maximally preserving important feature information through a residual bypass. This ensures that pixel-level spatial details of subtle defects are not discarded during the deepening of the network, thereby mitigating information loss during downsampling.

It is important to emphasize that the proposed modules are not simply combined in a parallel or independent manner. Instead, they are designed to address different but interrelated challenges in industrial defect detection.

Specifically, the C2f-DBB module enhances the diversity and adaptability of feature extraction, enabling better representation of irregular defect patterns. The C2f-MLCA module further refines these features by selectively emphasizing informative channels and suppressing background noise. Meanwhile, the ResidualADown module preserves critical spatial information during downsampling, ensuring that subtle defect details are not lost in deeper layers. Through this complementary design, the three modules form a synergistic pipeline of “feature enhancement—feature selection—information preservation,” which leads to more robust detection performance compared with isolated or naively combined improvements.

3.1. Re-Parameterization Module C2f-DBB

In the original YOLOv8n, feature extraction and fusion are performed using the C2f module. In this module, the convolutional kernel parameters are fixed after training, and all branch convolutions follow a static structure, which limits the diversity of feature representation. However, bearing surface defects are typically extremely small, with weak and highly variable visual characteristics. The fixed kernel sizes and shapes in the standard C2f module are insufficient to effectively capture such fine-grained features, leading to suboptimal detection performance in complex backgrounds. To address this limitation, a diverse branch module based on multi-branch convolution and structural re-parameterization, termed C2f-DBB, is introduced to replace the C2f modules in YOLOv8n.

In C2f-DBB, the standard convolution operations within the Bottleneck structure are replaced with a multi-branch convolutional design. By incorporating parallel branches with heterogeneous convolution kernels, the module enhances the diversity of feature extraction across multiple receptive fields. This design enables more effective representation of small, irregular, and multi-scale defect patterns. Meanwhile, the multi-branch structure is equivalently fused into a single convolution during inference via structural re-parameterization, ensuring that no additional computational overhead is introduced. As a result, the proposed module improves feature representation capability while maintaining high inference efficiency, making it suitable for complex industrial defect detection scenarios. The structure of C2f-DBB is illustrated in Figure 3.

The DBB (Diverse Branch Block) module is a convolutional building block based on structural re-parameterization, which enhances feature representation diversity by introducing a multi-branch topology during training while maintaining inference efficiency. The overall structure of the DBB module is illustrated in Figure 4. Let the input feature map be denoted as:

$$X \in \mathbb{R}^{C \times H \times W} \tag{1}$$

The DBB module consists of four parallel branches, and the final output feature map is obtained by element-wise summation of all branch outputs. This operation can be formulated as follows:

$$Y = \sum_{i = 1}^{4} F_{i}(X) \tag{2}$$

where $F_{i}(\cdot)$ denotes the feature transformation function of the i-th branch.

Branches 1 and 4 are single-layer convolution branches, which adopt 1 × 1 convolution and K × K convolution, respectively, followed by Batch Normalization (BN). These two branches introduce linear and spatial feature transformations with different receptive fields, enhancing feature diversity and stabilizing feature distributions. This operation can be formulated as follows:

$$\begin{cases} F_{1}(X) = \mathrm{BN}(\mathrm{Conv}_{1 \times 1}(X)) \\ F_{4}(X) = \mathrm{BN}(\mathrm{Conv}_{K \times K}(X)) \end{cases} \tag{3}$$

Branch 2 first applies a 1 × 1 convolution to compress and reorganize channel information, followed by a K × K convolution to capture local spatial features. Batch Normalization is applied after each convolution to improve training stability. This operation can be formulated as follows:

$$F_{2}(X) = \mathrm{BN}(\mathrm{Conv}_{K \times K}(\mathrm{BN}(\mathrm{Conv}_{1 \times 1}(X)))) \tag{4}$$

This branch focuses on extracting fine-grained spatial details and local texture information, which is particularly beneficial for detecting small and irregular defect patterns.

Branch 3 first employs a 1 × 1 convolution followed by BN to adjust channel representations and then applies Average Pooling to aggregate neighborhood information and enhance contextual robustness. This operation can be formulated as follows:

$$F_{3}(X) = \mathrm{BN}(\mathrm{AvgPool}(\mathrm{BN}(\mathrm{Conv}_{1 \times 1}(X)))) \tag{5}$$

This branch introduces a smoothing effect and strengthens global contextual perception, improving robustness against background noise.

By integrating convolutional features from branches with heterogeneous receptive fields and modeling behaviors, the DBB module enhances the network’s ability to capture fine-grained details while preserving broader contextual representations. The outputs of all four branches are then integrated via element-wise summation to form the unified output feature map. This operation can be formulated as follows:

$$Y = F_{1}(X) + F_{2}(X) + F_{3}(X) + F_{4}(X) \tag{6}$$

During inference, the DBB module can be equivalently folded into a single convolution through structural re-parameterization, such that:

$$\sum_{i = 1}^{4} F_{i}(X) = \mathrm{Conv}_{rep}(X) \tag{7}$$

where the convolution kernel weights and biases of $\mathrm{Conv}_{rep}$ are obtained by linearly combining the convolution and BN parameters from all branches. This re-parameterization is applied only at inference time, so the multi-branch structure and its performance benefits are fully retained during training without introducing additional inference overhead.
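The folding step can be illustrated with a small numerical check. The sketch below is a deliberately reduced setting and not the authors' implementation: it uses a single input and output channel, K = 3, and only the two single-layer branches of Eq. (3) (Branches 1 and 4), with BN in inference mode using fixed statistics. All function names and the BN parameter values are hypothetical. It folds each BN into its convolution, embeds the 1 × 1 kernel at the centre of a 3 × 3 kernel, and sums kernels and biases.

```python
import math
import random

def conv2d(x, kernel, bias=0.0):
    """Single-channel 'same' convolution (cross-correlation) with zero padding."""
    k, p = len(kernel), len(kernel) // 2
    h, w = len(x), len(x[0])
    out = [[bias] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(k):
                for dj in range(k):
                    ii, jj = i + di - p, j + dj - p
                    if 0 <= ii < h and 0 <= jj < w:
                        out[i][j] += kernel[di][dj] * x[ii][jj]
    return out

def batchnorm(x, bn):
    """Inference-mode BN: gamma * (x - mean) / sqrt(var + eps) + beta."""
    s = bn["gamma"] / math.sqrt(bn["var"] + bn["eps"])
    return [[(v - bn["mean"]) * s + bn["beta"] for v in row] for row in x]

def fold_bn(kernel, bn):
    """Fold BN into the preceding bias-free convolution, returning (kernel, bias)."""
    s = bn["gamma"] / math.sqrt(bn["var"] + bn["eps"])
    return [[v * s for v in row] for row in kernel], bn["beta"] - bn["mean"] * s

def pad_to_3x3(kernel_1x1):
    """Embed a 1x1 kernel at the centre of an all-zero 3x3 kernel."""
    k = [[0.0] * 3 for _ in range(3)]
    k[1][1] = kernel_1x1[0][0]
    return k

def reparameterize(k3, bn3, k1, bn1):
    """Merge BN(Conv3x3) + BN(Conv1x1) into one 3x3 kernel and one bias."""
    k3f, b3f = fold_bn(k3, bn3)
    k1f, b1f = fold_bn(pad_to_3x3(k1), bn1)
    merged = [[k3f[i][j] + k1f[i][j] for j in range(3)] for i in range(3)]
    return merged, b3f + b1f

random.seed(0)
x = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(5)]
k3 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
k1 = [[random.uniform(-1, 1)]]
bn3 = {"gamma": 1.2, "beta": 0.1, "mean": 0.3, "var": 0.8, "eps": 1e-5}
bn1 = {"gamma": 0.7, "beta": -0.2, "mean": -0.1, "var": 1.5, "eps": 1e-5}

# Training-time view: the two branches are evaluated separately and summed.
ya = batchnorm(conv2d(x, k3), bn3)
yb = batchnorm(conv2d(x, k1), bn1)
y_multi = [[ya[i][j] + yb[i][j] for j in range(5)] for i in range(5)]

# Inference-time view: a single convolution with the merged parameters.
k_rep, b_rep = reparameterize(k3, bn3, k1, bn1)
y_single = conv2d(x, k_rep, b_rep)

# The two views agree up to floating-point rounding.
max_diff = max(abs(y_multi[i][j] - y_single[i][j]) for i in range(5) for j in range(5))
```

The equivalence holds because convolution and inference-mode BN are both affine, so their composition and sum remain a single affine convolution. The sequential 1 × 1 to K × K branch and the average-pooling branch can be folded on the same principle but are omitted here for brevity.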

The integration of the DBB module is specifically motivated by the inherent trade-off between detection precision and inference latency in industrial scenarios. Given the diverse and irregular geometries of bearing surface defects (e.g., irregular burrs and fine-grained cracks), a robust multi-scale receptive field is essential. By employing asymmetric convolutions and multi-branch structures during the training phase, DBB significantly enhances the model’s capacity for complex feature extraction. Subsequently, these branches are fused into a single-path equivalent convolution via structural re-parameterization for inference. This design ensures that DMR-YOLO achieves superior feature diversity without imposing additional computational overhead on edge-constrained devices.

3.2. Attention Mechanism C2f-MLCA

In bearing surface defect detection tasks, defect regions usually occupy only a small portion of the image, while background textures may dominate the feature responses. This often leads to insufficient discrimination between defect and non-defect regions. To address this issue, a multi-level channel attention mechanism is introduced to enhance informative channel features and suppress irrelevant background responses. The C2f-MLCA module is a lightweight feature enhancement structure derived from the C2f module, following a “split–enhance–fuse” workflow, as illustrated in Figure 5. The input feature map first passes through a CBS module for basic convolutional preprocessing. It is then split along the channel dimension via a Split operation: one portion of the features is retained as a shortcut, while the remaining portion is fed into a feature enhancement branch. The enhancement branch consists of repeated units of Bottleneck + MLCA, where the Bottleneck unit uses a lightweight residual structure to flexibly adjust channel dimensions, and the Multi-Level Channel Attention (MLCA) module integrates both local and global feature perception mechanisms. This allows the network to capture spatial details and channel dependencies simultaneously, achieving targeted enhancement of critical features. Finally, the processed branch features are concatenated with the shortcut features along the channel dimension through a Concat operation, and the output CBS module unifies the channel dimensions, completing the integration of multi-scale and multi-dimensional information. The resulting enhanced feature map is then output.

The MLCA module combines channel and spatial feature information to improve the representational capability of the network, as shown in Figure 5. First, the input feature map undergoes local average pooling (LAP), producing a tensor of shape 1 × C × K × K that summarizes local spatial features. The tensor is then split into two parallel branches: one branch captures global contextual information through global average pooling (GAP), while the other captures fine-grained spatial information via local average pooling. Subsequently, features from both branches are transformed through 1D convolutions and upsampled to the original spatial resolution using reverse average pooling (UNAP). The fused local and global channel attention weights $A$ are then generated according to the following formula:

$$A = \sigma(\mathrm{Conv1D}(\mathrm{GAP}(X)) + \mathrm{UNAP}(\mathrm{Conv1D}(\mathrm{LAP}(X)))) \tag{8}$$

Here, $\sigma$ denotes the Sigmoid activation function, and $\mathrm{Conv1D}(\cdot)$ represents a one-dimensional convolutional transformation.

The attention feature maps from the two branches are then fused in an element-wise manner, producing the final feature map that integrates both local and global information. This feature map assigns multi-scale attention weights along both the channel and spatial dimensions, thereby significantly enhancing the representational capability of the network.
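The attention computation in Eq. (8) can be traced on a small tensor. The sketch below is a simplified reading under several assumptions that are not stated in the paper: the 1D convolutions run along the channel axis with zero padding, reverse average pooling is realized by replicating each pooled cell over its block, the spatial size is divisible by K, and the convolution weights are supplied by hand rather than learned. The names `mlca_weights`, `avg_pool_to`, and `conv1d_same` are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def avg_pool_to(fmap, k):
    """Average-pool one H x W map to k x k (assumes H and W are divisible by k)."""
    bh, bw = len(fmap) // k, len(fmap[0]) // k
    return [[sum(fmap[i * bh + a][j * bw + b] for a in range(bh) for b in range(bw)) / (bh * bw)
             for j in range(k)] for i in range(k)]

def conv1d_same(seq, weights):
    """1D cross-correlation along the channel axis with zero padding."""
    p, n = len(weights) // 2, len(seq)
    return [sum(weights[t] * seq[c + t - p] for t in range(len(weights)) if 0 <= c + t - p < n)
            for c in range(n)]

def mlca_weights(x, k_local, w_global, w_local):
    """A = sigmoid(Conv1D(GAP(X)) + UNAP(Conv1D(LAP(X)))) for X of shape C x H x W."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    bh, bw = h // k_local, w // k_local
    lap = [avg_pool_to(ch, k_local) for ch in x]          # LAP(X): C x K x K
    gap = [sum(map(sum, ch)) / (h * w) for ch in x]       # GAP(X): one value per channel
    g = conv1d_same(gap, w_global)                        # global branch, length C
    # Local branch: convolve across channels separately at each of the K x K cells.
    loc = [[conv1d_same([lap[ci][i][j] for ci in range(c)], w_local)
            for j in range(k_local)] for i in range(k_local)]
    # UNAP: every pixel reuses the value of the pooled cell that covers it.
    return [[[sigmoid(g[ci] + loc[i // bh][j // bw][ci]) for j in range(w)]
             for i in range(h)] for ci in range(c)]

# Toy input with 3 channels and a 4 x 4 spatial grid.
x = [[[float(ci + i + j) for j in range(4)] for i in range(4)] for ci in range(3)]
A = mlca_weights(x, k_local=2, w_global=[0.1, 0.5, 0.1], w_local=[0.2, 0.3, 0.2])
# Applying the weights by element-wise multiplication (an assumption based on common attention practice).
y = [[[x[ci][i][j] * A[ci][i][j] for j in range(4)] for i in range(4)] for ci in range(3)]
```

In this reading the global term is constant over each channel, while the local term varies between the K × K blocks, which is what lets the weights differ across spatial regions as well as across channels.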

In Figure 5, $\mathrm{Conv1D}$ denotes a one-dimensional convolution whose kernel size $k$ is determined adaptively from the number of channels $C$ as follows:

$$k = \left| \frac{\log_{2}(C)}{\gamma} + \frac{b}{\gamma} \right|_{odd} \tag{9}$$

Here, $C$ denotes the number of channels, $k$ is the convolutional kernel size, and $\gamma$ and $b$ are hyperparameters, both set to 2 by default. The subscript odd indicates that $k$ is rounded to the nearest odd number; if the result is even, 1 is added to ensure an odd kernel size.
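As a worked illustration of Eq. (9), the helper below evaluates the kernel size for a few channel counts with the default values stated above. It is a sketch, not the authors' code: the function name is hypothetical, and truncating the fractional part before the odd adjustment is an assumption about how the rounding is carried out.

```python
import math

def mlca_kernel_size(channels, gamma=2, b=2):
    """Odd 1D-convolution kernel size following Eq. (9)."""
    # Truncate toward zero first (assumption), then bump even values up by one.
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1

sizes = {c: mlca_kernel_size(c) for c in (16, 64, 256, 1024)}
# sizes == {16: 3, 64: 5, 256: 5, 1024: 7}
```

Because the kernel size depends on the logarithm of the channel count, it grows slowly: a 64-fold increase in channels, from 16 to 1024, only widens the kernel from 3 to 7.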

In this work, the MLCA algorithm is integrated into the original C2f module to form the C2f-MLCA module. This integration enhances the network’s ability to perceive both channel and spatial information without significantly increasing computational complexity. The introduction of this module effectively improves feature extraction capability and strengthens the network’s performance in detecting targets of varying scales.

3.3. ResidualADown Module

Downsampling operations in convolutional neural networks often lead to the loss of important feature information. To address this issue, a ResidualADown module is designed to improve feature propagation during the downsampling process. By introducing a residual connection, the proposed module can effectively preserve useful feature information while reducing spatial resolution, thereby enhancing the representation capability of the network, as illustrated in Figure 6. Let the input feature map be defined as:

\begin{equation} X \in \mathbb{R}^{C \times H \times W} \tag{10} \end{equation}

The input feature map is first processed by an AvgPool2d layer for preliminary downsampling, which reduces spatial resolution while preserving global contextual information. This process can be expressed as:

\begin{equation} X_{d} = \mathrm{AvgPool2d}(X) \tag{11} \end{equation}

Subsequently, the downsampled feature map is divided into two branches along the channel dimension via a Chunk operation:

\begin{equation} X_{d} \to \{X_{1}, X_{2}\} \tag{12} \end{equation}

In Branch 1, the features are directly fed into a convolutional layer to extract local detail information, which can be formulated as:

\begin{equation} Y_{1} = \mathrm{Conv}(X_{1}) \tag{13} \end{equation}

Branch 2 first applies a MaxPool2d layer to further compress the spatial dimensions and emphasize salient response regions, followed by a convolutional transformation. The computation is given by:

\begin{equation} Y_{2} = \mathrm{Conv}(\mathrm{MaxPool2d}(X_{2})) \tag{14} \end{equation}

Finally, the outputs of the two branches are concatenated through a Concat operation to obtain the downsampled feature map with fused multi-scale information:

\begin{equation} Y_{m} = \mathrm{Concat}(Y_{1}, Y_{2}) \tag{15} \end{equation}

Although the above dual-branch structure enables effective multi-scale feature extraction during downsampling, a certain degree of information loss may still occur during progressive feature transformation. To further alleviate this issue, a residual path is introduced to enhance feature propagation and preserve complementary information from the input feature map, as illustrated in Figure 6 (Structure of the ResidualADown module).

Specifically, the ResidualADown module adopts a dual-path parallel architecture, consisting of a main path and a residual path. The main path performs the downsampling and feature extraction operations described above, while the residual path directly conveys part of the original feature information to the output.

In the residual path, the input feature map is first processed by an AvgPool2d layer to perform spatial downsampling, followed by a convolutional layer to adjust the channel dimension, ensuring that the output feature map is consistent with the main path in both spatial resolution and channel size. The residual path can be expressed as:

\begin{equation} Y_{r} = \mathrm{Conv}(\mathrm{AvgPool2d}(X)) \tag{16} \end{equation}

Subsequently, the output of the residual path is combined with the main path output via element-wise addition to achieve feature fusion. The overall output of the ResidualADown module is given by:

\begin{equation} Y = Y_{m} + Y_{r} \tag{17} \end{equation}

where Y_{m} denotes the output of the main path and Y_{r} represents the output of the residual path. By employing lightweight pooling and convolution operations, the residual path ensures effective feature alignment and fusion in both spatial and channel dimensions. This residual structure can be regarded as introducing an identity-preserving mechanism during the downsampling stage, which is consistent with the residual learning philosophy of ResNet. It helps mitigate feature degradation and information loss in deep networks.

ResidualADown addresses the information bottleneck problem in traditional downsampling by introducing a residual shortcut that preserves high-frequency spatial details (e.g., pixel-level cracks) typically discarded by standard strided convolutions. By combining the multi-scale fused features from the main path with the original feature representations retained in the residual path, the module enhances the stability and integrity of feature representations while maintaining lightweight characteristics, thereby providing a more informative foundation for subsequent high-level semantic modeling and improving the detection of subtle manufacturing defects.
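The element-wise addition in Equation (17) requires the main-path and residual-path outputs to have identical shapes. The following sketch traces only the tensor shapes through both paths using the standard floor-mode output-size formula. The paper does not state kernel sizes, strides, or paddings, so every such value below is an assumption (chosen to mirror a commonly used ADown-style configuration), and the helper names `out_size` and `residual_adown_shapes` are hypothetical.

```python
def out_size(n, kernel, stride, padding=0):
    """Output length of a conv/pool layer along one spatial axis (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


def residual_adown_shapes(c_out, h, w):
    """Return the (channels, height, width) of Y_m and Y_r for an h x w input."""
    # Main path, Equations (11)-(15). All kernel/stride/padding values are assumptions.
    hd, wd = out_size(h, 2, 1), out_size(w, 2, 1)  # AvgPool2d 2x2, stride 1
    # Chunk halves the channels; the two branches are assumed to emit c_out channels in total.
    y1 = (c_out // 2, out_size(hd, 3, 2, 1), out_size(wd, 3, 2, 1))  # Conv 3x3, stride 2, pad 1
    hp, wp = out_size(hd, 3, 2, 1), out_size(wd, 3, 2, 1)  # MaxPool2d 3x3, stride 2, pad 1
    y2 = (c_out - c_out // 2, out_size(hp, 1, 1), out_size(wp, 1, 1))  # Conv 1x1
    y_m = (y1[0] + y2[0], y1[1], y1[2])  # Concat along the channel axis

    # Residual path, Equation (16): AvgPool2d 2x2 with stride 2, then a 1x1 Conv to c_out.
    hr, wr = out_size(h, 2, 2), out_size(w, 2, 2)
    y_r = (c_out, out_size(hr, 1, 1), out_size(wr, 1, 1))
    return y_m, y_r


y_m, y_r = residual_adown_shapes(c_out=128, h=80, w=80)
print(y_m, y_r)
assert y_m == y_r  # Equation (17) needs matching shapes for the addition
```

Under these assumed settings both paths halve the spatial resolution and emit the same number of channels, so the two outputs can be summed directly.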

4. Results and Evaluation

4.1. Dataset

In this study, a bearing casting surface condition dataset, legally obtained from a commercial third-party source, was used to simulate surface defect inspection scenarios in industrial component manufacturing. The dataset is not publicly available and is used exclusively for academic research purposes. All defect categories are predefined and manually annotated based on their visual characteristics, rather than on explicit geometric or physical measurements such as size or depth. All images are annotated with bounding boxes that enclose the defect regions, following standard object detection labeling practices.

Although the dataset was collected from industrial bearing manufacturing scenarios rather than directly from aerospace production lines, the defect types (e.g., cracks, pits, and scratches) and surface characteristics are highly relevant to those encountered in aero-engine component inspection. Bearings are widely used as critical rotating components in aero-engine systems, and their surface quality plays an important role in ensuring operational reliability. The dataset is therefore used as a representative benchmark to simulate visual inspection tasks in aerospace manufacturing environments.

The dataset covers a variety of surface conditions that may occur during the production and machining of bearing castings, including eight defect categories: Casting burr, Polished casting, Burr, Crack, Pit, Scratch, Strain, and Unpolished casting. It consists of 2561 images in the training set and 732 images in the validation set, with an image resolution of 640 × 640 pixels. This corresponds to a training-to-validation ratio of approximately 3.5:1.

In terms of class distribution, the number of samples in each defect category is generally balanced, although slight variations exist due to differences in the frequency of defect occurrence in real manufacturing processes. This reflects practical industrial conditions and avoids excessive bias toward specific defect types.

Due to data usage restrictions imposed by the provider, the dataset cannot be publicly released. However, all experimental configurations, training procedures, and evaluation metrics are fully described in this study to ensure the reproducibility of the proposed method. The model can be readily applied to other similar surface defect datasets. Representative image samples from the dataset are shown in Figure 7.

4.2. Implementation Details

All experiments were conducted on a Windows 11 operating system. The hardware platform is equipped with an Intel Core i7-14650HX CPU and an NVIDIA GeForce RTX 5060 GPU with 16 GB of video memory. Detailed hardware configurations are listed in Table 1 (Experimental Environment). During training, the number of training epochs was set to 200, with a batch size of 16. The initial learning rate was 0.01, and the momentum parameter was set to 0.937. The stochastic gradient descent (SGD) optimizer was employed for model optimization. The detailed training hyperparameters are summarized in Table 2 (Experimental Parameters). All models in the comparison experiments were trained under identical experimental settings, including the same training and validation splits, data preprocessing, training epochs, optimization parameters, and hardware environment, to ensure a fair comparison. To reduce the influence of randomness in model training and ensure the reliability of the experimental results, each experiment was independently repeated three times, and the reported results are the averages of the three runs.
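The settings above can be collected in a small configuration, together with the averaging over three repeated runs. This is only an illustrative sketch: the key names follow the Ultralytics naming convention as an assumption, `imgsz` is inferred from the dataset resolution rather than stated in this section, `average_runs` is a hypothetical helper, and the metric values are placeholders, not results from the paper.

```python
from statistics import mean

# Hyperparameters stated in Section 4.2; the key names are an assumption.
train_cfg = {
    "epochs": 200,
    "batch": 16,
    "lr0": 0.01,        # initial learning rate
    "momentum": 0.937,
    "optimizer": "SGD",
    "imgsz": 640,       # assumption: matches the 640 x 640 dataset resolution
}


def average_runs(runs):
    """Average every metric over independently repeated training runs."""
    return {key: mean(run[key] for run in runs) for key in runs[0]}


# Placeholder values for illustration only; these are NOT results from the paper.
runs = [
    {"mAP": 0.80, "precision": 0.70, "recall": 0.90},
    {"mAP": 0.82, "precision": 0.72, "recall": 0.88},
    {"mAP": 0.84, "precision": 0.74, "recall": 0.86},
]
print(train_cfg)
print(average_runs(runs))
```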

4.3. Evaluation Indicators

To accurately evaluate the detection accuracy and efficiency of the proposed improved algorithm for bearing surface defects, Precision (P), Recall (R), and mean Average Precision (mAP) are adopted as evaluation metrics, which are used to assess category-level detection performance. Misclassification cases are counted as incorrect detections and are reflected in the evaluation metrics. Their mathematical definitions are given as follows:

\begin{equation} P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} \tag{18} \end{equation}

\begin{equation} R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \tag{19} \end{equation}

where TP denotes the number of samples in which defect regions are correctly detected, FP represents the number of samples that are incorrectly detected as defects when no defect is present, and FN indicates the number of defect samples that fail to be detected.
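Equations (18) and (19) can be illustrated with a short helper. The function name `precision_recall` and the counts used in the example are hypothetical; returning 0.0 for an empty denominator is a convention chosen here, not something the paper specifies.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Precision and Recall from detection counts, Equations (18) and (19)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # guard against no predictions
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # guard against no ground truth
    return precision, recall


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 missed defects.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"P = {p:.2f}, R = {r:.2f}")
```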

Since bearing surface defect detection involves multiple defect categories, Precision (P) and Recall (R) alone are insufficient to comprehensively evaluate the performance of the model. During the training phase, a corresponding Precision–Recall (PR) curve can be generated for each defect category, and the area under the PR curve represents the Average Precision (AP), which is defined as:

\begin{equation} \mathrm{AP} = \int_{0}^{1} P(R) \, \mathrm{d}R \tag{20} \end{equation}

The mean Average Precision (mAP) is the average of the AP values over all N defect categories, where AP(i) denotes the AP of the i-th category:

\begin{equation} \mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}(i) \tag{21} \end{equation}
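In practice the integral in Equation (20) is evaluated numerically from a finite set of PR points. The sketch below uses all-point interpolation over a monotone precision envelope, which is one common discretization and an assumption here, since the paper does not state which scheme was used. The function names and the PR points are hypothetical.

```python
def average_precision(recalls, precisions):
    """Approximate the area under a PR curve, Equation (20).

    `recalls` must be sorted in increasing order, with matching `precisions`.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_values):
    """Mean of the per-class AP values, Equation (21)."""
    return sum(ap_values) / len(ap_values)


# Hypothetical PR points for two classes.
ap_a = average_precision([0.5, 1.0], [1.0, 0.5])
ap_b = average_precision([1.0], [1.0])
print(ap_a, ap_b, mean_average_precision([ap_a, ap_b]))
```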

In this study, the term “detection accuracy” refers to the overall detection performance of the model, which is comprehensively evaluated using Precision (P), Recall (R), and mean Average Precision (mAP). Among these metrics, mAP is considered the primary indicator, as it reflects both localization and classification performance across all defect categories.

In industrial defect detection scenarios, the definition of satisfactory detection accuracy is typically task-dependent and varies with dataset characteristics, defect complexity, and application requirements. In general, a model is considered practically effective when it achieves a high mAP while maintaining strong recall to minimize missed detections.

4.4. Ablation Study

To evaluate the effectiveness of the proposed DMR-YOLO compared with the baseline YOLOv8n model and to verify the contribution of each individual improvement, a series of ablation experiments were conducted. All experiments were performed under the same experimental environment with identical hyperparameter settings to ensure fair comparison. Starting from the baseline YOLOv8n model, the proposed improvement modules were introduced one by one. The detailed ablation results are presented in Table 3 (Comparison of Ablation Experiment Results).

Experiment 1 serves as the baseline model, achieving an mAP of 86.5%, which provides a clear reference framework for evaluating the performance gains of subsequent modules. In Experiment 2, the original C2f modules in the baseline model are replaced with the proposed C2f-DBB modules, resulting in an increase in mAP to 87.5%. Meanwhile, Precision and Recall remain nearly unchanged, with only a slight increase in the number of parameters and computational cost. This indicates that the DBB module effectively enhances feature representation capability at an acceptable computational overhead, contributing steadily to detection accuracy. In addition, a slight improvement in FPS is observed. This can be attributed to the structural re-parameterization mechanism of the DBB module, where the multi-branch structure used during training is equivalently fused into a single convolution during inference, leading to more efficient execution. In Experiment 3, the C2f-MLCA modules are introduced at the 6th and 8th layers of the backbone network to replace the original C2f modules. The results show that both detection accuracy and computational cost remain comparable to those of the baseline model, suggesting that MLCA primarily functions in feature distribution adjustment and background noise suppression rather than significantly strengthening deep semantic representations when used independently. Consequently, the performance gain of the MLCA module alone is relatively limited. Experiment 4 incorporates the proposed ResidualADown module, leading to a notable improvement in mAP to 88.4%. At the same time, both Precision and Recall are enhanced, while the overall computational cost is reduced. These results demonstrate that ResidualADown achieves an efficient and stable improvement in detection performance, making it the most effective single-module enhancement among the evaluated components.
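The inference-time fusion mentioned for the DBB module relies on the linearity of convolution: the sum of parallel branch outputs equals the output of one convolution whose kernel is the sum of the (zero-padded) branch kernels. The toy example below shows this for a one-dimensional signal with a 3-tap and a 1-tap branch. It is a simplified sketch, not the full DBB procedure, which also involves folding batch normalization and other branch types; all names and values are hypothetical.

```python
def conv1d_same(signal, kernel):
    """Cross-correlate with an odd-length kernel, zero-padding to keep the length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
        for i in range(len(signal))
    ]


def fuse_branches(k3, k1):
    """Place the 1-tap kernel at the centre of the 3-tap kernel and add them."""
    fused = list(k3)
    fused[len(k3) // 2] += k1[0]
    return fused


x = [1.0, 2.0, -1.0, 0.5, 3.0]
k3 = [0.25, 0.5, -0.25]  # 3-tap branch
k1 = [2.0]               # 1-tap branch

# Training-time view: two parallel branches whose outputs are summed.
two_branch = [a + b for a, b in zip(conv1d_same(x, k3), conv1d_same(x, k1))]
# Inference-time view: a single convolution with the fused kernel.
single = conv1d_same(x, fuse_branches(k3, k1))

print(two_branch)
print(single)
```

Because the fused form needs only one convolution per layer, the multi-branch structure adds no extra operations at inference time in this simplified setting.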

On this basis, Experiments 5–7 investigate different combinations of the proposed improvement strategies. In Experiment 5, the integration of DBB and MLCA results in an mAP of 88.2%, outperforming the corresponding single-module experiments. However, this performance gain is accompanied by an increase in model parameters and computational complexity, leading to a slight reduction in FPS. This indicates that although multi-strategy fusion can improve detection performance, it introduces additional computational overhead. In Experiments 6 and 7, incorporating the ResidualADown module reduces the parameter count and computational complexity to varying degrees while keeping Precision and Recall nearly stable. These results demonstrate the complementary relationship among the modules in balancing performance and efficiency.

In Experiment 8, the complete architecture integrating all three proposed modules (C2f-DBB, C2f-MLCA, and ResidualADown) is adopted to construct the final model, termed DMR-YOLO. As shown in Table 3, the proposed model achieves an mAP of 89.3%, which is the best performance among all experiments. The results indicate a clear synergistic effect among the three components. Notably, while the MLCA module provides limited performance improvement when applied individually (Exp. 3), its contribution becomes significant when combined with ResidualADown and DBB (Exp. 8), resulting in a substantial increase in detection accuracy. This demonstrates that DMR-YOLO operates as an integrated system rather than a simple combination of independent modules, as the interaction between feature enhancement (DBB), feature recalibration (MLCA), and information preservation (ResidualADown) leads to a cooperative optimization process. Specifically, ResidualADown preserves subtle texture and defect details during the downsampling process, DBB strengthens local feature representation, and MLCA performs cross-scale attention modulation to filter critical signals from complex metallic background noise. Through their complementary roles in deep semantic enhancement, cross-scale feature aggregation, and feature selection, the three modules collectively improve the model’s robustness to low-contrast and small-scale defects. Moreover, the final model maintains high inference speed while achieving superior detection accuracy, demonstrating the effectiveness and practical applicability of DMR-YOLO in real-world industrial scenarios.

4.5. Model Comparison and Visualization Analysis

To further verify the superiority of the proposed model in bearing surface condition detection, several representative object detection algorithms in the related field are selected for comparative evaluation, including YOLOv3-tiny, YOLOv5n, YOLOv7, YOLOv8n, YOLOv9t, and YOLOv10n. The performance of each method is evaluated using multiple metrics, namely Precision (P), Recall (R), mean Average Precision (mAP), Frames Per Second (FPS), number of parameters (Parameters/10⁶), and computational complexity (GFLOPs). The comparative results are summarized in Table 4 (Performance Comparison of YOLO Series Models).

As shown in Table 4 (Performance Comparison of YOLO Series Models), there are significant differences among the compared YOLO-based models in terms of precision, recall, mean average precision, detection speed, parameter size, and computational complexity. Although YOLOv3-tiny achieves relatively high precision and recall with a competitive mAP, its inference speed is relatively low and the computational cost remains high. In contrast, YOLOv5n substantially reduces computational complexity and significantly improves detection speed; however, this improvement comes at the cost of a noticeable degradation in both mAP and recall. As the baseline model in this study, YOLOv8n demonstrates a favorable balance between accuracy and efficiency. It achieves strong precision and recall performance, with an mAP of 86.5%, while maintaining relatively low computational complexity and fast inference speed. By comparison, YOLOv9t suffers from a considerable decrease in detection speed, limiting its suitability for real-time industrial applications. Although YOLOv10n exhibits advantages in terms of model compactness, its detection accuracy is unsatisfactory, making it less competitive in precision-critical defect detection tasks. Overall, the experimental results indicate that the proposed DMR-YOLO consistently outperforms the other YOLO-series models. It achieves the highest precision, recall, and mAP (89.3%), while maintaining high inference speed and a relatively small number of parameters. This demonstrates that the proposed improvements effectively enhance detection performance without sacrificing real-time capability.

After 200 training epochs, the improved DMR-YOLO model exhibits clear advantages across all key evaluation metrics. Its precision, recall, and mean average precision surpass those of the compared models, confirming its superior performance and adaptability in practical bearing surface defect detection scenarios. To further visualize and quantitatively compare the average precision of different models across various defect categories, an AP comparison chart for each defect class is provided in Figure 8 (Comparison of AP values of different algorithms in each defect category), highlighting the effectiveness of the proposed method in recognizing specific defect types.

Figure 8 provides an intuitive comparison of the average precision (AP) of different YOLO-series algorithms and the proposed DMR-YOLO model across eight categories of bearing surface defects. Overall, all models perform exceptionally well on categories such as Casting burr, Polished casting, and Burr, with AP values approaching or exceeding 0.99. This indicates that these defects have relatively distinct visual characteristics, making them easier to detect. However, for categories including Crack, Pit, Scratch, Strain, and Unpolished casting, significant differences in performance among the models are observed. Notably, the proposed DMR-YOLO achieves the highest AP values in most of these challenging categories, with particularly marked improvements in Crack, Scratch, and Strain. These results demonstrate that the introduced enhancements effectively strengthen the model’s ability to recognize subtle cracks, scratches, and deformations that are typically difficult to detect. This figure, therefore, validates at the category level that DMR-YOLO maintains high robustness while achieving superior sensitivity and generalization for hard-to-detect defects.

To visually corroborate the quantitative performance advantages presented in Table 4 and Figure 8, and to specifically illustrate the improvement in defect recognition in practical scenarios, representative samples containing typical hard-to-detect defects were selected for qualitative comparison. As shown in Figure 9, detection results from the baseline YOLOv8n model and the proposed DMR-YOLO model are displayed alongside the original images. It is clearly observed that the improved model significantly reduces missed detections and false positives, while enhancing both localization accuracy and classification confidence. These instance-level visualizations further confirm the enhanced practical applicability and reliability of the proposed DMR-YOLO model in real-world industrial defect detection.

To visually demonstrate the detection performance of the DMR-YOLO model on bearing surface defects, the confusion matrices of the proposed model and the baseline YOLOv8n model were compared, as shown in Figure 10. In the confusion matrices, the rows represent the ground-truth classes, while the columns correspond to the predicted classes. The main diagonal reflects the correct recognition rate for each defect category, whereas the off-diagonal elements indicate misclassification rates. From the comparison, it is evident that the main diagonal of the DMR-YOLO confusion matrix exhibits darker color intensities and higher numerical values compared to YOLOv8n, indicating a higher correct detection rate across all bearing defect categories. The improved model shows notable gains in detection accuracy, particularly for Scratch and Strain defects, where precision increases from 57.8% and 80.7% to 64.2% and 88.6%, respectively. Overall, the mean average precision (mAP) across all defect categories rises from 86.5% to 89.3%, an improvement of 2.8 percentage points.
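The diagonal values discussed above can be obtained by normalizing each row of a count matrix by its total, assuming (as stated in the text) that rows hold the ground-truth classes. The helper name and the counts below are hypothetical and do not come from Figure 10.

```python
def per_class_recognition_rate(matrix):
    """Diagonal of a row-normalized confusion matrix (rows = ground truth)."""
    rates = []
    for i, row in enumerate(matrix):
        total = sum(row)
        rates.append(row[i] / total if total else 0.0)  # guard against empty classes
    return rates


# Hypothetical counts for three classes; NOT values from the paper.
counts = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 5, 15],
]
print(per_class_recognition_rate(counts))
```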

To intuitively demonstrate the stability of the DMR-YOLO model in bearing surface defect detection, Figure 11 illustrates the epoch-wise progression of mAP@50-95 for both YOLOv8n and DMR-YOLO across 200 training epochs. DMR-YOLO exhibits a generally higher mAP@50-95 than the baseline, particularly after the initial 50 epochs, and shows a smoother convergence trajectory, suggesting improved stability. The persistent performance advantage indicates that the gains introduced by DMR-YOLO are reliable under identical experimental conditions. Overall, DMR-YOLO achieves a favorable balance of detection accuracy, stability, and computational efficiency, with a modest increase in model size (3.60 M parameters) and computational cost (10.6 GFLOPs) compared to YOLOv8n.

5. Generalization Evaluation

To further evaluate the generalization capability of the proposed DMR-YOLO model in industrial surface inspection scenarios, experiments were conducted on a steel surface defect dataset using both the baseline model and DMR-YOLO. This dataset contains six types of steel surface defects: Rolled-in-scale (RS), Patches (Pa), Crazing (Cr), Pitted surface, Inclusion (In), and Scratches (Sc), comprising a total of 1800 images. The dataset was split into training, validation, and test sets in an 8:1:1 ratio. The experimental environment and evaluation metrics are kept consistent with those described in Sections 4.2 and 4.3. The results are summarized in Table 5 (Generalization experiments on the steel dataset).
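For 1800 images, an 8:1:1 split yields 1440 training, 180 validation, and 180 test images. A minimal sketch of such a split is given below; the random shuffle with a fixed seed is an assumption (the paper does not say whether the split was random or stratified by class), and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n, ratios=(8, 1, 1), seed=0):
    """Shuffle the indices 0..n-1 and split them by the given ratios."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # assumption: plain random split
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train, val, test = split_indices(1800)
print(len(train), len(val), len(test))
```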

As shown in Table 5 (Generalization experiments on the steel dataset), the proposed DMR-YOLO model achieves improvements in precision, recall, and mean average precision (mAP) on the steel surface defect dataset compared with the baseline model, with only a slight increase in the number of parameters and computational cost. These results indicate that DMR-YOLO not only outperforms the baseline model on the bearing dataset but also maintains competitive performance on a similar defect detection dataset, suggesting a certain level of generalization capability.

6. Conclusions and Prospects

To improve both the accuracy and speed of surface defect detection for critical mechanical components while maintaining a lightweight model, this study proposes an enhanced defect detection model termed DMR-YOLO based on YOLOv8n. The proposed method aims to provide an efficient intelligent visual inspection solution for industrial component manufacturing, particularly for applications requiring high reliability such as aero-engine component inspection. Through systematic experimental design and analysis, the following conclusions are drawn:

(1): Ablation experiments verify the effectiveness and complementary contributions of each integrated module. The C2f-DBB module incorporates the Diverse Branch Block structure into the backbone network, enhancing feature diversity through multi-branch convolutional representations and improving the extraction of fine-grained defect features. The C2f-MLCA module introduces a multi-level channel attention mechanism that adaptively emphasizes informative feature channels while suppressing background interference. In addition, the proposed ResidualADown module introduces residual connections into the downsampling stage, improving feature information preservation during spatial resolution reduction. Experimental results indicate that the coordinated combination of these components enables the final model to achieve an mAP of 89.3%, an improvement of 2.8 percentage points over the baseline YOLOv8n, while maintaining high inference efficiency. The model contains only 3.60 M parameters, demonstrating a favorable balance between detection accuracy and computational efficiency. Moreover, the relatively low parameter count and computational cost are favorable for potential edge-oriented deployment in industrial environments.
(2): Comparative experiments with several mainstream detection models, including YOLOv3-tiny, YOLOv5n, YOLOv8n, YOLOv9t, and YOLOv10n, demonstrate that DMR-YOLO achieves the best overall performance in key evaluation metrics, including precision (86.15%), recall (93.93%), and mAP (89.3%). Particularly for challenging defect categories such as Crack, Scratch, and Strain, the model achieves noticeable improvements in AP values, highlighting its effectiveness for bearing surface defect detection tasks.
(3): Generalization experiments conducted on a steel surface defect dataset further support the effectiveness of DMR-YOLO. Compared with the baseline model, it achieves higher precision, recall, and mAP with only a marginal increase in parameters and computational cost, indicating that the improvements transfer to another industrial defect dataset.

In summary, DMR-YOLO constructs an efficient lightweight detection framework by integrating structural re-parameterization, channel attention mechanisms, and an improved downsampling strategy within the YOLOv8n architecture. The experimental results demonstrate that the systematic integration of these complementary components can effectively improve detection performance while maintaining computational efficiency.

Nevertheless, several limitations remain. Under extremely complex backgrounds, strong noise interference, or very small defect targets, localization and classification performance may still fluctuate, occasionally leading to missed or false detections. In addition, the proposed method focuses on defect detection and localization based on image data. It does not explicitly differentiate defect severity levels (e.g., size or depth) or provide a quantitative assessment against acceptability criteria, both of which are important for engineering decision-making in aerospace applications and typically require additional measurement and domain-specific standards.

Future work will focus on evaluating the proposed method on larger-scale industrial datasets and exploring deployment on practical edge computing platforms for intelligent inspection and quality monitoring of aero-engine components. Lightweight optimization techniques such as network pruning and knowledge distillation will be investigated to further reduce computational complexity and enhance the robustness of the model for real-world industrial inspection applications. Incorporating geometric measurement and severity grading mechanisms will also be considered.

The proposed framework demonstrates promising performance for intelligent defect detection in industrial inspection tasks. It is intended to serve as an auxiliary tool for visual inspection in aerospace manufacturing, providing efficient preliminary screening rather than final decision-making, and not as a complete solution for surface quality measurement or safety-critical evaluation.

Author Contributions

Conceptualization, J.T., H.C. and X.L.; Methodology, H.C., J.T. and X.L.; Software, H.C. and X.Z.; Validation, H.C. and X.Z.; Writing—original draft, H.C.; Writing—review & editing, H.C., J.T. and B.G.; Supervision, J.T., X.L. and B.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Foundation of Key Laboratory of Micro-Inertial Instrument and Advanced Navigation Technology, Ministry of Education, China (SEU-MIAN-202102); the Nanjing Institute of Technology Talent Introduction Scientific Research Start Fund Project (YKJ202043, YKJ202439); the Industry University Research Collaboration Project in Jiangsu Province (DH20251224); the Nanjing Health Science and Technology Development Major Project (ZDX22001); the Teaching Reform and Construction Project of Nanjing Institute of Technology (JXJS2025053, JXGG2025010); the Guangdong Basic and Applied Basic Research Foundation under Grant 2024A1515030002; the Shenzhen Science and Technology Program under Grant JCYJ20240813150733042; the China Postdoctoral Science Foundation under Grant 2025M784420; the Shaanxi Postdoctoral Research Project under Grant 2025BSHSDZZ113; the Graduate Research and Practice Innovation Program of Nanjing Institute of Technology under Grants TB202517053 and TB202517103; and the Innovation and Entrepreneurship Training Program for College Students in Jiangsu Province under Grants 202411276102Y, 202411276107Y, and 202411276121Y. The APC was funded by the authors.

Data Availability Statement

Restrictions apply to the availability of these data. The data were obtained from a third party and are available from the authors with the permission of that third party.

Acknowledgments

The authors sincerely thank Hu Haoyan and Wang Haibin for their valuable suggestions and constructive advice during their work in the laboratory.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AP	Average Precision
BN	Batch Normalization
CNN	Convolutional Neural Network
DBB	Diverse Branch Block
FPS	Frames Per Second
GFLOPs	Giga Floating Point Operations
MLCA	Multi-Level Channel Attention
mAP	mean Average Precision
PAN-FPN	Path Aggregation Network-Feature Pyramid Network
ResidualADown	Residual Adaptive Downsampling
YOLO	You Only Look Once

References

Hou, L.; Yi, H.; Jin, Y.; Gui, M.; Sui, L.; Zhang, J.; Chen, Y. Inter-Shaft Bearing Fault Diagnosis Based on Aero-Engine System: A Benchmarking Dataset Study. JDMD 2023, 2, 228–242.
Li, J.; Yang, Z.; Zhou, X.; Song, C.; Wu, Y. Advancing the Diagnosis of Aero-Engine Bearing Faults with Rotational Spectrum and Scale-Aware Robust Network. Aerospace 2024, 11, 613.
Zhang, X.; Wang, H.; Chen, L. Bearing defect inspection based on machine vision. Measurement 2012, 45, 719–733.
Sun, B.; Sheng, Z.; Song, P.; Sun, H.; Wang, F.; Sun, X.; Liu, J. State-of-the-Art Detection and Diagnosis Methods for Rolling Bearing Defects: A Comprehensive Review. Appl. Sci. 2025, 15, 1001.
Ren, Q.Y.; Wang, Y.D.; Shi, J. Research Progress on Convolutional Neural Network Object Detection Algorithms. Sci. Technol. Eng. 2024, 24, 13665–13677.
Xia, B.; Luo, H.; Shi, S. Improved Faster R-CNN based surface defect detection algorithm for plates. Comput. Intell. Neurosci. 2022, 2022, 3248722.
Zhang, H.; Li, S.; Miao, Q.; Fang, R.; Xue, S.; Hu, Q.; Hu, J.; Chan, S. Surface defect detection of hot-rolled steel based on multi-scale feature fusion and attention mechanism residual block. Sci. Rep. 2024, 14, 7671.
Li, W.; Yang, W.; Jin, G.; Chen, J.; Li, J.; Huang, R.; Chen, Z. Clustering Federated Learning for Bearing Fault Diagnosis in Aerospace Applications with a Self-Attention Mechanism. Aerospace 2022, 9, 516.
Cen, W.D.; Jiang, J.L.; Huang, B.; Ni, F.C.; Li, G.J. Ten-Year Evolution of YOLO: From Real-Time Detection Pioneer to Multi-Task Intelligent Frontier. J. Wuhan Univ. Sci. Ed. 2026, 72.
Xu, H.; Pan, H.; Li, J. Surface defect detection of bearing rings based on an improved YOLOv5 Network. Sensors 2023, 23, 7443.
Sun, T.; Hong, Z.; Song, C.; Xiao, P. Lightweight insulator defect detection based on multi-scale feature fusion. Prog. Laser Optoelectron. 2025, 62, 0612008.
Pan, H. YOLOv8s-DDC: A deep neural network for surface defect detection of bearing ring. Electronics 2025, 14, 1079.
Wang, C.; Liu, H. YOLOv8-VSC: A lightweight strip steel surface defect detection algorithm. Comput. Sci. Explor. 2024, 18, 151–160.
Li, Q.; Que, Z. X-ray image hazardous item detection based on improved YOLOv5. Sci. Technol. Eng. 2023, 23, 1598–1606.
Haoyan, H.; Jinwu, T.; Haibin, W.; Xinyun, L. Ead-yolov10: Lightweight steel surface defect detection algorithm research based on yolov10 improvement. IEEE Access 2025, 13, 55382–55397.
Han, T.; Dong, Q.; Wang, X.; Sun, L. BED-YOLO: An enhanced YOLOv8 for high-precision real-time bearing defect detection. IEEE Trans. Instrum. Meas. 2024, 73, 1–13.
Hu, J.; Yang, H.; He, J.; Bai, D.; Chen, H. EHA-YOLOv5: An efficient and highly accurate improved YOLOv5 model for workshop bearing rail defect detection application. IEEE Access 2024, 12, 14.
Fu, X.; Yang, X.; Zhang, N.; Zhang, R.; Zhang, Z.; Jin, A.; Ye, R.; Zhang, H. Bearing surface defect detection based on improved convolutional neural network. Math. Biosci. Eng. MBE 2023, 20, 12341–12359.
Zhang, B.; Xun, R.; Xu, J. A lightweight bearing defect detection model suitable for industrial scenarios. Measurement 2025, 258, 119239.
Liu, M.; Zhang, M.; Chen, X.; Zheng, C.; Wang, H. YOLOv8-LMG: An improved bearing defect detection algorithm based on YOLOv8. Processes 2024, 12, 930.
Afifah, V.; Erniwati, S. Yolov8 for object detection: A comprehensive review of advances, techniques, and applications. IJACI Int. J. Adv. Comput. Inform. 2026, 2, 53–61.

Figure 1. Structure of the YOLOv8 network model.

Figure 2. Structure of the DMR-YOLO network model.

Figure 3. Structure of the C2f-DBB module.

Figure 4. Structure of the DBB module.

Figure 5. Structure of the C2f-MLCA module and MLCA module.

Figure 6. Structure of the ResidualADown module.

Figure 7. Some sample images from the dataset.

Figure 8. Comparison of AP values of different algorithms in each defect category.

Figure 9. Comparison of detection results between the YOLOv8n and DMR-YOLO models.

Figure 10. Comparison chart of confusion matrices between YOLOv8n and DMR-YOLO.

Figure 11. Training curves of mAP@50-95 for YOLOv8 and DMR-YOLO over 200 epochs.

Table 1. Experimental Environment.

Component	Configuration
CPU	Intel Core i7-14650HX
GPU	NVIDIA GeForce RTX 5060
GPU memory	16 GB
Operating system	Windows 11
cuDNN	8.5.0
CUDA	13.0
Python	3.12
PyTorch	2.0.1

Table 2. Experimental Parameters.

Parameter	Value
Batch size	16
Epochs	200
Workers	8
Image size	640 × 640
Patience	100
Optimizer	SGD
Python	3.12
PyTorch	2.0.1
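
As a rough illustration of how the settings in Table 2 map onto a training call, the sketch below assumes the Ultralytics YOLOv8 Python package is the toolchain; the source does not name the framework, so this is an assumption. The file names `yolov8n.yaml` and `bearing_defects.yaml` are hypothetical placeholders, not the authors' configuration files.

```python
# Hyperparameters copied from Table 2. The keyword names follow the
# Ultralytics train() arguments, which is an assumption about the toolchain.
TRAIN_ARGS = {
    "epochs": 200,       # Epochs
    "batch": 16,         # Batch size
    "imgsz": 640,        # Image size (640 x 640)
    "workers": 8,        # Data-loader workers
    "patience": 100,     # Early-stopping patience, in epochs
    "optimizer": "SGD",  # Optimizer
}


def train(model_cfg="yolov8n.yaml", data_cfg="bearing_defects.yaml"):
    """Train with the Table 2 settings; both file names are placeholders."""
    # Imported lazily so the settings can be inspected without the package.
    from ultralytics import YOLO

    model = YOLO(model_cfg)
    return model.train(data=data_cfg, **TRAIN_ARGS)
```

Calling `train()` requires the `ultralytics` package and a dataset description file; the dictionary alone can be reused with any other training script.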

Table 3. Comparison of Ablation Experiment Results.

Exp.	Model	P (%)	R (%)	mAP@50 (%)	Params (M)	GFLOPs	FPS (frames/s)
1	YOLOv8n	84.65	92.73	86.5	3.16	8.9	142.6
2	YOLOv8n + C2f-DBB	85.20	92.60	87.5	3.83	11.0	147.1
3	YOLOv8n + C2f-MLCA	83.85	92.37	86.5	3.16	8.9	141.2
4	YOLOv8n + ResidualADown	85.07	93.79	88.4	2.92	8.5	145.8
5	YOLOv8n + C2f-DBB + C2f-MLCA	85.39	93.25	88.2	4.03	11.6	134.2
6	YOLOv8n + C2f-DBB + ResidualADown	84.53	92.86	87.8	3.57	10.3	129.7
7	YOLOv8n + C2f-MLCA + ResidualADown	84.23	93.16	87.9	2.92	8.5	125.3
8	DMR-YOLO	86.15	93.93	89.3	3.60	10.6	138.6
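
The trade-off in Table 3 is easier to read as changes relative to the YOLOv8n baseline. The following is a minimal sketch that recomputes those changes from the table rows; the function name `deltas_vs_baseline` and the dictionary keys are illustrative and do not come from the paper.

```python
# Rows copied from Table 3:
# (model, P %, R %, mAP %, params in millions, GFLOPs, FPS)
ABLATION = [
    ("YOLOv8n", 84.65, 92.73, 86.5, 3.16, 8.9, 142.6),
    ("YOLOv8n + C2f-DBB", 85.20, 92.60, 87.5, 3.83, 11.0, 147.1),
    ("YOLOv8n + C2f-MLCA", 83.85, 92.37, 86.5, 3.16, 8.9, 141.2),
    ("YOLOv8n + ResidualADown", 85.07, 93.79, 88.4, 2.92, 8.5, 145.8),
    ("YOLOv8n + C2f-DBB + C2f-MLCA", 85.39, 93.25, 88.2, 4.03, 11.6, 134.2),
    ("YOLOv8n + C2f-DBB + ResidualADown", 84.53, 92.86, 87.8, 3.57, 10.3, 129.7),
    ("YOLOv8n + C2f-MLCA + ResidualADown", 84.23, 93.16, 87.9, 2.92, 8.5, 125.3),
    ("DMR-YOLO", 86.15, 93.93, 89.3, 3.60, 10.6, 138.6),
]


def pct_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return round(100.0 * (new - old) / old, 1)


def deltas_vs_baseline(rows):
    """Compare every row with the first row (the baseline)."""
    _, _, _, base_map, base_params, base_gflops, base_fps = rows[0]
    result = {}
    for name, _p, _r, m_ap, params, gflops, fps in rows[1:]:
        result[name] = {
            "mAP_pp": round(m_ap - base_map, 1),  # percentage points
            "params_pct": pct_change(params, base_params),
            "gflops_pct": pct_change(gflops, base_gflops),
            "fps_pct": pct_change(fps, base_fps),
        }
    return result


if __name__ == "__main__":
    for model, delta in deltas_vs_baseline(ABLATION).items():
        print(model, delta)
```

For the full model this gives a gain of 2.8 percentage points in mAP for about 13.9% more parameters and 19.1% more GFLOPs, with FPS about 2.8% lower than the baseline row of the same table.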

Table 4. Performance Comparison of YOLO Series Models.

Model	P (%)	R (%)	mAP@50 (%)	FPS (frames/s)	Params (M)	GFLOPs
YOLOv3-tiny	83.52	91.84	85.6	78.6	12.17	19.1
YOLOv5n	84.28	74.77	80.0	145.6	2.65	7.8
YOLOv8n	84.65	92.73	86.5	137.0	3.16	8.9
YOLOv9t	85.64	92.30	84.3	57.6	2.13	8.5
YOLOv10n	78.03	69.75	80.5	136.0	2.78	8.7
DMR-YOLO	86.15	93.93	89.3	138.6	3.60	10.6

Table 5. Generalization experiments on the steel dataset.

Model	P (%)	R (%)	mAP (%)	Params (M)	GFLOPs
YOLOv8n	89.32	92.58	93.9	3.16	8.9
DMR-YOLO	91.12	93.02	95.4	3.60	10.6

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
