A Lightweight Detection Algorithm for Surface Defects in Small-Sized Bearings

Wang, Yuanyuan; Song, Zhaoyu; Abdullahi, Hauwa Suleiman; Gao, Shangbing; Zhang, Haiyan; Zhou, Liguo; Li, Yazhou

doi:10.3390/electronics13132614


by Yuanyuan Wang ¹,*, Zhaoyu Song ¹, Hauwa Suleiman Abdullahi ¹, Shangbing Gao ¹, Haiyan Zhang ¹, Liguo Zhou ² and Yazhou Li ¹

¹ School of Computer and Software Engineering, Huaiyin Institute of Technology, Huaian 223003, China

² Institute of Eco-Chongming (IEC), No. 3663 Northern Zhongshan Road, Shanghai 200062, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(13), 2614; https://doi.org/10.3390/electronics13132614

Submission received: 27 April 2024 / Revised: 30 June 2024 / Accepted: 2 July 2024 / Published: 3 July 2024


Abstract

Background: To address issues in current deep learning models for detecting defects on industrial bearing surfaces, such as large parameter sizes and low precision in identifying small defects, we propose a lightweight detection algorithm for small-sized bearing appearance defects. Methods: First, we introduce a large separable kernel attention (LSKA) module into the spatial pyramid pooling fusion module. The depthwise convolutional layer with large convolutional kernels captures more extensive context information of small-sized bearing defects while reducing the computational burden, and it learns attention weights to adaptively select important input features. Secondly, we integrate the SimAM (simple attention mechanism) into the model without increasing the original network parameters, thereby augmenting the capacity to extract small-sized features and enhancing the model’s feature fusion capability. Finally, utilizing SIoU (Scylla IoU) as the regression loss and Soft-NMS (soft non-max suppression) for handling redundant boxes strengthens the model’s capacity to identify overlapping areas. Results: Experimental results demonstrate that our improved YOLOv8n model, sized at 6.5 MB, outperforms the baseline in terms of precision, recall, and mAP (mean average precision), with an FPS (frames per second) of 146.7, significantly enhancing bearing defect recognition for industrial applications.

Keywords:

bearing; defect detection; YOLOv8; LSKA; SimAM; Soft-SIoU-NMS

1. Introduction

Amidst the swift evolution of industrial manufacturing and the continuous progress in science and technology, modern machinery and equipment are progressively becoming more sophisticated and intelligent. Bearings, as indispensable and vulnerable key components of mechanical equipment, have been broadly used in major industrial sectors, such as the aerospace and aviation industry [1], the agricultural machinery sector [2], the automotive industry [3], and the maritime sector [4], due to their characteristics of high precision, low friction resistance, and size standardization. However, with the expansion of their application range and the improvement in their use requirements, the demand for bearing defect detection is also increasing. The realm of surface defect detection in industrial parts has witnessed an increasing application of deep learning techniques, thanks to the ongoing advancements in artificial intelligence technology.

Two main categories of object detection algorithms are commonly utilized: one-stage and two-stage algorithms. A two-stage target detection network is usually accompanied by a long computing time because it needs to go through several steps, such as candidate frame generation, reclassification, and target location. The YOLO algorithm, functioning as a one-stage target detection algorithm, is capable of executing the detection process with just a single forward propagation, leading to a substantial decrease in computational complexity and enhancing detection speed. This advantage makes YOLO particularly effective for detecting defects in industrial bearings. In addition, industrial defect data often involve images of multiple angles and scales to comprehensively reflect the defect forms of different directions and sizes. The YOLO algorithm addresses diverse defect data by introducing multiscale feature maps. Multiscale feature representation aids in accurately capturing both the details and contextual information of small targets, thereby enhancing the accuracy in detecting them.

In bearing surface defect detection, the problem of detecting small targets is a significant challenge. In datasets of bearing defects, these flaws typically appear as grooves, scratches, or grazes, which are often small in size and exhibit low contrast compared to the background or other non-defective areas. Consequently, these small targets are prone to being overlooked during the detection process.

Numerous researchers have suggested different techniques for enhancing the precision of deep learning models in identifying small targets. These methods help models more effectively grasp the intricate characteristics of small targets, ultimately boosting the accuracy of the detection process. For example, Hu et al. [5,6] swiftly established spatial position information in a feature map by incorporating lightweight self-attention modules, thus locating unpredictable targets. Li et al. [7] introduced a defect detection method that combined the attention mechanism with a BiFPN in YOLOv5, achieving an accuracy of 93.6% on the metal axis dataset. Zhao et al. [8] proposed the GRP-YOLOv5 algorithm for bearing defect detection, which combined ResC2Net and a residual structure and added PConv convolution in the fusion part to improve the model’s ability to capture defects; the accuracy reached 93.5% on the defective bearing dataset of chemical equipment. Guo et al. [9] suggested the MSFT-YOLO model, which incorporated the TRANS module inspired by transformer architecture into both the backbone and detection head. This allowed for the integration of features with global information. In industrial scenes with large image background disturbances, confusing defect categories, significant defect scale changes, and poor detection of small defects, its average detection accuracy is 75.2%, which is 18% higher than that of YOLOv5. Zhao et al. [10] introduced the RDD-YOLO model, which utilized Res2Net blocks to extract features of varying scales. Additionally, a dual-feature pyramid network was incorporated into the architecture’s neck to bolster the generation of comprehensive representations. As a result, the accuracy achieved was higher by 4.3% to 5.8% compared to YOLOv5. Hu et al. [11] introduced a feature attention aggregation network spanning multiple dimensions, which includes a context attention aggregation module to enhance detection accuracy.

Although the above models have achieved high accuracy on their respective datasets, they still suffer from issues such as a large number of parameters and poor real-time detection, which affects their efficiency in end-to-end bearing quality inspection at industrial sites. Qian et al. [12] introduced the LFF-YOLO model. They utilized ShuffleNetv2 [13] as the feature extraction network, followed by the introduction of a LFPN to enhance detection speed. Through streamlining, the model’s parameters were decreased by 74.6%. Yuan et al. [14] introduced a framework named the MOLO network, derived from YOLOv3, that utilized MobileNetV2 [15] as the foundational network for capturing image features to perform multiscale defect detection. The mean average precision (mAP) reached 87.40%, marking a 4.03% improvement over YOLOv3. Wang et al. [16] introduced a new YOLO-ACG model prioritizing a balance between accuracy and speed. This model enhances the integration of semantic information by incorporating the feature pyramid network with spatial attention. Notably, the model’s size is approximately a quarter of that of the YOLOv4 model. Building upon the enhanced YOLO algorithm and ResNet18 backbone feature extraction network, Xue et al. [17] introduced a coal gangue detection method. The study reduced the multi-scale features, successfully compressing the model volume to 28.5% of its original size.

However, when it comes to the practical implementation of detecting defects in industrial bearings, the accurate detection of bearing defects is crucial because bearing defects may often lead to serious equipment failures and safety hazards. Therefore, aimed at identifying small-sized imperfections on bearing surfaces, this paper introduces an effective algorithm for detecting bearing defects, which is constructed upon improvements to the YOLOv8n model. The key advancements of this study can be outlined as follows:

To capture the variable geometry and low-contrast defect shape features of the bearing surface, a large separable kernel attention (LSKA) module [18] was introduced into the SPPF module [19]. The LSKA module uses a large convolution kernel to enlarge the receptive field and capture a wider range of bearing defect shape information, learns attention weights that adaptively select important input features, and enhances the model’s expressiveness and capacity for generalization.
To address the large number of model parameters and the high resource demands of current bearing defect detection, the SimAM [20] has been incorporated into the model. This integration boosts the model’s capability to extract features related to small bearing defects and enhances feature fusion without expanding the original network parameters.
In addition, the SIoU [21] is used as the regression loss, and Soft-NMS [22] is used for redundant box processing. The SIoU reduces the regression’s degrees of freedom and accelerates the network’s convergence. Soft-NMS replaces the non-maximum suppression (NMS) [23] algorithm and optimizes the confidence of the prediction boxes to enhance the detection performance of the model.

2. Algorithm Description

2.1. YOLOv8 Algorithm

Ultralytics released the YOLOv8 algorithm in 2023. Its network architecture builds on YOLOv5 and introduces new structures to further improve performance and scalability. According to network depth and width, it is available in five variants: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. From the smallest to the largest variant, accuracy increases at the cost of significantly more parameters and computation, so that various scenarios can be accommodated.

Adaptive image scaling is applied at the input stage to resize the input, while mosaic data augmentation is used to boost the model’s robustness. The backbone network consists of the CBS module, C2f module, and SPPF module. The CBS module enhances model stability, accelerates convergence, and mitigates gradient vanishing. The C2f module introduces skip connections and additional split operations to enhance the gradient flow within the model. Within the SPPF module, feature fusion is achieved through pooling and convolution operations, effectively amalgamating feature information across multiple scales to augment the model’s feature extraction capability. The neck segment processes features from the backbone network, employing PAN [24] and FPN [25] to comprehensively integrate features via cross-layer connections in both upward and downward directions. Additionally, the head segment adopts a decoupled structure for separate detection and classification tasks. Positive and negative samples are determined based on scores weighted by classification and regression scores, thereby significantly enhancing model performance. The structure diagram of the YOLOv8 model is shown in Figure 1.

2.2. The Improved YOLOv8 Structure

2.2.1. SimAM Attention Mechanism

Traditional convolutional neural networks (CNNs) lack specificity in feature extraction, making it difficult to efficiently focus on target areas and often resulting in performance degradation due to background interference. Attention modules can guide the model to focus on relevant features while suppressing irrelevant regions, thereby enhancing feature representation capabilities.

Given that bearing defects are often small, SimAM, as a parameter-free attention mechanism, can fully utilize the information from three-dimensional data and reduce the loss of feature information. It assigns a different weight to each pixel in the input bearing-defect feature map based on its importance, prioritizing defect features and mitigating the impact of irrelevant backgrounds on detection accuracy. This mechanism makes the model more effective in handling irrelevant backgrounds and multi-scale information, ensuring precise detection of subtle bearing defects.

Moreover, SimAM draws upon theories from visual neuroscience, and its mode of interaction between neighboring neurons has proven effective in bearing defect detection tasks. Figure 2 shows the fusion process of the 3D attention mechanism, where X is the feature map, C is the number of channels, and H and W are the height and width of the feature map.

For the feature map $X \in R^{C \times H \times W}$, SimAM measures the importance of the target neuron t through the minimum energy $e_{t}^{*}$, which is defined as follows:

e_{t}^{*} = \frac{4 (σ^{2} + λ)}{{(t - μ)}^{2} + 2 σ^{2} + 2 λ} .

(1)

In Formula (1), t is the target neuron of the input feature map, λ is the regularization term, μ is the mean of all neurons within a single channel, and $σ^{2}$ is the variance of all neurons in that channel. The lower the minimum energy $e_{t}^{*}$, the more the target neuron t in the defect feature differs from the other neurons $x_{i}$ in the same channel, and the more important it is. The importance of a neuron can therefore be represented by $\frac{1}{e_{t}^{*}}$. The expressions for $σ^{2}$ and μ are as follows:

σ^{2} = \frac{1}{M} \sum_{i = 1}^{M} {(x_{i} - μ)}^{2},

(2)

μ = \frac{1}{M} \sum_{i = 1}^{M} x_{i} .

(3)

In Formulas (2) and (3), M represents the number of neurons in a channel, which is the product of the width W and height H of the input feature map, and $x_{i}$ ranges over all neurons in the same channel as the target neuron. Finally, the features are reweighted in accordance with the definition of the attention mechanism to obtain the enhanced output feature $X^{'}$:

X^{'} = sigmoid (\frac{1}{E}) \odot X .

(4)

In Formula (4), $\odot$ represents elementwise multiplication and E groups the minimum energies $e_{t}^{*}$ of all neurons on the feature map. The sigmoid function bounds overly large values of $\frac{1}{E}$; because it is monotonic, it does not change the relative importance of the neurons.
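As a minimal sketch of Formulas (1)–(4), the NumPy function below computes the inverse minimum energy for every neuron of a (C, H, W) feature map and applies the sigmoid gate. The function name `simam`, the default value of `lam`, and the use of NumPy rather than a deep learning framework are illustrative assumptions; published SimAM implementations may differ in details such as the variance normalization.

```python
import numpy as np

def simam(x: np.ndarray, lam: float = 1e-4) -> np.ndarray:
    """Apply SimAM-style attention to a feature map x of shape (C, H, W).

    lam is the regularization term (lambda); its default is an assumption.
    """
    # Per-channel mean and variance over the M = H * W neurons (Formulas (2) and (3)).
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = ((x - mu) ** 2).mean(axis=(1, 2), keepdims=True)
    # Inverse minimum energy 1 / e_t^* from Formula (1), evaluated for every neuron t.
    inv_energy = ((x - mu) ** 2 + 2.0 * var + 2.0 * lam) / (4.0 * (var + lam))
    # Formula (4): sigmoid gate followed by elementwise multiplication.
    gate = 1.0 / (1.0 + np.exp(-inv_energy))
    return gate * x

features = np.random.default_rng(0).normal(size=(2, 4, 4))
enhanced = simam(features)
print(enhanced.shape)
```

No learnable parameters appear in the function, which is what allows the attention to be added without enlarging the network.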

2.2.2. Large Separable Kernel Attention

In traditional attention mechanisms, large convolution kernels play an important role in improving model performance due to their stronger feature extraction capabilities. However, they also increase computational complexity and memory footprint, which can pose challenges in scenarios where resources are limited or where efficient operation is needed. LSKA was developed in response to this problem; its main goal is to enable attention modules with large convolutional kernels to run more efficiently in the model by reducing the computational complexity and memory footprint.

LSKA achieves this in an innovative manner. The 2D convolutional kernel of a depthwise convolutional layer is broken down into cascaded horizontal 1D and vertical 1D kernels. In this way, one large convolutional kernel is converted into two smaller kernels, which significantly reduces the computational complexity and memory usage. For example, a 3 × 3 convolution kernel can be deconstructed into 3 × 1 and 1 × 3 convolution kernels, which perform separate convolution operations, reducing the number of multiplications per output position from 9 to 6. Despite this decomposition, the feature extraction capability remains powerful.
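The saving from this decomposition grows with the kernel size. The short sketch below counts multiplications per output position for a single channel, comparing a full k × k kernel with a 1 × k kernel followed by a k × 1 kernel. The helper name `mults_per_output` is hypothetical, and the count ignores the dilation and pointwise convolutions that a complete LSKA block may also contain.

```python
def mults_per_output(k: int) -> tuple[int, int]:
    """Multiplications per output position (one channel) for a k x k kernel
    versus a 1 x k kernel followed by a k x 1 kernel."""
    return k * k, 2 * k

# 3 is the worked example from the text; the other sizes are those compared in Section 3.3.3.
for k in (3, 7, 11, 23, 35, 41, 53):
    full, separable = mults_per_output(k)
    print(f"k={k:2d}: full={full:4d}  separable={separable:3d}  ratio={full / separable:.1f}x")
```

For k = 3 this reproduces the 9 versus 6 figure quoted above, while for k = 23 the count drops from 529 to 46.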

In bearing defect detection, defects have variable geometry and low contrast. The SPPF layer aggregates multiscale context information of bearing defects by constructing spatial pyramids of different scales, and introducing LSKA helps it better learn the correlation between bearing defect features of different scales and positions. Through its distinctive convolutional kernel decomposition method, LSKA enables the SPPF layer to capture a broader spectrum of bearing defect feature information and accentuates the shape characteristics of bearing defects, thereby enhancing the overall efficacy of spatial pyramid pooling. The structures of SPPF, LSKA_SPPF, and LSKA are shown in Figure 3, Figure 4 and Figure 5.

2.2.3. SIoU Loss Function

The intersection-over-union (IoU) ratio cannot reflect how two boxes overlap or how far apart they are. To address these problems, GIoU [26], DIoU [26], and CIoU [26] have been proposed. However, none of these methods considers the direction of the mismatch between the ground-truth and predicted bounding boxes. The prediction box might therefore drift unpredictably during training, potentially degrading model performance. To address this issue, SIoU introduces an angle factor on the basis of CIoU, resolving the direction mismatch problem, as depicted in Figure 6. Four types of cost are taken into account: angle cost, distance cost, shape cost, and IoU cost. The mathematical definitions are shown in Equations (5)–(12):

Λ_{Angle - loss} = 1 - 2 \times {sin}^{2} (arcsin x - \frac{π}{4}),

(5)

x = \frac{c_{h}}{σ} = sin α,

(6)

σ = \sqrt{{(b_{c_{x}}^{g t} - b_{c_{x}})}^{2} + {(b_{c_{y}}^{g t} - b_{c_{y}})}^{2}},

(7)

c_{h} = max (b_{c_{y}}^{g t}, b_{c_{y}}) - min (b_{c_{y}}^{g t}, b_{c_{y}}),

(8)

Δ_{Distance - loss} = \sum_{t = x, y} (1 - e^{- γ ρ_{t}}),

(9)

where

ρ_{x} = {(\frac{b_{c_{x}}^{g t} - b_{c_{x}}}{c_{w}})}^{2}, ρ_{y} = {(\frac{b_{c_{y}}^{g t} - b_{c_{y}}}{c_{h}})}^{2}, γ = 2 - Λ,

(10)

Ω_{shape - loss} = \sum_{t = (w, h)} {(1 - e^{- ω_{t}})}^{θ},

(11)

ω_{w} = \frac{| w - w^{g t} |}{max (w, w^{g t})}, ω_{h} = \frac{| h - h^{g t} |}{max (h, h^{g t})},

(12)

where $b^{g t}$ and b are the center point coordinates of the real and predicted boxes, σ represents the distance between the center points of the real and predicted boxes, and $c_{h}$ indicates the y-direction distance between their center points. $w^{g t}$ and $h^{g t}$ refer to the width and height of the real box, while w and h indicate the width and height of the predicted box.

The improved box regression loss $l_{box}$ is defined in Equation (13):

l_{box} = 1 - IOU + \frac{Δ + Ω}{2},

(13)

where Δ denotes the distance cost and Ω denotes the shape cost.
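Equations (5)–(13) can be traced step by step for a single pair of boxes. The following is a sketch under stated assumptions rather than the training code used in this work: boxes are given as (cx, cy, w, h), the exponent `theta` defaults to 4, and the normalizers $c_{w}$ and $c_{h}$ in Equation (10) are taken to be the width and height of the smallest box enclosing both boxes, which the text above does not define. The function name `siou_loss` is hypothetical.

```python
import math

def siou_loss(pred, gt, theta=4.0, eps=1e-9):
    """SIoU box loss for one predicted and one real box, each (cx, cy, w, h)."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt

    # Corner coordinates of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g_x1, g_y1, g_x2, g_y2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2

    # Plain IoU.
    inter_w = max(0.0, min(p_x2, g_x2) - max(p_x1, g_x1))
    inter_h = max(0.0, min(p_y2, g_y2) - max(p_y1, g_y1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Angle cost, Equations (5)-(8).
    sigma = math.hypot(gx - px, gy - py)      # distance between centers
    c_h = abs(gy - py)                        # y-direction distance between centers
    x = min(1.0, c_h / (sigma + eps))         # sin(alpha)
    angle = 1.0 - 2.0 * math.sin(math.asin(x) - math.pi / 4) ** 2

    # Distance cost, Equations (9)-(10); enclosing-box normalizers are an assumption.
    enc_w = max(p_x2, g_x2) - min(p_x1, g_x1)
    enc_h = max(p_y2, g_y2) - min(p_y1, g_y1)
    gamma = 2.0 - angle
    rho_x = ((gx - px) / (enc_w + eps)) ** 2
    rho_y = ((gy - py) / (enc_h + eps)) ** 2
    distance = (1.0 - math.exp(-gamma * rho_x)) + (1.0 - math.exp(-gamma * rho_y))

    # Shape cost, Equations (11)-(12).
    omega_w = abs(pw - gw) / max(pw, gw)
    omega_h = abs(ph - gh) / max(ph, gh)
    shape = (1.0 - math.exp(-omega_w)) ** theta + (1.0 - math.exp(-omega_h)) ** theta

    # Equation (13).
    return 1.0 - iou + (distance + shape) / 2.0

print(siou_loss((12, 10, 4, 4), (10, 10, 4, 4)))
```

A perfectly aligned prediction yields a loss close to zero, and the loss grows as the predicted center moves away from the real one.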

2.2.4. Soft-NMS

Traditional non-maximum suppression (NMS) culls redundant prediction boxes by deleting them outright, which reduces the detection rate of occluded targets. To address this problem, Soft-NMS was proposed: instead of setting the confidence of a prediction box to 0 and deleting it when its IoU with the highest-confidence box reaches the threshold, Soft-NMS attenuates that confidence. Its mathematical definition is shown in Equation (14):

S_{i} = \begin{cases} S_{i}, & IOU (b_{i}^{m}, b_{i}) < N \\ S_{i} e^{- \frac{{IOU (b_{i}^{m}, b_{i})}^{2}}{σ}}, & IOU (b_{i}^{m}, b_{i}) \geq N \end{cases},

(14)

where $S_{i}$ represents the confidence of another prediction box of the same class, $b_{i}^{m}$ represents the prediction box with the highest confidence, $b_{i}$ represents the other prediction box, σ is the standard deviation parameter of the Gaussian penalty, and N represents the IoU threshold.
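The decay rule in Equation (14) can be illustrated with a small pure-Python routine. This is a sketch, not the implementation used in the experiments: it measures overlap with plain IoU on (x1, y1, x2, y2) boxes, and the names `soft_nms`, `iou_thresh`, `sigma`, and `score_thresh`, together with their default values, are assumptions chosen for illustration.

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, iou_thresh=0.5, sigma=0.5, score_thresh=0.001):
    """Return (box index, decayed confidence) pairs in selection order."""
    remaining = list(enumerate(scores))
    kept = []
    while remaining:
        # Select the box with the highest current confidence (b^m).
        best = max(range(len(remaining)), key=lambda j: remaining[j][1])
        m_idx, m_score = remaining.pop(best)
        kept.append((m_idx, m_score))
        updated = []
        for i, s in remaining:
            overlap = iou(boxes[m_idx], boxes[i])
            if overlap >= iou_thresh:
                # Equation (14): decay the confidence instead of deleting the box.
                s *= math.exp(-(overlap ** 2) / sigma)
            if s >= score_thresh:
                updated.append((i, s))
        remaining = updated
    return kept

boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
print(result)
```

In this example the second box overlaps heavily with the first, so its confidence is reduced rather than removed, and it is ranked after the non-overlapping third box.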

Based on the aforementioned improvements, the improved YOLOv8n network model in this paper is named LSS-YOLOv8n, and its overall improved model diagram is shown in Figure 7.

3. Experiment and Analysis

3.1. Experimental Data

Li et al. [27] captured the dataset for this experiment on a workbench and utilized LabelImg annotation software (1.4.0) to highlight defect targets by placing rectangular frames of varying sizes around them. The defects on the bearing surface were categorized into three main types: groove, scratch, and graze. The dataset consists of 5241 bearing images with a resolution of 554 × 416 pixels. Figure 8 displays samples of the various defect types found in the dataset. Each image in the dataset contains bearing defects, and the number of samples for each type of defect is shown in Figure 9. The minimum resolutions for graze, scratch, and groove are approximately 30 × 30, 40 × 30, and 20 × 45, respectively. The images are partitioned into training and validation sets at a ratio of 9:1.

The dataset contains various shapes, sizes, and positions of bearing defects. Therefore, the defect detection algorithm must demonstrate a high level of generalizability and resilience to accurately detect diverse defect shapes and locations that may occur in real-world scenarios.

Throughout this experiment, the SGD optimizer updates the network parameters of the model, and a cosine annealing strategy progressively decreases the learning rate. During training, the probability of horizontal flipping is set to 0.5, meaning each training image has a 50% chance of being flipped horizontally to increase data diversity. The probability of image mosaic is set to 1.0, meaning all training images undergo mosaic processing. This further enriches the training data and helps the model better understand different defect characteristics. Detailed parameter configurations and experimental environments are provided in Table 1 and Table 2.

3.2. Evaluation Indicators

Target detection tasks encompass both target classification and localization. The model evaluation can be approached from three dimensions: detection accuracy, detection speed, and model complexity. Common ways to evaluate detection performance include metrics like precision, recall, mAP@0.5, and mAP@0.5:0.95. Detection speed is commonly quantified as frames per second (FPS), while model complexity factors in considerations like model size, parameter count, and computational load (GFLOPs). Regarding accuracy, precision refers to the percentage of predicted positive samples that are actually correct, as outlined in Equation (15). Recall indicates the percentage of actual positive samples that are correctly detected, as defined in Equation (16).

P = \frac{TP}{TP + FP},

(15)

R = \frac{TP}{TP + FN} .

(16)

The P–R curve plots precision on the y-axis and recall on the x-axis, illustrating the trade-off between these two measures across different confidence thresholds. As recall and precision tend to trade off against each other, a comprehensive metric is needed. Therefore, the area under the P–R curve is used to calculate the average precision (AP). A higher AP value indicates better detection performance for that category. For a single category, AP is obtained by integration (refer to Equation (17)). For multiple classes, the AP value for each category is computed separately and then averaged to obtain the mean average precision (mAP) (refer to Equation (18)). mAP@0.5 denotes the mAP at an IoU threshold of 0.5, while mAP@0.5:0.95 denotes the mAP averaged over IoU thresholds from 0.5 to 0.95 with a step size of 0.05. These two values serve as the key metrics for evaluating accuracy in this study.

AP = \int_{0}^{1} P (R) dR,

(17)

mAP = \frac{AP_{1} + AP_{2} + \dots + AP_{n}}{n} .

(18)

The variables are TP for true positive samples, FP for false positive samples, FN for false negative samples, P for precision, R for recall, and n for the number of categories.
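Equations (15)–(18) translate directly into a few lines of code. The sketch below uses hypothetical helper names and approximates the integral in Equation (17) with the trapezoidal rule over sampled points of the P–R curve; production evaluators typically use an interpolated variant, so the numbers here are illustrative only.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Equations (15) and (16)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(recalls, precisions) -> float:
    """Equation (17), approximated with the trapezoidal rule.

    recalls must be sorted in ascending order and paired with precisions.
    """
    ap = 0.0
    for i in range(1, len(recalls)):
        ap += (recalls[i] - recalls[i - 1]) * (precisions[i] + precisions[i - 1]) / 2.0
    return ap

def mean_average_precision(aps) -> float:
    """Equation (18): mean of the per-category AP values."""
    return sum(aps) / len(aps)

# Made-up counts and curve points, for illustration only.
p, r = precision_recall(tp=8, fp=2, fn=2)
ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.0])
print(p, r, ap)
```

With three defect categories, as in the dataset used here, `mean_average_precision` would be called with three AP values.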

The benchmark for detection speed is measured in frames per second (FPS), representing the rate at which the target model can detect images. A higher FPS value signifies a faster detection speed. For example, if the target detection model analyzes one image in 0.02 s, the FPS would be determined as 1/0.02 = 50.

When assessing model complexity, parameter count and model size describe the storage footprint, while GFLOPs measure the computational load. A greater computational load correlates with increased model complexity and reduced inference speed.

3.3. Analysis of Experimental Results

3.3.1. Model Training

To evaluate the effectiveness of the enhanced LSS-YOLOv8n model proposed in this study, we utilize identical training sets and training hyperparameters to train both the improved model and the original YOLOv8n algorithm. Subsequently, we compare the results obtained from each. Figure 10 and Figure 11 illustrate the final mAP@0.5 and mAP@0.5:0.95 curves, as well as the loss curve, respectively.

As shown in Figure 10a, the mAP@0.5 improved rapidly and fluctuated significantly in the early stage. With increasing iterations, the mAP@0.5 of YOLOv8n and LSS-YOLOv8n gradually stabilized at approximately 87% and 90%, respectively. Finally, the mAP@0.5 of YOLOv8n and LSS-YOLOv8n reached 87.5% and 90.3%, respectively. Figure 10b shows that the mAP@0.5:0.95 also increased rapidly and fluctuated noticeably in the early stage. With increasing iterations, the mAP@0.5:0.95 of YOLOv8n and LSS-YOLOv8n gradually stabilized at approximately 58% and 64%, respectively. Finally, the mAP@0.5:0.95 of YOLOv8n and LSS-YOLOv8n reached 58.5% and 64.2%, respectively.

A comparison of loss values is shown in Figure 11. In the early stage, the loss decreased rapidly, and the initial loss of both models was approximately 3.1. After approximately 150 epochs, the losses of YOLOv8n and LSS-YOLOv8n reached 1.2597 and 1.2282, respectively. They then gradually stabilized at approximately 1.23 and 1.17, respectively.

In summary, introducing the LSKA module into the SPPF module employs a large convolution kernel to expand the receptive field, captures the shape features of different bearing defects, and uses adaptively learned attention weights to select important input features. The SimAM is then integrated into the model. This integration boosts the model’s capability to extract minute features of bearing defects while enhancing its feature fusion capability, all without increasing the original network parameters. As a result, the detection accuracy of the enhanced network LSS-YOLOv8n on the bearing defect dataset shows significant improvement. With the SIoU as the regression loss and Soft-NMS for redundant box processing, the loss is smaller for the same number of iterations, and a better training effect is achieved.

3.3.2. Comparative Experiment of the Attention Mechanism

To evaluate the performance of the SimAM, comparative experiments were conducted by integrating each attention mechanism into the original YOLOv8n. The current mainstream attention mechanisms, including CBAM, EMA, and SE, which are well-suited for target detection tasks, were chosen and compared with the nonparametric attention mechanism SimAM utilized in this study. To illustrate the enhancement brought by each attention mechanism on the dataset, a random frame from the validation set was chosen for visualization of its features. This visualization is presented in the form of a heatmap, as shown in Figure 12.

As shown in Figure 12, compared to other attention mechanisms, SimAM exhibits superiority in selecting key feature information. It enables the model to concentrate on the pertinent feature regions of bearing defects, accurately identifying three bearing defects while suppressing irrelevant regions surrounding the defects. This leads to more precise localization of defect areas with more distinct and evenly distributed attention, illustrating the advantages of the SimAM from a heatmap visualization perspective. To further confirm the efficacy of different attention mechanisms on the dataset utilized in this study, we compare the results of training models with different attention mechanisms on the original YOLOv8 for 300 epochs. The experimental results are depicted in Table 3.

Table 3 illustrates how the inclusion of attention mechanisms can enhance the original model to varying degrees. Integrating CBAM, EMA, and SE attention mechanisms resulted in respective increases of 2.03%, 1.53%, and 1.73% in the mAP@0.5 compared to the YOLOv8n. Additionally, the model size increased with the addition of any of these three attention mechanisms. Notably, employing Triplet and SimAM significantly enhanced the detection performance on the dataset used in this paper. Specifically, the mAP@0.5 rose by 2.03% and 2.23%, respectively, and the mAP@0.5:0.95 increased by 1.83% and 1.93%, respectively, without any change in model size. The Params column shows that the SimAM attention mechanism can fully utilize three-dimensional weights without increasing the parameters of the original network. It assigns a different weight to each pixel in the bearing defect feature map based on its importance, allowing the model to suppress irrelevant background information. This enables the model to focus on the critical features of bearing defects, effectively enhancing the algorithm’s precision in bearing defect detection.

Comprehensive comparisons of attention mechanisms, evaluated through performance metrics and heatmap visualizations, collectively demonstrate that integrating the SimAM attention mechanism effectively enhances the model’s focus on detailed features of bearing defects, thereby improving the model’s performance in detecting bearing defects. Additionally, considering the need for real-time detection of bearing surface defects in industrial settings, the SimAM nonparametric attention mechanism, which offers the fastest detection speed (FPS) after integration, was selected for this experiment.

3.3.3. LSKA Convolutional Kernel Parameter Comparative Experiment

Independent experiments were carried out on the dataset to assess the effectiveness of three proposed enhancement methods, which targeted hardware utilization, training parameters, and iteration count. The aim of these experiments was to clarify the roles of different modules in object detection tasks and provide valuable insights for enhancing and fine-tuning object detection algorithms. Random seeds 213, 4821, and 64455 were selected for the experiment, and the results presented in Table 4 represent the averages of these three outcomes.

To validate the effectiveness of the improved SPPF_LSKA module and determine the most suitable convolutional kernel size for detecting bearing defects, this study conducted comparative experiments using different kernel sizes, including 7, 11, 23, 35, 41, and 53. The experimental conditions involved solely incorporating the SPPF_LSKA improved module. The experimental results are presented in Table 4.

Bearing defects are typically small-sized, and overly large convolutional kernels may encompass too much background information, while too small kernels may fail to capture the full extent of bearing defects. In Table 4, experimental results indicate that setting the convolutional kernel size to 23 leads to improvements in precision, recall, mAP@0.5, and mAP@0.5:0.95 by 2.53%, 2.4%, 2.43%, and 1.53%, respectively, compared to YOLOv8n. This suggests that with a kernel size of 23, the receptive field is sufficiently large to effectively capture the features of bearing defects, covering these small-sized defects without excessively extending to include too much background information. Consequently, the model exhibits optimal performance in detecting small-sized bearing defects. Therefore, this study opts to set the convolutional kernel size to 23.

3.3.4. Ablation Experiment

Independent experiments were carried out on the dataset to assess the effectiveness of three proposed enhancement methods, which targeted hardware utilization, training parameters, and iteration count. The aim of these experiments was to clarify the roles of different modules in object detection tasks and provide valuable insights for enhancing and fine-tuning object detection algorithms. Random seeds 213, 4821, and 64455 were selected for the experiment, and the results presented in Table 5 represent the averages of these three outcomes.

To validate the detection performance of the proposed algorithm and examine the contribution of SimAM, LSKA_SPPF, and Soft-SIoU-NMS, ablation experiments were conducted. According to the results, incorporating LSKA_SPPF alone enabled the model to capture a broader range of shape feature information from the bearing surface. This was achieved by using a large convolution kernel to expand the receptive field and by learning attention weights that adaptively select the important input features of bearing defects. Compared to the YOLOv8n model, precision improved by 2.53 percentage points, recall by 2.40, mAP@0.5 by 2.43, and mAP@0.5:0.95 by 2.13.

Introducing SimAM alone, without increasing the parameters of the original network, sharpens the model’s focus on subtle differences in bearing defect morphology and strengthens its feature fusion capability. This allows the model to capture the interaction information between features more accurately, improving the network’s feature extraction capability and ultimately its detection accuracy. Compared to the YOLOv8n model, precision rose by 1.43 percentage points, recall by 1.90, mAP@0.5 by 2.23, and mAP@0.5:0.95 by 1.93.

Introducing Soft-SIoU-NMS alone refined the confidence scores of the predicted bounding boxes, resulting in a notable improvement in detection performance. Compared to YOLOv8n, precision rose by 1.50 percentage points, recall by 1.03, mAP@0.5 by 1.90, and mAP@0.5:0.95 by 5.26.
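To illustrate how soft suppression rescales confidence scores instead of discarding overlapping boxes, the following is a minimal sketch of Gaussian Soft-NMS in the spirit of [22]. It uses plain IoU as the overlap measure, which is a simplification: the SIoU-based overlap used in this paper is not reproduced here. The values of `sigma` and `score_threshold` and the example boxes are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of deleting them.

    Returns (box, score) pairs in the order they were selected.
    """
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Select the highest-scoring remaining candidate.
        best_index = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best_index)
        kept.append((best_box, best_score))
        # Decay the rest by exp(-IoU^2 / sigma); drop those below the threshold.
        rescored = []
        for box, score in candidates:
            score *= math.exp(-(iou(best_box, box) ** 2) / sigma)
            if score >= score_threshold:
                rescored.append((box, score))
        candidates = rescored
    return kept


# Two heavily overlapping boxes and one separate box (illustrative values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
for box, score in soft_nms(boxes, scores):
    print(box, round(score, 3))
```

In this example the second box overlaps the first and is kept with a reduced score rather than removed, whereas hard NMS with a typical IoU threshold would have discarded it.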

Simultaneously integrating LSKA_SPPF, SimAM, and Soft-SIoU-NMS into YOLOv8n improved precision and recall over the baseline by 1.16 and 1.20 percentage points, respectively, and gave the highest mAP of all configurations: mAP@0.5 rose by 2.73 percentage points and mAP@0.5:0.95 by 5.73. Comparing the original YOLOv8n model with the final improved LSS-YOLOv8n model, the latter is superior in precision, recall, and mAP.
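One way to read the ablation is to check whether the single-module gains add up when the modules are combined. The sketch below does this arithmetic on the rows of Table 5; the comparison itself is an illustrative analysis and not part of the original experiments.

```python
# Rows copied from Table 5: (P, R, mAP@0.5, mAP@0.5:0.95), in %.
METRICS = ("P", "R", "mAP@0.5", "mAP@0.5:0.95")
BASELINE = (88.97, 84.60, 87.57, 58.47)
SINGLE_MODULE = {
    "SPPF_LSKA": (91.50, 87.00, 90.00, 60.60),
    "SimAM": (90.40, 86.50, 89.80, 60.40),
    "Soft-SIoU-NMS": (90.47, 85.63, 89.47, 63.73),
}
COMBINED = (90.13, 85.80, 90.30, 64.20)  # all three modules together


def gain(row):
    """Gain over the YOLOv8n baseline in percentage points."""
    return [round(value - base, 2) for value, base in zip(row, BASELINE)]


single_gains = {name: gain(row) for name, row in SINGLE_MODULE.items()}
summed = [round(sum(g[i] for g in single_gains.values()), 2) for i in range(len(METRICS))]
combined = gain(COMBINED)

for i, metric in enumerate(METRICS):
    print(f"{metric:13s} sum of single gains={summed[i]:5.2f}  combined gain={combined[i]:5.2f}")
```

On every metric the combined gain is smaller than the sum of the single-module gains, which suggests that the three modules partly address the same errors rather than contributing independently.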

3.3.5. Comparison of Experimental Results

To highlight the advantages of the enhanced algorithm, this paper compares the LSS-YOLOv8n algorithm with Gold-YOLO-n [28], YOLOv3, YOLOv3-tiny, YOLOv5n, YOLOv5-SPD-n [29], YOLOv6n, YOLOv8s, ITD-YOLOv8 [30], and YOLOv8n. The detailed comparative experimental results are shown in Table 6.

In Table 6, in terms of mAP@0.5:0.95, LSS-YOLOv8n surpasses all network models except YOLOv3 and YOLOv8s. In terms of mAP@small, both YOLOv3 and YOLOv8s exceed 50%, while LSS-YOLOv8n outperforms the baseline model YOLOv8n by 4.2 percentage points and surpasses all other models except YOLOv3 and YOLOv8s. In terms of complexity, LSS-YOLOv8n adds 0.6 MB of model size and 0.2 GFLOPs of floating-point operations relative to YOLOv8n, yet it remains smaller and cheaper to run than YOLOv3, YOLOv8s, and most of the other models, making it suitable for deployment on edge devices with limited computing power. Finally, in terms of detection speed, the baseline model YOLOv8n achieves an FPS of 151.8, while LSS-YOLOv8n reaches an FPS of 146.7, a difference of only 5.1 FPS from the baseline model, thereby meeting real-time detection requirements.
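The speed difference is easier to judge as per-frame latency. The short sketch below converts the two FPS figures from Table 6 into milliseconds per frame; it is simple arithmetic on the reported values and assumes FPS is the reciprocal of average per-frame time.

```python
# FPS values copied from Table 6.
FPS = {"YOLOv8n": 151.8, "LSS-YOLOv8n": 146.7}


def latency_ms(fps):
    """Average time per frame in milliseconds for a given throughput."""
    return 1000.0 / fps


baseline_ms = latency_ms(FPS["YOLOv8n"])
improved_ms = latency_ms(FPS["LSS-YOLOv8n"])
slowdown_pct = 100.0 * (FPS["YOLOv8n"] - FPS["LSS-YOLOv8n"]) / FPS["YOLOv8n"]

print(f"YOLOv8n:     {baseline_ms:.2f} ms/frame")
print(f"LSS-YOLOv8n: {improved_ms:.2f} ms/frame")
print(f"extra latency: {improved_ms - baseline_ms:.2f} ms, throughput drop: {slowdown_pct:.1f}%")
```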

According to the experimental results shown in Table 7, the improved algorithm outperforms the original YOLOv8n algorithm across all three defect categories. Specifically, AP50 improves by at least 2.0 percentage points and AP50-95 by at least 4.0 percentage points in every category. This indicates that LSS-YOLOv8n has a strong capability to capture the details of each type of defect and can effectively handle overlapping bounding boxes in object detection, resulting in higher overall precision for the model.
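The per-category gains can be read off Table 7 with a few lines of arithmetic. The sketch below is illustrative only and uses the class names as they appear in the table header.

```python
# Values copied from Table 7 (AP in %), per defect class.
CLASSES = ("Groove", "Abrasion", "Scratch")
AP50 = {"YOLOv8n": (91.6, 91.0, 80.1), "LSS-YOLOv8n": (93.8, 93.3, 83.8)}
AP50_95 = {"YOLOv8n": (67.1, 67.5, 40.8), "LSS-YOLOv8n": (71.1, 73.3, 48.2)}


def per_class_gain(table):
    """Gain of LSS-YOLOv8n over YOLOv8n per class, in percentage points."""
    return {
        name: round(new - old, 1)
        for name, old, new in zip(CLASSES, table["YOLOv8n"], table["LSS-YOLOv8n"])
    }


ap50_gain = per_class_gain(AP50)
ap50_95_gain = per_class_gain(AP50_95)
print("AP50 gain:   ", ap50_gain)
print("AP50-95 gain:", ap50_95_gain)
print("largest AP50-95 gain:", max(ap50_95_gain, key=ap50_95_gain.get))
```

The largest gain on both metrics falls on the scratch class, which is also the class with the lowest baseline AP.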

In summary, YOLOv8n’s limitations in detecting bearing defect details stem from the small size, variable geometry, and low contrast of these defects. Our LSS-YOLOv8n model, enhanced with SimAM, LSKA_SPPF, and Soft-SIoU-NMS, performs better by capturing defect details more accurately and handling overlapping bounding boxes more effectively.

3.3.6. Generalization Experiment

To validate the generalization of the LSS-YOLOv8n network in detecting small-sized targets, we additionally selected the VisDrone2019 [31] dataset for testing. The VisDrone dataset was released by the AISKYEYE team from Tianjin University’s Machine Learning and Data Mining Laboratory. It covers diverse environments across 14 cities in China, including urban and rural scenes, varying lighting conditions from bright to dark, and different weather scenarios. This dataset comprises diverse object distributions, spanning from sparse to dense, rendering it one of the most extensive and varied image datasets within the realm of drone-based aerial imagery in China. The VisDrone2019 dataset encompasses 10 categories of aerial detection targets. It includes 6471 images for training, 548 images for validation, and 3190 images for testing.

In the training images, an average of 53 objects are annotated per image, while in the testing images, there is an average of 71 objects per image. Objects of each category exhibit varying degrees of occlusion in the images, posing significant challenges for the detection capabilities of the network.

Similarly, to demonstrate the improved algorithm’s generalization performance in detecting small-sized objects, this study compared the LSS-YOLOv8n algorithm with Gold-YOLO-n, YOLOv3, YOLOv3-tiny, YOLOv5n, YOLOv5-SPD-n, YOLOv6n, YOLOv8s, YOLOv8n, and ITD-YOLOv8. The comparison results are shown in Table 8.

Based on the experimental data in Table 8, on a different dataset the LSS-YOLOv8n model still achieves a higher mAP@0.5 and mAP@0.5:0.95 than all other models, and a higher mAP@small than all models except YOLOv8s. Taking model complexity and accuracy together, LSS-YOLOv8n demonstrates satisfactory detection performance compared to the other network models and generalizes well to small-sized objects.
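The ranking claims can be checked mechanically against Table 8. The sketch below lists, for each mAP metric, the models that score strictly higher than LSS-YOLOv8n; the helper name `models_ahead` is hypothetical.

```python
# Rows copied from Table 8: model -> (mAP@0.5, mAP@0.5:0.95, mAP@small), in %.
TABLE8 = {
    "Gold-YOLO-n": (33.0, 19.2, 11.9),
    "YOLOv3": (37.5, 21.9, 12.5),
    "YOLOv3-tiny": (23.5, 13.0, 8.5),
    "YOLOv5n": (31.5, 18.2, 11.0),
    "YOLOv5n-SPD-n": (32.6, 18.8, 11.7),
    "YOLOv6n": (29.9, 17.4, 10.4),
    "YOLOv8s": (38.4, 22.7, 14.3),
    "ITD-YOLOv8": (32.4, 18.2, 12.0),
    "YOLOv8n": (31.3, 18.0, 11.2),
    "LSS-YOLOv8n": (38.6, 24.9, 13.0),
}
METRICS = ("mAP@0.5", "mAP@0.5:0.95", "mAP@small")


def models_ahead(target, column):
    """Names of the models that score strictly higher than `target` on one metric."""
    reference = TABLE8[target][column]
    return sorted(name for name, row in TABLE8.items() if row[column] > reference)


for column, metric in enumerate(METRICS):
    print(f"{metric:13s} ahead of LSS-YOLOv8n: {models_ahead('LSS-YOLOv8n', column)}")
```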

4. Conclusions

This paper introduced a lightweight network called LSS-YOLOv8n, tailored to bearing defect detection in industrial applications. First, we introduced a large separable kernel attention (LSKA) module with a large convolution kernel into the SPPF module to expand the receptive field. This enabled the model to capture a wider range of contextual information about bearing defects and to adaptively learn attention weights that select the essential input features, thereby improving its representation and generalization capabilities. Second, integrating SimAM enhanced the feature extraction capacity for small bearing defects without increasing the original network parameters, while also improving the model’s feature fusion capability. Third, SIoU was employed as the regression loss and Soft-NMS for redundant box processing, further strengthening the model’s ability to recognize overlapping regions and thereby improving detection accuracy. Ablation experiments and a series of comparative tests demonstrated the effectiveness of these three improvements in enhancing the performance of the YOLOv8n model. Experiments on the VisDrone2019 dataset also indicated the strong generalization capability of LSS-YOLOv8n in detecting small-sized objects.

The experimental results show that the LSS-YOLOv8n proposed in this paper surpasses the baseline model in detecting bearing appearance defects while maintaining a similar model size. Compared to YOLOv8n, LSS-YOLOv8n is 0.6 MB larger and requires 0.2 GFLOPs more floating-point computation. In return, it improves the detection of bearing defects that are small, vary in geometry, and have low contrast, raising mAP@0.5 by 2.73 and mAP@0.5:0.95 by 5.73 percentage points. Regarding detection speed, LSS-YOLOv8n is only 5.1 f/s slower than YOLOv8n (146.7 versus 151.8 f/s), which fulfils the requirements of bearing appearance defect detection in industrial settings. To further cater to the demands of real-time detection of bearing defects, future research will focus on deploying low-precision operations on embedded platforms and exploring the application of edge computing.

Author Contributions

The authors confirm contributions to the paper as follows: study conception and design: Y.W. and Z.S.; data collection: Z.S.; analysis and interpretation of results: Y.L. and H.S.A.; draft manuscript preparation: Z.S. and L.Z.; supervision: H.Z.; funding acquisition: S.G. and L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Education Humanities and Social Science Research Project (No. 23YJAZH034), the Postgraduate Research and Practice Innovation Program of Jiangsu Province (No. SJCX_2147, SJCX_2148), the National Computer Basic Education Research Project in Higher Education Institutions (No. 2024-AFCEC-056, 2024-AFCEC-057), and the Enterprise Collaboration Project (No. Z421A22349, Z421A22304, Z421A210045).

Data Availability Statement

Due to the nature of this study, the participants did not agree to share their data publicly; therefore, no supporting data can be provided. The code is available at https://github.com/laiyidao2021/LSS-yolov8 (accessed on 8 March 2024).

Acknowledgments

We thank the National Natural Science Foundation of China, the Ministry of Education Humanities and Social Science Research Project, the Postgraduate Research and Practice Innovation Program of Jiangsu Province, and the Enterprise Collaboration Project for supporting this paper.

Conflicts of Interest

The authors declare that they have no conflict of interest to report regarding the present study.

References

Rejith, R.; Kesavan, D.; Chakravarthy, P.; Murty, S.N. Bearings for aerospace applications. Tribol. Int. 2023, 181, 108312.
Veselovsky, A.; Troyanovskaya, I.; Syromyatnikov, Y.; Voinash, S.; Malikov, V.; Zagidullin, R.; Sabitov, L. Features of Wear of Gears of Agricultural Machinery. Acta Technol. Agric. 2023, 26, 207–214.
Wei, X.; Li, X. Early failure analysis of automobile generator bearing. Eng. Fail. Anal. 2024, 159, 108124.
Zhang, Z.; Ouyang, W.; Liang, X.; Yan, X.; Yuan, C.; Zhou, X.; Guo, Z.; Dong, C.; Liu, Z.; Jin, Y.; et al. Review of the evolution and prevention of friction, wear, and noise for water-lubricated bearings used in ships. Friction 2024, 12, 1–38.
Hu, K.; Zhang, D.; Xia, M.; Qian, M.; Chen, B. LCDNet: Light-weighted cloud detection network for high-resolution remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 4809–4823.
Hu, K.; Shen, C.; Wang, T.; Xu, K.; Xia, Q.; Xia, M.; Cai, C. Overview of temporal action detection based on deep learning. Artif. Intell. Rev. 2024, 57, 26.
Li, B.; Gao, Q. Defect Detection for Metal Shaft Surfaces Based on an Improved YOLOv5 Algorithm and Transfer Learning. Sensors 2023, 23, 3761.
Zhao, Y.; Chen, B.; Liu, B.; Yu, C.; Wang, L.; Wang, S. GRP-YOLOv5: An improved bearing defect detection algorithm based on YOLOv5. Sensors 2023, 23, 7437.
Guo, Z.; Wang, C.; Yang, G.; Huang, Z.; Li, G. MSFT-YOLO: Improved YOLOv5 based on transformer for detecting defects of steel surface. Sensors 2022, 22, 3467.
Zhao, C.; Shu, X.; Yan, X.; Zuo, X.; Zhu, F. RDD-YOLO: A modified YOLO for detection of steel surface defects. Measurement 2023, 214, 112776.
Hu, K.; Zhang, E.; Xia, M.; Wang, H.; Ye, X.; Lin, H. Cross-dimensional feature attention aggregation network for cloud and snow recognition of high satellite images. Neural Comput. Appl. 2024, 36, 7779–7798.
Qian, X.; Wang, X.; Yang, S.; Lei, J. LFF-YOLO: A YOLO algorithm with lightweight feature fusion network for multi-scale defect detection. IEEE Access 2022, 10, 130339–130349.
Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
Yuan, H.; Chen, H.; Liu, S.; Lin, J.; Luo, X. A deep convolutional neural network for detection of rail surface defect. In Proceedings of the 2019 IEEE Vehicle Power and Propulsion Conference (VPPC), Hanoi, Vietnam, 14–17 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–4.
Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
Wang, C.; Sun, M.; Cao, Y.; He, K.; Zhang, B.; Cao, Z.; Wang, M. Lightweight network-based surface defect detection method for steel plates. Sustainability 2023, 15, 3733.
Xue, G.; Li, S.; Hou, P.; Gao, S.; Tan, R. Research on lightweight YOLO coal gangue detection algorithm based on ResNet18 backbone feature network. Internet Things 2023, 22, 100762.
Lau, K.W.; Po, L.M.; Rehman, Y.A.U. Large separable kernel attention: Rethinking the large kernel attention design in CNN. Expert Syst. Appl. 2024, 236, 121352.
He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; PMLR: London, UK; pp. 11863–11874.
Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740.
Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS–improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5561–5569.
Neubeck, A.; Van Gool, L. Efficient non-maximum suppression. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; IEEE: Piscataway, NJ, USA, 2006; Volume 3, pp. 850–855.
Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12993–13000.
Yadong, L.; Xing, M.; Jiandong, L.; Chunyang, M. Improved Small Target Detection Method of Bearing Defects in YOLOX Network. Comput. Eng. Appl. 2023, 59.
Wang, C.; He, W.; Nie, Y.; Guo, J.; Liu, C.; Wang, Y.; Han, K. Gold-YOLO: Efficient object detector via gather-and-distribute mechanism. Adv. Neural Inf. Process. Syst. 2024, 36.
Sunkara, R.; Luo, T. No more strided convolutions or pooling: A new CNN building block for low-resolution images and small objects. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Bilbao, Spain, 13–17 September 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 443–459.
Zhao, X.; Zhang, W.; Zhang, H.; Zheng, C.; Ma, J.; Zhang, Z. ITD-YOLOv8: An Infrared Target Detection Model Based on YOLOv8 for Unmanned Aerial Vehicles. Drones 2024, 8, 161.
Du, D.; Zhu, P.; Wen, L.; Bian, X.; Lin, H.; Hu, Q.; Peng, T.; Zheng, J.; Wang, X.; Zhang, Y.; et al. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.

Figure 1. YOLOv8 original network.

Figure 2. SimAM schematic diagram.

Figure 3. SPPF structure diagram.

Figure 4. LSKA_SPPF structure diagram.

Figure 5. Large separable kernel attention structure diagram.

Figure 6. Angle factor in border regression.

Figure 7. LSS-YOLOv8n network: The LSKA attention mechanism is introduced into the SPPF module, the SimAM attention mechanism is introduced into the LSKA_SPPF, and training is performed using Soft-SIoU-NMS.

Figure 8. Various types of defects in the dataset: (a) groove defects, (b) scratch defects, (c) graze defects, and (d) all defects.

Figure 9. The number of samples for each type of defect.

Figure 10. mAP@0.5 and mAP@0.5:0.95 comparison diagram: (a) mAP@0.5 and (b) mAP@0.5:0.95.

Figure 11. Loss comparison diagram: the orange line is the loss function of the baseline model YOLOv8n, and the blue line is the loss function of the improved model LSS-YOLOv8n.

Figure 12. Comparison of attention mechanisms: SimAM demonstrates a clearer and more even focus on defects.

Table 1. Parameter settings.

Parameter Names	Parameter Values
batch_size	32
epochs	300
momentum	0.937
weight_decay	0.0005
learning_rate	0.01
seed	213, 4821, 64455

Table 2. Experimental environment configuration.

Configuration	Configuration Parameter
Operating system	Ubuntu 20.04
CPU	15 vCPU AMD EPYC 7543 32-core Processor
GPU	RTX 3090 (24 GB)
Memory	80 GB
Deep learning framework	PyTorch 1.11.0, CUDA 11.3

Table 3. Results of comparative experiments on attention mechanisms.

Base Model	Attention Mechanism	P (%)	R (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)	FPS (f/s)	GFLOPs/G	Model Size/MB	Params
YOLOv8n	-	88.97	84.60	87.57	58.47	151.8	8.7	6.2	3,157,200
YOLOv8n	CBAM	89.10	87.60	89.60	60.10	145.6	8.7	6.4	3,223,090
YOLOv8n	EAM	90.60	84.60	89.10	60.10	134.8	8.8	6.3	3,167,568
YOLOv8n	SE	89.70	86.00	89.30	59.80	145.3	8.7	6.3	3,165,392
YOLOv8n	Triplet	90.30	86.00	89.60	60.30	135.9	8.7	6.2	3,157,400
YOLOv8n	SimAM	90.40	86.50	89.80	60.40	148.9	8.7	6.2	3,157,200

Table 4. Experiments comparing different convolutional kernel sizes for the SPPF_LSKA module.

Base Model	Kernel Size	P (%)	R (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)
YOLOv8n	-	88.97	84.60	87.57	58.47
YOLOv8n	7	90.70	87.30	89.40	60.10
YOLOv8n	11	91.80	86.30	89.40	60.00
YOLOv8n	23	91.50	87.00	90.00	60.60
YOLOv8n	35	90.20	84.10	88.80	60.30
YOLOv8n	41	89.80	85.70	88.50	60.20
YOLOv8n	53	89.40	85.30	88.50	59.80

Table 5. Ablation experiment results.

YOLOv8n	SPPF_LSKA	SimAM	Soft-SIoU-NMS	P (%)	R (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)
✓	-	-	-	88.97	84.60	87.57	58.47
✓	✓	-	-	91.50	87.00	90.00	60.60
✓	-	✓	-	90.40	86.50	89.80	60.40
✓	-	-	✓	90.47	85.63	89.47	63.73
✓	✓	✓	✓	90.13	85.80	90.30	64.20

Table 6. Comparison between LSS-YOLOv8n and various algorithms.

Model	Model Size/MB	P (%)	R (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)	mAP@Small (%)	FPS (f/s)	GFLOPS
Gold-YOLO-n	12.9	91.3	86.0	88.9	58.9	43.8	126.3	12.1
YOLOv3-tiny	23.2	92.0	87.8	90.7	61.8	44.4	217.2	18.9
YOLOv3	207.8	93.9	88.5	91.1	72.9	60.7	58.1	282.2
YOLOv5n	5.0	89.6	82.9	87.2	56.2	37.8	131.8	7.1
YOLOv5n-SPD-n	8.7	91.7	87.0	89.7	59.3	39.9	135.1	10.6
YOLOv6n	8.3	89.6	84.9	87.8	58.0	40.6	141.2	11.8
YOLOv8s	22.5	91.6	86.6	89.1	65.3	56.5	143.3	28.4
ITD-YOLOv8	10.4	89.8	85.1	88.0	59.8	42.2	104.8	6.0
YOLOv8n	5.9	88.97	84.60	87.57	58.47	41.7	151.8	8.1
LSS-YOLOv8n	6.5	90.13	85.80	90.30	64.20	45.9	146.7	8.3

Table 7. Comparison of accuracy of each category.

Method	AP50 (Groove)	AP50 (Abrasion)	AP50 (Scratch)	AP50-95 (Groove)	AP50-95 (Abrasion)	AP50-95 (Scratch)
YOLOv8n	91.6%	91.0%	80.1%	67.1%	67.5%	40.8%
LSS-YOLOv8n	93.8%	93.3%	83.8%	71.1%	73.3%	48.2%

Table 8. Detection results of different networks on the VisDrone2019 dataset.

Model	Model Size/MB	P (%)	R (%)	mAP@0.5 (%)	mAP@0.5:0.95 (%)	mAP@Small (%)	GFLOPS
Gold-YOLO-n	12.9	45.1	32.4	33.0	19.2	11.9	12.1
YOLOv3	207.8	50.8	38.7	37.5	21.9	12.5	282.3
YOLOv3-tiny	23.2	38.6	24.2	23.5	13.0	8.5	18.9
YOLOv5n	5.0	41.2	31.9	31.5	18.2	11.0	7.1
YOLOv5n-SPD-n	8.7	42.6	33.0	32.6	18.8	11.7	10.6
YOLOv6n	8.3	39.4	31.1	29.9	17.4	10.4	11.8
YOLOv8s	22.5	50.5	37.8	38.4	22.7	14.3	28.4
ITD-YOLOv8	10.4	43.4	32.0	32.4	18.2	12.0	6.0
YOLOv8n	5.9	42.5	31.5	31.3	18.0	11.2	8.1
LSS-YOLOv8n	6.5	56.9	22.4	38.6	24.9	13.0	8.3

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
