1. Introduction
Amidst the swift evolution of industrial manufacturing and the continuous progress in science and technology, modern machinery and equipment are progressively becoming more sophisticated and intelligent. Bearings, as indispensable and vulnerable key components of mechanical equipment, have been broadly used in major industrial sectors, such as the aerospace and aviation industry [1], the agricultural machinery sector [2], the automotive industry [3], and the maritime sector [4], due to their characteristics of high precision, low friction resistance, and size standardization. However, with the expansion of their application range and the improvement in their use requirements, the demand for bearing defect detection is also increasing. The realm of surface defect detection in industrial parts has witnessed an increasing application of deep learning techniques, thanks to the ongoing advancements in artificial intelligence technology.
Two main categories of object detection algorithms are commonly utilized: one-stage and two-stage algorithms. A two-stage target detection network is usually accompanied by a long computing time because it needs to go through several steps, such as candidate frame generation, reclassification, and target location. The YOLO algorithm, functioning as a one-stage target detection algorithm, is capable of executing the detection process with just a single forward propagation, leading to a substantial decrease in computational complexity and enhancing detection speed. This advantage makes YOLO particularly effective for detecting defects in industrial bearings. In addition, industrial defect data often involve images of multiple angles and scales to comprehensively reflect the defect forms of different directions and sizes. The YOLO algorithm addresses diverse defect data by introducing multiscale feature maps. Multiscale feature representation aids in accurately capturing both the details and contextual information of small targets, thereby enhancing the accuracy in detecting them.
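As a rough illustration of why multiscale feature maps help with small defects, the sketch below computes the grid resolution of each detection scale and how many grid cells a small defect spans at each one. The 640-pixel input size and the strides of 8, 16, and 32 are assumptions based on common YOLO-style defaults, and the function names are hypothetical; none of these values are taken from this paper.

```python
def grid_sizes(image_size=640, strides=(8, 16, 32)):
    """Return the feature-map side length for each stride (assumed values)."""
    return {stride: image_size // stride for stride in strides}


def cells_spanned(defect_px, strides=(8, 16, 32)):
    """Return how many grid cells a defect of the given pixel width covers."""
    return {stride: defect_px / stride for stride in strides}


if __name__ == "__main__":
    # Finer grids (smaller stride) give a small defect more cells to register on.
    print(grid_sizes())
    print(cells_spanned(16))
```

Under these assumed settings, a 16-pixel scratch covers two cells on the stride-8 map but only half a cell on the stride-32 map, which is one way to see why the high-resolution scale matters for small targets.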
In bearing surface defect detection, the problem of detecting small targets is a significant challenge. In datasets of bearing defects, these flaws typically appear as grooves, scratches, or grazes, which are often small in size and exhibit low contrast compared to the background or other non-defective areas. Consequently, these small targets are prone to being overlooked during the detection process.
Numerous researchers have proposed techniques for improving the precision of deep learning models in identifying small targets. These methods help models capture the fine-grained characteristics of small targets more effectively, which improves detection accuracy. For example, Hu et al. [5,6] rapidly established spatial position information in a feature map by incorporating lightweight self-attention modules, thus locating unpredictable targets. Li et al. [7] proposed a defect detection method that combined an attention mechanism with a BiFPN in YOLOv5, achieving an accuracy of 93.6% on the metal axis dataset. Zhao et al. [8] proposed the GRP-YOLOv5 algorithm for bearing defect detection, which combined ResC2Net with a residual structure and added PConv convolution in the fusion part to improve the model’s ability to capture defects; the accuracy reached 93.5% on the defective bearing dataset of chemical equipment. Guo et al. [9] proposed the MSFT-YOLO model, which incorporated the TRANS module, inspired by the transformer architecture, into both the backbone and the detection head, allowing features to be integrated with global information. In industrial scenes characterized by strong background interference, easily confused defect categories, large variations in defect scale, and poor detection of small defects, its average detection accuracy was 75.2%, which is 18% higher than that of YOLOv5. Zhao et al. [10] introduced the RDD-YOLO model, which used Res2Net blocks to extract features at varying scales. Additionally, a dual-feature pyramid network was incorporated into the neck of the architecture to strengthen the generation of comprehensive representations. As a result, the accuracy was 4.3% to 5.8% higher than that of YOLOv5. Hu et al. [11] introduced a feature attention aggregation network spanning multiple dimensions, which includes a context attention aggregation module to enhance detection accuracy.
Although the above models have achieved high accuracy on their respective datasets, they still suffer from issues such as a large number of parameters and poor real-time detection, which affect their efficiency in end-to-end bearing quality inspection at industrial sites. Qian et al. [12] introduced the LFF-YOLO model. They used ShuffleNetv2 [13] as the feature extraction network and then introduced an LFPN to enhance detection speed. Through this streamlining, the model’s parameters were decreased by 74.6%. Yuan et al. [14] introduced a framework named the MOLO network, derived from YOLOv3, that used MobileNetV2 [15] as the backbone network for capturing image features to perform multiscale defect detection. The mean average precision (mAP) reached 87.40%, a 4.03% improvement over YOLOv3. Wang et al. [16] introduced a new YOLO-ACG model that prioritizes a balance between accuracy and speed. This model enhances the integration of semantic information by combining the feature pyramid network with spatial attention. Notably, the model’s size is approximately a quarter of that of the YOLOv4 model. Building upon an enhanced YOLO algorithm and a ResNet18 backbone feature extraction network, Xue et al. [17] introduced a coal gangue detection method. The study reduced the multi-scale features, compressing the model volume to 28.5% of its original size.
However, in the practical inspection of industrial bearings, accurate detection remains crucial because bearing defects can lead to serious equipment failures and safety hazards. Therefore, to identify small defects on bearing surfaces, this paper proposes an effective bearing defect detection algorithm built on an improved YOLOv8n model. The key contributions of this study are as follows:
To capture the variable geometry and low-contrast defect shape features of the bearing surface, a large separable kernel attention (LSKA) module [18] was introduced into the SPPF module [19]. The LSKA module uses a large convolution kernel to enlarge the receptive field and capture a wider range of bearing defect shape information, adaptively learns attention weights to select important input features, and enhances the model’s expressiveness and capacity for generalization.
To address the large number of model parameters and the high resource demands of current bearing defect detection methods, SimAM [20] was incorporated into the model. This integration boosts the model’s capability to extract features related to small bearing defects and enhances feature fusion without expanding the original network parameters.
At the same time, SIoU [21] is used as the regression loss, and Soft-NMS [22] is used to process redundant boxes. SIoU reduces the degrees of freedom of the regression and accelerates the network’s convergence. Soft-NMS replaces the non-maximum suppression (NMS) [23] algorithm and optimizes the confidence of the anchor boxes to enhance the detection performance of the model.
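To make the difference between NMS and Soft-NMS concrete, the following is a minimal sketch of the Gaussian variant of Soft-NMS: rather than discarding every box that overlaps the current best box, it decays that box's confidence according to the overlap. The box format (x1, y1, x2, y2), the value of sigma, the score threshold, and the function names are illustrative assumptions, and this is not the implementation used in this paper.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS; sigma and score_thresh are assumed values."""
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Select the remaining box with the highest confidence.
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best)
        kept.append((best_box, best_score))
        # Decay the other scores by overlap instead of deleting the boxes.
        rescored = []
        for box, score in candidates:
            decayed = score * math.exp(-(iou(best_box, box) ** 2) / sigma)
            if decayed >= score_thresh:
                rescored.append((box, decayed))
        candidates = rescored
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    for box, score in soft_nms(boxes, [0.9, 0.8, 0.7]):
        print(box, round(score, 3))
```

In this toy example, the second box overlaps the first heavily, so its score is reduced rather than removed, while the distant third box keeps its original score. Hard NMS with a typical IoU threshold would have dropped the second box entirely, which is the behaviour that can suppress genuinely adjacent defects.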
4. Conclusions
This paper introduced a lightweight network called LSS-YOLOv8n, tailored to bearing defect detection in industrial applications. First, we introduced a large separable kernel attention (LSKA) module with a large convolution kernel into the SPPF module to expand the receptive field. This approach enabled the capture of a wider spectrum of contextual information related to bearing defects. It adaptively learned attention weights to select essential input features, thereby improving the model’s representation and generalization capabilities. Second, the integration of SimAM enhanced the feature extraction capacity for small bearing defects without increasing the original network parameters while also improving the model’s feature fusion capability. Third, SIoU was employed as the regression loss and Soft-NMS was used to process redundant boxes, further strengthening the model’s recognition ability for overlapping regions and thereby improving detection accuracy. Ablation experiments and a series of comparative tests demonstrated the effectiveness of these three improvements in enhancing the performance of the YOLOv8n model. Experiments on the VisDrone2019 dataset also demonstrated the strong generalization capability of LSS-YOLOv8n in detecting small-sized objects.
The experimental results show that the LSS-YOLOv8n proposed in this paper surpasses the baseline model in detecting bearing appearance defects while maintaining a similar model weight. Compared to YOLOv8n, the LSS-YOLOv8n increases in size by 0.6 MB and in floating-point computation by 0.2 GFLOPs. Furthermore, it greatly improves the accuracy of detecting bearing defects that are small, vary in geometry, and have low contrast. Regarding detection speed, the LSS-YOLOv8n model differs from the YOLOv8n model by only 5.1 frames per second (f/s), which fulfils the requirements of bearing appearance defect detection in industrial settings. To further meet the demands of real-time bearing defect detection, future research will focus on deploying low-precision operations on embedded platforms and exploring the application of edge computing.