YOLO-SRSA: An Improved YOLOv7 Network for the Abnormal Detection of Power Equipment
Abstract
1. Introduction
1.1. Related Work
1.2. Research Work in This Paper
- (1) To enable the network to better extract features of different anomaly targets, the Attention and Convolution Mixed module (ACmix) is integrated into the Spatial Pyramid Pooling Cross-Stage Partial Channel (SPPCSPC) structure. Meanwhile, convolutional layer pruning is performed to reduce the number of parameters and computational complexity. In addition, the original parallel pooling layers are reconfigured into a cascaded structure, where feature maps from the cascaded pooling layers are fused again to expand the receptive field. The reconstructed SPPCSPC structure enhances the network’s ability to extract features from various anomaly targets.
- (2) To improve the network’s flexible recognition of multi-scale feature images, a BiFormer module is added to the efficient aggregation network. The BiFormer module strengthens attention to key features by dynamically adjusting feature weights, highlighting important features while reducing interference from irrelevant data. Unlike traditional global self-attention mechanisms, BiFormer’s sparse attention mechanism selectively focuses on the most relevant parts, resulting in more efficient feature processing and improved feature representation capabilities while reducing computational overhead.
- (3) To more comprehensively evaluate the relationship between predicted and ground-truth bounding boxes, the original loss function is replaced with the MPDIoU (Minimum Point Distance Intersection over Union) function. In addition to the overlap area of the two boxes, MPDIoU penalizes the distances between their corresponding top-left and bottom-right corner points, normalized by the input image size, so that center offset and width-height deviation are both reflected in the IoU calculation. In this way, MPDIoU achieves higher robustness in complex scenarios.
2. The Proposed Approach
2.1. Reconstructed SPPCSPC Module
2.1.1. ACmix Module
2.1.2. AC-SPPCSPC Module Structure
- (1) Integration of Convolutional Mixed Attention Module: A convolutional mixed attention mechanism (ACmix) is introduced prior to the pooling layers to enhance discriminative feature extraction for heterogeneous anomaly patterns. This enables the network to better distinguish between anomaly types while preserving critical spatial information. Additionally, redundant convolutional layers are pruned to reduce computational complexity and parameter volume, thereby mitigating the excessive filtering of spatial features from defective targets.
- (2) Sequential Pooling Configuration: The original parallel pooling layers are replaced with a sequential arrangement. Unlike parallel pooling, which processes multi-scale features independently, the sequential design cascades pooling operations of varying kernel sizes. This enables the progressive aggregation of multi-scale contextual information, which is particularly advantageous for detecting objects in complex, multi-scale scenarios. By concatenating the outputs of sequentially applied pooling operations, the network synthesizes discriminative features across scales, thereby strengthening its representational capacity and improving detection accuracy.
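The effect of cascading pooling layers can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the section does not state kernel sizes, so the sketch assumes a repeated 5×5 stride-1 max pool with "same" padding and compares it with the 9×9 and 13×13 pools commonly used in the parallel SPPCSPC of YOLOv7. The helper names `max_pool2d` and `cascaded_pyramid` and the toy feature map are hypothetical.

```python
NEG_INF = float("-inf")


def max_pool2d(x, k):
    """Stride-1 max pooling with 'same' padding on a 2-D list of numbers."""
    r = k // 2
    h, w = len(x), len(x[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            best = NEG_INF
            # Scan the k x k window, skipping positions outside the map.
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        best = max(best, x[ii][jj])
            row.append(best)
        out.append(row)
    return out


def cascaded_pyramid(x, k=5):
    """Apply the same k x k pool three times and keep every intermediate map."""
    p1 = max_pool2d(x, k)
    p2 = max_pool2d(p1, k)
    p3 = max_pool2d(p2, k)
    # In a network these four maps would be concatenated along channels.
    return [x, p1, p2, p3]


# Deterministic toy feature map (assumed values, for illustration only).
feature = [[(7 * i + 13 * j) % 17 for j in range(12)] for i in range(12)]
_, p1, p2, p3 = cascaded_pyramid(feature)

# Compare the cascade against single large-kernel pools.
print(p2 == max_pool2d(feature, 9), p3 == max_pool2d(feature, 13))
```

Under these assumptions, two cascaded 5×5 pools cover the same window as one 9×9 pool and three cover the same window as one 13×13 pool, which is why a cascade can widen the receptive field step by step while reusing a small kernel.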
2.2. Introduction of BiFormer in ESAN Module
2.2.1. BiFormer Module
- Input Partitioning and Transformation: The input feature map is first partitioned into non-overlapping regions. These regions undergo linear transformations to generate query (Q), key (K), and value (V) tensors.
- Region Affinity Computation: For each region, the similarity matrix A is computed by evaluating the affinity between its query (Q) and all keys (K) across regions.
- Sparse Routing: To enforce sparsity, the algorithm retains only the top-S most relevant regions for each row in A, generating a sparse routing index matrix $I^{r}$.
- Feature Aggregation: Using the routing indices $I^{r}$, the corresponding S regions in K and V are aggregated via a gather operation, producing condensed tensors $K^{g}$ and $V^{g}$.
- Output Synthesis: The Bi-level Routing Attention (BRA) mechanism updates the input features by combining the attention output with a Local Context Enhancement (LCE) term applied to V. The final output is formulated as $O = \mathrm{Attention}(Q, K^{g}, V^{g}) + \mathrm{LCE}(V)$.
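The routing steps listed above can be sketched in plain Python on a toy input. This is a simplified illustration under stated assumptions, not the BiFormer reference code: the linear projections are replaced by the identity (so Q, K, and V all equal the input tokens), region descriptors are taken as per-region token means, attention is single-head, and the LCE term is omitted. The function name `bi_level_routing_attention` and the toy regions are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean_vec(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bi_level_routing_attention(regions, top_s):
    """regions: list of regions, each a list of token vectors.

    Assumption: Q = K = V = tokens (identity projections).
    """
    q = k = v = regions
    # Region-level descriptors: mean query and mean key of each region.
    q_r = [mean_vec(r) for r in q]
    k_r = [mean_vec(r) for r in k]
    # Region affinity matrix A (regions x regions).
    affinity = [[dot(qi, kj) for kj in k_r] for qi in q_r]
    # Routing indices: the top-S most related regions for each row of A.
    routing = [
        sorted(range(len(row)), key=lambda j: -row[j])[:top_s]
        for row in affinity
    ]
    dim = len(regions[0][0])
    output = []
    for i, region in enumerate(q):
        # Gather keys and values from the routed regions only.
        k_g = [tok for j in routing[i] for tok in k[j]]
        v_g = [tok for j in routing[i] for tok in v[j]]
        out_region = []
        for query in region:
            scores = [dot(query, key) / math.sqrt(dim) for key in k_g]
            weights = softmax(scores)
            out_region.append(
                [sum(w * val[d] for w, val in zip(weights, v_g))
                 for d in range(dim)]
            )
        output.append(out_region)
    return output, routing


# Four toy regions with two 2-D tokens each (assumed values).
regions = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
    [[1.0, 0.2], [0.8, 0.0]],
    [[-1.0, 0.0], [-0.9, -0.1]],
]
out, routing = bi_level_routing_attention(regions, top_s=2)
print(routing)
```

Each token attends to the tokens of only S routed regions instead of all regions, which is where the reduction in computation relative to global self-attention comes from.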
2.2.2. ESAN Module
2.3. Improvement of Loss Function
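As a minimal sketch of an MPDIoU-style loss for one pair of axis-aligned boxes, the code below follows the formulation commonly given for MPDIoU: the IoU is reduced by the squared distance between the two top-left corners and the squared distance between the two bottom-right corners, each normalized by the squared diagonal of the input image, and the loss is one minus that value. The function name `mpdiou_loss`, the `(x1, y1, x2, y2)` box layout, and the toy boxes are assumptions for illustration and are not taken from the authors' code.

```python
def mpdiou_loss(pred, target, img_w, img_h):
    """MPDIoU-style loss for boxes given as (x1, y1, x2, y2), x1 < x2, y1 < y2."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Plain IoU of the two boxes.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between matching corners.
    d1_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2  # top-left corners
    d2_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2  # bottom-right corners

    # Normalize by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1_sq / norm - d2_sq / norm
    return 1.0 - mpdiou


# Toy example: a prediction shifted by one pixel in x and y on a 4 x 4 image.
print(mpdiou_loss((0, 0, 2, 2), (1, 1, 3, 3), img_w=4, img_h=4))
print(mpdiou_loss((0, 0, 2, 2), (0, 0, 2, 2), img_w=4, img_h=4))
```

Because the corner terms stay non-zero whenever the boxes differ, the loss still provides a gradient signal when two candidate boxes share the same IoU with the ground truth but differ in position or shape.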
3. Experimental Work
3.1. Experimental Environment Setup
3.2. Data Augmentation
3.3. Data Annotation
4. Experimental Results and Comparative Analysis
4.1. Ablation Studies
4.1.1. Impacts of Different Attention Mechanisms
4.1.2. Impact of Modified Feature Extraction Module
4.1.3. Impact of Different Loss Functions
4.1.4. Ablation Study of Different Improvement Modules
4.2. Comparative Experiment
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Attention Mechanism | mAP@0.5/% | AP/% (bj_mh) | AP/% (bj_ps) | AP/% (bj) | AP/% (jyz_sh) | AP/% (jyz_sl) |
---|---|---|---|---|---|---|
SimAM | 91.3 | 85.8 | 91.4 | 99.6 | 88.2 | 91.5 |
SE | 89.9 | 84.5 | 91.2 | 99.2 | 86.6 | 87.8 |
CBAM | 90.5 | 84.6 | 91.7 | 99.5 | 87.8 | 88.9 |
BiFormer | 92.6 | 86.4 | 92.8 | 99.5 | 90.2 | 94.0 |
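
The mAP@0.5 column in the attention-mechanism table above can be cross-checked against the per-class AP values, assuming mAP@0.5 is the unweighted mean of the five class APs (the usual definition, though the table itself does not state it). The dictionary below copies the table values; the names `ap_by_mechanism` and `mean_ap` are hypothetical.

```python
# (reported mAP@0.5, [AP for bj_mh, bj_ps, bj, jyz_sh, jyz_sl]) in percent
ap_by_mechanism = {
    "SimAM": (91.3, [85.8, 91.4, 99.6, 88.2, 91.5]),
    "SE": (89.9, [84.5, 91.2, 99.2, 86.6, 87.8]),
    "CBAM": (90.5, [84.6, 91.7, 99.5, 87.8, 88.9]),
    "BiFormer": (92.6, [86.4, 92.8, 99.5, 90.2, 94.0]),
}


def mean_ap(aps):
    """Unweighted mean of per-class AP values."""
    return sum(aps) / len(aps)


for name, (reported, aps) in ap_by_mechanism.items():
    # Print the recomputed mean next to the reported value.
    print(f"{name}: recomputed {mean_ap(aps):.2f}, reported {reported}")
```

Under that assumption, each recomputed mean agrees with the reported mAP@0.5 to within rounding at one decimal place.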
Group | AC-SPPCSPC | MPDIoU | BiFormer | P/% | R/% | mAP@0.5/% | Parameters/M |
---|---|---|---|---|---|---|---|
A | | | | 95.9 | 95 | 89.2 | 70.8 |
B | √ | | | 96.6 | 96.3 | 91.9 | 69.3 |
C | | √ | | 96.8 | 96 | 90.2 | 70.8 |
D | | | √ | 95.6 | 97 | 92.4 | 68.7 |
E | √ | √ | | 96.2 | 97 | 92.6 | 69.3 |
F | √ | | √ | 97 | 97 | 92.8 | 67.3 |
G | | √ | √ | 96.5 | 97 | 92.7 | 68.7 |
H | √ | √ | √ | 97.1 | 97 | 93.5 | 67.3 |
Algorithm | mAP@0.5/% | AP/% (bj_mh) | AP/% (bj_ps) | AP/% (bj) | AP/% (jyz_sh) | AP/% (jyz_sl) |
---|---|---|---|---|---|---|
SSD | 79.1 | 76.9 | 80.8 | 97 | 71.2 | 69.6 |
TPH-YOLOv5 | 83.4 | 78.9 | 85.8 | 98.4 | 77.4 | 76.2 |
YOLOv7 | 89.2 | 83.9 | 90.7 | 99.5 | 85.4 | 86.5 |
YOLOv8 | 90.3 | 84.8 | 90.1 | 99 | 87.4 | 90.2 |
DETR | 90.2 | 85.4 | 90.6 | 99.5 | 88.1 | 87.3 |
Ours | 93.5 | 87.8 | 92.9 | 99.5 | 92.3 | 95.2 |
Detection Network | Normal Image | Rainy Image | Foggy Image | Dark Image | Bright Image |
---|---|---|---|---|---|
SSD | 79.1 | 73.4 | 72.6 | 75.3 | 75.7 |
TPH-YOLOv5 | 83.4 | 78.7 | 79.3 | 80.6 | 80.9 |
YOLOv7 | 89.2 | 85.4 | 84.7 | 87.2 | 87.1 |
YOLOv8 | 90.3 | 86.7 | 85.2 | 87.8 | 88.2 |
DETR | 90.2 | 85.9 | 84.6 | 88.1 | 88.4 |
Ours | 93.5 | 90.4 | 89.9 | 91.7 | 92.2 |
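
To make the robustness comparison in the table above easier to read, the following sketch computes, for each detector, the condition with the lowest score and the drop relative to the normal image. It assumes the table entries are mAP@0.5 percentages, which is consistent with the Normal Image column matching the mAP@0.5 values in the algorithm comparison table. The names `map_by_condition` and `worst_case_drop` are hypothetical; the numbers are copied from the table.

```python
map_by_condition = {
    "SSD": {"normal": 79.1, "rainy": 73.4, "foggy": 72.6, "dark": 75.3, "bright": 75.7},
    "TPH-YOLOv5": {"normal": 83.4, "rainy": 78.7, "foggy": 79.3, "dark": 80.6, "bright": 80.9},
    "YOLOv7": {"normal": 89.2, "rainy": 85.4, "foggy": 84.7, "dark": 87.2, "bright": 87.1},
    "YOLOv8": {"normal": 90.3, "rainy": 86.7, "foggy": 85.2, "dark": 87.8, "bright": 88.2},
    "DETR": {"normal": 90.2, "rainy": 85.9, "foggy": 84.6, "dark": 88.1, "bright": 88.4},
    "Ours": {"normal": 93.5, "rainy": 90.4, "foggy": 89.9, "dark": 91.7, "bright": 92.2},
}


def worst_case_drop(scores):
    """Return (worst condition, drop from the normal score in percentage points)."""
    degraded = {c: s for c, s in scores.items() if c != "normal"}
    worst = min(degraded, key=degraded.get)
    return worst, round(scores["normal"] - degraded[worst], 1)


for name, scores in map_by_condition.items():
    condition, drop = worst_case_drop(scores)
    print(f"{name}: worst under {condition} images, drop of {drop} points")
```

On these numbers the proposed network has both the highest score in every condition and the smallest worst-case drop among the listed detectors.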
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zou, W.; Jiang, Y.; Liao, W.; Fan, S.; Yang, Y.; Hou, J.; Tang, H. YOLO-SRSA: An Improved YOLOv7 Network for the Abnormal Detection of Power Equipment. Information 2025, 16, 407. https://doi.org/10.3390/info16050407