1. Introduction
Transmission lines are a critical component of the power grid, and their secure operation is vital for ensuring a reliable power supply [1,2,3]. However, external damage hazards caused by unauthorized construction activities, such as those involving excavators and cranes, frequently occur within transmission line corridors. These hazards can severely damage transmission facilities, leading to grid trips, power outages, and even serious safety incidents such as electric shocks [4,5]. Therefore, promptly identifying these external damage hazards is essential for maintaining the safe operation of transmission lines, and research aimed at preventing such external damage is of significant importance.
To ensure the stable operation of transmission lines, it is necessary to conduct safety inspections of the corridors [6]. Current inspection methods mainly include manual inspections, drone inspections, and video-based online monitoring systems [7]. Manual inspections are costly, inefficient, and susceptible to weather and terrain conditions. Drone inspections offer higher efficiency, but drones have limited endurance, are complex to operate, and pose significant risks in adverse weather. In contrast, installing fixed cameras to monitor transmission line corridors and using computer vision to process the real-time data and accurately identify potential hazards has become a key technical approach to preventing external damage to transmission lines. With the continuous development of computer vision, object detection techniques have been widely applied to hazard detection in transmission lines. For instance, Reference [8] proposed a re-parameterized YOLOv5-based edge intelligence detection method for transmission line defect hazards, utilizing R-D modules and re-parameterized spatial pyramid pooling (SPP) to improve the accuracy of recognizing insulator self-explosion, missing pin, and bird nest hazards. Reference [9] introduced a YOLOv4-based bird-related fault detection method for transmission lines, employing multi-stage transfer learning for model training and integrating mosaic data augmentation, cosine annealing decay, and label smoothing to enhance training, effectively detecting bird targets and identifying bird species around transmission lines. Reference [10] proposed a low false negative defect detection method using a combined target detection framework, adaptively fusing the feature extraction results of YOLOv3 and Faster R-CNN networks to effectively reduce the false negative rate in inspection image defect detection. Reference [11] presented an instance segmentation neural network algorithm using partial bounding box annotation, transferring detection branch features to the mask branch to achieve recognition and segmentation of common external damage categories. Reference [12] proposed an insulator fault detection method for transmission lines based on USRNet and an improved YOLOv5x algorithm: USRNet first performs super-resolution reconstruction of inspection images to reduce complex background interference, and the YOLOv5x algorithm is then improved to raise the detection accuracy for small targets. However, performing super-resolution reconstruction before detection, while increasing accuracy, has significant shortcomings in processing speed and lacks generalization capability in complex environments. Reference [13] introduced a YOLOv4-based external damage hazard detection algorithm for transmission lines, improving the K-means algorithm for clustering the target sizes in the image sample set to select anchor boxes that match the target detection characteristics; the CSPDarknet-53 residual network then extracts deep feature data from images, and the feature maps are processed with the SPP algorithm. Although this improved detection accuracy to some extent, the complex network structure resulted in slower detection speeds, failing to meet real-time hazard detection requirements. Reference [14] proposed a YOLOv5-based fault detection algorithm for transmission lines incorporating attention mechanisms and cross-scale feature fusion: attention mechanisms are first introduced to suppress complex background interference, and the BiFPN feature fusion structure is then used to improve the detection accuracy for multi-scale fault targets. However, this approach did not effectively address the loss of small target information during downsampling, potentially leading to inaccurate localization and safety hazards.
In summary, current research on detecting external damage hazards has made some progress, but actual transmission line scenarios are complex. High-rise buildings, forests, and slopes often present in transmission line corridors can interfere with detection. When the features of external damage hazard targets resemble the background, or when occlusion occurs, misdetections and missed detections are still prevalent. Additionally, since cameras are usually installed on transmission towers, distant construction machinery appears very small in the field of view, posing a greater challenge for small target detection. Considering these issues, and the fact that object detection algorithms will be deployed outdoors on edge devices close to the cameras, which have limited computing resources [15,16], this paper selects the YOLOv8s algorithm to meet the real-time and accuracy requirements of hazard detection while keeping the parameter count low for ease of deployment. On this basis, we optimized the model and propose a detection method for external damage hazards in transmission line corridors based on YOLO-LSDW. The main contributions of this paper are as follows:
1. Combining LSKA [17] with the SPPF module of the baseline model, we propose the SPPF-LSKA module, which reduces complex background interference on construction machinery features by capturing long-range dependencies with adaptivity, thereby improving the model’s accuracy in complex environments.
2. The slim-neck [18] structure is introduced into the neck network of the original model for feature fusion, enhancing small target detection by integrating contextual information while reducing the model’s parameter count and computational load.
3. The head network of the baseline model is replaced with the dynamic head (DyHead) [19] module for prediction output, further enhancing the model’s ability to recognize construction machinery across different scales, complex backgrounds, and small targets.
4. The WIoU [20] loss function is introduced to enhance the convergence of the network’s bounding box regression loss, optimizing the training process to improve the model’s bounding box prediction and regression performance.
The structure of this paper is as follows:
Section 2 introduces the YOLO-LSDW model and its enhancements, including the SPPF-LSKA module, slim-neck feature fusion, DyHead module, and WIoU loss function.
Section 3 describes the experimental setup, datasets, and evaluation metrics, and validates the model’s effectiveness through ablation and comparison experiments.
Section 4 analyzes the experimental results, discussing the model’s performance and limitations.
Section 5 concludes with a summary of the research contributions and future directions.
2. YOLO-LSDW Network Model
2.1. Method Overview
The process of the proposed detection method for external damage hazards in transmission line corridors is illustrated in Figure 1. The method begins by collecting raw image data from monitoring equipment, which is then preprocessed and annotated to construct a dataset containing six categories of construction machinery. The dataset is subsequently divided into training, validation, and testing sets, which are used for the development of the YOLO-LSDW model. This model incorporates several improvements based on the YOLOv8s framework, enhancing detection performance through training and optimization. As depicted in Figure 1, after comprehensive testing and performance evaluation, the model is capable of real-time processing of input transmission line images, outputting the location, category, and confidence level of external damage hazards, thereby providing robust support for the safety monitoring of transmission lines.
2.2. YOLOv8 Network Model
YOLOv8 [21] is a CNN-based object detection network implemented on the PyTorch framework. It is available in versions of different network widths and depths, ranked from smallest to largest as YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. The YOLOv8 model comprises four main modules: the input end, the backbone network, the neck, and the output end.
At the input end, images fed into the network are preprocessed with techniques such as mosaic data augmentation [22], adaptive image scaling, and grayscale padding. In the backbone network, features are extracted with Conv, C2f, and SPPF structures, employing convolution and pooling to capture deep image features. The neck adopts the feature pyramid network (FPN) [23] and path aggregation network (PAN) [24] structures, merging features from different scales through upsampling, downsampling, and feature concatenation. The output end employs a decoupled head structure, which separates the classification and regression processes and includes positive-negative sample matching and loss calculation. YOLOv8 adopts the TaskAlignedAssigner [25] method, which weights the classification and regression scores and matches positive samples based on the weighted result. The loss is calculated in two parts: the classification branch uses binary cross entropy loss (BCE Loss), while the regression branch employs the integral representation of distribution focal loss (DFL) [26] combined with the complete intersection over union (CIoU) loss to enhance the accuracy of bounding box predictions.
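To make the TaskAlignedAssigner weighting concrete, the minimal PyTorch sketch below computes a task alignment metric of the form t = s^α · u^β (classification score weighted with IoU) and selects the top-k candidates per ground truth as positive samples. The function name, tensor layout, and hyperparameter values are illustrative assumptions rather than the exact YOLOv8 implementation.

```python
import torch

def task_aligned_assign(cls_scores, ious, topk=10, alpha=0.5, beta=6.0):
    """Sketch of TaskAlignedAssigner-style positive sample selection.

    cls_scores: (num_anchors, num_gt) predicted scores for each GT's class
    ious:       (num_anchors, num_gt) IoU between predicted and GT boxes
    Returns a boolean mask (num_anchors, num_gt) of positive matches.
    """
    # Alignment metric: joint weighting of classification and localization
    t = cls_scores.pow(alpha) * ious.pow(beta)

    # For each ground truth, keep the top-k anchors by alignment metric
    topk_idx = t.topk(topk, dim=0).indices            # (topk, num_gt)
    mask = torch.zeros_like(t, dtype=torch.bool)
    mask.scatter_(0, topk_idx, True)
    return mask
```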
To meet the needs of edge devices for detecting external damage hazards in transmission line corridors, which require high detection speed and deployment efficiency, this paper selects the YOLOv8s network as the foundation due to its low parameter count and computational load combined with high detection accuracy. Various improvements were made, resulting in a detection method named YOLO-LSDW for external damage hazards in transmission line corridors. The structure of the YOLO-LSDW model is shown in Figure 2.
2.3. SPPF-LSKA Module
Transmission line corridors often feature complex environments such as high-rise buildings, forests, and slopes, which pose significant challenges for detection due to interference, occlusion, and multi-scale external damage hazard targets. To enhance the model’s ability to extract key features of construction machinery, the LSKA attention mechanism is introduced into the SPPF module of the YOLOv8 backbone network. This integration helps the network ignore irrelevant background information and focus on more effective hazard target features.
LSKA is an innovative attention module that captures long-range dependencies and adaptivity by decomposing large kernel convolution operations. It splits a k × k convolution kernel into separable 1 × k and k × 1 kernels, processing input features in a cascading manner to reduce computational complexity and memory usage.
The improved SPPF-LSKA structure is illustrated in Figure 3. In the original SPPF, the input passes sequentially through three 5 × 5 max-pooling layers, the outputs of each stage are concatenated, and a standard convolution fuses them to obtain multi-scale features. In the enhanced SPPF-LSKA structure, the concatenated result of the three pooling layers is fed into an 11 × 11 LSKA convolution module before fusion. This strategy enriches the receptive-field information in the multi-scale features, thereby improving the model’s robustness in complex environments.
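The following PyTorch sketch illustrates the SPPF-LSKA data flow described above. The LSKA block here is a simplified separable large-kernel attention (cascaded 1 × k and k × 1 depth-wise convolutions followed by a 1 × 1 convolution whose output gates the input); the channel widths and the omission of the dilated branch are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class LSKA(nn.Module):
    """Simplified LSKA: a k x k kernel decomposed into cascaded
    1 x k and k x 1 depth-wise convolutions plus a 1 x 1 convolution."""
    def __init__(self, channels, k=11):
        super().__init__()
        p = k // 2
        self.conv_h = nn.Conv2d(channels, channels, (1, k), padding=(0, p), groups=channels)
        self.conv_v = nn.Conv2d(channels, channels, (k, 1), padding=(p, 0), groups=channels)
        self.conv1x1 = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.conv1x1(self.conv_v(self.conv_h(x)))
        return x * attn  # attention map gates the input features

class SPPF_LSKA(nn.Module):
    """SPPF with an 11 x 11 LSKA applied to the concatenated pooling outputs."""
    def __init__(self, c_in, c_out, pool_k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_hidden, 1)
        self.pool = nn.MaxPool2d(pool_k, stride=1, padding=pool_k // 2)
        self.lska = LSKA(c_hidden * 4, k=11)
        self.cv2 = nn.Conv2d(c_hidden * 4, c_out, 1)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)          # three sequential 5 x 5 max-pooling stages
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        cat = torch.cat([x, y1, y2, y3], dim=1)
        return self.cv2(self.lska(cat))  # LSKA before the fusion convolution
```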
2.4. Slim-Neck Feature Fusion Structure
When capturing image data of transmission lines, cameras are typically installed on transmission towers to obtain a broader field of view. However, construction machinery further from the camera appears smaller, making the detection of these small targets more challenging. To address the difficulty of detecting small targets among external damage hazards while achieving real-time detection, the design of the feature fusion network must balance accuracy and speed. Standard convolution (SC) obtains convolution results directly, since the convolution kernel channels match the image channels, preserving the hidden connections between channels. In contrast, depth-wise separable convolution (DSC) [27] processes the input channels in layers, performing a separate convolution for each channel and then recombining the results. DSC reduces the computational load of multi-channel processing, enhancing prediction speed, but it loses inter-channel correlation information, which directly reduces accuracy.
Figure 4a,b illustrate the calculation processes of SC and DSC, respectively. To enhance prediction speed while approaching the performance of SC, this paper adopts GSConv to replace a portion of the Conv convolutions. GSConv combines DSC with SC.
As shown in Figure 5, the input features first undergo SC, are then concatenated with the features processed by DWC, and a shuffle operation gradually mixes the two. However, using GSConv at all stages of the model would deepen the network, increasing resistance to data flow and significantly extending inference time. By the time the feature maps reach the neck, they have become elongated (maximum channel dimension, minimal width-height dimensions); therefore, this paper introduces GSConv modules only in the neck layer, replacing standard convolution to reduce model parameters and computation while preserving sampling effectiveness. The lightweight GSConv is combined with the one-shot aggregation method to design the cross-stage partial network module [28], VoV-GSCSP, which replaces the original C2f module, as shown in Figure 6. The slim-neck structure formed by the GSConv and VoV-GSCSP modules enhances the fusion of the feature layers extracted by the backbone network, allowing each feature layer to balance deep semantic information and shallow detail features. This improves feature recognition for small targets and, while maintaining high accuracy, reduces computational complexity and inference time; a minimal sketch of the GSConv operation follows below. The specific slim-neck structure is shown in Figure 2. Finally, the processed features enter the DyHead module for detection, further enhancing model performance.
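The sketch below illustrates GSConv as described above: a standard convolution produces half the output channels, a depth-wise convolution processes that result, the two halves are concatenated, and a channel shuffle interleaves them. The kernel sizes and activation choice are assumptions based on the slim-neck paper's description, not a verified reproduction.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Sketch of GSConv: SC output and its DWC counterpart are concatenated,
    then a channel shuffle mixes information across the two halves."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        c_half = c_out // 2
        self.sc = nn.Sequential(  # standard convolution branch
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.dwc = nn.Sequential(  # depth-wise convolution on the SC output
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        y1 = self.sc(x)
        y2 = self.dwc(y1)
        y = torch.cat([y1, y2], dim=1)          # (B, c_out, H, W)
        # Channel shuffle: interleave the SC and DWC channels
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```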
2.5. DyHead Module
To further improve the model’s performance in recognizing construction machinery across different scales, complex backgrounds, and small targets, this paper introduces the DyHead module in the YOLOv8 head network. The original detection head does not consider contextual information, making predictions at each location independently and lacking a global perspective. Moreover, it predicts from only one scale of the feature map, ignoring the contributions of other scale features, thus inadequately handling multi-scale targets. The DyHead module employs a dynamic routing mechanism to better fuse contextual information and recognize multi-scale targets. It can dynamically adjust the weights of different feature layers, aiding in the extraction of multi-scale features.
The DyHead module leverages a self-attention mechanism to unify scale-aware attention, spatial-aware attention, and task-aware attention, enhancing the representation capability and accuracy of target detection. Given a three-dimensional feature tensor $F \in \mathbb{R}^{L \times S \times C}$ at the detection layer, the attention is calculated as follows:

$$W(F) = \pi_C\big(\pi_S\big(\pi_L(F) \cdot F\big) \cdot F\big) \cdot F \tag{1}$$

where $F$ is the input three-dimensional tensor of size $L \times S \times C$, with $L$ denoting the number of feature map levels, $S$ the spatial dimension (the product of height and width), and $C$ the number of channels. $\pi_L$, $\pi_S$, and $\pi_C$ are the scale-aware, spatial-aware, and task-aware attention modules, respectively, applied to the $L$, $S$, and $C$ dimensions of the tensor $F$. Figure 7 depicts the internal structure of the three attention modules and the single DyHead block they form in series. These attention modules are embedded within the target detection head and can be stacked multiple times. Based on experimental comparisons, and weighing computational cost against model performance, this paper employs only one DyHead module.
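To make the composition in Formula (1) concrete, the sketch below chains simplified stand-ins for the three attentions: a level-wise (scale) gate, a spatial gate produced by an ordinary convolution in place of the deformable convolution used in the actual DyHead, and a channel-wise (task) gate. It illustrates only the $\pi_C(\pi_S(\pi_L(F) \cdot F) \cdot F) \cdot F$ data flow under these simplifying assumptions, not the published implementation.

```python
import torch
import torch.nn as nn

class DyHeadSketch(nn.Module):
    """Schematic of one DyHead block over a tensor F of shape (B, L, S, C):
    scale-aware, spatial-aware, and task-aware attentions applied in series."""
    def __init__(self, channels):
        super().__init__()
        self.scale_fc = nn.Linear(channels, 1)         # pi_L: one weight per level
        self.spatial_conv = nn.Conv1d(channels, 1, 1)  # pi_S: stand-in for deformable conv
        self.task_fc = nn.Linear(channels, channels)   # pi_C: channel-wise gate

    def forward(self, f):                              # f: (B, L, S, C)
        # Scale-aware attention over the L dimension
        w_l = torch.sigmoid(self.scale_fc(f.mean(dim=2)))        # (B, L, 1)
        f = f * w_l.unsqueeze(2)
        # Spatial-aware attention over the S dimension
        b, l, s, c = f.shape
        w_s = torch.sigmoid(self.spatial_conv(
            f.reshape(b * l, s, c).transpose(1, 2)))             # (B*L, 1, S)
        f = f * w_s.transpose(1, 2).reshape(b, l, s, 1)
        # Task-aware attention over the C dimension
        w_c = torch.sigmoid(self.task_fc(f.mean(dim=(1, 2))))    # (B, C)
        return f * w_c[:, None, None, :]
```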
2.6. Loss Function
YOLOv8 adopts CIoU as the bounding box loss function. Although CIoU considers the overlap area, center point distance, and aspect ratio, its relative description of the aspect ratio carries a degree of ambiguity, and it does not address sample balance. When the training data contain many low-quality samples, geometric factors such as distance and aspect ratio aggravate the penalty on those samples, reducing the algorithm’s generalization ability and destabilizing training. This paper therefore employs the WIoU loss function in place of CIoU to better address sample imbalance and improve the model’s convergence speed. The parameters of the loss function are visualized in Figure 8. WIoU is defined by Formulas (2)–(4):

$$L_{WIoUv1} = R_{WIoU} \cdot L_{IoU} \tag{2}$$

$$R_{WIoU} = \exp\!\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{\left(W_g^2 + H_g^2\right)^{*}}\right) \tag{3}$$

$$L_{IoU} = 1 - IoU \tag{4}$$

where $L_{WIoUv1}$ represents the loss for high-quality anchor boxes, $(x, y)$ and $(x_{gt}, y_{gt})$ are the center point coordinates of the anchor box and the target box, respectively, and $W_g$ and $H_g$ are the width and height of the minimum enclosing box formed by the target box and the predicted box. The superscript $*$ denotes detaching $W_g$ and $H_g$ of the minimum enclosing box from the gradient calculation, reducing adverse effects on model training. The intersection over union (IoU) measures the degree of overlap between the predicted bounding box and the ground truth bounding box.

The non-monotonic focusing coefficient $r$ is defined as:

$$L_{WIoU} = r \cdot L_{WIoUv1}, \quad r = \frac{\beta}{\delta \alpha^{\beta - \delta}} \tag{5}$$

$$\beta = \frac{L_{IoU}^{*}}{\overline{L}_{IoU}} \in [0, +\infty) \tag{6}$$

where $\beta$ is defined as the outlier degree, introduced to measure the quality of an anchor box, and is negatively correlated with anchor box quality. $L_{IoU}^{*}$ is the monotonic focusing coefficient, with the superscript $*$ defined as above (detached from the gradient calculation). During training, $\beta$ is continually recalculated and updated for each detected target: when $\beta$ is smaller, the anchor box quality is higher; when $\beta$ is larger, the anchor box quality is lower. To make the bounding box regression focus on ordinary-quality anchor boxes, a smaller gradient gain is assigned to both large and small values of $\beta$, constructing the non-monotonic focusing coefficient $r$ through a dynamic focusing mechanism to control the importance weights of different anchor boxes. During model training, the gradient gain decreases as $L_{IoU}$ decreases, causing training to slow in the later stages. Therefore, the running mean of $L_{IoU}$, denoted $\overline{L}_{IoU}$, is introduced as a normalization factor to alleviate slow late-stage convergence. $\alpha$ and $\delta$ are two hyperparameters, set to 1.9 and 3, respectively.
WIoU balances the penalty strength between low-quality and high-quality anchor boxes, employing a dynamic non-monotonic focus mechanism to mask the impact of low-quality examples, allowing the model to focus more on ordinary-quality anchor boxes and improving the overall performance of the model.
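A minimal PyTorch sketch of the WIoU v3 computation defined in Formulas (2)–(6) follows. Boxes are assumed to be in (x1, y1, x2, y2) format, and the running mean of $L_{IoU}$ is kept here as a simple exponential moving average; both are implementation assumptions for illustration only.

```python
import torch

def wiou_v3(pred, target, iou_mean, alpha=1.9, delta=3.0, momentum=0.01):
    """Sketch of WIoU v3 for boxes in (x1, y1, x2, y2) format.

    pred, target: (N, 4) predicted and ground-truth boxes.
    iou_mean: scalar running mean of L_IoU (kept outside the gradient).
    Returns (loss per box, updated iou_mean).
    """
    # IoU and L_IoU = 1 - IoU (Formula (4))
    inter_wh = (torch.min(pred[:, 2:], target[:, 2:]) -
                torch.max(pred[:, :2], target[:, :2])).clamp(min=0)
    inter = inter_wh.prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).prod(dim=1)
    area_t = (target[:, 2:] - target[:, :2]).prod(dim=1)
    iou = inter / (area_p + area_t - inter + 1e-7)
    l_iou = 1.0 - iou

    # R_WIoU: center distance over the detached enclosing box (Formula (3))
    cp = (pred[:, :2] + pred[:, 2:]) / 2          # anchor box center (x, y)
    ct = (target[:, :2] + target[:, 2:]) / 2      # target box center (x_gt, y_gt)
    enc_wh = (torch.max(pred[:, 2:], target[:, 2:]) -
              torch.min(pred[:, :2], target[:, :2]))
    wg2_hg2 = enc_wh.pow(2).sum(dim=1).detach()   # (W_g^2 + H_g^2)*
    r_wiou = torch.exp((cp - ct).pow(2).sum(dim=1) / (wg2_hg2 + 1e-7))

    # Outlier degree and non-monotonic focusing coefficient (Formulas (5)-(6))
    beta = l_iou.detach() / iou_mean              # beta = L_IoU* / mean(L_IoU)
    r = beta / (delta * alpha ** (beta - delta))

    # Update the running mean of L_IoU outside the gradient
    iou_mean = (1 - momentum) * iou_mean + momentum * l_iou.detach().mean()
    return r * r_wiou * l_iou, iou_mean
```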
4. Discussion
The proposed YOLO-LSDW model has demonstrated superior performance in the task of detecting external damage hazards in transmission line corridors. Through an in-depth analysis of the experimental results, the following points can be discussed.
4.1. Model Performance Analysis
The YOLO-LSDW model outperforms the comparison models in both mAP@0.5 and mAP@0.5:0.95 metrics, especially excelling in complex backgrounds and small object detection. By introducing the SPPF-LSKA module, this model effectively mitigates the interference of complex backgrounds on the features of construction machinery, significantly improving detection accuracy in challenging environments. Additionally, the lightweight slim-neck feature fusion structure not only enhances the detection capability for small objects but also reduces the model’s parameter count and computational load. The integration of the DyHead detection head, which incorporates scale, spatial, and task-specific attention mechanisms, further boosts the model’s performance in the head network. Moreover, the use of the WIoU loss function optimizes the model’s generalization ability, enhancing stability and convergence speed during training. However, the model still exhibits certain false positives and false negatives in specific scenarios, which may be related to the diversity and representativeness of the training data.
4.2. Computational Efficiency and Real-Time Performance
The YOLO-LSDW model achieves a significant improvement in detection accuracy while reducing computational complexity and parameter count compared to the original YOLOv8s model. This “lightweight” design makes the model more suitable for deployment on edge computing devices, enhancing its feasibility in practical applications. However, during actual deployment, hardware conditions and environmental factors must be considered to ensure the model’s stability and real-time performance across different scenarios.
4.3. Limitations and Future Directions
Despite the achievements of this study, there are still some limitations. Firstly, although the dataset includes a variety of construction machinery, it may not fully represent all possible external damage hazards. Secondly, the model’s performance under extreme weather conditions has not been thoroughly validated. Future research could focus on the following improvements:
1. Expanding the dataset to include more diverse scenarios and target types.
2. Exploring more effective feature fusion methods to further enhance the model’s adaptability to complex scenes.
3. Investigating optimized deployment strategies for the model across different hardware platforms to meet practical application requirements.
5. Conclusions
This paper proposes a detection method for external damage hazards in transmission line corridors based on the YOLO-LSDW model and develops a corresponding dataset. By incorporating the SPPF-LSKA module, slim-neck feature fusion structure, DyHead detection head, and WIoU loss function, the model significantly improves its performance in detecting complex backgrounds and small objects. Experimental results demonstrate that the YOLO-LSDW model outperforms existing mainstream object detection models in both mAP@0.5 and mAP@0.5:0.95 metrics, while maintaining low computational complexity and fast detection speed, showcasing strong potential for practical applications.
However, as noted in the discussion section, there are still some limitations to this study, such as the insufficient diversity of the dataset and the need for further validation of the model’s adaptability under extreme conditions. Future research should focus on expanding and diversifying the dataset, further optimizing feature fusion methods, and enhancing the model’s deployment and performance in real-world application environments.
In summary, the YOLO-LSDW model offers an effective solution for the real-time and efficient detection of external damage hazards in transmission line corridors, contributing significantly to the safety and reliability of power systems. With further optimization and improvements, this method is expected to play a crucial role in the broader field of transmission line safety monitoring, providing essential technical support for the development of smart grids.