Article

SRW-YOLO: A Detection Model for Environmental Risk Factors During the Grid Construction Phase

1 School of Science, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
2 School of Geomatics and Urban Spatial Informatics, Beijing University of Civil Engineering and Architecture, Beijing 102616, China
3 State Grid Economic and Technological Research Institute Co., Ltd., Beijing 102200, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(15), 2576; https://doi.org/10.3390/rs17152576
Submission received: 29 April 2025 / Revised: 19 July 2025 / Accepted: 21 July 2025 / Published: 24 July 2025

Abstract

With the rapid advancement of UAV-based remote sensing and image recognition techniques, identifying environmental risk factors from aerial imagery has emerged as a focal point of intelligent inspection during the construction phase of power transmission and distribution projects. The uneven spatial distribution of risk factors on construction sites, their weak texture signatures, and the inherently multi-scale nature of UAV imagery pose significant detection challenges. To address these issues, we propose a one-stage SRW-YOLO algorithm built upon the YOLOv11 framework. First, a P2-scale shallow feature detection layer is added to capture high-resolution fine details of small targets. Second, we integrate a reparameterized convolution based on channel shuffle with one-shot aggregation (RCS-OSA) module into the backbone and the shallow layers of the neck, enhancing feature extraction while significantly reducing inference latency. Finally, the WIoU v3 loss function, with its dynamic non-monotonic focusing mechanism, is employed to down-weight low-quality annotations, thereby improving small-object localization accuracy. Experimental results demonstrate that SRW-YOLO achieves an overall precision of 80.6% and an mAP of 79.1% on the State Grid dataset, and it exhibits similarly superior performance on the VisDrone2019 dataset. Compared with other one-stage detectors, SRW-YOLO delivers markedly higher detection accuracy, offering critical technical support for multi-scale, heterogeneous environmental risk monitoring during the construction phase of power transmission and distribution projects, and establishes the theoretical foundation for rapid and accurate inspection using UAV-based intelligent imaging.

1. Introduction

As a vital component of grid expansion, the construction phase of power transmission and distribution projects carries the critical roles of conveying electrical energy and safeguarding network stability, yet its implementation inevitably imposes various environmental pressures on adjacent areas. At every stage, from construction preparation and active construction through routine ecological inspection, a range of environmental risk factors emerges, including residential demolition, tower foundation disturbance, ecosystem restoration, soil erosion, vegetation cover change, impacts of permanent and temporary engineering measures, noise and dust pollution, and waste disposal and resource utilization [1]. Owing to variations in unmanned aerial vehicle (UAV) flight altitude and imaging angle, these risk factors often appear in aerial data as multi-scale, heterogeneous targets with small pixel footprints and faint textures, which can lead conventional detectors to register false positives or to miss true environmental risks. Consequently, developing robust object detection algorithms for environmental risk factors during the construction phase of power transmission and distribution projects has profound implications for protecting the environment throughout the construction lifecycle.
Object detection constitutes a fundamental branch of computer vision, tasked with precisely localizing objects of interest within still images or video frames and assigning them categorical labels, encompassing both bounding box regression and classification. Research in this domain bifurcates into two major traditions: classical methods and deep learning-based approaches [2]. Classical algorithms depend on handcrafted feature extraction combined with conventional machine learning classifiers; their typical workflow involves image preprocessing, feature extraction, region proposal generation, and classifier design. Notable examples include the Viola–Jones detector, which utilizes Haar features with AdaBoost; sliding-window methods employing histogram of oriented gradients (HOG) descriptors coupled with support vector machines (SVM); and clustering or segmentation techniques driven by color, texture, or edge cues to categorize image regions [3,4,5]. However, handcrafted features often lack the capacity to capture the full diversity and complexity of real-world objects, and they exhibit limited robustness under variations in illumination, scale, or viewpoint. Consequently, traditional detectors are prone to both false positives and missed detections when confronted with multi-scale, small, or heterogeneous targets.
In contrast, deep learning-based object detection frameworks perform end-to-end regression from raw imagery to both object localization and classification. These approaches fall into two principal categories: two-stage detectors and one-stage detectors. Two-stage methods first generate a set of region proposals and then refine these proposals via classification and bounding box regression; examples include R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN, and Double-Head R-CNN [6,7,8,9,10]. While these architectures achieve high accuracy and robustness, their multi-step pipelines incur greater computational overhead, limiting their applicability to real-time monitoring during the construction phase of power transmission and distribution projects. Conversely, one-stage detectors integrate proposal and classification into a single network, yielding substantially faster inference; prominent instances are the single-shot multibox detector (SSD) and the You Only Look Once (YOLO) family [11,12,13,14]. The YOLO series, in particular, has seen extensive adaptation for multi-scale, heterogeneous UAV remote sensing applications [15,16]. For example, Wang et al. [17] introduced MC-YOLO, built on YOLOv8s, by adding extra detection layers and a multi-scale feature fusion module to better capture transmission line defects. Liu et al. [18] developed a lightweight anomaly detector, YOLO-PowerLite, based on YOLOv8n, where the novel C2f_AK module reduces parameter redundancy and enhances feature adaptability across scales. Shi et al. [19] proposed LSKF-YOLO for high-resolution satellite imagery, combining a large spatial kernel selective attention fusion block with a multi-scale feature alignment structure; on a bespoke power tower dataset, it achieved an F1 score of 0.764 and mean average precision (mAP0.5) of 77.47%. Li et al. [20] addressed variable-shaped debris on transmission lines by integrating deformable convolutions and SimAM attention into a YOLOv7-Tiny backbone, boosting foreign object recall and precision. Bi et al. [21] proposed the YOLOX++ detector by augmenting the original YOLOX architecture with a multi-scale cross-stage local network (MS-CSPNet) and an alpha-IoU loss function. MS-CSPNet fuses features across scales and expands the network’s receptive field, while the alpha-IoU loss refines small-object localization. On a high-voltage tower dataset, YOLOX++ achieved detection accuracies of 86.8% for bird’s nest anomalies and 96.6% for insulator targets, demonstrating markedly enhanced small-target robustness. Bi et al. [22] further extended YOLOX by integrating the FReLU activation function and an attention-driven feature enhancement network (AM-FPNet), which strengthens correlations within shallow feature maps to improve detection precision of small transmission line anomalies. Finally, Rong et al. [23] introduced PL-YOLOv8, combining YOLOv8 with oriented filtering to extract directional and textural cues of elongated power lines and their surroundings in complex backgrounds, thereby facilitating both accurate line detection and quantitative assessment of vegetation encroachment.
Although deep learning-based YOLO algorithms have achieved remarkable progress in UAV image analysis, the real-time responsiveness and accuracy demanded by construction-phase inspection of power transmission and distribution projects remain unmet. Introducing a shallow feature detection layer (SFDL) offers a promising avenue for multi-scale object localization in aerial data. For instance, Lin et al. [24] developed HS-YOLO, an algorithm that combines a high-resolution backbone with sub-pixel convolutions to detect diminutive violations in power security scenarios. Zhang et al. [25] augmented the YOLOv5 backbone with a micro-scale detection head to preserve the subtle defect features on wind turbine blades. Cheng et al. [26] similarly incorporated a novel high-resolution detection branch into YOLOv5, enhancing sensitivity to small transmission line components. Within the YOLOv8 family, Wu et al. [27] embedded an additional small-object head to markedly improve recognition of wire clips and anti-vibration devices. Meanwhile, to make models more lightweight, reduce parameter redundancy, and further boost detection speed and scale sensitivity on overhead lines, Pang et al. [28] introduced the RCS-OSA module [29], and Wu et al. [27] integrated it into YOLOv8’s backbone and shallow neck, achieving rapid, high-fidelity inference with lower computation. Beyond architectural tweaks, many studies have focused on loss function innovations: Xiang et al. [30] adopted the Wise-IoU v3 loss [31] to mitigate the impact of low-quality annotations in custom line inspection datasets and accelerate convergence. Similarly, to optimize convergence speed and detection performance on an external damage hazard dataset of transmission line corridors, Zou et al. [32] employed Wise-IoU as the loss function for YOLOv8. Hu et al. [33] leveraged Wise-IoU to strike a balance between model compactness and average precision in a UAV insulator-defect context.
Therefore, in designing object detection architectures, it is essential to avoid excessive module stacking, which leads to unwieldy complexity, and instead to emphasize the selective extraction of multi-scale features across network components. Guided by this principle, we propose the SRW-YOLO model based on the YOLOv11 network. By adding a new shallow feature detection layer and integrating the RCS-OSA reparameterized convolution module, the model better captures multi-scale, heterogeneous environmental risk factor targets in UAV images acquired during the construction phase of transmission and distribution projects, enhancing small-object detection capability while improving inference speed. In addition, the Wise-IoU v3 loss function is introduced as an alternative to the base IoU loss, reducing the penalty imposed by low-quality samples and dynamically adjusting the loss weights of targets at different scales.
The principal innovations and contributions of this work are fourfold:
(1)
We add an extra P2-scale detection layer in the shallow layer of the YOLOv11 network to enhance the detection of small objects.
(2)
We replace YOLOv11’s original C3K2 blocks in the backbone and the shallow layers of the neck with RCS-OSA modules, achieving richer multi-scale feature fusion without sacrificing computational or inference efficiency.
(3)
We adopt the Wise-IoU v3 loss as the bounding box regression loss of the YOLOv11 model, which shields the model from the anomalous gradients produced by low-quality samples and thus improves overall generalization.
(4)
Using UAV aerial imagery provided by the State Grid Corporation of China, we construct a dataset of environmental risk factors during the construction phase of transmission and distribution projects and use it to rigorously validate SRW-YOLO’s detection performance.
Overall, our cross-scale dynamic perception design overcomes traditional detectors’ limitations in scale-sensitive feature mining and static loss weighting, providing theoretical support for decoupling and enhancing target features in small-object and weak-texture scenes. It not only advances multi-scale feature fusion and dynamic weight allocation but also enables accurate identification of multi-scale targets in complex environments, and it lays a theoretical basis for building a real-time environmental risk early-warning platform in later stages. In addition, this technology plays a key role in detecting environmental risk factors during the construction phase of transmission and distribution projects, providing solid technical support for project safety, the construction of smart cities, and improved public safety management. The structure of the article is shown in Figure 1.

2. Proposed Methods

2.1. YOLOv11

In multi-scale object detection research, the YOLO family is renowned for its ability to reconcile high accuracy with real-time inference. In September 2024, Ultralytics released YOLOv11, which this study adopts as its baseline model [34]; the structure of the model is shown in Figure 2. The model comprises the input, backbone, neck, and head output modules. It accepts an RGB image of size 640 × 640 × 3 as input, and the introduced C3K2 module processes features with smaller convolutional kernels, which is more efficient than the traditional C2f module and reduces the number of parameters. In the backbone network, the model retains the fast spatial pyramid pooling (SPPF) module, and the subsequently added C2PSA module strengthens feature capture for key regions through a parallel spatial attention mechanism. Meanwhile, the neck adopts a PANet-like multi-scale feature fusion strategy to further enhance cross-scale feature fusion, and the detection head is the key module responsible for prediction generation, outputting the bounding box, confidence, and category probability of each target after the final convolutional layer.
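For reference, the YOLOv11 baseline can be instantiated directly through the Ultralytics package; the snippet below is a minimal sketch, in which the model scale ("yolo11n.pt") and the sample image path are illustrative assumptions rather than the configuration used in this paper.

```python
# Minimal sketch: load the YOLOv11 baseline with the Ultralytics API.
# The nano scale and the image path are placeholders for illustration only.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")                  # pretrained YOLOv11 nano weights
results = model("construction_site.jpg")    # run inference on one UAV image
results[0].show()                           # visualize boxes, confidences, classes
```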

2.2. SRW-YOLOv11

This study tackles the challenges posed by UAV-captured imagery during the construction phase of power transmission and distribution projects, namely pronounced resolution disparities among multi-scale targets and the difficulty of extracting features from weakly textured material stacks, and improves the precision and efficiency of detecting environmental risk factors in UAV-acquired data by enhancing the YOLOv11 architecture into SRW-YOLOv11, as shown in Figure 3. We introduce three main contributions in the SRW-YOLOv11 network architecture. First, we add a new P2-scale SFDL in the backbone of YOLOv11 to strengthen the extraction of small targets and thus make better use of shallow detail information. Second, we introduce the RCS-OSA reparameterized convolution module into the backbone and the shallow layers of the neck to improve the feature extraction capability and inference speed of the network through channel rearrangement. Finally, we introduce the Wise-IoU loss function with a dynamic non-monotonic focusing mechanism to reduce the impact of low-quality, partially labelled samples on detection accuracy. Together, these innovations enable SRW-YOLOv11 to detect multi-scale, heterogeneous environmental risk factors in complex UAV imagery more effectively, delivering improved precision without sacrificing real-time performance.

2.2.1. Shallow Feature Detection Layer

YOLOv11’s original design employs three detection heads at the P3, P4, and P5 scales to handle objects of varying sizes, yet UAV imagery of environmental risk factors during the construction phase of power transmission and distribution projects often contains many small targets that these scales may overlook. Previously, Wu et al. [27] upsampled the 80 × 80 feature map in YOLOv8n’s neck to obtain a P2-scale, high-resolution feature map and appended a specialized small-object head, markedly improving multi-scale fusion and detection performance. Li et al. [35] similarly upsampled 80 × 80 features in YOLOv8’s neck, applied a convolution, and merged them with backbone features via element-wise addition (rather than concatenation) to form a P3 detection layer, thereby leveraging both shallow spatial detail and deep semantic context. Building on these insights, SRW-YOLOv11 augments the YOLOv11 backbone with a P2-scale shallow feature detection layer that produces a 160 × 160 feature map from the 640 × 640 input, ensuring that fine-grained target information is preserved. In the backbone, a two-step convolutional process applied to the 640 × 640 input image yields a 160 × 160 shallow feature map, preserving richer high-resolution spatial detail. The upsampled P2 feature in the neck is then fused with these twice-convolved backbone features for small-object localization, followed by RCS-OSA processing and a further shallow detection pass. The resulting SRW-YOLOv11 architecture therefore uses four output layers at the P2, P3, P4, and P5 scales; by adding a small-scale detection layer, this design retains high-resolution shallow features while still exploiting deep semantic information. The shallow feature detection layer extracts the underlying spatial features of the image through convolution, pooling, and related operations and fuses them with deep semantic features to enhance the representation of small targets, thereby constructing an information-rich feature map, as shown in Figure 4.
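To make the added detection scale concrete, the PyTorch sketch below mimics the P2 fusion path described above: a high-resolution 160 × 160 backbone feature map is concatenated with an upsampled neck feature and passed to an extra prediction head. The module names, channel widths, and the plain convolution used in place of RCS-OSA are illustrative assumptions, not the paper’s exact implementation.

```python
import torch
import torch.nn as nn

class P2FusionHead(nn.Module):
    """Sketch of the extra P2 branch: fuse shallow backbone features
    (160x160, stride 4) with upsampled neck features, then predict."""
    def __init__(self, c_backbone=64, c_neck=128, c_out=64, num_outputs=255):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")     # P3 (80x80) -> P2 (160x160)
        self.fuse = nn.Conv2d(c_backbone + c_neck, c_out, 3, padding=1)
        self.head = nn.Conv2d(c_out, num_outputs, 1)              # box + objectness + class logits

    def forward(self, p2_backbone, p3_neck):
        x = torch.cat([p2_backbone, self.up(p3_neck)], dim=1)     # channel-wise concatenation
        return self.head(self.fuse(x))

# Shapes for a 640x640 input: backbone P2 is 160x160, neck P3 is 80x80.
p2 = torch.randn(1, 64, 160, 160)
p3 = torch.randn(1, 128, 80, 80)
print(P2FusionHead()(p2, p3).shape)   # -> torch.Size([1, 255, 160, 160])
```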

2.2.2. RCS-OSA

In UAV-based aerial object detection, integrating a shallow feature detection layer markedly strengthens the network’s ability to capture fine details of diminutive targets, but this benefit comes with increased architectural complexity and parameter count. Within the YOLOv11 framework, the C3K2 block excels at multi-scale feature fusion, yet its multi-level aggregation strategy imposes substantial computational overhead and elevated memory usage, thereby constraining real-time deployment in heterogeneous construction site scenarios. The RCS-OSA module was first introduced in brain-tumor detection to fuse semantic information more effectively and curb feature-integration losses [29]. Yao et al. [36] demonstrated that this structured, reparameterized convolution design achieves a favorable balance between detection accuracy and model size in medical imaging tasks. Guo et al. [37] subsequently replaced YOLOv8n’s original C2f blocks with RCS-OSA, enabling richer feature extraction through one-shot aggregation of cascaded representations, simplifying the overall network and substantially enhancing flame and smoke detection accuracy.
Therefore, to preserve detection accuracy while significantly reducing inference latency, this study introduces the RCS-OSA reparameterized convolution module, a pivotal component first proposed in RCS-YOLO, into the YOLOv11 architecture, replacing the C3K2 modules in the backbone and the shallow layers of the neck. During training, RCS splits an input tensor of dimensions C × H × W into two equal channel partitions; one partition is processed through parallel RepVGG branches (1 × 1 and 3 × 3 convolutions plus an identity branch) and the other bypasses these branches, after which their outputs are concatenated and subjected to channel shuffle to yield enriched multi-scale feature representations. In the inference stage, the 1 × 1 convolution, 3 × 3 convolution, and identity branches are merged into a single 3 × 3 RepConv module by structural reparameterization, as shown in Figure 5 [38]. The multi-branch structure enables the model to learn richer feature information during training, while the simplified single-branch structure saves memory and reduces computational complexity at inference, thus achieving fast inference, as shown in Figure 6. This design empowers the model to learn robust features for weak-texture, irregular UAV targets during training, while the simplified structure at inference delivers rapid, real-time performance.
In the RCS-OSA module, one part of the input features is passed through directly, while the other part is repeatedly processed by stacked RCS modules to ensure feature reuse; the two parts are then concatenated to obtain the final output, as shown in Figure 7. The OSA submodule employs a one-shot aggregation strategy, whereby multiple receptive-field representations are generated in parallel but merged in a single fusion step at the end, markedly enhancing feature diversity while alleviating computational load.
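The sketch below illustrates, in simplified PyTorch, the training-time RCS block (channel split, a RepVGG-style multi-branch on one half, concatenation, channel shuffle) and a one-shot-aggregation wrapper. Channel counts, the number of stacked blocks, and the omission of BatchNorm and reparameterization fusion are simplifying assumptions, not the exact RCS-YOLO implementation.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    # Reorder channels so information mixes across the two partitions.
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class RCS(nn.Module):
    """Training-time RCS block: split channels, apply RepVGG-style branches
    (3x3 + 1x1 + identity) to one half, concatenate, then channel-shuffle."""
    def __init__(self, channels):
        super().__init__()
        c = channels // 2
        self.conv3 = nn.Conv2d(c, c, 3, padding=1)
        self.conv1 = nn.Conv2d(c, c, 1)
        self.act = nn.SiLU()

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)                            # split into two halves
        x2 = self.act(self.conv3(x2) + self.conv1(x2) + x2)   # multi-branch half
        return channel_shuffle(torch.cat([x1, x2], dim=1))    # x1 bypasses the branches

class RCSOSA(nn.Module):
    """One-shot aggregation: stack RCS blocks, keep intermediate outputs,
    and fuse them once at the end with a 1x1 convolution."""
    def __init__(self, channels, n_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(RCS(channels) for _ in range(n_blocks))
        self.fuse = nn.Conv2d(channels * (n_blocks + 1), channels, 1)

    def forward(self, x):
        feats = [x]
        for blk in self.blocks:
            feats.append(blk(feats[-1]))
        return self.fuse(torch.cat(feats, dim=1))             # single fusion step

x = torch.randn(1, 64, 160, 160)
print(RCSOSA(64)(x).shape)   # -> torch.Size([1, 64, 160, 160])
```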

2.2.3. Wise-IoU v3

The bounding box regression loss plays a critical role in object detection networks, and its precise definition can markedly enhance model performance. However, training datasets often include noisy or low-quality samples. If a loss function places excessive weight on these poorly labeled examples, particularly via stringent aspect ratio penalties, it can skew gradient updates and degrade the detector’s overall generalization. By mitigating the undue influence of such samples, a more balanced loss formulation preserves robustness across the full distribution of object scales and qualities.
We denote the anchor box as $B = [x, y, w, h]$ and the target box as $B^{gt} = [x^{gt}, y^{gt}, w^{gt}, h^{gt}]$. The intersection over union (IoU) metric quantifies the overlap between the anchor box and its corresponding target box in object detection, as illustrated in Figure 8, and the IoU loss is formally defined as follows:
$$\mathcal{L}_{IoU} = 1 - IoU = 1 - \frac{W_i H_i}{S_u}$$
where $W_i$ and $H_i$ are the width and height of the intersection, and the area of the union is
$$S_u = wh + w^{gt}h^{gt} - W_i H_i$$
WIoU v3 employs a dynamic non-monotonic focusing mechanism to adaptively modulate the gradient gain of each bounding-box prediction according to its “outlier degree”. By down-weighting excessively confident anchors and suppressing noisy gradients from low-quality annotations, this loss concentrates learning on ordinary quality boxes, thereby enhancing small-object localization. Consequently, we replace the standard IoU loss in YOLOv11 with the WIoU v3 loss, whose formulation is given below:
$$\mathcal{L}_{WIoUv3} = r \, R_{WIoU} \, \mathcal{L}_{IoU}$$
$$r = \frac{\beta}{\delta \alpha^{\beta - \delta}}$$
$$R_{WIoU} = \exp\!\left( \frac{(x - x^{gt})^2 + (y - y^{gt})^2}{W_g^2 + H_g^2} \right)$$
where $W_g$ and $H_g$ denote the width and height of the minimum enclosing box, and $\alpha$ and $\delta$ are hyperparameters that regulate the gradient gain $r$. The term $\beta$ denotes the outlier degree of an anchor box and is defined as follows:
$$\beta = \frac{\mathcal{L}_{IoU}^{*}}{\overline{\mathcal{L}_{IoU}}} \in [0, +\infty)$$
where $\mathcal{L}_{IoU}^{*}$ is the monotonic focusing coefficient (the IoU loss detached from the computation graph) and $\overline{\mathcal{L}_{IoU}}$ is its exponential moving average with momentum $m$.
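A minimal PyTorch sketch of the WIoU v3 computation following the equations above is given below. The hyperparameter values (α = 1.9, δ = 3) and the externally maintained running mean of the IoU loss are illustrative assumptions in the spirit of the Wise-IoU paper [31], not the exact training code of this work.

```python
import torch

def wiou_v3_loss(pred, target, iou_mean, alpha=1.9, delta=3.0):
    """Sketch of WIoU v3 for boxes in (cx, cy, w, h) format.
    `iou_mean` is the running mean of L_IoU, updated outside with momentum m."""
    px, py, pw, ph = pred.unbind(-1)
    tx, ty, tw, th = target.unbind(-1)

    # Intersection (W_i * H_i) and union (S_u) in the paper's notation.
    inter_w = (torch.min(px + pw / 2, tx + tw / 2) - torch.max(px - pw / 2, tx - tw / 2)).clamp(0)
    inter_h = (torch.min(py + ph / 2, ty + th / 2) - torch.max(py - ph / 2, ty - th / 2)).clamp(0)
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    l_iou = 1.0 - inter / union.clamp(min=1e-7)

    # Minimum enclosing box size (W_g, H_g), detached from the graph.
    wg = (torch.max(px + pw / 2, tx + tw / 2) - torch.min(px - pw / 2, tx - tw / 2)).detach()
    hg = (torch.max(py + ph / 2, ty + th / 2) - torch.min(py - ph / 2, ty - th / 2)).detach()
    r_wiou = torch.exp(((px - tx) ** 2 + (py - ty) ** 2) / (wg ** 2 + hg ** 2).clamp(min=1e-7))

    # Outlier degree beta and the non-monotonic gradient gain r.
    beta = l_iou.detach() / iou_mean
    r = beta / (delta * alpha ** (beta - delta))

    return r * r_wiou * l_iou

# Toy usage with a single anchor/target pair and an assumed running mean of 0.5.
pred = torch.tensor([[50.0, 50.0, 20.0, 20.0]])
target = torch.tensor([[52.0, 49.0, 22.0, 18.0]])
print(wiou_v3_loss(pred, target, iou_mean=torch.tensor(0.5)))
```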

3. Experiments

3.1. Datasets and Evaluation Metrics

3.1.1. The State Grid Dataset

In this study, we employ a UAV-acquired dataset of environmental risk factors compiled by the State Grid Corporation of China during the construction phase of transmission and distribution projects. After manual curation and annotation, the dataset comprises 2912 high-resolution images (5472 × 3648 pixels each). We used the LabelImg (version 1.8.6) image annotation software to classify and annotate the dataset and randomly divided it into training, validation, and test sets at a ratio of 8:1:1. The annotations cover three classes: transmission pylon, base of pylon, and stacking of materials. Table 1 presents the class labels alongside their respective counts in each split.
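The random 8:1:1 split can be reproduced with a short script like the one below; the directory layout, file extensions, and the assumption that LabelImg’s YOLO-format label files sit next to each image are illustrative, not the paper’s actual data organization.

```python
import random
from pathlib import Path

random.seed(0)
images = sorted(Path("state_grid/images").glob("*.jpg"))  # assumed folder layout
random.shuffle(images)

n = len(images)
n_train, n_val = int(0.8 * n), int(0.1 * n)
splits = {
    "train": images[:n_train],
    "val": images[n_train:n_train + n_val],
    "test": images[n_train + n_val:],
}
for name, files in splits.items():
    # One image list per split; the matching .txt label is assumed
    # to share the image's file stem.
    Path(f"state_grid/{name}.txt").write_text("\n".join(str(f) for f in files))
```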

3.1.2. The Publicly Available Dataset

In this study, we also employ the publicly available VisDrone2019 dataset, assembled by the AISKYEYE team at the Machine Learning and Data Mining Laboratory of Tianjin University. The dataset was constructed specifically for object detection research from the UAV perspective and is particularly suitable for addressing the challenges of small targets, target overlap, and complex backgrounds. It contains a total of 8629 labelled images, of which the training, validation, and test sets include 6471, 548, and 1610 images, respectively. The dataset not only provides rich annotation information but also includes a variety of attribute information, and it is widely used for detection tasks involving pedestrians, vehicles, and other objects. Figure 9 shows the number and size distribution of each type of instance in the VisDrone2019 dataset.

3.1.3. Evaluation Metrics

In object detection, evaluation metrics serve to quantify an algorithm’s performance. Here, we employ Precision (P), Recall (R), mAP, and Giga Floating Point Operations (GFLOPs) to assess SRW-YOLO’s efficacy.
TP denotes the number of correctly detected samples, FP the number of incorrectly detected samples, and FN the number of undetected samples. Precision is defined as the proportion of predicted positives that are truly correct, reflecting the model’s ability to suppress false positives:
$$P = \frac{TP}{TP + FP}$$
Recall is defined as the proportion of detected labelled samples among all true samples and reflects the ability of the model to identify positive samples:
$$R = \frac{TP}{TP + FN}$$
For each class, the average precision (AP) is computed as the area under its precision–recall curve:
$$AP = \int_{0}^{1} P(R)\, dR$$
The mAP is the average of all category APs and reflects the overall performance of the model on the entire dataset:
$$mAP = \frac{1}{k} \sum_{i=1}^{k} AP_i$$
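The sketch below shows how these metrics reduce to code for a single class: precision and recall from TP/FP/FN counts, and AP as the area under the precision–recall curve. The trapezoidal integration over the raw curve is an implementation choice for illustration, not necessarily the exact interpolation protocol used in our evaluation.

```python
import numpy as np

def precision_recall(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def average_precision(scores, is_tp, num_gt):
    """AP for one class: rank detections by confidence, accumulate TP/FP,
    and integrate precision over recall."""
    order = np.argsort(-np.asarray(scores))
    tp_flags = np.asarray(is_tp, dtype=float)[order]
    tp_cum = np.cumsum(tp_flags)
    fp_cum = np.cumsum(1.0 - tp_flags)
    recall = tp_cum / num_gt
    precision = tp_cum / (tp_cum + fp_cum)
    return np.trapz(precision, recall)   # area under the P-R curve

# Toy usage: three ranked detections, two matching a ground-truth box.
print(precision_recall(tp=8, fp=2, fn=3))
print(average_precision(scores=[0.9, 0.8, 0.3], is_tp=[1, 0, 1], num_gt=2))
```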
In addition, the number of network parameters is used in our experiments to assess the complexity of the model.

3.2. Equipment Parameters

In this work, the SRW-YOLO model is trained on the State Grid dataset for 200 epochs using stochastic gradient descent. Input images are rescaled to 640 × 640 pixels, and training uses an initial learning rate of 0.01, a weight decay of 5 × 10⁻⁴, and a batch size of 32. All experiments run on a server equipped with an NVIDIA RTX A6000 GPU, leveraging CUDA 12.1 for acceleration; the implementation is built in Python 3.9.18 with PyTorch 2.4.0.
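Under these settings, a comparable training run can be launched through the Ultralytics training API as sketched below; the model configuration file "srw_yolo.yaml" and the dataset file "state_grid.yaml" are placeholder names, not files released with this paper.

```python
# Sketch of the reported training configuration: SGD, 200 epochs,
# 640x640 input, lr0 = 0.01, weight decay = 5e-4, batch size 32.
from ultralytics import YOLO

model = YOLO("srw_yolo.yaml")          # placeholder model definition
model.train(
    data="state_grid.yaml",            # placeholder dataset definition
    epochs=200,
    imgsz=640,
    batch=32,
    optimizer="SGD",
    lr0=0.01,
    weight_decay=5e-4,
    device=0,                          # single NVIDIA GPU
)
```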
A total of three sets of complementary experiments are designed in this study. First, we benchmark SRW-YOLO against leading one-stage detectors YOLOv8, YOLOv10 and YOLOv11 on the State Grid dataset to gauge overall performance. Second, we perform ablation experiments to isolate and quantify the impact of each proposed module (P2-scale detection layer, RCS-OSA block, Wise-IoU v3 loss). Third, we validate generalization by comparing all methods on the public VisDrone2019 dataset, focusing on small-object detection under UAV capture conditions. By maintaining identical hyperparameter settings across every trial, we ensure that comparisons remain fair and results directly attributable to architectural innovations.

3.3. Experimental Results

3.3.1. Comparison of Different Object Detection Networks

To validate the reliability of SRW-YOLO, we benchmarked its performance against several leading detectors on the State Grid dataset. Table 2 reports results for the stacking of materials category: SRW-YOLO achieves a precision of 60.8%, 5.7% higher than YOLOv8, 8.1% higher than YOLOv10m, 8.3% higher than YOLOv10n, 5.4% higher than the YOLOv11 baseline, 0.7% higher than RCS-YOLO, and 3.9% higher than RDA-YOLO. In addition, the SRW-YOLO model achieves a mAP50 of 68.1%, a gain of 4.7% over YOLOv8, 5.8% over YOLOv10m, 4.5% over YOLOv10n, 3.4% over YOLOv11, 7.1% over RCS-YOLO, and 0.6% over RDA-YOLO, consistently outperforming all comparators. While YOLOv11 remains the most lightweight at 2.6 million parameters, SRW-YOLO attains superior accuracy with a modest complexity of 4.1 million parameters and 16.5 GFLOPs. Taken together, these metrics underscore SRW-YOLO’s state-of-the-art detection capability within an acceptable computational budget.
Figure 10 compares the detection results of YOLOv8, YOLOv10n, YOLOv11, RCS-YOLO, RDA-YOLO, and SRW-YOLOv11 across all object categories in the State Grid dataset; notably, YOLOv8 exhibits both missed and false detections, resulting in comparatively lower accuracy. In contrast, the line plot in Figure 11 clearly illustrates that SRW-YOLOv11 outperforms the other models on every category within the State Grid dataset.

3.3.2. Ablation Experiments

To evaluate the impact of each improved module in the SRW-YOLO algorithm (the new SFDL, the RCS-OSA module, and the WIoU v3 loss function) on the performance of the original YOLOv11, we conducted ablation experiments. The evaluation metrics cover precision (%), recall (%), and mAP50 (%), as well as the number of model parameters and GFLOPs; the detailed results are shown in Table 3.
The ablation results show that both the addition of the shallow feature detection layer and the introduction of the RCS-OSA module improve performance over the original YOLOv11 baseline. When SFDL alone is added to the baseline architecture, precision and mAP improve by 0.5% and 1%, respectively. When the RCS-OSA module is integrated into the backbone and the shallow neck layers, recall and mAP improve significantly, by 2.3% and 6.3%, respectively, despite a slight decrease in precision. In response to the requirement for higher detection accuracy of relevant risk factors during the construction phase of transmission and distribution projects, the SRW-YOLO model achieves a 5.4% improvement in precision over the YOLOv11 baseline. With both enhancements integrated and all other evaluation metrics remaining within acceptable thresholds, the SRW-YOLO model is judged to outperform the baseline. In addition, the introduction of the WIoU v3 loss function yields a further 0.7% improvement in recall.

3.3.3. Results of Experiments with the Publicly Available Dataset

To verify the superiority of the SRW-YOLO algorithm in terms of generalization ability and small-target detection, we compared the performance of the method proposed in this study with the current mainstream object detection algorithms using the same training parameters on the VisDrone2019 dataset. The target categories and results obtained from the experiments are detailed in Table 4. (Note: The VisDrone2019 dataset includes a total of ten instance types—Pedestrian, Person, Bicycle, Car, Van, Truck, Tricycle, A-t, Bus, and Motor—and we only show the overall evaluation metrics of the experimental results in Table 4).
From the experimental results in Table 4, it is clear that the SRW-YOLO algorithm outperforms the baseline YOLOv11 model on the VisDrone2019 validation set in all metrics, with an improvement of 5.3%, 3.7% and 5.0% in precision, recall and mAP, respectively. This result proves that the SRW-YOLO algorithm not only significantly enhances the detection of small targets, but also improves the detection precision of medium and large targets. The visualization results of the SRW-YOLO algorithm with other algorithms on the VisDrone2019 validation set are shown in Figure 12.
Taken together with our benchmarks on the State Grid dataset (Experiments A and B), these results (Experiment C) underscore SRW-YOLO’s strong small-object detection capability; compared with the other models, it is better suited to the practical needs of UAV-based identification of environmental risk factors during the construction phase of power transmission and distribution projects.

4. Discussion

In our experiments, SRW-YOLO demonstrated strong performance on multi-scale environmental risk targets, such as base of pylon and stacking of materials, through the combined effect of the P2-scale shallow feature detection layer and the RCS-OSA module, yielding marked gains in small-object precision and inference speed under complex backgrounds. The F1–confidence curve in Figure 13 captures SRW-YOLO’s behavior across thresholds: at a confidence setting of 0.408, the model reaches its maximum F1 score of 0.76 across all categories, reflecting a strategic bias toward higher recall to ensure more risk factors are detected. Applying this optimal threshold in further tests provides a favorable balance between precision and recall, thereby enhancing the inspection system’s overall capability to identify environmental risk factors.
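The threshold selection described above amounts to sweeping the confidence cutoff and keeping the value that maximizes F1 = 2PR/(P + R); a minimal sketch of that sweep follows. The per-threshold precision and recall values here are placeholder inputs, not the measured curve of Figure 13.

```python
import numpy as np

def best_f1_threshold(thresholds, precision, recall):
    """Return the confidence threshold that maximizes F1 = 2PR / (P + R)."""
    p, r = np.asarray(precision), np.asarray(recall)
    f1 = 2 * p * r / np.clip(p + r, 1e-12, None)
    i = int(np.argmax(f1))
    return thresholds[i], f1[i]

# Placeholder curve: in the paper the maximum F1 of 0.76 occurs near 0.408.
thr = np.linspace(0.05, 0.95, 10)
prec = np.linspace(0.55, 0.95, 10)   # precision typically rises with the threshold
rec = np.linspace(0.90, 0.30, 10)    # recall typically falls with the threshold
print(best_f1_threshold(thr, prec, rec))
```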
However, real-world construction scenarios present a wider variety of environmental risk factors, such as debris piles from residential demolition and soil erosion due to shifts in vegetation cover, whose visual characteristics differ markedly in shape and texture from those in our current dataset. Future work should therefore expand the dataset and refine the detection head to encompass these new categories. Simultaneously, to satisfy the stringent compute and power constraints of UAV platforms, SRW-YOLO must undergo model quantization, pruning, and hardware accelerator optimizations so as to sustain high frame rate inference on edge devices. At the system level, deeply integrating the detection model with the State Grid’s e-Infrastructure 2.0 platform via standardized APIs will close the loop between “passive detection-active warning-rapid response”. In practice, real-time detection results from the UAV’s edge unit would be streamed to the cloud, automatically triggering alerts, generating visual risk maps, and interfacing with maintenance scheduling systems. This end-to-end solution not only boosts inspection efficiency but also delivers intelligent, granular support for the safety management and environmental protection of transmission and distribution project sites.

5. Conclusions

This study introduces SRW-YOLO, a novel multi-scale detector tailored to UAV-borne imagery of heterogeneous environmental risk factors during the construction phase of power transmission and distribution projects. Based on the YOLOv11 model, the method first significantly improves the detection of small targets from the UAV perspective by adding a new P2-scale detection layer to capture high-resolution shallow detail features. Second, the RCS-OSA reparameterized convolution module replaces the traditional C3K2 blocks in the backbone and the shallow layers of the neck, accelerating inference while reducing model complexity through a two-branch aggregation strategy. Finally, to further optimize the weight allocation of low-quality labelled samples and improve small-target detection accuracy, the model introduces the WIoU v3 loss function with a dynamic non-monotonic focusing mechanism. Experimental results show that SRW-YOLO exhibits high accuracy and robustness in identifying environmental risk factors during the construction phase of transmission and distribution projects.
Despite the significant progress that the SRW-YOLO algorithm has made in detection performance, it still faces some challenges. For example, better representing elongated targets such as the stacking of materials during feature extraction still requires in-depth research. Moreover, balancing detection speed and parameter efficiency demands ongoing attention, particularly for deployment on resource-constrained UAV platforms. Future work will focus on designing more lightweight and efficient multi-scale heterogeneous target detection models to better meet the application requirements of UAV imagery in real-time engineering surveillance and identification. These developments aim to fully realize SRW-YOLO’s potential for intelligent, high-precision environmental monitoring in live construction settings.

Author Contributions

Conceptualization, F.L. (Fei Liu); Methodology, F.L. (Fei Liu); Software, Y.Z.; Investigation, Q.H. and F.L. (Fang Liu); Resources, X.S. and J.Z.; Writing—original draft, Y.Z.; Writing—review & editing, F.L. (Fei Liu) and F.L. (Fang Liu); Supervision, Q.H. and F.L. (Fang Liu); Project administration, X.S. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the State Grid Corporation of China Science and Technology Project Grant (Grant no.: 5200-202456108A-1-1-ZN), the Beijing Natural Science Foundation (4252033), the Fundamental Research Funds for Beijing University of Civil Engineering and Architecture (No. X25039), and BUCEA Post Graduate Innovation Project (PG2025172).

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Acknowledgments

We thank the editors and reviewers for their hard work and valuable advice.

Conflicts of Interest

Authors Xiaohu Sun and Jiyong Zhang were employed by the company State Grid Economic and Technological Research Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Sun, X.; Liu, F.; Zhao, Y.; Liu, F.; Wang, J.; Zhu, S.; He, Q.; Bai, Y.; Zhang, J. Research on Environmental Risk Monitoring and Advance Warning Technologies of Power Transmission and Distribution Projects Construction Phase. Sensors 2024, 24, 7695. [Google Scholar] [CrossRef]
  2. Khoei, T.T.; Slimane, H.O.; Kaabouch, N. Deep learning: Systematic review, models, challenges, and research directions. Neural Comput. Appl. 2023, 35, 23103–23124. [Google Scholar] [CrossRef]
  3. Viola, P.; Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, HI, USA, 8–14 December 2001; pp. I-511–I-518. [Google Scholar]
  4. Viola, P.; Jones, M. Robust real-time face detection. Int. J. Comput. Vis. 2004, 57, 137–154. [Google Scholar] [CrossRef]
  5. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; pp. 886–893. [Google Scholar]
  6. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  7. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  8. He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  9. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 580–587. [Google Scholar]
  10. Wu, Y.; Chen, Y.; Yuan, L.; Liu, Z.; Wang, L.; Li, H.; Fu, Y. Rethinking classification and localization for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 10186–10195. [Google Scholar]
  11. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. pp. 21–37. [Google Scholar]
  12. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  13. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475. [Google Scholar]
  14. Khanam, R.; Hussain, M. Yolov11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
  15. Liu, S.; Shao, F.; Chu, W.; Dai, J.; Zhang, H. An Improved YOLOv8-Based Lightweight Attention Mechanism for Cross-Scale Feature Fusion. Remote Sens. 2025, 17, 1044. [Google Scholar] [CrossRef]
  16. Qiang, H.; Hao, W.; Xie, M.; Tang, Q.; Shi, H.; Zhao, Y.; Han, X. SCM-YOLO for Lightweight Small Object Detection in Remote Sensing Images. Remote Sens. 2025, 17, 249. [Google Scholar] [CrossRef]
  17. Wang, J.; Ding, X.; Meng, F. MC-YOLO: Multi-scale Transmission Line Defect Target Recognition Network. In Proceedings of the International Conference on Multimedia Modeling, Amsterdam, The Netherlands, 29 January–2 February 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 324–337. [Google Scholar]
  18. Liu, C.; Wei, S.; Zhong, S.; Yu, F. YOLO-Powerlite: A lightweight YOLO model for transmission line abnormal target detection. IEEE Access 2024, 12, 105004–105015. [Google Scholar] [CrossRef]
  19. Shi, C.; Zheng, X.; Zhao, Z.; Zhang, K.; Su, Z.; Lu, Q. LSKF-YOLO: Large selective kernel feature fusion network for power tower detection in high-resolution satellite remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5620116. [Google Scholar] [CrossRef]
  20. Li, S.J.; Liu, Y.X.; Li, M.; Ding, L. DF-YOLO: Highly accurate transmission line foreign object detection algorithm. IEEE Access 2023, 11, 108398–108406. [Google Scholar] [CrossRef]
  21. Bi, Z.; Jing, L.; Sun, C.; Shan, M. YOLOX++ for transmission line abnormal target detection. IEEE Access 2023, 11, 38157–38167. [Google Scholar] [CrossRef]
  22. Bi, Z.; Jing, L.; Sun, C.; Shan, M.; Zhong, W. Transmission line abnormal target detection algorithm based on improved YOLOX. Multimed. Tools Appl. 2024, 83, 53263–53278. [Google Scholar] [CrossRef]
  23. Rong, S.; He, L.; Atici, S.F.; Cetin, A.E. Advanced YOLO-based Real-time Power Line Detection for Vegetation Management. arXiv 2025, arXiv:2503.00044. [Google Scholar] [CrossRef]
  24. Lin, Z.; Chen, W.; Su, L.; Chen, Y.; Li, T. HS-YOLO: Small Object Detection for Power Operation Scenarios. Appl. Sci. 2023, 13, 11114. [Google Scholar] [CrossRef]
  25. Zhang, R.; Wen, C. Sod-yolo: A small target defect detection algorithm for wind turbine blades based on improved YOLOV5. Adv. Theory Simul. 2022, 5, 2100631. [Google Scholar] [CrossRef]
  26. Cheng, Q.; Yuan, G.; Chen, D.; Xu, B.; Chen, E.; Zhou, H. Transmission Lines Small-Target Detection Algorithm Research Based on YOLOv5. Appl. Sci. 2023, 13, 9386. [Google Scholar] [CrossRef]
  27. Wu, K.; Chen, Y.; Lu, Y.; Yang, Z.; Yuan, J.; Zheng, E. SOD-YOLO: A High-Precision Detection of Small Targets on High-Voltage Transmission Lines. Electronics 2024, 13, 1371. [Google Scholar] [CrossRef]
  28. Pang, Q.; Wan, J.; Dong, Z.; Tian, M. Advanced YOLOv10 for Aerial Power Line Fault Detection. In Proceedings of the 2024 4th International Conference on Smart Grid and Energy Internet (SGEI), Shenyang, China, 13–15 December 2024; pp. 326–330. [Google Scholar]
  29. Kang, M.; Ting, C.-M.; Ting, F.F.; Phan, R.C.W. RCS-YOLO: A Fast and High-Accuracy Object Detector for Brain Tumor Detection. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2023, Vancouver, BC, Canada, 8–12 October 2023; pp. 600–610. [Google Scholar] [CrossRef]
  30. Xiang, Y.; Du, C.; Mei, Y.; Zhang, L.; Du, Y.; Liu, A. BN-YOLO: A lightweight method for bird’s nest detection on transmission lines. J. Real-Time Image Process. 2024, 21, 194. [Google Scholar] [CrossRef]
  31. Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding box regression loss with dynamic focusing mechanism. arXiv 2023, arXiv:2301.10051. [Google Scholar]
  32. Zou, H.; Yang, J.; Sun, J.; Yang, C.; Luo, Y.; Chen, J. Detection Method of External Damage Hazards in Transmission Line Corridors Based on YOLO-LSDW. Energies 2024, 17, 4483. [Google Scholar] [CrossRef]
  33. Hu, C.; Lv, L.; Zhou, T. UAV inspection insulator defect detection method based on dynamic adaptation improved YOLOv8. J. Real-Time Image Process. 2025, 22, 74. [Google Scholar] [CrossRef]
  34. Rasheed, A.F.; Zarkoosh, M. YOLOv11 Optimization for Efficient Resource Utilization. arXiv 2024, arXiv:2412.14790. [Google Scholar] [CrossRef]
  35. Li, S.; Ouyang, H.; Chen, T.; Lu, X.; Zhao, Z. YOLO-T: Multi-Target Detection Algorithm for Transmission Lines. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 1072. [Google Scholar] [CrossRef]
  36. Yao, Q.; Zhuang, D.; Feng, Y.; Wang, Y.; Liu, J. Accurate Detection of Brain Tumor Lesions from Medical Images based on Improved YOLOv8 Algorithm. IEEE Access 2024, 12, 144260–144279. [Google Scholar] [CrossRef]
  37. Guo, S.-J.; Li, B.-H.; Zhang, J.-J.; Zhu, E.-J.; Liang, Y.; Sun, J.-Z. Flame and smoke detection based on channel shuffling and adaptive spatial feature fusion. J. Electron. Imaging 2024, 33, 053036. [Google Scholar] [CrossRef]
  38. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. Repvgg: Making vgg-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 13733–13742. [Google Scholar]
Figure 1. Article framework diagram.
Figure 2. YOLOv11 network architecture.
Figure 3. SRW-YOLOv11 network architecture.
Figure 4. Schematic of feature map generation.
Figure 5. Structural reparameterization of the RepVGG block (μ, σ, γ, and β are the accumulated mean, standard deviation, learned scaling factor, and bias of the BatchNorm layer following each convolution).
Figure 6. RCS module structure. “*” indicates multiplication here.
Figure 7. RCS-OSA module structure. “*” indicates multiplication here.
Figure 8. The black box indicates the smallest enclosing box and the yellow line indicates the connection of the center points.
Figure 9. (a) Quantity information for the ten categories; (b) darker colors indicate a more concentrated distribution of targets in that size range, while lighter colors indicate a sparser distribution.
Figure 10. Visualization results of the YOLOv8, YOLOv10n, YOLOv11, RCS-YOLO, RDA-YOLO, and SRW-YOLOv11 algorithms for comparison tests on the State Grid dataset.
Figure 11. Comparison plot of the experimental results of different models on all target categories of the State Grid dataset.
Figure 12. Visualization results of different algorithms on the VisDrone2019 validation set.
Figure 13. F1–confidence curves for the experimental results of the SRW-YOLO model.
Table 1. Label types and number sizes for the State Grid dataset.

Class                   Train Set   Validation Set   Test Set
Transmission pylon         839           135             96
Base of pylon             1428           141            122
Stacking of materials      909            88            103
Table 2. Results of different target detection models on a data type of stacking of materials.

Module           Precision (%)   Recall (%)   mAP50 (%)   Params (M)   GFLOPs
YOLOv8                55.1           77.3         63.4         3.0         8.1
YOLOv10m              52.7           79.0         62.3        16.5        63.4
YOLOv10n              52.5           77.8         63.6         2.7         8.2
YOLOv11               55.4           78.4         64.7         2.6         6.3
RCS-YOLO [29]         60.1           78.4         61.0         5.6        15.2
RDA-YOLO [37]         56.9           76.5         67.5         8.6        22.1
SRW-YOLO              60.8           78.6         68.1         4.1        16.5
Table 3. Results of ablation experiments for the stacking of materials class of targets on the State Grid dataset.

Module               Precision (%)   Recall (%)   mAP50 (%)   Params (M)   GFLOPs
YOLOv11 (baseline)        55.4           78.4         64.7         2.6         6.3
YOLOv11+SFDL              55.9           79.4         65.7         2.9        14.1
YOLOv11+RCS-OSA           55.1           80.7         71.0         3.9        14.4
YOLOv11+WIoU v3           53.3           79.1         63.6         2.6         6.3
SRW-YOLO                  60.8           78.6         68.1         4.1        16.5
Bold is used to highlight the comparison of models and indicators used.
Table 4. Experimental results of different object detection models on the VisDrone2019 validation set.

Model            Precision (%)   Recall (%)   mAP50 (%)   GFLOPs
YOLOv5                38.3           32.4         29.6       12
YOLOv8                44.3           32.6         33.1        8.1
YOLOv10m              43.4           32.7         32.4       63.4
YOLOv10n              43.0           31.6         31.8        8.2
YOLOv11               43.1           33.3         32.7        6.3
RCS-YOLO [29]         45.1           28.1         33.8       15.2
RDA-YOLO [37]         43.7           32.6         32.4       22.1
SRW-YOLO              48.4           37.0         37.7       16.5
Bold is used to highlight the comparison of models and indicators used.