Article

A Real-Time DAO-YOLO Model for Electric Power Operation Violation Recognition

College of Electrical and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(8), 4492; https://doi.org/10.3390/app15084492
Submission received: 7 March 2025 / Revised: 8 April 2025 / Accepted: 13 April 2025 / Published: 18 April 2025
(This article belongs to the Special Issue Application of Artificial Intelligence in Image Processing)

Abstract

Electric power operation violation recognition (EPOVR) is essential for personnel safety, achieved by detecting key objects in electric power operation scenarios. Recent methods usually use the YOLOv8 model to achieve EPOVR; however, the YOLOv8 model still has four problems that need to be addressed. Firstly, the capability for feature representation of irregularly shaped objects is not strong enough. Secondly, the capability for feature representation is not strong enough to precisely detect multi-scale objects. Thirdly, the localization accuracy is not ideal. Fourthly, many violation categories in electric power operation cannot be covered by the existing datasets. To address the first problem, a deformable C2f (DC2f) module is proposed, which contains deformable convolutions and depthwise separable convolutions. For the second problem, an adaptive multi-scale feature enhancement (AMFE) module is proposed, which integrates multi-scale depthwise separable convolutions, adaptive convolutions, and a channel attention mechanism to optimize multi-scale feature representation while minimizing the number of parameters. For the third problem, an optimized complete intersection over union (OCIoU) loss is proposed for bounding box localization. Finally, a novel dataset named EPOVR-v1.0 is proposed to evaluate the performance of object detection models applied in EPOVR. Ablation studies validate the effectiveness of the DC2f module, AMFE module, OCIoU loss, and their combinations. Compared with the baseline YOLOv8 model, the mAP@0.5 and mAP@0.5–0.95 are improved by 3.2% and 4.4%, while SDAP@0.5 and SDAP@0.5–0.95 are reduced by 0.034 and 0.019, respectively. Furthermore, the number of parameters and the GFLOPS slightly decrease. Comparison with seven YOLO models shows that our DAO-YOLO model achieves the highest detection accuracy while maintaining real-time object detection for EPOVR.

1. Introduction

The rapid expansion of power infrastructure and increased operational demands have made safety risks in the electricity industry more pronounced, highlighting the urgent need to enhance protection for power workers. However, conventional approaches for EPOVR predominantly depend on manual inspections and remote monitoring as their core operational mechanisms. These approaches not only place significant strain on human resources and raise operational expenses for businesses, but also lead to visual fatigue among inspection personnel due to prolonged periods of intensive inspection work. These combined effects ultimately diminish inspection efficiency and undermine the reliability of safety oversight mechanisms. In recent years, computer vision and artificial intelligence technologies have emerged as predominant solutions in EPOVR applications [1]. Such approaches not only markedly enhance monitoring efficiency but also substantially lower labor expenses, while delivering robust technical support for intelligent safety oversight in power operations. They play a vital role in safeguarding power operators and improving industry regulatory standards.
Object detection techniques have seen remarkable advancements in recent years, driven by progress in deep learning [2,3,4,5,6,7,8,9,10,11,12]. The intelligent detection of EPOVR mainly relies on object detection technology [13]. The detection of critical safety equipment such as safety harnesses, helmets, and reflective vests in power utility operations enables effective identification of safety violations, thereby facilitating real-time monitoring and proactive warning systems. The You Only Look Once (YOLO) series [14,15,16,17] of object detection algorithms have demonstrated notable success in various industrial applications, showing particular promise in EPOVR scenarios. Since its initial release, the YOLO object detection model has undergone continuous development through numerous iterations, spanning from YOLOv1 to YOLOv11. A comprehensive comparative analysis in this study (detailed in Section 4.5) shows that the YOLOv8 model most effectively addresses the operational requirements of power companies. This suitability also stems from the demonstrated effectiveness of YOLO-series models in fulfilling real-time processing demands in physical deployment environments [18]. Moreover, YOLOv8 not only offers efficient real-time detection capabilities but also delivers more accurate detection results, which is crucial for enhancing the accuracy of safety supervision in power operations.
Although the YOLOv8 model has achieved favorable detection performance in multiple application scenarios, it still encounters the following challenges in EPOVR:
(1)
The feature representation capability for objects with irregular shapes is insufficient. As shown in Figure 1a,b, the YOLOv8 model fails to detect hooks and cranes with irregular shapes.
(2)
The feature representation capability for multi-scale objects is insufficient. As shown in Figure 1a,b, the YOLOv8 model fails to detect smaller hooks and larger cranes.
(3)
The localization accuracy is insufficient. As shown in Figure 1b, the bounding box produced by the YOLOv8 model is too small for the operator.
(4)
Existing datasets are not sufficient for EPOVR. They usually focus on safety helmets and reflective vests, as shown in Figure 2. These tasks are relatively less challenging, and existing studies on them are relatively mature. However, many violations cannot be covered by the above object categories, such as personnel standing under lifted loads, workers throwing objects from height, and ladders without height restriction markings.
In order to address the above challenges, a novel object detection model, named DAO-YOLO, is proposed for EPOVR. The unique contributions of DAO-YOLO are as follows:
(1)
To address the first challenge, a deformable C2f (DC2f) module is proposed and integrated into the backbone network. The DC2f module replaces the traditional convolutional operations with deformable convolution (DConv) and depthwise separable convolution operations. This approach reduces the number of parameters while providing more accurate feature representation for objects with irregular shapes.
(2)
To address the second challenge, an adaptive multi-scale feature enhancement (AMFE) module is proposed and integrated into the backbone network. The AMFE module combines multi-scale depthwise separable convolutions with an adaptive channel attention mechanism to better represent the characteristics of multi-scale objects.
(3)
To address the third challenge, an optimized CIoU (OCIoU) bounding box regression loss is proposed. The OCIoU loss enhances the CIoU (complete intersection over union) loss used in YOLOv8 by optimizing width and height separately instead of using aspect ratio optimization, thereby improving the localization accuracy.
(4)
To address the fourth challenge, a novel object detection dataset for EPOVR, named EPOVR-v1.0, is proposed in this paper. This dataset contains 1200 images covering eight common object categories in electric power operation scenarios. The training, validation, and testing sets consist of 840, 120, and 240 images, respectively.

2. Literature Review

In recent years, automated object detection methods based on deep learning have been widely applied in diverse scenarios, driving remarkable efficiency improvements and delivering substantial practical value. For instance, deep learning has been used to classify building facade materials from street-view images [21], enhancing automation capabilities in urban analysis. Similarly, deep learning-based object detection has played a crucial role in EPOVR and has become the predominant approach in this field. Current object detection methods for EPOVR can be broadly categorized into three types: YOLOv5-based methods, YOLOv8-based methods, and methods based on other YOLO models, with the majority belonging to the first two types. Below is a brief introduction to each type.
Firstly, many YOLOv5-based methods for EPOVR have been introduced. Zhang et al. [22] proposed an improved YOLOv5 model for detecting power operation violations. The method introduces a spatial attention mechanism to enhance feature extraction in cluttered scenes, and employs Bayesian-optimized multi-scale fusion to boost detection accuracy across varying object sizes. Liu et al. [23] proposed an improved YOLOv5-based algorithm for detecting safety equipment in power operations, introducing a polarized self-attention module to enhance feature extraction and replacing traditional convolution with grouped spatial convolution to reduce model complexity, thereby improving computational speed and detection accuracy. Sun et al. [24] proposed a lightweight YOLOv5-based safety helmet detection algorithm, embedding a multispectral channel attention mechanism for small-target enhancement, and using channel pruning to achieve real-time detection on embedded devices. Other related works include YOLO-M3C [25], HR-YOLO [26], and FEFD-YOLOV5 [27].
YOLOv8-based methods for EPOVR have also been extensively studied. Wang et al. [28] introduced a helmet detection approach utilizing PConv-YOLOv8, where partial convolution is embedded in the YOLOv8 backbone, a similarity-based attention mechanism is incorporated into the neck, and the Wise-Distribution Focal Loss is employed to enhance detection performance. Bao et al. [29] developed an enhanced helmet detection method based on YOLOv8, where the original C2f module is substituted with the C2f-FE module, incorporating partial convolution from FasterNet and an efficient multi-scale attention mechanism to improve detection accuracy. Di et al. [30] presented a personal protective equipment detection method by modifying YOLOv8s through the integration of a reparameterization module in its backbone network and the incorporation of an R-C2F module alongside a multi-scale feature fusion architecture. Park et al. [31] proposed an improved YOLOv8-based model for detecting non-personal protective equipment on construction sites, utilizing multiple data augmentation techniques to expand the size of training samples, and analyzing the impact of data augmentation on model performance across different classes and backbone networks. Other related works include SDCB-YOLO [32], MEAG-YOLO [33], YOLOv8s-SNC [34], and YOLOv8-ADSC [35].
Finally, other methods based on different YOLO models for EPOVR have also been investigated. Tang et al. [36] developed an enhanced detection model based on YOLOv9 for safety protective equipment in power operations, incorporating self-calibrating convolutions to expand receptive fields and a wavelet transform downsampling module to preserve feature integrity. Zhang et al. [37] proposed a helmet and worker detection method based on the improved YOLOv11n, which replaces traditional convolution with global search convolution for better feature extraction and lower computational costs, and introduces the C3K2-FE module and bidirectional feature pyramid network to enhance detection accuracy, speed, and the integration of features across multiple scales. Eum et al. [38] proposed a heavy equipment detection method by integrating transformer-based backbone networks and multi-scale mechanisms into YOLOv10, demonstrating excellent accuracy for varying equipment sizes with real-time speed. Other related works include an easily accessible electricity consumption model based on open data and GA-SVR [39] and GeoIoU-SEA-YOLO [40].
According to the comprehensive performance of mainstream YOLO models, as shown in Section 4.5, YOLOv8n is selected as the baseline model for our DAO-YOLO model. The major differences between our work and existing works are the proposed DC2f and AMFE modules, which enhance the feature representations of irregularly shaped and multi-scale objects, and a refined bounding box regression loss, which improves object localization accuracy.

3. DAO-YOLO Model

3.1. Overview

DAO-YOLO is an enhanced and optimized iteration of YOLOv8, featuring a network architecture that comprises three core components: the backbone, the feature fusion layer (Neck), and the detection head (Head). The backbone architecture builds upon the cross-stage partial network [41], employing stacked C2f modules and convolutional layers to effectively capture hierarchical feature representations through multi-stage processing. Building upon this foundation, the final C2f module in the backbone network is substituted with a DC2f structure. This enhanced configuration incorporates two key modifications: substituting the traditional convolution in the Bottleneck layer with a deformable convolution module [42], while introducing a depthwise separable convolution module in place of the conventional convolution within the C2f architecture. This architectural refinement achieves parameter reduction while enhancing feature representation accuracy for irregularly shaped objects. The backbone network is further augmented with an AMFE module at its terminal stage, strengthening multi-scale discriminative feature extraction. These refined feature maps are subsequently fused through hierarchical synthesis in the feature pyramid network. The neck structure is constructed based on the feature pyramid network [43], incorporating bidirectional cross-scale connections to effectively integrate multi-level features, which significantly enhances the model’s feature representation capabilities. In the final stage, the integrated feature maps are routed to the detection head, which utilizes a dual-branch architecture to simultaneously process classification features and localization signals. A 1 × 1 convolutional layer subsequently performs coordinate regression and category prediction. Notably, we implement our proposed OCIoU to supersede the conventional CIoU [44] as the bounding box regression loss, thereby enhancing target localization precision. The comprehensive architecture of DAO-YOLO is illustrated in Figure 3.

3.2. DC2f Module

DAO-YOLO’s DC2f component is a cornerstone architectural innovation within the framework. The principal advancement relative to the baseline C2f is the systematic replacement of standard Bottleneck blocks with deformable Bottleneck (DBottleneck) modules, enabling enhanced feature extraction for non-regular targets through deformable convolution mechanisms. To optimize parametric efficiency and computational load, we substitute standard convolutional layers within the DC2f architecture with depthwise separable convolution (DSConv) [45] modules. Figure 4a illustrates the overall architecture of DC2f. Firstly, DC2f extracts features through the depthwise separable convolution module. Afterwards, the extracted feature maps are equally divided into two parts along the channel dimension. One part flows into the DBottleneck module for deep feature mining, while the remaining part is retained for subsequent fusion. Subsequently, the feature maps generated by each DBottleneck module are concatenated sequentially along the channel dimension from left to right, as shown in Figure 4a, further enhancing the feature diversity. In the terminal processing stage, depthwise separable convolution is deployed to compress the channel dimensions of the fused feature representations, ensuring dimensional compatibility with downstream network layers.
Figure 4b demonstrates the network design of DBottleneck. Firstly, the DBottleneck module applies a 3 × 3 DConv layer to compress the input feature map by halving its channel dimensions. Secondly, the condensed features are processed through a second 3 × 3 DConv layer, which expands the channel depth back to the original size while simultaneously capturing higher-level feature representations. Finally, the input feature map is element-wise added to the output of the second DConv module through a shortcut connection. The central innovation of DBottleneck resides in its DConv module.
The Bottleneck module in YOLOv8 employs a standard convolutional block architecture. The fixed receptive fields of standard convolution operations limit spatial adaptability to geometric variability in objects, consequently impairing discriminative capability for shape variations. In contrast, DConv can dynamically predict the sampling point locations for convolution operations and introduce a learnable RoI (region of interest) metric factor (which is proportional to the correlation between the sampling area and the object). This structural innovation significantly improves detection robustness for non-regular geometries, with Figure 5 providing schematic visualization of DConv’s adaptive receptive field mechanism.
For the input feature map, DConv employs two additional convolutional layers to predict the offsets of the original sampling points along with the RoI scaling factor. Next, the sampling positions of the convolutional kernel are dynamically adjusted according to the predicted offsets, while the RoI scaling factor weights these dynamically sampled points. In the end, the input feature map is convolved with the modified sampling points to produce the refined feature map. The DConv operation can be formally expressed as follows:
$$ y(p) = \sum_{k=1}^{K} w_k \cdot x(p + p_k + \Delta p_k) \cdot \Delta m_k \qquad (1) $$
where $w_k$ denotes the weight of the $k$-th convolution point in the convolutional kernel, $x$ denotes the input feature map, $p$ denotes any current position in the feature map, $p_k$ denotes the coordinates of the $k$-th convolution point within the kernel, $\Delta p_k$ denotes the predicted offset of the $k$-th convolution point, and $\Delta m_k \in [0, 1]$ denotes the RoI metric factor of $x(p + p_k + \Delta p_k)$, i.e., the probability that $x(p + p_k + \Delta p_k)$ belongs to the RoI.
As shown in Equation (1), DConv adapts to geometric variations in object shapes by dynamically learning offsets, enabling flexible adjustment of the convolutional kernel’s sampling positions. Meanwhile, the RoI scaling factor weights the importance of sampling points, effectively enhancing key features while suppressing irrelevant background regions. By incorporating DConv, the DC2f module substantially improves feature extraction performance for irregularly shaped objects. Moreover, substituting standard convolution in C2f with DSConv significantly reduces both the model’s parameter size and computational complexity.
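To make the DBottleneck mechanism above concrete, the sketch below builds a DBottleneck-style block on top of torchvision's modulated deformable convolution (DeformConv2d with an offset/mask prediction head). It is a minimal sketch under stated assumptions: the normalization, activation, and layer widths are illustrative choices, not the authors' released implementation.

```python
# A minimal sketch of a DBottleneck-style block (Figure 4b). Assumptions:
# torchvision's DeformConv2d serves as the DConv operator; BatchNorm/SiLU
# and the offset-head design are illustrative, not the paper's exact code.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DConvBlock(nn.Module):
    """3x3 modulated deformable convolution: offsets and an RoI mask
    are predicted from the input and used to resample the kernel (Eq. (1))."""
    def __init__(self, c_in, c_out):
        super().__init__()
        # Per 3x3 kernel position: 2 offset values + 1 mask value -> 27 channels.
        self.offset_mask = nn.Conv2d(c_in, 3 * 9, 3, padding=1)
        self.dconv = DeformConv2d(c_in, c_out, 3, padding=1)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        om = self.offset_mask(x)
        offset = om[:, :18]                    # 2 * 9 offset channels
        mask = torch.sigmoid(om[:, 18:])       # RoI metric factor in [0, 1]
        return self.act(self.bn(self.dconv(x, offset, mask)))

class DBottleneck(nn.Module):
    """Two stacked DConv blocks with a residual shortcut."""
    def __init__(self, c):
        super().__init__()
        self.dconv1 = DConvBlock(c, c // 2)    # halve the channel dimension
        self.dconv2 = DConvBlock(c // 2, c)    # restore the channel dimension

    def forward(self, x):
        return x + self.dconv2(self.dconv1(x))
```

Within DC2f, such DBottleneck blocks take the place of the standard Bottleneck blocks of C2f, while DSConv layers perform the initial feature extraction and the final channel compression described above.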

3.3. AMFE Module

The AMFE module integrates multi-scale depthwise separable convolutions with a channel attention mechanism, facilitating efficient extraction of both fine-grained local details and comprehensive global context across multiple scales. This integration substantially improves the model’s multi-scale feature representation capability.
Figure 6 illustrates the architecture of the AMFE module, and its implementation process can be expressed as follows:
$$ X_G = GAP\big( DSConv_1(X_i) + DSConv_2(X_i) + DSConv_3(X_i) \big) \qquad (2) $$
$$ W_c = \mathrm{Sigmoid}\big( AConv1D_z(X_G) \big) \qquad (3) $$
$$ X_o = X_i \cdot W_c \qquad (4) $$
where $X_i \in \mathbb{R}^{H \times W \times C}$ denotes the input feature map; $H$, $W$, and $C$ denote the height, width, and number of channels of the input feature map, respectively; $DSConv_1(X_i)$, $DSConv_2(X_i)$, and $DSConv_3(X_i)$ denote depthwise separable convolutions with kernel sizes of 1 × 1, 3 × 3, and 5 × 5, respectively; $GAP(\cdot)$ denotes global average pooling; $X_G \in \mathbb{R}^{C}$ denotes the global feature vector; $AConv1D_z$ denotes an adaptive 1D convolution operation with a kernel size of $1 \times z$; $W_c$ denotes the channel attention coefficient; and $X_o$ denotes the output feature map. The extent of the convolutional kernel $z$ is adaptively determined by the following equation:
$$ z = \left| \frac{\log_2(C)}{s} + \frac{o}{s} \right|_{\mathrm{odd}} \qquad (5) $$
where $C$ denotes the number of input channels, $s$ denotes the scaling factor that controls the growth rate of $z$, $o$ denotes the offset used as the initial value for the adjustment of $z$, and $|\cdot|_{\mathrm{odd}}$ denotes the operation of rounding up to the nearest odd number.
The above implementation shows that the AMFE module has two distinct advantages over conventional channel attention modules. Firstly, the input feature map undergoes multi-scale processing via depthwise separable convolutions at three different scales, enhancing the model’s proficiency in capturing features across multiple scales. Secondly, the employed one-dimensional adaptive convolution determines its kernel size adaptively based on the input feature scale. Compared to traditional fully connected layers, this approach substantially reduces both parameter count and computational complexity, achieving a superior balance between computational efficiency and feature extraction capability.
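To make Equations (2)–(5) concrete, the following minimal PyTorch sketch shows one way the AMFE computation could be organized; the class layout and the rounding of z are assumptions for illustration rather than the authors' code.

```python
# A minimal sketch of the AMFE idea (Equations (2)-(5)); names and defaults
# are illustrative assumptions, not the authors' implementation.
import math
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise separable convolution: depthwise k x k followed by pointwise 1 x 1."""
    def __init__(self, c, k):
        super().__init__()
        self.dw = nn.Conv2d(c, c, k, padding=k // 2, groups=c)
        self.pw = nn.Conv2d(c, c, 1)

    def forward(self, x):
        return self.pw(self.dw(x))

class AMFE(nn.Module):
    def __init__(self, c, s=1, o=2):
        super().__init__()
        self.branches = nn.ModuleList([DSConv(c, k) for k in (1, 3, 5)])
        # Adaptive kernel size z (Equation (5)): round log2(C)/s + o/s up to odd.
        z = math.ceil(math.log2(c) / s + o / s)
        z = z if z % 2 == 1 else z + 1
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.conv1d = nn.Conv1d(1, 1, kernel_size=z, padding=z // 2, bias=False)

    def forward(self, x):
        # Multi-scale local features fused by summation, then pooled (Equation (2)).
        xg = self.gap(sum(b(x) for b in self.branches))       # (N, C, 1, 1)
        # Adaptive 1D convolution across channels + sigmoid gating (Equation (3)).
        w = self.conv1d(xg.squeeze(-1).transpose(1, 2))        # (N, 1, C)
        w = torch.sigmoid(w).transpose(1, 2).unsqueeze(-1)     # (N, C, 1, 1)
        return x * w                                           # Equation (4)
```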

3.4. OCIoU Bounding Box Regression Loss

While the CIoU loss function in YOLOv8 enhances bounding box regression by optimizing the aspect ratio, a predicted box can match the aspect ratio of the ground truth box while its width and height still deviate from the ground truth values. To resolve this problem, we propose OCIoU, a novel bounding box regression loss that independently optimizes the width and height through decoupled computations.
The OCIoU loss is defined by the following expression:
$$ L_{OCIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \frac{\rho^2(w, w^{gt})}{w_c^2} + \frac{\rho^2(h, h^{gt})}{h_c^2} \qquad (6) $$
As shown in Figure 7, $IoU$ denotes the intersection over union between the predicted and ground truth boxes; $b$ and $b^{gt}$ denote the center points of the predicted and ground truth boxes, respectively; $w$ and $h$ denote the width and height of the predicted box, respectively; $w^{gt}$ and $h^{gt}$ denote the width and height of the ground truth box, respectively; $w_c$, $h_c$, and $c$ denote the width, height, and diagonal length of the minimum enclosing rectangle, respectively; $\rho(b, b^{gt})$ denotes the Euclidean distance between $b$ and $b^{gt}$; $\rho(w, w^{gt})$ denotes the difference between $w$ and $w^{gt}$; and $\rho(h, h^{gt})$ follows the same principle.
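For illustration, the sketch below evaluates Equation (6) for boxes given in (x1, y1, x2, y2) form; the function name and box format are assumptions, and the actual training code may differ in details such as gradient handling of the enclosing-box terms.

```python
# A minimal sketch of the OCIoU loss in Equation (6); an illustrative
# implementation, not the exact training code.
import torch

def ociou_loss(pred, gt, eps=1e-7):
    """pred, gt: tensors of shape (..., 4) holding (x1, y1, x2, y2) boxes."""
    # Intersection over Union.
    iw = (torch.min(pred[..., 2], gt[..., 2]) - torch.max(pred[..., 0], gt[..., 0])).clamp(0)
    ih = (torch.min(pred[..., 3], gt[..., 3]) - torch.max(pred[..., 1], gt[..., 1])).clamp(0)
    inter = iw * ih
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_g = (gt[..., 2] - gt[..., 0]) * (gt[..., 3] - gt[..., 1])
    iou = inter / (area_p + area_g - inter + eps)

    # Minimum enclosing rectangle: width w_c, height h_c, squared diagonal c^2.
    wc = torch.max(pred[..., 2], gt[..., 2]) - torch.min(pred[..., 0], gt[..., 0])
    hc = torch.max(pred[..., 3], gt[..., 3]) - torch.min(pred[..., 1], gt[..., 1])
    c2 = wc ** 2 + hc ** 2 + eps

    # Center-distance term, as in CIoU/DIoU.
    cx_p, cy_p = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cx_g, cy_g = (gt[..., 0] + gt[..., 2]) / 2, (gt[..., 1] + gt[..., 3]) / 2
    rho2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2

    # Decoupled width and height terms (the OCIoU modification).
    w_p, h_p = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w_g, h_g = gt[..., 2] - gt[..., 0], gt[..., 3] - gt[..., 1]
    return (1 - iou + rho2 / c2
            + (w_p - w_g) ** 2 / (wc ** 2 + eps)
            + (h_p - h_g) ** 2 / (hc ** 2 + eps))
```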

3.5. Overall Loss Function

The expression of the overall loss function L of DAO-YOLO is as follows:
$$ L = L_{cls} + L_{DFL} + L_{OCIoU} \qquad (7) $$
where $L_{cls}$ denotes the cross-entropy loss and $L_{DFL}$ denotes the Distribution Focal Loss [46].

4. Experimental Results and Analysis

4.1. EPOVR-v1.0 Dataset

The EPOVR-v1.0 dataset proposed in this paper consists of 1200 images, all sourced from power operation sites, with resolutions ranging from 1920 × 1080 to 3840 × 2160. The dataset covers eight object categories critical for EPOVR: operator, supervisor, throwing, cranes, ladder with height restriction marking, ladder, locked-hook, and unlocked-hook. The correspondence between each category of objects and the violation types is detailed in Table 1. The dataset is partitioned into three subsets: a training set with 840 images, a validation set with 120 images, and a test set with 240 images. The explicit distribution of training, validation, and testing samples for each object category is shown in Table 2. This study implemented fourfold data augmentation on the training and validation sets using horizontal flipping combined with rotational transformations at 90°, 180°, and 270°.
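A hedged sketch of the fourfold augmentation is given below, assuming that each image contributes its horizontal flip and its 90°, 180°, and 270° rotations (the exact composition of the four variants is not specified above, so this is an assumption); bounding-box labels must be transformed consistently with the images.

```python
# Illustrative fourfold augmentation sketch (assumed composition: one flip
# plus three rotations per image); label boxes must be transformed likewise.
from PIL import Image

def augment(path):
    img = Image.open(path)
    return [
        img.transpose(Image.Transpose.FLIP_LEFT_RIGHT),  # horizontal flip
        img.transpose(Image.Transpose.ROTATE_90),
        img.transpose(Image.Transpose.ROTATE_180),
        img.transpose(Image.Transpose.ROTATE_270),
    ]
```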
The correspondence in Table 1 is explained as follows: violation type 1 is identified when operators are detected but supervisors are not; violation type 2 is identified when both cranes and operators are detected, with operators positioned beneath the crane; violation type 3 is identified when throwing objects from a high altitude are detected; violation type 4 is identified when a regular ladder is detected, while ladders with height restriction markings are used for negative sample comparison training; violation type 5 is identified when unlocked-hooks are detected, with locked-hooks used for negative sample comparison training.
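The rules above can be expressed as a simple post-processing step on the detector output. The sketch below is a hypothetical illustration: the function and field names are invented here, and the "beneath the crane" test is simplified to a bounding-box position check.

```python
# Hypothetical mapping from detected categories (Table 1) to violation types.
def identify_violations(detections):
    """detections: list of dicts like {"cls": "operator", "box": (x1, y1, x2, y2)}."""
    labels = {d["cls"] for d in detections}
    violations = set()
    if "operator" in labels and "supervisor" not in labels:
        violations.add(1)  # unattended operation site
    cranes = [d for d in detections if d["cls"] == "crane"]
    operators = [d for d in detections if d["cls"] == "operator"]
    for c in cranes:
        for o in operators:
            # Simplified check: horizontal overlap and operator below the crane top
            # (image coordinates, y increasing downward).
            if o["box"][0] < c["box"][2] and o["box"][2] > c["box"][0] and o["box"][1] > c["box"][1]:
                violations.add(2)  # personnel standing under lifted loads
    if "throwing" in labels:
        violations.add(3)  # throwing objects from height
    if "ladder" in labels:
        violations.add(4)  # ladder without height restriction marking
    if "unlocked-hook" in labels:
        violations.add(5)  # hook lacking a locking device
    return sorted(violations)
```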

4.2. Experimental Setup

In Equation (5), s equals 1 and o equals 2. With reference to YOLOv5–YOLOv11 [47,48,49,50,51,52,53], the size of the input image is uniformly scaled to 640 × 640, and the ground truth bounding boxes are scaled by the same proportion in the training stage, counteracting the impact induced by resizing the input image. The number of training epochs is set to 200, the batch size is 32, and the initial learning rate is 0.01. The optimizer is stochastic gradient descent, with momentum and weight decay set to 0.937 and 0.0005, respectively. The IoU threshold for non-maximum suppression is set to 0.7. The experimental environment consists of the Ubuntu 22.04 operating system, Python 3.8, PyTorch 1.13.1, CUDA 12.4, an Intel(R) Xeon(R) E5-2650 v4 CPU @ 2.20 GHz, a single NVIDIA TITAN RTX GPU, and 24 GB of GPU memory.
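For reference, a training configuration matching the setup above could be expressed with the Ultralytics training API roughly as follows; this is a hedged sketch, since the DAO-YOLO modules are not part of the stock package and the dataset YAML path is hypothetical.

```python
# Hedged sketch of an equivalent training configuration (assumptions:
# stock YOLOv8n config as a stand-in for DAO-YOLO, hypothetical dataset YAML).
from ultralytics import YOLO

model = YOLO("yolov8n.yaml")          # baseline config; DAO-YOLO swaps in DC2f/AMFE/OCIoU
model.train(
    data="epovr_v1.yaml",             # hypothetical dataset description file
    imgsz=640,                        # inputs uniformly scaled to 640 x 640
    epochs=200,
    batch=32,
    optimizer="SGD",
    lr0=0.01,
    momentum=0.937,
    weight_decay=0.0005,
    iou=0.7,                          # NMS IoU threshold used during validation
)
```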

4.3. Evaluation Metrics

The effectiveness of EPOVR heavily relies on the performance of the object detection model. For example, a violation is identified when the object detection model detects an unlocked hook. To evaluate the performance of the object detection model, the mAP@0.5(%), mAP@0.5–0.95(%), SDAP@0.5, SDAP@0.5–0.95, Parameters, GFLOPS, and FPS are adopted in this paper. The mAP (mean average precision) and SDAP (standard deviation of average precision) are used to evaluate the detection accuracy of the object detection model from a statistical perspective. Specifically, the mAP is calculated as the mean of the AP (average precision) over all categories, while the SDAP is calculated as the standard deviation of the AP over all categories. The calculation of AP follows [54]. The mAP@0.5(%) denotes the mAP when the IoU threshold is 0.5 [54], and the mAP@0.5–0.95(%) denotes the mAP when the IoU threshold ranges from 0.5 to 0.95. Similarly, the definitions of SDAP@0.5 and SDAP@0.5–0.95 can be inferred. The Parameters value denotes the total number of parameters in the detection model, which is used to evaluate the training cost. The GFLOPS denotes the number of billions of floating-point operations, which is used to measure the computational complexity. The FPS (frames per second) denotes the number of images inferred by the detection model within a second and is used to evaluate inference speed.
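As a small illustration of how mAP and SDAP are derived from per-class AP values, consider the following sketch; the AP numbers are made up purely to show the computation, and whether the population or sample standard deviation is used is not specified in the text.

```python
# Illustrative computation of mAP and SDAP from per-class AP values
# (the AP numbers below are invented for demonstration only).
import statistics

ap_per_class = {"operator": 0.91, "supervision": 0.88, "crane": 0.85,
                "throwing": 0.80, "LHRM": 0.90, "ladder": 0.87,
                "locked-hook": 0.86, "unlocked-hook": 0.83}

map_05 = statistics.mean(ap_per_class.values())     # mean of per-class AP
sdap_05 = statistics.pstdev(ap_per_class.values())  # standard deviation of per-class AP
print(f"mAP@0.5 = {map_05:.3f}, SDAP@0.5 = {sdap_05:.3f}")
```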

4.4. Ablation Study

This section presents ablation studies on the DC2f module, AMFE module, and OCIoU loss to assess their effectiveness. The experimental results are presented in Table 3, where ‘×’ indicates that the corresponding component has not been added, while ‘✓’ indicates that it has been added. In other words, the first row in Table 3 indicates the performance of the baseline model, i.e., YOLOv8n. The positions of the DC2f module, AMFE module, and the OCIoU loss in the DAO-YOLO model can be seen in Figure 3.
Experimental results demonstrate that the independent integration of the DC2f module yields respective improvements of 1.4% and 1.2% in the mAP@0.5(%) and mAP@0.5–0.95(%) metrics, while reducing the parameter count and GFLOPS and slightly increasing the FPS. This demonstrates that the DC2f module not only increases object detection accuracy but also improves the model’s real-time performance. When the AMFE module is used independently, it improves mAP@0.5(%) by 1.6% and mAP@0.5–0.95(%) by 1.5%, while keeping the parameter count, GFLOPS, and FPS nearly unchanged, demonstrating its effectiveness. The introduction of the OCIoU loss alone improves mAP@0.5(%) by 0.9% and mAP@0.5–0.95(%) by 2.2%, while keeping the number of parameters, GFLOPS, and FPS unchanged, demonstrating its effectiveness. Combining two or all three of DC2f, AMFE, and the OCIoU loss yields further improvements in both mAP@0.5(%) and mAP@0.5–0.95(%), while maintaining nearly identical parameters, GFLOPS, and FPS; in the combinations involving DC2f, the parameter count, GFLOPS, and FPS all improve modestly, demonstrating the effectiveness of integrating these components. The results are optimal when all three are combined, i.e., the model proposed in this paper. Compared to the baseline YOLOv8, our approach achieves 3.2% and 4.4% improvements in mAP@0.5(%) and mAP@0.5–0.95(%), respectively. Meanwhile, SDAP@0.5 and SDAP@0.5–0.95 are reduced by 0.034 and 0.019, respectively. The number of parameters and GFLOPS decrease to 2.9 M and 7.9 G, while the FPS increases to 86.7.

4.5. Comprehensive Comparison with Other YOLO Models

To assess the DAO-YOLO model’s performance, we conducted a quantitative comparison against mainstream YOLO-series architectures using identical experimental setups, including the same dataset, software, and hardware configurations. The comparative results are systematically summarized in Table 4.
As demonstrated in Table 4, the proposed model attained a mAP@0.5(%) of 88.2% and a mAP@0.5–0.95(%) of 63.9%, surpassing all seven established YOLO models in both metrics. The SDAP@0.5 and SDAP@0.5–0.95 reached 0.070 and 0.101, respectively, both lower than those of the other seven mainstream YOLO models, indicating that the proposed model has superior robustness. With 2.9 M parameters and 7.9 GFLOPS, the proposed model is more lightweight than YOLOv6n, YOLOv7t, and YOLOv8n, but slightly larger than YOLOv9t, YOLOv10n, and YOLOv11n. The model achieves 86.7 FPS, outperforming YOLOv6n, YOLOv7t, and YOLOv8n in speed but trailing behind YOLOv5n, YOLOv9t, YOLOv10n, and YOLOv11n. For power companies, FPS is the most crucial metric, as it indicates real-time image processing capability. Most power company monitoring cameras actually operate between 25 and 30 FPS. The proposed model delivers 86.7 FPS, which comfortably meets real-time operation standards. To summarize, our model not only meets real-time operational demands but also achieves the highest detection accuracy among comparable solutions. The proposed model shows significant performance advantages over conventional YOLO models in EPOVR tasks.

4.6. Subjective Evaluation

To better demonstrate DAO-YOLO’s enhanced detection capabilities, Figure 8 presents comparative results on representative power operation images before and after our improvements.
Figure 8a–d demonstrate that while YOLOv8 fails to detect irregularly shaped locked hooks, high-altitude falling objects, and operators, our model successfully identifies all objects, confirming the DC2f module’s efficacy in extracting features from irregular objects. Given the notable size variations among these three types of missed objects, the model’s detection results further demonstrate the effectiveness of the AMFE module in extracting multi-scale object features. As illustrated in Figure 8e, the bounding box predicted by the YOLOv8 model for the crane has an aspect ratio similar to the ground truth, but its overall size is significantly smaller. In contrast, the bounding box predicted by our model aligns much more closely with the ground truth, demonstrating the effectiveness of the OCIoU loss. Figure 8f shows that the YOLOv8 model fails to completely enclose the left worker within its predicted bounding box. This issue occurs due to two factors: firstly, the inherent limitation of the CIoU loss function when optimizing aspect ratios, and secondly, the cluttered background around the worker’s feet combined with color similarities between the operator’s clothing and the environment, which requires stronger feature discrimination. In contrast, the proposed model achieves accurate and complete operator detection, validating both the improved feature representation capability of the DC2f and AMFE modules and the enhanced object localization accuracy provided by the OCIoU loss.

5. Conclusions

Compared with existing object detection models for EPOVR, the advancements achieved in our work can be summarized as follows. Firstly, a DC2f module is introduced to mitigate the limited feature representation for objects with irregular shapes. Secondly, an AMFE module is introduced to overcome the limitations in feature representation for objects of various scales. Thirdly, an OCIoU bounding box regression loss is introduced to address the issue of inadequate object localization. Fourthly, an object detection dataset tailored for EPOVR is constructed to address the limited category diversity in existing datasets. Compared to previous studies, this paper primarily differs in three aspects: the proposed DC2f module, AMFE module, and OCIoU loss function, none of which have appeared in prior work. The DC2f module improves feature extraction for irregularly shaped objects by using deformable convolution to dynamically adjust sampling positions and weights, while leveraging depthwise separable convolution to lower the number of parameters and reduce computation overhead. The AMFE module combines multi-scale depthwise separable convolution, adaptive convolution, and channel attention mechanisms to enhance multi-scale feature extraction while maintaining low parameter complexity. The OCIoU loss enhances object localization accuracy by separately optimizing width and height, rather than relying on aspect ratio optimization. On the EPOVR-v1.0 dataset, ablation studies validate the effectiveness of DC2f, AMFE, and OCIoU, both individually and in combination. Comparative experiments with seven mainstream YOLO models demonstrate that our approach achieves the highest detection accuracy while maintaining real-time performance, confirming its overall efficacy.
Currently, the DAO-YOLO model can only detect the object categories listed in Table 1. However, the architecture of the DAO-YOLO model has potential practical value beyond this task; i.e., it can be applied in other scenarios, such as security monitoring and intrusion detection.
Future research will focus on two aspects: (1) enhancing the capability of contour feature extraction, and (2) expanding the object categories of the EPOVR-v1.0 dataset to cover more EPOVR scenarios.

Author Contributions

Conceptualization, X.Q.; formal analysis, X.Q.; funding acquisition, W.W.; methodology, X.Q. and Y.L.; project administration, P.X.; resources, W.W.; software, Y.L., X.D. and L.L.; supervision, W.W.; validation, W.W. and J.G.; writing—original draft, Y.L.; writing—review and editing, X.Q. and P.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Natural Science Foundation of Henan (Grant No. 252300421063), the Research Project of Henan Province Universities (Grant No. 24ZX005), the National Natural Science Foundation of China (Grant No. 62076223) and the Third Batch of Science and Technology Projects for Production Frontline of State Grid Jiangsu Electric Power Co., Ltd. in 2023 (Grant No. DL-2023Z-198).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DC2f: Deformable C2f
AMFE: Adaptive Multi-scale Feature Enhancement
OCIoU: Optimized Complete Intersection over Union
EPOVR: Electric Power Operation Violation Recognition

References

  1. Park, J.; Kang, D. Artificial Intelligence and Smart Technologies in Safety Management: A Comprehensive Analysis Across Multiple Industries. Appl. Sci. 2024, 14, 11934. [Google Scholar] [CrossRef]
  2. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  3. Qian, X.; Li, C.; Wang, W.; Yao, X.; Cheng, G. Semantic segmentation guided pseudo label mining and instance re-detection for weakly supervised object detection in remote sensing images. Int. J. Appl. Earth Obs. Geoinf. 2023, 119, 103301. [Google Scholar] [CrossRef]
  4. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  5. Qian, X.; Wang, C.; Wang, W.; Yao, X.; Cheng, G. Complete and invariant instance classifier refinement for weakly supervised object detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5627713. [Google Scholar] [CrossRef]
  6. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar]
  7. Qian, X.; Wu, B.; Cheng, G.; Yao, X.; Wang, W.; Han, J. Building a bridge of bounding box regression between oriented and horizontal object detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5605209. [Google Scholar] [CrossRef]
  8. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  9. Qian, X.; Huo, Y.; Cheng, G.; Gao, C.; Yao, X.; Wang, W. Mining high-quality pseudoinstance soft labels for weakly supervised object detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5607615. [Google Scholar] [CrossRef]
  10. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
  11. Qian, X.; Zeng, Y.; Wang, W.; Zhang, Q. Co-saliency detection guided by group weakly supervised learning. IEEE Trans. Multimed. 2022, 25, 1810–1818. [Google Scholar] [CrossRef]
  12. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850. [Google Scholar]
  13. Liu, W.; Meng, Q.; Li, Z.; Hu, X. Applications of computer vision in monitoring the unsafe behavior of construction workers: Current status and challenges. Buildings 2021, 11, 409. [Google Scholar] [CrossRef]
  14. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  15. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  16. Farhadi, A.; Redmon, J. Yolov3: An incremental improvement. In Computer Vision and Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2018; Volume 1804, pp. 1–6. [Google Scholar]
  17. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  18. Ali, M.L.; Zhang, Z. The YOLO framework: A comprehensive review of evolution, applications, and benchmarks in object detection. Computers 2024, 13, 336. [Google Scholar] [CrossRef]
  19. njvisionpower. Safety Helmet Wearing Dataset (SHWD). 2019. Available online: https://github.com/njvisionpower/Safety-Helmet-Wearing-Dataset (accessed on 23 December 2020).
  20. Huang, M.-L.; Cheng, Y. Dataset of Personal Protective Equipment (PPE). Mendeley Data 2025, V2. [Google Scholar] [CrossRef]
  21. Wang, S.; Han, J. Automated detection of exterior cladding material in urban area from street view images using deep learning. J. Build. Eng. 2024, 96, 110466. [Google Scholar] [CrossRef]
  22. Zhang, M.; Zhao, Y. Algorithm for Identifying Violations in Electrical Power Operations by Integrating Spatial Attention Mechanism and YOLOv5. In Proceedings of the 2023 3rd International Conference on Electrical Engineering and Control Science (IC2ECS), Hangzhou, China, 29–31 December 2023; pp. 1710–1714. [Google Scholar]
  23. Liu, Q.; Xu, W.; Zhou, Y.; Li, R.; Wu, D.; Luo, Y.; Chen, L. Fusing PSA to Improve YOLOv5s Detection algorithm for Electric Power Operation Wearable devices. In International Conference on Mobile Networks and Management; Springer Nature: Cham, Switzerland, 2023; pp. 121–135. [Google Scholar]
  24. Sun, C.; Zhang, S.; Qu, P.; Wu, X.; Feng, P.; Tao, Z.; Zhang, J.; Wang, Y. MCA-YOLOV5-Light: A Faster, Stronger and Lighter Algorithm for Helmet-Wearing Detection. Appl. Sci. 2022, 12, 9697. [Google Scholar] [CrossRef]
  25. He, C.; Tan, S.; Zhao, J.; Ergu, D.; Liu, F.; Ma, B.; Li, J. Efficient and Lightweight Neural Network for Hard Hat Detection. Electronics 2024, 13, 2507. [Google Scholar] [CrossRef]
  26. Lian, Y.; Li, J.; Dong, S.; Li, X. HR-YOLO: A Multi-Branch Network Model for Helmet Detection Combined with High-Resolution Network and YOLOv5. Electronics 2024, 13, 2271. [Google Scholar] [CrossRef]
  27. Zhang, Y.; Qiu, Y.; Bai, H. FEFD-YOLOV5: A Helmet Detection Algorithm Combined with Feature Enhancement and Feature Denoising. Electronics 2023, 12, 2902. [Google Scholar] [CrossRef]
  28. Wang, Y.; Jiang, F.; Li, Y.; Zhang, H.; Wang, M.; Yan, S. Safety Helmet Detection Algorithm for Complex Scenarios Based on PConv-YOLOv8. In Proceedings of the 2023 International Conference on the Cognitive Computing and Complex Data (ICCD), Huaian, China, 21–22 October 2023; pp. 90–94. [Google Scholar]
  29. Bao, J.; Li, S.; Wang, G.; Xiong, J.; Li, S. Improved YOLOV8 network and application in safety helmet detection. J. Phys. Conf. Ser. 2023, 2632, 012012. [Google Scholar] [CrossRef]
  30. Di, B.; Xiang, L.; Daoqing, Y.; Kaimin, P. MARA-YOLO: An efficient method for multiclass personal protective equipment detection. IEEE Access 2024, 12, 24866–24878. [Google Scholar] [CrossRef]
  31. Park, S.; Kim, J.; Wang, S.; Kim, J. Effectiveness of Image Augmentation Techniques on Non-Protective Personal Equipment Detection Using YOLOv8. Appl. Sci. 2025, 15, 2631. [Google Scholar] [CrossRef]
  32. Yang, X.; Wang, J.; Dong, M. SDCB-YOLO: A High-Precision Model for Detecting Safety Helmets and Reflective Clothing in Complex Environments. Appl. Sci. 2024, 14, 7267. [Google Scholar] [CrossRef]
  33. Zhang, H.; Mu, C.; Ma, X.; Guo, X.; Hu, C. MEAG-YOLO: A Novel Approach for the Accurate Detection of Personal Protective Equipment in Substations. Appl. Sci. 2024, 14, 4766. [Google Scholar] [CrossRef]
  34. Han, D.; Ying, C.; Tian, Z.; Dong, Y.; Chen, L.; Wu, X.; Jiang, Z. YOLOv8s-SNC: An Improved Safety-Helmet-Wearing Detection Algorithm Based on YOLOv8. Buildings 2024, 14, 3883. [Google Scholar] [CrossRef]
  35. Wang, J.; Sang, B.; Zhang, B.; Liu, W. A Safety Helmet Detection Model Based on YOLOv8-ADSC in Complex Working Environments. Electronics 2024, 13, 4589. [Google Scholar] [CrossRef]
  36. Tang, P.; Xiao, B.; Su, Z.; Gao, F. Wearable Detection Application of Protective Equipment for Live Working Based on YOLOv9. In Proceedings of the IEEE 2024 8th International Conference on Smart Grid and Smart Cities (ICSGSC), Shanghai, China, 25–27 October 2024; pp. 406–411. [Google Scholar]
  37. Zhang, L.; Sun, Z.; Tao, H.; Wang, M.; Yi, W. Research on Mine-Personnel Helmet Detection Based on Multi-Strategy-Improved YOLOv11. Sensors 2024, 25, 170. [Google Scholar] [CrossRef]
  38. Eum, I.; Kim, J.; Wang, S.; Kim, J. Heavy Equipment Detection on Construction Sites Using You Only Look Once (YOLO-Version 10) with Transformer Architectures. Appl. Sci. 2025, 15, 2320. [Google Scholar] [CrossRef]
  39. Wang, S.; Hae, H.; Kim, J. Development of easily accessible electricity consumption model using open data and GA-SVR. Energies 2018, 11, 373. [Google Scholar] [CrossRef]
  40. Jia, X.; Zhou, X.; Shi, Z.; Xu, Q.; Zhang, G. GeoIoU-SEA-YOLO: An Advanced Model for Detecting Unsafe Behaviors on Construction Sites. Sensors 2025, 25, 1238. [Google Scholar] [CrossRef]
  41. Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391. [Google Scholar]
  42. Zhu, X.; Hu, H.; Lin, S.; Dai, J. Deformable convnets v2: More deformable, better results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9308–9316. [Google Scholar]
  43. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768. [Google Scholar]
  44. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. AAAI Conf. Artif. Intell. 2020, 34, 12993–13000. [Google Scholar] [CrossRef]
  45. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  46. Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21002–21012. [Google Scholar]
  47. Jocher, G.; Chaurasia, A.; Qiu, J.; Ultralytics. YOLOv5. GitHub Repository. 2020. Available online: https://github.com/ultralytics/yolov5 (accessed on 27 May 2020).
  48. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
  49. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  50. Jocher, G.; Chaurasia, A. Ultralytics. YOLOv8. GitHub Repository. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 10 January 2023).
  51. Wang, C.Y.; Yeh, I.H.; Mark Liao, H.Y. Yolov9: Learning what you want to learn using programmable gradient information. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2024; pp. 1–21. [Google Scholar]
  52. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. Yolov10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011. [Google Scholar]
  53. Khanam, R.; Hussain, M. Yolov11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar]
  54. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
Figure 1. Detection results of YOLOv8 and the proposed DAO-YOLO models. (a,b) are from the EPOVR-v1.0 dataset.
Figure 2. Illustrations of the detection of safety helmets and reflective vests. (a,b) are cited from the public datasets SHWD [19] and PPE [20], respectively.
Figure 3. Architecture of our DAO-YOLO model. The red dashed lines indicate the innovations of the DAO-YOLO model, where DC2f, AMFE, and OCIoU loss denote the deformable C2f, adaptive multiscale feature enhancement, and optimized bounding box regression loss, respectively. The ’×2’ denotes two ConvModules connected in series.
Figure 4. Architecture of DC2f Module. (a) Overall architecture; (b) architecture of DBottleneck.
Figure 5. Illustration of DConv. The green, black, and orange arrows denote the signal flow, the direction of offset, and the spatial correspondence between current convolution position and sampling points, respectively.
Figure 6. Architecture of AMFE module.
Figure 7. Illustration of OCIoU loss.
Figure 8. Subjective evaluation of detection results between our DAO-YOLO and YOLOv8 models. (af) are from the EPOVR-v1.0 dataset.
Table 1. Correspondence between target categories and violation categories.
No. | Violation Category | Object Category
1 | Unattended operation site | operator, supervision
2 | Personnel standing under lifted loads during crane operation | crane, operator
3 | Workers throwing objects from height | throwing
4 | Ladder without height restriction marking | ladder with height restriction marking (LHRM), ladder
5 | Hooks of lever hoists, pulley hooks, or cranes lacking locking devices | locked-hook, unlocked-hook
Table 2. The explicit distribution of training, validation and testing samples in each object category. The NTR, NVA and NTE denote the number of training, validation and testing samples, respectively.
Category | NTR | NVA | NTE
operator | 110 | 16 | 30
supervision | 106 | 13 | 28
crane | 103 | 15 | 31
throwing | 104 | 15 | 30
LHRM | 105 | 16 | 29
ladder | 102 | 13 | 30
locked-hook | 106 | 14 | 28
unlocked-hook | 104 | 18 | 34
Total | 840 | 120 | 240
Table 3. Ablation study of our DAO-YOLO model.
DC2f | AMFE | OCIoU | mAP@0.5(%) | mAP@0.5–0.95(%) | SDAP@0.5 | SDAP@0.5–0.95 | Parameters/M | GFLOPS | FPS
× | × | × | 85.0 | 59.5 | 0.104 | 0.120 | 3.0 | 8.1 | 86.2
✓ | × | × | 86.4 | 60.7 | 0.093 | 0.115 | 2.9 | 7.9 | 87.1
× | ✓ | × | 86.6 | 61.0 | 0.089 | 0.112 | 3.0 | 8.1 | 85.7
× | × | ✓ | 85.9 | 61.7 | 0.098 | 0.109 | 3.0 | 8.1 | 86.2
✓ | ✓ | × | 87.6 | 62.6 | 0.075 | 0.106 | 2.9 | 7.9 | 86.6
✓ | × | ✓ | 87.1 | 63.2 | 0.083 | 0.104 | 2.9 | 7.9 | 87.2
× | ✓ | ✓ | 87.4 | 63.5 | 0.079 | 0.103 | 3.0 | 8.1 | 85.8
✓ | ✓ | ✓ | 88.2 | 63.9 | 0.070 | 0.101 | 2.9 | 7.9 | 86.7
Table 4. Quantitative comparison between our DAO-YOLO and other YOLO models.
Method | mAP@0.5(%) | mAP@0.5–0.95(%) | SDAP@0.5 | SDAP@0.5–0.95 | Parameters/M | GFLOPS | FPS
YOLOv5n | 83.2 | 57.3 | 0.133 | 0.137 | 2.5 | 7.1 | 88.7
YOLOv6n | 82.5 | 56.9 | 0.159 | 0.141 | 4.2 | 12.1 | 83.3
YOLOv7t | 84.3 | 58.5 | 0.119 | 0.129 | 6.0 | 13.1 | 79.6
YOLOv8n | 85.0 | 59.5 | 0.104 | 0.120 | 3.0 | 8.1 | 86.2
YOLOv9t | 82.3 | 57.7 | 0.144 | 0.136 | 2.0 | 7.7 | 87.8
YOLOv10n | 81.9 | 56.8 | 0.162 | 0.151 | 2.3 | 6.5 | 89.7
YOLOv11n | 84.3 | 59.0 | 0.117 | 0.125 | 2.6 | 6.3 | 90.3
DAO-YOLO | 88.2 | 63.9 | 0.070 | 0.101 | 2.9 | 7.9 | 86.7