Article

Dynamic Anomaly Detection Method for Pumping Units Based on Multi-Scale Feature Enhancement and Low-Light Optimization

Kun Tan, Shuting Wang, Yaming Mao, Shunyi Wang and Guoqing Han
1 China Petroleum Safety and Environmental Protection Technology Research Institute Co., Ltd., Beijing 102206, China
2 College of Petroleum Engineering, China University of Petroleum (East China), Qingdao 266580, China
3 College of Petroleum Engineering, China University of Petroleum-Beijing, Beijing 102249, China
* Author to whom correspondence should be addressed.
Processes 2025, 13(10), 3038; https://doi.org/10.3390/pr13103038
Submission received: 20 August 2025 / Revised: 16 September 2025 / Accepted: 18 September 2025 / Published: 23 September 2025
(This article belongs to the Section Energy Systems)

Abstract

Abnormal shutdown detection in oilfield pumping units presents significant challenges, including degraded image quality under low-light conditions, difficulty in detecting small or obscured targets, and limited capabilities for dynamic state perception. Previous approaches, such as traditional visual inspection and conventional image processing, often struggle with these limitations. To address these challenges, this study proposes an intelligent method integrating multi-scale feature enhancement and low-light image optimization. Specifically, a lightweight low-light enhancement framework is developed based on the Zero-DCE algorithm, improving the deep curve estimation network (DCE-Net) and non-reference loss functions through training on oilfield multi-exposure datasets. This significantly enhances brightness and detail retention in complex lighting conditions. The DAFE-Net detection model incorporates a four-level feature pyramid (P3–P6), channel-spatial attention mechanisms (CBAM), and Focal-EIoU loss to improve localization of small/occluded targets. Inter-frame difference algorithms further analyze motion states for robust “pump-off” determination. Experimental results on 5000 annotated images show the DAFE-Net achieves 93.9% mAP@50%, 96.5% recall, and 35 ms inference time, outperforming YOLOv11 and Faster R-CNN. Field tests confirm 93.9% accuracy under extreme conditions (e.g., strong illumination fluctuations and dust occlusion), demonstrating the method’s effectiveness in enabling intelligent monitoring across seven operational areas in the Changqing Oilfield while offering a scalable solution for real-time dynamic anomaly detection in industrial equipment monitoring.

1. Introduction

The increasing demands for safety and operational efficiency in oil and gas production have driven rapid advancements in intelligent video detection technology. Historically, the development of such systems has been a progressive journey built upon foundational innovations; since the 1980s, intelligent technologies in oil and gas engineering have achieved continuous breakthroughs. The American Petroleum Solutions (APS) company pioneered the integrated analytical controller for pumping units, enabling equipment fault diagnosis through key performance indicator monitoring. National Supply Company (NSCO) innovatively developed an intelligent pumping system integrating microprocessors and adaptive electronic controllers. In the 1990s, Cobb’s team achieved real-time visualization of oil production processes through the first fiber-optic downhole video system. In computer vision research, Ross Girshick’s 2014 R-CNN network combined region proposal generation with convolutional neural networks [1], while Joseph Redmon’s 2015 YOLOv1 algorithm achieved a breakthrough in real-time object detection [2]. Wang Y.’s team extended deep learning to multi-object detection applications [3], and Joy A. et al. successfully applied it to pedestrian recognition in automotive driving scenarios. In structural mechanics, Liu S. H. revealed the multi-load failure mechanisms of coiled tubing through experimental and finite element analysis, proposing optimized design parameters to enhance load-bearing capacity [4]. In 2018, Qi Guanqiu’s team developed a reciprocating compressor fault diagnosis system combining data denoising, sparse coding, and SVM classification, achieving over 80% recognition accuracy with five years of operational data [5]. Carpenter established a deep learning cuttings classification model in 2020 [6], while Wang Yi’an addressed dynamic intrusion target tracking in oilfields through trajectory prediction and path optimization algorithms that resolve local optima issues [7]. In 2021, Wang Yanwei’s team applied Mask R-CNN to remaining oil image recognition, achieving 93.83% classification accuracy [8]. Shumakov optimized emission monitoring systems through a dynamic flare monitoring platform [9], while Wang Qiang’s multi-resolution parallel network model improved sucker rod keypoint detection accuracy by 6% compared with traditional convolutional networks [10]. In 2023, Huang Zongchao’s time-series image conversion technique combined with convolutional attention residual networks enhanced progressive detection capabilities for pump leakage faults [11]. Ban Zhe implemented Bayesian parameter estimation for electric submersible pumps using MCMC algorithms and sliding window techniques [12], and Yu Hongda’s TSCODE-SIMAM-YOLOv5 model improved crack recognition through 3D attention mechanism modifications, achieving 91.8% detection accuracy [13]. In 2024, Ling Zhou’s LRFT image preprocessing method enhanced image quality in low-light/underwater environments through feature fusion and local statistical transfer strategies [14]. In 2025, Wu XinYa’s SERep-CCNet lightweight model combining SENetV2 and RepHead modules achieved an optimized balance between accuracy and resource consumption in drilling equipment detection [15]. Additionally, Liang Ma’s multi-source heterogeneous data fusion framework integrating AIGAN and MRMR algorithms demonstrated significant advantages in industrial fault diagnosis through data completion and feature optimization [16].
While the aforementioned intelligent monitoring technologies in the oil and gas industry have advanced considerably to date, the precise identification of ‘abnormal shutdowns’ for the pumping unit—a core component of onshore oil extraction and a consistent focus throughout this technology’s evolution—still confronts two major bottlenecks: the inability of traditional algorithms to capture its unique non-continuous motion patterns, and severe image quality degradation caused by harsh on-site visual environments.
As critical equipment in land-based oilfields, pumping units directly impact production efficiency and safety. Abnormal pump shutdowns are common safety hazards that may cause production losses, equipment damage, or secondary accidents, and the coexistence of planned maintenance and sudden faults increases the risk of misjudgment. Intelligent management systems maintain optimal operational states, maximizing equipment efficiency. Traditional optical flow and background modeling methods rely on continuous motion features and struggle to identify non-continuous states such as donkey-head oscillation stagnation. Additionally, field conditions such as abrupt illumination changes, dust occlusion, and equipment interference degrade imaging stability and feature extraction accuracy. To address these limitations, this study proposes a two-stage solution: (1) a Zero-DCE-optimized low-light enhancement framework, addressing image quality degradation; and (2) a dynamic attention-based feature enhancement network (DAFE-Net) object detection model incorporating an inter-frame difference dynamic state perception algorithm, to improve the safety, accuracy, and real-time performance of pump shutdown detection.

2. Analysis of the Low-Light Enhancement Algorithm Framework

Due to outdoor deployment conditions of cameras at oil production sites, surveillance images often suffer from degradation caused by environmental factors such as insufficient illumination, complex light sources, and uneven brightness distribution. Captured images typically exhibit low luminance values, low contrast, or localized overexposure, with critical details becoming indistinct. Such low-quality, low-contrast images severely compromise the detection of safety violations at production sites. Enhancing low-light images through computational image processing methods can improve image quality and recognition accuracy.

2.1. Overall Algorithm Framework

The Zero-Reference Deep Curve Estimation (Zero-DCE) [17] algorithm framework (Figure 1) employs a convolutional neural network to enhance images without reference data. Its core principle involves constructing a lightweight neural network (DCE-Net) to estimate high-order [18] mapping curves for each pixel in an image, achieving dynamic enhancement effects. This method offers two distinct advantages. First, its lightweight network architecture ensures real-time inference performance, making it suitable for edge computing scenarios. Second, it supports unsupervised learning: whereas traditional methods rely on paired (low-light/ground-truth) or unpaired (low-light/normal-light) datasets for training, Zero-DCE introduces spatial consistency and exposure control loss functions, thereby enabling unsupervised training. Technically, this approach shares similarities with Gamma transformation in applying nonlinear mapping functions to adjust image response curves. However, Zero-DCE leverages a deep learning framework for adaptive parameter optimization, significantly enhancing adaptability and robustness. Notably, its “unpaired” characteristic manifests in two aspects:
  • Complete independence from manual annotations or external datasets;
  • Implementation of an unsupervised learning framework where only low-light images are input, with enhancement quality optimized indirectly through loss functions, eliminating reliance on paired/unpaired data.
The Zero-DCE framework consists of three components: the Light-Enhancement Curve (LE-curve), the Deep Curve Estimation Network [19] (DCE-Net), and the non-reference loss functions. The overall processing pipeline first normalizes the pixel values of input low-light RGB images from the original range [0, 255] to [0, 1]. These normalized values are then fed into the DCE-Net to estimate optimal illumination enhancement curves. Subsequently, the LE-curve is iteratively applied eight times across all three channels of the RGB image to enhance parameter adaptability. This iterative process generates 24 parameter maps corresponding to the 24-channel output of DCE-Net, enabling pixel-wise mapping for all RGB channels.
Figure 1. Overall framework of the Zero-DCE algorithm.

2.2. Light-Enhancement Curve Function

The core mechanism of Zero-DCE establishes a dynamic nonlinear mapping between input low-light pixel values L(x) and the enhanced output space. This mapping is formulated as a curvature-adjustable second-order equation, as defined in Equation (1):
$LE_n(x) = LE_{n-1}(x) + \alpha_n \, LE_{n-1}(x)\left(1 - LE_{n-1}(x)\right)$
where $n$ is the number of iterations, a critical hyperparameter that governs the overall curve complexity and curvature adjustment capacity, and $\alpha_n \in [-1, 1]$ is a trainable curve parameter controlling the magnitude of enhancement at the $n$-th iteration.
However, a single second-order term demonstrates limited adaptability to complex illumination conditions (e.g., extreme contrast or non-uniform lighting). To endow the model with stronger nonlinear representation capability and more flexible local regulation, Zero-DCE extends the formulation to higher-order equations through iterative stacking of nonlinear terms. Moreover, to address the limitations of global scalar parameters in handling images with significant local illumination discrepancies (e.g., localized shadows or highlights)—which may cause under-enhancement or over-enhancement in specific regions—the global scalar parameters are generalized to spatially adaptive parameter maps. The modified higher-order illumination enhancement function is formulated as Equation (2):
$LE_n(x) = LE_{n-1}(x) + A_n(x) \, LE_{n-1}(x)\left(1 - LE_{n-1}(x)\right)$
where $A_n$ is a parameter map of the same size as the given image [20].
In Equation (2), each pixel $x$ possesses an independently controlled parameter $A_n(x)$ at the $n$-th iteration stage. This design endows the model with strong local adaptability: the parameter map $A_n$ learns to dynamically adjust the nonlinear enhancement curvature at each pixel location based on the local brightness, contrast, and contextual information of the input image. This mechanism enables targeted illumination correction across different regions (e.g., significantly enhancing shadow details without overexposing bright regions, or suppressing localized highlight saturation), thereby substantially improving enhancement performance and robustness under complex non-uniform lighting conditions.
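As an illustration of how Equations (1) and (2) are applied in practice, the following sketch (hypothetical code, not the authors' released implementation) stacks eight curve iterations over a normalized RGB image using per-pixel parameter maps of the kind DCE-Net predicts; the array shapes and random stand-in inputs are assumptions for demonstration only.

```python
import numpy as np

def apply_le_curves(image, curve_maps):
    """Iteratively apply the light-enhancement curve of Equation (2).

    image: H x W x 3 array with pixel values normalized to [0, 1].
    curve_maps: list of 8 per-pixel parameter maps A_n (each H x W x 3,
                values in [-1, 1]), i.e., the 24-channel DCE-Net output
                split into 8 RGB maps.
    """
    enhanced = image.astype(np.float32)
    for A_n in curve_maps:
        # LE_n(x) = LE_{n-1}(x) + A_n(x) * LE_{n-1}(x) * (1 - LE_{n-1}(x))
        enhanced = enhanced + A_n * enhanced * (1.0 - enhanced)
    return np.clip(enhanced, 0.0, 1.0)

# Random stand-ins for a dark frame and a DCE-Net output (illustration only)
low_light = 0.2 * np.random.rand(256, 256, 3).astype(np.float32)
maps = [np.random.uniform(-1, 1, (256, 256, 3)).astype(np.float32) for _ in range(8)]
result = apply_le_curves(low_light, maps)
```

Because each $A_n$ varies per pixel, shadowed regions can receive stronger curvature than already-bright regions, which is the local adaptability described above.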

2.3. Optimization of Deep Curve Estimation Network

Zero-DCE implements nonlinear mapping enhancement for low-light images through the lightweight DCE-Net architecture. The original DCE-Net employs a seven-layer symmetric cascade convolutional architecture combined with ReLU and Tanh activation functions to constrain output ranges, achieving complex mapping modeling with only 79,416 parameters and 5.21 GFLOPs of computation. This demonstrates its suitability for resource-constrained scenarios such as mobile devices. However, the high computational complexity of standard convolutional layers limits deployment efficiency in oilfield edge monitoring applications. To address the intelligent recognition requirements for low-light images, this work proposes a dual-discrimination mechanism based on gray-level statistical features and spatial distribution characteristics.
The gray-level deviation from the mean, as defined in Equation (3), quantifies the overall deviation of all pixel gray-level values relative to the mean intensity a in the image:
$AVG = \frac{\sum_{i=0}^{w-1}\sum_{j=0}^{h-1}\left(x_{i,j} - a\right)}{w \times h}$
where $x_{i,j}$ is the gray value at coordinates $(i, j)$ in the image; $a$ is the mean gray level of the image; $w$ is the width of the image; $h$ is the height of the image. When AVG > 0, the overall image is brighter; when AVG < 0, the overall image is darker; when AVG = 0, the grayscale distribution of the image is symmetric about the mean $a$.
The mean deviation, as shown in Equation (4), is a measure reflecting the dispersion degree of the grayscale value distribution in the image, with higher values indicating a more dispersed grayscale distribution.
$A.D. = \frac{\sum_{i=0}^{255}\left|i - a - AVG\right| \times H_i}{w \times h}$
where $H_i$ is the number of pixels with a gray value of $i$ [20].
The brightness determination index is given in Equation (5) as the ratio of the absolute gray-level deviation $\left|AVG\right|$ to the mean deviation $A.D.$:
$S = \frac{\left|AVG\right|}{A.D.}$
When $S \leq 1$, the image brightness is normal; when $S > 1$, the gray levels deviate significantly from the mean, indicating a low-light image if $AVG < 0$ or an overexposed image if $AVG > 0$.
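The brightness discrimination of Equations (3)–(5) can be sketched as follows. This is illustrative code rather than the authors' implementation; in particular, the reference level a is taken here as the mid-gray value 128 (an assumption made so that the signed deviation in Equation (3) is informative), and the decision rule mirrors the criterion stated above.

```python
import numpy as np

def brightness_state(gray, a=128.0):
    """Compute AVG (Eq. 3), A.D. (Eq. 4) and S (Eq. 5) for a grayscale frame.

    gray: H x W uint8 array of gray levels in [0, 255].
    a: reference gray level; 128 (mid-gray) is assumed here.
    """
    g = gray.astype(np.float64)
    h, w = g.shape
    # Eq. (3): signed deviation of all pixel gray levels from the reference a
    avg = (g - a).sum() / (w * h)
    # Eq. (4): dispersion of the gray-level histogram H_i around a + AVG
    hist, _ = np.histogram(g, bins=256, range=(0, 256))
    levels = np.arange(256, dtype=np.float64)
    ad = (np.abs(levels - a - avg) * hist).sum() / (w * h)
    # Eq. (5): brightness determination index
    s = abs(avg) / ad if ad > 0 else 0.0
    return avg, ad, s

# Usage sketch: only frames judged dark are sent for enhancement
# avg, ad, s = brightness_state(frame_gray)
# is_low_light = (s > 1.0) and (avg < 0)
```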
Although the unsupervised learning framework of Zero-DCE can dynamically enhance low-light images, its deployment efficiency in oilfield on-site monitoring is constrained by the high parameter count and computational complexity of standard convolution layers. To address this, this paper proposes Zero-DSOpt (Zero-DCE with Depthwise Separable Convolution Optimization), which replaces the standard convolution layers in DCE-Net with depthwise separable convolutions, markedly reducing both the parameter count and the computational load of the network while maintaining the quality of low-light image enhancement. Depthwise separable convolution decomposes a standard convolution into two steps:
  • Depthwise convolution applies 3 × 3 kernels (stride = 1) independently to each channel of the input feature map H × W × C, generating an intermediate feature map H × W × C with 3 × 3 × C parameters;
  • Pointwise convolution linearly combines channels via 1 × 1 kernels (stride = 1), producing the output feature map H × W × C′ with C × C′ parameters [21].
The total parameter count of a depthwise separable convolution is $3 \times 3 \times C + C \times C'$, compared with $3 \times 3 \times C \times C'$ for a standard convolution. When $C' \gg K^2$ (with $K = 3$), the parameter count is reduced by approximately 8–9 times. In terms of computational cost, the FLOPs of a depthwise separable convolution are approximately $\frac{1}{C'} + \frac{1}{K^2}$ of those of a standard convolution, which in practice reduces inference time by roughly 50%. Performance comparisons before and after the algorithm improvement are presented in Table 1.
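The replacement of standard convolutions can be sketched in PyTorch as below; this is a schematic module, not the released Zero-DSOpt code, and the channel widths used for the parameter comparison are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by 1x1 pointwise convolution,
    the building block used to slim down the DCE-Net layers."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Depthwise: one 3x3 kernel per input channel (groups=in_ch), 3*3*C weights
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1,
                                   groups=in_ch, bias=False)
        # Pointwise: 1x1 kernels mixing channels, C*C' weights
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Parameter comparison for C = C' = 32 (illustrative channel widths)
std = nn.Conv2d(32, 32, 3, padding=1, bias=False)
dsc = DepthwiseSeparableConv(32, 32)
n_std = sum(p.numel() for p in std.parameters())   # 3*3*32*32 = 9216
n_dsc = sum(p.numel() for p in dsc.parameters())   # 3*3*32 + 32*32 = 1312, ~7x fewer
print(n_std, n_dsc)
```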

2.4. Experimental Analysis

To evaluate the performance differences between Zero-DSOpt and other mainstream low-light enhancement algorithms [20], this study conducts comparative experiments based on the LOL dataset (containing paired images under normal and low-light conditions in natural scenes). Partial experimental results are presented in Figure 2. The quantitative evaluation comprises three core metrics: PSNR (peak signal-to-noise ratio) for noise characterization, SSIM (structural similarity index) for structural consistency measurement [22], and per-frame computational efficiency as the temporal performance benchmark. The experimental environment is configured with an Intel i7-9700K CPU, 32 GB RAM, and an NVIDIA GeForce GTX TITAN X GPU.
Conventional image enhancement techniques, such as gamma correction and histogram equalization, are inherently prone to noise amplification. While the single-scale Retinex (SSR) method offers improved noise reduction by leveraging Retinex theory, it often introduces color distortion issues. Among deep learning approaches, the RetinexNet model suffers from image blurring artifacts and suboptimal noise suppression performance. In comparison, both KinD and Zero-DSOpt demonstrate superior noise control capabilities, though the Zero-DSOpt algorithm exhibits a characteristic luminance deficiency requiring further optimization. Furthermore, the computational latency of KinD is approximately 47 times higher than that of the Zero-DSOpt method, as quantitatively demonstrated in Table 2.
Some of the dataset images are shown in Figure 3. The overall brightness produced by the Zero-DSOpt algorithm is slightly lower than that of the KinD algorithm. Zero-DSOpt places few requirements on the training data: even when the training set contains only low-illumination images, it can still produce a good enhancement effect, although this tends to cause overexposure. If a multi-exposure sequence dataset is used instead (i.e., images of the same scene captured at varying exposure levels), the final enhancement results exhibit richer detail. Therefore, to meet the image-quality requirements of the drilling site, this study expanded the Zero-DSOpt training dataset. The expanded data were drawn from surveillance video of more than 70 well teams of the Yellow River Drilling Corporation [20], accessed with time series as markers. Finally, 2400 images were selected as the expanded dataset.
The original Zero-DSOpt training set consisted of 2422 images from the SICE dataset. A combined dataset was constructed by integrating the SICE dataset with the drilling site multi-exposure sequence dataset, resulting in a total of 4822 images for training the Zero-DSOpt model in drilling site applications. The model trained on the combined dataset was compared with KinD, where Zero-DSOpt* denotes results from the combined dataset. Experimental outcomes are presented in Figure 4.
Performance comparisons for drilling site applications using the combined dataset are shown in Figure 5.
Quantitative analysis results after combined dataset training are summarized in Table 3.
Experimental analysis reveals that Zero-DSOpt demonstrates significant performance improvements following joint training. Subjectively, the enhanced images exhibit improved overall brightness compared to previous results, with previously obscured details becoming visible, including the chair textures in the first row of Figure 5 and the equipment textures in Figure 6. Quantitative evaluations using the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) demonstrate notable improvements in enhancement performance. While Zero-DSOpt slightly underperforms KinD in PSNR and SSIM after combined training, its time complexity remains substantially lower than that of KinD (Kindling the Darkness). Considering both enhancement quality and real-time processing capabilities, the combined-dataset-trained Zero-DSOpt algorithm provides the optimal solution for field deployment. Its preprocessing performance in drilling site applications is illustrated in Figure 6.
Further analysis confirms the effectiveness of Zero-DSOpt in addressing three typical low-light challenges at drilling sites: global low-light conditions, local low-light conditions, and human-specific low-light scenarios. Information degradation caused by insufficient illumination, backlighting, and shadowed regions is effectively mitigated. For instance, personnel previously obscured under night-time low-light conditions (the first and third rows of Figure 6) become clearly visible. Additionally, as shown in the second row of Figure 6, drilling equipment exhibiting local low-light characteristics due to backlighting recovers significant image detail after processing.
Since low-light preprocessing consumes additional computational resources and may induce overexposure in well-illuminated frames, this study integrates low-light image detection, enhancement algorithms, and object detection to achieve real-time processing capabilities for drilling site applications. The complete low-light preprocessing pipeline based on Zero-DSOpt is illustrated in Figure 7.
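A minimal sketch of the gating logic of Figure 7 is given below, reusing the brightness_state() sketch above; enhancer and detector are hypothetical placeholders for the trained Zero-DSOpt and DAFE-Net models, and the S > 1 criterion follows Equation (5).

```python
import cv2

def preprocess_and_detect(frame, enhancer, detector, s_thresh=1.0):
    """Gate low-light enhancement before detection, following Figure 7.

    enhancer and detector stand in for the trained Zero-DSOpt and DAFE-Net
    models; brightness_state() is the statistics sketch given earlier.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    avg, ad, s = brightness_state(gray)
    # Enhance only frames judged dark, so well-lit frames are neither
    # overexposed nor charged with unnecessary enhancement cost.
    if s > s_thresh and avg < 0:
        frame = enhancer(frame)
    return detector(frame)
```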

3. DAFE-Net: An Integrated Framework for Object Detection and Motion Analysis

Detecting dynamic pumping units presents significant challenges, including the recognition of small targets, interference from occlusions, and multi-scale variations. To address these issues, this study proposes the DAFE-Net (Dynamic Attention-based Feature Enhancement with Dynamic Feature Perception) model. The network architecture is illustrated in Figure 8.
Its name originates from the following core improvements:
  • D (Deep): Constructing a four-layer feature pyramid network (P3–P6) to deepen multi-scale semantic fusion and enhance the deep network’s semantic modeling capability for small targets;
  • A (Attention-based): Embedding the CBAM module in the backbone to strengthen target edge feature representation through parallel interaction of channel and spatial attention;
  • F (Feature Enhancement): Combining anchor box distribution optimization with Focal-EIoU loss function reconstruction to precisely constrain predicted box width-height discrepancies and improve boundary regression accuracy;
  • E (Enhancement): Introducing an inter-frame difference algorithm to achieve motion state perception through coordinate recording and dynamic threshold analysis, assisting “pump stop” judgment.
The algorithm workflow consists of three stages:
  • Constructing a multi-level feature pyramid to strengthen multi-scale semantic extraction;
  • Embedding the CBAM module to collaboratively optimize channel and spatial features;
  • Employing the Focal-EIoU loss function and inter-frame difference algorithm to realize size constraints and dynamic motion state perception.
Furthermore, by integrating the inter-frame difference algorithm, the system captures pixel-level changes in target regions in real time and dynamically updates detection results through threshold segmentation, thereby enhancing sensitivity to pumping unit motion states. Experimental results demonstrate that DAFE-Net significantly improves detection accuracy and stability for pumping units in complex scenarios while maintaining the model’s lightweight advantage. The pump stop detection workflow of DAFE-Net is shown in Figure 9.

3.1. Backbone Network Integration with CBAM

The DAFE-Net backbone network constructs a lightweight feature learning framework suitable for complex industrial scenarios by integrating multi-scale feature extraction with an efficient information aggregation mechanism. This design combines the Focus structure and Cross Stage Partial (CSP) module with the Convolutional Block Attention Module (CBAM) to address feature distribution heterogeneity in small target detection tasks.
The preprocessing effect of the Focus module is shown in Figure 10. The network employs an improved Focus structure at the input stage for preprocessing the original image. This module partitions the input image into four complementary local regions through an asymmetric pixel slicing strategy. It then concatenates these sub-images along the channel dimension, quadrupling the number of input channels (e.g., converting RGB three-channel input into a 12-channel feature map). While preserving complete spatial information, this design achieves twice the downsampling through lightweight convolution operations, significantly enhancing feature extraction density and efficiency. Compared to conventional pooling operations, the Focus structure reduces information loss risks through local region complementarity, establishing a foundation for subsequent multi-scale feature fusion.
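The pixel-slicing step of the Focus structure can be illustrated as follows (a schematic sketch; the 640 × 640 input size matches the training configuration in Table 5, while everything else is assumed for demonstration).

```python
import torch

def focus_slice(x):
    """Focus-style pixel slicing: split an image into four complementary
    sub-grids and stack them on the channel axis.

    x: N x C x H x W tensor with even H and W. Returns N x 4C x H/2 x W/2.
    """
    top_left     = x[..., 0::2, 0::2]
    bottom_left  = x[..., 1::2, 0::2]
    top_right    = x[..., 0::2, 1::2]
    bottom_right = x[..., 1::2, 1::2]
    return torch.cat([top_left, bottom_left, top_right, bottom_right], dim=1)

# An RGB input of 3 x 640 x 640 becomes 12 x 320 x 320 before the next convolution
x = torch.randn(1, 3, 640, 640)
print(focus_slice(x).shape)   # torch.Size([1, 12, 320, 320])
```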
Building upon this, DAFE-Net incorporates the Cross Stage Partial (CSP) structure as the core component of the backbone network. This module partitions the input feature maps into two independent pathways: the primary pathway extracts high-level semantic features through a simplified convolutional network (which consists of stacked Bottleneck modules), while the auxiliary pathway directly transmits low-level detail features. The outputs of both pathways are integrated via a dynamic concatenation strategy, generating fused feature maps that combine hierarchical representation capabilities. The CSP structure effectively reduces redundant computations and enhances the model’s perception [21] of target contours and texture details through collaborative utilization of multi-level features. Moreover, the CSP module significantly reduces model parameters through cross-layer feature reuse mechanisms while improving feature extraction robustness, providing stable representation foundations for complex scene detection tasks.
To refine discriminative feature representation, DAFE-Net embeds the Convolutional Block Attention Module (CBAM) at critical nodes of the backbone network, as illustrated in Figure 11. This module constructs cross-dimensional feature integration pathways through collaborative optimization of channel and spatial attention mechanisms. Specifically, the channel attention mechanism employs global average pooling and parameterized weight allocation to dynamically amplify contributions from critical channels. The spatial attention mechanism generates spatial saliency maps via multi-scale convolutional operations, highlighting edge features in target regions. These two mechanisms achieve dynamic interaction strength adjustment through learnable fusion coefficients, significantly enhancing feature map semantic richness and spatial focus without additional parameters. To address heterogeneous feature distributions across network layers, the CBAM module introduces dynamic weight allocation strategies, enabling the network to autonomously regulate the interaction intensity between channel and spatial features. This design achieves precise focusing on multi-scale features in complex industrial scenarios.
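A minimal channel-plus-spatial attention block in the spirit of CBAM is sketched below. The reduction ratio of 16 and the 7 × 7 spatial kernel are assumed defaults, and the block follows the common sequential channel-then-spatial form; the learnable fusion coefficients described above are omitted for brevity.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Minimal channel + spatial attention block; assumed defaults only."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        n, c, _, _ = x.shape
        # Channel attention: pool spatially, then re-weight channels
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx  = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(n, c, 1, 1)
        # Spatial attention: pool across channels, then re-weight positions
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```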
The proposed architecture integrates Focus module feature compression, CSP module cross-layer fusion, and CBAM attention enhancement to construct an efficient and robust feature extraction framework. Experimental results demonstrate that this design significantly improves model adaptability to small targets and occluded scenes while maintaining computational efficiency, offering an innovative solution for the real-time and accuracy requirements of industrial detection tasks.

3.2. Multi-Scale Feature Pyramid Enhancement

This study proposes an improved multi-scale feature fusion architecture that integrates bi-directional information transmission mechanisms, with the Cross Stage Partial Network (CSPNet) employed as the core module and foundational unit for feature fusion [23]. In the top-down FPN pathway, high-level semantic features are progressively upsampled and fused with low-level features through element-wise operations, effectively enhancing semantic feature representation. The bottom-up PAN pathway preserves local detail information via cross-layer connections to establish bi-directional feature interaction. To address feature degradation issues in conventional FPN structures, CSPNet maintains feature expression integrity while reducing computational complexity through grouped convolution and cross-group feature concatenation strategies. Experimental results demonstrate that this design significantly improves cross-scale feature representation capabilities while meeting real-time processing requirements.
To resolve occlusion interference challenges in small target detection under complex oilfield scenarios, this study systematically optimizes the feature pyramid network, as illustrated in Figure 12. First, topological structure reconstruction extends the traditional three-layer FPN to a four-layer architecture by adding a P6-level feature map to the backbone network, expanding the maximum receptive field to 2.3 times that of the original structure and substantially improving semantic feature capture for micro-targets. Second, during feature fusion, the clustering optimization algorithm adaptively adjusts spatial distribution parameters of predefined anchor boxes based on statistical characteristics of target size distributions, establishing a detection framework covering P3-P6 four-scale levels. This multi-scale detection architecture employs a gradient resolution increment mechanism across feature maps, effectively mitigating conflicts between shallow feature loss and deep feature localization deviations, thereby enhancing small target detection accuracy.
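The anchor-box adaptation described above can be approximated with a k-means clustering of annotated box sizes under a 1 − IoU distance, as sketched below; the choice of k = 12 (three anchors for each of the P3–P6 levels) and the clustering details are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np

def kmeans_anchors(wh, k=12, iters=100, seed=0):
    """Cluster ground-truth box sizes into k anchors using 1 - IoU as distance.

    wh: N x 2 array of (width, height) collected from the annotated dataset.
    """
    rng = np.random.default_rng(seed)
    anchors = wh[rng.choice(len(wh), k, replace=False)].astype(np.float64)
    for _ in range(iters):
        # IoU between each box and each anchor, assuming aligned top-left corners
        inter = np.minimum(wh[:, None, 0], anchors[None, :, 0]) * \
                np.minimum(wh[:, None, 1], anchors[None, :, 1])
        union = wh[:, None, 0] * wh[:, None, 1] + \
                anchors[None, :, 0] * anchors[None, :, 1] - inter
        assign = np.argmax(inter / union, axis=1)
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = wh[assign == j].mean(axis=0)
    # Sort by area so anchors can be split evenly across P3, P4, P5 and P6
    return anchors[np.argsort(anchors.prod(axis=1))]
```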

3.3. EIoU Loss Function Reconstruction

To improve the limited bounding box regression accuracy of conventional algorithms, this study adopts an optimized loss function based on geometric features. The classical IoU loss, built on the Intersection over Union (IoU) metric, evaluates localization error by quantifying the overlap between the predicted and ground-truth bounding boxes as the ratio of their intersection area to their union area [24].
Specifically, for each predicted bounding box, the IoU value is computed against all ground-truth boxes, and the ground-truth box with the maximum IoU is selected as the matching target to calculate the IoU loss [25]. The key mathematical formulations are presented in Equations (6) and (7):
$IoU = \frac{\left|B \cap B^{gt}\right|}{\left|B \cup B^{gt}\right|}$
where $B^{gt} = \left(x^{gt}, y^{gt}, w^{gt}, h^{gt}\right)$ is the ground-truth box and $B = \left(x, y, w, h\right)$ is the predicted regression box [26].
$L_{IoU} = 1 - \frac{\left|B \cap B^{gt}\right|}{\left|B \cup B^{gt}\right|}$
This method effectively guides model optimization when target boxes overlap, but exhibits significant limitations in non-overlapping scenarios. As shown in State 1 in the illustrative diagram, when predicted boxes have no intersection with ground-truth boxes, the IoU value remains fixed at zero, preventing gradient updates. Furthermore, different box shapes (e.g., States 2 and 3 in the diagram) may produce identical loss values, a degeneracy phenomenon that severely impairs model convergence efficiency and localization accuracy. Comparative analysis of IoU values between predicted and ground-truth boxes under different states is presented in Figure 13.
To address these limitations, this study adopts the Expected Intersection over Union (EIoU) loss function [27], which constructs a ternary optimization framework incorporating overlap loss, center distance loss [28], and width–height loss, as formulated in Equation (8):
$L_{EIoU} = 1 - IoU + \frac{\rho^2\left(b, b^{gt}\right)}{C^2} + \frac{\rho^2\left(w, w^{gt}\right)}{C_w^2} + \frac{\rho^2\left(h, h^{gt}\right)}{C_h^2}$
where $w$ and $h$ are the width and height of the predicted box, respectively; $w^{gt}$ and $h^{gt}$ are the width and height of the ground-truth box, respectively [29]; $C_w$ and $C_h$ are the width and height of the smallest enclosing box covering the predicted and ground-truth boxes; $b$ and $b^{gt}$ denote the center points of the two boxes, $\rho\left(\cdot\right)$ is the Euclidean distance, and $C$ is the diagonal length of the smallest enclosing box.
EIoU adds the width–height loss term of Equation (9), which directly minimizes the dimensional deviation between the predicted and ground-truth boxes, thereby significantly improving the regression accuracy of the pumping-unit coordinates $\left(x_i, y_i, w_i, h_i\right)$.
$L_{hw} = \frac{\left(w - w^{gt}\right)^2}{C_w^2} + \frac{\left(h - h^{gt}\right)^2}{C_h^2}$
To address the imbalance problem of high/low quality anchor box distribution in object detection, the Focal EIoU loss function is adopted for optimization, as shown in Equation (10):
$L_{Focal\text{-}EIoU} = IoU^{\gamma} \, L_{EIoU}$
where $\gamma$ is a regulation factor that controls the sample weight distribution. By adjusting $\gamma$, the learning signals from high-quality anchor boxes (those with larger IoU values) are emphasized, while the negative influence on parameter updates of low-quality samples caused by blurring or occlusion is suppressed. This dynamic adjustment mechanism effectively prevents performance degradation caused by low-quality samples dominating the training process, significantly enhancing the detector’s effective feature extraction capability.
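A sketch of Equations (8)–(10) for axis-aligned boxes in (x1, y1, x2, y2) form is given below. It is illustrative rather than the authors' training code; $C^2$ is taken as the squared diagonal of the smallest enclosing box, and γ = 0.5 is an assumed default.

```python
import torch

def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-7):
    """Focal-EIoU loss of Eqs. (8)-(10) for N x 4 boxes given as (x1, y1, x2, y2)."""
    # Intersection and union for the IoU term
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2:] - pred[:, :2]).clamp(min=0).prod(dim=1)
    area_t = (target[:, 2:] - target[:, :2]).clamp(min=0).prod(dim=1)
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box covering the predicted and ground-truth boxes
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    cw, ch = (enc_rb - enc_lt).unbind(dim=1)

    # Center-distance, width and height terms of Eq. (8)
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    rho2_center = ((cp - ct) ** 2).sum(dim=1)
    wp, hp = (pred[:, 2:] - pred[:, :2]).unbind(dim=1)
    wt, ht = (target[:, 2:] - target[:, :2]).unbind(dim=1)

    eiou = (1 - iou
            + rho2_center / (cw ** 2 + ch ** 2 + eps)
            + (wp - wt) ** 2 / (cw ** 2 + eps)
            + (hp - ht) ** 2 / (ch ** 2 + eps))
    # Eq. (10): weight each sample by IoU^gamma so high-quality boxes dominate
    return (iou.detach().clamp(min=0) ** gamma * eiou).mean()
```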

3.4. Inter-Frame Difference Dynamic Perception Integration

For motion detection in video streams with a fixed background, this study employs the inter-frame difference method. The algorithm generates a grayscale difference map representing motion regions through pixel-wise difference operations between consecutive video frames, followed by dynamic threshold segmentation to extract significant change regions. The detection results are illustrated in Figure 14. The core advantage lies in segmenting fast-moving targets: when inter-frame displacement exceeds the preset threshold, the spatial migration characteristics of target contours can be effectively captured.
Firstly, a threshold T is defined, followed by subtracting consecutive frames to generate a difference image. Subsequent binarization is applied to the difference image, after which connectivity analysis of the binary image enables moving targets to be distinguished. The flowchart of the two-frame difference method is illustrated in Figure 15. The thresholding procedure is mathematically formulated in Equations (11) and (12):
$D_n(x, y) = \left|f_n(x, y) - f_{n-1}(x, y)\right|$
where $D_n(x, y)$ is the gray value of the difference image; $f_n(x, y)$ is the gray value of the $n$-th frame; $f_{n-1}(x, y)$ is the gray value of the $(n-1)$-th frame.
$R_n(x, y) = \begin{cases} 255, & D_n(x, y) > T \\ 0, & \text{otherwise} \end{cases}$
where $R_n(x, y)$ is the gray value of the binarized image and $T$ is the binarization threshold of the difference image (dimensionless).
Within a fixed time interval $T$, the detected pumping unit regions are segmented and resized to fixed dimensions. The original image coordinates $\left(x_i, y_i, w_i, h_i\right)$ are recorded, where $\left(x_i, y_i\right)$ denotes the top-left coordinates of the detection box and $\left(w_i, h_i\right)$ represents its width and height. By storing precise detection box coordinates, the system rapidly locates target regions in subsequent frames. The inter-frame difference method is applied to the cropped pumping unit regions to detect motion states. The variation area threshold $C$ between consecutive frames is iteratively adjusted to determine an optimal value that effectively distinguishes pumping unit activity. The detailed workflow is illustrated in Figure 16.
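The motion-state judgment can be sketched as follows using OpenCV; the gray-level threshold T = 25 and the change-area threshold are placeholder values to be tuned on site, and the crop sequence is assumed to come from the DAFE-Net detection boxes.

```python
import cv2
import numpy as np

def motion_ratio(prev_crop, curr_crop, T=25):
    """Two-frame difference (Eqs. 11-12) over a cropped pumping-unit region.
    Returns the fraction of pixels whose gray value changed by more than T."""
    prev_g = cv2.cvtColor(prev_crop, cv2.COLOR_BGR2GRAY)
    curr_g = cv2.cvtColor(curr_crop, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(curr_g, prev_g)                          # D_n(x, y)
    _, binary = cv2.threshold(diff, T, 255, cv2.THRESH_BINARY)  # R_n(x, y)
    return np.count_nonzero(binary) / binary.size

def is_pump_stopped(crops, area_thresh=0.01):
    """Judge 'pump stop' from a sequence of crops taken over the interval T;
    area_thresh is an assumed change-area threshold C to be tuned on site."""
    ratios = [motion_ratio(a, b) for a, b in zip(crops[:-1], crops[1:])]
    return max(ratios, default=0.0) < area_thresh
```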

3.5. Experimental Analysis

  • Data Acquisition: Firstly, video stream data was collected from oilfield operation sites, including pumping units under both shutdown and normal operating conditions. The video content should contain pumping units at varying image scales to cover targets at different distances.
  • Data Preprocessing: The collected video streams underwent frame extraction to convert videos into individual frames. Subsequently, redundant frames with high similarity were removed through manual screening.
  • Image Annotation: The LabelImg tool was employed to annotate pumping unit components in the filtered images, with labels defined as “pumping unit,” “horse head,” “walking beam,” and “pumping unit base.” The “pumping unit base” category includes components such as counterweights and pulley wheels. Label names and quantities are summarized in Table 4.
  • Dataset division: The complete dataset was randomly partitioned into a 9:1 training-to-test split to ensure robust model validation.

3.5.1. Experimental Environment

The experiments were conducted on a Windows 10 operating system with an Intel(R) Core(TM) i7-10700 CPU @ 2.90 GHz processor and 32 GB RAM. The GPU configuration utilized an NVIDIA GeForce GTX TITAN X. Parameter settings are detailed in Table 5.

3.5.2. Comparative Experiments

For recognition accuracy evaluation, precision and recall metrics from the confusion matrix were adopted as fundamental indicators. The mean average precision (mAP) was calculated based on precision and recall to comprehensively assess model performance across all categories [30], using an intersection-to-union ratio (IoU) of 0.5 as the benchmark for determining detection success. Comparative experimental results among different models are presented in Table 6.
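For reference, the per-class average precision used in mAP@50% can be computed as sketched below once each detection has been matched to ground truth at IoU ≥ 0.5; the trapezoidal integration of the precision–recall curve is a simplification of the usual interpolated AP and is shown for illustration only.

```python
import numpy as np

def average_precision(scores, is_tp, n_gt):
    """AP for one class at a fixed IoU threshold (0.5 applied upstream when
    deciding whether each detection is a true positive).

    scores: confidence of each detection; is_tp: 1 if the detection matched an
    unmatched ground-truth box with IoU >= 0.5, else 0; n_gt: number of
    ground-truth boxes for this class.
    """
    order = np.argsort(-np.asarray(scores))
    tp = np.cumsum(np.asarray(is_tp)[order])
    fp = np.cumsum(1 - np.asarray(is_tp)[order])
    recall = tp / max(n_gt, 1)
    precision = tp / np.maximum(tp + fp, 1)
    # Area under the precision-recall curve (trapezoidal approximation)
    return float(np.trapz(precision, recall))

# mAP@50% is then the mean of the per-class AP over the four pumping-unit labels
```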
The results in Table 6 show that DAFE-Net achieves the highest mAP (93.9%) and recall (96.5%) among all tested models, including Faster R-CNN, SSD, DETR, and YOLOv11. Specifically, the improved DAFE-Net model shows significant accuracy improvements in pumping unit detection without notable increases in inference time, achieving real-time detection capability with high accuracy.
The dataset comprising 5000 images was partitioned into a training subset and a test subset following the predefined ratio. After model training on the training subset, the recognition accuracy and recall rates are summarized in Table 7.
The enhanced DAFE-Net model attains an average precision of 93.9%, recall rate of 96.5%, and inference time of 35 ms. Compared to YOLOv11, the average precision improves by 0.9%, demonstrating superior performance over baseline models such as Faster-RCNN and SSD. Field tests confirm a pumping unit detection accuracy of 93.9%, validating the algorithm’s robustness in complex scenarios and fulfilling requirements for real-time pump stop detection.

3.5.3. Visualization Analysis

As shown in Figure 17, the improved DAFE-Net model demonstrates substantial performance enhancement in abnormal pump stop detection, achieving 93.9% overall mAP@50%. When recall reaches 80%, the model maintains precision above 85%, with peak precision at 97%. The precision curve exhibits a gradual decline in the high-recall region (60–100%), indicating effective false detection suppression while covering most abnormal samples. Particularly at recall thresholds ≥80%, precision consistently exceeds 85%, validating the adaptability of the feature enhancement and dynamic threshold strategies to complex industrial conditions and providing technical support for accurate early warning.
As shown in Figure 18, the model demonstrates balanced performance in multi-component anomaly detection. Specifically, the mAP values for “ketouji_arm” and “ketouji_bottom” reach 92.9% and 87.1% respectively, exceeding those of the baseline system. The multi-class evaluation curves exhibit a gradual decline within the high-recall region, avoiding abrupt performance degradation and maintaining controlled false positive rates under high recall conditions. In practical industrial applications, the model achieves robust accuracy in detecting abnormal shutdown events for critical components of pumping units (e.g., arms and bases), with significantly reduced miss detection rates. This characteristic provides high-reliability decision-making support for real-time monitoring systems, fulfilling operational requirements under complex working conditions.

4. Application Results

Based on the “Anyan Project” initiated by PetroChina Company Limited, this study conducted industrial-scale field validation tests in seven representative operation zones (e.g., Xinghe and Xingnan) of the Changqing Oilfield through the “Tianxuan” intelligent video analysis platform (Figure 19). A comprehensive multi-modal intelligent analysis methodology was implemented, establishing a full-coverage video monitoring network for pumping unit clusters, gathering stations, and pipeline corridor equipment via cloud-edge-terminal collaborative deployment of “Tianxuan” inference all-in-one devices and intelligent analysis boxes. The experimental data demonstrate that the DAFE-Net model sustains a mean detection accuracy of 93.9% and real-time response at the 35 ms level under extreme working conditions such as strong light fluctuations, dust occlusion, and partial occlusion of equipment. This field application validates that the intelligent detection system significantly enhances safety supervision robustness in complex oil and gas field scenarios, providing key technical support for the large-scale implementation of the “Anyan Project” in PetroChina’s upstream operations.

5. Discussion

Deployment Feasibility and Robustness. The field tests confirm the practical deployment feasibility of the proposed system. The use of a cloud-edge-terminal architecture, with “Tianxuan” inference devices performing analysis at the edge, minimizes network latency and bandwidth requirements. The model’s real-time performance (35 ms per frame) is crucial for safety-critical applications, ensuring that alerts for events like pump stops are generated without delay. The sustained high accuracy (93.9%) across seven different operational zones and under diverse environmental stressors validates the model’s generalization capability and robustness, making it a viable solution for large-scale, automated monitoring.
Limitations and Future Work. Despite its strong performance, the current system has limitations, primarily related to the inter-frame difference method used for motion perception. This method, while computationally efficient, has several drawbacks: (1) Sensitivity to Environmental Noise: It is susceptible to false positives caused by camera jitter (e.g., from wind) or sudden global illumination changes, such as passing clouds or shifting shadows. (2) Insensitivity to Slow Motion: If a pumping unit slows down very gradually before stopping, the pixel-level changes between consecutive frames may fall below the detection threshold, causing a missed event. (3) Lack of Semantic Understanding: The method only detects pixel changes; it does not inherently understand the context of the motion.
To address these limitations and further enhance system robustness, future research will focus on several key areas. One promising direction is to replace the inter-frame difference algorithm with more sophisticated motion analysis techniques like optical flow, which can estimate the direction and magnitude of motion and is less sensitive to uniform lighting changes.
A more advanced approach would be to develop an end-to-end video analysis model that inherently learns temporal dynamics. Architectures such as 3D Convolutional Neural Networks (3D-CNNs) or Video Transformers could be trained on short video clips to recognize not only the pumping unit but also its operational state (running, stopping, stopped) directly, creating a more integrated and robust solution. Finally, fusing video data with other sensor modalities, such as acoustic or vibration sensors, could provide a definitive, multi-modal confirmation of the pumping unit’s operational status, creating a highly reliable intelligent monitoring system.

6. Conclusions

To address challenges of poor low-light imaging quality, small-target miss detection, and insufficient dynamic perception in dynamic anomaly detection for oilfield pumping units, this study proposes an intelligent detection method integrating multi-scale feature enhancement and low-light image optimization, achieving high-precision real-time detection under complex working conditions.
  • A lightweight low-light enhancement framework (Zero-DSOpt) was developed by modifying the Zero-DCE algorithm. Through Depthwise separable convolution optimization, the model achieves improved parameter efficiency and inference speed, significantly enhancing image quality in terms of PSNR and SSIM metrics. This effectively resolves information loss issues caused by sudden illumination changes and shadow occlusion in oilfield environments.
  • The DAFE-Net model was designed by integrating frame difference algorithms with multi-scale feature fusion strategies, incorporating a four-level feature pyramid, CBAM attention mechanism, and Focal-EIoU loss function for precise detection of small targets and occluded scenarios. Testing on a 5000-image oilfield dataset achieved 93.9% mAP@50%, 96.5% recall, and 35 ms inference time, outperforming mainstream algorithms including YOLOv11 and Faster R-CNN.
  • Field validation through the “Tianxuan” intelligent video platform in seven Changqing Oilfield operation zones confirms that the proposed method maintains high detection accuracy under extreme conditions such as intense illumination fluctuations and dust occlusion, providing reliable technical support for oilfield safety management and promoting large-scale implementation of the “Anyan Project”.

Author Contributions

Conceptualization, K.T.; methodology, Y.M. and S.W. (Shunyi Wang); software, K.T.; validation, K.T. and S.W. (Shuting Wang); formal analysis, Y.M.; investigation, S.W. (Shunyi Wang); resources, G.H.; data curation, K.T. and S.W. (Shuting Wang); writing—original draft preparation, S.W. (Shuting Wang); writing—review and editing, S.W. (Shuting Wang); visualization, Y.M.; supervision, G.H.; project administration, K.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to the data being internal proprietary information of the collaborating company.

Conflicts of Interest

Authors Kun Tan, Yaming Mao and Shunyi Wang were employed by China Petroleum Safety and Environmental Protection Technology Research Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
Zero-DCE: Zero-Reference Deep Curve Estimation
DCE-Net: Deep Curve Estimation Network
LE-curve: Light-Enhancement Curve
Zero-DSOpt: Zero-DCE with Depthwise Separable Convolution Optimization
LOL: LOw-Light dataset
HE: Histogram Equalization
SSR: Single-Scale Retinex
KinD: Kindling the Darkness
PSNR: Peak Signal-to-Noise Ratio
SSIM: Structural Similarity Index Metrics
DAFE-Net: Dynamic Attention-based Feature Enhancement with Dynamic Feature Perception
CSP: Cross Stage Partial
CBAM: Convolutional Block Attention Module
RGB: RGB color mode
FPN: Feature Pyramid Network
PAN: Path Aggregation Network
CSPNet: Cross Stage Partial Network
IoU: Intersection over Union
EIoU: Expected Intersection over Union
Faster-RCNN: Region-based Convolutional Neural Networks
SSD: Single Shot MultiBox Detector
Detr: DEtection TRansformer
YOLOv11: You Only Look Once v11
mAP: Mean Average Precision
AIGAN: Attention-encoding Integrated Generative Adversarial Network
LRFT: Local Reference Feature Transfer
NSCO: National Supply Company
APS: American Petroleum Solutions
GFLOPs: Giga Floating Point Operations Per Second
ReLU: Rectified Linear Unit
MRMR: Max-Relevance and Min-Redundancy

References

  1. Cobb, C.C.; Schultz, P.K. A Real-Time Fiber Optic Downhole Video System. In Proceedings of the Offshore Technology Conference IV: Field Drilling and Development System, Richardson, TX, USA, 4–7 May 1992; pp. 575–582. [Google Scholar] [CrossRef]
  2. Girshick, R.; Donahue, J.; Darrell, T. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
  3. Redmon, J.; Divvala, S.; Girshick, R. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; Institute of Electrical and Electronics Engineers: New York, NY, USA, 2016; pp. 779–788. [Google Scholar] [CrossRef]
  4. Liu, S.; Xiao, H.; Guan, F. Coiled tubing failure analysis and ultimate bearing capacity under multi-group load. Eng. Fail. Anal. 2017, 10, 7981. [Google Scholar] [CrossRef]
  5. Qi, G.; Zhu, Z.; Erqinhu, K.; Chen, Y.; Chai, Y.; Sun, J. Fault-Diagnosis for Reciprocating Compressors Using Big Data and Machine Learning. Simul. Model. Pract. Theory 2018, 80, 104–127. [Google Scholar] [CrossRef]
  6. Carpenter, C. Deep-Learning Techniques Classify Cuttings Volume of Shale Shakers. J. Pet. Technol. 2020, 72, 61–62. [Google Scholar] [CrossRef]
  7. Wang, Y.; Li, K.; Han, Y.; Ge, F.; Xu, W.; Liu, L. Tracking a Dynamic Invading Target by UAV in Oilfield Inspection via an Improved Bat Algorithm. Appl. Soft Comput. 2020, 90, 106150. [Google Scholar] [CrossRef]
  8. Wang, Y.; Liu, H.; Guo, M.; Shen, X.; Han, B.; Zhou, Y. Image Recognition Model Based on Deep Learning for Remaining Oil Recognition from Visualization Experiment. Fuel 2021, 291, 120216. [Google Scholar] [CrossRef]
  9. Shumakov, Y.A.; Zhandin, A.; Comley, R.; Theuveny, B. Dynamic Flare Monitoring Platform for Continuous Emission Monitoring and Reduction During Well Test and Well Cleanup Operations. In Proceedings of the Abu Dhabi International Petroleum Exhibition and Conference (ADIPEC 2022), Abu Dhabi, United Arab Emirates, 31 October–3 November 2022. [Google Scholar] [CrossRef]
  10. Wang, Q.; Zhang, K.; Zhao, H.; Zhang, H.; Zhang, L.; Yan, X.; Liu, P.; Fan, L.; Yang, Y.; Yao, J. A Novel Method for Trajectory Recognition and Working Condition Diagnosis of Sucker Rod Pumping Systems Based on High-Resolution Representation Learning. J. Pet. Sci. Eng. 2022, 218, 110931. [Google Scholar] [CrossRef]
  11. Huang, Z.; Li, K.; Ke, C. An intelligent diagnosis method for oil-well pump leakage fault in oilfield production Internet of Things system based on convolutional attention residual learning. Eng. Appl. Artif. Intell. 2023, 126, 106829. [Google Scholar] [CrossRef]
  12. Ban, Z.; Pfeiffer, C. Dynamic parameter estimation and uncertainty analysis of electrical submersible pumps-lifted oil field using Markov chain Monte Carlo approaches. Geoenergy Sci. Eng. 2024, 240, 212954. [Google Scholar] [CrossRef]
  13. Yu, H.; Pan, B.; Guo, Y. Automatic fracture identification from logging images using the TSCODE-SIMAM-YOLOv5 algorithm. Geoenergy Sci. Eng. 2024, 243, 213319. [Google Scholar] [CrossRef]
  14. Zhou, L.; Zhang, W.; Zheng, Y. Local Reference Feature Transfer (LRFT): A simple pre-processing step for image enhancement. Pattern Recognit. Lett. 2024, 186, 330–336. [Google Scholar] [CrossRef]
  15. Wu, X.; Li, Q.; Gao, X. Drill tool recognition and detection with SERep-CCNet: A lightweight model approach. Geoenergy Sci. Eng. 2025, 250, 213844. [Google Scholar] [CrossRef]
  16. Ma, L.; Yang, Q.; Llanes-Santiago, O. A multi-source heterogeneous data fusion framework for fault diagnosis in industrial processes with missing image data. Measurement 2025, 256, 118278. [Google Scholar] [CrossRef]
  17. Zhao, H.; Xu, S.; Peng, L.; Hu, H.; Jiang, S. Efficient Gamma-Based Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement. Appl. Sci. 2025, 15, 7382. [Google Scholar] [CrossRef]
  18. Guo, C.; Li, C.; Guo, J.; Loy, C.C.; Hou, J.; Kwong, S. Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 1777–1786. [Google Scholar] [CrossRef]
  19. Deng, J.; Yao, Y.; Rao, M.; Yang, Y.; Luo, C.; Li, Z.; Hua, X.; Chen, B. Automated Detection Method for Bolt Detachment of Wind Turbines in Low-Light Scenarios. Energies 2025, 18, 2197. [Google Scholar] [CrossRef]
  20. Zhang, Q.; Bai, E.; Shao, M.; Liang, H.; Yang, J. Real-Time Enhancement Algorithm of Low-Light Image Based on Zero-DCE. In Proceedings of the International Conference on Frontiers Technology of Information and Computer (ICFTIC), Qingdao, China, 2–4 December 2022; pp. 58–67. [Google Scholar] [CrossRef]
  21. Jiang, H.; Hu, F.; Fu, X.; Chen, C.; Wang, C.; Tian, L.; Shi, Y. YOLOv8-Peas: A Lightweight Drought Tolerance Method for Peas Based on Seed Germination Vigor. Front. Plant Sci. 2023, 14, 1257947. [Google Scholar] [CrossRef]
  22. Yang, Q.; Yue, Z. Spatial Display Model of Oil Painting Art Based on Digital Vision Design. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 78. [Google Scholar] [CrossRef]
  23. Xie, T.; Han, W.; Xu, S. YOLO-RS: A More Accurate and Faster Object Detection Method for Remote Sensing Images. Remote Sens. 2023, 15, 3863. [Google Scholar] [CrossRef]
  24. Feng, G.; Yang, Q.; Tang, C.; Liu, Y.; Wu, X.; Wu, W. Mask-Wearing Detection in Complex Environments Based on Improved YOLOv7. Appl. Sci. 2024, 14, 3606. [Google Scholar] [CrossRef]
  25. Liu, Z.; Zhong, X.; Wang, C.; Wu, G.; He, F.; Wang, J.; Yang, D. Rapid and accurate detection of peanut pod appearance quality based on lightweight and improved YOLOv5_SSE model. Front. Plant Sci. 2025, 16, 1494688. [Google Scholar] [CrossRef]
  26. Zhu, Q.; Ma, K.; Wang, Z.; Shi, P. YOLOv7-CSAW for maritime target detection. Front. Neurorobot. 2023, 17, 1210470. [Google Scholar] [CrossRef]
  27. Sun, F.; Lv, Q.; Bian, Y.; He, R.; Lv, D.; Gao, L.; Wu, H.; Li, X. Grape Target Detection Method in Orchard Environment Based on Improved YOLOv7. Agronomy 2025, 15, 42. [Google Scholar] [CrossRef]
  28. Song, Q.; Zhou, Z.; Ji, S.; Cui, T.; Yao, B.; Liu, Z. A Multiscale Parallel Pedestrian Recognition Algorithm Based on YOLOv5. Electronics 2024, 13, 1989. [Google Scholar] [CrossRef]
  29. Sun, Y.; Huang, L.; Zhao, J.; Li, X.; Qiu, M. DSMFFNet: Depthwise separable multiscale feature fusion network for bridge detection in very high resolution satellite images. Geocarto Int. 2022, 38, 1–26. [Google Scholar] [CrossRef]
  30. Yong, P.; Li, S.; Wang, K.; Zhu, Y. A Real-Time Detection Algorithm Based on Nanodet for Pavement Cracks by Incorporating Attention Mechanism. In Proceedings of the 2022 8th International Conference on Hydraulic and Civil Engineering: Deep Space Intelligent Development and Utilization Forum (ICHCE), Xi’an, China, 25–27 November 2022; pp. 1245–1250. [Google Scholar] [CrossRef]
Figure 2. Partial experimental results under normal light and low light. (a) Original image. (b) Gamma. (c) HE. (d) SSR. (e) RetinexNet. (f) KinD. (g) Zero-DSOpt. (h) Ground truth.
Figure 3. Part of the dataset.
Figure 4. Comparison Experiment Results of the sice dataset. (a) Original image. (b) KinD. (c) Zero-DSOpt. (d) Zero-DSOpt*. (e) Ground truth.
Figure 5. Comparison experiment results of the drilling operation site. (a) Original image. (b) Zero-DSOpt. (c) Zero-DSOpt*.
Figure 6. Zero-DSOpt* preprocessing result graph. (a) Original image. (b) After preprocessing.
Figure 7. Network Structure Diagram of the Zero-DSOpt algorithm.
Figure 8. Network Structure Diagram of DAFE-Net algorithm.
Figure 9. The pump stop detection workflow of DAFE-Net.
Figure 10. Focus preprocessing effect diagram.
Figure 11. Attention Module.
Figure 12. Improved FPN structure diagram.
Figure 13. Comparison of the intersection and union ratio (IOU) between the prediction box and the real box under different states.
Figure 14. Example of inter-frame difference detection. (a) Scene 1. (b) Scene 2. (c) The detection result of the inter-frame difference method.
Figure 15. DAFE-Net Stop sampling Detection flowchart.
Figure 16. The specific process of the detection when the pumping unit stops pumping.
Figure 17. PR Curve Graph.
Figure 18. Confusion matrix.
Figure 19. Application Scenario diagram.
Table 1. Performance comparison before and after algorithm improvement.

| Method | DSconv | PSNR (dB) | Number of Parameters | Speed (s) |
|---|---|---|---|---|
| Zero-DCE | × | 31.57 | 79,416 | 0.012 |
| Zero-DSOpt | √ | 31.61 | 11,926 | 0.006 |
Table 2. Quantitative analysis results of each algorithm.

| Method | PSNR | SSIM | Speed (s) |
|---|---|---|---|
| Gamma | 14.12 | 0.55 | 0.0084 |
| HE | 13.84 | 0.47 | 0.0053 |
| SSR | 13.45 | 0.66 | 0.186 |
| RetinexNet | 13.95 | 0.64 | 0.069 |
| KinD | 16.65 | 0.82 | 0.052 |
| Zero-DSOpt | 14.87 | 0.69 | 0.0011 |
Table 3. Quantitative analysis results after combined dataset training.

| Method | PSNR | SSIM | Speed (s) |
|---|---|---|---|
| KinD | 16.65 | 0.82 | 0.052 |
| Zero-DSOpt | 14.87 | 0.69 | 0.0011 |
| Zero-DSOpt* | 15.76 | 0.75 | 0.0013 |
Table 4. Label names and quantities.

| Label Name | chouyouji | chouyouji_head | chouyouji_arm | chouyouji_bottom |
|---|---|---|---|---|
| Quantity | 4494 | 5267 | 4461 | 4494 |
Table 5. Parameter settings.

| Parameter Name | Parameter Value |
|---|---|
| Momentum | 0.937 |
| Weight_decay | 0.0005 |
| Batch_size | 16 |
| Learning_rate | 0.01 |
| Epochs | 300 |
| Image_size | 640 × 640 |
Table 6. Comparative experiments of different models.

| Model | mAP (%) | Recall (%) | Inference Time (ms) |
|---|---|---|---|
| Faster-RCNN | 91.9 | 92.5 | 191 |
| SSD | 75.2 | 77.0 | 98 |
| Detr | 93.4 | 95.9 | 28 |
| YOLOv11 | 93.0 | 95.7 | 32 |
| DAFE-Net | 93.9 | 96.5 | 35 |
Table 7. Results of model training experiments.

| Dataset | Recognized Object | mAP (%) | Recall (%) |
|---|---|---|---|
| Training set | chouyouji | 94.90 | 96.50 |
|  | chouyouji_head | 94.13 | 95.91 |
|  | chouyouji_arm | 92.13 | 93.91 |
|  | chouyouji_bottom | 91.13 | 89.91 |
| Test set | chouyouji | 94.12 | 95.50 |
|  | chouyouji_head | 93.13 | 95.10 |
|  | chouyouji_arm | 91.53 | 93.01 |
|  | chouyouji_bottom | 90.53 | 89.01 |
| Field testing | chouyouji | 93.9 | 95.5 |
|  | chouyouji_head | 92.13 | 94.91 |
|  | chouyouji_arm | 91.13 | 92.91 |
|  | chouyouji_bottom | 90.13 | 88.91 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
