Article

Coal Shearer Drum Detection in Underground Mines Based on DCS-YOLO

1 State Key Laboratory of Intelligent Coal Mining and Strata Control, Shanghai 200030, China
2 China Coal Technology and Engineering Group Shanghai Co., Ltd., Shanghai 200030, China
3 School of Mechanical Engineering and Power, Shanghai Jiao Tong University, Shanghai 200240, China
* Authors to whom correspondence should be addressed.
Electronics 2025, 14(20), 4132; https://doi.org/10.3390/electronics14204132
Submission received: 25 September 2025 / Revised: 17 October 2025 / Accepted: 20 October 2025 / Published: 21 October 2025
(This article belongs to the Section Artificial Intelligence)

Abstract

To address the challenges of low illumination, heavy dust, and severe occlusion in fully mechanized mining faces, this paper proposes a shearer drum detection algorithm named DCS-YOLO. To strengthen the model's ability to capture features under drum deformation and occlusion, a C3k2_DCNv4 module based on deformable convolution v4 (DCNv4) is incorporated into the network. This module adaptively adjusts convolution sampling points according to the drum's size and position, enabling efficient and precise multi-scale feature extraction. To overcome the limitations of conventional convolution in global feature modeling, a convolution and attention fusion module (CAFM) is constructed, which combines lightweight convolution with attention mechanisms to selectively reweight feature maps at different resolutions. Under low-light conditions, the Shape-IoU loss function is employed to achieve accurate regression of irregular drum boundaries while considering both positional and shape similarity. In addition, GSConv is adopted to lighten the model while maintaining efficient feature extraction. Experiments were conducted on a dataset built from shearer drum images collected in underground coal mines. The results demonstrate that, compared with YOLOv11n, the proposed method reduces Params and Flops by 7.7% and 4.6%, respectively, while improving precision, recall, mAP@0.5, and mAP@0.5:0.95 by 2.9, 3.2, 1.1, and 3.3 percentage points, respectively. These findings highlight the significant advantages of the proposed approach in both model lightweighting and detection performance.

1. Introduction

With the rapid development of the coal industry toward intelligent and unmanned operation, the shearer has become the core piece of equipment on fully mechanized mining faces, and its operating status directly affects both coal production efficiency and equipment safety. During the cutting process, the shearer drum is the most critical working component: its posture variation, contour boundaries, and contact regions dynamically reflect the real-time cutting conditions [1,2]. Traditional manual inspection and simple visual detection methods can no longer meet the requirements for high-precision, real-time monitoring [3]. Accurate acquisition of drum status not only helps to prevent collisions between the drum and the hydraulic shield but also enables timely monitoring of defects such as tooth wear and loss, thereby ensuring equipment safety and production continuity [4,5]. However, the complex underground environment poses severe challenges for drum detection: heavy dust, poor illumination, occlusion from hydraulic supports, and multi-posture drum movements all increase the detection difficulty [6].
In current research on monitoring of fully mechanized mining faces, scholars mainly focus on the attitude recognition and collision detection of hydraulic support shield plates. Ren et al. [7] integrated feature point information from depth cameras to achieve accurate calculation of shield plate height and posture angles. Wang et al. [8] proposed a collision detection method combining virtual rays and bounding boxes, though the cost is relatively high. Zhang et al. [9] improved shield plate attitude measurement accuracy by fusing inclinometer and gyroscope data with Kalman filtering. Another work by Zhang et al. [10] built a distributed edge detection system based on 5G communication to enable remote monitoring of shield plate posture. However, these methods have rarely addressed the direct pose detection of coal shearer drums. In contrast, recent studies have increasingly focused on applying deep learning techniques in underground coal mines, targeting tasks such as coal-gangue identification [11], coal-rock interface recognition [12], and industrial conveyor belt tear detection [13]. These works demonstrate that artificial intelligence can be effectively employed for various detection and recognition tasks in complex underground environments. Despite these advances, most systems still rely on manual assessment of drum status, limiting the feasibility of intelligent and continuous monitoring and hindering the broader adoption of automated coal mining systems.
With the rapid development of deep learning and computer vision technologies, convolutional neural networks (CNNs) have been widely applied in object detection and recognition [14,15,16]. Approaches such as Faster R-CNN [17], SSD [18], RT-DETR [19], and the YOLO series [20,21,22] have become mainstream methods in industrial object detection. Compared with Faster R-CNN and RT-DETR, YOLO demonstrates distinct advantages through its single-stage detection architecture, enabling end-to-end training and efficient inference. This design delivers superior performance in terms of speed and real-time capability, while multi-scale feature fusion enhances the detection of small objects. Furthermore, YOLO's lightweight structure and flexible deployment make it particularly suitable for rapid application in computationally constrained industrial scenarios. Some studies have also proposed solutions to optimize CNN operation on low-end hardware platforms to address the problem of limited computational resources in industrial applications [23]. In contrast, Faster R-CNN achieves higher accuracy but suffers from slower inference, while RT-DETR provides advantages in global modeling and accuracy at the expense of greater computational cost and limited real-time performance. Consequently, YOLO is better aligned with real-time monitoring and engineering deployment requirements. Furthermore, in scenarios with limited or scarce data, few-shot learning has emerged as an effective strategy [24]. In underground coal mining environments, however, object detection still faces challenges such as insufficient accuracy, object loss, and limited robustness under occlusion [25]. On fully mechanized mining faces, interference from dust, poor illumination, and occlusion caused by hydraulic supports make it difficult for existing methods to achieve high-precision detection and real-time dynamic analysis of shearer drums under multi-posture and continuous motion conditions [26,27].
Current research exhibits two primary limitations: (1) most studies focus exclusively on monitoring the status of hydraulic support shield plates, lacking direct perception of drum posture and motion; (2) existing deep learning approaches struggle to balance accuracy and real-time performance under the complex operating conditions of coal mines, falling short of the requirements for continuous monitoring and collision warning [28]. Therefore, it is imperative to develop a robust detection method capable of delivering high accuracy, low latency, and strong adaptability in complex underground environments, thereby providing reliable support for drum posture analysis, fault prediction, and intelligent control [29]. The main contributions of this work are as follows:
(1) We introduce the C3k2_DCNv4 module, which adaptively adjusts convolution sampling points to capture drum features under varying scales and non-rigid deformations, improving perception under occlusion.
(2) We propose a lightweight convolution and attention fusion module (CAFM) that enhances multi-resolution feature representation under complex illumination and background interference while maintaining computational efficiency with GSConv.
(3) We employ the Shape-IoU loss to precisely fit irregular drum boundaries by considering both position and shape similarity, improving localization accuracy in low-light and complex environments.
The structure of this paper is organized as follows: Section 2 presents the dataset used in this study and details the proposed coal shearer drum detection method; Section 3 describes the experiments and results, validating the effectiveness and performance advantages of the proposed method and analyzing the limitations of the drum detection approach; and Section 4 concludes the paper and outlines directions for future research.

2. Materials and Methods

2.1. Dataset

The dataset used in the experiments was collected from two distinct mining sites, Nantun Coal Mine and Dongtan Coal Mine, located in Shandong Province, China, and comprised a total of 6000 images capturing real operational scenes. To ensure data quality and annotation consistency, a rigorous filtering process was applied prior to model training. Images exhibiting severe motion blur, overexposure, or heavy dust occlusion that obscured the drum contour were removed. In addition, duplicate or near-duplicate frames captured from the same scene or continuous video sequences were excluded to enhance data diversity and reduce overfitting risk. After cleaning, 2738 images were selected to cover different viewing angles and operational states, with representative images shown in Figure 1. All annotations were performed using the Labelme software (https://labelme.io/ accessed on 19 October 2025), and the images were labeled and saved in the YOLO dataset format to facilitate subsequent model training and evaluation. The annotation targets were the minimum bounding rectangles of the coal shearer drums, with the annotation principle being to ensure that each bounding box contained only the drum while minimizing background pixel interference. To evaluate the model’s performance, the dataset was split into training, validation, and test sets at a ratio of 3:1:1.
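As a point of reference, a minimal sketch of such a 3:1:1 split is given below; the directory layout, file extension, and random seed are illustrative assumptions rather than details taken from our pipeline.

```python
import random
import shutil
from pathlib import Path

def split_dataset(img_dir: str, out_dir: str, ratios=(3, 1, 1), seed=0):
    """Randomly split a YOLO-format image folder into train/val/test."""
    imgs = sorted(Path(img_dir).glob("*.jpg"))  # assumed extension
    random.Random(seed).shuffle(imgs)
    total = sum(ratios)
    n_train = len(imgs) * ratios[0] // total
    n_val = len(imgs) * ratios[1] // total
    splits = {
        "train": imgs[:n_train],
        "val": imgs[n_train:n_train + n_val],
        "test": imgs[n_train + n_val:],
    }
    for name, files in splits.items():
        dst = Path(out_dir) / "images" / name
        dst.mkdir(parents=True, exist_ok=True)
        for f in files:
            # the matching YOLO .txt label would be copied alongside each image
            shutil.copy(f, dst / f.name)
```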

2.2. Proposed Method

YOLOv11 is a new version of the YOLO series of real-time object detectors developed by Ultralytics [30], which redefines the upper limits of model performance in terms of accuracy, speed, and efficiency. Building upon the significant achievements of previous versions, YOLOv11 introduces important optimizations and improvements in network architecture and training strategies, making it a general-purpose solution suitable for a variety of computer vision tasks.
However, in underground coal mines, factors such as dust, low illumination, and cluttered backgrounds make it difficult to accurately capture the edge features of shearer drums, resulting in limited detection performance and robustness. To address this, this study improves the YOLOv11n network by incorporating DCNv4, a CAFM, and the Shape-IoU loss, thereby enhancing the model's ability to detect the drum. In addition, the convolutions in the backbone and neck are replaced with GSConv [31], effectively improving detection performance and stability while reducing the model's parameter count. The overall framework is illustrated in Figure 2, where "c" denotes concat, which concatenates multiple feature maps along the channel dimension to integrate feature information from different layers.
The overall network architecture is primarily divided into four submodules: the input, backbone, neck, and head. The input layer is responsible for receiving raw image data and performing basic preprocessing. After normalization, the input image enters the subsequent network with a size of 640 × 640. This layer does not alter the spatial dimensions of the image and only performs initial convolution operations to ensure that subsequent network modules receive data in a consistent format. The backbone is responsible for multi-scale feature extraction. An initial convolution module is used to capture low-level texture information, and the C3k2_DCNv4 module is introduced to enhance the model’s ability to perceive object deformations. The C3k2_DCNv4 module incorporates DCNv4 across multiple layers and integrates the SPPF and C2PSA modules to expand the receptive field and optimize multi-scale feature fusion. The neck efficiently fuses multi-scale features through C3k2_DCNv4, upsampling, and feature concatenation. Additionally, the CAFM is incorporated into the detection branches to enhance channel responses, ensuring sufficient interaction between deep semantic features and shallow fine-grained details. The detection head follows the multi-scale design of the YOLO series, utilizing feature maps of 80 × 80, 40 × 40, and 20 × 20 to detect small, medium, and large objects, thereby balancing performance and robustness. Meanwhile, the Shape-IoU loss is employed, effectively addressing the limitations of traditional IoU loss in fitting irregularly shaped objects.
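Because GSConv [31] replaces the standard convolutions throughout the backbone and neck, a minimal PyTorch sketch of its structure is shown below; the 5 × 5 depth-wise kernel and the interleaving shuffle follow the slim-neck design, but the exact hyperparameters are assumptions on our part.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Sketch of GSConv: a half-width dense conv plus a cheap depth-wise
    conv, concatenated and channel-shuffled to mix the two feature sets."""
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        c_ = c2 // 2
        self.cv1 = nn.Sequential(nn.Conv2d(c1, c_, k, s, k // 2, bias=False),
                                 nn.BatchNorm2d(c_), nn.SiLU())
        self.cv2 = nn.Sequential(nn.Conv2d(c_, c_, 5, 1, 2, groups=c_, bias=False),
                                 nn.BatchNorm2d(c_), nn.SiLU())

    def forward(self, x):
        y1 = self.cv1(x)
        y = torch.cat((y1, self.cv2(y1)), dim=1)
        # shuffle: interleave the dense and depth-wise channels
        b, c, h, w = y.shape
        return y.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
```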

2.2.1. C3k2_DCNv4

Deformable convolution (DCN) is designed to enhance a model's ability to learn invariance for complex objects. Traditional convolutions rely on fixed, regular sampling points when processing images, an approach that performs well for static or regularly structured object detection tasks. However, drum detection is a typical dynamic task, with significant variations in drum size and frequent occlusions caused by the shearer's boom, so traditional convolutions may struggle in such scenarios. The key advantage of DCN lies in its dynamic sampling capability: it can adaptively adjust the sampling locations based on the size and position of the target, thereby more effectively extracting features from objects of varying scales.
DCNv4 is an advanced version of the DCN architecture, designed to enhance the dynamic performance of convolution operations [32]. It achieves this primarily through the integration of deformable convolution and deformable RoI pooling. Unlike traditional convolutions with fixed sampling points, DCN learns an offset mechanism that allows each convolutional kernel's sampling points to move freely, adapting to the geometric deformations of the target and capturing its shape and pose more accurately.
For an input feature map $x \in \mathbb{R}^{H \times W \times C}$, the channels are divided into $G$ groups. Each group of feature maps is processed with a standard convolution to compute the offset $\Delta P_{gk}$ and modulation $\Delta m_{gk}$ for each sampling point $k$ within the group. The computed offset $\Delta P_{gk}$ adjusts the regularly distributed sampling positions (original positions being $p_0 + p_k$, where $p_0$ is the convolution center). Using the adjusted sampling locations, a weighted summation of the input features is performed, weighted by $\Delta m_{gk}$, to obtain the output feature map $y_g$ for the group:

$$y_g = \sum_{k=1}^{K} \Delta m_{gk} \cdot x_g\left(p_0 + p_k + \Delta P_{gk}\right)$$
The output feature maps of all groups, $y_g$, are concatenated along the channel dimension to form the final output feature map $y$:

$$y = \mathrm{concat}\left(\left[y_1, y_2, \ldots, y_G\right], \ \mathrm{axis} = 1\right)$$
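To make this grouped, modulated sampling concrete, the sketch below predicts per-point offsets and modulation scalars with lightweight convolution branches and applies them through torchvision's deform_conv2d. Note that this operator is DCNv2-style and serves only as a stand-in: DCNv4's kernel-level optimizations (and its removal of modulation normalization) are not reproduced here, and the group count is an assumed value.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class ModulatedDeformConv(nn.Module):
    """Sketch of a deformable layer: the offsets and modulation scalars of
    the equations above are predicted from the input, then used to sample it."""
    def __init__(self, c_in, c_out, k=3, offset_groups=4):
        super().__init__()
        assert c_in % offset_groups == 0
        self.pad = k // 2
        self.weight = nn.Parameter(torch.empty(c_out, c_in, k, k))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        # offset branch: (dx, dy) per group per sampling point
        self.offset = nn.Conv2d(c_in, 2 * offset_groups * k * k, 3, padding=1)
        # modulation branch: one scalar per group per sampling point
        self.mod = nn.Conv2d(c_in, offset_groups * k * k, 3, padding=1)
        for branch in (self.offset, self.mod):  # start from the regular grid
            nn.init.zeros_(branch.weight)
            nn.init.zeros_(branch.bias)

    def forward(self, x):
        off = self.offset(x)             # sampling-point offsets
        m = torch.sigmoid(self.mod(x))   # modulation weights in (0, 1)
        return deform_conv2d(x, off, self.weight, padding=self.pad, mask=m)
```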
The original YOLOv11n model only uses a single convolution kernel size for feature extraction, which limits its ability to effectively learn features of objects with different shapes. To address this limitation, this study introduces the DCNv4 module, resulting in the C3k2_DCNv4 module as shown in Figure 3. In this module, the standard convolution layer in the bottleneck of the original C3k2 structure is replaced with a DCNv4 layer, while the first convolution layer and the residual connection are retained to ensure computational efficiency and stable gradient propagation. The C3k2 module is composed of two consecutive C3k blocks, and within each C3k block, the Bottleneck layers are repeated n times. In the YOLOv11n model, n is typically set to 1. The DCNv4 layer introduces two additional lightweight branches, namely the offset branch and the modulation branch, which enable adaptive adjustment of sampling locations and weights. This design allows the module to adaptively adjust the receptive field of the convolution kernel, including its size, shape, and position, thereby capturing the morphological characteristics of the shearer drum more accurately. Structurally, the C3k2_DCNv4 module is integrated into both the backbone and neck of the original YOLOv11n model. Specifically, the C3k2 modules in the 3rd, 5th, 7th, and 9th layers of the backbone are replaced with C3k2_DCNv4 modules, and all C3k2 modules in the neck are also replaced. Compared with previous versions, DCNv4 optimizes the implementation and effectively reduces redundant computations, enhancing both feature adaptability and detection robustness.
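Continuing the sketch above, the modified bottleneck might then look as follows, with the second standard convolution swapped for the deformable layer and the residual connection retained; channel widths are illustrative.

```python
class BottleneckDCNv4(nn.Module):
    """Sketch of the C3k2_DCNv4 bottleneck: first conv and residual kept,
    second conv replaced by the deformable layer defined above."""
    def __init__(self, c):
        super().__init__()
        self.cv1 = nn.Sequential(nn.Conv2d(c, c, 3, 1, 1, bias=False),
                                 nn.BatchNorm2d(c), nn.SiLU())
        self.cv2 = ModulatedDeformConv(c, c)

    def forward(self, x):
        # residual connection preserves stable gradient propagation
        return x + self.cv2(self.cv1(x))
```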

2.2.2. Convolution and Attention Fusion Module

Convolution is inherently constrained by its local receptive field, which limits its capacity for global feature modeling. In contrast, the attention mechanisms employed by transformers excel at capturing global representations and modeling long-range dependencies. Convolution and attention are therefore complementary for modeling local and global features [33]. Building on this idea, we propose a lightweight CAFM, as illustrated in Figure 4. In this design, the global branch leverages self-attention to capture broader contextual information, while the local branch focuses on extracting fine-grained details, thereby suppressing noise more comprehensively.
The proposed CAFM consists of a local and a global branch. In the local branch, a 1 × 1 convolution is first applied to adjust the channel dimensions, enhancing cross-channel interaction and promoting information integration. This is followed by a channel shuffle (CS) operation to further mix and fuse channel information. Specifically, the input tensor is divided into several groups along the channel dimension, and within each group, depth-wise separable convolution is employed to induce the CS. The output tensors of all groups are then concatenated along the channel dimension to form a new output tensor. Subsequently, a 3 × 3 GSConv is applied for feature extraction. The local branch can be expressed as:
$$F_{conv} = W_{3 \times 3}\left(\mathrm{CS}\left(W_{1 \times 1}(Y)\right)\right)$$
where $W_{1 \times 1}$ denotes the 1 × 1 convolution, $W_{3 \times 3}$ the 3 × 3 GSConv, and $Y$ the input feature. The global branch is defined as follows:
$$F_{att} = W_{1 \times 1}\,\mathrm{Attention}\left(\hat{Q}, \hat{K}, \hat{V}\right) + Y$$
$$\mathrm{Attention}\left(\hat{Q}, \hat{K}, \hat{V}\right) = \hat{V} \cdot \mathrm{Softmax}\left(\hat{K}\hat{Q} / \alpha\right)$$
where $\alpha$ is a learnable scaling factor. The query ($Q$), key ($K$), and value ($V$) are generated using a 1 × 1 convolution followed by a 3 × 3 GSConv, producing three tensors of shape $\hat{H} \times \hat{W} \times \hat{C}$. $Q$ is then reshaped into $\hat{Q} \in \mathbb{R}^{\hat{H}\hat{W} \times \hat{C}}$, and $K$ into $\hat{K} \in \mathbb{R}^{\hat{C} \times \hat{H}\hat{W}}$. The final output is:
$$F_{out} = F_{att} + F_{conv}$$
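A PyTorch sketch of the two branches is given below. Plain grouped and depth-wise convolutions stand in for GSConv, and the group count is an illustrative assumption; the channel-wise attention follows the formulas above.

```python
import torch
import torch.nn as nn

class CAFM(nn.Module):
    """Sketch of the convolution and attention fusion module."""
    def __init__(self, c, groups=4):
        super().__init__()
        self.groups = groups
        # local branch: 1x1 conv -> channel shuffle -> grouped 3x3 conv
        self.pw = nn.Conv2d(c, c, 1)
        self.gc = nn.Conv2d(c, c, 3, padding=1, groups=groups)
        # global branch: 1x1 conv + depth-wise 3x3 conv generate Q, K, V
        self.qkv = nn.Sequential(nn.Conv2d(c, 3 * c, 1),
                                 nn.Conv2d(3 * c, 3 * c, 3, padding=1, groups=3 * c))
        self.proj = nn.Conv2d(c, c, 1)
        self.alpha = nn.Parameter(torch.ones(1))  # learnable scale

    def channel_shuffle(self, x):
        b, c, h, w = x.shape
        return (x.view(b, self.groups, c // self.groups, h, w)
                 .transpose(1, 2).reshape(b, c, h, w))

    def forward(self, y):
        # local branch: F_conv = W3x3(CS(W1x1(Y)))
        f_conv = self.gc(self.channel_shuffle(self.pw(y)))
        # global branch: channel-wise attention over flattened spatial dims
        b, c, h, w = y.shape
        q, k, v = self.qkv(y).chunk(3, dim=1)
        q = q.flatten(2).transpose(1, 2)                  # (B, HW, C)
        k = k.flatten(2)                                  # (B, C, HW)
        v = v.flatten(2)                                  # (B, C, HW)
        attn = torch.softmax(k @ q / self.alpha, dim=-1)  # (B, C, C)
        f_att = self.proj((attn @ v).view(b, c, h, w)) + y
        return f_att + f_conv                             # F_out = F_att + F_conv
```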

2.2.3. Shape-IoU Loss

In the shearer drum detection task, drums are often characterized by small scale, complex shapes, and rich edge details. Traditional bounding box regression losses (such as DIoU, GIoU, or CIoU) tend to suffer from unstable gradients, large localization errors, or slow convergence when dealing with small objects or drums with significant aspect ratio variations. Shape-IoU loss simultaneously considers both the overlap and the shape consistency between the predicted and ground truth boxes, making it sensitive to aspect ratio deviations and deformations. This significantly enhances localization accuracy for small and elongated objects. By leveraging Shape-IoU, the model can more precisely capture critical features, thereby improving detection performance and robustness [34], making it particularly suitable for drum condition monitoring in the complex underground mining environment.
$$\mathrm{IoU} = \frac{\left|B \cap B^{gt}\right|}{\left|B \cup B^{gt}\right|}$$
$$ww = \frac{2 \times (w^{gt})^{scale}}{(w^{gt})^{scale} + (h^{gt})^{scale}}, \qquad hh = \frac{2 \times (h^{gt})^{scale}}{(w^{gt})^{scale} + (h^{gt})^{scale}}$$
$$\mathrm{distance}^{shape} = hh \times \frac{\left(x_c - x_c^{gt}\right)^2}{c^2} + ww \times \frac{\left(y_c - y_c^{gt}\right)^2}{c^2}$$
$$\Omega^{shape} = \sum_{t = w, h} \left(1 - e^{-\omega_t}\right)^{\theta}, \qquad \theta = 4$$
$$\omega_w = hh \times \frac{\left|w - w^{gt}\right|}{\max\left(w, w^{gt}\right)}, \qquad \omega_h = ww \times \frac{\left|h - h^{gt}\right|}{\max\left(h, h^{gt}\right)}$$
The final bounding box regression loss is defined as:
$$L_{\text{Shape-IoU}} = 1 - \mathrm{IoU} + \mathrm{distance}^{shape} + 0.5 \times \Omega^{shape}$$
where $B$ denotes the predicted box and $B^{gt}$ the ground truth box; $B \cap B^{gt}$ is the area of their intersection and $B \cup B^{gt}$ the area of their union. $w^{gt}$ and $h^{gt}$ denote the width and height of the ground truth box, and $w$ and $h$ those of the predicted box. The factor $scale$ is related to the target size in the dataset and adjusts the shape weighting. $ww$ and $hh$ are the horizontal and vertical weighting coefficients, determined by the shape of the ground truth box. $(x_c, y_c)$ are the center coordinates of the predicted box and $(x_c^{gt}, y_c^{gt})$ those of the ground truth box, while $c$ is the diagonal length of the minimum enclosing box covering both boxes. $\mathrm{distance}^{shape}$ is the shape-weighted center point distance loss, $\omega_w$ and $\omega_h$ are the weighted width and height differences, and $\Omega^{shape}$ is the shape similarity loss, used to reduce the discrepancy between the predicted and ground truth box shapes.
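These definitions translate directly into code. The sketch below computes the loss for corner-format (x1, y1, x2, y2) boxes; the scale factor is left as a dataset-dependent argument, and a small epsilon guards the divisions.

```python
import torch

def shape_iou_loss(pred, gt, scale=0.0, theta=4, eps=1e-7):
    """Shape-IoU loss following the equations above (a sketch)."""
    w, h = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wg, hg = gt[:, 2] - gt[:, 0], gt[:, 3] - gt[:, 1]
    xc, yc = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    xcg, ycg = (gt[:, 0] + gt[:, 2]) / 2, (gt[:, 1] + gt[:, 3]) / 2
    # IoU term
    iw = (torch.min(pred[:, 2], gt[:, 2]) - torch.max(pred[:, 0], gt[:, 0])).clamp(0)
    ih = (torch.min(pred[:, 3], gt[:, 3]) - torch.max(pred[:, 1], gt[:, 1])).clamp(0)
    inter = iw * ih
    iou = inter / (w * h + wg * hg - inter + eps)
    # shape weights ww, hh from the ground-truth box
    ww = 2 * wg.pow(scale) / (wg.pow(scale) + hg.pow(scale) + eps)
    hh = 2 * hg.pow(scale) / (wg.pow(scale) + hg.pow(scale) + eps)
    # squared diagonal of the minimum enclosing box
    cw = torch.max(pred[:, 2], gt[:, 2]) - torch.min(pred[:, 0], gt[:, 0])
    ch = torch.max(pred[:, 3], gt[:, 3]) - torch.min(pred[:, 1], gt[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps
    # shape-weighted center distance
    dist = hh * (xc - xcg) ** 2 / c2 + ww * (yc - ycg) ** 2 / c2
    # shape similarity term
    omega_w = hh * (w - wg).abs() / (torch.max(w, wg) + eps)
    omega_h = ww * (h - hg).abs() / (torch.max(h, hg) + eps)
    omega = (1 - torch.exp(-omega_w)).pow(theta) + (1 - torch.exp(-omega_h)).pow(theta)
    return 1 - iou + dist + 0.5 * omega
```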

3. Experiments and Results

3.1. Experimental Environment and Parameter Settings

During the experiments, training was conducted for 150 epochs with a batch size of 16 and an input image size of 640. The SGD optimizer was employed with an initial learning rate (lr0) of 0.01, which was linearly decayed to 0.0001 according to a final learning rate fraction (lrf) of 0.01. Momentum was set to 0.937, and a weight decay of 0.0005 was applied to prevent overfitting. A warm-up strategy was used during the first 3 epochs, with momentum gradually increasing from 0.8 and the bias learning rate initialized at 0.1. Early stopping with a patience of 50 epochs was applied to further reduce overfitting. To ensure reproducibility, the random seed was fixed at 2 and deterministic mode was enabled. Data augmentation strategies included Mosaic, HSV color adjustments, translation, scaling, and horizontal flipping. In addition, RandAugment was applied to increase sample diversity. During inference, a confidence threshold of 0.25 and an IoU threshold of 0.7 for non-maximum suppression (NMS) were used to remove redundant detection boxes. The experimental environment configuration used is shown in Table 1.
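For reference, these settings map directly onto the Ultralytics training API. The sketch below is a minimal illustration; the model and dataset YAML file names are placeholders, not files released with this paper.

```python
from ultralytics import YOLO

# hypothetical file names; the hyperparameters mirror the settings above
model = YOLO("dcs-yolo.yaml")
model.train(
    data="drum.yaml", epochs=150, batch=16, imgsz=640,
    optimizer="SGD", lr0=0.01, lrf=0.01, momentum=0.937,
    weight_decay=0.0005, warmup_epochs=3, warmup_momentum=0.8,
    warmup_bias_lr=0.1, patience=50, seed=2, deterministic=True,
)
# inference with the confidence and NMS IoU thresholds used in this study
model.predict(source="test_images/", conf=0.25, iou=0.7)
```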

3.2. Evaluation Metrics

In this study, model performance was evaluated using the following commonly adopted metrics: precision, recall, F1 score, and average precision (AP) [35], providing a comprehensive assessment of detection performance. Detection results are categorized as true positive (TP), false positive (FP), and false negative (FN), where TP represents correctly detected objects, FP denotes incorrectly detected objects, and FN indicates missed detections. Based on the statistics of TP, FP, and FN, performance metrics such as precision and recall can be derived. Precision measures the proportion of true positives among all detected results, reflecting the accuracy of predictions, while recall represents the proportion of correctly identified positive samples among all actual positives, assessing the model's detection capability. AP is defined as the area under the precision–recall (P–R) curve and provides an overall measure of the model's performance across different threshold settings. For multi-class detection, the mean average precision (mAP) is commonly reported. They are defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
$$\mathrm{AP} = \int_{0}^{1} \mathrm{Precision}(\mathrm{Recall}) \, d\,\mathrm{Recall}$$
Specifically, mAP@0.5 refers to the mAP calculated at an Intersection over Union (IoU) threshold of 0.5, while mAP@0.5:0.95 represents the mAP averaged over multiple IoU thresholds ranging from 0.5 to 0.95 (with a step of 0.05). Higher values of mAP indicate better overall detection performance and robustness across varying localization criteria. Meanwhile, the number of parameters (Params) and floating-point operations (Flops) are used as metrics to evaluate the model’s scale. Frames per second (FPS) is used to measure the inference speed of a model and is defined as the number of images the model processes per second. A higher FPS indicates faster processing speed, making the model more suitable for real-time applications.
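As a concrete illustration of the AP definition, the sketch below integrates a precision–recall curve using the monotone-envelope interpolation common in detection toolkits; the exact interpolation in any given evaluation code may differ.

```python
import numpy as np

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """Area under the P-R curve; recall is assumed sorted ascending."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    # enforce a monotonically non-increasing precision envelope
    p = np.maximum.accumulate(p[::-1])[::-1]
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```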

3.3. Results and Discussion

In this experiment, we compared the performance of four loss functions—GIoU, DIoU, EIoU, and Shape-IoU—within YOLOv11n on the coal shearer drum dataset. The loss curves during training are shown in Figure 5. Overall, the loss values of all methods gradually decreased with increasing training epochs and eventually converged. Among them, Shape-IoU exhibited the fastest convergence and the lowest final loss, indicating superior performance in bounding box fitting and localization accuracy. EIoU ranked second, showing better convergence than DIoU and GIoU, while GIoU converged the slowest and had the highest final loss, demonstrating relatively poorer overall performance. These results suggest that Shape-IoU provides a clear advantage in enhancing the model’s ability to fit irregular object boundaries.
The results for the different loss functions are summarized in Table 2, where Pr denotes precision and Re denotes recall. As shown in the table, GIoU demonstrates relatively stable performance in terms of precision, recall, and AP, but with limited room for improvement. In comparison, DIoU shows improvements across all metrics, indicating its advantage in constraining target positions. EIoU and Shape-IoU exhibit even better performance, with Shape-IoU achieving the highest precision and AP, highlighting its superiority in fitting complex target boundaries and regressing irregular shapes. Overall, Shape-IoU offers the best balance between detection performance and boundary modeling capability, making it the preferred loss function for complex underground mining environments. To validate the effectiveness of the proposed modules, ablation experiments were conducted on the YOLOv11n network, with results shown in Table 3. In the table, the YOLOv11n model lightweighted with GSConv serves as the baseline, and the impacts of DCNv4, CAFM, and Shape-IoU on model performance are evaluated, where "√" indicates that the corresponding module is applied.
Table 3 presents the results of ablation experiments conducted on the YOLOv11n baseline model by progressively incorporating the three proposed modules: DCNv4, CAFM, and Shape-IoU. It can be observed that introducing any single module improves model performance compared to the baseline (mAP@0.5:0.95 = 52.9%). For instance, adding DCNv4 increases mAP@0.5:0.95 to 53.6% and incorporating CAFM yields 53.4%, while Shape-IoU achieves the most significant gain, raising it directly to 53.8%. Moreover, combining two modules further enhances performance; the joint application of DCNv4 and Shape-IoU increases mAP@0.5:0.95 to 55.1%, a 2.2-percentage-point improvement over the baseline. When all three modules are simultaneously integrated, the model achieves optimal performance across all metrics, with precision reaching 91.3%, recall 80.3%, and mAP@0.5 and mAP@0.5:0.95 attaining 85.6% and 56.2%, respectively. These results indicate that the three improvement modules provide complementary and synergistic effects in feature modeling, boundary regression, and deformation adaptation, significantly enhancing detection capability in complex scenarios.
To further illustrate the effectiveness of each proposed module in complex scenarios, we generated confidence comparison maps of different enhanced models in representative detection scenes, particularly under severe occlusion or dusty conditions, as shown in Figure 6. The results indicate that, with the progressive integration of DCNv4, CAFM, and Shape-IoU, the model’s confidence in target localization and recognition steadily improves. When all three modules are incorporated simultaneously, the detection confidence reaches its maximum, target boundaries are more precise, and both missed and false detections are substantially reduced. These visual comparisons provide clear evidence of the effectiveness and complementarity of the modules in feature modeling, boundary regression, and deformation adaptation.
To further demonstrate the advantages of the proposed model, we conducted a comparative evaluation against widely used object detection methods, including Faster R-CNN, SSD, RetinaNet, DETR, RT-DETR, YOLOv8n, and YOLOv11n. The results, summarized in Table 4, report each model's Params, Flops, FPS, and key detection performance metrics under a consistent experimental setup. It can be seen that the traditional two-stage detector Faster R-CNN achieves 83.8% mAP@0.5 but requires 40.5 M (M = 10⁶) parameters and 200.3 G (G = 10⁹) Flops, making real-time deployment challenging. The single-stage SSD reduces computational cost but still exhibits significantly lower detection performance than mainstream methods. RT-DETR, an end-to-end transformer-based detector, shows a certain advantage in precision, yet its Flops is as high as 56.3 G, limiting inference efficiency. In contrast, YOLOv8n achieves a good balance between detection performance and speed, though its Params and Flops remain relatively high (3.1 M/8.8 G). YOLOv11n further optimizes lightweight performance, reducing Flops to 6.5 G while maintaining mAP@0.5 at 84.5%, though its recall remains suboptimal.
Compared with the aforementioned methods, the proposed model significantly improves detection performance while maintaining an extremely low complexity (2.4 M Params and 6.2 G Flops). It achieves 91.3%, 80.3%, and 85.6% in precision, recall, and mAP@0.5, respectively, outperforming existing approaches. Notably, for the more challenging comprehensive metric mAP@0.5:0.95, the model reaches 56.2%, representing an improvement of 3.3 percentage points over YOLOv11n and 3.7 points over YOLOv8n. This demonstrates that the proposed method achieves a superior balance between lightweight design and performance, offering strong practical value and potential for broader application. Meanwhile, the proposed method also demonstrates strong real-time deployment capability, achieving an FPS of 95.6, which is higher than that of several other methods. Figure 7 shows the comparison results of key metrics for different methods.
To more intuitively illustrate the convergence behavior of different methods in detection performance, we plotted the mAP@0.5 curves over training iterations for each method, as shown in Figure 8. It can be observed that most methods exhibit a rapid increase during the initial training stages, followed by a gradual stabilization. Traditional detectors show limited convergence speed and stability, whereas the proposed method achieves high detection performance with fewer training iterations.
These results indicate that the proposed improvements enable comprehensive outperformance of mainstream methods under constrained computational resources, balancing performance and efficiency, and demonstrating strong potential for practical engineering applications and broader deployment.
To further validate the differences in detection performance and feature representation capability, we compared YOLOv11n with the proposed method in terms of detection confidence distribution and Grad-CAM feature visualization [36], as shown in Figure 9 and Figure 10, respectively. As illustrated in Figure 9, the proposed method exhibits higher and more concentrated prediction confidence within the target regions, effectively reducing false positives and false negatives caused by low-confidence predictions. Meanwhile, the feature visualization in Figure 10 demonstrates that the proposed method produces stronger responses at target edges and key structural regions, yielding more discriminative feature representations. This indicates that the method maintains robust target representation even in complex backgrounds, further confirming its superiority in detection performance. The proposed DCS-YOLO method demonstrates significant advantages in shearer drum detection under complex underground coal mining conditions.
In addition to successful detections, Figure 11 shows typical failure cases, mainly caused by heavy dust, severe occlusion, or extreme lighting. These conditions can partially obscure the target, hinder accurate localization, or distort its appearance, leading to missed or incorrect predictions. Such examples highlight remaining challenges and suggest directions to further improve the method’s robustness.
Two-stage detectors, such as Faster R-CNN and R-FCN, possess strong region-level feature extraction capabilities and generally achieve high accuracy on standard datasets. However, their complex region proposal and refinement processes result in high computational costs, making them less suitable for real-time deployment. One-stage detectors, such as SSD and RetinaNet, improve efficiency but still suffer from performance degradation when handling small, occluded, or low-light targets, which are common in underground mining environments. Transformer-based detectors, such as DETR and RT-DETR, provide global context modeling and improved robustness in cluttered backgrounds, yet they require large-scale data and high computational resources to converge effectively.
In contrast, YOLO-based detectors offer an excellent trade-off between speed and accuracy, enabling efficient detection under constrained computational resources. Their fully convolutional and anchor-free designs support real-time inference and better adaptation to diverse lighting and occlusion conditions, making them particularly suitable for complex coal mine environments. As shown in Table 4, the proposed DCS-YOLO model achieves competitive performance while maintaining extremely low complexity, outperforming both two-stage and transformer-based detectors in terms of precision and inference speed.
Furthermore, the evolution of the YOLO series [37] demonstrates continuous improvements in generalization capability and scalability. In particular, after the introduction of YOLOv5, mechanisms such as CSPNet and dynamic anchor adjustment were incorporated, offering multiple model variants to suit different scenarios and showing significant advantages over earlier versions in terms of architecture, performance, and usability. Subsequently, YOLOv6 focused on enhancing practicality and deployment efficiency by adopting a more efficient backbone network and feature fusion mechanisms. YOLOv7, through the EfficientRep backbone and dynamic label assignment strategy, achieved improvements in both detection accuracy and efficiency. YOLOv8 introduced new technologies such as PAN and DKA, enhancing feature extraction capabilities and inference speed. YOLOv9 leveraged advanced techniques such as PGI and GELAN to address gradient vanishing and error accumulation issues, improving model convergence speed and stability. Finally, YOLOv10 and YOLOv11 employed innovations such as the C3k2 block and C2PSA block, further enhancing detection efficiency and accuracy, with YOLOv11, in particular, becoming one of the most advanced real-time object detection models to date due to its enhanced spatial awareness. Recent studies [38,39] have empirically validated the strong generalization ability of YOLO models across diverse domains, attributed to the integration of YOLO with Transformer architectures and advanced data augmentation strategies. Building upon these advancements, the proposed DCS-YOLO algorithm incorporates deformable convolutions, convolution-attention fusion modules, and the Shape-IoU loss, achieving high-precision lightweight detection of shearer drums under low-light, heavy dust, and occluded conditions. This design balances model compactness with detection accuracy, while significantly enhancing feature representation in complex underground mining environments.
However, several limitations and potential future improvements remain. First, in extreme lighting, severe occlusion, or highly complex backgrounds, detection performance may degrade, indicating a need for further robustness enhancement. Second, the dataset used in this study was primarily collected from specific mine environments, and cross-scenario generalization remains to be validated. Finally, although the model is lightweight overall, future work could further reduce inference latency using techniques such as knowledge distillation or network pruning.

4. Conclusions

This paper proposes a DCS-YOLO-based algorithm for shearer drum detection in underground coal mines. By integrating the C3k2_DCNv4 and GSConv modules and the CAFM, along with the Shape-IoU loss function, the method efficiently captures drum features under varying scales, non-rigid deformations, and occlusion conditions, while achieving precise regression. Experiments show that with only 2.4 M parameters and a computational cost of 6.2 G Flops, the method achieves a precision of 91.3%, recall of 80.3%, mAP@0.5 of 85.6%, and mAP@0.5:0.95 of 56.2%, outperforming mainstream detection models. This approach not only achieves remarkable performance improvements in underground coal mining shearer drum detection but also offers a generalizable technical solution for object detection in resource-constrained, complex industrial environments. Future research will focus on enhancing the algorithm's perception and robustness across different mining sites. This includes cross-site validation and domain adaptation techniques to systematically evaluate generalization under varying conditions. Additionally, controlled experiments under low-light conditions and with synthetic dust will be conducted to assess detection robustness. Multimodal data fusion, collection of data from additional mining sites under diverse operational scenarios, and the construction of a large-scale, cross-mine, multi-scenario dataset will further support comprehensive evaluation. Moreover, techniques such as knowledge distillation, structured pruning, and quantization will be explored to reduce model size and inference latency without compromising detection performance, enabling practical deployment on resource-constrained underground embedded devices.

Author Contributions

Conceptualization, T.H. and J.Q.; methodology, T.H.; software, Z.Y.; validation, L.Z. and C.L.; formal analysis, Z.Y.; investigation, L.Z.; resources, J.Q.; data curation, Z.Y.; writing—original draft preparation, T.H.; writing—review and editing, C.L.; visualization, T.H.; supervision, L.Z.; project administration, T.H.; funding acquisition, C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 52474190).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Tao Hu, Jinbo Qiu, Zehai Yu and Cong Liu were employed by the China Coal Technology and Engineering Group Shanghai Co., Ltd., Shanghai, China. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Zhang, F.; Zhang, J.; Cheng, H. Research review on intelligent object detection technology for coal mines based on deep learning. Coal Sci. Technol. 2025, 53, 284–296. [Google Scholar] [CrossRef]
  2. Yang, W.; Ji, Y.; Zhang, X.; Zhao, D.; Ren, Z.; Wang, Z.; Tian, S.; Du, Y.; Zhu, L.; Jiang, J. A multi-camera system-based relative pose estimation and virtual–physical collision detection methods for the underground anchor digging equipment. Mathematics 2025, 13, 559. [Google Scholar] [CrossRef]
  3. Liang, M.; Zhang, J. Application and prospect of strapdown inertial navigation system in coal mining equipment. Sensors 2024, 24, 6836. [Google Scholar] [CrossRef]
  4. Xue, X.; Zhang, Y. Digital modelling method of coal-mine roadway based on millimeter-wave radar array. Sci. Rep. 2024, 14, 69547. [Google Scholar] [CrossRef] [PubMed]
  5. Yu, X.; Zhang, L. An edge computing based anomaly detection method in underground coal mine. Neurocomputing 2022, 470, 226–235. [Google Scholar] [CrossRef]
  6. Hu, T.; Zhuang, D.; Qiu, J. An EfficientNetv2-based method for coal conveyor belt foreign object detection. Front. Energy Res. 2025, 12, 1444877. [Google Scholar] [CrossRef]
  7. Ren, H.W.; Li, S.S.; Zhao, G.R.; Fu, K.K. Measurement method of support height and roof beam posture angles for working face hydraulic support based on depth vision. J. Min. Saf. Eng. 2022, 39, 72–81+93. [Google Scholar]
  8. Wang, M.Y.; Zhang, X.H.; Ma, H.W.; Du, Y.Y.; Zhang, Y.M.; Xie, N.; Wei, Q.N. Remote control collision detection and early warning method for comprehensive mining equipment. Coal Sci. Technol. 2021, 49, 110–116. [Google Scholar]
  9. Zhang, K.; Lian, Z. Hydraulic bracket attitude angle measurement system. Ind. Mine Autom. 2017, 43, 40–45. [Google Scholar]
  10. Zhang, J.; Ding, J.K.; Li, R.; Wang, H.; Wang, X. Research on 5G-based attitude detection technology of overhead hydraulic bracket. Coal Min. Mach. 2022, 43, 39–41. [Google Scholar]
  11. Zhang, K.; Yang, X.; Xu, L.; Thé, J.; Tan, Z.; Yu, H. Research on enhancing coal-gangue object detection using GAN-based data augmentation strategy with dual attention mechanism. Energy 2024, 287, 129654. [Google Scholar] [CrossRef]
  12. Xu, S.; Jiang, W.; Liu, Q.; Wang, H.; Zhang, J.; Li, J.; Wang, C. Coal-rock interface real-time recognition based on the improved YOLO detection and bilateral segmentation network. Undergr. Space 2024, 21, 22–32. [Google Scholar] [CrossRef]
  13. Liu, W.; Tao, Q.; Wang, N.; Xiao, W.; Pan, C. YOLO-STOD: An industrial conveyor belt tear detection model based on YOLOv5 algorithm. Sci. Rep. 2025, 15, 1659. [Google Scholar] [CrossRef]
  14. Kaur, R.; Singh, S. A comprehensive review of object detection with deep learning. Digit. Signal Process. 2023, 132, 103812. [Google Scholar] [CrossRef]
  15. Edozie, E.; Nuhu, A.; John, S.; Sadiq, B.O. Comprehensive review of recent developments in visual object detection based on deep learning. Artif. Intell. Rev. 2025, 58, 277–312. [Google Scholar] [CrossRef]
  16. Lamichhane, B.R.; Srijuntongsiri, G.; Horanont, T. CNN based 2D object detection techniques: A review. Front. Comput. Sci. 2025, 7, 1437664. [Google Scholar] [CrossRef]
  17. Ren, S.; He, K.; Girshick, R.B.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
  18. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. Eur. Conf. Comput. Vis. (ECCV) 2016, 21, 21–37. [Google Scholar] [CrossRef]
  19. Tlebaldinova, A.; Omiotek, Z.; Karmenova, M.; Kumargazhanova, S.; Smailova, S.; Tankibayeva, A.; Kumarkanova, A.; Glinskiy, I. Comparison of modern convolution and transformer architectures: YOLO and RT-DETR in meniscus diagnosis. Computers 2025, 14, 333. [Google Scholar] [CrossRef]
  20. Zheng, H.; Chen, X.; Cheng, H.; Du, Y.; Jiang, Z. MD-YOLO: Surface defect detector for industrial complex environments. Opt. Lasers Eng. 2024, 178, 108170. [Google Scholar] [CrossRef]
  21. Kang, S.; Hu, Z.; Liu, L.; Zhang, K.; Cao, Z. Object detection YOLO algorithms and their industrial applications: Overview and comparative analysis. Electronics 2025, 14, 1104. [Google Scholar] [CrossRef]
  22. Zhou, W.; Li, C.; Ye, Z.; He, Q.; Ming, Z.; Chen, J.; Wan, F.; Xiao, Z. An efficient tiny defect detection method for PCB with improved YOLO through a compression training strategy. IEEE Trans. Instrum. Meas. 2024, 73, 1–14. [Google Scholar] [CrossRef]
  23. Shen, J.; Cheng, X.; Yang, X.; Zhang, L.; Cheng, W.; Lin, Y. Efficient CNN accelerator based on low-end FPGA with optimized depthwise separable convolutions and squeeze-and-excite modules. AI 2025, 6, 244. [Google Scholar] [CrossRef]
  24. Zhang, L.; Lin, Y.; Yang, X.; Cheng, W. From sample poverty to rich feature learning: A new metric learning method for few-shot classification. IEEE Access 2024, 12, 124990–125002. [Google Scholar] [CrossRef]
  25. Jiang, J.; Xie, G.; Guo, M.; Cui, J. Surface mine personnel object video tracking method based on YOLOv5-Deepsort algorithm. Sci. Rep. 2025, 15, 17123. [Google Scholar] [CrossRef] [PubMed]
  26. Wang, Z.; Zhu, Y.; Zhang, Y.; Liu, S. An effective deep learning approach enabling miners’ protective equipment detection and tracking using improved YOLOv7 architecture. Comput. Electr. Eng. 2025, 123, 110173. [Google Scholar] [CrossRef]
  27. Tian, F.; Song, C.; Liu, X. Small target detection in coal mine underground based on improved RTDETR algorithm. Sci. Rep. 2025, 15, 12006. [Google Scholar] [CrossRef]
  28. Ling, J.; Fu, Z.; Yuan, X. Research on downhole drilling target detection based on improved YOLOv8n. Sci. Rep. 2025, 15, 26105. [Google Scholar] [CrossRef]
  29. Zhang, J.; Chen, Y.; Zhang, Y.; Guo, B.; Xu, R. DWHA-PCMSP: Salient Object Detection Network in Coal Mine Industrial IoT. IEEE Trans. Ind. Inform. 2025, 21, 5746–5754. [Google Scholar] [CrossRef]
  30. Wang, A.; Fu, X.; Liu, Y.; Zhang, Z. A remote sensing image object detection model based on improved YOLOv11. Electronics 2025, 14, 2607. [Google Scholar] [CrossRef]
  31. Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A lightweight design for real-time detector architectures. J. Real-Time Image Process. 2024, 21, 62. [Google Scholar] [CrossRef]
  32. Xiong, Y.; Li, Z.; Chen, Y.; Wang, F.; Zhu, X.; Luo, J.; Wang, W.; Lu, T.; Li, H.; Qiao, Y.; et al. Efficient deformable ConvNets: Rethinking dynamic and sparse operator for vision applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 5652–5661. [Google Scholar] [CrossRef]
  33. Hu, S.; Gao, F.; Zhou, X.; Dong, J.; Du, Q. Hybrid convolutional and attention network for hyperspectral image denoising. IEEE Geosci. Remote Sens. Lett. 2024, 21, 5504005. [Google Scholar] [CrossRef]
  34. Zhang, Q.; Zhang, J.; Yang, S. Enhancing YOLOv8 object detection with shape-IoU loss and local convolution for small target recognition. Informatica 2025, 49, 105–120. [Google Scholar] [CrossRef]
  35. Padilla, R.; Netto, S.L.; da Silva, E.A.B. A survey on performance metrics for object-detection algorithms. In Proceedings of the 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 1–3 July 2020. [Google Scholar] [CrossRef]
  36. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 2020, 128, 336–359. [Google Scholar] [CrossRef]
  37. Ali, M.L.; Zhang, Z. The YOLO framework: A comprehensive review of evolution, applications, and benchmarks in object detection. Computers 2024, 13, 336. [Google Scholar] [CrossRef]
  38. Hwang, D.; Kim, J.-J.; Moon, S.; Wang, S. Image augmentation approaches for building dimension estimation in street view images using object detection and instance segmentation based on deep learning. Appl. Sci. 2025, 15, 2525. [Google Scholar] [CrossRef]
  39. Wang, S. Automated non-PPE detection on construction sites using YOLOv10 and transformer architectures for surveillance and body-worn cameras with benchmark datasets. Sci. Rep. 2025, 15, 27043. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Coal shearer drum images.
Figure 2. Overall architecture of the network.
Figure 3. C3k2_DCNv4 structure.
Figure 4. Convolution and attention fusion module.
Figure 5. Comparison of different loss functions.
Figure 6. Detection results for different enhanced models under challenging scenarios.
Figure 7. Comparison of key evaluation metrics for different methods.
Figure 8. mAP@0.5 curves of different methods over training iterations.
Figure 9. Comparison of drum detection confidence between proposed method and YOLOv11n. The term “10架” in the first-row image indicates the shearer drum at the position of the 10th hydraulic support in the underground coal mine, and “78架” in the third-row image indicates the shearer drum at the position of the 78th hydraulic support.
Figure 10. Comparison of drum detection heatmaps between the proposed method and YOLOv11n. The term “10架” in the first-row image indicates the shearer drum at the position of the 10th hydraulic support in the underground coal mine, and “78架” in the third-row image indicates the shearer drum at the position of the 78th hydraulic support.
Figure 11. Typical detection failure cases.
Table 1. Experimental environment.

| Configuration | Parameters |
|---|---|
| Deep learning framework | PyTorch 2.1.0 + Python 3.8.0 |
| Operating system | Windows 10 |
| GPU | NVIDIA GeForce RTX 3090 |
| CPU | Intel(R) Core(TM) i7-12700 @ 2.10 GHz |
Table 2. Experimental results of different loss functions.

| Loss | Pr (%) | Re (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
|---|---|---|---|---|
| GIoU | 88.9 | 76.4 | 83.9 | 52.8 |
| DIoU | 89.4 | 76.6 | 84.1 | 53.1 |
| EIoU | 90.1 | 76.3 | 84.5 | 53.5 |
| Shape-IoU | 90.8 | 76.0 | 84.9 | 53.8 |
Table 3. Ablation experiment results.

| Method | DCNv4 | CAFM | Shape-IoU | Pr (%) | Re (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
|---|---|---|---|---|---|---|---|
| a | | | | 88.4 | 77.1 | 84.5 | 52.9 |
| b | √ | | | 88.7 | 77.3 | 84.8 | 53.6 |
| c | | √ | | 88.5 | 77.6 | 84.7 | 53.4 |
| d | | | √ | 90.8 | 76.0 | 84.9 | 53.8 |
| e | √ | √ | | 89.6 | 78.0 | 85.1 | 54.7 |
| f | √ | | √ | 90.3 | 77.8 | 85.2 | 55.1 |
| g | | √ | √ | 89.2 | 79.1 | 85.0 | 54.5 |
| h | √ | √ | √ | 91.3 | 80.3 | 85.6 | 56.2 |
Table 4. Comparison of results of different methods.

| Method | Params (M) | Flops (G) | FPS (f/s) | Pr (%) | Re (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
|---|---|---|---|---|---|---|---|
| Faster R-CNN | 40.5 | 200.3 | 9.2 | 83.2 | 78.9 | 83.8 | 51.8 |
| SSD | 26.3 | 30.8 | 58.7 | 82.0 | 76.6 | 82.1 | 50.3 |
| DETR | 41.7 | 86.5 | 28.4 | 85.1 | 78.4 | 84.0 | 51.4 |
| RT-DETR | 19.2 | 56.3 | 46.3 | 90.9 | 75.6 | 84.4 | 52.7 |
| RetinaNet | 37.7 | 204.1 | 11.8 | 84.5 | 77.2 | 83.5 | 50.5 |
| YOLOv8n | 3.1 | 8.8 | 83.5 | 90.8 | 75.9 | 84.3 | 52.5 |
| YOLOv11n | 2.6 | 6.5 | 89.1 | 88.4 | 77.1 | 84.5 | 52.9 |
| Ours | 2.4 | 6.2 | 95.6 | 91.3 | 80.3 | 85.6 | 56.2 |

