1. Introduction
With the rapid development of intelligent mine construction, autonomous mining trucks (AMTs) have been widely applied in mining, owing to their advantages in fatigue driving risk mitigation and operational cost reduction [
1]. It is important for AMTs to perceive surrounding environmental information, which is the foundation of decision-making and control. As a common evaluation indicator of perception, high object detection accuracy plays a vital role for AMTs in complex mining environments [
2]. Although object detection for urban road scenarios has reached maturity, its application in unstructured mining environments remains remarkably limited.
Object detection is difficult in open-pit mines because of the complexity of the environment, e.g., significant scale variation, severe mutual occlusion, and the camouflage effect. To address the problem of significant scale variation, Song et al. [3] built MSFANet to capture rich context features, enhancing the feature saliency of objects with different scales. Detection accuracy is also reduced by the loss of critical feature information caused by mutual occlusion between objects in open-pit mines. Therefore, Bo et al. [4] combined a contextual information guidance module with the efficient squeeze–excitation attention mechanism, ensuring that the model focuses on channels with important feature information. In addition, the high similarity between targets and the background causes interference that increases missed and false detections. Hence, Ren et al. [5] constructed a multi-scale fusion and attention-based model to improve object detection performance for camouflaged obstacles in mining.
Apart from the aforementioned characteristics of unstructured open-pit mine environments, variable weather is also an important factor affecting the accuracy of object detection. Currently, research on object detection for AMTs primarily focuses on normal weather conditions, but AMTs often operate in dusty environments [
6], as shown in
Figure 1. Compared with normal weather conditions, visual characteristics of images such as color balance, fine-grained details, and luminance are severely distorted by dust interference, complicating the extraction of critical channel-wise semantic features and the representation of salient object information [
7,
8,
9]. Ordinary bad weather (such as rain and fog) consists of evenly distributed droplets and introduces interference based on Mie scattering. In contrast, mineral dust is composed of irregularly shaped mineral particles with an uneven spatial distribution, causing local occlusion that disrupts object feature continuity and induces edge confusion in detection tasks. Additionally, current detection methods for adverse weather conditions assume a uniform distribution of rain, fog, and water droplets in the atmosphere and aim to enhance detection performance by reducing optical interference. However, the distribution of dust particles in dusty environments violates this assumption, and dust particles can also block the texture and edge details of objects, making existing detection algorithms inapplicable. Consequently, inaccurate positioning and false detection may occur, directly impairing the detection accuracy of AMTs [
10].
In this article, an efficient detector network for AMTs is presented that specifically addresses the robust extraction of salient target features and the accurate identification of structural characteristics, thereby improving detection accuracy for AMTs in dusty environments. The main contributions of this work are as follows:
(1) To improve the accuracy of object detection for AMTs in dusty weather, a novel object detection method based on YOLOv8 and named multi-branch feature interaction and location detection net (MBFILNet) is presented.
(2) Aiming at the challenge of discriminating salient target information in dust-laden environments, the dust interference is filtered out based on MBFI-DO, which integrates multi-branch information interaction and employs differential information guidance.
(3) To enhance the discrimination of structural features in dust-obscured environments, DSC-NLA is designed to capture global spatial long-range dependencies and enhance cross-channel information interaction. This design can augment the recognition robustness of detection models across different dust levels.
The remainder of this paper is organized as follows. Related works on detection under adverse weather conditions are introduced in
Section 2. The proposed MBFILNet is elaborated upon in
Section 3. In
Section 4, extensive experiments on the Dusty Open-pit Mine (DOM) dataset are conducted to evaluate the performance of MBFILNet. Finally, conclusions are presented in
Section 5.
2. Related Works
In this section, relevant works about object detection under adverse weather, as well as feature enhancement and global context feature representation, are analyzed briefly.
2.1. Object Detection Under Adverse Weather
Although the YOLO series [
11] and Faster R-CNN [
12] methods achieve high detection accuracy in optimal weather conditions, their performance proves less satisfactory under adverse weather phenomena [
13]. To overcome this problem, research in the last few years has primarily focused on three approaches: (1) image restoration networks, (2) enhanced detection algorithms, and (3) domain adaptation. Image restoration improves image quality and thereby detection accuracy. Huang et al. [
14] developed DSNet, a dual-branch network sharing feature extraction for joint image restoration and detection. IA-YOLO [
15] comprises restoration and detection networks with a learnable image processing module, which is trained in an end-to-end manner using detection loss alone. However, the approaches mentioned above inevitably lose critical image information during the recovery process. To mitigate information loss in restoration networks, enhanced detection algorithms with scene feature extraction capabilities have been proposed. A lightweight object detection network with a novel plug-and-play Cross-Fusion (CF) block was proposed by Ding et al. [
16], combining the advantages of FPN and PAN in a more flexible architecture. Conventional detection algorithms require a substantial amount of training data, yet datasets under adverse weather conditions are particularly limited. To address these limitations, domain adaptation techniques have attracted widespread attention in the field of image detection. Wang et al. [
17] proposed the image Quality Translation Network (QTNet) and Feature Calibration Network (FCNet), enabling models to progressively generalize from clear-weather to adverse-weather domains based on domain adaptation. Nevertheless, cross-domain learning often loses critical channel dimension information, resulting in poor accuracy performance under adverse weather conditions.
Furthermore, dust storms occur less frequently than traditional adverse weather conditions and are mostly found in open-pit mines, resulting in a scarcity of related research. Currently, there are two kinds of approaches for object detection in dusty mines: image restoration methods and enhanced detection algorithms. For example, Tian et al. [
18] proposed a coal mine image enhancement algorithm based on dual-domain decomposition, which achieved image defogging in low-frequency images to eliminate the influence of dust. However, this method ignores the color-shift effect caused by coal dust, resulting in color distortion of the restored image and an unsatisfactory dust removal effect. In terms of enhanced detection algorithms, Yang et al. [
19] proposed a multi-scale edge enhancement (MSEE) module and fused it with the C2f module, which enhanced the extraction of personnel features under high-dust conditions. Nevertheless, this approach failed to address the background interference of dust and did not significantly improve detection accuracy. In summary, there is a significant gap in the research on dusty open-pit mines, which motivates our study of object detection in dusty mine environments.
Motivated by enhanced detection algorithms, in this paper, crucial latent features are preserved by analyzing images corrupted by dust, thereby avoiding the information loss inherent in preprocessing stages.
2.2. Feature Enhancement
Deep learning-based object detection methods extract high-dimensional features through backbone networks, but the feature extraction abilities are reduced in dust-obscured images due to interference from backgrounds. To enhance multi-scale representation, Lin et al. [
20] introduced the Feature Pyramid Network (FPN) by aggregating high-level and low-level features across different resolutions. However, this approach failed to address feature degradation in adverse weather conditions, limiting its robustness in adverse environments. To enhance feature representation in weather-degraded scenarios, Chen et al. [
21] proposed detail-enhanced convolution (DEConv) and content-guided attention (CGA), boosting dehazing performance significantly. However, the methods mentioned above are sufficient for specific tasks only and cannot achieve detection in dynamic and changing environments. To adapt to changing environments, Li et al. [
22] developed a change detection difference enhancement module to extract critical features from difference maps. Based on the above analysis, it can be seen that existing methods primarily focus on detailed feature extraction, neglecting salient feature enhancement through background interference suppression.
To address this limitation, in this paper, we propose MBFI-DO, which guides discriminative feature extraction in target regions based on the integration of high-level semantic information, suppressing dust interference while enhancing salient feature representation.
2.3. Global Context Feature Representation
Enhancing the feature representation in dusty environments [
23,
24] is crucial for target localization based on global receptive fields and contextual information, which can be obtained by modeling the global relationship between targets and the background [
25,
26]. Noman et al. [
27] constructed an efficient local–global context aggregator (ELGCA) module within the encoder to capture enhanced global context and local spatial information. However, oversimplification of long-range dependencies may occur due to its fixed pooling strategy. To address the challenge of feature refinement, Hu et al. [
28] designed a guided refinement decoder (GRD) to extract context information and further refine prediction results. Nevertheless, dust-induced noise is propagated from high-level features to finer scales due to the lack of adaptive filtering. In contrast, the non-local neural network (NLNet) proposed in ref. [
29] established reliable global context feature representations by calculating the pairwise correlations between spatial pixels, in which noise interference is reduced at the same time.
Inspired by the theoretical insights in ref. [
29], DSC-NLA is proposed in this paper. First, multi-scale receptive field features are extracted from feature maps by means of depthwise separable convolution. Subsequently, the representations of the heterogeneous receptive fields are aggregated based on feature fusion strategies. Finally, pixel-wise correlation modeling is employed to construct spatial long-range dependencies, and target localization is completed.
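The pixel-wise correlation step can be illustrated with a minimal, dependency-free sketch of generic non-local aggregation over a flattened feature map. This follows the common embedded-similarity form with softmax normalization and a residual connection; it is not the DSC-NLA design itself. The function name `non_local` and the toy input are hypothetical, and the learned projection layers of a real non-local block are omitted.

```python
import math


def non_local(features):
    """Aggregate every position from all positions, weighted by similarity.

    features: list of N positions, each a list of C channel values
    (i.e., an H*W feature map flattened to N = H*W rows).
    Returns a list with the same shape.
    """
    out = []
    for query in features:
        # Pairwise dot-product similarity between this position and all positions.
        scores = [sum(a * b for a, b in zip(query, key)) for key in features]
        # Softmax over positions (subtract the max for numerical stability).
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of all positions gives the global context for this position.
        context = [
            sum(w * value[c] for w, value in zip(weights, features))
            for c in range(len(query))
        ]
        # Residual connection: original feature plus global context.
        out.append([x + y for x, y in zip(query, context)])
    return out


# Toy example: three positions with two channels each.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
refined = non_local(toy)
```

Because every output position depends on all input positions, the cost grows quadratically with the number of positions, which is one reason such blocks are usually applied to low-resolution feature maps.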
4. Results and Discussion
In this section, extensive comparative and ablation experiments are conducted on the public DAWN dusty dataset and the self-made DOM dataset. All experiments are performed on Ubuntu 20.04 using the PyTorch 1.10.0 deep learning framework and CUDA 11.3 for computational acceleration, with models trained and validated on an NVIDIA RTX 3080 GPU.
4.1. Construction of Dusty Open-Pit Mine Datasets
The Dusty Open-pit Mine (DOM) dataset is constructed in this paper to validate robust object detection for autonomous mining trucks in dust-obscured open-pit mining environments. It combines field data acquisition, selective sampling from the AutoMine database, and CycleGAN-based data augmentation. The dataset contains 7371 dusty images across four categories (Bulldozer, Mining-Truck, Excavators, and Loader) and is divided into training and testing sets in an 8:2 ratio. The instance counts for “Bulldozer”, “Mining-Truck”, “Excavators”, and “Loader” are 623, 6447, 5075, and 1020, respectively. The scale and distribution of different categories of objects are shown in
Figure 5.
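As a quick check of how unevenly the four categories are represented, the reported instance counts can be turned into shares. The snippet below is only arithmetic on the numbers stated above; the variable names are illustrative.

```python
# Instance counts per category as reported for the DOM dataset.
counts = {
    "Bulldozer": 623,
    "Mining-Truck": 6447,
    "Excavators": 5075,
    "Loader": 1020,
}

total = sum(counts.values())  # 13165 annotated instances in total
shares = {name: n / total for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# Ratio between the most and least frequent category.
imbalance = max(counts.values()) / min(counts.values())
```

The most frequent category (Mining-Truck) has roughly ten times as many instances as the least frequent one (Bulldozer), so per-class AP is worth inspecting alongside mAP.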
The CycleGAN model consists of a ResNet-based generator with six residual blocks and two stride-2 convolutions for downsampling, paired with a 70 × 70 PatchGAN discriminator. Three loss functions are used: cycle-consistency loss (λ = 10), LSGAN adversarial loss, and identity loss (λ = 0.5). Training was conducted on an unpaired dataset of 7371 real dusty and 7371 synthetic clean images using the Adam optimizer (lr = 0.0002, β1 = 0.5) for 300 epochs with a fixed batch size of 8. A 1:1 synthetic-to-real data ratio was maintained during detector training to preserve CycleGAN output diversity, and the original CycleGAN methodology was followed strictly, without architectural modifications or pretraining.
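For reference, the training setup described above can be collected into a small configuration object. This is a sketch rather than the authors' code: the class and field names are hypothetical, and the way the loss weights are combined (each weight multiplying its own loss term) is an assumption, since some CycleGAN implementations scale the identity weight relative to the cycle weight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleGANConfig:
    # Architecture settings stated in the text.
    n_res_blocks: int = 6      # residual blocks in the ResNet-based generator
    n_downsample: int = 2      # stride-2 convolutions for downsampling
    patch_size: int = 70       # receptive field of the PatchGAN discriminator
    # Loss weights stated in the text (field names are illustrative).
    lambda_cyc: float = 10.0   # cycle-consistency loss weight
    lambda_id: float = 0.5     # identity loss weight
    # Optimization settings stated in the text.
    lr: float = 2e-4
    beta1: float = 0.5
    epochs: int = 300
    batch_size: int = 8


def generator_objective(adv_loss, cyc_loss, id_loss, cfg):
    """Weighted sum of the three generator loss terms.

    Assumption: each weight multiplies its own term directly.
    """
    return adv_loss + cfg.lambda_cyc * cyc_loss + cfg.lambda_id * id_loss


cfg = CycleGANConfig()
example_total = generator_objective(1.0, 0.1, 0.2, cfg)
```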
Our framework creates a closed-loop system where CycleGAN generates DOM training data with physically accurate dust patterns. These synthetic data directly shape MBFILNet’s architectural design; specifically, its MBFI-DO and DSC-NLA modules are optimized to handle the characteristic interference patterns present in CycleGAN-generated data. To rule out data leakage, we perform forced bundling of adjacent time-series data, selecting data samples from different time periods as the training set and test set so as to ensure complete independence between the training set and the test set.
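The leakage-avoidance step (bundling temporally adjacent frames and assigning whole bundles to either the training or the test set) can be sketched as follows. The window length, the function name `grouped_split`, and the synthetic timestamps are hypothetical; the text does not state how adjacent frames were bundled in practice.

```python
import random
from collections import defaultdict


def grouped_split(samples, window_s=60.0, test_ratio=0.2, seed=0):
    """Split samples so that no time window contributes to both sets.

    samples: list of (image_id, timestamp_in_seconds) pairs.
    window_s: length of one time bundle (an assumed value).
    """
    # Bundle temporally adjacent frames into the same group.
    groups = defaultdict(list)
    for image_id, timestamp in samples:
        groups[int(timestamp // window_s)].append(image_id)

    # Shuffle whole groups, then assign complete groups to the test set.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(len(keys) * test_ratio))
    test_keys = set(keys[:n_test])

    train = [i for k in keys if k not in test_keys for i in groups[k]]
    test = [i for k in keys if k in test_keys for i in groups[k]]
    return train, test


# Synthetic example: one frame every 10 s, bundled into 60 s windows.
frames = [(f"img{i}", i * 10.0) for i in range(100)]
train_ids, test_ids = grouped_split(frames)
```

Splitting at the group level means the realized train/test ratio only approximates 8:2, because groups may contain different numbers of frames.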
4.2. Evaluation Metrics
In the object detection experiments on the DOM and the DAWN public dusty cityscape datasets, the models were pretrained for 100 epochs on the COCO dataset. Specifically, the number of epochs and learning rate were set to 200 and 0.01 in the training of the models, and the batch size was set to 16. In addition, mean Average Precision (mAP) is adopted as the evaluation criterion, with a confidence threshold of 0.5. The formulas for precision, recall, AP, and mAP are expressed as follows:
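$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN},$$

$$AP = \int_{0}^{1} P(R)\,\mathrm{d}R, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_{i},$$

where TP is the number of true-positive bounding boxes with an IoU > 0.5, FP is the number of false-positive bounding boxes with an IoU ≤ 0.5, FN is the number of false negatives, and N refers to the number of object classes.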
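The metric definitions can be made concrete with a short sketch. Precision and recall follow directly from the counts defined above; the AP routine uses all-point interpolation of the precision-recall curve, which is a common convention and an assumption here, since the exact AP computation used in the experiments is not specified. All function names are illustrative.

```python
def precision(tp, fp):
    """Fraction of predicted boxes that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Fraction of ground-truth boxes that are detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def average_precision(scored_hits, n_gt):
    """Area under the interpolated precision-recall curve for one class.

    scored_hits: list of (confidence, is_true_positive) per predicted box,
                 where is_true_positive already reflects the IoU > 0.5 test.
    n_gt: number of ground-truth boxes of this class (must be > 0).
    """
    ranked = sorted(scored_hits, key=lambda item: item[0], reverse=True)
    tp = fp = 0
    curve = []  # (recall, precision) after each prediction, by descending confidence
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        curve.append((recall(tp, n_gt - tp), precision(tp, fp)))

    ap, prev_recall = 0.0, 0.0
    for i, (r, _) in enumerate(curve):
        # Interpolated precision: best precision at this or any higher recall.
        p_interp = max(p for _, p in curve[i:])
        ap += (r - prev_recall) * p_interp
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values (N = number of classes)."""
    return sum(ap_per_class) / len(ap_per_class)


# Toy example: three predictions for a class with two ground-truth boxes.
ap_toy = average_precision([(0.9, True), (0.8, False), (0.7, True)], n_gt=2)
```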
4.3. Experiments on DOM and DAWN
Comparison experiments with several classic object detection models are conducted in this section, demonstrating the excellent performance of MBFILNet in dusty open-pit mines. Comparison models include the YOLO series, dusty image restoration before detection (DIR), domain adaptation detection (DAD) methods, and enhanced detection algorithms (EDG). All models are trained and tested based on DOM and DAWN, and training is stopped when the model reaches convergence.
The mAP and FPS of the object detection models in different dust conditions are shown in
Figure 6. As can be seen from
Figure 6, compared with mainstream detection methods for adverse weather conditions, MBFILNet achieved the highest detection accuracy. More importantly, while delivering this exceptional precision, MBFILNet also maintains competitive FPS performance, surpassing most existing models. This dual advantage highlights that MBFILNet has achieved an optimal balance between detection accuracy and computational efficiency.
As can be seen from
Table 1, MBFILNet obtains an mAP of 72.0%, which is higher than that of the DIR methods DSNet, IA-YOLO, and BAD-Net by 3.8%, 2.7%, and 0.6%, respectively. In comparison with the DAD method MIC, the mAP of the proposed model is higher by 0.9% and 1.1% on the DOM and DAWN datasets, respectively. Furthermore, MBFILNet surpasses the advanced EDG method Featenhancer by 1.3% on DOM and 2.6% on DAWN. According to the FPS indicators in
Table 1, the FPS values of the DIR, DAD, and EDG methods are substantially lower, rendering them unsuitable for object detection on AMTs. In addition, compared with the base YOLOv8 model, the mAP values of MBFILNet are improved by 2.0% and 3.7% on the DOM and DAWN datasets, respectively. MBFILNet thus achieves the best accuracy relative to the original model while maintaining efficient operation, at the cost of only a modest 17.8 FPS reduction in speed. This balance between accuracy and processing speed meets the real-time requirements of AMTs. Notably, MBFILNet also outperforms the newer YOLO11, with mAP gains of 1.8% on the DOM dataset and 1.7% on the DAWN dataset.
The detection results of the above models on the DOM datasets are visualized in
Figure 7. Although BAD-Net adaptively enhances the input images by eliminating weather-specific information, some relevant target features are inevitably lost during the process. In comparison, YOLO11 effectively enhances multi-scale feature fusion through cross-scale connections and deformable convolutional modules. However, false detections in complex dusty environments occur due to inadequate structural information. Moreover, although multi-scale features are extracted by the multiple layers of YOLOv8, the model cannot overcome severe dust interference. In contrast, MBFILNet achieves better performance by precisely capturing important object details and structural information in dusty environments.
It can be concluded that MBFILNet performs best in various experiments compared with various classic algorithms on the DOM and DAWN datasets. The comparison shows that the introduction of MBFI-DO and DSC-NLA in MBFILNet not only improves detection accuracy but also significantly reduces both false and missed detections, exhibiting stronger overall robustness.
4.4. Ablation Study
This section investigates the robustness of each component of the detection method proposed in this paper. All experiments are conducted on the DOM dataset, and the baseline model is built based on YOLOv8, with results shown in
Table 2.
As can be seen from
Table 2, YOLOv8-MBFI-DO improved mAP by 1.4% compared with YOLOv8. As illustrated in
Figure 8, the interference from the dust background is reduced by MBFI-DO under the guidance of semantic information, thereby enhancing the salient feature representation of target objects. This demonstrates that MBFI-DO can improve detection accuracy by focusing on the essential characteristics of objects in open-pit mines affected by dust.
In addition, mAP values based on YOLOv8-DSC-NLA are improved by 1.6% relative to YOLOv8. As a result, in comparison to standard NLA, DSC-NLA demonstrates better suitability for target detection of AMTs in dusty conditions in comparative ablation studies.
Figure 9 verifies that target structural information is enhanced by modeling global spatial long-range dependencies based on DSC-NLA, thereby emphasizing contour boundary features between objects.
Notably, the mAP is increased by 2.0% using the combination of MBFI-DO and DSC-NLA, where local semantic information is enriched to reduce dusty background interference. Furthermore, compared with YOLOv8, the MBFILNet proposed in this paper improves accuracy at the cost of a slightly increased computational load, ensuring its suitability for the mobile deployment of AMTs in dusty environments.
4.4.1. Ablation Experiments on MBFI-DO
In this section, ablation experiments on MBFI-DO are performed to explore the effects of MBFI and DO, the results of which are shown in
Table 3.
As can be seen from
Table 3, detection performance is enhanced by each component of MBFI-DO. Compared to YOLOv8, MBFI and DO significantly enhance mAP by 1.0% and 0.7% on DOM and 0.7% and 0.8% on DAWN, respectively. In particular, compared with the baseline model with YOLOv8, 1.0% and 2.0% mAP improvements are achieved on the DOM and DAWN datasets due to MBFI. Moreover, the salient feature representations of the targets are better focused under DO guidance, which also reduces the number of parameters.
Combined experiments with different arrangements of global average pooling (GAP) and global max pooling (GMP) in feature extraction show that the accuracy improvement is maximized by placing GAP with a larger-kernel convolution after feature extraction. The reason is that GAP with a larger-kernel convolution focuses better on contextual spatial information, whereas GMP focuses more on the salient information of the target, employing gradient-guided feature amplification to suppress non-critical regions. In addition, exchanging the order of GAP and GMP yields the worst performance, with the mAP decreasing by 2.1% and 3.8% on the DOM and DAWN datasets, respectively (
Table 4).
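The different roles of the two pooling operations can be seen on a toy feature map. The sketch below shows only the pooling step, with illustrative function names; the larger-kernel convolution and the way the pooled descriptors are used inside MBFI-DO are not reproduced here.

```python
def global_avg_pool(fmap):
    """One value per channel: the mean over all spatial positions.

    fmap: list of channels, each a 2D list with H rows and W columns.
    """
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def global_max_pool(fmap):
    """One value per channel: the strongest response over all positions."""
    return [max(max(row) for row in ch) for ch in fmap]


# One channel with a weak background response and a single strong activation.
toy_map = [[[0.1, 0.1],
            [0.1, 0.9]]]

gap = global_avg_pool(toy_map)  # summarizes the whole spatial context
gmp = global_max_pool(toy_map)  # keeps only the most salient response
```

In this example GAP is pulled toward the background level, while GMP returns the peak activation, which matches the division of labor described above: context from GAP, saliency from GMP.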
4.4.2. Ablation Experiments on DSC-NLA
The experimental results for SE, CA, DAM, and NLA are presented in
Table 5. Compared to DAM, NLA significantly enhances mAP by 0.6% and 1.2% on DOM and DAWN, respectively. The increased performance is attributed to the NLA modeling of dependencies between different object directions under the interference of dusty environments.
As shown in
Table 6, compared with STDConv and GhostConv, the depthwise separable (DW) convolution applied in this paper not only reduces the number of parameters but also improves mAP on the DOM dataset by 0.8% and 0.4%, respectively. This is attributable to the fact that DW convolution filters each input channel spatially on its own, avoiding the weight coupling between channels observed in standard convolution.
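The parameter saving follows from simple counting. A standard k × k convolution needs k·k·C_in·C_out weights, whereas a depthwise separable convolution needs k·k·C_in weights for the per-channel spatial filter plus C_in·C_out weights for the 1 × 1 pointwise mixing. The sketch below ignores biases, and the channel sizes are hypothetical values chosen for illustration, not the layer sizes of MBFILNet.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights of a depthwise k x k filter plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k spatial filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


# Illustrative layer: 256 input channels, 256 output channels, 3 x 3 kernel.
std = standard_conv_params(256, 256, 3)        # 589824
dsc = depthwise_separable_params(256, 256, 3)  # 67840
ratio = dsc / std                              # equals 1/c_out + 1/k**2
```

For this layer the separable form uses about 11.5% of the weights of the standard convolution.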
4.5. Robustness Evaluation
In the DOM dataset, we categorized the test sets into three levels (clear, light dust, and heavy dust) for dust conditions in open-pit mining environments. Clear conditions with visibility exceeding 80 m represent normal operations. Light dust conditions occur when visibility ranges between 30 and 80 m, indicating moderately challenging working environments. Heavy dust conditions emerge when visibility drops below 30 m. Based on the above standards, the test proportions of clear weather, light dust, and heavy dust are 37%, 52%, and 11%, respectively.
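The three dust levels can be written as a small rule on measured visibility. The thresholds are those stated above; the function name and the handling of the exact boundary values (30 m and 80 m are treated as light dust) are assumptions.

```python
def dust_level(visibility_m):
    """Map visibility in metres to the dust category used for the DOM test set."""
    if visibility_m > 80:
        return "clear"        # visibility exceeding 80 m
    if visibility_m >= 30:
        return "light dust"   # visibility between 30 and 80 m
    return "heavy dust"       # visibility below 30 m


levels = [dust_level(v) for v in (120, 80, 45, 30, 12)]
```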
The mAP curves in different dust scenarios are shown in
Figure 10. As can be found from
Figure 10, MBFILNet demonstrates the most outstanding detection performance under heavy dust conditions, fully demonstrating the robustness of MBFILNet against interference in dusty environments. It is worth noting that as the dust becomes more severe, the detection performance of existing algorithms shows a significant downward trend. This phenomenon further verifies the fact that dust conditions could exacerbate the difficulty of AMT target detection.
Additionally, the experimental results demonstrate MBFILNet’s superior robustness under challenging dust conditions. As indicated in
Table 7, MBFILNet achieves the highest detection accuracy of 68.7% mAP in heavy dust environments, outperforming all comparable real-time methods. Notably, it maintains a 0.5% mAP improvement over R-YOLO while running faster (185.6 FPS versus 113.7 FPS). While conventional YOLO-series detectors exhibit excellent computational efficiency, they show notable performance degradation in heavy dust conditions; for example, the mAP of YOLOv9 is approximately 1.6% lower than that of MBFILNet. Compared with YOLO-series models, the proposed MBFILNet model successfully addresses the common trade-off between speed and accuracy in dust-obscured environments, achieving both real-time processing capabilities and superior detection robustness.
Table 8 presents the detailed performance metrics of MBFILNet over five independent training runs. The results show minimal fluctuation across runs, with run 3 achieving the highest mAP value of 72.2% on DOM and run 4 showing the best performance of 56.1% on DAWN. The calculated average mAP values of 72.02% and 55.88% confirm the stability of our approach, while the narrow standard deviations of ±0.2% and ±0.3% further substantiate the reproducibility of these improvements. These comprehensive measurements address potential concerns about performance variance and validate the reliability of our reported results.
Furthermore, in order to verify the generalization ability of MBFILNet, we also conducted experimental comparisons on the public non-dust KITTI dataset. A comparison of the performance of object detection methods on the KITTI dataset is shown in
Table 9. With a precision of 94.2% and a recall of 88.6%, MBFILNet achieves the highest mAP value of 93.4% among all evaluated methods, surpassing Faster R-CNN by 5.2%, YOLOv8 by 0.6%, and YOLOv9 by 1.0%. In terms of processing speed, MBFILNet maintains excellent efficiency at 185.6 FPS, significantly outperforming Transformer-based approaches such as RT-DETRv2-R18, which achieves an mAP value of 90.4% at only 28.9 FPS. This evaluation positions MBFILNet as an effective solution for autonomous driving applications where both detection accuracy and real-time performance are critical requirements. The results on this non-dusty dataset also highlight the generalization ability of MBFILNet.
5. Conclusions
To improve detection accuracy in dusty environments, an efficient object detector for AMTs called MBFILNet is proposed in this paper, which incorporates the MBFI-DO and DSC-NLA modules. MBFI-DO enhances discriminative features in target regions and integrates semantic information across multiple levels. DSC-NLA captures global spatial long-range dependencies based on pixel correlations. Meanwhile, a feature fusion strategy is implemented to aggregate diverse receptive-field representations, enhancing multi-scale object detection capability.
Extensive experiments and comparisons with the latest methods on self-made and public datasets show that the proposed MBFILNet performs better in AMT object detection under dusty conditions. Notably, MBFILNet achieved an mAP of 72.0% on our self-made DOM dataset and 55.8% on the public DAWN dataset, representing improvements of 2.0% and 3.7%, respectively, over the baseline YOLOv8 model. These gains demonstrate the capability of MBFILNet in addressing the low object detection accuracy caused by the difficulty of extracting salient feature representations and edge information against dusty backgrounds.
Although the proposed MBFILNet performs robust detection in dusty environments, its performance in extremely dust-laden environments is unsatisfactory. Extremely dust-laden environments include environments with extremely high dust concentrations and visibility lower than 5 m, as well as sandstorm conditions. Furthermore, the unique working environment of open-pit mines is associated not only with significant dust problems but also with frequent complex adverse conditions such as low light, rain, and snow. When these environmental factors occur simultaneously with dust, more extreme multimodal interference scenarios are formed, posing even greater challenges to the perception systems of unmanned mining trucks. As extremely dust-laden conditions severely degrade image quality and cause significant information loss, target discriminability should be enhanced by means of multi-sensor fusion in the future.