Article

YOLOv8n-Al-Dehazing: A Robust Multi-Functional Operation Terminals Detection for Large Crane in Metallurgical Complex Dust Environment

College of Railway Transportation, Hunan University of Technology, Zhuzhou 412007, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Information 2025, 16(3), 229; https://doi.org/10.3390/info16030229
Submission received: 2 January 2025 / Revised: 3 February 2025 / Accepted: 13 March 2025 / Published: 15 March 2025

Abstract

In the aluminum electrolysis production workshop, heavy-load overhead cranes equipped with multi-functional operation terminals are responsible for critical tasks such as anode replacement, shell breaking, slag removal, and material feeding. The real-time monitoring of these four types of operation terminals is of the utmost importance for ensuring production safety. High-resolution cameras are used to capture dynamic scenes of operation. However, the terminals undergo morphological changes and rotations in three-dimensional space according to task requirements during operations, lacking rotational invariance. This complicates the detection and recognition of multi-form targets in 3D environments. Additionally, operations like striking and material feeding generate significant dust, often visually obscuring the terminal targets. The challenge of real-time multi-form object detection in high-resolution images affected by smoke and dust demands both detection and dehazing algorithms. To address these issues, we propose the YOLOv8n-Al-Dehazing method, which achieves the precise detection of multi-functional material handling terminals in aluminum electrolysis workshops. To overcome the heavy computational cost of processing high-resolution images with YOLOv8n, our method refines YOLOv8n through component substitution and integrates real-time dehazing preprocessing for high-resolution images, thereby reducing the image processing time. We collected on-site data to construct a dataset for experimental validation. Compared with YOLOv8n, our approach increases inference speed by 15.54%, achieving 120.4 frames per second, which meets the requirements for real-time detection on site. Furthermore, compared with state-of-the-art detection methods and variants of YOLO, YOLOv8n-Al-Dehazing demonstrates superior performance, attaining an accuracy rate of 91.0%.

1. Introduction

In the aluminum electrolysis production process, multi-functional cranes play a pivotal role by performing critical operations such as materials handling, anode replacement, shell breaking, slag removal, and material feeding, as illustrated in Figure 1, serving as the backbone of the production process. The crane’s multi-functional terminals are tasked with directly removing slag at temperatures ranging from 950 °C to 970 °C (slag removal operation), spreading alumina on new anodes (material feeding operation), crushing spent anodes (shell breaking operation), and transporting anodes within the electrolytic cells (anode replacement operation). These tasks demand precise control by operators. However, the high-temperature working environment generates substantial smoke, which increases image noise and color distortion and makes it exceedingly difficult to distinguish between terminals and backgrounds. During operations such as material feeding and shell breaking, severe dust emissions further exacerbate the issue, with large amounts of non-uniform dust obscuring the operation terminals and significantly reducing visibility, thus complicating detection. The operation terminals undergo vertical movements and horizontal rotations during material transport, accompanied by various postural changes such as grasping and striking. These factors collectively pose significant challenges for accurate terminal recognition and detection. Additionally, the complex electrolytic cell structures make the terminals highly susceptible to collisions with other on-site equipment during actions like grabbing, striking, and aligning. Given the constraints of the aluminum electrolysis production environment and the large size of the cranes, providing remote operators with visual monitoring of the various terminals becomes crucial. Identifying the terminal types under extreme heat, heavy dust, and intense mechanical vibration, while continuously monitoring their operational status, is of paramount importance for maintaining production safety.
To address the challenges posed by unstable lighting conditions and the difficulty of distinguishing between intersecting equipment in the production environment, we deploy high-resolution cameras (2560 × 1080) at four corners surrounding the multi-functional operation terminals, as illustrated in Figure 2. These cameras automatically track the vertical movements of the terminals during crane operations, capturing video data for continuous monitoring. The task of detecting and recognizing multiple types of operation terminals within the monitoring view can be framed as an object detection problem. Traditional object detection methods rely on manually designed features such as color and contour, extracting fixed-dimensional feature vectors from candidate regions and classifying them through trained classifiers [1,2]. These methods rely on prior knowledge about the objects, identifying distinctive features that stand out from the surrounding environment. However, during dynamic operations, the multi-functional terminals exhibit little discernible difference in color and shape from their environments, undergo changes in posture due to vertical movement, and rotate horizontally. These factors complicate the extraction of object features from images and reduce the processing speed of traditional methods when applied to high-resolution imagery. Deep learning-based object detection methods, capable of automatically extracting highly discriminative features from images, offer a significant enhancement in performance and accuracy. The YOLO series models [3], known for their rapid, real-time, and precise capabilities, provide a promising solution. Nevertheless, the YOLO architecture lacks inherent rotational invariance, which can affect detection performance when terminals rotate. Moreover, the reliance of YOLO on a single-scale feature map for predictions makes it challenging to accommodate objects of varying sizes effectively. Additionally, during terminal tool operations, steam and dust emissions, as shown in Figure 3, create non-uniform smoke that obscures the direct features of the operation terminals, while similar background features become more prominent.
To mitigate the interference caused by non-uniform smoke and dust on the detection results, it is imperative to restore clear images. Although high-resolution cameras improve image clarity at the hardware level, the spherical cameras are equipped with remotely controlled wipers that remove dust, smoke, and debris from the lens during operations, and routine maintenance includes periodically cleaning the camera body with high-pressure air guns, these measures still fall short of the precision requirements for object detection, so dehazing algorithms must be applied. Dehazing methods based on the Atmospheric Scattering Model (ASM) enhance image contrast by estimating transmission rates and atmospheric light intensity, thereby recovering obscured images. However, these methods lack the capability to preserve local details and are limited when dealing with severe smoky and dusty conditions. Physical model-based methods [4,5], which simulate the scattering and absorption of photons through haze, estimate scene transmission and atmospheric light intensity by solving the atmospheric scattering equation. While theoretically capable of providing high-quality outcomes, these methods demand precise environmental modeling. In practice, their application to high-resolution images is constrained by the complexity of parameter estimation and computational demands. Deep learning-based dehazing techniques capture more intricate details through training, significantly improving dehazing performance in complex environments. Dong et al. [6] introduced the Multi-Scale Dense Block Network (MSDBN), a U-Net architecture that densely fuses multi-scale features to enhance detail but fails to meet real-time processing needs due to its computational demands. Shao et al. [7] proposed a domain adaptation model comprising an image translation module and two dehazing modules, achieving highly nonlinear mappings from blurry inputs to clear outputs via extensive filter learning. This approach, however, increases computational consumption, rendering it unsuitable for the direct processing of high-resolution images. The 4KDehazing [8] method, employing bilateral grid coefficients predicted by a low-resolution branch, transfers this information back to the full-resolution space of the original input, generating high-quality dehazed features that restore image structure and edges. This method not only removes haze from high-resolution images but also meets the real-time processing requirements of practical production environments, offering a promising preprocessing solution for object detection in aluminum electrolysis operations. However, despite effectively enhancing high-resolution image clarity and providing detailed features for the detection module, using high-resolution images as input severely impacts the computational performance of YOLOv8n. Moreover, stacking two networks often leads to a significant increase in parameters and computation, substantially reducing the real-time efficiency of the combined 4KDehazing-YOLOv8n detection scheme.
In summary, to accurately identify the dynamically changing multi-functional operation terminals from high-resolution smoky images captured on-site, we first employ the 4KDehazing network to remove non-uniform smoke and dust interference from high-resolution images in real-time. This process preserves the detailed features of objects in various angles and postures within the scene. To enhance YOLOv8n’s capability in processing high-resolution images, we substitute its backbone with EfficientNetV2. Addressing the issue of parameter proliferation caused by stacked networks, we replace the C2f component with C2fRepghost, thereby reducing the number of parameters while maintaining real-time performance. The YOLOv8n-Al-Dehazing method restores the clarity of high-resolution images, ensuring precise object detection. Through structural improvements, it reduces model parameters, achieving an impressive detection accuracy rate of 91% for operational terminals. The processing speed reaches 120.4 frames per second (FPS) on 2K images, fully meeting the demands for real-time on-site processing.
The main contributions of this paper are summarized as follows:
  • In response to the real-time monitoring requirements for multi-functional operation terminals in aluminum electrolysis environments, we establish a high-resolution surveillance system capable of capturing and identifying dynamic multi-functional operation terminals in real-time.
  • We devise a real-time monitoring scheme for operation terminals that integrates the efficient dehazing capabilities of 4KDehazing with the precise detection prowess of YOLOv8n. This integration effectively addresses the challenge of object detection in high-resolution images obscured by non-uniform smoke, achieving superior accuracy compared to using YOLOv8n alone for detection. To mitigate the computational performance degradation caused by network stacking, we optimize the YOLOv8n network architecture. This enhancement ensures that, while maintaining detection accuracy, the parameter count is significantly reduced, thus preserving the real-time efficiency of the combined network.
In the following sections, we present the content in a structured manner:
In Section 2, we analyze the recent domestic and international research status of dehazing networks and object detection networks, and evaluate the strengths and weaknesses of different methods.
In Section 3, we elaborate on our proposed approach, YOLOv8n-Al-Dehazing, an object detection algorithm designed for operational terminals in aluminum electrolysis dust environments. In this section, we first introduce the dehazing algorithm 4KDehazing, a real-time high-resolution dehazing algorithm used to preprocess images captured by the cameras and eliminate dust interference. Subsequently, we present the YOLOv8n object detection algorithm and its enhancements, where we replace the backbone with EfficientNetV2 and the C2f module with C2fRepGhost to ensure the real-time performance of the integrated network and improve detection accuracy.
In Section 4, we design three sets of experiments to validate the effectiveness of YOLOv8n-Al-Dehazing and demonstrate how the algorithm enhances operational efficiency, safety, and monitoring capabilities in complex industrial environments. Finally, in Section 5, we summarize the paper and outline the deployment plan for the aluminum electrolysis site.

2. Related Works

2.1. Analysis of Object Detection Algorithm

In 2014, R. Girshick et al. introduced the R-CNN [9] algorithm, applying convolutional neural networks to object detection and thereby ushering in extensive research into deep-learning-based object detection. With the rapid advancement of deep learning technologies, significant strides have been made in object detection methods grounded in deep learning. These methods can be broadly categorized into three classes based on their algorithmic processes: Transformer-based detectors such as DETR, two-stage object detection algorithms, and one-stage object detection algorithms.
In industrial scenarios, Wang et al. [10] proposed the MY3Net network, which can perform object detection on construction machinery in complex lighting conditions. Chen et al. [11] introduced a new framework for automatically analyzing the activities and productivity of multiple excavators, enabling the detection, tracking, and recognition of excavator operations. Golcarenarenji et al. [12] developed a method for human detection from an overhead perspective, enhancing the visibility for crane operators in complex industrial environments. However, these methods do not account for the multi-category and multi-unit scenarios. Hao [13] and Kim [14] proposed methods for multi-object detection on construction sites, including workers and various types of machinery. However, these approaches are typically applied in relatively simple operational environments and lack the complexity found in more challenging settings. The current research on overhead crane detection is primarily focused on obstacle detection and path planning [15,16], and there is a lack of studies on the detection of terminal tools. This highlights the need for more comprehensive and robust solutions that can handle the complexities of industrial environments, such as those encountered in aluminum electrolysis operations, where multiple types of equipment and varying environmental conditions (e.g., smoke, dust, and mixed smog) pose significant challenges.
DETR [17] introduced the Transformer [18] architecture, pioneering the application of end-to-end sequence prediction concepts to object detection. Subsequent developments have led to various derivatives, including pnp-detr [19], deformable-detr [20], and DINO [21]. DETR converts images and query embeddings directly into a fixed-length sequence of object bounding boxes and class labels, simplifying the detection process. Its primary advantages lie in its straightforward architecture, ease of comprehension, and superior generalization capabilities in complex scenes. However, DETR performs multiple self-attention computations over the entire input sequence, with each computation traversing all positions; combined with its large number of model parameters, this significantly increases training and inference times, rendering it incapable of real-time detection.
Two-stage object detection algorithms involve two primary phases: first, generating region proposals, and then classifying and performing bounding box regression on each proposal. Representative algorithms include the R-CNN [9] series, followed by advancements such as SPP-Net [22], Fast R-CNN [23], and Faster R-CNN [24], which significantly improve detection accuracy. Further research into feature fusion techniques led to the development of algorithms like FPN [25] and Cascade R-CNN [26], which enhanced the detection of small objects and further elevated the accuracy of these algorithms. Although two-stage object detection methods achieve high precision, the multiple stages involved increase the computational complexity, resulting in longer processing times and slower detection speeds, making them unsuitable for real-time detection requirements in aluminum electrolysis industrial scenarios.
To address the high time overhead associated with two-stage object detection algorithms, one-stage object detection algorithms were introduced. These one-stage algorithms are primarily based on the YOLO series [3]. Starting with YOLOv2 [27], the anchor mechanism from two-stage detectors was incorporated, which, while improving the accuracy of the YOLO series, also introduced new challenges such as complex parameter settings and imbalanced positive-to-negative sample ratios. To overcome these issues, anchor-free algorithms were developed, including CornerNet [28], CenterNet [29], and FCOS [30]. These algorithms enhance the precision and speed of object detection by designing more suitable feature representations. Although one-stage object detection algorithms may not match the accuracy of two-stage algorithms, their fast detection speed, fewer parameters, and lower computational requirements have led to their widespread adoption. For instance, Meimetis et al. [31] designed a multi-object tracking and detection algorithm for real-time multiple tracking of vehicles and pedestrians in traffic. Gai et al. [32] improved the YOLOv4 model to achieve the rapid and accurate detection of cherry fruits during their growth. Alaftekin et al. [33] enhanced the YOLOv4 algorithm for real-time sign language recognition. The latest iteration of the YOLO series, YOLOv8, balances real-time performance with accuracy, offering a promising solution for aluminum electrolysis operational scenarios.

2.2. Analysis of Dehazing Algorithm

Image dehazing algorithms can be broadly categorized into three types: image enhancement, physical model dehazing, and deep learning-based dehazing. Representative methods of image enhancement include histogram equalization [34] and guided filtering [35]. Image enhancement techniques rely on grayscale distribution characteristics and enhance global or local contrast by modifying pixel values. However, these methods can lead to information loss or over-enhancement, failing to fully restore details in images, particularly affecting edge and texture detail recovery, which in turn impacts the accuracy of feature extraction.
Physical model dehazing methods are based on the principles of image degradation, such as the dark channel prior (DCP) [36] and atmospheric scattering models. These methods construct physical models using prior knowledge to compute imaging parameters and restore scenes free of haze. By simulating the propagation of light through haze, they estimate and remove the hazy components in images. Such methods can provide relatively accurate dehazing results, especially in scenarios with moderate to heavy haze. However, they often require solving complex optimization problems, which come with high computational costs, and they impose strict prior assumptions on the input images, limiting their applicability in complex environments.
Deep learning-based dehazing methods train end-to-end on large datasets of hazy and clear images to learn the complex mapping from blurry to clear images. Representative works include DehazeNet [37], AOD-Net [38], DA [7], PMS [39], and the recent GDN series of models, showcasing the powerful capability of deep learning in dehazing tasks. These methods not only improve the dehazing efficacy but also maintain or recover more image details, exhibiting strong adaptability to various types of hazy scenes. The advantage of deep learning models lies in their automatic feature extraction capabilities and adaptability to complex scenarios, making them suitable for the intricate environment of aluminum electrolysis operations. However, due to the high-resolution image processing and real-time requirements of aluminum electrolysis operations, methods such as PMS cannot handle high-resolution images directly and do not meet the real-time requirements. While AOD-Net satisfies the real-time constraints, it too cannot directly process high-resolution images. Methods like DehazeNet and DA can handle high-resolution images but fail to perform real-time dehazing. The 4KDehazing algorithm [8], however, achieves the real-time dehazing of 4K images on a single GPU, operating at 125 FPS, making it particularly well suited for the smoky conditions encountered in aluminum electrolysis operations.

3. Proposed Method

This method integrates the 4KDehazing real-time dehazing algorithm with an enhanced YOLOv8n algorithm to address the challenges of detecting multi-functional operational terminals in dynamic and visually obstructed environments. By preprocessing input images with the 4KDehazing algorithm, the adverse effects of dust on detection performance are effectively mitigated. Additionally, EfficientNetV2 is employed as a replacement for the YOLOv8n backbone, and the C2fRepGhost module is introduced to refine its C2f structure.
4KDehazing significantly enhances dehazing performance through a multi-guided bilateral learning mechanism, which integrates both the global and local information. This approach effectively eliminates dust interference, particularly in dynamic environments, restoring clear visual information. Moreover, its adaptive dehazing strategy dynamically adjusts the intensity of dust removal based on local haze concentration, thereby improving the algorithm’s robustness in visually obstructed environments. This makes it particularly suitable for addressing the non-uniform dust interference generated in aluminum electrolysis operations. Traditional dehazing methods, such as the dark channel prior, rely on handcrafted prior knowledge and struggle to cope with complex and variable dust distributions. Early deep learning methods, on the other hand, often face challenges related to high computational complexity and loss of detail when processing high-resolution images. In contrast, 4KDehazing demonstrates superior real-time performance and accuracy, enabling the rapid processing of high-resolution images while preserving intricate details.
EfficientNetV2 significantly enhances computational efficiency while maintaining high accuracy through the introduction of a progressive scaling strategy and the Fused-MBConv module. The progressive scaling strategy begins training with smaller resolutions and shallower networks, gradually increasing complexity, thereby accelerating training speed and reducing memory consumption. The Fused-MBConv module optimizes the computational structure of shallow networks, improving feature extraction efficiency. In dynamic environments, EfficientNetV2 can swiftly adapt to background changes and capture the dynamic features of targets. In visually obstructed scenarios, its multi-scale feature fusion mechanism effectively enhances the recognition capability of occluded targets.
C2fRepGhost integrates the strengths of the RepGhost module and the C2f module. The RepGhost module employs advanced structural re-parameterization techniques, effectively transforming a complex multi-branch architecture during training into a streamlined single-branch structure during inference. This transformation significantly reduces the computational complexity of the model in object detection tasks, thereby enhancing operational efficiency. Simultaneously, the C2f module facilitates cross-stage partial feature fusion, promoting effective information interaction and integration between features at different levels. This enables the model to more accurately capture multi-scale target information in object detection tasks, improving both robustness and accuracy.
These enhancements endow YOLOv8n-Al-Dehazing with remarkable superiority in both real-time performance and detection accuracy. The architecture of this algorithm is illustrated in Figure 4.

3.1. 4KDehazing

4KDehazing is a real-time high-resolution image dehazing algorithm based on multi-guided bilateral learning, primarily addressing the issue of local detail neglect in previous dehazing methods when processing high-resolution images, which often result in artifacts or detail loss in the dehazed output. By employing a multi-guided bilateral learning framework, the algorithm integrates global and local information, utilizing multiple guidance cues such as luminance, color, and gradients to restore clear images. This approach ensures the preservation of intricate details when processing high-resolution imagery.
The dehazing algorithm 4KDehazing possesses the capability to perform real-time dehazing on high-resolution images. It consists of an upstream branch responsible for low-resolution feature extraction and bilateral grid learning parameters, and a downstream full-resolution dehazing branch that handles affine transformations as illustrated in Figure 5.
Firstly, by inputting a low-resolution smoky image, we extract image features through four downsampling and four upsampling operations using U-net [40]. Following this, we perform a slice operation to generate parameters for the bilateral grid. Next, we input the original 4K-resolution image into the U-net, generating a three-channel feature map, which is then fed into three separate convolutional networks to produce three-channel feature maps. Under the guidance of the bilateral grid, each of the RGB channels generates three resultant feature maps. Finally, through concatenation, these nine feature maps are fused into a three-channel intermediate output, which is then combined with the input via a multiplicative skip connection to produce the final result.
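To make the two-branch data flow concrete, the following is a minimal PyTorch sketch, assuming a small convolutional stand-in for the low-resolution U-Net and plain bilinear upsampling in place of the learned bilateral-grid slicing; it illustrates the idea rather than reproducing the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilateralDehazeSketch(nn.Module):
    """Simplified two-branch dehazing sketch (not the exact 4KDehazing code).

    A low-resolution branch predicts per-pixel affine coefficients; the
    full-resolution branch applies them channel-wise and fuses the result
    with the hazy input via a multiplicative skip connection."""

    def __init__(self, feat=16):
        super().__init__()
        # Stand-in for the U-Net that processes the downsampled hazy image.
        self.lowres_net = nn.Sequential(
            nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, 12, 1),  # 12 coefficients = one 3x4 affine map per location
        )
        self.fuse = nn.Conv2d(3, 3, 3, padding=1)  # final fusion convolution

    def forward(self, x_full):
        # 1) Predict affine coefficients on a downsampled copy of the input.
        x_low = F.interpolate(x_full, scale_factor=0.25, mode="bilinear",
                              align_corners=False)
        coeffs = self.lowres_net(x_low)                       # (B, 12, h, w)
        # 2) "Slice": bring the coefficients back to full resolution
        #    (the real method uses a learned bilateral grid with guidance maps).
        coeffs = F.interpolate(coeffs, size=x_full.shape[-2:],
                               mode="bilinear", align_corners=False)
        n, _, h, w = x_full.shape
        a = coeffs[:, :9].reshape(n, 3, 3, h, w)              # 3x3 colour matrix
        t = coeffs[:, 9:]                                     # per-channel offset
        # 3) Apply the per-pixel affine transform to the RGB channels.
        y = torch.einsum("bochw,bchw->bohw", a, x_full) + t
        # 4) Multiplicative skip connection with the hazy input, then fuse.
        return self.fuse(y * x_full)

# Example: dehaze a single 2K frame (batch of one).
# out = BilateralDehazeSketch()(torch.rand(1, 3, 1440, 2560))
```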
To effectively integrate the features extracted from each channel and reconstruct high-quality dehazing images while preserving good visibility, we employ a feature fusion approach to filter out important features. We concatenate ten multi-layer convolutional blocks and add skip connections within each block to obtain a tensor of coefficients, which serves as a function of the fog density. We adopt the L2 norm as the loss function to optimize the weights and biases.
$$L = \frac{1}{D}\sum_{i=1}^{D}\left\lVert I_i - J_i \right\rVert_2$$
where D represents the number of training images, I_i denotes the output of the smoke removal network, and J_i stands for the ground truth. It is observed that employing the L2 loss alone is adequate for generating clear and vibrant colors in the smoke removal results. 4KDehazing not only effectively removes smoke and dust from high-resolution images but also processes images in real-time through the bilateral grid. By training this bilateral grid with low-resolution images, 4KDehazing achieves optimized performance for high-resolution image dehazing, ensuring both efficiency and clarity in real-time applications.

3.2. YOLOv8n

In 2023, the Ultralytics team advanced upon YOLOv5 to introduce the YOLOv8 object detection model. In the backbone component, YOLOv8 improves upon CSPNet by refining feature propagation and reducing computations, thereby enhancing the network’s ability to capture multi-scale features. The neck section of YOLOv8 comprises multiple layers that effectively integrate features from different scales. In the head section, unlike YOLOv5 which uses an anchor-based detection head, YOLOv8 adopts an anchor-free detection head. This anchor-free approach directly predicts the center and scale of objects, reducing the computational complexity and improving detection accuracy. The network architecture is depicted in Figure 6.
Although YOLOv8n has achieved outstanding performance across various metrics, its computational efficiency is significantly compromised by high-resolution images, hindering real-time detection capabilities. Moreover, the diverse shapes and lack of rotational invariance of terminal tools need a detection network that can capture a richer set of features to enhance detection accuracy.

3.3. EfficientNetV2

To mitigate the significant impact of high-resolution images on the computational performance of YOLOv8n, we improve it through the incorporation of EfficientNetV2. EfficientNetV2 is an improvement over EfficientNet [41], incorporating a joint scaling strategy in its design as shown in Figure 7. Compared to traditional lightweight methods for object detection, EfficientNetV2 introduces a progressive scaling strategy. This approach begins training with smaller resolutions and shallower networks, gradually increasing both resolution and depth as training progresses. The progressive scaling strategy significantly accelerates the training speed, enabling the model to converge more rapidly, particularly when handling high-resolution images. Additionally, the Fused-MBConv module is introduced, replacing the depthwise separable convolution in MBConv with standard convolutional operations. In object detection tasks, this enhancement allows the model to extract low-level features, such as edges and textures, more efficiently in the shallow layers, thereby improving the detection accuracy. As a result, EfficientNetV2 trains four times faster than EfficientNet and has 6.8 times fewer parameters.
The core operations of MBConv include three stages: a 1 × 1 pointwise convolution that expands the input channels, a 3 × 3 depthwise convolution that processes each channel independently, and another 1 × 1 pointwise convolution that projects the expanded channels back to a lower dimension. However, EfficientNet variants whose shallow layers are composed of MBConv suffer from slow training due to the depthwise convolutions and perform worse on large input images. These shortcomings motivate the Fused-MBConv optimization introduced in EfficientNetV2.
To address these issues, EfficientNetV2 replaces the MBConv structures in the earlier layers of the model with Fused-MBConv structures. Fused-MBConv merges the expansion pointwise convolution and the depthwise convolution of MBConv into a single standard 3 × 3 convolution, reducing computational complexity and enhancing feature extraction capabilities. Additionally, EfficientNetV2 employs training-aware Neural Architecture Search (NAS) and scaling to jointly optimize model accuracy, training speed, and parameter count.
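For reference, the structural difference between the two blocks can be sketched in PyTorch as follows; this is an illustrative simplification that omits strides, squeeze-and-excitation, and residual connections:

```python
import torch.nn as nn

def mbconv(cin, cout, expand=4):
    """MBConv: 1x1 expand -> 3x3 depthwise -> 1x1 project (illustrative only)."""
    mid = cin * expand
    return nn.Sequential(
        nn.Conv2d(cin, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.SiLU(),
        nn.Conv2d(mid, mid, 3, padding=1, groups=mid, bias=False),  # depthwise
        nn.BatchNorm2d(mid), nn.SiLU(),
        nn.Conv2d(mid, cout, 1, bias=False), nn.BatchNorm2d(cout),
    )

def fused_mbconv(cin, cout, expand=4):
    """Fused-MBConv: the expand + depthwise pair becomes one regular 3x3 conv."""
    mid = cin * expand
    return nn.Sequential(
        nn.Conv2d(cin, mid, 3, padding=1, bias=False), nn.BatchNorm2d(mid), nn.SiLU(),
        nn.Conv2d(mid, cout, 1, bias=False), nn.BatchNorm2d(cout),
    )
```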

3.4. C2fRepghost

C2fRepGhost represents an advanced lightweight enhancement method for object detection, integrating the RepGhost and C2f modules. The RepGhost module employs structural re-parameterization techniques to transform a complex multi-branch architecture during training into a streamlined single-branch structure during inference. This transformation significantly reduces the computational complexity of the model in object detection tasks. Simultaneously, the C2f module facilitates cross-stage partial feature fusion, strengthening the information interaction between features at different levels. This enables the model to more effectively capture multi-scale target information, thereby improving the detection accuracy and robustness. The synergy of these modules results in a highly efficient and precise object detection framework.
The RepGhost (Re-parameterization Ghost) module [42] is an improvement upon the Ghost module [43] through structural re-parameterization. It addresses the significant computational cost associated with concatenation operations on hardware devices while maintaining a certain level of model accuracy. The goal is to reduce the computational and parameter overhead of the module, thereby accelerating the network’s operation speed. The structure of the RepGhost module is illustrated in Figure 8.
The Ghost module, as an innovative alternative to traditional convolutional layers, markedly enhances computational efficiency through its efficient feature extraction scheme. Specifically, this module initially employs a standard 1 × 1 convolutional layer to perform channelwise compression on the input image, thereby reducing data dimensionality and extracting foundational features. Subsequently, it processes these compressed feature maps using depthwise separable convolutions, which efficiently yield richer feature representations. Finally, the feature maps generated from these distinct stages are concatenated, forming a comprehensive output feature set with superior representational capacity and computational performance. This refined approach not only captures a more abundant array of features but also significantly reduces the computational complexity and the number of parameters.
The RepGhost module employs re-parameterization techniques to replace the concat operation in the Ghost module with a linear add module, moving the ReLU activation to after the add module and linking the shortcut operation to the add module. The movement of the ReLU is compliant with the rules of structural re-parameterization. To maintain nonlinearity during training, batch normalization is added to the shortcut operation. Finally, in the inference model, the shortcut operation is removed, allowing for implicit reuse of features, making the inference process faster.
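The re-parameterization idea can be illustrated with the following simplified PyTorch sketch, which keeps only a cheap depthwise branch and the BatchNorm shortcut described above; the channel layout and the complete RepGhost module differ in detail from this illustration:

```python
import torch
import torch.nn as nn

class RepGhostSketch(nn.Module):
    """Sketch of RepGhost-style re-parameterization: during training a depthwise
    branch and a BatchNorm-only shortcut are summed before the ReLU; at inference
    the shortcut is folded into the depthwise kernel, leaving a single branch."""

    def __init__(self, channels):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1,
                            groups=channels, bias=False)   # cheap depthwise branch
        self.bn_dw = nn.BatchNorm2d(channels)
        self.bn_id = nn.BatchNorm2d(channels)              # BN-only shortcut (training)
        self.act = nn.ReLU(inplace=True)
        self.deployed = False

    def forward(self, x):
        if self.deployed:
            return self.act(self.bn_dw(self.dw(x)))              # single fused branch
        return self.act(self.bn_dw(self.dw(x)) + self.bn_id(x))  # add, then ReLU

    @torch.no_grad()
    def reparameterize(self):
        """Fold both BN branches into one depthwise convolution with bias."""
        def fuse(weight, bn):
            std = (bn.running_var + bn.eps).sqrt()
            return (weight * (bn.weight / std).reshape(-1, 1, 1, 1),
                    bn.bias - bn.running_mean * bn.weight / std)
        w_dw, b_dw = fuse(self.dw.weight, self.bn_dw)
        identity = torch.zeros_like(self.dw.weight)        # identity as a 3x3 kernel
        identity[:, 0, 1, 1] = 1.0
        w_id, b_id = fuse(identity, self.bn_id)
        c = self.dw.weight.size(0)
        fused = nn.Conv2d(c, c, 3, padding=1, groups=c, bias=True).to(w_dw.device)
        fused.weight.copy_(w_dw + w_id)
        fused.bias.copy_(b_dw + b_id)
        self.dw, self.bn_dw = fused, nn.Identity()
        self.deployed = True
```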
Building upon the Ghost Bottleneck, the RepGhost Bottleneck is constructed by directly substituting the Ghost modules with RepGhost modules. However, due to the difference in output channels between the add and concat modules, the RepGhost Bottleneck modifies the middle channels to half of those in the Ghost Bottleneck and doubles the output channels of the deeper convolution to maintain the basic design of the Ghost Bottleneck.
The C2f module enhances information interaction between features at different levels through cross-stage partial feature fusion, enabling more effective capture of multi-scale target information in object detection tasks. Building upon the original C2f, we redesign a novel C2fRepGhost module, replacing the conventional Bottleneck with a RepGhost Bottleneck. The C2fRepGhost further compresses the size of the original network model, reducing both the computational and parameter overhead.
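A minimal sketch of this wiring, reusing the RepGhostSketch block from the previous example as the bottleneck, is given below; the channel split and block depth are illustrative assumptions rather than the exact C2fRepGhost configuration:

```python
import torch
import torch.nn as nn

class C2fRepGhostSketch(nn.Module):
    """Illustrative C2f-style wiring: split the reduced features in two, pass one
    half through a chain of bottlenecks, concatenate every intermediate output,
    and fuse with a 1x1 convolution."""

    def __init__(self, cin, cout, n=2):
        super().__init__()
        c = cout // 2
        self.reduce = nn.Conv2d(cin, 2 * c, 1)
        # RepGhostSketch is the simplified bottleneck from the previous sketch.
        self.blocks = nn.ModuleList(RepGhostSketch(c) for _ in range(n))
        self.fuse = nn.Conv2d((2 + n) * c, cout, 1)

    def forward(self, x):
        a, b = self.reduce(x).chunk(2, dim=1)
        outs = [a, b]
        for blk in self.blocks:
            outs.append(blk(outs[-1]))          # cross-stage partial feature chain
        return self.fuse(torch.cat(outs, dim=1))
```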

4. Experiments

To validate the efficacy of the proposed YOLOv8n-Al-Dehazing method and demonstrate its suitability for aluminum electrolysis operational scenarios, we conduct a comprehensive evaluation consisting of three experiments. The first experiment dehazes high-resolution images collected from the field using different methods; by comparing the results with state-of-the-art algorithms, we confirm 4KDehazing as our dehazing network. The second experiment incrementally improves the backbone and neck networks using various lightweight modules; through ablation studies, we demonstrate the incremental gains of the combined improvements, verifying the effectiveness of the modifications in YOLOv8n-Al-Dehazing. The third experiment evaluates the superiority of YOLOv8n-Al-Dehazing in an aluminum electrolysis smoky environment compared to mainstream detectors.

4.1. Experimental Setup

The experimental environment for this study includes an AMD Ryzen 9 7950X 16-Core Processor running at 4.50 GHz as the CPU, equipped with 32 GB of RAM. The GPU used is an NVIDIA GeForce RTX 4080 with 16 GB of VRAM. On the software side, the operating system is Windows 11, PyCharm 2023.1.1 serves as the IDE, Python is used as the programming language, and the PyTorch deep learning framework is employed. Detailed information about the experimental environment is listed in Table 1. All training is conducted under the same conditions.

4.2. Training Parameters

During the training of the object detection model, we optimize model performance by adjusting a series of training parameters. We set the size of the training images to 640 × 640 pixels with 3 channels, and the inference image size to 2560 × 1440 pixels with 3 channels. We train for 300 epochs using the Stochastic Gradient Descent (SGD) optimizer, processing 8 images per batch with a learning rate of 0.01. YOLO-related models are each trained for 300 epochs, while DETR-related models are each trained for 18 epochs. To limit training time, the patience value for early stopping is set to 50. To improve training efficiency, we disable Mosaic data augmentation during the final 10 training epochs, effectively enhancing the model’s performance in aluminum electrolysis operations and improving its generalization capability.
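As a rough illustration, these settings correspond to a standard Ultralytics training call of the following form; the model and dataset configuration file names here are placeholders rather than the authors' actual files:

```python
from ultralytics import YOLO

# Hedged sketch of the training configuration described above.
model = YOLO("yolov8n.yaml")      # baseline model; the modified backbone/neck would be defined separately
model.train(
    data="al_terminals.yaml",     # hypothetical dataset config listing the ten classes
    imgsz=640,                    # training image size (pixels)
    epochs=300,
    batch=8,
    optimizer="SGD",
    lr0=0.01,                     # initial learning rate
    patience=50,                  # early-stopping patience
    close_mosaic=10,              # disable Mosaic augmentation for the final 10 epochs
)
```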

4.3. Evaluation Metrics

To evaluate the performance of YOLOv8n-Al-Dehazing, we use precision, recall, and mean average precision (mAP) as the evaluation metrics:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
True Positive (TP) represents the number of samples correctly identified as positive. False Positive (FP) indicates the number of samples incorrectly identified as positive:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
False Negative (FN) indicates the number of samples incorrectly identified as negative:
$$\mathrm{mAP} = \frac{1}{Q}\sum_{q=1}^{Q} \mathrm{AP}_q$$
Mean Average Precision (mAP) is the average of the per-class average precisions, where AP_q denotes the average precision for class q and Q denotes the total number of classes.
The quality of dehazing algorithms is evaluated using the Peak Signal-to-Noise Ratio (PSNR), measured in decibels (dB). A higher PSNR value indicates better image quality:
$$\mathrm{PSNR} = 10 \cdot \log_{10}\left(\frac{MAX_I^2}{MSE}\right) = 20 \cdot \log_{10}\left(\frac{MAX_I}{\sqrt{MSE}}\right)$$
where MAX_I represents the maximum pixel value of the image, and MSE is the Mean Squared Error.
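The following small helpers, given as a hedged NumPy sketch, compute the precision, recall, and PSNR values defined above from raw detection counts and image arrays:

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts, matching the formulas above."""
    return tp / (tp + fp), tp / (tp + fn)

def psnr(pred, gt, max_val=255.0):
    """PSNR in dB between a dehazed image and its haze-free ground truth."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 20.0 * np.log10(max_val / np.sqrt(mse))
```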
The complexity of object detection algorithms is characterized by the parameter count, FLOPs (floating-point operations), and FPS (frames per second). Higher parameter counts and FLOPs indicate a more complex model that is harder to deploy. A larger number of parameters means a larger model, higher spatial complexity, and greater consumption of storage resources. FLOPs are used to evaluate the temporal complexity of the model; higher FLOPs signify greater computational complexity and consume more computational resources in practical applications. FPS indicates the number of images that can be processed per unit of time, reflecting the real-time capability of the algorithm.
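Parameter count and FPS can be estimated for any PyTorch detector with a routine like the sketch below; the input size and warm-up length are illustrative choices, not the exact measurement protocol used in this paper:

```python
import time
import torch

def count_params(model):
    """Total number of learnable parameters."""
    return sum(p.numel() for p in model.parameters())

def measure_fps(model, shape=(1, 3, 1440, 2560), runs=100, device="cuda"):
    """Average forward passes per second on random input of the given shape."""
    model = model.eval().to(device)
    x = torch.randn(shape, device=device)
    with torch.no_grad():
        for _ in range(10):                 # warm-up iterations
            model(x)
        if device.startswith("cuda"):
            torch.cuda.synchronize()
        start = time.time()
        for _ in range(runs):
            model(x)
        if device.startswith("cuda"):
            torch.cuda.synchronize()
    return runs / (time.time() - start)
```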

4.4. Dataset Design

The deployment of the multi-functional material handling terminals of the aluminum electrolysis setup and the camera setup is illustrated in Figure 9. At each of the four corners of the tool vehicle’s circular track, we install a camera with a resolution of 2560 × 1440 pixels, ensuring that at least one camera can capture the working tools with a clear field of view. Continuous recordings are made by the four cameras, capturing operational data from August to November 2023 and amassing a total dataset of 530 GB. This dataset encompasses scenarios involving smoke, dust, mixed smoke and dust, water mist, dimly lit night-time operations, and tools operating at varying scales and viewed from varying perspectives. After manually editing the video, we delete low-quality images caused by rapid camera movements and categorize the images into ten classes with corresponding labels. These categories include person, feeding pipe, bucket, anode, anode station, new anode, old anode, shell-breaking mechanism, head of shell-breaking, and tail of shell-breaking, as depicted in Figure 10.
The annotated dataset is then divided into training, validation, and testing sets following a 6:2:2 ratio. The training set contains 6972 images, the validation set includes 2583 images, and the testing set comprises 2430 images.
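A 6:2:2 split of this kind can be produced with a short helper such as the following sketch, which assumes a flat list of annotated image paths and a fixed random seed:

```python
import random

def split_dataset(image_paths, seed=0):
    """Shuffle annotated images and split them 6:2:2 into train/val/test lists."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = int(0.6 * len(paths))
    n_val = int(0.2 * len(paths))
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:])
```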

4.5. Selection of Baseline Network Models

4.5.1. High-Resolution Image Real-Time Dehazing Algorithm 4KDehazing

The deployment of high-resolution spherical cameras in aluminum electrolysis operation sites, coupled with the presence of intricate, non-uniform smoke, poses a significant challenge for dehazing algorithms. Using the O-HAZE dataset, we compare 4KDehazing [8] with other dehazing algorithms; the experimental results are summarized in Table 2. Models such as DA [7], PMS [39], GridDehazeNet [44], and MSBDN [45] struggle to perform real-time dehazing on high-resolution images. Conversely, real-time methods like AOD [38] and DehazeNet [37] exhibit suboptimal dehazing performance on high-resolution images, with PSNR metrics notably lower than those of newer algorithms such as MSBDN, PMS, and DA. 4KDehazing achieves a PSNR value of 18.48, surpassing PMS by 0.7 and GridDehazeNet by 0.91. Furthermore, it processes high-resolution images at an average speed of 16 ms per image, doubling the processing speed of AOD. The improvement in PSNR underscores the efficacy of 4KDehazing’s multi-scale feature extraction and fusion, as well as its adaptive dehazing strategy, which dynamically adjusts the intensity of haze removal based on local haze concentration. The real-time dehazing capability further validates the effectiveness of the multi-guided bilateral grid acceleration, making it particularly suitable for handling haze in the dynamic environments of aluminum electrolysis operations.

4.5.2. Experiments Related to YOLOv8n

We select images from the same operational scene, including those with smoke, no smoke, and dehazing, to conduct inference using YOLOv8n. The results are illustrated in Figure 11. Through inference using the YOLOv8n model, it is observed that the YOLO model performs effectively in smoke-free scenarios but fails in smoky environments. However, after the dehazing treatment, the performance is restored.
The integration of the 4KDehazing network with YOLOv8n significantly increases the model’s parameters and computation, thereby compromising its real-time performance. Given that the equipment configurations at aluminum electrolysis sites are relatively modest, an excessively large number of parameters and high computational demands render the model unsuitable for on-site deployment. Consequently, it is imperative to refine the combined network to reduce the parameters and computational requirements while enhancing the feature extraction capabilities.

4.6. Experiments on Object Detection Algorithms in Aluminum Electrolysis Smoke Scenarios

To improve YOLOv8n, we modify the backbone and neck networks in turn.

4.6.1. Backbone Comparison Experiment

To validate the effectiveness of the improved backbone network, we analyze and compare several state-of-the-art lightweight backbone networks. Liu et al. introduced EfficientViT [46], featuring a cascaded group attention module designed to address the high similarity among attention maps across different heads, thereby introducing computational redundancy. EfficientViT not only reduces the computational costs but also enhances the diversity of attention mechanisms. Tan et al. proposed EfficientNetV2 [47], which utilizes new operations such as Fused-MBConv to enrich the search space and combines the training-aware Neural Architecture Search with scaling to co-optimize the training speed and parameter efficiency. EfficientNetV2 also introduces an improved progressive learning method that adaptively adjusts image sizes and regularization. Han et al. introduced GhostNet [43], a plug-and-play component to upgrade existing convolutional neural network modules, effectively lowering computational costs.
Under identical experimental conditions, we replace the backbone of YOLOv8 with EfficientViT, EfficientNetV2, C2fRepGhost, GhostNet, or their combinations. The results are shown in Table 3. While C2fRepGhost alone, and GhostNet combined with C2fRepGhost, achieve the anticipated reduction in parameters, the reduction in FLOPs is unsatisfactory. In comparison, EfficientNetV2 significantly reduces both computations and parameters. Compared to the original YOLOv8n, FLOPs are reduced by 5.4 G and parameters by 0.51 M, demonstrating the practicality of EfficientNetV2 in lightweight object detection models. However, a further reduction in the parameter count is still needed.

4.6.2. Neck Comparison Experiment

To validate the effectiveness of the improved neck network, we replace the original neck section with GhostNet, C2fFaster, and C2fRepGhost. The results are summarized in Table 4. Although C3Ghost and C2fFaster individually achieve better results than C2fRepGhost when improving the neck section alone, their performance is suboptimal when combined with EfficientNetV2, leading to a decrease in accuracy. Therefore, we opt to replace the original C2f with C2fRepGhost, which reduces the parameters by 0.4 M. Combining this with the backbone improvements, we find that the best enhancement for YOLOv8n is achieved using EfficientNetV2 and C2fRepGhost. This combination reduces computations by 5.6 G and parameters by 0.92 M, achieving the best lightweight optimization.

4.6.3. Comparison with Lightweight Object Detection Models and Ablation Study

To verify whether the proposed YOLOv8n-Al-Dehazing detection model significantly reduces both parameters and computation, we use the original YOLOv8n as a baseline and compare it with state-of-the-art lightweight variants of the YOLOv8 network, including our YOLOv8n-Al-Dehazing. The results are presented in Table 5.
From the results in the table, it is evident that various lightweight variants of the YOLOv8-based network models achieve significant reductions in both parameters and computation after incorporating lightweight modules. YOLOv8n achieves the best recall rate of 84.1%. YOLOv8n-ghost attains the lowest parameters at just 1.71 M; however, there is a certain decline in mAP and accuracy. Our proposed YOLOv8n-Al-Dehazing not only has the lowest computation at 2.5 G but also achieves the highest precision score of 91.0%. Compared to the baseline YOLOv8n, the proposed model reduces FLOPs by 5.6 G, decreases parameters by 0.92 M, and increases mAP by 11.1% and precision by 7.7%. This robustly demonstrates that the integration of components in YOLOv8n-Al-Dehazing significantly enhances both the accuracy and efficiency of terminal detection in complex industrial environments. Specifically, 4KDehazing markedly improves the quality of input images, while the enhancements in EfficientNetV2 and C2fRepGhost further augment feature extraction and fusion capabilities, leading to a notable increase in object detection precision. The efficiency of the 4KDehazing algorithm, combined with the lightweight design of EfficientNetV2 and C2fRepGhost, ensures that the entire system meets real-time requirements without compromising high accuracy. The integrated approach exhibits superior adaptability to complex environmental factors such as dust, high temperatures, and vibrations, maintaining stable detection performance even in adverse conditions.
To evaluate the effectiveness of each module in our proposed YOLOv8n-Al-Dehazing model, we conducted a series of ablation experiments using the baseline YOLOv8n model to explore and validate the impact of different combinations of improved modules. The key evaluation dimensions are FLOPs, parameters, and precision. The results are presented in Table 6. Analysis of the table reveals that the addition of the EfficientNetV2 module significantly reduces FLOPs and reduces parameters to a certain extent. C2fRepGhost, on the other hand, achieves a substantial reduction in parameters while keeping precision almost unchanged. The optimized fusion network, when combined with the original YOLOv8n, achieves a significant reduction in both FLOPs and parameters while maintaining nearly the same detection accuracy.

4.7. Comparison with State-of-the-Arts

To verify the superiority of the proposed YOLOv8n-Al-Dehazing model in the task of object detection within aluminum electrolysis smoky environments, we compare our model with the latest mainstream models currently in use. We select widely used and representative object detection models for a series of comparative experiments. The results are presented in Table 7.
The experimental results indicate that, compared to current mainstream detectors, YOLOv8n-Al-Dehazing exhibits superior performance when dealing with smoky data, showing improvements across key metrics. There is a significant reduction in FLOPs and parameters, and it achieves the best results in mAP and precision. Specifically, mAP improves by 7.6% compared to YOLOv3 and by 13.3% compared to YOLOv8n, and it outperforms the Transformer-based RT-DETR by 11.0%. In terms of accuracy, as shown in Figure 12, YOLOv8n-Al-Dehazing improves by 7.7% over the original YOLOv8n, by 8.6% over RT-DETR, and is 4.1% higher than the most accurate model listed in the table, YOLOv3. Additionally, YOLOv8n-Al-Dehazing achieves low computational complexity, reduced parameters, and the fastest object detection speed (FPS), effectively lowering the hardware requirements for deployment and meeting real-time performance needs. Compared to YOLOv8, the parameters are reduced by 0.92 M and computations by 5.6 G. Through comparative experiments with advanced mainstream detectors, it is evident that our proposed YOLOv8n-Al-Dehazing meets our expectations, achieving higher accuracy and mAP on the aluminum electrolysis smoky dataset while significantly reducing parameters and computations, making it a suitable real-time object detector for smoky environments in aluminum electrolysis.

4.8. Comparative Experiment on the Detection Effect of Different Categories of Aluminum Electrolytic Operation Tool

The aluminum electrolysis operation process involves the coordinated use of multiple tools, and thus the model needs to detect a variety of operational targets. To this end, we compare the detection of each category by YOLOv8n and the proposed YOLOv8n-Al-Dehazing model to verify the multi-object detection capabilities of the latter. The results are shown in Table 8. According to the results in the table, compared to the original YOLOv8n detection model, the proposed YOLOv8n-Al-Dehazing model shows improvements in identifying aluminum electrolysis tools, particularly significant enhancements for the head of shell-breaking, tail of shell-breaking, shell-breaking mechanism, anode station, and new anode, with improvements of 9.6%, 15.9%, 20.7%, 30.7%, and 23.4%, respectively. In addition, the detection performance of other aluminum electrolysis tools, such as the anode grabbing rod and bucket, is also enhanced. Due to their distinct features and larger size, these tools already exhibit good detection accuracy with the YOLOv8n model, and the YOLOv8n-Al-Dehazing model, benefiting from the integration of the 4KDehazing model and the lightweight improvements to YOLOv8n, further improves detection performance and precision significantly. These results demonstrate that the network design of the YOLOv8n-Al-Dehazing model is particularly well suited for object detection tasks in aluminum electrolysis tools, enabling the better extraction of key feature information of operational tools and thereby enhancing the accuracy of object detection.
Through comparisons with various state-of-the-art detector networks, we have validated that the YOLOv8n-Al-Dehazing model is a lightweight, high-precision, and rapid solution capable of performing object detection tasks effectively in dust environments. Specifically tailored for aluminum electrolysis industrial scenarios, YOLOv8n-Al-Dehazing swiftly adapts to dynamic scene changes, ensuring uninterrupted and accurate monitoring. It facilitates the real-time detection of operational terminal statuses, equipment positions, and orientations, promptly identifying anomalies such as anode detachment or collisions. This capability significantly reduces the downtime caused by equipment malfunctions, thereby enhancing the overall production efficiency.

4.9. Visualization Results

As shown in Figure 13, within the first 50 epochs, the values of precision, recall, and mAP50 increase significantly while the loss function value decreases substantially. After 80 epochs, the rate of improvement begins to slow down, and around 110 epochs the training metrics stabilize, indicating that the network model has largely converged.
By conducting inference verification comparisons between YOLOv8n and YOLOv8n-Al-Dehazing, we demonstrate the effectiveness of YOLOv8n-Al-Dehazing in aluminum electrolysis operation sites. We test the model’s inference performance on our new data as shown in Figure 14.
Figure 3, Figure 11 and Figure 14 all showcase the inference results of YOLOv8n-Al-Dehazing in aluminum electrolysis smoky environments. The results in Figure 3 demonstrate that YOLOv8n-Al-Dehazing has the capability to detect operational targets in scenarios involving mist, smoke, and fog. The results in Figure 11 show that YOLOv8n-Al-Dehazing can perform object detection tasks in varying concentrations of smoke and dust, which demonstrates that the 4KDehazing algorithm effectively addresses image degradation issues in such complex environments, enabling the monitoring system to operate stably under various adverse conditions. It highlights the algorithm’s potential to facilitate round-the-clock surveillance with consistent reliability. The results in Figure 14 illustrate that YOLOv8n-Al-Dehazing addresses issues such as false positives and missed detections encountered by YOLOv8n in smoky conditions, achieving superior results in most scenarios.

5. Conclusions

In the challenging environment of aluminum electrolysis, where multi-functional cranes operate under extreme heat, heavy dust, and severe mechanical vibrations, the detection of multi-functional operation terminals is significantly impeded. This paper analyzes the operational processes, noting that during operations these terminals exhibit little discernible difference in color and shape from their background, undergo changes in posture due to vertical movement, and rotate horizontally. The use of high-resolution cameras further complicates the extraction of target features from images. To address these issues, we propose YOLOv8n-Al-Dehazing, a dehazing detection algorithm tailored for multi-functional operation terminals in aluminum electrolysis environments. Employing the 4KDehazing network, this method restores high-resolution images affected by non-uniform smoke and dust in real-time, effectively removing such interference while preserving the detailed characteristics of objects in various angles and postures within the scene. To enhance YOLOv8n’s capability in processing high-resolution images, we substitute its backbone with EfficientNetV2. In response to the parameter proliferation caused by stacked networks, we replace the C2f component with C2fRepGhost, thereby reducing the number of parameters while maintaining real-time performance. Experimental results demonstrate that the introduction of the 4KDehazing algorithm increases the object detection accuracy (mAP) by 9.5%, from 81.5% to 91.0%, while precision improves from 71.7% to 89.7% in environments with a high dust concentration. Compared to the original YOLOv8n, the YOLOv8n-Al-Dehazing model enhances object detection accuracy in complex industrial environments by 11.1%, from 78.6% to 89.7%; precision also sees a 7.7% increase, from 83.3% to 91.0%; and the inference speed (FPS) improves by 15.54%, from 104.2 to 120.4. YOLOv8n-Al-Dehazing maintains high accuracy and real-time inference capabilities even in complex dynamic scenes characterized by non-uniform smoke and dust. This solution effectively resolves the detection challenges faced by multi-functional operation terminals in aluminum electrolysis environments, demonstrating that our method holds significant potential for practical applications in aluminum electrolysis industrial scenarios. In future work, we will deploy this model on computers at aluminum electrolysis sites, utilizing cameras installed on the multi-functional units to achieve terminal detection and recognition, thereby laying a solid foundation for the intelligent operation of aluminum electrolysis.

Author Contributions

Data curation: Y.P.; Formal analysis: Y.P., Y.L., X.L. and Y.C.; Funding acquisition: Y.L., X.L. and Y.C.; Investigation: Y.P.; Methodology: Y.P.; Project administration: Y.P., Y.L. and X.L.; Resources: Y.P., Y.L., X.L. and Y.C.; Software: Y.P.; Supervision: Y.L. and X.L.; Validation: Y.P.; Visualization: Y.P. and X.L.; Writing—original draft: Y.P. and X.L.; Writing—review and editing: Y.P., Y.L. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Hunan Provincial Natural Science Foundation (grant numbers 2024JJ7144 and 2023JJ50196).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated during the current study are available from the corresponding author on reasonable request. The aluminum electrolysis production process data were obtained from the aluminum electrolysis industry in China and were used under license for the current study; restrictions apply to their availability, so these data are not publicly available.

Conflicts of Interest

The authors have no competing interests to declare that are relevant to the content of this article.

References

1. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; IEEE: Piscataway, NJ, USA, 2005; Volume 1, pp. 886–893.
2. Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object Detection with Discriminatively Trained Part-Based Models. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1627–1645.
3. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the Computer Vision & Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 779–788.
4. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 1956–1963.
5. Zhu, Q.; Mai, J.; Shao, L. A Fast Single Image Haze Removal Algorithm Using Color Attenuation Prior. IEEE Trans. Image Process. 2015, 24, 3522–3533.
6. Dong, H.; Pan, J.; Xiang, L.; Hu, Z.; Zhang, X.; Wang, F.; Yang, M.-H. Multi-Scale Boosted Dehazing Network with Dense Feature Fusion. arXiv 2020, arXiv:2004.13388.
7. Shao, Y.; Li, L.; Ren, W.; Gao, C.; Sang, N. Domain Adaptation for Image Dehazing. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision & Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2805–2814.
8. Zheng, Z.; Ren, W.; Cao, X.; Hu, X.; Wang, T.; Song, F.; Jia, X. Ultra-High-Definition Image Dehazing via Multi-Guided Bilateral Learning. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021.
9. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE/CVF Conference on Computer Vision & Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
10. Wang, Y.; Liu, X.; Zhao, Q.; He, H.; Yao, Z. Target Detection for Construction Machinery Based on Deep Learning and Multisource Data Fusion. IEEE Sens. J. 2023, 23, 11070–11081.
11. Chen, C.; Zhu, Z.; Hammad, A. Automated excavators activity recognition and productivity analysis from construction site surveillance videos. Autom. Constr. 2020, 110, 103045.
12. Golcarenarenji, G.; Martinez-Alpiste, I.; Wang, Q.; Alcaraz-Calero, J.M. Machine-learning-based top-view safety monitoring of ground workforce on complex industrial sites. Neural Comput. Appl. 2022, 34, 4207–4220.
13. Hao, F.; Zhang, T.; He, G.; Dou, R.; Meng, C. CaSnLi-YOLO: Construction site multi-target detection method based on improved YOLOv5s. Meas. Sci. Technol. 2024, 35, 085202.
14. Kim, H.; Kim, H.; Hong, Y.W.; Byun, H. Detecting Construction Equipment Using a Region-Based Fully Convolutional Network and Transfer Learning. J. Comput. Civ. Eng. 2018, 32, 04017082.
15. Jiang, T.; Liu, G.; Zhang, Q.; Zeng, Z.; Cheng, S.; Zhang, J.; Zhang, Y. Obstacle Detection and Path Planning for Intelligent Overhead Cranes Using Dual 3D LiDARs in a Brewing Environment. In Proceedings of the 2022 12th International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Baishan, China, 27–31 July 2022; pp. 289–294.
16. Sjöberg, I. Modelling and Fault Detection of an Overhead Travelling Crane System. Master’s Thesis, Linköping University, Automatic Control, Linköping, Sweden, 2018.
17. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In European Conference on Computer Vision—ECCV 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 213–229.
18. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
19. Wang, T.; Yuan, L.; Chen, Y.; Feng, J.; Yan, S. PnP-DETR: Towards efficient visual analysis with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 4661–4670.
20. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv 2020, arXiv:2010.04159.
21. Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.Y. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv 2022, arXiv:2203.03605.
22. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 1904–1916.
23. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
24. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
25. Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE/CVF Conference on Computer Vision & Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
26. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into High Quality Object Detection. In Proceedings of the 2017 IEEE/CVF Conference on Computer Vision & Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6154–6162.
27. Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE/CVF Conference on Computer Vision Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
28. Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. Int. J. Comput. Vis. 2020, 128, 642–656.
29. Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6569–6578.
30. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9627–9636.
31. Meimetis, D.; Daramouskas, I.; Perikos, I.; Hatzilygeroudis, I. Real-time multiple object tracking using deep learning methods. Neural Comput. Appl. 2023, 35, 89–118.
32. Gai, R.; Chen, N.; Yuan, H. A detection algorithm for cherry fruits based on the improved YOLO-v4 model. Neural Comput. Appl. 2023, 35, 13895–13906.
33. Alaftekin, M.; Pacal, I.; Cicek, K. Real-time sign language recognition based on YOLO algorithm. Neural Comput. Appl. 2024, 36, 7609–7624.
34. Zuiderveld, K. Contrast Limited Adaptive Histogram Equalization. In Graphics Gems IV; Academic Press: San Diego, CA, USA, 1994; pp. 474–485.
35. He, K.; Sun, J.; Tang, X. Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 1397–1409.
36. Makkar, D.; Malhotra, M. Single Image Haze Removal Using Dark Channel Prior. Int. J. Adv. Trends Comput. Sci. Eng. 2016, 33, 2341–2353.
37. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. DehazeNet: An End-to-End System for Single Image Haze Removal. IEEE Trans. Image Process. 2016, 25, 5187–5198.
38. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. AOD-Net: All-in-One Dehazing Network. In Proceedings of the 2017 IEEE/CVF International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4770–4778.
39. Chen, W.T.; Ding, J.J.; Kuo, S.Y. PMS-Net: Robust Haze Removal Based on Patch Map for Single Images. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11681–11689.
40. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI; Springer: Cham, Switzerland, 2015; pp. 234–241.
41. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
42. Chen, C.; Guo, Z.; Zeng, H.; Xiong, P.; Dong, J. RepGhost: A Hardware-Efficient Ghost Module via Re-parameterization. arXiv 2022, arXiv:2211.06088.
43. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision & Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
44. Liu, X.; Ma, Y.; Shi, Z.; Chen, J. GridDehazeNet: Attention-Based Multi-Scale Network for Image Dehazing. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7314–7323.
45. Dong, H.; Pan, J.; Xiang, L.; Hu, Z.; Zhang, X.; Wang, F.; Yang, M.-H. Multi-scale boosted dehazing network with dense feature fusion. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2157–2167.
46. Liu, X.; Peng, H.; Zheng, N.; Yang, Y.; Hu, H.; Yuan, Y. EfficientViT: Memory efficient vision transformer with cascaded group attention. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14420–14430.
47. Tan, M.; Le, Q. EfficientNetV2: Smaller models and faster training. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 10096–10106.
Figure 1. Aluminum electrolysis operations: (a) shell-breaking mechanisms fragmenting the old anode, (b) the anode grabbing rod extracting the old anode from the aluminum electrolytic cell, (c) the bucket scooping out the anode residue from the aluminum electrolytic cell, and (d) the process of the feeding pipe spraying alumina powder onto the surface of the new anode.
Figure 2. The high-resolution camera surveillance system for aluminum electrolysis operations. During the operation of the crane, the camera automatically tracks the vertical movements of the multi-functional operation terminals to capture superior monitoring footage. (a) Monitoring view of the aluminum electrolysis operation environment. (b) Deployment of high-resolution cameras.
Figure 3. Smoke diminishes the contrast between terminals and the background in images, while non-uniformly distributed dust obscures the terminal features. YOLOv8 (top) exhibits low accuracy when detecting images affected by smoke (a) and fails to detect terminals in images heavily obscured by dust (b). In complex environments featuring a mixture of smoke and dust (c) or severe smoke interference (d), YOLOv8 often suffers from significant accuracy degradation or false detections. In contrast, our proposed YOLOv8n-Al-Dehazing (bottom) not only maintains high detection accuracy but also effectively avoids issues such as false and missed detections in these challenging environments.
Figure 4. YOLOv8n-Al-Dehazing architecture.
Figure 5. 4KDehazing architecture.
Figure 6. YOLOv8n architecture.
Figure 7. EfficientNetV2 architecture. In (c), the asterisk (*) represents the multiplication operation.
Figure 8. RepGhost architecture: (a) RepGhost Training Phase: illustrates the multi-branch architecture of the RepGhost module during training. This includes depthwise separable convolution (DConv), batch normalization (BatchNorm2d), ReLU activation functions, and feature addition (Add Block) operations. The multi-branch design, enhanced by structural re-parameterization techniques, improves the feature extraction capabilities. (b) RepGhost Inference Phase: Demonstrates the single-branch structure of the RepGhost module during inference. Through structural re-parameterization, the multi-branch architecture during training is consolidated into an efficient single convolutional layer, significantly reducing the computational complexity and inference latency while maintaining robust feature representation.
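
The consolidation step described in the Figure 8 caption can be illustrated with the standard convolution plus batch-normalization fusion used by re-parameterization methods. The sketch below is illustrative and is not the authors’ code: it shows how a training-time Conv2d + BatchNorm2d branch collapses into a single Conv2d with identical inference behavior, which is the basic mechanism behind merging the multi-branch RepGhost module into one convolution.

# Fold a Conv2d followed by BatchNorm2d (eval mode) into a single Conv2d.
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, groups=conv.groups, bias=True)
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / std                               # per-channel gamma / sigma
    conv_bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.weight.data = (conv.weight * scale.reshape(-1, 1, 1, 1)).detach()
    fused.bias.data = (bn.bias + (conv_bias - bn.running_mean) * scale).detach()
    return fused

# Sanity check: the fused layer reproduces the two-layer branch at inference time.
conv, bn = nn.Conv2d(8, 16, 3, padding=1, bias=False), nn.BatchNorm2d(16).eval()
x = torch.randn(1, 8, 32, 32)
assert torch.allclose(bn(conv(x)), fuse_conv_bn(conv, bn)(x), atol=1e-5)
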
Figure 9. Example of deployment. (a) Actual deployment location of the camera. (b) Camera deployment planning.
Figure 10. Categories of labels: (a) anode grabbing rod, (b) shell-breaking mechanisms, (c) bucket, (d) feeding pipe, (e) new anode, (f) anode station and old anode, and (g) person.
Figure 11. To validate the effectiveness of the component replacements and integrated improvements in YOLOv8n-Al-Dehazing, we conduct inference tests using different enhanced networks within the same operational scenario. (a) The detection results of YOLOv8n, demonstrating its capability to identify operational terminals in dust-free environments. However, as shown in (b), YOLOv8n fails to detect terminals in images obscured by dust. (c) The detection results of YOLOv8n after preprocessing the dust images with 4KDehazing. The results indicate that YOLOv8n can once again identify terminals within the images post-dehazing, thereby proving the efficacy of 4KDehazing, albeit with a significant drop in accuracy. (d) The detection results of YOLOv8n-Al-Dehazing. By incorporating EfficientNetV2 and C2fRepGhost to enhance YOLOv8n, the model significantly improves the accuracy of terminal detection in hazy environments, substantiating the effectiveness of these refinements.
Figure 12. Comparison of precision among mainstream object detection models.
Figure 13. Training results of YOLOv8n-Al-Dehazing.
Figure 14. (a) Misdetections by YOLOv8n due to smoke and dust in the upper image. (b,c) Missed detections by YOLOv8n due to smoke and dust in the upper images. The lower images display the inference results of our proposed YOLOv8n-Al-Dehazing.
Table 1. Experiment environment configuration.
No. | Laboratory Setting | Configuration Information
1 | CPU | AMD Ryzen 9 7950X 16-Core Processor, 4.50 GHz
2 | GPU | NVIDIA RTX 4080, 16 GB
3 | Operating System | Windows 11
4 | RAM | 32 GB
5 | Programming Language | Python 3.11.5
6 | Deep Learning Framework | PyTorch 2.1.1, Torchvision 0.16.1
7 | CUDA | 11.8
Table 2. Evaluation of dehazing networks on the O-HAZE dataset in terms of PSNR and runtime. The runtime is marked “-” for algorithms that do not meet the requirement of processing at least 10 images per second.
Models | PSNR (dB) | Time
DA | 16.86 | -
PMS | 17.78 | -
GridDehazeNet | 16.66 | -
MSBDN | 17.76 | -
AOD | 15.10 | 34 ms
DehazeNet | 17.57 | 105 ms
4KDehazing | 18.43 | 16 ms
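
Both columns of Table 2 can be reproduced with standard tooling. The sketch below shows one way to compute PSNR against the haze-free O-HAZE references and to time a dehazing callable; the dehaze function and the image arrays are placeholders. Under the caption’s criterion, an algorithm qualifies for a runtime entry only if its average time per image is at most 100 ms (i.e., at least 10 images per second).

# PSNR between a dehazed output and its reference, plus average per-image runtime.
import time
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 255.0) -> float:
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def avg_runtime_ms(dehaze, hazy_images, repeats: int = 10) -> float:
    start = time.perf_counter()
    for _ in range(repeats):
        for img in hazy_images:          # hazy_images: list of input frames (placeholder)
            dehaze(img)                  # dehaze: the model under test (placeholder)
    return (time.perf_counter() - start) / (repeats * len(hazy_images)) * 1000.0
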
Table 3. Comparison of backbone networks, where “Ghostnet & C2frepghost” replaces the C2fghost and C3ghost modules in the backbone with C2frepghost.
Models | FLOPs/G | Parameters/M
YOLOv8n (baseline) | 8.1 | 3.01
+EfficientVit | +2.0 | +1.00
+EfficientNetV2 | −5.4 | −0.51
+C2frepghost | −1.1 | −0.40
+Ghostnet | −3.1 | −1.30
+Ghostnet & C2frepghost | −2.3 | −1.06
ours | −5.6 | −0.92
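
The parameter and FLOP figures compared in Table 3 (and Table 4 below) are model-level statistics that can be re-derived for any variant. The following sketch assumes an Ultralytics YOLOv8n checkpoint as the baseline and, optionally, the third-party thop profiler for the FLOP estimate; neither tool is prescribed by the paper, and the exact numbers depend on the modified model definition.

# Count parameters (in millions) and, optionally, estimate FLOPs for a model variant.
import torch
from ultralytics import YOLO

model = YOLO("yolov8n.pt").model                      # baseline graph; swap in a modified checkpoint
params_m = sum(p.numel() for p in model.parameters()) / 1e6
print(f"Parameters: {params_m:.2f} M")

try:
    from thop import profile                          # optional third-party profiler (assumption)
    macs, _ = profile(model, inputs=(torch.zeros(1, 3, 640, 640),))
    print(f"FLOPs: {2 * macs / 1e9:.1f} G")           # counting one multiply-accumulate as two FLOPs
except ImportError:
    pass
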
Table 4. Neck network comparison.
Models | FLOPs/G | Parameters/M
YOLOv8n (baseline) | 8.1 | 3.01
+Ghostnet (C3ghost) | −1.3 | −0.69
+C2fFaster | −1.7 | −0.71
+C2frepghost | −1.1 | −0.40
ours | −5.6 | −0.92
Table 5. Comparison of improved lightweight YOLOv8 models.
Models | mAP (%) | FLOPs/G | Parameters/M | Precision (%) | Recall (%)
YOLOv8n (baseline) | 78.6 | 8.1 | 3.01 | 83.3 | 84.1
YOLOv8n-ghost | 75.9 | 5.0 | 1.71 | 83.0 | 69.8
YOLOv8n-c2fRepghost | 80.2 | 7.3 | 2.61 | 82.9 | 75.3
YOLOv8n-EfficientVit | 76.9 | 10.1 | 4.01 | 83.5 | 70.2
YOLOv8n-EfficientNetV2 | 72.7 | 2.7 | 2.50 | 79.8 | 66.8
YOLOv8n-ghost-c2fRepghost (neck) | 79.4 | 6.8 | 2.30 | 82.7 | 75.1
YOLOv8n-ghost-c2fRepghost (backbone) | 79.9 | 5.8 | 1.95 | 83.6 | 74.3
YOLOv8n-EfficientNetV2-c2fRepghost | 71.7 | 2.5 | 2.09 | 81.5 | 65.1
YOLOv8n-Al-Dehazing (ours) | 89.7 | 2.5 | 2.09 | 91.0 | 83.4
Table 6. Experimental results of the ablation study.
EfficientNetV2 | C2fRepghost | FLOPs/G | Parameters/M | Precision (%)
– | – | 8.1 | 3.01 | 83.3
✓ | – | 2.7 | 2.50 | 79.8
– | ✓ | 6.8 | 2.30 | 82.7
✓ | ✓ | 2.5 | 2.09 | 81.5
Table 7. Comparison of advanced models. We perform inference on the models using a resolution of 2560 × 1440 and calculate the FPS. For DINO-4scale and Deformable-DETR, since they are not capable of real-time object detection, we do not include their FPS values in this comparison.
Models | mAP (%) | FLOPs/G | Parameters/M | Precision (%) | Recall (%) | FPS
YOLOv3 | 84.3 | 193.9 | 120.50 | 86.9 | 78.8 | 55.0
YOLOv5n | 78.3 | 7.1 | 2.50 | 82.6 | 72.9 | 100.0
YOLOv6n | 83.1 | 11.8 | 4.23 | 86.5 | 77.0 | 84.6
YOLOv8n | 78.6 | 8.1 | 3.01 | 83.3 | 84.1 | 104.2
YOLOv8s | 82.4 | 28.5 | 11.12 | 85.2 | 78.1 | 78.5
YOLOv8m | 83.4 | 78.7 | 25.85 | 85.1 | 77.6 | 55.0
YOLOv8l | 84.3 | 164.9 | 43.61 | 84.5 | 79.8 | 32.1
YOLOv8x | 83.4 | 257.4 | 68.13 | 85.6 | 77.5 | 16.7
RT-DETR | 80.9 | 103.5 | 32.00 | 82.4 | 79.0 | 27.8
DINO-4scale | 71.9 | 279.0 | 47.00 | 74.0 | 70.2 | -
Deformable-DETR | 0.10 | 173.0 | 40.00 | 0.0 | 3.2 | -
YOLOv8n-Al-Dehazing | 89.7 | 2.5 | 2.09 | 91.0 | 83.4 | 120.4
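
The FPS column of Table 7 follows the protocol stated in the caption: timing end-to-end inference on 2560 × 1440 frames. The sketch below outlines one such measurement loop with a warm-up phase; the checkpoint name and the synthetic frame are placeholders, and the resulting figures depend on the hardware listed in Table 1.

# Average end-to-end FPS over repeated 2560x1440 inferences on a warmed-up model.
import time
import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                            # placeholder checkpoint
frame = np.zeros((1440, 2560, 3), dtype=np.uint8)     # stand-in 2560x1440 frame

for _ in range(10):                                   # warm-up iterations
    model(frame, verbose=False)

n = 200
start = time.perf_counter()
for _ in range(n):
    model(frame, verbose=False)
print(f"{n / (time.perf_counter() - start):.1f} FPS")
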
Table 8. Detection precision for different tool categories.
Tools Type | YOLOv8n (AP%) | YOLOv8n-Al-Dehazing (AP%)
Anode grabbing rod | 91.2 | 97.3
Feeding Pipe | 84.3 | 86.3
Excavator Bucket | 95.1 | 98.4
Head of Crust Breaker | 85.8 | 95.4
End of Crust Breaker | 80.4 | 96.3
Crust Breaker | 75.5 | 96.2
Anode Stem | 61.6 | 92.3
Spent Anode | 69.0 | 73.9
New Anode | 59.7 | 83.1
Person | 83.3 | 77.2
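
As a consistency check, mAP is the mean of the per-category APs, so the two columns of Table 8 can be averaged and compared with the mAP values in Table 7: the YOLOv8n column averages to 78.6%, matching Table 7 exactly, and the YOLOv8n-Al-Dehazing column averages to 89.6%, matching the reported 89.7% up to rounding of the underlying APs.

# Class-wise means of the APs listed in Table 8.
ap_yolov8n = [91.2, 84.3, 95.1, 85.8, 80.4, 75.5, 61.6, 69.0, 59.7, 83.3]
ap_dehazing = [97.3, 86.3, 98.4, 95.4, 96.3, 96.2, 92.3, 73.9, 83.1, 77.2]
print(round(sum(ap_yolov8n) / len(ap_yolov8n), 1))    # 78.6
print(round(sum(ap_dehazing) / len(ap_dehazing), 1))  # 89.6 (Table 7 reports 89.7)
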
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

