1. Introduction
Object detection plays a vital role in the field of autonomous driving, as it directly impacts the accuracy and reliability of the system [1]. However, in foggy weather conditions, various challenges emerge, including reduced visibility and blurred object boundaries, which can significantly degrade the performance of detection algorithms [2]. These issues not only compromise the system’s effectiveness but also raise concerns about the safety and dependability of autonomous vehicles [3]. Therefore, researching object detection in foggy weather scenarios is crucial for enhancing the robustness and reliability of autonomous driving systems.
In recent years, notable advancements have been made in addressing the challenge of object detection under foggy weather conditions [4,5]. Traditional approaches predominantly rely on conventional computer vision techniques such as edge detection, filtering, and background modeling. While these methods offer partial solutions for foggy images, their performance in complex scenes and under severe fog conditions is limited. To improve object detection in such environments, researchers have increasingly turned to physical models for representing foggy images. For example, He et al. [6] proposed a single-image dehazing method based on the dark channel prior, and Zhu et al. [7] introduced a fast dehazing approach utilizing color attenuation priors. These dehazing techniques enhance the visibility of foggy images, which, in turn, boosts the accuracy of object detection. However, methods based on physical models typically require the estimation of fog density, which makes it challenging to handle varying fog densities across diverse and complex scenes.
With the continuous advancement of deep learning techniques, object detection has gradually become a major research focus [8,9], leading to the emergence of numerous high-performance algorithms based on convolutional neural networks (CNNs). Existing detection methods are generally categorized into two types: two-stage detectors and one-stage detectors. Two-stage detectors, represented by R-CNN [10], Fast R-CNN [11], and Faster R-CNN [12], typically generate a set of candidate regions and then perform classification and position regression for each proposal. For example, Faster R-CNN introduces a Region Proposal Network (RPN) to efficiently generate candidate boxes and employs ROI Pooling for feature extraction and regression, achieving strong detection accuracy. However, these region proposal-based approaches require considerable computational resources and often suffer from slower inference speeds, making them less suitable for real-time applications with strict latency constraints [13], such as autonomous driving. In foggy weather scenarios, Chen et al. further proposed a domain-adaptive detection method [14] that aligns features between source and target domains to improve performance under degraded visibility, highlighting the importance of robustness in adverse environments.
In contrast, one-stage detectors directly perform classification and localization on the input image without generating candidate boxes, significantly improving detection efficiency. Representative algorithms in this category include the YOLO series [15,16] and SSD, where YOLO divides the image into grids to predict bounding boxes and class probabilities, while SSD predicts multi-scale bounding boxes across different feature layers. Owing to their high speed and relatively simple architecture, one-stage detectors are widely adopted in real-time vision tasks. Nevertheless, improving accuracy while maintaining lightweight and efficient computation remains a critical challenge. To address this issue, many studies have explored structural optimization and feature enhancement strategies for YOLO-based models. For instance, Qiu et al. [17] enhanced YOLOv5 by integrating the Coordinate Attention (CA) mechanism [18] with GhostNet [19], improving feature representation capability in complex environments. Fan et al. [20] combined YOLOv5 with dark channel enhancement to alleviate low-illumination problems during nighttime image capture and compared multiple image enhancement methods to validate performance improvements. Baidya et al. [21] introduced an additional detection head and incorporated ConvMixer [22] modules for unmanned aerial vehicle detection tasks, demonstrating competitive performance on the VisDrone2021 dataset. Similarly, Ge et al. [23] embedded Coordinate Attention and Squeeze-and-Excitation (SE) modules [24] into the YOLOv5s architecture to strengthen channel feature learning, further enhancing detection performance. These studies indicate that although one-stage detectors provide significant advantages in speed, continuous improvements in attention mechanisms, feature fusion strategies, and lightweight network design are still required to achieve a better balance between accuracy and efficiency, especially under challenging conditions such as foggy weather and complex lighting environments.
Moreover, several studies have investigated the impact of heavy fog on vehicular sensors and the accuracy of object detection in driving environments. For example, Ogunrinde et al. [25] analyzed how the performance of CNN-based target detectors deteriorates rapidly under adverse weather conditions, and examined methods to defog and restore the quality of foggy images to improve real-time detection performance. Liu et al. [3] conducted a quantitative analysis on how varying visibility levels affect the accuracy of visual sensors in foggy conditions. Their findings showed that as fog density increased, the accuracy of object detection using Faster R-CNN decreased significantly, with detection accuracy dropping from 91.55% under clear weather to 57.75% under heavy fog.
Over the past two years, significant advancements have been made in object detection methods based on deep learning. Zhang et al. [26] improved YOLOv8 by adding a small-object-specific detection layer and a CEF module with CDW-EMA attention, enhancing multiscale perception and background suppression. Wang et al. [27] proposed MDD-ShipNet by integrating a CNN-based dehazing mechanism with a multi-scale feature fusion dynamic head, enhancing ship detection performance in foggy environments. Zhang et al. [28] proposed HR-YOLO, an improved YOLOv8 model for foggy ADAS detection of vehicles and pedestrians, incorporating EHPD-Net backbone, DND-Net defogging, optimized neck fusion, and WIoU loss, yielding mAP gains of 5.9% on RTTS and 9.7% on Foggy Cityscapes. Wang et al. [29] proposed YOLO-Extreme, an enhanced YOLOv12-based framework incorporating DBB, MCAM, and CSFB modules for robust obstacle detection in foggy environments, achieving 50.1% mAP on RTTS (3.9% higher than baseline) with real-time speed for visually impaired navigation assistance. Liu et al. [30] proposed an optimized YOLOv5-based target detection algorithm inspired by plant intelligence, integrating GCAnet for polarization image fusion and Wise_IOU loss with adaptive plant growth mechanisms, achieving 78.4% mAP in foggy conditions, outperforming existing models.
In the aforementioned research on foggy weather object detection, although the detection accuracy has been improved, most of these methods are primarily focused on defogging and image enhancement [31]. This study aims to enhance object detection performance in foggy environments by introducing a selective mechanism for intermediate-level feature representation, thereby strengthening the network’s ability to capture discriminative information and improving the robustness of foggy-scene target detection. In recent years, attention mechanisms have been widely applied in computer vision due to their strong capability in enhancing feature representation and modeling contextual relationships [24,32]. Inspired by these advances, this study innovatively proposes two gated modules based on attention mechanisms and integrates them into the YOLOv8 framework to improve road object detection performance under foggy weather conditions.
While our work draws inspiration from attention mechanisms, it differs fundamentally from traditional approaches. Conventional attention modules, such as SE and ECA, typically generate weights for feature recalibration based on input feature maps, lacking the capability for feature extraction or fusion. In contrast, our proposed GroupGatedConv and C2fGated modules innovatively integrate channel selection mechanisms into core operations: GroupGatedConv performs channel selection while conducting feature extraction, preserving the local feature capture capability of convolutions; whereas C2fGated introduces dynamic channel filtering based on the multi-scale feature fusion advantage of the C2f structure. This design enables the modules to adaptively enhance crucial channel features like attention mechanisms, without sacrificing their original feature processing capabilities, thereby significantly improving the model’s representation ability.
The main contributions of this study are as follows:
1. We propose the GroupGatedConv module, which uses group-wise gating to improve feature selection in foggy conditions while maintaining computational efficiency;
2. Based on the C2f module, we propose a new module, “C2fGated”, which embeds the ECA attention mechanism [33] before feature fusion and after the final convolution operation to enhance the selection of target channel features;
3. We integrate the “C2fGated” and “GroupGatedConv” modules into YOLOv8 to enhance the intermediate feature selection capability, thereby improving its ability to detect road targets in foggy conditions.
The structure of this paper is organized as follows. Section 2 provides an overview of the original YOLOv8n model and discusses the innovations introduced in this study. In Section 3, we present the dataset, experimental setup, and the results of our experiments. Finally, Section 4 concludes the paper and outlines potential directions for future research.
3. Experiments
This section presents a clear and concise summary of the experimental results, along with their interpretation and the conclusions drawn from the experiments.
3.1. Datasets
Adverse weather conditions, particularly dense fog, pose considerable challenges to object detection models based on convolutional neural networks (CNNs). Fog degrades image quality by reducing contrast, blurring object edges, and obscuring critical visual features through atmospheric scattering, which often leads to more false negatives and localization errors. To evaluate the performance and robustness of our model under such conditions, we employed a foggy weather dataset sourced from Roboflow, containing a total of 2975 images [41]. The dataset consists of images captured under real fog conditions in actual environmental settings, offering a more authentic challenge for detection tasks. Although the dataset is moderate in size, its focus on authentic and diverse foggy scenarios makes it representative and challenging, so the results reflect the model’s intrinsic detection capability rather than the benefit of large-scale data volume. We allocated 80% of the images (2380 images) to the training set and the remaining 20% (595 images) to the validation set.
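The 80/20 partition described above can be sketched as follows. This is a minimal illustration only: the file names, the shuffling step, and the seed are assumptions, since the text does not state how the images were assigned to each subset.

```python
import random


def split_dataset(image_ids, train_fraction=0.8, seed=0):
    """Shuffle image identifiers and split them into train/validation lists."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed so the split is reproducible
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers standing in for the 2975 dataset images.
image_ids = [f"img_{i:04d}.jpg" for i in range(2975)]
train_ids, val_ids = split_dataset(image_ids)
print(len(train_ids), len(val_ids))
```

With 2975 images and a training fraction of 0.8, the sketch yields the 2380/595 split reported in the text.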
3.2. Experimental Details
This work was implemented in the PyTorch framework with GPU-accelerated training. The environment configuration is listed in Table 1.
Our algorithm is built on YOLOv8n. During training, the learning rate was set to 0.001 and the weight decay to 0.0005. The model parameters were optimized using stochastic gradient descent (SGD) with a momentum factor of 0.937. The input image size was 640 × 640, the batch size was 32, and training ran for 200 epochs. We also applied several data augmentation techniques, including Mosaic, HSV, and Flip.
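For reference, the reported settings can be collected into a single configuration, as in the sketch below. The key names follow Ultralytics-style training arguments, which is an assumption: the text does not name the training tool, and the augmentation strengths are not reported, so only the augmentation names are listed. The step counts additionally assume that the last partial batch of each epoch is kept.

```python
import math

# Hyperparameters as reported in the text. The key names are assumed
# (Ultralytics-style); they are not confirmed by the source.
train_config = {
    "model": "yolov8n",        # baseline architecture
    "optimizer": "SGD",        # stochastic gradient descent with momentum
    "lr0": 0.001,              # learning rate
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "imgsz": 640,              # input images are 640 x 640
    "batch": 32,
    "epochs": 200,
    "augmentations": ["mosaic", "hsv", "flip"],  # strengths not reported
}

# Rough training length for the 2380-image training set,
# assuming the final partial batch of each epoch is kept.
num_train_images = 2380
steps_per_epoch = math.ceil(num_train_images / train_config["batch"])
total_steps = steps_per_epoch * train_config["epochs"]
print(steps_per_epoch, total_steps)
```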
3.3. Evaluation Metrics
In foggy scenarios, object detection becomes particularly challenging due to visibility degradation caused by atmospheric scattering, which often leads to missed detections. To quantitatively evaluate the detection performance under such conditions, Recall and mean Average Precision (mAP) are adopted as the primary evaluation metrics.
Among them, Recall is regarded as the most critical metric, as it directly reflects the model’s ability to detect existing targets in low-visibility environments. mAP is additionally reported to provide a comprehensive evaluation by jointly considering both detection accuracy and recall across different categories. Precision and Recall are defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$
where TP denotes the number of true positive detections, FP represents the number of false positives, and FN indicates the number of false negatives.
In foggy scenes, a high Recall implies a lower missed-detection rate, which is particularly important for safety-critical applications such as autonomous driving.
The Average Precision (AP) for a single class is computed as the area under the Precision–Recall (P–R) curve:
$$AP = \int_{0}^{1} P(R)\, dR$$
where P(R) denotes the precision as a function of recall.
The mean Average Precision (mAP) is obtained by averaging the AP values over all object categories:
$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$
where N is the total number of object classes and $AP_i$ denotes the average precision of the i-th class.
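The metric definitions above can be illustrated with a short framework-free sketch. It assumes that the TP/FP/FN counts and the P–R points are already available, and it integrates the curve with the trapezoidal rule. Detection toolkits usually apply an interpolated precision envelope and match detections at a fixed IoU threshold instead (0.5 for mAP50), so the sketch shows the formulas rather than a specific evaluation protocol. The numeric inputs are illustrative and are not results from this study.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(recalls, precisions):
    """Trapezoidal area under a P-R curve whose points are sorted by recall."""
    area = 0.0
    for i in range(1, len(recalls)):
        width = recalls[i] - recalls[i - 1]
        area += width * (precisions[i] + precisions[i - 1]) / 2.0
    return area


def mean_average_precision(ap_per_class):
    """Average the per-class AP values over all N classes."""
    return sum(ap_per_class) / len(ap_per_class)


# Illustrative numbers only.
p, r = precision_recall(tp=8, fp=2, fn=12)
ap = average_precision([0.0, 0.5, 1.0], [1.0, 0.8, 0.4])
m_ap = mean_average_precision([ap, 0.25])
print(p, r, ap, m_ap)
```

In the example, 12 missed targets out of 20 give a Recall of 0.4 even though Precision is 0.8, which is the missed-detection behaviour that Recall is meant to expose in foggy scenes.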
3.4. Comparison Experiments
FogGate-YOLO provides an effective solution for object detection in foggy environments, aiming to address the severe impact of fog on detection accuracy. In traditional methods, image dehazing or enhancement is typically used as a preprocessing step. However, FogGate-YOLO directly strengthens the model’s feature extraction ability by introducing two novel modules, GroupGatedConv and C2fGated, which effectively mitigate the image degradation caused by fog. The GroupGatedConv module focuses on coarse-grained channel selection to suppress noise while preserving essential structural features, while the C2fGated module further refines the features post multi-branch fusion, enhancing the model’s discriminative power in foggy conditions. This design, based on modular enhancement, avoids the complexity of traditional image preprocessing and improves computational efficiency.
Figure 5 illustrates the variations in key metrics, including bounding box loss, distribution focal loss, and class loss, as well as Precision, Recall, and mAP50, after each epoch during the training and validation of FogGate-YOLO.
Compared to other enhanced YOLO models, FogGate-YOLO improves detection accuracy in foggy environments while maintaining efficient inference. Experimental results show that FogGate-YOLO outperforms models such as YOLOv5n and YOLOv8n in the mAP50 and mAP50–95 metrics. FogGate-YOLO achieves an mAP50 of 41.3%, which is higher than that of comparable lightweight models in the YOLO series, such as YOLOv5n (39.8%), YOLOv6n (36.3%), YOLOv8n (40.6%), and YOLOv11n (39.6%). Meanwhile, the computational cost of FogGate-YOLO (8.8 GFLOPs) is close to that of these models: 8.8 GFLOPs for YOLOv8n, 6.5 for YOLOv11n, 11.4 for YOLOv6n, and 7.7 for YOLOv5n. This indicates that the two modules added to FogGate-YOLO, C2fGated and GroupGatedConv, do not introduce significant computational overhead while still improving performance. Although the mAP50 of FogGate-YOLO (41.3%) is slightly lower than that of YOLOv5s (42.2%) and YOLOv8s (42.6%), FogGate-YOLO is far more efficient in terms of GFLOPs and parameter count. YOLOv5s requires 24.0 GFLOPs and YOLOv8s requires 28.6 GFLOPs, both significantly higher than the 8.8 GFLOPs of FogGate-YOLO, and YOLOv5s and YOLOv8s have 9.1 M and 11.2 M parameters, respectively, whereas FogGate-YOLO has only 3.152 M. Thus, the slightly better accuracy of YOLOv5s and YOLOv8s comes with much larger computational demands and model sizes. Additionally, the Recall of FogGate-YOLO is 39.8%, which is higher than that of YOLOv5s (37.8%) and YOLOv8s (39.3%). FogGate-YOLO therefore maintains a high recall rate under foggy conditions while keeping computational overhead low and the model small.
In summary, FogGate-YOLO demonstrates a balanced design, providing strong target detection performance under foggy conditions while remaining computationally efficient and lightweight, making it practical for real-world applications. The detection results of the various algorithms on the dataset are shown in Table 2.
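The accuracy/efficiency trade-off against the larger "s" models can be made concrete with a few lines of arithmetic. The sketch below uses only the mAP50, GFLOPs, and parameter figures quoted in the text; the dictionary layout and the function name `relative_cost` are illustrative choices.

```python
# Figures quoted in the text: mAP50 (%), GFLOPs, parameters (millions).
models = {
    "FogGate-YOLO": {"map50": 41.3, "gflops": 8.8, "params_m": 3.152},
    "YOLOv5s": {"map50": 42.2, "gflops": 24.0, "params_m": 9.1},
    "YOLOv8s": {"map50": 42.6, "gflops": 28.6, "params_m": 11.2},
}


def relative_cost(name, reference="FogGate-YOLO"):
    """Compare a model with the reference: cost ratios and mAP50 gap."""
    m, ref = models[name], models[reference]
    return {
        "gflops_ratio": m["gflops"] / ref["gflops"],
        "params_ratio": m["params_m"] / ref["params_m"],
        "map50_gap": m["map50"] - ref["map50"],  # percentage points
    }


for name in ("YOLOv5s", "YOLOv8s"):
    c = relative_cost(name)
    print(f"{name}: {c['gflops_ratio']:.2f}x GFLOPs, "
          f"{c['params_ratio']:.2f}x parameters, "
          f"{c['map50_gap']:+.1f} mAP50 points")
```

On these figures, YOLOv8s costs about 3.25 times the GFLOPs and roughly 3.6 times the parameters of FogGate-YOLO for a gain of 1.3 mAP50 points.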
3.5. Ablation Experiments
To assess the impact of each module on the performance of our model, we performed an ablation study on our dataset using three key metrics: Recall, mAP50, and mAP50–95. The influence of each individual module on the detection performance is summarized in Table 3.
3.6. The Impact of the GroupGatedConv Module
To further enhance channel-wise feature selection under foggy conditions while maintaining computational efficiency, we propose a lightweight GroupGatedConv module and evaluate its effectiveness through ablation experiments. Unlike conventional channel attention mechanisms that assign an individual weight to each output channel, the proposed GroupGatedConv adopts a group-wise gating strategy. Specifically, the output channels are evenly divided into multiple groups, and a single adaptive gate is learned for each group. This design enables the network to selectively suppress or preserve groups of feature channels according to their task relevance, while significantly reducing parameter count and computational overhead.
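The group-wise gating idea can be sketched without any framework, applied here to already-computed convolution outputs. This is an illustration under stated assumptions, not the authors' implementation: the sigmoid activation, the use of one free logit per group as the gate, and the function name `group_gate` are all hypothetical, and the text does not specify how the gates are computed or whether they depend on the input.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def group_gate(channels, gate_logits):
    """Scale every channel by the sigmoid gate of the group it belongs to.

    channels: list of C channels, each a flat list of activations.
    gate_logits: list of G values, one hypothetical learned logit per group.
    """
    num_channels, num_groups = len(channels), len(gate_logits)
    if num_channels % num_groups != 0:
        raise ValueError("channels must divide evenly into groups")
    group_size = num_channels // num_groups
    gated = []
    for c, values in enumerate(channels):
        gate = sigmoid(gate_logits[c // group_size])  # shared within a group
        gated.append([gate * v for v in values])
    return gated


# Four channels in two groups: the first group is half-passed (logit 0),
# the second group is almost fully suppressed (large negative logit).
features = [[2.0, 4.0], [6.0, 8.0], [1.0, 1.0], [3.0, 3.0]]
gated = group_gate(features, gate_logits=[0.0, -50.0])
print(gated)
```

In this sketch, C output channels need only G gate values instead of C per-channel weights, which is the source of the parameter saving described above.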
To assess the impact of early-stage channel selection, the proposed GroupGatedConv replaces the original convolution module at the fourth layer of the YOLOv8 backbone. This stage corresponds to mid-level feature extraction, where basic semantic patterns emerge while spatial resolution remains relatively high. By integrating GroupGatedConv at this layer, the network gains stronger selectivity over intermediate channel representations, enabling it to suppress fog-induced background responses and retain features that are more relevant to object structures. As a result, more informative and noise-resilient features are propagated to subsequent layers, improving the overall detection robustness under foggy conditions.
Compared with the baseline YOLOv8n, introducing the GroupGatedConv module leads to a Recall improvement of 0.8%. This gain indicates enhanced sensitivity to fog-obscured targets and demonstrates the effectiveness of group-wise channel gating for improving detection performance in degraded visual environments. Overall, the ablation results confirm that the proposed GroupGatedConv module strengthens mid-level feature discrimination with minimal computational overhead, making it a practical and effective component for foggy scene object detection.
3.7. The Impact of the C2fGated Module
By incorporating an Efficient Channel Attention (ECA) mechanism after feature concatenation, the proposed C2fGated module enables adaptive channel reweighting on the fused features. The ECA module captures local cross-channel dependencies through lightweight one-dimensional convolution, allowing the network to selectively emphasize task-relevant channels while suppressing noise-sensitive ones, without introducing dimensionality reduction or significant computational overhead.
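The ECA computation described above (global average pooling, a one-dimensional convolution across neighbouring channels, a sigmoid, and channel rescaling) can be sketched in plain Python. The kernel values below are hypothetical stand-ins for learned weights, the kernel size is fixed rather than derived from the channel count, and the function name `eca` is an illustrative choice, so this shows the mechanism rather than the module used in C2fGated.

```python
import math


def eca(channels, kernel):
    """Efficient Channel Attention on a list of channels.

    channels: list of C channels, each a flat list of activations.
    kernel: odd-length list of 1D convolution weights shared across channels.
    Returns the rescaled channels and the per-channel attention weights.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2
    pooled = [sum(v) / len(v) for v in channels]   # global average pooling
    padded = [0.0] * pad + pooled + [0.0] * pad    # zero padding at both ends
    weights = []
    for c in range(len(channels)):
        # 1D convolution over neighbouring channels, no dimensionality reduction
        z = sum(kernel[j] * padded[c + j] for j in range(k))
        weights.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid
    scaled = [[w * x for x in v] for w, v in zip(weights, channels)]
    return scaled, weights


features = [[0.0, 0.0], [2.0, 2.0]]
scaled, weights = eca(features, kernel=[0.0, 1.0, 0.0])
print(weights)
```

The only learned parameters in this sketch are the k kernel weights, regardless of the number of channels, which is why the mechanism adds little overhead.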
To further investigate the influence of module placement, C2fGated is inserted into two critical stages of YOLOv8: the seventh layer of the backbone and the nineteenth layer of the neck. At the backbone stage, mid-level features begin to encode object structure and semantic cues while remaining susceptible to fog-induced interference. Introducing C2fGated at this stage facilitates channel-wise purification of intermediate representations, enhancing the stability of discriminative features propagated to deeper layers. At the neck stage, multi-scale features are fused to support detection at different resolutions. However, under heavy fog, small-target cues are often diluted during the fusion process. Embedding C2fGated after feature fusion allows the model to perform targeted channel selection on the fused mid-level features, significantly improving robustness for small-target detection in foggy scenes.
Compared with the baseline YOLOv8n, the proposed FogGate-YOLO achieves consistent performance improvements after introducing the single C2fGated module. Specifically, Recall is improved by 0.6%, mAP50 increases by 1.0%, and mAP50–95 improves by 0.6%. These results demonstrate that the proposed C2fGated module effectively enhances feature discrimination under foggy conditions and that the chosen insertion positions in both the backbone and neck contribute to the observed performance gains.
3.8. Joint Effect of C2fGated and GroupGatedConv
To further analyze the collaborative effect of the proposed modules, we conduct joint ablation experiments by simultaneously integrating C2fGated and GroupGatedConv into the YOLOv8n architecture. This setting allows us to evaluate whether the two channel selection mechanisms provide complementary benefits under foggy conditions.
The GroupGatedConv module is responsible for coarse-grained channel selection at early-to-mid stages of the backbone by suppressing or enhancing groups of correlated feature channels. This operation effectively filters out fog-induced background responses and preserves structurally meaningful features at the intermediate representation level. In contrast, the C2fGated module focuses on fine-grained channel recalibration after multi-branch feature fusion, enabling adaptive refinement of aggregated mid-level features in both the backbone and neck. When combined, these two modules form a hierarchical channel selection strategy. GroupGatedConv first performs structured channel filtering on mid-level features, providing a cleaner and more stable feature basis. Subsequently, C2fGated further reweights the fused features through efficient channel attention, selectively emphasizing task-relevant responses. This coarse-to-fine collaboration significantly strengthens the model’s ability to discriminate informative channels under moderate and heavy fog conditions.
As a result, the proposed FogGate-YOLO exhibits a substantially enhanced capability for mid-level channel selection, which is particularly beneficial for road target detection in degraded visual environments. Compared with the baseline YOLOv8n, FogGate-YOLO achieves a Recall improvement of 2.6%, along with gains of 0.7% in mAP50 and 0.4% in mAP50–95. These results demonstrate that the cooperative integration of C2fGated and GroupGatedConv yields complementary and cumulative performance improvements, confirming the effectiveness of the proposed channel selection strategy for foggy scene object detection. Both YOLOv8n and FogGate-YOLO require 8.8 GFLOPs, and the parameter count rises only from 3.151 M to 3.152 M, indicating that the two new modules introduce minimal computational overhead and further emphasizing the lightweight nature of FogGate-YOLO. It is evident from Figure 6 that FogGate-YOLO exhibits excellent detection capabilities in foggy weather conditions. Additionally, FogGate-YOLO can identify smaller objects in dense fog more effectively.
4. Conclusions
In this paper, we propose FogGate-YOLO, an enhanced YOLOv8 framework designed for robust object detection in foggy conditions. Unlike conventional approaches that rely on image dehazing or preprocessing enhancement, our method directly strengthens the model’s feature representation by embedding advanced channel selection mechanisms, effectively mitigating fog-induced degradation without additional inference overhead.
The core contributions are two synergistic modules: GroupGatedConv and C2fGated. GroupGatedConv performs coarse-grained channel selection in the early-to-mid backbone stages, suppressing fog-related noise while preserving structural features. C2fGated enables fine-grained recalibration after multi-branch fusion, refining aggregated features in both backbone and neck. Together, they form a hierarchical coarse-to-fine channel selection strategy that significantly boosts discriminative power under foggy scenes.
Extensive experiments demonstrate the effectiveness of FogGate-YOLO. Compared to the baseline YOLOv8n, it achieves improvements of 2.6% in Recall, 0.7% in mAP50, and 0.5% in mAP50–95 on foggy road detection datasets, with more pronounced gains under moderate and heavy fog. Joint ablation studies confirm the complementary benefits of the two modules. GroupGatedConv provides structured filtering at the mid-level, while C2fGated performs adaptive refinement after fusion. This coarse-to-fine collaboration creates a cleaner feature basis and selectively emphasizes task-relevant responses, markedly enhancing robustness in challenging foggy environments.
The results indicate strong potential for adverse weather detection. Although our algorithm has shown promising results, certain limitations remain. Firstly, the dataset used in the study has a relatively small sample size and lacks diverse scenarios, which restricts the model’s generalization ability. Secondly, key hyperparameters, such as the parameter G in the GroupGatedConv module, have not been explored in depth, and their impact on the model’s performance has not been thoroughly analyzed. Additionally, our algorithm has not been compared with the most advanced lightweight fog detection models in recent years, which represents a significant area for future research. In order to further enhance the performance of the algorithm, we plan to extend its application to various foggy scenarios in future work, analyzing its performance across different conditions. Additionally, we will apply other techniques such as data augmentation, transfer learning, and regularization to further improve the model’s generalization ability across different datasets. At the same time, we will conduct a detailed exploration of the hyperparameter G in the GroupGatedConv module, focusing on its impact across varying numbers of feature maps, especially in different network architectures. This will ensure that the algorithm can self-adaptively select the optimal number of groups, G, depending on the feature map quantity at different stages, thereby maximizing the model’s performance.