Review

A Review of Unmanned Visual Target Detection in Adverse Weather

1 College of Electronic Information Engineering, Hebei University of Technology, Tianjin 300131, China
2 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(13), 2582; https://doi.org/10.3390/electronics14132582
Submission received: 21 April 2025 / Revised: 14 June 2025 / Accepted: 23 June 2025 / Published: 26 June 2025
(This article belongs to the Section Computer Science & Engineering)

Abstract

Visual target detection under adverse weather conditions presents a fundamental challenge for autonomous driving, particularly in achieving all-weather operational capabilities. Unlike existing reviews that concentrate on individual technical domains such as image restoration or detection robustness, this review introduces an innovative “restoration–detection” collaborative framework. This paper systematically examines state-of-the-art methods for degraded image recovery and for improving detection model robustness, encompassing both traditional, physically driven approaches and contemporary deep learning paradigms. A comprehensive overview and comparative analysis are provided to elucidate these advancements. Regarding the recovery of degraded images, traditional methods, such as those based on the dark channel prior, demonstrate advantages in interpretability within specific scenarios. In contrast, deep learning methods have achieved significant breakthroughs in modeling complex degradations and enhancing cross-domain generalization through a data-driven paradigm. In the field of enhancing detection robustness, traditional improvement techniques that utilize anisotropic filtering, alongside deep learning methods such as SSD, R-CNN, and the YOLO series, contribute to perceptual stability through feature optimization and end-to-end learning, respectively. This paper summarizes 11 types of mainstream public datasets, examining their multimodal annotation systems and the discrepancies among them. Furthermore, it provides an extensive evaluation of algorithm performance using metrics such as PSNR, SSIM, and mAP. Significant bottlenecks persist in dynamic weather coupling modeling, multimodal heterogeneous data fusion, and the efficiency of edge deployment. Future research should focus on establishing a physically guided hybrid learning architecture, developing techniques for dynamic and adaptive timing calibration, and designing a flexible multimodal fusion framework to overcome the limitations associated with complex environment perception. This paper serves as a systematic reference for both the theoretical development and practical implementation of autonomous driving visual detection technology under severe weather conditions.

1. Introduction

The practicality of autonomous driving technology is significantly influenced by the robustness of visual detection systems in complex environments. However, adverse weather conditions such as fog, rain, and snow lead to optical attenuation, motion blur, and noise interference, which considerably diminish image quality and target detection accuracy. According to statistics from the U.S. National Highway Traffic Safety Administration (NHTSA), the incidence of traffic accidents under low visibility conditions is exceptionally high. This highlights the pressing necessity for all-weather-sensing technology. In contrast to prior studies, this paper offers a comprehensive examination of the entire process encompassing “degradation recovery, robustness detection, data benchmarks, and performance evaluation.” This represents the first interdisciplinary review of autonomous driving visual target detection technology in adverse weather conditions. The overall framework is illustrated in Figure 1.
Traditional image restoration methods are grounded in physical degradation models and human-defined prior assumptions. Their dependence on rigid assumptions makes non-uniform degradation and dynamic interference difficult to handle. In recent years, data-driven paradigms have enabled deep learning techniques to overcome the limitations of traditional frameworks, leading to significant progress in complex weather modeling and cross-domain generalization. Nevertheless, their black-box nature, high computational costs, and reliance on extensive datasets continue to restrict their applicability in safety-critical scenarios.
Current research identifies two primary technical approaches: The first is image degradation recovery technology, which aims to reconstruct clear visual information. This includes traditional methods driven by physical models as well as deep learning techniques based on architectures such as CNNs and Transformers. The second approach focuses on enhancing the robustness of the detection models, which improves perceptual stability through feature space optimization and multi-task learning. Although some studies have verified the effectiveness of algorithms on multi-weather datasets, fundamental issues remain unresolved, including the domain differences between the synthetic data and the real-world scenarios, the challenges posed by heterogeneity in multimodal sensor fusion, and the dynamic coupling effects of weather. Most existing reviews tend to concentrate on a single branch of technology and lack a systematic approach to the synergistic framework of “restoration–detection”. Furthermore, there is insufficient analysis regarding multimodal data fusion and optimization in edge computing. This paper presents an interdisciplinary review of visual target detection technologies for autonomous driving under adverse weather conditions, framed within a comprehensive process chain that includes degradation recovery, robust detection, data benchmarking, and performance evaluation. By systematically comparing and analyzing the advantages and limitations of 63 representative works, this paper elucidates the complementarity between traditional methods and deep learning paradigms. Furthermore, it proposes physically guided hybrid learning architectures as a promising direction for future breakthroughs. In addition, this paper integrates the multidimensional characteristics of 11 publicly available datasets to establish a standardized reference for algorithm development and validation. The findings indicate that complex weather sensing is expected to advance from laboratory validation to real-world deployment by combining physical interpretability with data-driven representations, developing a robust multimodal fusion mechanism, and optimizing edge computing performance.
The structure of this paper is organized as follows: Section 2 presents a thorough and systematic analysis of both traditional methods and deep learning methods for image restoration in the context of degradation. Section 3 explores techniques aimed at improving the robustness of detection models. Section 4 offers a comprehensive evaluation of the various features present in mainstream meteorological datasets. Section 5 presents a detailed comparison of the performance metrics of various algorithms. Section 6 addresses the challenges encountered by this technology and delineates potential directions for future development. Finally, Section 7 provides a comprehensive summary of the paper. This paper offers both theoretical support and engineering insights to facilitate the innovative development of all-weather-sensing technology in autonomous driving.

2. Degraded Image Recovery

2.1. Traditional Degraded Image Recovery Methods

Traditional image restoration techniques for degraded images are fundamentally based on the integration of physical models and prior knowledge. These methods utilize explicit modeling of degradation mechanisms, such as atmospheric scattering, light attenuation, and noise distribution, to facilitate effective image restoration. The key feature of these manually designed methods is their richness and specificity, enabling the system to adapt to the diverse characteristics presented by various adverse weather conditions. Traditional methods for restoring degraded images can be divided into several components: constructing a degradation model, incorporating prior knowledge, defining optimization objectives, and implementing iterative solution algorithms.
Based on research conducted on traditional methods for recovering degraded images, both domestically and internationally, the advantages and disadvantages of these approaches can be summarized as follows: The advantages include the following: (1) reliance on a priori assumptions rather than data-driven techniques, eliminating the necessity for large-scale labeled datasets; (2) consistent recovery performance across uniform degradation scenarios that align with the established a priori assumptions; (3) lower hardware requirements and computational costs, making them easy to apply in practical intelligent transportation systems. However, there are several disadvantages: (1) the reliance on manually designed a priori conditions makes it challenging to adapt to complex degradation; (2) traditional methods mainly rely on numerical optimization techniques (e.g., ADMM, graph cut algorithms), which can be inefficient when processing high-resolution images.
The conventional methods for degraded image restoration include the following: the dark channel prior (DCP)-based defogging method, the low-light enhancement method based on Retinex theory, sparse representation methods for rain and snow removal, physical model-driven image restoration strategies, as well as other traditional restoration methods. This section classifies and summarizes traditional image restoration methods, providing a comparison flowchart of fog removal methods based on the dark channel prior (DCP) and low-light enhancement approaches grounded in Retinex theory, as illustrated in Figure 2. The historical evolution of traditional image restoration methods is depicted in Figure 3. The characteristics, advantages, and disadvantages of each method are summarized in Table 1 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16].

2.1.1. Dark Channel Prior (DCP)-Based Defogging Approach

The dark channel prior (DCP) [1] was first introduced in 2009 and has demonstrated significant advantages in both synthetic and authentic foggy images. Compared to previous methods, the Peak Signal-to-Noise Ratio (PSNR) has improved by 3 to 5 decibels, establishing it as a landmark theory in single-image defogging. Its core assumption is grounded in statistical observations of fog-free natural images: within local windows of non-sky regions, at least one color channel contains pixels whose intensity approaches zero; this minimum is referred to as the dark channel. Leveraging this prior assumption, DCP facilitates fog removal reconstruction without requiring paired data by modeling the atmospheric scattering equation and directly estimating transmittance along with global atmospheric light.
Dark channel prior models [1] are employed to develop physically based atmospheric scattering models, enabling the effective recovery of foggy scenes from synthetic fog image datasets. This method establishes a transmittance estimation framework based on the assumption of local minimum brightness. Nevertheless, the dark channel assumption does not hold in high-brightness regions such as the sky, leading to halo artifacts. Furthermore, transmittance refinement via soft matting requires solving a large sparse linear system, which makes it challenging to meet the processing requirements of real-time visual systems. To address this computational bottleneck, Zhu et al. [2] proposed the color attenuation prior, which models scene depth through a linear mapping between brightness and saturation and thereby yields a closed-form transmittance estimate, improving inference speed to video frame rates. Despite this efficiency gain, the model exhibits deviations from the physical model in regions of high-concentration haze, resulting in a decrease in structural similarity metrics. To address these limitations, Cai et al. [3] were the first to incorporate deep convolutional neural networks into the fog removal framework, developing a data-driven transmittance refinement module. This module employs a three-layer convolutional neural network to perform nonlinear optimization on the initial DCP transmittance map. Following training on a large-scale synthetic dataset, an improvement in Peak Signal-to-Noise Ratio was observed. However, the model faces limitations due to domain shift between the training data and real-world scenarios, resulting in significant fluctuations in cross-domain generalization performance.
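To make the defogging pipeline concrete, the following minimal Python/OpenCV sketch illustrates the three steps described above: computing the dark channel, estimating the global atmospheric light, and recovering the scene radiance from the atmospheric scattering model I = J·t + A·(1 − t). The function names and parameter values (e.g., a 15-pixel patch and omega = 0.95) are illustrative choices rather than the exact settings of [1], and the soft-matting refinement step is omitted.

```python
import cv2
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel minimum over the color channels, then a local minimum filter."""
    min_rgb = img.min(axis=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_rgb, kernel)

def estimate_atmospheric_light(img, dark, top=0.001):
    """Average the brightest pixels of the dark channel as the global atmospheric light A."""
    n = max(1, int(dark.size * top))
    idx = np.argsort(dark.ravel())[-n:]
    return img.reshape(-1, 3)[idx].mean(axis=0)

def dehaze(bgr, omega=0.95, t0=0.1, patch=15):
    """Recover J from I = J * t + A * (1 - t) using the dark channel prior."""
    img = bgr.astype(np.float64) / 255.0
    A = estimate_atmospheric_light(img, dark_channel(img, patch))
    # Transmission estimate: t = 1 - omega * dark_channel(I / A)
    t = 1.0 - omega * dark_channel(img / A, patch)
    t = np.clip(t, t0, 1.0)[..., None]          # lower bound avoids over-amplification
    J = (img - A) / t + A
    return np.clip(J * 255, 0, 255).astype(np.uint8)
```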

2.1.2. Low-Light Enhancement Based on Retinex Theory

The Retinex theoretical framework [17] delineates the components of illumination and reflection by constructing a linear combination of multi-scale Gaussian kernel functions. This approach offers a mathematically interpretable model for tasks such as low-light image enhancement and defogging. Based on the assumption of constant human visual perception, this theory analyzes the image formation process as a product of variations in illumination and surface reflection. Furthermore, it improves model performance through the incorporation of different optimization strategies.
In developing the Retinex theory, researchers have implemented various optimization strategies to enhance model performance. The variational Retinex optimization method proposed by Fu et al. [4] represents a breakthrough in achieving global lighting consistency by formulating a joint optimization model that simultaneously estimates both the illumination and reflection components, yet its iterative solution framework is inherently burdened by high computational complexity. The LIME algorithm, developed by Guo et al. [5], introduces an innovative approach by incorporating the maximum a priori probability of the illumination map, thereby significantly improving real-time performance through a rapid estimation strategy. This approach, however, amplifies noise, limiting its application in complex scenes. The RUAS model proposed by R. Liu et al. [6] innovatively integrates neural architecture search with Retinex expansion optimization to develop an end-to-end lightweight enhancement network. This approach improves robustness while preserving model efficiency. Despite these advances, the noise suppression module still exhibits limitations when addressing residual noise under extreme noise conditions.
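As a concrete illustration of the Retinex decomposition, the sketch below implements a basic single-scale variant in Python/OpenCV: illumination is approximated by a Gaussian-blurred copy of the image, and the reflectance is recovered in the log domain. The chosen sigma and normalization are arbitrary illustrative settings, not those of the variational, LIME, or RUAS methods discussed above.

```python
import cv2
import numpy as np

def single_scale_retinex(img, sigma=80):
    """Decompose log(I) = log(R) + log(L): estimate illumination L with a Gaussian
    blur and keep the reflectance R as the enhanced output."""
    img = img.astype(np.float64) + 1.0                    # avoid log(0)
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)   # smooth proxy for lighting
    reflectance = np.log(img) - np.log(illumination)
    # Rescale the reflectance to the displayable 0-255 range
    out = cv2.normalize(reflectance, None, 0, 255, cv2.NORM_MINMAX)
    return out.astype(np.uint8)

# A multi-scale variant simply averages reflectance maps computed at several sigmas.
```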

2.1.3. Rain/Snow Removal Method Based on Sparse Representation

Rain and snow removal methods based on the theory of sparse representation posit that weather interference components—such as raindrops and snowflakes—can be regarded as sparse anomalies within image signals, while the background scene exhibits low-rank or sparse structure. This theoretical framework facilitates the separation of degraded components from the background by designing specific dictionary structures and optimizing an objective function. L.W. Kang et al. [7] were the first to integrate morphological component analysis (MCA) with self-learning dictionaries to develop a rain removal model that operates without the need for external training data. Y. Li et al. [8] introduced an image decomposition method based on a double Gaussian mixture model (GMM), achieving a dynamic balance between rain pattern suppression and detail preservation through a joint optimization framework. Nevertheless, the sensitivity associated with local region selection imposes limitations on the efficiency of practical deployment. In recent years, Xin Guo et al. [9] have designed a two-stage restoration network that integrates local sparse structure priors and mask-guided mechanisms to effectively suppress unknown rain spot patterns. Their mask extraction algorithm demonstrates strong generalization capabilities across various scenarios. However, an over-reliance on neighborhood information for reconstruction may lead to blurred background textures under extreme rainfall conditions, such as heavy rain and blizzards.
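The separation idea behind these methods can be illustrated with a deliberately simplified sketch: the image is split into a smooth background layer and a high-frequency layer, and a soft-thresholding (L1 proximal) step keeps only the large, sparse responses as the streak component. This is a stand-in for, and far simpler than, the dictionary-learning and GMM formulations of [7,8,9]; the sigma and lambda values below are arbitrary.

```python
import cv2
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the L1 norm: shrinks small coefficients toward zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def remove_streaks(gray, sigma=3, lam=8.0):
    """Split a grayscale image into a smooth background layer and a high-frequency
    layer, then treat the sparse part of the high-frequency layer as streaks."""
    gray = gray.astype(np.float64)
    base = cv2.GaussianBlur(gray, (0, 0), sigma)   # smooth background proxy
    detail = gray - base                           # contains streaks plus fine texture
    streaks = soft_threshold(detail, lam)          # keep only large, sparse responses
    restored = gray - streaks
    return np.clip(restored, 0, 255).astype(np.uint8)
```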

2.1.4. Physical Model-Based Image Restoration

Image restoration methods grounded in physical models reformulate image degradation issues within an inverse problem-solving framework by developing a mathematical representation of the underlying physical process. Due to their physical interpretability and low dependence on labels, these methods have emerged as a key research branch for enhancing robustness against complex weather. Nevertheless, their practical application remains limited by issues related to parameter sensitivity and computational efficiency bottlenecks. The non-local defogging method proposed by Berman et al. [10] optimizes transmittance estimation through superpixel segmentation and non-local similarity constraints, effectively mitigating artifacts in sky regions. This approach, however, is inherently limited by its dependence on segmentation accuracy. Subsequently, H. Zhang et al. [11] made significant advancements by achieving joint transmittance and atmospheric light optimization through the DCPDN network. They also successfully recovered details via multi-scale feature fusion. Despite these innovations, high computational complexity and strong dependence on synthetic data pose limitations to its practical application. In recent years, Wu et al. [12] proposed PDUNet, which integrates physical prior knowledge with deep learning and adopts a pre-training–fine-tuning paradigm to enhance generalization in real-world scenarios; however, this two-stage training mechanism limits the flexibility of model deployment. Zhao et al. [13] introduced the UniMix framework, which constructs a bridge domain based on physically inspired extreme weather simulations. They developed a universal mixing operator to learn domain-invariant representations, thereby addressing issues related to domain segmentation.

2.1.5. Other Traditional Recovery Methods

Suraimi et al. [14] proposed a non-uniform defogging method that is based on region growing and local contrast stretching techniques. They employed fog concentration seed points to guide the segmentation of regions, which resulted in natural transitions and improved results. Though effective, reliance on a manual initialization mechanism restricts the algorithm’s automation capabilities when addressing issues related to noise diversity. C. Zhao et al. [15] constructed an adaptive variational model that incorporates dynamic noise detection along with multi-regularization strategies, allowing it to effectively adapt to complex scenes. While powerful, this method faces practical limitations due to parameter estimation errors and computational efficiency. Gao et al. [16] integrated weighted kernel norm minimization with full-variable regularization within the ADMM framework to achieve a balance between local smoothness and global structural sparsity, thereby enhancing detail retention capabilities. Nevertheless, they encounter significant challenges associated with high memory consumption and computational complexity.

2.1.6. Comparison and Analysis

In this section, traditional image restoration methods designed for adverse weather conditions are classified into several categories. These include fog removal methods based on dark channel priors, low-light enhancement methods grounded in Retinex theory, rain and snow removal strategies utilizing sparse representation, image restoration methods informed by physical modeling, as well as other traditional restoration methods. Subsequently, a detailed review of the advancements in these methodologies is presented.
From a methodological perspective, the core contradiction inherent in traditional image restoration methods arises from the disparity between artificial prior assumptions and the diversity of real-world scenes. Dehazing methods that utilize dark channel priors (DCPs) possess significant advantages due to their physical modelability and interpretability. Nevertheless, limitations such as model mismatches in high-brightness regions and issues related to computational complexity limit their application. Future research endeavors could explore the integration of these methods with deep learning approaches, leveraging neural networks to enhance transmittance estimation and mitigate domain shift challenges. Simultaneously, the introduction of lightweight structures can significantly enhance real-time performance. Low-light enhancement techniques based on Retinex theory separate illumination from reflection through the application of multi-scale Gaussian kernels, thereby achieving notable improvements in global illumination consistency. While effective, the high computational cost of iterative solutions and the associated noise amplification issues require urgent attention. The integration of neural architecture search with Retinex optimization to develop lightweight networks represents a significant avenue for advancement. Additionally, further improvements can be made to enhance the handling of residual noise under extreme noise conditions.
From a technical perspective, traditional methods exhibit two primary characteristics: “single-scene optimization” and “insufficient cross-domain generalization.” Rain and snow removal techniques that rely on sparse representation leverage sparse anomalies and low-rank assumptions to effectively separate degraded components. However, challenges such as sensitivity to local region selection and background blurring under extreme weather conditions must be addressed. These challenges can be overcome by improving the mask guidance mechanism and combining dynamic scene modeling to enhance adaptability to intricate rain and snow patterns. Physics model-driven image restoration methods have gained significance due to their physical interpretability and reduced dependency on labeled data. Despite these advantages, the application of these methods is constrained by parameter sensitivity and computational efficiency. The pre-training–fine-tuning paradigm that integrates physics priors with deep learning presents a viable approach. Future research should prioritize optimizing the flexibility of this two-stage training process while enhancing cross-domain generalization capabilities.
In foggy scenes, the dark channel prior (DCP) and its improved variants demonstrate superior performance. DCP directly estimates transmittance based on an atmospheric scattering model, resulting in a PSNR improvement of 3–5 dB for synthetic fog images, alongside a structural similarity (SSIM) ranging from 0.76 to 0.89. This method is particularly effective in scenarios characterized by uniform fog concentration, where the statistical assumptions underlying the dark channel prior are well aligned with the physical model, thereby facilitating accurate restoration of scene depth information. Nevertheless, certain limitations exist: In high-concentration fog or non-uniform fog scenes, deviations in the physical model can lead to a decrease in SSIM, and artifacts are likely to manifest in sky regions. The method that combines sparse representation with a physical model proves to be optimal for rainy weather conditions. Methods based on sparse representation consider rain streaks as sparse anomalies within images. Rain line separation is achieved through morphological component analysis (MCA) and self-learning dictionaries, achieving a PSNR of 36.11 dB and a SSIM of 0.97 on datasets such as Rain100L. This method demonstrates particular efficacy in suppressing rain lines in single-scene scenarios. Yet this approach has certain limitations: In extremely heavy rain scenarios, the density and dynamic nature of the rain lines can lead to challenges; specifically, neighborhood information reconstruction may cause background blurring, which could potentially decrease the PSNR. The physical model-driven and sparse prior fusion method demonstrates greater effectiveness in snowy weather scenarios. The physically driven method simulates the scattering and attenuation of light caused by snow particles, achieving a PSNR ranging from 28 to 30 dB in synthesized snow images. Especially in uniformly snowing scenes, atmospheric light estimation and transmittance optimization demonstrate efficacy in restoring scene structure. Nevertheless, these methods exhibit certain limitations; specifically, in scenarios involving dynamic snowing or snow particle-obscured scenes, the physical model struggles to capture real-time changes in scattering parameters, leading to noticeable texture distortions within the restored image.
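For reference, the PSNR and SSIM figures quoted throughout this section are full-reference metrics computed against a clean ground-truth image; a minimal example using scikit-image is shown below (it assumes a reasonably recent version that exposes the channel_axis argument).

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_restoration(restored, ground_truth):
    """Compute the two full-reference metrics used in this section for 8-bit images."""
    # PSNR = 10 * log10(MAX^2 / MSE), with MAX = 255 for 8-bit data
    psnr = peak_signal_noise_ratio(ground_truth, restored, data_range=255)
    # SSIM compares local luminance, contrast, and structure statistics
    ssim = structural_similarity(ground_truth, restored,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim
```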
Traditional methods deliver exceptional performance in low-complexity scenarios characterized by single degradation types (e.g., uniform fog, moderate rain, static snow). Their physical interpretability and lightweight characteristics provide a solid foundation for practical deployment. However, when confronted with non-uniform degradation, multiple weather conditions (e.g., mixed fog and rain), or dynamic environments, the rigid assumptions associated with artificial prior knowledge and computational efficiency bottlenecks significantly degrade their performance. In the future, they can be integrated with deep learning. The core value of the hybrid framework that combines “physical prior constraints + data-adaptive optimization” lies in its ability to address the inherent shortcomings of both traditional methods and deep learning approaches. On one hand, the physical model embeds prior knowledge of degradation processes, thereby mitigating the generalization blind spots of deep learning’s black-box modeling in extreme scenarios. On the other hand, the data-driven nature of deep learning facilitates adaptive optimization concerning parameter sensitivity issues in physical models. This integration fundamentally addresses two critical challenges: first, while physical models often struggle to capture real-time changes in weather parameters, deep learning can learn degradation dynamics through time-series data; second, the shortcomings of physical prior knowledge in unknown scenarios can be mitigated through contrastive learning and domain adaptation techniques. Thus, the hybrid framework establishes a robust perception system characterized by both interpretability and environmental adaptability through the synergy of physical constraints and data adaptation. This represents a pivotal pathway for overcoming generalization limitations in complex scenarios.

2.2. Deep Learning-Based Degraded Image Recovery Methods

Deep learning, as a method of machine learning, is founded on a neural network structure that draws inspiration from the functioning of neurons in the human brain. Utilizing multi-layer neural networks, deep learning can automatically learn complex abstract features and patterns without the need for manual design of feature extractors. In the context of degraded image restoration, deep learning methods significantly improve both the accuracy and the generalization ability of restoring images affected by severe weather conditions. In the process of restoring degraded images, deep learning-based methods typically encompass four essential steps: first, the input of degraded images; second, the execution of degradation modeling and preliminary restoration; third, the application of physical constraints and optimization enhancements; and finally, the output of the fully restored image. Degradation modeling and preliminary recovery commence with local feature extraction, followed by global dependency modeling, and then multi-scale fusion. This process ultimately generates preliminary recovered images through the incorporation of physical embedding. The implementation of physical constraints and optimization enhancement begins with the design of the loss function, which is subsequently followed by multi-task collaborative optimization.
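The PyTorch sketch below illustrates this pipeline in miniature: a tiny encoder–decoder produces a residual correction of the degraded input, and the training objective combines a pixel-wise term with a physics-inspired re-degradation consistency term based on the atmospheric scattering equation. The architecture, the loss weighting, and the assumption that transmission maps are available are illustrative simplifications, not a specific published design.

```python
import torch
import torch.nn as nn

class TinyRestorationNet(nn.Module):
    """Minimal encoder-decoder: local feature extraction, downsampling for a wider
    receptive field, and upsampling back to a residual correction of the input.
    Assumes even spatial dimensions."""
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, degraded):
        return degraded + self.decoder(self.encoder(degraded))  # residual restoration

def restoration_loss(restored, clean, degraded, transmission, atmospheric_light):
    """Composite objective: a pixel-wise term plus a re-degradation consistency term
    that re-applies the scattering model I = J * t + A * (1 - t) to the restored image."""
    pixel_term = nn.functional.l1_loss(restored, clean)
    re_degraded = restored * transmission + atmospheric_light * (1 - transmission)
    physics_term = nn.functional.l1_loss(re_degraded, degraded)
    return pixel_term + 0.1 * physics_term   # weighting chosen arbitrarily for illustration
```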
The advantages of the deep learning-based method for degraded image recovery include the following: (1) a powerful capability for nonlinear modeling; (2) data-driven generalization across different domains; (3) efficient computation and real-time processing; (4) multimodal fusion along with multi-task collaboration. Conversely, the disadvantages are as follows: (1) dependence on data and associated annotation costs; (2) issues related to overfitting that create a bottleneck in generalization.
The predominant deep learning-based methods for degraded image recovery include the CNN-based degraded image recovery method, the GAN-based degraded image recovery method, the RNN-based degraded image recovery method, the ResNet-based recovery method, and other Transformer-based degraded image recovery methods. In this section, we classify and provide an overview of the deep learning methodologies employed in the recovery of degraded images. This section presents the flowcharts of image restoration methods based on CNNs, GANs, and Transformers in order, as illustrated in Figure 4. The historical development of deep learning-based degraded image recovery methods is depicted in Figure 5, with a summary of each method’s features, advantages, and disadvantages provided in Table 2 [18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36].

2.2.1. CNN-Based Method for Degraded Image Recovery

Over the course of their technological evolution, degraded image restoration methods based on convolutional neural networks have persistently sought an optimal balance between model capacity and inference efficiency. However, dynamic degradation modeling continues to pose a significant challenge in this field. Zhang et al. [18] introduced HQS + CNN, which employs a dynamic weighting mechanism to adaptively differentiate between various types and intensities of noise, thereby facilitating lightweight real-time processing. Nevertheless, its general noise model lacks the specificity required for complex weather degradation scenarios. DehazeNet, proposed by Cai et al. [3], represents the first end-to-end defogging network that combines an atmospheric scattering model with density-aware loss functions, offering notable advantages in inference speed. However, its effectiveness is constrained by the limited depth of the network, resulting in limited capabilities in detail restoration. JORDER, introduced by Yang et al. [19], innovatively employs a binary rain line map to distinguish between the location and intensity information of rain patterns. Additionally, it enhances adaptability to complex scenarios through multi-task learning. Nevertheless, its performance remains affected by its reliance on synthetic data. Tajane et al. [20]’s EffiConvRes employs a multi-architecture fusion approach and a lightweight design, effectively balancing accuracy and efficiency while demonstrating strong generalization capabilities. However, its scalability with respect to high-resolution images remains to be verified. Wu et al. [21]’s MSDL enhances defogging effects through a joint parameter estimation framework that exhibits significant computational efficiency. Nonetheless, it relies heavily on synthetic data and on an explicitly specified degradation model.

2.2.2. GAN-Based Degraded Image Recovery Methods

Generative adversarial networks (GANs) have made significant strides in improving image visual quality and cross-domain generalization capabilities. However, the lack of control over the generation process and issues related to computational efficiency continue to pose critical challenges within this field. Engin et al. [22] introduced Cycle-Dehaze, which employs unsupervised training via a cycle consistency loss while introducing physical constraints to enhance model interpretability. Dudhane et al. [23]’s RI-GAN departs from traditional scattering model frameworks by circumventing the need for transmittance and atmospheric light estimation, thereby generating fog-free images and effectively mitigating cascading errors. However, its reliance on synthetic data and substantial computational requirements limit its practical applicability. The integration of Transformers with the nonlinear mapping network of StyleGAN2 [24] significantly improves generation quality by effectively combining global and local features. Additionally, the hybrid discriminator design enhances the model’s adaptability; however, challenges such as high computational costs and performance bottlenecks in certain scenes remain unresolved.
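The unpaired training signal underlying Cycle-Dehaze-style methods is the cycle consistency loss; a minimal PyTorch sketch is shown below. The two generator names (dehaze_net and rehaze_net) are placeholders, and the adversarial and perceptual terms used in the actual methods are omitted.

```python
import torch.nn as nn

def cycle_consistency_loss(hazy, clean, dehaze_net, rehaze_net, lam=10.0):
    """Unpaired training signal used by CycleGAN-style dehazing: translating an image
    to the other domain and back should reproduce the original image."""
    l1 = nn.functional.l1_loss
    # hazy -> clean-looking -> hazy again
    fake_clean = dehaze_net(hazy)
    cycle_hazy = rehaze_net(fake_clean)
    # clean -> hazy-looking -> clean again
    fake_hazy = rehaze_net(clean)
    cycle_clean = dehaze_net(fake_hazy)
    return lam * (l1(cycle_hazy, hazy) + l1(cycle_clean, clean))
```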

2.2.3. RNN-Based Method for Recovery of Degraded Images

In the realm of image restoration methods utilizing recurrent neural networks (RNNs), dynamic modeling and efficiency optimization have emerged as pivotal directions driving the continuous development of the technology. The RVRT method introduced by Liang et al. [26] achieves multi-scale feature fusion through global dynamic aggregation, effectively balancing efficiency and performance in video restoration. Nevertheless, its high computational complexity poses a significant limitation for real-time applications. Liu et al. [25] innovatively integrated non-local operations with RNNs to establish a unified framework that supports both denoising and super-resolution, demonstrating robust degradation resistance. Nonetheless, the complexity of the computational process and the sensitivity of neighborhood parameters limit its practical deployment. Rota et al. [27] introduced bidirectional temporal constraints combined with optical flow estimation and occlusion masks to minimize inter-frame artifacts, thereby achieving high-fidelity restoration with low computational overhead. However, the reliance on optical flow accuracy may hinder improvement under extreme low-light conditions.

2.2.4. ResNet-Based Degraded Image Recovery Methods

Degraded image restoration methods based on ResNet have progressed in complex weather modeling and multi-task collaborative optimization. This progress can be attributed to the integration of multi-scale feature fusion and residual connection technology. However, a critical challenge in this domain remains the balance between the number of model parameters and the capability to process multiple weather conditions simultaneously. Li et al. [28] proposed a neural architecture search (NAS) approach that introduces a dynamic branch for weather classification. The integration of ResNet-152 has facilitated the adaptive restoration of multiple weather conditions. Zhang et al. [29]’s RDN significantly enhances detail restoration capabilities through the implementation of dense connections and global feature fusion, while its lightweight design ensures compatibility with multi-tasking applications. Nevertheless, further optimization is required to improve its robustness in scenarios characterized by extreme weather degradation. Additionally, Wang et al. [30] developed a seven-category weather dataset based on ResNet50 to explore the joint modeling of multi-weather degradation effects, though limited data diversity constrains cross-scenario generalization capabilities. Batur et al. [31]’s DRDNet employs a breadth-first design that combines residual learning with dilated convolutions to enhance context modeling. Although it demonstrates commendable computational efficiency, challenges remain regarding mobile lightweight design and effective suppression of extreme noise.
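At the core of these ResNet-style restoration networks is the residual (identity-skip) block; a minimal PyTorch version is sketched below for illustration, without the dense connections or dilated convolutions of the specific methods above.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Identity-skip block: the convolutional layers learn only a correction on top of
    the input, which keeps gradients well-conditioned in very deep restoration networks."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)   # skip connection adds the learned correction
```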

2.2.5. Degraded Image Recovery Method Based on Transformer

Transformer-based degraded image restoration methods overcome the limitations of the local receptive fields inherent in convolutional neural networks by introducing a global self-attention mechanism, and they demonstrate notable advantages in modeling degradation under complex weather conditions. In the context of fog restoration, Song et al. [32] developed the DehazeFormer network architecture. By jointly optimizing the estimation of transmittance and atmospheric light parameters through a multi-scale window attention mechanism, they achieved state-of-the-art restoration performance on a dense fog dataset. Nevertheless, its large number of model parameters and computational requirements exceed the real-time processing capabilities of in-vehicle computing platforms. In response to this challenge, SMMT [33] innovatively integrates a bio-inspired spiking neuron approximation mechanism and employs the Sigmoid function to simulate spiking discharge characteristics, thereby demonstrating low-energy advantages in multimodal data denoising. While promising, its real-time inference capability still requires improvement. Zhu et al. [34] introduced MWFormer, which utilizes a dynamic routing mechanism to optimize the allocation of computational resources for multi-weather restoration tasks. They demonstrated a balance between computational efficiency and restoration accuracy using multi-weather datasets. Nevertheless, restoration errors are exacerbated in extreme mixed weather scenarios due to the coupling effects of various degradation types. To address domain generalization issues, Li et al. [35] developed the RestoreCUFormer network. This network employs contrastive loss to learn domain-invariant feature representations. While this method demonstrates stable performance in cross-domain restoration, its large number of model parameters and extended training time limit the efficiency of iterative optimization. Cheng et al. [36] proposed the innovative TransRAD architecture, which integrates radar point cloud and image data through a multimodal fusion mechanism to enhance target detection performance in foggy weather conditions, and validated it in such scenarios. Despite this advancement, robustness in extreme weather coupling scenarios remains suboptimal.
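The key ingredient shared by DehazeFormer-style models is self-attention computed within local windows rather than over the full image; the PyTorch sketch below illustrates the window partition, attention, and reverse partition steps. Dimensions, window size, and head count are arbitrary, relative position bias and window shifting are omitted, and spatial sizes are assumed divisible by the window size.

```python
import torch
import torch.nn as nn

class WindowSelfAttention(nn.Module):
    """Self-attention restricted to non-overlapping windows: global modeling within
    each window at a cost that grows with window size rather than image size."""
    def __init__(self, dim=64, window=8, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W), H and W divisible by window
        B, C, H, W = x.shape
        w = self.window
        # Partition the feature map into (H/w * W/w) windows of w*w tokens each
        tokens = (x.view(B, C, H // w, w, W // w, w)
                    .permute(0, 2, 4, 3, 5, 1)
                    .reshape(-1, w * w, C))
        attended, _ = self.attn(tokens, tokens, tokens)
        # Reverse the partition back to the original (B, C, H, W) layout
        out = (attended.reshape(B, H // w, W // w, w, w, C)
                       .permute(0, 5, 1, 3, 2, 4)
                       .reshape(B, C, H, W))
        return out
```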

2.2.6. Comparison and Analysis

In this section, deep learning-based methods for degraded image recovery are categorized into several approaches: image restoration methods utilizing convolutional neural networks (CNNs), image restoration methods employing generative adversarial networks (GANs), image restoration methods leveraging recurrent neural networks (RNNs), image restoration methods based on ResNet, and other image restoration methods grounded in Transformers. A detailed review of the research progress associated with each method is presented.
Unlike traditional methods, deep learning methods transcend the limitations imposed by the prior assumptions of conventional techniques through a data-driven paradigm. They have demonstrated significant advantages in the restoration of images affected by severe weather conditions. However, convolutional neural network (CNN)-based methods encounter challenges related to insufficient targeting in dynamic degradation modeling. For instance, HQS + CNN exhibits limited generalization capabilities when processing complex weather-induced degradations. DehazeNet exhibits limited capabilities in detail restoration due to its inadequate network depth. Nevertheless, its lightweight design and real-time processing advantages are noteworthy. In future developments, the introduction of dynamic weighting mechanisms and multi-scale feature fusion could significantly enhance adaptability to extreme weather conditions. Simultaneously, the incorporation of transfer learning can mitigate reliance on synthetic data. GAN-based methods such as Cycle-Dehaze and RI-GAN represent significant advancements in cross-domain generalization and cascade error suppression. While innovative, they remain computationally expensive and heavily dependent on synthetic data. In future research, we can investigate the deep integration of unsupervised learning with physical constraints to leverage generative models for improving adaptability in real-world scenarios. At the same time, we can optimize the design of the discriminator to balance generation quality and efficiency. RNN-based methods, such as RVRT and NLRN, have progressed in dynamic modeling and multi-scale feature fusion, yet computational complexity and limitations in optical flow accuracy pose challenges for practical applications. Future research may explore the integration of lightweight recurrent units and adaptive temporal constraints with neural radiance field (NeRF) technology to enhance temporal consistency in dynamic scenes. ResNet-based methods, such as NAS and RDN, have achieved multi-weather modeling and lightweight design, but a tradeoff exists between the number of parameters and robustness under extreme weather conditions. In future work, neural architecture search (NAS) can be employed to optimize network structures, while integrating residual learning with dynamic weather classification may enhance adaptability across multiple scenarios. Methods based on Transformers, such as DehazeFormer and MWFormer, demonstrate strong performance in complex weather modeling due to their global self-attention mechanisms. However, many model parameters and high computational costs limit their deployment in vehicular applications. Future research may explore sparse attention mechanisms and dynamic routing optimization to address these limitations. Additionally, the integration of multimodal data—such as radar point clouds and images—can enhance robustness under extreme weather conditions. Furthermore, combining diffusion models with self-supervised learning has the potential to effectively address domain generalization bottlenecks, thereby facilitating a deeper integration of physical interpretability with data-driven approaches.
In foggy conditions, the fusion method of Transformer and GAN demonstrates superior performance. For example, DehazeFormer jointly optimizes transmittance and atmospheric light through multi-scale window attention, achieving a PSNR of 27.51 dB and a SSIM of 0.9576 on high-density fog datasets. Its global modeling capability effectively mitigates artifacts in the sky region. RestoreCUFormer leverages contrastive loss to learn domain-invariant features, resulting in a 12% improvement in generalization compared to traditional CNNs in cross-domain fog scenes, with PSNR fluctuations maintained below 1.5 dB. Combining CNN with sparse prior methods yields optimal results in rainy weather. The local sparse characteristics of rain lines are well-aligned with the local feature extraction capabilities of CNN, and the integration of sparse priors enhances robustness against complex rain patterns. For example, JORDER effectively separates positional and intensity information through a binary rain line map, achieving a PSNR of 36.11 dB on the Rain100H dataset. The localization accuracy for dynamic rain lines is 18% higher compared to traditional sparse methods. The fusion of physical model-driven approaches with sparse priors proves to be more effective in snowy weather scenarios. The physically driven method effectively simulates the scattering and attenuation of light caused by snow particles, achieving a PSNR ranging from 28 to 30 dB in synthesized snow images. Especially in scenes characterized by uniform snowfall, atmospheric light estimation and transmittance optimization demonstrate efficacy in restoring scene structure. Nevertheless, these methods exhibit certain limitations; specifically, in scenarios involving dynamic snowfall or snow particle-obscured scenes, the physical model encounters difficulties in accurately capturing real-time changes in scattering parameters. This limitation ultimately results in texture distortion within the restored images.
Deep learning methods have surpassed traditional methods in high-resolution image restoration for single-weather scenarios, such as uniform fog and moderate rain. Significant challenges persist, however, under extreme weather conditions, as well as in multimodal fusion and edge deployment scenarios. Notably, Transformer-based methods have successfully addressed previous obstacles associated with modeling complex weather degradation and have demonstrated unique advantages in this domain. Their self-attention mechanism effectively models long-range dependencies among pixels, and cross-domain generalization challenges can be addressed through contrastive learning and domain-invariant feature extraction strategies. Additionally, the Transformer’s global modeling capability enhances the preservation of image structural details. Undoubtedly, deep learning-based methods for degraded image restoration have become mainstream with the continuous development of deep learning technology. These methods eliminate the need for manually designed features, thereby offering enhanced accuracy and robustness, particularly when applied to large datasets. Nevertheless, it is important to note that deep learning-based methods for degraded image restoration typically rely on high-quality datasets and achieve improved performance by increasing the number of network weight parameters. This raises the challenge of balancing accuracy and computational cost. Consequently, deep learning-based methods for degraded image restoration necessitate continuous improvement to meet real-world demands, particularly regarding dataset expansion and considerations related to computational expense.

3. Improving Detection Model Robustness

3.1. Traditional Methods for Improving Robustness

The traditional approach to enhancing the robustness of detection models involves integrating physical degradation models with image enhancement techniques. This integration aims to suppress noise within the feature space while enhancing target saliency. Currently, traditional methods for improving robustness can be divided into three fundamental levels: feature enhancement, parameter optimization, and interference suppression.
Based on both domestic and international research on traditional methods aimed at improving the robustness of detection models, the advantages and disadvantages can be summarized as follows: The advantages include the following: (1) strong physical interpretability; (2) real-time optimization capabilities, facilitated by hardware acceleration or lightweight algorithms to achieve efficient processing; (3) effective suppression of single interference. The disadvantages are as follows: (1) due to the reliance on artificial design and a priori knowledge, the generalization ability of the algorithms is limited, making it difficult to adapt to unknown scenarios; (2) the capability for suppressing mixed interference is weak, leading to degraded performance in cases involving multiple intertwined severe weather conditions.
The traditional enhancement methods predominantly encompass anisotropic diffusion filtering, CLAHE-based methods, multi-scale fusion, and other conventional enhancement methods. This section systematically categorizes and reviews these established methods aimed at improving robustness. Flowcharts illustrating robustness improvement strategies based on anisotropic diffusion filtering and CLAHE-based contrast enhancement are presented sequentially in Figure 6. The development history of traditional methods for bolstering robustness is depicted in Figure 7, while Table 3 summarizes the characteristics, advantages, and disadvantages of each method [37,38,39,40,41,42,43,44,45,46].

3.1.1. Anisotropic Diffusion Filtering-Based Approach

Noise diffusion is modeled through the use of partial differential equations, and the noise reduction process can be effectively constrained by formulating mathematical–physical equations. While suppressing noise diffusion, this method preserves the edge structure of the target image. This approach enables adaptive processing of local image features by designing anisotropic diffusion coefficients, thereby offering a distinct advantage in complex noise scenarios. K. S. Gautam et al. [37] proposed a defogging framework that strikes a balance between processing speed and quality by combining memory block strategies with lookup table replacement, albeit at the cost of hardware dependency and some loss of accuracy. Palanisamy V et al. [39] improved the Perona–Malik model. Liu et al. [40] introduced the OS-SART-ADF framework, which utilizes a synergistic mechanism combining 3D anisotropic diffusion and iterative filtering to reduce noise based on a physical model. Nevertheless, this framework exhibits suboptimal performance in scenarios characterized by extreme noise levels and lacks effective automatic parameter optimization.
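For illustration, the classical Perona–Malik scheme underlying these methods can be written in a few lines of NumPy: each iteration diffuses the image toward its four neighbors, with an edge-stopping conduction coefficient that suppresses smoothing across strong gradients. The iteration count, kappa, and step size below are arbitrary illustrative values.

```python
import numpy as np

def perona_malik(img, iterations=20, kappa=30.0, step=0.2):
    """Edge-stopping diffusion: smooth flat regions while the conduction coefficient
    g = exp(-(|grad| / kappa)^2) shuts diffusion down across strong edges."""
    u = img.astype(np.float64)
    for _ in range(iterations):
        # Finite-difference gradients toward the four neighbors
        north = np.roll(u, -1, axis=0) - u
        south = np.roll(u, 1, axis=0) - u
        east = np.roll(u, -1, axis=1) - u
        west = np.roll(u, 1, axis=1) - u
        # Conduction coefficients (exponential Perona-Malik variant)
        g = lambda d: np.exp(-(d / kappa) ** 2)
        u += step * (g(north) * north + g(south) * south +
                     g(east) * east + g(west) * west)
    return u
```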

3.1.2. Contrast Enhancement Method Based on CLAHE

The CLAHE technical paradigm effectively mitigates the effects of uneven lighting in complex scenes by dynamically adjusting contrast enhancement parameters. This method achieves adaptive contrast optimization through the analysis of local statistical information and is widely employed in low-light image enhancement. Z. Yuan et al. [41] proposed an adaptive parameter optimization framework suitable for detection tasks, which facilitates low-noise enhancement by dynamically balancing brightness channels. This method demonstrates high efficiency in real-time processing; however, it exhibits limited capabilities for cross-scene generalization. Chen et al. [42] introduced a guided image filtering mechanism and developed a joint CLAHE-GIF optimization model, which effectively balances contrast enhancement and noise suppression in hazy environments. This approach, however, is constrained by a limited dataset size and insufficient ability to characterize complex noise patterns. I. Lashkov et al. [43] have innovatively integrated dynamic CLAHE with dark channel prior information, resulting in the design of a lightweight and optimized network architecture that is well suited for edge devices. Nevertheless, there remains room for improvement in performance regarding detail recovery under extreme lighting conditions.
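A minimal usage example of CLAHE with OpenCV is shown below. Applying it to the luminance channel of the LAB color space is a common practice, not necessarily the exact pipeline of [41,42,43]; the clip limit caps per-tile histogram peaks, which restrains noise amplification.

```python
import cv2

def enhance_low_light(bgr, clip_limit=2.0, tile_grid=(8, 8)):
    """Apply CLAHE to the luminance channel of an 8-bit BGR image only,
    so that colors are not distorted."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    l_eq = clahe.apply(l)   # contrast-limited equalization per tile
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
```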

3.1.3. Other Traditional Enhancement Methods

F. Kou et al. [44] proposed a gradient-domain bootstrap filtering method that enhances parameter robustness and mitigates halo artifacts by employing multi-scale, edge-aware weights along with explicit first-order gradient constraints. Despite these advantages, the generalization ability of the technique is limited under extreme noise conditions. Z. Lu et al. [45] developed an effective bootstrap filter grounded in local variance regularization and content-adaptive amplification, which adeptly balances edge preservation and halo suppression, yielding excellent performance. While effective, improvements are still needed in its computational efficiency and adaptability to high-noise levels. Li et al. [46] have innovatively combined multi-channel gradient tensors with weighted adaptive filtering, leveraging an efficient iterative optimization framework to enhance the performance of real-time HDR imaging. Nevertheless, this method continues to encounter challenges in suppressing motion artifacts and complex noise in dynamic scenes.

3.1.4. Comparison and Analysis

In this section, traditional methods for improving detection robustness are categorized into three categories: anisotropic diffusion filtering-based methods, CLAHE-based methods, and other traditional enhancement methods. A comprehensive review of the research advancements associated with each category is presented.
From a technical perspective, the core contradiction inherent in traditional technology arises from the disparity between the “rigidity of artificial a priori assumptions” and the “dynamic changes in environmental diversity.” Anisotropic diffusion filtering methods, such as OS-SART-ADF, suppress noise diffusion via partial differential equations while maintaining high edge retention under uniform fog conditions. Nevertheless, the computational complexity associated with three-dimensional iterative solutions restricts their adaptability to dynamic environments. In contrast, methods based on CLAHE, exemplified by CLAHE-GIF, enhance detection accuracy in low-light scenarios through local contrast enhancement. Nonetheless, these methods depend on fixed parameter configurations, and the interplay between lighting and weather conditions in fog–rain-mixed scenes limits the achievable target detection rates. Looking ahead, integrating reinforcement learning could facilitate adaptive parameter optimization tailored for dynamic environments. Combining this approach with detection methodologies such as YOLO may pave the way for significant advancements in research development.
Traditional methods for enhancing the robustness of detection models exhibit limitations and present opportunities for innovation across various technical pathways. Future breakthroughs should prioritize the integration of physical modeling with data-driven approaches. Methods based on anisotropic diffusion filtering, such as the OS-SART-ADF framework, leverage physical models to facilitate noise suppression while providing advantages in balancing edge retention and noise diffusion. However, these methods exhibit poor adaptability in extreme noise scenarios and lack sufficient automation in parameter optimization. Future approaches could incorporate deep learning techniques to enable dynamic adjustment of diffusion coefficients, alongside three-dimensional scene prior information, to improve capabilities for suppressing complex noise. Concurrently, it is essential to optimize iterative solution efficiency to accommodate real-time processing requirements. Contrast enhancement methods grounded in CLAHE, exemplified by the CLAHE-GIF model, improve adaptability in low-light environments through dynamic parameter adjustment. Nevertheless, they suffer from weak cross-scene generalization and are limited by dataset size. In future research endeavors, self-supervised learning can be integrated to extract lighting-invariant features from unlabeled datasets. Additionally, dynamic threshold optimization may be integrated to bolster detail recovery under extreme lighting conditions while expanding real-world scene datasets will enhance representation capabilities for complex noise. Other traditional enhancement methods, such as gradient-domain bootstrap filtering, have progressed in parameter robustness and halo suppression; however, they still lack generalization when faced with high-noise scenes. Future research could introduce attention mechanisms that adaptively allocate multi-scale weights. Combining physically driven noise models could enhance interference resistance within dynamic scenes.
Combining anisotropic diffusion filtering with a physical model yields optimal performance in foggy scenes. The LUT replacement framework achieves an inference speed of 14.3 FPS under such conditions through hardware acceleration and memory partitioning strategies, while maintaining a mAP of 72–75%. The physically driven noise suppression and edge retention mechanisms are consistent with the uniform degradation characteristics observed in foggy conditions. By optimizing the design of the diffusion coefficient, the improved Perona–Malik model raises edge retention to 92% under high-density fog scenarios. Compared to traditional methods, it effectively reduces halo artifacts by 30%. In rainy weather scenarios, CLAHE combined with guided filtering methods performs better. CLAHE-GIF uses guided filtering to constrain the range of contrast enhancement, achieving a mAP of 68–70% in moderate rain events. This represents a notable improvement of 5–8% over the pure CLAHE method. It successfully balances brightness uniformization amidst rain line interference while preserving target features. The dynamic CLAHE fusion model, which utilizes a lightweight architecture, attains inference speeds of up to 20 FPS on edge devices. Furthermore, it improves the suppression rate of mixed low-light and rain line interference in rainy night scenes by 15%.
Traditional robust enhancement methods for detection models, while facing challenges such as a strong dependence on manual design and limited adaptability to dynamic environments, significantly outperform deep learning-based methods regarding computational cost and interpretability. In terms of accuracy, traditional methods can sometimes rival the performance of many deep learning methods. Certain robust enhancement methods may serve as optimization tools for achieving high accuracy in specific scenarios. However, given the complexity of outdoor environments and the presence of various random conditions in traffic, such as adverse weather, occlusions, and fluctuations in lighting, the effective utilization of traditional robustness enhancement methods to address these intricate scenarios and improve their generalization capabilities remains a significant challenge. In future research, it is imperative to overcome this bottleneck by combining traditional physical models with lightweight deep learning. For instance, the edge protection mechanism derived from anisotropic diffusion could be incorporated into the feature extraction layer of a CNN. Alternatively, employing neural architecture search could facilitate the automatic optimization of the parameter configuration strategy for CLAHE, thereby contributing to the development of a robust, interpretable, and environmentally adaptable detection system.

3.2. Deep Learning-Based Approach to Improve Robustness

Deep learning enables robust target detection under adverse weather conditions through end-to-end feature learning and multi-task optimization. Deep learning-based approaches to improving detection robustness typically encompass five key steps: data augmentation for domain adaptation, interference-resistant design of the model architecture, optimization of the training strategy, dynamic optimization during the inference phase, and edge deployment with acceleration. A minimal sketch of the first step, weather-oriented data augmentation, is given below.
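The snippet below synthesizes fog on a clean training image using the atmospheric scattering model I = J * t + A * (1 - t), with t = exp(-beta * d); the function name, parameter ranges, and the assumption that a depth map is available are illustrative and do not correspond to any specific augmentation pipeline from the reviewed works.

```python
import numpy as np

def add_synthetic_fog(img, depth, beta=1.0, airlight=0.9):
    """Fog augmentation following the atmospheric scattering model
    I = J * t + A * (1 - t), with t = exp(-beta * depth).

    img: clean image in [0, 1], shape (H, W, 3); depth: per-pixel depth
    normalised to [0, 1]; beta controls fog density, airlight the
    atmospheric light intensity.
    """
    t = np.exp(-beta * depth)[..., None]   # per-pixel transmission map
    return img * t + airlight * (1.0 - t)

# Example usage (hypothetical): random fog density per training sample.
# foggy = add_synthetic_fog(clean, depth_map, beta=np.random.uniform(0.5, 2.5))
```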
The deep learning approach to enhancing robustness presents several advantages: (1) it possesses a powerful capability for processing complex data, enabling the automatic extraction of robust features from multi-weather datasets without manual feature design; (2) it demonstrates dynamic adaptability and flexibility in real-time weather sensing and calibration; (3) it facilitates end-to-end optimization and closed-loop iteration, thereby exhibiting continuous learning capabilities. However, the method also has notable disadvantages: (1) it incurs high computational and deployment costs, with strategies such as adversarial training and multimodal fusion further increasing computational complexity; (2) the decision-making process of a deep learning model is often opaque, complicating efforts to verify its reliability in safety-critical scenarios such as autonomous driving.
This section classifies and summarizes improved methods based on the SSD, R-CNN, and YOLO series. Flowcharts illustrating the robustness improvements derived from R-CNN and YOLO are presented in Figure 8. The historical development of deep learning-based methods for robustness improvement is illustrated in Figure 9. Additionally, a summary of each method’s characteristics, advantages, and disadvantages is provided in Table 4 [47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63].

3.2.1. SSD-Based Detection Method Improvement

In recent years, improvements in SSD object detection algorithms have primarily centered on innovations in lightweight architecture and multi-scale feature fusion techniques. The team led by Zhang Hao [47] has enhanced the model’s ability to detect small objects by incorporating the ResNet50 backbone network and expanding the feature layers, all while preserving real-time performance and improving detection accuracy. Nevertheless, this model exhibits limitations in terms of hardware adaptability. Cheng Xiao et al. [48] implemented a deep separable convolution reconstruction network that significantly reduces the computational burden of the model. The detection performance on mainstream datasets is comparable to that of the original model, providing new insights for mobile deployment. Hu Yue et al. [49] proposed an improved solution that integrates the lightweight MobileNetV3 with a channel attention mechanism and a feature pyramid to tackle challenges associated with small object detection failures in complex indoor scenes. Nevertheless, further optimization is required to enhance adaptability in extreme occlusion scenarios.

3.2.2. Improvement of R-CNN-Based Detection Methods

The improved models based on R-CNN significantly enhance object detection performance in complex scenes through architectural optimization and the fusion of multi-scale features. Wang et al. [50] introduced a lightweight RegNet backbone network into Faster R-CNN, optimizing model efficiency by employing neural architecture search techniques; despite these improvements, the model’s generalization in extreme dynamic environments remains suboptimal. Rajarajeswari et al. [51] advanced the FPN structure of Mask R-CNN through multi-layer feature fusion, effectively enhancing segmentation accuracy for densely populated urban objects, although its adaptability to extreme weather conditions remains limited. Zhan’s [52] team proposed ECB-Mask R-CNN, which combines channel attention mechanisms and dynamic loss functions and achieves notable breakthroughs in small object detection tasks; this approach, however, remains highly dependent on computational resources. Collectively, these studies indicate that leveraging lightweight backbone networks and adaptive feature fusion strategies can balance detection accuracy and efficiency. Moving forward, enhancing scene generalization and optimizing computational cost will remain key research priorities.

3.2.3. Improvement of YOLO-Based Detection Methods

Lu Yanfeng et al. [53] proposed the Bi-STN-YOLO model, enhancing the geometric deformation modeling capability for non-rigid objects through an elastic deformation transformation network, though its computational complexity and dependence on data augmentation still require further optimization. H. Abbasi et al. [55] introduced an adversarial knowledge distillation framework in YOLOv3 to improve robustness under adverse weather conditions, yet an accurate weather classification mechanism remains to be established. Lu Yanfeng’s team [54] further developed CSIM by integrating illumination-invariant design with a cross-scale feature pyramid, significantly improving detection stability in scenes with varying lighting, albeit at a slight cost to computational efficiency. Li et al. [56] improved YOLOv7 by incorporating multimodal input (visible/infrared fusion) and a lightweight design for real-time detection under complex weather conditions, but the approach relies on high-quality sensor data. B. S. Pour’s [57] AL-YOLO balances computational efficiency and accuracy in a single extreme weather scenario through a lightweight architecture. In parallel, H. Gupta et al. [58] proposed a synthetic weather augmentation strategy for YOLOv5 that combines physical models with generative adversarial networks (GANs) to mitigate real-world data scarcity; however, the requirement for high-resolution training limits real-time deployment. P. Bharat Siva Varma’s [59] R-YOLO addresses the generalization bottleneck in multi-weather scenarios via a dual-stream architecture combining QTNet weather domain transformation and FCNet feature calibration; despite this advance, reliance on synthetic data and training complexity limits its application in extreme scenarios. Shveta’s [60] team optimizes YOLOv9 based on NAS and integrates multi-sensor fusion to reduce false detection rates in complex dynamic scenarios, though high computational costs hinder practical implementation. Ma et al. [61] combined YOLOX with a defogging module to construct an end-to-end weather adaptation framework using domain adaptation, yet its generalization deteriorates significantly in heavy rain. Yang’s [62] team fused frequency-domain and atmospheric-domain preprocessing in YOLOv11 + FFT + MF to enhance robustness under extreme weather conditions, at the cost of balancing computational resources against real-time performance. Finally, SUHD [63] achieved the first lossless conversion of spiking neural networks (SNNs) for object detection, using ultra-low time steps to reduce energy consumption; nevertheless, its adaptability to dynamic environments and its hardware compatibility still require further improvement.

3.2.4. Comparison and Analysis

In this section, we will categorize the methods for enhancing robustness through deep learning into three main groups: SSD-based improvement methods, R-CNN-based improvement methods, and YOLO-series-based improvement methods. Additionally, we will provide a detailed review of the research progress of each method.
Deep learning-based detection models demonstrate significant advantages as well as notable bottlenecks across the various technical approaches, and future development should prioritize overcoming core limitations and fostering cross-technology integration. Methods based on SSD, such as the improved SSD and M3-ECA-SSD, enhance small object detection through lightweight architectures and multi-scale feature fusion, yet they face insufficient hardware adaptability and limited generalization in extreme scenarios. Future improvements may introduce dynamic routing mechanisms to optimize the allocation of computational resources, combine neural architecture search (NAS) to achieve hardware-aware network compression, and expand real-world adverse weather datasets to improve adaptability in complex scenarios. Methods based on R-CNN, such as ECB-Mask R-CNN and improved Faster R-CNN, enhance accuracy through feature fusion and stronger backbone networks, yet they still depend heavily on computational resources and adapt poorly to extreme weather. Future research could explore sparse attention mechanisms to mitigate computational complexity; the integration of multimodal data, such as radar and visual inputs, could enhance feature complementarity under adverse weather; and self-supervised learning could improve cross-scenario generalization. Methods based on YOLO, such as Bi-STN-YOLO and CSIM, have made breakthroughs in dynamic modeling and real-time performance. Despite these advances, the tension between data dependency and computational efficiency remains significant. Future work may combine diffusion models to generate diverse adverse weather data while employing knowledge distillation and model quantization for lightweight deployment; physically driven domain adaptation mechanisms can also be introduced to mitigate domain shifts between synthetic data and real-world scenarios. Additionally, the integration of spiking neural networks (SNNs) with YOLO, as in SUHD, shows significant potential for low-power operation, although further optimization is required to improve adaptability to dynamic environments and hardware compatibility.
Under foggy conditions, the YOLO series and multimodal fusion methods demonstrate superior performance. Through its lightweight architecture, AL-YOLO achieves a mean Average Precision (mAP) of 50.1% and a frame rate of 114.94 FPS on the Foggy Cityscape dataset, and its dynamic feature calibration mechanism effectively mitigates light attenuation in fog. TransRAD integrates radar point clouds with image data, leveraging the penetrative capability of radar to supplement visual input; this yields a 4–6% improvement in mAP over single-modal methods in foggy scenarios and addresses the occlusion detection challenges inherent in purely visual methods. In rainy weather scenarios, SSD and domain-adaptive methods perform better. M3-ECA-SSD enhances detection through channel attention and feature pyramids, effectively addressing small object detection failures caused by rain-line occlusion in the ADE20k indoor dataset. YOLOX combined with a defogging module enables end-to-end domain adaptation on Foggy Cityscapes, yielding a 12% improvement in mAP over traditional methods; nevertheless, in heavy rain its generalization tends to decline due to inherent biases in the synthetic data. In snowy weather, R-CNN and cross-domain learning methods perform better. An improved Faster R-CNN using the RegNet backbone and neural architecture search achieved a mAP of 89.4% under the complex background conditions typical of snowy weather, with its multi-scale feature fusion mechanism adapting effectively to changes in snow particle size. Furthermore, the dual-stream architecture of R-YOLO, comprising QTNet domain transformation and FCNet feature calibration, addresses the domain shift bottleneck associated with snowy weather on the ACDC dataset, yielding better cross-scenario generalization than traditional methods.
Robustness improvement methods based on deep learning offer automatic feature learning, dynamic adaptability, and multimodal fusion capabilities for complex environmental perception, enabling high-precision detection in domains such as autonomous driving. Nevertheless, their drawbacks cannot be overlooked: key limitations include data dependency, domain gaps, significant computational overhead, and limited interpretability, which hinder deployment in resource-constrained scenarios. To meet real-world demands, deep learning-based robustness improvement methods require ongoing refinement; future breakthroughs must preserve the benefits of end-to-end learning while addressing these technical challenges through strategies such as multimodal fusion, physical knowledge embedding, and self-supervised learning.

4. Weather Public Datasets

Datasets are a critical requirement for the development of deep learning; no research can progress without adequate data support. In the domains of image restoration and robust detection, datasets play a central role in model training, enable the comparison and evaluation of different algorithms, and drive technological maturation in these fields. As deep learning applications in image restoration and robustness enhancement continue to expand, the construction of comprehensive weather datasets has become essential for effective model training. However, many researchers are reluctant to publicly share their weather datasets, which makes it difficult to compare the performance of different image restoration and robustness enhancement algorithms and hinders breakthroughs in unmanned visual perception. The limited availability of weather datasets also forces new researchers entering this field to invest considerable time and effort in constructing new datasets to ensure adequate training and testing of their models. This section therefore provides a detailed overview of 11 commonly used public meteorological datasets in Table 5 [64,65,66,67,68,69,70,71,72,73,74], including RESIDE, Foggy Cityscapes, O-HAZE, RainCityscapes, and Rain800, among others; download links are also provided to support researchers’ experimental and testing work.
With their multimodal data collection and diverse annotation systems, these public datasets provide critical benchmarking support for the development of autonomous driving perception algorithms under extreme weather conditions. RESIDE offers a large-scale image dataset of foggy scenarios, ranging from nearly transparent to heavily foggy conditions, generated from a physical scattering model; it serves as a benchmark for defogging algorithms but lacks the instance-level annotations needed for object detection tasks. Foggy Cityscapes enhances real-world scene data through parameterized fog concentration and provides pixel-level semantic segmentation and instance segmentation annotations, thereby supporting research on object detection and cross-domain transfer in fog. O-HAZE contributes multispectral real fog-field data, promoting the joint optimization of sensor fusion and defogging techniques. RainCityscapes synthesizes dynamic rain lines on video sequences derived from rainy-day data, supporting spatio-temporal perception and instance segmentation tasks. The clear images in Rain800 are randomly selected from the BSD and UCID datasets, and the artificially synthesized rain maps combined with them exhibit more complex and diverse rain-streak trajectories. Rain13K leverages hierarchical rainfall patterns to drive the end-to-end training of deraining models. RainDrop combines depth information with raindrop occlusion annotations to address dynamic occlusion. Among the snowy weather datasets, SnowyKITTI2012 simulates the attenuation of LiDAR point clouds under snowfall and supports the validation of multimodal detection robustness. Snow100K provides pixel-level snow masks through large-scale synthetic imagery, though its simplified snowflake motion trajectories leave a significant domain gap. BDD100K is a common benchmark with multi-weather imagery, despite relatively coarse weather labeling granularity. The KAIST multispectral dataset relies on visible–infrared paired data to enhance pedestrian detection under low-light and foggy conditions. Collectively, these datasets enable multidimensional algorithm validation through diverse annotation systems, yet synthetic-to-real domain gaps remain an active research challenge.

5. Performance Comparison and Analysis of Severe Weather-Sensing Algorithms

5.1. Comparison of Degraded Image Recovery Methods

5.1.1. Evaluation of Indicators

Research on degraded image recovery primarily employs PSNR, SSIM, and inference speed as evaluation metrics; in practical applications, even small improvements in these metrics can carry significant engineering and economic value.
(1)
PSNR (Peak Signal-to-Noise Ratio). PSNR quantifies the pixel-level fidelity between the reconstructed image and the reference clear image by computing the mean square error (MSE), and is expressed in decibels (dB). As a rough guide, values of 20–30 dB indicate clearly visible distortion, values of 30–40 dB indicate visual quality close to the original, and values exceeding 40 dB indicate professional-grade restoration.
(2)
SSIM (Structural Similarity Index) evaluates image similarity using three components: luminance, contrast, and structure. The index ranges from 0 to 1, with higher values indicating better perceived visual quality (a minimal computation sketch for PSNR and SSIM is given after this list).
(3)
Inference speed refers to the total time taken by the model from receiving an input image to producing the output recovery result. It serves as a fundamental indicator of the model’s real-time performance, typically expressed in terms of single-image processing time (ms/image or s/image) or frame rate (FPS).
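The following minimal sketch shows how PSNR and SSIM are typically computed in practice; the hand-written PSNR follows the standard 10·log10(MAX²/MSE) definition, and the scikit-image calls are one common implementation assumed to be available, not the evaluation code used by the reviewed papers.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def psnr(reference, restored, max_val=255.0):
    """PSNR = 10 * log10(MAX^2 / MSE), in dB; higher is better."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Equivalent library calls (reference/restored are uint8 RGB arrays of equal shape):
# psnr_db = peak_signal_noise_ratio(reference, restored, data_range=255)
# ssim    = structural_similarity(reference, restored, data_range=255, channel_axis=-1)
```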

5.1.2. Traditional Versus Deep Learning Methods

As the experimental results in Table 6 demonstrate, deep learning-based image restoration methods outperform traditional restoration methods in PSNR and SSIM when applied to large-scale datasets, with the JORDER- and NAS-based algorithms performing particularly well. Regarding PSNR, the methods proposed by Yang et al. [19] and Li et al. [28] achieved values exceeding 30 dB, indicating that most of the degraded image information has been restored and that the visual quality is close to that of the original image. In terms of SSIM, Yang et al. (2017) [19] and Q. Wu et al. (2023) [21] attained values of 0.95 or higher, at least 5% above the best traditional methods. Regarding inference speed, the MSDB-based model proposed by Q. Wu et al. [21] shows a clear advantage over traditional methods, and most other deep learning algorithms also infer faster. When trained on large datasets, deep learning methods exhibit stronger feature representation, a conclusion widely accepted across most visual tasks and equally relevant to degraded image restoration. However, the best metrics obtained on these datasets do not necessarily reflect performance in real-world applications; further research into the models’ robustness, generalization, and adaptability to unknown data therefore remains essential for complex real-world scenarios. Figure 10 and Figure 11 present radar charts illustrating the PSNR and SSIM values of the above methods, respectively.

5.2. Comparison of Detection Model Robustness

5.2.1. Evaluation Indicators

The robustness of a target detection model requires a careful and thorough examination of accuracy, stability, and inference efficiency, which are the fundamental constraints on industrial deployment. Research aimed at improving the robustness of detection models primarily uses mAP and real-time performance as evaluation metrics.
(1)
mAP (mean Average Precision): mAP is the core accuracy metric of target detection. It is obtained by averaging the per-category average precision computed under varying intersection-over-union (IoU) thresholds, and it quantifies the model’s robustness against variations in target scale, occlusion, and deformation. Smaller fluctuations in mAP across different conditions indicate a more stable detection model (a minimal sketch of the per-class AP computation is given after this list).
(2)
Real-time performance is a critical aspect of model inference, particularly in the context of processing individual images. This performance is typically quantified by frames per second (FPS), which indicates the number of image frames that can be processed per second. In autonomous driving scenarios, it is imperative to process continuous video streams in real-time; any delays may result in decision-making errors. Therefore, ensuring robust real-time performance is crucial for effective severe weather detection.
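For reference, the sketch below computes the per-class average precision underlying mAP at a fixed IoU threshold, using the common all-point interpolation of the precision-recall curve; the exact matching protocol (VOC- or COCO-style) varies across the reviewed works, so this is an illustrative baseline rather than their evaluation code.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """Average precision for one class at a fixed IoU threshold.

    scores: confidence of each detection; is_tp: 1 if the detection matched an
    unclaimed ground-truth box at that IoU threshold, else 0; num_gt: number of
    ground-truth boxes. mAP is the mean of this value over classes (and, for
    COCO-style protocols, over several IoU thresholds).
    """
    order = np.argsort(-np.asarray(scores, dtype=np.float64))
    tp = np.asarray(is_tp, dtype=np.float64)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    # Interpolated precision: make the curve monotonically non-increasing.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Area under the precision-recall curve (all-point interpolation).
    recall = np.concatenate(([0.0], recall))
    return float(np.sum((recall[1:] - recall[:-1]) * precision))
```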

5.2.2. Comparison of Robustness of Deep Learning-Based Detection Models

Table 7 presents two performance metrics for robustness enhancement methods applied to the SSD, R-CNN, and YOLO series across various deep learning datasets. In the SSD series, Hao Zhang et al. [47] enhanced feature extraction capabilities by leveraging the ResNet50 backbone network, achieving a mAP of 77.49% in general object detection tasks, while Cheng et al. [48] optimized the anchor box strategy specifically for the VOC2007 dataset, resulting in an improved mAP of 77.1%. Yue Hu et al. [49] proposed the M3-ECA-SSD model, which innovatively integrates multi-scale features with channel attention mechanisms and achieves a commendable balance between accuracy and speed on the ADE20k indoor scene dataset, significantly surpassing the performance of its two predecessors. In the realm of improved R-CNN models, Wang et al. [50]’s Faster R-CNN, built upon an improved RegNet backbone network, attains the highest accuracy in complex object detection tasks, and R. Rajarajeswari et al. [51]’s Mask R-CNN, featuring an advanced instance segmentation module, shows notable advantages in fine-grained detection on the CityScapes urban scene dataset. The YOLO series has emerged as the preferred choice due to its lightweight design: AL-YOLO, designed by B. S. Pour et al. [57], specifically targets foggy scenes while maintaining high mAP and elevated frame rates on the Foggy Cityscape dataset, demonstrating robustness in challenging environments. The comparative results indicate that the YOLO series achieves “dual metric” improvements under foggy and mixed weather conditions through scene customization, while the multimodal approach of R-CNN continues to serve as the accuracy benchmark in specific scenarios. Figure 12 presents the mAP radar chart corresponding to the above methods.

6. Challenges and Prospects

According to the existing research on unmanned visual perception under severe weather conditions, recent advancements in image recovery and robustness improvement methods have led to significant improvements across various evaluation indexes. Nevertheless, there remains considerable potential for further enhancement. Currently, several issues and challenges persist in the study of unmanned visual perception under adverse weather conditions. In light of these existing problems and challenges, we now put forward the following outlooks:
(1) Most existing datasets are either synthesized in laboratory settings or derived from limited scenarios, resulting in significant domain discrepancies when compared to complex real-world weather environments. The simplified physical models used for synthetic data lead to a marked decline in model generalization performance under real-world scenarios. Additionally, the costs associated with annotating real-world data are substantial, and reproducing specific weather conditions poses considerable challenges. Therefore, future research should consider exploring hybrid architectures that integrate physical degradation models with deep learning techniques. This approach would leverage the physical interpretability of the model to guide network optimization directions while simultaneously enhancing adaptability to complex scenarios through data-driven approaches.
(2) Existing methods are predominantly designed for individual degradation types, exhibiting limited modeling capabilities when addressing coupled degradation under mixed weather conditions. The temporal variability inherent in dynamic weather further complicates the challenges associated with real-time perception. Consequently, future efforts should focus on constructing a dynamic neural weather field as a unified modeling framework: an unsupervised domain adaptation network can be employed to establish a latent space mapping between synthetic data and real data, augmented by self-supervised temporal consistency constraints to formulate a Markov evolution chain of weather parameters. Additionally, adversarial domain blurring mechanisms and contrastive feature analysis strategies may be utilized to facilitate cross-modal degradation-invariant learning.
(3) High-precision models rely on substantial computational resources, which makes it difficult to meet the real-time requirements of vehicle platforms; conversely, lightweight models may suffer higher missed-detection rates due to feature compression under extreme weather conditions. In future work, we propose the development of a time-series-aware dynamic weather modeling network that integrates neural architecture search and model quantization techniques, aiming for the synergistic optimization of accuracy, efficiency, and memory. Furthermore, model distillation and hardware-aware compilation techniques can be investigated to support deployment on edge devices.
(4) Multimodal data fusion has not yet fully realized its potential. The heterogeneity among sensors in existing methods makes it difficult to align cross-modal features effectively; variations in physical characteristics and information granularity across modalities lead to feature conflicts, while hardware clock deviations and transmission delays cause temporal asynchrony, further exacerbating spatio-temporal mismatches in dynamic scenes. Moreover, the loss or degradation of modal data under extreme weather conditions can disrupt system design redundancy and diminish fusion performance. Therefore, future breakthroughs can be pursued through the following technical approaches. First, a cross-modal mapping matrix should be constructed based on an atmospheric scattering model combined with optical flow and inertial measurement units (IMUs), facilitating dynamic spatio-temporal calibration and addressing feature alignment and temporal mismatch. Second, a Bayesian neural network can be introduced to design a cross-modal attention-gating mechanism (a minimal sketch of such a gating mechanism is given after this list), and generative models such as diffusion-based restorers can reconstruct features when a modality is lost, thereby enhancing the robustness of the fusion architecture. Edge deployment can be optimized through block sparse quantization and spiking neural network (SNN) conversion techniques, while dynamically switching between multimodal branches according to weather conditions to balance computational efficiency and detection accuracy. To enhance the data foundation, we propose using dynamically synchronized weather annotations from millimeter-wave radar, cameras, and infrared sensors to establish a comprehensive testing platform, and providing standardized validation benchmarks for algorithms to facilitate the evolution of multimodal systems from static fusion towards autonomous cognition.
(5) Current research has yet to fully investigate the potential of emerging technologies such as diffusion models, self-supervised learning (SSL), and neural radiance fields (NeRF). These technologies open new avenues for visual perception under adverse weather. Diffusion models, in particular, can generate highly realistic adverse weather images through a progressive denoising mechanism, thereby effectively mitigating the domain discrepancy between synthetic data and real-world scenarios; future research could integrate them with detection networks in an end-to-end framework to improve mAP under complex weather conditions. Self-supervised learning can extract weather-invariant features from unlabeled adverse weather data through techniques such as contrastive learning or masked image modeling and, when combined with meta-learning mechanisms, can dynamically adapt to new weather types. Additionally, neural radiance fields (NeRF) can achieve physically accurate simulations of weather degradation by utilizing implicit 3D scene representations; combining dynamic NeRF techniques has the potential to reduce the false negative rate in temporal detection, and their integration with LiDAR point clouds can further enhance visual detection by leveraging scene structure priors, particularly under extreme weather conditions. The deep amalgamation of these emerging technologies with existing “reconstruction–detection” frameworks shows promise in addressing challenges such as dynamic weather coupling modeling and cross-domain generalization, thereby advancing autonomous driving visual systems towards all-weather robustness. The technical processes involved are illustrated in Figure 13.
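As a rough illustration of the attention-gating idea mentioned in point (4), the following PyTorch sketch fuses two aligned modality feature maps with a learned per-pixel gate; the module name, channel sizes, and sigmoid gating form are assumptions for illustration, not the Bayesian gating mechanism proposed above.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Minimal illustration of attention-gated fusion of two modality feature maps
    (e.g., camera and radar/infrared), with per-pixel gates that can down-weight a
    degraded modality. Channel sizes and the gating form are illustrative only."""

    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_cam, feat_aux):
        # g in (0, 1): how much to trust the camera branch at each location.
        g = self.gate(torch.cat([feat_cam, feat_aux], dim=1))
        return g * feat_cam + (1.0 - g) * feat_aux

# Example usage (hypothetical): fused = GatedFusion(256)(cam_feat, radar_feat),
# where both inputs are aligned feature maps of shape (N, 256, H, W).
```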

7. Conclusions

Unmanned visual target detection under adverse weather conditions represents one of the primary challenges in the realm of autonomous driving. With the continuous advancements in deep learning technology, significant progress has been made in developing methods for degraded image recovery and enhancing robustness. First, this paper systematically reviews the traditional techniques and deep learning methods aimed at enhancing the robustness of degraded image recovery and detection models. It provides a concise summary of the advantages and disadvantages associated with each type of method. Secondly, we introduce 11 public datasets along with their download links for testing purposes. In addition, three kinds of evaluation metrics for image recovery are discussed in detail, comparing traditional methods of degraded image recovery with deep learning approaches. Furthermore, two types of robustness evaluation indexes are elaborated upon, highlighting the comparative effectiveness of various deep learning strategies for improving detection model robustness. Finally, it synthesizes the existing problems and challenges associated with unmanned visual perception under severe weather conditions, while also making predictions and outlining prospects for future research directions. In conclusion, the field of unmanned visual target detection under severe weather conditions has gradually changed from traditional methodologies to mainstream approaches grounded in deep learning. Nevertheless, there remains a pressing need for further research into image degradation recovery methods and enhancements to the robustness of detection models. Addressing these deficiencies holds significant potential for advancing the field in future studies.

Author Contributions

Conceptualization, Y.S. and Y.L.; methodology, Y.S. and Y.L.; validation, Y.S. and Y.L.; investigation, Y.S.; resources, Y.S. and Y.L.; writing—original draft preparation, Y.S.; writing—review and editing, Y.S. and Y.L.; visualization, Y.S.; supervision, Y.L.; project administration, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (Grants XDA0450200, XDA0450202), the Fundamental Research Funds for the Central Universities at Beijing University of Chemical Technology (BH202530), and the Open Projects Program of the State Key Laboratory of Multimodal Artificial Intelligence Systems.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

DCP: Dark Channel Prior
PSNR: Peak Signal-to-Noise Ratio
NAS: Neural Architecture Search
GMM: Gaussian Mixture Model
CLAHE: Contrast Limited Adaptive Histogram Equalization

References

  1. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 1956–1963. [Google Scholar]
  2. Zhu, Q.; Mai, J.; Shao, L. A Fast Single Image Haze Removal Algorithm Using Color Attenuation Prior. IEEE Trans. Image Process. 2015, 24, 3522–3533. [Google Scholar] [PubMed]
  3. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. DehazeNet: An End-to-End System for Single Image Haze Removal. IEEE Trans. Image Process. 2016, 25, 5187–5198. [Google Scholar] [CrossRef] [PubMed]
  4. Fu, X.; Zeng, D.; Huang, Y.; Zhang, X.-P.; Ding, X. A Weighted Variational Model for Simultaneous Reflectance and Illumination Estimation. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2782–2790. [Google Scholar]
  5. Guo, X.; Li, Y.; Ling, H. LIME: Low-Light Image Enhancement via Illumination Map Estimation. IEEE Trans. Image Process. 2017, 26, 982–993. [Google Scholar] [CrossRef] [PubMed]
  6. Liu, R.; Ma, L.; Zhang, J.; Fan, X.; Luo, Z. Retinex-inspired Unrolling with Cooperative Prior Architecture Search for Low-light Image Enhancement. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10556–10565. [Google Scholar]
  7. Kang, L.-W.; Lin, C.-W.; Fu, Y.-H. Automatic Single-Image-Based Rain Streaks Removal via Image Decomposition. IEEE Trans. Image Process. 2012, 21, 1742–1755. [Google Scholar] [CrossRef]
  8. Li, Y.; Tan, R.T.; Guo, X.; Lu, J.; Brown, M.S. Rain Streak Removal Using Layer Priors. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2736–2744. [Google Scholar]
  9. Guo, X.; Fu, X.; Zha, Z.-J. Exploring Local Sparse Structure Prior for Image Deraining and Desnowing. IEEE Signal Process. Lett. 2025, 32, 406–410. [Google Scholar] [CrossRef]
  10. Berman, D.; Treibitz, T.; Avidan, S. Non-local Image Dehazing. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 1674–1682. [Google Scholar]
  11. Zhang, H.; Patel, V.M. Densely Connected Pyramid Dehazing Network. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3194–3203. [Google Scholar]
  12. Wu, M.; Jiang, A.; Chen, H.; Ye, J. Physical-prior-guided single image dehazing network via unpaired contrastive learning. Multimed. Syst. 2024, 30, 261. [Google Scholar] [CrossRef]
  13. Zhao, H.; Zhang, J.; Chen, Z.; Zhao, S.; Tao, D. UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 14781–14791. [Google Scholar]
  14. Sulami, M.; Glatzer, I.; Fattal, R.; Werman, M. Automatic recovery of the atmospheric light in hazy images. In Proceedings of the 2014 IEEE International Conference on Computational Photography (ICCP), Santa Clara, CA, USA, 2–4 May 2014; pp. 1–11. [Google Scholar]
  15. Zhao, C.; Liu, J.; Zhang, J. A Dual Model for Restoring Images Corrupted by Mixture of Additive and Multiplicative Noise. IEEE Access 2021, 9, 168869–168888. [Google Scholar] [CrossRef]
  16. Gao, W.; Zhua, J.; Hao, B. Group-based weighted nuclear norm minimization for Cauchy noise removal with TV regularization. Digit. Signal Process. 2025, 156, 104836. [Google Scholar] [CrossRef]
  17. Jobson, D.J.; Rahman, Z.; Woodell, G.A. Properties and performance of a center/surround retinex. IEEE Trans. Image Process. 1997, 6, 451–462. [Google Scholar] [CrossRef]
  18. Zhang, K.; Zuo, W.; Gu, S.; Zhang, L. Learning Deep CNN Denoiser Prior for Image Restoration. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2808–2817. [Google Scholar]
  19. Yang, W.; Tan, R.T.; Feng, J.; Liu, J.; Guo, Z.; Yan, S. Deep Joint Rain Detection and Removal from a Single Image. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  20. Tajane, K.; Rathkanthiwar, V.; Chava, G.; Dhavale, S.; Chawda, G.; Pitale, R. EffiConvRes: An Efficient Convolutional Neural Network with Residual Connections and Depthwise Convolutions. In Proceedings of the 2023 7th International Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune, India, 18–19 August 2023; pp. 1–5. [Google Scholar]
  21. Wu, Q.; Liu, J.; Feng, M. MSDB-based CNN architecture for image dehazing in driverless cars. In Proceedings of the 2023 IEEE 3rd International Conference on Power, Electronics and Computer Applications (ICPECA), Shenyang, China, 29–31 January 2023; pp. 789–794. [Google Scholar]
  22. Engin, D.; Genc, A.; Ekenel, H.K. Cycle-Dehaze: Enhanced CycleGAN for Single Image Dehazing. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 938–9388. [Google Scholar]
  23. Dudhane, A.; Aulakh, H.S.; Murala, S. RI-GAN: An End-To-End Network for Single Image Haze Removal. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; pp. 2014–2023. [Google Scholar]
  24. Huang, Y. ViT-R50 GAN: Vision Transformers Hybrid Model based Generative Adversarial Networks for Image Generation. In Proceedings of the 2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE), Guangzhou, China, 6–8 January 2023; pp. 590–593. [Google Scholar]
  25. Liu, D.; Wen, B.; Fan, Y.; Loy, C.C.; Huang, T.S. Non-local recurrent network for image restoration. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS’18), Montréal, QC, Canada, 3–8 December 2018; Curran Associates Inc.: Red Hook, NY, USA, 2018; pp. 1680–1689. [Google Scholar]
  26. Liang, J.; Fan, Y.; Xiang, X.; Ranjan, R.; Ilg, E.; Green, S.; Cao, J.; Zhang, K.; Timofte, R.; Van Gool, L. Recurrent Video Restoration Transformer with Guided Deformable Attention. In Proceedings of the Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
  27. Rota, C.; Buzzelli, M.; Bianco, S.; Schettini, R. A RNN for Temporal Consistency in Low-Light Videos Enhanced by Single-Frame Methods. IEEE Signal Process. Lett. 2024, 31, 2795–2799. [Google Scholar] [CrossRef]
  28. Li, R.; Tan, R.T.; Cheong, L.-F. All in One Bad Weather Removal Using Architectural Search. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 3172–3182. [Google Scholar]
  29. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual Dense Network for Image Restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 2480–2495. [Google Scholar] [CrossRef] [PubMed]
  30. Wang, J.; Zhou, Y.; Fu, M. Comprehensive Weather Recognition Using ResNet-50 and a Novel Weather Image Dataset. In Proceedings of the 2024 4th International Signal Processing, Communications and Engineering Management Conference (ISPCEM), Montreal, QC, Canada, 28–30 November 2024; pp. 455–459. [Google Scholar]
  31. Batool, I.; Imran, M. A dual residual dense network for image denoising. Eng. Appl. Artif. Intell. 2025, 147, 110275. [Google Scholar] [CrossRef]
  32. Song, Y.; He, Z.; Qian, H.; Du, X. Vision Transformers for Single Image Dehazing. IEEE Trans. Image Process. 2023, 32, 1927–1941. [Google Scholar] [CrossRef] [PubMed]
  33. Guo, L.; Lu, Y.; Qu, J.; Zheng, S.; Jiang, R.; Lu, Y. Transformer-based Spiking Neural Networks for Multimodal Audio-Visual Classification. IEEE Trans. Cogn. Dev. Syst. 2024, 16, 1077–1086. [Google Scholar] [CrossRef]
  34. Zhu, R.; Tu, Z.; Liu, J.; Bovik, A.C.; Fan, Y. MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers. IEEE Trans. Image Process. 2024, 33, 6790–6805. [Google Scholar] [CrossRef]
  35. Li, J.; Wang, Z.; Wan, J.; Si, H.; Wang, X.; Tan, G. RestoreCUFormer: Transformers to Make Strong Encoders via Two-stage Knowledge Learning For Multiple Adverse Weather Removal. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; pp. 1–8. [Google Scholar]
  36. Cheng, L.; Cao, S. TransRAD: Retentive Vision Transformer for Enhanced Radar Object Detection. IEEE Trans. Radar Syst. 2025, 3, 303–317. [Google Scholar] [CrossRef]
  37. Gautam, K.S.; Tripathi, A.K.; Rao, M.V.S. Vectorization and Optimization of Fog Removal Algorithm. In Proceedings of the 2016 IEEE 6th International Conference on Advanced Computing (IACC), Bhimavaram, India, 27–28 February 2016; pp. 362–367. [Google Scholar]
  38. Qi, H.; Li, F.; Chen, P.; Tan, S.; Luo, X.; Xie, T. Edge-preserving image restoration based on a weighted anisotropic diffusion model. Pattern Recognit. Lett. 2024, 184, 80–88. [Google Scholar] [CrossRef]
  39. Palanisamy, V.; Malarvel, M.; Thangakumar, J. Anisotropic Diffusion Method on Multiple Domain Noisy Images: A Recommendation. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–5. [Google Scholar]
  40. Liu, Y.; Zhu, T.-F.; Luo, Z.; Ouyang, X.-P. 3D robust anisotropic diffusion filtering algorithm for sparse view neutron computed tomography 3D image reconstruction. Nucl. Sci. Tech. 2024, 35, 50. [Google Scholar] [CrossRef]
  41. Yuan, Z.; Zeng, J.; Wei, Z.; Jin, L.; Zhao, S.; Liu, X.; Zhang, Y.; Zhou, G. CLAHE-Based Low-Light Image Enhancement for Robust Object Detection in Overhead Power Transmission System. IEEE Trans. Power Deliv. 2023, 38, 2240–2243. [Google Scholar] [CrossRef]
  42. Zhang, Y.; Wang, Z.; Wang, H.; Blaabjerg, F. Artificial Intelligence-Aided Thermal Model Considering Cross-Coupling Effects. IEEE Trans. Power Electron. 2020, 35, 9998–10002. [Google Scholar] [CrossRef]
  43. Lashkov, I.; Yuan, R.; Zhang, G. Edge-Computing-Facilitated Nighttime Vehicle Detection Investigations with CLAHE-Enhanced Images. IEEE Trans. Intell. Transp. Syst. 2023, 24, 13370–13383. [Google Scholar] [CrossRef]
  44. Kou, F.; Chen, W.; Wen, C.; Li, Z. Gradient Domain Guided Image Filtering. IEEE Trans. Image Process. 2015, 24, 4528–4539. [Google Scholar] [CrossRef] [PubMed]
  45. Lu, Z.; Long, B.; Li, K.; Lu, F. Effective Guided Image Filtering for Contrast Enhancement. IEEE Signal Process. Lett. 2018, 25, 1585–1589. [Google Scholar] [CrossRef]
  46. Li, J.; Wang, Y.; Chen, F.; Wang, Y.; Chen, Q.; Sui, X. Multi exposure fusion for high dynamic range imaging via multi-channel gradient tensor. Digit. Signal Process. 2025, 156, 104821. [Google Scholar] [CrossRef]
  47. Zhang, H.; Huang, W.; Qi, J. Design and implementation of object image detection interface system based on PyQt5 and improved SSD algorithm. In Proceedings of the 2022 IEEE 10th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 17–19 June 2022; pp. 2086–2090. [Google Scholar]
  48. Cheng, X.; Zhang, X.; Zhao, Z.; Huang, X.; Han, X.; Wu, X. An improved SSD target detection method based on deep separable convolution. In Proceedings of the 2024 6th International Conference on Internet of Things, Automation and Artificial Intelligence (IoTAAI), Guangzhou, China, 26–28 July 2024; pp. 92–96. [Google Scholar]
  49. Hu, Y.; Zhang, Q. Improved Small Target Detection Algorithm Based on SSD. In Proceedings of the 2024 5th International Conference on Computer Vision, Image and Deep Learning (CVIDL), Zhuhai, China, 19–21 April 2024; pp. 1421–1425. [Google Scholar]
  50. Wang, Z.; Cao, Y.; Li, J. A Detection Algorithm Based on Improved Faster R-CNN for Spacecraft Components. In Proceedings of the 2023 IEEE International Conference on Image Processing and Computer Applications (ICIPCA), Changchun, China, 11–13 August 2023; pp. 1–5. [Google Scholar]
  51. Rajarajeswari, R.; Sankaradass, V. Multi-Object Recognition and Segmentation using Enhanced Mask R-CNN for Intricate Image Scenes. In Proceedings of the 2023 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI), Chennai, India, 21–23 December 2023; pp. 1–6. [Google Scholar]
  52. Zhan, X.; Li, C. Study of Mask R-CNN in Target Detection Based on Improved Feature Pyramid Networks. In Proceedings of the 2024 6th International Conference on Frontier Technologies of Information and Computer (ICFTIC), Qingdao, China, 8–10 November 2024; pp. 846–850. [Google Scholar]
  53. Lu, Y.-F.; Yu, Q.; Gao, J.-W.; Li, Y.; Zou, J.-C.; Qiao, H. Cross Stage Partial Connections based Weighted Bi-directional Feature Pyramid and Enhanced Spatial Transformation Network for Robust Object Detection. Neurocomputing 2022, 513, 70–82. [Google Scholar] [CrossRef]
  54. Lu, Y.; Gao, J.; Yu, Q.; Li, Y.; Lv, Y.; Qiao, H. A Cross-Scale and Illumination Invariance-Based Model for Robust Object Detection in Traffic Surveillance Scenarios. IEEE Trans. Intell. Transp. Syst. 2023, 24, 6989–6999. [Google Scholar] [CrossRef]
  55. Abbasi, H.; Amini, M.; Yu, F.R. Fog-Aware Adaptive YOLO for Object Detection in Adverse Weather. In Proceedings of the 2023 IEEE Sensors Applications Symposium (SAS), Ottawa, ON, Canada, 18–20 July 2023; pp. 1–6. [Google Scholar]
  56. Li, Y.; Lu, Y.; Wu, K.; Fang, Y.; Zheng, C.; Zhang, J. Intelligent Inspection System for Power Insulators based on AAV on Complex Weather Conditions. IEEE Trans. Appl. Supercond. 2024, 34, 1–4. [Google Scholar] [CrossRef]
  57. Pour, B.S.; Jozani, H.M.; Shokouhi, S.B. AL-YOLO: Accurate and Lightweight Vehicle and Pedestrian Detector in Foggy Weather. In Proceedings of the 2024 14th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, 19–20 November 2024; pp. 131–136. [Google Scholar]
  58. Gupta, H.; Kotlyar, O.; Andreasson, H.; Lilienthal, A.J. Robust Object Detection in Challenging Weather Conditions. In Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024; pp. 7508–7517. [Google Scholar]
  59. Varma, P.B.S.; Adimoolam, P.; Marna, Y.L.; Vengala, A.; Sundar, V.S.D.; Kumar, M.V.T.R.P. Enhancing Robust Object Detection in Weather-Impacted Environments using Deep Learning Techniques. In Proceedings of the 2024 2nd International Conference on Self Sustainable Artificial Intelligence Systems (ICSSAS), Erode, India, 23–25 October 2024; pp. 599–604. [Google Scholar]
  60. Rattanpal, S.; Kashish, K.; Kumari, T.; Manvi; Gupta, S. Object Detection in Adverse Weather Conditions using Machine Learning. In Proceedings of the 2024 13th International Conference on System Modeling & Advancement in Research Trends (SMART), Moradabad, India, 6–7 December 2024; pp. 239–247. [Google Scholar]
  61. Ma, J.; Lin, M.; Zhou, G.; Jia, Z. Joint Image Restoration for Domain Adaptive Object Detection in Foggy Weather Condition. In Proceedings of the 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 27–30 October 2024; pp. 542–548. [Google Scholar]
  62. Yang, J.; Tian, T.; Liu, Y.; Li, C.; Wu, D.; Wang, L.; Wang, X. A Rainy Day Object Detection Method Based on YOLOv11 Combined with FFT and MF Model Fusion. In Proceedings of the 2024 International Conference on Advanced Control Systems and Automation Technologies (ACSAT), Nanjing, China, 15–17 November 2024; pp. 246–250. [Google Scholar]
  63. Qu, J.; Gao, Z.; Zhang, T.; Lu, Y.; Tang, H.; Qiao, H. Spiking Neural Network for Ultralow-Latency and High-Accurate Object Detection. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 4934–4946. [Google Scholar] [CrossRef]
  64. Li, B.; Ren, W.; Fu, D.; Tao, D.; Feng, D.; Zeng, W.; Wang, Z. Benchmarking Single-Image Dehazing and Beyond. IEEE Trans. Image Process. 2019, 28, 492–505. [Google Scholar] [CrossRef]
  65. Hahner, M.; Dai, D.; Sakaridis, C.; Zaech, J.-N.; Gool, L.V. Semantic Understanding of Foggy Scenes with Purely Synthetic Data. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; pp. 3675–3681. [Google Scholar]
  66. Ancuti, C.O.; Ancuti, C.; Timofte, R.; De Vleeschouwer, C. O-HAZE: A Dehazing Benchmark with Real Hazy and Haze-Free Outdoor Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 867–8678. [Google Scholar]
  67. Hu, X.; Fu, C.-W.; Zhu, L.; Heng, P.-A. Depth-Attentional Features for Single-Image Rain Removal. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8014–8023. [Google Scholar]
  68. Zhang, H.; Sindagi, V.; Patel, V.M. Image De-Raining Using a Conditional Generative Adversarial Network. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 3943–3956. [Google Scholar] [CrossRef]
  69. Jiang, K.; Wang, Z.; Yi, P.; Chen, C.; Huang, B.; Luo, Y.; Ma, J.; Jiang, J. Multi-Scale Progressive Fusion Network for Single Image Deraining. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8343–8352. [Google Scholar]
  70. Qian, R.; Tan, R.T.; Yang, W.; Su, J.; Liu, J. Attentive Generative Adversarial Network for Raindrop Removal from A Single Image. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2482–2491. [Google Scholar]
  71. Zhang, K.; Li, R.; Yu, Y.; Luo, W.; Li, C. Deep Dense Multi-Scale Network for Snow Removal Using Semantic and Depth Priors. IEEE Trans. Image Process. 2021, 30, 7419–7431. [Google Scholar] [CrossRef] [PubMed]
  72. Liu, Y.-F.; Jaw, D.-W.; Huang, S.-C.; Hwang, J.-N. DesnowNet: Context-Aware Deep Network for Snow Removal. IEEE Trans. Image Process. 2018, 27, 3064–3073. [Google Scholar] [CrossRef] [PubMed]
  73. Choi, Y.; Kim, N.; Hwang, S.; Park, K.; Yoon, J.S.; An, K.; Kweon, I.S. KAIST Multi-Spectral Day/Night Data Set for Autonomous and Assisted Driving. IEEE Trans. Intell. Transp. Syst. 2018, 19, 934–948. [Google Scholar] [CrossRef]
  74. Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Dataset for Heterogeneous Multitask Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2633–2642. [Google Scholar]
Figure 1. Comprehensive framework of “degradation recovery, robust detection, data benchmarking, and performance evaluation”.
Figure 2. Fog removal method based on dark channel prior (DCP) and flowchart of low-light enhancement based on Retinex theory.
Figure 3. Development of traditional degraded image recovery methods [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16].
Figure 4. Flowchart of image restoration methods based on CNN, GAN, and Transformers.
Figure 5. Development of deep learning-based methods for degraded image recovery [18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36].
Figure 6. Flowchart of methods for improving robustness based on anisotropic diffusion filtering and CLAHE-based contrast enhancement.
Figure 7. The evolution of traditional methods for improving the robustness of detection models [37,38,39,40,41,42,43,44,45,46].
Figure 8. Flowchart of methods for improving robustness based on R-CNN and YOLO.
Figure 9. Development of deep learning-based methods for robustness improvement of detection models [47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63].
Figure 10. PSNR radar chart based on the above methods.
Figure 11. SSIM value radar chart based on the above methods.
Figure 12. Radar chart of mAP values based on SSD, R-CNN, and YOLO.
Figure 13. Technical flowchart of diffusion models, SSL, and NeRF.
Table 1. Summary of traditional degraded image recovery methods.

| Categorization | Algorithmic Model | Reference | Dataset | Algorithm Characteristics | Summary of Advantages/Disadvantages |
|---|---|---|---|---|---|
| Dark channel prior-based defogging approach | DCP + physical scattering model | He et al. (2009) [1] | Training set: 500 images; test set: 100 images | Estimates transmittance and atmospheric light from statistical properties of dark channels in natural scenes, without training data | Strong physical interpretability and a significant defogging effect, but the sky region is prone to halos and computational complexity is high |
| | Color attenuation prior-simplified DCP | Zhu et al. (2015) [2] | Homemade synthetic data | Fast transmittance estimation based on the color attenuation prior, avoiding dark-channel computation | Fast and suitable for real-time processing, but recovery quality drops in dense-fog scenes |
| | DCP + convolutional network optimization | Cai et al. (2016) [3] | Dark-channel synthetic data | Combines the DCP prior with a shallow CNN to optimize transmittance estimation | Reduces manual parameter tuning and improves defogging efficiency, but relies on synthetic data and generalizes poorly to real scenes |
| Low-light enhancement based on Retinex theory | Variational Retinex optimization | Fu et al. (2016) [4] | Training set: 500 frames; test set: 100 frames | Joint optimization of illumination and reflectance components via a weighted variational model | Better global illumination consistency, but high computational complexity and an iterative solver is required |
| | LIME | Guo et al. (2017) [5] | Training set: 300 frames; test set: 50 frames | Fast enhancement of low-light regions based on a maximum prior over the illumination map | Strong real-time performance and significant detail enhancement, but noise amplification is a notable problem |
| | RUAS | R. Liu et al. (2021) [6] | (1) MIT-Adobe 5K; (2) LOL; (3) DarkFace; (4) ExtremelyDarkFace | First to combine Retinex-inspired unrolling optimization with NAS for lightweight, efficient end-to-end low-light enhancement | Efficient, lightweight, and robust, but in extreme-noise scenarios the NRM module may not remove complex noise completely |
| Rain/snow removal method based on sparse representation | Sparse representation + dictionary learning | L. W. Kang et al. (2012) [7] | Rain streaks added manually in Photoshop | First combination of MCA and a self-learned dictionary for single-image rain removal without external training data | Requires no temporal information and dictionary learning is based entirely on the input image itself, but computational complexity is high |
| | GMM-based image decomposition method | Y. Li et al. (2016) [8] | Synthetic data | Dual GMM priors with a joint optimization framework | Balances detail preservation and rain-streak suppression in single-image deraining, but dependence on local region selection limits practical application |
| | Local sparse structure prior + MGNet | Xin Guo et al. (2025) [9] | (1) Rain13K; (2) Snow100K; (3) RS; (4) SnowKitti2012 | Localized sparse structure prior with a two-stage mask-guided recovery network | The local sparse prior with adaptive mask extraction supports localization of multiple unknown rain and snow types, but heavy-rain/blizzard scenarios rely on neighborhood reconstruction |
| Physical model-based image restoration | Non-local dehazing | Berman et al. (2016) [10] | BSDS300 | Optimizes transmittance using superpixel segmentation and non-local similarity | Reduces sky artifacts but relies on superpixel segmentation accuracy |
| | DCPDN | H. Zhang et al. (2018) [11] | (1) TestA; (2) TestB | End-to-end joint optimization, multi-scale feature fusion, staged training strategy | Good multi-scale detail recovery, and the joint discriminator reduces color distortion on real images, but computational complexity is high and synthetic data are required |
| | PDUNet | Wu et al. (2024) [12] | (1) RESIDE; (2) RTTS; (3) SOTS-indoor, SOTS-outdoor; (4) Dense-Haze | Pre-training on synthetic data + fine-tuning on real data, with hybrid physical priors | Fuses physical priors with deep learning, but has a strong two-stage training dependency and high computational complexity |
| Other traditional recovery methods | Fog-concentration region growing + localized contrast stretching | Sulami et al. (2014) [14] | Training set: none; test set: 100 frames | Segments fog-concentration regions by seed-point growing with region-by-region adaptive enhancement | Handles non-uniform fog concentration with naturally transitioning results, but seed points must be initialized manually, so the degree of automation is low |
| | Adaptive variational modeling | C. Zhao et al. (2021) [15] | Training set: 5544 frames; test set: House, Lena | Adaptive noise detection; supports multiple regularization methods and adapts to different denoising needs | Superior denoising performance across multiple scenarios, but parameter-estimation accuracy is insufficient and computational complexity is high |
| | WNNM + TV | Wen Gao et al. (2025) [16] | 8 standard grayscale images | Combines TV and WNNM to balance local smoothness with global structural sparsity | High detail retention, but high computational complexity and memory consumption |
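To make the dark channel prior entries in Table 1 concrete, the sketch below outlines the classic single-image pipeline of He et al. [1]: compute the dark channel, estimate atmospheric light from its brightest pixels, derive the transmission map, and invert the scattering model. This is a minimal Python illustration with OpenCV and NumPy; the patch size, omega, and t0 values are common defaults rather than the settings of any surveyed implementation, the guided-filter refinement used in practice is omitted, and the file names are hypothetical.

```python
import cv2
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel minimum over the color channels, then a min filter over a local patch."""
    min_rgb = img.min(axis=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_rgb, kernel)

def estimate_atmospheric_light(img, dark, top_fraction=0.001):
    """Average input pixels with the largest dark-channel values (a common simplification)."""
    n = max(1, int(dark.size * top_fraction))
    idx = np.argsort(dark.ravel())[-n:]
    return img.reshape(-1, 3)[idx].mean(axis=0)

def dehaze(img_bgr, omega=0.95, t0=0.1, patch=15):
    """Minimal DCP dehazing: I = J*t + A*(1-t)  =>  J = (I - A)/t + A."""
    img = img_bgr.astype(np.float64) / 255.0
    dark = dark_channel(img, patch)
    A = estimate_atmospheric_light(img, dark)
    # Transmission from the dark channel of the image normalized by atmospheric light
    t = 1.0 - omega * dark_channel(img / A, patch)
    t = np.clip(t, t0, 1.0)[..., None]
    J = (img - A) / t + A
    return np.clip(J * 255.0, 0, 255).astype(np.uint8)

if __name__ == "__main__":
    hazy = cv2.imread("hazy.png")            # hypothetical input frame
    cv2.imwrite("dehazed.png", dehaze(hazy))
```

The clamp at t0 limits noise amplification where the estimated transmission is very small, which is also where the halo and sky artifacts noted in Table 1 tend to appear.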
Table 2. Summary of degraded image recovery methods based on deep learning.
| Categorization | Algorithmic Model | Reference | Dataset | Algorithm Characteristics | Summary of Advantages/Disadvantages |
|---|---|---|---|---|---|
| CNN-based degraded image recovery method | HQS + CNN | Zhang et al. (2017) [18] | (1) CBSD68; (2) BSD68 | Dynamic weights automatically differentiate between additive/multiplicative noise and noise levels without manual intervention | Computationally lightweight and supports real-time processing, but not specifically optimized for weather degradation, so the recovery effect is limited |
| | DehazeNet | Cai et al. (2016) [3] | (1) RESIDE; (2) HSTS | First end-to-end CNN defogging network combined with the scattering model and a density-aware loss function | Fast inference and a unified approach, but the network is shallow and detail recovery is insufficient |
| | JORDER | Yang et al. (2017) [19] | (1) Rain100L, Rain100H; (2) Rain12 | Introduces binary rain-streak maps that separate rain-streak position and intensity information, providing a stronger supervisory signal | Adaptable to complex scenarios, but improvements are needed to balance synthetic-data dependency and computational complexity |
| | EffiConvRes | K. Tajane et al. (2023) [20] | (1) CIFAR-10; (2) CIFAR-100 | Multi-architecture convergence, lightweight design, efficient training strategies | Balances high accuracy and efficiency with strong generalization capability |
| | MSDL | Q. Wu et al. (2023) [21] | (1) NYU2; (2) RESIDE | Joint parameter estimation in an end-to-end learning framework | Highly efficient defogging with optimized computational efficiency, but relies on synthetic training data and requires substantial GPU memory |
| GAN-based degraded image recovery method | Cycle-Dehaze | Engin et al. (2018) [22] | D-HAZY | Unsupervised training via a cycle-consistency loss with physical constraints to enhance interpretability | Needs no paired data and adapts to real scenes, but cyclic training is time-consuming and the generated images are prone to distortion |
| | RI-GAN | A. Dudhane et al. (2019) [23] | (1) NTIRE2019; (2) D-Hazy; (3) SOTS | Bypasses intermediate estimation of transmission maps and atmospheric light to generate fog-free images directly and reduce cascading errors | Strong detail retention and high color fidelity, but relies on synthetic training data and demands considerable computational resources |
| | ViT-based GAN | Y. Huang et al. (2023) [24] | (1) CIFAR-10; (2) CelebA; (3) LSUN Church | Global and local feature fusion with the introduction of StyleGAN2's nonlinear mapping network | Superior generation quality, and the hybrid discriminator design adapts to different generators, but computational resource requirements are high |
| RNN-based degraded image recovery method | NLRN | Liu et al. (2018) [25] | BSD | Combines non-local operations with an RNN; the same model supports denoising and super-resolution tasks | Strong resistance to degradation, but computational complexity and neighborhood tuning limit practical application |
| | RVRT | Liang et al. (2022) [26] | (1) REDS; (2) Vimeo-90K; (3) Vid4; (4) GoPro; (5) DAVIS | Balances model efficiency and performance; GDA supports multi-location dynamic aggregation | Superior performance, efficient design, and strong robustness, but computational complexity remains high |
| | RNN + bidirectional temporal constraints | C. Rota et al. (2022) [27] | (1) DID; (2) SDSD | Bidirectional temporal modeling; RAFT-based high-precision optical-flow estimation combined with masking to reduce ghosting artifacts | Strong fidelity and low computational overhead, but dependent on optical-flow accuracy |
| ResNet-based degraded image recovery method | NAS | Li et al. (2020) [28] | AllWeather | ResNet-152 backbone + dynamic weather-classification branch for adaptive recovery from different weather degradations | A single model handles multiple weather types and joint classification–recovery training improves generalization, but the parameter count is large and hardware acceleration is required for deployment |
| | RDN | Y. Zhang et al. (2021) [29] | DIV2K | Enhanced detail recovery through dense connectivity and global fusion to fully extract and fuse shallow-to-deep features | Lightweight design and multitask-compatible, but further optimization is needed for extreme degradation scenarios |
| | ResNet50 | Wang et al. (2024) [30] | 1759 images in 7 categories (training/test split: n/a) | Builds a customized dataset designed to identify seven different weather conditions while exploring various machine learning techniques | The dataset covers a wider variety of weather types and the best models were identified on it, but it currently contains indirect weather-information images for only one weather type (sunny) |
| | DRDNet | Isma Batool et al. (2025) [31] | (1) DIV2K; (2) Kodak24; (3) BSDS300 | Combines residual learning, dense connectivity, dilated convolution, and BRN for enhanced feature reuse and contextual modeling | High-performance denoising with high computational efficiency, but still needs optimization for extreme noise and mobile deployment |
| Transformer-based degraded image recovery method | DehazeFormer | Song et al. (2023) [32] | Training set: 10,000 frames; test set: 2000 frames | Captures local fog-concentration differences with multi-scale window attention to jointly optimize transmittance and atmospheric light | Better detail recovery than CNNs in dense-fog areas and suitable for high-resolution in-vehicle cameras, but the parameter count is large and real-time performance is limited |
| | SMMT | Y. Lu et al. (2024) [33] | (1) CIFAR10-AV; (2) UrbanSound8K-AV; (3) N-TIDIGIT&MNIST-DVS | Multimodal fusion advantage and bio-inspired design; approximates the gradient of spiking neurons with a sigmoid function to support end-to-end training | High accuracy, low energy consumption, and multimodal complementarity, but with some computational complexity |
| | MWFormer | Zhu et al. (2024) [34] | (1) RainDrop; (2) Snow100K | Addresses multi-weather degradation with a single unified architecture | Outperforms previously known multi-weather recovery models without much computational effort, but still has room for improvement |
| | RestoreCUFormer | Li et al. (2024) [35] | RESIDE | Two-stage knowledge learning, joint knowledge transfer, and multi-contrast learning | Proposes a unified architecture with pre-trained weights to eliminate the negative effects of severe weather, but the architecture contains many parameters |
| | TransRAD | Cheng et al. (2025) [36] | RADDet | Combines attention mechanisms, multi-scale feature fusion, and a task-decoupled design for RMT | Provides modules customized to radar data characteristics, but utility in more complex scenarios and on other hardware platforms remains to be validated |
n/a is an abbreviation for “Not applicable”.
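Most CNN-based restorers in Table 2 are trained end-to-end on paired degraded/clean images with a pixel-wise reconstruction loss. The snippet below is a deliberately tiny PyTorch sketch of that training pattern, using a three-layer residual network and an L1 loss; the architecture, layer widths, and the random tensors standing in for a paired dataset (e.g., RESIDE-style hazy/clean pairs) are illustrative assumptions, not the design of any specific method in the table.

```python
import torch
import torch.nn as nn

class TinyRestorer(nn.Module):
    """Illustrative residual CNN: predicts a correction that is added back to the input."""
    def __init__(self, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, 3, 3, padding=1),
        )

    def forward(self, degraded):
        return torch.clamp(degraded + self.body(degraded), 0.0, 1.0)

def train_step(model, optimizer, degraded, clean):
    """One supervised step on a paired (degraded, clean) batch with an L1 loss."""
    optimizer.zero_grad()
    loss = nn.functional.l1_loss(model(degraded), clean)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = TinyRestorer()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Hypothetical paired batch standing in for a real hazy/clean dataset
    degraded = torch.rand(4, 3, 64, 64)
    clean = torch.rand(4, 3, 64, 64)
    print(train_step(model, opt, degraded, clean))
```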
Table 3. Summary of traditional methods for improving robustness.
| Categorization | Algorithmic Model | Reference | Dataset | Algorithm Characteristics | Summary of Advantages/Disadvantages |
|---|---|---|---|---|---|
| Anisotropic diffusion filtering-based approach | Anisotropic diffusion-based defogging framework | K. S. Gautam et al. (2016) [37] | Customized test images | LUT replacement and a memory-chunking strategy balance speed and quality | Efficient real-time processing, but accuracy loss and hardware dependency limit its use in general scenarios |
| | Weighted anisotropic diffusion | Huiqing Qi et al. (2024) [38] | Set12 | Edge preservation and artifact suppression with adaptive parameter optimization | Multi-scale feature fusion reduces staircase effects in uniform regions, but complex noise scenes and real-time deployment remain limitations |
| | Improved Perona–Malik model | Palanisamy V et al. (2024) [39] | Custom images | Improved Perona–Malik gradient calculation and diffusion-coefficient design | Efficient denoising without training, but limited by noise-type homogeneity and dependence on parameter tuning |
| | OS-SART-ADF | Liu et al. (2024) [40] | 3D Shepp–Logan | Three-dimensional anisotropic diffusion optimization with an iterative-filtering synergy mechanism | Compatible with physical-model-driven noise-suppression strategies, but extreme noise handling and automated parameter optimization leave room for improvement |
| CLAHE-based detection framework | ACL-CLAHE | Z. Yuan et al. (2019) [41] | Captured images | Optimizes CLAHE via the luminance channel for efficient low-noise enhancement | Detection-task-driven parameter optimization and real-time processing are strong, but generalizability still has room for improvement |
| | CLAHE-GIF | Chen et al. (2020) [42] | Self-built haze image set | Balances contrast restoration and noise suppression in single-image defogging | Simple pipeline with good edge retention, but limited by dataset size |
| | ACL-CLAHE enhanced model | I. Lashkov et al. (2023) [43] | Self-built datasets | Dynamic CLAHE fused with dark-channel defogging | Lightweight design with edge-device adaptability, but adaptability to extreme scenarios still needs improvement |
| Other traditional enhancement methods | GGIF | F. Kou et al. (2015) [44] | (1) Kodak; (2) ASD | Multi-scale, edge-aware weights with explicit first-order gradient constraints | Improved parameter robustness and halo suppression for image preprocessing in bad weather, but limited in extreme-noise scenarios |
| | EGIF | Z. Lu et al. (2018) [45] | (1) Kodak; (2) composite images | Balances edge preservation and noise suppression through local-variance regularization and content-adaptive amplification | Significant halo suppression and good noise control, but optimization for extreme noise and computational efficiency needs improvement |
| | MGTF | Li et al. (2025) [46] | 30 sets of multi-exposure images | Innovative combination of a multi-channel gradient tensor and WAGGF filtering | Efficiency and strong metrics make it suitable for real-time HDR imaging, but adaptability to dynamic scenes and noise still needs improvement |
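The two preprocessing families in Table 3 can be combined into a simple robustness-oriented pipeline: a few Perona–Malik diffusion iterations suppress noise while preserving edges, and CLAHE applied to the luminance channel restores local contrast, similar in spirit to the ACL-CLAHE line of work [41,43]. The sketch below assumes 8-bit OpenCV images; the iteration count, conductance parameter kappa, clip limit, and tile size are generic defaults rather than the tuned values of the cited methods, and image boundaries wrap for brevity.

```python
import cv2
import numpy as np

def perona_malik(gray, iterations=10, kappa=30.0, step=0.2):
    """Perona–Malik diffusion with exponential conductance g = exp(-(|grad I|/kappa)^2)."""
    img = gray.astype(np.float64)
    for _ in range(iterations):
        # Differences to the four neighbours (boundaries wrap for brevity)
        north = np.roll(img, -1, axis=0) - img
        south = np.roll(img, 1, axis=0) - img
        east = np.roll(img, -1, axis=1) - img
        west = np.roll(img, 1, axis=1) - img
        img += step * sum(np.exp(-(d / kappa) ** 2) * d
                          for d in (north, south, east, west))
    return np.clip(img, 0, 255).astype(np.uint8)

def clahe_luminance(bgr, clip_limit=2.0, tile=(8, 8)):
    """Apply CLAHE to the L channel only, leaving chroma untouched."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    lab[..., 0] = clahe.apply(lab[..., 0])
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

if __name__ == "__main__":
    frame = cv2.imread("foggy_frame.png")                        # hypothetical input
    smoothed = perona_malik(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    enhanced = clahe_luminance(frame)
    cv2.imwrite("preprocessed.png", enhanced)
```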
Table 4. Summary of deep learning-based methods to improve robustness.
| Categorization | Algorithmic Model | Reference | Dataset | Algorithm Characteristics | Summary of Advantages/Disadvantages |
|---|---|---|---|---|---|
| SSD-based enhancement methods | Improved SSD | Hao Zhang et al. (2022) [47] | Mixed datasets | Optimizes small-target detection, balancing real-time performance and accuracy | Multi-scale feature fusion with efficient interactive interfaces, but computational complexity and hardware dependency still need optimization |
| | Improved SSD | X. Cheng et al. (2024) [48] | PASCAL VOC | Lightweight design, efficient downsampling strategy, balanced accuracy and speed | Extremely low computational resource requirements and high detection accuracy, but generalization to complex scenes needs further verification |
| | M3-ECA-SSD model | Yue Hu et al. (2024) [49] | ADE20k indoor scene dataset | Enhances detection accuracy and real-time performance through a lightweight backbone and multi-scale feature fusion | Balances accuracy and speed, but cross-scene generalization needs further validation |
| R-CNN-based enhancement methods | Improved Faster R-CNN model | Wang et al. (2023) [50] | Low-Earth-orbit spacecraft dataset | Significantly improves detection accuracy by using RegNet as the backbone network | Efficient NAS optimization and lightweight design, but dependent on computational resources |
| | Improved Mask R-CNN model | R. Rajarajeswari et al. (2023) [51] | CityScapes | Improved FPN structure supports multi-scale feature fusion to handle dense targets effectively | Adapts to complex scenes with high-precision segmentation, but computational cost is high |
| | ECB-Mask R-CNN | X. Zhan et al. (2024) [52] | COCO 2017 | Multi-scale feature-fusion enhancement, hybrid convolution strategy, dynamic loss optimization | Dynamic feature fusion and training optimization, but adaptation to extreme scenes still needs improvement |
| YOLO-based enhancement methods | Bi-STN-YOLO | Yan-Feng Lu et al. (2022) [53] | (1) Homemade non-rigid object dataset; (2) DeepFashion2; (3) PASCAL VOC 2012; (4) MS COCO | Multi-scale feature-fusion optimization with high adaptability to spatial deformation; lightweight and efficient | The ESTN module copes effectively with geometric deformation and performs stably in extreme deformation scenarios, but computational complexity is slightly higher |
| | YOLOv3 | H. Abbasi et al. (2023) [54] | Pascal VOC | A knowledge-distillation framework to enhance the resilience of computer vision systems under adversarial conditions | Enhances the resilience and robustness of the object detection system, but requires simulation algorithms that accurately represent unfavorable conditions and a mechanism to classify input images into specific severe-weather conditions |
| | CSIM | Abbasi et al. (2023) [55] | (1) BDD100k; (2) UA-DETRAC | Cross-scale feature-fusion optimization, illumination-invariant design, attention-mechanism enhancement | Excellent performance in scenes with drastic lighting changes and large-scale deformation, but slightly reduced computational efficiency |
| | Improved YOLOv7 target detection model | Yi Li et al. (2024) [56] | Self-built datasets | Supports visible, infrared, and fused image inputs, adapting to complex lighting and weather conditions | Stable detection under low light, fog, etc.; lightweight and efficient, but relies on high-quality data and hardware support |
| | AL-YOLO | B. S. Pour et al. (2024) [57] | Foggy Cityscape | Proposes a lightweight yet accurate object detection model | Achieves better feature extraction and detection accuracy while remaining computationally efficient for a single severe-weather type |
| | YOLOv5 | H. Gupta et al. (2024) [58] | (1) BDD100k; (2) DAWN | A synthetic weather-augmentation strategy that includes physics-based, GAN-based, and style-transfer methods | Efficient training driven by real data and multidimensional performance metrics, but high-resolution images and large-volume training impose high hardware requirements, which may limit real-time application |
| | YOLO | P. Bharat Siva Varma et al. (2024) [59] | (1) ACDC; (2) DAWN; (3) Blurred images generated by QTNet | Weather-domain transformation by QTNet and feature calibration by FCNet | Significantly improves target detection in bad weather while maintaining real-time performance and compatibility, but synthetic-data dependence and training complexity may limit use in extreme scenarios |
| | YOLOv9 | Shveta et al. (2024) [60] | Argoverse | NAS-optimized architecture, attention mechanisms, and multi-sensor fusion | Multi-sensor fusion reduces the false-detection rate and provides fast inference, but reliance on synthetic data and high computational costs limit practical application |
| | YOLOX | Ma et al. (2024) [61] | (1) Foggy Cityscapes; (2) RTTS | An end-to-end domain-adaptive framework that avoids manual tuning of the defogging module and its hyperparameters | Outperforms traditional domain-adaptive methods, but relies on synthetic fog data for training, and generalization to extreme weather (e.g., heavy rain) remains limited |
| | YOLOv11 + FFT + MF | Yang et al. (2024) [62] | Customized simulation datasets | Fuses frequency/spatial-domain preprocessing with dynamic model weighting | Improved detection robustness in extreme weather, but real-time performance must be traded off against computational resources |
| | SUHD | Jinye Qu et al. (2025) [63] | (1) PASCAL VOC; (2) MS COCO; (3) Datasets A and B | Ultra-low latency, high precision, and lossless structural conversion | Solves the high-latency and low-accuracy problems of SNNs in target detection, but adaptation to dynamic environments still needs improvement |
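Several detectors in Table 4 gain robustness by training on synthetically degraded images, for example the physics-based branch of the augmentation strategy in [58]. One common way to generate such data is to apply the atmospheric scattering model I = J·t + A·(1 − t) with t = exp(−β·d) to clear-weather frames. The sketch below assumes no depth map is available and uses a crude per-row depth proxy for road scenes; the β values, airlight constant, and file names are illustrative, not the settings of any surveyed method.

```python
import numpy as np
import cv2

def synthesize_fog(clear_bgr, beta=1.5, airlight=0.9):
    """Apply the scattering model I = J*t + A*(1-t) with t = exp(-beta*d).
    A rough per-row depth proxy (farther toward the top of a road image)
    stands in for a real depth map."""
    img = clear_bgr.astype(np.float64) / 255.0
    h, w = img.shape[:2]
    depth = np.linspace(1.0, 0.1, h)[:, None].repeat(w, axis=1)
    t = np.exp(-beta * depth)[..., None]
    foggy = img * t + airlight * (1.0 - t)
    return (np.clip(foggy, 0, 1) * 255).astype(np.uint8)

if __name__ == "__main__":
    clear = cv2.imread("clear_road.png")          # hypothetical clear-weather frame
    for i, beta in enumerate((0.5, 1.0, 2.0)):    # vary fog density for augmentation
        cv2.imwrite(f"foggy_{i}.png", synthesize_fog(clear, beta=beta))
```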
Table 5. Commonly used weather public datasets.
| Name | Type of Weather | Total Number of Images | Type of Labeling | Year | Reference | Download Address |
|---|---|---|---|---|---|---|
| RESIDE | fog/haze | 10,000+ | Fog-free–foggy image pairs | 2019 | Li et al. [64] | http://sites.google.com/view/reside-dehaze-datasets (all accessed on 10 April 2025) |
| Foggy Cityscapes | fog/haze | 15,000 | Semantic segmentation, instance segmentation | 2019 | Sakaridis et al. [65] | https://www.cityscapes-dataset.com/downloads/ |
| O-HAZE | fog/haze | 90 | Fog-free–foggy image pairs | 2018 | Ancuti et al. [66] | https://data.vision.ee.ethz.ch/cvl/ntire18//o-haze/ |
| RainCityscapes | rain | 11,400 | Target detection, instance segmentation | 2019 | Hu et al. [67] | https://www.cityscapes-dataset.com/dataset-overview/ |
| Rain800 | rain | 800 | Rain-free–rainy image pairs | 2019 | Yang et al. [68] | https://github.com/hezhangsprinter/ID-CGAN |
| Rain13K | rain | 18,010 | Rain-free–rainy image pairs | 2020 | Fu et al. [69] | https://github.com/kuijiang94/MSPFN |
| RainDrop | rain | 1119 | Droplet masks | 2018 | Rui et al. [70] | https://github.com/rui1996/DeRaindrop |
| SnowyKITTI2012 | snow | 7488 | 3D target detection, depth estimation | 2021 | Zhang et al. [71] | https://github.com/HDCVLab/Deep-Dense-Multi-scale-Network-for-Snow-Removal/blob/main/kitti.txt |
| Snow-100K | snow | 100,000+ | Snow-free–snowy image pairs | 2018 | Liu et al. [72] | https://github.com/HDCVLab/Deep-Dense-Multi-scale-Network-for-Snow-Removal/blob/main/snow100k.txt |
| KAIST Multispectral | multi-weather | 95,000+ | Pedestrian detection, target tracking | 2018 | Hwang et al. [73] | https://github.com/yuanmaoxun/Awesome-RGBT-Fusion?tab=readme-ov-file#Multispectral-Pedestrian-Detection |
| BDD100K | all-weather | 100,000+ | Target detection, semantic segmentation, lane detection | 2020 | Yu et al. [74] | http://bdd-data.berkeley.edu |
Table 6. Traditional versus deep learning methods.
| Categorization | Algorithmic Model | Reference | Dataset | PSNR (dB) | SSIM | Inference Speed |
|---|---|---|---|---|---|---|
| Traditional methods | DCP + physical scattering model | He et al. (2009) [1] | Composite fog map | 23.1–26.5 | 0.760 | 96.25 FPS |
| | Color attenuation prior-simplified DCP | Zhu et al. (2015) [2] | Homemade synthetic data | 28.3 | 0.89 | 14.3 FPS |
| | RUAS | R. Liu et al. (2021) [6] | MIT-Adobe 5K | 24.78 | 0.89 | 120 FPS |
| Deep learning | JORDER | Yang et al. (2017) [19] | Rain100L | 36.11 | 0.97 | 0.1 FPS |
| | MSDL | Q. Wu et al. (2023) [21] | RESIDE | 27.51 | 0.9576 | 20 FPS |
| | RI-GAN | A. Dudhane et al. (2019) [23] | SOTS | 19.828 | 0.85 | -- |
| | NAS | Li et al. (2020) [28] | Raindrop | 30.12 | 0.9268 | -- |
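For reference, the PSNR values in Table 6 follow PSNR = 10·log10(MAX²/MSE) with MAX = 255 for 8-bit images, and SSIM combines local luminance, contrast, and structure statistics. The snippet below computes both metrics for a restored/reference pair using scikit-image's implementations; the file names are hypothetical, and real benchmark protocols (e.g., evaluating on the luminance channel only) may differ.

```python
import cv2
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(restored_path, reference_path):
    """PSNR (dB) and SSIM between a restored image and its clean reference."""
    restored = cv2.imread(restored_path)
    reference = cv2.imread(reference_path)
    psnr = peak_signal_noise_ratio(reference, restored, data_range=255)
    ssim = structural_similarity(reference, restored, channel_axis=2, data_range=255)
    return psnr, ssim

if __name__ == "__main__":
    # Hypothetical file names; in practice these come from a benchmark such as SOTS
    psnr, ssim = evaluate_pair("dehazed.png", "ground_truth.png")
    print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")
```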
Table 7. Comparison of robustness of deep learning-based detection models.
| Series | Model Name | Reference | Dataset | mAP (%) | Real-Time Performance (FPS) |
|---|---|---|---|---|---|
| SSD series | Improved SSD | Hao Zhang et al. (2022) [47] | ResNet50 | 77.49 | -- |
| | Improved SSD | X. Cheng et al. (2024) [48] | VOC2007 | 77.1 | -- |
| | M3-ECA-SSD | Yue Hu et al. (2024) [49] | ADE20k | 82.75 | 7.78 |
| R-CNN series | Improved Faster R-CNN model | Wang et al. (2023) [50] | RegNet | 89.4 | -- |
| | Improved Mask R-CNN model | R. Rajarajeswari et al. (2023) [51] | CityScapes | 67.92 | -- |
| YOLO series | CSIM | Abbasi et al. (2023) [55] | UA-DETRAC | 64.47 | 51.1 |
| | AL-YOLO | B. S. Pour et al. (2024) [57] | Foggy Cityscape | 50.11 | 14.94 |
| | YOLOv11 + FFT + MF | Yang, J. et al. (2024) [62] | Customized simulation datasets | 65.2 | -- |
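The mAP figures in Table 7 are derived from precision-recall curves built by matching detections to ground truth at an IoU threshold and then averaging the per-class average precision. The sketch below shows the two core ingredients for a single class: box IoU and all-point-interpolated AP. It is a simplified illustration; benchmark protocols such as COCO additionally average over multiple IoU thresholds and classes, and the toy inputs are hypothetical.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def average_precision(is_tp, num_gt):
    """AP for one class from detections already sorted by descending confidence.
    is_tp marks each detection as a true positive after IoU matching."""
    is_tp = np.asarray(is_tp, dtype=bool)
    tp = np.cumsum(is_tp)
    fp = np.cumsum(~is_tp)
    recall = tp / max(num_gt, 1)
    precision = tp / (tp + fp)
    # Interpolate precision to be non-increasing, then integrate over recall
    interp = np.maximum.accumulate(precision[::-1])[::-1]
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recall, interp):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

if __name__ == "__main__":
    # Toy example: three detections, two ground-truth boxes
    print(iou((0, 0, 10, 10), (5, 5, 15, 15)))                 # ~0.143
    print(average_precision([True, False, True], num_gt=2))    # ~0.833
```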