1. Introduction
In recent years, advancements in drone technology have made railway drone inspections significantly more efficient than traditional methods, particularly in speed and flexibility. However, adverse weather conditions often interfere with drone image acquisition, resulting in blurriness, reduced contrast, and unclear edges. These problems can hinder the drone's ability to detect foreign objects during inspections. As a result, image enhancement techniques aimed at mitigating weather-related interference, such as rain, snow, and fog, have become a prominent area of research. The primary goal is to restore detailed features within the images, making them more suitable for subsequent analysis and decision-making. Most existing methods focus on eliminating only one type of weather-related noise. The main approaches include vision-based methods [1] and innovative deep learning-based techniques [2,3,4].
UAV-based inspection images pose several challenges, including small inspection targets, varying shooting angles, and complex environments. There is no dedicated algorithm for removing rain, snow, and fog from these images. Furthermore, existing algorithms do not effectively differentiate between various levels of adverse weather, employing a single network for both moderate and extreme conditions. From a network capability standpoint, challenging samples require more parameters, while simpler samples can achieve optimal restoration with fewer parameters. Additionally, current denoising algorithms often fail to eliminate noise particles, resulting in subpar image quality after denoising.
To address these challenges, this paper introduces a noise-supplemented image restoration model with a Dynamic Noise Adjustment Module (DNAM). This module dynamically adjusts its ability to eliminate complex weather effects during inference by modifying the Weather Complexity Adjustment Factor (WCAF). This approach eliminates the need for multiple independent networks while effectively managing various adverse weather conditions, enhancing computational efficiency. Furthermore, to tackle the insufficient handling of fine and residual noise in certain areas by existing denoising networks, this paper employs multi-scale cropping inputs to create multiple loss functions. This strategy improves the model’s capacity to restore local details and effectively reduce noise in small areas, ultimately enhancing overall image quality. As a result, the model provides more stable and accurate restoration performance in complex weather situations. The main contributions of this paper are as follows:
- (1) We propose WeatherClean, the first drone inspection-specific rain, snow, and fog removal framework, which processes multi-level weather degradation in a single model through the Dynamic Noise Adjustment Module and the Weather Complexity Adjustment Factor.
- (2) To address residual noise and detail loss, we design a hierarchical multi-scale optimization loss function that significantly enhances the restoration of fine noise and edge structures.
- (3) We develop a physics-driven degradation data generation scheme and construct a rain, snow, and fog degradation dataset that aligns with real-world scenarios.
The rest of this paper is organized as follows. Section 2 describes related work. Section 3 presents the proposed WeatherClean method for rain, snow, and fog removal in images. Section 4 reports the ablation studies and comprehensive experiments of the proposed algorithm, along with their results. Finally, Section 5 concludes this paper.
2. Related Work
In previous studies, researchers mainly focused on image restoration under single weather conditions such as rain, snow, or fog. For example, Shit et al. [5] proposed the EDD-N network, which combines image defogging and detection for real-time haze image processing; Luo et al. [6] designed LKD-Net, utilizing dilated large kernel convolutional blocks (DLKCBs) and channel-enhanced feedforward networks (CEFNs) to improve defogging performance. However, these methods lack adaptability to various real-world weather conditions. Current leading weather noise removal techniques are based on end-to-end deep learning frameworks, which directly produce restored, weather-free images through network models without estimating intermediate parameters. These methods mainly fall into two categories: GAN-based and Transformer-based models.
In the GAN-based image restoration field, for snow removal, Jaw et al. [7] proposed DesnowGAN to improve visual quality; Zhang et al. [8] designed an algorithm targeting irregular snowflakes and snow traces, combining a residual generator with a novel Transformer module to restore details. For defogging, Wang et al. [9] improved CycleGAN (DFC-dehaze), using a Dehazeformer-t generator and a local–global discriminator to reduce residual haze; Xu et al. [10] developed an enhanced CycleGAN with an adaptive dark channel prior for unpaired single-image defogging.
Transformer-based methods are becoming more popular because of their strong global modeling abilities. In snow removal tasks, Agbodike et al. [11] introduced the WiT network, which combines the Vision Transformer and the discrete wavelet transform; Wei et al. [12] used ViT to enhance snow removal accuracy; Lin et al. [13] developed the lightweight LMQFormer, capable of removing coarse snow using a masked query module (MQFormer). In rain removal tasks, Tang et al. [14] utilized the Swin Transformer for rain removal and created a twin network model; Song and Chen [15,16] proposed the Cycle-Derain and Cycle-Attention-Derain methods, respectively, which combine unsupervised attention mechanisms with GANs to manage unpaired data. Regarding general image restoration, the OneRestore algorithm shifts self-attention calculations to the channel dimension through MDTA and GDFN modules to lower the complexity of processing high-resolution images [17]; Duan et al. [18] improved global and local information capture with bidirectional attention and cross-stream convolution modules; He et al. [1] simplified low-light enhancement to a curve estimation task and integrated a CNN-Transformer architecture for low-complexity denoising.
In recent years, networks for joint removal of multiple complex weather conditions have emerged. Chen et al. [2] propose a visual method capable of simultaneously removing rain streaks and haze; Siddiqua et al. [3] introduce a multi-domain attention-driven multi-modal conditional generative adversarial network (MACGAN) to enhance visibility. Transformer-based approaches include Valanarasu et al. [4], who developed an end-to-end single encoder–decoder model capable of handling various weather degradations; the frequency-guided AIRFormer method proposed by Gao et al. [19]; Sun et al. [20], who designed a distributed map attention mechanism that divides degraded features into buckets and uses self-attention to capture dynamic range dependencies; Wang et al. [21], who proposed the TANet algorithm, whose TAB module with spatial attention blocks can handle various weather degradation patterns; and Zhu et al. [22], who combined the Transformer architecture, hypernetworks, feature-level linear modulation blocks, and contrastive learning to process various weather degradations with the same parameter set.
However, research on drone-based detection applications remains relatively limited. Wu et al. [23] proposed a drone-based railway image defogging network that uses structural similarity loss to preserve structural information; Yang et al. [24] suggested a discretization–filtering–reconstruction method for processing railway images. Drone-captured images are typically low in resolution, and existing methods often encounter challenges due to high model complexity and strict hardware requirements, making them difficult to deploy in practice. Additionally, their multi-network integration schemes often overlook the need for real-time processing under actual weather conditions. To address these issues, this paper introduces the Transformer-based WeatherClean method, which adapts dynamically to real-world weather conditions and achieves real-time rain, snow, and fog removal with low complexity, significantly improving its applicability to railway drone detection scenarios.
3. Methods
3.1. Overall Framework
Existing multi-weather removal methods typically employ architectures with multiple independent encoders. For instance, the All-in-One [25] method formulates the adverse weather removal problem as:

$$B = D\big(E_p(I_p)\big),$$

where $B$ is the restored image, $D$ is the decoder, $E_p$ is the encoder specific to weather type $p$, and $I_p$ is the input image with weather type $p$. This approach requires designing an independent encoder for each weather type, leading to high computational complexity and a lack of information sharing between weather types.
To address these issues, we propose WeatherClean, which achieves single-network multi-weather adaptive processing through the introduction of a Weather Complexity Adjustment Factor (WCAF), expressed as:

$$B = M\big(I, \alpha\big),$$

where $M$ represents the WeatherClean network, $I$ is the degraded input image, and $\alpha$ is the WCAF parameter used to dynamically adjust the network's capability to handle different weather complexities. Unlike the All-in-One network, WeatherClean consists of a weather-agnostic encoder and decoder. The weather type queries are learned along with the parameters of $M$, thus making the problem setup more generic.
The overall framework of WeatherClean is shown in Figure 1. During data processing, synthetic degraded data are created using physics-based methods; the degraded images and noise serve as network inputs, while the original images are used as labels to guide training. During training, multi-scale cropped inputs are used to build multiple loss functions, which improve the model's ability to restore local details and effectively reduce fine-grained noise, leading to better overall image quality. In the inference stage, the WCAF is employed to enable adaptive processing of multi-level weather degradation.
The remaining subsections explain, respectively, how the Dynamic Noise Adjustment Module and the integrated WCAF guide the network's adaptive processing of multi-level weather degradation, the WeatherClean network architecture, and the design of the loss function.
3.2. The Dynamic Noise Adjustment Module
The denoising process aims to eliminate noise that is not present in the original image. FFDNet [26] assumes additive white Gaussian noise (AWGN) and takes a noise level parameter (σ) as input along with the noisy image. However, this single-parameter approach offers limited additional information. To improve on this, we use σ-generated noise images, allowing the network to learn the specific features of rain, snow, and fog from known noise image characteristics, thereby easing the learning process.
Since the rain, snow, and fog images here are artificially generated by adding noise, if the given prompt information were Gaussian noise of the same intensity, the network might directly learn the noise-addition pattern from the prompt. To enable the network to learn Gaussian noise of intensity σ, we therefore add two separately generated Gaussian noises of equal intensity, creating a more complex noise distribution that conceals how the prompt was generated. To incorporate the input noise prompts into the network and better extract information from the noise, we propose the Dynamic Noise Adjustment Module (DNAM), as shown in Figure 2. DNAM employs a 32-channel convolution to project image information into a high-dimensional space, while a 1-channel convolution compresses the noise information into a low-dimensional space, since noise information is relatively monotonous compared with image information and requires less capacity for subsequent processing. The 1-channel noise features and the 32-channel image features are then concatenated before being passed to the next stage, completing the mixing of image and noise information.
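A minimal PyTorch sketch of this mixing step is given below, assuming a 3-channel noise prompt; the module and function names, the 3 × 3 kernels, and the exact channel layout are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DynamicNoiseAdjustment(nn.Module):
    """Sketch of the DNAM mixing described above (names and kernel sizes are assumptions).

    The degraded image is projected into a 32-channel feature space, the noise prompt is
    compressed to a single channel, and the two are concatenated for the next stage.
    """
    def __init__(self):
        super().__init__()
        self.image_proj = nn.Conv2d(3, 32, kernel_size=3, padding=1)   # high-dimensional image features
        self.noise_proj = nn.Conv2d(3, 1, kernel_size=3, padding=1)    # low-dimensional noise descriptor

    def forward(self, degraded, noise_prompt):
        img_feat = self.image_proj(degraded)          # B x 32 x H x W
        noise_feat = self.noise_proj(noise_prompt)    # B x 1  x H x W
        return torch.cat([img_feat, noise_feat], dim=1)   # B x 33 x H x W

def make_noise_prompt(shape, sigma):
    """Training-time prompt: two independent Gaussian noise maps of the same sigma are summed,
    so the network cannot trivially invert the degradation recipe from the prompt."""
    return sigma * torch.randn(shape) + sigma * torch.randn(shape)
```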
During the inference phase, we introduce the Weather Complexity Adjustment Factor (WCAF) to replace the original noise input, providing more detailed and targeted guidance for the network’s restoration of rain, snow, and fog degradation. DNAM calculates a spatially varying WCAF feature map directly from the degraded image and combines it with high-dimensional image features, allowing subsequent convolutional layers to better perceive and adapt to the severity of degradation at each pixel.
Formally, given a degraded image $I$, we first extract three local degradation indicators over sliding windows $\Omega(x)$: contrast $C$, sharpness $L$, and noise energy $N$:

$$C(x)=\frac{\sigma_{\Omega(x)}(I)}{\mu_{\Omega(x)}(I)+\varepsilon},\qquad L(x)=\frac{1}{|\Omega(x)|}\sum_{y\in\Omega(x)}\lVert\nabla I(y)\rVert,\qquad N(x)=\frac{1}{|\Omega(x)|}\sum_{y\in\Omega(x)}\big(I(y)-\mu_{\Omega(x)}(I)\big)^{2},$$

where $\nabla I$ denotes the image gradient, $\sigma_{\Omega}$ is the local standard deviation, $\mu_{\Omega}$ is the local mean, and $\varepsilon$ is a small constant for numerical stability. These three feature maps are concatenated along the channel dimension and passed through a 1 × 1 convolution followed by a sigmoid activation to yield the WCAF map:

$$\alpha=\mathrm{Sigmoid}\big(\mathrm{Conv}_{1\times1}([C,L,N])\big),$$

where $\mathrm{Conv}_{1\times1}$ denotes the convolution and $[C,L,N]$ are the channel-stacked features.
During the inference stage, the WCAF dynamically adjusts noise intensity to match the degradation complexity in the input image. Specifically, the network creates a complexity factor for each pixel, with values ranging from 0 to 1; higher values near 1 indicate more severe degradation, such as heavy rain or snow, while values near 0 represent mild weather effects like light fog or haze. This dynamic complexity factor allows the network to adaptively modify the denoising strength during noise processing to reflect the actual degradation level. For example, when the WCAF value is high, the noise intensity increases accordingly, helping the model handle severe weather-related image degradation; when the WCAF value is low, the noise intensity decreases, making image restoration easier.
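A hedged sketch of this per-pixel WCAF estimation follows; the 7 × 7 window, the particular approximations of the three indicators, and the class name are assumptions made for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WCAFEstimator(nn.Module):
    """Illustrative per-pixel Weather Complexity Adjustment Factor (values in (0, 1))."""
    def __init__(self, window: int = 7, eps: float = 1e-6):
        super().__init__()
        self.window, self.eps = window, eps
        self.fuse = nn.Conv2d(3, 1, kernel_size=1)   # 1x1 fusion of the three indicator maps

    def forward(self, img):                          # img: B x 3 x H x W in [0, 1]
        gray = img.mean(dim=1, keepdim=True)
        pad = self.window // 2
        mu = F.avg_pool2d(gray, self.window, stride=1, padding=pad)
        var = F.avg_pool2d(gray ** 2, self.window, stride=1, padding=pad) - mu ** 2
        contrast = var.clamp(min=0).sqrt() / (mu + self.eps)       # local std / local mean

        gx = gray[:, :, :, 1:] - gray[:, :, :, :-1]                # horizontal gradient
        gy = gray[:, :, 1:, :] - gray[:, :, :-1, :]                # vertical gradient
        grad = F.pad(gx.abs(), (0, 1, 0, 0)) + F.pad(gy.abs(), (0, 0, 0, 1))
        sharpness = F.avg_pool2d(grad, self.window, stride=1, padding=pad)

        noise = (gray - mu) ** 2                                   # high-frequency residual energy

        feats = torch.cat([contrast, sharpness, noise], dim=1)
        return torch.sigmoid(self.fuse(feats))                     # B x 1 x H x W WCAF map
```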
3.3. WeatherClean Network Architecture
Inspired by TransWeather's single-network approach to multi-weather degradation, we design an innovative model called WeatherClean to simultaneously eliminate image quality degradation caused by three types of weather conditions: rain, snow, and fog. The network architecture of WeatherClean is shown in Figure 3, which mainly consists of a Dynamic Noise Adjustment Module, an encoder, a decoder, and a convolutional projection module. The network first receives input from the DNAM as a processed noisy image of size H × W × 3. Next, the network divides the noisy image into multiple patches and inputs these patches into the encoder of the Transformer module at different processing stages. The resolution of the image is gradually reduced at each stage to ensure that the Transformer can effectively learn both coarse and fine image details. In the decoding stage, the encoded features are used as keys and values within the Transformer decoder block, and the learnable weather type query embedding is used as the query. The convolutional projection block then produces a denoised image with dimensions H × W × 3.
3.3.1. Encoder Design
The encoder is based on the Transformer architecture, which derives hierarchical feature representations of the input image through multi-level processing within the encoder. At various stages, both high-level and low-level image features are extracted. In each stage, an overlapping patch merging strategy is used, where overlapping feature patches are combined into features of the same size as non-overlapping patches before passing to the next stage for further feature extraction.
As shown in Figure 4a, each Transformer module includes a multi-head self-attention mechanism and a feed-forward network for computing self-attention features. For each computational stage, the process can be formally expressed as:

$$T(P)=\mathrm{FFN}\big(\mathrm{MSA}(P)\big),$$

where $T$ denotes the Transformer module, $\mathrm{FFN}$ represents the feed-forward network, $\mathrm{MSA}$ indicates the multi-head self-attention network, and $P$ corresponds to the input image.
Similar to the original self-attention mechanism, each head in the multi-head self-attention network maintains identical dimensions for queries (Q), keys (K), and values (V). Assuming d represents the feature dimension, multiple attention heads exist within each encoder module. The number of heads serves as a tunable hyperparameter that is adaptively adjusted across different stages of the Transformer encoder. The attention computation is formulated as:

$$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right)V.$$

To lower computational complexity, a reduction ratio $R$ is introduced, which reduces the original self-attention complexity from $O(N^{2})$ to $O(N^{2}/R)$. The implementation involves reshaping the key matrix from $\mathbb{R}^{N\times C}$ to $\mathbb{R}^{\frac{N}{R}\times (C\cdot R)}$, followed by a linear projection that maps the second dimension from $C\cdot R$ back to $C$, resulting in adjusted key dimensions of $\mathbb{R}^{\frac{N}{R}\times C}$. The processed self-attention features are then propagated to the feed-forward network module. Notably, unlike standard Transformer architectures, the proposed feed-forward network incorporates depthwise convolution to capture local information and enhance positional awareness for the Transformer. The feed-forward network module structure is illustrated in Figure 4b.
Given the self-attention features $A$ as input, the computation in the FFN module is formulated as follows:

$$\mathrm{FFN}(A)=\mathrm{MLP}\Big(\mathrm{GELU}\big(\mathrm{DWConv}(\mathrm{MLP}(A))\big)\Big)+A,$$

where $\mathrm{DWConv}$ denotes depthwise convolution, $\mathrm{GELU}$ represents the Gaussian Error Linear Unit, and $\mathrm{MLP}$ denotes the multilayer perceptron.
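To make the encoder block concrete, the sketch below implements the two components just described: self-attention whose keys and values are spatially reduced before attention, and a feed-forward network with a depthwise convolution. It is a hedged illustration, not the authors' exact implementation; the strided-convolution reduction (a common equivalent of the reshape-and-project step), head count, and hidden size are assumptions.

```python
import torch
import torch.nn as nn

class ReducedSelfAttention(nn.Module):
    """Multi-head self-attention with spatially reduced keys/values (sketch)."""
    def __init__(self, dim, heads=4, sr_ratio=2):
        super().__init__()
        self.heads, self.scale = heads, (dim // heads) ** -0.5
        self.q, self.kv, self.proj = nn.Linear(dim, dim), nn.Linear(dim, dim * 2), nn.Linear(dim, dim)
        # strided conv shrinks the token grid; token count drops by sr_ratio ** 2
        self.reduce = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, h, w):                                    # x: B x N x C with N = h * w
        b, n, c = x.shape
        q = self.q(x).reshape(b, n, self.heads, c // self.heads).transpose(1, 2)
        red = self.reduce(x.transpose(1, 2).reshape(b, c, h, w))   # spatially reduced tokens
        red = self.norm(red.flatten(2).transpose(1, 2))            # B x N' x C, N' < N
        k, v = self.kv(red).chunk(2, dim=-1)
        k = k.reshape(b, -1, self.heads, c // self.heads).transpose(1, 2)
        v = v.reshape(b, -1, self.heads, c // self.heads).transpose(1, 2)
        attn = (q @ k.transpose(-2, -1)) * self.scale              # B x heads x N x N'
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, n, c)
        return self.proj(out)

class DepthwiseFFN(nn.Module):
    """Feed-forward network with a depthwise convolution for positional cues (sketch)."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(dim, hidden), nn.Linear(hidden, dim)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.act = nn.GELU()

    def forward(self, x, h, w):                                    # x: B x N x C
        b, n, _ = x.shape
        y = self.fc1(x).transpose(1, 2).reshape(b, -1, h, w)
        y = self.dwconv(y).flatten(2).transpose(1, 2)
        return self.fc2(self.act(y))
```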
3.3.2. The Intra-Patch Transformer Module
During the forward propagation in the Transformer encoder, the Intra-PT (intra-patch Transformer) module is inserted between stages to process the sub-patches created from the original patches. The dimensions of these sub-patches are half the width and height of the original patch. The structure of the Intra-PT module is shown in Figure 5.
The Intra-PT module employs a similar Transformer architecture to the main module but uses a higher reduction ratio R for greater computational efficiency. Because it operates on smaller patches, it captures finer details and reduces the loss of small-scale information. Except for the first stage, the Intra-PT module generates patches at the feature level, whereas the first stage produces patches at the image level. The self-attention features output by the Intra-PT module are added to those of the main module at the same stage. Therefore, at each stage of forward propagation, the Transformer encoder's process can be summarized as:

$$O = T(I) + T_{\mathrm{intra}}(I),$$

where $O$ is the output of the Transformer at each stage, $I$ is the input of the Transformer at each stage, $T$ is the main Transformer module, and $T_{\mathrm{intra}}$ is the Intra-PT module, which internally creates sub-patches from the input patches.
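A small sketch of the sub-patch creation and recombination implied by this formula is shown below; it assumes even spatial dimensions and is only one possible way to implement the split.

```python
import torch

def make_sub_patches(x):
    """Split each feature map into four half-resolution sub-patches (assumes even H and W),
    stacked along the batch dimension so the Intra-PT block can attend to finer detail."""
    b, c, h, w = x.shape
    tl, tr = x[:, :, : h // 2, : w // 2], x[:, :, : h // 2, w // 2 :]
    bl, br = x[:, :, h // 2 :, : w // 2], x[:, :, h // 2 :, w // 2 :]
    return torch.cat([tl, tr, bl, br], dim=0)          # (4B) x C x H/2 x W/2

def merge_sub_patches(y):
    """Reassemble the four sub-patch outputs back into the original spatial layout."""
    tl, tr, bl, br = y.chunk(4, dim=0)
    return torch.cat([torch.cat([tl, tr], dim=3), torch.cat([bl, br], dim=3)], dim=2)

# Stage output following the formula above (main_block and intra_block are the two Transformer paths):
# out = main_block(feat) + merge_sub_patches(intra_block(make_sub_patches(feat)))
```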
3.3.3. Decoder Design
Traditional Transformer decoders use autoregression to predict the output sequence one element at a time. Inspired by Detection Transformer (DETR), this method defines weather type queries that are used to decode task feature vectors and reconstruct clear images accordingly. These weather type queries are learnable embeddings, trained alongside other network parameters and associated with the feature output of the Transformer encoder.
The Transformer decoder operates as a single stage but incorporates multiple blocks.
Figure 6 shows the structure of the Transformer block in the decoder.
Unlike the Transformer modules in the encoder, where Q, K, and V all come from the same input, the query Q in the decoder is a learnable embedding that represents the weather type, while the key K and the value V are features extracted from the final stage of the Transformer encoder. The features output by the decoder represent the task feature vector, which is fused with the features extracted by the Transformer encoder at each stage and finally fed into the convolutional tail to reconstruct a clean image.
For the convolutional projection module, the hierarchical Transformer encoder features and the task features from the Transformer decoder are processed through four convolutional layers to produce a clean image. Before each convolutional layer, an upsampling layer restores the original image size. Skip connections from the Transformer encoder are established at each stage of the convolutional tail, and the tanh activation function is applied in the final layer.
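The sketch below illustrates these two pieces: a learnable weather-type query cross-attending to encoder features, and a four-stage convolutional tail ending in tanh. The dimensions, query count, and channel widths are assumptions, and the per-stage skip connections are omitted for brevity.

```python
import torch
import torch.nn as nn

class WeatherQueryDecoder(nn.Module):
    """Learnable weather-type queries cross-attend to the final encoder features (sketch)."""
    def __init__(self, dim=512, num_queries=48, heads=8):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, num_queries, dim))   # learnable weather-type queries
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, enc_feat):                       # enc_feat: B x N x C from the last encoder stage
        q = self.query.expand(enc_feat.size(0), -1, -1)
        task_feat, _ = self.attn(q, enc_feat, enc_feat)    # Q = queries, K = V = encoder features
        return task_feat

class ConvProjectionTail(nn.Module):
    """Four upsample + conv stages ending in tanh (skip connections omitted for brevity)."""
    def __init__(self, in_ch=512, out_ch=3):
        super().__init__()
        chans, layers = [in_ch, 256, 128, 64, out_ch], []
        for i in range(4):
            layers += [nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                       nn.Conv2d(chans[i], chans[i + 1], 3, padding=1)]
            layers += [nn.ReLU(inplace=True)] if i < 3 else [nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):                              # x: B x C x h x w fused features
        return self.net(x)
```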
3.4. Loss Function
To handle complex weather conditions such as different levels of rain, snow, and fog, the proposed algorithm includes a flexible denoising module capable of managing degradation tasks across various weather intensities. However, this flexibility might cause the network to overfit, which could reduce its performance on single-weather scenarios. To improve the model's ability to process specific weather conditions, the WeatherClean network uses a multi-scale cropping strategy as an input augmentation method. As shown in Figure 7, this approach takes advantage of the randomness in smaller input patches to enhance the network's generalization. Additionally, various loss functions are used to regulate the network's outputs, ensuring strong performance across different weather situations.
The network is trained in an end-to-end manner, using a smooth L1 loss between the predicted and true values. With the residual between the predicted value and the true value denoted by $f$, the smooth L1 loss is defined as follows:

$$L_{\mathrm{smooth}L1}(f)=\begin{cases}0.5f^{2}, & |f|<1,\\ |f|-0.5, & \text{otherwise.}\end{cases}$$
To better preserve the detailed information of the output, a perceptual loss is also used to measure the difference between the predicted features and the real features. These features are extracted from layers 3, 8, and 15 of a VGG16 network pre-trained on ImageNet. The perceptual loss is calculated as follows:

$$L_{\mathrm{per}}=\sum_{j\in\{3,8,15\}} L_{\mathrm{MSE}}\big(\phi_{j}(\hat{y}),\,\phi_{j}(y)\big),$$

where $L_{\mathrm{MSE}}$ is the mean squared error loss function, $\phi_{j}$ is the feature extraction output of the $j$-th layer of the VGG16 network, $\hat{y}$ is the predicted value, and $y$ is the true value.
For the three different input and output sizes obtained after dataset cropping, the smooth L1 loss and perceptual loss are applied to the corresponding predicted and true values. The scales used in this algorithm are the original size, 128 × 128, and 64 × 64. The larger cropped inputs capture the overall effect of the whole image and handle variations in denoising intensity, while the smaller, randomly cropped inputs allow the network to focus on details and small noise in the image, significantly improving its denoising ability for a single focus area. The total network loss can be expressed as:

$$L_{\mathrm{total}}=L_{\mathrm{org}}+L_{128}+L_{64},$$

where $L_{\mathrm{org}}$, $L_{128}$, and $L_{64}$ represent the loss functions between the output values and labels at the three crop sizes, each of which takes the form

$$L_{s}=L_{\mathrm{smooth}L1}+\lambda L_{\mathrm{per}},$$

where $\lambda$ is a weight controlling the contribution of the smooth L1 loss and the perceptual loss to the total loss, and is set to 0.04 here.
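The sketch below mirrors this loss: smooth L1 plus λ = 0.04 times a VGG16 perceptual term (layers 3, 8, 15), summed over the three crop scales. The class and function names are illustrative, VGG input normalization is omitted for brevity, and the pretrained-weights argument assumes a recent torchvision.

```python
import torch
import torch.nn as nn
import torchvision

class PerceptualLoss(nn.Module):
    """MSE between ImageNet-pretrained VGG16 features at layers 3, 8, and 15."""
    def __init__(self, layers=(3, 8, 15)):
        super().__init__()
        vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1).features.eval()
        for p in vgg.parameters():
            p.requires_grad = False           # VGG is a fixed feature extractor
        self.vgg, self.layers = vgg, set(layers)
        self.mse = nn.MSELoss()

    def forward(self, pred, target):
        loss, x, y = 0.0, pred, target
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layers:
                loss = loss + self.mse(x, y)
            if i >= max(self.layers):
                break
        return loss

def weatherclean_loss(preds, targets, perceptual, lam=0.04):
    """Total loss over the three crop scales (full size, 128x128, 64x64):
    smooth L1 + lam * perceptual at each scale, then summed."""
    smooth_l1 = nn.SmoothL1Loss()
    return sum(smooth_l1(p, t) + lam * perceptual(p, t) for p, t in zip(preds, targets))
```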
4. Experiments
4.1. Experimental Datasets
4.1.1. Datasets
In the study of railway image de-raining, de-snowing, and defogging, obtaining real-world images under rain, snow, or fog conditions together with their corresponding clear images is extremely challenging. Therefore, synthetic datasets have become a primary data source in this research. To address the challenge of removing rain, snow, and fog from railway images and to create a large-scale paired dataset, images from the Kaggle Railroad Worker Detection Dataset [27] are used. Two methods are employed to generate rainy, snowy, and foggy weather conditions.
To simulate the effect of rain and snow on an image, a method based on random sampling and image processing is used. First, random noise regions of the same size as the target image are generated by uniformly sampling pixel values from 0 to 255. The noise distribution is controlled by setting a threshold, and a convolution kernel is applied to initially blur the noise, making it resemble rain and snow in the actual scene more closely. Additionally, a rotation matrix with a specific angle is used to simulate the tilt of the rain and snow, and this matrix is blurred with an equally scaled Gaussian kernel. This not only widens the noise but also makes it more dynamic.
Finally, the blurred rotation matrix is used as a filter to process the initial noisy image. To incorporate rain and snow effects into the original image, we superimpose the noisy image onto the original image proportionally based on pixel values. This results in a synthesized image with the rain and snow effects.
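A hedged sketch of this rain/snow synthesis pipeline is given below; the density threshold, streak length, tilt angle, blur sizes, and blend weight are illustrative values rather than the parameters used to build the dataset.

```python
import cv2
import numpy as np

def add_rain_snow(img, density=0.97, length=15, angle=-30, alpha=0.8):
    """Sketch of the synthesis described above: thresholded uniform noise, blurred,
    streaked with a rotated motion kernel, and overlaid onto the clean image."""
    h, w = img.shape[:2]
    noise = np.random.uniform(0, 255, (h, w)).astype(np.float32)
    noise[noise < 255 * density] = 0                       # threshold controls particle density
    noise = cv2.GaussianBlur(noise, (3, 3), 0)             # soften the raw noise

    # Motion kernel: a diagonal line rotated by `angle`, then Gaussian-blurred,
    # stretches each particle into a tilted streak.
    kernel = np.diag(np.ones(length, dtype=np.float32))
    rot = cv2.getRotationMatrix2D((length / 2, length / 2), angle, 1.0)
    kernel = cv2.GaussianBlur(cv2.warpAffine(kernel, rot, (length, length)), (3, 3), 0)
    kernel /= kernel.sum() + 1e-6

    streaks = np.clip(cv2.filter2D(noise, -1, kernel), 0, 255)[..., None]   # H x W x 1
    out = img.astype(np.float32) + alpha * streaks          # proportional overlay onto the image
    return np.clip(out, 0, 255).astype(np.uint8)
```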
Fog is a dense suspension of tiny water droplets or ice crystals in the atmosphere. Its scattering and absorption characteristics can be described by Mie scattering theory. The presence of fog causes light scattering during transmission, leading to reduced contrast and less saturated object colors. In computer vision, the atmospheric scattering model is commonly used to mathematically represent foggy images, as fog formation is naturally connected to light scattering phenomena in the atmosphere. This scattering process results in the weakening of light by atmospheric particles such as haze and mist. The standard optical model is as follows:
$$I(x)=J(x)\,t(x)+A\big(1-t(x)\big),$$

where $A$ is the global atmospheric light component, $x$ is the coordinate of an image pixel, $I(x)$ is the foggy image, $t(x)$ is the transmittance, and $J(x)$ is the fog-free image to be restored [27]. According to this formula, the images were processed to obtain the synthetic foggy images.
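As an illustration, the sketch below applies the atmospheric scattering model to a clear image; the depth proxy, the scattering coefficient beta, and the atmospheric light value are assumptions, not the dataset's actual settings.

```python
import numpy as np

def add_fog(img, atmospheric_light=0.9, beta=1.2):
    """Fog synthesis via I(x) = J(x) t(x) + A (1 - t(x)); depth proxy and beta are illustrative."""
    j = img.astype(np.float32) / 255.0
    h, w = j.shape[:2]
    # Simple depth proxy: pixels nearer the top of the frame (typically farther from the drone)
    # receive a smaller transmittance, i.e., denser fog.
    depth = np.linspace(1.0, 0.2, h)[:, None].repeat(w, axis=1)
    t = np.exp(-beta * depth)[..., None]                   # transmittance map, H x W x 1
    foggy = j * t + atmospheric_light * (1.0 - t)
    return (np.clip(foggy, 0.0, 1.0) * 255).astype(np.uint8)
```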
The paired dataset was created through various parameter adjustments and split into training and validation sets, with the training set containing 7524 images and the test set including 2142 images. The number of images for rain, snow, and fog conditions is evenly distributed. To further assess the algorithm’s ability to generalize across different scenarios, we built a rain, snow, and fog degradation dataset based on Railway Foreign Object Intrusion images captured by drones. Using the same synthesis method for rain, snow, and fog, we generated a total of 240 synthetic weather images, with 80 images for each weather condition. Additionally, we collected 1372 foggy railway images taken by drones during early morning hours for real-world scenario validation.
4.1.2. Experiment Environment and Parameter Settings
The hardware environment for this experiment consists of an NVIDIA GeForce RTX 4090 Ti GPU, while the software environment includes the Ubuntu 18.04.5 operating system, Python 3.8.5, PyTorch 1.8.0, and the relevant deep learning Python libraries. The Adam optimizer is used with a learning rate of 0.0002, and a learning rate scheduler adjusts the learning rate by a factor of 2 after 100 and 150 iterations. The network is trained for 500 epochs, with a batch size of 8 for the training set and a batch size of 1 for the validation set.
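A minimal sketch of this training setup follows; the placeholder model, the interpretation of the schedule as halving the rate at epochs 100 and 150, and the per-epoch stepping are assumptions.

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)      # placeholder standing in for the WeatherClean network
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
# Assumption: the "factor of 2" schedule is read here as halving the rate at milestones 100 and 150.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.5)

for epoch in range(500):
    # ... one pass over the training set with batch size 8 would go here ...
    optimizer.step()       # placeholder for the per-batch parameter updates
    scheduler.step()       # per-epoch learning-rate schedule
```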
4.1.3. Evaluation Metrics
In this experiment, two evaluation metrics, the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM), are used to assess the performance of the proposed algorithm [28]. These two metrics are commonly employed to evaluate the similarity between images, but they focus on different aspects. The PSNR is a widely used metric for assessing image quality: it measures the peak signal-to-noise ratio between the original image and the image that has undergone compression or other distortion, and a higher PSNR value indicates better image quality. The PSNR is calculated as follows:

$$\mathrm{PSNR}=10\log_{10}\!\left(\frac{\mathrm{MAX}_{I}^{2}}{\mathrm{MSE}}\right),$$

where $\mathrm{MAX}_{I}$ represents the maximum possible pixel value and MSE is the mean squared error between the original image and the distorted image, calculated as:

$$\mathrm{MSE}=\frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big[I(i,j)-K(i,j)\big]^{2},$$

where $I(i,j)$ represents the pixel value of the original image, $K(i,j)$ is the pixel value of the distorted image, and $m$ and $n$ are the width and height of the image, respectively.
The SSIM is another metric used to evaluate image quality. It considers not only differences in pixel values but also the image's structure and human visual perception. The SSIM is calculated as follows:

$$\mathrm{SSIM}(x,y)=\frac{(2\mu_{x}\mu_{y}+c_{1})(2\sigma_{xy}+c_{2})}{(\mu_{x}^{2}+\mu_{y}^{2}+c_{1})(\sigma_{x}^{2}+\sigma_{y}^{2}+c_{2})},$$

where $\mu_{x}$ and $\mu_{y}$ represent the mean values of images $x$ and $y$, respectively, $\sigma_{x}^{2}$ and $\sigma_{y}^{2}$ are their variances, $\sigma_{xy}$ is their covariance, and $c_{1}$ and $c_{2}$ are constants used to stabilize the calculation.
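For reference, the two metrics can be computed as below; the PSNR follows the formula exactly, while the SSIM here uses a single global window, whereas library implementations additionally average over local Gaussian windows.

```python
import numpy as np

def psnr(original, distorted, max_val=255.0):
    """PSNR between two images of the same shape."""
    mse = np.mean((original.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=255.0):
    """Single-window (global) SSIM following the formula above, with the usual c1, c2 constants."""
    x, y = x.astype(np.float64), y.astype(np.float64)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```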
4.2. Ablation Study
Through ablation experiments, the two main components of the proposed network, namely the Dynamic Noise Adjustment Module (DNAM) and the Multi-scale Crop Loss (MCL), are validated with respect to their impact on denoising performance using the Railway Public Synthetic Dataset. To quantitatively assess the denoising performance of this method, two metrics are used: the PSNR and the SSIM. The results of the ablation experiments on the synthetic railway public datasets for rain, snow, and fog are shown in Figure 8 and Table 1.
- 1. Analysis of the Ablation Experiment Results for the Dynamic Noise Adjustment Module

The Dynamic Noise Adjustment Module aims to effectively integrate input noise information with the network to enhance noise feature extraction. The DNAM and the degraded images are simultaneously fed into the network, allowing the model to learn rain, snow, and fog characteristics from the provided noise images. As shown in the first and second rows of Table 1, after incorporating the DNAM, the rain removal performance is the best, with the PSNR increasing from 22.98 to 25.64 and the SSIM from 0.8107 to 0.8947, improvements of 11.6% and 10.4%, respectively. Snow removal results are roughly comparable to the baseline, with a slight increase in the SSIM but a minor decrease in the PSNR. The fog removal performance shows a small decline, with the PSNR dropping from 18.63 to 18.46 and the SSIM from 0.8240 to 0.8233. This indicates that the DNAM performs exceptionally well in handling high-frequency noise but has limitations in addressing low-frequency degradation, suggesting that combining it with the multi-scale cropping loss function could further improve results.
- 2. Analysis of the Ablation Experiment Results for the Multi-scale Crop Loss
When the network is trained to remove noise from degraded images, too many details are sometimes lost in the denoised results. To address this issue, this study proposes a Multi-scale Crop Loss (MCL) function to handle variations in denoising strength, which significantly improves the denoising ability for specific noise concentrations. In this study, cropped images of sizes 128 × 128 and 64 × 64 are added to the input of the WeatherClean network, with the loss function imposing constraints on each scale. As shown in the third row of Table 1, using the MCL function in the network yields a more effective improvement in rain, snow, and fog removal performance than using only the DNAM. As shown in the first and third rows of Table 1, after applying both the DNAM and the loss function, the model's denoising performance improves significantly compared with the original model, especially for synthetic fog, where the increases in the PSNR and the SSIM are substantial.
- 3. Analysis of the Adaptation Mechanism of the Improved Modules to Different Weather Types
After adding the DNAM, the most notable improvement was seen in rainy images, where the PSNR increased by 2.66 (from 22.98 to 25.64). This is mainly because the DNAM effectively uses noise priors to guide feature extraction, removing the linear noise caused by rain. However, the improvement in snowy and foggy images was less obvious. Snow noise is more scattered and granular, so the DNAM performs less well with this type of noise. In foggy images, the overall degradation and the DNAM's inability to restore contrast limit its effectiveness. Therefore, the DNAM alone does not produce significant improvements in snowy and foggy images.
However, after adding the Multi-scale Crop Loss, the results improved significantly. MCL boosts the model’s ability to recover local details and small-scale noise by using a multi-scale mechanism, especially in snowy and foggy images, where both the PSNR and the SSIM saw notable gains. Specifically, in foggy images, the PSNR reached 28.44, and the SSIM increased to 0.9548, showing that MCL effectively restored both image details and overall contrast.
To better demonstrate the results of the model on the Railway Public Dataset, the effects of the DNAM and the MCL loss function are visualized in Figure 9. Red boxes indicate rain residue, blue boxes indicate snow residue, and green boxes indicate fog residue. It can be observed that the proposed DNAM and MCL function have a noticeable positive effect on image quality.
4.3. Comprehensive Comparative Experiments
To assess the generalizability of the proposed model, comprehensive comparative experiments were performed on images from the Railway Foreign Object Intrusion Sample Database. The model was evaluated against leading denoising algorithms such as TransWeather [4], MPRNet [29], Restormer [17], Uformer [30], and All-in-One [25]. The results of these comparisons on the rain, snow, and fog synthetic datasets collected by drones are shown in Table 2.
The comprehensive comparative experimental results show that the WeatherClean model performs very well under all weather conditions. Specifically, in rainy conditions, WeatherClean achieves a PSNR of 30.05 and an SSIM of 0.8111, both exceeding all comparison models and showing its excellent ability to handle linear noise. In snowy conditions, WeatherClean has an SSIM of 0.8024, the highest among all models, and a PSNR of 28.49, just below All-in-One’s 28.57, indicating its balanced processing ability. In foggy conditions, WeatherClean stands out most clearly, with a PSNR of 28.50—much higher than the second-best All-in-One at 14.23—and an SSIM of 0.9548, considerably higher than the other models, demonstrating its strong advantage in dealing with low-frequency degradation.
The denoising results of the proposed algorithm are compared with those of the current mainstream denoising algorithms TransWeather and MPRNet. The rain removal effects on images from the Railway Foreign Object Intrusion Sample Database are shown in Figure 10, where (a) shows the degraded image, (b) displays the ground truth (GT), (c) shows the result of MPRNet, (d) depicts TransWeather's output, and (e) illustrates the result of the proposed WeatherClean algorithm. The yellow boxes highlight areas with color distortion. The images include both day and night scenes to assess the algorithms' performance under different lighting conditions.
The rain removal visual effect on images from the Railway Foreign Object Intrusion Sample Database is shown in Figure 10i. Rain removal was carried out in both daytime and nighttime scenarios. The first row displays the daytime results, where all three algorithms effectively remove rain; however, MPRNet shows noticeable color distortion. In the second row, under nighttime conditions, both MPRNet and TransWeather exhibit color distortion, while our algorithm produces results closer to the ground truth.
The snow removal visual effect on images from the Railway Foreign Object Intrusion Sample Database is illustrated in Figure 10ii. Purple boxes highlight areas with significant noise, especially in the track region. In the first row, which shows daytime results, both MPRNet and TransWeather display unclear track areas with more noise, while WeatherClean better maintains the clarity of the track region. In the second row, under nighttime conditions, the differences among the three algorithms are less pronounced.
The defogging visual effect on images from the Railway Foreign Object Intrusion Sample Database is shown in Figure 10iii. Green boxes indicate fog residue. In both daytime and nighttime scenes, MPRNet and TransWeather still leave significant amounts of fog unremoved. Conversely, WeatherClean performs well in defogging, effectively clearing the fog from the image without causing any color distortion.
In conclusion, these visual images further demonstrate that applying DNAM and MCL to image de-rain, de-snow, and defog tasks is highly effective.
4.4. Validation on Real Foggy Drone Images
To validate the effectiveness of the proposed model in real-world situations, this section tests it on actual foggy images captured by drones in the early morning. We first show visual comparisons between the original and dehazed images, and then train and evaluate basic object detection models on the datasets before and after dehazing. This demonstrates that the proposed dehazing method can effectively improve image quality and enhance the performance of subsequent visual tasks.
Figure 11 shows the image comparison before and after dehazing with our model. It is clear that after dehazing, the overall image clarity improves significantly, fog occlusion is effectively removed, and the contours and details of distant targets are clearly restored. Especially under early morning fog conditions, the dehazed images not only keep the original color accuracy but also greatly enhance contrast and visibility, providing higher-quality input data for later computer vision tasks.
To thoroughly validate the performance boost of the proposed method on downstream visual tasks, we chose five leading object detection algorithms for comparison: YOLOv8 [31], YOLOv11 [32], DETR [33], Faster R-CNN [34], and CenterNet [35]. These algorithms represent different technical approaches in the current object detection landscape, including anchor-based detectors, query-based detectors, and keypoint-based detectors, which allows for a comprehensive assessment of the dehazing method's applicability across various detection paradigms.
To ensure fairness and comparability of experimental results, we strictly controlled the experimental conditions: all algorithms were run on identical hardware with the same training hyperparameters, including learning rate, batch size, and optimizer configuration. The same random seed was used during training to guarantee result reproducibility, and identical evaluation protocols and data processing pipelines were followed during testing.
For evaluation metrics, we selected four core metrics in the object detection field: precision (P), recall (R), mean average precision (mAP@0.5), and mean average precision (mAP@0.5:0.95). Among them, precision measures the accuracy of detection results, representing the proportion of true positive samples among all samples detected as positive; recall measures the completeness of detection, representing the proportion of all true positive samples that are correctly detected; mAP@0.5 represents the mean average precision at an IoU threshold of 0.5, which is the most commonly used evaluation metric in object detection tasks; mAP@0.5:0.95 indicates the mean average precision over IoU thresholds from 0.5 to 0.95 (with a step size of 0.05), providing a more comprehensive assessment of detector performance under different localization accuracy requirements. This experiment focuses on a single-object category detection task, specifically “Person” detection.
The experimental results in Table 3 clearly show the significant performance improvements of the WeatherClean dehazing method on object detection. Comparing five mainstream object detection models on original foggy images and WeatherClean-processed images, all models demonstrated notable gains. In terms of mAP@0.5, YOLOv8 increased from 0.843 to 0.912 (8.2% rise), YOLOv11 from 0.812 to 0.875 (7.8% rise), DETR from 0.798 to 0.844 (5.8% rise), Faster R-CNN from 0.857 to 0.917 (7.0% rise), and CenterNet from 0.847 to 0.892 (5.3% rise). Notably, in precision, both YOLOv8 and Faster R-CNN achieved high precision levels of 0.934. These findings confirm that our WeatherClean model effectively removes fog interference from images, greatly enhances the performance of downstream visual tasks, and provides strong technical support for reliable drone operations under adverse weather conditions.
5. Conclusions
This study introduces a novel drone image rain–snow–fog removal framework called WeatherClean. The framework features a Dynamic Noise Adjustment Module and a Weather Complexity Adjustment Factor, which adaptively process rain, snow, and fog degradation of different intensities. It uses a hierarchical multi-scale cropping strategy to create local–global collaborative loss functions, improving the recovery of fine noise and edge details. Additionally, it develops a degradation synthesis method based on atmospheric scattering physical models to generate training samples that match real-world weather degradation patterns, helping to address data scarcity. This study effectively employs weather complexity adaptive adjustment and multi-scale cropping strategies, significantly enhancing image restoration performance under complex weather conditions.
However, this research still has some limitations. Limited by actual collection conditions, we mainly tested moderate-intensity foggy weather and have not yet verified the model’s robustness under extreme weather conditions such as heavy rain, blizzards, dense fog, and others. Future research will develop comprehensive datasets that include extreme weather conditions to thoroughly evaluate the dynamic adjustment capability of the WCAF under severe weather and assess the robustness of the proposed model.