Semi-Supervised Degradation-Aware Learning for All-in-One Weather-Degraded Image Restoration

Cai, Lei; Ruan, Fang; Lu, Wei; Lin, Qi; Zheng, Huijie; Xiang, Wenjie; Zhu, Tao

doi:10.3390/electronics15122686


by Lei Cai ¹,*, Fang Ruan ¹, Wei Lu ², Qi Lin ³, Huijie Zheng ³, Wenjie Xiang ¹ and Tao Zhu ⁴

¹ School of Engineering, Huaqiao University, Quanzhou 362021, China
² Xiamen Solex High-Tech Industries Co., Ltd., Xiamen 361000, China
³ School of Information Science and Engineering, Huaqiao University, Xiamen 361021, China
⁴ School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, China
* Author to whom correspondence should be addressed.

Electronics 2026, 15(12), 2686; https://doi.org/10.3390/electronics15122686

Submission received: 31 May 2026 / Revised: 14 June 2026 / Accepted: 15 June 2026 / Published: 17 June 2026

(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)


Abstract

All-in-one weather-degraded image restoration aims to restore clean images from diverse weather-degraded observations (such as rain, haze, and snow) using a unified model. However, this topic remains challenging due to its ill-posed nature and the scarcity of large-scale paired training data. This article develops a novel semi-supervised learning framework, termed Semi-Supervised Degradation-Aware Learning (S²DAL), to adjust the feature space to align with the unified parameter space for all-in-one adverse weather removal. Specifically, the proposed S²DAL consists of two backbone networks: a Degradation-guided Histogram Transformer (DHformer) for weather-degraded image restoration and a Degradation-guided Convolutional Neural Network (DCNN) for degradation generation. A key component, the Degradation-guided Histogram Transformer (DHT) block, is designed to effectively capture intrinsic image features while suppressing diverse degradation interference through channel shuffling modulation, dynamic-range histogram self-attention, and dual-scale gated feed forward. Furthermore, a Monte Carlo-based Expectation-Maximization (EM) algorithm is introduced to jointly optimize latent variables and network parameters under both labeled and unlabeled data. Extensive quantitative and qualitative results on synthetic and real-world datasets consistently demonstrate that the proposed S²DAL achieves superior restoration performance compared to multiple state-of-the-art fully supervised and semi-supervised approaches.

Keywords:

all-in-one image restoration; adverse weather removal; semi-supervised learning

1. Introduction

Image restoration under adverse weather conditions (e.g., rain, fog, and snow) has attracted significant attention in the computer vision community. This task constitutes a highly ill-posed inverse problem, as a single degraded observation may correspond to multiple plausible clean images. Previous studies have primarily focused on single weather-degraded image restoration, such as de-raining [1,2,3], de-hazing [4,5,6], and de-snowing [7,8,9]. Although these approaches perform well under specific weather conditions, their single-task nature substantially limits their practicality in real-world applications such as autonomous driving [10] and surveillance systems [11]. Consequently, recent efforts have shifted toward all-in-one weather-degraded image restoration, which seeks to recover clean images from diverse weather-degraded inputs using a unified model with a single set of parameters [12].

In recent years, there has been growing interest in all-in-one weather-degraded image restoration, and numerous advanced approaches [13,14,15,16,17,18,19,20,21,22,23,24,25,26] have achieved state-of-the-art restoration performance across various weather conditions. Despite this progress, most of them fit joint regression models over degraded/clean image pairs in a fully supervised manner, which requires the manual collection of large-scale paired samples for training. Consequently, their generalization to real-world weather-degraded images is limited. More recently, some researchers have adopted semi-supervised schemes to improve the generalization ability of restoration networks. For example, Long et al. [27] proposed a semi-supervised framework for all-in-one adverse weather removal built upon a teacher–student architecture with a wavelet diffusion model [28] as the backbone. Subsequently, Xu et al. [29] leveraged a vision–language model to construct a semi-supervised learning framework aimed at enhancing restoration performance across multiple adverse weather conditions in real-world settings. Although these semi-supervised frameworks have achieved promising results, their final restoration performance depends critically on the quality of the pseudo-labels generated by the teacher network, because inaccurate or biased pseudo-labels may propagate errors and limit the effectiveness of the student model. Therefore, all-in-one weather-degraded image restoration remains a challenging and largely unresolved problem with substantial room for improvement.

Given these limitations of previous semi-supervised adverse weather removal methods, this article presents a new semi-supervised learning framework, called Semi-Supervised Degradation-Aware Learning (S²DAL), which adjusts the feature space to align with the unified parameter space for all-in-one adverse weather removal. Specifically, the proposed S²DAL is composed of two backbone networks: a Degradation-guided Histogram Transformer (DHformer) and a Degradation-guided Convolutional Neural Network (DCNN). DHformer is built upon multiple Degradation-guided Histogram Transformer (DHT) blocks, which incorporate the guidance of degradation-type prompts and can fully mine the inherent features of the clean image while reducing the interference of different weather-induced degradation patterns. To enable learning from unlabeled data, DCNN is designed to mimic the generation process of degradation layers; it is also guided by the degradation-type prompts so that the generation of different degradation layers is effectively disentangled. In addition, a Monte Carlo-based Expectation-Maximization (EM) algorithm is introduced to optimize the parameters and latent variables of the proposed S²DAL. The main contributions of this article are as follows:

We propose a novel semi-supervised learning framework for all-in-one weather-degraded image restoration. The proposed method, called S²DAL, removes the need for the large-scale labeled data required by fully supervised training.
We design a DHT block to build the DHformer, which restores the clean image from its weather-degraded observation. The DHT block adaptively regulates the feature space through channel shuffling modulation to better align it with the shared parameters and network structure, thereby enhancing all-in-one restoration performance.
We introduce a Monte Carlo-based EM algorithm to jointly optimize the network parameters and latent variables of the proposed S²DAL. Extensive experiments on both synthetic and real-world weather-degraded images demonstrate the effectiveness and superiority of our method, both quantitatively and qualitatively.

The remainder of this paper is organized as follows. Section 2 surveys the related work on image restoration under adverse weather conditions, including single weather-degraded image restoration and all-in-one weather-degraded image restoration. Section 3 details the proposed S²DAL framework. Section 4 introduces the experimental setup and provides a comprehensive performance analysis. Finally, Section 5 concludes the paper.

2. Related Work

2.1. Single Weather-Degraded Image Restoration

Previous studies have predominantly focused on single weather-degraded image restoration tasks (e.g., de-raining [2,3,30], de-hazing [5,6,31], and de-snowing [8,9,32]), which have achieved substantial advancements.

Conventional model-based methods [33,34,35,36,37,38] formulate single weather-degraded image restoration as a signal separation problem: they exploit various priors to impose additional constraints that characterize the background or degradation layers, and then separate the layers by solving an objective function with a suitable optimization algorithm. Representative examples include the dark channel prior [33], discriminative sparse coding [37], the color attenuation prior [38], non-local means filtering [36], dictionary learning [35], and Gaussian mixture models [34]. With the rise of deep learning, numerous approaches [5,6,8,9,31,39] based on Convolutional Neural Networks (CNNs) have been developed. Beyond CNN-based approaches, some works [32,40,41] employ adversarial learning to train Generative Adversarial Networks (GANs [42]) that better restore clean images from real-world weather-degraded ones. However, both CNNs and GANs rely on convolution operations, which are spatially invariant and have a limited receptive field, hindering their ability to model spatially variant properties and global structures of clean images [43]. Considering these limitations, recent studies [44,45] introduced Transformers [46] to single weather-degraded image restoration and achieved significant advances, owing to their ability to model non-local information effectively. Although single weather-degraded image restoration methods have received significant interest, most of them do not generalize well to other weather degradation types.

2.2. All-in-One Weather-Degraded Image Restoration

In contrast to single weather-degraded image restoration, all-in-one weather-degraded image restoration addresses multiple weather-degraded image restoration tasks using a unified model. The key challenge lies in alleviating the mutual interference among different degradation patterns while faithfully preserving the inherent features of the clean image [47].

One line of research addresses this issue by adapting the network's parameter space to accommodate different degradation types. For example, Li et al. [14] made the first attempt to handle all-in-one weather-degraded image restoration, employing neural architecture search to find suitable architectures for multiple weather-specific encoders. However, their framework incurs a large number of parameters because it uses a separate encoder for each weather removal task. To address this, Valanarasu et al. [15] developed a transformer-based all-in-one weather removal framework that incorporates intra-patch transformer blocks and learnable weather-type embeddings. Zhu et al. [48] introduced a unified weather removal model that uses two separate training stages to learn weather-general and weather-specific features automatically. Özdenizci et al. [16] designed a patch-based weather-degraded image restoration framework that performs smoothed noise estimation across overlapping patches. Chen et al. [49] explored a two-stage knowledge learning mechanism for all-in-one adverse weather removal, in which a student network collaboratively distills knowledge from multiple weather-specific teacher networks and is further refined using a multi-contrastive knowledge regularization loss. Sun et al. [18] proposed a histogram transformer for all-in-one weather-degraded image restoration that introduces histogram self-attention to capture long-range spatial dependencies under adverse weather conditions.

Another line of research introduces image or degradation-type prompts to modulate feature representations so that the restoration model can operate with a unified parameter space. For example, Tan et al. [17] designed a text-based classification loss that computes the cosine distance between textual and visual features, producing a similarity tensor that reflects the probabilistic association between each image and a specific weather category; this design enables the network to adapt to diverse weather conditions. Wen et al. [25] proposed an adaptive degradation-aware self-prompting framework, which uses the contrastive language–image pre-training model (CLIP [50]) to learn latent caption prompts that guide the reverse sampling process of a diffusion model for all-in-one weather-degraded image restoration. Liao et al. [51] developed a cyclic prompt learning approach that integrates weather-specific knowledge, textual context, and reliable textures to enhance universal adverse weather removal. Shao et al. [52] proposed a unified memory-enhanced visual–language recovery method that combines a lightweight encoder–decoder backbone with a visual–language model and an implicit memory bank to restore images under diverse and mixed adverse weather conditions. Yang et al. [21] introduced a language-driven restoration framework that exploits pre-trained vision–language models to infer the occurrence, type, and severity of degradation and to generate description-based degradation priors for all-in-one weather-degraded image restoration.

Even though the abovementioned all-in-one approaches have achieved impressive performance, they rely heavily on large amounts of manually labeled data, i.e., degraded/clean image pairs. Because paired degraded and clean images are inherently difficult to collect, existing fully supervised methods for adverse weather removal may struggle to generalize to real-world weather-degraded cases. Recently, self-learning methods have made significant progress in many vision tasks because they can leverage large-scale unlabeled data without expensive manual annotation, thereby improving model generalization [53]. Nevertheless, self-learning may face substantial challenges in all-in-one adverse weather image restoration, as the multiple degradation types in an all-in-one setting exhibit substantially different physical properties and visual characteristics. Without reliable ground-truth supervision from labeled data, purely self-learning methods, in which supervisory signals are generated automatically from the data itself rather than from manually annotated labels, may suffer from error accumulation and unstable optimization when handling diverse weather degradations. In contrast, the semi-supervised paradigm allows the model to benefit from a limited amount of reliable labeled data while simultaneously exploiting abundant unlabeled degraded images, thereby achieving a better trade-off between restoration fidelity and generalization [27,29]. In light of this, several approaches have employed semi-supervised learning to enhance the generalization of restoration networks on real-world weather-degraded images. For example, Long et al. [27] developed a semi-supervised all-in-one adverse weather removal framework built upon a teacher–student network that uses a wavelet diffusion model [28] as the backbone. Subsequently, Xu et al. [29] introduced a vision–language model to construct a semi-supervised learning framework aimed at enhancing restoration performance across multiple adverse weather conditions in real-world settings. However, both methods require the teacher network to generate high-quality pseudo-labels for supervising the student network, and inaccurate or biased pseudo-labels may propagate errors and thereby impair the restoration performance of the student model.

3. Proposed Semi-Supervised Degradation-Aware Learning Framework

3.1. Model Formulation

Let $y$ denote a weather-degraded image and $P_d$ its corresponding degradation-type prompt. This paper builds a Semi-Supervised Degradation-Aware Learning (S²DAL) framework that restores the clean image from its weather-degraded input under the guidance of degradation-type prompts. Similar to [54,55], this work decomposes the weather-degraded image $y$ into three components:

$$ y = x + D + R, \qquad R_{ij} \sim \mathcal{N}(0, \sigma^{2}), \tag{1} $$

where $x$, $D$, and $R$ represent the latent clean background, the degradation layer, and the residual term, respectively. The residual term $R$ is assumed to follow a zero-mean Gaussian distribution with variance $\sigma^{2}$, where $R_{ij}$ is the element of $R$ at spatial location $(i, j)$. The mapping from the observed degraded image $y$ to its underlying clean background $x$ is parameterized by a deep neural network, referred to as the image restoration model $f(\cdot; W)$, where $W$ represents the network parameters. In the following, we model the clean background $x$ and the degradation layer $D$ separately.
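To make the additive model in Equation (1) concrete, the following NumPy sketch composes a degraded observation from a clean background and a degradation layer. The array shapes, the random stand-ins for $x$ and $D$, and the function name `synthesize_degraded` are illustrative assumptions, not part of the S²DAL implementation.

```python
import numpy as np

def synthesize_degraded(x, D, sigma, rng):
    """Return y = x + D + R with R_ij ~ N(0, sigma^2), as in Equation (1)."""
    R = rng.normal(loc=0.0, scale=sigma, size=x.shape)  # zero-mean Gaussian residual term
    return x + D + R, R

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(3, 64, 64))                     # stand-in clean background (RGB)
D = 0.3 * (rng.uniform(size=(1, 64, 64)) > 0.97).astype(float)  # stand-in sparse degradation layer, broadcast over RGB
y, R = synthesize_degraded(x, D, sigma=0.01, rng=rng)
print(y.shape)  # (3, 64, 64)
```

Here $D$ is a crude sparse mask rather than a realistic rain or snow layer; in S²DAL the degradation layer is instead produced by the generator introduced in Section 3.1.2.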

3.1.1. Modeling of Background Layer

For images degraded by various adverse weather conditions, the corresponding clean background typically exhibits strong local correlations among neighboring pixels. To capture this prior knowledge, we introduce a Markov Random Field (MRF) prior for unlabeled degraded images $y$, formulated as

$$ p(W) \propto \exp\Big(-\rho \sum_{i,j} \upsilon_{ij}^{\top} \gamma\Big), \tag{2} $$

with

$$ \upsilon_{ij} = \begin{bmatrix} |f_{i+1,j} - f_{i,j}| \\ |f_{i,j+1} - f_{i,j}| \end{bmatrix}, \qquad \gamma = \begin{bmatrix} \gamma_{1} \\ \gamma_{2} \end{bmatrix}, \tag{3} $$

where $f_{i,j}$ represents the value of the restored image $f(y, P_d; W)$ at spatial location $(i, j)$, and $\rho$ and $\gamma$ are manually specified hyper-parameters. In other words, the prior penalizes weighted absolute differences between vertically and horizontally adjacent pixels of the restored image, so restorations with locally smooth backgrounds receive a higher prior probability.
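The MRF energy in Equations (2) and (3) is a weighted anisotropic total-variation penalty on the restored image. A minimal NumPy sketch follows; it assumes that pixels without a lower or right neighbour are simply skipped (the text does not specify boundary handling), and the function names are illustrative.

```python
import numpy as np

def mrf_energy(f, rho, gamma):
    """rho * sum_{i,j} upsilon_ij^T gamma from Equations (2)-(3).

    f: restored image of shape (..., H, W); gamma = (gamma_1, gamma_2).
    Only neighbour pairs that exist inside the image are counted (assumed boundary rule).
    """
    vertical = np.abs(f[..., 1:, :] - f[..., :-1, :])    # |f_{i+1,j} - f_{i,j}|
    horizontal = np.abs(f[..., :, 1:] - f[..., :, :-1])  # |f_{i,j+1} - f_{i,j}|
    return rho * (gamma[0] * vertical.sum() + gamma[1] * horizontal.sum())

def mrf_log_prior(f, rho, gamma):
    """Unnormalized log p(W) under Equation (2)."""
    return -mrf_energy(f, rho, gamma)

f = np.array([[0.0, 1.0],
              [2.0, 3.0]])
print(mrf_energy(f, rho=1.0, gamma=(1.0, 0.5)))  # 5.0 (vertical differences sum to 4, horizontal to 2)
```

Because the energy uses absolute rather than squared differences, it tolerates sharp edges better than a quadratic smoothness prior while still penalizing small-scale oscillations such as residual streaks.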

Furthermore, for labeled degraded images $y$, the corresponding ground-truth clean background $x$ provides additional strong priors, such as a Gaussian assumption and a linear correlation prior, which can be incorporated into Equation (2), i.e.,

$$ p(W) \propto \exp\Bigg(-\rho \sum_{i,j} \upsilon_{ij}^{\top}\gamma - \frac{\lVert f(y, P_d; W) - x \rVert_{2}}{\varepsilon_{0}} - \bigg(1 - \frac{\sum_{n=1}^{3HW} \big(f(y, P_d; W)_{n} - \bar{f}(y, P_d; W)\big)\big(x_{n} - \bar{x}\big)}{3HW\, \sigma\big(f(y, P_d; W)\big)\, \sigma(x)}\bigg)\Bigg), \tag{4} $$

where $\varepsilon_{0}$ is a small hyper-parameter close to zero, and $H$ and $W$ denote the height and width of the images. $f(y, P_d; W)_{n}$ denotes the $n$-th element of the restored image (over all three color channels), whereas $\bar{f}(y, P_d; W)$ and $\sigma(f(y, P_d; W))$ represent the mean and standard deviation of the restored image, respectively. Similarly, $x_{n}$ denotes the $n$-th element of the ground-truth clean image, whereas $\bar{x}$ and $\sigma(x)$ are its mean and standard deviation, respectively. The fraction in the last term is the Pearson correlation coefficient between the restored and ground-truth images, so the linear correlation prior encourages a linear relationship between them and vanishes when the correlation equals one. To realize the image restoration model $f(\cdot; W)$, we design a novel Degradation-guided Histogram Transformer (DHformer) architecture, whose detailed network architecture is described in Section 3.3.
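The two supervised terms of Equation (4) can be computed directly from a restored image and its ground truth. The NumPy sketch below is illustrative (function names are hypothetical); it uses population standard deviations so that the normalization by $3HW$ yields exactly the Pearson correlation.

```python
import numpy as np

def fidelity_term(f, x, eps0):
    """||f - x||_2 / eps0: the Gaussian-assumption term of Equation (4)."""
    return np.linalg.norm((f - x).ravel()) / eps0

def correlation_term(f, x):
    """1 - Pearson correlation over all 3HW values: the linear correlation prior of Equation (4)."""
    f_c = f.ravel() - f.mean()
    x_c = x.ravel() - x.mean()
    n = f.size                                  # 3HW for an RGB image
    r = (f_c @ x_c) / (n * f.std() * x.std())   # np.std uses ddof=0, matching the 3HW normalization
    return 1.0 - r

rng = np.random.default_rng(1)
x = rng.uniform(size=(3, 16, 16))
f = 2.0 * x + 0.1                        # affine intensity change of the ground truth
print(correlation_term(f, x))            # close to 0: perfectly linearly correlated
print(fidelity_term(f, x, eps0=1e-3))    # large: pixel values still differ
```

The example shows why the two terms are complementary: an output that differs from the ground truth only by a global brightness or contrast change is not penalized by the correlation term, but it is penalized by the fidelity term.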

3.1.2. Modeling of Degradation Layer

To simulate the diverse degradation patterns induced by various adverse weather conditions, a degradation generation model is constructed to model the degradation layer $D$, defined as

$$ D = G(z, P_d; \theta), \tag{5} $$

where $z \sim \mathcal{N}(0, I)$ denotes a noise vector (i.e., the latent variable), and $G(\cdot; \theta)$ represents a degradation generator parameterized by a deep neural network with parameters $\theta$. Specifically, we design a Degradation-guided Convolutional Neural Network (DCNN) that takes the latent variable $z$ as input and adaptively predicts the weather-related degradation layer under the guidance of the degradation-type prompts. Its detailed network architecture is also described in Section 3.3.

3.2. Learning Guideline

3.2.1. Maximum A Posteriori Estimation

To optimize the image restoration model $f(\cdot; W)$ and the degradation generation model $G(\cdot; \theta)$ described above, we adopt Maximum A Posteriori (MAP) estimation as the learning guideline for the model parameters $W$ and $\theta$, i.e.,

$$ \log p(W, \theta \mid y, P_d) = \log p(y, P_d \mid W, \theta) + \log p(W) + \mathrm{const} \triangleq \Omega(y, P_d; W, \theta), \tag{6} $$

where $p(y, P_d \mid W, \theta)$ denotes the joint likelihood of the observed degraded image $y$ and its corresponding degradation-type prompt $P_d$, whose integral form can be expressed as

$$ p(y, P_d \mid W, \theta) = \int p(y, P_d \mid W, \theta, z)\, p(z)\, dz = \int \mathcal{N}\big(y;\, f(y, P_d; W) + G(z, P_d; \theta),\, \sigma^{2} I\big)\, p(z)\, dz. \tag{7} $$

Equivalently, Equation (7) can be written in expectation form as

$$ p(y, P_d \mid W, \theta) = \mathbb{E}_{p(z)}\big[p(y, P_d \mid W, \theta, z)\big]. \tag{8} $$
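Equation (8) suggests a direct Monte Carlo estimator: draw $z$ from the prior and average the Gaussian likelihoods. The sketch below assumes a hypothetical linear generator $G(z) = Az$ in place of the DCNN, so that the exact marginal likelihood is available for comparison; the function name `mc_log_likelihood` is illustrative.

```python
import numpy as np

def mc_log_likelihood(residual, A, sigma, n_samples, rng):
    """Monte Carlo estimate of log E_{p(z)}[N(residual; A z, sigma^2 I)], cf. Equation (8).

    residual = y - f(y, P_d; W); G(z) = A z is a hypothetical linear generator.
    """
    d, k = A.shape
    z = rng.standard_normal((n_samples, k))                 # z ~ N(0, I)
    diff = residual[None, :] - z @ A.T                      # y - f - G(z) for every sample
    log_w = (-0.5 * np.sum(diff**2, axis=1) / sigma**2
             - 0.5 * d * np.log(2.0 * np.pi * sigma**2))    # log N(residual; A z, sigma^2 I)
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))           # log-mean-exp for numerical stability

rng = np.random.default_rng(0)
A = np.array([[0.5], [0.3]])
residual = np.array([0.2, -0.1])
sigma = 1.0
est = mc_log_likelihood(residual, A, sigma, n_samples=200_000, rng=rng)
S = A @ A.T + sigma**2 * np.eye(2)                          # exact marginal covariance (linear-Gaussian case)
_, logdet = np.linalg.slogdet(2.0 * np.pi * S)
exact = -0.5 * (residual @ np.linalg.solve(S, residual) + logdet)
print(est, exact)
```

For image-sized $y$, averaging over prior samples is typically very high-variance, because few prior draws explain a given observation well; sampling $z$ from the posterior, as in the E-step of Section 3.2.2, sidesteps this problem.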

To achieve adverse weather removal through a semi-supervised learning framework, we optimize the following objective under both labeled and unlabeled data settings:

$$ \max_{W, \theta} \log p(W, \theta \mid y, P_d) = \max_{W, \theta} \sum_{y \in \phi \cup \varphi} \Omega(y, P_d; W, \theta), \tag{9} $$

where $\phi = \{y_k, x_k\}_{k=1}^{N_\phi}$ and $\varphi = \{y_k\}_{k=1}^{N_\varphi}$ denote the labeled and unlabeled datasets, respectively. Here $y_k$ represents the $k$-th weather-degraded image, and $x_k$ is its corresponding ground-truth clean image.

3.2.2. Optimization Algorithm

To optimize the model parameters $W$ and $\theta$ in Equation (9), an alternating back-propagation technique, namely the Monte Carlo-based Expectation-Maximization (EM [56]) algorithm, is introduced to maximize $\Omega(y, P_d; W, \theta)$. Specifically, in the Expectation step (E-step), the latent variable $z$ is sampled from the posterior distribution $p(z \mid y, P_d)$. Given the inferred latent variable $z$, the model parameters $W$ and $\theta$ are then updated in the Maximization step (M-step).

E-step: Let $\{W^{\mathrm{old}}, \theta^{\mathrm{old}}\}$ denote the current model parameters and $p_{\mathrm{old}}(z \mid y, P_d)$ the posterior distribution under them. The E-step then samples the latent variable $z$ from $p_{\mathrm{old}}(z \mid y, P_d)$ using Langevin dynamics:

$$ \begin{aligned} z^{(t+1)} &= z^{(t)} + \frac{\alpha^{2}}{2} \left[\frac{\partial}{\partial z} \log p_{\mathrm{old}}(z \mid y, P_d)\right]_{z = z^{(t)}} + \alpha \xi^{(t)} \\ &= z^{(t)} - \frac{\alpha^{2}}{2} \left[\frac{\partial}{\partial z} J(z)\right]_{z = z^{(t)}} + \alpha \xi^{(t)}, \end{aligned} \tag{10} $$

where $t$ indexes the time step of the Langevin dynamics, $\alpha$ is the step size, $\xi^{(t)} \sim \mathcal{N}(0, I)$ is Gaussian white noise introduced to prevent the chain from being trapped in local modes, and $J(z)$ is defined as

$$ J(z) = \frac{1}{2\sigma^{2}} \big\lVert y - f(y, P_d; W^{\mathrm{old}}) - G(z, P_d; \theta^{\mathrm{old}}) \big\rVert_{2}^{2} + \frac{1}{2} \lVert z \rVert_{2}^{2}. \tag{11} $$

Up to an additive constant, $J(z)$ is the negative log-posterior $-\log p_{\mathrm{old}}(z \mid y, P_d)$: the first term comes from the Gaussian likelihood in Equation (7) and the second from the standard normal prior on $z$, which is why the two lines of Equation (10) coincide. In practice, to reduce the computational cost of Markov chain Monte Carlo (MCMC) sampling, the update in Equation (10) at each learning iteration is initialized from the previous sampling result rather than from a new random state.
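The E-step update of Equations (10) and (11) can be sketched for a hypothetical linear generator $G(z) = Az$, for which $\partial J / \partial z = -A^{\top}(y - f - Az)/\sigma^{2} + z$ has a closed form. This is an illustration of the sampler only, not the DCNN; names such as `langevin_e_step` are assumptions. In the linear-Gaussian case the posterior mean is known exactly, which gives a sanity check.

```python
import numpy as np

def grad_J(z, residual, A, sigma):
    """Gradient of J(z) in Equation (11) for a hypothetical linear generator G(z) = A z.

    z: (n_chains, k); residual = y - f(y, P_d; W_old) with shape (d,).
    """
    recon_err = residual[None, :] - z @ A.T        # y - f - G(z)
    return -(recon_err @ A) / sigma**2 + z          # likelihood gradient + prior gradient

def langevin_e_step(z, residual, A, sigma, alpha, n_steps, rng):
    """Run Equation (10) for n_steps from z; reusing z across calls gives persistent chains."""
    for _ in range(n_steps):
        xi = rng.standard_normal(z.shape)           # Gaussian white noise xi^(t)
        z = z - 0.5 * alpha**2 * grad_J(z, residual, A, sigma) + alpha * xi
    return z

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]])
residual = np.array([1.0, -0.5, 0.5, 0.25])
sigma, alpha = 0.5, 0.1
z = langevin_e_step(np.zeros((2000, 2)), residual, A, sigma, alpha, n_steps=500, rng=rng)
# Closed-form posterior mean for the linear-Gaussian case, for comparison
mu = np.linalg.solve(A.T @ A / sigma**2 + np.eye(2), A.T @ residual / sigma**2)
print(z.mean(axis=0), mu)
```

The step size must be small relative to the curvature of $J$ (here roughly $\alpha^{2}/2$ times the largest eigenvalue of $A^{\top}A/\sigma^{2} + I$ must stay below 2), otherwise the unadjusted update diverges.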

M-step: For convenience, the latent variable sampled in the E-step is denoted as $\hat{z}$. The M-step maximizes the approximate lower bound with respect to the parameters $W$ and $\theta$:

$$ \begin{aligned} \max_{W, \theta} \Omega(y, P_d; W, \theta) &= \max_{W, \theta} \left\{ \int p_{\mathrm{old}}(z \mid y, P_d) \log p(y, P_d, z \mid W, \theta)\, dz + \log p(W) \right\} \\ &\approx \max_{W, \theta} \left\{ \log p(y, P_d, \hat{z} \mid W, \theta) + \log p(W) \right\}. \end{aligned} \tag{12} $$

Since MAP estimation is equivalent to empirical risk minimization with regularization terms, Equation (12) can be further reformulated as the following minimization problem:

$$ \begin{aligned} \min_{W, \theta} L(W, \theta) = \min_{W, \theta} \Bigg\{ & \frac{1}{2\sigma^{2}} \big\lVert y - f(y, P_d; W) - G(\hat{z}, P_d; \theta) \big\rVert_{2}^{2} + \rho \sum_{i,j} \upsilon_{ij}^{\top}\gamma + \mathbb{1}_{[y \in \phi]} \cdot \frac{\lVert f(y, P_d; W) - x \rVert_{2}}{\varepsilon_{0}} \\ & + \mathbb{1}_{[y \in \phi]} \cdot \frac{1}{2}\bigg(1 - \frac{\sum_{n=1}^{3HW} \big(f(y, P_d; W)_{n} - \bar{f}(y, P_d; W)\big)\big(x_{n} - \bar{x}\big)}{3HW\, \sigma\big(f(y, P_d; W)\big)\, \sigma(x)}\bigg) \Bigg\}, \end{aligned} \tag{13} $$

where $\mathbb{1}_{[y \in \phi]}$ is an indicator function that equals 1 for labeled samples and 0 otherwise, so the two supervised terms are active only for labeled data. To solve Equation (13), gradient descent based on back-propagation (BP) is employed to update the model parameters $W$ and $\theta$:

$$ \Theta \leftarrow \Theta - \eta \frac{\partial}{\partial \Theta} L(W, \theta), \qquad \Theta \in \{W, \theta\}, \tag{14} $$

where $\eta$ is the step size (learning rate). The optimization procedure of the proposed S²DAL framework is summarized in Algorithm 1.

Algorithm 1 Optimization procedure for S²DAL

Require: Training data $\phi = \{y_k, x_k\}_{k=1}^{N_\phi}$ and $\varphi = \{y_k\}_{k=1}^{N_\varphi}$ with the corresponding degradation-type prompts $P_d$; the number of iterations per epoch $T$; the number of Langevin steps $l$.
1: Randomly initialize $W$ and $\theta$
2: while not converged do
3:   for $j = 1, 2, \dots, T$ do
4:     Sample a mini-batch from $\phi$ or $\varphi$
5:     E-step: for each example $y_k$, run $l$ steps of Langevin dynamics (Equation (10)) to sample $z$
6:     M-step: starting from the values of $W$ and $\theta$ obtained in the previous update, optimize them further using Equation (14)
7:   end for
8: end while
9: return the optimized weights $\{W^{*}, \theta^{*}\}$
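Algorithm 1 alternates Langevin sampling with gradient updates. The toy sketch below mirrors that loop under strong simplifying assumptions: the restoration network is frozen, so each example contributes a fixed residual $y_k - f(y_k, P_d; W)$; the generator is a hypothetical linear map $G(z; \theta) = \theta z$; the whole toy dataset is used as one batch; and only the likelihood term of Equation (13), the sole term that depends on $\theta$, is optimized. Latent chains persist across iterations, following the warm-start practice described after Equation (11).

```python
import numpy as np

def train_toy_s2dal(residuals, k, sigma=0.5, alpha=0.1, n_langevin=20,
                    lr=0.01, n_iters=500, seed=0):
    """Toy alternating E/M loop in the spirit of Algorithm 1 (frozen restoration model).

    residuals: (N, d) array of y_k - f(y_k, P_d; W); theta is a hypothetical (d, k) linear generator.
    """
    rng = np.random.default_rng(seed)
    n, d = residuals.shape
    theta = 0.1 * rng.standard_normal((d, k))
    z = np.zeros((n, k))                                  # one persistent chain per example
    for _ in range(n_iters):
        # E-step: l Langevin steps of Equation (10), warm-started from the previous z
        for _ in range(n_langevin):
            err = residuals - z @ theta.T
            grad_z = -(err @ theta) / sigma**2 + z
            z = z - 0.5 * alpha**2 * grad_z + alpha * rng.standard_normal(z.shape)
        # M-step: one gradient step (Equation (14)) on the theta-dependent term of Equation (13)
        err = residuals - z @ theta.T
        grad_theta = -(err.T @ z) / (n * sigma**2)
        theta = theta - lr * grad_theta
    return theta, z

rng = np.random.default_rng(42)
A_true = rng.standard_normal((8, 2))
residuals = rng.standard_normal((64, 2)) @ A_true.T + 0.05 * rng.standard_normal((64, 8))
theta, z = train_toy_s2dal(residuals, k=2)
initial_mse = np.mean(residuals**2)                       # reconstruction error with z = 0
final_mse = np.mean((residuals - z @ theta.T) ** 2)
print(initial_mse, final_mse)
```

With a linear generator this reduces to alternating back-propagation for a factor-analysis model; the real framework replaces both the generator and the frozen restoration output with trainable networks updated jointly by back-propagation.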

3.3. Network Architecture

The overall architecture of the proposed Semi-Supervised Degradation-Aware Learning (S²DAL) framework is illustrated in Figure 1, which is composed of the following two backbone networks.

3.3.1. Degradation-Guided Histogram Transformer

Since diverse weather-induced degradation usually exhibits different patterns, we develop a novel Degradation-guided Histogram Transformer (DHformer) to adaptively remove these weather-induced degradation patterns, according to the guidance of corresponding degradation-type prompts.

As shown in Figure 1, the proposed DHformer first extracts shallow features $F_0$ via a $3 \times 3$ convolutional layer. These features $F_0$ are then fed into a 4-level encoder–decoder network to extract deep features, which are finally converted to the restored image $f(y, P_d; W)$ via another $3 \times 3$ convolutional layer. Within both the encoder and the decoder, multiple Degradation-guided Histogram Transformer (DHT) blocks are designed to mine the inherent features of the clean image while reducing the interference of different weather-induced degradations. At each encoding stage, the DHformer introduces a coarse skip connection to supplement the original input features. This branch is implemented through a sequence of operations, including average pooling, point-wise convolution, and depth-wise convolution, and enables the encoders to focus more on weather-induced degradation factors. Furthermore, at each encoding–decoding level, the encoder and decoder are linked via skip connections that transfer intermediate features across matching levels, thereby enhancing training stability.

Degradation-guided Histogram Transformer Block. To fully mine the inherent features of clean images while reducing the interference of different weather-induced degradation patterns, we propose the Degradation-guided Histogram Transformer (DHT) block as the basic unit of DHformer. The DHT block adaptively regulates the feature space through channel shuffling modulation to better align it with the shared parameters and network structure, thereby enhancing all-in-one restoration performance. As shown in Figure 2, the DHT block, the key component of DHformer, is composed of three pivotal modules, namely the Degradation-guided Channel Shuffling Module (DCSM), Dynamic-range Histogram Self-Attention (DHSA), and Dual-scale Gated Feed Forward (DGFF). These modules interact in conjunction with layer normalization, and the overall process can be formulated as follows:

$$ F_{j} = \mathrm{DCSM}(F_{j-1}), \tag{15} $$

$$ F_{j} = F_{j} + \mathrm{DHSA}(\mathrm{LN}(F_{j})), \tag{16} $$

$$ F_{j} = F_{j} + \mathrm{DGFF}(\mathrm{LN}(F_{j})), \tag{17} $$

where $\mathrm{LN}(\cdot)$ denotes layer normalization and $F_j$ represents the features at the $j$-th stage.
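Equations (15)–(17) describe a channel-shuffling step followed by two pre-normalized residual branches. The NumPy sketch below shows only this wiring; the three modules are passed in as callables, and the per-pixel layer normalization over channels without learnable scale and shift is an assumption, since the text does not specify the LN variant.

```python
import numpy as np

def layer_norm_channels(F, eps=1e-5):
    """Normalize each spatial position over the channel axis; no learnable affine (assumption)."""
    mean = F.mean(axis=0, keepdims=True)
    var = F.var(axis=0, keepdims=True)
    return (F - mean) / np.sqrt(var + eps)

def dht_block(F_prev, dcsm, dhsa, dgff):
    """Equations (15)-(17); dcsm, dhsa and dgff are caller-supplied callables on (C, H, W) arrays."""
    F = dcsm(F_prev)                        # Eq. (15): prompt-guided channel shuffling
    F = F + dhsa(layer_norm_channels(F))    # Eq. (16): residual attention on normalized features
    F = F + dgff(layer_norm_channels(F))    # Eq. (17): residual feed-forward on normalized features
    return F

def identity(t):
    return t

def zeros(t):
    return np.zeros_like(t)

F0 = np.random.default_rng(0).standard_normal((8, 4, 4))
out = dht_block(F0, identity, zeros, zeros)  # with zero branches, only the DCSM output passes through
print(np.allclose(out, F0))  # True
```

Because DCSM is applied without a residual shortcut, any information it discards cannot be recovered by the later branches, which is consistent with the emphasis in the next paragraph on shuffling without feature loss.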

Regarding DCSM, its objective is to reorder the feature channels through a shuffling mechanism under the guidance of the weather degradation-type prompts. However, directly shuffling the channels of the features $F_{j-1} \in \mathbb{R}^{C \times H \times W}$ may make the model difficult to converge due to excessive perturbation [47], which ultimately degrades restoration performance. To address this issue, we follow the practice in [47] and expand the channel number of the features $F_{j-1} \in \mathbb{R}^{C \times H \times W}$ to obtain $\hat{F}_{j-1} \in \mathbb{R}^{2C \times H \times W}$, which allows the feature channels to be shuffled in a higher-dimensional channel space through the Degradation Guidance Module (DGM). In DGM, the weather degradation-type prompt $P_d \in \mathbb{R}^{C \times 1}$ is encoded into a corresponding index vector that indicates the order of the feature channels in the shuffling process. To match the dimensionality of $P_d$ to the channel dimensionality of $\hat{F}_{j-1}$, a Multi-Layer Perceptron (MLP) containing two linear layers converts the input prompt $P_d \in \mathbb{R}^{C \times 1}$ into $\hat{P}_d \in \mathbb{R}^{2C \times 1}$, matching the number of channels in $\hat{F}_{j-1} \in \mathbb{R}^{2C \times H \times W}$. In the channel shuffling stage, the index values in $\hat{P}_d$ are used to reorder the feature channels of $\hat{F}_{j-1}$. After shuffling, the number of channels in $\hat{F}_{j-1}$ is reduced by half to keep the channel count consistent before and after the transformation. The overall process of DCSM is formulated as follows:

$$ F_{j} = \mathrm{Conv}_{1 \times 1}^{h}\Big(\mathrm{CS}_{\mathrm{top}K}\big(\mathrm{Conv}_{1 \times 1}^{d}(F_{j-1}) \mid \mathrm{DGM}(P_d)\big)\Big), \tag{18} $$

where $\mathrm{Conv}_{1 \times 1}^{d}(\cdot)$ and $\mathrm{Conv}_{1 \times 1}^{h}(\cdot)$ denote $1 \times 1$ point-wise convolutions that double and halve the feature channels, respectively, and $\mathrm{CS}_{\mathrm{top}K}(\cdot)$ is a top-$K$ channel shuffling operation guided by the index values produced by $\mathrm{DGM}(P_d)$. By reordering the feature channels under the guidance of degradation-type prompts, this adaptive shuffling strategy can effectively mitigate inter-channel interference while preserving intrinsic image characteristics, because the channel shuffling process incurs no loss of feature information. Note that, to obtain the degradation-type prompts, we use a pre-trained CLIP [50] model to encode textual descriptions of the degradation types.
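A minimal NumPy sketch of the data flow in Equation (18) follows. Several details are assumptions not fixed by the text: the DGM is modeled as a two-layer MLP with a ReLU between the layers, the index vector is obtained by sorting the $2C$ scores in descending order, $\mathrm{CS}_{\mathrm{top}K}$ is replaced by a full permutation ($K = 2C$), and a random vector stands in for the CLIP prompt embedding. All function names are hypothetical.

```python
import numpy as np

def pointwise_conv(F, weight):
    """1x1 convolution: weight (C_out, C_in) applied to F (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, F)

def dgm_channel_order(p_d, w1, w2):
    """Hypothetical DGM: a two-layer MLP maps P_d (C,) to P_hat_d (2C,); argsort turns scores into an order."""
    hidden = np.maximum(w1 @ p_d, 0.0)   # first linear layer + ReLU (activation is an assumption)
    p_hat = w2 @ hidden                   # second linear layer -> 2C scores
    return np.argsort(-p_hat)             # channel indices sorted by descending score

def dcsm(F, p_d, params):
    """Equation (18) with a full permutation standing in for CS_topK (K = 2C assumed)."""
    F_hat = pointwise_conv(F, params["w_double"])              # Conv^d: C -> 2C channels
    order = dgm_channel_order(p_d, params["w1"], params["w2"])
    F_shuffled = F_hat[order]                                  # channels are permuted, not discarded
    return pointwise_conv(F_shuffled, params["w_half"]), order  # Conv^h: 2C -> C channels

rng = np.random.default_rng(0)
C, H, W, hidden = 4, 5, 5, 8
params = {
    "w_double": rng.standard_normal((2 * C, C)),
    "w1": rng.standard_normal((hidden, C)),
    "w2": rng.standard_normal((2 * C, hidden)),
    "w_half": rng.standard_normal((C, 2 * C)),
}
F = rng.standard_normal((C, H, W))
p_d = rng.standard_normal(C)   # stand-in for a CLIP text embedding projected to C dimensions
out, order = dcsm(F, p_d, params)
print(out.shape)  # (4, 5, 5)
```

One design consequence is worth noting: a fixed permutation followed by a $1 \times 1$ convolution could simply be absorbed into the convolution weights, so the modulation effect in this sketch comes entirely from the channel order changing with the prompt.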

Regarding DHSA, its objective is to capture dynamically distributed weather-induced degradation. For that, this module comprises a dynamic-range convolution that re-arranges the spatial distribution of partial features, together with a dual-path histogram self-attention mechanism [18] that combines global and local dynamic feature aggregation. For the dynamic-range convolution, it divides the input features

F_{j}

into two branches along the channel dimension, i.e.,

F_{j}^{1}

and

F_{j}^{2}

. For the first feature branch, sorting operations are conducted along both the horizontal and vertical directions, after which the sorted features are concatenated with those from the second branch. The resulting re-combined features are subsequently passed through a separable convolution layer. Accordingly, the entire calculation process of dynamic-range convolution can be formulated as follows:

$$ F_{j}^{1}, F_{j}^{2} = \mathrm{Split}(F_{j}), \quad F_{j}^{1} = \mathrm{Sort}_{v}\big(\mathrm{Sort}_{h}(F_{j}^{1})\big), \tag{19} $$

$$ F_{j} = \mathrm{Conv}_{3\times 3}^{d}\Big(\mathrm{Conv}_{1\times 1}\big(\mathrm{Concat}(F_{j}^{1}, F_{j}^{2})\big)\Big), \tag{20} $$

where $\mathrm{Split}(\cdot)$ splits features along the channel dimension, and $\mathrm{Sort}_{i}(\cdot)$, $i \in \{v, h\}$, sorts them vertically or horizontally. $\mathrm{Conv}_{1\times 1}(\cdot)$ is a $1\times 1$ point-wise convolution, and $\mathrm{Conv}_{3\times 3}^{d}(\cdot)$ is a $3\times 3$ depth-wise convolution. By arranging high- and low-intensity pixels into structured patterns at opposite diagonal corners of each feature matrix, dynamic-range convolution enables convolutional operations to capture interactions across different dynamic ranges.
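The ordering effect of Eq. (19) can be sketched in pure Python on toy 2D feature maps. This illustrative sketch assumes ascending sorts and an even split of the channels; the learnable convolutions of Eq. (20) are omitted, and the function names are hypothetical. Sorting every row and then every column places the smallest value in the top-left corner and the largest in the bottom-right corner, which matches the diagonal-corner arrangement described above.

```python
from typing import List

Matrix = List[List[float]]


def sort_h(x: Matrix) -> Matrix:
    """Sort_h: sort each row (horizontal direction) in ascending order."""
    return [sorted(row) for row in x]


def sort_v(x: Matrix) -> Matrix:
    """Sort_v: sort each column (vertical direction) in ascending order."""
    cols = [sorted(col) for col in zip(*x)]
    return [list(row) for row in zip(*cols)]


def dynamic_range_reorder(channels: List[Matrix]) -> List[Matrix]:
    """Eq. (19) without learnable layers: split the channels into two halves,
    apply Sort_v(Sort_h(.)) to the first half, keep the second half as-is, and
    concatenate them again (the convolutions of Eq. (20) are omitted)."""
    half = len(channels) // 2
    first, second = channels[:half], channels[half:]
    return [sort_v(sort_h(c)) for c in first] + [[list(r) for r in c] for c in second]


if __name__ == "__main__":
    x = [[5.0, 1.0, 9.0],
         [3.0, 8.0, 2.0],
         [7.0, 4.0, 6.0]]
    y = sort_v(sort_h(x))
    for row in y:
        print(row)
    # The smallest value ends up top-left, the largest bottom-right.
    assert y[0][0] == min(map(min, x)) and y[-1][-1] == max(map(max, x))
```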

On the other hand, pixels corresponding to background content or to weather degradations of varying intensity should receive different degrees of attention. Accordingly, a dual-path histogram self-attention is introduced to partition spatial elements into discrete bins and then adaptively assign attention within and across bins. The output of dynamic-range convolution, $F_{j} \in \mathbb{R}^{C\times H\times W}$, is first decomposed into value features $V$ and two query–key pairs, i.e., $F_{j}^{QK,1}$ and $F_{j}^{QK,2}$, which are fed into two separate branches. The sequence $V$ is sorted, after which the corresponding query–key pairs are rearranged according to the resulting indices:

$$ V, t = \mathrm{Sort}\big(R_{C\times H\times W}^{C\times HW}(F_{j})\big), \tag{21} $$

$$ Q^{1}, K^{1} = \mathrm{Split}\Big(\mathrm{Gather}\big(R_{C\times H\times W}^{C\times HW}(F_{j}^{QK,1}), t\big)\Big), \tag{22} $$

$$ Q^{2}, K^{2} = \mathrm{Split}\Big(\mathrm{Gather}\big(R_{C\times H\times W}^{C\times HW}(F_{j}^{QK,2}), t\big)\Big), \tag{23} $$

where $R_{C\times H\times W}^{C\times HW}(\cdot)$ reshapes features from $\mathbb{R}^{C\times H\times W}$ to $\mathbb{R}^{C\times HW}$, $t$ is the index of the sorted values, and $\mathrm{Gather}(\cdot)$ retrieves elements according to given indices. Furthermore, to capture both global and local information, two reshaping operations are defined, namely Bin-wise Histogram Reshaping (BHR) and Frequency-wise Histogram Reshaping (FHR). In BHR, the number of bins is set to $N$, so each bin contains $HW/N$ elements. In FHR, the frequency of each bin (the number of elements it holds) is $B$, so the number of bins is $HW/B$. In this manner, BHR captures large-scale contextual information, since each bin aggregates a large number of dynamically distributed pixels, whereas FHR facilitates the extraction of fine-grained details by grouping a small number of pixels with similar intensity values into each bin. The two query–key pairs are processed separately by the two reshaping operations followed by self-attention, and their outputs are multiplied element-wise to produce the final result:

$$ M_{B} = \mathrm{softmax}\left(\frac{R_{B}(Q^{1})\, R_{B}(K^{1})^{T}}{\sqrt{k}}\right) R_{B}(V), \tag{24} $$

$$ M_{F} = \mathrm{softmax}\left(\frac{R_{F}(Q^{2})\, R_{F}(K^{2})^{T}}{\sqrt{k}}\right) R_{F}(V), \tag{25} $$

$$ M = M_{B} \odot M_{F}, \tag{26} $$

where $k$ denotes the number of heads, $R_{B}(\cdot)$ and $R_{F}(\cdot)$ denote the BHR and FHR reshaping operations, respectively, and $M_{B}$ and $M_{F}$ are the corresponding attention outputs.
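A single-channel, single-head sketch of Eqs. (21)–(26) in pure Python is given below. It follows the textual description of BHR and FHR by cutting the sorted sequence into consecutive bins ($N$ bins of $HW/N$ elements and $HW/B$ bins of $B$ elements, respectively). This contiguous binning, the function names, and the toy sizes are assumptions for illustration; the sketch is not the authors' code or that of Histoformer [18], which may arrange bins differently. Bins act as the rows of the attention product, and the logits are scaled by $\sqrt{k}$ as in the equations above.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def sort_with_index(v: Sequence[float]) -> Tuple[Vec, List[int]]:
    """Eq. (21) for one flattened channel: ascending values and their indices t."""
    t = sorted(range(len(v)), key=lambda i: v[i])
    return [v[i] for i in t], t


def gather(x: Sequence[float], t: Sequence[int]) -> Vec:
    """Eqs. (22)-(23): reorder a query or key sequence with the value indices t."""
    return [x[i] for i in t]


def to_bins(seq: Sequence[float], bin_size: int) -> List[Vec]:
    """Cut a sorted sequence into consecutive bins of `bin_size` elements."""
    if bin_size <= 0 or len(seq) % bin_size != 0:
        raise ValueError("HW must be divisible by the bin size")
    return [list(seq[i:i + bin_size]) for i in range(0, len(seq), bin_size)]


def bhr(seq: Sequence[float], n_bins: int) -> List[Vec]:
    """BHR: N bins, each holding HW / N elements."""
    return to_bins(seq, len(seq) // n_bins)


def fhr(seq: Sequence[float], freq: int) -> List[Vec]:
    """FHR: HW / B bins, each holding B elements."""
    return to_bins(seq, freq)


def bin_attention(q: List[Vec], k: List[Vec], v: List[Vec], num_heads: int = 1) -> Vec:
    """softmax(Q K^T / sqrt(k)) V with bins as rows; returns the flattened output."""
    scale = math.sqrt(num_heads)
    out: Vec = []
    for qi in q:
        logits = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        peak = max(logits)
        weights = [math.exp(l - peak) for l in logits]  # numerically stable softmax
        total = sum(weights)
        for c in range(len(v[0])):
            out.append(sum(w / total * vj[c] for w, vj in zip(weights, v)))
    return out


def dual_path_attention(v: Vec, qk1: Tuple[Vec, Vec], qk2: Tuple[Vec, Vec],
                        n_bins: int, freq: int) -> Vec:
    """Eqs. (21)-(26) for a single channel and a single head."""
    v_sorted, t = sort_with_index(v)
    q1, k1 = (gather(x, t) for x in qk1)
    q2, k2 = (gather(x, t) for x in qk2)
    m_b = bin_attention(bhr(q1, n_bins), bhr(k1, n_bins), bhr(v_sorted, n_bins))
    m_f = bin_attention(fhr(q2, freq), fhr(k2, freq), fhr(v_sorted, freq))
    return [a * b for a, b in zip(m_b, m_f)]  # M = M_B (element-wise) M_F


if __name__ == "__main__":
    v = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4]  # toy H*W = 8
    m = dual_path_attention(v, (v, v), (v, v), n_bins=2, freq=2)
    print(len(m), [round(x, 3) for x in m])
```

With $HW = 8$, BHR yields 2 bins of 4 elements and FHR yields 4 bins of 2 elements, so $M_B$ and $M_F$ both flatten back to 8 values before the element-wise product.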

Regarding DGFF, its objective is to better exploit the correlations among dynamically distributed weather-induced degradations. To achieve this, DGFF incorporates two complementary pathways composed of multi-range and multi-scale depth-wise convolutions. Given the DGFF input $F_{j} \in \mathbb{R}^{C\times H\times W}$, a $1\times 1$ point-wise convolution is first applied to expand the channel dimension. The expanded features are then fed into two parallel branches, where a $5\times 5$ depth-wise convolution and a dilated $3\times 3$ depth-wise convolution, respectively, strengthen the extraction of multi-range and multi-scale representations. Following the gating mechanism in [57], the output of the second branch serves as a gating map for the first branch. Accordingly, the overall feature fusion process within DGFF can be formulated as follows:

$$ F_{j,1}, F_{j,2} = \mathrm{Split}\big(\mathrm{Shuffle}(\mathrm{Conv}_{1\times 1}(F_{j}))\big), \tag{27} $$

$$ F_{j,1} = \mathrm{Conv}_{5\times 5}^{d}(F_{j,1}), \quad F_{j,2} = \mathrm{Conv}_{3\times 3}^{d,\,\mathrm{dilated}}(F_{j,2}), \tag{28} $$

$$ F_{j} = \mathrm{Conv}_{1\times 1}\big(\mathrm{Unshuffle}(\mathrm{Mish}(F_{j,2}) \odot F_{j,1})\big), \tag{29} $$

where $\mathrm{Conv}_{5\times 5}^{d}(\cdot)$ denotes a $5\times 5$ depth-wise convolution, and $\mathrm{Conv}_{3\times 3}^{d,\,\mathrm{dilated}}(\cdot)$ is a $3\times 3$ dilated depth-wise convolution. $\mathrm{Shuffle}(\cdot)$ and $\mathrm{Unshuffle}(\cdot)$ denote pixel-shuffling and pixel-unshuffling operations, respectively, and $\mathrm{Mish}(\cdot)$ represents the Mish activation [58].
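The gating in Eq. (29) and the receptive-field gain of the dilated branch can be checked with a few lines of pure Python. This sketch treats features as flat lists, omits the convolutions and pixel (un)shuffling, and uses a dilation rate of 2 purely as an example, since the rate is not stated here; the helper names are hypothetical. Mish is computed as $x \tanh(\mathrm{softplus}(x))$, its standard definition [58].

```python
import math
from typing import List, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def gate(branch1: Sequence[float], branch2: Sequence[float]) -> List[float]:
    """Core of Eq. (29): Mish(F_{j,2}) acts as an element-wise gate on F_{j,1}."""
    return [mish(g) * f for f, g in zip(branch1, branch2)]


def effective_kernel(kernel: int, dilation: int) -> int:
    """Spatial extent covered by a dilated kernel: dilation * (kernel - 1) + 1."""
    return dilation * (kernel - 1) + 1


if __name__ == "__main__":
    print(gate([1.0, 1.0, 1.0], [-3.0, 0.0, 3.0]))  # strongly negative gates ~ 0
    print(effective_kernel(3, 2))  # a 3x3 kernel with dilation 2 spans 5x5
```

Under this assumed rate, the dilated $3\times 3$ branch spans the same $5\times 5$ extent as the other branch with fewer weights, one way to read the "multi-range" design.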

3.3.2. Degradation-Guided Convolutional Neural Network

The DCNN plays a crucial role in the proposed S²DAL framework, as it directly affects adverse weather removal performance. As shown in Figure 1, a latent variable $z \sim \mathcal{N}(0, I)$ is first lifted to a higher dimension by a fully connected layer with ReLU activation, i.e., from $\mathbb{R}^{1\times L}$ to $\mathbb{R}^{1\times 16L}$, where $L = 256$. The output of the fully connected layer is then reshaped from $\mathbb{R}^{1\times 16L}$ to $\mathbb{R}^{1\times [\frac{H}{r}]\times [\frac{W}{r}]}$, where $r = 1$ and $H = W = 64$, yielding the feature tensor $T$. This process can be expressed as:

$$ T = R_{T}\big(\mathrm{ReLU}(\mathrm{FC}(z))\big), \tag{30} $$

where $\mathrm{FC}(\cdot)$ is a fully connected layer, $\mathrm{ReLU}(\cdot)$ denotes a Rectified Linear Unit, and $R_{T}(\cdot)$ represents a reshaping operation.
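The dimension bookkeeping of Eq. (30) can be verified with a small pure-Python sketch. With $L = 256$, $r = 1$, and $H = W = 64$, the fully connected layer outputs $16L = 4096 = 64 \times 64$ values, so the reshape is exact. The toy run below uses a much smaller size ($L = 4$ and an $8 \times 8$ map) and random weights so that it executes quickly; the function name and the weight initialization are assumptions, not the DCNN's actual parameters.

```python
import random
from typing import List


def fc_relu_reshape(z: List[float], weights: List[List[float]], bias: List[float],
                    height: int, width: int) -> List[List[float]]:
    """T = R_T(ReLU(FC(z))): a linear layer, ReLU, then a reshape to height x width."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z)) + b)
              for row, b in zip(weights, bias)]
    if len(hidden) != height * width:
        raise ValueError("FC output size must equal height * width")
    return [hidden[i * width:(i + 1) * width] for i in range(height)]


# Configuration stated in the text: L = 256, H = W = 64, r = 1.
L, H, W, r = 256, 64, 64, 1
assert 16 * L == (H // r) * (W // r)  # 4096 = 64 * 64, so the reshape is exact

# Tiny stand-in (L = 4, 8 x 8 map) with random weights so the example runs quickly.
L_toy, side = 4, 8
rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(L_toy)]  # z ~ N(0, I)
weights = [[rng.gauss(0.0, 0.5) for _ in range(L_toy)] for _ in range(16 * L_toy)]
bias = [0.0] * (16 * L_toy)
T = fc_relu_reshape(z, weights, bias, side, side)
print(len(T), len(T[0]))  # 8 8
```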

Afterwards, the feature tensor $T$ is passed through a shallow convolutional network composed mainly of five $3\times 3$ convolutional layers and two Degradation-guided Channel Shuffling Modules (DCSMs). To facilitate recovery, the encoding features from the first and second convolutions are concatenated with the decoding features of the third and fourth convolutions via skip connections. DCSM is also inserted between the encoder and decoder, so that the encoding feature channels are shuffled under the guidance of weather degradation-type prompts. By incorporating DCSM, the generation of different weather-induced degradation layers can be more effectively disentangled, which facilitates the generation of more realistic degradation layers. Further details of the DCNN configuration are given in Table 1.

4. Experiments

4.1. Datasets

To evaluate performance, we follow prior works [15,18] and employ the curated AllWeather dataset, which combines subsets of the training images from Snow100K [59], Outdoor-Rain [60], and RainDrop [61]. The AllWeather dataset provides a balanced training set across three adverse weather conditions: snow, heavy rain with haze, and raindrops. Specifically, the training set comprises 9000 images from Snow100K [59], 818 images from RainDrop [61], and 8250 images from Outdoor-Rain [60]. Snow100K consists of synthetic snow-degraded images, RainDrop contains images affected by raindrops, and Outdoor-Rain provides synthetic images simultaneously corrupted by rain streaks and haze. For performance evaluation, we used the Test1 set [60] with 750 rain–haze images, the RainDrop [61] test set with 58 raindrop images, and the Snow100K-L and Snow100K-S test sets [59] with 16801 large-snow images and 16611 small-snow images, respectively. The number of samples and the corresponding degradation-type prompts for these three datasets are listed in Table 2.

Quantitative evaluations between ground-truth and restored images were conducted using the conventional Peak Signal-to-Noise Ratio (PSNR [62]) and Structural SIMilarity (SSIM [63]), for which higher values indicate better performance. In addition, two no-reference metrics were adopted to assess real-world restoration quality, namely the Natural Image Quality Evaluator (NIQE [64]) and the feature-enriched completely blind image quality evaluator (IL-NIQE [65]). Lower NIQE and IL-NIQE scores indicate better perceptual quality of the restored images.
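As a reference for the full-reference metric, a minimal pure-Python PSNR sketch is given below, using the standard definition $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$ over flattened pixel values in the range [0, 255]. The evaluation details behind the reported numbers (e.g., color space or channel selection) are not stated in this section, so this only illustrates the formula; SSIM is not sketched here.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_val: float = 255.0) -> float:
    """PSNR in dB: 10 * log10(MAX^2 / MSE) over flattened pixel values."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    # An error of one gray level at every pixel gives MSE = 1, i.e. about 48.13 dB.
    print(round(psnr([0.0, 0.0], [1.0, 1.0]), 2))
```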

4.2. Implementation Details

To train the proposed S²DAL framework, we randomly split the training samples from Snow100K [59], RainDrop [61], and Outdoor-Rain [60] into labeled and unlabeled data at a ratio of 1:1. Specifically, for the Snow100K [59] training set, 4500 snowy/clean image pairs were randomly selected as labeled data, and the remaining 4500 snowy images were used as unlabeled data. Likewise, for the RainDrop [61] and Outdoor-Rain [60] training sets, half of the paired degraded/clean images were selected as labeled data, and the remaining degraded images were used as unlabeled data. S²DAL is implemented in PyTorch (v1.12.1) and trained on a single NVIDIA RTX A6000 GPU. The framework is trained end-to-end, with the Adam [66] solver optimizing the model parameters in the M-step of the Monte Carlo-based EM algorithm [56]. The initial learning rates of the image restoration model (DHformer) and the degradation generative model (DCNN) were set to $1\times 10^{-4}$ and $1\times 10^{-3}$, respectively, and were decayed by a factor of 10 after 20 epochs. The mini-batch size was set to 10, and each image was randomly cropped to $128\times 128$ pixels. To stabilize training, we optimized only the parameters $W$ of the image restoration model during the first 2 epochs and then jointly optimized all network parameters, including the parameters $\theta$ of DCNN. As for the hyper-parameters, we set $\varepsilon_{0} = 1\times 10^{-6}$, $\rho = 0.1$, and $\gamma = [1, 1]^{T}$.
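The data split and the optimization schedule described above can be summarized in a short pure-Python sketch. The training-set sizes are taken from Section 4.1; the function names, the random seed, the 0-based epoch counter, and the interpretation of the decay as a step schedule applied every 20 epochs are assumptions for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_half(items: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split samples 1:1 into labeled and unlabeled subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def learning_rate(base_lr: float, epoch: int, step: int = 20, factor: float = 10.0) -> float:
    """Assumed step decay: divide the rate by `factor` every `step` epochs (0-based)."""
    return base_lr / (factor ** (epoch // step))


def trainable_groups(epoch: int, warmup_epochs: int = 2) -> List[str]:
    """Only the restoration parameters W are updated during the first epochs."""
    return ["W"] if epoch < warmup_epochs else ["W", "theta"]


# Training-set sizes from Section 4.1.
train_sizes: Dict[str, int] = {"Snow100K": 9000, "RainDrop": 818, "Outdoor-Rain": 8250}

if __name__ == "__main__":
    for name, n in train_sizes.items():
        labeled, unlabeled = split_half([f"{name}_{i}" for i in range(n)])
        print(name, len(labeled), len(unlabeled))
    for epoch in (0, 1, 2, 19, 20):
        print(epoch, learning_rate(1e-4, epoch), learning_rate(1e-3, epoch),
              trainable_groups(epoch))
```

With these sizes, the labeled/unlabeled subsets contain 4500/4500, 409/409, and 4125/4125 images, i.e., 9034 labeled and 9034 unlabeled images out of 18068 in total.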

Note that, due to limited model capacity, it is difficult to accurately fit the degradation layers of the entire dataset with a single generator $G(\cdot;\theta)$ as defined in Equation (5). To address this issue, we follow the practice in [55] and employ a considerable number of degradation generators with identical network architectures, where each generator $G(\cdot;\theta)$ is assigned to a specific mini-batch. Under this configuration, each degradation generator only models the degradation layers of its own mini-batch, which avoids having to model all degradation layers across the entire dataset with one network. During training, when a mini-batch is revisited in a new epoch, its degradation generator $G(\cdot;\theta)$ is initialized with the parameters updated in the previous epoch and is then further optimized based on Equation (14). Therefore, the parameters of each generator evolve continuously throughout the iterative optimization process, which is consistent with the iterative refinement mechanism of the EM algorithm. The storage overhead of $G(\cdot;\theta)$ for each mini-batch during training is 13.55 MB.
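The per-mini-batch generator bookkeeping can be sketched as a small store keyed by mini-batch index. This is a hypothetical illustration: `GeneratorBank` and `fake_update` are invented names, parameters are plain lists instead of network weights, and `fake_update` merely stands in for the optimization of Equation (14). The sketch shows the warm start from the previous epoch and how storage grows linearly with the number of mini-batches at the reported 13.55 MB per generator.

```python
import copy
from typing import Dict, List

State = Dict[str, List[float]]


class GeneratorBank:
    """Hypothetical store holding one degradation-generator state per mini-batch."""

    def __init__(self, init_state: State) -> None:
        self._init = copy.deepcopy(init_state)
        self._states: Dict[int, State] = {}

    def fetch(self, batch_id: int) -> State:
        # Warm-start from the parameters saved for this mini-batch in the
        # previous epoch; fall back to the shared initialization on first visit.
        return copy.deepcopy(self._states.get(batch_id, self._init))

    def store(self, batch_id: int, state: State) -> None:
        self._states[batch_id] = copy.deepcopy(state)

    def storage_mb(self, mb_per_generator: float = 13.55) -> float:
        # 13.55 MB per generator is the figure reported in the text.
        return len(self._states) * mb_per_generator


def fake_update(state: State) -> State:
    """Stand-in for optimizing Eq. (14): nudges every parameter by +1."""
    return {name: [p + 1.0 for p in params] for name, params in state.items()}


if __name__ == "__main__":
    bank = GeneratorBank({"theta": [0.0, 0.0]})
    for epoch in range(3):
        for batch_id in range(2):
            bank.store(batch_id, fake_update(bank.fetch(batch_id)))
    print(bank.fetch(0))  # {'theta': [3.0, 3.0]}
    print(bank.storage_mb())
```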

4.3. Comparisons with the State of the Art

To demonstrate the effectiveness of our method, the proposed S²DAL framework is compared with multiple state-of-the-art semi-supervised approaches, namely SnowMaster [67], SAT-UIR [68], DNDM [69], SSID-KD [70], Muss [71], and SemiDDM-weather [27]. We also compare it with two existing fully supervised all-in-one methods for adverse weather removal, namely the All-in-One network [14] and TransWeather [15]. For a fair comparison, all competing methods were trained to handle raindrop, snow, and heavy rain with haze removal using a single model.

Table 3 presents the quantitative evaluation of different methods on the RainDrop [61], Outdoor-Rain [60], and Snow100K [59] test sets. S²DAL generally outperforms all competing methods on the three adverse weather removal tasks, with the single exception discussed below. In particular, for raindrop removal, S²DAL outperforms SnowMaster [67] (semi-supervised) and TransWeather [15] (fully supervised) by 2.82 dB/0.027 and 1.18 dB/0.016 in PSNR/SSIM, respectively. For snow removal and heavy rain with haze removal, S²DAL also yields the best results among the semi-supervised methods (i.e., 28.13 dB/0.916 on Outdoor-Rain, 34.90 dB/0.951 on Snow100K-S, and 29.29 dB/0.905 on Snow100K-L). Although S²DAL obtains a slightly lower PSNR than TransWeather [15] on the Outdoor-Rain [60] test set, it achieves a higher SSIM and a competitive PSNR there. More importantly, compared with fully supervised methods such as TransWeather [15], S²DAL substantially reduces the dependence on large-scale paired labeled data, demonstrating its effectiveness and practical applicability in scenarios where labeled data are scarce.

Beyond the quantitative evaluation, Figure 3 shows visual results of raindrop removal by different methods. Although the competing methods, i.e., SemiDDM-weather [27], SnowMaster [67], and TransWeather [15], can effectively remove raindrops, they leave artifacts in the restored regions (see the enlarged areas), resulting in undesirable visual effects. In contrast, the proposed S²DAL removes raindrops more thoroughly without introducing such artifacts, thereby restoring the background information more faithfully. Figure 4 shows the visual results of different methods for rain and haze removal. TransWeather [15] achieves better visual quality than SemiDDM-weather [27] and SnowMaster [67], which suffer from under-deraining and leave noticeable rain streaks in their restored images. Nevertheless, the image restored by our method still has better visual quality than that of TransWeather [15] (see the enlarged areas). Figure 5 shows the visual results of snow removal by different methods. The competing methods all leave undesirable artifacts after snow removal (see the enlarged blue-sky regions), whereas the de-snowed image produced by S²DAL is much closer to the ground truth and looks more natural.

4.4. Comparison on Real-World Images

To demonstrate practicality, we perform real-world adverse weather removal on the Snow100K-Real [59] and RainDS [72] test sets. The RainDS [72] test set includes 97 real raindrop images, whereas the Snow100K-Real [59] test set contains 1329 real-world snowy images. We compare S²DAL with the recent SemiDDM-weather [27], as both methods are specifically designed for all-in-one weather-degraded image restoration within a semi-supervised learning framework. Both methods were first trained on the AllWeather dataset and then applied to the real-world weather-degraded images. Figure 6 presents their qualitative results for raindrop removal on real images. SemiDDM-weather [27] fails to adequately remove real raindrops, leaving noticeable artifacts within raindrop-contaminated regions. In contrast, the proposed method eliminates raindrops more thoroughly, producing a more natural visual result. Figure 7 presents qualitative comparisons for snow removal on selected heavy-snow images from Snow100K-Real [59]. Although SemiDDM-weather [27] removes snowflakes from real images to a certain extent, its restoration remains inferior to that of S²DAL. Since mixed-weather conditions, such as sleet accompanied by haze, may also occur in real-world scenarios, we further conducted mixed-weather removal by applying SemiDDM-weather [27] and S²DAL to an image of sleet with haze selected from the Snow100K-Real [59] test set. The corresponding visual results are shown in Figure 8. Compared with SemiDDM-weather [27], S²DAL removes the sleet and haze more effectively, which suggests a stronger all-in-one removal capability under complex mixed-weather scenarios.

Finally, the quantitative results in terms of NIQE and IL-NIQE scores between SemiDDM-weather [27] and the proposed S²DAL are reported in Table 4. The quantitative results show that our S²DAL yields better perceptual image quality scores on both Snow100K-Real [59] and RainDS [72] test sets, and significantly outperforms the state-of-the-art semi-supervised all-in-one adverse weather removal method, i.e., SemiDDM-weather [27].

4.5. Ablation Studies

We conducted several ablation experiments on the Outdoor-Rain [60] dataset to further validate the effectiveness of our proposed Semi-Supervised Degradation-Aware Learning (S²DAL) method.

A key component in the proposed S²DAL framework is the Degradation-guided Histogram Transformer (DHT) block, which consists of three modules: the Degradation-guided Channel Shuffling Module (DCSM), Dynamic-range Histogram Self-Attention (DHSA), and Dual-scale Gated Feed Forward (DGFF). To analyze the impact of these three modules on adverse weather removal, Table 5 reports their quantitative results on the Outdoor-Rain [60] test set. Removing the DCSM module from both the DHT block and the DCNN (Method (a)) leads to the lowest PSNR and SSIM scores. This validates the critical role of DCSM in the S²DAL framework, as it adaptively modulates feature representations according to different degradation-type prompts. Removing the DHSA module (Method (d)) also reduces performance significantly, presumably because it weakens the ability of S²DAL to extract long-range spatial features. Finally, removing the DGFF module (Method (e)) causes PSNR and SSIM to drop to a certain extent. A likely reason is that, without DGFF, S²DAL can no longer focus on the correlations among dynamically distributed weather-induced degradations, leading to insufficient feature interaction and sub-optimal restoration.

Furthermore, to demonstrate the contribution of the shuffling mechanism, Table 6 and Figure 9 show quantitative and qualitative results with and without (w/o) the top-K channel shuffling operation $\mathrm{CS}_{\mathrm{top}K}(\cdot)$ in DCSM. Not using the shuffling mechanism means that $\mathrm{CS}_{\mathrm{top}K}(\cdot)$ is disabled in Equation (18), which formulates the overall process of DCSM. In addition, Table 6 compares $\mathrm{CS}_{\mathrm{top}K}(\cdot)$ using a Random Shuffling Index (RSI) with $\mathrm{CS}_{\mathrm{top}K}(\cdot)$ using the Degradation-guided Shuffling Index (DSI). As Table 6 and Figure 9 show, both $\mathrm{CS}_{\mathrm{top}K}(\cdot)$ + RSI and $\mathrm{CS}_{\mathrm{top}K}(\cdot)$ + DSI consistently outperform $\mathrm{CS}_{\mathrm{top}K}(\cdot)$ (w/o), indicating that channel shuffling improves restoration performance, as it can alleviate inter-channel interference while preserving intrinsic image characteristics. Moreover, $\mathrm{CS}_{\mathrm{top}K}(\cdot)$ + DSI also outperforms $\mathrm{CS}_{\mathrm{top}K}(\cdot)$ + RSI, implying that shuffling guided by the degradation-guided index makes the deep features more discriminative and representative for recovering clean images than shuffling with a random index.

4.6. Discussion

Although our semi-supervised framework S²DAL achieves impressive results for all-in-one weather-degraded image restoration, it still has several limitations. First, as stated previously, S²DAL requires a considerable number of degradation generators with identical network architectures to model the degradation layers of different mini-batches, which inevitably increases storage requirements during training. Second, when weather-induced degradations are extremely dense, complex, or highly coupled, some fine structural details and textures may be irreversibly corrupted in the input images, especially in the real-world degraded images shown in Figure 6, Figure 7 and Figure 8. Although S²DAL alleviates this challenge and achieves better visual results than SemiDDM-weather [27], accurately recovering regions severely corrupted by adverse weather remains difficult. Third, S²DAL is a semi-supervised method, so a limited amount of manually labeled data is still required to provide ground-truth supervision. Self-learning methods (e.g., self-supervised learning) have recently achieved significant progress in many vision tasks [73,74] because they can leverage large-scale unlabeled data without expensive manual labeling. To address these limitations, future work will focus on lightweight model design, fully self-supervised restoration of weather-degraded images, and performance optimization for mixed and complex degradation scenarios.

5. Conclusions

In this paper, we proposed a novel semi-supervised learning framework, termed Semi-Supervised Degradation-Aware Learning (S²DAL), for all-in-one adverse weather image restoration with limited paired data. S²DAL adopts a dual-backbone architecture consisting of a Degradation-guided Histogram Transformer (DHformer) for weather-degraded image restoration and a Degradation-guided Convolutional Neural Network (DCNN) for degradation layer generation. In particular, we designed a Degradation-guided Histogram Transformer (DHT) block to construct the DHformer, which adaptively regulates the feature space through channel shuffling modulation to better align it with the shared parameters and network structure. Furthermore, we introduced a Monte Carlo-based EM optimization algorithm that jointly estimates latent variables and network parameters from both labeled and unlabeled data. Extensive experimental results on both synthetic and real-world datasets demonstrate that S²DAL outperforms multiple existing state-of-the-art methods.

Author Contributions

Methodology: L.C.; software: L.C. and F.R.; validation: F.R. and H.Z.; writing—original draft preparation: L.C.; writing—review and editing: L.C.; formal analysis: W.L. and Q.L.; investigation: W.L.; data curation: F.R. and Q.L.; visualization: W.X.; supervision: T.Z.; funding acquisition: L.C., W.X. and T.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key R&D Program of China under Grant 2026YFE0202200, in part by the Natural Science Foundation of Xiamen under Grants 3502Z202472011 and 3502Z202571031, in part by the Natural Science Foundation of Fujian Province under Grants 2024J01098 and 2026J001734, in part by the Huaqiao University Research Startup Funds under Grants 24BS109 and 25BS107, in part by the Open List and Leader Appointment Project of Fujian Province under Grant 2024HZ022007, in part by the Key Project of Natural Science Foundation of Fujian Province under Grants 2023J02022 and 2026J002031, in part by the Key Science and Technology Project of Xiamen City under Grants 3502Z20251011 and 3502Z20234034, in part by the High-level Talent Team Project of Quanzhou City under Grant 2023CT001, in part by the Fujian Province Science and Technology Empowering Police Research Project under Grant 2024Y0064, in part by the Quanzhou City High-level Talent Innovation and Entrepreneurship Project under Grant 20241ZC010, in part by the National Natural Science Foundation of China (NSFC) under Grant 62261011, and in part by the Specific Research Project of Guangxi for Research Bases and Talents under Grant GuiKeAD22080028.

Institutional Review Board Statement

Not applicable. This study only used publicly available datasets and did not involve any direct interaction with human participants or animal subjects.

Informed Consent Statement

Not applicable. All data employed in this study are publicly accessible and were obtained in full compliance with the ethical guidelines established by the respective dataset creators.

Data Availability Statement

The datasets analyzed in this study were derived from TransWeather: Transformer-based Restoration of Images Degraded by Adverse Weather Conditions [15], and the code of our work is available at: https://github.com/Lcai-QZ/S2DAL (accessed on 28 May 2026).

Acknowledgments

We are extremely grateful for the valuable suggestions provided by the editors and reviewers.

Conflicts of Interest

Author Wei Lu was employed by the company Xiamen Solex High-Tech Industries Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Wang, Z.; Xu, L.; Rong, W.; Yao, X.; Chen, T.; Zhao, P.; Chen, Y. Semantic-Guided Iterative Detail Fusion Network for Single-Image Deraining. Electronics 2024, 13, 3634.
Li, W.; Chen, G.; Chang, Y. An efficient single image de-raining model with decoupled deep networks. IEEE Trans. Circuits Syst. Video Technol. 2024, 33, 69–81.
Peng, L.; Wang, Y.; Di, X.; Fu, X.; Cao, Y.; Zha, Z. Boosting image de-raining via central-surrounding synergistic convolution. AAAI Conf. Artif. Intell. 2025, 39, 6470–6478.
Fu, Q.; Lu, B.; Yan, C. CoFiWaveMamba: A Coarse-to-Fine Wavelet-Guided Mamba Network for Single Image Dehazing. Electronics 2026, 15, 1599.
Wang, R.; Zheng, Y.; Zhang, Z.; Li, C.; Liu, S.; Zhai, G.; Liu, X. Learning hazing to dehazing: Towards realistic haze generation for real-world image dehazing. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2025; pp. 23091–23100.
Chen, J.; Ren, W.; Zhao, H.; Xia, Q.; Yang, G. You only need clear images: Self-supervised single image dehazing. IEEE Trans. Multimed. 2025, 27, 5800–5814.
Ruan, G.; Kong, F.; Ding, C.; Yang, K.; Hu, T.; Yan, R. DVIOR: Dynamic Vertical and Low-Intensity Outlier Removal for Efficient Snow Noise Removal from LiDAR Point Clouds in Adverse Weather. Electronics 2025, 14, 3662.
Chen, Z.; Sun, Y.; Bi, X.; Yue, J. Lightweight image de-snowing: A better trade-off between network capacity and performance. Neural Netw. 2023, 165, 896–908.
Guo, X.; Wang, X.; Fu, X.; Zha, Z. Deep unfolding network for image desnowing with snow shape prior. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 4740–4752.
Yang, J.; Chitta, K.; Gao, S.; Chen, L.; Shao, Y.; Jia, X.; Li, H.; Geiger, A.; Yue, X.; Chen, L.; et al. Resim: Reliable world simulation for autonomous driving. Adv. Neural Inf. Process. Syst. 2026, 38, 167710–167741.
Yuan, T.; Zhang, X.; Liu, K.; Liu, B.; Chen, C.; Jin, J.; Jiao, Z. Towards surveillance video-and-language understanding: New dataset baselines and challenges. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 22052–22061.
Tang, X.; Gu, X.; He, X.; Hu, X.; Sun, J. Degradation-aware residual-conditioned optimal transport for unified image restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 6764–6779.
Li, W.; Zhou, G.; Lin, S.; Tang, Y. Pernet: Progressive and efficient all-in-one image-restoration lightweight network. Electronics 2024, 13, 2817.
Li, R.; Tan, R.; Cheong, L. All in one bad weather removal using architectural search. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2020; pp. 3175–3185.
Valanarasu, J.; Yasarla, R.; Patel, V. Transweather: Transformer-based restoration of images degraded by adverse weather conditions. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2022; pp. 2353–2363.
Özdenizci, O.; Legenstein, R. Restoring vision in adverse weather conditions with patch-based denoising diffusion models. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10346–10357.
Tan, Z.; Wu, Y.; Liu, Q.; Chu, Q.; Lu, L.; Ye, J.; Yu, N. Exploring the application of large-scale pre-trained models on adverse weather removal. IEEE Trans. Image Process. 2024, 33, 1683–1698.
Sun, S.; Ren, W.; Gao, X.; Wang, R.; Cao, X. Restoring images in adverse weather conditions via histogram transformer. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2024; pp. 111–129.
Lu, X.; Xiao, J.; Zhu, Y.; Fu, X. Continuous adverse weather removal via degradation-aware distillation. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2025; pp. 28113–28123.
Wan, Y.; Shao, M.; Cheng, Y.; Zuo, W. Image all-in-one adverse weather removal via dynamic model weights generation. Knowl.-Based Syst. 2024, 302, 112324.
Yang, H.; Pan, L.; Yang, Y.; Liang, W. Language-driven all-in-one adverse weather removal. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 24902–24912.
Jeong, Y.; Yang, Y.; Yoon, Y.; Yoon, K. Robust Adverse Weather Removal via Spectral-based Spatial Grouping. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2025; pp. 11872–11883.
Cheng, D.; Ji, Y.; Gong, D.; Li, Y.; Wang, N.; Han, J.; Zhang, D. Continual all-in-one adverse weather removal with knowledge replay on a unified network structure. IEEE Trans. Multimed. 2024, 26, 8184–8196.
Guan, Q.; Yang, Q.; Chen, X.; Song, T.; Jin, G.; Jin, J. Weatherbench: A real-world benchmark dataset for all-in-one adverse weather image restoration. In Proceedings of the ACM International Conference on Multimedia, Dublin, Ireland, 27–31 October 2025; pp. 12607–12613.
Wen, Y.; Gao, T.; Li, Z.; Zhang, J.; Zhang, K.; Chen, T. All-in-one Weather-degraded Image Restoration via Adaptive Degradation-aware Self-prompting Model. IEEE Trans. Multimed. 2025, 27, 3343–3355.
Xie, D.; Hu, X.; Zhou, Y.; Duan, S. All-in-one adverse weather removal via dual state space-based diffusion model with degradation-aware guidance. Pattern Recognit. 2026, 171, 112081.
Long, F.; Su, W.; Li, Z.; Cai, L.; Li, M.; Wang, Y.; Cao, X. SemiDDM-weather: A semi-supervised learning framework for all-in-one adverse weather removal. Neural Netw. 2026, 195, 108241.
Phung, H.; Dao, Q.; Tran, A. Wavelet diffusion models are fast and scalable image generators. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2023; pp. 10199–10208.
Xu, J.; Wu, M.; Hu, X.; Fu, C.W.; Dou, Q.; Heng, P.A. Towards real-world adverse weather image restoration: Enhancing clearness and semantics with vision-language models. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2024; pp. 147–164.
Cai, L.; Fu, Y.; Huo, W.; Xiang, Y.; Zhu, T.; Zhang, Y.; Zeng, H.; Zeng, D. Multiscale Attentive Image De-Raining Networks via Neural Architecture Search. IEEE Trans. Circuits Syst. Video Technol. 2023, 32, 618–633.
Wei, H.; Wu, Q.; Wu, C.; Ngan, K.; Li, H.; Meng, F.; Qiu, H. Robust unpaired image dehazing via adversarial deformation constraint. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 8614–8628.
Jaw, D.W.; Huang, S.C.; Kuo, S.Y. DesnowGAN: An efficient single image snow removal framework using cross-resolution lateral connection and GANs. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 1342–1350.
He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353.
Bossu, J.; Hautiere, N.; Tarel, J.P. Rain or snow detection in image sequences through use of a histogram of orientation of streaks. Int. J. Comput. Vis. 2011, 93, 348–367.
Kang, L.W.; Lin, C.W.; Fu, Y.H. Automatic single-image-based rain streaks removal via image decomposition. IEEE Trans. Image Process. 2012, 21, 1742–1755.
Kim, J.H.; Lee, C.; Sim, J.Y.; Kim, C.S. Single-image deraining using an adaptive nonlocal means filter. In IEEE International Conference on Image Processing; IEEE: Piscataway, NJ, USA, 2013; pp. 914–917. [Google Scholar]
Luo, Y.; Xu, Y.; Ji, H. Removing rain from a single image via discriminative sparse coding. In IEEE International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2015; pp. 3397–3405. [Google Scholar]
Zhu, Q.; Mai, J.; Shao, L. A fast single image haze removal algorithm using color attenuation prior. IEEE Trans. Image Process. 2015, 24, 3522–3533. [Google Scholar] [CrossRef] [PubMed]
Ge, C.; Fu, X.; He, P.; Wang, K.; Cao, C.; Zha, Z.J. Neuromorphic Event Signal-Driven Network for Video De-raining. AAAI Conf. Artif. Intell. 2024, 38, 1878–1886. [Google Scholar] [CrossRef]
Cai, L.; Fu, Y.; Zhu, T.; Xiang, Y.; Zhang, Y.; Zeng, H. Joint Depth and Density Guided Single Image De-raining. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 4108–4121. [Google Scholar] [CrossRef]
Wang, Y.; Yan, X.; Guan, D.; Wei, M.; Chen, Y.; Zhang, X.; Li, J. Cycle-snspgan: Towards real-world image dehazing via cycle spectral normalized soft likelihood estimation patch gan. IEEE Trans. Intell. Transp. Syst. 2022, 23, 20368–20382. [Google Scholar] [CrossRef]
Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
Chen, X.; Pan, J.; Dong, J.; Tang, J. Towards unified deep image deraining: A survey and a new benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 5414–5433. [Google Scholar] [CrossRef] [PubMed]
Dong, W.; Zhou, H.; Wang, R.; Liu, X.; Zhai, G.; Chen, J. Dehazedct: Towards effective non-homogeneous dehazing via deformable convolutional transformer. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 6405–6414. [Google Scholar]
Wang, C.; Pan, J.; Wang, L.; Wang, W. Intra and inter parser-prompted transformers for effective image restoration. AAAI Conf. Artif. Intell. 2025, 39, 7609–7618. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010. [Google Scholar]
Tian, X.; Liao, X.; Liu, X.; Li, M.; Ren, C. Degradation-Aware Feature Perturbation for All-in-One Image Restoration. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2025; pp. 28165–28175. [Google Scholar]
Zhu, Y.; Wang, T.; Fu, X.; Yang, X.; Guo, X.; Dai, J.; Qiao, Y.; Hu, X. Learning weather-general and weather-specific features for image restoration under multiple adverse weather conditions. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2023; pp. 21747–21758. [Google Scholar]
Chen, W.; Huang, Z.; Tsai, C.; Yang, H.; Ding, J.; Kuo, S. Learning multiple adverse weather removal via two-stage knowledge learning and multi-contrastive regularization: Toward a unified model. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2022; pp. 17653–17662. [Google Scholar]
50. Radford, A.; Kim, J.W.; et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2021; pp. 8748–8763.
51. Liao, R.; Li, F.; Wei, Y.; Shi, Z.; Zhang, L.; Bai, H.; Wang, M. Prompt to Restore, Restore to Prompt: Cyclic Prompting for Universal Adverse Weather Removal. IEEE Trans. Image Process. 2025, 34, 7422–7435.
52. Shao, Q.; Zhang, Y.; Xiao, R.; Hu, L. Augmented Degradation Modeling for Image Restoration Under Adverse Weather Conditions. In International Conference on Automation and Computing; IEEE: Piscataway, NJ, USA, 2025; pp. 1–6.
53. Liu, N.; Wang, J.; Gao, J.; Chang, S.; Lou, Y. Similarity-informed self-learning and its application on seismic image denoising. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5921113.
54. Li, Y.; Tan, R.T.; Guo, X.; Lu, J.; Brown, M.S. Rain streak removal using layer priors. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2016; pp. 2736–2744.
55. Yue, Z.; Xie, J.; Zhao, Q.; Meng, D. Semi-supervised video deraining with dynamical rain generator. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2021; pp. 642–652.
56. Dempster, A.; Laird, N.; Rubin, D. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977, 39, 1–22.
57. Dauphin, Y.; Fan, A.; Auli, M.; Grangier, D. Language modeling with gated convolutional networks. In International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2017; pp. 933–941.
58. Misra, D. Mish: A self regularized non-monotonic activation function. arXiv 2019, arXiv:1908.08681.
59. Liu, Y.F.; Jaw, D.W.; Huang, S.C.; Hwang, J.N. DesnowNet: Context-aware deep network for snow removal. IEEE Trans. Image Process. 2018, 27, 3064–3073.
60. Li, R.; Cheong, L.F.; Tan, R.T. Heavy rain image restoration: Integrating physics model and conditional adversarial learning. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2020; pp. 1633–1642.
61. Qian, R.; Tan, R.T.; Yang, W.; Su, J.; Liu, J. Attentive generative adversarial network for raindrop removal from a single image. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 2482–2491.
62. Huynh-Thu, Q.; Ghanbari, M. Scope of validity of PSNR in image/video quality assessment. Electron. Lett. 2008, 44, 800–801.
63. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
64. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2013, 20, 209–212.
65. Zhang, L.; Zhang, L.; Bovik, A.C. A feature-enriched completely blind image quality evaluator. IEEE Trans. Image Process. 2015, 24, 2579–2591.
66. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
67. Lai, J.; Chen, S. Snowmaster: Comprehensive real-world image desnowing via mllm with multi-model feedback optimization. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2025; pp. 4302–4312.
68. Tang, Q.; Guo, X.; Wei, X.; Li, X.; Wang, D.; Zhang, S. SAT-UIR: Self-Assessment Training for Semi-Supervised Underwater Image Restoration. IEEE Trans. Circuits Syst. Video Technol. 2026, 36, 3397–3408.
69. Jia, T.; Li, J.; Zhuo, L. Semi-supervised single-image dehazing network via disentangled meta-knowledge. IEEE Trans. Multimed. 2024, 26, 2634–2647.
70. Cui, X.; Wang, C.; Ren, D.; Chen, Y.; Zhu, P. Semi-supervised image deraining using knowledge distillation. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 8327–8341.
71. Huang, H.; Luo, M.; He, R. Memory uncertainty learning for real-world single image deraining. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 3446–3460.
72. Quan, R.; Yu, X.; Liang, Y.; Yang, Y. Removing raindrops and rain streaks in one go. In IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2021; pp. 9147–9156.
73. Xing, J.; Wei, D.; Zhou, S.; Wang, T.; Huang, Y.; Chen, H. A comprehensive study on self-learning methods and implications to autonomous driving. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 7786–7805.
74. Mu, F.; Huang, R.; Zhang, J.; Zou, C.; Shi, K.; Sun, S.; Zhan, H.; Zhao, P.; Qiu, J.; Cheng, H. SS-pose: Self-supervised 6-D object pose representation learning without rendering. IEEE Trans. Ind. Inform. 2024, 20, 13665–13675.

Figure 1. The overall architecture of the proposed Semi-supervised Degradation-aware Learning (S²DAL) framework, which consists of two sub-networks: the Degradation-guided Histogram Transformer (DHformer) and the Degradation-guided Convolutional Neural Network (DCNN). DHformer takes a set of weather-degraded images $y$ (e.g., natural images corrupted by rain, snow, or raindrops) and their corresponding weather degradation-type prompts $P_{d}$ as input, and passes them through multiple Degradation-guided Histogram Transformer (DHT) blocks to generate the restored images $f(y, P_{d}; W)$. DCNN takes a latent variable $z \sim \mathcal{N}(0, I)$ and the degradation-type prompts $P_{d}$ as input and outputs the degradation layers $G(z, P_{d}; \theta)$. The degradation layers are then added to the restored images to form the supervisory signal for model training.
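
Read together with the Monte Carlo-based EM algorithm mentioned in the abstract, the caption implies an additive observation model in which a degraded image is explained as a restored background plus a generated degradation layer. A minimal formalization, assuming a squared-error reconstruction term and a Monte Carlo average over $S$ latent samples (both are illustrative assumptions made here, not the paper's stated objective; $S$ and $\mathcal{L}$ are hypothetical symbols), is:

$$
y \approx f(y, P_{d}; W) + G(z, P_{d}; \theta), \qquad z \sim \mathcal{N}(0, I)
$$

$$
\mathcal{L}(W, \theta) \approx \frac{1}{S} \sum_{s=1}^{S} \left\lVert y - f(y, P_{d}; W) - G\!\left(z^{(s)}, P_{d}; \theta\right) \right\rVert_{2}^{2}, \qquad z^{(s)} \sim \mathcal{N}(0, I)
$$

In a general Monte Carlo EM scheme, the E-step would draw $z^{(s)}$ from (an approximation of) the posterior over $z$ rather than from the prior used in this simplified sketch, and the M-step would update $W$ and $\theta$ on the resulting average; the sampling procedure and objective actually used by S²DAL are those defined in Section 3.2.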

Figure 2. The Degradation-guided Histogram Transformer (DHT) block, which is composed of three key components: the Degradation-guided Channel Shuffling Module (DCSM), Dynamic-range Histogram Self-Attention (DHSA), and Dual-scale Gated Feed Forward (DGFF).

Figure 3. Visual comparison of different methods for raindrop removal on an image from the test set of RainDrop [61]. Quantitative results, i.e., PSNR(dB)/SSIM, are also documented below each image, and the best results are marked in bold. The restoration results from (b) to (d) are SemiDDM-weather [27], SnowMaster [67], and TransWeather [15].

Figure 4. Visual comparison of different methods for rain and haze removal on an image from the Outdoor-Rain [60] Test1 set. Quantitative results, i.e., PSNR(dB)/SSIM, are also documented below each image, and the best results are marked in bold. The restoration results from (b) to (d) are SemiDDM-weather [27], SnowMaster [67], and TransWeather [15].

Figure 5. Visual comparison of different methods for snow removal on the test set of Snow100K [59]. Quantitative results, i.e., PSNR(dB)/SSIM, are also documented below each image, and the best results are marked in bold. The restoration results from (b) to (d) are SemiDDM-weather [27], SnowMaster [67], and TransWeather [15].

Figure 6. Visual comparison of raindrop removal between SemiDDM-weather [27] and the proposed S²DAL on real-world raindrop images from the RainDS [72] test set. Quantitative results, i.e., NIQE/IL-NIQE, are also documented below each image, and the best results are marked in bold.

Figure 7. Visual comparison of snow removal between SemiDDM-weather [27] and the proposed S²DAL on real-world snowy images from Snow100K-Real [59]. Quantitative results, i.e., NIQE/IL-NIQE, are also documented below each image, and the best results are marked in bold.

Figure 8. Visual comparison of real-world mixed weather removal between SemiDDM-weather [27] and the proposed S²DAL on an image of sleet accompanied by haze from the Snow100K-Real [59] test set. Quantitative results, i.e., NIQE/IL-NIQE, are also documented below each image, and the best results are marked in bold.

Figure 9. Visual comparison for rain and haze removal with and without (w/o) the top-K channel shuffling operation $\mathrm{CS}_{\mathrm{top}K}(\cdot)$ in DCSM.

Table 1. The network configuration of the DCNN.

Layer	Filter Size	Neurons/Filters	Stride	Padding	Output Size
Input ($z$)	-	-	-	-	$1 \times 256$
FC	-	4096	-	-	$1 \times 4096$
ReLU	-	-	-	-	$1 \times 4096$
Reshape	-	-	-	-	$1 \times 64 \times 64$
Convolution	$3 \times 3$	128	1	1	$128 \times 64 \times 64$
ReLU	-	-	-	-	$128 \times 64 \times 64$
Convolution	$3 \times 3$	256	1	1	$256 \times 64 \times 64$
ReLU	-	-	-	-	$256 \times 64 \times 64$
DCSM	-	-	-	-	$256 \times 64 \times 64$
Convolution	$3 \times 3$	128	1	1	$128 \times 64 \times 64$
ReLU	-	-	-	-	$128 \times 64 \times 64$
DCSM	-	-	-	-	$128 \times 64 \times 64$
Convolution	$3 \times 3$	512	1	1	$512 \times 64 \times 64$
ReLU	-	-	-	-	$512 \times 64 \times 64$
Upsampling	-	-	-	-	$512 \times 128 \times 128$
Convolution	$3 \times 3$	3	1	1	$3 \times 128 \times 128$
ReLU	-	-	-	-	$3 \times 128 \times 128$
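
As a consistency check on Table 1, the sketch below propagates tensor shapes through the listed layers using the standard convolution output-size rule $\lfloor (H + 2p - k)/s \rfloor + 1$. It is a shape-only illustration under stated assumptions: DCSM is treated as shape-preserving (as the table reports), the upsampling layer is assumed to double the spatial resolution (inferred from $64 \to 128$), and the prompt input $P_{d}$ shown in Figure 1 is not modelled because Table 1 does not list how it enters the network. The functions `conv_out` and `trace_dcnn` are hypothetical helpers, not part of the authors' code.

```python
import math

# Shape-only trace of the DCNN layer stack in Table 1 (no weights, no prompt input).
# Assumptions: DCSM preserves shape, and "Upsampling" doubles height and width.


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length along one spatial axis of a 2-D convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_dcnn(latent_dim: int = 256) -> list[tuple[str, tuple[int, ...]]]:
    """Return (layer name, output shape) pairs following Table 1."""
    trace: list[tuple[str, tuple[int, ...]]] = [("Input (z)", (1, latent_dim))]
    trace.append(("FC + ReLU", (1, 4096)))
    side = math.isqrt(4096)                      # 4096 features -> 64 x 64 map
    shape = (1, side, side)
    trace.append(("Reshape", shape))
    # Each entry: (output channels, whether a DCSM follows the conv + ReLU).
    for out_ch, has_dcsm in [(128, False), (256, True), (128, True), (512, False)]:
        h = conv_out(shape[1], kernel=3, stride=1, padding=1)
        w = conv_out(shape[2], kernel=3, stride=1, padding=1)
        shape = (out_ch, h, w)
        trace.append((f"Conv3x3-{out_ch} + ReLU", shape))
        if has_dcsm:
            trace.append(("DCSM", shape))        # assumed shape-preserving
    shape = (shape[0], 2 * shape[1], 2 * shape[2])
    trace.append(("Upsampling (x2)", shape))
    h = conv_out(shape[1], kernel=3, stride=1, padding=1)
    w = conv_out(shape[2], kernel=3, stride=1, padding=1)
    trace.append(("Conv3x3-3 + ReLU", (3, h, w)))
    return trace


if __name__ == "__main__":
    for name, shape in trace_dcnn():
        print(f"{name:22s} {' x '.join(str(d) for d in shape)}")
```

With stride 1 and padding 1, every $3 \times 3$ convolution keeps the $64 \times 64$ spatial size, so only the upsampling step changes resolution; the final entry, $3 \times 128 \times 128$, matches the last row of Table 1.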

Table 2. Number of samples and degradation-type prompts for Snow100K [59], Outdoor-Rain [60], and RainDrop [61].

Dataset	Training Set	Test Set	Degradation-Type Prompts
Snow100K	9000	33,412	Snow degradation by normal snowflakes
Outdoor-Rain	8250	750	Rain degradation by rain streaks and normal haze
RainDrop	818	58	Rain degradation by normal raindrops
Total	18,068	34,220	-
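
The split sizes and prompts in Table 2 can be held in a small lookup structure; summing the per-dataset counts reproduces the Total row (18,068 training and 34,220 test images). The dictionary layout and the names `DATASETS` and `split_totals` are illustrative assumptions, not the authors' configuration format.

```python
# Per-dataset split sizes and degradation-type prompts, copied from Table 2.
# The dictionary layout is an illustrative choice, not the authors' config format.
DATASETS = {
    "Snow100K": {"train": 9000, "test": 33412,
                 "prompt": "Snow degradation by normal snowflakes"},
    "Outdoor-Rain": {"train": 8250, "test": 750,
                     "prompt": "Rain degradation by rain streaks and normal haze"},
    "RainDrop": {"train": 818, "test": 58,
                 "prompt": "Rain degradation by normal raindrops"},
}


def split_totals(datasets: dict) -> tuple[int, int]:
    """Sum training and test counts over all datasets (the Total row of Table 2)."""
    train = sum(entry["train"] for entry in datasets.values())
    test = sum(entry["test"] for entry in datasets.values())
    return train, test


if __name__ == "__main__":
    print(split_totals(DATASETS))  # (18068, 34220)
```

In a prompt-conditioned setup like the one in Figure 1, each image would be paired with the prompt of its source dataset, for example `DATASETS["RainDrop"]["prompt"]`.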

Table 3. Quantitative comparisons on three adverse weather removal tasks in terms of PSNR and SSIM. In each sub-table, the first two rows (All-in-One [14] and TransWeather [15]) are fully supervised all-in-one image restoration models, and the remaining rows are semi-supervised methods. The best and second-best results are marked in bold and underlined, respectively.

(a) RainDrop removal
Method	RainDrop PSNR ↑	RainDrop SSIM ↑
All-in-One [14]	31.12	0.927
TransWeather [15]	30.17	0.916
SemiDDM-weather [27]	20.65	0.596
Muss [71]	23.19	0.787
SSID-KD [70]	21.12	0.813
DNDM [69]	24.07	0.840
SAT-UIR [68]	26.86	0.873
SnowMaster [67]	28.53	0.908
S²DAL (Ours)	31.35	0.935
(b) De-raining & De-hazing
Method	Outdoor-Rain PSNR ↑	Outdoor-Rain SSIM ↑
All-in-One [14]	24.71	0.898
TransWeather [15]	28.83	0.900
SemiDDM-weather [27]	22.33	0.786
Muss [71]	17.98	0.603
SSID-KD [70]	18.83	0.724
DNDM [69]	17.09	0.705
SAT-UIR [68]	24.19	0.852
SnowMaster [67]	24.08	0.860
S²DAL (Ours)	28.13	0.916
(c) De-snowing
Method	Snow100K-S PSNR ↑	Snow100K-S SSIM ↑	Snow100K-L PSNR ↑	Snow100K-L SSIM ↑
All-in-One [14]	-	-	28.33	0.882
TransWeather [15]	32.51	0.934	29.31	0.888
SemiDDM-weather [27]	23.84	0.780	23.84	0.780
Muss [71]	24.54	0.775	21.27	0.667
SSID-KD [70]	28.68	0.872	24.30	0.782
DNDM [69]	29.79	0.880	24.13	0.782
SAT-UIR [68]	30.35	0.906	25.63	0.827
SnowMaster [67]	32.74	0.948	27.72	0.901
S²DAL (Ours)	34.90	0.951	29.45	0.905
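
Table 3 reports PSNR in dB. For reference, the sketch below uses the generic definition $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^{2}/\mathrm{MSE})$ on flattened 8-bit intensities. The evaluation details behind Table 3 (color space, channel handling, border cropping) are not stated in this section, so this hypothetical `psnr` helper is not claimed to reproduce the reported numbers.

```python
import math
from collections.abc import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Generic PSNR in dB between two flattened images of equal size."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(round(psnr([0, 0, 0, 0], [1, 1, 1, 1]), 2))  # 48.13 (MSE = 1)
```

Because PSNR is logarithmic in MSE, a gap of $\Delta$ dB corresponds to an MSE ratio of $10^{\Delta/10}$. For example, the 2.82 dB margin of S²DAL over SnowMaster [67] on RainDrop (31.35 vs. 28.53 dB) corresponds to roughly 1.9× lower mean squared error, with the caveat that averages of per-image PSNR do not translate exactly into dataset-level MSE ratios.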

Table 4. Quantitative NIQE and IL-NIQE comparisons between SemiDDM-weather [27] and S²DAL on Snow100K-Real [59] and RainDS [72]. Lower is better.

Method	Snow100K-Real NIQE ↓	Snow100K-Real IL-NIQE ↓	RainDS NIQE ↓	RainDS IL-NIQE ↓
SemiDDM-weather [27]	3.021	21.939	4.286	23.708
S²DAL (Ours)	2.925	21.656	3.621	21.261

Table 5. Ablation results on the Outdoor-Rain [60] test set after removing different components from the proposed S²DAL. The best and second-best results are marked in bold and underlined, respectively.

Method	DCSM†	DCSM‡	DHSA	DGFF	PSNR ↑	SSIM ↑
(a)	×	×	✓	✓	23.68	0.852
(b)	×	✓	✓	✓	26.49	0.876
(c)	✓	×	✓	✓	25.78	0.886
(d)	✓	✓	×	✓	24.13	0.858
(e)	✓	✓	✓	×	27.46	0.904
S²DAL	✓	✓	✓	✓	28.13	0.916

† DCSM within the DHT block. ‡ DCSM within the DCNN.
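
Table 5 is easier to read as drops relative to the full model. The snippet below only re-expresses the reported numbers (it does not rerun any experiment); the row labels are paraphrases of the ✓/× pattern in the table, and `drops_from_full` is a hypothetical helper.

```python
# (PSNR in dB, SSIM) on the Outdoor-Rain test set, copied from Table 5.
FULL_MODEL = (28.13, 0.916)
ABLATIONS = {
    "(a) w/o DCSM in DHT and DCNN": (23.68, 0.852),
    "(b) w/o DCSM in DHT": (26.49, 0.876),
    "(c) w/o DCSM in DCNN": (25.78, 0.886),
    "(d) w/o DHSA": (24.13, 0.858),
    "(e) w/o DGFF": (27.46, 0.904),
}


def drops_from_full(full, ablations):
    """Return {variant: (PSNR drop in dB, SSIM drop)}, rounded for display."""
    full_psnr, full_ssim = full
    return {
        name: (round(full_psnr - psnr, 2), round(full_ssim - ssim, 3))
        for name, (psnr, ssim) in ablations.items()
    }


if __name__ == "__main__":
    for name, (d_psnr, d_ssim) in drops_from_full(FULL_MODEL, ABLATIONS).items():
        print(f"{name:30s} -{d_psnr:.2f} dB  -{d_ssim:.3f} SSIM")
```

Among single-component removals, DHSA costs the most (4.00 dB), followed by DCSM in the DCNN (2.35 dB), DCSM in the DHT block (1.64 dB), and DGFF (0.67 dB); removing both DCSM instances costs 4.45 dB.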

Table 6. Comparison of PSNR and SSIM results on the Outdoor-Rain [60] test set with and without (w/o) the top-K channel shuffling operation $\mathrm{CS}_{\mathrm{top}K}(\cdot)$ in DCSM. The best and second-best results are marked in bold and underlined, respectively.

Method	PSNR ↑	SSIM ↑
w/o $\mathrm{CS}_{\mathrm{top}K}(\cdot)$	24.73	0.862
$\mathrm{CS}_{\mathrm{top}K}(\cdot)$ + RSI	25.41	0.895
$\mathrm{CS}_{\mathrm{top}K}(\cdot)$ + DSI	28.13	0.916

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cai, L.; Ruan, F.; Lu, W.; Lin, Q.; Zheng, H.; Xiang, W.; Zhu, T. Semi-Supervised Degradation-Aware Learning for All-in-One Weather-Degraded Image Restoration. Electronics 2026, 15, 2686. https://doi.org/10.3390/electronics15122686

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
