Correction published on 2 April 2026, see Symmetry 2026, 18(4), 603.
Article

Symmetry-Guided AB-Dynamic Feature Refinement Network for Weakly Supervised Shadow Removal

1 School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870, China
2 School of Artificial Intelligence, Shenyang University of Technology, Shenyang 110870, China
* Author to whom correspondence should be addressed.
Symmetry 2026, 18(2), 330; https://doi.org/10.3390/sym18020330
Submission received: 2 December 2025 / Revised: 30 January 2026 / Accepted: 2 February 2026 / Published: 11 February 2026 / Corrected: 2 April 2026
(This article belongs to the Section Computer)

Abstract

Shadow removal aims to restore photometric, chromatic, and structural consistency between shadowed and non-shadowed image regions. Although weakly supervised shadow removal methods reduce the reliance on densely paired training data, they still struggle to fully exploit appearance priors from non-shadow regions. As a result, their shadow removal outputs often appear unnatural, exhibiting color shifts and loss of fine texture details. To address this issue, we propose an ab-dynamic feature refinement network (AB-DFRNet) for weakly supervised shadow removal that more effectively exploits structural and chromatic symmetry during training. A high-frequency information enhancement (HFIE) module is introduced into the shadow generation subnet to extract and enhance high-frequency components via frequency separation and dense convolutions, thereby facilitating the learning of fine structural symmetry and enriching pseudo-shadow details. In the removal subnet, a dual-attention adaptive fusion (DAAF) module combines global and local attention mechanisms to adaptively recalibrate channel-wise and spatial features, improving multi-scale feature integration. Furthermore, a chrominance-only consistency (COC) loss is designed to minimize differences between the a and b channels of restored regions and their non-shadow references in the Lab color space. This additional color refinement constraint encourages a symmetric distribution of chromatic information and helps the refinement network produce more natural shadow-removed results. Extensive experiments are conducted on three benchmark datasets: ISTD, SRD, and Video Shadow Removal. The results confirm the effectiveness of AB-DFRNet, demonstrating competitive quantitative performance and noticeably better visual quality compared with existing weakly supervised shadow removal methods.

1. Introduction

Shadows refer to dark regions in a scene that are formed when light is partially or completely blocked by objects. In challenging real-world scenes, illumination inconsistency caused by factors such as non-uniform lighting and shadows can substantially degrade visual perception and analysis [1]. Their presence compromises image perceptual quality and disrupts the effectiveness of numerous computer vision tasks, such as object detection [2,3], image segmentation [4], face-related tasks [5,6], and target tracking [7]. As a critical preprocessing step, shadow removal significantly improves image quality and lays a foundation for many advanced vision-based applications.
Unlike other image restoration problems such as occlusion removal [8], shadow removal focuses on restoring illumination and chrominance consistency while preserving the scene content and structure. Classical shadow removal methods mainly generate shadow-free images based on image gradients, illumination transformation information, or color transformation information [9,10,11,12]. Due to their reliance on physical models or manually designed priors, these approaches often perform poorly on unseen datasets or under varying types of shadow in complex and dynamic scenes. Their performance declines when texture properties differ between shadow and non-shadow regions, and the transitions between removed shadow areas and non-shadow regions tend to appear unnatural. In addition, certain traditional methods are relatively complex and computationally expensive, especially for high-resolution images.
Compared to traditional methods, deep learning-based shadow removal models can learn diverse feature representations from large-scale shadow image datasets [13,14,15,16,17,18,19]. This data-driven approach enables superior robustness and higher accuracy when handling complex shadow structures. For instance, ShadowFormer [19] incorporates a Retinex theory-driven multi-scale channel attention framework and a shadow interaction module (SIM), which utilizes global information from non-shadow regions to guide shadow inpainting. It effectively addresses the problems of boundary artifacts and inconsistent illumination that are caused by local processing in traditional methods. Remarkably, its small variant achieves high-quality results with only 2.4 million parameters. HomoFormer [18] employs a spatial random shuffling operation to uniformly redistribute shadow regions across the entire image. The framework further incorporates local transformer processing to establish an effective shadow removal model. This model addresses the modeling challenges arising from non-uniform shadow distributions and diverse shadow patterns.
These fully supervised trained models achieve excellent shadow removal results by being trained on paired datasets containing shadow images, non-shadow images, and corresponding shadow masks. However, constantly changing natural illumination makes it difficult to maintain consistency in color, brightness, and other visual attributes between paired images during data collection. Such constraints limit the diversity of acquisition scenarios and weaken the model’s generalization capability across different environments.
To overcome the scarcity of paired data, GAN-based [20] shadow removal models have attracted growing interest in unsupervised and weakly supervised learning scenarios. While unsupervised methods do not require labeled data, they face a significant domain gap between the shadowed and non-shadowed image domains [21]. In contrast, weakly supervised training is driven solely by the shadow image and its corresponding mask. This paradigm guides the model by utilizing features from non-shadow regions, which circumvents the domain gap issue. This method avoids reliance on large amounts of paired data, improving the practicality and generalization capability of the shadow removal model.
Liu et al. [22] propose G2R-ShadowNet, a weakly supervised shadow removal framework trained using only shadow images and their corresponding masks. Its shadow generation subnet first stylizes the non-shadow regions of an image to create pseudo-shadow regions. These generated regions are then paired with the original non-shadow regions to train the shadow removal and refinement subnets in an end-to-end manner. From the perspective of symmetry, G2R-ShadowNet exploits the structural and chromatic correspondence between shadowed and non-shadow regions through pairs of pseudo-shadow and non-shadow regions constructed from a single shadow image.
However, the pseudo-shadows generated by the shadow generation subnet often lack fine structural details and are not sufficiently realistic. This limits the quality of the constructed training pairs and affects the performance of the subsequent shadow removal network.
As shown in Figure 1, the visual comparison reveals that G2R-ShadowNet fails to recover fine details, leading to a loss of texture and unrealistic smoothness. Additionally, the shadow-removed result remains inconsistent in color relative to the ground truth image. This highlights the model's limitation in preserving color consistency.
To address the above issues, we propose an ab-dynamic feature refinement network (AB-DFRNet), a symmetry-guided weakly supervised framework for shadow removal. In this work, “ab-dynamic feature refinement” is defined as a design that performs dynamic, content-adaptive feature reweighting during multi-scale fusion via the dual-attention adaptive fusion module. It further enforces chrominance symmetry in the Lab space by applying a chrominance-only consistency loss to the a and b channels. In addition, a high-frequency information enhancement module is introduced into the shadow generation subnet to refine high-frequency features and enrich pseudo-shadow textures. Together, these designs improve texture detail recovery in shadow-removed regions, and effectively suppress chromatic distortion for more natural restoration. The key contributions of this paper can be summarized as follows:
A high-frequency information enhancement (HFIE) module is introduced into the shadow generation subnet to refine high-frequency features, promoting the learning of fine structural symmetry and enriching pseudo-shadow texture details.
A dual-attention adaptive fusion (DAAF) module is introduced into the shadow removal subnet to enable content-adaptive feature refinement during fusion. It combines global attention to emphasize semantic information and local attention to preserve fine details, improving multi-scale representation and texture recovery in shadow-removed regions.
A chrominance-only consistency (COC) loss is designed in the Lab color space to constrain the L1 loss of the a and b channels between shadow-removed regions and their non-shadow counterparts. This loss encourages the network to learn a more symmetric distribution of chromatic information, thereby effectively suppressing color distortion in the recovered areas.
AB-DFRNet is proposed as a weakly supervised shadow removal model. Experimental results on the ISTD, SRD, and Video Shadow Removal datasets show that the proposed model significantly improves the shadow removal performance of G2R-ShadowNet. Moreover, the model achieves competitive results compared to other weakly supervised models.

2. Related Work

2.1. GAN-Based Shadow Removal

Generative adversarial networks (GANs) are extensively utilized across a range of low-level vision tasks, such as image deblurring, deraining, dehazing, and shadow removal. For shadow removal, GANs excel due to their robust image generation capabilities, effectively removing shadows while preserving the texture and structural integrity of real scenes. As shown in Table 1, early GAN-based shadow removal models primarily relied on fully supervised learning. Wang et al. [23] propose the ST-CGAN model, which enables end-to-end joint learning of shadow detection and shadow removal. It employs two stacked CGAN modules to handle detection and removal separately. The model improves global consistency through adversarial training, significantly enhancing the accuracy of shadow boundary processing. Sidorov [24] proposes AngularGAN to address color constancy under multi-illuminant conditions. Built on the GAN framework, it introduces an angular loss function to minimize illumination estimation errors. This allows high-quality color correction under unknown lighting conditions. The model is also applied to shadow removal.
To reduce the dependence on paired data, some studies have shifted toward unsupervised and weakly supervised methods. Hu et al. [25] propose the Mask-ShadowGAN framework based on CycleGAN [26], which is trained on unpaired shadow and shadow-free images. The framework can simultaneously learn to generate shadow masks and remove shadows. By leveraging the estimated shadow mask as guidance, it turns the uncertain mapping between shadow and shadow-free images into a more deterministic image translation process. LG-ShadowNet [27] transforms the input image into the Lab color space and leverages the luminance channel for lightness guidance, significantly improving shadow removal performance of Mask-ShadowGAN.
Table 1. Comparison of representative GAN-based shadow removal methods in terms of supervision setting and training data.

| Method | Supervision Level | Shadow Image | Shadow-Free GT | Shadow Mask | Color Space |
|---|---|---|---|---|---|
| ST-CGAN [23] | Fully Supervised | Yes | Yes | Yes | RGB |
| AngularGAN [24] | Fully Supervised | Yes | Yes | Yes | RGB |
| Mask-ShadowGAN [25] | Unsupervised | Yes | Yes (unpaired) | No (learned) | RGB |
| LG-ShadowNet [27] | Unsupervised | Yes | Yes (unpaired) | No (learned) | Lab |
| Le et al. [21] | Weakly Supervised | Yes | No | Yes | RGB |
| G2R-ShadowNet [22] | Weakly Supervised | Yes | No | Yes (GT/pred.) | Lab |
| HQSS [28] | Weakly Supervised | Yes | No | Yes (GT/pred.) | Lab |
| Ours | Weakly Supervised | Yes | No | Yes (GT/pred.) | Lab |
To address the domain gap between shadow and shadow-free images in unsupervised methods, Le et al. [21] proposed a method that crops shadow and non-shadow patches from shadow images. This method uses only the shadow mask for supervision to train parameters predicting physical constraints for shadow removal. Liu et al. [22] developed G2R-ShadowNet based on a weakly supervised training strategy. It generates pseudo-shadow regions from the non-shadow regions of input images and pairs them with those non-shadow regions for training. This method effectively reduces reliance on precisely paired data and improves the performance and robustness of shadow removal. HQSS [28] addresses the issue of low-quality pseudo-shadow images limiting model performance. It introduces a new generation framework that includes a shadow feature encoder and a generator. This framework synthesizes high-quality pseudo-shadow images while preserving shadow features and detailed texture information.

2.2. Feature Fusion

The integration of multi-level, multi-scale, and multi-modal visual representations through feature fusion substantially enhances model capabilities across various computer vision applications. In low-level vision tasks such as shadow detection and removal, and image dehazing, feature fusion enables the model to effectively integrate fine-grained textures with global contextual information. This integration mechanism enhances the visual quality of the generated images.
DSCNet [29] introduces the direction-aware spatial context (DSC) module, leveraging a multi-directional feature fusion strategy and cross-layer connections. By capturing local-global shadow dependencies, the model enhances shadow detection and removal performance. Zhao et al. [30] proposed a mix feature fusion module in the decoder of their dehazing model. This module employs adaptive upsampling and channel-spatial attention to automatically weight features from different paths. This enables the model to focus on high-frequency textures and color fidelity while effectively capturing key regions, thereby enhancing the fusion of shallow and deep features. MFAF-Net [31] introduces an AFM module into the U-Net decoder structure. The module uses different convolutional layers to process features at various levels. It then applies an attention mechanism to assign adaptive weights for feature fusion. This design further improves the dehazing performance of the model. D-Net [32] proposes a dynamic feature fusion (DFF) module that adaptively combines multi-scale features. This strategy strengthens the model's capacity to preserve hierarchical details and improves overall segmentation accuracy.

2.3. Auxiliary Loss Functions for Shadow Removal

In shadow removal tasks, relying solely on pixel-level losses often leads to blurred details or structural distortions in generated images. To address this limitation, researchers have designed various auxiliary loss functions to enhance output quality.
In early studies, Gatys et al. [33] computed the style loss by using the Gram matrices and Euclidean distances of corresponding feature maps. This method reduces differences in color and texture styles between generated and real images. Perceptual loss [34] has been widely adopted in shadow removal models as an auxiliary loss function. It minimizes the distance between generated and real images in the feature space, preserving high-level visual attributes. The perceptual loss is commonly integrated with other losses in networks such as DC-ShadowNet [35], DHAN [36], and TBRNet [13]. It effectively maintains structural consistency and texture authenticity between shadow-removed regions and non-shadow areas in shadow removal tasks. DC-ShadowNet [35] further introduced a boundary smoothness loss to enhance shadow removal performance. This loss function constrains gradient smoothness in shadow boundary regions, ensuring natural transitions between shadow-removed and non-shadow areas. PUL [37] builds upon Mask-ShadowGAN by integrating color loss, content loss, style loss and adversarial loss to enhance the overall shadow removal performance. The color loss is constructed using the MSE between Gaussian-smoothed images to correct color deviations in shadow regions. The content loss preserves the structural integrity of the image. The style loss ensures natural texture transitions. SG-ShadowNet [38] applies a spatial consistency loss. This loss enforces spatial consistency by matching the relative intensity differences between adjacent local regions in the generated image and the real image. It ensures global structural harmony in the shadow-removed image.

3. Method

As illustrated in Figure 2, the proposed AB-DFRNet architecture is constructed based on G2R-ShadowNet [22]. This framework comprises three key components: a shadow generation subnet, a shadow removal subnet, and a refinement subnet.
During the training phase, the shadow region R_s is first extracted from the input shadow image S using its corresponding mask M_2. To generate a realistic non-shadow region R_n, a shadow mask M_1 is randomly drawn from the training set and applied to the shadow-free portion of the input image S. These unaligned regions, R_n and R_s, are then fed into the shadow generator G_S and the discriminator D separately. Through adversarial training, pseudo-shadow regions R_ps are generated. The discriminator D is utilized to distinguish between pseudo-shadow regions R_ps and randomly sampled real shadow regions R_s.
Subsequently, the pseudo-shadow R_ps is fed into the shadow removal subnet S_R, producing the coarse shadow-free result R_f. The S_R subnet is optimized by minimizing the L1 loss computed between R_f and R_n.
The coarse result R_f is combined with the remaining regions of the input image through element-wise addition. The resulting feature is then concatenated with the mask M_1 to form the intermediate representation R_e. This representation is used as the input to the refinement subnet R to produce the output R_r. Finally, the refinement subnet R is trained using the L1 loss between R_r and the input image S. The concatenation of R_r with the mask M_1 is additionally compared with R_n through the COC loss to further optimize the training of R.
During the testing phase, the shadow region R_s is extracted from the input shadow image using its corresponding mask. R_s is then processed by the shadow removal subnet S_R to obtain the initial shadow-free result R_f. This result is embedded into the original shadow image and concatenated with the mask. The combined input is fed into the refinement subnet R to produce the restored shadow-free image.
The three subnets are mainly constructed based on the generator proposed by Hu et al. [25]. For the shadow generation subnet, an HFIE module is added to extract high-frequency features. The complete architecture is illustrated in Figure 3. For the shadow removal subnet, a DAAF module is added to fuse the features from different levels, with its structure shown in Figure 4. Detailed descriptions of the HFIE and DAAF modules are provided in Section 3.1 and Section 3.2, respectively. The refinement subnet directly utilizes the original generator structure from Hu et al. [25], while the discriminator D employs the PatchGAN [39] design. The overall structure of our model and the related subnets and modules are shown in Table 2.

3.1. High-Frequency Information Enhancer Module

When the pseudo-shadows generated by the shadow generation subnet more closely approximate real shadows, they provide more faithful supervision for training. Consequently, the shadow removal subnet produces results with richer structural and textural details. Inspired by HLFD [40], a high-frequency information enhancement (HFIE) module is introduced to improve the quality of pseudo-shadow image generation. Unlike HLFD, which mainly focuses on reconstructing both low- and high-frequency components, HFIE drops the low-frequency branch and dedicates all the capacity to refining the high-frequency residuals. This design emphasizes edges and textures in shadow regions rather than re-estimating smooth illumination. The structure of this module is shown in Figure 5.
Concretely, the low-frequency components of the input feature map F_in are obtained through downsampling with average pooling. The low-frequency information is then upsampled through interpolation and subtracted from the input features to derive the high-frequency information. To further enhance this residual, a dense block is constructed by passing the output of each layer to all subsequent layers through cross-layer connections. This design enables multi-level feature fusion and nonlinear transformations, thereby strengthening the representation of edge and texture details and generating the enhanced high-frequency feature F_enh. The dense block contains multiple densely connected layers, each comprising a 3 × 3 convolutional kernel and a GELU activation function. The processing steps of the module are described by the following formulas:
F_low = f_avg_pooling(F_in)
F_high = F_in − f_upsample(F_low)
x_0 = F_high
x_i = GELU(f_conv3×3(x_(i−1)) + Σ_{k=0}^{i−1} x_k),  i = 1, …, 6
F_enh = x_6
where f_upsample and f_avg_pooling denote the upsampling and average pooling operations, respectively, GELU represents the GELU activation function, and f_conv3×3 indicates a 3 × 3 convolutional kernel.
The HFIE module separates and enhances high-frequency information, improving the perception of complex textures in the shadow generation subnet. This allows the generated pseudo-shadow images to more faithfully preserve the fine structural symmetry between shadow and non-shadow regions, resulting in more realistic pseudo-shadows and higher-quality training samples for subsequent shadow removal stages.
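Under the formulation above, the HFIE computation can be sketched in PyTorch. The number of dense layers (six) follows the equations; the channel width and pooling stride here are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HFIE(nn.Module):
    """Sketch of the high-frequency information enhancement module."""

    def __init__(self, channels: int, num_layers: int = 6):
        super().__init__()
        # Densely connected 3x3 convolutions, one per dense layer.
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_layers)
        )

    def forward(self, f_in: torch.Tensor) -> torch.Tensor:
        # Low-frequency component via average pooling (stride 2 assumed),
        # then upsampled back to the input resolution.
        f_low = F.avg_pool2d(f_in, kernel_size=2)
        f_low_up = F.interpolate(f_low, size=f_in.shape[-2:],
                                 mode="bilinear", align_corners=False)
        # High-frequency residual: input minus upsampled low frequencies.
        outputs = [f_in - f_low_up]  # x_0
        for conv in self.convs:
            # x_i = GELU(conv(x_{i-1}) + sum of all previous x_k)
            outputs.append(F.gelu(conv(outputs[-1]) + sum(outputs)))
        return outputs[-1]  # F_enh = x_6
```

The dense summation mirrors the cross-layer connections in the equations: each layer's pre-activation adds the outputs of all earlier layers.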

3.2. Dual-Attention Adaptive Fusion Module

Inspired by the dynamic feature fusion (DFF) [32] module, a dual-attention adaptive fusion (DAAF) module is proposed to effectively integrate global contextual information with complex local details in the shadow removal subnet. Unlike the DFF module, our module employs channel and spatial attention mechanisms to perform refined, weighted fusion of backbone features and skip-connection features along the channel and spatial dimensions, respectively. The importance of deep and shallow features is dynamically adjusted, achieving optimal integration of global and local information.
The architecture of the DAAF module is illustrated in Figure 6. The two input features F_1 and F_2 are first concatenated along the channel dimension to form the feature F_cat. Subsequently, F_cat is processed through a branch containing global average pooling, followed by a 1 × 1 convolutional layer and sigmoid activation to generate attention weights. These weights are then element-wise multiplied with F_cat to produce the adjusted feature F_g. To further enhance local details, F_cat is split into two sub-features along the channel dimension. Channel-attention [42] and pixel-attention [41] mechanisms are separately applied to obtain the refined features F_ca and F_pa. These features are then concatenated and processed through a 1 × 1 convolution followed by sigmoid activation to generate the attention weights. Finally, F_g is processed through a 1 × 1 convolutional layer for channel adjustment, then multiplied with the attention weights to generate the output feature F_out. The formulation of this process is presented as follows:
F_cat = f_concat(F_1, F_2)
F_g = σ(f_conv(f_adaptive_pooling(F_cat))) ⊗ F_cat
F_c1, F_c2 = f_split(F_cat)
F_out = σ(f_conv(f_concat(f_ca(F_c1), f_pa(F_c2)))) ⊗ f_conv(F_g)
where f_split denotes the channel splitting operation, f_ca and f_pa represent the channel attention mechanism and pixel attention mechanism, respectively, σ indicates the sigmoid activation function, and ⊗ represents element-wise multiplication.
The DAAF module dynamically adjusts feature weight distributions across both channel and spatial dimensions to enhance critical features while suppressing redundant or interfering information. This adaptive mechanism enables effective cross-level feature fusion, maintaining structural consistency while significantly improving local detail restoration, thereby optimizing overall shadow removal performance.
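The fusion described above can be sketched in PyTorch. The exact forms of the channel- and pixel-attention blocks [41,42] and the reduction ratio are assumptions (SE-style and FFA-Net-style variants are used here), not the paper's precise configuration:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # SE-style channel attention; the reduction ratio r is an assumption.
    def __init__(self, c: int, r: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c // r, c, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.net(x)

class PixelAttention(nn.Module):
    # FFA-Net-style pixel attention: a single-channel spatial gating map.
    def __init__(self, c: int, r: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c, c // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c // r, 1, 1), nn.Sigmoid())
    def forward(self, x):
        return x * self.net(x)

class DAAF(nn.Module):
    def __init__(self, c: int):  # c = channels of each input feature
        super().__init__()
        self.global_gate = nn.Sequential(           # GAP -> 1x1 conv -> sigmoid
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(2 * c, 2 * c, 1), nn.Sigmoid())
        self.ca = ChannelAttention(c)
        self.pa = PixelAttention(c)
        self.fuse = nn.Sequential(nn.Conv2d(2 * c, c, 1), nn.Sigmoid())
        self.adjust = nn.Conv2d(2 * c, c, 1)        # channel adjustment of F_g

    def forward(self, f1, f2):
        f_cat = torch.cat([f1, f2], dim=1)
        f_g = self.global_gate(f_cat) * f_cat       # global branch
        f_c1, f_c2 = torch.chunk(f_cat, 2, dim=1)   # split for local branch
        weights = self.fuse(torch.cat([self.ca(f_c1), self.pa(f_c2)], dim=1))
        return weights * self.adjust(f_g)           # F_out
```

The output keeps the per-input channel count, so the module can replace a plain concatenation-plus-convolution fusion step in a U-Net-style decoder.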

3.3. Color Consistency Loss

To enhance the refinement capability of the refinement subnet, we design a chrominance-only consistency (COC) loss in the Lab color space. It minimizes the chrominance discrepancy between shadow-removed regions and real non-shadow regions. This leverages their underlying chromatic symmetry to guide the network toward more consistent color restoration.
Specifically, the refined output R_r from the refinement subnet in Figure 2 is combined with the shadow mask M_1 to obtain the masked result R_rm. We then extract the a and b chrominance channels from both R_rm and the reference non-shadow region R_n. A masked L1 distance is computed on each chrominance channel independently, and the two terms are weighted and summed to form the final loss. The formulation of the COC loss function is presented as follows:
L_coc = λ_a ‖A(R_rm) − A(R_n)‖_1 + λ_b ‖B(R_rm) − B(R_n)‖_1
where λ_a and λ_b are the weighting parameters, both set to 0.5 in this paper, and A(R_rm), A(R_n), B(R_rm), and B(R_n) denote the a and b chrominance channels extracted in the Lab color space from R_rm and R_n, respectively.
During training, the chrominance-only consistency loss is added as an auxiliary term together with other losses. This guides the model toward more color-consistent outputs. As a result, the authenticity and naturalness of shadow removal are significantly improved.
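A minimal PyTorch sketch of the COC loss, assuming both inputs are (N, 3, H, W) tensors already converted to the Lab color space (channels ordered L, a, b) and already masked; the RGB-to-Lab conversion step is omitted:

```python
import torch

def coc_loss(r_rm_lab: torch.Tensor, r_n_lab: torch.Tensor,
             lam_a: float = 0.5, lam_b: float = 0.5) -> torch.Tensor:
    """Chrominance-only consistency loss on the a and b channels."""
    # L1 distance on the a channel (index 1) and b channel (index 2).
    a_diff = torch.abs(r_rm_lab[:, 1] - r_n_lab[:, 1]).mean()
    b_diff = torch.abs(r_rm_lab[:, 2] - r_n_lab[:, 2]).mean()
    return lam_a * a_diff + lam_b * b_diff
```

Because the L channel is excluded, the loss constrains only chromatic consistency and leaves luminance restoration to the other loss terms.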

3.4. Loss Function

Based on the loss design of G2R-ShadowNet, the chrominance-only consistency loss is incorporated into the training process. It is combined with the other loss functions to jointly optimize the model. The total loss function L_all is defined as follows:
L_all = ω_1 L_GAN + ω_2 L_iden + ω_3 L_rem + ω_4 L_full + ω_5 L_area + ω_6 L_coc
where ω_1, ω_2, ω_3, ω_4, ω_5, and ω_6 are the weighting parameters for the corresponding loss functions. They are set to 1.0, 5.0, 1.0, 1.0, 1.0, and 0.5, respectively, in this paper. L_GAN denotes the integrated loss for the generator G_S and the discriminator D in the shadow generation subnet. L_iden represents the identity loss function for the generator G_S in the shadow generation subnet. L_rem denotes the loss function for the generator S_R in the shadow removal subnet. L_full indicates the loss function for the generator R in the refinement subnet. L_area represents the loss function in G2R-ShadowNet that focuses on the regions adjacent to R_n. Except for the COC loss function proposed in this paper, all other loss functions are computed using the original formulations from G2R-ShadowNet [22].
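The weighted combination can be sketched as follows; the individual loss values passed in are placeholders, and the keys are shorthand names for the terms listed above:

```python
# Loss weights from the total-loss definition (ω_1 … ω_6).
weights = {"gan": 1.0, "iden": 5.0, "rem": 1.0, "full": 1.0,
           "area": 1.0, "coc": 0.5}

def total_loss(losses: dict) -> float:
    """Weighted sum of the individual loss terms."""
    return sum(weights[k] * losses[k] for k in weights)
```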

4. Experiment

4.1. Datasets

Our model is trained and validated on three shadow removal datasets: ISTD, SRD, and Video Shadow Removal. A brief description of each dataset is provided below:
ISTD [23,43]: The dataset comprises 1870 image triplets, each consisting of a shadow image, its shadow mask and the corresponding shadow-free image. Among them, 1330 image triplets are selected for training, with the remaining 540 used in the testing phase. All images have a resolution of 480 × 640 and cover a total of 135 different scenes. To improve the reliability of the evaluation, Le et al. [43] refined the original ISTD dataset to construct the ISTD+ dataset, which enhances color consistency between shadow and shadow-free image pairs. Following previous work [22,28], we use the shadow images and corresponding ground-truth masks from ISTD to train the model. During testing, we evaluate performance on the ISTD+ test set using both the ground-truth masks and the masks generated by the shadow detector [44] separately.
SRD [45]: The dataset contains 2680 pairs of shadow and shadow-free images. Due to the absence of shadow masks in the SRD dataset, the shadow masks generated by two shadow detection models (DHAN [36] and Liu et al. [46]) are employed for training and testing in separate experiments.
Video Shadow Removal [21]: The dataset consists of 8 videos featuring a static background with no visible moving objects. Variations in the videos are solely caused by moving shadows. By calculating the intensity extrema (Vmax and Vmin) at each pixel throughout the video sequence, a moving mask is generated to define regions of interest for evaluation. The Vmax image, which is derived from the maximum intensity values at each pixel, serves as the non-shadow ground truth. According to the official code, the threshold for generating the moving mask is set to 80.
CUHK-Shadow [47]: The CUHK-Shadow dataset is a large-scale benchmark for shadow detection, consisting of 10,500 images with pixel-level shadow annotations collected from five sources. Each image is provided with a corresponding shadow mask, covering diverse indoor and outdoor scenes with complex shadow patterns. To evaluate the generalization ability of the proposed model, only the Shadow-ADE subset of this dataset is used for testing.
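The moving-mask construction described for the Video Shadow Removal dataset can be sketched as follows, assuming the video is given as a (T, H, W) stack of grayscale intensities (the official code operates on intensity values with a threshold of 80):

```python
import numpy as np

def moving_mask(frames: np.ndarray, threshold: float = 80.0):
    """Sketch of the moving-mask construction for Video Shadow Removal.

    Returns the Vmax image (used as the non-shadow reference) and a
    boolean mask of pixels whose intensity range over the sequence
    exceeds the threshold, i.e., the regions of interest affected by
    moving shadows.
    """
    v_max = frames.max(axis=0)   # per-pixel maximum intensity over time
    v_min = frames.min(axis=0)   # per-pixel minimum intensity over time
    mask = (v_max - v_min) > threshold
    return v_max, mask
```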

4.2. Evaluation Metrics

For fair quantitative comparison with other models, we adopt the root-mean-square error (RMSE), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and learned perceptual image patch similarity (LPIPS) as evaluation metrics. The RMSE is computed in the Lab color space, while PSNR and SSIM are calculated in the RGB color space. Lower RMSE values indicate smaller deviations between the inferred results and the ground truth, whereas higher PSNR and SSIM values reflect better reconstruction quality. LPIPS measures perceptual similarity between the restored results and the ground truth in a deep feature space, where lower values indicate better perceptual quality. For a fair comparison, all evaluation images are resized to 256 × 256 pixels to obtain consistent quantitative metrics; LPIPS is the only exception and is evaluated on the original full-resolution images.
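The two primary metrics can be sketched as follows; the RMSE function assumes its inputs have already been converted to Lab per the protocol above, while PSNR operates on 8-bit RGB arrays:

```python
import numpy as np

def rmse(pred: np.ndarray, gt: np.ndarray) -> float:
    """Root-mean-square error; inputs are assumed to be Lab-space images."""
    diff = pred.astype(np.float64) - gt.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(pred: np.ndarray, gt: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio over 8-bit RGB images."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))
```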

4.3. Implementation Details

The AB-DFRNet model is implemented based on the PyTorch 1.8.1 framework, and all training and testing procedures are conducted on an NVIDIA GeForce RTX 3090 GPU. An end-to-end joint training strategy is employed, with the network parameters initialized from a Gaussian distribution with a mean of 0 and a standard deviation of 0.02. The Adam optimizer is applied to train all network components, where the first and second momentum parameters are set to 0.5 and 0.999, respectively. The entire model is trained for 150 epochs. The initial learning rate is set to 2 × 10 4 for the first 50 epochs, after which a linear decay strategy is applied to reduce it to zero over the remaining epochs. All experiments are conducted with a batch size of 1. To prevent overfitting, data augmentation is performed using random cropping and random flipping. Specifically, each image is first scaled to a resolution of 448 × 448, followed by random cropping of a 400 × 400 patch to generate the model input.
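The optimizer and learning-rate schedule described above can be sketched as follows; the single convolution stands in as a placeholder for the full network, and the Gaussian initialization follows the stated mean and standard deviation:

```python
import torch

# Placeholder module; the real model is the full AB-DFRNet.
model = torch.nn.Conv2d(3, 3, 3, padding=1)
torch.nn.init.normal_(model.weight, mean=0.0, std=0.02)

# Adam with the stated momentum parameters and initial learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.5, 0.999))

def lr_lambda(epoch: int) -> float:
    # Constant for the first 50 epochs, then linear decay to zero at epoch 150.
    return 1.0 if epoch < 50 else max(0.0, (150 - epoch) / 100.0)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```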

4.4. Comparison with the State of the Art

To evaluate the shadow removal performance of our model, we compare it with other shadow removal models on the ISTD and SRD datasets. The comparative results are shown in Table 3 and Table 4. Specifically, the weakly supervised models include FSS2SR [21], G2R-ShadowNet, HQSS, and BCDiff [48]. Based on our literature review, BCDiff is the current state-of-the-art weakly supervised model for shadow removal. The unsupervised models include Mask-ShadowGAN, DC-ShadowNet, and Enhancing unsupervised shadow removal [49]. The fully supervised models include DHAN, SG-ShadowNet, TBRNet [13], and DeS3 [17]. The testing employs two types of masks, where “detect” refers to the mask generated by the shadow detection model [44], while the remaining masks are obtained from the ground truth (GT) masks in the ISTD dataset.
According to the results of using the GT mask in Table 3, the proposed model achieves comparable accuracy to BCDiff within the shadow region, with marginal differences of 0.3 in RMSE and 0.73 in PSNR. The PSNR is 0.14 higher than that of BCDiff in the non-shadow region. Compared with the G2R-ShadowNet model, while maintaining similar performance in non-shadow regions, our model reduces the RMSE in shadow regions by 0.7 and improves the PSNR by 1.2. In addition, our model achieves a lower LPIPS value, indicating improved perceptual quality and closer visual similarity to the ground truth.
Compared with the unsupervised model proposed by Wang et al. [49], our model's RMSE in shadow regions is 0.6 higher and its PSNR is 0.38 lower. More importantly, our model surpasses the other unsupervised models in non-shadow regions and in overall image quality. Compared with fully supervised models, although a certain performance gap remains in shadow removal, our results in the non-shadow region reach a comparable level.
As presented in the result of using shadow detector masks in Table 4, our model outperforms G2R-ShadowNet in terms of both shadow region removal and overall image quality. Specifically, the RMSE in shadow regions is reduced by 1.9 and the PSNR is improved by 0.99 compared with G2R-ShadowNet. The overall image RMSE is reduced by 0.8, and the PSNR is improved by 0.78. Compared with unsupervised models, our model shows superior performance in non-shadow regions.
In addition, training and testing are also carried out using more accurate shadow masks provided by Liu et al. [46]. Experimental results show that a more precise shadow mask can further improve the shadow removal performance in both the shadow regions and the overall image, as marked by “GT” in Table 4. Specifically, in the shadow region, RMSE is further reduced by 0.9, and PSNR is increased by 0.9. For the overall image, RMSE is also reduced by 0.3 and PSNR is increased by 1.27. These results indicate that our method exhibits better adaptability to mask accuracy. When shadow regions are more precisely localized, the shadow removal capability of the model can be more fully exploited, resulting in higher-quality restoration results. Nevertheless, compared with the results on the ISTD+ dataset shown in Table 3, our model shows relatively limited improvements on SRD, which is likely due to the more challenging scenarios. This suggests that texture-rich shadow regions and heavy shadows still leave room for further improvement.
To further assess the generalization ability of the model, the model is evaluated on the Video Shadow Removal dataset using weights trained on the ISTD dataset. Additionally, shadow masks produced by a shadow detector [44] trained on the SBU dataset [53] are incorporated into the evaluation. As shown in the results of Table 5, the proposed model outperforms G2R-ShadowNet in all performance metrics. Compared with other models, our model does not achieve the best result in terms of PSNR. Nonetheless, it surpasses all other compared models in the remaining metrics, demonstrating strong generalization performance on unseen data.
We further evaluate the complexity of our method and G2R-ShadowNet. As shown in Table 6, our method introduces only 0.52M additional parameters and 9.99 G additional MACs for a single forward pass at an input resolution of 480 × 640 . Importantly, the HFIE module is only inserted into the shadow-generation subnet during training, while inference uses only the shadow-removal subnet and the refinement subnet. Therefore, the increase in inference-time parameters and computation mainly comes from the DAAF module added to the shadow-removal subnet. In terms of runtime, the inference time increases slightly by 3.22 ms. Despite this minor overhead, our method achieves clearly better quantitative results and visual quality than G2R-ShadowNet.
Compared with LG-ShadowNet and Mask-ShadowGAN, our method indeed has a longer inference time. This is mainly because these two methods typically require only a single network with one forward pass during inference, whereas our inference pipeline uses both the shadow removal subnet and the shadow refinement subnet. In addition, we introduce the DAAF module into the shadow-removal subnet to improve performance in shadow regions, which increases the inference cost of our model.
To complement pixel-level metrics, we compare the shadow removal results with the shadow-free ground truth on the luminance channel. We report gradient-domain structural consistency using Grad-L1, GMSD, and GMS-mean [54]. Grad-L1 is the mean L1 distance between the Sobel gradient-magnitude maps of the shadow removal result and the ground truth within the evaluated region. Using the same Sobel gradient-magnitude maps, GMSD and GMS-mean summarize the dispersion and the average of the gradient-magnitude similarity, respectively. To capture texture-frequency strength, we further use Laplacian energy as a proxy for high-frequency texture contrast and report its absolute difference between the shadow removal results and the ground truth, denoted as LapEnergyDiff.
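A minimal NumPy sketch of the Grad-L1 metric, assuming Sobel filtering with edge padding; the helper names are ours, and GMSD/GMS-mean would reuse the same gradient-magnitude maps.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def _filter3(img, kernel):
    """3x3 filtering with edge padding (minimal NumPy implementation)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def grad_magnitude(img):
    """Sobel gradient-magnitude map of a single-channel image."""
    gx = _filter3(img, SOBEL_X)
    gy = _filter3(img, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def grad_l1(pred, gt, mask=None):
    """Mean L1 distance between Sobel gradient-magnitude maps,
    optionally restricted to a region mask, as described above."""
    diff = np.abs(grad_magnitude(pred) - grad_magnitude(gt))
    if mask is not None:
        diff = diff[mask.astype(bool)]
    return float(diff.mean())
```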
As shown in Table 7, we compute the metrics on the shadow region and on the edge region. In the shadow region, our model achieves better gradient-domain consistency than G2R-ShadowNet and HQSS. It yields lower Grad-L1 and GMSD and higher GMS-mean. This indicates that the restored shadow region is closer to the ground truth in texture–structure statistics and shows fewer local structural fluctuations. On the edge region, our model again improves gradient consistency over G2R-ShadowNet and HQSS. It achieves lower Grad-L1 and GMSD and higher GMS-mean, suggesting more coherent texture transitions near the boundary. LapEnergyDiff shows small gaps across methods in both regions. This suggests comparable overall high-frequency strength across methods. Meanwhile, our model consistently improves the gradient metrics, highlighting more accurate texture–structure restoration.
We additionally perform statistical analysis on the ISTD test set. Per-image PSNR and SSIM are computed within the shadow region using the dataset-provided shadow masks, and the resulting per-image metrics are exported for analysis. The proposed model is then compared with competing methods by computing per-image performance differences, and the mean improvement is reported together with its 95% confidence interval (CI) based on the t distribution.
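The paired analysis described above (mean per-image improvement with a t-based 95% CI) can be sketched as follows; the function name is ours, and SciPy's stats module is assumed for the t quantile and the paired t-test.

```python
import numpy as np
from scipy import stats

def paired_mean_ci(ours, baseline, alpha=0.05):
    """Mean per-image improvement, its (1 - alpha) t-based confidence
    interval, and the paired t-test p-value (illustrative sketch of the
    statistical analysis described in the text)."""
    d = np.asarray(ours, dtype=float) - np.asarray(baseline, dtype=float)
    n = d.size
    mean = d.mean()
    sem = d.std(ddof=1) / np.sqrt(n)                 # standard error of the mean
    tcrit = stats.t.ppf(1 - alpha / 2, df=n - 1)     # two-sided t quantile
    _, p_value = stats.ttest_rel(ours, baseline)
    return mean, (mean - tcrit * sem, mean + tcrit * sem), p_value
```

A Wilcoxon signed-rank test could be run on the same paired differences via scipy.stats.wilcoxon.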
As shown in Table 8, the proposed method consistently outperforms both competing approaches in the shadow region on PSNR and SSIM. Compared with G2R-ShadowNet, our method improves PSNR by 1.20 and improves SSIM by 0.0038. Compared with HQSS, our model achieves a PSNR gain of 0.66 and an SSIM gain of 0.0013. In all weakly supervised models, the confidence intervals do not cross zero, and both paired t-tests and Wilcoxon signed-rank tests yield significant p-values, indicating statistically significant and consistent improvements across the test images.
Compared with the unsupervised model LG-ShadowNet, our method improves shadow-region PSNR by 2.73, and the 95% CI of the paired gain does not cross zero. Both the paired t-test and the Wilcoxon test are highly significant. For SSIM, the difference between the two methods is very small. The confidence interval crosses zero, indicating that the difference is not statistically significant. This suggests that their structural similarity is essentially comparable.
We also conduct qualitative visual analysis on the test results of the model, and the comparison is shown in Figure 7 and Figure 8. For a fair comparison, publicly available weights of G2R-ShadowNet and HQSS trained on the ISTD datasets are used to perform inference and regenerate shadow removal results. The results of FSS2SR and LG-ShadowNet are obtained from public sources on the ISTD+ test set. Additionally, the results of G2R-ShadowNet on the SRD test set are obtained through retraining. The results of HQSS are from publicly available test outputs.
By comparing the results in Figure 7 and Figure 8, it can be concluded that the shadow removal results of our model are more natural than those of other models. For instance, the colors in the shadow-removed regions are closer to those of the shadow-free ground truth image in the first row of Figure 7. In the second row of Figure 8, our model effectively removes shadows while better preserving the texture details of the original images. In contrast, the texture details in the shadow-removed regions generated by HQSS and G2R-ShadowNet appear relatively blurred.
As shown in Figure 9, we present a comparison between our method and DC-ShadowNet on the SRD dataset. Unlike our weakly supervised setting that uses shadow masks as guidance, DC-ShadowNet is trained without masks and directly leverages shadow and shadow-free image pairs during training. Without paired shadow-free targets, our model relies mainly on non-shadow regions and mask guidance. When shadows are strong or textures are complex, the available supervision may be insufficient, which can result in residual shadows and less smooth boundary transitions.
The proposed method effectively improves shadow removal quality and overall visual consistency. Despite these improvements, challenges still remain in complex scenarios. As shown in Figure 10, high-contrast hard shadows may cause luminance over-compensation, leading to slightly washed-out appearances. In other cases, shadows may not be completely removed, resulting in residual shadow artifacts. Moreover, texture-rich regions such as grass may suffer from weakened details and less natural texture blending. Consequently, fine details in the corrected regions may appear blurrier in some areas when compared with the ground truth.
To further evaluate the generalization ability and stability of our model in real-world scenarios, the model weights trained on the SRD dataset are applied to the CUHK test dataset (ADE subset). No additional fine-tuning or retraining is performed.
As shown in Figure 11, the model is able to significantly reduce shadow intensity in real scenes, leading to visibly lighter shadow regions and demonstrating a certain degree of cross-dataset and in-the-wild generalization. Nevertheless, for deep or large-area shadows, the restored regions may still exhibit noticeable brightness or color discrepancies compared with surrounding areas. In texture-rich regions, such as gravel roads or dense vegetation, fine details may become blurred or weakened after shadow removal. Overall, these observations indicate that achieving robust shadow removal remains challenging under weak supervision in complex scenes.

4.5. Ablation Study

To verify the effectiveness of different modules and loss functions proposed in this paper, an ablation study is conducted on the ISTD dataset and SRD dataset, including training, validation, and performance analysis. In addition, ablation experiments are carried out to evaluate the impact of various module configurations and weight settings of the loss function.

4.5.1. Ablation on Modules and Loss

Table 9 reports the ablation study on the ISTD+ dataset, with G2R-ShadowNet as the baseline. It is found that when the DAAF module is added, the RMSE in the shadow regions is reduced by 0.2 and the PSNR is improved by 0.39. For all the images, the RMSE decreases by 0.1 and the PSNR increases by 0.26. Further integration of the HFIE module leads to an additional RMSE reduction of 0.3 in the shadow regions and a PSNR improvement of 0.53. The overall PSNR of the image increases by 0.33. Finally, when the COC loss is introduced, the best performance in shadow removal is achieved. Compared to the baseline model, the RMSE in the shadow regions decreases by 0.7 and the PSNR increases by 1.2. For the whole image, the RMSE is reduced by 0.1 and the PSNR is improved by 0.75. Table 10 presents additional ablations on the SRD dataset. Compared with the baseline, the full model with DAAF, HFIE, and the COC loss reduces the shadow-region RMSE by 1.2 and improves PSNR by 0.70. For the whole image, RMSE decreases by 0.5 and PSNR increases by 0.56.
Moreover, component-wise ablations are performed by removing DAAF, HFIE, or the COC loss individually under identical model architecture and training settings. Table 9 and Table 10 show that the full configuration achieves the best performance, and removing any single component results in consistent degradations in shadow-region or overall metrics. This provides further evidence that the proposed modules and the COC loss contribute effectively and work in a complementary manner.

4.5.2. Ablation Study on DAAF Module

To verify that the DAAF module enhances feature fusion for texture–structure preservation, we use Grad-L1 and GMSD/GMS-mean to measure consistency between the output and the ground truth. As shown in Table 11, compared with w/o DAAF, our full model achieves lower Grad-L1 and GMSD, and higher GMS-mean. This indicates that the restored shadow region is closer to the ground truth in texture–structure statistics and shows fewer local structural fluctuations.
To evaluate the effectiveness and complementarity of the attention mechanisms introduced in the DAAF module, an ablation study examines the channel attention and pixel attention mechanisms. With the overall model architecture kept unchanged, three module variants are constructed and evaluated under the same training and validation settings. The experimental results are summarized in Table 12. Specifically, “w/o pa” denotes the module with only channel attention, “w/o ca” the module with only pixel attention, and “w/o att” the module with neither attention mechanism.
The results show that the best performance is achieved only when both the channel attention (CA) and pixel attention (PA) mechanisms are integrated. Although either attention mechanism alone can improve the shadow removal capability, neither performs as well as the combination of both. The worst performance is observed when both attention mechanisms are removed entirely.
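The two attention branches being ablated can be sketched as follows. This is a minimal NumPy illustration of generic channel and pixel attention under our own simplifying assumptions (single-layer gates, a fixed CA-then-PA order); the actual DAAF wiring and gate networks differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w_ca):
    """Channel attention (CA): global-average-pool each channel into a
    descriptor, pass it through a single linear gate, and rescale the
    channels. feat: C x H x W; w_ca: C x C (hypothetical gate weights)."""
    desc = feat.mean(axis=(1, 2))            # C-dim channel descriptor
    gate = sigmoid(w_ca @ desc)              # per-channel weights in (0, 1)
    return feat * gate[:, None, None]

def pixel_attention(feat, w_pa):
    """Pixel attention (PA): project channels to one spatial map with a
    1x1-style weight vector, sigmoid, rescale per pixel. w_pa: 1 x C."""
    att = sigmoid(np.tensordot(w_pa, feat, axes=([1], [0])))  # 1 x H x W
    return feat * att

def dual_attention(feat, w_ca, w_pa):
    """Sketch of combining both branches, mirroring the full variant in
    the ablation; 'w/o ca' or 'w/o pa' would drop the respective call."""
    return pixel_attention(channel_attention(feat, w_ca), w_pa)
```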
In addition, visual examples of shadow removal results from the ablation study are provided. As shown in Figure 12, the shadow-free images generated by the model with both attention mechanisms appear more natural and contain richer detail information in shadow-removed regions. In contrast, removing either attention mechanism leads to poor color consistency or blurred textures in the same regions.

4.5.3. Ablation Study on HFIE Module

To justify the architectural design of placing HFIE in the shadow generation subnet, we conduct an ablation study that compares different HFIE placements, including adding HFIE to the generation subnet, the removal subnet, and both.
As shown in Table 13, the best performance is achieved when HFIE is applied only to the shadow-generation (GS) subnet. Since the GS subnet is responsible for synthesizing pseudo-shadow images, the HFIE module enhances fine texture details and other high-frequency information. This leads to more realistic pseudo shadows and reduces the gap between synthesized and real shadows, providing more informative training pairs for the shadow-removal (SR) subnet. In contrast, placing the module in the SR subnet can make the restoration overly sensitive to local variations. This may hinder accurate chrominance and luminance recovery and introduce visible artifacts without providing consistent performance gains.
Because the HFIE module is embedded in the shadow-generation subnet, we compare its pseudo-shadow outputs with those of the shadow-generation branch in G2R-ShadowNet. As shown in Figure 13, (b) presents the pseudo-shadow from G2R-ShadowNet, whereas (c) is obtained by adding HFIE only, and (d) corresponds to our full model. In the first row of Figure 13, compared with the real non-shadow image, our approach preserves richer texture details of the white line patterns and keeps their boundaries intact. In contrast, the output from G2R-ShadowNet appears noticeably blurred. In the second row, after incorporating the HFIE module, the generated pseudo-shadow retains more texture information. With the DAAF module and the COC loss further applied, the output exhibits greater spatial consistency and is visually more natural.
This indicates that the HFIE module helps the generator network capture more high-frequency details, thereby further improving the overall shadow removal quality of the model.
We also analyze the impact of the HFIE module on model performance; an ablation study is conducted by varying the number of convolutional layers within the dense block. The experimental results are shown in Table 14, where “Number of Convolutional Layers” indicates different settings of inter-layer connectivity. It is observed that when the number is set to 6, the model achieves the best performance across all evaluation metrics. This demonstrates that the HFIE module effectively exploits the advantages of the dense connectivity structure, enabling efficient feature reuse. When the number exceeds 6, the RMSE increases and the PSNR decreases, indicating that excessive dense connections weaken the module’s capacity to model high-frequency local features, leading to a significant degradation in shadow removal performance. Additionally, when the number is less than 6, the module fails to fully learn and utilize hierarchical feature reuse, which limits its shadow removal capability.
To validate our design that the HFIE module extracts only high-frequency features, we concatenate the upsampled low-frequency features with the high-frequency features in Figure 5. We then compare this variant with the original HFIE module. As shown in Table 15, the experimental results indicate that the high-frequency-only HFIE module achieves better shadow removal performance.
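The frequency separation underlying this comparison can be sketched with a simple pool-and-upsample decomposition; this is an illustrative NumPy approximation under our own assumptions (2x average pooling, nearest-neighbour upsampling), not the actual HFIE implementation, which further processes the high-frequency branch with dense convolutions.

```python
import numpy as np

def avg_pool2(img):
    """2x2 average pooling (H and W assumed even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(img):
    """Nearest-neighbour 2x upsampling."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def split_frequencies(img):
    """Frequency separation sketch: the pooled-and-upsampled image
    approximates the low-frequency component; the residual keeps the
    high-frequency detail that HFIE goes on to enhance."""
    low = upsample2(avg_pool2(img))
    high = img - low
    return low, high
```

By construction the two components sum back to the input, so keeping only the high-frequency branch discards exactly the smooth content.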
To further validate the role of the HFIE module in our model, we visualize the shadow removal results on the same image with and without this module. As shown in Figure 14, the model with HFIE achieves a PSNR of 40.83, compared to 34.28 for the model without HFIE, indicating a notable improvement in detail restoration within the shadow-removed region. The texture details are clearer and more natural when compared with the ground truth image in the same region.

4.5.4. Ablation Study on Chrominance-Only Consistency Loss

To further investigate the impact of the COC loss on shadow removal performance, an ablation study is conducted by varying its weight setting. This experiment is designed to balance the contribution of the invariant loss with that of other main loss components. The results are presented in Table 16. It is found that although this loss term contributes positively to improving color consistency, an excessively high weight may interfere with the optimization of other critical loss terms, leading to an overall performance degradation. When the weight is set to 0.5, the model achieves the best performance across all evaluation metrics. This demonstrates that a well-balanced and synergistic optimization is achieved between the COC loss and the other loss components under this setting.
To validate the effectiveness of our proposed COC loss, we construct a color consistency loss using the commonly used L2 distance for comparison. As shown in Table 17, the proposed COC loss reduces the shadow-region RMSE by 0.3 and improves PSNR by 0.41. For the whole image, the COC loss improves PSNR by 0.27 while maintaining the same RMSE. These results demonstrate that the proposed COC loss provides a more effective chrominance consistency constraint than the conventional L2-based alternative.
We further study the impact of the weighting parameters ( ω a , ω b ) in the proposed COC loss. Keeping all other training settings unchanged, we evaluate the parameter configurations in Table 18, and report the corresponding results in the same table. The equal-weight setting (0.5, 0.5) achieves the best performance in the shadow region. As the weights deviate from this balanced assignment, the shadow-region PSNR/SSIM steadily decreases. This suggests that the equal-weight choice provides a favorable trade-off between performance and robustness. Therefore, we adopt 0.5 as the default weighting in model training.
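The weighted chrominance constraint studied above can be sketched as follows. This is a hedged NumPy illustration under our own assumptions: an L1 distance on the a and b channels is assumed, the function name is ours, and the defaults reflect the adopted equal weighting (ω_a, ω_b) = (0.5, 0.5).

```python
import numpy as np

def coc_loss(pred_lab, ref_lab, mask, w_a=0.5, w_b=0.5):
    """Sketch of the chrominance-only consistency loss: mean absolute
    difference of the a and b channels (Lab space) between the restored
    region and its non-shadow reference, weighted by (w_a, w_b).

    pred_lab, ref_lab: H x W x 3 Lab images; mask: H x W binary region.
    The L channel is deliberately ignored, so only chrominance is
    constrained (the exact distance used in the paper may differ)."""
    m = mask.astype(bool)
    da = np.abs(pred_lab[..., 1][m] - ref_lab[..., 1][m]).mean()
    db = np.abs(pred_lab[..., 2][m] - ref_lab[..., 2][m]).mean()
    return float(w_a * da + w_b * db)
```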
To further validate the effectiveness of the proposed COC loss, we remove the loss during training and compare the shadow removal results with those of the full model, as shown in Figure 15. Compared with the ground truth, the full model produces more natural color restoration. The colors along shadow boundaries are closer to those in nearby non-shadow regions, reducing visible color discontinuities.

5. Conclusions

This paper proposes AB-DFRNet, a symmetry-guided weakly supervised shadow removal network to address insufficient detail restoration and color inconsistency in shadow removal. By combining the HFIE module, DAAF module and COC loss, our model more effectively exploits the structural and chromatic symmetry encoded in training pairs of shadow and non-shadow regions. These improvements effectively enhance the performance of weakly supervised shadow removal models when dealing with complex scenes and diverse shadow textures. Experimental results on ISTD, SRD, and Video Shadow Removal datasets demonstrate that AB-DFRNet produces more natural shadow-removed images with finer structural details and fewer color artifacts, while achieving competitive performance against current state-of-the-art weakly supervised shadow removal methods.
In further work, we aim to further enhance the capability of weakly supervised shadow removal models in handling complex environments with diverse texture structures. In terms of pseudo-shadow generation, other loss functions will be designed to minimize the feature distance between generated shadows and real shadow samples, ensuring that the generated shadows closely resemble actual shadows in feature distribution. For shadow removal, physical illumination models will be integrated to improve lighting consistency and achieve more natural transitions between shadow-removed and non-shadow regions. Meanwhile, we plan to investigate model lightweighting strategies to improve computational efficiency and inference speed.

Author Contributions

Conceptualization, Y.S. and Z.Z.; methodology, Y.S.; software, Y.S.; validation, Y.S.; formal analysis, Y.S.; investigation, Y.S. and M.Y.; resources, Y.S., Z.Z. and M.Y.; data curation, Y.S.; writing—original draft preparation, Y.S.; writing—review and editing, Y.S., Z.Z. and M.Y.; visualization, Y.S.; supervision, Z.Z. and M.Y.; project administration, Z.Z. and M.Y.; funding acquisition, Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by Liaoning Province, China, the Applied Basic Research Program in 2023, ‘Key Technology Research on Vehicle Vision Perception and Trajectory Planning for ADAS’ (2023JH2/101300237).

Data Availability Statement

All data used in this study are publicly available. The ISTD, SRD, and Video Shadow Removal datasets can be obtained by searching at the following addresses, respectively: ISTD: https://github.com/DeepInsight-PCALab/ST-CGAN (accessed on 1 December 2025); ISTD+: https://github.com/cvlab-stonybrook/SID (accessed on 1 December 2025); SRD: https://liangqiong.github.io/publications/ (accessed on 1 December 2025); Video Shadow Removal: https://github.com/hieulem/FSS2SR (accessed on 1 December 2025); CUHK-Shadow: https://github.com/xw-hu/CUHK-Shadow#cuhk-shadow-dateset (accessed on 1 December 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sarkar, S.; Purkayastha, K.; Palaiahnakote, S.; Pal, U.; Saleem, M.H.; Ghosal, P. A New Multimodal Cross-Domain Network for Classification of Challenging Scene Images. In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR) 2025 Workshops, Wuhan, China, 20–21 September 2025; pp. 108–123. [Google Scholar] [CrossRef]
  2. Nadimi, S.; Bhanu, B. Physical models for moving shadow and object detection in video. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 1079–1087. [Google Scholar] [CrossRef] [PubMed]
  3. Agrawal, S.; Natu, P. ABGS Segmenter: Pixel wise adaptive background subtraction and intensity ratio based shadow removal approach for moving object detection. J. Supercomput. 2023, 79, 7937–7969. [Google Scholar] [CrossRef]
  4. Suh, H.K.; Hofstee, J.W.; Van Henten, E.J. Improved vegetation segmentation with ground shadow removal using an HDR camera. Precis. Agric. 2018, 19, 218–237. [Google Scholar] [CrossRef]
  5. Zhang, W.; Zhao, X.; Morvan, J.-M.; Chen, L. Improving shadow suppression for illumination robust face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 611–624. [Google Scholar] [CrossRef]
  6. Liu, Y.; Hou, A.Z.; Huang, X.; Ren, L.; Liu, X. Blind Removal of Facial Foreign Shadows. In Proceedings of the British Machine Vision Conference (BMVC), London, UK, 21–24 November 2022; Available online: https://api.semanticscholar.org/CorpusID:253820934 (accessed on 1 December 2025).
  7. Sanin, A.; Sanderson, C.; Lovell, B.C. Improved Shadow Removal for Robust Person Tracking in Surveillance Scenarios. In Proceedings of the 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 23–26 August 2010; pp. 141–144. [Google Scholar] [CrossRef]
  8. Pradhan, P.K.; Purkayastha, K.; Sharma, A.L.; Baruah, U.; Sen, B.; Ghosal, P. Graphically Residual Attentive Network for tackling aerial image occlusion. Comput. Electr. Eng. 2025, 125, 110429. [Google Scholar] [CrossRef]
  9. Finlayson, G.D.; Hordley, S.D.; Lu, C.; Drew, M.S. On the removal of shadows from images. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 28, 59–68. [Google Scholar] [CrossRef]
  10. Zhang, L.; Zhang, Q.; Xiao, C. Shadow remover: Image shadow removal based on illumination recovering optimization. IEEE Trans. Image Process. 2015, 24, 4623–4636. [Google Scholar] [CrossRef]
  11. Liu, F.; Gleicher, M. Texture-Consistent Shadow Removal. In Proceedings of the 10th European Conference on Computer Vision (ECCV), Marseille, France, 12–18 October 2008; pp. 437–450. [Google Scholar] [CrossRef]
  12. Guo, R.; Dai, Q.; Hoiem, D. Single-Image Shadow Detection and Removal Using Paired Regions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011; pp. 2033–2040. [Google Scholar] [CrossRef]
  13. Liu, J.; Wang, Q.; Fan, H.; Tian, J.; Tang, Y. A Shadow Imaging Bilinear Model and Three-Branch Residual Network for Shadow Removal. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 15857–15871. [Google Scholar] [CrossRef]
  14. Zhang, X.; Zhao, Y.; Gu, C.; Lu, C.; Zhu, S. SpA-Former: An Effective and Lightweight Transformer for Image Shadow Removal. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Gold Coast, Australia, 18–23 June 2023; pp. 1–8. [Google Scholar] [CrossRef]
  15. Vasluianu, F.-A.; Seizinger, T.; Timofte, R. WSRD: A Novel Benchmark for High Resolution Image Shadow Removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Vancouver, BC, Canada, 17–24 June 2023; pp. 1826–1835. [Google Scholar] [CrossRef]
  16. Wang, Y.; Zhou, W.; Feng, H.; Li, L.; Li, H. Progressive Recurrent Network for Shadow Removal. Comput. Vis. Image Underst. 2024, 238, 103861. [Google Scholar] [CrossRef]
  17. Jin, Y.; Ye, W.; Yang, W.; Yuan, Y.; Tan, R.T. Des3: Adaptive Attention-Driven Self and Soft Shadow Removal Using ViT Similarity. AAAI Conf. Artif. Intell. 2024, 38, 2634–2642. [Google Scholar] [CrossRef]
  18. Xiao, J.; Fu, X.; Zhu, Y.; Li, D.; Huang, J.; Zhu, K.; Zha, Z.-J. HomoFormer: Homogenized Transformer for Image Shadow Removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 25617–25626. [Google Scholar] [CrossRef]
  19. Guo, L.; Huang, S.; Liu, D.; Cheng, H.; Wen, B. ShadowFormer: Global Context Helps Shadow Removal. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; pp. 710–718. [Google Scholar] [CrossRef]
  20. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 8–13 December 2014; MIT Press: Cambridge, MA, USA, 2014; Volume 27, pp. 2672–2680. [Google Scholar]
  21. Le, H.; Samaras, D. From Shadow Segmentation to Shadow Removal. In Proceedings of the 16th European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 264–281. [Google Scholar] [CrossRef]
  22. Liu, Z.; Yin, H.; Wu, X.; Wu, Z.; Mi, Y.; Wang, S. From Shadow Generation to Shadow Removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 4925–4934. [Google Scholar] [CrossRef]
  23. Wang, J.; Li, X.; Yang, J. Stacked Conditional Generative Adversarial Networks for Jointly Learning Shadow Detection and Shadow Removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 1788–1797. [Google Scholar] [CrossRef]
  24. Sidorov, O. Conditional GANs for Multi-Illuminant Color Constancy: Revolution or Yet Another Approach? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; pp. 1748–1758. [Google Scholar] [CrossRef]
  25. Hu, X.; Jiang, Y.; Fu, C.-W.; Heng, P.-A. Mask-ShadowGAN: Learning to Remove Shadows From Unpaired Data. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2472–2481. [Google Scholar] [CrossRef]
  26. Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar] [CrossRef]
  27. Liu, Z.; Yin, H.; Mi, Y.; Pu, M.; Wang, S. Shadow Removal by a Lightness-Guided Network With Training on Unpaired Data. IEEE Trans. Image Process. 2021, 30, 1853–1865. [Google Scholar] [CrossRef] [PubMed]
  28. Zhong, Y.; You, L.; Zhang, Y.; Chao, F.; Tian, Y.; Ji, R. Shadow Removal by High-Quality Shadow Synthesis. arXiv 2022, arXiv:2212.04108. [Google Scholar]
  29. Hu, X.; Fu, C.-W.; Zhu, L.; Qin, J.; Heng, P.-A. Direction-Aware Spatial Context Features for Shadow Detection and Removal. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2795–2808. [Google Scholar] [CrossRef] [PubMed]
  30. Zhao, X.; Wang, Z.; Deng, Z.; Qin, H.; Zhu, Z. Transmission-guided multi-feature fusion Dehaze network. Vis. Comput. 2025, 41, 2285–2297. [Google Scholar] [CrossRef]
  31. Yi, W.; Dong, L.; Liu, M.; Hui, M.; Kong, L.; Zhao, Y. MFAF-Net: Image dehazing with multi-level features and adaptive fusion. Vis. Comput. 2024, 40, 2293–2307. [Google Scholar] [CrossRef]
  32. Yang, J.; Qiu, P.; Zhang, Y.; Marcus, D.S.; Sotiras, A. D-net: Dynamic Large Kernel with Dynamic Feature Fusion for Volumetric Medical Image Segmentation. Biomed. Signal Process. Control 2026, 113, 108837. [Google Scholar] [CrossRef]
  33. Gatys, L.A.; Ecker, A.S.; Bethge, M. Image Style Transfer Using Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2414–2423. [Google Scholar] [CrossRef]
  34. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 694–711. [Google Scholar] [CrossRef]
  35. Jin, Y.; Sharma, A.; Tan, R.T. DC-ShadowNet: Single-Image Hard and Soft Shadow Removal Using Unsupervised Domain-Classifier Guided Network. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 5007–5016. [Google Scholar] [CrossRef]
  36. Cun, X.; Pun, C.-M.; Shi, C. Towards Ghost-Free Shadow Removal via Dual Hierarchical Aggregation Network and Shadow Matting GAN. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 10680–10687. [Google Scholar] [CrossRef]
  37. Vasluianu, F.-A.; Romero, A.; Van Gool, L.; Timofte, R. Shadow Removal with Paired and Unpaired Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 19–25 June 2021; pp. 826–835. [Google Scholar] [CrossRef]
  38. Wan, J.; Yin, H.; Wu, Z.; Wu, X.; Liu, Y.; Wang, S. Style-Guided Shadow Removal. In Proceedings of the 17th European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 361–378. [Google Scholar] [CrossRef]
  39. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar] [CrossRef]
  40. Chen, G.; Dai, K.; Yang, K.; Hu, T.; Chen, X.; Yang, Y.; Dong, W.; Wu, P.; Zhang, Y.; Yan, Q. Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 17–18 June 2024; pp. 6097–6107. [Google Scholar] [CrossRef]
  41. Zhao, H.; Kong, X.; He, J.; Qiao, Y.; Dong, C. Efficient Image Super-Resolution Using Pixel Attention. In Proceedings of the European Conference on Computer Vision Workshops (ECCVW), Glasgow, UK, 23–28 August 2020; pp. 56–72. [Google Scholar] [CrossRef]
  42. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar] [CrossRef]
  43. Le, H.; Samaras, D. Shadow Removal via Shadow Image Decomposition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8577–8586. [Google Scholar] [CrossRef]
  44. Zhu, L.; Deng, Z.; Hu, X.; Fu, C.-W.; Xu, X.; Qin, J.; Heng, P.-A. Bidirectional Feature Pyramid Network with Recurrent Attention Residual Modules for Shadow Detection. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 122–137. [Google Scholar] [CrossRef]
  45. Qu, L.; Tian, J.; He, S.; Tang, Y.; Lau, R.W.H. DeshadowNet: A Multi-Context Embedding Deep Network for Shadow Removal. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2308–2316. [Google Scholar] [CrossRef]
  46. Liu, Y.; Ke, Z.; Xu, K.; Liu, F.; Wang, Z.; Lau, R.W. Recasting Regional Lighting for Shadow Removal. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, Canada, 20–27 February 2024; pp. 3810–3818. [Google Scholar] [CrossRef]
  47. Hu, X.; Wang, T.; Fu, C.-W.; Jiang, Y.; Wang, Q.; Heng, P.-A. Revisiting shadow detection: A new benchmark dataset for complex world. IEEE Trans. Image Process. 2021, 30, 1925–1934. [Google Scholar] [CrossRef]
  48. Guo, L.; Wang, C.; Yang, W.; Wang, Y.; Wen, B. Boundary-Aware Divide and Conquer: A Diffusion-Based Solution for Unsupervised Shadow Removal. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 12999–13008. [Google Scholar] [CrossRef]
  49. Wang, D.; Wang, J.; He, N.; Zhang, J.; Zhang, S.; Liu, S. Enhancing Unsupervised Shadow Removal via Multi-Intensity Shadow Generation and Diffusion Modeling. Vis. Comput. 2025, 41, 5461–5476. [Google Scholar] [CrossRef]
  50. Yang, Q.; Tan, K.-H.; Ahuja, N. Shadow Removal Using Bilateral Filtering. IEEE Trans. Image Process. 2012, 21, 4361–4368. [Google Scholar] [CrossRef]
  51. Gong, H.; Cosker, D. Interactive Shadow Removal and Ground Truth for Variable Scene Categories. In Proceedings of the British Machine Vision Conference (BMVC), Nottingham, UK, 1–5 September 2014. [Google Scholar] [CrossRef]
  52. Huang, Y.; Lu, X.; Quan, Y.; Xu, Y.; Ji, H. Image Shadow Removal via Multi-Scale Deep Retinex Decomposition. Pattern Recognit. 2025, 159, 111126. [Google Scholar] [CrossRef]
  53. Vicente, T.F.Y.; Hou, L.; Yu, C.P.; Hoai, M.; Samaras, D. Large-Scale Training of Shadow Detectors with Noisily-Annotated Shadow Examples. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 816–832. [Google Scholar] [CrossRef]
  54. Xue, W.; Zhang, L.; Mou, X.; Bovik, A.C. Gradient Magnitude Similarity Deviation: A Highly Efficient Perceptual Image Quality Index. IEEE Trans. Image Process. 2014, 23, 684–695. [Google Scholar] [CrossRef]
Figure 1. Visual comparison between G2R-ShadowNet and our model. The red boxes indicate the zoomed-in areas.
Figure 2. The structure of the proposed AB-DFRNet.
Figure 3. The structure of the shadow generation subnet.
Figure 4. The structure of the shadow removal subnet.
Figure 5. The structure of the HFIE module.
Figure 6. The structure of the DAAF module.
Figure 7. Visual comparison results of shadow removal on the ISTD dataset. The red boxes indicate the zoomed-in areas.
Figure 8. Visual comparison results of shadow removal on the SRD dataset. The red boxes indicate the zoomed-in areas.
Figure 9. Visual comparison of shadow removal results on the SRD dataset between DC-ShadowNet and our model.
Figure 10. Challenging cases and limitations on ISTD+ and SRD. The red boxes indicate the zoomed-in areas.
Figure 11. Example result of shadow removal on the CUHK shadow dataset.
Figure 12. Visual comparison example of using different attention mechanisms in the DAAF module.
Figure 13. Visual comparison example of pseudo-shadow outputs. The red boxes indicate the zoomed-in areas.
Figure 14. Visual comparison example of shadow removal results with and without the HFIE module. The red boxes indicate the zoomed-in areas.
Figure 15. Visual comparison example of shadow removal results under COC loss ablation. The red boxes indicate the zoomed-in areas.
Table 2. Summary of subnets, added modules, and their usage during training and inference. “√” indicates the subnet/module is used. “×” indicates that it is not used.

| Model (Subnet) | Operator | Added Module | Input | Output | Training | Testing |
|---|---|---|---|---|---|---|
| GS subnet | Conv 7 × 7, Down × 2; ResBlocks × 9; Up × 2, Conv 7 × 7 | HFIE | Non-shadow region | Pseudo-shadow image | √ | × |
| D subnet | PatchGAN | – | Real shadow region / pseudo-shadow | Patch-wise realism map | √ | × |
| SR subnet | Conv 7 × 7, Down × 2; ResBlocks × 9; Up × 2, Conv 7 × 7 | DAAF | Pseudo-shadow image | Coarse shadow-free image | √ | √ |
| R subnet | Conv 7 × 7, Down × 2; ResBlocks × 9; Up × 2, Conv 7 × 7 | – | Coarse output, shadow mask | Refined shadow-free result | √ | √ |
Table 3. Comparison of different shadow removal models on the ISTD dataset. For models without available outputs, we report the evaluation metrics as presented in their papers or in G2R-ShadowNet [22], marked with “*”. For all other models with released weights and shadow removal results, we perform a standardized re-evaluation using the validation code provided by Liu et al. [22]. “↑” indicates higher is better; “↓” indicates lower is better.

| Method | Data | Shadow RMSE ↓ | Shadow PSNR ↑ | Shadow SSIM ↑ | Non-Shadow RMSE ↓ | Non-Shadow PSNR ↑ | Non-Shadow SSIM ↑ | All RMSE ↓ | All PSNR ↑ | All SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Yang et al. [50] * | – | 23.2 | 21.57 | 0.878 | 14.2 | 22.25 | 0.782 | 15.9 | 20.26 | 0.706 | – |
| Gong et al. [51] * | – | 13.0 | 30.53 | 0.972 | 2.6 | 36.63 | 0.982 | 4.3 | 28.96 | 0.943 | – |
| DHAN [36] | Paired + Mask | 9.6 | 32.92 | 0.988 | 7.4 | 27.15 | 0.971 | 7.8 | 25.66 | 0.956 | 0.0831 |
| SG-ShadowNet [38] | Paired + Mask | 6.0 | 37.60 | 0.990 | 2.4 | 37.42 | 0.985 | 3.0 | 33.90 | 0.972 | 0.0893 |
| TBRNet [13] | Paired | 6.5 | 36.34 | 0.991 | 3.3 | 35.57 | 0.977 | 3.8 | 31.91 | 0.964 | – |
| DeS3 [17] | Paired | 6.5 | 36.49 | 0.989 | 3.3 | 34.72 | 0.972 | 3.9 | 31.39 | 0.957 | – |
| MSRDNet [52] | Paired | 5.5 | 38.93 | 0.991 | 2.4 | 38.49 | 0.985 | 2.9 | 34.94 | 0.972 | – |
| Mask-ShadowGAN [25] | Unpaired Images | 10.8 | 32.19 | 0.984 | 3.8 | 33.44 | 0.974 | 4.8 | 28.81 | 0.946 | – |
| DC-ShadowNet [35] | Unpaired Images | 10.9 | 32.00 | 0.976 | 3.6 | 33.56 | 0.968 | 4.7 | 28.77 | 0.932 | – |
| LG-ShadowNet [27] | Unpaired Images | 9.9 | 32.45 | 0.982 | 3.2 | 33.73 | 0.975 | 4.3 | 29.22 | 0.947 | 0.125 |
| Wang et al. [49] * | Unpaired Images | 7.3 | 35.56 | 0.987 | 2.4 | 36.71 | 0.983 | 3.2 | 32.48 | 0.958 | – |
| FSS2SR (detect) [21] | Shadow + Mask | 10.4 | 33.09 | 0.983 | 2.8 | 35.35 | 0.978 | 3.9 | 30.15 | 0.951 | 0.101 |
| G2R-ShadowNet (detect) [22] | Shadow + Mask | 8.9 | 33.58 | 0.978 | 2.9 | 35.52 | 0.976 | 3.9 | 30.52 | 0.944 | 0.114 |
| G2R-ShadowNet | Shadow + Mask | 8.6 | 33.98 | 0.978 | 2.4 | 37.41 | 0.985 | 3.4 | 31.81 | 0.953 | 0.109 |
| HQSS (detect) [28] | Shadow + Mask | 8.5 | 33.95 | 0.980 | 2.8 | 35.59 | 0.978 | 3.7 | 30.76 | 0.948 | 0.114 |
| HQSS | Shadow + Mask | 8.2 | 34.52 | 0.980 | 2.4 | 37.41 | 0.985 | 3.4 | 32.08 | 0.956 | 0.108 |
| BCDiff [48] * | Shadow + Mask | 7.6 | 35.91 | 0.986 | 2.4 | 37.27 | 0.984 | 3.3 | 32.73 | 0.962 | – |
| Ours (detect) | Shadow + Mask | 8.4 | 34.53 | 0.981 | 2.8 | 35.61 | 0.977 | 3.7 | 31.12 | 0.950 | 0.110 |
| Ours | Shadow + Mask | 7.9 | 35.18 | 0.982 | 2.4 | 37.41 | 0.985 | 3.3 | 32.56 | 0.958 | 0.104 |

Note: Bold and underlined values denote the best and second-best results, respectively, among weakly supervised models.
Table 4. Comparison of different shadow removal models on the SRD dataset. Results marked with “†” indicate models that are retrained and validated on the SRD dataset using official implementations. The G2R-ShadowNet model is trained for 150 epochs. Results labeled “GT” denote results obtained using the shadow masks generated by Liu et al. [46] for both training and testing. For other models, we employ the masks generated by Cun et al. [36] for both training and testing. “↑” indicates higher is better; “↓” indicates lower is better.

| Method | Data | Shadow RMSE ↓ | Shadow PSNR ↑ | Shadow SSIM ↑ | Non-Shadow RMSE ↓ | Non-Shadow PSNR ↑ | Non-Shadow SSIM ↑ | All RMSE ↓ | All PSNR ↑ | All SSIM ↑ |
|---|---|---|---|---|---|---|---|---|---|---|
| Mask-ShadowGAN [25] † | Unpaired Images | 10.8 | 29.97 | 0.964 | 4.9 | 32.84 | 0.976 | 6.6 | 27.56 | 0.933 |
| LG-ShadowNet [27] † | Unpaired Images | 11.9 | 29.26 | 0.955 | 4.6 | 31.85 | 0.976 | 7.0 | 26.56 | 0.924 |
| DC-ShadowNet [35] † | Unpaired Images | 8.7 | 32.48 | 0.967 | 4.3 | 34.21 | 0.978 | 5.5 | 29.72 | 0.938 |
| G2R-ShadowNet [22] † | Shadow + Mask | 15.8 | 26.70 | 0.940 | 4.5 | 31.63 | 0.981 | 7.8 | 24.94 | 0.902 |
| HQSS [28] | Shadow + Mask | 15.9 | 26.32 | 0.941 | 4.6 | 31.47 | 0.981 | 7.7 | 24.68 | 0.904 |
| Ours | Shadow + Mask | 13.9 | 27.69 | 0.953 | 4.6 | 31.61 | 0.981 | 7.0 | 25.72 | 0.914 |
| Ours (GT) | Shadow + Mask | 13.0 | 28.59 | 0.950 | 4.1 | 33.57 | 0.983 | 6.7 | 26.99 | 0.924 |

Note: Bold and underlined values denote the best and second-best results, respectively.
Table 5. Comparison of different shadow removal models on the Video Shadow Removal dataset. “RMSE40” denotes the RMSE computed using a moving shadow mask with a threshold of 40, while the other metrics are calculated with a threshold of 80.

| Method | Data | RMSE | RMSE40 | PSNR | SSIM |
|---|---|---|---|---|---|
| SP+M-Net [43] | Paired + Mask | – | 22.2 | – | – |
| Mask-ShadowGAN [25] | Unpaired Images | 22.7 | 19.6 | 20.38 | 0.887 |
| LG-ShadowNet [27] | Unpaired Images | 22.0 | 18.3 | 20.68 | 0.880 |
| FSS2SR [21] | Shadow + Mask | – | 20.9 | – | – |
| G2R-ShadowNet [22] | Shadow + Mask | 21.8 | 18.8 | 21.07 | 0.882 |
| HQSS [28] | Shadow + Mask | 18.95 | 16.82 | 21.89 | 0.888 |
| BCDiff [48] | Shadow + Mask | – | 17.7 | 22.23 | 0.893 |
| Ours | Shadow + Mask | 18.3 | 16.1 | 22.03 | 0.897 |

Note: Bold and underlined values denote the best and second-best results, respectively, among weakly supervised models.
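The RMSE40 protocol of Table 5 can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the paper's evaluation code: it assumes the moving-shadow mask is obtained by thresholding the absolute difference between the shadow frame and its ground-truth shadow-free frame (40 for RMSE40, 80 for the other metrics), and the helper name `masked_rmse` and its grayscale-array interface are our own.

```python
import numpy as np

def masked_rmse(pred, gt, shadow_input, threshold=40):
    """RMSE restricted to a moving-shadow mask (illustrative sketch).

    The mask keeps pixels whose intensity in the original shadow frame
    differs from the ground truth by more than `threshold`
    (40 -> RMSE40, 80 -> the default protocol in Table 5).
    All inputs are 2-D grayscale arrays in [0, 255].
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    shadow_input = np.asarray(shadow_input, dtype=np.float64)

    # Pixels darkened by more than the threshold count as shadow pixels.
    mask = np.abs(shadow_input - gt) > threshold
    if not mask.any():          # no moving shadow detected at this threshold
        return 0.0
    return float(np.sqrt(np.mean((pred[mask] - gt[mask]) ** 2)))
```

A lower threshold (40) admits softer shadow pixels into the mask, which is why RMSE40 is consistently smaller than the threshold-80 RMSE in Table 5.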
Table 6. Complexity comparison between the shadow removal models. MACs are measured for a single forward pass with an input resolution of 480 × 640.

| Method | SR Subnet Params (M) | R Subnet Params (M) | Total Params (M) | SR Subnet MACs (G) | R Subnet MACs (G) | Total MACs (G) | Inference Time (ms) |
|---|---|---|---|---|---|---|---|
| DC-ShadowNet | – | – | 10.59 | – | – | 246.48 | 35.34 |
| SG-ShadowNet | 2.04 | 4.13 | 6.17 | 83.21 | 85.86 | 169.08 | 56.09 |
| Mask-ShadowGAN | – | – | 11.38 | – | – | 266.77 | 33.82 |
| LG-ShadowNet | – | – | 5.70 | – | – | 67.37 | 24.50 |
| G2R-ShadowNet | 11.38 | 11.38 | 22.76 | 266.77 | 267.73 | 534.5 | 76.18 |
| HQSS | 11.38 | 11.38 | 22.76 | 232.80 | 233.76 | 534.5 | 76.15 |
| Ours | 11.90 | 11.38 | 23.28 | 276.98 | 267.52 | 544.49 | 79.40 |
Table 7. Comparison of shadow removal results on two regions inside the shadow mask. “↑” indicates higher is better; “↓” indicates lower is better.

| Method | Shadow-Region Grad-L1 ↓ | Shadow-Region GMSD ↓ | Shadow-Region GMS-Mean ↑ | Shadow-Region LapEnergyDiff ↓ | Edge-Region Grad-L1 ↓ | Edge-Region GMSD ↓ | Edge-Region GMS-Mean ↑ | Edge-Region LapEnergyDiff ↓ |
|---|---|---|---|---|---|---|---|---|
| LG-ShadowNet | 0.02575 | 0.11834 | 0.91349 | 0.02819 | 0.03600 | 0.15935 | 0.85856 | 0.02139 |
| HQSS | 0.02806 | 0.12690 | 0.89987 | 0.03214 | 0.03728 | 0.16794 | 0.84325 | 0.02448 |
| G2R-ShadowNet | 0.02745 | 0.12684 | 0.90330 | 0.02916 | 0.03772 | 0.17205 | 0.84095 | 0.02349 |
| Ours | 0.02569 | 0.11969 | 0.91356 | 0.02943 | 0.03628 | 0.16650 | 0.85000 | 0.02439 |
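Tables 7 and 11 report GMSD and GMS-Mean following the gradient magnitude similarity index of Xue et al. [54]. The sketch below is a simplified illustration rather than the reference implementation: it uses numpy's finite-difference gradients in place of the Prewitt filters of the original paper, and the stability constant `c` is illustrative, not the published value.

```python
import numpy as np

def gms_stats(ref, dist, c=0.0026):
    """Gradient-magnitude similarity statistics between two images.

    Returns (GMS-Mean, GMSD): the mean (higher is better) and the
    standard deviation (lower is better) of the per-pixel gradient
    magnitude similarity map. Simplified sketch of Xue et al. [54]:
    finite differences stand in for the Prewitt filters, and `c` is
    an illustrative stability constant.
    """
    ref = np.asarray(ref, dtype=np.float64)
    dist = np.asarray(dist, dtype=np.float64)

    gy_r, gx_r = np.gradient(ref)
    gy_d, gx_d = np.gradient(dist)
    g_ref = np.hypot(gx_r, gy_r)    # gradient magnitude, reference
    g_dist = np.hypot(gx_d, gy_d)   # gradient magnitude, restored image

    # Per-pixel similarity of gradient magnitudes, in (0, 1].
    gms = (2.0 * g_ref * g_dist + c) / (g_ref ** 2 + g_dist ** 2 + c)
    return float(gms.mean()), float(gms.std())
```

Identical structure yields a GMS map of ones (GMS-Mean = 1, GMSD = 0); residual texture damage in the shadow region lowers GMS-Mean and raises GMSD, which is what Tables 7 and 11 quantify.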
Table 8. Per-image mean, 95% CI, and paired statistical comparison on the ISTD test set (shadow region). Gain is computed as ours minus the compared method. “↑” indicates higher is better. Reported values are rounded to four decimal places. Improvements are calculated using full-precision values prior to rounding.

| Method | PSNR ↑ Mean [95% CI] | SSIM ↑ Mean [95% CI] | PSNR Gain [95% CI] | p (PSNR, t / w) | SSIM Gain [95% CI] | p (SSIM, t / w) |
|---|---|---|---|---|---|---|
| Ours | 35.18 [34.72, 35.64] | 0.9817 [0.9804, 0.9830] | – | – | – | – |
| LG-ShadowNet | 32.45 [32.01, 32.89] | 0.9820 [0.9807, 0.9832] | +2.73 [2.39, 3.07] | 3.19 × 10^−46 / 6.83 × 10^−40 | −0.0002 [−0.0009, 0.0005] | 5.11 × 10^−1 / 5.35 × 10^−1 |
| G2R-ShadowNet | 33.98 [33.53, 34.42] | 0.9779 [0.9764, 0.9793] | +1.20 [1.06, 1.35] | 2.4 × 10^−47 / 5.4 × 10^−43 | +0.0038 [0.0033, 0.0044] | 1.1 × 10^−35 / 2.3 × 10^−47 |
| HQSS | 34.52 [34.06, 34.97] | 0.9804 [0.9790, 0.9818] | +0.66 [0.49, 0.83] | 6.3 × 10^−14 / 2.3 × 10^−14 | +0.0013 [0.0008, 0.0018] | 2.2 × 10^−7 / 9.7 × 10^−10 |
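The paired comparison reported in Table 8 (per-image mean, 95% CI, and t / Wilcoxon p-values) can be reproduced in outline as follows. This is a hedged sketch of the standard procedure, not the authors' script: `paired_comparison` is a hypothetical helper, and it assumes one score per test image for each method.

```python
import numpy as np
from scipy import stats

def paired_comparison(ours, baseline):
    """Per-image paired comparison in the style of Table 8 (sketch).

    Returns the mean gain (ours - baseline), its 95% confidence
    interval from the t distribution, and p-values from the paired
    t-test ("t") and the Wilcoxon signed-rank test ("w").
    """
    ours = np.asarray(ours, dtype=np.float64)
    baseline = np.asarray(baseline, dtype=np.float64)
    diff = ours - baseline

    mean_gain = diff.mean()
    sem = diff.std(ddof=1) / np.sqrt(diff.size)          # standard error
    t_crit = stats.t.ppf(0.975, df=diff.size - 1)        # two-sided 95%
    ci = (mean_gain - t_crit * sem, mean_gain + t_crit * sem)

    p_t = stats.ttest_rel(ours, baseline).pvalue         # paired t-test
    p_w = stats.wilcoxon(ours, baseline).pvalue          # signed-rank test
    return mean_gain, ci, p_t, p_w
```

With several hundred test images, even a modest mean gain such as +0.66 dB over HQSS remains highly significant when the per-image differences are consistent in sign, which is the pattern Table 8 exhibits.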
Table 9. Ablation experimental results of proposed modules and loss function on the ISTD+ dataset. “↑” indicates higher is better; “↓” indicates lower is better. “√” indicates used; “×” indicates not used.

| Method | DAAF | HFIE | COC Loss | Shadow RMSE ↓ | Shadow PSNR ↑ | Shadow SSIM ↑ | All RMSE ↓ | All PSNR ↑ | All SSIM ↑ |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | × | × | × | 8.6 | 33.98 | 0.978 | 3.4 | 31.81 | 0.953 |
| Ours | √ | × | × | 8.4 | 34.37 | 0.978 | 3.3 | 32.07 | 0.954 |
| Ours | √ | √ | × | 8.1 | 34.90 | 0.982 | 3.3 | 32.40 | 0.958 |
| Ours | × | √ | √ | 8.5 | 33.99 | 0.978 | 3.4 | 31.80 | 0.953 |
| Ours | √ | × | √ | 8.3 | 34.63 | 0.981 | 3.4 | 32.24 | 0.957 |
| Ours | √ | √ | √ | 7.9 | 35.18 | 0.982 | 3.3 | 32.56 | 0.958 |

Note: Bold denotes the best.
Table 10. Ablation experimental results of proposed modules and loss function on the SRD dataset. “↑” indicates higher is better; “↓” indicates lower is better. “√” indicates used; “×” indicates not used.

| Method | DAAF | HFIE | COC Loss | Shadow RMSE ↓ | Shadow PSNR ↑ | Shadow SSIM ↑ | All RMSE ↓ | All PSNR ↑ | All SSIM ↑ |
|---|---|---|---|---|---|---|---|---|---|
| Baseline | × | × | × | 14.2 | 27.89 | 0.936 | 7.2 | 26.43 | 0.910 |
| Ours | √ | √ | √ | 13.0 | 28.59 | 0.950 | 6.7 | 26.99 | 0.924 |
| Ours | √ | √ | × | 13.6 | 28.16 | 0.943 | 6.9 | 26.65 | 0.917 |
| Ours | √ | × | √ | 14.5 | 27.62 | 0.933 | 7.3 | 26.23 | 0.906 |
| Ours | × | √ | √ | 15.4 | 27.87 | 0.928 | 7.7 | 26.41 | 0.902 |

Note: Bold denotes the best.
Table 11. Supplementary evaluation on the shadow region using gradient metrics. “↑” indicates higher is better; “↓” indicates lower is better.

| Method | Grad-L1 ↓ | GMSD ↓ | GMS-Mean ↑ |
|---|---|---|---|
| w/o DAAF | 0.02661 | 0.12346 | 0.90850 |
| Ours | 0.02569 | 0.11969 | 0.91356 |

Note: Bold denotes the best.
Table 12. Ablation experimental results of the attention mechanisms in the DAAF module. “↑” indicates higher is better; “↓” indicates lower is better.

| Method | Shadow RMSE ↓ | Shadow PSNR ↑ | Shadow SSIM ↑ | All RMSE ↓ | All PSNR ↑ | All SSIM ↑ |
|---|---|---|---|---|---|---|
| w/o pa (pixel attention) | 8.0 | 34.96 | 0.981 | 3.3 | 32.42 | 0.957 |
| w/o ca (channel attention) | 8.0 | 34.92 | 0.980 | 3.3 | 32.40 | 0.956 |
| w/o att (both attentions) | 8.2 | 34.76 | 0.982 | 3.3 | 32.27 | 0.957 |
| Complete model | 7.9 | 35.18 | 0.982 | 3.3 | 32.56 | 0.958 |

Note: Bold denotes the best.
Table 13. Ablation results of HFIE placement on the ISTD+ dataset. “↑” indicates higher is better; “↓” indicates lower is better.

| Method | GS Subnet | SR Subnet | Shadow RMSE ↓ | Shadow PSNR ↑ | Shadow SSIM ↑ | All RMSE ↓ | All PSNR ↑ | All SSIM ↑ |
|---|---|---|---|---|---|---|---|---|
| Ours | √ | √ | 8.5 | 34.43 | 0.979 | 3.4 | 32.04 | 0.955 |
| Ours | × | √ | 9.1 | 33.69 | 0.977 | 3.5 | 31.57 | 0.951 |
| Ours | √ | × | 7.9 | 35.18 | 0.982 | 3.3 | 32.56 | 0.958 |

Note: Bold denotes the best.
Table 14. Ablation experimental results of the HFIE module. “↑” indicates higher is better; “↓” indicates lower is better.

| Number of Convolutional Layers | Shadow RMSE ↓ | Shadow PSNR ↑ | Shadow SSIM ↑ | All RMSE ↓ | All PSNR ↑ | All SSIM ↑ |
|---|---|---|---|---|---|---|
| 1 | 8.3 | 34.67 | 0.980 | 3.3 | 32.25 | 0.956 |
| 2 | 8.2 | 34.40 | 0.980 | 3.4 | 32.02 | 0.956 |
| 3 | 8.5 | 34.57 | 0.979 | 3.4 | 32.17 | 0.955 |
| 4 | 8.0 | 34.95 | 0.982 | 3.3 | 32.40 | 0.958 |
| 5 | 7.9 | 34.93 | 0.979 | 3.3 | 32.42 | 0.956 |
| 6 | 7.9 | 35.18 | 0.982 | 3.3 | 32.56 | 0.958 |
| 7 | 8.4 | 34.94 | 0.980 | 3.4 | 32.41 | 0.956 |
| 8 | 8.9 | 34.20 | 0.980 | 3.5 | 31.82 | 0.956 |

Note: Bold denotes the best.
Table 15. Ablation study of the HFIE module with high-frequency enhancement and high–low feature fusion. “↑” indicates higher is better; “↓” indicates lower is better.

| Method | Shadow RMSE ↓ | Shadow PSNR ↑ | Shadow SSIM ↑ | All RMSE ↓ | All PSNR ↑ | All SSIM ↑ |
|---|---|---|---|---|---|---|
| HFIE (HF+LF fusion) | 8.4 | 34.76 | 0.981 | 3.4 | 32.28 | 0.957 |
| HFIE (HF-only) | 7.9 | 35.18 | 0.982 | 3.3 | 32.56 | 0.958 |

Note: Bold denotes the best.
Table 16. Ablation study of the COC loss with different weights. “↑” indicates higher is better; “↓” indicates lower is better.

| COC Loss Weight | Shadow RMSE ↓ | Shadow PSNR ↑ | Shadow SSIM ↑ | All RMSE ↓ | All PSNR ↑ | All SSIM ↑ |
|---|---|---|---|---|---|---|
| 0.1 | 8.1 | 34.81 | 0.980 | 3.3 | 32.39 | 0.957 |
| 0.3 | 8.7 | 34.49 | 0.980 | 3.4 | 32.11 | 0.956 |
| 0.5 | 7.9 | 35.18 | 0.982 | 3.3 | 32.56 | 0.958 |
| 0.7 | 8.1 | 35.02 | 0.981 | 3.3 | 32.45 | 0.957 |
| 0.9 | 8.4 | 34.88 | 0.979 | 3.4 | 32.43 | 0.955 |
| 1.0 | 8.0 | 34.54 | 0.979 | 3.3 | 32.17 | 0.955 |

Note: Bold denotes the best.
Table 17. Comparison of COC loss and L2 loss. “↑” indicates higher is better; “↓” indicates lower is better.

| Method | Shadow RMSE ↓ | Shadow PSNR ↑ | Shadow SSIM ↑ | All RMSE ↓ | All PSNR ↑ | All SSIM ↑ |
|---|---|---|---|---|---|---|
| L2 loss | 8.2 | 34.77 | 0.981 | 3.3 | 32.29 | 0.958 |
| COC loss | 7.9 | 35.18 | 0.982 | 3.3 | 32.56 | 0.958 |

Note: Bold denotes the best.
Table 18. Sensitivity analysis of the weighting parameters (ω_a, ω_b) in the COC loss. “↑” indicates higher is better; “↓” indicates lower is better.

| (ω_a, ω_b) | Shadow RMSE ↓ | Shadow PSNR ↑ | Shadow SSIM ↑ | All RMSE ↓ | All PSNR ↑ | All SSIM ↑ |
|---|---|---|---|---|---|---|
| (0.5, 0.5) | 7.9 | 35.18 | 0.983 | 3.3 | 32.55 | 0.958 |
| (0.6, 0.4) | 7.9 | 34.89 | 0.982 | 3.3 | 32.37 | 0.958 |
| (0.7, 0.3) | 8.4 | 34.46 | 0.978 | 3.4 | 32.15 | 0.953 |
| (0.8, 0.2) | 8.0 | 34.79 | 0.980 | 3.3 | 32.37 | 0.956 |
| (0.9, 0.1) | 8.0 | 34.87 | 0.980 | 3.3 | 32.37 | 0.957 |

Note: Bold denotes the best.
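The weighted chrominance term compared in Tables 16-18 can be summarized with a minimal sketch. This is an illustrative stand-in, not the authors' implementation: the exact distance and reduction used by the COC loss are defined in the main text, and an L1 distance over already-converted Lab arrays is assumed here.

```python
import numpy as np

def coc_loss(pred_lab, ref_lab, w_a=0.5, w_b=0.5):
    """Chrominance-only consistency loss (illustrative sketch).

    Compares only the a and b channels of Lab images (indices 1 and 2
    on the last axis), ignoring lightness L, with per-channel weights
    (w_a, w_b) as in Table 18. An L1 distance is assumed for
    illustration; inputs are H x W x 3 arrays already in Lab space.
    """
    pred_lab = np.asarray(pred_lab, dtype=np.float64)
    ref_lab = np.asarray(ref_lab, dtype=np.float64)

    loss_a = np.abs(pred_lab[..., 1] - ref_lab[..., 1]).mean()  # a channel
    loss_b = np.abs(pred_lab[..., 2] - ref_lab[..., 2]).mean()  # b channel
    return w_a * loss_a + w_b * loss_b
```

Note that the best setting in Table 18, (0.5, 0.5), weights both chroma channels equally, consistent with the symmetric chromatic-distribution motivation stated in the abstract.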

Share and Cite

MDPI and ACS Style

Shao, Y.; Zhang, Z.; Yang, M. Symmetry-Guided AB-Dynamic Feature Refinement Network for Weakly Supervised Shadow Removal. Symmetry 2026, 18, 330. https://doi.org/10.3390/sym18020330

