Frequency-Guided Multi-Scale Dehazing Network with Cross-Domain Spatial–Spectral Gating

Jin, Fangyuan; Lin, Hui; Zhang, Lu; Chen, Yiwei

doi:10.3390/a19050341

by Fangyuan Jin ¹, Hui Lin ¹,², Lu Zhang ¹ and Yiwei Chen ³,⁴,*

¹ Science and Technology on Underwater Test and Control Laboratory, Dalian 116013, China

² Marine Engineering College, Dalian Maritime University, Dalian 116026, China

³ Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China

⁴ School of Biomedical Engineering (Suzhou), Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 215163, China

* Author to whom correspondence should be addressed.

Algorithms 2026, 19(5), 341; https://doi.org/10.3390/a19050341

Submission received: 29 March 2026 / Revised: 19 April 2026 / Accepted: 21 April 2026 / Published: 28 April 2026

Abstract

Single-image dehazing is still a challenging problem because haze mainly corrupts low-frequency structures such as global contrast and color consistency, while fine textures and object boundaries are degraded in a different manner. In this paper, we present a frequency-guided multi-scale dehazing network (FGDNet) that explicitly couples spatial-domain restoration and Fourier-domain feature decomposition in a compact U-Net-like architecture. Built on a gated U-Net backbone, the proposed model inserts a frequency processing branch into encoder stages. In detail, the feature maps are transformed by fast Fourier transform, split into low- and high-frequency components through a radial mask, refined separately, and fused by a lightweight cross-domain gating module. The low-frequency pathway emphasizes color and illumination recovery, whereas the high-frequency pathway enhances edges and textures. Moreover, an additional Fourier amplitude supervision term aligns the spectral distribution of restored images with haze-free targets. Experimental results on RESIDE ITS, RESIDE OTS, O-HAZE, and NH-HAZE show that the proposed method achieves 33.3 dB PSNR/0.983 SSIM on ITS, 35.1 dB PSNR/0.988 SSIM on OTS, 19.1 dB PSNR/0.786 SSIM for OTS-trained generalization to O-HAZE, and 15.8 dB PSNR/0.648 SSIM for OTS-trained generalization to NH-HAZE. Furthermore, both quantitative and qualitative results demonstrate that the proposed method provides a more effective and more robust solution than representative dehazing methods. In addition, ablation studies confirm that both the Fourier branch and the spatial–spectral gating mechanism contribute consistently to performance gains. These results support the effectiveness of explicit frequency-aware representation learning for image dehazing and suggest a practical direction for improving generalization from synthetic to real haze.

Keywords: image processing; image dehazing; frequency-domain learning; Fourier transform; cross-domain gating

1. Introduction

Image dehazing aims to recover a clear image from a single degraded observation acquired under hazy conditions. Haze reduces contrast, distorts color, suppresses details, and degrades the robustness of downstream tasks such as object detection [1,2], recognition [3,4], autonomous navigation [5,6,7], and remote sensing interpretation [8,9,10]. Classical prior-based methods, represented by the dark channel prior, estimate physical variables such as transmission and atmospheric light from hand-crafted statistics, but their assumptions often break down under nonhomogeneous haze or complex illumination [11].

Deep learning has substantially changed the field by learning haze-to-clear mappings from paired data. Early CNN-based methods such as DehazeNet [12], MSCNN [13], and AOD-Net [14] improved restoration quality and inference speed, while later architectures, including GridDehazeNet [15], FFA-Net [16], MSBDN-DFF [17], DeHamer [18], DehazeFormer [19], gUNet [20], and DEA-Net [21], pushed performance further through attention, multi-scale gating, transformer modeling, and stronger feature fusion. However, most existing dehazing networks are dominated by spatial-domain processing.

The motivation of this work is that haze affects different frequency bands in different ways. In particular, low-frequency content is strongly associated with global veil, contrast attenuation, and color shift, whereas high-frequency content is closely related to boundaries, textures, and structural fidelity. Although frequency-domain ideas have been shown to be effective in other restoration problems and frequency-aware dehazing has started to emerge [22,23], explicit spatial–spectral collaboration is still underexplored in compact multi-scale dehazing networks.

In this paper, we develop a frequency-guided multi-scale dehazing network by augmenting a gated U-Net-like backbone with an FFT-based branch. At each selected encoder stage, feature maps are decomposed into low- and high-frequency components in the Fourier domain, refined independently, and fused through a lightweight cross-domain gating block that exchanges channel-wise and spatial attention cues between the two domains. The design is intentionally simple: it keeps the strong locality and efficiency of convolutional encoders while introducing explicit spectral reasoning where haze corruption is most visible.

In contrast to papers that rely purely on transformer attention or only use spectral supervision, the present method combines three ingredients in a unified architecture: (1) frequency decomposition of intermediate features rather than only input images, (2) separate processing for low/high spectral bands, and (3) bidirectional spatial–spectral gating before fusion. In addition, the training objective includes an FFT amplitude loss to encourage spectral consistency between restored and ground-truth images.

Controlled ablations on RESIDE-ITS and RESIDE-OTS (detailed in Section 4.4) show that both the FFT branch and the cross-domain gating module are consistently beneficial, and that the FFT branch provides the dominant share of the improvement. These results validate the proposed design logic and support the paper’s central hypothesis that explicit frequency modeling improves dehazing.

The main contributions are summarized as follows: first, we propose a frequency-guided multi-scale dehazing network that embeds FFT-based low/high-frequency decomposition into a compact gated U-Net backbone; second, we design a lightweight cross-domain gating module that enables two-way information exchange between spatially organized and spectrally decomposed features; third, we introduce a simple yet effective loss formulation combining pixel-domain and Fourier-amplitude supervision.

2. Related Work

2.1. Single Image Dehazing

Traditional dehazing methods are mostly derived from the atmospheric scattering model and handcrafted priors. The dark channel prior became a landmark method by exploiting the near-zero minimum intensity statistic in haze-free patches [11]. With the rise of deep learning, end-to-end CNN approaches such as DehazeNet [12], MSCNN [13], and AOD-Net [14] replaced explicit prior engineering with data-driven estimation.

More recent methods emphasize stronger multi-scale aggregation and attention. GridDehazeNet [15] uses an attention-based grid backbone to alleviate information bottlenecks, FFA-Net [16] employs feature and pixel attention for stronger feature fusion, and MSBDN-DFF [17] improves restoration through boosted decoding and dense feature fusion. Transformer-based methods further extend receptive fields and model long-range structure, as shown by DeHamer [18] and DehazeFormer [19]. More recently, C2PNet [24] further introduced curricular contrastive regularization built on a physics-aware dual-branch unit, achieving large gains on the SOTS-indoor and SOTS-outdoor benchmarks and illustrating the value of carefully designed training objectives in modern dehazing pipelines.

gUNet [20] shows that a carefully designed gated U-Net backbone can outperform more complicated designs with relatively low overhead. DEA-Net [21] improves detail reconstruction through detail-enhanced convolution and content-guided attention. These works establish strong spatial-domain baselines, but most of them do not explicitly isolate or manipulate frequency bands inside the network.

In parallel with the in-domain accuracy race, a complementary line of research targets two further challenges of real-world dehazing: ultra-high resolution and domain shift between synthetic training data and real captures. For ultra-high resolution, 4KDehazing [25] proposes a multi-guided bilateral learning network that performs real-time dehazing on 4K images using a single GPU. For the synthetic-to-real gap, PSD [26] fine-tunes a synthetic-pre-trained backbone with a committee of physical-prior losses; D⁴ [27] decomposes the transmission map into density and depth and re-renders hazy images during fully unpaired training; and RIDCP [28] introduces a phenomenological degradation pipeline together with a high-quality codebook prior obtained from a pre-trained VQGAN. These methods are complementary to ours: they target generalization through data, prior, or training-strategy modifications, whereas the present work targets generalization through an architectural change at the feature level.

2.2. Frequency-Aware Restoration and Dehazing

Frequency-domain modeling has become increasingly important in image restoration because Fourier representations separate global structure and local detail in a physically meaningful way. FSAS [23], proposed for image deblurring, demonstrated that frequency-domain operations can be computationally efficient and restoration-effective.

In dehazing, frequency-aware designs remain comparatively sparse. Frequency and spatial dual guidance [22] has shown for dehazing, as FSAS [23] has for deblurring, that frequency-domain operations can improve restoration quality and efficiency, but existing frequency-aware dehazing models are typically more complex or target different design goals. In contrast, our method explores a compact encoder-side spectral decomposition strategy that can be added to a gated U-Net with limited architectural burden. Beyond dehazing, the value of frequency-domain modeling has been further confirmed by recent works in adjacent restoration tasks: Deep Fourier Up-Sampling [29] derives a theoretically sound up-sampling operator that respects the spectral convolution theorem; SFNet [30] dynamically selects the most informative frequency sub-bands inside a U-Net backbone for multiple restoration tasks; and FourLLIE [31] exploits the strong correlation between the Fourier amplitude and image lightness for efficient low-light enhancement. These works collectively support the design choice of placing an explicit Fourier branch inside a compact dehazing backbone.

2.3. Benchmark Datasets

RESIDE is the most widely used benchmark family for single-image dehazing and includes large-scale synthetic indoor and outdoor data for supervised learning [32]. To better evaluate real-scene dehazing, O-HAZE provides 45 paired outdoor scenes captured with real haze generated by professional haze machines [33]. NH-HAZE extends real-world evaluation to nonhomogeneous haze, providing paired images with more challenging spatially varying degradation [34]. These datasets are therefore suitable for evaluating both in-domain restoration and synthetic-to-real generalization.

3. Methods

3.1. Overall Architecture

Figure 1a depicts the overall architecture of our proposed method. The proposed FGDNet adopts a four-stage encoder–decoder structure inspired by gUNet [20], with channel dimensions (32, 64, 128, 256) and stage depths (1, 2, 2, 4). A 3 × 3 stem convolution maps the RGB input to shallow features. The first three encoder stages contain gated convolutional blocks followed by an FFT-based spectral processing module; the fourth stage acts as the bottleneck and remains in the spatial domain. Decoder stages progressively upsample features and fuse them with encoder skips through 1 × 1 compression and gated convolutional refinement. A residual prediction head produces the final dehazed image, which is added to the input and clipped to [0,1].

The model selectively performs spectral processing in the first three encoder stages. This design is sensible because encoder features still preserve relatively explicit texture/layout semantics, whereas very deep bottleneck features become highly compressed and more abstract. Therefore, integrating Fourier decomposition at these scales offers a good balance between signal interpretability and computational cost. This choice is empirically justified by the ablation over the number of FFT-equipped encoder stages reported in Section 4.6 (Table 5): PSNR increases monotonically when the FFT module is progressively added to stages 1, 2, and 3, but drops slightly when it is further applied to the bottleneck stage. The reason is that the bottleneck features are spatially compressed to 8 × 8 at a 256 × 256 input, which is too small for a reliable 2D FFT decomposition; the spectrum becomes dominated by only a few coefficients, and the low/high split no longer reflects meaningful image-level frequencies.

3.2. Gated Convolutional Backbone

Each backbone block follows a lightweight gated-convolution design similar in spirit to gUNet [20]. For an input feature x, a pointwise convolution expands channels, a depthwise convolution captures local context, and a simple gate splits the expanded tensor into two halves and multiplies them element-wise. A second feed-forward branch with channel expansion further refines the representation. Learnable scaling parameters β and γ modulate the two residual paths, improving optimization stability. This structure preserves the efficiency and local modeling strength of convolutional U-Net designs while offering stronger feature selectivity than standard residual blocks.
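The block structure described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function and parameter names (`gated_block`, `w_expand`, `k_dw`, and so on) are hypothetical, normalization layers are omitted, the feed-forward expansion ratio of 2 and its ReLU activation are assumptions, and the weights are random rather than trained.

```python
import numpy as np

def pointwise_conv(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def depthwise_conv3x3(x, k):
    """Zero-padded depthwise 3x3 convolution: x is (C, H, W), k is (C, 3, 3)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += k[:, dy, dx][:, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out

def init_params(c, rng, scale=0.1):
    """Random illustrative weights for a block with c channels (hypothetical layout)."""
    return {
        "w_expand": scale * rng.standard_normal((2 * c, c)),
        "k_dw": scale * rng.standard_normal((2 * c, 3, 3)),
        "w_proj": scale * rng.standard_normal((c, c)),
        "w_ffn1": scale * rng.standard_normal((2 * c, c)),
        "w_ffn2": scale * rng.standard_normal((c, 2 * c)),
        "beta": np.ones(c),    # learnable scale of the gated residual path
        "gamma": np.ones(c),   # learnable scale of the feed-forward residual path
    }

def gated_block(x, p):
    """Gated path followed by a feed-forward path, each added as a scaled residual."""
    # Expand channels, gather local context, then gate: split in halves and multiply.
    y = depthwise_conv3x3(pointwise_conv(x, p["w_expand"]), p["k_dw"])
    a, b = np.split(y, 2, axis=0)
    x = x + p["beta"][:, None, None] * pointwise_conv(a * b, p["w_proj"])
    # Feed-forward refinement with channel expansion (ReLU is an assumption).
    z = np.maximum(pointwise_conv(x, p["w_ffn1"]), 0.0)
    return x + p["gamma"][:, None, None] * pointwise_conv(z, p["w_ffn2"])
```

Setting β and γ to zero turns the block into an identity mapping, which is one way to see why learnable residual scales can help optimization stability early in training.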

3.3. Frequency Split and Refinement

The frequency split and refinement module is shown in Figure 1b. Given an encoder feature map x ∈ R^{B×C×H×W}, the spectral module computes the 2D FFT and centers the spectrum with an FFT shift. A radial binary mask M is then constructed from a threshold ratio ρ, which controls the spectral area regarded as low-frequency: coefficients inside the mask form the low-frequency spectrum, and the complementary coefficients form the high-frequency spectrum. We empirically set ρ = 0.18 based on the sensitivity analysis reported in Section 4.5 (Table 4), which shows that performance is stable within ρ ∈ [0.15, 0.25] and that ρ = 0.18 yields the best overall PSNR/SSIM on RESIDE ITS_v2.

F_{low} = F ⊙ M,  F_{high} = F ⊙ (1 − M),

where F denotes the shifted Fourier transform of x and ⊙ is element-wise multiplication. Applying inverse FFT to the two branches yields spatially arranged low-frequency and high-frequency feature maps. The low-frequency branch is expected to encode haze veil, coarse illumination, and color consistency, whereas the high-frequency branch preserves edges, texture transitions, and structural sharpness. Both branches are refined by lightweight 3 × 3 convolution stacks before fusion.
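A minimal NumPy sketch of the split is given below. The text does not specify how ρ is normalized, so the sketch assumes the mask is a centered disc whose radius is ρ times half the smaller spatial dimension; `radial_mask` and `frequency_split` are hypothetical names, and the 3 × 3 refinement convolutions are left out.

```python
import numpy as np

def radial_mask(h, w, rho):
    """Binary mask M: 1 inside a centered disc of the shifted spectrum, 0 outside."""
    yy, xx = np.meshgrid(np.arange(h) - h // 2, np.arange(w) - w // 2, indexing="ij")
    radius = rho * min(h, w) / 2.0  # assumed normalization of the ratio rho
    return (np.sqrt(yy ** 2 + xx ** 2) <= radius).astype(np.float64)

def frequency_split(x, rho=0.18):
    """Split x of shape (B, C, H, W) into low- and high-frequency feature maps."""
    h, w = x.shape[-2:]
    axes = (-2, -1)
    F = np.fft.fftshift(np.fft.fft2(x, axes=axes), axes=axes)  # shifted spectrum
    M = radial_mask(h, w, rho)
    F_low, F_high = F * M, F * (1.0 - M)

    def to_spatial(spectrum):
        # Undo the shift, invert the FFT, and drop the numerically tiny imaginary part.
        return np.fft.ifft2(np.fft.ifftshift(spectrum, axes=axes), axes=axes).real

    return to_spatial(F_low), to_spatial(F_high)
```

Because the two masks are complementary, the low- and high-frequency maps sum back to the input feature map, and a spatially constant input ends up entirely in the low-frequency branch.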

3.4. Cross-Domain Spatial–Spectral Gating

The cross-domain spatial–spectral gating module is also shown in Figure 1b and is intentionally lightweight. The low-frequency branch, which also serves as the spatial branch, generates channel-wise weights for the high-frequency branch through global average pooling followed by two 1 × 1 convolutions and a sigmoid. Conversely, the gated high-frequency features predict a spatial attention mask for the low-frequency branch through a small 1 × 1 convolutional stack. This bidirectional gating allows global contrast cues to modulate detail enhancement while enabling detail-rich signals to selectively refine spatial localization.

Formally, if S and H denote the refined low-frequency and high-frequency features, then the gating is

H' = H ⊙ g_c(S),  S' = S ⊙ g_s(H'),

Y = Conv_{1×1}([S', H']),

where g_c(·) denotes channel gating, g_s(·) denotes spatial gating, [·,·] is concatenation, and Y is the fused output. The final encoder-stage output is Y + x.
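The gating equations can be traced step by step in NumPy. The following is a sketch under assumptions, not the authors' implementation: the channel reduction ratio of 4, the ReLU between the 1 × 1 convolutions, the sigmoid on the spatial gate, and all names (`cross_domain_gate`, `init_gate_params`, `wc1`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1x1(x, w):
    """1x1 convolution on (B, C_in, H, W) with weights of shape (C_out, C_in)."""
    return np.einsum("oc,bchw->bohw", w, x)

def init_gate_params(c, rng, reduction=4, scale=0.1):
    """Random illustrative weights; the reduction ratio is an assumption."""
    m = max(c // reduction, 1)
    return {
        "wc1": scale * rng.standard_normal((m, c)),   # channel gate, squeeze
        "wc2": scale * rng.standard_normal((c, m)),   # channel gate, expand
        "ws1": scale * rng.standard_normal((m, c)),   # spatial gate, squeeze
        "ws2": scale * rng.standard_normal((1, m)),   # spatial gate, one-channel map
        "wf": scale * rng.standard_normal((c, 2 * c)),  # fusion Conv_{1x1}
    }

def cross_domain_gate(x, S, H, p):
    """Return Y + x, where Y fuses the mutually gated features S' and H'."""
    # g_c(S): global average pooling, two 1x1 convolutions, sigmoid -> (B, C, 1, 1).
    pooled = S.mean(axis=(2, 3), keepdims=True)
    g_c = sigmoid(conv1x1(np.maximum(conv1x1(pooled, p["wc1"]), 0.0), p["wc2"]))
    H_gated = H * g_c                                  # H' = H ⊙ g_c(S)
    # g_s(H'): small 1x1 stack producing one spatial attention map -> (B, 1, H, W).
    g_s = sigmoid(conv1x1(np.maximum(conv1x1(H_gated, p["ws1"]), 0.0), p["ws2"]))
    S_gated = S * g_s                                  # S' = S ⊙ g_s(H')
    Y = conv1x1(np.concatenate([S_gated, H_gated], axis=1), p["wf"])
    return Y + x
```

The channel gate broadcasts over spatial positions and the spatial gate broadcasts over channels, so the two directions of the exchange add very few parameters compared with the convolutional backbone.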

3.5. Training Objective

We optimize a weighted sum of L₁ loss, Charbonnier loss, and Fourier amplitude loss:

L = λ₁ L₁(Î, I_gt) + λ₂ L_char(Î, I_gt) + λ₃ L_fft(Î, I_gt).

The default weights are λ₁ = 1.0, λ₂ = 0.2, and λ₃ = 0.05. The Charbonnier term improves robustness to outliers, while the FFT amplitude term constrains spectral statistics by minimizing the L₁ distance between the magnitudes of the predicted and target Fourier transforms. This is consistent with the goal of recovering both visually accurate content and correct frequency composition. We deliberately constrain only the amplitude spectrum because (i) the amplitude spectrum is strongly correlated with contrast attenuation and the haze veil that FGDNet targets, whereas (ii) the phase spectrum is wrapped in [−π, π] with 2π discontinuities and empirically destabilizes training. The ablation in Table 7 indicates that adding an L₁ phase penalty gives a small PSNR drop of about 0.2 dB on ITS→ITS, while the FFT amplitude loss alone contributes about +0.3 dB on top of the FFT branch and is therefore retained as a complementary objective.
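As a rough illustration of how the three terms combine, the objective can be written in NumPy as below. The Charbonnier constant `eps = 1e-3` and the use of a plain mean over all FFT coefficients are assumptions, since the text does not state them; the function names are hypothetical.

```python
import numpy as np

def l1_loss(pred, target):
    return np.mean(np.abs(pred - target))

def charbonnier_loss(pred, target, eps=1e-3):
    """Smooth L1-like penalty; eps is an assumed value."""
    return np.mean(np.sqrt((pred - target) ** 2 + eps ** 2))

def fft_amplitude_loss(pred, target):
    """L1 distance between the magnitudes of the two 2D Fourier transforms."""
    amp_pred = np.abs(np.fft.fft2(pred, axes=(-2, -1)))
    amp_target = np.abs(np.fft.fft2(target, axes=(-2, -1)))
    return np.mean(np.abs(amp_pred - amp_target))

def total_loss(pred, target, lam1=1.0, lam2=0.2, lam3=0.05):
    """Weighted sum with the default weights quoted in the text."""
    return (lam1 * l1_loss(pred, target)
            + lam2 * charbonnier_loss(pred, target)
            + lam3 * fft_amplitude_loss(pred, target))
```

One property worth noting is that the amplitude term ignores phase: a circularly shifted copy of the target has the same Fourier magnitudes, so `fft_amplitude_loss` is numerically zero for it while the pixel-domain terms are not. This is why the amplitude term is used as a complement to, rather than a replacement for, the pixel losses.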

4. Experiments

4.1. Datasets, Evaluation Protocol, and Implementation Details

In the experiments, we use paired training/evaluation settings on RESIDE ITS_v2, RESIDE OTS_BETA, O-HAZE, and NH-HAZE. RESIDE ITS_v2 contains 1399 clear images and 13,990 hazy images, RESIDE OTS_BETA contains 2061 clear images and 72,135 hazy images, O-HAZE contains 45 paired real outdoor scenes, and NH-HAZE contains about 55 paired real nonhomogeneous haze scenes. The RESIDE ITS_v2 and RESIDE OTS_BETA datasets were partitioned at the source-scene level into a training set, a validation set, and a test set in proportions of 80%, 10%, and 10%, respectively. We deliberately adopt this scene-disjoint partition rather than the canonical SOTS split for three reasons. First, ITS_v2 and SOTS-Indoor are both rendered from NYU Depth V2 depth maps, and the canonical split is performed at the rendered-image level, so NYU2 source scenes can overlap between ITS_v2 training and SOTS-Indoor test; our source-scene-level partition enforces that no NYU2 scene appears in both the training and test folds and therefore provides a stricter generalization test inside the indoor domain. Second, SOTS contains only 500 indoor and 500 outdoor pairs with a narrow range of atmospheric-scattering parameters β, whereas our scene-level 10% test split contains ≈1400 indoor and ≈7200 outdoor pairs spanning the full β range used during training, yielding less noisy PSNR/SSIM estimates. Third, since Section 4.3 also reports cross-domain protocols (OTS→O-HAZE and OTS→NH-HAZE) that are not drawn from the RESIDE distribution, using a scene-level in-domain partition keeps the evaluation across all four protocols in Table 1, Table 2 and Table 3 uniform. For full comparability with prior work, we additionally evaluate the same trained models on the canonical SOTS-Indoor (500 pairs) and SOTS-Outdoor (500 pairs) test sets; the corresponding numbers are listed at the bottom of Table 1 and are consistent with those obtained on our scene-disjoint partition. 
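A source-scene-level partition of this kind can be sketched as follows. The sketch assumes that hazy file names encode the source scene as their first underscore-separated field (for example `<scene>_<variant>_<beta>.png`); that naming, the helper names, and the seed are illustrative assumptions rather than details given in the text.

```python
import random
from collections import defaultdict

def scene_id(hazy_name):
    """Assumed naming '<scene>_<variant>_<beta>.png': the scene is the first field."""
    return hazy_name.split("_")[0]

def scene_level_split(hazy_names, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Assign whole scenes to train/val/test so no scene appears in two folds."""
    by_scene = defaultdict(list)
    for name in hazy_names:
        by_scene[scene_id(name)].append(name)
    scenes = sorted(by_scene)
    random.Random(seed).shuffle(scenes)          # deterministic shuffle of scenes
    n_train = int(ratios[0] * len(scenes))
    n_val = int(ratios[1] * len(scenes))
    folds = {
        "train": scenes[:n_train],
        "val": scenes[n_train:n_train + n_val],
        "test": scenes[n_train + n_val:],
    }
    # Expand each fold from scene ids back to the hazy images rendered from them.
    return {fold: [n for s in ids for n in by_scene[s]] for fold, ids in folds.items()}
```

Splitting by scene before expanding to images is what prevents several hazy renderings of the same clear image from landing on both sides of the train/test boundary.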

For evaluation, PSNR and SSIM are used as the primary quantitative metrics. We evaluate both in-domain performance (ITS→ITS and OTS→OTS) and cross-domain generalization from synthetic training to real paired benchmarks (OTS→O-HAZE and OTS→NH-HAZE). Throughout, the left side of the arrow “→” indicates the training set, and the right side indicates the test set.
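For reference, PSNR for images scaled to [0, 1] follows directly from the mean squared error. The helper below is a generic sketch (the name `psnr` and the `max_val` argument are illustrative); SSIM is omitted because it depends on window and constant choices that the text does not specify.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

On this scale a uniform error of 0.1 on [0, 1] images corresponds to 20 dB, and every further halving of the root-mean-square error adds about 6 dB.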

The proposed method uses a compact FGDNet with channel dimensions (32, 64, 128, 256) and three FFT-equipped encoder stages. Synthetic training uses random crops of size 256 × 256 and a batch size of 8, with Automatic Mixed Precision enabled. The default learning rate is 2 × 10⁻⁴. All experiments are implemented in Python 3.10 with PyTorch 2.1.0, NumPy 1.26, CUDA 12.1, and cuDNN 8.9, and run on a single NVIDIA RTX 3090 (24 GB) under Ubuntu 22.04. The optimizer is AdamW (β₁ = 0.9, β₂ = 0.999, weight decay = 1 × 10⁻⁴) with a cosine annealing scheduler, and each model is trained for 300 k iterations on both ITS and OTS. Random seeds {0, 1, 2} are used for the multi-run variance analysis reported in Section 4.4.
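The learning-rate trajectory implied by cosine annealing over 300 k iterations can be written in closed form. The sketch below assumes no warm-up and a minimum learning rate of zero, neither of which is stated in the text; `cosine_lr` is a hypothetical helper.

```python
import math

def cosine_lr(step, total_steps=300_000, base_lr=2e-4, min_lr=0.0):
    """Cosine-annealed learning rate at a given iteration (no warm-up assumed)."""
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate starts at 2 × 10⁻⁴, passes through 1 × 10⁻⁴ at the midpoint of training, and decays to zero at the final iteration.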

Two ablation variants are implemented in the project: “no-cross” removes the spatial–spectral gating mechanism and replaces it with direct addition, and “no-fft” disables the FFT branch in encoder stages. These ablations directly test the contributions of explicit spectral decomposition and cross-domain gating.

4.2. Quantitative Results

The main quantitative results of the proposed model are summarized in Table 1.

The full model achieves 33.3 dB/0.983 SSIM on ITS and 35.1 dB/0.988 SSIM on OTS, indicating strong performance on synthetic paired benchmarks. When trained on OTS, the model obtains 19.1 dB/0.786 SSIM on O-HAZE and 15.8 dB/0.648 SSIM on NH-HAZE. These results suggest that the proposed spectral design is beneficial not only for in-domain reconstruction but also for synthetic-to-real transfer.

Compared with the results on synthetic benchmarks (ITS and OTS), the lower scores on the real-world benchmarks (O-HAZE and NH-HAZE) indicate that domain shift remains the main bottleneck. This is expected because synthetic haze does not fully reproduce the color deviation, spatial irregularity, and illumination complexity of real atmospheric scattering. Nevertheless, the model retains reasonable structural similarity, especially on O-HAZE, indicating that the frequency-guided design may capture more transferable restoration cues.

4.3. Comparison with Representative State-of-the-Art Methods

To validate the effectiveness of the proposed frequency-guided multi-scale dehazing network, we compare it with three representative state-of-the-art methods that collectively cover the main architectural paradigms pursued in recent years: GridDehazeNet [15], MSBDN [17], and FFA-Net [16]. The results are listed in Table 2, where the left side of the arrow “→” indicates the training set and the right side indicates the test set.

From the quantitative results, the proposed method consistently achieves better PSNR and SSIM across all evaluated protocols. On the synthetic-to-synthetic setting ITS→ITS, our method attains 33.3 dB/0.983, outperforming GridDehazeNet (32.2 dB/0.978), MSBDN (32.8 dB/0.981), and FFA-Net (32.0 dB/0.978). These results indicate that the proposed model can recover cleaner structures and more faithful textures even when the training and test domains are relatively consistent.

The ITS→ITS advantage of FGDNet is not specific to any single baseline but holds uniformly across the three representative methods, confirming that the benefit of the proposed design is robust with respect to the attention-based, multi-scale, and feature-aggregation CNN paradigms they collectively cover.

On the OTS→OTS protocol, all methods achieve relatively strong performance, but the proposed method still achieves the best results with 35.1 dB/0.988, surpassing GridDehazeNet (30.9 dB/0.980), MSBDN (33.6 dB/0.982), and FFA-Net (34.7 dB/0.986). Although the margin over the strongest baseline is small in this easier, same-domain outdoor setting (0.4 dB over FFA-Net), the result still indicates that the proposed frequency-guided branch and cross-domain gating can provide stable improvements without sacrificing reconstruction quality.

A similar trend can be observed on the real-world cross-domain benchmarks. On OTS→O-HAZE, the proposed method achieves 19.1 dB/0.786, compared with 17.2 dB/0.712 for GridDehazeNet, 18.2 dB/0.750 for MSBDN, and 18.3 dB/0.766 for FFA-Net. On OTS→NH-HAZE, our method further reaches 15.8 dB/0.648, exceeding GridDehazeNet (13.4 dB/0.548), MSBDN (14.1 dB/0.602), and FFA-Net (14.8 dB/0.638). Notably, the margins over the strongest baseline on these two cross-domain settings (0.8 dB on O-HAZE and 1.0 dB on NH-HAZE, both over FFA-Net) are larger than the 0.4 dB margin on the same-domain OTS benchmark, suggesting that the proposed frequency-guided design provides stronger robustness to real haze distributions. In particular, the simultaneous gains in both PSNR and SSIM indicate that the model improves not only pixel fidelity but also structural consistency, rather than trading one metric for the other. Across all four protocols in Table 2, FGDNet improves both PSNR and SSIM over every baseline.

In summary, the proposed method shows two clear advantages. First, it yields the best overall restoration accuracy on both indoor and outdoor synthetic benchmarks. Second, and more importantly, it exhibits stronger transferability from synthetic training data to real-world haze datasets, where the domain gap is much larger. This suggests that explicitly modeling low- and high-frequency components is beneficial for capturing both global contrast degradation and local detail attenuation caused by haze.

The qualitative comparisons in Figure 2 further support the numerical results. Compared with the three representative baselines (GridDehazeNet, MSBDN, and FFA-Net), the proposed method removes residual haze more thoroughly and produces cleaner backgrounds, especially in distant regions and low-contrast areas.

Overall, both quantitative and qualitative results demonstrate that the proposed method provides a more effective and more robust solution than representative dehazing methods.

4.4. Ablation Study

As shown in Table 3, the ablation results demonstrate the individual roles of the FFT branch and the cross-domain gating module. To quantify the statistical reliability of the ablation, every variant is trained three times with seeds {0, 1, 2} under identical data splits and hyperparameters, and Table 3 reports the mean ± standard deviation of PSNR/SSIM. The standard deviations are small (≤0.12 dB PSNR and ≤0.002 SSIM on the synthetic benchmarks), so the moderate 0.4 dB gap between the full model and the “w/o Cross Gating” variant on ITS is several times larger than the seed-level sample standard deviations, suggesting the difference is unlikely to be purely stochastic. The much larger gaps associated with removing the FFT branch (≥3.3 dB on every benchmark) are even more clearly beyond the seed-level noise. We note that with only three seeds, these should be read as indicative rather than as a formal significance test.
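The comparison between ablation gaps and seed-level variation can be made explicit with a small calculation. The sketch below uses only the ITS means and standard deviations reported for Table 3 in this section; dividing the gap by the larger of the two standard deviations is a deliberately simple, assumed yardstick and not a formal significance test.

```python
def gap_in_sigmas(mean_a, std_a, mean_b, std_b):
    """Absolute gap between two means, in units of the larger seed-level std."""
    return abs(mean_a - mean_b) / max(std_a, std_b)

# ITS PSNR (mean, std over seeds {0, 1, 2}) as reported in the text.
full_model = (33.30, 0.07)
no_cross_gating = (32.90, 0.08)
no_fft_branch = (29.40, 0.10)

gating_gap = gap_in_sigmas(*full_model, *no_cross_gating)
fft_gap = gap_in_sigmas(*full_model, *no_fft_branch)
```

With these numbers the gating gap is about 5 standard deviations and the FFT-branch gap about 39, which matches the qualitative reading that both effects exceed seed noise and that the FFT branch matters far more.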

Removing cross-domain gating causes consistent but moderate degradation across all settings. On ITS, PSNR drops from 33.3 to 32.9 dB. This indicates that simple spectral decomposition alone is useful, but guided information exchange between low/high-frequency features further improves restoration quality.

Removing the FFT branch produces substantially larger performance loss. The ITS in-domain result decreases by 3.9 dB, and the OTS in-domain result decreases by 3.3 dB. Real-domain generalization also drops notably, especially in SSIM. These trends strongly support the paper’s main claim that explicit spectral modeling is not an auxiliary trick but a core contributor to the final performance.

The ablation also reveals a meaningful hierarchy of contributions: the FFT branch provides the dominant gain, while the cross-domain gating module refines and stabilizes that gain. This is exactly the behavior expected if low/high-frequency separation recovers complementary haze-related information and the proposed fusion block helps recompose it effectively.

For a concise summary of these ablation outcomes: on ITS, removing cross-domain gating reduces performance from 33.30 ± 0.07 dB/0.983 SSIM to 32.90 ± 0.08 dB/0.982 SSIM, while removing the FFT branch reduces performance further to 29.40 ± 0.10 dB/0.971 SSIM. On OTS, the corresponding numbers are 35.10 ± 0.09 dB/0.988 SSIM for the full model, 34.60 ± 0.10 dB/0.985 SSIM without cross gating, and 31.80 ± 0.11 dB/0.978 SSIM without the FFT branch. Analogous trends are observed on the two real-haze benchmarks (O-HAZE and NH-HAZE). These results validate the proposed design logic and support the paper’s central hypothesis that explicit frequency modeling improves dehazing.

4.5. Sensitivity to the Low-Frequency Ratio ρ

To justify the choice of the low-frequency ratio ρ = 0.18 used by the radial mask M in Section 3.3, we retrain the full FGDNet on RESIDE ITS_v2 with ρ ∈ {0.10, 0.15, 0.18, 0.22, 0.25, 0.30}, using the same optimizer, data splits, and number of iterations. All other hyperparameters are kept fixed. As summarized in Table 4, the model is robust within ρ ∈ [0.15, 0.25], for which PSNR stays within roughly 0.3 dB of the optimum. For ρ = 0.10, the low-frequency band is too narrow to fully cover the atmospheric veil, and part of the haze-correlated energy leaks into the high-frequency branch, which degrades the contrast-restoration capability; for ρ = 0.30, the low-frequency band starts to leak into texture spectra, causing mild edge over-smoothing. ρ = 0.18 attains the best overall PSNR and SSIM on ITS and is therefore adopted as the default.

4.6. Ablation on the Number of FFT-Equipped Encoder Stages

Section 3.1 states that the FFT-based spectral branch is inserted into the first three encoder stages. To justify this design choice, Table 5 reports the ITS→ITS performance when the FFT module is applied to the first k stages, with k ∈ {0, 1, 2, 3, 4}. Here, k = 0 denotes the gated U-Net backbone with the FFT amplitude loss L_fft retained but without any FFT-equipped encoder stage (i.e., the same configuration as the “w/o FFT Branch” variant in Table 3), while k = 4 additionally puts an FFT module at the bottleneck. PSNR increases monotonically from 29.4 dB at k = 0 to 33.3 dB at k = 3, confirming that each additional FFT-equipped encoder stage contributes a further gain. Applying FFT at the fourth (bottleneck) stage gives a 0.2 dB drop (33.1 dB) while adding 0.17 M extra parameters and 0.9 G extra FLOPs. The reason is that for a 256 × 256 input, the bottleneck features are spatially compressed to an 8 × 8 grid, which is too small for a reliable 2D FFT: the spectrum is dominated by only a few coefficients, so the low/high split no longer reflects meaningful image-level frequencies. We therefore adopt k = 3 as the default configuration.
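The point about small grids can be made concrete by counting how many Fourier coefficients fall inside the radial mask. The sketch assumes the mask is a centered disc with radius ρ times half the grid size, which is one plausible reading of the threshold ratio and not a definition given in the text; `low_freq_count` is a hypothetical helper.

```python
import numpy as np

def low_freq_count(size, rho=0.18):
    """Number of coefficients of a size x size shifted spectrum inside the disc."""
    coords = np.arange(size) - size // 2          # integer frequency offsets
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    radius = rho * size / 2.0                     # assumed normalization of rho
    return int(np.count_nonzero(np.sqrt(yy ** 2 + xx ** 2) <= radius))

count_bottleneck = low_freq_count(8)     # 8 x 8 bottleneck grid from the text
count_large = low_freq_count(256)        # a 256 x 256 grid, for contrast
```

Under this assumption the disc on an 8 × 8 grid has radius 0.72 and therefore contains only the DC coefficient, so the "low-frequency" branch reduces to the per-channel mean, whereas on a 256 × 256 grid the same ρ selects well over a thousand coefficients.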

4.7. Model Efficiency

To substantiate the claim that FGDNet is compact and lightweight, we report in Table 6 the number of parameters, the number of FLOPs at a 256 × 256 input, and the average per-image inference time measured on a single NVIDIA RTX 3090 in FP16 automatic mixed-precision mode. For each method, the inference time is averaged over 500 ITS test images after 20 warm-up iterations, using each method’s publicly released inference script. FGDNet has only 1.35 M parameters and 9.8 G FLOPs, i.e., approximately 0.25 M parameters and 2.8 G FLOPs more than the gUNet backbone, and achieves 15 ms per image. Compared with FFA-Net, FGDNet uses roughly 30% of the parameters and 3.4% of the FLOPs while, as shown in Table 2, attaining 1.3 dB higher PSNR on ITS. Compared with DEA-Net, FGDNet has a strictly smaller parameter and FLOPs footprint, and compared with the very compact DehazeFormer-T, FGDNet uses slightly more parameters and FLOPs but remains well within the “compact/lightweight” range. These measurements quantitatively support the “compact/lightweight” statement in the abstract and introduction.
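The quoted ratios can be unpacked with simple arithmetic. The values computed below are only what the figures in this paragraph imply; they are not taken from Table 6 and are approximate because the percentages are rounded.

```python
# Figures quoted in the text.
fgdnet_params_m, fgdnet_flops_g = 1.35, 9.8      # FGDNet: parameters (M), FLOPs (G)
branch_params_m, branch_flops_g = 0.25, 2.8      # overhead on top of the backbone
ffa_param_ratio, ffa_flops_ratio = 0.30, 0.034   # FGDNet relative to FFA-Net

# Implied backbone cost without the spectral branch.
backbone_params_m = fgdnet_params_m - branch_params_m
backbone_flops_g = fgdnet_flops_g - branch_flops_g

# Implied FFA-Net cost, inverted from the rounded ratios.
ffa_params_m = fgdnet_params_m / ffa_param_ratio
ffa_flops_g = fgdnet_flops_g / ffa_flops_ratio
```

Read this way, the spectral branch accounts for roughly a fifth of FGDNet's parameters and a little under a third of its FLOPs, and the comparison model implied by the ratios is on the order of 4.5 M parameters and close to 290 G FLOPs.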

4.8. Decomposed Ablation of the FFT Branch and the FFT Amplitude Loss

Section 3.5 introduces an FFT amplitude loss L_fft that is applied to the output image rather than to intermediate features. To isolate its contribution from that of the FFT encoder branch and to justify why only the amplitude spectrum is constrained, we train four variants on ITS→ITS with three random seeds {0, 1, 2} and report mean ± standard deviation in Table 7. Removing L_fft while keeping the FFT branch drops PSNR from 33.30 ± 0.07 to 33.00 ± 0.08 dB, so L_fft contributes approximately 0.3 dB on top of the FFT branch. Removing the FFT branch while keeping L_fft gives 29.40 ± 0.10 dB, so L_fft alone contributes approximately 0.3 dB over the pure baseline (29.10 ± 0.09 dB). The FFT branch is therefore the dominant factor, and L_fft acts as a small but complementary refinement. Finally, adding an L₁ penalty on the phase spectrum (last row) gives 33.10 ± 0.09 dB, i.e., 0.2 dB worse than amplitude-only, because the phase is wrapped in [−π, π] with 2π discontinuities and empirically destabilizes training; this justifies the design choice of constraining only the amplitude spectrum.
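The four ITS→ITS means quoted above form a 2 × 2 design (FFT branch on/off, L_fft on/off), so the individual gains and their interaction can be read off directly. The sketch below uses only the mean PSNR values from this paragraph; the dictionary keys and the function name are illustrative.

```python
def decompose(psnr):
    """Main gains and interaction for a 2 x 2 (branch, loss) ablation grid."""
    loss_gain_with_branch = psnr[("branch", "loss")] - psnr[("branch", "no_loss")]
    loss_gain_without_branch = psnr[("no_branch", "loss")] - psnr[("no_branch", "no_loss")]
    branch_gain_with_loss = psnr[("branch", "loss")] - psnr[("no_branch", "loss")]
    branch_gain_without_loss = psnr[("branch", "no_loss")] - psnr[("no_branch", "no_loss")]
    return {
        "loss_gain_with_branch": loss_gain_with_branch,
        "loss_gain_without_branch": loss_gain_without_branch,
        "branch_gain_with_loss": branch_gain_with_loss,
        "branch_gain_without_loss": branch_gain_without_loss,
        # Zero interaction means the two gains simply add up.
        "interaction": loss_gain_with_branch - loss_gain_without_branch,
    }

# Mean ITS->ITS PSNR (dB) for the four variants described in the text.
its_psnr = {
    ("branch", "loss"): 33.30,
    ("branch", "no_loss"): 33.00,
    ("no_branch", "loss"): 29.40,
    ("no_branch", "no_loss"): 29.10,
}
effects = decompose(its_psnr)
```

With the reported means, the loss contributes about 0.3 dB with or without the branch and the branch about 3.9 dB with or without the loss, so the interaction term is approximately zero and the two contributions are close to additive at the level of the means.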

4.9. Real-Haze Fine-Tuning and the Synthetic-to-Real Gap

To further investigate the large PSNR gap observed in Section 4.2 between the synthetic in-domain protocol (OTS→OTS, 35.1 dB) and the real-haze cross-domain protocol (OTS→O-HAZE, 19.1 dB), we conduct a few-shot real-haze fine-tuning experiment. The OTS-pretrained FGDNet is fine-tuned on 35 O-HAZE training pairs, while 5 pairs are held out for validation and 5 pairs form the test set, following the standard O-HAZE evaluation protocol. All hyperparameters are kept identical to the main training recipe; only the number of fine-tuning iterations is varied. As reported in Table 8, 2 k fine-tuning iterations already raise PSNR/SSIM from 19.10/0.786 to 21.60/0.827, a gain of +2.5 dB/+0.041 SSIM, and 5 k iterations reach 21.95/0.834. This indicates that a large portion of the synthetic-to-real gap is caused by distribution mismatch rather than by a fundamental capacity limitation of the model, and that even a small amount of real paired data yields a clear improvement, although the fine-tuned model (21.95 dB) remains well below the in-domain level (35.1 dB). Combined with the discussion in Section 5, three practical remedies emerge: (i) few-shot fine-tuning on real hazy–clear pairs, as demonstrated here; (ii) contrastive or unpaired regularization in the spirit of D⁴ [27] and RIDCP [28]; and (iii) content-adaptive frequency masks to replace the fixed radial mask.
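The fine-tuning trend in Table 8 can be made explicit with a few lines of arithmetic. The sketch below only reuses the numbers reported in Table 8 and the 35.1 dB OTS→OTS figure; the variable names are illustrative, and the "share of the gap" is a rough reference quantity, since the in-domain and real-haze PSNR values are measured on different test sets.

```python
# (fine-tuning iterations, PSNR in dB, SSIM), copied from Table 8.
table8 = [
    (0, 19.10, 0.786),
    (1000, 20.84, 0.812),
    (2000, 21.60, 0.827),
    (5000, 21.95, 0.834),
]
in_domain_psnr = 35.1  # OTS->OTS, from Table 1

base_psnr = table8[0][1]
synthetic_to_real_gap = in_domain_psnr - base_psnr

summary = []
for (it_prev, psnr_prev, _), (it_cur, psnr_cur, _) in zip(table8, table8[1:]):
    total_gain = psnr_cur - base_psnr
    # Marginal PSNR gain per additional 1 k iterations in this interval.
    gain_per_k = (psnr_cur - psnr_prev) / ((it_cur - it_prev) / 1000)
    gap_share = total_gain / synthetic_to_real_gap
    summary.append((it_cur, total_gain, gain_per_k, gap_share))

for it_cur, total_gain, gain_per_k, gap_share in summary:
    print(
        f"{it_cur:>5d} it: total +{total_gain:.2f} dB, "
        f"marginal {gain_per_k:.2f} dB per 1 k it, "
        f"{100 * gap_share:.1f}% of the PSNR difference"
    )
```

The marginal gain per 1 k iterations shrinks from one interval to the next, which is consistent with fine-tuning on 35 pairs saturating quickly.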

5. Discussion

The FFT branch dominates the ablation gain. In Table 3, removing the FFT branch causes a 3.9 dB PSNR drop on ITS (33.3→29.4 dB), whereas removing the cross-domain gating module causes only a 0.4 dB drop (33.3→32.9 dB). We interpret this asymmetry as follows: the FFT branch is the only module that explicitly separates haze-correlated low-frequency content (contrast attenuation, color shift, atmospheric veil) from structure-carrying high-frequency content (edges, fine textures). Once this separation is available, the cross-domain gating module acts as a refinement that modulates how the two bands are recombined. The hierarchy (FFT branch > gating block) is therefore consistent with the physical motivation of the proposed design rather than an artifact of the implementation.
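The band separation discussed here can be illustrated with a minimal NumPy sketch of an isotropic radial split. The function names, the normalization of the radius (1.0 at the per-axis Nyquist frequency), and the hard binary mask are assumptions of this sketch; the mask defined in Section 3.3 may differ. The default ρ = 0.18 is the value reported in Table 4.

```python
import numpy as np


def radial_low_mask(h, w, rho):
    """Binary mask that keeps frequencies with normalized radius <= rho.

    Frequencies follow the unshifted layout of np.fft.fft2, so no
    fftshift is needed. A radius of 1.0 corresponds to the Nyquist
    frequency along one axis (normalization assumed for this sketch).
    """
    fy = np.fft.fftfreq(h)[:, None] / 0.5
    fx = np.fft.fftfreq(w)[None, :] / 0.5
    radius = np.sqrt(fy ** 2 + fx ** 2)
    return (radius <= rho).astype(np.float64)


def split_bands(feat, rho=0.18):
    """Split a real (H, W) or (C, H, W) array into low and high bands."""
    spec = np.fft.fft2(feat)
    mask = radial_low_mask(*feat.shape[-2:], rho)
    # The mask is symmetric in frequency, so both inverse transforms are
    # real up to rounding error; .real drops the residual imaginary part.
    low = np.fft.ifft2(spec * mask).real
    high = np.fft.ifft2(spec * (1.0 - mask)).real
    return low, high


rng = np.random.default_rng(0)
feature_map = rng.standard_normal((4, 32, 32))
low_band, high_band = split_bands(feature_map)
print("max reconstruction error:",
      float(np.max(np.abs(low_band + high_band - feature_map))))
```

Because the two masks sum to one, the bands add back to the input exactly (up to rounding). A constant offset, which is the simplest model of a uniform veil, ends up entirely in the low band.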

Synthetic-to-real performance gap. The PSNR gap between the in-domain protocol OTS→OTS (35.1 dB/0.988 SSIM) and the cross-domain protocols OTS→O-HAZE (19.1 dB/0.786 SSIM) and OTS→NH-HAZE (15.8 dB/0.648 SSIM) is large. We attribute this gap to three concrete factors. First, the synthetic haze in RESIDE is generated with a simplified atmospheric-scattering model that assumes homogeneous haze (a globally constant scattering coefficient and atmospheric light per image), while real haze is spatially non-uniform and wavelength-dependent. Second, real captures contain camera ISP effects (sensor noise, tone-mapping, demosaicing artifacts) that are absent from RESIDE renderings. Third, NH-HAZE deliberately contains non-homogeneous haze, which violates the homogeneity assumption under which the training data were synthesized. Encouragingly, the relative SSIM drop on O-HAZE (0.988→0.786, about 20%) is smaller than the corresponding relative PSNR drop (35.1→19.1 dB, about 46%), suggesting that the frequency-guided design captures transferable structural cues even when pixel fidelity degrades. Practical remedies include (i) fine-tuning on a small set of real hazy–clear pairs, (ii) contrastive or unpaired regularization in the spirit of D⁴ [27] and RIDCP [28], and (iii) replacing the fixed radial mask with content-adaptive masks.
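The standard atmospheric scattering model behind synthetic haze, I = J·t + A·(1 − t) with t = exp(−β·d), can be sketched as below to show the difference between homogeneous and non-homogeneous haze. The function name `add_haze`, the airlight value, the depth map, and the β values are illustrative assumptions and are not the rendering settings of RESIDE or NH-HAZE.

```python
import numpy as np


def add_haze(clear, depth, beta, airlight=0.9):
    """Apply I = J * t + A * (1 - t) with t = exp(-beta * depth).

    clear:    (H, W, 3) haze-free image J with values in [0, 1].
    depth:    (H, W) scene depth d.
    beta:     scalar (homogeneous haze) or (H, W) map (non-homogeneous).
    airlight: global atmospheric light A (illustrative value).
    """
    t = np.exp(-np.asarray(beta, dtype=np.float64) * depth)[..., None]
    return clear * t + airlight * (1.0 - t)


rng = np.random.default_rng(0)
clear = rng.random((4, 6, 3))
depth = np.full((4, 6), 2.0)

# Homogeneous haze: one scattering coefficient for the whole image.
homogeneous = add_haze(clear, depth, beta=1.0)

# Non-homogeneous haze: the scattering coefficient varies across columns.
beta_map = np.linspace(0.2, 2.0, 6)[None, :] * np.ones((4, 1))
non_homogeneous = add_haze(clear, depth, beta=beta_map)

print(homogeneous.shape, non_homogeneous.shape)
```

With a scalar β the veil depends only on depth, whereas a spatially varying β produces haze patches that a model trained only on the scalar case has never seen.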

Limitations of the radial mask. The current radial mask is isotropic and shared across all channels. For scenes with strong directional structures (e.g., vertical building facades, horizontal horizons, strong perspective lines), an anisotropic or learnable mask might better separate the haze veil from oriented texture. The consistent but moderate residual error observed on the real-world benchmarks is compatible with this hypothesis. The design of an adaptive mask is therefore a natural direction for future work.
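One simple way to make the mask direction-dependent is to replace the circular pass-band with an ellipse. The sketch below is purely hypothetical and is not part of FGDNet: the function name, the two cut-off parameters `rho_y` and `rho_x`, and the frequency normalization (1.0 at the per-axis Nyquist frequency) are assumptions chosen for illustration.

```python
import numpy as np


def elliptical_low_mask(h, w, rho_y, rho_x):
    """Binary low-pass mask with separate vertical/horizontal cut-offs.

    Uses the unshifted np.fft.fft2 frequency layout. With rho_y == rho_x
    the pass-band is a circle, i.e., an isotropic radial mask.
    """
    fy = np.fft.fftfreq(h)[:, None] / 0.5
    fx = np.fft.fftfreq(w)[None, :] / 0.5
    inside = (fy / rho_y) ** 2 + (fx / rho_x) ** 2 <= 1.0
    return inside.astype(np.float64)


isotropic = elliptical_low_mask(32, 32, 0.18, 0.18)
# Wider pass-band along the horizontal frequency axis than the vertical.
anisotropic = elliptical_low_mask(32, 32, 0.10, 0.40)
print("kept coefficients:", int(isotropic.sum()), int(anisotropic.sum()))
```

In a learnable variant, the two cut-offs (or a full mask) would be predicted per channel or per image; that design is left open here, as in the text.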

Accuracy/efficiency trade-off. Compared with the gUNet backbone augmented only with the L_fft amplitude loss (the “w/o FFT Branch” variant in Table 3, 29.4 dB), the spectral branch and the cross-domain gating module together add only approximately 0.25 M parameters and 2.8 G FLOPS at the 256 × 256 input resolution, while improving PSNR on ITS by 3.9 dB (29.4→33.3). Compared with FFA-Net, the network uses substantially fewer parameters and FLOPS, yet still obtains higher PSNR/SSIM on every protocol reported in Table 2, which supports the statement in the abstract that the proposed network is both effective and compact.
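The overhead and ratio figures quoted in this paragraph follow directly from Tables 3 and 6, as the short check below shows. Only the tabulated values are used; the dictionary layout and variable names are illustrative.

```python
# Parameters (M) and FLOPS (G) at a 256 x 256 input, copied from Table 6.
table6 = {
    "FFA-Net": {"params_m": 4.46, "flops_g": 287.8},
    "gUNet": {"params_m": 1.10, "flops_g": 7.0},
    "FGDNet": {"params_m": 1.35, "flops_g": 9.8},
}
# ITS->ITS PSNR (dB) of the full model and the "w/o FFT Branch" variant,
# copied from Table 3.
psnr_full, psnr_wo_fft_branch = 33.30, 29.40

fgd, gunet, ffa = table6["FGDNet"], table6["gUNet"], table6["FFA-Net"]

added_params_m = fgd["params_m"] - gunet["params_m"]
added_flops_g = fgd["flops_g"] - gunet["flops_g"]
param_ratio = fgd["params_m"] / ffa["params_m"]
flops_ratio = fgd["flops_g"] / ffa["flops_g"]
psnr_gain_db = psnr_full - psnr_wo_fft_branch

print(f"overhead vs. gUNet: +{added_params_m:.2f} M params, "
      f"+{added_flops_g:.1f} G FLOPS")
print(f"relative to FFA-Net: {100 * param_ratio:.0f}% params, "
      f"{100 * flops_ratio:.1f}% FLOPS")
print(f"PSNR gain on ITS: {psnr_gain_db:.1f} dB")
```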

6. Conclusions

In this paper, we presented FGDNet, a frequency-guided multi-scale dehazing network that augments a gated U-Net backbone with three complementary components: (1) an FFT-based encoder branch that explicitly decomposes intermediate features into low- and high-frequency sub-bands; (2) a lightweight cross-domain spatial–spectral gating module that enables bidirectional channel/spatial attention between the two bands; and (3) an FFT amplitude loss that constrains the spectral statistics of restored images.

Quantitatively, FGDNet obtains 33.3 dB PSNR/0.983 SSIM on RESIDE-ITS and 35.1 dB PSNR/0.988 SSIM on RESIDE-OTS, corresponding to absolute gains of 0.5 dB/0.002 and 0.4 dB/0.002 over the strongest representative baseline. On the more challenging cross-domain protocols, FGDNet reaches 19.1 dB PSNR/0.786 SSIM on OTS→O-HAZE and 15.8 dB PSNR/0.648 SSIM on OTS→NH-HAZE, outperforming FFA-Net by 0.8 dB/0.020 and 1.0 dB/0.010, respectively, with the largest absolute gains observed on the real-haze benchmarks. Ablation studies confirm a clear hierarchy of contributions: removing the FFT branch causes a 3.9 dB PSNR drop on ITS (33.3→29.4 dB), whereas removing the cross-domain gating module causes only a 0.4 dB drop (33.3→32.9 dB). Both gaps are several times larger than the seed-level sample standard deviations reported in Table 3, indicating that the spectral decomposition carries the dominant share of the gain while the gating module stabilizes and refines it. The FFT amplitude loss contributes an additional 0.3 dB PSNR on top of the FFT branch, and performance is robust within ρ ∈ [0.15, 0.25], where PSNR stays within 0.3 dB of the optimum. The spectral branch and gating module together add only ~0.25 M parameters and ~2.8 G FLOPS on top of the gUNet backbone for a 256 × 256 input; in absolute terms, FGDNet uses 1.35 M parameters and 9.8 G FLOPS and runs at 15 ms per image on a single RTX 3090, corresponding to roughly 30% of the parameters and 3.4% of the FLOPS of FFA-Net, supporting the claim that the design is both effective and compact. Furthermore, a few-shot fine-tuning experiment shows that only 2 k iterations on 35 O-HAZE pairs raise PSNR/SSIM from 19.10/0.786 to 21.60/0.827 (+2.5 dB/+0.041 SSIM), indicating that a large portion of the synthetic-to-real gap originates from distribution mismatch rather than from a fundamental capacity limit of the model.

Taken together, these results yield three concrete findings. First, the asymmetry between the large ablation gap of the FFT branch (3.9 dB on ITS) and the moderate gap of the gating module (0.4 dB on ITS) is consistent with the physical motivation of the design: only the FFT branch explicitly separates haze-correlated low-frequency content (contrast attenuation, color shift, atmospheric veil) from structure-carrying high-frequency content (edges, fine textures), while the gating module acts as a refinement that modulates how the two bands are recombined. Second, FGDNet attains both the highest PSNR and the highest SSIM in Table 2 on all four protocols, which suggests that the frequency-guided design improves pixel fidelity and structural consistency jointly rather than trading one for the other. Third, the smaller relative SSIM gap than PSNR gap on O-HAZE (0.988→0.786 versus 35.1→19.1 dB) and the rapid recovery after lightweight real-haze fine-tuning together indicate that the frequency-guided representation captures transferable structural cues even when absolute pixel fidelity degrades, making FGDNet a reasonable starting point for domain-adaptation pipelines.

These findings support our central hypothesis that haze corrupts different frequency bands in different ways, and that explicit spectral decomposition coupled with cross-domain gating is an effective and efficient mechanism for compact dehazing networks. Future work includes (i) content-adaptive or anisotropic frequency masks, (ii) phase-aware frequency losses, (iii) unsupervised real-haze adaptation in the spirit of D⁴ [27] and RIDCP [28], and (iv) hybrid spectral–transformer fusion for higher-resolution real-world haze in the spirit of 4KDehazing [25].

Author Contributions

Conceptualization, F.J., H.L., L.Z. and Y.C.; methodology, F.J., H.L., L.Z. and Y.C.; software, Y.C.; validation, Y.C.; formal analysis, Y.C.; investigation, Y.C.; resources, F.J., H.L., L.Z. and Y.C.; data curation, Y.C.; writing—original draft preparation, F.J., H.L., L.Z. and Y.C.; writing—review and editing, F.J., H.L., L.Z. and Y.C.; visualization, Y.C.; supervision, F.J., H.L., L.Z. and Y.C.; project administration, F.J., H.L., L.Z. and Y.C.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the funding project under Grant 2025-JCJQ-JJ-0170.

Data Availability Statement

The original data presented in the study are openly available at RESIDE ITS_v2: https://sites.google.com/view/reside-dehaze-datasets/reside-standard (accessed on 18 October 2025); RESIDE OTS_BETA: https://sites.google.com/view/reside-dehaze-datasets/reside-%CE%B2 (accessed on 18 October 2025); O-HAZE: https://data.vision.ee.ethz.ch/cvl/ntire18/o-haze/ (accessed on 18 October 2025); and NH-HAZE: https://data.vision.ee.ethz.ch/cvl/ntire20/nh-haze/ (accessed on 18 October 2025).

Acknowledgments

The authors wish to thank the anonymous reviewers for their valuable suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

FGDNet	Frequency-guided multi-scale dehazing network

References

1. Liu, Z.; He, Y.; Wang, C.; Song, R. Analysis of the influence of foggy weather environment on the detection effect of machine vision obstacles. Sensors 2020, 20, 349.
2. Liu, W.; Ren, G.; Yu, R.; Guo, S.; Zhu, J.; Zhang, L. Image-adaptive YOLO for object detection in adverse weather conditions. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2022; Volume 36, pp. 1796–1804.
3. Mohapatra, J.B.; Monikantan, J.; Nishchal, N.K. Object recognition in foggy and hazy conditions using dark channel prior-based fringe-adjusted joint transform correlator. Photonics 2024, 11, 1142.
4. Florea, H.; Petrovai, I.; Racoviteanu, A.; Moldoveanu, A.; Moldoveanu, F. Enhanced perception for autonomous driving using semantic and geometric data fusion. Sensors 2022, 22, 5061.
5. Hahner, M.; Dai, D.; Sakaridis, C.; Zaech, J.-N.; Van Gool, L. Semantic understanding of foggy scenes with purely synthetic data. In Proceedings of the IEEE Intelligent Transportation Systems Conference (ITSC); IEEE: New York, NY, USA, 2019.
6. Bijelic, M.; Gruber, T.; Ritter, W. A benchmark for lidar sensors in fog: Is detection breaking down? In Proceedings of the IEEE Intelligent Vehicles Symposium (IV); IEEE: New York, NY, USA, 2018; pp. 760–767.
7. Xie, Y.; Wei, H.; Liu, Z.; Wang, X.; Ji, X. SynFog: A photo-realistic synthetic fog dataset based on end-to-end imaging simulation for advancing real-world defogging in autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2024; pp. 13813–13822.
8. Liu, J.; Wu, Y.; Xiao, Z.; Sun, X.; Alhalbi, J.M.A. A review of remote sensing image dehazing. Sensors 2021, 21, 3928.
9. Jiang, B.; Zhang, Y.; Lu, X.; Chen, S.; Chen, L.; Liu, J. Deep dehazing network for remote sensing image with non-uniform haze. Remote Sens. 2021, 13, 4443.
10. Makarau, A.; Richter, R.; Müller, R.; Reinartz, P. Haze detection and removal in remotely sensed multispectral imagery. IEEE Trans. Geosci. Remote Sens. 2014, 52, 5895–5905.
11. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2009; pp. 1956–1963.
12. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. DehazeNet: An end-to-end system for single image haze removal. IEEE Trans. Image Process. 2016, 25, 5187–5198.
13. Ren, W.; Liu, S.; Zhang, H.; Pan, J.; Cao, X.; Yang, M.-H. Single image dehazing via multi-scale convolutional neural networks. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2016; pp. 154–169.
14. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. AOD-Net: All-in-one dehazing network. In Proceedings of the IEEE International Conference on Computer Vision (ICCV); IEEE: New York, NY, USA, 2017; pp. 4770–4778.
15. Liu, X.; Ma, Y.; Shi, Z.; Chen, J. GridDehazeNet: Attention-based multi-scale network for image dehazing. In Proceedings of the IEEE International Conference on Computer Vision (ICCV); IEEE: New York, NY, USA, 2019; pp. 7314–7323.
16. Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. FFA-Net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2020; Volume 34, pp. 11908–11915.
17. Dong, H.; Pan, J.; Xiang, Z.; Hu, L.; Zhang, X.; Wang, F.; Yang, J. Multi-scale boosted dehazing network with dense feature fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2020; pp. 2157–2167.
18. Guo, C.-L.; Yan, Q.; Anwar, S.; Cong, R.; Ren, W.; Li, C. Image dehazing transformer with transmission-aware 3D position embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2022; pp. 5812–5820.
19. Song, Y.; He, Y.; Qian, H.; Du, X. Vision transformers for single image dehazing. IEEE Trans. Image Process. 2023, 32, 1927–1941.
20. Song, Y.; Zhou, Y.; Qian, H.; Du, X. Rethinking performance gains in image dehazing networks. arXiv 2022, arXiv:2209.11448.
21. Chen, Z.; He, Z.; Lu, Z.-M. DEA-Net: Single image dehazing based on detail-enhanced convolution and content-guided attention. IEEE Trans. Circuits Syst. Video Technol. 2024, 33, 1002–1015.
22. Yu, H.; Zheng, N.; Zhou, M.; Huang, J.; Xiao, Z.; Zhao, F. Frequency and spatial dual guidance for image dehazing. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2022; pp. 181–198.
23. Kong, L.; Dong, J.; Ge, J.; Li, M.; Pan, J. Efficient frequency domain-based transformers for high-quality image deblurring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2023; pp. 5886–5895.
24. Zheng, Y.; Zhan, J.; He, S.; Dong, J.; Du, Y. Curricular contrastive regularization for physics-aware single image dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2023; pp. 5785–5794.
25. Zheng, Z.; Ren, W.; Cao, X.; Hu, X.; Wang, T.; Song, F.; Jia, X. Ultra-high-definition image dehazing via multi-guided bilateral learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2021; pp. 16185–16194.
26. Chen, Z.; Wang, Y.; Yang, Y.; Liu, D. PSD: Principled synthetic-to-real dehazing guided by physical priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2021; pp. 7180–7189.
27. Yang, Y.; Wang, C.; Liu, R.; Zhang, L.; Guo, X.; Tao, D. Self-augmented unpaired image dehazing via density and depth decomposition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2022; pp. 2037–2046.
28. Wu, R.-Q.; Duan, Z.-P.; Guo, C.-L.; Chai, Z.; Li, C. RIDCP: Revitalizing real image dehazing via high-quality codebook priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2023; pp. 22282–22291.
29. Zhou, M.; Yu, H.; Huang, J.; Zhao, F.; Gu, J.; Loy, C.C.; Meng, D.; Li, C. Deep Fourier up-sampling. In Advances in Neural Information Processing Systems (NeurIPS); Neural Information Processing Systems Foundation, Inc.: San Diego, CA, USA, 2022; Volume 35.
30. Cui, Y.; Tao, Y.; Bing, Z.; Ren, W.; Gao, X.; Cao, X.; Huang, K.; Knoll, A. Selective frequency network for image restoration. In Proceedings of the International Conference on Learning Representations (ICLR); ICLR: Rio de Janeiro, Brazil, 2023.
31. Wang, C.; Wu, H.; Jin, Z. FourLLIE: Boosting low-light image enhancement by Fourier frequency information. In Proceedings of the 31st ACM International Conference on Multimedia (ACM MM), Ottawa, ON, Canada, 29 October–3 November 2023; pp. 7459–7469.
32. Li, B.; Ren, W.; Fu, D.; Tao, D.; Feng, W.; Zeng, W.; Wang, Z. RESIDE: A benchmark for single image dehazing. IEEE Trans. Image Process. 2019, 28, 492–505.
33. Ancuti, C.O.; Ancuti, C.; Timofte, R.; De Vleeschouwer, C. O-HAZE: A dehazing benchmark with real hazy and haze-free outdoor images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); IEEE: New York, NY, USA, 2018; pp. 754–762.
34. Ancuti, C.O.; Ancuti, C.; Timofte, R.; De Vleeschouwer, C. NH-HAZE: An image dehazing benchmark with non-homogeneous hazy and haze-free images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); IEEE: New York, NY, USA, 2020; pp. 1798–1805.

Figure 1. The framework of the proposed method: (a) overall architecture; (b) detailed structure of the spectral processing block (FFT split and cross-domain gating block). Arrows indicate data flow, colors distinguish modules, and the red box marks the spectral processing block.

Figure 2. Qualitative comparison with representative state-of-the-art methods.

Table 1. Quantitative results of the proposed method.

Training Set	Test Set	PSNR (dB)/SSIM
ITS	ITS	33.3/0.983
OTS	OTS	35.1/0.988
OTS	O-HAZE	19.1/0.786
OTS	NH-HAZE	15.8/0.648
ITS (canonical)	SOTS-Indoor (500 pairs)	33.7/0.984
OTS (canonical)	SOTS-Outdoor (500 pairs)	35.4/0.988

Table 2. Quantitative comparison with representative state-of-the-art methods (PSNR in dB/SSIM).

Protocol	GridDehazeNet [15]	MSBDN [17]	FFA-Net [16]	FGDNet (Ours)
ITS→ITS	32.2/0.978	32.8/0.981	32.0/0.978	33.3/0.983
OTS→OTS	30.9/0.980	33.6/0.982	34.7/0.986	35.1/0.988
OTS→O-HAZE	17.2/0.712	18.2/0.750	18.3/0.766	19.1/0.786
OTS→NH-HAZE	13.4/0.548	14.1/0.602	14.8/0.638	15.8/0.648

Table 3. Ablation results of the FFT branch and the cross-domain gating module.

Protocol	Full Model PSNR (dB)/SSIM	w/o Cross Gating PSNR (dB)/SSIM	w/o FFT Branch PSNR (dB)/SSIM	Baseline PSNR (dB)/SSIM
ITS→ITS	33.30 ± 0.07/0.983 ± 0.001	32.90 ± 0.08/0.982 ± 0.001	29.40 ± 0.10/0.971 ± 0.002	29.10 ± 0.09/0.968 ± 0.002
OTS→OTS	35.10 ± 0.09/0.988 ± 0.001	34.60 ± 0.10/0.985 ± 0.001	31.80 ± 0.11/0.978 ± 0.002	31.50 ± 0.10/0.975 ± 0.002
OTS→O-HAZE	19.10 ± 0.12/0.786 ± 0.004	18.70 ± 0.14/0.772 ± 0.004	17.70 ± 0.13/0.731 ± 0.005	17.30 ± 0.15/0.719 ± 0.005
OTS→NH-HAZE	15.80 ± 0.15/0.648 ± 0.005	15.40 ± 0.14/0.629 ± 0.006	14.50 ± 0.16/0.586 ± 0.006	14.10 ± 0.17/0.566 ± 0.007

Table 4. Sensitivity of FGDNet to the low-frequency ratio ρ (ITS→ITS, full model).

ρ	0.10	0.15	0.18 (Ours)	0.22	0.25	0.30
PSNR (dB)	32.8	33.2	33.3	33.2	33.0	32.6
SSIM	0.980	0.982	0.983	0.983	0.981	0.979

Table 5. Ablation on the number of FFT-equipped encoder stages k (ITS→ITS).

k	PSNR (dB)	SSIM	Params (M)	FLOPS (G)
0 (no FFT stage, L_fft kept)	29.4	0.971	1.10	7.0
1 (stage 1)	30.9	0.975	1.11	7.9
2 (stages 1–2)	32.4	0.980	1.19	8.8
3 (stages 1–3, ours)	33.3	0.983	1.35	9.8
4 (all stages)	33.1	0.982	1.52	10.7

Table 6. Parameter count, FLOPS, and inference time at a 256 × 256 input.

Method	Params (M)	FLOPS (G)	Time (ms)
GridDehazeNet [15]	0.96	21.5	17
MSBDN [17]	31.35	41.5	28
FFA-Net [16]	4.46	287.8	40
DehazeFormer-T [19]	0.69	6.7	13
DEA-Net [21]	3.60	32.3	22
gUNet (backbone) [20]	1.10	7.0	11
FGDNet (ours)	1.35	9.8	15

Table 7. Decomposed ablation of the FFT branch and L_fft on ITS→ITS (mean ± std over 3 seeds).

Variant	PSNR (dB)	SSIM
Full (FFT branch + L_fft)	33.30 ± 0.07	0.983 ± 0.001
Full − L_fft (FFT branch only)	33.00 ± 0.08	0.982 ± 0.001
Full − FFT branch (L_fft only)	29.40 ± 0.10	0.971 ± 0.002
Baseline (neither)	29.10 ± 0.09	0.968 ± 0.002
Full, amplitude + phase L1	33.10 ± 0.09	0.982 ± 0.001

Table 8. Few-shot real-haze fine-tuning on O-HAZE (OTS-pretrained FGDNet).

Setting	PSNR (dB)	SSIM
OTS only (no fine-tune)	19.10	0.786
+1 k FT iterations (35 pairs)	20.84	0.812
+2 k FT iterations (35 pairs)	21.60	0.827
+5 k FT iterations (35 pairs)	21.95	0.834

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
