Abstract
Single-image dehazing suffers from severe information loss and the under-constraint problem. The lack of high-quality, robust priors limits the generalization ability of existing dehazing methods in real-world scenarios. To tackle this challenge, we propose a simple but effective single-image dehazing network, abbreviated as SAM2-Dehaze, which fuses high-quality semantic priors extracted from the Segment Anything Model 2 (SAM2) with several types of advanced convolutions; it follows the U-Net architecture and consists of five stages. Specifically, we first employ the superior semantic perception and cross-domain generalization capabilities of SAM2 to generate accurate structural semantic masks. Then, a dual-branch Semantic Prior Fusion Block is designed to enable deep collaboration between the structural semantic masks and hazy image features at each stage of the U-Net. Furthermore, to avoid the feature redundancy and the neglect of high-frequency information inherent in traditional convolution, we design a novel parallel detail-enhanced and compression convolution that combines the advantages of standard convolution, difference convolution, and reconstruction convolution, replacing the traditional convolution at each stage of the U-Net. Finally, a Semantic Alignment Block is incorporated into the post-processing phase to ensure semantic consistency and visual naturalness in the final dehazed result. Extensive quantitative and qualitative experiments demonstrate that SAM2-Dehaze outperforms existing dehazing methods on several synthetic and real-world foggy-image benchmarks and exhibits excellent generalization ability.
1. Introduction
Single-image dehazing plays an important role in fields such as autonomous driving, intelligent surveillance, and remote sensing, serving as a crucial link between low-level image restoration and high-level visual understanding. Over the past decade, researchers have explored a wide range of approaches. Khan et al. [1] conducted a systematic review of traditional and deep learning-based dehazing methods and summarized the main challenges in this field. Prior-based physical models [2,3,4] estimate the parameters of the atmospheric scattering model based on statistical assumptions. While these methods can enhance image quality to some degree, their strong reliance on simplified physical priors limits their ability to handle complex and diverse real-world conditions. With the rise of convolutional neural networks (CNNs) [5] and transformers [6], data-driven methods [7,8,9] have gradually become the dominant trend. By learning degradation patterns directly from large-scale datasets, these methods have achieved remarkable progress and consistently outperform traditional techniques. However, since most deep learning models are trained on synthetic datasets, their performance often drops significantly when applied to real-world hazy images due to the domain gap, resulting in limited generalization ability.
In recent years, advances in deep learning and large-scale pre-trained models have provided new directions for image dehazing. Among them, prior-guided approaches leveraging powerful vision foundation models have shown great potential. The Segment Anything Model (SAM) [10] and its successor SAM2 [11] demonstrate outstanding cross-domain generalization and rich semantic representations (see Figure 1), even under hazy conditions, providing reliable semantic priors for dehazing. Motivated by these capabilities, we propose SAM2-Dehaze, a semantic-aware dehazing framework that effectively integrates SAM2-derived priors to enhance detail recovery and visual consistency.
Figure 1.
Illustration of SAM’s robustness on different types of hazy images. The figure demonstrates that SAM can accurately segment objects even when the input is a low-quality hazy image. This observation motivates us to leverage the semantic priors extracted from SAM, a large-scale foundation model, to enhance image restoration performance.
The main contributions of this work can be summarized as follows:
- We design a Semantic Prior Fusion Block (SPFB), which introduces SAM2-derived semantic information at multiple stages of the U-Net backbone. This semantic fusion mechanism guides the model to highlight structural features in key regions, enhancing its perception and restoration of edges and textures.
- We design a parallel detail-enhanced and compression convolution (PDCC), which combines standard, difference, and reconstruction convolutions to enable collaborative multi-level feature modeling. This module improves high-frequency detail representation while reducing redundancy.
- We design a Semantic Alignment Block (SAB) in the reconstruction phase, which performs fine-grained semantic alignment to restore colors, textures, and boundaries of key regions, thereby ensuring semantic consistency, visual naturalness, and structural integrity of the dehazed results.
The rest of this paper is organized as follows: Section 2 reviews related work on traditional, deep learning-based, and semantic-prior-guided image dehazing methods. Section 3 presents the architecture and implementation details of the proposed SAM2-Dehaze framework, including the Semantic Prior Fusion Block, the Parallel Detail-enhanced and Compression Convolution, and the Semantic Alignment Block. Section 4 reports the experimental settings, quantitative and qualitative results, and ablation studies to validate the effectiveness of each component. Section 5 concludes the paper by summarizing the main findings, discussing the limitations of the proposed method, and outlining directions for future research.
2. Related Work
2.1. Traditional Image Dehazing
In the early stages of image dehazing research, prior-based physical models dominated the field and became the mainstream approach. These methods typically rely on the atmospheric scattering model (ASM) and employ handcrafted empirical priors to model the statistical differences between hazy and haze-free images. By leveraging these priors, the transmission map and atmospheric light parameters are estimated to reconstruct a clear scene. For example, He et al. [12] proposed the Dark Channel Prior (DCP), which is based on the observation that in most local patches of outdoor haze-free images, at least one color channel exhibits very low intensity. This prior effectively facilitates the estimation of the transmission map and achieves impressive performance in dehazing tasks. Zhu et al. [13] introduced the Color Attenuation Prior, which analyzes the differences in brightness and saturation to establish a linear relationship with scene depth, enabling the inference of the transmission map. Berman et al. [14] proposed the Non-local Prior, which is based on the observation that pixels in the RGB color space tend to exhibit non-local clustering behavior. This structural property enables the effective separation of haze from scene content and provides reliable guidance for dehazing without explicit depth information.
Despite their early success and satisfactory performance in specific scenarios (e.g., buildings or ground regions), prior-based methods inherently suffer from the limitations of handcrafted assumptions. When dealing with complex and non-uniform haze distributions in natural scenes, handcrafted priors often fail to accurately capture the intricate relationships between haze and image content. Consequently, they tend to produce artifacts such as color distortion and halo effects, ultimately degrading the overall visual quality of dehazed images.
2.2. Deep Learning-Based Image Dehazing
With the rapid development of deep learning and the availability of large-scale synthetic dehazing datasets, learning-based image dehazing has rapidly become the mainstream research direction. Early deep-learning methods largely followed physics-guided paradigms, relying on the atmospheric scattering model (ASM) to estimate intermediate parameters such as transmission maps and atmospheric light for haze-free reconstruction. For instance, Shi et al. [15] proposed a zero-shot sand–dust image-restoration method based on the atmospheric scattering model, enabling unsupervised recovery of real sand–dust images without paired data and achieving superior visual quality. Ren et al. [16] proposed MSCNN to progressively refine transmission estimation; Li et al. [17] introduced AOD-Net to jointly predict transmission and atmospheric light in an end-to-end manner. To further improve estimation accuracy, Zhang et al. [18] designed DCPDN with a dual-branch architecture, and Li et al. [19] enhanced model robustness under complex conditions through fuzzy region segmentation and haze-density decomposition.
Despite their effectiveness, physics-guided deep learning approaches are prone to cumulative errors in intermediate estimations, which may deteriorate final restoration quality. Consequently, recent research has shifted toward purely data-driven models that learn the dehazing process directly from data. For example, Liu et al. [20] developed GridDehazeNet based on attention mechanisms, Qin et al. [8] proposed FFA-Net with feature-level adaptive attention, and Wu et al. [21] introduced AECR-Net with contrastive regularization. Hong et al. [22] modeled prediction uncertainty via UDN, and Cheng et al. [23] presented DEA-Net for enhanced detail and structure preservation, while Wang et al. [24] proposed Dehaze-RetinetGAN by integrating Retinex theory with self-supervised learning. In addition, Son et al. [25] developed a Retinex-based multiscale training framework for sand–dust removal, achieving superior color fidelity and clarity under severe atmospheric conditions.
Although these methods significantly improve image clarity and visual quality, they still tend to overlook fine structural details. Many existing models primarily emphasize global color and brightness restoration, often resulting in blurred edges or semantic inconsistencies, which limits their performance in high-quality restoration tasks.
2.3. Semantic Priors for Image Dehazing
In recent years, several studies have incorporated semantic information to guide the dehazing process, aiming to enhance the modeling of image structure and content. For example, Zhang et al. [26] utilized a pre-trained DeepLabv3+ network to extract semantic features and integrated them into the dehazing network through an adaptive fusion module, thereby improving semantic awareness and regional discrimination. Cheng et al. [27] adopted the VGG16 network to extract semantic features and employed a global estimation module together with a color recovery module to transform semantic information into priors of object color and atmospheric light. Similarly, Song et al. [28] proposed a segmentation-guided framework that divides dehazing into prediction and restoration stages, significantly enhancing edge sharpness and texture reconstruction. Although these methods have achieved remarkable success in introducing structural priors and enhancing scene understanding, they generally rely on task-specific semantic segmentation models (e.g., DeepLabv3+ and VGG16) and require pre-training or fine-tuning on specific datasets, which to some extent limits their generalization ability.
The emergence of the Segment Anything Model (SAM) has changed this paradigm. As a large-scale pre-trained universal segmentation model, SAM shows strong structural perception and cross-domain generalization capabilities. It can produce high-quality masks without requiring extra supervision or task-specific training, which greatly expands the use of semantic priors in low-level vision tasks. For example, Zhang et al. [29] proposed an image restoration framework that uses semantic priors extracted from SAM to improve the model’s ability to capture both structural and semantic information, all without increasing inference costs. Li et al. [30] introduced the SAM-Deblur framework, the first to apply SAM to image deblurring. By using plug-and-play Mask Average Pooling (MAP) modules and a mask dropout strategy, they effectively added structural priors and improved generalization under non-uniform blur. Liu et al. [31] proposed SeBIR, a general framework for burst image restoration guided by SAM’s semantic priors. It uses a joint explicit–implicit alignment strategy and a semantic-guided fusion module, leading to significant improvements in alignment accuracy and multi-frame information integration.
Although SAM-based methods have shown great potential, most still utilize shallow semantic concatenation or simple fusion strategies, which fail to fully exploit SAM’s semantic priors in deep feature representation. This limitation becomes particularly evident in challenging scenes involving structural misalignment or regional degradation. To address these issues, we propose a structure-aware enhanced dehazing framework, SAM2-Dehaze, which fully leverages the rich semantic priors of SAM2. By designing three key modules, our method integrates semantic guidance throughout the network, achieving superior semantic consistency, detail recovery, and structural preservation.
3. Proposed Model
3.1. Overview of SAM2-Dehaze Model
We propose a semantic-guided image restoration framework named SAM2-Dehaze, which leverages semantic information to enhance the dehazing process. The overall architecture of the proposed model is illustrated in Figure 2. We first utilize a pre-trained SAM2 to generate semantic prior information, which is effectively injected into multiple stages (G1–G5) via the SPFB, thereby strengthening the network’s perception and understanding of semantic regions. Specifically, before applying the SPFB at the G2, G3, and G4 stages, the semantic features are down-sampled using bilinear interpolation to align their spatial dimensions with the fused feature maps. However, considering that standard convolution has limited capability in capturing features around object boundaries and fine details, we introduce the Parallel Detail-enhanced and Compression Convolution module to enhance feature representation across semantic regions, compensating for the limitations of conventional convolution in edge and detail perception. To further improve the semantic consistency and structural fidelity of the dehazing results, we design a Semantic Alignment Block (SAB). By modeling high-level relationships between the initially restored image and the semantic priors, the SAB performs structural-level semantic alignment, ensuring that the final dehazed image exhibits clearer edges, richer details, and stronger semantic coherence.
Figure 2.
The architecture of SAM2-Dehaze. The input hazy image H is processed by SAM2 to generate a semantic map P. The backbone U-Net consists of five stages (G1–G5), each integrating Parallel Detail-enhanced and Compression Convolution (PDCC) and the Semantic Prior Fusion Block (SPFB), and finally outputs a clear image through the Semantic Alignment Block (SAB). During feature fusion at the G2, G3, and G4 stages, we down-sample P using bilinear interpolation to ensure its size matches that of the fused features. The training uses a combination of L1 and contrastive losses.
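As a concrete illustration of this alignment step, the following PyTorch sketch shows how a SAM2 semantic map can be down-sampled with bilinear interpolation to match the spatial size of a stage feature map before fusion; the tensor shapes, channel counts, and function name are our own illustrative assumptions rather than details of the released implementation.

```python
import torch
import torch.nn.functional as F

def align_semantic_prior(sem_map: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
    """Resize a SAM2 semantic map (B, Cs, H, W) to the spatial size of a
    U-Net stage feature map (B, Cf, h, w) using bilinear interpolation,
    so that it can be concatenated or fused with the stage features."""
    return F.interpolate(sem_map, size=feat.shape[-2:], mode="bilinear", align_corners=False)

# Example: a full-resolution semantic map injected at a down-sampled stage (e.g., G3).
sem = torch.rand(1, 1, 256, 256)     # semantic map P of the hazy input (hypothetical shape)
g3_feat = torch.rand(1, 96, 64, 64)  # hypothetical G3 feature map
sem_g3 = align_semantic_prior(sem, g3_feat)
print(sem_g3.shape)                  # torch.Size([1, 1, 64, 64])
```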
3.2. Semantic Prior Fusion Block
Due to its training on massive datasets and rich parameterization, SAM2 demonstrates exceptional segmentation capability across diverse scenarios and exhibits strong robustness to various types of image degradation. However, in image restoration tasks, directly using degraded images as input may hinder the model’s ability to accurately capture the true structural details of target regions. To address this, we leverage the semantic segmentation maps produced by SAM2 as prior information, providing rich and informative semantic cues to guide existing dehazing networks and thereby enhance overall restoration performance.
First, we utilize the semantic map P extracted by SAM2 as an explicit prior and fuse it with the feature map F of the low-quality image to enhance the feature representation capability of the restoration model. Specifically, as illustrated in Figure 3, the image feature F and the semantic prior P are concatenated along the channel dimension and processed by a convolutional function f_1(·), which consists of two convolution layers and a ReLU activation, to generate the initial fused feature F_0:

F_0 = f_1([F, P]),

where [·, ·] denotes the concatenation operation along the channel dimension and P represents the semantic feature extracted from the SAM2 segmentation map. Next, we introduce a feature interaction mechanism to further enhance the integration of semantic information during restoration. Two parallel feature extraction branches are employed, each consisting of two convolutional layers and a ReLU activation, forming the convolutional functions f_2(·) and f_3(·). One branch extracts updated image features F′, while the other refines the semantic prior representation P′ from F_0:

F′ = f_2(F_0),  P′ = f_3(F_0).

To establish explicit interaction between image and semantic features, we apply element-wise multiplication between the outputs of the two branches, along with a skip connection to preserve essential structural information:

F_out = (F′ ⊗ P′) ⊕ F,

where ⊗ denotes element-wise multiplication and ⊕ denotes element-wise addition. This adaptive design strengthens the network’s ability to focus on critical semantic regions and improves overall restoration consistency.
Figure 3.
Architecture of the Semantic Prior Fusion Block (SPFB). The SPFB unit takes the semantic map P and the feature map F as input. After concatenation, they are processed to generate an intermediate feature F_0, which is then fused with the semantic guidance features via element-wise multiplication. The output is passed to subsequent network modules to enhance semantic representation and restoration quality.
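A minimal PyTorch sketch of the SPFB flow described above is shown below. The kernel sizes, channel widths, and module/variable names are illustrative assumptions; only the overall structure (concatenation, two parallel branches, element-wise multiplication with a residual skip) follows the text.

```python
import torch
import torch.nn as nn

class SPFB(nn.Module):
    """Semantic Prior Fusion Block (sketch): fuses a SAM2 semantic prior P
    with image features F via concatenation, two parallel refinement
    branches, element-wise multiplication, and a residual skip connection."""
    def __init__(self, feat_ch: int, sem_ch: int = 1):
        super().__init__()
        def conv_block(in_ch, out_ch):  # two conv layers and a ReLU, as in the text
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=1),
            )
        self.fuse = conv_block(feat_ch + sem_ch, feat_ch)   # f1: initial fusion
        self.img_branch = conv_block(feat_ch, feat_ch)      # f2: updated image features F'
        self.sem_branch = conv_block(feat_ch, feat_ch)      # f3: refined semantic prior P'

    def forward(self, feat: torch.Tensor, sem: torch.Tensor) -> torch.Tensor:
        f0 = self.fuse(torch.cat([feat, sem], dim=1))       # F0 = f1([F, P])
        f_img = self.img_branch(f0)                         # F'
        f_sem = self.sem_branch(f0)                         # P'
        return f_img * f_sem + feat                         # (F' ⊗ P') ⊕ F

# Usage: out = SPFB(feat_ch=64)(torch.rand(1, 64, 64, 64), torch.rand(1, 1, 64, 64))
```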
3.3. Parallel Detail-Enhanced and Compression Convolution
Haze in natural scenes introduces variations in illumination and color, typically manifested in the loss of low-frequency components. Meanwhile, natural scenes covered by haze also tend to lose high-frequency details such as edges and contours. Traditional convolutional operations primarily focus on capturing low-frequency information while neglecting high-frequency restoration, which often becomes a bottleneck in dehazing performance. Moreover, standard convolutions easily introduce spatial redundancy, thereby reducing both efficiency and effectiveness. To overcome these limitations, we propose a parallel detail-enhanced and compression convolution module.
The PDCC module consists of three parallel branches: the Detail Enhancement Convolution (DEConv) [23] branch, the Convolution (Conv) branch, and the Spatial-Channel Construction Convolution (SCConv) [32] branch, as illustrated in Figure 4. We first apply Batch Normalization to the input feature to obtain the normalized input F_n. The normalized features are then fed into the three branches for independent processing. Specifically, the Conv branch is responsible for basic feature extraction and the preservation of low-frequency information. The DEConv branch focuses on multi-directional high-frequency detail extraction and incorporates four types of difference convolutions: Center Difference Convolution (CDC), Angle Difference Convolution (ADC), Horizontal Difference Convolution (HDC), and Vertical Difference Convolution (VDC). These convolutions capture high-frequency details at multiple scales and orientations. However, directly applying them significantly increases the number of parameters and computational cost. To address this issue, following [23], we combine these convolution kernels at corresponding positions to form an equivalent kernel, thereby maintaining rich feature extraction capability while reducing computational complexity. The specific operations are as follows:

F_DE = Σ_{i=1}^{4} (F_n ∗_k K_i) = F_n ∗_k (Σ_{i=1}^{4} K_i) = F_n ∗_k K_eq,

where K_i represents the kernel used in the i-th convolution operation, F_n denotes the input feature map, ∗_k denotes the convolution operation with a kernel of size k, and K_eq denotes the equivalent kernel obtained by combining the four kernels. The SCConv is designed to suppress spatial redundancy and enhance feature representation capability. It comprises two sequential components: a Spatial Reconstruction Unit (SRU) and a Channel Reconstruction Unit (CRU). The SRU reduces spatial redundancy using a “separation-and-reconstruction” strategy, while the CRU employs a “split-transform-fuse” mechanism to minimize channel redundancy. The specific operations are as follows:

F_s = SRU(F_n),  F_SC = CRU(F_s),

where F_s denotes the feature map after spatial reconstruction and F_SC is the output of the SCConv branch. Finally, the outputs from the Conv, DEConv, and SCConv branches are aggregated and refined using a 5 × 5 convolution layer to produce the final output feature F_out:

F_out = Conv_{5×5}(Agg(F_Conv, F_DE, F_SC)),

where F_Conv denotes the output of the Conv branch and Agg(·) denotes the aggregation of the three branch outputs.
Figure 4.
Architecture of the Parallel Detail-enhanced and Compression Convolution (PDCC). The module consists of a DEConv branch (VDC, HDC, CDC, ADC), a Conv branch, and an SCConv branch (SRU, CRU), which are responsible for detail extraction, basic feature learning, and spatial–channel refinement, respectively. The outputs are fused and further processed by a 5 × 5 convolution to generate the enhanced feature representation.
This parallel architecture enables the PDCC module to effectively recover both low- and high-frequency information, enhance structural detail perception, and suppress redundancy, thereby improving the overall performance and efficiency of the dehazing network.
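The key efficiency step in the DEConv branch is kernel re-parameterization: because convolution is linear in its kernel, applying several parallel convolutions of the same size and summing their outputs is equivalent to a single convolution with the element-wise sum of their kernels. The short PyTorch check below verifies this identity with random stand-in kernels (the actual CDC/ADC/HDC/VDC kernels are constrained rearrangements of learned kernels, which does not change the identity).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand(1, 8, 32, 32)                       # normalized input feature F_n

# Four stand-in 3x3 kernels playing the role of the CDC/ADC/HDC/VDC kernels.
kernels = [torch.randn(8, 8, 3, 3) for _ in range(4)]

# Naive form: run the four convolutions separately and sum the outputs.
out_sum = sum(F.conv2d(x, k, padding=1) for k in kernels)

# Re-parameterized form: merge the kernels element-wise, then convolve once.
k_eq = torch.stack(kernels).sum(dim=0)             # K_eq = sum_i K_i
out_eq = F.conv2d(x, k_eq, padding=1)

print(torch.allclose(out_sum, out_eq, atol=1e-5))  # True: one conv reproduces all four
```

This is why the four difference convolutions in the DEConv branch can be folded into one equivalent kernel without sacrificing their multi-directional detail extraction.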
3.4. Semantic Alignment Block
Semantic information plays a crucial role in restoring degraded images, particularly in reconstructing color, contrast, and texture consistency. Objects belonging to the same semantic category often share similar structural and visual characteristics. These intra-class semantic correlations can effectively constrain the solution space of image restoration, facilitating the preservation of both global structures and fine-grained details. To leverage this property, we design a Semantic Alignment Block in the post-processing phase of the dehazing network, where high-level semantics act as structural priors for further refinement. The SAB extracts key semantic features from the semantic segmentation map generated by SAM2 and fuses them with the structural representations of the coarsely dehazed image. This deep semantic–structural integration improves the network’s ability to recognize semantic regions and maintain structural integrity, thereby enhancing the clarity, naturalness, and semantic coherence of the final restored image.
Specifically, as illustrated in Figure 5, depth-wise separable convolutions are first applied to the semantic segmentation map obtained from SAM2 to extract semantic-guided features. Subsequently, a Residual Dense Block (RDB) [33] is applied to the coarsely dehazed image to obtain refined image features. The RDB serves as an efficient residual dense module that enhances the representation of structures and fine details through residual learning and dense connections. Next, the semantic-guided features and the image features from the coarse dehazing stage are fused to enable semantic–structural alignment. To further refine the fused representation, multiple standard convolutional layers followed by a hyperbolic tangent (Tanh) activation function are employed to generate the final dehazed image J. With the guidance of high-level semantic priors provided by SAM2, the proposed framework can effectively recover semantically consistent, clearer, and more natural haze-free images.
Figure 5.
Detailed structure of the Semantic Alignment Block. The coarsely dehazed image is fused with the semantic-guided features to produce a finer dehazed result. Benefiting from high-level semantic information, the refined dehazed image can better preserve the structure, color, and details of objects.
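The sketch below outlines the SAB flow in PyTorch under our own simplifying assumptions: the RDB [33] is replaced by a plain two-layer convolutional stand-in, the fusion is done by concatenation, and all channel widths are illustrative. Only the use of depth-wise separable convolutions on the semantic map, the fusion of semantic and image features, and the final convolutions with a Tanh activation follow the text.

```python
import torch
import torch.nn as nn

class SAB(nn.Module):
    """Semantic Alignment Block (sketch): depth-wise separable convolutions
    extract semantic-guided features from the SAM2 map, a simplified stand-in
    for the RDB refines the coarse dehazed image, and the fused features are
    mapped to the final image with convolutions and a Tanh activation."""
    def __init__(self, ch: int = 32, sem_ch: int = 1):
        super().__init__()
        self.sem_conv = nn.Sequential(                       # depth-wise separable conv
            nn.Conv2d(sem_ch, sem_ch, 3, padding=1, groups=sem_ch),
            nn.Conv2d(sem_ch, ch, 1),
            nn.ReLU(inplace=True),
        )
        self.rdb = nn.Sequential(                            # stand-in for the RDB [33]
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
        self.refine = nn.Sequential(                         # fusion + output head
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, coarse: torch.Tensor, sem: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.rdb(coarse), self.sem_conv(sem)], dim=1)
        return self.refine(fused)

# Usage: SAB()(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))  # -> (1, 3, 256, 256)
```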
3.5. Training Loss
During the training process, we adopt the L1 loss and contrastive loss to optimize the quality of the dehazed images. The L1 loss directly constrains the pixel-level difference between the dehazed image and the ground-truth haze-free image, while the contrastive loss enforces the similarity of deep feature representations to improve structural fidelity and perceptual quality. Specifically, given a hazy image H and its corresponding clear image C, we denote the predicted dehazed image from our SAM2-Dehaze as J. The contrastive loss optimization objective can be formulated as

L_CL = Σ_i ω_i · ||φ_i(J) − φ_i(C)||_1 / ||φ_i(J) − φ_i(H)||_1,

where φ_i(·) denotes the features extracted from the i-th layer of a fixed pre-trained model and ω_i represents the corresponding weight coefficient for that layer. In our study, we extract features from layers 11, 35, 143, and 152 of the ResNet-152 [5] and assign each layer a weight ω_i, with the weight of the last layer set to 1. Previous research [34] has demonstrated that, in image restoration tasks, the L1 norm yields better performance than the L2 norm. The L1 loss is defined as

L_1 = (1/N) Σ_{j=1}^{N} |J_j − C_j|,

where N denotes the total number of pixels in the image, and J_j and C_j represent the j-th pixel values of the predicted dehazed result and the ground-truth haze-free image, respectively. By combining the L1 loss and contrastive loss, our final loss function is defined as

L_total = L_1 + λ · L_CL,

where λ is the hyperparameter used to balance the two loss terms and is set to 0.1 in our experiments.
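A compact PyTorch sketch of the combined objective is shown below. The ratio form of the contrastive term (pulling the prediction toward the clear image and pushing it away from the hazy input in feature space) follows the contrastive-regularization formulation of [21]; treating it as the exact objective used here is our assumption, as are the function names and the small epsilon added for numerical stability.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(feats_pred, feats_clear, feats_hazy, weights):
    """Feature-space contrastive term: for each chosen layer of a fixed
    pre-trained network (ResNet-152 layers in the paper), pull the dehazed
    output J toward the clear image C and push it away from the hazy input H."""
    loss = 0.0
    for w, fp, fc, fh in zip(weights, feats_pred, feats_clear, feats_hazy):
        loss = loss + w * F.l1_loss(fp, fc) / (F.l1_loss(fp, fh) + 1e-7)
    return loss

def total_loss(pred, clear, feats_pred, feats_clear, feats_hazy, weights, lam: float = 0.1):
    """L_total = L1(J, C) + lambda * L_CL, with lambda = 0.1 as in the paper."""
    return F.l1_loss(pred, clear) + lam * contrastive_loss(
        feats_pred, feats_clear, feats_hazy, weights)
```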
4. Experiments and Results
4.1. Datasets and Evaluation Metrics
Datasets: We conduct a comprehensive evaluation of the proposed method on both synthetic and real-world image datasets. For synthetic data, we employ the Realistic Single Image Dehazing (RESIDE) [35] dataset, which is a widely used benchmark comprising multiple subsets, including the Indoor Training Set (ITS), Outdoor Training Set (OTS), Synthetic Objective Testing Set (SOTS), and Hybrid Subjective Testing Set (HSTS). In the synthetic experiments, our model is trained on ITS and OTS, and evaluated on SOTS-indoor and SOTS-outdoor, respectively. For real-world scenarios, we adopt the Dense-Haze [36], NH-Haze [37], and RTTS datasets to assess the model’s performance on natural hazy scenes. The detailed experimental configurations are summarized in Table 1.
Table 1.
The details of the datasets used in our experiments. ITS–L represents a model of type L trained on the ITS dataset.
Evaluation Metrics: To quantitatively assess dehazing performance, we employ seven widely used image quality evaluation metrics, which are categorized into full-reference and no-reference types. The full-reference metrics include the Peak Signal-to-Noise Ratio (PSNR) [38,39], Structural Similarity Index (SSIM) [40,41], and CIEDE2000 color difference [42]. The no-reference metrics consist of Fog Aware Density Evaluation (FADE) [43], Natural Image Quality Evaluator (NIQE) [44], Perception-based Image Quality Evaluator (PIQE) [45], and Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) [46]. These metrics are widely adopted in computer vision to measure both visual quality and perceptual consistency of images. In addition to image quality evaluation, we also assess network efficiency using the number of parameters (Param, in millions), computational complexity (MACs, in billions of multiply–accumulate operations), and inference latency (Latency, in milliseconds). To ensure fairness, all experiments are conducted on images with a resolution of 250 × 250.
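For reference, the full-reference PSNR used above can be computed in a few lines of PyTorch for images scaled to [0, 1]; SSIM, CIEDE2000, and the no-reference metrics rely on the standard implementations cited above, and this helper function is our own illustrative addition.

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak Signal-to-Noise Ratio in dB for images with pixel values in [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

# Example: psnr(dehazed, ground_truth) on tensors of shape (B, 3, H, W).
```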
4.2. Implementation Details
We use the SAM2 pre-trained model for segmentation. Compared with the SAM segmentation model, SAM2 demonstrates significant improvements in segmentation accuracy, speed, interaction efficiency, and applicability across various scenarios. In particular, it shows stronger information processing capabilities and generalization in zero-shot segmentation tasks. All experiments are conducted on an NVIDIA RTX A6000 GPU (48 GB VRAM), and the model is implemented based on the PyTorch 2.4.1 framework. During training, we optimize the network using the AdamW optimizer with decay parameters β1 and β2. The initial learning rate is gradually decreased to its minimum value using a cosine annealing strategy to ensure training stability. During training, input images are randomly cropped into 256 × 256 patches. We design three variants of SAM2-Dehaze, named S (Small), B (Basic), and L (Large). Table 2 presents detailed configurations of each variant, with two key columns indicating the number of G-blocks in the network and their corresponding embedding dimensions.
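The optimizer and schedule described above can be wired up as in the sketch below; the concrete learning rates, betas, and epoch count are placeholders of our own, since the exact values are not reproduced here.

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)      # stand-in for the SAM2-Dehaze network
# Hypothetical hyperparameters; the exact betas and learning rates are not given here.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
num_epochs = 300                                  # assumption
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs, eta_min=1e-6)    # cosine annealing to a small final LR

for epoch in range(num_epochs):
    # ... forward pass on randomly cropped 256x256 patches, compute L_total, backward ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()
```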
Table 2.
Detailed model architecture configurations.
4.3. Comparison with State of the Arts
In this section, we compare our SAM2-Dehaze with eleven dehazing methods, including DCP [12], MSCNN [16], AOD-Net [17], GridDehazeNet [20], FFA-Net [8], AECR-Net [21], Dehamer [47], MIT-Net [48], RIDCP [49], C2PNet [50], and DEA-Net [23]. For synthetic datasets (SOTS-indoor and SOTS-outdoor), all methods are trained and evaluated under identical experimental conditions. For real-world datasets, we compare seven methods (DCP, AOD-Net, FFA-Net, Dehamer, MIT-Net, RIDCP, MixDehazeNet [51]) on the Dense-Haze and NH-Haze datasets. Additionally, for the RTTS dataset, we benchmark our method against seven advanced models, including GridDehazeNet, FFA-Net, Dehamer, C2PNet, MIT-Net, DEA-Net, and IPC-Dehaze [52]. On synthetic datasets, we report results for three variants of SAM2-Dehaze (-S, -B, -L), while on the real-world datasets, only the SAM2-Dehaze-L variant is evaluated. To ensure a fair comparison, we use either the officially released code or the publicly reported results of existing methods. If these are unavailable, we retrain the models under the same dataset and parameter settings as our approach.
Results on Synthetic Datasets: Table 3 presents the quantitative evaluation results of our SAM2-Dehaze and other state-of-the-art methods on the SOTS dataset. As shown, the SAM2-Dehaze-L achieves the best performance on the SOTS-indoor dataset, reaching a PSNR of 42.83 dB and an SSIM of 0.997, outperforming all other methods. Even the lightweight SAM2-Dehaze-S variant ranks second, with a PSNR of 41.41 dB and an SSIM of 0.996. On the SOTS-outdoor dataset, our method does not achieve the best result but still ranks in the upper-middle tier, with the SAM2-Dehaze-L variant achieving a PSNR of 36.22 dB and SSIM of 0.989. In addition, as shown in Table 4, we report Params, MACs, and Latency as the main metrics for computational efficiency. Compared with recent advanced methods, SAM2-Dehaze achieves a balanced trade-off between accuracy and efficiency. Given the performance improvements, the slight increase in computational cost is acceptable. All efficiency metrics, including Params, MACs, and Latency, are computed using 256 × 256 RGB input images, ensuring fair and consistent evaluation.
Table 3.
Quantitative comparison on SOTS-indoor/outdoor. We report PSNR, SSIM and CIEDE2000. The ↑ indicates that a larger value is better, the ↓ indicates that a smaller value is better. The symbol “— ” indicates that the value is unavailable. Bold and underlined values represent the best and second-best results.
Table 4.
Computational efficiency comparison. Bold and underlined values denote the best and second-best results, respectively.
Figure 6 and Figure 7 present the visual comparisons between our proposed method and several state-of-the-art dehazing algorithms on the SOTS-indoor and SOTS-outdoor datasets. To provide a more comprehensive evaluation, we additionally analyze the RGB histogram distributions of the restored images, enabling quantitative assessment of color consistency alongside subjective visual inspection. On the SOTS-indoor dataset, traditional methods such as DCP and RIDCP perform poorly, exhibiting noticeable color distortions and residual artifacts in the reconstructed images. AOD-Net also struggles to remove haze effectively. Although GridDehazeNet, FFA-Net, Dehamer, and MIT-Net produce visually appealing results, our method achieves RGB distributions that align more closely with the ground-truth clear images, demonstrating clear advantages in both color restoration and detail preservation. For the SOTS-outdoor dataset, DCP often introduces sky region artifacts, while AOD-Net retains partial structural information yet leaves visible haze residuals. It is important to note that SOTS-outdoor is a synthetic dataset generated based on the atmospheric scattering model, and some nominally “haze-free” images may still contain subtle haze traces, which can introduce training noise and bias the evaluation. While methods such as GridDehazeNet, FFA-Net, Dehamer, MIT-Net, and RIDCP perform competitively on synthetic datasets, they exhibit limitations in handling residual real-world haze embedded in the scenes. In contrast, our proposed SAM2-Dehaze not only removes synthetic haze but also demonstrates stronger robustness and adaptability when dealing with complex or residual real-world haze, thereby achieving superior performance in both visual fidelity and detail preservation.
Figure 6.
Visual and histogram results on the SOTS-indoor dataset by different methods. Zoom in for best view.
Figure 7.
Visual and histogram results on the SOTS-outdoor dataset by different methods. Zoom in for best view.
Results on Real-World Datasets: Table 5 presents the quantitative comparison between our SAM2-Dehaze-L and several state-of-the-art methods on the Dense-Haze and NH-Haze datasets. As shown, our method achieves superior and stable performance across both datasets. On the Dense-Haze dataset, our method attains the best results, achieving a PSNR of 20.61 dB, SSIM of 0.725, and CIEDE2000 of 8.5909. Compared with the baseline MixDehazeNet-L, our approach improves PSNR by 4.71 dB, SSIM by 0.146, and CIEDE2000 by 28.99%. On the NH-Haze dataset, our method also achieves the best results, with a PSNR of 22.02 dB, SSIM of 0.831, and CIEDE2000 of 8.5108, although the improvement margin is smaller: 1.01 dB in PSNR, 0.004 in SSIM, and 10.53% in CIEDE2000. This difference can be attributed to the higher haze concentration and stronger degradation in the Dense-Haze dataset, which present more significant challenges for conventional feature extraction. In contrast, our method effectively leverages the semantic priors provided by SAM2, substantially enhancing the network’s structural perception and semantic representation under dense-haze conditions. Consequently, SAM2-Dehaze-L achieves more robust and reliable restoration results, demonstrating significant improvements in visual quality and quantitative performance.
Table 5.
Quantitative comparison on the Dense-Haze and NH-Haze datasets. The ↑ indicates that a larger value is better, the ↓ indicates that a smaller value is better. Bold and underlined values denote the best and second-best results, respectively.
Figure 8 and Figure 9 illustrate the qualitative comparison results of our method on the Dense-Haze and NH-Haze datasets. Traditional methods such as DCP and RIDCP exhibit noticeable color distortions and residual haze on both datasets. Lightweight deep models like AOD-Net show limited dehazing capability, often failing to recover clear structures in dense-haze regions. While FFA-Net and MIT-Net achieve better visual quality, they struggle with detail preservation, resulting in blurry local textures. Dehamer enhances overall brightness but introduces artifacts and unnatural color tones, whereas MixDehazeNet achieves more balanced performance yet still fails to handle regions with heavy haze accumulation. In contrast, our proposed SAM2-Dehaze achieves superior dehazing performance on both datasets. It effectively restores realistic color, fine textures, and structural consistency, while suppressing noise and residual artifacts. The resulting images appear visually clearer, more natural, and highly consistent with the ground-truth references, confirming the robustness and generalization of our approach under challenging real-world haze conditions.
Figure 8.
Visual results on the NH-Haze dataset by different methods. Zoom in for best view.
Figure 9.
Visual results on the Dense-Haze dataset by different methods. Zoom in for best view.
Results on RTTS datasets: Table 6 presents the quantitative comparison between our method and several state-of-the-art dehazing approaches on the RTTS dataset, evaluated using four no-reference image quality metrics: FADE, NIQE, PIQE, and BRISQUE. As shown in the table, although IPC-Dehaze achieves the best overall performance, our method outperforms most existing methods across all metrics.
Table 6.
Quantitative comparison on RTTS. The ↓ indicates that a smaller value is better. Bold and underlined values indicate the best and second-best results, respectively.
To further assess the qualitative performance, Figure 10 provides a visual comparison on the RTTS dataset. Overall, FFA-Net, Dehamer, GridDehazeNet, and MIT-Net exhibit limited dehazing capability under complex real-world conditions, with noticeable residual haze persisting in the reconstructed images. IPC-Dehaze demonstrates a certain level of effectiveness in some samples but still falls short in terms of structural detail recovery and color fidelity. In contrast, our method consistently delivers superior visual quality across all test images. The restored results are significantly cleaner and more visually pleasing, with natural color reproduction and well-preserved details, without introducing noticeable artifacts or over-enhancement effects.
Figure 10.
Visual comparison on RTTS. Zoom in for best view.
4.4. Ablation Study
Impact of Different Components in the Network. To further validate the effectiveness of each proposed component, we conduct ablation studies to analyze the contributions of the key modules, including the SPFB, the PDCC, and the SAB. We take MixDehazeNet-S as the baseline network and construct four variants based on it as follows:
- (1) Base + SPFB → V1
- (2) Base + PDCC → V2
- (3) Base + SPFB + PDCC → V3
- (4) Base + SPFB + PDCC + SAB → V4
All models are trained using the same training strategy, and the “S” variant is evaluated on the ITS-indoor test set. The experimental results are shown in Table 7 and Figure 11. As illustrated in Table 7, each proposed module contributes notably to the overall improvement in dehazing performance. Specifically, the SPFB module increases PSNR by 1.77 dB compared with the baseline. Similarly, the PDCC module introduces additional gains by improving fine-grained edge and structure restoration. Overall, each component plays a complementary role in enhancing dehazing quality, verifying the effectiveness of our architectural design. To further illustrate the contribution of each module, we visualize their respective feature maps. As shown in Figure 11, V1 enhances some edge features but still suffers from coarse outputs and insufficient detail recovery; V2 improves edge and structure clarity to some extent but lacks fine-grained textures; and V3 yields moderate global improvement but fails to preserve table textures and background structures, resulting in blurry features. In contrast, V4, which integrates SPFB, PDCC, and SAB, produces much clearer features with better spatial sharpness and detail recovery, including more precise object boundaries. These visualizations further confirm the effectiveness of each proposed component.
Table 7.
Ablation study on the RESIDE-indoor dataset. The ✓ indicates that the corresponding module is enabled, while w/o denotes that the module is not used. Bold values indicate the best results.
Figure 11.
Visual comparison of intermediate features in ablation models.
Ablation Study on SPFB Module. To thoroughly validate the effectiveness of the proposed SPFB module, we conducted a series of detailed ablation experiments from two perspectives: external fusion strategies and internal structural variations in the SPFB module.
- (1)
- External fusion strategies
- SPFB-N1: The SPFB module is removed, and feature fusion is performed via simple element-wise addition.
- SPFB-N2: The input RGB image is extended to four channels by appending the semantic segmentation mask as an additional channel before feeding it into the dehazing network.
- (2)
- Internal structural variations
- SPFB-F1: The fusion with the input feature F is removed, and only the intermediate semantic attention map is used within the fusion function.
- SPFB-F2: The feature extraction branch responsible for obtaining semantic priors from F_0 is removed.
As shown in Table 8, none of these alternatives, whether in terms of fusion strategies or SPFB structural variants, can match the performance of our original SPFB design. This clearly demonstrates the superior capability of our SPFB module in both structural fusion and semantic guidance. Moreover, we further analyze the impact of inserting the SPFB module at different positions in the network. As illustrated in Table 8, increasing the number of inserted SPFB modules consistently improves the network performance. This progressive enhancement trend further confirms the effectiveness of the SPFB module, especially in boosting the modeling capacity of multi-level features through structural and semantic reinforcement.
Table 8.
Ablation study on different fusion methods (left) and SPFB insertion locations (right). Bold values indicate the best results.
Impact of Loss Function Hyperparameters. To select the optimal hyperparameters, we conduct ablation experiments on the weight parameters of the contrastive loss and L1 loss. A penalty factor λ is introduced before the contrastive loss to determine its optimal contribution. As shown in Figure 12, when λ = 0.1, the model achieves the best performance in both PSNR and SSIM, indicating the most effective dehazing capability.
Figure 12.
Effect of the parameter λ on dehazing performance. The plots illustrate the impact of different λ values, used to balance the L1 loss and contrastive loss, on PSNR (left) and SSIM (right). The best performance is achieved when λ = 0.1, yielding the highest PSNR of 41.41 dB and the highest SSIM of 0.9961.
5. Conclusions
This paper addresses the limitations of traditional image dehazing methods in semantic understanding and detail restoration by proposing a novel dehazing framework, SAM2-Dehaze, which integrates semantic prior information from a large-scale pre-trained model. We incorporate the SAM2 model to fully exploit its powerful capabilities in semantic segmentation and structural perception, and design three key modules: the Semantic Prior Fusion Block (SPFB), the Parallel Detail-enhanced and Compression Convolution (PDCC), and the Semantic Alignment Block (SAB). These modules work collaboratively to significantly enhance semantic consistency, detail preservation, and structural reconstruction during the dehazing process. Extensive experiments on multiple standard hazy-image datasets demonstrate that the proposed method outperforms existing state-of-the-art approaches in both quantitative metrics and subjective visual quality, showing strong generalization and practical potential.
Although SAM2-Dehaze exhibits excellent performance and a strong generalization ability, it still depends on a pre-trained semantic segmentation model, which may introduce additional computational overhead in resource-constrained environments. Future work will explore lightweight semantic embedding strategies and self-distillation mechanisms to reduce dependence on large-scale foundation models and further improve real-time performance. Moreover, when the haze in an image is excessively dense, even powerful large-scale models such as SAM2 struggle to achieve accurate segmentation. The lack of sufficient semantic information further limits the dehazing performance of the model (as shown in Figure 13), which remains one of the key challenges to be addressed in future research.
Figure 13.
Semantic segmentation and dehazing results of SAM2-Dehaze. The model’s dependence on a pre-trained semantic segmentation network restricts its performance in extremely dense-haze and resource-limited scenarios.
Author Contributions
Conceptualization, Z.H.; methodology, S.L. and Z.H.; software, S.L.; validation, S.L.; formal analysis, S.L.; investigation, S.L. and J.W.; resources, Z.H.; writing—original draft preparation, S.L.; writing—review and editing, Z.H. and J.W.; visualization, S.L.; supervision, Z.H.; project administration, Z.H.; funding acquisition, Z.H. All authors have read and agreed to the published version of the manuscript.
Funding
This work is supported by the Natural Science Foundation of Henan Province (No. 242300420284), the Henan Provincial Science and Technology Research Project (No. 252102211015), and the Fundamental Research Funds for the Universities of Henan Province (No. NSFRF240820).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The datasets used in this study are publicly available. RESIDE (including ITS/OTS/SOTS/RTTS and related subsets) can be accessed at the authors’ project site: RESIDE: https://sites.google.com/view/reside-dehaze-datasets (accessed on 12 January 2025). Dense-HAZE (NTIRE 2019) is available at https://data.vision.ee.ethz.ch/cvl/ntire19/dense-haze (accessed on 12 January 2025). NH-HAZE (NTIRE 2020) is available at https://data.vision.ee.ethz.ch/cvl/ntire20/nh-haze (accessed on 12 January 2025).
Acknowledgments
This work acknowledges the contributions of Yizhang Meng (School of Intelligent Sensing and Instrumentation, Zhongyuan University of Technology, Zhengzhou, 450007, Henan, China. myz 123456202108@163.com) to the experiments, data collection, and the visualization of figures and results in this study.
Conflicts of Interest
Author Jianchao Wang was employed by the company China National Software & Service Company Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| SAM | Segment Anything Model |
| SPFB | Semantic Prior Fusion Block |
| PDCC | Parallel Detail-enhanced and Compression Convolution |
| SAB | Semantic Alignment Block |
| ASM | Atmospheric Scattering Model |
| DCP | Dark Channel Prior |
| U-Net | U-shaped convolutional neural network |
| VGG16 | Visual Geometry Group 16-layer network |
| ReLU | Rectified Linear Unit |
| DEConv | Detail Enhancement Convolution |
| CDC | Center Difference Convolution |
| ADC | Angle Difference Convolution |
| HDC | Horizontal Difference Convolution |
| VDC | Vertical Difference Convolution |
| SCConv | Spatial-Channel Construction Convolution |
| SRU | Spatial Reconstruction Unit |
| CRU | Channel Reconstruction Unit |
| RDB | Residual Dense Block |
| PSNR | Peak Signal-to-Noise Ratio |
| SSIM | Structural Similarity Index |
| FADE | Fog Aware Density Evaluation |
| NIQE | Natural Image Quality Evaluator |
| PIQE | Perception-based Image Quality Evaluator |
| BRISQUE | Blind/Referenceless Image Spatial Quality Evaluator |
References
- Khan, H.; Xiao, B.; Li, W.; Muhammad, N. Recent Advancement in Haze Removal Approaches. Multimed. Syst. 2022, 28, 687–710. [Google Scholar] [CrossRef]
- Ju, M.; Ding, C.; Ren, W.; Yang, Y.; Zhang, D.; Guo, Y.J. IDE: Image Dehazing and Exposure Using an Enhanced Atmospheric Scattering Model. IEEE Trans. Image Process. 2021, 30, 2180–2192. [Google Scholar] [CrossRef]
- Wang, X.; Chen, X.A.; Ren, W.; Han, Z.; Fan, H.; Tang, Y.; Liu, L. Compensation Atmospheric Scattering Model and Two-Branch Network for Single Image Dehazing. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 2880–2896. [Google Scholar] [CrossRef]
- Xin, W.; Xudong, Z.; Jun, Z.; Rui, S. Image Dehazing Algorithm by Combining Light Field Multi-Cues and Atmospheric Scattering Model. Opto-Electron. Eng. 2025, 47, 190634-1. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Cui, Y.; Wang, Q.; Li, C.; Ren, W.; Knoll, A. EENet: An effective and efficient network for single image dehazing. Pattern Recognit. 2025, 158, 111074. [Google Scholar] [CrossRef]
- Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. FFA-Net: Feature Fusion Attention Network for Single Image Dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA, 7–12 February 2020; Volume 34, Number 7. pp. 11908–11915. [Google Scholar]
- Wang, Y.; Yan, X.; Wang, F.L.; Xie, H.; Yang, W.; Zhang, X.P.; Qin, J.; Wei, M. UCL-Dehaze: Toward real-world image dehazing via unsupervised contrastive learning. IEEE Trans. Image Process. 2024, 33, 1361–1374. [Google Scholar] [CrossRef]
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 4015–4026. [Google Scholar]
- Ravi, N.; Gabeur, V.; Hu, Y.T.; Hu, R.; Ryali, C.; Ma, T.; Khedr, H.; Rädle, R.; Rolland, C.; Gustafson, L.; et al. SAM 2: Segment anything in images and videos. arXiv 2024, arXiv:2408.00714. [Google Scholar]
- He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353. [Google Scholar] [CrossRef]
- Zhu, Q.; Mai, J.; Shao, L. A fast single image haze removal algorithm using color attenuation prior. IEEE Trans. Image Process. 2015, 24, 3522–3533. [Google Scholar] [CrossRef]
- Berman, D.; Avidan, S. Non-local image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1674–1682. [Google Scholar]
- Shi, F.; Jia, Z.; Zhou, Y. Zero-Shot Sand–Dust Image Restoration. Sensors 2025, 25, 1889. [Google Scholar] [CrossRef]
- Ren, W.; Pan, J.; Zhang, H.; Cao, X.; Yang, M.H. Single image dehazing via multi-scale convolutional neural networks with holistic edges. Int. J. Comput. Vis. 2020, 128, 240–259. [Google Scholar] [CrossRef]
- Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. AOD-Net: All-in-one dehazing network. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4770–4778. [Google Scholar]
- Zhang, H.; Patel, V.M. Densely connected pyramid dehazing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3194–3203. [Google Scholar]
- Li, T.; Liu, Y.; Ren, W.; Shiri, B.; Lin, W. Single Image Dehazing Using Fuzzy Region Segmentation and Haze Density Decomposition. IEEE Trans. Circuits Syst. Video Technol. 2025; in press. [Google Scholar] [CrossRef]
- Liu, X.; Ma, Y.; Shi, Z.; Chen, J. GridDehazeNet: Attention-based multi-scale network for image dehazing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7314–7323. [Google Scholar]
- Wu, H.; Qu, Y.; Lin, S.; Zhou, J.; Qiao, R.; Zhang, Z.; Xie, Y.; Ma, L. Contrastive learning for compact single image dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10551–10560. [Google Scholar]
- Hong, M.; Liu, J.; Li, C.; Qu, Y. Uncertainty-driven dehazing network. Proc. AAAI Conf. Artif. Intell. 2022, 36, 906–913. [Google Scholar] [CrossRef]
- Chen, Z.; He, Z.; Lu, Z.M. DEA-Net: Single image dehazing based on detail-enhanced convolution and content-guided attention. IEEE Trans. Image Process. 2024, 33, 1002–1015. [Google Scholar] [CrossRef]
- Wang, X.; Yang, G.; Ye, T.; Liu, Y. Dehaze-RetinexGAN: Real-World Image Dehazing via Retinex-based Generative Adversarial Network. Proc. AAAI Conf. Artif. Intell. 2025, 39, 7997–8005. [Google Scholar] [CrossRef]
- Son, D.M.; Huang, J.R.; Lee, S.H. Image Sand–Dust Removal Using Reinforced Multiscale Image Pair Training. Sensors 2025, 25, 1234. [Google Scholar] [CrossRef]
- Zhang, S.; Ren, W.; Tan, X.; Wang, Z.-J.; Liu, Y.; Zhang, J.; Zhang, X.; Cao, X. Semantic-aware dehazing network with adaptive feature fusion. IEEE Trans. Cybern. 2021, 53, 454–467. [Google Scholar] [CrossRef] [PubMed]
- Cheng, Z.; You, S.; Ila, V.; Li, H. Semantic single-image dehazing. arXiv 2018, arXiv:1804.05624. [Google Scholar] [CrossRef]
- Song, Y.; Yang, C.; Shen, Y.; Wang, P.; Huang, Q.; Kuo, C.C.J. SPG-Net: Segmentation prediction and guidance network for image inpainting. arXiv 2018, arXiv:1805.03356. [Google Scholar] [CrossRef]
- Zhang, Q.; Liu, X.; Li, W.; Chen, H.; Liu, J.; Hu, J.; Xiong, Z.; Yuan, C.; Wang, Y. Distilling semantic priors from SAM to efficient image restoration models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 25409–25419. [Google Scholar]
- Li, S.; Liu, M.; Zhang, Y.; Chen, S.; Li, H.; Dou, Z.; Chen, H. SAM-Deblur: Let Segment Anything boost image deblurring. In Proceedings of the ICASSP 2024–IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, Republic of Korea, 14–19 April 2024; pp. 2445–2449. [Google Scholar]
- Liu, H.; Shao, M.; Wan, Y.; Liu, Y.; Shang, K. SeBIR: Semantic-guided burst image restoration. Neural Netw. 2025, 181, 106834. [Google Scholar] [CrossRef]
- Li, J.; Wen, Y.; He, L. ScConv: Spatial and channel reconstruction convolution for feature redundancy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 6153–6162. [Google Scholar]
- Wang, Y.; Xiong, J.; Yan, X.; Wei, M. USCFormer: Unified transformer with semantically contrastive learning for image dehazing. IEEE Trans. Intell. Transp. Syst. 2023, 24, 11321–11333. [Google Scholar] [CrossRef]
- Zhao, H.; Gallo, O.; Frosio, I.; Kautz, J. Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 2016, 3, 47–57. [Google Scholar] [CrossRef]
- Li, B.; Ren, W.; Fu, D.; Tao, D.; Feng, D.; Zeng, W.; Wang, Z. Benchmarking single-image dehazing and beyond. IEEE Trans. Image Process. 2018, 28, 492–505. [Google Scholar] [CrossRef]
- Ancuti, C.O.; Ancuti, C.; Sbert, M.; Timofte, R. DENSE-HAZE: A benchmark for image dehazing with dense-haze and haze-free images. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1014–1018. [Google Scholar]
- Ancuti, C.O.; Ancuti, C.; Timofte, R. NH-HAZE: An image dehazing benchmark with non-homogeneous hazy and haze-free images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 444–445. [Google Scholar]
- Li, L.; Song, S.; Lv, M.; Jia, Z.; Ma, H. Multi-Focus Image Fusion Based on Fractal Dimension and Parameter Adaptive Unit-Linking Dual-Channel PCNN in Curvelet Transform Domain. Fractal Fract. 2025, 9, 157. [Google Scholar] [CrossRef]
- Lv, M.; Song, S.; Jia, Z.; Li, L.; Ma, H. Multi-Focus Image Fusion Based on Dual-Channel Rybak Neural Network and Consistency Verification in NSCT Domain. Fractal Fract. 2025, 9, 432. [Google Scholar] [CrossRef]
- Cao, Z.H.; Liang, Y.J.; Deng, L.J.; Vivone, G. An Efficient Image Fusion Network Exploiting Unifying Language and Mask Guidance. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 9845–9862. [Google Scholar] [CrossRef]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
- Sharma, G.; Wu, W.; Dalal, E.N. The CIEDE2000 color-difference formula: Implementation notes, supplementary test data, and mathematical observations. Color Res. Appl. 2005, 30, 21–30. [Google Scholar] [CrossRef]
- Choi, L.K.; You, J.; Bovik, A.C. Referenceless prediction of perceptual fog density and perceptual image defogging. IEEE Trans. Image Process. 2015, 24, 3888–3901. [Google Scholar] [CrossRef] [PubMed]
- Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
- Venkatanath, N.; Praneeth, D.; Sumohana, S.C.; Swarup, S.M. Blind image quality evaluation using perception based features. In Proceedings of the 2015 Twenty First National Conference on Communications (NCC), Mumbai, India, 27 February–1 March 2015; pp. 1–6. [Google Scholar]
- Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 2012, 21, 4695–4708. [Google Scholar] [CrossRef] [PubMed]
- Guo, C.L.; Yan, Q.; Anwar, S.; Cong, R.; Ren, W.; Li, C. Image dehazing transformer with transmission-aware 3D position embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5812–5820. [Google Scholar]
- Shen, H.; Zhao, Z.Q.; Zhang, Y.; Zhang, Z. Mutual information-driven triple interaction network for efficient image dehazing. In Proceedings of the 31st ACM International Conference on Multimedia, Hyderabad, India, 6–11 April 2023; pp. 7–16. [Google Scholar]
- Wu, R.Q.; Duan, Z.P.; Guo, C.L.; Chai, Z.; Li, C. RIDCP: Revitalizing real image dehazing via high-quality codebook priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 22282–22291. [Google Scholar]
- Zheng, Y.; Zhan, J.; He, S.; Dong, J.; Du, Y. Curricular contrastive regularization for physics-aware single image dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5785–5794. [Google Scholar]
- Lu, L.; Xiong, Q.; Xu, B.; Chu, D. MixDehazeNet: Mix structure block for image dehazing network. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; pp. 1–10. [Google Scholar]
- Fu, J.; Liu, S.; Liu, Z.; Guo, C.L.; Park, H.; Wu, R.; Wang, G.; Li, C. Iterative Predictor-Critic Code Decoding for Real-World Image Dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 11–15 June 2025; pp. 12700–12709. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).