Electronics
  • Article
  • Open Access

Published: 12 May 2024

g2D-Net: Efficient Dehazing with Second-Order Gated Units

1 Graduate School of Artificial Intelligence, Artificial Intelligence Research Center, Jeonju University, Jeonju-si 55069, Republic of Korea
2 Department of Artificial Intelligence, Artificial Intelligence Research Center, Jeonju University, Jeonju-si 55069, Republic of Korea
* Authors to whom correspondence should be addressed.
This article belongs to the Section Artificial Intelligence

Abstract

Image dehazing aims to reconstruct potentially clear images from corresponding images corrupted by haze. With the rapid development of deep learning technologies, dehazing methods based on deep convolutional neural networks have gradually become mainstream. We note that existing dehazing methods often incur increased computational overhead while improving dehazing performance. To balance performance and efficiency, we propose a novel lightweight dehazing neural network: the g2D-Net. The g2D-Net borrows the ideas of input adaptivity and long-range information interaction from Vision Transformers and introduces two kinds of convolutional blocks, i.e., the g2D Block and the FFT-g2D Block. Specifically, the g2D Block is a residual block with second-order gated units, which inherits the input-adaptive property of a gated unit and realizes second-order interaction of spatial information. The FFT-g2D Block is a variant of the g2D Block that efficiently extracts global features of the feature maps through fast Fourier convolution and fuses them with local features. In addition, we employ the SK Fusion layer to improve upon the cascade fusion layer in a traditional U-Net, thereby introducing a channel attention mechanism and dynamically fusing information from different paths. We conducted comparative experiments on five benchmark datasets, and the results demonstrate that the g2D-Net achieves impressive dehazing performance with relatively low complexity.
Keywords:
image dehazing; CNN; U-Net

1. Introduction

Image dehazing is a crucial task in the field of computer vision. It aims to restore potentially clear images from haze-affected images, enhancing image quality and visual clarity. Given its critical role in various fields, such as remote sensing, autonomous driving, and security monitoring, image dehazing has sparked widespread interest in academia and industry.
With the rapid development of deep learning techniques over the past decade, deep convolutional neural networks (CNNs) [,] have been vital in driving significant progress in computer vision. Although the emergence of Vision Transformers (ViTs) [] has posed a significant challenge to the dominance of CNNs, CNNs currently remain the preferred approach for image dehazing tasks. This preference stems from two main reasons:
  • In image dehazing research, acquiring paired hazy and clear images is an arduous endeavor, resulting in the prevalent dehazing datasets being relatively small in scale.
  • In applications such as autonomous driving and security surveillance, where image dehazing computations often occur on edge devices with limited computational resources, both computational efficiency and dehazing performance hold equal importance. CNNs generally offer higher computational efficiency in such scenarios.
However, this does not mean that Vision Transformers cannot advance the image dehazing field; many recent studies have shown that ViTs and CNNs can promote each other’s development [,]. To achieve better results, this study borrows some of the key design concepts that have made ViTs successful in computer vision and applies them to CNNs, improving dehazing performance while maintaining the computational efficiency of CNNs. The dehazing method proposed in this study is called the second-order Gate Dehaze U-Net (g2D-Net). To validate its effectiveness, we conducted experiments on mainstream dehazing datasets. As shown in Figure 1, the g2D-Net achieves impressive performance with a small computational overhead.
Figure 1. Comparisons between our methods and other state-of-the-art methods. (Left): PSNR vs. MACs on the SOTS-Indoor dataset. (Right): The number of parameters of the models.
Specifically, the study’s aims were as follows:
  • Inspired by the input-adaptive property of ViTs, we construct a residual block with the second-order spatial interaction mechanism based on gated convolution []: the g2D Block. We use it as a backbone to construct a lightweight dehazing U-Net [] with seven stages.
  • To provide the network with some long-range capability and keep the computation as efficient as possible, we propose the FFT-g2D Block. This residual block is a derivative of the g2D Block, which uses fast Fourier convolution [] to extract global information about the feature map in the frequency domain.
  • The g2D-Net replaces the cascading fusion layer in the traditional U-Net with an SK fusion layer modified from the SK-Attention mechanism [], thus dynamically fusing information from different paths.
  • During the g2D-Net’s training, hazy images are input and clear images are output at different sizes across multiple stages. This multi-input/output strategy effectively reduces the training difficulty of the network and significantly improves its convergence speed [].

3. Methods

The g2D-Net is a U-Net structure containing seven stages with local and global residuals; its overall architecture and the design of some of its modules are shown in Figure 2. The g2D-Net contains two convolutional residual blocks: the g2D Block and the FFT-g2D Block. These two types of residual blocks give the network input-adaptive and long-range modeling capabilities. In addition, the g2D-Net uses the SK Fusion layer to further improve performance and uses a multi-input/output strategy to reduce the difficulty of model training.
Figure 2. (a) The overall architecture of the second-order Gate Dehaze U-Net (g2D-Net). (b) The architecture of the shallow layer. (c) The architecture of the g2D Block. (d) The architecture of the SK Fusion layer. The architecture of the FFT-g2D Block is shown in Figure 3.

3.1. The g2D Block

Existing research indicates that gating units and their variants can effectively enhance the performance of models in various computer vision tasks [,,]. However, existing gating units do not support long-range or high-order interactions of spatial information. On the other hand, input adaptivity, long-range modeling, and high-order interactions may be critical factors in the success of ViTs. Inspired by this, we propose a convolutional residual block with input-adaptive and second-order interaction capabilities: the g2D Block. The g2D Block is built on gated convolution units []. Distinct from a typical gated convolution block, the g2D Block comprises two gated convolution units, thereby achieving a second-order spatial interaction. Gated convolution learns a dynamic feature selection mechanism for each channel and each spatial location of the feature map; this property is similar to the input adaptivity of self-attention and is inherited by the g2D Block.
Let $X \in \mathbb{R}^{C \times H \times W}$ represent the input feature map, and let $\phi_i$ represent the $i$-th point-wise convolution. The g2D Block initially employs $\phi_0$ to project $X$ to $\phi_0(X) \in \mathbb{R}^{2C \times H \times W}$, subsequently dividing it along the channel dimension into three feature maps:
$P_0^{C/2 \times H \times W},\; D_0^{C/2 \times H \times W},\; D_1^{C \times H \times W} = \mathrm{Split}\big(\phi_0(X)\big)$
The g2D Block consists of two gated convolution operations, whose outputs are denoted $P_1$ and $P_2$, respectively. First, we compute $P_1$ via the first gated convolution; $P_1$ is then mapped through $\phi_1$ and participates in the second gated convolution together with $D_1$ to obtain $P_2$:
$P_1^{C/2 \times H \times W} = \sigma\big(P_0^{C/2 \times H \times W}\big) \odot \mathrm{DW}\big(D_0^{C/2 \times H \times W}\big)$
$P_2^{C \times H \times W} = \sigma\big(\phi_1(P_1)^{C \times H \times W}\big) \odot \mathrm{DW}\big(D_1^{C \times H \times W}\big)$
Here, $\sigma$ represents the sigmoid function, which maps the output of the gated convolution operation to the interval (0, 1), thus helping to mitigate the risk of gradient explosion during training, and $\mathrm{DW}$ represents depth-wise convolution. Although the sigmoid function can be replaced with other bounded functions such as hard sigmoid, tanh, and hard tanh, we still strongly recommend using the sigmoid function for the best performance.
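For concreteness, the following is a minimal PyTorch sketch of a g2D Block that follows the equations above. The layer names, the 3 × 3 kernel size of the depth-wise convolutions, and the final point-wise projection back to C channels before the residual addition are our assumptions rather than the authors’ reference implementation.

```python
import torch
import torch.nn as nn

class G2DBlock(nn.Module):
    """Minimal sketch of a g2D Block: two gated convolutions (second-order interaction)."""
    def __init__(self, channels: int):
        super().__init__()
        c = channels  # assumed to be even so that C/2 splits are possible
        # phi_0: point-wise projection from C to 2C channels, later split into C/2, C/2, C
        self.phi0 = nn.Conv2d(c, 2 * c, kernel_size=1)
        # phi_1: point-wise projection of P1 from C/2 to C channels
        self.phi1 = nn.Conv2d(c // 2, c, kernel_size=1)
        # depth-wise convolutions acting on D0 (C/2 channels) and D1 (C channels)
        self.dw0 = nn.Conv2d(c // 2, c // 2, kernel_size=3, padding=1, groups=c // 2)
        self.dw1 = nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c)
        # assumed output point-wise projection so the residual addition stays at C channels
        self.proj = nn.Conv2d(c, c, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c = x.shape[1]
        p0, d0, d1 = torch.split(self.phi0(x), [c // 2, c // 2, c], dim=1)
        # first gated convolution: P1 = sigmoid(P0) * DW(D0)
        p1 = torch.sigmoid(p0) * self.dw0(d0)
        # second gated convolution: P2 = sigmoid(phi_1(P1)) * DW(D1)
        p2 = torch.sigmoid(self.phi1(p1)) * self.dw1(d1)
        return x + self.proj(p2)  # local residual connection


if __name__ == "__main__":
    block = G2DBlock(32)
    print(block(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```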

3.2. The FFT-g2D Block

To give the neural network a certain degree of long-range capability, we propose the FFT-g2D Block based on the g2D Block. The FFT-g2D Block is a derivative version of the g2D Block used in some stages. Figure 3 illustrates its structure. The FFT-g2D Block employs both 3 × 3 depth-wise convolution (DW-Conv) [] and FFT convolution (FFT-Conv) operators, allowing it to capture both local and global information from the feature maps simultaneously.
Figure 3. (a) The architecture of the FFT-g2D Block. (b) The architecture of the FFT-Conv module in the FFT-g2D Block. The meanings of the symbols in the figure are consistent with those listed in Figure 2.
The FFT-Conv operator, utilizing a channel-wise fast Fourier transform, effectively extracts global information from the feature maps in the frequency domain. Let $X \in \mathbb{R}^{C \times H \times W}$ represent the input feature map. It is first subjected to the discrete Fourier transform, transitioning the feature map from the spatial domain to the frequency domain:
$X_F = \mathcal{F}(X) \in \mathbb{C}^{C \times H \times W}$
Performing convolution in the spatial domain is equivalent to point-wise multiplication in the frequency domain. Therefore, by applying a learnable frequency-domain filter $K \in \mathbb{C}^{C \times H \times W}$ to $X_F$ through point-wise multiplication, the frequency-domain information can be filtered. This filter, referred to as the global filter, has the same dimensions as $X_F$. Finally, the filtered features are transformed back to the spatial domain using the inverse discrete Fourier transform:
$X \leftarrow \mathcal{F}^{-1}(K \odot X_F) \in \mathbb{R}^{C \times H \times W}$
FFT-Conv is equivalent to a depth-wise global circular convolution, but it has a time complexity of only $O(CHW \log(HW))$. Because the FFT-g2D Block is significantly more complex than the g2D Block and its cost is proportional to the input feature map dimensions, we incorporate FFT-g2D Blocks in only select stages.
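To illustrate the FFT-Conv operator, here is a minimal PyTorch sketch of a learnable global filter applied in the frequency domain, as described above. The use of a real-to-complex FFT (rfft2), the orthonormal normalization, and the filter initialization scale are our assumptions.

```python
import torch
import torch.nn as nn

class FFTConv(nn.Module):
    """Sketch of FFT-Conv: a learnable global filter applied in the frequency domain."""
    def __init__(self, channels: int, height: int, width: int):
        super().__init__()
        # learnable complex-valued global filter K with the same shape as the
        # (real) FFT of the feature map; stored as real/imaginary parts
        self.filter = nn.Parameter(torch.randn(channels, height, width // 2 + 1, 2) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        # channel-wise 2D FFT: spatial domain -> frequency domain
        x_f = torch.fft.rfft2(x, dim=(-2, -1), norm="ortho")
        # point-wise multiplication by the global filter K (= global circular convolution)
        k = torch.view_as_complex(self.filter)
        x_f = x_f * k
        # inverse FFT: frequency domain -> spatial domain
        return torch.fft.irfft2(x_f, s=(h, w), dim=(-2, -1), norm="ortho")


if __name__ == "__main__":
    op = FFTConv(channels=32, height=64, width=64)
    print(op(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])
```

Note that, as stated above, the global filter has the same spatial dimensions as the frequency-domain feature map, so this sketch assumes a fixed input resolution per stage.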

3.3. SK Fusion

The SK Fusion layer is a simple improvement of the SK module, introducing a channel attention mechanism into the model while dynamically integrating feature maps from the encoder and decoder stages. Several studies have demonstrated that the SK Fusion layer can significantly enhance model performance, particularly on low-level computer vision tasks [,]. Figure 2d illustrates the SK Fusion layer’s structure. It takes the feature map $f_1$ from the encoder stage and $f_2$ from the main path. Initially, they are fused through element-wise addition. Subsequently, a sequence of operations, including Global Average Pooling (GAP), a Multilayer Perceptron (MLP), and SoftMax, is performed to extract channel attention (the MLP in the SK Fusion layer consists of two fully connected layers: the first reduces the number of channels from $c$ to $c/8$, the second increases the number of channels to $2c$, and a ReLU activation is applied between them). The attention vector is then split along the channel dimension to obtain the weights $w_1$ and $w_2$ for $f_1$ and $f_2$, respectively. Finally, $f_1$ and $f_2$ are weighted accordingly and added to obtain the output of the SK Fusion layer. The process is as follows:
$w_1, w_2 = \mathrm{Split}\big(\mathrm{SoftMax}\big(\mathrm{MLP}\big(\mathrm{GAP}(f_1 + f_2)\big)\big)\big)$
$\mathrm{out} = w_1 f_1 + w_2 f_2$
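A minimal PyTorch sketch of the SK Fusion computation is given below. The reduction ratio of 8 follows the description of the MLP above; applying the SoftMax across the two branch weights for each channel is one interpretation of the Split/SoftMax ordering, and other details (e.g., bias settings) are assumptions.

```python
import torch
import torch.nn as nn

class SKFusion(nn.Module):
    """Sketch of the SK Fusion layer: channel attention over two branches."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)          # Global Average Pooling
        self.mlp = nn.Sequential(                   # two FC layers: c -> c/8 -> 2c
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, 2 * channels),
        )

    def forward(self, f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = f1.shape
        a = self.mlp(self.gap(f1 + f2).flatten(1))      # (B, 2C) attention vector
        a = torch.softmax(a.view(b, 2, c), dim=1)       # SoftMax across the two branches
        w1, w2 = a[:, 0].view(b, c, 1, 1), a[:, 1].view(b, c, 1, 1)
        return w1 * f1 + w2 * f2                        # weighted fusion


if __name__ == "__main__":
    fuse = SKFusion(32)
    out = fuse(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
    print(out.shape)  # torch.Size([1, 32, 64, 64])
```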

3.4. The Loss Function

The loss function of the g2D-Net comprises two parts: a spatial loss and a frequency loss. Both components use the $L_1$ loss. The final loss is computed as the weighted sum of the spatial loss and the frequency loss:
$L = \sum_{i=1}^{3} \left( \left\| \hat{y}_i - y_i \right\|_1 + \lambda \left\| \mathcal{F}(\hat{y}_i) - \mathcal{F}(y_i) \right\|_1 \right)$
In the equation, $i$ indexes the input/output images of different sizes, $y$ and $\hat{y}$ denote the output images and the label images, respectively, and $\mathcal{F}$ denotes the Fourier transform. The hyperparameter $\lambda$ is set to 0.1.
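The loss can be written compactly in code. The sketch below assumes torch.fft.fft2 as the frequency transform, three output scales as in the equation, and mean reduction for the L1 terms; these choices are illustrative rather than the authors’ exact implementation.

```python
import torch
import torch.nn.functional as F

def g2d_loss(outputs, labels, lam: float = 0.1) -> torch.Tensor:
    """Sketch of the multi-scale spatial + frequency L1 loss (lambda = 0.1).

    outputs, labels: lists of 3 tensors at different resolutions.
    """
    loss = outputs[0].new_zeros(())
    for y_hat, y in zip(outputs, labels):
        spatial = F.l1_loss(y_hat, y)                                   # ||y_hat - y||_1
        freq = F.l1_loss(torch.view_as_real(torch.fft.fft2(y_hat)),     # ||F(y_hat) - F(y)||_1
                         torch.view_as_real(torch.fft.fft2(y)))
        loss = loss + spatial + lam * freq
    return loss
```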

3.5. Architectural Details

The g2D-Net follows the design of [,], with the ratio of block quantities across the seven stages set as $[M:M:M:2M:M:M:M]$. As a lightweight model, the g2D-Net uses $M = 2$. To cater to different scenarios, we introduce a deeper variant, g2D-Net++, which is twice as deep as the g2D-Net, with $M = 4$. In the ablation study, we also discuss increasing the embedding dimension of the model; however, the experimental results show that increasing the number of blocks is a more effective way of scaling the g2D-Net. The architectural details can be found in Table 1.
Table 1. The detailed architecture specifications. Bold indicates that the FFT-g2D Block was used.

4. Experiment

We conducted extensive experiments on synthetic datasets (RESIDE [] and Haze-4K []) and real-world datasets (Dense-Haze [] and NH-Haze []), evaluating the models’ objective dehazing performance using the PSNR and the SSIM. For the RESIDE dataset, we trained models separately for indoor and outdoor scenes and evaluated them on the corresponding SOTS-Indoor and SOTS-Outdoor test sets. The Dense-Haze dataset contains densely and uniformly hazed images, and the NH-Haze dataset contains non-uniformly hazed images. Both datasets consist of 55 image pairs, with 50 pairs used for training and the remaining five used for testing. We used the PyTorch (version 1.9.0) framework and an NVIDIA A100 GPU for model construction and training. A warmup strategy was employed during the initial 50 epochs to gradually increase the learning rate to its initial value. Subsequently, the learning rate was gradually reduced to 1/100 of the initial learning rate according to a cosine decay strategy []. We employed AdamW [] as the optimizer for model training.
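As an illustration of the schedule described above, the following sketch combines a 50-epoch linear warmup with cosine decay down to 1/100 of the initial learning rate. The specific scheduler classes are our choice (and require PyTorch ≥ 1.10, newer than the version listed above), so this is one reasonable realization rather than the authors’ exact setup.

```python
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1)          # stand-in for the g2D-Net
optimizer = torch.optim.AdamW(model.parameters(), lr=8e-4)

warmup_epochs, total_epochs = 50, 1000
# linear warmup to the initial learning rate over the first 50 epochs
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-2, end_factor=1.0, total_iters=warmup_epochs)
# cosine decay down to 1/100 of the initial learning rate
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_epochs - warmup_epochs, eta_min=8e-4 / 100)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

for epoch in range(total_epochs):
    # ... one training epoch over the dehazing dataset ...
    scheduler.step()
```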

4.1. The Main Experimental Results

We first conducted experiments on the synthetic datasets. For all variants of the g2D-Net, we randomly cropped training images to 256 × 256, used a mini-batch size of 32, and trained for 1000 epochs. The initial learning rates for the g2D-Net and g2D-Net++ were set to 8 × 10−4 and 4 × 10−4, respectively. Figure 4 illustrates the training process of the g2D-Net across different datasets. To assess the effectiveness of our proposed approach, we performed quantitative comparisons between the g2D-Net and SOTA methods; the results are detailed in Table 2. The experimental findings demonstrate that our method exhibits impressive performance across multiple datasets, striking a favorable balance between dehazing effectiveness and computational efficiency. Despite being a lightweight model, the g2D-Net achieves SOTA performance across various datasets. Specifically, compared to the classic FFA-Net [], the g2D-Net uses only approximately 7.7% of the parameters and about 1.7% of the MACs, yet it improves the PSNR by 2.88 dB and 2.46 dB on the RESIDE-IN and RESIDE-OUT datasets, respectively. Compared to other more advanced methods, the g2D-Net achieves a notable reduction in parameter counts and MACs while attaining dehazing performance close to or surpassing theirs. For instance, compared to the advanced Transformer-based method DehazeFormer-B [], g2D-Net++ has approximately 70% fewer parameters, yet it improves the PSNR by 2.42 dB on the RESIDE-IN dataset. Figure 5 illustrates the dehazing effects of the g2D-Net in different scenarios. The g2D-Net handles various dehazing situations, effectively restoring details and textures affected by haze, suppressing artifacts, enhancing clarity, improving contrast, and recovering color in the images.
Figure 4. The training process of the g2D-Net and g2D-Net++. The vertical axis represents the PSNR of the models on the test set. The data were sampled at intervals with a sampling step of 20 for better visualization.
Table 2. The benchmarking of dehazing methods on synthetic datasets. The data for the other methods are taken from their respective papers; ‘-’ indicates that the original paper does not report such data. The best performance is shown in bold; the second-best is underlined.
Figure 5. A comparison of the dehaze results of different methods in an indoor scene and an outdoor scene. The haze images are from the RESIDE dataset. GT denotes Ground Truth images.
To better evaluate the g2D-Net, we conducted experiments on two more challenging real-world datasets. During training, input images were resized to 800 × 1200, while full-size images were used during testing. Figure 6 illustrates the test results of the g2D-Net on the Dense-Haze and NH-Haze test sets. Table 3 presents the comparative experimental results between the g2D-Net and other methods. The findings indicate that, compared to synthetic datasets, the model’s dehazing performance declines on these more challenging real-world datasets. This decline mainly manifests as suboptimal edge details and color reproduction in the reconstructed images, suggesting that mainstream synthetic datasets still lack realism. However, the comparative results indicate that the objective performance metrics of the g2D-Net, particularly the SSIM, still outperform those of most existing methods. The strong SSIM results suggest that the overall visual quality of images processed by the g2D-Net is higher, which may be attributed to its long-range interaction capability. On the other hand, the lower PSNR compared to large-scale SOTA dehazing methods may imply that the g2D-Net’s reconstruction of pixel-level details is limited by its smaller parameter count.
Figure 6. The dehazing results on real-world datasets. The first row shows the hazy images, and the second row shows the Ground Truth images.
Table 3. The experiments on real-world datasets. The best performance is shown in bold; the second-best is underlined.

4.2. Ablation Experiments

To analyze the critical designs in the g2D-Net, we conducted corresponding ablation experiments, systematically examining the impact of the g2D Block, the FFT-g2D Block, the shallow layer, and the SK layer on the model’s performance. The results are presented in Table 4. Unless otherwise specified, the ablation experiments were performed on the RESIDE-OUT dataset.
Table 4. The ablation experiments on the RESIDE-OUT dataset. Unless otherwise stated, the ablation experiments are performed on the g2D-Net. “-” indicates that the current metric is on par with the baseline. “↑” indicates a performance improvement compared to the baseline, while “↓” indicates a decrease in performance.
We initially investigated the impact of the g2D Block on model performance. The g2D Block contains two gated convolutional units, enabling second-order spatial interactions between feature information. When the g2D Block includes only one gated convolutional unit, it degenerates into a gated convolutional (GC) Block (the architecture of the GC Block is illustrated in Figure 7). The experimental results demonstrate that the g2D Block increases the PSNR by 0.81 dB compared to the GC Block. To validate that the performance improvement brought by the g2D Block is not due to an increase in parameters, we conducted another set of experiments: we added a 3 × 3 depth-wise convolution operator and a 1 × 1 point-wise convolution to the GC Block to match the parameters and MACs of the g2D Block. However, compared to the g2D Block, the GC Block with the added convolution operators still yielded a 0.39 dB lower PSNR. The FFT-g2D Block is aimed at efficiently extracting both global and local features; replacing the g2D Block in the fourth stage with the FFT-g2D Block increases the PSNR by 1.01 dB.
Figure 7. (a) The architecture of the GC Block with a convolution operator. (b) The architecture of the GC Block.
In the g2D-Net, we employ a multi-input/output strategy to alleviate training difficulty. The role of the shallow layer is to feed images of different sizes into the model. Abandoning the multi-input/output strategy results in a decrease of 0.75 dB in the PSNR.
The SK layer is incorporated to introduce channel attention into the model, dynamically combining feature map information from different branches. Compared to the cascaded fusion layers commonly used in a traditional U-Net, the SK layer, as a lightweight module, introduces no additional computational overhead while enhancing the PSNR by 0.76 dB.
In the ablation experiments, we also validated the scalability of the g2D-Net. The experiments show that whether the depth or the width is expanded, the g2D-Net’s performance improves significantly, with deepening the network being the more recommended choice.

5. Conclusions

In this study, we propose a lightweight convolutional neural network for image dehazing: the g2D-Net. Inspired by Vision Transformers, we propose the g2D Block and the FFT-g2D Block, two convolutional residual blocks with input-adaptive and long-range capabilities. In addition, we utilize the SK layer to further improve model performance and adopt the multi-input/output strategy to reduce the model’s training difficulty. Extensive experiments demonstrate that the g2D-Net achieves a balance between performance and computational complexity, delivering SOTA performance on multiple benchmark datasets with low computational overhead. This lightweight yet high-performing model effectively reduces the difficulty of model training and deployment, promoting the application and development of dehazing networks in real-world scenarios. Although the g2D-Net’s performance is impressive, it still cannot match SOTA large-scale dehazing models on large-scale datasets, such as RESIDE-OUT, due to network size limitations. Additionally, constrained by the quality and scale of the datasets, the g2D-Net’s effectiveness in handling real-world haze still needs improvement. However, with a deeper understanding of neural networks and improved dataset quality, the g2D-Net’s performance is poised to improve further.

Author Contributions

Conceptualization, Z.W.; methodology, Z.W.; software, Z.W.; validation, J.J.; formal analysis, J.M.; investigation, J.J.; resources, J.M.; data curation, J.J.; writing—original draft preparation, J.J.; writing—review and editing, Z.W.; visualization, J.J.; supervision, J.M.; project administration, J.M.; funding acquisition, J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available in this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Comput. 1989, 1, 541–551. [Google Scholar] [CrossRef]
  2. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 30 June–1 July 2016; pp. 770–778. [Google Scholar]
  3. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
  4. Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A Convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11976–11986. [Google Scholar]
  5. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  6. Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language Modeling with Gated Convolutional Networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; PMLR: New York, NY, USA, 2017; pp. 933–941. [Google Scholar]
  7. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 6–12 June 2015; pp. 3431–3440. [Google Scholar]
  8. Rao, Y.; Zhao, W.; Zhu, Z.; Lu, J.; Zhou, J. Global Filter Networks for Image Classification. Adv. Neural Inf. Process. Syst. 2021, 34, 980–993. [Google Scholar]
  9. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective Kernel Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar]
  10. Cho, S.-J.; Ji, S.-W.; Hong, J.-P.; Jung, S.-W.; Ko, S.-J. Rethinking Coarse-to-Fine Approach in Single Image Deblurring. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 4641–4650. [Google Scholar]
  11. Gonzalez, R.C. Digital Image Processing; Pearson Education India: Delhi, India, 2009; ISBN 81-317-2695-9. [Google Scholar]
  12. Seow, M.-J.; Asari, V.K. Ratio Rule and Homomorphic Filter for Enhancement of Digital Colour Image. Neurocomputing 2006, 69, 954–958. [Google Scholar] [CrossRef]
  13. Land, E.H.; McCann, J.J. Lightness and Retinex Theory. Josa 1971, 61, 1–11. [Google Scholar] [CrossRef] [PubMed]
  14. Dippel, S.; Stahl, M.; Wiemker, R.; Blaffert, T. Multiscale Contrast Enhancement for Radiographies: Laplacian Pyramid versus Fast Wavelet Transform. IEEE Trans. Med. Imaging 2002, 21, 343–353. [Google Scholar] [CrossRef] [PubMed]
  15. He, K.; Sun, J.; Tang, X. Single Image Haze Removal Using Dark Channel Prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353. [Google Scholar]
  16. Zhu, Q.; Mai, J.; Shao, L. A Fast Single Image Haze Removal Algorithm Using Color Attenuation Prior. IEEE Trans. Image Process. 2015, 24, 3522–3533. [Google Scholar] [PubMed]
  17. Li, Z.; Zheng, J. Edge-Preserving Decomposition-Based Single Image Haze Removal. IEEE Trans. Image Process. 2015, 24, 5432–5441. [Google Scholar] [CrossRef]
  18. Zhu, Z.; Wei, H.; Hu, G.; Li, Y.; Qi, G.; Mazur, N. A Novel Fast Single Image Dehazing Algorithm Based on Artificial Multiexposure Image Fusion. IEEE Trans. Instrum. Meas. 2020, 70, 5001523. [Google Scholar] [CrossRef]
  19. Ancuti, C.O.; Ancuti, C.; Bekaert, P. Effective Single Image Dehazing by Fusion. In Proceedings of the 2010 IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 3541–3544. [Google Scholar]
  20. Zhao, D.; Xu, L.; Yan, Y.; Chen, J.; Duan, L.-Y. Multi-Scale Optimal Fusion Model for Single Image Dehazing. Signal Process. Image Commun. 2019, 74, 253–265. [Google Scholar] [CrossRef]
  21. Galdran, A.; Vazquez-Corral, J.; Pardo, D.; Bertalmio, M. Fusion-Based Variational Image Dehazing. IEEE Signal Process. Lett. 2016, 24, 151–155. [Google Scholar] [CrossRef]
  22. McCartney, E.J. Optics of the Atmosphere: Scattering by Molecules and Particles; John Wiley and Sons, Inc.: New York, NY, USA, 1976. [Google Scholar]
  23. Narasimhan, S.G.; Nayar, S.K. Vision and the Atmosphere. Int. J. Comput. Vis. 2002, 48, 233–254. [Google Scholar] [CrossRef]
  24. Nayar, S.K.; Narasimhan, S.G. Vision in Bad Weather. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; IEEE: Piscataway, NJ, USA, 1999; Volume 2, pp. 820–827. [Google Scholar]
  25. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. Dehazenet: An End-to-End System for Single Image Haze Removal. IEEE Trans. Image Process. 2016, 25, 5187–5198. [Google Scholar] [CrossRef] [PubMed]
  26. Ren, W.; Liu, S.; Zhang, H.; Pan, J.; Cao, X.; Yang, M.-H. Single Image Dehazing via Multi-Scale Convolutional Neural Networks. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part II 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 154–169. [Google Scholar]
  27. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. Aod-Net: All-in-One Dehazing Network. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4770–4778. [Google Scholar]
  28. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. Maxim: Multi-Axis Mlp for Image Processing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5769–5780. [Google Scholar]
  29. Song, Y.; Zhou, Y.; Qian, H.; Du, X. Rethinking Performance Gains in Image Dehazing Networks. arXiv 2022, arXiv:2209.11448. [Google Scholar]
  30. Lu, L.; Xiong, Q.; Chu, D.; Xu, B. MixDehazeNet: Mix Structure Block for Image Dehazing Network. arXiv 2023, arXiv:2305.17654. [Google Scholar]
  31. Chen, Z.; He, Z.; Lu, Z.-M. DEA-Net: Single Image Dehazing Based on Detail-Enhanced Convolution and Content-Guided Attention. IEEE Trans. Image Process. 2024, 33, 1002–1015. [Google Scholar] [CrossRef] [PubMed]
  32. Cui, Y.; Tao, Y.; Bing, Z.; Ren, W.; Gao, X.; Cao, X.; Huang, K.; Knoll, A. Selective Frequency Network for Image Restoration. In Proceedings of the Eleventh International Conference on Learning Representations, Virtual, 25–29 April 2022. [Google Scholar]
  33. Engin, D.; Genç, A.; Kemal Ekenel, H. Cycle-Dehaze: Enhanced Cyclegan for Single Image Dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 825–833. [Google Scholar]
  34. Singh, A.; Bhave, A.; Prasad, D.K. Single Image Dehazing for a Variety of Haze Scenarios Using Back Projected Pyramid Network. In Proceedings of the Computer Vision–ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; Proceedings, Part IV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 166–181. [Google Scholar]
  35. Wu, H.; Liu, J.; Xie, Y.; Qu, Y.; Ma, L. Knowledge Transfer Dehazing Network for Nonhomogeneous Dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 478–479. [Google Scholar]
  36. Guo, C.-L.; Yan, Q.; Anwar, S.; Cong, R.; Ren, W.; Li, C. Image Dehazing Transformer with Transmission-Aware 3d Position Embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5812–5820. [Google Scholar]
  37. Song, Y.; He, Z.; Qian, H.; Du, X. Vision Transformers for Single Image Dehazing. IEEE Trans. Image Process. 2023, 32, 1927–1941. [Google Scholar] [CrossRef] [PubMed]
  38. Hore, A.; Ziou, D. Image Quality Metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 2366–2369. [Google Scholar]
  39. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H. Restormer: Efficient Transformer for High-Resolution Image Restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739. [Google Scholar]
  40. Chen, L.; Chu, X.; Zhang, X.; Sun, J. Simple Baselines for Image Restoration. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 17–33. [Google Scholar]
  41. Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T.S. Free-Form Image Inpainting with Gated Convolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4471–4480. [Google Scholar]
  42. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  43. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H.; Shao, L. Learning Enriched Features for Fast Image Restoration and Enhancement. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1934–1948. [Google Scholar] [CrossRef]
  44. Li, B.; Ren, W.; Fu, D.; Tao, D.; Feng, D.; Zeng, W.; Wang, Z. Benchmarking Single-Image Dehazing and Beyond. IEEE Trans. Image Process. 2018, 28, 492–505. [Google Scholar] [CrossRef] [PubMed]
  45. Liu, Y.; Zhu, L.; Pei, S.; Fu, H.; Qin, J.; Zhang, Q.; Wan, L.; Feng, W. From Synthetic to Real: Image Dehazing Collaborating with Unlabeled Real Data. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 50–58. [Google Scholar]
  46. Ancuti, C.O.; Ancuti, C.; Sbert, M.; Timofte, R. Dense-Haze: A Benchmark for Image Dehazing with Dense-Haze and Haze-Free Images. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1014–1018. [Google Scholar]
  47. Ancuti, C.O.; Ancuti, C.; Timofte, R. NH-HAZE: An Image Dehazing Benchmark with Non-Homogeneous Hazy and Haze-Free Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 444–445. [Google Scholar]
  48. Loshchilov, I.; Hutter, F. Sgdr: Stochastic Gradient Descent with Warm Restarts. arXiv 2016, arXiv:1608.03983. [Google Scholar]
  49. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
  50. Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. FFA-Net: Feature Fusion Attention Network for Single Image Dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11908–11915. [Google Scholar]
  51. Liu, X.; Ma, Y.; Shi, Z.; Chen, J. Griddehazenet: Attention-Based Multi-Scale Network for Image Dehazing. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7314–7323. [Google Scholar]
  52. Dong, H.; Pan, J.; Xiang, L.; Hu, Z.; Zhang, X.; Wang, F.; Yang, M.-H. Multi-Scale Boosted Dehazing Network with Dense Feature Fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2157–2167. [Google Scholar]
  53. Ye, T.; Jiang, M.; Zhang, Y.; Chen, L.; Chen, E.; Chen, P.; Lu, Z. Perceiving and Modeling Density Is All You Need for Image Dehazing. arXiv 2021, arXiv:2111.09733. [Google Scholar]
  54. Wu, H.; Qu, Y.; Lin, S.; Zhou, J.; Qiao, R.; Zhang, Z.; Xie, Y.; Ma, L. Contrastive Learning for Compact Single Image Dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 10551–10560. [Google Scholar]
  55. Luo, P.; Xiao, G.; Gao, X.; Wu, S. LKD-Net: Large Kernel Convolution Network for Single Image Dehazing. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 10–14 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1601–1606. [Google Scholar]
  56. Cui, Y.; Knoll, A. Exploring the Potential of Channel Interactions for Image Restoration. Knowl. Based Syst. 2023, 282, 111156. [Google Scholar] [CrossRef]
  57. Chao, Q.; Yan, J.; Sun, T.; Li, S.; Chi, J.; Yang, G.; Chen, C.; Yu, T. Instance-Aware Image Dehazing. Eng. Appl. Artif. Intell. 2024, 133, 108346. [Google Scholar] [CrossRef]
  58. Cui, Y.; Ren, W.; Cao, X.; Knoll, A. Focal Network for Image Restoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–3 October 2023; pp. 13001–13011. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
