Applied Sciences
  • Article
  • Open Access

12 May 2025

Feature Attention Cycle Generative Adversarial Network: A Multi-Scene Image Dehazing Method Based on Feature Attention

College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an 710054, China
Author to whom correspondence should be addressed.

Abstract

For image dehazing, it is difficult to obtain datasets of paired hazy and haze-free images. Currently, most algorithms are trained on synthetic datasets of insufficient complexity, which leads to model overfitting. At the same time, most current algorithms ignore the physical characteristics of fog in the real world, namely that the degree of fog is related to the scene depth and the scattering coefficient. Moreover, most current dehazing algorithms only consider land scenes and ignore maritime scenes. To address these problems, we propose a multi-scene image dehazing algorithm based on an improved cycle generative adversarial network (CycleGAN). The generator structure is improved based on the CycleGAN model, and a feature fusion attention module is proposed. This module obtains relevant contextual information by extracting features at different levels, which are then fused using the idea of residual connections. An attention mechanism is introduced in this module to retain more feature information by assigning different weights. During the training process, the atmospheric scattering model is established to guide the learning of the neural network with its prior information. The experimental results show that, compared with the baseline model, the peak signal-to-noise ratio (PSNR) increases by 32.10%, the structural similarity index (SSIM) increases by 31.07%, the information entropy (IE) increases by 4.79%, and the NIQE index is reduced by 20.1% in quantitative comparison. Meanwhile, the proposed method demonstrates better visual effects than other advanced algorithms in qualitative comparisons on synthetic and real datasets.

1. Introduction

With the rapid development of science and technology, autonomous driving, remote monitoring, and other technologies are widely used. However, the instability of weather factors can lead to the frequent occurrence of fog. Suspended particles, such as water vapor, smoke, and dust in fog, can absorb and scatter reflected light from the surfaces of target objects. Due to the scattering of natural light, the light received by monitoring and acquisition equipment attenuates along the line of sight, resulting in visual blurring, decreased contrast, color distortion, and other problems in the collected images. This further decreases the reliability of visual application systems and even poses a threat, especially for visible light vision systems such as object detection. Removing haze through dehazing methods to improve image quality can be suitable for various computer vision tasks, such as image segmentation [1] and object detection in hazy weather [2]. Therefore, research on image dehazing technology has important practical significance and application value.
Due to the powerful learning ability of neural networks [3], researchers have proposed a large number of methods for image dehazing. Most current approaches achieve dehazing in a supervised manner [4,5], relying on a large number of mapping pairs of haze-free images for model training. While these methods achieve promising results on specific benchmark datasets, they face several critical challenges in real-world applications. First, obtaining large-scale paired datasets is extremely difficult and costly. As a result, most methods rely on synthetic datasets generated using physical models. However, the domain gap between synthetic and real-world hazy images often leads to overfitting issues. Second, existing methods are often limited to specific types of scenes (e.g., land scenes) or haze concentrations, making them less effective under diverse real-world conditions involving complex terrain, variable lighting, and a range of haze densities. Third, many methods overlook the inherent depth variation and atmospheric light in the scene, which are crucial for realistic dehazing. These shortcomings often result in degraded visual quality, including color distortion and the loss of fine details.
To address the abovementioned issues within a unified framework, we propose FA-CycleGAN, an improved cycle-consistent generative adversarial network integrated with a feature fusion attention mechanism. Our approach is designed with three core motivations: (1) to eliminate the need for paired training data by employing an unpaired training strategy via CycleGAN, (2) to enhance physical interpretability by incorporating an atmospheric scattering model that guides the generator in simulating realistic haze removal, and (3) to preserve more discriminative features by introducing a feature fusion block with coordinate attention (FBCA) that adaptively fuses multi-scale features and enhances important spatial and channel-wise information. This unified framework allows our model to simultaneously address dataset limitations, multi-scene application issues, and feature loss problems, resulting in improved performance across diverse dehazing scenarios.

3. Methods

Since paired haze-free datasets are difficult to obtain in many scenarios, the extensive use of synthetic datasets may lead to overfitting issues in deep learning models, resulting in insignificant dehazing effects in real-world images. Using the CycleGAN unpaired image-to-image translation framework, the image dehazing problem is treated as an image transformation between two distinct style domains. Hazy images are considered the source domain, while haze-free images are regarded as the target domain. The core of the CycleGAN-based image dehazing method lies in mapping the source domain images to the target domain, transforming haze images into haze-free images. However, the physical properties of hazy environments in the real world are often ignored in end-to-end dehazing methods, resulting in generated haze that usually lacks realism and diversity, further impacting the learning of the subsequent dehazing network. To address the abovementioned issues, we propose the FA-CycleGAN model. Using the CycleGAN model can effectively solve the problem of difficult pairwise image acquisition. In the training process, convolutional operations are used to extract the parameters of the atmospheric scattering model and reconstruct the clear image. At the same time, a deep learning model is used to refine the process to make up for the parameter estimation error of the atmospheric scattering model. A feature fusion attention module is introduced into the generator structure for multi-scale feature fusion and different weight assignments, which can retain more feature information and focus on different regions’ features to varying degrees. While ensuring the authenticity of dehazed images, more image details are retained, and the dehazing performance of the model is improved.

3.1. Overall Framework of the Model

In this paper, the atmospheric scattering model is used as a physical constraint in combination with CycleGAN. On the theoretical basis of the mathematical model, the hazy imaging atmospheric scattering model can be described as follows:
H(x) = C(x)t(x) + A(1 − t(x))  (1)
where H(x) is the value of the hazy image at pixel x, C(x) is the corresponding haze-free (dehazed) image, A is the global atmospheric light value, and t(x) is the transmittance. The relationship between the transmittance and the depth information of the image is shown in Equation (2), where β is the scattering coefficient and d(x) is the scene depth at pixel x.
t(x) = e^(−βd(x))  (2)
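As a concrete illustration of Equations (1) and (2), the following NumPy sketch synthesizes a hazy image from a clear image and a depth map. The array shapes, value ranges, and the example depth ramp are assumptions made for the illustration and are not taken from the paper.

import numpy as np

def synthesize_haze(clear, depth, A=0.9, beta=0.06):
    """Apply the atmospheric scattering model of Eqs. (1)-(2).

    clear : float array in [0, 1], shape (H, W, 3) -- haze-free image C(x)
    depth : float array, shape (H, W)              -- scene depth d(x)
    A     : scalar atmospheric light value
    beta  : scattering coefficient
    """
    t = np.exp(-beta * depth)              # Eq. (2): t(x) = e^(-beta * d(x))
    t = t[..., None]                       # broadcast over the RGB channels
    hazy = clear * t + A * (1.0 - t)       # Eq. (1): H(x) = C(x)t(x) + A(1 - t(x))
    return np.clip(hazy, 0.0, 1.0)

# Example: haze becomes denser for larger beta or deeper scene regions
clear = np.random.rand(256, 256, 3).astype(np.float32)
depth = np.linspace(1.0, 50.0, 256)[None, :].repeat(256, axis=0)
hazy = synthesize_haze(clear, depth, A=0.95, beta=0.04)

Larger β or larger d(x) drives t(x) toward zero, so the pixel is dominated by the atmospheric light term, which is exactly the depth- and density-dependent behavior the model is meant to exploit.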
Based on the improvements to the CycleGAN network, FA-CycleGAN is proposed. Its structure is shown in Figure 2a. FA-CycleGAN consists of two generators and two discriminators. Haze-free images are generated from haze images by the generator, GD, aligning their distribution with that of the target domain images, thereby deceiving the discriminator DD. Haze images are generated from haze-free images by the generator, GH, to deceive the discriminator, DH. The discriminator, DH, is responsible for determining whether the input image is a haze image, while the discriminator, DD, assesses whether the input image is a haze-free image.
Figure 2. The proposed FA-CycleGAN consists of two generators and two discriminators, and the two branches of the network are used to generate hazy and haze-free images as shown as (a). The workflow of generator GD and generator GH is shown in (b).
The workflow of generator GD and generator GH is shown in Figure 2b. The image is fed into the generator to produce the generated image. This generated image is then fed into the discriminator to determine whether it is a real image. Then, the loss is calculated based on the generated image and the discriminator’s results, followed by updating the generator and discriminator parameters based on this loss. During the training process, increasingly realistic images are produced by the generator in an attempt to deceive the discriminator, which is used to distinguish between real and generated images. Through the adversarial interaction between the generator and discriminator, the conversion of haze images to haze-free images is achieved. Finally, an optimal paired image generator model with haze and haze-free images is obtained. The pseudocode is as shown in Algorithm 1.
Algorithm 1 FA-CycleGAN Network Training Process
Input: Input_Dehazy, Input_Hazy  // clear images and hazy images
Output: Training log information
# ===== Branch 1: Clean → Hazy → Cyclic Clean =====
Generated_Hazy, gt_beta = GH(Input_Dehazy)
Cyclic_Dehazy, pred_beta = GD(Generated_Hazy)
# ===== Branch 2: Hazy → Dehazy → Cyclic Hazy =====
Generated_Dehazy, gt_d = GD(Input_Hazy)
Cyclic_Hazy, pred_d = GH(Generated_Dehazy)
# ===== Discriminator training =====
dis_real_clean = DD(Input_Dehazy)
dis_fake_clean = DD(Generated_Dehazy)
loss_dis_clean = adversarial_loss(dis_real_clean, True) + adversarial_loss(dis_fake_clean, False)
dis_real_hazy = DH(Input_Hazy)
dis_fake_hazy = DH(Generated_Hazy)
loss_dis_hazy = adversarial_loss(dis_real_hazy, True) + adversarial_loss(dis_fake_hazy, False)
total_dis_loss = (loss_dis_clean + loss_dis_hazy) / 4
total_dis_loss.backward()
# ===== Generator training =====
fake_clean_logits = DD(Generated_Dehazy)
fake_hazy_logits = DH(Generated_Hazy)
loss_gan = (adversarial_loss(fake_clean_logits, True) + adversarial_loss(fake_hazy_logits, True)) / 2
loss_cycle = L1(Input_Hazy, Cyclic_Hazy) + L1(Input_Dehazy, Cyclic_Dehazy)
loss_β = L2(pred_beta, gt_beta)
loss_d = L1(gt_d, pred_d)
total_gen_loss = λ_gen*loss_gan + λ_cycle*loss_cycle + λ_β*loss_β + λ_d*loss_d
total_gen_loss.backward()
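A PyTorch-style sketch of one training step of Algorithm 1 is given below. The generator and discriminator interfaces (each generator returning an image together with an auxiliary parameter, discriminators outputting probabilities in [0, 1]) follow the pseudocode; the class definitions, the use of binary cross-entropy for the adversarial objective, and the detach() calls are implementation assumptions rather than details reported in the paper.

import torch
import torch.nn.functional as F

def adversarial_loss(pred, target_is_real):
    # Adversarial term computed on discriminator probabilities in [0, 1]
    target = torch.ones_like(pred) if target_is_real else torch.zeros_like(pred)
    return F.binary_cross_entropy(pred, target)

def training_step(GD, GH, DD, DH, opt_G, opt_D, Input_Hazy, Input_Dehazy,
                  lam_gen=0.2, lam_cycle=1.0, lam_beta=1.0, lam_d=1.0):
    # Branch 1: clean -> hazy -> cyclic clean
    Generated_Hazy, gt_beta = GH(Input_Dehazy)
    Cyclic_Dehazy, pred_beta = GD(Generated_Hazy)
    # Branch 2: hazy -> dehazed -> cyclic hazy
    Generated_Dehazy, gt_d = GD(Input_Hazy)
    Cyclic_Hazy, pred_d = GH(Generated_Dehazy)

    # ----- Discriminator update (generator outputs detached) -----
    opt_D.zero_grad()
    loss_dis_clean = (adversarial_loss(DD(Input_Dehazy), True)
                      + adversarial_loss(DD(Generated_Dehazy.detach()), False))
    loss_dis_hazy = (adversarial_loss(DH(Input_Hazy), True)
                     + adversarial_loss(DH(Generated_Hazy.detach()), False))
    total_dis_loss = (loss_dis_clean + loss_dis_hazy) / 4
    total_dis_loss.backward()
    opt_D.step()

    # ----- Generator update -----
    opt_G.zero_grad()
    loss_gan = (adversarial_loss(DD(Generated_Dehazy), True)
                + adversarial_loss(DH(Generated_Hazy), True)) / 2
    loss_cycle = F.l1_loss(Cyclic_Hazy, Input_Hazy) + F.l1_loss(Cyclic_Dehazy, Input_Dehazy)
    loss_beta = F.mse_loss(pred_beta, gt_beta)
    loss_d = F.l1_loss(pred_d, gt_d)
    total_gen_loss = (lam_gen * loss_gan + lam_cycle * loss_cycle
                      + lam_beta * loss_beta + lam_d * loss_d)
    total_gen_loss.backward()
    opt_G.step()
    # Discriminator gradients accumulated by this backward pass are cleared
    # by opt_D.zero_grad() at the start of the next call.
    return total_dis_loss.item(), total_gen_loss.item()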

3.2. Structure of the Specific Network

3.2.1. Structure of the Generators

Unlike traditional CycleGAN, a heterogeneous generator structure, in which the two generators utilize different network structures, is employed in FA-CycleGAN.
Image dehazing is performed on the input hazy image by GD. The model structure is shown in Figure 3. The pseudocode is shown in Algorithm 2. The depth information of the image is estimated through feature extraction and residual fusion by the transmittance estimation module. The scattering coefficient of the image is estimated through feature extraction and average pooling operations by the scattering coefficient estimation module. GD first extracts features from the input image, H. The extracted features at different levels are processed to obtain the image’s transmittance, t̂, and scattering coefficient, β̂, as follows:
(t̂, β̂) = GD(H)  (3)
Figure 3. The transmittance and scattering coefficients are estimated through the generator GD using feature extraction and fusion, and a haze-free image is obtained by calculating Equation (4).
The generated haze-free image, Ĉ, can be calculated based on the atmospheric scattering model as follows:
Ĉ = (H − Â)/t̂ + Â  (4)
where Â is the atmospheric light value estimated from the dark channel prior, H is the hazy image, and t̂ is the estimated transmittance.
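As a point of reference for the forward_get_A step in Algorithm 2 below, the following NumPy sketch shows one common way to estimate the atmospheric light from the dark channel prior and to invert the scattering model according to Equation (4). The window size, the 0.1% brightest-pixel selection, and the lower bound on the transmittance are standard choices from the dark channel prior literature, not parameters reported in this paper.

import numpy as np
from scipy.ndimage import minimum_filter

def estimate_atmospheric_light(hazy, window=15, top_fraction=0.001):
    """Estimate A from the dark channel prior.

    hazy : float array in [0, 1], shape (H, W, 3)
    """
    dark = minimum_filter(hazy.min(axis=2), size=window)   # dark channel
    n_top = max(1, int(dark.size * top_fraction))          # brightest 0.1% of dark channel
    idx = np.argsort(dark.ravel())[-n_top:]
    candidates = hazy.reshape(-1, 3)[idx]
    return candidates.max(axis=0)                          # per-channel estimate of A

def recover_clear(hazy, t, A, t_min=0.1):
    """Invert the scattering model, Eq. (4): C = (H - A) / t + A."""
    t = np.clip(t, t_min, 1.0)[..., None]
    return np.clip((hazy - A) / t + A, 0.0, 1.0)

Clamping t away from zero is what keeps the recovery numerically stable in dense-haze regions, where the estimated transmittance is small.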
Algorithm 2 Process flow of the generator, GD
Input: Input_Hazy  // hazy images
Output: Generated_Dehazy
Initialize:
   Build the multi-layer feature extraction block (MFEB)
   Build the feature fusion attention block (FBCA)
   Build the output layers (output_Conv)
Method forward_get_A(input_image)
   If use_dc_A
       then estimate the atmospheric light A via the dark channel method
       else set A as the maximum RGB value over the spatial dimensions
   return A
Method forward(Input_Hazy)
   features = MFEB(Input_Hazy)
   t = output_Conv(FBCA(features))
   β = AvgPooling(features)
   Normalize t and β into valid ranges
   A = forward_get_A(Input_Hazy)
   Compute Generated_Dehazy according to Equation (4)
As shown in Figure 4, the multi-layer feature extraction block (MFEB) is built based on the EfficientNet-lite3 network. The backbone network is divided into four layers to gradually extract advanced features from the image. Through layered feature extraction, the network can gradually learn low-level details and high-level semantic information, which aids in learning complex structures and patterns in the image, enabling the network to better understand the input data.
Figure 4. The multi-layer feature extraction block can extract features using multiple MBConv structures and output multi-layer features.
The feature fusion attention module (FBCA) performs multi-scale feature fusion on extracted features of different levels. Its structure is shown in Figure 5. The idea of residual connection is adopted in FBCA, where different levels of features are separately fed into different feature fusion blocks for fusion. The output of the previous layer’s feature fusion block is processed by a residual unit and added to the input features of the current layer. The result of this addition is then processed by the residual unit to obtain the output of the current layer. This approach allows for improved retention of low-level features while promoting the learning of high-level features, thus improving the dehazing effect. To enhance the model’s attention to input data, the CA attention module is introduced to improve the model’s performance in handling complex tasks. In this module, the model can focus on complex areas to recover details more efficiently. Most current attention mechanisms (e.g., SE attention mechanism) significantly enhance model performance. Unlike traditional channel attention, which aggregates spatial information globally and may overlook location-specific features, attention is decomposed by CA into two complementary 1D encodings along the horizontal and vertical directions. This approach captures long-range dependencies while preserving precise positional information. By embedding location awareness into channel attention, CA allows the network to adaptively emphasize features that are not only important across channels but also relevant to specific spatial coordinates. This enables the generator to identify and enhance important regions, such as object boundaries, edges, and textured areas, which are often degraded or obscured in hazy images. This results in better structural preservation, reduced artifacts, and improved clarity in localized areas in practice. Thus, the integration of CA within the FBCA module helps the network focus on both channel importance and spatial position, leading to a more refined feature representation that contributes to more effective and visually coherent dehazing outcomes.
Figure 5. The feature fusion attention block can accept the multi-layer features extracted by the MFEB and feed them into different feature fusion blocks for fusion. The feature fusion block is described in the second box, and the residual block is described in the third box.
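For readers unfamiliar with coordinate attention, the following minimal PyTorch sketch illustrates the CA operation used inside FBCA: pooling along the height and width directions, a shared 1 × 1 convolution, and two direction-specific attention maps applied to the input. The reduction ratio and the use of ReLU in place of the h-swish activation are simplifications for the example and do not reflect the paper’s exact configuration.

import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Minimal coordinate attention: 1D pooling along H and W, a shared 1x1
    convolution, then per-direction attention maps multiplied onto the input."""

    def __init__(self, channels, reduction=32):
        super().__init__()
        hidden = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # (N, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # (N, C, 1, W)
        self.conv1 = nn.Conv2d(channels, hidden, kernel_size=1)
        self.bn = nn.BatchNorm2d(hidden)
        self.act = nn.ReLU(inplace=True)                # h-swish in the original CA design
        self.conv_h = nn.Conv2d(hidden, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = self.pool_h(x)                            # (N, C, H, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)        # (N, C, W, 1)
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                           # (N, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))       # (N, C, 1, W)
        return x * a_h * a_w

Because the two attention maps are indexed by row and column, the module can emphasize spatially localized structures such as edges and textured regions while remaining a lightweight channel-attention variant, which matches the role described for CA above.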
Image hazing is performed by GH based on the input clear image. The model structure is shown in Figure 6. The pseudocode is shown in Algorithm 3. First, the transmittance estimation module is used to estimate the depth information, d̂, of the input clear image, and a scattering coefficient, β, is randomly sampled within the range of [0.6, 1.8]. Then, the atmospheric scattering model is used to calculate a rough pseudo-hazy image. Finally, the rough pseudo-hazy image is refined using the U-Net network to avoid the visual unreality caused by parameter estimation errors, as follows:
Ĥ = GH(C·e^(−βd̂) + A(1 − e^(−βd̂)))  (5)
Algorithm 3 Process flow of the generator, GH
Input: Input_Dehazy  // clear images
Output: Generated_Hazy
Initialize:
   Build the multi-layer feature extraction block (MFEB)
   Build the feature fusion attention block (FBCA)
   Build the output layers (output_Conv)
   Build the U-Net network (UNet)
Method forward(Input_Dehazy)
   features = MFEB(Input_Dehazy)
   d = output_Conv(FBCA(Conv(features)))
   β = Random[0.8, 1.6]
   Compute Generated_Hazy according to Equation (5)
   Generated_Hazy = UNet(Generated_Hazy)  // refined by U-Net
Figure 6. The generator, GH, estimates the depth information of the image through feature extraction and fusion and combines it with the randomly sampled scattering coefficient to obtain a rough pseudo-hazy image, as calculated using Equation (5). Finally, the U-Net network refines this image to obtain the final hazy image.
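The haze synthesis path of GH can be summarized in a few lines. The sketch below assumes callables for the depth estimator and the refining U-Net (their internals correspond to the MFEB/FBCA and UNet blocks of Algorithm 3 and are not reproduced here); treating the atmospheric light as a single scalar and the sampling range are likewise simplifications for the example.

import torch

def generate_hazy(clear, depth_net, refine_unet, A=1.0, beta_range=(0.8, 1.6)):
    """Sketch of the GH pipeline of Algorithm 3.

    clear       : tensor (N, 3, H, W) in [0, 1]
    depth_net   : callable returning a depth map d_hat of shape (N, 1, H, W)
    refine_unet : callable refining the rough pseudo-hazy image
    """
    d_hat = depth_net(clear)                                   # estimated depth
    beta = torch.empty(clear.size(0), 1, 1, 1,
                       device=clear.device).uniform_(*beta_range)  # random scattering coefficient
    t = torch.exp(-beta * d_hat)                               # Eq. (2)
    rough_hazy = clear * t + A * (1.0 - t)                     # Eq. (5), before refinement
    return refine_unet(rough_hazy)                             # U-Net removes estimation artifacts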

3.2.2. Structure of the Discriminator

A basic convolutional discriminator was used to distinguish between real and generated images. Its network structure is shown in Figure 7. The discriminator receives the input image and performs feature extraction through a series of convolutional operations. Spectral normalization is applied to stabilize the training process. Using LeakyReLU as the activation function helps preserve more information and prevent gradient vanishing issues. The last layer maps the output to the range [0, 1], providing the probability that the input image is a real image.
Figure 7. The discriminator can output the final discrimination result through a layer-by-layer convolution structure.
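A minimal PyTorch sketch of such a discriminator is shown below. The number of layers, channel widths, and kernel sizes are assumptions chosen to illustrate the combination of spectral normalization, LeakyReLU activations, and a sigmoid output; they are not the exact configuration of Figure 7.

import torch.nn as nn
from torch.nn.utils import spectral_norm

def conv_block(in_ch, out_ch, stride=2):
    # Spectral normalization constrains the layer's Lipschitz constant,
    # which stabilizes adversarial training
    return nn.Sequential(
        spectral_norm(nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=stride, padding=1)),
        nn.LeakyReLU(0.2, inplace=True),
    )

class Discriminator(nn.Module):
    def __init__(self, in_channels=3, base=64):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(in_channels, base),
            conv_block(base, base * 2),
            conv_block(base * 2, base * 4),
            conv_block(base * 4, base * 8, stride=1),
        )
        # Final map squashed to [0, 1]: probability that the input is real
        self.head = nn.Sequential(
            nn.Conv2d(base * 8, 1, kernel_size=4, stride=1, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.head(self.features(x))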

3.3. Loss Function

Similar to CycleGAN, cyclic consistency loss and adversarial training loss are used to penalize content consistency and data distribution, respectively. According to the atmospheric scattering model, the degree of fog in the real world is related to the scene depth, d, and the scattering coefficient, β. Therefore, pseudo scattering coefficient-supervised loss and pseudo depth-supervised loss are employed to learn these physical properties (depth and density) from unpaired hazy and haze-free images.
Adversarial training loss is used to evaluate whether the generated images belong to a specific domain, penalizing the visual fidelity of the hazy and haze-free images while ensuring that they follow the same distribution as the images in the training set. To address the slow convergence of the discriminator caused by the min–max loss, the non-saturating GAN (NSGAN) [29] loss is used, which offers good stability and visual quality. For a generator, G, and the corresponding discriminator, D, the adversarial loss can be expressed as follows, where rh is a real sample from the hazy image set and rc is a real sample from the clean image set:
Loss_GAN(GH, DH) = E[log(DH(rh))] + E[log(1 − DH(GH(rc)))]  (6)
Loss_GAN(GD, DD) = E[log(DD(rc))] + E[log(1 − DD(GD(rh)))]  (7)
Training CycleGAN using only adversarial training loss does not guarantee the cyclic consistency of the network [30], which refers to the consistency between the input and output. Therefore, cyclic consistency loss is used to penalize the consistency of the inputs and outputs. This loss is defined as the difference between the input value, x, and the forward prediction, F(G(x)), as well as the input value, y, and the forward prediction, G(F(y)). The larger the difference, the further the prediction is from the original input. The cyclic consistency loss is implemented using L1 loss, as shown in Equation (8) as follows:
Loss_cycle = E[‖GH(GD(rh)) − rh‖₁] + E[‖GD(GH(rc)) − rc‖₁]  (8)
The pseudo-scattering coefficient supervised loss is used to penalize the difference between the randomly sampled scattering coefficients generated and the scattering coefficients estimated from the generated hazy images. It is calculated as follows:
Loss_β = (β̂ − β)²  (9)
Pseudo-depth supervised loss is used to penalize the difference between the depth information, d, estimated from the hazy image, H, and the depth information, d̂, estimated from the dehazed image. The L1 loss is used, which is defined as follows:
Loss_d = E[‖d̂ − d‖₁]  (10)
In summary, joint optimization is performed using a weighted combination of the abovementioned losses as follows:
Loss_total = λ_GAN·Loss_GAN + λ_cycle·Loss_cycle + λ_β·Loss_β + λ_d·Loss_d  (11)
where λGAN, λcycle, λβ, and λd are the weights used to balance the different terms. Based on prior experience and our experiments, setting these weights to 0.2, 1, 1, and 1, respectively, works well.
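To make the choice of the non-saturating objective concrete, the sketch below contrasts the saturating min–max generator loss with the non-saturating form, together with the discriminator objective corresponding to Equations (6) and (7). It assumes discriminators that output probabilities in [0, 1] and is an illustration of the loss formulation only, not the training code of this paper.

import torch

def generator_loss_minmax(d_fake):
    # Saturating form: minimize E[log(1 - D(G(x)))]; gradients vanish once D is confident
    return torch.log(1.0 - d_fake + 1e-8).mean()

def generator_loss_nonsaturating(d_fake):
    # Non-saturating form: maximize E[log D(G(x))], i.e. minimize -E[log D(G(x))]
    return -torch.log(d_fake + 1e-8).mean()

def discriminator_loss(d_real, d_fake):
    # Eqs. (6)-(7): the discriminator maximizes E[log D(real)] + E[log(1 - D(fake))]
    return -(torch.log(d_real + 1e-8) + torch.log(1.0 - d_fake + 1e-8)).mean()

The non-saturating form gives the generator a strong gradient precisely when the discriminator rejects its samples, which is why it converges more reliably than the original min–max objective.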

4. Experiment

4.1. Experimental Environment and Parameters

In this paper, we use Python 3.9 and PyTorch 1.12 to build the deep learning environment and conduct experiments on an NVIDIA GeForce RTX 3060 Laptop GPU. The network is trained on the RGB channels. The training images are cropped to 256 × 256 as the network input. The model is trained using the Adam optimizer, with the batch size set to 2, the exponential decay rates β1 and β2 set to 0.9 and 0.999, respectively, and the learning rate set to 0.0001.
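The snippet below illustrates this configuration. The placeholder modules and the use of torchvision’s RandomCrop are assumptions for the example; only the crop size, learning rate, and β1/β2 values come from the text above.

import torch
from torch.optim import Adam
from torchvision import transforms

# Random 256x256 RGB crops as network input, as described above
train_transform = transforms.Compose([
    transforms.RandomCrop(256),
    transforms.ToTensor(),
])

# Adam with the reported hyperparameters; the ModuleLists below stand in for
# the FA-CycleGAN generators and discriminators built elsewhere.
generators = torch.nn.ModuleList([torch.nn.Conv2d(3, 3, 3, padding=1)])      # placeholder
discriminators = torch.nn.ModuleList([torch.nn.Conv2d(3, 1, 3, padding=1)])  # placeholder
optimizer_G = Adam(generators.parameters(), lr=1e-4, betas=(0.9, 0.999))
optimizer_D = Adam(discriminators.parameters(), lr=1e-4, betas=(0.9, 0.999))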

4.2. Datasets and Evaluation Indicators

4.2.1. Datasets

The public datasets used for evaluating the model in this paper mainly include the RESIDE dataset and the I-HAZE dataset. RESIDE is a large synthetic dataset divided into five subsets. We selected the Outdoor Training Set (OTS) and the Synthetic Objective Testing Set (SOTS) for experimentation. OTS contains 2061 clear outdoor images and 72,135 hazy images; each clear image corresponds to 35 hazy images with different atmospheric light values and scattering coefficients. SOTS contains 500 clear indoor images and 500 clear outdoor images, along with their corresponding hazy images. The I-HAZE dataset is used to evaluate the performance of the algorithm in a real hazy environment; it was captured using an artificial fog device to generate real fog in an indoor setting.
Currently, there are many indoor and outdoor hazy image datasets based on land, covering different scene types. However, there are hardly any publicly available datasets specifically for maritime hazy images. To address this issue, we used a hazing algorithm to fog natural images and constructed a maritime dehazing dataset called the Maritime Dehaze Dataset (MDD). To prevent overfitting, we collected maritime images from different scenes. Various visual scenes in the maritime domain are covered comprehensively in MDD, including inland waterway shipping, ports, near-shore coastal areas, watercraft, and buoy small target detection. In the MDD, there are a total of 2142 images for training and 200 for testing.

4.2.2. Evaluation Indicators

In order to evaluate the advantages and disadvantages of the algorithm objectively, a variety of indices are used. The first are the full-reference evaluation indices: structural similarity (SSIM) and peak signal-to-noise ratio (PSNR). SSIM compares the structural similarity between two images based on their means, variances, and covariance. PSNR measures the relationship between signal and noise and is used to evaluate the quality of image reconstruction. These are followed by the no-reference evaluation indicators: information entropy (IE) and the Natural Image Quality Evaluator (NIQE). IE is an important index for measuring image complexity and information content and is used to evaluate image detail and contrast after haze removal. NIQE evaluates image quality by analyzing structural and statistical characteristics and comparing them with the typical features of natural images, which helps assess whether an image appears visually natural. Finally, there are the performance metrics: the number of parameters (Params) and floating-point operations (FLOPs). These two key metrics are used to evaluate the efficiency of the model and to measure the complexity and computational requirements of the improved model.
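For reference, PSNR, SSIM, and IE can be computed with scikit-image as sketched below (structural_similarity with channel_axis requires scikit-image ≥ 0.19). NIQE is not part of scikit-image and is typically computed with a separate implementation, so it is omitted here; the random arrays are placeholders for an actual dehazed image and its ground truth.

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity
from skimage.measure import shannon_entropy

def evaluate_pair(dehazed, reference):
    """dehazed, reference: uint8 RGB arrays of identical shape."""
    psnr = peak_signal_noise_ratio(reference, dehazed, data_range=255)
    ssim = structural_similarity(reference, dehazed, channel_axis=-1, data_range=255)
    ie = shannon_entropy(dehazed)   # information entropy of the dehazed result
    return psnr, ssim, ie

# Example with placeholder data
dehazed = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
reference = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
print(evaluate_pair(dehazed, reference))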

4.3. Comparative Experiments

To validate the effectiveness of the algorithm, we compared it with several classical and advanced dehazing methods: DCP, CycleDehaze, FFANet, MSBDN, and C2PNet. DCP is a classic prior knowledge-based method. CycleDehaze is an improved algorithm based on CycleGAN. FFANet proposes a feature fusion attention module that combines channel attention and pixel attention for feature weighting. MSBDN proposes a multi-scale dense feature fusion module based on combining U-Net with the back-projection feedback scheme from image super-resolution. C2PNet proposes a physical-aware dual-branch unit, which assigns the estimation tasks of the transmittance and atmospheric light value to two parallel branches for learning based on the atmospheric scattering model. To compare the strengths and weaknesses of these algorithms fairly, a unified dataset and training scheme are used to train all networks.

4.3.1. Land Scene

In the land scene, the OTS is used for training models and comparison methods. SOTS-outdoor was synthesized in the same way as OTS and was selected as one of the test sets. SOTS-indoor and I-HAZE differ in terms of scenes and haze types, and the results from these two test sets can reflect the adaptability of the model to different scenes. The comparison results are shown in Table 1. The visualization results are shown in Figure 8.
Table 1. Comparison results of different models using the SOTS-outdoor, SOTS-indoor, and I-HAZE datasets.
Figure 8. (Hazy) and (Clear) are the land images of SOTS-outdoor dataset, and (af) are the dehazing experimental results of method FFANet, C2PNet, MSBDN, DCP, Cycledehaze, and the proposed method.
In the experimental comparisons using the SOTS-outdoor dataset as the test set, strong fitting capabilities were demonstrated in the supervised FFANet due to the consistency between SOTS-outdoor and the training set OTS in synthetic methods. The best results were achieved using our method among those trained on non-paired datasets. Under unsupervised conditions, image features can be learned effectively by the method, leading to high-quality predictions and demonstrating its potential and advantages in the absence of paired data. In the experimental comparisons using the SOTS-indoor and I-HAZE datasets as the test sets, due to the inconsistency between the test sets and OTS in synthetic methods, the supervised model failed to adapt effectively to changes in the data distribution. In this case, the advantages of supervised methods are significantly diminished, and there is even a slight overfitting phenomenon, leading to a decline in performance on both datasets. In contrast, our proposed method exhibits stable performance on both datasets. In the two image quality evaluation indices of SSIM and PSNR, compared with other existing comparison methods, the proposed method shows better adaptability across diverse datasets. Based on the experimental results, it can be observed that the proposed method not only overcomes the adverse effects caused by the inconsistency of synthetic datasets but also maintains relatively consistent performance across different test sets. This further verifies its effectiveness and feasibility within unsupervised learning.
As can be seen in the visualization results in Figure 8d, DCP is a classic prior-based method that relies on the dark channel prior derived from the atmospheric scattering model. However, it struggles to adapt to complex scenes and often leads to color distortion or over-dehazing. C2PNet, shown in Figure 8b, explicitly incorporates the atmospheric scattering model by proposing a physical-aware dual-branch unit, which assigns the estimation of transmittance and atmospheric light to two separate learning branches. This design significantly improves the accuracy of key parameter estimation, thus enhancing the model’s robustness. On the other hand, CycleDehaze introduces cycle consistency constraints for unsupervised learning but lacks explicit modeling of the physical scattering process, resulting in weaker performance in physical consistency and detail restoration, as shown in Figure 8e. FFANet introduces a feature fusion attention module to adaptively weight features at different scales and regions, as shown in Figure 8a; this significantly enhances the representation capacity of dehazing features and improves the fidelity of image restoration. While not employing an explicit attention mechanism, MSBDN achieves effective cross-scale feature fusion through dense connections and feature feedback, which improves dehazing performance to a considerable extent, as shown in Figure 8c. However, when there is a large sky area and strong illumination, most methods suffer from incomplete dehazing and distortion in the sky region. The proposed method combines the atmospheric scattering model with the CycleGAN network, which can correct parameter estimation errors while retaining physical information. At the same time, the feature fusion attention mechanism is introduced, which retains feature information to the greatest extent, so that more details are preserved and color distortion is corrected. Therefore, the restored image is closer to the visual effect of a real image.

4.3.2. Maritime Scene

In this section, experiments are conducted using the MDD for training and testing. The results are compared with current classical and more cutting-edge dehazing algorithms. The comparison results are shown in Table 2. The visualization results are shown in Figure 9.
Table 2. Comparison results of different models using MDD.
Figure 9. (Hazy) and (Clear) are the maritime images of MDD dataset, and (af) are the dehazing experimental results of method FFANet, C2PNet, MSBDN, DCP, Cycledehaze, and the proposed method.
In Table 2, it can be seen that our proposed network has relatively stable performance on MDD. The PSNR and SSIM values were 24.850 and 0.956, respectively, both achieving the highest scores. The PSNR and SSIM values reflect the similarity between the dehazed image and the real image. The FBCA weighting mechanism proposed in this paper assigns higher weights to important features, retaining details while suppressing useless noise. Image structure information and details can be recovered better, thereby improving the PSNR and SSIM indicators. Information entropy reflects image complexity and information content. The maximum value of 7.433 is obtained in IE, indicating that the image, after using the model to remove fog, is clearer and more detailed. This improvement is due to the introduction of multi-scale feature fusion and a weighting mechanism in the model, which allows the model to effectively capture high-frequency information, thereby improving the clarity and information content of the image. The NIQE index evaluates whether the image is visually natural. The optimal NIQE value indicates that the dehazed image has recovered details and textures and is more in line with human visual perception.
The visualization results are shown in Figure 9. MSBDN and CycleDehaze exhibited poor performance on the MDD dataset, resulting in a significant amount of haze residue in the dehazed images and incomplete dehazing, which indicates that their modeling ability for maritime scenes is insufficient because they cannot effectively distinguish between haze and image information. Since DCP is based on the dark channel prior method, it is prone to large-area color distortion in bright areas such as the sky, and there is a degree of image over-exposure. The two networks, FFANet and C2PNet, perform relatively well, but there is some color distortion because the feature differences of different regions are not fully considered in the fusion process of feature extraction. In contrast, the deep learning model is used in the proposed method to extract the parameters of the atmospheric scattering model and refine them. At the same time, a multi-scale feature fusion and weighting strategy is introduced. This model can capture both global and local features and dynamically adjust the weights so as to maintain high dehazing performance in complex scenes, and the visual effect in the dehazed images is more natural.

4.4. Ablation Experiments

4.4.1. Comparison of Different Modules

To further verify the effectiveness of each module in the network, ablation experiments were designed with different model states. In the table, the base model is CycleGAN; “A” is the improved generator structure based on the atmospheric scattering model, “B” is the improved loss function, and “C” is the FBCA module in the network. The objective comparison results are shown in Table 3, the training curves are shown in Figure 10, and the visualization results are shown in Figure 11.
Table 3. Objective comparison of ablation results of different modules.
Figure 10. Comparison of PSNR and generator loss metrics compared to the baseline model during training.
As shown in Table 3, by improving the generator network structure, the PSNR and SSIM increase by 16.10% and 25.24%, respectively, indicating that after combining the atmospheric scattering model, the model can make full use of the physical characteristics of fog to better learn image features. The generator of the traditional CycleGAN is based on a residual network, which is computationally more complex, while the improved generator greatly reduces the amount of computation, thus achieving better results in terms of FLOPs. After the introduction of the improved loss function, the performance of the model is further improved, indicating that the optimization of the loss function can better guide model training and parameter optimization. With the introduction of the FBCA module, the best test results were achieved: compared with the original network, the PSNR increases by 32.10%, the SSIM increases by 31.07%, the information entropy increases by 4.79%, and the NIQE index is reduced by 20.1%, indicating that the quality of the dehazed image is significantly improved. The dehazed image processed by the model is clearer and more in line with human visual perception. Due to the introduction of the FBCA module, the proposed method results in a moderate increase in Params and FLOPs. However, as demonstrated by the experimental results, this increase leads to significant improvements in performance, representing a reasonable trade-off between performance and computational cost. As shown in Figure 10a, it can be observed that with the increase in training epochs, the PSNR of both methods shows an overall upward trend. However, FA-CycleGAN consistently achieves higher PSNR values across different stages and exhibits smaller fluctuations, indicating better stability and convergence. These results demonstrate that FA-CycleGAN is capable of generating higher-quality dehazed images, thereby validating the effectiveness of the proposed method in preserving image details and improving dehazing performance. As shown in Figure 10b, both models exhibit a decreasing trend in loss as training progresses, indicating successful convergence. Notably, FA-CycleGAN maintains consistently lower loss values with smaller fluctuations throughout the training process and converges more rapidly than CycleGAN. This result further supports the effectiveness of FA-CycleGAN in achieving better stability and optimization during model training.
From the visual effects shown in Figure 11, it can be observed that the original CycleGAN network’s dehazing is not thorough, with “pseudo-shadows” present in some areas and the sky area exhibiting localized exposure. After improving the generators based on the atmospheric scattering model and loss function, the model can make full use of physical prior information; there are no longer any “artifacts”, and the local exposure of the sky area is improved. This demonstrates that physical prior modeling effectively provides more reasonable guidance for dehazing. After introducing the FBCA module, the model can fully integrate the features of each level and assign different weights. The model exhibited significantly enhanced restoration capabilities in detailed regions (such as object edges and texture-rich areas). Therefore, the model can effectively restore image color, dehazes more thoroughly, and significantly improves the visual effect. On the whole, the ablation experimental results fully verify the effectiveness of each improved module, achieving the best performance across multiple evaluation indicators. The improved model enhances the understanding and generation ability of image information, which helps further improve the dehazing performance of the model, so that the processed image can retain more details and generate images that are more in line with human eye perception.
Figure 11. (Hazy) and (Clear) are the images of SOTS-outdoor dataset, and (ae) are the visualization results of different ablation modules.

4.4.2. Comparison of Different Attention Mechanisms

In this section, ablation experiments on different attention mechanisms in the FBCA module were conducted. The impact of these attention mechanisms on model performance is evaluated through comparison. The OTS dataset is used as the training set, and the SOTS-outdoor dataset is used as the validation set to verify the effectiveness of the selected mechanisms. Specifically, the FBCA module without attention mechanisms is labeled as “None”. The names of the introduced attention mechanisms are used as module names, including the channel attention mechanism, Convolutional Block Attention Module (CBAM) mechanism, ShuffleAttention mechanism, SKAttention mechanism, and CA mechanism used in this paper. The quantitative evaluation results are shown in Table 4, and the visual results are shown in Figure 12.
Table 4. Objective comparison results of different attention mechanisms.
Figure 12. (Hazy) and (Clear) are the images of SOTS-outdoor dataset, (a) is the dehazing result without introducing the attention mechanism, and (bf) are the dehazing results of attention mechanism channel attention, CBAM, shuffle attention, SK attention and ours.
In Table 4, it can be seen that when only the channel attention mechanism is used, the PSNR, SSIM, and IE metrics decreased by 4.24%, 2.66%, and 6.32%, respectively. This occurs because channels are weighted by the channel attention mechanism based on their importance, which may lead to some information being weakened or lost. In the dehazing task, information from each channel may be crucial for the quality of the recovered image. Excessive weakening of certain channels can lead to performance degradation. The introduction of the CBAM, ShuffleAttention, SKAttention, and CA attention mechanisms resulted in improved model evaluation metrics. After the introduction of the CA attention mechanism, all indicators of the model increased by 13.78%, 4.65%, 4.47%, and 15.56%, respectively, achieving optimal performance. CBAM and SKAttention can combine spatial and channel attention mechanisms, enabling comprehensive modeling of feature maps and enhancing the model’s understanding of input data. A non-linear channel shuffling operation was introduced in ShuffleAttention. By enhancing information interaction between different channels, it better utilizes the correlation and common features among various channels, enabling the model to learn more complex feature representations. The CA attention mechanism can embed positional information into channel attention. Weights are dynamically calculated based on coordinate positions rather than relying on fixed parameters for specific positions. Compared to other mechanisms, CA can consider the positional information of the entire feature map simultaneously, without being influenced by local regions. This helps the model comprehensively understand the entire input data, leading to better performance. All indicators have improved.
In addition, according to the visualization effect shown in Figure 12, when the attention mechanism is not introduced, there is a problem of incomplete dehazing and poor visual quality. When other attention mechanisms are used, there is color distortion in the sky area and incomplete dehazing in some areas. Thus, the effectiveness of the CA attention module introduced in this paper is demonstrated.

4.4.3. Comparison of Different Haze Densities

In order to verify the robustness of the model to different haze densities, the following experiments were designed. Light, moderate, and heavy haze conditions were simulated by setting different atmospheric light values (A) and scattering coefficients (β). Specifically, A is set to 0.8, 0.9, 0.95, and Random, where “Random” represents a random selection in the range [0.8, 1.0]; β is set to 0.04, 0.08, and Random, where “Random” represents a random selection in the range [0.04, 0.08].
In the atmospheric scattering model, the atmospheric light value and scattering coefficient are key factors that determine haze intensity and image blur degree. They affect image sharpness and detail recovery by controlling light scattering and absorption, respectively. The atmospheric light value represents the intensity of light in the atmosphere that is scattered and eventually reaches the observer. When the atmospheric light value is low, more light in the atmosphere is absorbed or scattered, resulting in greater haze and a significant loss of contrast and detail in the image. The scattering coefficient represents the degree to which the atmosphere scatters light per unit distance. When the scattering coefficient is high, there are more haze particles in the atmosphere, and the scattering effect of light is strengthened, making the image more blurred and the visual effect more atomized.
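A small sketch of this haze-density sweep is given below, reusing the scattering model of Equations (1) and (2). The pairing of specific A and β values with the light/moderate/heavy labels and the constant placeholder depth map are illustrative assumptions, not the exact grouping used in Table 5.

import numpy as np

def transmission(depth, beta):
    return np.exp(-beta * depth)

# Illustrative (A, beta) pairs spanning light to heavy haze, plus a random setting
densities = {
    "light":    (0.95, 0.04),
    "moderate": (0.90, 0.06),
    "heavy":    (0.80, 0.08),
    "random":   (np.random.uniform(0.8, 1.0), np.random.uniform(0.04, 0.08)),
}

clear = np.random.rand(256, 256, 3)
depth = np.full((256, 256), 20.0)   # placeholder constant depth
for name, (A, beta) in densities.items():
    t = transmission(depth, beta)[..., None]
    hazy = np.clip(clear * t + A * (1 - t), 0, 1)
    print(name, "mean transmission:", float(t.mean()))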
The results of different haze densities are shown in Table 5. As shown in Figure 13a,c, the light haze in the image has less influence, the atmospheric light value is higher, and the scattering coefficient is lower. Image details can be recovered well by the model, and the degree of color restoration is good. The visual effect is close to that of the original image. The high PSNR and SSIM values indicate the robustness of the model under light haze conditions. With the decrease in atmospheric light value and the increase in scattering coefficient, the contrast of the image decreases, and the details are blurred, as shown in Figure 13b. The overall structure of the image can still be recovered fairly well. The PSNR and SSIM values decreased compared with light haze, but the model could still maintain a good recovery effect. As shown in Figure 13d, under the condition of heavy haze, the image detail is greatly lost. Under this condition, the recovery ability of FA-CycleGAN is obviously limited. The model can restore the outline of the image to a certain extent, but the loss of color detail is significant. As shown in Figure 13e, under the condition of random haze, the performance of the model will also fluctuate because the values of A and β are randomly selected within a certain range. However, in most cases, FA-CycleGAN is able to recover the details of the image fairly well.
Table 5. Objective comparison results using different haze densities.
Figure 13. (ae) Visualization results of five different haze densities before and after dehazing the MDD dataset.

5. Conclusions

In this paper, we propose a multi-scene image dehazing method, FA-CycleGAN. We use the improved CycleGAN for image dehazing while establishing a physical model for the dehazing process. The atmospheric scattering model is integrated to physically constrain the dehazing process, which not only uses the prior knowledge of the physical model but also corrects inherent estimation errors through the deep learning model. A feature fusion attention module is introduced in the generator network. This module performs multi-scale feature fusion and feature weighting on features of different levels to better learn features of varying distributions and improve the adaptability of the model to different scenes. Extensive experiments on multiple datasets demonstrate the superior performance of FA-CycleGAN, confirming its robustness in image dehazing tasks.
This method has shown certain effectiveness across different datasets, effectively improving model performance. Dehazed images can retain more details and exhibit better visual effects. However, there is still room for improvement. Due to time and equipment limitations, the scale and diversity of the established datasets often do not fully reflect the complexity and variability of real environments. Therefore, future research needs to focus on the establishment of complex and variable environmental datasets that include multiple weather conditions and illumination levels, so that the model can better adapt to real complex environments.

Author Contributions

Conceptualization, N.L. (Na Li); methodology, N.L. (Na Liu) and Y.D.; software, N.L. (Na Liu); validation, N.L. (Na Liu); formal analysis, Y.C.; data curation, N.L. (Na Liu); writing—original draft preparation, N.L. (Na Li) and N.L. (Na Liu); writing—review and editing, Y.D. and Y.C.; supervision, N.L. (Na Li); funding acquisition, N.L. (Na Li). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 62002285) and, in part, by the Youth Innovation Team of Shaanxi Universities.

Institutional Review Board Statement

Not applicable for studies not involving humans or animals.

Data Availability Statement

The original data (RESIDE) presented in this study are openly available at https://sites.google.com/view/reside-dehaze-datasets (accessed on 25 January 2024). The original data (I-HAZE) presented in this study are openly available at https://data.vision.ee.ethz.ch/cvl/ntire18//i-haze/ (accessed on 12 February 2024). The data in the MDD presented in this study are available on the Kaggle Datasets website at https://www.kaggle.com/ (accessed on 17 March 2024).

Acknowledgments

The authors would like to thank the editor and the anonymous reviewers for their constructive comments, which helped improve the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sakaridis, C.; Dai, D.; Hecker, S.; Van Gool, L. Model adaptation with synthetic and real data for semantic dense foggy scene understanding. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 687–704.
  2. Zhiwen, S.; Zhiliang, Q.; Ruosong, P.; Linwei, M.; Benjun, M.; Xueqin, L.; Jichen, Z. Ship identification of foggy sea surface based on improved YOLOv4 deep learning algorithm. Appl. Sci. Technol. 2023, 50, 37–45.
  3. Wu, Y.C.; Feng, J.W. Development and application of artificial neural network. Wirel. Pers. Commun. 2018, 102, 1645–1656.
  4. Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. FFA-Net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; AAAI Press: Washington, DC, USA, 2020; pp. 11908–11915.
  5. Dong, H.; Pan, J.; Xiang, L.; Hu, Z.; Zhang, X.; Wang, F.; Yang, M.H. Multi-scale boosted dehazing network with dense feature fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 2157–2167.
  6. Xu, Z.; Liu, X. Enhancement algorithm of fog-degraded images based on bilinear interpolation dynamic histogram equalization. J. Dalian Marit. Univ. 2010, 36, 64–68.
  7. Liu, X. Defogging algorithm of ship video surveillance image based on adaptive histogram equalization. Ship Sci. Technol. 2020, 42, 70–72.
  8. Zunke, C. Homomorphic Filtering for Navigation-Mark Image Dehazing with Convolutional Neural Network. Navig. China 2020, 43, 84–88.
  9. Chun-ya, C.; Zengli, L. Underwater image denoising method based on wavelet transform. Mod. Electron. Technol. 2023, 46, 43–47.
  10. Yi, W.; Libo, H.; Min, T.; Shouyi, C.; Xiang, H. Image Reconstruction Algorithm Based on Fusion Wavelet Function and Dark Channel. Microprocessors 2024, 45, 34–37.
  11. Su, L.; Lijun, Z. Adaptive water image defogging algorithm based on Retinex. Appl. Sci. Technol. 2024, 51, 62–68.
  12. Pazhani, A.A.J.; Periyanayagi, S. A novel haze removal computing architecture for remote sensing images using multi-scale Retinex technique. Earth Sci. Inform. 2022, 15, 1147–1154.
  13. Narasimhan, S.G.; Nayar, S.K. Contrast restoration of weather degraded images. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 713–724.
  14. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353.
  15. Hong, Z.; Chunyan, L.; Ning, W.; Jiahe, T.; Chen, G. Improved dehazing for sea image based on the dark channel prior. Ship Sci. Technol. 2021, 43, 163–168.
  16. Zhou, Y.; Yu, P. Dark channel prior defogging enhancement algorithm based on gray scale eigenvalue segmentation. Intell. Comput. Appl. 2024, 14, 71–78.
  17. Yilong, L.; Ying, L.; Lingyu, Q. Defogging algorithm of single sea fog image based on bright channel. J. Dalian Marit. Univ. 2022, 48, 103–112.
  18. Wei, Z.; Qi, L.; Xiao, Y. A Image Dehazing Algorithm Combined with Al-Alaoui Operator and Improved Dark Channel. Mod. Inf. Technol. 2024, 8, 151–155.
  19. Yu, T.; Song, K.; Miao, P.; Yang, G.; Yang, H.; Chen, C. Nighttime single image dehazing via pixel-wise alpha blending. IEEE Access 2019, 7, 114619–114630.
  20. Yunyuan, T.; Hui, F.; Haixiang, X. Image Defogging Method for Intelligent Ship in Foggy Weather. J. Wuhan Univ. Technol. 2021, 45, 141–146.
  21. Cai, B.; Xu, X.; Jia, K.; Qing, C.; Tao, D. Dehazenet: An end-to-end system for single image haze removal. IEEE Trans. Image Process. 2016, 25, 5187–5198.
  22. Li, B.; Peng, X.; Wang, Z.; Xu, J.; Feng, D. Aod-net: All-in-one dehazing network. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 4770–4778.
  23. Chen, D.; He, M.; Fan, Q.; Liao, J.; Zhang, L.; Hou, D.; Yuan, L.; Hua, G. Gated context aggregation network for image dehazing and deraining. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1375–1383.
  24. Li, R.; Pan, J.; Li, Z.; Tang, J. Single image dehazing via conditional generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 8202–8211.
  25. Engin, D.; Genç, A.; Kemal Ekenel, H. Cycle-dehaze: Enhanced cyclegan for single image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 825–833.
  26. Zhao, L.; Zhang, Y.; Cui, Y. An attention encoder-decoder network based on generative adversarial network for remote sensing image dehazing. IEEE Sens. J. 2022, 22, 10890–10900.
  27. Kwon, H. Untargeted Evasion Attacks on Deep Neural Networks Using StyleGAN. Electronics 2025, 14, 574.
  28. Song, Y.; He, Z.; Qian, H.; Du, X.J. Vision transformers for single image dehazing. IEEE Trans. Image Process. 2023, 32, 1927–1941.
  29. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
  30. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2223–2232.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
