Multi-Input Attention Network for Dehazing of Remote Sensing Images

Abstract: The non-uniform haze distribution in remote sensing images, together with the complexity of the ground information, brings many difficulties to the dehazing of remote sensing images. In this paper, we propose a multi-input convolutional neural network based on an encoder–decoder structure to effectively restore hazy remote sensing images. The proposed network can directly learn the mapping between hazy images and the corresponding haze-free images. It also effectively utilizes the strong haze penetration characteristic of the infrared bands. Our proposed network also includes an attention module and a global skip connection structure, which enables the network to effectively learn haze-relevant features and better preserve ground information. We build a dataset for training and testing our proposed method. The dataset consists of remote sensing images with two different resolutions and nine bands, captured by Sentinel-2. The experimental results demonstrate that our method outperforms traditional dehazing methods and other deep learning methods in terms of the final dehazing effect, peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and feature similarity (FSIM).


Introduction
In recent years, the quality and quantity of satellite data have increased tremendously. However, the impact of haze has been a common issue with optical remote sensing data. Haze can severely interfere with the transmittance in all optical spectral bands, which impacts the reflected signal and hinders the observation of the surface beneath the haze. This further results in huge data loss in both the spatial and temporal domains. Haze becomes a serious interference for applications requiring time consistency (such as agricultural monitoring) and applications requiring observation of a scene at a specific time (such as disaster monitoring). Therefore, effective recovery from haze will greatly increase the usability of remote sensing data.
Early studies on the dehazing of remote sensing images used different means to eliminate the influence of haze [1][2][3][4]: multi-source or multi-temporal images of the same area as auxiliary data, the complementary relationship between images, image fusion, pixel replacement, etc. All these methods have achieved good results. However, the need to obtain multiple sets of data of the same area as auxiliary data leads to poor applicability; especially for remote sensing data with long collection intervals, it is even more difficult to obtain available auxiliary data. Therefore, the single image dehazing method has attracted more and more attention. Some studies on single image dehazing use image enhancement methods, including processing the histogram of the image [5] and enhancing the contrast [6] and saturation [7] of the image. Additionally, some dehazing methods are based on homomorphic filtering [8] and the retinex color constancy theory [9].
Image enhancement ignores the hazy imaging mechanism, which could lead to a certain degree of distortion in the restored image. Researchers therefore built image dehazing methods based on the Atmospheric Scattering Model (ASM). The most popular ASM was proposed by McCartney and further improved by Narasimhan [10] and Nayar [6]. The model can usually be written as in Formula (1):

I(x) = J(x)t(x) + A(1 − t(x)) (1)

where I(x) is the image disturbed by haze; J(x) is the haze-free image that needs to be restored; t(x) is the transmittance of light passing through the atmospheric medium; x represents the image pixel; and A represents the global constant: atmospheric light. To obtain the haze-free image J(x), we first need to obtain the transmittance t(x) and the global atmospheric light A. However, using hazy images to estimate transmittance requires prior information. At present, studies on prior information are mainly based on statistical properties of hazy images, such as contrast prior [11], dark channel prior (DCP) [12], color attenuation prior [13], etc. However, this prior information is likely to become less applicable in images of different scenes, which affects the dehazing results. The development of deep learning brings a new tool to dehazing research: the convolutional neural network (CNN). Some earlier studies use neural networks instead of prior information to estimate the parameters of Formula (1) [14][15][16] and to obtain the dehazed image. Since the real transmittance of a hazy image cannot be obtained, the training data have to be generated with simulated parameters, which could impact the accuracy of the estimated transmittance. Meanwhile, the transmittance model is a simplified expression of hazy imaging, so the feature extraction capability of neural networks cannot be fully utilized.
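As a minimal sketch of how Formula (1) is inverted once t(x) and A have been estimated (the function name and the t_min clipping threshold are our own illustrative choices, not part of the original model):

```python
import numpy as np

def dehaze_asm(I, t, A, t_min=0.1):
    """Invert the Atmospheric Scattering Model I(x) = J(x)t(x) + A(1 - t(x)).

    I : hazy image, float array in [0, 1], shape (H, W, C)
    t : transmittance map, shape (H, W) or (H, W, 1)
    A : global atmospheric light (scalar or per-channel)
    t_min clips the transmittance to avoid amplifying noise where t ~ 0.
    """
    t = np.clip(t, t_min, 1.0)
    if t.ndim == 2:
        t = t[..., None]  # broadcast the transmittance over the channels
    J = (I - A) / t + A   # solve Formula (1) for J(x)
    return np.clip(J, 0.0, 1.0)
```

Prior-based methods such as DCP differ only in how they estimate t(x) and A before this inversion step.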
Some research uses end-to-end networks to directly explore the mapping from hazy images to haze-free images and, finally, generate dehazed images [17][18][19][20]. This research obtained prominent dehazing effects. However, there are limitations. First, the datasets used by these dehazing models are usually only mildly disturbed by haze, which is relatively uniformly distributed. In remote sensing images, the distribution of haze is often uneven, and thin clouds often exist, which makes the images more disturbed than those studied in this research. Second, remote sensing images usually have more than the three channels (RGB) of ordinary natural images. For example, the images obtained by the Operational Land Imager (OLI) of Landsat 8 have nine bands, and the images acquired by the Multispectral Imager (MSI) of Sentinel-2 have 13 bands. The bands beyond RGB are also affected by haze. Figure 1 shows an example of an image captured by Sentinel-2. It can be observed that the infrared bands, such as Band 11 and Band 12, have stronger penetrating power and are less impacted by haze.
Most of the previous dehazing methods cannot effectively remove non-uniformly distributed haze, nor can they deal with the impact on the bands beyond RGB. Inspired by the dehazing of natural images using convolutional neural networks (CNNs), some researchers apply CNNs to the dehazing of remote sensing images [21][22][23][24]. Meanwhile, the infrared bands and Synthetic Aperture Radar (SAR) microwave bands in multispectral remote sensing images can penetrate haze more easily than the visible bands, so they better preserve the ground information in hazy areas. Therefore, some research uses infrared band images and SAR images as auxiliary inputs to CNNs to obtain dehazing models [25][26][27]. These methods can better handle the non-uniform distribution of haze. However, most of them focus on the RGB bands or a few near-infrared bands instead of the more abundant infrared bands. Furthermore, most of the training data are synthetic hazy images, which can differ from actual hazy images with far more complexity. To address these issues, we propose a multi-input attention network for the dehazing of a single multispectral remote sensing image. Since different bands in multispectral images have different features, we utilize the strong penetration capability of the richer infrared bands and divide the multiple bands into three groups to extract features. Our proposed network consists of an encoder-decoder structure and uses head-to-tail connections and a multiscale output structure, similar to the feature pyramid network. This structure enables the dehazing model to effectively remove haze while maintaining ground details, and it can directly process bands of different resolutions. Furthermore, improved channel attention and spatial attention structures are added for extracting features from different inputs, which improves the efficiency and adaptability of training.
In this research, we use real haze and haze-free multispectral remote sensing image pairs as datasets. Our dehazing model achieves very good results in restructuring a variety of cloud-contaminated multispectral images.
The main contributions of this study are as follows: • We propose a multi-input attention network to dehaze multispectral remote sensing images. This method does not require upsampling/downsampling on the training data. It can dehaze the images captured by Sentinel-2 with different resolutions in nine bands, effectively avoiding information loss due to upsampling/downsampling. To obtain the best recovery effect, the visible light band and the features of the infrared band are fused. It takes the advantage of the strong penetration capability of the infrared band. • We build an end-to-end dehazing network with an encoder-decoder structure, which directly obtains haze-free images from learning hazy images. To improve training efficiency, the structure of weighted multiplication and residual connection between different input lines are used to adjust the feature extraction. • We use skip connections and a multi-layer output structure in the network, which can produce multi-spectral dehazing images with different resolutions. Connecting the shallow part with the tail of the network preserves the ground details and allows the network to fully extract deep features. Meanwhile, adding an improved attention module to the connection part further improves feature extraction. Finally, the network can effectively remove the disturbances, including clouds and cloud shadows under non-uniform distribution.

Related Work
At present, the research on single-image dehazing falls into two categories: traditional methods and deep learning methods.

Traditional Dehazing Methods
Traditional methods can be further divided into methods based on image enhancement and methods based on atmospheric scattering models. Methods based on image enhancement restore hazy images by enhancing contrast [5,6] and suppressing low-frequency information [8,9]. Chaudhry et al. [28] combined mixed median filtering with a Laplacian to dehaze images and applied it to remote sensing images. Huang et al. [29] combined the phase-consistency feature of remote sensing images with multi-scale retinex theory and used it to dehaze urban remote sensing images.
The method based on image enhancement is considered unstable in many cases because it lacks a foundation in physics. Therefore, methods based on the atmospheric scattering model eventually became the mainstream of traditional dehazing. ASM-based methods mainly use prior knowledge to estimate the parameters in Formula (1) and then obtain a haze-free image.
He et al. [12] proposed a dark channel prior (DCP) method through statistics and research on a large number of haze-free images. Studies have shown that in the non-sky area of the image without haze, there are always pixels with very low values close to 0 in the RGB bands. Therefore, the DCP-based dehazing method achieves outstanding dehazing effects and wide applicability. Many research efforts have been dedicated to the improvement of the DCP-based dehazing method. Zhu et al. [13] proposed a dehazing method based on the color decay prior. This method obtains the depth of field by modeling the relationship between scene depth and color, obtains the parameters through the supervised learning method to obtain the transmittance, and, finally, restores the hazy image effectively according to Formula (1).
Berman et al. [30] proposed a global transmittance estimation method, which differs from the previous local estimation methods. The method estimates the transmittance and restores the image based on the prior knowledge that the color distribution of pixels in a hazy image forms haze lines. The global estimation is more efficient and robust. Long et al. [31] refined the atmospheric veil through a low-pass filter and redefined the transmittance to reduce color distortion. The experimental results demonstrate good preservation of ground details and effective dehazing of remote sensing images. Shen et al. [32] proposed a spatial-spectral adaptive dehazing method to effectively remove the haze effect from visible light remote sensing images. This method establishes the relationship between the image gradient and the transmittance across different wavelength bands.

Neural Networks
In recent years, increasing efforts have been dedicated to data-driven methods using deep learning. The end-to-end learning of deep neural networks can potentially solve many problems in traditional algorithms. Researchers first estimate the transmittance in the atmospheric scattering model of Equation (1) by building a neural network and then restore the hazy image according to the model.
Cai et al. [14] used a network based on multi-scale feature extraction to restore images according to the degradation model; it takes the hazy image as input and outputs the transmittance map. Ren et al. [15] used a coarse-scale network that takes the hazy image as input to estimate a rough transmittance map, which is then fed into a fine-scale network to obtain an optimized transmittance map and, finally, a more refined dehazed image. Li et al. [16] took the transmittance and atmospheric light in Equation (1) as one variable and used a neural network to estimate it. Different from the previous estimation of atmospheric light by experience, this method uses the learning ability of the network to perform the estimation. Neural networks demonstrate powerful feature extraction capability, which greatly advances research on end-to-end direct dehazing networks.
Chen et al. [18] proposed an end-to-end gated context aggregation network to improve the finesse of dehazing results. It combines smoothed dilated convolution and multi-level feature fusion techniques. Liu et al. [19] proposed an attention-based grid dehazing network (GridDehazeNet), adding a densely connected grid network to effectively alleviate the bottleneck problem of traditional multi-scale networks. The attention module enables the network to better estimate model parameters. In [20], a domain adaptation framework was proposed. It employs a bidirectional translation network to bridge the gap between the synthetic and real domains by transforming images from one domain to the other. This method effectively reduces the gap between synthetic and real hazy images. In [23], a spatial attention-based generative adversarial network was proposed to dehaze remote sensing images. The model is trained separately on haze and small-scale cloud and, finally, effectively removes both interferences. In [15], SkyGAN was proposed for haze removal in aerial images. The network reconstructs multispectral data from the RGB bands of aerial images and then uses a conditional generative network to train on these reconstructed data. This method can effectively expand multispectral datasets.
Overall, there are many issues in applying the natural image dehazing methods to remote sensing images. Most research on the dehazing of remote sensing images focuses on the visible light band. The recovery from hazy images is also limited. In this paper, we propose a multi-input multi-spectral remote sensing image dehazing network. It can effectively remove haze in multi-spectral remote sensing images.

Dataset
For natural image dehazing, there are some datasets for training the dehazing network models, for example, FRIDA [33], Hazy Cityscapes datasets [34], D-Hazy [35] and RESIDE [36]. The RESIDE dataset includes a large amount of indoor and outdoor clear images and synthetic hazy images. It has been widely accepted as a benchmark dataset for natural image dehazing research in recent years. Due to the huge difference between remote sensing images and natural images, these datasets cannot be directly applied to dehazing remote sensing images. At present, datasets for remote sensing image dehazing mainly include the haze detection and removal dataset [37], Haze1K [27] and RICE [38]. These datasets include some non-uniform haze but are limited to the visible light band and have no multispectral data. In this research, we build our dataset by collecting Sentinel-2 images from January 2021 to January 2022, from 112°E, 36°N in central China to 120°E and 29°N in eastern China. The band information of Sentinel-2 is shown in Table 1. We chose 9 bands of 10 m and 20 m, which are commonly used in earth observation, including Band 2, Band 3, Band 4, Band 5, Band 6, Band 7, Band 8, Band 11 and Band 12 for research. We also selected 30 sets of hazy and haze-free image pairs as experimental data. Figure 2 shows an example of those image pairs.
The image size of 10 m resolution images is 10,980 × 10,980, while that of 20 m resolution images is 5490 × 5490. We first chose the hazy areas of the collected images and the corresponding areas of the haze-free images. We then resized the 10 m resolution training data to 1024 × 1024 and the 20 m resolution training data to 512 × 512; the selected areas were randomly cropped. We finally obtained 1500 sets of 9-band hazy and haze-free data pairs as the training dataset for this research.

Network Architecture

Figure 3 shows the network architecture used in this paper. Inspired by U-Net [39] and the Feature Pyramid Network [40], we designed a multi-input network based on an encoder-decoder structure. It takes multi-spectral remote sensing images of two resolutions as input and outputs the corresponding haze-free images.
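The paired cropping of the two resolutions described in the dataset preparation above could be sketched as follows; the function name, argument layout and the even-offset trick are illustrative assumptions, not details given in the paper:

```python
import numpy as np

def paired_random_crop(bands10, bands20, size10=1024, rng=None):
    """Crop spatially aligned patches from 10 m and 20 m band stacks.

    bands10 : (C1, H, W) array of 10 m bands (e.g. Bands 2, 3, 4, 8)
    bands20 : (C2, H//2, W//2) array of 20 m bands (Bands 5, 6, 7, 11, 12)
    A 1024 x 1024 crop at 10 m covers the same ground area as a
    512 x 512 crop at 20 m, so the 20 m offsets are half the 10 m ones.
    """
    if rng is None:
        rng = np.random.default_rng()
    size20 = size10 // 2
    _, H, W = bands10.shape
    # Choose an even 10 m offset so it maps exactly onto the 20 m grid.
    y = int(rng.integers(0, (H - size10) // 2 + 1)) * 2
    x = int(rng.integers(0, (W - size10) // 2 + 1)) * 2
    patch10 = bands10[:, y:y + size10, x:x + size10]
    patch20 = bands20[:, y // 2:y // 2 + size20, x // 2:x // 2 + size20]
    return patch10, patch20
```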

Encoder
As shown in Figure 3, the encoder consists of three inputs, two double convolution layers, two channel attention modules and seven downsampling layers. The numbers in the figure indicate the number of channels of the feature map after each network layer.
On the input side, the 9 bands with different features are categorized into three groups as the input to the network. The visible light bands (Band 2, Band 3 and Band 4) with a resolution of 10 m and the near-infrared band (Band 8) have richer ground detail information; they are the main input for feature extraction. The other five bands have a resolution of 20 m, and their ground detail information is relatively poor even in clear conditions. From Table 1, it can be observed that the wavelengths of Band 5, Band 6 and Band 7 differ only slightly from the wavelength of Band 8, as does their penetrating capability. Therefore, the features extracted from Band 5, Band 6 and Band 7 are fused with the shallow features of the main network to standardize and correct the features extracted by the main network.
Band 11 and Band 12 have relatively longer wavelengths and stronger penetration power (as shown in Figure 1). The features extracted from Band 11 and Band 12 are fused with the deeper downsampling features, and they are also fused with the features of the upsampling stage in the decoder. The features extracted from the infrared bands (which are less affected by haze), together with the learning of deep high-order features and of shallow spatial detail features, make the final restored results closer to the real situation.
For feature extraction, the double convolution network group is used for initial feature extraction from the input image. As illustrated in Figure 4a, it includes two sets of 3 × 3 convolutions. The double convolution layer can be formulated as Formula (2):

F_c = δ(BN(Conv(δ(BN(Conv(F_i)))))) (2)

where F_i is the input image data or feature map, F_c is the feature map after feature extraction by double convolution, Conv and BN are 3 × 3 convolution and Batch Normalization, and δ is the Leaky ReLU function. In the downsampling module, as shown in Figure 4b, the input features are subjected to max pooling with stride = 2. Then, downsampling and feature extraction are implemented through the double convolutional network.
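A minimal PyTorch sketch of the double convolution layer of Formula (2) and the downsampling module of Figure 4b; the class names and the example channel widths are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two (Conv 3x3 -> BatchNorm -> Leaky ReLU) stages, as in Formula (2)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class Down(nn.Module):
    """Downsampling layer: stride-2 max pooling followed by DoubleConv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.down = nn.Sequential(nn.MaxPool2d(2), DoubleConv(in_ch, out_ch))

    def forward(self, x):
        return self.down(x)
```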

Decoder
The decoder consists of six upsampling layers, one spatial attention layer, two convolutional layers and an output layer. The decoder upsamples the high-level features from the encoder and finally restores a dehazed image. It uses different upsampling layers to output multi-band images with the same two resolutions as the input. Figure 5 explains the structure of the upsampling layer. It starts by upsampling the upper layer features through deconvolution with scale = 2; the result is then concatenated with the same-size downsampling feature map. After that, the double convolutional layer reduces the channels layer by layer. The "Concat" operation on the upsampling and downsampling features enriches the information in the network, which contains both the detailed information in shallow feature maps and the haze feature information in deep feature maps.
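A minimal PyTorch sketch of this upsampling layer, assuming the deconvolution halves the channel count before the concatenation (an assumption on our part, since the exact channel widths come from Figure 5):

```python
import torch
import torch.nn as nn

class Up(nn.Module):
    """Decoder upsampling layer: 2x deconvolution, concatenation with the
    same-size encoder feature map, then a double convolution."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        # Deconvolution with scale = 2 (doubles the spatial resolution).
        self.deconv = nn.ConvTranspose2d(in_ch, in_ch // 2,
                                         kernel_size=2, stride=2)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch // 2 + skip_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.LeakyReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.LeakyReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.deconv(x)                # upsample the deep features
        x = torch.cat([x, skip], dim=1)   # "Concat" with the encoder features
        return self.conv(x)
```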

Attention Module
Inspired by the attention mechanisms widely applied in the field of image vision [41,42], we introduce the attention module in the feature extraction of the input data. The attention module can reinforce the focusing capability of the model. It enables the system to emphasize important information and suppress relatively irrelevant information so that the system can effectively extract non-uniform haze features. For the feature maps of Band 11 and Band 12 after Double Convolutional Layer and Band 5, Band 6 and Band 7 after the Downsampling Layer, the attention module infers the attention of the image along two independent dimensions of channel and space in turn. Then, the attention map is multiplied with the feature map after Downsampling and Upsampling in the backbone network, which is then fused with the feature map of the backbone network. The process is formulated in Formula (3).
F* = F + µ_m (A_i ⊗ F) (3)

where A_i is the attention map generated by the different attention modules, ⊗ denotes element-wise multiplication, F is the feature map of the backbone network with size C × H × W, F* is the output feature map after fusion with the attention modules, and µ_m is the correction coefficient. Different bands impact the final haze removal in different ways; therefore, different bands have different correction coefficients for feature fusion. The correction coefficient for Band 11 and Band 12 is µ_m = 0.5, while that for Band 5, Band 6 and Band 7 is µ_m = 0.3. The channel attention module generates channel attention maps using the channel relationship between features. Each channel of a feature map acts as a feature detector that focuses on meaningful features. The structure of the channel attention module is shown in Figure 6a. Global max pooling and global average pooling are used to compress the spatial dimension of the feature maps; they can be expressed by Formulas (4) and (5).
g_m = H_mp(F_C) = max_{(i,j)} X_C(i, j) (4)

g_a = H_ap(F_C) = (1/(H × W)) Σ_{i=1..H} Σ_{j=1..W} X_C(i, j) (5)

where H_mp and H_ap perform global max pooling and global average pooling on the input feature map F_C of size H × W, and X_C(i, j) is the value of channel C at position (i, j). The size of the feature maps after compression is C × 1 × 1. Since the feature map contains rich information, we chose Leaky ReLU as the activation function instead of the Sigmoid widely used in attention structures, to suppress gradient vanishing. The process of the channel attention module is formulated in Formula (6):

A_C = δ(Conv(δ(Conv(g_m))) + Conv(δ(Conv(g_a)))) (6)

where A_C is the channel weight of the output, δ is the Leaky ReLU function, and Conv is 1 × 1 convolution. The spatial attention module generates the spatial attention feature map by using the internal spatial relationship between features. It focuses on different spatial information within a feature map. The structure of the spatial attention module is shown in Figure 6b. In the channel dimension, MaxPool and AveragePool are used to aggregate the channel information of the feature map. The processes of MaxPool and AveragePool are formulated in Formulas (7) and (8).
F_max(i, j) = max_C X_C(i, j) (7)

F_avg(i, j) = (1/C) Σ_{c=1..C} X_c(i, j) (8)

After obtaining the maximum and average values over the C channels at each pixel of the input feature map, two cross-channel feature maps of size 1 × H × W (F_max and F_avg) are generated. Finally, they are concatenated and passed through a convolutional layer to produce a 2D spatial attention feature map. The process of the spatial attention module is formulated in Formula (9):

A_S = δ(Conv([F_max ; F_avg])) (9)
where A_S is the spatial weight of the output, δ is the Leaky ReLU function, and Conv is 7 × 7 convolution.
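A minimal PyTorch sketch of the two attention modules and the fusion of Formula (3); the bottleneck reduction ratio, the class names and the stand-alone fuse helper are illustrative assumptions, not details given in the paper:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention of Formula (6): global max/average pooling, a 1x1-conv
    bottleneck per branch, Leaky ReLU (instead of Sigmoid) on the sum."""
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.mlp_max = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1), nn.LeakyReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1))
        self.mlp_avg = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1), nn.LeakyReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1))
        self.act = nn.LeakyReLU(inplace=True)

    def forward(self, F):
        g_m = torch.amax(F, dim=(2, 3), keepdim=True)  # Formula (4), C x 1 x 1
        g_a = torch.mean(F, dim=(2, 3), keepdim=True)  # Formula (5), C x 1 x 1
        return self.act(self.mlp_max(g_m) + self.mlp_avg(g_a))

class SpatialAttention(nn.Module):
    """Spatial attention of Formula (9): channel-wise max and average maps
    concatenated and mixed by a 7x7 convolution, with Leaky ReLU."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.act = nn.LeakyReLU(inplace=True)

    def forward(self, F):
        f_max = torch.amax(F, dim=1, keepdim=True)  # Formula (7), 1 x H x W
        f_avg = torch.mean(F, dim=1, keepdim=True)  # Formula (8), 1 x H x W
        return self.act(self.conv(torch.cat([f_max, f_avg], dim=1)))

def fuse(F, A, mu):
    """Fusion of Formula (3): F* = F + mu * (A (x) F)."""
    return F + mu * (A * F)
```

With `mu` set to 0.5 for the Band 11/12 branch and 0.3 for the Band 5/6/7 branch, as stated in the text.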

Loss Function
The haze restoration network model in this research has many parameters. We chose the L_2 loss to train the network effectively. The L_2 loss is the mean square loss, which has a relatively stable solution and converges effectively. Formula (10) explains the calculation:

L_2 = (1/N) Σ_i Σ_x (Ĵ_i(x) − J_i(x))², i ∈ (2, 3, 4, 5, 6, 7, 8, 11, 12) (10)

where Ĵ_i(x) and J_i(x) represent the dehazed image and the real haze-free image, respectively; i represents the band and N represents the number of image pixels.
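A direct sketch of Formula (10); representing the two-resolution outputs as per-band dictionaries of tensors is our own illustrative choice:

```python
import torch

BANDS = (2, 3, 4, 5, 6, 7, 8, 11, 12)

def l2_dehaze_loss(pred, target):
    """Mean-square (L2) loss of Formula (10), averaged over all pixels of all
    nine bands. `pred`/`target` map a band index to a tensor; bands come in
    two resolutions, so the tensors may have different shapes per band."""
    total, count = 0.0, 0
    for b in BANDS:
        diff = pred[b] - target[b]
        total = total + (diff ** 2).sum()  # sum of squared errors per band
        count += diff.numel()              # running pixel count N
    return total / count
```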

Model Training
Our experiments were performed with PyTorch. The training of the model was conducted on an NVIDIA A100 GPU. The batch size was 4 and AdamW was the optimizer, with betas set to (0.5, 0.999). The initial learning rate of the model was 0.0001. At the same time, CosineAnnealingLR with T_max = 60 and η_min = 1 × 10^-7 was used to adjust the learning rate. During training, 80% of the dataset was used as the training set and 20% as the test set. In addition, we took two sets of hazy and haze-free Sentinel-2 images outside the dataset and used them to produce 150 data pairs as a validation set, following the approach in Section 3.1.
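The optimizer and learning-rate schedule described above can be reproduced in PyTorch as follows; the single-layer stand-in model is illustrative:

```python
import torch

model = torch.nn.Conv2d(4, 4, 3, padding=1)  # stand-in for the dehazing network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.5, 0.999))
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=60, eta_min=1e-7)

# The learning rate follows a cosine curve from 1e-4 down to eta_min over
# T_max steps of the scheduler.
lrs = []
for _ in range(60):
    optimizer.step()      # placeholder for a real training step
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])
```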
Training was performed for a total of 200 epochs, and the model preservation rule was to keep only the model with the best validation-set results. The loss was recorded during training, as shown in Figure 7. It can be seen that after 200 epochs of training, the loss curve settles at a low and flat value, and the curve of the training set is not clearly split from those of the validation set and the test set, indicating that the model has been adequately trained.

Metrics
We used peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and feature similarity (FSIM) as the metrics to evaluate the performance of our models. PSNR is commonly used in image fusion tasks. It measures the ratio between the effective information of the image and the noise, which can reflect whether the image is distorted; the larger the value, the better the quality of the final dehazed image. SSIM describes structural similarity; the closer it is to 1, the higher the similarity with the haze-free image, which indicates a better dehazing effect. FSIM is a variant of SSIM that uses phase congruency to focus more on the contribution of different local features to the overall structure of the image. Again, the closer the value is to 1, the higher the similarity.
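A minimal sketch of the PSNR metric for images scaled to [0, 1] (SSIM and FSIM involve windowed statistics and phase congruency and are usually taken from an image-quality library rather than written by hand):

```python
import numpy as np

def psnr(pred, ref, max_val=1.0):
    """Peak signal-to-noise ratio between a dehazed band and its haze-free
    reference, both scaled to [0, max_val]. Higher is better."""
    mse = np.mean((pred.astype(np.float64) - ref.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```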

Experimental Results
In this section, we apply the dehazing model to hazy multispectral images captured by Sentinel-2. We also compare our model with the traditional method DCP [12] and four neural network methods: DehazeNet [14], AOD-Net [16], GridDehazeNet [19] and MSBDN [43]. For each hazy image used in the experiments, the corresponding clear-sky image within a week was collected and used as the haze-free reference. Figure 8 demonstrates the dehazing results for the nine bands. It can be observed that the visible light bands as well as Band 5, Band 6 and Band 7 are heavily affected by haze. The restoration results of DCP, DehazeNet and AOD-Net have haze residues, while GridDehazeNet, MSBDN and our proposed method leave no haze residue after dehazing; MSBDN shows a certain color distortion. The infrared bands (Band 8, Band 11 and Band 12) are less affected by haze, and all of the methods achieve good recovery effects on them visually. For the visible light bands, the most affected bands, the dehazing effect of DCP, DehazeNet and AOD-Net is relatively poor; MSBDN shows a certain color distortion, while GridDehazeNet and our proposed method have better fidelity. Tables 2-4 present the performance evaluation using PSNR, SSIM and FSIM. It can be observed that the results are basically consistent with the visual effects. Meanwhile, our proposed method significantly outperforms GridDehazeNet and MSBDN, the two baselines with the better dehazing effects. This indicates that the method in this paper can better maintain ground details and color fidelity.

In this study, we conducted dehazing experiments on different cases of hazy images. Figure 9 shows the case of slightly hazy images. The results show the dehazing of the visible true colors (Band 2, Band 3 and Band 4). It can be observed that the haze is unevenly distributed, and haze shadows also exist in Figure 9A(a). The traditional method DCP works effectively in the area with a uniform haze distribution but not in the non-uniform part or the haze shadow area.
DehazeNet and AOD-Net have the same problem as DCP. GridDehazeNet and MSBDN can effectively remove the haze effect, but MSBDN shows a certain color distortion after restoration. Compared with these two methods, our proposed method shows outstanding performance in maintaining color and ground details.

Figure 10 shows the visible true color dehazing result for an image with moderate haze interference. It can be observed that the results are similar to Figure 8. GridDehazeNet and MSBDN remove the haze, but they also lose ground details to a certain extent. Our proposed method maintains the ground details better.

Figure 11 shows the visible true color dehazing results of an image with heavy haze. DCP, DehazeNet and AOD-Net demonstrate poor restoration. MSBDN performs poorly in maintaining the details of ground objects in Figure 11A(f), and there is still residual haze in the upper part of Figure 11B(f). GridDehazeNet shows good restoration, but there is still a certain loss of ground details in Figure 11A(e), along with some haze residues and some color distortion. Our proposed method demonstrates outstanding performance in both haze removal and ground detail preservation.

Tables 5-7 list the PSNR, SSIM and FSIM of the above experiments. It can be seen that for the images with slight haze interference, the PSNR and FSIM values are close. DCP, DehazeNet and AOD-Net achieve relatively close results; MSBDN is slightly better and GridDehazeNet is even better. Our proposed method outperforms all these methods.
For moderate and heavy hazy images, the dehazing effect of the first three methods drops sharply, while the results of GridDehazeNet and MSBDN show a slight drop. Our proposed method maintains a stable and good recovery effect, which, again, outperforms all other methods.

Ablation Experiment
In order to validate the role of the different structures in our network, we performed ablation experiments, focusing on the multi-input feature fusion structure and the attention module. First, we kept only the main input/output structures corresponding to Band 2, Band 3, Band 4 and Band 8 in the network. We then upsampled the remaining five bands of the training data with 20 m resolution and set the size of the input/output images to 1024 × 1024. At the same time, we removed the spatial attention and channel attention modules. We used this version as the baseline. After that, we added the structural modules one by one as Model-1 to Model-6. We used hazy images outside the training set as the verification dataset and calculated the average PSNR, SSIM and FSIM of 100 verification images for quantitative evaluation. Table 8 lists the results. It can be observed that with the multi-input structure, the dehazing effect is greatly improved. The results are further improved by adding the attention module.

Conclusions
Traditional dehazing methods rely on prior features and are less versatile, which makes them less applicable, especially for remote sensing images with widespread non-uniform haze. In recent years, deep learning methods have been applied for automatic feature extraction. However, the structure of remote sensing images with haze is relatively complex, and it is difficult for general neural networks to effectively extract features. Meanwhile, there are very few network models targeting multi-band remote sensing images. In this research, we propose a multi-input, multi-spectral remote sensing image dehazing network, which effectively utilizes the haze-penetrating capability of the infrared bands. We use global skip connections and attention modules to achieve effective feature extraction while maintaining ground details. Finally, we designed experiments to test the performance of the proposed method on multispectral images captured by Sentinel-2 with different degrees of haze. Our method can effectively restore the images. It outperforms the traditional dark channel method and several neural network methods, such as DehazeNet, AOD-Net, MSBDN and GridDehazeNet, in terms of haze residues and quantitative evaluation metrics.
Meanwhile, there are some limitations in this research. First, the training dataset is not categorized based on the type of haze, which could impact the effectiveness of the proposed model. Second, even though the ground details are well maintained in the restored images, there is still some loss compared to the haze-free images. In future work, we will formulate an indicator describing the degree of the haze effect and use it to classify the images in the training dataset. At the same time, we will improve the model by referring to methods that effectively improve detail resolution in super-resolution research.