Article

Cloud Contaminated Multispectral Remote Sensing Image Enhancement Algorithm Based on MobileNet

1 School of Mechanical and Electrical Engineering, Chengdu University of Technology, Chengdu 610059, China
2 State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
3 Key Laboratory of Spectral Imaging Technology CAS, Xi’an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences, Xi’an 710119, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(19), 4815; https://doi.org/10.3390/rs14194815
Submission received: 16 August 2022 / Revised: 22 September 2022 / Accepted: 22 September 2022 / Published: 27 September 2022

Abstract

Multispectral remote sensing images have shown unique advantages in many fields, both military and civilian. To address the difficulty of processing cloud contaminated remote sensing images, this paper proposes a multispectral remote sensing image enhancement algorithm. The model is built from two aspects: cloud detection and image enhancement. In the cloud detection stage, clouds are divided into thick clouds and thin clouds according to their transmittance in multispectral images, and a multi-layer cloud detection model is established. From the perspective of traditional image processing, a bimodal pre-detection algorithm is constructed to extract thick clouds. From the perspective of deep learning, the MobileNet architecture is improved to extract thin clouds. Faced with insufficient training samples, a self-supervised network is constructed for training, so that high-precision and high-efficiency cloud detection can be achieved under small-sample conditions. In the image enhancement stage, the regions containing ground objects are determined first. Then, from the perspective of compressed sensing, the signal is analyzed in both the time and frequency domains. Specifically, the inter-band information of the multispectral images is analyzed to construct a sparse representation model based on the principle of compressed sensing. Finally, image enhancement is achieved. Experimental comparisons with other algorithms show that the proposed algorithm reaches an average Area Overlap Measure (AOM) of 0.83 and an Average Gradient (AG) of 12.7, exceeding the other seven algorithms by about 2 in average AG.


1. Introduction

Remote sensing images are formed by the response of specific spectral bands to ground objects. Because remote sensing images generally contain multiple spectral bands, clouds inevitably appear, and the cloud regions seriously affect the overall imaging quality [1]. Clouds have a certain transmittance in specific spectral bands, and this transmittance is related to cloud thickness. We regard clouds with better transmittance as thin clouds: the information beneath them can still be observed in some bands, although it is blurred. We regard clouds that are difficult to transmit as thick clouds: little or no information beneath them appears in the collected data. For this reason, extensive research has been conducted on cloud detection and image enhancement.
Generally, cloud detection algorithms are divided into traditional algorithms and deep learning algorithms. On the one hand, traditional algorithms mainly analyze the difference between cloud contaminated areas and other areas from the perspective of the image itself. Representative approaches include the threshold method [2,3,4], which determines a threshold from the statistical characteristics of image pixels to detect cloud areas; this kind of algorithm is fast. The filtering method [5,6,7,8,9] constructs a representative filtering model and segments cloud areas according to the different response strengths of clouds and other regions; it detects thick cloud areas well but suffers from under- or over-segmentation for thin clouds. The feature method [10,11,12,13,14] constructs representative spatial and spectral features to extract cloud areas; however, it analyzes local areas, is easily affected by noise, and produces discontinuous segmentation. The multi-source and multi-temporal fusion method [15,16,17,18] builds an information-complementary model from the different characteristics of different sensors, or from the correlation of multiple acquisitions at the same location; it can detect clouds well but increases the cost of data acquisition and the difficulty of preprocessing, such as data registration. On the other hand, deep learning algorithms build transmission response models inspired by human cognition to achieve intelligent identification. Representative approaches are as follows. Algorithms based on classification ideas [19,20,21,22] analyze the spectral response difference between clouds and other areas using previously labeled data, exploiting the principle of large inter-class gaps to detect clouds. Algorithms based on clustering ideas [23,24,25,26] analyze the common attributes of different clouds and use the principle of small intra-class differences to detect clouds. These two types of algorithms generalize better than traditional algorithms, but their detection accuracy still needs improvement. Multi-source feature response fusion [27,28] builds a deep model that extracts representative features from the response differences of different targets and different sources; however, such methods increase hardware costs. Convolution response algorithms [29,30,31] construct a mapping from the input image to the detection map; they achieve good results but have drawbacks: the number of parameters is large, training difficulty depends on the network structure, and feature interpretability is low. Overall, existing research shows that deep learning algorithms outperform traditional ones. Traditional algorithms have good interpretability and fast computation. By comparison, deep learning algorithms show good detection performance but require a large amount of labeling and training work in the early stage, and their feature interpretability is low.
At present, image enhancement algorithms are divided into image-based algorithms and imaging-based algorithms. On the one hand, image-based algorithms analyze the composition of image features to enhance texture and color. Representative approaches include the texture enhancement algorithm [32,33,34], which constructs texture extraction operators to strengthen weakened texture information and works well on texture-rich images. The color enhancement algorithm [35,36,37] builds a color mapping model to achieve image enhancement and also works better on texture-rich images. Pixel statistics algorithms [38,39,40,41,42] build a remapping model from the overall pixel distribution; they highlight the visual effect of the image, but chromatic aberration may appear. The multi-temporal and multi-source information fusion algorithm [43,44,45,46] enhances images by analyzing the relationship between different acquisition periods and signal sources, but it increases hardware cost. Algorithms based on deep networks [47,48,49,50,51] build a mapping model between annotated images and actual images; they depend strongly on data. On the other hand, imaging-based algorithms enhance images by modeling the imaging sensor, the environment, and other factors. The transfer function algorithm [52,53,54,55] constructs the transfer function of the imaging system and corrects the imaging result; it places strict requirements on the environment and sensor parameters. The specific-condition enhancement algorithm [56,57,58] builds a model for a specific scene according to the actual application requirements and is therefore only applicable to limited scenes. In general, image-based and imaging-based algorithms build models from different angles, and both can achieve image enhancement; their universality, however, remains a key open issue in the field.
From the above analysis, in cloud contaminated remote sensing image enhancement, cloud detection and image enhancement are usually not combined, and the main problems are as follows. (1) Traditional image features have limited representation ability, so the cloud area extraction effect is poor. (2) The execution efficiency of existing network structures is low and fails to meet the needs of practical applications. (3) The amount of cloud contaminated image data is small, so model generalization is poor. (4) The correlation between image formation and image processing is weak, so the image enhancement effect needs improvement.
From the aspects of cloud detection and remote sensing image enhancement, this research establishes a model and proposes a cloud contaminated multispectral remote sensing image enhancement algorithm based on MobileNet. (1) From the perspective of traditional features, a multimodal pre-detection algorithm is constructed. (2) From the perspective of deep learning, considering performance and practical application requirements, we improve MobileNet to fully extract target features. Based on the existing small sample data, a self-supervised training algorithm based on dilated convolution is constructed to learn features, and finally, cloud labeling is realized. (3) By analyzing the composition of the imaging signal, a multispectral fusion algorithm based on compressed sensing is proposed to achieve image enhancement.
The rest of this paper is organized as follows. Section 2 introduces the main framework of the proposed algorithm and presents the multi-layer cloud detection and compressed-sensing-based image enhancement algorithms. Section 3 introduces the database used in the experiments and presents the main image data. Section 4 verifies the effectiveness of the proposed algorithm through a large number of experiments. Section 5 summarizes the innovations of this paper and possible follow-up work.

2. Methods

The overall flow of the proposed algorithm is shown in Figure 1. Based on multimodal detection of thick cloud areas, MobileNet is improved to achieve fast and accurate thin cloud detection. The algorithm marks thick cloud, thin cloud, and cloudless regions; because the spectrum cannot penetrate regions covered by thick cloud, the marked thick cloud regions are not processed further. Facing small cloud samples, a self-supervised training algorithm is used to enhance the robustness of the model. On this basis, the compressed-sensing-based multispectral fusion algorithm enhances the thin cloud and cloudless regions separately, and a linear transformation is established to maintain image consistency.

2.1. MobileNet

The MobileNet model is a lightweight model proposed by Google [59]. Currently, in response to the need for higher accuracy in image detection tasks, networks tend to be deeper and more complex, which inevitably leads to a large number of parameters and thus consumes a lot of computational resources. To balance the demands for time and performance, the MobileNet structure was accordingly proposed. On the basis of the traditional MobileNet structure, three typical structures have been derived.
MobileNet v1 is a lightweight neural network built on a streamlined structure using depthwise separable convolution, with two hyperparameters that give developers greater freedom in selecting a suitable model [60]. The network has 28 layers, counting each depthwise convolution and pointwise convolution as separate layers. The depthwise convolution applies one single-channel filter per input channel, so the number of convolution kernels equals the number of input channels. The pointwise convolution kernel has size 1 × 1 × M and enriches the connections between channels for feature extraction.
MobileNet v2 focuses on the linear bottleneck problem [61]. The ReLU activation function, despite its effectiveness in increasing the nonlinear representation of features, works well only in high dimensional space; applying ReLU again after the dimensionality has been reduced destroys the features. Therefore, MobileNet v2 uses a Linear Bottlenecks structure, in which no nonlinear activation is added after the dimensionality-reducing convolutional layer, so as to lose as little information as possible. In addition, an inverted residual structure is introduced, based on the ResNet bottleneck that consists of two 1 × 1 convolutions and one 3 × 3 convolution. In MobileNet, the number of channels in the depthwise convolution equals the number of input channels, so compressing first and then convolving, as in the residual bottleneck, would yield too few features. MobileNet v2 therefore takes the inverse approach: it first expands, then convolves, and then compresses to extract the final features.
MobileNet v3 proposes a new activation function, h-swish, improved from the swish activation function [62]. The swish function is unbounded above, bounded below, smooth, and non-monotonic, and it outperforms ReLU on deep models. The network structure also allows the model to use global information to enhance useful features while suppressing useless ones, by compressing, activating, and re-weighting different layers.
MobileNet adopts a top-down streamlined architecture, as shown in Figure 2, which mainly consists of standard convolutional layers (Conv Std), depthwise convolutional layers (Conv dw), and rectified linear units (ReLU).
The network replaces traditional convolution with depthwise separable convolution, which greatly reduces the number of model parameters. Depthwise separable convolution can be divided into channel convolution and point convolution. The channel convolution adopts a 3 × 3 convolution kernel, and the point convolution adopts a 1 × 1 × M convolution kernel, where M corresponds to the number of channels in the previous layer to complete the weighted fusion of multi-channel images. The number of parameters is greatly reduced, as shown in Figure 3.
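The following PyTorch sketch illustrates the depthwise separable convolution described above; the layer sizes and the parameter comparison in the closing comment are illustrative, not taken from the paper's implementation.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise (per-channel 3x3) convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups = in_channels)
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride,
                                   padding=1, groups=in_channels, bias=False)
        self.bn1 = nn.BatchNorm2d(in_channels)
        # Pointwise: 1x1 convolution that mixes information across the M channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

# Parameter count for a 3x3 layer mapping 64 -> 128 channels:
#   standard convolution:     3*3*64*128 = 73,728 weights
#   depthwise separable form: 3*3*64 + 1*1*64*128 = 8,768 weights (roughly 8x fewer)
```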
From the above analysis, it is clear that MobileNet is lightweight and optimized at the convolutional level. However, for multispectral remote sensing image data, this paper studies how to effectively focus on cloud contaminated areas, mark them, and then guide the subsequent image enhancement.

2.2. Cloud Detection Algorithm Based on Improved MobileNet

In traditional feature analysis, there is a certain difference between the pixel distribution of clouds and that of other targets in the image, but traditional analysis algorithms cannot completely detect the cloud contaminated area. By comparison, deep learning networks are generally sensitive to the input data and have many parameters, which cannot meet the requirements of fast and accurate computation. To this end, we integrate traditional image features and deep features to construct a multi-layer cloud detection algorithm.
For single band images with thick clouds, the histogram shows obvious double peaks. For images with few clouds, there is no obvious double peak structure but a gentle extension of the histogram. The threshold method locates the trough of the histogram to detect clouds; here, a discretization window is used to build the model and perform the filtering. For cloudy images, the OTSU algorithm [63] is used, which sets the threshold on statistical principles so as to maximize the inter-class variance. For a satellite image of ground objects, the histogram can be regarded as a multimodal mixture of several distributions. The fitting process is as follows:
g(x) = \sum_{j=1}^{M} \alpha_j P(x_i \mid \varphi_j)
P(x_i \mid \varphi_j) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left( -\frac{(x_i - \mu_j)^2}{2\sigma_j^2} \right)
where g(x) is the histogram fitting curve, M is the number of Gaussian components (which can be selected by the Bayesian Information Criterion), x_i is the gray value of pixel i, P(x_i|φ_j) is the j-th Gaussian distribution function, α_j is the weight, μ_j is the mean, and σ_j is the standard deviation.
We use the EM algorithm to estimate the parameters of the Gaussian mixture model. Specific steps are as follows. Firstly, set the initial parameters of M, α, µ, σ, and calculate the posterior probability:
\beta_{ij} = \frac{\alpha_j P(x_i \mid \varphi_j)}{\sum_{l=1}^{M} \alpha_l P(x_i \mid \varphi_l)}, \quad i \in [1, N], \ j \in [1, M]
Update the weights, the means, and the variance term C:
\alpha_j = \frac{1}{N} \sum_{i=1}^{N} \beta_{ij}, \qquad \mu_j = \frac{\sum_{i=1}^{N} x_i \beta_{ij}}{\sum_{i=1}^{N} \beta_{ij}}, \qquad C_j = \frac{\sum_{i=1}^{N} \beta_{ij} (x_i - \mu_j)^2}{\sum_{i=1}^{N} \beta_{ij}}
When all parameters converge, the final parameters are obtained and the characteristics of each Gaussian component are estimated; µ + 3σ is used as the dividing line.
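As an illustration, this histogram-fitting step can be prototyped with a Gaussian mixture fitted by EM. The snippet below is a minimal sketch using scikit-learn's GaussianMixture: it selects M by BIC and, as an assumption, treats the largest-weight component as ground background, thresholding candidate cloud pixels at its µ + 3σ.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def thick_cloud_threshold(gray_image, max_components=5, random_state=0):
    """Fit a Gaussian mixture to the gray levels and return a mu + 3*sigma threshold."""
    x = gray_image.reshape(-1, 1).astype(np.float64)

    # Choose the number of components M by the Bayesian Information Criterion
    models = [GaussianMixture(n_components=m, random_state=random_state).fit(x)
              for m in range(1, max_components + 1)]
    best = min(models, key=lambda g: g.bic(x))

    # Assumption: the largest-weight component models the ground background;
    # pixels brighter than its mu + 3*sigma are flagged as candidate thick cloud.
    idx = int(np.argmax(best.weights_))
    mu = best.means_.ravel()[idx]
    sigma = np.sqrt(best.covariances_.ravel()[idx])
    return mu + 3.0 * sigma
```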
For thin cloud images, traditional algorithms cannot effectively extract the cloud area. In practical applications, the balance between accuracy and efficiency must be considered comprehensively, so we build on the lightweight deep feature network MobileNet. To further improve its feature extraction ability, we draw on the idea of ViT (Vision Transformer), which applies the transformer directly to image processing. Experiments on the same dataset show that as the network depth increases, the performance of ViT saturates and the attention maps become increasingly similar beyond a certain depth. To solve this problem caused by the self-attention layer, DeepViT replaces the self-attention module in ViT with a re-attention module, so its performance continues to improve as the model depth increases. The DeepViT model structure is shown in Figure 4.
The similarity between the attention maps of different transformer blocks is relatively high, especially in the deep layers, whereas the similarity between the attention maps of different heads within the same block is very small. Therefore, we exchange the attention of the heads:
\mathrm{RAttention}(Q, K, V) = \mathrm{Norm}\left( \theta^{T} \, \mathrm{Softmax}\!\left( \frac{Q K^{T}}{\sqrt{d}} \right) \right) V
where Q and K are the matrices used to generate the attention map, d is the scaling factor, and θ is the transformation matrix.
The advantages of RAttention are as follows. (1) It uses the interaction between different attention heads to collect complementary information between them, which can better improve the diversity of attention maps. (2) The program of RAttention is very simple and easy to implement.
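A minimal PyTorch sketch of this re-attention idea follows; the module name ReAttention, the head count, and the use of BatchNorm as the Norm(·) operator are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ReAttention(nn.Module):
    """Multi-head self-attention whose per-head attention maps are remixed by a learnable theta."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        # theta mixes attention maps across heads (initialised near the identity)
        self.theta = nn.Parameter(torch.eye(num_heads) + 0.01 * torch.randn(num_heads, num_heads))
        self.norm = nn.BatchNorm2d(num_heads)    # stands in for Norm(.) over the mixed maps
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (B, N, dim)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]         # each (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        # Exchange information between heads: theta acts on the head dimension
        attn = torch.einsum('hg,bgnm->bhnm', self.theta, attn)
        attn = self.norm(attn)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```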
Given the lightweight advantages of the MobileNet network model and the performance advantages of DeepViT, we design a lightweight convolutional neural network model MobileDViT to reduce the number of parameters while maintaining high accuracy.
There are seven bottleneck layers in the MobileNet backbone. The feature map output by MobileNet has size H × W × C. It is first processed by an n × n convolution and a 1 × 1 convolution, producing a feature map of size H × W × d. This is fed into the transformer, and a 1 × 1 pointwise convolution is applied to the output to obtain a feature map of size H × W × C. This output is then concatenated with the original feature map, giving H × W × 2C, which is finally converted back to H × W × C by an n × n convolution whose input channels are doubled while the output channels remain unchanged. In the overall MobileDViT block, the output of each convolution operation passes through a BN layer and the nonlinear activation function ReLU, and the stride of the MobileDViT block is 1 throughout the network. The n × n convolution in the MobileDViT block is a standard convolution; its structure is shown in Figure 5.
The MobileDViT block is similar to a convolution in that it uses two standard convolutions (n × n and 1 × 1) plus a transformer with RAttention, replacing the local processing in convolution with deeper global processing; the MobileDViT block can therefore be regarded as a convolution. Moreover, it combines properties of Convolutional Neural Networks (CNNs) and ViT, which helps achieve better results with fewer parameters and a simple training procedure. A sketch of such a block is given below.
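The sketch below assembles such a block in PyTorch, reusing the ReAttention module sketched earlier; the channel sizes, transformer depth, and layer names are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MobileDViTBlock(nn.Module):
    """Local n x n convolution, transformer with re-attention, concatenation and n x n fusion."""
    def __init__(self, channels, d_model=96, n=3, depth=2, num_heads=4):
        super().__init__()
        self.local = nn.Sequential(              # n x n local representation + 1 x 1 to d channels
            nn.Conv2d(channels, channels, n, padding=n // 2, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, d_model, 1, bias=False))
        self.transformer = nn.ModuleList([ReAttention(d_model, num_heads) for _ in range(depth)])
        self.proj = nn.Conv2d(d_model, channels, 1, bias=False)   # back to C channels
        self.fuse = nn.Sequential(               # fold the concatenated 2C map back to C
            nn.Conv2d(2 * channels, channels, n, padding=n // 2, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, x):
        B, C, H, W = x.shape
        y = self.local(x)                        # (B, d, H, W)
        tokens = y.flatten(2).transpose(1, 2)    # (B, H*W, d): global processing
        for blk in self.transformer:
            tokens = tokens + blk(tokens)        # residual re-attention layers
        y = tokens.transpose(1, 2).reshape(B, -1, H, W)
        y = self.proj(y)
        return self.fuse(torch.cat([x, y], dim=1))   # concat with input, n x n fusion
```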
The Softmax regression model is used in training because of its good classification performance. By computing the probability that the input feature vector belongs to each target class, the target is assigned to the class with the highest probability. The class label y output by the Softmax model is a k-dimensional real vector, where k is the number of classes. The probability that the model outputs class j is:
p(y_i = j \mid x_i; \theta) = \frac{\exp(\theta_j^{T} x_i)}{\sum_{l=1}^{k} \exp(\theta_l^{T} x_i)}
where θ is the parameter of the Softmax model. For the test sample x, the probability of sample x belonging to each class is estimated by:
h(x, \theta) = \begin{bmatrix} p(y = 1 \mid x; \theta_1) \\ \vdots \\ p(y = k \mid x; \theta_k) \end{bmatrix} = \frac{1}{Z} \begin{bmatrix} \exp(\theta_1^{T} x) \\ \vdots \\ \exp(\theta_k^{T} x) \end{bmatrix}
Z = \sum_{j=1}^{k} \exp(\theta_j^{T} x)
where Z is the normalization factor. The likelihood estimate is used as the loss function:
J(w) = -\sum_{i=1}^{m} \sum_{j=1}^{k} Q\!\left(y_j^{\,i}\right) \log M(w)
M(w) = \frac{\exp(w_j^{T} x_i)}{\sum_{l=1}^{k} \exp(w_l^{T} x_i)}
where Q(y_j^i) is the indicator function, equal to 1 when y_i belongs to class j and 0 otherwise. However, the solving process is complicated, so a weight decay term is added to give a unique solution; the corresponding loss function is:
J(w) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} Q\!\left(y_j^{\,i}\right) \log M(w) + \frac{\lambda}{2} \sum_{i=1}^{m} \sum_{j=1}^{k} \theta_{ij}^{2}
where λ > 0 and J is a convex function. Taking the partial derivative with respect to θ, the gradient descent method is used to obtain the global optimum of θ. Based on this, locally non-overlapping sliding windows are used to search pixels and estimate the concentration of the cloud contaminated area:
h(x, y) = \min_{(x, y) \in W} I(x, y)
where W is a w × w window centered at (x, y). Edge detection experiments show that, for a blurred image, the local haze thickness map (HTM) is constant or changes little, and clouds are concentrated in the low frequency part. To reduce noise interference at the edges, the following function is constructed:
H(x, y) = \min\left( h_i(x, y), \, h_m \right), \qquad h_m = \frac{1}{n} \sum_{(x, y) \in \Omega} h_i(x, y)
After obtaining the cloud distribution map, it is necessary to restore the original image size by interpolation. However, this step is susceptible to noise, produces large gradient responses at edges, and, because the sliding windows do not overlap, leads to blocking artifacts. Gaussian curvature filtering is therefore used to suppress noise and smooth edges. Curvature filtering minimizes the relevant curvature, and hence the regularization term, from the perspective of differential geometry; the curvature is computed by an implicit solution based on the continuity of differential geometry, while the discreteness of the data preserves the gradients and edge details of the image. The target area covered by thin clouds is then enhanced, and the cloud contaminated area is detected. A sketch of the window-minimum HTM step is given below.
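The following NumPy/OpenCV sketch is a rough illustration of that step; the window size, the use of the global mean as a stand-in for the neighbourhood mean over Ω, and Gaussian smoothing in place of Gaussian curvature filtering are simplifying assumptions.

```python
import numpy as np
import cv2

def haze_thickness_map(gray, w=15):
    """Non-overlapping w x w window minimum, clipped by the mean and resized back."""
    H, W = gray.shape
    ph, pw = (-H) % w, (-W) % w                  # pad so the image tiles evenly
    padded = np.pad(gray, ((0, ph), (0, pw)), mode='edge').astype(np.float32)
    blocks = padded.reshape(padded.shape[0] // w, w, padded.shape[1] // w, w)
    h = blocks.min(axis=(1, 3))                  # h(x, y): minimum inside each window
    h_m = h.mean()                               # global mean, standing in for the mean over Omega
    htm = np.minimum(h, h_m)                     # H(x, y) = min(h_i, h_m)
    htm = cv2.resize(htm, (W, H), interpolation=cv2.INTER_LINEAR)
    return cv2.GaussianBlur(htm, (5, 5), 0)      # smoothing stand-in for curvature filtering
```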

2.3. Self-Supervised Training Algorithm

Due to the limited number of samples, a transfer learning algorithm is needed to improve robustness. The transfer algorithm adopted here mainly consists of a feature extractor and a classifier.
Feature Extractor. The residual attention dilated network is designed as a feature extractor, the dilated branch enables the network to extract features of different sizes, and the residual attention can enhance the model’s attention to important features.
Classifier. The fully connected layer and the Softmax layer are used as the class classifier. The fully connected layer processes the output of the feature extractor. The Softmax output is used for normalization to obtain the probability that the input data belongs to each class. To alleviate the over fitting phenomenon in the few-shot learning algorithm, the cross entropy loss function, with the addition of the L2 regularization term, is used as the objective function of the network model.
Traditional convolutional networks use fixed size convolution kernels, which can only mine features of a fixed size and lack attention to important features. To this end, we construct a residual attention dilated network. First, on the basis of the traditional convolutional network, a convolution with an enlarged sampling interval is adopted to expand the receptive field, so that features of different sizes can be extracted.
Using the Resnet12 network as a feature extractor and adding dilated convolution branches to it can not only expand the receptive field of the network but also extract features of different sizes. The commonly used methods to increase the receptive field of the convolutional network model are increasing the size of the convolution kernel or increasing the adjacent interval of the convolution kernel. A fixed size convolution kernel can only obtain the features of a fixed size, and an excessively large size convolution kernel cannot capture local delicate features. Therefore, this paper expands the field of view by increasing the convolution interval.
Based on the traditional Resnet model structure, a dilated convolution branch is added to the block of the feature extractor, and a new residual module is proposed.
The output Xi of each block in Figure 6 is expressed as:
X_i = \mathrm{Conv}(X_{i-1}) + X_{i-1}
X_i = P_{\max}\!\left( X_i^{\mathrm{left}}, \, X_i^{\mathrm{right}} \right)
where P_max represents max pooling, X_{i-1} represents the output of the previous block, X_i^{left} represents the output of the left branch of the block, and X_i^{right} represents the output of the right branch of the block. The update process is:
X_i^{\mathrm{left}} = \mathrm{Conv}(X_{i-1}) + X_{i-1}, \qquad X_i^{\mathrm{right}} = \mathrm{Conv}_{A2}(X_{i-1}) + X_{i-1}
where the dilated convolution branch captures larger-scale features, and each branch independently updates its scaling and shifting parameters to enhance adaptability. To strengthen the model's attention to important features and improve its ability to extract data, an attention mechanism is introduced into each block, as shown in Figure 7.
The corresponding attention vector is expressed by:
M_i(X_{i-1}) = D(X_{i-1}) + X_{i-1} \, U\!\left( D\!\left( D(X_{i-1}) \right) \right)
where D(.) is the subsampling using max pooling, and U(.) is the upsampling by bilinear interpolation. Then the output of each block is:
X_i = X_i + X_i \cdot M_i(X_{i-1})
Max pooling is not modified by back propagation, so scale invariant features are obtained and the characteristics of the image content are fully expressed. The resulting convolutional network enlarges the receptive field, extracts features of different sizes, and enhances attention to important features, thereby improving the model's ability to extract data features. A sketch of such a block follows.
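The PyTorch sketch below illustrates such a two-branch residual attention dilated block; the dilation rate, the sigmoid bounding of the attention map, and the layer sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualAttentionDilatedBlock(nn.Module):
    """Standard and dilated residual branches fused by element-wise max, plus an attention term."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.dilated = nn.Sequential(            # dilation 2 enlarges the receptive field
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def attention(self, x):
        # M(x): max-pool down-sampling D(.) followed by bilinear up-sampling U(.)
        d = F.max_pool2d(F.max_pool2d(x, 2), 2)
        u = F.interpolate(d, size=x.shape[-2:], mode='bilinear', align_corners=False)
        return torch.sigmoid(u)                  # sigmoid keeps the attention bounded (an assumption)

    def forward(self, x):
        left = self.conv(x) + x                  # standard residual branch
        right = self.dilated(x) + x              # dilated residual branch
        out = torch.maximum(left, right)         # element-wise max fusion P_max
        return out + out * self.attention(x)     # X_i + X_i * M_i(X_{i-1})
```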
In the pre-training stage of transfer learning shown in Figure 8, if the dataset used to obtain the basic network differs greatly from the new dataset, the generalization ability of the model is limited. We therefore propose a self-supervised pre-training algorithm: by rotating the image data and establishing labels from the image structure information, the structure of the image itself is fully mined, which increases the supervision information in the training task and enhances the generalization ability of the model.
The algorithm process is divided into two stages: pre-training and meta-learning. Self-supervision mainly uses auxiliary tasks to mine its own supervision information from large scale unsupervised data. The network is trained with this constructed supervision information.
A rotation-based self-supervised pre-training algorithm rotates each batch of images and uses the rotation angle as supervision information to create additional labels, and a rotation angle classifier is built to train the model. In this stage, the feature extractor φ, the conventional class classifier θ, and the rotation angle classifier θs for the self-supervised auxiliary task are randomly initialized and optimized by gradient descent:
(\varphi, \theta, \theta_s) \leftarrow (\varphi, \theta, \theta_s) - \alpha \nabla L_D(\varphi, \theta, \theta_s)
L_D = \lambda L_{\mathrm{class}} + (1 - \lambda) L_{\mathrm{self}}
where L_D is the combined cross entropy loss and α is the learning rate. A sketch of this training step is given below.
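The following PyTorch sketch shows one such pre-training step under stated assumptions: the head modules and the value of λ are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def self_supervised_step(feature_extractor, class_head, rot_head, images, labels, lam=0.7):
    """One pre-training step with rotation prediction as the self-supervised auxiliary task."""
    # Rotate every image by 0/90/180/270 degrees; the rotation index is a free label
    rotations = [torch.rot90(images, k, dims=(2, 3)) for k in range(4)]
    rot_images = torch.cat(rotations, dim=0)
    rot_labels = torch.arange(4, device=images.device).repeat_interleave(images.size(0))

    loss_self = F.cross_entropy(rot_head(feature_extractor(rot_images)), rot_labels)   # L_self
    loss_class = F.cross_entropy(class_head(feature_extractor(images)), labels)        # L_class
    return lam * loss_class + (1.0 - lam) * loss_self                                  # L_D
```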

2.4. Multispectral Fusion Algorithm Based on Compressed Sensing

Through the two preceding sections, thin cloud and thick cloud detection under small-sample conditions has been achieved. We now focus on the cloudless and thin cloud areas to highlight ground features. Since different spectral bands carry different information, the complementarity between the band images must be fully exploited to obtain clearer images. However, the mainstream contourlet transform has the following shortcomings. (1) It is not translation invariant. (2) Frequency aliasing appears during the sampling process. (3) The directional information extracted from rectangular images is insufficient. To solve these problems, the contourlet transform is improved in this research: we retain the advantage of multi-scale transformation and eliminate the subsampling step of the two-layer transformation in the contourlet transform, thereby obtaining translation invariance and avoiding frequency aliasing in the fused image. A signal is sparsely representable if it can be linearly represented on a sparse basis, and multispectral images can be regarded as sparse. For this reason, a fusion algorithm based on the principle of compressed sensing is constructed, as shown in Figure 9.
The multispectral image sequence {M} is decomposed by nonsubsampled contourlet transform (NSCT) to obtain high frequency sub band images {H1, H2, … Hs} and low frequency sub band images {L1, L2, … Ls}.
For the high frequency sub band images, the average gradients {g1, g2, …, gs} are calculated first. Then, the average gradient value GF is obtained using the compressed-sensing-based high frequency fusion principle, and the high frequency fusion coefficient HF is obtained by the orthogonal matching pursuit algorithm.
For the low frequency sub band images, we first perform sparse dictionary learning to build an overcomplete sparse dictionary D, then carry out sparse coding to obtain the sparse coefficients {α1^i, α2^i, …, αs^i}. The low frequency fusion coefficient LF is then obtained according to the dictionary-learning-based low frequency sparse fusion rule.
The image is obtained by fusing high frequency and low frequency information. Since NSCT contains two transforms, in order to give full play to the characteristics of the two transforms, we adopt a two-layer decomposition structure, as shown in Figure 10. The first layer uses a non-subsampled pyramid filter bank (NSPFB) to obtain the low frequency sub band image IL and the high frequency sub band image IH. The second layer uses a non-subsampled directional filter bank (NSDFB) to decompose the IH to obtain the corresponding high frequency sub band images IHa and IHb. Using NSPFB to decompose IL, the corresponding low frequency sub band images ILa and ILb are obtained.
Because the images of different infrared bands contain insufficient information individually, adopting the maximum fusion rule or the weighted average fusion rule would leave a lot of noise in the fused image and degrade the display effect. Since a sparse dictionary can linearly represent the sparse code of the image, useful information can be effectively selected from it.
According to the {H1, H2, …, Hs} and {L1, L2, …, Ls} obtained by NSCT, each image is divided into blocks with a sliding window whose step size is s pixels, producing q × q image blocks. {P_L^i}, i = 1, …, T, denotes the blocks of L, with T blocks in total. The blocks P_1^i, P_2^i, …, P_s^i are converted into column vectors V_1^i, V_2^i, …, V_s^i and normalized as:
\bar{V}_m^{\,i} = V_m^{\,i} - \bar{v}_m^{\,i} \, C
where C is an n × 1 all-ones vector and v̄_m^i is the mean of all elements of V_m^i. The sparse coefficients are calculated with the OMP (Orthogonal Matching Pursuit) algorithm:
\alpha_s^{\,i} = \arg\min_{\alpha} \|\alpha\|_0, \quad \text{s.t. } \left\| \bar{V}_m^{\,i} - D\alpha \right\| \le \xi, \quad s \in \{1, 2, \ldots, S\}
Use the L1 norm to get the fusion coefficient:
\alpha^{i} = \arg\max_{\alpha_s^{\,i}} \left\| \alpha_s^{\,i} \right\|_1, \quad s \in \{1, 2, \ldots, S\}
Then the corresponding sparse matrix is:
V_F^{\,i} = D \alpha^{i} + \bar{v}_F^{\,i} \, C
The low frequency fusion coefficient LF is obtained by iterating the above steps over all blocks. A sketch of this low frequency fusion rule is given below.
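The snippet below sketches the low frequency fusion rule for a single block position using scikit-learn's orthogonal_mp; the dictionary D is assumed to have been learned beforehand (e.g., by K-SVD), and the tolerance value is illustrative.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def fuse_low_frequency_block(blocks, D, tol=1e-3):
    """blocks: list of S flattened low-frequency blocks; D: learned dictionary of shape (n, K)."""
    means = [b.mean() for b in blocks]
    centered = [b - m for b, m in zip(blocks, means)]

    # Sparse coding of each source block over the dictionary D (OMP)
    codes = [orthogonal_mp(D, v, tol=tol) for v in centered]

    # Keep the code with the largest L1 norm (the most salient source block)
    idx = int(np.argmax([np.abs(a).sum() for a in codes]))
    return D @ codes[idx] + means[idx]           # reconstruct the fused block
```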
The high frequency sub band image contains rich edge information and a large amount of structural information at different scales, so we design an algorithm to fully utilize it. Since the average gradient reflects the grayscale changes on both sides of a boundary, it can be used to measure image clarity; this principle is used to build the high frequency fusion rule. The average gradient value G of the high frequency sub band image is:
G = \frac{1}{(M-1)(N-1)} \sum_{i=1}^{M-1} \sum_{j=1}^{N-1} \sqrt{ \frac{ (\nabla f_x)^2 + (\nabla f_y)^2 }{2} }
where ∇f_x and ∇f_y are the horizontal and vertical derivatives, respectively. The reconstructed high frequency coefficients are:
H_F = \max\left( G_1, G_2, \ldots, G_s \right)
The fused image is obtained by inverse NSCT.
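A minimal NumPy sketch of the average-gradient high frequency fusion rule follows; applying the rule sub-band by sub-band, rather than block by block, is a simplification made for illustration.

```python
import numpy as np

def average_gradient(img):
    gx = np.diff(img, axis=1)[:-1, :]            # horizontal differences
    gy = np.diff(img, axis=0)[:, :-1]            # vertical differences
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2.0))

def fuse_high_frequency(subbands):
    """subbands: list of S high-frequency sub-band images of equal size."""
    gradients = [average_gradient(h) for h in subbands]
    return subbands[int(np.argmax(gradients))]   # keep the sub-band with the largest G
```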
Finally, the thin cloud regions and cloudless regions are normalized separately, enhanced separately by the NSCT-based procedure, and a linear mapping is then constructed from the difference between the pixel values of the thin cloud regions and the cloudless regions to keep the enhanced image consistent.

3. Dataset Description

We use 96 sets of data collected by Landsat 8, which carries an Operational Land Imager (OLI) and a Thermal Infrared Sensor (TIRS) [64]. The approximate size of each image is 7821 × 7951 pixels, and nine bands of remote sensing images are included in the dataset. We divide the dataset into three groups, as shown in Figure 11. Group 1 contains images with large areas of cloud cover. Group 2 contains urban scenes, including typical landmarks such as houses and roads. Group 3 contains suburban scenes, including rivers, fields, and villages.

4. Experiment Analysis

Based on the above data, the experiments were run on a Linux image server using a Python deep learning framework, with the same dataset used for all algorithms. The ratio of test data to training data is 1:1.
The time consumption of the proposed image enhancement algorithm for the cloudy, urban, and suburban multispectral data is shown in Table 1; the average is 429 s. The cloudy multispectral data contains a large number of cloud contaminated areas and took 298 s in total. The urban multispectral data contains rich ground object information and took 578 s. The suburban multispectral data, with a moderate amount of cloud and ground objects, took 412 s. The performance of the algorithm is verified below.

4.1. Performance of the Cloud Detection Algorithm

The analysis is conducted from the perspectives of traditional algorithms and deep learning algorithms. The extracted histogram curve is shown as the blue line in Figure 12, and the red line is the multimodal fitting curve. It can be seen that the pixel distribution is multimodal: the area where clouds are located is generally bright, and the ground objects are dark. With the bimodal algorithm proposed in this paper, the pixel distribution is fitted and the cloud contaminated area is initially extracted. Figure 12a shows a cloud free image; the photographed area contains a lot of snow, which forms a highlighted region similar to cloud. This region is extracted by the bimodal algorithm, and its attributes need further judgment later. Figure 12b shows an image of high-altitude thin cloud. The cloud covers most of the image and concentrates many pixel values in the highlighted region; the curve is unimodal, so the threshold selection deviates somewhat. Figure 12c shows a high-altitude dotted cloud image. Because the cloud is high, it appears dotted, the contrast of the ground objects is low, and the brightness values are low; the curve shows an obvious bimodal state, and the algorithm fits it well. Figure 12d shows clouds whose regions are blurred without obvious texture; such clouds tend to gather and dissipate, so we define them as changing clouds. Because of the rapid movement of the cloud, the blurred image presents a multi-peak form, which limits the curve fitting effect. Figure 12e is an image of thick cloud near the ground; the cloud texture is clear, and the histogram shows an obvious bimodal state. Figure 12f is an image with both thick and thin clouds taken near the ground; the proportion of ground objects is small, and the two peaks are of very different heights. In summary, the algorithm proposed in this paper achieves a preliminary extraction of the cloud area, after which the cloud contaminated area needs further processing.
To verify the algorithm performance, this paper introduces the AOM (Area Overlap Measure), AVM (Area Over-segmentation Measure), AUM (Area Under-segmentation Measure), CM (Combination Measure), and ROC (Receiver Operating Characteristic) curves [65]:
\mathrm{AOM} = \frac{\left| R_s \cap R_g \right|}{\left| R_s \cup R_g \right|}, \quad \mathrm{AVM} = \frac{\left| R_s - R_g \right|}{\left| R_s \right|}, \quad \mathrm{AUM} = \frac{\left| R_g - R_s \right|}{\left| R_g \right|}, \quad \mathrm{CM} = \frac{1}{3}\left[ \mathrm{AOM} + (1 - \mathrm{AVM}) + (1 - \mathrm{AUM}) \right]
where Rg is the manually annotated gold standard and Rs is the algorithm segmentation result. Tables 2–4 show the average indicators of the comparison algorithms on the three databases. Table 2 shows the cloud detection results on database 1: because large numbers of clouds are included and the edge contrast is weak, the traditional algorithms perform poorly. Table 3 shows the results on database 2: since the spectral response of urban areas differs considerably from that of clouds, the performance of all algorithms improves. Table 4 shows the results on database 3, where river reflections and spectral response constraints degrade the extraction. Reference [4] proposed a color and texture fusion (CTF) algorithm, which extracts clouds with obvious colors and clear textures well but adapts poorly to complex and changeable scenes. Reference [12] proposed a multi-scale (MS) algorithm, which considers the effect of different scales on cloud extraction, but its feature extraction is poor for images containing large cloud areas. Reference [20] introduced a support vector machine (SVM) algorithm that converts cloud annotation into a classification problem and outperforms traditional algorithms. Reference [30] proposed an attention-based CNN (ACNN) that focuses on the area of interest to annotate clouds, but it requires a large amount of computation and its salient regions have certain limitations. The algorithm proposed in this paper improves MobileNet and comprehensively considers the requirements of accuracy and efficiency to annotate clouds; overall, it has the best performance. Figure 13 shows the ROC curves of the different algorithms. Figure 13a shows the ROC curves for cloudless images, which have high resolution. Figure 13b shows the ROC curves for thin cloud images; these images are blurred, so all algorithms degrade. Figure 13c shows the ROC curves for dotted cloud images, where the cloud regions are blurred and the cloud free regions are clear. Figure 13d shows the ROC curves for changing cloud images. Figure 13e shows the ROC curves for near-ground thick clouds; because some ground objects surround the clouds, the ROC curves degrade. Figure 13f shows the ROC curves for distant thick cloud images; the clouds are thick and far away, the interference is small, and all algorithms perform well. In general, the algorithm proposed in this paper achieves good results in different situations and can detect clouds reliably.
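For reference, the overlap measures defined above can be computed from boolean masks as in the following sketch; the function and variable names are illustrative.

```python
import numpy as np

def segmentation_measures(rs, rg):
    """rs: algorithm segmentation mask; rg: manually annotated gold standard mask."""
    rs, rg = rs.astype(bool), rg.astype(bool)
    aom = np.logical_and(rs, rg).sum() / np.logical_or(rs, rg).sum()   # area overlap
    avm = np.logical_and(rs, ~rg).sum() / rs.sum()                     # over-segmentation
    aum = np.logical_and(rg, ~rs).sum() / rg.sum()                     # under-segmentation
    cm = (aom + (1.0 - avm) + (1.0 - aum)) / 3.0                       # combination measure
    return aom, avm, aum, cm
```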

4.2. Performance of the Training Algorithm

To verify the performance of the algorithm, the Dice, Sen, and Spe indicators are introduced:
\mathrm{Dice} = \frac{2 \times TP}{2 \times TP + FP + FN}, \quad \mathrm{Sen} = \frac{TP}{TP + FN}, \quad \mathrm{Spe} = \frac{TP}{TP + FP}
where TP represents the number of pixels that are correctly predicted as positive samples, TN represents the number of pixels that are correctly predicted as negative samples, FP represents the number of pixels mispredicted as positive samples, and FN represents the number of pixels mispredicted as negative samples.
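A minimal sketch of these pixel-level metrics, computed exactly as the formulas above are written, is given below; names are illustrative.

```python
import numpy as np

def pixel_metrics(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    dice = 2 * tp / (2 * tp + fp + fn)
    sen = tp / (tp + fn)
    spe = tp / (tp + fp)       # as defined in the text above
    return dice, sen, spe
```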
The core idea of the VGG-16 algorithm [66] is small convolution kernels and multi-layer stacking; because of the small receptive field, training converges slowly and accuracy is low. Inception v1 [67] separates cross channel correlations and spatial correlations and constructs a variety of receptive fields, but it has many parameters. The ResNet network [68] adopts a multi-branch architecture with strong interpretability, and its speed and accuracy are better than the above algorithms. MobileNet [60] uses depthwise convolution to improve speed, making spatial connections and cross channel connections completely independent; while ensuring accuracy, the efficiency is greatly improved. The algorithm proposed here builds on this traditional structure, introduces dilated convolution, expands the receptive field, and enhances the detection of large cloud areas; it is robust, and its training performance is further improved. The convergence curves are shown in Figure 14.
To visually demonstrate the performance of the proposed training algorithm, cloud detection experiments were conducted under different conditions, with the results shown in Figure 15. Figure 15a is a substantially cloud free image with scattered cloud points. Figure 15b is a thin cloud image; the cloud area is continuous, and the algorithm detects the thin clouds. Figure 15c is an image in which the clouds have a dotted distribution. Figure 15d contains moving clouds with a fairly obvious boundary. Figure 15e is a cloud image taken near the ground, where the cloud is strongly continuous and clearly different from the ground objects. Figure 15f shows the interaction of thick and thin clouds near the ground: the thick clouds appear as large white areas, the thin clouds as sporadic white, and the difference between ground objects and clouds is obvious. The proposed algorithm effectively detects the cloud contaminated area and performs well for no cloud, thin cloud, and thick cloud.

4.3. Evaluation of Image Enhancement Effect

The performance of the proposed algorithm is measured in terms of both subjective and objective evaluation. Experiments were conducted on the cloudy, urban, and suburban images, as shown in Table 5, Table 6 and Table 7.
Reference [56] established a luminosity and contrast model (LCM) from the perspective of color, and reference [32] established a local image enhancement model (LEM) from the perspective of texture; both ignore whole-image information. Reference [39] analyzed the overall distribution of the image, establishing a sub-image histogram (SIH) model that darkens the cloud regions and enhances the values of other regions to achieve image enhancement. Reference [69] used the Laplacian-Gaussian pyramid (LGP) algorithm, which yields bright features but insufficient detail. Reference [30] used a CNN to achieve image enhancement. Reference [64] used the curvelet transform (CVT), which produces blurred textures. Reference [45] used multi-scale fusion (MSF) for image enhancement. As shown in Figure 16, the algorithm proposed in this paper preserves texture features better and has a better display quality than the other algorithms.
The following objective evaluation indicators are introduced for evaluation:
AG = \frac{1}{(M-1)(N-1)} \sum_{i=1}^{M-1} \sum_{j=1}^{N-1} \sqrt{ \frac{ (I_{i+1,j} - I_{i,j})^2 + (I_{i,j+1} - I_{i,j})^2 }{2} }
SD = \sqrt{ \frac{ \sum_{i=1}^{M} \sum_{j=1}^{N} (I_{i,j} - \mu)^2 }{MN} }
SF = \sqrt{ \frac{ \sum_{i=1}^{M} \sum_{j=2}^{N} (I_{i,j} - I_{i,j-1})^2 + \sum_{i=2}^{M} \sum_{j=1}^{N} (I_{i,j} - I_{i-1,j})^2 }{MN} }
EI = \frac{ \sum_{i=1}^{M} \sum_{j=1}^{N} \sqrt{ s_x(i,j)^2 + s_y(i,j)^2 } }{MN}
where AG is the average gradient, M × N is the image resolution, SD is the standard deviation, µ is the mean, SF is the spatial frequency, EI is the edge intensity, and sx and sy are the convolution results of the Sobel operator.
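These indicators can be computed as in the following NumPy/OpenCV sketch; the Sobel kernel size is an illustrative choice.

```python
import numpy as np
import cv2

def enhancement_indicators(I):
    """Average gradient, standard deviation, spatial frequency and edge intensity of image I."""
    I = I.astype(np.float64)
    M, N = I.shape
    dx = I[1:, :-1] - I[:-1, :-1]                      # I(i+1, j) - I(i, j)
    dy = I[:-1, 1:] - I[:-1, :-1]                      # I(i, j+1) - I(i, j)
    ag = np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0))   # AG
    sd = np.sqrt(np.mean((I - I.mean()) ** 2))         # SD
    sf = np.sqrt((np.sum((I[:, 1:] - I[:, :-1]) ** 2) +
                  np.sum((I[1:, :] - I[:-1, :]) ** 2)) / (M * N))   # SF
    sx = cv2.Sobel(I, cv2.CV_64F, 1, 0, ksize=3)       # horizontal Sobel response
    sy = cv2.Sobel(I, cv2.CV_64F, 0, 1, ksize=3)       # vertical Sobel response
    ei = np.mean(np.sqrt(sx ** 2 + sy ** 2))           # EI
    return ag, sd, sf, ei
```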

5. Conclusions

In remote sensing imagery, it is inevitable that clouds obscure ground objects, so the cloud contaminated areas must be extracted for subsequent processing. Since the image information acquired from a single band is limited, multi-band image data must be fused to achieve image enhancement. Firstly, a multi-level cloud detection model is established based on the imaging characteristics of remote sensing images. For thick clouds, a bimodal model is constructed. For thin clouds, the RAttention transformer block is introduced into MobileNet to extract thin cloud areas, and a cloud thickness metric model then labels the penetrable thin clouds. Next, based on the self-supervised training framework, network learning is performed under small-sample conditions. In terms of image enhancement, a compressed sensing model is developed, and the NSCT algorithm is constructed by analyzing the information contained in each band image in the frequency domain to achieve image enhancement. Finally, experiments were conducted on four categories of data: a cloudy area, a city with thin cloud, a city with thick cloud, and a suburb. The proposed algorithm reaches an average AOM of 0.83 and an average AG of 12.7, demonstrating the enhanced image quality compared with the original images.
The innovations of the proposed MobileNet based multispectral remote sensing image enhancement algorithm for cloud contamination can be summarized as follows:
(1) From the perspective of traditional features, we construct a pre-detection algorithm based on multiple peak states.
(2) From the perspective of deep learning, considering performance and practical application requirements, we improve MobileNet to fully extract the target features. Based on the existing small sample data, a self-supervised training algorithm based on dilated convolution is constructed to learn the features and finally achieve cloud annotation.
(3) By analyzing the signal composition of the obtained image, we propose a compressed-sensing-based multispectral fusion algorithm to achieve image enhancement.
Despite some progress in cloud contaminated multispectral remote sensing image enhancement algorithms, some issues still need to be addressed in the future:
(1) At present, we use multispectral means to mark cloud contaminated areas; an important follow-up task is to recover the cloud contaminated areas with other technical means.
(2) Multispectral remote sensing data has certain limitations due to the limited operating time of the satellite payload. Therefore, mining the relationship between the available data to achieve image quality enhancement is a major task.
(3) Existing algorithms process data from ground stations, so it is of great significance to develop a stable and high-performance satellite side processing system.

Author Contributions

Conceptualization, X.L. and S.Q.; methodology, H.Y. and S.Q.; software, X.L. and S.Q.; validation, H.Y. and X.L.; formal analysis, All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China, grant number 2019YFE0126500; National Science and Technology Major Project of China’s High Resolution Earth Observation System, grant number 21-Y20B01-9001-19/22; the Scientific Instrument Developing Project of the Chinese Academy of Sciences, grant number YJKYYQ20200010.

Data Availability Statement

The data can be publicly accessed at https://landsat.usgs.gov/landsat-8-cloud-cover-assessment-validation-data (accessed on 1 December 2021).

Acknowledgments

The authors would like to give our sincerest thanks to the anonymous reviewers for their constructive suggestions and comments, which are of great value for improving the manuscript. The authors would also like to thank the Editor for the kind assistances and beneficial comments. The authors are grateful for the kind support from the editorial office. We gratefully acknowledge the funders of this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, L.; Li, X.; Jiang, L.; Su, X.; Chen, F. A review on deep learning techniques for cloud detection methodologies and challenges. Signal. Image Video Process. 2021, 15, 1527–1535. [Google Scholar] [CrossRef]
  2. Poli, G.; Adembri, G.; Gherardelli, M.; Tommasini, M. Dynamic threshold cloud detection algorithm improvement for AVHRR and SEVIRI images. In Proceedings of the 2010 IEEE International Geoscience and Remote Sensing Symposium, Honolulu, HI, USA, 25–30 July 2010; pp. 4146–4149. [Google Scholar]
  3. Lin, C.-H.; Lin, B.-Y.; Lee, K.-Y.; Chen, Y.-C. Radiometric normalization and cloud detection of optical satellite images using invariant pixels. ISPRS J. Photogramm. Remote Sens. 2015, 106, 107–117. [Google Scholar] [CrossRef]
  4. Başeski, E.; Cenaras, Ç. Texture and color based cloud detection. In Proceedings of the 2015 7th international conference on recent advances in space technologies (RAST), Istanbul, Turkey, 16–19 June 2015; pp. 311–315. [Google Scholar]
  5. Addesso, P.; Conte, R.; Longo, M.; Restaino, R.; Vivone, G. MAP-MRF cloud detection based on PHD filtering. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 919–929. [Google Scholar] [CrossRef]
  6. Surya, S.R.; Simon, P. Automatic cloud detection using spectral rationing and fuzzy clustering. In Proceedings of the 2013 2nd International Conference on Advanced Computing, Networking and Security, Mangalore, India, 15–17 December 2013; pp. 90–95. [Google Scholar]
  7. Zhang, Q.; Xiao, C. Cloud Detection of RGB Color Aerial Photographs by Progressive Refinement Scheme. IEEE Trans. Geosci. Remote Sens. 2014, 52, 7264–7275. [Google Scholar] [CrossRef]
  8. Zi, Y.; Xie, F.; Jiang, Z. A Cloud Detection Method for Landsat 8 Images Based on PCANet. Remote Sens. 2018, 10, 877. [Google Scholar] [CrossRef]
  9. He, W.; Zhang, H.; Shen, H.; Zhang, L. Hyperspectral Image Denoising Using Local Low-Rank Matrix Recovery and Global Spatial–Spectral Total Variation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 713–729. [Google Scholar] [CrossRef]
  10. Onsi, M.; ElSaban, H. Spatial cloud detection and retrieval system for satellite images. Int. J. Adv. Comput. Sci. Appl. 2012, 3, 12. [Google Scholar]
  11. Changhui, Y.; Yuan, Y.; Minjing, M.; Menglu, Z. Cloud detection method based on feature extraction in remote sensing images. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2013, 2, W1. [Google Scholar] [CrossRef]
  12. Xie, F.; Shi, M.; Shi, Z.; Yin, J.; Zhao, D. Multilevel cloud detection in remote sensing images based on deep learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3631–3640. [Google Scholar] [CrossRef]
  13. Shao, Z.; Deng, J.; Wang, L.; Fan, Y.; Sumari, N.S.; Cheng, Q. Fuzzy autoencode based cloud detection for remote sensing imagery. Remote Sens. 2017, 9, 311. [Google Scholar] [CrossRef]
  14. Chen, Y.; Guo, Y.; Wang, Y.; Wang, D.; Peng, C.; He, G. Denoising of hyperspectral images using nonconvex low rank matrix approximation. IEEE Trans. Geosci. Remote Sens. 2017, 55, 5366–5380. [Google Scholar] [CrossRef]
  15. Zhang, X.; Qin, F.; Qin, Y. Study on the thick cloud removal method based on multi-temporal remote sensing images. In Proceedings of the 2010 International Conference on Multimedia Technology, Ningbo, China, 29–31 October 2010; pp. 1–3. [Google Scholar]
  16. Marais, I.V.Z.; Du Preez, J.A.; Steyn, W.H. An optimal image transform for threshold-based cloud detection using heteroscedastic discriminant analysis. Int. J. Remote Sens. 2011, 32, 1713–1729. [Google Scholar] [CrossRef]
  17. Qian, J.; Luo, Y.; Wang, Y.; Li, D. Cloud detection of optical remote sensing image time series using mean shift algorithm. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 560–562. [Google Scholar]
  18. Wei, Q.; Bioucas-Dias, J.; Dobigeon, N.; Tourneret, J.Y. Hyperspectral and multispectral image fusion based on a sparse representation. IEEE Trans. Geosci. Remote Sens. 2015, 53, 3658–3668. [Google Scholar] [CrossRef]
  19. Hu, G.; Sun, X.; Liang, D.; Sun, Y. Cloud removal of remote sensing image based on multi-output support vector regression. J. Syst. Eng. Electron. 2014, 25, 1082–1088. [Google Scholar] [CrossRef]
  20. Sui, Y.; He, B.; Fu, T. Energy-based cloud detection in multispectral images based on the SVM technique. Int. J. Remote Sens. 2019, 40, 5530–5543. [Google Scholar] [CrossRef]
  21. Li, Y.; Chen, W.; Zhang, Y.; Tao, C.; Xiao, R.; Tan, Y. Accurate cloud detection in high-resolution remote sensing imagery by weakly supervised deep learning. Remote Sens. Environ. 2020, 250, 112045. [Google Scholar] [CrossRef]
  22. Villa, A.; Chanussot, J.; Benediktsson, J.A.; Jutten, C.; Dambreville, R. Unsupervised methods for the classification of hyperspectral images with low spatial resolution. Pattern Recognit. 2013, 46, 1556–1568. [Google Scholar] [CrossRef]
  23. Ozkan, S.; Efendioglu, M.; Demirpolat, C. Cloud detection from RGB color remote sensing images with deep pyramid networks. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 6939–6942. [Google Scholar]
  24. Liu, H.; Zeng, D.; Tian, Q. Super-pixel cloud detection using hierarchical fusion CNN. In Proceedings of the2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM), Xi’an, China, 13–16 September 2018; pp. 1–6. [Google Scholar]
  25. He, Q.; Sun, X.; Yan, Z.; Fu, K. DABNet: Deformable contextual and boundary-weighted network for cloud detection in remote sensing images. IEEE Trans. Geosci. Remote Sens. 2021, 60, 29. [Google Scholar] [CrossRef]
  26. Yuan, Q.; Zhang, Q.; Li, J.; Shen, H.; Zhang, L. Hyperspectral image denoising employing a spatial–spectral deep residual convolutional neural network. IEEE Trans. Geosci. Remote Sens. 2019, 57, 1205–1218. [Google Scholar] [CrossRef]
  27. Bai, T.; Li, D.; Sun, K.; Chen, Y.; Li, W. Cloud detection for high-resolution satellite imagery using machine learning and multi-feature fusion. Remote Sens. 2016, 8, 715. [Google Scholar] [CrossRef]
  28. Zhang, J.; Li, X.; Li, L.; Sun, P.; Su, X.; Hu, T.; Chen, F. Lightweight U-Net for cloud detection of visible and thermal infrared remote sensing images. Opt. Quantum Electron. 2020, 52, 397. [Google Scholar] [CrossRef]
  29. Zhang, J.; Zhou, Q.; Shen, X.; Li, Y. Cloud detection in high-resolution remote sensing images using multi-features of ground objects. J. Geovisualization Spat. Anal. 2019, 3, 14. [Google Scholar] [CrossRef]
  30. Zhang, J.; Wang, Y.; Wang, H.; Wu, J.; Li, Y. CNN cloud detection algorithm based on channel and spatial attention and probabilistic upsampling for remote sensing image. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–13. [Google Scholar] [CrossRef]
  31. Wei, K.; Fu, Y.; Huang, H. 3-D quasi-recurrent neural network for hyperspectral image denoising. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 363–375. [Google Scholar] [CrossRef]
  32. Hwang, S.J.; Kapoor, A.; Kang, S.B. Context-based automatic local image enhancement. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2012; pp. 569–582. [Google Scholar]
  33. Bai, X.; Zhou, F.; Xue, B. Image enhancement using multi scale image features extracted by top-hat transform. Opt. Laser Technol. 2012, 44, 328–336. [Google Scholar] [CrossRef]
  34. Özay, E.K.; Tunga, B. A novel method for multispectral image pansharpening based on high dimensional model representation. Expert Syst. Appl. 2021, 170, 114512. [Google Scholar] [CrossRef]
  35. Choi, Y.; Kim, N.; Hwang, S.; Kweon, I.S. Thermal image enhancement using convolutional neural network. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 9–14 October 2016; pp. 223–230. [Google Scholar]
  36. Zhuang, P.; Li, C.; Wu, J. Bayesian retinex underwater image enhancement. Eng. Appl. Artif. Intell. 2021, 101, 104171. [Google Scholar] [CrossRef]
  37. Mehranian, A.; Wollenweber, S.D.; Walker, M.D.; Bradley, K.M.; Fielding, P.A.; Su, K.H.; McGowan, D.R. Image enhancement of whole-body oncology [18F]-FDG PET scans using deep neural networks to reduce noise. Eur. J. Nucl. Med. Mol. Imaging 2022, 49, 539–549. [Google Scholar] [CrossRef]
  38. Mahashwari, T.; Asthana, A. Image enhancement using fuzzy technique. Int. J. Res. Eng. Sci. Technol. 2013, 2, 1–4. [Google Scholar]
  39. Singh, K.; Kapoor, R. Image enhancement using exposure based sub image histogram equalization. Pattern Recognit. Lett. 2014, 36, 10–14. [Google Scholar] [CrossRef]
  40. Raju, G.; Nair, M.S. A fast and efficient color image enhancement method based on fuzzy-logic and histogram. AEU-Int. J. Electron. Commun. 2014, 68, 237–243. [Google Scholar] [CrossRef]
  41. Chen, S.; Liu, Y.; Zhang, C. Water-Body segmentation for multi-spectral remote sensing images by feature pyramid enhancement and pixel pair matching. Int. J. Remote Sens. 2021, 42, 5025–5043. [Google Scholar] [CrossRef]
  42. Liu, X.; Bourennane, S.; Fossati, C. Denoising of hyperspectral images using the PARAFAC model and statistical performance analysis. IEEE Trans. Geosci. Remote Sens. 2012, 50, 3717–3724. [Google Scholar] [CrossRef]
  43. Pathak, S.S.; Dahiwale, P.; Padole, G. A combined effect of local and global method for contrast image enhancement. In Proceedings of the 2015 IEEE International Conference on Engineering and Technology (ICETECH), Coimbatore, India, 20 March 2015; pp. 1–5. [Google Scholar]
  44. Goldstein, T.; Xu, L.; Kelly, K.F.; Baraniuk, R. The stone transform: Multi-resolution image enhancement and compressive video. IEEE Trans. Image Process. 2015, 24, 5581–5593. [Google Scholar] [CrossRef]
  45. Xu, Y.; Yang, C.; Sun, B.; Yan, X.; Chen, M. A novel multi-scale fusion framework for detail-preserving low-light image enhancement. Inf. Sci. 2021, 548, 378–397. [Google Scholar] [CrossRef]
  46. Stankevich, S.A.; Piestova, I.O.; Lubskyi, M.S.; Shklyar, S.V.; Lysenko, A.R.; Maslenko, O.V.; Rabcan, J. Knowledge-based multispectral remote sensing imagery superresolution. In Reliability Engineering and Computational Intelligence; Springer: Cham, Switzerland, 2021; pp. 219–236. [Google Scholar]
  47. Rahman, S.; Rahman, M.M.; Abdullah-Al-Wadud, M.; Al-Quaderi, G.D.; Shoyaib, M. An adaptive gamma correction for image enhancement. EURASIP J. Image Video Process. 2016, 1, 1–13. [Google Scholar] [CrossRef]
  48. Li, M.; Liu, J.; Yang, W.; Sun, X.; Guo, Z. Structure-revealing low-light image enhancement via robust retinex model. IEEE Trans. Image Process. 2018, 27, 2828–2841. [Google Scholar] [CrossRef]
  49. Kuang, X.; Sui, X.; Liu, Y.; Chen, Q.; Gu, G. Single infrared image enhancement using a deep convolutional neural network. Neurocomputing 2019, 332, 119–128. [Google Scholar] [CrossRef]
  50. Usharani, A.; Bhavana, D. Deep convolution neural network based approach for multispectral images. Int. J. Syst. Assur. Eng. Manag. 2021, 1–10. [Google Scholar] [CrossRef]
  51. Chen, Y.; He, W.; Yokoya, N.; Huang, T.Z.; Zhao, X.L. Nonlocal tensor-ring decomposition for hyperspectral image denoising. IEEE Trans. Geosci. Remote Sens. 2020, 58, 1348–1362. [Google Scholar] [CrossRef]
  52. Wen, H.; Tian, Y.; Huang, T.; Gao, W. Single underwater image enhancement with a new optical model. In Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS), Beijing, China, 19–23 May 2013; pp. 753–756. [Google Scholar]
  53. Shen, L.; Yue, Z.; Feng, F.; Chen, Q.; Liu, S.; Ma, J. Msr-net: Low-light image enhancement using deep convolutional network. arXiv 2017, arXiv:1711.02488. [Google Scholar]
  54. Kaplan, N.H. Remote sensing image enhancement using hazy image model. Optik 2018, 155, 139–148. [Google Scholar] [CrossRef]
  55. Liu, K.; Liang, Y. Underwater image enhancement method based on adaptive attenuation-curve prior. Opt. Express 2021, 29, 10321–10345. [Google Scholar] [CrossRef]
  56. Zhou, M.; Jin, K.; Wang, S.; Ye, J.; Qian, D. Color retinal image enhancement based on luminosity and contrast adjustment. IEEE Trans. Biomed. Eng. 2017, 65, 521–527. [Google Scholar] [CrossRef]
  57. Kim, H.U.; Koh, Y.J.; Kim, C.S. PieNet: Personalized image enhancement network. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; pp. 374–390. [Google Scholar]
  58. Moran, S.; Marza, P.; McDonagh, S.; Parisot, S.; Slabaugh, G. DeepLPF: Deep local parametric filters for image enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12826–12835. [Google Scholar]
  59. Li, Y.; Huang, H.; Xie, Q.; Yao, L.; Chen, Q. Research on a surface defect detection algorithm based on MobileNet-SSD. Appl. Sci. 2018, 8, 1678. [Google Scholar] [CrossRef]
  60. Kadam, K.; Ahirrao, S.; Kotecha, K.; Sahu, S. Detection and localization of multiple image splicing using MobileNet V1. IEEE Access 2021, 9, 162499–162519. [Google Scholar] [CrossRef]
  61. Srinivasu, P.N.; SivaSai, J.G.; Ijaz, M.F.; Bhoi, A.K.; Kim, W.; Kang, J.J. Classification of skin disease using deep learning neural networks with MobileNet V2 and LSTM. Sensors 2021, 21, 2852. [Google Scholar] [CrossRef]
  62. Kavyashree, P.S.; El-Sharkawy, M. Compressed mobilenet v3: A light weight variant for resource-constrained platforms. In Proceedings of the 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), Online, 27–30 January 2021; pp. 104–107. [Google Scholar]
  63. Al-Rahlawee, A.T.H.; Rahebi, J. Multilevel thresholding of images with improved Otsu thresholding by black widow optimization algorithm. Multimed. Tools Appl. 2021, 80, 28217–28243. [Google Scholar] [CrossRef]
  64. Arif, M.; Wang, G. Fast curvelet transform through genetic algorithm for multimodal medical image fusion. Soft Comput. 2020, 24, 1815–1836. [Google Scholar] [CrossRef]
  65. Qiu, S.; Tang, Y.; Du, Y.; Yang, S. The infrared moving target extraction and fast video reconstruction algorithm. Infrared Phys. Technol. 2019, 97, 85–92. [Google Scholar] [CrossRef]
  66. Srivastava, S.; Kumar, P.; Chaudhry, V.; Singh, A. Detection of ovarian cyst in ultrasound images using fine-tuned VGG-16 deep learning network. SN Comput. Sci. 2020, 1, 81. [Google Scholar] [CrossRef]
  67. Sam, S.M.; Kamardin, K.; Sjarif, N.N.A.; Mohamed, N. Offline signature verification using deep learning convolutional neural network (CNN) architectures GoogLeNet inception-v1 and inception-v3. Procedia Comput. Sci. 2019, 161, 475–483. [Google Scholar]
  68. Lu, Z.; Bai, Y.; Chen, Y.; Su, C.; Lu, S.; Zhan, T.; Wang, S. The classification of gliomas based on a pyramid dilated convolution resnet model. Pattern Recognit. Lett. 2020, 133, 173–179. [Google Scholar] [CrossRef]
  69. Vanmali, A.V.; Gadre, V.M. Visible and NIR image fusion using weight-map-guided Laplacian–Gaussian pyramid for improving scene visibility. Sādhanā 2017, 42, 1063–1082. [Google Scholar] [CrossRef]
Figure 1. Flowchart of the proposed algorithm.
Figure 2. Structure of MobileNet.
Figure 3. Calculation of Conv dw.
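Figure 3 illustrates the depthwise-separable computation behind Conv dw. As a reading aid only, the following is a minimal PyTorch sketch of such a block (a 3 × 3 depthwise convolution followed by a 1 × 1 pointwise convolution, each with batch normalization and ReLU); the channel sizes and stride are illustrative and are not the exact configuration of our network.

import torch
import torch.nn as nn

class ConvDW(nn.Module):
    """MobileNet-style depthwise-separable block: Conv dw (3x3) + Conv 1x1."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            # Depthwise: one 3x3 filter per input channel (groups = in_ch)
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            # Pointwise: 1x1 convolution mixes the channels
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# Illustrative use: 32 feature channels in, 64 out, spatial size halved
y = ConvDW(32, 64, stride=2)(torch.randn(1, 32, 64, 64))
print(y.shape)  # torch.Size([1, 64, 32, 32])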
Figure 4. Process of DeepViT.
Figure 5. MobileDViT block.
Figure 6. Training network.
Figure 7. Residual attention mechanism model.
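Figure 7 shows the residual attention mechanism. Purely as an illustration of the general pattern, and not of the exact model in this paper, a squeeze-and-excitation-style residual channel-attention block can be sketched in PyTorch as follows; the reduction ratio is an assumed hyperparameter.

import torch.nn as nn

class ResidualChannelAttention(nn.Module):
    """Generic residual channel-attention block (squeeze-and-excitation style)."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.weights = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),            # squeeze: global average pooling
            nn.Conv2d(ch, ch // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1),
            nn.Sigmoid(),                       # per-channel weights in (0, 1)
        )

    def forward(self, x):
        return x + x * self.weights(x)          # residual connection around the attended features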
Figure 8. Schematic diagram of pre-training based on self-supervision.
Figure 9. Algorithm framework.
Figure 10. Two-layer NSCT decomposition diagram.
Figure 11. Multispectral database display: (a) cloudy multispectral data, (b) urban multispectral data, and (c) suburban multispectral data.
Figure 12. Performance of the cloud detection algorithm.
Figure 13. ROC curves of different algorithms.
Figure 14. Convergence curve.
Figure 15. Cloud annotation effect.
Figure 16. Image enhancement effect: (a) cloudy image, (b) urban image with thin cloud, (c) urban image with thick cloud, and (d) suburban image. Left: LGP [69]; Middle: CVT [64]; Right: Ours.
Table 1. Time consumption of the image enhancement algorithm.
Algorithm    Cloudy    Urban    Suburban    Average
Time (s)     298       578      412         429
Table 2. Cloud detection results of database 1.
Algorithm                  AOM     AVM     AUM     CM
CTF [4]                    0.65    0.33    0.35    0.66
MS [12]                    0.71    0.31    0.34    0.69
SVM [20]                   0.75    0.26    0.27    0.74
ACNN [30]                  0.76    0.21    0.25    0.77
The proposed algorithm     0.81    0.18    0.23    0.80
Table 3. Cloud detection results of database 2.
Algorithm                  AOM     AVM     AUM     CM
CTF [4]                    0.72    0.35    0.35    0.67
MS [12]                    0.78    0.30    0.31    0.72
SVM [20]                   0.80    0.26    0.22    0.77
ACNN [30]                  0.82    0.17    0.19    0.82
The proposed algorithm     0.86    0.14    0.15    0.86
Table 4. Cloud detection results of database 3.
Algorithm                  AOM     AVM     AUM     CM
CTF [4]                    0.61    0.38    0.39    0.61
MS [12]                    0.68    0.34    0.35    0.66
SVM [20]                   0.76    0.26    0.31    0.73
ACNN [30]                  0.80    0.21    0.23    0.79
The proposed algorithm     0.83    0.16    0.19    0.82
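Tables 2–4 report AOM, AVM, AUM and CM for the three databases. As a reading aid, the sketch below computes these region-overlap measures from binary cloud masks under their commonly used definitions (overlap, over-segmentation, under-segmentation, and their combination); the formulas in the paper body are authoritative, and this NumPy version is only an assumed reference implementation.

import numpy as np

def cloud_mask_metrics(pred, gt):
    """AOM/AVM/AUM/CM for binary cloud masks (True = cloud), assumed definitions."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    aom = (pred & gt).sum() / (pred | gt).sum()    # Area Overlap Measure (higher is better)
    avm = (pred & ~gt).sum() / pred.sum()          # over-segmentation measure (lower is better)
    aum = (gt & ~pred).sum() / gt.sum()            # under-segmentation measure (lower is better)
    cm = (aom + (1 - avm) + (1 - aum)) / 3.0       # combined measure
    return aom, avm, aum, cm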
Table 5. Cloudy image enhancement effect.
Algorithm                  AG      SD      SF      EI
LCM [56]                   9.2     46.2    33.5    5.1
LEM [33]                   9.5     47.5    34.5    5.3
SIH [39]                   9.8     46.9    35.8    6.2
LGP [69]                   10.5    49.8    35.4    6.4
CNN [30]                   10.3    52.8    36.8    6.5
CVT [64]                   10.9    51.3    37.2    6.8
MSF [45]                   11.2    52.6    36.5    7.1
The proposed algorithm     12.3    56.1    38.6    7.3
Table 6. Urban cloud contaminated image enhancement effect.
Algorithm                  AG      SD      SF      EI
LCM [56]                   9.8     47.5    35.5    5.6
LEM [32]                   10.1    49.8    36.1    5.9
SIH [39]                   10.2    51.4    36.5    6.4
LGP [69]                   11.6    52.6    37.1    6.5
CNN [30]                   11.8    54.8    37.4    6.8
CVT [64]                   12.1    55.9    38.6    7.3
MSF [45]                   12.5    56.3    38.9    7.4
The proposed algorithm     13.1    58.2    40.1    7.6
Table 7. Suburban image enhancement effect.
Algorithm                  AG      SD      SF      EI
LCM [56]                   9.6     46.9    34.5    5.4
LEM [32]                   9.8     49.1    34.3    5.7
SIH [39]                   10.2    49.3    35.9    6.4
LGP [69]                   11.0    52.1    36.4    6.5
CNN [30]                   11.3    53.8    37.3    6.7
CVT [64]                   11.4    53.5    38.1    7.1
MSF [45]                   12.1    55.1    38.2    7.2
The proposed algorithm     12.7    57.7    39.1    7.4
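Tables 5–7 evaluate enhancement quality with AG, SD, SF and EI. The following NumPy sketch computes these no-reference statistics under their standard textbook definitions; in particular, EI is assumed here to be the mean Sobel edge magnitude, which may not match the exact definition used in the experiments.

import numpy as np
from scipy import ndimage

def enhancement_metrics(img):
    """AG, SD, SF, EI for a single-band image (float grey levels), assumed definitions."""
    img = img.astype(np.float64)
    dx = np.diff(img, axis=1)                                          # horizontal differences
    dy = np.diff(img, axis=0)                                          # vertical differences
    ag = np.mean(np.sqrt((dx[:-1, :] ** 2 + dy[:, :-1] ** 2) / 2.0))   # Average Gradient
    sd = np.std(img)                                                   # Standard Deviation
    sf = np.sqrt(np.mean(dx ** 2) + np.mean(dy ** 2))                  # Spatial Frequency
    ei = np.mean(np.hypot(ndimage.sobel(img, axis=0),
                          ndimage.sobel(img, axis=1)))                 # Edge Intensity (assumed: mean Sobel magnitude)
    return ag, sd, sf, ei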