Article

Fusion Information Multi-View Classification Method for Remote Sensing Cloud Detection

1 Engineering Research Center of Learning-Based Intelligent System, Ministry of Education, Tianjin 300384, China
2 Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin University of Technology, Tianjin 300384, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2022, 12(14), 7295; https://doi.org/10.3390/app12147295
Submission received: 29 May 2022 / Revised: 14 July 2022 / Accepted: 18 July 2022 / Published: 20 July 2022
(This article belongs to the Special Issue Intelligent Computing and Remote Sensing)

Abstract

In recent years, many studies have been carried out to detect clouds in remote sensing images. Because of complex terrain and the wide variation in cloud type, density, and content, current models have difficulty detecting clouds in such images accurately. In our strategy, a multi-view training set based on super pixels is constructed. View A uses a multi-level network to extract boundary, texture, and deep abstract features of super pixels. View B consists of the statistical features of the three image channels. The privileged information view P contains the cloud content of each super pixel and the label states of its adjacent super pixels. Finally, we propose a cloud detection method for remote sensing image classification based on a multi-view support vector machine (SVM). The proposed method is tested on images with different terrain and cloud distributions from the GF-1_WHU and Cloud-38 remote sensing datasets. Visual performance and quantitative analysis show that the method has excellent cloud detection performance.

1. Introduction

Remote sensing images are widely used in land resource utilization, meteorological monitoring, and geology [1,2]; however, sensor observations are often affected by clouds. Many remote sensing studies have been hindered by cloud occlusion, resulting in inaccurate observations [3]. Therefore, it is necessary to detect clouds in remote sensing images accurately.
Most cloud detection methods are highly dependent on the available spectral bands and use specific physical constraints to separate different categories according to those bands [4]. In particular, methods that rely on handcrafted features or experience and search for specific thresholds have difficulty segmenting some categories, and there are cases that cannot be separated by a spectral threshold at all, such as deserts and super-high-brightness pixel areas [5]. The key to threshold-based methods is choosing the best threshold to distinguish foreground cloud from the background surface. Early fixed-threshold methods failed to meet increasing accuracy requirements, so more and more dynamic, adaptive threshold methods have been proposed to exploit the differences between cloud and surface features. Jedlovec et al. [6] used two images with different channels to incorporate spatiotemporally varying thresholds into the cloud detection process. Zhang et al. [7] proposed an automatic cloud detection algorithm based on observation statistics of remote sensing images; they improved the global threshold method and progressively refined the detection results.
With the development of machine learning, methods based on Markov random fields [8] and the widely used SVM [9,10,11,12] have become more and more popular in cloud detection. Deep networks have also been used for cloud detection tasks. For example, Mohajerani et al. [13] proposed a hybrid of a fully convolutional network (FCN) and a gradient recognition algorithm, applying FCNs to the field of cloud detection. Manzo et al. [14] proposed a framework that combines convolutional neural networks adapted to the cloud recognition task through transfer learning, using voting rules. The Cloud-Net algorithm achieves better detection by redesigning the convolution blocks of the FCN [15] and is usually used as the baseline for deep learning cloud detection networks. The multilevel feature fused segmentation network (MFFSNet) uses a pyramid pooling module to aggregate feature information at different scales to improve the utilization of local and global cloud features in images [16]. When detecting cloud areas in cloud-containing remote sensing images, whether judging at the pixel level or dividing the image into rectangular blocks, super-pixel-level judgment is more effective than pixel-level judgment [17,18]. Liu et al. [19] proposed a cloud judgment method that combines several statistical characteristics of super pixels and verified it experimentally. These studies show that judging cloud or non-cloud with the super pixel as the basic unit can produce excellent cloud/non-cloud segmentation results for remote sensing images.
Many methods determine the label of a super pixel according to the number or proportion of cloud-containing pixels within it, and cannot make good use of the labels of adjacent super pixels or of the super pixel's own cloud proportion; the information between super pixels is not well combined. Current methods also have difficulty when a super pixel cannot be clearly assigned to one category or does not fully contain a cloud, and the corresponding cloud-content state information cannot be used reasonably. At the same time, although most remote sensing platforms collect data with multiple sensors, in many cases only the three RGB visible light channels are available; for such images, a segmentation method with high accuracy is also needed.
To solve the above problems, a multi-layer network structure is used to extract the texture, boundary, and high-level abstract information of super pixels, and the statistical features of the basic three-channel color data are also utilized. The extracted views are inherently related because they provide complementary semantic information about the same data. Many studies show that combining multiple views in learning is superior to simply concatenating the views or learning from each view alone [20,21]. The structural relationship between super pixels and the cloud state of a super pixel itself constitute privileged information. To use all of these views in a unified way, a multi-view classification method based on information fusion is proposed. In the training stage, three feature views are used. By introducing a privileged information mechanism, the model can accurately determine the super pixel category using only the two extracted feature views (excluding the privileged information features). Overall, our main contributions are as follows:
1. A feature extraction network at the super pixel level is constructed to extract multi-scale block features of super pixels in cloud-containing remote sensing images. The cloud content within each super pixel and the cloud-containing label states of adjacent super pixels are effectively utilized.
2. A multi-view support vector machine cloud detection classifier based on fusion information is constructed, and a solving algorithm based on quadratic convex optimization is given.
3. We provide a multi-view classification dataset based on remote sensing cloud super pixels. The new model is used to classify the super pixels and synthesize the binary cloud mask. Experiments are carried out on images with different cloud contents to verify the effectiveness of the new method.
The rest of this article is organized as follows. Section 2 introduces the research status of multi-view learning and an advanced multi-view classification method. In Section 3, we introduce the framework of the proposed method in detail, including the multi-view feature extraction method and fusion information multi-view classification model. Section 4 shows the experimental results on two remote sensing datasets and discusses the results of the experiment. Finally, Section 5 summarizes the research work.

2. Related Work

2.1. Multi-View Learning

Multi-view learning algorithms can be divided into co-training, multi-kernel learning, and subspace learning [21,22]. Co-training algorithms iteratively maximize the mutual agreement of two distinct views to ensure consistency on the same validation data; examples include the multi-view collaborative clustering algorithm [23] and multi-co-training for document classification [24]. Multi-kernel learning (MKL) algorithms use the kernels corresponding to different views and combine them linearly or nonlinearly to improve performance; for example, the support kernel machine (SKM) model introduced in [25] is solved with a sequential minimal optimization (SMO) algorithm, and the multi-kernel framework with nonparallel support vector machine (MKNPSVM) integrates non-parallel support vector machines into the MKL framework to learn the optimal kernel combination [26]. Subspace learning algorithms assume that the input views come from a shared latent subspace and aim to recover the subspace shared by multiple views; examples include SVM-2K, which combines two support vector machines with kernel canonical correlation analysis (KCCA) [27], and the SVM classification method with a coupling privileged kernel [28].

2.2. Coupling Privileged Kernel Method for Multi-View Learning

Tang et al. [28] proposed a simple and effective multi-view learning coupling privileged kernel method (MCPK). MCPK integrates the consensus and complementarity principles into a unified framework. In particular, consistency is captured by a coupling term between the two views. Because multi-view data collected from different domains complement each other, each feature view can receive explicit privileged information from the other view. MCPK can be formulated as Equation (1).
$$
\begin{aligned}
\min_{w_A, w_B, \xi^A, \xi^B} \quad & \frac{1}{2}\left(\|w_A\|^2 + \gamma\|w_B\|^2\right) + C_A\sum_{i=1}^{l}\xi_i^A + C_B\sum_{i=1}^{l}\xi_i^B + C\sum_{i=1}^{l}\xi_i^A\xi_i^B, \\
\text{s.t.} \quad & y_i\,(w_A\cdot\phi_A(x_i^A)) \ge 1 - \xi_i^A, \qquad y_i\,(w_B\cdot\phi_B(x_i^B)) \ge 1 - \xi_i^B, \\
& \xi_i^A \ge y_i\,(w_B\cdot\phi_B(x_i^B)), \qquad \xi_i^B \ge y_i\,(w_A\cdot\phi_A(x_i^A)), \\
& \xi_i^A \ge 0, \quad \xi_i^B \ge 0, \quad i = 1, \dots, l.
\end{aligned}
\tag{1}
$$
where $w_A$ and $w_B$ are the weight vectors of view A and view B, respectively, and the two views are weighed by the non-negative trade-off parameter $\gamma$. As slack variables, $\xi_i^A$ and $\xi_i^B$ are constrained by the correction functions determined by the two views. The coupling term $C\sum_{i=1}^{l}\xi_i^A\xi_i^B$ makes the product of the error variables of the two views as small as possible. When the classifiers constructed from different views are more consistent, the errors of both views are small, resulting in a smaller coupling term; therefore, consistency can be fully ensured. $C$ is a non-negative coupling parameter that controls the influence of the coupling term, and $C_A$ and $C_B$ are non-negative penalty parameters.

3. Proposed Method

3.1. Multi-View Feature Extraction

View A: Texture, boundary, and other features are extracted from the circumscribed rectangular region of each super pixel by a multi-layer joint convolutional neural network (CNN). A cloud area has a shape similar to its super pixel boundary, which differs considerably from the boundaries of grassland and water areas. We use the inherent multi-scale pyramid hierarchy of a deep convolutional network to build a top architecture with lateral connections, which constructs high-level feature maps at all scales. The multi-scale feature extraction structure easily extracts features from the super pixel circumscribed rectangle, and texture, boundary, gradient, and other information are obtained from multiple horizons; the deepest feature extraction layer, Conv3, extracts more abstract features. View A feature extraction is realized by the cloud super pixel feature extraction network (CSPFE-Net), whose structure is shown in Figure 1. The input of the network is the 3 × 16 × 16 image of the super pixel circumscribed rectangular region. Conv denotes a convolution layer; ReLU increases the nonlinear capacity; pooling downsamples the feature maps for further extraction; LRN normalizes the data to constrain them within a certain range; and Concat denotes the splicing of vectors. The features extracted at each feature extraction layer are weighted, spliced, and jointly output as the feature vector through the fully connected layer. The resulting 1 × 64 feature vector per super pixel at this fine scale is sufficient for feature representation.
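To make the structure of view A concrete, the following is a minimal PyTorch sketch of a CSPFE-Net-style extractor. The number of stages, channel counts, kernel sizes, and per-level projections are assumptions for illustration; only the 3 × 16 × 16 input, the ReLU/pooling/LRN operations, the weighted splicing of multi-level features, and the 1 × 64 output follow the description above.

```python
# Minimal sketch of a CSPFE-Net-style multi-scale extractor (not the authors'
# exact architecture): three convolution stages whose outputs are weighted,
# spliced, and projected to a 1 x 64 view-A feature vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CSPFENetSketch(nn.Module):
    def __init__(self, out_dim=64, weights=(0.3, 0.3, 0.2)):
        super().__init__()
        self.weights = weights
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)    # 3x16x16 -> 16x16x16
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)   # after pooling: 32x8x8
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)   # after pooling: 64x4x4
        self.lrn = nn.LocalResponseNorm(size=5)        # keeps activations in range
        self.pool = nn.MaxPool2d(2)
        # per-level projections so differently sized maps can be spliced
        self.fc1 = nn.Linear(16 * 16 * 16, out_dim)
        self.fc2 = nn.Linear(32 * 8 * 8, out_dim)
        self.fc3 = nn.Linear(64 * 4 * 4, out_dim)
        self.fc_out = nn.Linear(3 * out_dim, out_dim)

    def forward(self, x):                              # x: (N, 3, 16, 16)
        f1 = self.lrn(F.relu(self.conv1(x)))           # shallow: texture/boundary cues
        f2 = self.lrn(F.relu(self.conv2(self.pool(f1))))
        f3 = F.relu(self.conv3(self.pool(f2)))         # deepest (Conv3): abstract features
        spliced = torch.cat([self.weights[0] * self.fc1(f1.flatten(1)),
                             self.weights[1] * self.fc2(f2.flatten(1)),
                             self.weights[2] * self.fc3(f3.flatten(1))], dim=1)
        return self.fc_out(spliced)                    # (N, 64) view-A feature

features = CSPFENetSketch()(torch.rand(8, 3, 16, 16))  # -> torch.Size([8, 64])
```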
View B: The color statistics extracted from the super pixel itself, excluding the black padding of the circumscribed rectangle. Specifically, we calculate the mean, standard deviation, maximum, minimum, and median of the color data in each channel. $SP_i$ denotes the three-channel data of the i-th super pixel, and the RGB color statistics are calculated by Equations (2) and (3). The view B feature vector has the format $1 \times 15$.
$$\mathrm{getCF}(x) = [\,\mathrm{mean}(x),\ \mathrm{std}(x),\ \mathrm{max}(x),\ \mathrm{min}(x),\ \mathrm{median}(x)\,] \tag{2}$$
$$RGB_{feature} = [\,\mathrm{getCF}(SP_i(R)),\ \mathrm{getCF}(SP_i(G)),\ \mathrm{getCF}(SP_i(B))\,] \tag{3}$$
where $\mathrm{getCF}(x)$ extracts and splices the statistical features of the data of one channel $x$; $SP_i(R)$, $SP_i(G)$, and $SP_i(B)$ represent the three visible light channels: red, green, and blue.
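As an illustration, the view B features can be computed with a few lines of NumPy; here the super pixel is assumed to be given as an (n, 3) array of RGB values with the black padding of the bounding rectangle already excluded.

```python
# Sketch of the view-B colour statistics of Equations (2) and (3):
# five statistics per channel, spliced into a 1 x 15 feature vector.
import numpy as np

def get_cf(x):
    """Statistical features of one channel: mean, std, max, min, median."""
    return [np.mean(x), np.std(x), np.max(x), np.min(x), np.median(x)]

def rgb_feature(sp_pixels):
    """sp_pixels: (n_pixels, 3) RGB values of one super pixel -> (15,) vector."""
    return np.concatenate([get_cf(sp_pixels[:, c]) for c in range(3)])

sp = np.random.randint(0, 256, size=(80, 3))   # roughly 80 pixels per super pixel
print(rgb_feature(sp).shape)                   # (15,)
```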
View P: View P is the privileged information view. The correction space guided by the privileged information can correct the classification hyperplanes of the other views, improving the performance of the multi-view classifier. The privileged information feature of a cloud-containing super pixel consists of two parts: the first part is the specific cloud content of the super pixel, and the second part is the labels of the adjacent super pixel blocks. In general, the two super pixel blocks whose centers are nearest are selected, so this part forms a $1 \times 3$ feature vector.
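A possible construction of the 1 × 3 view P vector is sketched below; the centre-distance neighbour selection and the array layout are assumptions of this sketch, not the authors' exact implementation.

```python
# Sketch of the view-P (privileged) feature: the super pixel's own cloud
# content plus the labels of the two super pixels with the nearest centres.
import numpy as np

def privileged_feature(idx, cloud_ratios, centers, labels):
    """idx: super pixel index; cloud_ratios: (n,) cloud content per super pixel;
    centers: (n, 2) super pixel centroids; labels: (n,) +1/-1 cloud labels."""
    distances = np.linalg.norm(centers - centers[idx], axis=1)
    nearest = np.argsort(distances)[1:3]       # two closest blocks, excluding itself
    return np.array([cloud_ratios[idx], labels[nearest[0]], labels[nearest[1]]])
```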

3.2. Fusion Information Multi-View SVM Classification Method

The fusion information multi-view SVM classification method (FIMV-SVM) is based on MCPK. It uses the feature extraction network and the three-channel statistical features as the view A and view B data, and uses the privileged information view P to correct the separating hyperplanes of view A and view B. The constructed optimization problem is given in Equation (4).
$$
\begin{aligned}
\min_{w_A, w_B, w_P, \xi^A, \xi^B, \xi^P} \quad & \frac{1}{2}\left(\|w_A\|^2 + \gamma\|w_B\|^2 + \gamma_P\|w_P\|^2\right) + C_A\sum_{i=1}^{l}\xi_i^A + C_B\sum_{i=1}^{l}\xi_i^B + C_P\sum_{i=1}^{l}\xi_i^P + C\sum_{i=1}^{l}\xi_i^A\xi_i^B, \\
\text{s.t.} \quad & y_i\,(w_A\cdot\phi_A(x_i^A)) \ge 1 - \xi_i^A, \qquad y_i\,(w_B\cdot\phi_B(x_i^B)) \ge 1 - \xi_i^B, \qquad y_i\,(w_P\cdot\phi_P(x_i^P)) \ge 1 - \xi_i^P, \\
& \xi_i^A \ge y_i\,(w_B\cdot\phi_B(x_i^B)), \qquad \xi_i^B \ge y_i\,(w_A\cdot\phi_A(x_i^A)), \\
& \xi_i^A \ge y_i\,(w_P\cdot\phi_P(x_i^P)), \qquad \xi_i^B \ge y_i\,(w_P\cdot\phi_P(x_i^P)), \\
& \xi_i^A \ge 0, \quad \xi_i^B \ge 0, \quad \xi_i^P \ge 0, \quad i = 1, \dots, l.
\end{aligned}
\tag{4}
$$
In optimization problem Equation (4), $\|w_A\|^2$ and $\|w_B\|^2$ are the regularization terms of view A and view B, and $\|w_P\|^2$ is the regularization term of the privileged information view P. $C_A$, $C_B$, and $C_P$ are non-negative penalty parameters; $\xi^A = [\xi_1^A, \dots, \xi_l^A]$, $\xi^B = [\xi_1^B, \dots, \xi_l^B]$, and $\xi^P = [\xi_1^P, \dots, \xi_l^P]$ are non-negative slack variables. $\gamma$ balances the weight of view A and view B, and $\gamma_P$ weighs the influence of the privileged information view P; $\phi_A(x_i^A)$, $\phi_B(x_i^B)$, and $\phi_P(x_i^P)$ denote the mappings of the view data. In the constraints, $y_i(w_P\cdot\phi_P(x_i^P)) \ge 1 - \xi_i^P$ states that the slack variable of view P is constrained by view P itself, while $\xi_i^A \ge y_i(w_P\cdot\phi_P(x_i^P))$ and $\xi_i^B \ge y_i(w_P\cdot\phi_P(x_i^P))$ correct the classification hyperplanes through the correction hyperplane formed by the privileged information. The above optimization problem can be transformed into its Lagrangian dual and solved as a quadratic convex optimization problem. The Lagrangian function is Equation (5).
$$
\begin{aligned}
L = \ & \frac{1}{2}\left(\|w_A\|^2 + \gamma\|w_B\|^2 + \gamma_P\|w_P\|^2\right) + C_A\sum_{i=1}^{l}\xi_i^A + C_B\sum_{i=1}^{l}\xi_i^B + C_P\sum_{i=1}^{l}\xi_i^P + C\sum_{i=1}^{l}\xi_i^A\xi_i^B \\
& + \sum_{i=1}^{l}\alpha_i^A\left(1 - \xi_i^A - y_i\,(w_A\cdot\phi_A(x_i^A))\right) + \sum_{i=1}^{l}\alpha_i^B\left(1 - \xi_i^B - y_i\,(w_B\cdot\phi_B(x_i^B))\right) + \sum_{i=1}^{l}\alpha_i^P\left(1 - \xi_i^P - y_i\,(w_P\cdot\phi_P(x_i^P))\right) \\
& + \sum_{i=1}^{l}\lambda_i^A\left(y_i\,(w_B\cdot\phi_B(x_i^B)) - \xi_i^A\right) + \sum_{i=1}^{l}\lambda_i^B\left(y_i\,(w_A\cdot\phi_A(x_i^A)) - \xi_i^B\right) \\
& + \sum_{i=1}^{l}\mu_i^A\left(y_i\,(w_P\cdot\phi_P(x_i^P)) - \xi_i^A\right) + \sum_{i=1}^{l}\mu_i^B\left(y_i\,(w_P\cdot\phi_P(x_i^P)) - \xi_i^B\right) - \sum_{i=1}^{l}\beta_i^A\xi_i^A - \sum_{i=1}^{l}\beta_i^B\xi_i^B.
\end{aligned}
\tag{5}
$$
Therefore, the dual problem of Equation (4) can be obtained by taking the partial derivatives with respect to the optimization variables, giving Equation (6).
$$
\begin{aligned}
\min \quad & \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\Bigl( (\alpha_i^A - \lambda_i^B)\,y_i\,K_A(x_i^A, x_j^A)\,(\alpha_j^A - \lambda_j^B)\,y_j + \frac{1}{\gamma}(\alpha_i^B - \lambda_i^A)\,y_i\,K_B(x_i^B, x_j^B)\,(\alpha_j^B - \lambda_j^A)\,y_j \\
& \qquad\qquad + \frac{1}{\gamma_P}\,\alpha_i^P\,y_i\,K_P(x_i^P, x_j^P)\,\alpha_j^P\,y_j \Bigr) - \sum_{i=1}^{l}\left(\alpha_i^A + \alpha_i^B + \alpha_i^P\right) \\
& + \frac{1}{C}\sum_{i=1}^{l}\left(\alpha_i^A + \lambda_i^A + \mu_i^A + \beta_i^A - C_A\right)\left(\alpha_i^B + \lambda_i^B + \mu_i^B + \beta_i^B - C_B\right), \\
\text{s.t.} \quad & \alpha_i^A,\ \alpha_i^B,\ \lambda_i^A,\ \lambda_i^B,\ \mu_i^A,\ \mu_i^B,\ \beta_i^A,\ \beta_i^B \ge 0.
\end{aligned}
\tag{6}
$$
where $K_A(x_i^A, x_j^A)$, $K_B(x_i^B, x_j^B)$, and $K_P(x_i^P, x_j^P)$ represent the kernel mappings of the view A, view B, and view P feature data, respectively. The optimization problem Equation (6) is a quadratic convex programming problem and can be solved with a quadratic convex programming method. After solving for the optimal parameters $\alpha_i^A$, $\alpha_i^B$, $\beta_i^A$, $\beta_i^B$, $\lambda_i^A$, $\lambda_i^B$, $\mu_i^A$, $\mu_i^B$, we use the Karush–Kuhn–Tucker (KKT) [29] conditions to obtain the optimal $w_A$ and $w_B$. The calculation results are shown in Equations (7) and (8).
$$w_A = \sum_{i=1}^{l}\left(\alpha_i^A - \lambda_i^B\right) y_i\,\phi_A(x_i^A), \tag{7}$$
$$w_B = \sum_{i=1}^{l}\left(\alpha_i^B - \lambda_i^A\right) y_i\,\phi_B(x_i^B). \tag{8}$$
After obtaining the optimal $w_A$ and $w_B$, we predict the label of a new sample $(x^A, x^B)$ from its view A and view B features. The final multi-view predictor is constructed as the average of the per-view predictions and is shown in Equation (9).
$$f = \mathrm{sign}\left(\frac{1}{2}f_A(x^A) + \frac{1}{2}f_B(x^B)\right) = \mathrm{sign}\left(\frac{1}{2}\,w_A\cdot\phi_A(x^A) + \frac{1}{2}\,w_B\cdot\phi_B(x^B)\right). \tag{9}$$
where $f_A$ and $f_B$ represent the decision functions of view A and view B, respectively. The FIMV-SVM solution process is summarized in Algorithm 1.
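In kernel form, the prediction of Equation (9) uses the expansions of $w_A$ and $w_B$ from Equations (7) and (8). A minimal sketch, assuming the optimal multipliers and the two trained kernel functions are available:

```python
# Sketch of the FIMV-SVM decision function (Equation (9)) via the kernel
# expansions of Equations (7) and (8); kernel_a and kernel_b are assumed to be
# the trained view-A and view-B kernel functions.
import numpy as np

def fimv_svm_predict(xA, xB, trainA, trainB, y, alphaA, alphaB, lamA, lamB,
                     kernel_a, kernel_b):
    """Predict +1 (cloud) / -1 (non-cloud) for one sample given its two views."""
    fA = np.sum((alphaA - lamB) * y * np.array([kernel_a(xi, xA) for xi in trainA]))
    fB = np.sum((alphaB - lamA) * y * np.array([kernel_b(xi, xB) for xi in trainB]))
    return np.sign(0.5 * fA + 0.5 * fB)
```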

3.3. Cloud Detection Model Training and Application Process

Combining the above components, the cloud detection model can be summarized; the specific process is shown in Figure 2. Notably, privileged information does not appear in the application phase.
Algorithm 1 QP Algorithm for FIMV-SVM
Require: $S = \{(x_i^A, x_i^B, x_i^P, y_i)\}_{i=1}^{l}$, $y_i \in \{+1, -1\}$
Ensure: Decision function: $f = \mathrm{sign}\left(\frac{1}{2}w_A\cdot\phi_A(x^A) + \frac{1}{2}w_B\cdot\phi_B(x^B)\right)$
1: The grid method generates the parameter set $paraSet = \{(C_A, C_B, C_P, C, \gamma, \gamma_P)\}_{i=1}^{n}$.
2: for each $i \in [1, n]$ do
3:   Set the parameters $(C_A, C_B, C_P, C, \gamma, \gamma_P) = paraSet[i,:]$.
4:   Set the kernel functions of view A, view B, and view P: $K_A(x_i^A, x_j^A)$, $K_B(x_i^B, x_j^B)$, $K_P(x_i^P, x_j^P)$.
5:   Create and solve the quadratic programming problem, retaining the optimal parameters $\alpha_i^A, \alpha_i^B, \beta_i^A, \beta_i^B, \lambda_i^A, \lambda_i^B, \mu_i^A, \mu_i^B$.
6:   Obtain the optimal weights $w_A$ and $w_B$ by substitution: $w_A = \sum_{i=1}^{l}(\alpha_i^A - \lambda_i^B)y_i\phi_A(x_i^A)$, $w_B = \sum_{i=1}^{l}(\alpha_i^B - \lambda_i^A)y_i\phi_B(x_i^B)$.
7:   Solve the decision function from $w_A$ and $w_B$.
8:   Compute the accuracy on the validation set as $ACC[i]$; store $W_A[i] = w_A$, $W_B[i] = w_B$.
9: end for
10: The final $w_A$, $w_B$: $w_A = W_A[\mathrm{find}(ACC == \max(ACC))]$, $w_B = W_B[\mathrm{find}(ACC == \max(ACC))]$.
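Step 5 of Algorithm 1 solves a quadratic program of the form of Equation (6). As a simplified illustration of that machinery, the sketch below solves the dual of an ordinary single-view soft-margin SVM (without a bias term, as in the formulation above) with the Python cvxopt solver; the full FIMV-SVM dual has more multiplier blocks but is passed to a QP solver in the same standard form, and the paper itself reports solving it with the CVX toolbox in MATLAB.

```python
# Illustration of solving an SVM-style dual as a standard-form QP with cvxopt:
# min 1/2 x^T P x + q^T x  s.t.  G x <= h. This single-view soft-margin dual is
# a simplified stand-in for the larger FIMV-SVM problem of Equation (6).
import numpy as np
from cvxopt import matrix, solvers

def rbf_gram(X, sigma=1.0):
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma))

def solve_svm_dual(X, y, C=1.0, sigma=1.0):
    l = len(y)
    P = matrix(np.outer(y, y) * rbf_gram(X, sigma))          # quadratic term
    q = matrix(-np.ones(l))                                   # linear term
    G = matrix(np.vstack([-np.eye(l), np.eye(l)]))            # 0 <= alpha_i <= C
    h = matrix(np.hstack([np.zeros(l), C * np.ones(l)]))
    solvers.options['show_progress'] = False
    sol = solvers.qp(P, q, G, h)
    return np.array(sol['x']).ravel()                         # optimal multipliers

X = np.random.randn(40, 2)
y = np.where(np.random.rand(40) > 0.5, 1.0, -1.0)
alphas = solve_svm_dual(X, y)
```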
In the model training phase, the following steps are taken: (1) super pixels are obtained with the simple linear iterative clustering (SLIC) method [30], which divides the original images into super pixels and constructs a dataset with super pixel blocks as the classification objects; (2) the three feature extraction methods are applied to each super pixel to compose the feature vectors of view A, view B, and view P; (3) the FIMV-SVM classifier is trained using the extracted numerical feature dataset and the labels.
In the model application phase, the following steps are taken: (1) the image to be processed is divided by SLIC super pixel segmentation, and each super pixel block is resized; (2) the view A and view B features are extracted from each super pixel to form its feature vectors; (3) the two view feature vectors of each super pixel are passed through the FIMV-SVM decision function to obtain the corresponding classification result; (4) the final cloud mask is formed by combining the super pixel classification results.
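The application phase can be sketched as follows, assuming a trained decision function predict_super_pixel (wrapping Equation (9)) and the view A/view B feature extractors; scikit-image's slic and resize are stand-ins for the segmentation and resize operations described in this paper.

```python
# High-level sketch of the application phase: SLIC segmentation, per-super-pixel
# feature extraction for views A and B, FIMV-SVM classification, mask assembly.
# extract_view_a, extract_view_b, and predict_super_pixel are assumed trained.
import numpy as np
from skimage.segmentation import slic
from skimage.transform import resize

def detect_clouds(image, extract_view_a, extract_view_b, predict_super_pixel):
    """image: (H, W, 3) RGB array -> binary cloud mask of shape (H, W)."""
    segments = slic(image, n_segments=2000)            # ~2000 super pixels per image
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    for sp_id in np.unique(segments):
        ys, xs = np.where(segments == sp_id)
        patch = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        patch = resize(patch, (16, 16))                 # circumscribed rectangle -> 16 x 16
        xA = extract_view_a(patch)                      # 1 x 64 CNN feature (view A)
        xB = extract_view_b(image[ys, xs])              # 1 x 15 colour statistics (view B)
        if predict_super_pixel(xA, xB) > 0:             # view P is not needed here
            mask[ys, xs] = 1
    return mask
```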

4. Experiment

Our experiments were carried out on a personal computer with an i7-6500 CPU and 16 GB of RAM. The environment is Python 3.7 with the PyTorch framework (version 1.11.0), and the CVX convex optimization toolbox [31] on MATLAB 2016b is used to solve the convex optimization problem.
The experiments use the GF-1_WHU remote sensing images [32] and the Cloud-38 dataset [33]; a detailed description is given in Table 1. Each original high-resolution image is divided into sub-images of 400 × 400 × 3, and only the visible light channels are used.
When setting the segmentation level of the super pixel segmentation, considering that a super pixel object needs a certain information capacity, segmentation levels of 1000, 1600, 2000, 2400, and 3000 are considered for a 400 × 400 × 3 image, corresponding to roughly 160, 100, 80, 67, and 54 pixels per super pixel block. To explore the optimal super pixel partition level, we construct a parameter optimization dataset of size 1000 from the original datasets; 80% of it is used for training and the rest for testing, with the average test accuracy as the evaluation index. The experimental results are shown in Figure 3. The best result is obtained at a super pixel level of 2000, which indicates that about 80 pixels per super pixel is a suitable minimum unit for the cloud recognition task. Therefore, the convenient and efficient SLIC method is used with a segmentation level of 2000; that is, an image contains about 2000 super pixels, and each super pixel contains about 80 pixels. After resizing, each circumscribed rectangle contains 256 pixels (16 × 16); the resize operation uses the imresize function of MATLAB, which applies bicubic interpolation by default. Super pixel blocks whose cloud content exceeds 45% are automatically labeled as cloud super pixels, and the others are labeled as non-cloud super pixels.
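The automatic labeling rule can be sketched as below, assuming the reference cloud mask is a binary array aligned with the SLIC label map; the 45% threshold follows the text.

```python
# Sketch of the training-set labelling rule: a super pixel whose cloud content
# exceeds 45% in the reference mask is labelled +1 (cloud), otherwise -1.
import numpy as np

def label_super_pixels(segments, gt_cloud_mask, threshold=0.45):
    """segments: SLIC label map; gt_cloud_mask: binary ground-truth cloud mask."""
    labels = {}
    for sp_id in np.unique(segments):
        cloud_ratio = gt_cloud_mask[segments == sp_id].mean()  # cloud content of the block
        labels[sp_id] = 1 if cloud_ratio > threshold else -1
    return labels
```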

4.1. Parameter Setting

In the experiment, we set the following parameters. The values of $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ in the CSPFE-Net of view A are 0.3, 0.3, 0.2, and 0.2. The FIMV-SVM parameters are determined by grid search and five-fold cross validation during training, and the parameter setting with the highest accuracy is selected. $C$, $C_A$, $C_B$, and $C_P$ take their values from the set $\{10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 10^{2}, 10^{3}\}$, and the value set of $\gamma$ and $\gamma_P$ is $\{0.2, 0.4, 0.6, 0.8, 1\}$. The Gaussian radial basis function (RBF) kernel $K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma}\right)$ is used as the kernel function of the SVM method, with $\sigma$ selected from $\{10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 10^{2}, 10^{3}\}$. For the association between parameter tuning and training, see Algorithm 1 above.
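For illustration, the parameter grid and the RBF kernel described above can be written as follows; enumerating the grid with itertools and scoring each candidate by five-fold cross validation is an assumed implementation of the selection in Algorithm 1.

```python
# Sketch of the RBF kernel and the parameter grid searched during training;
# each candidate setting would be scored by 5-fold cross-validation accuracy.
import itertools
import numpy as np

def rbf_kernel(xi, xj, sigma):
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma))

penalty_values = [1e-3, 1e-2, 1e-1, 1, 10, 1e2, 1e3]     # for C, C_A, C_B, C_P (and sigma)
gamma_values = [0.2, 0.4, 0.6, 0.8, 1]                    # for gamma and gamma_P

param_grid = list(itertools.product(penalty_values, penalty_values, penalty_values,
                                    penalty_values, gamma_values, gamma_values))
# each tuple is one (C, C_A, C_B, C_P, gamma, gamma_P) candidate; the setting
# with the highest cross-validation accuracy is retained, as in Algorithm 1
```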

4.2. Visual Performance

We compare the proposed method with several advanced cloud detection methods. Cloud-Net is a deep learning cloud detection method that is widely used as the baseline in cloud detection experiments. The hierarchical fusion convolutional neural network (HFCNN) [19] is a detection network with multi-level feature extraction that also makes judgments at the super pixel level. Furthermore, we add the SVM method and the MCPK method: in the SVM method, the extracted view A and view B data are concatenated into a single feature vector for training, while MCPK uses the view A and view B data as two views. These two comparisons allow us to examine the effectiveness of multi-view learning and of adding privileged information.
Typical cloud-containing image blocks with a variety of cloud coverages and backgrounds are selected from GF-1_WHU and Cloud-38 to display the detection results. The comparison with the proposed method is shown in Figure 4 and Figure 5; in the figures, from left to right, are the original image, ground truth, Cloud-Net, HFCNN, SVM, MCPK, and our method. Overall, the visual results of the proposed method are closer to the ground truth. Several aspects are worth noting. In Figure 4, the test image in the third row contains high-brightness surface features: Cloud-Net makes some misjudgments, and the SVM method classifies a large area as cloud. Compared with HFCNN, the proposed method is more rigorous in thin cloud detection. The SVM method often misses large cloud areas, and MCPK misjudges to a certain extent in continuous cloud regions.

4.3. Quantitative Analysis

For the overall results, the Jaccard index is used to describe the similarity between the predicted mask and the reference mask and is widely used in the performance evaluation of cloud detection tasks. Precision is the proportion of pixels predicted as cloud that are truly cloud, and recall represents how many of all labeled cloud pixels are correctly predicted. The specificity index measures how completely non-cloud pixels are identified, and the overall accuracy index represents the accuracy of the cloud/non-cloud binary classification. The F1-score balances precision and recall. The calculation of each evaluation index is shown in Equations (10)–(15).
We divide each image in GF-1_WHU and Cloud-38 into 400 × 400 × 3 sub-images and randomly select 80% of each dataset as the training set, with the rest as the test set. Specifically, the GF-1_WHU dataset contains 4246 images, including 3369 training images and 850 test images; the Cloud-38 dataset contains 15,200 images, including 12,160 training images and 3040 test images. Each index is reported as mean ± variance.
$$\mathrm{Jaccard\ Index} = \frac{TP}{TP + FN + FP} \tag{10}$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{11}$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{12}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{13}$$
$$\mathrm{Overall\ Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{14}$$
$$\mathrm{F1\text{-}Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{15}$$
where $TP$ represents the number of positive samples judged positive, $TN$ the number of negative samples judged negative, $FP$ the number of negative samples judged positive, and $FN$ the number of positive samples judged negative.
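As a concrete reference, the six indices of Equations (10)–(15) can be computed from the confusion counts as follows; the function assumes binary masks with 1 for cloud.

```python
# Sketch of the evaluation indices of Equations (10)-(15) computed from the
# confusion counts of a predicted cloud mask against the reference mask.
import numpy as np

def cloud_metrics(pred, gt):
    """pred, gt: binary arrays of the same shape (1 = cloud, 0 = non-cloud)."""
    tp = np.sum((pred == 1) & (gt == 1))
    tn = np.sum((pred == 0) & (gt == 0))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "jaccard_index": tp / (tp + fn + fp),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "overall_accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1_score": 2 * precision * recall / (precision + recall),
    }
```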
As shown in Table 2 and Table 3, our method achieves high scores in both groups of test results. The Cloud-Net [15] method requires a large amount of annotated training data and cannot use the location information of pixel blocks. HFCNN also uses super pixels as the basic unit of cloud detection, but it uses only one feature extraction method and cannot exploit the privileged information carried by the super pixel labels, so it performs poorly in some confusing areas; moreover, its basic unit is 32 × 32 × 3, so its segmentation may be rougher. The comparison of the SVM, MCPK, and our methods shows that the multi-view structure performs better than directly stitching the combined features. At the same time, thanks to the multi-view feature extraction of super pixels and the utilization of privileged information, our method recognizes cloud areas more accurately. The experimental results show that our method obtains excellent results regardless of how many types of underlying surface appear in the background, and the results of each index prove the feasibility and effectiveness of the proposed remote sensing cloud detection method.

5. Conclusions

This paper studies a three-channel remote sensing cloud detection method based on the fusion of multi-view information at the super pixel level. Firstly, the segmented super pixels are used to establish a super pixel remote sensing image database. Secondly, several feature extraction mechanisms are used to extract the three view features of super pixel blocks, including the privileged information view. Finally, an SVM classifier that can utilize privileged information features is constructed, and a solution strategy based on quadratic convex optimization is proposed. The classifier judges the category of each super pixel in turn, and the results are organized to generate the cloud mask. Experiments are carried out on the GF-1_WHU and Cloud-38 datasets with different cloud contents. The qualitative and quantitative analyses show that the proposed method performs well, including in scenarios with large differences in cloud distribution and cloud content. In the future, we will consider improving the model with transfer learning so that it can quickly adapt to cloud recognition in remote sensing images of multiple styles. The algorithm proposed in this paper is an improvement of the SVM binary classifier; a new strategy for multi-classification tasks, such as fine-grained cloud classification, is also a direction worth studying in the future.

Author Contributions

Conceptualization, Q.H. and Y.X.; methodology, Q.H. and W.Z.; software, Q.H.; validation, Q.H., Y.X. and W.Z.; writing—original draft preparation, Q.H.; writing—review and editing, Q.H., Y.X. and W.Z.; visualization, Q.H.; supervision, Y.X.; project administration, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Tianjin “Project + Team” Key Training Project grant number XC202022 and Tianjin Research Innovation Project for Postgraduate Students grant number 2021YJSS095.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following is a description of the abbreviations used in this paper.
SVM  Support vector machine
MKL  Multi-kernel learning
SMO  Sequential minimal optimization
MKNPSVM  Multi-kernel framework with nonparallel support vector machine
KCCA  Kernel canonical correlation analysis
FCN  Fully convolutional network
MCPK  Multi-view learning coupling privileged kernel method
CNN  Convolutional neural network
CSPFE-Net  Cloud super pixel feature extraction network
FIMV-SVM  Fusion information multi-view SVM
KKT  Karush–Kuhn–Tucker
SLIC  Simple linear iterative clustering
HFCNN  Hierarchical fusion convolutional neural network
RBF  Radial basis function

References

  1. Sola, I.; Álvarez-Mozos, J.; González-Audícana, M. Inter-Comparison of Atmospheric Correction Methods on Sentinel-2 Images Applied to Croplands. In Proceedings of the 2018 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2018, Valencia, Spain, 22–27 July 2018; IEEE: Piscataway Township, NJ, USA, 2018; pp. 5940–5943. [Google Scholar]
  2. Vermote, E.F.; Saleous, N.; Justice, C.O. Atmospheric correction of MODIS data in the visible to middle infrared: First results. Remote Sens. Environ. 2002, 83, 97–111. [Google Scholar] [CrossRef]
  3. Irish, R.R.; Barker, J.L.; Goward, S.N.; Arvidson, T. Characterization of the Landsat-7 ETM+ automated cloud-cover assessment (ACCA) algorithm. Photogramm. Eng. Remote Sens. 2006, 72, 1179–1188. [Google Scholar] [CrossRef] [Green Version]
  4. Zhu, Z.; Woodcock, C.E. Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sens. Environ. 2012, 118, 83–94. [Google Scholar] [CrossRef]
  5. Bley, S.; Deneke, H. A threshold-based cloud mask for the high-resolution visible channel of Meteosat Second Generation SEVIRI. Atmos. Meas. Tech. 2013, 6, 2713–2723. [Google Scholar] [CrossRef] [Green Version]
  6. Jedlovec, G.J.; Haines, S.L.; LaFontaine, F.J. Spatial and temporal varying thresholds for cloud detection in GOES imagery. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1705–1717. [Google Scholar] [CrossRef]
  7. Zhang, Q.; Xiao, C. Cloud detection of RGB color aerial photographs by progressive refinement scheme. IEEE Trans. Geosci. Remote Sens. 2014, 52, 7264–7275. [Google Scholar] [CrossRef] [Green Version]
  8. Li, P.; Dong, L.; Xiao, H.; Xu, M. A cloud image detection method based on SVM vector machine. Neurocomputing 2015, 169, 34–42. [Google Scholar] [CrossRef]
  9. Le Hégarat-Mascle, S.; André, C. Use of Markov random fields for automatic cloud/shadow detection on high resolution optical images. ISPRS J. Photogramm. Remote Sens. 2009, 64, 351–366. [Google Scholar] [CrossRef]
  10. Shao, Z.; Deng, J.; Wang, L.; Fan, Y.; Sumari, N.S.; Cheng, Q. Fuzzy autoencode based cloud detection for remote sensing imagery. Remote Sens. 2017, 9, 311. [Google Scholar] [CrossRef] [Green Version]
  11. Ishida, H.; Oishi, Y.; Morita, K.; Moriwaki, K.; Nakajima, T.Y. Development of a support vector machine based cloud detection method for MODIS with the adjustability to various conditions. Remote Sens. Environ. 2018, 205, 390–407. [Google Scholar] [CrossRef]
  12. Yuan, Y.; Hu, X. Bag-of-words and object-based classification for cloud extraction from satellite imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 4197–4205. [Google Scholar] [CrossRef]
  13. Mohajerani, S.; Krammer, T.A.; Saeedi, P. Cloud detection algorithm for remote sensing images using fully convolutional neural networks. arXiv 2018, arXiv:1810.05782. [Google Scholar]
  14. Manzo, M.; Pellino, S. Voting in transfer learning system for ground-based cloud classification. Mach. Learn. Knowl. Extr. 2021, 3, 28. [Google Scholar] [CrossRef]
  15. Mohajerani, S.; Saeedi, P. Cloud-Net: An end-to-end cloud detection algorithm for Landsat 8 imagery. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; IEEE: Piscataway Township, NJ, USA, 2019; pp. 1029–1032. [Google Scholar]
  16. Yan, Z.; Yan, M.; Sun, H.; Fu, K.; Hong, J.; Sun, J.; Zhang, Y.; Sun, X. Cloud and cloud shadow detection using multilevel feature fused segmentation network. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1600–1604. [Google Scholar] [CrossRef]
  17. Tan, K.; Zhang, Y.; Tong, X. Cloud extraction from Chinese high resolution satellite imagery by probabilistic latent semantic analysis and object-based machine learning. Remote Sens. 2016, 8, 963. [Google Scholar] [CrossRef] [Green Version]
  18. Xie, F.; Shi, M.; Shi, Z.; Yin, J.; Zhao, D. Multilevel cloud detection in remote sensing images based on deep learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3631–3640. [Google Scholar] [CrossRef]
  19. Liu, H.; Du, H.; Zeng, D.; Tian, Q. Cloud detection using super pixel classification and semantic segmentation. J. Comput. Sci. Technol. 2019, 34, 622–633. [Google Scholar] [CrossRef]
  20. Gao, X.; Fan, L.; Xu, H. Multiple rank multi-linear kernel support vector machine for matrix data classification. Int. J. Mach. Learn. Cybern. 2018, 9, 251–261. [Google Scholar] [CrossRef]
  21. Xu, C.; Tao, D.; Xu, C. A survey on multi-view learning. arXiv 2013, arXiv:1304.5634. [Google Scholar]
  22. Tang, J.; He, Y.; Tian, Y.; Liu, D.; Kou, G.; Alsaadi, F.E. Coupling loss and self-used privileged information guided multi-view transfer learning. Inf. Sci. 2021, 551, 245–269. [Google Scholar] [CrossRef]
  23. Appice, A.; Malerba, D. A co-training strategy for multiple view clustering in process mining. IEEE Trans. Serv. Comput. 2015, 9, 832–845. [Google Scholar] [CrossRef]
  24. Kim, D.; Seo, D.; Cho, S.; Kang, P. Multi-co-training for document classification using various document representations: TF–IDF, LDA, and Doc2Vec. Inf. Sci. 2019, 477, 15–29. [Google Scholar] [CrossRef]
  25. Cao, L.-L.; Huang, W.B.; Sun, F.-C. Optimization-based extreme learning machine with multi-kernel learning approach for classification. In Proceedings of the 2014 22nd International Conference on Pattern Recognition, Stockholm, Sweden, 24–28 August 2014; IEEE: Piscataway Township, NJ, USA, 2014; pp. 3564–3569. [Google Scholar]
  26. Tang, J.; Tian, Y. A multi-kernel framework with nonparallel support vector machine. Neurocomputing 2017, 266, 226–238. [Google Scholar] [CrossRef]
  27. Chao, G.; Sun, S. Consensus and complementarity based maximum entropy discrimination for multi-view classification. Inf. Sci. 2016, 367, 296–310. [Google Scholar] [CrossRef]
  28. Tang, J.; Tian, Y.; Liu, D.; Kou, G. Coupling privileged kernel method for multi-view learning. Inf. Sci. 2019, 481, 110–127. [Google Scholar] [CrossRef]
  29. Deng, N.; Tian, Y.; Zhang, C. Support Vector Machines: Optimization Based Theory, Algorithms, and Extensions; CRC Press: Boca Raton, FL, USA, 2012. [Google Scholar]
  30. Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC super pixels compared to state-of-the-art super pixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2282. [Google Scholar] [CrossRef] [Green Version]
  31. Grant, M.; Boyd, S.; Ye, Y. CVX: Matlab Software for Disciplined Convex Programming. 2008. Available online: http://cvxr.com/cvx/ (accessed on 15 July 2022).
  32. Li, Z.; Shen, H.; Li, H.; Xia, G.; Gamba, P.; Zhang, L. Multi-feature combined cloud and cloud shadow detection in GaoFen-1 wide field of view imagery. Remote Sens. Environ. 2017, 191, 342–358. [Google Scholar] [CrossRef] [Green Version]
  33. Roy, D.P.; Wulder, M.A.; Loveland, T.R.; Woodcock, C.E.; Allen, R.G.; Anderson, M.C.; Helder, D.; Irons, J.R.; Johnson, D.M.; Kennedy, R.; et al. Landsat-8: Science and product vision for terrestrial global change research. Remote Sens. Environ. 2014, 145, 154–172. [Google Scholar] [CrossRef] [Green Version]
Figure 1. CSPFE-Net structure.
Figure 2. FIMV-SVM cloud detection model training and application process framework.
Figure 3. Performance of different super pixel classification levels on the parameter optimization dataset.
Figure 4. The performance of several methods on the GF-1_WHU dataset [32] with different cloud distribution styles.
Figure 5. The performance of several methods on the Cloud-38 dataset [33] with different cloud distribution styles.
Table 1. Public dataset description.

Name | Number of Images | Resource Acquisition | Remarks
GF-1_WHU | 108 | http://sendimage.whu.edu.cn/en/mfc-validation-data/ (accessed on 15 July 2022) | GF-1_WHU includes 108 GF-1 wide field of view (WFV) level-2A scenes and their reference cloud and cloud shadow masks.
Cloud-38 | 38 | https://www.kaggle.com/datasets/sorour/38cloud-cloud-segmentation-in-satellite-images/download (accessed on 15 July 2022) | There are four spectral channels, namely, red (band 4), green (band 3), blue (band 2), and near-infrared (band 5).
Table 2. Performance on GF-1_WHU test dataset (mean ± variance (%)).

Method | Jaccard Index | Precision | Recall | Specificity | Overall Accuracy | F1-Score
Cloud-Net [15] | 81.93 ± 7.22 | 87.03 ± 5.48 | 93.49 ± 4.36 | 82.86 ± 4.34 | 88.46 ± 3.68 | 91.21 ± 4.46
HFCNN [19] | 87.94 ± 5.12 | 91.87 ± 4.05 | 96.02 ± 5.13 | 91.28 ± 3.74 | 92.79 ± 5.04 | 94.36 ± 4.76
SVM | 67.12 ± 3.77 | 71.36 ± 5.14 | 92.13 ± 5.49 | 72.56 ± 4.87 | 78.85 ± 3.24 | 79.62 ± 3.64
MCPK | 85.32 ± 4.19 | 87.93 ± 5.33 | 96.07 ± 3.96 | 85.77 ± 5.08 | 90.76 ± 4.39 | 92.43 ± 3.13
Our method | 91.69 ± 1.34 | 94.67 ± 2.01 | 97.38 ± 1.65 | 93.04 ± 2.12 | 96.31 ± 1.46 | 96.03 ± 1.72
Table 3. Performance on Cloud-38 test dataset (mean ± variance (%)).

Method | Jaccard Index | Precision | Recall | Specificity | Overall Accuracy | F1-Score
Cloud-Net [15] | 87.13 ± 5.14 | 91.22 ± 4.37 | 95.94 ± 3.89 | 89.03 ± 4.34 | 93.14 ± 5.36 | 92.87 ± 4.68
HFCNN [19] | 92.04 ± 3.69 | 92.95 ± 3.98 | 98.25 ± 5.01 | 92.24 ± 4.36 | 95.86 ± 5.23 | 94.21 ± 3.67
SVM | 79.02 ± 4.37 | 82.83 ± 5.04 | 92.87 ± 4.76 | 82.78 ± 3.85 | 86.74 ± 4.19 | 87.86 ± 3.92
MCPK | 87.45 ± 3.94 | 92.32 ± 4.18 | 96.83 ± 5.03 | 88.34 ± 4.76 | 92.89 ± 5.14 | 94.03 ± 3.87
Our method | 94.87 ± 1.24 | 95.28 ± 1.39 | 98.74 ± 2.07 | 93.48 ± 2.11 | 97.28 ± 1.47 | 97.37 ± 1.86
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
