Generative Adversarial Networks-Based Semi-Supervised Learning for Hyperspectral Image Classiﬁcation

: Classiﬁcation of hyperspectral image (HSI) is an important research topic in the remote sensing community. Signiﬁcant efforts (e.g., deep learning) have been concentrated on this task. However, it is still an open issue to classify the high-dimensional HSI with a limited number of training samples. In this paper, we propose a semi-supervised HSI classiﬁcation method inspired by the generative adversarial networks (GANs). Unlike the supervised methods, the proposed HSI classiﬁcation method is semi-supervised, which can make full use of the limited labeled samples as well as the sufﬁcient unlabeled samples. Core ideas of the proposed method are twofold. First, the three-dimensional bilateral ﬁlter (3DBF) is adopted to extract the spectral-spatial features by naturally treating the HSI as a volumetric dataset. The spatial information is integrated into the extracted features by 3DBF, which is propitious to the subsequent classiﬁcation step. Second, GANs are trained on the spectral-spatial features for semi-supervised learning. A GAN contains two neural networks (i.e., generator and discriminator) trained in opposition to one another. The semi-supervised learning is achieved by adding samples from the generator to the features and increasing the dimension of the classiﬁer output. Experimental results obtained on three benchmark HSI datasets have conﬁrmed the effectiveness of the proposed method, especially with a limited number of labeled samples.


Introduction
A hyperspectral image [1][2][3][4] contains hundreds of continuous narrow spectral bands, spanning the visible to infrared spectrum.Hyperspectral sensors have attracted much interest in remote sensing for providing abundant and valuable information over the last few decades.With the useful information, HSI has played a vital role in many applications, among which classification [5][6][7] is one of the crucial processing steps that has received enormous attention.The foremost task in hyperspectral classification is to train an effective classifier with the given training set from each class.Therefore, sufficient training samples are crucial to train a reliable classifier.However, in reality, it is time-consuming and expensive to obtain a large number of samples with class labels.This difficulty will result in the curse of dimensionality (i.e., Hughes phenomenon) and will induce the risk of overfitting.
Much work has been carried out to design suitable classifiers to deal with the above-mentioned problems in the last decades.In general, those methods can be categorized into three types, i.e., unsupervised, supervised and semi-supervised methods.Unsupervised methods focus on training models from large unlabeled samples.Since no labeled samples are required, the unsupervised methods can be easily applied in the hyperspectral processing area.Many unsupervised methods, such as fuzzy clustering [8], fuzzy C-Means method [9], artificial immune algorithm [10], graph-based method [11], have demonstrated impressive results in hyperspectral classification.However, one cannot ensure the relationship between clusters and classes with too little priori knowledge.
Supervised classifiers, which are widely used in hyperspectral classification, can yield improved performance by utilizing the priori information of the class labels.Typical supervised classifiers include the support vector machine (SVM) [12,13], artificial neural networks (ANN) [14] and sparse representation-based classification (SRC) [15,16], etc. SVM is a kind of kernel-based method that aims at exploring the optimal separating hyperplane between different classes, ANN is motivated by the biological learning process of human brain, while SRC stems from the rapid development of compressed sensing in recent years.Versatile as the supervised classifiers are, their performance heavily depends on the number of labeled samples.In contrast to the urgent needs of labeled samples, they ignore the large number of unlabeled samples to assist classification.
Semi-supervised learning is designed to alleviate the "small-sample problem" by utilizing both the limited labeled samples and the wealth of unlabeled samples that can be easily obtained without significant cost.The semi-supervised methods can be roughly divided into four types: (1) generative models [17,18], which estimate the conditional density to obtain the labels of unlabeled samples.(2) Low density separation, which aims to place boundaries in regions where few samples (labeled or unlabeled) existed.One of the state-of-the-art algorithms is the transductive support vector machine (TSVM) [19][20][21].(3) Graph-based methods [22][23][24][25][26] that utilize labeled and unlabeled samples to construct graphs and minimize the energy function, and thus, assigning labels to unlabeled samples.(4) Wrapper-based methods, which apply a supervised learning method iteratively and a certain amount of unlabeled samples are labeled in each iteration.The self-training [27,28] and co-training [29,30] algorithms are commonly-used wrapper-based methods.
Notably that the samples within a small neighborhood are likely to belong to the same class and thus, the spatial correlation between neighboring samples can be incorporated into the classification to further improve the performance of the classifiers.For instance, the spatial contextual information [31][32][33][34][35][36][37][38][39] can be extracted by various spatial filters.Segmentation methods (e.g., watershed segmentation [40] and superpixel segmentation [41,42]) can also be adopted to exploit the spatial homogeneity of the HSI.One can also use the spatial similarity of neighboring samples [43][44][45] in the classification stages.Regularizations [15,[46][47][48][49][50][51][52][53] can be added in the classifiers to refine the classification performance.Different from the above-mentioned vector/matrix-based methods, there are some three-dimension (3D)/tensor-based methods [34,[54][55][56][57] that respect the 3D nature of the HSI and process the 3D cube as a whole entity.The 3D/tensor-based methods have demonstrated considerable improvement since the joint spectral-spatial structure information is effectively exploited.
However, most of the aforementioned methods can only extract features of the original HSI dataset in a shallow manner.Deep learning [58], which can hierarchically obtain the high-level abstract representation, has recently become a hotspot in the image processing area, especially in hyperspectral classification.Typical deep architectures involve the stacked autoencoder (SAE) [59], deep brief network (DBN) [60] and convolutional neural networks (CNN) [61].The above-mentioned classification frameworks are supervised, which require a large number of labeled samples for training.Recently, a semi-supervised classifier based on multi-decision labeling and contextual deep learning (i.e., CDL-MD-L) is proposed by [62], which has demonstrated promising results in hyperspectral classification.
In this paper, a generative adversarial networks (GANs)-based semi-supervised method is proposed for hyperspectral classification.To extract the spectral-spatial features, we extend the existing two-dimensional bilateral filter (2DBF) [36,63,64] into its three-dimensional version (i.e., 3DBF), which is a non-iterative method for nonlinear and edge-preserving smoothing.The 3DBF is suitable for spectral-spatial feature extraction since it respects the 3D nature of the HSI cube.Subsequently, the outputs of the previous step can be utilized to train GANs [65,66], which are promising neural networks that have been the focus of attention in recent years.In this paper, the GANs are trained for semi-supervised classification of HSI to use the limited labeled samples and vast of unlabeled samples.The semi-supervised learning is performed by adding samples from the generators to the extracted features and increasing the dimension of the classifier output.
Compared to the existing literature, the contribution of this paper lies in two aspects: 1.
We extract the spectral-spatial features by the 3DBF.Compared to the vector/matrix-based methods, the structural features extracted by the 3DBF can effectively preserve the spectral-spatial information by naturally obeying the 3D form of the HSI and treating the 3D cube as a whole entity.

2.
We classify the HSI in a semi-supervised manner by the GANs.Compared to the supervised methods, the GANs can utilize both limited training samples and abundant of unlabeled samples.
Compared to the non-adversarial networks, the GANs take advantage of the discriminative models to train the generative network based on game theory.
The remaining part of this paper is organized as follows.Section 2 describes the proposed semi-supervised classification method in detail.Section 3 reports the experimental results and analyses on three benchmark HSI datasets.Finally, discussions and conclusions are drawn in Sections 4 and 5.

Proposed Semi-Supervised Method
The conceptual framework of the proposed method is shown in Figure 1, which is composed of two parts: (1) feature extraction; (2) semi-supervised learning.The spectral-spatial features of the original HSI cube I can be extracted by the 3DBF, which is a 3D filter that can obey the 3D nature of the HSI and extract the spectral-spatial features simultaneously.Subsequently, GANs are utilized in the feature space for semi-supervised classification by taking full advantage of both the limited labeled samples and the sufficient unlabeled samples.The classification map can be achieved by visualizing the classification results of different samples.It is noteworthy that both 3DBF and GANs are of great importance for semi-supervised learning of HSI classification.On the one hand, 3DBF is adopted for extracting the spectral-spatial features of the HSI.As emphasized in Section 1, incorporating spatial information into the hyperspectral classification helps to improve the performance the classifiers, and thus, exploring spectral-spatial feature extraction methods has become an important research topic in the hyperspectral community.In addition, since the HSI data is naturally a 3D cube, the 3D/tensor-based methods are more effective to extract the joint spectral-spatial structure information than the vector/matrix-based methods.As will be shown in Section 3.3, the GANs with the original spectral features (i.e., Spec-GANs) provide much worse performance than the GANs with 3DBF features (i.e., 3DBF-GANs), which further highlights the significance of the 3DBF.On the other hand, GANs are utilized for semi-supervised classification of the HSI.The recent development of deep learning has opened up new opportunities for hyperspectral classification.GANs, which are newly proposed deep architectures for training deep generative models by a minimax game, have shown promising results in unsupervised/semi-supervised learning.Although the GANs have been successfully employed in various areas and demonstrated remarkable success, the application of GANs in semi-supervised hyperspectral classification has never been addressed in the literature to the best of our knowledge.Therefore, it is valuable for us to represent the first attempt to develop a semi-supervised hyperspectral classification framework based on GANs.In this section, we introduce the detailed procedure of the proposed semi-supervised classification method, elaborating on the spectral-spatial feature extraction based on 3DBF and semi-supervised classification of HSI by GANs.

Spectral-Spatial Features Extracted by 3D Bilateral Filter
The bilateral filter was originally introduced by [63] under the name "SUSAN".It was then rediscovered by [67] termed as "bilateral filter", which is now the widely used name in the literature.Over the past few years, the bilateral filter has emerged as a powerful tool for several applications, such as image denoising [64] and hyperspectral classification [36].The great success of the bilateral filter stems from several properties.It is a local, non-iterative and simple filter, which smooths images while preserving edges in terms of a nonlinear combination of the neighboring pixels.Although the bilateral filter has announced impressive results in hyperspectral classification, it is performed in each two-dimensional probability map, and thus ignoring the 3D nature of the HSI cube.
In this paper, we extend the bilateral filter to 3DBF for spectral-spatial feature extraction of the HSI volumetric data.Suppose the original HSI cube can be represented as I ∈ R m×n×b , where m, n and b indicate the number of rows, columns and spectral bands, respectively, the result I b f of the 3DBF, which replaces each pixel in the I by a weighted average of its neighbors, can be defined by with where p refers to the coordinate of the HSI cube I, i.e., p = (x, y, z), x = 1, . . ., m, y = 1, 2, . . ., n, z = 1, 2, . . ., b, q indicates the index of the neighborhoods centered at p, W b f denotes the normalizing term of the neighborhood pixels q, G σ s ( p − q ) = exp(− p − q 2 /2σ 2 s ) and G σ r (|I(p) − I(q)|) = exp(−|I(p) − I(q)| 2 /2σ 2 r ) are the Gaussian filters measuring the distance in the 3D image domain (i.e., the spectral-spatial domain S) and the distance on the intensity axis (i.e., the range domain R), respectively.
To speed up the implementation, we decompose the 3DBF into a convolution followed by two nonlinearities based on signal processing grounds.Note that the nonlinearity of the 3DBF (see Equation ( 1)) originates from the division by W b f and the dependency on the intensities by G σ r (|I(p) − I(q)|), we study each point separately and isolate them during computation.Multiplying both sides of Equation ( 1) by W b f , Equations ( 1) and ( 2) can be rewritten as We then define a function W whose value is 1 everywhere (W is a function whose value is 1 everywhere, i.e., W ((x, y, z)) = 1, x = 1, . . ., m, y = 1, 2, . . ., n, z = 1, 2, . . ., b.Therefore, the size of W is the same as that of the original HSI cube) to maintain the weighted mean property of the 3DBF and represent Equation (3) as The above-mentioned Equation ( 4) can be equivalently expressed as where R denotes the intensity interval, δ(ζ) is the Kronecker symbol with δ(ζ) = 1 if ζ = 0, and δ = 0 otherwise.Specifically, δ(ζ − I(q)) = 1 if and only if ζ = I(q).The sum in Equation ( 5) is over the product space S × R, on which we express the functions by lowercases.That means, g δ s ,δ r represents a Gaussian kernel given by Based on Equation ( 5), two functions i and w can be build on S × R by and Observed from the definitions of i and w in Equations ( 7) and ( 8), we have Let the input of g δ s ,δ r be (p − q, I(p) − ζ), the input of i and w be (q, ζ), Equation ( 5) becomes where "⊗" indicates the convolution operator.Therefore, the 3DBF can be modeled by where the functions w b f and i b f are defined as (w In hyperspectral analysis, the 3D image domain (i.e., the spectral-spatial domain S) is a xyz volume and the range domain R is a simple axis labelled ζ.As described in Equation ( 13), the 3DBF can be achieved by the following three steps:

•
Convolve wi and w with a Gaussian defined on xyzζ.In this step, wi and w are "blurred" into Moreover, the 3DBF can be accelerated by downsample and upsample without changing the major steps of the implementation.That is, we downsample (wi, w) to obtain (w The remaining steps are the same as the above-mentioned steps 2 and 3. To sum up, the schematic diagram of the 3DBF can be depicted in Figure 2, by which the original HSI cube I is filtered and the spectral-spatial feature cube I b f is obtained.It is worth underlining that the dimension of the 3DBF cube I b f is the same as that of the original HSI cube, i.e., I b f ∈ R m×n×b .As will be shown in Figures 9 and 10, the spectral and spatial profiles of the 3DBF smooth the original data while still preserving edges.

A pixel located at p=(x,y,z)
The original HSI cube I (wi,w)=(I(p),1) Interpolate i bf at (x,y,z) The result cube I bf of the 3DBF

Brief of Generative Adversarial Networks
GANs are newly proposed deep architectures based on adversarial nets to train the model in an adversarial fashion to generate data mimicking certain distributions.Unlike the other deep learning methods, a GAN is an architecture around two functions (see Figure 3), i.e., a generator G, which can map a sample from a random uniform distribution to the data distribution, and a discriminator D, which is trained to distinguish whether a sample belongs to the real data distribution.In GANs, the generator and discriminator are learned jointly based on game theory.The generator G and the discriminator D can be trained in an alternating manner.In each step, G produces a sample from the random noise z that may fool D, and D is then presented the real data samples as well as the samples generated by G to classify the samples as "real" or "fake".Subsequently, G is rewarded for producing samples that can "fool" D and D for correct classification.Both functions are updated and the iteration stops until a Nash equilibrium is achieved.In greater detail, let D(s) be the probability that s comes from the real data rather than the generator, G and D play a minimax game with the following value function min Much work has been carried out to improve the GAN since it was pioneered by Goodfellow et al. [65] in 2014.Two remarkable aspects can be highlighted: theory and application.On the one hand, several improved versions of GANs in aspects of stability of training, perceptual quality, etc., have been proposed in recent literature, including the well-known deep convolutional GAN (DC-GAN) [68], conditional GAN (C-GAN) [69], Laplacian pyramid GAN (LAP-GAN) [70], information-theoretic extension to the GAN (Info-GAN) [71], unrolled GAN [72] and Wasserstein GAN (W-GAN) [73].On the other hand, recent work has also shown that GANs can provide very successful results in image generation [74], image super resolution [75], image inpainting [76] and semi-supervised learning [77].

Generative Adversarial Networks for Classification
GANs, which can train deep generative models with a minimax game, have recently emerged as powerful tools for unsupervised and semi-supervised classification.Several unsupervised/ semi-supervised techniques motivated by the GANs have sprung up over the past few years to overcome the difficulties of labeling large amounts of training samples.For instance, DC-GAN is proposed in [68] to bridge the gap between the success of the CNN for supervised and unsupervised learning.Several constraints are evaluated to make the convolutional GANs stable to train, and the trained discriminators are applied for image classification tasks, resulting in competitive performance with other unsupervised methods.Info-GAN is proposed in [71] to learn disentangled representations in a completely unsupervised manner.As an information-theoretic extension to the GAN, the Info-GAN maximizes the mutual information between a small subset of the latent variables and the observation, and therefore, interpretable and disentangled representations can be learned.Categorical GAN (CatGAN) [77], which is a framework for robust unsupervised and semi-supervised learning, combines ANN classifiers with an adversarial generative model that regularizes a discriminatively trained classifier.By heuristically understanding the non-convergence problem, an improved semi-supervised learning method is proposed in [66], which can be regarded as a continuation and refinement of the effort in [77].Moreover, Premachandran and Yuille [78]  Note that different versions of GANs have different objective functions and procedures, it is hard to obtain a unified architecture for describing the unsupervised/semi-supervised techniques.In this section, we try to give a schematic illustration of the procedure for unsupervised/semi-supervised learning in Figure 4, which contains the main steps in most of the scenarios but not all of them.It is noteworthy that the logistic regression classifier based on the soft-max function is employed to discriminate different classes in Figure 4.That means, by applying the soft-max function, the class probabilities of s can be expressed as and the class label of s can be determined by In addition, despite remarkable success of GANs, their applications in semi-supervised classification of HSI are surprisingly unstudied to the best of our knowledge.Therefore, this study represents the first attempt to develop a semi-supervised classification framework for the HSI. .Classifiers like this usually have a cross-entropy objective function in supervised scenario.That means, a discriminative model can be trained by minimizing the objective function between observed labels and the model predictive distribution p model (c|s).However, the supervised learning usually needs enough labeled training samples to guarantee the representativeness and prevent the classifier from overfitting, especially for a deep discriminative model with huge parameter volume such as CNN.The strong demand for abundant training samples conflicts with the fact that the labels of the samples are extremely difficult and expensive to identify.At the same time, there are vast of unlabeled samples in the HSI.Therefore, we propose a GANs-based classification method [65,66] to simultaneously utilize both the limited labeled samples and the sufficient unlabeled samples in a semi-supervised fashion.
To establish a new semi-supervised hyperspectral classification framework based on GANs, we add the generated samples to the HSI dataset and denote them as the (C + 1)th class.The dimension of the classifier output is correspondingly increased from C to (C + 1).The probability when s comes from G can be represented as p model (c = C + 1|s), which is a substitution of 1 − D(s) in the objective function V(D, G) of the original GANs [65].Note that the unlabeled training samples belong to the former C classes, we can learn from those unlabeled samples to improve the classification performance by maximizing log p model (c ∈ 1, 2, . . ., C|s).
Without loss of generality, assuming half of the dataset consists of real data and half is the generated data, the loss function L of the classifier yields where L supervised represents the negative log probability of the label with the data is from the real HSI features, L unsupervised equals the standard GAN game-value function in case we substitute 19) According to the Output Distribution Matching (ODM) cost theory of [79], if we have exp[l j (s)] = f (s)p(c = j, s), ∀j < C + 1 and exp[l C+1 (s)] = f (s)p G (s) for some undetermined scaling function f (s), the unsupervised loss will be consistent with the supervised loss.As such, by combining L supervised and L unsupervised , we can get the total cross entropy loss L, whose optimal solution can be estimated by minimizing both loss functions jointly.
Moreover, to address the instability of the unsupervised optimization part related to the GANs, we adopt a strategy called feature matching to substitute the traditional way of training the generator G by requiring it to match the statistics characteristics of the real data.In greater detail, the generator G is trained to match the expected value of the output d(s) on an intermediate layer in the discriminator D. By optimizing an alternative objective function defined as , we obtain a fixed point where G matches the distribution of training data.Based on the above analysis, we show a visual illustration of the semi-supervised hyperspectral classification method by GANs in Figure 5.The network parameters of the generator G and the discriminator D in Figure 5 are trained by optimizing the loss function in Equation (17).The unlabeled data is taken as the true data s ∼ p data in Equation (19) to train both generator G and discriminator D.Moreover, the latent space of the generator G is chosen from the unlabeled data (To be exact, the latent space can also be chosen from the labeled data by ignoring the class labels), the noise follows the uniform distribution, and the output of the generator G is the fake data.By jointly minimizing the loss functions in Equation ( 17), the parameters of the generator G are updated to fool the discriminator D, and the fake examples are generated accordingly.The logistic regression classifier based on the soft-max function is adopted to perform the multi-class classification in the GANs.It is notable that the actual differences between the traditional GANs and the modified GANs used in this paper lie in threefold: (1) the objective functions are changed to make full use of both labeled and unlabeled samples; (2) the output layer of the discriminator is modified from binary classification to multi-class semi-supervised learning; (3) feature matching is adopted to improve the stability of the traditional GANs.

Experimental Section
In this section, we investigate the performance of the proposed method (abbreviated as 3DBF-GANs for simplicity) on three benchmark HSI datasets.A series of experiments are conducted to perform a comprehensive comparison with other state-of-the-art methods, including 2DBF [64], SVM [12], Laplacian SVM (LapSVM) [22,24] and CDL-MD-L [62].2DBF and 3DBF are feature extraction methods, SVM is a widely-used supervised classifier, while LapSVM, GANs and CDL-MD-L are classifiers based on semi-supervised learning.Moreover, the original spectral features are also considered as a baseline for comparison.

Dataset Description
In the experiments, three publicly available hyperspectral datasets (i.e., Indian Pines data, University of Pavia data and Salinas data) are employed as benchmark datasets.What follows are details of the three hyperspectral datasets.

Experimental Setup
In order to evaluate the performance of the proposed 3DBF-GANs method, we compare it with some other algorithms, i.e., 2DBF, SVM, LapSVM, and CDL-MD-L.The original spectral features (abbreviated as "Spec") are also considered in the experiments.Specifically, the "Spec", 2DBF and 3DBF are feature extraction methods, while SVM, LapSVM and CDL-MD-L are supervised/semi-supervised classifiers.The LapSVM, which is a graph-based semi-supervised learning method, introduces an additional manifold regularizer on the geometry of both unlabeled and labeled data in terms of the Graph Laplacian.It has been applied to hyperspectral classification and the results have demonstrated the advantage of this graph-based method in semi-supervised classification of the HSI.As to the GANs, the standard framework is used, except for adding a softmax classifier in the output of the discriminator and adopting feature matching to improve the stability of the original GANs.By combining the feature extraction and classification methods in pairs, 12 methods (i.e., Spec-SVM, Spec-LapSVM, Spec-GANs, Spec-CDL-MD-L, 2DBF-SVM, 2DBF-LapSVM, 2DBF-CDL-MD-L, 2DBF-GANs, 3DBF-SVM, 3DBF-LapSVM, 3DBF-CDL-MD-L and 3DBF-GANs) are obtained for comparison.Since the spectral-spatial information is used in the original CDL-MD-L, Spec-CDL-MD-L and 3DBF-CDL-MD-L denote the input of the CDL-MD-L is the original HSI and the dataset given by 3DBF, respectively.
In the experiments, a training/test sample is a single pixel, whose size is 1 × b.Each pixel can be taken as the feature of a certain class and classified by the discriminator of the GANs or other classifiers.Each pixel corresponds to a unique label.The whole cube contains many pixels and therefore, has lots of labels.All the HSI datasets are normalized between zero and one at the beginning of the experiments.
All the experiments are implemented on the normalized hyperspectral datasets, whose available data is randomly divided into two parts, i.e., about 60% for training and the rest for testing.In all the datasets, very limited labeled samples, i.e., 5 samples per class, are randomly selected from the training samples as labeled samples, and the remaining ones are used as unlabeled samples.The experiments are repeated ten times using random selection of training and test sets, and the average accuracies are reported.To assess the experimental results quantitatively, we compare the aforementioned methods by three popular indexes, i.e., overall accuracy (OA), average accuracy (AA) and kappa coefficient (κ).Moreover, the F-Measure of various methods is also compared.
For the parameter settings, since the number of labeled samples is limited, leave one out cross validation is adopted in this paper.The range of the filtering size σ s and blur degree σ r in the 2DBF are selected in the range of [1, 2, . . ., 9] and [0.1, 0.2, . . ., 0.5], respectively, whereas both σ s and σ r in the 3DBF are chosen from [5, 10, . . . , 50].In the SVM and LapSVM, radial basis function (RBF) kernels are adopted.The RBF parameter γ is obtained from the range [2 −2 , 2 −1 , . . ., 2 10 ] and the penalty term is set to 60. 4 spectral neighbors are adopted to calculate the Laplacian graph in the LapSVM.Three layers are used in the CDL-MD-L, whose window size and the number of hidden units are set to the same as [62].The generator in the GANs has two hidden layers, and the number of units is set to 500 and 300, respectively.In the discriminator, three hidden layers are adopted, and the number of units is set to 300, 200 and 150, respectively.Gaussian noise is added to the output of each layer of the discriminator.Moreover, the learning rate and training epoch are set to 0.001 and 100, respectively.

Experimental Results
To demonstrate the effectiveness of the 3DBF for spectral-spatial feature extraction, we compare the spectral profiles of the pixel (18,6) from the original Indian Pines data, and the features obtained by the 2DBF and the 3DBF in Figure 9.Moreover, the spatial scenes of the 4th, 22nd, 34th bands are compared in Figure 10.As can be seen, the profiles of 3DBF preserve the trend of the original data while provide smoother features in both spectral and spatial domains.The qualitative evaluations of various methods are shown in Tables 4-6, and the classification maps are also visually compared in Figures 11-13.Based on the above-mentioned experimental results, a few observations and discussions can be highlighted.It can be first seen that, the methods (i.e., Spec-SVM, 2DBF-SVM, and 3DBF-SVM) using only the limited labeled training samples provide worse classification performance than the semi-supervised methods that take the unlabeled training samples into consideration.This stresses yet again the importance of unlabeled samples for HSI classification.For instance, it is observed from Table 4 that the SVM leads to lower classification accuracies than other classifiers (i.e., LapSVM, CDL-MD-L and GANs).Taking the same original "Spec" features as inputs, the OA of SVM is 2.15%, 23.28% and 9.49% lower than those of the LapSVM, CDL-MD-L and GANs, respectively.Similar properties can also be found in Tables 5 and 6.The above-mentioned phenomena demonstrate the effectiveness of utilizing the abundant unlabeled samples for the HSI data.
Second, the "Spec"-based features provide higher classification errors than the 2DBF/3DBF-based features.As shown in Table 5, the OA, AA, κ and F-Measure of Spec-SVM are lower than those of the 2DBF-SVM and 3DBF-SVM.Similarly, the OA, AA, κ and F-Measure of Spec-LapSVM/ CDL-MD-L/GANs are also lower than 2DBF-LapSVM/CDL-MD-L/GANs and 3DBF-LapSVM/ CDL-MD-L/GANs.It is also clearly visible that more scattered noise is generated in Figure 12a than in Figure 12e,i.This is due to the fact that the "Spec" features based only on spectral characteristics, while 2DBF and 3DBF methods can effectively incorporate the spatial information.Since the CDL-MD-L can make use of both spectral and spatial information in the classification process, the classification accuracies of Spec-CDL-MD-L are much higher than those of the Spec-SVM, Spec-LapSVM and Spec-GANs.As shown in Table 5, the OA of Spec-CDL-MD-L is at least 8% higher than other classifiers.Moreover, with the same classifiers, the 3DBF performs much better than 2DBF.For instance, the OA

Statistical Significance Analysis of the Results
The statistical significance of the classification differences between various methods is assessed by the McNemar's test, which is based upon the standardized normal test statistic where f ij refers to the number of samples classified correctly by the classifier i but incorrectly by classifier j and Z indicates the pairwise statistical significance of the classification difference between the ith and jth classifiers.In case the test statistic |Z| > 1.96, the difference of classification accuracies between the ith and jth classifiers is regarded as statistical significant at the 5% level of significance.For comparison purpose, the results of the McNemar's test on the 3DBF-GANs and other methods are listed in Table 7, which shows that the proposed 3DBF-GANs is superior (Z > 1.96) to Spec-SVM, Spec-LapSVM, Spec-GANs, Spec-CDL-MD-L, 2DBF-SVM, 2DBF-LapSVM, 2DBF-CDL-MD-L, 2DBF-GANs, 3DBF-SVM, 3DBF-LapSVM, or comparable (|Z| < 1.96 in the Indian Pines data) with 3DBF-CDL-MD-L.According to the McNemar's test, both the 3DBF and GANs are helpful for improving the classification performance since the test statistic is statistical significant, which further confirms the effectiveness of the proposed method.

Sensitivity Analysis of the Parameters
There are four important parameters in the proposed 3DBF-GANs method: the filtering size σ s , the blur degree σ r , the training epoch and the learning rate.The influence of these parameters on the classification performance (e.g., OA) is analyzed in Figures 14 and 15.In Figure 14, the effect of σ s and σ r is plotted with the training epoch is fixed to 100.It can be seen from Figure 14 that, if the filtering size σ s and the blur degree σ r are too small or too large, the OA of 3DBF-GANs is not satisfactory.This is due to the fact that very little spatial information is considered in case σ s and σ r are too small, while too large σ s and σ r will cause oversmooth.Furthermore, the influence of the training epoch is depicted in Figure 15, from which one can observe that, the OA rapidly increases at first, then slowly increases and finally trends to a certain stable value with the increasing of training epoch.The influence of the learning rate is shown in Figure 16, from which one can find that the OA with a large learning rate (e.g., 0.1) is much lower than that with a smaller learning rate.The reason is that too large learning rate can cause the loss function to fluctuate around the minimum, or even worse, to diverge.Note that too small (e.g., 0.00001) will lead to slow convergence, it is better to set the learning rate to 0.001 or 0.0001.In analogy to the other comparison methods, the appropriate parameters are of importance to the classification performance our proposed 3DBF-GANs method.It is highlighted from the above methods when the same number of labeled training samples is chosen.Although the performance of different methods changes as the number of training samples changes, the 3DBF-GANs provides higher classification accuracies than other methods.In addition, it should be pointed out here that the number of synthetic examples generated by the 3DBF-GANs method equals to the total number of training samples (including labeled samples and unlabeled samples), and therefore, is not varied with the number of labeled samples.

Conclusions
In this paper, we have proposed a semi-supervised learning method based on 3DBF and GANs for hyperspectral classification.The proposed 3DBF-GANs method relies on two aspects.The first is the extraction of spectral-spatial features by the 3DBF.The main advantage of the 3DBF is that it is able to smooth the hyperspectral cube while preserving the edges by naturally treating the HSI as a volumetric dataset.The second one is the semi-supervised classification by the GANs.The GANs can make full use of both the limited labeled samples and the abundance of unlabeled samples for significant semi-supervised learning.Compared to the shallow learning and non-adversarial networks, the GANs are one of the effective deep learning methods which can take advantage of the discriminative models to train the generative network in an adversarial fashion.The proposed method has been tested on AVIRIS and ROSIS datasets with very limited labeled samples, and the comparison with other state-of-the-art methods (i.e., 2DBF, SVM, LapSVM, and CDL-MD-L) has confirmed the effectiveness of the proposed method.Quantitatively, the OA of 3DBF-GANs improves about 1% to 25% compared to other state-of-the-art methods.Note that the GANs have a complex structure, a future research topic is to investigate how to determine the network parameters in a more effective and automatic way.Introducing the graph-based semi-supervised learning method [26] from other areas to hyperspectral classification is also a probable future research direction.

Figure 1 .
Figure 1.Flowchart of the proposed method.
learns a deep network by generative adversarial training.Features learned by adversarial training is fused with a traditional unsupervised classification approach, i.e., k-means clustering, and the combination produces better results than direct prediction.In situation of semi-supervised classification, the adversarial training has the potential to outperform supervised learning.

Figure 4 .
Figure 4. Schematic illustration of the procedure for unsupervised/semi-supervised learning based on GANs.

Figure 5 .
Figure 5.A visual illustration of the semi-supervised hyperspectral classification method by GANs.

Figure 6 .
Figure 6.Indian Pines data.(a) Three-band false color composite and (b) ground truth data with 16 classes.

Figure 7 .
Figure 7. University of Pavia data.(a) Three-band false color composite and (b) ground truth data with 9 classes.

Figure 8 .
Figure 8. Salinas data.(a) Three-band false color composite and (b) ground truth data with 16 classes.

Figure 9 .
Figure9.The spectral profiles of the pixel(18,6) from the original Indian Pines data, the 2DBF and the 3DBF.

Figure 10 .
Figure 10.Spatial scenes of the 4th, 22nd, 34th bands.(a,d,g) are chosen from the original Indian Pines data, (b,e,h) are obtained by the 2DBF, and (c,f,i) are obtained by the 3DBF.

Figure 13 .
Figure 13.Classification maps of the Salinas data with 5 samples per class.

Figure 17 .
Figure 17.The impact of the number of labeled training samples per class on the OA in (a) Indian Pines data, (b) University of Pavia data and (c) Salinas data.
Figure 6depicts the color composite of the image as well as the ground truth map.There are 16 classes of interest and the number of samples in each class is displayed in Table1, whose background color denotes different classes of land-covers.Since the number of samples is unbalanced and the spatial resolution is relatively low, it poses a big challenging to the classification task.
Pines data: the first dataset was captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over the agricultural Indian Pines test site in the Northwestern Indiana, USA, on 12 June 1992.The original image contains 224 spectral bands.After removing 4 bands full of zero and 20 bands affected by noise and water-vapor absorption, 200 bands are left for experiments.It consists of 145 × 145 pixels with a spatial resolution of 20 m per pixel, and the spectral coverage ranging from 0.4 to 2.5 µm.varioustypes of land-covers.The color composite image together with the ground truth data are shown in Figure7.The detailed number of samples in each class is listed in Table2, whose background color also corresponds to the color in Figure7.3.Salinas data: the third dataset was collected by the AVIRIS sensor over the Salinas Valley, Southern California, USA, on 8 October 1998.The original dataset contains 224 spectral bands

Table 1 .
Number of samples (NoS) used in the Indian Pines data.

Table 2 .
NoS used in the University of Pavia data.

Table 3 .
NoS used in the Salinas data.

Table 4 .
Classification accuracy (%) of various methods for the Indian Pines data with 5 labeled training samples per class, bold values indicate the best result for a row.
a Lines 3 to 18 are the F-Measure per class.

Table 5 .
Classification accuracy (%) of various methods for the University of Pavia data with 5 labeled training samples per class, bold values indicate the best result for a row.
a Lines 3 to 11 are the F-Measure per class.

Table 6 .
Classification accuracy (%) of various methods for the Salinas data with 5 labeled training samples per class, bold values indicate the best result for a row.
a Lines 3 to 18 are the F-Measure per class.