Article

Graph-Based Deep Multitask Few-Shot Learning for Hyperspectral Image Classification

1 School of Electronics and Information, Northwestern Polytechnical University, 127 West Youyi Road, Xi’an 710072, China
2 School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(9), 2246; https://doi.org/10.3390/rs14092246
Submission received: 22 March 2022 / Revised: 2 May 2022 / Accepted: 4 May 2022 / Published: 7 May 2022

Abstract:
Although the deep neural network (DNN) has shown a powerful ability in hyperspectral image (HSI) classification, its learning requires a large number of labeled training samples; otherwise, it is prone to over-fitting and has a poor classification performance. However, this requirement is impractical for HSIs due to the difficulty in obtaining class labels. To make DNNs suitable for HSI classification with few labeled samples, we propose a graph-based deep multitask few-shot learning (GDMFSL) framework that learns the intrinsic relationships among all samples (labeled and unlabeled) of HSIs with the assistance of graph information to alleviate the over-fitting caused by few labeled training samples. Firstly, a semi-supervised graph is constructed to generate graph information. Secondly, a deep multitask network (DMN) is designed, which contains two subnetworks (tasks): a classifier subnetwork for learning class information from labeled samples and a Siamese subnetwork for learning sample relationships from the semi-supervised graph. To effectively learn graph information, a loss function suitable for the Siamese subnetwork is designed that shortens (and expands) the distance between the target sample and its nearest (and farthest) neighbors. Finally, since the number of training samples of the two subnetworks is severely imbalanced, a multitask few-shot learning strategy is designed to make two subnetworks converge simultaneously. Experimental results on the Indian Pines, University of Pavia and Salinas datasets demonstrate that GDMFSL achieves a better classification performance relative to existing competitors in few-shot settings. In particular, when only five labels per class are involved in training, the classification accuracy of GDMFSL on the three datasets reaches 87.58%, 86.42% and 98.85%, respectively.

1. Introduction

Combined with spectroscopic technology, hyperspectral imaging technology is used to detect the two-dimensional geometric space and one-dimensional spectral information of the target and obtain continuous and narrow band image data with a high spectral resolution [1]. With the development of hyperspectral imaging, hyperspectral images (HSIs) not only contain abundant spectral information reflecting the unique physical properties of the object material but also provide a fine spatial resolution for the ground features [2]. On account of these advantages, HSIs have been widely used in many fields, such as agriculture [3], transportation [4], medicine [5], earth observation [6] and so on.
Among these various HSI-related applications, one of the most essential tasks is HSI classification, which aims to assign a predefined class label to each pixel [7], and has received a substantial amount of attention [8,9,10,11]. However, due to the computational complexity of high-dimensional data and the Hughes phenomenon (the higher the dimension, the worse the classification) caused by the limited labeled samples in HSIs, traditional classification techniques perform poorly [12]. To achieve a better classification of HSIs, one conventional solution is to utilize dimensionality reduction to obtain more discriminative features for facilitating the classification of classifiers (e.g., KNN and SVM). Many classic dimensionality reduction methods have been applied to HSIs, for example, principal component analysis [13], linear discriminant analysis [14], independent component analysis [15], low-rank methods [16] and so on. Nevertheless, the classification performance is still unsatisfactory because these methods are based on the statistical properties of the HSI and neglect its intrinsic geometric structures [17]. To reveal the intrinsic structures of data, manifold learning methods were designed to discover the geometric properties of HSIs, for instance, isometric mapping [18], locally linear embedding [19] and Laplacian eigenmaps [20].
In fact, a unified framework, namely graph learning, can represent and redefine the above dimensionality reduction methods with different similarity matrices and constraint matrices; it can reveal the intrinsic similarity relationships of data and has been widely applied to HSIs [17]. Recently, some advanced spectral–spatial graph learning methods have been proposed to represent the complex intrinsic structures in HSIs. Zhou et al. [21] developed a spatial and spectral regularized local discriminant embedding method for the dimensionality reduction of HSIs that described the local similarity information by integrating a spectral-domain regularized local preserving scatter matrix and a spatial-domain local pixel neighborhood preserving scatter matrix. Huang et al. [22] proposed an unsupervised spatial–spectral manifold reconstruction preserving embedding method that explored the spatial relationship between each point and its neighbors to adjust the reconstruction weights and improve the efficiency of manifold reconstruction. Huang et al. [23] put forward a spatial–spectral local discriminant projection method in which two weighted scatter matrices were designed to maintain the neighborhood structure in the spatial domain and two reconstruction graphs were constructed to discover the local discriminant relationship in the spectral domain. These advanced unsupervised or semi-supervised graph learning methods obtain more discriminative features by exploring and maintaining the intrinsic relationships among samples, and they indeed improve the classification performance of classifiers. However, the disadvantage of this solution is that feature extraction and classification are separated, so the feature extraction process cannot learn a data distribution suited to the classifier.
Another solution to the HSI classification problem is deep learning, which has a powerful ability to learn discriminative features because of its deep structure and automatic learning of patterns from data [24]. Different from the above classification methods, in deep learning, the learning of feature extraction and classification is synchronous, and the extracted features suit the data distribution of the classifier, which leads to a better classification performance. Recently, it has also shown a promising performance in HSI classification [25]. To gain a better spatial description of an object, convolutional neural networks (CNNs) have been widely applied to HSIs [26,27,28,29]. Boggavarapu et al. [30] proposed a robust classification framework for HSIs by training convolutional neural networks with Gabor embedded patches. Paoletti et al. [31] presented a 3-D CNN architecture for HSI classification that used both spectral and spatial information. Zhong et al. [28] designed an end-to-end spectral–spatial residual network that takes raw 3-D cubes as input data without feature engineering for HSI classification. Although these deep-learning-based methods have achieved a promising classification performance, their learning process requires sufficient labeled samples as training data, which are difficult to obtain for HSIs. In practice, the collection of labeled HSI samples is generally laborious, expensive and time-consuming and requires field exploration and verification by experts, so the available labeled samples are always limited, insufficient or even deficient [32]. Unfortunately, conventional deep learning models with limited training samples always face a serious over-fitting issue in HSI classification [33]. Hence, it remains a challenge to apply deep learning, which requires sufficient training samples, to HSIs with only limited and few labeled samples.
To address this problem, several few-shot learning methods have been proposed in recent years to deal with HSI classification with few labeled samples. Few-shot learning aims to study the differences between samples instead of directly learning what a sample is, which distinguishes it from most other deep learning methods [34]. As far as we know, there are three types of networks for few-shot learning on HSIs: the prototypical network, the relation network and the Siamese network. The prototypical network learns a metric space in which classification can be performed by computing distances to prototype representations of each class [35]. In [36], Tang et al. proposed a spatial–spectral prototypical network for HSIs that first implemented a local-pattern-coding algorithm to generate the spatial–spectral vectors. The relation network builds on the prototypical network and learns a deep distance metric that can precisely describe the differences between samples [37]. Gao et al. [38] designed a new deep classification model based on a relation network and trained it with the idea of meta-learning. The Siamese network is composed of two parallel subnetworks with the same structure and shared parameters, in which the input is a sample pair and Euclidean distances are used to measure the similarity of an embedding pair. In [39], a supervised deep feature extraction method based on a Siamese convolutional neural network was proposed to improve the performance of HSI classification, in which an additional classifier was required for classification. To sum up, these networks can all be summarized as metric-based models, but they usually measure the differences only among labeled samples and ignore unlabeled samples. In practice, labeled samples in HSIs are so few and limited that a neural network can learn only limited information from them, while the intrinsic structure of an HSI is complex and the network needs to learn a variety of information. Hence, deep-learning-based few-shot or even one-shot HSI classification is still a challenge.
Although labeled samples in HSIs are few and limited, attainable unlabeled samples are abundant and plentiful. Accordingly, the information implicit in unlabeled samples and the relationships between unlabeled and labeled samples are both worthwhile and necessary to explore. Nevertheless, without a specific constraint, a neural network cannot learn information beneficial to classification from unlabeled samples. Fortunately, graph learning, as described earlier, is quite well-suited to this problem and can effectively reveal the intrinsic relationships among samples. Inspired by the idea of graph learning, we propose a novel graph-based deep multitask few-shot learning (GDMFSL) framework for HSI classification with few labeled samples, which can learn the intrinsic relationships among all samples with the assistance of graph information. In addition, another difference between GDMFSL and the aforementioned few-shot learning methods is that their metric functions act on the embedding feature layer, whereas GDMFSL directly constrains the output layer of the classifier. This makes the graph information act on the classification results more effectively.
The main contributions of this paper can be summarized as follows.
  • In order to make the deep learning method suitable for HSI classification with only few labeled samples, we propose a novel graph-based deep multitask few-shot learning (GDMFSL) framework that integrates graph information into the neural network. GDMFSL learns information not only from labeled samples but also from unlabeled samples, and even obtains the relationship between labeled samples and unlabeled samples, which can not only alleviate the over-fitting problem caused by limited training samples but also improve the classification performance.
  • In order to learn both the class information from labeled samples and the graph information, a deep multitask network (DMN) is designed, which contains two subnetworks (tasks): a Siamese subnetwork and a classifier subnetwork. The task of the Siamese subnetwork is to learn the intrinsic relationships among all samples with the assistance of graph information, whereas the classifier subnetwork learns the class information from labeled samples. Accordingly, unlike the networks described earlier for few-shot learning, DMN not only learns what the sample is but also the differences among all samples.
  • In order to effectively learn graph information, a loss function suitable for the Siamese subnetwork learning and training is designed, which shortens the distance between the target sample and its nearest (or in-class) neighbors and expands the distance between the target sample and its farthest (or inter-class) neighbors. Experimental results show that the designed loss function can converge well, effectively alleviate the over-fitting problem of the classifier subnetwork caused by the few labeled samples and improve the classification performance.
  • Due to the small number of labeled samples but large number of unlabeled samples in HSIs, the proportion between the number of training samples for the classifier subnetwork and that for the Siamese subnetwork is seriously unbalanced, and so the learning process of DMN is unstable. In order to balance the learning and training of two tasks in DMN, a multitask few-shot learning strategy is designed to make the two tasks converge simultaneously.
This paper is organized as follows. In Section 2, the proposed graph-based deep multitask few-shot learning framework is described in detail. Section 3 presents the experimental results on three datasets that demonstrate the superiority of the proposed GDMFSL. A conclusion is presented in Section 4.

2. Methodology

2.1. The Proposed Graph-Based Deep Multitask Few-Shot Learning Framework

In this paper, we study HSIs with few labeled samples and predict the classes of unlabeled samples. We represent a pixel (sample) of an HSI as a vector $x_i \in \mathbb{R}^D$, where $D$ is the number of spectral bands. Suppose that an HSI dataset $X$ has $m$ samples, of which only $n$ ($n \ll m$) samples are labeled and $m - n$ samples are unlabeled. The $m$ samples are denoted as $X = \{x_1, x_2, \ldots, x_m\}$ and the $n$ labeled samples are represented as $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $y_i$ is the class label of $x_i$. For ease of calculation, the values of all samples are mapped to the range 0∼1 before learning.
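As a small illustration of the 0∼1 mapping mentioned above, the following is a minimal sketch. The text does not specify whether scaling is global or per band, so this sketch scales globally; the array shape is illustrative.

```python
import numpy as np

# Minimal sketch of mapping all sample values into the 0-1 range before
# learning. Global min-max scaling is an assumption of this sketch.
hsi = np.random.rand(145, 145, 200) * 8000.0          # height x width x bands
hsi01 = (hsi - hsi.min()) / (hsi.max() - hsi.min())   # now in [0, 1]
assert 0.0 <= hsi01.min() and hsi01.max() <= 1.0
```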
HSI classification predicts the classes of unlabeled samples according to the class labels of labeled samples. In general, deep-learning-based classification methods aim to learn a mapping between the training samples and their labels under the supervision of enough labeled samples. However, in the case of few and limited labeled samples in HSIs, a conventional deep neural network (DNN) will fall into over-fitting, resulting in poor classification results. In addition, the information obtained from only a few labeled samples is not enough to support the classification of a mass of HSI samples with a complex intrinsic structure. To overcome this difficulty, this study tries to guide the DNN to gain information conducive to classification from the plentiful unlabeled samples that are easily acquired. However, without additional constraints on the DNN, learning on unlabeled samples is often chaotic, so this idea is challenging to take forward. In this paper, inspired by graph learning, the proposed graph-based deep multitask few-shot learning framework provides a solution.
Graph learning is an effective technique to reveal the intrinsic similarity relationships among samples, which reflect the homogeneity of data. It has been widely applied to HSIs to reduce data redundancy and dimensionality. In graph learning, the graph is used to reflect the relationship between two samples, which can represent some of the statistical or geometrical properties of the data [17]. The relation information of unlabeled samples can also be captured and embodied in the graph. Thereupon, the graph should be a good auxiliary tool for assisting the DNN in learning information from unlabeled samples.
In this paper, we study the HSI with few labels. Therefore, rather than covering only unlabeled samples, the graph should reflect the relationships among all samples, labeled and unlabeled. In other words, the graph should reflect the relationships not only within the unlabeled samples and within the labeled samples but also between labeled and unlabeled samples. This is key to predicting the classes of unlabeled samples. As a result, a semi-supervised graph is required.
Based on the semi-supervised graph and labeled samples, the DNN has two tasks to learn, namely the class attributes of samples and the relationship among samples. The two tasks are different: one is to learn what the sample is, and the other is to learn the differences among the samples. In order to simultaneously learn the two tasks and to make them promote each other, we designed a deep multitask network.
Based on the above, a graph-based deep multitask few-shot learning (GDMFSL) framework is proposed to deal with HSI classification with few labels, as shown in Figure 1. The first step of GDMFSL is to construct a semi-supervised graph on the basis of all samples, both labeled and unlabeled; meanwhile, graph information is generated in preparation for the deep multitask network. The second step is for the deep multitask network to learn and train under the supervision of few labels and the graph information, where the input contains all samples. Finally, unlabeled samples are fed into the deep multitask network to predict their classes.

2.2. Construction of Semi-Supervised Graph

A graph $G$ can be denoted as $G = (X, E, W)$, which is an undirected graph, where $X$ denotes the vertexes, $E$ denotes the edges and $W$ represents the weight matrix of the edges. To construct a graph, neighbors are connected by edges and a weight is given to the corresponding edges [17]. If vertexes $i$ and $j$ are similar, we put an edge between them in $G$ and define a weight $W_{ij}$ for the edge.
The key to constructing a graph is how to effectively calculate the similarity between samples. For this purpose, the spectral–locational–spatial distance (SLSD) [32] method was employed, which combines spectral, locational and spatial information to excavate the more realistic relationships among samples as much as possible. SLSD not only extracts local spatial neighborhood information but also explores global spatial relations in HSI-based location information. Experimental results in [32] show that neighbor samples obtained by SLSD are more likely to fall into the same class as target samples.
Figure 2 shows the construction of a semi-supervised graph, which is essentially adding the information of the few labeled samples to an unsupervised graph. In the following, we go through the process of constructing a semi-supervised graph in detail. In SLSD, the location information is one of the attributes of pixels. For an HSI dataset $X = \{x_1, x_2, \ldots, x_m\} \in \mathbb{R}^{D \times m}$ with $m$ samples, each sample $x_i \in \mathbb{R}^{D \times 1}$ has $D$ spectral bands. Its location information can be denoted as $C = \{c_1, c_2, \ldots, c_m\} \in \mathbb{R}^{2 \times m}$, where $c_i = [p_i, q_i]^T$ is the coordinate of the pixel $x_i$. To fuse the spectral and locational information of pixels in HSIs, a weighted spectral-locational dataset $X_C = \{x_{C1}, x_{C2}, \ldots, x_{Cm}\}$ is constructed as follows:

$$X_C = \begin{bmatrix} \beta C \\ (1 - \beta) X \end{bmatrix} = \begin{bmatrix} \beta c_1, \ldots, \beta c_m \\ (1 - \beta) x_1, \ldots, (1 - \beta) x_m \end{bmatrix}, \tag{1}$$
where $\beta$ is a spectral-locational trade-off parameter. The local neighborhood space of $x_{Ci}$ is $\Omega(x_{Ci})$ in an $s \times s$ spatial window, which contains $s^2$ samples indexed by $r \in \{1, \ldots, s^2\}$. The SLSD of samples $x_i$ and $x_j$ is defined as

$$d_{\mathrm{SLSD}}(x_i, x_j) = d(\Omega(x_{Ci}), x_{Cj}) = \frac{\sum_{r=1}^{s^2} t_{ir} \left\| x_{Cj} - x_{Ci}^r \right\|}{\sum_{r=1}^{s^2} t_{ir}}, \quad x_{Ci}^r \in \Omega(x_{Ci}), \tag{2}$$

where $t_{ir}$ is calculated by $t_{ir} = \exp\left(-\gamma \left\| x_{Ci} - x_{Ci}^r \right\|\right)$, $x_{Ci}^r \in \Omega(x_{Ci})$. $\gamma$ is a constant that was empirically set to 0.2 in the experiments, and $x_{Ci}^r$ is a pixel in $\Omega(x_{Ci})$ surrounding $x_{Ci}$.
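To make Equations (1) and (2) concrete, the following is a hedged NumPy sketch, assuming the image is given as an (H, W, D) array indexed by pixel coordinates. Function and variable names are ours; gamma = 0.2 follows the text, while beta and s are per-dataset settings (see Section 3.2).

```python
import numpy as np

def slsd(img, ci, cj, beta=0.7, s=5, gamma=0.2):
    """SLSD between the pixel at ci = (p_i, q_i) and the pixel at cj."""
    H, W, _ = img.shape

    def xc(c):  # weighted spectral-locational vector, Equation (1)
        return np.concatenate([beta * np.asarray(c, float),
                               (1.0 - beta) * img[c[0], c[1]]])

    xci, xcj = xc(ci), xc(cj)
    half = s // 2
    num = den = 0.0
    # Sum over the s x s neighborhood Omega(x_Ci) around pixel i, Equation (2).
    for dp in range(-half, half + 1):
        for dq in range(-half, half + 1):
            p, q = ci[0] + dp, ci[1] + dq
            if 0 <= p < H and 0 <= q < W:
                xcr = xc((p, q))                                # x_Ci^r
                t = np.exp(-gamma * np.linalg.norm(xci - xcr))  # t_ir
                num += t * np.linalg.norm(xcj - xcr)
                den += t
    return num / den

img = np.random.rand(16, 16, 8)   # toy 16 x 16 image with 8 bands
print(slsd(img, (5, 5), (9, 9)))
```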
Although SLSD is effective at revealing relationships between samples, it is still an estimated and imprecise measurement. For an HSI dataset with $n$ labeled samples, $d_{\mathrm{SLSD}}(x_i, x_j)$ of labeled samples $x_i$ and $x_j$ should be updated. In actual calculations, any $d_{\mathrm{SLSD}}(x_i, x_j)$ is less than 1. In that way, in terms of the $n$ labeled samples $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, $d_{\mathrm{SLSD}}(x_i, x_j)$ is updated as follows:

$$d_{\mathrm{SLSD}}(x_i, x_j) = \begin{cases} 0, & \text{if } y_i = y_j, \\ 1, & \text{if } y_i \neq y_j, \\ d_{\mathrm{SLSD}}(x_i, x_j), & \text{if } y_i \text{ or } y_j \text{ is } \phi, \end{cases} \tag{3}$$

where, if $x_i$ and $x_j$ have the same class label, their SLSD is set to 0; if they have different class labels, their SLSD is set to 1; and if $x_i$ or $x_j$ is unlabeled (denoted $\phi$), its SLSD is not updated. In this manner, the updated $d_{\mathrm{SLSD}}$ contains the information of the $n$ labels.
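A small sketch of the update in Equation (3) follows: distances between labeled pairs are clamped to 0 (same class) or 1 (different classes), while pairs involving an unlabeled sample keep their SLSD value. Using -1 to mark "unlabeled" is an assumption of this sketch.

```python
import numpy as np

def update_dslsd(d, labels):
    """Apply Equation (3) to a precomputed SLSD matrix d (values < 1)."""
    d = d.copy()
    labeled = np.flatnonzero(labels >= 0)
    for i in labeled:
        for j in labeled:
            d[i, j] = 0.0 if labels[i] == labels[j] else 1.0
    return d

d = np.random.rand(6, 6) * 0.9            # raw SLSD values, all below 1
labels = np.array([0, 0, 1, -1, -1, -1])  # two labeled classes, rest unlabeled
print(update_dslsd(d, labels))
```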
In a graph, a vertex and its neighbors are connected by edges. In this paper, we need to construct two graphs: $G_w = (X, E_w, W_w)$ based on the nearest neighbors and $G_b = (X, E_b, W_b)$ based on the farthest neighbors. Both can be constructed on the basis of SLSD. For $G_w = (X, E_w, W_w)$, the $k_1$ nearest neighbors are found as

$$N_{k_1}(x_i) = \min_{x_j} d_{\mathrm{SLSD}}(x_i, x_j). \tag{4}$$

Since $d_{\mathrm{SLSD}}$ was updated based on labels, the $k_1$ nearest neighbors can be obtained from the samples with the smallest $d_{\mathrm{SLSD}}$. Then, the weight matrix $W_w$ is formulated as

$$W_{wij} = \begin{cases} \exp\left(-\dfrac{d_{\mathrm{SLSD}}(x_i, x_j)^2}{2 t_i^2}\right), & x_j \in N_{k_1}(x_i), \\ 0, & x_j \notin N_{k_1}(x_i), \end{cases} \tag{5}$$

in which $t_i = \frac{1}{k_1} \sum_{x_j \in N_{k_1}(x_i)} d_{\mathrm{SLSD}}(x_i, x_j)$. For $G_b = (X, E_b, W_b)$, the $k_2$ farthest neighbors are found as

$$N_{k_2}(x_i) = \begin{cases} \max_{x_j} d_{\mathrm{SLSD}}(x_i, x_j), & \text{if } y_i = \phi, \\ \min_{x_j} d_{\mathrm{SLSD}}(x_i, x_j), & \text{if } y_i \neq y_j. \end{cases} \tag{6}$$

From that, the $k_2$ farthest neighbors of unlabeled samples are obtained from the samples with the largest $d_{\mathrm{SLSD}}$, whereas those of labeled samples are obtained from the samples with different class labels and the smallest $d_{\mathrm{SLSD}}$. The weight matrix $W_b$ is formulated as

$$W_{bij} = \begin{cases} 1, & x_j \in N_{k_2}(x_i), \\ 0, & x_j \notin N_{k_2}(x_i). \end{cases} \tag{7}$$

In fact, since $k_1 \ll m$ and $k_2 \ll m$, $W_w$ and $W_b$ are sparse matrices.
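The following is a hedged sketch of Equations (4)-(7): it builds the nearest-neighbor heat-kernel weights $W_w$ and the farthest-neighbor 0/1 weights $W_b$ as sparse matrices from a precomputed, label-updated SLSD matrix. For brevity, only the unlabeled branch of Equation (6) is implemented; the labeled branch and all names here are our assumptions.

```python
import numpy as np
from scipy import sparse

def build_graphs(d, k1=10, k2=10):
    """Build W_w (Equation (5)) and W_b (Equation (7)) from SLSD matrix d."""
    m = d.shape[0]
    Ww = sparse.lil_matrix((m, m))
    Wb = sparse.lil_matrix((m, m))
    for i in range(m):
        order = np.argsort(d[i])
        near = order[1:k1 + 1]        # k1 smallest distances (skip self)
        far = order[-k2:]             # k2 largest distances (unlabeled branch)
        ti = d[i, near].mean()        # t_i from Equation (5)
        for j, w in zip(near, np.exp(-d[i, near] ** 2 / (2 * ti ** 2 + 1e-12))):
            Ww[i, j] = w              # heat-kernel weight for nearest neighbors
        for j in far:
            Wb[i, j] = 1.0            # binary weight for farthest neighbors
    return Ww.tocsr(), Wb.tocsr()     # CSR: efficient sparse storage

d = np.random.rand(100, 100)
d = (d + d.T) / 2
np.fill_diagonal(d, 0)
Ww, Wb = build_graphs(d)
print(Ww.nnz, Wb.nnz)                 # both matrices are sparse, as noted above
```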
$G_w$ and $G_b$ involve different sample relationships. $G_w$ reflects the relationships between the target sample and its nearest neighbors, which have a high probability of belonging to the same class as the target sample, whereas $G_b$ reflects the relationships between the target sample and its farthest neighbors, which most likely belong to different classes from the target sample.
Figure 3 illustrates the pipeline of the proposed deep multitask network. The training data of the DMN contain both labeled and unlabeled samples, so the proposed DMN can be regarded as a semi-supervised network. The DMN includes a Siamese subnetwork and a classifier subnetwork, which have different tasks and training data. The training data of the classifier subnetwork must be labeled samples, making it a conventional supervised network. As the name implies, the task of the classifier subnetwork is classification: learning the classes of labeled samples to predict unlabeled samples, which, in essence, is learning what samples are. Nevertheless, due to the few and limited labels in HSIs, a conventional classification network often suffers from over-fitting and a poor classification performance. In the proposed DMN, the Siamese subnetwork is designed to address this problem; its task is to learn the sample relationships from $G_w$ and $G_b$ to promote the learning and training of the classifier subnetwork. It can be seen from Figure 3 that the two subnetworks have the same architecture and share parameters, which is the hub through which they communicate and complement each other. The training data of the Siamese subnetwork are all samples, both labeled and unlabeled. In addition, the training of the Siamese subnetwork also requires the information generated by the semi-supervised graph, and the value of that information is reflected here. In fact, our designed Siamese subnetwork is essentially an unsupervised network and can still be trained without labels.

2.3. Network Architecture and Loss Function of Deep Multitask Network

Figure 4 shows the generation process of the training data for the DMN. For an HSI dataset $X = \{x_1, x_2, \ldots, x_m\} \in \mathbb{R}^{D \times m}$ with $m$ samples (labeled and unlabeled) and $n$ labels ($n \ll m$), $m$ 3-D cube samples $\{x_1^S, x_2^S, \ldots, x_m^S\} \in \mathbb{R}^{D \times s \times s \times m}$ with the spatial neighborhood are first generated. Since the training data of the Siamese subnetwork and the classifier subnetwork are different, two training sets need to be established. The classifier subnetwork only trains on the labeled samples, so its training data contain $n$ 3-D cube samples $\{x_1^{t1}, x_2^{t1}, \ldots, x_n^{t1}\} \in \mathbb{R}^{D \times s \times s \times n}$ with labels $\{y_1, y_2, \ldots, y_n\}$, as shown in Training Set 1 $X^{t1}$ of Figure 4. In practice, each of its inputs is a 3-D cube sample $x_i^{t1} \in \mathbb{R}^{D \times s \times s}$. Since the Siamese subnetwork is to learn the sample relationships in $G_w$ and $G_b$, in addition to the target sample, the $k_1$ nearest neighbors in $G_w$ and the $k_2$ farthest neighbors in $G_b$ also need to be input into the network during training. Training Set 2 $X^{t2}$ of Figure 4 is the training data of the Siamese subnetwork, which includes $m$ training samples $\{x_1^{t2}, x_2^{t2}, \ldots, x_m^{t2}\}$, where a training sample $x_i^{t2} = \{x_i^S, N_{k_1}(x_i^S), N_{k_2}(x_i^S)\} \in \mathbb{R}^{D \times s \times s \times (1 + k_1 + k_2)}$ contains one target sample $x_i^S$, its $k_1$ nearest neighbors $N_{k_1}(x_i^S)$ from $G_w$ and its $k_2$ farthest neighbors $N_{k_2}(x_i^S)$ from $G_b$. It is worth noting that some neighbors of unlabeled samples are labeled samples, which allows the network to learn the relationship between labeled and unlabeled samples to promote classification.
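The following sketch illustrates the Training Set 2 assembly of Figure 4: for each target pixel, its $s \times s$ spatial cube is stacked with the cubes of its $k_1$ nearest and $k_2$ farthest graph neighbors. The edge-padding and indexing conventions, and the tiny neighbor lists used in the demo, are assumptions of this sketch.

```python
import numpy as np

def extract_cube(img, c, s):
    """Return the (D, s, s) cube centered on pixel coordinate c = (p, q)."""
    half = s // 2
    padded = np.pad(img, ((half, half), (half, half), (0, 0)), mode="edge")
    p, q = c[0] + half, c[1] + half
    return padded[p - half:p + half + 1, q - half:q + half + 1].transpose(2, 0, 1)

def training_sample(img, coords, i, near_idx, far_idx, s=5):
    """Stack target, nearest-neighbor and farthest-neighbor cubes (x_i^t2)."""
    ids = [i] + list(near_idx) + list(far_idx)   # target, N_k1, N_k2
    return np.stack([extract_cube(img, coords[j], s) for j in ids], axis=-1)

img = np.random.rand(16, 16, 8)
coords = [(p, q) for p in range(16) for q in range(16)]
x_t2 = training_sample(img, coords, 0, near_idx=[1, 2], far_idx=[250, 251])
print(x_t2.shape)   # (D, s, s, 1 + k1 + k2) = (8, 5, 5, 5) with k1 = k2 = 2
```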
Figure 5 displays the network architecture of the classifier subnetwork, with a feature extractor and a logistic regression layer. In view of the strong feature extraction capability of convolutional layers, the feature extractor is a fully convolutional network. Here, the feature size of each layer decreases as the number of layers increases and the size of the output is $d \times 1 \times 1$, so the feature extractor can also be regarded as a process of dimensionality reduction. Taking a four-layer feature extractor as an example, for input data $x_i^S \in \mathbb{R}^{D \times s \times s}$, the output can be formulated as

$$f(x_i^S, \Theta) = r(\mathrm{conv}(r(\mathrm{conv}(r(\mathrm{conv}(x_i^S, \theta_1)), \theta_2)), \theta_3)), \tag{8}$$

where $r(\cdot)$ is the ReLU function and $\mathrm{conv}(\cdot)$ is the 2-D convolution. $\Theta = \{\theta_1, \theta_2, \theta_3\}$ is the learning parameter of the feature extractor. The feature extractor and the logistic regression layer are fully connected. The output of the logistic regression layer is formulated as

$$z_i = \mathrm{softmax}(f(x_i^S, \Theta), \theta_L), \tag{9}$$

in which $\theta_L$ is the learning parameters of the logistic regression layer and $\mathrm{softmax}(\cdot)$ is the softmax function. Since the task of the classifier subnetwork is classification, for Training Set 1 $\{(x_1^{t1}, y_1), (x_2^{t1}, y_2), \ldots, (x_n^{t1}, y_n)\}$, the loss function adopts the cross-entropy loss, which is defined as

$$L_c = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{N_C} y_{ik} \ln z_{ik}. \tag{10}$$

Here, $y_i$ is the class label of $x_i^S$ and $z_i$ is its predicted label; $y_i$ and $z_i$ are two $N_C$-dimensional one-hot vectors, where $N_C$ is the number of classes, and $y_{ik}$ and $z_{ik}$ are the $k$th elements of $y_i$ and $z_i$, respectively.
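A hedged PyTorch sketch of this subnetwork follows: a fully convolutional extractor shrinks a $(D, s, s)$ cube to a $30 \times 1 \times 1$ feature, followed by a logistic regression layer trained with the cross-entropy loss of Equation (10). Channel widths and filter sizes follow the IP setting of Section 3.2, but strides, padding, and the use of a linear layer in place of the final $1 \times 1$ convolution are our assumptions; PyTorch's CrossEntropyLoss applies the softmax of Equation (9) internally.

```python
import torch
import torch.nn as nn

class ClassifierSubnet(nn.Module):
    def __init__(self, bands=200, n_classes=16):
        super().__init__()
        self.features = nn.Sequential(          # fully convolutional extractor
            nn.Conv2d(bands, 100, 3), nn.ReLU(),  # 5x5 -> 3x3
            nn.Conv2d(100, 50, 2), nn.ReLU(),     # 3x3 -> 2x2
            nn.Conv2d(50, 30, 2), nn.ReLU(),      # 2x2 -> 1x1
        )
        self.logreg = nn.Linear(30, n_classes)  # logistic regression layer

    def forward(self, x):                       # x: (B, D, s, s)
        f = self.features(x).flatten(1)         # (B, 30) feature vector
        return self.logreg(f)                   # class logits

net = ClassifierSubnet()
x = torch.randn(4, 200, 5, 5)
loss = nn.CrossEntropyLoss()(net(x), torch.randint(0, 16, (4,)))  # L_c
loss.backward()
print(loss.item())
```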
The task of the Siamese subnetwork is to learn the sample relationships from $G_w$ and $G_b$. As described in Section 2.2, $G_w$ represents the relationships between the target sample and its $k_1$ nearest neighbors and $G_b$ expresses the relationships between the target sample and its $k_2$ farthest neighbors. In order to learn the graph information in $G_w$ and $G_b$ at the same time, a novel Siamese subnetwork with $(1 + k_1 + k_2)$ subnets is designed to learn the relationship between one target sample and $(k_1 + k_2)$ samples at a time; its network architecture is shown in Figure 6. This is different from the traditional Siamese network, which has two subnets and only learns the relationship between two samples at a time [40]. In our designed Siamese subnetwork, each subnet has the same network structure as the classifier subnetwork. That is, all subnets have the same network structure, with a feature extractor and a graph-based constraint layer, and, crucially, they share parameters.

Corresponding to the network architecture, the input data $x_i^{t2} = \{x_i^S, N_{k_1}(x_i^S), N_{k_2}(x_i^S)\} \in \mathbb{R}^{D \times s \times s \times (1 + k_1 + k_2)}$ of the Siamese subnetwork comprise $(1 + k_1 + k_2)$ 3-D cubes, where $x_i^S$ is the target sample. Meanwhile, we suppose that $x_{ip}^S \in N_{k_1}(x_i^S)$ is the $p$th nearest neighbor of $x_i^S$ and $x_{iq}^S \in N_{k_2}(x_i^S)$ is the $q$th farthest neighbor of $x_i^S$, which are both 3-D cubes. In the Siamese subnetwork, each subnet inputs one 3-D cube $x_i^S \in \mathbb{R}^{D \times s \times s}$. Following the top-to-bottom order in Figure 6, the first subnet inputs the target sample $x_i^S$, and its output can be formulated as

$$z_i = \mathrm{subN}(x_i^S, \Theta, \theta_L) = \mathrm{softmax}(f(x_i^S, \Theta), \theta_L), \tag{11}$$

which is the same as the output of the classifier subnetwork. $f(x_i^S, \Theta)$ is again the output of the feature extractor and $\theta_L$ is the learning parameters of the graph-based constraint layer. $\mathrm{subN}(\cdot)$ represents the subnet mapping function, which applies to all subnets due to the identical network structure and shared parameters. In that way, the second to $(1 + k_1)$th subnets input the nearest neighbor samples and the $(k_1 + 2)$th to $(1 + k_1 + k_2)$th subnets input the farthest neighbor samples. When a subnet inputs a nearest neighbor $x_{ip}^S \in N_{k_1}(x_i^S)$ of $x_i^S$, its output is described as

$$z_{ip}^N = \mathrm{subN}(x_{ip}^S, \Theta, \theta_L). \tag{12}$$

In the same way, when a subnet inputs a farthest neighbor $x_{iq}^S \in N_{k_2}(x_i^S)$ of $x_i^S$, its output is described as

$$z_{iq}^F = \mathrm{subN}(x_{iq}^S, \Theta, \theta_L). \tag{13}$$

Thus, for input data $x_i^{t2} = \{x_i^S, N_{k_1}(x_i^S), N_{k_2}(x_i^S)\}$ in Training Set 2 of Figure 4, the output of the Siamese subnetwork is

$$Z_i(x_i^{t2}, \Theta, \theta_L) = [z_i, z_{i1}^N, \ldots, z_{ip}^N, \ldots, z_{ik_1}^N, z_{i1}^F, \ldots, z_{iq}^F, \ldots, z_{ik_2}^F], \tag{14}$$
which includes the outputs of the $(1 + k_1 + k_2)$ subnets. In fact, the Siamese subnetwork aims to promote the learning of the classifier subnetwork to improve the classification performance. As a result, based on $G_w$ and $G_b$, the Siamese subnetwork should compress the distance $D_N$ between the target sample and its $k_1$ nearest neighbors and expand the distance $D_F$ between the target sample and its $k_2$ farthest neighbors. The former can be formulated as

$$D_N = \frac{1}{m} \sum_{i=1}^{m} \sum_{p=1}^{k_1} W_{wip} \left\| z_i - z_{ip}^N \right\|^2, \tag{15}$$

and the latter can be formulated as

$$D_F = \frac{1}{m} \sum_{i=1}^{m} \sum_{q=1}^{k_2} W_{biq} \left\| z_i - z_{iq}^F \right\|^2. \tag{16}$$

$W_w$ and $W_b$ are the weight matrices from $G_w$ and $G_b$, respectively, which are calculated with Equations (5) and (7). $W_w$ is based on SLSD, which has been proven effective in revealing the more realistic relationships between samples [32]: if the SLSD between $x_i$ and $x_j$ is smaller, $W_{wij}$ is larger, and vice versa. Generally, neural networks optimize learning parameters by minimizing objective functions. However, in the Siamese subnetwork, $D_N$ needs to be minimized, whereas $D_F$ needs to be maximized; simply optimizing the negative of $D_F$ would prevent the network from converging. To take the convergence of the network into account, the loss function of the Siamese subnetwork is defined as

$$L_s = D_N + \exp(-D_F). \tag{17}$$

Here, $\exp(-x)$ is a decreasing function that converges to 0 as its argument increases. As a result, the loss function $L_s$ is optimized towards zero.
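The following is a hedged PyTorch sketch of Equations (15)-(17) for one batch; the tensor shapes in the comments and the batched (rather than full-dataset) averaging are our conventions, not the paper's.

```python
import torch

def siamese_loss(z, zN, zF, ww, wb):
    """L_s = D_N + exp(-D_F) for a batch.
    z: (B, C) target outputs; zN: (B, k1, C); zF: (B, k2, C);
    ww: (B, k1) weights from W_w; wb: (B, k2) weights from W_b."""
    dN = (ww * (z.unsqueeze(1) - zN).pow(2).sum(-1)).sum(1).mean()  # D_N, Eq. (15)
    dF = (wb * (z.unsqueeze(1) - zF).pow(2).sum(-1)).sum(1).mean()  # D_F, Eq. (16)
    return dN + torch.exp(-dF)                                      # L_s, Eq. (17)

B, k1, k2, C = 8, 3, 3, 16
loss = siamese_loss(torch.rand(B, C), torch.rand(B, k1, C),
                    torch.rand(B, k2, C), torch.rand(B, k1), torch.rand(B, k2))
print(loss.item())
```

Note how the $\exp(-D_F)$ term keeps the loss bounded below by zero, so expanding $D_F$ never drives the objective to negative infinity, which is the convergence concern raised above.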

2.4. Multitask Few-Shot Learning Strategy

In fact, as its name suggests, the DMN is a two-task network. Since the training data required by these two tasks are completely different, not only in content but also in format, the two tasks cannot update the learning parameters at the same time during training. In addition, due to the large difference in the amount of training data between the two tasks, DMN learning easily collapses into a single task, so the two tasks cannot achieve uniform convergence. In sum, it is challenging to achieve a synergistic, balanced effect between the two tasks. To solve this problem, we designed a multitask few-shot learning strategy (MFSL).
Next, for ease of explanation, we introduce MFSL in terms of the two subnetworks. Because the purpose of the DMN is classification and the number of labeled samples is very small, the learning of the classifier subnetwork is particularly important. The University of Pavia dataset, for example, contains 42,776 samples from nine classes. If five samples are taken from each class, the number of labeled samples is only 45 and the number of unlabeled samples is 42,731. Thus, the number of training samples for the classifier subnetwork is 45, whereas that for the Siamese subnetwork is 42,776, a large gap. Therefore, the task of the classifier subnetwork needs to be emphasized constantly.
In this paper, our GDMFSL deals with HSIs with few labels. The training data of the classifier subnetwork of the DMN are only the labeled samples, so all of them can be used as one batch in DMN learning. Algorithm 1 shows the multitask few-shot learning strategy. Under MFSL, whenever the Siamese subnetwork learns a batch of data, the classifier subnetwork also learns a batch of data; while the Siamese subnetwork works through different batches, the classifier subnetwork repeatedly learns its single batch. Of course, when the number of labeled samples increases, the training data of the classifier subnetwork can also be divided into multiple batches. The following experiments also prove that MFSL can balance the two tasks of the DMN and make them converge.
Algorithm 1: Multitask few-shot learning strategy
Input: Training Set 1 $X^{t1}$ with labels, Training Set 2 $X^{t2}$ and its size $m$, weight matrices $W_w$ and $W_b$, batch size $B$, iterations $I$, learning rate.
Initialize: $\theta_1, \theta_2, \theta_3, \theta_L$
1: for epoch in $1, \ldots, I$ do
2:   $T_2 \leftarrow \mathrm{RandomShuffle}(X^{t2})$
3:   for $i$ in $1, \ldots, \mathrm{int}(m/B)$ do
4:     $T_1 \leftarrow \mathrm{RandomShuffle}(X^{t1}$ with labels$)$
5:     $T_2^B \leftarrow T_2[i \cdot B : \min(i \cdot B + B, m)]$
6:     Update $\theta_1, \theta_2, \theta_3, \theta_L \leftarrow \mathrm{Minimize}(L_c(T_1)$ in Equation (10)$)$
7:     Update $\theta_1, \theta_2, \theta_3, \theta_L \leftarrow \mathrm{Minimize}(L_s(T_2^B)$ in Equation (17)$)$
8:   end for
9: end for
Output: $Z = [z_1, \ldots, z_m]$ with the input $T_2[:, :, :, :, 0:1]$ according to Equation (9)
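Below is a minimal, runnable sketch of the alternating updates in Algorithm 1: each Siamese batch step is paired with one step on the reshuffled full labeled set, so the shared parameters are updated by both tasks at every iteration. The tiny linear network and random tensors are placeholders assumed for illustration, not the paper's DMN.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Linear(8, 4)                     # stands in for the shared subnet
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

x1 = torch.randn(20, 8); y1 = torch.randint(0, 4, (20,))  # Training Set 1
x2 = torch.randn(64, 8)                                   # Set 2 targets
xN = torch.randn(64, 3, 8); xF = torch.randn(64, 3, 8)    # neighbor inputs
ww = torch.rand(64, 3); wb = torch.rand(64, 3)            # graph weights

def siamese_loss(z, zN, zF, ww, wb):      # Equation (17), as sketched above
    dN = (ww * (z.unsqueeze(1) - zN).pow(2).sum(-1)).sum(1).mean()
    dF = (wb * (z.unsqueeze(1) - zF).pow(2).sum(-1)).sum(1).mean()
    return dN + torch.exp(-dF)

B = 16
for epoch in range(2):                                    # iterations I
    perm = torch.randperm(64)                             # shuffle Set 2
    for i in range(0, 64, B):
        idx = perm[i:i + B]
        sh = torch.randperm(20)                           # reshuffle Set 1
        opt.zero_grad(); ce(net(x1[sh]), y1[sh]).backward(); opt.step()  # L_c
        opt.zero_grad()
        loss = siamese_loss(net(x2[idx]),
                            torch.stack([net(t) for t in xN[idx].unbind(1)], 1),
                            torch.stack([net(t) for t in xF[idx].unbind(1)], 1),
                            ww[idx], wb[idx])             # L_s
        loss.backward(); opt.step()
print("done")
```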

3. Experiments and Discussion

3.1. Experimental Datasets

To assess the performance of GDMFSL, three public HSI datasets were used: Indian Pines (IP), University of Pavia (UP) and Salinas.
Figure 7 shows the color image and the labeled image of the IP dataset, which covers the Indian Pines region in northwest Indiana, USA, and was acquired by the AVIRIS sensor in 1992. Its spatial resolution is 20 m. It has 220 original spectral bands in the wavelength range 0.4∼2.5 μm. Owing to noise and water absorption, bands 104∼108, 150∼163 and 220 were discarded and the remaining 200 bands were used in this paper. It contains 145 × 145 pixels, comprising 10,776 background pixels and 10,249 pixels across 16 ground-truth classes.

The UP dataset covers the University of Pavia, northern Italy, and was acquired by the ROSIS sensor. Its spatial resolution is 1.3 m. It has 115 original spectral bands in the wavelength range 0.4∼0.82 μm. After removing 12 noisy bands, 103 bands are employed in this paper. It has 610 × 340 pixels, of which 164,624 are background and 42,776 belong to nine ground-truth classes. Figure 8 shows the color image and the labeled image with the nine classes.

The Salinas dataset covers Salinas Valley, CA, USA, and was acquired by the AVIRIS sensor in 1998. Its spatial resolution is 3.7 m. It has 224 original bands in the wavelength range 0.4∼2.45 μm. Because 20 bands are severely affected by noise, the remaining 204 bands are used in this paper. Each band has 512 × 217 pixels, including 16 ground-truth classes with 56,975 pixels and background with 54,129 pixels. The color image and the labeled image with the 16 classes are shown in Figure 9.

3.2. Experimental Setting

As described in Section 2.1, our GDMFSL framework consists of two parts: a semi-supervised graph and a deep multitask network. For the semi-supervised graph, four parameters need to be manually set, namely, the size of spatial window s, the spectral–locational trade-off parameter β , the number of nearest neighbors k 1 and that of farthest neighbors k 2 . In fact, the influence of s, β and k 1 on the graph has been analyzed in [32]. According to that, these three parameters are set separately for different datasets and k 2 is set to be equal to k 1 for convenience. In this paper, four parameters were set to s = 5 , β = 0.7 , k 1 = k 2 = 10 for the IP dataset, s = 7 , β = 0.05 , k 1 = k 2 = 20 for the UP dataset and s = 7 , β = 0.03 , k 1 = k 2 = 20 for the Salinas dataset.
Although the DMN contains two subnetworks, their network structure is the same. To further mitigate over-fitting and improve the DMN classification performance, we added a dropout layer between the convolutional layers in the feature extractor. For the IP dataset, the number of features in each layer is 200→100→50→30→16, the filter size per layer is 3 × 3→2 × 2→2 × 2→1 × 1, the output size per layer is 5 × 5→3 × 3→2 × 2→1 × 1→1 × 1, the dropout retention probability is 0.9 and the learning rate is 6 × 10⁻⁴. For the UP dataset, the number of features in each layer is 103→70→30→9, the filter size per layer is 3 × 3→3 × 3→1 × 1, the output size per layer is 5 × 5→3 × 3→1 × 1→1 × 1, the dropout retention probability is 0.9 and the learning rate is 3 × 10⁻⁴. For the Salinas dataset, the number of features in each layer is 204→110→60→30→16, the filter size per layer is 3 × 3→3 × 3→3 × 3→1 × 1, the output size per layer is 7 × 7→5 × 5→3 × 3→1 × 1→1 × 1, the dropout retention probability is 0.8 and the learning rate is 8 × 10⁻⁵.
In order to verify the superiority of GDMFSL, eight classification methods were selected for comparison: SVM, KNN, 3D-CNN [41], SSRN [28], SS-CNN [42], DFSL+NN [43], RN-FSC [38] and DCFSL [44]. SVM and KNN are traditional classification methods and the rest are based on neural networks. In the actual experiments, we utilized the 1-nearest-neighbor classifier and the LibSVM toolbox with a radial basis function kernel. 3D-CNN [41] and SSRN [28] are two supervised 3-D deep learning frameworks. SS-CNN [42] is a semi-supervised convolutional neural network. DFSL+NN [43], RN-FSC [38] and DCFSL [44] are three few-shot learning frameworks, all of which are cross-domain methods combined with meta-learning. In the following experiments, 200 labeled source-domain samples per class are randomly selected to learn transferable knowledge for these three cross-domain methods.
In order to ensure the fairness of the experiment, 1∼5 labeled target dataset samples per class were used for training, and the rest of the samples of the target dataset were reserved as the testing set. The classification overall accuracy (OA), the average accuracy (AA) and the Kappa coefficient were used to evaluate the classification performance. In addition, each experiment in this paper was repeated 10 times in each condition in order to reduce the experimental random error.
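The three metrics named above can be computed as follows; this is a short scikit-learn sketch with dummy labels, using the common definitions of OA (trace of the confusion matrix over its sum), AA (mean of per-class recalls) and Cohen's kappa.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score

y_true = np.random.randint(0, 9, 1000)   # dummy ground-truth labels
y_pred = np.random.randint(0, 9, 1000)   # dummy predictions

cm = confusion_matrix(y_true, y_pred)
oa = np.trace(cm) / cm.sum()                  # overall accuracy
aa = np.mean(np.diag(cm) / cm.sum(axis=1))    # average (per-class) accuracy
kappa = cohen_kappa_score(y_true, y_pred)     # Kappa coefficient
print(f"OA={oa:.4f}, AA={aa:.4f}, Kappa={kappa:.4f}")
```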

3.3. Convergence Analysis

The DMN in our proposed GDMFSL framework has two subnetworks (tasks) with different loss functions. The classifier subnetwork is used to learn what the sample is, and its loss function is the cross-entropy loss between the outputs of the DMN and the labels, as described in Equation (10). The Siamese subnetwork is used to learn the relationships among samples, and its loss function is the mean-squared loss among outputs of the DMN under different inputs, which compresses the distance between the target sample and its nearest neighbors and expands the distance between the target sample and its farthest neighbors, as described in Equation (17). In addition, the two subnetworks learn on different training data. Meanwhile, they share parameters; that is, they jointly optimize the same learning parameters in the DMN. On the surface, under different directions of optimization, the losses of the two subnetworks would seem likely to fluctuate and resist convergence. To demonstrate the convergence of the two subnetworks of the DMN, we show their loss curves and the classification OA curves of the testing set on three datasets in Figure 10.
The experiment in Figure 10 was performed with five labeled training samples per class. The first row shows the classifier subnetwork loss, the second row the Siamese subnetwork loss and the third row the prediction accuracy on the testing set. The learning rates are described in Section 3.2. In Figure 10, the x-axis represents the number of learning parameter updates, one update being performed after learning each batch of samples; one loss value is recorded for every 100 parameter updates. From Figure 10, the classifier loss on all three datasets has a smooth convergence curve. This proves that, although the amount of training data for the classifier subnetwork is much less than that for the Siamese subnetwork, the classifier subnetwork's task of learning the labeled samples is not disrupted by the Siamese subnetwork's task. Though there are fluctuations, the loss curve of the Siamese subnetwork still gradually converges, which also indicates that our designed loss function in Equation (17) is convergent. These two loss curves also show that the MFSL strategy we designed is effective. It is worth noting that the prediction accuracy on the test set not only increases gradually as the two losses decrease but also continues to increase with the convergence of the Siamese subnetwork after the classifier subnetwork has converged. This proves that our designed Siamese subnetwork is quite advantageous for classification.

3.4. Ablation Study

To demonstrate the effectiveness of the strategies proposed in the GDMFSL framework, we conducted an ablation experiment, the results of which are shown in Table 1. Our GDMFSL framework can be divided into two parts: the semi-supervised graph and the DMN. In order to prove the effectiveness of the proposed DMN, we conducted graph learning based on the semi-supervised graph of GDMFSL to reduce the dimensionality of the HSI dataset and then classified the dimensionality reduction results using SVM and KNN; these baselines are named SSGL+SVM and SSGL+KNN in this paper. The DMN contains two subnetworks: a classifier subnetwork and a Siamese subnetwork. In order to prove the contribution of the Siamese subnetwork to the DMN, we conducted an experiment that trained only the classifier subnetwork, which is called Classifier SubN in Table 1.
Table 1 shows the classification accuracy of SSGL+SVM, SSGL+KNN, Classifier SubN and GDMFSL on the three datasets under different numbers of labeled samples, where the highest OA value for each classification condition is shown in bold. GDMFSL, SSGL+SVM and SSGL+KNN learn from the same graph; the difference is that GDMFSL utilizes the DMN to learn not only graph information but also class information, whereas SSGL+SVM and SSGL+KNN only learn graph information. From Table 1, GDMFSL is superior to SSGL+SVM and SSGL+KNN under all conditions on the three datasets, which proves that our proposed DMN is effective. In addition, we found that, on the UP dataset, the classification accuracies of SSGL with SVM and with KNN differ significantly. This is because feature extraction and classification in SSGL+SVM and SSGL+KNN are separated, and the feature distribution of SSGL on the UP dataset is not suitable for the SVM classifier. At this point, a method in which the feature extractor and classifier can learn together is more valuable.
GDMFSL and Classifier SubN are trained with the same labeled samples; the difference is that GDMFSL utilizes the Siamese subnetwork, which learns graph information and shares its learning parameters with the classifier subnetwork. Table 1 shows that GDMFSL is much better than Classifier SubN, which means that our designed Siamese subnetwork is meaningful and can greatly improve the classification performance of the classifier subnetwork. Moreover, we can observe from Table 1 that SSGL+SVM and SSGL+KNN tend to be better than Classifier SubN. On this basis, graph learning has more advantages than traditional deep learning for HSI classification with few labeled samples. The reason is that a traditional deep learning method with a deep structure often falls into over-fitting when only a few labeled samples are used to train the network. Nevertheless, GDMFSL solves this problem with the graph, even surpassing the performance of graph learning.

3.5. Classification Result

To further demonstrate the effectiveness of GDMFSL in HSI classification with few labeled samples, the classification results of GDMFSL and the eight comparison methods are presented in this subsection. In the practical experiments, since DFSL+NN [43], RN-FSC [38] and DCFSL [44] are cross-domain methods, four available HSI datasets, Chikusei, Houston, Botswana and Kennedy Space Center, were collected to form the source domain data. After discarding classes with fewer than 200 samples, 40 classes were used to build the source classes.
Table 2, Table 3 and Table 4 report the OA, AA, kappa coefficient and per-class classification accuracy, where the highest value is shown in bold. For the three target datasets, we randomly selected five labeled samples from each class for training and kept the rest for testing. In order to eliminate the influence of the randomness of labeled-sample selection on the classification accuracy, each experiment was performed 10 times with independently and randomly selected labeled samples. From Table 2, Table 3 and Table 4, we draw the following conclusions.
(1) The classification accuracies of deep-learning-based methods are mostly better than those of traditional classification methods. For example, the OA values of 3D-CNN [41] and SSRN [28] are approximately 13.7% on IP, 8.08% on UP and 5.69% on Salinas higher than those of SVM and KNN, respectively. One reason is that deep learning methods with a hierarchical network structure can obtain more discriminative features. Another reason is that 3D-CNN [41] and SSRN [28] can obtain spectral–spatial features through the convolutional layer, whereas SVM and KNN only explore spectral features.
(2) Although SS-CNN [42], as a semi-supervised method, adds the learning of unlabeled samples on top of a traditional deep learning method, it does not achieve a better classification performance and is in fact worse when dealing with classification with few labeled samples. In contrast, our proposed semi-supervised method, GDMFSL, achieves a good classification performance. The main reason is that SS-CNN [42] uses unlabeled samples only for data reconstruction and does not acquire and learn the relationship information among samples.
(3) The few-shot learning methods are superior to the traditional deep-learning-based methods. Numerically, the OA values of DFSL+NN [43], RN-FSC [38] and DCFSL [44] are approximately 3.48% on IP, 9.53% on UP and 1.53% on Salinas higher than those of 3D-CNN [41] and SSRN [28], respectively. These few-shot learning methods use a meta-learning strategy to learn a metric space suitable for classification. In effect, they are learning a mapping that better expresses the relationships between samples, which is similar to the learning of relationships between samples in our proposed GDMFSL.
(4) Among all of the algorithms, our proposed GDMFSL achieved the best classification results on all three datasets, with a classification accuracy much higher than the other comparison methods. Numerically, GDMFSL is 20.77% on IP, 2.77% on UP and 9.51% on Salinas higher than the highest OA value among the comparison methods. GDMFSL achieved the highest classification accuracy in most classes, even reaching 100% in some classes. In addition, it is worth noting that, in Table 2, when the IP dataset is classified with only five labeled samples per class, the OA values of all comparison algorithms are lower than 70%, whereas that of GDMFSL reaches 87.58%. Beyond the OA value, the AA and kappa coefficient values of GDMFSL are also the highest among all algorithms on the three datasets. All of this strongly proves the excellent performance of GDMFSL in HSI few-shot classification.
In order to better display and compare the classification results of the different methods, we present the classification maps corresponding to Table 2, Table 3 and Table 4 in Figure 11, Figure 12 and Figure 13. Obviously, compared with the other methods, the classification map of GDMFSL is the most similar to the ground truth, with the smallest differing area on all three datasets. Figure 11, Figure 12 and Figure 13 thus visually confirm the advantages of GDMFSL in processing HSI classification with only a few labeled samples. Although the classification map of GDMFSL looks clear and smooth, GDMFSL still has drawbacks. A careful look at Figure 11, Figure 12 and Figure 13 shows that most of the misclassified pixels of GDMFSL lie at the boundaries of class regions and are identified as the class of the adjacent region; meanwhile, the misclassified areas are continuous. The reason is that the learning of the DMN in GDMFSL is greatly influenced by the sample relationship information generated by the semi-supervised graph, which is based on SLSD with locational information. Although the addition of location information improves the ability of the DMN to identify samples within a class region, samples at the boundaries of class regions are prone to misclassification.
The above experiments show the five-shot classification performance of GDMFSL on HSIs. In order to further verify the performance of GDMFSL with fewer labeled samples and the effect of different numbers of labeled samples on different methods, we randomly selected one, two, three, four and five labeled samples per class for the experiment; the classification OA values are shown in Table 5, in which the highest value is shown in bold. To show the numerical changes and comparisons in Table 5 more clearly, the corresponding line chart is presented in Figure 14. From Table 5 and Figure 14, we make the following observations.
(1) Though 3D-CNN [41] and SSRN [28] are superior to SVM and KNN in classification when there are five labeled samples per class, their advantage shrinks as the number of labeled samples decreases, and they even fall behind SVM and KNN when the number of labeled samples per class is less than three. For example, in the case of one-shot classification on the UP dataset, the OA values of SVM and KNN are approximately 6.87% higher than those of 3D-CNN [41] and SSRN [28]. This is because the fewer the training samples, the more serious the over-fitting of deep learning methods. DFSL+NN [43], RN-FSC [38] and DCFSL [44] address this problem through a meta-learning strategy, whereas our GDMFSL learns the relationships among samples from the semi-supervised graph. The experimental results show that, compared with the other few-shot learning methods, GDMFSL achieves a better classification performance.
(2) As the number of labeled samples increased, the OA values of all methods increased. When there was only one labeled sample in each class, the OA values of most methods were quite low and close, whereas GDMFSL was the highest, especially on the Salinas dataset, where the OA value of GDMFSL reached 83.52%.
(3) Although the classification performance of SS-CNN [42] is relatively weak compared with the other methods in this paper, this does not mean that the semi-supervised deep learning approach is unreliable. On the contrary, the experimental results of GDMFSL prove that semi-supervised deep learning is quite effective for hyperspectral image classification with a small number of labeled samples when unlabeled samples are used reasonably. Compared with SS-CNN [42], GDMFSL has the advantage of learning the relationships among samples.
(4) Obviously, GDMFSL is superior to other few-shot learning methods: DFSL+NN [43], RN-FSC [38] and DCFSL [44]. GDMFSL differs from them in that GDMFSL borrows unlabeled sample information from the target dataset whereas they borrow labeled sample information from other source datasets. Objectively speaking, the latter has problems of domain conversion and different classes between different datasets, whereas the former does not.
(5) It is clear from Figure 14 that GDMFSL has the best classification performance under all conditions. Under different numbers of labeled samples, the OA values of GDMFSL are at least 11.29%, 12.18%, 19.46%, 21.12% and 20.77% higher than those of the other methods on the IP dataset, 4.35%, 8.35%, 6.18%, 5.12% and 2.77% on the UP dataset and 7.48%, 12.6%, 11.77%, 8.71% and 9.71% on the Salinas dataset.

4. Concluding Remarks

In this paper, we proposed a GDMFSL framework to deal with HSI classification with few labeled training samples. GDMFSL can be viewed as two parts: a semi-supervised graph and a DMN. First, a semi-supervised graph is constructed to generate graph information, which uses SLSD to estimate sample similarities and then revises them with few labeled samples. Second, a DMN with two subnetworks (tasks) is constructed and trained. The classifier subnetwork is trained on few labeled samples, which learns what the sample class is. The Siamese subnetwork is trained based on all samples (labeled and unlabeled), which learns the differences (relationships) among all samples. The loss function constrains the Siamese subnetwork to shorten the distance between the target sample and its nearest (or intra-class) neighbors and widen the distance between the target sample and its farthest (or inter-class) neighbors. The classifier subnetwork and Siamese subnetwork are jointly trained according to the MFSL strategy, and converge cooperatively.
The experimental results demonstrate that our proposed strategy of incorporating graph information into the DNN is more effective than graph learning in handling the few-shot settings of HSIs; that the proposed DMN is more efficient than traditional classification networks; that our designed Siamese subnetwork indeed alleviates the over-fitting problem of the classifier subnetwork and greatly improves the classification performance; that the loss function of the Siamese subnetwork is convergent; and that the MFSL strategy effectively promotes the common convergence of the two subnetworks (tasks).
More importantly, GDMFSL is far superior to the other comparison methods in this paper. Under different numbers of labeled samples, the classification OA values of GDMFSL are at least 11.29%, 12.18%, 19.46%, 21.12% and 20.77% higher than those of other methods on the IP dataset, 4.35%, 8.35%, 6.18%, 5.12% and 2.77% on the UP dataset and 7.48%, 12.6%, 11.77%, 8.71% and 9.71% on the Salinas dataset.
The disadvantage of this work is that DMNs with different network structures need to be designed for different datasets, which means a trained DMN does not generalize to other data; it can be said that the DMN over-fits the target data. Therefore, our future work will focus on improving the generalizability of DMNs.

Author Contributions

Conceptualization, N.L. and J.S.; methodology, N.L.; software, N.L.; validation, N.L., J.S. and X.Z.; formal analysis, N.L.; investigation, N.L.; resources, D.Z.; data curation, N.L. and T.W.; writing—original draft preparation, N.L.; writing—review and editing, N.L.; visualization, N.L.; supervision, D.Z.; project administration, D.Z.; funding acquisition, J.S. and Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (Grant No. 62076204), the Postdoctoral Science Foundation of Shaanxi Province (Grant No. 2017BSHEDZZ77) and the China Postdoctoral Science Foundation (Grant Nos. 2017M613204, 2017M623246 and 2021M700337).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DNN: deep neural network
HSI: hyperspectral image
GDMFSL: graph-based deep multitask few-shot learning
DMN: deep multitask network
KNN: K-nearest neighbor
SVM: support vector machine
CNN: convolutional neural network
SLSD: spectral-locational-spatial distance
MFSL: multitask few-shot learning
IP: Indian Pines
UP: University of Pavia
AVIRIS: Airborne Visible/Infrared Imaging Spectrometer
ROSIS: Reflective Optics System Imaging Spectrometer
3D-CNN: 3D convolutional neural network
SSRN: spectral–spatial residual network
SS-CNN: semi-supervised convolutional neural network
DFSL+NN: deep few-shot learning method with a nearest neighbor classifier
RN-FSC: relation network for few-shot classification
DCFSL: deep cross-domain few-shot learning
OA: overall accuracy
AA: average accuracy
SSGL+SVM: semi-supervised graph learning method with an SVM classifier
SSGL+KNN: semi-supervised graph learning method with a KNN classifier

References

1. ElMasry, G.; Sun, D.W. Principles of hyperspectral imaging technology. In Hyperspectral Imaging for Food Quality Analysis and Control; Elsevier: Amsterdam, The Netherlands, 2010; pp. 3–43.
2. Boldrini, B.; Kessler, W.; Rebner, K.; Kessler, R.W. Hyperspectral imaging: A review of best practice, performance and pitfalls for in-line and on-line applications. J. Near Infrared Spectrosc. 2012, 20, 483–508.
3. Sahoo, R.N.; Ray, S.; Manjunath, K. Hyperspectral remote sensing of agriculture. Curr. Sci. 2015, 108, 848–859.
4. Bridgelall, R.; Rafert, J.B.; Tolliver, D. Hyperspectral imaging utility for transportation systems. In Sensors and Smart Structures Technologies for Civil, Mechanical, and Aerospace Systems 2015; International Society for Optics and Photonics: Bellingham, WA, USA, 2015; Volume 9435, p. 943522.
5. Fei, B. Hyperspectral imaging in medical applications. In Data Handling in Science and Technology; Elsevier: Amsterdam, The Netherlands, 2020; Volume 32, pp. 523–565.
6. Transon, J.; d'Andrimont, R.; Maugnard, A.; Defourny, P. Survey of hyperspectral earth observation applications from space in the Sentinel-2 context. Remote Sens. 2018, 10, 157.
7. Ding, C.; Li, Y.; Wen, Y.; Zheng, M.; Zhang, L.; Wei, W.; Zhang, Y. Boosting Few-Shot Hyperspectral Image Classification Using Pseudo-Label Learning. Remote Sens. 2021, 13, 3539.
8. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2094–2107.
9. Hennessy, A.; Clarke, K.; Lewis, M. Hyperspectral classification of plants: A review of waveband selection generalisability. Remote Sens. 2020, 12, 113.
10. Zhang, H.; Li, Y.; Jiang, Y.; Wang, P.; Shen, Q.; Shen, C. Hyperspectral classification based on lightweight 3-D-CNN with transfer learning. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5813–5828.
11. Tu, B.; Zhou, C.; He, D.; Huang, S.; Plaza, A. Hyperspectral classification with noisy label detection via superpixel-to-pixel weighting distance. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4116–4131.
12. Sawant, S.S.; Prabukumar, M. A review on graph-based semi-supervised learning methods for hyperspectral image classification. Egypt. J. Remote Sens. Space Sci. 2020, 23, 243–248.
13. Kang, X.; Xiang, X.; Li, S.; Benediktsson, J.A. PCA-based edge-preserving features for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 7140–7151.
14. Bandos, T.V.; Bruzzone, L.; Camps-Valls, G. Classification of hyperspectral images with regularized linear discriminant analysis. IEEE Trans. Geosci. Remote Sens. 2009, 47, 862–873.
15. Wang, J.; Chang, C.I. Independent component analysis-based dimensionality reduction with applications in hyperspectral image analysis. IEEE Trans. Geosci. Remote Sens. 2006, 44, 1586–1600.
16. He, L.; Li, J.; Plaza, A.; Li, Y. Discriminative low-rank Gabor filtering for spectral–spatial hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2016, 55, 1381–1395.
17. Zhang, L.; Luo, F. Review on graph learning for dimensionality reduction of hyperspectral image. Geo-Spat. Inf. Sci. 2020, 23, 98–106.
18. Tenenbaum, J.B.; De Silva, V.; Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290, 2319–2323.
19. Fang, Y.; Li, H.; Ma, Y.; Liang, K.; Hu, Y.; Zhang, S.; Wang, H. Dimensionality reduction of hyperspectral images based on robust spatial information using locally linear embedding. IEEE Geosci. Remote Sens. Lett. 2014, 11, 1712–1716.
20. Yan, L.; Niu, X. Spectral-angle-based Laplacian eigenmaps for nonlinear dimensionality reduction of hyperspectral imagery. Photogramm. Eng. Remote Sens. 2014, 80, 849–861.
21. Zhou, Y.; Peng, J.; Chen, C.P. Dimension reduction using spatial and spectral regularized local discriminant embedding for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2014, 53, 1082–1095.
22. Huang, H.; Shi, G.; He, H.; Duan, Y.; Luo, F. Dimensionality reduction of hyperspectral imagery based on spatial–spectral manifold learning. IEEE Trans. Cybern. 2019, 50, 2604–2616.
23. Huang, H.; Duan, Y.; He, H.; Shi, G.; Luo, F. Spatial-spectral local discriminant projection for dimensionality reduction of hyperspectral image. ISPRS J. Photogramm. Remote Sens. 2019, 156, 77–93.
24. Paoletti, M.; Haut, J.; Plaza, J.; Plaza, A. Deep learning classifiers for hyperspectral imaging: A review. ISPRS J. Photogramm. Remote Sens. 2019, 158, 279–317.
25. Li, S.; Song, W.; Fang, L.; Chen, Y.; Ghamisi, P.; Benediktsson, J.A. Deep learning for hyperspectral image classification: An overview. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6690–6709.
26. Jiao, L.; Liang, M.; Chen, H.; Yang, S.; Liu, H.; Cao, X. Deep fully convolutional network-based spatial distribution prediction for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 5585–5599.
27. Xu, Y.; Du, B.; Zhang, F.; Zhang, L. Hyperspectral image classification via a random patches network. ISPRS J. Photogramm. Remote Sens. 2018, 142, 344–357.
28. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral–spatial residual network for hyperspectral image classification: A 3-D deep learning framework. IEEE Trans. Geosci. Remote Sens. 2017, 56, 847–858.
29. Li, Z.; Huang, H.; Li, Y.; Pan, Y. M3DNet: A manifold-based discriminant feature learning network for hyperspectral imagery. Expert Syst. Appl. 2020, 144, 113089.
30. Boggavarapu, L.P.K.; Manoharan, P. A new framework for hyperspectral image classification using Gabor embedded patch based convolution neural network. Infrared Phys. Technol. 2020, 110, 103455.
31. Paoletti, M.E.; Haut, J.M.; Plaza, J.; Plaza, A. A new deep convolutional neural network for fast hyperspectral image classification. ISPRS J. Photogramm. Remote Sens. 2018, 145, 120–147.
32. Li, N.; Zhou, D.; Shi, J.; Wu, T.; Gong, M. Spectral-locational-spatial manifold learning for hyperspectral images dimensionality reduction. Remote Sens. 2021, 13, 2752.
33. Deng, B.; Jia, S.; Shi, D. Deep metric learning-based feature embedding for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 58, 1422–1435.
34. Jia, S.; Jiang, S.; Lin, Z.; Li, N.; Yu, S. A Survey: Deep Learning for Hyperspectral Image Classification with Few Labeled Samples. Neurocomputing 2021, 448, 179–204.
35. Snell, J.; Swersky, K.; Zemel, R.S. Prototypical Networks for Few-shot Learning. Adv. Neural Inf. Process. Syst. 2017, 30.
36. Tang, H.; Li, Y.; Han, X.; Huang, Q.; Xie, W. A Spatial–Spectral Prototypical Network for Hyperspectral Remote Sensing Image. IEEE Geosci. Remote Sens. Lett. 2019, 17, 167–171.
37. Sung, F.; Yang, Y.; Zhang, L.; Xiang, T.; Torr, P.H.; Hospedales, T.M. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1199–1208.
38. Gao, K.; Liu, B.; Yu, X.; Qin, J.; Zhang, P.; Tan, X. Deep relation network for hyperspectral image few-shot classification. Remote Sens. 2020, 12, 923.
39. Liu, B.; Yu, X.; Zhang, P.; Yu, A.; Fu, Q.; Wei, X. Supervised deep feature extraction for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 56, 1909–1921.
40. Wang, W.; Chen, Y.; He, X.; Li, Z. Soft Augmentation-Based Siamese CNN for Hyperspectral Image Classification With Limited Training Samples. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5.
41. Li, Y.; Zhang, H.; Shen, Q. Spectral–spatial classification of hyperspectral imagery with 3D convolutional neural network. Remote Sens. 2017, 9, 67.
42. Liu, B.; Yu, X.; Zhang, P.; Tan, X.; Yu, A.; Xue, Z. A semi-supervised convolutional neural network for hyperspectral image classification. Remote Sens. Lett. 2017, 8, 839–848.
43. Liu, B.; Yu, X.; Yu, A.; Zhang, P.; Wan, G.; Wang, R. Deep few-shot learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2018, 57, 2290–2304.
44. Li, Z.; Liu, M.; Chen, Y.; Xu, Y.; Li, W.; Du, Q. Deep cross-domain few-shot learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–18.
Figure 1. Overview of the proposed graph-based deep multitask few-shot learning framework.
Figure 2. Construction of the semi-supervised graph.
Figure 3. Deep multitask network.
Figure 4. Generating training data for the DMN.
Figure 5. The network architecture of the classifier subnetwork.
Figure 6. The network architecture of the Siamese subnetwork.
Figure 7. Indian Pines dataset.
Figure 8. University of Pavia dataset.
Figure 9. Salinas dataset.
Figure 10. The loss curves of the DMN and the predicted accuracy on the testing sets of the three datasets (five labeled samples per class).
Figure 11. Classification maps of different methods on the IP dataset.
Figure 12. Classification maps of different methods on the UP dataset.
Figure 13. Classification maps of different methods on the Salinas dataset.
Figure 14. Classification accuracy of all methods with different numbers of labeled samples on the three datasets.
Table 1. Classification accuracy (OA, %) of different methods on the three datasets with n_i labeled samples per class.

Dataset | Method | n_i = 1 | n_i = 2 | n_i = 3 | n_i = 4 | n_i = 5
IP | SSGL+SVM | 52.42 | 66.27 | 71.23 | 74.97 | 79.64
IP | SSGL+KNN | 53.17 | 65.66 | 70.76 | 79.83 | 82.10
IP | Classifier SubN | 41.29 | 48.15 | 54.62 | 59.20 | 62.19
IP | GDMFSL | 54.29 | 70.07 | 78.10 | 84.18 | 87.58
UP | SSGL+SVM | 38.11 | 51.56 | 52.64 | 55.64 | 62.51
UP | SSGL+KNN | 45.73 | 54.07 | 69.71 | 73.81 | 82.60
UP | Classifier SubN | 44.83 | 54.31 | 62.51 | 69.22 | 72.33
UP | GDMFSL | 61.90 | 76.38 | 80.25 | 85.44 | 86.42
Salinas | SSGL+SVM | 79.19 | 84.19 | 85.83 | 86.61 | 93.81
Salinas | SSGL+KNN | 82.86 | 84.86 | 86.34 | 89.42 | 93.60
Salinas | Classifier SubN | 69.02 | 76.07 | 82.24 | 85.73 | 89.32
Salinas | GDMFSL | 83.52 | 93.50 | 97.27 | 97.20 | 98.85
Table 2. Classification accuracy (%) of different methods on the IP dataset (five labeled samples per class).

Class | Train | Test | SVM | KNN | 3D-CNN | SSRN | SS-CNN | DFSL+NN | RN-FSC | DCFSL | GDMFSL
C1 | 5 | 41 | 72.20 | 86.96 | 95.12 | 18.38 | 44.13 | 96.75 | 96.34 | 95.37 | 100.0
C2 | 5 | 1423 | 34.27 | 37.53 | 37.70 | 64.79 | 46.63 | 38.65 | 46.13 | 43.26 | 76.82
C3 | 5 | 825 | 39.18 | 38.19 | 19.77 | 27.65 | 33.32 | 42.79 | 40.61 | 57.95 | 87.59
C4 | 5 | 232 | 50.34 | 41.35 | 32.51 | 26.65 | 23.05 | 68.10 | 58.62 | 80.60 | 100.0
C5 | 5 | 478 | 69.75 | 55.07 | 88.45 | 80.76 | 57.77 | 71.20 | 64.96 | 72.91 | 84.05
C6 | 5 | 725 | 66.36 | 79.04 | 73.65 | 86.87 | 78.98 | 76.18 | 69.45 | 87.96 | 100.0
C7 | 5 | 23 | 89.13 | 96.42 | 81.82 | 32.24 | 22.78 | 100.0 | 100.0 | 99.57 | 100.0
C8 | 5 | 473 | 68.73 | 67.76 | 53.35 | 100.0 | 93.85 | 74.84 | 77.70 | 86.26 | 100.0
C9 | 5 | 15 | 86.67 | 90.00 | 100.0 | 57.69 | 15.77 | 100.0 | 100.0 | 99.33 | 100.0
C10 | 5 | 967 | 37.49 | 38.68 | 41.35 | 59.69 | 33.47 | 47.98 | 25.49 | 62.44 | 83.02
C11 | 5 | 2450 | 33.96 | 36.00 | 66.71 | 70.87 | 56.32 | 57.95 | 65.51 | 62.75 | 76.98
C12 | 5 | 588 | 31.43 | 31.53 | 37.40 | 45.00 | 25.08 | 38.21 | 27.13 | 48.72 | 82.29
C13 | 5 | 200 | 86.50 | 89.26 | 85.71 | 88.29 | 60.39 | 97.50 | 99.75 | 99.35 | 100.0
C14 | 5 | 1260 | 62.93 | 51.93 | 62.57 | 97.18 | 81.22 | 83.44 | 76.35 | 85.40 | 100.0
C15 | 5 | 381 | 28.08 | 12.95 | 56.42 | 36.64 | 63.78 | 62.29 | 70.34 | 66.69 | 100.0
C16 | 5 | 88 | 90.91 | 89.24 | 90.36 | 60.98 | 88.33 | 100.0 | 100.0 | 97.61 | 100.0
OA | 80 | 10,169 | 45.85 ± 2.44 | 42.86 ± 1.50 | 54.76 ± 0.03 | 61.36 ± 0.49 | 51.73 ± 3.12 | 59.65 ± 0.63 | 58.17 ± 0.02 | 66.81 ± 2.73 | 87.58 ± 3.41
AA | – | – | 59.24 ± 1.36 | 58.21 ± 1.48 | 63.93 ± 0.02 | 59.75 ± 0.20 | 51.54 ± 1.98 | 72.24 ± 0.42 | 69.90 ± 0.40 | 77.89 ± 0.86 | 92.70 ± 1.90
Kappa | – | – | 39.68 ± 2.48 | 40.06 ± 1.43 | 48.72 ± 0.03 | 56.91 ± 0.48 | 45.28 ± 2.75 | 54.55 ± 0.52 | 52.52 ± 0.14 | 62.64 ± 0.84 | 86.13 ± 3.72
Table 3. Classification accuracy (%) of different methods on the UP dataset (five labeled samples per class).

Class | Train | Test | SVM | KNN | 3D-CNN | SSRN | SS-CNN | DFSL+NN | RN-FSC | DCFSL | GDMFSL
C1 | 5 | 6626 | 89.98 | 52.87 | 59.82 | 91.84 | 84.97 | 69.19 | 68.55 | 82.20 | 59.58
C2 | 5 | 18,644 | 83.91 | 62.29 | 63.05 | 95.13 | 84.61 | 84.63 | 93.44 | 87.74 | 89.53
C3 | 5 | 2094 | 39.98 | 61.64 | 68.91 | 55.23 | 28.09 | 57.47 | 49.81 | 67.46 | 100.0
C4 | 5 | 3059 | 60.22 | 69.35 | 77.31 | 78.02 | 44.35 | 89.99 | 92.15 | 93.16 | 54.08
C5 | 5 | 1340 | 95.44 | 99.25 | 90.77 | 98.34 | 97.38 | 100.0 | 99.43 | 99.49 | 100.0
C6 | 5 | 5024 | 37.12 | 48.83 | 63.40 | 53.56 | 40.37 | 71.23 | 57.99 | 77.32 | 100.0
C7 | 5 | 1325 | 40.62 | 85.04 | 87.64 | 60.07 | 27.67 | 70.62 | 70.04 | 81.18 | 100.0
C8 | 5 | 3677 | 68.17 | 65.48 | 57.27 | 85.34 | 62.37 | 58.13 | 63.48 | 66.73 | 100.0
C9 | 5 | 942 | 99.13 | 99.68 | 95.57 | 98.08 | 51.28 | 96.92 | 99.19 | 98.66 | 93.77
OA | 45 | 42,731 | 64.12 ± 4.55 | 61.72 ± 2.73 | 65.74 ± 1.77 | 76.26 ± 5.78 | 56.61 ± 6.79 | 77.75 ± 1.16 | 80.19 ± 2.18 | 83.65 ± 1.77 | 86.42 ± 2.90
AA | – | – | 68.18 ± 2.27 | 71.27 ± 1.78 | 76.72 ± 1.01 | 79.51 ± 3.21 | 57.90 ± 3.86 | 77.57 ± 0.31 | 77.12 ± 0.84 | 83.77 ± 1.74 | 88.57 ± 1.74
Kappa | – | – | 55.59 ± 4.64 | 55.49 ± 2.65 | 57.37 ± 1.97 | 70.56 ± 6.69 | 48.20 ± 6.45 | 71.11 ± 1.22 | 73.73 ± 2.79 | 78.70 ± 2.01 | 82.89 ± 3.30
Table 4. Classification accuracy (%) of different methods on the Salinas dataset (five labeled samples per class).

Class | Train | Test | SVM | KNN | 3D-CNN | SSRN | SS-CNN | DFSL+NN | RN-FSC | DCFSL | GDMFSL
C1 | 5 | 2004 | 97.57 | 98.25 | 95.29 | 97.55 | 46.53 | 95.63 | 96.47 | 99.40 | 99.85
C2 | 5 | 3721 | 87.43 | 93.37 | 97.20 | 98.97 | 86.71 | 99.09 | 99.47 | 99.76 | 100.0
C3 | 5 | 1971 | 82.95 | 93.06 | 91.45 | 92.47 | 77.74 | 94.01 | 85.05 | 91.96 | 98.98
C4 | 5 | 1389 | 99.11 | 98.85 | 97.31 | 96.50 | 75.97 | 99.54 | 98.75 | 99.55 | 79.48
C5 | 5 | 2673 | 94.29 | 86.22 | 91.24 | 94.20 | 89.43 | 90.58 | 83.45 | 92.70 | 99.55
C6 | 5 | 3954 | 98.36 | 96.66 | 98.80 | 99.28 | 99.78 | 98.47 | 96.73 | 99.52 | 100.0
C7 | 5 | 3574 | 94.39 | 98.93 | 99.69 | 99.98 | 92.89 | 99.81 | 99.61 | 98.88 | 100.0
C8 | 5 | 11,266 | 59.99 | 50.32 | 66.40 | 86.90 | 66.23 | 77.74 | 72.11 | 74.57 | 99.73
C9 | 5 | 6198 | 96.09 | 97.29 | 96.25 | 99.64 | 93.02 | 91.13 | 88.35 | 99.59 | 100.0
C10 | 5 | 3273 | 71.45 | 61.95 | 70.72 | 92.01 | 87.69 | 60.98 | 70.53 | 86.42 | 97.77
C11 | 5 | 1063 | 91.25 | 84.55 | 93.15 | 95.86 | 63.28 | 95.99 | 90.03 | 96.61 | 99.90
C12 | 5 | 1922 | 97.22 | 78.20 | 99.65 | 99.15 | 76.60 | 93.13 | 93.15 | 99.93 | 99.89
C13 | 5 | 911 | 97.30 | 98.03 | 92.63 | 89.24 | 95.57 | 99.34 | 98.54 | 99.30 | 100.0
C14 | 5 | 1065 | 91.84 | 91.49 | 93.56 | 95.15 | 95.51 | 98.06 | 96.43 | 98.85 | 98.50
C15 | 5 | 7263 | 60.52 | 56.16 | 68.02 | 55.97 | 46.26 | 77.54 | 70.18 | 75.38 | 97.44
C16 | 5 | 1802 | 81.45 | 78.36 | 81.41 | 98.91 | 97.43 | 85.05 | 82.39 | 92.22 | 100.0
OA | 80 | 54,049 | 80.71 ± 2.75 | 78.50 ± 2.14 | 84.20 ± 2.62 | 86.39 ± 2.68 | 72.51 ± 3.82 | 87.05 ± 0.83 | 84.11 ± 1.36 | 89.34 ± 2.19 | 98.85 ± 0.77
AA | – | – | 87.58 ± 1.84 | 85.24 ± 1.55 | 89.56 ± 1.79 | 93.24 ± 1.29 | 80.88 ± 3.52 | 91.01 ± 0.66 | 88.83 ± 2.07 | 94.04 ± 1.14 | 98.39 ± 1.41
Kappa | – | – | 78.61 ± 3.00 | 76.84 ± 2.22 | 82.46 ± 2.90 | 84.95 ± 2.90 | 69.42 ± 4.19 | 85.63 ± 0.91 | 82.38 ± 1.53 | 88.17 ± 2.40 | 98.72 ± 0.86
Table 5. Classification accuracy (OA, %) of all methods with different numbers of labeled samples per class (n_i) on the three datasets.

Dataset | n_i | Train | Test | SVM | KNN | 3D-CNN | SSRN | SS-CNN | DFSL+NN | RN-FSC | DCFSL | GDMFSL
IP (16 classes) | 1 | 16 | 10,233 | 32.30 | 33.81 | 36.23 | 37.76 | 32.18 | 40.15 | 31.84 | 43.00 | 54.29
IP (16 classes) | 2 | 32 | 10,217 | 33.07 | 36.24 | 40.00 | 40.77 | 43.43 | 49.69 | 39.60 | 57.89 | 70.07
IP (16 classes) | 3 | 48 | 10,201 | 40.79 | 39.84 | 43.06 | 43.52 | 43.67 | 54.29 | 44.29 | 58.64 | 78.10
IP (16 classes) | 4 | 64 | 10,185 | 42.63 | 41.89 | 49.29 | 51.84 | 48.03 | 57.76 | 56.53 | 63.06 | 84.18
IP (16 classes) | 5 | 80 | 10,169 | 45.85 | 42.86 | 54.76 | 61.36 | 51.39 | 59.65 | 58.17 | 66.81 | 87.58
UP (9 classes) | 1 | 9 | 42,767 | 51.38 | 51.24 | 42.96 | 45.92 | 34.72 | 51.85 | 54.07 | 57.55 | 61.90
UP (9 classes) | 2 | 18 | 42,758 | 53.57 | 52.58 | 51.48 | 51.48 | 41.43 | 58.51 | 67.22 | 68.03 | 76.38
UP (9 classes) | 3 | 27 | 42,749 | 59.15 | 56.27 | 58.52 | 65.00 | 48.44 | 73.14 | 72.03 | 74.07 | 80.25
UP (9 classes) | 4 | 36 | 42,740 | 62.31 | 56.86 | 61.11 | 65.92 | 51.20 | 75.74 | 79.44 | 80.32 | 85.44
UP (9 classes) | 5 | 45 | 42,731 | 64.12 | 61.72 | 65.74 | 76.26 | 56.61 | 77.75 | 80.19 | 83.65 | 86.42
Salinas (16 classes) | 1 | 16 | 54,113 | 66.25 | 70.87 | 69.58 | 69.79 | 55.46 | 76.04 | 71.87 | 72.70 | 83.52
Salinas (16 classes) | 2 | 32 | 54,097 | 68.12 | 73.77 | 71.87 | 72.39 | 56.85 | 80.62 | 74.06 | 80.88 | 93.50
Salinas (16 classes) | 3 | 48 | 54,081 | 77.08 | 76.34 | 74.79 | 83.85 | 65.86 | 82.81 | 76.87 | 85.50 | 97.27
Salinas (16 classes) | 4 | 64 | 54,065 | 78.96 | 77.12 | 79.27 | 85.62 | 70.95 | 86.46 | 82.29 | 88.49 | 97.20
Salinas (16 classes) | 5 | 80 | 54,049 | 80.71 | 78.50 | 84.20 | 86.39 | 71.76 | 87.05 | 84.11 | 89.14 | 98.85
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
