1. Introduction
Hyperspectral images (HSIs) generally consist of a series of contiguous spectral bands and thus contain abundant spectral information. By exploiting this information, ground objects can be accurately identified and assigned to specific categories. As the cornerstone of HSI applications, HSI classification has been widely applied in practice, for example in urban development [1], environmental protection [2], precision agriculture [3], and geological exploration [4].
As a powerful feature extraction approach, deep learning has been widely applied to HSI classification and has achieved satisfactory results. Early on, the stacked autoencoder (SAE) [5] and the deep belief network (DBN) [6] were developed to perform HSI classification. However, these methods generally take 1-D pixel vectors as the inputs of deep models, which ignores the rich context information in 3-D pixel neighborhoods. In addition, a large number of parameters must be optimized because the models are constructed from fully connected (FC) layers. To address these issues, several works [7,8,9,10] introduced convolutional neural networks (CNNs) to learn deep feature representations, processing 3-D pixel neighborhoods directly without discarding context information. For example, a regularized deep feature extraction (FE) method [7] extracts discriminative features with a CNN consisting of several convolutional and pooling layers. Subsequently, to fully exploit the spectral-spatial information in HSI pixel neighborhoods, some works [11,12,13] introduced 3-D CNNs to learn spectral and spatial deep features simultaneously for HSI classification. Li et al. [11] designed a spectral-spatial deep model built from 3-D CNNs to jointly learn deep semantic features for HSI classification.
However, as networks deepen, the number of learnable parameters grows rapidly and the vanishing-gradient problem arises. The residual blocks introduced by ResNet [14] are widely used in HSI classification [15,16,17,18] to alleviate these problems. A deep pyramidal residual network [15] significantly increases network depth to extract diverse deep features without introducing extra parameters. A spectral-spatial residual network (SSRN) [16], consisting of several sequential 3-D convolutional blocks, jointly learns discriminative deep spectral-spatial features for HSI classification.
Although the abovementioned methods have achieved significant success in HSI classification, they generally require sufficient labeled samples to optimize deep models, and in practical scenarios such samples are hard to acquire. Several data augmentation-based methods [19,20,21,22] have been proposed to address this issue. A pixel-pair-based method [19] expands the training set without extra labeled samples, allowing deep CNN models to retain the classification performance they achieve with sufficient samples. Zheng et al. [20] designed a spectral-spatial HSI classification model that uses superpixel segmentation to enlarge the training sample set and thus tackles the small-sample-size problem. Although these methods perform well with small sample sizes, they do little to improve the feature representation ability of deep models when labeled samples are limited.
The attention mechanism, which uses similarity metrics to capture information important for classification, has attracted extensive interest in HSI classification. By modeling the interactions between different locations of HSI samples, attention enhances relevant features and suppresses irrelevant ones, so that models can extract discriminative features and achieve more satisfactory classification performance. To learn discriminative feature representations with limited training samples, several works [23,24,25,26] introduced attention mechanisms into HSI classification. Sun et al. [23] embedded spatial attention modules into a 3-D CNN to jointly extract deep semantic features. Zhong et al. [25] designed a spectral-spatial transformer network (SSTN) that sequentially applies attention to explore spectral and spatial interactions in HSI pixel neighborhoods. An attention-based kernel generation network [26] learns discriminative feature representations with limited training samples by using self-attention to fully explore the spatial distributions over different spatial locations. Although these methods can alleviate the problem of insufficient labeled samples, they generally assume that the training and testing datasets share the same data and label distributions, which makes it hard for them to generalize to unknown classes with limited training samples.
To tackle the abovementioned issues, few-shot classification, which aims to learn general knowledge transferable to unknown classes, has recently attracted much attention [27,28,29].
Unlike ordinary deep learning methods, which learn a distinct feature representation for each specific class, few-shot learning (FSL) aims to learn the distinctions between object categories from limited training samples. In recent years, several works [30,31,32,33,34,35,36] have introduced FSL into HSI classification with great success. Liu et al. [30] proposed a deep FSL (DFSL) framework to perform HSI classification with limited labeled training samples: a metric space is first learned via a feature extractor composed of several 3-D convolutional residual blocks, and then an SVM or a nearest-neighbor (NN) classifier is used to classify the testing HSI datasets. An FSL relation network (RN-FSL) [31], fully trained on source datasets and fine-tuned with a few labeled target samples, was designed to perform HSI classification.
Although the abovementioned FSL methods achieve satisfactory classification performance with limited training samples, they are generally based on the hypothesis that the source and target domain datasets share the same data distribution. In practical scenarios, however, differences in imaging mechanisms mean that samples from the source and target domains may possess distinct data characteristics and object categories. In this article, a novel graph-based domain adaptation FSL (GDAFSL) framework is developed to tackle these issues. Specifically, a deep residual hybrid attention network (DRHAN) is first proposed to learn a compact metric space in which the embedding features have small intraclass distances and large interclass distances; that is, features of the same class cluster tightly while features of different classes are pushed apart. Few-shot HSI classification is then performed in this metric space by computing the Euclidean distance between class prototypes and unlabeled samples (a minimal code sketch of this step is given after the contribution list below). To tackle the domain-shift problem, a novel graph-based domain adaptation network (GDAN) is proposed to align the domain distributions and learn a domain-invariant feature transformation. Benefiting from the powerful ability of graphs to model complex interactions between nodes, a graph convolutional network (GCN) is embedded into the domain adversarial framework to explore the domain correlations between node features, and these correlations serve as edge weights that guide the update of the node features. Unlike ordinary feature-based domain adversarial strategies, the proposed method can more properly model domain correlations between samples with non-Euclidean structural characteristics and guide the optimization of the learnable parameters, enhancing the domain adaptation ability of deep models. In addition, the training of GDAFSL follows a meta-learning fashion to better simulate few-shot scenarios in practice. Experimental results on three public HSI datasets show that GDAFSL outperforms both ordinary deep learning methods and state-of-the-art FSL methods. The main contributions of this article are as follows:
(1) A novel DRHAN, which uses attention to emphasize critical spatial features at a global scale and to extract specific spatial features at a local scale, is proposed to enhance feature representation with limited labeled samples.
(2) A novel graph-based domain adaptation network (GDAN) is proposed to achieve domain adaptation FSL. GDAN uses graph construction to measure domain correlations and generates a more refined domain adaptation loss to guide the domain adaptation learning process.
(3) A novel similarity measurement method is proposed to model cross-domain correlations, aggregating node features more properly to obtain a more refined domain similarity graph for GDAN.
(4) The proposed GDAFSL organically combines FSL with graph-based domain adaptation to improve cross-domain few-shot classification performance. Extensive experiments on three HSI datasets demonstrate its effectiveness.
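For intuition, the following minimal PyTorch sketch illustrates the prototype-based classification step described above. The shapes, tensor names, and random embeddings are illustrative assumptions; in GDAFSL the embeddings would come from DRHAN, and this sketch is not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of prototype-based few-shot classification in a learned
# metric space (shapes are illustrative; the real extractor is DRHAN).
C, K, Z, D = 16, 1, 19, 128          # C-way, K-shot, Z queries per class, feature dim

support = torch.randn(C, K, D)       # embeddings of labeled support samples
queries = torch.randn(C * Z, D)      # embeddings of unlabeled query samples
labels = torch.arange(C).repeat_interleave(Z)

prototypes = support.mean(dim=1)     # one prototype per class: (C, D)

# Squared Euclidean distance between every query and every prototype.
dists = torch.cdist(queries, prototypes) ** 2          # (C*Z, C)

# Queries are assigned to the nearest prototype; training minimizes the
# cross-entropy of a softmax over negative distances.
logits = -dists
loss = F.cross_entropy(logits, labels)
pred = logits.argmax(dim=1)
```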
The remainder of this article is organized as follows.
Section 2 reviews related work on GCNs and cross-domain FSL.
Section 3 elaborates on the proposed GDAFSL.
Section 4 reports the experimental results and analysis on three HSI datasets.
Section 5 further discusses the classification performance of our method.
Section 6 summarizes this article and draws a conclusion.
4. Experiments
4.1. Description of Experiment Datasets
To assess the classification performance of GDAFSL, several public HSI datasets, including the Chikusei dataset, the Indian Pines (IP) dataset, the University of Pavia (UP) dataset, and the Salinas Valley (SV) dataset, are selected for comprehensive experiments. As the proposed GDAFSL is a cross-domain FSL method, the Chikusei dataset is adopted as the source domain dataset, while the IP, UP, and SV datasets serve as the target domain datasets. A further reason for selecting Chikusei as the source domain is that its spectral characteristics differ from those of the other three datasets, which better verifies the effectiveness of GDAFSL.
The Chikusei dataset, captured by a hyperspectral visible/near-infrared camera (Hyperspec-VNIR-C) over Chikusei, Ibaraki, Japan, in 2014, consists of 19 land-cover categories and a total of 77,592 labeled pixels. It has 2517 × 2335 pixels in the spatial dimension and 128 spectral bands. The corresponding wavelengths range from 363 to 1018 nm. Its spectral resolution and ground sample distance (GSD) are 10 nm and 2.5 m/pixel, respectively.
The Indian Pines dataset, captured by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor over Indiana, USA, in 1992, contains 16 object categories and a total of 10,249 labeled pixels. It has 145 × 145 pixels in the spatial dimension and 200 spectral bands. The corresponding wavelengths range from 400 to 2500 nm. The spectral resolution and GSD of the IP dataset are 10 nm and 20 m/pixel, respectively.
The University of Pavia dataset, captured by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor over Pavia, northern Italy, consists of nine land-cover categories and a total of 42,776 labeled pixels. It has 610 × 340 pixels in the spatial dimension and 103 spectral bands. The corresponding wavelengths range from 430 to 860 nm. The spectral resolution and GSD of the UP dataset are 4 nm and 1.3 m/pixel, respectively.
The Salinas Valley dataset, captured by the AVIRIS sensor over Salinas Valley, CA, USA, consists of 16 land-cover categories and a total of 54,129 labeled pixels. It has 512 × 217 pixels in the spatial dimension and 204 spectral bands. The corresponding wavelengths range from 400 to 2500 nm. The spectral resolution and GSD of the SV dataset are 10 nm and 3.7 m/pixel, respectively.
In our experiments, to balance the number of training samples across classes, 200 labeled samples per class are randomly selected from the source domain dataset to perform source FSL, as in [30]. Five labeled samples per class are also sampled from the target domain datasets to construct the fine-tuning (training) dataset used for target domain fine-tuning and target domain FSL.
Table 1, Table 2 and Table 3 present the partitioning of the IP, UP, and SV datasets, respectively.
Since the FSL training processes alternate between the source and target datasets, the labeled samples in the target fine-tuning dataset are insufficient to construct target FSL tasks, so data augmentation is adopted to expand it. The samples of the original fine-tuning dataset are first rotated clockwise by 90, 180, and 270 degrees, and Gaussian noise is then randomly added to the rotated samples to further expand the set. It is worth noting that the augmented dataset is used only for target domain FSL, while the original fine-tuning dataset is used only for target domain fine-tuning.
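The following sketch illustrates this augmentation scheme. The tensor layout, the noise standard deviation, and the decision to keep both rotated and noisy copies are assumptions, since the article does not specify these details:

```python
import torch

def augment_patches(patches, noise_std=0.01):
    """Expand a set of HSI patches by rotation and additive Gaussian noise.

    patches: (N, B, P, P) tensor of N spectral cubes with B bands and a
    P x P spatial neighborhood. noise_std is an assumed value; the paper
    does not state the noise variance.
    """
    rotated = [patches]
    for k in (1, 2, 3):
        # 90, 180, and 270 degree clockwise rotations of the spatial dims.
        rotated.append(torch.rot90(patches, -k, dims=(2, 3)))
    rotated = torch.cat(rotated, dim=0)
    # Add random Gaussian noise to the rotated samples to expand further.
    noisy = rotated + noise_std * torch.randn_like(rotated)
    return torch.cat([rotated, noisy], dim=0)
```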
4.2. Experimental Setup
In this article, our experiments are conducted on a workstation with an AMD Threadripper processor (2.90 GHz), 64 GB of memory, and an RTX 3090 graphics processing unit with 24 GB of video memory. All HSI classification methods in our experiments are implemented in Python on the PyTorch platform.
In our experiments, the Adam optimizer is adopted to optimize the learnable parameters, and the Xavier normalization method is used to initialize the convolutional kernels in GDAFSL. The initial learning rate is set to 0.001 and reduced by 5% after every 200 iterations. The number of training epochs for the FSL process is set to 10,000. For a C-way-K-shot task in the FSL training process, C is set to the number of classes in the target dataset and K is set to 1. The query sample size Z of the query set in each task is set to 19, as in [30,31]. The overall accuracy (OA), average accuracy (AA), and Kappa coefficient (κ) are adopted to quantitatively evaluate the different HSI classification methods. All methods are executed 10 times on the three HSI datasets to compute the means and standard deviations of OA, AA, and κ.
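For reference, the three evaluation metrics can be computed from a confusion matrix as in the sketch below; this is the standard formulation of OA, AA, and κ, not code taken from the paper.

```python
import numpy as np

def classification_metrics(conf):
    """Compute OA, AA, and the Kappa coefficient from a confusion matrix.

    conf[i, j] counts samples of true class i predicted as class j.
    """
    conf = conf.astype(np.float64)
    n = conf.sum()
    oa = np.trace(conf) / n                                  # overall accuracy
    aa = np.mean(np.diag(conf) / conf.sum(axis=1))           # mean per-class accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / n**2  # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```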
4.3. Parameters Setup
In this section, the main hyperparameters, namely the initial learning rate, the number of epochs, the task size in FSL, the neighborhood size P, and the number of GUMs, are discussed to find the optimal values for GDAFSL.
The learning rate plays an important role in the training process. To find the optimal initial learning rate, experiments with learning rates of {0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1} are conducted on the three target datasets. The experimental results are presented in Figure 9. When the initial learning rate is set to 0.001, GDAFSL achieves the best classification performance on all three target datasets. An overly large or overly small learning rate, such as 0.1 or 0.0001, may prevent the model from converging to a good solution. Therefore, the initial learning rate is set to 0.001 and reduced by 5% after every 200 iterations.
The number of epochs is another vital training parameter. To find its optimal value, we conducted experiments with 2000, 4000, 6000, 10,000, 12,000, 14,000, and 16,000 epochs. The experimental results are presented in Figure 10. The OAs of GDAFSL on the three datasets increase continuously as the number of epochs grows and begin to converge at 10,000 epochs. Therefore, to balance classification performance against time cost, the number of epochs is set to 10,000 on the three target datasets.
As mentioned in Section 3.1, the training processes of the FSL method are iteratively conducted on a series of C-way-K-shot tasks. The task size is also vital to the model's classification performance. To find the optimal task size for GDAFSL, several experiments are conducted on the three target datasets; specifically, the number of support samples K and the number of query samples Z are examined. K is set to 1, 5, 10, 15, and 20, and Z is set to 4, 9, 14, 19, 24, 29, 34, and 39. The experimental results are presented in Figure 11. The OAs reach their maximum when K is set to 1 and then decrease continuously as K grows. For the number of query samples Z, the OAs increase continuously with Z and begin to converge when Z reaches 19. Therefore, the number of support samples K and the number of query samples Z are set to 1 and 19, respectively.
To utilize the abundant spatial-spectral information in HSIs, HSI pixels are first sampled as P × P pixel neighborhoods to perform HSI classification. Several experiments with different neighborhood sizes P are conducted to find the optimal value of P on the three HSI datasets. As exhibited in Figure 12, when P is smaller than 11, the OAs of GDAFSL increase continuously with P. When P is set to 11, GDAFSL achieves the best classification performance and the highest OAs. When P is larger than 11, the OAs instead decrease as P grows. We consider that properly enlarging the spatial size of the pixel neighborhoods generally has a positive impact on classification performance, but a too-large P introduces irrelevant noise that weakens the classification performance of deep models. Therefore, the optimal neighborhood size is set to 11.
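A minimal sketch of this neighborhood sampling is given below; the mirror padding used to handle border pixels is an assumption, as the article does not state its padding strategy.

```python
import numpy as np

def extract_neighborhood(hsi, row, col, P=11):
    """Extract the P x P pixel neighborhood centered on (row, col).

    hsi: (H, W, B) hyperspectral cube. Border pixels are handled by
    mirror padding, which is an assumed choice rather than the paper's.
    """
    r = P // 2
    padded = np.pad(hsi, ((r, r), (r, r), (0, 0)), mode="reflect")
    # Pixel (row, col) maps to (row + r, col + r) in the padded cube.
    patch = padded[row:row + P, col:col + P, :]
    return patch  # shape (P, P, B)
```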
The number of GUMs determines the ability of GDAN to explore domain correlations between samples from different domains. To find the optimal value, several experiments are conducted with different numbers of GUMs, and the results are reported in Figure 13. As the number of GUMs increases, the OAs first increase and then decrease, reaching their highest value with three GUMs. We consider that appropriately deepening GDAN helps to explore domain correlations, but too many GUMs make GDAN too complex to be optimized properly, which weakens its capability to model domain correlations. Therefore, the number of GUMs is set to 3 in this work.
4.4. Ablation Studies
Comprehensive ablation studies are conducted on the target HSI datasets to assess the specifically designed modules in GDAFSL and evaluate their contributions to HSI classification. In this article, an ordinary residual CNN, which has a structure similar to DRHAN but uses plain 3 × 3 convolutional layers for feature extraction, is combined with the FSL method as the baseline HSI classification method.
4.4.1. The Impact of DRHAN
To evaluate the feature representation ability of the proposed feature extractor DRHAN, we gradually add GAMM and LAEM to the baseline and conduct experiments on the three target datasets under the GDAFSL framework. Table 4 presents the experimental results, with the best classification results shown in bold. Lacking any specifically designed modules, the baseline achieves the worst classification performance of all the combinations. The combination "GAMM + LAEM", i.e., the full DRHAN, achieves the best classification results on the three target datasets. Using GAMM or LAEM alone cannot match this performance, and LAEM alone performs better than GAMM alone.
4.4.2. The Effectiveness of GDAN
The domain adaptation strategy plays an important role in our GDAFSL method. To assess the contribution of GDAN to HSI classification, comprehensive ablation studies are executed on the IP, UP, and SV datasets. Specifically, GDAN is first compared with an ordinary FSL framework without any domain adaptation strategy and with FSL using the conditional domain adversarial network (CDAN) [47] to verify its effectiveness. Then, to further explore the effectiveness of the proposed KL-divergence-based message aggregation, mean-based and cosine-similarity-based message aggregation are compared with our method. To ensure fair and consistent experiments, DRHAN is adopted as the feature extractor in all of the above methods. Table 5 reports the experimental results. All domain-adaptation-based FSL variants achieve higher OAs than ordinary FSL, and the KL-divergence-based GDAN achieves the highest OAs. It is worth noting that the cosine-similarity-based GDAN achieves the worst classification performance among the four domain-adaptation-based FSL variants. We consider that cosine similarity is insufficient for modeling the correlations between samples with non-Euclidean structural characteristics, which negatively impacts cross-domain classification performance.
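To make the compared aggregation schemes concrete, the sketch below shows one plausible way to build a domain-similarity graph from pairwise symmetric KL divergence between softmax-normalized node features, alongside the cosine-similarity baseline. The exact formulation used in GDAN is not given in this section, so the softmax normalization and the exp(-d) weighting are assumptions.

```python
import torch
import torch.nn.functional as F

def kl_edge_weights(feats, eps=1e-8):
    """Build a domain-similarity graph from pairwise symmetric KL divergence.

    feats: (N, D) node features from source and target samples. Features are
    softmax-normalized into distributions; converting divergence to a weight
    via exp(-d) is an illustrative choice, not the paper's exact formula.
    """
    p = F.softmax(feats, dim=1).clamp_min(eps)       # (N, D) distributions
    logp = p.log()
    # KL(p_i || p_j) = sum_k p_i[k] * (log p_i[k] - log p_j[k]).
    cross = p @ logp.t()                             # cross[i, j] = sum_k p_i[k] log p_j[k]
    self_ent = (p * logp).sum(dim=1, keepdim=True)   # sum_k p_i[k] log p_i[k]
    kl = self_ent - cross                            # (N, N) matrix of KL(p_i || p_j)
    sym_kl = kl + kl.t()                             # symmetric divergence
    return torch.exp(-sym_kl)                        # small divergence -> large edge weight

def cosine_edge_weights(feats):
    """Cosine-similarity graph used as the comparison baseline."""
    normed = F.normalize(feats, dim=1)
    return normed @ normed.t()
```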
4.4.3. The Contribution of Different Modules in GDAFSL
To further verify the contribution of each module in GDAFSL, the proposed modules are gradually added to the baseline. As presented in Table 6, lacking the specifically designed modules, the baseline achieves poor classification performance, which improves significantly as the modules are added. Notably, all combinations containing GDAN ("GDAN", "GDAN + GAMM", "GDAN + LAEM", and "GDAN + GAMM + LAEM") achieve higher OAs than the corresponding combinations without it ("Baseline", "GAMM", "LAEM", and "GAMM + LAEM"), which demonstrates that GDAN not only enhances classification performance on its own but also improves it jointly with the other modules. Moreover, the combination "GDAN" alone achieves competitive classification results, even better than "GAMM + LAEM", which indicates that the domain adaptation module contributes more than the feature extraction modules to cross-domain FSL; developing an effective domain adaptation method is thus of great significance for cross-domain few-shot HSI classification.
4.5. Comparative Experimental Results
To evaluate the classification performance of GDAFSL, several state-of-the-art methods are selected for comparative experiments, including an ordinary SVM, SSRN [16], RSAKGN [26], DFSL + NN [30], DFSL + SVM [30], DCFSL [42], and FSL-EGNN [41].
Table 7, Table 8 and Table 9 report the quantitative results obtained by the different HSI classification methods on the three HSI datasets. Compared with the machine learning method (SVM), the deep learning methods (SSRN and RSAKGN) achieve more satisfactory classification performance on all three target datasets; we attribute the higher accuracy mainly to the capability of deep models to extract discriminative semantic features. Compared with the deep learning methods, the FSL methods (DFSL + NN, DFSL + SVM, DCFSL, and FSL-EGNN) and our GDAFSL achieve even better performance with limited training samples, which demonstrates the effectiveness of FSL for learning transferable knowledge.
Among the abovementioned FSL methods, the proposed GDAFSL achieves the best classification performance, with OAs of 76.26%, 87.14%, and 93.48% on the IP, UP, and SV datasets, respectively. Compared with the FSL methods without domain adaptation (DFSL + NN and DFSL + SVM), the domain adaptation FSL methods (DCFSL and GDAFSL) deliver better classification performance. Owing to the addition of GDAN to explore domain similarities, GDAFSL achieves the most satisfactory results: its OAs are 9.45%, 3.49%, and 4.14% higher than those of DCFSL on the IP, UP, and SV datasets, which demonstrates the effectiveness of GDAN. Compared with FSL-EGNN, which achieves the second-best classification performance, GDAFSL performs better by virtue of its specifically designed domain adaptation module: its OAs are 8.27%, 1.66%, and 3.09% higher than those of FSL-EGNN on the IP, UP, and SV datasets, which demonstrates the superiority of the proposed GDAFSL.
It is worth noting that GDAFSL achieves a more significant improvement in classification performance on the IP dataset than on the UP and SV datasets. This is mainly because GDAFSL significantly improves the accuracy of hard-to-classify classes that other methods easily misclassify with extremely low accuracy, such as class 3 (Corn-mintill), class 10 (Soybean-notill), class 11 (Soybean-mintill), and class 15 (Buildings-Grass-Trees-Drives).
To further evaluate the classification performance of GDAFSL intuitively, the classification maps of the three target datasets generated by the various HSI classification methods are presented in Figure 14, Figure 15 and Figure 16. GDAFSL generates more precise and consistent classification maps with fewer misclassified pixels. Compared with the reference methods, GDAFSL not only significantly increases the intra-class consistency in homogeneous regions but also reduces inter-class confusion in border areas. These classification maps demonstrate the superiority of GDAFSL from a more intuitive perspective.