Article

Meta-Learning-Integrated Neural Architecture Search for Few-Shot Hyperspectral Image Classification

1 Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, Harbin University of Science and Technology, Harbin 150080, China
2 School of Integrated Circuit, Shenzhen Polytechnic University, Shenzhen 518115, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(15), 2952; https://doi.org/10.3390/electronics14152952
Submission received: 11 June 2025 / Revised: 15 July 2025 / Accepted: 22 July 2025 / Published: 24 July 2025

Abstract

In order to address the limited number of labeled samples in practical classification scenarios, and the overfitting and insufficient generalization that Few-Shot Learning (FSL) suffers from in hyperspectral image classification (HSIC), this paper designs and implements a meta-learning-integrated neural architecture search (NAS) method for few-shot HSI classification. Firstly, a multi-source domain learning framework is constructed that integrates heterogeneous natural images and homogeneous remote sensing images to broaden the information available for few-shot learning, enabling the final network to improve its generalization ability under limited labeled samples by learning the similarities between different data sources. Secondly, by constructing an accurate and robust search space and deploying different units at different positions, both the classification accuracy and the transfer robustness of the final network are improved. The method fully utilizes the spatial texture information and rich category information of multi-source data, and transfers the learned meta-knowledge to the optimal architecture for HSIC through the accurate and robust search space design, thereby accomplishing HSIC tasks with limited samples. Experimental results show that the proposed method achieved an overall accuracy (OA) of 98.57%, 78.39%, and 98.74% on the Pavia Center, Indian Pines, and WHU-Hi-LongKou datasets, respectively, demonstrating that the learned meta-knowledge is effectively transferred to the optimal architecture and that classification with few-shot samples is achieved.

1. Introduction

Among various remote sensing technologies, the hyperspectral image (HSI) originates from the ground-object imaging technology developed in aerospace, achieving long-range, detailed imaging of ground objects [1]. This imaging capability enables HSI to simultaneously capture the spatial distribution characteristics [2,3], intensity characteristics, and material fingerprint spectral characteristics of a target, forming a unique triple-information fusion advantage [4]. Therefore, HSI, which contains rich spatial and spectral information, has deep research value in precision agriculture [5], land cover analysis [6], marine hydrological detection [7], geological exploration [7], and other fields.
This paper integrates heterogeneous natural images and homogeneous remote sensing images within a multi-source domain learning framework. Here, "homogeneous" refers to the relative consistency of remote sensing images in terms of data generation logic, feature distribution patterns, and acquisition conditions, in contrast to the "heterogeneity" of natural images. In terms of imaging mechanisms, remote sensing images are usually captured by specific sensors at fixed altitudes and within preset spectral ranges, with relatively stable imaging parameters and little interference from manual shooting. Natural images, in contrast, mostly come from devices such as ordinary cameras and mobile phones, with significant differences in shooting angle, lighting, distance, and device model, resulting in a far less uniform imaging mechanism (i.e., they are "heterogeneous"). Regarding the regularity of feature distributions, the underlying features of remote sensing images (such as spectral reflectance and texture structure) often conform to the physical properties of the corresponding ground objects (for example, vegetation has strong near-infrared reflectance, and water bodies have smooth textures), leading to more concentrated feature distributions and more stable noise patterns, whereas the randomness of scenes, objects, and shooting conditions makes the feature distributions of natural images far more scattered. This is the distinction drawn between homogeneous and heterogeneous images.
Hyperspectral image classification (HSIC) is a key direction in HSI research, with the core goal of accurately extracting discriminative features from high-dimensional data to address the classification of surface cover in complex scenes. In the era of deep learning (DL) dominated by convolutional neural networks (CNNs) [8,9,10], end-to-end feature extraction has effectively improved the performance of HSIC tasks. Manually designed models such as the Spectral-Spatial Residual Network (SSRN) [11] and the Attention-Based Adaptive Spectral-Spatial Kernel ResNet (A2S2K-ResNet) [12] have achieved satisfactory classification performance. However, these models require a large number of labeled samples. In practical application scenarios, HSI data collection is costly: scenes contain tens of thousands of pixels, making large quantities of labeled samples expensive and difficult to obtain [13]. Therefore, under resource constraints, the scarcity of labeled samples poses a challenge to HSIC. Although unsupervised and semi-supervised learning have made progress, these methods require complete model retraining on the data, which incurs significant resource overhead.
To address the challenge of resource constraints, meta-learning methods based on few-shot learning (FSL) have been proposed and applied to HSIC. These methods learn transferable knowledge from a source domain with abundant labeled samples, allowing the model to learn how to learn. The model trained on the source domain is then transferred to the target domain and, combined with the limited labeled samples there, used to classify the unlabeled samples.
In 2019, Chen et al. [14] proposed a one-dimensional automatic neural network, the first application of neural architecture search (NAS) with the DARTS search strategy to HSI classification, automating the design of classification models. Subsequently, Zhang et al. [15] designed 3-D asymmetric NAS, a pixel-to-pixel HSI classification architecture in which all operations are performed within the three-dimensional architecture of a hierarchical search space and the network width is adaptively adjusted to the characteristics of different HSIs. Xue et al. [16] proposed an HSI classification method that combines automatically designed convolutional neural networks with Transformers, the first to combine NAS and Transformers for HSI classification tasks. Cao et al. [17] proposed a lightweight multi-scale NAS-based hyperspectral classification method with a new lightweight and efficient search space that reduces the number of model parameters, achieving high classification performance at low computational cost. In 2024, Xiao et al. [18] first explored the application of few-shot learning with NAS to HSI classification, automatically designing HSIC embedding extractors under limited labeled samples.
In this context, NAS can effectively reduce manual intervention and save parameter-tuning time by automatically designing network structures, providing a more efficient and accurate approach to network design and optimization for HSIC research with limited samples. Therefore, to further explore the application of NAS in FSL, this paper proposes a few-shot hyperspectral image classification method combined with meta-learning. The contributions of this article are as follows:
  • We explored the application of NAS to few-shot hyperspectral classification tasks, constructing a multi-source domain learning framework combined with meta-learning that improves the richness of the learnable meta-knowledge.
  • By designing an accurate and robust search space with attention convolutions, the automatic design of the HSIC feature-extractor architecture under limited samples was achieved. The optimal accurate and robust units are deployed at different positions of the architecture, ensuring that the architecture maintains both classification accuracy and transfer robustness on HSIC.
  • Within the search space, an attention convolution operator is proposed that combines efficient attention mechanisms with depthwise separable convolutions to enhance the discriminative feature extraction capability of the optimal architecture while maintaining the efficiency of the convolution.
  • By combining focal loss and label-distribution-aware margin loss, the optimal architecture effectively improves the classification performance of the model on imbalanced samples.

2. Materials and Methods

2.1. Overall Framework of MLFS-NAS

This paper proposes Meta-Learning-Integrated NAS for Few-Shot HSI Classification (MLFS-NAS) for HSIC tasks with limited labeled samples. The workflow of MLFS-NAS is shown in Figure 1; it consists of three main parts: a NAS stage that searches for the optimal embedded feature-extractor architecture, a training stage over the multi-source domains and the labeled target domain, and a testing stage on the unlabeled target domain. A support set $S$ and a query set $Q$ with disjoint samples are drawn from the source-domain datasets and the labeled target-domain dataset.
Firstly, the natural image dataset Mini ImageNet (MI) and the hyperspectral dataset Chikusei (CK) are used to jointly construct a multi-source domain on which a NAS supernet (i.e., a network containing all possible candidate architectures within a predefined search space) is built; the optimal unit architecture of the feature extractor is then found through a differentiable search strategy and used to construct the final network. Next, a large number of labeled samples from the multi-source domains and a limited number of labeled samples from the target-domain HSI dataset (the Pavia Center, Indian Pines, or WHU-Hi-LongKou dataset) are used for meta-learning, with alternating optimization fine-tuning the parameters of the final network, yielding an optimal architecture suited to the unlabeled samples of the target-domain HSIC. Finally, the optimal architecture is transferred to classify the unlabeled samples. An accurate and robust search space is constructed to improve the accuracy and robustness of the architecture, ensuring both stability and classification accuracy during transfer, and depthwise separable convolution operators for the internal search space are proposed by combining adaptive fine-grained channel attention (FCA) and Mixed Local Channel Attention (MLCA) to improve the final network's ability to learn features.

2.2. Few-Shot Sample Learning of Multi-Source Domain and Target Domain

In the method of this paper, an FSL framework is established over the multi-source domain and the target domain. Each dataset in the multi-source domain is denoted by $D_u$, where $u$ indexes the source domain and ranges from 1 to 2, and each dataset contains $C_u$ categories. In each episode, $c$ categories are randomly sampled from the source-domain datasets, and for each category, $k$ labeled samples are extracted as the support set $S$. Similarly, $n$ labeled samples are randomly sampled from the same $c$ categories as the query set $Q$, excluding the samples already in the support set. The support set of the multi-source domain is therefore denoted as $S_u = \{(x_i, y_i)\}_{i=1}^{c \times k}$, and the query set is defined as $Q_u = \{(x_j, y_j)\}_{j=1}^{c \times n}$. The dataset $D_t$ of the target domain is divided into a few-shot dataset with labeled samples and a test dataset with unlabeled samples. Similarly, its support set is defined as $S_t = \{(x_i, y_i)\}_{i=1}^{c \times k}$ ($k$ labeled samples, with $c$ sampled classes in each episode), and its query set is defined as $Q_t = \{(x_j, y_j)\}_{j=1}^{c \times n_t}$ ($n_t$ samples from the same classes).
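As a concrete illustration of this episodic sampling, the following minimal Python sketch draws disjoint support and query indices for one episode; the function name sample_episode and its interface are illustrative, not taken from the authors' code.

import numpy as np

def sample_episode(labels, c, k, n, seed=None):
    """Draw disjoint support/query index sets for one c-way k-shot n-query episode.
    labels: 1-D integer class labels of a source-domain dataset."""
    rng = np.random.default_rng(seed)
    classes = rng.choice(np.unique(labels), size=c, replace=False)
    support, query = [], []
    for cls in classes:
        idx = rng.permutation(np.nonzero(labels == cls)[0])
        support.extend(idx[:k])        # k labeled samples per class -> support set S
        query.extend(idx[k:k + n])     # n further samples per class -> query set Q (disjoint from S)
    return np.asarray(support), np.asarray(query)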
After the network search, the training process is carried out through episode-based meta-training that alternates between the source datasets and the target dataset, which supports collaborative learning of the discriminative embedding space shared by the source and target domains. Firstly, three mapping layers, $M_1$, $M_2$, and $M_3$, are used to equalize the input dimensions of the multi-source domain datasets and the target-domain dataset; the data is then input into Finalnet. The feature output of a mapping layer is $F' = F \times M$, where the input $F \in \mathbb{R}^{H \times W \times ch}$, $M \in \mathbb{R}^{ch \times 100}$, and $ch$ is determined by the number of input spectral bands [18].

$$f_u^s = \mathrm{Finalnet}(M_u(S_u)), \qquad f_u^q = \mathrm{Finalnet}(M_u(Q_u))$$

where $u \in \{1, 2, t\}$ is determined by the dataset of each domain.
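Since each mapping layer is a linear projection over the spectral axis, it can be realized as a 1 x 1 convolution. The sketch below assumes PyTorch (the framework used in Section 3.2); the band counts passed to the constructors follow the dataset descriptions in this paper, and everything else is illustrative.

import torch
import torch.nn as nn

class MappingLayer(nn.Module):
    """F' = F x M with M in R^{ch x 100}, realized as a 1 x 1 convolution."""
    def __init__(self, in_bands, out_dim=100):
        super().__init__()
        self.proj = nn.Conv2d(in_bands, out_dim, kernel_size=1, bias=False)

    def forward(self, x):              # x: (B, ch, H, W)
        return self.proj(x)            # (B, 100, H, W), a shared input dimension for Finalnet

# One mapping layer per domain (M1: MI, M2: CK, M3: a target HSI such as PC):
m1, m2, m3 = MappingLayer(3), MappingLayer(128), MappingLayer(102)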
In order to address the problem of class imbalance, this paper employs the focal loss (FL) [19] and the label-distribution-aware margin (LDAM) loss [20] jointly for few-shot learning in the HSIC task. The total loss function is

$$L_{FSL}^u = L_{LDAM}^u + L_{FL}^u, \qquad u \in \{1, 2, t\}$$

where $L_{LDAM}$ represents the label-distribution-aware margin loss and $L_{FL}$ represents the focal loss.
In the test phase, classification is carried out by minimizing the distance between a sample $x_q$ in the query set $Q$ and the samples in the support set $S$:

$$p(x_q) = \arg\min_{x_i \in S} d\left(f_\theta(x_q), f_\theta(x_i)\right)$$

where $x_q$ represents an unlabeled sample in the query set and $p(x_q)$ represents its predicted class label.

$$L_{FL} = -\sum_{x_q \in Q} \sum_{c=1}^{C} (1 - y_c)^{\gamma} \log\left(\hat{p}(x_q)\right)$$

where $C$ represents the number of classes within the domain; $y_c$ represents the true label distribution of $x_q$; $\hat{p}(x_q)$ represents the predicted probability distribution; and $\gamma$ is a hyperparameter whose value ranges within $[0, 5]$.

$$L_{LDAM}\left((x, y), f\right) = -\log \frac{u}{u + \sum_{j \neq y} e^{z_j}}, \qquad u = e^{z_y - p_y}$$

$$p_j = \frac{C}{n_j^{1/4}}, \qquad j \in \{1, 2, \ldots, k\}$$

where $z_y$ represents the model output for the sample $x$ with label $y$; $z_j$ represents the output of this sample for the $j$th class; and $n_j$ represents the number of samples in class $j$.
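A minimal sketch of this combined loss, following the cited formulations of focal loss [19] and LDAM [20], is given below; the hyperparameter values and function names are illustrative, not the authors' exact implementation.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # Down-weight well-classified samples by the factor (1 - p)^gamma.
    log_p = F.log_softmax(logits, dim=1)
    p_true = log_p.gather(1, targets.unsqueeze(1)).squeeze(1).exp()
    return (-(1.0 - p_true) ** gamma * p_true.log()).mean()

def ldam_loss(logits, targets, class_counts, scale=1.0):
    # Enlarge the margin of rare classes: margin_j is proportional to 1 / n_j^(1/4).
    margins = 1.0 / class_counts.float() ** 0.25
    margins = margins / margins.max()
    shifted = logits.clone()
    shifted[torch.arange(len(targets)), targets] -= scale * margins[targets]
    return F.cross_entropy(shifted, targets)

def fsl_loss(logits, targets, class_counts):
    # Total loss L_FSL = L_LDAM + L_FL, applied per domain u.
    return ldam_loss(logits, targets, class_counts) + focal_loss(logits, targets)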

2.3. Accurate and Robust Search Space

In the ARNAS method [21], guiding conclusions for NAS were established, demonstrating that the depth and width of unit structures at different positions in a neural architecture have different effects on accuracy and robustness: by placing different unit structures at different positions, the accuracy and robustness of the architecture can be improved simultaneously. However, most existing search spaces are composed of normal units and reduction units, with multiple normal units stacked to maximize classification accuracy and reduction units used to suppress invalid data. The final architecture is therefore composed mainly of identical normal units, which limits its accuracy and robustness.

2.3.1. Design of Accurate and Robust Search Space

This paper constructs an accurate and robust search space for few-shot HSIC. Reduction units are retained to prevent invalid data from interfering with the architecture, while accurate and robust units replace the single type of normal unit, improving the accuracy and robustness of the neural architecture by placing different unit structures at different positions. The accurate unit and the robust unit return feature maps of the same dimensions but are placed at different positions of the architecture to gain classification accuracy and transfer robustness, respectively. The names reflect their composition: accurate units tend to contain more attention separable convolution operators internally, letting the model learn more parameters and improve classification performance, whereas robust units tend to use dilated convolutions, which fix certain weights to zero so that input perturbations can hardly alter the output, giving stronger robustness. The complete structure is shown in Figure 2.
The reduction units are placed at one-third and two-thirds of the depth of the architecture; accurate units are placed before the second reduction unit and robust units after it. This design exploits the strong correlation between a unit's performance and its position in the architecture, emphasizing accuracy in the front of the network and robustness in the back (a layout sketch is given below). The NAS then automatically searches for the optimal combination of operations and connections within the accurate and robust units. Operations between nodes in an accurate unit tend toward more separable convolutions, fewer dilated convolutions, and skip connections, while operations between nodes in a robust unit tend toward more dilated convolutions, fewer separable convolutions, and no skip connections.
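The placement rule can be summarized by a short sketch; the cell names below are placeholders for the searched structures, and the layout (reduction at one-third and two-thirds of the depth, accurate cells before the second reduction, robust cells after) follows the description above.

def build_cell_sequence(num_cells):
    """Return the cell type at each depth of the final architecture."""
    r1, r2 = num_cells // 3, 2 * num_cells // 3   # reduction-cell positions
    layout = []
    for i in range(num_cells):
        if i in (r1, r2):
            layout.append("reduction")
        elif i < r2:
            layout.append("accurate")   # accuracy-oriented front of the network
        else:
            layout.append("robust")     # robustness-oriented back of the network
    return layout

print(build_cell_sequence(9))
# ['accurate', 'accurate', 'accurate', 'reduction', 'accurate',
#  'accurate', 'reduction', 'robust', 'robust']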

2.3.2. Internal Design of Search Space

This paper constructs the unit structures of the accurate and robust search space, in which the design of the internal operators is a key factor determining unit performance. Previous CNN methods rely too heavily on local spatial feature extraction, while existing channel attention mechanisms (such as SE modules) use static weighting by fully connected layers, severing the dynamic correlation between spatial details and the global spectral response and making it difficult to fully exploit the ground features and complex spectral-spatial information in hyperspectral data [22].
To this end, as shown in Figure 3, this paper designs three convolution operators to construct the search space. Dilated convolution (Dilated_Conv) expands the receptive field through multi-level dilation rates and captures global contextual information without increasing the number of parameters. The Adaptive Fine-Grained Channel Attention Depthwise Separable Convolution (FCA_SepConv) combines depthwise separable convolution with adaptive fine-grained channel attention to capture the relationship between global and local information at different granularities, achieving a dynamic correlation between local spatial details and the global spectral response and thereby improving feature selection efficiency. The Mixed Local Channel Attention Depthwise Separable Convolution (MLCA_SepConv) achieves fine fusion of spectral-spatial features through multi-scale local feature interaction and channel-adaptive weighting. The search space thus achieves joint spectral-spatial modeling through the global perception of dilated convolution, the local refinement of FCA_SepConv, and the multi-scale interactive collaboration of MLCA_SepConv. Designing these three operators in conjunction with the accurate and robust search space makes it possible to search for a final network that balances classification performance and transfer robustness.
(1) Fine-Grained Channel Attention Depthwise Separable Convolution
In order to effectively integrate the global and local information of hyperspectral data, this paper constructs the FCA_SepConv operator by incorporating the adaptive fine-grained channel attention (FCA) [23] mechanism. The FCA creates correlation matrices through cross-correlation operations to capture the relationships between global and local information at different granularities. This enhances the interaction between global and local information and enables a more precise division of their correlations at each granularity level. Finally, by constructing a trainable parameter that dynamically merges global and local information, the FCA achieves adaptive allocation of channel weights, thereby improving feature extraction capability. The architecture of the FCA is shown in Figure 4.
The core idea of the FCA is to achieve multi-scale feature interaction and adaptive allocation of channel attention through global-local contrastive modeling, thereby enhancing feature representation. Firstly, in order to obtain global information, the feature map $F$ containing global spatial information is transformed into a channel descriptor $U$ through global average pooling. Given the feature map $F \in \mathbb{R}^{C \times H \times W}$, where $C$, $H$, and $W$ represent the number of channels, height, and width, respectively, global averaging generates the channel descriptor $U \in \mathbb{R}^{C}$, whose $n$th element is expressed as

$$U_n = GAP(F_n) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} F_n(i, j)$$

where $F_n(i, j)$ is the value of the $n$th channel of the feature map at position $(i, j)$, and $GAP(\cdot)$ is the global average pooling function, which transforms the shape of $F$ from $C \times H \times W$ to $C \times 1 \times 1$.
In order to obtain local channel information while ensuring a small number of model parameters, a band matrix $B = [b_1, b_2, \ldots, b_k]$ is used for local channel interaction:

$$U_{lc} = \sum_{i=1}^{k} U \cdot b_i$$

where $U$ is the channel descriptor, $U_{lc}$ represents the local information, and $k$ represents the number of adjacent channels. In order to obtain the global channel information and enhance the ability to represent global information, a diagonal matrix $D$ is utilized to capture the dependencies among all channels as global information.
$$U_{gc} = \sum_{i=1}^{C} U \cdot d_i$$

where $U_{gc}$ represents the global information, $C$ is the number of channels, and the diagonal matrix is $D = [d_1, d_2, d_3, \ldots, d_C]$.
In order to promote effective interaction between global information and local information, the global information obtained through the diagonal matrix is combined with the local information obtained through the band matrix. The correlation between the two is captured at different granularities through a cross-correlation operation:

$$M = U_{gc} \cdot U_{lc}^{T}$$

where $M$ represents the correlation matrix.
In order to assign feature weights accurately and reduce the computational complexity, an adaptive fusion strategy is adopted. Row and column information is extracted from the correlation matrix and transposed into weight vectors of the global and local information, and dynamic fusion is achieved through a learnable factor; the process is expressed as follows:

$$U_{gc}^{w} = \sum_{j}^{c} M_{i,j}, \qquad i \in \{1, 2, \ldots, c\}$$

$$U_{lc}^{w} = \sum_{j}^{c} \left(U_{lc} \cdot U_{gc}^{T}\right) = \sum_{j}^{c} M_{i,j}^{T}$$

$$W = \sigma\left(\sigma(\theta) \times \sigma(U_{gc}^{w}) + (1 - \sigma(\theta)) \times \sigma(U_{lc}^{w})\right)$$

where $U_{gc}^{w}$ and $U_{lc}^{w}$ are the fused global channel weight and local channel weight, respectively; $\sigma$ is the Sigmoid activation function; and $\theta$ is the learnable fusion factor.
Through this method, redundant cross-correlation between local and global information is effectively avoided, while the interaction between the two is further promoted. The FCA selectively emphasizes informative features, suppresses useless ones, and achieves a more precise weight allocation for denoising the relevant features. Finally, the obtained weights are multiplied with the input feature map:

$$F^{*} = W \otimes F$$

where $F$ is the input feature map and $F^{*}$ is the feature map output by the FCA.
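Putting the above equations together, the FCA branch can be sketched compactly as follows. Here the band matrix is realized as a 1-D convolution over k adjacent channels and the diagonal matrix as a per-channel weight; this is one reading of the equations above, and the parameterization in the cited FCA work [23] may differ in detail.

import torch
import torch.nn as nn

class FCA(nn.Module):
    def __init__(self, channels, k=3):
        super().__init__()
        self.local = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)  # band matrix B
        self.diag = nn.Parameter(torch.ones(channels))                           # diagonal matrix D
        self.theta = nn.Parameter(torch.zeros(1))                                # learnable fusion factor

    def forward(self, f):                             # f: (B, C, H, W)
        b, c, _, _ = f.shape
        u = f.mean(dim=(2, 3))                        # GAP -> channel descriptor U, (B, C)
        u_lc = self.local(u.unsqueeze(1)).squeeze(1)  # local channel information U_lc
        u_gc = u * self.diag                          # global channel information U_gc
        m = u_gc.unsqueeze(2) * u_lc.unsqueeze(1)     # correlation matrix M, (B, C, C)
        w_gc = torch.sigmoid(m.sum(dim=2))            # row-wise global channel weights
        w_lc = torch.sigmoid(m.sum(dim=1))            # column-wise local channel weights
        t = torch.sigmoid(self.theta)
        w = torch.sigmoid(t * w_gc + (1.0 - t) * w_lc)  # adaptive fusion W
        return f * w.view(b, c, 1, 1)                 # F* = W (x) F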
(2) Mixed Local Channel Attention Depthwise Separable Convolution
Due to the complexity and high computational cost of spatial attention modules, it is difficult to integrate them directly into lightweight convolutions. Even simple attention methods that successfully reduce model parameters and include spatial and channel information tend to exclude local information and provide only long-range information over the entire range. This article adopts the Mixed Local Channel Attention (MLCA) [24] module, which combines channel attention and spatial attention according to HSIC requirements, to improve the performance of depthwise separable convolution. The module addresses the tendency of existing attention mechanisms to ignore spatial feature information and improves the model's expressive power while remaining lightweight. The principle of MLCA_SepConv is shown in Figure 5.
The MLCA mechanism first performs block processing, converting the input feature vector into a vector of shape $1 \times C \times k_s \times k_s$, and extracts local spatial information through a first local pooling, where $k_s$ represents the number of blocks and determines the block size. In the initial stage, two branches convert the input into one-dimensional vectors: the first branch contains global information and the second contains local spatial information. After one-dimensional convolution, the original resolution of the two vectors is restored through the un-average pooling operation (UNAP), and the information is then fused to achieve mixed attention. Conv1d in the figure is a one-dimensional convolution that processes the vectors of the two branches. The size of the convolution kernel $k$ is proportional to the channel dimension $C$, which means that when capturing local cross-channel interaction information, only the relationship between each channel and its $k$ adjacent channels is considered. The selection of $k$ is determined by Formula (15):

$$k = \phi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{odd}$$

where $C$ is the number of channels; $k$ is the size of the convolution kernel; both $\gamma$ and $b$ are hyperparameters with default values of 2; and $|\cdot|_{odd}$ means that $k$ must be odd (if $k$ is even, 1 is added to it).
Figure 6 illustrates the main processes of global average pooling (GAP), local average pooling (LAP), and un-average pooling (UNAP). GAP is adaptive average pooling with an output size of 1, which reduces the feature map to $1 \times 1$. When multiplication or addition with the source input is necessary, Expand or UNAP is used for expansion; UNAP preserves the attributes of the map while extending it to the required size, and can be implemented as adaptive pooling whose output size equals that of the source feature map. LAP differs from GAP in that it divides the feature map into $k_s \times k_s$ patches and performs average pooling on each patch, which can be executed as $k_s \times k_s$ adaptive average pooling. When expanding the LAP output, the size is not $1 \times 1$, so the Expand operation cannot be used directly; instead, UNAP must be used to restore the feature map to its original size.
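The adaptive kernel size of Formula (15) and the two pooling branches can be sketched as follows; UNAP is approximated here with nearest-neighbor interpolation back to the source size, and the fusion of the two branches is one plausible reading of Figure 5, not the authors' exact implementation.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def adaptive_kernel_size(channels, gamma=2, b=2):
    k = int(abs(math.log2(channels) / gamma + b / gamma))  # Formula (15)
    return k if k % 2 == 1 else k + 1                      # force k to be odd

class MLCA(nn.Module):
    def __init__(self, channels, ks=2):
        super().__init__()
        self.ks = ks                                  # number of local blocks per side
        k = adaptive_kernel_size(channels)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                             # x: (B, C, H, W)
        b, c, h, w = x.shape
        g = F.adaptive_avg_pool2d(x, 1).view(b, 1, c)             # GAP branch
        g = torch.sigmoid(self.conv(g)).view(b, c, 1, 1)          # global channel attention
        l = F.adaptive_avg_pool2d(x, self.ks)                     # LAP branch, (B, C, ks, ks)
        lv = l.permute(0, 2, 3, 1).reshape(-1, 1, c)              # one channel vector per patch
        lv = torch.sigmoid(self.conv(lv))                         # shared 1-D conv per patch
        l = lv.reshape(b, self.ks, self.ks, c).permute(0, 3, 1, 2)
        l = F.interpolate(l, size=(h, w), mode="nearest")         # UNAP: restore resolution
        return x * g * l                                          # fused mixed attention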
The overall algorithm flow of the proposed MLFS-NAS is given as Algorithm 1.
Algorithm 1 MLFS-NAS
Initialization and Data Preparation:
1. source_domains = [Domain_D1, Domain_D2] //Source domain data
2. target_domain_labeled = Domain_Dt_labeled //Labeled data of the target domain
3. target_domain_unlabeled = Domain_Dt_unlabeled //Unlabeled data of the target domain
Stage 1: Supernet Architecture Search:
1. SuperNet = InitializeSuperNet() //Initialize the supernet
2. for epoch = 1 to SUPERNET_EPOCHS do
  for each domain in source_domains + [target_domain_labeled] do
    S_samples, Q_samples = SplitIntoSupportQuery(domain) //Split into support set and query set
    features = SuperNet(S_samples, Q_samples) //Supernet feature extraction
    loss = Loss(features) //Calculate loss
    UpdateSuperNet(SuperNet, loss) //Update supernet parameters
  end for
end for
Stage 2: Optimal Architecture Extraction and Final Network Construction:
1. EpisodeOptimizer = InitializeOptimizer(FinalNet) //Initialize the final network optimizer
2. for episode = 1 to EPISODES do
  episode_data = SampleEpisodeData(source_domains, target_domain_labeled) //Sample episode data
  for each batch in episode_data do
    M_samples, other_samples = SplitBatch(batch) //Split into M set and other sets
    features = FinalNet(M_samples, other_samples) //Final network feature extraction
    loss = Loss(features) //Calculate loss
    UpdateFinalNet(FinalNet, loss, EpisodeOptimizer) //Update final network parameters
  end for
end for
Stage 3: Transfer Application to Unlabeled Data in the Target Domain:
1. NNClassifier = InitializeClassifier() //Initialize the classifier
2. for each sample in target_domain_unlabeled do
  features = FinalNet(sample) //Feature extraction
  prediction = NNClassifier(features) //Classification prediction
end for
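Stage 3 reduces to nearest-neighbor matching in the learned embedding space, implementing $p(x_q) = \arg\min_{x_i \in S} d(f_\theta(x_q), f_\theta(x_i))$ from Section 2.2. A minimal sketch, with final_net and the tensors as placeholders:

import torch

@torch.no_grad()
def classify_queries(final_net, support_x, support_y, query_x):
    """Assign each unlabeled query the label of its nearest support embedding."""
    f_s = final_net(support_x).flatten(1)   # (Ns, D) support embeddings
    f_q = final_net(query_x).flatten(1)     # (Nq, D) query embeddings
    dist = torch.cdist(f_q, f_s)            # pairwise Euclidean distances d(., .)
    return support_y[dist.argmin(dim=1)]    # predicted class labels p(x_q)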

3. Results

3.1. Dataset Description

The experimental datasets include a multi-source dataset with sufficient labeled samples and target-domain datasets with few labeled samples. For the multi-source dataset, this paper chooses Mini ImageNet (MI) and Chikusei (CK) to jointly form the multi-source domain. For the target domain, this paper uses three representative HSI datasets: the Pavia Center (PC), Indian Pines (IN), and WHU-Hi-LongKou (LK) datasets.
Mini ImageNet dataset: MI is a subset of the ImageNet dataset consisting of 100 categories with 600 images each, 60,000 images in total. Owing to its rich natural images, Mini ImageNet is widely used as a benchmark for meta-learning and few-shot learning research.
Chikusei dataset: CK is a hyperspectral image of Chikusei, Ibaraki, Japan, acquired by the Headwall Hyperspec-VNIR-C imaging sensor. The ground sampling distance is 2.5 m, the scene size is 2517 × 2335 pixels, and the data contain 19 categories and 128 bands.
The PC dataset was captured by the ROSIS sensor during a flight over Pavia, northern Italy, in 2001. The Pavia Center image has 102 spectral bands, 1096 × 1096 pixels, and a spatial resolution of 1.3 m. Nine categories are available for experimental research, as shown in Table 1.
The IN dataset was acquired by the AVIRIS sensor at the Indian Pines test site in northwestern Indiana in 1992. It consists of 16 crop categories, 145 × 145 pixels, and 224 spectral reflectance bands (200 effective bands), with a spatial resolution of 20 m, as shown in Table 2.
The WHU-Hi-LongKou dataset was acquired on 7 July 2018 in Longkou Town, Hubei Province, China, using an 8 mm focal-length Headwall Nano-Hyperspec imaging sensor mounted on a DJI Matrice 600 Pro (Shenzhen Dajiang Innovation Technology Co., Ltd., Shenzhen, China) drone platform. The study area is a simple agricultural scene with nine categories in total. The image size is 550 × 400 pixels, with 270 bands between 0.4 and 1 μm, and the spatial resolution of the drone-borne hyperspectral image is approximately 0.463 m, as shown in Table 3.

3.2. Experimental Environment Configuration and Implementation Details

All experiments were conducted under the following computer configuration: an Intel(R) Xeon(R) E5-2620 v4 CPU @ 2.10 GHz, 128 GB of RAM, and an NVIDIA GeForce RTX 2080 Ti graphics processing unit (GPU) (Shanghai Fenghu Information Technology Co., Ltd., Shanghai, China). The software environment is a 64-bit Windows 10 system with the open-source framework PyTorch 1.12.1. The Adam optimizer is used to optimize the architecture parameters with a learning rate of 0.004.
In this experiment, for the single-domain methods SSRN [11], the adaptive spectral-spatial kernel improved residual network A2S2K-ResNet [12], lightweight multi-scale neural architecture search (LMSS-NAS) [17], and the dual-view spectral and global spatial feature fusion network (DSGSF) [25], five labeled samples were randomly selected for each class in the target domain. For the multi-domain methods Deep Cross-Domain Few-Shot Learning (DCFSL) [26] and Heterogeneous Few-Shot Learning (HFSL) [27], an additional 200 labeled source-domain samples were randomly selected. The remaining samples in the target domain are retained as the test set. For each episode, the input size of HSI in the CK and target datasets is set to 9 × 9 × (number of bands) to keep the spatial size of the hyperspectral data consistent across cross-domain scenes. In addition, in MI, the input size of each image is set to 33 × 33 × 3 to exploit its rich spatial and texture information.
Specifically, the FSL task in each episode is c-way k-shot n-query, where c, k, and n represent the number of selected classes, the number of labeled samples per class in the support set, and the number of labeled samples per class in the query set, respectively. Usually, c is set to the number of classes in the target dataset, following the FSL methods previously applied to both the single-source and target datasets; for the target dataset, $k$ and $n_t$ are the same as in previous FSL methods. To ensure stable comparisons, each experiment was repeated ten times, and the average was taken as the final result. The search iteration count was set to 500 and the training iteration count to 20,000, with the Adam optimizer and a learning rate of 0.004.

3.3. Comparison of the Proposed Method with the State-of-the-Art Methods

In order to evaluate the performance of different methods on HSI classification tasks comprehensively and objectively, this section combines quantitative and qualitative analysis. For the quantitative evaluation, three indicators are used: overall accuracy (OA), average accuracy (AA), and the Kappa coefficient (K). The proposed method is validated through comparative experiments against SSRN and A2S2K-ResNet, based on manually designed convolutional residual networks; DSGSF, based on a dual-channel attention mechanism; LMSS-NAS, based on NAS; and DCFSL and HFSL, based on FSL, as shown in Table 4, Table 5 and Table 6.
As can be seen in Table 4, Table 5 and Table 6, among the non-FSL methods, SSRN (OA: PC 91.64%, IN 53.70%, and LK 87.48%) and A2S2K-ResNet (OA: PC 93.39%, IN 57.28%, and LK 88.42%) show lower classification performance; when training with limited labeled samples, manually designed convolutional neural networks have insufficient advantages for accurate classification. The NAS-based LMSS-NAS (OA: PC 96.85%, IN 60.96%, and LK 94.60%) improves on SSRN and A2S2K-ResNet and approaches the overall classification performance of DSGSF, but, owing to overfitting caused by limited samples, it lags behind the FSL methods.
Compared with the manually designed FSL methods DCFSL and HFSL, MLFS-NAS achieved the most competitive classification performance on all three datasets under the limitation of few labeled samples, through multi-source learning and the automatic search of a final network composed of optimal units. On the PC, IN, and LK datasets, the OA reached 98.57%, 78.39%, and 98.74%, respectively; compared with the suboptimal method HFSL, the OA improved by 2.33%, 8.78%, and 1.90%. In addition, on the IN dataset, the AA of MLFS-NAS (84.35%) was higher than that of HFSL (80.43%), and MLFS-NAS correctly classified more samples across categories, resulting in a higher Kappa coefficient (76.91%). Overall, MLFS-NAS abstracts the texture information of natural images and the spectral-spatial commonalities of hyperspectral data into transferable meta-knowledge through multi-source learning, and uses accurate and robust units at different positions to enhance classification accuracy and transfer robustness; it therefore achieved the best classification results on all three datasets.
In order to present the classification results more clearly, the results of the seven methods on the three hyperspectral datasets are visualized in Figure 7, Figure 8 and Figure 9. Compared with the other methods, MLFS-NAS evidently produces more accurate classification maps: SSRN and A2S2K-ResNet show more noise and scattered points in their classification maps, while LMSS-NAS, DCFSL, and HFSL still exhibit some misclassification. Comparison with the ground-truth maps shows that the proposed method obtains more accurate classification results, further proving the effectiveness of NAS and multi-source learning for few-shot hyperspectral classification.
Figure 10, Figure 11 and Figure 12 show the per-class accuracies of the different classification methods on the three datasets. As shown in the figures, MLFS-NAS achieves high accuracy in the vast majority of categories across the three datasets. For example, in the Bitumen and Tiles categories of the PC dataset, MLFS-NAS is significantly higher than the suboptimal method HFSL, and in the Sesame and Mixed weed classes of the LK dataset, MLFS-NAS also holds an advantage in classification accuracy over the other methods. The IN dataset, in particular, is prone to overfitting because of the imbalance in sample numbers between classes (e.g., Oats has only 20 samples), which makes classification challenging under limited samples; this is why non-FSL methods perform poorly on this class (e.g., SSRN reaches only 12.28% accuracy). The FSL methods, however, use meta-learning and implicit knowledge transfer to learn rapid generalization from a small number of samples during training, achieving 100% accuracy on this class. At the same time, MLFS-NAS effectively transfers the meta-knowledge abstracted from MI and CK to HSI through multi-source learning, and strengthens discriminative feature extraction for each class through the accurate and robust attention convolution operators in the search space, avoiding the feature flooding caused by sparse samples and achieving accurate classification on the IN dataset.

4. Discussion

4.1. Analysis of Optimal Cell Structure

This paper proposes a multi-source FSL framework based on the fusion of heterogeneous natural images (Mini ImageNet) and homogeneous remote sensing images (Chikusei). The optimal unit structures obtained on the source datasets after the search are shown in Figure 13. Operations between nodes within accurate units tend toward more separable convolutions, fewer dilated convolutions, and skip connections, while operations within robust units tend toward more dilated convolutions, fewer separable convolutions, and no skip connections. This design arises because more learnable parameters benefit accuracy, so accurate units select more separable convolution types; during the search, the robust units at the output lean toward dilated convolution, which fixes certain weights to zero and leaves fewer learnable parameters, so input perturbations can hardly alter the output, giving stronger robustness. The NAS automatically searches for the optimal accurate and robust unit structures in the search space and fully extracts the global and local information of HSI through its efficient convolution operators, further improving the classification accuracy of NAS in few-shot HSIC and ensuring robustness during transfer.

4.2. Analysis of Related Parameters

The results of the ablation experiment are shown in Table 7. When MI and CK are treated as independent single source domains, the MI single domain performs better than the CK single domain on the PC and LK datasets, while its classification performance on the IN dataset is slightly lower. This shows that in single-source domain learning, different source domains have their own advantages. When MI and CK are combined in multi-source learning, however, the learned multi-source information significantly improves the accuracy and generalization of HSI classification under limited labeled samples, achieving the best results on all three target-domain datasets, which further proves the effectiveness of multi-source learning.
This section also conducted experiments on MLFS-NAS to verify its robustness under limited labeled samples. In the experiment, one to five labeled samples were selected from each class of the target-domain dataset to validate the performance of the model. As shown in Figure 14, the classification performance of the proposed method improves steadily as the number of labeled samples increases. The experimental results indicate that MLFS-NAS effectively alleviates the risk of overfitting in low-sample scenarios through the final network and multi-source feature representation, demonstrating excellent information extraction efficiency and model robustness.

5. Conclusions

This paper proposes MLFS-NAS, which aims to improve the classification performance and generalization ability of the model with only a small number of labeled samples by incorporating meta-learning strategies. The method uses a multi-source learning framework that integrates heterogeneous natural images (Mini ImageNet) and homogeneous remote sensing images (Chikusei), enhancing generalization by allowing the model to learn the similarities between homogeneous and heterogeneous data. Meanwhile, by designing an accurate, robust, and efficient search space, different optimal unit structures can be deployed at different positions to improve the classification accuracy and transfer robustness of the final network. In order to effectively exploit the global and local information of hyperspectral data, an efficient internal search space composed of separable convolutions combined with the FCA and MLCA mechanisms is designed. The experimental results on the PC, IN, and LK datasets show that the proposed method effectively improves the accuracy of land cover classification under limited labeled samples.

Author Contributions

Conceptualization, A.W., K.Z., H.C., H.W. and M.W.; methodology, K.Z., H.C. and M.W.; software, K.Z. and M.W.; validation, K.Z. and H.C.; writing—review and editing, A.W., K.Z., H.C., H.W. and M.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Key Research and Development Plan Project of Heilongjiang (JD2023SJ19), the Natural Science Foundation of Heilongjiang Province (LH2023F034), the Science and Technology Project of Heilongjiang Provincial Department of Transportation (HJK2024B002), and Shenzhen Polytechnic University Research Fund (6025310007K).

Data Availability Statement

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cantalloube, H.M.J.; Nahum, C.E. Airborne SAR-Efficient Signal Processing for Very High Resolution. Proc. IEEE 2013, 101, 784–797. [Google Scholar] [CrossRef]
  2. Gao, Z.D.; Hao, Q.; Liu, Y.; Zhu, Y.Y.; Cao, J.; Meng, H.M.; Liu, J.; Chen, H.L. Development of Hyperspectral Imaging and Application Technology. Metrol. Meas. Technol. 2019, 24–34. [Google Scholar] [CrossRef]
  3. Rasti, B.; Hong, D.; Hang, R.; Ghamisi, P.; Kang, X.; Chanussot, J.; Benediktsson, J.A. Feature Extraction for Hyperspectral Imagery: The Evolution From Shallow to Deep: Overview and Toolbox. IEEE Geosci. Remote Sens. Mag. 2020, 8, 60–88. [Google Scholar] [CrossRef]
  4. Landgrebe, D. Hyperspectral image data analysis. IEEE Signal Process. Mag. 2002, 19, 17–28. [Google Scholar] [CrossRef]
  5. Bioucas-Dias, J.M.; Plaza, A.; Camps-Valls, G.; Scheunders, P.; Nasrabadi, N.; Chanussot, J. Hyperspectral Remote Sensing Data Analysis and Future Challenges. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–36. [Google Scholar]
  6. Kavita, B.; Vijaya, M. Evaluation of Deep Learning CNN Model for Land Use Land Cover Classification and Crop Identification Using Hyperspectral Remote Sensing Images. J. Indian Soc. Remote Sens. 2019, 47, 1949–1958. [Google Scholar]
  7. Gao, A.F.; Rasmussen, B.; Kulits, P.; Scheller, E.L.; Greenberger, R.; Ehlmann, B.L. Generalized Unsupervised Clustering of Hyperspectral Images of Geological Targets in the Near Infrared. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN, USA, 19–25 June 2021; pp. 4289–4298. [Google Scholar]
  8. Yue, J.; Zhao, W.; Mao, S.; Liu, H. Spectral-Spatial Classification of Hyperspectral Images using Deep Convolutional Neural Networks. Remote Sens. Lett. 2015, 6, 468–477. [Google Scholar] [CrossRef]
  9. Zhang, H.; Li, Y.; Zhang, Y.; Shen, Q. Spectral-Spatial Classification of Hyperspectral Imagery using a Dual-channel Convolutional Neural Network. Remote Sens. Lett. 2017, 8, 438–447. [Google Scholar] [CrossRef]
  10. Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep Feature Extraction and Classification of Hyperspectral Images based on Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251. [Google Scholar] [CrossRef]
  11. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral-Spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework. IEEE Trans. Geosci. Remote Sens. 2018, 56, 847–858. [Google Scholar] [CrossRef]
  12. Roy, S.K.; Manna, S.; Song, T.; Bruzzone, L. Attention-Based Adaptive Spectral-Spatial Kernel ResNet for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 7831–7843. [Google Scholar] [CrossRef]
  13. Liu, B.; Yu, X.; Yu, A.; Zhang, P.; Wan, G.; Wang, R. Deep few-shot learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 2290–2304. [Google Scholar] [CrossRef]
  14. Chen, Y.; Zhu, K.; Zhu, L.; He, X.; Ghamisi, P.; Benediktsson, J.A. Automatic Design of Convolutional Neural Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7048–7066. [Google Scholar] [CrossRef]
  15. Zhang, H.; Gong, C.; Bai, Y.; Bai, Z.; Li, Y. 3-D-ANAS: 3-D Asymmetric Neural Architecture Search for Fast Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5508519. [Google Scholar] [CrossRef]
  16. Xue, X.; Zhang, H.; Fang, B.; Bai, Z.; Li, Y. Grafting Transformer on Automatically Designed Convolutional Neural Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5531116. [Google Scholar] [CrossRef]
  17. Cao, C.; Xiang, H.; Song, W.; Yi, H.; Xiao, F.; Gao, X. Lightweight Multiscale Neural Architecture Search With Spectra-Spatial Attention for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5505315. [Google Scholar] [CrossRef]
  18. Xiao, F.; Xiang, H.; Cao, C.; Gao, X. Neural Architecture Search-Based Few-Shot Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5513715. [Google Scholar] [CrossRef]
  19. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  20. Cao, K.; Wei, C.; Gaidon, A.; Arechiga, N.; Ma, T. Learning imbalanced datasets with label-distribution-aware margin loss. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar] [CrossRef]
  21. Ou, Y.; Feng, Y.; Sun, Y. Towards Accurate and Robust Architectures via Neural Architecture Search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 5967–5976. [Google Scholar]
  22. Feng, J.; Fu, C. An Enhanced YOLOv8 Model for Flame and Smoke Detection. In Proceedings of the 2024 4th International Conference on Computer Science and Blockchain (CCSB), Shenzhen, China, 6–8 September 2024; pp. 109–113. [Google Scholar]
  23. Sun, H.; Wen, Y.; Feng, H.; Zheng, Y.; Mei, Q.; Ren, D.; Yu, M. Unsupervised Bidirectional Contrastive Reconstruction and Adaptive Fine-Grained Channel Attention Networks for image dehazing. Neural Netw. 2024, 176, 106314. [Google Scholar] [CrossRef] [PubMed]
  24. Wan, D.; Lu, R.; Shen, S.; Xu, T.; Lang, X.; Ren, Z. Mixed Local Channel Attention for Object Detection. Eng. Appl. Artif. Intell. 2023, 123, 106442. [Google Scholar] [CrossRef]
  25. Guo, T.; Wang, R.; Luo, F.; Gong, X.; Zhang, L.; Gao, X. Dual-View Spectral and Global Spatial Feature Fusion Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5512913. [Google Scholar] [CrossRef]
  26. Li, Z.; Liu, M.; Chen, Y.; Xu, Y.; Li, W.; Du, Q. Deep Cross-Domain Few-Shot Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5501618. [Google Scholar] [CrossRef]
  27. Wang, Y.; Liu, M.; Yang, Y.; Li, Z.; Du, Q.; Chen, Y.; Li, F.; Yang, H. Heterogeneous Few-Shot Learning for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 5510405. [Google Scholar] [CrossRef]
Figure 1. The overall architecture of MLFS-NAS.
Figure 2. The accurate and robust search space.
Figure 3. The convolution operators of the internal search space.
Figure 4. Adaptive Fine-Grained Channel Attention Depthwise Separable Convolution structure.
Figure 5. Mixed Local Channel Attention Depthwise Separable Convolution architecture.
Figure 6. Schematic diagram of GAP, LAP, and UNAP.
Figure 7. The classification results of the PC dataset. (a) Ground truth; (b) SSRN; (c) A2S2K-ResNet; (d) DSGSF; (e) LMSS-NAS; (f) DCFSL; (g) HFSL; and (h) MLFS-NAS.
Figure 8. The classification results of the IN dataset. (a) Ground truth; (b) SSRN; (c) A2S2K-ResNet; (d) DSGSF; (e) LMSS-NAS; (f) DCFSL; (g) HFSL; and (h) MLFS-NAS.
Figure 9. The classification results of the LK dataset. (a) Ground truth; (b) SSRN; (c) A2S2K-ResNet; (d) DSGSF; (e) LMSS-NAS; (f) DCFSL; (g) HFSL; and (h) MLFS-NAS.
Figure 10. Classification results of different methods for the PC dataset.
Figure 11. Classification results of different methods for the IN dataset.
Figure 12. Classification results of different methods for the LK dataset.
Figure 13. Optimal cell structures on the source datasets. (a) Accurate cell; (b) robust cell.
Figure 14. Classification accuracy of MLFS-NAS with different numbers of labeled samples on the target-domain datasets. (a) PC; (b) IN; and (c) LK.
Table 1. Description of PC dataset information.

| No. | Class | Sample Numbers |
| --- | --- | --- |
| 1 | Water | 65,971 |
| 2 | Trees | 7598 |
| 3 | Asphalt | 3090 |
| 4 | Self-Blocking Bricks | 2685 |
| 5 | Bitumen | 6584 |
| 6 | Tiles | 9248 |
| 7 | Shadows | 7287 |
| 8 | Meadows | 42,826 |
| 9 | Bare Soil | 2863 |
| | Total | 148,152 |
Table 2. Description of IN dataset information.

| No. | Class Name | Sample Numbers |
| --- | --- | --- |
| 1 | Alfalfa | 46 |
| 2 | Corn-notill | 1428 |
| 3 | Corn-mintill | 830 |
| 4 | Corn | 237 |
| 5 | Grass-pasture | 483 |
| 6 | Grass-trees | 730 |
| 7 | Grass-pasture-mowed | 28 |
| 8 | Hay-windrowed | 478 |
| 9 | Oats | 20 |
| 10 | Soybean-notill | 972 |
| 11 | Soybean-mintill | 2455 |
| 12 | Soybean-clean | 593 |
| 13 | Wheat | 205 |
| 14 | Woods | 1265 |
| 15 | Buildings-grass-trees-drives | 386 |
| 16 | Stone-steel-towers | 93 |
| | Total | 10,249 |
Table 3. Description of LK dataset information.

| No. | Class | Sample Numbers |
| --- | --- | --- |
| 1 | Corn | 34,511 |
| 2 | Cotton | 8374 |
| 3 | Sesame | 3031 |
| 4 | Broad-leaf soybean | 63,212 |
| 5 | Narrow-leaf soybean | 4151 |
| 6 | Rice | 11,854 |
| 7 | Water | 67,056 |
| 8 | Roads and houses | 7124 |
| 9 | Mixed weed | 5229 |
| | Total | 204,542 |
Table 4. Classification results of PC dataset.

| Class | SSRN | A2S2K-ResNet | DSGSF | LMSS-NAS | DCFSL | HFSL | MLFS-NAS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 99.14 ± 0.20 | 99.99 ± 0.01 | 99.98 ± 0.02 | 99.95 ± 0.04 | 99.50 ± 0.12 | 99.78 ± 0.02 | 99.65 ± 0.11 |
| 2 | 84.13 ± 4.67 | 88.48 ± 11.20 | 75.05 ± 6.39 | 96.87 ± 1.86 | 92.70 ± 4.31 | 95.56 ± 5.10 | 93.46 ± 1.58 |
| 3 | 66.49 ± 7.85 | 57.80 ± 18.19 | 97.56 ± 2.01 | 83.84 ± 5.94 | 84.73 ± 3.56 | 90.26 ± 2.71 | 93.67 ± 2.63 |
| 4 | 61.27 ± 12.26 | 62.32 ± 6.68 | 51.49 ± 14.78 | 52.91 ± 15.23 | 99.51 ± 0.21 | 88.20 ± 0.84 | 93.66 ± 3.56 |
| 5 | 81.32 ± 8.73 | 90.04 ± 7.34 | 99.97 ± 4.85 | 91.65 ± 5.35 | 86.23 ± 3.83 | 89.85 ± 4.10 | 94.86 ± 2.68 |
| 6 | 84.24 ± 4.82 | 73.76 ± 6.72 | 93.29 ± 3.41 | 91.55 ± 0.77 | 94.07 ± 1.25 | 71.71 ± 0.45 | 97.98 ± 1.52 |
| 7 | 91.27 ± 1.70 | 96.68 ± 2.66 | 99.96 ± 1.49 | 98.71 ± 0.91 | 84.45 ± 3.64 | 99.86 ± 4.55 | 90.37 ± 2.89 |
| 8 | 94.26 ± 5.31 | 97.63 ± 2.62 | 98.67 ± 0.99 | 99.93 ± 0.05 | 98.75 ± 0.22 | 99.80 ± 0.91 | 99.52 ± 0.14 |
| 9 | 93.71 ± 0.18 | 99.98 ± 0.01 | 100.00 ± 0.00 | 99.25 ± 0.77 | 95.98 ± 4.73 | 92.51 ± 3.04 | 96.58 ± 1.46 |
| OA (%) | 91.64 ± 1.75 | 93.39 | 95.77 ± 0.56 | 96.85 ± 1.25 | 96.89 ± 0.30 | 96.24 ± 0.95 | 98.57 ± 0.53 |
| AA (%) | 83.98 ± 0.93 | 85.21 ± 0.04 | 90.67 ± 2.08 | 90.52 ± 2.44 | 92.88 ± 2.56 | 91.95 ± 1.11 | 95.85 ± 0.47 |
| K × 100 | 89.94 ± 2.55 | 90.70 ± 0.03 | 94.00 ± 0.80 | 95.55 ± 1.76 | 95.60 ± 0.09 | 94.68 ± 0.76 | 97.89 ± 0.66 |
Table 5. Classification results of IN dataset.

| Class | SSRN | A2S2K-ResNet | DSGSF | LMSS-NAS | DCFSL | HFSL | MLFS-NAS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 29.93 ± 23.74 | 27.28 ± 18.94 | 91.66 ± 2.09 | 26.39 ± 9.81 | 98.80 ± 1.69 | 98.78 ± 1.22 | 96.67 ± 2.57 |
| 2 | 52.13 ± 14.64 | 51.99 ± 4.89 | 47.63 ± 3.80 | 59.97 ± 18.25 | 38.15 ± 6.16 | 56.71 ± 4.64 | 61.73 ± 1.33 |
| 3 | 24.07 ± 3.28 | 42.27 ± 1.02 | 41.86 ± 2.76 | 46.78 ± 24.84 | 51.63 ± 3.94 | 50.61 ± 4.30 | 75.78 ± 1.81 |
| 4 | 29.30 ± 12.91 | 30.40 ± 10.22 | 16.01 ± 16.77 | 40.86 ± 3.87 | 72.70 ± 13.01 | 79.74 ± 18.97 | 82.93 ± 9.21 |
| 5 | 76.71 ± 12.49 | 77.49 ± 13.48 | 54.72 ± 14.19 | 79.60 ± 12.79 | 61.93 ± 13.01 | 63.39 ± 10.25 | 91.10 ± 0.92 |
| 6 | 89.19 ± 7.26 | 89.66 ± 1.85 | 82.06 ± 3.61 | 85.75 ± 10.73 | 94.14 ± 0.69 | 78.41 ± 11.10 | 87.66 ± 1.71 |
| 7 | 35.31 ± 26.51 | 25.44 ± 4.12 | 40.00 ± 22.74 | 32.23 ± 19.72 | 100.00 ± 0.00 | 97.83 ± 2.17 | 98.92 ± 1.25 |
| 8 | 95.02 ± 6.90 | 100.00 ± 0.00 | 99.32 ± 0.45 | 99.11 ± 0.86 | 97.14 ± 3.73 | 99.68 ± 0.11 | 96.57 ± 2.01 |
| 9 | 12.28 ± 5.63 | 33.60 ± 2.42 | 11.59 ± 1.49 | 18.50 ± 15.11 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 |
| 10 | 63.05 ± 7.61 | 51.89 ± 11.39 | 45.04 ± 11.45 | 70.90 ± 18.40 | 59.82 ± 0.79 | 67.99 ± 3.36 | 75.32 ± 2.20 |
| 11 | 61.22 ± 5.14 | 67.19 ± 2.78 | 73.21 ± 3.88 | 56.15 ± 8.85 | 33.31 ± 3.86 | 64.65 ± 1.71 | 71.52 ± 1.07 |
| 12 | 28.32 ± 4.98 | 41.61 ± 2.06 | 42.85 ± 5.74 | 46.20 ± 25.19 | 41.52 ± 6.70 | 55.78 ± 23.98 | 71.44 ± 4.27 |
| 13 | 94.99 ± 4.15 | 84.42 ± 0.56 | 80.61 ± 1.22 | 91.56 ± 7.69 | 99.00 ± 0.71 | 98.75 ± 0.25 | 98.77 ± 1.77 |
| 14 | 91.43 ± 4.60 | 96.97 ± 4.77 | 93.06 ± 1.95 | 96.30 ± 2.31 | 92.10 ± 5.33 | 80.99 ± 9.48 | 91.48 ± 0.87 |
| 15 | 58.55 ± 25.51 | 59.61 ± 6.78 | 55.50 ± 15.89 | 64.14 ± 12.83 | 79.27 ± 8.58 | 99.21 ± 0.52 | 95.67 ± 3.01 |
| 16 | 73.47 ± 29.26 | 72.21 ± 4.91 | 89.77 ± 8.01 | 83.60 ± 8.13 | 98.30 ± 2.41 | 94.32 ± 5.68 | 96.55 ± 2.77 |
| OA (%) | 53.70 ± 4.84 | 57.28 ± 0.91 | 59.17 ± 2.23 | 60.96 ± 4.27 | 66.55 ± 1.81 | 69.61 ± 1.71 | 78.39 ± 0.78 |
| AA (%) | 57.18 ± 5.35 | 59.49 ± 3.39 | 60.31 ± 2.69 | 62.38 ± 2.63 | 78.53 ± 1.09 | 80.43 ± 2.26 | 84.35 ± 3.19 |
| K × 100 | 48.66 ± 5.22 | 52.70 ± 0.94 | 54.30 ± 2.18 | 55.23 ± 4.97 | 61.99 ± 1.49 | 65.83 ± 1.30 | 76.91 ± 0.90 |
Table 6. Classification results of LK dataset.

| Class | SSRN | A2S2K-ResNet | DSGSF | LMSS-NAS | DCFSL | HFSL | MLFS-NAS |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 93.97 ± 4.58 | 92.07 ± 0.73 | 99.12 ± 0.01 | 98.17 ± 0.53 | 99.35 ± 0.37 | 94.39 ± 4.03 | 98.45 ± 0.53 |
| 2 | 54.57 ± 15.75 | 76.51 ± 19.53 | 94.93 ± 0.42 | 95.23 ± 8.08 | 93.71 ± 4.22 | 99.00 ± 0.39 | 97.58 ± 0.30 |
| 3 | 59.98 ± 11.32 | 89.33 ± 3.31 | 65.59 ± 0.69 | 74.45 ± 22.62 | 93.75 ± 6.20 | 81.18 ± 6.03 | 98.32 ± 0.08 |
| 4 | 91.85 ± 6.70 | 74.08 ± 2.13 | 94.86 ± 0.21 | 98.40 ± 0.91 | 82.06 ± 5.31 | 97.04 ± 1.25 | 98.53 ± 0.23 |
| 5 | 43.85 ± 6.01 | 84.16 ± 0.86 | 76.99 ± 3.43 | 77.33 ± 12.01 | 98.29 ± 1.17 | 91.41 ± 3.71 | 98.42 ± 0.85 |
| 6 | 85.98 ± 12.90 | 97.38 ± 2.30 | 98.49 ± 1.21 | 97.39 ± 4.19 | 87.85 ± 3.62 | 96.40 ± 1.96 | 98.31 ± 0.88 |
| 7 | 98.65 ± 0.88 | 99.48 ± 0.39 | 99.34 ± 0.30 | 99.31 ± 0.65 | 99.94 ± 0.31 | 99.58 ± 0.37 | 98.31 ± 1.22 |
| 8 | 87.20 ± 15.57 | 93.64 ± 2.53 | 94.73 ± 1.17 | 68.24 ± 13.74 | 88.96 ± 3.86 | 96.38 ± 0.86 | 97.64 ± 0.36 |
| 9 | 46.62 ± 15.21 | 90.63 ± 1.46 | 74.13 ± 2.42 | 59.23 ± 4.65 | 92.97 ± 1.49 | 87.08 ± 0.75 | 98.55 ± 0.48 |
| OA (%) | 87.48 ± 1.31 | 88.42 ± 0.49 | 94.60 ± 0.14 | 94.99 ± 1.36 | 92.67 ± 1.14 | 96.84 ± 0.97 | 98.74 ± 0.46 |
| AA (%) | 73.63 ± 1.24 | 73.25 ± 1.23 | 88.69 ± 0.47 | 86.84 ± 2.19 | 92.99 ± 0.55 | 93.61 ± 0.45 | 98.93 ± 0.11 |
| K × 100 | 83.64 ± 1.76 | 85.22 ± 0.61 | 92.88 ± 0.23 | 93.34 ± 1.83 | 90.54 ± 1.35 | 95.86 ± 1.27 | 98.61 ± 0.35 |
Table 7. The ablation experiment of the source domain.

| Source Datasets | Target Datasets | OA (%) | AA (%) | K × 100 |
| --- | --- | --- | --- | --- |
| MI | PC | 97.49 ± 0.98 | 94.62 ± 0.46 | 96.41 ± 0.76 |
| MI | IN | 73.77 ± 0.89 | 76.94 ± 1.37 | 71.35 ± 1.22 |
| MI | LK | 96.50 ± 1.69 | 96.63 ± 0.91 | 96.39 ± 1.13 |
| CK | PC | 97.36 ± 0.51 | 94.56 ± 1.07 | 96.38 ± 0.84 |
| CK | IN | 77.42 ± 1.02 | 79.81 ± 1.14 | 75.29 ± 1.27 |
| CK | LK | 96.47 ± 0.70 | 96.53 ± 0.39 | 96.32 ± 0.51 |
| MI&CK | PC | 98.57 ± 0.53 | 95.85 ± 0.47 | 97.89 ± 0.66 |
| MI&CK | IN | 79.85 ± 0.78 | 80.06 ± 3.19 | 76.91 ± 0.90 |
| MI&CK | LK | 98.74 ± 0.46 | 98.93 ± 0.11 | 98.61 ± 0.35 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
