Article

Adaptive Spectral Correlation Learning Neural Network for Hyperspectral Image Classification

1 School of Software Engineering, Chengdu University of Information Technology, Chengdu 610225, China
2 Sichuan Provincial Engineering Research Center for Intelligent Tolerance Design and Measurement, Chengdu 610225, China
3 College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
4 School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(11), 1847; https://doi.org/10.3390/rs17111847
Submission received: 13 April 2025 / Revised: 22 May 2025 / Accepted: 23 May 2025 / Published: 25 May 2025

Abstract

Hyperspectral imagery (HSI), with its rich spectral information across continuous wavelength bands, has become indispensable for fine-grained land cover classification in remote sensing applications. Although some existing deep neural networks have exploited the rich spectral information contained in HSIs for land cover classification by designing adaptive learning modules, these modules were usually designed as additional submodules rather than as basic structural units for building backbones, and they failed to adaptively model the spectral correlations between adjacent and nonadjacent spectral bands from both local and global perspectives. To address these issues, a new adaptive spectral-correlation learning neural network (ASLNN) is proposed for HSI classification. Taking advantage of the group convolutional and ConvLSTM3D layers, a new adaptive spectral correlation learning block (ASBlock) is designed as a basic network unit to construct the backbone of a spatial–spectral feature extraction model, learning the spectral information and extracting spectral-enhanced deep spatial–spectral features. Then, a 3D Gabor filter is utilized to extract heterogeneous spatial–spectral features, and a simple but effective gated asymmetric fusion block (GAFBlock) is further built to align and integrate these two heterogeneous features, thereby achieving competitive classification performance for HSIs. Experimental results from four common hyperspectral data sets validate the effectiveness of the proposed method. Specifically, when 10, 10, 10, and 25 samples from each class are selected for training, ASLNN achieves the highest overall accuracy (OA) of 81.12%, 85.88%, 80.62%, and 97.97% on the four data sets, outperforming the other methods by more than 1.70%, 3.21%, 3.78%, and 2.70% in OA, respectively.

1. Introduction

Hyperspectral images (HSIs) [1] have been used in many applications, such as military reconnaissance [2], precision agriculture [3], and wetland dynamic monitoring [4], among others [5,6,7]. Particularly, among HSI analysis and processing technologies, classification has become one of the most important and challenging information acquisition tasks.
Filtering-based algorithms are a widely used way to extract spatial–spectral features by manually designing filters that interact directly with HSIs [8]. Common filter-based feature extraction algorithms comprise the 3D Gabor filter [8,9], morphological profiles [10], the scattering wavelet transform [11], the scale-invariant feature transform [12], and local binary patterns [13], which can provide low-level and interpretable spatial–spectral features for HSI classification. A suitable classifier can then be used to classify land covers with the spatial–spectral features extracted from HSIs, where typical methods include multinomial logistic regression [14], the composite kernel support vector machine (SVM-CK) [15], and sparse representation [16]. However, these methods need to be designed manually by experienced researchers for specific tasks.
As deep neural network (DNN)-based classification methods offer advantages such as strong self-learning ability and excellent model generalization, various DNNs have been applied in the remote sensing field, including convolutional neural networks (CNNs) [17,18], graph convolutional networks (GCNs) [19,20], and recurrent neural networks (RNNs) [21], obtaining better performance than other classification methods. RNNs have been widely used to analyze and process sequential data because of their unique directional feedback structure, where long short-term memory (LSTM) [22] and convolutional LSTM [23] are two special RNNs designed to solve learning problems related to long-distance dependencies in sequential data. Following [24], the convolutional LSTM is referred to as ConvLSTM2D for convenience. Thanks to their ability to model the long-term correlation of sequential data, there are two common ways to use LSTM or ConvLSTM2D for the spatial–spectral feature extraction and classification of HSIs. On the one hand, LSTM or ConvLSTM2D can be utilized alone as a basic unit to build RNNs for HSI classification, such as SSLSTMs [25], bidirectional ConvLSTM2D [26], the spatial–spectral ConvLSTM2D neural network (SSCL2DNN) [24], and the two-branch multidirectional spatial–spectral LSTM attention network [27]. On the other hand, LSTM or ConvLSTM2D can be integrated with CNNs or attention mechanisms to build hybrid networks, such as the capsule network+ConvLSTM2D [28], the tensor attention-driven ConvLSTM2D neural network [29], and the nonlocal-dependent learning fully convolutional network [30]. To better preserve the intrinsic structure information of HSIs, Hu et al. [24] further extended the ConvLSTM2D to its 3D version (namely ConvLSTM3D) to directly extract spatial–spectral features for HSI classification. Inspired by this, several ConvLSTM3D-based models have gradually been developed for HSI classification, such as SSCL3DNN [24], ConvLSTM3D+3D CNN (SSCRN) [31], ConvLSTM3D+attention mechanism (Dual-Channel A3CLNN) [32], and the regularized spatial–spectral global learning (RSSGL) framework [33]. However, the network structure designs and working mechanisms of these methods lack accurate interpretation, and there is no obvious correlation between the extracted deep semantic features and the classification results of HSIs. To improve the interpretability of deep semantic features and compensate for their lack of detailed information, some researchers have tried to integrate traditional features into the training process of DNNs, where the complementary information and interaction of two different modalities of features can dynamically optimize the training direction of the whole network for various applications, such as action recognition, object detection, lane detection, locomotion mode recognition, HSI change detection, and HSI classification [6,34], reducing the overfitting of the whole DNN and improving its performance to a certain extent.
In particular, the special imaging mechanism with which HSIs are collected introduces rich and highly correlated spectral information, motivating the design of different spectral information learning submodules to be plugged into DNNs for spatial–spectral feature extraction. The attention mechanism is one of the most commonly used and effective approaches. Li et al. [32] designed a spectral attention module by using the ConvLSTM2D as a basic unit to assist the dual-channel A3CLNN with adaptive learning of long-term spectral correlation, improving the classification accuracy of HSIs. An improved complex-valued deformable ConvLSTM2D was developed, improving the ability of the whole model to learn scale information and spectral correlations [6]. Sun et al. [35] built a large kernel spectral–spatial attention network to learn the long-range 3D properties of HSIs. A dense spectral convolution module was designed for exploring the intrinsic similarity between spectral bands for HSI super-resolution [36]. In addition, the transformer has gained widespread attention and success in the field of natural language processing due to its ability to handle global long-term dependencies in sequence data [37]. Subsequently, considering the sequence properties presented by the rich spectral information of HSIs, some researchers have attempted to introduce it into the task of HSI classification, achieving promising results. Xu et al. [38] proposed a double-branch convolution–transformer network, where a convolution-spectral projection unit and a convolutional multihead self-attention network are applied to explore the spectral correlation among spectral bands and local–global features for HSI classification. A center-masked transformer with a regularized center-masked pretraining task was built to learn the dependencies between central land cover and its neighboring objects without labels during the pretraining process [39]. Shi et al. [40] developed a parallel dual-branch multiscale transformer, containing spectral convolution, channel shrink soft split, and token-to-token modules for multiscale spatial–spectral feature extraction, followed by a pooled activation fusion module for feature fusion and the classification of HSIs. A spatial–spectral wavelet transformer was introduced, which unifies downsampling with wavelet transforms for lossless compression of features, preserving data integrity and improving the interaction between structural and shape information for HSI classification [41]. Aiming at the information loss of the transformer during its propagation, a memory-augmented spectral–spatial transformer was constructed with a memory tokenizer and a memory-augmented transformer encoder to effectively mix the spectral and spatial information for HSI classification. To fully eliminate the influence of multimodal heterogeneity, Hu et al. [42] innovatively viewed the global information as an intermediate agent and proposed a new cross-memory quaternion transformer network, effectively improving the classification accuracy of land covers. Although the above spectral learning submodules can adaptively model the spectral correlations to obtain spectral-enhanced spatial–spectral features for HSI classification, some problems still need to be addressed.
First, these modules usually only explore the long-range dependencies between different spectral bands while ignoring the local correlations between adjacent spectral bands, which limits further improvements in the classification performance of HSIs. Second, these modules are usually designed as additional submodules to assist the feature extraction process of DNNs and are rarely used as basic network units to build new backbones for HSI classification.
Following our previous work in [6,24], to solve the above-mentioned problems, a novel adaptive spectral correlation learning neural network (ASLNN) is proposed for the spatial–spectral feature extraction and classification of HSIs. By considering the strong spectral correlations that exist both between adjacent spectral bands and between nonadjacent spectral bands, a new adaptive spectral correlation learning block (ASBlock) is first designed by utilizing the group convolutional (GConv) layer and the ConvLSTM3D layer, jointly learning the short- and long-term spectral correlations. Then, by taking the designed ASBlock as a basic network unit, a fully convolution-based spatial–spectral feature extraction network is constructed, adaptively modeling the unique spectral characteristics of different land covers and extracting enhanced deep spatial–spectral features to accurately discriminate between them. Furthermore, to improve the ability of the deep semantic features to describe details and to improve their interpretability, inspired by the conclusions in [6,8,43], the 3D Gabor filter is utilized as the heterogeneous feature extractor, and a simple but effective gated asymmetric fusion block (GAFBlock) is built to align and integrate these two modalities of spatial–spectral features, achieving competitive classification accuracy for HSIs. Experimental results on four common HSI data sets show the superiority of the proposed ASLNN model. The main contributions of this work can be summarized as follows.
(1) To adaptively learn the spectral information of HSIs, a new ASBlock is designed by integrating the GConv and ConvLSTM3D layers, which can be used to construct the backbones of other DNNs for joint learning of the short- and long-range correlations.
(2) Based on the designed ASBlock, a convolutional network is constructed as the backbone for adaptively extracting spectral-enhanced spatial–spectral features.
(3) A GAFBlock is built to align and integrate the heterogeneous spatial–spectral features. Together with the designed ASBlock-based backbone, a novel ASLNN model is further proposed to extract adaptive spectral-enhanced spatial–spectral features while taking into account both the interpretability of spatial–spectral feature extraction and the ability to describe detailed information, leading to better classification of HSIs.
The remainder of this article is organized as follows. In Section 2, the framework and optimization process of the proposed ASLNN are described in detail. Section 3 reports the experimental settings, results, and comparative analysis, which is followed by the conclusions in Section 4.

2. Proposed ASLNN Model

2.1. Architecture Overview

Figure 1 presents the framework of the proposed ASLNN model for the spatial–spectral extraction and classification of HSIs. Due to the fact that the spectral correlations exist between adjacent spectral bands and between nonadjacent spectral bands, a universal plug-and-play ASBlock is constructed in Section 2.2 for extracting enhanced spatial–spectral features from HSIs. Then, an effective GAFBlock is further developed to integrate deep and heterogeneous features by adaptively learning their respective attribute information, improving the spatial–spectral extraction and classification performance of the whole model, as shown in Section 2.3. Finally, Section 2.4 outlines the structure and the training process of our ASLNN model in detail.

2.2. ASBlock

Although plenty of deep models have been developed for the spectral-enhanced spatial–spectral feature extraction and classification of HSIs, they often fail to jointly consider the local and global spectral correlation information, limiting their classification performance.
Inspired by ResNeSt, which is designed to model the complementarity of feature-map attention and multipath representation [44], an ASBlock is constructed to adaptively learn the spectral correlations (from both local and global perspectives) for the better classification of HSIs. Its inner structure is shown in Figure 2, which contains a spectral group split (SGS) subblock, a short-term spectral correlation learning (STC) subblock, a long-term spectral correlation learning (LTC) subblock, and a group spatial–spectral attention learning (GSS) subblock.

2.2.1. SGS Subblock

Let the input of our ASBlock be denoted as $\mathbf{X}^{l} \in \mathbb{R}^{s_l \times s_l \times K_l \times C_l}$, where $s_l$, $K_l$, and $C_l$ are the spatial size, the number of spectral bands, and the number of channels, respectively. Firstly, the input $\mathbf{X}^{l}$ is fed into the SGS subblock, where it is equally divided into $N_s$ parts along the spectral dimension, forming spectral groups $\mathbf{X}_{i}^{l} \in \mathbb{R}^{s_l \times s_l \times d \times C_l}$ as the inputs of the other subblocks, with $i = 1, 2, \ldots, N_s$ and $N_s = \lfloor K_l / d \rfloor$, where $\lfloor \cdot \rfloor$ denotes the round-down (floor) operation and $d$ is the number of spectral bands in each spectral group.
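For clarity, the following minimal TensorFlow sketch illustrates the SGS step, i.e., splitting the spectral axis of an input patch into groups of $d$ bands; the zero-padding behavior, tensor shapes, and function names are illustrative assumptions rather than the authors' implementation.

```python
import tensorflow as tf

def spectral_group_split(x, d=13):
    """SGS sketch: split the spectral axis of a (batch, s, s, K, C) tensor into groups of d bands.

    Zero-pads the spectral axis when K is not divisible by d (an assumption consistent
    with Section 2.4) and returns a list of N_s tensors of shape (batch, s, s, d, C).
    """
    k = x.shape[3]
    pad = (-k) % d                      # number of zero bands appended
    if pad:
        x = tf.pad(x, [[0, 0], [0, 0], [0, 0], [0, pad], [0, 0]])
    n_s = x.shape[3] // d               # number of spectral groups
    return tf.split(x, n_s, axis=3)

# usage: a toy batch of 7 x 7 patches with 200 bands and 32 channels
patch = tf.random.normal([2, 7, 7, 200, 32])
groups = spectral_group_split(patch, d=13)
print(len(groups), groups[0].shape)     # 16 (2, 7, 7, 13, 32)
```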

2.2.2. STC Subblock

Then, considering that spectral correlation exists between adjacent spectral bands, the spectral groups built in the SGS subblock are fed into the STC subblock, where a $1\times1\times1$ GConv block and a $3\times3\times3$ GConv block are executed in turn to learn the short-term spectral information. Each GConv block contains a 3D GConv layer, a batch normalization (BN) layer, and a Rectified Linear Unit (ReLU) function, extracting the short-term spectral-enhanced spatial–spectral features from HSIs. The above calculation process is defined as
$$\mathbf{X}_{i}^{lS} = f_{GC3}\big(f_{GC1}(\mathbf{X}_{i}^{l})\big), \quad (1)$$
where $\mathbf{X}_{i}^{lS} \in \mathbb{R}^{s_l \times s_l \times d \times C_{l+1}}$ denotes the short-term spectral-enhanced spatial–spectral feature, and $f_{GC1}(\cdot)$ and $f_{GC3}(\cdot)$ are the $1\times1\times1$ and $3\times3\times3$ 3D GConv blocks, respectively.
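A possible TensorFlow rendering of the STC subblock is sketched below, assuming that tf.keras.layers.Conv3D accepts the groups argument (available in recent TensorFlow releases); the filter count and group number are illustrative choices, not the paper's exact settings.

```python
import tensorflow as tf
from tensorflow.keras import layers

def gconv_block(filters, kernel_size, groups):
    """One GConv block: 3D grouped convolution + BN + ReLU, as described above."""
    return tf.keras.Sequential([
        layers.Conv3D(filters, kernel_size, padding="same", groups=groups, use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
    ])

def stc_subblock(x_group, filters=64, groups=4):
    """STC sketch: a 1x1x1 GConv block followed by a 3x3x3 GConv block (Equation (1))."""
    x = gconv_block(filters, 1, groups)(x_group)   # pointwise grouped convolution
    x = gconv_block(filters, 3, groups)(x)         # grouped convolution over space and bands
    return x

# usage: one spectral group of shape (batch, 7, 7, 13, 32)
out = stc_subblock(tf.random.normal([2, 7, 7, 13, 32]))
print(out.shape)   # (2, 7, 7, 13, 64)
```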

2.2.3. LTC Subblock

Furthermore, as spectral correlation also exists between nonadjacent spectral bands, an LTC subblock is further built by utilizing the ability of the ConvLSTM3D layer to learn long-term dependencies, thus adaptively modeling the long-range spectral correlation for spatial–spectral feature extraction and classification. In particular, the short-term spectral-enhanced spatial–spectral features $\mathbf{X}_{i}^{lS}$ are first converted into a sequence of length $N_s$, and then this sequence is fed to a ConvLSTM3D layer to extract the long-term spectral-enhanced spatial–spectral feature from HSIs, as follows:
$$\mathbf{X}_{i}^{lL} = f_{CL3D1}\big(\mathbf{X}_{i}^{lS}\big), \quad (2)$$
where $\mathbf{X}_{i}^{lL} \in \mathbb{R}^{s_l \times s_l \times d \times C_{l+1}}$ denotes the long-term spectral-enhanced spatial–spectral feature, and $f_{CL3D1}(\cdot)$ denotes the $1\times1\times1$ ConvLSTM3D layer.
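The sketch below illustrates the LTC step, stacking the short-term group features into a sequence and running a ConvLSTM3D layer over it; it assumes tf.keras.layers.ConvLSTM3D is available (added in newer TensorFlow/Keras releases), and the filter count is illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

def ltc_subblock(group_features, filters=64):
    """LTC sketch: model long-range spectral correlation over the spectral-group sequence.

    `group_features` is a list of N_s tensors of shape (batch, s, s, d, C), e.g., the
    STC outputs; they are stacked along a new sequence axis and processed by ConvLSTM3D.
    """
    seq = tf.stack(group_features, axis=1)            # (batch, N_s, s, s, d, C)
    conv_lstm = layers.ConvLSTM3D(filters, kernel_size=1, padding="same",
                                  return_sequences=True)
    return conv_lstm(seq)                             # (batch, N_s, s, s, d, filters)

# usage: a toy sequence of 4 spectral groups
toy = [tf.random.normal([2, 7, 7, 13, 64]) for _ in range(4)]
print(ltc_subblock(toy).shape)                        # (2, 4, 7, 7, 13, 64)
```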

2.2.4. GSS Subblock

Then, to jointly learn and integrate the above short- and long-term spectral correlations, a GSS subblock is further developed to learn the adaptive spectral correlations from HSIs. As shown in Figure 3, it consists of a group self-spectral attention module (Figure 3a) and a self-intrinsic attention module (Figure 3b).
The spatial–spectral feature $\mathbf{X}_{i}^{lL}$ is first fed into the group self-spectral attention module, where the group spectral attention weights are learned from the input along its spectral dimension, adaptively modeling the short- and long-term spectral correlations to extract more discriminative spatial–spectral features. Since different land covers present various structural characteristics, a self-intrinsic attention module is further built from the input $\mathbf{X}_{i}^{lL}$ to learn the intrinsic structure attention weight, thus jointly modeling the structure- and spectral-correlation-enhanced spatial–spectral features for HSI classification. In particular, to alleviate the gradient vanishing (or explosion) problem to some extent, a skip connection is adopted [44], where the input spatial–spectral feature $\mathbf{X}_{i}^{lL}$ is reused in the output. The forward propagation of the GSS subblock can be written as
$$\begin{aligned} \mathbf{T} &= f_{CBR1}\big(f_{concat}(\mathbf{X}_{i}^{lL})\big) \\ \mathbf{H}_{i}^{lL} &= \mathbf{X}_{i}^{lL} \otimes f_{spe}(\mathbf{T}) \\ \hat{\mathbf{H}}_{i}^{lL} &= \mathbf{H}_{i}^{lL} \otimes f_{int}(\mathbf{T}) \\ \mathbf{F}^{l} &= f_{concat}\big(\hat{\mathbf{H}}_{i}^{lL} + \mathbf{X}_{i}^{lL}\big), \end{aligned} \quad (3)$$
where $\mathbf{F}^{l} \in \mathbb{R}^{s_l \times s_l \times K_l \times C_{l+1}}$ is the output of the GSS subblock, $f_{concat}(\cdot)$ denotes the concatenation operation, $f_{CBR1}(\cdot)$ is the $1\times1\times1$ 3D Conv layer with the BN layer and the ReLU function, and $\otimes$ denotes the element-wise multiplication operation. $f_{spe}(\cdot)$ and $f_{int}(\cdot)$ are, respectively, the group self-spectral attention module and the self-intrinsic attention module, which can be defined as
$$\begin{aligned} f_{spe}(\mathbf{T}) &= G_{\sigma}\Big(f_{C1}\big(\mathrm{softmax}(f_{CBR1}(\mathbf{T})) \odot f_{CBR1}(\mathbf{T})\big)\Big) \\ f_{int}(\mathbf{T}) &= \sigma\Big(f_{C1}\big(\mathrm{softmax}(f_{GAP}(f_{CBR1}(\mathbf{T}))) \odot f_{CBR1}(\mathbf{T})\big)\Big), \end{aligned} \quad (4)$$
where $\sigma(\cdot)$, $G_{\sigma}(\cdot)$, and $\mathrm{softmax}(\cdot)$ denote the sigmoid, group sigmoid, and softmax functions, respectively, $\odot$ is the matrix multiplication operation, $f_{GAP}(\cdot)$ is the global average pooling (GAP) layer, and $f_{C1}(\cdot)$ is the $1\times1\times1$ 3D Conv layer without the BN layer and the ReLU function.
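To make the data flow of Equations (3) and (4) concrete, the following simplified TensorFlow sketch applies a softmax-based spectral attention and a GAP-driven intrinsic attention to the concatenated group features, with a skip connection; the group sigmoid is approximated by an ordinary sigmoid, element-wise products replace the matrix multiplications, and all layer widths are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_relu(filters):
    """1x1x1 3D Conv + BN + ReLU block (f_CBR1 in the text)."""
    return tf.keras.Sequential([
        layers.Conv3D(filters, 1, padding="same", use_bias=False),
        layers.BatchNormalization(),
        layers.ReLU(),
    ])

def gss_subblock(group_features, filters=64):
    """GSS sketch: spectral attention, intrinsic attention, and a skip connection."""
    x = tf.concat(group_features, axis=3)              # (batch, s, s, K, C), C == filters assumed
    t = conv_bn_relu(filters)(x)

    # group self-spectral attention: softmax weighting along the spectral axis, then a gate
    spe = layers.Conv3D(filters, 1)(tf.nn.softmax(t, axis=3) * t)
    h = x * tf.sigmoid(spe)

    # self-intrinsic attention: global average pooling drives a channel-style gate
    gap = tf.reduce_mean(t, axis=[1, 2, 3], keepdims=True)
    intr = layers.Conv3D(filters, 1)(tf.nn.softmax(gap, axis=-1) * t)
    h = h * tf.sigmoid(intr)

    return h + x                                        # skip connection reusing the input
```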
Finally, the above output $\mathbf{F}^{l}$ is fed into another $1\times1\times1$ 3D GConv layer, and the input $\mathbf{X}^{l}$ is further reused to obtain the final spatial–spectral feature $\mathbf{X}^{l+1} \in \mathbb{R}^{s_l \times s_l \times K_l \times C_{l+1}}$ of the whole ASBlock
$$\mathbf{X}^{l+1} = f_{CBR1}(\mathbf{F}^{l}) + \mathbf{X}^{l}. \quad (5)$$
Therefore, by taking the ASBlock in Figure 2 as a basic unit to build the adaptive convolutional backbone, the adaptive spectral-enhanced deep spatial–spectral features can be extracted for HSI classification.

2.3. GAFBlock

Some work has verified that combining deep and heterogeneous features can alleviate the lack of detail and interpretability of deep features to some degree [6,43]. However, the parameter and time complexities of these approaches are relatively high, limiting their applications in practice. As such, drawing on the gate structure of ConvLSTM-based models and the attention mechanism, a simple but effective GAFBlock is constructed to integrate these two modalities of spatial–spectral features with low complexity, as shown in Figure 4, obtaining more expressive spatial–spectral features for HSI classification.
First, following previous research [6,8,43], the 3D Gabor filter is applied as the heterogeneous spatial–spectral feature extractor, whose detailed calculation process and parameter settings for spatial–spectral feature extraction are the same as those in [6,43] to simplify the model complexity and experimental analysis. A 3D Gabor filter is a specialized linear filter designed to capture spatial–spectral features in volumetric data, such as HSIs. It extends the 2D Gabor filter by incorporating a third dimension (spectral) to analyze joint spatial–spectral variations. The filter combines a sinusoidal plane wave with a 3D Gaussian envelope, enabling multiscale and multi-orientation feature extraction. For a pixel at spatial–spectral coordinates ( α , β , γ ), the circular 3D Gabor filter is defined as
$$\Psi_{f,\varphi,\theta}(\alpha,\beta,\gamma) = \frac{1}{(2\pi)^{\frac{3}{2}}\sigma^{3}} \exp\big(j2\pi(\alpha x + \beta y + \gamma z)\big)\exp\left(-\frac{\alpha^{2}+\beta^{2}+\gamma^{2}}{2\sigma^{2}}\right), \quad (6)$$
where $x = f\sin\varphi\cos\theta$, $y = f\sin\varphi\sin\theta$, and $z = f\cos\varphi$. Variable $f$ is the central frequency (scale factor), $\varphi$ and $\theta$ represent the orientation angles in the 3D frequency domain, and $\sigma$ is the width of the Gaussian envelope in the 3D frequency domain. Therefore, heterogeneous spatial–spectral features can be extracted by using $N_g$ 3D Gabor wavelets with different frequencies and orientations, which are denoted as $\mathbf{F}^{(heterogeneous)}$. Hence, the two modalities of spatial–spectral features (i.e., the deep feature $\mathbf{F}^{(deep)}$ and the Gabor feature $\mathbf{F}^{(heterogeneous)}$) can be obtained from the HSI data.
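As a worked illustration of Equation (6), the NumPy sketch below builds one circular 3D Gabor kernel; the kernel support size and the example σ and frequency values are assumptions made for demonstration, not the paper's parameter settings.

```python
import numpy as np

def gabor_3d(f, phi, theta, sigma, size=13):
    """Build one circular 3D Gabor filter following Equation (6); returns a complex kernel."""
    x = f * np.sin(phi) * np.cos(theta)
    y = f * np.sin(phi) * np.sin(theta)
    z = f * np.cos(phi)
    half = size // 2
    a, b, c = np.meshgrid(np.arange(-half, half + 1),
                          np.arange(-half, half + 1),
                          np.arange(-half, half + 1), indexing="ij")
    carrier = np.exp(1j * 2 * np.pi * (a * x + b * y + c * z))     # complex sinusoidal plane wave
    envelope = np.exp(-(a**2 + b**2 + c**2) / (2 * sigma**2))      # 3D Gaussian envelope
    return carrier * envelope / ((2 * np.pi) ** 1.5 * sigma**3)

# usage: one filter at frequency 0.25 and orientation (pi/4, pi/4); sigma is illustrative
g = gabor_3d(f=0.25, phi=np.pi / 4, theta=np.pi / 4, sigma=2.0)
print(g.shape)   # (13, 13, 13)
```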
After that, since the heterogeneous features contain more detailed information about land covers, a $1\times1\times1$ 3D Conv layer and a softmax function are applied to obtain the attention weight for learning the detailed structure information of HSIs, and another $1\times1\times1$ 3D Conv layer with a ReLU function is executed to align the dimension of the heterogeneous features with that of the deep features, which can be written as
$$\mathbf{F}^{(heterogeneous)} = f_{C1}\big(\mathbf{F}^{(heterogeneous)}\big), \quad \alpha^{(det)} = \mathrm{softmax}\big(f_{C1}(\mathbf{F}^{(heterogeneous)})\big), \quad (7)$$
where $\alpha^{(det)}$ is the attention weight for learning the detailed information of the heterogeneous features.
As for the deep feature $\mathbf{F}^{(deep)}$, a channel attention structure containing a GAP layer, a $1\times1\times1$ 3D Conv layer with a ReLU function, and a $1\times1\times1$ 3D Conv layer with a softmax function is applied to model the most important channel information, whose forward propagation can be expressed as
$$\alpha^{(cha)} = \mathrm{softmax}\Big(f_{C1}\big(\mathrm{ReLU}(f_{C1}(f_{GAP}(\mathbf{F}^{(deep)})))\big)\Big), \quad (8)$$
where $\alpha^{(cha)}$ is the attention weight for learning the channel information of the deep features.
Then, by projecting these two modalities of features into a common space, their complementary evidence can be learned. Moreover, with the gate structure and the skip connection, the two modalities of features are adaptively integrated, jointly modeling the detailed and semantic information to extract discriminative spatial–spectral features from HSIs, as written below
$$\mathbf{F}^{(fuse)} = \sigma\big(\mathbf{F}^{(heterogeneous)} \otimes \alpha^{(cha)} + \mathbf{F}^{(deep)} \otimes \alpha^{(det)}\big) + \mathbf{F}^{(deep)}, \quad (9)$$
where $\mathbf{F}^{(fuse)}$ is the fused spatial–spectral feature, and $\sigma(\cdot)$ is the sigmoid function acting as the gate structure.
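The following TensorFlow sketch shows one possible reading of the GAFBlock (Equations (7)–(9)): a detail-attention weight derived from the Gabor features gates the deep features, a channel-attention weight derived from the deep features gates the aligned Gabor features, and a sigmoid gate plus skip connection produces the fused feature. Layer widths, the assumption that the deep feature has the same channel count as the aligned Gabor feature, and all names are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers

def gaf_block(f_deep, f_hetero, filters=64):
    """GAFBlock sketch: cross-gated fusion of deep and 3D Gabor features.

    f_deep:   deep feature from the ASBlock backbone, (batch, s, s, K, C) with C == filters assumed
    f_hetero: heterogeneous 3D Gabor feature,          (batch, s, s, K, N_g)
    """
    # align the heterogeneous feature and derive a detail-attention weight from it (Equation (7))
    f_het_aligned = layers.Conv3D(filters, 1, activation="relu")(f_hetero)
    alpha_det = tf.nn.softmax(layers.Conv3D(filters, 1)(f_hetero), axis=-1)

    # channel attention from the deep feature: GAP -> 1x1x1 convs -> softmax (Equation (8))
    gap = tf.reduce_mean(f_deep, axis=[1, 2, 3], keepdims=True)
    alpha_cha = tf.nn.softmax(
        layers.Conv3D(filters, 1)(layers.Conv3D(filters, 1, activation="relu")(gap)), axis=-1)

    # sigmoid-gated cross fusion with a skip connection (Equation (9))
    return tf.sigmoid(f_het_aligned * alpha_cha + f_deep * alpha_det) + f_deep

# usage: toy deep and Gabor features for a 7 x 7 patch with 200 bands
fused = gaf_block(tf.random.normal([2, 7, 7, 200, 64]), tf.random.normal([2, 7, 7, 200, 13]))
print(fused.shape)   # (2, 7, 7, 200, 64)
```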

2.4. ASLNN Model and Its Training Process

Based on the above ASBlock and GAFBlock, a novel ASLNN model is proposed, as shown in Figure 1, whose backbone contains a $3\times3\times3$ 3D Conv layer with a ReLU function, $l$ ASBlocks, and a GAFBlock to explore the adaptive spectral correlations and integrate the detail advantages of the heterogeneous features, extracting discriminative spatial–spectral features for HSI classification.
Suppose that the dimension of the original HSI data is expressed as $W \times H \times D$, where $W$, $H$, and $D$ are the width, the height, and the number of spectral bands, respectively. Similar to the preprocessing in most works, a 3D volume containing the local spatial region of size $s \times s$ and the $D$ spectral bands is extracted as the spatial–spectral information of each pixel $x$, which can be denoted as $\mathbf{X} \in \mathbb{R}^{s \times s \times D}$. Then, following the designs in [6,8,43] and considering the characteristics of the heterogeneous features extracted by the 3D Gabor filter, its orientations ($\theta$ and $\varphi$) take the values $0, \frac{\pi}{n}, \frac{2\pi}{n}, \ldots, \frac{(n-1)\pi}{n}$ with $n = 4$, and the number $N_g$ of 3D Gabor filters is 13 for each frequency $f$ after removing some redundant filters. To simplify the experimental analysis, the value of $d$ in each spectral group is fixed to 13 (with zeros appended if the band number is not divisible), where the total number of spectral bands is denoted as $K$ and the number of spectral groups as $N_s$. Hence, the spatial–spectral information of pixel $x$ is expressed as $\mathbf{X} \in \mathbb{R}^{s \times s \times K}$, and $N_s = K / d$.
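A minimal NumPy sketch of this patch-extraction step is given below; the reflection padding at image borders and the function name are assumptions for illustration.

```python
import numpy as np

def extract_patch(hsi, row, col, s=7):
    """Extract the s x s x D spatial–spectral cube centred on pixel (row, col).

    The HSI cube of shape (H, W, D) is reflection-padded at the borders so that every
    pixel receives a full patch (the padding mode is an assumption).
    """
    half = s // 2
    padded = np.pad(hsi, ((half, half), (half, half), (0, 0)), mode="reflect")
    return padded[row:row + s, col:col + s, :]

# usage: a toy Indian-Pines-sized cube (145 x 145 pixels, 200 bands)
cube = np.random.rand(145, 145, 200).astype(np.float32)
print(extract_patch(cube, 0, 0).shape)   # (7, 7, 200)
```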
Based on the above data preprocessing framework, the input data $\mathbf{X}$ are first fed into a $3\times3\times3$ 3D Conv layer with a ReLU function to extract low-level features. Then, to adaptively explore the spectral correlations of HSIs, $l$ ASBlocks are stacked to jointly model the short- and long-range spectral correlations, generating the high-level deep spatial–spectral feature $\mathbf{F}^{(deep)} \in \mathbb{R}^{s \times s \times K \times C_l}$ from the $l$th ASBlock. Meanwhile, $\mathbf{X}$ is convolved with $N_g$ 3D Gabor wavelets to extract the heterogeneous feature $\mathbf{F}^{(heterogeneous)} \in \mathbb{R}^{s \times s \times K \times N_g}$. These two modalities of spatial–spectral features (i.e., $\mathbf{F}^{(deep)}$ and $\mathbf{F}^{(heterogeneous)}$) are then fed into the GAFBlock to adaptively explore and integrate their complementary information, producing the fused spatial–spectral feature $\mathbf{F}^{(fuse)} \in \mathbb{R}^{s \times s \times K \times C_l}$ from Equation (9) for HSI classification. In particular, to simplify the parameter analysis of the whole model, the channel number of the first $3\times3\times3$ 3D Conv layer is fixed to 32, $l$ is set to 1, and the channel numbers in the ASBlocks and GAFBlock are fixed to 64.
Finally, inspired by [6,45], the spatial–spectral feature $\mathbf{F}^{(fuse)}$ is fed into a GAP layer and an orthogonal softmax layer (OSL), with which the classification network contains fewer trainable parameters, and the cross-entropy is used as the loss function $\mathcal{L}$ of ASLNN, as expressed below
$$\mathbf{Y}_{P} = f_{OSL}\big(f_{GAP}(\mathbf{F}^{(fuse)})\big), \quad \mathcal{L} = -\big(\mathbf{Y}_{T} \cdot \log(\mathbf{Y}_{P})\big), \quad (10)$$
where $f_{OSL}(\cdot)$ is the OSL, and $\mathbf{Y}_{P}$ and $\mathbf{Y}_{T}$ are the prediction of the whole ASLNN model and the ground-truth label of the input data, respectively.
By optimizing Equation (10) with the adaptive moment estimation (Adam) algorithm and the learning rate $l_r$, our ASLNN can be trained in an end-to-end manner, thereby generating the classification results of land covers. Additional analysis of the parameter settings is reported in Section 3.
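The classification head and training objective of Equation (10) can be sketched as follows; a plain Dense softmax layer stands in for the OSL of [45], and the learning rate shown is an illustrative assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

def classification_head(f_fuse, num_classes):
    """Equation (10) sketch: GAP over the fused feature, then a softmax classifier.

    A standard Dense softmax layer is used here in place of the orthogonal softmax
    layer (OSL); `num_classes` depends on the data set (e.g., 16 for Indian Pines).
    """
    pooled = layers.GlobalAveragePooling3D()(f_fuse)          # (batch, C)
    return layers.Dense(num_classes, activation="softmax")(pooled)

# cross-entropy loss optimized end-to-end with Adam, as described in the text
loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)      # l_r value is an assumption
```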

3. Experimental Results

In this section, to quantitatively and qualitatively evaluate the performance of our ASLNN, several spatial–spectral classification methods, i.e., APDCLNN [6], 3DG-CNN [43], SVM-CK [15], SSCL3DNN [24], 3DOC-SSAN [46], D2S2BoT [47], DBCTNet [38], HybridSN [48], and ISODATA [49], are selected as the comparison methods. Four public HSI data sets are used for performance analysis, namely WHU-Hi-LongKou, Indian Pines, 2013 Houston Data, and MUUFL Gulfport. The overall accuracy (OA), average accuracy (AA), and Kappa coefficient ($\kappa$) are adopted as quantitative metrics. To avoid deviations caused by the random selection of samples, 10 Monte Carlo runs are executed, and the average value of each metric is reported in the following experimental results. Our ASLNN is built with the TensorFlow platform (i.e., TensorFlow-GPU 2.5.0, Python 3.8.0) and trained on a desktop with an Intel Core i7-12700F processor and an NVIDIA GeForce RTX 3080 Ti GPU.
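For reference, the three quantitative metrics can be computed from a confusion matrix as in the small NumPy sketch below; the function name and matrix convention are illustrative.

```python
import numpy as np

def oa_aa_kappa(conf):
    """Compute OA, AA, and the kappa coefficient from a confusion matrix.

    `conf[i, j]` counts test samples of true class i predicted as class j.
    """
    total = conf.sum()
    oa = np.trace(conf) / total                                   # overall accuracy
    per_class = np.diag(conf) / conf.sum(axis=1)                  # per-class accuracies
    aa = per_class.mean()                                         # average accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```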

3.1. Hyperspectral Data

In order to measure the classification performance of the proposed ASLNN model with the above comparison methods, the WHU-Hi-LongKou, Indian Pines, Houston Data, and MUUFL Gulfport are used as the HSI data sets, whose detailed information can be found in Figure 5 and Figure 6.
(1) WHU-Hi-LongKou: The WHU-Hi-LongKou data set, depicted in Figure 5, was acquired with a DJI Matrice 600 Pro UAV platform during aerial surveys over Longkou Town, China, in 2018. The data set contains 270 spectral bands covering the wavelength range from 0.4 to 1 μm and 550 × 400 pixels [50,51]. The scene covers nine specific categories, comprising six diverse crop species, totaling 204,542 samples.
(2) Indian Pines: Using the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor, the Indian Pines data set was acquired in 1992 in northwestern Indiana, USA, which consists of 145 × 145 pixels with a spatial resolution of 20 m per pixel (mpp), and 200 bands in the wavelength range from 0.40 to 2.50 μm after removing some unusable spectral bands [52]. In addition, after ignoring the background pixels, 10,249 samples from 16 classes are used for experimental analysis.
(3) Houston Data: The Houston data set was collected in 2012 with the Compact Airborne Spectrographic Imager (CASI) sensor over the University of Houston campus and its surrounding area, and it is the official data set of the 2013 IEEE Geoscience and Remote Sensing Society (GRSS) Data Fusion Contest, which is available online from the official website: http://dase.grss-ieee.org/, accessed on 22 May 2025. Its spatial size is 349 × 1905 pixels with a spatial resolution of 2.5 m, and 144 spectral bands in the wavelength range from 0.38 to 1.05 μm are retained after removing some noise-corrupted spectral bands [52]. Moreover, there are 15,029 samples from 15 labeled classes for experimental research.
(4) MUUFL Gulfport: This data set was collected in 2010 by using the ITRES CASI-1500 sensor over the University of Southern Mississippi's Gulf Park campus, Long Beach, Mississippi, USA [53], and is available online from the official website: https://github.com/GatorSense/MUUFLGulfport, accessed on 22 May 2025. The data set contains 325 × 337 pixels with a spatial resolution of 1 m, of which 325 × 220 pixels are used for analysis, and 72 spectral bands with a wavelength range from 0.375 to 1.050 μm. After removing some noisy bands and disregarding the background pixels, 64 bands and 53,687 labeled samples from 11 urban land-cover classes are retained for experimental research.

3.2. Experimental Settings

The parameter settings of the selected compared methods are used according to [6,15,24,38,43,46,47,48,49]. In addition, 10 samples are randomly selected from each class of the Indian Pines, Houston, and MUUFL Gulfport data sets to build the training set, while the other samples are used for testing. The training and testing sets of the WHU-Hi-LongKou data set have been officially divided. In our research, we use 25 training labels for each class, while the other samples are used for testing.
According to the structure design of the proposed ASLNN model, several key parameters need to be tuned. In particular, inspired by the work in [6,8,43], the value of the frequency $f$ of the 3D Gabor filter is fixed to 0.50, 0.25, and 0.125 for the four HSI data sets, respectively. In addition, in order to simplify the parameter analysis process and compare fairly, the kernel sizes and the channel numbers in the whole model are fixed, as shown in Section 2. Therefore, only two key parameters need to be analyzed in detail, namely the spatial size ($s \times s$) and the number ($N_s$) of spectral groups (or, equivalently, the number $K$ of spectral bands). Compared with other methods, the complexity of the parameter analysis of the proposed model is greatly reduced.
For the spatial size $s$, the effect of its value on the classification performance of the whole model is further studied, where $s$ is searched from {3, 5, 7}. Specifically, due to memory limitations, larger local spatial windows cannot be supported. According to the experimental results in Figure 7, the proposed ASLNN model produces quasi-optimal classification accuracy when $s$ is set to 7 for the four HSI data sets.

3.3. Classification Performance

Based on the experimental settings, Table 1, Table 2, Table 3 and Table 4 report the classification results (i.e., class-specific accuracy, OA, AA, and κ ) of the proposed ASLNN model and other comparison methods under the Indian Pines, Houston Data, MUUFL Gulfport, and WHU-Hi-LongKou data sets, respectively.
Compared with the supervised algorithms, ISODATA failed to achieve comparable results, for which there might be several reasons. First, as an unsupervised algorithm, it cannot utilize labeled training samples for model development, which inherently limits its ability to match the classification performance of supervised methods that leverage explicit class annotations. Second, the data sets used in this paper contain a large number of continuous spectral bands, leading to a high-dimensional and information-redundant data space. This poses significant challenges for distance-based algorithms like ISODATA, as traditional distance metrics become ineffective, inter-class discrimination becomes ambiguous, and the reliability of clustering results decreases. Finally, real-world ground objects exhibit inherent spectral variability (e.g., intra-class spectral differences caused by lighting conditions, humidity, or phenological stages). However, ISODATA relies on linear distance metrics and assumes simple clustering structures [49], making it unable to model the complex nonlinear distributions of such spectral variations, resulting in suboptimal clustering performance.
By integrating the multilayer residual convolution and the spectral–spatial bottleneck transformer, D2S2BoT was proposed to jointly model the local features and the long-range spectral correlation, as well as to adaptively fuse global spectral and spatial dependencies, thus yielding the second-best classification results on the four data sets. As a strong baseline for small-sample classification, SVM-CK was designed by incorporating the composite kernel into the SVM classifier for jointly using the spatial–spectral information [15]; it can process high-dimensional small-sample data and is insensitive to the "curse of dimensionality", generating the third-best classification results on the MUUFL Gulfport data set. In particular, according to Figure 6c–f, compared with the other three HSI data sets, the MUUFL Gulfport data set presents the characteristics of category imbalance and large differences in the number of samples between different classes, so SVM-CK obtains better classification accuracy than APDCLNN and DBCTNet on it. Since the complementary information between different modality features is not considered, the other DNNs (i.e., HybridSN, SSCL3DNN, 3DOC-SSAN, and 3DG-CNN) yield relatively poor classification performance on the four HSI data sets. Differently, on the basis of the spectral group and the 3D Gabor filter, our ASLNN model adaptively learns the spectral correlations by jointly considering the relationships between adjacent spectral bands and between nonadjacent spectral bands, and it integrates the fine-grained complementarity of deep and heterogeneous spatial–spectral features, thus achieving the best classification performance on the four HSI data sets under the premise of low complexity. All of these aspects verify the effectiveness and superiority of the proposed ASLNN model for HSI classification purposes.
More precisely, D2S2BoT and DBCTNet are designed based on the transformer to capture local feature maps and to model the long-range correlation of HSI pixels across both the spectral and spatial dimensions. However, they only consider the long-range spectral correlation and ignore the local relationship between adjacent spectral bands, limiting their classification performance. Compared with them, our ASLNN model jointly learns the short-range and long-range spectral correlations from both local and global perspectives, and it effectively integrates the heterogeneous spatial–spectral features, achieving an average improvement of 1.70%, 3.21%, 3.80%, and 2.49% in OA over D2S2BoT and DBCTNet on the four HSI data sets, respectively. Although APDCLNN utilizes the modified ConvLSTM2D layer and the heterogeneous feature fusion module to enhance the ability of spatial–spectral feature extraction for improving the classification accuracy of HSIs, its complexity is very high due to a large number of operations and complicated feature extraction and fusion processes. Compared with APDCLNN, our ASLNN model effectively integrates the heterogeneous spatial–spectral features with low storage and computing requirements, thus obtaining 3.30%, 4.57%, 7.59%, and 2.70% improvements in OA for the four HSI data sets, respectively. As for 3DG-CNN, the 3D Gabor filter is applied to build a 3D Gabor-modulated kernel to replace the randomly initialized kernel to improve the representation ability and robustness of the extracted spatial–spectral features, which, however, overlooks the influence of heterogeneous features on the classification performance. Compared with it, the gains in OA generated by our ASLNN model are 7.25%, 6.81%, 9.72%, and 4.76% for the four HSI data sets, respectively. The proposed ASLNN model obtains 7.94%, 6.30%, 4.21%, and 3.42% gains in OA for the four HSI data sets, respectively, when compared with SVM-CK. In addition, compared with the other deep models (i.e., HybridSN, SSCL3DNN, and 3DOC-SSAN), our ASLNN model achieves similar performance gains. Based on the above experimental results and analysis, effective learning of the spectral correlations and fusion of the heterogeneous spatial–spectral features are the keys to improving the classification accuracy of land covers. Therefore, the proposed ASLNN model can extract more discriminative and robust spatial–spectral features and obtain the highest classification accuracy for HSIs, illustrating its rationality and effectiveness.
Moreover, more intuitively, Figure 8, Figure 9, Figure 10 and Figure 11 illustrate their classification maps for the four HSI data sets, respectively. It can be observed that the classification maps yielded by our ASLNN model are the closest to the ground-truth maps of the four HSI data sets, where the classification qualities of class 3, class 6, class 7, and class 12 in Figure 8, class 1, class 2, class 4, and class 15 in Figure 9, class 2, class 5, class 6, class 9, and class 11 in Figure 10, and class 1, class 2, class 5, and class 8 in Figure 11 are significantly improved. These are consistent with the conclusions in Table 1, Table 2, Table 3 and Table 4. These experimental results further prove the effectiveness of our proposed ASLNN model.

3.4. Sensitivity Comparison of Different Training Samples

To further compare and analyze the sensitivity of all considered supervised methods to the number of training samples, Figure 12 reports the experimental results under different training sizes, where 10, 20, 30, and 40 samples are randomly selected from each class of the Indian Pines, Houston, and MUUFL Gulfport data sets to build the training sets, while the remaining samples are used for testing. In particular, for class 7 and class 9 of the Indian Pines data set, the number of training samples is fixed to 10. For the WHU-Hi-LongKou data set, we use four ground-truth setups with 25, 50, 100, and 150 labeled training samples per class (also provided officially), respectively. We can observe that as the number of training samples increases from 10 to 40 (or 25 to 150), the classification performance of all methods first rises and then stabilizes. In particular, the proposed ASLNN model achieves the best classification performance in all cases for the four HSI data sets, which also demonstrates its advantages.

3.5. Ablation Study

Aiming at adaptively learning the inherent attribute information of HSIs, an ASBlock and a GAFBlock are designed to construct the ASLNN model, which adaptively learns the spectral correlations and integrates the details and interpretability of the heterogeneous features, improving the classification accuracy of land covers in HSIs. To illustrate and measure the contributions of the designed ASBlock and GAFBlock to the classification results of our ASLNN, a detailed ablation study is conducted, where the proposed ASLNN models with and without each component are abbreviated as proposed (with) and proposed (without) for convenience, respectively.

3.5.1. Effectiveness of the ASBlock

Inspired by the ResNeSt block in [44], to adaptively model the correlations between adjacent spectral bands and between nonadjacent spectral bands of HSIs, a novel ASBlock is built by utilizing the GConv and ConvLSTM3D layers. To demonstrate its contribution to the HSI classification performance of the whole model, our ASLNN model is compared with its variant (i.e., the proposed (without ASBlock)), where the ASBlock is replaced with the ResNeSt block; the experimental results are given in Table 5. It can be observed that compared with this variant, the ASBlock brings 9.26%, 12.96%, 6.97%, and 3.01% gains in OA to the proposed ASLNN model for the four HSI data sets, respectively, verifying its feasibility and effectiveness in learning spectral correlations and extracting spatial–spectral features with strong expressive ability for HSI classification.

3.5.2. Structure Analysis of the GAFBlock

In order to adaptively align and fuse the two modalities of spatial–spectral features (i.e., deep features and heterogeneous features) with low complexity, a simple but effective GAFBlock is constructed, and experiments are carried out to compare our ASLNN with its variant (i.e., the proposed (without GAFBlock)), where the GAFBlock is replaced with the concatenation operation for HSI classification. As shown in Table 6, compared with the concatenation operation, the GAFBlock brings ASLNN 1.35%, 3.12%, 4.75%, and 2.36% gains in OA for the four HSI data sets, respectively, which shows the effectiveness of the designed GAFBlock in fusing the heterogeneous spatial–spectral features. In particular, compared with the MAFB submodule in [6], the newly designed GAFBlock contains fewer network parameters and has lower computational complexity, which reduces the adverse influence of overfitting on the proposed ASLNN model, thus improving its optimization efficiency and classification performance for HSIs.

3.6. Computational Costs

As presented in Table 7, Table 8, Table 9 and Table 10, we report the average training time, average inference time, and memory usage of all compared methods (supervised) on an NVIDIA RTX 3080 GPU. The results show that while our algorithm incurs relatively higher time costs due to its inherent complexity (especially the serial sequence processing design of ConvLSTM3D), it outperforms other methods in terms of classification accuracy.

3.7. Limitations and Future Directions

First, the model exhibits high computational complexity, with its intricate architecture leading to significant computational overhead and prolonged processing times. Second, real-time deployment on drones or embedded devices remains currently infeasible due to memory requirements exceeding 2 GB and high inference latency (over 2000 ms). Third, because of various factors such as sensor types as well as daily and seasonal variations, there are usually significant differences in data distribution and category distribution between the test data and training data in actual scenarios, limiting the effectiveness of the proposed model in practical applications. Moving forward, our future work will focus on addressing these challenges through lightweight optimization techniques, such as model quantization, pruning, and knowledge distillation [54]. In addition, in order to improve the generalization of the proposed model in practical scenarios, domain-adaptive hyperspectral or multisource remote sensing image classification methods are also the next focus of research, such as cross-domain few-shot classification [55], cross-domain end-to-end classification of HSIs, or multisource remote sensing data [56].

4. Conclusions

In this article, a novel ASLNN model has been proposed for the spatial–spectral feature extraction and classification of HSIs. Firstly, by integrating the GConv layer and the ConvLSTM3D layer, an effective ASBlock is designed to jointly capture the short- and long-term spectral correlations; it can be used as a plug-and-play feature extraction module inserted into other deep learning-based models or as a fundamental network unit to construct a new backbone for adaptively modeling the local and global correlations, thus generating enhanced deep spatial–spectral features for HSI classification. After that, owing to the rich detail and interpretability of heterogeneous features, the 3D Gabor filter is applied to extract heterogeneous spatial–spectral features, and a GAFBlock is further designed to fuse the two modalities of spatial–spectral features with low parameter and computational complexity. In addition, some network parameters are set as fixed values, simplifying the whole model. Extensive comparison and ablation experiments on four commonly used HSI data sets (i.e., the Indian Pines, Houston, MUUFL Gulfport, and WHU-Hi-LongKou data sets) have been conducted to evaluate the proposed ASLNN model. Specifically, when 10, 10, 10, and 25 samples from each class are selected for training, ASLNN achieves the highest overall accuracy (OA) of 81.12%, 85.88%, 80.62%, and 97.97% on the four data sets, outperforming the other methods by more than 1.70%, 3.21%, 3.78%, and 2.70% in OA, respectively.

Author Contributions

Conceptualization, W.-Y.W.; Methodology, Y.-J.D. and Y.-P.X.; Validation, B.-J.G. and C.-L.Z.; Investigation, H.-C.L.; Data curation, Y.-J.D. and W.-Y.W.; Writing—original draft, W.-Y.W. and Y.-J.D.; Writing—review and editing, Y.-P.X., B.-J.G. and C.-L.Z.; Visualization, H.-C.L.; Funding acquisition, Y.-J.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62401203 and in part by the Hunan Provincial Key Research and Development Program under Grant 2023NK2011.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Acknowledgments

The authors would like to thank D. Landgrebe at Purdue University for providing the free downloads of the Indian Pines Airborne Visible/Infrared Imaging Spectrometer images.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. van der Meer, F. Analysis of spectral absorption features in hyperspectral imagery. Int. J. Appl. Earth Observ. Geoinf. 2004, 5, 55–68. [Google Scholar] [CrossRef]
  2. Matteoli, S.; Diani, M.; Corsini, G. Automatic target recognition within anomalous regions of interest in hyperspectral images. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2018, 11, 1056–1069. [Google Scholar] [CrossRef]
  3. Long, C.-F.; Wen, Z.-D.; Deng, Y.-J.; Hu, T.; Liu, J.-L.; Zhu, X.-H. Locality preserved selective projection learning for Rice variety identification based on leaf hyperspectral characteristics. Agronomy 2023, 13, 2401. [Google Scholar] [CrossRef]
  4. Xie, Z.; Hu, J.; Kang, X.; Duan, P.; Li, S. Multilayer global spectral-spatial attention network for wetland hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5518913. [Google Scholar] [CrossRef]
  5. Ghamisi, P.; Yokoya, N.; Li, J.; Liao, W.; Liu, S.; Plaza, J.; Rasti, B.; Plaza, A. Advances in hyperspectral image and signal processing: A comprehensive overview of the state of the art. IEEE Geosci. Remote Sens. Mag. 2017, 5, 37–78. [Google Scholar] [CrossRef]
  6. Hu, W.S.; Li, H.C.; Wang, R.; Gao, F.; Du, Q.; Plaza, A. Pseudo complex-valued deformable ConvLSTM neural network with mutual attention learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5533017. [Google Scholar] [CrossRef]
  7. Deng, Y.J.; Zhang, L.-W.; Ren, L.; Zhu, X.; Li, H.-C.; Du, Q. Tensor decomposition based relaxed linear regression for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2025. [Google Scholar] [CrossRef]
  8. Shen, L.; Jia, S. Three-dimensional Gabor wavelets for pixel-based hyperspectral imagery classification. IEEE Trans. Geosci. Remote Sens. 2011, 49, 5039–5046. [Google Scholar] [CrossRef]
  9. Zhao, Z.; Xu, X.; Li, J.; Li, S.; Plaza, A. Gabor-modulated grouped separable convolutional network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5518817. [Google Scholar] [CrossRef]
  10. Hou, B.; Huang, T.; Jiao, L. Spectral-spatial classification of hyperspectral data using 3-D morphological profile. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2364–2368. [Google Scholar]
  11. Tang, Y.Y.; Lu, Y.; Yuan, H. Hyperspectral image classification based on three-dimensional scattering wavelet transform. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2467–2480. [Google Scholar] [CrossRef]
  12. Li, Y.; Li, Q.; Liu, Y.; Xie, W. A spatial-spectral SIFT for hyperspectral image matching and classification. Pattern Recognit. Lett. 2019, 127, 18–26. [Google Scholar] [CrossRef]
  13. Li, Y.; Tang, H.; Xie, W.; Luo, W. Multidimensional local binary pattern for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5505113. [Google Scholar] [CrossRef]
  14. Wang, X. Kronecker factorization-based multinomial logistic regression for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2022, 19, 5508005. [Google Scholar] [CrossRef]
  15. Wang, Y.; Duan, H. Classification of hyperspectral images by SVM using a composite kernel by employing spectral, spatial and hierarchical structure information. Remote Sens. 2018, 10, 441. [Google Scholar] [CrossRef]
  16. Nie, X.; Xue, Z.; Lin, C.; Zhang, L.; Su, H. Structure-prior-constrained low-rank and sparse representation with discriminative incremental dictionary for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5506319. [Google Scholar] [CrossRef]
  17. He, W.; Yang, Y.; Mei, S.; Hu, J.; Xu, W.; Hao, S. Configurable 2D-3D CNNs accelerator for FPGA-based hyperspectral imagery classification. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2023, 16, 9406–9421. [Google Scholar] [CrossRef]
  18. Sun, L.; Wang, X.; Zheng, Y.; Wu, Z.; Fu, F. Multiscale 3-D-2-D mixed CNN and lightweight attention-free transformer for hyperspectral and LiDAR classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 2100116. [Google Scholar] [CrossRef]
  19. Yang, J.Y.; Li, H.C.; Yang, J.H.; Pan, L.; Du, Q.; Plaza, A. Multifrequency graph convolutional network with cross-modality mutual enhancement for multisource remote sensing data classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5505914. [Google Scholar] [CrossRef]
  20. Wang, Q.; Huang, J.; Shen, T.; Gu, Y. EHGNN: Enhanced hypergraph neural network for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2024, 61, 5504405. [Google Scholar] [CrossRef]
  21. Elman, J.L. Finding structure in time. Cognit. Sci. 1990, 14, 179–211. [Google Scholar] [CrossRef]
  22. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  23. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of the 29th International Conference on Neural Information Processing Systems (NIPS 15), Montreal, QC, Canada, 7–12 December 2015; pp. 802–810. [Google Scholar]
  24. Hu, W.S.; Li, H.C.; Pan, L.; Li, W.; Tao, R.; Du, Q. Spatial-spectral feature extraction via deep ConvLSTM neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4237–4250. [Google Scholar] [CrossRef]
  25. Zhou, F.; Hang, R.; Liu, Q.; Yuan, X. Hyperspectral image classification using spectral-spatial LSTMs. Neurocomputing 2017, 328, 39–47. [Google Scholar] [CrossRef]
  26. Liu, Q.; Zhou, F.; Hang, R.; Yuan, X. Bidirectional-convolutional LSTM based spectral-spatial feature learning for hyperspectral image classification. Remote Sens. 2017, 9, 1330. [Google Scholar] [CrossRef]
  27. Song, T.; Wang, Y.; Gao, C.; Chen, H.; Li, J. MSLAN: A two-branch multidirectional spectral-spatial LSTM attention network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5528814. [Google Scholar] [CrossRef]
  28. Wang, W.Y.; Li, H.C.; Deng, Y.J.; Shao, L.Y.; Lu, X.Q.; Du, Q. Generative adversarial capsule network with ConvLSTM for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2021, 18, 523–527. [Google Scholar] [CrossRef]
  29. Hu, W.S.; Li, H.C.; Deng, Y.J.; Sun, X.; Du, Q.; Plaza, A. Lightweight tensor attention-driven ConvLSTM neural network for hyperspectral image classification. IEEE J. Sel. Topics Signal Process. 2021, 15, 734–745. [Google Scholar] [CrossRef]
  30. Tu, B.; He, W.; Li, Q.; Peng, Y.; Chen, S. Fully convolutional network-based nonlocal-dependent learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 71, 5023414. [Google Scholar] [CrossRef]
  31. Farooque, G.; Xiao, L.; Yang, J.; Sargano, A.B. Hyperspectral image classification via a novel spectral-spatial 3D ConvLSTM-CNN. Remote Sens. 2021, 13, 4348. [Google Scholar] [CrossRef]
  32. Li, H.C.; Hu, W.S.; Li, W.; Li, J.; Du, Q.; Plaza, A. A3CLNN: Spatial, spectral and multiscale attention ConvLSTM neural network for multisource remote sensing data classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 747–761. [Google Scholar] [CrossRef] [PubMed]
  33. Wang, L.; Wang, H.; Wang, L.; Wang, X.; Shi, Y.; Cui, Y. RSSGL: Statistical loss regularized 3-D ConvLSTM for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5529420. [Google Scholar] [CrossRef]
  34. Liu, C.; Li, J.; He, L.; Plaza, A.; Li, S.; Li, B. Naive Gabor networks for hyperspectral image classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 376–390. [Google Scholar] [CrossRef] [PubMed]
  35. Sun, G.; Pan, Z.; Zhang, A.; Jia, X.; Ren, J.; Fu, H.; Yan, K. Large kernel spectral and spatial attention networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5519915. [Google Scholar]
  36. Zhu, Q.; Zhang, M.; Chen, Y.; Zheng, G. Spectral correlation-based fusion network for hyperspectral image super-resolution. IEEE Trans. Geosci. Remote Sens. 2024, 63, 5500314. [Google Scholar] [CrossRef]
  37. Kalyan, K.S.; Rajasekharan, A.; Sangeetha, S. AMMUS: A survey of transformer-based pretrained models in natural language processing. arXiv 2021, arXiv:2108.05542. [Google Scholar]
  38. Xu, R.; Dong, X.M.; Li, W.; Peng, J.; Sun, W.; Xu, Y. DBCTNet: Double branch convolution-transformer network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 64, 5509915. [Google Scholar] [CrossRef]
  39. Jia, S.; Wang, Y.; Jiang, S.; He, R. A center-masked transformer for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5510416. [Google Scholar] [CrossRef]
  40. Shi, C.; Yue, S.; Wang, L. A dual-branch multiscale transformer network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5504520. [Google Scholar] [CrossRef]
  41. Ahmad, M.; Ghous, U.; Usama, M.; Mazzara, M. WaveFormer: Spectral-spatial wavelet transformer for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2024, 21, 5502405. [Google Scholar] [CrossRef]
  42. Hu, W.S.; Li, W.; Li, H.C.; Huang, F.H.; Tao, R. Global clue-guided cross-memory quaternion transformer network for multisource remote sensing data classification. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 7357–7371. [Google Scholar] [CrossRef] [PubMed]
  43. Jia, S.; Liao, J.; Xu, M.; Li, Y.; Zhu, J.; Sun, W.; Jia, X.; Li, Q. 3-D Gabor convolutional neural network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5509216. [Google Scholar] [CrossRef]
  44. Zhang, H.; Wu, C.; Zhang, Z.; Zhu, Y.; Lin, H.; Zhang, Z.; Sun, Y.; He, T.; Mueller, J.; Manmatha, R.; et al. ResNeSt: Split-attention networks. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), New Orleans, LA, USA, 19–20 June 2022; pp. 2735–2745. [Google Scholar]
  45. Li, X.; Chang, D.; Ma, Z.; Tan, Z.H.; Xue, J.H.; Cao, J.; Yu, J.; Guo, J. OSLNet: Deep small-sample classification with an orthogonal softmax layer. IEEE Trans. Image Process. 2020, 29, 6482–6495. [Google Scholar] [CrossRef] [PubMed]
  46. Tang, X.; Meng, F.; Zhang, X.; Cheung, Y.M.; Ma, J.; Liu, F.; Jiao, L. Hyperspectral image classification based on 3-D octave convolution with spatial-spectral attention network. IEEE Trans. Geosci. Remote Sens. 2021, 59, 2430–2447. [Google Scholar] [CrossRef]
  47. Zhang, L.; Wang, Y.; Yang, L.; Chen, J.; Liu, Z.; Bian, L.; Yang, C. D2S2BoT: Dual-dimension spectral-spatial bottleneck transformer for hyperspectral image classification. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2024, 17, 2655–2669. [Google Scholar] [CrossRef]
  48. Roy, S.K.; Krishna, G.; Dubey, S.R.; Chaudhuri, B.B. HybridSN: Exploring 3-D-2-D CNN feature hierarchy for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 277–281. [Google Scholar] [CrossRef]
  49. Rahman, S.-A. Hyperspectral imaging classification using ISODATA algorithm: Big data challenge. In Proceedings of the 5th International Conference on e-Learning, Manama, Bahrain, 18–20 October 2015; pp. 247–250. [Google Scholar]
  50. Zhong, Y.; Hu, X.; Luo, C.; Wang, X.; Zhao, J.; Zhang, L. WHU-Hi: UAV-borne hyperspectral with high spatial resolution (H2) benchmark datasets and classifier for precise crop identification based on deep convolutional neural network with CRF. Remote Sens. Environ. 2020, 250, 112012. [Google Scholar] [CrossRef]
  51. Zhong, Y.; Wang, X.; Xu, Y.; Wang, S.; Jia, T.; Hu, X.; Zhao, J.; Wei, L.; Zhang, L. Mini-UAV-borne hyperspectral remote sensing: From observation and processing to applications. IEEE Geosci. Remote Sens. Mag. 2018, 6, 46–62. [Google Scholar] [CrossRef]
  52. Wu, C.; Tong, L.; Zhou, J.; Xiao, C. Spectral–spatial large kernel attention network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5508814. [Google Scholar] [CrossRef]
  53. Gader, P.; Zare, A.; Close, R.; Aitken, J.; Tuell, G. MUUFL Gulfport hyperspectral and LiDAR airborne data set; Tech. Rep. REP-2013-570; University of Florida: Gainesville, FL, USA, 2013. [Google Scholar]
  54. Li, C.; Rasti, B.; Tang, X.; Duan, P.; Li, J.; Peng, Y. Channel-layer-oriented lightweight spectral–spatial network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5504214. [Google Scholar] [CrossRef]
  55. Li, J.; Zhang, Z.; Song, R.; Li, Y.; Du, Q. SCFormer: Spectral coordinate transformer for cross-domain few-shot hyperspectral image classification. IEEE Trans. Image Process. 2024, 33, 840–855. [Google Scholar] [CrossRef] [PubMed]
  56. Hu, W.S.; Li, W.; Li, H.C.; Zhao, X.; Zhang, M.; Tao, R. Unsupervised domain adaptation with hierarchical masked dual-adversarial network for end-to-end classification of multisource remote sensing data. IEEE Trans. Geosci. Remote Sens. 2025, 63, 4409917. [Google Scholar] [CrossRef]
Figure 1. Overview of the proposed ASLNN model.
Figure 2. Structure of the developed ASBlock.
Figure 3. Inner structure of the designed GSS subblock.
Figure 4. Inner structure of the GAFBlock.
Figure 5. The WHU-Hi-LongKou data set. (a) Image cube. (b) Ground-truth image.
Figure 6. (Left) False-color maps and (Right) ground-truth maps. (a) Indian Pines (bands 20, 40, and 60). (b) 2013 Houston Data (bands 57, 27, and 17). (c) MUUFL Gulfport (bands 31, 20, and 15). (d–f) Numbers of training samples.
Figure 7. OA (%) achieved by the proposed ASLNN model with different window sizes s × s for the four HSI data sets.
Figure 8. Classification maps for the Indian Pines data set. (a) ISODATA. (b) SVM-CK. (c) HybridSN. (d) SSCL3DNN. (e) 3DOC-SSAN. (f) 3DG-CNN. (g) APDCLNN. (h) D2S2BoT. (i) DBCTNet. (j) ASLNN.
Figure 9. Classification maps for the Houston data set. (a) ISODATA. (b) SVM-CK. (c) HybridSN. (d) SSCL3DNN. (e) 3DOC-SSAN. (f) 3DG-CNN. (g) APDCLNN. (h) D2S2BoT. (i) DBCTNet. (j) ASLNN.
Figure 10. Classification maps for the MUUFL Gulfport data set. (a) ISODATA. (b) SVM-CK. (c) HybridSN. (d) SSCL3DNN. (e) 3DOC-SSAN. (f) 3DG-CNN. (g) APDCLNN. (h) D2S2BoT. (i) DBCTNet. (j) ASLNN.
Figure 11. Classification maps for the WHU-Hi-LongKou data set. (a) ISODATA. (b) SVM-CK. (c) HybridSN. (d) SSCL3DNN. (e) 3DOC-SSAN. (f) 3DG-CNN. (g) APDCLNN. (h) D2S2BoT. (i) DBCTNet. (j) ASLNN.
Figure 12. OA (%) of all considered methods under different numbers of training samples for the four HSI data sets. (a) Indian Pines. (b) Houston Data. (c) MUUFL Gulfport. (d) WHU-Hi-LongKou.
Table 1. Classification results of different methods for the Indian Pines data set.
Class  ISODATA  SVM-CK  HybridSN  SSCL3DNN  3DOC-SSAN  3DG-CNN  APDCLNN  D2S2BoT  DBCTNet  ASLNN
1  78.15  97.78  98.61  100.00  99.44  98.06  99.72  56.16  100.00  97.22
2  33.10  59.56  45.90  53.30  72.08  69.68  68.43  78.44  86.31  66.29
3  32.68  64.95  54.22  50.37  65.78  55.98  66.96  60.47  58.17  83.78
4  72.24  80.44  83.57  88.63  89.30  84.41  95.29  89.63  100.00  85.90
5  77.75  87.53  78.27  76.58  84.57  90.61  73.68  97.57  91.96  88.58
6  74.02  94.39  93.79  90.17  94.06  92.26  83.58  94.40  94.72  96.25
7  55.87  98.89  100.00  100.00  100.00  99.44  100.00  60.52  100.00  100.00
8  84.36  93.59  98.89  100.00  98.01  95.64  99.98  99.78  100.00  92.09
9  84.36  100.00  100.00  100.00  100.00  100.00  100.00  17.04  100.00  100.00
10  52.78  74.07  68.62  75.36  77.38  67.06  75.97  77.81  71.10  75.57
11  20.02  59.56  59.80  59.66  62.05  59.21  73.49  89.11  58.89  69.12
12  30.70  57.63  60.62  64.97  63.19  71.34  75.15  46.23  42.19  80.44
13  69.23  97.74  99.28  99.79  97.38  98.31  96.56  89.62  99.28  95.38
14  52.70  92.10  83.10  88.29  85.73  89.10  84.38  96.25  98.08  99.60
15  43.61  78.62  77.18  87.66  89.18  82.39  90.03  74.41  97.87  93.61
16  68.47  98.31  96.14  100.00  100.00  99.40  98.67  48.08  100.00  98.67
OA  41.08  73.18  68.78  71.28  76.10  73.87  77.82  79.42  77.54  81.12
AA  60.87  83.45  81.12  83.42  86.13  84.60  86.37  73.47  87.45  88.99
κ  38.92  69.79  65.01  67.83  73.19  71.08  75.02  76.94  74.59  78.76
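The summary rows of Tables 1–4 (OA, AA, and κ) are the standard overall accuracy, average accuracy, and Cohen's kappa coefficient derived from the test-set confusion matrix. As a minimal illustrative sketch (plain NumPy, not the authors' code; the function and variable names below are placeholders), these three scores can be computed as follows:

```python
import numpy as np

def classification_scores(y_true, y_pred, num_classes):
    """Compute OA, AA, and Cohen's kappa from integer label vectors."""
    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1

    total = cm.sum()
    # Overall accuracy: fraction of correctly classified test samples.
    oa = np.trace(cm) / total
    # Average accuracy: mean of the per-class recalls.
    per_class = np.diag(cm) / cm.sum(axis=1)
    aa = per_class.mean()
    # Cohen's kappa: agreement corrected for chance agreement.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

# Toy usage with made-up labels for three classes.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])
print(classification_scores(y_true, y_pred, num_classes=3))
```

The tables report these scores multiplied by 100.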
Table 2. Classification results of different methods for the Houston data set.
Class  ISODATA  SVM-CK  HybridSN  SSCL3DNN  3DOC-SSAN  3DG-CNN  APDCLNN  D2S2BoT  DBCTNet  ASLNN
1  85.41  90.81  89.52  85.98  91.37  88.76  90.07  87.80  75.58  92.67
2  68.56  88.04  84.57  81.27  81.78  86.75  88.70  82.25  87.54  94.87
3  86.27  88.96  87.46  95.68  92.73  98.20  92.99  97.15  99.27  92.88
4  71.24  86.97  83.54  84.80  85.24  88.44  89.74  91.65  85.65  92.33
5  67.85  99.12  97.89  95.05  96.42  99.86  94.87  95.20  97.88  99.51
6  69.19  80.38  65.92  73.38  88.47  73.41  90.96  83.23  91.22  91.03
7  57.63  65.84  72.04  73.86  68.90  69.37  83.39  81.01  86.64  77.05
8  49.59  53.99  55.51  60.19  65.02  52.72  58.02  90.85  42.38  62.53
9  50.64  76.99  62.15  70.21  69.58  63.82  78.16  75.53  84.94  80.13
10  17.09  76.95  45.52  68.82  79.47  73.06  71.25  62.87  91.53  82.87
11  59.10  69.00  60.24  61.57  65.36  69.67  67.36  96.77  57.63  74.43
12  21.89  72.27  60.78  61.91  84.43  73.47  68.42  69.38  43.90  81.28
13  11.30  72.79  55.23  57.60  90.50  76.80  86.14  66.04  85.40  88.67
14  78.84  91.99  96.24  98.85  96.39  97.18  96.32  100.00  90.19  98.41
15  80.13  95.94  86.14  84.48  95.54  98.61  94.41  95.08  98.61  99.34
OA  54.49  79.57  72.98  75.97  81.18  79.07  81.31  82.67  78.64  85.88
AA  57.66  80.67  73.72  76.91  83.41  80.68  83.39  84.99  81.36  86.58
κ  52.31  77.92  70.85  74.04  79.67  77.38  79.83  81.28  76.91  84.83
Table 3. Classification results of different methods for the MUUFL Gulfport data set.
Class  ISODATA  SVM-CK  HybridSN  SSCL3DNN  3DOC-SSAN  3DG-CNN  APDCLNN  D2S2BoT  DBCTNet  ASLNN
1  44.98  82.52  82.64  86.40  67.50  78.30  83.32  97.99  85.05  81.42
2  57.95  73.71  75.91  62.73  60.24  71.32  62.38  71.23  65.32  81.17
3  48.55  68.09  42.27  45.39  22.37  57.00  56.13  65.13  65.11  67.95
4  53.25  83.62  77.61  84.39  69.50  74.00  75.47  72.97  65.85  74.28
5  48.53  79.08  55.76  47.84  43.37  69.22  67.30  88.12  68.11  89.15
6  68.20  91.16  90.44  99.91  92.08  80.66  95.09  39.59  98.46  99.91
7  64.93  74.46  69.51  73.37  50.60  70.05  77.14  44.80  93.47  96.32
8  34.86  62.58  62.41  72.99  39.98  66.42  71.09  95.06  54.22  86.35
9  13.30  59.99  34.02  30.01  33.67  35.29  34.81  21.03  47.92  66.47
10  31.79  73.99  53.41  48.15  62.25  54.45  52.14  6.15  23.69  73.41
11  74.20  86.56  94.48  94.40  95.47  89.38  94.13  62.40  96.52  96.52
OA  50.27  76.41  69.29  70.87  53.65  70.90  73.03  76.82  73.94  80.62
AA  47.71  75.98  67.13  67.78  55.19  67.82  69.91  60.41  69.43  80.52
κ  40.55  69.93  60.78  62.60  43.14  63.21  65.54  70.63  66.75  74.15
Table 4. Classification results of different methods for the WHU-Hi-LongKou data set.
Class  ISODATA  SVM-CK  HybridSN  SSCL3DNN  3DOC-SSAN  3DG-CNN  APDCLNN  D2S2BoT  DBCTNet  ASLNN
1  81.24  99.34  97.21  90.02  94.72  93.77  98.80  98.74  95.06  99.62
2  83.30  95.48  97.19  88.84  97.89  74.18  96.98  74.48  93.84  97.89
3  77.20  99.90  94.04  94.81  98.40  88.58  85.16  84.20  99.56  98.23
4  49.54  90.23  77.77  91.89  82.53  93.70  89.08  99.93  84.22  95.31
5  60.95  92.54  87.63  93.40  86.08  89.77  95.61  84.22  89.66  97.91
6  81.73  90.45  99.62  95.12  97.45  96.78  97.43  95.59  93.46  99.39
7  77.98  97.44  99.89  99.57  99.95  96.13  99.52  99.89  98.72  99.75
8  81.06  87.69  70.06  92.80  72.33  82.36  89.68  88.35  90.96  95.22
9  61.53  93.74  90.08  96.38  96.07  88.73  97.92  61.34  85.36  96.94
OA  71.24  94.55  90.85  94.38  92.09  93.21  95.27  95.48  92.34  97.97
AA  72.92  94.09  90.39  93.65  91.71  89.33  94.47  87.42  92.32  97.81
κ  68.96  92.94  88.23  92.68  89.78  91.16  93.85  94.14  90.09  97.35
Table 5. Effectiveness analysis of the ASBlock.
Models              Indian Pines (OA / κ)    Houston Data (OA / κ)
proposed (without)  71.86 / 68.38            72.92 / 70.84
proposed (with)     81.12 / 78.76            85.88 / 84.83

Models              MUUFL Gulfport (OA / κ)  WHU-Hi-LongKou (OA / κ)
proposed (without)  73.65 / 66.91            94.96 / 93.51
proposed (with)     80.62 / 74.15            97.97 / 97.35
Table 6. Structure analysis of the GAFBlock.
Models              Indian Pines (OA / κ)    Houston Data (OA / κ)
proposed (without)  79.77 / 76.82            82.76 / 81.24
proposed (with)     81.12 / 78.76            85.88 / 84.83

Models              MUUFL Gulfport (OA / κ)  WHU-Hi-LongKou (OA / κ)
proposed (without)  75.87 / 68.61            95.61 / 94.12
proposed (with)     80.62 / 74.15            97.97 / 97.35
Table 7. Computing performance comparison results of different methods for the WHU-Hi-LongKou data set.
Metric  SVM-CK  HybridSN  SSCL3DNN  3DOC-SSAN  3DG-CNN  APDCLNN  D2S2BoT  DBCTNet  ASLNN
Training time (s)  684.37  1874.25  2111.42  1421.68  1774.97  2965.44  2158.94  3422.78  3756.27
Test time (s)  78.46  36.79  35.48  18.56  28.15  81.47  68.48  89.74  93.14
Memory (GB)  1.36  2.19  2.87  3.44  3.16  6.98  4.58  5.87  11.39
OA (%)  94.55  90.85  94.38  92.09  93.21  95.27  95.48  92.34  97.97
Table 8. Computing performance comparison results of different methods for the Indian Pines data set.
Metric  SVM-CK  HybridSN  SSCL3DNN  3DOC-SSAN  3DG-CNN  APDCLNN  D2S2BoT  DBCTNet  ASLNN
Training time (s)  77.25  1239.48  1227.81  1106.59  1189.80  1278.36  1211.53  1369.74  1539.16
Test time (s)  2.42  13.49  11.89  1.13  9.788  18.32  16.31  7.88  20.09
Memory (GB)  0.91  1.78  1.54  1.12  1.85  2.24  1.08  1.34  9.08
OA (%)  73.18  68.78  71.28  76.10  73.87  77.82  79.42  77.54  81.12
Table 9. Computing performance comparison results of different methods for the Houston data set.
Metric  SVM-CK  HybridSN  SSCL3DNN  3DOC-SSAN  3DG-CNN  APDCLNN  D2S2BoT  DBCTNet  ASLNN
Training time (s)  141.34  1360.75  138.96  1421.31  1358.07  1392.81  1256.47  2343.62  2674.33
Test time (s)  3.89  7.34  5.68  2.07  15.74  48.54  43.11  17.67  27.52
Memory (GB)  2.05  3.28  3.17  2.76  3.44  4.73  3.91  1.91  8.76
OA (%)  79.57  72.98  75.97  81.18  79.07  81.31  82.67  78.64  85.88
Table 10. Computing performance comparison results of different methods for the MUUFL Gulfport data set.
Metric  SVM-CK  HybridSN  SSCL3DNN  3DOC-SSAN  3DG-CNN  APDCLNN  D2S2BoT  DBCTNet  ASLNN
Training time (s)  144.81  1134.12  1260.17  976.88  1428.71  1783.32  1677.26  1282.57  2396.31
Test time (s)  5.92  5.75  6.12  2.08  15.33  59.64  56.41  21.96  74.52
Memory (GB)  1.29  1.98  2.14  1.63  2.44  2.31  1.76  1.98  2.53
OA (%)  76.41  69.29  70.87  53.65  70.90  73.03  76.82  73.94  80.62
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.