Remote Sensing
  • Article
  • Open Access

19 April 2025

Hyperspectral Image Classification Using a Multi-Scale CNN Architecture with Asymmetric Convolutions from Small to Large Kernels

1 School of Information Engineering, Guangdong University of Technology, Guangzhou 510006, China
2 Department of Surveying Engineering, Guangdong University of Technology, Guangzhou 510006, China
3 Guangdong Provincial Key Laboratory of Intellectual Property Big Data, Guangdong Polytechnic Normal University, Guangzhou 510665, China
4 National Subsea Centre, Robert Gordon University, Aberdeen AB21 0BH, UK

Abstract

Deep learning-based hyperspectral image (HSI) classification methods, such as Transformers and Mamba, have attracted considerable attention. However, several challenges persist: (1) Transformers suffer from quadratic computational complexity due to the self-attention mechanism; and (2) both the local and global feature extraction capabilities of large kernel convolutional neural networks (LKCNNs) need to be enhanced. To address these limitations, we introduce a multi-scale large kernel asymmetric CNN (MSLKACNN), with kernel sizes as large as 1 × 17 and 17 × 1, for HSI classification. MSLKACNN comprises a spectral feature extraction module (SFEM) and a multi-scale large kernel asymmetric convolution (MSLKAC). Specifically, the SFEM is first utilized to suppress noise, reduce spectral bands, and capture spectral features. Then, MSLKAC, with a large receptive field, joins two parallel multi-scale asymmetric convolution components to extract both local and global spatial features: (C1) a multi-scale large kernel asymmetric depthwise convolution (MLKADC), designed to capture short-range, middle-range, and long-range spatial features; and (C2) a multi-scale asymmetric dilated depthwise convolution (MADDC), proposed to aggregate spatial features between pixels across diverse distances. Extensive experiments on four widely used HSI datasets show that the proposed MSLKACNN significantly outperforms ten state-of-the-art methods, with overall accuracy (OA) gains ranging from 4.93% to 17.80% on Indian Pines, 2.09% to 15.86% on Botswana, 0.67% to 13.33% on Houston 2013, and 2.20% to 24.33% on LongKou. These results validate the effectiveness of the proposed MSLKACNN.

1. Introduction

Hyperspectral images (HSI) consist of hundreds of narrow spectral bands captured by hyperspectral remote sensors and contain rich spectral–spatial information. Compared with RGB and multispectral images, HSI offers clear advantages for classifying land cover types. Therefore, hyperspectral image classification (HSIC) provides crucial technical support for a wide range of applications in domains such as urban planning [1], agriculture [2], mineral exploration [3], atmospheric sciences [4], environmental monitoring [5], and object tracking [6].
A multitude of HSI classification methods primarily build on traditional machine learning (ML) models [7] and deep learning (DL) models [8,9,10]. Compared with traditional ML methods that depend on handcrafted feature engineering [11], DL approaches have shown significantly greater potential in various fields, including HSI classification, owing to their ability to learn features automatically in an end-to-end manner. Typical DL approaches include stacked autoencoders (SAEs) [12], recurrent neural networks (RNNs) [13], convolutional neural networks (CNNs) [14,15], capsule networks (CapsNets) [16], graph convolutional networks (GCNs) [17,18], Transformers [10,19], and Mamba [20]. Among these models, CNN-, GCN-, Transformer-, and Mamba-based models have gained the most interest. CNN-based models [14,21] utilize shape-fixed small kernel convolutions to extract local contextual information from fixed-size image patches. Subsequently, researchers explored multi-scale CNN architectures [22,23] and attention-based CNN models [8,24,25] to enhance the ability to capture local spatial–spectral features, thereby improving HSI classification performance. However, owing to the limited receptive field of their small kernels, these models encounter challenges in identifying the relationships between land covers over medium and long distances.
Compared to CNNs with shape-fixed kernels, graph convolutional networks (GCNs) [26] and their variants can perform flexible convolutions across irregular land cover regions. Consequently, many works introduce superpixel-based GCNs to classify HSI data [9,27,28,29]. These superpixel GCNs are capable of establishing long-range spatial dependencies and capturing global information by leveraging superpixels as graph nodes. While the aforementioned superpixel-based GCN models enhance HSI classification, they suffer from two limitations: (1) the construction of their adjacency matrices requires significant computational resources, thereby diminishing classification efficiency; and (2) these adjacency matrices solely model spatial relationships between pixels, overlooking the crucial spectral correlations.
Recently, driven by the outstanding achievements of vision Transformers (ViTs) [30] in natural image processing, Transformer-based models [10,19,31,32,33] have been proposed for identifying land cover types. These models have demonstrated remarkable classification outcomes, attributed to their robust capability in capturing and modeling remote dependencies among pixels. Nevertheless, they suffer from computational inefficiency due to the quadratic computational complexity driven by the self-attention mechanism in the Transformer. This complexity poses challenges when dealing with large HSI datasets containing numerous labeled pixels. To address these limitations, several studies [20,34] are devoted to developing Mamba [35] frameworks for HSI classification. Although these Mamba-based models show strong long-range modeling ability and achieve linear computational complexity, their local feature extraction capabilities need to be enhanced.
In recent years, large kernel CNNs (LKCNNs) [36,37,38] have garnered considerable attention. Unlike traditional CNNs, which stack a series of small-kernel layers to enlarge the receptive field, LKCNNs employ a few large spatial convolutions to increase the size of the receptive field, demonstrating a promising capability in natural visual tasks. This capability has inspired a limited number of studies [39,40,41] that leverage LKCNNs for HSI classification. These studies, such as SSLKA [41], typically employ the classical large kernel attention (LKA) [37], which decomposes a k × k large kernel convolution into a (2d − 1) × (2d − 1) depthwise convolution (DWC) [42], a ⌈k/d⌉ × ⌈k/d⌉ depthwise dilation convolution (DDC) with a dilation factor of d, and a 1 × 1 convolution, to capture global features. However, they face three issues: (1) The LKA primarily focuses on modeling long-range dependencies while overlooking the extraction of local features. (2) Their number of parameters and computational complexity increase significantly when k is large, raising the risk of overfitting. (3) Their capability to learn global features needs to be enhanced when k is not large.
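For readers unfamiliar with this decomposition, the following is a minimal PyTorch sketch of the classical LKA structure described above; the class name, the kernel size k = 21, and the dilation d = 3 are illustrative values rather than settings used in this paper.

```python
import torch.nn as nn

class LKASketch(nn.Module):
    """Classical LKA decomposition: DWC -> dilated DWC -> 1x1 conv, used as attention."""
    def __init__(self, channels, k=21, d=3):
        super().__init__()
        # (2d-1) x (2d-1) depthwise convolution
        self.dwc = nn.Conv2d(channels, channels, kernel_size=2 * d - 1,
                             padding=d - 1, groups=channels)
        # ceil(k/d) x ceil(k/d) depthwise dilated convolution with dilation factor d
        kd = -(-k // d)  # ceiling division
        self.ddc = nn.Conv2d(channels, channels, kernel_size=kd, dilation=d,
                             padding=(kd // 2) * d, groups=channels)
        self.pwc = nn.Conv2d(channels, channels, kernel_size=1)  # 1 x 1 convolution

    def forward(self, x):
        attn = self.pwc(self.ddc(self.dwc(x)))
        return attn * x  # attention weights modulate the input features
```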
To tackle these limitations of CNN-, GCN-, Transformer-, Mamba-, and LKCNN-based models, we propose a multi-scale large kernel asymmetric CNN (MSLKACNN) for HSI classification. This architecture scales the kernel sizes up to 1 × 17 and 17 × 1, as illustrated in Figure 1. Specifically, we first develop a spectral feature extraction module (SFEM) to eliminate noise, reduce spectral bands, and extract spectral features. Subsequently, to capture spatial features at different scales, we construct a novel multi-scale large kernel asymmetric convolution (MSLKAC) comprising two parallel multi-scale asymmetric convolution components: a multi-scale large kernel asymmetric depthwise convolution (MLKADC) and a multi-scale asymmetric dilated depthwise convolution (MADDC). MLKADC consists of parallel asymmetric depthwise convolutions (ADCs) with kernels ranging from 1 × 3 and 3 × 1 to 1 × m and m × 1, designed to learn short-range (small local), medium-range (larger local), and long-range (global) spatial features. Since these depthwise kernels are non-square and m is set to a large value of 17, we refer to the MLKADC as a large kernel convolution. MADDC captures spatial relationships among pixels at varying distances through an integration of multi-scale learning, dilated convolutions [43], DWCs, and asymmetric convolutions. Lastly, an average fusion pooling (AFP) is introduced to fuse the spatial features extracted by the various components. The main contributions of this article are summarized as follows.
Figure 1. Overview illustration of the proposed MSLKACNN, which consists of two primary blocks, i.e., a spectral feature extraction module (SFEM) and a multi-scale large kernel asymmetric convolution (MSLKAC) block. MSLKAC includes three key components: MLKADC, MADDC, and an average fusion pooling (AFP).
(1) We introduce a novel MLKADC to extract local-to-global spatial features. The MLKADC utilizes a series of asymmetric DWCs with small to large kernels, addressing the limitations of existing DL models. Notably, it extends the non-square kernel sizes to 1 × 17 and 17 × 1 , thus enhancing the global feature extraction capabilities while reducing the number of parameters compared to SSLKA, which relies on standard square kernels.
(2) We propose a new MADDC to model the spatial relationships between land covers at different distances by combining ADC with dilated convolution.
(3) By combining the proposed MLKADC and MADDC in parallel, we develop a novel MSLKAC for improving the ability to extract spatial features across small to large ranges. Based on our MSLKAC, we introduce an architecture termed MSLKACNN to jointly learn both spectral and spatial features through the SFEM and MSLKAC.
The rest of the paper is organized as follows. In Section 2, we present the related works of HSI classification models. The proposed MSLKACNN is introduced in Section 3. In Section 4, we evaluate and discuss the performance of MSLKACNN. We summarize the paper in Section 5.

3. Proposed Method

The flowchart of the proposed MSLKACNN architecture is depicted in Figure 1, comprising two blocks and a classifier: (1) a spectral feature extraction module (SFEM), designed to eliminate noise, reduce the dimensionality of bands, and learn spectral features in raw HSI data; (2) a multi-scale large kernel asymmetric convolution (MSLKAC), which utilizes two parallel multi-scale asymmetric convolutions with small to large kernels to capture both local and global spatial features; and (3) a Softmax classifier that assigns labels to individual pixels.
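As a high-level illustration, a minimal PyTorch sketch of this three-stage flow is given below; the wrapper class and its arguments are illustrative, and it assumes that the SFEM and MSLKAC blocks (detailed in Sections 3.1 and 3.2) return feature maps with 64 channels.

```python
import torch.nn as nn

class MSLKACNNPipeline(nn.Module):
    """Illustrative wrapper: SFEM -> MSLKAC -> per-pixel classifier."""
    def __init__(self, sfem, mslkac, channels=64, num_classes=16):
        super().__init__()
        self.sfem = sfem        # spectral feature extraction module (Section 3.1)
        self.mslkac = mslkac    # multi-scale large kernel asymmetric convolution (Section 3.2)
        self.head = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, x):       # x: (B, bands, H, W), i.e., the entire HSI
        x = self.sfem(x)        # spectral features
        x = self.mslkac(x)      # fused local-to-global spatial features
        return self.head(x)     # per-pixel class scores (Softmax applied in the loss)
```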

3.1. SFEM

The original HSI data incorporate superfluous spectral information and are prone to being influenced by noise. To circumvent these issues and extract spectral features, we design the SFEM. Unlike principal component analysis (PCA), which applies linear transformations to hyperspectral data for dimensionality reduction, our SFEM automatically performs nonlinear operations to reduce the dimensionality of HSI through three consecutive identical convolutional blocks equipped with a limited number of filters, suppressing noise and learning the spectral features of the hyperspectral data. Each block in the proposed SFEM consists of a 1 × 1 convolution, a batch normalization (BN), and a ReLU6 function.
Let X_l be the input feature map of the l-th convolutional layer. The output feature map of this layer, denoted X_{l+1}, can be expressed as

$$X_{l+1} = \operatorname{ReLU6}\left(\operatorname{BN}\left(\operatorname{Conv}_1^{l}(X_l)\right)\right), \tag{1}$$

where Conv_1^l denotes the l-th convolutional layer with a 1 × 1 kernel.
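A minimal PyTorch sketch of the SFEM, assuming the filter count of 64 reported in Section 4.2, is given below; the class and argument names are illustrative.

```python
import torch.nn as nn

class SFEM(nn.Module):
    """Three identical 1x1 convolution blocks (Conv -> BN -> ReLU6) that reduce
    the spectral dimension, suppress noise, and learn spectral features."""
    def __init__(self, in_bands, filters=64, num_blocks=3):
        super().__init__()
        layers, channels = [], in_bands
        for _ in range(num_blocks):
            layers += [nn.Conv2d(channels, filters, kernel_size=1),
                       nn.BatchNorm2d(filters),
                       nn.ReLU6(inplace=True)]
            channels = filters
        self.blocks = nn.Sequential(*layers)

    def forward(self, x):          # x: (B, bands, H, W)
        return self.blocks(x)      # (B, filters, H, W)
```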

3.2. MSLKAC

In this section, we introduce the innovative MSLKAC, which enlarges the large kernel sizes to 1 × 17 and 17 × 1 as illustrated in Figure 1. The MSLKAC consists of two convolutions and a fusion operation: (1) a multi-scale large kernel asymmetric depthwise convolution (MLKADC), which employs techniques such as multi-scale learning and asymmetric depthwise convolutions with small to large kernels, to extract spatial features from HSI data across various ranges; (2) a multi-scale asymmetric dilated depthwise convolution (MADDC) that captures spatial information among pixels at varying distances by combining these techniques, including multi-scale learning, dilated convolutions, depthwise convolutions, and asymmetric convolutions; and (3) an average fusion pooling (AFP) that integrates the features learned by the two convolutions. The details of these three components will be described in the following sections.
(1) Multi-Scale Large Kernel Asymmetric Depthwise Convolution (MLKADC): Compared with an ordinary convolution, a DWC considerably reduces the computational complexity and the number of parameters by convolving each channel of the input feature map separately. Owing to these advantages, Gao et al. [62] use DWCs instead of ordinary convolutions to learn spectral features from HSI. To further reduce computations and parameters, we design a new asymmetric depthwise convolution (ADC), as illustrated in Figure 2b. In the proposed ADC, we decompose a 3 × 3 DWC (with a kernel size of 3 × 3, as shown in Figure 2a) into a 1 × 3 DWC and a 3 × 1 DWC, with the goal of improving computational efficiency and reducing the number of parameters. To extract small local, larger local, and global spatial features from HSI, we present a novel MLKADC by constructing ADCs ranging from a sequence of two DWCs with 1 × 3 and 3 × 1 kernels to a sequence of two DWCs with 1 × m and m × 1 kernels, as depicted in Figure 2c. The proposed MLKADC comprises (m − 1)/2 parallel ADCs. These ADCs take the same input features captured by the SFEM, perform convolution operations in an equal-width manner, maintain a consistent topology, and adhere to two rules: (i) the hyperparameters (filter numbers and strides) remain the same across these ADCs, except for their varying kernel sizes; and (ii) the kernel sizes form an arithmetic progression with a common difference of 2. Following these two rules, we merely need to design the first template ADC and set the number of scales, and our MLKADC is determined accordingly. Consequently, these two rules significantly streamline the design space, allowing us to concentrate on optimizing a few hyperparameters.
Figure 2. Different depthwise convolutions. (a) Depthwise convolution. (b) Proposed asymmetric depthwise convolution. (c) Proposed multi-scale large kernel asymmetric depthwise convolution (MLKADC).
For the proposed MLKADC, we utilize DWCs with kernels ranging from 1 × 3 to 1 × m to extract spatial features along the width of the HSI, followed by DWCs with kernels ranging from 3 × 1 to m × 1 and ReLU6 functions to learn spatial features along the height of the HSI, where 1 × m and m × 1 represent the largest kernel sizes in the width and height directions, respectively. As m is set to a large value of 17 in our experiments, we refer to our MLKADC as a large kernel convolution. Taking the feature map X_p learned by our SFEM as the input, the output feature map H_MLKADC can be defined as

$$H_{\mathrm{MLKADC}} = \mathrm{MLKADC}(X_p) = \{\tilde{H}_3, \tilde{H}_5, \ldots, \tilde{H}_m\}, \tag{2}$$

where $\tilde{H}_m$ represents the features extracted from X_p through a sequence consisting of a DWC with a 1 × m kernel, a DWC with an m × 1 kernel, and a ReLU6 activation function.
In our MLKADC, we employ a sequence of asymmetric convolutions with small kernels (e.g., 1 × 3 and 3 × 1 ) to learn local spatial features, utilize asymmetric convolutions with medium-sized kernels (such as 1 × 7 and 7 × 1 ) for the extraction of larger local spatial features, and use a sequence of asymmetric convolutions with large kernels (like 1 × 17 and 17 × 1 ) to capture global spatial features.
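To make this structure concrete, the following is a minimal PyTorch sketch of the MLKADC branches described above; each branch applies a 1 × s DWC, an s × 1 DWC, and ReLU6, for s = 3, 5, ..., m. The class and argument names are illustrative.

```python
import torch.nn as nn

class MLKADC(nn.Module):
    """(m-1)/2 parallel asymmetric depthwise branches with kernels 3, 5, ..., m."""
    def __init__(self, channels, m=17):
        super().__init__()
        self.branches = nn.ModuleList()
        for s in range(3, m + 1, 2):
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=(1, s),
                          padding=(0, s // 2), groups=channels),   # 1 x s DWC (width)
                nn.Conv2d(channels, channels, kernel_size=(s, 1),
                          padding=(s // 2, 0), groups=channels),   # s x 1 DWC (height)
                nn.ReLU6(inplace=True),
            ))

    def forward(self, x):
        # returns the list of branch outputs of Equation (2)
        return [branch(x) for branch in self.branches]
```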
(2) Multi-Scale Asymmetric Dilated Depthwise Convolution (MADDC): Dilated convolution can effectively enlarge the receptive field of ordinary convolutions without introducing additional parameters. Owing to these advantages, dilated convolution has been explored and has achieved competitive performance in HSI classification [63], which inspires us to introduce it into our model. To simultaneously exploit the benefits of dilated convolution and DWC, we construct a 3 × 3 dilated depthwise convolution (DDC) with a dilation factor d = 2 by combining these two convolutions (Figure 3a). To improve computational efficiency and reduce the number of parameters, we design an asymmetric dilated depthwise convolution (ADDC) comprising three consecutive components: a 1 × 3 DDC with d = 2, a 3 × 1 DDC with d = 2, and a ReLU6 function, as shown in Figure 3b. To enhance the capability of spatial feature extraction, we propose a novel MADDC by utilizing (k − 1)/2 ADDCs with kernels ranging from 1 × 3 and 3 × 1 to 1 × k and k × 1, as depicted in Figure 3c. These ADDCs carry out convolution operations in an equal-width manner and maintain a similar structure. They are simplified by two rules: (i) apart from the varying kernel sizes and dilation factors, the other hyperparameters (filter numbers and strides) are set to be the same; and (ii) the spatial sizes and dilation factors are arranged in arithmetic progressions with common differences of 2 and 1, respectively. Analogous to our MLKADC, according to these two rules, we only need to design the first template ADDC and set the number of scales, and the MADDC is determined accordingly. Hence, our MADDC avoids a complicated design.
Figure 3. Various dilated depthwise convolutions. (a) Proposed dilated depthwise convolution. (b) Proposed asymmetric dilated depthwise convolution. (c) Proposed multi-scale asymmetric dilated depthwise convolution (MADDC).
As illustrated in Figure 3c, the DDCs with kernels ranging from 1 × 3 to 1 × k are used to learn spatial information across the width of the HSI. Then, the DDCs with kernels ranging from 3 × 1 to k × 1, combined with ReLU6 functions, are designed to capture spatial features along the height of the HSI, where 1 × k and k × 1 denote the largest kernel sizes in the width and height directions for the proposed MADDC, respectively. Let H_MADDC denote the output features of MADDC; then we have

$$H_{\mathrm{MADDC}} = \mathrm{MADDC}(X_p) = \{\hat{H}_3, \hat{H}_5, \ldots, \hat{H}_k\}, \tag{3}$$

where $\hat{H}_k$ represents the features learned from X_p through a sequence of three operations: a 1 × k DDC with d = (k + 1)/2, a k × 1 DDC with d = (k + 1)/2, and a ReLU6 function.
In our MADDC, we employ parallel ADDCs with various kernels and dilation factors to capture spatial features and model the relationships between pixels at diverse distances.
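Analogously, a minimal PyTorch sketch of the MADDC branches is given below; each branch uses a 1 × s and an s × 1 dilated DWC with dilation d = (s + 1)/2, followed by ReLU6, for s = 3, 5, ..., k. The class and argument names are illustrative.

```python
import torch.nn as nn

class MADDC(nn.Module):
    """(k-1)/2 parallel asymmetric dilated depthwise branches (Equation (3))."""
    def __init__(self, channels, k=5):
        super().__init__()
        self.branches = nn.ModuleList()
        for s in range(3, k + 1, 2):
            d = (s + 1) // 2
            pad = (s // 2) * d   # keeps the spatial size unchanged
            self.branches.append(nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=(1, s), dilation=(1, d),
                          padding=(0, pad), groups=channels),  # 1 x s DDC (width)
                nn.Conv2d(channels, channels, kernel_size=(s, 1), dilation=(d, 1),
                          padding=(pad, 0), groups=channels),  # s x 1 DDC (height)
                nn.ReLU6(inplace=True),
            ))

    def forward(self, x):
        # returns the list of branch outputs of Equation (3)
        return [branch(x) for branch in self.branches]
```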
(3) Average Fusion Pooling (AFP): The proposed MSLKAC consists of two parallel asymmetric convolutions: (1) MLKADC, which is utilized to learn small local, larger local, and global spatial features; and (2) MADDC, which is used to extract spatial information at various distances. To integrate the features learned by the two convolutions, we explore a fusion scheme named average fusion pooling (AFP). Let H_MSLKAC represent the output features of AFP. With Equations (2) and (3), the AFP can be expressed as

$$H_{\mathrm{MSLKAC}} = \mathrm{AFP}(H_{\mathrm{MLKADC}}; H_{\mathrm{MADDC}}) = \frac{1}{s}\left(\tilde{H}_3 + \tilde{H}_5 + \cdots + \tilde{H}_m + \hat{H}_3 + \hat{H}_5 + \cdots + \hat{H}_k\right), \tag{4}$$

where s = (m + k − 2)/2 is the total number of scales in the proposed MSLKAC. In HSI classification tasks, common fusion schemes typically include column concatenation [9] and sum [60]. Given the large value of s, our AFP offers the following advantages: (1) in contrast to column concatenation fusion, the number of parameters in AFP is significantly reduced, thereby mitigating the risk of overfitting; and (2) AFP avoids the overly large feature values produced by the sum operation, thereby preventing gradient explosion.
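A small sketch of the AFP fusion of Equation (4), assuming the branch outputs are returned as lists by the two modules sketched above, is:

```python
import torch

def average_fusion_pooling(mlkadc_feats, maddc_feats):
    """Element-wise average of all s = (m + k - 2)/2 branch outputs (Equation (4))."""
    feats = list(mlkadc_feats) + list(maddc_feats)
    return torch.stack(feats, dim=0).mean(dim=0)
```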

3.3. Softmax Classification

After AFP, to determine the label of each pixel, we utilize a Softmax classifier to classify the fused feature map H_MSLKAC. We have

$$Y = \frac{e^{W_i H_{\mathrm{MSLKAC}} + b_i}}{\sum_{i=1}^{c} e^{W_i H_{\mathrm{MSLKAC}} + b_i}}, \tag{5}$$

where c represents the number of land cover categories, and W_i and b_i denote the trainable weights and bias of the i-th category. We adopt the cross-entropy error as the loss function to train our model, namely,

$$\mathcal{L} = -\sum_{z \in O_{\mathrm{label}}} \sum_{j=1}^{c} O_{zj} \ln Y_{zj}, \tag{6}$$

where O represents the label matrix, O_label denotes the set of labeled pixels, and Y_{zj} denotes the probability of the z-th pixel belonging to the j-th category.
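In practice, Equations (5) and (6) can be realized with a 1 × 1 convolution holding the per-class weights W_i and biases b_i, together with a cross-entropy loss that ignores unlabeled pixels. The following is a minimal sketch with illustrative sizes (16 classes, 64 channels, a 145 × 145 scene); the label convention of −1 for unlabeled pixels is an assumption for illustration.

```python
import torch
import torch.nn as nn

num_classes, channels = 16, 64                       # illustrative values
classifier = nn.Conv2d(channels, num_classes, kernel_size=1)   # holds W_i and b_i
criterion = nn.CrossEntropyLoss(ignore_index=-1)     # Softmax + negative log-likelihood

fused = torch.randn(1, channels, 145, 145)           # stands in for H_MSLKAC
labels = torch.full((1, 145, 145), -1, dtype=torch.long)
labels[0, 10:20, 10:20] = 3                          # a few labeled pixels for illustration
loss = criterion(classifier(fused), labels)          # cross-entropy over labeled pixels only
loss.backward()
```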

4. Experiment

In this section, we first describe four publicly available benchmark HSI datasets. Then, we introduce the evaluation metrics, compared methods, and implementation details. Next, we qualitatively and quantitatively assess the performance of the proposed MSLKACNN and state-of-the-art methods. Subsequently, we compare different training samples and fusion schemes, as well as training and testing times across various methods. Finally, we conduct several ablation studies to analyze the impacts of key components and hyperparameters.

4.1. Dataset Description

In our experiments, the four HSI datasets are Indian Pines, Botswana, Houston 2013, and WHU-Hi-LongKou (LongKou), respectively. We summarize the details of these datasets in Table 1 and Table 2.
Table 1. Summary of Indian Pines and Botswana datasets. No. denotes number. Train., Val., and Test. represent the number of training samples, validation samples, and test samples, respectively.
Table 2. Summary of Houston 2013 and LongKou datasets. No. denotes number. Train., Val., and Test. represent the number of training samples, validation samples, and test samples, respectively.
(1) Indian Pines: The Indian Pines dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor in 1992. It consists of 145 × 145 pixels in the wavelength range from 0.4 to 2.5 μm and contains 10,249 labeled pixels covering 16 ground-truth classes. After removing the noisy and water-absorption bands (104–108, 150–163, and 220), 200 spectral bands are retained.
(2) Botswana: The Botswana dataset was captured by using the NASA EO-1 satellite over the Okavango Delta region in Botswana. The whole image comprises 1476 × 256 pixels with 242 spectral bands, 14 land cover categories, and wavelengths ranging from 0.4 to 2.5 μm. We retain 145 spectral bands by removing 97 noise bands.
(3) Houston 2013: The Houston 2013 dataset was provided by the National Center for Airborne Laser Mapping (NCALM) over the University of Houston in 2013 [64]. The dataset contains 15,029 labeled pixels with 16 land cover categories, comprising 349 × 1905 pixels with 144 spectral bands ranging from 0.38 to 1.05 μm.
(4) WHU-Hi-LongKou (LongKou): The LongKou dataset was gathered by using an 8 mm focal length Headwall Nano-Hyperspec imaging sensor over the town of LongKou, Hubei Province, China in 2018 [65]. The HSI consists of 550 × 400 pixels with 9 land cover classes and 240 spectral bands in the wavelength range from 0.4 to 1.0 μm.

4.2. Experimental Setup

(1) Evaluation Metrics: To quantitatively analyze the effectiveness of the proposed MSLKACNN, four evaluation metrics are introduced: per-class accuracy, overall accuracy (OA), average accuracy (AA), and Kappa coefficient (KAPPA). Furthermore, the classification maps produced by various models are visualized to enable a qualitative assessment.
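For completeness, a small sketch of how these metrics can be computed from predicted and reference labels is given below; the function name is illustrative.

```python
import numpy as np

def classification_metrics(y_true, y_pred, num_classes):
    """Per-class accuracy, OA, AA, and Kappa computed from a confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    per_class = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)   # per-class accuracy
    oa = np.diag(cm).sum() / cm.sum()                         # overall accuracy
    aa = per_class.mean()                                     # average accuracy
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / (cm.sum() ** 2)
    kappa = (oa - expected) / (1 - expected)                  # Kappa coefficient
    return per_class, oa, aa, kappa
```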
(2) Comparison Methods: To demonstrate the strengths of the proposed MSLKACNN, ten comparison methods are selected and evaluated. These comparison methods are divided into four categories, including (a) CNN-based methods: the double-branch dual-attention network (DBDA) [8], and the attention-based adaptive spectral–spatial kernel ResNet ( A 2 S 2 K - Res ) [24]; (b) GCN-based methods: the CNN-enhanced GCN (CEGCN) [9], the fast dynamic graph convolutional network and CNN parallel network (FDGC) [27], and the GCN and transformer fusion network (GTFN) [31]; (c) Transformer-based methods: the spectral–spatial feature tokenization transformer (SSFTT) [10], the groupwise separable convolutional vision Transformer (GSC-ViT) [32], and the double branch convolution-transformer network (DBCTNet) [33]; (d) Mamba-based method: the spatial–spectral Mamba (MambaHSI) [20]; and (e) LKCNN-based method: the spectral–spatial large kernel attention network (SSLKA) [41].
(3) Implementation Details: All experiments are implemented with Python 3.10 on a machine with a Xeon Silver 4210 CPU and an RTX 3090 GPU. We adopt the Adam optimizer with a learning rate of 0.001 on the PyTorch platform. In the proposed MSLKACNN, the number of filters for all convolutional layers is set to 64. For our MSLKAC, we set the large kernel size m in MLKADC to 17 and the kernel size k in MADDC to 5. We train our model for 200 epochs on Botswana, 120 epochs on Houston 2013, and 150 epochs on the other datasets. All experiments for our MSLKACNN and the comparison methods are repeated twenty times with different random initializations, and the average results are reported for each evaluation metric.
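A minimal training-loop sketch consistent with these settings (Adam, learning rate 0.001, whole-image input, cross-entropy over labeled pixels) is shown below; the model argument and the unlabeled-pixel convention (label −1) are assumptions for illustration.

```python
import torch

def train(model, hsi, labels, epochs=150, lr=1e-3, device="cuda"):
    """Full-image training: hsi (1, bands, H, W), labels (1, H, W) with -1 for unlabeled."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss(ignore_index=-1)
    hsi, labels = hsi.to(device), labels.to(device)
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = model(hsi)                 # (1, c, H, W) per-pixel class scores
        loss = criterion(logits, labels)    # loss over labeled pixels only
        loss.backward()
        optimizer.step()
    return model
```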

4.3. Comparison with State-of-the-Art Methods

In this section, we conduct a quantitative and qualitative evaluation between the proposed MSLKACNN and existing state-of-the-art baselines on the Indian Pines, Botswana, Houston 2013, and LongKou datasets. These baselines are implemented using the optimal parameters as described in their respective references.
(1) Results on Indian Pines: Table 3 shows the quantitative comparison of all methods on the Indian Pines dataset. From the table, we observe that our MSLKACNN outperforms almost all baselines (except for MambaHSI in KAPPA) in terms of OA, AA, and KAPPA, as well as seven out of sixteen land cover categories. Specifically, MSLKACNN improves over CNN approaches by at least 7.92%, improves over GCN approaches by at least 24.06%, improves over Transformer approaches by at least 21.15%, improves over the Mamba approach by 24.56%, and improves over the LKCNN approach by 15.84% in terms of OA, respectively. These improvements highlight the superiority of the proposed MSLKACNN.
Table 3. Quantitative comparison of all methods on the Indian Pines dataset using two labeled samples per class for training.
Figure 4 illustrates a qualitative evaluation through the visualization of classification maps obtained by various methods on the Indian Pines dataset. These maps clearly show that the proposed MSLKACNN exhibits fewer misclassifications in many classes, such as “Corn-notill” and “Soybean-notill”, in comparison to other methods.
Figure 4. False-color image, ground truth, and classification maps on the Indian Pines dataset. (a) False-color image. (b) Ground truth. (c) DBDA (OA = 62.21%). (d) A 2 S 2 K - Res (OA = 49.75%). (e) CEGCN (OA = 49.34%). (f) FDGC (OA = 52.52%). (g) GTFN (OA = 54.12%). (h) SSFTT (OA = 55.42%). (j) GSC-ViT (OA = 52.98%). (k) DBCTNet (OA = 50.87%). (l) MambaHSI (OA = 53.90%). (m) SSLKA (OA = 57.96%). (i) MSLKACNN (OA = 67.14%).
(2) Results on Botswana: The comparative results of various approaches on the Botswana dataset are summarized in Table 4. The results reveal two key findings: (a) Among all methods, the GSC-ViT, MambaHSI, SSLKA, and CEGCN models achieve the third-best, fourth-best, fifth-best, and sixth-best performance in terms of OA and AA, respectively. This is mainly due to the fact that these models can effectively establish long-range dependencies within the HSI data by utilizing Transformer, Mamba, LKCNN, and GCN, respectively. (b) Our MSLKACNN, which employs multi-scale asymmetric convolutions with kernels ranging from small to large, excels in capturing global features that are neglected by traditional CNNs, performing better than baseline methods in evaluation metrics, including OA, AA, and KAPPA. Specifically, in terms of OA, AA, and KAPPA, MSLKACNN outperforms GTFN by 15.79%, 15.13%, and 17.11%, respectively; outperforms DBCTNet by 6.99%, 6.39%, and 7.57%, respectively; outperforms MambaHSI by 4.41%, 4.15%, and 6.76%, respectively; and outperforms SSLKA by 4.75%, 5.66%, and 5.16%, respectively. These findings further validate the effectiveness of MSLKACNN.
Table 4. Quantitative comparison of all methods on the Botswana dataset using two labeled samples per class for training.
The classification maps of various methods on the Botswana dataset are displayed in Figure 5. Given the significant uneven distribution of various land covers within the highly sparse dataset, we zoom in on the two red boxed areas in the classification maps to facilitate a more accurate qualitative assessment. According to these enlarged maps, we observe that the proposed MSLKACNN achieves a superior classification map compared to the comparison methods.
Figure 5. False-color image, ground truth, and classification maps on the Botswana dataset. (a) False-color image. (b) Ground truth. (c) DBDA (OA = 90.52%). (d) A 2 S 2 K - Res (OA = 82.57%). (e) CEGCN (OA = 86.57%). (f) FDGC (OA = 76.75%). (g) GTFN (OA = 76.82%). (h) SSFTT (OA = 80.93%). (j) GSC-ViT (OA = 90.00%). (k) DBCTNet (OA = 85.62%). (l) MambaHSI (OA = 88.20%). (m) SSLKA (OA = 87.86%). (i) MSLKACNN (OA = 92.61%).
(3) Results on Houston 2013: Table 5 presents the quantitative results achieved by different methods on the Houston 2013 dataset. From these results, it is evident that DBCTNet and GSC-ViT, which integrate convolution and Transformer, rank third and fourth, respectively, among the eleven methods. This indicates their strengths in capturing local features through the convolution and establishing long-range dependencies among pixels via the Transformer. Additionally, MSLKACNN outperforms other methods by a substantial margin in terms of OA, AA, and KAPPA, which demonstrates the superiority of our model in learning local-to-global information through asymmetric convolutions with small-to-large kernels.
Table 5. Quantitative comparison of all methods on the Houston 2013 dataset using two labeled samples per class for training.
The qualitative classification maps of diverse methods are depicted in Figure 6. To aid a visual evaluation, we zoom in on the two red boxed areas in the classification maps. From these enlarged maps, we see that MSLKACNN exhibits a superior classification map in the classes of “Residential” and “Road” compared to comparison baselines.
Figure 6. False-color image, ground truth, and classification maps on the Houston 2013 dataset. (a) False-color image. (b) Ground truth. (c) DBDA (OA = 65.87%). (d) A 2 S 2 K - Res (OA = 62.46%). (e) CEGCN (OA = 64.02%). (f) FDGC (OA = 54.30%). (g) GTFN (OA = 60.16%). (h) SSFTT (OA = 62.30%). (j) GSC-ViT (OA = 65.93%). (k) DBCTNet (OA = 66.20%). (l) MambaHSI (OA = 61.83%). (m) SSLKA (OA = 66.96%). (i) MSLKACNN (OA = 67.63%).
(4) Results on LongKou: Table 6 displays the numerical results obtained by the different algorithms on the LongKou dataset. Consistent with the findings on the other datasets, our proposed MSLKACNN demonstrates a notable improvement over all benchmark methods, exceeding the second-best method (CEGCN) by 2.20%, 6.20%, and 2.77% in terms of OA, AA, and KAPPA, respectively. This improvement again shows the strength of our MSLKACNN.
Table 6. Quantitative comparison of all methods on the LongKou dataset using two labeled samples per class for training.
As illustrated in Figure 7, a visual examination indicates that the classification map of MSLKACNN is closer to the ground truth compared to other methods, especially in distinguishing the category of “Broad-Leaf Soybean”.
Figure 7. False-color image, ground truth, and classification maps on the LongKou dataset. (a) False-color image. (b) Ground truth. (c) DBDA (OA = 80.13%). (d) A 2 S 2 K - Res (OA = 80.89%). (e) CEGCN (OA = 85.99%). (f) FDGC (OA = 70.94%). (g) GTFN (OA = 63.86%). (h) SSFTT (OA = 82.78%). (j) GSC-ViT (OA = 84.16%). (k) DBCTNet (OA = 83.96%). (l) MambaHSI (OA = 79.15%). (m) SSLKA (OA = 79.05%). (i) MSLKACNN (OA = 88.19%).

4.4. Analysis of All Methods Under Various Numbers of Training Samples

In this section, we conduct a comparative analysis of the OA achieved by diverse methods using different numbers of training samples per class. Specifically, we utilize 2, 4, 6, 8, and 10 training samples for each dataset. A uniform number of five validation samples is maintained for all methods across all datasets. As shown in Figure 8, the OA results of most methods demonstrate an upward trend as the number of training samples increases. However, in a minority of cases, we observe that the OA results of a few competitive methods, such as GSC-ViT, decrease unexpectedly with more training samples. These anomalous results may potentially stem from the additional noise introduced by the increased training data. Conversely, the OA results of CEGCN and our proposed MSLKACNN exhibit a notable improvement with the increase in training samples. This enhancement can be credited to the noise suppression modules in their architectures. Furthermore, in most cases, our MSLKACNN consistently surpasses the comparison methods across various datasets, especially under small training sample sizes, thereby further reinforcing its robustness and superiority for HSI classification tasks.
Figure 8. OA performance of various methods using different numbers of training samples per class across each dataset.

4.5. Analysis of Diverse Fusion Schemes

As described in Section 3.2, two fusion schemes are widely used: column concatenation fusion (concatenate) and sum fusion (sum). In Equation (4), the number of feature maps produced by the proposed MLKADC and MADDC is substantial. Using concatenation to combine these feature maps increases the number of parameters, while using sum produces large feature values; these effects may lead to overfitting and gradient explosion, respectively. To address these challenges, we investigate the AFP fusion scheme. To evaluate our AFP, we compare the OA results achieved by AFP and the two other fusion schemes. Figure 9 displays the results. From the figure, it is evident that our AFP significantly outperforms the other fusion schemes, which validates the superiority of our AFP in fusing multiple feature maps.
Figure 9. OA results of diverse fusion schemes on each dataset.

4.6. Analysis of Computational Complexity

Table 7 provides an extensive evaluation of the training time, testing time, parameters, and FLOPS of all methods. The analysis yields the following insights: (1) SSFTT demonstrates faster training than the other baseline methods, which can be attributed to its use of a limited number of convolutional layers. (2) CEGCN and MambaHSI operate on the whole HSI as input instead of small HSI cubes, leading to quicker prediction than most other methods. (3) Like CEGCN and MambaHSI, the proposed MSLKACNN also processes the entire HSI as input, achieving the fastest prediction time on all datasets. (4) The proposed MSLKACNN outperforms most methods in terms of parameters, owing to its replacement of square kernels with vertical and horizontal kernels. (5) Since CEGCN, MambaHSI, and MSLKACNN take the entire HSI as input, they require significantly more FLOPS than the approaches that use small HSI cubes. Additionally, MSLKACNN significantly outperforms the other methods in terms of classification results. These findings highlight the benefits of incorporating small-to-large kernel asymmetric convolutions in MSLKACNN for practical applications.
Table 7. Analysis of different methods in terms of training time, testing time, parameters, and FLOPS on the Indian Pines, Botswana, Houston 2013, and LongKou datasets. CEGCN, MambaHSI, and our proposed MSLKACNN process the entire HSI as input, while other models utilize small HSI cubes. The FLOPS results for these other models are calculated with a batch size of 1. s, ms, K, and G, denote second, millisecond, kilo, and giga, respectively.

4.7. Ablation Study

The proposed MSLKACNN comprises three primary components (the SFEM, the MLKADC, and the MADDC) and two critical hyperparameters (the large kernel size m in MLKADC and the kernel size k in MADDC). In this section, we perform ablation studies to assess the individual contributions of the three components and the impact of the two hyperparameters.
(1) Contributions of Each Component: To assess the individual contributions of these components, we perform a quantitative analysis by removing one of the three components at a time. The results are summarized in Table 8. When the SFEM is removed, we retain a single 1 × 1 convolution block to keep the number of bands in the original HSI consistent with the number of filters in the subsequent convolutional layers. From the table, we observe that the MSLKACNN variant without the MLKADC component performs worse than the other variants on most datasets, indicating that this component plays a more significant role than the others. Moreover, our full MSLKACNN consistently surpasses all of its modified versions on all datasets. These findings reinforce the effectiveness of the integrated components.
Table 8. Classification results of each component in MSLKACNN.
(2) Analysis of Various Large Kernel Sizes in MLKADC: To verify the effect of different large kernel sizes in MLKADC, we conduct a comparative analysis of OA using varying large kernel sizes across four benchmark datasets: Indian Pines, Botswana, Houston 2013, and LongKou. The results are visually depicted in Figure 10. The figure illustrates a significant trend: the OA tends to increase with the enlargement of kernel sizes in most cases, reaching its peak at kernel sizes of 1 × 17 and 17 × 1 . Nevertheless, a further increase in kernel sizes leads to a decline in OA. This discovery is vital for determining the optimal large kernel sizes for MLKADC.
Figure 10. OA results of various large kernel sizes in MLKADC on each dataset.
(3) Analysis of Different Kernel Sizes in MADDC: To evaluate the influence of various kernel sizes in MADDC, we compare the OA results achieved by diverse kernel sizes on the Indian Pines, Botswana, Houston 2013, and LongKou datasets. These results are illustrated in Figure 11. We observe that the MSLKACNN model equipped with the kernel sizes of 1 × 5 and 5 × 1 exhibits superior performance compared to its variant models utilizing alternative kernel sizes, thereby determining the optimal kernel sizes for MADDC.
Figure 11. OA results of different kernel sizes in MADDC on each dataset.

5. Conclusions

In this paper, we propose MSLKACNN, a novel multi-scale large kernel asymmetric CNN architecture for HSI classification. The key breakthrough of our MSLKACNN lies in successfully scaling up convolutional kernels to 1 × 17 and 17 × 1 sizes while maintaining computational efficiency. The core innovation of the proposed MSLKACNN is the MSLKAC component, which combines asymmetric depthwise convolutions with small to large kernels and asymmetric dilated depthwise convolutions, effectively extracting both local and global features. Our MSLKACNN achieves the best performance in terms of OA, AA, and KAPPA, compared to baseline methods, demonstrating its effectiveness and superiority. In the future, we will explore replacing a large kernel asymmetric convolution with multiple small kernel asymmetric convolutions to maintain a large receptive field while reducing the number of parameters and computational costs.

6. Further Discussion

As shown in Table 3, Table 4, Table 5 and Table 6, the proposed MSLKACNN demonstrates superior performance compared to five major categories of deep learning approaches: (1) CNNs, (2) GCNs, (3) Transformers, (4) Mamba, and (5) LKCNN. From these results, we observe that the LKCNN method SSLKA exhibits significantly lower classification performance than most benchmark methods on the high-density dataset (LongKou), while outperforming most comparative methods on the remaining datasets (Indian Pines, Botswana, and Houston 2013). This implies that SSLKA may be unsuitable for processing dense HSI data. Notably, compared to the most related method, SSLKA, the proposed MSLKACNN shows significant performance improvements across all datasets, with OA gains of 9.18%, 4.75%, 0.67%, and 9.14% on the Indian Pines, Botswana, Houston 2013, and LongKou datasets, respectively. These gains can be attributed to the enhanced ability of MSLKACNN to extract and integrate both local and global features through asymmetric convolutions with small-to-large kernels. In addition, as shown in Table 7, our MSLKACNN outperforms SSLKA by a large margin in terms of parameters and testing time, demonstrating the advantages of replacing square kernels with vertical and horizontal kernels.
Although the proposed MSLKACNN demonstrates significant advantages in classification performance, inference speed, and parameters, the use of entire HSI rather than its small cubes as input leads to higher computational complexity, posing challenges when dealing with extremely large-scale datasets. Furthermore, while our parallel asymmetric convolutions with small-to-large kernels effectively capture local-to-global features, the absence of an attention mechanism may limit the model’s ability to focus on critical spatial features, which could affect discriminative feature learning in complex scenarios.

Author Contributions

Conceptualization, X.L. (Xun Liu) and A.H.-M.N.; methodology, X.L. (Xun Liu), A.H.-M.N. and F.L.; software, X.L. (Xuejiao Liao); validation, X.L. (Xun Liu), J.R. and L.G.; formal analysis, A.H.-M.N.; investigation, X.L. (Xun Liu) and X.L. (Xuejiao Liao); resources, A.H.-M.N., J.R. and L.G.; writing—original draft preparation, X.L. (Xun Liu); writing—review and editing, A.H.-M.N. and F.L.; visualization, X.L. (Xuejiao Liao); supervision, A.H.-M.N.; funding acquisition, A.H.-M.N. and F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (grant number 2019ZT08L213), the National Natural Science Foundation of China (grant number 42274016), the Key Discipline Research Capacity Improvement Project of Guangdong Province (grant number 2024ZDJS022), and the Guangdong Forestry Science Data Center (grant number 2021B1212100004).

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors thank the anonymous reviewers and the editors for their insightful comments and helpful suggestions that helped improve the quality of our manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HSI: Hyperspectral image
LKCNN: Large kernel convolutional neural network
LKA: Large kernel attention
OA: Overall accuracy
AA: Average accuracy
KAPPA: Kappa coefficient
GCN: Graph convolutional network
CNN: Convolutional neural network
ViT: Vision Transformer
MSLKACNN: Multi-scale large kernel asymmetric CNN
SFEM: Spectral feature extraction module
MSLKAC: Multi-scale large kernel asymmetric convolution
MADDC: Multi-scale asymmetric dilated depthwise convolution
ML: Machine learning
DL: Deep learning
SAEs: Stacked autoencoders
RNNs: Recurrent neural networks
CapsNets: Capsule networks
DWC: Depthwise convolution
DDC: Depthwise dilation convolution
MLKADC: Multi-scale large kernel asymmetric depthwise convolution
ADC: Asymmetric depthwise convolution
AFP: Average fusion pooling
BN: Batch normalization
ADDC: Asymmetric dilated depthwise convolution

References

  1. Yuan, J.; Wang, S.; Wu, C.; Xu, Y. Fine-grained classification of urban functional zones and landscape pattern analysis using hyperspectral satellite imagery: A case study of Wuhan. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2022, 15, 3972–3991. [Google Scholar] [CrossRef]
  2. Gevaert, C.M.; Suomalainen, J.; Tang, J.; Kooistra, L. Generation of Spectral–Temporal Response Surfaces by Combining Multispectral Satellite and Hyperspectral UAV Imagery for Precision Agriculture Applications. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2015, 8, 3140–3146. [Google Scholar] [CrossRef]
  3. Murphy, R.J.; Schneider, S.; Monteiro, S.T. Consistency of Measurements of Wavelength Position From Hyperspectral Imagery: Use of the Ferric Iron Crystal Field Absorption at ∼900 nm as an Indicator of Mineralogy. IEEE Trans. Geosci. Remote Sens. 2014, 52, 2843–2857. [Google Scholar] [CrossRef]
  4. Vibhute, A.D.; Kale, K.V.; Dhumal, R.K.; Mehrotra, S.C. Hyperspectral imaging data atmospheric correction challenges and solutions using QUAC and FLAASH algorithms. In Proceedings of the International Conference on Man and Machine Interfacing (MAMI), Bhubaneswar, India, 17–19 December 2015; pp. 1–6. [Google Scholar]
  5. Ryan, J.P.; Davis, C.O.; Tufillaro, N.B.; Kudela, R.M.; Gao, B.C. Application of the hyperspectral imager for the coastal ocean to phytoplankton ecology studies in Monterey Bay, CA, USA. Remote Sens. 2014, 6, 1007–1025. [Google Scholar] [CrossRef]
  6. Li, Z.; Xiong, F.; Zhou, J.; Lu, J.; Qian, Y. Learning a Deep Ensemble Network with Band Importance for Hyperspectral Object Tracking. IEEE Trans. Image Process. 2023, 32, 2901–2914. [Google Scholar] [CrossRef]
  7. Zhang, B.; Li, S.; Jia, X.; Gao, L.; Peng, M. Adaptive Markov Random Field Approach for Classification of Hyperspectral Imagery. IEEE Geosci. Remote Sens. Lett. 2011, 8, 973–977. [Google Scholar] [CrossRef]
  8. Li, R.; Zheng, S.; Duan, C.; Yang, Y.; Wang, X. Classification of hyperspectral image based on double–branch dual–attention mechanism network. Remote Sens. 2020, 12, 582. [Google Scholar] [CrossRef]
  9. Liu, Q.; Xiao, L.; Yang, J.; Wei, Z. CNN-enhanced graph convolutional network with pixel-and superpixel-level feature fusion for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 8657–8671. [Google Scholar] [CrossRef]
  10. Sun, L.; Zhao, G.; Zheng, Y.; Wu, Z. Spectral–spatial feature tokenization transformer for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
  11. Yang, X.; Ye, Y.; Li, X.; Lau, R.Y.; Zhang, X.; Huang, X. Hyperspectral image classification with deep learning models. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5408–5423. [Google Scholar] [CrossRef]
  12. Chen, Y.; Lin, Z.; Zhao, X.; Wang, G.; Gu, Y. Deep learning-based classification of hyperspectral data. IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens. 2014, 7, 2094–2107. [Google Scholar] [CrossRef]
  13. Mou, L.; Ghamisi, P.; Zhu, X.X. Deep recurrent neural networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3639–3655. [Google Scholar] [CrossRef]
  14. Li, Y.; Zhang, H.; Shen, Q. Spectral–spatial classification of hyperspectral imagery with 3D convolutional neural network. Remote Sens. 2017, 9, 67. [Google Scholar] [CrossRef]
  15. Zhang, M.; Li, W.; Du, Q. Diverse region–based CNN for hyperspectral image classification. IEEE Trans. Image Process. 2018, 27, 2623–2634. [Google Scholar] [CrossRef]
  16. Paoletti, M.E.; Haut, J.M.; Fernandez-Beltran, R.; Plaza, J.; Plaza, A.; Li, J.; Pla, F. Capsule networks for hyperspectral image classification. IEEE Trans. Image Process. 2019, 57, 2145–2160. [Google Scholar] [CrossRef]
  17. Qin, A.; Shang, Z.; Tian, J.; Wang, Y.; Zhang, T.; Tang, Y.Y. Spectral–spatial graph convolutional networks for semisupervised hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2019, 16, 241–245. [Google Scholar] [CrossRef]
  18. Bai, J.; Ding, B.; Xiao, Z.; Jiao, L.; Chen, H.; Regan, A.C. Hyperspectral image classification based on deep attention graph convolutional network. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–16. [Google Scholar] [CrossRef]
  19. Hong, D.; Han, Z.; Yao, J.; Gao, L.; Zhang, B.; Plaza, A.; Chanussot, J. SpectralFormer: Rethinking hyperspectral image classification with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–15. [Google Scholar] [CrossRef]
  20. Li, Y.; Luo, Y.; Zhang, L.; Wang, Z.; Du, B. MambaHSI: Spatial-Spectral Mamba for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–16. [Google Scholar] [CrossRef]
  21. Wang, W.; Dou, S.; Jiang, Z.; Sun, L. A fast dense spectral–spatial convolution network framework for hyperspectral images classification. Remote Sens. 2018, 10, 1068. [Google Scholar] [CrossRef]
  22. Li, Z.; Huang, L.; He, J. A multiscale deep middle-level feature fusion network for hyperspectral classification. Remote Sens. 2019, 11, 695. [Google Scholar] [CrossRef]
  23. Wang, X.; Tan, K.; Du, P.; Pan, C.; Ding, J. A unified multiscale learning framework for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–19. [Google Scholar] [CrossRef]
  24. Roy, S.K.; Manna, S.; Song, T.; Bruzzone, L. Attention-based adaptive spectral–spatial kernel ResNet for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 7831–7843. [Google Scholar] [CrossRef]
  25. Li, M.; Liu, Y.; Xue, G.; Huang, Y.; Yang, G. Exploring the relationship between center and neighborhoods: Central vector oriented self-similarity network for hyperspectral image classification. IEEE Trans. Circ. Syst. Vid. 2023, 33, 1979–1993. [Google Scholar] [CrossRef]
  26. Kipf, T.N.; Welling, M. Semi–supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations, (ICLR), Toulon, France, 24–26 April 2017; pp. 1–14. [Google Scholar]
  27. Liu, Q.; Dong, Y.; Zhang, Y.; Luo, H. A Fast Dynamic Graph Convolutional Network and CNN Parallel Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
  28. Zhou, H.; Luo, F.; Zhuang, H.; Weng, Z.; Gong, X.; Lin, Z. Attention Multihop Graph and Multiscale Convolutional Fusion Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14. [Google Scholar] [CrossRef]
  29. Liu, X.; Ng, A.H.M.; Ge, L.; Lei, F.; Liao, X. Multibranch Fusion: A Multibranch Attention Framework by Combining Graph Convolutional Network and CNN for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–17. [Google Scholar] [CrossRef]
  30. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  31. Yang, A.; Li, M.; Ding, Y.; Hong, D.; Lv, Y.; He, Y. GTFN: GCN and Transformer Fusion Network with Spatial-Spectral Features for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15. [Google Scholar] [CrossRef]
  32. Zhao, Z.; Xu, X.; Li, S.; Plaza, A. Hyperspectral Image Classification Using Groupwise Separable Convolutional Vision Transformer Network. IEEE Trans. Geosci. Remote Sens. 2024, 62. [Google Scholar] [CrossRef]
  33. Xu, R.; Dong, X.M.; Li, W.; Peng, J.; Sun, W.; Xu, Y. DBCTNet: Double branch convolution-transformer network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15. [Google Scholar] [CrossRef]
  34. Yao, J.; Hong, D.; Li, C.; Chanussot, J. Spectralmamba: Efficient mamba for hyperspectral image classification. arXiv 2024, arXiv:2404.08489. [Google Scholar]
  35. Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752. [Google Scholar]
  36. Ding, X.; Zhang, X.; Han, J.; Ding, G. Scaling up your kernels to 31x31: Revisiting large kernel design in cnns. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11963–11975. [Google Scholar]
  37. Liu, S.; Chen, T.; Chen, X.; Chen, X.; Xiao, Q.; Wu, B.; Kärkkäinen, T.; Pechenizkiy, M.; Mocanu, D.; Wang, Z. More convnets in the 2020s: Scaling up kernels beyond 51x51 using sparsity. arXiv 2022, arXiv:2207.03620. [Google Scholar]
  38. Ding, X.; Zhang, Y.; Ge, Y.; Zhao, S.; Song, L.; Yue, X.; Shan, Y. UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 5513–5524. [Google Scholar]
  39. Zhong, C.; Gong, N.; Zhang, Z.; Jiang, Y.; Zhang, K. LiteCCLKNet: A lightweight criss-cross large kernel convolutional neural network for hyperspectral image classification. IET Comput. Vis. 2023, 17, 763–776. [Google Scholar] [CrossRef]
  40. Sun, G.; Pan, Z.; Zhang, A.; Jia, X.; Ren, J.; Fu, H.; Yan, K. Large kernel spectral and spatial attention networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15. [Google Scholar] [CrossRef]
  41. Wu, C.; Tong, L.; Zhou, J.; Xiao, C. Spectral-Spatial Large Kernel Attention Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 42. [Google Scholar] [CrossRef]
  42. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. In Proceedings of the IEEE IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1–9. [Google Scholar]
  43. Yu, F.; Koltun, V. Multi–scale context aggregation by dilated convolutions. In Proceedings of the International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
  44. Plaza, A.; Martinez, P.; Perez, R.; Plaza, J. A new approach to mixed pixel classification of hyperspectral imagery based on extended morphological profiles. Pattern Recogn. 2004, 37, 1097–1116. [Google Scholar] [CrossRef]
  45. Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H. Deep convolutional neural networks for hyperspectral image classification. J. Sens. 2015, 2015, 1–12. [Google Scholar] [CrossRef]
  46. Zhao, W.; Du, S. Spectral–spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4544–4554. [Google Scholar] [CrossRef]
  47. Yang, J.; Zhao, Y.Q.; Chan, J.C.W. Learning and transferring deep joint spectral–spatial features for hyperspectral classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 4729–4742. [Google Scholar] [CrossRef]
  48. Chen, C.; Zhang, J.J.; Zheng, C.H.; Yan, Q.; Xun, L.N. Classification of hyperspectral data using a multi-channel convolutional neural network. In Proceedings of the International Conference Intelligent Computing (ICIC), Wuhan, China, 15–18 August 2018; pp. 81–92. [Google Scholar]
  49. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral–spatial residual network for hyperspectral image classification: A 3-D deep learning framework. IEEE Trans. Geosci. Remote Sens. 2018, 56, 847–858. [Google Scholar] [CrossRef]
  50. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  51. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  52. Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251. [Google Scholar] [CrossRef]
  53. Li, Y.; Xie, W.; Li, H. Hyperspectral image reconstruction by deep convolutional neural network for classification. Pattern Recogn. 2017, 63, 371–383. [Google Scholar] [CrossRef]
  54. Gong, Z.; Zhong, P.; Yu, Y.; Hu, W.; Li, S. A CNN with multiscale convolution and diversified metric for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3599–3618. [Google Scholar] [CrossRef]
  55. Ding, Y.; Zhang, Z.; Zhao, X.; Hong, D.; Li, W.; Cai, W.; Zhan, Y. AF2GNN: Graph convolution with adaptive filters and aggregator fusion for hyperspectral image classification. Inf. Sci. 2022, 602, 201–219. [Google Scholar] [CrossRef]
  56. Wang, D.; Du, B.; Zhang, L. Spectral-spatial global graph reasoning for hyperspectral image classification. IEEE Trans. Neur. Net. Lear. 2023, 1–14. [Google Scholar] [CrossRef]
  57. Dong, Y.; Liu, Q.; Du, B.; Zhang, L. Weighted feature fusion of convolutional neural network and graph attention network for hyperspectral image classification. IEEE Trans. Image Process. 2022, 31, 1559–1572. [Google Scholar] [CrossRef]
  58. He, J.; Zhao, L.; Yang, H.; Zhang, M.; Li, W. HSI-BERT: Hyperspectral Image Classification Using the Bidirectional Encoder Representation From Transformers. IEEE Trans. Geosci. Remote Sens. 2020, 58, 165–178. [Google Scholar] [CrossRef]
  59. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Proc. Adv. Neural Inf. Process. Syst. (NeurIPS) 2017, 30, 1–11. [Google Scholar]
  60. Zhao, F.; Zhang, J.; Meng, Z.; Liu, H.; Chang, Z.; Fan, J. Multiple vision architectures-based hybrid network for hyperspectral image classification. Expert Syst. Appl. 2023, 234, 121032. [Google Scholar] [CrossRef]
  61. Zhou, W.; Kamata, S.I.; Wang, H.; Wong, M.S.; Hou, H.C. Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification. arXiv 2024, arXiv:2405.12003. [Google Scholar] [CrossRef]
  62. Gao, H.; Yang, Y.; Li, C.; Gao, L.; Zhang, B. Multiscale residual network with mixed depthwise convolution for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 3396–3408. [Google Scholar] [CrossRef]
  63. Zhao, F.; Zhang, J.; Meng, Z.; Liu, H. Densely connected pyramidal dilated convolutional network for hyperspectral image classification. Remote Sens. 2021, 13, 3396. [Google Scholar] [CrossRef]
  64. Debes, C.; Merentitis, A.; Heremans, R.; Hahn, J.; Frangiadakis, N.; van Kasteren, T.; Liao, W.; Bellens, R.; Pižurica, A.; Gautama, S.; et al. Hyperspectral and LiDAR data fusion: Outcome of the 2013 GRSS data fusion contest. IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens. 2014, 7, 2405–2418. [Google Scholar] [CrossRef]
  65. Zhong, Y.; Hu, X.; Luo, C.; Wang, X.; Zhao, J.; Zhang, L. WHU-Hi: UAV-borne hyperspectral with high spatial resolution (H2) benchmark datasets and classifier for precise crop identification based on deep convolutional neural network with CRF. Remote Sens. Environ. 2020, 250, 112012. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
