Article

Cross-Hopping Graph Networks for Hyperspectral–High Spatial Resolution (H2) Image Classification

1 School of Computer, China West Normal University, Nanchong 637002, China
2 Institute of Artificial Intelligence, China West Normal University, Nanchong 637002, China
3 Key Laboratory of Optimization Theory and Applications, China West Normal University of Sichuan Province, Nanchong 637002, China
4 School of Electronic Information and Automation, Civil Aviation University of China, Tianjin 300300, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(17), 3155; https://doi.org/10.3390/rs16173155
Submission received: 28 June 2024 / Revised: 23 August 2024 / Accepted: 24 August 2024 / Published: 27 August 2024
(This article belongs to the Special Issue Deep Learning for Spectral-Spatial Hyperspectral Image Classification)

Abstract: Remote sensing images are steadily advancing toward hyperspectral–high spatial resolution (H2) "double-high" imagery. However, higher resolution introduces serious spatial heterogeneity and spectral variability, which makes feature recognition more difficult. To make full use of spectral and spatial features under a limited number of labeled samples and to achieve effective recognition and accurate classification of objects in H2 images, this paper proposes a cross-hop graph network for H2 image classification (H2-CHGN). It is a two-branch network for deep feature extraction tailored to H2 images, consisting of a cross-hop graph attention network (CGAT) and a multiscale convolutional neural network (MCNN). The CGAT branch uses the superpixel information of H2 images to filter samples with high spatial relevance to the samples to be classified, then applies the cross-hop graph and an attention mechanism to broaden the receptive field of graph convolution and obtain more representative global features. The MCNN branch uses dual convolutional kernels to extract and fuse features at multiple scales, obtaining pixel-level multi-scale local features via parallel cross connections. Finally, a dual-channel attention mechanism fuses the two branches to make salient image elements more prominent. Experiments on a classical dataset (Pavia University) and two double-high (H2) datasets (WHU-Hi-LongKou and WHU-Hi-HongHu) show that the H2-CHGN can be used efficiently and effectively for H2 image classification, outperforming state-of-the-art methods by 0.75–2.16% in overall accuracy.

1. Introduction

Hyperspectral–high spatial resolution (H2) [1] images provide finer spectral and spatial information [2] and can effectively distinguish spectrally similar objects by capturing subtle differences in the continuous shape of their spectral curves [3], enabling object recognition at the pixel level [4]. H2 images have been widely used in environmental monitoring, pollution monitoring, geologic exploration, agricultural evaluation, and other fields [5]. However, the high dimensionality, small number of labeled samples, spectral variability, and complex noise effects of H2 data [6] make effective feature extraction a great challenge, leading to poor classification accuracy and low feature-recognition accuracy [7].
Traditional machine learning methods depend strongly on a priori and specialized knowledge when extracting features from an HSI, which makes it hard to extract deep features [8]. Deep learning methods handle nonlinear data better and outperform traditional machine learning-based feature extraction methods [9]. Convolutional neural networks (CNNs) [10] are good at extracting effective spectral-spatial features from HSIs. Zhong et al. [11] operated directly on the original HSI and designed an end-to-end spectral-spatial residual network (SSRN) using spectral and spatial residual blocks. Roy et al. [12] constructed a model called HybridSN by concatenating a 2D-CNN and a 3D-CNN to maximize accuracy. However, these methods also exhibit complex structures and high computational requirements [13]. Recently, transformers [14] have performed well on hyperspectral data thanks to their self-attention mechanism. Hong et al. [15] proposed a novel network called SpectralFormer, which learns local spectral features from multiple neighboring bands at each coding location. Nonetheless, transformer-based networks, e.g., ViT [16], inevitably experience performance degradation when processing HSI data [17]. Johnson et al. [18] proposed a neuro-fuzzy modeling method for constructing a fitness predictive model. Zhao et al. [19] proposed a new ShuffleNet-CA-SSD lightweight network. Li et al. [20] proposed a variational autoencoder GAN for fault classification. Yu et al. [21] proposed a distillation-constrained prototype representation network for image classification. Bhosle et al. [22] proposed a deep learning CNN model for digit recognition. Sun et al. [23] proposed a deep learning data generation method for predicting ice resistance. Wang et al. [24] proposed a spatio-temporal deep learning model for predicting streamflow. Shao et al. [25] proposed a task-supervised ANIL for fault classification. Dong et al. [26] designed a hybrid model with a support vector machine and a gated recurrent unit. Preethi and Mamatha [27] designed a region-based convolutional neural network for epigraphical images. Zhao et al. [28] designed an interpretable dynamic inference system based on fuzzy broad learning. Yan et al. [29] designed a lightweight framework using separable multiscale convolution and broadcast self-attention. Wang et al. [30] designed an interpretable deep learning model with multi-source data fusion. Li et al. [31] proposed an adaptive weighted ensemble clustering method. Li et al. [32] proposed an automatic assessment method for depression symptoms. Li et al. [33] proposed an optimization-based federated learning method for non-IID data. Xu et al. [34] proposed an ensemble clustering method based on structure information. Li et al. [35] proposed a deep convolutional neural network for the automatic diagnosis of depression.
Deep learning-based feature extraction methods for HSIs are widely used because they can better extract deep image features. Traditional convolutional neural networks (CNNs) [10] need to be trained with a massive number of labeled samples and have high training time complexity. Graph convolutional networks (GCNs) [36] can handle arbitrarily structured data, adaptively learn parameters according to specific feature types, and optimize these parameters, which improves a model's recognition performance across different feature types compared to traditional CNNs [37]. Hong et al. [3] proposed a miniGCN to reduce computation cost and realize the complementarity between a CNN and a GCN in small batches. However, pixel-based feature extraction methods generate high-dimensional feature vectors and extract a large amount of redundant information. Therefore, to extract features more efficiently, researchers have introduced superpixels in place of pixels. Sellars et al. proposed a semi-supervised method (SGL) [38] that combines superpixels with graphical representations and pure-graph classifiers to greatly reduce the computational overhead. Wan et al. [39] designed a multi-scale dynamic graph convolutional network (MDGCN) that exploits multi-scale information with dynamic transformation. To enhance computational efficiency and speed up training, Li et al. [40] designed a symmetric graph metric learning (SGML) model that mitigates spectral variability through a symmetry mechanism. Although the above methods utilize superpixels instead of pixels to reduce computational complexity, they can only generate features at the superpixel level and fail to take into account subtle features within each superpixel during spectral feature extraction. Consequently, classification maps generated by GCNs are sensitive to over-smoothing and produce false boundaries between classes [17]. To overcome this problem, Liu et al. [41] proposed a heterogeneous deep network (CEGCN) that utilizes CNNs to complement the superpixel-level features of GCNs with local pixel-level features; graph encoders and decoders were also proposed to solve the data incompatibility between CNNs and GCNs. Based on this idea, the multilevel superpixel structured graph U-Net (MSSG-UNet) [42], which gradually extracts features at varied scales from coarse to fine and performs feature fusion, was developed. Meanwhile, to better utilize neighboring nodes and prevent information loss, Veličković et al. [36] proposed the graph attention network (GAT), which uses k-nearest neighbors to build the adjacency matrix and computes the weights of different nodes. Dong et al. [43] used weighted feature fusion of a CNN and a GAT. Ding et al. designed a multi-feature fusion network (MFGCN) [45] and a multi-scale receptive field graph attention neural network (MRGAT) [44]. The above methods improved GCNs and enabled a more comprehensive use of spatial and spectral features by reconfiguring the adjacency matrices; however, converting the connections of the nodes may introduce superfluous information and degrade classification performance. Xue et al. designed a multi-hop GCN using different branches and different hopping graphs [46], which aggregates multiscale contextual information by hopping. Zhou et al. proposed an attention multi-hop graph and multi-scale convolutional fusion network (AMGCFN) [47]. Xiao et al. [48] proposed a privacy-preserving federated learning system for the IIoT. Tao et al. [49] proposed a memory-guided population stage-wise control spherical search algorithm. However, the above networks are complex in structure and inefficient in model training. Some new methods proposed in recent years [50,51,52,53,54,55,56,57,58,59] can be used to optimize these models.
To address the above problems in extracting and classifying features from HSIs, such as the inconsistency of superpixel segmentation for similar feature classes, the limitation of the traditional single-hop GCN in node characterization, and the information loss in joint spatial-spectral classification, this paper proposes a cross-hop graph network model for H2 image classification (H2-CHGN). The model uses two branches, a cross-hop graph attention network (CGAT) and a multiscale convolutional neural network (MCNN), to classify H2 images in parallel. Considering the computational complexity of graph construction, the GCN-based feature extraction first refines the graph nodes by superpixel segmentation and then constructs the graph convolution using the cross-hop graph (replacing the ordinary neighbor-hopping operation with interval hopping), which is combined with the multi-head GAT to jointly extract superpixel features and obtain more representative global features. CNN-based feature extraction is performed in parallel to obtain multi-scale pixel-level features using improved CNNs. Finally, the features from both branches are fused using a dual-channel attention mechanism to obtain a more comprehensive and enriched feature representation. The contributions of this paper are outlined below:
(1)
A two-branch neural network (CGAT and MCNN) was designed to extract features from H2 images separately, aiming to fully leverage the complementary characteristics of hyperspectral imagery at both the superpixel and pixel levels.
(2)
A cross-hopping graph operation algorithm was proposed to perform graph convolution operations from near to far, which can better capture the local and global correlation features between spectral features. To better capture multi-scale node features, a pyramid feature extraction structure was used to comprehensively learn multilevel graph structure information.
(3)
In order to improve the adaptivity of the multilayer graph, a multi-head graph attention mechanism was introduced to portray different aspects of similarity between nodes, thus providing richer feature information.
(4)
To reduce computational complexity, dual convolutional kernels of different sizes were used at different layers of the convolutional neural network to extract pixel-level multiscale features by means of cross connectivity. The features from the two branches were fused through the dual-channel attention mechanism to obtain a more comprehensive and accurate feature representation.
The subsequent content of this article is organized as follows. The introduction of the H2-CHGN is in Section 2. The experimental settings are depicted in Section 3. In Section 4, the extensive experiments and analyses are carried out. Finally, summaries are made in Section 5.

2. Proposed H2-CHGN Framework

We denote the H2 image cube as $X \in \mathbb{R}^{H \times W \times B}$, where $H$ and $W$ represent the height and width of the spatial dimension and $B$ represents the number of spectral bands.
In the H2-CHGN model, as shown in Figure 1, the CGAT branch first uses superpixel segmentation on the H2 image to filter out the samples with high spatial correlation with the samples to be classified. The data structure transformation is performed through the encoder (decoder), which facilitates operations in the graph space. The cross-hop operation is utilized in graph convolution to broaden the range of graph convolution. To increase feature diversity, we use the pyramid feature extraction structure to comprehensively learn multilevel graph structure information. Then, contextual information is captured by the graph attention mechanism to obtain more representative global features. Meanwhile, to enrich local features, the MCNN branch extracts multi-scale pixel-level features by parallel cross connectivity with dual convolutional kernels (3 × 3 and 5 × 5). Finally, to make image elements more prominent, the features of varied scales are passed into the dual-channel attention fusion module to obtain fused features, and the final classification results are obtained through the SoftMax layer.

2.1. Graph Construction Process Based on Superpixel Segmentation

In order to apply graph neural networks to an HSI, we need to convert standard Euclidean data such as H2 images into graph data. However, if we directly treat the pixels of an H2 image as graph nodes, the resulting graph would be very large and the computational complexity would be extremely high. Thus, we first apply principal component analysis (PCA) to the original HSI to improve the efficiency of the graph construction process. This is followed by the simple linear iterative clustering (SLIC) method, which generates spatially neighboring and spectrally similar superpixels. The adjacency matrix subsequently input into the GCN is constructed by establishing the adjacency relationships between superpixels, as shown in Figure 2.
Specifically, an H2 image is divided into $Z = (H \times W)/\lambda$ superpixels by PCA-SLIC, where $\lambda$ is the superpixel segmentation scale. Let $S = \{S_i\}_{i=1}^{Z}$ denote the superpixel set, with $S_i = \{x_j^i\}_{j=1}^{N_i}$ as the $i$-th superpixel, $N_i$ as the number of pixels in $S_i$, and $x_j^i$ as the $j$-th pixel in $S_i$ ($S_i \cap S_j = \varnothing$ for $i \neq j$, and $H \times W = \sum_{i=1}^{Z} N_i$).
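As a minimal sketch of this PCA–SLIC preprocessing step (using scikit-learn and scikit-image ≥ 0.19; the number of principal components, the segmentation scale, and the compactness value below are illustrative assumptions, not the settings used in the paper):

```python
import numpy as np
from sklearn.decomposition import PCA
from skimage.segmentation import slic

def pca_slic_superpixels(hsi, n_components=3, scale=300, compactness=10.0):
    """Reduce an H x W x B hyperspectral cube with PCA, then segment it
    into roughly (H * W) / scale superpixels with SLIC."""
    H, W, B = hsi.shape
    # PCA on the flattened spectra to suppress redundant bands.
    pcs = PCA(n_components=n_components).fit_transform(hsi.reshape(-1, B))
    pcs = pcs.reshape(H, W, n_components)
    n_segments = (H * W) // scale              # Z = (H x W) / lambda
    labels = slic(pcs, n_segments=n_segments, compactness=compactness,
                  start_label=0, channel_axis=-1)
    return labels                              # H x W map of superpixel indices
```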
Since the subsequent output features of the GCN need to be merged with those of the CNN, in order to alleviate the data incompatibility between the two networks, we apply the graph encoder and decoder proposed in [41] to perform the data structure conversion. It is assumed that the association matrix of the conversion is $Q \in \mathbb{R}^{HW \times Z}$, denoted as follows [41]:

$$Q_{i,j} = \begin{cases} 1, & \text{if } \hat{X}_i \in S_j \\ 0, & \text{otherwise} \end{cases}, \qquad \hat{X} = \mathrm{Flatten}(X) \tag{1}$$

where $\mathrm{Flatten}(\cdot)$ denotes flattening the original data along the spatial dimensions and $Q_{i,j}$ denotes the value of $Q$ at position $(i, j)$. Then, the feature matrix of the nodes $v$ in graph $G$ ($G = (v, e)$) can be expressed as follows [41]:

$$V = \mathrm{Encode}(X; Q) = \hat{Q}^{T}\, \mathrm{Flatten}(X) \tag{2}$$

where $\hat{Q}$ is the column-normalized version of $Q$ and $\mathrm{Encode}(\cdot)$ represents the process that encodes the pixel map of the HSI into the graph nodes of $G$. The superpixel node features are then mapped back to pixel features using $\mathrm{Decode}(\cdot)$ as follows [41]:

$$\tilde{X} = \mathrm{Decode}(V; Q) = \mathrm{Reshape}(Q V) \tag{3}$$
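A small sketch of this encoder/decoder pair (Equations (1)–(3)), assuming the superpixel label map produced in the previous step; the function and variable names are illustrative:

```python
import numpy as np

def build_association(labels):
    """Q in {0,1}^(HW x Z): Q[i, j] = 1 iff flattened pixel i belongs to superpixel j (Eq. (1))."""
    flat = labels.reshape(-1)
    Q = np.zeros((flat.size, flat.max() + 1), dtype=np.float32)
    Q[np.arange(flat.size), flat] = 1.0
    return Q

def encode(X, Q):
    """Pixel features (H x W x B) -> superpixel node features V (Z x B), Eq. (2):
    the column-normalized Q averages the pixels inside each superpixel."""
    Q_hat = Q / (Q.sum(axis=0, keepdims=True) + 1e-12)
    return Q_hat.T @ X.reshape(-1, X.shape[-1])

def decode(V, Q, H, W):
    """Map superpixel node features back to a pixel grid (H x W x d), Eq. (3)."""
    return (Q @ V).reshape(H, W, -1)
```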

2.2. Cross-Hop Graph Attention Convolution Module

It is feasible to obtain more node information by stacking multiple convolutional layers in a GCN model; however, this inevitably increases the computational complexity of the network. Using only a shallow GCN, on the other hand, lacks deeper feature information and leads to poor classification accuracy. In contrast, the multi-hop graph [46] is more flexible and can fully utilize multi-hop node information to broaden the receptive field and mine the potential relationships between hop nodes.
As shown in Figure 3, the weight matrix $W \in \mathbb{R}^{m \times m}$ between the superpixel nodes is obtained from the superpixel segmentation algorithm and used to construct the neighbor matrices of the multi-hop graph structure; the concrete steps are as follows:
Step 1. Assuming that the graph node $V_{center}$ is located at the center, first obtain all the surrounding k-hop neighborhood paths by using a Depth-First Search (DFS) starting at $V_{center}$.
Step 2. Calculate the path weights as:

$$\bar{W} = \frac{1}{k}\big(W(V_{center}, V_1) + W(V_{center}, V_2) + \cdots + W(V_{center}, V_{end})\big) \tag{4}$$

where $V_1, V_2, \ldots, V_{end}$ denote the nodes in the path. (Note: when multiple paths exist, we take the maximum of the corresponding path weight and the current weight.)
Step 3. Add the identity matrix to the current weight matrix to ensure that each node is connected to itself, obtaining new k-hop matrices ($A_1, A_2, \ldots, A_k$). A sketch of these three steps is given below.
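A minimal NumPy sketch of Steps 1–3; it assumes that the non-zero entries of the weight matrix W define which superpixels are adjacent, and it interprets the path weight of Equation (4) literally as the mean of the weights between the center node and each node on the path (the authors' implementation may differ):

```python
import numpy as np

def k_hop_matrices(W, max_k=5):
    """Build k-hop matrices A_1..A_max_k from the superpixel weight matrix W.
    Path weights follow Eq. (4); for multiple paths the maximum value is kept."""
    n = W.shape[0]
    A = {k: np.zeros_like(W, dtype=np.float64) for k in range(1, max_k + 1)}

    def dfs(center, node, depth, acc, visited):
        if depth > 0:
            A[depth][center, node] = max(A[depth][center, node], acc / depth)
        if depth == max_k:
            return
        for nxt in np.nonzero(W[node])[0]:       # neighbours of the current node
            if nxt not in visited:
                dfs(center, nxt, depth + 1, acc + W[center, nxt], visited | {nxt})

    for c in range(n):
        dfs(c, c, 0, 0.0, {c})
    for k in A:
        A[k] += np.eye(n)                        # Step 3: connect each node to itself
    return A
```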
The k-hop neighbor matrix $A_k$ is then used in the graph convolution through the GCN and BN layers as follows:

$$T_k^l = \mathrm{LeakyReLU}\!\left(D_k^{-\frac{1}{2}} A_k D_k^{-\frac{1}{2}}\, T_k^{l-1} W_k^l\right) \tag{5}$$

where $T_k^l$ denotes the output of the k-hop neighbor matrix $A_k$ at layer $l$, $D_k$ is the degree matrix of $A_k$, and $W_k^l$ is the weight matrix. Node features from different hops provide different receptive-field information; as shown in the structure of Figure 4a, the pyramid feature extraction structure obtains features of different depths, which are concatenated as follows:

$$T_{out} = T_1 \,\|\, T_3 \,\|\, T_5 \tag{6}$$
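The following PyTorch sketch illustrates Equations (5) and (6) with a single symmetric-normalized layer per hop; the 1-3-5 hop choice follows the text, while the single-layer-per-hop simplification, the feature dimension, and the variable names are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class HopGCNLayer(nn.Module):
    """One symmetric-normalized GCN layer for a fixed k-hop adjacency matrix (Eq. (5))."""
    def __init__(self, A_k, in_dim, out_dim):
        super().__init__()
        deg = A_k.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-12).pow(-0.5))
        self.register_buffer("A_hat", d_inv_sqrt @ A_k @ d_inv_sqrt)  # D^-1/2 A_k D^-1/2
        self.lin = nn.Linear(in_dim, out_dim, bias=False)             # W_k^l
        self.act = nn.LeakyReLU()

    def forward(self, T):                      # T: Z x in_dim superpixel node features
        return self.act(self.A_hat @ self.lin(T))

def cross_hop_features(V, A_hops, dim=64):
    """1-3-5 cross-hop pyramid: one (untrained) layer per hop, then
    T_out = T_1 || T_3 || T_5 (Eq. (6)). A_hops maps hop k to its float adjacency tensor."""
    outs = [HopGCNLayer(A_hops[k], V.shape[1], dim)(V) for k in (1, 3, 5)]
    return torch.cat(outs, dim=-1)
```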
To compute the hidden information of each node, as in Figure 4b, a shared self-attention mechanism $a$ is applied to compute the attention coefficient between node $v_i$ and node $v_j$: $e_{ij} = a(Wv_i, Wv_j)$. Here, first-order attention is carried out by computing only the first-order neighbor nodes $j \in N_i$ of node $i$, where $N_i$ is the neighborhood of $i$. Then, normalization by the SoftMax function is performed to make the coefficients more easily comparable across nodes:

$$\alpha_{ij} = \mathrm{Softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N_i} \exp(e_{ik})} \tag{7}$$

Then, the corresponding features undergo a linear combination to compute the node output features $v_i' = \sigma\!\left(\sum_{j \in N_i} \alpha_{ij} W v_j\right)$.
To obtain the node features stably, we apply multiple attention mechanisms to obtain multiple sets of new features, which are concatenated along the feature dimension and fed into the final fully connected layer to obtain the final features as follows:

$$h_i = \mathrm{ELU}\!\left(W_3\left(\Big\|_{t=1}^{T} \sigma\!\Big(\sum_{j \in N_i} \alpha_{ij}^t W h_j\Big)\right)\right) \tag{8}$$

where $\alpha_{ij}^t$ denotes the $t$-th group of attention coefficients; $T$ is the number of heads, i.e., there are $T$ groups of attention coefficients; and $\|$ denotes the concatenation of the $T$ groups along the feature dimension. $W_3 \in \mathbb{R}^{C \times Td}$ denotes the weight matrix, where $Td$ is the input feature dimension and $C$ is the output feature dimension, which equals the number of classes in the hyperspectral image.
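A compact PyTorch sketch of the multi-head graph attention step in Equations (7) and (8); the additive attention form, the assumption that the adjacency matrix already contains self-loops, and all layer sizes are illustrative choices rather than the authors' exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadGAT(nn.Module):
    """Multi-head graph attention (Eqs. (7)-(8)): per-head attention over first-order
    neighbours, heads concatenated, then a final linear layer W_3 followed by ELU."""
    def __init__(self, in_dim, head_dim, n_heads, n_classes):
        super().__init__()
        self.W = nn.Parameter(torch.empty(n_heads, in_dim, head_dim))
        self.a = nn.Parameter(torch.empty(n_heads, 2 * head_dim))
        nn.init.xavier_uniform_(self.W)
        nn.init.xavier_uniform_(self.a)
        self.out = nn.Linear(n_heads * head_dim, n_classes)            # W_3

    def forward(self, h, adj):        # h: Z x in_dim, adj: Z x Z (0/1, with self-loops)
        Wh = torch.einsum("zd,hdf->hzf", h, self.W)                    # heads x Z x head_dim
        src = (Wh * self.a[:, None, :Wh.shape[-1]]).sum(-1)            # attention logits
        dst = (Wh * self.a[:, None, Wh.shape[-1]:]).sum(-1)
        e = F.leaky_relu(src[:, :, None] + dst[:, None, :])            # e_ij per head
        e = e.masked_fill(adj[None] == 0, float("-inf"))               # keep only j in N_i
        alpha = torch.softmax(e, dim=-1)                               # Eq. (7)
        heads = F.elu(torch.einsum("hij,hjf->hif", alpha, Wh))         # sigma(sum alpha W h_j)
        return F.elu(self.out(heads.permute(1, 0, 2).reshape(h.shape[0], -1)))  # Eq. (8)
```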

2.3. CNN-Based Multiscale Feature Extraction Module

Although traditional 2D-CNNs can extract contextual spatial features, considering the many parameters in the convolutional kernels and the limited number of training samples, we prevent overfitting and improve training efficiency by applying batch normalization (BN) to each convolutional layer unit [47]. The convolution is calculated as follows:

$$v_{ij}^{xy} = \sigma\!\left(b_{ij} + \sum_{m=0}^{M_i - 1} \sum_{p=0}^{P_i - 1} \sum_{q=0}^{Q_i - 1} w_{ijm}^{pq}\, v_{(i-1)m}^{(x+p)(y+q)}\right) \tag{9}$$

where $P_i$, $Q_i$, and $M_i$ denote the sizes of the convolution kernel and $p$, $q$, and $m$ their corresponding indices; $v_{ij}^{xy}$ is the output of the $i$-th convolutional layer at position $(x, y)$; and $w_{ijm}^{pq}$, $b_{ij}$, and $\sigma$ denote the weight, bias term, and LeakyReLU activation function, respectively.
To broaden the receptive field of the network and obtain local features at varied scales, as shown in Figure 5, features are extracted by different-sized convolution kernels in parallel, and the information at different scales is then integrated by cross-path fusion. Because of the high dimensionality of the HSI spectrum, a 1D convolution kernel is first used in the convolution module to remove redundant spectral information and reduce parameter usage. An average pooling layer is then used between the different convolutional layers to reduce the feature map size and prevent overfitting, where the pooling window size, stride, and padding are set to 3 × 3, 1, and 1, respectively.
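A PyTorch sketch of one such multiscale convolution block; the 1 × 1 convolution standing in for the 1D spectral-reduction kernel, the channel width of 64, and the particular cross-path fusion (adding each path to the pooled output of the other) are assumptions made for illustration:

```python
import torch
import torch.nn as nn

class MultiScaleConvBlock(nn.Module):
    """Sketch of the MCNN branch: spectral compression, then parallel 3x3 and 5x5
    paths with BN whose outputs are fused by cross-path pooling."""
    def __init__(self, in_bands, mid=64):
        super().__init__()
        self.reduce = nn.Sequential(nn.Conv2d(in_bands, mid, kernel_size=1),
                                    nn.BatchNorm2d(mid), nn.LeakyReLU())
        self.path3 = nn.Sequential(nn.Conv2d(mid, mid, 3, padding=1),
                                   nn.BatchNorm2d(mid), nn.LeakyReLU())
        self.path5 = nn.Sequential(nn.Conv2d(mid, mid, 5, padding=2),
                                   nn.BatchNorm2d(mid), nn.LeakyReLU())
        self.pool = nn.AvgPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):                       # x: N x B x H x W
        x = self.reduce(x)
        f3, f5 = self.path3(x), self.path5(x)
        # cross-path fusion: each scale receives the pooled output of the other path
        return torch.cat([f3 + self.pool(f5), f5 + self.pool(f3)], dim=1)
```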

2.4. Dual-Channel Attention Fusion Module

To better exploit the channel relationships between hyperspectral pixels, we utilize the Convolutional Block Attention Module (CBAM) [60] shown in Figure 6 to sequentially infer attention maps along the channel and spatial dimensions independently, multiplying them by the input for adaptive feature refinement.
Suppose the input feature map is $F \in \mathbb{R}^{C \times H \times W}$; the CBAM module produces a channel attention map $M_C \in \mathbb{R}^{C \times 1 \times 1}$ and a spatial attention map $M_S \in \mathbb{R}^{1 \times H \times W}$, and the attention process is expressed as follows:

$$F_1 = M_C(F) \otimes F \tag{10}$$

$$F_2 = M_S(F_1) \otimes F_1 \tag{11}$$

where $\otimes$ denotes element-wise multiplication.
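A minimal CBAM-style block corresponding to Equations (10) and (11), with the channel attention computed by a shared MLP over pooled descriptors and the spatial attention by a k × k convolution (these are detailed in Equations (12) and (16) below); the reduction ratio and kernel size are illustrative defaults:

```python
import torch
import torch.nn as nn

class CBAMBlock(nn.Module):
    """Channel attention followed by spatial attention (Eqs. (10)-(11))."""
    def __init__(self, channels, reduction=16, k=7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=k, padding=k // 2)

    def forward(self, F_in):                             # N x C x H x W
        avg = F_in.mean(dim=(2, 3))                      # F_avg^C
        mx = F_in.amax(dim=(2, 3))                       # F_max^C
        Mc = torch.sigmoid(self.mlp(avg) + self.mlp(mx))[:, :, None, None]
        F1 = Mc * F_in                                   # Eq. (10)
        pooled = torch.cat([F1.mean(1, keepdim=True), F1.amax(1, keepdim=True)], 1)
        Ms = torch.sigmoid(self.spatial(pooled))
        return Ms * F1                                   # Eq. (11)
```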
Feature fusion inspired by [47] utilizes the cross-attention fusion mechanism as shown in Figure 7.
Specifically, the spatial information of the input feature map is first aggregated by global average pooling and max pooling to generate two descriptors, $F_{avg}^C$ and $F_{max}^C$. These are fed into a shared network composed of a multilayer perceptron (MLP) with one hidden layer to obtain the channel attention map $M_C \in \mathbb{R}^{C \times 1 \times 1}$:

$$M_C(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big) = \sigma\big(W_1(W_0 F_{avg}^C) + W_1(W_0 F_{max}^C)\big) \tag{12}$$

where $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$ represent the shared weights of the MLP, $r$ is the reduction ratio of the hidden layer, and $\sigma$ is the sigmoid function. From this, we obtain the maps $M_C^N$ and $M_C^G$, which correspond to the two branch features $N_{out}$ and $G_{out}$, respectively, and multiply them to obtain $M_C^{cross}$:

$$M_C^{cross} = M_C^N \times (M_C^G)^{T} \tag{13}$$
After the channel crossing module, the features are obtained as follows:

$$N_C = \mathrm{Softmax}(M_C^{cross})\, N_{out} \tag{14}$$

$$G_C = \mathrm{Softmax}(M_C^{cross})\, G_{out} \tag{15}$$

Unlike the channel attention operation, the two pooled features are input into a convolutional layer to generate the spatial attention map $M_S \in \mathbb{R}^{1 \times H \times W}$ [47]:

$$M_S(F) = \sigma\big(f^{k \times k}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])\big) = \sigma\big(f^{k \times k}([F_{avg}^S; F_{max}^S])\big) \tag{16}$$

where $f^{k \times k}$ is a convolutional layer with kernel size $k \times k$. Similarly, we obtain the spatial weight coefficients $M_T^N$ and $M_T^G$, multiply them with the input features, and add residual connections to obtain the attended features:

$$N_S = \mathrm{Softmax}(M_C^{cross})\, N_{out} + N_{out} \tag{17}$$

$$G_S = \mathrm{Softmax}(M_C^{cross})\, G_{out} + G_{out} \tag{18}$$
Finally, the output features are obtained through a fully connected layer [47] as follows:

$$Y = \mathrm{Softmax}\big(W(N_S \,\|\, G_S) + B\big) \tag{19}$$

where $W$ and $B$ are the weight and bias and $\|$ denotes concatenation. The process is summarized in Algorithm 1.
Algorithm 1 Dual-Channel Attention Fusion Algorithm
Input: Feature map $F \in \mathbb{R}^{C \times H \times W}$
Step 1: Calculate $F_{avg}^C$ and $F_{max}^C$ of $F$ separately by global average pooling and max pooling.
Step 2: Calculate the channel weight coefficients of the two branches, $M_C^N$ and $M_C^G$, according to Equation (12).
Step 3: Calculate the crossover coefficient $M_C^{cross}$ by Equation (13).
Step 4: Use the channel crossover module to calculate $N_C$ and $G_C$, respectively, according to Equations (14) and (15).
Step 5: Similarly to Steps 1–4, calculate the spatial weight coefficients $M_T^N$ and $M_T^G$ and the fusion features $N_S$ and $G_S$ by Equations (16)–(18).
Step 6: Calculate the final fusion features according to Equation (19).
Output: $Y$
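The channel-crossing step of Algorithm 1 can be sketched as follows; the exact way the cross map re-weights the branch features in Equations (13)–(15) is not fully specified in the text, so the matrix-product interpretation, the batch dimension, and the variable names here are assumptions:

```python
import torch

def channel_cross_fusion(N_out, G_out, Mc_N, Mc_G):
    """Cross channel attention (Eqs. (13)-(15)). Assumed shapes: branch features
    N x C x H x W, channel attention maps N x C x 1 x 1."""
    # M_C^cross = M_C^N x (M_C^G)^T, computed per sample -> N x C x C
    Mc_cross = Mc_N.flatten(1).unsqueeze(2) @ Mc_G.flatten(1).unsqueeze(1)
    weights = torch.softmax(Mc_cross, dim=-1)
    N_c = torch.einsum("ncd,ndhw->nchw", weights, N_out)   # Eq. (14)
    G_c = torch.einsum("ncd,ndhw->nchw", weights, G_out)   # Eq. (15)
    return N_c, G_c
```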

3. Experimental Details

To validate the effectiveness and generalization of the H2-CHGN, we chose the following three datasets: a classical HSI dataset (Pavia University) and two H2 image datasets (WHU-Hi-LongKou [1] and WHU-Hi-HongHu [61]). They were captured by different sensors over different types of scenes, providing richer samples. This diversity helps improve the generalization ability of the model so that it performs well in different scenes. Table 1 lists detailed information about the datasets. Table 2, Table 3 and Table 4 list the training and testing set divisions of the three datasets. The comparison methods are SVM (OA: PU = 79.54%, LK = 92.88%, and HH = 66.34%), CEGCN [41] (OA: PU = 97.81%, LK = 98.72%, and HH = 94.01%), SGML (OA: PU = 94.30%, LK = 96.03%, and HH = 92.51%), WFCG (OA: PU = 97.53%, LK = 98.29%, and HH = 93.98%), MSSG-UNet (OA: PU = 98.52%, LK = 98.56%, and HH = 93.73%), MS-RPNet (OA: PU = 96.96%, LK = 97.17%, and HH = 93.56%), AMGCFN (OA: PU = 98.24%, LK = 98.44%, and HH = 94.44%), and H2-CHGN (OA: PU = 99.24%, LK = 99.19%, and HH = 96.60%). To quantitatively and qualitatively assess the classification performance of the network [62], three evaluation indices are used: overall accuracy (OA), average accuracy (AA), and the kappa coefficient (Kappa). All experimental results are averaged over ten independent runs. The experiments were run with Python 3.7.16 on an Intel i5-8250U CPU and an NVIDIA GeForce RTX 3090 GPU.
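For reference, the three indices can be computed from a class confusion matrix as in the generic sketch below (not taken from the authors' code):

```python
import numpy as np

def classification_metrics(conf):
    """OA, AA, and Kappa from a confusion matrix (rows = ground truth, cols = prediction)."""
    conf = conf.astype(np.float64)
    total = conf.sum()
    oa = np.trace(conf) / total                                   # overall accuracy
    aa = (np.diag(conf) / conf.sum(axis=1)).mean()                # mean per-class accuracy
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```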

3.1. Parametric Setting

Table 5 shows the architecture of the H2-CHGN. In addition, the network is trained with the Adam optimizer at a learning rate of 5 × 10⁻⁴. The remaining parameters, including the superpixel segmentation scale, the number of attention heads, and the number of iterations, are explained in the following sections. We discuss and analyze these parameters through experiments, and the final optimal parameter settings are shown in Table 6.

3.2. Analysis of Multi-Head Attention Mechanism in Graph Attention

By using multiple attention heads, a GAT is able to learn the attention weights between different nodes and combine them to obtain more comprehensive and accurate image features. Notably, the number of heads controls how many different node relationships the model can learn. Increasing the number of attention heads allows the model to capture richer node relationships and improves its representation ability, but it also increases the computational complexity and memory overhead and is prone to overfitting. Therefore, as shown in Figure 8, to balance model performance and computational complexity, the number of heads is set to 5, 7, and 6 for the Pavia University, WHU-Hi-LongKou, and WHU-Hi-HongHu datasets, respectively.

3.3. Impact of Superpixel Segmentation Scale

The superpixel segmentation scale determines the size of the graph construction area. The larger each superpixel region, the more pixels it contains and the fewer graph nodes need to be constructed. In the experiments, λ is set to 150, 200, 250, 300, 350, and 400. Regions with different numbers of superpixels are visualized in mean color in Figure 9, where the most representative features of each region can be seen more clearly. The larger the superpixel segmentation scale, the more superpixels in the segmented region, indicating a more detailed segmentation. Figure 10 shows that as the scale increases, the classification accuracy initially increases but eventually declines, with the WHU-Hi-HongHu dataset experiencing a more significant decrease than the other two datasets. This is attributed to its greater complexity, featuring 22 categories and a relatively denser distribution of objects. When different varieties of the same crop type are planted in a region, finer segmentation can lead to a higher likelihood of misclassifying pixels of various categories into a single node. Additionally, the overall accuracy improves again when the segmentation scale increases from 350 to 400, likely due to the enhanced category separability achieved through more detailed segmentation.

3.4. Cross-Hopping Connection Analysis

Because an HSI contains feature distributions of varying shapes and sizes, the cross-hop mechanism plays a critical role in modeling complex spatial topologies. Specifically, the near-hop graph structure contains fewer nodes, which is good for modeling small feature distributions but makes it hard to learn continuous, smooth features over large ones. In contrast to the neighbor-hopping graph structure, in which most of the graph nodes are duplicated, the cross-hopping graph can better model large feature distributions but fails to account for subtle differences. To verify the effectiveness of the cross-hop operation, we repeated the experiment 10 times for three configurations (123-hop for adjacent hops, 124-hop for crossing even hops, and 135-hop for crossing odd hops). As shown in Figure 11, there is no significant effect on the simpler WHU-Hi-LongKou dataset. In contrast, for the other two more complex datasets, the odd-hop 135-hop operation performs better. The reason can be inferred to be that the cross-hop mechanism enables a larger range of graph convolution operations, which can effectively extract a wider range of sample information and thus perform better.

4. Comparative Experimental Analysis and Discussions

4.1. Comparison of Classification Performance

The classification accuracies of the different methods on the three datasets are reported in Table 7, Table 8 and Table 9, and the corresponding classification maps are shown in Figure 12, Figure 13 and Figure 14. From the results, SVM achieves poorer results when the training samples are few. In contrast, the GCN-based SGML achieves smoother features but also suffers from scattered salt-and-pepper noise and is susceptible to misclassification. In addition, the other models based on the fusion of a GCN and a CNN clearly achieve higher classification accuracy despite their more complex structure, which indicates that the training sample features can be fully exploited by using the superpixel-based multiscale fusion mechanism. As seen in the classification maps in Figure 12, Figure 13 and Figure 14, the H2-CHGN achieves better classification results than the other models on both the classical and the double-high (H2) datasets. The model provides a particularly significant improvement on the WHU-Hi-HongHu dataset, whose plots vary in size and exhibit a more fragmented distribution, making them susceptible to isolated regions of classification errors. The proposed cross-hop graph operation facilitates the acquisition of contextual information and global association features from near to far, achieving a maximum accuracy of 96.6% on the WHU-Hi-HongHu dataset. In summary, the H2-CHGN not only integrates a CNN and a GCN well, but also effectively handles complex spatial features by using superpixel segmentation in the GCN branch, extracts multi-scale structural features using the pyramid features of the cross-hop graphs, and captures local and global information with a GAT. The combined effect of superpixel segmentation, cross-hop graph convolution, and the attention mechanism can alleviate the problem of high spatial-spectral heterogeneity.

4.2. Ablation Analysis

To validate the efficacy of each module, we conducted ablation experiments on the three datasets. Table 10 presents the results: (1) and (2) represent experiments using only one of the two branches, while (3) and (4) represent experiments combining the Dual-Channel Fusion module with one or the other of the two branches, respectively. Comparing (1) with (3), as well as (2) with (4), shows that the Dual-Channel Fusion module enhances classification performance by improving the output features of the corresponding branch. Moreover, Experiment (5) demonstrates the indispensable contribution of each component to the overall model.

4.3. Performance under Limited Samples

H2 data usually have high dimensionality and plenty of spectral bands, but labeled training samples are typically scarce, with only a limited number of samples per category. Therefore, the small-sample learning ability of the model is crucial in HSI classification. In the experiments, 0.1–0.6% of the samples are selected for training. As described in Figure 15, OA increases with more training samples, and the H2-CHGN outperforms the other models on all three datasets. It is worth noting that, due to possible sample imbalance in hyperspectral datasets, over-sampling of certain categories may lead to a decrease in accuracy as the sampling rate increases.

4.4. Visualization by t-SNE

T-distributed stochastic neighbor embedding (t-SNE) [63] is a nonlinear dimensionality reduction technique which is particularly suitable for the visualization of high-dimensional data. Figure 16 shows the t-SNE result of four methods on three datasets. For a more straightforward view, the experiment was visualized by randomly selecting one of the bands of the feature. By comparison, a larger interclass gap and smaller intraclass disparity are displayed in the proposed feature space, and all categories are distinguished by acceptable boundaries [64].
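A generic sketch of how such a t-SNE visualization can be produced with scikit-learn (the feature matrix, labels, and plotting choices are placeholders, not the authors' setup):

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def tsne_plot(features, labels):
    """Project per-pixel feature vectors (N x d) to 2-D with t-SNE and colour by class."""
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=2, cmap="tab20")
    plt.title("t-SNE of learned features")
    plt.show()
```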

4.5. Comparison of Running Time

To evaluate the efficiency of the H2-CHGN, the running times of the different methods were recorded on the same computing platform; the comparisons are shown in Table 11. Like CEGCN and MSSG-UNet, the H2-CHGN inputs the whole H2 image into the network and classifies all the pixels in parallel through its two branches, which results in a shorter training time compared to the other methods and demonstrates the efficiency of the method.

5. Conclusions

In this paper, a cross-hop graph network (H2-CHGN) model for H2 image classification is proposed to alleviate the spatial heterogeneity and spectral variability of H2 images. It is essentially a hybrid neural network that combines a superpixel-based GCN and a pixel-based CNN to extract spatial and global spectral features. The cross-hop graph attention network (CGAT) branch widens the range of graph convolution, and the pyramid feature extraction structure is utilized to fuse multilevel features. To better capture the relationships between nodes and contextual information, the H2-CHGN also employs a graph attention mechanism that specializes in graph-structured data. Meanwhile, the multi-scale convolutional neural network (MCNN) employs dual convolutional kernels to extract features at different scales and obtains multi-scale localized features at the pixel level by means of cross connectivity. Finally, the dual-channel attention fusion module is used to effectively integrate multi-scale information while strengthening the key features and improving generalization capability. Experimental results verify the validity and generalization of the H2-CHGN. Specifically, the overall accuracy on the three datasets (Pavia University, WHU-Hi-LongKou, and WHU-Hi-HongHu) reaches 99.24%, 99.19%, and 96.60%, respectively.
Our study has certain limitations, including a large number of model parameters and high computational demands. First, the performance of the H2-CHGN results from superpixel segmentation, cross-hop convolution, and the attention mechanism working together, so suitable parameter matching is essential. In addition, a standard GCN needs to construct the adjacency matrix over all data, whereas the H2-CHGN reduces the computational cost through cross-hop graph convolution.
In the future, we will explore strategies to enhance the cross-hop graph by employing more advanced techniques, such as transformers. Additionally, we aim to develop lighter-weight networks to reduce complexity while preserving performance.

Author Contributions

Conceptualization, T.W. and H.C.; methodology, T.W. and B.Z.; software T.C.; validation, W.D.; resources, H.C.; data curation, T.W.; writing—original draft preparation, T.W. and H.C.; writing—review and editing, T.C. and W.D.; visualization, T.W.; supervision, B.Z.; project administration, T.C.; funding acquisition, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (62176217), the Innovation Team Funds of China West Normal University (KCXTD2022-3), the Sichuan Science and Technology Program of China (2023YFG0028, 2023YFS0431), the A Ba Achievements Transformation Program (R23CGZH0001), the Sichuan Science and Technology Program of China (2023ZYD0148, 2023YFG0130), and the Sichuan Province Transfer Payment Application and Development Program (R22ZYZF0004).

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhong, Y.; Hu, X.; Luo, C.; Wang, X.; Zhao, J.; Zhang, L. WHU-Hi: UAV-borne hyperspectral with high spatial resolution (H2) benchmark datasets and classifier for precise crop identification based on deep convolutional neural network with CRF. Remote Sens. Environ. 2020, 250, 112012. [Google Scholar] [CrossRef]
  2. Paoletti, M.E.; Haut, J.M.; Plaza, J.; Plaza, A. Deep learning classifiers for hyperspectral imaging: A review. ISPRS J. Photogramm. Remote Sens. 2019, 158, 279–317. [Google Scholar] [CrossRef]
  3. Hong, D.; Gao, L.; Yao, J.; Zhang, B.; Plaza, A.; Chanussot, J. Graph convolutional networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 5966–5978. [Google Scholar] [CrossRef]
  4. Landgrebe, D. Hyperspectral image data analysis. IEEE Signal Process. Mag. 2002, 19, 17–28. [Google Scholar] [CrossRef]
  5. Shimoni, M.; Haelterman, R.; Perneel, C. Hyperspectral imaging for military and security applications: Combining myriad processing and sensing techniques. IEEE Geosci. Remote Sens. Mag. 2019, 7, 101–117. [Google Scholar] [CrossRef]
  6. Li, S.; Song, W.; Fang, L.; Chen, Y.; Ghamisi, P.; Benediktsson, J.A. Deep learning for hyperspectral image classification: An overview. IEEE Trans. Geosci. Remote Sens. 2019, 57, 6690–6709. [Google Scholar] [CrossRef]
  7. Long, H.; Chen, T.; Chen, H.; Zhou, X.; Deng, W. Principal space approximation ensemble discriminative marginalized least-squares regression for hyperspectral image classification. Eng. Appl. Artif. Intell. 2024, 133, 108031. [Google Scholar] [CrossRef]
  8. Li, J.; Zheng, K.; Yao, J.; Gao, L.; Hong, D. Deep unsupervised blind hyperspectral and multispectral data fusion. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  9. Chen, H.; Long, H.; Chen, T.; Song, Y.; Chen, H.; Zhou, X.; Deng, W. M3FuNet: An Unsupervised Multivariate Feature Fusion Network for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15. [Google Scholar]
  10. Makantasis, K.; Karantzalos, K.; Doulamis, A.; Doulamis, N. Deep supervised learning for hyperspectral data classification through convolutional neural networks. In Proceedings of the 2015 IEEE international geoscience and remote sensing symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 4959–4962. [Google Scholar]
  11. Zhong, Z.; Li, J.; Luo, Z.; Chapman, M. Spectral–spatial residual network for hyperspectral image classification: A 3-D deep learning framework. IEEE Trans. Geosci. Remote Sens. 2017, 56, 847–858. [Google Scholar] [CrossRef]
  12. Roy, S.K.; Krishna, G.; Dubey, S.R.; Chaudhuri, B.B. HybridSN: Exploring 3-D–2-D CNN feature hierarchy for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2019, 17, 277–281. [Google Scholar] [CrossRef]
  13. Chen, H.; Ru, J.; Long, H.; He, J.; Chen, T.; Deng, W. Semi-supervised adaptive pseudo-label feature learning for hyperspectral image classification in internet of things. IEEE Internet Things J. 2024. [Google Scholar] [CrossRef]
  14. Vaswani, A. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  15. Hong, D.; Han, Z.; Yao, J.; Gao, L.; Zhang, B.; Plaza, A.; Chanussot, J. SpectralFormer: Rethinking hyperspectral image classification with transformers. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–15. [Google Scholar] [CrossRef]
  16. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  17. Sun, L.; Zhang, H.; Zheng, Y.; Wu, Z.; Ye, Z.; Zhao, H. MASSFormer: Memory-Augmented Spectral-Spatial Transformer for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5516415. [Google Scholar] [CrossRef]
  18. Johnson, F.; Adebukola, O.; Ojo, O.; Alaba, A.; Victor, O. A task performance and fitness predictive model based on neuro-fuzzy modeling. Artif. Intell. Appl. 2024, 2, 66–72. [Google Scholar] [CrossRef]
  19. Zhao, H.; Gao, Y.; Deng, W. Defect detection using shuffle Net-CA-SSD lightweight network for turbine blades in IoT. IEEE Internet Things J. 2024. [Google Scholar] [CrossRef]
  20. Li, W.; Liu, D.; Li, Y.; Hou, M.; Liu, J.; Zhao, Z.; Guo, A.; Zhao, H.; Deng, W. Fault diagnosis using variational autoencoder GAN and focal loss CNN under unbalanced data. Struct. Health Monit. 2024. [Google Scholar] [CrossRef]
  21. Yu, C.; Zhao, X.; Gong, B.; Hu, Y.; Song, M.; Yu, H.; Chang, C.I. Distillation-Constrained Prototype Representation Network for Hyperspectral Image Incremental Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5507414. [Google Scholar] [CrossRef]
  22. Bhosle, K.; Musande, V. Evaluation of deep learning CNN model for recognition of Devanagari digit. Artif. Intell. Appl. 2023, 1, 114–118. [Google Scholar]
  23. Sun, Q.; Chen, J.; Zhou, L.; Ding, S.; Han, S. A study on ice resistance prediction based on deep learning data generation method. Ocean. Eng. 2024, 301, 117467. [Google Scholar] [CrossRef]
  24. Wang, Z.; Xu, N.; Bao, X.; Wu, J.; Cui, X. Spatio-temporal deep learning model for accurate streamflow prediction with multi-source data fusion. Environ. Model. Softw. 2024, 178, 106091. [Google Scholar] [CrossRef]
  25. Shao, H.; Zhou, X.; Lin, J.; Liu, B. Few-shot cross-domain fault diagnosis of bearing driven by Task-supervised ANIL. IEEE Internet Things J. 2024, 11, 22892–22902. [Google Scholar] [CrossRef]
  26. Dong, J.; Wang, Z.; Wu, J.; Cui, X.; Pei, R. A novel runoff prediction model based on support vector machine and gate recurrent unit with secondary mode decomposition. Water Resour. Manag. 2024, 38, 1655–1674. [Google Scholar] [CrossRef]
  27. Preethi, P.; Mamatha, H.R. Region-based convolutional neural network for segmenting text in epigraphical images. Artif. Intell. Appl. 2023, 1, 119–127. [Google Scholar] [CrossRef]
  28. Zhao, H.; Wu, Y.; Deng, W. An interpretable dynamic inference system based on fuzzy broad learning. IEEE Trans. Instrum. Meas. 2023, 72, 2527412. [Google Scholar] [CrossRef]
  29. Yan, S.; Shao, H.; Wang, J.; Zheng, X.; Liu, B. LiConvFormer: A lightweight fault diagnosis framework using separable multiscale convolution and broadcast self-attention. Expert Syst. Appl. 2024, 237, 121338. [Google Scholar] [CrossRef]
  30. Wang, Z.; Wang, Q.; Liu, Z.; Wu, T. A deep learning interpretable model for river dissolved oxygen multi-step and interval prediction based on multi-source data fusion. J. Hydrol. 2024, 629, 130637. [Google Scholar] [CrossRef]
  31. Li, T.Y.; Shu, X.Y.; Wu, J.; Zheng, Q.X.; Lv, X.; Xu, J.X. Adaptive weighted ensemble clustering via kernel learning and local information preservation. Knowl.-Based Syst. 2024, 294, 111793. [Google Scholar] [CrossRef]
  32. Li, M.; Lv, Z.; Cao, Q.; Gao, J.; Hu, B. Automatic assessment method and device for depression symptom severity based on emotional facial expression and pupil-wave. IEEE Trans. Instrum. Meas. 2024. [Google Scholar]
  33. Li, X.; Zhao, H.; Deng, W. IOFL: Intelligent-optimization-based federated learning for Non-IID data. IEEE Internet Things J. 2024, 11, 16693–16699. [Google Scholar] [CrossRef]
  34. Xu, J.; Li, T.; Zhang, D.; Wu, J. Ensemble clustering via fusing global and local structure information. Expert Syst. Appl. 2024, 237, 121557. [Google Scholar] [CrossRef]
  35. Li, M.; Wang, Y.Q.; Yang, C.; Lu, Z.; Chen, J. Automatic diagnosis of depression based on facial expression information and deep convolutional neural network. IEEE Trans. Comput. Soc. Syst. 2024. [Google Scholar] [CrossRef]
  36. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
  37. Saber, S.; Amin, K.; Pławiak, P.; Tadeusiewicz, R.; Hammad, M. Graph convolutional network with triplet attention learning for person re-identification. Inf. Sci. 2022, 617, 331–345. [Google Scholar] [CrossRef]
  38. Sellars, P.; Aviles-Rivero, A.I.; Schönlieb, C.B. Superpixel contracted graph-based learning for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4180–4193. [Google Scholar] [CrossRef]
  39. Wan, S.; Gong, C.; Zhong, P.; Du, B.; Zhang, L.; Yang, J. Multiscale dynamic graph convolutional network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2019, 58, 3162–3177. [Google Scholar] [CrossRef]
  40. Li, Y.; Xi, B.; Li, J.; Song, R.; Xiao, Y.; Chanussot, J. SGML: A symmetric graph metric learning framework for efficient hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 15, 609–622. [Google Scholar] [CrossRef]
  41. Liu, Q.; Xiao, L.; Yang, J.; Wei, Z. CNN-enhanced graph convolutional network with pixel-and superpixel-level feature fusion for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 8657–8671. [Google Scholar] [CrossRef]
  42. Liu, Q.; Xiao, L.; Yang, J.; Wei, Z. Multilevel superpixel structured graph U-Nets for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–5. [Google Scholar] [CrossRef]
  43. Dong, Y.; Liu, Q.; Du, B.; Zhang, L. Weighted feature fusion of convolutional neural network and graph attention network for hyperspectral image classification. IEEE Trans. Image Process. 2022, 31, 1559–1572. [Google Scholar] [CrossRef]
  44. Ding, Y.; Zhang, Z.; Zhao, X.; Hong, D.; Cai, W.; Yang, N.; Wang, B. Multi-scale receptive fields: Graph attention neural network for hyperspectral image classification. Expert Syst. Appl. 2023, 223, 119858. [Google Scholar] [CrossRef]
  45. Ding, Y.; Zhang, Z.; Zhao, X.; Hong, D.; Cai, W.; Yu, C.; Yang, N.; Cai, W. Multi-feature fusion: Graph neural network and CNN combining for hyperspectral image classification. Neurocomputing 2022, 501, 246–257. [Google Scholar] [CrossRef]
  46. Xue, H.; Sun, X.K.; Sun, W.X. Multi-hop hierarchical graph neural networks. In Proceedings of the 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), Busan, Republic of Korea, 19–22 February 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 82–89. [Google Scholar]
  47. Zhou, H.; Luo, F.; Zhuang, H.; Weng, Z.; Gong, X.; Lin, Z. Attention multi-hop graph and multi-scale convolutional fusion network for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14. [Google Scholar]
  48. Xiao, Y.; Shao, H.; Lin, J.; Huo, Z.; Liu, B. BCE-FL: A secure and privacy-preserving federated learning system for device fault diagnosis under Non-IID Condition in IIoT. IEEE Internet Things J. 2024, 11, 14241–14252. [Google Scholar] [CrossRef]
  49. Tao, S.; Wang, K.; Jin, T.; Wu, Z.; Lei, Z.; Gao, S. Spherical search algorithm with memory-guided population stage-wise control for bound-constrained global optimization problems. Appl. Soft Comput. 2024, 161, 111677. [Google Scholar] [CrossRef]
  50. Song, Y.J.; Han, L.H.; Zhang, B.; Deng, W. A dual-time dual-population multi-objective evolutionary algorithm with application to the portfolio optimization problem. Eng. Appl. Artif. Intell. 2024, 133, 108638. [Google Scholar] [CrossRef]
  51. Li, F.; Chen, J.; Zhou, L.; Kujala, P. Investigation of ice wedge bearing capacity based on an anisotropic beam analogy. Ocean Eng. 2024, 302, 117611. [Google Scholar] [CrossRef]
  52. Chen, H.; Heidari, A.A.; Chen, H.; Wang, M.; Pan, Z.; Gandomi, A.H. Multi-population differential evolution-assisted Harris hawks optimization: Framework and case studies. Future Gener. Comput. Syst. 2020, 111, 175–198. [Google Scholar] [CrossRef]
  53. Zhao, H.; Wang, L.; Zhao, Z.; Deng, W. A new fault diagnosis approach using parameterized time-reassigned multisynchrosqueezing transform for rolling bearings. IEEE Trans. Reliab. 2024. [Google Scholar] [CrossRef]
  54. Xie, P.; Deng, L.; Ma, Y.; Deng, W. EV-Call 120: A new-generation emergency medical service system in China. J. Transl. Intern. Med. 2024, 12, 209–212. [Google Scholar] [CrossRef] [PubMed]
  55. Ahmadianfar, I.; Heidari, A.A.; Gandomi, A.H.; Chu, X.; Chen, H. RUN Beyond the Metaphor: An Efficient Optimization Algorithm Based on Runge Kutta Method. Expert Syst. Appl. 2021, 115079. [Google Scholar] [CrossRef]
  56. Deng, W.; Chen, X.; Li, X.; Zhao, H. Adaptive federated learning with negative inner product aggregation. IEEE Internet Things J. 2023, 11, 6570–6581. [Google Scholar] [CrossRef]
  57. Gao, J.; Wang, Z.; Jin, T.; Cheng, J.; Lei, Z.; Gao, S. Information gain ratio-based subfeature grouping empowers particle swarm optimization for feature selection. Knowl.-Based Syst. 2024, 286, 111380. [Google Scholar] [CrossRef]
  58. Yang, Y.; Chen, H.; Heidari, A.A.; Gandomi, A.H. Hunger games search: Visions, conception, implementation, deep analysis, perspectives, and towards performance shifts. Expert Syst. Appl. 2021, 177, 114864. [Google Scholar] [CrossRef]
  59. Wang, J.; Shao, H.; Peng, Y.; Liu, B. PSparseFormer: Enhancing fault feature extraction based on parallel sparse self-attention and multiscale broadcast feed-forward block. IEEE Internet Things J. 2024, 11, 22982–22991. [Google Scholar] [CrossRef]
  60. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  61. Zhong, Y.; Wang, X.; Xu, Y.; Wang, S.; Jia, T.; Hu, X.; Zhao, J.; Wei, L.; Zhang, L. Mini-UAV-borne hyperspectral remote sensing: From observation and processing to applications. IEEE Geosci. Remote Sens. Mag. 2018, 6, 46–62. [Google Scholar] [CrossRef]
  62. Chen, H.; Wang, T.; Chen, T.; Deng, W. Hyperspectral image classification based on fusing S3-PCA, 2D-SSA and random patch network. Remote Sens. 2023, 15, 3402. [Google Scholar] [CrossRef]
  63. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  64. Yang, J.; Du, B.; Wang, D.; Zhang, L. ITER: Image-to-pixel representation for weakly supervised HSI classification. IEEE Trans. Image Process. 2024, 33, 257–272. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The framework of H2-CHGN model for H2 image classification.
Figure 1. The framework of H2-CHGN model for H2 image classification.
Remotesensing 16 03155 g001
Figure 2. Superpixel and pixel feature conversion process.
Figure 2. Superpixel and pixel feature conversion process.
Remotesensing 16 03155 g002
Figure 3. The procedure for k-hop matrices.
Figure 3. The procedure for k-hop matrices.
Remotesensing 16 03155 g003
Figure 4. Cross-hop graph attention module: (a) pyramid structure by cross-connect feature and (b) graph attention mechanism.
Figure 4. Cross-hop graph attention module: (a) pyramid structure by cross-connect feature and (b) graph attention mechanism.
Remotesensing 16 03155 g004
Figure 5. The structure of ConvBlock.
Figure 5. The structure of ConvBlock.
Remotesensing 16 03155 g005
Figure 6. Convolutional block attention network module: (a) channel attention module and (b) spatial attention module.
Figure 6. Convolutional block attention network module: (a) channel attention module and (b) spatial attention module.
Remotesensing 16 03155 g006
Figure 7. Dual-channel attention fusion module.
Figure 7. Dual-channel attention fusion module.
Remotesensing 16 03155 g007
Figure 8. OAs under a different number of heads and epochs: (a) Pavia University; (b) WHU-Hi-LongKou; and (c) WHU-Hi-HongHu.
Figure 8. OAs under a different number of heads and epochs: (a) Pavia University; (b) WHU-Hi-LongKou; and (c) WHU-Hi-HongHu.
Remotesensing 16 03155 g008
Figure 9. Mean color visualization of superpixel segmented regions on different datasets: (ad): Pavia University; (eh): WHU-Hi-LongKou; and (il): WHU-Hi-HongHu.
Figure 9. Mean color visualization of superpixel segmented regions on different datasets: (ad): Pavia University; (eh): WHU-Hi-LongKou; and (il): WHU-Hi-HongHu.
Remotesensing 16 03155 g009aRemotesensing 16 03155 g009b
Figure 10. Effect of different superpixel segmentation scales on classification accuracy.
Figure 10. Effect of different superpixel segmentation scales on classification accuracy.
Remotesensing 16 03155 g010
Figure 11. Effect of different cross-hopping methods on classification accuracy.
Figure 11. Effect of different cross-hopping methods on classification accuracy.
Remotesensing 16 03155 g011
Figure 12. Classification maps for the Pavia University dataset: (a) False-color image; (b) ground truth; (c) SVM (OA = 79.54%); (d) CEGCN (OA = 97.81%); (e) SGML (OA = 94.30%); (f) WFCG (OA = 97.53%); (g) MSSG-UNet (OA = 98.52%); (h) MS-RPNet (OA = 96.96%); (i) AMGCFN (OA = 98.24%); and (j) H2-CHGN (OA = 99.24%).
Figure 12. Classification maps for the Pavia University dataset: (a) False-color image; (b) ground truth; (c) SVM (OA = 79.54%); (d) CEGCN (OA = 97.81%); (e) SGML (OA = 94.30%); (f) WFCG (OA = 97.53%); (g) MSSG-UNet (OA = 98.52%); (h) MS-RPNet (OA = 96.96%); (i) AMGCFN (OA = 98.24%); and (j) H2-CHGN (OA = 99.24%).
Remotesensing 16 03155 g012
Figure 13. Classification maps for the WHU-Hi-LongKou dataset: (a) False-color image; (b) ground truth; (c) SVM (OA = 92.88%); (d) CEGCN (OA = 98.72%); (e) SGML (OA = 96.03%); (f) WFCG (OA = 98.29%); (g) MSSG-UNet (OA = 98.56%); (h) MS-RPNet (OA = 97.17%); (i) AMGCFN (OA = 98.44%); and (j) H2-CHGN (OA = 99.19%).
Figure 13. Classification maps for the WHU-Hi-LongKou dataset: (a) False-color image; (b) ground truth; (c) SVM (OA = 92.88%); (d) CEGCN (OA = 98.72%); (e) SGML (OA = 96.03%); (f) WFCG (OA = 98.29%); (g) MSSG-UNet (OA = 98.56%); (h) MS-RPNet (OA = 97.17%); (i) AMGCFN (OA = 98.44%); and (j) H2-CHGN (OA = 99.19%).
Remotesensing 16 03155 g013
Figure 14. Classification maps for the WHU-Hi-HongHu dataset: (a) False-color image; (b) ground truth; (c) SVM (OA = 66.34%); (d) CEGCN (OA = 94.01%); (e) SGML (OA = 92.51%); (f) WFCG (OA = 93.98%); (g) MSSG-UNet (OA = 93.73%); (h) MS-RPNet (OA = 93.56%); (i) AMGCFN (OA = 94.44%); and (j) H2-CHGN (OA = 96.60%).
Figure 15. Effect of different numbers of training samples for the methods: (a) Pavia University; (b) WHU-Hi-LongKou; and (c) WHU-Hi-HongHu.
Figure 16. t-SNE results of different methods on three datasets: (a–d) Pavia University; (e–h) WHU-Hi-LongKou; and (i–l) WHU-Hi-HongHu.
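The 2-D embeddings of Figure 16 can be reproduced in spirit with scikit-learn's t-SNE applied to the deep features extracted by each trained network; the file names and hyperparameters below are placeholders, not the authors' settings:

import numpy as np
from sklearn.manifold import TSNE

# features: (n_samples, n_dims) deep features from a trained model (hypothetical file)
# labels:   (n_samples,) ground-truth class indices, used only for coloring the plot
features = np.load("features.npy")
labels = np.load("labels.npy")

embedding = TSNE(n_components=2, perplexity=30, init="pca",
                 random_state=0).fit_transform(features)

# Each row of `embedding` is a 2-D point; scatter-plot it colored by `labels`
# (e.g., with matplotlib) to obtain maps like those in Figure 16.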
Table 1. Information about datasets.
Detailed Information | Pavia University | WHU-Hi-LongKou | WHU-Hi-HongHu
Size (pixels) | 610 × 340 | 550 × 400 | 940 × 475
Bands | 103 | 270 | 270
Spatial resolution (m) | 1.3 | 0.463 | 0.043
Spectral range (nm) | 430–860 | 400–1000 | 400–1000
Sensor | ROSIS | DJI M600 Pro | DJI M600 Pro
Classes | 9 | 9 | 22
Training sample ratio | 0.4% | 0.2% | 0.2%
Validation sample ratio | 0.4% | 0.2% | 0.2%
Testing sample ratio | 99.2% | 99.6% | 99.6%
Table 2. Category information of Pavia University dataset.
Class | Name | Training | Validation | Testing
1 | Asphalt | 27 | 27 | 6577
2 | Meadows | 75 | 75 | 18,499
3 | Gravel | 9 | 9 | 2081
4 | Trees | 13 | 13 | 3038
5 | Metal sheets | 6 | 6 | 1333
6 | Bare soil | 21 | 21 | 4987
7 | Bitumen | 6 | 6 | 1318
8 | Bricks | 15 | 15 | 3652
9 | Shadows | 4 | 4 | 939
Total | | 176 | 176 | 42,424
Table 3. Category information of WHU-Hi-LongKou dataset.
Class | Name | Training | Validation | Testing
1 | Corn | 70 | 70 | 34,371
2 | Cotton | 17 | 17 | 8340
3 | Sesame | 7 | 7 | 3017
4 | Broad-leaf soybean | 127 | 127 | 62,958
5 | Narrow-leaf soybean | 9 | 9 | 4133
6 | Rice | 24 | 24 | 11,806
7 | Water | 135 | 135 | 66,786
8 | Roads and houses | 15 | 15 | 7094
9 | Mixed weed | 11 | 11 | 5207
Total | | 415 | 415 | 203,712
Table 4. Category information of WHU-Hi-HongHu dataset.
Class | Name | Training | Validation | Testing
1 | Red roof | 29 | 29 | 13,983
2 | Road | 8 | 8 | 3496
3 | Bare soil | 44 | 44 | 21,733
4 | Cotton | 327 | 327 | 162,604
5 | Cotton firewood | 13 | 13 | 6192
6 | Rape | 90 | 90 | 44,377
7 | Chinese cabbage | 49 | 49 | 24,005
8 | Pakchoi | 9 | 9 | 4036
9 | Cabbage | 22 | 22 | 10,775
10 | Tuber mustard | 25 | 25 | 12,344
11 | Brassica parachinensis | 23 | 23 | 10,969
12 | Brassica chinensis | 18 | 18 | 8918
13 | Small Brassica chinensis | 46 | 46 | 22,415
14 | Lactuca sativa | 15 | 15 | 7326
15 | Celtuce | 3 | 3 | 996
16 | Film-covered lettuce | 15 | 15 | 7232
17 | Romaine lettuce | 7 | 7 | 2996
18 | Carrot | 7 | 7 | 3203
19 | White radish | 18 | 18 | 8676
20 | Garlic sprout | 7 | 7 | 3472
21 | Broad bean | 3 | 3 | 1322
22 | Tree | 12 | 12 | 5916
Total | | 790 | 790 | 386,986
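The per-class counts in Tables 2–4 amount to a stratified random split at the ratios of Table 1. A minimal sketch of such a split is given below; the exact sampling procedure used by the authors is not specified, so treat this only as a generic pattern:

import numpy as np

def stratified_split(labels: np.ndarray, train_ratio: float, val_ratio: float, seed: int = 0):
    """Randomly split labeled pixel indices per class into train/val/test index arrays."""
    rng = np.random.default_rng(seed)
    train_idx, val_idx, test_idx = [], [], []
    for c in np.unique(labels):
        if c == 0:                       # 0 is commonly the unlabeled/background class
            continue
        idx = rng.permutation(np.flatnonzero(labels == c))
        n_train = max(1, round(train_ratio * idx.size))
        n_val = max(1, round(val_ratio * idx.size))
        train_idx.extend(idx[:n_train])
        val_idx.extend(idx[n_train:n_train + n_val])
        test_idx.extend(idx[n_train + n_val:])
    return np.array(train_idx), np.array(val_idx), np.array(test_idx)

# Hypothetical usage for Pavia University (0.4% train / 0.4% validation, Table 1):
# tr, va, te = stratified_split(gt.ravel(), train_ratio=0.004, val_ratio=0.004)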
Table 5. The architectural details of H2-CHGN.
Module | Details
Superpixel segmentation method | PCA–SLIC
Graph construction | Compute adjacency matrix
Cross-hop graph scheme | 1/3/5-hop
Multi-scale CNN processing | Conv3×3@64, Conv5×5@64
Feature fusion | Dual-channel attention fusion
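To make the "Conv3×3@64, Conv5×5@64" entry concrete, a minimal PyTorch sketch of a dual-kernel block that extracts and concatenates two spatial scales is shown below; the layer depth, normalization, and the way the two paths are fused are assumptions for illustration, not the authors' exact configuration:

import torch
import torch.nn as nn

class DualKernelBlock(nn.Module):
    """Parallel 3x3 and 5x5 convolutions whose outputs are concatenated along channels."""
    def __init__(self, in_ch: int, out_ch: int = 64):
        super().__init__()
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Same spatial size on both paths, so the features can be concatenated pixel-wise.
        return torch.cat([self.branch3(x), self.branch5(x)], dim=1)  # (B, 2*out_ch, H, W)

# Hypothetical usage on a PCA-reduced cube with 30 retained components:
# x = torch.randn(1, 30, 128, 128)
# y = DualKernelBlock(30)(x)   # -> torch.Size([1, 128, 128, 128])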
Table 6. Parameter settings for different datasets.
Dataset | λ | Head_num | Epoch
Pavia University | 250 | 5 | 250
WHU-Hi-LongKou | 300 | 7 | 250
WHU-Hi-HongHu | 300 | 6 | 250
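A minimal sketch of the PCA–SLIC step listed in Table 5, interpreting λ from Table 6 as the target number of superpixels; that interpretation, the number of retained principal components, and the compactness value are assumptions for illustration (scikit-image ≥ 0.19 is assumed for the channel_axis argument):

import numpy as np
from sklearn.decomposition import PCA
from skimage.segmentation import slic

def pca_slic(cube: np.ndarray, n_segments: int, n_components: int = 3) -> np.ndarray:
    """Reduce an H x W x B hyperspectral cube with PCA, then segment it with SLIC."""
    h, w, b = cube.shape
    reduced = PCA(n_components=n_components).fit_transform(cube.reshape(-1, b))
    reduced = reduced.reshape(h, w, n_components)
    # SLIC expects channel values on a comparable scale; min-max normalize first.
    reduced = (reduced - reduced.min()) / (reduced.max() - reduced.min() + 1e-12)
    return slic(reduced, n_segments=n_segments, compactness=0.1,
                channel_axis=-1, start_label=0)

# Hypothetical usage with the λ values of Table 6:
# segments = pca_slic(pavia_cube, n_segments=250)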
Table 7. Classification accuracy of the Pavia University dataset.
Class | SVM | CEGCN | SGML | WFCG | MSSG-UNet | MS-RPNet | AMGCFN | H2-CHGN
1 | 70.32 ± 5.78 | 97.65 ± 2.67 | 97.58 ± 0.21 | 98.93 ± 0.31 | 96.85 ± 2.38 | 96.8 ± 2.32 | 99.74 ± 0.17 | 99.48 ± 0.23
2 | 77.47 ± 2.82 | 98.66 ± 1.53 | 99.72 ± 0.09 | 99.55 ± 0.14 | 99.06 ± 0.37 | 99.64 ± 0.31 | 99.96 ± 0.06 | 99.95 ± 0.07
3 | 73.79 ± 3.31 | 94.18 ± 3.85 | 74.31 ± 14.84 | 99.02 ± 0.42 | 96.30 ± 3.63 | 92.14 ± 4.82 | 88.24 ± 11.43 | 99.52 ± 0.25
4 | 92.70 ± 2.39 | 93.35 ± 3.29 | 91.12 ± 5.85 | 99.65 ± 0.18 | 94.67 ± 2.93 | 90.15 ± 6.94 | 91.44 ± 4.93 | 97.54 ± 2.24
5 | 99.43 ± 0.12 | 99.98 ± 0.05 | 100 ± 0 | 91.24 ± 2.07 | 100 ± 0 | 89.89 ± 8.58 | 99.84 ± 0.14 | 99.25 ± 0.83
6 | 79.72 ± 1.41 | 99.95 ± 0.02 | 85.17 ± 9.73 | 98.52 ± 1.02 | 100 ± 0 | 98.50 ± 1.07 | 100 ± 0 | 100 ± 0
7 | 92.80 ± 2.06 | 99.24 ± 0.23 | 90.40 ± 4.41 | 99.04 ± 0.40 | 100 ± 0 | 96.77 ± 2.15 | 98.10 ± 1.69 | 96.51 ± 1.92
8 | 81.88 ± 4.13 | 97.20 ± 2.72 | 93.86 ± 3.82 | 84.74 ± 9.31 | 99.97 ± 0.36 | 95.59 ± 3.29 | 99.26 ± 0.32 | 98.49 ± 1.40
9 | 99.93 ± 0.12 | 90.46 ± 8.35 | 95.92 ± 3.09 | 100 ± 0 | 99.57 ± 0.40 | 85.54 ± 11.20 | 82.85 ± 13.57 | 91.38 ± 5.31
OA (%) | 79.54 ± 1.87 | 97.81 ± 1.31 | 94.3 ± 1.64 | 97.53 ± 1.02 | 98.52 ± 0.47 | 96.96 ± 1.32 | 98.24 ± 0.79 | 99.24 ± 0.37
AA (%) | 85.34 ± 2.84 | 96.74 ± 1.25 | 92.01 ± 2.32 | 96.74 ± 1.87 | 98.49 ± 0.72 | 93.89 ± 2.78 | 95.49 ± 1.64 | 98.01 ± 1.54
Kappa × 100 | 73.86 ± 2.19 | 97.10 ± 1.16 | 92.53 ± 1.71 | 96.72 ± 1.35 | 98.05 ± 0.39 | 95.95 ± 2.13 | 97.67 ± 1.05 | 98.99 ± 0.78
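OA, AA, and the Kappa coefficient reported in Tables 7–9 are standard confusion-matrix statistics; a minimal sketch of how they are computed from predicted and true labels (class indices 0 … n_classes − 1):

import numpy as np

def classification_metrics(y_true: np.ndarray, y_pred: np.ndarray, n_classes: int):
    """Overall accuracy, average (per-class) accuracy, and Cohen's kappa."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)           # confusion matrix: rows = truth, cols = prediction
    total = cm.sum()
    oa = np.trace(cm) / total
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))   # mean of per-class recalls
    pe = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / total**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa * 100, aa * 100, kappa * 100       # reported as percentages / Kappa × 100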
Table 8. Classification accuracy of the WHU-Hi-LongKou dataset.
Class | SVM | CEGCN | SGML | WFCG | MSSG-UNet | MS-RPNet | AMGCFN | H2-CHGN
1 | 94.24 ± 4.82 | 99.82 ± 0.35 | 99.89 ± 0.16 | 99.87 ± 0.11 | 99.73 ± 0.28 | 99.44 ± 0.41 | 99.61 ± 0.27 | 99.78 ± 0.13
2 | 86.45 ± 8.05 | 98.97 ± 0.41 | 94.67 ± 2.74 | 97.41 ± 2.72 | 99.87 ± 0.12 | 100 ± 0 | 94.86 ± 4.67 | 99.42 ± 0.59
3 | 93.52 ± 2.14 | 91.35 ± 5.86 | 98.96 ± 0.98 | 99.85 ± 0.19 | 88.60 ± 7.47 | 98.79 ± 0.53 | 98.21 ± 1.58 | 99.60 ± 0.38
4 | 86.63 ± 2.58 | 99.56 ± 0.25 | 90.71 ± 5.86 | 97.41 ± 1.67 | 98.87 ± 0.77 | 95.80 ± 2.67 | 99.67 ± 0.24 | 99.78 ± 0.16
5 | 86.77 ± 3.24 | 94.27 ± 4.82 | 98.54 ± 1.21 | 99.06 ± 0.24 | 98.35 ± 0.86 | 98.88 ± 0.74 | 90.65 ± 7.81 | 98.67 ± 1.45
6 | 98.12 ± 2.73 | 98.99 ± 0.73 | 98.88 ± 0.74 | 99.29 ± 0.39 | 99.14 ± 0.41 | 99.17 ± 0.62 | 98.96 ± 0.87 | 99.32 ± 0.61
7 | 99.85 ± 0.10 | 99.98 ± 0.04 | 99.92 ± 0.12 | 99.10 ± 0.22 | 99.96 ± 0.02 | 98.68 ± 0.87 | 99.87 ± 0.11 | 99.97 ± 0.06
8 | 85.42 ± 1.68 | 87.93 ± 9.73 | 88.01 ± 8.65 | 91.46 ± 4.14 | 96.69 ± 2.23 | 97.30 ± 1.64 | 80.44 ± 18.28 | 90.67 ± 9.43
9 | 82.94 ± 2.85 | 86.54 ± 11.70 | 87.76 ± 8.92 | 95.19 ± 3.23 | 74.38 ± 10.78 | 68.00 ± 26.07 | 92.94 ± 5.95 | 89.17 ± 10.85
OA (%) | 92.88 ± 2.23 | 98.72 ± 0.16 | 96.03 ± 1.61 | 98.29 ± 0.37 | 98.56 ± 0.24 | 97.17 ± 0.73 | 98.44 ± 0.12 | 99.19 ± 0.32
AA (%) | 90.44 ± 3.45 | 95.27 ± 1.12 | 95.26 ± 2.32 | 97.63 ± 1.48 | 95.06 ± 0.69 | 95.12 ± 1.60 | 95.02 ± 2.91 | 97.38 ± 0.87
Kappa × 100 | 90.77 ± 2.97 | 98.31 ± 0.66 | 94.83 ± 1.72 | 97.74 ± 1.05 | 98.11 ± 0.34 | 96.28 ± 1.19 | 97.95 ± 1.34 | 98.93 ± 0.45
Table 9. Classification accuracy of the WHU-Hi-HongHu dataset.
Class | SVM | CEGCN | SGML | WFCG | MSSG-UNet | MS-RPNet | AMGCFN | H2-CHGN
1 | 78.86 ± 3.31 | 96.62 ± 1.13 | 94.35 ± 2.37 | 98.53 ± 0.32 | 90.94 ± 7.85 | 90.95 ± 5.23 | 97.08 ± 1.51 | 98.67 ± 1.28
2 | 80.39 ± 2.71 | 82.82 ± 10.57 | 91.93 ± 4.68 | 87.46 ± 6.39 | 94.82 ± 3.86 | 59.73 ± 20.35 | 83.01 ± 11.97 | 92.16 ± 5.72
3 | 71.88 ± 2.14 | 97.32 ± 0.41 | 84.97 ± 12.35 | 93.63 ± 2.31 | 95.44 ± 2.04 | 97.77 ± 1.11 | 93.82 ± 3.91 | 96.23 ± 1.36
4 | 69.76 ± 5.85 | 99.77 ± 0.29 | 94.09 ± 2.81 | 99.46 ± 0.24 | 97.93 ± 1.95 | 98.56 ± 1.09 | 99.53 ± 0.18 | 99.74 ± 0.08
5 | 68.73 ± 6.53 | 91.88 ± 1.27 | 98.29 ± 0.12 | 89.09 ± 6.05 | 98.13 ± 0.61 | 93.46 ± 4.67 | 96.30 ± 2.86 | 82.57 ± 10.71
6 | 78.13 ± 5.71 | 97.75 ± 1.01 | 95.07 ± 1.57 | 97.75 ± 1.21 | 91.49 ± 5.65 | 97.03 ± 1.36 | 98.87 ± 0.65 | 99.35 ± 0.14
7 | 51.94 ± 5.38 | 92.44 ± 2.35 | 89.67 ± 7.98 | 84.00 ± 10.82 | 83.92 ± 10.31 | 87.29 ± 8.56 | 88.39 ± 6.33 | 94.33 ± 3.94
8 | 32.97 ± 13.65 | 35.54 ± 40.12 | 98.86 ± 0.06 | 61.08 ± 28.63 | 99.97 ± 0.05 | 78.62 ± 11.98 | 63.36 ± 32.97 | 91.55 ± 4.79
9 | 87.20 ± 3.37 | 96.19 ± 1.33 | 95.79 ± 2.82 | 98.53 ± 0.39 | 97.76 ± 1.13 | 89.88 ± 7.71 | 93.00 ± 4.86 | 93.52 ± 3.82
10 | 41.43 ± 7.65 | 89.72 ± 6.11 | 91.09 ± 4.34 | 98.60 ± 1.02 | 79.62 ± 12.26 | 92.26 ± 4.25 | 88.47 ± 7.08 | 97.86 ± 1.19
11 | 34.61 ± 11.48 | 77.62 ± 14.55 | 90.61 ± 6.36 | 83.11 ± 10.75 | 94.20 ± 3.24 | 90.35 ± 7.42 | 90.60 ± 6.49 | 93.06 ± 4.56
12 | 55.37 ± 4.67 | 77.22 ± 12.85 | 88.32 ± 7.95 | 82.36 ± 11.54 | 98.05 ± 0.86 | 77.6 ± 12.33 | 73.65 ± 24.07 | 82.81 ± 16.48
13 | 47.00 ± 9.26 | 84.74 ± 6.27 | 78.68 ± 14.07 | 81.10 ± 9.48 | 74.82 ± 18.03 | 87.91 ± 6.78 | 93.21 ± 4.24 | 94.31 ± 3.51
14 | 56.19 ± 8.07 | 88.32 ± 5.39 | 93.09 ± 2.29 | 96.21 ± 1.83 | 98.36 ± 0.61 | 81.07 ± 10.62 | 82.03 ± 14.11 | 92.87 ± 4.79
15 | 76.31 ± 6.43 | 88.42 ± 5.14 | 94.23 ± 2.24 | 97.84 ± 0.39 | 97.87 ± 1.47 | 94.28 ± 4.74 | 36.37 ± 58.86 | 84.57 ± 12.01
16 | 69.37 ± 3.25 | 91.71 ± 2.89 | 97.86 ± 1.03 | 84.70 ± 13.53 | 97.18 ± 1.55 | 97.17 ± 3.03 | 84.06 ± 12.97 | 98.84 ± 0.92
17 | 69.26 ± 2.86 | 75.45 ± 8.65 | 98.56 ± 1.11 | 96.17 ± 4.48 | 96.53 ± 2.03 | 81.28 ± 13.99 | 91.80 ± 6.21 | 66.77 ± 26.44
18 | 64.49 ± 5.85 | 82.24 ± 7.61 | 98.78 ± 0.97 | 92.15 ± 3.88 | 98.28 ± 0.67 | 75.96 ± 18.13 | 84.29 ± 11.34 | 92.73 ± 3.94
19 | 67.98 ± 4.14 | 91.33 ± 2.29 | 90.21 ± 3.55 | 84.76 ± 8.36 | 90.11 ± 6.29 | 79.45 ± 15.27 | 82.08 ± 14.31 | 89.84 ± 6.82
20 | 70.42 ± 3.10 | 88.27 ± 6.11 | 99.53 ± 0.37 | 83.75 ± 8.47 | 99.48 ± 0.25 | 80.27 ± 11.89 | 77.62 ± 19.63 | 97.09 ± 1.20
21 | 65.54 ± 6.32 | 22.51 ± 40.73 | 99.85 ± 0.14 | 70.97 ± 14.31 | 100 ± 0 | 98.94 ± 0.93 | 98.56 ± 0.92 | 96.22 ± 2.06
22 | 69.60 ± 4.83 | 88.22 ± 5.68 | 99.73 ± 0.20 | 87.63 ± 6.22 | 99.80 ± 0.13 | 89.52 ± 5.97 | 99.97 ± 0.01 | 86.53 ± 10.36
OA (%) | 66.34 ± 3.39 | 94.01 ± 1.72 | 92.51 ± 0.77 | 93.98 ± 1.81 | 93.73 ± 1.48 | 93.56 ± 2.63 | 94.44 ± 1.05 | 96.60 ± 0.45
AA (%) | 63.97 ± 5.71 | 83.46 ± 4.93 | 93.79 ± 1.16 | 88.59 ± 3.75 | 94.30 ± 2.95 | 87.24 ± 5.51 | 86.18 ± 3.31 | 91.89 ± 1.64
Kappa × 100 | 60.11 ± 3.12 | 92.41 ± 2.35 | 90.66 ± 0.83 | 94.15 ± 2.19 | 92.12 ± 2.42 | 91.87 ± 1.59 | 92.98 ± 2.70 | 95.77 ± 1.53
Table 10. OA (%) of ablation of each module on three datasets.
Module | (1) | (2) | (3) | (4) | (5)
CGAT | – –
MCNN | – –
Dual-Channel Fusion | – –
Pavia University | 95.62 | 94.17 | 96.81 | 96.54 | 99.24
WHU-Hi-LongKou | 96.87 | 96.15 | 97.64 | 97.32 | 99.19
WHU-Hi-HongHu | 92.94 | 91.86 | 94.72 | 94.48 | 96.60
Table 11. Running time for different methods.
Dataset | Time (s) | AMGCFN | WFCG | CEGCN | MSSG-UNet | H2-CHGN
Pavia University | Train | 59.707 | 64.614 | 47.371 | 57.388 | 57.506
 | Test | 0.013 | 3.017 | 3.639 | 10.166 | 0.033
WHU-Hi-LongKou | Train | 67.368 | 70.469 | 83.005 | 64.105 | 51.828
 | Test | 0.021 | 3.362 | 0.495 | 10.662 | 0.028
WHU-Hi-HongHu | Train | 137.803 | 155.298 | 314.517 | 138.511 | 125.697
 | Test | 0.025 | 7.745 | 29.919 | 21.561 | 0.181
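The times in Table 11 are wall-clock measurements; the hardware, warm-up policy, and whether GPU synchronization was included are not stated here, so the snippet below is only a generic timing pattern:

import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage with placeholder training/inference functions:
# _, train_s = timed(train_model, model, train_loader)
# _, test_s  = timed(run_inference, model, test_pixels)
# When timing GPU code in PyTorch, call torch.cuda.synchronize() before reading the clock.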
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
