1. Introduction
Hyperspectral-high spatial resolution (H2) [1] images have finer spectral and spatial information [2] and can effectively distinguish spectrally similar objects by capturing subtle differences in the continuous shape of the spectral features [3], realizing object recognition at the pixel level [4]. H2 images have been widely used in environmental monitoring, pollution monitoring, geologic exploration, agricultural evaluation, etc. [5]. However, high dimensionality, a small number of labeled samples, spectral variability, and complex noise effects in H2 data [6] make effective feature extraction a great challenge, which leads to poor classification accuracy and low feature recognition accuracy [7].
Traditional machine learning methods depend strongly on a priori knowledge and specialized expertise when extracting features of an HSI, which makes it hard to extract deep features from an HSI [8]. Deep learning methods can better handle nonlinear data and outperform traditional machine learning-based feature extraction methods [9]. Convolutional neural networks (CNNs) [10] are good at extracting effective spectral-spatial features from HSIs. Zhong et al. [11] operated directly on an original HSI and designed an end-to-end spectral-spatial residual network (SSRN) using spectral and spatial residual blocks. Roy et al. [12] constructed a model called HybridSN by concatenating a 2D-CNN and a 3D-CNN to maximize accuracy. However, these methods also exhibit complex structures and high computational requirements [13]. Recently, transformers [14] have performed well in processing HS data due to their self-attention mechanism. Hong et al. [15] proposed a novel network called SpectralFormer, which learns local spectral features from multiple neighboring bands at each coding location. Nonetheless, transformer-based networks, e.g., ViT [16], inevitably experience a degradation in performance when processing HSI data [17]. Johnson et al. [18] proposed a neuro-fuzzy modeling method for constructing a fitness predictive model. Zhao et al. [19] proposed a new ShuffleNet-CA-SSD lightweight network. Li et al. [20] proposed a variational autoencoder GAN for fault classification. Yu Li et al. [21] proposed a distillation-constrained prototype representation network for image classification. Bhosle et al. [22] proposed a deep learning CNN model for digit recognition. Sun et al. [23] proposed a deep learning data generation method for predicting ice resistance. Wang et al. [24] proposed a spatio-temporal deep learning model for predicting streamflow. Shao et al. [25] proposed a task-supervised ANIL for fault classification. Dong et al. [26] designed a hybrid model with a support vector machine and a gated recurrent unit. Dong et al. [27] designed a region-based convolutional neural network for epigraphical images. Zhao et al. [28] designed an interpretable dynamic inference system based on fuzzy broad learning. Yan et al. [29] designed a lightweight framework using separable multiscale convolution and broadcast self-attention. Wang et al. [30] designed a deep learning interpretable model with multi-source data fusion. Li et al. [31] proposed an adaptive weighted ensemble clustering method. Li et al. [32] proposed an automatic assessment method for depression symptoms. Li et al. [33] proposed an optimization-based federated learning method for non-IID data. Xu et al. [34] proposed an ensemble clustering method based on structure information. Li et al. [35] proposed a deep convolutional neural network for automatic diagnosis of depression.
Deep learning-based feature extraction methods for HSIs are widely used because they can better extract the deep features of an image. Traditional convolutional neural networks (CNNs) [10] need to be trained with a massive number of labeled samples and have a high training time complexity. Graph convolutional neural networks (GCNs) [36] can handle arbitrary structural data, adaptively learn parameters according to specific feature types, and optimize these parameters, which improves a model's recognition performance for different feature types compared to traditional CNNs [37]. Hong et al. [3] proposed a miniGCN to reduce computation cost and realize the complementarity between a CNN and a GCN in a small batch. However, pixel-based feature extraction methods generate high-dimensional feature vectors and extract a large amount of redundant information. Therefore, in order to extract features more efficiently, researchers have introduced superpixels in place of pixels. Sellars et al. proposed a semi-supervised method (SGL) [38] that combines superpixels with graphical representations and pure-graph classifiers to greatly reduce the computational overhead. Sheng et al. [39] designed a multi-scale dynamic graph-based network (MDGCN), exploiting multi-scale information with dynamic transformation. To enhance computational efficiency and speed up the training process, Li et al. [40] designed a new symmetric graph metric learning (SGML) model that mitigates spectral variability by introducing a symmetry mechanism. Although the above methods utilize superpixels instead of pixels to reduce computational complexity, they can only generate features at the superpixel level and fail to take into account subtle features within each superpixel during the spectral feature extraction process. Consequently, classification maps generated by GCNs are sensitive to over-smoothing and produce false boundaries between classes [17]. To overcome this problem, Liu et al. [41] proposed a heterogeneous deep network (CEGCN) that utilizes CNNs to complement the superpixel-level features of GCNs by generating local pixel-level features. Moreover, graph encoders and decoders were proposed to solve the data incompatibility between CNNs and GCNs. Based on the above idea, the Multi-layer Superpixel Structured Graph U-Net (MSSG-UNet) [42], which gradually extracts varied-scale features from coarse to fine and performs feature fusion, was developed. Meanwhile, in order to better utilize neighboring nodes and prevent information loss, [36] proposed the graph attention network (GAT), which uses k-nearest neighbors to find the adjacency matrix and compute the weights of different nodes. Dong et al. [43] used weighted feature fusion of a CNN and a GAT. Ding et al. [44] designed a fusion network (MFGCN) [45] and a multi-scale receptive field graph attention neural network (MRGAT). The above methods improved GCNs and enabled a more comprehensive use of spatial and spectral features by reconfiguring the adjacency matrices; however, converting the connections of the nodes may introduce superfluous information and degrade the classification performance. Xue et al. designed a multihop GCN using different branches and different hopping graphs [46], which aggregates multiscale contextual information by hopping. Zhou et al. proposed an attention multi-hop graph and multi-scale convolutional fusion network (AMGCFN) [47]. Xiao et al. [48] proposed a privacy-preserving federated learning system for the IIoT. Tao et al. [49] proposed a memory-guided population stage-wise control spherical search algorithm. However, the above networks are complex in structure and inefficient in model training. Some new methods have been proposed in recent years [50,51,52,53,54,55,56,57,58,59], which can be used to optimize these models.
Aimed at solving the above problems in extracting and classifying features from HSIs, such as the inconsistency of superpixel segmentation processing for similar feature classification, the limitation of the traditional single-hop GCN for node characterization, and the loss of information in joint spatial-spectral classification, this paper proposes a cross-hop graph network model for H2 image classification (H2-CHGN). The model utilizes two branches, a cross-hop graph attention network (CGAT) and a multiscale convolutional neural network (MCNN), to classify H2 images in parallel. Considering the computational complexity of graph construction, the GCN-based feature extraction first refines the graph nodes by means of superpixel segmentation, then constructs the graph convolution by using the cross-hop graph (replacing the ordinary neighbor-hopping operation with interval hopping), which is combined with the multi-head GAT to jointly extract superpixel features and obtain more representative global features. CNN-based feature extraction is performed in parallel to obtain multi-scale pixel-level features using improved CNNs. Finally, the features from both branches are fused using a dual-channel attention mechanism to gain a more comprehensive and enriched feature representation. The contributions of this paper are outlined below:
- (1)
A two-branch neural network (CGAT and MCNN) was designed to extract features from H2 images separately, aiming to fully leverage the collective characteristics of hyperspectral imagery at both the superpixel and pixel levels.
- (2)
A cross-hopping graph operation algorithm was proposed to perform graph convolution operations from near to far, which can better capture the local and global correlation features between spectral features. To better capture multi-scale node features, a pyramid feature extraction structure was used to comprehensively learn multilevel graph structure information.
- (3)
In order to improve the adaptivity of the multilayer graph, a multi-head graph attention mechanism was introduced to portray different aspects of similarity between nodes, thus providing richer feature information.
- (4)
To reduce computational complexity, dual convolutional kernels of different sizes were utilized at different layers of the convolutional neural network to extract pixel-level multiscale features by means of cross connectivity. The features from the two branches were then fused through the dual-channel attention mechanism to gain a more comprehensive and accurate feature representation.
The subsequent content of this article is organized as follows. The H2-CHGN is introduced in Section 2. The experimental settings are depicted in Section 3. In Section 4, extensive experiments and analyses are carried out. Finally, summaries are made in Section 5.
2. Proposed H2-CHGN Framework
We denote the H2 image cube as $\mathbf{X} \in \mathbb{R}^{H \times W \times B}$, where $H$ and $W$ represent the height and width of the spatial dimensions and $B$ represents the number of spectral bands.
In the H2-CHGN model, as shown in Figure 1, the CGAT branch first uses superpixel segmentation on the H2 image to filter out the samples with high spatial correlation with the samples to be classified. The data structure transformation is performed through the encoder (decoder), which facilitates the operations in the graph space. The cross-hop operation is utilized in graph convolution to broaden the range of graph convolution. To increase feature diversity, we use the pyramid feature extraction structure to comprehensively learn multilevel graph structure information. Then, the contextual information is captured by the graph attention mechanism to obtain more representative global features. Meanwhile, to enrich local features, the MCNN branch extracts multi-scale pixel-level features by parallel cross connectivity with dual convolutional kernels (3 × 3 and 5 × 5). Finally, in order to make image elements more prominent, the features of varied scales are passed to the dual-channel attention fusion module to obtain fused features, and the final classification results are obtained through the classification layer.
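To make the overall data flow concrete, the following PyTorch sketch outlines the two-branch forward pass under stated assumptions; the class and argument names (cgat_branch, mcnn_branch, fusion, Q) are illustrative placeholders rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class H2CHGN(nn.Module):
    """Minimal two-branch sketch: a superpixel-graph branch and a multiscale CNN
    branch whose outputs are fused by a dual-channel attention module."""
    def __init__(self, cgat_branch, mcnn_branch, fusion, classifier):
        super().__init__()
        self.cgat = cgat_branch        # operates on superpixel graph nodes
        self.mcnn = mcnn_branch        # operates on the pixel grid
        self.fusion = fusion           # dual-channel attention fusion
        self.classifier = classifier   # final classification layer

    def forward(self, x, node_feats, adj_hops, Q):
        # Graph branch: cross-hop graph attention over superpixel nodes, then
        # decode node features back to the pixel grid via the association matrix Q.
        sp_feats = self.cgat(node_feats, adj_hops)      # (num_superpixels, C)
        pixels_from_graph = Q @ sp_feats                # (H*W, C)
        # Pixel branch: multiscale CNN features on the original cube.
        pixels_from_cnn = self.mcnn(x)                  # (H*W, C)
        fused = self.fusion(pixels_from_graph, pixels_from_cnn)
        return self.classifier(fused)                   # per-pixel class scores
```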
2.1. Graph Construction Process Based on Superpixel Segmentation
In order to apply graph neural networks to an HSI, we need to convert standard Euclidean data like H2 images into graph data. However, if we directly consider the pixels in H2 images as graph nodes, the resulting graph will be very large and the computational complexity extremely high. Thus, we first apply principal component analysis (PCA) to the original HSI to improve the efficiency of the graph construction process. This is followed by a simple linear iterative clustering (SLIC) method, which generates spatially neighboring and spectrally similar superpixels. The adjacency matrix subsequently input into the GCN is constructed by establishing the adjacency relationships between superpixels, as shown in Figure 2.
Specifically, an H2 image is divided into $Z$ superpixels by PCA-SLIC, wherein $Z$ is the superpixel segmentation scale. Let $S = \{S_1, S_2, \ldots, S_Z\}$ denote the superpixel set, with $S_i$ as the $i$-th superpixel, $n_i$ as the number of pixels in $S_i$, and $x_j^i$ as the $j$-th pixel in $S_i$ ($1 \le i \le Z$ and $1 \le j \le n_i$).
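A minimal sketch of this PCA-SLIC step is given below, assuming scikit-learn and scikit-image; the parameter values (number of principal components, number of segments, compactness) are illustrative, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from skimage.segmentation import slic

def build_superpixels(hsi, n_components=3, n_segments=300, compactness=0.1):
    """PCA-SLIC sketch: reduce the spectral dimension with PCA, then run SLIC
    on the reduced cube to obtain spatially contiguous, spectrally similar
    superpixels."""
    h, w, b = hsi.shape
    pcs = PCA(n_components=n_components).fit_transform(hsi.reshape(-1, b))
    pcs = pcs.reshape(h, w, n_components)
    # SLIC returns an integer label map of shape (h, w); labels index superpixels.
    segments = slic(pcs, n_segments=n_segments, compactness=compactness,
                    channel_axis=-1, start_label=0)
    return segments
```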
Since the subsequent output features of the GCN need to be merged with the CNN's, in order to alleviate the data incompatibility between the two networks, we apply the graph encoder and decoder proposed in [41] to perform the data structure conversion. The association matrix of the conversion is $\mathbf{Q} \in \mathbb{R}^{HW \times Z}$, denoted as follows [41]:

$$\mathbf{Q}_{i,j} = \begin{cases} 1, & \text{if } \bar{\mathbf{x}}_i \in S_j \\ 0, & \text{otherwise} \end{cases}$$

where $\bar{\mathbf{X}} = \mathrm{Flatten}(\mathbf{X})$ denotes the expansion of the original data along the spatial dimensions and $\mathbf{Q}_{i,j}$ denotes the value at position $(i, j)$. Then the node feature matrix $\mathbf{V}$ of graph $G$ can be expressed as follows [41]:

$$\mathbf{V} = \mathrm{Encoder}(\mathbf{X}; \mathbf{Q}) = \hat{\mathbf{Q}}^{\mathsf{T}}\,\mathrm{Flatten}(\mathbf{X})$$

where $\hat{\mathbf{Q}}$ is the result of column normalization of $\mathbf{Q}$ and $\mathrm{Encoder}(\cdot)$ represents the process that encodes the pixel map of the HSI into the graph nodes of $G$. The superpixel node features are then mapped back to pixel features using the $\mathrm{Decoder}(\cdot)$ equation as follows [41]:

$$\tilde{\mathbf{X}} = \mathrm{Decoder}(\mathbf{V}; \mathbf{Q}) = \mathbf{Q}\mathbf{V}$$
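The following NumPy sketch illustrates this pixel-to-superpixel encoding and decoding; the function names and the mean aggregation via column normalization follow the description above but are otherwise assumptions.

```python
import numpy as np

def build_association(segments):
    """Build the pixel-to-superpixel association matrix Q (HW x Z):
    Q[i, j] = 1 if flattened pixel i belongs to superpixel j."""
    labels = segments.reshape(-1)
    hw, z = labels.size, labels.max() + 1
    Q = np.zeros((hw, z), dtype=np.float32)
    Q[np.arange(hw), labels] = 1.0
    return Q

def graph_encode(x, Q):
    """Encoder sketch: column-normalize Q and average pixels into superpixel nodes."""
    Q_hat = Q / Q.sum(axis=0, keepdims=True)      # column normalization
    flat = x.reshape(-1, x.shape[-1])             # (HW, B)
    return Q_hat.T @ flat                         # (Z, B) node features

def graph_decode(node_feats, Q, height, width):
    """Decoder sketch: broadcast each superpixel feature back to its pixels."""
    flat = Q @ node_feats                         # (HW, C)
    return flat.reshape(height, width, -1)
```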
2.2. Cross-Hop Graph Attention Convolution Module
It is feasible to obtain more node information by stacking multiple convolutional layers in a GCN model; however, this operation inevitably increases the computational complexity of the network. Merely using a shallow GCN lacks deeper feature information and causes poor classification accuracy. In contrast, the multi-hop graph [46] is more flexible and can fully utilize multi-hop node information to broaden the receptive field and mine the potential relationships between hop nodes.
As shown in Figure 3, the weight matrix between the superpixel nodes is obtained by the superpixel segmentation algorithm and is used to construct the neighbor matrix of the multi-hop graph structure; the concrete steps are as follows:
Step 1. Assuming that the graph node $v_0$ is located at the center, first obtain all the surrounding k-hop neighborhood paths by using a depth-first search (DFS) starting at $v_0$.
Step 2. Calculate the weight of each path, where $v_0, v_1, \ldots, v_k$ represent the nodes in the path. (Note: when different paths connect the same pair of nodes, we select the maximum of the corresponding path weight and the current weight.)
Step 3. Add the current weight matrix to the identity matrix to ensure that each node is connected to itself, obtaining the new k-hop matrices ($\tilde{\mathbf{A}}_1, \tilde{\mathbf{A}}_2, \ldots, \tilde{\mathbf{A}}_K$).
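As shown in the sketch below, a simplified binary variant of these steps can be written with matrix products over the 1-hop superpixel adjacency; it links node pairs connected by a walk of length k and adds self-loops, but omits the DFS path weighting described above (a simplifying assumption).

```python
import numpy as np

def k_hop_matrices(adj, k_max):
    """Sketch: derive k-hop neighbor matrices from a 1-hop superpixel adjacency.
    Two nodes are linked in the k-th matrix if a walk of length k connects them;
    the identity is added so every node stays connected to itself."""
    n = adj.shape[0]
    reach = (adj > 0).astype(np.float32)
    hops = []
    power = reach.copy()
    for _ in range(k_max):
        hop_k = np.clip(power + np.eye(n, dtype=np.float32), 0.0, 1.0)
        hops.append(hop_k)
        power = np.clip(power @ reach, 0.0, 1.0)   # extend walks by one more hop
    return hops   # [A_1, A_2, ..., A_Kmax], each of shape (n, n)
```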
The k-hop neighbor matrix $\tilde{\mathbf{A}}_k$ is then used to compute the graph convolution through the GCN and BN layers as follows:

$$\mathbf{H}_k^{(l+1)} = \sigma\!\left(\mathrm{BN}\!\left(\tilde{\mathbf{D}}_k^{-\frac{1}{2}}\,\tilde{\mathbf{A}}_k\,\tilde{\mathbf{D}}_k^{-\frac{1}{2}}\,\mathbf{H}_k^{(l)}\,\mathbf{W}^{(l)}\right)\right)$$

where $\mathbf{H}_k^{(l+1)}$ denotes the output of the k-hop neighbor matrix $\tilde{\mathbf{A}}_k$ through layer $l$, $\tilde{\mathbf{D}}_k$ is the degree matrix of $\tilde{\mathbf{A}}_k$, and $\mathbf{W}^{(l)}$ indicates the weight matrix. The node features of different hops provide different receptive-field information; as in the structure of Figure 4a, features of different depths are obtained through the pyramid feature extraction structure and spliced together to form the multi-hop representation.
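A minimal PyTorch sketch of this per-hop GCN layer and the pyramid-style splicing is shown below, assuming dense hop matrices; the layer sizes, activation, and placement of BatchNorm are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HopGCNLayer(nn.Module):
    """Sketch of one symmetrically normalized GCN layer with BatchNorm,
    applied to a given k-hop neighbor matrix."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)
        self.bn = nn.BatchNorm1d(out_dim)
        self.act = nn.LeakyReLU()

    def forward(self, h, a_k):
        deg = a_k.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.clamp(min=1e-8).pow(-0.5))
        a_norm = d_inv_sqrt @ a_k @ d_inv_sqrt       # D^-1/2 A D^-1/2
        return self.act(self.bn(a_norm @ self.lin(h)))

def pyramid_multi_hop(h, hop_mats, layers):
    """Run one GCN layer per hop matrix and concatenate (splice) the outputs
    along the feature dimension, pyramid-style."""
    outs = [layer(h, a_k) for layer, a_k in zip(layers, hop_mats)]
    return torch.cat(outs, dim=-1)
```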
To compute the hidden information of each node, as in Figure 4b, a shared self-attention mechanism is applied to compute the attention coefficient $e_{ij}$ between node $i$ and node $j$. Here, first-order attention is carried out by computing coefficients only for the first-order neighbor nodes $j \in \mathcal{N}_i$ of node $i$, where $\mathcal{N}_i$ is the neighborhood of $i$. Then, normalization by the softmax function is executed to make the coefficients more easily comparable across nodes:

$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$$
Then, the corresponding features undergo a linear combination to calculate the node output features. To obtain the node features stably, we apply multiple attention mechanisms to obtain multiple sets of new features, which are spliced in the feature dimension and fed into the final fully connected layer to obtain the final features as follows:

$$\mathbf{h}_i^{\mathrm{out}} = \left(\Big\Vert_{m=1}^{M}\, \sigma\!\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{m}\,\mathbf{W}^{m}\mathbf{h}_j\Big)\right)\mathbf{W}_{o}$$

where $\alpha_{ij}^{m}$ denotes the $m$-th group of attention coefficients; $M$ is the number of heads, which means there are $M$ groups of attention coefficients; and $\Vert$ denotes the splicing operation over the $M$ groups in the feature dimension. $\mathbf{W}_{o} \in \mathbb{R}^{MF \times C}$ denotes the weight matrix of the fully connected layer, where $MF$ indicates the input feature dimension and $C$ denotes the output feature dimension, which equals the number of classes of the hyperspectral images.
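The sketch below illustrates one way to implement this multi-head attention over superpixel nodes with a dense adjacency matrix; the additive-attention parameterization and dimensions are assumptions, and the adjacency is assumed to contain self-loops so every row has at least one valid neighbor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadGAT(nn.Module):
    """Sketch of multi-head graph attention over superpixel nodes.
    Head outputs are concatenated and projected by a final linear layer."""
    def __init__(self, in_dim, head_dim, n_heads, n_classes):
        super().__init__()
        self.heads = n_heads
        self.w = nn.Linear(in_dim, head_dim * n_heads, bias=False)
        self.attn_src = nn.Parameter(torch.randn(n_heads, head_dim))
        self.attn_dst = nn.Parameter(torch.randn(n_heads, head_dim))
        self.out = nn.Linear(head_dim * n_heads, n_classes)

    def forward(self, h, adj):
        n = h.size(0)
        z = self.w(h).view(n, self.heads, -1)                 # (N, M, F)
        # Additive attention scores e_ij split into source and destination terms.
        e = (z * self.attn_src).sum(-1).unsqueeze(1) + \
            (z * self.attn_dst).sum(-1).unsqueeze(0)          # (N, N, M)
        e = F.leaky_relu(e).masked_fill(adj.unsqueeze(-1) == 0, float("-inf"))
        alpha = torch.softmax(e, dim=1)                       # normalize over neighbors
        out = torch.einsum("ijm,jmf->imf", alpha, z)          # weighted aggregation
        return self.out(out.reshape(n, -1))                   # concat heads + FC
```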
2.3. CNN-Based Multiscale Feature Extraction Module
Although traditional 2D-CNNs can extract contextual spatial features, considering the large number of parameters in the convolutional kernels and the limited number of training samples, we prevent overfitting and enhance training efficiency by applying batch normalization (BN) to each convolutional layer unit [47], which is calculated as follows:

$$v_{l,m}^{x,y} = f\!\left(\mathrm{BN}\!\left(\sum_{n}\sum_{p=0}^{P-1}\sum_{q=0}^{Q-1} w_{l,m,n}^{p,q}\, v_{(l-1),n}^{x+p,\,y+q} + b_{l,m}\right)\right)$$

where $P$ and $Q$ denote the size of the convolution kernel and $p$ and $q$ their corresponding indices; $v_{l,m}^{x,y}$ indicates the output of the $m$-th feature map of the $l$-th convolutional layer at position $(x, y)$; and $w$, $b$, and $f(\cdot)$ denote the weight, bias term, and activation function, respectively.
To broaden the receptive field of the network and obtain local features at varied scales, as shown in Figure 5, features are extracted from different-sized convolution kernels in parallel, and the information at different scales is then integrated by cross-path fusion. Because of the high dimensionality of the HSI spectrum, a 1D convolution kernel is first used in the convolution module to remove redundant spectral information and reduce parameter usage. An average pooling layer is then used between the different convolutional layers to reduce the feature space size and prevent overfitting, where the pooling window size, stride, and padding size are set to 3 × 3, 1, and 1, respectively.
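A compact PyTorch sketch of this branch is given below; the 1 × 1 pointwise convolution stands in for the spectral-reduction kernel, and the channel widths and cross-path rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MCNNBranch(nn.Module):
    """Sketch of the multiscale CNN branch: a pointwise convolution compresses the
    spectral dimension, then parallel 3x3 and 5x5 paths extract features at two
    scales and exchange pooled information (cross-path fusion)."""
    def __init__(self, in_bands, mid=64, out=128):
        super().__init__()
        self.reduce = nn.Sequential(            # spectral reduction, fewer parameters
            nn.Conv2d(in_bands, mid, kernel_size=1), nn.BatchNorm2d(mid), nn.LeakyReLU())
        self.path3 = nn.Sequential(
            nn.Conv2d(mid, out, kernel_size=3, padding=1), nn.BatchNorm2d(out), nn.LeakyReLU())
        self.path5 = nn.Sequential(
            nn.Conv2d(mid, out, kernel_size=5, padding=2), nn.BatchNorm2d(out), nn.LeakyReLU())
        self.pool = nn.AvgPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):                        # x: (batch, bands, H, W)
        x = self.reduce(x)
        f3, f5 = self.path3(x), self.path5(x)
        # Cross-path fusion sketch: each scale receives pooled features of the other.
        f3, f5 = f3 + self.pool(f5), f5 + self.pool(f3)
        return torch.cat([f3, f5], dim=1)        # (batch, 2*out, H, W)
```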
2.4. Dual-Channel Attention Fusion Module
To better exploit the channel relationships between hyperspectral pixels, we utilize the Convolutional Block Attention Module (CBAM) [60] displayed in Figure 6 to sequentially infer attention maps along the channel and spatial dimensions independently, multiplying them by the input for adaptive feature refinement.
Suppose the input feature map is $\mathbf{F}$; after the CBAM module there are channel and spatial attention maps $\mathbf{M}_c$ and $\mathbf{M}_s$, and the attention process is expressed as follows:

$$\mathbf{F}' = \mathbf{M}_c(\mathbf{F}) \otimes \mathbf{F}, \qquad \mathbf{F}'' = \mathbf{M}_s(\mathbf{F}') \otimes \mathbf{F}'$$

where $\otimes$ denotes element-wise multiplication.
Feature fusion, inspired by [47], utilizes the cross-attention fusion mechanism shown in Figure 7.
Specifically, the input is first aggregated by global average and max pooling over the spatial dimensions of the feature maps to generate two descriptors, $\mathbf{F}_{\mathrm{avg}}^{c}$ and $\mathbf{F}_{\mathrm{max}}^{c}$. A shared network composed of a multilayer perceptron (MLP) with one hidden layer takes these descriptors as input to produce a channel attention map $\mathbf{M}_c$:

$$\mathbf{M}_c(\mathbf{F}) = \sigma\!\left(\mathbf{W}_1\big(\mathbf{W}_0(\mathbf{F}_{\mathrm{avg}}^{c})\big) + \mathbf{W}_1\big(\mathbf{W}_0(\mathbf{F}_{\mathrm{max}}^{c})\big)\right)$$

where $\mathbf{W}_0 \in \mathbb{R}^{C/r \times C}$ and $\mathbf{W}_1 \in \mathbb{R}^{C \times C/r}$ represent the shared weight parameters of the MLP, $r$ is the reduction ratio in the hidden layer, and $\sigma$ is the sigmoid function. From this, we obtain the channel attention maps corresponding to the previous two branch features, respectively, and multiply them to obtain the crossover coefficient.
After the channel crossing module, the corresponding refined features of the two branches are obtained according to Equations (14) and (15).
Unlike the channel attention operation, we input the two pooled features into a convolutional layer to generate the spatial attention map [47]:

$$\mathbf{M}_s(\mathbf{F}) = \sigma\!\left(f^{k \times k}\big([\mathbf{F}_{\mathrm{avg}}^{s}; \mathbf{F}_{\mathrm{max}}^{s}]\big)\right)$$

where $f^{k \times k}$ is a convolutional layer with kernel size $k \times k$. Similarly, we obtain the spatial weight coefficients of the two branches, multiply them with the input features, and add the residuals to obtain the attended features, respectively.
Eventually, the final features are obtained through the fully connected layer [47] as follows:

$$\mathbf{F}_{\mathrm{out}} = \mathbf{W}_{fc}\,\mathbf{F}_{\mathrm{fuse}} + \mathbf{b}_{fc}$$

where $\mathbf{W}_{fc}$ and $\mathbf{b}_{fc}$ are the weight and bias. The process is summarized in Algorithm 1.
Algorithm 1 Dual-Channel Attention Fusion Algorithm
Input: the feature maps of the CGAT and MCNN branches.
Step 1: Calculate the global average-pooled and max-pooled descriptors of each branch.
Step 2: Calculate the channel weight coefficients of both branches according to Equation (12).
Step 3: Calculate the crossover coefficient by Equation (13).
Step 4: Use the channel crossover module to calculate the channel-refined features of the two branches according to Equations (14) and (15).
Step 5: Similar to Steps 1–4 above, calculate the spatial weight coefficients and the fusion features by Equations (16)–(18).
Step 6: Calculate the final fusion features according to Equation (19).
Output: the fused feature representation.
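The following sketch mirrors Algorithm 1 only loosely and under stated assumptions: CBAM-style channel attention is computed per branch, the maps are crossed between branches with residual connections, and the results are concatenated; the spatial attention stage and the exact crossover rule of Equations (13)–(18) are omitted.

```python
import torch
import torch.nn as nn

class DualChannelAttentionFusion(nn.Module):
    """Sketch of dual-channel attention fusion for two branch feature maps with
    the same channel count; names and the crossing rule are assumptions."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(                       # shared MLP with hidden layer
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels))

    def channel_map(self, f):                           # f: (B, C, H, W)
        avg = self.mlp(f.mean(dim=(2, 3)))              # global average pooling
        mx = self.mlp(f.amax(dim=(2, 3)))               # global max pooling
        return torch.sigmoid(avg + mx)[..., None, None] # (B, C, 1, 1)

    def forward(self, f_graph, f_cnn):
        m_g, m_c = self.channel_map(f_graph), self.channel_map(f_cnn)
        # Cross the channel maps between branches and keep residual connections.
        out_g = f_graph * m_c + f_graph
        out_c = f_cnn * m_g + f_cnn
        return torch.cat([out_g, out_c], dim=1)
```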
3. Experimental Details
To validate the effectiveness and generalization of the H2-CHGN, we chose the following three datasets: a classical HSI dataset (Pavia University) and two H2 image datasets (WHU-Hi-LongKou [1] and WHU-Hi-HongHu [61]). They were captured by different sensors over different types of scenes, providing richer samples. This diversity helps to improve the generalization ability of the model so that it performs well in different scenes. Table 1 lists detailed information about the datasets. Table 2, Table 3 and Table 4 list the training and testing set divisions of the three datasets. The comparison methods are SVM (OA: PU = 79.54%, LK = 92.88%, and HH = 66.34%), CEGCN [41] (OA: PU = 97.81%, LK = 98.72%, and HH = 94.01%), SGML (OA: PU = 94.30%, LK = 96.03%, and HH = 92.51%), WFCG (OA: PU = 97.53%, LK = 98.29%, and HH = 93.98%), MSSG-UNet (OA: PU = 98.52%, LK = 98.56%, and HH = 93.73%), MS-RPNet (OA: PU = 96.96%, LK = 97.17%, and HH = 93.56%), AMGCFN (OA: PU = 98.24%, LK = 98.44%, and HH = 94.44%), and H2-CHGN (OA: PU = 99.24%, LK = 99.19%, and HH = 96.60%). To quantitatively and qualitatively assess the classification performance of the network [62], three evaluation indices are used: overall accuracy (OA), average accuracy (AA), and the kappa coefficient (Kappa). All experimental results are averaged over ten independent runs. The experiments were run with Python 3.7.16 on an i5-8250U CPU and an NVIDIA GeForce RTX 3090 GPU.
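For reference, a small NumPy sketch of how these three indices can be computed from predictions is shown below; it is a generic implementation, not the authors' evaluation code.

```python
import numpy as np

def classification_metrics(y_true, y_pred, n_classes):
    """Sketch of the three reported indices: overall accuracy (OA),
    average accuracy (AA), and the kappa coefficient, via a confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    total = cm.sum()
    oa = np.trace(cm) / total
    # Per-class accuracy, averaged over classes that actually appear in y_true.
    per_class = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)
    aa = per_class[cm.sum(axis=1) > 0].mean()
    # Expected agreement for kappa from row/column marginals.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / (total ** 2)
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa
```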
3.1. Parametric Setting
Table 5 shows the architecture of the H2-CHGN, including the activation function used throughout the network. In addition, the network is trained using the Adam optimizer with a learning rate of 5 × 10−4. The remaining setup parameters, including the superpixel segmentation scale, the number of attention heads, and the number of iterations, are explained in the following sections. We discuss and analyze these parameters through experiments, and the final optimal parameter settings are shown in Table 6.
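As a minimal illustration of this training configuration, the snippet below wires up Adam with a learning rate of 5 × 10−4; the stand-in model, band/class counts, and iteration count are illustrative placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Stand-in model; only the optimizer choice (Adam, lr = 5e-4) comes from the text above.
model = nn.Sequential(nn.Linear(103, 64), nn.LeakyReLU(), nn.Linear(64, 9))
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
criterion = nn.CrossEntropyLoss()

for step in range(3):                        # iteration count is illustrative
    logits = model(torch.randn(32, 103))     # dummy batch of spectral vectors
    loss = criterion(logits, torch.randint(0, 9, (32,)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```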
3.2. Analysis of Multi-Head Attention Mechanism in Graph Attention
By using multiple attention heads, a GAT is able to learn the attention weights between different nodes and combine them to obtain more comprehensive and accurate image features. Notably, the number of heads controls the different node relationships that the model is able to learn. Increasing the number of attention heads allows the model to capture richer node relationships and improves the model's representation, but it also increases the computational complexity and memory overhead and is prone to overfitting. Therefore, as shown in Figure 8, to balance model performance against computational complexity, the number of heads is set to 7, 6, and 5 for the three datasets, respectively.
3.3. Impact of Superpixel Segmentation Scale
The superpixel segmentation scale determines the size of the graph construction area. The larger the superpixel area, the more pixels are contained in each segmented superpixel and the fewer graph nodes have to be constructed. The experiments set the segmentation scale $Z$ to 150, 200, 250, 300, 350, and 400. Regions with different numbers of superpixels are visualized in mean color in Figure 9, where the most representative features of each region can be seen more clearly. The larger the superpixel segmentation scale, the more superpixels in the segmented region, indicating a more detailed segmentation.
Figure 10 demonstrates that as the scale increases, the classification accuracy initially increases but eventually declines, with the WHU-Hi-HongHu dataset experiencing a more significant decrease compared to the other two datasets. This is attributed to its greater complexity, featuring 22 categories and a relatively denser distribution of features. When different varieties of the same crop type are planted in a region, finer segmentation can lead to a higher likelihood of misclassifying various pixel categories into a single node. Additionally, the overall accuracy improves again when the segmentation scale increases from 350 to 400, likely due to the enhanced category separability achieved through more detailed segmentation.
3.4. Cross-Hopping Connection Analysis
Because an HSI contains feature distributions of varying shapes and sizes, the cross-hop mechanism plays a critical role in modeling complex spatial topologies. Precisely, the near-hop graph structure contains fewer nodes, which is good for modeling small feature distributions but makes it hard to learn continuous smooth features on large ones. In contrast to the neighbor-hopping graph structure, in which most of the graph nodes are duplicated, the cross-hopping graph can better model large feature distributions but fails to account for subtle differences. To verify the effectiveness of the cross-hop operation, we repeated the experiment 10 times for three configurations (123-hop for adjacent hops, 124-hop for cross-even hops, and 135-hop for cross-odd hops). As shown in Figure 11, there is no significant effect on the simpler WHU-Hi-LongKou dataset. In contrast, for the other two more complex datasets, the cross-odd 135-hop operation performs better. The reason can be inferred to be that the cross-hop mechanism enables a larger range of graph convolution operations, which can effectively extract a wider range of sample information and therefore performs better.
5. Conclusions
In this paper, a cross-hop graph network (H2-CHGN) model for H2 image classification is proposed to alleviate the spatial heterogeneity and spectral variability of H2 images. It is essentially a hybrid neural network based on a superpixel-based GCN and a pixel-based CNN that extracts spatial and global spectral features. Among them, the cross-hop graph attention network (CGAT) branch widens the range of graph convolution, and the pyramid feature extraction structure is utilized to fuse multilevel features. To better capture the relationships between nodes and contextual information, the H2-CHGN also employs a graph attention mechanism that specializes in graph-structured data. Meanwhile, the multiscale convolutional neural network (MCNN) employs dual convolutional kernels to extract features at different scales and obtains multi-scale localized features at the pixel level by means of cross connectivity. Finally, the dual-channel attention fusion module is used to effectively integrate multi-scale information while strengthening the key features and improving generalization capability. Experimental results verify the validity and generalization of the H2-CHGN. Specifically, the overall accuracy on the three datasets (Pavia University, WHU-Hi-LongKou, and WHU-Hi-HongHu) is as high as 99.24%, 99.19%, and 96.60%, respectively.
Our study has certain limitations, including a large number of model parameters and high computational demands. First, we note that the performance of the H2-CHGN results from superpixel segmentation, cross-hop convolution, and the attention mechanism working together; therefore, suitable parameter matching is essential. In addition, a standard GCN needs to construct the adjacency matrix over all data, whereas the H2-CHGN can reduce the computational cost through cross-hop graph convolution.
In the future, we will explore strategies to enhance the cross-hop graph by employing more advanced techniques, such as transformers. Additionally, we aim to develop lighter-weight networks to reduce complexity while preserving performance.