Article

HGNN-AS: Enhancing Hypergraph Neural Network for Node Classification Accuracy with Attention and Self-Attention

1 College of Computer Science, Hunan University of Technology and Business, Changsha 410205, China
2 Xiangjiang Laboratory, Changsha 410205, China
* Authors to whom correspondence should be addressed.
Electronics 2025, 14(21), 4282; https://doi.org/10.3390/electronics14214282
Submission received: 31 August 2025 / Revised: 20 October 2025 / Accepted: 27 October 2025 / Published: 31 October 2025
(This article belongs to the Special Issue Digital Intelligence Technology and Applications, 2nd Edition)

Abstract

The incorporation of attention mechanisms into hypergraphs is an effective method for obtaining appropriate node and hyperedge representations. However, the ability of attention mechanisms to aggregate features on a hypergraph can be improved, particularly for noisy hypergraphs constructed in a KNN-like manner that contain connections between unrelated nodes. In this paper, we propose HGNN-AS, an enhanced hypergraph neural network model that achieves good accuracy on node classification tasks by combining an attention mechanism and a self-attention mechanism. Specifically, we introduce two self-attention mechanisms that help the HGNN-AS model distinguish mistakenly linked neighbours and assign more appropriate attention weights. Moreover, we add a multihead attention mechanism to our model to stabilize training. The proposed model is evaluated on benchmark node classification tasks, including citation network classification and visual object recognition. Our experimental results demonstrate that our model outperforms most advanced methods on both tasks, with the most noticeable improvement on the Cora dataset, where it achieves an accuracy of 83.9%.

1. Introduction

Hypergraphs are generalized graph structures in which an edge can connect an arbitrary number of nodes to represent higher-order relationships. Compared with general graph structures, hypergraphs can model multivariate relationships more accurately and extract richer information. With these characteristics, hypergraphs have demonstrated broad applicability across various domains, including protein complex modelling in bioinformatics [1], collaborative cyber-attack detection in network security [2], and complex relationship modelling in financial analysis [3].
Recently, hypergraphs have received considerable interest in the field of graph learning. Many works have proposed a series of neural networks for hypergraphs, which are collectively referred to as hypergraph neural networks. Among these methods, HGNN [4] was the first proposed hypergraph neural network model; this model realized information dissemination on hypergraphs through the spectral method. HyperGCN [5] is a semisupervised graph convolution training model based on spectral theory. HyperGAT [6] uses a dual attention mechanism to aggregate node features and hyperedge features in hypergraphs. The approaches used in these models for aggregating node and hyperedge features depend on the hypergraph structure.
However, existing models do not exploit hypergraph information effectively. Hypergraphs are often noisy because hyperedges may include nodes that are semantically or categorically unrelated, which can mislead hypergraph neural networks and result in suboptimal node representations. HyperGAT adopts an attention mechanism to alleviate this issue. Similar to attention over sequential data [7,8,9] and graph data [10], hypergraph attention captures the relational importance within a hypergraph, that is, the significance of each node on a hyperedge and of the same node on different hyperedges. Although HyperGAT improves node classification performance, it cannot suppress noise or extract richer information from the hypergraph. This is a critical issue that should be addressed in the field of hypergraph neural networks. The structure of the hypergraph is another vital factor that affects the node classification accuracy of a hypergraph neural network; thus, how to construct hypergraphs and how to define the relationships between nodes in a hypergraph are fundamental questions. Figure 1 shows a standard construction that builds hyperedges in a manner similar to the k-nearest neighbour (KNN) method and is commonly utilized in citation networks, 3D visual object classification, and other applications. In this method, each node is chosen as the centre node, nodes with high correlations with the centre node are called neighbour nodes, and the centre node together with its neighbour nodes forms a hyperedge. However, hypergraphs constructed in this manner do not explicitly describe the relationships among nodes within the same hyperedge. For example, if one node in a hyperedge does not belong to the same category as the central node while the other nodes do, the noise introduced by that node must be suppressed in the node classification task.
In this paper, we propose HGNN-AS, a hypergraph neural network that combines attention and self-attention mechanisms to improve the effectiveness of HyperGAT on hypergraphs constructed in a KNN-like manner. The self-attention mechanisms employed in the model leverage the node and hyperedge representations together with explicit information about the importance of the relations in the hypergraph. If nodes i and j stand in a centre-neighbour relationship, they are more relevant to each other than to other nodes; if they do not, they carry little importance for each other. Likewise, if node i lies on hyperedge e, the node-hyperedge pair is relevant; if it does not, the pair has very little relevance. Conventional attention is trained without such supervision, but when this prior structural knowledge is available, we can supervise the attention weights with a self-attention mechanism. Specifically, we use two new attention values as inputs to predict the relationships between nodes and between nodes and hyperedges. Combining these two self-attention mechanisms with HyperGAT yields a modest improvement in performance.
We also introduce a multihead attention mechanism, in the same form as in GAT, into our model. Specifically, the multihead attention mechanism applies multiple independent attention heads in the first-layer network to extract node features and concatenates the resulting features before feeding them into the second-layer network to obtain the final results. Multihead attention is an ensemble-style mechanism that helps prevent overfitting and captures multiple types of data features, thereby increasing accuracy.
Our contributions can be summarized as follows:
  • We propose the HGNN-AS model for enhancing the accuracy of node classification tasks on hypergraphs.
  • We design two self-attention mechanisms using the hyperedge information and node information to learn richer information and obtain better representations.
  • We apply the multihead attention mechanism to hypergraphs, which increases the stability of the model and prevents overfitting.
This paper is organized as follows: In Section 2, we introduce related work, including graph neural networks, hypergraph neural networks, and self-attention mechanisms. In Section 3, we present our proposed HGNN-AS model as well as related details. In Section 4, the experiments are discussed and analysed. Section 5 concludes this paper.

2. Related Work

In this section, we briefly review the development history and existing work on graph neural networks, hypergraph neural networks, and self-attention mechanisms.

2.1. Graph Neural Networks

A graph is a simple, abstract, and intuitive mathematical representation of relationships between objects that is widely used in recommender systems, social networks, and other applications. With increasing data volumes, traditional graph algorithms have considerable limitations in addressing critical problems such as node classification and link prediction. The graph neural network (GNN) model considers the scale, heterogeneity, and deep topological information of the input data, among other factors, and achieves convincing and consistent results when mining deep topological information, extracting key features from complex data, and rapidly processing large amounts of data. For example, GNNs have demonstrated broad potential across various application scenarios, including federated learning for text sentiment analysis [11], high-dimensional feature selection and classification optimization [12], dynamic task scheduling in edge computing [13], medical image segmentation and liver lesion analysis [14], vehicle routing optimization and dynamic delivery [15], IoT location privacy protection [16], data trading mechanism design [17], D-IHFS-based uncertain information processing [18], and hesitant fuzzy linguistic term set similarity combined with TOPSIS [19], as well as other specific applications [20,21,22].
In 2005, Gori et al. [23] introduced the concept of graph neural networks based on neural network research and designed a model for processing graph-structured data. Scarselli et al. [24] extended this model in 2009, and other graph neural network models and applications followed. With the growth of graph-structured data in recent years, the number of graph neural network research papers has increased dramatically, and the research directions and application domains of graph neural networks have expanded substantially. In 2017, Kipf et al. [25] proposed graph convolutional networks (GCNs). In 2018, Velickovic et al. [10] proposed graph attention networks (GATs), which introduced the attention mechanism, by then well established in natural language processing, into graph learning. GATs capture entity and relation features as well as semantically similar relationships in multihop neighbourhoods by assigning different weights to the entities in the n-hop neighbourhood.
GCNs, GATs, and other conventional graph neural networks [26,27,28,29] can effectively extract structural and attribute information, providing convenient solutions to node classification tasks. However, owing to the limitations of the graph structure, these models can only represent second-order (pairwise) relations, which restricts the information they can capture to varying degrees. To better represent higher-order relationships between objects, researchers have extended graphs to hypergraphs and proposed hypergraph neural networks to extract higher-order correlation information between nodes.

2.2. Hypergraph Neural Networks

In contrast to graphs, which can model only second-order relationships among data, hypergraphs can represent higher-order relationships among data, allowing network models to extract more complete feature information and better learn graph representations. Hypergraphs have been widely applied in various fields, including spatiotemporal sequence analysis [30], recommendation systems [31], disease prediction [32], IoT anomaly detection and Edge AI [33], as well as air quality prediction frameworks [34].
The pioneering work on hypergraph learning [35] systematically elaborated the fundamental theory and properties of hypergraphs, laying the foundation for subsequent research in this area. Building upon this theoretical framework, Zhou et al. [36] were the first to investigate the use of hypergraphs to represent complex relationships among objects and extended spectral clustering methods to hypergraphs, further proposing additional hypergraph construction techniques based on spectral hypergraph clustering. Subsequently, Zhang et al. [37] proposed a dynamic hypergraph structure learning model, which enhances representational capacity by simultaneously optimizing the label projection matrix and hypergraph structure. Zhang et al. [38] introduced Hypergraph Label Propagation Networks (HLPNs), combining hypergraph-based label propagation with deep neural networks to improve feature embedding. In 2019, Feng et al. [4] proposed Hypergraph Neural Networks (HGNNs), which construct hypergraphs using the k-nearest neighbour (KNN) algorithm and achieved superior performance on classification tasks. To address HGNN’s limitation in dynamically adjusting the hypergraph structure via feature embeddings during training, Dong et al. [39] proposed the HNHN framework for hypergraph representation learning, wherein convolution operations over the incidence matrix update hypernode and hyperedge representations, enabling flexible weighting of high-cardinality hyperedges and high-degree nodes. Further exploring higher-order information extraction, Bai et al. [40] incorporated hypergraph convolution and hypergraph attention mechanisms into graph neural networks to capture higher-order information, demonstrating strong performance in semi-supervised node classification tasks.
Meanwhile, Duta et al. [41] introduced the concept of a cavity layer within hypergraphs, combining linear and nonlinear features to achieve more substantial inductive bias, surpassing traditional methods in node classification tasks. For complex structured data, Chen et al. [42] proposed the HyTrel framework, treating cells as nodes, rows, and columns and entire tables as different types of hyperedges, effectively capturing permutation invariance and hierarchical structure properties and achieving strong performance across multiple downstream tasks. Furthermore, to simultaneously model high-order dependencies among nodes and interactions among hyperedges while facilitating knowledge transfer, Ju et al. proposed the HEAL framework, introducing branch consistency constraints and achieving state-of-the-art performance on multiple real-world graph classification datasets [43].

2.3. Self-Attention Mechanism

The attention mechanism [44], first proposed by Treisman and Gelade, is a model that simulates attention in the human brain: when humans scan an observation target, they focus on specific areas that contain key information while ignoring the remaining, non-essential information. The attention mechanism has been popular in various fields since it was first proposed. Bahdanau et al. [8] first applied the attention mechanism to machine translation. Rush et al. [45] applied it to text summarization, extracting keywords from long sentences and paragraphs. Mnih et al. [46,47] used an attention mechanism for image recognition. Chan [48] used an attention-based encoder-decoder framework in speech recognition to establish the relationship between speech and words. Xu [49] introduced the attention mechanism to image captioning. The attention mechanism has also achieved good results in hypergraph learning: Ding et al. [6] proposed hypergraph attention networks (HyperGAT), which introduced the attention mechanism to hypergraph neural networks.
In 2017, the Google Translate team proposed the self-attention mechanism [9], another kind of attention mechanism. Compared with conventional attention, self-attention focuses on the input itself and extracts relevant information without using additional information. In the traditional attention mechanism, the source and target contents differ, and the outcome is the dependency between each element at the source and the target; in the self-attention mechanism, attention occurs among elements within the source (or within the target) itself. The self-attention mechanism has been used in a variety of graph neural network applications. For example, Kim [50] applied the self-attention mechanism to the graph attention network, which improved the performance of the original network. In recent years, the self-attention mechanism has been widely used in high-dimensional and complex scenarios, including user adoption behaviour analysis in mobile chronic disease management services, indoor wireless positioning optimization, big data service evaluation and optimization, edge server resource allocation, mobile charging, and tasks related to interval type-2 fuzzy sets (IT2 FSs) [51,52,53,54,55].

3. Methodology

In this section, we first review HyperGAT and then present the details of our proposed HGNN-AS.

3.1. HyperGAT for Node Classification

With the increased focus on hypergraph learning and attention mechanisms, HyperGATs have experienced considerable success in representation learning on hypergraph-structured data. HyperGATs obtain new node representations by applying two neighbourhood aggregation functions on hypergraphs. A general HyperGAT layer can be defined as
$$f_j^l = \mathrm{AGGR}^l_{node}\big(\{h_k^{l-1} \mid v_k \in e_j\}\big), \qquad h_i^l = \mathrm{AGGR}^l_{edge}\big(h_i^{l-1}, \{f_j^l \mid e_j \in \mathcal{E}_i\}\big)$$
where $f_j^l$ is the representation of hyperedge $e_j$ in layer $l$, and $h_i^l$ is the representation of node $v_i$ in layer $l$. $\mathrm{AGGR}^l_{node}$ is an aggregation function that combines node features on hyperedges, and $\mathrm{AGGR}^l_{edge}$ is another aggregation function that combines hyperedge features with nodes. $\mathcal{E}_i$ denotes the set of hyperedges connected to node $v_i$.
HyperGAT aggregates node features and hyperedge features with a dual attention mechanism. Although this model obtains good node representations, the attention mechanism used in HyperGAT is insufficient. When hyperedges are noisy or connect irrelevant nodes, the attention mechanism adopted by HyperGAT often fails to assign the correct weight to each node. When aggregating its neighbours, a node should ignore the interference of nodes from other classes as much as possible; with the original attention mechanism, however, this interference is substantial. Another disadvantage of HyperGAT is that the meaning of the hyperedge representation is unclear when hyperedges aggregate node features with different weights, and this meaning depends on how the hypergraph is constructed. When the hypergraph is constructed in a KNN-like manner, we regard the hyperedge representation as the representation of the class closest to the central node; the second aggregation function $\mathrm{AGGR}^l_{edge}$ of the HyperGAT model also expresses this implicit meaning. When there is a significant difference between the node and hyperedge features, the attention mechanism adopted by the aggregation function assigns a small weight to this hyperedge. However, the attention mechanism in HyperGAT remains insufficient and needs improvement. In summary, while the two aggregation functions of HyperGAT obtain exemplary node and hyperedge representations, its attention mechanism can be improved.

3.2. Hypergraph Neural Network with Attention and Self-Attention

3.2.1. Data Preprocessing and Building Hypergraphs

The first step in the HGNN-AS model is constructing the hypergraph. We construct the hypergraph in a KNN-like manner: each node is selected as a centre node, the nodes near the centre node are chosen as its neighbour nodes, and each centre node and its corresponding neighbour nodes form a hyperedge. A hypergraph is defined as $G = (V, E)$, where $V = \{v_1, \ldots, v_n\}$ denotes all nodes in the hypergraph and $E = \{e_1, \ldots, e_m\}$ denotes all hyperedges. It is worth noting that each hyperedge $e$ may contain two or more nodes. Notably, an incidence matrix $A \in \mathbb{R}^{n \times m}$ can also be used to describe the topological structure of a hypergraph $G$. This matrix is defined as follows:
$$A(v, e) = \begin{cases} 1, & \text{if } v \in e \\ 0, & \text{if } v \notin e \end{cases}$$
Typically, each node in the hypergraph has a $d$-dimensional attribute vector. Therefore, the attribute vectors of all nodes can be represented as $X = [X_1, X_2, X_3, \ldots, X_n]^T \in \mathbb{R}^{n \times d}$, and we can represent the whole hypergraph by $G = (A, X)$.
The second step in the HGNN-AS model is adding two new labels derived from the hypergraph structure. The first label assigns a value of 1 to node pairs in which the centre node $v_i$ is connected to the neighbour node $v_j$ and a value of 0 to node pairs without this connection; it is written as $label_1 = [l^1_{11}, l^1_{12}, \ldots, l^1_{ij}, \ldots, l^1_{nn}]$. The second label assigns a value of 1 to node-hyperedge pairs in which node $v_i$ lies on hyperedge $e_j$ and a value of 0 otherwise; it is written as $label_2 = [l^2_{11}, l^2_{12}, \ldots, l^2_{ij}, \ldots, l^2_{nm}]$. The two labels are defined as follows:
$$l^1_{ij} = \begin{cases} 1, & \text{if centre node } v_i \text{ is connected to neighbour node } v_j \\ 0, & \text{otherwise} \end{cases} \qquad l^2_{ij} = \begin{cases} 1, & \text{if } v_i \in e_j \\ 0, & \text{if } v_i \notin e_j \end{cases}$$
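To make this construction concrete, the following minimal NumPy sketch builds the incidence matrix $A$ and dense versions of the two auxiliary labels from a list of hyperedges. It assumes each hyperedge is stored as a (centre node, neighbour list) pair; the function and variable names are ours for illustration and are not taken from the paper's implementation.

```python
import numpy as np

def build_incidence_and_labels(hyperedges, n_nodes):
    """Build the incidence matrix A (n x m) and dense versions of the two
    auxiliary labels from a KNN-style hypergraph.

    hyperedges: list of (centre_idx, [neighbour_idx, ...]) pairs, one per centre node.
    """
    m = len(hyperedges)
    A = np.zeros((n_nodes, m), dtype=np.float32)              # A(v, e) = 1 iff node v lies on hyperedge e
    label1 = np.zeros((n_nodes, n_nodes), dtype=np.float32)   # centre-neighbour pairs (label_1)
    label2 = np.zeros((n_nodes, m), dtype=np.float32)         # node-on-hyperedge pairs (label_2)

    for e_idx, (centre, neighbours) in enumerate(hyperedges):
        for v in [centre] + list(neighbours):
            A[v, e_idx] = 1.0          # node v lies on hyperedge e_idx
            label2[v, e_idx] = 1.0     # second label: v_i is on e_j
        for v in neighbours:
            label1[centre, v] = 1.0    # first label: centre connected to neighbour
            label1[v, centre] = 1.0

    return A, label1, label2

# toy usage: 5 nodes, hyperedge 0 centred on node 0 with neighbours {1, 2}
A, l1, l2 = build_incidence_and_labels([(0, [1, 2]), (3, [2, 4])], n_nodes=5)
```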

3.2.2. Node-Level Attention

The whole process of the HGNN-AS model is shown in Figure 2. The attention mechanism used in the HGNN-AS model is the same as that used in HyperGAT. This mechanism first aggregates node features and then aggregates hyperedge features. The specific process of node-level attention is as follows:
$$f_j^l = \sigma\Big(\sum_{v_k \in e_j} \alpha_{jk} W_1 h_k^{l-1}\Big)$$
where $\sigma$ is the activation function (e.g., ReLU), $W_1$ is a trainable weight matrix, and $h_k^{l-1}$ is the representation of node $v_k$ in layer $l-1$. The attention coefficient of node $v_k$ in hyperedge $e_j$ is denoted by $\alpha_{jk}$ and is calculated as follows:
$$\alpha_{jk} = \frac{\exp(a_1^T u_k)}{\sum_{v_p \in e_j} \exp(a_1^T u_p)}, \qquad u_k = \mathrm{LeakyReLU}(W_1 h_k^{l-1})$$
where $a_1$ is a trainable weight vector.
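The node-level aggregation above can be sketched in a few lines of PyTorch. This is a dense, illustrative implementation (masking non-member nodes with the incidence matrix before the softmax is our implementation choice), not the authors' released code.

```python
import torch
import torch.nn.functional as F

def node_level_attention(H, A, W1, a1):
    """Aggregate node features into hyperedge representations (dense sketch).

    H : (n, d)   node features h_k^{l-1}
    A : (n, m)   incidence matrix, A[v, e] = 1 iff node v lies on hyperedge e
    W1: (d, d')  trainable weight matrix; a1: (d',) trainable attention vector
    """
    U = F.leaky_relu(H @ W1)                               # u_k = LeakyReLU(W1 h_k)
    scores = U @ a1                                        # a1^T u_k, shape (n,)
    logits = scores.unsqueeze(1).expand(-1, A.shape[1])    # one column per hyperedge
    logits = logits.masked_fill(A == 0, float('-inf'))     # keep only nodes on each hyperedge
    alpha = torch.softmax(logits, dim=0)                   # alpha_{jk}, normalised per hyperedge
    F_edges = torch.relu(alpha.t() @ (H @ W1))             # f_j^l = sigma(sum_k alpha_{jk} W1 h_k)
    return F_edges, alpha

# toy usage: 5 nodes on 2 hyperedges, 8-dim inputs, 4-dim hidden features
H = torch.randn(5, 8)
A = torch.tensor([[1., 0.], [1., 0.], [1., 1.], [0., 1.], [0., 1.]])
W1 = torch.randn(8, 4, requires_grad=True)
a1 = torch.randn(4, requires_grad=True)
F_edges, alpha = node_level_attention(H, A, W1, a1)
```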

3.2.3. Node and Node-Level Self-Attention

The hyperedge representations are obtained through the node-level attention process described above. In the final layer of the network, we add a self-attention mechanism that determines whether two nodes are connected to each other, which helps the model learn better hyperedge representations. This self-attention mechanism is defined as follows:
$$S_{NN} = \sigma\big\{\big[(W_1 h_i^{l-1})^T \cdot (W_1 h_j^{l-1})\big] \cdot \big(a_1^T u_i + a_1^T u_j\big)\big\}$$
We assume that $l$ is the last layer of the HGNN-AS model, and $h_i^{l-1}$ and $h_j^{l-1}$ denote the representations of nodes $v_i$ and $v_j$ in layer $l-1$, respectively. The degree of association between nodes $v_i$ and $v_j$ is represented by $S_{NN}$, and $\sigma$ denotes the sigmoid activation function. This self-attention mechanism shares its parameters with the original HyperGAT attention mechanism, so no new parameters are introduced. The goal of the self-attention mechanism at the node level is to better train the weight matrix $W_1$ and weight vector $a_1$. This design was motivated by SuperGAT [50] and the gating mechanism of gated recurrent units [56].
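A minimal sketch of the $S_{NN}$ score for a single node pair is given below, reusing the $W_1$ and $a_1$ defined for node-level attention; the function name and tensor layout are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def node_node_self_attention(H, W1, a1, i, j):
    """S_NN score for nodes i and j, reusing W1 and a1 from node-level attention
    (no extra parameters)."""
    hi, hj = H[i] @ W1, H[j] @ W1                    # W1 h_i^{l-1}, W1 h_j^{l-1}
    ui, uj = F.leaky_relu(hi), F.leaky_relu(hj)      # u_i, u_j as in node-level attention
    score = torch.dot(hi, hj) * (torch.dot(a1, ui) + torch.dot(a1, uj))
    return torch.sigmoid(score)                      # S_NN in (0, 1)
```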

3.2.4. Edge-Level Attention

All hyperedge representations $\{f_j^l \mid e_j \in \mathcal{E}_i\}$ are obtained through node-level attention. To learn the next-layer representation of node $v_i$, we use an edge-level attention mechanism to identify informative hyperedges. This procedure can be formalized as follows:
$$h_i^l = \sigma\Big(\sum_{e_j \in \mathcal{E}_i} \beta_{ij} W_2 f_j^l\Big)$$
where $h_i^l$ is the output representation of node $v_i$, and $W_2$ is a weight matrix. The attention coefficient of hyperedge $e_j$ with respect to node $v_i$ is denoted by $\beta_{ij}$ and is calculated as follows:
$$\beta_{ij} = \frac{\exp(a_2^T u_j)}{\sum_{e_p \in \mathcal{E}_i} \exp(a_2^T u_p)}, \qquad u_j = \mathrm{LeakyReLU}\big([W_2 f_j^l \,\|\, W_1 h_i^{l-1}]\big)$$
where $a_2$ is a weight vector used to calculate the importance of the hyperedges, and $\|$ denotes the concatenation operation.
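Edge-level attention can be sketched analogously. Again this is a dense, illustrative implementation in which the incidence matrix masks hyperedges that do not contain the node; it is not the paper's official code.

```python
import torch
import torch.nn.functional as F

def edge_level_attention(H, F_edges, A, W1, W2, a2):
    """Aggregate hyperedge representations back into node representations (dense sketch).

    H       : (n, d)    previous-layer node features h_i^{l-1}
    F_edges : (m, d')   hyperedge representations f_j^l from node-level attention
    A       : (n, m)    incidence matrix
    W2      : (d', d')  weight matrix; a2: (2*d',) attention vector
    """
    WF = F_edges @ W2                                        # W2 f_j, shape (m, d')
    WH = H @ W1                                              # W1 h_i, shape (n, d')
    n, m = A.shape
    pair = torch.cat([WF.unsqueeze(0).expand(n, -1, -1),     # [W2 f_j || W1 h_i] for every (i, j)
                      WH.unsqueeze(1).expand(-1, m, -1)], dim=-1)
    logits = F.leaky_relu(pair) @ a2                         # a2^T u_{ij}, shape (n, m)
    logits = logits.masked_fill(A == 0, float('-inf'))       # only hyperedges containing node i
    beta = torch.softmax(logits, dim=1)                      # beta_{ij}, normalised per node
    H_next = torch.relu(beta @ WF)                           # h_i^l = sigma(sum_j beta_{ij} W2 f_j)
    return H_next, beta
```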

3.2.5. Node and Hyperedge-Level Self-Attention

The second self-attention mechanism we introduce is called node and hyperedge level self-attention, which judges whether a node is on a hyperedge based on the hyperedge and node representations. The process can be formulated as follows:
$$S_{NE} = \sigma\big\{\big[(W_2 f_j^l)^T \cdot (W_1 h_i^{l-1})\big] \cdot \big(a_2^T [W_2 f_j^l \,\|\, W_1 h_i^{l-1}]\big)\big\}$$
We assume that $l$ is the last layer of the neural network and that $f_j^l$ is the representation of hyperedge $e_j$ in layer $l$. The representation of node $v_i$ in layer $l-1$ is denoted by $h_i^{l-1}$. $S_{NE}$ is the degree of association between the node and the hyperedge, and $\sigma$ denotes the sigmoid activation function. The two self-attention mechanisms are shown in Figure 3. Compared with the original hypergraph attention network, our new self-attention mechanism does not add any new parameters. The goal of the self-attention mechanism at the node and hyperedge levels is to better train the weight matrix $W_2$ and weight vector $a_2$ so that our model learns better node representations. Algorithm 1 shows the pseudocode that summarizes the training procedure.
Algorithm 1 Training process of the HGNN-AS model.
Require: the hypergraph G = (V, E); the node features {h_i, i ∈ V}; the number of attention heads K.
Ensure: the final node representations {h_i^2, i ∈ V}; the degree of association between nodes S_NN; the degree of association between nodes and hyperedges S_NE.
  Initialization
  First layer: aggregate features using the multihead attention mechanism
    for k = 1 to K do
      Aggregate node features to hyperedges: f_j^1 ← AGGR_node(h_i)
      Aggregate hyperedge features to nodes: h_i^1 ← AGGR_edge(f_j^1, h_i)
    end for
  Second layer: aggregate features using the self-attention mechanism
    Aggregate node features to hyperedges: f_j^2 ← AGGR_node(h_i^1)
    S_NN ← node and node-level self-attention
    Aggregate hyperedge features to nodes: h_i^2 ← AGGR_edge(f_j^2, h_i^1)
    S_NE ← node and hyperedge-level self-attention
  return h_i^2, S_NN, S_NE
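For completeness, the node and hyperedge-level self-attention score $S_{NE}$ from Section 3.2.5 can be sketched in the same style as $S_{NN}$, reusing $W_1$, $W_2$, and $a_2$; the helper below is an assumption for illustration only.

```python
import torch

def node_edge_self_attention(H, F_edges, W1, W2, a2, i, j):
    """S_NE score between node i and hyperedge j, reusing W1, W2 and a2
    from the attention layers (no new parameters)."""
    wf = F_edges[j] @ W2                      # W2 f_j^l
    wh = H[i] @ W1                            # W1 h_i^{l-1}
    u = torch.cat([wf, wh])                   # [W2 f_j || W1 h_i]
    score = torch.dot(wf, wh) * torch.dot(a2, u)
    return torch.sigmoid(score)               # S_NE in (0, 1)
```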

3.2.6. Loss Function

Finally, we combine the cross-entropy loss of the node labels ($L_O$), the node and node-level self-attention loss of the final layer ($L_{NN}$), and the node and hyperedge-level self-attention loss of the final layer ($L_{NE}$) to establish the total loss function, which is defined as follows:
$$L = L_O + L_{NN} + L_{NE}, \qquad L_{NN} = \mathrm{BCEWithLogitsLoss}(label_1, S_{NN}), \qquad L_{NE} = \mathrm{BCEWithLogitsLoss}(label_2, S_{NE})$$
For $L_{NN}$ and $L_{NE}$, we use the BCEWithLogitsLoss loss function.
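A minimal sketch of the combined objective is shown below. We assume the model exposes the node logits and the two self-attention scores as raw (pre-sigmoid) logits so that BCEWithLogitsLoss can apply the sigmoid internally; names such as train_mask are hypothetical.

```python
import torch
import torch.nn as nn

ce_loss = nn.CrossEntropyLoss()
bce_loss = nn.BCEWithLogitsLoss()

def total_loss(node_logits, y, train_mask, s_nn_logits, label1, s_ne_logits, label2):
    """Combined objective: supervised node loss plus the two self-attention losses."""
    L_O = ce_loss(node_logits[train_mask], y[train_mask])   # cross-entropy on labelled nodes
    L_NN = bce_loss(s_nn_logits, label1)                    # node and node-level self-attention
    L_NE = bce_loss(s_ne_logits, label2)                    # node and hyperedge-level self-attention
    return L_O + L_NN + L_NE
```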

4. Experiments

In this section, we test our proposed HGNN-AS model on two tasks: citation network classification and visual object recognition. In addition, we compare the proposed method to HyperGAT and other advanced methods.

4.1. Citation Network Classification

Datasets

In this experiment, we classify a citation network dataset. We employ two widely utilized citation network datasets, namely Cora and Pubmed [57]. The experimental design was based on semi-supervised learning with graph embeddings [58] proposed by Yang et al. The nodes in both datasets represent papers, and the edges represent the citation relationships between papers. The node features of the Cora dataset have 1433 dimensions, and the nodes are divided into 7 categories. The node features of the Pubmed dataset have 500 dimensions, and the nodes are divided into 3 categories. We used semi-supervised learning to classify the nodes, and 140/500/1000 nodes in the Cora dataset were used for training/validation/testing. Additionally, 60/500/1000 nodes in the Pubmed dataset were used for training/validation/testing. Specific descriptions of the two datasets are listed in Table 1.
To create the hypergraph structure for HGNN-AS, each node in the graph was chosen in turn as the centre node, and all nodes connected to this centre node were selected as its neighbour nodes. A hyperedge was generated from the centre node and its corresponding neighbour nodes. The hypergraph constructed from these nodes and hyperedges contains the same amount of data as the original graph; it encodes the same structure but provides a different view of it. A minimal sketch of this conversion is given after this paragraph.
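In the sketch below, each node becomes the centre of one hyperedge containing itself and its graph neighbours; the resulting (centre, neighbour list) pairs could then feed the incidence-matrix construction sketched in Section 3.2.1. Function names are ours, not from the paper's code.

```python
def graph_to_hyperedges(edge_list, n_nodes):
    """Turn each node of a citation graph into the centre of one hyperedge
    containing the node itself and all nodes connected to it."""
    neighbours = [set() for _ in range(n_nodes)]
    for u, v in edge_list:
        neighbours[u].add(v)
        neighbours[v].add(u)
    # one hyperedge per centre node: {centre} plus its graph neighbours
    return [(c, sorted(neighbours[c])) for c in range(n_nodes)]

# toy usage: a 4-node graph with citation edges (0,1), (0,2), (2,3)
hyperedges = graph_to_hyperedges([(0, 1), (0, 2), (2, 3)], n_nodes=4)
# hyperedges[0] == (0, [1, 2])  ->  hyperedge {0, 1, 2} centred on node 0
```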

4.2. Experimental Settings

A two-layer HGNN-AS was used in this experiment. The first layer had K = 4 attention heads, each of which computed F = 16 features (for a total of 64 features), and dropout [59] with a drop rate of p = 0.5 was used to prevent overfitting. ReLU was chosen as the nonlinear activation function. The cross-entropy loss was used for the node classification objective, as in the original HyperGAT, and the BCEWithLogitsLoss function was used for the self-attention losses. We used the Adam optimizer with a learning rate of 0.005 during training to minimize the loss. In these experiments, we also compared the proposed HGNN-AS with existing methods.

4.3. Results and Discussion

Table 2 displays the experimental results and comparisons on the citation network datasets. On the Cora and Pubmed datasets, the proposed HGNN-AS model achieved average classification accuracies of 83.9% and 80.3%, respectively, over 200 independent runs. To assess the reliability of the improvement more intuitively, we calculated the corresponding 95% confidence intervals (CIs) over these runs. The results indicate that HGNN-AS outperforms or performs comparably to other state-of-the-art methods. Compared with HyperGAT, HGNN-AS achieves a slight improvement on the Pubmed dataset and approximately a 1.0% gain on the Cora dataset. In future work, we plan to enhance model performance further and validate the statistical significance of the improvements through more robust hypergraph construction and noise suppression strategies. The performance of the model under different dropout rates on the Cora and Pubmed datasets is shown in Figure 4. Notably, HGNN-AS exhibits greater stability than HyperGAT as the dropout rate varies, suggesting that HGNN-AS is less prone to overfitting. In addition, we compared the loss curves of HGNN-AS with those of HyperGAT and HGNN on both datasets, as shown in Figure 5. HGNN-AS achieves the fastest convergence and the lowest final training loss on both datasets, demonstrating that the self-attention mechanism effectively enhances the model's learning efficiency and training stability.

4.4. Visual Object Classification

4.4.1. Datasets

In this experiment, we classified visual objects. To evaluate our method, we used two public benchmark datasets: the Princeton ModelNet40 dataset [60] and the National Taiwan University (NTU) 3D model dataset [61]. Following HGNN, we preprocessed the data with multiview convolutional neural networks (MVCNNs) [62] and group-view convolutional neural networks (GVCNNs) [63]. The ModelNet40 dataset contains 12,311 objects in 40 popular categories, and the same training/testing split as in [60] is used, with 9843 objects for training and 2468 objects for testing. The NTU dataset contains 2012 3D shapes in 67 categories, such as cars, chairs, chess, chips, clocks, cups, doors, frames, pens, and plant leaves. In the NTU dataset, 80% of the data was used for training and the remaining 20% for testing. Each 3D object in both datasets is represented by an extracted feature; to extract the features of each 3D object, we used two methods, the MVCNN and the GVCNN. Specific descriptions of the two datasets are listed in Table 3.
Therefore, each 3D object in the ModelNet40 and NTU datasets is described by an MVCNN feature and a GVCNN feature, along with its label. Unlike the citation network experiments, the ModelNet40 and NTU datasets have no readily available graph structures. We therefore constructed probability graphs based on node distances, following the same procedure used in HGNN models.

4.4.2. Experimental Settings

We used the ModelNet40 and NTU datasets in this experiment. To construct the hypergraph, we used the 10 nodes with the shortest Euclidean distances from the centre node as its neighbour nodes, and we used the hypergraph and object features as inputs. We combined the MVCNN and GVCNN features in different ways to obtain the results of our model when processing single-modality and multimodality features. The data were preprocessed following HGNN.
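The KNN-style construction used here can be sketched as follows, assuming the object features are stacked in a matrix X; the helper name knn_hyperedges is ours, and the distance computation is one of several equivalent choices.

```python
import numpy as np

def knn_hyperedges(X, k=10):
    """For each object, form a hyperedge from the object and its k nearest
    neighbours under Euclidean distance in feature space (e.g., MVCNN/GVCNN features)."""
    sq = (X ** 2).sum(axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)   # pairwise squared distances
    np.fill_diagonal(dist2, np.inf)                       # exclude the centre node itself
    nn_idx = np.argsort(dist2, axis=1)[:, :k]             # k nearest neighbours per node
    return [(i, nn_idx[i].tolist()) for i in range(X.shape[0])]

# toy usage: 100 objects with 32-dimensional features, 10 nearest neighbours each
X = np.random.randn(100, 32).astype(np.float32)
hyperedges = knn_hyperedges(X, k=10)
```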
In this experiment, we used a two-layer hypergraph attention network with a self-attention mechanism. As shown in Table 4, the first layer has two attention heads, and the hidden layer obtained by each attention head has a feature dimension of 128 (for a total of 256 features). We used the Adam optimizer with a learning rate of 0.001 and a dropout rate of 0.3. We trained the model for 1000 epochs with an early-stopping strategy. The cross-entropy loss was used as the node classification loss function, and the BCEWithLogitsLoss function was used as the loss function of the self-attention mechanism.

4.4.3. Results and Discussion

Figure 6 illustrates the feature aggregation behaviour of the HGNN-AS model in modelling node–hyperedge relationships. We visualised the attention weights for samples from the ModelNet40 dataset. It can be clearly observed that, in the first layer, the attention distribution is relatively dispersed, with the model tending to aggregate features from a broader neighbourhood to capture local structural information. In the second layer, the attention gradually focuses on a small number of key nodes and hyperedges, indicating that the model begins to emphasise essential structures. This transition from dispersed to concentrated attention suggests that HGNN-AS can effectively identify critical structures, thereby supporting subsequent feature aggregation and enhancing both the interpretability and robustness of the model.
The experimental results for the two visual object classification tasks are shown in Table 5. HGNN-AS outperformed HyperGAT in all cases on these two datasets for both single-modality and multimodality features; more specifically, our method yields improvements ranging from 0.3% to 0.9% across the different settings. Compared with general graph neural networks, when only one feature is used to generate the graph/hypergraph structure, HGNN-AS achieves a certain level of improvement. However, when both MVCNN and GVCNN features are used to construct the hypergraph, HGNN-AS achieves much better performance than GCN. These results show that HGNN-AS is a hypergraph neural network with good performance that is well suited to processing multimodal data.

5. Conclusions

In this paper, we propose a hypergraph neural network for node classification that combines attention and self-attention mechanisms (HGNN-AS), an improvement on the HyperGAT network. The HGNN-AS model is obtained by adding a multihead attention mechanism and two self-attention mechanisms to the HyperGAT model. The two new self-attention mechanisms yield better node and hyperedge representations, and the multihead attention mechanism stabilises training. To the best of our knowledge, this is the first multihead attention mechanism added to hypergraph neural networks. Our proposed HGNN-AS network targets hypergraphs constructed in a KNN-like manner and achieves higher accuracy on node classification tasks.
Although HGNN-AS achieves considerable progress, some limitations remain to be addressed in the future. (1) Although we provide a new hypergraph neural network model, our analysis of hypergraph attention and of the hypergraph structure is still preliminary. (2) Given the similarities between hypergraphs and heterogeneous graphs in processing multimodal data, we plan to use the self-attention mechanism to learn richer information in heterogeneous graphs.

Author Contributions

Conceptualization, C.L.; Methodology, M.C. and C.L.; Software, C.L.; Validation, C.L., L.H. and R.L.; Formal analysis, L.H. and R.L.; Investigation, L.H. and R.L.; Resources, C.L.; Data curation, M.C., L.H., R.L. and Q.W.; Writing—original draft, C.L., L.H. and R.L.; Writing—review & editing, M.C., D.H. and Q.W.; Visualization, D.H. and Q.W.; Supervision, D.H.; Project administration, D.H.; Funding acquisition, C.L. and D.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Scientific Research Fund of Hunan Provincial Education Department (Grant No. 21A0372), the Young Program of the National Natural Science Foundation of China (Grant No. 62002115), the Key Research and Development Plan of Hunan Province (Grant No. 2021NK2020), the Training Program for Excellent Young Innovators of Changsha (Grant No. kq2107020), and the project Research on Fault Diagnosis and Fault-Tolerant Control Algorithms for Multi-Joint Intelligent Robots (Grant No. 24B0570).

Data Availability Statement

Enquiries about data availability should be directed to the authors.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

  1. Lawson, S.; Donovan, D.; Lefevre, J. An application of node and edge nonlinear hypergraph centrality to a protein complex hypernetwork. PLoS ONE 2024, 19, e0311433. [Google Scholar] [CrossRef]
  2. Yang, Z.; Ma, Z.; Zhao, W.; Li, L.; Gu, F. HRNN: Hypergraph recurrent neural network for network intrusion detection. J. Grid Comput. 2024, 22, 52. [Google Scholar] [CrossRef]
  3. Harit, A.; Sun, Z.; Yu, J.; Moubayed, N.A. Breaking down financial news impact: A novel AI approach with geometric hypergraphs. arXiv 2024, arXiv:2409.00438. [Google Scholar] [CrossRef]
  4. Feng, Y.; You, H.; Zhang, Z.; Ji, R.; Gao, Y. Hypergraph Neural Networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 3558–3565. [Google Scholar]
  5. Yadati, N.; Nimishakavi, M.; Yadav, P.; Nitin, V.; Louis, A.; Talukdar, P. HyperGCN: A new method of training graph convolutional networks on hypergraphs. In Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  6. Ding, K.; Wang, J.; Li, J.; Li, D.; Liu, H. Be More with Less: Hypergraph Attention Networks for Inductive Text Classification. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 4927–4936. [Google Scholar]
  7. Luong, M.T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1412–1421. [Google Scholar]
  8. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
  9. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
  10. Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
  11. Liang, W.; Chen, X.; Huang, S.; Xiong, G.; Yan, K.; Zhou, X. Federated learning edge network-based sentiment analysis combating global COVID-19. Comput. Commun. 2023, 204, 33–42. [Google Scholar] [CrossRef] [PubMed]
  12. Ping, T.; Wang, X.; Wang, Y. Dimensionality reduction in evolutionary algorithms-based feature selection for motor imagery brain-computer interface. Swarm Evol. Comput. 2020, 52, 100597. [Google Scholar]
  13. Ouyang, Y.; Liu, W.; Yang, Q.; Mao, X.; Li, F. Trust based task offloading scheme in UAV-enhanced edge computing network. Peer-Peer Netw. Appl. 2021, 14, 3268–3290. [Google Scholar] [CrossRef]
  14. Shi, C.; Xian, M.; Zhou, X.; Wang, H.; Cheng, H.D. Multi-slice low-rank tensor decomposition based multi-atlas segmentation: Application to automatic pathological liver CT segmentation. Med. Image Anal. 2021, 73, 102152. [Google Scholar] [CrossRef]
  15. Liu, C.; Kou, G.; Zhou, X.; Peng, Y.; Sheng, H.; Alsaadi, F.E. Time-dependent vehicle routing problem with time windows of city logistics with a congestion avoidance approach. Knowl.-Based Syst. 2020, 188, 104813. [Google Scholar] [CrossRef]
  16. Fei, F.; Li, S.; Dai, H.; Hu, C.; Dou, W.; Ni, Q. k-anonymity based schema for location privacy preservation. IEEE Trans. Sustain. Comput. 2017, 4, 156–167. [Google Scholar] [CrossRef]
  17. Li, C.; He, A.; Wen, Y.; Liu, G.; Chronopoulos, A.T. Optimal trading mechanism based on differential privacy protection and Stackelberg game in big data market. IEEE Trans. Serv. Comput. 2023, 16, 3550–3563. [Google Scholar] [CrossRef]
  18. Xihua, L.; Chen, X. D-intuitionistic hesitant fuzzy sets and their application in multiple attribute decision making. Cogn. Comput. 2018, 10, 496–505. [Google Scholar]
  19. Donghai, L.; Liu, Y.; Chen, A. The new similarity measure and distance measure between hesitant fuzzy linguistic term sets and their application in multi-criteria decision making. J. Intell. Fuzzy Syst. 2019, 37, 995–1006. [Google Scholar]
  20. Chen, X.; Xu, G.; Xu, X.; Jiang, H.; Tian, Z.; Ma, T. Multicenter hierarchical federated learning with fault-tolerance mechanisms for resilient edge computing networks. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 47–61. [Google Scholar] [CrossRef] [PubMed]
  21. Xu, X.; Du, Z.; Chen, X.; Cai, C. Confidence consensus-based model for large-scale group decision making: A novel approach to managing non-cooperative behaviors. Inf. Sci. 2019, 477, 410–427. [Google Scholar] [CrossRef]
  22. Zhou, Z.; Cai, Y.; Xiao, Y.; Chen, X.; Zeng, H. The optimization of reverse logistics cost based on value flow analysis—A case study on automobile recycling company in China. J. Intell. Fuzzy Syst. 2018, 34, 807–818. [Google Scholar] [CrossRef]
  23. Gori, M.; Monfardini, G.; Scarselli, F. A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–4 August 2005; pp. 729–734. [Google Scholar]
  24. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2009, 20, 61–80. [Google Scholar] [CrossRef]
  25. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  26. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2016; pp. 3844–3852. [Google Scholar]
  27. Zhuang, C.; Ma, Q. Dual graph convolutional networks for graph-based semi-supervised classification. In Proceedings of the 2018 World Wide Web Conference, International World Wide Web Conferences Steering Committee, Lyon, France, 23–27 April 2018; pp. 499–508. [Google Scholar]
  28. Hu, F.; Zhu, Y.; Wu, S.; Wang, L.; Tan, T. Hierarchical graph convolutional networks for semi-supervised node classification. arXiv 2019, arXiv:1902.06667. [Google Scholar] [CrossRef]
  29. Fu, S.; Liu, W.; Li, S.; Zhang, Y. Two-order graph convolutional networks for semi-supervised classification. IET Image Process. 2019, 13, 2763–2771. [Google Scholar]
  30. Zhan, M.; Kou, G.; Dong, Y.; Chiclana, F.; Herrera-Viedma, E. Bounded confidence evolution of opinions and actions in social networks. IEEE Trans. Cybern. 2021, 52, 7017–7028. [Google Scholar] [CrossRef] [PubMed]
  31. He, L.; Chen, H.; Wang, D.; Shoaib, J.; Yu, P.; Xu, G. Click-through rate prediction with multi-modal hypergraphs. In Proceedings of the 30th ACM International Conference on Information and Knowledge Management, Virtual, 1–5 November 2021. [Google Scholar]
  32. Chen, F.; Park, J.; Park, J. A hypergraph convolutional neural network for molecular properties prediction using functional group. arXiv 2021, arXiv:2106.01028. [Google Scholar]
  33. Zhang, J.; Bhuiyan, M.Z.A.; Yang, X.; Wang, T.; Xu, X.; Hayajneh, T.; Khan, F. AntiConcealer: Reliable detection of adversary concealed behaviors in EdgeAI-assisted IoT. IEEE Internet Things J. 2021, 9, 22184–22193. [Google Scholar] [CrossRef]
  34. Feng, Y.; Yemei, Q.; Shen, Z. Correlation-split and Recombination-sort Interaction Networks for air quality forecasting. Appl. Soft Comput. 2023, 145, 110544. [Google Scholar] [CrossRef]
  35. Berge, C. Graphs and Hypergraphs. In North-Holland Mathematical Library; North-Holland: Amsterdam, The Netherlands, 1973. [Google Scholar]
  36. Zhou, D.; Huang, J.; Schölkopf, B. Learning with hypergraphs: Clustering, classification, and embedding. In Proceedings of the 19th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 4–7 December 2006; MIT Press: Cambridge, MA, USA, 2007; pp. 1601–1608. [Google Scholar]
  37. Zhang, Z.; Lin, H.; Gao, Y. Dynamic hypergraph structure learning. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; AAAI Press: Washington, DC, USA, 2018; pp. 3162–3169. [Google Scholar]
  38. Zhang, Y.; Wang, N.; Chen, Y.; Zou, C.; Wan, H.; Zhao, X.; Gao, Y. Hypergraph label propagation network. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; AAAI Press: Washington, DC, USA, 2020; Volume 34, pp. 6885–6892. [Google Scholar]
  39. Dong, Y.; Sawin, W.; Bengio, Y. HNHN: Hypergraph Networks with Hyperedge Neurons. arXiv 2020, arXiv:2006.12278. [Google Scholar] [CrossRef]
  40. Bai, S.; Zhang, F.; Torr, P.H. Hypergraph convolution and hypergraph attention. Pattern Recognit. 2021, 110, 107637. [Google Scholar] [CrossRef]
  41. Duta, I.; Cassarà, G.; Silvestri, F.; Lió, P. Sheaf hypergraph networks. Adv. Neural Inf. Process. Syst. 2023, 36, 12087–12099. [Google Scholar]
  42. Chen, P.; Sarkar, S.; Lausen, L.; Srinivasan, B.; Zha, S.; Huang, R.; Karypis, G. Hytrel: Hypergraph-enhanced tabular data representation learning. Adv. Neural Inf. Process. Syst. 2023, 36, 32173–32193. [Google Scholar]
  43. Ju, W.; Mao, Z.; Yi, S.; Qin, Y.; Gu, Y.; Xiao, Z.; Wang, Y.; Luo, X.; Zhang, M. Hypergraph-enhanced dual semi-supervised graph classification. arXiv 2024, arXiv:2405.04773. [Google Scholar]
  44. Mnih, V.; Heess, N.; Graves, A.; Kavukcuoglu, K. Recurrent models of visual attention. arXiv 2014, arXiv:1406.6247. [Google Scholar] [CrossRef]
  45. Rush, A.M.; Chopra, S.; Weston, J. A neural attention model for abstractive sentence summarization. arXiv 2015, arXiv:1509.00685. [Google Scholar] [CrossRef]
  46. Mnih, V.; Heess, N.; Graves, A. Recurrent models of visual attention. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2204–2212. [Google Scholar]
  47. Ba, J.; Mnih, V.; Kavukcuoglu, K. Multiple object recognition with visual attention. arXiv 2014, arXiv:1412.7755. [Google Scholar]
  48. Chan, E.; Rappaport, L.A.; Kemper, K.J. Complementary and alternative therapies in childhood attention and hyperactivity problems. J. Dev. Behav. Pediatr. 2003, 24, 4–8. [Google Scholar] [CrossRef] [PubMed]
  49. Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhudinov, R.; Zemel, R.; Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2048–2057. [Google Scholar]
  50. Kim, D.; Oh, A. How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision. arXiv 2022, arXiv:2204.04879. [Google Scholar] [CrossRef]
  51. Zhu, Z.; Liu, Y.; Cao, X.; Dong, W. Factors affecting customer intention to adopt a mobile chronic disease management service: Differentiating age effect from experiential distance perspective. J. Organ. End User Comput. (JOEUC) 2022, 34, 1–23. [Google Scholar] [CrossRef]
  52. Li, X.; Cai, J.; Zhao, R.; Li, C.; He, C.; He, D. Optimizing anchor node deployment for fingerprint localization with low-cost and coarse-grained communication chips. IEEE Internet Things J. 2022, 9, 15297–15311. [Google Scholar] [CrossRef]
  53. Qi, L.; Dou, W.; Hu, C.; Zhou, Y.; Yu, J. A context-aware service evaluation approach over big data for cloud applications. IEEE Trans. Cloud Comput. 2015, 8, 338–348. [Google Scholar] [CrossRef]
  54. Ren, Y.; Liu, A.; Mao, X.; Li, F. An intelligent charging scheme maximizing the utility for rechargeable network in smart city. Pervasive Mob. Comput. 2021, 77, 101457. [Google Scholar] [CrossRef]
  55. Chen, Z.-S.; Yang, Y.; Wang, X.-J.; Chin, K.S.; Tsui, K.L. Fostering linguistic decision-making under uncertainty: A proportional interval type-2 hesitant fuzzy TOPSIS approach based on Hamacher aggregation operators and andness optimization models. Inf. Sci. 2019, 500, 229–258. [Google Scholar] [CrossRef]
  56. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
  57. Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; Eliassi-Rad, T. Collective classification in network data. AI Mag. 2008, 29, 93. [Google Scholar] [CrossRef]
  58. Yang, Z.; Cohen, W.W.; Salakhutdinov, R. Revisiting Semi-Supervised Learning with Graph Embeddings. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016. [Google Scholar]
  59. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
  60. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3D shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1912–1920. [Google Scholar]
  61. Chen, D.; Tian, X.; Shen, Y.; Ouhyoung, M. On visual similarity based 3D model retrieval. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2003; Volume 22, pp. 223–232. [Google Scholar]
  62. Su, H.; Maji, S.; Kalogerakis, E.; Learned-Miller, E. Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 945–953. [Google Scholar]
  63. Feng, Y.; Zhang, Z.; Zhao, X.; Ji, R.; Gao, Y. Gvcnn: Group-view convolutional neural networks for 3D shape recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 264–272. [Google Scholar]
Figure 1. A way to construct hypergraphs similar to KNN. Nodes of different colours represent different categories. The original graph contains the nodes $\{v_1, v_2, v_3, \ldots, v_7, v_8\}$; the nodes connected to $v_4$ are $\{v_1, v_2, v_3, v_5\}$, and the nodes connected to $v_5$ are $\{v_4, v_6, v_7, v_8\}$. When the original graph is converted into a hypergraph, for node $v_4$ we use nodes $\{v_1, v_2, v_3, v_5\}$ as neighbour nodes to generate hyperedge $e_1$; for node $v_5$ we use nodes $\{v_4, v_6, v_7, v_8\}$ as neighbour nodes to generate hyperedge $e_2$.
Figure 2. Hypergraph neural network with attention and self-attention (HGNN-AS).
Figure 3. Two self-attention mechanisms.
Figure 4. Test accuracy by varying the dropout rate.
Figure 5. Training loss curves of HGNN-AS, HyperGAT, and HGNN illustrating faster and more stable convergence of HGNN-AS.
Figure 6. HGNN-AS attention visualization on the ModelNet40 dataset. Colours indicate the magnitude of the attention weights, with warmer colours representing higher attention values and cooler colours representing lower values.
Table 1. Summary statistics of the two citation network datasets.

Dataset     Cora     Pubmed
Nodes       2708     19,717
Edges       5429     44,338
Features    1433     500
Classes     7        3
Table 2. Test accuracy (%) with 95% confidence intervals (CIs) on citation network datasets.

Method          Cora Acc (%)   95% CI       Pubmed Acc (%)   95% CI
GCN [25]        81.5           81.0–82.0    79.0             78.5–79.5
GAT [10]        83.0           82.5–83.5    79.0             78.5–79.5
HGNN [4]        81.6           81.2–82.0    80.1             79.8–80.4
HyperGAT [6]    82.9           82.4–83.4    80.0             79.6–80.4
HGNN-AS         83.9           83.2–84.3    80.3             80.0–80.6
Table 3. Detailed information on the ModelNet40 and NTU datasets.

Dataset          ModelNet40   NTU
Objects          12,311       2012
MVCNN feature    4096         4096
GVCNN feature    2048         2048
Training nodes   9843         1639
Testing nodes    2468         373
Classes          40           67
Table 4. Parameter settings on the ModelNet40 and NTU datasets.

Experimental Parameter   Setting
Attention heads          2
Feature dimension        128
Optimizer                Adam
Learning rate            0.001
Dropout rate             0.3
Epochs                   1000
Table 5. Test accuracy (%) on visual object classification. BOTH means GVCNN + MVCNN, i.e., combining features or structures to generate multimodal data.

Dataset       Feature   Structure   GCN [25]   HGNN [4]   HyperGAT [6]   HGNN-AS
NTU           MVCNN     MVCNN       86.7%      91.0%      90.2%          91.1%
NTU           GVCNN     GVCNN       91.8%      92.6%      92.4%          93.0%
NTU           MVCNN     BOTH        92.3%      96.6%      96.4%          96.7%
NTU           GVCNN     BOTH        92.8%      96.6%      96.7%          97.0%
NTU           BOTH      BOTH        94.4%      96.7%      96.7%          97.0%
ModelNet40    MVCNN     MVCNN       71.3%      75.6%      76.9%          77.4%
ModelNet40    GVCNN     GVCNN       78.8%      82.5%      82.3%          82.6%
ModelNet40    MVCNN     BOTH        73.2%      83.6%      82.8%          83.7%
ModelNet40    GVCNN     BOTH        75.9%      84.2%      83.9%          84.2%
ModelNet40    BOTH      BOTH        76.1%      84.2%      84.0%          84.2%

Share and Cite

Li, C.; Huang, L.; Liu, R.; He, D.; Chen, M.; Wu, Q. HGNN-AS: Enhancing Hypergraph Neural Network for Node Classification Accuracy with Attention and Self-Attention. Electronics 2025, 14, 4282. https://doi.org/10.3390/electronics14214282
