Article

Geometry-V-Sub: An Efficient Graph Attention Network Struct Based Model for Node Classification

The School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2022, 12(14), 7246; https://doi.org/10.3390/app12147246
Submission received: 10 June 2022 / Revised: 8 July 2022 / Accepted: 10 July 2022 / Published: 19 July 2022
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

With the development of deep learning and graph deep learning, network structures have become increasingly complex, and the number of parameters in network models, together with the computing and storage resources they require, keeps growing. Lightweight design and optimization of the network structure reduces the required computing and storage resources, lowers the demands the model places on the computing environment, broadens its scope of application, reduces the energy consumed by computation, and is therefore beneficial to environmental protection. The contribution of this paper is Geometry-V-Sub, a graph learning structure based on spatial geometry that greatly reduces the parameter requirements while losing only a little accuracy. Its number of parameters is only 13.05–16.26% of the baseline model, and its accuracy on Cora, Citeseer and PubMed reaches up to 80.4%, 68% and 81.8%, respectively. When the number of parameters is only 12.01% of the baseline model, the F1 score reaches up to 98.4.

1. Introduction

With the implementation and practice of machine learning in other fields, more and more data research has extended to the representation and learning of graph-structured data. Because of its natural ability to represent logic and relations, the graph structure increasingly appears in various interdisciplinary fields: citation datasets such as Cora [1,2], Citeseer [3] and PubMed [4]; the PPI dataset in the biomedical field [5]; and datasets for social relations and community discovery such as the spammer detection dataset [6] and Reddit [5]. Therefore, a large number of models and algorithms for graph learning have been developed. Among current models, most of those used for graph classification tasks originate from the graph convolutional network (GCN) or its improved versions, for example spectral-based graph convolutional neural networks (such as Spectral CNN [7], ChebNet [8] and 1st ChebNet [9]), spatial-based graph convolutional neural networks (such as message passing neural networks (MPNN) [10], GraphSage [5] and diffusion-convolutional neural networks (DCNN) [11]), and graph attention networks (such as the Graph Attention Network (GAT) [12] and the Gated Attention Network (GAAN) [13]).
The spectrum-based graph convolutional network is an earlier network structure with more research results and a stronger theoretical basis. Spectrum-based graph convolution [7] treats graph-structured data as signals and maps graph data from the spatial domain to the frequency domain for convolution. In the initial research, the computation follows the process of Fourier transform, convolution and inverse Fourier transform [8]. The disadvantage is that it requires substantial computing resources and must operate on the whole graph, so it is not suitable for large-scale graph inference or for directed graphs. In follow-up work, Chebyshev polynomials and 1st ChebNet were used to reduce the computational cost [9]. Because the eigendecomposition of the Laplacian matrix used in the derivation requires the Laplacian matrix to be symmetric, this improvement still cannot perform inference directly on directed graphs. If a directed graph is processed with this method, it must first be converted into an undirected graph, and the direction information is lost; if inference is instead carried out by cutting sub-graphs, a slight change in the graph structure leads to a large difference in the generated Laplacian matrix. In follow-up work [14], however, the authors improved the graph convolutional network so that it can handle directed graphs, which was another step forward for spectrum-based graph research. Other improved spectrum-based graph convolutional networks include the adaptive graph convolutional network (AGCN) [15] and the dual graph convolutional network (DGCN) [16].
To overcome the limitations of spectrum-based graph convolutional networks, the space-based graph convolutional network Neural Network for Graphs (NN4G) [17] was proposed first. It realizes graph convolution by directly accumulating the features of adjacent nodes; because it uses a non-normalized adjacency matrix, the values may be unstable. To make this kind of GCN interpretable, the Contextual Graph Markov Model (CGMM) [18], a probabilistic model based on NN4G, was proposed; its contribution is that CGMM not only maintains spatial locality but also has probabilistic interpretability. The Diffusion-Convolutional Neural Network (DCNN) [11] and Graph Diffusion Convolution (GDC) [19] both propagate using the power series of the transition probability matrix. The final propagated features are related to distance (how many hops separate node A and node B), which can reflect a node's dependence on the importance of remote nodes; the difference is that DCNN concatenates the final results, while GDC sums them. The general framework for space-based graph convolutional networks is Message Passing Neural Networks (MPNN) [10], and frameworks implemented through this mechanism, such as the Deep Graph Library [20], are highly flexible. Because the number of neighbors of a node can vary from one to a thousand or more, obtaining the full set of node neighbors is inefficient, so GraphSage [5] samples a fixed number of neighbors, which improves operational efficiency.
The attention mechanism has achieved many excellent results across tasks since it was proposed [21], and the study of graph-structured data is no exception, so many excellent models and structures exist. The Graph Attention Network (GAT) [12] improves the model's ability to express the graph by splicing node features into edge features, learning edge features with attention vectors, and finally aggregating the results using the implicitly obtained edge weights. The Gated Attention Network (GAAN) [13] introduced the self-attention mechanism to calculate a different weight for each attention head. In addition to applying graph attention in the spatial domain, GeniePath also proposed an LSTM-like gating mechanism to control the information flow across graph convolution layers.
Graph neural networks represented by GCN and GAT have good model structures and experimental results, so developing a new model on this basis is more likely to succeed. However, when we modified the graph neural network represented by GAT, we found that adding parameters did not achieve better results, which led us to analyze the network model. Generally, if a model is to achieve better results, a good set of initialization parameters will greatly improve it; as shown in [22], a model can be greatly improved through good initialization methods. Particle swarm optimization and its improved versions [23,24] can greatly improve artificial neural networks and have been successfully applied in the field of medical health. Another way to improve a model is to make the neural network lightweight.
Lightweighting can be achieved by quantizing the network model. For example, [25] quantizes a convolutional neural network so that it runs on portable devices. In [26], sparse matrix operations are accelerated by compressing data and using FPGA hardware, which not only optimizes the computed data from the perspective of quantization but also, through hardware acceleration, greatly reduces the number of instructions in a general-purpose processor that are not directly related to the operation. Lightweighting can also be achieved through structural innovation and improvement of the network model. For example, [27] optimizes the aggregation of nodes by establishing an orthogonal coordinate system, improving the performance of the network model, and [28] achieves certain results in the game of Go by optimizing the structure of the convolutional neural network.
Driven by the above optimization methods, we decided to design a new lightweight graph neural network model. The newly designed lightweight model, Geometric Vector Subtraction (Geometric-V-Sub), draws on the attention mechanism of GAT and the inspiration of geometric learning in [29]. By exploring the spatial relationships that exist in graph data, it further enhances the expressive ability of the graph neural network, greatly reduces the number of parameters with only a small loss of accuracy, and provides a new network model for subsequent research.

2. Materials and Methods

2.1. Symbols

$h^{x}$: the output of a node after the operation of the x-th layer; x = 0 denotes the input layer and x = out denotes the output layer.
$W$: node feature transformation matrix.
$\|$: concatenation between vectors, $v_{n} \,\|\, v_{m} = v_{n+m}$.
$e_{ij}$: the raw score produced by the attention mechanism, before any subsequent processing.
$w_{ij}$: the processed weight used for information broadcasting.
$m_{i}$: the message obtained by multiplying feature $h_{i}$ with weight $w_{ij}$; it is sent to the target node for the final aggregation operation.

2.2. Reduce Parameters by Modifying GAT

In this part, we try to improve the performance of the model by slightly adjusting the structure in three ways:
  • After the feature transformation, an activation function is added so that the parameters of the feature transformation remain relatively independent of the later processing steps and more node features can be learned.
  • The attention vector is widened by a factor of two or more to enhance the expression of edge information.
  • By scaling the edge vectors spliced from node features, the attention vectors on the edges can learn more hidden features of the edges.
When dealing with graph-structured data, it is often necessary to map node information from a sparse representation space to a dense representation space, which facilitates the subsequent steps and reduces the scale of the model; most models in graph learning include this step. The mapping can be completed by a matrix transformation: the transformed feature $h^{1}$ is obtained by transforming the input $h^{0}$ with the matrix $W$, as shown in Equation (1):
$h^{1} = W h^{0}$ (1)
To match the attention matrix of size $2 \times n$, the input feature $h^{1}$ is expanded by copying into $h^{2}$, whose feature matrix has size $2 \times n$, as shown in Equation (2):
$h^{2} = \mathrm{expand}(h^{1})$ (2)
To fully characterize the transformed nodes, LeakyReLU is used as an activation function to process $h^{2}$. The purpose is to use the activation function to isolate the feature transformation from the edge attention and thus enhance generalization, as shown in Equation (3):
$h^{3} = \mathrm{LeakyReLU}(h^{2})$ (3)
As in GAT, the feature representation of an edge $ij$ is obtained by splicing the processed features of node $i$ and node $j$, and the attention mechanism is applied for edge learning. An activation function $\sigma$ is applied to the result to increase the generalization ability of the network; here LeakyReLU with negative slope $\alpha = 0.2$ is selected, as shown in Equation (4):
$e_{ij} = \sigma\big(a\,(h_{i}^{2} \,\|\, h_{j}^{2})\big)$ (4)
To obtain the message-passing weight $w_{ij}$ in probability form, the softmax function is used to normalize the attention over each node's edges, as shown in Equation (5):
$w_{ij} = \mathrm{softmax}_{j}(e_{ij})$ (5)
To fit $h^{1}$, which has size $1 \times n$, $w_{ij}$ is summed along the column direction, converting it from a $2 \times n$ matrix into a $1 \times n$ vector. By multiplying the message-passing weight $w_{ij}$ with $h_{i}^{1}$, the transmitted message $m_{i}$ is obtained, as shown in Equation (6):
$m_{i} = h_{i}^{1} w_{ij}$ (6)
After the message-passing step, the messages are aggregated through Equation (7) to give the output features $h^{out}$ of the current layer:
$h^{out} = \sum_{i} m_{i}$ (7)
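To make the processing pipeline of Equations (1)–(7) concrete, the sketch below writes the modified GAT layer as a single DGL/PyTorch module. It is a minimal illustration under our own assumptions, not the authors' released code: the class name GATAFXLayer and the hyperparameter names are placeholders, and the attention-widening of Equations (2) and (6) is folded into a single per-head attention vector for brevity.

```python
# Hypothetical sketch of the GAT-A-FX layer (Equations (1)-(7)); names are illustrative.
import torch
import torch.nn as nn
import dgl.function as fn
from dgl.nn.functional import edge_softmax


class GATAFXLayer(nn.Module):
    def __init__(self, in_feats, out_feats, num_heads=8, negative_slope=0.2):
        super().__init__()
        self.num_heads = num_heads
        self.out_feats = out_feats
        # Eq. (1): feature transformation matrix W
        self.fc = nn.Linear(in_feats, num_heads * out_feats, bias=False)
        # Attention vector a applied to the concatenation (h_i || h_j)
        self.attn = nn.Parameter(torch.empty(num_heads, 2 * out_feats))
        nn.init.xavier_uniform_(self.attn)
        self.leaky_relu = nn.LeakyReLU(negative_slope)

    def forward(self, graph, h0):
        with graph.local_scope():
            # Eq. (1)-(3): transform, then activate so the transformation
            # stays decoupled from the later edge-attention step
            h1 = self.fc(h0).view(-1, self.num_heads, self.out_feats)
            h3 = self.leaky_relu(h1)
            graph.ndata["h1"] = h1
            graph.ndata["h3"] = h3

            # Eq. (4): e_ij = LeakyReLU(a . (h_i || h_j)) for every edge and head
            def edge_attention(edges):
                cat = torch.cat([edges.src["h3"], edges.dst["h3"]], dim=-1)
                e = self.leaky_relu((cat * self.attn).sum(dim=-1, keepdim=True))
                return {"e": e}

            graph.apply_edges(edge_attention)
            # Eq. (5): softmax over each node's incoming edges
            graph.edata["w"] = edge_softmax(graph, graph.edata["e"])
            # Eq. (6)-(7): weighted messages, summed at the destination node
            graph.update_all(fn.u_mul_e("h1", "w", "m"), fn.sum("m", "h_out"))
            return graph.ndata["h_out"]
```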

2.3. Geometric Vector Subtraction (Geometric-V-Sub) for Edge Feature Enhancement

In order to further explore the role of edges in graph neural networks, Geometric-V-Sub is designed according to the spatial structure characteristics of the graph itself. An overview of the network structure is shown in Figure 1. The structure has strong feature-extraction ability, and its parameter count is 6% to 13% of that of the GAT module. A network model based on this structure can greatly reduce the number of parameters, with very little loss of accuracy, by sharing the feature transformation matrix $W^{1}$ and exploiting the spatial relationships between the features of each node. The input of the Geometric-V-Sub module also needs dimensionality reduction, as shown in Equation (8):
$h^{1} = \mathrm{LeakyReLU}(W^{1} h^{0})$ (8)
In geometry, the edge between vertex A and vertex B can be represented by the vector AB. This vector encodes the connection between the two nodes: vertex B can be derived from vertex A through this vector, and vice versa. Based on this principle, Geometric-V-Sub obtains the relationships between nodes in the graph data. The network model takes node features $h^{x}$ as coordinates, where $x$ denotes the layer in the structure, and uses as the edge feature the vector AB given by the difference between the features $h_{a}^{x}$ of node Na and the features $h_{b}^{x}$ of node Nb, as shown in Equation (9):
$e_{ij}^{0} = h_{i}^{1} - h_{j}^{1}$ (9)
The vector $e_{ij}^{0}$ is copied as $e_{ij}^{1k}$, and the two copies are multiplied by the attention vectors $a^{1k}$ to obtain the propagation weights $w^{1k}$, where $k$ is 1 and 2. In this step, the redundancy of edge information is increased through multiple edge attentions, reducing the risk of underfitting. The process is shown in Equation (10):
$w_{ij}^{1k} = a^{1k} e_{ij}^{1k}$ (10)
For the node classification task, expressing the message-passing weight as a probability gives better results and reduces the risk of overfitting. The edge normalization here, as in GAT, uses an edge-based softmax to normalize the edge fitting results, as shown in Equation (11):
$w_{ij}^{2k} = \mathrm{softmax}_{j}(w_{ij}^{1k})$ (11)
To further enhance the expression and generalization of the nodes, a hidden layer is added to fit the node features. Considering that using ReLU directly would lose information in the negative direction, this layer adopts LeakyReLU with a negative slope of 0.2 as its activation function, as shown in Equation (12):
$h^{2} = \mathrm{LeakyReLU}(W^{2} h^{1})$ (12)
The calculated $h^{2}$ is used for message passing and the final aggregation of node features. The aggregation process is shown in Equation (13):
$h_{i}^{3} = \sum_{l=0}^{k} \sum_{j} w_{ij}^{2k} h_{ij}^{2}$ (13)
Given that attention achieves good results in many models and that increasing network depth helps the generalization of the results, this network model also processes the edges with attention and adds an attention mechanism to the final aggregated result, so that the transformed features of edges and nodes can be better expressed. The enhancement process is shown in Equation (14):
$h^{4} = a^{2} h^{3}$ (14)
If the network needs to be deep, residual links become unavoidable. For this purpose, $h^{1}$ only needs to be processed by an attention vector so that it becomes a vector with the same size as $h^{4}$, as shown in Equation (15):
$h^{5} = h^{4} + a^{3} h^{1}$ (15)
After the feature processing of the above steps, the final output is given by Equation (16):
$h^{out} = \begin{cases} h^{5}, & \text{residual link required} \\ h^{4}, & \text{otherwise} \end{cases}$ (16)
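As a reading aid, Equations (8)–(16) can likewise be sketched as one layer. This is a hedged reconstruction under our own assumptions (the class name GeometricVSubLayer, k = 2 edge-attention copies, and element-wise use of the attention vectors $a^{2}$ and $a^{3}$ are all illustrative), not the released implementation.

```python
# Hypothetical sketch of a Geometric-V-Sub layer (Equations (8)-(16)); names are illustrative.
import torch
import torch.nn as nn
import dgl.function as fn
from dgl.nn.functional import edge_softmax


class GeometricVSubLayer(nn.Module):
    def __init__(self, in_feats, hid_feats, num_copies=2,
                 residual=False, negative_slope=0.2):
        super().__init__()
        self.k = num_copies
        self.residual = residual
        self.w1 = nn.Linear(in_feats, hid_feats, bias=False)   # Eq. (8)
        self.w2 = nn.Linear(hid_feats, hid_feats, bias=False)  # Eq. (12)
        # k edge-attention vectors a^{1k} and the node attentions a^2, a^3
        self.a1 = nn.Parameter(torch.empty(num_copies, hid_feats))
        self.a2 = nn.Parameter(torch.empty(hid_feats))
        self.a3 = nn.Parameter(torch.empty(hid_feats))
        for p in (self.a1, self.a2, self.a3):
            nn.init.normal_(p, std=0.1)
        self.act = nn.LeakyReLU(negative_slope)

    def forward(self, graph, h0):
        with graph.local_scope():
            h1 = self.act(self.w1(h0))                  # Eq. (8)
            h2 = self.act(self.w2(h1))                  # Eq. (12)
            graph.ndata["h1"] = h1

            # Eq. (9): edge feature by vector subtraction, e_ij = h_i - h_j
            graph.apply_edges(fn.u_sub_v("h1", "h1", "e0"))
            e0 = graph.edata["e0"]                              # (E, hid_feats)
            # Eq. (10): score every copy of the edge vector with a^{1k}
            w1 = (e0.unsqueeze(1) * self.a1).sum(dim=-1)        # (E, k)
            # Eq. (11): per-destination softmax for each copy
            graph.edata["w2"] = edge_softmax(graph, w1.unsqueeze(-1))  # (E, k, 1)

            # Eq. (13): weighted aggregation, then sum over the k copies
            graph.ndata["h2k"] = h2.unsqueeze(1).repeat(1, self.k, 1)
            graph.update_all(fn.u_mul_e("h2k", "w2", "m"), fn.sum("m", "h3"))
            h3 = graph.ndata["h3"].sum(dim=1)

            h4 = self.a2 * h3                           # Eq. (14)
            if self.residual:
                return h4 + self.a3 * h1                # Eq. (15), residual branch of Eq. (16)
            return h4                                   # Eq. (16)
```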

3. Results

We implemented Geometric-V-Sub with the Deep Graph Library (DGL) [20] and, on this basis, carried out experiments with the modified control model GAT-A-FX and the GAT network model [12]. The experiment is divided into two parts, transductive learning and inductive learning. The transductive learning part consists of three models: GAT as the baseline, GAT-A-FX modified from GAT, and the Geometric-V-Sub proposed by us; the accuracy of each model is tested, and the number of parameters recorded, when the number of features per channel is 8, 16, 32 and 64, respectively. The inductive learning part consists of the same three models; the F1 score of each model is tested when the number of features per channel is 32, 64, 128 and 256, respectively. As for the naming of the modified GAT model GAT-A-FX, A indicates that an activation function is inserted in the middle layer to improve the generalization ability of the model, while FX indicates the number of features per channel. For example, GAT-A-F8 is a modified GAT model with 8 features per channel.

3.1. Dataset and Baseline

In this experiment, Cora, Citeseer and PubMed are used to test transductive learning, and the PPI dataset is used for inductive learning. Details of the datasets are shown in Table 1. The baseline is GAT [12]: each head (channel) of the GAT hidden layer has 64 features and there are 8 heads. In the transductive learning task there are 2 layers (1 hidden layer and 1 output layer), and in the inductive learning task there are 3 layers (2 hidden layers and 1 output layer).
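The benchmark graphs summarized in Table 1 can be obtained through DGL's built-in dataset loaders; a short sketch, assuming the standard dataset classes, is given below.

```python
# Sketch: loading the benchmark datasets with DGL's built-in loaders.
import dgl.data as data

cora = data.CoraGraphDataset()             # 2708 nodes, 7 classes
citeseer = data.CiteseerGraphDataset()     # 3327 nodes, 6 classes
pubmed = data.PubmedGraphDataset()         # 19,717 nodes, 3 classes
ppi_train = data.PPIDataset(mode="train")  # multi-graph PPI split for inductive learning

g = cora[0]
print(g.num_nodes(), g.ndata["feat"].shape, int(g.ndata["label"].max()) + 1)
```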

3.2. Experimental Parameters

The optimization function used in the experiment is Adam, the learning rate is 0.005, and the number of epochs is 100. Since we did not fix the random seed, we repeated each experiment 100 times and selected the run with the highest value as the final result. The number of channels (heads) is 8. In transductive learning, there is 1 hidden layer and 1 output layer, for a total of 2 layers. In inductive learning, there are 2 hidden layers and 1 output layer, for a total of 3 layers.
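A hedged sketch of this training protocol (Adam, learning rate 0.005, 100 epochs, best of 100 unseeded runs) is shown below; the helper name train_once and the mask variables are illustrative assumptions, with the model and graph coming from the earlier snippets.

```python
# Sketch of the training/evaluation loop used for the transductive experiments.
import torch
import torch.nn.functional as F


def train_once(model, g, feats, labels, train_mask, test_mask, epochs=100):
    opt = torch.optim.Adam(model.parameters(), lr=0.005)
    for _ in range(epochs):
        model.train()
        logits = model(g, feats)
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])
        opt.zero_grad()
        loss.backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        pred = model(g, feats).argmax(dim=1)
        return (pred[test_mask] == labels[test_mask]).float().mean().item()


# No fixed random seed: repeat 100 runs and report the best accuracy, e.g.
# best = max(train_once(make_model(), g, feats, labels, tr_mask, te_mask) for _ in range(100))
```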

3.2.1. Transductive Learning

All models use the layer configuration described above, and the feature dimension of each channel is shown in the tables below.
The experimental settings are listed in Table 2: the number of features per channel is set to 8, 16, 32 and 64, and the number of channels of all models is 8. In the experiment, GAT [12] is used as the baseline, GAT-A-FX as the control model, and our original model Geometric-V-Sub is compared with them in the accuracy experiment.
Table 3 records the number of parameters on the Cora, Citeseer and PubMed datasets when each channel of GAT, GAT-A-FX and Geometric-V-Sub contains 8, 16, 32 and 64 features, respectively.
From Table 2 and Table 3, it can be seen that, for the same number of input features, the accuracy of GAT-A-FX changes little compared with GAT [12], while the accuracy of Geometric-V-Sub shows both decreases and improvements compared with GAT.
For the GAT-A-FX network structure, accuracy on the Cora dataset decreases by 0.4% (16 features) to 0.2% (64 features). On the Citeseer dataset, it ranges from a 1% increase (8 features) to a 1.4% decrease (16 features). On the PubMed dataset there is a 0.4% decrease (8, 16 and 64 features). Although the number of parameters grew slightly when the attention vector was widened, there was no corresponding performance improvement in the experimental groups, and accuracy even partially decreased. This shows that modifying GAT only by widening the attention vector and inserting an activation function to improve generalization is a relatively unsuccessful direction for the transductive learning task.
The Geometric-V-Sub network structure shows a decrease of 7% (8 features) to 2% (32 features) on the Cora dataset, but its parameter count is only 13.27% (8 features; Geometric-V-Sub has 12,245 parameters and GAT has 92,302) to 13.95% (64 features; Geometric-V-Sub has 103,021 and GAT has 738,318) of the control model GAT. There is a decrease of 9% (8 features) to 5.6% on the Citeseer dataset, but the parameters of Geometric-V-Sub are 12.75% (8 features; Geometric-V-Sub has 30,300 and GAT has 237,516) to 13.05% (64 features; Geometric-V-Sub has 247,972 and GAT has 1,900,044) of the control model GAT. On the PubMed dataset, there is a decrease of 0.4% (8 features) and an increase of 0.6% (16 and 64 features), while the number of parameters of Geometric-V-Sub is 13.68% (8 features; Geometric-V-Sub has 4421 and GAT has 32,326) to 16.26% (64 features; Geometric-V-Sub has 42,053 and GAT has 258,566) of the control model GAT.
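The quoted percentages follow directly from the parameter counts in Table 3; a quick check for the 64-feature settings:

```python
# Parameter ratios of Geometric-V-Sub vs. GAT (64 features per channel), from Table 3.
ratios = {
    "Cora-64": 103_021 / 738_318,        # ~13.95%
    "Citeseer-64": 247_972 / 1_900_044,  # ~13.05%
    "Pubmed-64": 42_053 / 258_566,       # ~16.26%
}
print({name: f"{value:.2%}" for name, value in ratios.items()})
```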
This result shows that reducing the number of parameters too much has a large impact on datasets whose input nodes have a high feature dimension (for example, the node feature dimension of Citeseer is 3703, while that of PubMed is only 500). In this experiment, the number of parameters of Geometric-V-Sub in the first-layer node feature transformation is only 1/8 that of GAT, so datasets with larger node feature dimensions perform poorly. For datasets with fewer node features, the node feature transformation can be fully mapped and less information is lost, so better results are obtained. Given the huge difference in total parameters, the final accuracy loss relative to the baseline is small, or accuracy is even improved, which shows that the network structure is reasonable and that it is feasible to use the geometric characteristics of graph data for node classification and feature learning.

3.2.2. Inductive Learning

The inductive learning experiment was carried out on the PPI dataset. The number of channels for the GAT, GAT-A-FX and Geometric-V-Sub models is set to 8, and the number of features per channel of each model is set to 32, 64, 128 and 256, respectively. Table 4 records the number of parameters required by the three models under the different numbers of features per channel, together with the F1 score of each model.
It can be seen from Table 4 that the F1 score of GAT-A-FX is between 6.5 points lower than GAT (32 features per channel; GAT-A-FX has 111,844 parameters and GAT has 110,578) and 2.1 points lower (256 features per channel; GAT-A-FX has 4,561,380 parameters and GAT has 4,552,946). Although the number of parameters increased slightly when the attention vector was widened, there was no corresponding performance improvement in the experimental groups, and the F1 score even decreased slightly; this again shows that modifying GAT only by widening the attention vector and inserting an activation function is a relatively unsuccessful direction for the inductive learning task. The F1 score of Geometric-V-Sub is between 7.9 points lower than GAT (32 features per channel; Geometric-V-Sub has 97,405 parameters and GAT has 110,578, i.e., 88.09% of the baseline) and 0.2 points lower (256 features per channel; Geometric-V-Sub has 546,749 parameters and GAT has 4,552,946, i.e., 12.01% of the baseline). After greatly reducing the number of parameters, Geometric-V-Sub still achieves an F1 score close to the baseline, which shows that, to a certain extent, a large number of parameters are wasted in the original GAT structure. Lightweighting is therefore necessary, and it is worth studying how to allocate parameters reasonably to maximize the parameter efficiency of the model. Moreover, edges, as the carriers of information transmission, can transmit information well with the support of geometric features, which helps achieve better node classification.

4. Discussion

From the comparative experiments between the two network structures in this paper and the baseline model GAT, it can be seen that adding a hidden-layer activation function and lengthening the attention vector on the basis of GAT cannot improve the accuracy of the network model; instead, the increase in parameters leads to overfitting and a decline in accuracy. On the one hand, this is because the GAT structure is ingeniously and reasonably designed; on the other hand, the splicing used in GAT, where the attention vector operates on the spliced edge vector, can already deal with most problems. At the same time, it raises the question of whether a graph neural network really needs so many parameters, especially given that the graph itself carries certain logical attributes and correlation characteristics. The Geometric-V-Sub proposed in this paper draws on the attention implementation of GAT and on design inspiration from spatial geometry to greatly reduce the number of parameters with only a small loss of accuracy. At the same time, this method still has some defects in the transductive learning classification task; for example, when only geometric subtraction is used to obtain edge features, performance in high-dimensional feature spaces needs to be improved. The feature transformation part should adopt more effective means to achieve this goal, such as training the feature-space transformation of nodes in an unsupervised way in the form of an autoencoder. If the number of feature transformation matrices can be reduced more effectively, the use of parameters can be further reduced, lowering the dependence on hardware so that graph learning can have more application scenarios and a better user experience.

5. Conclusions

On the one hand, this paper demonstrates through experiments the potential of the GAT network model for lightweighting. On the other hand, a network model based on the graph attention mechanism, Geometric-V-Sub, is proposed. Different from GAT, Geometric-V-Sub operates with spatial geometry. By optimizing the node dimensionality-reduction part, the parameters of the model can be greatly reduced while the accuracy remains close to the baseline model GAT. The lightweight network proposed in this paper is a successful attempt at modifying the model structure. It not only verifies the effectiveness of the spatial geometric structure in improving the accuracy of graph node classification, but also illustrates the importance of dimensionality reduction of graph node features, because the result of dimensionality reduction directly affects the final classification effect.
The follow-up work of this paper will focus on two aspects: first, how to map the sparse representation in a high-dimensional space to a dense representation in a low-dimensional space; second, how to improve the accuracy of the existing Geometric-V-Sub model with no or only a small increase in model parameters, so that it can surpass the benchmark model GAT and have broader application value.

Author Contributions

Conceptualization, Z.L.; methodology, Z.L.; software, Z.L.; validation, Z.L.; resources, W.A. and D.Z.; supervision, W.A.; project administration, W.A. and D.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Science and Technology Innovation 2030—“New Generation Artificial Intelligence” Major Project, Grant no. 2020AAA0108703.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

Sample Availability

Not applicable.

References

  1. Getoor, L. Link-based classification. In Advanced Methods for Knowledge Discovery from Complex Data; Springer: London, UK, 2005; pp. 189–207.
  2. Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; Eliassi-Rad, T. Collective classification in network data. AI Mag. 2008, 29, 93.
  3. Bhattacharya, I.; Getoor, L. Collective entity resolution in relational data. ACM Trans. Knowl. Discov. Data (TKDD) 2007, 1, 5-es.
  4. Namata, G.; London, B.; Getoor, L.; Huang, B.; Edu, U.M.D. Query-driven active surveying for collective classification. In Proceedings of the 10th International Workshop on Mining and Learning with Graphs, Edinburgh, Scotland, UK, 26 June–1 July 2012; Volume 8, p. 1.
  5. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
  6. Fakhraei, S.; Foulds, J.; Shashanka, M.; Getoor, L. Collective spammer detection in evolving multi-relational social networks. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; pp. 1769–1778.
  7. Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral networks and locally connected networks on graphs. arXiv 2013, arXiv:1312.6203.
  8. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 3844–3852.
  9. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
  10. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1263–1272.
  11. Atwood, J.; Towsley, D. Diffusion-convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016; Volume 29.
  12. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
  13. Zhang, J.; Shi, X.; Xie, J.; Ma, H.; King, I.; Yeung, D.Y. GaAN: Gated attention networks for learning on large and spatiotemporal graphs. arXiv 2018, arXiv:1803.07294.
  14. Tong, Z.; Liang, Y.; Sun, C.; Rosenblum, D.S.; Lim, A. Directed graph convolutional network. arXiv 2020, arXiv:2004.13970.
  15. Li, R.; Wang, S.; Zhu, F.; Huang, J. Adaptive graph convolutional neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
  16. Zhuang, C.; Ma, Q. Dual graph convolutional networks for graph-based semi-supervised classification. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 499–508.
  17. Micheli, A. Neural Network for Graphs: A Contextual Constructive Approach. IEEE Trans. Neural Netw. 2009, 20, 498–511.
  18. Bacciu, D.; Errica, F.; Micheli, A. Contextual graph Markov model: A deep and generative approach to graph processing. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 294–303.
  19. Klicpera, J.; Weißenberger, S.; Günnemann, S. Diffusion improves graph learning. arXiv 2019, arXiv:1911.05485.
  20. Wang, M.; Zheng, D.; Ye, Z.; Gan, Q.; Li, M.; Song, X.; Zhou, J.; Ma, C.; Yu, L.; Gai, Y. Deep Graph Library: A graph-centric, highly-performant package for graph neural networks. arXiv 2019, arXiv:1909.01315.
  21. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
  22. Bangyal, W.H.; Hameed, A.; Alosaimi, W.; Alyami, H. A new initialization approach in particle swarm optimization for global optimization problems. Comput. Intell. Neurosci. 2021, 2021, 6628889.
  23. Pervaiz, S.; Ul-Qayyum, Z.; Bangyal, W.H.; Gao, L.; Ahmad, J. A systematic literature review on particle swarm optimization techniques for medical diseases detection. Comput. Math. Methods Med. 2021, 2021, 5990999.
  24. Haider Bangyal, W.; Hameed, A.; Ahmad, J.; Nisar, K.; Haque, M.R.; Ibrahim, A.; Asri, A.; Rodrigues, J.J.; Khan, M.A.; B. Rawat, D.; et al. New modified controlled bat algorithm for numerical optimization problem. Comput. Mater. Contin. 2022, 70, 2241–2259.
  25. Coelho, Y.; Nguyen, B.; Santos, F.; Krishnan, S.; Bastos-Filho, T. A lightweight model for human activity recognition based on two-level classifier and compact CNN model. In Proceedings of the XXVII Brazilian Congress on Biomedical Engineering (CBEB 2020), Vitória, Brazil, 26–30 October 2020; Springer: Cham, Switzerland, 2022; pp. 1895–1901.
  26. Tao, Z.; Wu, C.; Liang, Y.; He, L. LW-GCN: A lightweight FPGA-based graph convolutional network accelerator. arXiv 2021, arXiv:2111.03184.
  27. Sahbi, H. Lightweight connectivity in graph convolutional networks for skeleton-based recognition. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 2329–2333.
  28. Li, X.; Lv, Z.; Liu, B.; Wu, L.; Wang, Z. Improved feature learning: A maximum-average-out deep neural network for the game Go. Math. Probl. Eng. 2020, 2020, 1397948.
  29. Monti, F.; Boscaini, D.; Masci, J.; Rodola, E.; Svoboda, J.; Bronstein, M.M. Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 22–25 July 2017; pp. 5115–5124.
Figure 1. Graph processing flow of structural data through the model.
Table 1. Datasets details.

Dataset | Features | Nodes | Edges | Classes | Graphs
Cora | 1433 | 2708 | 10,556 | 7 | 1
Citeseer | 3703 | 3327 | 9228 | 6 | 1
Pubmed | 500 | 19,717 | 88,651 | 3 | 1
PPI | 50 | 2372 | 818,716 | 121 | 24
Table 2. Accuracy comparison in transductive learning.

Model | Hidden Layer Dimension | Cora (%) | Citeseer (%) | Pubmed (%)
GAT | 8 | 82 | 74 | 81.2
GAT | 16 | 81.4 | 74.2 | 81.2
GAT | 32 | 82.4 | 72.6 | 80.8
GAT | 64 | 82.4 | 74 | 81.2
GAT-A-FX | 8 | 82 | 75 | 80.8
GAT-A-FX | 16 | 81.8 | 72.8 | 80.8
GAT-A-FX | 32 | 82.4 | 72.6 | 80.8
GAT-A-FX | 64 | 82.2 | 72.8 | 80.8
Geometric-V-Sub | 8 | 75 | 65 | 80.8
Geometric-V-Sub | 16 | 78.8 | 67.4 | 81.8
Geometric-V-Sub | 32 | 80.4 | 67 | 81.2
Geometric-V-Sub | 64 | 79.6 | 68 | 81.8
Table 3. Parameter comparison in transductive learning.

Model | Hidden Layer Dimension | Cora | Citeseer | Pubmed
GAT | 8 | 92,302 | 237,516 | 32,326
GAT | 16 | 184,590 | 475,020 | 64,646
GAT | 32 | 369,692 | 950,552 | 129,804
GAT | 64 | 738,318 | 1,900,044 | 258,566
GAT-A-FX | 8 | 92,444 | 237,656 | 32,460
GAT-A-FX | 16 | 184,860 | 475,288 | 64,908
GAT-A-FX | 32 | 369,692 | 950,552 | 129,804
GAT-A-FX | 64 | 739,356 | 1,901,080 | 259,596
Geometric-V-Sub | 8 | 12,245 | 30,300 | 4421
Geometric-V-Sub | 16 | 24,445 | 60,628 | 9029
Geometric-V-Sub | 32 | 49,613 | 122,052 | 19,013
Geometric-V-Sub | 64 | 103,021 | 247,972 | 42,053
Table 4. Accuracy comparison and parameter comparison in inductive learning.

Model | Hidden Layer Dimension | F1-Score | Total Parameters
GAT | 32 | 91.8 | 110,578
GAT | 64 | 97.2 | 351,986
GAT | 128 | 98.4 | 1,228,018
GAT | 256 | 98.6 | 4,552,946
GAT-A-FX | 32 | 85.3 | 111,844
GAT-A-FX | 64 | 93.7 | 354,276
GAT-A-FX | 128 | 95.9 | 1,232,356
GAT-A-FX | 256 | 96.5 | 4,561,380
Geometric-V-Sub | 32 | 83.9 | 97,405
Geometric-V-Sub | 64 | 91.9 | 130,877
Geometric-V-Sub | 128 | 96.7 | 228,541
Geometric-V-Sub | 256 | 98.4 | 546,749
