Graph Convolutional Network Design for Node Classification Accuracy Improvement

Abstract: Graph convolutional networks (GCNs) provide an advantage in node classification tasks for graph-related data structures. In this paper, we propose a GCN model for enhancing the performance of node classification tasks. We design a GCN layer by updating the aggregation function using an updated value of the weight coefficient. The adjacency matrix of the input graph and the identity matrix are used to calculate the aggregation function. To validate the proposed model, we performed extensive experimental studies with seven publicly available datasets. The proposed GCN layer achieves results comparable with state-of-the-art methods, and with a single layer the proposed approach can achieve superior results.


Introduction
Graph data structures represent complex relationships across many problem domains. Graphs are irregular in shape and can represent any form of relationship between different entities. This ubiquitous representational power makes graphs attractive for data processing. Machine learning models have proven very efficient at learning relationships between inputs and outputs and are widely adopted across research areas [1]. In most cases, deep learning algorithms are capable of capturing hidden patterns in Euclidean data that are not represented in graph form [2]. In recent years, the study of graph neural networks (GNNs) has been growing at a rapid pace. GNNs are mainly used for graph-oriented data, which are very complicated in nature. In a graph, relationships are built upon nodes and edges, where each node or edge is assigned features or attributes. These features are real-valued vectors that represent original or synthetic data. Using these feature vectors, GNNs can learn the different patterns of relationships or the labels associated with each node. Graph learning can be applied to various problems such as community detection [3,4], node classification [5,6], link prediction [7], molecular property prediction, natural language processing, and clustering [8,9].
Among the different applications of GNNs, node classification has gained researchers' attention. Various approaches have been proposed to improve the training and prediction capability of GNNs using deep learning. The authors in [10] proposed a node-feature-based learning embedding function that generalizes to unseen nodes. In [11], graph convolution is formulated as an integral transform of the embedding function under probability measures to identify sample loss and gradients. The study presented in [12] introduced NodeNet, a method for classifying nodes in citation graphs. Prakash et al. [13] proposed a kernel propagation method that leverages higher-order structural features of GNNs. Additionally, in [14], the authors proposed a two-layer feature-selection graph neural network that learns the importance of features during training.
A graph convolutional network (GCN) was proposed in [5] as a first-order approximation of spectral graph convolutions. The GCN provided significant processing power on graph data and has already been applied to various tasks [15]. The main research areas where GCNs have been applied include science, computer vision, and natural language processing [16]. GCNs have also been used in drug development and discovery for molecular property and activity prediction [17]. By combining one-hop local graph topologies with node characteristics, graph convolutional networks apply a spectral strategy to learn node embeddings and extract embeddings from the hidden layers. The authors of [18] proposed a model that uses multiple instances of GCNs over node pairs discovered at different distances in random walks; the combined instances are used to learn classification outputs. The study in [19] predicted the interface between pairs of proteins using a graph representation of the underlying protein structure, where graph convolution feeds a merge layer and a fully connected layer to classify proteins. A geometric GCN was proposed in [20] for transductive learning on graphs; its geometric aggregation, comprising node embedding, structural neighborhood, and bi-level aggregation, was applied to a graph neural network to obtain good performance. The authors of [21] proposed improved identification of influential nodes using information entropy and the weighted degree of edge connections, evaluating the method against nine algorithms on nine datasets. A Lanczos network was proposed in [22], which leverages the Lanczos algorithm to build low-rank approximations of the graph Laplacian for graph convolution. A GCN was proposed in [23] for hyperspectral image classification using the minibatch technique, which allows large-scale data to be trained efficiently. In another study [24], a cluster GCN was proposed to reduce the computational cost for large
dataset sizes. In addition, a GCN cannot properly distinguish certain graph structures, and it also suffers from noise limitations: as data noise increases, the possibility of obtaining good results decreases. From the above discussion, we can observe that different works have been proposed for enhancing graph data classification. Most of these studies concentrate on experiments outside the internal structure of the GCN layer, introducing new algorithms, minibatch sizes, or new application areas. However, more research is needed on the layer design itself in order to find an alternative solution for the graph data structure. The motivation of our study is to examine the result when we modify the layer calculation strategy of a GCN-based approach. In this study, we propose a GCN model for the node classification task, considering the node classification problem for one single graph. We form a new GCN layer for enhancing the feature extraction process and classification accuracy. We use the feature propagation mechanism to aggregate the neighborhood information of a particular node: we first normalize the input feature values and then calculate the coefficient value for aggregation. The coefficient values at each step are updated during the training process. We conduct experiments using seven datasets to evaluate the proposed GCN model: Cora, Citeseer, PubMed, Amazon photos, Amazon computers, Cora Full, and Coauthor CS. The experimental results demonstrate promising outcomes achieved by our proposed model. In summary, the main contributions of this study are:

• We propose a GCN layer for improving the classification accuracy of nodes in graph data.

• The proposed approach is tested on seven different datasets and compared with related previous studies.
The rest of the paper is organized as follows. Section 2 presents the graph notation and GCN structure details, Section 3 demonstrates the proposed model, Section 4 presents the experimental results, and Section 5 concludes the paper.

Graph Architecture

Graph Notation
A graph can be represented as G = (V, E), where V is the set of vertices (nodes) and E is the set of edges. We consider the node set V = {n_1, n_2, ..., n_n} and the edge set E = {e_1, e_2, ..., e_m}. An example of a graph structure is shown in Figure 1; this example represents 50 nodes with a total of 252 edges. To process graph data, a graph is represented as an adjacency matrix A ∈ {0, 1}^(n×n): if there exists an edge between nodes v_i and v_j, then A_{i,j} = 1; otherwise, A_{i,j} = 0. Each node is also associated with a d-dimensional feature matrix X ∈ R^(n×d) = [x_1, x_2, ..., x_n], where x_i is the signal vector of the i-th node. If self-loops are present in the graph structure, the adjacency matrix can be written as Â = A + I, where I is the identity matrix. A list of notation is presented in Table 1.
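As a quick illustration of this notation, the adjacency matrix of a small hypothetical graph can be built as follows (the 4-node edge list is an illustrative assumption, not one of the paper's datasets):

```python
import numpy as np

# Hypothetical undirected graph with n = 4 nodes and m = 4 edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix A in {0,1}^(n x n): A[i, j] = 1 iff an edge joins nodes i and j.
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected graph, so A is symmetric

# Adjacency matrix with self-loops: A_hat = A + I.
A_hat = A + np.eye(n, dtype=int)
```

The self-loop term A + I lets each node keep its own features when neighborhood information is aggregated.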

Graph Convolutional Networks
GCNs follow the neural network architecture, using convolution layers to learn node embeddings by aggregation. In particular, GCNs use the graph structure and input features to learn the node embeddings. After l iterations of aggregation, a node's representation captures the structural information within its l-hop network neighborhood. Mathematically, the l-th layer of a GNN can be represented as follows [25]:

h_u^(l) = σ(W^(l) · AGGREGATE({h_v^(l-1) : v ∈ N(u) ∪ {u}})), (1)

where h_u^(l) is the feature vector of node u at the l-th layer, N(u) is the neighborhood of u, and AGGREGATE combines the neighborhood vectors. A multi-layer GCN can be expressed as follows [5]:

H^(l+1) = σ(D̂^(-1/2) Â D̂^(-1/2) H^(l) W^(l)), (2)

where Â = A + I is the adjacency matrix with self-loops, D̂ is the degree matrix of Â, W^(l) is the layer-specific trainable weight matrix, σ(·) is the activation function, and H^(l+1) is the matrix of activations in the (l+1)-th layer. If the graph G can be divided into subgraphs, the convolution layers can be approximated by Monte Carlo sampling, and Equation (2) can be expressed as follows [16]:

H^(l+1)(v) = σ((1/f_l) Σ_{j=1..f_l} Â(v, u_j^(l)) H^(l)(u_j^(l)) W^(l)), (3)

where u_1^(l), ..., u_{f_l}^(l) are f_l independent and identically distributed samples used for the Monte Carlo estimate. A GCN can be built by stacking multiple convolution layers; a simple example is as follows:

Z = f(X, A) = softmax(Â ReLU(Â X W^(0)) W^(1)), (4)

where f(X, A) is the model as a function of the data features X and the adjacency matrix A, and ReLU is the rectified linear unit non-linearity. The expression in (4) is useful for understanding the formation of GCN layers and the architecture of a GCN model.
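A single propagation step of the multi-layer GCN rule of [5] can be sketched as follows; the toy graph, random weights, and sizes are illustrative assumptions, not values from the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: n = 4 nodes, d = 3 input features, f = 2 output features (all illustrative).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.random((4, 3))   # feature matrix X in R^(n x d)
W = rng.random((3, 2))   # trainable weight matrix W in R^(d x f)

A_hat = A + np.eye(4)                       # add self-loops: A_hat = A + I
d_hat = A_hat.sum(axis=1)                   # degrees of A_hat
D_inv_sqrt = np.diag(d_hat ** -0.5)         # D_hat^(-1/2)
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetrically normalized adjacency

# One layer: H1 = ReLU(D_hat^(-1/2) A_hat D_hat^(-1/2) X W).
H1 = np.maximum(A_norm @ X @ W, 0.0)
```

Stacking this step, with a softmax on the final layer, yields the multi-layer classifier described above.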

Proposed Model
In this section, we describe the proposed GCN model in detail. The architecture of the proposed model is shown in Figure 2. First, the input layer accepts the graph parameters, namely the adjacency matrix and the input features. The feature values are normalized first, which can be expressed as follows:

X̄ = X / X_0, (5)

where X_0 is the maximum feature value. Next, the normalized features are passed through the GCN layer shown in Figure 2, where several operations are performed in a sequential manner. In the first step, the input weights are normalized using the softmax function as follows:

W_{i+1} = softmax(W_i), (6)

where W_{i+1} are the updated weights and W_i are the received weights. In the next step, we create a coefficient for aggregation using the following expression:

Q = W_{i+1}[0] · I + W_{i+1}[1] · (A + I), (7)

where the first updated weight value is multiplied by the identity matrix and the second is multiplied by the sum of the adjacency matrix and the identity matrix. The resulting coefficient is then multiplied by the feature matrix, and the aggregation is written as follows:

Z = Q X̄ W + b, (8)

where W is the trainable weight matrix and b is the bias value. The resulting value is passed through a sigmoid layer as follows:

H = S(Z), (9)

where S(·) is the sigmoid function. The loss is estimated using the cross-entropy loss. The overall process of the GCN layer is shown in Algorithm 1. After the GCN layer, a softmax function is applied to predict one of the classes, and the class with the highest probability is selected. The proposed single-layer network can be expressed as follows:

y = softmax(S(Q X̄ W + b)). (10)
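The sequence of steps above (normalize the features, softmax the two aggregation weights, mix the identity and self-loop adjacency matrices, then apply the sigmoid and a class softmax) can be sketched as below. This is only an illustrative reconstruction of the layer's forward pass; the graph, seeds, and sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, c = 4, 3, 2   # nodes, input features, classes (illustrative sizes)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.random((n, d))

# Step 1: normalize features by the maximum value X_0.
X_norm = X / X.max()

# Step 2: softmax-normalize the two aggregation weights.
raw_w = rng.random(2)
w = np.exp(raw_w - raw_w.max())
w = w / w.sum()

# Step 3: aggregation coefficient, mixing I and (A + I).
Q = w[0] * np.eye(n) + w[1] * (A + np.eye(n))

# Step 4: aggregate the features, add the bias, apply the sigmoid.
W = rng.random((d, c))
b = 0.1
H = 1.0 / (1.0 + np.exp(-(Q @ X_norm @ W + b)))

# Step 5: per-node softmax over classes for the final prediction.
E = np.exp(H - H.max(axis=1, keepdims=True))
probs = E / E.sum(axis=1, keepdims=True)
pred = probs.argmax(axis=1)
```

During training, the aggregation weights w and the matrix W would be the learnable parameters updated by the optimizer.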

Results
In any machine-learning-based problem-solving approach, the objective is to minimize the loss of the objective function and attain high accuracy. In our approach, we also try to minimize the loss, which eventually results in a high node classification accuracy. The overall learning and inference process for node classification is described in Algorithm 2. First, we set the learning parameters, which are updated in order to minimize the loss function.

1: Input: training data features (X), labels (y), adjacency matrix (A), loss function, optimizer (Adam), coefficient value (q), max-episode.
2: Output: optimized model.
3: Create a learnable parameter list of size parameters = ((X, y), q).
4: Initialize the Adam optimizer function.
for each episode up to max-episode do
5: Get the parameter state.
6: Calculate the loss for the current parameters using Algorithm 1.
7: Update the parameter values using the optimizer function.
end for
8: Save the model with the minimized loss value.
9: Return the model.
10: Run inference on the input features and find the node class.
11: End

Algorithm 1 is the main function that reduces the loss for the input graph data. We use the adjacency matrix and the identity matrix to reduce the loss. After some episodes, the loss starts to be minimized because the internal calculation can correctly match the label associated with the data.
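The episode loop of Algorithm 2 can be sketched as below. For brevity this sketch uses plain gradient descent on a squared-error surrogate rather than Adam with cross-entropy; the loop structure (compute the loss, update the parameters, keep the best model) is the point, and the graph, labels, and sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, c = 4, 3, 2   # nodes, features, classes (illustrative)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Q = 0.5 * np.eye(n) + 0.5 * (A + np.eye(n))   # fixed aggregation coefficient for this sketch
X = rng.random((n, d))
T = np.eye(c)[[0, 0, 1, 1]]                   # hypothetical one-hot labels for the 4 nodes

W = rng.random((d, c))                        # learnable parameter
lr, max_episode = 0.01, 500
first_loss = ((Q @ X @ W - T) ** 2).sum()
best_loss, best_W = np.inf, W.copy()

for episode in range(max_episode):
    Z = Q @ X @ W                             # forward pass (stand-in for Algorithm 1)
    loss = ((Z - T) ** 2).sum()               # surrogate squared-error loss
    if loss < best_loss:                      # keep the model with the lowest loss
        best_loss, best_W = loss, W.copy()
    grad = 2.0 * (Q @ X).T @ (Z - T)          # analytic gradient of the loss w.r.t. W
    W = W - lr * grad                         # optimizer step (plain gradient descent)

final_loss = ((Q @ X @ best_W - T) ** 2).sum()
```

In the actual model, Adam would replace the fixed-step update and the aggregation weights in Q would also be learned.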
To evaluate the performance of the proposed GCN model, we considered seven publicly available datasets, namely Cora, Citeseer, PubMed, Amazon photos, Amazon computers, Cora Full, and Coauthor CS. Each dataset's information is listed in Table 2. The Cora dataset has 2708 nodes in a citation network of machine learning papers and seven classes that represent different subject categories [26]. The Citeseer dataset has 3327 nodes and six classes that represent subject categories. Similarly, PubMed has 19,717 nodes with three classes. We also considered larger datasets that include more nodes and classes: Amazon photos, Amazon computers, Cora Full, and Coauthor CS. These datasets are large compared with those described above, and all dataset information is provided in Table 2. First, we considered the Cora dataset for evaluating our proposed model. Figure 3a shows the overall training and testing loss of the model for the Cora dataset. We observed a sharp decrease in loss after 20 episodes; afterwards, the loss reduces steadily. Figure 3d shows the accuracy of the model during training and testing; the mean testing accuracy is around 87.25%. Next, the Citeseer dataset was considered for model evaluation. Figure 3b shows the training and testing loss for the Citeseer dataset. The loss trend is similar to that of the Cora dataset. However, the accuracy behaves differently for Citeseer, as depicted in Figure 3e: the training accuracy increases steadily, but the testing accuracy fluctuates and sometimes decreases. Nevertheless, the mean testing accuracy remains at 76%, which is comparable with existing results in the literature. The training and testing progress for the PubMed dataset is shown in Figure 3c, and Figure 3f shows the achieved accuracy during training. As the model learns and updates the optimized parameters, better accuracy results are produced. The training
accuracy increases to 84.18% as the number of epochs increases. However, the test accuracy does not increase in the same manner, which is due to the randomness of the dataset. Figure 4 shows the training and testing loss and accuracy for the four other datasets. Figure 4a shows the training and testing loss, which reaches its minimum after 150 epochs. The accuracy also reaches high values at around 150 epochs, as depicted in Figure 4e. The mean test accuracy for the Amazon photos dataset is 89.60%, which is comparable to other studies. Figure 4b shows the training and testing loss for the Amazon computers dataset. The testing accuracy differs from the training accuracy, but both follow the same trend, as depicted in Figure 4f, and the method achieves a highest accuracy of 81.88%. The Cora Full dataset has the highest number of classes, and achieving a good testing accuracy is challenging; Figure 4c,g shows the loss and accuracy for the training and testing data. Finally, the Coauthor CS dataset was considered, achieving a 92.42% testing accuracy; the loss and accuracy are shown in Figure 4d,h, respectively. In Table 3, we compare the results obtained in our experiments with other studies. It can be observed that the proposed model achieves better performance than other studies, although for PubMed the advantage is marginal. The results for the other four datasets are benchmarked in Table 4. The proposed model achieves significantly good results on the Coauthor CS dataset. For the other datasets, the results are comparable with most studies, although some studies provide better results. Node embedding results can help in understanding the evaluation performance of a model: node embeddings converge to the same class of labels and form clusters. Figure 5a shows the node embedding classification for the Cora dataset, where seven different clusters have formed after training. Each of the
clusters represents a specific topic of the Cora dataset. Figure 5b shows the node embedding of the classes in the Citeseer dataset, where six different clusters formed after training. For the PubMed dataset, three classes are shown in Figure 5c; there are some overlaps between the classes because some nodes are not correctly classified. For the other four datasets, the node embedding results are presented in Figure 6. For clear visibility, we have omitted the legends for the dataset classes. Figure 6a-d shows the node embedding results for the Amazon photos, Amazon computers, Cora Full, and Coauthor CS datasets, respectively. After training, the algorithm can successfully cluster similar nodes, which signifies the node classification accuracy. Summarizing the model's performance based on the experimental results, it is evident that the proposed GCN layer with a single layer can achieve superior results on the Cora, Citeseer, and PubMed datasets. For the Amazon photos, Amazon computers, and Cora Full datasets, the model does not surpass the performance of existing models; however, it consistently produces results that are nearly identical to those of the compared studies. Notably, the proposed model demonstrates outstanding performance on the Coauthor CS dataset. We also conducted a node classification accuracy test with the highest probability for each of the datasets: we randomly selected 100 data samples and tested the trained model. The classification results are presented in Figure 7, which compares the outcomes with various existing studies. For the Cora dataset, out of 100 test data points, our model correctly classifies 80% of the nodes, whereas the method described in [9] achieves an accuracy of 63%. When the Citeseer dataset is considered, the percentage of accurately classified nodes drops to 70%, while the approach outlined in [31] also achieves 70% accuracy. The PubMed dataset demonstrates a node classification accuracy of 79%, whereas the
study conducted in [18] achieves a slightly higher accuracy of 81%. Moving to the Amazon computers dataset, our model achieves a commendable 83% accuracy, compared with [40], which attained 75% accuracy. Similarly, on the Amazon photos dataset, our model achieves an accuracy of 89%, slightly surpassing the 87% reported in [38]. On the Coauthor CS dataset, our model achieves 88% accuracy, compared with the 85% accuracy of [10]. However, the Cora Full dataset exhibits the lowest accuracy at approximately 55%, aligning with [38], which also reported 55% accuracy. It is important to note that these results can vary due to the random nature of the experiments and their sensitivity to different parameter settings.
The complexity of the proposed model depends on the following equation:

H = S((w_0 I + w_1 (A + I)) X W + b). (11)

To express the complexity, we consider the number of nodes to be N and the number of embedding dimensions to be F. Thus, we can express W as (F × F), A as (N × N), and X as (N × F). Following Equation (11), the complexity can be expressed as O(2N^3 F + NF^3).
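A rough sanity check on these terms is to count the multiply-accumulate operations of the chained matrix products directly, using the shapes stated above (the helper name and the Cora-scale sizes below are illustrative assumptions):

```python
def matmul_cost(a, b, c):
    # Multiplying an (a x b) matrix by a (b x c) matrix takes roughly a*b*c multiply-adds.
    return a * b * c

# Illustrative sizes: a Cora-scale graph (N = 2708 nodes) with F = 64 embedding dimensions.
N, F = 2708, 64

# Cost of the (N x N)(N x F) aggregation product, then the (N x F)(F x F) weight product.
cost = matmul_cost(N, N, F) + matmul_cost(N, F, F)
```

For N much larger than F, the aggregation product with the dense (N × N) coefficient matrix dominates the per-layer cost.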

Conclusions
In this paper, we have proposed a GCN layer for enhancing the accuracy of GCN-based node classification. The new GCN layer combines the weight, adjacency, and identity matrices to improve prediction. The experimental results show that our proposed model can achieve more than 87% accuracy for the Cora dataset, 76% for the Citeseer dataset, and 84% for the PubMed dataset. For the other four datasets (Amazon photos, Amazon computers, Cora Full, and Coauthor CS), accuracies of 89%, 81%, 59%, and 92% were achieved, respectively. As the number of classes increases, the proposed model cannot correctly classify all nodes; in our future studies, we will work on datasets that include more classes.

Figure 1. An example of a graph data structure where each node n ∈ V is represented by a circle and each edge e ∈ E is represented by a straight line.

Algorithm 2. Proposed model training and evaluation steps.

Figure 3. Loss and accuracy results for different datasets: (a) training and testing loss of the Cora dataset, (b) training and testing loss of the Citeseer dataset, (c) training and testing loss of the PubMed dataset, (d) training and testing accuracy of the Cora dataset, (e) training and testing accuracy of the Citeseer dataset, and (f) training and testing accuracy of the PubMed dataset.

Figure 4. Loss and accuracy results for different datasets: (a) training and testing loss of the Amazon photos dataset, (b) training and testing loss of the Amazon computers dataset, (c) training and testing loss of the Cora Full dataset, (d) training and testing loss of the Coauthor CS dataset, (e) training and testing accuracy of the Amazon photos dataset, (f) training and testing accuracy of the Amazon computers dataset, (g) training and testing accuracy of the Cora Full dataset, and (h) training and testing accuracy of the Coauthor CS dataset.

Figure 5. Results of node embedding for different datasets: (a) the Cora dataset after 150 episodes of training, showing seven different classes; (b) the Citeseer dataset, showing six classes after 150 episodes of training; and (c) the PubMed dataset after 350 episodes.

Figure 7. Node classification accuracy performance comparison with previous studies.

Table 1. List of notation.

Table 2. Experiment dataset information.

Table 3. A comparison of model testing accuracy with state-of-the-art methods for the Cora, Citeseer, and PubMed datasets. Boldface in the classification data column indicates matrix-form data; boldface in the Cora, Citeseer, and PubMed columns indicates the highest accuracy.

Table 4. A comparison of model testing accuracy with state-of-the-art methods for the Amazon photos, Amazon computers, Cora Full, and Coauthor CS datasets.