Article

Learning Robust Node Representations via Graph Neural Network and Multilayer Perceptron Classifier

by Mohammad Abrar Shakil Sejan 1, Md Habibur Rahman 2,3, Md Abdul Aziz 2,3, Iqra Hameed 2,3, Md Shofiqul Islam 4, Saifur Rahman Sabuj 5 and Hyoung-Kyu Song 2,3,*
1 Department of AI Convergence Electronic Engineering, Sejong University, Seoul 05006, Republic of Korea
2 Department of Information and Communication Engineering, Sejong University, Seoul 05006, Republic of Korea
3 Department of Convergence Engineering for Intelligent Drone, Sejong University, Seoul 05006, Republic of Korea
4 Institute for Intelligent Systems, Deakin University, 75 Pigdons Rd., Waurn Ponds, Geelong, VIC 3216, Australia
5 Department of Electrical and Electronic Engineering, Brac University, Dhaka 1212, Bangladesh
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(4), 680; https://doi.org/10.3390/math14040680
Submission received: 31 December 2025 / Revised: 9 February 2026 / Accepted: 12 February 2026 / Published: 14 February 2026

Abstract

Node classification is a fundamental task in graph-based learning, with applications in social networks, citation networks, and biological systems. Learning node representations for different graph datasets is necessary to find the correlations between different types of nodes. Graph Neural Networks (GNNs) play a critical role in providing revolutionary solutions for graph data structures. In this paper, we analyze the effect of a combined GNN and multilayer perceptron (MLP) architecture on the node classification problem for different graph datasets. The feature information and network topology are efficiently captured by the GNN layers, and the MLP helps to make accurate decisions. We have selected popular datasets, namely Amazon-computer, Amazon-photo, Citeseer, Cora, Corafull, PubMed, and Wikics, for evaluating the performance of the proposed approach. In addition, for the GNN part, we have used six models to find the best fit in the proposed architecture. We have conducted extensive simulations to measure the node classification accuracy of the proposed model. The results show that the proposed architecture can outperform previous studies in terms of test accuracy. In particular, the GNN algorithms SAGEConv, GENConv, and TAGConv show superior performance across different datasets.

1. Introduction

Machine learning (ML) has emerged as a technological revolution that possesses the power to model complex problems in the real world [1]. Most of the problems solved by ML algorithms are generally based on Euclidean space, and the solutions are used in different application areas. Image- and text-based data are efficiently handled by traditional architectures like convolutional neural networks (CNNs). However, these structures are not efficient in non-Euclidean or irregular space data, which has a large application domain in real-world scenarios [2]. This constraint causes low performance for CNNs and other models in application areas such as bioinformatics, drug discovery, social network analysis, computer network recommendation systems, logistics, supply chain, and fraud detection in financial transactions [3]. The systems mentioned above are best described as graph data structures, and the internal dependencies of the data are important to understand the complex geometry of each system. In addition, moving to graph-based representation learning is inevitable, as it helps provide meaningful insights into available data. The problems related to non-Euclidean data are fundamental and are key to finding viable solutions for different complex problems. To enhance the use of graph-based data, Graph Neural Networks (GNNs) are fundamental structures to be leveraged. GNNs are increasingly used in different domains to help solve research problems efficiently [4,5,6].
Learning representations of graph-structured data has evolved from shallow embedding techniques to deep, end-to-end GNNs. Early studies on GNNs and node classification focused on low-dimensional vector representations that preserve local neighborhood structures. DeepWalk treats truncated random walks over the graph as sentences and applies the skip-gram model to learn node embeddings, analogous to generating word vectors [7]. The node2vec study [8] introduces a biased random-walk strategy that interpolates between breadth-first and depth-first sampling to capture both homophily and structural equivalence.
These methods can successfully reveal topological information, but their limitation is that they cannot exploit node features. This results in low accuracy for unseen nodes presented to the model for prediction. For this reason, GNNs were developed to process graph topology and node features simultaneously. Kipf and Welling [9] proposed the Graph Convolutional Network (GCN), which approximates spectral graph convolutions using a localized first-order approximation. The method aggregates information from immediate neighbors, which allows the model to learn representations from feature data. Other studies focus on improving the aggregation technique to maximize performance. Graph Attention Networks (GAT) [10] incorporate self-attention mechanisms and can assign different importance weights to different neighbors, which improves representation quality in noisy graphs.
The study in [11] proposed a multichannel heterogeneous graph neural network for node classification, learning channel-wise feature representations across heterogeneous relations and enabling effective aggregation of structural information to enhance node classification. Empirical studies have shown that methods like DeepWalk or node2vec achieve lower accuracy than GNN algorithms on node classification tasks. In addition, datasets like Cora, Citeseer, and PubMed have been used to demonstrate the superiority of GNNs over traditional machine learning methods [12]. Advanced models have increased the performance of GNNs by incorporating attention, sampling, and generalized propagation. However, the upper-limit performance has not been reached yet, and more studies are needed to increase node classification accuracy [13]. The authors in [14] proposed a sign-aware recommendation system using GNNs for network embedding applications. First, a signed bipartite graph is constructed and divided into two edge-disjoint graphs. Next, node embeddings are created, and a Bayesian personalized ranking loss function is used to optimize the training process. A GNN-based channel estimation scheme is proposed in [15]. The authors first proposed a spatial-frequency GNN for configuring the transmitter and learning wireless channel models. Massive MIMO models are considered in this study, and good performance is achieved. In a different work, the authors of [16] proposed a graph MLP that learns discriminative node representations and implicitly extracts the original graph characteristics via contrastive learning. A multi-stage deep learning model was proposed for anomaly detection in videos in [17]. First, two-stream features were collected using a pretrained technique, and these features were then learned using a graph convolutional network and local and global multi-head self-attention mechanisms.
These features were then refined using an MLP integrated with prompt-engineering learning via knowledge-based prompts. Finally, a classifier is used to classify video anomalies across different datasets.
In this study, our goal is to enhance the node classification technique by employing a combined GNN and multilayer perceptron (MLP) structure. The motivation behind this study is that GNNs are well suited to graph representation learning, while MLPs are effective at making accurate classification decisions. GNNs process graph data via message passing over neighboring nodes, which enables the model to capture the structural dependencies of the input graph. GNNs are also able to identify node feature similarities that traditional neural networks cannot represent. Each node can learn a context-aware embedding from its structural and semantic position in the graph. Although the use of an MLP on top of GNN embeddings is common, its role in our framework is fundamental. The GNN encoder is responsible solely for learning topology-aware node representations through message passing, while the MLP acts as a dedicated nonlinear decision module. This design decouples representation learning from classification, allowing each component to specialize in its respective task. Such separation is particularly beneficial when evaluating multiple GNN algorithms across heterogeneous datasets. Different graphs exhibit varying degrees of homophily, feature noise, and structural complexity. A shallow or linear classifier often lacks the capacity to adapt to these variations. The MLP, by contrast, learns flexible and dataset-specific decision boundaries over the learned embeddings, improving generalization and robustness. This framework-level design distinguishes our approach from prior methods: rather than introducing a new convolution operator, we provide a general and systematic pipeline for evaluating and deploying GNNs for node classification. Through experiments on seven heterogeneous datasets, we demonstrate that this design yields consistent performance gains and robust behavior.
The contribution of this work, therefore, lies in the framework-level formulation and comprehensive evaluation, providing a strong and practical baseline for node classification.
The use of an MLP on top of GNN embeddings provides a more expressive nonlinear decision function over the learned representations. Thus, the GNN learns meaningful relational embeddings, while the MLP optimizes class separability, resulting in a unified framework that achieves superior node classification performance and generalization across graph datasets. For the GNN component, we consider six well-established GNN architectures, including Topology Adaptive Graph Convolution (TAGConv), GraphSAGE (SAGEConv), Graph Convolutional Network (GCNConv), Graph Attention Network (GATConv), Generalized Edge-Conditioned Convolution (GENConv), and Chebyshev Graph Convolution (ChebConv). Each GNN model is independently integrated into the proposed architecture, and the performance of the corresponding models is systematically evaluated. These GNN variants cover a wide range of message-passing mechanisms. GCNConv serves as a canonical baseline for homophilous graphs. SAGEConv supports inductive learning and scalability, which are essential for large graphs. GATConv employs attention to adaptively weight neighbors, making it robust to noisy or heterogeneous neighborhoods. TAGConv and ChebConv utilize polynomial filters to incorporate higher-order neighborhood information, enabling long-range dependency modeling. GENConv provides a generalized and stable aggregation scheme with enhanced classification power. These models are evaluated on seven heterogeneous datasets spanning citation and e-commerce networks. This design enables us to analyze how different GNN paradigms behave under varying structural and feature distributions. It also demonstrates that the proposed GNN and MLP framework is broadly applicable to any dataset.
The contributions of this study can be listed as follows:
  • We implement a two-stage node classification pipeline: in the first phase, a GNN-based approach generates node embeddings, and in the second, an MLP architecture makes the final classification decision for each node. This architecture enables the GNN to focus on producing informative, topology-aware representations.
  • A comprehensive analysis of six different GNN algorithms is presented by incorporating them into the proposed architecture. This unified framework ensures a fair and consistent comparison by keeping the classification head and training protocol fixed while varying only the GNN encoder. Each technique is rigorously evaluated using multiple performance metrics, including Macro-F1, Micro-F1, Precision, and Recall, enabling a detailed assessment.
  • Seven graph datasets are used for evaluating results, and each dataset is tested across different models. By applying every model to every dataset under identical experimental settings, the study enables a fair comparison and highlights the generalization capability and robustness of different GNN architectures across heterogeneous graph domains. Furthermore, we present comprehensive noise resilience experiments for each model to systematically assess their robustness against feature and label corruption.
The rest of the article is organized as follows. Section 2 presents the GNN algorithms adopted in this study and the datasets used for node embedding. Section 3 describes the proposed model architecture. Section 4 details the experimental setup and discusses the results. Finally, Section 5 concludes the paper.

2. GNN Datasets and Methods

This section describes the datasets and GNN methods adopted in this study. Seven widely used benchmark datasets are considered, and six GNN algorithms are employed as the GNN component of the proposed architecture.

2.1. Graph Benchmark Datasets

To make the study more effective, diverse datasets are considered. Cora, Citeseer, and PubMed represent citation networks in which a node represents a scientific publication and an edge represents a citation between papers. Node features are derived from the textual content, and labels indicate the research topic of the paper. Citeseer is characterized by higher feature sparsity, whereas PubMed provides a larger-scale graph with smoother class distributions. Corafull extends the original Cora dataset by significantly increasing the number of publication categories, resulting in a more challenging multi-class classification scenario [18]. Amazon-computer and Amazon-photo are product co-purchase graphs where nodes represent items and edges indicate frequent co-purchasing behavior; node features encode product metadata, and labels correspond to product categories, reflecting dense community structures commonly observed in recommendation systems [13]. Wikics is derived from Wikipedia articles connected via hyperlink relationships and includes dense node features along with predefined data splits, enabling consistent and reproducible evaluation [19].

2.2. Graph Neural Network Methods

Let a graph be defined as $G = (V, E, X)$ with $N = |V|$ nodes, where $E$ denotes the edge set and $X \in \mathbb{R}^{N \times d}$ is the node feature matrix. The adjacency matrix is denoted by $A \in \mathbb{R}^{N \times N}$, and $H^{(l)}$ represents the node embeddings at layer $l$. Self-loops are included via $\tilde{A} = A + I$, with corresponding degree matrix $\tilde{D}$.
TAGConv captures higher-order neighborhood information by combining multiple powers of the normalized adjacency matrix [20]:
$$H^{(l+1)} = \sigma\!\left( \sum_{k=0}^{K} \left( D^{-1/2} A D^{-1/2} \right)^{k} X W_k^{(l)} \right),$$
where $K$ controls the receptive field size, $X$ is the feature matrix, and $W_k^{(l)}$ are hop-specific trainable weights.
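As an illustration, the $K$-hop polynomial filtering above can be traced on a toy three-node path graph in pure Python. The feature values, $K = 2$, and the identity hop weights $W_k$ are arbitrary choices for this sketch, not values used in the paper.

```python
import math

# Toy undirected path graph 0-1-2 (no self-loops here, matching S = D^{-1/2} A D^{-1/2})
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
deg = [sum(row) for row in A]
# Symmetrically normalized adjacency S
S = [[A[i][j] / math.sqrt(deg[i] * deg[j]) if A[i][j] else 0.0
      for j in range(3)] for i in range(3)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

X = [[1.0], [2.0], [3.0]]           # one scalar feature per node (illustrative)
K = 2
W = [[[1.0]], [[1.0]], [[1.0]]]     # hop-specific weights W_k, set to identity

# H = sigma( sum_{k=0}^{K} S^k X W_k ), with sigma = ReLU
Sk_X = X                            # S^0 X
H = [[0.0] for _ in range(3)]
for k in range(K + 1):
    term = matmul(Sk_X, W[k])
    for i in range(3):
        H[i][0] += term[i][0]
    Sk_X = matmul(S, Sk_X)          # advance to S^{k+1} X
H = [[max(0.0, v[0])] for v in H]   # ReLU
```

With identity weights, each node's output is the sum of its 0-, 1-, and 2-hop filtered features, showing how larger $K$ widens the receptive field.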
SAGEConv introduces an inductive framework by explicitly aggregating neighborhood information and concatenating it with a node’s own features [21]:
$$h_i^{(l+1)} = \sigma\!\left( W^{(l)} \left[ h_i^{(l)} \,\|\, \mathrm{AGG}\!\left( \{ x_j^{(l)} : j \in \mathcal{N}(i) \} \right) \right] \right),$$
where $\mathcal{N}(i)$ is the neighborhood set of node $v_i$, $x_j$ is the feature vector, $W^{(l)}$ is the learnable weight matrix, $\sigma$ is the nonlinear activation function, $\|$ denotes concatenation, and $\mathrm{AGG}(\cdot)$ denotes a permutation-invariant aggregation function such as mean or max pooling.
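The concatenate-then-transform update of SAGEConv can be sketched as follows. The toy graph, feature vectors, and fixed weight matrix are illustrative only, with mean pooling chosen as the aggregator.

```python
# Toy graph as an adjacency list; node features are 2-dimensional vectors.
neighbors = {0: [1, 2], 1: [0], 2: [0]}
x = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0]}

def mean_agg(vecs):
    # Permutation-invariant mean pooling over neighbor feature vectors
    n = len(vecs)
    return [sum(v[d] for v in vecs) / n for d in range(len(vecs[0]))]

def sage_layer(i, W):
    # Concatenate the node's own features with the mean of its neighbors'
    # features, then apply a linear map W (4 -> 2 here) and a ReLU.
    agg = mean_agg([x[j] for j in neighbors[i]])
    concat = x[i] + agg                       # [h_i || AGG]
    out = [sum(W[r][c] * concat[c] for c in range(4)) for r in range(2)]
    return [max(0.0, v) for v in out]

W = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]                    # illustrative fixed weights
h0 = sage_layer(0, W)
```

Because the node's own features are kept separate from the aggregated neighborhood, the layer can be applied inductively to nodes unseen during training.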
The GCN performs neighborhood aggregation using a symmetrically normalized adjacency matrix, enabling localized feature smoothing [9]:
$$H^{(l+1)} = \sigma\!\left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X^{(l)} W^{(l)} \right),$$
where $W^{(l)}$ is a learnable weight matrix, $X^{(l)}$ is the input feature matrix at layer $l$, and $\sigma(\cdot)$ is a nonlinear activation function.
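The localized feature smoothing performed by this propagation rule can be seen on a minimal two-node example; the feature values and the single weight are arbitrary. After one layer, the two connected nodes end up with identical representations, which is the smoothing effect in miniature.

```python
import math

A = [[0, 1], [1, 0]]                  # two connected nodes
N = 2
# Add self-loops: A~ = A + I
A_t = [[A[i][j] + (1 if i == j else 0) for j in range(N)] for i in range(N)]
deg = [sum(row) for row in A_t]
# Symmetric normalization: D~^{-1/2} A~ D~^{-1/2}
A_hat = [[A_t[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(N)]
         for i in range(N)]

X = [[1.0], [3.0]]                    # scalar feature per node (illustrative)
W = [[0.5]]                           # single weight, fixed for the sketch

# H = sigma(A_hat X W), with sigma = ReLU
XW = [[X[i][0] * W[0][0]] for i in range(N)]
H = [[max(0.0, sum(A_hat[i][k] * XW[k][0] for k in range(N)))]
     for i in range(N)]
```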
GATConv assigns adaptive importance weights to neighboring nodes through a self-attention mechanism [10]:
$$\alpha_{ij} = \frac{\exp\!\left( \mathrm{LeakyReLU}\!\left( \mathbf{a}^{\top} [ W h_i \,\|\, W h_j ] \right) \right)}{\sum_{k \in \mathcal{N}(i)} \exp\!\left( \mathrm{LeakyReLU}\!\left( \mathbf{a}^{\top} [ W h_i \,\|\, W h_k ] \right) \right)},$$
$$h_i^{(l+1)} = \sigma\!\left( \sum_{j \in \mathcal{N}(i)} \alpha_{ij} W h_j^{(l)} \right),$$
where $\mathbf{a}$ is the learnable attention weight vector, $W$ is the weight matrix, $\sigma$ is the nonlinear activation function, and $\alpha_{ij}$ is the attention coefficient. Multi-head attention is typically employed for improved stability.
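The normalization of raw attention scores into coefficients $\alpha_{ij}$ is a softmax over a node's neighborhood, as the following sketch shows. The raw scores are hypothetical values standing in for $\mathbf{a}^{\top}[W h_i \,\|\, W h_j]$.

```python
import math

def leaky_relu(v, slope=0.2):
    # LeakyReLU with the negative slope used as a default in many GAT setups
    return v if v >= 0 else slope * v

# Hypothetical raw attention scores e_ij for node i over neighbors j = 1, 2, 3
scores = {1: 0.8, 2: -1.0, 3: 0.1}
acts = {j: math.exp(leaky_relu(e)) for j, e in scores.items()}
Z = sum(acts.values())
alpha = {j: v / Z for j, v in acts.items()}   # coefficients sum to 1
```

Neighbors with higher raw scores receive proportionally larger weights, so the layer can emphasize informative neighbors and suppress noisy ones.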
The GENConv framework [22] enhances traditional graph convolutional networks by integrating generalized aggregation, message normalization, and residual connections to enable stable training of very deep GNN architectures. The overall layer update is formulated as:
$$h_i^{(l+1)} = h_i^{(l)} + \sigma\!\left( \mathrm{BN}\!\left( W^{(l)} \cdot \mathrm{AGG}_{\mathrm{GEN}}\!\left( \{ h_j^{(l)} : j \in \mathcal{N}(i) \} \right) \right) \right),$$
where $\mathrm{AGG}_{\mathrm{GEN}}(\cdot)$ denotes a learnable log-sum-exp aggregation function that generalizes mean and max pooling, $\mathrm{BN}(\cdot)$ denotes batch normalization, and the residual connection $h_i^{(l)} + \sigma(\cdot)$ ensures gradient stability.
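The way a log-sum-exp aggregator generalizes mean and max pooling can be verified numerically. The inverse-temperature parameter `beta` (learnable in GENConv) is fixed to extreme values here purely for illustration.

```python
import math

def lse_agg(values, beta):
    # (1/beta) * log( (1/n) * sum(exp(beta * v)) ): smoothly interpolates
    # between mean pooling (beta -> 0) and max pooling (beta -> inf)
    n = len(values)
    return (1.0 / beta) * math.log(sum(math.exp(beta * v) for v in values) / n)

vals = [1.0, 2.0, 5.0]
near_mean = lse_agg(vals, beta=1e-6)   # approaches mean(vals) = 8/3
near_max = lse_agg(vals, beta=50.0)    # approaches max(vals) = 5
```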

3. Proposed System Architecture

The proposed architecture follows a modular encoder–classifier pipeline, as illustrated conceptually in Figure 1. The system consists of two primary components: a GNN encoder responsible for structural representation learning and an MLP classifier for final node label prediction. This architecture enables end-to-end learning on graph-structured data, efficiently capturing both topological dependencies and feature-based relationships among nodes.
A graph G = (V, E, X) is considered as input for the model and sent to the input layer. The adjacency matrix information provides the structural connectivity of the graph. This connectivity helps to implement message-passing operations performed by the GNN. In our case, the GNN is composed of multiple layers of different convolutional variants, including TAGConv, SAGEConv, GCNConv, GATConv, GENConv, and ChebConv. Each of the convolution techniques has a distinct message-passing layer that performs different types of operations on the input data.
Both local and global structural pattern information is extracted by aggregating neighboring nodes through different propagation mechanisms. Multi-hop dependencies are captured by topology-adaptive filters in the vertex domain by TAGConv. Spectral filtering using normalized Laplacian-based propagation is applied by GCNConv. Inductive representation learning by sampling and aggregating neighborhood features is applied by SAGEConv. GATConv applies self-attention mechanisms, which help to dynamically weight neighbors for obtaining node information. Additionally, GENConv enhances stability and generalization through softmax-based message normalization, and ChebConv uses Chebyshev polynomial approximations for efficient higher-order spectral filtering. Each convolutional layer transforms the input node features $H^{(l)}$ into higher-level embeddings $H^{(l+1)}$, gathering semantic structure from the graph. The proposed system structure can be formulated as follows:
$$\hat{Y} = f_{\mathrm{MLP}}\!\left( f_{\mathrm{GNN}}(X, A; \theta_{\mathrm{GNN}}); \theta_{\mathrm{MLP}} \right),$$
where $f_{\mathrm{GNN}}(\cdot)$ denotes the graph feature encoder implemented by one or more convolutional variants, $f_{\mathrm{MLP}}$ is the MLP classifier, $\theta_{\mathrm{GNN}}$ and $\theta_{\mathrm{MLP}}$ are the trainable parameters of the GNN layers and the MLP, respectively, and $\hat{Y}$ is the matrix of predicted class probabilities.
The GNN layers learn the node embeddings using the following function:
$$h_i^{(l+1)} = \sigma\!\left( W^{(l)} \cdot \mathrm{AGGREGATE}_{\mathrm{type}}\!\left( \{ h_j^{(l)} : j \in \mathcal{N}(i) \cup \{i\} \} \right) \right),$$
where $i$ represents the current node, $\mathcal{N}(i)$ is the set of neighbors of $i$, $h^{(l)}$ is the hidden embedding at layer $l$, $h^{(l+1)}$ is the node embedding at the next layer, $W^{(l)}$ is the trainable weight matrix for layer $l$, and $\sigma(\cdot)$ is the nonlinear activation function. The aggregation type corresponds to the selected GNN method used in the algorithm. In the next part, the MLP-based classifier constitutes the final stage of the proposed model and is responsible for mapping the graph-encoded node embeddings into their corresponding class probabilities. The MLP receives the node representations $Z = [z_1, z_2, z_3, \ldots, z_N]^{\top} \in \mathbb{R}^{N \times d_z}$ from the GNN layers. For each node $v_i$, the MLP can be expressed as follows:
$$g_i = \mathrm{ReLU}(W_1 z_i + b_1),$$
$$o_i = W_2 g_i + b_2,$$
where $z_i \in \mathbb{R}^{d_z}$ is the input embedding for node $v_i$, $W_1 \in \mathbb{R}^{d_h \times d_z}$ and $W_2 \in \mathbb{R}^{C \times d_h}$ are the weight matrices of the hidden and output layers, $d_h$ denotes the MLP hidden-layer size, $C$ is the number of classes, and $b_1$ and $b_2$ are bias vectors.
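The two-layer mapping from embedding $z_i$ to logits $o_i$ can be written out directly. The dimensions ($d_z = 3$, $d_h = 2$, $C = 2$) and the weight values are arbitrary placeholders rather than trained parameters.

```python
def relu(v):
    return [max(0.0, u) for u in v]

def linear(W, x, b):
    # Matrix-vector product plus bias: W x + b
    return [sum(Wr[c] * x[c] for c in range(len(x))) + bc
            for Wr, bc in zip(W, b)]

# Illustrative parameters: d_z = 3 (embedding), d_h = 2 (hidden), C = 2 (classes)
W1 = [[1.0, -1.0, 0.0],
      [0.0, 1.0, 1.0]]
b1 = [0.0, 0.0]
W2 = [[1.0, 0.0],
      [0.0, 1.0]]
b2 = [0.0, 0.0]

z_i = [2.0, 1.0, 0.5]               # node embedding from the GNN encoder
g_i = relu(linear(W1, z_i, b1))     # hidden layer g_i
o_i = linear(W2, g_i, b2)           # output logits o_i
```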
A dropout regularization term with a rate of 0.2 is applied after the hidden layer to prevent overfitting and improve generalization. The final output logits o i are then passed through a softmax function to obtain the normalized class probabilities:
$$\hat{y}_i = \mathrm{softmax}(o_i),$$
where each element $\hat{y}_{ic}$ of $\hat{y}_i$ is the predicted probability that node $v_i$ belongs to class $c$. This MLP structure acts as a discriminative mapping function that transforms high-dimensional, graph-encoded embeddings into a compact class-probability space. The proposed model employs the categorical cross-entropy loss as the objective function to optimize both the GNN encoder and the MLP classifier in an end-to-end supervised learning framework. The loss quantifies the divergence between the predicted probability distribution produced by the MLP and the ground-truth class labels. For a given node $v_i$, let the predicted probability vector be $\hat{y}_i = [\hat{y}_{i1}, \hat{y}_{i2}, \ldots, \hat{y}_{iC}]^{\top}$, where $C$ denotes the total number of node classes and $\sum_{c=1}^{C} \hat{y}_{ic} = 1$. The cross-entropy loss over the labeled node set $V_L \subseteq V$ can be defined as follows:
$$\mathcal{L} = - \sum_{i \in V_L} \sum_{c=1}^{C} y_{ic} \log\left( \hat{y}_{ic} \right),$$
where $y_{ic}$ is the ground-truth label indicator (equal to 1 if node $v_i$ belongs to class $c$ and 0 otherwise), and $\hat{y}_{ic}$ is obtained from the softmax output of the MLP classifier as follows:
$$\hat{y}_{ic} = \frac{\exp(o_{ic})}{\sum_{k=1}^{C} \exp(o_{ik})}.$$
Minimizing the cross-entropy loss drives the network to assign a high probability to the correct class and low probabilities to the other classes.
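For a single labeled node, the softmax and cross-entropy computations above combine as follows. The logits are arbitrary, and the max-subtraction step is a standard numerical-stability detail not spelled out in the text.

```python
import math

def softmax(o):
    m = max(o)                          # subtract max for numerical stability
    exps = [math.exp(v - m) for v in o]
    Z = sum(exps)
    return [e / Z for e in exps]

def cross_entropy(y_true, y_prob):
    # y_true is a one-hot label vector; only the true class contributes.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_prob))

o_i = [2.0, 0.5, 0.1]                   # logits for a node with C = 3 classes
p_i = softmax(o_i)
loss = cross_entropy([1, 0, 0], p_i)    # node's true class is c = 1
```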

4. Experiment Results

4.1. Training of Each Model

We aim to produce a fair comparison between the models; thus, we use a consistent architecture for the different GNN algorithms. Table 1 summarizes the training, validation, and test accuracy for each model across the different datasets. Each dataset is divided into 60% for training, 20% for validation, and 20% for testing. For the Amazon-computer dataset, the SAGEConv model shows the best test accuracy at 90%. Three other models, TAGConv, GCNConv, and GENConv, share the same accuracy of 89%, while GATConv reaches 88%. Finally, ChebConv attains 86% test accuracy. For the Amazon-photo dataset, TAGConv and SAGEConv show the best performance, achieving 93% test accuracy, which signifies the high node classification performance of the proposed architecture. Next, GENConv achieves 88%, ChebConv 86%, and GATConv 84% test accuracy, while GCNConv remains low at 73%. The Citeseer dataset is challenging for achieving high node classification accuracy. The highest test accuracy is achieved by TAGConv, with 75% of nodes correctly classified. GATConv and GENConv classify 74% of nodes correctly; GCNConv reaches 73%, and both SAGEConv and ChebConv achieve 72% test accuracy. For the Cora dataset, TAGConv and SAGEConv achieve the highest test accuracy of 88%, and the second-highest is achieved by ChebConv at 87%. The remaining models achieve 84% accuracy, which is also a good improvement compared with previous models. Corafull is a very large dataset that includes many classes; thus, performance is lower than on the other datasets. TAGConv achieves the best test accuracy of 67%, and the second best, 64%, is achieved by SAGEConv. ChebConv achieves 60% test accuracy, and all other models achieve 61%.
The models classify the PubMed dataset very efficiently. Among them, TAGConv achieves the highest accuracy of 87%, and the second highest is achieved by GENConv at 86%. SAGEConv and GCNConv reach 85% test accuracy, and ChebConv achieves 84%. GATConv has the lowest accuracy at 83%, even though it still correctly classifies more than 80% of the nodes. For the Wikics dataset, GENConv shows 84% test accuracy, the highest among the models. SAGEConv shows the second-highest test accuracy of 82%. In addition, GATConv shows 81% accuracy, and both TAGConv and ChebConv show 79%. Finally, GCNConv shows the lowest accuracy of 72%.
Each model is trained for 2000 epochs, and the Adam optimizer is used with a learning rate of 0.0001. In addition, the cross-entropy loss function is used to minimize the loss during training. The hyperparameters for each model are chosen based on the best performance of the model after fine-tuning. In the model architecture, we use three hidden layers of GNNs, whose function is to provide robust node embedding results. We used three configurations to specify the number of hidden units inside each GNN layer and within the MLP layer. The configurations are defined as follows:
Class_A = [1950, 1950, 1950, 2000]
Class_B = [1550, 1550, 1550, 1700]
Class_C = [550, 550, 550, 700]
The motivation for designing three different hyperparameter classes is to reduce model complexity while maintaining competitive performance. In the GNN hidden layers, the parameter difference between C l a s s _ C and C l a s s _ B is 1000, while the difference between C l a s s _ B and C l a s s _ A is 400. We systematically evaluated the models using these internal configurations, and when comparable performance was observed, we selected the configuration with fewer hidden units. This strategy ensures that the proposed architecture remains computationally efficient without sacrificing predictive accuracy. The GNN component of the proposed architecture consists of three GNN layers, while the MLP component consists of two fully connected layers. Experimental analysis demonstrates that a two-layer MLP provides adequate classification capability for the node embeddings produced by the GNN component. Further increasing the number of MLP layers does not yield notable performance gains. In each configuration class, the first three values represent the number of hidden units in the three GNN layers, and the last value indicates the number of hidden units in the MLP layer. The output layer of the MLP is set equal to the number of node classes for the corresponding dataset. For TAGConv, the C l a s s _ B configuration is used for the Amazon-computer, PubMed, and Wikics datasets, while C l a s s _ C is applied to the remaining datasets. In the case of SAGEConv, C l a s s _ A is employed for all datasets except Wikics, where the hidden-layer configuration follows C l a s s _ B . For GCNConv, C l a s s _ B is used for the Amazon-computer, PubMed, and Wikics datasets, whereas C l a s s _ C is adopted for the other datasets. Similarly, GENConv utilizes C l a s s _ C for all datasets except Wikics, which follows C l a s s _ B . For ChebConv, C l a s s _ B is used for the Wikics dataset, while C l a s s _ C is applied to all remaining datasets.
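The configuration assignments described above can be encoded as a small lookup table, shown below. `hidden_units` is a hypothetical helper name, and GATConv is omitted because its assignment is not specified in the text.

```python
# Hidden-unit configurations from the paper: three GNN layers + one MLP hidden layer.
CONFIGS = {
    "Class_A": [1950, 1950, 1950, 2000],
    "Class_B": [1550, 1550, 1550, 1700],
    "Class_C": [550, 550, 550, 700],
}

# Per-model default class and dataset-specific overrides, as stated in the text.
ASSIGNMENT = {
    "TAGConv":  ("Class_C", {"Amazon-computer": "Class_B",
                             "PubMed": "Class_B", "Wikics": "Class_B"}),
    "SAGEConv": ("Class_A", {"Wikics": "Class_B"}),
    "GCNConv":  ("Class_C", {"Amazon-computer": "Class_B",
                             "PubMed": "Class_B", "Wikics": "Class_B"}),
    "GENConv":  ("Class_C", {"Wikics": "Class_B"}),
    "ChebConv": ("Class_C", {"Wikics": "Class_B"}),
}

def hidden_units(model, dataset):
    # Resolve the hidden-unit list for a (model, dataset) pair.
    default, overrides = ASSIGNMENT[model]
    return CONFIGS[overrides.get(dataset, default)]
```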
Table 2 presents the performance improvement of the proposed architecture in comparison with only GNN and only MLP models. First, we compute the test accuracy using the same GNN structures while omitting the MLP component. These results are then compared with those obtained from the proposed GNN+MLP architecture. Each GNN technique is evaluated on the corresponding dataset, and a consistent improvement in performance is observed when the MLP is integrated, as reported in the table. We also evaluate a standalone MLP on the graph datasets to assess its capability. As MLPs are not inherently designed to capture graph topology, their performance is significantly lower than that of GNN-based models. In contrast, the proposed architecture demonstrates a notable increase in test accuracy in most cases. The degree of improvement varies across models and datasets, reflecting the diverse characteristics of both the data and the underlying GNN techniques. When compared with only GNN models, the proposed architecture shows a steady performance gain across all methods. In particular, GENConv exhibits a substantial improvement of up to 19% when combined with the MLP. Although in some cases the improvement is modest (e.g., around 1%), it still represents a consistent positive enhancement, indicating that the MLP component contributes constructively to the overall classification performance.

4.2. Node Embedding Comparison

The node embedding results for each of the datasets using different models are crucial for a comparative analysis. Figure 2 depicts the heatmap of different models' node embedding accuracy across the datasets. In the first row of Figure 2, the Amazon-computer dataset is presented, and GENConv has slightly higher accuracy than SAGEConv; both show excellent classification accuracy of 90%. The other models also show very high node classification accuracy, above 87%, and the average classification accuracy is 89% across all models. SAGEConv is optimized for scalable and inductive learning on large graphs, and GENConv benefits from its stabilized aggregation. The second row of the heatmap shows the node classification accuracy for the Amazon-photo dataset. The maximum accuracy is shown by GENConv, where about 95% of nodes are correctly classified. The other models also exhibit very high accuracy, above 93%, as the heatmap color is similar for all models; the average accuracy in this case is 93%. These results confirm that large, feature-rich, and homophilous graphs favor aggregation mechanisms that are robust to scale and neighborhood variability. The Citeseer dataset classification accuracy mainly falls in the 68–76% range, as depicted in the third row of Figure 2. TAGConv benefits from its polynomial filtering, which captures broader contextual information, while SAGEConv's adaptive aggregation better handles irregular neighborhood structures. In the case of the Cora dataset, 88% accuracy was achieved by TAGConv, SAGEConv, GCNConv, and ChebConv, while GATConv and GENConv achieved around 84% and 83%, respectively. This reflects the strong homophily and clean structure of Cora, where even simple convolution operators are sufficient to separate classes. As Corafull is a large dataset, we see a reduction in node classification accuracy.
About 67% of nodes were correctly classified by TAGConv, the highest among all methods. This indicates that higher-order neighborhood information becomes important when class boundaries are very close. SAGEConv has successfully classified 90% of nodes correctly for the PubMed dataset, as shown in Figure 2. All methods achieved good classification accuracy, above 84%. The lower accuracy of GCNConv suggests that uniform averaging is less effective on this medium-scale graph. The advantage of SAGEConv highlights the importance of adaptive aggregation in graphs where node connectivity varies significantly across classes. In the case of the Wikics dataset, GENConv achieved 85% and SAGEConv 84% accuracy in node embedding. GENConv remains stable because of its robust aggregation scheme, and SAGEConv continues to perform well due to its flexible and inductive design. GCNConv and GATConv show performance degradation due to the different structure of this dataset. Overall, the other methods kept the node embedding accuracy above 82%.

4.3. Comprehensive Evaluation of Models

The classification accuracy of each model needs to be verified in order to get a complete evaluation. We consider the Macro-F1, Micro-F1, precision, and recall of each model to make a comprehensive comparison of their learning abilities. In addition, we have recorded the training time for each model to perform a time comparison. Table 3 presents the values of the metric evaluations of different models for the seven datasets.
GENConv and SAGEConv have similar Micro-F1 scores, the highest among all models for the Amazon-computer dataset; however, SAGEConv requires less training time than GENConv. TAGConv and ChebConv show similar accuracy and take second place. In the Macro-F1 evaluation, TAGConv, SAGEConv, GENConv, and ChebConv perform similarly, while SAGEConv and GENConv achieve the best recall. For the Amazon-photo dataset, every model performs very well. In particular, the Macro-F1 and Micro-F1 scores of TAGConv, SAGEConv, and ChebConv exceed 97%, indicating high accuracy across all classes, and SAGEConv shows promising results on all accuracy metrics while also training quickly. On the Citeseer dataset, ChebConv attains the best Macro-F1 and Micro-F1 among the models, making it a strong choice for classifying nodes in a citation network. TAGConv and SAGEConv also show high Macro-F1 and Micro-F1 scores. Precision and recall are similar for ChebConv, TAGConv, and SAGEConv, but fall below 85% for GCNConv, GENConv, and GATConv; GCNConv requires the least training time on Citeseer. Cora is another widely used benchmark dataset. On Cora, TAGConv, SAGEConv, and ChebConv achieved similar accuracy across the evaluation metrics and can be considered the best among the models, while GCNConv, GATConv, and GENConv are slightly less accurate. Every model trains quickly on Cora; the longest training time, 47 s, belongs to SAGEConv. Corafull is a relatively large dataset with 70 classes, so the accuracy of every model is reduced to some extent. TAGConv shows good performance on all evaluation metrics and is stable in each category.
The other models vary in performance; however, the average accuracy exceeds 80% for all of them. On Corafull, ChebConv requires the most training time and GCNConv the least. For the PubMed dataset, which has only three classes, the accuracy of every model is quite high: all models exceed 94%, indicating that the proposed architecture classifies nodes reliably. Here, ChebConv requires the minimum and TAGConv the maximum training time. Finally, Wikics tests the diversity-capturing capability of each model. ChebConv achieves high scores on the different accuracy metrics and takes only 507 s to train; GENConv, TAGConv, and SAGEConv also score above 90% on these metrics. However, GCNConv and GATConv fail to achieve good results, and their accuracy drops dramatically. SAGEConv achieves the best training time, while GENConv requires a long time to finish training. Overall, TAGConv and SAGEConv can be recommended for practical deployment, delivering good results in a reasonable amount of time.
On citation networks (Citeseer, Cora, and PubMed), performance becomes more model-sensitive. TAGConv and SAGEConv show clear advantages on Citeseer, where sparse connectivity and noisy features require either higher-order filtering or adaptive aggregation. In contrast, on Cora, all models achieve comparable performance, reflecting the strong homophily and clean structure of the graph, where even simple convolution operators are sufficient. More challenging datasets, such as Corafull and Wikics, reveal sharper contrasts among models. TAGConv performs best on Corafull, suggesting that higher-order neighborhood information becomes important when class boundaries are very close and the number of classes is large. On Wikics, GENConv and ChebConv significantly outperform other models, while GATConv degrades severely, indicating that attention mechanisms can be unstable on graphs with dense inter-class connectivity and limited labeled data. Across all datasets, the proposed architecture consistently yields strong and stable performance, particularly for SAGEConv, GENConv, and TAGConv. These results demonstrate that different GNN variants exhibit distinct strengths depending on graph structure, and that the unified GNN and MLP framework provides a robust and transferable baseline for node classification across heterogeneous graph domains.
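The unified GNN-plus-MLP framework can be summarized schematically: a neighborhood-aggregation layer produces node representations, and an MLP head maps them to class scores. The numpy sketch below uses random weights, a single mean-aggregation layer, and illustrative dimensions; it is not the trained configuration from the paper, whose layer counts and hyperparameters are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: 5 nodes, 8-dim features, 3 classes (illustrative sizes).
n, d_in, d_hid, n_cls = 5, 8, 16, 3
X = rng.normal(size=(n, d_in))
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)       # make the adjacency undirected
np.fill_diagonal(A, 1.0)     # add self-loops so every row has a neighbor

def relu(z):
    return np.maximum(z, 0.0)

# GNN part: mean aggregation over (self + neighbors), then linear + ReLU.
W_g = rng.normal(size=(d_in, d_hid))
H = relu((A / A.sum(axis=1, keepdims=True)) @ X @ W_g)

# MLP part: one hidden layer, then class logits for the final decision.
W1, W2 = rng.normal(size=(d_hid, d_hid)), rng.normal(size=(d_hid, n_cls))
logits = relu(H @ W1) @ W2
pred = logits.argmax(axis=1)  # one class label per node
```

In the studied architecture, swapping the aggregation step (TAGConv, SAGEConv, GENConv, and so on) changes how H is formed, while the MLP head on top stays the same across variants.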
The node embedding performance has also been evaluated for each of the models. We report only the results of those models that performed best on each dataset. Figure 3 depicts the node embedding results for different datasets. In Figure 3a, we see the node embedding results for the Amazon-computer dataset using the SAGEConv model. In the figure, 10 different classes are clearly visible, which indicates the model’s superiority in node embedding. There are some overlapping regions where the model incorrectly classified nodes. Figure 3b shows the node embedding for the Amazon-photo dataset using the GENConv model. As the model achieved more than 95% accuracy, we can see a clear distinction between different classes. The node embedding results of the Citeseer dataset are presented in Figure 3c. Six different clusters represent the different node types in the dataset; this result was produced by the SAGEConv model. The Cora dataset node embedding result is depicted in Figure 3d. This result was produced by using the SAGEConv model. The seven clusters represent the seven different classes of the Cora dataset. Figure 3e shows the Corafull node embedding using the TAGConv model. We can see small clusters representing different classes in the Corafull dataset. Another widely adopted dataset is PubMed, which includes three classes. We can see the three different classes of PubMed in Figure 3f. Here, we have utilized SAGEConv to generate the node embedding results. The Wikics node embedding result is shown in Figure 3g. A total of 10 classes are visible in the dataset after performing the node embedding operation with GENConv.
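Two-dimensional plots like those in Figure 3 are obtained by projecting the learned embeddings to the plane. The paper does not restate the projection method here, so the sketch below uses a plain PCA projection via numpy’s SVD as a stand-in (t-SNE is a common alternative for such cluster plots):

```python
import numpy as np

def project_2d(H):
    """Project node embeddings H (n x d) to 2-D via PCA (SVD on centered data)."""
    Hc = H - H.mean(axis=0)
    U, S, Vt = np.linalg.svd(Hc, full_matrices=False)
    return Hc @ Vt[:2].T  # coordinates along the top-2 principal axes

rng = np.random.default_rng(1)
H = rng.normal(size=(100, 64))  # stand-in for learned node embeddings
coords = project_2d(H)          # shape (100, 2), ready for a scatter plot
```

Coloring the projected points by class label then reveals the cluster separation (or overlap) discussed for each dataset.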
A comprehensive benchmark against previous studies is provided in Table 4. Each column reports the test accuracy on one of the seven datasets tested in the experiment. The proposed architecture shows competitive performance across all datasets compared with previous studies; Table 4 lists the earlier methods alongside our results to allow a fair comparison. For the Amazon-computer and Amazon-photo datasets, the proposed architecture performs 1% better than the CGRL method. On the Citeseer dataset, it improves node classification accuracy by 3% compared with the other studies, and on Cora it achieves 2% higher accuracy than previous work. A good comparative accuracy gain of about 6% is achieved on the PubMed dataset, and on Corafull and Wikics the proposed model gains 4% over previous methods. Note that no single GNN technique achieves this performance on every dataset; different GNN variants are needed to obtain the best results on different datasets.

4.4. Performance Under Noise Conditions

We test the different GNN-based models for noise resilience using two types of noise. To evaluate model robustness under feature perturbations, we inject additive Gaussian noise into the node attributes. Given an input graph G with feature matrix X, the noisy version X̃ is generated as
X̃ = X + δ · N(0, I),
where N(0, I) denotes a standard normal distribution and δ controls the noise intensity. To evaluate robustness against noisy supervision, we introduce label corruption through random label flipping. Given an input graph G with node labels y, a noisy label set ỹ is generated over the nodes of a specified data split. The input graph is first cloned to preserve the original data; nodes are then selected according to a predefined mask corresponding to the chosen scope, and each selected node’s label is flipped with probability p.
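The two corruption procedures described above can be sketched directly in numpy; the function names and the toy feature matrix are illustrative, and the label-flipping sketch follows the per-node probability-p interpretation given in the text:

```python
import numpy as np

def add_feature_noise(X, delta, rng):
    """Additive Gaussian feature noise: X_tilde = X + delta * N(0, I)."""
    return X + delta * rng.standard_normal(X.shape)

def flip_labels(y, p, mask, n_classes, rng):
    """Flip each masked label with probability p to a different random class."""
    y_noisy = y.copy()  # clone so the original labels are preserved
    flip = mask & (rng.random(y.shape[0]) < p)
    for i in np.flatnonzero(flip):
        choices = [c for c in range(n_classes) if c != y[i]]
        y_noisy[i] = rng.choice(choices)
    return y_noisy

rng = np.random.default_rng(42)
X = rng.normal(size=(6, 4))                     # toy feature matrix
y = np.array([0, 1, 2, 0, 1, 2])                # toy labels
train_mask = np.array([True, True, True, True, False, False])

X_noisy = add_feature_noise(X, delta=0.2, rng=rng)
y_noisy = flip_labels(y, p=0.5, mask=train_mask, n_classes=3, rng=rng)
```

Only nodes inside the chosen split mask can be corrupted, so evaluation labels outside the mask remain clean, matching the experimental setup described above.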
Figure 4 illustrates the performance of the various GNN models in the noise resilience experiments with feature noise, while Figure 5 depicts performance under label noise. We conducted experiments with noise levels ranging from 0% to 40% across the benchmark datasets. On all datasets, a consistent reduction in classification accuracy was observed as the noise intensity increased, confirming the adverse impact of corrupted node attributes and labels. Under feature noise, most models degrade gradually and relatively smoothly. In particular, GCNConv, TAGConv, and GENConv demonstrate strong robustness, maintaining stable accuracy even at high noise levels; this behavior stems from their normalized and topology-aware aggregation mechanisms, which effectively suppress random perturbations of the node features. GATConv shows moderate stability, benefiting from attention-based weighting of neighbors, whereas SAGEConv is slightly more sensitive. In contrast, ChebConv consistently exhibits the largest performance drop across datasets, indicating that spectral filtering is less effective against feature corruption. Label noise leads to a substantially more severe performance decline, since corrupted labels directly affect the training objective. Across most datasets, GCNConv and TAGConv remain the most robust under label corruption, preserving relatively high accuracy even at 40% noise. GATConv also shows competitive robustness in several cases, although its performance is more dataset-dependent. Conversely, GENConv suffers significant degradation under label noise, particularly on Cora, Citeseer, and Corafull, reflecting its strong dependence on clean supervision. ChebConv and SAGEConv also display notable sensitivity, especially on large-scale and sparse networks. Dataset-specific characteristics further influence these robustness patterns.
On homophilic and well-connected graphs such as Amazon and PubMed, neighborhood aggregation effectively mitigates both feature and label noise. On sparse citation networks such as Citeseer and Cora, models relying on stable normalization (GCNConv and TAGConv) outperform more expressive architectures. For Wikics, attention-based methods become unstable under label noise, indicating that noisy supervision can severely distort learned attention weights. Overall, these results demonstrate that simple and well-regularized aggregation schemes provide superior robustness in noisy environments. GCNConv and TAGConv consistently achieve the best balance between expressiveness and stability, making them more suitable for real-world applications where data imperfections are unavoidable. In contrast, highly expressive or spectral-based models are more susceptible to noise amplification, particularly under corrupted supervision. This comprehensive analysis highlights the importance of selecting robust GNN architectures when deploying graph-based learning systems in practical scenarios.

5. Conclusions

This paper presents a comprehensive study of an integrated GNN and MLP structure for learning node representations on different graph datasets. The proposed GNN and MLP structure provides a robust node representation learning technique across datasets, and the experimental results show that it outperforms existing approaches. Among the GNN techniques, SAGEConv and GENConv show excellent performance in classifying nodes across different datasets, while GCNConv and TAGConv exhibit good resilience against noise. The benchmark comparison shows that the proposed architecture achieves the highest accuracy relative to previous studies. This study can be extended to deeper GNNs with attention mechanisms, potentially improving performance on large and heterogeneous graphs.

Author Contributions

Conceptualization, M.A.S.S., M.H.R., and S.R.S.; methodology, M.A.S.S., M.A.A., and I.H.; software, M.A.S.S., I.H., M.S.I., and S.R.S.; validation, M.A.S.S., M.H.R., and M.A.A.; formal analysis, M.A.S.S., M.H.R., M.A.A., I.H., M.S.I., and S.R.S.; investigation, M.A.S.S., M.H.R., M.A.A., I.H., M.S.I., and S.R.S.; resources, H.-K.S.; data curation, M.A.S.S., M.H.R., M.A.A., I.H., M.S.I., and S.R.S.; writing—original draft preparation, M.A.S.S. and S.R.S.; writing—review and editing, M.A.S.S. and H.-K.S.; visualization, M.H.R., M.A.A., I.H., M.S.I., and S.R.S.; supervision, H.-K.S.; project administration, H.-K.S.; funding acquisition, H.-K.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP), South Korea under the metaverse support program to nurture the best talents (IITP-2025-RS-2023-00254529) grant funded by the Korea government Ministry of Science and ICT (MSIT) and in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Korea (2020R1A6A1A03038540) and in part by the MSIT, Korea, under the IITP and Information Technology Research Center (ITRC) support program (IITP-2026-RS-2024-00438007) grant funded by the Korea government MSIT.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to ongoing research and planned follow-up studies.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. L’Heureux, A.; Grolinger, K.; Elyamany, H.F.; Capretz, M.A.M. Machine Learning With Big Data: Challenges and Approaches. IEEE Access 2017, 5, 7776–7797. [Google Scholar] [CrossRef]
  2. Sejan, M.A.S.; Rahman, M.H.; Aziz, M.A.; Tabassum, R.; Baik, J.I.; Song, H.K. Powerful graph neural network for node classification of the IoT network. Internet Things 2024, 28, 101410. [Google Scholar] [CrossRef]
  3. Zhang, S.; Tong, H.; Xu, J.; Maciejewski, R. Graph convolutional networks: A comprehensive review. Comput. Soc. Netw. 2019, 6, 11. [Google Scholar] [CrossRef] [PubMed]
  4. Paul, S.G.; Saha, A.; Hasan, M.Z.; Noori, S.R.H.; Moustafa, A. A Systematic Review of Graph Neural Network in Healthcare-Based Applications: Recent Advances, Trends, and Future Directions. IEEE Access 2024, 12, 15145–15170. [Google Scholar] [CrossRef]
  5. Gupta, A.; Matta, P.; Pant, B. Graph neural network: Current state of Art, challenges and applications. Mater. Today Proc. 2021, 46, 10927–10932. [Google Scholar] [CrossRef]
  6. Sejan, M.A.S.; Rahman, M.H.; Aziz, M.A.; Tabassum, R.; Hameed, I.; Nasser, N.; Song, H.K. Graph neural network enhanced internet of things node classification with different node connections. J. Netw. Comput. Appl. 2025, 244, 104363. [Google Scholar] [CrossRef]
  7. Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online learning of social representations. In KDD ’14: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; Association for Computing Machinery: New York, NY, USA, 2014; pp. 701–710. [Google Scholar] [CrossRef]
  8. Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 855–864. [Google Scholar]
  9. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  10. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  11. Li, Y.; Jian, C.; Zang, G.; Song, C.; Yuan, X. Node classification oriented adaptive multichannel heterogeneous graph neural network. Knowl.-Based Syst. 2024, 292, 111618. [Google Scholar] [CrossRef]
  12. Khemani, B.; Patil, S.; Kotecha, K.; Tanwar, S. A review of graph neural networks: Concepts, architectures, techniques, challenges, datasets, applications, and future directions. J. Big Data 2024, 11, 18. [Google Scholar] [CrossRef]
  13. Shchur, O.; Mumme, M.; Bojchevski, A.; Günnemann, S. Pitfalls of Graph Neural Network Evaluation. arXiv 2018, arXiv:1811.05868. [Google Scholar]
  14. Seo, C.; Jeong, K.J.; Lim, S.; Shin, W.Y. SiReN: Sign-aware recommendation using graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 4729–4743. [Google Scholar] [CrossRef] [PubMed]
  15. Ye, M.; Liang, X.; Pan, C.; Xu, Y.; Jiang, M.; Li, C. Graph Neural Networks Based Channel Estimation for mmWave Massive MIMO Systems. IEEE Trans. Veh. Technol. 2025, 74, 19420–19435. [Google Scholar] [CrossRef]
  16. Yuan, L.; Jiang, P.; Hou, W.; Huang, W. G-MLP: Graph Multi-Layer Perceptron for Node Classification Using Contrastive Learning. IEEE Access 2024, 12, 104909–104919. [Google Scholar] [CrossRef]
  17. Shin, J.; Kaneko, Y.; Miah, A.S.M.; Hassan, N.; Nishimura, S. Anomaly detection in weakly supervised videos using multistage graphs and general deep learning based spatial-temporal feature enhancement. IEEE Access 2024, 12, 65213–65227. [Google Scholar] [CrossRef]
  18. Yang, Z.; Cohen, W.W.; Salakhutdinov, R. Revisiting Semi-Supervised Learning with Graph Embeddings. In ICML’16: Proceedings of the 33rd International Conference on International Conference on Machine Learning—Volume 48; PMLR: New York, NY, USA, 2016. [Google Scholar]
  19. Morris, C.; Ritzert, M.; Fey, M.; Hamilton, W.L.; Lenssen, J.E.; Rattan, G.; Grohe, M. Weisfeiler and Leman Go Neural: Higher-Order Graph Neural Networks. In AAAI’19/IAAI’19/EAAI’19: Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence and Thirty-First Innovative Applications of Artificial Intelligence Conference and Ninth AAAI Symposium on Educational Advances in Artificial Intelligence; AAAI Press: Honolulu, HI, USA, 2019. [Google Scholar]
  20. Du, J.; Zhang, S.; Wu, G.; Moura, J.M.F.; Kar, S. Topology Adaptive Graph Convolutional Networks. arXiv 2017, arXiv:1710.10370. [Google Scholar]
  21. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  22. Li, G.; Xiong, C.; Thabet, A.; Ghanem, B. Deepergcn: All you need to train deeper gcns. arXiv 2020, arXiv:2006.07739. [Google Scholar] [CrossRef]
  23. Mo, Y.; Peng, L.; Xu, J.; Shi, X.; Zhu, X. Simple unsupervised graph representation learning. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Menlo Park, CA, USA, 2022; Volume 36, pp. 7797–7805. [Google Scholar]
  24. Peng, Z.; Huang, W.; Luo, M.; Zheng, Q.; Rong, Y.; Xu, T.; Huang, J. Graph representation learning via graphical mutual information maximization. In Proceedings of the Web Conference 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 259–270. [Google Scholar]
  25. Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Deep graph contrastive representation learning. arXiv 2020, arXiv:2006.04131. [Google Scholar] [CrossRef]
  26. Hassani, K.; Khasahmadi, A.H. Contrastive multi-view representation learning on graphs. In Proceedings of the International Conference on Machine Learning; Proceedings of Machine Learning Research: Waterloo, ON, Canada, 2020; pp. 4116–4126. [Google Scholar]
  27. Shou, Y.; Lan, H.; Cao, X. Contrastive graph representation learning with adversarial cross-view reconstruction and information bottleneck. Neural Netw. 2025, 184, 107094. [Google Scholar] [CrossRef] [PubMed]
  28. Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Graph contrastive learning with adaptive augmentation. In Proceedings of the Web Conference 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 2069–2080. [Google Scholar]
  29. Peng, L.; Mo, Y.; Xu, J.; Shen, J.; Shi, X.; Li, X.; Shen, H.T.; Zhu, X. GRLC: Graph representation learning with constraints. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 8609–8622. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The proposed system architecture consists of two parts: GNN layers followed by an MLP classifier layer.
Figure 2. Node classification accuracy for seven datasets utilizing six GNN models, where the proposed architecture significantly boosts performance, particularly for SAGEConv, GENConv, and TAGConv.
Figure 3. Node embedding results for different datasets, shown for the model achieving the highest node classification accuracy on each: (a) Amazon-computer with SAGEConv, (b) Amazon-photo with GENConv, (c) Citeseer with SAGEConv, (d) Cora with SAGEConv, (e) Corafull with TAGConv, (f) PubMed with SAGEConv, and (g) Wikics with GENConv.
Figure 4. Test accuracy of different GNN and MLP models under varying levels of Gaussian feature noise: (a) Amazon-Computers, (b) Amazon-Photo, (c) Citeseer, (d) Cora, (e) Corafull, (f) PubMed, and (g) Wikics datasets.
Figure 5. Test accuracy of different GNN and MLP models under varying levels of label noise: (a) Amazon-Computers, (b) Amazon-Photo, (c) Citeseer, (d) Cora, (e) Corafull, (f) PubMed, and (g) Wikics datasets.
Table 1. Training, Test, and Validation Accuracy comparison of different models.
Dataset | Metric | TAGConv | SAGEConv | GCNConv | GATConv | GENConv | ChebConv
Amazon-computer | Training accuracy | 0.99 ± 0.05 | 0.99 ± 0.04 | 0.95 ± 0.16 | 0.95 ± 0.20 | 0.99 ± 0.04 | 0.98 ± 0.07
Amazon-computer | Validation accuracy | 0.88 ± 0.04 | 0.89 ± 0.03 | 0.88 ± 0.05 | 0.88 ± 0.07 | 0.89 ± 0.05 | 0.86 ± 0.05
Amazon-computer | Test accuracy | 0.89 ± 0.04 | 0.90 ± 0.03 | 0.89 ± 0.05 | 0.88 ± 0.07 | 0.89 ± 0.05 | 0.86 ± 0.06
Amazon-photo | Training accuracy | 0.99 ± 0.06 | 0.96 ± 0.16 | 0.90 ± 0.29 | 0.87 ± 0.19 | 0.94 ± 0.17 | 0.90 ± 0.25
Amazon-photo | Validation accuracy | 0.93 ± 0.05 | 0.93 ± 0.06 | 0.87 ± 0.30 | 0.84 ± 0.21 | 0.90 ± 0.08 | 0.85 ± 0.20
Amazon-photo | Test accuracy | 0.93 ± 0.05 | 0.93 ± 0.06 | 0.88 ± 0.30 | 0.84 ± 0.21 | 0.90 ± 0.08 | 0.86 ± 0.20
Citeseer | Training accuracy | 0.86 ± 0.15 | 0.76 ± 0.24 | 0.78 ± 0.26 | 0.80 ± 0.21 | 0.81 ± 0.19 | 0.74 ± 0.24
Citeseer | Validation accuracy | 0.75 ± 0.11 | 0.72 ± 0.19 | 0.73 ± 0.30 | 0.73 ± 0.18 | 0.73 ± 0.14 | 0.71 ± 0.19
Citeseer | Test accuracy | 0.75 ± 0.12 | 0.72 ± 0.20 | 0.73 ± 0.31 | 0.74 ± 0.18 | 0.74 ± 0.14 | 0.72 ± 0.20
Cora | Training accuracy | 0.99 ± 0.05 | 0.99 ± 0.04 | 0.98 ± 0.08 | 0.98 ± 0.07 | 0.99 ± 0.04 | 0.99 ± 0.06
Cora | Validation accuracy | 0.88 ± 0.04 | 0.88 ± 0.04 | 0.84 ± 0.06 | 0.86 ± 0.06 | 0.84 ± 0.78 | 0.86 ± 0.04
Cora | Test accuracy | 0.88 ± 0.04 | 0.88 ± 0.03 | 0.84 ± 0.06 | 0.84 ± 0.05 | 0.84 ± 0.03 | 0.87 ± 0.04
Corafull | Training accuracy | 0.97 ± 0.09 | 0.98 ± 0.09 | 0.93 ± 0.15 | 0.93 ± 0.15 | 0.81 ± 0.19 | 0.93 ± 0.15
Corafull | Validation accuracy | 0.66 ± 0.05 | 0.65 ± 0.05 | 0.62 ± 0.08 | 0.61 ± 0.07 | 0.63 ± 0.04 | 0.61 ± 0.05
Corafull | Test accuracy | 0.67 ± 0.05 | 0.64 ± 0.06 | 0.61 ± 0.08 | 0.61 ± 0.07 | 0.61 ± 0.04 | 0.60 ± 0.05
PubMed | Training accuracy | 0.87 ± 0.11 | 0.85 ± 0.12 | 0.84 ± 0.12 | 0.83 ± 0.14 | 0.85 ± 0.11 | 0.84 ± 0.13
PubMed | Validation accuracy | 0.86 ± 0.11 | 0.84 ± 0.12 | 0.84 ± 0.27 | 0.82 ± 0.14 | 0.85 ± 0.11 | 0.84 ± 0.12
PubMed | Test accuracy | 0.87 ± 0.11 | 0.85 ± 0.12 | 0.85 ± 0.11 | 0.83 ± 0.14 | 0.86 ± 0.11 | 0.84 ± 0.13
Wikics | Training accuracy | 0.86 ± 0.12 | 0.95 ± 0.11 | 0.73 ± 0.12 | 0.85 ± 0.10 | 0.99 ± 0.03 | 0.91 ± 0.13
Wikics | Validation accuracy | 0.79 ± 0.09 | 0.82 ± 0.02 | 0.71 ± 0.11 | 0.79 ± 0.09 | 0.85 ± 0.02 | 0.79 ± 0.09
Wikics | Test accuracy | 0.79 ± 0.08 | 0.82 ± 0.08 | 0.72 ± 0.12 | 0.81 ± 0.09 | 0.84 ± 0.02 | 0.79 ± 0.08
Table 2. Comparison of test accuracy improvements achieved by the proposed model over only GNN and only MLP baselines under the same system configuration.
Dataset | Baseline | TAGConv | SAGEConv | GCNConv | GATConv | GENConv | ChebConv
Amazon-computer | Only GNN | +4% | +6% | +7% | +5% | +19% | +6%
Amazon-computer | Only MLP | +33% | +34% | +33% | +32% | +33% | +31%
Amazon-photo | Only GNN | +3% | +3% | +1% | +1% | +7% | +2%
Amazon-photo | Only MLP | +36% | +36% | +27% | +27% | +33% | +29%
Citeseer | Only GNN | +4% | +1% | +2% | +3% | +17% | +6%
Citeseer | Only MLP | +5% | +2% | +3% | +4% | +4% | +2%
Cora | Only GNN | +4% | +3% | +2% | +2% | +23% | +7%
Cora | Only MLP | +12% | +12% | +8% | +8% | +8% | +11%
PubMed | Only GNN | +4% | +1% | +3% | +2% | +6% | +3%
PubMed | Only MLP | +5% | +3% | +3% | +1% | +4% | +2%
Wikics | Only GNN | +4% | +3% | +5% | +6% | +6% | +3%
Wikics | Only MLP | +8% | +11% | +1% | +10% | +13% | +8%
Table 3. Different accuracy metrics evaluation for the proposed architecture.
Dataset | Model | Macro-F1 | Micro-F1 | Precision | Recall | Time (s)
Amazon-computer | TAGConv | 0.9444 | 0.9535 | 0.9488 | 0.9404 | 827
Amazon-computer | SAGEConv | 0.9518 | 0.9557 | 0.9477 | 0.9568 | 327
Amazon-computer | GCNConv | 0.8723 | 0.8775 | 0.8590 | 0.8961 | 391
Amazon-computer | GATConv | 0.9379 | 0.9458 | 0.9341 | 0.9421 | 180
Amazon-computer | GENConv | 0.9516 | 0.9578 | 0.9509 | 0.9525 | 38,105
Amazon-computer | ChebConv | 0.9444 | 0.9535 | 0.9488 | 0.9404 | 235
Amazon-photo | TAGConv | 0.9773 | 0.9784 | 0.9783 | 0.9765 | 235
Amazon-photo | SAGEConv | 0.9749 | 0.9804 | 0.9747 | 0.9751 | 156
Amazon-photo | GCNConv | 0.9603 | 0.9680 | 0.9683 | 0.9532 | 170
Amazon-photo | GATConv | 0.9589 | 0.9621 | 0.9647 | 0.9537 | 70
Amazon-photo | GENConv | 0.9516 | 0.9784 | 0.9791 | 0.9717 | 13,092
Amazon-photo | ChebConv | 0.9752 | 0.9784 | 0.9746 | 0.9761 | 119
Citeseer | TAGConv | 0.8857 | 0.9024 | 0.8874 | 0.8842 | 35
Citeseer | SAGEConv | 0.8805 | 0.8904 | 0.8780 | 0.8835 | 65
Citeseer | GCNConv | 0.8356 | 0.8574 | 0.8379 | 0.8336 | 18
Citeseer | GATConv | 0.8405 | 0.8589 | 0.8388 | 0.8426 | 21
Citeseer | GENConv | 0.8457 | 0.8694 | 0.8425 | 0.8584 | 31
Citeseer | ChebConv | 0.8920 | 0.9039 | 0.8937 | 0.8906 | 37
Cora | TAGConv | 0.8711 | 0.8821 | 0.8720 | 0.8721 | 25
Cora | SAGEConv | 0.8657 | 0.8821 | 0.8700 | 0.8640 | 47
Cora | GCNConv | 0.8280 | 0.8453 | 0.8334 | 0.8245 | 19
Cora | GATConv | 0.8163 | 0.8453 | 0.8254 | 0.8103 | 24
Cora | GENConv | 0.8227 | 0.8361 | 0.8275 | 0.8210 | 28
Cora | ChebConv | 0.8694 | 0.8821 | 0.8811 | 0.8598 | 28
Corafull | TAGConv | 0.8559 | 0.8769 | 0.8607 | 0.8548 | 463
Corafull | SAGEConv | 0.8386 | 0.8619 | 0.8508 | 0.8336 | 476
Corafull | GCNConv | 0.8356 | 0.8548 | 0.8413 | 0.8370 | 91
Corafull | GATConv | 0.8270 | 0.8533 | 0.8332 | 0.8274 | 113
Corafull | GENConv | 0.8275 | 0.8480 | 0.8458 | 0.8202 | 296
Corafull | ChebConv | 0.8177 | 0.8523 | 0.8391 | 0.8110 | 4273
PubMed | TAGConv | 0.9628 | 0.9630 | 0.9628 | 0.9628 | 378
PubMed | SAGEConv | 0.9559 | 0.9566 | 0.9559 | 0.9559 | 239
PubMed | GCNConv | 0.9317 | 0.9351 | 0.9309 | 0.9324 | 177
PubMed | GATConv | 0.9405 | 0.9437 | 0.9402 | 0.9409 | 288
PubMed | GENConv | 0.9415 | 0.9445 | 0.9393 | 0.9439 | 196
PubMed | ChebConv | 0.9573 | 0.9579 | 0.9565 | 0.9582 | 99
Wikics | TAGConv | 0.9029 | 0.9090 | 0.8979 | 0.9099 | 688
Wikics | SAGEConv | 0.9128 | 0.9214 | 0.9245 | 0.9052 | 261
Wikics | GCNConv | 0.6887 | 0.7228 | 0.7422 | 0.7088 | 331
Wikics | GATConv | 0.3396 | 0.5361 | 0.7350 | 0.3356 | 19,906
Wikics | GENConv | 0.9264 | 0.9342 | 0.9276 | 0.9259 | 123,079
Wikics | ChebConv | 0.9330 | 0.9351 | 0.9409 | 0.9258 | 507
Table 4. Benchmarking with previous studies.
Method | Amazon-computer | Amazon-photo | Citeseer | Cora | PubMed | Corafull | Wikics
Raw Features [23] | 0.74 ± 0.10 | 0.79 ± 0.20 | 0.49 ± 0.30 | 0.47 ± 0.40 | 0.69 ± 0.20 | 0.43 ± 0.60 | 0.72 ± 0.90
GCN [9] | 0.84 ± 0.10 | 0.91 ± 0.10 | 0.70 ± 0.40 | 0.82 ± 0.20 | 0.79 ± 0.50 | 0.59 ± 0.60 | 0.74 ± 0.70
GAT [10] | 0.84 ± 0.10 | 0.91 ± 0.10 | 0.70 ± 0.40 | 0.83 ± 0.20 | 0.79 ± 0.50 | 0.59 ± 0.60 | 0.74 ± 0.70
DeepWalk [7] | 0.85 ± 0.10 | 0.89 ± 0.10 | 0.43 ± 0.40 | 0.81 ± 0.20 | 0.65 ± 0.50 | 0.53 ± 0.50 | 0.74 ± 0.70
GMI [24] | 0.82 ± 0.40 | 0.90 ± 0.60 | 0.72 ± 0.40 | 0.83 ± 0.20 | 0.80 ± 0.40 | 0.53 ± 0.70 | 0.75 ± 0.70
GRACE [25] | 0.87 ± 0.20 | 0.92 ± 0.30 | 0.72 ± 0.10 | 0.83 ± 0.20 | 0.80 ± 0.50 | 0.54 ± 0.60 | 0.75 ± 0.70
MVGRL [26] | 0.87 ± 0.10 | 0.92 ± 0.10 | 0.73 ± 0.40 | 0.83 ± 0.30 | 0.80 ± 0.70 | 0.59 ± 0.40 | 0.76 ± 1.10
CGRL [27] | 0.89 ± 0.50 | 0.94 ± 0.30 | 0.75 ± 0.20 | 0.86 ± 0.20 | 0.85 ± 0.60 | 0.63 ± 0.50 | 0.80 ± 0.30
GIC [28] | 0.84 ± 0.20 | 0.92 ± 0.20 | 0.72 ± 0.50 | 0.82 ± 0.50 | 0.82 ± 0.10 | 0.58 ± 0.70 | 0.76 ± 0.60
GRLC [29] | 0.87 ± 0.20 | 0.92 ± 0.20 | 0.72 ± 0.30 | 0.84 ± 0.20 | 0.82 ± 0.10 | 0.59 ± 0.60 | 0.80 ± 0.50
This study | 0.90 ± 0.03 | 0.95 ± 0.03 | 0.76 ± 0.04 | 0.88 ± 0.04 | 0.88 ± 0.02 | 0.67 ± 0.05 | 0.84 ± 0.02

Share and Cite

MDPI and ACS Style

Sejan, M.A.S.; Rahman, M.H.; Aziz, M.A.; Hameed, I.; Islam, M.S.; Sabuj, S.R.; Song, H.-K. Learning Robust Node Representations via Graph Neural Network and Multilayer Perceptron Classifier. Mathematics 2026, 14, 680. https://doi.org/10.3390/math14040680

