CA-NodeNet: A Category-Aware Graph Neural Network for Semi-Supervised Node Classification

Lu, Zichang; Zhong, Meiyu; Sun, Qiguo; Ma, Kai

doi:10.3390/electronics14163215

Open AccessArticle

CA-NodeNet: A Category-Aware Graph Neural Network for Semi-Supervised Node Classification

School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212100, China

^*

Author to whom correspondence should be addressed.

Electronics 2025, 14(16), 3215; https://doi.org/10.3390/electronics14163215

Submission received: 13 July 2025 / Revised: 8 August 2025 / Accepted: 12 August 2025 / Published: 13 August 2025

(This article belongs to the Special Issue How Graph Convolutional Networks Work: Mechanisms and Models)

Download

Browse Figures

Versions Notes

Abstract

Graph convolutional networks (GCNs) have demonstrated remarkable effectiveness in processing graph-structured data and have been widely adopted across various domains. Existing methods mitigate over-smoothing through selective aggregation strategies such as attention mechanisms, edge dropout, and neighbor sampling. While some approaches incorporate global structural context, they often underexplore category-aware representations and inter-category differences, which are crucial for enhancing node discriminability. To address these limitations, a novel framework, CA-NodeNet, is proposed for semi-supervised node classification. CA-NodeNet comprises three key components: (1) coarse-grained node feature learning, (2) category-decoupled multi-branch attention, and (3) inter-category difference feature learning. Initially, a GCN-based encoder is employed to aggregate neighborhood information and learn coarse-grained representations. Subsequently, the category-decoupled multi-branch attention module employs a hierarchical multi-branch architecture, in which each branch incorporates category-specific attention mechanisms to project coarse-grained features into disentangled semantic subspaces. Furthermore, a layer-wise intermediate supervision strategy is adopted to facilitate the learning of discriminative category-specific features within each branch. To further enhance node feature discriminability, we introduce an inter-category difference feature learning module. This module first encodes pairwise differences between the category-specific features obtained from the previous stage and then integrates complementary information across multiple feature pairs to refine node representations. Finally, we design a dual-component optimization function that synergistically combines intermediate supervision loss with the final classification objective, encouraging the network to learn robust and fine-grained node representations. Extensive experiments on multiple real-world benchmark datasets demonstrate the superior performance of CA-NodeNet over existing state-of-the-art methods. Ablation studies further validate the effectiveness of each module in contributing to overall performance gains.

Keywords:

graph convolutional network; graph-structured data; attention mechanism; inter-category feature difference; node classification; semi-supervised learning

1. Introduction

Data with graph structures are prevalent in various domains, including social networks, biological networks and citation networks. Recently, graph convolutional networks (GCNs), a class of neural networks designed to learn from graph data, have gained significant attention due to their effectiveness in addressing graph-based analytical tasks such as node classification [1,2], graph classification [3], link prediction [4], and recommendation systems [5]. The typical GCN [6] and its variants usually operate under message-passing manner, where nodes iteratively aggregate features from topological neighbors to generate embeddings for downstream tasks. The key to success lies in the effective integration of topological structure and node features throughout this process.

Existing graph convolutional networks [7,8,9] generally adopt a holistic approach [10] for representation learning by globally modeling the structural information and node features of the graph simultaneously. While this approach is effective, it also has some limitations. Node features tend to homogenize as information propagates across multiple layers, leading to oversmoothing [11], which diminishes the capacity to distinguish nodes effectively. Furthermore, the holistic approach aggregates information indiscriminately from all neighbors, including those that may contribute little or no meaningful information. Such indiscriminate aggregation leads to information redundancy, which not only increases computational complexity but also places unnecessary burdens on the model. To address these problems, several strategies have been proposed. Velickovic et al. [12] introduced a notable solution by incorporating attention mechanisms into their method. They proposed assigning different weights to neighbors based on their relevance. This strategy mitigates excessive aggregation and preserves node specificity. By selectively weighting neighbor information, it was demonstrated that attention mechanisms effectively reduce oversmoothing and alleviate information redundancy. Yu Rong et al. [13] proposed another approach that involves randomly removing edges during training to prevent over-aggregation of information. This strategy mitigates oversmoothing while maintaining the diversity of node representations in deeper layers. Hamilton et al. [14] proposed sampling-based techniques as an effective approach to address redundancy in graph neural networks. By selectively sampling a subset of neighbors during the aggregation process, their method reduces computational complexity and mitigates the risk of information redundancy. It is worth noting that there are some defects in the above existing methods. The attention weights in [12] only reflect local correlations and fail to capture global graph properties; the key edge deletion strategy in [13] leads to the loss of important information; key neighbors in [14] may be omitted due to sampling. These defects essentially stem from the inability of the holistic approach to distinguish between different scale features of graph data for representation learning. These defects call for more targeted approaches that reduce redundancy and oversmoothing while enhancing the extraction of fine-grained discriminative features to improve performance.

To address the above problems, we propose CA-NodeNet, a novel two-stage framework consisting of category-decoupled multi-branch attention and inter-category difference feature learning. In the category-decoupled multi-branch attention stage, we design multiple sub-branches. Each sub-branch employs an attention unit to extract key information from features that are aggregated using a GCN-based encoder. Specifically, the attention unit calculates attention weights with Softmax to help the model focus on the most relevant features. Furthermore, each attention unit integrates a category-specific detector and a detection loss that not only guides the detector to focus on specific features [15,16] but also facilitates the learning of salient and discriminative representations. In addition, the category-decoupled multi-branch attention stage is constrained by intermediate supervision loss that averages multiple detection losses. In the inter-category difference feature learning stage, the model leverages inter-category differences to focus on category-level distinctions during training. This process utilizes inter-category information discrepancies to further enhance category-specific features and improve the model’s ability to capture and utilize discriminative features. By integrating these two stages, the CA-NodeNet effectively enhances the model’s ability to capture complex relationships and improves node representation expressiveness. As a result, this method effectively improves the model’s performance in node classification tasks. In summary, the key novelties and contributions of this work include the following.

(1): A novel GCN-based framework, named CA-NodeNet, is proposed, which can significantly enhance node feature discriminability. The core idea of CA-NodeNet is to learn category-specific features by designing a multi-branch structure with intermediate supervision and then improve the refinement of node features by mining the inter-category difference characteristics. The experiments show that CA-NodeNet can provide better performance than other works in node classification tasks.
(2): A category-decoupled multi-branch attention module is introduced. The module employs hierarchical multi-branch architecture; each branch leverages category-specific attention (CS Attention) to project the coarse-grained data to a specific category semantic space. Then, category-specific feature (CS Feature) detectors are adopted with a layer-wise supervision strategy to mine salient correlation features of the target category.
(3): An inter-category difference feature learning module is developed. The module consists of inter-category difference coding and difference-aware feature enhancing. The former is utilized to quantify pairwise differences between previously category-specific features. The latter is used to integrate complementary information from multiple pairs of features. This module can effectively enhance CA-NodeNet to mine the inter-category difference characteristics and preserve category-specific salient features.
(4): A dual-component optimization function is designed. The function synergistically integrates intermediate supervision loss with final classification objectives, which can effectively enhance CA-NodeNet to learn fine-grained node representation.

The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 details the proposed methodology. In Section 4, we present and analyze the results of a series of experiments. Finally, Section 5 concludes the study and outlines potential directions for future work.

2. Related Works

Graph convolutional networks (GCNs) have demonstrated remarkable performance in semi-supervised node classification tasks [6]. To address the issue of over-smoothing and to enhance representation learning, various extensions have been proposed, which can be broadly categorized into four families: base encoders, attention-based methods, multi-scale fusion approaches, and contrastive learning frameworks.

Traditional graph convolution approaches, including GCN [6] and its streamlined variants such as Simplified GCN (SGC) [17] and GFNN [18], rely on spectral convolution to propagate and mix node features. While computationally efficient, these models often face limitations in depth and expressive power, making them less effective when dealing with complex structural semantics.

Building upon these foundations, attention-based techniques such as graph attention networks (GATs) [12] introduce self-attention to adaptively weight neighboring nodes during aggregation. Further developments, such as a high-order graph attention network [19] and multi-view GCNs with attention [20], aim to capture higher-order dependencies or leverage multi-view structural signals. However, these methods predominantly focus on local adaptivity and often overlook the importance of explicitly modeling inter-category semantic differences.

To exploit richer structural information, several works explore combining information from neighborhoods at various scales. MixHop [21] mixes multi-hop features via sparsified operations, while N-GCN [22] and MOGCN [23] integrate multi-scale outputs through parallel branches. DisGNN [24] disentangles edges to identify latent semantics, and GCN+PGBSF [25] improves efficiency through progressive sampling fusion. While these approaches effectively capture structural diversity, they typically lack category-aware discrimination capabilities.

A more recent trend introduces contrastive learning to enhance robustness through augmentation-based objectives. Methods such as DGI [26], GraphCL [27], and IGCL [28] learn representations by maximizing agreement between positive pairs, while PA-GCN [29] employs perturbation-augmented training to promote invariance in graph structure changes. Although effective in improving generalization, these models seldom integrate fine-grained category guidance in representation learning.

Despite extensive efforts to mitigate over-smoothing and enhance node representation learning, existing methods generally emphasize adaptivity, structural fusion, or augmentation but rarely unify them in a category-aware framework. More critically, most models lack explicit mechanisms to model inter-category differences and category-specific learning, which are vital for fine-grained node classification. To address these limitations, we propose a novel representation paradigm that leverages category-decoupled multi-branch attention mechanisms and inter-category difference modeling to guide node representation learning in a more holistic and discriminative manner.

3. Methodology

In this section, we present CA-NodeNet, a novel framework designed for node classification tasks. The overall architecture of CA-NodeNet is illustrated in Figure 1. The proposed framework consists of three principal components: (1) a coarse-grained node feature learning module, (2) a category-decoupled multi-branch attention module, and (3) an inter-category difference feature learning module. The mathematical notations referenced throughout Section 3 are summarized in Table 1 for ease of reference.

3.1. Coarse-Grained Node Feature Learning Module

For simplicity and effectiveness, we employ the widely adopted GCN proposed in [6] as the fundamental graph encoder within our framework. As illustrated in Figure 1, the model utilizes a GCN-based encoder to generate coarse-grained node feature, defined as

g (\cdot) : R^{N \times N} \times R^{N \times F} \to R^{N \times F}

, with N being the number of nodes and F the dimension of input node features. Specifically, a GCN leverages spectral graph theory to perform convolutional operations by propagating information between nodes and their respective neighbors. Given a graph

G = (V, E)

, where V is the set of nodes and E is the set of edges, the core operation of GCN involves aggregating and updating node features based on a symmetrically normalized adjacency matrix [6]. This process enables efficient extraction of topological and feature information for effective node representation learning. For each layer, the update rule for node features is as follows:

X^{(l + 1)} = σ (\tilde{A} X^{(l)} W^{(l)}) .

(1)

Here,

X^{l}

represents the node feature matrix at layer l;

\tilde{A} = {\hat{D}}^{- \frac{1}{2}} \hat{A} {\hat{D}}^{- \frac{1}{2}}

is the symmetrically normalized adjacency matrix;

\hat{A} = A + I

, where I is the identity matrix;

\hat{D} = D + I

is the degree matrix with self-loops.

W^{(l)} \in R^{F \times F_{h}}

represents the weight matrix of the network at layer l, with F being the dimension of input node features and

F_{h}

the dimension of output features.

σ (\cdot)

denotes

R e L U (\cdot) = max (0, \cdot)

. This process propagates the information of a node to its neighbors and achieves feature aggregation among nodes. Eventually, after multiple layers of GCNs, the output node’s representation can effectively capture the structural information and node features of the graph. In this paper, we employ a two-layer GCN encoder to obtain the coarse-grained node representation

Z \in R^{N \times F_{h}}

.

3.2. Category-Decoupled Multi-Branch Attention Module

As described in Section 3.1, feature aggregation for each node is accomplished using a GCN-based encoder, which integrates information from a node and its neighbors according to the underlying graph topology. Building upon this encoder, we propose a category-decoupled multi-branch attention module to extract salient and category-specific discriminative features. The architecture and operation of this module are depicted in Figure 1. The module is designed to enhance node representations by emphasizing informative features relevant to each category, thereby improving the discriminative capacity of the model.

The coarse-grained node representations, Z, are generated by utilizing a GCN-based encoder that processes the node feature matrix from the training set. With a coarse-grained node representation Z, our category-decoupled multi-branch attention module aims to learn salient and specific discriminative feature sets

{Z_{1}, \dots, Z_{k}, \dots, Z_{K}}

, where K is the number of categories. Inside the category-decoupled multi-branch attention module, we design K sub-branches for Z, in which the kth branch learns one category-specific feature

Z_{k}

for Z with respect to the kth category. Each sub-branch is composed of a category-specific attention (CS Attention) unit, a category-specific detector, and detection loss. CS Attention focuses on extracting the weights for Z with respect to category, the category-specific detector aligning with

Z_{k}

, and intermediate supervision loss, which constrains each category-specific feature over all sub-branches and is averaged from K detection losses.

Category-Specific Attention: In the context of the attention mechanism, the softmax function [30] ensures that the computed weights are normalized to sum to one. This normalization not only facilitates focusing on task-relevant dimensions with higher attention values but also suppresses the influence of less important feature dimensions [30], thereby enhancing the model’s discriminative capacity. We employ an attention mechanism to extract category-specific features, enhancing classification accuracy while reducing irrelevant information. Firstly, the input features are processed by a fully connected layer. Then, attention weights are computed via a softmax function and used to perform a weighted sum over the input features, forming category-specific features. Finally, these features are passed through fully connected layers and a sigmoid function for classification, enabling more precise detection. In our work, the attention mechanism provides flexibility in our model to learn K category-specific features

Z_{1}, \dots, Z_{k}, \dots, Z_{K}

from the feature Z. As shown in Figure 1, K CS Attention units are used. The details of each CS Attention unit are depicted in Figure 2. Specifically, the feature Z is separately connected to K fully connected layers. After activation by softmax, the attention weight of Z is obtained for each specific category:

a_{k} = {softmax}_{k} (Z),

(2)

where

a_{k} = [a_{k}^{1}, \dots, a_{k}^{n}, \dots, a_{k}^{N}] \in R^{N \times D}

is a matrix that has the same dimensions as Z,

R^{N \times D}

is the dimension of Z, and

n \in (1, N), k \in (1, K)

. Then, the representation

Z_{k} = [Z_{k}^{1}, \dots, Z_{k}^{n}, \dots, Z_{k}^{N}] \in R^{N \times D}

for each specific category is given by the following:

Z_{k} = a_{k} ⊙ Z,

(3)

where ⊙ denotes element-wise multiplication.

Category-Specific Detector: Each category-specific detector comprises two fully connected layers followed by a sigmoid layer as the output. The fully connected layers serve as classifiers, while the sigmoid layer functions as the activation mechanism, enabling the mapping of outputs to probabilities. Note that the sigmoid layer is not the only option for the detector. The kth category-specific detector is shown in Figure 2. For the training samples, the sub-branch for the kth category is trained by optimizing the following detection loss:

L_{sub_detec} (θ; g_{k}) = - [g_{k} log ({\hat{g}}_{k}) + (1 - g_{k}) log (1 - {\hat{g}}_{k})],

(4)

where

g_{k}

and

{\hat{g}}_{k} \in R^{N}

denote the ground truth (either 1 or 0) and the probability of samples belonging to the kth category, respectively. For example, if the category of samples is the kth category, then the ground truth of samples in the kth detector is 1 and that of the other detectors is 0.

L_{sub_detec}

allows the network to generate category-specific features for every sample.

Intermediate Supervision Loss: The category-decoupled multi-branch attention module is regulated by intermediate supervision loss, which is computed as the mean of K detection losses derived from the sub-branches. Specifically, for each sub-branch, a distinct detection loss is calculated, and the intermediate supervision loss is defined as the average of these K individual detection losses:

L_{detec} (θ; g) = \frac{1}{K} \sum_{k = 1}^{K} L_{sub_detec} (θ; g_{k}) .

(5)

By restraining the intermediate supervision loss of

L_{detec}

and the attention mechanism, the category-decoupled multi-branch attention module obtains salient and discriminative category-aware features for node classification.

3.3. Inter-Category Difference Feature Learning Module

The inter-category difference feature learning module is illustrated in Figure 3. The module computes feature differences to enhance category-specific features and strengthen the model’s capability to distinguish between different categories. This module comprises two sub-components: inter-category difference encoding and difference-aware feature enhancement. The inter-category difference encoding sub-module is responsible for quantifying pairwise differences between previously category-specific features, while the difference-aware feature enhancement sub-module further refines node representations.

Inter-Category Difference Encoding: To capture feature differences across categories, we first quantify pairwise differences between previously obtained category-specific features

{Z_{1}, \dots, Z_{k}, \dots, Z_{K}}

. These quantified difference values are then encoded to capture the discriminative information divergence across different categories. Specifically, this scheme calculates the distances between pairwise features using the Euclidean distance and encodes these distances as metrics to quantitatively assess inter-category differences. Then, we focus on pairs of features with significant differences to extract their complementary information and use it to enhance the feature representations of the target category:

D^{(v, w)} = \sqrt{tr ({(Z_{v} - Z_{w})}^{⊤} (Z_{v} - Z_{w}))},

(6)

where

D^{(v, w)}

represents the distance between the vth and wth categories.

Z_{v}

and

Z_{w}

are the category-specific features, and tr(·) represents the trace of matrix. To precisely distinguish the differences between pairwise features, the most significant inconsistencies within each pairwise feature set are utilized to constrain the receptive field during feature enhancement, ensuring a more targeted and effective improvement of feature representations. Thus, the process of difference encoding can be defined as follows:

D^{(v, w)} = G (Diag (D^{(v, w)})) \cdot W_{d}^{(v, w)},

(7)

where

D^{(v, w)}

represents the difference coefficient between the vth and wth categories.

Diag : R^{n \times n} \to R^{n \times 1}

extracts the diagonal elements of the matrix as a vector, and

W_{d}^{(v, w)}

is a learnable difference weight matrix.

G

is a difference encoder that retains the top-k largest distances in

Diag (D^{(v, w)})

and sets the rest to zero.

Difference-Aware Feature Enhancing: To ensure that each category-specific feature captures comprehensive and discriminative information, a difference-aware feature enhancement mechanism is employed. This mechanism is designed to learn and integrate complementary information from multiple pairs of feature representations. Specifically, the process of category-specific feature enhancing can be formally expressed as follows:

Z_{v}^{*} = Z_{v} + \sum_{\begin{matrix} w = 1 \\ w \neq v \end{matrix}}^{K} Z_{w} \cdot D^{(v, w)} .

(8)

where

Z_{v}^{*}

is the vth updated feature after considering inter-category differences. By leveraging the differences across multiple categories, we effectively compensate for the complementary information within inter-category features, enhancing their representational capacity.

3.4. Feature Fusion for Classification

Following the inter-category difference feature learning module, the fusion of K enhanced feature representations remains a significant challenge. To address this, we adopt element-wise summation to integrate all enhanced features, ensuring an efficient and unified representation. Consequently, the fuse feature representation

Z^{'}

is defined as follows:

Z^{'} = \sum_{k = 1}^{K} Z_{k}^{*},

(9)

where

Z_{k}^{*}

is the updated enhanced feature after considering inter-category differences.

Then, the fused feature

Z^{'}

is fed into the final classification module, which consists of two fully connected layers. The first fully connected layer is followed by a dropout layer (with a dropout probability of 0.5). Lastly, the output of the last fully connected layer is activated by a softmax unit. Here, the cross-entropy loss for node classification over all training nodes in X is represented as

L_{c l s}

:

L_{c l s} (ϕ; y_{i j}) = - \sum_{i \in X} \sum_{j = 1}^{K} y_{i j} ln {\hat{y}}_{i j},

(10)

where

y_{i j}

is the set of node indices that have labels,

y_{i j}

is the associated ground truth label vector (same as the one-hot vector),

{\hat{y}}_{i j}

denotes the associated prediction probabilities for the every sample,

{\hat{y}}_{i j} = [{\hat{y}}_{1}, {\hat{y}}_{2}, \dots, {\hat{y}}_{K}]

, and K is the number of categories.

A Dual-Component Optimization Function: The intermediate supervision loss

L_{detec}

in Equation (5) and classification loss of

L_{c l s}

in Equation (10) are combined to construct a novel loss for all training samples:

L (θ, ϕ; g, y_{i j}) = λ L_{detec} (θ; g) + L_{cls} (ϕ; y_{i j}) .

(11)

where the hyperparameter

λ

balances the contributions of intermediate supervision and category classification.

Finally, the model’s parameters are optimized with respect to the objective function using stochastic gradient descent. To ensure general applicability, the detailed algorithm of the proposed model is outlined in Algorithm 1.

Algorithm 1 CA-NodeNet

Require:: Graph $G = (V, E)$ , Feature Matrix $X$ , Adjacency Matrix $A$ .
Ensure:: Node classification predictions ${\hat{y}}_{i j}$ .
1:: Initialize model parameters $Θ$ .
2:: Encode node features using a GCN-based encoder through Equation (1) to obtain Z.
3:: for each category $k \in (1, K)$ do
4:: Compute CS attention weights $a_{k}$ via Equation (2)
5:: Compute category-specific feature $Z_{K}$ through Equation (3)
6:: Train category-specific detectors using Equation (4)
7:: end for
8:: Calculated intermediate supervision loss via Equation (5)
9:: for each pair of categories $(v, w)$ do
10:: Compute difference $D^{(v, w)}$ using Equations (6) and (7)
11:: Encode difference with top-k largest values.
12:: Enhance features with complementary information via Equation (8)
13:: end for
14:: Fuse all enhanced features via Equation (9) to obtain $Z^{'}$
15:: Feed $Z^{'}$ into a classifier and compute cross-entropy loss with Equation (10)
16:: Evaluate the loss with Equation (11)
17:: Update the parameter set $Θ$ via back propagation
18:: Return predictions ${\hat{y}}_{i j}$ .

4. Experiments

In this section, we present a comprehensive evaluation of the proposed CA-NodeNet by comparing its performance against state-of-the-art semi-supervised node classification methods. Furthermore, we perform ablation studies to systematically investigate the effectiveness and individual contributions of each component within CA-NodeNet.

4.1. Experimental Settings

4.1.1. Datasets

The experiments are conducted on eight benchmark datasets. The details of these datasets are summarized in Table 2. These datasets include three widely used citation networks: Cora, Citeseer, and Pubmed. Additionally, five other benchmark datasets are included: Chameleon and Squirrel from the Wikipedia network, the WebKB webpage datasets (comprising Cornell University and the University of Texas), and the ACM dataset.

Citation Networks: Cora, Citeseer, and Pubmed are standard citation network benchmark datasets [6], where nodes represent research papers and edges denote citations. The node features are represented as a bag-of-words encoding of the paper’s content, and the node labels correspond to the academic topics of the papers.
WebKB: WebKB is a webpage dataset from the computer science departments of multiple universities (e.g., Carnegie Mellon University). This study uses two subsets: Cornell and Texas. In these datasets [25], nodes are web pages, edges are hyperlinks between them, and node features are in a bag-of-words form of page content. Each web page is manually classified into one of five categories: student, project, course, staff, or faculty.
Wikipedia Networks: Chameleon and Squirrel are two page–page networks on specific topics in Wikipedia [31]. Nodes are web pages, and edges are mutual hyperlinks between them. Node features come from informative nouns in Wikipedia page content. Nodes are classified into four categories according to the average monthly traffic of web pages.

The datasets vary considerably in terms of the number of nodes, feature dimensions, and graph structure characteristics. In Cora, Citeseer, and Pubmed, many documents cite the same core papers, resulting in a one-to-many citation relationship. In contrast, the Texas, Cornell, Chameleon, and Squirrel datasets represent heterogeneous graphs, where connected nodes often exhibit low label similarity. This heterogeneity introduces significant challenges for node classification tasks, making these datasets particularly valuable for assessing the performance and robustness of graph-based models.

4.1.2. Baselines

We compare CA-NodeNet with state-of-the-art methods, which can be categorized into the following four groups:

Base Encoders: Including GCN [6], SGC [17], and GFNN [18].
Attention-Based Encoders: GAT [12].
Multi-Scale Information Fusion Encoders: Including MOGCN [23], MixHop [21], N-GCN [22], DisGNN [24], and GCN+PGBSF [25].
Contrastive Learning-Based Encoders: Including DGI [26], GraphCL [27], IGCL [28], and PA-GCN [29].

4.1.3. Evaluation Metrics

Following established works on evaluating node classification tasks [6,32,33,34,35,36,37,38,39], accuracy (ACC) and floating-point operations (FLOPs) [40] are employed as evaluation metrics to comprehensively assess the performance of the baseline models and CA-NodeNet. ACC is calculated on all testing examples based on each dataset, serving as a primary metric for evaluating classification effectiveness. FLOPs, in the context of neural networks, is a key metric that measures the computational workload and efficiency of neural network models. Specifically, FLOPs represents the total number of floating-point arithmetic operations, including addition and multiplication, performed during the training or inference process. This metric provides critical insights into the computational demands and efficiency of the proposed and baseline models.

4.1.4. Parameter Settings

The experiment is conducted on a computer system equipped with an Intel Core i7-13700F processor (5.2 GHz), 32 GB of RAM, and an NVIDIA RTX 4070Ti graphics card. The operating system used is Windows 11, and the implementation is carried out in Python 3.8 using the PyTorch 2.3.0 framework with CUDA 12.7 for deep learning computations. Model optimization is performed using the Adam [41] algorithm. The experimental results are analyzed using various performance metrics, such as accuracy, and visualizations are created using the Matplotlib version 3.7.2 library. The experiment is conducted over 500 epochs, with a learning rate of 0.005 and a dropout rate of 0.5. Consistent with the parameter settings in [6], the proposed method integrates a GCN-based encoder comprising a hidden layer with 16 neurons. Hyperparameter configurations for other comparative methods are taken from the default values specified in their corresponding articles or codebases.

4.2. Node Classification

The detailed results are presented in Table 3. Overall, CA-NodeNet consistently achieves superior performance compared to baseline models across most benchmark datasets. Based on these results, we draw the following key observations:

(1): Compared to all baseline models, CA-NodeNet demonstrates exceptional competitiveness, consistently achieving superior performance across the majority of datasets. Notably, CA-NodeNet attains outstanding results on representative benchmarks, including Cora, Pubmed, ACM, and Chameleon, as well as on the Texas and Cornell datasets.
(2): CA-NodeNet outperforms multi-scale information fusion-based encoders (MOGCN [23], MixHop [21], N-GCN [22], DisGNN [24], GCN+PGBSF [25]), contrastive learning-based encoders (DGI [26], IGCL [28], GraphCL [27], and PA-GCN [29]) and attention-based encoders (GAT) [12] across the majority of datasets. This superiority can be attributed to the model’s integration of multiple innovative modules, including category-decoupled multi-branch attention, inter-category difference feature learning, and a category-specific detector. These components enable CA-NodeNet to effectively capture category-specific discriminative features, resulting in enhanced performance and generalization. Taking the Chameleon dataset as an example, CA-NodeNet outperforms the multi-scale information fusion-based encoders for training set.
(3): As observed from Table 3, CA-NodeNet achieves leading performance on most datasets but its performance on Citeseer is weaker compared to GAT. This can be attributed to the fact that GAT dynamically models the importance of node neighbors via its attention mechanism, allowing it to perform better in situations where the graph structure is sparse or incomplete. In contrast, CA-NodeNet leverages both intra-category and inter-category information based on the original graph, demonstrating robust performance across other various datasets.

4.3. Ablation Study

To further validate the effectiveness of CA-NodeNet, we conduct ablation studies to analyze the contribution of its key components. Specifically, “w/o

C A

&

I C

” denotes the base model without the category-decoupled multi-branch attention module (

C A

) and the inter-category difference feature learning module (

I C

). “w/

C A

only” introduces only the

C A

module to assess its standalone impact on node representation. “w/

C A

&

I C

” represents the full version of CA-NodeNet with both components integrated.

To ensure reliability, all ablation experiments were repeated multiple independent times. We report the average accuracy along with standard deviation (±) for each setting. The results are shown in Table 4, which demonstrate the stability of our proposed modules.

The results of the ablation studies are presented in Table 4. From these results, several key observations can be made, as detailed below:

(1): The introduction of the category-decoupled multi-branch attention module resulted in a significant improvement in performance across all datasets except Citeseer (e.g., 81.27% → 81.4% on Cora and 47.6% → 52.8% on Chameleon). These results indicate that the attention mechanisms within the category-decoupled multi-branch attention module effectively emphasize distinctive characteristics and generate salient features, enhancing the overall model performance.
(2): The addition of the inter-category difference feature learning module resulted in notable performance improvements on most datasets (e.g., 81.4% → 83.81% on Cora and 91.7% → 92.81% on ACM). This improvement is attributed to the module’s ability to effectively consider the information differences between the extracted category-specific features. By employing a specialized inter-category difference encoder, the module processes feature pairs to efficiently identify and leverage these differences, thereby enhancing the model’s discriminative power.
(3): Additionally, we evaluate the FLOPs of the ablations and representative baseline models (MOGCN and GCN) for a comprehensive comparison, as shown in Table 5. From the results in Table 5, it can be observed that CA-NodeNet and its variant without the inter-category difference feature learning module exhibit different computational results. This is because the (w/ $C A$ only) module lacks the inter-category difference computation. Notably, compared to MOGCN, CA-NodeNet achieves lower FLOPs on some datasets (e.g., Pubmed, Citeseer), demonstrating higher efficiency alongside superior performance. However, on other datasets (e.g., Cora, Chameleon, ACM), CA-NodeNet incurs higher computational costs, which are offset by its significantly better performance. Furthermore, due to the incorporation of multiple convolutional operations, CA-NodeNet exhibits higher FLOPs compared to GCNs. Nevertheless, this computational overhead is compensated by the model’s excellent performance across various datasets.

4.4. Labeling Rates Analysis

In the field of graph learning, analyzing labeling rates is a fundamental approach to evaluating model robustness. In this study, we examine the impact of varying labeling rates on model performance. To this end, we selected five datasets for our experiments and partitioned the training samples based on different labeling rates for each dataset. For example, the Cora dataset was divided into 14/21/28/140 training samples corresponding to different labeling rates. The same procedure was applied to the other four datasets, ensuring a consistent and systematic evaluation across all datasets.

From Table 6, we make the following observations: CA-NodeNet demonstrates strong effectiveness even in scenarios with limited labeled information. Specifically, CA-NodeNet maintains considerable classification performance when the number of labeled nodes is small. For instance, on the Pubmed dataset with a label rate corresponding to a training size of nine samples, the CA-NodeNet achieves a classification accuracy of 60.2%, significantly outperforming the second-ranked PA-GCN, which achieves only 51.0% accuracy.

4.5. Visualization Analysis

We perform a visualization task on the Cora dataset under different numbers of labeled nodes in the training set to further validate the effectiveness of CA-NodeNet. The output embeddings from its final layer, prior to the application of the softmax function, are utilized for this purpose. Additionally, the embeddings of the test set are visualized using t-SNE [42]. The visualization results of the Cora dataset, as shown in Figure 4, are color-coded according to the true labels.

From Figure 4, it is evident that the results of GCNs are suboptimal, as nodes with different labels are intermingled, leading to poor category separation. In contrast, the visual representation produced by CA-NodeNet stands out, demonstrating the most optimal embeddings. These embeddings are characterized by a more compact structure, higher intra-category similarity, and well-defined boundaries between different categories. Furthermore, its visual representation remains robust even in scenarios with a limited number of labeled nodes in the training set, further highlighting its effectiveness.

5. Conclusions

In this paper, we propose CA-NodeNet, a novel GCN-based framework that enhances node feature discriminability through a hierarchical multi-branch architecture with intermediate supervision. It first aggregates features via a GCN encoder and then employs a category-decoupled multi-branch attention module to extract discriminative category-specific representations. An inter-category difference learning module further enhances these features by modeling semantic gaps between categories. To jointly optimize category-specific feature discrimination and overall classification accuracy, we design a dual-component loss function. Extensive experiments on semi-supervised node classification benchmarks demonstrate that CA-NodeNet outperforms state-of-the-art baselines. The node classification performance of CA-NodeNet exhibits a stable upward trend as the label rate increases, demonstrating efficient use of labeled data and strong fine-grained representation learning. By jointly leveraging graph structure and inter-category relationships, it effectively addresses oversmoothing limitations in prior methods.

Category-specific attention in CA-NodeNet is not only effective in improving classification accuracy but also enhances the interpretability of learned representations. Since each attention branch is designed to focus on category-specific semantic subspaces, the resulting attention weights reflect the relative importance of features with respect to each category. This design provides explainable insights into the model’s decision-making process, allowing practitioners to understand what aspects of the input features contribute most to each category. Such interpretability is particularly valuable in real-world applications that require transparency and trust. For instance, in a recommendation scenario, the attention responses of each category branch can reveal user preferences aligned with different content types. In fraud detection, the model can highlight behavioral patterns that are indicative of specific fraud types, improving the model’s diagnostic utility. Additionally, category-specific modeling enables the network to generalize to domains where category semantics differ significantly.

In future work, we plan to extend the application of the framework to additional tasks, further exploring its potential. Moreover, the strong adaptability of CA-NodeNet ensures its applicability to a wide range of other scenarios, making it a versatile tool for graph-based learning tasks.

Author Contributions

Conceptualization, Z.L. and M.Z.; methodology, Z.L.; software, Q.S.; validation, Z.L.; formal analysis, K.M.; investigation, M.Z.; resources, Q.S.; data curation, K.M.; writing—original draft preparation, Z.L.; writing—review and editing, Z.L. and M.Z.; visualization, Z.L.; supervision, M.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (42402239) and Research Start-up funding for high-level talent of Jiangsu University of Science and Technology (1132932306 and 1132932304).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Li, S.; Han, L.; Wang, Y.; Pu, Y.; Zhu, J.; Li, J. GCL: Contrastive learning instead of graph convolution for node classification. Neurocomputing 2023, 551, 126491. [Google Scholar] [CrossRef]
Wang, K.; An, J.; Zhou, M.; Shi, Z.; Shi, X.; Kang, Q. Minority-weighted graph neural network for imbalanced node classification in social networks of internet of people. IEEE Internet Things J. 2022, 10, 330–340. [Google Scholar] [CrossRef]
Gao, H.; Ji, S. Graph u-nets. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 2083–2092. [Google Scholar]
Zhang, P.; Chen, J.; Che, C.; Zhang, L.; Jin, B.; Zhu, Y. IEA-GNN: Anchor-aware graph neural network fused with information entropy for node classification and link prediction. Inf. Sci. 2023, 634, 665–676. [Google Scholar] [CrossRef]
Sun, J.; Gao, L.; Shen, X.; Liu, S.; Liang, R.; Du, S.; Liu, S. Separated graph neural networks for recommendation systems. IEEE Trans. Ind. Inform. 2022, 19, 382–393. [Google Scholar] [CrossRef]
Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
Guo, Q.; Yang, X.; Ding, W.; Qian, Y. Cross-Graph Interaction Networks. IEEE Trans. Knowl. Data Eng. 2025, 37, 2341–2355. [Google Scholar] [CrossRef]
Wang, Y.; Yang, X.; Sun, Q.; Qian, Y.; Guo, Q. Purity skeleton dynamic hypergraph neural network. Neurocomputing 2024, 610, 128539. [Google Scholar] [CrossRef]
Sun, Q.; Wei, X.; Yang, X. GraphSAGE with deep reinforcement learning for financial portfolio optimization. Expert Syst. Appl. 2024, 238, 122027. [Google Scholar] [CrossRef]
Ma, J.; Cui, P.; Kuang, K.; Wang, X.; Zhu, W. Disentangled graph convolutional networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 4212–4221. [Google Scholar]
Li, Q.; Han, Z.; Wu, X.M. Deeper insights into graph convolutional networks for semi-supervised learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
Rong, Y.; Huang, W.; Xu, T.; Huang, J. Dropedge: Towards deep graph convolutional networks on node classification. arXiv 2019, arXiv:1907.10903. [Google Scholar]
Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
Teng, Q.; Yang, X.; Sun, Q.; Wang, P.; Wang, X.; Xu, T. Sequential attention layer-wise fusion network for multi-view classification. Int. J. Mach. Learn. Cybern. 2024, 15, 5549–5561. [Google Scholar] [CrossRef]
Ju, H.; Li, J.; Ding, W.; Fan, X.; Huang, J.; Xu, S.; Yang, X. Class-specific semi-supervised feature selection with fuzzy convex balling information granularity. Inf. Sci. 2025, 700, 121821. [Google Scholar] [CrossRef]
Wu, F.; Souza, A.; Zhang, T.; Fifty, C.; Yu, T.; Weinberger, K. Simplifying graph convolutional networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6861–6871. [Google Scholar]
Hoang, N.; Maehara, T.; Murata, T. Revisiting graph neural networks: Graph filtering perspective. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 8376–8383. [Google Scholar]
He, L.; Bai, L.; Yang, X.; Du, H.; Liang, J. High-order graph attention network. Inf. Sci. 2023, 630, 222–234. [Google Scholar] [CrossRef]
Yao, K.; Liang, J.; Liang, J.; Li, M.; Cao, F. Multi-view graph convolutional networks with attention mechanism. Artif. Intell. 2022, 307, 103708. [Google Scholar] [CrossRef]
Abu-El-Haija, S.; Perozzi, B.; Kapoor, A.; Alipourfard, N.; Lerman, K.; Harutyunyan, H.; Ver Steeg, G.; Galstyan, A. Mixhop: Higher-order graph convolutional architectures via sparsified neighborhood mixing. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 21–29. [Google Scholar]
Abu-El-Haija, S.; Kapoor, A.; Perozzi, B.; Lee, J. N-gcn: Multi-scale graph convolution for semi-supervised node classification. In Proceedings of the Uncertainty in Artificial Intelligence, PMLR, Virtual, 13–18 July 2020; pp. 841–851. [Google Scholar]
Wang, J.; Liang, J.; Cui, J.; Liang, J. Semi-supervised learning with mixed-order graph convolutional networks. Inf. Sci. 2021, 573, 171–181. [Google Scholar] [CrossRef]
Zhao, T.; Zhang, X.; Wang, S. Exploring edge disentanglement for node classification. In Proceedings of the ACM Web Conference 2022, Virtual, 25–29 April 2022; pp. 1028–1036. [Google Scholar]
Cong, H.; Sun, Q.; Yang, X.; Liu, K.; Qian, Y. Enhancing graph convolutional networks with progressive granular ball sampling fusion: A novel approach to efficient and accurate GCN training. Inf. Sci. 2024, 676, 120831. [Google Scholar] [CrossRef]
Velickovic, P.; Fedus, W.; Hamilton, W.L.; Liò, P.; Bengio, Y.; Hjelm, R.D. Deep graph infomax. ICLR (Poster) 2019, 2, 4. [Google Scholar]
You, Y.; Chen, T.; Sui, Y.; Chen, T.; Wang, Z.; Shen, Y. Graph contrastive learning with augmentations. Adv. Neural Inf. Process. Syst. 2020, 33, 5812–5823. [Google Scholar]
Liang, H.; Du, X.; Zhu, B.; Ma, Z.; Chen, K.; Gao, J. Graph contrastive learning with implicit augmentations. Neural Netw. 2023, 163, 156–164. [Google Scholar] [CrossRef] [PubMed]
Guo, Q.; Yang, X.; Zhang, F.; Xu, T. Perturbation-augmented graph convolutional networks: A graph contrastive learning architecture for effective node classification tasks. Eng. Appl. Artif. Intell. 2024, 129, 107616. [Google Scholar] [CrossRef]
Lin, Z.; Feng, M.; Santos, C.N.d.; Yu, M.; Xiang, B.; Zhou, B.; Bengio, Y. A structured self-attentive sentence embedding. arXiv 2017, arXiv:1703.03130. [Google Scholar]
Pei, H.; Wei, B.; Chang, K.C.C.; Lei, Y.; Yang, B. Geom-gcn: Geometric graph convolutional networks. arXiv 2020, arXiv:2002.05287. [Google Scholar]
Zhang, X.; Ding, G.; Li, J.; Wang, W.; Wu, Q. Deep learning empowered MAC protocol identification with squeeze-and-excitation networks. IEEE Trans. Cogn. Commun. Netw. 2021, 8, 683–693. [Google Scholar] [CrossRef]
Cong, H.; Yang, X.; Liu, K.; Guo, Q. Feature-topology cascade perturbation for graph neural network. Eng. Appl. Artif. Intell. 2025, 152, 110657. [Google Scholar] [CrossRef]
Guo, Q.; Yang, X.; Guan, W.; Ma, K.; Qian, Y. Robust graph mutual-assistance convolutional networks for semi-supervised node classification tasks. Inf. Sci. 2025, 694, 121708. [Google Scholar] [CrossRef]
Guo, Q.; Yang, X.; Li, M.; Qian, Y. Collaborative graph neural networks for augmented graphs: A local-to-global perspective. Pattern Recognit. 2025, 158, 111020. [Google Scholar] [CrossRef]
Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral networks and locally connected networks on graphs. arXiv 2013, arXiv:1312.6203. [Google Scholar]
Fu, S.; Liu, W.; Tao, D.; Zhou, Y.; Nie, L. HesGCN: Hessian graph convolutional networks for semi-supervised classification. Inf. Sci. 2020, 514, 484–498. [Google Scholar] [CrossRef]
Chen, J.; Ma, T.; Xiao, C. Fastgcn: Fast learning with graph convolutional networks via importance sampling. arXiv 2018, arXiv:1801.10247. [Google Scholar]
Chen, M.; Wei, Z.; Huang, Z.; Ding, B.; Li, Y. Simple and deep graph convolutional networks. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 1725–1735. [Google Scholar]
Molchanov, P.; Tyree, S.; Karras, T.; Aila, T.; Kautz, J. Pruning convolutional neural networks for resource efficient inference. arXiv 2016, arXiv:1611.06440. [Google Scholar]
Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Wang, X.; Zhu, M.; Bo, D.; Cui, P.; Shi, C.; Pei, J. Am-gcn: Adaptive multi-channel graph convolutional networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 1243–1253. [Google Scholar]

Figure 1. The architecture of CA-NodeNet.

Figure 2. Detailed description of the kth sub-branch in category-decoupled multi-branch attention module.

Figure 3. Detailed description of the inter-category difference feature learning module.

Figure 4. Visualization of the learned node embeddings on the Cora dataset.

Table 1. Explanation of notations in Section 3.

Notations	Descriptions
$X^{(l)}$	Node feature matrix at layer l
A	Adjacency matrix of the graph
$\hat{A}$	Adjacency matrix of the graph with self-loops
$\tilde{A}$	Symmetrically normalized adjacency matrix
D	Degree matrix
$\hat{D}$	Degree matrix with self-loops
$W^{(l)}$	Weight matrix of the network at layer l
$σ$	Activation function
Z	Output node representation from GCN
$Z_{k}$	Category-specific feature of k-th category
$a_{k}$	The CS attention weight of the k-th category
${\hat{g}}_{k}$	Predicted probability of k-th category
$g_{k}$	Ground truth label of k-th category
$L_{sub_detec}$	Detection loss of k-th category
$L_{detec}$	Intermediate supervision loss, average over all categories
$Z_{v}$	Feature representation of category v
$D^{(v, w)}$	Difference coefficient between categories v and w
$Z_{k}^{*}$	Enhanced feature for k-th category after difference learning
$Z^{'}$	The fused feature after integrating the enhanced feature

Table 2. Dataset description.

Datasets	Nodes	Edges	Features	Classes	Label Rate
Cora	2708	5429	1433	7	0.05
Citeseer	3327	4732	3703	6	0.057
Pubmed	19,717	44,338	500	3	0.003
Chameleon	2277	36,101	2325	4	0.048
Texas	183	1703	309	5	0.0475
Cornell	183	1703	295	5	0.0475
Squirrel	5201	2089	217,073	4	0.048
ACM	3025	13,128	1870	3	0.02

Table 3. Accuracy comparisons on eight datasets.

	Cora	Citeseer	Pubmed	Chameleon	Texas	Cornell	Squirrel	ACM
GCN [6]	0.8127	0.7040	0.7900	0.4760	0.3923	0.2466	0.3143	0.8780
GAT [12]	0.8320	0.7260	0.7711	0.2790	0.3791	0.2447	0.3158	0.8740
SGC [17]	0.8120	0.7190	0.7890	0.2970	0.3788	0.2514	0.3178	0.8630
DGI [26]	0.8270	0.7180	0.7680	0.3560	-	-	-	0.8940
MIXHOP [21]	0.8190	0.7140	0.8080	0.4060	-	-	-	0.8740
N-GCN [22]	0.8300	0.7220	0.7950	0.4890	-	-	-	0.8800
GraphCL [27]	0.8310	0.7210	0.8060	0.4860	-	-	-	0.9020
GFNN [18]	0.8190	0.6960	0.8070	0.4740	-	-	0.2977	0.8680
MOGCN [23]	0.8240	0.7240	0.7920	0.4690	0.3847	0.2405	0.2957	0.9010
DisGNN [24]	0.8300	0.7200	0.8020	0.4730	-	-	-	0.8900
IGCL [28]	0.8350	0.7210	0.8080	0.4760	-	-	-	0.8920
GCN+PGBSF [25]	0.8048	0.7036	0.8167	0.5125	0.4076	0.2721	0.3252	0.9013
PA-GCN [29]	0.8360	0.7040	0.7930	0.4900	-	-	-	0.9090
OURS	0.8381	0.7020	0.8170	0.5380	0.4706	0.3203	0.3470	0.9281

The best result in each column is highlighted in bold.

Table 4. Ablation study on node classification. Bold: best.

Dataset	w/o $CA$ & $IC$	w/ $CA$ Only	w/ $CA$ & $IC$
Cora	0.8127 ± 0.0041	0.8140 ± 0.0045	0.8381 ± 0.0037
Citeseer	0.7040 ± 0.0053	0.6910 ± 0.0060	0.7020 ± 0.0048
Pubmed	0.7900 ± 0.0038	0.8010 ± 0.0042	0.8170 ± 0.0035
Chameleon	0.4760 ± 0.0061	0.5280 ± 0.0072	0.5380 ± 0.0068
Texas	0.3923 ± 0.0095	0.4379 ± 0.0101	0.4706 ± 0.0089
Cornell	0.2466 ± 0.0074	0.3007 ± 0.0065	0.3203 ± 0.0056
Squirrel	0.3143 ± 0.0058	0.3300 ± 0.0050	0.3470 ± 0.0043
ACM	0.8780 ± 0.0042	0.9170 ± 0.0039	0.9281 ± 0.0037

Table 5. FLOPs (millions) of ablations and baselines.

Datasets	Ablations		Baselines
	w/ $CA$ & IC	w/ $CA$ Only	MOGCN	GCN
Cora	258.2 M	217.2 M	61.6 M	13.7 M
Citeseer	797.8 M	699.5 M	1678.1 M	390.7 M
Pubmed	654.9 M	587.9 M	1358.2 M	319.6 M
Chameleon	346.8 M	301.7 M	55.6 M	12.3 M
ACM	366.6 M	327.6 M	47.8 M	11.6 M

Table 6. ACC of node classification tasks (%) (bold: best).

	Training	GCN	GAT	SGC	DGI	N-GCN	GraphCL	GFNN	MOGCN	DisGNN	IGCL	PA-GCN	OURS
Cora	14	43.8	44.5	40.9	45.1	40.8	45.3	37.8	41.9	45.5	45.0	47.6	49.3
	21	50.9	55.6	51.2	53.8	52.4	54.0	48.9	62.8	54.2	54.2	56.7	64.3
	28	62.3	66.8	61.7	69.9	61.8	70.1	60.3	71.5	70.3	70.3	72.1	72.4
	140	81.7	83.2	81.2	82.7	83.0	83.1	81.9	82.4	83.0	83.5	83.6	83.8
	Average	59.7	62.5	58.8	62.9	59.5	63.1	57.2	64.7	63.3	63.3	65.0	67.5
Citeseer	12	24.7	33.0	24.3	27.1	20.5	27.5	23.2	28.1	27.5	25.5	28.9	37.5
	18	43.6	58.4	42.3	48.8	47.2	49.3	40.8	58.9	49.8	48.3	60.4	61.5
	36	55.3	61.1	54.2	61.2	58.4	62.2	54.1	62.8	62.3	63.3	64.7	66.0
	120	70.4	72.6	71.9	71.8	72.2	72.1	69.6	72.4	72.0	72.1	70.4	70.2
	Average	48.5	56.3	48.2	52.2	49.6	52.8	46.9	55.6	52.9	52.3	56.1	58.8
Pubmed	9	43.9	42.0	45.7	43.2	39.9	43.2	35.5	40.2	42.2	42.2	51.0	60.2
	15	60.5	56.6	55.8	65.4	62.4	65.4	62.5	63.2	65.3	65.2	63.5	66.7
	30	57.5	70.7	59.1	67.9	58.1	67.9	54.3	68.5	67.5	68.1	73.5	76.3
	60	79.0	77.1	78.9	76.8	79.5	80.6	80.7	79.2	80.2	80.8	79.3	81.7
	Average	60.2	62.1	59.9	63.3	60.0	64.3	58.3	62.8	63.8	64.1	66.8	71.2
ACM	9	51.8	56.3	50.1	52.8	54.4	53.8	50.2	56.6	51.5	52.6	56.7	64.3
	15	65.6	64.6	44.6	72.6	64.3	72.4	62.3	68.1	71.2	72.2	69.7	75.4
	30	79.6	80.0	70.4	86.3	76.5	87.1	74.3	87.5	86.4	87.3	87.2	89.0
	60	87.8	87.4	86.3	89.4	88.0	90.2	86.8	90.1	89.0	89.2	90.9	92.8
	Average	71.2	72.1	62.8	75.3	70.8	75.9	68.4	75.6	74.5	75.3	76.1	80.2
Chameleon	8	23.9	27.7	25.2	20.2	21.5	21.5	25.0	28.1	20.2	20.2	29.0	33.9
	16	23.6	27.1	25.6	26.4	25.3	25.6	23.8	25.5	26.2	26.6	27.7	38.0
	28	31.4	28.2	25.9	30.1	29.2	31.1	28.6	30.2	31.3	32.1	41.9	41.7
	120	47.6	27.9	29.7	35.6	48.9	48.6	47.4	46.9	47.3	47.6	49.1	53.8
	Average	31.6	27.7	26.6	28.1	31.2	31.7	31.2	32.7	31.3	31.6	34.4	41.8

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lu, Z.; Zhong, M.; Sun, Q.; Ma, K. CA-NodeNet: A Category-Aware Graph Neural Network for Semi-Supervised Node Classification. Electronics 2025, 14, 3215. https://doi.org/10.3390/electronics14163215

AMA Style

Lu Z, Zhong M, Sun Q, Ma K. CA-NodeNet: A Category-Aware Graph Neural Network for Semi-Supervised Node Classification. Electronics. 2025; 14(16):3215. https://doi.org/10.3390/electronics14163215

Chicago/Turabian Style

Lu, Zichang, Meiyu Zhong, Qiguo Sun, and Kai Ma. 2025. "CA-NodeNet: A Category-Aware Graph Neural Network for Semi-Supervised Node Classification" Electronics 14, no. 16: 3215. https://doi.org/10.3390/electronics14163215

APA Style

Lu, Z., Zhong, M., Sun, Q., & Ma, K. (2025). CA-NodeNet: A Category-Aware Graph Neural Network for Semi-Supervised Node Classification. Electronics, 14(16), 3215. https://doi.org/10.3390/electronics14163215

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

CA-NodeNet: A Category-Aware Graph Neural Network for Semi-Supervised Node Classification

Abstract

1. Introduction

2. Related Works

3. Methodology

3.1. Coarse-Grained Node Feature Learning Module

3.2. Category-Decoupled Multi-Branch Attention Module

3.3. Inter-Category Difference Feature Learning Module

3.4. Feature Fusion for Classification

4. Experiments

4.1. Experimental Settings

4.1.1. Datasets

4.1.2. Baselines

4.1.3. Evaluation Metrics

4.1.4. Parameter Settings

4.2. Node Classification

4.3. Ablation Study

4.4. Labeling Rates Analysis

4.5. Visualization Analysis

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI