Disentangled Graph Representation Based on Prototype Subgraph Neural Network

Yang, Baosheng; Yang, Jingshang; Xu, Lixiang; Tang, Yuanyan

doi:10.3390/math14111915


¹ School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China

² Suzhou Industrial Investment Holding Group Co., Ltd., Suzhou 234000, China

³ School of Artificial Intelligence, Hefei Institute of Technology, Hefei 230601, China

⁴ Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Macau SAR, China

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(11), 1915; https://doi.org/10.3390/math14111915

Submission received: 8 April 2026 / Revised: 16 May 2026 / Accepted: 26 May 2026 / Published: 1 June 2026

(This article belongs to the Special Issue Advances in Graph Computing: Algorithms and Applications)


Abstract

While graph representation learning methods have been successfully applied to various graph data mining tasks, they typically couple graph information into an unstructured, holistic representation. This makes it difficult to explicitly identify substructures with specific functions within a graph and to mine discriminative prototype structures that can be shared across graphs. To address these challenges, this paper proposes the Prototype Subgraph Disentangled Graph Neural Network (PSDGNN). It explicitly disentangles node features into multiple independent latent factor groups through latent factor decomposition and prototype alignment mechanisms. A subgraph generator then converts these factors into factor subgraphs to model latent semantic substructures. Furthermore, learnable prototype subgraphs are introduced to represent foundational structural patterns shared across graphs. Through similarity matching and a mutual information minimization objective, the model aligns factor subgraphs with their corresponding prototypes in structural semantics. Experimental results on seven public datasets demonstrate superior performance over existing baseline methods. The model also provides intuitive, structured explanations for classification decisions through visualizable factor subgraphs and prototype subgraphs, enhancing both interpretability and generalization capabilities.

Keywords:

disentangled representation learning; graph neural networks; graph kernel; graph classification

MSC:

68R10; 62H30; 68T10

1. Introduction

Graph data, as a prevalent form of non-Euclidean data, are widely present across numerous real-world domains, including transportation networks [1], citation networks [2], biological protein interactions [3], and knowledge graphs [4]. To extract valuable insights from these complex graph datasets, graph representation learning has emerged. Its main objective is to map nodes, subgraphs, or entire graphs onto low-dimensional embedding spaces while maximally preserving their topological structure, node attributes, and rich semantic information [5]. The learned low-dimensional vector representations serve as effective features that significantly empower downstream machine learning tasks such as node classification [6], link prediction [7], and graph classification [8]. Consequently, graph representation learning has become one of the key technologies in graph data mining [9].

With recent advances in deep learning, graph neural networks (GNNs) [10] have achieved significant success in graph representation learning. GNNs extend convolutional operations from regular, grid-structured domains to arbitrary topologies and unordered structures, and they include spatial methods [11,12] and spectral methods [13,14]. Currently, most GNNs are essentially flat: they propagate node information solely along edges and obtain graph representations by globally summarizing node representations. These readout methods include averaging over all nodes [15], adding virtual nodes [16], and using fully connected layers [17] or convolutional layers [18]. However, many real-world graph objects (such as molecules and proteins) are composed of reusable, functionally specific substructures (e.g., functional groups and structural motifs) [19], which can be regarded as factor subgraphs. Existing methods often couple this factor information into holistic, unstructured graph-level representations during learning, which limits model interpretability, generalization ability, and sensitivity to discriminative substructures. Without an understanding of the principles behind their predictions, such black-box models cannot be fully trusted or widely applied in critical fields such as medical diagnosis. Furthermore, model interpretation facilitates model debugging and error analysis.

To enhance interpretability, researchers have begun exploring subgraph-based graph neural networks. These methods can explain node or graph classification predictions of GNNs trained with different strategies and broadly fall into two categories. On the one hand, some methods enhance model interpretability by integrating graph kernels with graph neural networks [20,21,22]: a set of learnable hidden subgraphs is compared with the input graph to obtain the final graph embedding. However, the size and number of hidden subgraphs require manual tuning, and these methods cannot detect key subgraphs of arbitrary size and shape without explicit subgraph-level annotations. On the other hand, some approaches integrate subgraph information into the message-passing mechanism of graph neural networks [23,24,25], with the main objective of mining critical subgraphs that reflect intrinsic graph properties. These methods rely on optimization decision theory to ensure the interpretability of the generated subgraphs [26]. However, they typically rely on information from only a single subgraph. Since a graph's properties may be influenced by more than one subgraph, the information in a graph cannot simply be compressed into a single subgraph. For example, the properties of a chemical molecule may be determined by multiple functional groups [27].

Although existing subgraph-based neural networks demonstrate strong performance and enhance the interpretability of GNNs, current approaches primarily suffer from two main limitations:

(1): Insufficient Structural Disentanglement Capability: Node features learned by mainstream GNNs and graph autoencoders are typically holistic vectors, making it difficult to explicitly decompose and identify latent substructures or factors with distinct semantics within the graph (e.g., representing different functional groups in molecular graphs or distinct community patterns in social networks). This coupling hinders the model’s deep understanding of the underlying principles of graph composition.
(2): Lack of Prototyping and Generalization Capabilities: Many graph classification models directly classify whole-graph features without mining reusable “prototype subgraph structures” shared across graphs. This hinders models from capturing the most discriminative structural patterns at the category level, resulting in weak generalization and interpretability when data are scarce or when dealing with heterogeneous graphs.

To address these challenges, this paper proposes the Prototype Subgraph Disentangled Graph Neural Network (PSDGNN), a graph representation learning framework based on latent factor decomposition and prototype alignment. Our key idea is to explicitly disentangle learned node features into multiple independent latent factors and to align these factors with a set of learnable, discriminative prototype subgraph structures. Specifically, the model first acquires node features via a graph autoencoder and then partitions these features along the feature dimension into several independent latent factor groups. Each factor group is transformed into a factor subgraph by a subgraph generator, explicitly modeling latent internal substructures. Furthermore, we introduce a set of learnable prototype subgraphs, each representing a fundamental structural pattern shared across graphs. We design a one-to-one factor–prototype correspondence and a similarity matching mechanism, supplemented by a factor mutual information minimization objective, to encourage the learned factors to align structurally and semantically with their corresponding prototypes. The main contributions of this work are summarized as follows:

(1): We propose a factor subgraph disentanglement mechanism that explicitly models and decomposes latent semantic substructures within graphs by partitioning node features into multiple groups and constructing factor subgraphs, thereby enhancing model interpretability.
(2): We design a prototype alignment and consistency learning paradigm. By introducing learnable prototype subgraphs and aligning decomposed factors with discriminative prototype subgraphs through similarity matching and factor subgraph mutual information loss, we learn category-sensitive, generalizable subgraph structures.
(3): Experiments on multiple public benchmark datasets demonstrate that our model not only achieves superior classification performance compared with existing baseline methods but also provides intuitive, structured explanations for classification decisions through the visual disentanglement of factor subgraphs and prototype subgraphs.

It is worth noting that recent advances in large language model (LLM)-based graph reasoning and graph-aware intelligent agents have shown emerging potential for complex graph data analysis. These methods typically rely on accurately identifying and representing semantic substructures within graphs. However, existing LLM- and agent-based approaches still face limitations in structural interpretability and in efficiently modeling cross-graph shareable prototype structures. The factor subgraph disentanglement and prototype alignment mechanism proposed in this paper is expected to provide a structured and interpretable subgraph representation paradigm for these directions, complementing the graph understanding capabilities of large models and intelligent agents in terms of interpretability and structural generalization.

This paper is organized as follows. In Section 2, we review related work on subgraph neural networks and on combining graph kernels with GNNs. Section 3 describes the implementation details of PSDGNN. In Section 4, we conduct a series of experiments to evaluate the performance of PSDGNN. Finally, we conclude the paper in Section 5.

2. Related Work

2.1. Subgraph Neural Networks

To enhance the structural discriminative capability of GNNs and understand their decision-making processes, some studies have incorporated subgraph information into the message-passing process or used subgraphs to construct graph representations. On the one hand, incorporating subgraph information can enhance GNN interpretability. For instance, to avoid predefining the sizes and shapes of extracted subgraphs, Li et al. [25] proposed an adaptive subgraph neural network (AdaSNN) to identify the dominant subgraphs that influence prediction outcomes. AdaSNN uses a reinforcement learning-based subgraph detection module that adaptively extracts dominant subgraphs without prior knowledge. To enhance GNN interpretability from a graph representation learning perspective, Qin et al. [28] introduced the MultiNet local subgraph convolution module. This module adaptively partitions each input graph into multiple subgraph views and applies subgraph-specific, view-dependent convolution operations that constrain the propagation range of node information within the original global graph structure. This approach not only mitigates oversmoothing but also generates more discriminative local node representations. On the other hand, subgraphs can be used to enhance the expressive power of GNNs. Wang et al. [29] proposed a subgraph-aware neural network that incorporates subgraph identifiers into the original node label propagation process. This identification mechanism encodes the complete structural information of neighborhood subgraphs, enabling the model's expressive power to surpass that of the 1-WL test in certain scenarios. This method indirectly enhances graph representation capabilities from a local perspective but does not improve the global representation power of the graph. Other recent efforts explore prototype learning for global graph representations. For instance, Zhu et al. [30] proposed a graph neural network model for structural labeling and interaction modeling. Its key idea is to compute a set of end-to-end optimizable substructure prototypes, so that any input graph can be projected onto these local structural representations for global characterization. Bevilacqua et al. [31] introduced the Equivariant Subgraph Aggregation Network (ESAN), which enhances the expressive power of graph neural networks by representing each graph as a set of subgraphs derived from predefined policies and processing them with an appropriate equivariant architecture. Bouritsas et al. [32] proposed a graph substructure network based on a topology-aware message-passing scheme with substructure encoding, primarily by incorporating subgraph counts into the message-passing process.

Table 1 compares our method, PSDGNN, with these approaches across several dimensions, including subgraph awareness, subgraph diversity, crossgraph prototype sharing, and subgraph generation methods.

2.2. Combination of Graph Kernels and GNNs

Graph kernels and GNNs can be integrated within the same model framework. On the one hand, graph kernels possess robust structural representation capabilities [33]; on the other, GNNs can learn node representations even without domain-specific expertise. Their fusion combines complementary advantages. The most widely adopted graph kernel in this line of work is the random walk kernel. Representative works include the following. Lei et al. [34] mapped input graphs to a kernel space by comparing them against reference objects; however, these reference objects lacked structural information and may therefore fail to capture graph structure. Chen et al. [35] proposed GCKN, which uses a random walk kernel to compute the similarity between paths starting from each node and a set of prototype paths; this similarity serves as the node's embedding, so each node carries information about local graph substructures. However, GCKN samples node paths by random walks, which introduces randomness. In contrast, the model presented here generates subgraphs from the original graph information, enabling it to capture richer topological structures. Nikolentzos et al. [21] introduced RWGNN, which makes the random walk kernel differentiable through a direct product graph. This method uses the similarities between a set of trainable hidden graphs and the input graph as the final graph representation, enhancing model transparency. However, it only supports single-layer models and lacks an effective representation of local information. To further integrate graph kernels with GNNs, Cosmo et al. [36] extended other graph kernels to this hidden graph framework: subgraphs are constructed by sampling node neighborhoods, and graph kernels compute the similarity between these subgraphs and the hidden graphs to update node embeddings. However, the non-differentiability of some graph kernels significantly increases the computational cost of training the hidden graphs, limiting their application to large-scale graphs.

3. Methodology

Figure 1 gives an overview of the architecture of the proposed PSDGNN framework.

Problem Formulation. A graph can be represented as $G = (V, X, A)$, where $V = \{v_{1}, v_{2}, \dots, v_{N}\}$ is the set of nodes and $X \in \mathbb{R}^{N \times d}$ denotes the node feature matrix. The number of nodes is denoted by N, and the dimension of the node features is d. A denotes the adjacency matrix of the graph. All graphs studied in this paper are unweighted and undirected. For the supervised graph classification task, given a dataset $(G, Y) = \{(G_{1}, y_{1}), (G_{2}, y_{2}), \dots, (G_{n}, y_{n})\}$, where $y_{i} \in Y$ denotes the label of $G_{i} \in G$, the objective is to learn a mapping function from the graphs G to the labels Y. This paper uses one-hot encoding for discrete labels; for example, the three labels 1, 2, and 3 correspond to the three-dimensional vectors (1, 0, 0), (0, 1, 0), and (0, 0, 1), respectively. In addition, we summarize the main notations in Table 2.

3.1. Representation Disentangled Module

Given a graph $g = (V, X, A)$, we use a graph variational autoencoder (GraphVAE) to learn m disentangled latent factors $Z^{(l)} = \{z_{1}, z_{2}, \dots, z_{m}\}$, where $z_{i} \in \mathbb{R}^{N \times \tilde{d}}$, $\tilde{d} = d^{(l)} / m$, and $d^{(l)}$ denotes the dimension of the l-th hidden layer. These latent factors are obtained by grouping the node features. Our improved GraphVAE employs a basic graph convolutional network (GCN) as the graph encoder, whose l-th layer output is given by

Z^{(l)} = \sigma\left(\tilde{A} Z^{(l-1)} W^{(l-1)}\right)

(1)

where $\tilde{A}$ denotes the normalized adjacency matrix with self-loops, built from $A + I$ (I is the identity matrix) and the diagonal node-degree matrix D; $W^{(l-1)}$ is a trainable weight matrix; $Z^{(0)} = X$; and σ is the Sigmoid activation function.
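The encoder of Eq. (1) and the dimension-wise split into m factors can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the symmetric normalization $D^{-1/2}(A + I)D^{-1/2}$ is the standard GCN choice and is assumed here, since the exact normalization is not written out above; the function names `gcn_encoder` and `split_factors` and the random weights are hypothetical.

```python
import numpy as np


def gcn_encoder(X, A, weights):
    """Stacked GCN layers in the spirit of Eq. (1): Z^(l) = sigmoid(A_hat Z^(l-1) W^(l-1)).

    Assumption: A_hat = D^{-1/2} (A + I) D^{-1/2}, the standard GCN normalization,
    where D is the diagonal degree matrix of A + I.
    """
    A_tilde = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))   # degrees are >= 1, so no division by zero
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]
    Z = X                                             # Z^(0) = X
    for W in weights:                                 # one weight matrix per layer
        Z = 1.0 / (1.0 + np.exp(-(A_hat @ Z @ W)))    # sigmoid activation
    return Z


def split_factors(Z, m):
    """Split the N x d^(L) encoder output into m equal column blocks z_1, ..., z_m."""
    if Z.shape[1] % m != 0:
        raise ValueError("hidden dimension must be divisible by m")
    return np.split(Z, m, axis=1)                     # each block is N x (d^(L) / m)


# Usage on a 4-node path graph with random (hypothetical) weights.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 5))
weights = [rng.normal(size=(5, 8)), rng.normal(size=(8, 6))]
Z = gcn_encoder(X, A, weights)
factors = split_factors(Z, m=3)
print([f.shape for f in factors])  # [(4, 2), (4, 2), (4, 2)]
```

Splitting by contiguous column blocks mirrors the indexing $z_{i} = Z[:, (i - 1)\tilde{d} + 1 : i\tilde{d}]$ used in Algorithm 1.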

In the decoder of the improved GraphVAE, we employ separate heads: a multi-layer perceptron for reconstructing X and an inner-product decoder for recovering A. Specifically, we define the graph reconstruction as

A_{c} = \sigma\left(Z^{(l)} \left(Z^{(l)}\right)^{T}\right)

(2)

X_{c} = \mathrm{MLP}\left(Z^{(l)}\right)

(3)

where $A_{c}$ is the reconstructed adjacency matrix and $X_{c}$ denotes the reconstructed node features. Our multi-head GraphVAE aims to minimize the reconstruction error while compressing the latent variables Z through the KL regularization term. The objective is formulated as follows:

L_{GraphVAE} = \mathbb{E}\left[\|X - X_{c}\|_{F}\right] + \mathbb{E}\left[\|A - A_{c}\|_{F}\right] + \mathbb{E}\left[D_{KL}\left(q(Z \mid A, X) \,\|\, p(Z)\right)\right]

(4)

where $\|\cdot\|_{F}$ denotes the Frobenius norm, $q(Z \mid A, X)$ represents the graph encoder, and $p(Z)$ is the prior over the latent variables. Furthermore, we encourage the latent factors to be independent. The mutual information (MI) between two factors $z_{i}$ and $z_{j}$ reaches its minimum value of 0 when $p(z_{i}, z_{j}) = p(z_{i}) p(z_{j})$, that is, when $z_{i}$ and $z_{j}$ are mutually independent. Thus, minimizing the MI between factors encourages them to capture information from different aspects of the graph. Several MI upper bounds [37] have recently been introduced for MI minimization. However, estimating these bounds across the m factor representations requires $m(m - 1)$ estimations, which is computationally expensive when m is large. To mitigate this issue, and since orthogonality is a special case of linear independence, we relax MI minimization to an orthogonality constraint, an approach that has proven effective in many previous studies [38]. The factor independence loss is defined as follows:

L_{factor} = \left|Z^{(l)} \left(Z^{(l)}\right)^{T} - I\right|

(5)

where $|\cdot|$ denotes the L1 norm and I is the identity matrix. The complete algorithm of the representation disentangling module is shown in Algorithm 1.
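The reconstruction and regularization terms of Eqs. (2), (4), and (5) can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: it assumes, hypothetically, that the variational posterior $q(Z \mid A, X)$ is a diagonal Gaussian with mean `mu` and log-variance `logvar` and that $p(Z)$ is a standard Gaussian, so the KL term has its usual closed form and is added to the loss (the standard negative-ELBO convention). The expectations are replaced by a single sample, the L1 norm in Eq. (5) is read as the sum of absolute entries, and all function names are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def inner_product_decoder(Z):
    """Eq. (2): A_c = sigmoid(Z Z^T)."""
    return sigmoid(Z @ Z.T)


def graphvae_loss(X, A, X_c, A_c, mu, logvar):
    """Eq. (4) for one graph, with expectations replaced by a single sample.

    Assumption: q(Z | A, X) = N(mu, diag(exp(logvar))) and p(Z) = N(0, I), so the
    KL divergence has the closed form below; it is added to the loss.
    """
    rec_x = np.linalg.norm(X - X_c, ord="fro")     # ||X - X_c||_F
    rec_a = np.linalg.norm(A - A_c, ord="fro")     # ||A - A_c||_F
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))
    return rec_x + rec_a + kl


def factor_loss(Z):
    """Eq. (5): L1 norm of Z Z^T - I, read here as the sum of absolute entries."""
    return np.abs(Z @ Z.T - np.eye(Z.shape[0])).sum()


# Usage: rows of Z are orthonormal, so the factor loss is close to 0.
Z = np.array([[0.6, 0.8], [0.8, -0.6]])
print(factor_loss(Z))
```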

Algorithm 1 Representation disentangling module of PSDGNN

1: Input: Graph $G = (V, X, A)$, number of latent factors m, number of encoder layers L
2: Output: Latent factors $Z^{(L)} = \{z_{1}, z_{2}, \dots, z_{m}\}$, reconstructed adjacency $A_{c}$, reconstructed features $X_{c}$
3: Compute normalized adjacency matrix $\tilde{A} \leftarrow A + I$
4: Initialize $Z^{(0)} \leftarrow X$
5: for $l = 1$ to L do
6: GCN encoding using Equation (1)
7: end for
8: Set $Z \leftarrow Z^{(L)} \in \mathbb{R}^{N \times d^{(L)}}$
9: Split Z into m equal parts along the feature dimension: $d^{(L)} = m \times \tilde{d}$
10: for $i = 1$ to m do
11: $z_{i} \leftarrow Z[:, (i - 1) \cdot \tilde{d} + 1 : i \cdot \tilde{d}]$ ▹ i-th latent factor
12: end for
13: Reconstruct the adjacency matrix using Equation (2)
14: Reconstruct the node features using Equation (3)
15: Compute $L_{GraphVAE}$ using Equation (4)
16: Compute $L_{factor}$ using Equation (5)

3.2. Factor Subgraph Generation Module

The workflow of the factor subgraph generation module is illustrated in Figure 2. Based on the latent factors obtained above, we compute the probability of each edge in each factor subgraph. First, we compute the inner product of each latent factor with itself, where each entry represents an edge selection score. Then, applying the Sigmoid function yields the attention mask $p^{(m)} \in \mathbb{R}^{N \times N}$ for the edges of the m-th factor subgraph, with $p_{ij}^{(m)} \in [0, 1]$, where $p_{ij}^{(m)}$ denotes the probability of selecting the edge between nodes $v_{i}$ and $v_{j}$. The mask $p^{(m)}$ is calculated as follows:

p^{(m)} = \sigma\left(z_{m} z_{m}^{T}\right)

(6)

Subsequently, we binarize $p^{(m)}$ to obtain the edge assignment $S^{(m)} \in \{0, 1\}^{N \times N}$. To keep the gradient with respect to $p_{ij}^{(m)}$ computable, we employ the Gumbel–Softmax reparameterization trick [39] to compute the edge assignment. However, if the Gumbel–Softmax function were applied directly to $p^{(m)}$, only one edge could be selected from every N consecutive edges. To retain enough edges, we reshape $p^{(m)}$ into a matrix whose rows each contain L consecutive entries and then apply the Gumbel–Softmax method to each row to generate the edge assignment $S^{(m)}$. This guarantees that one edge is selected from every L consecutive edges. Here, the sampling probability of the l-th edge in an L-dimensional group is defined as

\hat{p}_{l}^{(m)} = \frac{\exp\left(\left(\log p_{l}^{(m)} + c_{l}\right) / \tau\right)}{\sum_{i=1}^{L} \exp\left(\left(\log p_{i}^{(m)} + c_{i}\right) / \tau\right)}

(7)

c_{l} = -\log\left(-\log U_{l}\right), \quad U_{l} \sim \mathrm{Uniform}(0, 1)

(8)

where τ is the temperature of the concrete distribution, $p_{l}^{(m)}$ denotes the selection probability of the l-th edge within a group after partitioning, $\hat{p}_{l}^{(m)}$ is the sampling probability, and $c_{l}$ is drawn from the $\mathrm{Gumbel}(0, 1)$ distribution. Thus, L determines the size of each factor subgraph: $N \times N / L$ candidate edges are selected in $S^{(m)}$. We then reshape $S^{(m)}$ back into an $N \times N$ matrix. Finally, we extract the factor subgraph $SG_{m}$ via $A_{SG_{m}} = A \odot S^{(m)}$, with node features $X_{SG_{m}} = \mathrm{MLP}(z_{m})$, where ⊙ denotes element-wise multiplication, and A and $A_{SG_{m}}$ are the adjacency matrices of the input graph and the m-th factor subgraph, respectively. This yields the set of factor subgraphs $SG = (SG_{1}, SG_{2}, \dots, SG_{m})$.
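The generation steps in Eqs. (6)–(8) and the final masking can be sketched as follows. This NumPy sketch is illustrative only: it assumes $N^{2}$ is divisible by L, uses a hard one-hot sample per group in the forward pass (an autodiff implementation would typically use the straight-through estimator so gradients flow through the soft probabilities), and does not symmetrize $S^{(m)}$. The helper names `gumbel_softmax_groups` and `factor_subgraph` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gumbel_softmax_groups(p, L, tau, rng, hard=True):
    """Eqs. (7)-(8): Gumbel-Softmax over groups of L consecutive edge probabilities.

    p is an N x N matrix of edge probabilities; N * N must be divisible by L.
    Returns an N x N matrix (one-hot per group if hard=True, soft otherwise).
    """
    N = p.shape[0]
    groups = p.reshape(-1, L)                               # (N*N/L) x L
    U = rng.uniform(1e-10, 1.0, size=groups.shape)          # avoid log(0)
    c = -np.log(-np.log(U))                                 # Gumbel(0, 1) noise, Eq. (8)
    logits = (np.log(groups + 1e-10) + c) / tau
    logits -= logits.max(axis=1, keepdims=True)             # numerical stability
    soft = np.exp(logits)
    soft /= soft.sum(axis=1, keepdims=True)                 # Eq. (7)
    if not hard:
        return soft.reshape(N, N)
    # Forward pass keeps exactly one edge per group of L entries.
    onehot = np.zeros_like(soft)
    onehot[np.arange(soft.shape[0]), soft.argmax(axis=1)] = 1.0
    return onehot.reshape(N, N)


def factor_subgraph(A, z, L, tau, rng):
    """Eq. (6) edge probabilities, group-wise sampling, then element-wise masking of A."""
    p = sigmoid(z @ z.T)                                    # Eq. (6)
    S = gumbel_softmax_groups(p, L, tau, rng)
    return A * S                                            # A (element-wise) S


rng = np.random.default_rng(0)
N, L = 4, 4
A = np.ones((N, N)) - np.eye(N)                             # complete graph without self-loops
z = rng.normal(size=(N, 2))                                 # one latent factor z_m
A_sg = factor_subgraph(A, z, L, tau=0.5, rng=rng)
print(A_sg)
```

Because $A \odot S^{(m)}$ discards selections that fall on non-edges of A, the resulting factor subgraph in this sketch has at most $N \times N / L$ edges.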

3.3. Prototype Kernel Embedding Module

Recent prototype-based GNNs implicitly represent prototype subgraphs as a set of learnable graph embedding vectors, which compromises their inherent interpretability. We therefore define prototype subgraphs explicitly, using a random walk graph kernel to explore graph topology. Our prototype kernel embedding module introduces learnable prototype subgraphs parameterized by trainable adjacency matrices. Formally, the module comprises a set $PG = (PG_{1}, PG_{2}, \dots, PG_{M})$ of equal-size prototype subgraphs, one for each factor subgraph, where $PG_{m} = (A_{PG_{m}}, X_{PG_{m}})$. Each prototype subgraph is parameterized as a small undirected graph with few parameters and corresponds one-to-one to the factor subgraph with the same index. These prototype subgraphs are expected to learn structures that help distinguish the classes. Since random walk kernels quantify the similarity between two graphs by the number of walks they have in common, we compare each factor subgraph with its corresponding prototype subgraph using differentiable functions derived from random walk kernels.

Specifically, given a factor subgraph $SG_{m} = (A_{SG_{m}}, X_{SG_{m}})$ and its corresponding prototype subgraph $PG_{m} = (A_{PG_{m}}, X_{PG_{m}})$, their direct product graph $G_{m}^{\times}$ has the adjacency matrix $A_{m}^{\times} = A_{SG_{m}} \otimes A_{PG_{m}}$, where ⊗ denotes the Kronecker product of two matrices. A random walk on $G_{m}^{\times}$ can be interpreted as a simultaneous walk on $SG_{m}$ and $PG_{m}$ [40]. Since the traditional random walk kernel counts all matching walk pairs between two graphs, the number of matching walks on $SG_{m}$ and $PG_{m}$ equals the number of walks on $G_{m}^{\times}$, which can be counted using powers of its adjacency matrix $A_{m}^{\times}$. Therefore, the P-step random walk kernel between $SG_{m}$ and $PG_{m}$, which counts all simultaneous random walks, is defined as

K_{m}^{(P)}(SG_{m}, PG_{m}) = \sum_{p=0}^{P} w_{p} \, u^{T} \left(A_{m}^{\times}\right)^{p} u

(9)

where $u = \mathrm{vec}\left(X_{SG_{m}} X_{PG_{m}}^{T}\right)$ is the vector obtained by flattening the node feature similarity matrix between $SG_{m}$ and $PG_{m}$, and $w_{p}$ denotes a positive weight. To simplify computation, we only count walks of a common length P on the two compared graphs:

K_{m}^{(P)}(SG_{m}, PG_{m}) = u^{T} \left(A_{m}^{\times}\right)^{P} u

(10)

Then, given a set of factor subgraphs $SG = (SG_{1}, SG_{2}, \dots, SG_{m})$ and a set of prototype subgraphs $PG = (PG_{1}, PG_{2}, \dots, PG_{M})$, we calculate the similarity between each factor subgraph and the prototype subgraph with the same index:

K_{m} = K_{m}^{(P)}(SG_{m}, PG_{m})

(11)

Unlike traditional graph kernels, where both graphs are fixed, our setting requires the prototype graph to be learnable. The kernel computes a joint similarity that integrates two complementary components:

(1): Walk-based topological similarity. Captured by the direct product graph adjacency matrix ${A_{m}}^{\times} = A_{S G_{m}} \otimes A_{P G_{m}}$ , where $A_{P G_{m}}$ is learnable. This term measures structural alignment by counting common walk sequences of length P.
(2): Feature-guided similarity. Encoded by $u = v e c (X_{S G_{m}} {X_{P G_{m}}}^{T})$ , where $X_{P G_{m}}$ is also learnable. This term weights each walk by the similarity of node features between the two graphs.

The final graph embedding is obtained as $H_{G} = [K_{1}, K_{2}, \dots, K_{m}] \in \mathbb{R}^{m}$, which is fed into a multi-layer perceptron to perform the graph classification task.
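The kernel in Eq. (10) and the embedding $H_{G}$ can be sketched as below. `rw_kernel_naive` follows Eq. (10) literally by forming the Kronecker product. `rw_kernel_fast` is an equivalent computation added here for illustration (not necessarily how PSDGNN computes it), based on the identity $(A \otimes B)\,\mathrm{vec}(S) = \mathrm{vec}(A S B^{T})$ for row-major vectorization; it avoids materializing the $(N N_{p}) \times (N N_{p})$ product matrix, where $N_{p}$ (our notation) is the number of prototype nodes. All function names are hypothetical.

```python
import numpy as np


def rw_kernel_naive(A_sg, X_sg, A_pg, X_pg, P):
    """Eq. (10) computed literally: u^T (A_sg kron A_pg)^P u, with u = vec(X_sg X_pg^T)."""
    u = (X_sg @ X_pg.T).reshape(-1)        # row-major vec; matches np.kron index order
    A_prod = np.kron(A_sg, A_pg)           # adjacency of the direct product graph
    return float(u @ np.linalg.matrix_power(A_prod, P) @ u)


def rw_kernel_fast(A_sg, X_sg, A_pg, X_pg, P):
    """Same value via (A kron B) vec(S) = vec(A S B^T), applied P times."""
    S = X_sg @ X_pg.T                      # N x N_p node feature similarity matrix
    M = S
    for _ in range(P):
        M = A_sg @ M @ A_pg.T
    return float(np.sum(S * M))


def graph_embedding(factor_subgraphs, prototypes, P):
    """H_G = [K_1, ..., K_m]: one kernel value per (factor, prototype) pair, Eq. (11)."""
    return np.array([rw_kernel_fast(A_s, X_s, A_p, X_p, P)
                     for (A_s, X_s), (A_p, X_p) in zip(factor_subgraphs, prototypes)])


rng = np.random.default_rng(0)
A_sg = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])   # path on 3 nodes
X_sg = rng.normal(size=(3, 4))
A_pg = np.array([[0., 1.], [1., 0.]])                        # prototype: a single edge
X_pg = rng.normal(size=(2, 4))
print(rw_kernel_naive(A_sg, X_sg, A_pg, X_pg, 2), rw_kernel_fast(A_sg, X_sg, A_pg, X_pg, 2))
```

Both functions return the same value; the second only multiplies $N \times N_{p}$ matrices, which matters when the product graph would be large.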

3.4. Model Optimization

The final component of PSDGNN takes the similarity representation between factor subgraphs and prototype subgraphs as input and outputs the predicted probability of each class: $\hat{y} = f(H_{G})$, where f is an MLP followed by a Softmax layer. For optimization, the cross-entropy loss is used as the classification loss:

L_{CE} = \frac{1}{n} \sum_{i=1}^{n} -\left[y_{i} \log \hat{y}_{i} + (1 - y_{i}) \log\left(1 - \hat{y}_{i}\right)\right]

(12)

where n denotes the number of graphs and $y_{i}$ represents the true label of the i-th graph.

Combining this with the GraphVAE loss and the factor independence loss, the final optimization objective of PSDGNN is

L = L_{CE} + \alpha L_{GraphVAE} + \beta L_{factor}

(13)

where α and β are hyperparameters that weight the loss terms. The complete training procedure of PSDGNN is shown in Algorithm 2.
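For concreteness, Eqs. (12) and (13) can be written as follows. This is a sketch under our own reading: with one-hot label vectors, the bracketed term in Eq. (12) is summed over the C classes and then averaged over the n graphs; `alpha` and `beta` are the weights from Eq. (13), and the function names and example numbers are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise Softmax, the last layer of f in y_hat = f(H_G)."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def classification_loss(Y, Y_hat, eps=1e-12):
    """Eq. (12) for n graphs: Y (one-hot) and Y_hat are n x C arrays.

    The bracketed term is summed over the C classes, then averaged over graphs.
    """
    Y_hat = np.clip(Y_hat, eps, 1.0 - eps)           # avoid log(0)
    per_graph = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat)).sum(axis=1)
    return per_graph.mean()


def total_loss(l_ce, l_vae, l_factor, alpha, beta):
    """Eq. (13): L = L_CE + alpha * L_GraphVAE + beta * L_factor."""
    return l_ce + alpha * l_vae + beta * l_factor


Y = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])      # one-hot labels for 2 graphs
Y_hat = softmax(np.array([[2.0, 0.5, 0.1], [0.2, 0.3, 1.5]]))
l_ce = classification_loss(Y, Y_hat)
print(total_loss(l_ce, l_vae=0.8, l_factor=0.3, alpha=0.5, beta=0.1))
```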

Algorithm 2 Training process of PSDGNN

Input: $G = \{g_{1}, g_{2}, \dots, g_{n}\}$; $g = (V, X, A)$; number of factor subgraphs m; number of layers l; number of epochs $E_{max}$
Output: Predicted label $\hat{Y} \in \mathbb{R}^{C}$ of the graph
1: Initialize the adjacency matrix $A_{PG_{m}}$ and feature matrix $X_{PG_{m}}$ of each prototype subgraph
2: for $t = 1, 2, \dots, E_{max}$ do
3: Encode node features using Equation (1) to obtain $Z^{(l)}$
4: Decompose $Z^{(l)}$ into m latent factors along the feature dimension
5: Reconstruct the graph using Equations (2) and (3)
6: for $i = 1, 2, \dots, m$ do
7: Calculate the edge mask matrix $p^{(i)}$ using Equation (6)
8: Reshape the edge mask matrix into groups of L consecutive entries
9: Select $N \times N / L$ edges using Equations (7) and (8)
10: Generate the factor subgraph $A_{SG_{i}} = A \odot S^{(i)}$ ▹ Element-wise masking
11: Calculate the similarity $K_{i}$ between factor subgraph $SG_{i}$ and prototype subgraph $PG_{i}$ using Equation (11)
12: end for
13: Concatenate the similarities into the graph embedding $H_{G} = [K_{1}, \dots, K_{m}]$
14: Generate the predicted graph label $\hat{Y}$ using an MLP and Softmax
15: Calculate the overall loss L using Equation (13)
16: Update parameters by descending L ▹ End-to-end optimization via backpropagation
17: end for

3.5. Complexity Analysis

In this section, we analyze the computational complexity and memory consumption of PSDGNN. For one graph, the overall time and memory complexities of PSDGNN are $O(LN\bar{d}d + mN^{2}(P + \tilde{d}))$ and $O(Nd + mN^{2})$, respectively, where N denotes the number of nodes, $\bar{d}$ is the average node degree, d is the hidden dimension, m is the number of factors, L is the number of encoder layers, and P is the random walk length. Specifically, the time and memory complexities of the representation disentangling module are $O(LN\bar{d}d + N^{2}d)$ and $O(Nd + N^{2})$, respectively. The factor subgraph generation module takes $O(mN^{2}\tilde{d})$ time and $O(mN^{2})$ memory. For the prototype kernel embedding module, the time and memory complexities are $O(mN^{2}P)$ and $O(mN^{2})$, respectively. Because $m\tilde{d} = d$, the $N^{2}d$ term of the first module equals $mN^{2}\tilde{d}$ and is therefore absorbed into the overall time bound.

3.6. Discussion

Identifiability of latent factor decomposition. A limitation of the proposed factor decomposition is that without additional constraints, the decomposition of node features into m latent factors

Z^{(l)} = {z_{1}, z_{2}, \dots z_{m}}

is not strictly identifiable. In general, any invertible transformation applied to the latent factors can produce an equivalent reconstruction, a common challenge in disentangled representation learning. To address this from a practical perspective, PSDGNN incorporates two mechanisms that promote factor uniqueness and disentanglement:

(1): Factor mutual information loss ( $L_{f a c t o r}$ ). Following prior work on mutual information-based disentanglement [37,38], we minimize the mutual information among different factor subgraphs. This encourages each factor to capture distinct, non-redundant structural patterns, reducing the degrees of freedom in the decomposition space and promoting a form of practical identifiability.
(2): One-to-one prototype alignment. Each factor subgraph $S G_{m}$ is forced to align with a dedicated learnable prototype subgraph $P G_{m}$ through the random walk kernel similarity. This additional supervision signal anchors each factor to a specific structural prototype, further constraining the decomposition space. While the prototype–factor mapping is one-to-one, overlapping substructures in the original graph are not discarded. Instead, they are captured through the composition of multiple prototypes in the later factor aggregation stage. For example, in a molecular graph with overlapping functional groups (e.g., a benzene ring fused with a pyrrole ring), different prototypes activate for different subregions, and the factor representation aggregates these prototype activations to form a holistic representation. The one-to-one constraint applies to the mapping definition, not to the activation pattern—a single node or edge can contribute to multiple prototypes, and multiple prototypes can jointly represent overlapping semantics.
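The non-identifiability noted above can be illustrated with the inner-product decoder of Eq. (2): multiplying $Z^{(l)}$ on the right by any orthogonal matrix Q leaves $Z^{(l)}(Z^{(l)})^{T}$, and hence $A_{c}$, unchanged, while the column blocks that define the individual factors change. The small NumPy check below is our own illustration with arbitrary random data.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 4))                    # N = 5 nodes, d^(l) = 4, so m = 2 factors of width 2
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # random orthogonal matrix (Q^T Q = I)
Z_rot = Z @ Q                                  # mixes columns across the factor blocks

# The adjacency reconstruction sigmoid(Z Z^T) depends on Z only through Z Z^T.
same_reconstruction = np.allclose(Z @ Z.T, Z_rot @ Z_rot.T)
# The first latent factor z_1 = Z[:, :2] is generally different after rotation.
same_first_factor = np.allclose(Z[:, :2], Z_rot[:, :2])
print(same_reconstruction, same_first_factor)
```

This is why additional signals, such as the two mechanisms listed above, are needed to pin down a particular factorization.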

4. Experiments

In this section, we first describe the experimental setup and present graph classification experiments conducted on seven real-world benchmark datasets to evaluate the performance of the proposed method. Next, we explore the stability and convergence of model training. Subsequently, ablation experiments are performed to demonstrate the effectiveness of the various modules in our framework, and parameter sensitivity experiments are carried out to empirically examine the influence of different hyperparameters. Finally, we present a visualization analysis to investigate PSDGNN’s encoding variability and interpretability.

4.1. Experiments on Benchmark Datasets

Datasets. The seven benchmark datasets used in this study are among the most widely used public datasets for graph classification and are recognized as standard evaluation benchmarks in the community. They cover three representative domains (chemical compounds, biological proteins, and social networks) and differ substantially in graph size, node feature richness, and number of classes, which enables a comprehensive evaluation of PSDGNN's generalization ability and robustness across different scenarios; their statistics are detailed in Table 3. For chemical compounds, MUTAG [19], NCI1 [41], and PTC [42] were selected. MUTAG consists of aromatic and heteroaromatic nitro compounds labeled as mutagenic or non-mutagenic, NCI1 screens compounds for activity against non-small-cell lung cancer, and PTC predicts carcinogenicity in rodents. For biological proteins, PROTEINS [43] and DD [44] were selected; in both, the task is to determine whether a protein belongs to the enzyme class. For social networks, IMDB-B [45] and IMDB-M [45] were selected. These datasets originate from the Internet Movie Database and represent movie collaboration networks, and the task is to classify each actor collaboration graph into its corresponding movie genre.

Baseline. To evaluate PSDGNN, we select several representative baseline methods for graph classification, including graph kernels, graph neural networks, and hybrid approaches that combine the two. The graph kernel methods are the WL kernel [46], RetGK [47], AEGK [48], and DSP-I [49]; the graph neural network models are GIN [10], SLIM [30], AdaSNN [25], SEKGIN [50], and HA-SCN [51]; and the recently proposed hybrid approaches are KerGNN [22], GKNN [36], and GCKSVM [52]. To ensure a fair comparison, all methods are run on identical hardware with their default parameter settings.

Parameter settings. The learning rate is set to 0.01 by default, with a batch size of 32 graphs and 400 training iterations. Classification accuracy is estimated with ten-fold cross-validation. The number of factor subgraphs is set to 16, the path length of the random walk kernel is 3, and sum pooling is used as the default graph readout. For the social network datasets, which have no node attributes, node degree is used as the node feature.
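The default settings and the degree-based node features can be summarized in a short sketch. One common way to use node degree as a feature is a one-hot encoding of each node's degree, capped at a maximum value; the one-hot choice, the cap `max_degree`, and the configuration key names below are assumptions for illustration, since the text does not state whether raw or one-hot degrees are used.

```python
import numpy as np

# Default settings reported in the text (key names are hypothetical).
DEFAULT_CONFIG = {
    "learning_rate": 0.01,
    "batch_size": 32,
    "iterations": 400,
    "cv_folds": 10,
    "num_factor_subgraphs": 16,
    "random_walk_length": 3,
    "graph_pooling": "sum",
}


def degree_one_hot(adj: np.ndarray, max_degree: int) -> np.ndarray:
    """Return an (n, max_degree + 1) one-hot matrix of node degrees.

    Degrees larger than max_degree are clipped into the last bucket
    (an assumed convention for unattributed social network graphs).
    """
    degrees = np.minimum(adj.sum(axis=1).astype(int), max_degree)
    features = np.zeros((adj.shape[0], max_degree + 1), dtype=np.float32)
    features[np.arange(adj.shape[0]), degrees] = 1.0
    return features


# A 4-node star graph: node 0 is connected to nodes 1, 2 and 3.
adj = np.array([
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
])
x = degree_one_hot(adj, max_degree=2)
print(x)
```

Here the hub (degree 3) is clipped into the last bucket, while the three leaves share the degree-1 encoding.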

Experimental environment. All experiments in this paper are conducted on an NVIDIA RTX 4090D GPU with 24 GB of VRAM and a Xeon® Platinum 8481C CPU, running Ubuntu 16.04.1 LTS (64-bit). The code is implemented in Python 3.10 with the PyTorch 2.2.1 deep learning framework.

To evaluate the performance of PSDGNN, its graph classification accuracy is compared with that of the baseline methods listed above on the seven graph datasets. The results are presented in Table 4, where the best result on each dataset is highlighted in bold, the second-best result is underlined, and “-” indicates that results for that method could not be obtained on that dataset. As shown in Table 4, PSDGNN achieves the best accuracy on six of the seven datasets, the exception being MUTAG. Compared with graph kernel methods (e.g., WL, RetGK, and AEGK), PSDGNN achieves notably higher accuracy, which points to the strength of its subgraph structure modeling. SEKGIN, a subgraph-based graph neural network, achieves the second-best overall performance among the baselines, underscoring the importance of subgraph encoding. We attribute PSDGNN's advantage over SEKGIN to its use of community subgraph information, whereas SEKGIN relies on fixed neighborhood subgraphs. Compared with the hybrid kernel-GNN methods GKNN, KerGNN, and GCKSVM, PSDGNN also achieves higher classification accuracy, with average improvements of 8% and 10% on MUTAG and PTC, respectively, indicating that incorporating community subgraph information into graph embeddings can enhance the performance of graph neural networks. Figure 3 shows the per-fold results of PSDGNN in ten-fold cross-validation on the seven datasets.

4.2. Ablation Analysis

Module Ablation: To investigate the impact of individual modules on performance, we conducted ablation studies to evaluate the contributions of the components within PSDGNN. Specifically, we compared the original model with three variants: w/o GraphVAE (PSDGNN without the modified GraphVAE), w/o PG (PSDGNN without the prototype subgraph module and the prototype kernel embedding module), and GNN (PSDGNN without the modified GraphVAE, with the factor subgraph generation module and the prototype kernel embedding module). Note that the w/o GraphVAE variant omits both the decoding component and the optimization objective of the enhanced GraphVAE. As shown in Figure 4a, PSDGNN consistently outperforms the w/o PG variant across all datasets, which suggests that prototype subgraphs help the model capture typical features relevant to classification. PSDGNN also outperforms the w/o GraphVAE variant, possibly because GraphVAE learns meaningful, interpretable latent representations of graphs.

Loss Ablation: To investigate the impact of the individual loss functions on performance, we conducted a second ablation study, comparing the original model with three variants: w/o GraphVAE Loss (removing the GraphVAE loss function), w/o Factor Loss (removing the factor mutual information loss function), and GNN (PSDGNN without the GraphVAE modifications, retaining the original subgraph module and the original kernel embedding module). The results are shown in Figure 4b. Compared with the original model, the classification accuracy of the w/o GraphVAE Loss variant decreases by approximately 2%, likely because the GraphVAE loss helps preserve critical structural information of the input graph and prevents information loss. Removing the factor loss decreases classification accuracy by approximately 1%, likely because the factor loss encourages independence between latent variables, so that each factor subgraph corresponds to a distinct, non-redundant connectivity pattern.

Effect of Independence Constraints on Graph Classification: In Table 5, we evaluate three representative paradigms for enforcing feature independence: HSIC (kernel-based), distance correlation (distance-based), and factor mutual information (Factor MI; information-theoretic). Factor MI achieves the best or second-best performance on all seven datasets, suggesting that information-theoretic independence constraints are well suited to graph-structured data. HSIC performs competitively on MUTAG (92.9%) and PTC (73.4%) and ranks second in most cases, confirming kernel-based methods as strong baselines, whereas distance correlation yields comparatively lower accuracy, suggesting that distance-based independence measures may be less sensitive to structural dependencies in graphs. Based on these results, we adopt Factor MI as the default independence constraint. Alternative disentanglement strategies, such as contrastive objectives (e.g., InfoNCE) or independence-promoting regularizers (e.g., a Total Correlation penalty), may also be effective, but the consistently strong performance of Factor MI across the seven benchmarks justifies its selection. Exploring these alternatives in graph representation learning remains an interesting direction for future work.
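The two alternative independence measures in Table 5 can be estimated from a batch of factor embeddings. The sketch below implements the standard biased empirical HSIC with Gaussian kernels and the sample distance correlation of Székely et al.; the kernel bandwidth `sigma`, the function names, and how PSDGNN applies such penalties across pairs of factors are assumptions, not details taken from the paper.

```python
import numpy as np


def _gaussian_gram(x: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian kernel matrix exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def hsic(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2 with centering matrix H."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    k, l = _gaussian_gram(x, sigma), _gaussian_gram(y, sigma)
    return float(np.trace(k @ h @ l @ h)) / (n - 1) ** 2


def distance_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Sample distance correlation (Szekely et al., 2007); lies in [0, 1]."""
    def centered_dist(z):
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        # Double centering: subtract row and column means, add the grand mean.
        return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

    a, b = centered_dist(x), centered_dist(y)
    dcov2_xy, dcov2_xx, dcov2_yy = (a * b).mean(), (a * a).mean(), (b * b).mean()
    denom = np.sqrt(dcov2_xx * dcov2_yy)
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom)) if denom > 0 else 0.0


rng = np.random.default_rng(0)
x = rng.standard_normal((200, 3))        # e.g. embeddings of one factor
y_indep = rng.standard_normal((200, 3))  # an unrelated factor
print(hsic(x, x), hsic(x, y_indep))
print(distance_correlation(x, x), distance_correlation(x, y_indep))
```

In the population limit, HSIC with a characteristic kernel such as the Gaussian, and distance correlation, are zero only for independent inputs, which is why either can serve as a penalty between pairs of factor embeddings.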

Comparison with Attention-Based Soft Matching. To validate the design choice of strict prototype alignment, we implemented an alternative in which each factor subgraph is matched with all prototype subgraphs via an attention mechanism. As shown in Figure 5, strict alignment consistently outperforms the attention-based variant on all seven datasets, with an average improvement of 2–4%. We attribute this to two reasons. First, strict alignment maintains a clear one-to-one correspondence between factor subgraphs and prototype subgraphs, which is crucial for disentanglement: each factor is forced to align with a distinct structural prototype, whereas attention can blend multiple prototypes into a single factor, so strict alignment preserves interpretability at the factor level. Second, the attention-based variant introduces additional parameters and optimization complexity, which may lead to overfitting, particularly on small datasets such as MUTAG and PTC, where the performance gap is most pronounced. We therefore retain strict prototype alignment as the default mechanism in our framework.
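The difference between the two matching schemes can be illustrated on a toy similarity matrix, where entry (m, j) is the kernel similarity between factor subgraph m and prototype j. This is a sketch under assumptions: the paper does not specify how the attention variant combines prototypes, so here it is taken to be a row-wise softmax over similarities followed by an attention-weighted average; the function names and the `temperature` parameter are hypothetical.

```python
import numpy as np


def strict_alignment(sim: np.ndarray) -> np.ndarray:
    """One-to-one matching: factor subgraph m is scored only against prototype m."""
    return np.diag(sim).copy()


def attention_matching(sim: np.ndarray, temperature: float = 1.0):
    """Assumed soft matching: each factor attends over all prototypes (row-wise softmax).

    Returns the attention-weighted similarity per factor and the attention weights.
    """
    logits = sim / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * sim).sum(axis=1), weights


# sim[m, j]: kernel similarity between factor subgraph m and prototype j (toy values).
sim = np.array([
    [0.9, 0.8, 0.1],
    [0.2, 0.7, 0.6],
    [0.1, 0.3, 0.8],
])
print(strict_alignment(sim))  # [0.9 0.7 0.8]
soft, weights = attention_matching(sim, temperature=0.1)
print(weights.round(2))
```

In this toy case, factor 0 gives prototype 1 roughly a quarter of its attention weight, which is the kind of prototype blending that strict alignment rules out.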

4.3. Parameter Sensitivity Analysis

Number of Factor Subgraphs M: When constructing factor subgraphs, the hyperparameter M controls the number of groups into which node features are split. To study its effect, graph classification experiments were performed with M ranging from 2 to 12 in steps of 2, as shown in Figure 6. On the MUTAG, PROTEINS, IMDB-B, and IMDB-M datasets, classification accuracy peaks at approximately M = 4, while on the PTC and NCI1 datasets, the highest accuracy is achieved at approximately M = 6. Overall, the best-performing values of M lie between 4 and 8. This is likely because node features in different datasets carry different amounts of information: an excessively large M introduces noise and weakens the discriminative power of each factor subgraph, while an excessively small M prevents the model from fully capturing factor subgraphs at different levels, reducing their representational capacity.
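A simple way to see how M interacts with the embedding size is to split each node embedding into M equal-sized blocks. The sketch below assumes equal-sized contiguous blocks with per-node L2 normalization of each block, a common choice in disentangled GNNs; PSDGNN's exact decomposition may differ, and `split_into_factors` is a hypothetical name.

```python
import numpy as np


def split_into_factors(h: np.ndarray, num_factors: int) -> np.ndarray:
    """Split node embeddings (n, d) into num_factors groups -> (num_factors, n, d // num_factors).

    Each factor block is L2-normalised per node (an assumed convention).
    """
    n, d = h.shape
    if d % num_factors != 0:
        raise ValueError("embedding dimension must be divisible by the number of factors")
    # Contiguous blocks of each row become separate factors.
    blocks = h.reshape(n, num_factors, d // num_factors).transpose(1, 0, 2)
    norms = np.linalg.norm(blocks, axis=-1, keepdims=True)
    return blocks / np.maximum(norms, 1e-12)


h = np.random.default_rng(0).standard_normal((5, 24))  # 5 nodes, 24-dim embeddings
for m in (4, 6):
    z = split_into_factors(h, m)
    print(m, z.shape)  # 4 (4, 5, 6) then 6 (6, 5, 4)
```

Under this assumption, for a fixed embedding size a larger M leaves fewer dimensions per factor, which is one plausible reason why an overly large M can weaken the discriminative power of each factor subgraph.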

Number of Graph Encoder Layers: To investigate the impact of the number of encoder layers, we examined how graph classification accuracy evolves over training epochs for PSDGNN with different numbers of graph encoder layers in its GraphVAE module on six datasets. As shown in Figure 7, with the number of layers ranging from 1 to 4, the best performance on all datasets is achieved with two layers, and accuracy begins to decline when more than two layers are used. This is a consequence of message passing over the graph structure: with too many layers, the aggregated neighborhood information of different nodes becomes too similar, so node representations lose their distinctiveness and classification accuracy decreases. This phenomenon, commonly observed in message-passing neural networks, is known as oversmoothing. The factor subgraph mutual information loss appears to give the model some resistance to oversmoothing, so the decline in accuracy is relatively mild as the number of layers increases.
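The oversmoothing effect can be reproduced with a toy experiment: repeatedly applying a GCN-style normalized propagation to random node features drives the directions of the node representations toward each other. This is a generic illustration with linear propagation only (no weights, nonlinearities, or factor MI loss); it is not PSDGNN's encoder, and the helper names are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalisation D^{-1/2} (A + I) D^{-1/2}, as in GCN-style layers."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def mean_pairwise_distance(x: np.ndarray) -> float:
    """Average Euclidean distance over all ordered node pairs (including i == j)."""
    diffs = x[:, None, :] - x[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).mean())


# Path graph on 6 nodes with random 4-dimensional features (toy example).
n = 6
adj = np.zeros((n, n))
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
x = np.random.default_rng(0).standard_normal((n, 4))
s = normalized_adjacency(adj)

distances = []
h = x
for layer in range(1, 33):
    h = s @ h  # one propagation step per "layer"
    # Row-normalise so the measure reflects direction rather than overall scale.
    hn = h / np.linalg.norm(h, axis=1, keepdims=True)
    distances.append(mean_pairwise_distance(hn))
print([round(distances[k], 3) for k in (0, 1, 3, 31)])
```

As the number of propagation steps grows, the node directions converge toward the dominant eigenvector of the propagation matrix, so the pairwise distances shrink toward zero.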

Weighting Hyperparameters in Loss Functions: During model optimization, we introduce hyperparameters $\alpha$ and $\beta$, which weight the GraphVAE loss and the factor mutual information loss, respectively. This experiment investigates how varying these hyperparameters affects classification accuracy on three datasets. As shown in Figure 8, the highest classification accuracy is achieved with $\alpha = 0.6$ and $\beta = 0.2$ on MUTAG, with $\alpha = 0.2$ and $\beta = 1.0$ on DD, and with $\alpha = 0.2$ and $\beta = 0.6$ on Pubmed. This suggests that the two loss functions provide PSDGNN with self-supervised signals that help the subgraph representations distinguish between different graph instances, and that, through these two losses, the proposed subgraph disentanglement mechanism can extract different key information from different types of datasets.
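The weighting experiment presupposes an overall objective in which $\alpha$ and $\beta$ scale the two auxiliary losses. The sketch below assumes an additive objective $\mathcal{L} = \mathcal{L}_{cls} + \alpha \mathcal{L}_{VAE} + \beta \mathcal{L}_{factor}$ and uses the standard VGAE loss (binary cross-entropy reconstruction from $\sigma(ZZ^{\top})$ plus a KL term) for the GraphVAE part. The exact form of the modified GraphVAE loss is not given in this section, and the function names are hypothetical.

```python
import numpy as np


def vgae_loss(adj: np.ndarray, z: np.ndarray, mu: np.ndarray, logvar: np.ndarray) -> float:
    """Standard VGAE objective (assumed form): BCE reconstruction plus KL(q || N(0, I)).

    adj: (n, n) binary adjacency; z: sampled latents; mu, logvar: posterior parameters.
    """
    logits = z @ z.T
    # Numerically stable binary cross-entropy with logits, averaged over node pairs.
    bce = np.mean(np.maximum(logits, 0) - logits * adj + np.log1p(np.exp(-np.abs(logits))))
    # KL divergence to a standard normal prior, averaged over nodes.
    kl = -0.5 * np.mean(np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))
    return float(bce + kl)


def total_loss(l_cls: float, l_vae: float, l_factor: float, alpha: float, beta: float) -> float:
    """Assumed weighted objective: classification + alpha * GraphVAE + beta * factor MI."""
    return l_cls + alpha * l_vae + beta * l_factor


# Example with the best MUTAG weights reported above (loss values are made up).
print(total_loss(0.5, 1.2, 0.3, alpha=0.6, beta=0.2))
```

With zero latents and a standard-normal posterior, the reconstruction term reduces to log 2 per node pair and the KL term vanishes, which gives a quick sanity check of the implementation.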

The Path Length P of the Random Walk Kernel: In the prototype kernel embedding module, random walk kernels are used to compute the similarity between factor subgraphs and prototype subgraphs. To investigate the impact of the path length P on model performance, we evaluated PSDGNN with different values of P on six datasets. As shown in Figure 9, on the MUTAG, PTC, PROTEINS, and NCI1 datasets, PSDGNN performs best when $P = 2$ or $P = 3$, whereas on the IMDB-B and IMDB-M datasets, it performs best when $P = 4$. This suggests that the optimal path length for social networks is longer than that for bioinformatics graphs. This is plausible because most basic functional groups in molecules have diameters of around 2 to 3, whereas long-range dependencies play a more important role in social networks.
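To make the role of P concrete, the sketch below computes a P-step random walk kernel on the direct product graph, in the style of random walk GNNs: with node-pair feature similarities $s = \mathrm{vec}(X_1 X_2^{\top})$, it sums $s^{\top} (A_1 \otimes A_2)^{p} s$ over walk lengths $p = 0, \dots, P$. Whether PSDGNN sums over lengths, uses each length separately, or normalizes the result is an assumption here, and `random_walk_kernel` is a hypothetical name.

```python
import numpy as np


def random_walk_kernel(a1, x1, a2, x2, max_length: int) -> float:
    """P-step random walk kernel on the direct product graph (assumed variant).

    k(G1, G2) = sum_{p=0}^{P} s^T (A1 kron A2)^p s, where s = vec(X1 X2^T)
    scores the feature similarity of every node pair (i in G1, j in G2).
    """
    s = (x1 @ x2.T).reshape(-1)  # row-major: pair (i, j) -> index i * n2 + j
    a_prod = np.kron(a1, a2)     # adjacency of the direct product graph (same indexing)
    total, walk = 0.0, s.copy()
    for _ in range(max_length + 1):
        total += float(s @ walk)  # contribution of walks of the current length
        walk = a_prod @ walk
    return total


# Two identical triangles with a constant 1-dimensional node feature.
tri = np.ones((3, 3)) - np.eye(3)
ones = np.ones((3, 1))
print(random_walk_kernel(tri, ones, tri, ones, 2))  # 189.0 = 9 + 36 + 144
```

Increasing P adds contributions from longer common walks, so larger P lets the similarity reflect longer-range structure; the explicit Kronecker product is only practical for small graphs such as factor and prototype subgraphs.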

The Size of the Prototype Subgraph: In our framework, each learnable prototype subgraph is parameterized by a trainable adjacency matrix and a trainable feature matrix, with the prototype subgraph size matching that of the factor subgraph. To investigate the impact of the prototype subgraph size on model performance, we conducted experiments with prototype sizes ranging from 2 to 16. As shown in Figure 10, the optimal prototype size varies substantially across datasets, reflecting differences in graph complexity and structural diversity. For small molecular graphs such as MUTAG and PTC, which have average node counts of 17 and 19, respectively, the best performance is achieved with a prototype size of 4. Smaller prototypes (e.g., size 2) lack the capacity to capture discriminative functional groups such as nitro groups or aromatic rings, while larger prototypes (e.g., size 8 or above) introduce spurious structures that lead to overfitting. For medium-sized graphs, including PROTEINS and IMDB-B, which have average node counts of 20 to 25, the optimal prototype size increases to 6. This larger capacity allows the model to capture more complex structural motifs, such as secondary protein structures or dense collaboration communities. For the largest datasets in our study (DD, NCI1, and IMDB-M), where average node counts range from 30 to nearly 70, the optimal prototype size ranges from 8 to 10. These larger prototypes are needed to encode the diverse and elaborate topological patterns present in large molecular compounds, protein enzymes, and multi-genre movie collaboration networks. When the prototype size exceeds 12, performance degrades consistently across all datasets owing to increased model complexity and sparser similarity matching between factor subgraphs and prototypes. Conversely, when the prototype size is too small (e.g., size 2), the prototypes lack the expressive power to capture the distinguishing substructures of each class.
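A learnable prototype subgraph can be parameterized with unconstrained arrays that are mapped to a valid weighted adjacency matrix on every forward pass. The sketch below assumes symmetrization, a sigmoid squashing into [0, 1], and a zeroed diagonal; these choices and the helper names are illustrative assumptions. In training, the arrays would be registered as trainable parameters of a deep learning framework and optimized jointly with the rest of the model, whereas NumPy is used here only to show the forward mapping.

```python
import numpy as np


def prototype_from_params(adj_logits: np.ndarray, feat: np.ndarray):
    """Map unconstrained parameters to a prototype subgraph (adjacency, features).

    The adjacency is symmetrised, squashed into [0, 1] with a sigmoid, and its
    diagonal is zeroed so the prototype has no self-loops (assumed conventions).
    """
    sym = 0.5 * (adj_logits + adj_logits.T)
    adj = 1.0 / (1.0 + np.exp(-sym))
    np.fill_diagonal(adj, 0.0)
    return adj, feat


def init_prototypes(num_prototypes: int, size: int, feat_dim: int, seed: int = 0):
    """Randomly initialise num_prototypes prototypes with `size` nodes each."""
    rng = np.random.default_rng(seed)
    return [
        (rng.standard_normal((size, size)), rng.standard_normal((size, feat_dim)))
        for _ in range(num_prototypes)
    ]


params = init_prototypes(num_prototypes=4, size=6, feat_dim=8)
adj, feat = prototype_from_params(*params[0])
print(adj.shape, feat.shape)  # (6, 6) (6, 8)
```

The `size` argument corresponds to the prototype size varied in the experiment above.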

4.4. Visualization Analysis

To assess the discriminative power of PSDGNN, we use t-SNE to project the high-dimensional graph embeddings into 2D space. As shown in Figure 11, the embeddings of different classes form well-separated clusters on MUTAG, PROTEINS, and DD, indicating that PSDGNN successfully learns class-discriminative representations.

To illustrate the disentanglement process more clearly, we present an example of the factor subgraphs generated by PSDGNN. Figure 12 shows the original graph and its disentangled factor subgraphs: after a threshold is applied to the edge coefficients, the edges of each factor subgraph are highlighted in a different color. In the MUTAG dataset, for instance, the task is to predict the mutagenicity of nitroaromatic compounds on Salmonella typhimurium. Different parts of the molecular graph play distinct roles in the prediction, which supports the reliability of the generated factor subgraphs as a disentangled graph representation.

5. Conclusions

This paper proposes the Prototype Subgraph Disentangled Graph Neural Network (PSDGNN), which enhances interpretability by decomposing input graph features into latent factor subgraphs and aligning them with learnable prototype subgraphs via random walk kernels. It also introduces a factor mutual information loss that encourages different factor subgraphs to learn distinct latent connectivity patterns, thereby improving model generalization.

Experimental results on seven public benchmark datasets show that PSDGNN outperforms existing methods on six of the seven datasets and remains competitive on MUTAG. Notably, PSDGNN achieves a 5.5% accuracy improvement over the best baseline on DD and a 4.2% improvement on IMDB-B, supporting the effectiveness of its disentanglement and prototype alignment mechanisms. On average, PSDGNN outperforms traditional graph kernels by 8–10% and GNN-based methods by 3–5% on challenging datasets. Beyond accuracy, PSDGNN provides structured interpretability through visualizable factor subgraphs and prototype subgraphs, and it alleviates the oversmoothing problem in GNNs.

Future work will focus on exploring composition methods based on other graph kernel approaches and investigating the model’s potential applications in other graph tasks, such as node classification and link prediction. In addition, the structured factor subgraphs disentangled by our method have the potential to provide interpretable substructure priors for large-model-based graph reasoning tasks. Exploring the integration of factor subgraphs with large models is a promising direction for future work.

Author Contributions

Conceptualization, B.Y.; methodology, B.Y.; software, B.Y.; validation, B.Y.; formal analysis, B.Y.; investigation, B.Y.; resources, L.X.; data curation, J.Y.; writing—original draft preparation, B.Y.; writing—review and editing, L.X. and Y.T.; visualization, J.Y.; supervision, L.X. and Y.T.; project administration, L.X. and Y.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the University Synergy Innovation Program of Anhui Province (GXXT-2022-047); the Institute of Complex Networks and Computational Intelligence, Suzhou University (2021XJPT50); Municipal-Level Research Platforms in Suzhou City (2022SJPT05); and Anhui Province’s Training Initiative for Middle-Aged and Young Teachers: Overseas Study Program for Young Core Teachers (gxgwfx2020063).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Baosheng Yang was employed by the company Suzhou Industrial Investment Holding Group Co., Ltd. The remaining authors declare that the present research study was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

1. Feng, X.; Sheng, L.; Zhu, L.; Feng, Y.; Wei, C.; Xiao, X.; Wang, H. Traffic Flow Prediction in Complex Transportation Networks via a Spatiotemporal Causal–Trend Network. Mathematics 2026, 14, 443.
2. Peng, H.; Li, J.; He, Y.; Liu, Y.; Bao, M.; Wang, L.; Song, Y.; Yang, Q. Large-scale hierarchical text classification with recursively regularized deep graph-cnn. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 1063–1072.
3. Zhang, L.; Yi, L.; Liu, Y.; Wang, C.; Zhou, D. Motif Entropy Graph Kernel. Pattern Recognit. 2023, 140, 109544.
4. Xu, L.; Wang, Z.; Bai, L.; Ji, S.; Ai, B.; Wang, X.; Yu, P.S. Multi-level knowledge distillation with positional encoding enhancement. Pattern Recognit. 2025, 163, 111458.
5. Essam, J.W.; Fisher, M.E. Some basic definitions in graph theory. Rev. Mod. Phys. 1970, 42, 271.
6. Sejan, M.A.S.; Rahman, M.H.; Aziz, M.A.; Hameed, I.; Islam, M.S.; Sabuj, S.R.; Song, H.K. Learning robust node representations via graph neural network and multilayer perceptron classifier. Mathematics 2026, 14, 680.
7. Yang, M.; He, Y. A Link Prediction Algorithm Based on Layer Attention Mechanism for Multiplex Networks. Mathematics 2025, 13, 3803.
8. Kudo, T.; Maeda, E.; Matsumoto, Y. An application of boosting to graph classification. Adv. Neural Inf. Process. Syst. 2004, 17, 729–736.
9. Hu, Y.; Lu, J.; Zhao, X.; Li, Y.; Tian, Z.; Li, Z. ProcessGFM: A Domain-Specific Graph Pretraining Prototype for Predictive Process Monitoring. Mathematics 2025, 13, 3991.
10. Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv 2019, arXiv:1810.00826.
11. Niepert, M.; Ahmed, M.; Kutzkov, K. Learning convolutional neural networks for graphs. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 2014–2023.
12. Simonovsky, M.; Komodakis, N. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3693–3702.
13. Wang, X.; Zhang, M. How powerful are spectral graph neural networks? In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 23341–23362.
14. Guo, J.; Huang, K.; Yi, X.; Zhang, R. Graph neural networks with diverse spectral filtering. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 306–316.
15. Duvenaud, D.K.; Maclaurin, D.; Iparraguirre, J.; Bombarell, R.; Hirzel, T.; Aspuru-Guzik, A.; Adams, R.P. Convolutional networks on graphs for learning molecular fingerprints. Adv. Neural Inf. Process. Syst. 2015, 28, 2224–2232.
16. Ruiz, L.; Gama, F.; Ribeiro, A. Gated graph recurrent neural networks. IEEE Trans. Signal Process. 2020, 68, 6303–6318.
17. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1263–1272.
18. Zhang, M.; Cui, Z.; Neumann, M.; Chen, Y. An end-to-end deep learning architecture for graph classification. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
19. Debnath, A.K.; Lopez de Compadre, R.L.; Debnath, G.; Shusterman, A.J.; Hansch, C. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. J. Med. Chem. 1991, 34, 786–797.
20. Xu, L.; Peng, J.; Jiang, X.; Chen, E.; Luo, B. Graph neural network based on graph kernel: A survey. Pattern Recognit. 2025, 161, 111307.
21. Nikolentzos, G.; Vazirgiannis, M. Random walk graph neural networks. Adv. Neural Inf. Process. Syst. 2020, 33, 16211–16222.
22. Feng, A.; You, C.; Wang, S.; Tassiulas, L. KerGNNs: Interpretable graph neural networks with graph kernels. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 6614–6622.
23. Alsentzer, E.; Finlayson, S.; Li, M.; Zitnik, M. Subgraph neural networks. Adv. Neural Inf. Process. Syst. 2020, 33, 8017–8029.
24. Sun, Q.; Li, J.; Peng, H.; Wu, J.; Ning, Y.; Yu, P.S.; He, L. SUGAR: Subgraph neural network with reinforcement pooling and self-supervised mutual information mechanism. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 2081–2091.
25. Li, J.; Sun, Q.; Peng, H.; Yang, B.; Wu, J.; Yu, P.S. Adaptive subgraph neural network with reinforced critical structure mining. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 8063–8080.
26. Yu, J.; Xu, T.; Rong, Y.; Bian, Y.; Huang, J.; He, R. Recognizing predictive substructures with subgraph information bottleneck. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 46, 1650–1663.
27. Ertl, P.; Altmann, E.; McKenna, J.M. The most common functional groups in bioactive molecules and how their popularity has evolved over time. J. Med. Chem. 2020, 63, 8408–8418.
28. Qin, X.; Bai, L.; Cui, L.; Li, M.; Du, H.; Hancock, E. MultiNet: Adaptive Multi-Viewed Subgraph Convolutional Networks for Graph Classification. Adv. Neural Inf. Process. Syst. 2026, 38, 67709–67729.
29. Wang, Z.; Cao, Q.; Shen, H.; Xu, B.; Cheng, X. Twin Weisfeiler-Lehman: High expressive GNNs for graph classification. arXiv 2022, arXiv:2203.11683.
30. Zhu, Y.; Zhang, K.; Wang, J.; Ling, H.; Zhang, J.; Zha, H. Structural landmarking and interaction modelling: A “slim” network for graph classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 9251–9259.
31. Bevilacqua, B.; Frasca, F.; Lim, D.; Srinivasan, B.; Cai, C.; Balamurugan, G.; Bronstein, M.M.; Maron, H. Equivariant Subgraph Aggregation Networks. In Proceedings of the International Conference on Learning Representations, Virtual, 25 April 2022.
32. Bouritsas, G.; Frasca, F.; Zafeiriou, S.; Bronstein, M.M. Improving graph neural network expressivity via subgraph isomorphism counting. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 657–668.
33. Xu, L.; Jiang, K.; Niu, X.; Chen, E.; Luo, B.; Yu, P.S. GL-BKGNN: Graphlet-based Bi-Kernel Interpretable Graph Neural Networks. Inf. Fusion 2025, 123, 103284.
34. Lei, T.; Jin, W.; Barzilay, R.; Jaakkola, T. Deriving neural architectures from sequence and graph kernels. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 2024–2033.
35. Chen, D.; Jacob, L.; Mairal, J. Convolutional kernel networks for graph-structured data. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 1576–1586.
36. Cosmo, L.; Minello, G.; Bicciato, A.; Bronstein, M.M.; Rodolà, E.; Rossi, L.; Torsello, A. Graph kernel neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 6257–6270.
37. Poole, B.; Ozair, S.; Van Den Oord, A.; Alemi, A.; Tucker, G. On variational bounds of mutual information. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 5171–5180.
38. Liang, X.; Li, D.; Madden, A. Attributed network embedding based on mutual information estimation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual, 19–23 October 2020; pp. 835–844.
39. Maddison, C.J.; Mnih, A.; Teh, Y.W. The concrete distribution: A continuous relaxation of discrete random variables. arXiv 2016, arXiv:1611.00712.
40. Vishwanathan, S.V.N.; Schraudolph, N.N.; Kondor, R.; Borgwardt, K.M. Graph kernels. J. Mach. Learn. Res. 2010, 11, 1201–1242.
41. Wale, N.; Watson, I.A.; Karypis, G. Comparison of descriptor spaces for chemical compound retrieval and classification. Knowl. Inf. Syst. 2008, 14, 347–375.
42. Toivonen, H.; Srinivasan, A.; King, R.D.; Kramer, S.; Helma, C. Statistical evaluation of the predictive toxicology challenge 2000–2001. Bioinformatics 2003, 19, 1183–1193.
43. Borgwardt, K.M.; Ong, C.S.; Schönauer, S.; Vishwanathan, S.; Smola, A.J.; Kriegel, H.P. Protein function prediction via graph kernels. Bioinformatics 2005, 21, i47–i56.
44. Dobson, P.D.; Doig, A.J. Distinguishing enzyme structures from non-enzymes without alignments. J. Mol. Biol. 2003, 330, 771–783.
45. Kim, J.; Lee, J.G. Community detection in multi-layer graphs: A survey. ACM Sigmod Rec. 2015, 44, 37–48.
46. Shervashidze, N.; Schweitzer, P.; Van Leeuwen, E.J.; Mehlhorn, K.; Borgwardt, K.M. Weisfeiler-Lehman graph kernels. J. Mach. Learn. Res. 2011, 12, 2539–2561.
47. Zhang, Z.; Wang, M.; Xiang, Y.; Huang, Y.; Nehorai, A. RetGK: Graph kernels based on return probabilities of random walks. Adv. Neural Inf. Process. Syst. 2018, 31, 3968–3978.
48. Bai, L.; Cui, L.; Li, M.; Ren, P.; Wang, Y.; Zhang, L.; Yu, P.S.; Hancock, E.R. AEGK: Aligned Entropic Graph Kernels Through Continuous-Time Quantum Walks. IEEE Trans. Knowl. Data Eng. 2025, 37, 1064–1078.
49. Ye, W.; Guo, W.; Tang, S.; Tian, H.; Sun, X.; Cao, X.; Shen, H.T. Distributional Shortest-Path Graph Kernels. IEEE Trans. Knowl. Data Eng. 2025, 37, 6367–6378.
50. Yao, T.; Wang, Y.; Zhang, K.; Liang, S. Improving the expressiveness of k-hop message-passing GNNs by injecting contextualized substructure information. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 3070–3081.
51. Qin, X.; Bai, L.; Cui, L.; Li, M.; Du, H.; Wang, Y.; Hancock, E. HA-SCN: Learning Hierarchical Aligned Subtree Convolutional Networks for Graph Classification. In Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 16–22 August 2025; pp. 3245–3253.
52. Wu, Z.; Zhang, Z.; Fan, J. Graph convolutional kernel machine versus graph convolutional networks. Adv. Neural Inf. Process. Syst. 2023, 36, 19650–19672.

Figure 1. Illustration of the proposed method, PSDGNN. PSDGNN consists of four main modules (i.e., representation disentangled module, factor subgraph generation module, prototype kernel embedding module, and an MLP-based classifier). The representation disentangled module encodes node features of the input graph using a graph autoencoder and then decomposes the encoded node features into several latent factors. The factor subgraph generation module utilizes the latent factors obtained in the previous step to generate latent subgraphs, each of which can be regarded as a potential connectivity pattern of the original graph. For the prototype kernel embedding module, a set of learnable prototype subgraphs is defined. By employing a random walk kernel to compute the similarity between latent subgraphs and prototype subgraphs, the final decomposed representation of the graph is obtained. Finally, the graph classification task is accomplished through an MLP-based classifier.

Figure 2. Factor subgraph generator. An edge attention mask is used to generate a factor subgraph from the input graph. The similarity between nodes is computed from the latent factor $z_{m}$ produced by GraphVAE via inner products, yielding the attention mask $P^{(m)}$, whose entries $p_{ij}^{(m)} \in [0, 1]$ give the selection probability of the edge between nodes $i$ and $j$. To ensure that a sufficient number of edges are retained, $P^{(m)}$ is reshaped into an $R$-dimensional matrix and then binarized to obtain the edge assignment $S^{(m)}$. Because binarization is not differentiable with respect to $p_{ij}^{(m)}$, the Gumbel–Softmax reparameterization technique is used to update $S^{(m)}$, which is then transformed back into an $n \times n$ matrix. Finally, the factor subgraph is extracted via $A_{SG_{m}} = A \odot S^{(m)}$, where $\odot$ denotes element-wise multiplication, and $A$ and $A_{SG_{m}}$ are the adjacency matrices of the input graph $G$ and the factor subgraph $SG_{m}$, respectively.
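The generator in Figure 2 can be sketched as follows, under several assumptions: the mask is taken as $P^{(m)} = \sigma(z_{m} z_{m}^{\top})$ (the caption only says "inner products", so the sigmoid squashing is our assumption), the edge assignment uses the two-class Gumbel–Softmax (Binary Concrete) relaxation with temperature $\tau$, and the $R$-dimensional reshaping step for edge retention is omitted. The function `factor_subgraph` is hypothetical. NumPy has no automatic differentiation, so only the forward pass is shown; in an autodiff framework the hard threshold would typically be paired with a straight-through estimator so that gradients reach $p_{ij}^{(m)}$.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def factor_subgraph(adj, z_m, tau=0.5, rng=None, hard=True):
    """Hypothetical sketch of A_SG_m = A ⊙ S^(m) with a Binary Concrete edge mask.

    adj : (n, n) symmetric adjacency matrix of the input graph G
    z_m : (n, d) m-th latent factor, one row per node
    Returns the factor-subgraph adjacency and the attention mask P^(m).
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = 1e-9
    # Attention mask from inner products; sigmoid keeps p_ij in (0, 1) (assumed squashing)
    p = sigmoid(z_m @ z_m.T)
    # Binary Concrete relaxation: logistic noise equals the difference of two Gumbel samples
    u = rng.uniform(eps, 1.0 - eps, size=p.shape)
    logits = np.log(p + eps) - np.log(1.0 - p + eps)
    noise = np.log(u) - np.log(1.0 - u)
    s = sigmoid((logits + noise) / tau)  # relaxed edge assignment in (0, 1)
    if hard:
        # Forward-pass binarization; an autodiff version would use a straight-through estimator
        s = (s > 0.5).astype(float)
    # Keep one sample per undirected pair and mirror it: symmetric S^(m), zero diagonal
    s = np.triu(s, k=1)
    s = s + s.T
    # Element-wise product keeps only edges that exist in the input graph
    return adj * s, p


# Usage on a 4-node path graph with a 3-dimensional latent factor
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
z_m = rng.normal(size=(4, 3))
a_sg, p = factor_subgraph(adj, z_m, tau=0.5, rng=rng)
print(a_sg)
```

Lower temperatures push the relaxed assignment closer to binary values, at the cost of higher-variance gradients in a differentiable implementation.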

Figure 3. Results of ten-fold cross-validation on seven datasets.

Figure 4. Module and loss ablation analysis. (a) Module ablation. (b) Loss ablation.

Figure 5. PSDGNN with different feature fusion methods.

Figure 6. The effect of the number of factor subgraphs on classification accuracy. (a) MUTAG. (b) PTC. (c) PROTEINS. (d) NCI1. (e) IMDB-B. (f) IMDB-M.

Figure 7. The effect of the number of graph encoder layers on classification accuracy. (a) MUTAG. (b) PTC. (c) PROTEINS. (d) NCI1. (e) IMDB-B. (f) IMDB-M.

Figure 8. The impact of the loss weight hyperparameters on classification accuracy. (a) MUTAG. (b) DD. (c) IMDB-B.

Figure 9. The effect of the path length P of the random walk kernel on classification accuracy. (a) MUTAG. (b) PTC. (c) PROTEINS. (d) NCI1. (e) IMDB-B. (f) IMDB-M.

Figure 10. Parameter sensitivity of prototype subgraph size.

Figure 11. Graph embedding visualization. (a) MUTAG. (b) PROTEINS. (c) DD.

Figure 12. Visualization of factor subgraphs in the MUTAG dataset.

Table 1. Summary of subgraph neural networks and comparison with PSDGNN (“✓” in the table indicates that the graph embedding model possesses the corresponding capability).

Graph Embedding Model	Subgraph Awareness	Subgraph Diversity	Cross-Graph Prototype Sharing	Subgraph Generation Method
[25]	✓			k-Hop subgraph
[28]	✓	✓		Parameterized learning
[29]	✓			k-Hop subgraph
[30]	✓	✓		k-Hop subgraph
[31]		✓		k-Hop subgraph
[32]		✓		Predefined subgraph
PSDGNN (ours)	✓	✓	✓	Parameterized learning
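Several baselines in Table 1 build subgraphs as k-hop neighborhoods instead of learning them. As a generic illustration of that alternative (not the exact procedure of any cited method), the induced k-hop subgraph around a center node can be extracted with a breadth-first search; `k_hop_subgraph` is a hypothetical helper operating on an adjacency-list dictionary.

```python
from collections import deque


def k_hop_subgraph(adj_list: dict[int, list[int]], center: int, k: int):
    """Return the node set and undirected edge set of the induced k-hop subgraph."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == k:  # do not expand beyond k hops
            continue
        for v in adj_list[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    # Induced subgraph: keep every edge whose endpoints both lie within k hops
    edges = {(min(u, v), max(u, v)) for u in nodes for v in adj_list[u] if v in nodes}
    return nodes, edges


# Usage: 1-hop subgraph around the middle node of a 5-node path 0-1-2-3-4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(k_hop_subgraph(path, center=2, k=1))
```

Such neighborhoods are fixed by the topology and the radius k, whereas the parameterized generation used by PSDGNN can select edges according to learned latent factors.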

Table 2. Main notations in this paper.

Notation	Definition
G	Graph.
$V, X$	Node set and node attribute matrix.
A	Adjacency matrix.
$\mathcal{G}, \mathcal{Y}$	Graph dataset and its labels.
$Z^{(l)}, W^{(l-1)}$	Output of the $l$-th encoder layer and weight matrix of the $(l-1)$-th layer.
$A_{c}, X_{c}$	Reconstructed adjacency matrix and node attribute matrix.
$z_{m}, P^{(m)}$	The $m$-th latent factor and attention mask matrix.
$\tau, c_{l}$	Temperature parameter of the Concrete distribution and Gumbel noise term.
$SG_{m}, PG_{m}$	The $m$-th factor subgraph and prototype subgraph.
$G_{m}^{\times}, A_{m}^{\times}$	Direct product graph of $SG_{m}$ and $PG_{m}$ and its adjacency matrix.
$M, P$	Number of prototype graphs and path length of random walk.
$L_{\mathrm{GraphVAE}}, L_{\mathrm{factor}}, L_{\mathrm{CE}}$	GraphVAE reconstruction loss, factor mutual information loss and cross-entropy loss.
$\alpha, \beta$	Weight hyperparameters for $L_{\mathrm{GraphVAE}}$ and $L_{\mathrm{factor}}$.
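Table 2 lists $\alpha$ and $\beta$ as the weights of $L_{\mathrm{GraphVAE}}$ and $L_{\mathrm{factor}}$. Assuming (this section does not state the exact form) that they enter the training objective as a weighted sum alongside the classification loss, the overall objective would read:

$$
L = L_{\mathrm{CE}} + \alpha \, L_{\mathrm{GraphVAE}} + \beta \, L_{\mathrm{factor}}
$$

Under this reading, setting $\alpha = 0$ or $\beta = 0$ simply removes the corresponding auxiliary term.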

Table 3. Statistics of datasets.

Field	Dataset	Graphs	Classes	Avg. Nodes	Node Features
Chemical compounds	MUTAG	188	2	17	7
Chemical compounds	NCI1	4110	2	29	37
Chemical compounds	PTC	344	2	14	19
Biological proteins	DD	1178	2	284	18
Biological proteins	PROTEINS	1113	2	39	3
Social networks	IMDB-B	1000	2	19	-
Social networks	IMDB-M	1500	3	13	-

Table 4. Summary of experimental results: classification accuracy (%) on each dataset (mean ± standard deviation).

Method	MUTAG	PTC	PROTEINS	DD	NCI1	IMDB-B	IMDB-M
WL	90.4 ± 5.7	59.9 ± 4.3	75.0 ± 3.1	79.4 ± 0.3	86.0 ± 1.8	73.8 ± 3.9	50.9 ± 3.8
RetGK	90.3 ± 1.1	62.5 ± 1.6	75.8 ± 0.6	81.6 ± 0.3	80.4 ± 0.2	71.9 ± 1.0	47.7 ± 0.3
AEGK	90.4 ± 0.6	59.4 ± 0.4	75.1 ± 0.3	77.8 ± 0.3	-	-	-
DSP-I	91.0 ± 7.5	67.5.9 ± 7.2	76.8 ± 4.3	-	86.2 ± 1.7	76.0 ± 4.7	52.1 ± 2.7
GIN	89.4 ± 5.6	64.6 ± 7.0	76.2 ± 2.8	77.6 ± 2.9	82.7 ± 1.7	75.1 ± 5.1	52.3 ± 2.8
SLIM	93.3 ± 3.3	72.4 ± 6.9	77.5 ± 4.3	79.2 ± 2.6	80.5 ± 2.0	77.2 ± 2.1	53.4 ± 4.0
AdaSNN	87.2 ± 5.0	60.2 ± 6.4	76.5 ± 2.6	-	-	74.2 ± 2.5	51.9 ± 4.7
SEKGIN	96.9 ± 3.6	75.7 ± 6.4	81.7 ± 2.5	-	-	80.3 ± 3.2	56.1 ± 3.0
HA-SCN	90.1 ± 1.4	63.2 ± 1.5	76.3 ± 0.3	79.3 ± 0.4	-	73.2 ± 0.4	49.9 ± 0.6
KerGNN	88.7 ± 2.1	64.3 ± 3.1	76.5 ± 3.9	78.9 ± 3.5	82.8 ± 1.8	74.4 ± 4.3	51.6 ± 3.1
GCKSVM	88.7 ± 7.6	67.7 ± 5.4	74.5 ± 3.9	-	-	75.4 ± 2.1	53.9 ± 2.8
GKNN	85.7 ± 2.7	60.1 ± 1.9	75.3 ± 1.1	76.8 ± 4.3	71.5 ± 1.2	69.9 ± 1.4	53.5 ± 2.2
PSDGNN	94.6 ± 3.2	76.3 ± 2.9	83.5 ± 3.2	84.9 ± 1.5	86.4 ± 3.2	81.4 ± 4.3	56.3 ± 3.5

Table 5. Classification accuracy (%) of different independence constraint methods on seven graph classification benchmarks.

Independence Constraint Method	MUTAG	PTC	PROTEINS	DD	NCI1	IMDB-B	IMDB-M
HSIC	92.9 ± 4.1	73.4 ± 3.4	79.8 ± 2.7	82.3 ± 3.2	85.3 ± 2.9	79.8 ± 4.2	53.6 ± 3.1
Distance Correlation	89.1 ± 2.5	69.2 ± 1.5	75.3 ± 0.3	79.5 ± 2.7	83.4 ± 3.3	77.3 ± 2.8	52.9 ± 3.6
Factor MI	94.6 ± 3.2	76.3 ± 2.9	83.5 ± 3.2	84.9 ± 1.5	86.4 ± 3.2	81.4 ± 4.3	56.3 ± 3.5
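Table 5 compares the factor mutual information constraint with two standard independence measures. For reference, a minimal NumPy sketch of both baselines between two factor embeddings follows: the biased empirical HSIC with Gaussian kernels, $\mathrm{tr}(KHLH)/(n-1)^2$, and the sample distance correlation. The bandwidth `sigma=1.0` is an arbitrary assumption, and how PSDGNN's baselines aggregate these terms over factor pairs is not specified here; a disentanglement penalty would typically sum the measure over all pairs $(z_i, z_j)$ with $i \neq j$.

```python
import numpy as np


def _gaussian_gram(x: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian (RBF) kernel matrix from pairwise squared Euclidean distances."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def hsic(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2, with centering matrix H."""
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    k = _gaussian_gram(x, sigma)
    l = _gaussian_gram(y, sigma)
    return float(np.trace(k @ h @ l @ h)) / (n - 1) ** 2


def distance_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Sample distance correlation from double-centered pairwise distance matrices."""

    def centered_dist(a: np.ndarray) -> np.ndarray:
        d = np.sqrt(np.sum((a[:, None, :] - a[None, :, :]) ** 2, axis=-1))
        return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

    a, b = centered_dist(x), centered_dist(y)
    dcov2_xy = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom)) if denom > 0 else 0.0


# Usage: a factor compared with itself versus with an independent factor
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = rng.normal(size=(200, 2))
print(hsic(x, x), hsic(x, y))
print(distance_correlation(x, x), distance_correlation(x, y))
```

Both measures are near zero for independent factors and grow with dependence; HSIC depends on the kernel bandwidth, while distance correlation is scale-free and equals 1 for a variable compared with itself.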

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yang, B.; Yang, J.; Xu, L.; Tang, Y. Disentangled Graph Representation Based on Prototype Subgraph Neural Network. Mathematics 2026, 14, 1915. https://doi.org/10.3390/math14111915

AMA Style

Yang B, Yang J, Xu L, Tang Y. Disentangled Graph Representation Based on Prototype Subgraph Neural Network. Mathematics. 2026; 14(11):1915. https://doi.org/10.3390/math14111915

Chicago/Turabian Style

Yang, Baosheng, Jingshang Yang, Lixiang Xu, and Yuanyan Tang. 2026. "Disentangled Graph Representation Based on Prototype Subgraph Neural Network" Mathematics 14, no. 11: 1915. https://doi.org/10.3390/math14111915

APA Style

Yang, B., Yang, J., Xu, L., & Tang, Y. (2026). Disentangled Graph Representation Based on Prototype Subgraph Neural Network. Mathematics, 14(11), 1915. https://doi.org/10.3390/math14111915
