Article

The Heterogeneous Network Community Detection Model Based on Self-Attention

1 College of Computer Science & Technology, Qingdao University, Qingdao 266071, China
2 College of Engineering, China Agricultural University, 17 Qinghua East Road, Haidian, Beijing 100083, China
3 National Innovation Center for Digital Fishery, China Agricultural University, Beijing 100083, China
* Author to whom correspondence should be addressed.
Symmetry 2025, 17(3), 432; https://doi.org/10.3390/sym17030432
Submission received: 12 February 2025 / Revised: 4 March 2025 / Accepted: 11 March 2025 / Published: 13 March 2025
(This article belongs to the Special Issue Symmetry and Asymmetry Study in Graph Theory)

Abstract

With the advancement of representation learning, graph representation learning has gained significant attention in the field of community detection for heterogeneous networks. A prominent approach in this domain involves the use of meta-paths to capture higher-order relationships between nodes, particularly when bidirectional or reciprocal relationships exist. However, defining effective meta-paths often requires substantial domain expertise. Moreover, these methods typically depend on additional clustering algorithms, which can limit their practical applicability. To address these challenges, context paths have been introduced as an alternative to meta-paths. When combined with a self-attention mechanism, models can dynamically assess the relative importance of different context paths. By leveraging the inherent symmetry within context paths, models enhance their ability to capture balanced relationships between nodes, thereby improving their representation of complex interactions. Building on this idea, we propose BP-GCN, a self-attention-based model for heterogeneous community detection. BP-GCN autonomously identifies node relationships within symmetric context paths, significantly improving community detection accuracy. Furthermore, the model integrates the Bernoulli–Poisson framework to establish an end-to-end detection system that eliminates the need for auxiliary clustering algorithms. Extensive experiments on multiple real-world datasets demonstrate that BP-GCN consistently outperforms existing benchmark methods.

1. Introduction

With the advent of the big data era and the increasing research interest in complex networks [1], heterogeneous networks have emerged as a key focus in this field. Heterogeneous networks are intricate structures composed of multiple types of nodes and edges. Unlike homogeneous networks, which consist of a single type of entity and relationship, heterogeneous networks encode a broader range of structural and semantic information. These networks are prevalent across various domains, including academic networks [2] (which involve nodes such as papers, authors, conferences and topics), movie networks [3] (comprising entities like actors, directors and films) and knowledge graphs, which represent entities and their relationships. While the diversity of node and edge types in heterogeneous networks enriches their semantic information, it also introduces significant challenges in terms of modeling and analysis.
Deng Cai et al. [4] proposed a query-based approach for mining heterogeneous social networks, which has significantly advanced the study of such networks. This method employs regression-based graph matrix analysis to uncover detailed and nuanced semantic relationships within social networks. Guo-Jun Qi et al. [5] introduced a clustering algorithm based on heterogeneous random fields, which models the structure and content of heterogeneous graphs through anomalous edges. By integrating content information from social media objects with the link structures, this approach enhances clustering performance. Chuan Shi et al. [6] proposed a ranking-based clustering method that projects heterogeneous graphs into subgraphs for analysis, facilitating effective community detection.
While traditional methods for heterogeneous community detection have made significant strides, they still struggle to capture higher-order complex relationships within heterogeneous networks. A common approach in these methods is the use of meta-paths for community detection. A meta-path connects different types of nodes and reveals higher-order relationships between them.
For instance, consider nodes a1 and a5 in Figure 1. These nodes should belong to the same community (Community 1), but traditional methods, which rely on direct relationships (such as “association”), would not group them together since no direct connection exists between them (r4). However, a1 and a5 are the authors of papers p1 and p4, and these papers share a common theme, t1. Additionally, a5 is directly connected to a4, who has collaborated with a1 on the same research theme, t1. These indirect connections provide strong evidence that a1 and a5 should be grouped together. Predefined meta-paths (such as “Author–Paper–Theme–Paper–Author”) facilitate a more accurate community assignment for a1 and a5. Furthermore, the symmetry of meta-paths enriches and enhances the flexibility of the relationships between nodes. For instance, the relationship between a1 and a5 through papers p1 and p4 and theme t1 is not one-way. Due to symmetry, both a1 and a5 act as “authors”, which strengthens their similarity. This symmetric connection aids in more accurately identifying their shared community.
Meta-paths, by capturing higher-order relationships that direct connections may overlook (such as indirect links through papers and themes), provide a more reliable and comprehensive representation of interactions between different node types in heterogeneous networks.
With the advancement of deep learning [7,8,9,10] and the increasing research interest in complex networks [11,12], the study of heterogeneous networks has become a significant focus in network science. However, most existing community detection methods primarily target homogeneous networks, which consist of a single node type. These methods often perform poorly when applied to heterogeneous networks, as they fail to effectively capture the intricate relationships between diverse node types. Consequently, the development of algorithms capable of accounting for multiple node types and their complex interrelationships in heterogeneous networks has become a critical area of current research.
In community detection methods [13,14,15,16] based on representation learning, many techniques leverage meta-paths to capture node relationships within heterogeneous networks. For example, the HIN2Vec [17] method employs a neural network to model relationships between nodes via meta-paths, embedding this relational information into a low-dimensional space. By learning relationship vectors, it differentiates between various types of relationships while preserving contextual information. The Metapath2Vec [18] method constructs heterogeneous neighborhoods of nodes using meta-path-based random walks and then performs node embeddings through a heterogeneous skip-gram model. This process automatically uncovers hidden connections between different node types.
The HAN model [2] is a semi-supervised heterogeneous graph neural network that employs an attention mechanism with hierarchical layers at both the node and semantic levels. Specifically, node-level attention identifies correlations between nodes and their meta-path neighbors, while semantic-level attention evaluates the significance of different meta-paths. The CDBMA model [16] consists of both a structural information encoder and a semantic information encoder. By incorporating a multi-attention mechanism and integrating self-supervised learning, this model achieves the joint optimization of structural and semantic information, effectively reducing dependency on community labels. The HGT model [19] adopts an attention-based architecture that accounts for both node and edge types. Unlike traditional methods, it eliminates the need for predefined meta-paths by learning ‘soft’ meta-paths through message passing. This approach is particularly effective for large-scale heterogeneous graphs, as it dynamically aggregates information from high-order neighbors of various types.
The effectiveness of these methods is closely tied to the quality of meta-paths, which are often designed based on domain knowledge. Constructing effective meta-paths requires a comprehensive understanding of both the network’s structure and its semantics, making the process inherently difficult and complex. As meta-path length increases, the number of potential combinations grows exponentially, rendering it impractical to evaluate every possible meta-path in detail. While methods based on Heterogeneous Graph Transformers (HGTs) circumvent the challenge of manual meta-path definition, they are limited in that they only capture one-hop neighbor information. This constraint fails to account for higher-order relationships within the network, reducing the model’s robustness and generalization ability and making it less effective at capturing complex, high-order connections.
Moreover, most existing community detection methods treat node representation learning and clustering tasks as separate processes. Although this approach is effective in certain cases, it does not fully integrate the network structure and node relationships, ultimately diminishing the overall effectiveness of community detection. The key challenge, therefore, is to merge node representation learning with community detection within a unified framework, thereby enhancing performance in heterogeneous networks.
In this study, we introduce BP-GCN, a self-attention-based model for community detection in heterogeneous networks, utilizing the Bernoulli–Poisson model [20,21] as an optimization function. BP-GCN employs context paths to capture semantic relationships between nodes, replacing traditional meta-paths. The self-attention mechanism enables the model to assess the importance of different paths, facilitating the effective aggregation of higher-order relationships. BP-GCN generates a community affiliation matrix and performs end-to-end community detection. A key advantage of context paths is their adaptability: they evolve based on specific tasks while maintaining node-type symmetry. This ensures that nodes of the same type learn similar semantic contexts, while different node types develop their representations within a balanced space. Such symmetry enhances model robustness by reducing noise, mitigating bias and improving overall efficiency. Additionally, this approach lowers computational complexity while preserving expressive power. By effectively aggregating high-order relationships, BP-GCN enhances both accuracy and scalability in community detection.

2. BP-GCN Model

2.1. Concept Definition

This section assumes a given heterogeneous graph $HG(V, E, A, R)$, where $V$ denotes the set of nodes, including various types of nodes; $E$ represents the set of edges, consisting of different edge types; $A$ is the set of node types, indicating the type of each node; and $R$ is the set of edge types, representing the different types of edges in the graph. Node types are associated with elements in $A$ through the node mapping function $\phi(v)$, while edge types are linked to elements in $R$ via the edge mapping function $\psi(e)$. To ensure that the heterogeneous graph can represent diverse structures, it is required that $|A| + |R| > 2$, meaning that the combined number of node and edge types must exceed two. Table A1 in Appendix A provides a detailed explanation of the symbols used in this paper. Community detection in heterogeneous graphs must account for the various types of nodes and edges, as well as their diverse relationships. The results of this process are analogous to those in homogeneous graphs, where nodes of the same type with high association are clustered together.
Definition 1. Primary Node and Auxiliary Node 
[22]. In community detection tasks, clustering is typically performed on a specific type of node, referred to as the primary node (P). The other nodes in the heterogeneous network are termed auxiliary nodes, and together they form the type set A′. Notably, any node type in the network can be designated as the primary node type.
Definition 2. Primary Graph and Auxiliary Graph. 
Given a heterogeneous graph HG, the primary graph is the subgraph composed of primary nodes of type P, denoted as $HG_P = (V_P, E_P)$. Each node $v \in V_P$ belongs to the primary node type P, and each edge $(v_i, v_j) \in E_P$, for $v_i, v_j \in V_P$ and $v_i \neq v_j$, is a homogeneous edge directly connecting primary nodes. The primary graph serves as the main target graph in community detection tasks. The auxiliary graph is the subgraph formed by the auxiliary nodes A′ and their associated edges, denoted as $HG_A = (V_A, E_A)$. Each node $v \in V_A$ in the auxiliary graph has a node type $\phi(v) \in A'$ distinct from the primary type P. The auxiliary graph provides contextual information and supplementary semantics by capturing relationships between auxiliary nodes as well as between auxiliary and primary nodes, thereby enhancing the effectiveness of community detection.
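The split defined above can be sketched in a few lines. This is an illustrative toy example, not the paper's implementation; the node names and type tags ("author", "paper", "term") are hypothetical:

```python
# Sketch of Definition 2: splitting a heterogeneous graph into a primary
# graph HG_P and an auxiliary graph HG_A. Toy data throughout.

def split_graph(edges, node_type, primary_type):
    """Return (primary nodes/edges, auxiliary nodes/edges) for a given primary type."""
    primary_edges, auxiliary_edges = [], []
    for u, v in edges:
        if node_type[u] == primary_type and node_type[v] == primary_type:
            primary_edges.append((u, v))    # homogeneous edge between primary nodes
        else:
            auxiliary_edges.append((u, v))  # edge touching at least one auxiliary node
    primary_nodes = {n for n, t in node_type.items() if t == primary_type}
    auxiliary_nodes = {n for n, t in node_type.items() if t != primary_type}
    return (primary_nodes, primary_edges), (auxiliary_nodes, auxiliary_edges)

node_type = {"a1": "author", "a2": "author", "p1": "paper", "t1": "term"}
edges = [("a1", "a2"), ("a1", "p1"), ("a2", "p1"), ("p1", "t1")]
(hg_p_nodes, hg_p_edges), (hg_a_nodes, hg_a_edges) = split_graph(edges, node_type, "author")
```

With "author" as the primary type, only the author–author edge lands in the primary graph; every edge touching a paper or term node belongs to the auxiliary graph.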
Definition 3. Context Path 
[23]. Given a heterogeneous graph HG, a context path is a path connecting two nodes, $v_i$ and $v_j$, in the primary graph $HG_P$, denoted as $\rho_K = (v_i, R_K, v_j)$, where $R_K$ is any path connecting $v_i$ and $v_j$ consisting of K nodes; this path can be either a symmetric loop or an asymmetric path. When K = 0, $R_K$ is empty, indicating that $v_i$ and $v_j$ are directly connected. Nodes connected through context paths are referred to as context neighbors.
Due to the small-world phenomenon [24,25], most nodes in a network can be connected through relatively short paths. In the case of context paths, this is reflected in their ability to link primary nodes via a small number of auxiliary nodes. This characteristic enables context paths to effectively capture higher-order semantic relationships between nodes, providing theoretical support for the definition and utilization of context neighbors. In contrast, meta-paths require domain experts to manually configure combinations of auxiliary nodes along the context path. As the number of node types and path lengths increases, the number of possible combinations grows exponentially, leading to a rapid increase in computational complexity. Therefore, context paths help mitigate the impact of path length on computational complexity.
Furthermore, context paths encompass all possible K-step relationships for a given path length K, allowing them to capture a wider range of latent semantic connections. According to the six degrees of separation theory [26,27], the distance between any two nodes in a network typically does not exceed six steps. Building on this theory, empirical selection or the optimization of path length K via a context path length attention mechanism can effectively cover the primary relationships between nodes while avoiding redundancy and the computational overhead associated with excessively long paths. This approach significantly enhances computational efficiency while boosting the model’s expressive capability.
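To make the notion of context neighbors concrete, the following sketch enumerates primary-node pairs joined by a simple path through exactly K intermediate nodes of any type — the key contrast with meta-paths, which constrain the intermediate types. The adjacency structure and node names are illustrative assumptions; a production implementation would use sparse matrix powers rather than DFS enumeration:

```python
# Sketch of Definition 3: context neighbors for a given path length k.
# Intermediate node types are unconstrained, unlike in a meta-path.

def context_neighbors(adj, node_type, primary_type, k):
    primaries = {n for n, t in node_type.items() if t == primary_type}
    pairs = set()

    def dfs(start, node, visited, depth):
        # A path with k intermediate nodes has total length k + 1.
        if depth == k + 1:
            if node in primaries and node != start:
                pairs.add(tuple(sorted((start, node))))
            return
        for nxt in adj.get(node, ()):
            if nxt not in visited:
                dfs(start, nxt, visited | {nxt}, depth + 1)

    for p in primaries:
        dfs(p, p, {p}, 0)
    return pairs

# Author-Paper-Author pattern: a1 and a2 are context neighbors for k = 1.
adj = {"a1": ["p1"], "p1": ["a1", "a2"], "a2": ["p1"]}
node_type = {"a1": "author", "a2": "author", "p1": "paper"}
```

Here `context_neighbors(adj, node_type, "author", 1)` finds the a1–a2 pair via the paper p1, while k = 0 finds nothing because the two authors share no direct edge.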
As shown in Figure 2, in these paths, R* represents the auxiliary nodes, such as works or directors, that form the path. These auxiliary nodes may include films jointly acted in by both actors, directors they collaborated with, or other relevant entities. Such paths capture both direct relationships between actors and more complex indirect connections, thereby providing rich semantic information for context-based network analysis. For example, in Figure 2, paths R1 and R2 can be represented by a context path of length 3.

2.2. Loss Function

A target function L(BP), based on the Bernoulli–Poisson model, was designed to construct an end-to-end detection model and enhance the quality of clustering results. The following section provides an introduction to this loss function.
The Bernoulli–Poisson model (B-P model) is a bipartite membership network model capable of accurately detecting community structures in a network, including non-overlapping, overlapping and nested community structures. The core of the model lies in fitting the adjacency matrix A based on the community membership matrix F. This approach enables the detection of complex community patterns, ensuring that the model effectively represents various types of relationships within the network.
A_{ij} = \mathrm{Bernoulli}\left(1 - \exp\left(-F_i F_j^T\right)\right)
The community membership matrix is used to approximate the adjacency matrix of the primary graph with the improved formula, expressed as follows:
HG_{P_{ij}} = \mathrm{Bernoulli}\left(1 - \exp\left(-F_i F_j^T\right)\right)
According to the theory of the B-P model, the edge connection probability between any two primary nodes, $v_i$ and $v_j$, is positively correlated with the number of communities they share. The element $F_{ij}$ of the membership matrix F represents the probability that primary node $v_i$ belongs to community $C_j$. If $F_{ij}$ exceeds a predefined threshold probability p, the primary node $v_i$ is assigned to community $C_j$. Therefore, the row vector $F_i$ of the matrix F characterizes the community affiliation of node $v_i$. By optimizing the following function L(BP) using the B-P model, the community membership matrix F for the primary graph is obtained:
L(BP) = -\sum_{(v_i, v_j) \in E_P} \log\left(1 - \exp\left(-F_i F_j^T\right)\right) + \sum_{(v_i, v_j) \notin E_P} F_i F_j^T
Here, $1 - \exp\left(-F_i F_j^T\right)$ represents the probability of connection between two primary nodes $v_i$ and $v_j$. In a real network, if nodes $v_i$ and $v_j$ are connected, the inner product $F_i F_j^T$ should be large, driving this probability toward 1; if the two nodes are not connected, $F_i F_j^T$ should be close to 0. Given that real-world networks are typically sparse, with the actual number of edges being much smaller than the total number of possible edges, the second part of the equation would otherwise dominate the loss. To ensure the accuracy of the resulting structure, an appropriate balance between the two parts of the equation is required. The balanced formula is as follows:
L(BP) = -\mathbb{E}_{(v_i, v_j) \sim P_E}\left[\log\left(1 - \exp\left(-F_i F_j^T\right)\right)\right] + \mathbb{E}_{(v_i, v_j) \sim P_N}\left[F_i F_j^T\right]
In this context, PE and PN represent the uniform distribution of edges and non-edges in the primary graph, respectively.
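A minimal sketch of the balanced loss above and of the thresholding step from the membership matrix, in plain Python rather than an autodiff framework; the toy membership matrix, the sampled edge/non-edge lists and the threshold value p are illustrative assumptions, not values from the paper:

```python
import math

# Balanced Bernoulli-Poisson loss over sampled edges and non-edges,
# plus the F_ij > p community assignment rule. Toy data throughout.

def bp_loss(F, edges, non_edges):
    """Mean of -log(1 - exp(-F_i.F_j)) over sampled edges
    plus mean of F_i.F_j over sampled non-edges."""
    eps = 1e-9  # guards log(0) when an inner product is exactly 0
    def dot(i, j):
        return sum(a * b for a, b in zip(F[i], F[j]))
    edge_term = -sum(math.log(1 - math.exp(-dot(i, j)) + eps)
                     for i, j in edges) / len(edges)
    non_edge_term = sum(dot(i, j) for i, j in non_edges) / len(non_edges)
    return edge_term + non_edge_term

def assign_communities(F, p=0.5):
    """Node i joins community j whenever F[i][j] exceeds the threshold p."""
    return [[j for j, f in enumerate(row) if f > p] for row in F]

F = [[0.9, 0.0], [0.8, 0.1], [0.0, 0.9]]   # 3 nodes, 2 communities
loss = bp_loss(F, edges=[(0, 1)], non_edges=[(0, 2)])
memberships = assign_communities(F)
```

Nodes 0 and 1 overlap strongly in community 0, so the connected pair (0, 1) contributes a small edge term, while the disjoint non-edge pair (0, 2) contributes nothing to the penalty.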

2.3. BP-GCN Model Architecture

This section introduces BP-GCN, a self-attention-based graph neural network model designed to learn the community membership matrix of primary nodes within heterogeneous graphs, thereby enabling end-to-end community detection. The overall architecture of the model is illustrated in Figure 3.
Initially, for a given heterogeneous graph (HG), all context paths of length K are extracted for each node in the primary graph (HGP). A self-attention mechanism is then applied to further differentiate the semantic information of various contexts. Subsequently, a partial decoder architecture from the Transformer model is adopted. The BP-GCN model integrates Graph Convolutional Networks (GCNs) with the Bernoulli–Poisson loss function to learn the community membership matrix. The key implementation details of the model are discussed in the following sections.
The model reinterprets the processes of embedding transformation and context path aggregation in heterogeneous networks as a systematic generative decoding process [28,29]. By sequentially generating node context information through the decoder layer by layer, BP-GCN effectively incorporates higher-order relationships and contextual details within the graph. This iterative approach refines the embedding representation of each node, facilitating a more accurate capture of structural features and intricate relational patterns inherent in the graph.
In heterogeneous networks, node relationships may exhibit irregularities, where certain nodes experience unstable information propagation due to missing edges or erroneous connections. To address this, the Dropout mechanism [30,31,32] was introduced to enhance model robustness. By randomly dropping a fraction of the outputs, this mechanism prevents the model from over-relying on specific nodes or features, effectively reducing model overfitting and improving generalization capabilities.
HG' = \mathrm{Dropout}(HG)
Subsequently, BP-GCN employs an averaging operation to obtain the global graph representation, thereby enhancing the model’s expressive capacity and stability. This step involves averaging the embedding vectors of all nodes, seamlessly integrating local information from each node to form a unified global representation. The averaging operation is particularly advantageous due to its simplicity and robust expressiveness, as it consolidates information from all nodes without introducing significant computational complexity.
hg = \mathrm{Mean}(C)
C integrates the context information vectors of all nodes in the heterogeneous graph $HG'$, providing richer node representations for subsequent community detection tasks.
The design of BP-GCN contrasts with traditional Graph Convolutional Networks (GCNs), which typically rely on explicit graph structure traversal and node adjacency information for message passing. In contrast, BP-GCN leverages the auto-regressive nature of its decoder architecture to progressively generate the contextual information of the nodes.
During this generation process, BP-GCN integrates a self-attention mechanism, which assigns varying levels of importance to different contextual paths, enabling selective aggregation across multiple relationships. The attention score for the h-th head is computed as follows:
Q^h(hg_T^l) = \mathrm{QLinear}_T^l(hg_T^l)
K^h(hg_S^l) = \mathrm{KLinear}_S^l(hg_S^l)
\alpha_{ST}^{h,l} = \mathrm{Softmax}\left(\frac{Q^h(hg_T^l) \cdot K^h(hg_S^l)^T}{\sqrt{d}}\right)
Let S and T denote the source and target node types in the node relation $r_{ST}$. For the one-layer heterogeneous subgraphs $HG_S$ and $HG_T$, let $hg_S^l$ and $hg_T^l$ represent their graph-aggregated vectors. To compute the attention scores, two linear projection functions, QLinear and KLinear, are applied to map the graph-aggregated vectors into query and key vectors. This mapping enables the model to capture the relationships between different nodes and their importance to the final node embeddings.
To enhance the model’s expressive capability, a multi-head attention mechanism [33] is incorporated. This involves the use of H distinct attention heads, each with independent parameters that are progressively learned and optimized during the training process. Each attention head is designed to focus on different aspects of the relationships within the graph, thereby enabling the model to capture more comprehensive relational information. The attention weight, denoted as $\alpha_{ST}^{h,l}$, represents the attention weight of relation $r_{ST}$ at layer l on head h, reflecting the contribution of the relation to the node embeddings. This setup allows each layer of the decoder to dynamically calculate the attention weights for various relations based on the output of the preceding layer and the contextual information relevant to the current node. By employing this mechanism, the model can automatically assess the importance of different relation types during the decoding process, avoiding the reliance on static graph structures characteristic of traditional methods. As a result, the aggregation of node information becomes more flexible and efficient. The aggregation of multiple attention heads enables the model to capture the diverse influences of different relations. In determining the contextual information vector for node $v_i$, the contextual information vectors of its neighbors from the (l − 1)-th layer are used, along with the attention scores for each relation, to perform a weighted aggregation.
c_i^l = W_2^l \left( \Big\Vert_{h=1}^{H} \sigma\left( W_1^l \sum_{r_{ST}} \alpha_{ST}^{h,l} \sum_{v_j \in N_S(i)} c_j^{l-1} \right) + B \right)
Let $N_S(i)$ denote the set of neighboring nodes of node $v_i$ and H represent the number of attention heads. After K layers of computation, the final output is denoted as $c_p^K$. By employing multi-head attention aggregation at each layer, this approach effectively integrates information derived from multiple relations, thereby capturing richer contextual associations among nodes.
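The scaled dot-product scoring at the core of the attention computation above can be sketched as follows. This is a single-head, vector-level simplification of the per-relation, multi-head computation in the model; the learned QLinear/KLinear projections are omitted, and the query and key vectors are toy values:

```python
import math

# Scaled dot-product attention: one query (target-type graph vector)
# scored against one key per source relation, softmax-normalised.

def attention_weights(query, keys, d):
    """Softmax over query-key dot products scaled by sqrt(d)."""
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# One key per source relation; the first relation aligns with the query,
# so it should receive the larger weight.
weights = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], d=2)
```

The weights sum to 1 and favour the relation whose key aligns with the query, which is exactly how the model grades the relative importance of different context paths.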
To optimize contextual information and effectively address the potential issue of over-smoothing that may occur during the deep propagation of node embeddings, the BP-GCN model specifically incorporates a Gated Recurrent Unit (GRU) mechanism [34,35,36]. The GRU mechanism helps to preserve the diversity of node embeddings and prevents the gradual loss of information throughout the multi-layer decoding process.
c_p^K = \mathrm{GRU}\left(c_p^{K-1}, \mathrm{BPGCNLayer}\left(c_p^{K-1}, c_A^{K-1}\right)\right)
To further enhance the model’s expressive capacity, BP-GCN applies the ReLU activation function to the generated contextual information, effectively filtering out redundant embedding features. This approach not only reduces information redundancy but also improves the precision of the embeddings. Additionally, the contextual information of the primary graph nodes, as generated by the model, serves as the input for a Graph Convolutional Network (GCN), which produces the final community affiliation matrix for the primary graph. By leveraging a message-passing mechanism, the GCN propagates node feature information to neighboring nodes, thereby capturing the intricate relationships within the network. In some respects, this message-passing process mimics real-world information dissemination, increasing the likelihood that the derived community structures accurately represent real-world scenarios. The single-layer convolutional formula of the GCN is expressed as follows:
H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)
Here, σ denotes the activation function, such as Sigmoid, Tanh or ReLU. $\tilde{A} = A + I_N$ represents the adjacency matrix augmented with self-loops, and $\tilde{D}$ is the degree matrix of $\tilde{A}$. The convolution process in the GCN involves an iterative procedure that continuously aggregates information from both the node itself and its neighbors, where each iteration corresponds to a feature transformation. For large-scale networks, the BP-GCN method employs a two-layer convolution to further process the contextual information $c_p^K$, with the following specific modifications:
F = \mathrm{ReLU}\left(\widetilde{HG_P}\left(\mathrm{Tanh}\left(\widetilde{HG_P}\, c_p^K W^{(1)}\right) W^{(2)}\right)\right)
In this context, $\widetilde{HG_P} = D_P^{-\frac{1}{2}} HG_P D_P^{-\frac{1}{2}}$ denotes the symmetrically normalized adjacency matrix of the $HG_P$ subgraph within the heterogeneous graph, where $D_P$ is its degree matrix, and F corresponds to the community affiliation matrix of the primary graph. This matrix represents the final clustering results, enabling an end-to-end approach to community detection tasks.
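The two-layer convolution above can be sketched directly from the formula. The tiny adjacency matrix, feature matrix and weight matrices below are illustrative assumptions, and self-loops are assumed to have been added already:

```python
import math

# Dependency-free sketch of F = ReLU(A_hat Tanh(A_hat C W1) W2),
# where A_hat is the symmetrically normalized adjacency matrix.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def normalize_adj(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2}."""
    inv_sqrt = [1 / math.sqrt(sum(row)) if sum(row) > 0 else 0.0 for row in A]
    n = len(A)
    return [[A[i][j] * inv_sqrt[i] * inv_sqrt[j] for j in range(n)] for i in range(n)]

def bp_gcn_head(A, C, W1, W2):
    A_hat = normalize_adj(A)
    H1 = [[math.tanh(x) for x in row] for row in matmul(matmul(A_hat, C), W1)]
    return [[max(0.0, x) for x in row] for row in matmul(matmul(A_hat, H1), W2)]

A = [[1, 1], [1, 1]]          # two connected primary nodes, self-loops included
C = [[1.0], [0.5]]            # toy 1-dimensional context vectors c_p^K
F = bp_gcn_head(A, C, W1=[[1.0]], W2=[[1.0, -1.0]])  # 2 nodes x 2 communities
```

The final ReLU guarantees non-negative affiliation scores, so each row of F can be thresholded directly into community memberships.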

3. Experiments

3.1. Dataset

This section evaluates the performance of BP-GCN against other benchmark models on the ACM [2], DBLP [37], IMDB [3] and AIFB [38] datasets. Detailed information regarding these datasets is provided in Table 1 below:
ACM Dataset: This dataset consisted of a paper information network with four types of nodes. The primary node was the paper node, which formed the main graph. The other three node types served as auxiliary nodes, contributing to the construction of an auxiliary graph. Two predefined meta-paths were utilized: Paper–Author–Paper (PAP) and Paper–Subject–Paper (PSP). The PAP meta-path captured the relationship between papers and authors by linking papers co-authored by the same individuals, thereby revealing collaboration networks and paper patterns. The PSP meta-path, on the other hand, highlighted the relationship between papers and subjects, connecting papers through shared research topics to illustrate academic relationships between research themes.
DBLP Dataset: The DBLP dataset included four types of nodes and was modeled in two stages. Initially, the author and paper nodes were chosen as the primary nodes, allowing for clustering based on these two categories. Subsequently, the dataset was presented in two forms: DBLP-P (paper-centered) and DBLP-A (author-centered). Three meta-paths were defined: Author–Paper–Author (APA), Author–Paper–Conference–Paper–Author (APCPA) and Author–Paper–Term–Paper–Author (APTPA). The APA meta-path highlighted the connections between authors through shared publications, reflecting collaboration patterns such as overlaps in research domains. The APCPA meta-path captured relationships between authors, their co-authored papers and the conferences they were associated with, showcasing the authors’ influence in specific conference domains. Lastly, the APTPA meta-path linked authors through shared terminology, facilitating the identification of similarities in their research topics.
IMDB Dataset: This dataset featured four node types: “Director”, “Actor”, “Movie” and “Keyword”. The movie node served as the primary node and was categorized into three genres: Action, Comedy and Drama. The remaining three nodes were auxiliary types. Three meta-paths were defined for the network. The first, Movie–Actor–Movie (MAM), captured the relationships between films through actors who co-starred, highlighting collaborations between films and actors. The second, Movie–Director–Movie (MDM), connected films through shared directors. The third, Movie–Keyword–Movie (MKM), linked films through shared keywords, enabling an analysis of thematic or content-based similarities between them.
AIFB Dataset: The AIFB dataset was based on a knowledge graph, consisting of seven distinct node types and 104 unique edge types. The primary node type was the ‘Personen’ node, which was further categorized into four different groups. Due to the dataset’s complexity, some specific details are omitted from the table, though relevant meta-paths are provided where necessary.

3.2. Benchmark Methods

To evaluate the performance of BP-GCN, the proposed method was benchmarked against several state-of-the-art techniques, including both unsupervised and supervised approaches. Among the unsupervised methods, Node2vec [39] and Metapath2vec [18] were included, while the supervised methods consisted of GCN [40], GAT [41], LGNN [42], HAN [2] and HGT [19]. Notably, HGT is regarded as the current state-of-the-art method. The following section provides a comprehensive overview of the benchmark methods:
Node2vec: Node2vec formulates network feature learning as an optimization problem based on a graph search. It is designed to capture diverse connectivity patterns within a network by mapping nodes into a low-dimensional feature space. The objective is to maximize the preservation of node neighborhood probabilities, thereby enabling the efficient learning of node representations.
Metapath2vec: Metapath2vec is specifically tailored for heterogeneous networks with multiple node and edge types. It generates heterogeneous neighborhood embeddings for nodes using meta-path-based random walks and incorporates heterogeneous negative sampling techniques to enhance optimization. This enables Metapath2vec to capture both the structural and semantic relationships between different network objects.
GCN: GCN is a semi-supervised learning method designed for graph-structured data. The key idea is to approximate the spectral graph convolution in the first order locally. GCN introduces an effective hierarchical propagation rule that approximates the graph’s Laplacian operator, using the graph’s structure to establish dependencies between nodes. This process enables node representations to be updated through information flow within local neighborhoods, making GCN particularly effective for node classification tasks.
GAT: GAT employs a masked self-attention mechanism that allows each node to attend to its neighboring nodes. This attention mechanism is used to compute hidden node representations, with multiple attention layers stacked to capture neighborhood features more effectively. It also enables nodes to assign different weights to each neighbor, avoids computationally expensive operations such as matrix inversion, and requires no prior knowledge of the global graph structure.
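A minimal sketch of the masked attention coefficients, assuming a tiny 3-node graph and random parameters of our own choosing (not the original implementation):

```python
import numpy as np

def gat_attention(h, W, a, adj, alpha=0.2):
    """Single-head GAT coefficients: softmax over neighbors of
    LeakyReLU(a^T [W h_i || W h_j]), masked by the adjacency matrix."""
    z = h @ W                                     # projected node features
    N = z.shape[0]
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            s = a @ np.concatenate([z[i], z[j]])  # a^T [z_i || z_j]
            e[i, j] = s if s > 0 else alpha * s   # LeakyReLU
    e = np.where(adj > 0, e, -np.inf)             # mask out non-neighbors
    e = e - e.max(axis=1, keepdims=True)          # numerically stable softmax
    att = np.exp(e)
    return att / att.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
adj = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])  # includes self-loops
att = gat_attention(rng.standard_normal((3, 2)),
                    rng.standard_normal((2, 2)),
                    rng.standard_normal(4), adj)
```

The masking step is what makes the attention "masked": non-neighbors receive exactly zero weight, so no dense all-pairs operation is needed.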
LGNN: LGNN is an enhancement to the traditional Graph Neural Network (GNN) architecture, which allows the model to utilize edge adjacency information and incorporate a non-recursive graph operator. These improvements enhance the model’s expressiveness, enabling it to capture more complex relationships in graph data.
HAN: HAN is a heterogeneous graph neural network that utilizes a hierarchical attention mechanism. It integrates attention at both the node and semantic levels. The node-level attention learns the importance of the relationships between nodes and their neighbors, based on meta-paths. The semantic-level attention focuses on determining the relative significance of different meta-paths. This hierarchical aggregation of features from meta-path neighbors results in more effective node embeddings.
HGT: HGT incorporates parameters that depend on node and edge types, allowing it to capture heterogeneous attention across edges. This enables the model to retain specialized representations for various node and edge types. Additionally, HGT introduces a heterogeneous mini-batch graph sampling algorithm called HG-Sampling, which facilitates efficient and scalable training on web-scale datasets.

3.3. Comparative Experiments

Three widely used evaluation metrics—F1 score, Normalized Mutual Information (NMI) and Adjusted Rand Index (ARI)—were employed to assess the quality of the clustering results in community detection. A brief overview of these metrics is provided below:
F1 score: F1 score is a standard evaluation metric for community detection, measuring the similarity between the detected community partition and the ground truth. It combines precision and recall by calculating their harmonic mean. F1 scores range from 0 to 1, where a higher score indicates better community partition quality and a lower score suggests a greater disparity between the detected and true communities.
NMI: NMI [43] quantifies the shared statistical information between two clusters, thus providing an assessment of clustering performance. NMI values range from 0 to 1. A value closer to 1 indicates a strong alignment between the clustering results and the correct community classification, while a value near 0 implies weak correspondence. NMI standardizes the metric to mitigate the effects of varying category numbers and dataset scales, ensuring consistent evaluation across different datasets and clustering algorithms.
ARI: The Adjusted Rand Index (ARI) is a widely used metric for clustering evaluation, measuring the degree of agreement between the clustering outcomes and the ground truth labels. The ARI adjusts the Rand Index to account for random chance, leading to a more accurate reflection of clustering quality. The ARI ranges from −1 to 1, with a score of 1 indicating perfect agreement between the clustering results and the ground truth, suggesting accurate capture of the data structure. A score of 0 means the clustering results are no better than random chance, indicating poor clustering performance. Negative values suggest clustering results that are worse than random, likely due to significant misclassification errors.
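For reference, NMI and ARI can be computed directly from the contingency table. The sketch below is plain Python, using the arithmetic-mean normalization for NMI (one common convention); it is not the exact implementation used in the experiments:

```python
import numpy as np
from math import comb, log

def contingency(y_true, y_pred):
    """Counts of samples falling in each (true cluster, predicted cluster) cell."""
    rows, cols = sorted(set(y_true)), sorted(set(y_pred))
    M = np.zeros((len(rows), len(cols)), dtype=int)
    for t, p in zip(y_true, y_pred):
        M[rows.index(t), cols.index(p)] += 1
    return M

def nmi(y_true, y_pred):
    M = contingency(y_true, y_pred)
    n = int(M.sum())
    a, b = M.sum(axis=1), M.sum(axis=0)
    mi = sum(M[i, j] / n * log(n * M[i, j] / (a[i] * b[j]))
             for i in range(M.shape[0]) for j in range(M.shape[1]) if M[i, j])
    h = lambda counts: -sum(c / n * log(c / n) for c in counts if c)
    return 2 * mi / (h(a) + h(b))          # arithmetic-mean normalization

def ari(y_true, y_pred):
    M = contingency(y_true, y_pred)
    n = int(M.sum())
    sum_ij = sum(comb(int(x), 2) for x in M.ravel())
    sum_a = sum(comb(int(x), 2) for x in M.sum(axis=1))
    sum_b = sum(comb(int(x), 2) for x in M.sum(axis=0))
    expected = sum_a * sum_b / comb(n, 2)  # chance-corrected baseline
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)
```

Note that both metrics are invariant to label permutations: relabeling the detected communities does not change the score.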
In this experiment, GCN, GAT, LGNN and HAN were employed to learn primary node embeddings. The resulting embedding matrices were then input into the K-Means algorithm for clustering, with k set to the number of primary node categories, thereby performing community detection. The performance of each algorithm is presented in Table 2 below. Meta-path-based methods such as Metapath2vec and HAN were unable to perform community detection on the AIFB dataset because no meta-paths are predefined for it, which highlights a key constraint of meta-path-based approaches.
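The embed-then-cluster pipeline described above can be sketched as follows. The Lloyd-style K-Means with farthest-first seeding and the synthetic two-cluster "embeddings" are illustrative stand-ins for the learned embedding matrices, not the experimental setup itself:

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd's algorithm with deterministic farthest-first seeding."""
    centers = [X[0]]
    for _ in range(1, k):                       # farthest-first initialization
        d = np.min([np.square(X - c).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # assign each point to its nearest center, then recompute centers
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Stand-in "embedding matrix": two well-separated groups of primary nodes.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
labels = kmeans(emb, k=2)                       # k = number of node categories
```

This is exactly the auxiliary clustering step that BP-GCN's end-to-end design makes unnecessary.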
When comparing unsupervised methods, Metapath2vec outperforms Node2vec. On the DBLP dataset, Metapath2vec significantly surpasses Node2vec in terms of the F1, NMI and ARI metrics. These results suggest that longer meta-paths are more effective at capturing higher-order semantic relationships between nodes, leading to richer node representations. Furthermore, the design of meta-paths allows Metapath2vec to better exploit the semantic information in heterogeneous graphs, enhancing its performance on heterogeneous network data. This underscores the crucial role of meta-paths in improving node embedding quality in unsupervised learning.
Among the supervised methods, GCN and LGNN exhibit the weakest performance, particularly on the DBLP-P network. On this dataset, GCN even produces negative metric values, while LGNN encounters similar issues on the DBLP-A and IMDB datasets. This is because GCN and LGNN are optimized for homogeneous networks and struggle to process the complex heterogeneous information present in heterogeneous graphs, making them unsuitable for community detection in such networks. In contrast, GAT, HAN and HGT, which employ attention mechanisms to assess the importance of different node relationships in heterogeneous graphs, demonstrate superior performance. Among them, HAN outperforms GAT due to its use of meta-paths, which extract deeper structural and semantic information, leading to better performance. However, HAN’s reliance on predefined meta-paths limits its ability to detect communities in the AIFB dataset. Unlike HAN, HGT does not rely on meta-paths, providing greater flexibility across datasets, which enhances its robustness and generalization capability. Its community detection performance is comparable to that of HAN. However, HGT still requires labeled data for model optimization, which constrains its ability to generalize to unseen data.
The experimental results demonstrate that BP-GCN outperforms the other methods across multiple datasets and evaluation metrics. On the ACM dataset, BP-GCN achieved an ARI score of 0.4039, the highest among the tested methods and more than two percentage points above HGT. On the DBLP-P dataset, BP-GCN achieved F1 and NMI scores of 0.4875 and 0.4392, respectively, with the NMI score exceeding HGT's by more than 30 percentage points, highlighting BP-GCN's ability to capture complex relationships between nodes. On the IMDB dataset, BP-GCN achieved an NMI score of 0.2298, significantly surpassing the other methods and demonstrating its broad applicability to heterogeneous graphs. On the AIFB dataset, BP-GCN achieved F1 and NMI scores of 0.7659 and 0.3854, respectively, outperforming HGT and the other baselines. This result underscores BP-GCN's capability to uncover higher-order relationships between diverse node types. Unlike meta-path-dependent methods such as HAN, BP-GCN requires no predefined meta-paths, making it more flexible and adaptable across datasets; as a result, its learned node embeddings are better suited to community detection in heterogeneous networks. Compared with models designed for homogeneous graphs, such as GCN and LGNN, BP-GCN demonstrates clearly superior performance in heterogeneous graph analysis.
In conclusion, the design and optimization strategies of BP-GCN enable exceptional performance in community detection. The model’s adaptability and robustness make it highly effective for analyzing heterogeneous graphs, positioning it as a state-of-the-art solution for heterogeneous network analysis.

3.4. Parameter Sensitivity Analysis

This section presents a systematic examination of the sensitivity of the model's performance to key hyperparameters, namely the number of attention heads, the embedding dimension and the context path length, through a series of experiments. Specifically, we analyzed how different parameter configurations influenced the model's performance across various datasets. By adjusting these hyperparameters, we assessed their impact on the model's final performance and report experimental results under multiple parameter settings for each dataset. These findings provide empirical evidence of the critical roles that attention heads, embedding dimensions and context path length play in shaping the model's effectiveness.

3.4.1. Number of Attention Heads

Figure 4 and Figure 5 illustrate the model’s performance across varying numbers of attention heads on the ACM and DBLP datasets. The experiments evaluated the impact of different attention head configurations on model performance, comparing both smaller and larger numbers of attention heads. Due to hardware limitations, the maximum number of attention heads tested was 16. Nevertheless, the results offer valuable insights into how the number of attention heads affects model performance.
As depicted in Figure 4 and Figure 5, the F1, NMI and ARI values reach their peak at 8 attention heads but begin to decline when the number increases to 16. This suggests that simply increasing the number of attention heads does not necessarily enhance model performance. When the number of attention heads is too low, the model struggles to capture complex features, leading to suboptimal results.
Although increasing the number of attention heads beyond 8 can enhance the model’s capacity, it may also introduce redundant information, increasing both the number of parameters and computational complexity. This can lead to attention dispersion or the incorporation of noise, which negatively affects performance. Therefore, selecting the optimal number of attention heads is crucial for maximizing performance across different tasks and datasets. Based on the experimental results, setting the number of attention heads to 8 is recommended for community detection tasks on both the ACM and DBLP-A datasets, as it yielded the best overall performance.
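The trade-off discussed above stems from how the embedding dimension is partitioned across heads: more heads means more independent "views" of the neighborhood but a smaller dimension per head. The sketch below illustrates this split; the dimensions and random parameters are arbitrary choices of ours, not the model's actual configuration:

```python
import numpy as np

def multi_head_self_attention(X, Wq, Wk, Wv, n_heads):
    """Scaled dot-product self-attention with the model dimension
    split evenly across heads (d must be divisible by n_heads)."""
    n, d = X.shape
    assert d % n_heads == 0, "embedding dim must divide evenly across heads"
    dh = d // n_heads                                  # per-head dimension
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(n_heads):
        sl = slice(h * dh, (h + 1) * dh)
        scores = Q[:, sl] @ K[:, sl].T / np.sqrt(dh)   # scaled dot-product
        scores -= scores.max(axis=1, keepdims=True)    # numerical stability
        att = np.exp(scores)
        att /= att.sum(axis=1, keepdims=True)          # softmax per node
        heads.append(att @ V[:, sl])
    return np.concatenate(heads, axis=1)               # re-join head outputs

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 16))                       # 5 nodes, dim 16
W = [rng.standard_normal((16, 16)) for _ in range(3)]
out = multi_head_self_attention(X, *W, n_heads=8)      # 8 heads of dim 2 each
```

With a fixed total dimension, doubling the head count halves each head's subspace, which is one concrete reason performance can degrade at 16 heads.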

3.4.2. Embedding Dimensions

Figure 6 and Figure 7 illustrate the model's performance across different embedding dimensions on the ACM and DBLP datasets, providing insight into how embedding size influences model effectiveness. The experimental results indicate a positive correlation between embedding dimension and model performance across all datasets; however, the relationship is nonlinear.
When the embedding dimension was too small, the model struggled to effectively represent features, limiting its ability to capture complex relationships within the data. As the embedding dimension increased, the model’s capacity to represent features improved, allowing it to capture richer semantic information. However, this also introduced potential redundancy, which can lead to overfitting and reduced generalization.
Moreover, increasing the embedding dimension expanded the number of model parameters, making training more computationally intensive and requiring additional computational resources. Due to experimental constraints, the embedding dimension for the ACM dataset was capped at 128, while for the DBLP-A and DBLP-P tasks, the maximum embedding dimension was set to 256. Although hardware limitations restricted the scope of the experiments, the results still provide valuable guidance for selecting appropriate embedding dimensions to optimize model performance.
Based on the experimental results for attention head parameters, the model’s performance on the ACM dataset was evaluated by maintaining a constant number of eight attention heads while varying the embedding dimensions. The results, presented in Figure 6, demonstrate a significant improvement in F1, NMI and ARI scores as the embedding dimension increased. These findings suggest that larger embedding dimensions enhanced the model’s ability to represent features, allowing it to capture more complex relationships within the data, thereby improving performance.
However, increasing the embedding dimension also escalates computational demands and memory usage. This trade-off underscores the importance of balancing performance improvements with resource efficiency. The experiments revealed that an embedding dimension of 128, combined with eight attention heads, achieved the best performance on the ACM dataset. Therefore, this configuration is recommended to achieve an optimal balance between performance and efficiency for community detection tasks on this dataset.
Figure 7 illustrates the performance of community detection tasks on the DBLP dataset, with the Author node as the primary node. As previously mentioned, the model achieved optimal performance when the number of attention heads was set to 8 for the DBLP-A task. In this section, the experiments fixed the number of attention heads at 8 and focused on examining how embedding dimensions influenced model performance.
The results indicate that for the DBLP-A task, the model performed best when the embedding dimension was set to 16. Beyond this point, performance gradually declined as the embedding dimension increased. This decline can be attributed to the additional complexity introduced by higher embedding dimensions. While a larger feature space could enhance the model’s expressiveness, it also increased the number of parameters, raising model complexity and making training more challenging. Furthermore, excessively high embedding dimensions may lead to overfitting, reducing the model’s ability to generalize effectively on the test set.
Based on this analysis, setting the embedding dimension to 16 while maintaining 8 attention heads is recommended for achieving optimal performance in community detection tasks on the DBLP dataset.

3.4.3. Context Path Length

Controlled-variable experiments revealed the nonlinear impact of context path length on the performance of the BP-GCN model. As illustrated in Figure 8a,b, on the ACM and DBLP-A datasets the model achieved its peak F1, NMI and ARI scores when the path length was increased to K = 3 and K = 4, respectively. Notably, the performance of BP-GCN on the DBLP-A dataset further validates the core hypothesis: at K = 3, the model optimally captures higher-order relational information between nodes, thereby achieving superior performance in community detection tasks. This phenomenon underscores the critical role of high-order topological features in node semantic representation within complex network analysis.
The inverted U-shaped pattern revealed by the experimental data can be attributed to three mechanisms: (1) short paths (K ≤ 2) trap the model in local neighborhoods, where it perceives only the shallow features of directly connected nodes; (2) within the optimal range (K = 3–4), the expanded context window enables joint structural and semantic modeling of multi-hop neighborhoods, with the self-attention mechanism dynamically aggregating feature combinations that discriminate between communities; and (3) excessively long paths (K > 4) induce network degradation: once K exceeds a critical threshold, the effective neighborhood of each node expands sharply, eventually approaching a fully connected topology. This over-connected state has two negative effects. First, node features are excessively smoothed (over-smoothing) during multi-layer propagation, blurring the discriminative boundaries between communities. Second, the exponential growth in path combinations accumulates semantic noise, diluting community-related features. The choice of context path length therefore requires a balance between information richness and noise control to achieve optimal model performance. This conclusion also suggests a direction for future work: model designs should incorporate a dynamic path-length adjustment mechanism rather than simply adopting fixed-length context paths.
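The over-smoothing effect invoked above can be demonstrated on a toy graph: repeated propagation with a row-normalized adjacency is a sequence of neighborhood averages, so node features collapse toward a common value as the effective path length K grows. The 4-node graph and random features below are illustrative only:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                                # self-loops
P = A_hat / A_hat.sum(axis=1, keepdims=True)         # row-stochastic propagation

rng = np.random.default_rng(1)
H = rng.standard_normal((4, 2))                      # initial node features

def spread(M):
    """Mean pairwise distance between node feature vectors."""
    return np.mean([np.linalg.norm(M[i] - M[j])
                    for i in range(len(M)) for j in range(len(M))])

# Propagate for a moderate and an excessive number of hops.
H_k3 = np.linalg.matrix_power(P, 3) @ H
H_k50 = np.linalg.matrix_power(P, 50) @ H
```

After many hops, `spread` shrinks toward zero: all node representations become nearly identical, so the discriminative boundaries between communities vanish.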
Figure 9 illustrates the variation in runtime of the BP-GCN model under different context path lengths. It is important to note that, due to the limitations of the experimental machine’s performance, the maximum context path length in the DBLP-A and DBLP-P datasets was set to 4, making it impossible to execute community detection tasks with a path length of 5.
As shown in Figure 9, the runtime of the model increased significantly as the context path length grew. When the path length increased from 1 to 2, the runtime grew relatively smoothly; when it increased from 2 to 3, the runtime rose sharply, while the model's performance also improved significantly (as shown in Figure 8). When the path length increased further, from 3 to 5, the runtime surged dramatically, yet the model's performance did not reach its optimum; the best performance was instead achieved at a path length of 3. The appropriate selection of context path length therefore has a significant impact on both the performance and the efficiency of the model.

The runtime grows with the context path length because a longer path substantially expands the range of information the model must process and thus its computational complexity: longer context paths require the model to aggregate information over a larger neighborhood. In addition, longer paths introduce more redundant information, increasing the overhead of self-attention semantic parsing and adding to the computational burden. Excessively long paths may also lead to information overload or noise accumulation, which degrades performance. Selecting an appropriate context path length is thus crucial for balancing model performance and computational efficiency.

4. Conclusions

This paper presents BP-GCN, a heterogeneous network community detection model based on a self-attention mechanism. BP-GCN leverages the Bernoulli–Poisson framework to address the challenges of community detection in heterogeneous graphs. The model excels in learning structural features within these graphs without relying on predefined meta-paths. By using a self-attention mechanism, BP-GCN captures higher-order relationships between nodes and highlights the significance of different context paths, overcoming the limitations of traditional meta-path-based methods. Moreover, BP-GCN eliminates the need for additional clustering algorithms in heterogeneous community detection, facilitating an end-to-end detection process. This approach enhances both efficiency and flexibility. Extensive experiments conducted on various real-world datasets show that BP-GCN outperforms existing benchmark methods, demonstrating its robust generalization ability and resilience.
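As a concrete illustration of the Bernoulli–Poisson framework referenced above, the sketch below shows the standard negative log-likelihood for the Bernoulli–Poisson link, P(A_uv = 1) = 1 − exp(−F_u · F_v), with a nonnegative community-affiliation matrix F. This is the generic formulation of that framework, not the paper's exact training objective:

```python
import numpy as np

def bernoulli_poisson_nll(F, A, eps=1e-10):
    """Negative log-likelihood of an undirected graph A under the
    Bernoulli-Poisson link with affiliation matrix F (rows = nodes)."""
    S = F @ F.T                                   # pairwise affiliation overlap
    edge_nll = -np.log(1.0 - np.exp(-S) + eps)    # term for observed edges
    non_edge_nll = S                              # term for absent edges
    iu = np.triu_indices_from(A, k=1)             # undirected: count each pair once
    return float(np.sum(np.where(A[iu] > 0, edge_nll[iu], non_edge_nll[iu])))

# Two communities {0,1} and {2,3}; A has edges only within communities.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
F_good = 2.0 * np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
F_bad = 2.0 * np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
```

Minimizing this loss directly over F is what allows the affiliation matrix itself to define the communities, removing the need for a separate clustering step.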

Author Contributions

Conceptualization, G.Z.; Methodology, G.Z. and R.-F.W.; Software, G.Z.; Validation, G.Z. and R.-F.W.; Formal Analysis, G.Z. and R.-F.W.; Investigation, G.Z. and R.-F.W.; Resources, G.Z.; Data Curation, G.Z.; Writing—Original Draft Preparation, G.Z.; Writing—Review and Editing, G.Z. and R.-F.W.; Visualization, G.Z.; Project Administration, G.Z.; Funding Acquisition, G.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Humanities and Social Sciences Planning Foundation of the Ministry of Education, grant number 21YJA860001 and Shandong Natural Science Foundation of China, grant number ZR2021MG006.

Data Availability Statement

The experimental datasets used in this study are available online at https://github.com/xiaotaozi121096/BP-GCN/ (last accessed on 12 March 2025).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

This appendix provides a comprehensive list of the symbols and notations used throughout this paper, together with their meanings, for clarity and ease of reference.
Table A1. Symbol Explanation.

| Symbol | Explanation | Relation |
|--------|-------------|----------|
| HG | Heterogeneous graph | |
| HG_P | Primary graph of the heterogeneous graph | HG_P ⊆ HG |
| HG_A′ | Auxiliary graph of the heterogeneous graph | HG_A′ ⊆ HG |
| A | Node set of the heterogeneous graph | |
| P | Primary node type | P ∈ A |
| A′ | Auxiliary node type | A′ ∈ A |
| | Edge set of the heterogeneous graph | |
| ρ_k | Context path of length k | |
| F | Community membership matrix | |

References

1. Wang, M.; Xiang, D.; Qu, Y.; Li, G. The diagnosability of interconnection networks. Discret. Appl. Math. 2024, 357, 413–428.
2. Wang, X.; Ji, H.; Shi, C.; Wang, B.; Ye, Y.; Cui, P.; Yu, P.S. Heterogeneous graph attention network. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 2022–2032.
3. Wu, C.-Y.; Beutel, A.; Ahmed, A.; Smola, A.J. Explaining reviews and ratings with paco: Poisson additive co-clustering. In Proceedings of the 25th International Conference Companion on World Wide Web, Montréal, QC, Canada, 11–15 April 2016; pp. 127–128.
4. Cai, D.; Shao, Z.; He, X.; Yan, X.; Han, J. Mining hidden community in heterogeneous social networks. In Proceedings of the 3rd International Workshop on Link Discovery, Chicago, IL, USA, 21–25 August 2005; pp. 58–65.
5. Qi, G.-J.; Aggarwal, C.C.; Huang, T.S. On clustering heterogeneous social media objects with outlier links. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, Seattle, WA, USA, 8–12 February 2012; pp. 553–562.
6. Shi, C.; Wang, R.; Li, Y.; Yu, P.S.; Wu, B. Ranking-based clustering on general heterogeneous information networks by network projection. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, Shanghai, China, 3–7 November 2014; pp. 699–708.
7. Fan, S.; Zhu, J.; Han, X.; Shi, C.; Hu, L.; Ma, B.; Li, Y. Metapath-guided heterogeneous graph neural network for intent recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2478–2486.
8. Pan, C.-H.; Qu, Y.; Yao, Y.; Wang, M.-J.-S. HybridGNN: A Self-Supervised Graph Neural Network for Efficient Maximum Matching in Bipartite Graphs. Symmetry 2024, 16, 1631.
9. Wang, R.-F.; Su, W.-H. The application of deep learning in the whole potato production chain: A comprehensive review. Agriculture 2024, 14, 1225.
10. Cui, K.; Tang, W.; Zhu, R.; Wang, M.; Larsen, G.D.; Pauca, V.P.; Alqahtani, S.; Yang, F.; Segurado, D.; Fine, P.; et al. Real-time localization and bimodal point pattern analysis of palms using UAV imagery. arXiv 2024, arXiv:2410.11124.
11. Wang, M.; Lin, Y.; Wang, S. The nature diagnosability of bubble-sort star graphs under the PMC model and MM∗ model. Int. J. Eng. Appl. Sci. 2017, 4, 2394–3661.
12. Cui, K.; Li, R.; Polk, S.L.; Murphy, J.M.; Plemmons, R.J.; Chan, R.H. Unsupervised spatial-spectral hyperspectral image reconstruction and clustering with diffusion geometry. In Proceedings of the 2022 12th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), Rome, Italy, 13–16 September 2022; pp. 1–5.
13. Tu, Y.-H.; Wang, R.-F.; Su, W.-H. Active Disturbance Rejection Control—New Trends in Agricultural Cybernetics in the Future: A Comprehensive Review. Machines 2025, 13, 111.
14. Xiang, D.; Hsieh, S.-Y. G-good-neighbor diagnosability under the modified comparison model for multiprocessor systems. Theor. Comput. Sci. 2025, 1028, 115027.
15. Cui, K.; Li, R.; Polk, S.L.; Lin, Y.; Zhang, H.; Murphy, J.M.; Plemmons, R.J.; Chan, R.H. Superpixel-based and spatially-regularized diffusion learning for unsupervised hyperspectral image clustering. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–18.
16. Li, Y.; Wu, Z.; Wang, Z.; Li, P. CDBMA: Community detection in heterogeneous networks based on multi-attention mechanism. In Proceedings of the Chinese National Conference on Social Media Processing, Hefei, China, 23–26 November 2023; pp. 174–187.
17. Fu, T.-Y.; Lee, W.-C.; Lei, Z. Hin2vec: Explore meta-paths in heterogeneous information networks for representation learning. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 1797–1806.
18. Dong, Y.; Chawla, N.V.; Swami, A. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 135–144.
19. Hu, Z.; Dong, Y.; Wang, K.; Sun, Y. Heterogeneous graph transformer. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 2704–2710.
20. Yang, J.; Leskovec, J. Community-affiliation graph model for overlapping network community detection. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10–13 December 2012; pp. 1170–1175.
21. Shchur, O.; Günnemann, S. Overlapping community detection with graph neural networks. arXiv 2019, arXiv:1909.12201.
22. Luo, L.; Fang, Y.; Cao, X.; Zhang, X.; Zhang, W. Detecting communities from heterogeneous graphs: A context path-based graph neural network model. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Queensland, Australia, 1–5 November 2021; pp. 1170–1180.
23. Barman, D.; Bhattacharya, S.; Sarkar, R.; Chowdhury, N. k-Context technique: A method for identifying dense subgraphs in a heterogeneous information network. IEEE Trans. Comput. Soc. Syst. 2019, 6, 1190–1205.
24. Watts, D.J.; Strogatz, S.H. Collective dynamics of 'small-world' networks. Nature 1998, 393, 440–442.
25. Du, H.; Li, S.; Yue, Z.; Yang, X. Research on community structure of small-world networks and scale-free networks. Chin. Phys. Soc. 2007, 56, 6886–6893.
26. Guare, J. Six degrees of separation. In The Contemporary Monologue: Men; Routledge: London, UK, 2016; pp. 89–93.
27. Zhang, L.; Tu, W. Six Degrees of Separation in Online Society. 2009. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=1febc4cc5199d36e50ec7d3533a8ac8a32889d7c (accessed on 10 March 2025).
28. Zhao, C.-T.; Wang, R.-F.; Tu, Y.-H.; Pang, X.-X.; Su, W.-H. Automatic lettuce weed detection and classification based on optimized convolutional neural networks for robotic weed control. Agronomy 2024, 14, 2838.
29. Wang, Z.; Wang, R.; Wang, M.; Lai, T.; Zhang, M. Self-supervised transformer-based pre-training method with General Plant Infection dataset. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Urumqi, China, 18–20 October 2024; pp. 189–202.
30. Tinto, V. Dropout from higher education: A theoretical synthesis of recent research. Rev. Educ. Res. 1975, 45, 89–125.
31. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
32. Li, L.; Wang, H.; Wu, Y.; Chen, S.; Wang, H.; Sigrimis, N.A. Investigation of strawberry irrigation strategy based on K-means clustering algorithm. Trans. Chin. Soc. Agric. Mach. 2020, 51, 295–302.
33. Voita, E.; Talbot, D.; Moiseev, F.; Sennrich, R.; Titov, I. Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. arXiv 2019, arXiv:1905.09418.
34. Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; pp. 1597–1600.
35. Sun, S.; Liu, J.; Sun, S. Hyperspectral subpixel target detection based on interaction subspace model. Pattern Recognit. 2023, 139, 109464.
36. Wang, H.; Zhang, X.; Mei, S. Shannon-cosine wavelet precise integration method for locust slice image mixed denoising. Math. Probl. Eng. 2020, 2020, 4989735.
37. Gao, J.; Liang, F.; Fan, W.; Sun, Y.; Han, J. Graph-based consensus maximization among multiple supervised and unsupervised models. In Proceedings of the 23rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009; Volume 22.
38. Ristoski, P.; De Vries, G.K.D.; Paulheim, H. A collection of benchmark datasets for systematic evaluations of machine learning on the semantic web. In Proceedings of the Semantic Web–ISWC 2016: 15th International Semantic Web Conference, Kobe, Japan, 17–21 October 2016; Proceedings, Part II; pp. 186–194.
39. Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864.
40. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
41. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
42. Chen, Z.; Li, X.; Bruna, J. Supervised community detection with line graph neural networks. arXiv 2017, arXiv:1705.08415.
43. Danon, L.; Diaz-Guilera, A.; Duch, J.; Arenas, A. Comparing community structure identification. J. Stat. Mech. Theory Exp. 2005, 2005, P09008.
Figure 1. Heterogeneous bibliographic network.
Figure 2. The context path gives four possible context paths ρ*, each of different lengths, to connect participants V1 and V2.
Figure 3. BP-GCN Model Architecture.
Figure 4. Performance of different attention head numbers on ACM dataset.
Figure 5. Performance of different attention head numbers on DBLP-A dataset.
Figure 6. Performance of different embedding dimensions in ACM datasets.
Figure 7. Performance of different embedding dimensions in DBLP-A datasets.
Figure 8. The influence of different context path lengths on model performance.
Figure 9. The running time of different context path lengths.
Table 1. Dataset information (primary node types are marked with “*”).

| Dataset | Node Type | Nodes | Edge Type | Edges | Meta-Path |
|---|---|---|---|---|---|
| ACM | * Paper (P) | 12,499 | Paper–Paper | 30,789 | PAP, PSP |
| | Author (A) | 17,431 | Paper–Author | 37,055 | |
| | Subject (S) | 73 | Paper–Subject | 12,499 | |
| | Facility (F) | 1804 | Author–Facility | 30,424 | |
| DBLP | * Paper (P) | 14,475 | Paper–Conference | 14,736 | APCPA, APA, APTPA |
| | * Author (A) | 14,736 | Author–Paper | 41,794 | |
| | Conference (C) | 20 | Paper–Term | 114,624 | |
| | Term (T) | 8920 | | | |
| IMDB | * Movie | 4275 | Movie–Actor | 12,831 | MAM, MDM, MKM |
| | Actor | 5432 | Movie–Director | 4181 | |
| | Director | 2083 | Movie–Keyword | 20,428 | |
| | Keyword | 7313 | | | |
| AIFB | A total of 7 types of nodes | 7262 | A total of 104 types of edges | 48,810 | – |
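The meta-path column in Table 1 (e.g., PAP on ACM) denotes a composite relation between nodes of the primary type. As an illustrative sketch only, using a hypothetical toy edge list (not data from the ACM dataset), the PAP relation can be materialized by joining Paper–Author edges on shared authors:

```python
from collections import defaultdict

# Hypothetical toy Paper-Author edges; in ACM, the PAP meta-path
# links two papers that share at least one author.
pa_edges = [("p1", "a1"), ("p2", "a1"), ("p2", "a2"), ("p3", "a2")]

def meta_path_pap(pa_edges):
    """Enumerate paper pairs connected by the PAP meta-path."""
    papers_by_author = defaultdict(set)   # author -> set of papers
    for p, a in pa_edges:
        papers_by_author[a].add(p)
    pairs = set()
    for papers in papers_by_author.values():
        for p in papers:
            for q in papers:
                if p < q:                 # each unordered pair once
                    pairs.add((p, q))
    return sorted(pairs)

print(meta_path_pap(pa_edges))  # [('p1', 'p2'), ('p2', 'p3')]
```

The same join pattern applies to PSP (shared subject) or any length-2 meta-path; longer paths such as APCPA compose several such joins.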
Table 2. Experimental results.

| Dataset | Metric | Node2vec | Metapath2vec | GCN | GAT | LGNN | HAN | HGT | BP-GCN |
|---|---|---|---|---|---|---|---|---|---|
| ACM | F1 | 0.6954 | 0.7142 | 0.5366 | 0.6876 | 0.6987 | 0.7922 | 0.7599 | 0.7496 |
| | NMI | 0.2666 | 0.3596 | 0.0966 | 0.2577 | 0.2746 | 0.394 | 0.4509 | 0.4231 |
| | ARI | 0.2469 | 0.2956 | 0.1022 | 0.1422 | 0.2368 | 0.319 | 0.3813 | 0.4039 |
| DBLP-A | F1 | 0.7572 | 0.7144 | 0.32 | 0.9023 | 0.321 | 0.9023 | 0.9386 | 0.9261 |
| | NMI | 0.0638 | 0.2554 | 0.0186 | 0.618 | 0.0069 | 0.624 | 0.7032 | 0.6934 |
| | ARI | 0.0409 | 0.2722 | 0.0166 | 0.5264 | −0.0012 | 0.665 | 0.7322 | 0.7626 |
| DBLP-P | F1 | 0.3 | 0.3125 | 0.31 | 0.3 | 0.225 | 0.3375 | 0.4 | 0.4875 |
| | NMI | 0.0655 | 0.0034 | 0.0171 | 0.0495 | 0.0431 | 0.0732 | 0.1086 | 0.4392 |
| | ARI | −0.0016 | 0.0013 | −0.0048 | −0.0029 | 0.0016 | −0.0103 | 0.0724 | 0.0564 |
| IMDB | F1 | 0.5494 | 0.488 | 0.3628 | 0.3587 | 0.3646 | 0.4888 | 0.3634 | 0.546 |
| | NMI | 0.0745 | 0.027 | 0.0018 | 0.0012 | 0.0158 | 0.1172 | 0.0101 | 0.2298 |
| | ARI | 0.0471 | 0.0146 | 0.0013 | −0.0009 | −0.0079 | 0.131 | 0.0083 | 0.1135 |
| AIFB | F1 | 0.7517 | – | 0.6524 | 0.7375 | 0.6809 | – | 0.7163 | 0.7659 |
| | NMI | 0.2401 | – | 0.1567 | 0.2117 | 0.2435 | – | 0.3812 | 0.3854 |
| | ARI | 0.1518 | – | 0.1248 | 0.1142 | 0.079 | – | 0.3011 | 0.2836 |
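For reference, the NMI column in Table 2 measures agreement between predicted and ground-truth community assignments. A minimal sketch of its computation follows, assuming geometric-mean normalization (the paper does not state which normalization variant it uses, so this is one common choice):

```python
from collections import Counter
from math import log, sqrt

def nmi(labels_true, labels_pred):
    """Normalized mutual information between two community assignments,
    normalized by the geometric mean of the two entropies."""
    n = len(labels_true)
    ct = Counter(labels_true)                 # cluster sizes, ground truth
    cp = Counter(labels_pred)                 # cluster sizes, prediction
    joint = Counter(zip(labels_true, labels_pred))
    # Mutual information I(T; P) from the joint label distribution
    mi = sum((nij / n) * log(n * nij / (ct[t] * cp[p]))
             for (t, p), nij in joint.items())
    # Entropies H(T) and H(P)
    ht = -sum((c / n) * log(c / n) for c in ct.values())
    hp = -sum((c / n) * log(c / n) for c in cp.values())
    if ht == 0 or hp == 0:                    # degenerate single-cluster case
        return 1.0 if ht == hp else 0.0
    return mi / sqrt(ht * hp)

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: identical up to relabeling
```

NMI is invariant to label permutation, which is why a clustering that swaps community ids still scores 1.0; ARI is computed analogously from pair-counting statistics.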

Share and Cite

Zhou, G.; Wang, R.-F. The Heterogeneous Network Community Detection Model Based on Self-Attention. Symmetry 2025, 17, 432. https://doi.org/10.3390/sym17030432