Unsupervised Multimodal Community Detection Algorithm in Complex Network Based on Fractal Iteration

Deng, Hui; Huang, Yanchao; Wang, Jian; Hu, Yanmei; Cai, Biao

doi:10.3390/fractalfract9080507

by Hui Deng ¹, Yanchao Huang ², Jian Wang ², Yanmei Hu ² and Biao Cai ²,*

¹ School of Computer Science and Software Engineering, Southwest Petroleum University, Chengdu 610500, China

² College of Computer Science and Cyber Security, Chengdu University of Technology, Chengdu 610059, China

* Author to whom correspondence should be addressed.

Fractal Fract. 2025, 9(8), 507; https://doi.org/10.3390/fractalfract9080507

Submission received: 23 June 2025 / Revised: 29 July 2025 / Accepted: 29 July 2025 / Published: 2 August 2025

Abstract

Community detection in complex networks plays a pivotal role in modern scientific research, including in social network analysis and protein structure analysis. Traditional community detection methods face challenges in integrating heterogeneous multi-source information, capturing global semantic relationships, and adapting to dynamic network evolution. This paper proposes a novel unsupervised multimodal community detection algorithm (UMM) based on fractal iteration. The core idea is to design a dual-channel encoder that comprehensively considers node semantic features and network topological structures. Initially, node representation vectors are derived from structural information (using feature vectors when available, or singular value decomposition to obtain feature vectors for nodes without attributes). Subsequently, a parameter-free graph convolutional encoder (PFGC) is developed based on fractal iteration principles to extract high-order semantic representations from structural encodings without requiring any training process. Furthermore, a semantic–structural dual-channel encoder (DC-SSE) is designed, which integrates semantic encodings—reduced in dimensionality via UMAP—with structural features extracted by PFGC to obtain the final node embeddings. These embeddings are then clustered using the K-means algorithm to achieve community partitioning. Experimental results demonstrate that the UMM outperforms existing methods on multiple real-world network datasets.

Keywords:

community detection; fractal iteration; multimodal networks; graph convolution; dual-channel encoder; unsupervised learning

1. Introduction

Complex networks play a critical role in modern scientific research, describing systems composed of numerous interacting nodes [1]. From social networks [2,3,4] and biological networks to technological networks [5], complex networks are ubiquitous across various domains. Understanding the organizational structure of these networks is essential for uncovering their intrinsic mechanisms, predicting dynamic behaviors, and designing effective intervention strategies. Among these, community structure stands out as one of the most significant organizational features of complex networks.

Community structure, also referred to as modular structure, means that the nodes of a network can be partitioned into groups with dense intra-group connections and sparse inter-group connections. This structure reflects latent functional relationships and information propagation pathways among nodes, revealing the modular characteristics of the system and the interaction patterns between groups. For instance, in social networks, community detection aids in understanding user interest groups, the formation and evolution of social groups, and the processes of opinion formation and dissemination. In protein interaction networks, identifying community structures facilitates the localization of functionally related protein complexes and elucidates their associations with disease mechanisms. In financial transaction networks, community detection can be used to identify high-risk trading groups and predict potential channels for systemic risk propagation [6,7], providing a basis for financial regulation.

In recent years, researchers have proposed numerous community detection methods. Among these, methods based on graph neural networks (GNNs) have garnered significant attention. GNNs learn low-dimensional node representations by propagating and aggregating information over graph structures, enabling community partitioning. However, existing GNN-based methods typically rely on a large number of trainable parameters, making them prone to overfitting, particularly in scenarios with sparse data or small-scale networks. Moreover, many GNN methods only utilize single-modal node or edge features, struggling to fully leverage the diverse heterogeneous information present in networks [8,9,10]. In contrast, traditional non-GNN methods, such as those based on modularity optimization or graph embedding [11], typically focus on local topological connections, inferring community structures by optimizing local objective functions or learning node embedding similarities. These approaches often overlook global network information, making it challenging to capture long-range dependencies and global semantic relationships. Consequently, they exhibit significant limitations in integrating multi-source heterogeneous information, capturing global semantic associations, and adapting to dynamic network evolution. Additionally, these methods often face issues such as overfitting, limited scalability, and poor model robustness on large-scale graphs.

To address the shortcomings of traditional methods, this paper proposes a novel Unsupervised Multimodal Community Detection Algorithm (UMM) based on fractal iteration. Drawing inspiration from the iterative generation and self-similarity principles of fractals, we conceptualize the aggregation of multimodal information in graphs as a fractal iterative process. By designing a parameter-free feature propagation mechanism and adaptive local information integration strategy, the UMM effectively captures local feature patterns and global structural semantics across different scales without relying on extensive trainable parameters, thereby achieving robust identification of multimodal community structures in complex networks. The main contributions of this paper are as follows:

1. A novel unsupervised node feature aggregation method is designed based on fractal iteration principles. This method avoids introducing nonlinear functions or adjustable parameters. Through multi-layer iteration, each node’s final representation effectively integrates information from all nodes within its multi-hop neighborhood.
2. A semantic–structural dual-channel encoder (DC-SSE) is proposed, which fuses semantic features (obtained by reducing the dimensionality of PFGC-derived features via UMAP) with structural features extracted by PFGC to produce the final node embeddings.
3. The fused node representations obtained from the dual-channel encoder are clustered using the K-means algorithm, achieving superior community partitioning results compared to traditional methods.

2. Related Work

This section reviews the major advancements in the field of community detection, covering approaches based on spectral clustering, modularity optimization, graph neural networks (GNNs), and other notable methods, while briefly discussing the application of fractal theory in this domain.

2.1. Spectral Clustering-Based Community Detection Methods

Spectral clustering, a prominent traditional approach, maps nodes from a high-dimensional space to a low-dimensional embedding space by computing the eigenvalues and eigenvectors of the normalized Laplacian matrix of a graph. In this space, similar nodes naturally cluster to form communities. The Ncut algorithm (Shi & Malik) [12], originally developed for image segmentation, minimizes a normalized cut objective and has been widely applied to community detection. The NJW algorithm [13], a classic spectral clustering implementation, enhances robustness through steps involving similarity graph construction, eigenvector computation, row normalization, and K-means clustering.

For directed or weighted graphs, the definition of the Laplacian matrix must be modified. For overlapping communities, clustering methods that allow overlap or specialized eigenvector processing are employed. To reduce the high time and memory costs of the eigendecomposition, techniques such as the Nyström approximation and Lanczos eigensolvers are used, while multilevel spectral clustering (e.g., the Graclus algorithm [14]) coarsens the network before partitioning it. When the number of communities k is unknown, eigengap heuristics estimate k from the largest gap in the eigenvalue distribution. Additionally, adaptive or self-tuning spectral clustering methods, such as that proposed by Zelnik-Manor and Perona [15], select the best k by evaluating the quality of partitions across different k values.
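The eigengap heuristic can be illustrated in a few lines of NumPy. This is a generic example, not code from any of the cited works. It assumes an undirected, unweighted adjacency matrix and uses the symmetric normalized Laplacian $I - D^{-1/2} A D^{-1/2}$. The function name `eigengap_k` is hypothetical.

```python
import numpy as np

def eigengap_k(adj: np.ndarray, k_max: int) -> int:
    """Estimate the number of communities from the largest gap among the
    k_max + 1 smallest eigenvalues of the symmetric normalized Laplacian."""
    deg = adj.sum(axis=1)
    # D^{-1/2}, leaving isolated nodes (degree 0) at zero
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    lap = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(lap)  # ascending order for symmetric matrices
    gaps = np.diff(eigvals[: k_max + 1])  # gaps[i] = lambda_{i+1} - lambda_i
    return int(np.argmax(gaps)) + 1  # the gap after the k-th eigenvalue gives k
```

Consider two disjoint triangles. Their Laplacian spectrum is 0, 0, 1.5, 1.5, 1.5, 1.5, so the largest gap follows the second eigenvalue and the heuristic returns k = 2.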

2.2. Modularity Optimization-Based Community Detection Methods

Modularity optimization methods measure the quality of a community partition with the modularity function Q and search for the partition that maximizes it. Newman and Girvan introduced the concept of modularity [16] and employed a greedy merging strategy for community detection, though with high computational complexity. The Louvain algorithm [1], currently the most popular and widely applied modularity optimization method, significantly improves efficiency through multilevel iteration. This iteration alternates local node moves with the aggregation of communities into super-nodes, making the algorithm suitable for large-scale networks.
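The quantity these methods maximize is the standard modularity

$$Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j).$$

The NumPy sketch below evaluates Q for a given partition. It computes only the objective, not the Louvain search itself, and the function name is hypothetical.

```python
import numpy as np

def modularity(adj: np.ndarray, labels) -> float:
    """Newman-Girvan modularity Q of a partition of an undirected graph."""
    labels = np.asarray(labels)
    deg = adj.sum(axis=1)  # k_i
    two_m = deg.sum()  # 2m for a symmetric adjacency matrix
    same = labels[:, None] == labels[None, :]  # delta(c_i, c_j)
    expected = np.outer(deg, deg) / two_m  # k_i k_j / 2m under the null model
    return float(((adj - expected) * same).sum() / two_m)
```

Take two triangles joined by a single edge and split the graph into the two triangles. This gives Q = 2(3/7 - 1/4) = 5/14 ≈ 0.357. Placing every node in one community gives Q = 0.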

Modularity optimization faces challenges such as the resolution limit (difficulty in detecting small communities) and local optima. The Leiden algorithm improves upon Louvain by incorporating additional checks and refinement steps to ensure tightly connected communities, though it does not fully resolve the resolution limit. To address this, Reichardt and Bornholdt [17] proposed multi-resolution modularity, introducing a resolution parameter γ to explore community structures at different scales. For large-scale networks, parallelized Louvain algorithms have been developed, such as the distributed Louvain algorithm implemented by Que et al. using the Spark framework [18].

Modularity optimization has been combined with random walks (e.g., Walktrap algorithm [19]) or deep learning to enhance accuracy. It has also been extended to dynamic networks (e.g., Mucha et al. [20] introduced temporal coupling terms) and multilayer networks. Current challenges include resolution limits, local optima, and adaptation to dynamic networks, particularly for ultra-large-scale networks and overlapping community detection, which require further exploration.

2.3. Graph Neural Network-Based Community Detection Methods

In recent years, with the rapid advancement of deep learning, graph neural network (GNN)-based community detection methods have emerged as a research hotspot. GNNs learn low-dimensional node embeddings through message-passing mechanisms, effectively integrating network topology and node attributes, making them suitable for community detection in heterogeneous and dynamic networks. The Graph Convolutional Network (GCN) proposed by Kipf and Welling [8], a semi-supervised classification model, provides a new approach for efficient node information integration. GCNs aggregate local neighborhood information recursively, enabling node encodings to better reflect their semantic relationships within the network. Meanwhile, the Graph Attention Network (GAT) proposed by Veličković et al. (2018) [9] enhances flexibility in assigning weights to neighborhood information through an attention mechanism, allowing the model to focus on critical nodes and edges. The GraphSAGE method by Hamilton et al. [21] demonstrates superior performance in inductive learning, offering a powerful tool for community detection in evolving and large-scale networks. Researchers often adopt multi-objective optimization frameworks, combining community partition quality with node embedding learning to construct loss functions that balance local structure and global connectivity. Adaptive optimization algorithms like Adam and RMSprop improve training efficiency and stability, while multi-task learning and self-supervised learning enhance model robustness.

2.4. Core-Expansion-Based Community Detection Methods

Core-expansion-based methods represent a class of community detection strategies that first identify structurally dense core nodes or subgraphs and then iteratively expand them into complete communities. These approaches typically exhibit strong interpretability and local structural awareness, making them particularly suitable for detecting communities with distinct core-periphery structures. They perform well in heterogeneous networks, sparse networks, and large-scale graphs.

These methods primarily fall into three categories: k-core expansion, clique percolation, and local optimization expansion. K-core expansion methods use highly connected subgraphs (e.g., k-cores) as cores and expand communities based on structural similarity or link density. For instance, the SCAN algorithm [22] expands core nodes using ε-neighborhoods and structural similarity, enabling the detection of overlapping communities and outliers. The LFM algorithm [4] defines cores based on link density and retains edges with high fuzzy similarity during expansion. Such methods are computationally efficient, scalable to large networks, and capable of identifying hierarchical communities. However, their performance is sensitive to the choice of k, and they may overlook low-density communities. Clique percolation methods employ k-cliques as cores, forming communities by connecting cliques that share k-1 nodes. For example, the CPM algorithm [23] naturally supports overlapping communities, aligning with the intuition that “communities are highly interactive subgraphs.” However, these methods suffer from high computational complexity (finding a maximum clique is NP-hard) and are sensitive to noise from small cliques. Local optimization expansion methods expand communities based on local rules. Examples include the Louvain algorithm, which greedily optimizes modularity by expanding from high-modularity cores, and the DEMON algorithm [24], which aggregates multiple local cores and determines node membership through democratic voting. These methods strike a balance between global and local information and demonstrate strong robustness, though they may converge to local optima.

In summary, core-expansion-based methods are efficient and flexible, making them widely applicable in fields such as social network analysis and bioinformatics. However, challenges remain in addressing parameter sensitivity and local optima issues.

2.5. Semi-Supervised Community Detection Methods

Semi-supervised community detection methods leverage partial known labels or pairwise constraints to guide community partitioning, thereby enhancing the accuracy and stability of the results. These methods hold significant practical value, combining the precision of supervised learning with the flexibility of unsupervised learning. By utilizing a small amount of labeled data alongside a large volume of unlabeled data, they effectively identify community structures within networks.

Representative approaches, in addition to the graph neural network methods discussed earlier, include label propagation and matrix factorization techniques. The Semi-Supervised Label Propagation (SSLP) algorithm [25], built upon the classical Label Propagation Algorithm (LPA) [26], incorporates seed node labels. By fixing a subset of known labels, SSLP iteratively propagates labels to guide the community assignment of unlabeled nodes. The Speaker-Listener Label Propagation Algorithm (SLPA) [27] extends LPA by maintaining a historical record of node labels to improve stability and supports overlapping community detection by allowing nodes to possess multiple labels. These methods demonstrate high accuracy in social network applications but are sensitive to label noise, which may lead to propagation errors.
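The seed-clamping idea behind semi-supervised label propagation can be illustrated with the sketch below. It is a generic synchronous variant written for illustration, not the exact SSLP [25] procedure. The function name, the tie-breaking rule (the smallest label wins), and the handling of unlabeled neighbors (ignored until they receive a label) are all assumptions.

```python
from collections import Counter

def seeded_label_propagation(neighbors, seeds, max_iter=100):
    """Synchronous label propagation with clamped seed labels.

    neighbors: dict mapping node -> list of neighbor nodes
    seeds: dict mapping node -> fixed (known) label
    """
    labels = {v: seeds.get(v) for v in neighbors}  # unlabeled nodes start as None
    for _ in range(max_iter):
        new_labels = dict(labels)
        for v, nbrs in neighbors.items():
            if v in seeds:
                continue  # seed labels are never overwritten
            counts = Counter(labels[u] for u in nbrs if labels[u] is not None)
            if counts:
                top = max(counts.values())
                # majority label among labeled neighbors; ties -> smallest label
                new_labels[v] = min(l for l, c in counts.items() if c == top)
        if new_labels == labels:
            break  # converged: no label changed in this round
        labels = new_labels
    return labels
```

Test it on two triangles joined by one edge, with one seed in each triangle. The run converges in two rounds, and each triangle takes its seed's label.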

Semi-supervised methods based on Non-negative Matrix Factorization (NMF) incorporate label information and pairwise constraints into the optimization objective to guide community detection. Graph Regularized Non-negative Matrix Factorization (GNMF) [28] introduces a graph Laplacian regularization term to preserve local geometric structures, enhancing accuracy under semi-supervised settings through label consistency constraints, which makes it well-suited for social networks. Semi-Supervised Non-negative Matrix Factorization (SSNMF) [29] integrates label information into the NMF optimization objective. Robust Semi-Supervised Non-negative Matrix Factorization (RSSNMF) [30] employs the $L_{2,1}$-norm to enhance robustness, while Joint Non-negative Matrix Factorization (JNMF) [31] fuses multimodal information from heterogeneous networks (e.g., node attributes and topology) through shared factor optimization for community detection, albeit at a high computational cost. These methods generally exhibit high computational complexity when applied to large-scale graphs, and their performance is sensitive to hyperparameter tuning.

In summary, semi-supervised methods significantly improve the accuracy of community detection in scenarios with partially known labels. However, their performance is contingent on the quality and quantity of labeled data, and their computational complexity remains a challenge.

2.6. Other Community Detection Methods

Beyond the methods discussed previously, the field of community detection encompasses approaches such as random walk, statistical inference, dynamic systems, and metaheuristic algorithms. These methods address community detection from diverse perspectives, each offering distinct advantages in efficiency, robustness, or applicability to specific scenarios.

Random walk algorithms leverage the flow of information within networks to uncover community structures. The Infomap algorithm [32] identifies communities by minimizing the information required to encode random walk paths. Walktrap [19] employs random walk distances for hierarchical clustering. These methods excel at capturing deep connectivity patterns but incur high computational complexity. Statistical inference algorithms frame community detection as a statistical inference task, assuming network structures are generated by probabilistic models, such as the Stochastic Block Model (SBM). The Degree-Corrected Stochastic Block Model (DC-SBM) [33] accounts for node degree heterogeneity, enhancing accuracy. These approaches offer statistical rigor and interpretability but rely heavily on model assumptions. Dynamic systems algorithms transform community detection into the evolution of dynamic processes on networks, such as synchronization behaviors based on the Kuramoto model [34] or information diffusion processes [35]. They exhibit strong robustness and adaptability to dynamic networks but are computationally intensive. Metaheuristic algorithms provide powerful search capabilities for optimizing community detection problems, such as modularity maximization. The Genetic Algorithm for Networks (GA-Net) [36] employs evolutionary principles for global search, potentially escaping local optima. This approach is well-suited for scenarios requiring high-precision community partitioning, though it typically involves high computational complexity.

Spectral clustering relies on eigenvalue decomposition [37], modularity optimization emphasizes efficiency [1], and graph neural networks (GNNs) integrate node attributes [8]. By contrast, the methods above offer distinct advantages in efficiency, robustness, or overlapping community detection.

2.7. Fractals and Community Detection

Fractal theory describes complex structures with self-similarity and scale invariance, widely applied to analyze the multi-scale properties of natural and social systems [38]. In network science, many real-world networks (e.g., the Internet, social networks, biological networks) exhibit fractal properties, where subgraph structures statistically resemble the overall network [39]. This self-similarity aligns closely with the goals of community detection: communities typically manifest as locally dense, globally sparse subgraphs. Fractal methods reveal hierarchical community structures by identifying multi-scale self-similar patterns. Fractal dimension and the Box-Covering Algorithm are commonly used to quantify a network’s fractal properties, guiding community partitioning [40].

Fractal-based community detection methods leverage network self-similarity for multi-scale community partitioning. Song et al. proposed a box-covering-based community detection algorithm that identifies fractal subgraphs by minimizing the number of boxes, mapping them to communities [39]. Lancichinetti et al. combined fractal theory with modularity optimization, proposing a multi-scale community detection method that captures communities at different scales by adjusting resolution parameters [4]. Additionally, fractal methods integrated with random walks (e.g., fractal random walks [41]) enhance community detection robustness by simulating multi-scale node transitions. These methods perform well in biological networks (e.g., protein interaction networks) and social networks, making them suitable for analyzing complex networks with hierarchical structures.

Fractal methods offer unique advantages in multi-scale analysis and hierarchical community detection but face challenges such as high computational complexity and limited adaptability to non-fractal networks.

3. Methodology

This section elaborates on the unsupervised multimodal community detection algorithm (UMM) proposed in the paper, which leverages fractal-based techniques. The algorithm aims to exploit the inherent cross-scale self-similarity in networks, recursively modeling higher-order neighbor relationships, to enhance the expressive power of node representations for complex topological structures.

In real-world networks, nodes within the same community often exhibit high semantic and structural similarity. This similarity can be abstracted as the “invariant component” of node representations, which encapsulates the shared, intrinsic, and relatively stable features or structural patterns of community members. Conversely, individual node feature differences constitute the “variable component.” Let the embedding vector of node $v$ at the $l$-th iteration be denoted as $h_{v}^{(l)} \in \mathbb{R}^{d}$, where $d$ is the embedding dimension and node $v$ belongs to community $C_{k}$ (i.e., $v \in C_{k}$). The node representation can then be decomposed as follows:

$$h_{v}^{(l)} = c_{k}^{(l)} + \Delta h_{v}^{(l)} \quad (1)$$

where $c_{k}^{(l)}$ represents the shared feature vector of community $C_{k}$ at the $l$-th iteration, i.e., the invariant component, and $\Delta h_{v}^{(l)}$ denotes the individualized deviation of node $v$. The fractal iteration mechanism introduced by UMM aggregates the shared invariant components across nodes, progressively enhancing the self-similarity of community structures layer by layer. Through multiple rounds of recursive iterations, UMM continuously extracts and preserves the variable components, capturing subtle yet critical individual differences among nodes to achieve more discriminative and structure-aware embedding representations.
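The sketch below makes the decomposition in Eq. (1) concrete. It assumes, hypothetically, that the shared component $c_k^{(l)}$ is the mean embedding of the members of $C_k$. The text does not specify how $c_k^{(l)}$ is obtained, and in an unsupervised setting the community labels would not be known in advance. Under this assumption, the deviations $\Delta h_v^{(l)}$ sum to zero within each community. The helper name `decompose` is hypothetical.

```python
import numpy as np

def decompose(h: np.ndarray, labels):
    """Split embeddings into a shared community component c_k and per-node
    deviations, assuming (hypothetically) c_k is the mean over C_k's members."""
    labels = np.asarray(labels)
    shared = np.zeros_like(h, dtype=float)
    for k in np.unique(labels):
        members = labels == k
        shared[members] = h[members].mean(axis=0)  # c_k copied to every member
    deviation = h - shared  # Delta h_v, the variable component
    return shared, deviation
```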

UMM comprehensively considers semantic and structural factors, employing a semantic–structural dual-modality optimization mechanism to refine the representation of embedding vectors. This approach addresses the potential bias toward a single modality in heterogeneous information fusion, enabling the spatial distribution of nodes to more accurately reflect the strength of inter-node relationships. Compared to recent multimodal embedding methods (e.g., D2GCN [42] and MM-GNN [43]), which typically rely on semi-supervised training, complex nonlinear transformations, and numerous trainable parameters to learn and integrate information from different modalities, UMM requires no trainable parameters. These conventional methods are constrained by fixed receptive fields and high computational costs, often leading to overfitting and over-smoothing issues. In contrast, UMM offers superior generalization ability and lower computational cost, making it well-suited for unsupervised community detection tasks in complex networks.

The algorithm architecture comprises two key modules: the node feature aggregation module, termed the Parameter-Free Graph Convolution Encoder (PFGC), and the heterogeneous information fusion module, as illustrated in Figure 1.

In the node feature learning phase, the PFGC module relies solely on the graph’s topological structure and initial node features to perform multi-order feature aggregation, effectively extracting high-order semantic features without introducing additional trainable parameters. In scenarios lacking explicit node features, this method constructs initial inputs through the singular value decomposition (SVD) of the adjacency matrix, ensuring the model’s applicability and generalization across diverse networks. The fractal iteration mechanism, grounded in the self-similarity of community structures, progressively aggregates shared components among nodes through multiple rounds of information propagation to robustly capture the common structural characteristics of communities. Simultaneously, it iteratively preserves and amplifies subtle inter-node differences across layers. This design not only enhances the interpretability of feature representations but also achieves robust high-order feature learning within a parameter-free framework.
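The following is a minimal sketch of SVD-derived initial features for graphs without node attributes. The text states only that the SVD of the adjacency matrix is used. Truncating to `dim` components and scaling the left singular vectors by the square roots of the singular values are assumptions, and `svd_node_features` is a hypothetical helper.

```python
import numpy as np

def svd_node_features(adj: np.ndarray, dim: int) -> np.ndarray:
    """Rank-`dim` node features from the SVD of the adjacency matrix.
    The sqrt(S) scaling of U is an assumption, not taken from the paper."""
    u, s, _ = np.linalg.svd(adj)  # singular values come in descending order
    return u[:, :dim] * np.sqrt(s[:dim])  # one row of features per node
```

For two disjoint triangles, nodes in the same triangle receive identical feature vectors. This is because the top two singular vectors span the two triangles' indicator vectors.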

In the heterogeneous information fusion phase, the semantic–structural dual-channel encoder (DC-SSE) models the semantic and structural features of nodes separately through distinct channels, balancing node semantic features with edge-based structural information. Through end-to-end collaborative training, it achieves the complementary enhancement of these heterogeneous features, producing optimized embedding representations with strong discriminative power. The final community partitioning module clusters the embedding representations with the classical K-means algorithm. Because K-means is a simple distance-based method, this choice directly tests how well separated the embeddings are, while remaining computationally efficient on large-scale networks.
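The final clustering step can be sketched as follows. This section does not specify how DC-SSE combines the two channels. The hypothetical `fuse_and_cluster` helper therefore uses plain concatenation as a stand-in for the actual fusion. The `kmeans` function is a minimal implementation of Lloyd's algorithm with deterministic farthest-point initialization, written here to keep the example self-contained.

```python
import numpy as np

def kmeans(x, k: int, n_iter: int = 100) -> np.ndarray:
    """Lloyd's K-means with deterministic farthest-point initialization."""
    x = np.asarray(x, dtype=float)
    centers = [x[0]]
    for _ in range(1, k):
        # next center: the point farthest from every center chosen so far
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[int(np.argmax(d))])
    centers = np.array(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)  # assign each node to its nearest center
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)  # keep the old center if a cluster empties
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def fuse_and_cluster(semantic: np.ndarray, structural: np.ndarray, k: int):
    # Hypothetical fusion by concatenation; NOT the paper's DC-SSE operator.
    return kmeans(np.hstack([semantic, structural]), k)
```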

3.1. Unsupervised Node Feature Aggregation Method

This section introduces the unsupervised node feature aggregation method employed in the UMM. In the UMM framework, we define the graph as a tuple $G = (V, E)$, where $V$ represents the set of nodes and $E \subseteq \{(u, v) \mid u, v \in V\}$ denotes the set of edges. To extract semantic features for nodes, we adopt the graph smoothness assumption, which posits that adjacent nodes (or nodes with similar structures) in the graph should exhibit similar features or labels. This assumption can be expressed mathematically as follows:

$$L_{smooth} = \sum_{(u, v) \in E} \lVert h_{u} - h_{v} \rVert^{2}, \quad (2)$$

where $h_{u}$ and $h_{v}$ represent the embedding vectors of nodes $u$ and $v$, respectively. Minimizing this loss pulls the embedding vectors of adjacent nodes toward each other in Euclidean space, akin to the implicit feature mapping in Word2Vec [44].
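Eq. (2) can be computed directly. For an unweighted graph in which each undirected edge is listed once, it also equals the Laplacian quadratic form $\mathrm{tr}(H^{\top} L H)$ with $L = D - A$, a standard identity. The sketch below computes both forms so they can be compared. The helper names are hypothetical.

```python
import numpy as np

def smoothness_loss(h: np.ndarray, edges) -> float:
    """Eq. (2): sum over edges (u, v) of ||h_u - h_v||^2."""
    return float(sum(np.sum((h[u] - h[v]) ** 2) for u, v in edges))

def laplacian_form(h: np.ndarray, edges) -> float:
    """Equivalent quadratic form tr(H^T L H) with L = D - A (unweighted)."""
    n = h.shape[0]
    a = np.zeros((n, n))
    for u, v in edges:
        a[u, v] = a[v, u] = 1.0
    lap = np.diag(a.sum(axis=1)) - a
    return float(np.trace(h.T @ lap @ h))
```

For $H = [[0, 0], [1, 0], [1, 1]]$ on the path 0-1-2, both functions return 2.0.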

This optimization process, grounded in local consistency, is essentially an iterative mapping that increases the regularity of the system through repeated updates of the embedding vectors. Similarly, fractals are typically generated by recursive transformations that produce self-similar structures. The classic Mandelbrot set [45], for instance, is defined by iterating a mapping that transforms one complex number into another:

$$z_{n + 1} = F(z_{n}, c) = z_{n}^{2} + c, \quad z_{0} = 0, \quad (3)$$

where $z$ is a complex number and $c$ is a complex parameter representing a point in the complex plane. The point $c$ belongs to the Mandelbrot set if the sequence $z_{n}$ remains bounded. The general iteration formula can be abstracted as follows:

$$x^{(k + 1)} = F(x^{(k)}, \theta). \quad (4)$$

In fractals, $\theta = c$, while in graph neural networks, $\theta$ can correspond to neighbor nodes, weight matrices, activation functions, or other components. The self-similarity of fractals manifests as local structures that resemble the global structure through recursive transformations [39]. In graph embedding, fractal iteration can be likened to the recursive updating of node features. The feature $h_{u}^{(k + 1)}$ of node $u$ is generated by aggregating information from its neighborhood, i.e., the features $h_{v}^{(k)}$ of its neighbor nodes $v$, analogous to the transformation of points in fractals.
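The generic iteration of Eq. (4) and its Mandelbrot instance, Eq. (3), can be written as below. The escape radius of 2 is the standard criterion for the Mandelbrot set. The iteration budget `n_steps` is an arbitrary choice for this sketch, and the function names are hypothetical.

```python
def iterate(f, x0, theta, n_steps):
    """Generic iteration x^{(k+1)} = F(x^{(k)}, theta) of Eq. (4)."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(f(xs[-1], theta))
    return xs

def mandelbrot_step(z: complex, c: complex) -> complex:
    return z * z + c  # Eq. (3): z_{n+1} = z_n^2 + c

def in_mandelbrot(c: complex, n_steps: int = 100) -> bool:
    """Escape-time test: keep c if |z_n| stays <= 2 for n_steps iterations."""
    z = 0j
    for _ in range(n_steps):
        z = mandelbrot_step(z, c)
        if abs(z) > 2:
            return False  # the orbit escapes, so c lies outside the set
    return True
```

For example, `iterate(mandelbrot_step, 0j, 1 + 0j, 3)` produces the orbit 0, 1, 2, 5, so c = 1 escapes. In contrast, c = -1 oscillates between -1 and 0 and stays in the set.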

For each node $u$ in the graph, this can be expressed as

$$h_{u}^{(k + 1)} = \phi\left(h_{u}^{(k)}, \{h_{v}^{(k)} \mid v \in N(u)\}\right), \quad (5)$$

where $\phi$ is a learnable or fixed aggregation function.

Modern graph embedding methods explicitly leverage graph structure information for node embedding. For instance, traditional Graph Convolutional Networks (GCNs) achieve feature propagation through multiple layers of nonlinear transformations. In contrast, the method proposed in this study entirely eliminates parameterized transformation matrices and nonlinear activation functions. Its core operation can be formalized as

$$h_{u}^{(k)} = \mathrm{AGGREGATE}^{(k)}\left(\{h_{v}^{(k - 1)} \mid v \in N(u)\}\right). \quad (6)$$

The aggregation function (AGGREGATE) uses a simple weighted average instead of a learnable neural network, replacing traditional weight matrices and activation functions with purely linear operations. Because nothing is trained, this substitution turns feature propagation into an unsupervised procedure. Nonlinear activation functions may distort local features, disrupt global topological consistency, and introduce information loss, thereby affecting the fractal self-similarity of community structures. In contrast, linear operations better preserve the geometric and topological properties of features, ensuring structural consistency during recursive iterations. Through multi-layer iterations, each node’s final representation integrates information from all nodes within its multi-hop neighborhood. After $k$ layers of iteration, a node’s representation incorporates information from all nodes within its $k$-hop neighborhood. This progressive aggregation resembles fractal geometry, in which a simple local rule is repeatedly applied across different scales to construct a complex structure with multi-scale self-similarity. In the feature space, this means that the local feature patterns of nodes are “recursively” absorbed and averaged over larger neighborhoods. Node embeddings continuously evolve and become richer in a “self-similar” manner across layers, ultimately producing node embeddings with multi-scale self-similar characteristics. This parameter-free propagation approach is closely related to label propagation algorithms, extending label propagation from discrete labels to multidimensional feature vectors.
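The claim that $k$ rounds of aggregation reach exactly the $k$-hop neighborhood can be checked with boolean matrix powers. This sketch assumes that each node also keeps its own features (the self term in Eq. (7)), which is why self-loops are added. It tracks only which nodes can influence which, not the feature values. The function name is hypothetical.

```python
import numpy as np

def receptive_field(adj: np.ndarray, k: int) -> np.ndarray:
    """Row v marks the nodes whose features can reach h_v after k rounds of
    propagation that keep a self term, i.e. the k-hop neighborhood of v."""
    n = adj.shape[0]
    step = (adj + np.eye(n)) > 0  # one hop, with a self-loop for the self term
    reach = np.eye(n, dtype=bool)  # after 0 rounds a node only sees itself
    for _ in range(k):
        # boolean matrix product: grow every reachable set by one hop
        reach = (reach.astype(int) @ step.astype(int)) > 0
    return reach
```

On the path 0-1-2-3-4, two rounds let node 0 see nodes 0, 1, and 2, but not nodes 3 or 4.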

The AGG function, which serves as the message aggregation function, determines how to balance a node’s own information with that of its neighbors. The specific formula is as follows:

$$h_{v}^{(l + 1)} = \alpha h_{v}^{(l)} + \frac{1 - \alpha}{|N_{v}|} \sum_{u \in N_{v}} \omega_{u, v} h_{u}^{(l)}, \quad (7)$$

where $h_{v}^{(l + 1)}$ is the embedding vector of node $v$ at layer $l + 1$, and $\alpha$ is an adjustable parameter that determines the weight of the node’s own information during message aggregation. $N_{v}$ and $|N_{v}|$ represent the neighbors and the number of neighbors of node $v$, respectively, and $\omega_{u, v}$ is the weight of the edge $(u, v)$.
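The sketch below transcribes Eq. (7) directly for a single node in plain Python. The data layout (lists for vectors, dictionaries for neighbors and edge weights) is an illustrative choice. Eq. (7) is undefined when $|N_v| = 0$, so returning the node's own features unchanged in that case is an assumption. The function name `agg` mirrors AGG but is hypothetical.

```python
def agg(h, neighbors, weights, v, alpha):
    """Eq. (7): h_v' = alpha*h_v + (1 - alpha)/|N_v| * sum_u w_uv * h_u.

    h: list of feature vectors (lists of floats)
    neighbors: dict v -> list of neighbor nodes u
    weights: dict (u, v) -> edge weight omega_{u,v}
    """
    nbrs = neighbors[v]
    if not nbrs:
        return list(h[v])  # assumption: isolated nodes keep their features
    scale = (1 - alpha) / len(nbrs)
    out = [alpha * x for x in h[v]]  # self term
    for u in nbrs:
        w = weights[(u, v)]
        out = [o + scale * w * x for o, x in zip(out, h[u])]  # neighbor terms
    return out
```

With $\alpha = 0.5$, node 0 with features [1, 0] and unit-weight neighbors [0, 2] and [4, 0] is updated to 0.5·[1, 0] + 0.25·([0, 2] + [4, 0]) = [1.5, 0.5].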

Notably, although nonlinear functions and adjustable parameters are omitted here for simplicity, they can be readily incorporated into frameworks such as graph attention mechanisms or variational graph autoencoders.

Traditional GCNs typically use a small number of layers (e.g., $k = 2$ or $3$), but parameter-free GCN methods need more iterations to capture the network structure and obtain good embeddings. Deeper propagation, however, leads to the well-known over-smoothing problem. The core operation of node feature aggregation in Algorithm 1 is a matrix multiplication with the normalized adjacency matrix $\tilde{A}$, which realizes the weighted average of Equation (7):

$$H^{(l+1)} = \tilde{A} H^{(l)}. \tag{8}$$

Algorithm 1. Parameter-free graph convolution algorithm.

Input: Network $G = (V, E)$, initial node features $H$
Parameters: Number of layers $k$, anti-over-smoothing coefficient $\lambda$
Output: Graph convolution encoded features $H^{(k)}$

Begin

1: $\mathrm{init}(H^{(0)})$ // Initialize node features $H$
2: Initialize node set $V$
3: For $l = 0$ to $k - 1$:
4: For $v$ in $V$:
5: Initialize the neighbors of $v$, denoted by $N_{v}$
6: $R_{v} = \mathrm{AGG}(H_{v}^{(l)}, \{H_{u}^{(l)} \mid u \in N_{v}\})$, where $\mathrm{AGG}$ is the message aggregation function
7: End For
8: Compute $\mathrm{Center} = \mathrm{Mean}(R)$
9: $R = \mathrm{Norm}(R - \lambda \cdot \mathrm{Center})$ // Apply anti-over-smoothing
10: $H^{(l+1)} = R$ // Update $H$
11: End For
12: Return $H^{(k)}$

End
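A compact, vectorized reading of Algorithm 1 is sketched below. It assumes that AGG is Equation (7), that $\mathrm{Mean}$ averages the aggregated features over all nodes, and that $\mathrm{Norm}$ is row-wise L2 normalization; the text does not pin down the last two, so treat them (and the name `pfgc`) as assumptions. Nothing is trained: $k$, $\alpha$, and $\lambda$ are fixed hyperparameters.

```python
import numpy as np


def pfgc(adj, h0, k=10, alpha=0.5, lam=0.5, eps=1e-12):
    """Sketch of Algorithm 1: k layers of Eq. (7) aggregation plus anti-over-smoothing."""
    n = adj.shape[0]
    counts = (adj > 0).sum(axis=1, keepdims=True).astype(float)  # |N_v| per node
    counts[counts == 0] = 1.0                                   # isolated nodes: avoid 0-division
    # Eq. (7) in matrix form: alpha * I + (1 - alpha) * (weighted adjacency / |N_v|)
    a_tilde = alpha * np.eye(n) + (1.0 - alpha) * adj / counts
    h = np.asarray(h0, dtype=float)
    for _ in range(k):                                   # l = 0 .. k-1
        r = a_tilde @ h                                  # steps 4-7: aggregate every node
        center = r.mean(axis=0, keepdims=True)           # step 8: shared low-frequency component
        r = r - lam * center                             # step 9: damp the shared component
        r = r / np.maximum(np.linalg.norm(r, axis=1, keepdims=True), eps)  # Norm: row-wise L2
        h = r                                            # step 10: H^(l+1) = R
    return h


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
emb = pfgc(adj, rng.normal(size=(4, 3)), k=20)
print(emb.shape)
```

Writing the layer as one matrix product keeps each iteration at the cost of a sparse-dense multiplication in a real implementation; the dense `numpy` arrays here are only for readability.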

This is inherently a linear transformation, and every linear transformation acting on a finite-dimensional vector space is Lipschitz continuous: it cannot amplify the distance between input data points without bound. Concretely, for any two input feature matrices $X_{1}$ and $X_{2}$, let the smoothing operation be the function $f(X) = \tilde{A} X$. There exists a bounded constant $K$, namely the spectral norm $\|\tilde{A}\|_{2}$ of the transformation matrix, such that the Frobenius-norm distance between the transformed output feature matrices is at most $K$ times the distance between the inputs:

$$\|f(X_{1}) - f(X_{2})\|_{F} \leq K \cdot \|X_{1} - X_{2}\|_{F}, \tag{9}$$

where $K$ is the Lipschitz constant, which ensures that the relative change in distances between node features in the feature space is bounded during each aggregation iteration.
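Equation (9) can be checked numerically. The sketch below is illustrative only: it uses the symmetric normalization $\tilde{A} = \hat{D}^{-1/2}(A + I)\hat{D}^{-1/2}$ (an assumption; the text only says that the adjacency matrix is normalized) on a random graph, computes $K = \|\tilde{A}\|_{2}$ as the largest singular value, and compares both sides of the inequality. For this particular normalization $K$ equals 1, so one aggregation step can never push two feature matrices further apart.

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 8, 4

# Random undirected graph without self-loops.
upper = np.triu((rng.random((n, n)) < 0.4).astype(float), k=1)
adj = upper + upper.T

# Symmetric normalisation with self-loops (assumed form of A-tilde).
a_hat = adj + np.eye(n)
d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
a_tilde = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]

K = np.linalg.norm(a_tilde, 2)  # spectral norm = largest singular value

x1, x2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
lhs = np.linalg.norm(a_tilde @ x1 - a_tilde @ x2, "fro")
rhs = K * np.linalg.norm(x1 - x2, "fro")
print(f"K = {K:.6f}, lhs = {lhs:.4f} <= rhs = {rhs:.4f}")
```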

The Hausdorff dimension is a dimensional concept that effectively describes the “roughness” or “space-filling capacity” of complex, fractal structures. Falconer demonstrated in studies on fractal dimensions and Lipschitz mappings that no Lipschitz mapping can increase the Hausdorff dimension of a set [46]. For any Lipschitz continuous function $f$ and set $A$, it follows that

$$\dim_{H}(f(A)) \leq \dim_{H}(A). \tag{10}$$

This theoretically guarantees that the aggregation operation itself does not arbitrarily introduce additional “detail” or “complexity” to increase the intrinsic dimension of the feature space.

However, the message aggregation in parameter-free Graph Convolutional Networks (GCNs), as a specific type of Lipschitz mapping, serves a purpose far beyond merely “not increasing the dimension.” It fundamentally acts as a low-pass filter, suppressing high-frequency signals (local differences) while preserving low-frequency signals (global smoothness). In the feature space, these high-frequency signals are precisely the key components that constitute subtle differences, complex patterns, and “fractal details” among node features. As multiple layers of iteration proceed, node features are progressively averaged toward their neighbors under the smoothing effect, leading to their gradual convergence in the feature space, becoming highly similar and homogenized. This “compression” and “loss of detail” effect causes the initially diverse node distribution, which may possess a high fractal dimension in the feature space, to collapse into a simpler, lower-dimensional structure. Consequently, although Lipschitz mappings only guarantee that the dimension does not increase, the inherent mechanism of graph Laplacian smoothing results in an actual reduction in the effective dimension and fractal dimension of the feature space, rendering node features increasingly uniform and indistinguishable.

To address this issue, we introduce an anti-over-smoothing method. The core idea is to rebalance low-frequency (global smoothness) and high-frequency (local difference) information after each message aggregation: a fraction $\lambda$ of the global mean feature vector (the $\mathrm{Center}$ term in Algorithm 1) is subtracted from the aggregated features, which are then normalized, so that as much high-frequency information as possible is preserved in subsequent convolutions.
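The collapse that motivates this step is easy to reproduce. The sketch below (illustrative, not from the paper) applies plain averaging with a row-normalized $\tilde{A}$ (self-loops assumed) to random features on a small connected graph and tracks the mean distance of node embeddings from their centroid. Without any anti-over-smoothing step, this spread decays toward zero, so all nodes become indistinguishable.

```python
import numpy as np


def spread(h):
    """Mean Euclidean distance of node embeddings from their centroid."""
    return float(np.linalg.norm(h - h.mean(axis=0), axis=1).mean())


n = 5
adj = np.zeros((n, n))
for u in range(n - 1):                        # connected path graph 0-1-2-3-4
    adj[u, u + 1] = adj[u + 1, u] = 1.0
a_hat = adj + np.eye(n)
a_tilde = a_hat / a_hat.sum(axis=1, keepdims=True)

h = np.random.default_rng(0).normal(size=(n, 3))
history = [spread(h)]
for _ in range(500):                          # many layers of pure low-pass smoothing
    h = a_tilde @ h
    history.append(spread(h))
print([round(history[i], 6) for i in (0, 10, 100, 500)])
```

Every row converges to the same vector because repeated multiplication by a row-stochastic matrix of a connected graph with self-loops keeps only its dominant (constant) component; this is the loss of high-frequency detail described above.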

3.2. Semantic–Structural Dual-Channel Encoder

Traditional graph embedding methods typically rely on single-modal information (structural or semantic) for representation learning. However, the complexity and heterogeneity of real-world networks call for algorithms capable of multimodal feature fusion. To address this challenge, this paper proposes the semantic–structural dual-channel encoder (DC-SSE). Its core mechanism is a two-step process, a semantic encoder (Section 3.2.1) followed by a structural encoder (Section 3.2.2), that refines the node features obtained in the preceding stage to improve encoding performance.

3.2.1. Semantic Encoder

In the semantic encoder component, to address the potential noise and redundancy in high-dimensional sparse features, this module (Algorithm 2) employs the Uniform Manifold Approximation and Projection (UMAP) algorithm [47] for nonlinear dimensionality reduction and semantic reconstruction. UMAP preserves the topological structure of semantic similarities between nodes by constructing a probabilistic similarity mapping between the high-dimensional feature space and the low-dimensional embedding space.

Algorithm 2. UMAP-based semantic encoder.

Input: Input features $H$, target dimension $d$
Parameters: Neighborhood parameter $k$, minimum distance parameter $\delta_{min}$, learning rate $\eta$, number of iterations $T$
Output: Semantic encoded features $Y^{(T)}$

BEGIN

1: // Construct fuzzy topological structure
2: For $x_{i}$ in $H$:
3: Compute the k-nearest neighbor set, denoted as $N_{k}(x_{i})$
4: For $x_{j}$ in $N_{k}(x_{i})$:
5: Compute $d(x_{i}, x_{j}) = \mathrm{Distance}(x_{i}, x_{j})$
6: End For
7: Compute $\rho_{i} = \min_{j \in N_{k}(x_{i})} d(x_{i}, x_{j})$
8: Determine the local scale $\sigma_{i}$ via binary search, satisfying $\sum_{j \in N_{k}(x_{i})} \exp\left(-\frac{d(x_{i}, x_{j}) - \rho_{i}}{\sigma_{i}}\right) = \log_{2} k$
9: Construct probability distribution $p_{j|i} = \exp\left(-\frac{d(x_{i}, x_{j}) - \rho_{i}}{\sigma_{i}}\right)$
10: End For
11: Symmetrize probability matrix $P$: $p_{ij} = p_{j|i} + p_{i|j} - p_{j|i} p_{i|j}$
12: // Semantic optimization
13: Initialize the $d$-dimensional embedding matrix $Y^{(0)}$
14: Compute parameters $a, b$ from $\delta_{min}$
15: For $t = 0$ to $T - 1$:
16: Compute similarity $q_{ij}^{(t)} = \left(1 + a \|y_{i}^{(t)} - y_{j}^{(t)}\|_{2}^{2b}\right)^{-1}$
17: Define the loss function $L = \sum_{i,j} \left[p_{ij} \log\frac{p_{ij}}{q_{ij}^{(t)}} + (1 - p_{ij}) \log\frac{1 - p_{ij}}{1 - q_{ij}^{(t)}}\right]$
18: Randomly select $m$ negative sample pairs $(y_{i}, y_{k})$
19: Update embeddings using stochastic gradient descent: $y_{i}^{(t+1)} \leftarrow y_{i}^{(t)} - \eta \nabla_{y_{i}} L$
20: End For
21: Return $Y^{(T)}$

END
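Steps 1-11 of Algorithm 2 can be sketched with NumPy alone. This is a simplified, dense $O(n^{2})$ illustration written for this discussion, not the umap-learn implementation (which uses approximate nearest-neighbor search and additional numerical safeguards); the Euclidean distance, the binary-search tolerance, and the name `fuzzy_knn_graph` are assumptions.

```python
import numpy as np


def fuzzy_knn_graph(x, k=15, n_iter=64, tol=1e-5):
    """Dense sketch of Algorithm 2, steps 1-11: fuzzy k-NN membership matrix P."""
    n = x.shape[0]
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)  # Euclidean (assumption)
    np.fill_diagonal(dist, np.inf)                 # a point is not its own neighbour
    knn = np.argsort(dist, axis=1)[:, :k]          # N_k(x_i); requires k < n
    target = np.log2(k)
    p_cond = np.zeros((n, n))                      # p_{j|i}
    for i in range(n):
        d = dist[i, knn[i]]
        rho = d.min()                              # rho_i: distance to the nearest neighbour
        lo, hi, sigma = 0.0, np.inf, 1.0
        for _ in range(n_iter):                    # step 8: binary search for sigma_i
            s = np.exp(-(d - rho) / sigma).sum()
            if abs(s - target) < tol:
                break
            if s > target:                         # kernel too wide: shrink sigma
                hi = sigma
                sigma = 0.5 * (lo + hi)
            else:                                  # kernel too narrow: grow sigma
                lo = sigma
                sigma = 2.0 * sigma if np.isinf(hi) else 0.5 * (lo + hi)
        p_cond[i, knn[i]] = np.exp(-(d - rho) / sigma)  # step 9
    # Step 11: fuzzy union p_ij = p_{j|i} + p_{i|j} - p_{j|i} p_{i|j}
    return p_cond + p_cond.T - p_cond * p_cond.T


x = np.random.default_rng(0).normal(size=(20, 3))
P = fuzzy_knn_graph(x, k=5)
print(P.shape, P.max(axis=1)[:3])
```

Because $\rho_{i}$ is subtracted before the exponential, every node’s nearest neighbor receives membership 1, so each row of $P$ has maximum 1 and no node is left disconnected.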

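The method consists of two main steps. The first step constructs a k-nearest neighbor (k-NN) graph from the input node features: for each node, the k nearest nodes in the feature space are taken as its neighbors, and the connecting edges are weighted by membership strengths $p_{ij}$ derived from their distances (steps 1–11). The second step initializes a low-dimensional embedding and optimizes it by stochastic gradient descent with negative sampling, so that the low-dimensional similarities $q_{ij}$ match $p_{ij}$ under the cross-entropy loss $L$ (steps 12–21).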
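The objective of the second step can be written out directly. In this sketch, the curve parameters $a$ and $b$, which Algorithm 2 fits from $\delta_{min}$ in step 14, are passed as placeholders ($a = b = 1$); the clipping constant `eps` is an assumption that keeps the logarithms finite, self-pairs are excluded from the sum, and the function names are hypothetical. A full optimizer would follow the gradient of this loss with negative sampling; here the loss is only evaluated, to show that a layout placing strongly connected pairs close together scores lower.

```python
import numpy as np


def low_dim_similarity(y, a=1.0, b=1.0):
    """Step 16: q_ij = (1 + a * ||y_i - y_j||^(2b))^(-1) for all pairs."""
    sq_dist = ((y[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return 1.0 / (1.0 + a * sq_dist ** b)


def fuzzy_cross_entropy(p, q, eps=1e-12):
    """Step 17 loss, summed over all ordered pairs i != j."""
    p_c = np.clip(p, eps, 1.0 - eps)
    q_c = np.clip(q, eps, 1.0 - eps)
    attract = p * np.log(p_c / q_c)                        # penalises distant high-p pairs
    repel = (1.0 - p) * np.log((1.0 - p_c) / (1.0 - q_c))  # penalises close low-p pairs
    off_diag = ~np.eye(len(p), dtype=bool)
    return float((attract + repel)[off_diag].sum())


# Two fuzzy groups: {0, 1} and {2, 3}.
p = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
y_good = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
y_bad = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 0.1], [5.0, 5.1]])
loss_good = fuzzy_cross_entropy(p, low_dim_similarity(y_good))
loss_bad = fuzzy_cross_entropy(p, low_dim_similarity(y_bad))
print(loss_good, loss_bad)
```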

In our research, we found that UMAP can effectively capture the semantic similarity of the original feature space and provide good semantic support for the structural encoder.

3.2.2. Structural Encoder

To address the semantic encoder’s neglect of graph structural information, the proposed structural graph convolutional encoder (Algorithm 3) models the network’s topological structure in depth. It employs an iterative neighborhood aggregation mechanism that captures node structural features without introducing additional trainable parameters, relying on the graph smoothness assumption while keeping nodes sufficiently distinguishable to mitigate the over-smoothing phenomenon.

The core component is the structural aggregation method, with the update formula expressed as follows:

$$h_{i}^{t+1} = h_{i}^{t} + \sigma\left(\sum_{j \in N_{i}} \frac{\alpha P(i,j)\,(D(i,j) - \xi)}{|N_{i}|}\right) - \sigma\left(\sum_{k \in S_{i}} \frac{\beta P(i,k)}{(D(i,k) - \xi)^{2}\,|S_{i}|}\right), \tag{11}$$

where $\sigma$ is the sigmoid activation function, which prevents a single convolution from introducing excessive update information; $N_{i}$ and $S_{i}$ are the neighbor (positive sample) set and the negative sample set of node $i$; the adjustable parameters $\alpha$ and $\beta$ regulate the proportion of positive and negative sample information during the convolution process; $P(i,j)$ denotes the unit vector of $h_{j}^{t} - h_{i}^{t}$; $D(i,j)$ is the distance between nodes $i$ and $j$ in the embedding space; and $\xi$ is the critical distance threshold.
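Equation (11) for a single node can be sketched as follows. The text leaves some details open, so these are assumptions: $D$ is the Euclidean distance, the sigmoid is applied element-wise to the aggregated vectors, and a small `eps` guards the unit vector and the $(D - \xi)^{2}$ denominator against division by zero. The function name `structural_update` and the default parameter values are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def structural_update(h, i, neighbors, negatives, alpha=1.0, beta=1.0, xi=1.0, eps=1e-12):
    """One application of Eq. (11) to node i; returns h_i^{t+1}."""
    h = np.asarray(h, dtype=float)
    hi = h[i]

    def direction_and_distance(j):
        diff = h[j] - hi
        dist = float(np.linalg.norm(diff))   # D(i, j): Euclidean distance (assumption)
        return diff / max(dist, eps), dist    # P(i, j): unit vector from i towards j

    attract = np.zeros(hi.shape)
    for j in neighbors:                       # positive samples N_i
        p, d = direction_and_distance(j)
        attract += alpha * p * (d - xi) / len(neighbors)

    repel = np.zeros(hi.shape)
    for k in negatives:                       # negative samples S_i
        p, d = direction_and_distance(k)
        repel += beta * p / (((d - xi) ** 2 + eps) * len(negatives))

    return hi + sigmoid(attract) - sigmoid(repel)


# 1-D toy case: a neighbour at distance 3 pulls node 0 towards it.
print(structural_update(np.array([[0.0], [3.0]]), 0, neighbors=[1], negatives=[]))
```

Because of the $(D - \xi)$ factor, a neighbor farther than $\xi$ pulls node $i$ closer while one nearer than $\xi$ pushes it slightly away, and a negative sample close to the threshold produces a strong repulsion.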

Algorithm 3. Structural graph convolutional encoder.

Input: Network $G = (V, E)$, initial node features $H$
Parameters: Number of iterations $T$
Output: Output features $H^{(T)}$

Begin

1: Initialize node set $V$
2: For $t = 0$ to $T - 1$:
3: For $v$ in $V$:
4: Initialize the neighbor set $N_{v}$ of $v$
5: Initialize the negative sample set $S_{v}$
6: $R(v) = \mathrm{StructuralAGG}(H^{(t)}(v), H^{(t)}(N_{v}), H^{(t)}(S_{v}))$
7: End For
8: $H^{(t+1)} = R$ // Update $H$
9: End For
10: Return $H^{(T)}$

End

The key distinction of this method from traditional graph convolutional approaches lies in its adoption of a residual-like concept. Instead of simply aggregating the features of a node and its neighbors, it considers how a node is influenced by its neighbors. In our modeling, nodes connected by edges (i.e., neighbors) are expected to be closer in the embedding space, while non-neighbor nodes should be farther apart. Therefore, for each node, we treat its neighbors as positive samples and randomly sample other nodes as negative samples. Through iterative updates using the aforementioned formula, the embeddings are optimized to achieve the desired representation.
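The negative sample sets that feed Equation (11) might be built as below. The text only says that other nodes are sampled at random; excluding a node’s own neighbors, sampling uniformly without replacement, and the helper name `sample_negatives` are assumptions. Calling it with a fresh seed at every iteration mirrors step 5 of Algorithm 3, which re-initializes $S_{v}$ inside the loop.

```python
import random


def sample_negatives(adj_list, m, seed=None):
    """Draw up to m non-neighbour nodes per node as its negative sample set S_v."""
    rng = random.Random(seed)
    nodes = list(adj_list)
    negatives = {}
    for v in nodes:
        excluded = set(adj_list[v]) | {v}                # positives and the node itself
        candidates = [u for u in nodes if u not in excluded]
        negatives[v] = rng.sample(candidates, min(m, len(candidates)))
    return negatives


adj_list = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}       # path graph 0-1-2-3
print(sample_negatives(adj_list, m=2, seed=7))
```

The list comprehension costs $O(n)$ per node, which is fine for illustration; large graphs would typically draw random node IDs and reject neighbors instead.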

4. Experiments

This section presents the experiments designed to validate the efficiency and effectiveness of the proposed unsupervised multimodal community detection algorithm (UMM) based on fractal iteration, including the experimental setup, detailed experimental results, and corresponding analyses.

4.1. Experimental Setup

4.1.1. Experimental Datasets

This experiment utilizes three types of datasets for validation: classic small-scale networks, large-scale real-world networks, and citation networks. Detailed information is provided below.

1. Classic Small-Scale Networks

(1) Karate Club Network: A benchmark dataset in community detection [48], this network captures social interactions among 34 members of a university karate club in the United States, comprising 34 nodes and 78 edges. The network splits into two communities due to internal conflicts.

(2) Dolphins Social Network: Constructed from the interaction behaviors of 62 dolphins in Doubtful Sound, New Zealand [49], this network includes 159 edges and naturally forms two distinct communities. It is commonly used to evaluate the robustness of community detection algorithms.

(3) American College Football Network: This network represents games between American college football teams [50], with 115 nodes (teams) and 616 edges (games). It is explicitly divided into 12 communities (conferences), exhibiting a clear hierarchical structure.

(4) Polbooks Political Books Network: Built from the co-purchase relationships of political books on Amazon [51], this network contains 105 nodes and 441 edges. It is partitioned into three communities based on the political leanings of the books (liberal/neutral/conservative), reflecting real-world social network characteristics.

2. Large-Scale Real-World Networks

(1) DBLP Collaboration Network: A scholarly collaboration network in the field of computer science, consisting of 317,080 nodes (authors) and 1,049,866 edges (co-authorships). Communities represent research teams or academic domains [52].

(2) Amazon Product Network: Constructed from co-purchase relationships of products on Amazon, comprising 334,863 product nodes and 925,872 edges. Communities correspond to hierarchical product categories, including 75,149 fine-grained classes [52].

(3) YouTube Social Network: A user social relationship network from the video platform, containing 1,134,890 nodes and 2,987,624 edges (friendships). Communities reflect user interest groups, with 8385 active communities [52].

3. Citation Network Datasets

(1) Cora: Comprising 2708 machine learning papers as nodes and 5429 citation edges, this network is divided into seven research domains. Node features are represented by bag-of-words models of paper abstracts [53].

(2) CiteSeer: Comprising 3312 computer science papers connected by 4732 citation edges and partitioned into six disciplinary categories. Node features are frequency vectors of paper keywords [53].

(3) PubMed: A biomedical literature citation network with 19,717 paper nodes and 44,338 citation edges. Communities correspond to disease types studied in the papers, with node features represented by TF-IDF-weighted abstract texts [53].

4.1.2. Evaluation Metrics

In the performance evaluation of community detection algorithms, selecting appropriate evaluation metrics is critical for quantifying algorithmic effectiveness. This study adopts Normalized Mutual Information (NMI) [54] as the primary evaluation metric to measure the consistency between the community partitions detected by the algorithm and the ground-truth community structure.

NMI is a dimensionless metric rooted in information theory. It evaluates the similarity between two clustering results by computing their Mutual Information and normalizing it. The core principle is that if the community structure detected by the algorithm closely aligns with the true partition, the shared information (Mutual Information) between them is higher, resulting in an NMI value approaching 1. Conversely, if the partitions are entirely unrelated, the NMI value approaches 0. The specific calculation formula is as follows:

$$\mathrm{NMI}(A, B) = \frac{2 I(A, B)}{H(A) + H(B)}, \tag{12}$$

where $H(\cdot)$ is the information entropy. For a discrete variable, the information entropy is

$$H(X) = -\sum_{x \in X} P(x) \log P(x). \tag{13}$$

$I(\cdot, \cdot)$ is the mutual information, a measure of the mutual dependence between variables. For discrete variables, it is calculated as

$$I(X, Y) = \sum_{x \in X, y \in Y} P(x, y) \log \frac{P(x, y)}{P(x) P(y)}. \tag{14}$$

In Equation (12), $I(A, B)$ is the mutual information between the true partition $A$ and the detection result $B$, while $H(A)$ and $H(B)$ are their respective entropies.
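Equations (12)–(14) translate directly into code. The sketch below uses natural logarithms (the base cancels in the ratio) and, as an assumption, returns 1 when both partitions consist of a single community, where Equation (12) would be $0/0$. It is intended for illustration; scikit-learn’s `normalized_mutual_info_score` with its default `average_method="arithmetic"` computes the same quantity and is the safer choice in practice.

```python
import math
from collections import Counter


def entropy(labels):
    """Eq. (13): H(X) = -sum_x P(x) log P(x)."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Eq. (14): I(X, Y) = sum_{x,y} P(x,y) log(P(x,y) / (P(x) P(y)))."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    total = 0.0
    for (x, y), c in Counter(zip(a, b)).items():
        p_xy = c / n
        total += p_xy * math.log(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return total


def nmi(true_labels, pred_labels):
    """Eq. (12): 2 I(A, B) / (H(A) + H(B))."""
    h_sum = entropy(true_labels) + entropy(pred_labels)
    if h_sum == 0.0:                      # both partitions trivial: treat as identical
        return 1.0
    return 2.0 * mutual_information(true_labels, pred_labels) / h_sum


print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition with different community IDs
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # statistically independent partitions
```

NMI is invariant to relabeling, which is why it suits community detection, where community IDs carry no meaning.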

Accuracy (ACC) is one of the most intuitive metrics for evaluating classification tasks. In node classification, it measures the extent to which the algorithm’s predicted labels match the true labels and is defined as the proportion of correctly classified nodes relative to the total number of nodes:

$$\mathrm{Accuracy} = \frac{\sum_{i=1}^{C} TP_{i}}{N}, \tag{15}$$

where $C$ is the number of classes, $TP_{i}$ is the number of correctly classified samples in the $i$-th class, and $N$ is the total number of samples.
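Because community IDs produced by an unsupervised method are arbitrary, Equation (15) is usually evaluated after matching each predicted community to a ground-truth class. The sketch below does this by brute force over label permutations, which is practical only for a handful of classes; for larger $C$, the Hungarian algorithm (e.g., SciPy’s `linear_sum_assignment`) is the usual choice. The matching step and the function name are assumptions, since the text does not state how predictions are aligned.

```python
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """Eq. (15) after the best one-to-one mapping of predicted communities to classes."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    if len(clusters) > len(classes):
        raise ValueError("sketch assumes no more predicted communities than classes")
    best = 0
    for assignment in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assignment))            # predicted ID -> class
        correct = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, correct)                            # sum of TP_i under this mapping
    return best / len(y_true)


print(clustering_accuracy([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1]))
```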

4.1.3. Hyperparameter Configuration

To optimize experimental performance, we tuned the hyperparameters for each dataset. This section details the primary hyperparameter settings employed for each dataset. Key parameters of the PFGC diffusion process include the iteration depth $k$, initial embedding dimension $d$, attraction coefficient $\alpha$, and repulsion coefficient $\beta$, which are used to construct the low-dimensional embedding space. For UMAP dimensionality reduction, the relevant parameters are the clustering dimension $n$, the number of neighbors, and the minimum distance; the number of neighbors and the minimum distance were set to their default values of 15 and 0.1, respectively. Additional parameters include the learning rate, the number of optimization iterations $i$, and the number of clusters $c$ for K-means clustering. The learning rate was uniformly set to 0.1, and $c$ was set to the true number of classes in each dataset.

The specific parameter settings are listed in Table 1.

4.2. Baseline Algorithms

For the selection of baseline algorithms, this study includes a comprehensive comparison with both classic and state-of-the-art community detection methods, broadly categorized into deep learning-based and non-deep learning-based approaches.

1. Unsupervised Methods

(1) Spectral Clustering [37]: A classic method based on spectral graph theory that performs eigendecomposition of the graph Laplacian matrix $L = D - A$ (or a normalized variant of it), selects the eigenvectors corresponding to the $k$ smallest eigenvalues to construct a low-dimensional embedding space, and then applies K-means to partition the nodes. Its strengths are its theoretical rigor and sensitivity to convex communities, but the $O(n^{3})$ time complexity of eigendecomposition limits its scalability to large-scale networks.

(2) Label Propagation (LP) [26]: This method iteratively updates node labels until convergence, where a node’s label at step $t$ is determined by the majority label of its neighbors at step $t - 1$, satisfying $C_{i}^{t} = \arg\max_{j} \sum_{v_{k} \in N(i)} \delta(C_{k}^{t-1}, j)$. LP is computationally efficient with linear time complexity but is sensitive to initial conditions and prone to forming oversized communities.

(3) Louvain [1]: Based on modularity optimization, this method employs a hierarchical local aggregation strategy to partition the network into communities. Initially, each node is treated as an independent community, and nodes are iteratively merged to maximize modularity, yielding high-quality partitions. While Louvain excels in computational efficiency and partition quality, it suffers from resolution limits, making it less effective at detecting smaller community structures.

(4) LP-W [26]: An improved version of the label propagation algorithm that incorporates edge weights, updating labels through weighted accumulation of neighbor labels. It maintains linear time complexity and converges quickly but remains sensitive to initial labels and may form oversized communities in certain cases.

(5) BIGCLAM [55]: A community affiliation model that represents edge existence probabilities as nonlinear combinations of node affiliation strengths, maximizing the likelihood of network generation. This method efficiently detects overlapping communities and is suitable for large-scale networks but may have limitations in distinguishing non-overlapping communities.

2. Supervised Methods

(1) DACDPR [56]: A deep learning-based community detection method that enhances efficiency and performance by partitioning the network and reducing trainable parameters.

(2) DNR_CE [57]: A community embedding method based on deep nonlinear reconstruction, utilizing stacked autoencoders to learn low-dimensional node representations. Its reconstruction loss function preserves network structural properties, while KL divergence optimizes community distributions, making it suitable for sparse networks.

(3) ComNet-R [58]: A community detection method based on deep convolutional learning. It introduces an Edge-to-Image (E2I) conversion model that encodes network edge adjacency relationships into a two-dimensional image structure. By constructing a Community Classification Network (ComNet), it achieves convolutional neural network-based edge-type discrimination (i.e., intra-community vs. inter-community edges). The method combines Breadth-First Search (BFS) to generate local community views and then merges communities with a local modularity optimization strategy, denoted by R. Its community partitioning criterion $L_{merge} = \arg\max \sum R(C_{i}, C_{j})$ significantly enhances the modularity index. Furthermore, ComNet-R adapts well to ambiguous community boundaries and topological sparsity in extremely large-scale real-world networks.

(4) MFF [59]: A multi-feature fusion-based community detection network that proposes a bidirectional edge feature modeling framework. Local features are generated by measuring the similarity of node neighbor attributes (e.g., using the Jaccard coefficient), whereas non-local features capture long-range structural relationships through a bidirectional random walk strategy. The method designs a cross-scale feature fusion module that employs an attention mechanism to align the local and non-local feature spaces, constructing a fused feature vector $h_{e} = \alpha h_{\mathrm{local}} + (1 - \alpha) h_{\mathrm{non\text{-}local}}$. Furthermore, MFF proposes a neighbor-constrained community merging algorithm which, based on the local modularity $R = \frac{l_{c}}{m} - \left(\frac{d_{c}}{2m}\right)^{2}$, only merges topologically adjacent communities. This approach maintains community quality while reducing the time complexity to $O(n / \log n)$. MFF significantly improves edge classification accuracy in sparse and heterogeneous networks, making it well suited for multi-scale community detection in social and biological networks.

(5) VGAER [60]: A novel unsupervised community detection algorithm based on variational graph autoencoder reconstruction, which excels in community detection tasks by integrating high-order modularity information with network features.

(6) LSCD [61]: A multi-objective community detection algorithm that searches for non-dominated solutions in complex networks through iterative local search. It employs a learning-based strategy to dynamically select nodes, optimizing search quality and improving community partition accuracy and efficiency by leveraging network topology and node relationships. Experiments demonstrate LSCD’s superior performance on both synthetic and real-world networks, particularly its fast and accurate community detection capabilities in large-scale networks.

(7) CDMG [62]: An unsupervised community detection method that approaches community discovery from the perspective of optimizing Markov stability using graph neural networks (GNNs). Specifically, this algorithm employs Markov stability as a loss function to evaluate the quality of community partitions.
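The local modularity $R = \frac{l_{c}}{m} - \left(\frac{d_{c}}{2m}\right)^{2}$ quoted for MFF, which is also the quantity Louvain optimizes when summed over communities, can be computed as below. We read $l_{c}$ as the number of edges inside community $c$, $d_{c}$ as the total degree of its nodes, and $m$ as the number of edges of an undirected, unweighted graph; these symbol meanings are our assumption because the text does not define them. Summed over all communities of a partition, $R$ gives Newman’s modularity $Q$.

```python
def community_modularity(edges, community):
    """R = l_c / m - (d_c / (2m))^2 for one community of an undirected, unweighted graph."""
    members = set(community)
    m = len(edges)
    l_c = sum(1 for u, v in edges if u in members and v in members)  # internal edges
    d_c = sum((u in members) + (v in members) for u, v in edges)      # total degree of members
    return l_c / m - (d_c / (2 * m)) ** 2


def modularity(edges, partition):
    """Newman modularity Q as the sum of R over all communities."""
    return sum(community_modularity(edges, c) for c in partition)


# Two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(community_modularity(edges, {0, 1, 2}))
print(modularity(edges, [{0, 1, 2}, {3, 4, 5}]), modularity(edges, [set(range(6))]))
```

Splitting at the bridge gives $Q = 5/14 \approx 0.357$, whereas putting every node into one community gives $Q = 0$, which is why merging strategies such as Louvain stop before joining the two triangles.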

4.3. Experimental Results on Small-Scale Real-World Datasets

4.3.1. NMI Results on Small-Scale Real-World Datasets

We first evaluate the NMI performance of the proposed method compared to other classic unsupervised and supervised methods on small-scale datasets, including Karate, Dolphins, Football, and Polbooks. Table 2 presents the NMI results of various algorithms for community detection tasks using these small-scale datasets.

The Karate dataset, a highly representative benchmark in community detection, offers clear clustering boundaries, providing an ideal validation scenario for supervised learning algorithms. Supervised methods such as DACDPR, DNR_CE, ComNet-R, and MFF-NET achieve 100% accuracy, and the proposed method similarly accomplishes perfect partitioning, validating the effectiveness of supervised paradigms in scenarios with well-defined topological structures.

For the Dolphins network, algorithm performance converges: spectral clustering, DNR_CE, ComNet-R, and the proposed method all achieve an NMI of 0.889, followed closely by DACDPR at 0.878. Closer analysis reveals a common misclassification of node 39, a boundary node with only two neighbors (degree = 2), each belonging to a different community, which forms a typical region of topological ambiguity.

In the Football network, which exhibits a well-defined hierarchical structure of league competitions, algorithm performance is highly concentrated. The proposed method leads with an NMI of 0.927, followed closely by spectral clustering and MFF-NET (0.924), and supervised methods like DACDPR (0.914), while Label Propagation trails at 0.870. This indicates that topology-based feature learning effectively captures hierarchical relationships.

The Polbooks dataset reveals significant performance divergence among algorithms, with the highest NMI (MFF-NET: 0.632) and the lowest (spectral clustering: 0.574) differing by 0.058. The proposed method, with an NMI of 0.614, outperforms traditional supervised methods (e.g., DNR_CE: 0.582) but falls short of MFF-NET. This suggests that in the Polbooks network, characterized by ambiguous opinion propagation, simple feature combinations struggle to represent node semantics effectively. Nevertheless, the proposed unsupervised method maintains baseline recognition capability in this scenario, offering a useful reference for real-world applications with noisy labels.

4.3.2. Visualization of Community Detection Results on Small-Scale Real-World Datasets

Figure 2 illustrates the predicted community detection results and the ground-truth community structure, respectively, of the proposed method when applied to the classic Karate social network dataset. The true community structure of this dataset consists of two communities. As shown in the figures, our method successfully partitions the Karate network into exactly two communities, perfectly aligning with the true number of communities. By comparing the predicted results in Figure 2a with the ground-truth community structure in Figure 2b, it is evident that the predicted node assignments and overall community structure closely match the true structure. The figures clearly depict two tightly knit communities connected by a few edges, with nodes accurately assigned to their respective true communities. This demonstrates that the proposed community detection method effectively identifies the core communities and their boundaries in the Karate network, fully validating its effectiveness and accuracy in detecting community structures in small-scale real-world networks.

Figure 3 illustrates the predicted community detection results and the ground-truth community structure, respectively, of the proposed method on the Dolphins dataset. As shown in the figures, our method successfully partitions the Dolphins social network into two communities, consistent with the true number of communities in the dataset. Comparing the predicted community structure with the ground truth reveals a high degree of consistency in both community structure and node assignments. Nodes are clearly divided into two primary communities, with the vast majority accurately assigned to their corresponding true communities. The only discrepancy occurs with a “bridge” node between clusters, which is misclassified in the predicted partition. This misclassification arises because the node is connected to each community by only a single edge, representing a typical region of topological ambiguity. This high visual alignment demonstrates that the proposed method effectively identifies and recovers the true community structure in complex networks, exhibiting strong community detection performance and high partition quality. The characteristics of dense intra-community connections and sparse inter-community connections are accurately reflected in the predicted results, validating the method’s effectiveness in capturing network community features.

Figure 4 compares the predicted community partitions of the proposed community detection method with the ground-truth community structure on the Football dataset. Unlike the Dolphins and Karate datasets, the Football dataset exhibits a more complex multi-community structure. As shown in the figures, our method successfully identifies and partitions multiple communities, closely matching the true number of communities. Comparing the predicted partitions in Figure 4a with the ground-truth structure in Figure 4b, we observe a high degree of consistency in the spatial distribution, morphology, and assignment of key nodes across communities. Communities, represented by distinct colors, are clearly discernible in the predicted partition, with their boundaries closely corresponding to those of the ground-truth communities. These results demonstrate that the proposed method is not only effective for detecting community structures in simple networks but also achieves high accuracy on real-world networks with complex multi-community structures.

While our method demonstrates high accuracy on networks such as Dolphins and Karate, the challenge of community detection increases in networks with more complex community overlap characteristics, such as the Polbooks dataset. Figure 5 compares the predicted community partitions of our method with the ground-truth community structure on the Polbooks dataset. The three communities in the Polbooks network represent political orientations: liberal (orange), neutral (green), and conservative (purple).

From the visualization, it can be seen that our method successfully identifies the three primary communities, with most nodes correctly assigned. However, a closer examination of the green community, representing the neutral faction, reveals some discrepancies at the community boundaries, where certain nodes deviate from their true labels. These differences primarily manifest as neutral nodes being predicted as liberal or conservative, and vice versa, with liberal or conservative nodes being predicted as neutral. This reflects the inherent ambiguity in the community affiliation of neutral nodes, which maintain connections with both liberal and conservative factions, thereby increasing the difficulty of accurate partitioning. Despite these minor discrepancies, our method effectively captures the core liberal–neutral–conservative triadic structure of the Polbooks network, demonstrating its efficacy in handling real-world networks with community overlap and boundary ambiguity. Simultaneously, it highlights areas for further improvement in terms of precisely identifying boundary nodes.

4.4. Experimental Results on Large-Scale Real-World Networks

Experiments were conducted on the large-scale DBLP, Amazon, and YouTube networks. Table 3 presents the NMI values obtained when using the proposed algorithm and other baseline algorithms on these datasets (all results were derived from the complete datasets).

Our method consistently performs well in terms of NMI across all three networks, with particularly clear advantages over other methods on the Amazon and YouTube datasets.

Firstly, our method is an unsupervised community detection approach, which, unlike supervised learning algorithms, does not rely on additional label information, thereby offering stronger generalization and broader applicability. On the DBLP dataset, our method achieves an NMI score of 60.4, significantly outperforming all other methods, including MFF. Compared to other approaches, our method more effectively captures community information embedded in the network structure. Notably, without requiring any labeled data, our unsupervised method approaches or even surpasses the performance of some supervised methods, demonstrating its robust representation capability.

On the Amazon dataset, our method also performs strongly, achieving an NMI score of 47.9, slightly higher than MFF (47.2) and ahead of other methods such as ComNet-R (46.8) and Louvain-W (43.0). The Amazon dataset is typically sparse with relatively complex community structures, posing significant challenges for graph-based community detection methods. Nevertheless, our method extracts from the network’s topological information representations that align better with community partitions, resulting in superior community detection outcomes. This further indicates that, even without supervised signals, our method can match or exceed the performance of certain supervised methods on large-scale networks, highlighting its potential for real-world applications.

On the YouTube dataset, our method achieves an NMI score of 32.7. This is a clear improvement over ComNet-R (22.4), CDMG (16.5), Louvain-W (5.1), GraphGAN (4.9), and LP-W (3.2), although it remains below the supervised MFF (37.3). The community structures in the YouTube network are often ambiguous, and many nodes exhibit unclear connection patterns. This makes methods that rely on local topological information susceptible to noise. Our unsupervised method still comes second only to a supervised model in this challenging setting. This suggests that it captures global community structure well and generalizes to complex, heterogeneous networks such as video social networks.

4.5. Effectiveness Analysis of the Parameter-Free GCN Encoder

To validate the effectiveness of the parameter-free Graph Convolutional Network (GCN) encoder, this section analyzes it from both qualitative and quantitative perspectives. Visualization analysis is combined with downstream task validation to evaluate the encoder's ability to capture graph structural features.
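To make the idea of a training-free graph convolution concrete, the sketch below applies $k$ rounds of symmetric-normalized neighborhood averaging, $X \leftarrow \hat{A}X$ with $\hat{A} = D^{-1/2}(A+I)D^{-1/2}$, in the style of SGC. It is an illustrative stand-in, not the PFGC update itself: the fractal-iteration rule and the $\alpha$/$\beta$ fusion used by the actual encoder are not reproduced here. The function names are hypothetical.

```python
import math


def normalized_adjacency(edges, n):
    """Rows of D^{-1/2} (A + I) D^{-1/2} for an undirected graph, as dicts {j: weight}."""
    rows = [{i: 1.0} for i in range(n)]  # add self-loops
    for u, v in edges:
        if u != v:
            rows[u][v] = 1.0
            rows[v][u] = 1.0
    deg = [sum(row.values()) for row in rows]  # degrees including the self-loop
    return [{j: w / math.sqrt(deg[i] * deg[j]) for j, w in row.items()}
            for i, row in enumerate(rows)]


def propagate(edges, features, k):
    """Apply k weight-free propagation steps X <- A_hat X; nothing is trained."""
    n, dim = len(features), len(features[0])
    a_hat = normalized_adjacency(edges, n)
    x = [list(row) for row in features]
    for _ in range(k):
        x = [[sum(w * x[j][d] for j, w in a_hat[i].items()) for d in range(dim)]
             for i in range(n)]
    return x


# Toy graph: two disconnected edges, one-hot input features.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
embeddings = propagate([(0, 1), (2, 3)], identity, k=2)
```

In this toy graph, the two endpoints of each edge end up with identical embeddings, and no signal crosses between the two components. Neighborhood averaging of this kind is what lets a weight-free encoder expose community structure.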

4.5.1. Visualization Analysis of Encoder Results


UMAP dimensionality reduction is first applied to project the high-dimensional node representations into a two-dimensional space, as shown in Figure 6. This makes the distribution density and boundary clarity of nodes from different communities directly observable. The visual analysis is used to assess the encoder's ability to separate heterogeneous community structures.

Experiments were conducted on benchmark datasets with prominent community structures. They compare the projections obtained with traditional random walk embedding methods and with parameterized GCN models. The focus is on whether the encoder, under unsupervised conditions, can preserve the topological features of community structures without relying on label information. The baseline GCN model requires input features and a small number of labeled nodes for training. For datasets without node features or predefined splits, one-hot vectors are used as input features, and the training set is drawn at random with a label rate of 0.2. To ensure stable results, experiments are repeated over 10 random splits and the average is reported.
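The split protocol for the GCN baseline can be sketched as follows. This assumes that a 0.2 label rate means 20% of all nodes (rounded) are drawn uniformly at random for training and the rest are used for evaluation. `evaluate` is a hypothetical callback that stands in for training and testing the GCN on one split. Using seeds 0 to 9 for the 10 splits is also an assumption.

```python
import random
import statistics


def random_split(n_nodes, label_rate, seed):
    """Draw round(label_rate * n_nodes) training nodes uniformly; the rest are test nodes."""
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)
    n_train = round(label_rate * n_nodes)
    return sorted(order[:n_train]), sorted(order[n_train:])


def one_hot_features(n_nodes):
    """Identity-matrix input features for graphs without node attributes."""
    return [[1.0 if i == j else 0.0 for j in range(n_nodes)] for i in range(n_nodes)]


def average_over_splits(evaluate, n_nodes, label_rate=0.2, n_splits=10):
    """Mean and sample std of evaluate(train_idx, test_idx) over seeded random splits."""
    scores = [evaluate(*random_split(n_nodes, label_rate, seed))
              for seed in range(n_splits)]
    return statistics.mean(scores), statistics.stdev(scores)
```

For the 34-node Karate network, for example, `random_split(34, 0.2, seed)` selects 7 training nodes.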

4.5.2. Quantitative Analysis of Encoder Effectiveness

To compare the encoding capability of the parameter-free GCN (PFGC) with that of the traditional GCN on a downstream node classification task, this study applies a two-layer nonlinear MLP classifier with a hidden dimension of 256 to the node representations produced by each encoder. Experiments were conducted on the citation datasets CiteSeer, Cora, and PubMed. Table 4 presents the average accuracy and standard deviation of each method.

The experimental results show that PFGC achieves a classification accuracy of 69.02% on the CiteSeer dataset. This is 1.12 percentage points higher than the 67.9% of the traditional GCN, even though PFGC requires no training, which indicates that PFGC extracts informative node features on this dataset. On the Cora dataset, however, PFGC's accuracy of 78.23% is lower than GCN's 80.1%, suggesting that the trained GCN retains an advantage there. On the PubMed dataset, PFGC achieves 78.66%, nearly equal to GCN's 78.9%, indicating comparable performance of the two encoders.
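The differences quoted above are absolute gaps in percentage points, not relative percentages. The small sketch below recomputes them from the mean accuracies in Table 4. It also shows one way to produce a "mean ± std" entry from repeated runs. Table 4 does not state whether it uses the sample or the population standard deviation, so the sample version used here is an assumption.

```python
import statistics

# Mean node-classification accuracies (%) taken from Table 4.
TABLE4 = {
    "CiteSeer": {"GCN": 67.9, "PFGC": 69.0},
    "Cora": {"GCN": 80.1, "PFGC": 78.2},
    "PubMed": {"GCN": 78.9, "PFGC": 78.7},
}


def pp_gap(row):
    """PFGC minus GCN in percentage points, rounded to one decimal place."""
    return round(row["PFGC"] - row["GCN"], 1)


def relative_gain(row):
    """The same gap expressed relative to the GCN accuracy, in percent."""
    return 100.0 * (row["PFGC"] - row["GCN"]) / row["GCN"]


def mean_pm_std(runs):
    """Format repeated-run accuracies as 'mean ± std' (sample std, an assumption)."""
    return f"{statistics.mean(runs):.1f} ± {statistics.stdev(runs):.1f}"


gaps = {name: pp_gap(row) for name, row in TABLE4.items()}
```

The CiteSeer gap of about 1.1 percentage points corresponds to a relative gain of only about 1.6% over GCN.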

Overall, PFGC's better performance on the CiteSeer dataset indicates that its encoding approach can improve node representation quality under certain data distributions. However, its accuracy is lower than that of the traditional GCN on Cora and comparable on PubMed. These results confirm that PFGC is feasible as a training-free encoder, but they also show that the structural characteristics of a dataset may influence its encoding effectiveness. Future research could therefore explore PFGC's adaptability across diverse types of graph data. It could also combine PFGC with more complex classifiers or tasks to evaluate its potential advantages and applicable scenarios more comprehensively.

4.6. Ablation Study

4.6.1. Module-Level Ablation Study

To validate the effectiveness of each module in the proposed method, we conducted module-level ablation experiments. Several model variants were constructed to systematically analyze the impact of each enhancement stage on the final clustering performance. Under the proposed multi-stage feature enhancement strategy, node representations are progressively aggregated and enhanced within a hierarchical structure. This section evaluates the contribution of each enhancement module to feature learning. Specifically, three comparative models were designed: (1) RAW, which applies K-means clustering directly to the original node features or SVD-derived features; (2) PFGC, which encodes the raw features with the proposed PFGC before K-means clustering; and (3) PFGC + DC-SSE, which applies both PFGC and DC-SSE for two-stage feature enhancement followed by K-means clustering. The experimental results are presented in Table 5.
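All three variants share the same final step: the node embeddings are partitioned into the known number of communities with K-means. For illustration, a plain Lloyd's K-means in pure Python is sketched below. The random initialization from the data points and the fixed iteration cap are assumptions. In practice, a library implementation such as scikit-learn's `KMeans` would typically be used; it also runs several restarts via `n_init`.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-means with initial centers sampled from the data; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Two well-separated groups of 2-D "embeddings".
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = kmeans(points, 2)
```

In the ablation, only the input to `kmeans` would change: raw or SVD features for RAW, the PFGC output for PFGC, and the fused embedding for PFGC + DC-SSE.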

The results in Table 5 show that the RAW model fails to achieve satisfactory clustering performance on most datasets. The exception is Football, where it attains a relatively high NMI of 92.4%. This suggests that the regular structure of the Football network allows raw features alone to support reasonable clustering. In contrast, PFGC clearly outperforms RAW on every dataset except Football, where the gain is marginal (92.4% to 92.7%). It reaches 100% on Karate and raises the NMI on Dolphin, PolBooks, and DBLP to 81.4%, 52.9%, and 51.6%, respectively. These results demonstrate that a single enhancement stage effectively extracts and preserves critical structural information from the graph.

PFGC + DC-SSE further improves clustering performance on most datasets. NMI on Dolphin increases from 81.4% to 88.9%, and on PolBooks and DBLP it rises from 52.9% and 51.6% to 61.4% and 60.4%, respectively. This indicates that the secondary enhancement provides deeper structural characterization and information extraction, resulting in more discriminative community partitions. On Karate and Football, however, PFGC and PFGC + DC-SSE achieve identical results (100% and 92.7%, respectively). This suggests that a single enhancement is already sufficient to capture the community structure of these networks. Collectively, these results confirm the positive contribution of each module to feature representation. PFGC's initial enhancement substantially improves clustering performance, and adding DC-SSE for secondary enhancement yields a higher NMI on most datasets, confirming the method's effectiveness in improving community detection quality.

4.6.2. Parameter Sensitivity Analysis

To further evaluate how key hyperparameter settings affect the performance of the proposed method, we conducted a sensitivity analysis on three critical hyperparameters: $\alpha$, $\beta$, and the iteration depth $k$. In the model design, $\alpha$ and $\beta$ regulate the fusion ratio of positive and negative sample information during the graph convolution process. Meanwhile, $k$ controls the number of propagation steps for features across the graph, which significantly influences the quality of the final features and the clustering performance.

In the experimental setup, the following strategies were employed for the analysis:

(1) Iteration depth $k$ analysis: With $\alpha$ and $\beta$ fixed, we incrementally increased $k$ from a small value on four datasets (Karate, Dolphin, Football, PolBooks). The NMI score was recorded for each setting, and performance line plots were generated to observe how embedding propagation depth affects representation capability.

(2) Joint analysis of $\alpha$ and $\beta$: With the iteration depth $k$ fixed at the optimal value identified in the first experiment, we varied $\alpha$ and $\beta$ jointly. A three-dimensional surface plot of NMI was generated to assess the combined impact of the positive and negative sample fusion ratios on the model's clustering performance.
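The two-stage protocol above can be written as a small search harness. `score(k, alpha, beta)` is a hypothetical callback that would run the full pipeline and return NMI. The stage-one defaults $\alpha = 0.1$, $\beta = 0.01$ and the stage-two grid $[0.1, 1.0]$ with step 0.1 follow the values reported for Figures 7 and 8. Grid values are built from integers, so that 0.3 is produced as `3 / 10` rather than by repeated floating-point addition.

```python
def alpha_beta_grid():
    """0.1, 0.2, ..., 1.0, built from integers to avoid accumulated float error."""
    return [i / 10 for i in range(1, 11)]


def two_stage_search(score, k_values, alpha0=0.1, beta0=0.01):
    """Stage 1: sweep k with alpha and beta fixed.
    Stage 2: grid-search alpha and beta at the best k.

    Returns the best (k, alpha, beta) together with the k curve (for a line
    plot) and the (alpha, beta) -> NMI surface (for a 3-D surface plot).
    """
    k_curve = {k: score(k, alpha0, beta0) for k in k_values}
    best_k = max(k_curve, key=k_curve.get)  # first maximum wins on ties
    grid = alpha_beta_grid()
    surface = {(a, b): score(best_k, a, b) for a in grid for b in grid}
    best_alpha, best_beta = max(surface, key=surface.get)
    return best_k, best_alpha, best_beta, k_curve, surface


# Hypothetical smooth score peaking at k = 4, alpha = 0.3, beta = 0.7.
def toy_score(k, alpha, beta):
    return -(k - 4) ** 2 - (alpha - 0.3) ** 2 - (beta - 0.7) ** 2


best_k, best_alpha, best_beta, k_curve, surface = two_stage_search(toy_score, range(1, 11))
```

Note that the second stage only explores $(\alpha, \beta)$ at the single $k$ chosen in the first stage, so interactions between $k$ and the fusion ratios are not searched.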

Figure 7 illustrates how the number of diffusion iteration steps $k$ affects clustering performance when $\alpha = 0.1$ and $\beta = 0.01$. The overall trend indicates that an appropriate iteration depth improves representation capability and clustering performance. Excessively deep iterations, however, may lead to over-smoothing and hence to performance degradation.

Specifically, the Karate dataset maintained high clustering performance (NMI reaching 100%) for $k \leq 3$. A noticeable decline was observed around $k = 6$, suggesting that overly deep propagation may cause node representations to converge excessively and lose structural distinguishability. Performance subsequently rebounded around $k = 8$, exhibiting non-monotonic behavior. The Dolphin dataset performed worst at $k = 1$ (approximately 40%) but improved rapidly and stabilized after $k = 2$. This indicates that shallow propagation is sufficient for effective feature enhancement on this dataset. The Football dataset was stable overall, maintaining NMI scores between 85% and 93% for $1 \leq k \leq 10$, which reflects its regular structure and strong robustness. The PolBooks dataset showed larger performance fluctuations but stabilized for $k \geq 20$, suggesting that more propagation rounds help integrate its complex structural information.

In summary, the optimal propagation depth varies across datasets. The diffusion step $k$ should therefore be tailored to the specific graph structure to balance feature extraction capability and representation distinguishability.
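The over-smoothing effect described above can be illustrated with a toy experiment. The sketch below applies mean aggregation over each node and its neighbors, $X \leftarrow D^{-1}(A+I)X$, to one-hot features on a four-node path graph. After each setting of $k$, it measures the largest squared distance between any two node embeddings. This propagation rule is chosen only for illustration and is not the PFGC update. On a connected graph, repeated averaging drives all embeddings toward a common vector as $k$ grows, which mirrors the loss of structural distinguishability discussed for Karate.

```python
def mean_aggregation_step(adj, x):
    """One step of X <- D^{-1}(A + I) X: each node averages itself and its neighbors."""
    dim = len(x[0])
    out = []
    for i, nbrs in enumerate(adj):
        group = [i] + nbrs
        out.append([sum(x[j][d] for j in group) / len(group) for d in range(dim)])
    return out


def spread(x):
    """Largest squared Euclidean distance between any two node embeddings."""
    return max(sum((a - b) ** 2 for a, b in zip(u, v)) for u in x for v in x)


def spread_after(adj, x, k):
    """Spread of the embeddings after k aggregation steps."""
    for _ in range(k):
        x = mean_aggregation_step(adj, x)
    return spread(x)


# Path graph 0 - 1 - 2 - 3 as adjacency lists, with one-hot features.
path_adj = [[1], [0, 2], [1, 3], [2]]
one_hot = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
spreads = {k: spread_after(path_adj, one_hot, k) for k in (0, 1, 5, 50)}
```

The spread shrinks monotonically here, from 2.0 at $k = 0$ toward zero at $k = 50$. In a real network this uniform shrinkage would erase the differences between communities.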

Figure 8 illustrates the model's performance surfaces on different datasets under joint variations of $\alpha$ and $\beta$. With the iteration depth $k$ fixed at the optimal value for each dataset, we performed a grid search over $\alpha$ and $\beta$ within the interval [0.1, 1.0] (step size 0.1) and recorded the corresponding NMI scores.

The figure shows that the Karate and Dolphin datasets are robust across a wide range of parameter settings and achieve near-optimal NMI scores, particularly with smaller $\alpha$ and $\beta$ values. The Football dataset is more sensitive to parameter settings but maintains an overall stable performance trend. In contrast, performance on the PolBooks dataset drops sharply in certain regions. This indicates a strong dependence on parameter tuning for this dataset and underscores the importance of parameter selection.

5. Conclusions

This study addresses the community detection task in complex networks by proposing a parameter-free Graph Convolutional Network architecture grounded in fractal iteration. By employing a simple message aggregation strategy and leveraging multi-layer stacking, the proposed approach effectively captures high-order neighborhood information, thereby enhancing the discriminative power of node embeddings while preserving network topological information. The node embeddings within the same community are decomposed into invariant and variant components, where the invariant component captures the stable characteristics of community structures through fractal iteration-induced self-similarity, and the variant component adapts to dynamic node features via learning. This design significantly strengthens the model’s ability to represent community structures at a deeper level. However, the explicit construction and utilization of the invariant component remain underexplored, particularly in terms of systematically designing invariant features to further improve model robustness and the precision of community delineation, leaving substantial room for future investigation.

In terms of integrating semantic and structural information, this paper introduces a Dual-Channel Semantic Structure Encoder (DC-SSE), which employs a force-directed layout algorithm to balance the interplay between semantic and structural information. The semantic encoder, based on the UMAP algorithm, performs nonlinear dimensionality reduction and reconstruction of high-dimensional sparse features, effectively preserving semantic similarities between nodes. Meanwhile, the structural encoder captures inter-node connectivity and local topological structures through an iterative neighborhood aggregation mechanism. The dual-channel encoder not only enhances the quality of embedding representations but also ensures that the resulting node embeddings accurately reflect the intrinsic relationships and local community structures of the original graph, laying a robust foundation for subsequent community partitioning.

Experimental evaluations using the NMI metric on datasets such as Karate, Dolphin, Football, PolBooks, and DBLP demonstrate that the proposed method achieves high community detection accuracy across most datasets. Ablation studies further confirm the effectiveness of DC-SSE in extracting deep-level information and characterizing structural properties, elucidating the synergistic interactions among the model’s components and providing valuable insights for future optimizations tailored to diverse network structures.

Future work will focus on the explicit modeling of invariant features within community structures, aiming to further explore stable representation mechanisms for node embeddings across multi-layer iterations. Leveraging fractal theory, we plan to design more concise and efficient iterative models that explicitly extract and utilize self-similar features in community structures, thereby enhancing the robustness and discriminative capability of node embeddings. Additionally, we aim to integrate more effective community clustering or merging methods to further improve the performance of community detection in complex networks.

Author Contributions

Conceptualization, B.C.; Methodology, H.D. and B.C.; Validation, Y.H. (Yanchao Huang) and J.W.; Investigation, Y.H. (Yanmei Hu); Resources, H.D. and B.C.; Data curation, Y.H. (Yanchao Huang), J.W. and Y.H. (Yanmei Hu); Writing—original draft, Y.H. (Yanchao Huang) and J.W.; Writing—review & editing, B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, 2008, P10008.
Alamsyah, A.; Rahardjo, B. Community detection methods in social network analysis. Adv. Sci. Lett. 2014, 20, 250–253.
Girvan, M.; Newman, M.E.J. Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826.
Lancichinetti, A.; Fortunato, S.; Kertész, J. Detecting the overlapping and hierarchical community structure in complex networks. New J. Phys. 2009, 11, 033015.
Rahiminejad, S.; Maurya, M.R.; Subramaniam, S. Topological and functional comparison of community detection algorithms in biological networks. BMC Bioinform. 2019, 20, 212.
Akoglu, L.; Tong, H.; Koutra, D. Graph based anomaly detection and description: A survey. Data Min. Knowl. Discov. 2015, 29, 626–688.
Zhang, S.; Zhou, D.; Yildirim, M.Y.; Alcorn, S.; He, J.; Davulcu, H.; Tong, H. Hidden: Hierarchical dense subgraph detection with application to financial fraud detection. In Proceedings of the SIAM International Conference on Data Mining, Houston, TX, USA, 27–29 April 2017.
Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
Zhou, D.; Zhang, S.; Yildirim, M.Y.; Alcorn, S.; Tong, H.; Davulcu, H.; He, J. A local algorithm for structure-preserving graph cut. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017.
Sobolevsky, S.; Belyi, A. Graph neural network inspired algorithm for unsupervised network community detection. Appl. Netw. Sci. 2022, 7, 63.
Shi, J.; Malik, J. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 888–905.
Ng, A.; Jordan, M.; Weiss, Y. On spectral clustering: Analysis and an algorithm. In Proceedings of the 14th Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 3–8 December 2001.
Dhillon, I.S.; Guan, Y.; Kulis, B. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1944–1957.
Zelnik-Manor, L.; Perona, P. Self-tuning spectral clustering. In Proceedings of the 17th Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 13–18 December 2004.
Newman, M.E.J.; Girvan, M. Finding and evaluating community structure in networks. Phys. Rev. E 2004, 69, 026113.
Reichardt, J.; Bornholdt, S. Statistical mechanics of community detection. Phys. Rev. E 2006, 74, 016110.
Que, X.; Checconi, F.; Petrini, F.; Gunnels, J.A. Scalable community detection with the Louvain algorithm. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium, Hyderabad, India, 25–29 May 2015.
Pons, P.; Latapy, M. Computing communities in large networks using random walks. In Computer and Information Sciences—ISCIS 2005, 1st ed.; Yolum, P., Güngör, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; pp. 284–293.
Mucha, P.J.; Richardson, T.; Macon, K.; Porter, M.A.; Onnela, J.P. Community structure in time-dependent, multiscale, and multiplex networks. Science 2010, 328, 876–878.
Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the 30th Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
Xu, X.; Yuruk, N.; Feng, Z.; Schweiger, T.A.J. SCAN: A Structural Clustering Algorithm for Networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, 12–15 August 2007; pp. 824–833.
Palla, G.; Derényi, I.; Farkas, I.; Vicsek, T. Uncovering the Overlapping Community Structure of Complex Networks in Nature and Society. Nature 2005, 435, 814–818.
Coscia, M.; Rossetti, G.; Giannotti, F.; Pedreschi, D. DEMON: A Local-First Discovery Method for Overlapping Communities. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 615–623.
Zhu, X.; Ghahramani, Z.; Lafferty, J.D. Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003; pp. 888–895.
Raghavan, U.N.; Albert, R.; Kumara, S. Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 2007, 76, 036106.
Xie, J.; Szymanski, B.K. Towards linear time overlapping community detection in social networks. In Advances in Knowledge Discovery and Data Mining, 1st ed.; Tan, P.-N., Chawla, S., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 25–36.
Cai, D.; He, X.; Han, J.; Huang, T.S. Graph Regularized Nonnegative Matrix Factorization for Data Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 1548–1560.
Lee, H.; Yoo, J.; Choi, S. Semi-supervised nonnegative matrix factorization. IEEE Signal Process. Lett. 2009, 17, 4–7.
He, C.; Liu, X.; Yu, P.; Liu, C.; Hu, Y. Community detection method based on robust semi-supervised nonnegative matrix factorization. Phys. A Stat. Mech. Its Appl. 2019, 523, 279–291.
Ma, X.; Dong, D.; Wang, Q. Community detection in multi-layer networks using joint nonnegative matrix factorization. IEEE Trans. Knowl. Data Eng. 2018, 31, 273–286.
Rosvall, M.; Bergstrom, C.T. Maps of random walks on complex networks reveal community structure. Proc. Natl. Acad. Sci. USA 2008, 105, 1118–1123.
Karrer, B.; Newman, M.E.J. Stochastic blockmodels and community structure in networks. Phys. Rev. E 2011, 83, 016107.
Arenas, A.; Díaz-Guilera, A.; Pérez-Vicente, C.J. Synchronization reveals topological scales in complex networks. Phys. Rev. Lett. 2006, 96, 114102.
Lambiotte, R.; Delvenne, J.C.; Barahona, M. Random walks, Markov processes and the multiscale modular organization of complex networks. IEEE Trans. Netw. Sci. Eng. 2015, 1, 76–90.
Pizzuti, C. GA-Net: A genetic algorithm for community detection in social networks. In Proceedings of the International Conference on Parallel Problem Solving from Nature, Dortmund, Germany, 13–17 September 2008.
Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416.
Cannon, J.W. The fractal geometry of nature. by Benoit B. Mandelbrot. Am. Math. Mon. 1984, 91, 594–598.
Song, C.; Havlin, S.; Makse, H.A. Self-similarity of complex networks. Nature 2005, 433, 392–395.
Song, C.; Gallos, L.K.; Havlin, S.; Makse, H.A. How to calculate the fractal dimension of a complex network: The box covering algorithm. J. Stat. Mech. 2007, 2007, P03006.
Rozenfeld, H.D.; Song, C.; Makse, H.A. Small-world to fractal transition in complex networks: A renormalization group approach. Phys. Rev. Lett. 2010, 104, 025701.
Ye, Z.; Li, Z.; Li, G.; Zhao, H. Dual-channel deep graph convolutional neural networks. Front. Artif. Intell. 2024, 7, 1290491.
Li, Y.; Wang, X.; Zhang, Y. Multi-modal Graph Neural Network for Attributed Network Embedding. Knowl.-Based Syst. 2024, 283, 111234.
Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781.
Douady, A.; Hubbard, J.H. On the dynamics of polynomial-like mappings. Ann. Sci. Ec. Norm. Super. 1985, 18, 287–343.
Falconer, K. Fractal Geometry: Mathematical Foundations and Applications, 3rd ed.; John Wiley & Sons: Chichester, UK, 2013; pp. 27–29.
McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. Available online: https://arxiv.org/abs/1802.03426 (accessed on 21 June 2025).
Zachary, W.W. An information flow model for conflict and fission in small groups. J. Anthropol. Res. 1977, 33, 452–473.
Lusseau, D.; Schneider, K.; Boisseau, O.J.; Haase, P.; Slooten, E.; Dawson, S.M. The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations: Can geographic isolation explain this unique trait? Behav. Ecol. Sociobiol. 2003, 54, 396–405.
Evans, T.S. Clique graphs and overlapping communities. J. Stat. Mech. 2010, 2010, P12037.
Pasternak, B.; Ivask, I. Four unpublished letters. Books Abroad 1970, 44, 196–200.
Yang, J.; Leskovec, J. Defining and evaluating network communities based on ground-truth. In Proceedings of the ACM SIGKDD Workshop on Mining Data Semantics, Beijing, China, 12–16 August 2012.
Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; Eliassi-Rad, T. Collective classification in network data. AI Mag. 2008, 29, 93.
De Meo, P.; Ferrara, E.; Fiumara, G.; Provetti, A. Mixing local and global information for community detection in large networks. J. Comput. Syst. Sci. 2014, 80, 72–87.
Yang, J.; Leskovec, J. Overlapping community detection at scale: A nonnegative matrix factorization approach. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, Rome, Italy, 4–8 February 2013.
Al-Andoli, M.; Cheah, W.P.; Tan, S.C. Deep learning-based community detection in complex networks with network partitioning and reduction of trainable parameters. J. Ambient Intell. Humaniz. Comput. 2021, 12, 2527–2545.
Yang, L.; Cao, X.; He, D.; Wang, C.; Wang, X.; Zhang, W. Modularity based community detection with deep learning. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016.
Cai, B.; Wang, Y.; Zeng, L.; Hu, Y.; Li, H. Edge classification based on convolutional neural networks for community detection in complex network. Physica A 2020, 556, 124826.
Cai, B.; Wang, M.; Chen, Y.; Hu, Y.; Liu, M. MFF-Net: A multi-feature fusion network for community detection in complex network. Knowl.-Based Syst. 2022, 252, 109408.
Qiu, C.; Huang, Z.; Xu, W.; Li, H. VGAER: Graph neural network reconstruction based community detection. arXiv 2022, arXiv:2201.04066.
Liu, B.; Wang, D.; Gao, J. A multi-objective community detection algorithm with a learning-based strategy. Int. J. Comput. Intell. Syst. 2024, 17, 311.
Yuan, S.; Wang, C.; Jiang, Q.; Ma, J. Community detection with graph neural network using Markov stability. In Proceedings of the International Conference on Artificial Intelligence in Information and Communication, Jeju Island, Republic of Korea, 21–24 February 2022.

Figure 1. UMM network framework structure.

Figure 2. Comparison of predicted and ground-truth community partitions on the Karate dataset: (a) predicted community partition; (b) ground-truth community partition. Nodes with the same color belong to the same community.

Figure 3. Comparison of predicted and ground-truth community partitions on the Dolphin dataset: (a) predicted community partition; (b) ground-truth community partition.

Figure 4. Comparison of predicted and ground-truth community partitions on the Football dataset: (a) predicted community partition; (b) ground-truth community partition.

Figure 5. Comparison of predicted and ground-truth community partitions on the PolBooks dataset: (a) predicted community partition; (b) ground-truth community partition.

Figure 6. Visual comparison of PFGC and GCN encoding results: (a) PFGC, Karate; (b) GCN, Karate; (c) PFGC, Dolphin; (d) GCN, Dolphin; (e) PFGC, PolBooks; (f) GCN, PolBooks; (g) PFGC, CiteSeer; (h) GCN, CiteSeer.

Figure 7. Effect of different iteration step depths k on clustering performance.

Figure 8. Performance surfaces of the model under joint variations of $\alpha$ and $\beta$ on four datasets, with the iteration depth $k$ fixed at its optimal value. $\alpha$ and $\beta$ vary in [0.1, 1.0] with a step size of 0.1; the surfaces show NMI scores for (a) Karate, (b) Dolphin, (c) Football, (d) PolBooks.

Table 1. Key parameter settings for Karate, Dolphin, Football, and PolBooks.

Parameters	Karate	Dolphin	Football	PolBooks
$k$	10	10	4	3
$d$	4	4	24	12
$α$	0.1	0.1	0.1	0.1
$β$	0.01	0.01	0.01	0.01
$n$	4	4	18	4
$i$	22	10	6	74
$c$	2	2	12	3

Table 2. NMI results of several algorithms on Karate, Dolphin, Football, and PolBooks.

Algorithm	Supervised Learning	Karate	Dolphin	Football	PolBooks
Spectral Cluster	no	83.6	88.9	92.4	57.4
Label Propagation	no	44.5	52.7	87.3	53.4
Louvain	no	48.2	44.9	91.3	40.8
DACDPR	yes	100	87.8	91.4	57.2
DNR_CE	yes	100	88.9	91.4	58.2
ComNet-R	yes	100	88.9	91.4	59.8
MFF-NET	yes	100	100	92.4	63.2
VGAER	yes	100	91.9	87.3	-
LSCD	no	69.1	62.5	87.9	-
Ours	no	100	88.9	92.7	61.4

Table 3. NMI results of several algorithms on DBLP, Amazon, and YouTube.

Algorithm	Supervised Learning	Amazon	DBLP	YouTube
BIGCLAM	no	20.1	11.2	-
LP-W	no	41.3	25.5	3.2
Louvain	yes	43.0	28.0	4.3
Louvain-W	yes	42.4	26.8	5.1
GraphGAN	yes	41.7	8.3	4.9
ComNet-R	yes	46.8	44.8	22.4
MFF	yes	47.2	57.6	37.3
CDMG	no	11.4	24.5	16.5
Ours	no	47.9	60.4	32.7

Table 4. Comparison of PFGC encoding results for node classification on citation datasets.

Algorithm	CiteSeer	Cora	PubMed
GCN	67.9 ± 0.5	80.1 ± 0.5	78.9 ± 0.7
PFGC	69.0 ± 0.1	78.2 ± 1.0	78.7 ± 0.5

Table 5. PFMM ablation experiment results.

Algorithm	Karate	Dolphin	Football	PolBooks	DBLP
RAW	46.6	13.7	92.4	13.8	4.7
PFGC	100	81.4	92.7	52.9	51.6
PFGC + DC-SSE	100	88.9	92.7	61.4	60.4

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Citation (MDPI and ACS Style):

Deng, H.; Huang, Y.; Wang, J.; Hu, Y.; Cai, B. Unsupervised Multimodal Community Detection Algorithm in Complex Network Based on Fractal Iteration. Fractal Fract. 2025, 9, 507. https://doi.org/10.3390/fractalfract9080507


