5.3.2. Node Classification Results of WCRW-MLP
To further enrich the node embeddings, we proposed the WCRW-MLP framework, which combines the WCRW algorithm with an MLP, fully leveraging both the structural and attribute features of the hypergraph. As shown in Table 4, we evaluated WCRW-MLP on five datasets, namely Cora, Cora-CA, NTU2012, Zoo, and ModelNet40, and compared it with several existing methods. The experimental results show that WCRW-MLP achieves competitive, and on several datasets clearly superior, performance. These findings validate its effectiveness in node classification tasks and demonstrate the generalization ability and robustness of our method across different datasets.
Building on the above results, we explain why WCRW-MLP achieves superior performance. Conventional hypergraph representations broadly fall into two lines: sampling and embedding approaches and message-passing methods. Both largely model higher-order structure implicitly, which tends to dilute semantic neighborhoods and cause oversmoothing on dense expanded graphs. WCRW addresses this limitation by explicitly injecting two community-salient signals, namely node-pair co-occurrence frequency and triadic closure, into the transition probabilities of a second-order random walk, producing memory-based biased transitions. The second-order walk provides path dependence, the co-occurrence bias strengthens reliable high-frequency co-visits, and the closure bias emphasizes local motif consistency, so that short-context sampling concentrates on structurally coherent and semantically homogeneous clusters. From an objective-function perspective, Skip-gram with negative sampling effectively maximizes the separability between short-context co-visits and random pairs and can be viewed as a low-rank factorization of the windowed co-occurrence matrix. The biased second-order transitions systematically amplify motif-consistent, within-community positives while suppressing incidental co-visits, concentrating the spectrum of the matrix to yield embeddings with stronger linear separability. On this foundation, WCRW-MLP adopts a decoupled fusion approach, concatenating structure embeddings with node attributes and using a shallow MLP to refine the decision boundary. This design avoids the oversmoothing typical of expanded-graph or deep aggregation schemes and, in heterogeneous or sparse regimes, suppresses indiscriminate mixing across weak bridging edges.
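The biased second-order transition described above can be sketched in code. This is a minimal illustrative sketch, not the paper's implementation: the function name, the multiplicative form of the biases, and the coefficient `alpha` are assumptions; `weights` holds node-pair co-occurrence frequencies and `tri` holds local clustering (triadic-closure) coefficients.

```python
def transition_probs(graph, weights, tri, prev, curr, p=1.0, q=1.0, alpha=0.5):
    """Second-order transition probabilities from `curr`, given previous node `prev`.

    graph            - dict mapping each node to its set of neighbors
    weights[(u, v)]  - co-occurrence frequency of the pair (u, v)
    tri[v]           - local clustering coefficient of v (triadic-closure signal)
    p, q             - node2vec-style return / in-out memory parameters
    alpha            - clustering-bias coefficient (illustrative form)
    """
    scores = {}
    for nxt in graph[curr]:
        # memory-based bias: return to prev, stay local, or move outward
        if nxt == prev:
            mem = 1.0 / p
        elif nxt in graph[prev]:
            mem = 1.0
        else:
            mem = 1.0 / q
        # co-occurrence bias amplifies reliable high-frequency co-visits
        w = weights.get((curr, nxt), 1.0)
        # closure bias emphasizes motif-consistent (triangle-closing) neighbors
        closure = (1.0 + tri.get(nxt, 0.0)) ** alpha
        scores[nxt] = mem * w * closure
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}
```

Because all three signals enter multiplicatively, a neighbor that is both frequently co-visited and triangle-closing receives a sharply higher sampling probability, which is how the walk concentrates on structurally coherent clusters.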
In summary, the advantage of WCRW-MLP stems not from additional model stacking but from aligning its second-order memory, equipped with explicit co-occurrence and closure biases, with the Skip-gram objective, followed by a lightweight attribute-fusion stage. Consequently, gains are most pronounced on structure-dominated datasets, while on attribute-rich datasets the method still achieves stable improvements and robust generalization.
5.3.3. Ablation Experiments and Parameter Sensitivity
In this section, we conduct a comprehensive analysis to better understand the effectiveness and robustness of the proposed method. We first perform ablation experiments to isolate the contributions of the weighted-bias and clustering-bias mechanisms in WCRW, then compare the attribute-only MLP with WCRW-MLP, which incorporates both structural embeddings and initial node attributes, in order to highlight the complementary role of structural embeddings. Finally, we investigate the sensitivity of key parameters in the biased random walk process, including walk-related settings and bias coefficients, to gain deeper insights into how different factors influence overall performance.
Ablation Experiments. As shown in Table 5, both single-bias variants—WCRW (Weighted-Bias Only) and WCRW (Clustering-Bias Only)—already achieve competitive performance on DBLP and IMDb, indicating that each bias mechanism is independently effective in enhancing structural representation. However, the complete WCRW model that integrates both bias terms consistently outperforms the single-bias versions, suggesting complementary advantages when simultaneously modeling node co-occurrence strength and triadic closure. These results confirm that the proposed dual-bias design is crucial for fully capturing the structural characteristics of hypergraphs.
Table 6 further compares the attribute-only MLP with the proposed WCRW-MLP, which incorporates both structural embeddings and initial node attributes, across five attributed datasets. In the experiments, WCRW-MLP achieves higher classification accuracy than MLP on all benchmarks, with improvements ranging from modest yet consistent gains on NTU2012, Zoo, and ModelNet40 to more pronounced increases on datasets such as Cora. This demonstrates that the structural embeddings learned by WCRW provide complementary information to node attributes, thereby improving overall performance. The comparison underscores that structural signals are not redundant but play a vital role in enhancing the discriminative power of attribute-based models.
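The decoupled fusion compared above can be summarized in a short sketch. This is an illustrative assumption of the design, not the paper's code: structure embeddings are concatenated with raw attributes and passed through a one-hidden-layer MLP; the class and dimension values are placeholders, and the training loop is omitted.

```python
import numpy as np

def fuse_features(struct_emb, attrs):
    """Decoupled fusion: concatenate WCRW structure embeddings with node attributes."""
    return np.concatenate([struct_emb, attrs], axis=1)

class ShallowMLP:
    """One-hidden-layer MLP that refines the decision boundary (forward pass only)."""
    def __init__(self, d_in, d_hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (d_in, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0.0, 0.1, (d_hidden, n_classes))
        self.b2 = np.zeros(n_classes)

    def forward(self, x):
        h = np.maximum(x @ self.W1 + self.b1, 0.0)        # ReLU hidden layer
        logits = h @ self.W2 + self.b2
        z = np.exp(logits - logits.max(axis=1, keepdims=True))
        return z / z.sum(axis=1, keepdims=True)           # softmax class probabilities
```

Keeping the fusion this shallow is deliberate: the MLP only reshapes the decision boundary over pre-computed features and never aggregates over neighbors, so it cannot oversmooth across weak bridging edges.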
Parameter Sensitivity. We conducted an empirical evaluation of the parameter sensitivity of WCRW-MLP for node classification on the NTU2012 dataset, focusing on common random-walk parameters: the window size k, walk length l, number of walks per node t, embedding dimension d, and the clustering-coefficient bias. All other parameters were held constant to ensure a fair evaluation. As shown in Figure 3, classification performance improves as the window size k increases from 4 to 10, because larger windows capture richer contextual information; excessively large windows, however, can introduce noise. Similarly, performance steadily improves as the walk length l increases, reaching its optimum at a moderate value, whereas excessively long walks may incorporate irrelevant nodes and reduce the model's effectiveness. Increasing the number of walks per node t also improves performance, but gains plateau once a certain number of walks is reached: a sufficient number of walks is critical for capturing structural patterns, while excessively high values add computational cost without meaningful benefit. For the embedding dimension d, low-dimensional embeddings limit the model's representational power, while excessively high dimensions increase both the risk of overfitting and the computational cost; a dimension of approximately 128 strikes a balance between representation quality and efficiency. We then analyze two key parameters of the biased random walk: the return parameter p and the in-out parameter q. The x-axis denotes the base-2 logarithm of p and the y-axis the base-2 logarithm of q, obtained under the optimal parameter setting of the first-order random walk. The heatmap visualizations in Figure 4 show that, at the optimal settings of p and q, the biased random walk balances local and global structural information and achieves the best performance.
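How the walk-related parameters enter the pipeline can be made concrete with a small corpus-generation sketch. This is an illustrative assumption, not the paper's implementation: it produces t walks of length l from every node using uniform transitions as a stand-in for the biased WCRW transitions; the resulting sequences would then be fed to Skip-gram with negative sampling, where the window size k and embedding dimension d apply.

```python
import random

def generate_walks(graph, num_walks_t, walk_len_l, seed=0):
    """Build the walk corpus: num_walks_t walks of length walk_len_l per node.

    graph maps each node to its set of neighbors. Uniform neighbor choice is
    used here for brevity; WCRW replaces it with biased second-order sampling.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks_t):
        nodes = list(graph)
        rng.shuffle(nodes)                     # randomize start order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_len_l:
                nbrs = list(graph[walk[-1]])
                if not nbrs:                   # dead end: truncate the walk
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```

The corpus size grows as t times the number of nodes times l, which is why t and l trade off directly against the computational budget on large hypergraphs.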
Finally, we explore the influence of the clustering-coefficient bias, as shown in Figure 5. The results identify the range of bias values in which node classification performance is optimal. Furthermore, the difference in classification results between positive and negative values of the bias is not significant, which may be attributed to the size of the hypergraph. When the hypergraph has relatively few nodes, longer walk lengths and a higher number of walks per node are typically employed to capture sufficient structural information. Under these conditions, the generated random-walk sequences comprehensively encode the hypergraph's structure, so the sign of the bias makes little difference to classification performance. However, when the hypergraph contains more nodes, the walk length and the number of walks per node are generally constrained to smaller values for reasons of computational complexity; in such cases, the influence of the sign of the bias becomes more pronounced. Further analysis reveals that small positive values of the bias yield better classification performance, whereas large values may interfere with other parameters and thereby degrade the effectiveness of the biased random-walk strategy.
Limitations. Our results indicate consistent gains from combining weight-aware and closure-aware biases within a second-order walk, yet several factors delimit the scope of these findings. Relying on the clique expansion collapses higher-order relations into pairwise links and may bias the representation toward nodes involved in large hyperedges. The bias terms are hand-crafted and governed by fixed hyperparameters, which favors interpretability but may be suboptimal across domains. On ultra-large, dense hypergraphs, computing the triadic closure metrics and maintaining sampling structures for second-order random walks incur substantial time and memory costs, necessitating trade-offs among accuracy, speed, and memory footprint.