Figure 1.
An example of node classification under different neighborhood aggregation operators: (a) network G; (b) good neighborhood aggregation reduces the difficulty of the classification task; (c) poor neighborhood aggregation crowds neighboring nodes together in the embedding space; (d) catastrophic aggregation makes the task more difficult.
Figure 2.
Framework overview. First, DeepWalk is utilized to obtain self-representational vectors of nodes. Then, the weights between neighbors are calculated by the self-attention model on a Euclidean-distance-based learning task. The local representational vectors of nodes are calculated by the weighted neighborhood aggregation. Finally, the sequentially connected vectors are fused into a single unified embedding space.
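The weighted-aggregation and concatenation steps of this pipeline can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the DeepWalk and attention stages are assumed to have already produced the self-representation vectors and a row-normalized neighbor-weight matrix, and the final fusion step is reduced to simple concatenation. All names and the toy data are illustrative.

```python
import numpy as np

def aggregate(self_vecs, adj, weights):
    """Weighted neighborhood aggregation: each node's local representation
    is the weighted sum of its neighbors' self-representation vectors,
    concatenated with its own vector before fusion."""
    n, d = self_vecs.shape
    local = np.zeros((n, d))
    for i in range(n):
        nbrs = np.flatnonzero(adj[i])          # indices of node i's neighbors
        if nbrs.size:
            local[i] = weights[i, nbrs] @ self_vecs[nbrs]
    return np.concatenate([self_vecs, local], axis=1)

# Toy graph: node 0 is connected to nodes 1 and 2 (hypothetical data).
vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # self-representations
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
W = np.array([[0.0, 0.5, 0.5],                          # neighbor weights,
              [1.0, 0.0, 0.0],                          # each row sums to 1
              [1.0, 0.0, 0.0]])
emb = aggregate(vecs, adj, W)   # shape (3, 4): self (2 dims) + local (2 dims)
```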
Figure 3.
Framework of one-head self-attention. Each row computes the weights of one node's neighbors: the top row computes the neighbor weights of the first node, the middle row those of the second, and so on.
Figure 4.
The multi-head self-attention mechanism employs parallel computation units to independently derive attention weights for different nodes in the graph. Each attention head maintains its own set of learnable projection parameters and computes scaled dot-product attention between a target node and its neighbors. As illustrated in the computational graph, the uppermost head processes the edge-weight relationships of one node, the intermediate head handles the neighborhood interactions of the next, and this pattern extends to subsequent nodes. Crucially, all heads operate concurrently through parameter-independent transformation matrices, enabling the model to capture heterogeneous relational patterns in the feature space.
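Each head's weight computation follows standard scaled dot-product attention. The sketch below shows how one head turns a target node's vector and its neighbors' vectors into a weight distribution; the dimensions, parameter names (`Wq`, `Wk`), and random initialization are assumptions for illustration, not the paper's notation. A multi-head variant simply runs several such heads with independent projection matrices in parallel.

```python
import numpy as np

def one_head_neighbor_weights(node_vec, neighbor_vecs, Wq, Wk):
    """One attention head: project the node as the query and its neighbors
    as keys, then softmax the scaled dot-product scores into weights."""
    q = node_vec @ Wq                        # query for the target node
    K = neighbor_vecs @ Wk                   # keys for its neighbors
    scores = K @ q / np.sqrt(q.shape[0])     # scaled dot-product scores
    e = np.exp(scores - scores.max())        # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
d, h = 8, 4                                  # embedding dim, head dim (illustrative)
Wq, Wk = rng.normal(size=(d, h)), rng.normal(size=(d, h))
node = rng.normal(size=d)
neigh = rng.normal(size=(3, d))
w = one_head_neighbor_weights(node, neigh, Wq, Wk)  # one weight per neighbor
```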
Figure 5.
Non-learning weighting. From left to right: calculate the Euclidean distance between each node and its neighbors; normalize the distances into (0, 1); and pass them through the softmax function to obtain the weights.
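The three steps of this non-learning weighting can be sketched as follows. Two details the caption leaves open are assumptions here: the normalization scheme (min-max into [0, 1]) and negating the normalized distances before the softmax so that closer neighbors receive larger weights.

```python
import numpy as np

def non_learning_weights(node_vec, neighbor_vecs):
    """Weight a node's neighbors without learned parameters:
    Euclidean distance -> normalize -> softmax."""
    # Step 1: Euclidean distance between the node and each neighbor.
    dists = np.linalg.norm(neighbor_vecs - node_vec, axis=1)
    # Step 2: min-max normalize the distances (assumed scheme).
    span = dists.max() - dists.min()
    norm = (dists - dists.min()) / span if span > 0 else np.zeros_like(dists)
    # Step 3: softmax over negated distances, so closer neighbors
    # get larger weights (assumed sign convention).
    scores = np.exp(-norm)
    return scores / scores.sum()

node = np.array([0.0, 0.0])
neighbors = np.array([[1.0, 0.0], [3.0, 4.0]])   # hypothetical positions
w = non_learning_weights(node, neighbors)        # weights sum to 1
```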
Figure 6.
Comparison of different numbers of heads of multi-head self-attention on the Amherst dataset.
Table 1.
Statistics of datasets.
| Dataset | # Nodes | # Edges | Multi-label | # Labels |
|---|---|---|---|---|
Blogcatalog | 10,312 | 333,983 | Yes | 39 |
PPI | 3890 | 76,584 | Yes | 50 |
Amherst | 2235 | 90,954 | No | 6 |
Mich | 3745 | 81,901 | No | 10 |
Table 2.
Experimental results of multi-label classification on Blogcatalog on Micro-F1 (%). Bold indicates the optimal result among the comparative algorithms under the same parameters.
(Columns give the training proportion.)

| Algorithm | 0.2 | 0.4 | 0.6 | 0.8 |
|---|---|---|---|---|
DeepWalk | 38.49 | 40.67 | 41.78 | 42.51 |
LINE | 29.66 | 33.74 | 35.54 | 36.76 |
node2vec | 37.99 | 39.69 | 40.72 | 41.33 |
SDNE | 29.62 | 30.45 | 30.75 | 30.95 |
DeeWaNA (N) | 39.44 | 41.27 | 42.25 | 42.91 |
DeeWaNA (1H + N) | 39.36 | 41.23 | 42.11 | 42.83 |
DeeWaNA (MH + N) | 39.37 | 41.25 | 42.19 | 42.72 |
DeeWaNA (1H) | - | - | - | - |
DeeWaNA (MH) | - | - | - | - |
Table 3.
Experimental results of multi-label classification on PPI on Micro-F1 (%). Bold indicates the optimal result among the comparative algorithms under the same parameters.
(Columns give the training proportion.)

| Algorithm | 0.2 | 0.4 | 0.6 | 0.8 |
|---|---|---|---|---|
DeepWalk | 18.30 | 20.58 | 22.01 | 22.88 |
LINE | 12.29 | 14.04 | 15.44 | 16.69 |
node2vec | 18.23 | 20.37 | 21.66 | 22.72 |
SDNE | 16.81 | 18.47 | 19.34 | 20.08 |
DeeWaNA (N) | 19.99 | 22.44 | 23.67 | 24.55 |
DeeWaNA (1H + N) | 19.91 | 22.37 | 23.68 | 24.54 |
DeeWaNA (MH + N) | 19.85 | 22.52 | 23.70 | 24.47 |
DeeWaNA (1H) | 19.96 | 22.39 | 23.57 | 24.54 |
DeeWaNA (MH) | 19.97 | 22.37 | 23.68 | 24.47 |
Table 4.
Experimental results of single-label classification on Amherst on Micro-F1 (%). Bold indicates the optimal result among the comparative algorithms under the same parameters.
(Columns give the training proportion.)

| Algorithm | 0.2 | 0.4 | 0.6 | 0.8 |
|---|---|---|---|---|
DeepWalk | 35.61 | 37.27 | 38.58 | 39.60 |
LINE | 31.60 | 34.38 | 36.52 | 37.63 |
node2vec | 36.38 | 37.18 | 37.96 | 38.53 |
SDNE | 35.42 | 36.88 | 37.89 | 38.88 |
DeeWaNA (N) | 37.53 | 38.66 | 39.38 | 39.98 |
DeeWaNA (1H + N) | 37.64 | 38.74 | 39.59 | 40.13 |
DeeWaNA (MH + N) | 37.41 | 38.39 | 39.11 | 39.68 |
DeeWaNA (1H) | 37.54 | 38.45 | 39.05 | 39.56 |
DeeWaNA (MH) | 37.53 | 38.43 | 39.13 | 39.62 |
Table 5.
Experimental results of single-label classification on Mich on Micro-F1 (%). Bold indicates the optimal result among the comparative algorithms under the same parameters.
(Columns give the training proportion.)

| Algorithm | 0.2 | 0.4 | 0.6 | 0.8 |
|---|---|---|---|---|
DeepWalk | 39.70 | 42.16 | 43.68 | 44.53 |
LINE | 35.33 | 38.11 | 39.89 | 41.33 |
node2vec | 39.40 | 42.06 | 43.44 | 44.20 |
SDNE | 36.46 | 38.38 | 39.60 | 40.01 |
DeeWaNA (N) | 39.62 | 42.11 | 43.70 | 44.64 |
DeeWaNA (1H + N) | 39.60 | 42.20 | 43.72 | 44.78 |
DeeWaNA (MH + N) | 39.80 | 42.19 | 43.53 | 44.70 |
DeeWaNA (1H) | 39.49 | 41.95 | 43.47 | 44.48 |
DeeWaNA (MH) | 39.65 | 42.01 | 43.29 | 44.69 |
Table 6.
The experimental results of effectiveness of weighted neighborhood aggregation on Micro-F1 (%). Bold indicates the optimal result among the comparative algorithms under the same parameters.
(Columns give the training proportion.)

| Dataset | Algorithm | 0.2 | 0.4 | 0.6 | 0.8 |
|---|---|---|---|---|---|
Blogcatalog | LINE | 29.66 | 33.74 | 35.54 | 36.76 |
| LiNA | 34.58 | 36.60 | 37.71 | 38.31 |
| node2vec | 37.99 | 39.69 | 40.72 | 41.33 |
| NoNA | 38.85 | 40.36 | 41.17 | 41.67 |
| SDNE | 29.62 | 30.45 | 30.75 | 30.95 |
| SeNA | 29.21 | 30.70 | 31.55 | 32.08 |
PPI | LINE | 12.29 | 14.04 | 15.44 | 16.69 |
| LiNA | 16.70 | 18.39 | 19.52 | 20.18 |
| node2vec | 18.23 | 20.37 | 21.66 | 22.72 |
| NoNA | 19.94 | 22.00 | 23.07 | 23.78 |
| SDNE | 16.81 | 18.47 | 19.35 | 20.08 |
| SeNA | 17.86 | 20.20 | 21.51 | 22.31 |
Amherst | LINE | 31.60 | 34.38 | 36.52 | 37.63 |
| LiNA | 36.34 | 37.61 | 38.62 | 39.31 |
| node2vec | 36.38 | 37.18 | 37.96 | 38.53 |
| NoNA | 36.37 | 37.48 | 38.11 | 38.93 |
| SDNE | 35.42 | 36.88 | 37.89 | 38.88 |
| SeNA | 37.36 | 38.54 | 39.15 | 39.77 |
Mich | LINE | 35.33 | 38.11 | 39.89 | 41.33 |
| LiNA | 38.21 | 40.47 | 41.64 | 42.57 |
| node2vec | 39.40 | 42.06 | 43.44 | 44.20 |
| NoNA | 39.84 | 42.38 | 43.64 | 44.80 |
| SDNE | 36.46 | 38.38 | 39.60 | 40.01 |
| SeNA | 37.61 | 39.34 | 40.25 | 40.78 |
Table 7.
The experimental results of various algorithms for learning self-representational vectors on Micro-F1 (%). Bold indicates the optimal result among the comparative algorithms under the same parameters.
(Columns give the training proportion.)

| Dataset | Algorithm | 0.2 | 0.4 | 0.6 | 0.8 |
|---|---|---|---|---|---|
Blogcatalog | LiNA | 34.58 | 36.60 | 37.71 | 38.31 |
| NoNA | 38.85 | 40.36 | 41.17 | 41.67 |
| SeNA | 29.21 | 30.70 | 31.55 | 32.08 |
| DeeWaNA (N) | 39.36 | 41.27 | 42.25 | 42.91 |
PPI | LiNA | 16.70 | 18.39 | 19.52 | 20.18 |
| NoNA | 19.94 | 22.00 | 23.07 | 23.78 |
| SeNA | 17.86 | 20.20 | 21.51 | 22.31 |
| DeeWaNA (N) | 19.99 | 22.44 | 23.67 | 24.55 |
Amherst | LiNA | 36.34 | 37.61 | 38.62 | 39.31 |
| NoNA | 36.37 | 37.48 | 38.11 | 38.93 |
| SeNA | 37.36 | 38.54 | 39.15 | 39.77 |
| DeeWaNA (N) | 37.53 | 38.66 | 39.38 | 39.98 |
Mich | LiNA | 38.21 | 40.47 | 41.64 | 42.57 |
| NoNA | 39.84 | 42.38 | 43.64 | 44.80 |
| SeNA | 37.61 | 39.34 | 40.25 | 40.78 |
| DeeWaNA (N) | 39.62 | 42.11 | 43.70 | 44.64 |
Table 8.
The classification results of different distance–weight relationships on Micro-F1 (%). Bold indicates the optimal result among the comparative algorithms under the same parameters.
(Columns give the training proportion.)

| Dataset | Algorithm | 0.2 | 0.4 | 0.6 | 0.8 |
|---|---|---|---|---|---|
Blogcatalog | DeeWaNA (F) | 39.36 | 41.26 | 42.14 | 42.74 |
| DeeWaNA (N) | 39.44 | 41.27 | 42.25 | 42.91 |
PPI | DeeWaNA (F) | 19.87 | 22.42 | 23.62 | 24.54 |
| DeeWaNA (N) | 19.99 | 22.44 | 23.67 | 24.55 |
Amherst | DeeWaNA (F) | 37.44 | 38.48 | 39.23 | 39.88 |
| DeeWaNA (N) | 37.53 | 38.66 | 39.38 | 39.98 |
Mich | DeeWaNA (F) | 39.60 | 42.08 | 43.61 | 44.51 |
| DeeWaNA (N) | 39.62 | 42.11 | 43.70 | 44.64 |
Table 9.
The experimental results of various combinations of representation on Blogcatalog on Micro-F1 (%). Bold indicates the optimal result among the comparative algorithms under the same parameters.
(Columns give the training proportion.)

| Algorithm | 0.2 | 0.4 | 0.6 | 0.8 |
|---|---|---|---|---|
DeepWalk | 38.49 | 40.67 | 41.78 | 42.51 |
DeeWaNA (local) | 36.12 | 38.60 | 39.67 | 40.32 |
DeeWaNA (cnt) | 38.68 | 40.89 | 41.96 | 42.67 |
DeeWaNA (N) | 39.44 | 41.27 | 42.25 | 42.91 |
Table 10.
The classification results of different fusion methods on Blogcatalog on Micro-F1 (%). Bold indicates the optimal result among the comparative algorithms under the same parameters.
(Columns give the training proportion.)

| Algorithm | 0.2 | 0.4 | 0.6 | 0.8 |
|---|---|---|---|---|
DeeWaNA (ICA) | 17.64 | 18.61 | 19.81 | 21.12 |
DeeWaNA (AE) | 35.21 | 37.62 | 38.73 | 39.45 |
DeeWaNA (N) | 39.36 | 41.27 | 42.25 | 42.91 |