4.1. k-Core Decomposition
To enable persistent storage of results and support subsequent analyses, we utilize graph databases to implement k-core decomposition. However, existing methods have low efficiency when performed in graph databases. Therefore, an improved k-core decomposition algorithm based on graph databases is proposed. This method takes advantage of two characteristics of graph databases.
The first characteristic concerns node properties. Each node can store multiple properties that record intermediate and final results, which supports subsequent analysis. Additionally, property indexes improve query efficiency, as they do in relational databases.
The Twitch-Gamers [25] dataset is used as an example. This dataset is a social network of Twitch users. Vertices are Twitch users and edges are mutual follower relationships between them. We implement the storage of this dataset in a graph database and add some properties to the vertices, as shown in Table 1.
Among these properties, views, created_at, and life_time are provided in the dataset and are used to support subsequent analysis. The properties degree, core, status, and member are additional properties that represent the intermediate and final results of k-core decomposition and core member filtering. Specifically, the degree and status properties record intermediate results during k-core decomposition. The degree property stores the number of neighboring vertices of each node, while the status property indicates whether the core number of the vertex has been determined. Because these properties are indexed, they can be queried efficiently during k-core decomposition, which improves the efficiency of the algorithm. The core property records the final result of the k-core decomposition, i.e., the core number of each vertex. The member property marks the core members, which is the final result of core member filtering, and is used in subsequent core member analyses. The method for querying core members is described in detail in the next section.
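To make the property layout concrete, the following is a minimal Python sketch of one vertex record. It is only an in-memory illustration, not the graph database schema: the class name `UserNode`, the `node_id` field, and all field types and default values are assumptions, while the remaining field names follow the properties described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserNode:
    """Hypothetical in-memory mirror of one vertex and its properties."""

    node_id: str                # assumed identifier, not a property named in the text
    # Properties provided by the dataset (types are assumptions).
    views: int = 0
    created_at: str = ""
    life_time: int = 0
    # Properties added for k-core decomposition and core member filtering.
    degree: int = 0             # number of neighboring vertices
    status: bool = False        # True once the core number is determined
    core: Optional[int] = None  # final core number, unset until determined
    member: bool = False        # True if the vertex is a core member


# Vertex a after initialization: degree 4, core number not yet determined.
node = UserNode(node_id="a", degree=4)
```

Keeping the intermediate fields (degree, status) next to the final ones (core, member) mirrors the idea that one stored record serves both the decomposition and the later analysis.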
The second characteristic concerns batch queries. Graph databases provide batch queries that avoid the complexity of multiple traversals, so a single query can return multiple results during k-core decomposition. The detailed process of k-core decomposition in a graph database is shown in Algorithm 1.
Algorithm 1 Algorithm for k-core decomposition in a graph database
Require: The graph in a graph database with the provided properties in a dataset
Ensure: The graph in a graph database with the properties related to k-core and the maximum core number l
1: …, …
2: for each … do
3:     …
4:     …
5: end for
6: while … do
7:     …
8:     for … to k do
9:         …
10:     end for
11:     if … then
12:         for all … do
13:             …
14:             …
15:             …
16:             for all … do
17:                 …
18:             end for
19:         end for
20:     else
21:         …
22:     end if
23:     …
24: end while
25: return …
Lines 1–5 perform the initialization process. During this process, each node is assigned the properties of status and degree in Table 1. Specifically, the status property of each node is set to false in line 4, which indicates that the core number of the node has not yet been determined. In line 5, the number of neighboring vertices of each node is calculated. This is easily performed in graph databases, since they store the relationships of each vertex, i.e., its adjacent vertices.
Lines 6–19 represent the k-core decomposition process. In the loop spanning lines 8–10, nodes are identified based on their exact degree. For each value of j from 1 to k, the algorithm queries for nodes where the degree is exactly j and the status is false. These nodes belong to the k-core with a core number of k according to Definitions 1 and 2. Due to the first characteristic of graph databases mentioned before, these nodes can be quickly queried through the indexes of the properties in graph databases. Moreover, these nodes are queried in batches to enhance efficiency, which is the second characteristic mentioned before. Lines 11–16 record the intermediate and final results of k-core decomposition in the properties. Meanwhile, the neighboring nodes of each node are queried in graph databases, and the property of degree of these nodes is reduced by 1.
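The decomposition loop can be sketched in memory as follows. This is a minimal Python illustration under stated assumptions, not the graph database implementation: a dictionary of sets (`index`, a hypothetical name) stands in for the indexed degree property, and the function name and the adjacency-dictionary input are also assumptions. Unlike the description above, the lookup loop starts at j = 0 with k = 0, so that isolated nodes and nodes whose degree drops to 0 are still assigned a core number.

```python
from collections import defaultdict


def k_core_decomposition(adjacency):
    """Return (core, max_core) for an undirected graph {node: set_of_neighbors}."""
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    status = {v: False for v in adjacency}  # False: core number not determined
    core = {}

    # Stand-in for the property index: degree value -> undetermined nodes.
    index = defaultdict(set)
    for v, d in degree.items():
        index[d].add(v)

    k = 0  # assumption: start at 0 to cover isolated nodes
    max_core = 0
    remaining = len(adjacency)
    while remaining > 0:
        # One exact-match lookup per degree value j = 0..k, collected as a batch.
        batch = set()
        for j in range(k + 1):
            batch |= index[j]
        if batch:
            # Record the result for the whole batch first.
            for v in batch:
                index[degree[v]].discard(v)
                status[v] = True
                core[v] = k
            max_core = k
            remaining -= len(batch)
            # Then lower the degree of every undetermined neighbor by 1.
            for v in batch:
                for u in adjacency[v]:
                    if not status[u]:
                        index[degree[u]].discard(u)
                        degree[u] -= 1
                        index[degree[u]].add(u)
        else:
            k += 1  # nothing left at this level, move to the next core number
    return core, max_core


# Toy graph (hypothetical): triangle a-b-c, pendant d attached to a, isolated e.
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": set(),
}
core, max_core = k_core_decomposition(adjacency)
```

The batch is re-queried at the same k until it comes back empty, which corresponds to k staying unchanged while lowered degrees make further nodes eligible.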
Compared to the existing k-core decomposition methods [7, 10, 12], the proposed method leverages the two characteristics of graph databases mentioned earlier to improve overall efficiency and to support subsequent analysis. Namely, we utilize the indexes of node properties and batch queries to replace traversal queries. Meanwhile, the intermediate and final results are recorded in the properties of each node. For example, in lines 8–10, instead of a single, slow range query to find all nodes with a degree less than or equal to k (which cannot use a composite index and would require a full graph scan), we perform a series of fast, exact-match queries on the indexed degree property within a loop. This is a direct application of using property indexes and batch queries to replace a much slower traversal-based operation.
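The difference between the two query styles can be illustrated with a small in-memory analogy. The sketch below is not a benchmark and makes no claim about any particular graph database: the function names are hypothetical, a dictionary of sets plays the role of the property index, and all degree values except that of vertex a are made up for illustration.

```python
from collections import defaultdict


def build_degree_index(degree, status):
    """Map each degree value to the set of nodes whose status is False."""
    index = defaultdict(set)
    for node, d in degree.items():
        if not status[node]:
            index[d].add(node)
    return index


def query_by_scan(degree, status, k):
    """Range-style query: every node is inspected."""
    return {n for n, d in degree.items() if 1 <= d <= k and not status[n]}


def query_by_exact_matches(index, k):
    """k exact-match lookups, each touching a single index bucket."""
    result = set()
    for j in range(1, k + 1):
        result |= index.get(j, set())
    return result


# Hypothetical degrees; only the degree of vertex a (4) comes from the text.
degree = {"a": 4, "o": 1, "p": 2, "q": 1}
status = {node: False for node in degree}
index = build_degree_index(degree, status)

scanned = query_by_scan(degree, status, 2)
looked_up = query_by_exact_matches(index, 2)
```

Both functions return the same set of nodes. The exact-match variant only reads the buckets for j = 1, ..., k, whereas the scan visits every node regardless of its degree.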
Figure 3 shows a subgraph of the Twitch-Gamers dataset. We will use this subgraph to illustrate the process of Algorithm 1 in a social network.
First, during the initialization process, each node is assigned the properties of degree and status, as shown in lines 1–5 of Algorithm 1. For example, the property of status is set to false and the property of degree is set to 4 for vertex a.
During the k-core decomposition process, the nodes are batch queried based on the core number k in the current iteration. When k is 1, the nodes o, q, r, s, t are batch queried according to lines 6–9 of Algorithm 1. Then, the status property of these nodes is set to true, and the core property is set to 1 according to lines 10–16 of Algorithm 1. Meanwhile, the degree of their neighboring nodes p, n, m, u is decremented by 1. Since the degree property of these neighboring nodes has been modified, some of them can now be queried in lines 8–9 of Algorithm 1 while the core number k remains 1. Therefore, the core property of the nodes p, u, v is also determined. As a result, when k is 1, the core number of the nodes o, p, q, r, s, t, u and v is determined. At this point, no further nodes satisfy the condition, so k is incremented by 1.
By analogy, when k is greater than 1, all nodes with the status property set to false and a degree property no greater than k are processed in the same way. Finally, in the social network shown in Figure 3, the nodes with core number of 1 are o, p, q, r, s, t, u, v; the nodes with core number of 2 are f, g, h, i, j, k, l, m, n; and the nodes with core number of 3 are a, b, c, d, e. The entire k-core decomposition process is executed in graph databases.
4.2. Core Member Filtering
Once the core number of every node has been computed, a critical subsequent step in exploring homogeneous dense groups is to identify and analyze the core members, i.e., nodes with higher core numbers. With conventional k-core decomposition methods, this process is cumbersome, often requiring the export of k-core results and subsequent loading into a separate analysis tool, which introduces significant I/O and data restructuring overhead. However, in our graph-database-based approach, the core number is already persisted as an indexed property. Consequently, the task of filtering core members is transformed from a complex data processing problem into simple and fast database queries. The detailed process of core member filtering in a graph database is shown in Algorithm 2.
Algorithm 2 Algorithm for core member filtering in a graph database
Require: The graph in a graph database with the properties related to k-core, the given minimum core number s based on the datasets and the maximum core number l returned from Algorithm 1
Ensure: The graph in a graph database with the properties related to core members and the set of core members C
1: …
2: for each … do
3:     …
4: end for
5: for … to l do
6:     …
7:     for all … do
8:         …
9:     end for
10: end for
11: return …, C
In the initialization process, the member property of each node is set to false in lines 2–3. In the filtering process, lines 4–7 iterate the core number from the given minimum value to the maximum value to identify the nodes whose core number meets the current condition in line 6. These nodes are set as the core members. Since the nodes have the indexed property of core, they can be queried efficiently in line 6 due to the first characteristic of graph databases mentioned before. Similarly, due to the second characteristic, these nodes are also queried in batches to improve efficiency. Finally, the member property, which indicates whether the node is a core member, is persistently stored in the graph database to support subsequent in-depth analysis.
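The filtering step can be sketched in the same in-memory style. This is a minimal Python illustration under stated assumptions rather than the database queries themselves: the function name `filter_core_members` and the dictionary `by_core`, which stands in for the index on the core property, are hypothetical, while s, l, and C follow the notation of Algorithm 2.

```python
from collections import defaultdict


def filter_core_members(core, s, l):
    """Mark nodes whose core number lies in [s, l]; return (member, C)."""
    # Stand-in for the index on the core property: core number -> nodes.
    by_core = defaultdict(set)
    for node, c in core.items():
        by_core[c].add(node)

    # Initialization: no node is a core member yet.
    member = {node: False for node in core}

    # Filtering: one exact-match lookup per core number from s to l.
    C = set()
    for c in range(s, l + 1):
        for node in by_core.get(c, set()):
            member[node] = True
            C.add(node)
    return member, C


# Core numbers of three nodes as reported for Figure 3: a is 3, f is 2, o is 1.
core = {"a": 3, "f": 2, "o": 1}
member, C = filter_core_members(core, s=2, l=3)
```

With s = 2 and l = 3, nodes a and f are marked as core members and node o is not.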
Taking Figure 4 as an example, we have labelled the core number of each node according to Algorithm 1. It is assumed that the given minimum core number of core members is 2. Then, we query the nodes whose core number is 2 and mark these nodes as core members based on the indexed property of the graph database according to Algorithm 2. Specifically, we query the nodes in the set
, and mark these nodes. In the same way, we continue to query the nodes with the core number being 3 and mark these nodes, i.e., the nodes in set
.