Local Community Detection in Dynamic Graphs Using Personalized Centrality †

## Abstract

## 1. Introduction

#### 1.1. Contributions

## 2. Background

#### 2.1. Definitions

#### 2.2. Measures of Community Quality

#### 2.3. Centrality Measures

**Algorithm 1.** Solve $M\mathit{x}=\mathit{b}$ to tolerance $tol$ using the Jacobi algorithm.

1. **procedure** Jacobi($M, \mathit{b}, tol$)
2. $k = 0$
3. $\mathit{x}^{(0)} = \mathbf{0}$
4. $\mathit{r}^{(0)} = \mathit{b} - M\mathit{x}^{(0)}$
5. $D = \mathrm{diag}(M)$
6. $R = M - D$
7. **while** $\lVert \mathit{r}^{(k)} \rVert_2 > tol$ **do**
8. $\mathit{x}^{(k+1)} = D^{-1}(\mathit{b} - R\mathit{x}^{(k)})$
9. $\mathit{r}^{(k+1)} = \mathit{b} - M\mathit{x}^{(k+1)}$ ▹ Next residual
10. $k = k + 1$
11. **end while**
12. **return** $\mathit{x}^{(k)}$
13. **end procedure**
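Algorithm 1 can be transcribed almost line for line. The sketch below is a minimal dense-matrix version in Python, assuming $M$ is given as a list of rows with a nonzero diagonal. The `max_iter` safeguard is an addition that Algorithm 1 does not have, and a practical implementation on graphs would use a sparse format instead of dense rows.

```python
import math


def jacobi(M, b, tol=1e-8, max_iter=10_000):
    """Solve M x = b by Jacobi iteration, following Algorithm 1.

    M is a dense list of rows with a nonzero diagonal. max_iter is an added
    safeguard not present in Algorithm 1. Returns (x, iterations).
    """
    n = len(b)
    diag = [M[i][i] for i in range(n)]  # D = diag(M)

    def residual(x):
        # r = b - M x
        return [b[i] - sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]

    x = [0.0] * n  # x^(0) = 0
    r = residual(x)
    k = 0
    while math.sqrt(sum(ri * ri for ri in r)) > tol and k < max_iter:
        # x^(k+1) = D^{-1} (b - R x^(k)), where R = M - D is the off-diagonal part
        x = [
            (b[i] - sum(M[i][j] * x[j] for j in range(n) if j != i)) / diag[i]
            for i in range(n)
        ]
        r = residual(x)
        k += 1
    return x, k


x, iters = jacobi([[4.0, 1.0], [2.0, 5.0]], [1.0, 2.0])
print(x, iters)  # x is close to [1/6, 1/3]
```

Jacobi is guaranteed to converge when $M$ is strictly diagonally dominant, as in this 2×2 example. For Katz-type systems $M = I - \alpha A$ on a graph without self-loops, this holds whenever $\alpha$ is smaller than 1/(maximum degree).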

## 3. Related Work

#### 3.1. Community Detection

**Algorithm 2.** Static, greedy seed set expansion.

1. **procedure** GreedySeedset(graph $G$, seed set $seed$)
2. $C = seed$
3. $progress = true$
4. **while** $progress$ **do**
5. $maxscore = -1$
6. $maxvtx = null$
7. **for** $v \in Nb(C)$ **do**
8. $s(v) = fit(C \cup \{v\}) - fit(C)$
9. **if** $s(v) > maxscore$ **then**
10. $maxscore = s(v)$
11. $maxvtx = v$
12. **end if**
13. **end for**
14. **if** $maxscore > 0$ **then**
15. $C = C \cup \{maxvtx\}$
16. **else**
17. $progress = false$
18. **end if**
19. **end while**
20. **return** $C$
21. **end procedure**
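A runnable sketch of Algorithm 2 follows. The fitness function $fit$ is not specified in this excerpt, so as an assumption the sketch swaps in an LFM-style fitness $k_{in}/(k_{in}+k_{out})$ (with $k_{in}$ counting internal edge endpoints and $k_{out}$ counting cut edges). The names `lfm_fitness` and `greedy_seed_set` are hypothetical, and the graph is a dictionary mapping each vertex to its set of neighbors.

```python
def lfm_fitness(adj, C):
    """Hypothetical fitness k_in / (k_in + k_out), an LFM-style choice."""
    k_in = sum(1 for v in C for u in adj[v] if u in C)       # 2 * internal edges
    k_out = sum(1 for v in C for u in adj[v] if u not in C)  # cut edges
    total = k_in + k_out
    return k_in / total if total else 0.0


def greedy_seed_set(adj, seed, fit):
    """Greedy seed set expansion following Algorithm 2."""
    C = set(seed)
    progress = True
    while progress:
        # Algorithm 2 starts at -1; -inf makes no assumption about fit's range
        max_score, max_vtx = float("-inf"), None
        neighbors = {u for v in C for u in adj[v]} - C  # Nb(C)
        base = fit(adj, C)
        for v in sorted(neighbors):  # sorted only for deterministic tie-breaking
            s = fit(adj, C | {v}) - base
            if s > max_score:
                max_score, max_vtx = s, v
        if max_score > 0:
            C.add(max_vtx)
        else:
            progress = False
    return C


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(greedy_seed_set(adj, {0}, lfm_fitness))  # {0, 1, 2}
```

Each pass re-evaluates $fit$ for every neighbor of $C$, which is what makes static greedy expansion expensive on large graphs.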

#### 3.2. Dynamic Algorithms for Centrality Measures

## 4. Communities from Personalized Centrality

#### 4.1. Local Communities from Personalized Centrality

#### 4.2. Results on Static, Synthetic Graphs

## 5. Dynamic Communities from Personalized Centrality

#### 5.1. Methods

#### 5.2. Synthetic Dynamic Graphs

#### 5.3. Real Graphs

#### 5.3.1. Different Seeding Methods

## 6. Guaranteed Ranking

#### 6.1. Methods

**Theorem 1.**

**Proof.**

#### 6.1.1. New Stopping Criterion

#### 6.2. Results

## 7. Conclusions

**Figure 1.** The speedup of the personalized Katz centrality method compared to greedy expansion is shown for SBM graphs with different parameters. (**a**) The number of vertices $n$ in the graph varies, with $d=20$ and $k=2$. (**b**) The number of communities $k$ in the graph varies, with $n=47104$ and $d=20$. (**c**) The average vertex degree $d$ varies, with $n=1000$ and $k=2$.
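To make the personalized Katz approach concrete, the sketch below solves $(I - \alpha A)\mathit{x} = \mathit{b}$ with the Jacobi update from Algorithm 1 and keeps the highest-scoring vertices. Several details are assumptions, since they are not given in this excerpt: $\mathit{b}$ is the indicator vector of a single seed, $\alpha = 0.5/d_{max}$, and the community is simply the top `size` vertices. The names `personalized_katz` and `local_community` are hypothetical.

```python
import math


def personalized_katz(adj, seed, alpha, tol=1e-10, max_iter=1000):
    """Jacobi iteration for (I - alpha*A) x = e_seed on an undirected graph.

    With no self-loops, D = I and R = -alpha*A, so the Jacobi update is
    x <- b + alpha*A x. Convergence holds when alpha < 1 / (maximum degree).
    """
    b = {v: 1.0 if v == seed else 0.0 for v in adj}
    x = {v: 0.0 for v in adj}
    for _ in range(max_iter):
        x = {v: b[v] + alpha * sum(x[u] for u in adj[v]) for v in adj}
        # residual r = b - (I - alpha*A) x
        r = [b[v] - x[v] + alpha * sum(x[u] for u in adj[v]) for v in adj]
        if math.sqrt(sum(ri * ri for ri in r)) <= tol:
            break
    return x


def local_community(adj, seed, size, alpha=None):
    """Return the `size` highest-scoring vertices (hypothetical cutoff rule)."""
    if alpha is None:
        alpha = 0.5 / max(len(nbrs) for nbrs in adj.values())
    scores = personalized_katz(adj, seed, alpha)
    ranked = sorted(adj, key=lambda v: scores[v], reverse=True)
    return set(ranked[:size])


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(local_community(adj, seed=0, size=3))  # {0, 1, 2}
```

Unlike greedy expansion, this needs one linear solve instead of repeated fitness evaluations over the neighborhood of a growing set, which is consistent with the speedups reported in Figure 1.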

**Figure 2.** Synthetic dynamic graph showing merging and splitting of communities: (**a**) $t=1$; (**b**) $t=2$; (**c**) $t=3$; (**d**) $t=4$.

**Figure 3.** Performance and quality of the dynamic algorithm compared to static recomputation over time: (**a**) speedup in iterations over time for batch size $b = 10$; (**b**) ratio of conductance scores over time for batch size $b = 100$.
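The dynamic update itself is not reproduced in this excerpt. One simple mechanism that illustrates where a speedup in iterations can come from is warm-starting: after a batch of edge insertions, the Jacobi solve restarts from the previous solution instead of from zero. The sketch below shows only that idea and is not presented as the authors' algorithm. `katz_solve` and `insert_edges` are hypothetical names, and $\alpha = 0.1$ is an assumed value chosen to stay below 1/(maximum degree) after the update.

```python
import math


def katz_solve(adj, seed, alpha, x0=None, tol=1e-10, max_iter=10_000):
    """Jacobi iteration for (I - alpha*A) x = e_seed, optionally warm-started.

    Returns (x, iterations). Vertices missing from x0 start at 0.
    """
    b = {v: 1.0 if v == seed else 0.0 for v in adj}
    x = {v: (x0.get(v, 0.0) if x0 is not None else 0.0) for v in adj}

    def residual_norm(x):
        # ||b - (I - alpha*A) x||_2
        return math.sqrt(sum(
            (b[v] - x[v] + alpha * sum(x[u] for u in adj[v])) ** 2 for v in adj))

    iters = 0
    while residual_norm(x) > tol and iters < max_iter:
        x = {v: b[v] + alpha * sum(x[u] for u in adj[v]) for v in adj}
        iters += 1
    return x, iters


def insert_edges(adj, batch):
    """Return a copy of adj with a batch of undirected edges added."""
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
    for u, v in batch:
        new_adj.setdefault(u, set()).add(v)
        new_adj.setdefault(v, set()).add(u)
    return new_adj


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
alpha = 0.1  # assumed; max degree is 4 after the update, so alpha < 1/4
x_old, _ = katz_solve(adj, 0, alpha)
adj_new = insert_edges(adj, [(2, 4)])
x_cold, cold_iters = katz_solve(adj_new, 0, alpha)            # static recomputation
x_warm, warm_iters = katz_solve(adj_new, 0, alpha, x0=x_old)  # reuse old solution
print(cold_iters, warm_iters)
```

A small batch only perturbs the residual near the inserted edges, so the warm start begins much closer to the new solution and typically needs fewer iterations to reach the same tolerance.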

| Contribution | Main Results | Section |
|---|---|---|
| New method of identifying local communities using personalized centrality metrics | Comparisons to a modified version of greedy seed set expansion | 4 |
| | High recall values when comparing our method to ground truth on stochastic block model graphs | |
| | Speedups of several orders of magnitude obtained using our method | |
| Dynamic algorithm to identify local communities in evolving networks | Recalls of over 0.80 for synthetic networks showing community evolution | 5 |
| | Execution-time speedups of over 60× compared to static recomputation on real graphs | |
| | Good quality of communities returned by our dynamic method with respect to ratios of conductance and normalized edge cut | |
| | Quality of communities is preserved over time on real graphs | |
| | Comparisons using multiple seeds show that our method is robust to using many seeds | |
| Numerical theory to guarantee the accuracy of an approximate solution to a centrality metric | Development of a new stopping criterion for iterative solvers that terminates once rankings can be guaranteed to a desired precision | 6 |
| | Speedups obtained by using the new stopping criterion instead of running to a preset tolerance | |
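The stopping criterion of Section 6 (Theorem 1) is not reproduced in this excerpt. The sketch below illustrates the general idea behind stopping once a ranking is guaranteed, under a standard bound rather than the authors' exact theorem: if every entry of the approximate solution is within $\epsilon$ of the true one, and the $k$-th and $(k+1)$-th largest approximate scores differ by more than $2\epsilon$, the top-$k$ set is certain. Here $\epsilon$ comes from the residual via a Neumann-series bound, $\lVert (I-\alpha A)^{-1} \rVert_\infty \le 1/(1 - \alpha d_{max})$, valid when $\alpha d_{max} < 1$. The name `katz_topk_certified` is hypothetical.

```python
def katz_topk_certified(adj, seed, k, alpha, max_iter=10_000):
    """Jacobi iteration for (I - alpha*A) x = e_seed that stops as soon as the
    top-k set is certified by a residual-based error bound (illustrative rule).
    Returns (top_k_set, iterations).
    """
    d_max = max(len(nbrs) for nbrs in adj.values())
    if alpha * d_max >= 1:
        raise ValueError("need alpha < 1 / max degree for the bound below")
    b = {v: 1.0 if v == seed else 0.0 for v in adj}
    x = {v: 0.0 for v in adj}
    for it in range(1, max_iter + 1):
        x = {v: b[v] + alpha * sum(x[u] for u in adj[v]) for v in adj}
        r_inf = max(abs(b[v] - x[v] + alpha * sum(x[u] for u in adj[v])) for v in adj)
        # ||x* - x||_inf <= ||(I - alpha*A)^{-1}||_inf * ||r||_inf
        #               <= ||r||_inf / (1 - alpha * d_max)
        eps = r_inf / (1 - alpha * d_max)
        ranked = sorted(adj, key=lambda v: x[v], reverse=True)
        if k >= len(ranked) or x[ranked[k - 1]] - x[ranked[k]] > 2 * eps:
            # every true score inside the top k exceeds every true score outside it
            return set(ranked[:k]), it
    raise RuntimeError("top-k set not certified within max_iter iterations")


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(katz_topk_certified(adj, seed=0, k=3, alpha=0.1))  # ({0, 1, 2}, 2)
```

Here the top-3 set is certified after two iterations, whereas driving the residual below a preset tolerance such as $10^{-10}$ would take many more; that difference is the kind of saving summarized in the last row of the table.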

**Table 2.** The quality of communities detected with our personalized Katz method and greedy expansion is shown. Test graphs are stochastic block model (SBM) graphs with $n=1000$ and $k=2$. (a) The average vertex degree $d$ is varied, while $\rho =0.01$. (b) The proportion of inter-community edges $\rho$ is varied, while $d=20$. (c) The proportion of inter-community edges $\rho$ is varied, while $d=100$.

(a)

| Avg. Degree | $\rho$ | Katz Recall (Min) | Katz Recall (Mean) | Katz Recall (Max) | Greedy Recall (Min) | Greedy Recall (Mean) | Greedy Recall (Max) | Forced Greedy Recall (Min) | Forced Greedy Recall (Mean) | Forced Greedy Recall (Max) |
|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 0.01 | 0.688 | 0.936 | 0.974 | 0.004 | 0.015 | 0.034 | 0.024 | 0.924 | 1.000 |
| 10 | 0.01 | 0.920 | 0.988 | 0.998 | 0.002 | 0.104 | 1.000 | 0.002 | 0.970 | 1.000 |
| 20 | 0.01 | 0.974 | 0.997 | 1.000 | 0.002 | 0.902 | 1.000 | 0.002 | 0.990 | 1.000 |
| 50 | 0.01 | 0.994 | 0.999 | 1.000 | 0.002 | 0.990 | 1.000 | 0.002 | 0.990 | 1.000 |
| 100 | 0.01 | 0.990 | 0.998 | 1.000 | 0.002 | 0.990 | 1.000 | 0.002 | 0.990 | 1.000 |
| 250 | 0.01 | 1.000 | 1.000 | 1.000 | 0.002 | 0.990 | 1.000 | 0.002 | 0.990 | 1.000 |
| 490 | 0.01 | 1.000 | 1.000 | 1.000 | 0.002 | 0.990 | 1.000 | 0.002 | 0.990 | 1.000 |

(b)

| Avg. Degree | $\rho$ | Katz Recall (Min) | Katz Recall (Mean) | Katz Recall (Max) | Greedy Recall (Min) | Greedy Recall (Mean) | Greedy Recall (Max) | Forced Greedy Recall (Min) | Forced Greedy Recall (Mean) | Forced Greedy Recall (Max) |
|---|---|---|---|---|---|---|---|---|---|---|
| 20 | 0.01 | 0.974 | 0.997 | 1.000 | 0.002 | 0.902 | 1.000 | 0.002 | 0.990 | 1.000 |
| 20 | 0.05 | 0.806 | 0.944 | 0.988 | 0.002 | 0.852 | 1.000 | 0.002 | 0.960 | 1.000 |
| 20 | 0.1 | 0.678 | 0.833 | 0.910 | 0.002 | 0.773 | 1.000 | 0.002 | 0.869 | 1.000 |
| 20 | 0.2 | 0.502 | 0.638 | 0.730 | 0.002 | 0.603 | 0.998 | 0.008 | 0.833 | 0.998 |
| 20 | 0.3 | 0.474 | 0.551 | 0.630 | 0.002 | 0.505 | 0.932 | 0.096 | 0.655 | 0.942 |
| 20 | 0.4 | 0.456 | 0.508 | 0.542 | 0.006 | 0.354 | 0.594 | 0.416 | 0.521 | 0.604 |

(c)

| Avg. Degree | $\rho$ | Katz Recall (Min) | Katz Recall (Mean) | Katz Recall (Max) | Greedy Recall (Min) | Greedy Recall (Mean) | Greedy Recall (Max) | Forced Greedy Recall (Min) | Forced Greedy Recall (Mean) | Forced Greedy Recall (Max) |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 0.01 | 0.990 | 0.998 | 1.000 | 0.002 | 0.990 | 1.000 | 0.002 | 0.990 | 1.000 |
| 100 | 0.05 | 0.980 | 0.990 | 1.000 | 0.002 | 0.960 | 1.000 | 0.002 | 0.960 | 1.000 |
| 100 | 0.1 | 0.942 | 0.980 | 0.992 | 0.002 | 0.940 | 1.000 | 0.002 | 0.940 | 1.000 |
| 100 | 0.2 | 0.728 | 0.822 | 0.908 | 0.002 | 0.880 | 1.000 | 0.002 | 0.880 | 1.000 |
| 100 | 0.3 | 0.552 | 0.626 | 0.700 | 0.002 | 0.828 | 1.000 | 0.002 | 0.828 | 1.000 |
| 100 | 0.4 | 0.482 | 0.530 | 0.576 | 0.070 | 0.604 | 0.936 | 0.074 | 0.612 | 0.944 |
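Table 2 parameterizes its SBM test graphs by $n$, $k$, the average degree $d$, and the fraction $\rho$ of inter-community edges. The exact generator is not described in this excerpt, so the sketch below is one way to realize those parameters under stated assumptions: $k$ contiguous, near-equal blocks; exactly $m = nd/2$ distinct edges; $\rho m$ of them placed between blocks; and pairs drawn uniformly by rejection sampling. The name `sbm_graph` is hypothetical.

```python
import random


def sbm_graph(n, k, d, rho, seed=0):
    """Sample an SBM-style graph with k near-equal blocks, n*d/2 edges, and a
    fraction rho of the edges placed between blocks. Returns (block, edges)."""
    if n < 2 * k:
        raise ValueError("each block needs at least two vertices")
    rng = random.Random(seed)
    block = [v * k // n for v in range(n)]  # contiguous, near-equal blocks
    members = [[v for v in range(n) if block[v] == c] for c in range(k)]
    m = n * d // 2                          # average degree d  =>  m = n*d/2
    m_inter = round(rho * m)
    m_intra = m - m_inter
    max_intra = sum(len(s) * (len(s) - 1) // 2 for s in members)
    max_inter = (n * n - sum(len(s) ** 2 for s in members)) // 2
    if m_intra > max_intra or m_inter > max_inter:
        raise ValueError("parameters cannot be realized without duplicate edges")
    intra, inter = set(), set()
    while len(intra) < m_intra:
        # pick a block uniformly (fine for near-equal blocks), then a distinct pair
        u, v = rng.sample(members[rng.randrange(k)], 2)
        intra.add((min(u, v), max(u, v)))
    while len(inter) < m_inter:
        c1, c2 = rng.sample(range(k), 2)    # two different blocks
        u, v = rng.choice(members[c1]), rng.choice(members[c2])
        inter.add((min(u, v), max(u, v)))
    return block, intra | inter


block, edges = sbm_graph(n=1000, k=2, d=20, rho=0.01)
print(len(edges), sum(block[u] != block[v] for u, v in edges))  # 10000 100
```

Fixing the edge counts exactly, rather than drawing each pair independently with probabilities $p_{in}$ and $p_{out}$, keeps $d$ and $\rho$ exact for every sample; the $p_{in}$/$p_{out}$ form used for Table 3 is the other common parameterization.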

**Table 3.** Average recalls at each point in time for synthetic merging and splitting of communities over time.

| Parameters | Batch Size | Block Size 100: t = 1 | t = 2 | t = 3 | t = 4 | Block Size 1000: t = 1 | t = 2 | t = 3 | t = 4 |
|---|---|---|---|---|---|---|---|---|---|
| R = | | 100 | 200 | 200 | 100 | 1000 | 2000 | 2000 | 1000 |
| ${p}_{in}$ = 0.2, ${p}_{out}$ = 0.01 | 10 | 0.86 | 0.94 | 0.93 | 0.85 | 0.93 | 0.97 | 0.98 | 0.98 |
| | 100 | 0.76 | 0.89 | 0.89 | 0.75 | 0.97 | 0.99 | 0.99 | 0.97 |
| | 1000 | 0.76 | 0.84 | 0.84 | 0.66 | 0.93 | 0.97 | 0.97 | 0.93 |
| ${p}_{in}$ = 0.5, ${p}_{out}$ = 0.01 | 10 | 0.92 | 0.96 | 0.97 | 0.92 | 0.96 | 0.98 | 0.99 | 0.99 |
| | 100 | 0.79 | 0.89 | 0.90 | 0.78 | 0.95 | 0.98 | 0.98 | 0.95 |
| | 1000 | 0.88 | 0.91 | 0.91 | 0.82 | 0.96 | 0.99 | 0.99 | 0.96 |
| Average | | 0.83 | 0.91 | 0.91 | 0.80 | 0.95 | 0.98 | 0.98 | 0.96 |

**Table 4.** Real graphs used in experiments. Columns are graph name, number of vertices, and number of edges.

| Graph | $\left|\mathit{V}\right|$ | $\left|\mathit{E}\right|$ |
|---|---|---|
| slashdot-threads | 51,083 | 140,778 |
| enron | 87,221 | 1,148,072 |
| digg | 279,630 | 1,731,653 |
| wiki-talk | 541,355 | 2,424,962 |
| youtube-u-growth | 3,223,589 | 9,375,374 |

**Table 5.** Average summary statistics over time on real graphs for all batch sizes. Columns are graph name, batch size, the performance measures speedup in time ($T_S/T_D$) and speedup in iterations ($I_S/I_D$), and the quality measures recall, ratio of conductance scores ($\varphi_S/\varphi_D$), and ratio of normalized edge cut scores ($f_S/f_D$). Subscripts $S$ and $D$ denote static recomputation and the dynamic algorithm.

| Graph | Batch Size | $T_S/T_D$ | $I_S/I_D$ | Recall | $\varphi_S/\varphi_D$ | $f_S/f_D$ |
|---|---|---|---|---|---|---|
| slashdot-threads | 10 | 52.94× | 34.02× | 0.93 | 0.99 | 1.03 |
| | 100 | 26.88× | 21.46× | 0.96 | 1.00 | 1.01 |
| | 1000 | 39.65× | 31.09× | 0.96 | 1.00 | 1.00 |
| enron | 10 | 75.42× | 45.04× | 0.97 | 1.00 | 1.00 |
| | 100 | 63.61× | 41.28× | 0.98 | 1.01 | 0.98 |
| | 1000 | 46.20× | 29.57× | 0.96 | 1.01 | 0.98 |
| digg | 10 | 54.29× | 29.41× | 0.86 | 0.97 | 1.18 |
| | 100 | 47.64× | 25.69× | 0.90 | 0.98 | 1.07 |
| | 1000 | 50.64× | 26.87× | 0.97 | 0.99 | 1.02 |
| wiki-talk | 10 | 56.02× | 36.68× | 0.95 | 1.00 | 1.02 |
| | 100 | 48.87× | 31.46× | 0.91 | 0.99 | 1.19 |
| | 1000 | 56.22× | 36.95× | 0.96 | 1.00 | 1.02 |
| youtube-u-growth | 10 | 56.47× | 27.66× | 0.96 | 1.00 | 0.94 |
| | 100 | 50.00× | 26.58× | 0.96 | 1.00 | 1.00 |
| | 1000 | 40.17× | 20.44× | 0.91 | 1.00 | 0.92 |
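The quality columns compare communities found by the dynamic algorithm with those from static recomputation. The sketch below computes two of these measures on an undirected graph stored as a dictionary of neighbor sets. Conductance uses the standard definition $\varphi(C) = \mathrm{cut}(C)/\min(\mathrm{vol}(C), \mathrm{vol}(V \setminus C))$; recall is assumed to be the fraction of the reference community recovered, $|C \cap C^{*}|/|C^{*}|$. The normalized edge cut is omitted because its exact definition is not given in this excerpt. The function names are hypothetical.

```python
def recall(found, truth):
    """Assumed definition: fraction of the reference community recovered."""
    return len(set(found) & set(truth)) / len(truth)


def conductance(adj, C):
    """Standard conductance: cut(C) / min(vol(C), vol(rest of the graph))."""
    C = set(C)
    cut = sum(1 for v in C for u in adj[v] if u not in C)  # edges leaving C
    vol_c = sum(len(adj[v]) for v in C)                    # sum of degrees in C
    vol_rest = sum(len(nbrs) for nbrs in adj.values()) - vol_c
    denom = min(vol_c, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined for an empty side")
    return cut / denom


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
static_c, dynamic_c = {0, 1, 2}, {0, 1, 2}
print(conductance(adj, static_c) / conductance(adj, dynamic_c))  # ratio 1.0
print(recall(dynamic_c, static_c))  # 1.0
```

A ratio $\varphi_S/\varphi_D$ near 1.00, as in most rows of the table, means the dynamic communities cut about as few edges relative to their volume as the statically recomputed ones.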

**Table 6.** Results for different seeding methods, averaged over all graphs. For each seeding method, rows give speedup in time, speedup in iterations, recall, ratio of conductance scores, and ratio of normalized edge cut scores; columns give the number of seeds.

| Measure | Method | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ${T}_{S}/{T}_{D}$ | RW-1 | 46.9× | 54.4× | 49.3× | 41.7× | 39.4× | 30.3× | 32.4× | 47.3× | 41.5× | 29.3× |
| | RW-2 | 33.7× | 66.1× | 42.8× | 51.5× | 57.0× | 52.1× | 50.6× | 46.1× | 53.2× | 39.0× |
| | RW-3 | 44.5× | 53.4× | 54.0× | 44.3× | 53.6× | 44.5× | 53.0× | 63.2× | 68.5× | 47.8× |
| ${I}_{S}/{I}_{D}$ | RW-1 | 29.4× | 30.9× | 29.8× | 24.6× | 24.5× | 24.4× | 21.0× | 29.2× | 25.3× | 22.3× |
| | RW-2 | 20.4× | 37.3× | 24.4× | 30.9× | 31.8× | 29.2× | 29.0× | 28.4× | 30.1× | 24.1× |
| | RW-3 | 26.0× | 29.8× | 31.9× | 27.9× | 33.4× | 27.4× | 30.9× | 38.2× | 37.0× | 29.9× |
| Recall | RW-1 | 0.99 | 0.98 | 1.00 | 0.98 | 1.00 | 1.00 | 0.99 | 0.98 | 1.00 | 1.00 |
| | RW-2 | 0.96 | 0.98 | 0.95 | 0.99 | 0.96 | 0.99 | 1.00 | 0.99 | 0.99 | 0.99 |
| | RW-3 | 0.93 | 0.97 | 0.95 | 0.99 | 0.98 | 0.99 | 0.99 | 0.98 | 0.99 | 0.99 |
| ${\varphi}_{S}/{\varphi}_{D}$ | RW-1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | RW-2 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | RW-3 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| ${f}_{S}/{f}_{D}$ | RW-1 | 0.99 | 0.98 | 1.00 | 1.00 | 1.00 | 1.00 | 0.94 | 1.01 | 0.98 | 1.00 |
| | RW-2 | 1.00 | 0.97 | 0.99 | 1.01 | 1.03 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | RW-3 | 1.03 | 0.99 | 1.01 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.01 |

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

