# Link Pruning for Community Detection in Social Networks

## Abstract

## 1. Introduction

- We observe that the attributes of links within a cluster (i.e., internal links) follow a different pattern from those of links that connect different clusters (i.e., external links).
- We develop a new community detection algorithm that exploits this difference by removing the links with the lowest attribute values before clustering.
- Using a random graph model with planted clusters, we prove that link pruning enhances community detection.
- We empirically show that the proposed algorithm achieves higher accuracy than the existing algorithms, especially when the pruning rate increases.

## 2. Related Work

#### 2.1. Link Attributes

#### 2.2. Community Detection

#### 2.3. Graph Sparsification

## 3. Our Proposed Framework

#### 3.1. Calculation of Link Attributes

#### 3.1.1. Jaccard’s Index
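
As a minimal sketch of this attribute, the snippet below computes Jaccard's index of a link $\{u,v\}$ as $|N(u)\cap N(v)|/|N(u)\cup N(v)|$, where $N(\cdot)$ is the neighbor set. Using open neighborhoods (a node is not its own neighbor) is an assumption here, and the function name `jaccard_index` and the toy graph are hypothetical illustrations, not taken from the paper.

```python
def jaccard_index(adj, u, v):
    """Jaccard's index of link {u, v}: |N(u) & N(v)| / |N(u) | N(v)|."""
    neighbors_u, neighbors_v = adj[u], adj[v]
    union = neighbors_u | neighbors_v
    if not union:  # only possible for a pair of isolated nodes, not a real link
        return 0.0
    return len(neighbors_u & neighbors_v) / len(union)


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the link c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}

internal = jaccard_index(adj, "a", "b")  # link inside a triangle
external = jaccard_index(adj, "c", "d")  # link bridging the two triangles
print(internal, external)
```

In this toy graph the bridging link shares no neighbors, so its index is lower than that of the link inside a triangle.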

#### 3.1.2. Number of Common Triangles
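
A short sketch of this attribute, assuming it counts the triangles that contain a given link, which equals the number of neighbors shared by its two endpoints. The function name `common_triangles` and the toy graph are hypothetical.

```python
def common_triangles(adj, u, v):
    """Number of triangles containing link {u, v} (= shared neighbors)."""
    return len(adj[u] & adj[v])


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the link c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}

inside = common_triangles(adj, "a", "b")  # link inside a triangle
bridge = common_triangles(adj, "c", "d")  # link bridging the two triangles
print(inside, bridge)
```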

#### 3.1.3. Forman–Ricci Curvature
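
The sketch below uses one common simplified form of Forman–Ricci curvature for unweighted graphs, $F(u,v) = 4 - \deg(u) - \deg(v)$. Whether the paper uses this form or a variant (for example, a weighted or triangle-augmented one) is not shown in this section, so the formula choice is an assumption, as are the function name `forman_ricci` and the toy graph.

```python
def forman_ricci(adj, u, v):
    """Simplified Forman-Ricci curvature of an unweighted link {u, v}.

    Assumed form: F(u, v) = 4 - deg(u) - deg(v).
    """
    return 4 - len(adj[u]) - len(adj[v])


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the link c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}

inside = forman_ricci(adj, "a", "b")  # both endpoints have degree 2
bridge = forman_ricci(adj, "c", "d")  # both endpoints have degree 3
print(inside, bridge)
```

Under this form, links between high-degree nodes receive strongly negative values.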

#### 3.2. Algorithm for Clustering Using Link Pruning

#### 3.2.1. Proposed Algorithm

**Algorithm 1:** Clustering with Link Pruning.

**Input:** (i) a graph $G=(V,E)$; (ii) a link attribute $attribute$; (iii) a pruning rate $\alpha$; (iv) a clustering algorithm $A$.

**Output:** a set of clusters $C$ of $V$.

- Step 1 (calculate the link attributes): for each $\{u,v\}\in E$, calculate $attribute(u,v)$.
- Step 2 (sort the link attributes): sort ${\left(attribute(u,v)\right)}_{\{u,v\}\in E}$.
- Step 3 (prune low-value links): ${G}^{*}\leftarrow$ remove the smallest $100\alpha\%$ of links from $G$.
- Step 4 (detect communities in the transformed graph): $C\leftarrow$ apply $A$ to ${G}^{*}$.
- Return $C$.
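
The four steps of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names are hypothetical, Jaccard's index with open neighborhoods stands in for $attribute$, the number of removed links is rounded down to $\lfloor \alpha |E| \rfloor$, ties keep their input order, and connected components stand in for the clustering algorithm $A$ (any community detection algorithm, such as Louvain, could be passed instead).

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each node to the set of its neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def jaccard(adj, u, v):
    """Example link attribute: Jaccard's index of the endpoint neighborhoods."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def connected_components(nodes, edges):
    """Stand-in for clustering algorithm A: one cluster per component."""
    adj = build_adjacency(edges)
    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(component)
    return clusters


def cluster_with_pruning(nodes, edges, attribute, alpha, algorithm):
    adj = build_adjacency(edges)
    # Step 1: calculate the link attributes.
    scores = {edge: attribute(adj, *edge) for edge in edges}
    # Step 2: sort links by attribute value, smallest first (stable sort).
    ranked = sorted(edges, key=scores.get)
    # Step 3: prune the lowest-valued links (assumed: floor(alpha * |E|)).
    n_remove = int(alpha * len(edges))
    kept = ranked[n_remove:]
    # Step 4: detect communities in the pruned graph.
    return algorithm(nodes, kept)


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the link c-d.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]

clusters = cluster_with_pruning(nodes, edges, jaccard, 0.15, connected_components)
print(clusters)
```

With $\alpha = 0.15$ and seven links, exactly one link is removed. The bridging link c-d has the lowest Jaccard value, so pruning it leaves the two triangles as separate clusters.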

#### 3.2.2. Theoretical Analysis

## 4. Experiments

#### 4.1. Datasets

#### 4.1.1. Synthetic Networks

#### 4.1.2. Real-World Networks

#### 4.2. Link Attribute Distribution

#### 4.2.1. Synthetic Networks

#### 4.2.2. Real-World Networks

#### 4.3. Community Detection

#### 4.4. Synthetic Networks

#### 4.5. Real-World Networks

#### 4.6. Hybrid Method

#### 4.7. Verification of Efficiency

## 5. Conclusions

**Figure 1.** Distribution of link attributes in synthetic networks: (**a**) Jaccard; (**b**) CommonTriangles; and (**c**) Forman–Ricci.

**Figure 2.** Distribution of link attributes in Football: (**a**) Jaccard; (**b**) CommonTriangles; and (**c**) Forman–Ricci.

**Figure 3.** Graph sparsification after link pruning with Jaccard’s index: (**a**) pruning rate 0 (original graph); (**b**) pruning rate 0.2; and (**c**) pruning rate 0.4.

**Figure 4.** NMI values when varying $\alpha$: (**a**) mixing 0.1; (**b**) mixing 0.2; (**c**) mixing 0.3; (**d**) mixing 0.4; (**e**) mixing 0.5; and (**f**) mixing 0.6.

**Figure 5.** Modularity values when varying $\alpha$: (**a**) Karate; (**b**) Football; (**c**) Twitter; (**d**) DBLP; (**e**) Amazon; and (**f**) YouTube.

Dataset | # of Nodes | # of Links | Clustering Coefficient |
---|---|---|---|
Karate | 34 | 78 | 0.571 |
Football | 115 | 613 | 0.403 |
Twitter | 348 | 4831 | 0.475 |
DBLP | 317,080 | 1,049,866 | 0.632 |
Amazon | 334,863 | 925,872 | 0.397 |
YouTube | 1,134,890 | 2,987,624 | 0.081 |

Dataset | Link Type | Jaccard | CommonTriangles | Forman–Ricci |
---|---|---|---|---|
Karate | intra | $0.34\pm 0.13$ | $\mathbf{3.90\pm 1.63}$ | $\mathbf{-5.63\pm 5.25}$ |
Karate | inter | $\mathbf{0.46\pm 0.15}$ | $2.73\pm 1.05$ | $-10.7\pm 5.64$ |
Football | intra | $\mathbf{0.47\pm 0.11}$ | $\mathbf{7.52\pm 1.28}$ | $\mathbf{-1.06\pm 3.56}$ |
Football | inter | $0.17\pm 0.11$ | $3.16\pm 1.60$ | $-13.7\pm 5.11$ |
Twitter | intra | $\mathbf{0.26\pm 0.11}$ | $\mathbf{21.1\pm 13.0}$ | $\mathbf{-39.4\pm 28.8}$ |
Twitter | inter | $0.16\pm 0.07$ | $16.5\pm 10.3$ | $-67.3\pm 25.2$ |
DBLP | intra | $0.36\pm 0.31$ | $\mathbf{6.95\pm 13.53}$ | $\mathbf{-21.96\pm 40.0}$ |
DBLP | inter | $\mathbf{0.43\pm 0.30}$ | $0.36\pm 0.31$ | $0.36\pm 0.31$ |
Amazon | intra | $\mathbf{0.31\pm 0.20}$ | $\mathbf{2.22\pm 2.30}$ | $-12.49\pm 25.70$ |
Amazon | inter | $0.29\pm 0.20$ | $1.66\pm 1.81$ | $\mathbf{-12.32\pm 25.22}$ |
YouTube | intra | $0.06\pm 0.09$ | $\mathbf{9.49\pm 21.07}$ | $\mathbf{-472.96\pm 691.84}$ |
YouTube | inter | $\mathbf{0.09\pm 0.15}$ | $2.78\pm 12.14$ | $-998.67\pm 3324.71$ |

Pruning Rate ($\alpha$) | Louvain | LPA | Infomap | Fastgreedy | Walktrap |
---|---|---|---|---|---|
0 | 4.94 | 28.31 | 1681.35 | 308.87 | 1495.25 |
0.1 | 4.59 | 22.49 | 1332.08 | 137.12 | 1398.27 |
0.2 | 4.42 | 14.81 | 903.47 | 32.16 | 1392.60 |
0.3 | 4.32 | 12.19 | 10504.02 | 63.47 | 1199.82 |

Pruning Rate ($\alpha$) | Louvain | LPA | Infomap | Fastgreedy | Walktrap |
---|---|---|---|---|---|
0 | 0.926 | 0.786 | 0.825 | 0.867 | 0.849 |
0.1 | 0.911 | 0.760 | 0.803 | 0.889 | 0.811 |
0.2 | 0.892 | 0.741 | 0.782 | 0.886 | 0.793 |
0.3 | 0.873 | 0.725 | 0.767 | 0.868 | 0.783 |

Pruning Rate ($\alpha$) | Louvain | LPA | Infomap | Fastgreedy | Walktrap |
---|---|---|---|---|---|
0 | 0.0012 | 0.0033 | 0.0029 | 0.0015 | 0.0028 |
0.1 | 0.0011 | 0.0033 | 0.0029 | 0.0011 | 0.0029 |
0.2 | 0.0012 | 0.0033 | 0.0031 | 0.0012 | 0.0031 |
0.3 | 0.0028 | 0.0033 | 0.0031 | 0.0028 | 0.0028 |

