# Finding Influential Users in Social Media Using Association Rule Learning

## Abstract

## 1. Introduction

## 2. Related Work

## 3. Association Rule Learning

#### 3.1. Evaluation Metrics

#### 3.2. Usage of the Eclat Algorithm

## 4. Data Model

#### Data Selection

## 5. Experiments and Results

#### 5.1. Item-Sets and Rules

#### 5.2. Verification of Learned Rules

#### 5.3. Identifying and Verifying Influential Users Using Social Network Analysis

## 6. Discussion

## 7. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

**Figure 1.** Combined plot of the number of occurrences of each item-set (Frequency) against the number of users in the rule (Length). The upper and right axes show histograms of the respective distributions.

**Figure 2.** Distribution of values in learned association rules: (**a**) support distribution; (**b**) confidence distribution; (**c**) lift distribution; (**d**) conviction distribution.

Type | Mean | Std. | Min | $Q1$ | Median | $Q3$ | Max |
---|---|---|---|---|---|---|---|
Users | 69,678 | 130,564 | 152 | 4282 | 17,995 | 62,194 | 675,200 |
Posts | 7431 | 19,329 | 18 | 784 | 2157 | 5758 | 161,264 |
Comments | 147,721 | 264,711 | 577 | 7886 | 33,437 | 133,421 | 1,340,730 |

Evaluation Metric | Mean | Median | Std. |
---|---|---|---|
Support | 0.05 | 0.02 | 0.07 |
Confidence | 0.43 | 0.33 | 0.33 |
Lift | 18.97 | 9.38 | 24.64 |
Conviction | 1.83 | 1.32 | 1.18 |

Rule | Confidence | Lift | Conviction |
---|---|---|---|
**Confidence** | | | |
$\{u_{179},\, u_{538},\, u_{580},\, u_{938},\, u_{992},\, u_{1090}\} \Rightarrow \{u_{11}\}$ | 1.00 | 10.17 | ∞ |
$\{u_{11},\, u_{31},\, u_{80},\, u_{179},\, u_{992},\, u_{1093}\} \Rightarrow \{u_{580}\}$ | 1.00 | 4.80 | ∞ |
$\{u_{11},\, u_{31},\, u_{179},\, u_{580},\, u_{992},\, u_{1093}\} \Rightarrow \{u_{80}\}$ | 1.00 | 9.53 | ∞ |
$\{u_{11},\, u_{179},\, u_{538},\, u_{580},\, u_{938},\, u_{953}\} \Rightarrow \{u_{429}\}$ | 1.00 | 4.84 | ∞ |
$\{u_{179},\, u_{1094},\, u_{1096},\, u_{1113},\, u_{1171},\, u_{1352}\} \Rightarrow \{u_{1378}\}$ | 1.00 | 101.67 | ∞ |
**Lift** | | | |
$\{u_{580},\, u_{861},\, u_{1352},\, u_{1466}\} \Rightarrow \{u_{896},\, u_{1291}\}$ | 1.00 | 152.50 | ∞ |
$\{u_{580},\, u_{861},\, u_{1291},\, u_{1352}\} \Rightarrow \{u_{896},\, u_{1466}\}$ | 1.00 | 152.50 | ∞ |
$\{u_{31},\, u_{80},\, u_{179},\, u_{580}\} \Rightarrow \{u_{11},\, u_{992},\, u_{1093}\}$ | 1.00 | 152.50 | ∞ |
$\{u_{19},\, u_{64},\, u_{673},\, u_{685}\} \Rightarrow \{u_{54},\, u_{581}\}$ | 1.00 | 152.50 | ∞ |
$\{u_{580},\, u_{861},\, u_{1291},\, u_{1466}\} \Rightarrow \{u_{896},\, u_{1352}\}$ | 1.00 | 152.50 | ∞ |
**Conviction** | | | |
$\{u_{429},\, u_{578}\} \Rightarrow \{u_{19}\}$ | 0.95 | 3.93 | 16.66 |
$\{u_{920}\} \Rightarrow \{u_{179}\}$ | 0.95 | 4.27 | 16.32 |
$\{u_{929}\} \Rightarrow \{u_{179}\}$ | 0.95 | 4.26 | 15.54 |
$\{u_{580},\, u_{1093}\} \Rightarrow \{u_{179}\}$ | 0.94 | 4.22 | 13.21 |
$\{u_{580},\, u_{938}\} \Rightarrow \{u_{179}\}$ | 0.94 | 4.22 | 13.21 |
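The four measures reported for each rule can be computed directly from set counts. The following is a minimal sketch, not the authors' implementation: it assumes each post is a transaction holding the set of users who interacted with it (the layout suggested by Table 5), it uses the standard textbook definitions of the measures, and `rule_metrics` is a hypothetical helper name. A rule with confidence 1 yields infinite conviction, which is how the ∞ entries in the rule table above arise.

```python
import math


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift, conviction) for antecedent => consequent.

    `transactions` is an iterable of sets of user ids (assumed layout:
    one set per post). Standard definitions are used throughout.
    """
    transactions = [frozenset(t) for t in transactions]
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    if n == 0:
        raise ValueError("no transactions given")

    # Count transactions containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    support = n_ac / n                # supp(X u Y)
    confidence = n_ac / n_a           # supp(X u Y) / supp(X)
    supp_c = n_c / n                  # supp(Y)
    lift = confidence / supp_c        # conf(X => Y) / supp(Y)
    # Conviction is (1 - supp(Y)) / (1 - conf); it diverges when conf == 1.
    if n_ac == n_a:
        conviction = math.inf
    else:
        conviction = (1 - supp_c) / (1 - confidence)
    return support, confidence, lift, conviction


posts = [{"A", "B", "C", "D"}, {"A", "B", "C"}, {"F", "G", "H"}, {"D", "E"}]
print(rule_metrics(posts, {"A", "B", "C"}, {"D"}))
print(rule_metrics(posts, {"E"}, {"D"}))
```

The toy posts reuse the four example posts of Table 5; real transaction data would replace them.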

**Table 4.** Descriptive statistics of learned rules with Confidence ⩾ 95% from the complete dataset.

Evaluation Metric | Mean | Std. | Min | $Q1$ | Median | $Q3$ | Max |
---|---|---|---|---|---|---|---|
No. of rules | 33,426.89 | 87,457.39 | 2.00 | 151.00 | 2351.00 | 32,053.50 | 724,510.00 |
Confidence | 1.00 | 0.00 | 0.97 | 1.00 | 1.00 | 1.00 | 1.00 |
Lift | 38.06 | 42.14 | 1.41 | 10.86 | 25.34 | 47.91 | 217.53 |
Conviction | 19.39 | 4.61 | 5.88 | 18.07 | 19.79 | 20.70 | 29.46 |

**Table 5.** Example of false positives and false negatives. Capital letters indicate users, and $P_{1}$ to $P_{4}$ correspond to different posts.

Example rule: $\{A,\, B,\, C\} \Rightarrow \{D\}$ | | |
---|---|---|
$P_{1} = \{A,\, B,\, C,\, D\}$ | ⟶ | true positive |
$P_{2} = \{A,\, B,\, C\}$ | ⟶ | false positive |
$P_{3} = \{F,\, G,\, H\}$ | ⟶ | true negative |
$P_{4} = \{D,\, E\}$ | ⟶ | false negative |
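The labelling in Table 5 can be expressed as two subset checks per post. The sketch below is an illustration under assumptions, not the authors' code: `classify_post` and `evaluate_rule` are hypothetical names, a post is treated as the set of users who interacted with it, and accuracy, precision, and recall follow their usual confusion-matrix definitions.

```python
from collections import Counter


def classify_post(post_users, antecedent, consequent):
    """Label one post against the rule antecedent => consequent (as in Table 5)."""
    post_users = set(post_users)
    has_antecedent = set(antecedent) <= post_users
    has_consequent = set(consequent) <= post_users
    if has_antecedent and has_consequent:
        return "true positive"    # rule fired and the consequent appeared
    if has_antecedent:
        return "false positive"   # rule fired but the consequent is missing
    if has_consequent:
        return "false negative"   # consequent appeared without the rule firing
    return "true negative"        # neither side of the rule is present


def evaluate_rule(posts, antecedent, consequent):
    """Aggregate per-post labels into accuracy, precision and recall."""
    counts = Counter(classify_post(p, antecedent, consequent) for p in posts)
    tp, fp = counts["true positive"], counts["false positive"]
    tn, fn = counts["true negative"], counts["false negative"]
    total = tp + fp + tn + fn
    nan = float("nan")  # returned when a ratio is undefined
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
        "recall": tp / (tp + fn) if tp + fn else nan,
    }


example_posts = {
    "P1": {"A", "B", "C", "D"},
    "P2": {"A", "B", "C"},
    "P3": {"F", "G", "H"},
    "P4": {"D", "E"},
}
for name, users in example_posts.items():
    print(name, classify_post(users, {"A", "B", "C"}, {"D"}))
print(evaluate_rule(example_posts.values(), {"A", "B", "C"}, {"D"}))
```

Running it on the four example posts reproduces the labels listed in Table 5.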

**Table 6.** Testing of learned rules based on an 80/20% learn/test split. SD stands for standard deviation.

Evaluation Metric | OccupyTogether | OccupyTogether $^{a}$ | All pages (SD) | All pages $^{a}$ (SD) |
---|---|---|---|---|
No. of rules | 46,170 | 4469 | 99,237 (248,968) | 7092 (14,965) |
Accuracy | 0.886 | 0.927 | 0.858 (0.135) | 0.906 (0.128) |
Precision | 0.291 | 0.794 | 0.286 (0.287) | 0.633 (0.343) |
Recall | 0.071 | 0.017 | 0.138 (0.193) | 0.165 (0.258) |

Percent of Top Users | Users | Degree ∩ ASR | Page Rank ∩ ASR | Page Rank ∩ Degree |
---|---|---|---|---|
1 % | 4 | 0.75 | 0.75 | 0.75 |
5 % | 20 | 0.45 | 0.45 | 0.95 |
10 % | 41 | 0.488 | 0.512 | 0.927 |
25 % | 104 | 0.462 | 0.49 | 0.923 |
50 % | 209 | 0.512 | 0.526 | 0.947 |
75 % | 313 | 0.502 | 0.556 | 0.92 |
100 % | 418 | 0.517 | 0.565 | 0.928 |
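The intersection columns above read as the share of the top-$k$ users that two rankings have in common. The sketch below illustrates that reading and should be taken as an assumption rather than the authors' procedure: `top_k_size` and `top_k_overlap` are hypothetical names, $k$ is taken as the floor of the given percentage of the first ranking's length (which reproduces the Users column for 418 users), and each ranking is a list of user ids ordered from most to least influential.

```python
def top_k_size(n_users, percent):
    """Number of users in the top `percent` percent, rounded down."""
    return n_users * percent // 100


def top_k_overlap(ranking_a, ranking_b, percent):
    """Fraction of the top-k users of ranking_a that also appear in the top-k of ranking_b.

    Assumption: k is derived from the length of ranking_a; the rankings
    may cover different user populations.
    """
    k = top_k_size(len(ranking_a), percent)
    if k == 0:
        raise ValueError("percentage too small for this ranking")
    shared = set(ranking_a[:k]) & set(ranking_b[:k])
    return len(shared) / k


# Toy rankings with made-up user ids, ordered from most to least influential.
by_degree = ["u1", "u2", "u3", "u4"]
by_rules = ["u2", "u1", "u9", "u4"]
print(top_k_overlap(by_degree, by_rules, 50))
print(top_k_overlap(by_degree, by_rules, 100))
print([top_k_size(418, p) for p in (1, 5, 10, 25, 50, 75, 100)])
```

Because the two rankings need not contain the same users, the overlap at 100% can stay below 1, as it does in the table.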

Percent of Top Users | Degree ∩ ASR (SD) | Page Rank ∩ ASR (SD) | Page Rank ∩ Degree (SD) |
---|---|---|---|
1 % | 0.092 (0.173) | 0.131 (0.227) | 0.822 (0.238) |
5 % | 0.081 (0.145) | 0.095 (0.158) | 0.805 (0.251) |
10 % | 0.115 (0.158) | 0.133 (0.173) | 0.830 (0.219) |
25 % | 0.181 (0.188) | 0.194 (0.198) | 0.836 (0.167) |
50 % | 0.231 (0.212) | 0.257 (0.228) | 0.848 (0.129) |
75 % | 0.266 (0.243) | 0.286 (0.249) | 0.868 (0.119) |
100 % | 0.286 (0.261) | 0.304 (0.264) | 0.886 (0.114) |
Average Rank | 3 | 2 | 1 |

**Table 9.** Paired rank comparison of intersections using the Nemenyi post hoc test. The upper triangle shows the difference between intersections; the lower triangle marks pairs with a statistically significant difference.

Compared Measures | Degree ∩ ARL | Page Rank ∩ ARL | Page Rank ∩ Degree |
---|---|---|---|
Degree ∩ ARL | - | 1.00 | 2.00 |
Page Rank ∩ ARL | - | - | 1.00 |
Page Rank ∩ Degree | $^{*}$, $^{**}$ | - | - |

Method | Mean | Std. |
---|---|---|
Degree | 329.135 | (2345.996) |
Page Rank | 633.152 | (4602.607) |
ASR | 9.033 | (22.497) |

