# Finding Influential Users in Social Media Using Association Rule Learning


## Abstract


## 1. Introduction

## 2. Related Work

## 3. Association Rule Learning

#### 3.1. Evaluation Metrics

#### 3.2. Usage of the Eclat Algorithm

## 4. Data Model

#### Data Selection

## 5. Experiments and Results

#### 5.1. Item-Sets and Rules

#### 5.2. Verification of Learned Rules

#### 5.3. Identifying and Verifying Influential Users Using Social Network Analysis

## 6. Discussion

## 7. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest


**Figure 1.** Combined plot of the number of occurrences of each item-set (Frequency) with respect to the number of users in the rule (Length). The upper and right axes show histograms of the respective distributions.

**Figure 2.** Distribution of values in learned association rules. (**a**) support distribution; (**b**) confidence distribution; (**c**) lift distribution; (**d**) conviction distribution.

Type | Mean | Std. | Min | $Q1$ | Median | $Q3$ | Max |
---|---|---|---|---|---|---|---|
Users | 69,678 | 130,564 | 152 | 4282 | 17,995 | 62,194 | 675,200 |
Posts | 7431 | 19,329 | 18 | 784 | 2157 | 5758 | 161,264 |
Comments | 147,721 | 264,711 | 577 | 7886 | 33,437 | 133,421 | 1,340,730 |

Evaluation Metric | Mean | Median | Std. |
---|---|---|---|
Support | 0.05 | 0.02 | 0.07 |
Confidence | 0.43 | 0.33 | 0.33 |
Lift | 18.97 | 9.38 | 24.64 |
Conviction | 1.83 | 1.32 | 1.18 |
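The four metrics summarized in the table above have standard textbook definitions, and a small sketch may help make them concrete. It assumes that each transaction is the set of users who interacted with one post; the function names `support` and `rule_metrics` and the toy data are illustrative and not taken from the article.

```python
from math import inf


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence, lift and conviction of antecedent => consequent."""
    x, y = frozenset(antecedent), frozenset(consequent)
    supp_x = support(x, transactions)
    supp_y = support(y, transactions)
    supp_xy = support(x | y, transactions)
    if supp_x == 0 or supp_y == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = supp_xy / supp_x
    lift = confidence / supp_y
    # Conviction is infinite when the rule never fails (confidence == 1).
    conviction = inf if confidence == 1 else (1 - supp_y) / (1 - confidence)
    return {"support": supp_xy, "confidence": confidence,
            "lift": lift, "conviction": conviction}


# Toy data (hypothetical): four posts, each a set of user identifiers.
posts = [frozenset({"a", "b", "c"}), frozenset({"a", "b"}),
         frozenset({"a", "c"}), frozenset({"b"})]

metrics_ab = rule_metrics({"a"}, {"b"}, posts)
metrics_ca = rule_metrics({"c"}, {"a"}, posts)
```

In the toy data every post containing `c` also contains `a`, so the rule `{c} ⇒ {a}` has confidence 1 and infinite conviction, the same pattern as the ∞ entries reported for rules with confidence 1.00.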

Rule | Confidence | Lift | Conviction |
---|---|---|---|
**Confidence** | | | |
$\{u_{179}, u_{538}, u_{580}, u_{938}, u_{992}, u_{1090}\} \Rightarrow \{u_{11}\}$ | 1.00 | 10.17 | ∞ |
$\{u_{11}, u_{31}, u_{80}, u_{179}, u_{992}, u_{1093}\} \Rightarrow \{u_{580}\}$ | 1.00 | 4.80 | ∞ |
$\{u_{11}, u_{31}, u_{179}, u_{580}, u_{992}, u_{1093}\} \Rightarrow \{u_{80}\}$ | 1.00 | 9.53 | ∞ |
$\{u_{11}, u_{179}, u_{538}, u_{580}, u_{938}, u_{953}\} \Rightarrow \{u_{429}\}$ | 1.00 | 4.84 | ∞ |
$\{u_{179}, u_{1094}, u_{1096}, u_{1113}, u_{1171}, u_{1352}\} \Rightarrow \{u_{1378}\}$ | 1.00 | 101.67 | ∞ |
**Lift** | | | |
$\{u_{580}, u_{861}, u_{1352}, u_{1466}\} \Rightarrow \{u_{896}, u_{1291}\}$ | 1.00 | 152.50 | ∞ |
$\{u_{580}, u_{861}, u_{1291}, u_{1352}\} \Rightarrow \{u_{896}, u_{1466}\}$ | 1.00 | 152.50 | ∞ |
$\{u_{31}, u_{80}, u_{179}, u_{580}\} \Rightarrow \{u_{11}, u_{992}, u_{1093}\}$ | 1.00 | 152.50 | ∞ |
$\{u_{19}, u_{64}, u_{673}, u_{685}\} \Rightarrow \{u_{54}, u_{581}\}$ | 1.00 | 152.50 | ∞ |
$\{u_{580}, u_{861}, u_{1291}, u_{1466}\} \Rightarrow \{u_{896}, u_{1352}\}$ | 1.00 | 152.50 | ∞ |
**Conviction** | | | |
$\{u_{429}, u_{578}\} \Rightarrow \{u_{19}\}$ | 0.95 | 3.93 | 16.66 |
$\{u_{920}\} \Rightarrow \{u_{179}\}$ | 0.95 | 4.27 | 16.32 |
$\{u_{929}\} \Rightarrow \{u_{179}\}$ | 0.95 | 4.26 | 15.54 |
$\{u_{580}, u_{1093}\} \Rightarrow \{u_{179}\}$ | 0.94 | 4.22 | 13.21 |
$\{u_{580}, u_{938}\} \Rightarrow \{u_{179}\}$ | 0.94 | 4.22 | 13.21 |

**Table 4.** Descriptive statistics of learned rules with confidence ⩾ 95% from the complete dataset.

Evaluation Metric | Mean | Std. | Min | $Q1$ | Median | $Q3$ | Max |
---|---|---|---|---|---|---|---|
No. of rules | 33,426.89 | 87,457.39 | 2.00 | 151.00 | 2351.00 | 32,053.50 | 724,510.00 |
Confidence | 1.00 | 0.00 | 0.97 | 1.00 | 1.00 | 1.00 | 1.00 |
Lift | 38.06 | 42.14 | 1.41 | 10.86 | 25.34 | 47.91 | 217.53 |
Conviction | 19.39 | 4.61 | 5.88 | 18.07 | 19.79 | 20.70 | 29.46 |

**Table 5.** Example of false positives and false negatives. Capital letters indicate users and $P_{1-4}$ correspond to different posts.

Example rule: $\{A, B, C\} \Rightarrow \{D\}$ | | |
---|---|---|
$P_{1} = \{A, B, C, D\}$ | ⟶ | true positive |
$P_{2} = \{A, B, C\}$ | ⟶ | false positive |
$P_{3} = \{F, G, H\}$ | ⟶ | true negative |
$P_{4} = \{D, E\}$ | ⟶ | false negative |
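The labelling in Table 5 can be sketched in a few lines. This is a minimal illustration, assuming that a rule "predicts" its consequent whenever all antecedent users appear in a post, and that the consequent counts as present only if all of its users appear; the function name `classify_post` is hypothetical.

```python
def classify_post(post, antecedent, consequent):
    """Label one post with respect to the rule antecedent => consequent."""
    post = set(post)
    predicted = set(antecedent) <= post  # all antecedent users are in the post
    observed = set(consequent) <= post   # all consequent users are in the post
    if predicted and observed:
        return "true positive"
    if predicted:
        return "false positive"
    if observed:
        return "false negative"
    return "true negative"


# The four posts from Table 5, with each user written as a single letter.
posts = {"P1": "ABCD", "P2": "ABC", "P3": "FGH", "P4": "DE"}
labels = {name: classify_post(users, "ABC", "D") for name, users in posts.items()}
```

Applied to the posts of Table 5, the sketch reproduces the four labels listed there.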

**Table 6.** Testing of learned rules based on an 80/20% learn and test split. SD stands for standard deviation.

Evaluation Metric | OccupyTogether | OccupyTogether $^{a}$ | All pages (SD) | All pages $^{a}$ (SD) |
---|---|---|---|---|
No. of rules | 46,170 | 4469 | 99,237 (248,968) | 7092 (14,965) |
Accuracy | 0.886 | 0.927 | 0.858 (0.135) | 0.906 (0.128) |
Precision | 0.291 | 0.794 | 0.286 (0.287) | 0.633 (0.343) |
Recall | 0.071 | 0.017 | 0.138 (0.193) | 0.165 (0.258) |
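Accuracy, precision, and recall as reported in Table 6 follow from the four outcome counts in the usual way. The sketch below uses the standard definitions; the function name `rule_scores`, the short outcome codes, and the choice to return 0.0 when a ratio is undefined are assumptions made for illustration, not details taken from the article.

```python
from collections import Counter


def rule_scores(outcomes):
    """Return (accuracy, precision, recall) from a list of outcome codes.

    Each outcome is one of "TP", "FP", "TN", "FN".
    Undefined ratios (zero denominator) are reported as 0.0 by convention here.
    """
    counts = Counter(outcomes)
    tp, fp, tn, fn = (counts[code] for code in ("TP", "FP", "TN", "FN"))
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# One outcome of each kind, as in the four example posts of Table 5.
scores = rule_scores(["TP", "FP", "TN", "FN"])
```

Because true negatives enter only the accuracy, a rule set can reach high accuracy while precision and recall stay low, which is the pattern visible in the table.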

Percent of Top Users | Users | Degree ∩ ASR | Page Rank ∩ ASR | Page Rank ∩ Degree |
---|---|---|---|---|
1% | 4 | 0.75 | 0.75 | 0.75 |
5% | 20 | 0.45 | 0.45 | 0.95 |
10% | 41 | 0.488 | 0.512 | 0.927 |
25% | 104 | 0.462 | 0.49 | 0.923 |
50% | 209 | 0.512 | 0.526 | 0.947 |
75% | 313 | 0.502 | 0.556 | 0.92 |
100% | 418 | 0.517 | 0.565 | 0.928 |
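The table above appears to report, for each percentage of top users, the share of users that two rankings have in common. The sketch below is one way to compute such an overlap, under assumptions: each ranking is a list of user identifiers ordered from most to least influential, and the cut-off is rounded down. Rounding down reproduces the user counts in the table (4, 20, and 41 of 418 users), but this is an inference rather than a stated detail. The function name `top_k_overlap` is hypothetical.

```python
def top_k_overlap(ranking_a, ranking_b, percent):
    """Share of the top `percent`% users that two rankings have in common.

    `percent` is an integer percentage; the cut-off k is rounded down.
    """
    k = len(ranking_a) * percent // 100
    if k == 0:
        raise ValueError("percentage too small for this number of users")
    shared = set(ranking_a[:k]) & set(ranking_b[:k])
    return len(shared) / k


# Hypothetical rankings over 418 users that agree on three of their top four.
ranking_a = list(range(418))
ranking_b = [0, 1, 2, 417] + list(range(3, 417))

overlap_top_1 = top_k_overlap(ranking_a, ranking_b, 1)
overlap_all = top_k_overlap(ranking_a, ranking_b, 100)
```

With three shared users among the top four, the overlap at 1% is 0.75, the same form as the first row of the table.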

Percent of Top Users | Degree ∩ ASR (SD) | Page Rank ∩ ASR (SD) | Page Rank ∩ Degree (SD) |
---|---|---|---|
1% | 0.092 (0.173) | 0.131 (0.227) | 0.822 (0.238) |
5% | 0.081 (0.145) | 0.095 (0.158) | 0.805 (0.251) |
10% | 0.115 (0.158) | 0.133 (0.173) | 0.830 (0.219) |
25% | 0.181 (0.188) | 0.194 (0.198) | 0.836 (0.167) |
50% | 0.231 (0.212) | 0.257 (0.228) | 0.848 (0.129) |
75% | 0.266 (0.243) | 0.286 (0.249) | 0.868 (0.119) |
100% | 0.286 (0.261) | 0.304 (0.264) | 0.886 (0.114) |
Average Rank | 3 | 2 | 1 |

**Table 9.** Paired rank comparison of intersections using the Nemenyi post hoc test. The upper triangle shows the difference between intersections; the lower triangle shows pairs with statistical significance.

Compared Measures | Degree ∩ ARL | Page Rank ∩ ARL | Page Rank ∩ Degree |
---|---|---|---|
Degree ∩ ARL | - | 1.00 | 2.00 |
Page Rank ∩ ARL | - | - | 1.00 |
Page Rank ∩ Degree | $^{*}$, $^{**}$ | - | - |
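In the Nemenyi test, two methods differ significantly when their average ranks differ by at least a critical difference, $CD = q_{\alpha}\sqrt{k(k+1)/(6N)}$, where $k$ is the number of compared methods and $N$ the number of blocks they are ranked on. The sketch below evaluates this formula. The value of $N$ is not recoverable from this page, so `n_blocks=7` is purely an assumption (the seven percentage levels of the preceding table), and the hard-coded $q_{\alpha}$ values for $k = 3$ are the commonly tabulated ones and should be checked against a statistics reference.

```python
from math import sqrt

# Commonly tabulated critical values q_alpha for k = 3 compared methods
# (Studentized range statistic divided by sqrt(2)); verify before relying on them.
Q_ALPHA_K3 = {0.05: 2.343, 0.10: 2.052}


def critical_difference(n_blocks, alpha=0.05, k=3):
    """Nemenyi critical difference for k = 3 methods ranked on n_blocks blocks."""
    if k != 3:
        raise ValueError("only k = 3 is tabulated in this sketch")
    return Q_ALPHA_K3[alpha] * sqrt(k * (k + 1) / (6 * n_blocks))


# Assumption: seven blocks. The true number is not stated on this page.
cd = critical_difference(n_blocks=7)

# Rank differences taken from the upper triangle of Table 9.
rank_differences = {"Degree vs Page Rank (ARL)": 1.00,
                    "Degree ∩ ARL vs Page Rank ∩ Degree": 2.00}
significant = {pair: diff >= cd for pair, diff in rank_differences.items()}
```

Under that assumption the critical difference is roughly 1.25, so only the rank difference of 2.00 exceeds it.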

Method | Mean | Std. |
---|---|---|
Degree | 329.135 | (2345.996) |
Page Rank | 633.152 | (4602.607) |
ASR | 9.033 | (22.497) |

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Erlandsson, F.; Bródka, P.; Borg, A.; Johnson, H. Finding Influential Users in Social Media Using Association Rule Learning. *Entropy* **2016**, *18*, 164.
https://doi.org/10.3390/e18050164
