# Predicting Community Evolution in Social Networks


## Abstract

## 1. Introduction

## 2. Problem Description

## 3. Related Work

## 4. General Concept

- collecting data and splitting it into time frames (Figure 1a);
- extracting a social network for each period and identifying social communities (Figure 1b);
- detecting changes (events) in groups between consecutive time windows and identifying the chains preceding the most recent state of each group (Figure 1c);
- building the predictive model (learning the classifier) and validating it (Figures 1d and 2).
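The first step in the list above can be sketched as follows. This is a minimal illustration only: it assumes interactions are `(source, target, timestamp)` tuples and that time frames are consecutive and non-overlapping, neither of which is stated in this section. The function name `split_into_time_frames` and the parameter `frame_length` are hypothetical.

```python
from collections import defaultdict


def split_into_time_frames(interactions, frame_length):
    """Group (source, target, timestamp) tuples into consecutive frames.

    Assumption: frames are non-overlapping and all have the same length.
    Returns a list of edge sets, one per time frame, oldest first.
    """
    if not interactions:
        return []
    start = min(t for _, _, t in interactions)
    frames = defaultdict(set)
    for source, target, t in interactions:
        index = int((t - start) // frame_length)
        frames[index].add((source, target))
    last = max(frames)
    # Frames without any interaction are returned as empty edge sets.
    return [frames.get(i, set()) for i in range(last + 1)]


interactions = [("a", "b", 0), ("b", "c", 5), ("a", "c", 12), ("c", "a", 25)]
frames = split_into_time_frames(interactions, frame_length=10)
```

Each resulting edge set would then be the input for group extraction in the second step.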

$T_{n-1} \to T_n$, i.e., events like group splitting, growing, merging, dissolving and so on (Figure 1c). Two different algorithms developed by the authors were independently utilized for that purpose (see Section 5 for details):

- Stable Group Changes Identification Algorithm (SGCI),
- Group Evolution Discovery Algorithm (GED).

$G_i$ from $T_n$. Such a chain consists of all the preceding groups from the previous time frames ($T_{n-1}$, $T_{n-2}$, $T_{n-3}$, etc.) that the recent group $G_i$ comes from. Additional information (the metrics described in Section 5.3) about the related changes that formed group $G_i$ is added. It may happen that a group has been created from two or more other groups through merging, e.g., group $G_3$ came into being from $G_1$ and $G_2$. In such a case, two separate evolution chains are established for $G_3$, one with group $G_1$ and one with $G_2$. The number of chains can be multiplied further by more merging events in the following time frames. In general, many evolution chains may be assigned to one group.
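The way merging multiplies evolution chains can be illustrated with a short sketch. It assumes the detected transitions are available as a mapping from each group to its direct predecessors; the mapping `predecessors`, the function `evolution_chains` and the groups `G4` and `G5` are hypothetical and only serve the example. It also follows every chain back to its origin, whereas the study works with chains of fixed lengths.

```python
def evolution_chains(group, predecessors):
    """Return every chain (oldest group first) that ends at `group`.

    `predecessors` maps a group to the groups it directly comes from.
    A group with no recorded predecessor starts a chain.
    """
    parents = predecessors.get(group, [])
    if not parents:
        return [[group]]
    chains = []
    for parent in parents:
        for chain in evolution_chains(parent, predecessors):
            chains.append(chain + [group])
    return chains


# G3 is created by merging G1 and G2; later G3 merges with G4 into G5.
predecessors = {"G3": ["G1", "G2"], "G5": ["G3", "G4"]}
chains_g3 = evolution_chains("G3", predecessors)
chains_g5 = evolution_chains("G5", predecessors)
```

Here `G3` gets two chains, as in the example in the text, and the second merge raises the count for `G5` to three.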

$G_i$ in time frame $T_n$, a set of descriptive, mainly structural features is computed (see Section 5.3). These features correspond both to the state of group $G_i$ within the social network in $T_n$ (29 features for SGCI and 2 more for GED) and to its identified ancestors (previous groups) and transitions in the preceding periods $T_{n-1} \to T_n$, $T_{n-2} \to T_{n-1}$, $T_{n-3} \to T_{n-2}$, etc. This calculation is based on the items coming from the group evolution chain detected in the third phase. Through this transformation of evolution chains into descriptive features (attributes, variables), we obtain up to hundreds of features reflecting the historical evolution of each group until $T_n$. For each period, a separate feature set is calculated. In total, we obtain (no. of features for a period) × (no. of periods) features for each considered group in $T_n$. Groups with their descriptive features from each time frame $T_n$ (except the first periods, whose history is too short) are put into one set of group instances ready for building the predictive model. Since there may be several evolution chains for a given group $G_i$, each of them corresponds to a separate case. As a result, group $G_i$ occurs in the learning set as many times as the number of evolution chains detected for it.
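The transformation of one evolution chain into a flat feature vector, with (no. of features for a period) × (no. of periods) entries, might look like the sketch below. The function `chain_to_features`, the `name@Tn-k` naming scheme and the two example measures are assumptions made for illustration; the actual profiles hold the measures from Section 5.3.

```python
def chain_to_features(profiles):
    """Flatten per-period group profiles (oldest first) into one dict.

    Each feature name is suffixed with its time frame, so the same
    measure taken in different periods becomes a separate attribute.
    """
    features = {}
    last = len(profiles) - 1
    for position, profile in enumerate(profiles):
        offset = last - position  # 0 for T_n, 1 for T_{n-1}, ...
        suffix = "Tn" if offset == 0 else f"Tn-{offset}"
        for name, value in profile.items():
            features[f"{name}@{suffix}"] = value
    return features


profiles = [
    {"size": 12, "density": 0.30},  # profile in T_{n-2}
    {"size": 15, "density": 0.28},  # profile in T_{n-1}
    {"size": 14, "density": 0.35},  # profile in T_n
]
features = chain_to_features(profiles)
```

With 2 measures and 3 periods the result has 6 attributes; one such dictionary would be produced per evolution chain, not per group.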

$T_{n+1}$ (output) are used to learn the classifier, i.e., to build a classification model. For validation purposes (to evaluate the quality of prediction), the entire set of these cases (chains of groups) is randomly split into 10 partitions to enable 10-fold cross-validation: learning on nine partitions, testing on the remaining one, and repeating this process 10 times so that each partition serves as the test set once.
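The 10-fold split described above can be sketched with the standard library alone. This is a generic illustration of k-fold cross-validation, not the tooling used in the study; the function name `k_fold_indices` and the fixed `seed` are assumptions.

```python
import random


def k_fold_indices(n_cases, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Cases are shuffled once and dealt into k partitions; each partition
    is used as the test set exactly once.
    """
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_cases=25, k=10))
```

Because one group can contribute several chains, a random split of this kind may place chains of the same group in both the training and the test partition.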

$T_n \to T_{n+1}$. For example, group $G_1$ in Figure 1c was involved both in merging with $G_2$ into $G_3$ and in splitting into $G_3$ and $G_4$. In this study, we used only a typical multi-class classification method; its output is one of many classes from the fixed set of event types. Hence, the learned model is able to predict only one future event for a given group described by the features derived from one evolution chain. To enable multiple outputs (many possible events), another solution, multi-label classification, would need to be applied [21,22]. This is, however, much more complex and is a matter for further research.

## 5. Predicting Group Evolution in the Social Network

#### 5.1. Predicting Group Evolution Using SGCI Results and the Notion of Dominating Event

${}^{n,1}$, is expressed by the following vector:

$$[L^{n,1}, D^{n,1}, Co^{n,1}, S^{n,1}, R^{n,1}, \{avg, sum, max, min\}\{D_{in}^{n,1}, D_{out}^{n,1}, D_{t}^{n,1}, B^{n,1}, C^{n,1}, E^{n,1}\}]$$

where the upper index of each measure gives the number of the time frame ($n$) and the number of the group (1). The functions (avg, sum, max, min) are calculated over the values of the given measure ($D_{in}^{n,1}$, $D_{out}^{n,1}$, $D_{t}^{n,1}$, $B^{n,1}$, $C^{n,1}$, $E^{n,1}$) for all members of the considered group (see Section 5.3).

$G_{n-2,1}$, $G_{n-1,1}$ and $G_{n,1}$ and we want to predict the next evolution event for sequence seq1 and group $G_{n,1}$. As we can see in Figure 3a, this group has two events assigned: constancy (the transition between $G_{n,1}$ and $G_{n+1,1}$) and addition (the transition between $G_{n,1}$ and $G_{n+1,2}$). According to the introduced concept of dominating events and the chosen priorities of events, the predicted dominating event is constancy. The table in Figure 3a summarizes the types of predicted events for each considered sequence.
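Selecting the dominating event from several events assigned to one group reduces to picking the event with the highest priority. The sketch below assumes the priorities are given as an ordered list, highest first. Only the relation "constancy dominates addition" is taken from the example above; the function name `dominating_event` and the shape of the priority list are hypothetical, and the full ordering used in the study is not reproduced here.

```python
def dominating_event(events, priority):
    """Return the event from `events` that ranks highest in `priority`.

    `priority` lists event types from the most to the least dominating.
    An event type missing from `priority` raises a KeyError.
    """
    rank = {event: position for position, event in enumerate(priority)}
    return min(events, key=lambda event: rank[event])


# Partial ordering taken from the Figure 3a example only.
priority = ["constancy", "addition"]
predicted = dominating_event(["addition", "constancy"], priority)
```

The result is a single class label per chain, which is what a multi-class classifier needs as its target.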

#### 5.2. Predicting Group Evolution Using GED Results

$T_n$) is a case (instance) for classification, for which its event $T_n T_{n+1}$ is being predicted.

(1) Group profile in $T_{n-3}$; (2) Event type $T_{n-3} T_{n-2}$; (3) Group profile in $T_{n-2}$; (4) Event type $T_{n-2} T_{n-1}$; (5) Group profile in $T_{n-1}$; (6) Event type $T_{n-1} T_n$; (7) Group profile in $T_n$. The predicted variable is the next event for a given group. Thus, the goal of classification is to predict (classify) the type of Event $T_n T_{n+1}$ out of six possible classes: (1) growing; (2) continuing; (3) shrinking; (4) dissolving; (5) merging and (6) splitting. The forming event was excluded since it can only start the sequence.

- use of different methods for tracking group evolution (SGCI and GED, respectively);
- the concept of the dominating event in the approach using the SGCI method;
- use of additional, specific measures for event prediction in the approach using the GED method (the alpha and beta metrics, which GED uses internally to determine group transitions between consecutive time frames);
- different generation of chains for the split/splitting event (for GED, if the last group in a chain has splitting events assigned with multiple groups in the next time frame, an identical chain is generated for each splitting transition of the considered group, whereas with SGCI only one such chain is generated).

#### 5.3. Measures Used To Describe Group Profile

**max_total_indegree**. As a result, we obtain 6 × 4 = 24 aggregated features for each group.
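The 6 × 4 aggregation can be sketched as follows: each of the six node measures is reduced over all group members with the four functions avg, sum, max and min. The `function_measure` naming scheme, the helper `aggregate_node_measures` and the member values are assumptions for illustration; the values are invented and do not come from any dataset in the study.

```python
from statistics import mean

AGGREGATES = {"avg": mean, "sum": sum, "max": max, "min": min}


def aggregate_node_measures(node_measures):
    """Reduce per-member values of each node measure to group features.

    `node_measures` maps a measure name to the list of its values for
    all members of one group.
    """
    return {
        f"{agg}_{measure}": func(values)
        for measure, values in node_measures.items()
        for agg, func in AGGREGATES.items()
    }


# Illustrative values for a group with three members.
node_measures = {
    "indegree": [2, 1, 3],
    "outdegree": [1, 3, 2],
    "total_degree": [3, 4, 5],
    "betweenness": [0.0, 0.5, 1.5],
    "closeness": [0.5, 0.75, 1.0],
    "eigenvector": [0.2, 0.4, 0.6],
}
aggregated = aggregate_node_measures(node_measures)
```

Together with the five group-level measures (leadership, density, cohesion, size and reciprocity) this gives the 29 features per period mentioned in Section 4.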

- **group size**: the number of nodes in the group;
- **density**: a measure expressing how many connections between nodes are present in the group in relation to all possible connections between them [1]: $$D=\frac{\sum_{i}\sum_{j}a(i,j)}{n(n-1)}$$
- **cohesion**: a measure characterising the strength of connections inside the group in relation to the connections outside the group (incident with the group members) [1]: $$Co=\frac{\dfrac{\sum_{i\in G}\sum_{j\in G}w(i,j)}{n(n-1)}}{\dfrac{\sum_{i\in G}\sum_{j\notin G}w(i,j)}{N(N-n)}}$$
- **leadership**: a measure describing centralization in the graph or group (the largest value is for a star network) [27]: $$L=\sum_{i=1}^{n}\frac{d_{max}-d_{i}}{(n-2)(n-1)}$$ where $d_{max}$ is the maximum degree in the group and $n$ is the number of nodes in the group;
- **reciprocity**: the fraction of edges that are reciprocated [28]: $$R=\frac{1}{m}\sum_{ij}a(i,j)a(j,i)$$
- **alpha**: the GED inclusion measure of group $G_i$ from time frame $T_n$ in group $G_j$ from time frame $T_{n+1}$ [26] (a measure used only in the approach utilizing the GED method);
- **beta**: the GED inclusion measure of group $G_j$ from time frame $T_{n+1}$ in group $G_i$ from time frame $T_n$ [26] (a measure used only in the approach utilizing the GED method);
- **indegree**: a node measure defining the number of connections directed to the node [27]: $$D_{in}=\sum_{i}a(j,i)$$
- **outdegree**: a node measure determining the number of connections outgoing from the node [27]: $$D_{out}=\sum_{i}a(i,j)$$
- **total degree**: the sum of indegree and outdegree: $$D_{t}=D_{in}+D_{out}$$
- **betweenness**: a node measure describing the number of the shortest paths from all nodes to all others that pass through that node [27]: $$B=\sum_{i\ne j\ne v}\frac{\sigma_{ij}(v)}{\sigma_{ij}}$$ where $\sigma_{ij}$ is the total number of the shortest paths from node $i$ to $j$ and $\sigma_{ij}(v)$ is the number of those paths that pass through $v$;
- **closeness**: a node measure defined as the inverse of the farness, which in turn is the sum of distances to all other nodes [27]: $$C=\sum_{i\ne j}\frac{1}{d(i,j)}$$
- **eigenvector**: a node measure indicating the influence of a node in the network [29].
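Three of the group-level formulas above (density, reciprocity and leadership) are simple enough to check by hand. The sketch below assumes an unweighted directed group graph stored as a set of `(source, target)` pairs without self-loops, and takes the degrees for leadership as a plain list, since this section does not state which degree variant is used. All function names are hypothetical.

```python
def density(nodes, edges):
    """D = sum_ij a(i, j) / (n (n - 1)) for a directed group graph."""
    n = len(nodes)
    return len(edges) / (n * (n - 1))


def reciprocity(edges):
    """R = (1 / m) * sum_ij a(i, j) a(j, i), with m the number of edges."""
    m = len(edges)
    return sum(1 for (i, j) in edges if (j, i) in edges) / m


def leadership(degrees):
    """L = sum_i (d_max - d_i) / ((n - 2)(n - 1)); requires n > 2."""
    n = len(degrees)
    d_max = max(degrees)
    return sum(d_max - d for d in degrees) / ((n - 2) * (n - 1))


nodes = {"a", "b", "c", "d"}
edges = {("a", "b"), ("b", "a"), ("a", "c"), ("a", "d")}
group_density = density(nodes, edges)          # 4 edges out of 12 possible
group_reciprocity = reciprocity(edges)         # 2 of 4 edges are reciprocated
star_leadership = leadership([4, 1, 1, 1, 1])  # star with five nodes
```

The star example reaches the maximum leadership value of 1, in line with the remark that the largest value is obtained for a star network.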

## 6. Dataset and Experiment Setup

#### 6.1. Dataset Description

#### 6.2. Group Extraction

#### 6.3. Group Sizes

#### 6.4. Experiment Setup

## 7. Experiments

#### 7.1. Predicting Group Evolution Using the SGCI Results

#### 7.1.1. DBLP Dataset

#### 7.1.2. Facebook Dataset

#### 7.1.3. Salon24 Dataset

#### 7.1.4. Features Selection

#### 7.2. Predicting Group Evolution Using GED Results

#### 7.2.1. DBLP Dataset

#### 7.2.2. Facebook Dataset

#### 7.2.3. Salon24 Dataset

#### 7.2.4. Features Selection

## 8. Discussion

#### 8.1. Prediction

#### 8.2. Features Selection

$T_n \to T_{n+1}$, most of the features come from the group profiles extracted in states $T_n$, $T_{n-1}$ and $T_{n-2}$; see Figure 3 for the sequence of events for a single group. For example, when the evolution chain length was 10 and the upcoming change was predicted, as many as 89% of the features in the case of GED and 64% in the case of SGCI came from the tenth, ninth and eighth group profiles.
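The two percentages can be recomputed from the per-state feature counts. The sketch assumes that the Chain 10 entry is the last non-empty value in each row of the two per-state tables (Table 11 for GED and the uncaptioned table with the same layout, taken here to be its SGCI counterpart), listed from state n-1 to n-10. The function name `share_from_recent_states` is hypothetical.

```python
def share_from_recent_states(counts_by_state, recent=3):
    """Fraction of selected features that come from the `recent` newest states."""
    total = sum(counts_by_state)
    return sum(counts_by_state[:recent]) / total


# Selected-feature counts for chain length 10, states n-1 ... n-10.
ged_chain10 = [14, 7, 3, 2, 0, 0, 0, 1, 0, 0]
sgci_chain10 = [17, 5, 7, 3, 5, 2, 1, 2, 0, 3]

ged_share = share_from_recent_states(ged_chain10)    # 24 of 27
sgci_share = share_from_recent_states(sgci_chain10)  # 29 of 45
print(f"GED: {ged_share:.0%}, SGCI: {sgci_share:.0%}")  # GED: 89%, SGCI: 64%
```

Under that reading of the tables, the counts reproduce the 89% and 64% quoted above.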

## 9. Conclusions and Future Work

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Wasserman, S.; Faust, K. Social Network Analysis: Methods and Applications; Cambridge University Press: Cambridge, UK, 1994.
2. Liben-Nowell, D.; Kleinberg, J. The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. **2007**, 58, 1019–1031.
3. Lichtenwalter, R.; Lussier, J.T.; Chawla, N.V. New perspectives and methods in link prediction. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 25–28 July 2010; ACM: New York, NY, USA; pp. 243–252.
4. Zheleva, E.; Getoor, L.; Golbeck, J.; Kuter, U. Using Friendship Ties and Family Circles for Link Prediction. Proceedings of 2nd International Conference on Advances in Social Network Mining and Analysis (SNAKDD’08), Las Vegas, NV, USA, 24–27 August 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 97–113.
5. Chiang, K.Y.; Natarajan, N.; Tewari, A.; Dhillon, I.S. Exploiting longer cycles for link prediction in signed networks. Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM 2011), Glasgow, UK, 24–28 October 2011; ACM: New York, NY, USA; pp. 1157–1162.
6. Kunegis, J.; Lommatzsch, A.; Bauckhage, C. The slashdot zoo: Mining a social network with negative edges. Proceedings of the 18th International World Wide Web Conference (WWW 2009), Madrid, Spain, 20–24 April 2009; ACM: New York, NY, USA; pp. 741–750.
7. Leskovec, J.; Huttenlocher, D.P.; Kleinberg, J.M. Predicting positive and negative links in online social networks. Proceedings of the 19th International World Wide Web Conference (WWW 2010), Raleigh, NC, USA, 26–30 April 2010; ACM: New York, NY, USA; pp. 641–650.
8. Symeonidis, P.; Tiakas, E.; Manolopoulos, Y. Transitive node similarity for link prediction in social networks with positive and negative links. Proceedings of the 4th ACM Conference on Recommender Systems (RecSys 2010), Barcelona, Spain, 26–30 April 2010; ACM: New York, NY, USA; pp. 183–190.
9. Davis, D.; Lichtenwalter, R.; Chawla, N.V. Supervised methods for multi-relational link prediction. In Social Network Analysis and Mining; Springer: Vienna, Austria, 2012.
10. Richter, Y.; Yom-Tov, E.; Slonim, N. Predicting Customer Churn in Mobile Networks through Analysis of Social Groups. Proceedings of the SIAM International Conference on Data Mining, SDM 2010, Columbus, OH, USA, 29 April–1 May 2010; pp. 732–741.
11. Wai-Ho, A.; Chan, K.C.C.; Xin, Y. A novel evolutionary data mining algorithm with applications to churn prediction. IEEE Trans. Evol. Comput. **2003**, 7, 532–545.
12. Kairam, S.R.; Wang, D.J.; Leskovec, J. The life and death of online groups: Predicting group growth and longevity. Proceedings of the fifth ACM International Conference on Web Search and Data Mining (WSDM’12), Seattle, WA, USA, 8–12 February 2012; pp. 673–682.
13. Patil, A.; Liu, J.; Gao, J. Predicting group stability in online social networks. Proceedings of the 22nd International Conference on World Wide Web (WWW’13), Rio de Janeiro, Brazil, 13–17 May 2013; pp. 1021–1030.
14. Goldberg, M.; Magdon-Ismail, M.; Nambirajan, S.; Thompson, J. Tracking and Predicting Evolution of Social Communities. Proceedings of Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom), Boston, MA, USA, 9–11 October 2011; pp. 780–783.
15. Matjaž, P. The Matthew effect in empirical data. J. R. Soc. Interface **2014**, 11.
16. Bródka, P.; Kazienko, P.; Kołoszczyk, B. Predicting Group Evolution in the Social Network. In Social Informatics; Aberer, K., Flache, A., Jager, W., Liu, L., Tang, J., Guéret, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 54–67.
17. Gliwa, B.; Bródka, P.; Zygmunt, A.; Saganowski, S.; Kazienko, P.; Koźlak, J. Different Approaches to Community Evolution Prediction in Blogosphere. Proceedings of 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Niagara Falls, ON, USA, 25–28 August 2013; pp. 1291–1298.
18. Takaffoli, M.; Rabbany, R.; Zaiane, O.R. Community evolution prediction in dynamic social networks. Proceedings of 2013 12th International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 4–7 December 2013; pp. 191–196.
19. Derényi, I.; Palla, G.; Vicsek, T. Clique Percolation in Random Networks. Phys. Rev. Lett. **2005**, 94, 160–202.
20. Palla, G.; Derényi, I.; Farkas, I.; Vicsek, T. Uncovering the Overlapping Community Structure of Complex Networks in Nature and Society. Nature **2005**, 435, 814–818.
21. Indyk, W.; Kajdanowicz, T.; Kazienko, P. Relational Large Scale Multi-label Classification Method for Video Categorization. Multimedia Tools Appl. **2013**, 65, 63–74.
22. Kajdanowicz, T.; Kazienko, P. Multi-label Classification Using Error Correcting Output Codes. Int. J. Appl. Math. Comput. Sci. **2012**, 22, 829–840.
23. Gliwa, B.; Saganowski, S.; Zygmunt, A.; Bródka, P.; Kazienko, P.; Koźlak, J. Identification of Group Changes in Blogosphere. Proceedings of 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Istanbul, Turkey, 26–29 August 2012; pp. 1201–1206.
24. Zygmunt, A.; Bródka, P.; Kazienko, P.; Koźlak, J. Key person analysis in social communities within the blogosphere. J. UCS **2012**, 18, 577–597.
25. Gliwa, B.; Zygmunt, A.; Byrski, A. Graphical analysis of social group dynamics. Proceedings of Fourth International Conference on Computational Aspects of Social Networks, CASoN 2012, Sao Carlos, Brazil, 21–23 November 2012; pp. 41–46.
26. Bródka, P.; Saganowski, S.; Kazienko, P. GED: The Method for Group Evolution Discovery in Social Networks. Soc. Netw. Anal. Min. **2013**, 3, 1–14.
27. Freeman, L.C. Centrality in Social Networks. Conceptual Clarification. Soc. Netw. 1, 215–239.
28. Newman, M. Networks: An Introduction; Oxford University Press: Oxford, UK, 2010.
29. Bonacich, P.B. Factoring and weighing approaches to status scores and clique identification. J. Math. Sociol. **1972**, 2, 113–120.
30. Ley, M. The DBLP computer science bibliography: Evolution, research issues, perspectives. In String Processing and Information Retrieval; Laender, A.F., Oliveira, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2002; pp. 1–10.
31. Viswanath, B.; Mislove, A.; Cha, M.; Gummadi, K.P. On the evolution of user interaction in Facebook. Proceedings of the 2nd ACM workshop on Online social networks (WOSN’09), Barcelona, Spain, 16–21 August 2009; pp. 37–42.
32. Bródka, P.; Musiał, K.; Kazienko, P. A Performance of Centrality Calculation in Social Networks, CASoN 2009. Proceedings of International Conference on Computational Aspects of Social Networks (CASON’09), Paris, France, 24–27 June 2009; pp. 24–31.
33. McLachlan, G.J.; Do, K.A.; Ambroise, C. Analyzing Microarray Gene Expression Data; John Wiley & Sons: Hoboken, NJ, USA, 2004; ISBN-10: 0471226165.
34. Quinlan, R. C4.5: Programs for Machine Learning; Morgan Kaufmann Publishers: San Mateo, CA, USA, 1993.
35. Breiman, L. Random Forests. Mach. Learn. **2001**, 45, 5–32.
36. Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. **1997**, 55, 119–139.
37. Breiman, L. Bagging predictors. Mach. Learn. **1996**, 24, 123–140.

**Figure 3.** Two approaches to group evolution prediction: (**a**) using the SGCI method: example with sequences of group measures from 3 time frames (1 present group state and 2 earlier group states) and predicted dominating event; (**b**) using the GED method: the sequence of events for a single group together with its profiles as well as its target class, the event type in $T_n T_{n+1}$ (the chain corresponds to one case in classification).

**Figure 12.** SGCI: distribution of the event types for events being predicted in the Facebook dataset.

**Figure 50.** The number of features selected from all features for the particular chain length for the Facebook dataset.

**Figure 51.** The percentage of features selected from all features available for the particular chain length for the Facebook dataset.

**Figure 52.** The percentage of features selected from the last 3 time frames for the particular chain length for the Facebook dataset.

**Figure 53.** The comparison of features usage in GED and SGCI after feature selection for the Facebook dataset.

**Figure 54.** Error in prediction of GED and SGCI events after feature selection for the Facebook dataset.

Short Name | Name |
---|---|
J48-C4.5 decision tree | C4.5 decision tree [34] |
RandomForest | Random forest [35] |
AdaBoost(J48) | Adaptive Boosting [36] |
Bagging(REPTree) | Bootstrap aggregating [37] |

Chain Length | DBLP | Facebook | Salon24 |
---|---|---|---|
2 | 2,980 | 3,027 | 2,119 |
3 | 2,581 | 2,759 | 5,999 |
4 | 2,051 | 2,094 | 5,005 |
5 | 1,919 | 1,831 | 10,712 |
6 | 1,754 | 1,575 | 9,895 |
7 | 1,120 | 1,401 | 15,076 |
8 | 744 | 1,314 | 18,735 |
9 | 603 | 1,280 | 29,690 |
10 | 417 | 1,141 | – |

**Table 3.** SGCI: the number of evolution chains for particular event type and particular chain length in the DBLP dataset.

Chain Length | Addition | Change Size | Constancy | Decay | Deletion | Merge | Split |
---|---|---|---|---|---|---|---|
2 | 7 | 981 | 340 | 846 | 5 | 471 | 330 |
3 | 7 | 569 | 166 | 964 | 3 | 520 | 352 |
4 | 4 | 431 | 126 | 548 | 3 | 516 | 423 |
5 | 3 | 432 | 106 | 379 | 1 | 499 | 499 |
6 | 0 | 428 | 82 | 296 | 1 | 532 | 415 |
7 | 0 | 334 | 72 | 135 | 0 | 381 | 198 |
8 | 0 | 219 | 39 | 146 | 0 | 229 | 111 |
9 | 0 | 182 | 29 | 99 | 0 | 178 | 115 |
10 | 0 | 106 | 16 | 82 | 0 | 135 | 78 |

**Table 4.** SGCI: the number of evolution chains for particular event type and particular chain length in the Facebook dataset.

Chain Length | Addition | Change Size | Constancy | Decay | Deletion | Merge | Split |
---|---|---|---|---|---|---|---|
2 | 23 | 1,137 | 416 | 840 | 32 | 298 | 281 |
3 | 17 | 854 | 286 | 1,078 | 18 | 295 | 211 |
4 | 23 | 680 | 202 | 714 | 20 | 247 | 208 |
5 | 8 | 623 | 160 | 624 | 23 | 204 | 189 |
6 | 11 | 541 | 134 | 499 | 11 | 215 | 164 |
7 | 13 | 457 | 139 | 425 | 11 | 195 | 161 |
8 | 14 | 434 | 118 | 389 | 11 | 170 | 178 |
9 | 9 | 394 | 99 | 438 | 19 | 168 | 153 |
10 | 5 | 324 | 87 | 426 | 16 | 147 | 136 |

**Table 5.** SGCI: the number of evolution chains for particular event type and particular chain length in the Salon24 dataset.

Chain Length | Addition | Change Size | Constancy | Decay | Deletion | Merge | Split |
---|---|---|---|---|---|---|---|
2 | 185 | 615 | 72 | 683 | 255 | 125 | 184 |
3 | 920 | 764 | 102 | 3,638 | 157 | 216 | 202 |
4 | 603 | 1,098 | 68 | 2,280 | 444 | 214 | 298 |
5 | 1,334 | 1,510 | 104 | 6,773 | 340 | 338 | 313 |
6 | 1,064 | 2,170 | 138 | 5,201 | 398 | 464 | 460 |
7 | 1,860 | 2,573 | 158 | 8,597 | 594 | 563 | 731 |
8 | 2,065 | 3,357 | 365 | 9,917 | 912 | 920 | 1,199 |
9 | 4,126 | 3,900 | 533 | 16,498 | 1,151 | 1,875 | 1,607 |

State | Chain 2 | Chain 3 | Chain 4 | Chain 5 | Chain 6 | Chain 7 | Chain 8 | Chain 9 | Chain 10 |
---|---|---|---|---|---|---|---|---|---|
n-1 | 3 | 14 | 20 | 17 | 21 | 24 | 19 | 18 | 17 |
n-2 | 1 | 7 | 22 | 9 | 9 | 15 | 12 | 7 | 5 |
n-3 | | 2 | 24 | 9 | 7 | 13 | 8 | 4 | 7 |
n-4 | | | 14 | 7 | 10 | 14 | 6 | 3 | 3 |
n-5 | | | | 3 | 9 | 14 | 7 | 6 | 5 |
n-6 | | | | | 2 | 6 | 5 | 3 | 2 |
n-7 | | | | | | 6 | 6 | 6 | 1 |
n-8 | | | | | | | 1 | 3 | 2 |
n-9 | | | | | | | | 1 | 0 |
n-10 | | | | | | | | | 3 |

Chain Length | DBLP | Facebook | Salon24 |
---|---|---|---|
2 | 20,324 | 8,655 | 26,619 |
3 | 2,480 | 3,618 | 25,136 |
4 | 729 | 2,401 | 160,059 |
5 | 281 | 1,838 | 163,723 |
6 | 135 | 1,434 | 107,554 * |
7 | 73 | 1,249 | 42,284 ** |
8 | 45 | 1,069 | – |
9 | 24 | 864 | – |
10 | 9 | 677 | – |

\* and \*\* denote that only 10% and 5%, respectively, of the total number of evolution chains were selected as input for the classifier.

**Table 8.** GED: the number of evolution chains for particular event type and particular chain length in the DBLP dataset.

Chain Length | Continuing | Dissolving | Growing | Merging | Shrinking | Splitting |
---|---|---|---|---|---|---|
2 | 1,063 | 16,875 | 1,075 | 135 | 977 | 199 |
3 | 233 | 1,557 | 285 | 69 | 229 | 107 |
4 | 73 | 337 | 119 | 41 | 128 | 31 |
5 | 26 | 113 | 51 | 15 | 56 | 20 |
6 | 8 | 39 | 33 | 15 | 29 | 11 |
7 | 4 | 16 | 18 | 6 | 21 | 8 |
8 | 3 | 9 | 12 | 5 | 12 | 4 |
9 | 1 | 9 | 6 | 3 | 4 | 1 |
10 | 1 | 2 | 0 | 1 | 4 | 1 |

**Table 9.** GED: the number of evolution chains for particular event type and particular chain length in the Facebook dataset.

Chain Length | Continuing | Dissolving | Growing | Merging | Shrinking | Splitting |
---|---|---|---|---|---|---|
2 | 915 | 4,842 | 826 | 359 | 916 | 797 |
3 | 410 | 1,193 | 512 | 257 | 642 | 604 |
4 | 263 | 587 | 379 | 209 | 483 | 480 |
5 | 191 | 388 | 300 | 160 | 399 | 400 |
6 | 153 | 272 | 262 | 160 | 322 | 265 |
7 | 129 | 205 | 218 | 124 | 259 | 314 |
8 | 124 | 177 | 190 | 109 | 250 | 219 |
9 | 89 | 176 | 149 | 121 | 166 | 163 |
10 | 69 | 121 | 116 | 97 | 135 | 139 |

**Table 10.** GED: the number of evolution chains for particular event type and particular chain length in the Salon24 dataset.

Chain Length | Continuing | Dissolving | Growing | Merging | Shrinking | Splitting |
---|---|---|---|---|---|---|
2 | 115 | 341 | 114 | 957 | 142 | 24,950 |
3 | 214 | 1,179 | 230 | 10,517 | 249 | 12,747 |
4 | 112 | 727 | 123 | 5,632 | 183 | 153,282 |
5 | 1,090 | 8,724 | 1,019 | 66,511 | 1,542 | 84,837 |
6* | 60 | 593 | 62 | 3,808 | 111 | 102,920 |
7** | 591 | 878 | 573 | 17,958 | 317 | 21,967 |

\* and \*\* denote that only 10% and 5%, respectively, of the total number of evolution chains were selected as input for the classifier.

**Table 11.** GED: the number of features selected for particular chain length for the Facebook dataset.

State | Chain 2 | Chain 3 | Chain 4 | Chain 5 | Chain 6 | Chain 7 | Chain 8 | Chain 9 | Chain 10 |
---|---|---|---|---|---|---|---|---|---|
n-1 | 12 | 21 | 23 | 10 | 16 | 26 | 12 | 14 | 14 |
n-2 | 6 | 25 | 22 | 8 | 9 | 15 | 5 | 9 | 7 |
n-3 | | 8 | 24 | 10 | 11 | 8 | 7 | 4 | 3 |
n-4 | | | 5 | 5 | 3 | 7 | 6 | 5 | 2 |
n-5 | | | | 0 | 7 | 7 | 6 | 1 | 0 |
n-6 | | | | | 0 | 3 | 6 | 2 | 0 |
n-7 | | | | | | 0 | 4 | 1 | 0 |
n-8 | | | | | | | 1 | 1 | 1 |
n-9 | | | | | | | | 0 | 0 |
n-10 | | | | | | | | | 0 |

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Saganowski, S.; Gliwa, B.; Bródka, P.; Zygmunt, A.; Kazienko, P.; Koźlak, J.
Predicting Community Evolution in Social Networks. *Entropy* **2015**, *17*, 3053-3096.
https://doi.org/10.3390/e17053053
