# SentiFlow: An Information Diffusion Process Discovery Based on Topic and Sentiment from Online Social Networks


## Abstract


## 1. Introduction

## 2. Framework

#### 2.1. SNS Log Collection

**Definition 1.**

In the example SNS log $L_{1}$ shown in Table 1, each comment is written as user_{comment}, where the user’s name is followed by the comment in subscript. Users and comments are ordered by the timestamp at which each comment was published. The example SNS log contains six posts and 23 comments written by five users: Angela, George, John, Paul, and Ringo.

#### 2.2. Information Flow among Communities

**Definition 2.**

**Definition 3.**

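For the example log $L_{1}$, the detected communities are $c_{1}$ = {John, Angela}, $c_{2}$ = {George, Ringo}, and $c_{3}$ = {Paul}. Table 2 presents the information diffusion matrix $A$ obtained from $L_{1}$ and the resulting state transition probability matrix $A'$. In addition, the initial state distribution is $\pi$ = (1, 0, 0), because $c_{1}$ always initiates the information diffusion (John, a member of $c_{1}$, writes the first comment on every post in Table 1). The size of community $c_{2}$ is greater because it receives more incoming information diffusion than the other communities: eight incoming transitions in Table 2, compared with five for $c_{3}$ and none for $c_{1}$. Moreover, the information diffusion between communities $c_{2}$ and $c_{3}$ is notable, since $c_{2}$ passes information only to $c_{3}$ and $c_{3}$ only to $c_{2}$.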
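
The counts in Table 2 can be reproduced with a short sketch. It assumes only the commenter order of each post in Table 1 and the community assignment stated above; $A'$ is obtained by normalizing each row of $A$, which is the maximum-likelihood step described in Section 3. The helper names are illustrative, not taken from the paper.

```python
from collections import Counter

# User order of each post in the example log L1 (Table 1); the comment texts
# are omitted because only the sequence of commenters matters for A and pi.
TRACES = {
    "p1": ["John", "Angela", "George"],
    "p2": ["John", "Ringo", "Paul", "George"],
    "p3": ["John", "Paul", "Ringo", "George"],
    "p4": ["John", "Paul", "Ringo", "George"],
    "p5": ["John", "Paul", "Ringo", "George"],
    "p6": ["John", "Ringo", "Paul", "George"],
}

# Communities from the text: c1 = {John, Angela}, c2 = {George, Ringo}, c3 = {Paul}.
COMMUNITY = {"John": 1, "Angela": 1, "George": 2, "Ringo": 2, "Paul": 3}
N = 3


def diffusion_counts(traces, community, n):
    """Count starting communities and between-community transitions (matrix A)."""
    starts = Counter()
    a = [[0] * n for _ in range(n)]
    for users in traces.values():
        starts[community[users[0]]] += 1
        for u, v in zip(users, users[1:]):
            ci, cj = community[u], community[v]
            if ci != cj:  # replies inside the same community are not counted as diffusion
                a[ci - 1][cj - 1] += 1
    return starts, a


def row_normalize(matrix):
    """Divide each row by its total; an all-zero row stays zero."""
    return [[x / sum(row) if sum(row) else 0.0 for x in row] for row in matrix]


starts, A = diffusion_counts(TRACES, COMMUNITY, N)
pi = [starts[i] / sum(starts.values()) for i in range(1, N + 1)]
A_prime = row_normalize(A)
print(A)        # [[0, 3, 3], [0, 0, 2], [0, 5, 0]]
print(A_prime)  # [[0.0, 0.5, 0.5], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(pi)       # [1.0, 0.0, 0.0]
```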

#### 2.3. Semantic Information Flow

**Definition 4.**

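Figure 3 shows the semantic information diffusion process model generated from log $L_{1}$; its topic matrix $B$ and observation symbol probability matrix $B'$ are listed in Table 3. The top eight keywords of the two topics are $t_{1}$ = {need, nice, demand, happy, need demand, need persistent, ok, everything fine} and $t_{2}$ = {think, wow perfect, pitiful, better, better reform, cool, everything, excellent}.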
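
Table 3(b) follows from Table 3(a) by normalizing each community's topic counts. The sketch below assumes that maximum-likelihood estimate (as Section 3 states for all three matrices) and also reports each community's most frequent topic; the variable names are illustrative.

```python
# Topic matrix B from Table 3(a): rows c1..c3, columns t1, t2.
B = [[4, 3], [3, 8], [4, 1]]


def row_normalize(matrix):
    """Maximum-likelihood estimate of P(t_m | c_i): divide each row by its total."""
    return [[x / sum(row) for x in row] for row in matrix]


B_prime = [[round(p, 2) for p in row] for row in row_normalize(B)]
# 1-based index of the most frequent topic in each community.
main_topic = [row.index(max(row)) + 1 for row in B]

print(B_prime)     # [[0.57, 0.43], [0.27, 0.73], [0.8, 0.2]]
print(main_topic)  # [1, 2, 1]
```

The result matches Table 3(b) and shows that $c_{2}$ mainly comments on $t_{2}$ while $c_{1}$ and $c_{3}$ mainly comment on $t_{1}$.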

#### 2.4. Sentimental Information Flow

**Definition 5.**

**Definition 6.**

- $\pi = (\pi_{i})$ for 1 ≤ i ≤ N, where $\pi_{i}$ is the probability of being in $c_{i}$ at time 1.
- $C = \{c_{1}, \dots, c_{N}\}$ is the set of information diffusion states of $L$.
- $T = \{t_{1}, \dots, t_{M}\}$ is the set of observed topics of $L$.
- $A' = (a'_{ij})$ for 1 ≤ i, j ≤ N, where $a'_{ij}$ is the probability of state transition from $c_{i}$ to $c_{j}$.
- $B' = (b'_{im})$ for 1 ≤ i ≤ N, 1 ≤ m ≤ M, where $b'_{im}$ is the probability of observing $t_{m}$ in state $c_{i}$.
- $D' = (d'_{imr})$ for 1 ≤ i ≤ N, 1 ≤ m ≤ M, 1 ≤ r ≤ 3, where $d'_{imr}$ is the probability of observing sentiment $s_{r}$ for topic $t_{m}$ in state $c_{i}$.
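
The components of Definition 6 can be mirrored by a small container type that checks the shapes and that each probability row sums to one. This is a sketch: the class name `SentiFlowModel`, its field names, and the rounding tolerance are assumptions for illustration, not part of the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentiFlowModel:
    """Container for Lambda(L) = (pi, C, T, A', B', D') as listed in Definition 6."""
    pi: List[float]                   # pi[i]: probability of starting in c_i
    communities: List[str]            # C = {c_1, ..., c_N}
    topics: List[str]                 # T = {t_1, ..., t_M}
    a_prime: List[List[float]]        # a_prime[i][j]: transition probability c_i -> c_j
    b_prime: List[List[float]]        # b_prime[i][m]: probability of topic t_m in c_i
    d_prime: List[List[List[float]]]  # d_prime[i][m][r]: probability of sentiment s_r for t_m in c_i

    def validate(self, tol: float = 0.02) -> None:
        """Raise ValueError on a wrong shape or a row that does not sum to about 1.

        All-zero rows are accepted (a community that never passes information on),
        and the tolerance absorbs rounded published values such as 0.33 * 3 = 0.99.
        """
        n, m = len(self.communities), len(self.topics)

        def check(row: List[float], size: int, name: str) -> None:
            if len(row) != size:
                raise ValueError(f"{name}: expected {size} entries, got {len(row)}")
            total = sum(row)
            if total != 0.0 and abs(total - 1.0) > tol:
                raise ValueError(f"{name}: row sums to {total:.3f}")

        check(self.pi, n, "pi")
        if not (len(self.a_prime) == len(self.b_prime) == len(self.d_prime) == n):
            raise ValueError("A', B', and D' need one row block per community")
        for i in range(n):
            check(self.a_prime[i], n, f"A'[{i}]")
            check(self.b_prime[i], m, f"B'[{i}]")
            if len(self.d_prime[i]) != m:
                raise ValueError(f"D'[{i}]: expected {m} topic rows")
            for t in range(m):
                check(self.d_prime[i][t], 3, f"D'[{i}][{t}]")


# Usage with a hypothetical two-community, one-topic model (toy values).
toy = SentiFlowModel(
    pi=[1.0, 0.0], communities=["c1", "c2"], topics=["t1"],
    a_prime=[[0.0, 1.0], [1.0, 0.0]], b_prime=[[1.0], [1.0]],
    d_prime=[[[0.67, 0.0, 0.33]], [[0.5, 0.0, 0.5]]],
)
toy.validate()  # passes silently
```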

Table 4 lists the sentiment matrix $D$ and the sentiment probability matrix $D'$ extracted from log $L_{1}$. In the resulting model, the color of the arc from a community to a topic represents the type of sentiment: red for negative, lime for neutral, and blue for positive, so each arc's color reflects its mixture of sentiments. For example, the arc from $c_{1}$ to $t_{1}$ is bluish, representing predominantly positive comments (0.75) compared with negative comments (0.25). In contrast, the arc from $c_{3}$ to $t_{1}$ shows predominantly negative comments (0.75), with 0.25 positive comments. The arc from community $c_{2}$ to topic $t_{2}$ shows a mixture of sentiments, with 0.5 for both positive and negative comments.
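
The sentiment probabilities behind these arcs can be computed by normalizing each row of Table 4(a). The sketch below also blends the three arc colors named in the text according to $D'$; that blending rule and the RGB values are a hypothetical rendering choice, not the paper's visualization code, and reading $s_{1}$, $s_{2}$, $s_{3}$ as positive, neutral, negative is inferred from the example values above.

```python
# Sentiment counts D from Table 4(a), keyed by (community, topic). The columns
# s1, s2, s3 are read here as positive, neutral, negative, inferred from the text.
D = {
    ("c1", "t1"): [3, 0, 1],
    ("c1", "t2"): [2, 0, 1],
    ("c2", "t1"): [1, 1, 1],
    ("c2", "t2"): [4, 0, 4],
    ("c3", "t1"): [1, 0, 3],
}

# Arc colors named in the text, as RGB: blue = positive, lime = neutral, red = negative.
COLORS = [(0, 0, 255), (0, 255, 0), (255, 0, 0)]


def sentiment_probs(counts):
    """One row of D': the sentiment counts divided by their total."""
    total = sum(counts)
    return [c / total for c in counts]


def arc_color(probs):
    """Hypothetical rendering rule (not from the paper): blend the three colors by D'."""
    return tuple(round(sum(p * color[k] for p, color in zip(probs, COLORS))) for k in range(3))


D_prime = {key: sentiment_probs(counts) for key, counts in D.items()}
print(arc_color(D_prime[("c1", "t1")]))  # (64, 0, 191), bluish
print(arc_color(D_prime[("c3", "t1")]))  # (191, 0, 64), reddish
print(arc_color(D_prime[("c2", "t2")]))  # (128, 0, 128), purple
```

An even positive/negative split blends to purple, which is how divided opinions appear in the models of Section 4.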

The information diffusion between $c_{2}$ and $c_{3}$ concerned both topics $t_{1}$ and $t_{2}$, although $c_{2}$ mainly focused on $t_{2}$ and $c_{3}$ mainly focused on $t_{1}$. The second question is how sentiment is shared in the information diffusion process from a probabilistic perspective. As an example, consider the information diffusion from community $c_{1}$ to $c_{2}$ for topic $t_{2}$. The sequence <positive, positive> can be analyzed using the forward algorithm [22], and its probability is P(<positive, positive>|Λ) = 1.0 × 0.67 × 0.5 × 0.5 = 0.1675: the initial probability of $c_{1}$, the probability that $c_{1}$ comments positively on $t_{2}$, the transition probability from $c_{1}$ to $c_{2}$, and the probability that $c_{2}$ comments positively on $t_{2}$. The sequence <neutral, neutral> has probability 0, and P(<negative, negative>|Λ) = 1.0 × 0.33 × 0.5 × 0.5 = 0.0825. Therefore, community $c_{2}$ is more likely to respond positively to a positive comment from $c_{1}$ than negatively to a negative one. Because $c_{2}$ is equally likely to be positive or negative on $t_{2}$ (0.5 each), the difference comes from $c_{1}$, which posts positive comments on $t_{2}$ with probability 0.67 versus 0.33 for negative ones.
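
The arithmetic above is the product of probabilities along one fixed community path. The sketch hard-codes the initial distribution and the values of Tables 2 and 4 used in the example; the dictionary layout and the function name `path_probability` are illustrative. A full forward recursion [22] would sum such products over every community path that could emit the sequence, whereas the worked example evaluates only the path $c_{1}$ → $c_{2}$.

```python
# Parameters of the L1 model that the example uses (topic t2 only):
# pi from Section 2.2, A' from Table 2(b), D' rows for t2 from Table 4(b).
PI = {"c1": 1.0, "c2": 0.0, "c3": 0.0}
A_PRIME = {("c1", "c2"): 0.5, ("c1", "c3"): 0.5, ("c2", "c3"): 1.0, ("c3", "c2"): 1.0}
D_PRIME_T2 = {  # (positive, neutral, negative), rounded as in Table 4(b)
    "c1": (0.67, 0.00, 0.33),
    "c2": (0.50, 0.00, 0.50),
}
SENTIMENT = {"positive": 0, "neutral": 1, "negative": 2}


def path_probability(path, sentiments):
    """Joint probability that the community path emits the sentiment sequence.

    P = pi(c_1) * d'(c_1, s_1) * prod_{k>1} a'(c_{k-1}, c_k) * d'(c_k, s_k)
    """
    first = path[0]
    p = PI[first] * D_PRIME_T2[first][SENTIMENT[sentiments[0]]]
    for prev, cur, s in zip(path, path[1:], sentiments[1:]):
        p *= A_PRIME.get((prev, cur), 0.0) * D_PRIME_T2[cur][SENTIMENT[s]]
    return p


for s in ("positive", "neutral", "negative"):
    print(s, round(path_probability(["c1", "c2"], [s, s]), 4))
# positive 0.1675 / neutral 0.0 / negative 0.0825
```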

## 3. Algorithm

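The second loop counts the initial communities $\pi$, the information diffusion matrix $A$, the topic matrix $B$, and the sentiment matrix $D = (d_{imr})$ (Lines 9–26). In particular, if two adjacent users belong to the same community, the algorithm does not count the transition in the information diffusion matrix, and the last SNS event of each trace is counted separately for its topic and sentiment. Afterwards, the state transition probability matrix $A'$, the observation symbol probability matrix $B'$, and the sentiment probability matrix $D'$ are calculated from $A$, $B$, and $D$ by maximum likelihood (ML) estimation, that is, by normalizing each row of counts. Finally, the algorithm returns a SentiFlow model, $\Lambda(L) = (\pi, C, T, A', B', D')$.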

Algorithm 1. SentiFlow | |
---|---|
1: | Input: SNS log $L = [\sigma]$, which is a multi-set of action traces $\sigma$ in the SNS. |
2: | Output: A SentiFlow model, $\Lambda(L) = (\pi, C, T, A', B', D')$ |
3: | Insert all users in $L$ into a user set $U$. |
4: | Detect communities $C$ from users $U$, and prepare a function $c = community(\#u(e))$. |
5: | For each trace $\sigma = \langle e_{1}, \dots, e_{H} \rangle$ in $L$ Do |
6: | Discover topics $T$ from user comments, and prepare a function $t = lda(\#user\_comment(e))$. |
7: | Discover sentiments $S$, and prepare a function $s = nbsc(\#user\_comment(e))$. |
8: | End For |
9: | For each trace $\sigma = \langle e_{1}, \dots, e_{H} \rangle$ in $L$ Do |
10: | If $e_{1}$ Then |
11: | Increase $\pi_{i}$ by 1, where $c_{i} = community(\#u(e_{1}))$. |
12: | End If |
13: | For each adjacent SNS event pair $(e_{h}, e_{h+1})$ in $\sigma$ for 1 ≤ h ≤ H − 1 Do |
14: | $c_{i} = community(\#u(e_{h}))$ and $c_{j} = community(\#u(e_{h+1}))$. |
15: | $t_{m} = lda(\#user\_comment(e_{h}))$ and $s_{r} = nbsc(\#user\_comment(e_{h}))$. |
16: | If $c_{i} \neq c_{j}$ Then |
17: | Increase $a_{ij}$ in $A$ by 1. |
18: | End If |
19: | Increase $b_{im}$ in $B$ by 1. |
20: | Increase $d_{imr}$ in $D$ by 1. |
21: | If $e_{h+2}$ = null Then |
22: | $t_{m} = lda(\#user\_comment(e_{h+1}))$, and increase $b_{jm}$ in $B$ by 1. |
23: | $s_{r} = nbsc(\#user\_comment(e_{h+1}))$, and increase $d_{jmr}$ in $D$ by 1. |
24: | End If |
25: | End For |
26: | End For |
27: | Calculate the state transition probability matrix $A' = (a'_{ij})$ based on $A = (a_{ij})$. |
28: | Calculate the observation symbol probability matrix $B' = (b'_{im})$ based on the topic matrix $B = (b_{im})$. |
29: | Calculate the sentiment probability matrix $D' = (d'_{imr})$ based on the sentiment matrix $D = (d_{imr})$. |
30: | Return a SentiFlow model, $\Lambda(L) = (\pi, C, T, A', B', D')$. |
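
The counting loop of Algorithm 1 can be sketched in Python as follows; the comments refer to the line numbers of the listing. Community detection, LDA, and the `nbsc` sentiment classifier are not reimplemented: they are passed in as hypothetical callables (`community`, `topic`, `sentiment`) returning 0-based integer indices. The usage plugs in trivial stand-ins that put every comment into one topic and one sentiment class, which is enough to check the counts against the example log $L_{1}$.

```python
from typing import Callable, List, Sequence, Tuple

Event = Tuple[str, str]  # (user, comment text)


def sentiflow_counts(traces: Sequence[List[Event]],
                     community: Callable[[str], int],  # user -> index in [0, n)
                     topic: Callable[[str], int],      # comment -> index in [0, m); stands in for lda()
                     sentiment: Callable[[str], int],  # comment -> 0, 1, 2; stands in for nbsc()
                     n: int, m: int):
    """Counting loop of Algorithm 1 (Lines 9-26), using 0-based indices."""
    pi = [0] * n
    a = [[0] * n for _ in range(n)]
    b = [[0] * m for _ in range(n)]
    d = [[[0] * 3 for _ in range(m)] for _ in range(n)]
    for trace in traces:
        if not trace:
            continue
        pi[community(trace[0][0])] += 1                     # Lines 10-12
        # As in the listing, a one-event trace only updates pi.
        for h in range(len(trace) - 1):                      # Line 13
            (u, text), (v, next_text) = trace[h], trace[h + 1]
            ci, cj = community(u), community(v)              # Line 14
            tm, sr = topic(text), sentiment(text)            # Line 15
            if ci != cj:                                     # Lines 16-18
                a[ci][cj] += 1
            b[ci][tm] += 1                                   # Line 19
            d[ci][tm][sr] += 1                               # Line 20
            if h + 2 == len(trace):                          # Lines 21-24: count the last event
                tm, sr = topic(next_text), sentiment(next_text)
                b[cj][tm] += 1
                d[cj][tm][sr] += 1
    return pi, a, b, d


def normalize(row: List[int]) -> List[float]:
    total = sum(row)
    return [x / total if total else 0.0 for x in row]


def estimate(pi, a, b, d):
    """Lines 27-29: maximum-likelihood estimates obtained by normalizing counts."""
    return (normalize(pi), [normalize(r) for r in a], [normalize(r) for r in b],
            [[normalize(r) for r in rows] for rows in d])


# Usage on the example log L1 (Table 1) with trivial stand-ins for lda() and nbsc().
L1 = [
    [("John", "I like it"), ("Angela", "This is amazing!"), ("George", "I think this is absurd")],
    [("John", "We need to be persistent"), ("Ringo", "I think this is very aggressive"),
     ("Paul", "I am ashamed"), ("George", "We need to demand our rights!")],
    [("John", "It's better if we reform the laws"), ("Paul", "I am relaxed"),
     ("Ringo", "This is a revolution"), ("George", "pitiful")],
    [("John", "Wow this is perfect"), ("Paul", "That is bad"), ("Ringo", "Nice"), ("George", "Excellent")],
    [("John", "Too much stress"), ("Paul", "I am afraid"), ("Ringo", "Superficial"),
     ("George", "It is ok everything will be fine")],
    [("John", "Terrorism"), ("Ringo", "I am so tired"), ("Paul", "I am so happy"), ("George", "Cool")],
]
members = {"John": 0, "Angela": 0, "George": 1, "Ringo": 1, "Paul": 2}
pi, A, B, D = sentiflow_counts(L1, members.__getitem__, lambda c: 0, lambda c: 0, n=3, m=1)
print(pi)  # [6, 0, 0]
print(A)   # [[0, 3, 3], [0, 0, 2], [0, 5, 0]]
print(B)   # [[7], [11], [5]], the row totals of Table 3(a)
```

Every one of the 23 comments is counted exactly once in $B$, and $A$ matches Table 2 regardless of how topics and sentiments are assigned.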

## 4. Experiments

The six communities detected from the CNN Facebook page, $c_{1}$ to $c_{6}$, contain 203, 1048, 13, 9, 25, and 121 users, respectively, for a total of 1419 users.

Community $c_{2}$ concentrates most of the information flows from $c_{1}$, $c_{3}$, $c_{4}$, $c_{5}$, and $c_{6}$; its larger size reflects the high incoming information flow from the smaller communities as well as its number of user comments. Moreover, the information flow from $c_{3}$ to $c_{2}$ has the highest information diffusion probability among all communities. The process model in Figure 6 is visualized with an information diffusion probability threshold of 0.04: arcs with a lower probability are removed so that the process model remains readable.
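
The threshold step can be sketched as a filter over the arc probabilities. Keeping arcs whose probability equals the threshold is an assumption (the text only says that arcs with lower probability are removed), the helper names are illustrative, and the probabilities in the usage are hypothetical. The second helper reports communities left without any incoming arc, which is how a community can appear isolated in a pruned model.

```python
from typing import Dict, Set, Tuple

Arc = Tuple[str, str]  # (source community, target community)


def prune_arcs(probs: Dict[Arc, float], threshold: float = 0.04) -> Dict[Arc, float]:
    """Keep only arcs whose diffusion probability reaches the display threshold."""
    return {arc: p for arc, p in probs.items() if p >= threshold}


def lost_incoming(probs: Dict[Arc, float], kept: Dict[Arc, float]) -> Set[str]:
    """Communities that had incoming arcs before pruning but none afterwards."""
    before = {dst for _, dst in probs}
    after = {dst for _, dst in kept}
    return before - after


# Hypothetical probabilities, for illustration only.
probs = {("c1", "c2"): 0.62, ("c3", "c2"): 0.71, ("c2", "c4"): 0.03, ("c6", "c4"): 0.01}
kept = prune_arcs(probs)
print(sorted(kept))                # [('c1', 'c2'), ('c3', 'c2')]
print(lost_incoming(probs, kept))  # {'c4'}
```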

Topics $t_{2}$ and $t_{3}$ share two keywords, “people” and “Trump”. The word “Trump” appears in $t_{1}$, $t_{2}$, $t_{3}$, and $t_{4}$ with notable importance across the mixture of topics, so each topic is distinguished individually by its remaining keywords.

Topic $t_{2}$ has greater importance because it has more thick arcs connecting it to communities than the other topics do. Communities $c_{1}$, $c_{3}$, and $c_{4}$ frequently use keywords of topics $t_{2}$ and $t_{3}$. As in the previous step, the threshold used to present the information flow is 0.04.

In the sentimental information flow of Figure 9, a predominantly negative opinion can be observed from $c_{3}$ toward $t_{2}$. Conversely, a predominantly positive opinion, shown in a bluish color, can be observed from $c_{3}$ toward $t_{3}$. The arc from $c_{5}$ toward $t_{2}$ shows a mixture of opinions, with a relative balance between positive and negative. In general, neutral opinions have a lower probability than positive and negative opinions.

Across the communities in Figure 10, the most discussed topics are $t_{2}$ and $t_{3}$, followed by $t_{1}$ and $t_{4}$, and finally $t_{5}$ as the least discussed. In addition, community $c_{5}$ shows a similar level of interest in topics $t_{1}$, $t_{3}$, $t_{4}$, and $t_{5}$. Furthermore, topic $t_{2}$ is the most commented upon in every community except $c_{1}$, which focuses on topic $t_{3}$.

In the SentiFlow models for each topic (Figure 11), $c_{2}$ remains the largest community and concentrates most of the information flows from $c_{1}$, $c_{3}$, $c_{4}$, $c_{5}$, and $c_{6}$ across the different topics. Figure 11a shows the SentiFlow model for topic $t_{1}$, whose flow differs from that in Figure 9: community $c_{3}$ has no initial probability and diffuses information to $c_{4}$ with probability 0.0769. Additionally, $c_{3}$ and $c_{4}$ hold predominantly negative opinions, in contrast to $c_{1}$, $c_{2}$, $c_{5}$, and $c_{6}$, whose opinions are positive. In Figure 11b, the initial distribution $\pi$ assigns community $c_{4}$ an initial probability of 0.0110, which does not appear in the other SentiFlow models. In addition, $c_{2}$, $c_{4}$, and $c_{5}$ are colored purple, indicating that their opinions are divided between negative and positive, whereas $c_{6}$ shows a predominantly positive opinion and $c_{3}$ a predominantly negative one.

Although community $c_{1}$ comments mostly on topic $t_{3}$ (Figure 10a), $c_{1}$ appears smaller than $c_{2}$ in Figure 11c because $c_{2}$ has more user comments than $c_{1}$. Positive opinions are expressed in $c_{1}$, $c_{2}$, $c_{3}$, and $c_{6}$, whereas negative opinions are expressed in $c_{4}$, and mixed opinions are expressed by users from $c_{5}$. In Figures 11d and 11e, information flows among all communities except $c_{4}$, which shows no incoming information flow because its incoming probabilities are below the 0.04 threshold. In Figure 11d, almost all communities show a predominantly positive opinion, although $c_{5}$ has a combination of positive and negative opinions. In the sentiment information diffusion for topic $t_{5}$ (Figure 11e), $c_{3}$, $c_{4}$, and $c_{6}$ have a positive opinion, in contrast to $c_{5}$ with negative comments and to $c_{1}$ and $c_{2}$ with a balanced opinion between positive and negative.

For topic $t_{3}$, community $c_{4}$ communicates with community $c_{1}$ with a probability of 0.3103, and $c_{1}$ responds to $c_{4}$ with a probability of 0.0672; this exchange is not observed in the other information diffusion flows. To answer the second question, about how sentiments are shared from a probabilistic view, we analyze the information diffusion from community $c_{2}$ to community $c_{4}$ for topic $t_{1}$ shown in Figure 11a. Taking the sequence of communities <$c_{2}$, $c_{3}$, $c_{4}$> and the sequence of sentiments <positive, positive, positive>, the probability of the sequence is P(<positive, positive, positive>|Λ) = 0.9451 × 0.53 × 0.0486 × 0.38 × 0.0769 × 0.4 = $2.8455 \times 10^{-4}$. In contrast, for the communication from community $c_{2}$ to community $c_{6}$ with the sentiment sequence <positive, positive>, the probability is P(<positive, positive>|Λ) = 0.9451 × 0.53 × 0.4649 × 0.64 = 0.1490. The difference arises because $c_{2}$ can reach $c_{4}$ only through $c_{3}$, and this intermediary step lowers the probability of the all-positive sequence, whereas the information diffusion from $c_{2}$ to $c_{6}$ does not need an intermediary community.
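
Written symbolically in the notation of Definition 6, and reading the factors by analogy with the worked example in Section 2.4 (an initial probability followed by alternating sentiment and transition probabilities for topic $t_{1}$), the two probabilities above decompose as follows. Only $a'_{34}$ = 0.0769 is identified explicitly in the text; the other factor labels are an inference from that pattern.

```latex
\begin{aligned}
P(\langle \text{pos}, \text{pos}, \text{pos} \rangle \mid \Lambda)
  &= \pi_{2}\, d'_{2,1,\text{pos}}\; a'_{23}\, d'_{3,1,\text{pos}}\; a'_{34}\, d'_{4,1,\text{pos}} \\
  &= 0.9451 \times 0.53 \times 0.0486 \times 0.38 \times 0.0769 \times 0.4 \approx 2.8455 \times 10^{-4}, \\
P(\langle \text{pos}, \text{pos} \rangle \mid \Lambda)
  &= \pi_{2}\, d'_{2,1,\text{pos}}\; a'_{26}\, d'_{6,1,\text{pos}}
   = 0.9451 \times 0.53 \times 0.4649 \times 0.64 \approx 0.1490.
\end{aligned}
```

Both products share the prefix $\pi_{2}\, d'_{2,1,\text{pos}}$; the indirect path multiplies in two transition probabilities and two sentiment probabilities instead of one of each, which is why it is roughly 500 times less likely.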

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Zafarani, R.; Abbasi, M.A.; Liu, H. Social Media Mining: An Introduction, 1st ed.; Cambridge University Press: Cambridge, UK, 2014; ISBN 1107018854.
- Guille, A.; Hacid, H.; Favre, C.; Zighed, D.A. Information diffusion in online social networks: A survey. SIGMOD Rec. **2013**, 42, 17–28.
- Grabowicz, P.A.; Ramasco, J.J.; Moro, E.; Pujol, J.M.; Eguiluz, V.M. Social features of online networks: The strength of intermediary ties in online social media. PLoS ONE **2012**, 7.
- Tafti, A.; Zotti, R.; Jank, W. Real-time diffusion of information on Twitter and the financial markets. PLoS ONE **2016**, 11, e0159226.
- Zhang, X.; Han, D-D.; Yang, R.; Zhang, Z. Users’ participation and social influence during information spreading on Twitter. PLoS ONE **2017**, 12.
- Jafari, S.; Navidi, H. A Game-Theoretic Approach for Modeling Competitive Diffusion over Social Networks. Games **2018**, 9, 8.
- Kim, K.; Jung, J.-Y.; Park, J. Discovery of information diffusion process in social networks. IEICE Trans. Inf. Syst. **2012**, 95, 1539–1542.
- Kim, K.; Obregon, J.; Jung, J.-Y. Analyzing information flow and context for Facebook fan pages. IEICE Trans. Inf. Syst. **2014**, 97, 811–814.
- Ullah, F.; Lee, S. Social Content Recommendation Based on Spatial-Temporal Aware Diffusion Modeling in Social Networks. Symmetry **2016**, 8, 89.
- Kimura, M.; Saito, K.; Nakano, R.; Motoda, H. Extracting influential nodes on a social network for information diffusion. Data Min. Knowl. Discov. **2010**, 20, 70.
- Li, D.; Zhang, S.; Sun, X.; Zhou, H.; Li, S.; Li, X. Modeling information diffusion over social networks for temporal dynamic prediction. IEEE Trans. Knowl. Data Eng. **2017**, 29, 1985–1997.
- Kim, M.; Newth, D.; Christen, P. Modeling dynamics of diffusion across heterogeneous social networks: News diffusion in social media. Entropy **2013**, 15, 4215–4242.
- Li, M.; Wang, X.; Gao, K.; Zhang, S. A Survey on information diffusion in online social networks: Models and methods. Information **2017**, 8, 118.
- Zhao, J.; Dong, L.; Wu, J.; Xu, K. Moodlens: An emoticon-based sentiment analysis system for Chinese tweets. In Proceedings of the 18th ACM SIGKDD, Beijing, China, 12–16 August 2012; pp. 1528–1531.
- Fan, R.; Zhao, J.; Chen, Y.; Xu, K. Anger is more influential than joy: Sentiment correlation in Weibo. PLoS ONE **2014**, 9, e110184.
- Kramer, A.D.; Guillory, J.E.; Hancock, J.T. Experimental evidence of massive-scale emotional contagion through social networks. Proc. Natl. Acad. Sci. USA **2014**, 111, 8788–8790.
- Vitale, P.; Guarasci, R.; Iannotta, I.S. Visualizing research topics in Facebook conversations. Proceedings **2017**, 1, 895.
- Maynard, D.; Gossen, G.; Funk, A.; Fisichella, M. Should I care about your opinion? Detection of opinion interestingness and dynamics in social media. Future Internet **2014**, 6, 457–481.
- Zeng, F.; Zhao, N.; Li, W. Effective social relationship measurement and cluster based routing in mobile opportunistic networks. Sensors **2017**, 17, 1109.
- Hernandez-Suarez, A.; Sanchez-Perez, G.; Toscano-Medina, K.; Martinez-Hernandez, V.; Perez-Meana, H.; Olivares-Mercado, J.; Sanchez, V. Social sentiment sensor in Twitter for predicting cyber-attacks using ℓ1 regularization. Sensors **2018**, 18, 1380.
- Ren, G.; Hong, T. Investigating Online destination images using a topic-based sentiment analysis approach. Sustainability **2017**, 9, 1765.
- Bishop, C.M. Pattern Recognition and Machine Learning, 1st ed.; Springer: New York, NY, USA, 2006; ISBN 9780387310732.
- Van der Aalst, W.M.P. Process Mining: Data Science in Action, 2nd ed.; Springer: Berlin, Germany, 2016; ISBN 9783662498507.
- Carrera, B.; Lee, J.; Jung, J.-Y. Discovering information diffusion processes based on hidden Markov models for social network services. In Proceedings of the Asia-Pacific Conference BPM, Busan, Korea, 24–26 June 2015; pp. 170–182.
- Carrera, B.; Lee, J.; Jung, J.-Y. Discovery of gatekeepers on information diffusion flows using process mining. Int. J. Ind. Eng. **2016**, 23, 253–269.
- Newman, M. Networks: An Introduction, 1st ed.; Oxford University Press: Oxford, UK, 2010; ISBN 9780199206650.
- Blei, D.M. Probabilistic topic models. Commun. ACM **2012**, 55, 77–84.
- Jurka, T. Sentiment: Tools for Sentiment Analysis. R Package Version 02. 2012. Available online: https://github.com/timjurka/sentiment (accessed on 6 March 2018).
- Liu, B. Sentiment Analysis and Opinion Mining; Morgan & Claypool Publishers: San Rafael, CA, USA, 2012.

**Figure 3.** The semantic information diffusion process model generated from log $L_{1}$. $t_{1}$ and $t_{2}$ are the topics that were discovered from the texts exchanged among users.

**Figure 9.** The sentimental information flow based on polarity for topics generated from the CNN Facebook page.

**Figure 10.** Sentiment analysis across user opinions by each topic among six communities: (**a**) community $c_{1}$; (**b**) community $c_{2}$; (**c**) community $c_{3}$; (**d**) community $c_{4}$; (**e**) community $c_{5}$; and (**f**) community $c_{6}$.

**Figure 11.** A SentiFlow model for each topic: (**a**) topic $t_{1}$; (**b**) topic $t_{2}$; (**c**) topic $t_{3}$; (**d**) topic $t_{4}$; and (**e**) topic $t_{5}$.

**Table 1.** An example SNS log $L_{1}$. An SNS log contains many action traces, which are sequences of comments replying to specific posts.

Post ID | Action Trace |
---|---|
$p_{1}$ | John _{I like it}, Angela _{This is amazing!}, George _{I think this is absurd} |
$p_{2}$ | John _{We need to be persistent}, Ringo _{I think this is very aggressive}, Paul _{I am ashamed}, George _{We need to demand our rights!} |
$p_{3}$ | John _{It’s better if we reform the laws}, Paul _{I am relaxed}, Ringo _{This is a revolution}, George _{pitiful} |
$p_{4}$ | John _{Wow this is perfect}, Paul _{That is bad}, Ringo _{Nice}, George _{Excellent} |
$p_{5}$ | John _{Too much stress}, Paul _{I am afraid}, Ringo _{Superficial}, George _{It is ok everything will be fine} |
$p_{6}$ | John _{Terrorism}, Ringo _{I am so tired}, Paul _{I am so happy}, George _{Cool} |

**Table 2.** Matrices extracted from $L_{1}$: (a) information diffusion matrix $A$; and (b) state transition probability matrix $A'$.

From/To | (a) $c_{1}$ | (a) $c_{2}$ | (a) $c_{3}$ | (b) $c_{1}$ | (b) $c_{2}$ | (b) $c_{3}$ |
---|---|---|---|---|---|---|
$c_{1}$ | 0 | 3 | 3 | 0.0 | 0.5 | 0.5 |
$c_{2}$ | 0 | 0 | 2 | 0.0 | 0.0 | 1.0 |
$c_{3}$ | 0 | 5 | 0 | 0.0 | 1.0 | 0.0 |

**Table 3.** Matrices extracted from log $L_{1}$ for constructing a semantic information diffusion process model: (a) topic matrix $B$; and (b) observation symbol probability matrix $B'$.

Community | (a) $t_{1}$ | (a) $t_{2}$ | (b) $t_{1}$ | (b) $t_{2}$ |
---|---|---|---|---|
$c_{1}$ | 4 | 3 | 0.57 | 0.43 |
$c_{2}$ | 3 | 8 | 0.27 | 0.73 |
$c_{3}$ | 4 | 1 | 0.80 | 0.20 |

**Table 4.** Matrices extracted from log $L_{1}$ for sentiment annotation: (a) sentiment matrix $D$; and (b) sentiment probability matrix $D'$ ($s_{1}$: positive; $s_{2}$: neutral; $s_{3}$: negative).

Community | Topic | (a) $s_{1}$ | (a) $s_{2}$ | (a) $s_{3}$ | (b) $s_{1}$ | (b) $s_{2}$ | (b) $s_{3}$ |
---|---|---|---|---|---|---|---|
$c_{1}$ | $t_{1}$ | 3 | 0 | 1 | 0.75 | 0.00 | 0.25 |
$c_{1}$ | $t_{2}$ | 2 | 0 | 1 | 0.67 | 0.00 | 0.33 |
$c_{2}$ | $t_{1}$ | 1 | 1 | 1 | 0.33 | 0.33 | 0.33 |
$c_{2}$ | $t_{2}$ | 4 | 0 | 4 | 0.50 | 0.00 | 0.50 |
$c_{3}$ | $t_{1}$ | 1 | 0 | 3 | 0.25 | 0.00 | 0.75 |

Topic | Top 8 Keywords |
---|---|
$t_{1}$ | money, Trump, troll, pay, make, blah, need, wall |
$t_{2}$ | people, like, get, would, Trump, one, go, women |
$t_{3}$ | Trump, Obama, president, war, Syria, world, people, us |
$t_{4}$ | rice, Susan, Trump, Susan Rice, CNN, Obama, Russia, story |
$t_{5}$ | CNN, news, fake, fake news, Fox, Clinton, lol, lemon |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Carrera, B.; Jung, J.-Y.
SentiFlow: An Information Diffusion Process Discovery Based on Topic and Sentiment from Online Social Networks. *Sustainability* **2018**, *10*, 2731.
https://doi.org/10.3390/su10082731
