# Modeling, Evaluating, and Applying the eWoM Power of Reddit Posts


## Abstract


## 1. Introduction

## 2. Related Literature

## 3. Dataset Description

## 4. Modeling and Evaluating the eWoM Power of Reddit Posts

#### 4.1. Determining the Support Theoretical Tools

`num_comments`, `num_crossposts`, `score`, `total_awards`, and `title` (evaluated through its sentiment polarity); (ii) for authors: `karma` and `created_utc`; (iii) for subreddits: `subscribers` and `description` (evaluated through its sentiment polarity). To decide whether these candidate fields can contribute to the eWoM Power of a post and, if so, how they contribute, we studied the distribution of the posts, authors, and subreddits of our dataset against all these fields. This study is essential to define a function that models this contribution. Suppose, for instance, that for a candidate field, say the `score` of posts, the distribution follows a power law. This distribution describes the relationship between two variables, where one variable is proportional to a power of the other. It can be expressed as $y=k\cdot {(x+\delta )}^{\alpha}$. In this formula, the exponent $\alpha $ is a measure of the steepness of the power law: the higher $\alpha $ is, the steeper the curve. Instead, $\delta $ is an offset parameter that shifts the distribution along the x-axis. When $\delta $ is positive, the distribution is shifted to the right; when $\delta $ is negative, the distribution is shifted to the left. Power law distributions are characterized by a long tail of rare events, meaning that the distribution has a high frequency of small values and a low frequency of large values. In our scenario, taking the `score` parameter into consideration, this means that many posts have a low score, while a few posts have a high score. Therefore, with this type of distribution, achieving a high score is extremely difficult for a post. On the other hand, a post that achieves this goal acquires a “competitive advantage” over other posts. This advantage ultimately results in an increase of its eWoM Power.
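
The shifted power law above can be explored numerically. The following is a minimal sketch, not the fitting procedure behind Table 4: the function names, the synthetic data, and the use of ordinary least squares on log-transformed values are all assumptions made for illustration. A negative exponent is used in the synthetic data so that the curve decays and shows the long tail described above; the sign convention of the reported $\alpha $ values is not stated in this section.

```python
import math

def shifted_power_law(x, k, alpha, delta):
    """Evaluate y = k * (x + delta) ** alpha."""
    return k * (x + delta) ** alpha

def fit_loglog(xs, ys, delta):
    """Estimate (k, alpha) by least squares on
    log(y) = log(k) + alpha * log(x + delta), for a known delta."""
    lx = [math.log(x + delta) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    alpha = sxy / sxx                       # slope in log-log space
    k = math.exp(mean_y - alpha * mean_x)   # intercept back-transformed
    return k, alpha

# Synthetic, noise-free data (invented values, for illustration only).
xs = list(range(1, 1001))
ys = [shifted_power_law(x, k=5.0, alpha=-1.6, delta=0.01) for x in xs]
k_hat, alpha_hat = fit_loglog(xs, ys, delta=0.01)
```

On real data, a log-log regression of this kind is only a rough first check; dedicated power law fitting methods are usually preferred.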

- The distributions of posts against the fields `num_comments`, `num_crossposts`, `total_awards`, and `score` follow a power law. The values of the parameters $\alpha $ and $\delta $ of these distributions, together with the maximum values of the corresponding fields in our dataset, are shown in Table 4.
- The distribution of posts against the sentiment polarity of the field `title` is centered on 0. It highlights that almost all the titles of the posts (in particular, 91% of them) have a null sentiment polarity. Furthermore, the very few posts behaving differently are distributed quite uniformly to the right and left of 0, and in any case, most of the corresponding sentiment polarity values are very close to 0. This allowed us to conclude that the sentiment polarity of the titles does not contribute significantly to characterizing the posts and, ultimately, that the field `title` does not provide a significant contribution to the eWoM Power of a post.
- The distributions of authors against the fields `karma` and `created_utc` follow a power law (see Table 4 for the values of $\alpha $ and $\delta $). The distribution of authors against `created_utc` should be interpreted as follows. On the x-axis, we put the seniority of the authors on Reddit, measured from their registration date and grouped by bimesters (two-month periods). Regarding author seniority, we point out that, in our dataset, the youngest author registered on 8 March 2020 (and, therefore, has a seniority of 0 bimesters), whereas the oldest one registered on 19 January 2006 (and, therefore, has a seniority of 84 bimesters).
- The distribution of subreddits against the field `subscribers` follows a power law (see Table 4 for the values of $\alpha $ and $\delta $).
- The distribution of subreddits against the field `description` is centered on the value 0 of its sentiment polarity; its overall trend is analogous to that of the distribution of posts against the field `title`. Therefore, in this case too, we can conclude that this field does not contribute to eWoM Power.
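
The seniority in bimesters described above can be sketched as follows. This is an illustrative reading, not the exact binning rule used for the dataset: the function name is hypothetical, and it is assumed that seniority counts completed months between the registration date and a reference date (here, the registration date of the youngest author), divided by two and rounded down.

```python
from datetime import date

def seniority_in_bimesters(registered: date, reference: date) -> int:
    """Number of completed two-month periods between two dates
    (assumed binning rule; hypothetical helper)."""
    months = (reference.year - registered.year) * 12 + (reference.month - registered.month)
    if reference.day < registered.day:
        months -= 1  # the last month is not yet complete
    return max(months, 0) // 2

# Assumed reference date: registration date of the youngest author.
reference = date(2020, 3, 8)
youngest = seniority_in_bimesters(date(2020, 3, 8), reference)   # 0
oldest = seniority_in_bimesters(date(2006, 1, 19), reference)    # 84
```

Under these assumptions, the two dates quoted above give 0 and 84 bimesters, which agrees with the values stated in the text.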

#### 4.2. A Naive Formulation of the eWoM Power of Reddit Posts

`num_comments`; (ii) $R(cr,{\alpha}_{cr})$ is the contribution given by `num_crossposts`; (iii) $R(s,{\alpha}_{s})$ is the contribution of the `score` of p; (iv) $R(a,{\alpha}_{a})$ is the contribution given by the parameter `total_awards`; (v) $R(k,{\alpha}_{k})$ is the contribution provided by the `karma` of the author of p; (vi) $R(u,{\alpha}_{u})$ is the contribution given by the parameter `created_utc` of the author of p (we recall that this parameter denotes the seniority of the author of p in Reddit); (vii) $R(sb,{\alpha}_{sb})$ is the contribution given by the parameter `subscribers`.

- It represents only the starting point for the computation of the refined eWoM Power, which, instead, is characterized by a more sophisticated formulation.
- The adoption of a weighted mean makes our definition of naive eWoM Power very flexible: changing the weights makes it possible to model very different situations, starting from the same simple formulation.
- The adoption of a weighted mean also makes our definition of naive eWoM Power easily extensible: it is easy both to add new parameters and to model scenarios different from those initially envisaged (see Section 5.1 for some of them). In the latter case, as we will see below, it is enough to define a new combination of weights that fits the scenario to be modeled.
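
The weighted-mean structure discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the dictionary keys, and the contribution values are hypothetical, the contributions $R(\cdot ,\cdot )$ are taken as already computed numbers, and the mean is assumed to be normalized by the sum of the weights. The two weight sets follow the descriptions of ${\mathcal{C}}_{0}$ (all weights equal to 1) and ${\mathcal{C}}_{1}$ (${\omega}_{c}={\omega}_{cr}=2$) given in the table of combinations.

```python
def naive_ewom_power(contributions, weights):
    """Weighted mean of per-parameter contributions.

    contributions: mapping parameter name -> precomputed contribution value
    weights:       mapping parameter name -> non-negative weight
    """
    total_weight = sum(weights[name] for name in contributions)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    weighted_sum = sum(weights[name] * value for name, value in contributions.items())
    return weighted_sum / total_weight

# Parameter keys mirror the subscripts used in the text (c, cr, s, a, k, u, sb).
PARAMS = ("c", "cr", "s", "a", "k", "u", "sb")
C0 = {name: 1.0 for name in PARAMS}      # default: all weights equal to 1
C1 = {**C0, "c": 2.0, "cr": 2.0}         # moderately privileges comments and crossposts

# Invented contribution values for a single post, for illustration only.
contributions = {"c": 0.8, "cr": 0.6, "s": 0.2, "a": 0.1, "k": 0.3, "u": 0.4, "sb": 0.5}
power_c0 = naive_ewom_power(contributions, C0)
power_c1 = naive_ewom_power(contributions, C1)
```

Modeling a new scenario then amounts to supplying a different weight dictionary, and adding a parameter amounts to adding one key to both dictionaries.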

#### 4.3. A Refined Formulation of eWoM Power of Reddit Posts

- The decay law of the value of a post over time is similar to all the distribution functions related to the other parameters of interest for eWoM Power;
- We have seen that, in those cases, the Leaky ReLU function is well suited to express the contribution of each of those parameters to eWoM Power;

- The function $\widehat{\mathcal{WP}}(p,t)$ returns the value of $\overline{\mathcal{WP}}\left(p\right)$ at t.
- $R(t,{\alpha}_{t})$ is the Leaky ReLU function at t. Regarding it, we note that:
  - When $\nu \left(t\right)$ is smaller than $\frac{1}{{\alpha}_{t}}$, the decrease caused by the “time” factor is low and grows slowly over time.
  - When $\nu \left(t\right)$ is higher than or equal to $\frac{1}{{\alpha}_{t}}$, the decrease becomes substantial and grows quickly over time.
  - In the definition of the normalization function $\nu $ associated with $R(t,{\alpha}_{t})$, we assumed that the maximum value ${t}_{M}$ of t is equal to the number of seconds elapsed between t and the time instant in which p was published.
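
The two regimes described above can be visualized with a small piecewise-linear function that changes slope at $\frac{1}{{\alpha}_{t}}$. This is only a sketch of the qualitative behavior: the function name and the two slope values are hypothetical placeholders, $\nu \left(t\right)$ is assumed to lie in $[0,1]$, and the code does not reproduce the exact expression of $R(t,{\alpha}_{t})$.

```python
def two_slope_contribution(nu_t, alpha_t, slow_slope=0.1, fast_slope=1.0):
    """Continuous piecewise-linear function with a knee at 1 / alpha_t.

    Below the knee the value grows slowly; from the knee onward it grows quickly.
    Both slopes are illustrative assumptions.
    """
    knee = 1.0 / alpha_t
    if nu_t < knee:
        return slow_slope * nu_t
    # Continue from the value reached at the knee, with the steeper slope.
    return slow_slope * knee + fast_slope * (nu_t - knee)

# With alpha_t = 2 the knee sits at nu(t) = 0.5.
before_knee = two_slope_contribution(0.25, alpha_t=2.0)
at_knee = two_slope_contribution(0.5, alpha_t=2.0)
after_knee = two_slope_contribution(1.0, alpha_t=2.0)
```

In this sketch, the time-related decrease stays small until the knee and then dominates, which matches the behavior described in the two items above.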

## 5. Experiments and Possible Applications of the eWoM Power of Reddit Posts

#### 5.1. Preliminaries: Evaluation of the Parameters’ Impact on Naive eWoM Power

- ${\mathcal{C}}_{0}$, ${\mathcal{C}}_{1}$, and ${\mathcal{C}}_{3}$ have a very high number of top 1000 posts in common. This result is in line with the ones reported in Table 6, where we can see that the average values of eWoM Power for these three combinations are very close.
- ${\mathcal{C}}_{9}$ and ${\mathcal{C}}_{11}$ have a very high number of top 1000 posts in common. In this case also, the result is in line with the ones on the average values of eWoM Power, reported in Table 6. In fact, the values of the eWoM Power of ${\mathcal{C}}_{9}$ and ${\mathcal{C}}_{11}$ are very close to each other and very far from those of the other combinations.
- ${\mathcal{C}}_{5}$ differs considerably from all the other combinations. This is not surprising for ${\mathcal{C}}_{0}$, ${\mathcal{C}}_{1}$, and ${\mathcal{C}}_{3}$, while it is unexpected for ${\mathcal{C}}_{7}$. In fact, ${\mathcal{C}}_{5}$ and ${\mathcal{C}}_{7}$ differ only by the fact that the latter privileges not only the author (as ${\mathcal{C}}_{5}$ does), but also the subreddit. While this fact does not produce substantial differences on the average values of eWoM Power, it causes great differences in the lists of the top 1000 posts.
- There is a remarkable overlap between the top 1000 posts of ${\mathcal{C}}_{7}$ and the top 1000 posts of ${\mathcal{C}}_{0}$, ${\mathcal{C}}_{1}$, and ${\mathcal{C}}_{3}$. This is quite surprising because this overlap is not reflected in the average values of eWoM Power, shown in Table 6; in fact, in this table, ${\mathcal{C}}_{7}$ is the combination that differs most from ${\mathcal{C}}_{0}$, ${\mathcal{C}}_{1}$, and ${\mathcal{C}}_{3}$.
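
The comparison of top 1000 lists discussed above boils down to ranking posts under two weight combinations and counting the posts that appear in both rankings. A minimal sketch, with hypothetical function names and invented scores, could look like this:

```python
def top_k_ids(scores, k):
    """Return the identifiers of the k posts with the highest score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def top_k_overlap(scores_a, scores_b, k):
    """Count the posts that are in the top k of both rankings."""
    return len(top_k_ids(scores_a, k) & top_k_ids(scores_b, k))

# Invented eWoM Power values of four posts under two weight combinations.
scores_first = {"p1": 0.9, "p2": 0.7, "p3": 0.4, "p4": 0.1}
scores_second = {"p1": 0.2, "p2": 0.8, "p3": 0.1, "p4": 0.6}
common = top_k_overlap(scores_first, scores_second, k=2)
```

Here only `p2` is in the top 2 of both rankings. As the observations above show, two combinations can have similar average eWoM Power values and still share few top posts, so the two comparisons are complementary.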

#### 5.2. Evaluation of the Goodness of the Results Returned by Our Approach

- The best combination is ${\mathcal{C}}_{0}$. It can be used in all circumstances with confidence that the results it returns are correct and satisfactory.
- ${\mathcal{C}}_{1}$ is suitable for scenarios where the popularity of the post is the most important factor. It can favor the diffusion of the most popular and most widely shared content.
- ${\mathcal{C}}_{2}$ is suitable for scenarios where the engagement of the post is the most important factor. It can favor the diffusion of highly commented or highly upvoted posts.
- ${\mathcal{C}}_{3}$ is suitable for scenarios where the quality of the post is important. It can favor the diffusion of high-quality posts.
- ${\mathcal{C}}_{4}$ is suitable for scenarios where the expertise of the author is important. It can favor the diffusion of posts authored by individuals with relevant professional or personal experience.
- ${\mathcal{C}}_{5}$ is suitable for scenarios where the social influence of the authors of posts is important. It can favor the diffusion of posts published by users with high social influence.
- ${\mathcal{C}}_{6}$ is suitable for scenarios where the author’s activity level is important. It can favor the diffusion of posts authored by users who frequently contribute to the community.
- ${\mathcal{C}}_{7}$ is suitable for scenarios where the number of subscribers to the subreddit where the post was published is important as much as the karma and seniority of the post author. It can favor the diffusion of posts that are written by authoritative users on authoritative subreddits.
- ${\mathcal{C}}_{8}$ is suitable for scenarios where the number of subscribers to the subreddit and the karma and seniority of the author are of utmost importance. It can favor the diffusion of the posts published by influencers in the community.
- ${\mathcal{C}}_{9}$ is suitable for scenarios where the comments on the post and the subreddit where the post was published are more important than the other parameters, and awards and shares are even more important. It can favor the diffusion of posts that stimulate many interactions and whose quality has been recognized by users.
- ${\mathcal{C}}_{10}$ is suitable for scenarios where the interactions with and the awards to the posts are of utmost importance. It can favor the diffusion of posts with the highest number of interactions by users and that received the highest number of awards with respect to the other posts.
- ${\mathcal{C}}_{11}$ is suitable for scenarios where the rarest parameters (such as awards and crossposts) should play a more important role. It can favor the diffusion of posts that are different from the majority of the other ones as they have higher values on these parameters.
- ${\mathcal{C}}_{12}$ is suitable for scenarios where the rarest parameters are of utmost importance. It can favor the diffusion of the rarest posts, which have the highest values of awards, crossposts, and number of comments.

#### 5.3. A First Application: Determining the Lifespan of a Reddit Post

- The template of Figure 6a could be called “Explosive”. It corresponds to a scenario where an author with high karma and a high score submits a post, possibly in a subreddit with many subscribers. The eWoM Power of this post is initially very high; however, its content does not attract other users, and the post is not successful. As a consequence, the “time” factor leads to a fast decay of the eWoM Power, which is not counteracted by positive factors.
- The template of Figure 6b could be called “Revived”. It is similar to the template “Explosive”. However, in this case, the decay caused by the “time” factor is counteracted by the growth of some positive factors. These manage to increase the eWoM Power again and keep it constant for some time until the post content becomes obsolete, the “time” factor prevails, and the eWoM Power decays.
- The template of Figure 6c could be called “Bell”. In this case, the post starts with a low eWoM Power, maybe because its author is not famous, and it is published in a subreddit with few subscribers. At a certain time, the positive factors suddenly make the eWoM Power grow, maybe because the post is reposted by a famous user or receives an award. At this point, the value of the eWoM Power remains constant for some time until the post content becomes obsolete and it decays.
- The template of Figure 6d could be called “Unlucky”. In this case, the post starts as poorly as in the previous case. However, unlike before, there is no explosive positive factor able to significantly increase its eWoM Power, which drops to zero in a short time.
- The template of Figure 6e could be called “Gray”. In this case, the post starts with a medium value of eWoM Power. However, this last parameter does not grow; therefore, in a short time period, the “time” factor lowers its eWoM Power to zero.
- The template of Figure 6f could be called “Moody”. Here, the eWoM Power starts as high as in the “Explosive” template. Similar to what happens for that template, the “time” factor acts negatively, tending to lower the eWoM Power. However, the post is very successful. Therefore, there are periodically positive factors that increase its eWoM Power. These ups and downs are repeated several times until the post content becomes obsolete and its eWoM Power becomes zero.

#### 5.4. A Second Application: Determining the Profile of a Reddit Post

- “Meme” represents a post or a picture that attracts the attention of other users through irony or sarcasm. It can concern the behavior of people or animals or, alternatively, current topics.
- “Niche news” represents news concerning a topic of interest to a small audience. Often, it is local news or news that did not receive great attention from media.
- “News” represents news concerning a current topic. It could regard a chronicle, a celebrity, finance, and so on. In any case, it received great attention from media.

- “Cringe/NSFW” is a post about niche not-safe-for-work (NSFW) topics or, in any case, about topics not suitable for a general audience. Usually, it is an NSFW post published in a subreddit with a lower-than-average number of subscribers.
- “Digital art” is a post containing a digital drawing, created by the post author herself/himself. The posts of this category are generally published by authors to show off their work and make themselves known.

- “Provocative” is a post that attracts the attention of other users in a provocative way, for example by explicitly attacking a politician or certain categories of people.
- “NSFW” is similar to “Cringe/NSFW”, but while the latter is niche, the former is generic.

- “Doubt” is a post where a user asks a direct question to a category of users to resolve a doubt or know something. Posts with this profile generally attract attention thanks to the answers to this question.
- “Unsuccessful” represents a generic post that failed to attract the attention of other users.

## 6. Discussion

- The definition of two parameters for the computation of the eWoM Power: a naive one, which is simple to compute, and a refined one, which is more expensive to calculate.
- The definition of six possible post lifespan templates.
- The definition of a set of post profiles.

## 7. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest


**Figure 1.** Distributions of posts against comments (**left**, log–log scale) and crossposts (**right**, log–log scale) enriched with marginal distributions.

**Figure 2.** Distributions of posts against scores (**left**, log–log scale) and awards (**right**, log–log scale) enriched with marginal distributions.

**Figure 3.** Distribution of posts against the sentiment polarity of their title (semi-logarithmic scale).

**Figure 4.** Distributions of authors against their karma (**left**, log–log scale, enriched with marginal distributions) and registration dates (**right**, semi-logarithmic scale).

**Figure 5.** Distributions of subreddits against subscribers (**left**, log–log scale, enriched with marginal distributions) and sentiment polarity of their description (**right**, semi-logarithmic scale).

**Figure 6.** The six lifespan templates we determined: (**a**) explosive; (**b**) revived; (**c**) bell; (**d**) unlucky; (**e**) gray; (**f**) moody.

Name | Description |
---|---|
post_id | The identifier of the post. |
author_id | The identifier of the author of the post. |
author | The username of the author of the post. |
subreddit_id | The identifier of the subreddit in which the post was published. |
subreddit | The name of the subreddit in which the post was published. |
created_utc | The creation date of the post. |
num_comments | The number of comments received by the post. |
num_crossposts | How many times the post was crossposted. |
nsfw | If the post contains Not Safe For Work (NSFW) material. |
score | The overall score of the post; it is computed as the sum of the positive and the negative votes obtained by the post itself. |
title | The title of the post. |
is_video | If the post consists of a video. |
has_image | If the post contains an image. |
total_awards | The number of awards received by the post; an award is an acknowledgment from a user indicating that she/he liked the post. |
is_part_of_collection | If the post is part of a collection. |
is_self_post | If the post contains a link outside of Reddit. |

Name | Description |
---|---|
author_id | The identifier of the author. |
author | The username of the author. |
created_utc | The date of the author’s registration to Reddit expressed in Coordinated Universal Time. |
is_gold | If the author purchased a premium subscription to Reddit. |
is_suspended | If the author has been suspended. |
karma | The author’s karma on Reddit; the karma of an author is a score representing her/his contribution to the Reddit community, evaluated on the basis of her/his posts and comments. |

| Name | Description |
|---|---|
| subreddit_id | The identifier of the subreddit. |
| subreddit | The name of the subreddit. |
| created_utc | The creation date of the subreddit. |
| description | The description of the subreddit. |
| over_18 | Whether the subreddit is an NSFW one. |
| subscribers | The number of users subscribed to the subreddit. |

**Table 4.** Values of $\alpha$ and $\delta$ of the power law distributions and maximum values of the fields of interest for the eWoM Power computation.

| Field | $\alpha$ | $\delta$ | Maximum Value |
|---|---|---|---|
| num_comments | 1.425 | 0.012 | 100,002 |
| num_crossposts | 1.386 | 0.027 | 576 |
| score | 1.595 | 0.009 | 242,971 |
| total_awards | 1.371 | 0.032 | 5638 |
| karma | 1.873 | 0.018 | 35,249,636 |
| created_utc | 2.028 | 0.089 | 89 |
| subscribers | 1.862 | 0.022 | 67,092,933 |
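To make the role of these parameters concrete, the following minimal sketch evaluates the power law form $y=k\cdot(x+\delta)^{\alpha}$ introduced in Section 4.1 with the values of Table 4. The names `TABLE_4`, `power_law`, and `normalized_contribution` are hypothetical. Dividing by the value of the curve at the field's maximum, so that the result lies in $(0, 1]$, is an assumption made for illustration and not necessarily the function used to compute the eWoM Power.

```python
# Fitted parameters copied from Table 4: field -> (alpha, delta, maximum value).
TABLE_4 = {
    "num_comments": (1.425, 0.012, 100_002),
    "num_crossposts": (1.386, 0.027, 576),
    "score": (1.595, 0.009, 242_971),
    "total_awards": (1.371, 0.032, 5_638),
    "karma": (1.873, 0.018, 35_249_636),
    "created_utc": (2.028, 0.089, 89),
    "subscribers": (1.862, 0.022, 67_092_933),
}


def power_law(x: float, alpha: float, delta: float, k: float = 1.0) -> float:
    """Evaluate y = k * (x + delta) ** alpha (form given in Section 4.1)."""
    return k * (x + delta) ** alpha


def normalized_contribution(field: str, x: float) -> float:
    """Hypothetical helper: rescale the curve so the field's maximum maps to 1.

    The normalization is an illustrative assumption, not the paper's formula.
    """
    alpha, delta, maximum = TABLE_4[field]
    x = min(max(x, 0.0), maximum)  # clamp to the observed range [0, maximum]
    return power_law(x, alpha, delta) / power_law(maximum, alpha, delta)


if __name__ == "__main__":
    for value in (10, 1_000, 100_000):
        print(value, normalized_contribution("score", value))
```

Under this assumption, a steeper exponent $\alpha$ compresses the contribution of small values more strongly, so only values close to the maximum yield a contribution near 1.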

| Combination | Weight Values | Characteristics |
|---|---|---|
| $\mathcal{C}_{0}$ | All weights are set to 1. | This is the default combination. It gives the same importance to all parameters. |
| $\mathcal{C}_{1}$, $\mathcal{C}_{2}$ | $\omega_{c}=\omega_{cr}=2$ (for $\mathcal{C}_{1}$); $\omega_{c}=\omega_{cr}=10$ (for $\mathcal{C}_{2}$); all the other weights are set to 1. | It privileges comments and crossposts and, therefore, posts that have started a discussion and have been crossposted to several subreddits. The privilege degree is moderate in $\mathcal{C}_{1}$ and high in $\mathcal{C}_{2}$. |
| $\mathcal{C}_{3}$, $\mathcal{C}_{4}$ | $\omega_{s}=\omega_{a}=2$ (for $\mathcal{C}_{3}$); $\omega_{s}=\omega_{a}=10$ (for $\mathcal{C}_{4}$); all the other weights are set to 1. | It privileges scores and awards and, therefore, high-quality posts. The privilege degree is moderate in $\mathcal{C}_{3}$ and high in $\mathcal{C}_{4}$. |
| $\mathcal{C}_{5}$, $\mathcal{C}_{6}$ | $\omega_{k}=\omega_{u}=2$ (for $\mathcal{C}_{5}$); $\omega_{k}=\omega_{u}=10$ (for $\mathcal{C}_{6}$); all the other weights are set to 1. | It privileges the karma and seniority of the post authors and, therefore, the posts published by famous authors on Reddit. The privilege degree is moderate in $\mathcal{C}_{5}$ and high in $\mathcal{C}_{6}$. |
| $\mathcal{C}_{7}$, $\mathcal{C}_{8}$ | $\omega_{k}=\omega_{u}=\omega_{sb}=2$ (for $\mathcal{C}_{7}$); $\omega_{k}=\omega_{u}=\omega_{sb}=10$ (for $\mathcal{C}_{8}$); all the other weights are set to 1. | It privileges not only the karma and seniority of the post authors, but also the number of subscribers to the subreddit where the post was published. The privilege degree is moderate in $\mathcal{C}_{7}$ and high in $\mathcal{C}_{8}$. |
| $\mathcal{C}_{9}$, $\mathcal{C}_{10}$ | $\omega_{c}=\omega_{sb}=2$; $\omega_{cr}=\omega_{a}=4$ (for $\mathcal{C}_{9}$); $\omega_{c}=\omega_{sb}=10$; $\omega_{cr}=\omega_{a}=100$ (for $\mathcal{C}_{10}$); all the other weights are set to 1. | It privileges those parameters that, according to their semantics, should play the most important roles in the definition of eWoM Power. The privilege degree is moderate in $\mathcal{C}_{9}$ and high in $\mathcal{C}_{10}$. |
| $\mathcal{C}_{11}$, $\mathcal{C}_{12}$ | $\omega_{sb}=2$; $\omega_{c}=4$; $\omega_{cr}=6$; $\omega_{a}=8$ (for $\mathcal{C}_{11}$); $\omega_{sb}=10$; $\omega_{c}=100$; $\omega_{cr}=1{,}000$; $\omega_{a}=10{,}000$ (for $\mathcal{C}_{12}$); all the other weights are set to 1. | It privileges those parameters characterized by rare occurrences; it assumes that the rarer an occurrence, the more its presence in a post contributes to increasing the post’s eWoM Power. The privilege degree is moderate in $\mathcal{C}_{11}$ and high in $\mathcal{C}_{12}$. |
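The weight combinations above can be written down as plain data, which makes it easy to see how each one departs from the default. In the sketch below, the keys `c`, `cr`, `s`, `a`, `k`, `u`, and `sb` mirror the weight subscripts of the table (comments, crossposts, score, awards, karma, seniority, subscribers, as inferred from the descriptions). The helper names and the weighted mean used to aggregate per-field contributions are assumptions made for illustration; the aggregation actually used to compute the eWoM Power may differ.

```python
# Weight subscripts as they appear in the table; every weight defaults to 1.
FIELDS = ("c", "cr", "s", "a", "k", "u", "sb")


def make_weights(**overrides: float) -> dict[str, float]:
    """Return a full weight assignment, with all unspecified weights set to 1."""
    unknown = set(overrides) - set(FIELDS)
    if unknown:
        raise KeyError(f"unknown weight(s): {sorted(unknown)}")
    weights = dict.fromkeys(FIELDS, 1.0)
    weights.update(overrides)
    return weights


# Values transcribed from the weight-combination table.
COMBINATIONS = {
    "C0": make_weights(),
    "C1": make_weights(c=2, cr=2),
    "C2": make_weights(c=10, cr=10),
    "C3": make_weights(s=2, a=2),
    "C4": make_weights(s=10, a=10),
    "C5": make_weights(k=2, u=2),
    "C6": make_weights(k=10, u=10),
    "C7": make_weights(k=2, u=2, sb=2),
    "C8": make_weights(k=10, u=10, sb=10),
    "C9": make_weights(c=2, sb=2, cr=4, a=4),
    "C10": make_weights(c=10, sb=10, cr=100, a=100),
    "C11": make_weights(sb=2, c=4, cr=6, a=8),
    "C12": make_weights(sb=10, c=100, cr=1_000, a=10_000),
}


def weighted_mean(contributions: dict[str, float], weights: dict[str, float]) -> float:
    """Assumed aggregation: weighted mean of per-field contributions in [0, 1]."""
    total = sum(weights[f] for f in FIELDS)
    return sum(weights[f] * contributions[f] for f in FIELDS) / total


if __name__ == "__main__":
    # Hypothetical post: strong author and subreddit, little engagement.
    post = {"c": 0.01, "cr": 0.0, "s": 0.02, "a": 0.0, "k": 0.6, "u": 0.8, "sb": 0.5}
    for name, weights in COMBINATIONS.items():
        print(name, round(weighted_mean(post, weights), 5))
```

With a normalized aggregation of this kind, raising the weight of a field whose contribution is usually close to zero lowers the average result, while raising the weight of a field with typically larger contributions increases it.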

| Combination | Average eWoM Power of Posts | Combination | Average eWoM Power of Posts |
|---|---|---|---|
| $\mathcal{C}_{0}$ | 0.02984 | | |
| $\mathcal{C}_{1}$ | 0.02323 | $\mathcal{C}_{2}$ | 0.00843 |
| $\mathcal{C}_{3}$ | 0.02327 | $\mathcal{C}_{4}$ | 0.00856 |
| $\mathcal{C}_{5}$ | 0.04018 | $\mathcal{C}_{6}$ | 0.06335 |
| $\mathcal{C}_{7}$ | 0.04170 | $\mathcal{C}_{8}$ | 0.06123 |
| $\mathcal{C}_{9}$ | 0.01766 | $\mathcal{C}_{10}$ | 0.00326 |
| $\mathcal{C}_{11}$ | 0.01155 | $\mathcal{C}_{12}$ | 0.00059 |

**Table 7.** Cardinality of the intersection of the top 1000 posts for each possible pair of weight combinations.

| | $\mathcal{C}_{0}$ | $\mathcal{C}_{1}$ | $\mathcal{C}_{3}$ | $\mathcal{C}_{5}$ | $\mathcal{C}_{7}$ | $\mathcal{C}_{9}$ | $\mathcal{C}_{11}$ |
|---|---|---|---|---|---|---|---|
| $\mathcal{C}_{0}$ | 1000 | | | | | | |
| $\mathcal{C}_{1}$ | 994 | 1000 | | | | | |
| $\mathcal{C}_{3}$ | 978 | 982 | 1000 | | | | |
| $\mathcal{C}_{5}$ | 595 | 596 | 594 | 1000 | | | |
| $\mathcal{C}_{7}$ | 991 | 985 | 969 | 590 | 1000 | | |
| $\mathcal{C}_{9}$ | 803 | 802 | 789 | 449 | 806 | 1000 | |
| $\mathcal{C}_{11}$ | 803 | 805 | 791 | 450 | 805 | 989 | 1000 |
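A comparison of this kind can be reproduced with a few lines of code. The sketch below is a minimal illustration, assuming that the eWoM Power of each post under each weight combination is available as a dictionary from post identifier to value. The function names `top_k` and `overlap_matrix` and the toy data are hypothetical.

```python
from itertools import combinations_with_replacement


def top_k(scores: dict[str, float], k: int) -> set[str]:
    """Return the identifiers of the k posts with the highest value.

    Ties are broken by insertion order, because sorted() is stable.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def overlap_matrix(
    scores_by_combination: dict[str, dict[str, float]], k: int = 1000
) -> dict[tuple[str, str], int]:
    """Cardinality of the top-k intersection for every pair of combinations.

    Only one entry per unordered pair is stored (a lower-triangular matrix,
    diagonal included), mirroring the layout of Table 7.
    """
    tops = {name: top_k(scores, k) for name, scores in scores_by_combination.items()}
    return {
        (a, b): len(tops[a] & tops[b])
        for a, b in combinations_with_replacement(tops, 2)
    }


if __name__ == "__main__":
    # Toy data: three posts scored under two combinations.
    toy = {
        "C0": {"p1": 0.9, "p2": 0.8, "p3": 0.1},
        "C5": {"p1": 0.2, "p2": 0.7, "p3": 0.9},
    }
    for pair, size in overlap_matrix(toy, k=2).items():
        print(pair, size)
```

A large intersection between two combinations indicates that changing the weights barely alters which posts rank highest, whereas a small one (as for the pairs involving $\mathcal{C}_{5}$ in Table 7) indicates that the weights reshuffle the top of the ranking.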

**Table 8.** Percentage of posts returned by our approach that the human expert evaluated to have high eWoM Power in reality.

| Combination | Percentage of Posts | Combination | Percentage of Posts |
|---|---|---|---|
| $\mathcal{C}_{0}$ | 100% | | |
| $\mathcal{C}_{1}$ | 100% | $\mathcal{C}_{2}$ | 99% |
| $\mathcal{C}_{3}$ | 99.5% | $\mathcal{C}_{4}$ | 98% |
| $\mathcal{C}_{5}$ | 95% | $\mathcal{C}_{6}$ | 93.5% |
| $\mathcal{C}_{7}$ | 99% | $\mathcal{C}_{8}$ | 97.5% |
| $\mathcal{C}_{9}$ | 98% | $\mathcal{C}_{10}$ | 97% |
| $\mathcal{C}_{11}$ | 98% | $\mathcal{C}_{12}$ | 96.5% |

| Lifespan Template | Percentage of Posts |
|---|---|
| “Explosive” | 22.2% |
| “Revived” | 22.6% |
| “Bell” | 7.2% |
| “Unlucky” | 22.5% |
| “Gray” | 18.3% |
| “Moody” | 4.7% |
| None of them | 2.5% |

| Lifespan Template | Percentage of Posts |
|---|---|
| “Explosive” | 21.8% |
| “Revived” | 23.4% |
| “Bell” | 7.5% |
| “Unlucky” | 21.8% |
| “Gray” | 17.6% |
| “Moody” | 4.9% |
| None of them | 3.0% |

| | Short Lifetime | Long Lifetime |
|---|---|---|
| Low eWoM Power | Meme | Niche News |
| High eWoM Power | Meme | News |

| | Short Lifetime | Long Lifetime |
|---|---|---|
| Low eWoM Power | Meme | Cringe/NSFW |
| High eWoM Power | Meme | Digital art |

| | Short Lifetime | Long Lifetime |
|---|---|---|
| Low eWoM Power | Cringe/NSFW | NSFW |
| High eWoM Power | Provocative | Provocative |

| | Short Lifetime | Long Lifetime |
|---|---|---|
| Low eWoM Power | Content Creator | Content Creator |
| High eWoM Power | News | News |

| | Short Lifetime | Long Lifetime |
|---|---|---|
| Low eWoM Power | Doubt | Unsuccessful |
| High eWoM Power | Doubt | Unsuccessful |

| | Short Lifetime | Long Lifetime |
|---|---|---|
| Low eWoM Power | Meme | Politics |
| High eWoM Power | Provocative | Politics |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Bonifazi, G.; Corradini, E.; Ursino, D.; Virgili, L.
Modeling, Evaluating, and Applying the eWoM Power of Reddit Posts. *Big Data Cogn. Comput.* **2023**, *7*, 47.
https://doi.org/10.3390/bdcc7010047

**AMA Style**

Bonifazi G, Corradini E, Ursino D, Virgili L.
Modeling, Evaluating, and Applying the eWoM Power of Reddit Posts. *Big Data and Cognitive Computing*. 2023; 7(1):47.
https://doi.org/10.3390/bdcc7010047

**Chicago/Turabian Style**

Bonifazi, Gianluca, Enrico Corradini, Domenico Ursino, and Luca Virgili.
2023. "Modeling, Evaluating, and Applying the eWoM Power of Reddit Posts" *Big Data and Cognitive Computing* 7, no. 1: 47.
https://doi.org/10.3390/bdcc7010047