Prominent User Segments in Online Consumer Recommendation Communities: Capturing Behavioral and Linguistic Qualities with User Comment Embeddings
Abstract
1. Introduction
2. Relevant Studies and Formulation of Research Questions
- RQ1: What aspects of user engagement behavior and linguistic characteristics can be sufficiently captured by latent factors of user comment embeddings in consumer recommendation communities?
- RQ2: What are the prominent user segments in consumer recommendation communities, based on user comment embeddings:
  - RQ2a: distinguished by factors of behavioral engagement?
  - RQ2b: distinguished by factors of writing style and quality?
3. Methods
3.1. Data Collection and Preparation
3.2. Evaluation Variables
3.3. User Embeddings, Dimension Reduction, and Clustering Approach
4. Results
4.1. Correlations of Principal Factors with Evaluation Variables
4.2. Cluster Analysis
4.2.1. User Clusters—Different Levels of Contribution
4.2.2. User Clusters—Different Levels of Sentiment
4.2.3. Testing Cluster Formation without Dimensionality Reduction
5. Discussion
5.1. Evaluating User Embeddings (RQ1)
5.2. Prominent User Segments (RQ2)
5.3. Implications, Limitations, and Future Research
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Metrics | Source | Description |
---|---|---|
Behavioral Engagement | ||
Activ Length Months | Reddit Dataset | Length of Activity—Total Months |
Activ Length Days | Reddit Dataset | Length of Activity—Total Days |
Avg Comment Score | Reddit Dataset | Avg Score (Upvotes–Downvotes) per Comment |
Avg Replies Created | Reddit Dataset | Average Total Comments (Replies) Created by Each User Comment |
Agg Comment Score | Reddit Dataset | Aggregate of Score (Upvotes Minus Downvotes) of Comments |
Agg Replies Created | Reddit Dataset | Aggregate Replies Created by User Comments |
Num of Comments | Reddit Dataset | Number of User Comments |
Num of Unique Threads | Reddit Dataset | Number of Unique Threads Participated |
Num of Com Own Threads | Reddit Dataset | Number of Comments in Own Submissions (Threads Created by the User) |
Comp Sentiment Score | VADER Library | Compound Sentiment VADER Score per Comment |
Pos Sentiment Score | VADER Library | Positive Sentiment VADER Score per Comment |
Neu Sentiment Score | VADER Library | Neutral Sentiment VADER Score per Comment |
Neg Sentiment Score | VADER Library | Negative Sentiment VADER Score per Comment |
Affect 1 | LIWC Library (2007) | Number of Affective Words per Comment |
Pos Emotion 1 | LIWC Library (2007) | Number of Positive Emotion Words per Comment |
Neg Emotion 1 | LIWC Library (2007) | Number of Negative Emotion Words per Comment |
Linguistic Style | ||
Tot Word 1 | LFTK Library | Words per Comment |
Tot Un Word 1 | LFTK Library | Unique Words per Comment |
Avg Syll PS 2 | LFTK Library | Syllables per Sentence |
Avg Char PW 3 | LFTK Library | Characters per Word |
Avg Syll PW 3 | LFTK Library | Syllables per Word |
Avg Kup AoA PW 3 | LFTK Library | Kuperman Age of Acquisition per Word |
RT Fast | LFTK Library | Reading Time of Fast Reader |
RT Slow | LFTK Library | Reading Time of Slow Reader |
Coleman–Liau Index | LFTK Library | Coleman–Liau Readability Index |
SMOG Index | LFTK Library | SMOG Readability Index |
Pers Pronouns 1 | LIWC Library (2007) | Personal Pronoun Words per Comment |
1st Pers Singular 1 | LIWC Library (2007) | 1st Person Singular Words per Comment |
1st Pers Plural 1 | LIWC Library (2007) | 1st Person Plural Words per Comment |
2nd Pers 1 | LIWC Library (2007) | 2nd Person Words per Comment |
3rd Pers Singular 1 | LIWC Library (2007) | 3rd Person Singular Words per Comment |
3rd Pers Plural 1 | LIWC Library (2007) | 3rd Person Plural Words per Comment |
Articles 1 | LIWC Library (2007) | Articles Words per Comment |
Common Verbs 1 | LIWC Library (2007) | Common Verbs Words per Comment |
Past Tense 1 | LIWC Library (2007) | Past Tense Words per Comment |
Present Tense 1 | LIWC Library (2007) | Present Tense Words per Comment |
Future Tense 1 | LIWC Library (2007) | Future Tense Words per Comment |
Adverbs 1 | LIWC Library (2007) | Adverbs Words per Comment |
Prepositions 1 | LIWC Library (2007) | Prepositions Words per Comment |
Conjunctions 1 | LIWC Library (2007) | Conjunctions Words per Comment |
Negations 1 | LIWC Library (2007) | Negations Words per Comment |
Quantifiers 1 | LIWC Library (2007) | Quantifiers Words per Comment |
Social Processes 1 | LIWC Library (2007) | Social Processes Words per Comment |
Family 1 | LIWC Library (2007) | Family Words per Comment |
Friends 1 | LIWC Library (2007) | Friends Words per Comment |
Humans 1 | LIWC Library (2007) | Humans Words per Comment |
Cognitive 1 | LIWC Library (2007) | Cognitive Processes Words per Comment |
Insight 1 | LIWC Library (2007) | Insight Words per Comment |
Causation 1 | LIWC Library (2007) | Causation Words per Comment |
Discrepancy 1 | LIWC Library (2007) | Discrepancy Words per Comment |
Tentative 1 | LIWC Library (2007) | Tentative Words per Comment |
Certainty 1 | LIWC Library (2007) | Certainty Words per Comment |
Inhibition 1 | LIWC Library (2007) | Inhibition Words per Comment |
Inclusive 1 | LIWC Library (2007) | Inclusive Words per Comment |
Exclusive 1 | LIWC Library (2007) | Exclusive Words per Comment |
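As a rough illustration of how the per-user behavioral engagement metrics in Appendix A could be derived from raw comment records, the sketch below aggregates a small set of made-up comments. The record layout, the function name `user_engagement`, and the sample data are hypothetical. Treating "Activ Length Days" and "Activ Length Months" as counts of distinct calendar days and months with at least one comment is an assumption, not a definition taken from the article.

```python
from collections import defaultdict
from datetime import date

# Hypothetical comment records: (user, thread_id, score, replies, posted_on).
comments = [
    ("alice", "t1", 5, 2, date(2023, 1, 1)),
    ("alice", "t2", 3, 0, date(2023, 1, 11)),
    ("alice", "t2", -1, 1, date(2023, 3, 2)),
    ("bob", "t1", 10, 4, date(2023, 2, 5)),
]


def user_engagement(records):
    """Aggregate per-user engagement metrics named after Appendix A."""
    grouped = defaultdict(list)
    for user, thread, score, replies, day in records:
        grouped[user].append((thread, score, replies, day))

    metrics = {}
    for user, rows in grouped.items():
        scores = [r[1] for r in rows]
        replies = [r[2] for r in rows]
        days = {r[3] for r in rows}
        metrics[user] = {
            "num_comments": len(rows),
            "num_unique_threads": len({r[0] for r in rows}),
            "agg_comment_score": sum(scores),
            "avg_comment_score": sum(scores) / len(rows),
            "agg_replies_created": sum(replies),
            "avg_replies_created": sum(replies) / len(rows),
            # Assumption: activity length = number of distinct active days/months.
            "activ_length_days": len(days),
            "activ_length_months": len({(d.year, d.month) for d in days}),
        }
    return metrics


if __name__ == "__main__":
    for user, values in user_engagement(comments).items():
        print(user, values)
```

Keeping the aggregation in a single pass per user makes it easy to add further columns (for example, comments in the user's own threads) without changing the record format.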
Appendix B
References
- Russo Spena, T.; D’Auria, A.; Bifulco, F. Customer Insights and Consumer Profiling. In Digital Transformation in the Cultural Heritage Sector; Springer Nature: Cham, Switzerland, 2021; pp. 95–117.
- Smith, A. Consumer Behaviour and Analytics, 2nd ed.; Informa UK Limited: London, UK, 2023.
- Akar, E.; Mardikyan, S. User Roles and Contribution Patterns in Online Communities: A Managerial Perspective. SAGE Open 2018, 8, 2158244018794773.
- Bhattacharjee, D.R.; Pradhan, D.; Swani, K. Brand communities: A literature review and future research agendas using TCCM approach. Int. J. Consum. Stud. 2021, 46, 3–28.
- Veloutsou, C.; Black, I. Creating and managing participative brand communities: The roles members perform. J. Bus. Res. 2019, 117, 873–885.
- Lillqvist, E.; Moisander, J.K.; Firat, A.F. Consumers as legitimating agents: How consumer-citizens challenge marketer legitimacy on social media. Int. J. Consum. Stud. 2018, 42, 197–204.
- Reddit. How Community Recommendations Drive Collective Influence. 2023. Available online: https://connect.redditinc.com/hubfs/121662_Reddit%20Recommends%20Research%20Report_Superside_V4_V1.pdf (accessed on 3 March 2024).
- Boyd, R.L. Psychological Text Analysis in the Digital Humanities. In Data Analytics in Digital Humanities; Springer International Publishing: Cham, Switzerland, 2017; pp. 161–189.
- Boyd, R.L.; Pennebaker, J.W. Language-based personality: A new approach to personality in a digital world. Curr. Opin. Behav. Sci. 2017, 18, 63–68.
- Lee, B.W.; Arockiaraj, B.F.; Jin, H. Linguistic Properties of Truthful Response. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023; pp. 135–140.
- Boyd, R.L.; Pennebaker, J.W. Did Shakespeare Write Double Falsehood? Identifying Individuals by Creating Psychological Signatures With Text Analysis. Psychol. Sci. 2015, 26, 570–582.
- Gkikas, D.C.; Tzafilkou, K.; Theodoridis, P.K.; Garmpis, A.; Gkikas, M.C. How do text characteristics impact user engagement in social media posts: Modeling content readability, length, and hashtags number in Facebook. Int. J. Inf. Manag. Data Insights 2022, 2, 100067.
- Alzetta, C.; Dell’Orletta, F.; Miaschi, A.; Prat, E.; Venturi, G. Tell me how you write and I’ll tell you what you read: A study on the writing style of book reviews. J. Doc. 2023, 80, 180–202.
- Dell’Orletta, F.; Montemagni, S.; Venturi, G. READ–IT: Assessing Readability of Italian Texts with a View to Text Simplification. In Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies, Edinburgh, UK, 30 July 2011; pp. 73–83. Available online: https://aclanthology.org/W11-2308 (accessed on 5 February 2024).
- Forti, L.; Bolli, G.G.; Santarelli, F.; Santucci, V.; Spina, S. MALT-IT2: A new resource to measure text difficulty in light of CEFR levels for Italian L2 learning. In Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; pp. 7204–7211.
- Biondi, G.; Franzoni, V.; Li, Y.; Milani, A.; Santucci, V. RITA: A Phraseological Dataset of CEFR Assignments and Exams for Italian as a Second Language. In Proceedings of the 2023 IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Venice, Italy, 26–29 October 2023; pp. 425–430.
- Jian, L.; Xiang, H.; Le, G. English Text Readability Measurement Based on Convolutional Neural Network: A Hybrid Network Model. Comput. Intell. Neurosci. 2022, 2022, 6984586.
- Berggren, M.; Kaati, L.; Pelzer, B.; Stiff, H.; Lundmark, L.; Akrami, N. The generalizability of machine learning models of personality across two text domains. Pers. Individ. Differ. 2024, 217, 112465.
- Pan, S.; Ding, T. Social Media-based User Embedding: A literature review. In Proceedings of the IJCAI International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 6318–6324.
- Guimaraes, A.; Balalau, O.; Terolli, E.; Weikum, G. Analyzing the Traits and Anomalies of Political Discussions on Reddit. Proc. Int. AAAI Conf. Web Soc. Media 2019, 13, 205–213.
- Rivas, P.; Zimmermann, M. Empirical study of sentence embeddings for english sentences quality assessment. In Proceedings of the 6th Annual Conference on Computational Science and Computational Intelligence, CSCI 2019, Las Vegas, NV, USA, 5–7 December 2019; pp. 331–336.
- Sepahpour-Fard, M.; Quayle, M.; Schuld, M.; Yasseri, T. Using word embeddings to analyse audience effects and individual differences in parenting Subreddits. EPJ Data Sci. 2023, 12, 38.
- Ahmad, H.; Asghar, M.Z.; Khan, A.S.; Habib, A. A Systematic Literature Review of Personality Trait Classification from Textual Content. Open Comput. Sci. 2020, 10, 175–193.
- Tegene, A.; Liu, Q.; Gan, Y.; Dai, T.; Leka, H.; Ayenew, M. Deep Learning and Embedding Based Latent Factor Model for Collaborative Recommender Systems. Appl. Sci. 2023, 13, 726.
- Schuld, M.; Durrheim, K.; Mafunda, M. Speaker landscapes: Machine learning opens a window on the everyday language of opinion. Commun. Methods Meas. 2023, 1–17.
- Terreau, E.; Gourru, A.; Velcin, J. Writing Style Author Embedding Evaluation. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, Stroudsburg, PA, USA, 10 November 2021; pp. 84–93.
- Curiskis, S.A.; Drake, B.; Osborn, T.R.; Kennedy, P.J. An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit. Inf. Process. Manag. 2019, 57, 102034.
- Bayrak, A.T. An application of Customer Embedding for Clustering. In Proceedings of the IEEE International Conference on Data Mining Workshops, ICDMW, Orlando, FL, USA, 28 November–1 December 2022; pp. 79–82.
- Cauteruccio, F.; Corradini, E.; Terracina, G.; Ursino, D.; Virgili, L. Investigating Reddit to detect subreddit and author stereotypes and to evaluate author assortativity. J. Inf. Sci. 2022, 48, 783–810.
- Arazzi, M.; Nicolazzo, S.; Nocera, A.; Zippo, M. The importance of the language for the evolution of online communities: An analysis based on Twitter and Reddit. Expert Syst. Appl. 2023, 222, 119847.
- Zhu, X.; de Melo, G. Sentence Analogies: Linguistic Regularities in Sentence Embeddings. In Proceedings of the 28th International Conference on Computational Linguistics, Stroudsburg, PA, USA, 8–13 December 2020; pp. 3389–3400, International Committee on Computational Linguistics.
- Simoulin, A. Sentence Embeddings and Their Relation with Sentence Structures. Ph.D. Thesis, Université Paris Cité, Paris, France, 2022.
- Noguti, V. Post language and user engagement in online content communities. Eur. J. Mark. 2016, 50, 695–723.
- Santos, Z.R.; Cheung, C.M.K.; Coelho, P.S.; Rita, P. Consumer engagement in social media brand communities: A literature review. Int. J. Inf. Manag. 2021, 63, 102457.
- Zhang, Y.; Ridings, C.; Semenov, A. What to post? Understanding engagement cultivation in microblogging with big data-driven theory building. Int. J. Inf. Manag. 2022, 71, 102509.
- García-Rudolph, A.; Sanchez-Pinsach, D.; Frey, D.; Opisso, E.; Cisek, K.; Kelleher, J.D. Know an Emotion by the Company It Keeps: Word Embeddings from Reddit/Coronavirus. Appl. Sci. 2023, 13, 6713.
- Pennebaker, J.W.; Chung, C.K.; Ireland, M.; Gonzales, A.; Booth, R.J. The Development and Psychometric Properties of LIWC2007; University of Texas at Austin: Austin, TX, USA, 2007.
- Yarkoni, T. Personality in 100,000 Words: A large-scale analysis of personality and word use among bloggers. J. Res. Pers. 2010, 44, 363–373.
- Gjurković, M.; Šnajder, J. Reddit: A gold mine for personality prediction. In Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media, New Orleans, LA, USA, 6 June 2018; pp. 87–97.
- Dover, Y.; Amichai-Hamburger, Y. Characteristics of online user-generated text predict the emotional intelligence of individuals. Sci. Rep. 2023, 13, 6778.
- Tavabi, L.; Tran, T.; Stefanov, K.; Borsari, B.; Woolley, J.D.; Scherer, S.; Soleymani, M. Analysis of Behavior Classification in Motivational Interviewing. In Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access, Rio de Janeiro, Brazil, 8–13 July 2021; pp. 110–115.
- Biggiogera, J.; Boateng, G.; Hilpert, P.; Vowels, M.; Bodenmann, G.; Neysari, M.; Kowatsch, T. BERT meets LIWC: Exploring State-of-the-Art Language Models for Predicting Communication Behavior in Couples’ Conflict Interactions. In Proceedings of the ICMI ‘21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, New York, NY, USA, 18–22 October 2021; pp. 385–389.
- Nguyen, D.; Rosé, C.P. Language use as a reflection of socialization in online communities. In Proceedings of the Workshop on Languages in Social Media, Portland, Oregon, 23 June 2011; pp. 76–85.
- Hay, J.; Doan, B.L.; Popineau, F.; Elhara, O.A. Representation learning of writing style. In Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), Online, 19 November 2020; pp. 232–243.
- Huertas-Tato, J.; Martín, A.; Camacho, D. Understanding writing style in social media with a supervised contrastively pre-trained transformer. Knowl. Based Syst. 2024, 296, 111867.
- Strukova, S.; Ruipérez-Valiente, J.A.; Gómez Mármol, F. Computational approaches to detect experts in distributed online communities: A case study on Reddit. Clust. Comput. 2023, 27, 0123456789.
- Cork, A.; Everson, R.; Naserian, E.; Levine, M.; Koschate-Reis, M. Collective self-understanding: A linguistic style analysis of naturally occurring text data. Behav. Res. Methods 2022, 55, 4455–4477.
- Baumgartner, J.; Zannettou, S.; Keegan, B.; Squire, M.; Blackburn, J. The pushshift reddit dataset. In Proceedings of the International AAAI Conference on Web and Social Media, Georgia, GA, USA, 8–11 June 2019; Volume 14, pp. 830–839.
- Rani, S.; Ahmed, K.; Subramani, S. From Posts to Knowledge: Annotating a Pandemic-Era Reddit Dataset to Navigate Mental Health Narratives. Appl. Sci. 2024, 14, 1547.
- Proferes, N.; Jones, N.; Gilbert, S.; Fiesler, C.; Zimmer, M. Studying Reddit: A Systematic Overview of Disciplines, Approaches, Methods, and Ethics. Soc. Media + Soc. 2021, 7, 20563051211019004.
- Bump, P. 24 Reddit Stats and Facts to Know in 2022. HubSpot. 2022. Available online: https://blog.hubspot.com/marketing/reddit-stats (accessed on 2 April 2024).
- Hintz, E.A.; Betts, T. Reddit in communication research: Current status, future directions and best practices. Ann. Int. Commun. Assoc. 2022, 46, 116–133.
- Kilroy, D.; Healy, G.; Caton, S. Using Machine Learning to Improve Lead Times in the Identification of Emerging Customer Needs. IEEE Access 2022, 10, 37774–37795.
- Eberhard, L.; Popova, K.; Walk, S.; Helic, D. Computing recommendations from free-form text. Expert Syst. Appl. 2024, 236, 121268.
- Hutto, C.; Gilbert, E. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. Proc. Int. AAAI Conf. Web Soc. Media 2014, 8, 216–225.
- Lee, B.W.; Lee, J.H.J. LFTK: Handcrafted Features in Computational Linguistics. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA, 23–24 June 2014; pp. 1–19.
- Ruan, T.; Lv, Q. Public perception of electric vehicles on Reddit and Twitter: A cross-platform analysis. Transp. Res. Interdiscip. Perspect. 2023, 21, 100872.
- Le, Q.; Mikolov, T. Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning, Beijing, China, 22–24 June 2014; pp. 1188–1196.
- Aguilar, J.; Salazar, C.; Velasco, H.; Monsalve-Pulido, J.; Montoya, E. Comparison and Evaluation of Different Methods for the Feature Extraction from Educational Contents. Computation 2020, 8, 30.
- Budiarto, A.; Rahutomo, R.; Putra, H.N.; Cenggoro, T.W.; Kacamarga, M.F.; Pardamean, B. Unsupervised News Topic Modelling with Doc2Vec and Spherical Clustering. Procedia Comput. Sci. 2021, 179, 40–46.
- Karvelis, P.; Gavrilis, D.; Georgoulas, G.; Stylios, C. Topic recommendation using Doc2Vec. In Proceedings of the International Joint Conference on Neural Networks, Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–6.
- Wang, G.; Kwok, S.W.H. Using K-means clustering method with Doc2vec to understand the twitter users’ opinions on COVID-19 vaccination. In Proceedings of the 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), Athens, Greece, 27–30 July 2021; pp. 1–4.
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence embeddings using siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3982–3992.
- Iliescu, D.M.; Grand, R.; Qirko, S.; van der Goot, R. Much Gracias: Semi-supervised Code-switch Detection for Spanish-English: How far can we get? In Proceedings of the 5th Workshop on Computational Approaches to Linguistic Code-Switching (CALCS 2021), Mexico City, Mexico, 11 June 2021; pp. 65–71.
- Adams, M.A.; Conway, T.L. Eta Squared. In Encyclopedia of Quality of Life and Well-Being Research; Michalos, A.C., Ed.; Springer: Dordrecht, The Netherlands, 2014.
Subreddit (Community) | r/Booksuggestions | r/Moviesuggestions | r/Suggestmeabook | Total |
---|---|---|---|---|
Comments | 303,881 | 325,955 | 680,351 | 1,310,187 |
Threads | 29,510 | 17,576 | 47,218 | 94,304 |
Unique users with ≥1 comment | 56,138 | 40,520 | 119,749 | 216,407 |
Unique users with >3 comments | 15,714 | 12,490 | 32,054 | 60,258 |
Variable | r/Booksuggestions PCA 1 | r/Booksuggestions PCA 2 | r/Moviesuggestions PCA 1 | r/Moviesuggestions PCA 2 | r/Suggestmeabook PCA 1 | r/Suggestmeabook PCA 2 |
---|---|---|---|---|---|---|
Behavioral Engagement | ||||||
Activ Length Months | 0.57 | −0.05 | −0.50 | 0.08 | −0.53 | 0.48 |
Activ Length Days | 0.65 | 0.07 | −0.65 | 0.07 | −0.62 | 0.53 |
Agg Comment Score | 0.53 | 0.05 | −0.51 | 0.03 | −0.47 | 0.44 |
Agg Replies Created | 0.51 | 0.05 | −0.53 | −0.01 | −0.45 | 0.41 |
Num of Comments | 0.51 | 0.08 | −0.49 | 0.02 | −0.47 | 0.42 |
Num of Unique Threads | 0.51 | 0.08 | −0.45 | 0.06 | −0.45 | 0.39 |
Comp Sentiment Score | −0.09 | −0.15 | 0.01 | −0.43 | 0.07 | 0.09 |
Pos Sentiment Score | −0.37 | 0.05 | 0.16 | −0.33 | 0.32 | −0.23 |
Neu Sentiment Score | 0.34 | −0.05 | −0.12 | 0.28 | −0.28 | 0.19 |
Affect | −0.32 | 0.08 | 0.18 | −0.16 | 0.28 | −0.22 |
Pos Emotion | −0.34 | 0.08 | 0.18 | −0.21 | 0.29 | −0.24 |
Linguistic Style | ||||||
Tot Word | 0.52 | −0.12 | −0.50 | −0.29 | −0.47 | 0.59 |
Tot Un Word | 0.56 | −0.15 | −0.53 | −0.33 | −0.50 | 0.64 |
Avg Syll PS | 0.50 | −0.10 | −0.41 | −0.27 | −0.46 | 0.51 |
Avg Kup AoA PW | −0.08 | −0.16 | 0.06 | −0.49 | 0.08 | 0.21 |
RT Fast | 0.52 | −0.12 | −0.50 | −0.29 | −0.47 | 0.59 |
RT Slow | 0.52 | −0.12 | −0.50 | −0.29 | −0.47 | 0.59 |
SMOG Index | 0.38 | 0.04 | −0.32 | −0.18 | −0.35 | 0.41 |
Pers Pronouns | −0.19 | −0.04 | 0.05 | −0.42 | 0.18 | 0.00 |
1st Pers Singular | −0.20 | −0.10 | 0.05 | −0.43 | 0.18 | −0.01 |
Common Verbs | −0.30 | 0.03 | 0.11 | −0.44 | 0.25 | −0.08 |
Past Tense | −0.04 | −0.11 | 0.02 | −0.32 | 0.05 | 0.08 |
Present Tense | −0.28 | 0.05 | 0.11 | −0.33 | 0.24 | −0.11 |
Adverbs | −0.05 | −0.12 | −0.03 | −0.37 | 0.05 | 0.13 |
Conjunctions | 0.19 | −0.19 | −0.15 | −0.31 | −0.13 | 0.28 |
Cognitive | 0.16 | −0.16 | −0.11 | −0.39 | −0.10 | 0.34 |
Exclusive | 0.10 | −0.19 | −0.08 | −0.32 | −0.06 | 0.25 |
Variable | r/Booksuggestions PCA 1 | r/Booksuggestions PCA 2 | r/Moviesuggestions PCA 1 | r/Moviesuggestions PCA 2 | r/Suggestmeabook PCA 1 | r/Suggestmeabook PCA 2 |
---|---|---|---|---|---|---|
Behavioral Engagement | ||||||
Num of Com Own Threads | 0.29 | −0.19 | 0.00 | 0.37 | 0.25 | 0.23 |
Comp Sentiment Score | 0.10 | −0.60 | −0.30 | 0.58 | 0.02 | 0.62 |
Pos Sentiment Score | 0.65 | −0.44 | −0.02 | 0.73 | 0.55 | 0.54 |
Neu Sentiment Score | −0.61 | 0.35 | 0.05 | −0.59 | −0.51 | −0.44 |
Neg Sentiment Score | −0.21 | 0.31 | −0.05 | −0.29 | −0.17 | −0.30 |
Affect | 0.60 | −0.24 | 0.13 | 0.47 | 0.52 | 0.33 |
Pos Emotion | 0.63 | −0.30 | 0.12 | 0.60 | 0.55 | 0.40 |
Linguistic Style | ||||||
Tot Word | −0.43 | −0.19 | −0.58 | 0.05 | −0.45 | 0.20 |
Tot Un Word | −0.51 | −0.23 | −0.67 | 0.08 | −0.52 | 0.25 |
Avg Syll PS | −0.59 | −0.11 | −0.66 | 0.00 | −0.58 | 0.10 |
Avg Kup AoA PW | 0.06 | −0.38 | −0.28 | 0.28 | 0.01 | 0.41 |
RT Fast | −0.43 | −0.19 | −0.58 | 0.05 | −0.45 | 0.20 |
RT Slow | −0.43 | −0.19 | −0.58 | 0.05 | −0.45 | 0.20 |
SMOG Index | −0.37 | −0.14 | −0.45 | 0.00 | −0.37 | 0.13 |
Pers Pronouns | 0.22 | −0.30 | −0.19 | 0.44 | 0.17 | 0.36 |
1st Pers Singular | 0.06 | −0.24 | −0.28 | 0.32 | 0.02 | 0.29 |
2nd Pers | 0.32 | −0.19 | 0.06 | 0.36 | 0.30 | 0.24 |
Articles | −0.35 | 0.15 | −0.07 | −0.28 | −0.27 | −0.22 |
Common Verbs | 0.56 | −0.39 | −0.10 | 0.69 | 0.45 | 0.49 |
Present Tense | 0.60 | −0.36 | 0.01 | 0.67 | 0.51 | 0.47 |
Future Tense | 0.24 | −0.33 | −0.07 | 0.37 | 0.21 | 0.36 |
Adverbs | −0.04 | −0.17 | −0.33 | 0.17 | −0.09 | 0.21 |
Prepositions | −0.46 | 0.08 | −0.34 | 0.00 | −0.43 | −0.11 |
Conjunctions | −0.34 | −0.11 | −0.44 | 0.06 | −0.34 | 0.11 |
Cognitive | −0.32 | −0.09 | −0.42 | 0.06 | −0.31 | 0.11 |
Exclusive | −0.23 | −0.09 | −0.36 | 0.09 | −0.24 | 0.12 |
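The two correlation tables above relate each principal factor of the user embeddings to the evaluation variables. A minimal sketch of one such cell follows, assuming the coefficients are Pearson correlations between per-user component scores and a per-user metric (the excerpt does not name the coefficient). The sample values and the names `pc1_scores` and `tot_word` are made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical per-user values: first principal component score and
# average words per comment (the "Tot Word" variable).
pc1_scores = [-1.2, -0.4, 0.1, 0.6, 0.9]
tot_word = [12, 20, 25, 31, 40]

r = pearson_r(pc1_scores, tot_word)
# Negating the component scores flips only the sign of the correlation.
r_flipped = pearson_r([-s for s in pc1_scores], tot_word)

if __name__ == "__main__":
    print(round(r, 2), round(r_flipped, 2))
```

The sign of a principal component is arbitrary, so a coefficient of 0.65 in one subreddit and −0.65 in another (as for Activ Length Days on PCA 1) can describe the same underlying relationship; comparing magnitudes across communities is the safer reading.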
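Variable | r/Booksuggestions HC | r/Booksuggestions MC | r/Booksuggestions LC | r/Booksuggestions η² | r/Moviesuggestions HC | r/Moviesuggestions MC | r/Moviesuggestions LC | r/Moviesuggestions η² | r/Suggestmeabook HC | r/Suggestmeabook MC | r/Suggestmeabook LC | r/Suggestmeabook η² |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Behavioral Engagement | ||||||||||||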
Activ Length Months | 6.41 | 3.66 | 2.35 | 0.27 | 6.70 | 4.09 | 2.86 | 0.21 | 5.90 | 3.52 | 2.41 | 0.24 |
Activ Length Days | 29.76 | 7.31 | 3.80 | 0.31 | 40.05 | 9.78 | 4.84 | 0.33 | 23.28 | 6.74 | 3.77 | 0.28 |
Agg Comment Score | 182.97 | 36.70 | 16.81 | 0.19 | 322.97 | 52.35 | 22.16 | 0.17 | 180.47 | 42.07 | 19.51 | 0.16 |
Agg Replies Created | 29.55 | 4.86 | 2.21 | 0.16 | 30.79 | 4.98 | 1.90 | 0.19 | 23.88 | 4.71 | 2.27 | 0.13 |
Num of Comments | 63.87 | 11.25 | 6.23 | 0.16 | 98.00 | 15.57 | 6.90 | 0.15 | 49.37 | 10.82 | 6.20 | 0.14 |
Num of Unique Threads | 51.69 | 8.32 | 3.64 | 0.15 | 77.06 | 11.52 | 5.18 | 0.13 | 37.58 | 7.44 | 3.67 | 0.11 |
Pos Sentiment Score | 0.15 | 0.18 | 0.25 | 0.11 | 0.15 | 0.17 | 0.19 | 0.01 | 0.16 | 0.18 | 0.23 | 0.07 |
Neu Sentiment Score | 0.79 | 0.77 | 0.71 | 0.09 | 0.77 | 0.76 | 0.75 | 0.01 | 0.79 | 0.77 | 0.72 | 0.05 |
Affect | 0.06 | 0.07 | 0.11 | 0.08 | 0.07 | 0.08 | 0.10 | 0.02 | 0.06 | 0.07 | 0.10 | 0.05 |
Pos Emotion | 0.05 | 0.06 | 0.09 | 0.09 | 0.05 | 0.06 | 0.07 | 0.02 | 0.05 | 0.06 | 0.08 | 0.06 |
Linguistic Style | ||||||||||||
Tot Word | 48.36 | 32.77 | 16.26 | 0.22 | 32.37 | 21.22 | 10.44 | 0.22 | 46.72 | 30.39 | 15.37 | 0.26 |
Tot Un Word | 37.24 | 27.88 | 15.68 | 0.27 | 26.22 | 19.12 | 10.53 | 0.26 | 36.28 | 26.27 | 14.96 | 0.29 |
Avg Syll PS | 19.01 | 16.44 | 11.26 | 0.22 | 14.04 | 12.46 | 8.84 | 0.17 | 18.49 | 15.81 | 11.09 | 0.22 |
RT Fast | 0.16 | 0.11 | 0.05 | 0.22 | 0.11 | 0.07 | 0.03 | 0.22 | 0.16 | 0.10 | 0.05 | 0.26 |
RT Slow | 0.28 | 0.19 | 0.09 | 0.22 | 0.18 | 0.12 | 0.06 | 0.22 | 0.27 | 0.17 | 0.09 | 0.26 |
SMOG Index | 2.62 | 2.12 | 1.36 | 0.13 | 1.88 | 1.58 | 1.01 | 0.10 | 2.54 | 2.05 | 1.30 | 0.14 |
Common Verbs | 0.09 | 0.10 | 0.13 | 0.07 | 0.08 | 0.09 | 0.09 | 0.00 | 0.09 | 0.10 | 0.12 | 0.03 |
Present Tense | 0.06 | 0.06 | 0.09 | 0.06 | 0.05 | 0.06 | 0.06 | 0.00 | 0.06 | 0.06 | 0.08 | 0.03 |
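Variable | r/Booksuggestions HP | r/Booksuggestions MP | r/Booksuggestions LP | r/Booksuggestions η² | r/Moviesuggestions HP | r/Moviesuggestions MP | r/Moviesuggestions LP | r/Moviesuggestions η² | r/Suggestmeabook HP | r/Suggestmeabook MP | r/Suggestmeabook LP | r/Suggestmeabook η² |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Behavioral Engagement | ||||||||||||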
Activ Length Months | 1.78 | 3.70 | 3.78 | 0.08 | 2.00 | 3.81 | 4.21 | 0.05 | 2.05 | 3.73 | 3.71 | 0.06 |
Num of Com Own Threads | 7.62 | 1.11 | 0.96 | 0.11 | 9.71 | 2.21 | 0.78 | 0.10 | 7.80 | 1.23 | 0.95 | 0.10 |
Comp Sentiment Score | 0.50 | 0.39 | 0.22 | 0.21 | 0.44 | 0.25 | 0.09 | 0.26 | 0.49 | 0.37 | 0.19 | 0.22 |
Pos Sentiment Score | 0.40 | 0.18 | 0.16 | 0.45 | 0.39 | 0.18 | 0.13 | 0.39 | 0.39 | 0.17 | 0.15 | 0.43 |
Neu Sentiment Score | 0.57 | 0.77 | 0.78 | 0.38 | 0.58 | 0.75 | 0.79 | 0.27 | 0.59 | 0.77 | 0.79 | 0.34 |
Neg Sentiment Score | 0.02 | 0.05 | 0.06 | 0.08 | 0.03 | 0.07 | 0.08 | 0.05 | 0.02 | 0.05 | 0.06 | 0.07 |
Affect | 0.17 | 0.07 | 0.07 | 0.31 | 0.17 | 0.08 | 0.08 | 0.18 | 0.16 | 0.07 | 0.07 | 0.29 |
Pos Emotion | 0.16 | 0.06 | 0.05 | 0.36 | 0.17 | 0.06 | 0.05 | 0.27 | 0.15 | 0.05 | 0.05 | 0.35 |
Linguistic Style | ||||||||||||
Tot Word | 14.25 | 37.11 | 21.45 | 0.16 | 12.16 | 26.13 | 10.87 | 0.23 | 15.34 | 36.84 | 20.39 | 0.17 |
Tot Un Word | 13.89 | 30.97 | 19.10 | 0.22 | 12.17 | 22.81 | 10.60 | 0.31 | 14.77 | 30.80 | 18.24 | 0.24 |
Avg Syll PS | 8.64 | 17.57 | 13.22 | 0.27 | 8.24 | 14.05 | 8.74 | 0.29 | 9.19 | 17.36 | 12.86 | 0.25 |
Avg Kup AoA PW | 4.10 | 3.90 | 3.61 | 0.08 | 3.99 | 3.78 | 3.23 | 0.12 | 4.11 | 3.87 | 3.53 | 0.10 |
RT Fast | 0.05 | 0.12 | 0.07 | 0.16 | 0.04 | 0.09 | 0.04 | 0.23 | 0.05 | 0.12 | 0.07 | 0.17 |
RT Slow | 0.08 | 0.21 | 0.12 | 0.16 | 0.07 | 0.15 | 0.06 | 0.23 | 0.09 | 0.21 | 0.12 | 0.17 |
SMOG Index | 1.21 | 2.27 | 1.61 | 0.11 | 0.95 | 1.79 | 1.03 | 0.14 | 1.25 | 2.24 | 1.58 | 0.10 |
Pers Pronouns | 0.09 | 0.06 | 0.05 | 0.09 | 0.08 | 0.05 | 0.03 | 0.15 | 0.08 | 0.06 | 0.05 | 0.10 |
1st Pers Singular | 0.05 | 0.04 | 0.03 | 0.03 | 0.04 | 0.03 | 0.02 | 0.11 | 0.05 | 0.04 | 0.03 | 0.04 |
2nd Pers | 0.03 | 0.01 | 0.01 | 0.09 | 0.03 | 0.01 | 0.01 | 0.08 | 0.03 | 0.01 | 0.01 | 0.09 |
Articles | 0.04 | 0.07 | 0.07 | 0.10 | 0.04 | 0.07 | 0.07 | 0.04 | 0.04 | 0.07 | 0.07 | 0.08 |
Common Verbs | 0.20 | 0.10 | 0.09 | 0.34 | 0.20 | 0.10 | 0.06 | 0.36 | 0.19 | 0.10 | 0.08 | 0.33 |
Present Tense | 0.16 | 0.06 | 0.05 | 0.35 | 0.15 | 0.06 | 0.04 | 0.34 | 0.15 | 0.06 | 0.05 | 0.35 |
Future Tense | 0.02 | 0.01 | 0.01 | 0.13 | 0.02 | 0.01 | 0.00 | 0.10 | 0.02 | 0.01 | 0.01 | 0.13 |
Adverbs | 0.04 | 0.04 | 0.03 | 0.02 | 0.03 | 0.04 | 0.02 | 0.09 | 0.04 | 0.04 | 0.03 | 0.03 |
Prepositions | 0.07 | 0.11 | 0.10 | 0.13 | 0.06 | 0.08 | 0.06 | 0.07 | 0.07 | 0.11 | 0.10 | 0.11 |
Conjunctions | 0.03 | 0.05 | 0.04 | 0.10 | 0.03 | 0.04 | 0.03 | 0.13 | 0.03 | 0.05 | 0.04 | 0.09 |
Cognitive | 0.09 | 0.12 | 0.11 | 0.08 | 0.09 | 0.12 | 0.08 | 0.12 | 0.09 | 0.12 | 0.10 | 0.07 |
Exclusive | 0.02 | 0.03 | 0.02 | 0.05 | 0.02 | 0.03 | 0.01 | 0.10 | 0.02 | 0.03 | 0.02 | 0.05 |
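The cluster tables above report an eta squared value per variable alongside the cluster means. As a minimal sketch, assuming the standard one-way definition (between-cluster sum of squares divided by total sum of squares), the effect size for one variable across three clusters could be computed as follows. The function name and the toy data are hypothetical and do not reproduce the article's figures.

```python
def eta_squared(groups):
    """One-way eta squared: SS_between / SS_total over a list of groups."""
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group needs at least one observation")
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("eta squared is undefined when all values are equal")
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Toy data: one variable observed in three hypothetical user clusters.
clusters = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

if __name__ == "__main__":
    print(eta_squared(clusters))
```

Values near 0 mean cluster membership explains little of the variable's variance, while values near 1 mean the clusters separate users almost completely on that variable.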
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Skotis, A.; Livas, C. Prominent User Segments in Online Consumer Recommendation Communities: Capturing Behavioral and Linguistic Qualities with User Comment Embeddings. Information 2024, 15, 356. https://doi.org/10.3390/info15060356