Modeling and Moderation of COVID-19 Social Network Chat
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset Collection and Feature Extraction
2.2. Bi-clustering and HMM Initialization
2.3. HMM Training
3. Results
3.1. Bi-Clustering
3.2. HMM Training
3.3. Meaning of HMM Conversation States
3.4. Moderation Strategies
3.5. Comparison with Other Works
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
OSN | Online social network |
SVM | Support vector machine |
LSTM | Long short-term memory |
VAE | Variational auto-encoder |
HMM | Hidden Markov model |
TF-IDF | Term frequency-inverse document frequency |
Name | Description |
---|---|
Toxicity score | Degree to which the comment contains some form of toxicity. Continuous (between 0 and 1). |
Sarcasm score | Degree to which the comment contains sarcasm. Continuous (between 0 and 1). |
Sentiment score | Degree to which the comment is positive. Continuous (between 0 and 1). |
Anger score | Degree to which the comment contains the anger emotion. Continuous (between 0 and 1). |
Fear score | Degree to which the comment contains the fear emotion. Continuous (between 0 and 1). |
Joy score | Degree to which the comment contains the joy emotion. Continuous (between 0 and 1). |
Love score | Degree to which the comment contains the love emotion. Continuous (between 0 and 1). |
Sadness score | Degree to which the comment contains the sadness emotion. Continuous (between 0 and 1). |
Surprise score | Degree to which the comment contains the surprise emotion. Continuous (between 0 and 1). |
Contains URL | 1 if the comment contains a URL, 0 otherwise. |
Contains Email | 1 if the comment contains an email address, 0 otherwise. |
Contains hashtag | 1 if the comment contains a hashtag, 0 otherwise. |
Image only | 1 if the comment contains only an image or a GIF but no text, 0 otherwise. |
Starts with name | 1 if the comment starts with a proper noun referring to a person, 0 otherwise. |
Comment length | Number of words in the comment. |
Nbr likes | Number of likes on the comment. |
Nbr first person singular pronouns | Number of first person singular pronouns that the comment contains. |
Nbr first person plural pronouns | Number of first person plural pronouns that the comment contains. |
Nbr second person pronouns | Number of second person pronouns (both singular and plural) that the comment contains. |
Nbr third person singular pronouns | Number of third person singular pronouns that the comment contains. |
Nbr third person plural pronouns | Number of third person plural pronouns that the comment contains. |
Nbr politeness / gratitude | Number of terms of politeness and gratitude that the comment contains. |
Elapsed time | How much time has passed between when this comment and the previous one were written in the conversation. Measured in seconds. |
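
As an illustration of the simpler entries in the feature table above, the following is a minimal sketch of how the binary and count features could be computed from raw comment text. It is not the authors' implementation: the function name `surface_features`, the regular expressions, and the pronoun word lists are all assumptions chosen for this example. The model-based scores (toxicity, sarcasm, sentiment, emotions) are left out because they require trained classifiers that are not specified here.

```python
import re

# Hypothetical patterns; real-world URL and email detection is usually more involved.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")
WORD_RE = re.compile(r"[A-Za-z']+")

# Assumed English pronoun lists, one per pronoun feature in the table.
PRONOUNS = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "first_person_plural": {"we", "us", "our", "ours", "ourselves"},
    "second_person": {"you", "your", "yours", "yourself", "yourselves"},
    "third_person_singular": {"he", "him", "his", "himself", "she", "her",
                              "hers", "herself", "it", "its", "itself"},
    "third_person_plural": {"they", "them", "their", "theirs", "themselves"},
}


def surface_features(text, n_likes=0, elapsed_seconds=0.0):
    """Return a dict of simple surface features for one comment."""
    # Drop URLs before tokenizing so their fragments are not counted as words.
    tokens = [t.lower() for t in WORD_RE.findall(URL_RE.sub(" ", text))]
    features = {
        "contains_url": int(bool(URL_RE.search(text))),
        "contains_email": int(bool(EMAIL_RE.search(text))),
        "contains_hashtag": int(bool(HASHTAG_RE.search(text))),
        "comment_length": len(text.split()),  # whitespace-separated words
        "nbr_likes": n_likes,                 # supplied by the platform metadata
        "elapsed_time": elapsed_seconds,      # seconds since the previous comment
    }
    for name, words in PRONOUNS.items():
        features[f"nbr_{name}_pronouns"] = sum(t in words for t in tokens)
    return features


if __name__ == "__main__":
    example = "I think we should read https://example.org #covid, they said you agree"
    print(surface_features(example, n_likes=3, elapsed_seconds=42.0))
```

Features such as "Image only" and "Starts with name" are omitted from this sketch because they depend on attachment metadata and named-entity recognition, respectively.
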
Model | Value | Model | Value | Model | Value |
---|---|---|---|---|---|
Model #1 | 0.26 | Model #2 | 0.24 | Model #3 | 0.23 |
Model #4 | 0.22 | Model #5 | 0.20 | Model #6 | 0.18 |
Model #7 | 0.13 | Model #8 | 0.11 | Model #9 | 0.10 |
Model #10 | 0.09 | Model #11 | 0.09 | Model #12 | 0.08 |
EM Conversation | 0.22 | Conversation + Topic | 0.26 | Bayesian Conversation | 0.28 |
Name | Description | Examples of Most Relevant Unigrams and Bigrams | Proportion of Comments from the Dataset |
---|---|---|---|
Positive | Comments that are mostly positive. | Delicious, congratulations, condolences, fantastic, adorable, much happiness, filled joy, well done, beautiful story, happy birthday, etc. | 35.70% |
Images/GIF | Comments that consist of only an image or a GIF, with no text. | N/A | 0.86% |
Negative/toxic | Comments that are negative and toxic in general. | Vicious, vile, drunken, bitch, petty, pretty racist, jealous con, fascist regimes, notoriously vicious, etc. | 29.02% |
COVID-19 and vaccine worries or skepticism | Comments that reflect people’s worries about the vaccine and COVID-19 in general, as well as their skepticism towards both of these aspects. Feelings of discomfort and uneasiness related to the lockdown are also present in these comments. | Poliovirus, terrified, claustrophobic, skeptical, frightened, reluctant, nervous, feel uncomfortable, URL vaccines, plandemic scamdemic, really scared, URL brainwashing, vaccine derived, etc. | 5.44% |
URLs | Comments that are mostly made up of users linking URLs, with little to no additional text. | N/A | 1.29% |
Negative—society and economy | Comments that contain a lot of negativity aimed towards the state of society and economy. | Doomed, deprived, teetering, disgraceful, crumbling, dysfunctional, agonizing, failed economic, disrupting economy, warnings imploring, hoarding country, decimated economy, lost jobs, crash bankrupts, etc. | 9.25% |
Negative—politicians | Comments that contain a lot of negativity aimed towards politicians and governments. Contains a few hashtags. | #teardowntrudeau, #thisisamerica, overlords, spineless, #npisfakenews, humiliate bureaucratic, overlords demanding, bureaucratic overlords, trump trash, liberal retardation, hot mess, deficits matter, etc. | 12.54% |
Misc. 1 | Long messages on a variety of topics. | Khalifa, merciful, chastisement, Allah, vigour, herbal, human physicians, oil rich, grand quran, private sector, misleading information, etc. | 3.78% |
Misc. 2 | Long messages on a variety of topics. | Allah, trachea, stable financially, peace upon, investment trade, allah chastisement, wonderful mentorship, war crimes, isreali regime, economic growth, private sector, etc. | 2.12% |
State | Baseline | Positive Start | Reduced Loops | Negative Intervention | Non-Negative Intervention | Positive Only |
---|---|---|---|---|---|---|
Positive | 33.59% | 33.59% | 37.17% | 39.16% | 40.61% | 43.22% |
Images/GIF | 0.82% | 0.82% | 0.91% | 0.95% | 0.98% | 0.70% |
Negative/toxic | 27.48% | 27.48% | 24.51% | 24.05% | 21.96% | 23.89% |
COVID-19 and vaccine worries or skepticism | 5.20% | 5.20% | 5.76% | 6.07% | 6.31% | 4.54% |
URLs | 1.31% | 1.31% | 1.44% | 1.52% | 1.64% | 1.10% |
Negative—society and economy | 8.89% | 8.89% | 9.27% | 7.69% | 7.13% | 7.78% |
Negative—politicians | 15.33% | 15.33% | 12.76% | 11.92% | 12.27% | 12.54% |
Misc. 1 | 4.02% | 4.02% | 4.45% | 4.70% | 4.89% | 3.31% |
Misc. 2 | 3.37% | 3.37% | 3.73% | 3.94% | 4.22% | 2.92% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Gélinas-Gascon, F.; Khoury, R. Modeling and Moderation of COVID-19 Social Network Chat. Information 2023, 14, 124. https://doi.org/10.3390/info14020124