Text Analysis Methods for Misinformation–Related Research on Finnish Language Twitter
1. Introduction
2. Overview of Key Concepts
3. Materials and Methods
3.1. Methodology Issues Related to the Language Used
3.2. Data Collection
3.3. Data Processing
3.4. Word Count Analysis in Atlas.ti 9
4. Results
4.1. Word Cloud of Tweets
4.2. Topic Model of Tweets
4.3. Word Count Analysis and Clustering
5. Discussion
6. Conclusions
Author Contributions
Data Availability Statement
Conflicts of Interest
| Variable Identification | Type | Description |
|---|---|---|
| created_at | String | UTC time when this tweet was created. |
| text | String | The actual UTF-8 text of the status update. |
| extended_tweet.full_text | String | Untruncated text of the tweet when it is longer than 140 characters. |
| entities.hashtags | Array | Hashtags that have been parsed out of the tweet text. |
| user.id_str | String | The string representation of the unique identifier for this user. |
| user.screen_name | String | The screen name, handle, or alias that this user identifies themselves with. Screen names are unique but subject to change. |
| user.description | String | The user-defined UTF-8 string describing their account. |
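The variables above are nested fields of a tweet object returned by the Twitter API. The following is a minimal sketch, assuming each tweet is available as a parsed JSON object with these keys; the helper names `get_path` and `extract_row` and the sample tweet are hypothetical. It flattens the dotted paths into one row per tweet and prefers `extended_tweet.full_text` over the possibly truncated `text` when both are present.

```python
import json

# Dotted field paths, as listed in the variable table above.
FIELDS = [
    "created_at",
    "text",
    "extended_tweet.full_text",
    "entities.hashtags",
    "user.id_str",
    "user.screen_name",
    "user.description",
]


def get_path(obj, dotted):
    """Follow a dotted path such as 'user.screen_name'; return None if any key is missing."""
    for key in dotted.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def extract_row(tweet):
    """Flatten one tweet object into a dict keyed by the dotted field names."""
    row = {field: get_path(tweet, field) for field in FIELDS}
    # Prefer the untruncated text for tweets longer than 140 characters.
    row["full_text"] = row["extended_tweet.full_text"] or row["text"]
    # Keep only the hashtag strings, not their character offsets.
    row["hashtags"] = [h["text"] for h in (row["entities.hashtags"] or [])]
    return row


# Hypothetical sample tweet, not taken from the study's dataset.
sample = json.loads("""
{
  "created_at": "Mon May 03 10:00:00 +0000 2021",
  "text": "Esimerkki #disinformaatio ...",
  "extended_tweet": {"full_text": "Esimerkki #disinformaatio koko teksti"},
  "entities": {"hashtags": [{"text": "disinformaatio", "indices": [10, 25]}]},
  "user": {"id_str": "12345", "screen_name": "esimerkki", "description": "Testitili"}
}
""")

row = extract_row(sample)
print(row["full_text"])  # Esimerkki #disinformaatio koko teksti
print(row["hashtags"])   # ['disinformaatio']
```

Rows produced this way can be written to a spreadsheet for import into a qualitative analysis tool.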
| Action | Description |
|---|---|
| 1. Remove links | Links are removed to reduce unstructured text. |
| 2. Make all letters lower case | All letters are converted to lower case because the analysis is case-sensitive. |
| 3. Remove punctuation, digits, and special markers | All punctuation is removed to reduce unstructured text, and digits are removed because numbers do not usually change the meaning of the text. Special markers are also removed, most commonly @, which marks a mentioned user. |
| 4. Remove white spaces | All unnecessary white spaces are removed. |
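The four cleaning steps can be sketched with regular expressions. This is a minimal illustration, assuming plain Python `re` patterns chosen here for the purpose (the pattern names and the `preprocess` function are hypothetical, not the authors' code):

```python
import re

# Step 1: http(s) links and bare www. links.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Step 3: anything that is neither a word character nor white space (punctuation,
# @ and # markers, emoji), plus digits. \w is Unicode-aware, so ä and ö are kept;
# underscores are kept so that user handles stay in one piece.
NON_LETTER_RE = re.compile(r"[^\w\s]|\d")
# Step 4: runs of white space.
SPACE_RE = re.compile(r"\s+")


def preprocess(text):
    """Apply the four cleaning steps in the order listed in the table above."""
    text = URL_RE.sub(" ", text)            # 1. remove links
    text = text.lower()                     # 2. lower-case every letter
    text = NON_LETTER_RE.sub(" ", text)     # 3. punctuation, digits, special markers
    return SPACE_RE.sub(" ", text).strip()  # 4. collapse and trim white space


print(preprocess("RT @MariaOhisalo: Propagandaa!!! https://t.co/abc 2021 &amp; muuta"))
# rt mariaohisalo propagandaa amp muuta
```

Removing only the @ marker keeps the handle itself as a token, which is consistent with handles such as mariaohisalo appearing in the frequency table. The HTML entity `&amp;`, which Twitter uses for an ampersand in tweet text, survives these steps as the token amp; this plausibly explains the Amp entry in the frequency table, and decoding entities (for example with `html.unescape`) before step 3 would avoid it.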
| Action | Inclusion/Exclusion Criteria | Quantity of Data |
|---|---|---|
| 1. Data import from Excel to Atlas.ti | Inclusion of 16,463 tweets | 16,463 documents (tweets) |
| 2. General terms added to the stoplist | 746 general Finnish stop terms in Atlas.ti, plus other general words and Twitter display names | 47,013 words |
| 3. Word list with quantities exported to an Excel spreadsheet | Inclusion criterion: a minimum of 50 tweets per word | 468 words |
| 4. Exclusion | Exclusion of 213 + 37 general words and Twitter display names | 218 words |
| 5. Inclusion | Inclusion of 385 words derived from or compounded with a stem word | 602 words |
| 6. Clustering | Clustering of words | 88 clusters |
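Steps 2 to 4 are filters that can be approximated outside Atlas.ti. The sketch below is a hypothetical Python equivalent, not the Atlas.ti procedure: it assumes preprocessed, whitespace-tokenized tweets, and it reads "a minimum of 50 tweets per word" as a document frequency (the number of tweets in which a word occurs at least once). A token-frequency reading would count every occurrence instead.

```python
from collections import Counter


def word_list(tweets, stoplist, min_tweets=50, exclude=()):
    """Count, for every word, the number of tweets containing it, then filter.

    tweets     -- preprocessed tweet strings
    stoplist   -- general words dropped before counting (step 2)
    min_tweets -- keep words found in at least this many tweets (step 3)
    exclude    -- further words removed after manual review (step 4)
    """
    doc_freq = Counter()
    for tweet in tweets:
        # set(): a word is counted at most once per tweet
        doc_freq.update(set(tweet.split()).difference(stoplist))
    kept = {w: n for w, n in doc_freq.items() if n >= min_tweets and w not in exclude}
    # Most frequent first; ties in alphabetical order.
    return dict(sorted(kept.items(), key=lambda kv: (-kv[1], kv[0])))


tweets = ["propaganda on propagandaa", "propaganda ja valeuutinen", "ja propaganda"]
print(word_list(tweets, {"ja", "on"}, min_tweets=2))  # {'propaganda': 3}
```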
| Word | Frequency |
|---|---|
| Propaganda | 4804 |
| Valeuutinen | 1676 |
| Disinformaatio | 1500 |
| Suomen | 492 |
| Yleuutiset | 461 |
| mariaohisalo | 421 |
| Leviää | 358 |
| Suomessa | 354 |
| Yle | 354 |
| Russia | 334 |
| Dimmu | 328 |
| Persut | 326 |
| propagandaa | 306 |
| Ylen | 287 |
| Saa | 275 |
| Osa | 268 |
| Media | 267 |
| astatenhunen | 267 |
| Pitää | 266 |
| Mm | 255 |
| Uutisankka | 253 |
| Keronen | 249 |
| Amp | 248 |
| Marinsanna | 244 |
| Hsfi | 225 |
| Twitterissä | 217 |
| Venäjän | 216 |
| Hallituksen | 206 |
| Journalismi | 205 |
| Somessa | 197 |
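A word cloud draws each word at a size that grows with its frequency. The source does not state how frequencies were mapped to sizes, so the following is only a minimal sketch assuming a linear mapping onto a hypothetical range of 10 to 80 points; the example uses four frequencies from the table above.

```python
def font_sizes(freqs, min_size=10, max_size=80):
    """Map each word's frequency linearly onto [min_size, max_size]."""
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_size + (f - lo) / span * (max_size - min_size))
        for word, f in freqs.items()
    }


sizes = font_sizes({"propaganda": 4804, "valeuutinen": 1676,
                    "disinformaatio": 1500, "somessa": 197})
print(sizes)
# {'propaganda': 80, 'valeuutinen': 32, 'disinformaatio': 30, 'somessa': 10}
```

Because propaganda is almost three times as frequent as the next word, a linear scale shrinks most other words toward the minimum size. Square-root or logarithmic scaling flattens that skew; the Python `wordcloud` package, for example, exposes this trade-off through its `relative_scaling` parameter.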
| Topic 1 | Topic 4 | Topic 7 | Topic 10 | Topic 13 |
|---|---|---|---|---|
| propaganda | propaganda | propaganda | propaganda | propaganda |
| mariaohisalo | disinformaatio | valeuutinen | disinformaatio | valeuutinen |
| somessa | valeuutinen | leviää | yle | hai |
| jaa | marinsanna | persut | mariaohisalo | govt |
| valeuutisia | hai | mariaohisalo | kansainvälistä | disinformaatio |
| linkkejä | lapset | disinformaatio | ritken | sanoin |
| sitaatteja | govt | liittyen | ajoista | mattimuukkonen |
| voida | suomen | opparviainen | vietetään | paikassa |
| alkuperää | pitäisi | sisäministeri | synkistä | vaiennus |
| todistaa | yleisradio | sosiaalisi | kansanmurhien | lakia |

| Topic 2 | Topic 5 | Topic 8 | Topic 11 | Topic 14 |
|---|---|---|---|---|
| propaganda | propaganda | propaganda | propaganda | propaganda |
| valeuutinen | disinformaatio | disinformaatio | valeuutinen | valeuutinen |
| disinformaatio | valeuutinen | vaarallista | russia | leviää |
| mariaohisalo | tuli | päivää | suomen | persut |
| leviää | vihapuhe | levinnyt | propagandaa | kuvaa |
| liittyen | mm | syystä | leviää | govt |
| sosiaalisi | pari | pari | disinformaatio | hai |
| sisäministeri | yle | valheellinen | mariaohisalo | disinformaatio |
| opparviainen | suomen | liandersson | the | somealustoilla |
| propagandaa | kuntavaalit | kuvamanipulaa | opparviainen | kannattajia |

| Topic 3 | Topic 6 | Topic 9 | Topic 12 | Topic 15 |
|---|---|---|---|---|
| propaganda | propaganda | propaganda | propaganda | propaganda |
| valeuutinen | disinformaatio | valeuutinen | astatenhunen | valeuutinen |
| disinformaatio | valeuutinen | lakia | susi | kysymyksiä |
| leviää | yleuutiset | sensuuripykälä | disinformaatio | mariaohisalo |
| yleuutiset | russia | mattimuukkonen | valeuutinen | sanna |
| media | toimii | vaiennus | suomen | leviää |
| ylen | korona | mieltä | syy | keronen |
| mariaohisalo | lapset | sanoin | vihreät | nuorten |
| opparviainen | venäjän | paikassa | mm | hallituksen |
| liittyen | suomessa | rikotaan | amp | suulla |
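The tables list the ten highest-ranking words of each of the 15 topics. The source does not name the topic-modelling implementation, so the sketch below is only an illustration of one standard technique, latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, written in plain Python. The function names, the hyperparameters `alpha` and `beta`, and the toy tweets are assumptions; real corpora are better served by a library implementation such as gensim or scikit-learn.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topics = range(num_topics)
    n_dk = [[0] * num_topics for _ in docs]    # tokens per (document, topic)
    n_kw = [defaultdict(int) for _ in topics]  # tokens per (topic, word)
    n_k = [0] * num_topics                     # tokens per topic
    z = []                                     # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts ...
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # ... and resample its topic; p(z = t | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_kw


def top_words(n_kw, n=10):
    """Highest-count words of each topic, ties broken alphabetically."""
    return [
        [w for w, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n] if c > 0]
        for counts in n_kw
    ]


# Toy usage with invented, already preprocessed tweets.
toy = ["propaganda venäjä media", "venäjä propaganda yle", "media yle propaganda",
       "korona rokote virus", "rokote korona lapset", "virus lapset korona"]
docs = [t.split() for t in toy]
for k, words in enumerate(top_words(lda_gibbs(docs, num_topics=2, iters=100), n=3), start=1):
    print(f"Topic {k}: {words}")
```

A word that occurs in a large share of all tweets, as propaganda does here, tends to rank at or near the top of every topic; dropping such near-ubiquitous words before fitting is a common way to make the remaining topic words more distinctive.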
| Main Clusters in English | Quantity of Word Manifestations |
|---|---|
| Keyword clusters | |
| 1. Propaganda (Fin. propaganda) | 5706 |
| 2. Fake news (Fin. valeuutinen or fake news) | 2119 |
| 3. Disinformation (Fin. disinformaatio) | 1991 |
| 4. Hoax (Fin. uutisankka) | 256 |
| 5. Misinformation (Fin. misinformaatio) | 198 |
| 6. Lie, Nonsense, Truth, Fact (Fin. vale, valhe, huuhaa, totuus, fakta) | 1274 |
| Theme word clusters | |
| 7. Media | 5692 |
| 8. Politics | 3821 |
| 9. Foreign countries | 2769 |
| 10. Politician | 2101 |
| 11. Finland | 1601 |
| 12. Health (corona, vaccination, virus) | 1538 |
| 13. Social media | 1416 |
| 14. Animals | 520 |
| 15. Children and young people | 507 |
| 16. Movements (QAnon, ISIS, Elokapina) | 453 |
| 17. Country and World | 340 |
| 18. School | 290 |
| 19. Police | 253 |
| 20. Researcher and Research | 247 |
| 21. Opposition (Fin. vastaisuus) | 228 |
| 22. Hate Speech | 201 |
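Each quantity above is presumably the sum of the frequencies of the word forms assigned to a cluster. A minimal sketch of that bookkeeping, assuming the word-to-cluster assignment has already been made by hand; the function name `cluster_totals` is hypothetical, and rejecting a word that is assigned to two clusters is a design assumption of this sketch, not something the source states. The example uses frequencies from the word frequency table but only two of the Propaganda cluster's forms, so its total of 5110 falls short of the 5706 reported.

```python
from collections import Counter


def cluster_totals(word_freq, clusters):
    """Sum word frequencies per cluster; also return words not assigned to any cluster."""
    word_to_cluster = {}
    for cluster, members in clusters.items():
        for word in members:
            if word in word_to_cluster:
                raise ValueError(f"{word!r} is assigned to more than one cluster")
            word_to_cluster[word] = cluster
    totals = Counter()
    unassigned = {}
    for word, freq in word_freq.items():
        if word in word_to_cluster:
            totals[word_to_cluster[word]] += freq
        else:
            unassigned[word] = freq
    return totals, unassigned


word_freq = {"propaganda": 4804, "propagandaa": 306, "valeuutinen": 1676,
             "disinformaatio": 1500, "yle": 354, "ylen": 287}
clusters = {"Propaganda": {"propaganda", "propagandaa"},
            "Fake news": {"valeuutinen"},
            "Disinformation": {"disinformaatio"}}
totals, unassigned = cluster_totals(word_freq, clusters)
print(totals.most_common())  # [('Propaganda', 5110), ('Fake news', 1676), ('Disinformation', 1500)]
print(unassigned)            # {'yle': 354, 'ylen': 287}
```

Listing the unassigned words makes it easy to check that every word kept in the word list has been placed in some cluster.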
| The Content of Three Main Clusters in English | Quantity of Manifestations |
|---|---|
| Clusters | |
| Politics | 3821 |
| Finns Party (Fin. persu, persujen, persut, perussuomalaiset, perussuomalaisten, perussuomalaisiin, ps, ps:n) | 848 |
| The Greens and left-wing greens (Fin. vihreät, vihreat, vihreiden, vihreille, vihreiltä, vihreistä, vihreitä, vihreä, vihreän, vihervasemmisto, vihervasemmiston, vihervasemmisto’lainen) | 564 |
| Right-wing (Fin. äärioikeisto, äärioikeistolainen, äärioikeiston, äärioikeis, äärioikeistolaista, äärioikeistollisten) | 384 |
| Government (Fin. hallituksen, hallitus) | 350 |
| Municipal election, election (Fin. kuntavaaliehdokkaat, kuntavaalien, kuntavaalit, kuntavaalit2021, vaalit, vaaleihin, vaaleissa, vaaleja, vaalien) | 297 |
| Left-wing (Fin. vasemmisto, vasemmistolainen, vasemmalla, vasemmistolaisuus, vasemmiston, vasemmistopopulismista) | 259 |
| Communism (Fin. kommunismi, kommunisti, kommunistien, kommunistinen, kommunistisen, kommunistista, kommunistit) | 134 |
| Politicians | 2101 |
| The Prime Minister (Fin. marin, marinin, marinia, marinista, marinsanna, sanna, pääministeri, pääministeriltä) | 898 |
| Minister of the Interior (Fin. mariaohisalo, ohisalo, ohisalon, ohisalosta, sisäministe, sisäministeri) | 705 |
| Foreign countries | 2769 |
| Russia (Fin. venäjä, venäjän, venäjällä, venäjä’n, venäläinen, venäläispropagandistien, russia, russian, russians) | 867 |
| Trump (Fin. trump, trumpin, trumpia, trumppia) | 409 |
| EU (Fin. eu, eu:n, eu’n, euroopan) | 326 |
| USA (Fin. usa, usa:n, usan, usa:ssa, usassa, usavaalit, yhdysvallat, yhdysvalloissa, yhdysvaltain) | 237 |
| China (Fin. kiina, kiinaa, kiinan, kiinassa, ürümqi) | 225 |
| Sweden (Fin. ruotsin, sek) | 104 |
| Putin (Fin. putin, putinin) | 98 |
| Soviet Union (Fin. neuvostoliitto, neuvostoliiton) | 63 |
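Most members of these subclusters are inflected or compounded forms sharing a stem, which reflects the rich inflection of Finnish. When such forms are collected by hand, a few prefix rules can propose candidates for review. A minimal sketch, assuming a hypothetical pattern syntax in which a trailing `*` marks a prefix and anything else must match exactly; the rules and the function names `matches` and `assign_by_prefix` are illustrative, not the authors' procedure.

```python
def matches(word, pattern):
    """A pattern ending in '*' is a prefix; any other pattern must match exactly."""
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def assign_by_prefix(words, rules):
    """Map each word to the first cluster with a matching pattern; unmatched words are left out."""
    assigned = {}
    for word in words:
        for cluster, patterns in rules.items():
            if any(matches(word, p) for p in patterns):
                assigned[word] = cluster
                break
    return assigned


# Hypothetical rules modelled on three of the subclusters above.
RULES = {
    "Finns Party": ["persu*", "perussuomalai*", "ps", "ps:n"],
    "Russia": ["venäj*", "venäl*", "russia*"],
    "Sweden": ["ruotsi*", "sek"],
}

words = ["persut", "perussuomalaisten", "ps:n", "psykologi",
         "venäjällä", "venäläispropagandistien", "russians", "ruotsin"]
print(assign_by_prefix(words, RULES))
```

Short stems such as ps are matched exactly, because a prefix rule would also capture unrelated words such as psykologi. Forms that share no stem with the cluster name, such as sek for Sweden or ürümqi for China, still have to be listed explicitly.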
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jussila, J.; Suominen, A.H.; Partanen, A.; Honkanen, T. Text Analysis Methods for Misinformation–Related Research on Finnish Language Twitter. Future Internet 2021, 13, 157. https://doi.org/10.3390/fi13060157