Text Analysis Methods for Misinformation-Related Research on Finnish Language Twitter

Abstract: The dissemination of disinformation and fabricated content on social media is growing. Yet little is known about which Twitter data analysis methods are functional for languages (such as Finnish) in which words are formed by adding endings to word stems as well as by derivation and compounding. Furthermore, there is a need to understand which themes linked with misinformation (and the concepts related to it) manifest in different countries and language areas in Twitter discourse. To address this issue, this study explores misinformation and its related concepts: disinformation, fake news, and propaganda in Finnish language tweets. We utilized (1) word cloud clustering, (2) topic modeling, and (3) word count analysis and clustering to detect and analyze misinformation-related concepts and themes connected to those concepts in Finnish language Twitter discussions. Our results are two-fold: (1) those concerning the functional data analysis methods and (2) those about the themes connected in discourse to the misinformation-related concepts. We noticed that each utilized method individually has critical limitations, especially the automated analysis methods when processing the Finnish language, yet when combined they bring value to the analysis. Moreover, we discovered that politics, both internal and external, is prominent in the Twitter discussions in connection with misinformation and its related concepts of disinformation, fake news, and propaganda.


Introduction
Across disciplines, there is an increasing interest in misinformation, which is an umbrella term referring to false information circulating online [1]. However, there is also limited understanding of why certain individuals, societies, and institutions are more vulnerable to disinformation, i.e., the intentional spread of misinformation [2]. Furthermore, fake news is considered one of the information disorders of misinformation and disinformation and a notably effective vehicle for disinformation [3]. Recently, the dissemination of disinformation and fabricated content, e.g., in the form of fake news on social media such as Twitter, has become a growing concern, especially due to the lack of awareness of the existence of such false information [4][5][6][7]. The severity of this concern has expanded as younger generations choose social media sources over journalistic ones for their information [5]. Different disciplines have approached the subject of misinformation from various viewpoints of human behavior and communication, as well as the spreading enabled by information systems and social media. Examples of these approaches are research on cognitive biases related to information literacy [8] and on disinformation processing, e.g., concerning sentence comprehension and semantic decision making [9,10]. In the field of communication research, disinformation is studied primarily from the perspective of political communication because its functions are particularly visible and effective, especially in election campaigns. The study of propaganda has a long history in communication research, but disinformation was rarely a topic of primary analysis before 2017 [11]. Since then, communication research on disinformation has widely increased, and the study of social media has played a special role in this [12].
In information systems, research in big data is a popular trend that deals with high volume, high velocity, high variety, and high veracity information assets that require new forms of processing for enhanced decision making and insight discovery [13]. In social media research, it has been identified that conversations, actors involved in the conversations, and the interactions between the actors are relevant dimensions for the dissemination of disinformation [14,15]. Furthermore, the term "fake news" serves as a discursive device for ordinary citizens to consolidate group identity in everyday political utterances on Twitter [16]. The rapid spread of false information on Twitter has especially raised concerns [3,17,18]. So far, research is still theoretically scattered, and there is a need for a data-driven understanding of the phenomena. The research has especially focused on the role of social media in facilitating the dissemination of fake news. However, it has been shown that social media also serves as a networked public space for opposing parties to define, contest, and strategically leverage the "fake news" label to their respective interests, thus challenging democracy with a weaponized discourse of fake news [16]. A total of 33% of Finns tend to trust or totally trust news and information they receive through online social networks and messaging apps [19]. Thus, investigating the spread of misinformation, disinformation, fake news, and propaganda in the discourse of Finnish Twitter is a relevant research topic. Previous research has also identified that users have difficulty in recognizing manipulation of information in news [20], which supports the need for further research on the topic. Methods to study and analyze social media data, also regarding misinformation, are currently evolving alongside social media applications. Thus, we focus on Twitter data, especially in Finnish Twitter. 
Therefore, in this article, we aim to study misinformation and its related concepts in the context of Twitter, test the functionality of a few analysis methods for Twitter data in the Finnish language, examine the discussed themes linked with misinformation and its related concepts in Finnish tweets, and propose some promising future research directions on misinformation in social media. This study is limited to the investigation of the text content of tweets [21]. This limitation was chosen to uncover the inherent challenges of text mining Finnish tweet contents and to understand what kind of insights can be drawn from text only. The Finnish language has special grammatical characteristics since word formation occurs with the addition of endings (bound morphemes, suffixes) to word stems. In addition, derivation and compounding are two ways of forming new words from existing words and stems, which makes the analysis of words and text, as well as machine learning of the language, more complex than with English, for example. Therefore, our goals are to find functional methods of analysis as well as discussion themes that are linked with misinformation-related concepts in Finnish Twitter data. To reach our goals, we seek to answer the following research question: "What are the functional methods to analyze Finnish Twitter data to find out discussion themes that are linked with misinformation and concepts related to it?" The article is structured as follows: First, in the introduction, we give background regarding our research problem and present the research question. Second, we describe the key concepts related to misinformation. Third, we discuss our methodological choices for our research. Fourth, we illustrate our research results. Fifth, we discuss our results in light of the literature and present practical implications together with further research suggestions. Sixth, we draw our conclusions.

Overview of Key Concepts
Simultaneously with the COVID-19 pandemic, the world is struggling with an "infodemic". An infodemic is an information overload about a problem, which makes discovering a solution cumbersome, and it can spread misinformation and disinformation [22,23]. "Misinformation" and "disinformation," together with "fake news," are related concepts, now excessively utilized both by academics and other information providers. The proliferation in the use of the concepts and the lack of generally agreed definitions have led to interchangeable use [23]. Misinformation is defined as false, mistaken, or misleading information [24], and as a type of claim that can be verified and that has been confirmed to be false [23]. It refers to forms of factually inaccurate information that is propagated inadvertently [25], often considered as an honest mistake [4]. Some studies use misinformation as an umbrella term when referring to false information circulating online [1]. Misinformation, disinformation, and fake news are three distinct concepts, although similar characteristics are detectable in the definitions used. Other concepts that potentially overlap with misinformation are propaganda, conspiracy theories, and false rumors [23]. Disinformation, in addition to the falseness of the information, involves purposeful intention on the part of the sender or information provider; it is thus distributed, asserted, or disseminated, often online, to mislead, deceive, or confuse [1,24-27]. The communication of misinformation has a social function: (1) it is comprehended as a type of collective sense-making, and (2) it subverts outgroups or rivals [23]. Moreover, disinformation is considered a direct manifestation of this function, containing deliberately deceptive and propagated information that purposefully weakens public support of a competing body, e.g., as in the U.S. presidential election campaigns [23].
Fake news is one of the information disorders of misinformation and disinformation [3]. It is a notably effective vehicle for disinformation. Fake news is fabricated information that mimics news media content in form [3], i.e., disguised as journalistic articles [27,28]. Fake news is pushed onto newsfeed platforms [27,28] like Twitter. Although its form resembles news, the organizational process or intent of fake news does not. The outlets lack editorial norms and processes of the news media, where accuracy and credibility of information are ensured [3]. Thus, fake news exploits the credibility, timeliness, and the variety of sensitive topics of journalism [27,28]. At its core, it is perniciously parasitic, benefiting from the standard media outlets, yet undermining their credibility [3]. The difficulty in differentiating fake news from traditional news is due to their similarity. They both share an inverted pyramid format, timeliness, negativity and prominence, and subjects such as politics and government. The only two differences are the lack of objectivity, including the opinion of their author(s) and the value of impact because fake news sites focus on trivial stories [29]. Thus, the recipient, journalist, or reader of a news article usually cannot recognize the manipulation [20].
As a concept, fake news avoids any agreed definition [1,30], although it goes back centuries, and it became a buzzword particularly after the 2016 presidential elections in the United States [28,29]. Therefore, it currently has a political flavor [1,3,17], but vaccination, nutrition, and the stock market are also topics of fake news [3]. The research on fake news has focused a lot on the "fakeness," i.e., the deception as well as the motivation of the deceivers [29]. Furthermore, the focus of the research has been on the role of social media in facilitating the dissemination of fake news. Yet, it has been shown that social media also serves as a networked public space for opposing parties to define, contest, and strategically leverage the "fake news" label to their respective interests, thus challenging democracy with the weaponized discourse of fake news. Moreover, the term "fake news" serves as a discursive device for ordinary citizens to consolidate group identity in everyday political utterances on Twitter [16].
Misinformation has an adverse societal impact because it accelerates propaganda, creates anxiety, induces fear, and sways public opinion [1]. Unfortunately, its speed, geographical reach, depth, and breadth exceed those of information, yet humans seldom detect misinformation [1,17]. The social function of misinformation, i.e., sense-making, determines that ambiguous or potentially threatening contexts, such as crises, are suitable soil for disseminated false information to breed and flourish, especially as fertilized by the information paucity and anxiety of people [23]. The active role of the audience in fake news is also emphasized. The audience seems to be involved in its co-construction, as the receiver determines the realness or fakeness of the news. When fake news is regarded as fake (i.e., fiction) by the receiver, the deception process has failed. Yet, when the deception process succeeds, the legitimacy of journalism is taken advantage of. The co-creative role of the audience is accentuated in the social media context due to the information exchange that comprises the negotiation and sharing of meanings. In social media, socialness expedites the construction of fake news by mitigating its penetration into social spheres. Social spheres are strengthened by information exchange, which potentially sidelines the quality of information. Therefore, future studies on the legitimizing role of the audience are called for [28].
Social media is regarded as a powerful global medium that influences people's perceptions of the world and their role in it [31]. Social media has raised critical concerns as a context due to its popularity [3] as well as the rapid spread of fake news, thus causing potentially detrimental effects both on individuals and society [17,32]. In particular, the rapid spread of false information on Twitter has caused concern [3,17,18].
On Twitter, the diffusion of falsehood (i.e., retweeting) is significantly farther, faster, deeper, and broader than the truth in all categories of information. This is particularly prevailing in the topic of politics. Two issues were conspicuous concerning fake news on Twitter. Firstly, the novelty that suggests that people share novel information. Secondly, the negative emotions that false stories inspired in replies: fear, disgust, and surprise. The rate of acceleration by robots ("bots") was the same for both true and false news. That implies that the spread of false news is caused by humans [17]. However, social bots (automated accounts impersonating humans) do play a role in magnifying the spread of fake news by orders of magnitude by liking, sharing, and searching for information [3]. The bot population on Twitter has been estimated to range from 9% to 15%.
A recent exploratory study on Finnish Twitter on the scope of politically motivated abusive language focused on the extent to which it is perpetrated by inauthentic accounts, i.e., bots [33]. The results demonstrated that the messaging is mostly with very low levels of both bot and coordinated activity, i.e., it is human-induced. Furthermore, the minority of the identified bots were operating in the Finnish language, while most were using English in an automated or semi-automated manner. The themes that triggered abusive language concerning politics were government corruption and failure, sexism and homophobia, racism and Islamophobia, government handling of COVID-19, and education (in the context of COVID-19). Therefore, it is interesting to explore whether the same or similar themes are likewise related to misinformation on Finnish Twitter in general. A third of Finns tend to trust or totally trust news and information they receive through online social networks and messaging apps, according to the Flash Eurobarometer [19]. Therefore, investigating the spread of misinformation, disinformation, fake news, and propaganda on Finnish Twitter is a relevant research topic, as it has been recognized that the recipient (whether journalist or reader) usually cannot recognize the manipulation [20].

Materials and Methods
The methodology of the study is composed of Twitter data collection and analysis by a combination of the computational social science approach [34] and content analysis with Atlas.ti [35]. The description of the methodology follows the recommendations for Twitter data acquisition and analysis by Dann [21]. The computational social science approach is used to create a word cloud, a list of the most frequently appearing words, and a topic model of the collected tweets. A more detailed word count analysis in Atlas.ti 9 and clustering of keywords and theme words are then conducted to gain a deeper understanding of the discussions.

Methodology Issues Related to the Language Used
The Finnish language is a member of the Finno-Ugric language family and closely related to Estonian. In 1997, Finnish was the native language of 92.7% of Finland's population of 5.15 million people. The Finnish language does not have grammatical gender or articles. The basic principle of word formation in Finnish is the addition of endings (bound morphemes, suffixes) to stems. Finnish verb forms are built up in the same way. Often, the endings are piled up one behind the other rather mechanically. The adding of endings to a stem is a morphological feature of many European languages, but Finnish is nevertheless different from most others in two respects: (1) English nouns have only one "morphologically marked" case, but Finnish has more case endings than is usual in European languages, about 15 cases (nominative, accusative, genitive, partitive, inessive, elative, illative, adessive, ablative, allative, essive, translative, abessive, comitative, and instructive). Finnish endings normally correspond to the prepositions or postpositions in other languages; (2) Finnish sometimes uses endings where Indo-European languages generally have independent words. For example, cases of the Finnish noun "auto" ("car" in English) are formed with the endings auto/ssa ("in the car"), auto/sta ("out of the car"), auto/on ("into the car"), and auto/lla ("by car"), and the plural is formed by attaching the plural ending -i, as in auto/i/ssa ("in the cars"). Finnish possessive suffixes correspond to possessive pronouns, such as -ni ("my"), -si ("your"), and -mme ("our"), for example, auto/ssa/ni ("in my car"), auto/ssa/si ("in your car"), and auto/ssa/mme ("in our car"). As an example, using the verb stem sano- ("say") and the endings -n ("I"), -i (past tense), and -han (for emphasis), the following verb forms can be built: sano/n ("I say"), sano/n/han ("I do say"), sano/i/n ("I said"), and sano/i/n/han ("I did say").
Another set of endings particular to Finnish is that of the enclitic particles, which always occur in the final position after all other endings, used mainly for emphasis: -kin ("too", "also") auto/ssa/si/kin ("in your car too"), -han (for emphasis: "you know, don't you?"), and -ko (English interrogative): On/ko tuo auto? ("Is that a car?"). Moreover, a characteristic feature of Finnish is the wide-ranging use of endings to form new words. For example, kirja ("book") and its derived forms kirj/e ("letter"), kirja/sto ("library"), kirja/llinen ("literary"), kirja/llis/uus ("literature"), kirjo/itta(a) ("(to) write"), and kirjo/itta/ja ("writer"). When adjectives occur as attributes, they agree in number and case with the headword, i.e., they take the same endings-for example, isossa autossa ("in the big car") [36].
There are two ways to form new words from existing words and stems in Finnish: (1) derivation and (2) compounding. In derivation, new words (word stems) are made by adding derivative endings or suffixes to the root or another stem, e.g., to the adjective kaunis: kaunii- ("beautiful"), add the ending -ta to form the derived verb stem kaunis/ta- ("beautify"), the first infinitive kaunis/ta/a. Similarly, to the verb stem aja- ("drive"), add the ending -o to form the derived noun aj/o ("drive", "chase", "hunt") or the ending -ele- to form the verb stem aj/ele- ("drive around"), the first infinitive aj/el/la. The most common type of compound word is made up of two non-derived nouns, and typical compounds are written without spaces, e.g., autokauppa ("car sales"). The first noun in these compounds is often in the genitive, e.g., auto/n/ikkuna ("car window") [36].
In text analytics, many research projects rely on software libraries, such as the Natural Language Toolkit (NLTK), that provide functionalities for language processing, e.g., stemmers and sentence tokenizers [37]. However, as outlined above, Finnish language stemming often leads to word stems, e.g., "kirj", that are not only invalid words but also meaningless. Previous studies in classifying Finnish social media text have discovered that lemmatization functions significantly better than stemming for Finnish [38]. However, the most widely used software packages do not support lemmatization in the Finnish language.
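The difference between stemming and lemmatization described above can be illustrated with a toy sketch. It is not the behavior of NLTK or of any real Finnish tool: the suffix list, the lemma table, and the function names `naive_stem` and `lemmatize` are hand-written assumptions that reuse the example words from this section.

```python
# Toy illustration only: hand-written data, not a real Finnish stemmer or lemmatizer.
SUFFIXES = ["ssa", "sta", "lla", "sto", "e"]  # hypothetical suffix list


def naive_stem(word, suffixes=SUFFIXES, min_stem=3):
    """Strip the longest matching suffix, keeping at least min_stem characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


# Hypothetical lemma table: inflected form -> dictionary form.
LEMMAS = {"autossa": "auto", "autosta": "auto", "autolla": "auto"}


def lemmatize(word, lemmas=LEMMAS):
    """Look up the dictionary form; unknown words are returned unchanged."""
    return lemmas.get(word, word)


for word in ["autossa", "kirje", "kirjasto"]:
    print(word, naive_stem(word), lemmatize(word))
```

In this sketch the suffix stripper reduces kirje ("letter") to the non-word "kirj", whereas the lookup keeps kirje intact and maps the inflected forms of auto to one dictionary form. A real lemmatizer would need a full morphological lexicon rather than a three-entry table.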

Data Collection
Data were collected with the Twitter Streaming API, which returned tweets matching a list of keywords specified as disinformaatio, huhu, misinformaatio, propaganda, uutisankka, valeuutinen (in English: disinformation, rumor, misinformation, propaganda, hoax, fake news). As a boundary, a filter was specified to include only tweets in the Finnish language. Tweets containing at least one of the keywords were stored in a database. The collection of tweets was automated using the Tweepy Python library [39], which provides access to the API and the real-time stream of tweets. The database chosen to store the tweets was MongoDB, which is a NoSQL database capable of handling unstructured data. Custom code was implemented using Python to run in a virtual machine [40]. Tweets were collected from 23 June 2020 to 23 March 2021. A total of 43,890 tweets were collected that contained at least one of the defined keywords.
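The keyword and language boundary described above can be sketched without any API access. The following is a minimal illustration, not the custom collection code of the study: it assumes each tweet is a dictionary with `text` and `lang` keys, uses simple substring matching (which is not necessarily how the Streaming API matches keywords), and stores matches in a plain list standing in for MongoDB. The names `matches` and `collect` are hypothetical.

```python
KEYWORDS = ["disinformaatio", "huhu", "misinformaatio",
            "propaganda", "uutisankka", "valeuutinen"]


def matches(tweet, keywords=KEYWORDS, lang="fi"):
    """Return True for tweets in the given language that contain a keyword.

    Substring matching is an assumption of this sketch; it also catches
    inflected forms such as 'propagandaa'.
    """
    if tweet.get("lang") != lang:
        return False
    text = tweet.get("text", "").lower()
    return any(keyword in text for keyword in keywords)


def collect(stream, store):
    """Append every matching tweet from an iterable of tweets to the store."""
    for tweet in stream:
        if matches(tweet):
            store.append(tweet)
    return store


sample_stream = [
    {"lang": "fi", "text": "Tämä on propagandaa"},
    {"lang": "en", "text": "Pure propaganda"},
    {"lang": "fi", "text": "Hyvää huomenta"},
]
stored = collect(sample_stream, [])
print(len(stored))
```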
Development has been done with the Twitter API version 1, but the upcoming API version 2 makes it possible to replicate the data collection with new endpoints, which allows the user to search the complete archive [41]. With this method, it is possible to search tweets starting from 2006.
Twitter data were collected in full format into MongoDB. The total field count is 145, and only the most relevant fields are used for further analysis. The fields selected for further analysis are documented in Table 1.

Data Processing
Twitter data processing started with reading an Excel spreadsheet, extracted from MongoDB, into a Python Pandas data frame. Data were formatted to have only one text field for text content. This was done because the Twitter API provides two text fields for text content (see Table 1). The default text field follows the rule that if the tweet text is longer than 140 characters, the text is truncated. Another text field provided by the API contains the extended tweet full text up to 280 characters [42]. In the case of truncated text, a custom script was developed to combine the default tweet text field and the extended tweet text field into one field. Then the combined text field was processed with the Python regular expression operations listed in Table 2 to make the text as general as possible.

Table 2 (recovered rows):
2. Make all letters lower case. All letters are converted into lower case because the analysis is case-sensitive.
3. Remove punctuation, digits, and special markers. All punctuation and digits are removed to reduce unstructured text; this does not usually change the meaning of the text. Special markers are also removed, usually @, which is commonly used when a user is mentioned.
4. Remove white spaces. All unnecessary white spaces are removed.
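The cleaning steps listed above can be sketched with the standard `re` module. This is an illustration under assumptions, not the script used in the study: the field names `text`, `truncated`, and `extended_tweet` / `full_text` follow the Twitter API v1.1 tweet payload and may differ from the fields in Table 1, and the function names are hypothetical. URL handling is not shown because it is not among the steps listed here.

```python
import re


def full_text(tweet):
    """Return one text field: the extended text when the default is truncated."""
    if tweet.get("truncated") and "extended_tweet" in tweet:
        return tweet["extended_tweet"]["full_text"]
    return tweet["text"]


def clean(text):
    """Lower-case, drop punctuation, digits and markers, collapse white space."""
    text = text.lower()
    # Replace anything that is not a letter or white space (including @, #,
    # digits and underscores) with a space; \w keeps letters such as ä and ö.
    text = re.sub(r"[^\w\s]|[\d_]", " ", text)
    # Collapse runs of white space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


tweet = {
    "text": "@Yle  Propagandaa...",
    "truncated": True,
    "extended_tweet": {"full_text": "@Yle  Propagandaa, Venäjän 2021!"},
}
print(clean(full_text(tweet)))
```

Replacing removed characters with a space rather than deleting them avoids gluing neighboring words together, at the cost of needing the final white-space pass.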

Word Count Analysis in Atlas.ti 9
The word count analysis to detect the manifestation of disinformaatio, misinformaatio, propaganda, uutisankka, valeuutinen (disinformation, misinformation, propaganda, hoax, and fake news) and the clustering of themes related to the keywords were carried out in two phases (Table 3). The first phase was processed using Atlas.ti research analysis software version 9. The process began with importing the data from an Excel spreadsheet that was extracted from MongoDB and preprocessed to include only the most relevant fields. Next, the general terms in the Finnish stoplist and general terms such as "https:\\" were excluded. After that, the list of 47,013 words was exported to Excel, where the two-phased exclusions and inclusion were executed, resulting in 602 words that had a minimum of 50 manifestations in tweets. In the end, these words were clustered into 88 clusters, which varied from 5706 manifestations of propaganda to 50 manifestations of aivopesu ("brain wash"), kaksinaismoraali ("double standards"), and vihreäpropaganda ("Green propaganda").

Table 3. The process of word count analysis in Atlas.ti and the spreadsheet.
Action | Inclusion/Exclusion Criteria | Quantity of Data
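The exclusion and inclusion logic of the word count can be approximated in a few lines. This sketch does not reproduce Atlas.ti or the spreadsheet steps: the stoplist excerpt, the sample tweets, and the function name `word_counts` are assumptions, and only the threshold of 50 manifestations is taken from the text.

```python
from collections import Counter

# Hypothetical excerpt of a Finnish stoplist plus a general term.
STOPLIST = {"ja", "on", "ei", "että", "https"}


def word_counts(cleaned_tweets, stoplist=STOPLIST, min_count=50):
    """Count word manifestations, excluding stoplist words and rare words."""
    counts = Counter()
    for tweet in cleaned_tweets:
        counts.update(word for word in tweet.split() if word not in stoplist)
    return {word: n for word, n in counts.items() if n >= min_count}


sample = ["propaganda on valhe", "propaganda ja aivopesu", "valhe"]
# A low threshold is used here only because the sample is tiny.
print(word_counts(sample, min_count=2))
```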

Results
The results of the study were created using custom-developed Python scripts and Atlas.ti 9 software. The results generated with developed Python scripts include a word cloud of tweets, the top 30 most frequent words in tweets, and a topic model of tweets. Atlas.ti was used to compute a more extensive word count and to export discovered main clusters, which were further cleaned and processed in an Excel spreadsheet. The combination of computational analysis with Python and qualitative data analysis software was performed to gain a deeper understanding of the discussions related to misinformation, disinformation, fake news, and propaganda.

Word Cloud of Tweets
A word cloud of tweets is used to illustrate and describe the data collected for the study (Figure 1). A word cloud contains the data in raw format without stemming, lemmatization, or the use of other natural language processing techniques.
As seen from the word cloud, propaganda, valeuutinen (fake news), disinformaatio (disinformation), yleuutiset (Yleisradio, "Yle", is the Finnish Broadcasting Company; uutiset means news), Suomen (Finland's), suomessa (in Finland), mariaohisalo (Maria Ohisalo is the Minister of the Interior of Finland), perussuomalaiset (Finns Party), persut (slang for Finns Party), Russia, venäjän (Russia's), venäjä (Russia), and the Finnish Prime Minister Sanna Marin spelled in various ways (e.g., marinsanna, marinin, marin) are among the biggest terms. From the processed tweet text (according to Table 2) of the word cloud, the top 30 most frequently used words are listed in Table 4.
The top 30 most frequent words in tweets point out that stemming would have worked only for five of the words, i.e., yle (stem), yleuutiset, and ylen, as well as propaganda (stem) and propagandaa. None of the top 30 most frequent words gives any indication of what the conversations were about. Instead, they describe the organization or context related to the conversations, e.g., public broadcasting company, Russia, government, etc.
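The observation about stemming can be checked mechanically on a frequency list. The sketch below is an assumption-laden illustration rather than the scripts used in the study: `top_words` ranks words with `collections.Counter`, and `prefix_groups` (a hypothetical helper) reports which listed words begin with another listed word, as a rough proxy for forms that suffix stripping could merge.

```python
from collections import Counter


def top_words(cleaned_tweets, n=30):
    """Return the n most frequent words as (word, count) pairs."""
    counts = Counter(word for tweet in cleaned_tweets for word in tweet.split())
    return counts.most_common(n)


def prefix_groups(words):
    """Map each word to the longer listed words that start with it."""
    groups = {}
    for stem in words:
        longer = [w for w in words if w != stem and w.startswith(stem)]
        if longer:
            groups[stem] = longer
    return groups


# A few of the frequent words named in the text; the order is illustrative.
frequent = ["propaganda", "valeuutinen", "yle", "yleuutiset",
            "ylen", "propagandaa", "suomen", "venäjä"]
print(prefix_groups(frequent))
```

On this list only two stems gather other forms (yle with yleuutiset and ylen, propaganda with propagandaa), which matches the five words mentioned above.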

Topic Model of Tweets
Topic modeling with Latent Dirichlet Allocation (LDA) was performed next to gain an overview of topics and the similarity between topics. The topic model was trained with a total of 15 topics. The top 10 words are listed in Table 5. Data collected using the "rumor" keyword were excluded from the topic model as it referred mostly to unrelated discussions in foreign languages.

Table 5 (top 10 words per topic):
Topic 1: propaganda, mariaohisalo, somessa, jaa, valeuutisia, linkkejä, sitaatteja, voida, alkuperää, todistaa
Topic 2: propaganda, valeuutinen, disinformaatio, mariaohisalo, leviää, liittyen, sosiaalisi, sisäministeri, opparviainen, propagandaa
Topic 3: propaganda, valeuutinen, disinformaatio, leviää, yleuutiset, media, ylen, mariaohisalo, opparviainen, liittyen
Topic 4: propaganda, disinformaatio, valeuutinen, marinsanna, hai, lapset, govt, suomen, pitäisi, yleisradio
Topic 5: propaganda, disinformaatio, valeuutinen, tuli, vihapuhe, mm, pari, yle, suomen, kuntavaalit
Topic 6: propaganda, disinformaatio, valeuutinen, yleuutiset, russia, toimii, korona, lapset, venäjän, suomessa
Topic 7: propaganda, valeuutinen, leviää, persut, mariaohisalo, disinformaatio, liittyen, opparviainen, sisäministeri, sosiaalisi
Topic 8: propaganda, disinformaatio, vaarallista, päivää, levinnyt, syystä, pari, valheellinen, liandersson, kuvamanipulaa
Topic 9: propaganda, valeuutinen, lakia, sensuuripykälä, mattimuukkonen, vaiennus, mieltä, sanoin, paikassa, rikotaan
Topic 10: propaganda, disinformaatio, yle, mariaohisalo, kansainvälistä, ritken, ajoista, vietetään, synkistä, kansanmurhien
Topic 11: propaganda, valeuutinen, russia, suomen, propagandaa, leviää, disinformaatio, mariaohisalo, the, opparviainen
Topic 12: propaganda, astatenhunen, susi, disinformaatio, valeuutinen, suomen, syy, vihreät, mm, amp
Topic 13: propaganda, valeuutinen, hai, govt, disinformaatio, sanoin, mattimuukkonen, paikassa, vaiennus, lakia
Topic 14: propaganda, valeuutinen, leviää, persut, kuvaa, govt, hai, disinformaatio, somealustoilla, kannattajia
Topic 15: propaganda, valeuutinen, kysymyksiä, mariaohisalo, sanna, leviää, keronen, nuorten, hallituksen, suulla

Some of the discussions can be identified based on the topic model centered around people, e.g., Minister of the Interior (topics 1-3), political parties, e.g., Finns Party (topic 14), elections (topic 5), Russian children's rights in Finland (topic 6), censorship (topic 9), and wolves and the Green League (topic 12). However, for many of the topics, a more detailed investigation of the tweets would be needed to determine the connections between the words on the topic model. Figure 2 presents the results of the topic model that was manually modified by combining synonymous terms. For instance, the Minister of the Interior was referred to both by title and by name (Twitter account) in the original data, and several terms indicated social media, such as Facebook, social media, and social media platforms; these were categorized under one term. Furthermore, to maintain anonymity, some terms that referred to individual persons' accounts were replaced by a term indicating their profile on Twitter, e.g., Politician * and Student **.
Obvious terms, such as fake news, propaganda, and disinformation, were found among the most salient terms in the topic model (Figure 2). However, misinformation was not included in the Top-10 or Top-30 most salient terms. National politics was a common topic, with frequent references to the Minister of the Interior, state secretary of the Ministry of the Interior, and politically active persons. Russia, as a country, emerged among the most salient terms as well. Surprisingly, an individual student was also among the most salient terms. Overall, the automated topic model functioned rather poorly for the Finnish language, which necessitated the manual combination of terms and cleaning the automatically generated figures.
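To make the topic modeling step more concrete, the following is a small pure-Python LDA fitted with collapsed Gibbs sampling. It is a stand-in for illustration only: the text does not state which LDA implementation or hyperparameters were used, in practice a library such as gensim or scikit-learn would typically be chosen, and the values of `alpha`, `beta`, `n_iter`, and the sample documents here are assumptions.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-topic word counts."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]  # tokens of doc d in topic k
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assignments = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        topics = []
        for word in doc:
            k = rng.randrange(n_topics)
            topics.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return topic_word


def top_terms(topic_word, n=10):
    """Return the n highest-count words of every topic."""
    return [sorted(counts, key=counts.get, reverse=True)[:n] for counts in topic_word]


docs = [text.split() for text in [
    "propaganda venäjä russia venäjän",
    "valeuutinen yle yleuutiset media",
    "propaganda russia venäjä",
    "valeuutinen media yle ylen",
]]
model = lda_gibbs(docs, n_topics=2)
print(top_terms(model, n=3))
```

Because inflected forms such as venäjä and venäjän are separate vocabulary entries, the sampler treats them as unrelated words, which is one reason manual merging of terms was needed after the automated run.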

Word Count Analysis and Clustering
The results of the word count analysis, carried out in Atlas.ti and clustered in an Excel spreadsheet, are presented in Tables 6 and 7. Table 6 presents the 22 main clusters out of 88. First, the five keywords are presented, together with the cluster of related words, such as lie, waddle, truth, and fact. Second, the 16 main theme word clusters, which include more than 200 manifestations in tweets, are presented.
The "Media" and "Social media" clusters included mentions of various media and social media producers: newspapers and television channels, certain journalists, as well as general terms regarding news, such as news (Fin. uutinen). The "Finland" cluster included words referring to the country of Finland, as well as to Finns and the Finnish language. Similarly, the "Country and world" cluster contains words that refer to the country (Fin. maa) and the world (Fin. maailma). The "Animals" cluster comprises various animals. The "Children and young people" cluster consists of words referring to child or children (Fin. lapsi, lapset) or young people (Fin. nuoret). The "Police" and "Researcher and Research" clusters include words referring to those occupations. The "Opposition" (Fin. vastaisuus) and "Hate speech" clusters included words referring to those terms.
In Table 7, three of the main clusters are examined in more detail. The results comprise the most manifested words in the clusters "Politics," "Politicians," and "Foreign countries". The "Politics" cluster included general terms of politics, such as politics (Fin. politiikka) and government. Furthermore, there were words referring to two parties (the Finns Party and the Greens), to the right-wing and left-wing ends of the political spectrum, and even to the political ideology of communism. Elections were also among the most manifested words.
The two most manifested politicians in the "Politicians" cluster were the Prime Minister and the Minister of the Interior, each referred to with various terms. Together, their manifestations accounted for 75% of the total number in the cluster. In the "Foreign countries" cluster, Russia was the most manifested. However, the EU, the USA, China, Sweden, and even the Soviet Union were also manifested in the tweets. Presidents Putin and Trump were also frequently manifested in Finnish tweets regarding misinformation.
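The clustering step described above can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure used in the study, which was carried out in Atlas.ti and a spreadsheet. The word counts, the word-to-cluster assignments, and the function names `cluster_totals` and `share_within_cluster` are all hypothetical.

```python
from collections import Counter

# Toy word counts (invented numbers, NOT the study's data).
word_counts = Counter({
    "pääministeri": 30,   # prime minister
    "sisäministeri": 45,  # minister of the interior
    "presidentti": 25,    # president
    "venäjä": 60,         # Russia
    "ruotsi": 15,         # Sweden
    "kiina": 25,          # China
})

# Manual word-to-cluster assignment, as one might keep in a spreadsheet column.
cluster_of = {
    "pääministeri": "Politicians",
    "sisäministeri": "Politicians",
    "presidentti": "Politicians",
    "venäjä": "Foreign countries",
    "ruotsi": "Foreign countries",
    "kiina": "Foreign countries",
}

def cluster_totals(word_counts, cluster_of):
    """Sum the manifestations of all words assigned to each cluster."""
    totals = Counter()
    for word, n in word_counts.items():
        totals[cluster_of.get(word, "Unclustered")] += n
    return totals

def share_within_cluster(words, cluster, word_counts, cluster_of):
    """Fraction of a cluster's manifestations that come from the given words."""
    total = sum(n for w, n in word_counts.items() if cluster_of.get(w) == cluster)
    part = sum(word_counts[w] for w in words if cluster_of.get(w) == cluster)
    return part / total

totals = cluster_totals(word_counts, cluster_of)
share = share_within_cluster(
    ["pääministeri", "sisäministeri"], "Politicians", word_counts, cluster_of
)
print(totals)
print(f"{share:.0%}")
```

With these toy counts, the two ministers account for three quarters of the "Politicians" cluster. This only shows how such a share is computed and does not reproduce the reported result.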

Discussion
The rapid spread of false information on Twitter has caused concern. Along with social media applications, the methods used to study and analyze social media data, such as Twitter data, are evolving, too. Functional analysis methods enable the investigation of discussions on misinformation and its related concepts, such as disinformation, fake news, and propaganda, and of the themes linked with those concepts on Twitter. By answering the research question "What are the functional methods to analyze Finnish Twitter data to discover discussion themes that are linked with misinformation and concepts related to it?", our article makes a two-fold contribution. First, it contributes to the applicability of three analysis methods on Finnish Twitter data. Second, it contributes to the discovered main themes that are connected to the misinformation-related concepts in Finnish Twitter discussions.
Regarding the contribution on the applicability of three analysis methods to Twitter data, the article focused on describing and comparing methods to detect and analyze the text content of tweets in Finnish Twitter conversations that include the concepts of misinformation, disinformation, fake news, hoax, and propaganda, and the themes linked to them. The investigation was limited to the text content of tweets in order to uncover the inherent challenges of text mining Finnish tweets and to understand what kind of insights could be drawn from the text alone. The grammatical characteristics of the Finnish language, particularly word formation by adding endings to word stems, as well as word formation via derivation and compounding, challenge conventional data analysis methods as well as machine learning.
Our results were obtained using three methods: (1) word cloud analysis, (2) topic model of tweets, and (3) word count analysis and clustering. Of these methods, word cloud clustering gives the least specific result. Word cloud clustering highlights the most utilized words visually and thus can easily present the relative frequencies of various words. However, it treats variant forms of a word as separate words, i.e., it does not lemmatize variants of the same stem. Thus, for Finnish text, word cloud clustering cannot be expected to produce fully accurate results. Rather, it is a good method for finding word stems for further analysis. Compared to word cloud analysis, the second analysis method, the topic model of tweets, gives a more distinct view of the most utilized words in Twitter data. However, the topic model does not lemmatize variants of the same stem either. Therefore, its results are imprecise for a language like Finnish. As with word cloud clustering, topic modeling is feasible for obtaining an overview of the themes of Twitter discussion and for detecting the word stems of the most used words. The third method, word count analysis followed by clustering in a spreadsheet, produces very detailed word counts, after which the words are clustered into themes. Manual detection of stem variants and lemmatization is a labor-intensive phase of the research, yet it brings value if accuracy and the inclusion of as many words as possible are required. Another possible approach is to discover the main word stems with word cloud analysis or, better yet, with the topic model of tweets, and then to code the documents automatically in Atlas.ti with those word stems. However, with the Finnish language, there is still the risk that irregularly declined words or conjugated verbs will not be included in the code list.
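The effect of inflection on word counts, and the risk attached to stem-based coding, can be illustrated with a short Python sketch. It is an illustrative approximation under stated assumptions, not the procedure used in this study: the token list is a toy example, the stem list is hand-picked, and simple prefix matching stands in for the automatic coding in Atlas.ti. The function name `count_by_stem` is hypothetical.

```python
from collections import Counter

# Toy tokens standing in for tokenized, lower-cased tweets (hypothetical data).
tokens = [
    "propaganda", "propagandaa", "propagandan",    # propaganda
    "valeuutinen", "valeuutisia", "valeuutisten",  # fake news
    "lapsi", "lapset", "lasta", "lasten",          # child
]

# Hand-picked stems (an assumption), e.g., spotted in a word cloud or topic model.
stems = ["propaganda", "valeuuti", "laps"]

def count_by_stem(tokens, stems):
    """Count tokens per stem via prefix matching; collect tokens no stem covers."""
    counts = Counter()
    unmatched = []
    for token in tokens:
        for stem in stems:
            if token.startswith(stem):
                counts[stem] += 1
                break
        else:
            unmatched.append(token)
    return counts, unmatched

surface_counts = Counter(tokens)  # what a plain word count or word cloud sees
stem_counts, unmatched = count_by_stem(tokens, stems)

print(surface_counts.most_common(3))
print(stem_counts)
print(unmatched)
```

In this toy example, every surface form occurs only once, whereas grouping by stem collects three manifestations each for "propaganda" and "valeuuti". The forms "lasta" and "lasten" of "lapsi" are missed by the stem "laps". This is the kind of irregular inflection that stem-based coding can overlook.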
In addition, natural language processing in Finnish is very limited. Finnish lemmatization could be carried out in Python with only a few libraries, such as Voikko and FINNPOS, yet their documentation is almost non-existent [43]. For further research, the above-mentioned coding in Atlas.ti could be suggested. Another avenue would be to consider machine learning methods to automate the labor-intensive phase of thematic analysis. Compared to the word cloud analysis and the topic model of tweets, however, the combination of word count analysis and clustering illustrates in more detail which themes (political themes, for example) Twitter users combine with misinformation-related concepts. Nevertheless, word count analysis and clustering do not reveal which themes are linked with each other concerning misinformation, for example, which themes are linked with the Prime Minister. This would require coding the data with word stems and using manually coded data in combination with topic modeling.
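Once tweets have been coded with themes, links between themes can be examined by counting how often two codes occur in the same tweet. The sketch below shows one possible way to do this. It is not part of the present study. The coded tweets are invented, and the function names `theme_cooccurrence` and `linked_with` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded tweets: each tweet is the set of theme codes assigned to it.
coded_tweets = [
    {"Propaganda", "Politicians", "Media"},
    {"Fake news", "Media"},
    {"Propaganda", "Politicians"},
    {"Disinformation", "Foreign countries", "Politicians"},
]

def theme_cooccurrence(coded_tweets):
    """Count, for each unordered pair of themes, the tweets containing both."""
    pairs = Counter()
    for codes in coded_tweets:
        # Sorting makes each pair appear in one canonical (alphabetical) order.
        for a, b in combinations(sorted(codes), 2):
            pairs[(a, b)] += 1
    return pairs

def linked_with(theme, pairs):
    """Return the themes co-occurring with `theme` and their tweet counts."""
    linked = Counter()
    for (a, b), n in pairs.items():
        if a == theme:
            linked[b] += n
        elif b == theme:
            linked[a] += n
    return linked

pairs = theme_cooccurrence(coded_tweets)
print(linked_with("Politicians", pairs).most_common())
```

Such a co-occurrence table only indicates that two themes appear in the same tweet. Interpreting the nature of the link would still require reading the tweets.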
Regarding the contribution to the discovered main themes connected to the misinformation-related concepts in Finnish Twitter discussions, our results revealed that "propaganda" is the most utilized misinformation-related word. Notably, this result was detectable with all three analysis methods. In addition, fake news (Fin. valeuutinen) and disinformation (Fin. disinformaatio) were well-represented misinformation-related words. Besides the keywords, the most used words in the Twitter data were generally neutral, i.e., their information value is not high without knowing the context. These are media-related words, such as news. Yet, they emphasize the communication style typical of microblogging on Twitter: tweeting, re-tweeting, and commenting on news and stories. Furthermore, other general words, such as politics, country, and world, are neutral words lacking context. Yet, they portray the landscape of the discussion content on Twitter. Finally, our results, particularly those of the cluster analysis, revealed that politics and politicians are the main topics of discussion with regard to misinformation, disinformation, fake news, hoax, and propaganda on Finnish Twitter. This is in line with prior studies on fake news [29], as well as studies on the spread of false news on English Twitter [17].
In addition to its textual dimension, disinformation has an emotional dimension. However, how and why disinformation flows, for example, regarding political polarization and immigration, particularly on social media such as Twitter, are questions that research on methods for analyzing the text content of tweets cannot answer. Therefore, we suggest further descriptive or explanatory research with qualitative analysis methods on Twitter interactions and Twitter users. Patterns of how disinformation flows from one user to another can be revealed, e.g., by analysis of information cascades on Twitter [44]. However, the reasons why people disseminate misinformation inadvertently, or disinformation intentionally, on social media must be sought in human motivation.
A limitation of this study is that it only investigated the text content of tweets. Moreover, the data were limited to Twitter data in the Finnish language, and the study was limited to the functionality of three methods for analyzing Twitter data. Other analysis methods, including ones applied to Twitter data other than the text content of tweets, could also prove functional for tweets in Finnish or in other languages.

Conclusions
With the three methods we utilized in the analysis of Finnish Twitter data, namely (1) word cloud analysis, (2) topic model of tweets, and (3) word count analysis and clustering, we detected topics that are connected to the misinformation-related concepts of disinformation, fake news, and propaganda. We identified the advantages and disadvantages of each method, as well as differences in the accuracy of their results. Furthermore, we noticed that each method has critical limitations, especially the automated analysis methods when processing the Finnish language; yet when combined, they bring value to the analysis. However, further research is needed on the applicability of analysis methods to Twitter data in Finnish due to the particularities of its word formation.
Politics, both internal and external, are present in the Twitter discussions in connection with misinformation concepts. Thus, our results support the previous findings of studies on fake news [29], as well as studies on the spreading of false news on English Twitter [17].