Conspiracy Thinking, Online Misinformation, and Hate: Insights from an Italian News Story Using Topic Modeling Techniques

Abstract: This study delved into the realm of conspiratorial thinking and misinformation on Twitter, examining the case of Silvia Romano, an Italian aid worker who faced online conspiratorial attacks before and after her release. With the increasing prevalence of conspiratorial narratives on social media, this research investigated the interplay between conspiratorial thinking and the dissemination of misinformation. Two datasets comprising Italian tweets were analyzed, aiming to uncover primary topics, detect instances of conspiratorial thinking, explore broader emerging topics beyond Silvia Romano’s case, and examine whether authors of conspiratorial narratives also engage in spreading misinformation. Twitter served as a critical platform for this study, reflecting its evolving role in news dissemination and social networking. The research employed topic modeling techniques and coherence scores to achieve these objectives, addressing challenges posed by the inherent ambiguities in defining conspiratorial narratives. The findings contribute to a deeper understanding of the complex dynamics of conspiratorial thinking and misinformation in the digital age.


Introduction
Over the past decade, individuals who have fallen victim to the online realm of conspiratorial thinking have encountered challenges in gaining public acknowledgment of their experiences. This challenge arises from a lack of empirical substantiation for their claims and a legal framework ill-prepared to address their concerns. Unfortunately, their appeals for recognition have often gone unanswered, exacerbating their predicaments. This unintended consequence has been driven by the proliferation of social media, which has significantly expanded people's freedom of expression. This newfound liberty enables them to share their thoughts and beliefs on a multitude of topics Lewandowsky et al. (2013); Marwick and Lewis (2017); Sunstein (2018).
However, this enhanced freedom also has a downside: some individuals misuse it to propagate conspiratorial narratives, resulting in an increasing volume of such content that more people are exposed to Van Prooijen et al. (2018). Consequently, both online platforms and governments have been exploring potential solutions and countermeasures to address this issue.
Apart from its impact on online discourse, conspiratorial thinking can also have significant consequences in real-life situations, with the boundaries between online and offline manifestations often blurring. Researchers have hypothesized that online conspiratorial narratives may be part of a wider process of harm that can begin on social media and then move to the real/offline world.
Nevertheless, the identification and mitigation of conspiratorial thinking face challenges due to the inherent ambiguities in its definition. Conspiratorial narratives may involve calls for harmful actions against individuals or groups or communication containing inappropriate or offensive language.
This study examined two datasets comprising 35,055 and 1,815,602 Italian tweets, respectively. The smaller dataset focused on tweets related to the release of Silvia Romano, a Milanese woman who volunteered in Kenya, was kidnapped in November 2018, and faced numerous conspiratorial attacks on social media before and after her release. The larger dataset comprised tweets from all users who posted about Silvia Romano's release and their subsequent activity. The aim was to analyze users engaging in conspiratorial thinking in the first dataset (related to Silvia Romano's release) and explore any potential connections with the consumption of misinformation in the second dataset (covering all tweets by these users).
Twitter played a significant role in this research, as it has evolved into a platform combining features of news media and social networking Bruns and Moe (2014). Its importance is underscored by extensive literature demonstrating its utility in various fields, from detecting natural disasters and terrorist threats to studying opinion formation on political issues and monitoring public health Buntain et al. (2016); De Santis et al. (2020); Iacomini and Vellucci (2021); Jungherr (2016); Leitner et al. (2021); Mendoza et al. (2018); Pierri et al. (2020). The global pandemic of coronavirus disease 2019 (COVID-19) has shown that these latter two themes are intertwined. As for Italy, on 11 March 2020, the then Italian Prime Minister Giuseppe Conte ordered a set of severe "social distancing" measures, stopping nearly all commercial activity and confining the Italian population at home for safety reasons. As happened in all the countries hit hardest by the COVID-19 pandemic, where social distancing measures were more stringent, people spent an unusual amount of time on social media to stay connected and also to acquire important information mus (n.d.); vic (n.d.); Wiederhold (2020). This tsunami of information also contained misinformation, rumors, and fake news, which spread rapidly through social media platforms, prompting the WHO to declare a serious infodemic problem Zarocostas (2020). "We're not just fighting an epidemic; we're fighting an infodemic", declared WHO Director-General Tedros Adhanom Ghebreyesus at the Munich Security Conference on 15 February 2020.
This paper utilized two topic modeling techniques, the biterm topic model (BTM) Yan et al. (2013) and latent Dirichlet allocation (LDA) Blei et al. (2003), to analyze the data. Automatic topic extraction was supplemented by manual analysis aided by coherence scores, which gauged the human interpretability of topics Mimno et al. (2011). To calculate such scores, it is advisable to employ an external corpus to mitigate the effects of unusual term statistics present in tweet collections Boyd-Graber et al. (2014). Common choices for external corpora include the entire collection of English Wikipedia articles and the Associated Press data from the First Text Retrieval Conference (TREC-1) held in November 1992 Harman (1993). However, due to the lack of an Italian corpus suitable for this purpose, an ad hoc corpus was constructed using documents related to kidnapping and terrorism.
The aforementioned discussion led to the following research questions (RQs):
RQ1. What are the primary topics and themes discussed in tweets related to Silvia Romano's release?
RQ2. Is it possible to differentiate instances of conspiratorial thinking among these discussions?
RQ3. What topics emerge from the analysis of tweets, beyond those discussing Silvia Romano, by users who have engaged in conspiratorial thinking regarding the Italian aid worker?
RQ4. Do the authors of conspiratorial narratives also engage in disseminating misinformation on Twitter?

The Datasets
For the current study, two datasets were built through a suitable scraper connected to the standard Twitter Streaming API, accessible upon opening a Twitter developer account. They consisted of, respectively, 35,055 and 1,815,602 tweets. Tweets were retrieved using the R package rtweet Kearney (2019). The first dataset, denoted by $D_1$, was built on all the tweets in Italian containing the keywords "Silvia Romano", for a period that spanned from 2020-05-08 at 05:00:00 UTC to 2020-05-16 at 07:30:05 UTC. This was the week of the release of Silvia Romano. The dataset $D_1$ consisted of 35,055 tweets, posted by 18,235 users. Then, the activities of all these users were monitored, and all their tweets were collected during subsequent weeks. This produced the second dataset of this paper, denoted by $D_2$. The inspection interval for $D_2$ ran from 2020-05-08 01:56:24 UTC to 2020-06-02 12:48:18 UTC. The dataset $D_2$ consisted of 1,815,602 tweets. The tweets were filtered for the Italian language using the specific filtering function available in the Twitter Streaming API.
The collected posts did not include retweets, which were excluded from the search upstream.

Methodology
This section describes the tools employed for determining the main topics emerging from tweets about Silvia Romano's release. These techniques, borrowed from the field of text mining, include frequency and correlation analysis as well as topic modeling.

Data Preprocessing
In the current study, the adoption of several preprocessing steps was motivated by the aim of reducing the noise present in the data. The adopted preprocessing steps were the following:
The part-of-speech tagging step was carried out with the UDPipe NLP toolkit Buchholz and Marsi (2006); Straka and Straková (2017). See also the website of Universal Dependencies CoN (n.d.), as well as Straka and Straková (2017) and the references therein for additional information.

Topic Modeling
The first step of the topic modeling approach adopted in this paper was the application of the biterm topic model (BTM) Yan et al. (2013) to the dataset $D_1$. A BTM learns topics over short texts, like tweets, by directly modeling the generation of all the biterms (i.e., unordered pairs of words co-occurring in the same context) in the whole corpus. In brief, the whole corpus is a mixture of topics, where each biterm is drawn from a specific topic independently. A detailed description of the algorithm can be found in Ref. Yan et al. (2013). To find the topics emerging in collection $D_1$, the R package provided by Ref. Wijffels (2020) was used.
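To make the notion of a biterm concrete, the following minimal Python sketch extracts the unordered word pairs from short tokenized texts and counts them over a toy corpus. It is an illustration only: the function name `biterms` and the sample tweets are hypothetical, the whole tweet is assumed to be the co-occurrence context, and the study itself relied on the R package cited above rather than on this code.

```python
from collections import Counter
from itertools import combinations


def biterms(tokens):
    """Return all unordered word pairs (biterms) found in one short text."""
    # Sorting each pair makes ("a", "b") and ("b", "a") the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Hypothetical, already preprocessed tweets (lists of tokens).
tweets = [
    ["silvia", "romano", "libera"],
    ["silvia", "romano", "riscatto"],
]

# Corpus-level biterm counts: the co-occurrence pattern a BTM models directly.
biterm_counts = Counter(b for tweet in tweets for b in biterms(tweet))
```

A tweet with m tokens yields m(m-1)/2 biterms, which is why very short texts still contribute usable co-occurrence information at the corpus level.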
The second step of the present topic modeling approach was to resort to a latent Dirichlet allocation (LDA) technique Jónsson and Stolee (2015); Silge and Robinson (2017) for the dataset $D_2$, where tweets were aggregated by author to overcome the poor performance on short documents shown by LDA. The authors of Yan et al. (2013) denoted this variation of the LDA model as LDA-U. The R library provided by Ref. Grün and Hornik (2011), which implements the variational expectation-maximization (VEM) algorithm Blei et al. (2003) for the LDA model, was used to gather experimental data and compared to other models.
Why use LDA for $D_2$? LDA is a topic extraction technique that treats each document as a probability distribution over topics and each topic as a probability distribution over words. LDA shows poor performance on shorter documents Jónsson and Stolee (2015); Yan et al. (2013). However, this phase aimed to treat Twitter users (who tweeted about Silvia Romano) as probability distributions over topics. This is why a user-based aggregation of tweets was performed. The aggregation was carried out before applying the topic modeling technique, and therefore the application of LDA did not present technical contraindications due to the length of the documents.
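The user-based aggregation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: tweets are represented as hypothetical `(user_id, text)` pairs, and the helper name `aggregate_by_user` is not taken from the paper.

```python
from collections import defaultdict


def aggregate_by_user(tweets):
    """Concatenate each user's tweets into one pseudo-document (LDA-U style)."""
    docs = defaultdict(list)
    for user_id, text in tweets:
        docs[user_id].append(text)
    # One long document per author, in posting order.
    return {user: " ".join(texts) for user, texts in docs.items()}


# Hypothetical sample data.
tweets = [
    ("u1", "primo tweet"),
    ("u2", "altro tweet"),
    ("u1", "secondo tweet"),
]
user_docs = aggregate_by_user(tweets)
```

Each value in `user_docs` is then long enough to be passed to a standard LDA implementation as a single document.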

Coherence in Topic Models
The coherence score is an automated metric proposed by Mimno et al. (2011) for topic quality evaluation. In each model, it is possible to distinguish between topics that seem to be coherent and topics that seem to be illogical. The coherence score is a way to measure topic quality without relying on human judgments. Given a topic $z$ and its top $n$ words $w_1, \dots, w_n$ ordered by $P(w|z)$, the coherence score is defined as

$$C(z) = \sum_{i=2}^{n} \sum_{j=1}^{i-1} \mathrm{score}(w_i, w_j),$$

where, for the sake of simplicity, the dependence of the top words on $z$ is omitted on the right-hand side.
In the following, the top 10 most likely topic words Boyd-Graber et al. (2014) will be considered (i.e., $n = 10$). Let $\mathrm{score}(\cdot, \cdot)$ be a function defined as follows. $D(w)$ is the count of documents containing the word $w$, out of the total number of documents, i.e., the document frequency of word $w$. Moreover, $D(w, w')$ is the count of documents containing both words $w$ and $w'$, out of the total number of documents, i.e., the co-document frequency of words $w$ and $w'$. $D$ is the total number of documents in the corpus. There seems to be no corpus in Italian that can be used to compute the counts. For this reason, an ad hoc corpus was built from some books written in or translated into Italian Ammaniti (2010); Márquez (2010); Olimpio (2018):

• News of a Kidnapping by Gabriel García Márquez.
• I'm Not Scared by Niccolò Ammaniti.
• Terrorismi: Atlante mondiale del terrore by Guido Olimpio. As far as we know, there is no English translation of this book, which addresses the phenomenon of terrorism on a global scale.
These books were selected because they speak of kidnappings, ransoms, and terrorists and could be similar to the story of Silvia Romano. A possible limitation of this choice of corpus is that some very technical or specific words are missed. For this paper, the resulting corpus was provided as a document-term matrix with a term frequency of 26,577 terms in three documents (the aforementioned books) and a sparsity of about 58%.
Lastly, the score function introduced in Mimno et al. (2011) was adopted:

$$\mathrm{score}(w, w') = \log \frac{D(w, w') + 1}{D(w')}.$$

A smoothing count of 1 was included to avoid taking the logarithm of zero.
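As an illustration, the coherence computation described in this section can be sketched in Python. The sketch assumes the standard form of the Mimno et al. (2011) score, log((D(w, w') + 1) / D(w')), summed over all pairs of top words with the higher-ranked word in the denominator. The function name `coherence` and the toy corpus are hypothetical, and every top word is assumed to occur in at least one reference document (otherwise the denominator would be zero).

```python
import math


def coherence(top_words, documents):
    """Coherence of one topic from document and co-document frequencies.

    top_words: topic words ordered by decreasing P(w|z).
    documents: reference corpus, one iterable of tokens per document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of reference documents containing all the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    total = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            # Smoothing count of 1 avoids log(0) when the pair never co-occurs.
            # Assumption: doc_freq(w_j) > 0 for every top word.
            total += math.log((doc_freq(w_i, w_j) + 1) / doc_freq(w_j))
    return total


# Hypothetical three-document reference corpus.
corpus = [["a", "b"], ["a"], ["b", "c"]]
value = coherence(["a", "b", "c"], corpus)
```

With a very small reference corpus, as in the three-book setting above, the smoothed ratio can exceed 1, so individual terms (and the total) may be positive.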

Results
In this section, the results obtained through the techniques described in the previous section are listed and discussed. To increase the clarity and comprehensibility for the reader, the section is split into two subsections devoted to the two datasets of the paper, $D_1$ and $D_2$.

Dataset $D_1$
The dataset $D_1$ was built on all the tweets in Italian containing the keywords "Silvia Romano" during the week of Silvia Romano's release.

Topic Modeling with BTM
A natural first problem when applying a topic modeling technique concerns the selection of a suitable number of topics. There is no single best way to address this issue. For example, the approach adopted by Kuhn (2018) involves studying the trade-off between semantic coherence and exclusivity.
Coherence is based on the document frequency of individual words and the co-document frequency of pairs of distinct words. Such measures can prevent the onset of topics that are misleading in some respects. For example, words may be linked in a chain but not belong to the same topic: the word "raspberry" might be linked to the word "diphthong", which is also linked to the term "vowel", but the words "raspberry" and "vowel" should not be assigned to the same topic in a topic model. The solution to this issue through the topic coherence measure was introduced by Mimno et al. (2011).
Table 1 shows the values of topic coherence when applying the BTM to dataset $D_1$ for a number of topics equal to 5. In the following, the $i$-th topic of dataset $D$ for a number of topics equal to $n$ is denoted by the symbol $T^{D,n}_i$ for $i = 1, 2, \dots, n$.

The corresponding topics are shown in Table 2, whereas Figure 1 shows their graphical representation. Table 2 further provides English translations of the terms composing the topics. The topmost and most coherent topic dealt with cyber hate, which, against all odds, immediately greeted the Italian hostage's release Flick (2020); Povoledo (2020). As documented by Povoledo (2020) in her New York Times article, "the conversion of the young woman, Silvia Romano, to Islam, along with rumors that Italy had paid a ransom for her release, opened the dam to a deluge of insults on social media". Topic $T^{D_1,5}_2$ also contained references to the dress worn by Silvia Romano upon disembarking the plane, a green jilbab (a long and loose-fit coat worn by some Muslim women).
The second most coherent topic (Topic $T^{D_1,5}_5$) also dealt with hate, this time from some local politicians, for example, a right-wing municipal councilor ans (n.d.) who posted a photo of Silvia Romano captioned "hang her". Inevitably, these posts also involved political leaders in the controversy that took place on social networks: Conte, Di Maio, and Salvini were mentioned in the topic. Matteo Salvini is the head of the country's League party, Giuseppe Conte served as Prime Minister of Italy from June 2018 until February 2021, and Luigi Di Maio has served as the Minister of Foreign Affairs since 5 September 2019.
Political controversies also emerged from the third most coherent topic, Topic $T^{D_1,5}_4$. A League lawmaker described Silvia Romano as a "neo-terrorist" Povoledo (2020), not to mention the aforementioned rumors that Italy had paid a ransom for her release. All these traits were contained inside this topic. Another topic seemed to be too general, even if there were words in it that indicated the public happiness due to the good news of the Italian aid worker's release. Topic $T^{D_1,5}_3$ was probably a set of scrap words that the BTM algorithm could not allocate correctly because of the paucity of topics. Even so, it was possible to find in it a word that could be related to an episode of social hate that physically affected Silvia Romano. The word "bottle" may refer to the bottle thrown at the window of her house by unknown persons Kington (2020).
By increasing the number of topics, the situation became clearer. No criterion was followed to select this number, but for the paper's purposes it was enough to observe again the presence of topics related to cyber hate and political controversies on Twitter. Table 3 shows the values of topic coherence when applying the BTM to dataset $D_1$ for a number of topics equal to 9.
The corresponding topics, accompanied by English translations of the Italian words that appeared in them, are shown in Table 4, whereas Figure 2 shows their graphical representation. The most coherent topic (Topic $T^{D_1,9}_5$ in this case) dealt again with cyber hate, but not exclusively. Terms like "social" (which in Italian stands for "social media") and "insult", referring to hateful messages that occurred on social media platforms (Twitter, in our case), were accompanied by "deputy"; "leghista" (a member or supporter of the League party); and "scuffle" (the translation of "bagarre", which in Italian is usually used to indicate a situation of political quarrel), instead telling a story of bitter political controversy (recall the League lawmaker who described Silvia Romano as a "neo-terrorist" Povoledo (2020)). Silvia Romano's clothing, which in the case with five topics appeared among the words of the topic labeled as cyber hate, now had a topic of its own, linked to the conversion of the young woman (Topic $T^{D_1,9}_9$).
The young woman's conversion caused controversy and hate phenomena, not only on the web. Topic $T^{D_1,9}_8$, which had almost the same coherence as the previous one, described the controversies that emerged from Twitter (ransom, million, terrorist). The difference between Topic $T^{D_1,9}_5$ and Topic $T^{D_1,9}_8$ was that, whereas in the first case political news was reported, and this could have been reported in a neutral tone or even criticizing the author of the disrespectful comments (the deputy of the League party), in the second case the words seemed to indicate the controversy spontaneously born from the tweets; for example, there was no indication of political subjects, as occurred in Topic $T^{D_1,9}_5$ (i.e., the deputy). In this sense, Topic $T^{D_1,9}_8$ was very similar to Topic $T^{D_1,9}_9$. Topic $T^{D_1,9}_4$ seemed to tell the story of Silvia Romano, a kidnapped 24-year-old Italian aid worker who was then released after 18 months in captivity in Kenya.
By further increasing the number of topics (see Figure 3), the topic that described the controversies emerging from Twitter (labeled as "controversies" in the case with nine topics) became the most coherent. See Table 5, where, in order not to burden the section too much, only the three most coherent topics are listed. The analysis presented so far therefore answered RQ1 and RQ2.
To further answer RQ2, an analysis of the most frequent trigrams that emerged from the corpus was also performed. It was not possible to list all the trigrams resulting from the dataset $D_1$ because there were too many (a total of 59,499 trigrams), nor did it make sense to report only the most frequent ones because, for example, the first positions were occupied by trigrams such as "libera romano silvia" (translation: "free Silvia Romano", with a score of 1595) or "liberata romano silvia" (translation: "Silvia Romano freed", with a score of 497). These were the first two positions and represent general information, in the sense that they do not suggest positive or negative emotions but simply limit themselves to reporting news (in this case, the release of Silvia Romano). The trigram "bentornata romano silvia" (translation: "welcome back Silvia Romano", with a score of 105) greeted the return home of Silvia Romano with happiness.
However, not all trigrams showed general or positive content. For example, the trigram "costata romano silvia" (translation: "Silvia Romano has cost", score 74) refers to the rumors that Italy had paid a ransom for Romano's release, whereas the trigram "romano silvia terrorista" (translation: "Silvia Romano terrorist", score 71) is related to the political exploitation regarding the conversion of the young woman who was also accused, as reported above, of having become a terrorist.
Among those contained in dataset $D_1$, the words "costata" (i.e., "has cost") and "terrorista" (i.e., "terrorist") appeared in, respectively, 328 and 843 tweets out of 35,055. These tweets mostly concerned the controversy sparked by the League lawmaker who described Silvia Romano as a "neo-terrorist" and the rumors that Italy had paid a ransom for her release Povoledo (2020). The tables provided in the Supplementary Information show, among these, the most cited tweets that represented cyber hate phenomena against Silvia Romano. Similar results could be found, e.g., for the trigram "islamica romano silvia" (translation: "Silvia Romano Islamic", score 68). The word "islamica" ("Islamic") appeared in 285 tweets out of 35,055. See the tables in the Supplementary Information. These tables also show information concerning the "favorite count" field, which provides the number of times the tweet was favorited, representing a sign of strong support for the opinion conveyed by the tweet.
Overall, it is possible to say that the positive or neutral (i.e., journalistic) tones in $D_1$ far outweighed the negative, polemical, and offensive ones. This was related to the positivity of the news, because one could expect that almost everyone would be happy or at least neutral if they learned of the release of a hostage. Nevertheless, thanks to the BTM, the presence of hate speech and controversies was also found. Building on these findings, the results reported in the following sections were obtained.

Dataset $D_2$
Dataset $D_2$ contained all the tweets published by the users whose tweets fell within dataset $D_1$. The observation period followed that of $D_1$.

Topic Modeling with LDA-U
To obtain the topics that emerged from the analysis of the tweets of Romano's detractors, a semi-automatic procedure, based on the combination of the BTM and LDA-U techniques, was employed. The procedure comprised the following steps:
(c) Detect all the tweets which, with greater probability (assumed to be greater than 0.5), belong to Topic $T^{D_1,13}_k$; this was possible because it was assumed Yan et al. (2013) that the topic proportions of a document were equal to the expectation of the topic proportions of biterms generated from the document. This set of tweets was called Set S (a subset of $D_1$).
(d) Detect the users who posted the tweets within Set S (say, Set U).
(e) Group by user the tweets in $D_2$ for each user in U.
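Steps (c) to (e) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: tweets are hypothetical dictionaries with `id`, `user`, and `text` fields, `topic_probs` is assumed to hold the per-tweet topic proportions inferred by a BTM fitted on the first dataset, and the function names are not taken from the paper.

```python
from collections import defaultdict


def select_users(d1_tweets, topic_probs, k, threshold=0.5):
    """Steps (c)-(d): authors of D1 tweets whose probability for topic k exceeds the threshold."""
    set_s = [t for t in d1_tweets if topic_probs[t["id"]][k] > threshold]  # Set S
    return {t["user"] for t in set_s}  # Set U


def group_by_user(d2_tweets, users):
    """Step (e): collect the D2 tweets of each selected user."""
    grouped = defaultdict(list)
    for t in d2_tweets:
        if t["user"] in users:
            grouped[t["user"]].append(t["text"])
    return dict(grouped)


# Hypothetical toy data; real inputs would come from the BTM fitted on D1.
d1 = [{"id": 1, "user": "u1"}, {"id": 2, "user": "u2"}]
topic_probs = {1: [0.1, 0.9], 2: [0.7, 0.3]}
d2 = [
    {"user": "u1", "text": "primo"},
    {"user": "u2", "text": "altro"},
    {"user": "u1", "text": "secondo"},
]

users = select_users(d1, topic_probs, k=1)
d3 = group_by_user(d2, users)
```

The grouped tweets in `d3` play the role of the user-aggregated documents to which LDA-U is subsequently applied.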
Topic $T^{D_1,13}_k$ was that labeled as "controversies" in the case with 13 topics ($k = 10$). It contained the most controversial posts on the release of Silvia Romano. At this point, a new dataset (denoted by $D_3$, a subset of $D_2$) was formed, to which the LDA-U technique was applied. $D_3$ consisted of 216,247 tweets, which were aggregated by 653 users. These were the authors of the aforementioned most controversial posts on the release of Silvia Romano. Figure S1 in the Supplementary Information shows the resulting clustered terms (into 30 topics) obtained from dataset $D_3$ through LDA-U. The topmost topic dealt with coronavirus, which, as one would expect, was a hot topic in May due to the pandemic's spread in Italy. This was, however, an overall theme that embraced more topics in Figure S1 in the Supplementary Information: 1, 2, 4, 6, 8, 10, 11, 13, 15, 19, 20, 22, 24, 26, 27, 28, 29, 30. In the following, the symbol $T^{D_3,30}_i$ denotes the $i$-th topic learned from dataset $D_3$. Topic $T^{D_3,30}_1$ concerned what has been called, in Italian, "Decreto Rilancio", a decree-law of 19 May 2020, containing urgent measures in the field of health, labor support, and the economy, as well as social policies related to the epidemiological emergency of COVID-19.
Topic $T^{D_3,30}_2$ introduced the answers to RQ3 and RQ4. It contained the words "covid"; "trump"; "governo" (i.e., government); "bill" and "gates" (Bill Gates); "virus"; "regime"; and "stato" (i.e., state government). This topic embraced some rumors and conspiracy theories related to the COVID-19 pandemic that proliferated given the lack of scientific consensus on the virus's spread. According to a widespread conspiracy theory, Bill Gates used the pandemic as a cover to launch a broad vaccination program to facilitate a global surveillance regime Baines et al. (2021). Topic $T^{D_3,30}_3$ contained offensive tweets that used obscene language. This denotes fear, anger, and frustration in the social discussions, as well as conflict and hatred between people who have different opinions. Inappropriate and offensive language deteriorates public discourse and can lead to a more radicalized society Cinelli et al. (2021).
In Topic $T^{D_3,30}_5$, the juxtaposition of the words "immigrati" (i.e., immigrant); "migranti" (i.e., migrant); "soldi" (i.e., money); "lavoro" (i.e., job); and "governo" (i.e., government) suggested the presence in dataset $D_2$ of another famous strand of conspiracy theories, resulting from the spread of fallacious racial news, known as "racial hoaxes" Papapicco et al. (2022), able to feed the narrative of immigration promoted by some politicians Cervi et al. (2020) and induce the need for cognitive closure in ordinary citizens Baldner and Pierro (2019). Topic $T^{D_3,30}_{19}$, like the aforementioned Topic $T^{D_3,30}_5$, fell into this strand (here, "clandestini" translates to "illegal immigrants"). Linked to them was Topic $T^{D_3,30}_9$, focused on the policies pursued by the Lega, which opposes illegal immigration into Italy and the EU as well as the EU's management of asylum seekers.
Topic $T^{D_3,30}_{23}$ was related to the release of Silvia Romano. Here, it was still possible to find the presence of terms that suggested controversy regarding the ransom paid for her release and her conversion. This meant that these controversies did not end in the week of Silvia's release, but remained in the subsequent weeks.
Other topics related to fake news and hate were also present; in the following, the set of such topics is denoted by T.
(3) Out of a total of 653 users who posted the most controversial tweets on the release of Silvia Romano, 207 (about 31.7%) used the more offensive words contained in Topic $T^{D_3,30}_3$. Of course, these words were not necessarily directed to Silvia Romano. Furthermore, about 23% of these 653 users mentioned words in topics related to rumors and conspiracy theories concerning Bill Gates or racial hoaxes.
Repeating the procedure steps (a)-(e) and substituting (at point (b)) the most polemical topic against Silvia Romano with the most favorable/supportive one towards her (i.e., in English, news, beautiful, free, to free, release, thank you, first, joy, good, day), another dataset was obtained, called $D'_3$. Obviously, $D'_3 \subset D_2$ and $D'_3 \cap D_3 = \emptyset$. $D'_3$ consisted of 88,635 tweets posted by 857 users (who were the authors of the most favorable/supportive tweets towards Silvia Romano). Figure S2 in the Supplementary Information shows the resulting clustered terms (into 30 topics) obtained from dataset $D'_3$ through LDA-U; there seemed to be no records of misinformation in them.
Since LDA modeled each document (i.e., the aggregate tweets by author) as a mixture of topics, it was possible to examine the per-document-per-topic probabilities $\gamma$ of aggregate documents in $D_3$. Then, each aggregate document in $D_3$ was assigned to the topic with the maximum value of $\gamma$. The authors of documents assigned to one of the aforementioned topics in T (i.e., those related to fake news and hate) were finally selected. In this way, 271 users were identified and used to refine the search. In dataset $D_3$, they posted 61,815 tweets. This dataset was called $D_4$, which was in turn a subset of $D_3$. In the top 20 of the most active users in $D_4$, the number of tweets produced varied between 833 and 2005. To date, some of these accounts have been suspended (Twitter suspends accounts that violate the Twitter Rules), whereas some others are still connected to the League party and Matteo Salvini. Users in $D_4$ were on average more active than those in the rest of the dataset, $D_2 \setminus D_4$, because the average number of tweets posted per single user in $D_4$ was equal to 228.0996, compared to 118.124 for users in $D_2 \setminus D_4$.
These included references to: (i) some debated alternative cures cited by vaccine opponents (plasma therapy) to defeat COVID-19; (ii) the aforementioned conspiracy theory regarding Bill Gates (whereas Sara Cunial is an Italian deputy and former member of the parliamentary group of the Five Star Movement, famous in Italy for her anti-scientific positions on vaccines); (iii) again, the controversy over the ransom paid by the Italian state for the release of Silvia Romano; (iv) the amnesty (in Italian, "sanatoria") proposed by the then Italian minister of agriculture, Teresa Bellanova, which provided for the regularization of foreign workers in Italy and caused considerable controversy; (v) a very general conspiracy theme, containing the term globalist, used as a pejorative in right-wing parties and conspiracy theories Stack (2016), and arguments against masks (also involving supposed abuses committed by police against citizens without masks).
Other topics not listed above referred to the Italian lockdown and to political games (decrees and contrasts between the political leaders of the Democratic Party, Five Star Movement, and League party).
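The selection of authors through the per-document-per-topic probabilities γ, described earlier in this subsection, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `gamma` is a hypothetical mapping from each user (one aggregated document per user) to a list of topic probabilities, topics are indexed from 0, and the function names are not taken from the paper.

```python
def assign_topics(gamma):
    """Assign each aggregated document (user) to the topic with maximum gamma."""
    return {
        user: max(range(len(probs)), key=probs.__getitem__)
        for user, probs in gamma.items()
    }


def select_authors(gamma, target_topics):
    """Keep the users whose dominant topic belongs to the target set of topics."""
    assigned = assign_topics(gamma)
    return {user for user, topic in assigned.items() if topic in target_topics}


# Hypothetical per-user topic probabilities for a three-topic model.
gamma = {
    "u1": [0.2, 0.7, 0.1],
    "u2": [0.6, 0.3, 0.1],
    "u3": [0.1, 0.1, 0.8],
}
selected = select_authors(gamma, target_topics={1, 2})
```

Assigning each user to a single dominant topic is a hard-clustering simplification of the mixture that LDA actually estimates; users with two comparably probable topics are counted only under the larger one.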

Dataset $D_2$ Two Years Later
After two years, of the 18,235 users, 15,542 remained (82.4%). A possible explanation for this difference is that hateful users were banned more often due to the infringement of Twitter's guidelines, as conjectured by Ref. Ribeiro et al. (2018). The same is true for "conspiracists", especially after Twitter adopted stricter policies on COVID-19 vaccine misinformation. The set of users who produced tweets mainly belonging to topics in T went from 271 to 222, with a percentage (81.9%) comparable to the rest of the dataset $D_2$. Deleting the tweets produced by removed accounts, $D_4$ went from 61,815 tweets to 44,539 (70%), while the average number of tweets posted per user in $D_4$ became 200.6261. This new dataset is denoted by $D'_4 \subset D_4$. These data were in line with the results of Ref. Ribeiro et al. (2018), according to which hateful users are "power users" in the sense that they tweet more. A similar result could also hold for conspiracists.
These again included references to the ransom paid by the Italian state for the release of Silvia Romano, the regularization proposal for foreign workers in Italy, and anti-mask positions. Bill Gates and covid were still mentioned, but Sara Cunial disappeared, possibly due to the cancellation of accounts that shared her ideas. Unlike the topics detected in $D_4$, this topic in $D'_4$ joined the one containing the mention of plasma therapy, where a reference to Giuseppe De Donno now also appeared. Giuseppe De Donno was an Italian professor and physician. He was a supporter of plasma remedies to combat COVID-19.
Moreover, the remaining accounts may have also lost some of their tweets due to Twitter Rules violations. By searching status IDs, it was found that of the 61,815 tweets, only 24,185 remained (39.1%). This new dataset is denoted by $D''_4 \subset D'_4 \subset D_4$. The application of the BTM to $D''_4$ showed non-relevant topics, focused on political debates, except for the controversies related to the regularization proposal for foreign workers in Italy and the ransom paid for the release of Silvia Romano, which, compared to the corresponding topics in $D_4$, appeared without any changes in the initial words. Instead, the references to Bill Gates and anti-mask theories disappeared. Although social media polarization and echo chambers could still be found in these topics, the latter results showed the effective work carried out by Twitter to address misinformation on the platform.

Conclusions
In this study, a large corpus of almost two million tweets in Italian was collected, containing: (i) posts about the Italian volunteer Silvia Romano during the week of her release, i.e., from 2020-05-08 to 2020-05-16; (ii) all the posts published by the authors of this stream of tweets in the subsequent weeks, spanning from 2020-05-08 to 2020-06-02. The temporal range covered the end of the first COVID-19 lockdown in Italy. This work aimed to characterize the behavior of users engaging in conspiratorial thinking in the first dataset and shed light on the relationship with the consumption of misinformation in the second dataset.
The combined LDA and BTM techniques were able to discover, in an unsupervised fashion, the main emerging terms related to socio-political events (such as discussions of decrees and political disputes), also including terms that were heavily and constantly used, such as the major political leaders' names.
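For readers unfamiliar with the BTM, the following minimal sketch shows only its input representation: each short text is turned into unordered word pairs ("biterms"), and the model is fitted on the corpus-wide collection of these pairs rather than on per-document word counts. The function names and the toy Italian tokens are hypothetical, and the sketch does not perform topic inference or reproduce the pipeline used in this study.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered pair of tokens co-occurring in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def biterm_counts(docs):
    """Aggregate biterm counts over a corpus of tokenized short texts."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical, already-tokenized tweets.
docs = [
    ["riscatto", "stato", "liberazione"],
    ["riscatto", "liberazione"],
]

counts = biterm_counts(docs)
```

Sorting each pair makes the biterm unordered, so that ("riscatto", "liberazione") and ("liberazione", "riscatto") are counted as the same co-occurrence. Pooling pairs over the whole corpus is what lets the model cope with the sparsity of individual tweets.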
The implications of this study mainly concern the management of social platforms like Twitter. First, as in Cinelli et al. (2021), this study did not find evidence of a strict relationship between the use of toxic language (violent, offensive, or simply inappropriate) and involvement in the spread of misinformation on Twitter. Second, users seemed prone to using toxic language outside their echo chamber, targeting the community they perceived as their opponent. This is in line with recent studies on the polarization of online debates and the stigmatization of users Iacomini and Vellucci (2021). Third, among the tweets posted by the authors of the most controversial posts on the release of Silvia Romano, no mono-thematic debates were observed. Therefore, there were no serial users or producers of misinformation; instead, these authors also seemed to share other content. Lastly, it should be noted that many of Twitter's efforts were aimed at countering misinformation on the platform. Despite these efforts, when we checked, almost two years after posting, whether the accounts that had posted the most controversial tweets on the release of Silvia Romano still existed, we found that only around 20% of them were unavailable due to official banning or removal by the author (even though Twitter had still removed nearly 60% of these tweets).
The other implications are political and are addressed to governments. Prior literature explains that, during crises, people depend on the media to stay updated and receive accurate information Ball-Rokeach and DeFleur (1976). The presence of topics related to antivaccine theories showed the difficulties governments face in dealing with individuals' concerns about vaccine efficacy. Government efforts should instead be designed to raise awareness among individuals and include them in civil dialogue, online and offline. The presence of these topics, as well as of topics related to racial hoaxes (unfounded rumors about the economic privileges immigrants would supposedly enjoy in Italy), makes this need pressing, given the possible consequences in the real world. Recent measures taken by the European Parliament move in this direction; it is worth mentioning the provisional political agreement reached on the Digital Services Act (DSA), which follows the principle that what is illegal offline must also be illegal online DSA (n.d.).
We would like to remark that misinformation is often a symptom of deeper sociopolitical issues rather than their cause. Addressing the symptoms can be helpful, but it should not detract from addressing the root causes or the importance of advocating for access to accurate, transparent, and high-quality information Altay et al. (2023).
The findings of this study must be considered in light of some limitations. This paper discussed concepts such as "hate speech" and "misinformation". There is a lack of academic consensus on how to measure these concepts (canceled accounts are by no means a proxy for misinformation). Concerning hate, we adhered here to legal standards, though only implicitly. We understand that these standards are ad hoc in abstract cases of legal categorization and may not reflect "scientific" categories in information and communication science, psychology, sociology, etc.; in particular, Silvia's case represented several categories often targeted on social media that conformed to the legalistic definition of hate speech. Is conspiratorial thinking a form of hate speech? There is clearly an overlap; however, conspiratorial thinking was much more specific in this case study. We could hypothesize two different models of action for the production of conspiratorial tweets. The first model concerned genuine expressions of emotion regarding Silvia Romano: people who genuinely felt that something unjust had happened. The second model pertained to misinformation proper (or humbug): these people expressed concerns about the hypothesis that Silvia Romano could have been sympathetic to terrorists. The difference between these categories of action was crucial for the study. In future work, when analyzing the past tweeting behavior of people expressing concern (rather than hate), one could try to quantify genuine distress about the historical contingency of Silvia Romano's case (although this approach could also be applied to other news) through the systematic behavior of spreading "concerns" (e.g., about immigration, which we found in our discussion to be relevant).
Another limitation of this paper is that the case of Silvia Romano, as a case study, does not generalize well: the sample of people who tweet in Italian is clustered around certain demographics and is less representative of the general dynamics of human behavior than collections of tweets in English, French, Spanish, and possibly Arabic, which generalize across many countries. It is not even a representative sample of Italian society: Twitter is less popular in Italy than in other EU countries. Research on Italian tweets has its own value, but it would require more

Figure 1. Visualization of the biterm topic clusters (5 topics) from the database containing keywords "Silvia Romano".

Figure 2. Visualization of the biterm topic clusters (9 topics) from the database containing keywords "Silvia Romano".
the bottle thrown at the window of Silvia Romano's home in Milan. Topic $T^{D_1,9}_{3}$ referred to the role of Turkish intelligence in the release of the hostage. Topic $T^{D_1,9}_{7}$ also dealt with hate towards Silvia Romano involving some local politicians (e.g., the aforementioned right-wing municipal councilor). Happiness at the hostage's release and Silvia Romano's return home emerged instead from Topic T

Figure 3. Visualization of the biterm topic clusters (13 topics) from the database containing keywords "Silvia Romano".

Table 1. Topic coherence (5 topics) based on BTM for the database containing keywords "Silvia Romano".

Table 2. The top 10 most likely topic words for each topic (5 topics) according to BTM in the database containing keywords "Silvia Romano".

Table 3. Topic coherence (9 topics) according to BTM for the database containing keywords "Silvia Romano".

Table 4. The top 10 most likely topic words for each topic (9 topics) according to BTM from the database containing keywords "Silvia Romano".

Table 5. The top 10 most likely topic words for the 3 most coherent topics (13 topics) according to BTM from the database containing keywords "Silvia Romano".