Hybrid Intelligence Strategies for Identifying, Classifying and Analyzing Political Bots

Abstract: Political bots, through astroturfing and other strategies, have become important players in recent elections in several countries. This study aims to provide researchers and the citizenry with the necessary knowledge to design strategies to identify bots and counteract what international organizations have deemed bots’ harmful effects on democracy and, simultaneously, improve automatic detection of them. This study is based on two innovative methodological approaches: (1) dealing with bots using hybrid intelligence (HI), a multidisciplinary perspective that combines artificial intelligence (AI), natural language processing, political science, and communication science, and (2) applying framing theory to political bots. This paper contributes to the literature in the field by (a) applying framing to the analysis of political bots, (b) defining characteristics to identify signs of automation in Spanish, (c) building a Spanish-language bot database, (d) developing a specific classifier for Spanish-language accounts, (e) using HI to detect bots, and (f) developing tools that enable the everyday citizen to identify political bots through framing.


Introduction
No longer just software mediators, bots have become important players in various political systems (Salge and Karahanna 2018; Lewis et al. 2019). Since 2010, political parties and governments have spent more than USD 500 million on research and development in this field (Bradshaw and Howard 2018). Their effect on election and referendum results increases each year, with formally organized social media manipulation campaigns in 48 countries, compared to 28 in the previous year (Bradshaw and Howard 2018), though they develop and exert influence differently depending on the context. Bots' political influence continues to grow (Montal and Reich 2017) due to their ability to create artificial public opinion and turn non-existent or minority opinions into majority or dominant ones. The receivers of these messages are defenseless given that they are as yet unable to distinguish real from fake speech, which employs deception so as to conceal its nature (van der Kaa and Krahmer 2014; Waddell 2018; Wölker and Powell 2018; Kušen and Strembeck 2020).
This research aims primarily to provide citizens with the knowledge necessary to create strategies for identifying political bots as a safeguard against their potentially negative impact on democracy, as a tool for achieving equality of opportunities in public debate for all political options, and to improve automated bot detection. We seek to do this by applying hybrid intelligence (HI), which will allow us to combine machine and human intelligence to overcome deficiencies in current artificial intelligence systems. To that end, we set the following objectives:
1. Develop Spain's first account classifier and define characteristics to identify signs of automation. We collected and indexed a massive amount of text from Twitter, analyzing the political bots of the resulting corpus so as to determine their defining characteristics. We measured their effectiveness at the individual and aggregate levels, leveraging various sets of characteristics so as to find the most effective. Among the heuristics explored were those featured in previous studies on Twitter.
2. Compile Spain's first bot database for Twitter. The database will allow for a subsequent analysis of bots' presence in and influence on Spanish public opinion.
3. Determine the typical characteristics of political bots during political campaigns based on political bots' profiles and tweets.
4. Analyze and develop tools for the public at large to identify bots without relying on automated machine detection.
This article contributes to the literature by (a) applying framing to the analysis of political bots, (b) defining a set of characteristics so as to identify signs of automation in Spanish, (c) compiling a Spanish-language bot database, (d) developing a classifier for Spanish-language accounts, (e) applying HI to bot detection, (f) identifying key information for the public at large to identify political bots, and (g) improving automated detection of bots.
Having set forth an introduction, we will now briefly review the relevant literature and describe the perspective we adopted in our research. Then, we will detail the methods and samples used, examine the set of features used by the bot classifier, discuss the results of the bot analysis, and propose a detection tool. Finally, we will discuss trends in the field, as well as the conclusions and limitations of our research.

Political Bots: Identification and Social Impact
Previous research focused on creating techniques to automatically identify bots by analyzing messaging behavior in the Twitter ecosystem (Schuchard et al. 2019; Badawy et al. 2019; Lai et al. 2019; Wang et al. 2020; Zheng et al. 2019). Bots' movement throughout Twitter was primarily analyzed so as to improve automatic detection systems. In keeping with Woolley and Howard (2016), political bots were defined as "[...] the algorithms that operate over social media, written to learn from and mimic real people so as to manipulate public opinion across a diverse range of social media and device networks." The most noteworthy results from previous research revolve around four central points: (a) influence on electoral processes, (b) functions, (c) taxonomies, and (d) regulation of political bots.
Political parties and other political players can track personal data to send campaign ads, determine the ideological makeup of their potential voters, and send personalized messages adapted to voters' needs in real time with no need for human intervention. Algorithms can reveal the public's state of mind, opinion, location, ideology, and needs. Moreover, they can be used to build and send in real-time messages designed to support the sender's positions and influence each and every type of voter. Bots can even hold conversations with people or amongst themselves. Here, the literature describes astroturfing as a system for creating fake public opinion, highlighting several bot actions (Treré 2016;Bastos and Mercea 2017): (a) pro-government ads, (b) creating fake opinion leaders, (c) delegitimizing systems of government, (d) supporting opposition groups, (e) empowering the public, (f) establishing political agendas and debates, and (g) weakening political dissent.
Taxonomies revolve around the bots' dynamics (Dagon et al. 2008;Chu et al. 2010;McKelvey and Dubois 2017) or devise specific categories for corpus analysis, as in Stukal et al. (2019), where bots were categorized as pro-regime, anti-regime, or neutral. One of the most thorough taxonomies includes various characteristics: professional news content, professional political content, polarizing and conspiracy content, and other policy news and information (Machado et al. 2018).
The detection of a large number of bots spreading false information and polarizing the political conversation (Bessi and Ferrara 2016) gave rise to various initiatives, an evaluation of changes to legislation, and proposals for intervention. Though some criticize regulation for potentially limiting freedom of speech (Lamo and Calo 2019), many countries and international organizations have recently developed such rules. The European Union warned of the threat to democracy if political parties generate automated messages adapted to the needs of each person based on big data analysis, which may even be manipulated by fake news (European Commission 2018; European Parliament 2017). Additionally, some countries have enacted laws to regulate artificial intelligence and computational propaganda (Italy in 2014, France in 2016, the UK in 2017). In November 2018, the European Commission adopted the Action Plan against Disinformation to minimize disinformation in European elections.

Our Approaches: Hybrid Intelligence
Our paper uses this context as a starting point, understanding that the sophistication of bots necessitates the development of mixed analysis methods that combine statistical methods with social sciences, which we seek to achieve through HI. Natural language processing and new AI techniques in the field of machine/deep learning (ML/DL) have been used in recent years to detect bots at the account level, processing a large number of social network posts and leveraging information on the network's structure, temporal dynamics, and sentiment analysis, and even using neural networks in large compilations of text data (Kudugunta and Ferrara 2018;Stukal et al. 2019). Nonetheless, we believe that bots' increasing complexity necessitates mixed methodologies that combine expert human knowledge with ML/DL and NLP systems. Soc. Sci. 2021, 10, 357 4 of 18 This approach will allow us to feed the classifier with new information that will improve and allow for immediate detection of political bots, and will also allow individual users to do so, despite them not typically having access to such technological tools nor the ability to visualize a large number of bots.
We tackle bot detection through framing not only because it can connect with the messages' propagation dynamics and their cascade influence, but also because it is a model that successfully maps out the complexity of thought by paying attention to the elements of greatest significance in a given communicative context.
We assume that frames play an important role in public opinion and that bots can rapidly spread frames (tell people what to think about) so as to, as Entman (2010) points out, monitor public attitudes to influence people's behavior. This situation, combined with the spiral of silence, could become a powerful weapon for hiding opinions not just because they are a minority position but because they differ from those proposed by bots. Moreover, according to Chong and Druckman (2007), bots have two of the three frame-strengthening elements: frequency, accessibility, and relevance. The influence bots may exert through frame-spreading is portrayed in the following Figure 1.
Although bots may have initially been geared towards increasing a politician's nonexistent popularity, they are not typically used to spread frames that tend to receivers' needs. A frame's success partially revolves around awareness and adaptation of pre-existing frames. As seen in the diagram, this is one of bots' strengths: Namely, the user imbues the bots with data analysis, allowing them to incorporate preexisting frames. If the spiral of silence were applied to political bots, a bot creator could quickly spread new frames consistent with receivers' preexisting frames so as to transform those initially rare or nonexistent frames into dominant ones.
Here we turn to the conceptualization and operationalization of the frame analysis, as well as to the trends in social network frames that underpin the tools used in our paper's empirical analysis.
Frames are a key concept in political communication, though they have been used in other fields as well (Bateson 2002;Goffman 1974). Frames can carry out four functions: defining problems, interpreting causes, moral judgments, and recommendations for treatment (Entman 2003). Their success is linked to interaction with individuals' pre-existing schema, pre-existing frames, and current information (Entman and Usher 2018).
The framing approach presents distinct typologies (Matthes 2009). For our purposes, the relevant typology is that which distinguishes between specific and generic frames (De Vreese 2005). Whereas specific frames revolve around specific topics or isolated events, generic frames transcend thematic limitations because the same issue can be identified in different contexts (De Vreese 2002). A great deal of research has applied such classifications: specific frames in Matthes (2009), Lengauer and Höller (2013), Hänggli and Kriesi (2010), Sheafer and Gabay (2009), Zhou and Moy (2007), and Matthes (2009), and generic [...]
In keeping with Aalberg et al.'s (2012) work on generic frames, we dealt with game frames and strategic frames. Game frames take politics as a game that consists of winning or losing, employs bellicose language, and emphasizes polling data. They typically focus on politics in general, legislative debates, the winners and losers of an election, and the battle for public opinion (polls), and speculate on election or political results and potential coalitions. The strategic frame refers to campaign strategies and tactics, motives and instrumental actions, personality, style, and metacoverage. It also includes different media strategies, including news that covers the press's behavior.
The first generation of research demonstrated that frames can affect public opinion on a wide range of issues (Aarøe 2011), though not to the same degree (Entman 2003;Aarøe 2011;Chong and Druckman 2007). Public actors compete to strengthen certain frames and several factors determine their success or failure in spreading new frames (Chong and Druckman 2007;Hänggli and Kriesi 2012).
Given social networks' consolidation as a form of political communication, framing processes, as well as the processes of information production and consumption, among others, need to be reevaluated (Entman and Usher 2018). Digitalization of online framing processes has significantly influenced interpersonal, family, and organizational communication and increased opportunities for extremism and balkanization (Entman and Usher 2018).
Initial studies applied framing theory to blogs and websites (Bichard 2006;Goldman and Kuypers 2010). To date, many framing-based studies of social networks have considered frame propagation dynamics in distinct but inter-related communication ecosystems (Wasike 2013;Aruguete and Calvo 2018).
Twitter has become a major political communication tool, especially during campaign seasons when parties and candidates use it to provide information about their campaign and its events or to link to their website (Jungherr 2016). Previous studies affirmed Twitter's ability to get out the vote and increase civic engagement (Gainous and Wagner 2014) and to change political commitment (Lee and Park 2013; Grčić et al. 2017). Academic research highlights bi-directionality, interactivity, the capacity for dialogue, and even the promotion of a participatory, deliberative democracy. More importantly, Twitter is an appropriate tool for politicization given its ability to disseminate and circulate messages, in addition to its fostering of public engagement (Utz et al. 2013; Abitbol and Lee 2017; Ji et al. 2018; Painter 2015).
Additionally, our paper considers Twitter a shared communicative space in which democratic deliberation could arise and each political actor could defend their positions. Moreover, we understand that for this to happen, an equal playing field is an important premise. Bots violate that condition by giving too much power to too few. Given that platforms refuse to take responsibility for this situation, the frames spread therein, and the beneficiaries (Entman and Usher 2018), we propose empowering the public to detect bots, promoting horizontal surveillance put into practice by people themselves in a somewhat planned way through cell phones and other devices that allow for reporting and sharing political actions with which they disagree. Our research aims to foster public detection of political bots, whose messages are often indistinguishable from those created by humans. Indeed, much of their success lies in this concealment.

Materials and Methods
We proposed a multi-disciplinary approach leveraging artificial intelligence, natural language processing, political science, and communication science. We applied hybrid intelligence to our analysis of political bots from Spain's April 2019 election, referencing previous studies on automatic bot detection. Below we summarize and explain our extraction methods and our analysis of tweets (Appendix A):
Step 1. Crawler Design.
The Polypus Twitter crawler (Martínez-Castaño et al. 2018b) retrieves tweets in real time from Twitter's public and anonymous Search API, used by its official web client. The tweets are formatted in HTML so they can be directly inserted in the site's feed. The API limits the number of returned tweets by returning a sample of the total that matches a given query. Any user of the web interface is expected to receive the same set of tweets when executing the same query within the same time windows.
The high-level architecture of the system is shown in Figure 2. The crawler can be configured both for targeted and untargeted searches. The number of threads is configurable so that the queries are equally distributed and processed in parallel. In addition, multiple instances of the crawler can be executed on different machines so that the crawler scales horizontally. To avoid repeated tweets, a distributed memory-based key-value store is used to store the known identifiers for several days.
The list of queries can be set as the list of the most frequent terms or expressions in a set of languages. With this strategy, the crawler can retrieve huge amounts of tweets without any specific target. There is not a linear relation with the available resources (due to the aforementioned limitations of Twitter's API). However, for specific targets such as in this study, the extraction can practically match the actual production of tweets through the use of specific terms, hashtags, or Twitter accounts. Dynamic data (e.g., retweets, number of replies) are collected afterwards since the tweets are extracted in real time and these attributes do not offer useful information about the users' interaction initially.
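The deduplication and query-distribution ideas described above can be illustrated with a minimal Python sketch. It is not the Polypus implementation: the class `SeenTweetStore`, the function `distribute_queries`, and the time-to-live parameter are hypothetical names introduced here, and a plain in-process dictionary stands in for the distributed memory-based key-value store.

```python
import time


class SeenTweetStore:
    """In-process stand-in (assumption) for a distributed key-value store
    that remembers tweet identifiers for a limited time."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._expiry_by_id = {}

    def add_if_new(self, tweet_id, now=None):
        """Record tweet_id and return True if it was unseen or had expired."""
        now = time.time() if now is None else now
        expiry = self._expiry_by_id.get(tweet_id)
        if expiry is not None and expiry > now:
            return False  # still remembered: treat as a repeated tweet
        self._expiry_by_id[tweet_id] = now + self.ttl_seconds
        return True

    def purge(self, now=None):
        """Drop expired identifiers and return how many were removed."""
        now = time.time() if now is None else now
        expired = [k for k, v in self._expiry_by_id.items() if v <= now]
        for key in expired:
            del self._expiry_by_id[key]
        return len(expired)


def distribute_queries(queries, n_threads):
    """Split the query list round-robin so each worker gets a similar share."""
    return [queries[i::n_threads] for i in range(n_threads)]
```

In a real deployment the time-to-live would correspond to the "several days" mentioned in the text, and each sublist returned by `distribute_queries` would be handed to one crawler thread.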
Polypus's Twitter crawler is now part of a set of social media crawlers integrated into Catenae (Martínez-Castaño et al. 2018a, 2018c), a Python framework for easy design, development, and deployment of stream processing applications with Docker containers 1.
Step 2. Classifier features. Extraction parameters (explained below).
Step 3. Discourse analysis through framing. Tweet content analysis parameters (explained below).

The sample was in keeping with those of other studies (Hedman et al. 2018; Schäfer et al. 2017). In addition to the campaign season, it also included the period between April 15 and May 5, with 575 candidate or political party accounts, as well as the hashtags promoted by those parties. The sample yielded the data shown in Table 1. Additionally, it combined analysis of parties' and candidates' accounts, as well as hashtags, as shown in Table 2.
Table 2. Accounts, hashtags, and terms.
For this paper's context, we chose Spain's 28 April 2019 elections, which are particularly relevant for several reasons. First is the 74.65% voter participation, much higher than that of the 2016 elections won by the Spanish Socialist Workers' Party (PSOE in Spanish), at a time in which Spain's two-party system was splintering due to the emergence of two new parties (Podemos and Ciudadanos). Second, VOX, following its performance in the province of Andalusia's regional elections, could be anticipated to capture seats in the national parliament, as well.
Our study was warranted, moreover, because in Spain we had yet to recognize the extent of computational propaganda given there was only one exploratory study (Campos-Domínguez and García-Orosa 2018), which indicated that in 2019 the political algorithm would finally take hold in the country. With this phenomenon having advanced in recent years, it became necessary to undertake a multi-disciplinary study in Spain that further analyzed the situation and consequences for digital society and the electoral processes of 2019 and 2020.

Features of the Classifier
Feature selection is a critical process for classification tasks that rely on traditional machine learning. In this section, we describe the different types of features we used to design and implement a hybrid classifier, leveraging a model trained from annotated datasets and some generic heuristics determined based on prior knowledge formalized by experts in the domain. The system is therefore based on the HI paradigm, since it hybridizes automatic learning with information from experts.

Features
In order to train a classifier, we defined three types of features: social network, content-based, and lexical features.

Social Network Features
These are specific characteristics of the language used in social networks, consisting of textual elements that can only be found on Twitter:
• Ratio of the number of onomatopoeias, e.g., "haha" in English or "jeje" in Spanish.

Content-Based Features
These are features that can be extracted from any text message:
• Ratio of the size of tweets;
• Ratio of the number of identical pairs of tweets;
• Lexical richness, defined as lemma/token ratio;
• Similarity between sequential pairs of tweets. To obtain the final similarity ratio associated with a user account, all similarity scores between pairs of sequential tweets are added, and the result is divided by the total number of tweets.
These content-based features were created with just lexical words (i.e., nouns, adjectives, verbs, and adverbs) by making use of PoS tagging so as to identify them.
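As a rough illustration of how such content-based features might be computed, the following Python sketch takes each tweet as a list of already lemmatized lexical words. Several choices are assumptions made here rather than details given in the text: Jaccard overlap is used as the similarity measure, the identical-pair ratio is taken over all unordered pairs of tweets, and all function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between two lemma lists (assumed similarity measure)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def lexical_richness(tweets):
    """Distinct lemmas divided by total tokens across an account's tweets."""
    tokens = [lemma for tweet in tweets for lemma in tweet]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def sequential_similarity(tweets):
    """Sum of similarities between consecutive tweets, divided by the
    total number of tweets, as described in the text."""
    if not tweets:
        return 0.0
    total = sum(jaccard(a, b) for a, b in zip(tweets, tweets[1:]))
    return total / len(tweets)


def identical_pair_ratio(tweets):
    """Share of unordered tweet pairs that are identical (assumed denominator)."""
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a == b) / len(pairs)
```

An account that repeats the same message many times would score high on `sequential_similarity` and `identical_pair_ratio` and low on `lexical_richness`, which is the pattern the heuristics described later rely on.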

Lexical Features
Lexical features were derived from several domain-specific lexicons; in particular, two different weighted lexicons were automatically built for each language:
• A human/bot lexicon consisting of specific words belonging to two classes: the language of bots and the language of humans in Twitter;
• A sentiment lexicon consisting of polarity words (positive or negative) used by bots or humans.
Each lexicon was built by making use of the annotated corpora provided by the PAN Shared Task organizers and a ranking algorithm defined in Almatarneh and Gamallo (2018). As in the case of content-based features, only words tagged as nouns, verbs, adjectives, and adverbs were considered.
In order to find the best feature configuration in a classification task, we used a Bayesian algorithm. In addition to its simplicity and efficiency, Naive Bayes performs well in this type of task, as described in Alarifi et al. (2016), where the Bayesian classifier obtained the best results in the bot/human classification. Our classifier was implemented with the Naive Bayes Perl module (https://metacpan.org/pod/Algorithm::NaiveBayes, accessed on 30 June 2021). In order to lemmatize and identify lexical PoS tags, tweets were processed using the multilingual toolkit LinguaKit. The classifier was trained with the dataset provided by the PAN Shared Task.
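To make the classification step more concrete, the sketch below shows a small Gaussian Naive Bayes classifier in Python. It is an illustration only and does not reproduce the Perl module cited above: the Gaussian likelihood is an assumption chosen here because the features are numeric ratios, and the class name, the variance floor, and the toy feature vectors used to exercise it are all hypothetical.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes over numeric feature vectors."""

    def fit(self, rows, labels):
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        self.priors = {}
        self.stats = {}
        for label, group in grouped.items():
            self.priors[label] = len(group) / len(rows)
            per_feature = []
            for column in zip(*group):
                mean = sum(column) / len(column)
                var = sum((x - mean) ** 2 for x in column) / len(column)
                # Small floor (assumption) so a zero variance cannot divide by zero.
                per_feature.append((mean, var + 1e-6))
            self.stats[label] = per_feature
        return self

    def log_posterior(self, row):
        """Unnormalized log posterior of each class, features assumed independent."""
        scores = {}
        for label, per_feature in self.stats.items():
            score = math.log(self.priors[label])
            for x, (mean, var) in zip(row, per_feature):
                score += -0.5 * math.log(2 * math.pi * var)
                score -= (x - mean) ** 2 / (2 * var)
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posterior(row)
        return max(scores, key=scores.get)
```

Trained on a handful of vectors of the form (sequential similarity, lexical richness), such a model assigns a new account to whichever class gives its feature values the higher posterior probability.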

Heuristics
Our hybrid approach features a system with two modules: a rule-based module consisting of generic heuristics defined with expert knowledge and the Bayes classifier developed with the features described above. The generic heuristics are applied before the Bayes classifier.
The generic heuristics use some of the features defined above; for instance, a user is considered a bot if the similarity of its tweets is above a given threshold, or if the number of hashtags and user references is very high yet the lexical richness is very low. Thresholds were set empirically. Preliminary results obtained with the PAN dataset collection (Rangel and Rosso 2019) showed that the hybrid approach, with rules and a statistical classifier, works slightly better than using just the rules or the classifier.
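The two-module arrangement described above, with rules applied first and the statistical classifier as a fallback, can be sketched in Python as follows. The feature names, the threshold keys, and the callable `fallback_classifier` are hypothetical, and no threshold values from the study are implied, since the text only states that they were set empirically.

```python
def rule_based_label(features, thresholds):
    """Return "bot" if a generic heuristic fires, otherwise None."""
    # Rule 1: the account's tweets are too similar to one another.
    if features["sequential_similarity"] > thresholds["max_similarity"]:
        return "bot"
    # Rule 2: many hashtags and user references combined with poor vocabulary.
    tags_and_mentions = features["hashtag_ratio"] + features["mention_ratio"]
    if (tags_and_mentions > thresholds["max_tag_mention_ratio"]
            and features["lexical_richness"] < thresholds["min_lexical_richness"]):
        return "bot"
    return None


def hybrid_classify(features, thresholds, fallback_classifier):
    """Apply the heuristics first; defer to the trained classifier otherwise."""
    label = rule_based_label(features, thresholds)
    if label is not None:
        return label
    return fallback_classifier(features)
```

Keeping the rules in a separate function makes it easy to compare the three configurations mentioned in the text: rules only, classifier only, and the hybrid of both.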

Analysis
Based on our data and the referenced literature, we created a set of characteristics for political bots during election campaigns, designed a strategy for identifying them, and indicated which ones are likely to be automated and included in our classifier, in addition to evaluating their effects on the discourse.
In the analyzed election campaign, the bot:non-bot ratio was 0.063%, and bots sent 1.903% of messages. Despite these low percentages, bots were highly active and tweeted an average of 132.30 times, compared to the 4.3 tweets of the average human account. Likewise, the average bot account tweeted 6.30 times per day, compared to 4.31 daily tweets by the non-bot accounts. Political bots were flexible and fast but rarely interacted with previous messages and elicited little interaction from other users (10.96% of posts received likes, 9.95% were retweeted, and none received replies, n = 9466 messages). Those that elicited likes typically received one per post, except during particularly active periods of message repetition, like the "EquiparacionYa" (a protest against the gender pay gap in Spain's security forces) and "Talidomida" campaigns (referencing those harmed by thalidomide, a pharmaceutical drug developed by the German company Grunenthal GmbH, sold in Spain between 1957 and 1963 as a sedative and nausea suppressor and that caused thousands of birth defects). The campaign featured, among others, the following messages: #ElDebateDecisivo #ILPJusapol @jusapol @PSOE @populares @ahorapodemos @CiudadanosCs @vox_es @europapress @EFEnoticias//Los talidomidicos hacen público su voto. Comparte 2 . #Avite #talidomida #28A #28Abril #CampañaElectoral #EleccionesGenerales #YoVotoGrunenthal #28AbrilElecciones #EleccionesGenerales2019 #Elecciones2019 #LaEspañaQueQuieres #110compromisosPSOE https://youtu.be/klCrtCJBkwQ (accessed on 30 June 2021).
Such automated political messages tended to be part of synchronized, planned, goal-oriented mediated campaigns that featured high concentrations of messages during a short period (for example, intense criticism of another party's leader based on a specific act during a short period). Thus, we detected high-frequency tweets concentrated in a specific time interval and normally with a specific common goal. Such was the case in a relatively short time interval with the identified-as-bot account <user name="jucilcantabria">.
The aforementioned account sent the following messages: El nombre es lo de menos, JUSAPOL SOMOS TODOS Estamos en cada rincón de este país y ¡No vamos a parar! #ILPJusapol #EquiparacionYa and similar retweets:// #EquiparacionYa #ILPJusapol @jusapol, eliciting a great number of retweets and likes. 3 The foregoing was part of the creation of an opinion climate linked to astroturfing or the sometimes-artificial creation of a favorable or unfavorable opinion climate. Such climates have low intensity but long duration. They have been addressed in previous studies but go beyond the scope of this paper.
The bot seems to have a single objective, typically support for a certain political party (or, in Spain's case, the left-right ideological blocs that played a major role in the analyzed election campaign), and it strives to achieve its objective by repeating those messages or topics that support it.
We detected five types of bots based on function: megaphone, amplifier, propagation of party platforms, electoral competition, and offline mobilization.
The megaphone function uses frequency to make a party's or bloc's frames and issues more visible.
The last type of bot disseminates calls to offline action and normally responds to mediated campaigns. For example: El día 25 ante la sede del PSOE en las capitales de provincia, para hacerle saber que la equiparación no se ha ejecutado. #EquiparacionYa #ILPJusapol @jusapol 9 .
Though bots normally have but one objective, there are sometimes two. Depending on the issue, bots will tweet about a higher number of issues to achieve their goal or tweet about just one issue with greater frequency.
Based on this set of features and a content analysis of bots and their sociopolitical end game, rather than the network dynamics approach seen in previous research, we came up with a list of devices for framing prolific bots that would simultaneously enable the public at large to detect them without access to big data and allow us to feed our automatic classifier so as to achieve more precise measurements. As stated before, we assumed frames exert significant influence on public opinion and considered the various aspects of communicative elements:
(a) Structural level (syntactic and communicative)
(b) Content level (framing)
Regarding syntax, political bots spread telegraphic messages with similar syntactic structures and no complexities. For example: la fuga de Garrido a @CiudadanosCs no creo q sea beneficioso ni para él, ni para el partido de Rivera; Nadie habla del gobierno d ahora en Portugal con lo cerca q está. No interesa Gobiernan los Socialistas con la izquierda. No hablan, porque están mejorando todos los indicadores Están recuperando el Estado del Bienestar q empezó a destruirlo Tacher Felipe Aznar Caída Muro 10 .
At the communicative level, bots generated little feedback on the network: they tended not to develop threads or refer to previous messages, used denotative language, refrained from irony and double entendre, and normally linked their messages to news articles or statements made by leaders.
Regarding content, we focused on three elements: number of frames, issue-specific frames, and generic frames. A bot tends to use just one frame, as seen in previous examples.
Regarding issue-specific frames, to make them easier to identify, we defined the most common categories of issue-specific frames in bots, avoiding references to issues specifically related to the Spanish elections dealt with in this paper: media reproduction or dissemination (issue focused on an outlet's news piece), reproduction of leadership (issue focused on a political leader's statements), circulation/visibility/repetition of a limited number of issues, with high repetition/circulation of one single issue, hybrid (inclusion of calls to offline action), and partisan repetition (reference to a party).
The most prevalent game frames among bots are those that treat politics like a contest, typically focusing on who wins or loses an election; on the approval or disapproval of various interest groups, districts, or audiences; or on election results, politicians or potential coalitions, and in our case specifically, on the unlikelihood that any party would win an outright majority.
Lastly, in generic frames bots do not define the problem, nor do they interpret its causes or recommend solutions; rather, they tend to offer moral judgments and tend to be unable to build a complete frame. In this way, bots could be skilled, effective frame-transmitters but not builders or managers of complex frames.
Based on this analysis, we came up with a four-phase strategy for the general public to identify bots:
Interaction with the automated message.
This bot detection scheme is summarized in Table 3 below. The higher the score obtained, the more likely the message came from a political bot.
Basing our study on this set of characteristics and a content analysis of the bots and their sociopolitical end game, instead of focusing on network dynamics as in previous studies, we created a series of tools for classifying prolific bots that simultaneously allows the public at large to detect them without access to big data and allows us to feed our automatic classifier so as to achieve more precise measurements. As mentioned before, we assumed that frames exert significant influence on public opinion, and we took into account the various elements of communication.
Table 3 gives some hints on how to identify a bot on the basis of different, easily detectable criteria, each of which carries a relatively low score and can be checked on a relatively small number of messages. The more points a given message or set of messages accumulates, the more likely it is to have come from a bot.
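The additive logic of such a checklist can be sketched as follows. This is only an illustration under stated assumptions: the criterion names are paraphrased from the structural and content features discussed above, and the equal one-point weights are hypothetical, not the items or scores of Table 3.

```python
from typing import Mapping

# Hypothetical criteria and weights, paraphrased from the features described
# in the text; they do NOT reproduce the actual items or scores of Table 3.
CRITERIA = {
    "telegraphic_syntax": 1,
    "no_threads_or_replies": 1,
    "denotative_language_only": 1,
    "links_to_news_or_leader_statement": 1,
    "single_frame": 1,
    "high_repetition_of_one_issue": 1,
}


def bot_score(observations: Mapping[str, bool]) -> int:
    """Sum the weights of every criterion a reader marked as observed.

    Criteria missing from `observations` count as not observed; unknown
    criterion names raise ValueError so typos are not silently ignored.
    """
    unknown = set(observations) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return sum(
        weight
        for name, weight in CRITERIA.items()
        if observations.get(name, False)
    )


# A reader who notices two of the signs in a set of messages.
print(bot_score({"single_frame": True, "telegraphic_syntax": True}))  # 2
```

Under this scheme a higher total simply means that more signs of automation were observed; where to place a cut-off is a separate decision that the sketch deliberately leaves open.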

Discussion and Conclusions
The impetus for this research was the concern over the impact that the use of bots could have on democracy (Hagen et al. 2020). We developed a hybrid detection method that researchers had called for in previous studies. Moreover, we analyzed the use of bots in a specific context, to wit, the 2019 Spanish election campaign, which allowed us to compile a database for future studies, compare data from previous studies, and propose new categories for the analysis of bots.
First, we tackled our technological goal: to improve the detection of political bots by incorporating social science expertise in machine learning, deep learning, and natural language processing systems. We designed and employed a hybrid classifier, equipped with a model trained with annotated datasets and several generic heuristics comprising prior knowledge formalized by experts in the field. Thus, the system is based on the hybrid intelligence paradigm, as it hybridizes machine learning and expert knowledge. As explained in Section 3, the preliminary results obtained through the compilation of PAN datasets (Rangel and Rosso 2019) showed that the hybrid approach, with rules and a statistical classifier, works somewhat better than the rules or the classifier alone. As such, we were able to detect and classify the political bots operating in Spain's 2019 elections, as well as to develop the country's first political bot classifier. Consequently, we were able to overcome the problems that arise upon using classifiers designed for texts written in English (Albadi et al. 2019).
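One simple way to combine expert rules with a statistical classifier is sketched below. It should be read as an illustration of the general idea, not as the architecture of our system: the weighted average, the `rule_weight` parameter, the feature names `tweets_per_day` and `reply_ratio`, and the stand-in model are all assumptions introduced for the example.

```python
from typing import Callable, Sequence

Account = dict
Rule = Callable[[Account], bool]


def rule_fraction(account: Account, rules: Sequence[Rule]) -> float:
    """Share of expert rules that fire for this account (0.0 if no rules)."""
    if not rules:
        return 0.0
    return sum(1 for rule in rules if rule(account)) / len(rules)


def hybrid_score(
    account: Account,
    rules: Sequence[Rule],
    model_probability: Callable[[Account], float],
    rule_weight: float = 0.5,
) -> float:
    """Weighted average of the rule-based signal and the model's probability.

    `rule_weight` is a hypothetical mixing parameter: 0.0 uses the model
    alone, 1.0 uses the rules alone.
    """
    if not 0.0 <= rule_weight <= 1.0:
        raise ValueError("rule_weight must lie in [0, 1]")
    p_rules = rule_fraction(account, rules)
    p_model = model_probability(account)
    return rule_weight * p_rules + (1.0 - rule_weight) * p_model


# Hypothetical heuristics and a stand-in for a trained classifier.
example_rules = [
    lambda a: a["tweets_per_day"] > 50,  # unusually high activity
    lambda a: a["reply_ratio"] < 0.05,   # almost no conversation
]
stand_in_model = lambda a: 0.6           # pretend classifier output

account = {"tweets_per_day": 132, "reply_ratio": 0.0}
print(hybrid_score(account, example_rules, stand_in_model))
```

Setting `rule_weight` to 0.0 or 1.0 recovers the classifier alone or the rules alone, which makes it straightforward to compare the three configurations on the same annotated data.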
The frequency and intensity of the bots we detected resembled those of previous studies (Bessi and Ferrara 2016;Forelle et al. 2015;Schuchard et al. 2019). Nonetheless, we did not detect the intent to engage other users in conversation, as seen in the bots detected in previous studies. Rather, the bots used in Spain's election campaign seemed more geared towards the repetitive dissemination of specific messages than generating interactions or conversations. We detected high-frequency, single-message tweets concentrated in a brief period of time. This idea is consistent with the strategies developed in recent years by Spanish political parties, which seek to increase user engagement (García-Orosa et al. 2017).
To round off the set of bot characteristics proposed by the scientific literature, which has focused primarily on bot dynamics in the Twitter ecosystem, we created a syntactical, communicative, and content-based framework that confirmed that they are governed by a series of inflexible decisions that fail to consider the unpredictability, spontaneity, and deviation from patterns inherent to human thought and behavior (Entman and Usher 2018). We also assumed that frames significantly influence public opinion and considered bots a highly useful and appropriate tool for the dissemination of strategically-designed frames.
Political bots have all the hallmarks of a good frame transmitter due to their frequency, accessibility, and relevance, but above all, because they conceal their true nature as bots and learn from and adapt to pre-existing frames.
In addition to developing the aforementioned classifier, which will improve classifiers in future studies, we detected several trends that increase the threat bots pose to democracy. First, the bots in our study relied on game frames, which distract users from the core message. Moreover, such frames have negative implications for democracy, since they drown out substantive information and reduce the number of politically informed people. Likewise, the use of bots could foment cynicism and is already associated with lower levels of internal efficacy (Pedersen 2012).
Second, the overwhelming presence of a single frame, revolving around a party leader or party and sometimes previously disseminated by other media, confirms bots' ability to draw people's attention to certain issues and create artificial leadership, as indicated in previous studies.
With this acquired knowledge, we were able to design a bot detection tool that combines the technical and formal characteristics of bots with content analysis and, above all, an analysis of the elements that may be linked to the frame and play a marked role in the online manipulation of public opinion.

Limitations
Our research makes simplifying assumptions about online communication, which should be complemented with additional factors and variables of analysis in subsequent studies. Additionally, it would be interesting to apply these results to organized Twitter campaigns analyzed in previous studies.
Though the use of bots is most visible on Twitter, one of the most-used platforms for political communication, it would be beneficial to study other platforms in this fashion.
Subsequent research could expand on this work and test the effectiveness of our tool by compiling various strata of audiences and including other factors. Additionally, a potential subject of study would be the possible interaction between the human receiver and the bot as one of the significant elements in confirming its level of empathy and the likelihood that it is an automated message.
Moreover, though efforts have already been made to increase digital literacy so that the public has the tools to identify forms of computational propaganda and limit their impact (Dubois and McKelvey 2019), we expect these results to be incorporated into an app or web platform designed to assist the public in said identification and counteracting.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: The study did not report any data.

Conflicts of Interest: The authors declare no conflict of interest.
Each line of the file contains the user's and the tweet's ID (one tweet per line). You need both to locate the tweet. The tweet's ID alone is not enough. The tweets' IDs can be found in the following file: https://nextcloud.brunneis.com/index.php/s/qkgy6s4CFHCC9tH (accessed on 30 June 2021).
3 The name is the least important thing. WE ARE ALL JUSAPOL We're in all four corners of this country and we won't stop! #ILPJusapol #ParityNOW
4 Amazing editorial in El Mundo on Sánchez's Attorney General's support of the coup plotters. Sánchez whitewashes coup plotters and besmirches judges-the path to treachery.
5 Widows' and widowers' pensions will rise 4 points. More than 414,000 people will benefit, mostly older women.
6 Action Plan to globalize Spain's economy.
7 Get out and vote for Unidas Podemos to get more votes than the psoe so that when it comes time to form a Government with Sánchez he doesn't slide to the right. A vote for UNidasPod will benefit the until-now sacrificial majority, middle class workers small and large company [sic].
8 VOX as is stands at 37/42. If the secret vote for VOX is greater than 15% it COULD REACH 45/47 that's my prediction.
9 On the 25th in front of the PSOE headquarters in the provincial capitals, to let them know that the equalisation has not been implemented.
10 I don't think Garrido switching to @CiudadanosCs benefits him or Rivera's party; Nobody's talking about Portugal's current government despite how close they are. It doesn't matter the Socialists govern with the left. They don't say anything, because all the indicators are improving. They're getting back the Welfare State that Tacher [Thatcher] Felipe Aznar Fallen Wall started to destroy.