Hybrid Intelligence Strategies for Identifying, Classifying and Analyzing Political Bots

García-Orosa, Berta; Gamallo, Pablo; Martín-Rodilla, Patricia; Martínez-Castaño, Rodrigo

doi:10.3390/socsci10100357


by Berta García-Orosa ¹,*, Pablo Gamallo ², Patricia Martín-Rodilla ³ and Rodrigo Martínez-Castaño ²

¹ Department of Communication Sciences, Universidad de Santiago, 15782 Santiago de Compostela, Spain
² Centro de Investigación en Tecnoloxías da Información, Universidade de Santiago de Compostela, 15705 Santiago de Compostela, Spain
³ Information Retrieval Lab, Centro de Investigación en Tecnoloxías da Información e as Comunicacións (CITIC), Universidade da Coruña, Campus de Elviña s/n CP, 15071 A Coruña, Spain
* Author to whom correspondence should be addressed.

Soc. Sci. 2021, 10(10), 357; https://doi.org/10.3390/socsci10100357

Submission received: 7 June 2021 / Revised: 9 September 2021 / Accepted: 10 September 2021 / Published: 27 September 2021

(This article belongs to the Special Issue Journalism and Politics: New Influences and Dynamics in the Social Media Era)


Abstract

Political bots, through astroturfing and other strategies, have become important players in recent elections in several countries. This study aims to provide researchers and the citizenry with the necessary knowledge to design strategies to identify bots and counteract what international organizations have deemed bots’ harmful effects on democracy and, simultaneously, improve automatic detection of them. This study is based on two innovative methodological approaches: (1) dealing with bots using hybrid intelligence (HI), a multidisciplinary perspective that combines artificial intelligence (AI), natural language processing, political science, and communication science, and (2) applying framing theory to political bots. This paper contributes to the literature in the field by (a) applying framing to the analysis of political bots, (b) defining characteristics to identify signs of automation in Spanish, (c) building a Spanish-language bot database, (d) developing a specific classifier for Spanish-language accounts, (e) using HI to detect bots, and (f) developing tools that enable the everyday citizen to identify political bots through framing.

Keywords:

bots; framing; hybrid intelligence; empowerment; social media

1. Introduction

No longer just software mediators, bots have become important players in various political systems (Salge and Karahanna 2018; Lewis et al. 2019). Since 2010, political parties and governments have spent more than USD 500 million on research and development in this field (Bradshaw and Howard 2018). Their effect on election and referendum results increases each year, with formally organized social media manipulation campaigns in 48 countries, compared to 28 in the previous year (Bradshaw and Howard 2018), though they develop and exert influence differently depending on the context.

Bots’ political influence continues to grow (Montal and Reich 2017) due to their ability to create artificial public opinion and turn non-existent or minority opinions into majority or dominant ones (Ross et al. 2019). The receivers of these messages are defenseless given that they are as yet unable to distinguish real speech from fake speech, which employs deception to conceal its nature (van der Kaa and Krahmer 2014; Waddell 2018; Wölker and Powell 2018; Kušen and Strembeck 2020).

This research aims primarily to provide citizens with the knowledge necessary to create strategies for identifying political bots as a safeguard against their potentially negative impact on democracy, as a tool for achieving equality of opportunities in public debate for all political options, and to improve automated bot detection. We seek to do this by applying hybrid intelligence (HI), which will allow us to combine machine and human intelligence to overcome deficiencies in current artificial intelligence systems (Dellermann et al. 2019; Kamar 2016). The idea is to leverage a multidisciplinary approach through AI, natural language processing, political science, and communication sciences.

We used HI to analyze the bots in Spain’s April 2019 elections, leveraging previous studies on automatic bot detection (Perdana et al. 2015; Morstatter et al. 2016; Ramalingam and Chinnaiah 2018; Gamallo and Almatarneh 2019). HI is a novel idea that has developed a great deal in the past year (e.g., the HI4NLP workshop proposed by two of this paper’s authors and accepted by ECAI 2020, a leading European conference on artificial intelligence). Moreover, it has been applied in other contexts and fields, mostly to tackle problems with a clear social component or goal, like the one at hand in this paper. For example, it has been used in market analyses (Dellermann et al. 2019) and in analyses of big data collected by sensors (e.g., smart cities), thanks to its focus on the Internet of Things (IoT) (Dellermann et al. 2017).

Moreover, we leveraged framing theory, taking frames as a speech-constructing and processing strategy (Pan and Kosicki 1993) that emphasize certain aspects of reality to encourage the desired interpretation thereof (Gitlin 1980; Gamson and Modigliani 1989; Scheufele 1999; Entman and Usher 2018).

We established four specific goals:

Develop Spain’s first account classifier and define characteristics to identify signs of automation. We collected and indexed a massive amount of text from Twitter, analyzing the political bots of the resulting corpus so as to determine their defining characteristics. We measured their effectiveness at the individual and aggregate levels, leveraging various sets of characteristics so as to find the most effective. Among the heuristics explored were those featured in previous studies on Twitter.
Compile Spain’s first bot database for Twitter. The database will allow for a subsequent analysis of bots’ presence in and influence on Spanish public opinion.
Determine the typical characteristics of political bots during political campaigns based on political bots’ profiles and tweets.
Analyze and develop tools for the public at large to identify bots without relying on automated machine detection.

This article contributes to the literature by (a) applying framing to the analysis of political bots, (b) defining a set of characteristics so as to identify signs of automation in Spanish, (c) compiling a Spanish-language bot database, (d) developing a classifier for Spanish-language accounts, (e) applying HI to bot detection, (f) identifying key information for the public at large to identify political bots, and (g) improving automated detection of bots.

Having set forth an introduction, we will now briefly review the relevant literature and describe the perspective we adopted in our research. Then, we will detail the methods and samples used, examine the set of features used by the bot classifier, discuss the results of the bot analysis, and propose a detection tool. Finally, we will discuss trends in the field, as well as the conclusions and limitations of our research.

2. Framework

2.1. Political Bots: Identification and Social Impact

Previous research focused on creating techniques to automatically identify bots by analyzing messaging behavior in the Twitter ecosystem (Schuchard et al. 2019; Badawy et al. 2019; Lai et al. 2019; Wang et al. 2020; Zheng et al. 2019). Bots’ movement throughout Twitter was primarily analyzed so as to improve automatic detection systems. In keeping with Woolley and Howard (2016), political bots were defined as, “[…] the algorithms that operate over social media, written to learn from and mimic real people so as to manipulate public opinion across a diverse range of social media and device networks.”

The most noteworthy results from previous research revolve around four central points: (a) influence on electoral processes, (b) functions, (c) taxonomies, and (d) regulation of political bots.

The literature indicates that between 5% and 25% of Twitter accounts are bots and that they are used more during election campaigns (Keller and Klinger 2019). Additionally, previous studies confirm political bots’ influence on electoral processes in distinct political systems and contexts (e.g., Mexico (Glowacki et al. 2018), Venezuela (Forelle et al. 2015), Chile (Santana and Huerta Cánepa 2019), Colombia (López Urrea et al. 2016), the United Kingdom (Murthy et al. 2016), the United States (Howard and Kollanyi 2017; Frey et al. 2018; Luceri et al. 2019), Ecuador (Puyosa 2017), France (Ferrara 2017), Argentina (Filer and Fredheim 2017), Spain (Campos-Domínguez and García-Orosa 2018), and Russia (Sanovich 2017)). Others performed comparative analyses of different countries (Anelli et al. 2019). Still, developments in bots and the influence they exert have varied from country to country, and some have even sought to promote the use of good bots (McKelvey and Dubois 2017).

Political parties and other political players can track personal data to send campaign ads, determine the ideological makeup of their potential voters, and send personalized messages adapted to voters’ needs in real time with no need for human intervention. Algorithms can reveal the public’s state of mind, opinion, location, ideology, and needs. Moreover, they can be used to build and send, in real time, messages designed to support the sender’s positions and influence each and every type of voter. Bots can even hold conversations with people or amongst themselves. Here, the literature describes astroturfing as a system for creating fake public opinion, highlighting several bot actions (Treré 2016; Bastos and Mercea 2017): (a) pro-government ads, (b) creating fake opinion leaders, (c) delegitimizing systems of government, (d) supporting opposition groups, (e) empowering the public, (f) establishing political agendas and debates, and (g) weakening political dissent.

Taxonomies revolve around the bots’ dynamics (Dagon et al. 2008; Chu et al. 2010; McKelvey and Dubois 2017) or devise specific categories for corpus analysis, as in Stukal et al. (2019), where bots were categorized as pro-regime, anti-regime, or neutral. One of the most thorough taxonomies includes various characteristics: professional news content, professional political content, polarizing and conspiracy content, and other policy news and information (Machado et al. 2018).

The detection of a large number of bots spreading false information and polarizing the political conversation (Bessi and Ferrara 2016) gave rise to various initiatives, an evaluation of changes to legislation, and proposals for intervention. Though some criticize regulation for potentially limiting freedom of speech (Lamo and Calo 2019), many countries and international organizations have recently developed such rules. The European Union warned of the threat to democracy if political parties generate automated messages adapted to the needs of each person based on big data analysis, which may even be manipulated by fake news (European Commission 2018; European Parliament 2017). Additionally, some countries have enacted laws to regulate artificial intelligence and computational propaganda (Italy in 2014, France in 2016, the UK in 2017). In November 2018, the European Commission adopted the Action Plan against Disinformation to minimize disinformation in European elections.

2.2. Our Approaches: Hybrid Intelligence

Our paper uses this context as a starting point, understanding that the sophistication of bots necessitates the development of mixed analysis methods that combine statistical methods with social sciences, which we seek to achieve through HI. Natural language processing and new AI techniques in the field of machine/deep learning (ML/DL) have been used in recent years to detect bots at the account level, processing a large number of social network posts and leveraging information on the network’s structure, temporal dynamics, and sentiment analysis, and even using neural networks in large compilations of text data (Kudugunta and Ferrara 2018; Stukal et al. 2019). Nonetheless, we believe that bots’ increasing complexity necessitates mixed methodologies that combine expert human knowledge with ML/DL and NLP systems.

This approach will allow us to feed the classifier with new information that will improve it and allow for immediate detection of political bots. It will also allow individual users to detect them, even though such users do not typically have access to such technological tools or the ability to visualize a large number of bots.

We tackle bot detection through framing not only because it can connect with the messages’ propagation dynamics and their cascade influence, but also because it is a model that successfully maps out the complexity of thought by paying attention to the elements of greatest significance in a given communicative context.

We assume that frames play an important role in public opinion and that bots can rapidly spread frames (tell people what to think about) so as to, as Entman (2010) points out, monitor public attitudes to influence people’s behavior. This situation, combined with the spiral of silence, could become a powerful weapon for hiding opinions not just because they are a minority position but because they differ from those proposed by bots. Moreover, according to Chong and Druckman (2007), bots have two of the three frame-strengthening elements: frequency, accessibility, and relevance. The influence bots may exert through frame-spreading is portrayed in Figure 1.

Although bots may have initially been geared towards increasing a politician’s non-existent popularity, they are now typically used to spread frames that tend to receivers’ needs. A frame’s success partially revolves around awareness and adaptation of pre-existing frames. As seen in the diagram, this is one of bots’ strengths: namely, the user imbues the bots with data analysis, allowing them to incorporate preexisting frames. If the spiral of silence were applied to political bots, a bot creator could quickly spread new frames consistent with receivers’ preexisting frames so as to transform those initially rare or nonexistent frames into dominant ones.

Here we turn to the conceptualization and operationalization of the frame analysis, as well as to the trends in social network frames that underpin the tools used in our paper’s empirical analysis.

Frames are a key concept in political communication, though they have been used in other fields as well (Bateson 2002; Goffman 1974). Frames can carry out four functions: defining problems, interpreting causes, moral judgments, and recommendations for treatment (Entman 2003). Their success is linked to interaction with individuals’ pre-existing schema, pre-existing frames, and current information (Entman and Usher 2018).

The framing approach presents distinct typologies (Matthes 2009). For our purposes, the relevant typology is that which distinguishes between specific and generic frames (De Vreese 2005). Whereas specific frames revolve around specific topics or isolated events, generic frames transcend thematic limitations because the same issue can be identified in different contexts (De Vreese 2002). A great deal of research has applied such classifications: specific frames in Matthes (2009), Lengauer and Höller (2013), Hänggli and Kriesi (2010), Sheafer and Gabay (2009), and Zhou and Moy (2007), and generic frames in Aalberg et al. (2012), Iyengar (1991), Semetko and Valkenburg (2000), Neuman et al. (1992), and Lengauer and Höller (2013).

In keeping with Aalberg et al.’s (2012) work on generic frames, we dealt with game frames and strategic frames. Game frames take politics as a game that consists of winning or losing, employs bellicose language, and emphasizes polling data. They typically focus on politics in general, legislative debates, the winners and losers of an election, and the battle for public opinion (polls), and speculate on election or political results and potential coalitions. The strategic frame refers to campaign strategies and tactics, motives and instrumental actions, personality, style, and metacoverage. It also includes different media strategies, including news that covers the press’s behavior.

The first generation of research demonstrated that frames can affect public opinion on a wide range of issues (Aarøe 2011), though not to the same degree (Entman 2003; Aarøe 2011; Chong and Druckman 2007). Public actors compete to strengthen certain frames and several factors determine their success or failure in spreading new frames (Chong and Druckman 2007; Hänggli and Hanspeter 2012).

Given social networks’ consolidation as a form of political communication, framing processes, as well as the processes of information production and consumption, among others, need to be reevaluated (Entman and Usher 2018). Digitalization of online framing processes has significantly influenced interpersonal, family, and organizational communication and increased opportunities for extremism and balkanization (Entman and Usher 2018).

Initial studies applied framing theory to blogs and websites (Bichard 2006; Goldman et al. 2010). To date, many framing-based studies of social networks have considered frame propagation dynamics in distinct but inter-related communication ecosystems (Wasike 2013; Aruguete and Calvo 2018).

Twitter has become a major political communication tool, especially during campaign seasons when parties and candidates use it to provide information about their campaign and its events or to link to their website (Jungherr 2016). Previous studies affirmed Twitter’s ability to get out the vote and increase civic engagement (Gainous and Wagner 2014) and to change political commitment (Lee and Park 2013; Grčić et al. 2017). Academic research highlights bi-directionality, interactivity, the capacity for dialogue, and even the promotion of a participatory, deliberative democracy. More importantly, Twitter is an appropriate tool for politicization given its ability to disseminate and circulate messages, in addition to its fostering of public engagement (Utz et al. 2013; Abitbol and Lee 2017; Ji et al. 2018; Painter 2015).

Additionally, our paper considers Twitter a shared communicative space in which democratic deliberation could arise and each political actor could defend their positions. Moreover, we understand that for this to happen, an equal playing field is an important premise. Bots violate that condition by giving too much power to too few. Given that platforms refuse to take responsibility for this situation, the frames spread therein, and the beneficiaries (Entman and Usher 2018), we propose empowering the public to detect bots, promoting horizontal surveillance put into practice by people themselves in a somewhat planned way through cell phones and other devices that allow for reporting and sharing political actions with which they disagree. Our research aims to foster public detection of political bots, whose messages are often indistinguishable from those created by humans. Indeed, much of their success lies in this concealment.

3. Materials and Methods

We proposed a multi-disciplinary approach leveraging artificial intelligence, natural language processing, political science, and communication science. We applied hybrid intelligence to our analysis of political bots from Spain’s April 2019 election, referencing previous studies on automatic bot detection. Below we summarize and explain our extraction methods and our analysis of tweets (Appendix A):

Step 1. Crawler Design.

The Polypus Twitter crawler (Martínez-Castaño et al. 2018b) retrieves tweets in real time from Twitter’s public and anonymous Search API, used by its official web client. The tweets are formatted in HTML so they can be directly inserted in the site’s feed. The API limits the number of returned tweets by returning a sample of the total that matches a given query. Any user of the web interface is expected to receive the same set of tweets when executing the same query within the same time window.

The high-level architecture of the system is shown in Figure 2. The crawler can be configured both for targeted and untargeted searches. The number of threads is configurable so that the queries are equally distributed and processed in parallel. In addition, multiple instances of the crawler can be executed on different machines so that the crawler scales horizontally. To avoid repeated tweets, a distributed memory-based key-value store is used to store the known identifiers for several days.
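The two mechanisms described above, equal distribution of queries across workers and suppression of repeated tweets through a key-value store with a retention period, can be sketched as follows. This is only an illustration under stated assumptions: the class `SeenTweetStore` and the functions `partition_queries` and `deduplicate` are hypothetical names, the store is a single-process in-memory dictionary standing in for the distributed store, and the three-day retention used in the usage note is an arbitrary stand-in for "several days". It is not the Polypus implementation.

```python
import time
from typing import Dict, Iterable, List, Optional


class SeenTweetStore:
    """In-memory stand-in (assumption) for the distributed key-value store of known tweet IDs."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._expiry: Dict[str, float] = {}  # tweet ID -> expiry timestamp

    def add_if_new(self, tweet_id: str, now: Optional[float] = None) -> bool:
        """Record the ID and return True if it was unseen or its entry had expired."""
        now = time.time() if now is None else now
        expires_at = self._expiry.get(tweet_id)
        if expires_at is not None and expires_at > now:
            return False  # duplicate still inside the retention window
        self._expiry[tweet_id] = now + self.ttl_seconds
        return True


def partition_queries(queries: List[str], n_workers: int) -> List[List[str]]:
    """Distribute queries round-robin so each worker receives an equal share (±1)."""
    return [queries[i::n_workers] for i in range(n_workers)]


def deduplicate(tweet_ids: Iterable[str], store: SeenTweetStore, now: float) -> List[str]:
    """Keep only the tweet IDs the store has not seen recently, preserving order."""
    return [tweet_id for tweet_id in tweet_ids if store.add_if_new(tweet_id, now)]
```

For example, `partition_queries(["a", "b", "c", "d", "e"], 2)` assigns three queries to the first worker and two to the second, and a store created with `SeenTweetStore(ttl_seconds=3 * 86400)` accepts a given identifier again only once three days have elapsed.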

The list of queries can be set as the list of the most frequent terms or expressions in a set of languages. With this strategy, the crawler can retrieve huge amounts of tweets without any specific target, although the volume retrieved does not grow linearly with the available resources (due to the aforementioned limitations of Twitter’s API). However, for specific targets such as in this study, the extraction can practically match the actual production of tweets through the use of specific terms, hashtags, or Twitter accounts. Dynamic data (e.g., retweets, number of replies) are collected afterwards, since the tweets are extracted in real time and these attributes do not initially offer useful information about the users’ interaction.

Polypus’s Twitter crawler is now part of a set of social media crawlers integrated into Catenae (Martínez-Castaño et al. 2018a, 2018c), a Python framework for easy design, development, and deployment of stream processing applications with Docker containers.¹

Step 2. Classifier features. Extraction parameters (explained below).

Step 3. Discourse analysis through framing. Tweet content analysis parameters (explained below).

The sample was in keeping with those of other studies (Hedman et al. 2018; Schäfer et al. 2017). In addition to the campaign season, it also included the period between April 15 and May 5, with 575 candidate or political party accounts, as well as the hashtags promoted by those parties. The sample yielded the data shown in Table 1.

Additionally, it combined analysis of parties’ and candidates’ accounts, as well as hashtags, as shown in Table 2.

We manually classified frames based on a sample of 50 accounts identified as bots and selected randomly, using the following categories: message dynamics, interaction on the network (links, retweets, replies), topics, goals, level of message repetition, end game, frequency, and frames, divided along the following lines:

(a): Structural level (syntactic and communicative). At the communicative level, we analyzed feedback in the network, the development of threads and references to previous messages, the use of denotative language, irony and double entendre, and connection to offline messages.
(b): Content level (framing). At the content level, we pinpointed three components: number of frames, issue-specific frames, and generic frames. Lastly, we evaluated to what extent each of these categories can be automated.

For this paper’s context, we chose Spain’s 28 April 2019 elections, which are particularly relevant for several reasons. First is the 74.65% voter participation, much higher than that of the 2016 elections won by the Spanish Socialist Workers’ Party (PSOE in Spanish), at a time in which Spain’s two-party system was splintering due to the emergence of two new parties (Podemos and Ciudadanos). Second, VOX, following its performance in the regional elections in Andalusia, could be anticipated to capture seats in the national parliament as well.

Our study was warranted, moreover, because in Spain we had yet to recognize the extent of computational propaganda given there was only one exploratory study (Campos-Domínguez and García-Orosa 2018), which indicated that in 2019 the political algorithm would finally take hold in the country. With this phenomenon having advanced in recent years, it became necessary to undertake a multi-disciplinary study in Spain that further analyzed the situation and consequences for digital society and the electoral processes of 2019 and 2020.

4. Features of the Classifier

Feature selection is a critical process for classification tasks that rely on traditional machine learning. In this section, we describe the different types of features we used to design and implement a hybrid classifier, leveraging a model trained from annotated datasets and some generic heuristics determined based on prior knowledge formalized by experts in the domain. The system is therefore based on the HI paradigm, since it hybridizes automatic learning with information from experts.

4.1. Features

In order to train a classifier, we defined three types of features: social network, content-based, and lexical features.

4.1.1. Social Network Features

These are specific characteristics of the language used in social networks, consisting of textual elements that can only be found on Twitter:

Ratio of the number of hashtags, i.e., number of hashtags used by a user account divided by total number of tweets sent from that account;
Ratio of the number of retweets;
Ratio of the number of URL links;
Ratio of the number of user references;
Ratio of the number of emojis;
Ratio of the number of textual emoticons;
Ratio of the number of onomatopoeias, e.g., “haha” in English or “jeje” in Spanish;
Ratio of the number of language abbreviations, e.g., “b4” (before) or “btw” (by the way) in English, and “q” (que) or “xq” (porque), in Spanish;
Ratio of the number of alliterations, e.g., repetition of vowel sounds.
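As a rough illustration of how some of the per-account ratios listed above could be computed, the following sketch counts hashtags, user references, URL links, and retweets over an account's tweets and divides each count by the number of tweets. The regular expressions, the `RT @` prefix test for retweets, and the function name `social_ratios` are assumptions made for this example; they are not taken from the classifier described in this paper.

```python
import re
from typing import Dict, List

# Simplified patterns (assumptions); real tweets may need more careful tokenization.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")


def social_ratios(tweets: List[str]) -> Dict[str, float]:
    """Return per-account ratios: occurrences of each element divided by number of tweets."""
    n = len(tweets)
    if n == 0:
        return {"hashtags": 0.0, "mentions": 0.0, "urls": 0.0, "retweets": 0.0}
    return {
        "hashtags": sum(len(HASHTAG.findall(t)) for t in tweets) / n,
        "mentions": sum(len(MENTION.findall(t)) for t in tweets) / n,
        "urls": sum(len(URL.findall(t)) for t in tweets) / n,
        # A tweet is treated as a retweet if it starts with "RT @" (assumption).
        "retweets": sum(t.startswith("RT @") for t in tweets) / n,
    }
```

The remaining ratios (emojis, emoticons, onomatopoeias, abbreviations, alliterations) would follow the same pattern, each with its own detector.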

4.1.2. Content-Based Features

These are features that can be extracted from any text message:

Ratio of the size of tweets;
Ratio of the number of identical pairs of tweets;
Lexical richness, defined as lemma/token ratio;
Similarity between sequential pairs of tweets. To obtain the final similarity ratio associated with a user account, all similarity scores between pairs of sequential tweets are added, and the result is divided by the total number of tweets.

These content-based features were created with just lexical words (i.e., nouns, adjectives, verbs, and adverbs) by making use of PoS tagging so as to identify them.
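The content-based features can be sketched in a few lines. Several simplifications are assumed here and are not part of the described system: lowercased whitespace tokens stand in for lemmas (no PoS filtering or lemmatization is performed), Jaccard overlap stands in for the unspecified similarity measure, and the identical-pairs count is divided by the number of tweets, mirroring the ratio definition given for hashtags. All function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set


def tokens(tweet: str) -> List[str]:
    """Lowercased whitespace tokens, used here in place of lemmas (assumption)."""
    return tweet.lower().split()


def lexical_richness(tweets: List[str]) -> float:
    """Distinct word forms divided by total tokens (approximates the lemma/token ratio)."""
    all_tokens = [tok for t in tweets for tok in tokens(t)]
    return len(set(all_tokens)) / len(all_tokens) if all_tokens else 0.0


def identical_pair_ratio(tweets: List[str]) -> float:
    """Number of identical tweet pairs divided by the number of tweets (assumed denominator)."""
    if not tweets:
        return 0.0
    identical = sum(a == b for a, b in combinations(tweets, 2))
    return identical / len(tweets)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two token sets (stand-in similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sequential_similarity(tweets: List[str]) -> float:
    """Sum of similarities between consecutive tweets, divided by the number of tweets."""
    if not tweets:
        return 0.0
    sets = [set(tokens(t)) for t in tweets]
    total = sum(jaccard(x, y) for x, y in zip(sets, sets[1:]))
    return total / len(tweets)
```

An account that repeats the same message many times would score high on `sequential_similarity` and `identical_pair_ratio` and low on `lexical_richness`.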

4.1.3. Lexical Features

Lexical features were derived from several domain-specific lexicons; in particular, two different weighted lexicons were automatically built for each language:

A human/bot lexicon consisting of specific words belonging to two classes: the language of bots and the language of humans in Twitter;
A sentiment lexicon consisting of polarity words (positive or negative) used by bots or humans.

Each lexicon was built by making use of the annotated corpora provided by the PAN Shared Task organizers and a ranking algorithm defined in Almatarneh and Gamallo (2018). As in the case of content-based features, only words tagged as nouns, verbs, adjectives, and adverbs were considered.

In order to find the best feature configuration in a classification task, we used a Bayesian algorithm. In addition to its simplicity and efficiency, Naive Bayes performs well in this type of task, as described in Alarifi et al. (2016), where the Bayesian classifier obtained the best results in the bot/human classification. Our classifier was implemented with the Naive Bayes Perl module (https://metacpan.org/pod/Algorithm::NaiveBayes, accessed on 30 June 2021). In order to lemmatize and identify lexical PoS tags, tweets were processed using the multilingual toolkit LinguaKit (Gamallo et al. 2018). The classifier was trained with the dataset provided by the PAN Shared Task.
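To make the classification step concrete, the following is a minimal Naive Bayes sketch with Laplace smoothing written in Python. It is not the Perl module used in this study, and it assumes that each account has already been reduced to a list of discrete feature names (for instance, by binning the ratios described above); the feature names in the usage note are invented for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class NaiveBayes:
    """Multinomial Naive Bayes over discrete feature names (illustrative sketch)."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocabulary: Set[str] = set()

    def train(self, examples: Iterable[Tuple[List[str], str]]) -> None:
        """Accumulate per-class example counts and per-class feature counts."""
        for features, label in examples:
            self.class_counts[label] += 1
            self.feature_counts[label].update(features)
            self.vocabulary.update(features)

    def predict(self, features: List[str]) -> str:
        """Return the label with the highest log posterior under add-one smoothing."""
        if not self.class_counts:
            raise ValueError("classifier has not been trained")
        total_examples = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = "", -math.inf
        for label, n_examples in self.class_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            score = math.log(n_examples / total_examples)  # log prior
            for feature in features:
                # Laplace smoothing keeps unseen features from zeroing the product.
                score += math.log((counts[feature] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Trained on a handful of labelled accounts such as `(["many_hashtags", "high_similarity"], "bot")` and `(["few_hashtags", "high_richness"], "human")`, the classifier assigns a new account to whichever class makes its observed features most probable.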

4.2. Heuristics

Our hybrid approach features a system with two modules: a rule-based module consisting of generic heuristics defined with expert knowledge and the Bayes classifier developed with the features described above. The generic heuristics are applied before the Bayes classifier.

The generic heuristics use some of the features defined above; for instance, a user is considered a bot if the similarity of its tweets is above a given threshold, or if the number of hashtags and user references is very high yet the lexical richness is very low. Thresholds were set empirically. Preliminary results obtained with the PAN dataset collection (Rangel and Rosso 2019) showed that the hybrid approach, with rules and a statistical classifier, works slightly better than using just the rules or the classifier.
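The rules-before-classifier arrangement can be sketched as follows. The threshold values, the `AccountFeatures` fields, and the choice to add the hashtag and user-reference ratios together are all hypothetical; the text states only that thresholds were set empirically, so the numbers below are placeholders rather than the values used in the study.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AccountFeatures:
    similarity: float        # similarity between sequential tweets
    hashtag_ratio: float     # hashtags per tweet
    mention_ratio: float     # user references per tweet
    lexical_richness: float  # lemma/token ratio


# Placeholder thresholds (assumptions, not the empirically set values).
SIMILARITY_MAX = 0.8
MARKUP_MIN = 3.0
RICHNESS_MIN = 0.3


def rule_based_label(f: AccountFeatures) -> Optional[str]:
    """Return "bot" when a generic heuristic fires, otherwise None."""
    if f.similarity > SIMILARITY_MAX:
        return "bot"
    heavy_markup = f.hashtag_ratio + f.mention_ratio > MARKUP_MIN
    if heavy_markup and f.lexical_richness < RICHNESS_MIN:
        return "bot"
    return None


def hybrid_classify(f: AccountFeatures, fallback: Callable[[AccountFeatures], str]) -> str:
    """Apply the heuristics first; defer to the statistical classifier if none fires."""
    label = rule_based_label(f)
    return label if label is not None else fallback(f)
```

Here `fallback` stands for any trained statistical classifier, such as a Naive Bayes model, wrapped as a function from features to a label.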

5. Analysis

Based on our data and the referenced literature, we created a set of characteristics for political bots during election campaigns, designed a strategy for identifying them, and indicated which ones are likely to be automated and included in our classifier, in addition to evaluating their effects on the discourse.

In the analyzed election campaign, the bot:non-bot ratio was 0.063%, and bots sent 1.903% of messages. Nonetheless, despite these low percentages, bots were highly active and tweeted an average of 132.30 times, compared to the 4.3 tweets of the average human account. Likewise, the average bot account tweeted 6.30 times per day, compared to 4.31 daily tweets by the non-bot accounts. Political bots were flexible and fast but rarely interacted with previous messages and elicited little interaction from other users (10.96% of posts received likes, 9.95% were retweeted, and none received replies, n = 9466 messages). Those that elicited likes typically received one per post, except during particularly active periods of message repetition, like the “EquiparacionYa” (a protest against the gender pay gap in Spain’s security forces) and “Talidomida” campaigns (referencing those harmed by thalidomide, a pharmaceutical drug developed by the German company Grunenthal GmbH, sold in Spain between 1957 and 1963 as a sedative and nausea suppressor and that caused thousands of birth defects). The campaign featured, among others, the following messages:

#ElDebateDecisivo #ILPJusapol @jusapol @PSOE @populares @ahorapodemos @CiudadanosCs @vox_es @europapress @EFEnoticias//Los talidomidicos hacen público su voto. Comparte.² #Avite #talidomida #28A #28Abril #CampañaElectoral #EleccionesGenerales #YoVotoGrunenthal #28AbrilElecciones #EleccionesGenerales2019 #Elecciones2019 #LaEspañaQueQuieres #110compromisosPSOE https://youtu.be/klCrtCJBkwQ (accessed on 30 June 2021).
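The activity figures reported for this campaign can be cross-checked with a short calculation. The sketch below assumes that the 0.063% figure is the ratio of bot accounts to non-bot accounts and that 132.30 and 4.3 are average tweets per account over the sample; under that reading, the implied share of messages sent by bots comes out at about 1.9%, close to the reported 1.903%. The function name is hypothetical.

```python
def bot_message_share(bot_to_human_accounts: float,
                      avg_bot_tweets: float,
                      avg_human_tweets: float) -> float:
    """Share of all messages sent by bots, per one human account's worth of traffic."""
    bot_volume = bot_to_human_accounts * avg_bot_tweets
    return bot_volume / (bot_volume + avg_human_tweets)


# Figures reported in the analysis: 0.063% account ratio, 132.30 vs. 4.3 tweets on average.
share = bot_message_share(0.00063, 132.30, 4.3)
print(f"Implied bot message share: {share:.2%}")
```

The small gap between the implied and reported shares is consistent with rounding in the published averages.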

Such automated political messages tended to be part of synchronized, planned, goal-oriented mediated campaigns that featured high concentrations of messages during a short period (for example, intense criticism of another party’s leader based on a specific act during a short period). Thus, we detected high-frequency tweets concentrated in a specific time interval and normally with a specific common goal. Such was the case, in a relatively short time interval, with the account “jucilcantabria”, which was identified as a bot.

The aforementioned account sent the following messages:

El nombre es lo de menos, JUSAPOL SOMOS TODOS Estamos en cada rincón de este país y ¡No vamos a parar! #ILPJusapol #EquiparacionYa and similar retweets://#EquiparacionYa #ILPJusapol @jusapol, eliciting a great number of retweets and likes.³

The foregoing was part of the creation of an opinion climate linked to astroturfing or the sometimes-artificial creation of a favorable or unfavorable opinion climate. Such climates have low intensity but long duration. They have been addressed in previous studies but go beyond the scope of this paper.

The bot seems to have a single objective, typically support for a certain political party (or, in Spain’s case, the left–right ideological blocs that played a major role in the analyzed election campaign), and it strives to achieve its objective by repeating those messages or topics that support it.

We detected five types of bots based on function: megaphone, amplifier, propagation of party platforms, electoral competition, and offline mobilization.

The megaphone function uses frequency to make a party’s or bloc’s frames and issues more visible.

The amplifier does not provide its own discourse, but rather it links to previous messages, primarily the media’s. For example:

Extraordinario Editorial de El Mundo (5/5/2019) sobre el apoyo de la Fiscalía de Sánchez a los golpistas. Sánchez-blanqueador de golpistas y ennegrecedor de Jueces-camino de la traición. #España #PP #PSOE #Cs #UP #Vox #Cataluña #BCN2019 #PorEspaña #26M #HablamosEspañol pic.twitter.com/92DcSnUwWT4

The third type offers up the party’s platform in a distinct message. For example:

body 1: @2Estela #VotaPSOE Las pensiones de viudedad aumentaran 4 puntos. Se beneficiarán más de 414.000 personas, en su mayoría mujeres mayores.5 #HazQuePase #28A #LaEspañaQueQuieres #110CompromisosPSOE #PSOEPonienteSur #CórdobaESP https://pst.cr/6jrZVpic.twitter.com/KWtceL1JX0 (accessed on 30 June 2021); body 2: @AceitesCanoliva #HazQuePase Plan de Acción 2019-20 de internacionalización de la economía española.6 #28A #VotaPSOE #LaEspañaQueQuieres #110CompromisosPSOE #PSOEPonienteSur #CórdobaESP https://pst.cr/4KPakpic.twitter.com/DTGQtb9n26 (accessed on 30 June 2021)

The election competition function mentions the electoral contest to secure votes and is used mostly by election bots. For example:

Llenemos las urnas de votos a Unidas Podemos para que tenga más votos que psoe y a la hora d formar Gobierno con Sanchez no se deslice éste hacia la derecha El voto a UNidasPod beneficiará así a la mayoría hasta ahora sacrificada, trabajadores clase media pequeña y grande empresa7 o VOX sin cocinar 37/42. si el voto oculto es mayor del 15% para VOX … PUEDE LLEGAR A 45/47 este es mi pronóstico8.

The last type of bot disseminates calls to offline action and normally responds to mediated campaigns. For example:

El día 25 ante la sede del PSOE en las capitales de provincia, para hacerle saber que la equiparación no se ha ejecutado. #EquiparacionYa #ILPJusapol @jusapol9.

Though bots normally have but one objective, there are sometimes two. Depending on the issue, bots will tweet about a higher number of issues to achieve their goal or tweet about just one issue with greater frequency.

Based on this set of features and a content analysis of bots and their sociopolitical end game, rather than the network dynamics approach seen in previous research, we came up with a list of devices for framing prolific bots that would simultaneously enable the public at large to detect them without access to big data and allow us to feed our automatic classifier so as to achieve more precise measurements. As stated before, we assumed frames exert significant influence on public opinion and considered the various aspects of communicative elements:

(a) Structural level (syntactic and communicative)
(b) Content level (framing)

Regarding syntax, political bots spread telegraphic messages with similar syntactic structures and no complexities. For example:

la fuga de Garrido a @CiudadanosCs no creo q sea beneficioso ni para él, ni para el partido de Rivera; Nadie habla del gobierno d ahora en Portugal con lo cerca q está. No interesa Gobiernan los Socialistas con la izquierda. No hablan, porque están mejorando todos los indicadores Están recuperando el Estado del Bienestar q empezó a destruirlo Tacher Felipe Aznar Caída Muro10.

At the communicative level, the bots generated little feedback on the network: they tended not to develop threads or refer to previous messages, they used denotative language, and they refrained from irony and double entendre. Their messages were normally linked to news articles or statements made by leaders.

Regarding content, we focused on three elements: number of frames, issue-specific frames, and generic frames. A bot tends to use just one frame, as seen in previous examples.

Regarding issue-specific frames, to make them easier to identify, we defined the most common categories found in bots, avoiding references to issues tied specifically to the Spanish elections dealt with in this paper: media reproduction or dissemination (focused on an outlet’s news piece), reproduction of leadership (focused on a political leader’s statements), circulation/visibility/repetition (a limited number of issues, with one single issue repeated and circulated heavily), hybrid (inclusion of calls to offline action), and partisan repetition (reference to a party).

The most prevalent game frames among bots are those that treat politics like a contest, typically focusing on who wins or loses an election; on the approval or disapproval of various interest groups, districts, or audiences; or on election results, politicians or potential coalitions, and in our case specifically, on the unlikelihood that any party would win an outright majority.

Lastly, in generic frames bots do not define the problem, nor do they interpret its causes or recommend solutions; rather, they tend to offer moral judgments and tend to be unable to build a complete frame. In this way, bots could be skilled, effective frame-transmitters but not builders or managers of complex frames.

Based on this analysis, we came up with a four-phase strategy for the general public to identify bots:

Identify a tweet’s syntactic features;
Identify its communicative features;
Analyze the frames used: (1) frequency, (2) issue-specific frames, (3) generic frames;
Interact with the automated message.

This bot detection scheme is summarized in Table 3 below. The higher the score obtained, the more likely the message came from a political bot.

Basing our study on this set of characteristics and a content analysis of the bots and their sociopolitical end game, instead of focusing on network dynamics as in previous studies, we created a series of tools for classifying prolific bots that simultaneously allows the public at large to detect them without access to big data and allows us to feed our automatic classifier so as to achieve more precise measurements. As mentioned before, we assumed that frames exert significant influence on public opinion, and we took into account the various elements of communication.

Table 3 gives some hints on how to identify a bot on the basis of different criteria that are easily detectable, with a relatively low score for a relatively low number of messages. The more points a given message or set of messages accumulates, the more likely it is to be identified as a bot.
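The point-accumulation idea can be sketched in a few lines. The criterion labels below paraphrase the syntactic, communicative, and framing features discussed in this section, but the one-point-per-criterion weighting is an assumption made for illustration and does not reproduce the scores in Table 3.

```python
# Hypothetical checklist: each criterion observed adds one point.
# The labels paraphrase features discussed in the text; the equal weights
# are an assumption and are NOT the values given in Table 3.
CRITERIA = (
    "telegraphic_syntax",       # short messages with simple, repeated structure
    "no_thread_or_reply",       # does not develop threads or refer to earlier messages
    "denotative_language",      # no irony or double entendre
    "links_to_news_or_leader",  # points to a news item or a leader's statement
    "single_frame",             # only one frame is used
    "high_frequency",           # same issue repeated within a short period
)


def bot_score(observed):
    """Count how many checklist criteria were observed for a message or account."""
    unknown = set(observed) - set(CRITERIA)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    return len(set(observed))


observations = {"telegraphic_syntax", "single_frame", "high_frequency"}
print(bot_score(observations), "of", len(CRITERIA))
```

A higher count suggests a more bot-like message; where to place a cut-off is left open here, since that would depend on the actual weights and on validation against labeled accounts.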

6. Discussion and Conclusions

The impetus for this research was the concern over the impact that the use of bots could have on democracy (Hagen et al. 2020). We developed a hybrid detection method that researchers had called for in previous studies. Moreover, we analyzed the use of bots in a specific context, to wit, the 2019 Spanish election campaign, which allowed us to compile a database for future studies, compare data from previous studies, and propose new categories for the analysis of bots.

First, we tackled our technological goal: to improve the detection of political bots by incorporating social science expertise into machine learning, deep learning, and natural language processing systems. We designed and employed a hybrid classifier that combines a model trained on annotated datasets with several generic heuristics encoding prior knowledge formalized by experts in the field. Thus, the system is based on the hybrid intelligence paradigm, as it hybridizes machine learning and expert knowledge. As explained in Section 3, the preliminary results obtained through the compilation of PAN datasets (Rangel and Rosso 2019) showed that the hybrid approach, with rules and a statistical classifier, works somewhat better than the rules or the classifier alone. As such, we were able to detect and classify the political bots operating in Spain’s 2019 elections, as well as to develop the country’s first political bot classifier. Consequently, we were able to overcome the problems that arise when using classifiers designed for texts written in English (Albadi et al. 2019).
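To make the idea of hybridizing rules and a statistical classifier concrete, the sketch below blends a hand-written heuristic with the score of a model. It is not the classifier described in Section 3: the repetition rule, the `toy_model` stand-in for a trained classifier, and the weighted-average combination are all assumptions chosen for brevity.

```python
def rule_score(tweets):
    """Expert heuristic (hypothetical): share of tweets that are exact repeats."""
    if not tweets:
        return 0.0
    return 1.0 - len(set(tweets)) / len(tweets)


def toy_model(tweets):
    """Stand-in for a trained statistical classifier (hypothetical).

    Returns a pseudo-probability of 'bot': the share of tokens that are hashtags.
    """
    tokens = [tok for tweet in tweets for tok in tweet.split()]
    if not tokens:
        return 0.0
    return sum(tok.startswith("#") for tok in tokens) / len(tokens)


def hybrid_score(tweets, rule_weight=0.5):
    """Weighted average of the rule-based and model-based scores."""
    return rule_weight * rule_score(tweets) + (1.0 - rule_weight) * toy_model(tweets)


# Toy account that posts the same two hashtags four times.
account = ["#EquiparacionYa #ILPJusapol"] * 4
print(hybrid_score(account))
```

In practice the stand-in would be replaced by a model trained on annotated accounts, and the way the two signals are combined is itself a design decision to be evaluated.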

The frequency and intensity of the bots we detected resembled those of previous studies (Bessi and Ferrara 2016; Forelle et al. 2015; Schuchard et al. 2019). Nonetheless, we did not detect the intent to engage other users in conversation, as seen in the bots detected in previous studies. Rather, the bots used in Spain’s election campaign seemed more geared towards the repetitive dissemination of specific messages than generating interactions or conversations. We detected high-frequency, single-message tweets concentrated in a brief period of time. This idea is consistent with the strategies developed in recent years by Spanish political parties, which seek to increase user engagement (García-Orosa et al. 2017).

To round off the set of bot characteristics proposed by the scientific literature, which has focused primarily on bot dynamics in the Twitter ecosystem, we created a syntactical, communicative, and content-based framework that confirmed that they are governed by a series of inflexible decisions that fail to consider the unpredictability, spontaneity, and deviation from patterns inherent to human thought and behavior (Entman and Usher 2018). We also assumed that frames significantly influence public opinion and considered bots a highly useful and appropriate tool for the dissemination of strategically-designed frames.

Political bots have all the makings of a good frame transmitter due to their frequency, accessibility, and relevance, but above all, because they conceal their true nature as bots and learn from and adapt to pre-existing frames.

In addition to developing the aforementioned classifier, which will improve classifiers in future studies, we detected several trends that increase the threat bots pose to democracy. First, the bots in our study focused on problems in the game frames that distract users from the core message. Moreover, they have negative implications for democracy since they drown out and reduce the number of politically informed people. Likewise, the use of bots could foment cynicism and is already associated with lower levels of internal efficacy (Pedersen 2012).

Second, the overwhelming presence of a single frame, revolving around a party leader or party and sometimes previously disseminated by other media, confirms bots’ ability to draw people’s attention to certain issues and create artificial leadership, as indicated in previous studies.

With this acquired knowledge, we were able to design a bot detection tool that combines the technical and formal characteristics of bots with content analysis and, above all, an analysis of the elements that may be linked to the frame and play a marked role in the online manipulation of public opinion.

7. Limitations

Our research makes simplifying assumptions about online communication, which should be complemented with additional factors and variables of analysis in subsequent studies. Additionally, it would be interesting to apply these results to organized Twitter campaigns analyzed in previous studies.

Though the use of bots is most visible on Twitter, one of the most-used platforms for political communication, it would be beneficial to study other platforms in this fashion.

Subsequent studies could build on this research and test the effectiveness of our tool by sampling various audience strata and including other factors. Additionally, a potential subject of study would be the interaction between the human receiver and the bot as a significant element in confirming its level of empathy and the likelihood that a message is automated.

Moreover, though efforts have already been made to increase digital literacy so that the public has the tools to identify forms of computational propaganda and limit their impact (Dubois and McKelvey 2019), we expect these results to be incorporated into an app or web platform designed to assist the public in said identification and counteracting.

Lastly, much attention should be paid to ongoing innovation in the automation of information.

Author Contributions

Conceptualization, B.G.-O.; Data curation, P.G.; Formal analysis, B.G.-O., P.G., P.M.-R. and R.M.-C.; Investigation, B.G.-O.; Methodology, B.G.-O., P.G., P.M.-R. and R.M.-C.; Resources, P.M.-R. and R.M.-C.; Software, P.G., P.M.-R. and R.M.-C.; Writing—original draft, B.G.-O., P.G., P.M.-R. and R.M.-C.; Writing—review & editing, B.G.-O., P.G., P.M.-R. and R.M.-C. All authors have read and agreed to the published version of the manuscript.

Funding

This article has been developed within the research project “Digital Native Media in Spain: Storytelling Formats and Mobile Strategy” (RTI2018–093346-B-C33) funded by the Ministry of Science, Innovation, and Universities and co-funded by the European Regional Development Fund (ERDF) and has received financial support from DOMINO project (PGC2018-102041-B-I00, MCIU/AEI/FEDER, UE), eRisk project (RTI2018-093336-B-C21), the Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2016–2019, ED431G/08, Groups of Reference: ED431C 2020/21, and ERDF 2014-2020: Call ED431G 2019/04) and the European Regional Development Fund (ERDF).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The study did not report any data.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

To consult the repository, please visit: https://github.com/polypus-firehose (accessed on 30 June 2021).

Each line of the file contains the user’s and the tweet’s ID (one tweet per line). You need both to locate the tweet. The tweet’s ID alone is not enough. The tweets’ IDs can be found in the following file: https://nextcloud.brunneis.com/index.php/s/qkgy6s4CFHCC9tH (accessed on 30 June 2021).

user_id, tweet_id

AlfredoCelso, 1119894989327282182

mimundin, 1119894985971765248

…

The URLs should follow this format:

https://twitter.com/user_id/status/tweet_id

For example, for the previous two:

https://twitter.com/AlfredoCelso/status/1119894989327282182 (accessed on 30 June 2021).

https://twitter.com/mimundin/status/1119894985971765248 (accessed on 30 June 2021).
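The conversion from file lines to URLs can be automated. The sketch below assumes each data line has the `user_id, tweet_id` form shown above, preceded by a header row; the function name `tweet_url` is illustrative.

```python
def tweet_url(line):
    """Build a tweet URL from one 'user_id, tweet_id' line of the file."""
    user_id, tweet_id = (part.strip() for part in line.split(","))
    return f"https://twitter.com/{user_id}/status/{tweet_id}"


lines = [
    "user_id, tweet_id",  # header row, skipped below
    "AlfredoCelso, 1119894989327282182",
    "mimundin, 1119894985971765248",
]

for line in lines[1:]:
    print(tweet_url(line))
```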

Notes

1	https://github.com/catenae (accessed on 30 June 2021).
2	People affected by thalidomide make their vote known. Pass it on.
3	The name is the least important thing. WE ARE ALL JUSAPOL We’re in all four corners of this country and we won’t stop! #ILPJusapol #ParityNOW.
4	Amazing editorial in El Mundo on Sánchez’s Attorney General’s support of the coup plotters. Sánchez whitewashes coup plotters and besmirches judges—the path to treachery.
5	Widows’ and widowers’ pensions will rise 4 points. More than 414,000 people will benefit, mostly older women.
6	Action Plan to globalize Spain’s economy.
7	Get out and vote for Unidas Podemos to get more votes than the psoe so that when it comes time to form a Government with Sánchez he doesn’t slide to the right. A vote for UNidasPod will benefit the until-now sacrificial majority, middle class workers small and large company [sic].
8	VOX as is stands at 37/42. If the secret vote for VOX is greater than 15% it COULD REACH 45/47 that’s my prediction.
9	On the 25th in front of the PSOE headquarters in the provincial capitals, to let them know that the equalisation has not been implemented.
10	I don’t think Garrido switching to @CiudadanosCs benefits him or Rivera’s party; Nobody’s talking about Portugal’s current government despite how close they are. It doesn’t matter the Socialists govern with the left. They don’t say anything, because all the indicators are improving. They’re getting back the Welfare State that Tacher [Thatcher] Felipe Aznar Fallen Wall started to destroy.

References

Aalberg, Toril, Jesper Strömbäck, and Claes De Vreese. 2012. The framing of politics as strategy and game: A review of concepts, operationalizations and key findings. Journalism 13: 162–78.
Aarøe, Lene. 2011. Investigating frame strength: The case of episodic and thematic frames. Political Communication 28: 207–26.
Abitbol, Alan, and S. Y. Lee. 2017. Messages on CSR-dedicated Facebook pages: What works and what doesn’t. Public Relations Review 43: 796–808.
Alarifi, Abdulrahman, Mansour Alsaleh, and AbdulMalik Al-Salman. 2016. Twitter turing test: Identifying social machines. Information Sciences 372: 332–46.
Albadi, Nuha, Maram Kurdi, and Shivakant Mishra. 2019. Hateful people or hateful bots? Detection and characterization of bots spreading religious hatred in Arabic social media. Proceedings of the ACM on Human-Computer Interaction 3: 1–25.
Almatarneh, Sattam, and Pablo Gamallo. 2018. A lexicon based method to search for extreme opinion. PLoS ONE 13: e197816.
Anelli, Massimo, Italo Colantone, and Piero Stanig. 2019. We Were The Robots: Automation and Voting Behavior in Western Europe. BAFFI CAREFIN Centre Research Paper No. 2019-115. Available online: https://doi.org/10.2139/ssrn.3419966 (accessed on 30 June 2021).
Aruguete, Natalia, and Ernesto Calvo. 2018. Time to #protest: Selective exposure, cascading activation, and framing in social media. Journal of Communication 68: 480–502.
Badawy, Adam, Aseel Addawood, Kristina Lerman, and Emilio Ferrara. 2019. Characterizing the 2016 Russian IRA influence campaign. Social Network Analysis and Mining 9: 1–11.
Bastos, Marco T., and Dan Mercea. 2017. The Brexit Botnet and User-Generated Hyperpartisan News. Social Science Computer Review 10: 38–54.
Bateson, Gregory. 2002. Steps to an Ecology of Mind: Collected Essays in Anthropology, Psychiatry, Evolution, and Epistemology. Chicago: University of Chicago Press.
Bessi, Alessandro, and Emilio Ferrara. 2016. Social bots distort the 2016 U.S. Presidential election online discussion. First Monday 21: 11.
Bichard, Shannon. 2006. Building Blogs: A Multi-Dimensional Analysis of the Distribution of Frames on the 2004 Presidential Candidate Web Sites. Journalism and Mass Communication Quarterly 83: 329–45.
Bradshaw, Samantha, and Philip Howard. 2018. Challenging Truth and Trust: A Global Inventory of Organized Social Media Manipulation. Oxford: University of Oxford, p. 26.
Campos-Domínguez, Eva, and Berta García-Orosa. 2018. Algorithmic communication and political parties: Automation of production and flow of messages. Profesional de La Informacion 27: 769–77.
Chong, Dennis, and James Druckman. 2007. Framing public opinion in competitive democracies. American Political Science Review 101: 637–55.
Chu, Zi, Steven Gianvecchio, Haining Wang, and Sushil Jajodia. 2010. Who is tweeting on twitter: Human, bot, or cyborg? Paper presented at the Annual Computer Security Applications Conference, Austin, TX, USA, December 6–10.
Dagon, David, Guofei Gu, and Christopher Lee. 2008. A taxonomy of Botnet structures. Advances in Information Security 36: 143–64.
De Vreese, Claes. 2002. Framing Europe: Television News and European Integration. Amsterdam: Aksant.
De Vreese, Claes. 2005. News framing: Theory and typology. Information Design Journal 13: 51–62.
Dellermann, Dominik, Nikolaus Lipusch, Philipp Ebel, and Jan Marco Leimeister. 2017. Building Your IoT Ecosystem: Proposing the Hybrid Intelligence Accelerator. Paper presented at the European Workshop on Software Ecosystems 2017, Darmstadt, Germany, December 17.
Dellermann, Dominik, Philipp Ebel, Matthias Söllner, and Jan Marco Leimeister. 2019. Hybrid Intelligence. Business and Information Systems Engineering 61: 637–43.
Dubois, Elizabeth, and Fenwick McKelvey. 2019. Political bots: Disrupting Canada’s Democracy. Canadian Journal of Communication Policy Portal 44: 27–34.
Entman, Robert. 2003. Cascading Activation: Contesting the White House’s Frame after 9/11. Political Communication 20: 415–32.
Entman, Robert. 2010. Media framing biases and political power: Explaining slant in news of Campaign 2008. Journalism 11: 389–408.
Entman, Robert, and Nikki Usher. 2018. Framing in a Fractured Democracy: Impacts of Digital Technology on Ideology, Power and Cascading Network Activation. Journal of Communication 68: 298–308.
European Commission. 2018. Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions. Tackling Online Disinformation: A European Approach. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A52018DC0236 (accessed on 30 June 2021).
European Parliament. 2017. European Parliament Resolution of 14 March 2017 on Fundamental Rights Implications of Big Data: Privacy, Data Protection, Non-Discrimination, Security and Law-Enforcement. Available online: https://www.europarl.europa.eu/doceo/document/TA-8-2017-0076_EN.html (accessed on 30 June 2021).
Ferrara, Emilio. 2017. Disinformation and Social Bots Operations in the Run Up to the 2017 French Presidential Election. arXiv:1707.00086.
Filer, Tanya, and Rolf Fredheim. 2017. Popular with the Robots: Accusation and Automation in the Argentine Presidential Elections, 2015. International Journal of Politics, Culture and Society 30: 259–74.
Forelle, Michelle, Phillip Howard, Andres Monroy-Hernandez, and Saiph Savage. 2015. Political Bots and the Manipulation of Public Opinion in Venezuela. Available online: https://doi.org/10.2139/ssrn.2635800 (accessed on 30 June 2021).
Frey, Carl Benedikt, Thor Berger, and Chinchih Chen. 2018. Political machinery: Did robots swing the 2016 US presidential election? Oxford Review of Economic Policy 34: 418–42.
Gainous, Jason, and Kevin Wagner. 2014. Tweeting to Power: The Social Media Revolution in American Politics. Public Opinion Quarterly 78: 1026–28.
Gamallo, Pablo, Marcos Garcia, César Piñeiro, Rodrigo Martínez-Castaño, and Juan Pichel. 2018. LinguaKit: A Big Data-Based Multilingual Tool for Linguistic Analysis and Information Extraction. Paper presented at the 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), Valencia, Spain, October 15–18.
Gamallo, Pablo, and Sattam Almatarneh. 2019. Naive-Bayesian Classification for Bot Detection in Twitter. Paper presented at the Conference and Labs of the Evaluation Forum 2019, Lugano, Switzerland, September 9–12.
Gamson, William, and Andre Modigliani. 1989. Media discourse and public opinion on nuclear power: A constructionist approach. American Journal of Sociology 95: 1–35.
García-Orosa, Berta, Pablo Vázquez-Sande, and Xosé López-García. 2017. Digital narratives of the major political parties of Spain, France, Portugal and the United States. El Profesional de La Información 26: 4.
Gitlin, Todd. 1980. The Whole World Is Watching: Mass Media in the Making and Unmaking of the New Left. Berkeley: University of California Press.
Glowacki, Monika, Vidya Narayanan, Sam Maynard, Gustavo Hirsch, Bence Kollanyi, Lisa-Maria Neudert, Phil Howard, Thomas Lederer, and Vlad Barash. 2018. News and Political Information Consumption in Mexico: Mapping the 2018 Mexican Presidential Election on Twitter and Facebook. pp. 1–6. Available online: https://demtech.oii.ox.ac.uk/wp-content/uploads/sites/93/2018/06/Mexico2018.pdf (accessed on 30 June 2021).
Goffman, Erving. 1974. Frame Analysis: An Essay on the Organization of Experience. Cambridge: Harvard University Press.
Goldman, Adria, and Jim Kuypers. 2010. Contrasts in News Coverage. Relevant Rhetoric: A New Journal of Rhetorical Studies 1: 1–18.
Grčić, Klara, Marina Bagic Babac, and Vedran Podobnik. 2017. Generating politician profiles based on content analysis of social network datasets. Journal of Universal Computer Science 23: 236–55.
Hagen, Loni, Stephen Neely, Thomas E. Keller, Ryan Scharf, and Fatima Espinoza Vasquez. 2020. Rise of the Machines? Examining the Influence of Social Bots on a Political Discussion Network. Social Science Computer Review.
Hänggli, Regula, and Hanspeter Kriesi. 2010. Political framing strategies and their impact on media framing in a swiss direct-democratic campaign. Political Communication 27: 141–57.
Hänggli, Regula, and Hanspeter Kriesi. 2012. Frame Construction and Frame Promotion (Strategic Framing Choices). American Behavioral Scientist 56: 9.
Hedman, Freja, Fabian Sivnert, Bence Kollanyi, Vidya Narayanan, Lisa-Maria Neudert, and Philip Howard. 2018. News and Political Information Consumption in Sweden: Mapping the 2018 Swedish General Election on Twitter. Oxford: Oxford Internet Institute.
Howard, Philip, and Bence Kollanyi. 2017. Bots, #Strongerin, and #Brexit: Computational Propaganda During the UK-EU Referendum. Available online: https://doi.org/10.2139/ssrn.2798311 (accessed on 30 June 2021).
Iyengar, Shanto. 1991. Is Anyone Responsible? How Television Frames Political Issues. Chicago: University of Chicago Press.
Ji, Yi Grace, Zifei Fay Chen, Weiting Tao, and Zongchao Cathy Li. 2018. Functional and emotional traits of corporate social media message strategies: Behavioral insights from S&P 500 Facebook data. Public Relations Review 45: 88–103.
Jungherr, Andreas. 2016. Twitter use in election campaigns: A systematic literature review. Journal of Information Technology and Politics 13: 72–91.
Kamar, Ece. 2016. Direction in Hybrid Intelligence: Complementing AI System with Human Intelligence. Paper presented at the 25th International Joint Conference on Artificial Intelligence, New York, NY, USA, July 9–15; pp. 4070–73.
Keller, Tobias, and Ulrike Klinger. 2019. Social Bots in Election Campaigns: Theoretical, Empirical, and Methodological Implications. Political Communication 36: 171–89.
Kudugunta, Sneha, and Emilio Ferrara. 2018. Deep Neural Networks for Bot Detection. Information Sciences 467: 312–22.
Kušen, Ema, and Mark Strembeck. 2020. You talkin’ to me? Exploring Human/Bot Communication Patterns during Riot Events. Information Processing and Management 57: 102126.
Lai, Mirko, Marcella Tambuscio, Viviana Patti, Giancarlo Ruffo, and Paolo Rosso. 2019. Stance polarity in political debates: A diachronic perspective of network homophily and conversations on Twitter. Data and Knowledge Engineering 124: 101738.
Lamo, Madeline, and Ryan Calo. 2019. Regulating Bot Speech (July 16, 2018). UCLA Law Review. Available online: https://ssrn.com/abstract=3214572 (accessed on 30 June 2021).
Lee, Hyunmin, and Hyojung Park. 2013. Testing the impact of message interactivity on relationship management and organizational reputation. Journal of Public Relations Research 25: 188–206.
Lengauer, Günter, and Iris Höller. 2013. Generic frame building in the 2008 Austrian elections. Public Relations Review 39: 303–14.
Lewis, Seth C., Andrea Guzman, and Thomas Schmidt. 2019. Automation, Journalism, and Human–Machine Communication: Rethinking Roles and Relationships of Humans and Machines in News. Digital Journalism 7: 409–27.
López Urrea, Laura María, Julián Enrique Páez Valdez, and Arlex Darwin Cuellar Rodríguez. 2016. El Discurso Político Mediado Por Ordenadores. Revista Nexus Comunicación 19: 110–29.
Luceri, Luca, Ashok Deb, Adam Badawy, and Emilio Ferrara. 2019. Red bots do it better: Comparative analysis of social bot partisan behavior. Paper presented at the Web Conference 2019 - Companion of the World Wide Web Conference (WWW 2019), San Francisco, CA, USA, May 13–17; pp. 1007–12.
Machado, Caio, Beatriz Kira, Gustavo Hirsch, Nahema Marchal, Bence Kollanyi, Philip Howard, Thomas Lederer, and Vlad Barash. 2018. News and Political Information Consumption in Brazil: Mapping the First Round of the 2018 Brazilian Presidential Election on Twitter. Comprop Data Memo 2018.4/October 5. pp. 1–7. Available online: https://demtech.oii.ox.ac.uk/wp-content/uploads/sites/93/2018/10/machado_et_al.pdf (accessed on 30 June 2021).
Martínez-Castaño, Rodrigo, Juan Pichel, and David Losada. 2018a. Building python-based topologies for massive processing of social media data in real time. Paper presented at the 5th Spanish Conference on Information Retrieval, Zaragoza, Spain, June 25–27; pp. 1–8.
Martínez-Castaño, Rodrigo, Juan Pichel, and Pablo Gamallo. 2018b. Polypus: A Big Data Self-Deployable Architecture for Microblogging Text Extraction and Real-Time Sentiment Analysis. arXiv:1801.03710.
Martínez-Castaño, Rodrigo, Juan Pichel, David Losada, and Fabio Crestani. 2018c. A Micromodule Approach for Building Real-Time Systems with Python-Based Models: Application to Early Risk Detection of Depression on Social Media. Paper presented at the European Conference on Information Retrieval, Grenoble, France, March 26–29; pp. 801–5.
Matthes, Jörg. 2009. What’s in a frame? A content analysis of media framing studies in the world’s leading communication journals, 1990-2005. Journalism and Mass Communication Quarterly 86: 349–67.
McKelvey, F., and E. Dubois. 2017. Computational Propaganda in Canada: The Use of Political Bots. Working Paper No. 2017.6. Oxford: University of Oxford.
Montal, Tal, and Zvi Reich. 2017. I, Robot. You, Journalist. Who is the Author? Authorship, bylines and full disclosure in automated journalism. Digital Journalism 5: 829–49.
Morstatter, Fred, Liang Wu, Tahora Nazer, Kathleen Carley, and Huan Liu. 2016. A New Approach to Bot Detection. Paper presented at the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), San Francisco, CA, USA, August 18–21; pp. 533–40.
Murthy, Dhiraj, Alison Powell, Ramine Tinati, Nick Anstead, Leslie Carr, Susan Halford, and Mark Weal. 2016. Bots and political influence: A sociotechnical investigation of social network capital. International Journal of Communication 10: 4952–71.
Neuman, Russell, Marion Just, and Ann Crigler. 1992. News and the Construction of Political Meaning. Chicago: The University of Chicago.
Painter, David Lynn. 2015. Online Political Public Relations and Trust: Source and Interactivity Effects in the 2012 US Presidential Campaign. Public Relations Review 41: 801–8.
Pan, Zhongdang, and Gerald Kosicki. 1993. Framing analysis: An approach to news discourse. Political Communication 10: 55–75.
Pedersen, Rasmus Tue. 2012. The game frame and political efficacy: Beyond the spiral of cynicism. European Journal of Communication 27: 225–40.
Perdana, Rizal Setya, Tri Hadiah Muliawati, and Reddy Alexandro. 2015. Bot Spammer Detection in Twitter Using Tweet Similarity and Time Interval Entropy. Jurnal Ilmu Komputer Dan Informasi 8: 19.
Puyosa, Iria. 2017. Political Bots on Twitter in #Ecuador2017 Presidential Campaigns. Contratexto 27: 39–60.
Ramalingam, Devakunchari, and Valliyammai Chinnaiah. 2018. Fake profile detection techniques in large-scale online social networks: A comprehensive review. Computers and Electrical Engineering 65: 165–77.
Rangel, Francisco, and Paolo Rosso. 2019. Overview of the 7th author profiling task at PAN 2019: Bots and gender profiling in twitter. Paper presented at the CEUR Workshop Proceedings, Lugano, Switzerland, September 9–12.
Ross, Björn, Laura Pilz, Benjamin Cabrera, Florian Brachten, German Neubaum, and S. Stieglitz. 2019. Are social bots a real threat? An agent-based model of the spiral of silence to analyse the impact of manipulative actors in social networks. European Journal of Information Systems 28: 394–412.
Salge, Carolina Alves de Lima, and Elena Karahanna. 2018. Protesting Corruption on Twitter: Is It a Bot or Is It a Person? Academy of Management Discoveries 4: 32–49.
Sanovich, Sergey. n.d. Computational Propaganda in Russia: The Origins of Digital Misinformation. Oxford: The University of Oxford.
Santana, Luis, and Gonzalo Huerta Cánepa. 2019. ¿Son bots? Automatización en redes sociales durante las elecciones presidenciales de Chile 2017. Cuadernos.info 44: 61–77.
Schäfer, Fabian, Stefan Evert, and Philip Heinrich. 2017. Japan’s 2014 general election: Political bots, right-wing internet activism, and prime minister Shinzō Abe’s hidden nationalist agenda. Big Data 5: 294–309.
Scheufele, Dietram. 1999. Framing as a theory of media effects. Journal of Communication 49: 103–22.
Schuchard, Ross, Andrew Crooks, Anthony Stefanidis, and Arie Croitoru. 2019. Bot stamina: Examining the influence and staying power of bots in online social networks. Applied Network Science 4: 55.
Semetko, Holli, and Patti Valkenburg. 2000. Framing European Politics: A content analysis of press and television news. JOurnal of Communication 50: 93–109. [Google Scholar] [CrossRef]
Sheafer, Tamir, and Itay Gabay. 2009. Mediated public diplomacy: A strategic contest over international agenda building and frame building. Political Communication 26: 447–67. [Google Scholar] [CrossRef]
Stukal, Denis, Sergey Sanovich, Joshua Tucker, and Richard Bonneau. 2019. For Whom the Bot Tolls: A Neural Networks Approach to Measuring Political Orientation of Twitter Bots in Russia. SAGE Open 9: 2158244019827715. [Google Scholar] [CrossRef]
Treré, Emiliano. 2016. The dark side of digital politics: Understanding the algorithmic manufacturing of consent and the hindering of online dissidence. IDS Bulletin 47: 127–38. [Google Scholar] [CrossRef]
Utz, Sonja, Schultz Friederike, and Sandra Glocka. 2013. Crisis communication online: How medium, crisis type and emotions affected public reactions in the Fukushima Daiichi nuclear disaster. Public Relations Review 39: 40–46. [Google Scholar] [CrossRef]
van der Kaa, Hille, and Emiel Krahmer. 2014. Journalist versus news consumer: The perceived credibility of machine written news. Paper presented at the Computation+ Journalism Conference, New York, NY, USA, October 24–25; pp. 1–4. [Google Scholar]
Waddell, Franklin. 2018. A Robot Wrote This? Digital Journalism 6: 236–55. [Google Scholar] [CrossRef]
Wang, Wei, Yaoyao Shang, Yongzhong He, Yidong Li, and Jiqiang Liu. 2020. BotMark: Automated botnet detection with hybrid analysis of flow-based and graph-based traffic behaviors. Information Sciences 511: 284–96. [Google Scholar] [CrossRef]
Wasike, Ben. 2013. Framing news in 140 characters: How social media Editors frame the news and interact with audiences via Twitter. Global Media Journal, Canadian Edition 6: 5–23. [Google Scholar]
Wölker, Anja, and Thomas Powell. 2018. Algorithms in the newsroom? News readers’ perceived credibility and selection of automated journalism. Journalism 22: 86–103. [Google Scholar] [CrossRef]
Woolley, Samuel, and Philip Howard. 2016. Political communication, Computational Propaganda, and autonomous agents: Introduction. International Journal of Communication 10: 4882–90. [Google Scholar]
Zheng, Lei, Chistopher Albano, Neev Vora, Feng Mai, and Jeffrey Nickerson. 2019. The roles bots play in Wikipedia. Paper presented at ACM on Human-Computer Interaction, 3(CSCW), Boston, MA, USA, June 30–July 3. [Google Scholar]
Zhou, Yuqiong, and Patricia Moy. 2007. Parsing Framing Processes: The Interplay Between Online Public Opinion and Media Coverage. Journal of Communication 57: 79–98. [Google Scholar] [CrossRef]

Figure 1. Source: created by authors.

Figure 2. High-level Twitter crawler architecture.

Table 1. Capture time interval: [2019/04/16 10:15:00 UTC, 2019/05/09 10:59:19 UTC].

Unique users:	1,036,920
Tweets:	4,547,482
Tweets plus retweets:	22,296,826
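
The figures in Table 1 support a few derived quantities. The short Python sketch below works them out; it assumes that "Tweets" counts original tweets only, so that the retweet count is the difference between the two totals. That reading is an assumption and is not stated in the table. All variable names are illustrative.

```python
from datetime import datetime, timezone

# Values copied from Table 1.
START = datetime(2019, 4, 16, 10, 15, 0, tzinfo=timezone.utc)
END = datetime(2019, 5, 9, 10, 59, 19, tzinfo=timezone.utc)
UNIQUE_USERS = 1_036_920
TWEETS = 4_547_482
TWEETS_PLUS_RETWEETS = 22_296_826

# Length of the capture window.
capture_window = END - START

# Assumption: "Tweets" excludes retweets, so retweets are the difference.
retweets = TWEETS_PLUS_RETWEETS - TWEETS

# Average number of original tweets per unique user.
tweets_per_user = TWEETS / UNIQUE_USERS

# Retweets per original tweet, under the same assumption.
retweets_per_tweet = retweets / TWEETS

if __name__ == "__main__":
    print("Capture window:", capture_window)
    print("Retweets (assumed):", retweets)
    print("Tweets per user:", round(tweets_per_user, 2))
    print("Retweets per tweet:", round(retweets_per_tweet, 2))
```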

Table 2. Accounts, hashtags, and terms.

Accounts, Hashtags and Terms
Accounts	@PSOE, @PPopular, @ahorapodemos, @CiudadanosCs, @eajpnv, @JuntsXCat, @compromis, @vox_es, @navarra_suma, @ForoAsturias, @coalicion, @Esquerra_ERC, @ehbildu, @Nueva_Canarias, @sanchezcastejon, @pablocasado, @Pablo_Iglesias, @Albert_Rivera, @Aitor_Esteban, @jordialapreso, @LauraBorras, @joanbaldovi, @junqueras, @gabrielrufian, @sergiosayas, @PedroQuevedoIt, @anioramas, @Santi_ABASCAL, @meritxell_batet, @InesArrimadas, @cayetanaAT, @Jaumeasens
Hashtags	#elecciones2019, #debates, #eleccionesgenerales2019, #28deabril, #eleccionesgenerales28A, #LaEspañaQueQuieres, #HazQuePase, #ValorSeguro, #VamosCiudadanos, #PorEspaña, #Perotampocoteconformes, #ahorapodemos
Terms	PSOE, PP, Podemos, Ciudadanos, PNV, Junts per Catalunya, Junts, Compromís, Navarra Suma, Partido Popular, Coalición Canaria, Esquerra Republicana, EH-Bildu, Nueva Canarias, Vox, Pedro Sánchez, Pablo Casado, Pablo Iglesias, Albert Rivera, Aitor Esteban, Jordi Sánchez, Laura Borrás, Joan Baldoví, Paloma Gázquez, Oriol Junqueras, Gabriel Rufián, Sergio Sayas, Garazi Dorronsoro, Pedro Quevedo, Ana Oramas, Santiago Abascal, Meritxell Batet, Inés Arrimadas, Cayetana Álvarez, Jaume Asens
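
One way the three lists in Table 2 could be used to decide whether a tweet is relevant is sketched below in Python. This is only an illustration: the table does not describe the matching rules, so case-insensitive matching and whole-word matching of terms are assumptions, the lists are abbreviated to a few entries from the table, and the function name `matches_track_list` is hypothetical.

```python
import re

# A few entries from Table 2, lower-cased for case-insensitive comparison.
ACCOUNTS = {"@psoe", "@ppopular", "@vox_es"}
HASHTAGS = {"#elecciones2019", "#28deabril", "#hazquepase"}
TERMS = ["PSOE", "Partido Popular", "Pedro Sánchez"]

# Mentions and hashtags: '@' or '#' followed by word characters.
TOKEN = re.compile(r"[@#]\w+")

# Terms are matched as whole words or phrases (an assumption), so that
# "PSOE" does not fire inside an unrelated handle such as "@PSOEandalucia".
TERM_PATTERNS = [
    re.compile(rf"(?<!\w){re.escape(term.casefold())}(?!\w)") for term in TERMS
]


def matches_track_list(text: str) -> bool:
    """Return True if the text mentions a tracked account, hashtag, or term."""
    lowered = text.casefold()
    tokens = set(TOKEN.findall(lowered))
    if tokens & ACCOUNTS or tokens & HASHTAGS:
        return True
    return any(pattern.search(lowered) for pattern in TERM_PATTERNS)


if __name__ == "__main__":
    for sample in [
        "Hoy se vota #Elecciones2019",
        "Pedro Sánchez gana el debate",
        "Buenos días a todos",
    ]:
        print(sample, "->", matches_track_list(sample))
```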

Table 3. Classification instructions for bot detection.

Level	Type	Description		Max Score	Applicable by Individuals?	Automatable?
Structural	Syntax	The bot uses telegraphic language		1	Yes	Yes
		The bot is repetitive		1	Yes	Yes
		The bot has a simple syntactic structure		1	Yes	No
	Communicative	Lack of interaction and references to previous messages		1	Yes	Yes
		Scarce feedback on the network			Yes	Yes
		Does not develop threads			Yes	Ongoing learning process
		Links to media outlets			Yes	Yes
		Use of denotative language			Yes	No
		Lacks irony and double entendre			Yes	No
Content level	Number of frames	One frame		1	Yes	No
	Number of frames	Two or more		0	Yes	No
	Issue-specific	Dissemination of media outlets	Issue focused on reproducing news by a media outlet	1	Yes	No
		Dissemination of leaders	Issue focused on reproducing statements by a political leader	1	Yes	No
		Repetition	Not issue-heavy, heavy on repetition/dissemination with same issue	1	Yes	No
		Hybrid	Features calls to offline action	0	Yes	No
		Partisan dissemination	Reference to a party or leader	1	Yes	No
	Generic frames	Game frame		1	Yes	No
		Strategic frame		0	Yes	No
		Definition of a problem		0	Yes	No
		Interpretation of its causes		0	Yes	No
		Moral judgment		1	Yes	No
		Treatment recommendation		0	Yes	No
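
The rubric in Table 3 can be applied mechanically once a coder has marked which criteria hold for an account. The Python sketch below is a minimal illustration and not the authors' procedure: it assumes the per-criterion scores are simply summed, which the table does not state, and it leaves out the communicative rows whose maximum score is blank. The dictionary keys and the function name `rubric_score` are hypothetical labels for the table rows.

```python
# Scores copied from the "Max Score" column of Table 3.
# Rows with a blank score in the table are deliberately omitted.
RUBRIC = {
    "telegraphic_language": 1,
    "repetitive": 1,
    "simple_syntax": 1,
    "lack_of_interaction": 1,
    "one_frame": 1,
    "two_or_more_frames": 0,
    "dissemination_of_media_outlets": 1,
    "dissemination_of_leaders": 1,
    "repetition": 1,
    "hybrid": 0,
    "partisan_dissemination": 1,
    "game_frame": 1,
    "strategic_frame": 0,
    "definition_of_a_problem": 0,
    "interpretation_of_causes": 0,
    "moral_judgment": 1,
    "treatment_recommendation": 0,
}

# Highest total reachable under the summing assumption.
MAX_TOTAL = sum(RUBRIC.values())


def rubric_score(observed):
    """Sum the Table 3 scores of the criteria a coder marked as present.

    Summation is an assumption made for this sketch; duplicates in
    `observed` are counted once.
    """
    present = set(observed)
    unknown = present - RUBRIC.keys()
    if unknown:
        raise KeyError(f"criteria not in rubric: {sorted(unknown)}")
    return sum(RUBRIC[name] for name in present)


if __name__ == "__main__":
    coded = ["repetitive", "one_frame", "strategic_frame"]
    print(rubric_score(coded), "of", MAX_TOTAL)
```

Under this reading, a higher total means more bot-like signals were observed; any threshold for labelling an account would have to come from the study itself.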

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
