Using Twitter to Detect Hate Crimes and Their Motivations: The HateMotiv Corpus

Abstract: With the rapidly increasing use of social media platforms, much of our lives is spent online. Despite the great advantages of social media, the spread of hate, cyberbullying, harassment, and trolling is unfortunately very common online. Many extremists use social media platforms to communicate their messages of hatred and spread violence, which may result in serious psychological consequences and even contribute to real-world violence. Thus, the aim of this research was to build the HateMotiv corpus, a freely available dataset that is annotated for types of hate crimes and the motivation behind committing them. The dataset was developed using Twitter as an example of a social media platform and provides the research community with a novel and reliable resource. The dataset is unique as a consequence of its topic-specific nature and its detailed annotation. The corpus was annotated by two experienced annotators following unified guidelines, and the resulting inter-annotator agreement, measured as an F-score, was 0.66 for the type labels and 0.71 for the motivation labels of hate crimes.


Introduction
Over the last decade, social media channels such as Facebook and Twitter have become a normal part of daily life for many [1]. Twitter is an extremely popular social media platform and has been growing rapidly since its creation in 2006. It has provided a place where people can interact with each other and maintain social ties. People use Twitter to share their daily activities with their contacts, which makes it both a valuable tool and a great data source for research.
Although Twitter has become a useful way to spread information, it has also introduced new modes of social discourse and produced antagonistic content that spreads prejudice, false information, and hostility towards people, including refugees and other marginalized groups, on an unprecedented scale [2,3]. Hateful content ranges from verbal aggression to cyberbullying, offensive language, hate speech, and incitement to crime. Verbal aggression on social media platforms can cause more harm than traditional bullying because it allows users to adopt an alias [4]. Unfortunately, the reach and extent of online aggression have given such content considerable power and influence that can affect anyone, irrespective of their status, identity, and location. Such incidents are therefore not a trivial annoyance; they have entered the realm of criminal activity. It has been noted that hateful online content can cause psychological anguish for users of social media platforms, has led some to deactivate their accounts, and, in the worst cases, has led some to commit suicide [5].
Research has shown that online victimization in the form of hatred is just one part of a broader process of harm that commences on social media platforms. For example, suspects in many recent hate-related terror attacks have been shown to have an extensive history of posting hate-related comments on social media, which would indicate that social media could well have contributed to their radicalization [6,7]. Other studies have shown a correlation between hate tweets focused on race and religion and racially and religiously aggravated offenses that occur offline in the same location [8].
A hate crime is a criminal offense that is motivated, in part or in whole, by the perpetrator's attitude towards race, religion, disability, sexual orientation, ethnicity, gender, or gender identity. According to recent statistics from the FBI, the number of hate crimes is on the increase, and in 2018, the number of personal attacks motivated by personal bias hit a 16-year high. In the UK, the number of hate crimes has more than doubled since 2013, and according to Home Office data, 1605 hate crimes were identified as possible online offenses between 2017 and 2018 [9]. Europe has also witnessed a significant increase in xenophobic, nationalist, Islamophobic, racist, and antisemitic opinions. Their effect is not restricted to hostile rhetoric but also leads to actual crimes against groups and individuals [3]. Therefore, it is important for preventive measures to be taken to deal with abusive and aggressive behavior online [10].
While most microblogging websites and social networks have banned the use of hateful language, the volume of content posted on them makes it virtually impossible to moderate everything manually. As a consequence, the need has arisen for automatic detection that can identify and remove hate speech. The dissemination of hate on Twitter constitutes a social emergency with real-life individual and social consequences [3]. The aim of this research is therefore to build a semantically annotated corpus called HateMotiv. This was done using Twitter data with a focus on the identification of mentions denoting types of hate crimes (physical assault, verbal abuse, and incitement to hatred) and the motivation behind committing such crimes as classified by the FBI (race or ethnicity, religion, sexism, and disability). Such information could help to stimulate research on the automatic extraction of hate crime types and the motivations behind them from social media text. We stress that we do not claim any direct relationship between the use of hate speech online and hate-based actions in the physical (real) world. However, the findings of this study are pertinent to improving discrimination surveillance and mitigation efforts [11].
This research aims to use Twitter as an example of a social media platform to study hate crimes and the motivation(s) behind them. The contribution of this research is three-fold:

1. To build annotated datasets for the mentions of hate crime types and the motivation(s) behind committing such hate crimes.

2. To make the corpus freely available to the research community as a resource to train and evaluate text-mining (TM) tools, which, in turn, can be used to automatically extract mentions of hate crimes and motivation. The TM tools can be used for the prediction of hate crime events and for better surveillance and mitigation efforts against discrimination. To the best of our knowledge, this is the first study pertaining to the investigation of Twitter data for hate crimes and the motivation(s) behind them.

3. To create a vocabulary list specifically relating to hate crimes and their motivation(s). The vocabulary list is freely available and can be used as a resource for TM techniques and named entity recognition methods.

Related Work
With the growth and proliferation of hate speech on social media platforms, there has also been considerable research on the identification of offensive language such as hate speech [12][13][14][15][16][17], cyberbullying [18,19], aggression [10], and toxic comments [20,21]. Only a small volume of work has specifically been centered on the detection of personal bias against specific groups that leads to hate crimes. These efforts were either limited to where and when hate crimes are committed [13,22] or were about finding the associations between social media discrimination and offline hate crimes in 100 cities in the United States. However, the latter study was limited to discrimination related to race, ethnicity, or national origin, and the classification of tweets was made at the sentence level [9,11]. Table 1 presents a sample of the public datasets that are available for training and evaluating methods for hate speech and offensive language detection. Most of the datasets include two classes to label the text as offensive or not offensive, with a small percentage identifying racist or sexist content. Furthermore, all the mentioned datasets are annotated at the tweet level (e.g., the tweet is labeled as either racist, sexist, or both) without annotating the specific word or expression that denotes the label. This makes these datasets more suitable for developing binary classification methods than for extracting detailed information from a text. Annotating text at the "mentions" level gives more information about it and enhances the use of linguistic and syntactic features that can be used to train and evaluate machine learning (ML) methods.
As far as we are aware, none of the previous works have studied the problem of hate crimes as a whole with the inclusion of the different personal biases that influence and motivate the occurrence of crimes, regardless of whether the tweet is written by the person who committed the crime or by another party. This is mainly because there is no available annotated corpus for hate crime information. Furthermore, there is no comprehensive dictionary covering hate crime-related terms. Developing TM tools that can automatically extract hate crime-related information relies upon textual corpora where pertinent information has been specifically annotated by those experienced in the field. Building publicly available, curated datasets that identify hate crime mentions and the personal bias that motivates such crimes is essential for training machine learning (ML) techniques [26,27] and raising the bar for the effective and systematic assessment of any novel methodologies. Our research differs from previous work on the subject in several ways. First, our work is devoted in particular to the detection and classification of hate crimes and their motivations at a detailed level. Second, the annotations in our work are made at the mention level, on words or sequences of words that denote a hate crime (e.g., physical, verbal, etc.) and the personal bias that motivates committing the hate crime (e.g., racism, sexism, etc.). As new words are introduced frequently in expressing hateful content, annotating tweets at the mentions level will facilitate the generation of a lexical resource and the automatic augmentation of existing lexical resources. Therefore, we used our corpus to create a hate crime vocabulary list, which includes vocabulary about hate crimes, grouped by type, and about the personal bias motivating the crimes.

Material and Methods
To carry out experiments on the automatic detection of online hate crime, it is critical to have access to labeled corpora. As no benchmark corpus exists for hate crimes, researchers have been forced to obtain and label data for themselves [2]. As far as we know, no previous studies have been specifically centered on the detection of hate crimes and the motivation behind them, which makes it hard to reach any relevant conclusions about the most common motivations behind hatred and hate crimes. Furthermore, it is very difficult to provide recommendations on how to control bias and remove the motivation for hate-related crimes.

Corpus Construction
We retrieved examples for the HateMotiv corpus from Twitter using the TweetScraper tool. Tweets were collected from the period between 1 January 2010 and 30 December 2019. Twitter includes a very high number of tweets related to the topic of hate crimes, and the presence of particular words, such as "hate crime", does not necessarily indicate that a tweet is related to committing a hate crime. However, the hashtag convention is predominantly utilized on Twitter as a means to connect a user's comments and points of view to an event [28]. Therefore, we used the "Hashtagify" tool (https://hashtagify.me/) [29] to find the most popular hashtags related to hate crimes. The following list of hashtags was used to collect the relevant tweets: "hate crime", "racist", "racism", "Islamophobia", "Islamophobic", "sexism", "disability", "transgender", "antisemitism", "misogyny", and "disabled". We noticed that the hashtags that were highly related to hate crime terms matched the hate crime classification of the FBI, so we used these hashtags as keywords to crawl relevant tweets. The keywords were chosen by the judge of this annotation process, who is an English teacher with considerable experience in annotation. The query resulted in 23,179 tweets that contained mentions of the listed hashtags. Due to the cost of manual annotation in terms of time and money, the tweets were further filtered, and we randomly selected 5000 tweets for annotation.
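The filtering and sampling steps described above can be sketched as follows. This is a minimal illustration in Python, not the authors' pipeline: the function names `matches_keywords` and `sample_for_annotation`, the fixed random seed, the substring-based matching, and the in-memory list of tweet strings are all assumptions introduced for the example. Only the keyword list and the sample size of 5000 come from the text.

```python
import random

# Keyword list taken from the text; matching is done case-insensitively.
KEYWORDS = [
    "hate crime", "racist", "racism", "Islamophobia", "Islamophobic",
    "sexism", "disability", "transgender", "antisemitism", "misogyny",
    "disabled",
]


def matches_keywords(tweet_text, keywords=KEYWORDS):
    """Return True if the tweet mentions any keyword, with or without '#'.

    Hashtags cannot contain spaces, so a multi-word keyword such as
    "hate crime" is also checked in its collapsed form ("hatecrime").
    """
    text = tweet_text.lower()
    for keyword in keywords:
        kw = keyword.lower()
        if kw in text or kw.replace(" ", "") in text:
            return True
    return False


def sample_for_annotation(tweets, sample_size=5000, seed=0):
    """Keep keyword-matching tweets, then sample without replacement."""
    relevant = [t for t in tweets if matches_keywords(t)]
    rng = random.Random(seed)  # hypothetical seed, only for reproducibility
    if len(relevant) <= sample_size:
        return relevant
    return rng.sample(relevant, sample_size)
```

Simple substring matching of this kind is deliberately permissive (for example, "racist" also matches "racists"), which is consistent with the observation above that a keyword match alone does not show that a tweet reports a hate crime; the manual annotation step remains necessary.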

Annotation Process
The annotation was done through the COGITO Tech (LLC) service (https://www.cogitotech.com/about-us) [30], with which each tweet was annotated by two annotators, who are English language instructors. Following the same set of annotation guidelines, both annotators marked up the tweets for mentions related to the type of hate crime and the motivation behind committing the crime. The annotation included marking up all entity mentions in the corpus related to four hate crime types and motivations, as shown in Table 2. Annotators were supported by a concise set of guidelines together with regular meetings with the judge of this annotation process to discuss any issues that arose during the annotation process and to resolve all problems and discrepancies.
As shown in Figure 1, the most common hate crime type according to the HateMotiv corpus is physical assault, followed by incitement to hatred. The least common type of hate crime reported on Twitter is verbal abuse. Figures 2-4 show the distribution of the motivations behind committing different types of hate crimes in the HateMotiv corpus. Bias against different races and ethnicities contributed the most to committing different types of hate crimes, which suggests a lack of tolerance of differences in skin color and nationality. Disability and people's attitudes toward disabled people were the least common motivation behind committing hate crimes.
[Figure: The motivation of incitement to hatred crime]

Sexism and discrimination against other genders came next after racism as the second most common motivation for different types of hate crime. It should also be mentioned that hate crimes are sometimes committed without a clear motivation or reason and are just based on personal bias or mental problems. This is reflected in the HateMotiv corpus with the "unknown" motivation label. As shown in the figures, physical assault was the most common form of hate crime perpetrated without a clear reason. However, the percentage of crimes committed for an unknown reason is very low compared with other motivations (only 0.011% of all hate crime types).
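The per-type motivation distributions discussed above could be derived from mention-level annotations along the following lines. This is only a sketch under stated assumptions: the record layout (a "mentions" list whose entries carry "text", "category", and "label"), the rule that pairs every type label with every motivation label in the same tweet, and the function names are hypothetical and are not the released corpus format or the authors' procedure.

```python
from collections import Counter, defaultdict


def motivation_distribution(annotated_tweets):
    """Count motivation labels per hate crime type.

    Each tweet is a dict with a "mentions" list; each mention is a dict with
    "text", "category" ("type" or "motivation") and "label". This record
    layout is hypothetical and is not the released corpus format.
    """
    counts = defaultdict(Counter)
    for tweet in annotated_tweets:
        mentions = tweet["mentions"]
        types = {m["label"] for m in mentions if m["category"] == "type"}
        motivations = {
            m["label"] for m in mentions if m["category"] == "motivation"
        }
        if not motivations:
            motivations = {"unknown"}  # no motivation mention was annotated
        for crime_type in types:
            for motivation in motivations:
                counts[crime_type][motivation] += 1
    return counts


def as_percentages(counter):
    """Convert raw counts to percentages of the counter's total."""
    total = sum(counter.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counter.items()}
```

Under these assumptions, a tweet with an annotated crime type but no annotated motivation contributes to the "unknown" count for that type, which mirrors how the "unknown" label is described above.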

Results and Discussion
The reliability of human annotation is very important both for ensuring that ML algorithms can accurately learn the characteristics of tweets that discuss hate crimes and for providing an upper bound for the expected performance [23]. We assessed the quality of the generated corpus by measuring the inter-annotator agreement (IAA). A high IAA score indicates a reliable corpus that is suitable for the training and testing of TM techniques and ML models. Following a number of previous studies [31][32][33][34][35], we used the F-score to calculate the IAA because it is the same regardless of which set of annotations is used as the gold standard [32,36]. The F-score is calculated as the harmonic mean of precision (P) and recall (R):

$$F\text{-score} = \frac{2 \times P \times R}{P + R}$$

The annotations produced by the first annotator were considered the "gold standard", and the total number of correct annotations corresponded to the number of annotated entities produced by that annotator. Based on this, we calculated the IAA based on precision, recall, and F-score. Precision (P) is the percentage of correct positive annotated entities produced by the second annotator when compared with the gold standard. It equals the ratio between the number of true positive (TP) entities and the total number of entities annotated by the second annotator, that is, true positives plus false positives (FP):

$$P = \frac{TP}{TP + FP}$$

Recall (R) is correspondingly the ratio between the number of TP entities and the total number of entities in the gold standard, that is, true positives plus false negatives (FN):

$$R = \frac{TP}{TP + FN}$$

The precision for hate crime types and motivation was lower than the recall. The reason for this is that the second annotator annotated mentions of incidents that are not necessarily considered hate crimes or motivations according to the annotation guidelines for this task. Moreover, the second annotator annotated every mention of hate crimes, resulting in annotating the same hate crime more than once in a tweet.
For example, in one tweet, the second annotator annotated both "killed" and "killing" as physical-assault hate crimes, while the correct annotation according to the guidelines, and the one produced by the first annotator, is to annotate only the first mention of the hate crime, "killed", as a physical-assault hate crime. Another example is the following tweet: "A group of whites attacked a black and Muslim man. This is racism against black people and a hate crime". The correct annotation as per the gold standard is to annotate "attacked" as a physical assault-type hate crime with the following motivations: "black" as racism and "Muslim" as religion. However, the second annotator annotated extra, unnecessary text, e.g., the second mention of "black", which is not part of the gold standard because the first mention of "black" had already been annotated as the racism motivation.
On the other hand, the recall was high, which means that the second annotator reproduced most of the annotations in the gold standard.
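The agreement calculation described in this section can be illustrated with a short Python sketch. It is an assumption-laden example rather than the authors' evaluation script: the function name `agreement_scores`, the representation of a mention as a tuple (tweet id, mention text, occurrence number, label), and the exact-match criterion for true positives are all introduced here for illustration.

```python
def agreement_scores(gold_mentions, other_mentions):
    """Return (precision, recall, f_score) of other_mentions against gold.

    Each mention is a hashable tuple; (tweet_id, text, occurrence, label) is
    an assumed representation, and only exact matches count as true positives.
    """
    gold = set(gold_mentions)
    other = set(other_mentions)
    tp = len(gold & other)   # annotated by both annotators
    fp = len(other - gold)   # annotated only by the second annotator
    fn = len(gold - other)   # gold-standard mentions that were missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# The second example tweet from the text: the gold standard has three
# mentions, and the second annotator adds the second mention of "black".
gold = [
    (1, "attacked", 1, "physical assault"),
    (1, "black", 1, "racism"),
    (1, "Muslim", 1, "religion"),
]
second = gold + [(1, "black", 2, "racism")]

precision, recall, f_score = agreement_scores(gold, second)
print(precision, recall, round(f_score, 3))  # 0.75 1.0 0.857
```

For this single tweet, the extra annotation lowers precision to 0.75 while recall stays at 1.0, which is the same pattern reported above for the corpus as a whole. Swapping the two arguments exchanges precision and recall but leaves the F-score unchanged, which is the property that motivates using the F-score for IAA.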
To show the importance of our generated dataset, we compared the HateMotiv corpus with the other publicly available datasets in the same domain that are reported in Table 1. In particular, we compared our corpus with the datasets that used Twitter as a social media source and that share some characteristics with our proposed corpus, e.g., they have been annotated for similar classes related to hate crime. As shown in Table 5, the HateMotiv corpus differs from the existing datasets in its very specific domain, which is hate crimes and the motivation behind committing them, and in its level of annotation, which is the mention level. The other datasets, in contrast, include binary or multi-class annotation at the tweet level. The results of the annotation are satisfactory, with F-scores of 0.66 and 0.71 for type and motivation labels of hate crimes, respectively. As a result of this research, upon the creation of the HateMotiv corpus, we were able to create a hate crime and motivation vocabulary list. As far as we know, there is no other resource devoted specifically to hate crimes and their motivations. The vocabulary list is freely available and can be used as a resource for TM techniques and named entity recognition methods.
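One plausible way to derive such a vocabulary list from mention-level annotations is sketched below in Python. The text does not describe how the list was actually compiled, so the function name `build_vocabulary`, the input layout of (mention text, label) pairs, and the lower-casing and whitespace normalization are assumptions made for this illustration.

```python
from collections import defaultdict


def build_vocabulary(mentions):
    """Group distinct, lower-cased mention texts under their labels.

    `mentions` is an iterable of (mention_text, label) pairs; this pair
    layout is an assumption, not the format of the released vocabulary list.
    """
    vocabulary = defaultdict(set)
    for text, label in mentions:
        term = " ".join(text.lower().split())  # normalize case and whitespace
        if term:
            vocabulary[label].add(term)
    # Sort terms so that the resulting list is stable across runs.
    return {label: sorted(terms) for label, terms in vocabulary.items()}
```

Because the terms come directly from annotated mentions, a list built this way grows as newly coined hateful expressions are annotated, which matches the motivation given earlier for annotating at the mention level.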

Conclusions
We have presented the HateMotiv corpus, a new dataset with annotations of types and motivations of hate crimes. To the best of our knowledge, this is the first dataset to contain annotations of hate crime types and to target offenses on social media. The results could open up new directions for interesting research. The dataset is freely available to the research community and has the potential to stimulate investigations of the automatic detection and prediction of hate crimes and their motivation. We have also shared a vocabulary list derived from the results, which can be used as a reference for hate crimes to augment existing dictionaries or to create new specialized dictionaries. Detailed information about the motivations behind hate crimes has the potential to help mitigate personal bias and reduce the number of crimes committed on the basis of such bias against others. In the future, we plan to use state-of-the-art ML and deep-learning techniques (e.g., neural networks) to train models to extract and recognize hate crime mentions and their motivations on social media platforms.