Social Media Toxicity Classification Using Deep Learning: A Real-World Application to UK Brexit

Abstract: Social media has become an essential facet of modern society, wherein people share their opinions on a wide variety of topics. Social media is quickly becoming indispensable for a majority of people, and many cases of social media addiction have been documented. Social media platforms such as Twitter have demonstrated over the years the value they provide, such as connecting people from all over the world with different backgrounds. However, they have also shown harmful side effects that can have serious consequences. One such harmful side effect of social media is the immense toxicity that can be found in various discussions. The word toxic has become synonymous with online hate speech, internet trolling, and sometimes outrage culture. In this study, we build an efficient model to detect and classify toxicity in social media from user-generated content using Bidirectional Encoder Representations from Transformers (BERT). The BERT pre-trained model and three of its variants have been fine-tuned on a well-known labeled toxic comment dataset, the Kaggle public dataset of the Toxic Comment Classification Challenge. Moreover, we test the proposed models on two datasets collected from Twitter in two different periods to detect toxicity in user-generated content (tweets), using hashtags related to the UK Brexit. The results show that the proposed model can efficiently classify and analyze toxic tweets.


Introduction
Social media has many positive aspects, one of which is the sense of community it provides to people [1]. Social media allows people to do in a day what would usually take a lifetime to achieve. It also gives people the opportunity to form a support network. People from all walks of life around the world can connect with the right individuals and build a mutually beneficial network. Social media also provides the ability to see what people in other parts of the world are thinking or doing. This can be very beneficial, for example, when someone wants to see the immediate reactions people have during an election, sports game, or award show. Providing an outlet for self-expression is another area where social media excels. Most people are looking to frequently express themselves in some way, shape, or form. Some use social media as a sort of therapy, allowing them to talk about what is bothering them. In [2], the authors examine the benefits of social media in adolescence. One of the benefits they point out is identity exploration, which they say can help adolescents discover aspects of themselves.
Despite the fact that social media offers a lot of good to the world, it also has a host of negative aspects. Social media has become a place of discord. People rarely agree on matters of discussion, and some individuals on social media take these disagreements to a higher intensity. They attack anyone who is at odds with what they believe in. This can lead to insensitive language being used from one person or group to another. Topics of contention range from politics to gender, movies, and more. People who want to discuss things they care about must be willing to receive opinions different from theirs. Said opinions can sometimes be toxic. This sometimes leads to users no longer expressing themselves, and it may eventually halt the search for differing opinions because of the threat of abuse and harassment on social media. Platforms such as Twitter have had difficulty in trying to effectively facilitate conversations. This has led large numbers of communities to limit or outright shut down user comments.
Toxic behavior on social media has come to be expected, but it is increasingly not tolerated. Toxicity within the social sphere can be described as spreading unnecessary negativity or hate that ends up negatively affecting people who encounter it. Toxic individuals online look to spread malice and abuse other people in discussions. For instance, Kwak et al. [23] studied toxic behavior in team competition online games. They found that the result of a match is tied to the appearance of toxic behavior. Toxic comments on social sites such as Twitter can be found on topics that are very difficult to discuss, such as Brexit, climate change, abortion, vaccines, and US elections. Toxic behavior is more prevalent in such topics because of their divisive nature. People tend to have different opinions when discussing such topics, which can lead to divisions. Groups of people that believe in a particular view are formed, with each group believing their views are right. There is rarely a middle ground in these clashes of words. Some people are civil when discussing such topics, but more often than not, people become frustrated by the other group and start using toxic language. A comment about climate change such as "They're stupid, it's getting warmer, we should enjoy it while it lasts" can be considered toxic. The previously mentioned comment, made by someone that believes in climate change, is a blatant attack on individuals that deny that climate change is happening.
Toxic behavior on social platforms can also indicate the presence of cyberbullying. Cyberbullying is a type of bullying or harassment that is carried out online and is widespread on social media sites such as Twitter. It has become a common occurrence among people of all ages, but it mostly occurs amongst teenagers. According to one poll, 22% of teens use their preferred social media site more than 10 times a day [24]. Whittaker and Kowalski [25] carried out a study revealing that texting and social media are the most commonly used venues for cyberbullying. According to Bullying Statistics (http://www.bullyingstatistics.org/content/cyber-bullying-statistics.html, accessed on 1 January 2021), over 25% of teens and adolescents have experienced cyberbullying, with most choosing not to tell their guardians when it occurs. Cyberbullying can therefore be described as individuals, usually teens, bullying or harassing others on social media. Cyberbullying is inherently toxic and harmful. Examples of harmful bullying behavior are threats, hate speech, posting vile rumors, making unsolicited sexual comments, and giving out someone's personal information without their permission. Fox et al. [26], for instance, show the effects of sexism and sexual harassment on social media, deducing that a major factor is the anonymity of users. If left unchecked, cyberbullying can lead to low self-esteem in victims; in more extreme cases, victims end up committing suicide. Other emotional responses to such abuse include anger, frustration, depression, and fear for one's life.
Sometimes the source of these toxic comments is individuals who are known as internet trolls, or simply trolls. A troll is an individual who deliberately initiates quarrels or angers people online in an attempt to distract and stir up discord by posting provocative, off-topic comments to incite an emotional response from readers, either for their own amusement or to serve a specific goal. This act is known as trolling. The acts of trolls can be explained by the online disinhibition effect [27,28], which is the lowering of one's behavioral inhibitions in the online sphere. Trolls benefit significantly from the anonymity they have on social sites. Trolls can be detrimental to online communities in several ways: by disrupting discussions (e.g., on Twitter), spreading lousy advice, and damaging the trust developed over time in an online community. Moreover, when the rate of deception is high, a group can become sensitized to trolling. This can lead to the rejection of honest, naive questions because they are considered trolling. In some situations, when a new user on a site such as Twitter decides to make their first post, they are immediately flooded with angry accusations. Despite the possibility that the accusations might be uncalled for, being labeled a troll can be quite harmful to one's online reputation.
All things considered, toxic behavior is an issue that needs to be dealt with head-on to foster civil, healthy, and open discussions on social media. That said, it is up to social platforms to decide the method of resolving this issue. Some have resorted to permanently banning individuals who were reported by other users, while others have chosen to let this behavior run rampant. The removal of toxic comments from social media can have a substantial positive impact, not just on conversations, but for people who may have been on the receiving end of said toxic comments. One solution to dealing with toxicity is the use of sentiment analysis. A sentiment analysis system can be used to detect toxic comments by classifying the likelihood of such text being toxic. Sentiment analysis has proven to be a successful approach to solving problems in numerous domains, such as in [29][30][31][32][33][34][35]. In addition, optimization techniques can be used to optimize the classification parameters [36,37].
The goal of this study is to detect and analyze toxic behavior on Twitter using user-generated content. Twitter sentiment analysis has received wide attention and has been utilized in various domains, for example, political influences [38][39][40], consumer insight mining [41], transportation services [42], movements of stock markets [43], traffic congestion detection [44,45], happiness evaluation [46], and others [47].
Therefore, we propose a method to detect toxicity in social media that involves building a deep learning model that is trained on a toxic dataset and then tested with real-time tweets that were collected using the Twitter API.
In this study, a toxicity detection and classification model was built based on Bidirectional Encoder Representations from Transformers (BERT). BERT is a language representation scheme proposed by Devlin et al. [48]; it is considered an efficient method, is widely employed in various applications, and has shown excellent performance on different tasks.
To sum up, the primary contributions of this paper are as follows:
1. We propose an efficient deep learning model to classify toxic comments from user-generated content in social media;
2. We build our model on the Bidirectional Encoder Representations from Transformers (BERT) model;
3. The BERT pre-trained model is fine-tuned on a Kaggle public dataset, "Toxic Comment Classification Challenge", and evaluated on two different datasets collected from Twitter in two different periods using the Twitter API, applying several search terms and hashtags such as #Brexit, #BrexitBetrayal, and #StopBrexit;
4. We compare the BERT-base model to three models, namely, Multilingual BERT, RoBERTa, and DistilBERT, to verify its performance.
This paper is organized as follows. Related works are presented in Section 2. Section 3 describes the methodology. In Section 4, the experimental evaluation is described, and the conclusion is presented in Section 5.

Applications of User-Generated Content in Social Media
Souri et al. [3] presented a personality classification model that analyzes user activities on Facebook using machine learning. The proposed method uses the Facebook API to collect the data of 100 users. The method can also recommend friends in Facebook groups. Morente-Molinera et al. [4] leveraged a sentiment analysis technique to address group decision making using social media. The proposed method helps experts generate preference relations, which may be applied to make a group decision. Risch and Krestel [5] proposed a deep learning-based model to identify aggression in user-generated content in social media. The proposed method uses a recurrent neural network based on a bidirectional gated recurrent unit. They used machine translation to augment the dataset used for model training.
Subramani et al. [6] proposed a domestic violence identification system using deep learning. The proposed system uses a dataset collected from Facebook and a binary text classification approach that can detect whether content created by a user is critical or uncritical. Subramani et al. [7] also presented a deep learning-based domestic violence identification system. Unlike their previous work [6], this system can be applied to multi-class post categorization across five categories, namely, general, empathy, awareness, personal story, and fund raising. The proposed system was evaluated using a dataset collected from Facebook user-generated content and achieved a high accuracy rate.
Ahmad et al. [10] used a deep learning model with a sentiment analysis technique to perform binary classification of user-generated content on Twitter (extremist or non-extremist tweets). In their work, Budiharto and Meiliana [11] used sentiment analysis techniques to predict the result of the Indonesian presidential election; the prediction of the proposed method proved correct. Al Shehhi et al. [14] used sentiment analysis techniques to analyze tweets collected from Twitter users in the United Arab Emirates (UAE) to measure happiness. They used both English and Arabic tweets and found that 7:00 a.m. is the happiest hour and Friday the happiest day.
In [15], the authors proposed a sentiment analysis system that can be applied to analyze posts in teaching evaluation systems. The proposed system collects student feedback and analyzes the sentiment of phrases to obtain a classification of teaching attitude.
Aloufi and El Saddik [16] proposed a sentiment analysis approach to analyze football fans' sentiments through the tweets they post and their reactions to game events (i.e., penalty kicks, scoring goals, etc.). The proposed approach can infer fan interaction from football game events.
Ibrahim et al. [49] proposed a toxicity detection model based on a convolutional neural network (CNN), bidirectional gated recurrent units (GRU), and bidirectional long short-term memory (LSTM). They used a Wikipedia dataset to evaluate the proposed method and achieved an F1-score of 87.2% for predicting toxicity types and 82.2% for toxic/non-toxic classification. In [50], the authors presented a personal attack analysis method based on a combination of machine learning and crowdsourcing. In [51], the authors proposed a deep learning-based approach to classify toxic comments, using Kaggle data to test their approach.
Google and Jigsaw presented an API (Perspective API, https://www.perspectiveapi.com/ (accessed on 10 February 2021)) for classifying toxic comments. However, this API handles only simple binary text classification. Moreover, another study addressing the classification of toxic comments was presented in [52]; this study also used simple binary text classification to test the proposed model.

Applications of Bidirectional Encoder Representations from Transformers (BERT)
BERT has received wide attention and has been applied in numerous applications. Fang et al. [53] proposed a near-miss report classification method based on the BERT model. They evaluated BERT using near-miss report datasets collected from real-world construction projects and found that BERT is able to classify near misses from the datasets, outperforming the other compared models. Fan et al. [54] applied BERT to detect adverse drug events. They used reviews from Drug.com and WebMD to detect unreported drug events, and the evaluation outcomes showed that BERT achieved an AUC of 94%. Moradi et al. [55] presented a summarization method using contextualized embeddings generated by BERT. Their model was applied to capture the context of medical texts, and the evaluation confirmed that BERT improved summarization for biomedical text.
Wang et al. [56] used different methods to normalize Chinese written procedures and diagnoses to the standard concepts in the ICD (International Classification of Diseases). Among five well-known methods, BERT showed the best performance for the normalization of procedures and diagnoses. In addition, in [57], the authors presented a BERT-based model to measure the semantic similarity of clinical trial outcomes. Moreover, another text analysis approach for medical applications was proposed by Zhang et al. [58] using BERT; in this study, they used Chinese clinical information from various types of notes for breast cancer. BERT was evaluated with extensive comparisons to other models and showed better performance.
Chen et al. [59] proposed a multi-source data fusion approach for aspect-level sentiment classification. Their proposed approach integrates data from word-level sentiment lexicons, sentence-level corpora, and aspect-level corpora using the power of the BERT model. Furthermore, BERT is applied in many other applications, such as the classification of target-dependent sentiment [60], entity linking [61], image classification [62], medical text inferencing [63], occupational title embedding [64], and others [65]. Moreover, multiple transformer-based models have been developed, such as GPT-3 [66,67], Megatron-LM [68], and Electra [69], which have demonstrated remarkable performance.

Methodology
In this section, we build a classifier able to identify toxic comments among a selection of collected Twitter posts. In this work, the BERT pre-trained model and three of its variants have been fine-tuned on a well-known labeled toxic comment dataset (https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge (accessed on 1 January 2021)) and used to classify the collected tweets. The methodology workflow consists of three main steps: (i) implementation and fine-tuning of the BERT model for toxic comment classification, (ii) collection, pre-processing, and characterization of tweets relevant to a pre-defined hashtag, and (iii) classification analysis of the toxicity trend in the collected tweets.

Pre-Processing
The collected posts originate from particular hashtags in specific periods. Before building the model, we implemented light pre-processing steps: we removed punctuation, links, and non-English words. For tokenization, we used the pre-trained tokenizer for "bert_base_uncased".

BERT for Toxic Comment Classification
Transformer-based models are dominating a wide variety of NLP tasks, leading to many state-of-the-art results [70]. Based on the transformer architecture, BERT (Bidirectional Encoder Representations from Transformers) was recently proposed by a Google AI Language research team as a language model trained on two tasks: masked token prediction (masked LM (MLM)) and next sentence prediction (NSP). Transformers are based on the popular attention mechanism to perform language modeling. Previous techniques generate their embeddings from a text sequence by reading it either from left to right, from right to left, or as a combination of the two during training. In contrast, BERT applies the bidirectional training of the transformer to gain a deeper understanding of language context and flow.
Since transformers integrate the attention mechanism, which incorporates two separate components (encoder and decoder), BERT makes use of the encoder only to generate a language model. The encoder learns contextual relations, where the input is a sequence of tokens (words or sub-words) embedded into vectors. Before the words are converted into vectors and fed to BERT for training the masked LM, word masking is applied. The masked LM is trained by predicting 15% of randomly chosen tokens in each sequence: 80% of the chosen tokens are replaced with a [MASK] token, 10% with a random token, and the remaining 10% are left unchanged. At this stage, the model converges more slowly than directional models, as BERT calculates the loss function only over the masked tokens and ignores the predictions of the non-masked tokens in each batch. In the NSP training process, pairs of sentences (A, B) are input to the model, where the goal is to predict whether sentence B is the next sentence in the original document. Here, 50% of the inputs are pairs in which sentence B is the subsequent sentence of sentence A in the original document, whereas in the other 50%, sentence B is a random sentence from the corpus.
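The 80/10/10 masking scheme described above can be sketched in a few lines. This is an illustrative stand-in operating on a plain token list with Python's random module, not BERT's original implementation; the small vocabulary and the [MASK] string are assumptions for the example.

```python
import random

# Hypothetical mini-vocabulary used only for the 10% random-replacement case.
VOCAB = ["the", "uk", "brexit", "vote", "deal", "leave", "remain"]

def mask_tokens(tokens, rng, mask_rate=0.15):
    """BERT-style masking: choose 15% of positions; of those,
    80% -> [MASK], 10% -> a random token, 10% -> left unchanged.
    Returns (masked sequence, labels holding the original token at chosen positions)."""
    masked, labels = list(tokens), [None] * len(tokens)
    n_pick = max(1, round(mask_rate * len(tokens)))
    for i in rng.sample(range(len(tokens)), n_pick):
        labels[i] = tokens[i]              # the model must predict this original token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = "[MASK]"           # 80% of chosen tokens
        elif roll < 0.9:
            masked[i] = rng.choice(VOCAB)  # 10% random replacement
        # else: 10% remain unchanged

    return masked, labels

rng = random.Random(42)
seq = ["the", "uk", "will", "leave", "the", "eu", "after", "the", "brexit", "vote"]
masked, labels = mask_tokens(seq, rng)
print(sum(l is not None for l in labels))  # 2 (15% of 10 positions, rounded)
```

Note that the loss is computed only at positions where `labels` is not `None`, mirroring the point above that non-masked tokens are ignored when calculating the loss.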
In this paper, BERT-base trained on a single language (English) is used during the fine-tuning phase with its default configuration: 12 encoder blocks, 768 hidden dimensions, 12 attention heads, a maximum sequence length of 512, and a total of 110 M parameters [48]. In addition, the BERT multilingual base model is trained on 104 languages using Wikipedia text and the MLM technique [48]. RoBERTa [71] is an optimized model based on Google's BERT, in which the authors modified the hyperparameters (mini-batch size and learning rate) and omitted the NSP objective. DistilBERT [72] is a model trained by distilling Google's BERT; it reduces the number of parameters by 40% (to 66 M) compared to BERT-base and runs faster while almost matching BERT's performance.
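For orientation, the differences among the four fine-tuned models can be collected into a small summary table. The figures repeat what is stated above; details not given in the text are deliberately left out.

```python
# Summary of the compared models, per the descriptions above.
MODELS = {
    "BERT-base": {
        "params_M": 110,
        "notes": "12 encoder blocks, 768 hidden dims, 12 heads, 512 max sequence length",
    },
    "Multilingual BERT": {
        "notes": "same MLM pre-training, on Wikipedia text covering 104 languages",
    },
    "RoBERTa": {
        "notes": "retuned mini-batch size and learning rate; NSP objective omitted",
    },
    "DistilBERT": {
        "params_M": 66,
        "notes": "distilled from BERT, ~40% fewer parameters, faster inference",
    },
}

for name, info in MODELS.items():
    print(f"{name}: {info['notes']}")
```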

Model Fine-Tuning for Toxic Comment Classification
The BERT-base model is fine-tuned on documents retrieved from Wikipedia by the Kaggle competition. The dataset is from the Toxic Comment Classification Challenge, which was provided by the Conversation AI team. Conversation AI is a research initiative formed by Google and Jigsaw, both part of Alphabet Inc.
The provided class labels in the dataset were originally defined across six different types of toxicity: toxic, severe toxic, obscene, threat, insult, and identity-based hate. In this study, we use all six classes and the train/test samples provided in the original competition dataset to train, validate, and evaluate the model. Overall, the dataset contains 159,571 training samples and 153,165 testing samples.
Figure 1 shows the number of comments that belong to each category. A comment can belong to one, two, three, or more categories, as shown in Figure 2. In total, there are 16,225 labeled comments. There are also 143,346 comments that do not belong to any of the categories; such comments were deemed non-toxic, and their number is shown under the none column. Fine-tuning BERT-base on a multi-label text classification problem is done by adding a dropout layer and a classification layer (a simple feed-forward layer with a standard sigmoid). The layers are placed on top of the transformer output for the [CLS] token. In the fine-tuning, most hyper-parameters stay the same as in BERT's original training. Hence, a BERT-compatible tokenization model was used, which contains around 30,000 tokens. During this stage, the classifier and the pre-trained model weights are trained jointly. For fine-tuning, we use a maximum sequence length of 169 tokens, a batch size of 16, and a learning rate of 2 × 10−5. We use the AUC-ROC (Area Under the Receiver Operating Characteristic Curve) score as an evaluation metric to assess the classification model performance and choose the best fine-tuned model. The overall score is calculated by averaging the predicted AUCs over the classes. This metric is the standard evaluation method used in the Kaggle competition, and it can be defined as in Equation (1).
AUC = (Σ_{x∈X} Σ_{y∈Y} δ(x, y)) / (|X| · |Y|), (1)

where δ indicates all the possible comparisons between the subsets X and Y. Two related metrics, sensitivity and specificity, are calculated as in Equations (2) and (3):

Sensitivity = True positive / (True positive + False negative), (2)

Specificity = True negative / (True negative + False positive), (3)

where true positive (TP) stands for positive samples correctly predicted, true negative (TN) for negative samples correctly predicted, false positive (FP) for negative samples misclassified as positive, and false negative (FN) for positive samples misclassified as negative.
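The classification head described in this section, dropout followed by a single feed-forward layer with a per-class sigmoid over the [CLS] representation, can be sketched as follows. This is a NumPy stand-in for illustration only: the dimensions (768 hidden units, six classes) follow the configuration in the text, but the weights and initialization are hypothetical, not the trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_head(cls_vector, W, b, dropout_mask=None):
    """Map a [CLS] vector of shape (hidden_dim,) to per-class toxicity probabilities.
    Each class gets an independent sigmoid, so a comment can carry several labels."""
    h = cls_vector if dropout_mask is None else cls_vector * dropout_mask
    logits = h @ W + b      # shape: (num_classes,)
    return sigmoid(logits)  # independent probability per class, not a softmax

rng = np.random.default_rng(0)
hidden_dim, num_classes = 768, 6  # BERT-base hidden size, six toxicity classes
W = rng.normal(scale=0.02, size=(hidden_dim, num_classes))
b = np.zeros(num_classes)
probs = multilabel_head(rng.normal(size=hidden_dim), W, b)
print(probs.shape)  # (6,)
```

The per-class sigmoid (rather than a softmax over the six classes) is what makes the multi-label behavior possible: a single comment can be simultaneously toxic, obscene, and an insult.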

Twitter Data Retrieval and Cleaning
The described methodological framework was applied to a Twitter data stream collected over two time periods: (i) the first four months of 2019, in which we collected around 14,000 tweets (Dataset1), and (ii) from 1 October 2019 to 31 March 2020, in which we collected around 10,000 tweets (Dataset2). Tweets were collected via the Twitter API using several search terms and hashtags such as Brexit, #Brexit, #BrexitBetrayal, and #StopBrexit. Figure 3 shows the most frequent words in the collected datasets. The topic of Brexit has sparked a lot of tension among British citizens, and it is most prevalent on Twitter. Brexit is the term used for the process of the United Kingdom (UK) leaving the European Union (EU) [73]. A quick look at the several Brexit-related hashtags on Twitter shows the genuine disagreements people have on the matter. Some users convey their thoughts in a cordial manner, while others do not. People from across the political spectrum have taken to social media sites like Twitter to vent their frustrations and give their opinions, among other things. The conversation around Brexit on Twitter ranges from sensible to chaotic. During the first collection period, the then prime minister, Theresa May, proposed several deals that were all rejected by parliament, and each such occurrence produced a considerable uptick in the Brexit conversation on Twitter. The model proposed in the methodology is therefore used to check the toxicity of conversations around Brexit on Twitter.
A look at the raw data reveals a lot of unnecessary information within the tweets; only English tweets were included. Unnecessary in this case means bits and pieces of information that add no value to the model classification. Since these tweets were collected from hashtags, they contain hashtags. However, instead of removing the whole hashtag, only the "#" character was removed, leaving the text of the hashtag. Emojis, the small digital icons used in online text such as tweets, are also present in the collected data. As the model was not trained on emoji data, they were removed. Numbers, punctuation, and URL links in the tweets were also removed from the dataset.
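The cleaning steps above, keeping the hashtag text while dropping "#" and stripping URLs, emojis, numbers, and punctuation, can be sketched with standard regular expressions. This is an illustrative implementation of those steps, not the authors' exact code; in particular, it treats all non-ASCII characters as emoji-like symbols to remove.

```python
import re
import string

def clean_tweet(text: str) -> str:
    """Light cleaning applied to tweets before tokenization."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URL links
    text = text.replace("#", "")                        # keep hashtag text, drop '#'
    text = re.sub(r"[^\x00-\x7F]+", " ", text)          # drop emojis / non-ASCII symbols
    text = re.sub(r"\d+", " ", text)                    # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    return " ".join(text.split())                       # normalize whitespace

print(clean_tweet("Vote NOW!! #StopBrexit 2019 \U0001F600 https://t.co/xyz"))
# -> "Vote NOW StopBrexit"
```

The order matters: URLs are removed before punctuation, since stripping punctuation first would leave URL fragments behind.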

Experiments and Results
We adopt BERT-base as the basis for all experiments, building on the publicly available implementation of BERT (TensorFlow, https://github.com/huggingface/transformers (accessed on 1 February 2021)), and we follow the fine-tuning regime specified in [48]. While many submissions to the Kaggle leaderboard for the Toxic Comment Classification Challenge achieved high results, our fine-tuned model performed very well on the challenge test set, scoring 0.98561 (AUC-ROC) as a public score and 0.98603 (AUC-ROC) as a private score. Note that the maximum sequence length is set to 169 in our experiments, with a batch size of 16 used for fine-tuning and 32 during the testing phase. The following sections present the results and analysis for our experiments, starting with the fine-tuning stage of the BERT model, followed by the classification of the collected tweet datasets (Dataset1 and Dataset2) and an analysis of the model's predictions. Additionally, we compare the BERT-base model to three other models, namely, Multilingual BERT, RoBERTa, and DistilBERT. Table 1 lists the comparison results. It is clear that the BERT-base model recorded the best private and public scores of 0.9890 and 0.9856, respectively. Multilingual BERT obtained the second rank, followed by DistilBERT and RoBERTa.

Fine-Tuning BERT
Figure 4 shows the change in AUC-ROC value for each category during the fine-tuning (training) of BERT-base on the Toxic Comment Classification Challenge dataset described in Section 3.3. The reported changes over the training steps show that the fine-tuned model performed well in learning to classify different comments. In the toxic, severe toxic, obscene, and insult categories, the model scored less than 0.81 during the first 500 training steps; it then starts fitting the data from 1000 training steps until it reaches the highest AUC-ROC values in all categories. In the threat and identity hate categories, the model starts reaching AUC-ROC values greater than 0.90 only after 2000 and 1000 training steps, respectively. This is due to the small number of training samples in the threat and identity hate categories. Figure 5 shows the training loss during the fine-tuning process, where the model reaches its lowest value (0.0583) after 8500 training steps, taking around 48 min. Figure 6 shows the obtained AUC-ROC values for each category during the fine-tuning (validation) of BERT-base, where 10% of the training samples are used as a validation set. As shown in the figure, most of the categories are classified with an AUC-ROC value greater than 0.98, and the average AUC-ROC value over all categories is 0.9878 with a loss value of 0.0374. These findings support the results obtained during the evaluation of the fine-tuned model on the testing set (the evaluation of the testing set predictions was conducted on the Kaggle platform, https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge (accessed on 1 January 2021)).
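The per-category AUC-ROC values tracked during fine-tuning follow Equation (1): for each class, every predicted score for a positive sample is compared against every score for a negative sample, with ties counting as half. A minimal pairwise implementation is shown below, together with the sensitivity and specificity of Equations (2) and (3); the scores in the example are hypothetical, not taken from the experiments.

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) score pairs ranked correctly;
    ties contribute 0.5. This is the ranking form of Equation (1)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def sensitivity(tp, fn):
    """Equation (2): TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Equation (3): TN / (TN + FP)."""
    return tn / (tn + fp)

# Hypothetical scores for one toxicity class:
print(pairwise_auc([0.9, 0.8], [0.1, 0.85]))  # 0.75: 3 of 4 pairs ranked correctly
```

In practice a library routine (e.g., scikit-learn's `roc_auc_score`) computes the same quantity far more efficiently; the quadratic pairwise form here is only meant to make Equation (1) concrete.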

Prediction Results
For Dataset1, 10,000 tweets matching the search term Brexit and the hashtag #BrexitBetrayal were collected. After the tweets were cleaned and tokenized, they were fed to the model. Figure 7a shows the results of these predictions. The results show that most of the tweets were classified as an insult, with the threat class in second rank and the obscene class in third; the identity hate class is fourth, followed by the toxic class. Figure 7b shows the number of tweets in Dataset1 having more than one label predicted by the model. The figure shows that 7320 tweets have three labels, which supports the results in Figure 7a, where the threat, insult, and identity hate categories dominate the predicted classes.
Figure 8a shows the prediction results for Dataset2 with the BERT-base model. The insult class attracted the most tweets, followed by the threat and obscene categories, respectively.
Figure 8b shows the number of tweets in Dataset2 having more than one label predicted by the BERT-base model. The figure shows that 9187 tweets have three labels, which is greater than the number reported for Dataset1, with few tweets having one label or more than three labels. Furthermore, for the Multilingual BERT model on Dataset1, Figure 9a shows that most of the tweets were also classified as an insult, and Figure 9b indicates that 11,014 tweets have three labels. For Dataset2, Figure 10a shows that the identity hate class came first, and Figure 10b shows that 9495 tweets have three labels. Additionally, for the RoBERTa model on Dataset1, most of the tweets were classified into the insult and threat classes, with 13,995 and 14,000 tweets, respectively, as shown in Figure 11a; moreover, Figure 11b indicates that 12,704 tweets have two labels. For Dataset2, as shown in Figure 12a, the severe toxic class came first, with 9970 tweets, and Figure 12b shows that about 5269 tweets have four labels. Finally, for the DistilBERT model, Figure 13a shows that most of the tweets in Dataset1 were classified as toxic and insult, and Figure 13b indicates that 7019 tweets have two labels. For Dataset2, Figure 14a shows that severe toxic, insult, and toxic are the most predicted classes, respectively, and about 5002 tweets have four labels, as shown in Figure 14b.
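The per-tweet label counts reported above can be derived from the sigmoid outputs by thresholding each class probability and counting how many tweets receive exactly k labels. The sketch below illustrates the tallying with a hypothetical 0.5 threshold and made-up model outputs; it is not the authors' code.

```python
from collections import Counter

CLASSES = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def label_count_histogram(probabilities, threshold=0.5):
    """probabilities: one list of six class probabilities per tweet.
    Returns {k: number of tweets with exactly k predicted labels}."""
    counts = Counter(sum(p >= threshold for p in tweet) for tweet in probabilities)
    return dict(counts)

# Hypothetical model outputs for three tweets:
probs = [
    [0.9, 0.1, 0.8, 0.7, 0.95, 0.2],  # 4 labels
    [0.4, 0.2, 0.1, 0.6, 0.7, 0.1],   # 2 labels
    [0.1, 0.1, 0.1, 0.2, 0.3, 0.1],   # 0 labels -> deemed non-toxic
]
print(label_count_histogram(probs))  # {4: 1, 2: 1, 0: 1}
```

Tweets whose six probabilities all fall below the threshold receive zero labels and correspond to the non-toxic ("none") column discussed earlier.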

Tweets Analysis
The following figures show the most common words in each class, presented as word clouds. Figure 15 shows the most common words from tweets in the toxic class. The word Brexit comes up as the most used word, which is understandable given the context. Other words like party, vote, and UK also appear. Harsh words like idiot, shit, and fuck are also among the most frequently used words in this class. However, the model did not classify any tweet as severe toxic.

Conclusions
In this study, we addressed toxicity detection in social media using deep learning techniques. We adopted the Bidirectional Encoder Representations from Transformers (BERT) model to classify toxic comments from user-generated data in social media, such as tweets. The BERT-base pre-trained model was fine-tuned on a well-known labeled toxic comment dataset, the Kaggle public dataset. Moreover, the proposed model was tested on real-world data: two different tweet datasets collected in two different periods, based on a case study of the UK Brexit. The evaluation outcomes showed that BERT is able to classify and predict toxic comments with a high accuracy rate. Moreover, we compared the BERT-base model to three models, namely, Multilingual BERT, RoBERTa, and DistilBERT. The BERT-base model outperformed all compared models and achieved the best results.
In future work, the model could be made better suited to dealing with specific social media data. The size of the dataset could be increased to include tweets, training the model with more Twitter-related data. These tweets would have to be hand-labeled, which would take a fair amount of time to gather enough data to increase the accuracy of the model. One benefit of adding tweets is that tweets containing emojis could be kept in the dataset, allowing the model to be trained to account for their presence. Toxicity-labeled tweet data is currently not available; however, the ability to label a massive dataset of this type could be hugely beneficial in the long run. Aside from Twitter data, text from other social media sites like Facebook, YouTube, and Reddit could be added to improve the dataset.

Figure 1. Comments distribution in all categories.

Figure 4. Training AUC-ROC in each training step for each comment type.

Figure 5. Loss in each training step during fine-tuning.

Figure 6. Validation AUC-ROC for each comment type.

Figure 16 shows the common words of the obscene class, with Brexit the most common among the tweets in it. Harsh words also show up more frequently here compared to the other classes.

Figure 19. Word clouds for the identity hate category in (a) Dataset1 and (b) Dataset2.

Table 1. Evaluated models for the fine-tuning task.