Peer-Review Record

Social Media Hate Speech Detection Using Explainable Artificial Intelligence (XAI)

Algorithms 2022, 15(8), 291; https://doi.org/10.3390/a15080291
by Harshkumar Mehta and Kalpdrum Passi *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4:
Submission received: 29 June 2022 / Revised: 10 August 2022 / Accepted: 11 August 2022 / Published: 17 August 2022
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

Round 1

Reviewer 1 Report

The paper describes experiments on hate speech detection using tools that enable classifier interpretability, with two public datasets for training and testing.

Despite the interesting topic, the main weakness of this paper is that it lacks a clear definition of the actual research goals pursued in this work, as well as of its intended contribution. The introductory section does not explain what this work is about, the research question(s) it intends to address, or the contribution it aims to provide. No background information whatsoever is given on the hate speech phenomenon or on the hate speech detection task (along with its still open challenges), nor on why this task might benefit from the integration of tools that enable model interpretability. As a result, the reader moves on to Section 2, which introduces the datasets, without even knowing what they are used for. Also, the authors do not explain the rationale behind the choice of these two datasets in particular, among all the existing hate speech benchmarks available.

The article contains a great deal of unnecessary detail that is either uninformative (e.g., sects. 2.3, 2.8, 2.9.1) or related to concepts whose definitions are well established and presumably known to the average reader of this journal (such as feature extraction, NN architectures, logistic regression, and Naive Bayes classifiers, to name a few). On the other hand, it fails to provide a clear and proper overview of fundamental notions, especially as regards BERT. I'll list a few examples:

- l. 407, "BERT has undergone continual unsupervised learning and hence, continual improvement.": this is totally unclear and should be rephrased

- l. 401, "BERT uses a ‘transformer’ which is a part of the model responsible for providing BERT with increased capacity for understanding context and ambiguity in language.": this sentence is really vague and gives an inaccurate explanation of the transformer architecture and the attention mechanism

- the Next Sentence Prediction (NSP) task is first mentioned in the TF-IDF section, but an actual definition of the task is only provided later in the paper

- the content in ll. 425-432, which covers the above-mentioned NSP task and Masked LM, does not actually explain what these tasks involve (that is, simply put, predicting whether a given sentence B follows sentence A, and predicting the masked word within a sentence); a minimal sketch of both tasks is given below

All these issues make the article a hard read and prevent a fair evaluation of the actual contribution of this work.

Some other comments follow:

- the "Literature Review" section is ill-structured and confusing, since it mixes related work on the automatic detection of hate speech and related phenomena with contributions on XAI. Moreover, what is proposed in sect. 1.3 cannot be considered an actual literature review, but rather a plain list of paper summaries. A proper literature review should ideally put the presented work in the right context, highlighting both the state of the art in the field and the open issues. This section should be fully restructured with these principles in mind

- content-wise, the article is also full of redundant text, with sentences and complete paragraphs repeated nearly verbatim multiple times, for example the "black-box" definition on pp. 1 and 2, the definition of tokenization in l. 301 and l. 304, or the LIME description in sections 2.10.4 and 3.2.3

- the steps described in sect. 2.4 are normal preprocessing steps usually performed before any classification task; how they are supposed to remove biases, as claimed in the abstract (l. 13), is neither justified nor further motivated by the authors

- on a side note, the "data cleaning" steps in sect. 2.5 are still typical preprocessing operations, so the two sections can simply be merged into one (a minimal sketch of these standard steps follows this list)

- l. 275: "Data cleaning makes the data set error-free": this claim is quite debatable, as the notion of error is too vague and does not clarify what is actually supposed to be "correct"

- the steps depicted in Figure 1 are not described anywhere in the text

- the AngryBERT model should point to the proper reference paper
Author Response

Please see attached file

Author Response File: Author Response.docx

Reviewer 2 Report

This paper uses LIME for interpreting and explaining the predictions of several models on two hate speech datasets. The manuscript is centered on an interesting topic, but some problems need to be fixed:

Lacking overview of XAI – several papers that discuss contributions to the field are introduced; however, an actual discussion presenting the available XAI models and how they are deployed is missing.

When presenting the different approaches adopted for the task of hate speech detection, we don't know what these models are (e.g., 'deep learning architecture' (line 114); 'an approach for detecting bullying and aggressive behavior on Twitter has been proposed' (line 122)). What are the deep learning approaches that were proposed? Section 2.10 also introduces several classification methods; however, the actual state-of-the-art methods used in hate speech detection are missing.

In order to normalize the text as much as possible while retaining all relevant semantic information, it is necessary to carefully preprocess the data. The authors include the removal of stopwords and punctuation in the preprocessing step. It is unclear whether the BERT-based models used the preprocessed text. This should be specified; and if the answer is yes, why?

When introducing the datasets (Section 2), please specify the language of these datasets. Given the multitude of available hate speech corpora, why were these two datasets selected?

'The label has been derived from the class column in the original data set and the category label has values 0, 1 and 2 encoded from the columns (hate_speech, offensive language, neither) in the existing dataset' (lines 328-330) – this statement appears to refer to the HateXplain dataset. How were the labels encoded for the Jigsaw dataset?

Did the authors try optimizing the hyperparameters of the models that were employed?

Why did the authors choose BERT + MLP and BERT + ANN? Why not plain BERT? It only becomes clear in the conclusion, when we get a comparison of the performance of these different models, that these architectures improve over the BERT model.

Devlin et al. recommend only 2-4 epochs of training for fine-tuning BERT on a specific NLP task (compared to the hundreds of GPU hours needed to pre-train the original BERT model). Is there any specific reason for using 50 epochs?

Was the bert-base-cased model used? This should be specified.

'Unbalanced data is dealt with using weight optimization and bias is set.' (line 626) – were any other methods explored? The authors could briefly introduce the different existing methods for dealing with unbalanced datasets (e.g., changing the loss function, downsampling, augmenting the dataset, etc.); one option is sketched below.

Which F1 score was used? Macro? This should be specified.

Some parts of the manuscript may be unclear to some readers. Sections 1.3 and 3 need better organization in order to resolve this lack of clarity and improve the overall readability of the paper.

  • In Section 1.3, I would suggest having a subsection that deals with XAI and another one that introduces the different hate speech detection methods.

  • Section 3 is difficult to read. I would suggest starting with a subsection that introduces the theoretical background for the models, followed by a subsection that presents the implementation and the hyperparameters that were used, and ending by presenting the results obtained for both datasets.

I would also suggest re-organizing Sections 2.4 (data preprocessing) and 2.5 (data cleaning), as some parts appear to repeat.

The paragraph at lines 47-53 is repeated at lines 73-79.

Some sections appear highlighted.

Line 9 – delete the stray period.

Line 260 – delete one of the commas

Lines 384-388 and 690-692 – these paragraphs could be itemized

In the tables, please present the best results in bold.

Some results are reported with three decimals, others with only two (e.g., Table 4); please use a consistent precision.

Another issue relates to the use of acronyms – every acronym should be defined only once (at its first occurrence) and used consistently afterwards (except in the abstract).

Author Response

Please see attached file

Author Response File: Author Response.docx

Reviewer 3 Report

The subject of the paper "Social Media Hate Speech Detection Using Explainable Artificial Intelligence (XAI)" is timely and valuable to the audience of Algorithms. The researchers present the results of hate speech detection on two datasets.

Overall, the paper is well structured, reads well, and covers the existing literature quite well. The analysis of the data is interesting and well documented. However, in my view, some major amendments are required prior to publication.

First of all, some parts of the paper have a yellow background. As this is the first round of review, I do not know what this highlighting is meant to indicate.

In the introduction section, I would recommend organizing the literature review differently. Right now, there is one paragraph per cited paper (lines 94 to 221), which is not useful for the reader. A better approach would be to organize the review into common topics and cite the relevant papers under each topic.

In the materials and methods section, I feel that Section 2.9 does not need to be so long. If readers want to learn about different extraction methods, they can follow a reference. This section ends at line 513 of the paper, and we still have not reached the results of your work.

In the results section, you are supposed to present only the results of your own work. Still, long passages of text that do not fit this section appear here; examples are the explanations of what LSTM (lines 517-519) and Random Forest (lines 567-576) are, and the same applies to other methods described in this section. It would be better (for the final reader) to use references for these explanations and reserve the results section for the results.

Author Response

Please see attached file

Author Response File: Author Response.docx

Reviewer 4 Report

The authors present comprehensive work on the detection of hate speech with the help of XAI. The paper is well written, with a clear aim and specific objectives. A few concerns need to be addressed before it can be accepted.

- The abstract should be rewritten without mentioning or comparing the existing literature; instead, the authors can highlight the performance metrics (and their values) achieved in this study

- Please add citations for statements such as those in lines 32-33

- The abbreviation AI should be defined before its first use

- The literature review is well written and detailed

- The Google Jigsaw dataset was adopted from Kaggle; please add the source website

- The results for TF-IDF are missing

Author Response

Please see attached file

Round 2

Reviewer 1 Report

The paper has improved compared to the previous version, but some weaknesses in the workflow design and description persist. First, I do not see the point of finding the best model on Google Jigsaw when the tests on explainability were then run on the HateXplain benchmark. More generally, the text is fragmented and lacks cohesion, which prevents the reader from fully understanding both the workflow and the rationale behind the experimental design.

Also, there are still vague and imprecise expressions in the manuscript, such as the notion of "structural error", mentioned in sect. 2.4 but never explained.

Author Response

Please see attached file

Author Response File: Author Response.pdf

Reviewer 3 Report

Thank you very much. All of my previous comments were correctly addressed. Thank you very much for clarifying the related work and results sections. I think that the manuscript has been significantly improved. I wish you good luck in your future work.

Author Response

Thank you for your valuable comments and encouragement.
