Article

A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter

1 Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar 32160, Malaysia
2 Information Systems Department, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia
* Author to whom correspondence should be addressed.
Future Internet 2020, 12(11), 187; https://doi.org/10.3390/fi12110187
Received: 9 October 2020 / Revised: 19 October 2020 / Accepted: 20 October 2020 / Published: 29 October 2020
(This article belongs to the Special Issue Artificial Intelligence and Machine Learning in Cybercrime Detection)
The advent of social media, particularly Twitter, has raised many issues that stem from misunderstandings about the concept of freedom of speech. One of these issues is cyberbullying, a critical global problem that affects both individual victims and societies. Many approaches have been proposed in the literature to intervene in, prevent, or mitigate cyberbullying; however, because these approaches rely on the victims' interactions, they are not practical. Detecting cyberbullying without the involvement of the victims is therefore necessary. In this study, we explore this issue by compiling a global dataset of 37,373 unique tweets from Twitter. Seven machine learning classifiers were used, namely, Logistic Regression (LR), Light Gradient Boosting Machine (LGBM), Stochastic Gradient Descent (SGD), Random Forest (RF), AdaBoost (ADB), Naive Bayes (NB), and Support Vector Machine (SVM). Each classifier was evaluated using accuracy, precision, recall, and F1 score as the performance metrics to determine its recognition rate on the global dataset. The experimental results show the superiority of LR, which achieved a median accuracy of around 90.57%. Among the classifiers, LR achieved the best F1 score (0.928), SGD achieved the best precision (0.968), and SVM achieved the best recall (1.00).
Keywords: cyberbullying detection; tweets classification; Twitter; logistic regression; random forest; light GBM; SGD; AdaBoost; naive bayes; SVM
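
The four performance metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function name `binary_metrics`, the `Metrics` container, and the toy labels below are assumptions made for illustration, with label 1 assumed to mean "cyberbullying" and label 0 "not cyberbullying".

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; `positive` is the label
    treated as the target class (assumed here to be cyberbullying).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that the classifier found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f1)


# Toy labels, invented for this example only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
result = binary_metrics(y_true, y_pred)
print(result)
```

Because recall counts only missed positives, a classifier that labels every tweet as positive reaches a recall of 1.0 regardless of its precision. This is one reason the four metrics are usually read together rather than in isolation.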
MDPI and ACS Style

Muneer, A.; Fati, S.M. A Comparative Analysis of Machine Learning Techniques for Cyberbullying Detection on Twitter. Future Internet 2020, 12, 187. https://doi.org/10.3390/fi12110187

