Search Results (5)

Search Parameters:
Keywords = weighted CBOW

20 pages, 4507 KiB  
Article
Cyberbullying Detection on Social Media Using Stacking Ensemble Learning and Enhanced BERT
by Amgad Muneer, Ayed Alwadain, Mohammed Gamal Ragab and Alawi Alqushaibi
Information 2023, 14(8), 467; https://doi.org/10.3390/info14080467 - 18 Aug 2023
Cited by 29 | Viewed by 7921
Abstract
The prevalence of cyberbullying on Social Media (SM) platforms has become a significant concern for individuals, organizations, and society as a whole. Early detection and intervention are critical to mitigating its harmful effects, and in recent years ensemble learning has shown promising results for detecting cyberbullying on social media. This paper presents a stacking ensemble learning approach for detecting cyberbullying on Twitter using a combination of Deep Neural Network (DNN) methods, and introduces BERT-M, a modified BERT model. The dataset used in this study was collected from Twitter and preprocessed to remove irrelevant information. For feature extraction, word2vec with Continuous Bag of Words (CBOW) formed the weights of the embedding layer. These features were then fed into a convolutional and pooling mechanism, effectively reducing their dimensionality and capturing the position-invariant characteristics of the offensive words. The proposed stacked model and BERT-M were validated using well-known model evaluation measures. The stacked model achieved an F1-score of 0.964, precision of 0.950, and recall of 0.920, with a reported detection time of 3 min, surpassing the previously reported accuracy and speed scores for known NLP cyberbullying detectors, including standard BERT and BERT-M. The stacking ensemble learning approach achieved an accuracy of 97.4% on the Twitter dataset and 90.97% on a combined Twitter and Facebook dataset. These results demonstrate the effectiveness of the proposed stacking ensemble learning approach for detecting cyberbullying on SM and highlight the value of combining multiple models for improved performance.
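The embedding-then-convolution-then-pooling step this abstract describes can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's architecture: the dimensions, random weights, and token IDs are all illustrative stand-ins (in the paper, the embedding matrix would hold pretrained word2vec-CBOW vectors and the filters would be learned).

```python
import numpy as np

# Sketch of the feature-extraction pipeline: CBOW vectors form the
# embedding matrix, a 1D convolution slides over the token sequence,
# and global max-pooling keeps the strongest (position-invariant)
# response per filter.
rng = np.random.default_rng(0)
vocab_size, embed_dim, num_filters, width = 50, 8, 4, 3

embedding = rng.normal(size=(vocab_size, embed_dim))       # stand-in for word2vec-CBOW weights
filters = rng.normal(size=(num_filters, width, embed_dim)) # stand-in for learned conv filters

def conv_maxpool(token_ids):
    """Embed tokens, convolve each filter over the sequence, ReLU,
    then global max-pool: one pooled feature per filter."""
    x = embedding[token_ids]                    # (seq_len, embed_dim)
    seq_len = len(token_ids)
    responses = np.empty((num_filters, seq_len - width + 1))
    for f in range(num_filters):
        for t in range(seq_len - width + 1):
            responses[f, t] = np.sum(filters[f] * x[t:t + width])
    return np.maximum(responses, 0.0).max(axis=1)   # (num_filters,)

features = conv_maxpool(np.array([3, 17, 42, 5, 9, 28]))
print(features.shape)
```

Because max-pooling keeps only the strongest filter response regardless of where in the tweet it occurred, the resulting features are insensitive to the position of an offensive word, which is the property the abstract highlights.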

21 pages, 4970 KiB  
Article
Cyberbullying Detection on Twitter Using Deep Learning-Based Attention Mechanisms and Continuous Bag of Words Feature Extraction
by Suliman Mohamed Fati, Amgad Muneer, Ayed Alwadain and Abdullateef O. Balogun
Mathematics 2023, 11(16), 3567; https://doi.org/10.3390/math11163567 - 17 Aug 2023
Cited by 35 | Viewed by 6601 | Correction
Abstract
Since social media platforms are widely used and popular, they offer more opportunities than we can even imagine. Despite these well-known benefits, some users abuse them to humiliate, insult, bully, and harass other people, which is why such negative activity must be detected to create a safe cyberspace for innocent people. This study provides a comparative analysis of deep learning methods, tested and evaluated on a well-known global Twitter dataset. To recognize abusive tweets and overcome existing challenges, attention-based deep learning methods are introduced. Concatenated word2vec CBOW vectors formed the weights of the embedding layer and were used to extract the features. The feature vector was fed into a convolution and pooling mechanism, reducing its dimensionality while learning the position-invariant characteristics of the offensive words, and a softmax function produces the final classification. Using benchmark experimental datasets and well-known evaluation measures, the attention-based 1D convolutional long short-term memory (Conv1DLSTM) classifier was found to outperform the other implemented deep learning methods.
(This article belongs to the Special Issue Advances in Machine Learning and Applications)
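The attention step the abstract mentions, scoring each token's hidden state, normalizing the scores with a softmax, and forming a weighted sum before classification, can be sketched as follows. This is a minimal NumPy illustration under assumed shapes: the hidden states stand in for Conv1D-LSTM outputs, and all weights are random rather than trained.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
seq_len, hidden, classes = 6, 5, 2
H = rng.normal(size=(seq_len, hidden))      # per-token hidden states (stand-in for Conv1D-LSTM output)
w_att = rng.normal(size=hidden)             # attention scoring vector (stand-in)
W_out = rng.normal(size=(hidden, classes))  # classification layer (stand-in)

scores = np.tanh(H @ w_att)       # one relevance score per token
alpha = softmax(scores)           # attention weights, sum to 1
context = alpha @ H               # weighted sum emphasises the most relevant tokens
probs = softmax(context @ W_out)  # softmax classification, as in the abstract
print(alpha.sum(), probs.sum())
```

The attention weights let the classifier focus on the tokens most indicative of abuse instead of treating every position in the tweet equally.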

19 pages, 2715 KiB  
Article
A Node Embedding-Based Influential Spreaders Identification Approach
by Dongming Chen, Panpan Du, Bo Fang, Dongqi Wang and Xinyu Huang
Mathematics 2020, 8(9), 1554; https://doi.org/10.3390/math8091554 - 10 Sep 2020
Cited by 11 | Viewed by 3339
Abstract
Node embedding is a representation learning technique that maps network nodes into a lower-dimensional vector space. Embedding nodes into vector space benefits network analysis tasks such as community detection, link prediction, and influential node identification, both computationally and in breadth of application. In this paper, we propose a two-step node embedding-based solution for the social influence maximization problem (IMP). In the first step, the solution employs a revised network-embedding algorithm to map input nodes into vector space. In the second step, it clusters the embedded nodes into subgroups and chooses the subgroups' centers as the influential spreaders. The proposed approach is a simple but effective IMP solution because it accounts for the social reinforcement and homophily characteristics of the social network in the node embedding and seed selection steps, respectively. Information propagation simulations with single-point-contact and full-contact susceptible-infected-recovered (SIR) models on six different types of real network datasets showed that the proposed social influence maximization (SIM) solution exhibits significant propagation capability.
(This article belongs to the Special Issue Computational Mathematics and Neural Systems)
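The second step the abstract describes, clustering embedded nodes and taking each cluster's center as a spreader, can be sketched with a plain k-means pass. This is a simplified stand-in, not the paper's revised embedding algorithm: the node embeddings here are random, and the "center" of each subgroup is taken to be the node closest to its k-means centroid.

```python
import numpy as np

def select_spreaders(emb, k, iters=20, seed=0):
    """Cluster node embeddings with plain k-means and return, for each
    cluster, the index of the node nearest its centroid -- the
    candidate influential spreaders."""
    rng = np.random.default_rng(seed)
    centres = emb[rng.choice(len(emb), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(emb[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centres[c] = emb[labels == c].mean(axis=0)
    dist = np.linalg.norm(emb[:, None, :] - centres[None, :, :], axis=2)
    return [int(dist[:, c].argmin()) for c in range(k)]

rng = np.random.default_rng(2)
embeddings = rng.normal(size=(30, 4))   # stand-in node embeddings
seeds = select_spreaders(embeddings, k=3)
print(seeds)
```

Picking one seed per cluster spreads the chosen nodes across structurally distinct regions of the network, which is what makes cluster centers plausible spreader candidates under homophily.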

13 pages, 2178 KiB  
Article
Multidocument Arabic Text Summarization Based on Clustering and Word2Vec to Reduce Redundancy
by Samer Abdulateef, Naseer Ahmed Khan, Bolin Chen and Xuequn Shang
Information 2020, 11(2), 59; https://doi.org/10.3390/info11020059 - 23 Jan 2020
Cited by 48 | Viewed by 6258
Abstract
Arabic is one of the most semantically and syntactically complex languages in the world. Text summarization is a key challenge in text mining, so we propose an unsupervised score-based method that combines the vector space model, continuous bag of words (CBOW), clustering, and a statistically based method. Multidocument text summarization suffers from noisy data, redundancy, diminished readability, and sentence incoherency. In this study, we adopt a preprocessing strategy to solve the noise problem and use the word2vec model for two purposes: first, to map words to fixed-length vectors and, second, to obtain the semantic relationships between those vectors. Similarly, we use the k-means algorithm for two purposes: (1) selecting the distinctive documents and tokenizing them into sentences, and (2) in a second iteration, selecting the key sentences based on a similarity metric to overcome the redundancy problem and generate the initial summary. Lastly, we use weighted principal component analysis (W-PCA) to weight the encoded sentences against a list of features and select the highest-weighted, most important sentences, addressing the incoherency and readability problems. We adopted Recall-Oriented Understudy for Gisting Evaluation (ROUGE) as an evaluation measure to examine the proposed technique against state-of-the-art methods. An experiment on the Essex Arabic Summaries Corpus (EASC) using the ROUGE-1 and ROUGE-2 metrics showed promising results in comparison with existing methods.
(This article belongs to the Special Issue Natural Language Generation and Machine Learning)
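The redundancy-reduction idea in this abstract, keeping high-scoring sentences while dropping near-duplicates, can be illustrated with a simple greedy filter over sentence vectors. This is a simplified stand-in for the paper's second k-means pass and W-PCA weighting, not the actual method: the vectors, scores, and similarity threshold below are all illustrative.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two sentence vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_summary(sent_vecs, scores, max_sents=2, max_sim=0.8):
    """Greedy redundancy filter: take sentences in descending score
    order, skipping any that is too similar to one already chosen."""
    chosen = []
    for i in np.argsort(scores)[::-1]:
        if all(cosine(sent_vecs[i], sent_vecs[j]) < max_sim for j in chosen):
            chosen.append(int(i))
        if len(chosen) == max_sents:
            break
    return chosen

vecs = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]])  # sentences 0 and 1 are near-duplicates
scores = np.array([0.9, 0.8, 0.7])
print(pick_summary(vecs, scores))   # [0, 2]: sentence 1 is dropped as redundant
```

Sentence 1 scores higher than sentence 2 but is nearly identical to sentence 0, so the filter skips it, which is exactly the redundancy problem the abstract targets.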

12 pages, 252 KiB  
Article
Learning Word Embeddings with Chi-Square Weights for Healthcare Tweet Classification
by Sicong Kuang and Brian D. Davison
Appl. Sci. 2017, 7(8), 846; https://doi.org/10.3390/app7080846 - 17 Aug 2017
Cited by 28 | Viewed by 5120
Abstract
Twitter is a popular source for monitoring healthcare information and public disease. However, tweets contain much noise: even when appropriate keywords appear, they do not guarantee that a tweet is truly health-related, so traditional keyword-based classification is largely ineffective. Word embedding algorithms have proved useful in many natural language processing (NLP) tasks. We introduce two algorithms based on an existing word embedding learning algorithm, the continuous bag-of-words model (CBOW), and apply them to the task of recognizing healthcare-related tweets. In the CBOW model, the vector representation of a word is learned from its contexts; to simplify computation, the context is represented by the average of all words inside the context window. However, not all words in the context window contribute equally to predicting the target word: greedily incorporating every word limits the contribution of the useful semantic words and brings noisy or irrelevant words into the learning process. Existing weighted CBOW variants base their weights on pre-defined syntactic rules and ignore the task for which the embedding is learned. We instead propose learning weights from the words' relative importance in the classification task, on the intuition that such weights emphasize words that contribute more to the downstream task. We evaluate the embeddings learned by our algorithms on two healthcare-related datasets. The experimental results demonstrate that embeddings learned from the proposed algorithms outperform existing techniques by a relative accuracy improvement of over 9%.
(This article belongs to the Special Issue Smart Healthcare)
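One common way to score a word's importance for a classification task is the chi-square statistic over a word/class contingency table, which can then replace CBOW's uniform context average with a weighted one. The sketch below illustrates that general idea only; the paper learns its weights jointly with the embedding, and the counts and vectors here are made-up toy values.

```python
import numpy as np

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 word/class contingency table:
    n11 = docs containing the word in the class, n10 = containing the
    word outside the class, n01/n00 = the same for docs without it."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return num / den if den else 0.0

def weighted_context(vectors, weights):
    """Weighted context average: each context word contributes in
    proportion to its importance weight, instead of CBOW's plain mean."""
    w = np.asarray(weights, dtype=float)
    return (w[:, None] * vectors).sum(axis=0) / w.sum()

vecs = np.array([[1.0, 0.0], [0.0, 1.0]])    # toy context-word vectors
w_informative = chi_square(40, 10, 10, 40)   # strongly class-associated word
w_noise = chi_square(25, 25, 25, 25)         # class-independent word
ctx = weighted_context(vecs, [w_informative, w_noise + 1e-9])
print(w_informative, w_noise, ctx)
```

A word distributed evenly across classes gets a chi-square weight of zero, so the weighted average is dominated by the class-informative word, which is the behavior the task-aware weighting aims for.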
