MDPI - Publisher of Open Access Journals

17 pages, 3202 KiB

Open AccessFeature PaperArticle

Arabic Spam Tweets Classification: A Comprehensive Machine Learning Approach

by Wafa Hussain Hantom and Atta Rahman

AI 2024, 5(3), 1049-1065; https://doi.org/10.3390/ai5030052 - 2 Jul 2024

Cited by 3 | Viewed by 2393

Nowadays, one of the most common problems faced by Twitter (also known as X) users, including individuals as well as organizations, is dealing with spam tweets. The problem continues to proliferate due to the increasing popularity and number of users of social media platforms. Due to this overwhelming interest, spammers can post texts, images, and videos containing suspicious links that can be used to spread viruses, rumors, negative marketing, and sarcasm, and potentially hack the user’s information. Spam detection is among the hottest research areas in natural language processing (NLP) and cybersecurity. Several studies have been conducted in this regard, but they mainly focus on the English language. However, Arabic tweet spam detection still has a long way to go, especially emphasizing the diverse dialects other than modern standard Arabic (MSA), since, in the tweets, the standard dialect is seldom used. The situation demands an automated, robust, and efficient Arabic spam tweet detection approach. To address the issue, in this research, various machine learning and deep learning models have been investigated to detect spam tweets in Arabic, including Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB) and Long-Short Term Memory (LSTM). In this regard, we have focused on the words as well as the meaning of the tweet text. Upon several experiments, the proposed models have produced promising results in contrast to the previous approaches for the same and diverse datasets. The results showed that the RF classifier achieved 96.78% and the LSTM classifier achieved 94.56%, followed by the SVM classifier that achieved 82% accuracy. Further, in terms of F1-score, there is an improvement of 21.38%, 19.16% and 5.2% using RF, LSTM and SVM classifiers compared to the schemes with same dataset. Full article

(This article belongs to the Special Issue Cybersecurity and Artificial Intelligence: Current and Future Developments)

► Show Figures

Figure 1

16 pages, 522 KiB

Open AccessArticle

Enhancing Detection of Arabic Social Spam Using Data Augmentation and Machine Learning

by Abdullah M. Alkadri, Abeer Elkorany and Cherry Ahmed

Appl. Sci. 2022, 12(22), 11388; https://doi.org/10.3390/app122211388 - 10 Nov 2022

Cited by 21 | Viewed by 3599

Abstract

In recent years, people have tended to use online social platforms, such as Twitter and Facebook, to communicate with families and friends, read the latest news, and discuss social issues. As a result, spam content can easily spread across them. Spam detection is considered one of the important tasks in text analysis. Previous spam detection research focused on English content, with less attention to other languages, such as Arabic, where labeled data are often hard to obtain. In this paper, an integrated framework for Twitter spam detection is proposed to overcome this problem. This framework integrates data augmentation, natural language processing, and supervised machine learning algorithms to overcome the problems of detection of Arabic spam on the Twitter platform. The word embedding technique is employed to augment the data using pre-trained word embedding vectors. Different machine learning techniques were applied, such as SVM, Naive Bayes, and Logistic Regression for spam detection. To prove the effectiveness of this model, a real-life data set for Arabic tweets have been collected and labeled. The results show that an overall improvement in the use of data augmentation increased the macro F1 score from 58% to 89%, with an overall accuracy of 92%, which outperform the current state of the art. Full article

(This article belongs to the Special Issue Applications of Artificial Intelligence on Social Media)

► Show Figures

Figure 1

24 pages, 4016 KiB

Open AccessArticle

A Combined Text-Based and Metadata-Based Deep-Learning Framework for the Detection of Spam Accounts on the Social Media Platform Twitter

by Atheer S. Alhassun and Murad A. Rassam

Processes 2022, 10(3), 439; https://doi.org/10.3390/pr10030439 - 22 Feb 2022

Cited by 30 | Viewed by 4917

Abstract

Social networks have become an integral part of our daily lives. With their rapid growth, our communication using these networks has only increased as well. Twitter is one of the most popular networks in the Middle East. Similar to other social media platforms, Twitter is vulnerable to spam accounts spreading malicious content. Arab countries are among the most targeted, possibly due to the lack of effective technologies that support the Arabic language. In addition, as a complex language, Arabic has extensive grammar rules and many dialects that present challenges when extracting text data. Innovative methods to combat spam on Twitter have been the subject of many current studies. This paper addressed the issue of detecting spam accounts in Arabic on Twitter by collecting an Arabic dataset that would be suitable for spam detection. The dataset contained data from premium features by using Twitter premium API. Data labeling was conducted by flagging suspended accounts. A combined framework was proposed based on deep-learning methods with several advantages, including more accurate, faster results while demanding less computational resources. Two types of data were used, text-based data with a convolution neural networks (CNN) model and metadata with a simple neural networks model. The output of the two models combined identified accounts as spam or not spam. The results showed that the proposed framework achieved an accuracy of 94.27% with our combined model using premium feature data, and it outperformed the best models tested thus far in the literature. Full article

(This article belongs to the Special Issue Recent Advances in Machine Learning and Applications)

► Show Figures

Figure 1

16 pages, 1022 KiB

Open AccessArticle

A Hybrid CNN-LSTM Model for SMS Spam Detection in Arabic and English Messages

by Abdallah Ghourabi, Mahmood A. Mahmood and Qusay M. Alzubi

Future Internet 2020, 12(9), 156; https://doi.org/10.3390/fi12090156 - 18 Sep 2020

Cited by 107 | Viewed by 13059

Abstract

Despite the rapid evolution of Internet protocol-based messaging services, SMS still remains an indisputable communication service in our lives until today. For example, several businesses consider that text messages are more effective than e-mails. This is because 82% of SMSs are read within 5 min., but consumers only open one in four e-mails they receive. The importance of SMS for mobile phone users has attracted the attention of spammers. In fact, the volume of SMS spam has increased considerably in recent years with the emergence of new security threats, such as SMiShing. In this paper, we propose a hybrid deep learning model for detecting SMS spam messages. This detection model is based on the combination of two deep learning methods CNN and LSTM. It is intended to deal with mixed text messages that are written in Arabic or English. For the comparative evaluation, we also tested other well-known machine learning algorithms. The experimental results that we present in this paper show that our CNN-LSTM model outperforms the other algorithms. It achieved a very good accuracy of 98.37%. Full article

(This article belongs to the Section Cybersecurity)

► Show Figures

Figure 1

21 pages, 677 KiB

Open AccessArticle

Predicting Rogue Content and Arabic Spammers on Twitter

by Adel R. Alharbi and Amer Aljaedi

Future Internet 2019, 11(11), 229; https://doi.org/10.3390/fi11110229 - 30 Oct 2019

Cited by 13 | Viewed by 5347

Abstract

Twitter is one of the most popular online social networks for spreading propaganda and words in the Arab region. Spammers are now creating rogue accounts to distribute adult content through Arabic tweets that Arabic norms and cultures prohibit. Arab governments are facing a huge challenge in the detection of these accounts. Researchers have extensively studied English spam on online social networks, while to date, social network spam in other languages has been completely ignored. In our previous study, we estimated that rogue and spam content accounted for approximately three quarters of all content with Arabic trending hashtags in Saudi Arabia. This alarming rate, supported by autonomous concurrent estimates, highlights the urgent need to develop adaptive spam detection methods. In this work, we collected a pure data set from spam accounts producing Arabic tweets. We applied lightweight feature engineering based on rogue content and user profiles. The 47 generated features were analyzed, and the best features were selected. Our performance results show that the random forest classification algorithm with 16 features performs best, with accuracy rates greater than 90%. Full article

(This article belongs to the Section Cybersecurity)

► Show Figures

Figure 1

26 pages, 3428 KiB

Open AccessArticle

A Survey of Attacks Against Twitter Spam Detectors in an Adversarial Environment

by Niddal H. Imam and Vassilios G. Vassilakis

Robotics 2019, 8(3), 50; https://doi.org/10.3390/robotics8030050 - 4 Jul 2019

Cited by 26 | Viewed by 7771

Abstract

Online Social Networks (OSNs), such as Facebook and Twitter, have become a very important part of many people’s daily lives. Unfortunately, the high popularity of these platforms makes them very attractive to spammers. Machine learning (ML) techniques have been widely used as a tool to address many cybersecurity application problems (such as spam and malware detection). However, most of the proposed approaches do not consider the presence of adversaries that target the defense mechanism itself. Adversaries can launch sophisticated attacks to undermine deployed spam detectors either during training or the prediction (test) phase. Not considering these adversarial activities at the design stage makes OSNs’ spam detectors vulnerable to a range of adversarial attacks. Thus, this paper surveys the attacks against Twitter spam detectors in an adversarial environment, and a general taxonomy of potential adversarial attacks is presented using common frameworks from the literature. Examples of adversarial activities on Twitter that were discovered after observing Arabic trending hashtags are discussed in detail. A new type of spam tweet (adversarial spam tweet), which can be used to undermine a deployed classifier, is examined. In addition, possible countermeasures that could increase the robustness of Twitter spam detectors to such attacks are investigated. Full article

(This article belongs to the Special Issue Latest Artificial Intelligence Research Output 2018)

► Show Figures

Figure 1

Search Results (6)

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Saved Queries

Search Filter Reset All

Years

Feature Papers

Subjects

Journals

Article Types

Countries / Regions

Search Results (6)

Further Information

Guidelines

MDPI Initiatives

Follow MDPI