Open Access Article

Malicious Text Identification: Deep Learning from Public Comments and Emails

Department of Computer Science and Engineering, University of Louisville, Louisville, KY 40292, USA
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Information 2020, 11(6), 312; https://doi.org/10.3390/info11060312
Received: 16 April 2020 / Revised: 4 June 2020 / Accepted: 5 June 2020 / Published: 10 June 2020
(This article belongs to the Special Issue Tackling Misinformation Online)
Identifying internet spam has been a challenging problem for decades. Several solutions have succeeded in detecting spam comments in social media or fraudulent emails. However, an adequate strategy for filtering messages is difficult to achieve, because these messages resemble real communications. From the Natural Language Processing (NLP) perspective, Deep Learning models are a good alternative for classifying preprocessed text. In particular, Long Short-Term Memory (LSTM) networks perform well on binary and multi-label text classification problems. In this paper, we present an approach that merges two different data sources, one intended for Spam classification in social media posts and the other for Fraud classification in emails. We designed a multi-label LSTM model and trained it on the joint dataset, which includes text with common bigrams extracted from each independent dataset. The experimental results show that our proposed model is capable of identifying malicious text regardless of the source. The LSTM model trained with the merged dataset outperforms the models trained independently on each dataset.
Keywords: spam text filter; text mining; content-based classification; natural language processing; multi-label classification; LSTM
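
The abstract mentions building a joint dataset from text that shares common bigrams across the two sources. The following is a minimal sketch of one way such a step could look, not the authors' actual preprocessing: the tokenizer, the function names (`bigrams`, `common_bigrams`, `merge_on_common_bigrams`), the two-element label vectors `[spam, fraud]`, and the sample texts are all assumptions made for illustration.

```python
import re


def bigrams(text):
    """Return the list of adjacent lowercase word pairs in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return list(zip(tokens, tokens[1:]))


def common_bigrams(texts_a, texts_b):
    """Return the set of bigrams that occur in both collections of texts."""
    set_a = {bg for text in texts_a for bg in bigrams(text)}
    set_b = {bg for text in texts_b for bg in bigrams(text)}
    return set_a & set_b


def merge_on_common_bigrams(corpus_a, corpus_b):
    """Keep only (text, labels) pairs containing at least one shared bigram.

    Each corpus is a list of (text, labels) pairs, where `labels` is a
    hypothetical multi-label vector [spam, fraud].
    """
    shared = common_bigrams(
        [text for text, _ in corpus_a],
        [text for text, _ in corpus_b],
    )
    merged = []
    for corpus in (corpus_a, corpus_b):
        for text, labels in corpus:
            if shared.intersection(bigrams(text)):
                merged.append((text, labels))
    return merged


# Made-up sample data: social media comments and emails.
comments = [
    ("Click here to win free money now", [1, 0]),
    ("Great video, thanks for sharing", [0, 0]),
]
emails = [
    ("You can win free money by sending your bank details", [0, 1]),
    ("Meeting moved to Monday morning", [0, 0]),
]

merged = merge_on_common_bigrams(comments, emails)
for text, labels in merged:
    print(labels, text)
```

In this sketch, only the texts that share a bigram with the other source survive the merge; the resulting pairs could then be tokenized and fed to a multi-label classifier such as an LSTM with one sigmoid output per label.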
MDPI and ACS Style

Baccouche, A.; Ahmed, S.; Sierra-Sosa, D.; Elmaghraby, A. Malicious Text Identification: Deep Learning from Public Comments and Emails. Information 2020, 11, 312.

