Systematic Review

Advances in NLP Techniques for Detection of Message-Based Threats in Digital Platforms: A Systematic Review

by José Saias 1,2
1 Departamento de Informática, Escola de Ciências e Tecnologia, Universidade de Évora, Rua Romão Ramalho, n. 59, 7000-671 Évora, Portugal
2 VISTA Lab, ALGORITMI Research Centre/LASI, University of Évora, 7004-516 Évora, Portugal
Electronics 2025, 14(13), 2551; https://doi.org/10.3390/electronics14132551
Submission received: 21 April 2025 / Revised: 18 June 2025 / Accepted: 20 June 2025 / Published: 24 June 2025

Abstract

Users of all ages face risks on social media and messaging platforms. When encountering suspicious messages, legitimate concerns arise about a sender’s malicious intent. This study examines recent advances in Natural Language Processing for detecting message-based threats in digital communication. We conducted a systematic review following PRISMA guidelines to address four research questions. After applying a rigorous search and screening pipeline, 30 publications were selected for analysis. Our work assessed the NLP techniques and evaluation methods employed in recent threat detection research, revealing that large language models appear in only 20% of the reviewed works. We further categorized detection input scopes and discussed ethical and privacy implications. The results show that AI ethical aspects are not systematically addressed in the reviewed scientific literature.

1. Introduction

Both social media and instant messaging platforms pose risks of online predator activity, catfishing, dating scams, and social engineering attacks aimed at data harvesting, including phishing and spear phishing, as well as the spread of malware. In online messaging and social platforms, when a suspicious message is received, several critical questions arise:
  • Has the account of the person I am interacting with been compromised?
  • Is this unknown sender a real person acting with malicious intent?
  • Could this be part of a manipulation scam or a targeted social engineering attempt?
  • Is the sender impersonating someone else?
  • Could that profile be fake and controlled by a bot for deceptive purposes?
A variety of studies have explored these topics. In [1], the authors explore how the convergence of social media platforms and Artificial Intelligence (AI) fosters the possibility of exploitation, using case studies to examine victims’ experiences in digital romance fraud. In [2], the authors analyze romance fraud and behavioral changes in Internet use during the pandemic. The challenges of fake profiles and the techniques used to detect them were surveyed in [3]. The dual role of large language models (LLMs) in social engineering is examined in [4]. Describing a case study on collusion scams, the authors analyze both the potential of LLMs to generate attacks and their possible use in enhancing detection. The work in [5] presents a review of deep learning approaches for phishing detection in E-mail.
While other studies have focused on surveys of specific types of threats, there is a gap in the literature regarding analyses from a more transversal perspective, one that emphasizes common patterns, particularly the threats’ action channel (messages) and medium (text). Additionally, the practical applicability of these strategies in a real-world context, especially in light of emerging ethical and legal challenges, remains underexplored.
Our work focuses on recent advancements in Natural Language Processing (NLP) techniques for identifying unsafe conversations in private or direct messaging environments. This review is guided by the following research questions:
RQ1.
What NLP techniques are employed in threat detection for messaging platforms, and how are LLMs integrated into these approaches, according to recent research?
RQ2.
What evaluation strategies are used to measure the effectiveness of these techniques?
RQ3.
What types of input are used for online threat detection in recent studies?
RQ4.
How can detection efficacy be balanced with privacy, informed consent, and compliance with legal and ethical regulations?
RQ1 directly targets the central research gap concerning a holistic perspective on textual and message-based threats. Rather than focusing on isolated threat categories, it investigates NLP techniques underpinning diverse forms of online deception. Given the paradigm-shifting impact of LLMs, we explicitly incorporate their role in our analysis. The second research question seeks to evaluate the performance of detection methods, and to analyze how recent studies assess their effectiveness. In online security contexts, where missed threats may have severe consequences, identifying potential limitations in current approaches helps to prioritize future research directions. While messages serve as the primary threat vector, effective detection typically requires the analysis of broader contextual cues. RQ3 characterizes the supplementary information researchers use alongside textual content to enhance detection, enabling a holistic understanding of detection strategies. RQ4 addresses the second identified gap, seeking to understand the suitability of these detection strategies for real use in citizen protection, and to verify whether recent studies explicitly account for user privacy and regulatory compliance.
According to Europol’s Internet Organised Crime Threat Assessment 2021 report [6], phishing and social engineering increased that year, generating considerable criminal proceeds. The pandemic-driven shift to online shopping and digital interaction created heightened opportunities for fraud, as malicious actors exploited increased reliance on virtual platforms. Three years later, the IOCTA 2024 report [7] highlighted the growing use of end-to-end encrypted communication platforms by offenders. Phishing remained the leading fraud method in 2023 against EU citizens, companies, and institutions, and Smishing was the most commonly used variant. The North American Internet Crime Report 2023 (https://www.ic3.gov/AnnualReport/Reports/2023_IC3Report.pdf (accessed on 2 April 2025)) records 51,750 complaints of impersonation scams, with total losses surpassing USD 1.3 billion that year.
Securing online interactions is critical for individuals and organizations alike, as compromised personal data can expose affiliated entities to targeted attacks. With these assumptions, the main contributions of this work are the following:
  • A characterization of message-based threat detection strategies as reported in the recent scientific literature, and the extent to which LLMs are involved;
  • An assessment of whether the evaluation strategies of contemporary approaches are standardized;
  • A determination of the monitoring scope selected in threat detection research;
  • An analysis of how current research addresses new challenges regarding ethics and privacy in the use of data in real-world scenarios.
The remainder of this article is structured as follows. Section 2 describes the context and the main types of messaging-based threats as well as the current AI and NLP landscape. Section 3 describes the review search strategy, the inclusion criteria, and the data extraction process. In Section 4, we present the outcomes of each phase of our systematic review protocol. We discuss the results and the interpretations that can be drawn from them in Section 5. Finally, in Section 6, we summarize our main findings and present future research directions.

2. Problem Context

This section presents foundational concepts and essential terminology relevant to this research area and mentioned in later sections.

2.1. Message-Based Threats

Some threats associated with message-based interaction have been known for decades, but they have evolved with technological advances, creating the need for continuous updates to protection techniques. When replying to a message (or reacting to it in certain ways), some risks arise that are independent of the terminal device, whether a computer or smartphone, and are unrelated to the specific software used, such as browsers or apps. These risks can lead to both short-term and long-term harm to victims. Sometimes the consequence only occurs after a period in which the attacker gradually builds trust and gathers valuable information. Some of the types of threats conveyed through messages are as follows:
  • A fake profile is a fabricated online persona designed to mislead or deceive other users. It may involve photos, names, or details copied from other people, or artificially fabricated by AI. It can be operated by a bot, controlled by AI, or directly by a human.
  • Impersonation is the malicious use of another person’s identity, by creating a fake profile, or by hacking into an existing legitimate account. By impersonating a celebrity, for example, the attacker may gain advantages and convince victims to provide information while thinking they are actually talking to that person.
  • Shaming and cancel culture is the act of explicitly criticizing victims, through comments, group chats, or online posts, either for offensive reasons or in an attempt to shape public opinion against them.
  • Pretexting is a form of social engineering in which the victim is confronted with an unreal and unexpected scenario (the pretext), and the attacker tries to manipulate them with persuasive storytelling.
  • Social engineering is a manipulation technique that seeks to take advantage of human behavior and psychology and is used to deceive individuals into revealing valuable information, performing actions, or reacting instinctively in a compromising way.
  • Phishing is a form of cyberattack where criminals use forged messages or web content to scam victims into disclosing sensitive information. Usually it comprises two parts: the bait, or deceptive content alluding to some trusted entity, and the hook, or a way of capturing information or inserting malware.
  • Spear phishing is a form of phishing targeted at a specific individual or organization, often involving prior research to include credible and personalized elements.
  • Smishing is a phishing variant where the attack is conducted via text messages (SMS). The initial S stands for SMS, just as the V in Vishing stands for voice call-based phishing.
  • Doxing (also referred to as “doxxing”) is the malicious act of exposing someone’s private or sensitive information in public or within a group, in an unauthorized manner, and with the purpose of harassing or humiliating.
  • Spam is the process of sending unsolicited (or simply irrelevant) messages, usually to a large number of recipients. It can be used to send advertising or to spread rumors or misinformation, but it can also be a channel for phishing.
  • A scam is a broad class of deliberate fraudulent schemes designed to manipulate victims into losing money, or to obtain control, resources, or valuable information.
  • Harmful content, as a threat, is the transmission of material that inflicts emotional, physical, or societal harm or was created through any form of abuse or exploitation.
  • Hate speech is a form of harmful content that promotes hostility against individuals or groups.
It can be observed that some concepts are specializations of broader categories. In addition to this hierarchical structure, some threats may span across multiple categories.
Terminological ambiguity is occasionally observed. Some studies refer to a “fake account”, although “fake profile” is generally the more accurate term. While an account may be compromised, the term account refers to a technical concept linked to platform access; what is fake is not the account itself but the false identity it is intended to represent.

2.2. Landscape of NLP and AI

This section examines the fundamental concepts of NLP and AI, with a focus on pertinent techniques and recent developments in these evolving domains.

2.2.1. NLP

Natural Language Processing (NLP) is a branch of computer science that blends Artificial Intelligence with Computational Linguistics, empowering computers to interpret, analyze, and produce human language. Many applications benefit from NLP techniques, including named entity recognition, sentiment analysis, question-answering, dialogue systems, machine translation, speech recognition, and text-to-speech technology [8].
Typical NLP processing pipelines involve preprocessing, feature extraction to convert raw text into representations interpretable by machine learning models, and model training. Some NLP techniques do not rely on ML, instead using rule-based, heuristic, or statistical approaches that require no training.
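To make the pipeline concrete, the following minimal sketch chains feature extraction via TF-IDF and model training for a binary message classifier, using the scikit-learn library; the toy messages and labels are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy data standing in for a labeled message corpus (assumption).
messages = ["You won a prize, click this link now", "Are we still on for lunch?"]
labels = [1, 0]  # 1 = suspicious, 0 = benign

pipeline = Pipeline([
    # Lowercasing and stop-word removal happen inside the vectorizer.
    ("features", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("classifier", LogisticRegression()),
])
pipeline.fit(messages, labels)
print(pipeline.predict(["Claim your prize today"]))  # expected: [1]
```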
Since around 2010, deep neural networks have significantly improved the performance of many NLP tasks, driving major advances in the field.

2.2.2. ML and DL

Machine learning (ML) is a branch of Artificial Intelligence focused on enabling computers and machines to imitate the way that humans learn, to perform tasks autonomously, and to improve their performance and accuracy through experience and exposure to more data [9]. Some traditional models—meaning not based on neural networks—rely on manual features or statistical patterns. Some algorithms falling into this class are Naive Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), and Random Forest (RF).
Deep learning (DL) is a class of ML techniques that use multiple layers of nonlinear processing units in neural networks (or deep neural networks) to model complex patterns in data [10]. Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM, a type of RNN) networks are all commonly applied DL algorithms.
Introduced in 2017, Transformers are a class of DL model particularly well suited to tasks involving sequential data, such as language. They leverage a mechanism called self-attention to process an entire data sequence in parallel, leading to significant improvements in efficiency and performance [11]. Bidirectional Encoder Representations from Transformers (BERT) and Bidirectional and Auto-Regressive Transformers (BART) are examples of Transformer-based models used in NLP tasks.
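As an illustration of how such models are applied, the sketch below loads a fine-tuned Transformer classifier through the Hugging Face transformers library; the checkpoint name is an assumption, chosen only because it is publicly available, and any BERT-style model fine-tuned on threat labels could take its place.

```python
from transformers import pipeline

# A publicly available fine-tuned checkpoint, used here purely for illustration;
# a real detector would be fine-tuned on labeled threat data instead.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Your account has been locked. Verify your details here."))
# e.g., [{'label': 'NEGATIVE', 'score': 0.99...}]
```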

2.2.3. LLMs

A large language model (LLM) is a type of language model trained on extremely large volumes of data, having billions of parameters, and aiming to understand and to generate human-like text. LLMs are considered a subset of Transformer-based models and are used to perform several NLP tasks. Between 2019 and 2020, Generative Pre-trained Transformers (GPTs) emerged, in particular, those by OpenAI (https://openai.com/ (accessed on 2 April 2025)), whose popularity grew exponentially with the conversational ChatGPT in 2022. The models were closed and accessible via API. Closed LLMs are proprietary AI models whose weights and source code are not publicly accessible, and interaction typically occurs through an API. On the other hand, open models make their weights available, allowing for inspection or fine-tuning, though they may still have usage restrictions. Going a step further, open-source models provide full access to their training data, code, and documentation, giving users complete control over the model’s internals [12].
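The two access patterns can be sketched as follows; the model identifiers below are assumptions (the closed model requires an API key, and the open checkpoint is gated behind a license agreement), and the fragment is a sketch of the interaction style rather than a detection method from the reviewed studies.

```python
# Closed model: interaction through a vendor API (sketch).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
reply = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Is this message a phishing attempt? ..."}],
)
print(reply.choices[0].message.content)

# Open model: weights downloaded and run locally (sketch).
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative open-weight checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)  # inspectable, fine-tunable
```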
Some entities and platforms have contributed to facilitating collaboration and access to language models. Hugging Face (https://huggingface.co/ (accessed on 2 April 2025)) is a company and a collaborative online platform that has emerged as an important hub for the machine learning community in accessing models and datasets.
Later, high-quality models became available to researchers, especially after the release of the first LLaMA (https://www.llama.com/ (accessed on 2 April 2025)) model in February 2023. Meta released LLaMA 2 in July 2023, making it fully accessible to researchers, developers, and businesses. During 2024, Meta followed up with the release of LLaMA 3, while Google also introduced its Gemma family (https://ai.google.dev/gemma (accessed on 2 April 2025)) of open models. Mistral AI (https://mistral.ai/ (accessed on 2 April 2025)) has also established itself as a significant contributor to the open-source LLM community, with models known for their efficiency and performance.
The LLM ecosystem has been very dynamic, with growing interest from the community and several releases of new models of various sizes and tuning specializations. In late 2024 and 2025, new models arrived with improvements in size (number of parameters) or in multimodal capacity for text, audio, and visual processing.
By revisiting these key temporal references in the adoption of LLMs, we provide a rationale for the time horizon selected for our study, as outlined in the following section.

3. Methodology

This research follows a qualitative approach to gain deeper insight into the strategies proposed in the recent literature, which is a common method for studying problems using non-numerical data [13].

3.1. Search Strategy

For transparency and to reduce the risk of reporting bias, we followed the PRISMA 2020 (https://www.prisma-statement.org/ (accessed on 4 April 2025)) guidelines to organize the review process, including the stages of literature search, study screening, and data extraction.

3.1.1. Data Sources

A comprehensive search was prepared across a set of curated academic databases known for indexing peer-reviewed, high-impact, and domain-relevant literature, and supporting structured and reproducible search. Table 1 lists the paper source databases chosen for the research review, with the corresponding URL address for each one, from which the search process can be replicated.

3.1.2. Selection Criteria

The selection criteria were guided by the research questions outlined in Section 1. Given RQ1’s focus on LLMs, we included only publications dated 1 January 2024 or later, as this is approximately the earliest point at which research using open LLMs could appear in the literature.
We aimed to refine the search strategy to cover the core themes of the review, namely, NLP, threat detection, and messaging platforms. Through preliminary testing, we found it necessary to broaden the search lexicon by including alternative terms commonly used in the field to describe the task, specific threats, or detection contexts. The search terms used to retrieve publication records from the five selected databases are listed in Figure 1.
The search query was applied to both title and abstract fields when possible or limited to abstracts depending on database interface constraints. As part of the screening protocol, we also established specific criteria for the inclusion and exclusion of studies.
  • Inclusion criteria:
    • Relevance to the research questions;
    • Conference or journal articles;
    • Indexed in at least one of the selected databases (Section 3.1.1);
    • Published within the time frame defined for this review;
    • Written in English.
  • Exclusion criteria:
    • The publication date falls outside the time frame defined for this review;
    • The report is a pre-print, review, or meta-analysis paper;
    • After abstract or full-text screening, a study is excluded when outside the scope of the review and not aligned with the research questions;
    • The document does not allow retrieval of at least half of the analysis items specified in Section 3.2;
    • The full text of the document is not accessible through academic channels and is restricted behind a paywall;
    • The document is inaccessible due to legal or technical constraints.

3.2. Data Extraction

A paper analysis protocol and a data extraction table were developed to standardize the retrieval of key characteristics from the selected studies. Sixteen aspects were established and verified in each work:
  • Targeted threats: the specific types of malicious intent or threats that the research aims to detect.
  • Approach: general type of process, such as classification on text or image, sentiment analysis, regression, or other.
  • Model architecture: the type and architecture of the AI model used for threat detection (ML, DL, Transformer-based, LLM, rule-based or other).
  • Preprocessing: the transformation steps applied to raw data.
  • Text representation: how the text is converted into a model-appropriate input, such as Bag of Words (BoW), Term Frequency–Inverse Document Frequency (TF-IDF), n-grams, word embeddings, or contextualized embeddings.
  • Detection features: the type of features considered in threat detection, text-based or otherwise (e.g., behavioral features or data based on network connections).
  • Supported language: the language(s) for which the paper’s detection techniques were explicitly designed or evaluated.
  • Input scope: the type of data sources analyzed for threat detection.
  • Evaluation: the assessment form and obtained results.
  • Limitations: the researched approach caveats as identified by the authors of the paper.
  • Future directions: the suggestions for improvements or research next steps proposed by the authors.
  • Deployment platform: the specific messaging platforms or environments where the threat detection techniques are applied.
  • Datasets: description of the datasets used for training and evaluating.
  • Real time: whether the research proposal is feasible for real-time detection.
  • Implementation tools: tools, libraries, or frameworks used for implementation or evaluation.
  • Ethical aspects of real-world deployment: considerations regarding the ethical implications of deploying the proposed detection methods, including privacy handling, informed consent, or legal compliance with the GDPR, CCPA, or other regulations.

4. Results

Following the methodology outlined in the previous section, this part details the results derived from each step and includes an analysis of the collected data points from each study, addressed according to the research questions.

4.1. Search and Screening

The process of literature search and retrieval was conducted in April 2025. Figure 2 illustrates the PRISMA 2020 flow diagram used in this review to depict the selection process of the relevant literature. In the current context, a study refers to a distinct piece of research, while a report refers to any publication or document that describes that study. The PRISMA diagram reflects both numbers. In this survey, only one document per study was considered.
The search query was not always correctly interpreted by the databases’ search engines, particularly regarding the Boolean AND operator. For this reason, we subsequently applied a local filtering tool to the list of results exported from each database, using the title, abstract, and keywords. This process led to the exclusion of 11,814 records. A total of 78 papers were not selected, as they did not fall within the review’s defined time period: the entire year of 2024 and the first quarter of 2025. Selected records and retrieved study reports were screened by a human reviewer. Three papers were not retrieved due to unavailability through regular academic sources. Among the reports assessed for eligibility, 31 papers were excluded based on relevance-related exclusion criteria. As an example, the paper in [14] was filtered out by the second and the fourth exclusion criteria mentioned in Section 3.1.2.
After screening and applying the inclusion criteria, a total of 30 publications were selected for detailed analysis, comprising 11 peer-reviewed journal articles and 19 papers from conference proceedings.
Figure 2. Flow of information through the different phases (PRISMA diagram [15]).
The data items retrieved from the detailed paper analysis (outlined in Section 3.2) are listed across Table A1, Table A2 and Table A3, presented in Appendix A.

4.2. Analysis of the NLP Techniques Employed

Beginning with the distribution of studies by threat type, we found five papers focused on phishing [16,17,18,19,20] and three more specifically addressing smishing [21,22,23], two dealing with social engineering [24,25], and two for the scam category [26,27]. Regarding spam, there were six studies [28,29,30,31,32,33], and five on fake profile detection [34,35,36,37,38]. Impersonation and other threat types were each addressed by one study. Next, we examined the techniques adopted by the selected works, including detection approaches, preprocessing procedures, and text representation methods.
The analysis of detection approaches revealed that text classification was the most prevalent method, employed in 14 studies. This aligns with the nature of the threats, as they primarily manifest in written communication. Four additional studies applied classification techniques to non-textual content. A few works introduced hybrid or alternative methods, including simulated annealing, a deep stacked autoencoder combined with DL, federated learning, a hybrid approach using rule-based and ML, sentiment analysis, topic modeling combined with classification, statistical matching, and a combination of classification and regression tasks.
Going into a little more detail, we sought to verify how the techniques were distributed among the most represented threat types. Studies that worked on phishing detection employed LLMs with prompts and contextual embeddings [16,17], ML with RF, SVM, etc. [19,20], and deep learning with autoencoders [18]. For Smishing-specific studies, techniques ranged from ML (RF with TF-IDF) [21] to DL (CNN-LSTM) [22] and hybrid TF-IDF/embedding models [23]. Social engineering threats were addressed using two approaches: LLM pipelines, using Retrieval-Augmented Generation (RAG) contextual embeddings [25], and ensemble DL models with bi-LSTM/CNN/MLP [24]. In spam detection, traditional ML methods remain highly prevalent. Approaches in this category included ML (NB, RF, SVM) [28,32,33], DL (LSTM, RNN) [31,32], an ML+DL hybrid [29], and federated learning (with PhoBERT) [30]. The use of federated learning is particularly interesting, as it can help to partially mitigate some privacy concerns; a conceptual sketch of its aggregation step follows this paragraph. Fake profile detection employed diverse techniques, ranging from statistical methods [38] and traditional ML [36] to modern gradient-boosted ensembles [37], deep learning (RNN) [34], and Transformer-based models for tweet content analysis [35]. While text-based approaches focused on profile bios and tweets, others leveraged behavioral or network-level features.
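Federated learning keeps raw messages on client devices and shares only model parameters, which a server aggregates. The sketch below shows the FedAvg aggregation step in plain NumPy; it is a conceptual illustration, not the setup of [30].

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg: average per-client parameters, weighted by local dataset size.

    client_weights: one list of parameter arrays per client.
    client_sizes: number of local training examples per client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(n_params)
    ]

# Two clients, each holding a single 3-element parameter vector:
global_weights = fed_avg([[np.ones(3)], [np.zeros(3)]], client_sizes=[80, 20])
print(global_weights)  # [array([0.8, 0.8, 0.8])]
```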
Regarding preprocessing, it is noteworthy that ML-based studies provided more detailed documentation of these steps, which typically followed a consistent pipeline, including lowercasing, punctuation and stop-word removal, tokenization, and lemmatization. Some studies that report this treatment are [19,20,28,32,39,40]. Other mentioned techniques were emoji removal [23,41] or numerical feature scaling and categorical encoding for profile metadata [36,37].
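The consistent pipeline reported by these studies can be reproduced with standard tooling. The sketch below uses NLTK as an assumed implementation choice; the exact libraries vary across the reviewed works.

```python
import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

for resource in ("punkt", "punkt_tab", "stopwords", "wordnet"):
    nltk.download(resource, quiet=True)

STOPWORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(text):
    text = text.lower()                                                # lowercasing
    text = text.translate(str.maketrans("", "", string.punctuation))   # punctuation removal
    tokens = word_tokenize(text)                                       # tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]                 # stop-word removal
    return [LEMMATIZER.lemmatize(t) for t in tokens]                   # lemmatization

print(preprocess("URGENT!! Your parcels are waiting, click the link."))
# ['urgent', 'parcel', 'waiting', 'click', 'link']
```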
Deep neural network-based approaches tended to employ fewer preprocessing transformations, or authors provided less detail about this stage in their methodology descriptions. LLMs typically bypassed manual preprocessing steps, instead relying on prompt engineering [16,17,26] or contextual embeddings [16,25].
An examination of preprocessing techniques by targeted threat type indicated that standard NLP preprocessing steps were widely employed in the majority of text message-based threat detection studies. For threats like fake profile or payment fraud, which often involve structured data or metadata, preprocessing steps shifted towards numerical/categorical conversion of attributes. While many steps were general, some were tailored:
  • Slang removal in hacker discussions [42] indicated domain-specific cleaning;
  • Length-based filtering for impersonation [41] suggested message structure was also important in this threat category.
Traditional lexical features were the most frequent text representation methods listed, aligning with the common use of feature-based ML approaches. This group of features included Bag-of-Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), n-grams, and message-level features (such as sentence length or structure, and uppercase usage), according to the second to last column of Table A1. Additionally, three works employed pre-trained word embeddings (GloVe and PhoBERT) [29,30,34], while contextual embeddings were mentioned seven times. Furthermore, several studies employed combined features, integrating lexical features with either embeddings [23,24,31] or semantic [27,43] and behavioral features [24,41].
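Combining lexical and message-level features typically amounts to concatenating feature matrices. A minimal sketch follows, with message length and uppercase ratio as assumed examples of message-level cues.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

def message_level_features(texts):
    # Simple structural cues: message length and uppercase ratio.
    return np.array([
        [len(t), sum(c.isupper() for c in t) / max(len(t), 1)]
        for t in texts
    ])

texts = ["FREE prize!! Reply NOW", "See you at the meeting tomorrow"]
lexical = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)  # TF-IDF + bigrams
combined = hstack([lexical, csr_matrix(message_level_features(texts))])
print(combined.shape)  # (2, n_lexical_features + 2)
```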
Now, characterizing the text representation methods per threat category, we found that phishing and scam detection methods utilized both modern embeddings [16,17,26] and traditional techniques such as TF-IDF [19,20]. Static and contextual embeddings were used for text-based fake profile detection [34,35], in addition to other techniques with no text representation [36,37]. For the spam category, the textual representation ranged from simple BoW [28,32] to sophisticated embeddings [29,30]. Smishing studies employed both frequency-based representations [21] and SMS domain-adapted embeddings [22]. For social engineering, we observed a distinctive aspect: the use of a combination of modern representation techniques [24,25], including RAG. This may indicate that the authors aimed to capture not just superficial features of a message but also deeper semantic, contextual, and behavioral patterns.

4.3. Analysis of the Use of LLMs

As can be seen in the fourth column of Table A1, six out of the thirty studies selected in this review, covering a period of one year and three months, employed LLM-based approaches. These publications specifically address the threat categories of phishing [16,17], scam [26], social engineering [25], and harmful content [44], with one paper focused on hacker discussions as an early threat indicator [42]. Two additional studies employed a Transformer-based text classification approach for fake profile [35] and social engineering [24] detection.
To illustrate the evaluation of two phishing detection approaches, studies [16,17] report accuracies of 100% and 97.5%, respectively. These results compare to 92% in [18] (DL-based) and 99% in [19] (ML-based) for the same threat type, though trained on distinct datasets.
LLMs demonstrate strong performance in context-intensive threats such as phishing and social engineering, primarily due to their advanced semantic understanding capabilities. However, LLMs are rarely used for behavior-centric threats (e.g., fake profile) or low-resource domains (e.g., Smishing).
Occasionally, the use of LLMs for generating simulated scenarios was noted as a limitation, as in [25], due to the risk of introducing unrealistic elements that could affect the reliability of the dataset.
Regarding language support in LLM-based studies, when specified, only English was mentioned. Multilingual capabilities were not reported in these six studies.
In half of the works involving LLMs, E-mail was the detection input scope, while the others took chat conversation history, posts, and comments as input.
Unexpectedly, among the 24 works where the methodology did not involve LLMs, only two instances [29,41] explicitly stated the intention to incorporate or investigate the application of LLMs within their future directions. Other studies indirectly alluded to more advanced models or had already employed LLMs and indicated future experiments with fine-tuned variants.

4.4. Analysis of Threat Detection Evaluation Methods

Out of the 30 studies listed, 29 reported some form of evaluation, while 1 study explicitly stated “Planned, not performed” [40]. Hold-out validation and cross-validation were commonly employed. The fourth column (Evaluation) in Table A2 presents the results reported by authors regarding their approaches’ performance. The most frequently adopted metrics were accuracy, F1-score, precision, and recall.
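To illustrate the dominant protocol, the sketch below runs 5-fold cross-validation with those four metrics via scikit-learn; the toy corpus is an assumption standing in for the reference datasets used in the reviewed studies.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

# Toy corpus standing in for a labeled threat dataset (assumption).
texts = ["win a free prize now"] * 20 + ["see you at the meeting"] * 20
labels = [1] * 20 + [0] * 20

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
scores = cross_validate(model, texts, labels, cv=5,
                        scoring=("accuracy", "precision", "recall", "f1"))
print({name: values.mean() for name, values in scores.items()
       if name.startswith("test_")})
```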
By combining those results with the Approach and Model Architecture (third and fourth) columns from Table A1, we can identify the peak reported performance for each detection technique, as presented in Table 2. Notably, within the phishing threat category, the top-performing study relies on traditional machine learning [19] rather than large language models.
It is also worth highlighting the assessment results by type of threat. In the case of fake profile detection, three studies reported an accuracy above 91% [34,36,38], while one study showed very limited performance [35]. Regarding phishing, the reported accuracy ranged from 62.5% to 100% in studies [16,17,18,19,20]. Three of the works on spam detection [28,30,32] reported an accuracy of 98% or higher, while two others presented values of 80.6% [29] and 85.6% [31]. The detection input scope for these first three was SMS, while for the others, it was described as textual messages.
Based on the reported evaluations, the studies generally demonstrated a high degree of effectiveness in the specific tasks they addressed, particularly in areas like SMS spam detection, phishing E-mail identification, and malicious chat content detection. However, it is important to note the considerable range in reported metrics. Studies like [17] (accuracy: 62.5% to 97.5%, F1: 44% to 97.6%) and [35] (precision: 31%, recall: 50%) highlight that effectiveness can vary greatly depending on the specific model, dataset quality, data size, and experimental setup.
The Limitations column in Table A2 provides additional context to put some results into perspective (e.g., “restricted data access” for [21], “small dataset” for [35], “dataset is focused on simulated scenarios” and “generated messages may be unrealistic” for [25]). These limitations suggest that real-world applicability or generalizability might be lower than the reported metrics imply in some cases.

4.5. Analysis of Detection Input Scope

Considering the last column of Table A1, we note that the features considered by the models essentially characterized the message itself, and occasionally the message along with the preceding conversation history. Additional aspects, such as behavioral features and social network cues, were mentioned only in works addressing fake profile and impersonation detection.
Regarding the detection scope, that is, the category of data source being processed, as shown at the top of Figure 3, most studies focused on chats, forums, messaging platforms, SMS, and E-mail. A detailed analysis of the second column in Table A2 revealed the following:
  • Nine instances of message history analysis (covering chats, forums, and unidirectional messaging platform channels);
  • Eight instances of SMS message processing;
  • Five cases of E-mail content examination;
  • Two studies analyzing tweets (X platform);
  • Two works incorporating profile account attributes;
  • One instance of website content analysis;
  • One study utilizing users’ typing patterns.

4.6. Analysis of Real-World Applicability and Ethical Compliance

Three of the selected studies [21,27,39] mentioned the possibility of threat detection in real time. The others were either silent on this aspect or explicitly indicated that it was not supported.
Among the 30 papers analyzed, 22 did not mention any ethical considerations related to real-world deployment, as shown in Table A3. These AI-based approaches involved message analysis and the automated processing of behavioral and social connection data. No explicit references to regulations such as GDPR were found.
An ethical dilemma is explicitly stated in [25]: publicly releasing the dataset could enable malicious use by certain actors.
In study [41], the authors report that personally identifiable information was removed to protect user privacy. The same work also documented that the detected fake and cloned channels were reported to Telegram.
In a few isolated cases, consent is mentioned, but only regarding direct participation in the study itself [38], not for potential future use of the system in real-world scenarios.
The work presented in [42] identifies privacy concerns and suggests corresponding risk mitigation strategies. The authors of [30] propose a federated learning approach to reduce sensitive information exposure. In [17], self-hosting a model is suggested for privacy, rather than using a cloud-deployed model and expanding the data perimeter.

5. Discussion

Following the synthesis of results, as outlined in the previous section and thoroughly detailed in Appendix A, we now present a series of considerations, addressing each research question individually.
RQ1: What NLP techniques are employed in threat detection for messaging platforms, and how are LLMs integrated into these approaches, according to recent research?
Threat detection for messaging platforms involves mainly classification tasks, typically binary, based on textual information. Sometimes, in addition to the text, other quantitative and structured inputs are considered, regarding properties of the message sender’s profile or their observable behavior. Graph-type data, related to connections on social networks, for example, may also exist. We also found distinct detection approaches involving regression, federated learning, hybrid solutions with ensembles, or approaches that combine heuristics with classification or with sentiment analysis.
LLMs are still not very prevalent in recent publications on the detection of threats related to messages and online interaction, with our review showing a presence in 6 out of 30 works, or 20%. Reports on studies not using LLMs do not mention reasons for not adopting these tools. As many powerful LLMs reside in the cloud and are controlled by private entities, sending sensitive data off-premises introduces significant risks, such as the lack of control over data retention policies, data interception risk, or the potential for the model to generate improper output, inadvertently exposing data related to the input provided [12,45,46]. This is a particular concern under stringent regulations like the GDPR. These data privacy challenges may partly explain the limited adoption of LLMs in the reviewed studies. The alternative of training a dedicated LLM, or using an on-premises open model, can mitigate privacy and legal compliance risks but demands significant computational resources. This can also be a discouraging factor for the use of LLMs in this context.
Focusing now on how LLMs are employed, models are used for the classification task central to detection; for generating clues and embeddings to be considered by another part of a deep neural network system; for identifying topics or named entities, or supporting another auxiliary task; or for synthesizing new training data when original data are limited [25].
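For the first of these usage modes, detection often reduces to a zero-shot prompt. A minimal sketch follows, where the llm argument is a hypothetical wrapper around any instruction-tuned chat model.

```python
PROMPT_TEMPLATE = """You are a security assistant. Classify the message below
as PHISHING, SCAM, SPAM, or BENIGN. Answer with the label only.

Message: {message}"""

def classify(message, llm):
    """`llm` is any callable mapping a prompt string to a completion string."""
    return llm(PROMPT_TEMPLATE.format(message=message)).strip()

# Example with a stub in place of a real model:
print(classify("You have won a lottery, send your bank details.",
               llm=lambda prompt: "SCAM"))
```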
RQ2: What evaluation strategies are used to measure the effectiveness of these techniques?
The evaluations presented in the articles are scientifically sound, employing methods and metrics widely used in machine learning, particularly in the field of NLP. Since most detection approaches are classification-based, it is common practice to use hold-out validation or K-fold cross-validation on reference datasets. Performance is typically measured using accuracy, precision, recall, and F-measure.
In the case of regression-based approaches, the most commonly used evaluation metrics are the Mean Absolute Error (MAE—the average of the absolute differences between the predicted and actual values) and Mean Squared Error (MSE—the average of the squared differences between the predicted and actual values).
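For reference, with y_i the actual value, ŷ_i the predicted value, and n the number of instances:

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|,
\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
```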
However, comparability across studies is challenging, as the datasets used often differ. In many cases, researchers need to create new datasets to address issues related to the sensitivity of existing data. As noted in one of the reviewed studies, this process of generating training instances can introduce bias into the dataset, potentially impacting the model’s performance.
An important insight we can draw is that traditional machine learning remains highly competitive. For certain complex threats, such as phishing, ML approaches can even outperform LLMs, as shown in Table 2. Another notable aspect is the significant performance variability within threat categories. Even for the same threat type (e.g., fake profile or phishing), reported results vary considerably.
Globally, in this context of heterogeneous threats, effectiveness appears to be task-specific and input-dependent. There also seems to be a correlation between input scope (and consequently, task complexity) and the detection performance. To illustrate, high-performing spam detection studies primarily used SMS input, which may be simpler to process than longer conversation histories.
As seen in Section 4.4, some study limitations presented in Table A2 suggest that part of the reported results may not fully generalize to real-world settings, due to differences between experimental and live environments.
RQ3: What types of input are used for online threat detection in recent studies?
Analyzed threat detection systems make use of the following input sources: textual messages (E-mail, SMS, messaging platforms), isolated or within a conversation history; web content displayed on websites; profile attributes on social media; or behavioral data quantified over time or related to social connections. These input scopes primarily cover various forms of text-based communication, complemented by crucial non-textual data related to user profiles and behavior. This diversification of input types reflects the evolving nature of cyberthreats and the need for more comprehensive detection systems that can analyze a wider array of digital footprints.
For certain types of threats, such as those associated with fake profiles, detection effectiveness would likely benefit from a broader input scope than the message content alone or a one-sided view. Studies in social engineering detection often demand a deep understanding of the context, particularly the conversation history. Likewise, collaboration between human analysts and automatic systems, providing a new source of input for multimodal AI systems, can contribute to detection performance.
RQ4: How can detection efficacy be balanced with privacy, informed consent, and compliance with legal and ethical regulations?
The findings of this review indicate that ethical implications related to the later deployment phase are not currently a priority during the research stage focused on designing and optimizing techniques, or at least, there is little to no documented discussion of this topic.
Concern for privacy and ethics in the use of AI is explicitly expressed in some of the analyzed works, with informed consent from participants being obtained in some studies. Likewise, the validation of the work by an Institutional Review Board, referenced in certain papers, is pertinent; however, it does not obviate the necessity of devising a security, privacy, and ethics strategy for the application of AI upon real-world deployment.
Considering the implementation tools mentioned, the studies using ML and DL appear to minimize the scope of data exposure, whereas the approaches based on LLMs generally rely on remote services and platforms, rather than using open models running locally. Data transfers for remote processing create significant privacy and compliance obligations, particularly given the jurisdictional uncertainties inherent in deperimeterized environments.
In our rapidly evolving digital landscape, advancing digital literacy is crucial. This must encompass not only awareness of online threats but also an understanding of ethical implications and legal compliance requirements. An effective real-world solution must incorporate data subject rights from inception, respecting their authorizations or refusals regarding data use for each specific purpose and time period.
Privacy by design is recommended to minimize risks. Data protection is paramount, driven by cryptographic methods and robust access control. On-premises data processing minimizes data exposure. Where feasible, privacy-enhancing technologies like federated learning or homomorphic encryption should be implemented as complementary safeguards. Anonymization must be systematically applied, being particularly critical in the event of a system breach, serving to mitigate the risk to data subjects. In services involving data transfers, all flows must be mapped and compliance with the data sovereignty principles verified. Data processing agreements must be prepared when a service involves a third-party entity.
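At its simplest, anonymization can start with pattern-based redaction of personally identifiable information before any message leaves the collection point. A minimal sketch follows; the patterns are illustrative and far from exhaustive, and production systems would combine them with NER-based detection.

```python
import re

# Illustrative PII patterns; real deployments need broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def anonymize(text):
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(anonymize("Contact me at ana@example.com or +351 912 345 678"))
# -> "Contact me at <EMAIL> or <PHONE>"
```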
In the European Union, the Artificial Intelligence Act (AI Act (https://artificialintelligenceact.eu/ (accessed on 15 April 2025))) is a comprehensive regulatory framework for AI to ensure safety, transparency, and fundamental rights protection. An online compliance checker for new AI-related obligations is available. The Act also emphasizes the importance of AI literacy, listing some training programs that may be useful to better understand AI technologies and their responsible use.

6. Conclusions

The current threat detection literature largely lacks a transversal perspective on common message-based patterns. Furthermore, the practical applicability of these detection strategies for real-world citizen protection, especially concerning user privacy and regulatory compliance, requires deeper investigation. In response to these gaps, we designed this systematic review of NLP techniques for detecting message-based threats on digital platforms, conducting a bibliographical survey of recently published studies while following PRISMA guidelines. The review search and screening pipeline resulted in a set of 30 relevant articles, whose information was analyzed according to a defined protocol and recorded as shown in Appendix A.
Our main contributions include a survey of the strategies used to detect message-based threats and the role of large language models (LLMs), the evaluation methods and performance of the studied approaches, the types of data sources used for threat detection, and discussions surrounding privacy and ethics. To the best of our knowledge, this is the first study to quantitatively assess the prevalence of LLMs in recent research on message-based threat detection.
One limitation of our work may be the short time frame used for selecting publications. On one hand, a year and a quarter might seem like a short period. On the other hand, it seems unlikely that relevant papers published before 2024 would use open LLM models for threat detection. Additionally, extending the time frame backwards would reduce the up-to-date nature that was intended for this systematic review.
Regarding the use of LLMs, there still appears to be considerable room for incorporating this type of model into message-based threat detection techniques. Perhaps the computational complexity, combined with the ethical challenges of using message and interlocutor data, is the explanation. This will be something to validate in a future extension of this work. The trend towards open LLMs, offering personalized fine-tuning and less centralized deployment, may soon facilitate wider adoption of dedicated LLMs, thus avoiding reliance on third-party entities and deperimeterization issues.
For attackers, it is easy to seek victims by exploiting the mass sending of messages and resorting to automated systems. For defensive systems, it is difficult to guarantee that no point of vulnerability exists. Moreover, unlike attackers, defenders must train models and use data in line with good ethical and legal practices, which may slow the progress of protection techniques.
When it comes to ethics in AI, the balance between data privacy and the ability to develop models for threat detection and timely protection is a challenge. However, while safeguarding the consent of the data subject, the important thing is to ensure that the impact of data exposure to AI is proportionate to the severity of the threat it aims to mitigate [47].

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
AUC: Area Under the ROC Curve
BERT: Bidirectional Encoder Representations from Transformers
BiLSTM: Bidirectional Long Short-Term Memory
CNN: Convolutional Neural Network
CTI: Cyberthreat Intelligence
DL: Deep Learning
DT: Decision Tree
F1: F1-Score (harmonic mean of precision and recall)
GPT: Generative Pre-trained Transformer
IRB: Institutional Review Board
KNN: K-Nearest Neighbors
LLM: Large Language Model
LR: Logistic Regression
LSTM: Long Short-Term Memory
ML: Machine Learning
MLP: Multilayer Perceptron
MSE: Mean Squared Error
NB: Naïve Bayes
NER: Named Entity Recognition
NLP: Natural Language Processing
PII: Personally Identifiable Information
RAG: Retrieval-Augmented Generation
RF: Random Forest
RNN: Recurrent Neural Network
SMS: Short Message Service (text message)
SVM: Support Vector Machine

Appendix A

This Appendix presents the tables containing data items collected during our analysis of the selected papers in this review, following the protocol outlined in Section 3.2.
Table A1. Summary of approaches and NLP techniques used in the selected studies.

| Ref | Targeted Threats | Approach | Model Architecture | Preprocessing | Text Representation | Detection Features |
|---|---|---|---|---|---|---|
| [16] | Phishing | Text classification using LLMs | LLM | Prompt development | LLM's contextual embeddings | Text-based features: message content, subject, sender, recipient |
| [26] | Scam | LLM applies a scam detection rubric | LLM | Prompt development including instructions, E-mail, and rubric | LLM's contextual embeddings | Text-based features and rubric rules |
| [21] | Smishing | Text classification | ML (Random Forest) | Tokenization, stopword removal, stemming | TF-IDF | Text-based features |
| [39] | Toxic communication | Text classification | ML (RF, SVM, LR, MNB, KNN), and BERT | Normalization, tag and link removal, stopword removal, stemming, lemmatization | TF-IDF | Text-based features |
| [34] | Fake profile | Classification | DL (RNN: GRU+LSTM) | Tokenization, normalization | Word embedding: GloVe | Text-based features (from tweets, user profile, network) |
| [40] | Inappropriate messages | Rule-based, sentiment analysis | DL (LSTM) | Tokenization, stopword removal, stemming | Unspecified | Text-based features, behavior temporal features |
| [48] | Authenticity classification | Topic modeling (LDA), text classification | XLNet and BERT | Tokenization, special character and stopword removal | TF-IDF with LDA; contextual embeddings for classification | Text-based features |
| [17] | Phishing | Benchmarking 12 LLMs instructed for detection | LLM | Benchmarking setup; prompt development | LLM's contextual embeddings | Text-based features |
| [35] | Fake profile | Text classification | Transformer-based (BERT) | Tag, punctuation, and stopword removal; lemmatization | BERT's contextual embeddings | Text-based features from tweet content and properties |
| [24] | Social engineering | Simulated annealing-optimized fusion of five specialized DL models | DL (bi-LSTM, CNN, MLP) and Transformer-based | Not mentioned; specific to inner models | Static embeddings and BERT's contextual embeddings | Personality traits, linguistic aspects, behavioral characteristics, and IT attributes |
| [28] | Spam | Text classification | ML (NB, RF, SVM...) | Tokenization, stemming | BoW and TF-IDF | Text-based features |
| [25] | Social engineering | LLM-based text classification pipeline | LLM | Not mentioned | Contextual embeddings (LLM+RAG) | Text-based features (message-level and conversation-level) |
| [29] | Spam | Text classification | ML, DL | Tokenization, stopword removal, stemming | GloVe embeddings | Text-based features |
| [18] | Phishing | DL + deep stacked autoencoder | DL | Not mentioned | BoW, n-gram, hashtags, sentence length, uppercase, TF-IDF | Web and URL features and text-based features |
| [30] | Spam | Federated learning for classification | FedAvg, FedAvgM, FedAdam | NA | PhoBERT's embeddings | Text-based features |
| [44] | Harmful content | LLMs as content classifiers in a custom policy moderation pipeline | LLM | NA | GPT-3.5 and LLaMA 2's embeddings | Text-based features |
| [36] | Fake profile | Classification | ML (Random Forest, SVM, KNN) | Converting profile attributes into numerical form | NA (the proposal uses no text) | Profile account metadata, behavioral features, network follower and following numbers |
| [31] | Spam | Text classification | DL (LSTM) | Punctuation, stopword, and URL removal; lemmatization | BoW, TF-IDF, word embeddings | Text-based features |
| [37] | Fake profile, payment fraud | Classification | ML (XGBoost, CatBoost, GBM) | Numerical feature scaling, categorical encoding, derived feature generation | Not mentioned | Average likes per post, sentiment analysis of profile bio text, post hashtags |
| [41] | Impersonation, clone channels | Classification | ML (RF, SVM) and DL (MLP) | Removal of mentions, numbers, links, emoji, and messages shorter than 15 characters; tokenization | Message format and structure features (not the usual text representation) | Behavioral features (#forwarded messages, average length of posted messages, #messages posted in the last 3 months, interaction features); profile attributes |
| [43] | Aggressive content | Text classification | ML, DL (LSTM) | Punctuation and stopword removal; stemming; labeling | Semantic features (actor, target, polarity), and TF-IDF | Text-based features |
| [38] | Fake profile | Statistical methods for data match score | Separate and combined similarity verifiers | Not mentioned | NA | Key hold time, key interval time, word hold time |
| [22] | Smishing | Text classification | DL (CNN + LSTM) | Lowercase conversion, tokenization, punctuation and stopword removal | Trained embeddings | Text-based features |
| [27] | Malicious (spam, phishing, or fraud) messages | Text classification | ML | Stopword removal, stemming, lemmatization | Lexical (word frequencies, n-grams), semantic (fraud-related), and context features (sender, time-related) | Text-based features and behavioral features |
| [42] | Hacker discussions as early threat indicator | LLM-based named entity recognition and classification | NLP, ML, and LLM | Lowercase conversion; tokenization; punctuation, stopword, slang, and non-ASCII removal; lemmatization | BoW and TF-IDF for SVM; BERT's contextual embeddings | Text-based features |
| [32] | Spam | Text classification | ML (LR, SVM, RF), DL (RNN) | Tokenization, stopword and punctuation removal, stemming | BoW, message length | Text-based features |
| [33] | Spam | Classification and regression approach | ML (RF, RF Regressor) | Not mentioned | Features representing the outcome of criteria checks (binary/numerical) | 56 features extracted using NLP and grouped into categories: headers, text, attachments, URLs, and protocols |
| [20] | Phishing | Text classification | ML (RF) | Stemming via AraBERT | TF-IDF | Text-based features |
| [19] | Phishing | Text classification | ML (RF, DT, LR, SVM) | Lowercase conversion, punctuation and stopword removal, tokenization, lemmatization | TF-IDF | Text-based features |
| [23] | Smishing | Text classification | ML (LR, SVM, RF...) | Prior feature extraction, tokenization, emoji and punctuation removal, lowercasing | TF-IDF and BERT embeddings | Metadata and text-based features |
Table A2. Language, inputs, evaluation, research limitations, and future directions reported in the selected studies.

| Ref | Supported Language | Input Scope | Evaluation | Limitations | Future Directions |
|---|---|---|---|---|---|
| [16] | Unspecified (multilingual ability by LLM) | E-mail | Accuracy, confidence score distribution; OpenAI models excel | Sporadic experience involving copying text into a prompt for the analyzed models | Further model fine-tuning and refinement |
| [26] | Unspecified (multilingual ability by LLM) | E-mail | Accuracy: 69% to 98% | Limited experience; vulnerable to OpenAI interface downtime and API rate limits | Refining the scam detection rubric; creating an entirely self-contained ML algorithm |
| [21] | English and Bemba | SMS text messages | F1: 90.2%, AUC: 95% | Restricted data access | The need for further model optimization |
| [39] | English | Chat message content | Accuracy: 84% to 92% | Not specified; scale challenges | Optimizing ML parameters, bot enhancement, and multilingual support |
| [34] | Mostly English (dataset-dependent) | Tweets | Accuracy: 99.73%, Precision: 98.23%, Recall: 99.56%, F1: 99.63% | Not specified | Wider range of features, multilingual, language-agnostic features |
| [40] | Unspecified | Chat message content | Planned, not performed | Not specified; the proposed methodology lacks implementation details | Refine the core components; broader chat platform scope |
| [48] | Chinese | Chat message content | Accuracy: 77% to 82.5% | Not specified | Optimizing the model and enhancing generalization, alongside supporting diverse languages and cultural backgrounds |
| [17] | English | E-mail | Accuracy: 62.5% to 97.5%, F1: 44% to 97.6% | The benchmark used base models and small datasets; the optimized temperature value was different for each model | Phishing detection using fine-tuned LLMs; expanding datasets to include more complex scenarios |
| [35] | English (multilingual depends on BERT version) | Tweets | Precision: 31%, Recall: 50% | Not specified; poor performance; small dataset | Not specified |
| [24] | English | Chat history | Accuracy: 79.9%, AUC: 74.3%, F1: 70.1% | Gap between the AUC value of the multimodal fusion model and one inner component; further research is needed to investigate model interpretability | Expand capabilities, identifying deepfake content |
| [28] | English | SMS text messages | Accuracy: 98% | Scalability challenges; dataset size | Exploring temporal pattern features |
| [25] | English; multilingual not specified | Chat message; chat conversation (history) | F1: 80% | Dataset is focused on simulated scenarios in a particular topic; LLM-generated messages may be unrealistic or overly agreeable and could affect dataset reliability | Expand to other domains such as financial services or customer support; consider the broader ethical and practical LLM usage implications |
| [29] | English | Message text | Accuracy: 80.6%, F1: 75.5% | Not specified | Incorporate BERT or GPTs; use Hugging Face |
| [18] | Unspecified | Website data | Accuracy: 92%, TPR: 92.5%, TNR: 92.1% | Not specified | Utilize transfer learning |
| [30] | Vietnamese | SMS text messages | Accuracy: 98% | NA | NA |
| [44] | English | Posts, comments, messages (text) | F1: 44% to 77.6% | Cannot differentiate between a violation detected with high confidence and one just suspected | Expand across diverse languages, cultures, and contexts, even involving real users; using multimodal models, generalize the approach to different types of data |
| [36] | Unspecified | Profile account attributes | Accuracy: 92.5% | Not specified | Not specified |
| [31] | Unspecified (dataset-dependent) | Message | Accuracy: 85.6% | Not specified | Contextual analysis; user-centric approaches |
| [37] | Unspecified | Profile account attributes | Accuracy: 93%, F1: 92% | Not specified | Detection across various domain sectors |
| [41] | English | Channel text messages; message-related counters; channel (profile) attributes | F1: 85.45% | Focused only on English channels | Incorporate LLMs and include semantic features |
| [43] | English | Text sentences | F1: 77.13% for LR/ML, F1: 91.61% for BiLSTM/DL | Linguistic features seem to make the models over-generalize; for the textual features, models tend to be biased; model interpretability; limited to English | Human annotation of the dataset; increase dataset positive instances |
| [38] | Unspecified | Users' typing patterns | Accuracy: 91.6% to 100% | Not specified | Augment the dataset with additional users; investigate additional linguistic features; include multimodal, multi-device, usage-context, and DL |
| [22] | Swahili | SMS text messages | Accuracy: 99.98% | Not specified | Tune hyperparameters to reduce false negatives; develop a mobile application |
| [27] | English | SMS, chat, app, or E-mail messages | Accuracy: 94.6%, F1: 93.2% | Not specified | Not specified |
| [42] | Dataset-dependent (English) | Forum or group chat contents | Accuracy: 91.7%, F1: 87.8% | Communities are probably moving to other platforms; sampling bias; report and forum information reliability; the scope of the analysis was English-speaking forums only | Develop benchmark datasets for NER models within the CTI domain; focus on topic matching that incorporates contextual semantics; extend data sources to include modern platforms |
[42]Dataset-dependent (English)Forum or group chat contentsAccuracy: 91.7%, F1: 87.8%Communities are probably moving to other platforms; sampling bias; report and forum information reliability; the scope of the analysis was English-speaking forums onlyDevelop benchmark datasets for NER models within the CTI domain; focus on topic matching that incorporates contextual semantics; extend data sources to include modern platforms
[32]EnglishSMS text messagesAccuracy: 99.28%Not specifiedTrain the model with other datasets and fine tuning; curate new SMS datasets
[33]English and SpanishE-mail message with headersF1: 91.4% for classification, and MSE: 0.781 for regressionConsiderable feature extraction timeDevelopment of an ensemble model based on stacked generalization; explore more distinctive features
[20]ArabicSMS text messagesAccuracy: 98.66%, F1: 98.67%Some inaccuracies in translating an English dataset to ArabicUse a real Arabic dataset
[19]Dataset-dependent (English)E-mailAccuracy: 98.72% and 99.15%, depending on the datasetNot specifiedDataset augmentation; try a wider range of phishing strategies and linguistic nuances
[23]UnspecifiedSMS text messagesAccuracy: 94%, F1: 93.78%Not specifiedExtend to analyze various social platform messages; collect other language-based datasets.
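For quick reference, the snippet below shows how the metrics most often reported in Table A2 (accuracy, F1, and AUC) are typically computed with scikit-learn. The labels and scores are toy values, not results from any reviewed study.

```python
# Toy illustration of the metrics most often reported in Table A2.
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = [0, 0, 1, 1, 1, 0]                 # gold labels
y_score = [0.1, 0.4, 0.8, 0.9, 0.3, 0.2]    # predicted probability of class 1
y_pred = [int(s >= 0.5) for s in y_score]   # thresholded decisions

print(f"Accuracy: {accuracy_score(y_true, y_pred):.1%}")   # fraction correct
print(f"F1:       {f1_score(y_true, y_pred):.1%}")         # precision/recall balance
print(f"AUC:      {roc_auc_score(y_true, y_score):.1%}")   # threshold-independent
```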
Table A3. Platform, datasets, tools, and ethical aspects reported in the reviewed studies.
Ref | Deployment Platform | Datasets | Real-Time | Implementation Tools | Ethical Aspects of Real-World Deployment
[16] | E-mail | Fraudulent E-mail corpus | N | poe.com AI chat platform, LLMs | Not mentioned
[26] | E-mail | Nazario database, Untroubled Scam 2023 Archive, and custom | N | OpenAI ChatGPT API, ChatGPT 3.5 | Not mentioned
[21] | SMS | English + Bemba smishing datasets | Y | Unspecified | Not mentioned
[39] | Telegram platform | Chat dataset from social platforms and Kaggle | Y | HuggingFace, Python packages | Not mentioned
[34] | X platform (Twitter) | TwiBot-20 dataset | Unspecified | Python packages | Not mentioned
[40] | Messaging apps | Unspecified | NA | Unspecified | Data anonymization and user consent are proposed
[48] | WeChat group chats | Custom chat dataset | Unspecified | HuggingFace's Transformers library | Not mentioned
[17] | E-mail | Enterprise E-mails (contemporary message quality) | NA | Chatbot Arena, Ollama | Self-hosting a model is suggested for privacy
[35] | X platform (Twitter) | Collected tweet dataset | Unspecified | Unspecified | Not mentioned
[24] | Chat platforms | Chat social engineering corpus | Unspecified | SpaCy, PyTorch, HuggingFace | Not mentioned
[28] | SMS | SMS Spam Collection | N | Python, scikit-learn | Not mentioned
[25] | Chat platforms | SEConvo, a developed dataset | Unspecified | LangChain, OpenAI API, Faiss, Python | Ethical dilemma: potential dataset misuse; privacy and consent are not mentioned
[29] | SMS, chat, messaging platforms | Compilation of SMS and Twitter datasets | N | Unspecified | Not mentioned
[18] | Content-referring webpages | Webpage phishing detection dataset | N | Python | Not mentioned
[30] | SMS | Spam dataset | Unspecified | NA | Federated learning (FL) is used to reduce the exposure of sensitive information
[44] | Social media | OpenAI's content moderation dataset; Reddit's Multilingual Content Moderation dataset | Unspecified | Unspecified | Ethical and legal concerns are reported on how data and personal information might be used for training without the user's awareness
[36] | Instagram platform | Instagram fake/spammer/genuine accounts | Unspecified | Python, Pandas, NumPy, Scikit-learn | Not mentioned
[31] | Social media | Custom dataset | Unspecified | NLTK, Pandas, TensorFlow, PyTorch, Scikit-learn | Not mentioned
[37] | Social media | Fake profile and payment fraud datasets | Unspecified | Unspecified | Not mentioned
[41] | Telegram platform | Collected data from 120,979 Telegram public channels | Unspecified | NLTK, LangDetect, Scikit-learn, PyTorch, Telegram API | The protection of user privacy is mentioned; PII was removed; the authors reported the detected fake and clone channels to Telegram
[43] | Text message platforms | LLM-generated dataset | Unspecified | TextBlob, ChatGPT, Google Colab | Not mentioned
[38] | Social networks | Collected keystroke timings on Facebook, X, and Instagram | Unspecified | Python | Study participants signed a consent form; the deployment phase is not mentioned
[22] | SMS | 32,259 Swahili SMS messages | Unspecified | Python, Keras, Google Colab | Not mentioned
[27] | Mobile messaging applications | 50,000 (benign, spam, phishing, and fraud) messages | Y | Unspecified | Not mentioned
[42] | Online forums | Hacker forum articles covering 20 years; threat report data (intelligence community, press, sites) | N | NLTK, DarkBERT | The authors mention ethical and privacy concerns regarding the use of conversations; some measures are suggested to mitigate privacy risks
[32] | SMS | SMS Spam Dataset (UCI) | Unspecified | Python, Google Colab | Not mentioned
[33] | E-mail | Custom dataset built from two sources | N | Python, Scikit-learn | Not mentioned
[20] | SMS | Custom, translated dataset: 638 phishing and 4,844 legitimate messages | Unspecified | AraBERT (Arabic LM), Python, Scikit-learn, Google Colab | Not mentioned
[19] | E-mail | Fraud E-mail dataset and phishing E-mail dataset | Unspecified | Unspecified | Not mentioned
[23] | SMS | Custom dataset combining messages from three sources | Unspecified | NLTK, Scikit-learn, TensorFlow | Not mentioned
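Few of the studies above report concrete privacy safeguards; PII removal ([41]) and proposed data anonymization ([40]) are exceptions. As a hedged illustration of what a minimal safeguard could look like, the sketch below substitutes typed placeholders for common PII patterns before any message is analyzed; the regexes are assumptions and deliberately simplistic (a production system would also need, e.g., NER-based detection).

```python
# Illustrative, deliberately simple PII scrubber; patterns are assumptions.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "URL":   re.compile(r"https?://\S+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def scrub(text: str) -> str:
    """Replace common PII spans with typed placeholders before analysis."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(scrub("Reach joe@example.com or +1 555 123 4567 via https://t.me/somechannel"))
# -> Reach <EMAIL> or <PHONE> via <URL>
```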

References

  1. Thumboo, S.; Mukherjee, S. Digital romance fraud targeting unmarried women. Discov. Glob. Soc. 2024, 2, 105. [Google Scholar] [CrossRef]
  2. Buil-Gil, D.; Zeng, Y. Meeting you was a fake: Investigating the increase in romance fraud during COVID-19. J. Financ. Crime 2022, 29, 460–475. [Google Scholar] [CrossRef]
  3. Alharbi, A.; Dong, H.; Yi, X.; Tari, Z.; Khalil, I. Social Media Identity Deception Detection: A Survey. ACM Comput. Surv. 2021, 54, 69. [Google Scholar] [CrossRef]
  4. Perik, L.W. Leveraging Generative Pre-trained Transformers for the Detection and Generation of Social Engineering Attacks: A Case Study on YouTube Collusion Scams. Master’s Thesis, University of Twente, Enschede, The Netherlands, 12 January 2025. [Google Scholar]
  5. Kyaw, P.H.; Gutierrez, J.; Ghobakhlou, A. A Systematic Review of Deep Learning Techniques for Phishing Email Detection. Electronics 2024, 13, 3823. [Google Scholar] [CrossRef]
  6. Europol. Internet Organised Crime Threat Assessment (IOCTA) 2021; Publications Office of the European Union: Luxembourg, 2021. [Google Scholar] [CrossRef]
  7. Europol. Internet Organised Crime Threat Assessment (IOCTA) 2024; Publications Office of the European Union: Luxembourg, 2024. [Google Scholar] [CrossRef]
  8. Jurafsky, D.; Martin, J.H. Speech and Language Processing, 3rd ed.; Stanford University: Stanford, CA, USA, 2025. [Google Scholar]
  9. IBM. What is Machine Learning? Available online: https://www.ibm.com/think/topics/machine-learning (accessed on 4 April 2025).
  10. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  11. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  12. Barberá, I. AI Privacy Risks & Mitigations—Large Language Models (LLMs). European Data Protection Board. Available online: https://www.edpb.europa.eu/system/files/2025-04/ai-privacy-risks-and-mitigations-in-llms.pdf (accessed on 12 June 2025).
  13. Booth, A.; Sutton, A.; Clowes, M.; James, M. Systematic Approaches to a Successful Literature Review, 3rd ed.; SAGE Publications Ltd.: Thousand Oaks, CA, USA, 2022. [Google Scholar]
  14. Fakhouri, H.; Alhadidi, B.; Omar, K.; Makhadmeh, S.; Hamad, F.; Halalsheh, N. AI-Driven Solutions for Social Engineering Attacks: Detection, Prevention, and Response. In Proceedings of the 2nd International Conference on Cyber Resilience (ICCR), Dubai, United Arab Emirates, 26–28 February 2024; pp. 1–8. [Google Scholar] [CrossRef]
  15. Haddaway, N.R.; Page, M.J.; Pritchard, C.C.; McGuinness, L.A. PRISMA2020: An R package and Shiny app for producing PRISMA 2020-compliant flow diagrams, with interactivity for optimised digital transparency and Open Synthesis. Campbell Syst. Rev. 2022, 18, e1230. [Google Scholar] [CrossRef] [PubMed]
  16. Patel, H.; Rehman, U.; Iqbal, F. Evaluating the Efficacy of Large Language Models in Identifying Phishing Attempts. In Proceedings of the 16th International Conference on Human System Interaction (HSI), Paris, France, 8–11 July 2024; pp. 1–7. [Google Scholar] [CrossRef]
  17. Zhang, J.; Wu, P.; London, J.; Tenney, D. Benchmarking and Evaluating Large Language Models in Phishing Detection for Small and Midsize Enterprises: A Comprehensive Analysis. IEEE Access 2025, 13, 28335–28352. [Google Scholar] [CrossRef]
  18. Vidyasri, P.; Suresh, S. FDN-SA: Fuzzy deep neural-stacked autoencoder-based phishing attack detection in social engineering. Comput. Secur. 2025, 148, 104188. [Google Scholar] [CrossRef]
  19. Aljamal, M.; Alquran, R.; Aljaidi, M.; Aljamal, O.S.; Alsarhan, A.; Al-Aiash, I.; Samara, G.; Banisalman, M.; Khouj, M. Harnessing ML and NLP for Enhanced Cybersecurity: A Comprehensive Approach for Phishing Email Detection. In Proceedings of the 25th International Arab Conference on Information Technology, Zarqa, Jordan, 10–12 December 2024. [Google Scholar] [CrossRef]
  20. Ibrahim, A.; Alyousef, S.; Alajmi, H.; Aldossari, R.; Masmoudi, F. Phishing Detection in Arabic SMS Messages using Natural Language Processing. In Proceedings of the Seventh International Women in Data Science Conference at Prince Sultan University, Riyadh, Saudi Arabia, 3–4 March 2024. [Google Scholar] [CrossRef]
  21. Zimba, A.; Phiri, K.; Kashale, C.; Phiri, M. A machine learning and natural language processing-based smishing detection model for mobile money transactions. Int. J. Inf. Technol. Secur. 2024, 16, 69–80. [Google Scholar] [CrossRef]
  22. Mambina, I.S.; Ndibwile, J.D.; Uwimpuhwe, D.; Michael, K.F. Uncovering SMS Spam in Swahili Text Using Deep Learning Approaches. IEEE Access 2024, 12, 25164–25175. [Google Scholar] [CrossRef]
  23. Jain, A.K.; Kaur, K.; Gupta, N.K.; Khare, A. Detecting Smishing Messages Using BERT and Advanced NLP Techniques. SN Comput. Sci. 2025, 6, 109. [Google Scholar] [CrossRef]
  24. Tsinganos, N.; Fouliras, P.; Mavridis, I.; Gritzalis, D. CSE-ARS: Deep Learning-Based Late Fusion of Multimodal Information for Chat-Based Social Engineering Attack Recognition. IEEE Access 2024, 12, 16072–16088. [Google Scholar] [CrossRef]
  25. Ai, L.; Kumarage, T.; Bhattacharjee, A.; Liu, Z.; Hui, Z.; Davinroy, M.; Cook, J.; Cassani, L.; Trapeznikov, K.; Kirchner, M.; et al. Defending Against Social Engineering Attacks in the Age of LLMs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 12880–12902. [Google Scholar] [CrossRef]
  26. DiMario, C.L.; Bacha, R.C.; Butka, B.K. Combatting Senior Scams Using a Large Language Model-Created Rubric. In Proceedings of the 5th Asia Service Sciences and Software Engineering Conference (ASSE 2024), Tokyo, Japan, 11–13 September 2024. [Google Scholar] [CrossRef]
  27. Reddy, M.; Pallerla, R. Using AI to Detect and Classify Suspicious Mobile Messages in Real Time. In Proceedings of the 3rd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT), Bengaluru, India, 5–7 February 2025; pp. 1772–1777. [Google Scholar] [CrossRef]
  28. Dharrao, D.; Gaikwad, P.; Gawai, S.V.; Bongale, A.M.; Patel, K.; Singh, A. Classifying SMS as spam or ham: Leveraging NLP and machine learning techniques. Int. J. Saf. Secur. Eng. 2024, 14, 289–296. [Google Scholar] [CrossRef]
  29. Asmitha, M.; Kavitha, C.R. Exploration of Automatic Spam/Ham Message Classifier Using NLP. In Proceedings of the 9th International Conference for Convergence in Technology (I2CT), Pune, India, 5–7 April 2024; pp. 1–7. [Google Scholar] [CrossRef]
  30. Anh, H.Q.; Anh, P.T.; Nguyen, P.S.; Hung, P.D. Federated Learning for Vietnamese SMS Spam Detection Using Pre-trained PhoBERT. In Proceedings of the 25th International Conference on Intelligent Data Engineering and Automated Learning—IDEAL 2024, Valencia, Spain, 20–22 November 2024; Volume 15346. [Google Scholar] [CrossRef]
  31. Sivakumar, M.; Abishek, S.A.; Karthik, N.; Vanitha, J. Offensive Message Spam Detection in Social Media Using Long Short-Term Memory. In Proceedings of the 3rd Edition of IEEE Delhi Section Flagship Conference (DELCON), New Delhi, India, 21–23 November 2024. [Google Scholar] [CrossRef]
  32. Bennet, D.T.; Bennet, P.S.; Thiagarajan, P.; Sundarakantham, K. Content Based Classification of Short Messages using Recurrent Neural Networks in NLP. In Proceedings of the International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA), Victoria, Seychelles, 1–2 February 2024; pp. 1–6. [Google Scholar] [CrossRef]
  33. Jáñez-Martino, F.; Alaiz-Rodríguez, R.; González-Castro, V.; Fidalgo, E.; Alegre, E. Spam email classification based on cybersecurity potential risk using natural language processing. Knowl.-Based Syst. 2025, 310, 112939. [Google Scholar] [CrossRef]
  34. Geetha, B.; Sushmitha, B.; Ilanchezhian, P.; Alabdeli, H.; Ahila, R. A Bi-directional Gated Recurrent Unit and Long Short-Term Memory based Fake Profile Identification System. In Proceedings of the First International Conference on Software, Systems and Information Technology (SSITCON), Tumkur, India, 18–19 October 2024; pp. 1–5. [Google Scholar] [CrossRef]
  35. Singha, A.K.; Paul, A.; Sonti, S.; Guntur, K.; Chiranjeevi, M.; Dhuli, S. BERT-Based Detection of Fake Twitter Profiles: A Case Study on the Israel-Palestine War. In Proceedings of the 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kamand, India, 24–28 June 2024; pp. 1–6. [Google Scholar] [CrossRef]
  36. Arunprakaash, R.R.; Nathiya, R. Leveraging Machine Learning algorithms for Fake Profile Detection on Instagram. In Proceedings of the 7th International Conference on Circuit Power and Computing Technologies (ICCPCT), Kollam, India, 8–9 August 2024; pp. 869–876. [Google Scholar] [CrossRef]
  37. Asha, V.; Nithya, B.; Prasad, A.; Kumari, M.; Hujaifa, M.; Sharma, A. Optimizing Fraud Detection with XGBoost and CatBoost for Social Media Profiles and Payment Systems. In Proceedings of the International Conference on Electronics and Renewable Systems (ICEARS), Tuticorin, India, 11–13 February 2025; pp. 1987–1992. [Google Scholar] [CrossRef]
  38. Kuruvilla, A.; Daley, R.; Kumar, R. Spotting Fake Profiles in Social Networks via Keystroke Dynamics. In Proceedings of the IEEE 21st Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 6–9 January 2024; pp. 525–533. [Google Scholar] [CrossRef]
  39. Abhijith, A.B.; Prithvi, P. Automated Toxic Chat Synthesis, Reporting and Removing the Chat in Telegram Social Media Using Natural Language Processing Techniques. In Proceedings of the Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), Bhilai, India, 11–12 January 2024; pp. 1–7. [Google Scholar] [CrossRef]
  40. Shiny, J.; Penyameen, S.; Hannah, N.; Harilakshmi, J.S.; Hewin, A.; Thanusha, S. Analysis of Behavior in Chat Applications using Natural Language Processing. In Proceedings of the 2nd International Conference on Sustainable Computing and Smart Systems (ICSCSS), Coimbatore, India, 10–12 July 2024; pp. 718–725. [Google Scholar] [CrossRef]
  41. La Morgia, M.; Mei, A.; Mongardini, A.; Wu, J. Pretending to be a VIP! Characterization and Detection of Fake and Clone Channels on Telegram. ACM Trans. 2024, 19, 1–24. [Google Scholar] [CrossRef]
  42. Paladini, T.; Ferro, L.; Polino, M.; Zanero, S.; Carminati, M. You Might Have Known It Earlier: Analyzing the Role of Underground Forums in Threat Intelligence. In Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (RAID ’24), Padua, Italy, 30 September–2 October 2024; pp. 368–383. [Google Scholar] [CrossRef]
  43. Raza, M.O.; Meghji, A.F.; Mahoto, N.A.; Reshan, M.S.A.; Abosaq, H.A.; Sulaiman, A.; Shaikh, A. Reading Between the Lines: Machine Learning Ensemble and Deep Learning for Implied Threat Detection in Textual Data. Int. J. Comput. Intell. Syst. 2024, 17, 183. [Google Scholar] [CrossRef]
  44. Franco, M.; Gaggi, O.; Palazzi, C.E. Integrating Content Moderation Systems with Large Language Models. ACM Trans. 2024, 19, 1–21. [Google Scholar] [CrossRef]
  45. Feretzakis, G.; Vagena, E.; Kalodanis, K.; Peristera, P.; Kalles, D.; Anastasiou, A. GDPR and Large Language Models: Technical and Legal Obstacles. Future Internet 2025, 17, 151. [Google Scholar] [CrossRef]
  46. Narayan, S.M.; Kohli, N.; Martin, M.M. Addressing contemporary threats in anonymised healthcare data using privacy engineering. NPJ Digit. Med. 2025, 8, 145. [Google Scholar] [CrossRef] [PubMed]
  47. Karliuk, M. Proportionality principle for the ethics of artificial intelligence. AI Ethics 2023, 3, 985–990. [Google Scholar] [CrossRef]
  48. Nie, N.; Guo, H.; Song, W. Authenticity Classification of WeChat Group Chat Messages Based on LDA and NLP. In Proceedings of the 9th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), Chengdu, China, 25–27 April 2024; pp. 313–319. [Google Scholar] [CrossRef]
Figure 1. The search terms used in the chosen databases to find publications.
Figure 3. Distribution of data sources on which studies performed threat detection.
Table 1. Paper source databases used in this systematic review.
Database Name | URL
ACM Digital Library | https://dl.acm.org/ (accessed on 4 April 2025)
IEEE Xplore | https://ieeexplore.ieee.org/ (accessed on 4 April 2025)
PubMed | https://pubmed.ncbi.nlm.nih.gov/ (accessed on 4 April 2025)
Scopus | https://www.scopus.com/ (accessed on 4 April 2025)
Web of Science | https://www.webofscience.com/ (accessed on 4 April 2025)
Table 2. Peak performance by technique across all threat types.
Technique | Best Evaluation | Best-Case Threat Type | Top Citation
ML | Accuracy: 99.15% | Phishing (E-mail) | [19]
DL | Accuracy: 99.98% | Smishing (Swahili) | [22]
LLMs | Acc.: 97.5%, F1: 97.6% | Phishing | [17]
Ensemble DL | F1: 91.61% | Aggressive content | [43]
Statistical/other | Acc.: 91.6% to 100% | Fake profile | [38]
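Unlike the ML and DL rows, the LLM results in Table 2 come from instructing general-purpose models rather than training task-specific classifiers. A rough approximation of that setup, using a zero-shot classification pipeline from HuggingFace Transformers instead of the chat LLMs actually benchmarked in [16,17], might look as follows; the model and label choices are assumptions for illustration.

```python
# Zero-shot classification as a rough stand-in for LLM-based detection.
# Model and labels are illustrative assumptions, not the setups of [16,17].
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")
result = classifier(
    "Your parcel is held at customs. Pay the release fee now via this link.",
    candidate_labels=["phishing or scam", "legitimate message"],
)
print(result["labels"][0], f"({result['scores'][0]:.2f})")
```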
