Telecom Fraud Detection Based on Large Language Models: A Multi-Role, Multi-Layer Prompting Strategy
Abstract
1. Introduction
- MRML prompting: This strategy simulates a human expert panel, enabling collaborative reasoning from low-level feature perception to high-level semantic judgment, thereby improving accuracy and robustness.
- Transparent decision-making: By incorporating multiple expert roles for parallel information parsing and feature extraction, this framework provides explicit and traceable rationales for each decision.
- Conditionally triggered reasoning: The framework activates in-depth analysis only when the initial screening flags a message as suspicious, improving computational efficiency without compromising performance.
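The conditionally triggered two-layer flow described above can be sketched as follows; `llm_call`, the prompt templates, and the majority-vote aggregation heuristic are illustrative assumptions, not the authors' implementation:

```python
def is_fraud_verdict(verdict: str) -> bool:
    # Crude heuristic over free-text verdicts; a production system would
    # request structured output instead.
    v = verdict.lower()
    return "fraud" in v and "non-fraud" not in v

def classify_sms(text, llm_call, layer1_prompts, layer2_prompt, labels):
    """Layer 1: multi-role parallel screening; Layer 2 runs only if suspicious."""
    # Layer 1: each expert role analyzes the message independently.
    role_verdicts = {
        role: llm_call(prompt.format(text=text))
        for role, prompt in layer1_prompts.items()
    }
    # Aggregate by majority vote over the roles' fraud/non-fraud verdicts.
    votes = sum(is_fraud_verdict(v) for v in role_verdicts.values())
    if votes <= len(role_verdicts) / 2:
        return "non-fraudulent", role_verdicts  # skip Layer 2 to save compute
    # Layer 2: fine-grained classification conditioned on Layer 1 findings.
    category = llm_call(layer2_prompt.format(
        first_stage_results=str(role_verdicts),
        text=text,
        labels=", ".join(labels),
    ))
    return category, role_verdicts
```

The key efficiency property is that the second, more expensive call is made only for messages the role panel already deems suspicious.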
2. Related Work
2.1. Machine Learning-Based Fraud Detection Methods
2.2. Deep Learning-Based Fraud Detection Methods
2.3. Large Language Model-Based Fraud Detection Methods
2.4. Summary
3. Methodology
3.1. Overview of the Overall Framework
3.2. Layer 1: Multi-Role Parallel Analysis and Preliminary Judgment
3.3. Layer 2: Conditional Triggering and Fine-Grained Classification
3.4. Summary of the Policy Process
4. Experiments and Results
4.1. Dataset
- (1) CCL2023-FGRC-SCD fusion dataset
- (2) ChiFraud dataset
4.2. Model Selection
4.3. Experimental Results and Analysis
4.3.1. Evaluation Indicators
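Since results are reported as Precision, Recall, and F1-Score, a minimal sketch of the metric computation follows (assuming macro averaging over classes, which this outline does not state explicitly):

```python
from collections import Counter

def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec); recalls.append(rec); f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n
```

Macro averaging weights every class equally, which matters for the heavily imbalanced fraud categories in both datasets.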
4.3.2. Comparison of Baseline Models
4.3.3. Comparison of Prompting Strategies
- Zero-Shot: basic prompt with no examples;
- Few-Shot: prompt with a small number of labeled examples;
- CoT: chain-of-thought prompt that asks the LLM to display its reasoning process;
- RP (role-based prompt): three specialized prompts (text analysis, business process, and security analysis) that evaluate the message from different expert perspectives.
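For concreteness, the four baselines differ only in how the prompt is constructed; the templates below are illustrative paraphrases, not the exact wording used in the experiments:

```python
# Illustrative prompt templates for the four baseline strategies
# (assumed wording; the paper's exact prompts are not quoted here).
BASELINE_PROMPTS = {
    "zero_shot": "Is the following SMS fraudulent? SMS: {text}",
    "few_shot": (
        "Example 1: 'Transfer to this safe account now' -> fraudulent\n"
        "Example 2: 'Your parcel has arrived at locker 12' -> non-fraudulent\n"
        "Is the following SMS fraudulent? SMS: {text}"
    ),
    "cot": (
        "Analyze the following SMS step by step, then state whether it is "
        "fraudulent. SMS: {text}"
    ),
    "role": (
        "You are a security analyst specializing in scam messages. "
        "Is the following SMS fraudulent? SMS: {text}"
    ),
}

def build_prompt(strategy: str, text: str) -> str:
    # Look up the strategy's template and fill in the SMS body.
    return BASELINE_PROMPTS[strategy].format(text=text)
```

MRML differs from all four by combining role specialization with the conditional two-layer structure rather than a single prompt.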
4.3.4. Ablation Experiments
5. Limitations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References

| Prompt Module | Prompt Content | Function Description |
|---|---|---|
| Role Definition | As a seasoned expert in analyzing fraudulent text, you are well-versed in the characteristics of various scam messages and can spot red flags in so-called ‘risk-free’ information. | Assigns the LLM the identity of a domain expert, restricting its analytical perspective and focusing it on text-level features. |
| Knowledge Infusion | Be aware of typical signals of non-fraudulent information, such as: “Contact official customer service”, “Through official channels”, “Only providing information or advice”, no fund transfer, no download of third-party apps, no urgent urging, etc. | Provides key domain knowledge, especially counterexample features, to improve binary-classification accuracy and reduce misjudgment. |
| Task Instruction | Please analyze the following information step by step to determine whether it contains characteristics of fraudulent information or is non-fraudulent: SMS content: {text} | Instructs the LLM to perform chain-of-thought reasoning on a binary classification task and specifies the input format. |
| Output Guide | Please give your judgment. | Elicits the preliminary verdict, which serves as the basis for the next stage of decision-making. |
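Assembling the four modules into a single Layer 1 prompt might look like the following sketch; the module texts are paraphrased from the table and the function name is illustrative:

```python
# Paraphrased module texts (assumptions, not the paper's exact prompts).
ROLE_DEFINITION = (
    "As a seasoned expert in analyzing fraudulent text, you are well-versed "
    "in the characteristics of various scam messages."
)
KNOWLEDGE_INFUSION = (
    "Typical signals of non-fraudulent information include: 'contact official "
    "customer service', 'through official channels', no fund transfer, "
    "no third-party app downloads, no urgent urging."
)
TASK_INSTRUCTION = (
    "Please analyze the following SMS step by step and determine whether it "
    "is fraudulent or non-fraudulent.\nSMS content: {text}"
)
OUTPUT_GUIDE = "Please give your judgment."

def build_layer1_prompt(text: str) -> str:
    # Concatenate the modules in table order: role -> knowledge -> task -> output.
    return "\n\n".join([
        ROLE_DEFINITION,
        KNOWLEDGE_INFUSION,
        TASK_INSTRUCTION.format(text=text),
        OUTPUT_GUIDE,
    ])
```

Keeping the modules as separate constants makes the ablations in Section 4.3.4 (dropping one role or module at a time) straightforward to implement.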
| Prompt Module | Prompt Content | Function Description |
|---|---|---|
| Role Definition | You are a highly experienced expert in analyzing fraudulent text messages, specializing in the precise classification of multi-category scam SMS. | Shifts the role from binary screening to multi-category classification, prioritizing the core objective of precise categorization. |
| Task and Context | Please use the analysis results from the first stage {first_stage_results} to perform multi-category judgment on the following SMS content {text}, and select the most matching category from the predefined categories {','.join(LABELS)}. | Realizes hierarchical information transmission: the first-stage conclusion is passed in as key context, and the task is defined as a single-choice decision over a closed category set. |
| Knowledge Infusion | You can identify these fraud types by recognizing their core red flags, such as: • Phishing calls pretending to be financial or regulatory bodies, claiming credit issues • Urging you to download and share a screen for remote meetings • Encouraging loans on major platforms • Asking you to transfer funds to a so-called “safe account” • Using fake official documents to threaten you | Injects fine-grained domain knowledge, giving the LLM specific and actionable discriminative criteria. |
| Structured Output | Please strictly follow the output format below: Prediction category: <category> Reason: <about 150 words, explain the risk signals in the text and their matching with category features, and explain why they are not misclassified as other categories> | Standardizes the output so it can be parsed programmatically and requires the LLM to justify its classification. |
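A structured output of this form can be parsed with a small helper; the regular expressions below assume the format mandated by the prompt and are not taken from the paper:

```python
import re

def parse_layer2_output(raw: str):
    """Extract (category, reason) from 'Prediction category: ...\nReason: ...'."""
    cat = re.search(r"Prediction category:\s*(.+)", raw)
    reason = re.search(r"Reason:\s*(.+)", raw, re.DOTALL)
    if not cat:
        return None, raw  # fall back to raw text if the format is violated
    return cat.group(1).strip(), (reason.group(1).strip() if reason else "")
```

Returning the raw text on a format violation preserves the LLM's answer for manual inspection instead of silently discarding it.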
| Category Name | Sample Size | Percentage |
|---|---|---|
| Cashback for fake transactions | 35,459 | 32.0% |
| Pretending to be a customer service representative for e-commerce logistics | 13,772 | 12.4% |
| Fake online investment and financial management products | 11,836 | 10.7% |
| Loans, credit card applications, and related services | 11,105 | 10.0% |
| False credit reporting | 8,464 | 7.6% |
| Fake shopping and services | 7,058 | 6.4% |
| Pretending to be from the public security, procuratorial, judicial, or government agencies | 4,563 | 4.1% |
| Pretending to be a superior or someone you know well | 4,407 | 4.0% |
| Online game products involving fake transactions | 2,155 | 1.9% |
| Online dating and social networking | 1,654 | 1.5% |
| Impersonating military or police officers to sell goods | 1,092 | 1.0% |
| Cybercrime-related cases | 1,197 | 1.1% |
| No risk (FGRC-SCD) | 8,000 | 7.2% |
| Total | 110,762 | 100% |
| Category | Number | Percentage |
|---|---|---|
| New | 1,063 | 0.3% |
| Loan | 1,522 | 0.4% |
| SIM | 1,405 | 0.3% |
| Certification | 7,073 | 1.7% |
| Cash-out | 2,817 | 0.7% |
| Drugs | 5,128 | 1.2% |
| Bank | 1,837 | 0.4% |
| Credentials | 1,381 | 0.3% |
| Whoring | 26,187 | 6.4% |
| Gambling | 10,693 | 2.6% |
| Normal | 352,328 | 85.6% |
| Total | 411,434 | 100% |
| Model | Precision | Recall | F1-Score |
|---|---|---|---|
| TextCNN | 85.2 | 83.8 | 83.4 |
| Transformer | 80.6 | 81.3 | 80.3 |
| BERT | 63.6 | 62.9 | 62.9 |
| ChineseBERT | 86.9 | 87.0 | 86.5 |
| glm-4-9b-chat/MRML | 87.8 | 86.1 | 86.5 |
| Qwen2.5-7B-Instruct/MRML | 91.3 | 87.6 | 88.2 |
| Model | Precision | Recall | F1-Score |
|---|---|---|---|
| TextCNN | 81.1 | 63.0 | 66.9 |
| Transformer | 81.8 | 78.4 | 77.9 |
| BERT | 76.8 | 75.0 | 74.9 |
| ChineseBERT | 83.1 | 81.0 | 80.5 |
| glm-4-9b-chat/MRML | 87.4 | 87.4 | 87.4 |
| Qwen2.5-7B-Instruct/MRML | 84.9 | 84.9 | 84.9 |
| Model | Prompt Strategy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| glm-4-9b-chat | Zero-Shot | 76.4 | 68.7 | 66.1 |
| | Few-Shot | 82.4 | 77.5 | 77.4 |
| | CoT | 74.8 | 66.6 | 64.2 |
| | RP | 78.0 | 70.2 | 67.4 |
| | MRML | 87.8 | 86.1 | 86.5 |
| Qwen2.5-7B-Instruct | Zero-Shot | 83.3 | 75.2 | 73.0 |
| | Few-Shot | 83.9 | 77.9 | 76.7 |
| | CoT | 81.2 | 74.4 | 72.4 |
| | RP | 82.1 | 75.9 | 74.5 |
| | MRML | 91.3 | 87.6 | 88.2 |
| Model | Prompt Strategy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| glm-4-9b-chat | Zero-Shot | 78.8 | 75.8 | 76.2 |
| | Few-Shot | 77.7 | 74.3 | 74.3 |
| | CoT | 79.3 | 75.3 | 75.7 |
| | RP | 80.1 | 78.0 | 78.1 |
| | MRML | 87.4 | 84.9 | 85.9 |
| Qwen2.5-7B-Instruct | Zero-Shot | 80.0 | 68.3 | 69.4 |
| | Few-Shot | 79.2 | 63.1 | 64.7 |
| | CoT | 80.4 | 71.3 | 72.0 |
| | RP | 79.7 | 68.6 | 69.8 |
| | MRML | 87.8 | 82.7 | 84.1 |
| Model | Configuration | Precision | Recall | F1-Score |
|---|---|---|---|---|
| glm-4-9b-chat | w/o Text Role | 87.3 | 85.2 | 85.9 |
| | w/o Process Role | 87.7 | 85.4 | 86.0 |
| | w/o Security Role | 87.7 | 85.8 | 86.3 |
| | w/o Layer 1 | 75.9 | 74.8 | 72.9 |
| | MRML | 87.8 | 86.1 | 86.5 |
| Qwen2.5-7B-Instruct | w/o Text Role | 89.0 | 82.4 | 83.3 |
| | w/o Process Role | 88.4 | 83.6 | 84.0 |
| | w/o Security Role | 88.3 | 82.9 | 83.5 |
| | w/o Layer 1 | 86.3 | 80.8 | 81.8 |
| | MRML | 91.3 | 87.6 | 88.2 |
| Model | Configuration | Precision | Recall | F1-Score |
|---|---|---|---|---|
| glm-4-9b-chat | w/o Text Role | 86.2 | 83.3 | 84.4 |
| | w/o Process Role | 86.9 | 84.7 | 85.6 |
| | w/o Security Role | 87.3 | 84.5 | 85.7 |
| | w/o Layer 1 | 81.5 | 79.3 | 79.7 |
| | MRML | 87.4 | 84.9 | 85.9 |
| Qwen2.5-7B-Instruct | w/o Text Role | 87.6 | 67.2 | 71.8 |
| | w/o Process Role | 87.0 | 66.7 | 71.7 |
| | w/o Security Role | 87.4 | 66.8 | 70.7 |
| | w/o Layer 1 | 83.5 | 79.1 | 79.4 |
| | MRML | 87.8 | 82.7 | 84.1 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Ding, J.; Zhou, H. Telecom Fraud Detection Based on Large Language Models: A Multi-Role, Multi-Layer Prompting Strategy. Appl. Sci. 2026, 16, 544. https://doi.org/10.3390/app16010544

