Image-Based Telecom Fraud Detection Method Using an Attention Convolutional Neural Network

Li, Jiyuan; Dang, Jianwu; Wang, Yangping; Yang, Jingyu

doi:10.3390/e27101013

School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China

Authors to whom correspondence should be addressed.

Entropy 2025, 27(10), 1013; https://doi.org/10.3390/e27101013

Submission received: 24 July 2025 / Revised: 9 September 2025 / Accepted: 22 September 2025 / Published: 27 September 2025

(This article belongs to the Section Signal and Data Analysis)

Abstract

Telecom fraud has remained prevalent in many regions in recent years, severely disrupting people’s daily lives and causing substantial economic losses. Previous research has relied mainly on expert knowledge for feature engineering, which lags behind and struggles to adapt to continuously evolving fraud patterns. In addition, the extreme imbalance between fraudulent and normal samples in real communication data hinders the development of deep learning methods. In response, we propose a feature transformation method that represents users’ communication behavior as comprehensively as possible, and we develop a convolutional neural network (CNN) trained with a Focal Loss function to identify rare fraudulent activities in highly imbalanced data. Experimental results on a real-world dataset show that, under severe class imbalance, the proposed method significantly outperforms existing approaches in two key metrics: recall (0.7850) and AUC (0.8662). Our work provides a new approach to telecommunication fraud detection that enables the effective identification of fraudulent numbers.

Keywords: telecommunication fraud detection; convolutional neural network; feature generation

1. Introduction

While telecommunication networks have played a crucial role in economic and social development, the rapid advancement of mobile communication technologies and the widespread use of smart devices have also made them a major platform for fraud [1,2]. According to statistics, telecom fraud in China has led to significant financial losses, including CNY 35.37 billion in 2020 [3] and over CNY 326 billion intercepted in 2021 [4]. Additionally, more than 370,000 fraud cases were detected in 2021, with over 400,000 cases identified in both 2022 [5] and 2023 [6]. Once these fraud cases occur, it is extremely difficult to recover the funds involved [7]. Thus, there is an urgent need to establish effective prevention and management strategies to tackle telecom fraud.

In response to this growing threat, current fraud detection technologies primarily rely on analyzing users’ communication behaviors and content to develop recognition systems. Traditional detection models [8,9,10] focus on rule engines and blacklists, which trigger detection mechanisms through predefined characteristics of fraudulent behavior. In recent years, research has focused mainly on combining expert knowledge for feature engineering and applying machine learning algorithms to detect fraud, including Random Forests [11,12,13], Support Vector Machines [14,15,16,17,18], ensemble learning [19], and neural networks [20,21,22,23,24]. At the same time, graph-based fraud detection technologies predict fraud by learning the features of user interaction behaviors [25,26,27,28,29]. However, the aforementioned methods largely rely on manually designed features based on expert knowledge. The continuous evolution of fraud tactics [30] makes this reliance on manually crafted features costly and unsustainable. Therefore, exploring efficient methods that can automatically extract practical features has become a critical challenge in the field of fraud detection. Moreover, the current detection system faces a severe class distribution imbalance issue, where the number of fraud numbers accounts for only a small fraction of the number of normal numbers. The model may not sufficiently learn from the minority class samples during training, which can negatively impact the accuracy of identifying fraudulent cases. Although various solutions to the data imbalance problem have been proposed [31], and some studies [32,33,34] have explored the class imbalance problem in the field of credit card fraud, research on the data imbalance in the field of fraud-related number recognition is relatively scarce. This may lead to suboptimal classification performance on real-world datasets.

To overcome the challenges of automatically extracting practical features and handling class imbalance, we design a solution to detect telecom fraud. First, we introduce the Focal Loss function [35] during the training process to address the severe class imbalance between normal and fraudulent numbers. Focal Loss increases the loss weight of the minority class samples by introducing a balancing factor, while simultaneously reducing the loss weight of easily classified samples through a modulating factor. This enables the model to focus more on the hard-to-classify fraudulent number samples, thereby improving its ability to learn from minority classes. Next, we propose a feature transformation mechanism to effectively capture users’ communication behaviors for fraud detection. Our main contributions are shown below.

(1): To address the challenge of automatically extracting useful features from telecom data, we propose a feature transformation mechanism that converts Call Detail Record (CDR) text data into structured matrices. This mechanism transforms key features, such as the proportion of call duration per caller and the number of called numbers, into image-like matrices that capture the temporal and behavioral patterns of user interactions. These matrices are then stacked to form an 8-channel tensor (one channel per feature), which serves as a rich, high-dimensional representation of the user’s communication behavior. With this transformation, our approach automates the feature extraction process and reduces the need for manual intervention from domain experts.
(2): We propose a novel approach that combines Squeeze-and-Excitation (SE) blocks [36] with a Convolutional Neural Network (CNN) for detecting telephone fraud. The SE blocks dynamically learn a set of weights that enable the model to emphasize the most informative features while suppressing less relevant ones. This adaptive adjustment of channel importance enhances the model’s ability to focus on critical features, improving performance on complex tasks like fraud detection. By incorporating SE blocks into the CNN, our method strengthens the network’s feature selection process, leading to more accurate and reliable fraud detection outcomes.

The rest of the paper is organized as follows: Section 2 briefly introduces related research works on telecommunication fraud detection, Section 3 presents explicit details about our proposed solution, Section 4 describes and analyzes the experimental results, and Section 5 concludes and discusses future work.

2. Related Work

Telecom fraud detection has evolved significantly over time, with research efforts spanning rule-based methods, traditional machine learning, deep learning, and more recently, graph neural networks.

2.1. Rule-Based Methods

Early anti-fraud technologies primarily relied on rule-based approaches, informed by insights gained from previous telecommunication fraud detection efforts. These insights were translated into rule systems and deployed for real-time detection, enabling user identification. Taniguchi et al. [9] proposed a rule-based method that combines customer and behavioral data, using a greedy algorithm and adjusted thresholds to select an optimal rule set. However, as the volume of communication services has increased, telecom user data has grown, and fraud techniques have continued to evolve, rule-based anti-fraud methods are insufficient to meet the demands of modern telecommunication fraud detection. Consequently, researchers have increasingly turned to machine learning and deep learning techniques to address these challenges.

2.2. Traditional Machine Learning

As an intuitive approach, manually designed features combined with classical machine learning algorithms are commonly used to classify fraudulent phone numbers. With the help of various features and machine learning algorithms, many studies have attempted to address the fraud detection problem. Machine learning techniques primarily utilize methods such as Support Vector Machines (SVM) and Random Forest (RF) for the identification and classification of fraudulent activities. Subudhi et al. [18] presented an approach based on the Quarter-Sphere Support Vector Machine (QS-SVM), where they constructed a user behavior profile by comparing a user’s current calling pattern with historical usage patterns. Experimental results demonstrated that QS-SVM achieved more accurate fraud detection and significantly reduced the false alarm rate, outperforming traditional SVM methods. Bai et al. [13] introduced a fraudulent phone call identification model based on Random Forest. This model identifies fraudulent calls by extracting features of fraudulent calls, selecting model variables, and establishing the model. A real-time shutdown system was also constructed. Subudhi et al. [37] proposed a fraud detection method based on the fuzzy C-means clustering algorithm. The algorithm performs clustering analysis by comparing a user’s recent call activities with typical call behaviors, achieving an accuracy of 93.05%. These methods are more flexible compared to traditional rule-based methods and can learn more complex behavioral patterns of users. However, feature engineering still requires specialized knowledge and manually designed features may not be comprehensive enough to respond promptly to the constantly evolving fraudulent characteristics.

2.3. Deep Learning Approaches

In recent years, the development of deep learning technology has brought new opportunities for telecom fraud detection. Loosely inspired by biological neural networks, deep learning possesses powerful pattern recognition and learning capabilities. Researchers can utilize deep learning models, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), to extract high-level features and perform fraud detection. Xing et al. [38] compared the performance of three deep learning models, namely CNN, LSTM, and Stacked Denoising Autoencoder (SDAE), with the traditional Random Forest model on call detail record datasets. The deep learning models achieved accuracy rates exceeding 99%. Gowri et al. [20] utilized an algorithm based on Recurrent Neural Networks (RNNs) for detecting telephone spam and scams, without relying on the telephone network infrastructure. By analyzing historical spam call datasets, they achieved a malicious call detection rate and binary call accuracy of over 90%. Li et al. [24] proposed a method based on a broad learning system and dual-channel convolutional neural networks (BLS-DCCNN) for identifying fraudulent calls in telecommunications anti-fraud scenarios. This method first generates and enhances features through a broad learning system (BLS), then uses a dual-channel convolutional neural network (CNN) structure to extract global and local features, respectively, thereby improving high-dimensional feature expression and classification capabilities. Notably, the authors addressed the extreme imbalance between positive and negative samples in real-world data by introducing the Focal Loss function during model training. Zhen et al. [21] proposed an approach called CDR2IMG, which arranges the relative times at which a user makes and receives calls along the time dimension into an image-like feature matrix containing only 0, 1, and −1. The study used a CNN and achieved superior performance with an F1-score of 89.98% and an AUC of 92.93%. Bernardo et al. [23] proposed a real-time telecom fraud detection system using a combination of Neural Factorization Machines (NFM) and Autoencoders (AE). Their method models customer calling patterns and adapts to changing behaviors with a memory module, outperforming traditional methods with a high AUC of 91.06%, TPR of 91.89%, and F1-score of 95.45%.

2.4. Graph Neural Networks

While the above methods can detect fraudulent activities to some extent, most of them are unable to capture the interactive information between users. Recently, prediction models based on graph neural networks have emerged, which can perform fraud detection by learning the hidden features of social networks. Hu et al. [25] developed an end-to-end telecom fraud detection framework named Bridge to Graph (BTG), which effectively addresses the fraud detection challenge in sparsely connected data through graph neural networks. Experimental results showed that BTG significantly outperforms traditional methods in several metrics. On a real-world CDR dataset, BTG achieved an AUC of up to 92.45% and an F1-score of 85.21%. Hu et al. [39] proposed GAT-COBO, a cost-sensitive graph neural network (GNN) model that addresses the graph imbalance problem in telecom fraud detection by combining Graph Attention Networks with ensemble learning, demonstrating improved detection performance on imbalanced datasets. While traditional telecom fraud detection often relies on single-network data, Ren et al. [26] designed a multi-network latent collaborative graph fraud detection model. The model effectively captures heterogeneity in telecom fraud by fusing individual dynamic behaviors and multi-network embedding in voice and SMS networks. Wu et al. [29] introduced LSG-FD, a telecom fraud detection model that leverages latent synergy graph learning to capture fraudster behaviors and tackle graph disassortativity, demonstrating superior performance on real-world datasets like Sichuan, Yelp, and Amazon.

Although the above works addressed fraudulent call detection to some extent, most of them heavily rely on feature engineering, which cannot adapt to the fast-changing modes of fraudsters. In view of this, we propose a data transformation scheme that turns CDR data into image-like matrices and then stacks them into an 8-dimensional tensor.

3. Materials and Methods

3.1. Datasets

To ensure the model’s effectiveness in real-world applications, we used communication data from a specific city for a complete month. The dataset comprised 93,293 normal call samples (negative instances) and 517 suspected fraud cases (positive instances). From this collection, we randomly selected 10,000 negative samples and all 517 positive samples to construct the experimental subset. The experimental data was divided into a training set and a test set at a 4:1 ratio.
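The subset construction described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the integer sample ids stand in for real CDR records, the seed is arbitrary, and splitting 4:1 within each class (a stratified split) is an assumption, since the text only states the overall ratio.

```python
import random

N_NORMAL, N_FRAUD = 93_293, 517   # counts reported in the text
N_NEG_SAMPLED = 10_000            # negatives kept for the experimental subset


def build_subset(seed=0, train_fraction=0.8):
    """Subsample negatives, keep all positives, then split 4:1 within each class."""
    rng = random.Random(seed)
    # Placeholder ids; real CDR records would replace these integers.
    negatives = rng.sample(range(N_NORMAL), N_NEG_SAMPLED)
    positives = list(range(N_NORMAL, N_NORMAL + N_FRAUD))
    rng.shuffle(positives)

    train, test = [], []
    for label, ids in ((0, negatives), (1, positives)):
        cut = int(len(ids) * train_fraction)  # assumption: stratified 4:1 split
        train += [(i, label) for i in ids[:cut]]
        test += [(i, label) for i in ids[cut:]]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


train_set, test_set = build_subset()
full_prevalence = N_FRAUD / (N_NORMAL + N_FRAUD)          # about 0.55%
subset_prevalence = N_FRAUD / (N_NEG_SAMPLED + N_FRAUD)   # about 4.9%
```

Even after subsampling, fraudulent numbers make up only about 4.9% of the experimental subset, so the imbalance handling discussed later is still needed.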

3.2. A Fraud Detection Framework

We construct eight image-like matrices that integrate behavioral features with the temporal dimension to represent communication behaviors, which are subsequently analyzed using a convolutional neural network. This framework enables the model to gain a comprehensive understanding of communication patterns, thereby enhancing its fraud detection performance. Our proposed framework is shown in Figure 1. In the following section, we introduce each component of the framework in detail.

3.2.1. Feature Engineering

The features are extracted from voice call data. In contrast to the communication behavior features used in previous studies, we introduce two new features. The first, RatioImeichange, represents the frequency with which a user changes terminals. The calculation formula is as follows:

\text{RatioImeichange} = \frac{\text{Number of IMEI changes within one hour}}{\text{Total number of IMEI changes within one month}}

(1)

The second feature, StdRelaAttribution, refers to the standard deviation of the difference between the calling and called attribution codes. StdRelaAttribution reflects the degree of change in the relative position between the calling party and the called number when the user acts as the caller over a period of time, as shown below:

\text{StdRelaAttribution} = \operatorname{std}(|A_{i} - B_{i}|)

(2)

where A_{i} and B_{i} represent the attribution codes of the phone number and its counterpart in the i-th call record, and the calculation is performed over all records within one hour in which the phone number serves as the primary call party. In addition, we retain six features commonly used in existing research: proportion of outgoing call duration, proportion of incoming call duration, proportion of outgoing call count, proportion of incoming call count, number of unique counterpart numbers (from outgoing call records), and number of counterpart cities (from outgoing call records).
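As a rough illustration of Equations (1) and (2), the two new features could be computed per hour slot as sketched below. The function names and input layout are hypothetical, attribution codes are assumed to be integers, and the handling of empty slots and the choice of the population standard deviation are assumptions, since the text does not specify them.

```python
from statistics import pstdev


def ratio_imei_change(changes_in_hour, changes_in_month):
    """Eq. (1): share of the month's IMEI changes that fall in one hour slot."""
    if changes_in_month == 0:
        return 0.0  # assumption: a month without changes maps to 0
    return changes_in_hour / changes_in_month


def std_rela_attribution(calls):
    """Eq. (2): standard deviation of |A_i - B_i| over one hour of calls.

    `calls` is a list of (own_code, counterpart_code) integer pairs.
    """
    diffs = [abs(a - b) for a, b in calls]
    if not diffs:
        return 0.0  # assumption: an hour without calls maps to 0
    return pstdev(diffs)  # population std; the paper does not say which variant
```

For example, `std_rela_attribution([(10, 10), (10, 14)])` works on the differences 0 and 4 and returns 2.0.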

Inspired by the CDR2IMG approach [21] discussed in Section 2, we create a two-dimensional feature matrix for each feature to describe communication behavior. The x-axis of the matrix corresponds to the date, while the y-axis represents the hour of the day. We select eight features, including the two newly introduced ones, aggregate the data within each hour, and write the resulting values into the corresponding cells of the feature matrices. These eight feature matrices are then stacked along the channel dimension and used as the input to a Convolutional Neural Network (CNN), with each matrix forming one input channel. This approach allows us to capture and analyze users’ communication behavior in a structured and coherent manner.
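The matrix construction can be sketched with plain nested lists. The 24 × 31 shape follows the "24 h × 31 days" description given in the conclusions; the record layout `(feature_index, day, hour, value)` and the zero fill for empty slots are assumptions made for illustration.

```python
HOURS, DAYS, N_FEATURES = 24, 31, 8


def build_feature_tensor(records):
    """Return tensor[c][h][d]: feature channel c, hour h (y-axis), day d (x-axis).

    `records` is an iterable of (feature_index, day, hour, value) tuples that
    have already been aggregated per hour; this layout is hypothetical.
    """
    tensor = [[[0.0] * DAYS for _ in range(HOURS)] for _ in range(N_FEATURES)]
    for feature_index, day, hour, value in records:
        tensor[feature_index][hour][day - 1] = value  # days are 1-based
    return tensor


example = build_feature_tensor([(0, 1, 0, 0.5), (7, 31, 23, 2.0)])
```

Each of the eight 24 × 31 matrices then plays the role of one image channel, much like the color channels of an RGB image.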

3.2.2. Convolutional Neural Network

We propose a Convolutional Neural Network (CNN) that integrates Squeeze-and-Excitation (SE) blocks for enhanced feature selection in detecting telephone fraud, as shown in Figure 2. The SE blocks are placed after the activation layers in the convolutional blocks and serve as channel-wise attention mechanisms. They operate by first applying global average pooling to capture global contextual information, followed by fully connected layers that generate channel-specific weights to recalibrate the importance of each feature map. These weights help the network focus on more informative features while suppressing irrelevant ones, thereby improving its ability to detect subtle patterns associated with fraud. By incorporating SE blocks, the model effectively enhances feature selection, leading to better performance on complex classification tasks without significantly increasing computational complexity. This approach allows the network to adaptively prioritize important features, making it more robust and accurate for tasks such as fraud detection. The kernel sizes and filter counts used in our method were varied, and their detailed configurations are provided in Table 1.
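To make the squeeze, excitation, and recalibration steps concrete, the following framework-free sketch runs the forward pass of a single SE block on nested lists. It is not the authors' implementation: the weight matrices `w1` and `w2` are hypothetical inputs, biases are omitted, and the actual layer sizes are those listed in Table 1.

```python
import math


def se_block(feature_maps, w1, w2):
    """Rescale each channel of `feature_maps` (C x H x W nested lists).

    w1: (C/r) x C weights of the reduction layer; w2: C x (C/r) weights of
    the expansion layer. Biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one weight per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    scales = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
              for row in w2]
    # Recalibrate: multiply every value in a channel by that channel's weight.
    rescaled = [[[v * s for v in row] for row in ch]
                for ch, s in zip(feature_maps, scales)]
    return rescaled, scales
```

Because the sigmoid keeps every channel weight strictly between 0 and 1, the block can only attenuate channels relative to one another; it never changes the spatial layout of a feature map.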

Furthermore, to address the severe class imbalance between the proportion of normal numbers and fraudulent numbers in the dataset, we introduce the Focal Loss function. Fraudulent number detection can be viewed as a binary classification problem. In the case of class imbalance, the cross-entropy loss is easily influenced by samples from the majority class. A large number of easily classified normal number samples dominate the gradient updates, making it difficult for the model to effectively learn the characteristics of fraudulent samples. Focal Loss adjusts the weights of the loss function to reduce the loss contribution of easily classified samples, thereby focusing the model on the hard-to-classify fraudulent samples during training. It has the following expression:

FL(p_{t}) = -\alpha_{t} (1 - p_{t})^{\gamma} \log(p_{t})

(3)

\alpha_{t} = \begin{cases} \alpha, & \text{if } y = 1 \\ 1 - \alpha, & \text{otherwise} \end{cases}

(4)

p_{t} = \begin{cases} p, & \text{if } y = 1 \\ 1 - p, & \text{otherwise} \end{cases}

(5)
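A per-sample version of Equations (3) to (5) can be written in a few lines. In this sketch, p is taken to be the predicted probability of the fraud class and y the ground-truth label (1 for fraud, 0 for normal), following the usual Focal Loss convention [35]. The default values mirror the weight factor (0.95) and modulating factor (3) reported in Section 4.1.1, and the `eps` clamp is an added numerical safeguard rather than part of the formula.

```python
import math


def focal_loss(p, y, alpha=0.95, gamma=3.0, eps=1e-12):
    """Focal loss for one sample, following Eqs. (3)-(5)."""
    p_t = p if y == 1 else 1.0 - p               # Eq. (5)
    alpha_t = alpha if y == 1 else 1.0 - alpha   # Eq. (4)
    p_t = min(max(p_t, eps), 1.0)                # safeguard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)  # Eq. (3)


easy_normal = focal_loss(0.05, 0)   # confidently correct normal sample
missed_fraud = focal_loss(0.05, 1)  # fraud sample scored as normal
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary cross-entropy, which is a convenient sanity check. With the defaults above, the missed fraud sample contributes a loss several orders of magnitude larger than the easy normal sample.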

4. Experiment and Discussion

4.1. Experiment Setup

4.1.1. Training Environment

The training process was executed on a computer with Windows 10, using Python 3.10 and PyTorch 2.4. The following hardware was used: 12th Gen Intel® Core™ i5-12400F (Intel Corporation, Santa Clara, CA, USA) 2.50 GHz and AMD Radeon RX 6750 GRE (Advanced Micro Devices, Inc., Santa Clara, CA, USA). In the training, the batch size was set to 8, with 100 training epochs. The Adam optimizer was used to update the network parameters, with a learning rate of 0.0001 and a weight decay of 1 × 10^{-5}. In this paper, the weight factor of the Focal Loss function was set to 0.95, and the modulation factor was set to 3.

4.1.2. Parameter Settings

In the experiments, we tested weight decay values of 1 × 10^{-3}, 1 × 10^{-4}, and 1 × 10^{-5}. The parameter α in the focal loss was set based on the ratio of positive to negative samples in the dataset, and we evaluated γ values of 0, 1, 2, and 3. The hyperparameter set that achieved the highest performance was selected as the final configuration. For SVM, RF, and XGBoost, we conducted hyperparameter optimization via grid search to identify the best settings based on recall metrics; detailed results are provided in Table 2. In addition, we tuned the decision threshold (0.496) to maximize recall for our proposed method.

4.1.3. Evaluation Metrics

This section provides an overview of the performance metrics used to evaluate binary classification problems. The predictive performance of classification models is typically evaluated using metrics such as precision, recall, accuracy, AUC, and F1-score. The calculation of these metrics is primarily based on the confusion matrix, which serves as the foundation for computing the average performance metrics of the model. Table 3 shows a confusion matrix for a binary classification model.

Accuracy, which is the proportion of users predicted correctly out of the total number of users, generally indicates better model performance with higher values. However, in the presence of class imbalance, the accuracy tends to favor the class with a larger number of samples (in this case, non-fraudulent users).

\text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP}

(6)

Precision refers to the proportion of samples predicted as fraudulent that are actually fraudulent. It quantifies the cost associated with misclassifying non-fraudulent users as fraudulent.

\text{Precision} = \frac{TP}{FP + TP}

(7)

Recall represents the proportion of actual fraudulent users correctly identified by the model. It signifies the model’s sensitivity in detecting fraudulent users.

\text{Recall} = \frac{TP}{FN + TP}

(8)

The F1-score is the harmonic mean of the model’s precision and recall, providing a balance between these two metrics.

\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}

(9)
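Equations (6) to (9) translate directly into code. The sketch below takes the four confusion-matrix counts as input; returning 0 when a denominator is zero is a convention chosen here, not something the text specifies. The second call is an illustrative extreme case rather than a reported result: a classifier that labels every sample in the 10,517-sample subset as normal.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fn + fp)                 # Eq. (6)
    precision = tp / (fp + tp) if (fp + tp) else 0.0           # Eq. (7)
    recall = tp / (fn + tp) if (fn + tp) else 0.0              # Eq. (8)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0      # Eq. (9)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


balanced_case = classification_metrics(tp=80, fp=20, fn=20, tn=880)
all_normal = classification_metrics(tp=0, fp=0, fn=517, tn=10_000)
```

The all-normal classifier reaches an accuracy of about 0.95 while detecting no fraud at all (recall 0), which is why accuracy alone is misleading under class imbalance.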

The ROC curve plots the false positive rate (FPR) on the x-axis and the true positive rate (TPR) on the y-axis. The FPR represents the proportion of normal users incorrectly classified as fraudulent, while the TPR corresponds to the recall. The Area Under the Curve (AUC) quantifies the overall performance of the classifier. For imbalanced datasets, AUC serves as an effective evaluation metric, representing the probability that a randomly chosen positive sample receives a higher prediction score than a randomly chosen negative sample. A higher AUC value indicates more reliable classification performance.
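The probabilistic reading of AUC can be checked with a direct pairwise count: every (positive, negative) pair scores 1 if the positive sample is ranked higher and 0.5 for a tie. This quadratic-time sketch is meant for illustration on small inputs; the scores and labels in the example are made up.

```python
def auc_pairwise(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


example_auc = auc_pairwise([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In the example, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.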

4.2. Experimental Analysis

4.2.1. Performance Comparison

To validate the effectiveness of our model, we compared it against logistic regression (LR), support vector machine (SVM), random forest (RF), Stacked Denoising Autoencoder (SDAE), XGBoost, one-dimensional convolutional neural network (1D-CNN), and CDR2IMG based on evaluation metrics. The feature dimension of the pre-processed data used in this study was 44, and the total number of samples was 10,517. Both the 44 tabular features and the 8 map channels were derived from the original data fields through data processing. Specifically, all features used for model inputs were constructed based on the raw communication records and relevant attributes. Among them, the 44 tabular features were obtained through comprehensive preprocessing, while the 8 map channels used in the image-based models were further refined from these 44 features by merging, removing redundancy, and selecting the most informative attributes with expert guidance. The SDAE, 1D-CNN, CDR2IMG, and the proposed model were all trained with the Focal Loss function. The comparison results are presented in Table 4, and Figure 3 illustrates the comparative performance in recall, accuracy, F1-score, and AUC.

In the domain of fraudulent number identification, given that the cost of missed detections significantly outweighs that of false positives, it is imperative to maximize the coverage of potentially fraudulent numbers. Therefore, recall is the primary metric for evaluating model performance. Distinct performance variations were noted across the evaluated models, as reflected in their Recall, Accuracy, F1-score, and AUC metrics. For example, although Random Forest (RF) achieved the highest Accuracy (0.8763) and F1-score (0.3606), its Recall (0.7009) was lower than that of Logistic Regression (LR) (0.7664) and our proposed model (0.8130). Among the SVM variants, kernel choice led to notable differences. SVM (Linear) attained the highest Recall (0.7573) among SVMs, demonstrating strong performance in fraud detection. Conversely, SVM (Poly) exhibited substantially lower Recall (0.5922), highlighting its limitations in identifying fraud. SVM (RBF) and SVM (Sigmoid) shared the same Recall (0.6990); however, SVM (RBF) outperformed in Accuracy and F1-Score, illustrating the RBF kernel’s advantage in balancing detection and overall classification. LR also delivered competitive results with a Recall of 0.7664 and an AUC of 0.8768, underscoring its reliability in fraud identification.

In contrast, models such as SDAE and CDR2IMG performed poorly across most metrics; CDR2IMG, for instance, recorded the lowest Recall (0.4953) and AUC (0.6535). Notably, although the CDR2IMG model reported outstanding performance metrics in the original literature, its Recall on the experimental dataset was only 0.4953, with an Accuracy of 0.7509, failing to meet expectations. The 1D-CNN approach outperformed CDR2IMG with a Recall of 0.7184, but still lagged behind the model proposed in this paper. It should be noted that both the data distribution and input duration for CDR2IMG in our experiments differed from the original study: while the original work used a six-month dataset with a positive-to-negative ratio of about 1:2, our dataset was based on real-world scenarios with a much lower fraud rate and covered only one month. These differences, particularly the more realistic data and limited duration, may have limited the model’s capacity to capture long-term patterns. However, our design enabled a more comprehensive representation of user behaviors, contributing to the improved effectiveness of our proposed method. Although the classification accuracy of the SDAE was essentially on par with that of the 1D-CNN, its Recall reached only 0.6601. The detection framework introduced in this study achieved a Recall of 0.8130, an improvement of about 4.7 percentage points over the best baseline (LR, 0.7664), which demonstrates its discriminative advantage in fraud detection.

The performance of our model and its comparison models in terms of ROC curves and AUC values is shown in Figure 4. As illustrated in the figure, our model achieved the highest AUC, reaching a value of 0.8632. To provide a more detailed presentation of the classification results for fraudulent number identification using our proposed model, the corresponding confusion matrix is shown in Figure 4.

In addition, to comprehensively evaluate the impact of negative sampling on model performance, we conducted experiments under both the original class prevalence (0.55%) using the full dataset and under different negative sample sizes. The results are presented in Table 4 and illustrated in Figure 5. Compared with the 10,000 negative sample setting used in the main experiments, most methods showed an increase in recall on the full dataset, except for RF, XGBoost, CDR2IMG, and our proposed method. Although some models achieved a maximum recall of 0.7757 under different negative sampling rates, this value is still lower than the recall of our method (0.8130) with the 10,000-negative-sample setting.

To further verify the practicality of our method in real-world scenarios, we also conducted a lightweight performance evaluation on a Windows PC equipped with a single AMD Radeon RX 6750 GRE (Advanced Micro Devices, Inc., Santa Clara, CA, USA). For this experiment, 1000 samples were randomly selected from the test dataset with the batch size set to 1. The measured average inference latency was 0.58 ms per sample and the peak GPU memory usage was 12 MB. This result confirms that our approach not only achieves superior detection performance but also fully satisfies the requirements for real-time online deployment.
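A latency measurement of this kind can be sketched with a simple timing loop. The `predict` callable and the sample list are placeholders, and this is not the authors' benchmarking script. When the model runs on a GPU, the device usually has to be synchronized before reading the clock, which this CPU-only sketch does not do.

```python
import time


def mean_latency_ms(predict, samples):
    """Average wall-clock time per call to `predict`, in milliseconds."""
    predict(samples[0])  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for sample in samples:  # batch size 1: one sample per call
        predict(sample)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(samples)


# Placeholder workload standing in for a model's forward pass.
latency = mean_latency_ms(lambda x: x * 2, list(range(1000)))
```

Peak memory usage is not covered here, because it depends on the tooling of the specific deep learning framework and device.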

A comprehensive evaluation of the three key metrics—Accuracy, Recall, and AUC—demonstrates that the model proposed in this paper offers significant advantages in overall performance. The model substantially improves the identification rate of fraudulent telephone users while maintaining a high level of classification accuracy.

4.2.2. Ablation Study

To verify the effectiveness of the two newly designed features in this paper, we conducted an ablation study on the dataset. The results, shown in Table 5, present the performance of the CNN model under different feature configurations. The model that excludes the RatioImeichange and StdRelaAttribution features achieved a recall of 0.7009. In contrast, incorporating both features increased the recall to 0.8130, resulting in an absolute improvement of 11.21%. Although the addition of these features led to a slight decrease in accuracy (down by 1.59%), the model’s F1-score and AUC improved, rising by 1.34% and 1.95%, respectively. These findings indicate that integrating RatioImeichange and StdRelaAttribution significantly enhances recall and, when used together, improves overall model performance. Figure 6 illustrates the performance metrics for the different feature configurations. Furthermore, we incorporated two base station-related features into the original 8-dimensional set, resulting in a 10-dimensional feature vector. As shown in Table 5, this expansion did not lead to performance gains; the recall decreased by 10.28% compared to the 8-dimensional configuration.

To better understand the contribution of the focal loss, we replaced the focal loss with weighted cross-entropy loss in our ablation experiment. For the Weighted Cross-Entropy (WCE) loss, experiments were conducted on the dataset with 10,000 negative samples. The positive-to-negative class weight ratio was directly computed from the class distribution of this dataset. The results of this setting are reported in Table 5 (WCE). As shown, our proposed method with focal loss achieves better performance than the weighted cross-entropy baseline, demonstrating the effectiveness of the focal loss design in our approach.

5. Conclusions and Future Works

In this study, we propose a telecommunication fraud detection method based on real-world data from a northwest city in China. We design features related to communication patterns, and an ablation study confirms that these features enhance the model’s performance. To address the issue of imbalanced samples, we employ the Focal Loss function to adjust the loss weights between positive and negative samples. Experiments on a real-world dataset demonstrate that our method achieves superior overall performance in detecting fraudulent telephone activities.

This study is limited to a single month of data from one region, which prevents evaluation of the model’s temporal and regional generalization. We plan to conduct longitudinal experiments on larger, multi-region datasets to assess temporal stability and adaptability to evolving fraud behaviors. In the current setup, feature matrices were constructed using the entire month’s data (24 h × 31 days), which restricts chronological train–test splits and may lead to information leakage. Future work will redesign the feature extraction pipeline to enable such splits for a more realistic robustness assessment. In addition, we will explore richer and more diverse behavioral descriptors—including fine-grained temporal dynamics and social connectivity patterns—to expand the dimensionality of the feature vectors and enhance their representational capacity. Developing sustainable and adaptive detection mechanisms to keep pace with changing communication patterns will be a priority. Beyond automated feature extraction from raw CDR data, integrating relational information from communication networks and incorporating supplementary data sources may yield more comprehensive behavioral representations and improve feature correlation modeling [40,41].

Author Contributions

J.L.: Investigation, Methodology, Writing original manuscript. J.D.: Supervision, Investigation, Writing—review and editing. Y.W.: Resources, Writing—review and editing, Funding acquisition. J.Y.: Validation, Writing—review and editing, Funding acquisition. All authors reviewed the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by funds from the National Natural Science Foundation of China (No. 62367005), Central Government Guides Local Science and Technology Development Fund Project (No. 24ZYQA051), Major Cultivation Project of Scientific Research and Innovation Platforms in Gansu Provincial Colleges and Universities (No. 2024CXPT-17).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.

Acknowledgments

The authors are grateful to the anonymous reviewers whose comments significantly improved this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhu, R.; Ye, H.; Sun, H.; Li, X.; Duan, Y.; Hou, J. Construction and application of knowledge-base in telecom fraud domain. Int. J. Intell. Inf. Database Syst. 2021, 14, 198–214.
2. Shen, S. Study on Telecommunication Fraud from a Student’s Perspective. Int. J. Front. Sociol. 2023, 5, 137–142.
3. The “Two Highs and One Ministry” Issued the “Opinions on Several Issues Concerning the Application of Law in Handling Criminal Cases of Telecom Network Fraud, etc. (II)”. 2021. Available online: https://www.mps.gov.cn:8090/n2254098/n4904352/c7942849/content.html (accessed on 21 September 2025).
4. “Public Security 2021” Year-End Review Report. 2021. Available online: https://www.mps.gov.cn/n2254314/n6409334/c8294658/content.html (accessed on 21 September 2025).
5. The Crackdown and Governance of New Types of Telecom and Internet-Related Crimes Have Shown Significant Results. 2023. Available online: https://www.mps.gov.cn/n2254314/n6409334/c9061407/content.html (accessed on 21 September 2025).
6. Ministry of Public Security: In 2023, a Total of 437,000 Telecom and Internet Fraud Cases Were Solved. 2024. Available online: https://www.chinanews.com.cn/gn/2024/01-09/10142690.shtml (accessed on 21 September 2025).
7. How to Overcome the Challenges in Combating Telecom Network Fraud Crimes. 2017. Available online: https://www.spp.gov.cn/llyj/201702/t20170205_180096.shtml (accessed on 21 September 2025).
8. Gopal, R.K.; Meher, S.K. A rule-based approach for anomaly detection in subscriber usage pattern. In World Academy of Science, Engineering and Technology; WASET.ORG: Riverside, CT, USA, 2007; pp. 396–399.
9. Taniguchi, M.; Haft, M.; Hollmén, J.; Tresp, V. Fraud detection in communication networks using neural and probabilistic methods. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’98 (Cat. No. 98CH36181), Seattle, WA, USA, 15 May 1998; Volume 2, pp. 1241–1244.
10. Fawcett, T.; Provost, F. Adaptive fraud detection. Data Min. Knowl. Discov. 1997, 1, 291–316.
11. Saaid, F.A.; King, R.; Nur, D. Development of Users’ Call Profiles using Unsupervised Random Forest. In Proceedings of the Third Annual ASEARC Conference, Newcastle, Australia, 7–8 December 2009.
12. Lu, C.; Lin, S.; Liu, X.; Shi, H. Telecom fraud identification based on ADASYN and random forest. In Proceedings of the 2020 5th International Conference on Computer and Communication Systems (ICCCS), Shanghai, China, 15–18 May 2020; pp. 447–452.
13. Lihong, B.J. Fraud Phone Identification and Management Based On Big Data Mining Technology. Chang. Inf. Commun. 2021, 34, 126–128.
14. Ji, Z.; Ma, Y.c.; Li, S.; Li, J.l. SVM based telecom fraud behavior identification method. Comput. Eng. Softw. 2017, 38, 46–51.
15. Wang, D.; Wang, Q.-y.; Zhan, S.-y.; Li, F.-x.; Wang, D.-z. A feature extraction method for fraud detection in mobile communication networks. In Proceedings of the Fifth World Congress on Intelligent Control and Automation (IEEE Cat. No. 04EX788), Hangzhou, China, 15–19 June 2004; Volume 2, pp. 1853–1856.
16. Li, R.; Zhang, Y.; Tuo, Y.; Chang, P. A novel method for detecting telecom fraud user. In Proceedings of the 2018 3rd International Conference on Information Systems Engineering (ICISE), Shanghai, China, 4–6 May 2018; pp. 46–50.
17. Sallehuddin, R.; Ibrahim, S.; Zain, A.M.; Elmi, A.H. Detecting SIM box fraud by using support vector machine and artificial neural network. J. Teknol. (Sci. Eng.) 2015, 74, 131–143.
18. Subudhi, S.; Panigrahi, S. Quarter-sphere support vector machine for fraud detection in mobile telecommunication networks. Procedia Comput. Sci. 2015, 48, 353–359.
19. Arafat, M.; Qusef, A.; Sammour, G. Detection of wangiri telecommunication fraud using ensemble learning. In Proceedings of the 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), Amman, Jordan, 9–11 April 2019; pp. 330–335.
20. Gowri, S.M.; Ramana, G.S.; Ranjani, M.S.; Tharani, T. Detection of telephony spam and scams using recurrent neural network (RNN) algorithm. In Proceedings of the 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 19–20 March 2021; Volume 1, pp. 1284–1288.
21. Zhen, Z.; Gao, J. CDR2IMG: A Bridge from Text to Image in Telecommunication Fraud Detection. Comput. Syst. Sci. Eng. 2023, 47, 955.
22. Yang, J.-K.; Xia, W.C. Fraud Call Identification Based on User Behavior Analysis. Comput. Syst. Appl. 2021, 30, 311–316.
23. Wahid, A.; Msahli, M.; Bifet, A.; Memmi, G. NFA: A neural factorization autoencoder based online telephony fraud detection. Digit. Commun. Netw. 2024, 10, 158–167.
24. Li, S.; Xu, G.; Liu, Y. Fraud Call Identification Based on Broad Learning System and Convolutional Neural Networks. In Proceedings of the 2022 IEEE 8th International Conference on Computer and Communications (ICCC), Chengdu, China, 9–12 December 2022; pp. 1471–1476.
25. Hu, X.; Chen, H.; Liu, S.; Jiang, H.; Chu, G.; Li, R. BTG: A Bridge to Graph machine learning in telecommunications fraud detection. Future Gener. Comput. Syst. 2022, 137, 274–287.
26. Ren, L.; Zang, Y.; Hu, R.; Li, D.; Wu, J.; Huan, Z.; Hu, J. Do not ignore heterogeneity and heterophily: Multi-network collaborative telecom fraud detection. Expert Syst. Appl. 2024, 257, 124974.
27. Chu, G.; Wang, J.; Qi, Q.; Sun, H.; Tao, S.; Yang, H.; Liao, J.; Han, Z. Exploiting Spatial-Temporal Behavior Patterns for Fraud Detection in Telecom Networks. IEEE Trans. Dependable Secur. Comput. 2023, 20, 4564–4577.
28. Hu, X.; Chen, H.; Chen, H.; Li, X.; Zhang, J.; Liu, S. Mining mobile network fraudsters with augmented graph neural networks. Entropy 2023, 25, 150.
29. Wu, J.; Hu, R.; Li, D.; Ren, L.; Huang, Z.; Zang, Y. Beyond the individual: An improved telecom fraud detection approach based on latent synergy graph learning. Neural Netw. 2024, 169, 20–31.
30. Koi-Akrofi, G.Y.; Koi-Akrofi, J.; Odai, D.A.; Twum, E.O. Global telecommunications fraud trend analysis. Int. J. Innov. Appl. Stud. 2019, 25, 940–947.
31. Ghosh, K.; Bellinger, C.; Corizzo, R.; Branco, P.; Krawczyk, B.; Japkowicz, N. The class imbalance problem in deep learning. Mach. Learn. 2024, 113, 4845–4901.
32. Gambo, M.L.; Zainal, A.; Kassim, M.N. A convolutional neural network model for credit card fraud detection. In Proceedings of the 2022 International Conference on Data Science and Its Applications (ICoDSA), Bandung, Indonesia, 6–7 July 2022; pp. 198–202.
33. Priscilla, C.V.; Prabha, D.P. Influence of optimizing xgboost to handle class imbalance in credit card fraud detection. In Proceedings of the 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 20–22 August 2020; pp. 1309–1315.
34. Huang, H.; Liu, B.; Xue, X.; Cao, J.; Chen, X. Imbalanced credit card fraud detection data: A solution based on hybrid neural network and clustering-based undersampling technique. Appl. Soft Comput. 2024, 154, 111368.
35. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
36. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
37. Subudhi, S.; Panigrahi, S. Use of Possibilistic fuzzy C-means clustering for telecom fraud detection. In Computational Intelligence in Data Mining, Proceedings of the International Conference on CIDM, Bhubaneswar, India, 10–11 December 2016; Springer: Singapore, 2017; pp. 633–641.
38. Xing, J.; Yu, M.; Wang, S.; Zhang, Y.; Ding, Y. Automated fraudulent phone call recognition through deep learning. Wirel. Commun. Mob. Comput. 2020, 2020, 8853468.
39. Hu, X.; Chen, H.; Zhang, J.; Chen, H.; Liu, S.; Li, X.; Wang, Y.; Xue, X. GAT-COBO: Cost-Sensitive Graph Neural Network for Telecom Fraud Detection. IEEE Trans. Big Data 2024, 10, 528–542.
40. Liu, Y.; Wang, C.; Lu, M.; Yang, J.; Gui, J.; Zhang, S. From Simple to Complex Scenes: Learning Robust Feature Representations for Accurate Human Parsing. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 5449–5462.
41. Wang, C.; Zhang, Q.; Wang, X.; Zhou, L.; Li, Q.; Xia, Z.; Ma, B.; Shi, Y.Q. Light-Field Image Multiple Reversible Robust Watermarking Against Geometric Attacks. IEEE Trans. Dependable Secur. Comput. 2025; Early Access.

Figure 1. Flowchart of the proposed framework.

Figure 2. The proposed model in telecommunication fraud detection.

Figure 3. Comparison of different models.

Figure 4. Model Performance Comparison. (a) ROC Curves of Different Models; (b) Confusion Matrix of Our Proposed Model.

Figure 5. Recall with Different Numbers of Negative Samples.

Figure 6. Comparison of Different Features.

Table 1. The proposed CNN architecture and its parameters.

Layer	Number of Kernels	Kernel Size	Stride	Padding	Output
Input layer	-	-	-	-	24 × 31 × 8
Conv1	64	3 × 3	(1, 1)	same	24 × 31 × 64
SE1	64	-	-	-	24 × 31 × 64
Conv2	128	3 × 3	(1, 1)	valid	19 × 26 × 128
Max-pool1	-	2 × 2	(2, 2)	0	9 × 13 × 128
Conv3	64	3 × 3	(1, 1)	same	9 × 13 × 64
SE2	64	-	-	-	9 × 13 × 64
Conv4	128	3 × 3	(1, 1)	valid	4 × 8 × 128
Max-pool2	-	2 × 2	(2, 2)	0	2 × 4 × 128
FC1	-	-	-	-	256 × 1
FC2	-	-	-	-	2 × 1
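The spatial sizes in Table 1 can be traced with the usual output-size rule for convolution and pooling windows: `same` padding gives ceil(n / stride), and `valid` (no padding) gives floor((n − k) / stride) + 1. The sketch below is a minimal shape calculator with hypothetical helper names, not the authors' code. One observation from running it: the Conv2 and Conv4 outputs listed in the table (19 × 26 and 4 × 8) are what a 6 × 6 window with valid padding would produce, whereas a 3 × 3 window would give 22 × 29 after Conv2. The valid-convolution kernel size is therefore left as a parameter so that both readings can be compared; the sketch does not settle which one the implementation used. The SE blocks are omitted because they do not change the spatial size.

```python
def conv_out(size, kernel, stride=1, padding="valid"):
    """Output length along one axis for a convolution or pooling window."""
    if padding == "same":
        return -(-size // stride)              # ceil(size / stride)
    return (size - kernel) // stride + 1       # no padding

def table1_layers(valid_kernel):
    """Layer list (name, kernel, stride, padding) following the rows of Table 1."""
    return [
        ("Conv1", 3, 1, "same"),
        ("Conv2", valid_kernel, 1, "valid"),
        ("Max-pool1", 2, 2, "valid"),
        ("Conv3", 3, 1, "same"),
        ("Conv4", valid_kernel, 1, "valid"),
        ("Max-pool2", 2, 2, "valid"),
    ]

def trace(height, width, layers):
    """Propagate a height x width input through the layers and record each output size."""
    shapes = []
    for name, kernel, stride, padding in layers:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
        shapes.append((name, height, width))
    return shapes

if __name__ == "__main__":
    for k in (3, 6):
        print(f"valid-convolution kernel {k} x {k}:")
        for name, h, w in trace(24, 31, table1_layers(k)):
            print(f"  {name:10s} {h} x {w}")
```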

Table 2. Parameter setting table.

Model	Hyperparameters
Our proposed model	Epoch = 100, batch = 8, the optimization method is Adam, learning rate = 0.0001, decay = $1 \times 10^{-5}$, $\alpha$ = 0.95, $\gamma$ = 3
LR	C = 100, penalty = ‘l2’
RF	max_depth = 13, max_features = 9, min_samples_leaf = 10, min_samples_split = 50, n_estimators = 200
SVM (linear/poly/RBF/sigmoid)	C = 100, gamma = ‘auto’, cache_ = 500
XGBoost	colsample_bytree = 0.8, gamma = 0, learning rate = 0.01, max_depth = 3, n_estimators = 100
SDAE	Epoch = 800, batch = 512, the optimization method is Adam, learning rate = 0.0001, decay = $1 \times 10^{-5}$, $\alpha$ = 0.95, $\gamma$ = 3
1D-CNN	Epoch = 150, batch = 8, the optimization method is Adam, learning rate = 0.0001, decay = 0.0001, $\alpha$ = 0.95, $\gamma$ = 3
CDR2IMG	Epoch = 150, batch = 8, the optimization method is Adam, learning rate = 0.0001, decay = 0.0001, $\alpha$ = 0.95, $\gamma$ = 3

Table 3. Confusion Matrix for Binary Classification Model.

User Status	Prediction = 0	Prediction = 1
label = 0	TN	FP
label = 1	FN	TP
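The four cells of Table 3 are enough to compute recall, accuracy, and F1-score using their standard definitions; AUC is not included because it needs the ranked prediction scores rather than a single confusion matrix. The following is a minimal sketch in which the function name and the example counts are hypothetical and chosen only to illustrate an imbalanced test set.

```python
def classification_metrics(tn, fp, fn, tp):
    """Recall, precision, accuracy and F1 from the four confusion-matrix cells."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "accuracy": accuracy, "f1": f1}

if __name__ == "__main__":
    # Hypothetical counts: 100 fraudulent and 10,000 normal users.
    m = classification_metrics(tn=8000, fp=2000, fn=20, tp=80)
    for name, value in m.items():
        print(f"{name}: {value:.4f}")
```

With these hypothetical counts, recall and accuracy are both 0.80 while the F1-score stays below 0.1, because the false positives drawn from the much larger negative class push precision down. The same effect explains why F1-scores can fall as more negative samples are added even when recall is stable.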

Table 4. Results of different model evaluation metrics under varying numbers of negative samples.

Negative Sample Count	Metric	LR	RF	SVM (L)	SVM (P)	SVM (R)	SVM (S)	XGBoost	SDAE	1D-CNN	CDR2IMG	Our Model
N_10,000	Recall	0.7664	0.7009	0.7573	0.5922	0.6990	0.6990	0.7290	0.6601	0.7184	0.4953	0.8130
	Accuracy	0.8156	0.8763	0.8251	0.8451	0.8327	0.8113	0.8118	0.7882	0.7975	0.7509	0.7859
	F1-score	0.2971	0.3606	0.2977	0.2723	0.2903	0.2662	0.2826	0.2218	0.2578	0.1682	0.2779
	AUC	0.8768	0.8832	0.8684	0.8122	0.8656	0.8472	0.8118	0.7966	0.8478	0.6535	0.8632
N_20,000	Recall	0.7477	0.6636	0.7664	0.6168	0.7757	0.7103	0.7009	0.7169	0.7289	0.5140	0.7196
	Accuracy	0.8225	0.8880	0.8232	0.8442	0.8293	0.8191	0.8093	0.8093	0.8397	0.7138	0.8190
	F1-score	0.1800	0.2359	0.1887	0.1710	0.1915	0.1698	0.1608	0.1381	0.1914	0.0856	0.1714
	AUC	0.8775	0.8813	0.8809	0.8221	0.8862	0.8577	0.8693	0.8289	0.8868	0.6461	0.8734
N_50,000	Recall	0.7570	0.6822	0.7664	0.6916	0.7757	0.7009	0.7196	0.7009	0.7289	0.5233	0.7570
	Accuracy	0.8241	0.8849	0.8272	0.8226	0.8333	0.7720	0.8091	0.8135	0.8206	0.6754	0.8069
	F1-score	0.0835	0.1115	0.0859	0.0762	0.0897	0.0320	0.0739	0.0736	0.0792	0.0330	0.0766
	AUC	0.8755	0.8757	0.8793	0.8357	0.8863	0.8227	0.8618	0.8288	0.8844	0.6321	0.8616
N_ALL	Recall	0.7664	0.6449	0.7757	0.7009	0.7570	0.6916	0.7103	0.6635	0.7570	0.3925	0.7476
	Accuracy	0.8233	0.8837	0.8252	0.814	0.8368	0.7073	0.8089	0.8321	0.8379	0.8066	0.7713
	F1-score	0.0471	0.0595	0.0482	0.0412	0.0502	0.0262	0.0209	0.0431	0.0505	0.0226	0.0362
	AUC	0.8763	0.8777	0.8816	0.8339	0.8812	0.7753	0.8505	0.8088	0.8755	0.6064	0.8429

Table 5. Ablation study on loss functions and feature dimensions.

Model	Recall	Accuracy	F1-Score	AUC
6d feature model	0.7009	0.8018	0.2645	0.8437
8d feature model (WCE)	0.7757	0.7884	0.2716	0.8436
8d feature model	0.8130	0.7859	0.2779	0.8632
10d feature model	0.7102	0.8174	0.2835	0.8677

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

