Figure 1.
Overview of Wangiri fraud detection in real life.
Figure 2.
Literature review analysis: (a) distribution of telecom fraud detection research by methodology type; (b) distribution of research focus by fraud mechanism.
Figure 3.
Overview of the Wangiri fraud detection pipeline, including data ingestion, feature engineering, balancing, model training, calibration, and evaluation.
Figure 4.
Distribution of fraud vs. non-fraud calls (y-axis in millions of calls).
Figure 5.
Spearman correlation of features with the fraud label.
Figure 6.
Spearman correlation heatmap among all features.
Figure 7.
Log-scale distributions of key numerical features.
Figure 9.
Call distribution by source and destination time of day, colored by fraud label.
Figure 10.
Top 10 caller and destination countries in fraudulent calls.
Figure 11.
Analysis of engineered features by fraud status.
Figure 12.
callduration vs. cpg_time, colored by fraud status. Blue (non-fraud) dots dominate, reflecting the rarity of fraudulent calls.
Figure 13.
Training and Evaluation Procedure.
Figure 14.
Confusion matrices for Logistic Regression (SMOTE + RUS, no calibration) on raw distribution (left) and balanced (right) test sets.
Figure 15.
ROC (left), PR (center), and calibration (right) curves for Logistic Regression (SMOTE + RUS, no calibration) on the raw distribution test set. Dashed lines show chance-level baselines in the ROC and PR panels and the perfectly calibrated reference in the calibration panel.
Figure 16.
ROC (left), PR (center), and calibration (right) curves for Logistic Regression (SMOTE + RUS, no calibration) on the balanced test set. Dashed lines show chance-level baselines in the ROC and PR panels and the perfectly calibrated reference in the calibration panel.
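The calibration panels in Figures 15 and 16 bin the predicted probabilities and compare each bin's mean prediction to the observed fraud frequency. A hedged sketch of how such a curve can be produced with scikit-learn's `calibration_curve`, on synthetic imbalanced data standing in for the CDR features (not the paper's actual pipeline):

```python
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the CDR feature matrix.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

# Observed fraud fraction per predicted-probability bin; points on the
# diagonal indicate a well-calibrated model.
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10, strategy="quantile")
for p, f in zip(mean_pred, frac_pos):
    print(f"predicted {p:.3f}  observed {f:.3f}")
```

The `strategy="quantile"` option keeps bin populations comparable, which matters under heavy class imbalance where most predictions cluster near zero.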
Figure 17.
Confusion matrices for Random Forest (SMOTE + RUS, no calibration) on raw distribution (left) and balanced (right) test sets.
Figure 18.
ROC (left), PR (center), and calibration (right) curves for Random Forest (SMOTE + RUS, no calibration) on the raw distribution test set. Dashed lines show chance-level baselines in the ROC and PR panels and the perfectly calibrated reference in the calibration panel.
Figure 19.
ROC (left), PR (center), and calibration (right) curves for Random Forest (SMOTE + RUS, no calibration) on the balanced test set. Dashed lines show chance-level baselines in the ROC and PR panels and the perfectly calibrated reference in the calibration panel.
Figure 20.
Confusion matrices for XGBoost (SMOTE + RUS, no calibration) on raw distribution (left) and balanced (right) test sets.
Figure 21.
ROC (left), PR (center), and calibration (right) curves for XGBoost (SMOTE + RUS, no calibration) on the raw distribution test set. Dashed lines show chance-level baselines in the ROC and PR panels and the perfectly calibrated reference in the calibration panel.
Figure 22.
ROC (left), PR (center), and calibration (right) curves for XGBoost (SMOTE + RUS, no calibration) on the balanced test set. Dashed lines show chance-level baselines in the ROC and PR panels and the perfectly calibrated reference in the calibration panel.
Figure 23.
SHAP bar plot for the XGBoost model (SMOTE + RUS, no calibration).
Figure 24.
SHAP summary plot for the XGBoost model (SMOTE + RUS, no calibration).
Figure 25.
SHAP waterfall plot for the XGBoost model (SMOTE + RUS, no calibration).
Figure 26.
Confusion matrices for Decision Tree (SMOTE + RUS, no calibration) on raw distribution (left) and balanced (right) test sets.
Figure 27.
ROC (left), PR (center), and calibration (right) curves for Decision Tree (SMOTE + RUS, no calibration) on the raw distribution test set. Dashed lines show chance-level baselines in the ROC and PR panels and the perfectly calibrated reference in the calibration panel.
Figure 28.
ROC (left), PR (center), and calibration (right) curves for Decision Tree (SMOTE + RUS, no calibration) on the balanced test set. Dashed lines show chance-level baselines in the ROC and PR panels and the perfectly calibrated reference in the calibration panel.
Figure 29.
Confusion matrices for MLP (SMOTE + RUS, no calibration) on raw distribution (left) and balanced (right) test sets.
Figure 30.
ROC (left), PR (center), and calibration (right) curves for MLP (SMOTE + RUS, no calibration) on the raw distribution test set. Dashed lines show chance-level baselines in the ROC and PR panels and the perfectly calibrated reference in the calibration panel.
Figure 31.
ROC (left), PR (center), and calibration (right) curves for MLP (SMOTE + RUS, no calibration) on the balanced test set. Dashed lines show chance-level baselines in the ROC and PR panels and the perfectly calibrated reference in the calibration panel.
Table 1.
A comparative analysis of state-of-the-art research papers on telecom fraud detection using AI techniques.
| Study | Year | Methodology | Fraud Type | Real-Time | Accuracy | Key Contribution |
|---|---|---|---|---|---|---|
| Sahin et al. [26] | 2011 | Supervised ML | General | No | ∼89% | Benchmark of standard ML models |
| Arafat et al. [27] | 2019 | Ensemble ML | Wangiri | No | ∼92% | Boosting for missed-call scams |
| Sahaidak et al. [42] | 2022 | Literature Review | Hybrid | No | - | Industry/hybrid fraud review |
| Ravi et al. [30] | 2022 | Mixed ML | Wangiri | No | Pattern-dependent | Fraud pattern taxonomy |
| Krasic and Celar [29] | 2022 | ML + SMOTE | General | No | Improved F1 | Handling imbalanced data |
| Hu et al. [21] | 2022 | GNN + RL | General | Yes | High precision/recall | Graph imbalance handling |
| Bayram and Özkoç [41] | 2023 | Regulatory | General (Turkey) | No | - | Policy reform proposals |
| Cazzolato et al. [33] | 2023 | Graph Visualization | Multi-type | Yes | Analyst-friendly | Visual fraud discovery |
| Muchilwa et al. [40] | 2023 | Threat Intel Sharing | Phone fraud | Yes | Platform success | Coeus data sharing |
| Liang et al. [31] | 2023 | Autoencoder + Binning | General | Yes | Beats GNNs | GNN-free modeling |
| Wahid et al. [32] | 2024 | Neural Autoencoder | General | Yes | 95.45% (F1) | Memory-aware deep streaming |
| Mundia et al. [39] | 2024 | Policy + Qual. | SIM swap, Wangiri | No | - | Concept drift and biometric proposal |
| Birhanu [28] | 2024 | RF, NN | SIM-box | Yes | 100% | Real-time slicing with full accuracy |
| Boskou et al. [37] | 2024 | Prompted LLM | Financial | No | 67% (F1) | Fraud detection in corp. docs |
| Korkanti [38] | 2024 | LLM + Analytics | Financial (broad) | Yes | High precision | Hybrid anomaly detection |
| Singh et al. [34] | 2025 | RAG + LLM | Conversational | Yes | 97.98% | Real-time voice + policy detection |
| Shen et al. (WS *) [35] | 2025 | LLM Evaluation | Conversational | No | ∼99% (RF) | Dataset bias analysis |
| Shen et al. (IWM **) [36] | 2025 | Real-time LLM | Conversational | Yes | Effective alerts | User scam prevention |
Table 2.
CDR attributes related to Wangiri Fraud Detection.
| CDR Attribute | Stands for | Description |
|---|---|---|
| CALLERCC | Calling Party’s Country Code | A string indicating the home country code of the calling number. |
| CALLEDNO | Called Number | A string representing the number that was dialed. |
| CDRID | Call Detail Record ID | A unique numeric identifier assigned to each CDR entry. |
| STARTTIME | Start Time | A numeric value denoting the second component of the timestamp of the Initial Address Message (IAM). Unit: seconds. |
| MILLISEC | Milliseconds | A numeric value indicating the millisecond component of the IAM timestamp. Unit: milliseconds. |
| OPC | Originating Point Code | A numeric field identifying the signaling point code of the originating network element. |
| DESTCC | Destination Country Code | A string indicating the home carrier’s country code of the called party. |
| FORMATCALLERNO | Normalized Calling Number | A normalized representation of the caller number. |
| IAM TIME | Initial Address Message Time | A numeric value representing the delay of the Initial Address Message. Unit: milliseconds. |
| ACM TIME | Address Complete Message Time | A numeric field indicating the delay of the Address Complete Message. Unit: milliseconds. |
| CPG TIME | Call Progress Message Time | A numeric field representing the delay of the Call Progress Message. Unit: milliseconds. |
| CALLDURATION | Call Duration | A numeric field measuring the duration of the call. Unit: seconds/milliseconds. |
Table 3.
XGBoost hyperparameter search space.
| Hyperparameter | Values Tested |
|---|---|
| n_estimators | {150, 300} |
| max_depth | {3, 5, 7} |
| learning_rate | {0.05, 0.1, 0.2} |
| subsample | {0.8, 0.9, 1.0} |
| colsample_bytree | {0.8, 0.9, 1.0} |
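The grid in Table 3 spans 2 × 3 × 3 × 3 × 3 = 162 candidate configurations. A minimal, library-agnostic sketch of enumerating these candidates (a real search would wrap each candidate in cross-validated XGBoost training, e.g. via scikit-learn's `GridSearchCV`):

```python
from itertools import product

# Search space exactly as listed in Table 3.
grid = {
    "n_estimators":     [150, 300],
    "max_depth":        [3, 5, 7],
    "learning_rate":    [0.05, 0.1, 0.2],
    "subsample":        [0.8, 0.9, 1.0],
    "colsample_bytree": [0.8, 0.9, 1.0],
}

# Every combination of hyperparameter values, as keyword dicts that
# could be passed to an XGBoost classifier constructor.
keys = list(grid)
candidates = [dict(zip(keys, values)) for values in product(*grid.values())]
print(len(candidates))  # 2 * 3 * 3 * 3 * 3 = 162
```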
Table 4.
Confusion Matrix.
| Actual \ Predicted | Fraud | Non-Fraud |
|---|---|---|
| Fraud | TP | FN |
| Non-Fraud | FP | TN |
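The macro-averaged metrics reported in Tables 5–11 follow directly from the counts in Table 4: precision, recall, and F1 are computed per class (fraud as positive, then non-fraud as positive) and averaged with equal weight, so the rare fraud class counts as much as the majority class. A stdlib sketch:

```python
def macro_metrics(tp, fn, fp, tn):
    """Macro-averaged precision/recall/F1 from confusion-matrix counts."""
    def prf(tp_, fp_, fn_):
        p = tp_ / (tp_ + fp_) if tp_ + fp_ else 0.0
        r = tp_ / (tp_ + fn_) if tp_ + fn_ else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    # Fraud as the positive class, then non-fraud as the positive class
    # (for non-fraud, TN plays the role of TP, FN of FP, FP of FN).
    p1, r1, f1 = prf(tp, fp, fn)
    p0, r0, f0 = prf(tn, fn, fp)
    return {"precision_macro": (p1 + p0) / 2,
            "recall_macro":    (r1 + r0) / 2,
            "f1_macro":        (f1 + f0) / 2}

# Hypothetical imbalanced test set: most fraud found, at some false-alarm cost.
print(macro_metrics(tp=95, fn=5, fp=50, tn=99_850))
```

This equal-weight averaging explains why accuracy near 0.999 can coexist with macro F1 near 0.5 in the tables: a model that never predicts fraud scores perfectly on the majority class and zero on the fraud class.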
Table 5.
Summary of metrics for Logistic Regression across all configurations on the label_unique_calls_and_callduration dataset. The best-performing result in the Training Strategy section of the table is highlighted in bold.
| Training Strategy | Calibration | Test Set | Accuracy | Precision (Macro) | Recall (Macro) | F1 (Macro) | ROC-AUC/PR-AUC |
|---|---|---|---|---|---|---|---|
| train_full | none | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.8477/0.0063 |
| train_full | none | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.8456/0.8256 |
| train_full | isotonic | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.8488/0.0068 |
| train_full | isotonic | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.8469/0.8209 |
| train_full | sigmoid | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.8477/0.0063 |
| train_full | sigmoid | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.8456/0.8256 |
| train_smote | none | raw_dist | 0.9934 | 0.5536 | 0.9948 | 0.5951 | 0.9948/0.0727 |
| train_smote | none | balanced | 0.9946 | 0.9946 | 0.9946 | 0.9946 | 0.9949/0.9809 |
| train_smote | isotonic | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.9963/0.1135 |
| train_smote | isotonic | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.9965/0.9936 |
| train_smote | sigmoid | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.9948/0.0727 |
| train_smote | sigmoid | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.9949/0.9809 |
| train_smote_rus | none | raw_dist | 0.9934 | 0.5536 | 0.9948 | 0.5951 | 0.9948/0.0727 |
| train_smote_rus | none | balanced | 0.9946 | 0.9946 | 0.9946 | 0.9946 | 0.9949/0.9809 |
| train_smote_rus | isotonic | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.9963/0.1135 |
| train_smote_rus | isotonic | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.9965/0.9936 |
| train_smote_rus | sigmoid | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.9948/0.0727 |
| train_smote_rus | sigmoid | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.9949/0.9809 |
| train_rus | none | raw_dist | 0.9931 | 0.5520 | 0.9944 | 0.5924 | 0.9948/0.0729 |
| train_rus | none | balanced | 0.9944 | 0.9944 | 0.9944 | 0.9944 | 0.9949/0.9810 |
| train_rus | isotonic | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.9966/0.1137 |
| train_rus | isotonic | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.9968/0.9938 |
| train_rus | sigmoid | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.9948/0.0729 |
| train_rus | sigmoid | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.9949/0.9810 |
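The `train_smote`, `train_rus`, and `train_smote_rus` strategies in Tables 5–9 balance the training set by synthetic minority oversampling (SMOTE), random undersampling of the majority class (RUS), or both; in practice these are typically applied via the imbalanced-learn library. The sketch below re-implements the core ideas in plain NumPy to show what the balancing step does; all sizes and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def smote_like(X_min, n_new, k=5):
    """Naive SMOTE-style oversampling: each synthetic point is an interpolation
    between a random minority sample and one of its k nearest minority neighbors."""
    idx = rng.integers(0, len(X_min), size=n_new)
    out = []
    for i in idx:
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]          # k nearest, skipping the point itself
        j = rng.choice(nn)
        lam = rng.random()                   # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.vstack(out)

def random_undersample(X_maj, n_keep):
    """RUS: keep a random subset of the majority class."""
    keep = rng.choice(len(X_maj), size=n_keep, replace=False)
    return X_maj[keep]

# Toy imbalance: 20 fraud rows vs. 1000 non-fraud rows, 4 features each.
X_fraud = rng.normal(0, 1, size=(20, 4))
X_ok = rng.normal(3, 1, size=(1000, 4))

# SMOTE the minority up and undersample the majority down to meet at 500/500.
X_fraud_bal = np.vstack([X_fraud, smote_like(X_fraud, n_new=480)])
X_ok_bal = random_undersample(X_ok, n_keep=500)
print(X_fraud_bal.shape, X_ok_bal.shape)
```

Combining both (the `train_smote_rus` strategy) limits the amount of synthetic minority data needed while discarding fewer majority rows than pure RUS, which is consistent with pure-RUS runs showing the weakest raw-distribution PR-AUC in the tables.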
Table 6.
Summary of metrics for Random Forest across all configurations on the label_unique_calls_and_callduration dataset. The best-performing result in the Training Strategy section of the table is highlighted in bold.
| Training Strategy | Calibration | Test Set | Accuracy | Precision (Macro) | Recall (Macro) | F1 (Macro) | ROC-AUC/PR-AUC |
|---|---|---|---|---|---|---|---|
| train_full | none | raw_dist | 0.9999 | 0.9820 | 0.9820 | 0.9820 | 0.99999/0.9943 |
| train_full | none | balanced | 0.9817 | 0.9823 | 0.9817 | 0.9817 | 0.99988/0.99986 |
| train_full | isotonic | raw_dist | 0.9999 | 0.9773 | 0.9863 | 0.9817 | 0.99999/0.9928 |
| train_full | isotonic | balanced | 0.9860 | 0.9864 | 0.9860 | 0.9860 | 0.99989/0.99985 |
| train_full | sigmoid | raw_dist | 0.9999 | 0.9827 | 0.9796 | 0.9811 | 0.99999/0.9943 |
| train_full | sigmoid | balanced | 0.9793 | 0.9801 | 0.9793 | 0.9793 | 0.99988/0.99986 |
| train_smote | none | raw_dist | 0.9998 | 0.8923 | 0.9932 | 0.9370 | 0.99999/0.9836 |
| train_smote | none | balanced | 0.9930 | 0.9931 | 0.9930 | 0.9930 | 0.99994/0.99994 |
| train_smote | isotonic | raw_dist | 0.9999 | 0.9602 | 0.9701 | 0.9651 | 0.99972/0.9795 |
| train_smote | isotonic | balanced | 0.9699 | 0.9715 | 0.9699 | 0.9699 | 0.99967/0.99966 |
| train_smote | sigmoid | raw_dist | 0.9999 | 0.9588 | 0.9704 | 0.9645 | 0.99999/0.9836 |
| train_smote | sigmoid | balanced | 0.9702 | 0.9718 | 0.9702 | 0.9701 | 0.99994/0.99994 |
| train_smote_rus | none | raw_dist | 0.9998 | 0.8946 | 0.9929 | 0.9383 | 0.99999/0.9841 |
| train_smote_rus | none | balanced | 0.9927 | 0.9928 | 0.9927 | 0.9927 | 0.99993/0.99992 |
| train_smote_rus | isotonic | raw_dist | 0.9999 | 0.9751 | 0.9570 | 0.9659 | 0.99972/0.9808 |
| train_smote_rus | isotonic | balanced | 0.9567 | 0.9601 | 0.9567 | 0.9566 | 0.99967/0.99965 |
| train_smote_rus | sigmoid | raw_dist | 0.9999 | 0.9603 | 0.9739 | 0.9670 | 0.99999/0.9841 |
| train_smote_rus | sigmoid | balanced | 0.9737 | 0.9749 | 0.9737 | 0.9736 | 0.99993/0.99992 |
| train_rus | none | raw_dist | 0.9976 | 0.6233 | 0.9988 | 0.6972 | 0.99978/0.6461 |
| train_rus | none | balanced | 0.9984 | 0.9984 | 0.9984 | 0.9984 | 0.99989/0.99984 |
| train_rus | isotonic | raw_dist | 0.9995 | 0.8273 | 0.9068 | 0.8628 | 0.99951/0.6456 |
| train_rus | isotonic | balanced | 0.9070 | 0.9216 | 0.9070 | 0.9062 | 0.99962/0.99958 |
| train_rus | sigmoid | raw_dist | 0.9994 | 0.7963 | 0.9782 | 0.8658 | 0.99978/0.6461 |
| train_rus | sigmoid | balanced | 0.9780 | 0.9788 | 0.9780 | 0.9779 | 0.99989/0.99984 |
Table 7.
Summary of metrics for XGBoost across all configurations on the label_unique_calls_and_callduration dataset. The best-performing result in the Training Strategy section of the table is highlighted in bold.
| Training Strategy | Calibration | Test Set | Accuracy | Precision (Macro) | Recall (Macro) | F1 (Macro) | ROC-AUC/PR-AUC |
|---|---|---|---|---|---|---|---|
| train_full | none | raw_dist | 0.9998 | 0.9925 | 0.8745 | 0.9254 | 0.9899/0.8836 |
| train_full | none | balanced | 0.8745 | 0.8997 | 0.8745 | 0.8725 | 0.9902/0.9926 |
| train_full | isotonic | raw_dist | 0.9998 | 0.9784 | 0.9172 | 0.9457 | 0.9899/0.8757 |
| train_full | isotonic | balanced | 0.9172 | 0.9290 | 0.9172 | 0.9166 | 0.9902/0.9926 |
| train_full | sigmoid | raw_dist | 0.9998 | 0.9922 | 0.8763 | 0.9265 | 0.9785/0.8817 |
| train_full | sigmoid | balanced | 0.8763 | 0.9009 | 0.8763 | 0.8744 | 0.9787/0.9792 |
| train_smote | none | raw_dist | 0.9996 | 0.8752 | 0.8695 | 0.8723 | 0.9999/0.8485 |
| train_smote | none | balanced | 0.8694 | 0.8962 | 0.8694 | 0.8671 | 0.9996/0.9994 |
| train_smote | isotonic | raw_dist | 0.9996 | 0.8572 | 0.9327 | 0.8913 | 0.9996/0.8327 |
| train_smote | isotonic | balanced | 0.9325 | 0.9404 | 0.9325 | 0.9322 | 0.9994/0.9991 |
| train_smote | sigmoid | raw_dist | 0.9996 | 0.8955 | 0.8190 | 0.8532 | 0.9999/0.8485 |
| train_smote | sigmoid | balanced | 0.8188 | 0.8667 | 0.8188 | 0.8127 | 0.9996/0.9994 |
| train_smote_rus | none | raw_dist | 0.9997 | 0.8961 | 0.8897 | 0.8929 | 0.9999/0.8734 |
| train_smote_rus | none | balanced | 0.8895 | 0.9093 | 0.8895 | 0.8882 | 0.9996/0.9993 |
| train_smote_rus | isotonic | raw_dist | 0.9997 | 0.8981 | 0.8838 | 0.8908 | 0.9991/0.8591 |
| train_smote_rus | isotonic | balanced | 0.8836 | 0.9054 | 0.8836 | 0.8820 | 0.9988/0.9986 |
| train_smote_rus | sigmoid | raw_dist | 0.9996 | 0.9140 | 0.8499 | 0.8793 | 0.9999/0.8734 |
| train_smote_rus | sigmoid | balanced | 0.8497 | 0.8842 | 0.8497 | 0.8463 | 0.9996/0.9993 |
| train_rus | none | raw_dist | 0.9980 | 0.6451 | 0.9988 | 0.7244 | 0.9996/0.6024 |
| train_rus | none | balanced | 0.9981 | 0.9981 | 0.9981 | 0.9981 | 0.9996/0.9996 |
| train_rus | isotonic | raw_dist | 0.9994 | 0.7980 | 0.8783 | 0.8333 | 0.9996/0.5864 |
| train_rus | isotonic | balanced | 0.8785 | 0.9022 | 0.8785 | 0.8767 | 0.9996/0.9994 |
| train_rus | sigmoid | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.9996/0.6024 |
| train_rus | sigmoid | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.9996/0.9996 |
Table 8.
Summary of metrics for Decision Tree across all configurations on the label_unique_calls_and_callduration dataset. The best-performing result in the Training Strategy section of the table is highlighted in bold.
| Training Strategy | Calibration | Test Set | Accuracy | Precision (Macro) | Recall (Macro) | F1 (Macro) | ROC-AUC/PR-AUC |
|---|---|---|---|---|---|---|---|
| train_full | none | raw_dist | 0.99998 | 0.9925 | 0.9970 | 0.9948 | 0.9970/0.9793 |
| train_full | none | balanced | 0.9970 | 0.9971 | 0.9970 | 0.9970 | 0.9970/0.9970 |
| train_full | isotonic | raw_dist | 0.99998 | 0.9925 | 0.9970 | 0.9948 | 0.9970/0.9793 |
| train_full | isotonic | balanced | 0.9970 | 0.9971 | 0.9970 | 0.9970 | 0.9970/0.9970 |
| train_full | sigmoid | raw_dist | 0.99998 | 0.9925 | 0.9970 | 0.9948 | 0.9970/0.9793 |
| train_full | sigmoid | balanced | 0.9970 | 0.9971 | 0.9970 | 0.9970 | 0.9970/0.9970 |
| train_smote | none | raw_dist | 0.9999 | 0.9752 | 0.9900 | 0.9825 | 0.9900/0.9316 |
| train_smote | none | balanced | 0.9901 | 0.9902 | 0.9901 | 0.9901 | 0.9901/0.9901 |
| train_smote | isotonic | raw_dist | 0.9999 | 0.9752 | 0.9900 | 0.9825 | 0.9900/0.9316 |
| train_smote | isotonic | balanced | 0.9901 | 0.9902 | 0.9901 | 0.9901 | 0.9901/0.9901 |
| train_smote | sigmoid | raw_dist | 0.9999 | 0.9752 | 0.9900 | 0.9825 | 0.9900/0.9316 |
| train_smote | sigmoid | balanced | 0.9901 | 0.9902 | 0.9901 | 0.9901 | 0.9901/0.9901 |
| train_smote_rus | none | raw_dist | 0.9999 | 0.9752 | 0.9900 | 0.9825 | 0.9900/0.9316 |
| train_smote_rus | none | balanced | 0.9901 | 0.9902 | 0.9901 | 0.9901 | 0.9901/0.9901 |
| train_smote_rus | isotonic | raw_dist | 0.9999 | 0.9752 | 0.9900 | 0.9825 | 0.9900/0.9316 |
| train_smote_rus | isotonic | balanced | 0.9901 | 0.9902 | 0.9901 | 0.9901 | 0.9901/0.9901 |
| train_smote_rus | sigmoid | raw_dist | 0.9999 | 0.9752 | 0.9900 | 0.9825 | 0.9900/0.9316 |
| train_smote_rus | sigmoid | balanced | 0.9901 | 0.9902 | 0.9901 | 0.9901 | 0.9901/0.9901 |
| train_rus | none | raw_dist | 0.9991 | 0.7388 | 0.9993 | 0.8230 | 0.9993/0.4774 |
| train_rus | none | balanced | 0.9989 | 0.9989 | 0.9989 | 0.9989 | 0.9989/0.9981 |
| train_rus | isotonic | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.9993/0.4774 |
| train_rus | isotonic | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.9989/0.9981 |
| train_rus | sigmoid | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.9993/0.4774 |
| train_rus | sigmoid | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.9989/0.9981 |
Table 9.
Summary of metrics for MLP across all configurations on the label_unique_calls_and_callduration dataset. The best-performing result in the Training Strategy section of the table is highlighted in bold.
| Training Strategy | Calibration | Test Set | Accuracy | Precision (Macro) | Recall (Macro) | F1 (Macro) | ROC-AUC/PR-AUC |
|---|---|---|---|---|---|---|---|
| train_full | none | raw_dist | 0.9997 | 0.9681 | 0.8175 | 0.8783 | 0.9994/0.8816 |
| train_full | none | balanced | 0.8175 | 0.8663 | 0.8175 | 0.8112 | 0.9993/0.9995 |
| train_full | isotonic | raw_dist | 0.9997 | 0.9661 | 0.8196 | 0.8792 | 0.9996/0.8692 |
| train_full | isotonic | balanced | 0.8196 | 0.8674 | 0.8196 | 0.8136 | 0.9995/0.9994 |
| train_full | sigmoid | raw_dist | 0.9997 | 0.9595 | 0.8218 | 0.8785 | 0.9996/0.8816 |
| train_full | sigmoid | balanced | 0.8218 | 0.8686 | 0.8218 | 0.8159 | 0.9995/0.9995 |
| train_smote | none | raw_dist | 0.9993 | 0.7677 | 0.9980 | 0.8481 | 0.9999/0.8801 |
| train_smote | none | balanced | 0.9978 | 0.9979 | 0.9978 | 0.9978 | 0.9998/0.9998 |
| train_smote | isotonic | raw_dist | 0.9996 | 0.9063 | 0.8572 | 0.8802 | 0.9996/0.8652 |
| train_smote | isotonic | balanced | 0.8570 | 0.8885 | 0.8570 | 0.8540 | 0.9995/0.9994 |
| train_smote | sigmoid | raw_dist | 0.9994 | 0.7934 | 0.9919 | 0.8675 | 0.9999/0.8801 |
| train_smote | sigmoid | balanced | 0.9917 | 0.9918 | 0.9917 | 0.9917 | 0.9998/0.9998 |
| train_smote_rus | none | raw_dist | 0.9993 | 0.7697 | 0.9975 | 0.8497 | 0.9999/0.8476 |
| train_smote_rus | none | balanced | 0.9973 | 0.9973 | 0.9973 | 0.9973 | 0.9997/0.9997 |
| train_smote_rus | isotonic | raw_dist | 0.9996 | 0.8765 | 0.8617 | 0.8690 | 0.9969/0.8304 |
| train_smote_rus | isotonic | balanced | 0.8616 | 0.8913 | 0.8616 | 0.8589 | 0.9967/0.9966 |
| train_smote_rus | sigmoid | raw_dist | 0.9995 | 0.8080 | 0.9912 | 0.8786 | 0.9999/0.8476 |
| train_smote_rus | sigmoid | balanced | 0.9909 | 0.9910 | 0.9909 | 0.9909 | 0.9997/0.9997 |
| train_rus | none | raw_dist | 0.9953 | 0.5722 | 0.9974 | 0.6250 | 0.9991/0.3102 |
| train_rus | none | balanced | 0.9973 | 0.9973 | 0.9973 | 0.9973 | 0.9990/0.9972 |
| train_rus | isotonic | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.9988/0.3140 |
| train_rus | isotonic | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.9989/0.9976 |
| train_rus | sigmoid | raw_dist | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.9991/0.3102 |
| train_rus | sigmoid | balanced | 0.5000 | 0.2500 | 0.5000 | 0.3333 | 0.9990/0.9972 |
Table 10.
Summary of best model performances on the raw distribution test set (selected by highest PR-AUC). The best-performing result in the table is highlighted in bold.
| Model | Training Strategy | Calibration | Accuracy | Precision (Macro) | Recall (Macro) | F1 (Macro) | ROC-AUC | PR-AUC |
|---|---|---|---|---|---|---|---|---|
| Logistic Regression | train_rus | isotonic | 0.9992 | 0.4996 | 0.5000 | 0.4998 | 0.9966 | 0.1137 |
| Random Forest | train_full | none/sigmoid | 0.9999 | 0.9820/0.9827 | 0.9820/0.9796 | 0.9820/0.9811 | 0.99999 | 0.9943 |
| XGBoost | train_full | none | 0.9998 | 0.9925 | 0.8745 | 0.9254 | 0.9899 | 0.8836 |
| Decision Tree | train_full | none/isotonic/sigmoid | 0.99998 | 0.9925 | 0.9970 | 0.9948 | 0.9970 | 0.9793 |
| MLP | train_full | none/sigmoid | 0.9997 | 0.9681/0.9595 | 0.8175/0.8218 | 0.8783/0.8785 | 0.9994/0.9996 | 0.8816 |
Table 11.
Summary of best model performances on the balanced test set (selected by highest macro-averaged F1-score). The best-performing result in the table is highlighted in bold.
| Model | Training Strategy | Calibration | Accuracy | Precision (Macro) | Recall (Macro) | F1 (Macro) | ROC-AUC | PR-AUC |
|---|---|---|---|---|---|---|---|---|
| Logistic Regression | train_smote/train_smote_rus | none | 0.9946 | 0.9946 | 0.9946 | 0.9946 | 0.9949 | 0.9809 |
| Random Forest | train_rus | none | 0.9984 | 0.9984 | 0.9984 | 0.9984 | 0.99989 | 0.99984 |
| XGBoost | train_rus | none | 0.9981 | 0.9981 | 0.9981 | 0.9981 | 0.9996 | 0.9996 |
| Decision Tree | train_rus | none | 0.9989 | 0.9989 | 0.9989 | 0.9989 | 0.9989 | 0.9981 |
| MLP | train_smote | none | 0.9978 | 0.9979 | 0.9978 | 0.9978 | 0.9998 | 0.9998 |
Table 12.
Comparison of Wangiri fraud detection studies and the proposed method. The row for our proposed method is highlighted in bold.
| Study | Methodology | RT | Best Result | Key Findings/Contribution |
|---|---|---|---|---|
| Sahin et al. (2011) [26] | Supervised ML (DT, SVM) | No | Acc: ∼89% | Established baselines for general telecom fraud. Highlighted the need for feature engineering but lacked specific handling for Wangiri patterns. |
| Arafat et al. (2019) [27] | Ensemble ML (RF, AdaBoost, XGB) | No | XGB: Acc. 99.4%, F1 0.96 | Showed ensemble ML efficiency on labeled CDRs but lacked imbalance handling and unlabeled data support. The proposed work improves performance (PR-AUC 0.9943, F1 0.998) via SMOTE/RUS balancing. |
| Sahaidak et al. (2022) [42] | Literature Review | No | N/A | Provided a taxonomy of hybrid fraud types but lacked a computational implementation. Our work operationalizes these concepts into a functional detection pipeline. |
| Ravi et al. (2022) [30] | Mixed ML (SVM, RF, MLP, IF) | No | RF: Acc. 99%, F1 0.97 | Defined Wangiri taxonomy and compared supervised vs. unsupervised models. The proposed pipeline achieves F1 0.998 (+2.8%) and introduces pseudo-labeling and SHAP explainability. |
| Mundia et al. (2024) [39] | Policy/Qualitative Review | No | - | Identified gaps in current fraud management (manual processes, lack of automation, concept drift). The proposed work operationalizes these insights via an automated, drift-aware ML pipeline. |
| Proposed Method | ML pipeline (Ensemble) + SMOTE/RUS + SHAP | Yes | RF: PR-AUC 0.9943, F1 0.998 | Outperforms all prior studies by combining unsupervised labeling with optimized ensembles. Offers the highest reported precision/recall balance on imbalanced data while maintaining interpretability. |