1. Introduction
Credit card fraud poses an escalating threat to financial institutions worldwide, with losses estimated to surpass $40 billion annually by 2027 [1,2,3]. This upward trajectory is driven largely by the accelerated adoption of digital payments, e-commerce, and mobile banking, platforms that fraudsters increasingly exploit due to their rapid expansion and inherent vulnerabilities [4]. Detecting fraudulent activities early and accurately is essential; however, this task remains inherently challenging due to the extreme imbalance in transaction datasets, where fraud typically constitutes fewer than 0.2% of all transactions [5,6]. In this context, fraudulent transactions represent the minority samples (positive class), while the overwhelming majority of legitimate transactions form the majority class. Distinguishing these rare minority samples is critical, as they drive the difficulty and importance of the fraud detection task.
Historically, supervised machine learning models, including logistic regression, random forests, and neural networks, have been employed extensively for fraud detection [4,7,8]. These approaches primarily rely on historical labeled data and often presume static fraud patterns, leading to degraded performance when encountering new fraud behaviors [9]. Moreover, they treat transactions independently, neglecting relational cues such as shared devices, accounts, or temporal correlations.
To address these shortcomings, anomaly detection methods, particularly autoencoders, have been widely explored. Autoencoders model legitimate transaction behavior and flag deviations as potential fraud [10,11,12]. However, these models operate at the individual transaction level and may overlook coordinated schemes spanning multiple related transactions [13].
The evolution of fraud detection methods can be broadly characterized in three stages. First, supervised learning dominated early studies, leveraging labeled datasets but struggling with class imbalance and evolving fraud tactics. Second, unsupervised and semi-supervised methods such as autoencoders emerged, focusing on capturing deviations from normal patterns without requiring extensive fraud labels. Third, recent advances in Graph Neural Networks (GNNs) have transformed the field by modeling inter-transaction dependencies explicitly. Architectures like GraphSAGE [14] and Relational Graph Convolutional Networks (RGCN) [15] demonstrated that neighborhood aggregation and relation-specific learning can uncover subtle, coordinated fraud rings that traditional methods miss. This trajectory highlights a clear research gap: while VAEs capture probabilistic feature-level anomalies and GNNs capture relational dependencies, few frameworks effectively integrate the two with ensemble meta-learning for robust fraud detection.
Motivated by these complementary strengths, this study introduces a novel hybrid architecture combining VAE-based anomaly detection and Graph Attention Network (GAT)-based relational modeling within an ensemble framework. Specifically, a VAE captures probabilistic latent representations of legitimate transactions, enhancing anomaly detection at the feature level, while a GAT processes the transaction graph to extract relational embeddings encoding fraud-relevant dependencies. Outputs from both models are fused through a stacking ensemble with XGBoost, enabling robust and generalizable predictions.
The primary contributions of this paper are summarized as follows:
- Development of a hybrid architecture combining VAE-based anomaly detection and GAT-based relational modeling for enhanced credit card fraud detection.
- Introduction of a stacking ensemble with XGBoost to effectively fuse anomaly scores and graph-derived embeddings, improving generalization under class imbalance.
- Extensive validation on two real-world, highly imbalanced datasets, demonstrating superior performance of the proposed framework compared to strong baselines and recent state-of-the-art methods.
The rest of this paper is structured as follows: Section 2 reviews related work in credit card fraud detection. Section 3 details the datasets and fundamental methodologies employed in this study, including autoencoders, Graph Attention Networks, and the XGBoost meta-learner, as well as the evaluation metrics. Section 4 describes the proposed hybrid ensemble framework, integrating probabilistic anomaly detection, relational modeling, and ensemble learning. Section 5 presents the experimental results and comparative evaluations, followed by Section 6, which concludes the paper and discusses directions for future research.
2. Related Works
Research into credit card fraud detection has evolved considerably over the past decade, driven by the urgent need to mitigate significant financial losses due to fraudulent activities. Current methodologies broadly encompass supervised machine learning, unsupervised and semi-supervised anomaly detection, graph-based relational modeling, and ensemble learning techniques.
Early research primarily utilized supervised classification methods such as logistic regression, decision trees, support vector machines (SVM), and random forests due to their interpretability and ease of implementation. However, these traditional supervised methods often struggled with imbalanced data scenarios, resulting in high false negatives and limited generalization to emerging fraud patterns [4,8]. To address this imbalance, oversampling methods and cost-sensitive learning techniques have been widely investigated [9]. Despite such improvements, these supervised methods remain limited by their dependency on historical labeled data, making them susceptible to novel fraudulent behaviors.
To overcome these limitations, researchers have turned to unsupervised and semi-supervised methods, particularly autoencoders. Autoencoders have gained prominence due to their effectiveness in capturing complex latent representations of legitimate transactions, enabling the detection of anomalous deviations indicative of fraud [16]. Tayebi and El Kafhali [6], for instance, employed temporal autoencoders using Long Short-Term Memory (LSTM) networks, demonstrating significant improvements in fraud detection by explicitly modeling temporal transaction dependencies. Additionally, Almansoori and Telek [17] presented a hybrid autoencoder-based approach integrating isolation forests to enhance anomaly detection effectiveness, while Alshameri and Xia [18] explored Variational Autoencoders (VAEs) for robust unsupervised anomaly detection, significantly improving fraud identification. Compared to standard autoencoders, VAEs provide a probabilistic latent space, which enhances their ability to generalize to unseen fraudulent behaviors, though at the cost of increased computational complexity.
Graph-based methods, especially GNNs, address a gap left by transaction-level models by explicitly modeling inter-transaction relationships. Recent studies have validated the effectiveness of GNN-based architectures such as Graph Convolutional Networks (GCN) and GAT in capturing relational fraud patterns. Cherif et al. [43] recently proposed an encoder–decoder GNN model for fraud detection, demonstrating superior performance by effectively exploiting structural information within transaction graphs. Similarly, Gupta et al. [19] presented a GCN-based model, validating its capability to detect sophisticated fraudulent patterns through node embeddings derived from transaction graph structures. Furthermore, Khosravi et al. [20] demonstrated the effectiveness of attention-based GNN models in dynamically identifying complex relational fraud structures, indicating superior generalization and interpretability. In particular, GATs often outperform GCNs because their attention mechanism selectively emphasizes informative neighbors and downweights noisy or irrelevant edges, whereas GCNs aggregate neighbors with fixed, structure-determined weights, making them more sensitive to graph noise.
Additionally, ensemble methods have been widely used in fraud detection for their ability to combine diverse learners and improve robustness. Bagging approaches, such as random forests, enhance stability but may underperform in extreme imbalance scenarios, while boosting techniques (e.g., AdaBoost, XGBoost, CatBoost) incrementally focus on hard-to-classify fraud cases and often deliver superior recall. Stacking ensembles extend these methods by learning meta-level representations from multiple base models, a strategy particularly effective in imbalanced fraud detection tasks [21]. Beyond classical approaches, recent hybrid models integrating deep learning with GNNs have shown significant promise in domains such as intrusion detection and cybersecurity, demonstrating their potential for fraud detection [22,23,24,25]. For example, Zhang and Luo [26] proposed a hybrid model integrating GNN embeddings with deep neural network classifiers, demonstrating significant performance gains in real-world fraud detection tasks.
Despite these advancements, existing hybrid models often do not fully leverage the probabilistic modeling advantages of advanced autoencoder variants such as VAEs, nor do they extensively explore enriched graph representations capturing dynamic relational patterns. In response to these research gaps, our proposed approach integrates a Variational Autoencoder-based anomaly detection module with a GAT designed for relational modeling, employing an XGBoost meta-learner to unify and interpret the combined outputs. This framework aims to capitalize on the strengths of unsupervised probabilistic anomaly detection, attention-based relational learning, and ensemble fusion, thus offering improved accuracy and robustness to imbalanced and evolving fraud detection scenarios.
3. Materials and Methods
This section describes the datasets and summarizes the fundamental techniques used in this research, specifically autoencoders for anomaly detection, GAT for relational modeling, and XGBoost as a meta-learner.
3.1. Dataset Description
This study evaluates model performance on two widely used, real-world credit card fraud datasets: the European Credit Card dataset [27] and the IEEE-CIS Fraud Detection dataset. The experiments are restricted to these two benchmarks, rather than a larger collection of datasets, in order to provide a focused and rigorous evaluation of the proposed hybrid framework.
3.1.1. European Credit Card Fraud Dataset
This study uses the publicly available dataset introduced by Dal Pozzolo et al. [27], which consists of 284,807 transactions collected over two consecutive days in 2013. Out of these, only 492 transactions (0.172%) are labeled as fraudulent, resulting in a highly imbalanced binary classification problem. Each transaction is described by 30 numerical features: 28 have been anonymized through principal component analysis (PCA) to protect confidentiality, while the remaining two features (Time and Amount) retain their original form.
3.1.2. IEEE-CIS Fraud Detection Dataset
The second benchmark is the IEEE-CIS Fraud Detection dataset, made available as part of a Kaggle competition (https://www.kaggle.com/c/ieee-fraud-detection, accessed on 26 May 2025). This dataset was provided by Vesta Corporation, a global leader in fraud prevention for e-commerce platforms, making it highly representative of real-world online transaction environments. It contains over 1.1 million transactions, with approximately 3.5% labeled as fraud. Each record is described by 431 features, spanning transaction amount, device and browser information, payment card attributes, network details, and digital identity variables. The dataset has become a widely recognized benchmark in the research community due to its scale and complexity and the visibility it gained through the Kaggle competition, which attracted thousands of participants. This significance makes it particularly well-suited for evaluating the scalability and robustness of advanced fraud detection frameworks.
3.2. Autoencoder
Autoencoders are unsupervised neural networks that learn to encode input data into a compressed latent representation and then decode it back to reconstruct the original input [28,29]. In the context of fraud detection, an autoencoder is trained using only legitimate (non-fraudulent) transactions. This allows the model to learn the underlying patterns of normal behavior, so that, during inference, transactions that deviate significantly from these patterns, resulting in high reconstruction errors, can be flagged as potential fraud. The strength of autoencoders lies in their ability to capture subtle deviations from normality without requiring explicit fraud labels.
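The reconstruction-error idea can be illustrated with a minimal sketch. It does not reproduce the networks used in this study: a linear autoencoder fitted by singular value decomposition (equivalent to PCA) stands in for a trained neural autoencoder, the data are synthetic, and the 99th-percentile threshold is a hypothetical choice made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "legitimate" data lying close to a 2-D subspace of a 5-D feature space.
basis = rng.normal(size=(2, 5))
legit = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 5))

# Fit on legitimate samples only: centre, then keep the top-2 principal directions.
mean = legit.mean(axis=0)
_, _, vt = np.linalg.svd(legit - mean, full_matrices=False)
components = vt[:2]  # rows act as the "encoder"; their transpose acts as the "decoder"


def reconstruction_error(x):
    """Squared error between x and its reconstruction from the 2-D latent code."""
    centred = np.atleast_2d(x) - mean
    latent = centred @ components.T   # encode
    restored = latent @ components    # decode
    return ((centred - restored) ** 2).sum(axis=1)


# Hypothetical cut-off: the 99th percentile of errors on legitimate data.
threshold = np.quantile(reconstruction_error(legit), 0.99)

# A probe displaced along a direction the model never learned to reconstruct.
probe = mean + 5.0 * vt[4]
print(reconstruction_error(probe)[0] > threshold)
```

Because the probe lies outside the subspace learned from legitimate data, its reconstruction error is large relative to the threshold, which is the behavior the section attributes to fraudulent transactions.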
3.3. Graph Attention Networks
GATs [30] are deep learning models designed to work with graph-structured data. Here, each node represents a transaction, and edges indicate similarity or contextual relationships, such as shared card information or temporal proximity [31]. Unlike traditional graph neural networks, GATs introduce an attention mechanism that learns to assign different levels of importance to neighboring nodes, thus enabling the model to focus more on informative or suspicious relationships. This is especially useful for fraud detection, as it allows the network to identify complex, coordinated fraud activities that manifest as anomalous connectivity patterns within the transaction graph.
In this study, the construction of the transaction graph is a critical step, based on temporal windows, feature similarity, and shared identifiers. The full procedure for graph construction is detailed in Section 4, Algorithm 1, which illustrates how transactions are transformed into graph-structured input for the GAT.
Algorithm 1 Transaction graph construction
Require: Transaction dataset with features and timestamps
Ensure: Transaction graph G = (V, E)
1: Initialize graph G with nodes V, one for each transaction
2: for each pair of transactions (i, j) do
3:     if the feature-space distance between i and j is within the similarity threshold then
4:         Add edge (i, j) to E
5:     else if shared identifiers exist (e.g., card ID, device) then
6:         Add edge (i, j) to E
7:     else if i and j occur within the temporal window then
8:         Add edge (i, j) to E
9:     end if
10: end for
11: Encode node features using normalized transaction attributes
12: return Graph G
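The three edge rules can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the field names (`time`, `card_id`, `device`, `features`) and the distance threshold of 0.5 are hypothetical, and only the 60 s window comes from the paper. The pairwise scan is quadratic in the number of transactions and is used here only for readability.

```python
from itertools import combinations
from math import dist

TIME_WINDOW_S = 60.0   # temporal window stated in the paper
DIST_THRESHOLD = 0.5   # hypothetical value; the paper does not report the threshold


def build_edges(transactions):
    """Return the set of undirected edges (i, j), i < j, over transaction indices."""
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(transactions), 2):
        close_in_features = dist(a["features"], b["features"]) <= DIST_THRESHOLD
        shared_identifier = a["card_id"] == b["card_id"] or a["device"] == b["device"]
        close_in_time = abs(a["time"] - b["time"]) <= TIME_WINDOW_S
        if close_in_features or shared_identifier or close_in_time:
            edges.add((i, j))
    return edges


# Toy records (all values are made up for the example).
transactions = [
    {"time": 0.0, "card_id": "c1", "device": "d1", "features": (0.0, 0.0)},
    {"time": 30.0, "card_id": "c2", "device": "d2", "features": (5.0, 5.0)},     # near 0 in time
    {"time": 900.0, "card_id": "c1", "device": "d3", "features": (9.0, 9.0)},    # shares a card with 0
    {"time": 5000.0, "card_id": "c4", "device": "d4", "features": (9.1, 9.2)},   # near 2 in features
    {"time": 9000.0, "card_id": "c5", "device": "d5", "features": (50.0, 50.0)}, # isolated
]

edges = build_edges(transactions)
print(sorted(edges))  # [(0, 1), (0, 2), (2, 3)]
```

Each of the three edges in the output is produced by a different rule, and the last transaction stays isolated because it satisfies none of them.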
3.4. XGBoost
XGBoost is an efficient implementation of gradient-boosted decision trees that excels on structured, tabular data and is particularly well-suited for tasks with severe class imbalance, such as fraud detection [32]. It combines multiple weak learners, typically shallow decision trees, into a strong ensemble, iteratively correcting errors of previous trees while applying regularization to prevent overfitting [33]. In this study, XGBoost acts as a meta-classifier, aggregating the diverse information produced by both the autoencoder and the graph-based models to generate robust final predictions.
3.5. Evaluation Metrics
To rigorously assess model performance under extreme class imbalance, we employ a suite of metrics that captures both overall correctness and the model’s sensitivity to rare fraud cases; these metrics include accuracy, precision, recall, F1-score, the receiver operating characteristic (ROC) curve, and the area under the ROC curve (AUC). Accuracy measures the proportion of all transactions that are correctly classified but can be misleading in imbalanced datasets, since a model predicting only the majority class may still achieve high accuracy [34,35]. To provide a more meaningful evaluation, we also report precision, which measures the proportion of predicted fraud cases that are actually fraudulent and thus reflects the model’s ability to avoid false alarms [36]. Recall, also known as sensitivity, captures the proportion of actual fraudulent transactions that are correctly identified, making it critical for minimizing undetected fraud. The F1-score, which is the harmonic mean of precision and recall, balances the trade-off between these two metrics and provides a single value reflecting both false positives and false negatives [37]. These metrics are mathematically defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where $TP$ denotes true positives (correctly detected frauds), $TN$ denotes true negatives (correctly detected non-frauds), $FP$ denotes false positives (genuine transactions incorrectly classified as fraud), and $FN$ denotes false negatives (fraudulent transactions missed by the model) [37,38]. Together, these metrics provide a holistic evaluation framework that captures both the correctness of fraud detection and the model’s discriminative ability in the presence of severe class imbalance.
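A short sketch makes the warning about accuracy concrete. The function below computes the four threshold-based metrics from confusion-matrix counts; returning 0.0 when a denominator is zero is a convention assumed here, not something the paper specifies. The example evaluates a trivial classifier that labels every transaction as legitimate, using the class counts of the European dataset (492 frauds among 284,807 transactions).

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # assumed convention: 0.0 if undefined
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# "Always legitimate" classifier: no fraud is ever predicted.
always_legit = classification_metrics(tp=0, tn=284_807 - 492, fp=0, fn=492)
print(always_legit)  # accuracy is about 0.9983, while recall and F1 are 0.0
```

The accuracy of roughly 99.8% hides the fact that no fraud is detected at all, which is why precision, recall, and F1-score carry most of the weight in the evaluation.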
4. Proposed Methodology
The proposed framework, shown in Figure 1, integrates probabilistic anomaly detection, relational graph modeling, and ensemble learning to robustly address imbalanced credit card fraud detection. The process begins with a VAE, which is trained exclusively on non-fraudulent transactions to learn the typical distribution of normal behavior. Given input vector $x$, the encoder parameterized by $\phi$ maps $x$ to a latent representation $z$, and the decoder parameterized by $\theta$ reconstructs $x$ from $z$ [28]. The VAE is optimized by minimizing a combined loss comprising the reconstruction error and the Kullback–Leibler (KL) divergence, as follows:

$$\mathcal{L}_{\mathrm{VAE}} = \lVert x - \hat{x} \rVert^{2} + D_{\mathrm{KL}}\big(q_{\phi}(z \mid x) \,\Vert\, p(z)\big)$$

where $\hat{x}$ is the reconstruction of the input, $q_{\phi}(z \mid x)$ is the approximate posterior, $p(z)$ is the prior over latent variables, and $\lVert \cdot \rVert$ denotes the Euclidean norm. The reconstruction error between the input $x$ and its reconstructed output $\hat{x}$ is treated as an anomaly score, where higher values suggest deviations from normal behavior and potential fraudulent activity.
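The two terms of the VAE objective can be computed directly for a single sample. The sketch below assumes a diagonal Gaussian approximate posterior with mean `mu` and log-variance `log_var`, a standard normal prior, a squared-error reconstruction term, and equal weighting of the two terms; the paper does not state these choices, so they should be read as common defaults rather than the authors' configuration. Under these assumptions the KL term has the closed form $\tfrac{1}{2}\sum_k (\mu_k^2 + \sigma_k^2 - \log \sigma_k^2 - 1)$.

```python
import math


def vae_loss(x, x_hat, mu, log_var):
    """Return (total, reconstruction, kl) for one sample.

    reconstruction: squared Euclidean distance between x and x_hat
    kl: KL(N(mu, diag(exp(log_var))) || N(0, I)) in closed form
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    kl = 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))
    return recon + kl, recon, kl


# Toy values (hypothetical): a 2-D input and a 2-D latent code.
total, recon, kl = vae_loss(
    x=[1.0, 2.0], x_hat=[1.0, 2.5], mu=[1.0, 0.0], log_var=[0.0, 0.0]
)
print(total, recon, kl)  # 0.75 0.25 0.5
```

In the framework described above, only the reconstruction term is reused at inference time as the anomaly score; the KL term shapes the latent space during training.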
A central component of this framework is the construction of the transaction graph $G = (V, E)$. Each transaction is represented as a node, and edges are formed when transactions share identifiers (e.g., card or device), occur within a temporal window of 60 s, or lie within a thresholded Euclidean distance in PCA-transformed feature space. Node features are encoded using normalized transaction attributes such as amount, device metadata, and anonymized principal components. The detailed procedure is outlined in Algorithm 1.
Meanwhile, in the GAT model, the constructed graph $G$ is processed to update each node embedding $h_i$ using a weighted aggregation of its neighbors’ features:

$$h_i' = \sigma\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} W h_j\Big)$$

where $h_j$ is the feature vector for node $j$, $W$ is a learnable weight matrix, $\sigma$ is a non-linear activation, and $\mathcal{N}(i)$ is the set of neighbors for node $i$ [39,40]. The attention coefficients $\alpha_{ij}$ are computed as

$$\alpha_{ij} = \frac{\exp\big(\mathrm{LeakyReLU}(a^{\top}[W h_i \,\Vert\, W h_j])\big)}{\sum_{k \in \mathcal{N}(i)} \exp\big(\mathrm{LeakyReLU}(a^{\top}[W h_i \,\Vert\, W h_k])\big)}$$

where $a$ is a trainable attention vector and $\Vert$ denotes vector concatenation [40]. This mechanism enables the GAT to learn the relative importance of each neighbor, allowing for effective identification of complex, context-driven fraud patterns. In this work, the GAT is trained in a semi-supervised setting where the available fraud labels guide learning. A cross-entropy loss function is optimized over labeled nodes, and class imbalance is mitigated by applying inverse-frequency class weights during training. The model is trained end-to-end rather than freezing embeddings at intermediate stages, ensuring that node representations remain fully adaptive to the fraud classification objective. The outputs from the VAE (anomaly scores), GAT (node embeddings and contextual fraud probabilities), and other relevant features are concatenated to form enriched feature vector $f_i$ for each transaction:

$$f_i = \big[\, s_i^{\mathrm{VAE}} \,\Vert\, p_i^{\mathrm{GAT}} \,\Vert\, h_i' \,\big]$$

where $s_i^{\mathrm{VAE}}$ is the VAE anomaly score, $p_i^{\mathrm{GAT}}$ is the GAT-based fraud probability, and $h_i'$ is the learned node embedding. These representations are provided as input to an XGBoost meta-classifier, which learns to combine them for final fraud prediction:

$$\hat{y}_i = F_{\mathrm{XGB}}(f_i)$$

where $F_{\mathrm{XGB}}$ denotes the XGBoost model. The overall workflow is summarized in Algorithm 2.
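The attention computation for a single node can be written out in plain Python. The sketch follows the standard single-head GAT formulation (LeakyReLU with negative slope 0.2, softmax over the neighborhood); the use of `tanh` as the output activation, the identity weight matrix, and all numeric values are assumptions chosen for a small worked example, not settings taken from this study.

```python
import math


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def leaky_relu(v, slope=0.2):
    return v if v > 0 else slope * v


def attention_coefficients(i, neighbors, H, W, a):
    """Softmax-normalised attention of node i over its neighbors (single head)."""
    wh_i = matvec(W, H[i])
    scores = []
    for j in neighbors:
        concat = wh_i + matvec(W, H[j])  # list concatenation plays the role of [Wh_i || Wh_j]
        scores.append(leaky_relu(sum(p * q for p, q in zip(a, concat))))
    top = max(scores)                    # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(neighbors, exps)}


def aggregate(i, neighbors, H, W, a):
    """Updated embedding of node i: activation of the attention-weighted sum of W h_j."""
    alpha = attention_coefficients(i, neighbors, H, W, a)
    summed = [0.0] * len(W)
    for j in neighbors:
        for k, value in enumerate(matvec(W, H[j])):
            summed[k] += alpha[j] * value
    return [math.tanh(v) for v in summed]  # tanh is an assumed choice of activation


# Toy graph: node 0 with neighbors 1 and 2 (hypothetical values).
H = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 1.0, 0.0]

alpha = attention_coefficients(0, [1, 2], H, W, a)
print(alpha)
print(aggregate(0, [1, 2], H, W, a))
```

With these values the raw scores are 0.5 for neighbor 1 and 1.5 for neighbor 2, so neighbor 2 receives the larger share of attention and dominates the aggregated embedding.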
Furthermore, model validation is performed using stratified 5-fold cross-validation to ensure the minority class is adequately represented. Early stopping is applied to all neural components, and ensemble training is carried out on out-of-fold predictions to prevent information leakage. This unified architecture capitalizes on the strengths of probabilistic anomaly detection, relational modeling, and ensemble fusion, resulting in improved performance and robustness for fraud detection in highly imbalanced and dynamic settings.
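The out-of-fold protocol can be sketched without any machine learning library. In the example below, a round-robin assignment stands in for a stratified splitter, and a one-dimensional midpoint rule stands in for the base learners; both are simplifications introduced here for illustration, and the data are synthetic.

```python
def stratified_folds(labels, k=5):
    """Assign each index to one of k folds, spreading every class evenly (round-robin)."""
    fold_of = [0] * len(labels)
    seen = {}
    for idx, y in enumerate(labels):
        fold_of[idx] = seen.get(y, 0) % k
        seen[y] = seen.get(y, 0) + 1
    return fold_of


def fit_midpoint(xs, ys):
    """Toy base learner: midpoint between the two class means of a 1-D feature."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def out_of_fold_scores(xs, ys, k=5):
    """Score every sample with a learner that never saw it during fitting."""
    fold_of = stratified_folds(ys, k)
    oof = [None] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if fold_of[i] != fold]
        midpoint = fit_midpoint([xs[i] for i in train], [ys[i] for i in train])
        for i in range(len(xs)):
            if fold_of[i] == fold:
                oof[i] = xs[i] - midpoint  # meta-feature passed on to the stacker
    return oof, fold_of


# Synthetic data: 5 positives among 25 samples, so each fold holds exactly one positive.
ys = [1 if i % 5 == 0 else 0 for i in range(25)]
xs = [10.0 + i if y == 1 else float(i % 3) for i, y in enumerate(ys)]

oof, fold_of = out_of_fold_scores(xs, ys, k=5)
print(oof[:5])
```

Because each meta-feature comes from a model fitted on the other four folds, the meta-classifier never sees base-learner outputs that were produced on the learner's own training data, which is the leakage the protocol is meant to prevent.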
Algorithm 2 Hybrid VAE–GAT ensemble for fraud detection
Require: Transaction dataset with labels
Ensure: Fraud predictions for transactions
1: Train VAE on non-fraud subset and compute anomaly scores
2: Construct transaction graph G using Algorithm 1
3: Train GAT on G with weighted cross-entropy loss
4: for each node in G do
5:     Compute embeddings and fraud probabilities
6: end for
7: for each transaction do
8:     Form ensemble input
9:     Predict using XGBoost
10: end for
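The weighted cross-entropy step of Algorithm 2 relies on inverse-frequency class weights. One common convention, assumed here because the paper does not give the exact formula, sets the weight of class $c$ to $N / (K \cdot n_c)$, where $N$ is the number of labeled samples, $K$ the number of classes, and $n_c$ the size of class $c$. The loss below is normalised by the total weight, which is also an assumption.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """w_c = N / (K * n_c): rarer classes receive proportionally larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Weighted binary cross-entropy, normalised by the sum of the sample weights.

    probs[i] is the predicted probability that sample i is fraudulent.
    """
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        p_true = p if y == 1 else 1.0 - p
        total += -weights[y] * math.log(max(p_true, eps))
        norm += weights[y]
    return total / norm


# Toy label set (hypothetical): 2 frauds among 100 labeled nodes.
labels = [0] * 98 + [1] * 2
weights = inverse_frequency_weights(labels)
print(weights[1])  # 25.0

# Confident on the legitimate nodes, badly wrong on both frauds.
probs = [0.01] * 98 + [0.10] * 2
print(weighted_cross_entropy(probs, labels, weights))
```

With these weights the two fraud nodes carry the same total weight as the 98 legitimate ones, so missing them dominates the loss instead of being averaged away.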
5. Results and Discussion
The evaluation was conducted on two benchmark datasets: the European Credit Card Fraud dataset and the IEEE-CIS Fraud Detection dataset, both exhibiting severe class imbalance. All preprocessing and modeling were implemented in Python 3.11 using scikit-learn v1.7.2, XGBoost v3.0.5, CatBoost v1.2.3, TensorFlow v2.20.0, and PyTorch-Geometric v2.6.1 for graph-based learning. For the graph-based components, transactions were connected based on shared identifiers, temporal proximity (within 60 s), and Euclidean distance in PCA space.
For model selection and hyperparameter tuning, a stratified 80/20 train–test split was employed, ensuring representation of the minority fraud class in both sets. A stratified 5-fold cross-validation strategy was used for hyperparameter optimization, ensuring balanced representation of fraud and non-fraud samples across folds. Early stopping was applied to all neural models to reduce overfitting, while the GAT employed a weighted cross-entropy loss to address class imbalance. To prevent information leakage, ensemble training was based on out-of-fold predictions from the base learners rather than reusing training outputs. The VAE was trained exclusively on non-fraud transactions. The model parameters are shown in Table 1.
5.1. Experimental Results
Table 2 and Table 3 summarize the performance of all models across the two credit card fraud detection benchmarks. Several insights emerge from these results. Firstly, all traditional baseline models, including logistic regression (LR), random forest (RF), XGBoost, and CatBoost, exhibited high overall accuracy, a well-documented artifact of class imbalance in fraud detection. LR, despite its high recall (0.9184 and 0.7087 for the European and IEEE-CIS datasets, respectively), showed comparatively lower precision, indicating a tendency towards false positives.
RF, XGBoost, and CatBoost achieved a better balance between precision and recall, as reflected in their respective F1-scores (Table 2). The hybrid models (Hybrid-XGBoost and Hybrid-CatBoost) significantly outperformed the baseline classifiers on both datasets, achieving F1-scores exceeding 0.97 on the European dataset and approximately 0.98 on the IEEE-CIS dataset. This improvement highlights the benefit of integrating VAE-generated anomaly scores with graph-based relational features captured by the GAT within an ensemble learning framework.
As illustrated in Figure 2 and Figure 3, the ROC curves for both hybrid models dominate those of baseline models across all thresholds. Hybrid-XGBoost achieved AUC values of 0.995 (European dataset) and 0.990 (IEEE-CIS dataset), while Hybrid-CatBoost recorded 0.992 and 0.984, respectively, confirming strong discriminatory performance.
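For readers less familiar with the AUC, it equals the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one, with ties counted as one half. The sketch below computes it from that pairwise definition; the scores and labels are invented for the example and are unrelated to the results reported above. The double loop is quadratic and intended only for small illustrations.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (fraud, legitimate) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy scores (hypothetical): one fraud is ranked below one legitimate transaction.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(roc_auc(scores, labels))  # 5 of 6 pairs are ordered correctly, about 0.833
```

Because the AUC depends only on the ranking of scores, it is unaffected by the choice of decision threshold, which is why it complements the threshold-dependent precision, recall, and F1-score.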
5.2. Comparison with Similar Studies
We benchmark our approach against state-of-the-art methods in Table 4. While deep ensembles such as GAN-LSTM [41] and CNN-LSTM hybrids [42] report strong F1-scores around 0.975–0.984, our Hybrid-XGBoost surpasses these with an F1-score of 0.9857. Similarly, encoder–decoder GNNs [43] achieve moderate gains by exploiting relational structures, but their performance (F1 = 0.86) falls short of our probabilistic–relational integration. Transformer-based models [44] show solid precision (0.958) but considerably weaker recall (0.795), highlighting their limitation in detecting rare fraud cases.
Compared with stacked deep learning ensembles [21] and LSTM ensembles [45], which report recall above 0.996 but without corresponding precision values, our hybrid approach achieves a more balanced performance, reducing false alarms while maintaining near-perfect recall. These findings show that the joint use of VAE-derived anomaly scores and GAT-based embeddings within a stacking ensemble provides consistent improvements across all metrics, offering a more generalizable and practical solution for real-world fraud detection.
Table 4.
Comparison with other recent studies. Results for our proposed models are highlighted in bold.
Reference | Technique | Recall | Precision | F1-Score |
--- | --- | --- | --- | --- |
Mienye and Swart [41] | GAN-LSTM | 0.990 | 0.979 | 0.984 |
Forough and Momtazi [46] | LSTM-conditional random fields | 0.756 | - | 0.807 |
Alfaiz and Fati [47] | CatBoost-resampling | 0.961 | - | 0.869 |
Alfaiz and Fati [47] | XGBoost-SVM SMOTE | 0.930 | - | 0.865 |
Ileberi and Sun [42] | Deep learning ensemble | 0.961 | 0.989 | 0.975 |
Xinwei et al. [48] | RNN | - | - | 0.473 |
Madhurya et al. [49] | LR | 0.968 | - | - |
Khalid et al. [50] | LR-SMOTE | 0.944 | - | 0.944 |
Varmedja et al. [51] | MLP-SMOTE | 0.816 | - | 0.804 |
Alarfaj et al. [52] | XGBoost | - | - | 0.845 |
Mrozek et al. [53] | RF-undersampling | 1.000 | - | 0.118 |
Jain et al. [54] | RF | 0.780 | - | 0.858 |
Lin and Jiang [55] | Autoencoder-RF | 0.814 | - | - |
Najadat et al. [56] | LR | 0.720 | - | 0.220 |
Dighe et al. [57] | LR | 0.816 | - | - |
Nadim et al. [58] | Classification and regression trees | 0.885 | - | - |
Alwan et al. [59] | Naive Bayes | 0.836 | - | - |
Asha and Kumar [60] | Artificial neural network | 0.761 | - | - |
Dhankhad et al. [61] | Random forest + undersampling | 0.950 | - | 0.950 |
Dhankhad et al. [61] | XGBoost with undersampling | 0.950 | - | 0.950 |
Tang and Liu et al. [44] | Structured Data Transformer | 0.795 | 0.958 | 0.867 |
Esenogho et al. [45] | LSTM Ensemble + SMOTE-ENN | 0.996 | - | - |
Mienye and Sun [21] | Stacked deep learning ensemble | 0.997 | - | - |
Cherif et al. [43] | Encoder–decoder GNN | 0.920 | 0.820 | 0.86 |
Ileberi and Sun [42] | Hybrid CNN-LSTM ensemble | 0.961 | 0.989 | 0.975 |
This Paper | Proposed Hybrid-XGBoost | 0.9976 | 0.9741 | 0.9857 |
This Paper | Proposed Hybrid-CatBoost | 0.9940 | 0.9738 | 0.9838 |
5.3. Discussion
The results across both datasets confirm that hybridizing VAE anomaly scores with GAT-based relational embeddings significantly enhances fraud detection performance. The F1-scores for Hybrid-XGBoost (0.9857 on the European dataset, 0.9842 on the IEEE-CIS dataset) and Hybrid-CatBoost (0.9838 on the European dataset, 0.9679 on the IEEE-CIS dataset) indicate an almost ideal balance between precision and recall. These models not only detect nearly all fraud instances but also significantly reduce false positives, which is critical in minimizing operational costs in fraud analytics systems.
The ROC curves in Figure 2 and Figure 3 corroborate this finding. Hybrid-XGBoost achieves AUC scores of 0.995 (European dataset) and 0.990 (IEEE-CIS dataset), while Hybrid-CatBoost follows with AUCs of 0.992 and 0.984. These high AUC values indicate that the hybrid models are effective across a wide range of thresholds. The improvements can be attributed to complementary strengths: the VAE identifies statistical outliers in high-dimensional feature space, while GAT captures latent topological patterns. Feeding these diverse insights into the meta-classifier enables more robust and informed decision-making.
5.4. Limitations of the Study
While highly effective, the proposed framework entails several practical trade-offs. First, constructing high-quality transaction graphs introduces additional computational overhead, especially for large-scale datasets. The performance of the GAT module is also sensitive to graph quality; noisy or sparse edges may limit its ability to capture meaningful relationships. Second, although class imbalance is mitigated using weighting schemes, extreme imbalance may still pose challenges for stable model training. Finally, this study does not include a complete ablation analysis (e.g., VAE-only, GAT-only, or linear fusion versus the full hybrid), which would provide further evidence of the contribution of individual modules. These aspects represent natural extensions for future research, particularly in the context of dynamic graph construction, scalable GNN variants, and visualization techniques for better interpretability.
6. Conclusions and Future Work
This paper presented a hybrid fraud detection framework that integrates VAE-based anomaly detection, GAT-based relational learning, and gradient-boosted decision ensembles (XGBoost and CatBoost) into a unified architecture. The method was evaluated on two highly imbalanced benchmark datasets, i.e., the European Credit Card and IEEE-CIS, and outperformed traditional models across all key metrics. In quantitative terms, the hybrid models achieved 15–20% higher F1-scores compared to baseline classifiers such as logistic regression, random forest, and standalone boosting methods. Hybrid-XGBoost in particular reached an F1-score of 0.9857 on the European dataset, while Hybrid-CatBoost achieved 0.9838, establishing new performance benchmarks on these datasets.
The results demonstrate that combining probabilistic feature-level anomalies from VAE with graph-based topological features from GAT significantly improves detection accuracy. Notably, the proposed Hybrid-XGBoost model achieved the strongest performance, outperforming all baseline models. The Hybrid-CatBoost variant also performed strongly, highlighting the robustness of the stacking ensemble mechanism. These findings support the hypothesis that hybrid architectures leveraging both local feature irregularities and global relational structures are more effective in capturing complex fraud patterns.
Future work will explore explainable AI (e.g., SHAP, LIME) to improve model interpretability and extend the framework to real-time streaming fraud detection environments using dynamic graph construction and online learning.
Author Contributions
Conceptualization, I.D.M., E.E., and C.M.; methodology, I.D.M., E.E., and C.M.; validation, I.D.M., E.E., and C.M.; investigation, I.D.M., E.E., and C.M.; writing – original draft preparation, I.D.M.; writing – review and editing, I.D.M., E.E., and C.M.; visualization, I.D.M., E.E., and C.M.; supervision, E.E. and C.M. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
AI | Artificial Intelligence |
CatBoost | Categorical Boosting |
F1 | F1-Score |
GAT | Graph Attention Network |
GNN | Graph Neural Network |
IEEE-CIS | IEEE Computational Intelligence Society |
LSTM | Long Short-Term Memory |
ML | Machine learning |
PCA | Principal component analysis |
ROC | Receiver operating characteristic |
SMOTE | Synthetic Minority Oversampling Technique |
VAE | Variational Autoencoder |
XGBoost | Extreme Gradient Boosting |
References
- Nilson Report. Global Card Fraud Losses to Exceed $40 Billion by 2027. 2023. Available online: https://nilsonreport.com/newsletters/1276/ (accessed on 3 July 2025).
- Bin Sulaiman, R.; Schetinin, V.; Sant, P. Review of machine learning approach on credit card fraud detection. Hum.-Centric Intell. Syst. 2022, 2, 55–68.
- Gupta, R.; Srivastava, P.; Taluja, H.K.; Sharma, S.; Samant, S.; Ratna, S.; Sharma, A. Leveraging Machine Learning Algorithms for Fraud Detection and Prevention in Digital Payments: A Cross Country Comparison. In Proceedings of the International Conference on Information Technology, Noida, India, 2–3 March 2023; Springer: Singapore, 2023; pp. 369–381.
- Theodorakopoulos, L.; Theodoropoulou, A.; Tsimakis, A.; Halkiopoulos, C. Big Data-Driven Distributed Machine Learning for Scalable Credit Card Fraud Detection Using PySpark, XGBoost, and CatBoost. Electronics 2025, 14, 1754.
- Gupta, P.; Varshney, A.; Khan, M.R.; Ahmed, R.; Shuaib, M.; Alam, S. Unbalanced credit card fraud detection data: A machine learning-oriented comparative study of balancing techniques. Procedia Comput. Sci. 2023, 218, 2575–2584.
- Tayebi, M.; El Kafhali, S. Generative Modeling for Imbalanced Credit Card Fraud Transaction Detection. J. Cybersecur. Priv. 2025, 5, 9.
- Mniai, A.; Tarik, M.; Jebari, K. A novel framework for credit card fraud detection. IEEE Access 2023, 11, 112776–112786.
- Huang, H.; Liu, B.; Xue, X.; Cao, J.; Chen, X. Imbalanced credit card fraud detection data: A solution based on hybrid neural network and clustering-based undersampling technique. Appl. Soft Comput. 2024, 154, 111368.
- Cherif, A.; Badhib, A.; Ammar, H.; Alshehri, S.; Kalkatawi, M.; Imine, A. Credit card fraud detection in the era of disruptive technologies: A systematic review. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 145–174.
- Fanai, H.; Abbasimehr, H. A novel combined approach based on deep Autoencoder and deep classifiers for credit card fraud detection. Expert Syst. Appl. 2023, 217, 119562.
- Shi, S.; Luo, W.; Pau, G. An attention-based balanced variational autoencoder method for credit card fraud detection. Appl. Soft Comput. 2025, 177, 113190.
- Koo, K.; Park, M.; Yoon, B. A suspicious financial transaction detection model using autoencoder and risk-based approach. IEEE Access 2024, 12, 68926–68939.
- Alarfaj, F.K.; Shahzadi, S. Enhancing Fraud detection in banking with deep learning: Graph neural networks and autoencoders for real-time credit card fraud prevention. IEEE Access 2024, 13, 20633–20646.
- Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30. Available online: https://proceedings.neurips.cc/paper_files/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf (accessed on 15 September 2025).
- Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Van Den Berg, R.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. In Proceedings of the European Semantic Web Conference, Heraklion, Greece, 3–7 June 2018; Springer: Cham, Switzerland, 2018; pp. 593–607.
- Ding, Y.; Kang, W.; Feng, J.; Peng, B.; Yang, A. Credit card fraud detection based on improved Variational Autoencoder Generative Adversarial Network. IEEE Access 2023, 11, 83680–83691.
- Almansoori, M.; Telek, M. Anomaly detection using combination of autoencoder and isolation forest. In Proceedings of the 1st Workshop on Intelligent Infocommunication Networks, Systems and Services (WI2NS2), Budapest, Hungary, 7 February 2023; Budapest University of Technology and Economics: Budapest, Hungary, 2023; pp. 25–30.
- Alshameri, F.; Xia, R. An evaluation of variational autoencoder in credit card anomaly detection. In Big Data Mining and Analytics; Tsinghua University Press: Beijing, China, 2024.
- Gupta, V.; Mishra, N.; Dash, Y.; Kumar, U.; Abraham, A. Graph Convolutional Network-Driven Adaptive Learning Framework for Fraud Detection in Complex Transactional Cryptonetworks. In Proceedings of the 2025 3rd International Conference on Communication, Security, and Artificial Intelligence (ICCSAI), Greater Noida, India, 4–6 April 2025; IEEE: Piscataway, NJ, USA, 2025; Volume 3, pp. 685–689.
- Khosravi, S.; Kargari, M.; Teimourpour, B.; Talebi, M. Transaction fraud detection via attentional spatial–temporal GNN. J. Supercomput. 2025, 81, 537.
- Mienye, I.D.; Sun, Y. A Deep Learning Ensemble with Data Resampling for Credit Card Fraud Detection. IEEE Access 2023, 11, 30628–30638.
- Pradeep, M.; Gopalakrishnan, S. Hybrid Model for Imbalance Correction in Intrusion Detection Systems Using Advanced Optimization Techniques and Graph Neural Networks. In Proceedings of the 2024 4th International Conference on Mobile Networks and Wireless Communications (ICMNWC), Tumkuru, India, 4–5 December 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–9.
- Bilot, T.; El Madhoun, N.; Al Agha, K.; Zouaoui, A. Graph neural networks for intrusion detection: A survey. IEEE Access 2023, 11, 49114–49139.
- Dhadhania, A.; Bhatia, J.; Mehta, R.; Tanwar, S.; Sharma, R.; Verma, A. Unleashing the power of SDN and GNN for network anomaly detection: State-of-the-art, challenges, and future directions. Secur. Priv. 2024, 7, e337.
- Masood, S.; Zafar, A. Deep-efficient-guard: Securing wireless ad hoc networks via graph neural network. Int. J. Inf. Technol. 2024, 16, 4111–4126.
- Zhang, W.; Luo, C. GE-GNN: Gated Edge-augmented Graph Neural Network for Fraud Detection. IEEE Trans. Big Data 2025, 11, 1664–1676.
- Dal Pozzolo, A.; Caelen, O.; Johnson, R.A.; Bontempi, G. Calibrating probability with undersampling for unbalanced classification. In Proceedings of the 2015 IEEE Symposium Series on Computational Intelligence, Cape Town, South Africa, 7–10 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 159–166.
- Mienye, I.D.; Swart, T.G. Deep autoencoder neural networks: A comprehensive review and new perspectives. Arch. Comput. Methods Eng. 2025, 2, 25.
- Motamednia, H.; Mahmoudi-Aznaveh, A.; Ng, A.W. Autoencoders. In Dimensionality Reduction in Machine Learning; Elsevier: Amsterdam, The Netherlands, 2025; pp. 245–268.
- Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
- Cui, Y.; Han, X.; Chen, J.; Zhang, X.; Yang, J.; Zhang, X. FraudGNN-RL: A graph neural network with reinforcement learning for adaptive financial fraud detection. IEEE Open J. Comput. Soc. 2025, 6, 426–437.
- Pasaribu, J.; Yudistira, N.; Mahmudy, W.F. Tabular Data Classification and Regression: XGBoost or Deep Learning with Retrieval-Augmented Generation. IEEE Access 2024, 12, 191719–191732.
- Zhang, P.; Jia, Y.; Shang, Y. Research and application of XGBoost in imbalanced data. Int. J. Distrib. Sens. Netw. 2022, 18, 15501329221106935.
- Wang, L.; Han, M.; Li, X.; Zhang, N.; Cheng, H. Review of classification methods on unbalanced data sets. IEEE Access 2021, 9, 64606–64628.
- Mienye, I.D.; Swart, T.G. Ensemble Large Language Models: A Survey. Information 2025, 16, 688.
- Chakrabarty, B.; Moulton, P.C.; Pugachev, L.; Wang, X. Catch me if you can: In search of accuracy, scope, and ease of fraud prediction. Rev. Account. Stud. 2025, 30, 1268–1308.
- Diallo, R.; Edalo, C.; Awe, O.O. Machine learning evaluation of imbalanced health data: A comparative analysis of balanced accuracy, MCC, and F1 score. In Practical Statistical Learning and Data Science Methods: Case Studies from LISA 2020 Global Network, USA; Springer: Berlin/Heidelberg, Germany, 2024; pp. 283–312.
- Obaido, G.; Ogbuokiri, B.; Mienye, I.D.; Kasongo, S.M. A Voting Classifier for Mortality Prediction Post-Thoracic Surgery. In Proceedings of the International Conference on Intelligent Systems Design and Applications, Online, 12–14 December 2022; Springer: Cham, Switzerland, 2022; pp. 263–272.
- Vrahatis, A.G.; Lazaros, K.; Kotsiantis, S. Graph attention networks: A comprehensive review of methods and applications. Future Internet 2024, 16, 318.
- Brody, S.; Alon, U.; Yahav, E. How attentive are graph attention networks? arXiv 2021, arXiv:2105.14491.
- Mienye, I.D.; Swart, T.G. A hybrid deep learning approach with generative adversarial network for credit card fraud detection. Technologies 2024, 12, 186.
- Ileberi, E.; Sun, Y. A Hybrid Deep Learning Ensemble Model for Credit Card Fraud Detection. IEEE Access 2024, 12, 175829–175838.
- Cherif, A.; Ammar, H.; Kalkatawi, M.; Alshehri, S.; Imine, A. Encoder–decoder graph neural network for credit card fraud detection. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 102003.
- Tang, Y.; Liu, Z. A Credit Card Fraud Detection Algorithm Based on SDT and Federated Learning. IEEE Access 2024, 12, 182547–182560.
- Esenogho, E.; Mienye, I.D.; Swart, T.G.; Aruleba, K.; Obaido, G. A Neural Network Ensemble with Feature Engineering for Improved Credit Card Fraud Detection. IEEE Access 2022, 10, 16400–16407.
- Forough, J.; Momtazi, S. Sequential credit card fraud detection: A joint deep neural network and probabilistic graphical model approach. Expert Syst. 2021, 39, e12795.
- Alfaiz, N.S.; Fati, S.M. Enhanced Credit Card Fraud Detection Model Using Machine Learning. Electronics 2022, 11, 662.
- Zhang, X.; Han, Y.; Xu, W.; Wang, Q. HOBA: A novel feature engineering methodology for credit card fraud detection with a deep learning architecture. Inf. Sci. 2021, 557, 302–316.
- Madhurya, M.J.; Gururaj, H.L.; Soundarya, B.C.; Vidyashree, K.P.; Rajendra, A.B. Exploratory analysis of credit card fraud detection using machine learning techniques. Glob. Transitions Proc. 2022, 3, 31–37.
- Khalid, A.R.; Owoh, N.; Uthmani, O.; Ashawa, M.; Osamor, J.; Adejoh, J. Enhancing Credit Card Fraud Detection: An Ensemble Machine Learning Approach. Big Data Cogn. Comput. 2024, 8, 6.
- Varmedja, D.; Karanovic, M.; Sladojevic, S.; Arsenovic, M.; Anderla, A. Credit Card Fraud Detection - Machine Learning methods. In Proceedings of the 2019 18th International Symposium INFOTEH-JAHORINA (INFOTEH), East Sarajevo, Bosnia and Herzegovina, 20–22 March 2019; pp. 1–5.
- Alarfaj, F.K.; Malik, I.; Khan, H.U.; Almusallam, N.; Ramzan, M.; Ahmed, M. Credit Card Fraud Detection Using State-of-the-Art Machine Learning and Deep Learning Algorithms. IEEE Access 2022, 10, 39700–39715.
- Mrozek, P.; Panneerselvam, J.; Bagdasar, O. Efficient Resampling for Fraud Detection During Anonymised Credit Card Transactions with Unbalanced Datasets. In Proceedings of the 2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC), Leicester, UK, 7–10 December 2020; pp. 426–433.
- Jain, V.; Kavitha, H.; Mohana Kumar, S. Credit Card Fraud Detection Web Application using Streamlit and Machine Learning. In Proceedings of the 2022 IEEE International Conference on Data Science and Information System (ICDSIS), Hassan, India, 29–30 July 2022; pp. 1–5.
- Lin, T.H.; Jiang, J.R. Credit Card Fraud Detection with Autoencoder and Probabilistic Random Forest. Mathematics 2021, 9, 2683.
- Najadat, H.; Altiti, O.; Aqouleh, A.A.; Younes, M. Credit card fraud detection based on machine and deep learning. In Proceedings of the 2020 11th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 7–9 April 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 204–208.
- Dighe, D.; Patil, S.; Kokate, S. Detection of Credit Card Fraud Transactions Using Machine Learning Algorithms and Neural Networks: A Comparative Study. In Proceedings of the 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 16–18 August 2018; IEEE: Piscataway, NJ, USA, 2018; p. 8.
- Nadim, A.H.; Sayem, I.M.; Mutsuddy, A.; Chowdhury, M.S. Analysis of Machine Learning Techniques for Credit Card Fraud Detection. In Proceedings of the 2019 International Conference on Machine Learning and Data Engineering (iCMLDE), Taipei, China, 2–4 December 2019; IEEE: Piscataway, NJ, USA, 2019; p. 12.
- Alwan, R.H.; Hamad, M.M.; Dawood, O.A. Credit Card Fraud Detection in Financial Transactions Using Data Mining Techniques. In Proceedings of the 2021 7th International Conference on Contemporary Information Technology and Mathematics (ICCITM), Mosul, Iraq, 25–26 August 2021; IEEE: Piscataway, NJ, USA, 2021; p. 8.
- RB, A.; KR, S.K. Credit card fraud detection using artificial neural network. Glob. Transitions Proc. 2021, 2, 35–41.
- Dhankhad, S.; Mohammed, E.; Far, B. Supervised Machine Learning Algorithms for Credit Card Fraudulent Transaction Detection: A Comparative Study. In Proceedings of the 2018 IEEE International Conference on Information Reuse and Integration for Data Science (IRI), Salt Lake City, UT, USA, 6–9 July 2018; IEEE: Piscataway, NJ, USA, 2018; p. 7.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).