1. Introduction
Credit card fraud detection is a critical issue in the financial industry, with substantial economic implications [1,2,3]. According to recent reports, financial institutions worldwide lose over USD 32 billion annually due to fraudulent activities [2,4]. Additionally, the COVID-19 pandemic has led to a rise in online transactions, which further increased the risk of fraud [5]. Therefore, effective fraud detection systems are essential to mitigate these losses and protect consumers.
However, credit card fraud detection presents several challenges that traditional methods and even some deep learning techniques struggle to address effectively. For instance, fraud patterns constantly evolve as perpetrators develop new strategies to bypass detection systems [6,7]. This continuous evolution requires adaptive models that can learn from new and unseen fraudulent activities in real time, a task that traditional rule-based systems and even some classical machine learning (ML) models find challenging [8,9,10]. Additionally, credit card fraud is typically a rare event, accounting for a tiny fraction of overall transactions [11]. This leads to highly imbalanced datasets where legitimate transactions significantly outnumber fraudulent ones. Such imbalanced data makes it difficult to train models that can effectively detect the minority class (fraudulent transactions) without overfitting to the majority class [1].
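To make the imbalance problem concrete, the following minimal sketch shows how a classifier that never flags fraud can still report very high accuracy. The class counts are hypothetical and chosen only for illustration; they are not taken from either dataset used in this study.

```python
def accuracy_and_sensitivity(y_true, y_pred):
    """Return (accuracy, sensitivity) for binary labels where 1 = fraud."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    n_fraud = sum(y_true)
    accuracy = correct / len(y_true)
    sensitivity = true_pos / n_fraud if n_fraud else 0.0
    return accuracy, sensitivity


# Hypothetical data: 10,000 transactions, 20 of them fraudulent (0.2%).
y_true = [1] * 20 + [0] * 9_980
# A degenerate classifier that labels every transaction as legitimate.
y_pred = [0] * len(y_true)

acc, sens = accuracy_and_sensitivity(y_true, y_pred)
print(f"accuracy={acc:.3f}, sensitivity={sens:.3f}")  # accuracy=0.998, sensitivity=0.000
```

This is why the evaluation later in the paper reports sensitivity, precision, and F-measure rather than relying on accuracy alone.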
Furthermore, fraudulent behavior is often hidden within sequences of transactions over time. Thus, credit card fraud detection requires models capable of capturing temporal dependencies across sequential data. Fraudulent transactions may exhibit temporal relationships, such as multiple small transactions over a short period or patterns based on transaction time. Recurrent Neural Network (RNN)-based architectures like Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs) can model these temporal dependencies effectively. However, even these models face limitations when dealing with extreme data imbalance and complex, evolving fraud patterns that require synthetic data to enhance training.
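As an illustration of what "sequences of transactions" can mean in practice, the sketch below groups transactions by card and cuts them into fixed-length, time-ordered windows that a recurrent model could consume. How the study itself builds its sequences is not described in this section, so the card identifier, the tuple layout, and the `window` parameter are all assumptions made for the example.

```python
from collections import defaultdict


def build_sequences(transactions, window=3):
    """Group transactions by card and emit fixed-length sliding windows.

    `transactions` is an iterable of (card_id, timestamp, amount) tuples.
    Each returned item is (card_id, [amounts in the window]) in time order.
    """
    by_card = defaultdict(list)
    for card_id, timestamp, amount in transactions:
        by_card[card_id].append((timestamp, amount))

    sequences = []
    for card_id, rows in by_card.items():
        rows.sort()  # chronological order within each card
        amounts = [amount for _, amount in rows]
        # Cards with fewer than `window` transactions produce no sequence.
        for start in range(len(amounts) - window + 1):
            sequences.append((card_id, amounts[start:start + window]))
    return sequences


# Hypothetical toy data: card "A" shows several small charges in a row.
toy = [
    ("A", 1, 50.0), ("A", 2, 1.0), ("A", 3, 1.5), ("A", 4, 0.9),
    ("B", 1, 20.0), ("B", 5, 35.0),
]
seqs = build_sequences(toy, window=3)
for card_id, amounts in seqs:
    print(card_id, amounts)
```

The second window for card "A" contains only small amounts in quick succession, which is the kind of pattern a single-transaction classifier cannot see.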
To overcome these challenges, integrating Generative Adversarial Networks (GANs) into the fraud detection framework offers a promising solution. GANs can generate realistic synthetic fraudulent transactions to augment the training data, addressing the imbalance issue directly [12]. By augmenting the dataset with synthetic examples, they can enhance the model’s ability to detect rare events without overfitting to the majority class. Furthermore, combining GANs with RNNs such as LSTM and GRU utilizes the strengths of both models, allowing the framework not only to generate realistic synthetic data but also to capture temporal dependencies in transaction patterns.
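The amount of augmentation needed follows from simple arithmetic. The sketch below computes how many synthetic fraud samples a generator would have to supply to reach a chosen fraud share in the training set. The function name, the counts, and the target ratios are hypothetical; the study does not state its target ratio in this section.

```python
import math


def synthetic_samples_needed(n_legit, n_fraud, target_fraud_ratio):
    """Synthetic fraud rows needed so that fraud makes up at least
    `target_fraud_ratio` of the augmented training set."""
    if not 0.0 < target_fraud_ratio < 1.0:
        raise ValueError("target_fraud_ratio must be strictly between 0 and 1")
    # Solve (n_fraud + s) / (n_legit + n_fraud + s) = r for s.
    required_fraud = target_fraud_ratio * n_legit / (1.0 - target_fraud_ratio)
    return max(0, math.ceil(required_fraud - n_fraud))


# Hypothetical training split: 9,980 legitimate and 20 fraudulent rows.
print(synthetic_samples_needed(9_980, 20, 0.5))  # fully balanced classes
print(synthetic_samples_needed(9_980, 20, 0.1))  # 10% fraud share
```

Full balancing is not the only option; a smaller target ratio reduces the risk that the classifier learns generator artifacts instead of genuine fraud patterns.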
GANs consist of a generator and a discriminator that compete in a zero-sum game. The generator creates synthetic data, while the discriminator attempts to distinguish between real and fake data [13,14]. GANs have achieved strong performance in a range of applications, including image generation, natural language processing, and anomaly detection [15,16,17]. Therefore, this study aims to employ the robustness of the GAN framework and deep learning (DL) architectures to enhance credit card fraud detection.
In the proposed credit card fraud detection approach, the GAN’s generator creates synthetic fraudulent transactions. The discriminator, implemented with one of several architectures (a Simple RNN, an LSTM, or a GRU), initially distinguishes between real and synthetic transactions. Through this adversarial training process, the discriminator learns to identify realistic patterns of fraudulent activity. After this initial phase, the discriminator is further fine-tuned to directly classify transactions as fraudulent or legitimate. This dual-phase training process has proven to be effective in different applications [18,19,20]. The rationale for using GANs in conjunction with RNNs is that while RNNs capture temporal dependencies, GANs help balance imbalanced datasets and enhance model generalization by generating synthetic fraudulent transactions.
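The two training phases differ mainly in what the discriminator's output label means. The sketch below shows one plausible way to assemble the labelled examples for each phase. It is an assumption-laden illustration, not the authors' pipeline: the function names are hypothetical, the strings stand in for transaction sequences, and both the use of real fraud as the "real" class in phase one and the treatment of synthetic fraud as positive examples in phase two are assumptions.

```python
def phase_one_batch(real_fraud, synthetic_fraud):
    """Adversarial phase: the discriminator separates real from synthetic.
    Label 1 = real transaction, label 0 = generator output."""
    return [(x, 1) for x in real_fraud] + [(x, 0) for x in synthetic_fraud]


def phase_two_batch(real_legit, real_fraud, synthetic_fraud):
    """Fine-tuning phase: same network, new target.
    Label 1 = fraudulent, label 0 = legitimate.
    Counting synthetic fraud as positives is an assumption of this sketch."""
    positives = [(x, 1) for x in list(real_fraud) + list(synthetic_fraud)]
    negatives = [(x, 0) for x in real_legit]
    return positives + negatives


# Placeholder strings stand in for transaction sequences.
real_legit = ["legit_seq_1", "legit_seq_2", "legit_seq_3"]
real_fraud = ["fraud_seq_1"]
synthetic_fraud = ["gen_seq_1", "gen_seq_2"]

phase_one = phase_one_batch(real_fraud, synthetic_fraud)
phase_two = phase_two_batch(real_legit, real_fraud, synthetic_fraud)
print(phase_one)
print(phase_two)
```

Note how the same generated sequence carries label 0 in the first phase (it is fake) and label 1 in the second (it represents fraud), which is why the discriminator's output layer has to be re-purposed during fine-tuning.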
The main contributions of this study are threefold:
First, the study proposes a hybrid DL framework that integrates GANs with RNNs to enhance credit card fraud detection. The proposed approach addresses the critical challenge of data imbalance by generating realistic synthetic fraudulent transactions, augmenting the training data and enabling more robust fraud detection.
Second, the study evaluates multiple RNN architectures, including Simple RNN, LSTM, and GRU, as discriminators within the GAN structure to capture the temporal dependencies inherent in transaction sequences.
Third, a performance evaluation is conducted on two widely used public datasets to demonstrate the robustness of the proposed framework.
The remainder of this paper is structured as follows:
Section 2 reviews related works.
Section 3 describes the datasets used in the study, the different DL architectures, and the training procedures of the proposed hybrid GAN-RNN models.
Section 4 presents the experimental results and discusses the implications of our findings, and
Section 5 concludes the paper with suggestions for future work.
2. Related Works
The field of credit card fraud detection has seen significant advancements with the advent of machine learning. Traditional ML techniques, such as decision trees, logistic regression, and support vector machines (SVMs), have been widely applied in predicting credit card fraud. However, they are often ineffective at capturing complex, non-linear patterns in transaction data [9,21]. Meanwhile, deep learning has revolutionized many fields due to its ability to model complex data distributions and uncover hidden patterns [22,23,24]. For instance, convolutional neural networks (CNNs) have been effectively used to capture spatial features in transaction data, while RNNs and their variants, such as LSTM networks and GRUs, are well suited to modeling temporal dependencies [25]. These approaches have demonstrated significant improvements in detection accuracy compared to traditional methods. However, deep learning models often require large amounts of balanced data for training, which is a common challenge in fraud detection since legitimate transactions far outnumber fraudulent transactions.
Recent advancements in deep learning have led to innovative architectures tailored for specific challenges in image and pattern recognition. For instance, Tao [26] and Tao et al. [27] presented the principles of enhanced deformable convolutions and channel-enhanced networks, respectively, and how they could be adapted to improve the feature extraction capabilities of neural networks used in fraud detection. These studies demonstrate how specialized neural network modifications can significantly enhance detection capabilities in complex environments. Additionally, these adaptations could potentially address the unique challenges of transaction data, which is often non-linear and temporally complex.
Furthermore, a GAN, introduced by Goodfellow et al. [28], has shown remarkable capabilities in generating realistic synthetic data. It consists of two neural networks: a generator, which creates synthetic samples, and a discriminator, which attempts to distinguish between real and synthetic samples. The adversarial training process enables the generator to produce highly realistic data, making GANs useful for tasks such as data augmentation and anomaly detection [17,29,30]. GANs address the problem of imbalanced datasets by generating synthetic fraudulent transactions, providing more training examples for fraud detection models. Recent studies have explored the application of GANs in fraud detection. For instance, Fiore et al. [31] utilized GANs to generate synthetic fraudulent transactions, demonstrating improved detection performance. Similarly, Chen et al. [32] proposed the use of InfoGANs for enhancing fraud detection by generating interpretable synthetic data.
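For reference, the adversarial game between the two networks is usually written as the minimax objective of the original GAN formulation [28]. Here $D$ is the discriminator, $G$ the generator, $p_{\text{data}}$ the distribution of real samples, and $p_z$ the noise prior fed to the generator. This is the standard textbook form; the loss actually optimized in a given fraud detection study may differ in detail.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

The discriminator maximizes $V$ by assigning high probability to real samples and low probability to generated ones, while the generator minimizes $V$ by producing samples that the discriminator scores as real.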
Ding et al. [33] proposed a robust credit card fraud detection system using a hybrid model combining GANs and variational autoencoders (VAEs). The authors generated synthetic fraudulent transactions with a GAN to address the issue of imbalanced datasets, while an autoencoder was employed to reconstruct transaction data and detect anomalies. Their model significantly improved detection performance, especially in terms of recall, compared to other baseline models. Similarly, Wu et al. [34] introduced a GAN-based approach for credit card fraud detection, where the GAN generates synthetic fraudulent transactions, and two autoencoders are used to detect the anomalies, enhancing the model’s ability to detect fraudulent patterns over time. Their results demonstrated that the GAN-autoencoder framework outperformed conventional methods.
Additionally, Banu et al. [35] developed a deep neural network architecture specifically designed for credit card fraud detection by integrating a convolutional neural network (CNN) and an LSTM network. The CNN was employed to capture spatial correlations within transaction features, while the LSTM was used to detect temporal patterns over sequences of transactions. This combination allowed the model to detect fraudulent behavior across both short and long time horizons. Their experimental results demonstrated that this hybrid approach achieved higher accuracy and area under the receiver operating characteristic curve (AUC) compared to baseline models like random forests and SVMs.
These studies highlight the potential of GANs to improve the robustness and accuracy of fraud detection systems by augmenting the training data with realistic synthetic examples. Furthermore, the combination of GANs and RNNs presents a promising approach to fraud detection by utilizing the strengths of both techniques. GANs can generate synthetic fraudulent transactions that mimic real-world patterns, while RNN models can effectively process and analyze complex transaction data. A few studies have explored this hybrid approach in other applications. For example, Gupta et al. [36] used the GAN-RNN architecture for video generation, while Yang et al. [37] utilized it for health data augmentation.
Therefore, in this study, a hybrid GAN-RNN framework is proposed for effective credit card fraud detection. The framework explores multiple RNN architectures (Simple RNN, LSTM, and GRU) as discriminators within the GAN framework. This study aims to identify the most effective combination for enhancing fraud detection performance by systematically evaluating these architectures.
4. Results and Discussion
This section presents the experimental results and discussion. In the experimental setup of the proposed hybrid GAN-RNN framework, the data are split into training, validation, and test sets using a stratified k-fold cross-validation approach, which is suitable for handling imbalanced datasets. This method ensures that each fold preserves the same proportion of fraudulent and legitimate transactions as the full dataset, providing a consistent class representation across all splits. Specifically, the data are divided into k folds, where k − 1 folds are used for training, and the remaining fold is used for validation. During the training phase, the GAN’s generator and discriminator are trained iteratively in an adversarial manner using the k − 1 training folds.
Once the adversarial training is complete, the discriminator is fine-tuned to classify transactions directly as fraudulent or legitimate using the same k − 1 folds. This process is repeated k times, with each fold being used as the validation set once. The final model evaluation is performed on an independent test set, which is not used during the training and validation phases. This approach aims to mitigate overfitting and to ensure that the model generalizes well to unseen data. The parameters of the different models used in this study are shown in Table 4. These parameters were chosen because they are commonly used default values in several ML and DL studies and are known for their stability and effectiveness in similar binary classification tasks, such as fraud detection [33,53].
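The splitting scheme described above can be sketched in a few lines of standard-library Python. This is a minimal illustration of stratified k-fold splitting, not the authors' code: the function name, the seed, the value k = 5, and the toy label list are assumptions, and the independent test set is assumed to have been held out before this function is called.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs in which every fold keeps roughly
    the class ratio of the full label list."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the k folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


# Hypothetical labels: 10 fraud (1) and 90 legitimate (0) transactions.
labels = [1] * 10 + [0] * 90
splits = list(stratified_k_fold(labels, k=5))
for train_idx, val_idx in splits:
    n_fraud = sum(labels[i] for i in val_idx)
    print(f"train={len(train_idx)} val={len(val_idx)} fraud_in_val={n_fraud}")
```

In each iteration, both the adversarial phase and the fine-tuning phase would use `train_idx`, while `val_idx` is reserved for validation, so that every sample serves as validation data exactly once.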
4.1. Model Performance Using European Credit Card Dataset
The performance of various models using the European credit card dataset is evaluated and summarized in Table 5 and Figure 3. The models tested include GAN combined with LSTM, GRU, and Simple RNN, as well as standalone LSTM, GRU, and Simple RNN models. The GAN-GRU model achieved the highest performance among all models, with a sensitivity of 0.992, specificity of 1.000, precision of 1.000, and an F-measure of 0.996. Its robustness is further supported by an AUC of 0.997, the highest among the tested models, as shown in the ROC curves. These results indicate that the GAN-GRU combination is highly effective in distinguishing between fraudulent and legitimate transactions.
The GAN-LSTM model also performed exceptionally well, with a sensitivity of 0.990, specificity of 0.995, precision of 0.979, an F-measure of 0.984, and AUC of 0.992, demonstrating the effectiveness of combining GAN with LSTM for this application. Meanwhile, the GAN-Simple RNN model, although slightly less effective, still achieved impressive results with a sensitivity of 0.960, specificity of 0.991, precision of 0.962, and an F-measure of 0.961, along with an AUC of 0.987. In contrast, the standalone models (LSTM, GRU, and Simple RNN) showed lower performance. The LSTM model achieved a sensitivity of 0.901, specificity of 0.990, precision of 0.912, and an F-measure of 0.906, with an AUC of 0.971. The GRU model followed closely with a sensitivity of 0.897, specificity of 0.990, precision of 0.850, an F-measure of 0.873, and an AUC of 0.975. The Simple RNN model had the lowest performance, with a sensitivity of 0.884, specificity of 0.989, precision of 0.870, an F-measure of 0.877, and an AUC of 0.955.
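The metrics reported above are related through the confusion matrix. The sketch below uses the standard definitions of sensitivity, specificity, precision, and F-measure, which are assumed here to match those used in the study. The confusion counts are hypothetical, whereas the final check reuses the GAN-GRU precision and sensitivity quoted in the text.

```python
def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def f_measure(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    return _ratio(2 * precision * sensitivity, precision + sensitivity)


def classification_metrics(tp, fp, tn, fn):
    """Compute the four threshold-based metrics from confusion counts."""
    sensitivity = _ratio(tp, tp + fn)  # share of fraud that is caught
    specificity = _ratio(tn, tn + fp)  # share of legitimate that is passed
    precision = _ratio(tp, tp + fp)    # share of alerts that are fraud
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure(precision, sensitivity),
    }


# Hypothetical confusion counts for illustration only.
metrics = classification_metrics(tp=90, fp=5, tn=895, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})

# Consistency check against the GAN-GRU figures reported in the text.
gan_gru_f = f_measure(precision=1.000, sensitivity=0.992)
print(round(gan_gru_f, 3))  # 0.996
```

Because the F-measure is a harmonic mean, it stays low whenever either precision or sensitivity is low, which makes it a stricter summary than accuracy on imbalanced data.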
Furthermore, the best-performing model (i.e., GAN-GRU) is compared with widely used ML models, including random forest, logistic regression, SVM, multi-layer perceptron (MLP), and extreme gradient boosting (XGBoost). The proposed GAN-GRU is benchmarked against these classifiers in Table 6.
The results in Table 6 demonstrate the superior performance of the proposed GAN-GRU model compared to widely used ML classifiers. The GAN-GRU model achieved the highest sensitivity, specificity, precision, and F-measure, significantly outperforming traditional models. Among the baseline models, XGBoost showed the best performance, with a sensitivity of 0.824, a specificity of 0.961, and an F-measure of 0.854. However, even XGBoost falls short when compared to GAN-GRU, particularly in terms of sensitivity and precision. Other models, such as random forest and MLP, performed reasonably well but struggled with lower sensitivity values, indicating difficulties in accurately detecting fraudulent transactions. Furthermore, SVM and decision tree showed lower performance in terms of sensitivity and F-measure, highlighting their limited ability to handle the imbalanced nature of the dataset. In contrast, the GAN-GRU model addressed the class imbalance effectively through the generation of synthetic fraudulent transactions, leading to its superior performance across all metrics.
4.2. Model Performance Using the Brazilian Dataset
The performance of the various models using the Brazilian dataset is summarized in Table 7 and illustrated by the ROC curves in Figure 4. The results consistently demonstrate the superior performance of the GAN-integrated models over their standalone counterparts. Among the tested models, GAN-LSTM obtained the best performance, achieving a sensitivity of 0.920, specificity of 0.965, precision of 0.988, and an F-measure of 0.953. Its AUC of 0.923 further confirms its effectiveness in accurately distinguishing fraudulent transactions from legitimate ones. Similarly, the GAN-GRU model also exhibited strong performance with a sensitivity of 0.903, specificity of 0.929, precision of 0.951, and an F-measure of 0.926, supported by an AUC of 0.913. The GAN-Simple RNN, while slightly behind, still achieved impressive results with a sensitivity of 0.898, specificity of 0.910, precision of 0.924, an F-measure of 0.911, and an AUC of 0.901.
In comparison, the standalone models demonstrated lower performance. The LSTM model recorded a sensitivity of 0.715, a specificity of 0.894, a precision of 0.881, an F-measure of 0.789, and an AUC of 0.819. The GRU model followed with a sensitivity of 0.690, specificity of 0.918, precision of 0.817, and an F-measure of 0.748, alongside an AUC of 0.877. The Simple RNN model scored the lowest in performance, with a sensitivity of 0.579, specificity of 0.864, precision of 0.703, an F-measure of 0.625, and an AUC of 0.791.
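Unlike the threshold-based metrics, the AUC summarizes ranking quality over all thresholds. It equals the probability that a randomly chosen fraudulent transaction receives a higher score than a randomly chosen legitimate one. The sketch below computes it directly from that definition. The scores are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than efficiency.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (fraud, legitimate) pairs in which the fraud
    case has the higher score; tied pairs count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores: higher means "more likely fraud".
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

In this toy example, five of the six fraud-legitimate pairs are ranked correctly, so a model can have a fairly high AUC while still missing fraud cases at a particular decision threshold.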
Furthermore, the performance of the proposed GAN-LSTM approach is benchmarked against baseline models in Table 8. The proposed GAN-LSTM model demonstrates superior performance, achieving the highest sensitivity, specificity, precision, and F-measure and outperforming all the baseline models. Among the baseline models, XGBoost achieved the best performance, with a sensitivity of 0.764, specificity of 0.899, and an F-measure of 0.833. While XGBoost outperforms other baselines in terms of precision, it still falls short compared to GAN-LSTM. Random forest also performed well, with a sensitivity of 0.709 and an F-measure of 0.790, but it could not match the overall performance of the GAN-LSTM. The substantial gap in sensitivity and F-measure between the GAN-LSTM and the baseline models demonstrates the effectiveness of combining GANs with RNN architectures. This combination allows the model to better handle imbalanced datasets and capture long-term temporal dependencies, both of which are crucial for detecting fraudulent transactions in highly complex credit card transaction datasets.
4.3. Performance Comparison with Other Scholarly Works
While the proposed approach achieved superior performance compared to the standalone DL and baseline models, it is necessary to compare its performance with state-of-the-art methods in the literature. A comparison is thus tabulated in Table 9, and it is based on studies that used the European dataset. It can be seen that the proposed GAN-GRU model outperformed several models presented in recent studies. The superior performance of our GAN-GRU model, as compared to these recent studies, validates the effectiveness of integrating GANs with the DL architectures to address the challenges of credit card fraud detection, particularly in handling class imbalance and improving model robustness.
4.4. Discussion
The proposed hybrid GAN-RNN framework demonstrated significant improvements in fraud detection performance across the two datasets. The results consistently show that integrating GANs with RNNs, particularly GRUs, enhances the model’s ability to distinguish between fraudulent and legitimate transactions. This can be attributed to the GAN’s capacity to generate realistic synthetic fraudulent transactions, which helps in balancing the training dataset and mitigating the class imbalance issues. The combination of GANs with RNN architectures leverages the strengths of both models. GANs excel in generating realistic data, while RNNs are proficient at capturing temporal dependencies in sequential data. This dual approach allows the model to learn complex patterns associated with fraudulent activities, which standalone models or traditional ML techniques might miss.
Furthermore, our findings align with and exceed the results reported in the existing literature. The superior performance of the GAN-GRU model on the European dataset, with its high sensitivity, specificity, and F-measure, highlights its practical applicability in real-world fraud detection scenarios. The GAN-GRU’s effectiveness on the European dataset can be attributed to its ability to capture short-term dependencies and effectively handle the subtle patterns in the transaction data, which are crucial for this dataset. Conversely, the GAN-LSTM model showed the best performance on the Brazilian dataset. LSTM networks are known for their capability to learn long-term dependencies, which might be more significant given the diverse and detailed features in the Brazilian dataset. Because transactions in the Brazilian dataset were recorded over a longer period, this dataset may benefit more from LSTM’s ability to retain information over extended sequences, which helps to identify fraudulent patterns that span longer periods.
5. Conclusions and Future Research Directions
This study proposed a hybrid DL framework that combines GANs with RNN architectures for credit card fraud detection. The experimental results on two datasets demonstrated the superior performance of the proposed approach, which outperformed other models using the European and Brazilian datasets. This highlights the robustness and effectiveness of the proposed approach in handling class imbalance and enhancing model performance. Our findings indicate that the integration of GANs with DL architectures significantly improves the ability to detect fraudulent transactions, providing a more reliable solution for credit card fraud detection. The dual-phase training process, where the GAN generates synthetic data and the RNN learns from it, proves to be highly effective.
Future research should focus on exploring other GAN configurations and incorporating additional DL models to further enhance the detection capabilities. Moreover, implementing and testing the proposed approach in real-world environments will be essential to validate its practical applicability and operational efficiency. In particular, addressing challenges related to computational efficiency and scalability will be crucial for deploying these models in large-scale, real-time fraud detection systems.