Enhancing Financial Fraud Detection through Addressing Class Imbalance Using Hybrid SMOTE-GAN Techniques

Abstract: The class imbalance problem in finance fraud datasets often leads to biased prediction towards the nonfraud class, resulting in poor performance in the fraud class. This study explores the effects of utilizing the Synthetic Minority Oversampling TEchnique (SMOTE), a Generative Adversarial Network (GAN), and their combinations to address the class imbalance issue. Their effectiveness was evaluated using a Feed-forward Neural Network (FNN), Convolutional Neural Network (CNN), and their hybrid (FNN+CNN). This study found that regardless of the data generation techniques applied, the classifier's hyperparameters can affect classification performance. The comparisons of various data generation techniques demonstrated the effectiveness of the hybrids of SMOTE and GAN, including SMOTified-GAN, SMOTE+GAN, and GANified-SMOTE, compared with SMOTE and GAN. The SMOTified-GAN and the proposed GANified-SMOTE were able to perform equally well across different amounts of generated fraud samples.


Introduction
The financial sector faces a significant challenge in the form of financial fraud, encompassing various forms of criminal deception aimed at securing financial gains, including activities like telecommunication fraud and credit card skimming. The proliferation of electronic payment technology has propelled online transactions into the mainstream, thereby amplifying the occurrence of fraudulent schemes. The prevalence of these fraudulent transactions has led to substantial losses for financial institutions. However, the large volume of daily transactions poses a challenge for humans attempting to identify fraud manually. Recently, deep learning techniques have been explored and have shown promising results in detecting financial fraud Alarfaj et al. (2022); Fang et al. (2021); Kim et al. (2019). Unfortunately, most real-world financial fraud datasets suffer from a severe class imbalance issue, where the proportion of fraud data is significantly lower than that of nonfraud. In binary classification, class imbalance often leads to biased predictions favoring the majority class Johnson and Khoshgoftaar (2019). Consequently, the classifier's performance on the minority class is compromised, especially when encountering dissimilar frauds. Overcoming this problem poses a significant challenge, as classifiers are expected to achieve high precision and recall in the fraud class.
To address this problem, several oversampling methods have been employed to generate minority samples. The Synthetic Minority Oversampling TEchnique (SMOTE) interpolates between the existing minority data to synthesize minority samples Chawla et al. (2002). Generative Adversarial Networks (GANs) comprise a discriminator that aims to differentiate between real and generated samples and a generator that strives to deceive the discriminator by synthesizing realistic samples Goodfellow et al. (2014). GANs have shown superior results compared with SMOTE Fiore et al. (2019). However, SMOTE may cause overgeneralization issues, and the GAN, primarily designed for image generation, is not ideal for handling the class imbalance problem. To overcome these limitations, SMOTified-GAN employs SMOTE-generated samples instead of random noise as input to the GAN Sharma et al. (2022).
In addition to the aforementioned data generation techniques, other hybrids of SMOTE and GAN are worth exploring. This study presents the following contributions:
1. Introducing two data generation techniques, SMOTE+GAN and GANified-SMOTE, designed to effectively address the class imbalance issue in finance fraud detection.
2. Conducting a comprehensive comparison between the proposed oversampling methods and existing data generation techniques, utilizing precision, recall, and F1-score as key performance metrics.
3. Evaluating the performance of the data generation techniques across various neural network architectures, including a Feed-forward Neural Network (FNN), Convolutional Neural Network (CNN), and the proposed hybrid FNN+CNN.
4. Analyzing the impact of training classifiers on different proportions of the generated minority samples.

Related Work
The task of detecting financial fraud can be approached as a binary classification challenge, where classifiers examine the patterns within fraudulent and legitimate transactions to classify new transactions accurately. Consequently, it is crucial to possess an ample and diverse dataset to enable classifiers to grasp the inherent patterns of both transaction categories. Addressing the issue of inadequate fraudulent samples in the training dataset, various methodologies have been introduced to create artificial fraud instances and supplement the original data. These techniques include SMOTE, GAN, and SMOTified-GAN. SMOTE Chawla et al. (2002) has been widely applied to imbalanced training datasets. More than 85 SMOTE variations had been proposed by 2018, including SMOTE+TomekLinks, SMOTE+ENN, Borderline-SMOTE, and Adaptive Synthetic Fernández et al. (2018). Recent studies proposed Radius-SMOTE Pradipta et al. (2021), which prevents overlap among generated samples, and Reduced-Noise SMOTE Arafa et al. (2022), which removes noise after oversampling. In financial fraud detection, SMOTE and its variations have been widely utilized to resample highly imbalanced datasets before training models such as AdaBoost Ileberi et al. (2021) and an FNN Fang et al. (2021). Beyond the finance domain, SMOTE and its variations have found extensive application in other fields dealing with highly imbalanced datasets. In bio-informatics, SMOTE has been used to discriminate Golgi proteins Tahir et al. (2020) and predict binding hot spots in protein-RNA interactions Zhou et al. (2022). In medical diagnosis, SMOTE and its variations have been employed for diagnosing cervical cancer Abdoh et al. (2018) and prostate cancer Abraham and Nair (2018). SMOTE has also been used to predict diabetes Mirza et al. (2018) and heart failure patients' survival Ishaq et al. (2021).
GANs Goodfellow et al. (2014) and their variations have more recently been employed for generating minority samples to tackle the class imbalance problem. Douzas and Bacao (2018) utilized a conditional GAN (cGAN), which can recover the distribution of the training data, to generate minority samples. To address the mode collapse issue, Balancing GAN was proposed to generate more diverse and higher-quality minority images Mariani et al. (2018). However, in this technique, the generator and discriminator cannot simultaneously reach their optimal states, leading to the development of IDA-GAN Yang and Zhou (2021). In financial fraud detection, GANs have been employed to generate fraud samples for imbalanced datasets before training classifiers, such as AdaBoost-Decision Tree Mo et al. (2019) and an FNN Fiore et al. (2019). These studies have reported that the GAN achieves higher AUC, accuracy, and precision compared with SMOTE. Interestingly, Fiore et al. (2019) found that the best performance was achieved when twice as many GAN-generated fraud samples as the original fraud data were added to the training dataset. In other finance-related domains, GANs have been utilized to address class imbalance in money laundering detection in gambling Charitou et al. (2021). GANs and their variations have also been used extensively for high-dimensional imbalanced datasets, such as images Mariani et al. (2018); Scott and Plested (2019) and biomedical data Zhang et al. (2018). Recent studies have successfully applied GANs and their variations to generate minority samples in bio-informatics Lan et al. (2020).
Despite the notable accomplishments of SMOTE and GANs, these methods have certain limitations. SMOTE may introduce noise that leads to overgeneralization Bunkhumpornpat et al. (2009). While GANs can generate more "realistic" data, they may not be ideal for handling imbalanced data, as they were originally designed for generating images from random noise. Additionally, there may be insufficient real minority data available for training the GAN Mariani et al. (2018). To address these limitations, Sharma et al. (2022) proposed SMOTified-GAN, which employs SMOTE-generated samples as input for the GAN instead of random numbers, resulting in improved performance compared with SMOTE and GAN.
In early studies, financial fraud detection systems predominantly depended on rule-based methodologies, wherein human expertise in fraud was translated into rules to anticipate fraudulent activities Zhu et al. (2021). However, the evolving behaviors of fraudsters and the increasing size of transaction datasets have posed challenges in identifying fraud-related rules manually. As a result, research has shifted towards machine learning methods, such as naive Bayes, logistic regression, support vector machines, random forests, and decision trees (Ileberi et al. 2021; Ye et al. 2019; Zhu et al. 2021), which can "learn" fraud and nonfraud patterns from given datasets. Nonetheless, machine learning techniques require extensive data preprocessing before training the classifier Alarfaj et al. (2022); Kim et al. (2019); Zhu et al. (2021).
In recent years, deep learning has gained popularity in financial fraud detection due to its superior performance compared with traditional machine learning approaches Alarfaj et al. (2022); Fang et al. (2021); Jurgovsky et al. (2018); Kim et al. (2019). Some studies have approached financial fraud detection as a sequence classification problem, considering the temporal sequence of transactions as a crucial factor. Sequential models, such as Gated Recurrent Units Branco et al. (2020), Long Short-Term Memory (LSTM) Jurgovsky et al. (2018), and Time-aware Attention-based Interactive LSTM Xie et al. (2022), have been proposed. However, since most available financial fraud datasets lack time-sequence information, sequential models may not be suitable in such cases. Due to the vector format of finance fraud datasets without time-sequence information, FNNs are considered a suitable choice Fang et al. (2021); Fiore et al. (2019); Kim et al. (2019). Although initially designed for image processing and classification, CNNs have also been found effective in financial fraud detection Alarfaj et al. (2022); Chen and Lai (2021); Zhang et al. (2018). Their 1D convolution layers can extract patterns within smaller segments of a transaction vector.
Building on Fiore et al. (2019)'s findings, this study aimed to assess the performance of a model using varying amounts of minority samples in the training dataset. To achieve this, the study explores the use of SMOTE, GAN, SMOTified-GAN, and other variants of hybrid SMOTE and GAN. Consequently, a combination of SMOTE- and GAN-generated minority samples, along with GANified-SMOTE, was proposed to fulfill the research aims. Finally, FNN, CNN, and FNN+CNN models were employed to ensure a fair evaluation of the performances of the different data generation techniques.

Data Preprocessing
The experiment utilized the Kaggle (2018) credit card fraud dataset, consisting of 284,807 transactions conducted by European credit card holders over two days in September 2013. This dataset comprises 31 numerical features, including Time, Amount, Class, and 28 other unnamed features. The 'Time' feature represents the elapsed time in seconds since the first transaction, while the 'Amount' feature denotes the transaction amount. The 'Class' label indicates fraudulence using binary values, where 1 and 0 represent fraud and nonfraud, respectively. Notably, only 492 transactions (0.172%) are classified as fraudulent, resulting in a highly imbalanced distribution.
To facilitate gradient descent convergence and mitigate bias towards features with larger magnitudes, all features except the 'Class' label were rescaled to the range [0, 1] while maintaining the shape of each feature's original distribution. A value X in a given feature was transformed into a new value X′ using Equation (1), X′ = (X − X_min)/(X_max − X_min), where X_min and X_max represent the minimum and maximum values of the feature, respectively.
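A minimal NumPy sketch of the rescaling in Equation (1) follows; the function name and toy data are illustrative.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each feature column of X to [0, 1] (Equation (1)),
    preserving the shape of each feature's distribution."""
    X = np.asarray(X, dtype=float)
    X_min = X.min(axis=0)
    X_max = X.max(axis=0)
    # Guard against constant features, where X_max == X_min.
    span = np.where(X_max > X_min, X_max - X_min, 1.0)
    return (X - X_min) / span

# Example: scale a toy 'Amount'-like feature column.
scaled = min_max_scale([[10.0], [55.0], [100.0]])
```

Because the transform is linear per feature, relative distances within a feature are preserved, which is what keeps the original distribution's shape intact.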
Subsequently, the dataset was divided into a training set comprising 80% of the data (227,451 nonfraud and 394 fraud) and a testing set comprising the remaining 20% (56,864 nonfraud and 98 fraud).

Data Generation Methods
To address the issue of class imbalance, this study explored five data generation techniques: SMOTE, GAN, and three hybrids of the two (SMOTified-GAN, SMOTE+GAN, and GANified-SMOTE).

SMOTE
SMOTE creates synthetic minority samples rather than duplicating existing ones, thereby avoiding overfitting. For a specific minority data point x represented as a vector, a vector x_k is randomly chosen from its k nearest neighbors to generate a new sample x′ using Equation (2), x′ = x + λ(x_k − x), where λ is drawn uniformly from [0, 1]. In this study, the 394 instances of fraudulent data from the training dataset were utilized with the SMOTE technique, employing five nearest neighbors, to generate additional fraud samples, as depicted in Figure 1.
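The interpolation in Equation (2) can be sketched in NumPy as follows; the function name and toy data are illustrative, and a production system would typically use an established implementation such as imbalanced-learn's SMOTE.

```python
import numpy as np

def smote_oversample(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples via Equation (2):
    x_new = x + lam * (x_k - x), with lam drawn uniformly from [0, 1)."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        x = minority[rng.integers(len(minority))]
        # Indices of the k nearest neighbours of x (excluding x itself).
        dist = np.linalg.norm(minority - x, axis=1)
        neighbours = np.argsort(dist)[1:k + 1]
        x_k = minority[rng.choice(neighbours)]
        lam = rng.random()
        synthetic.append(x + lam * (x_k - x))
    return np.array(synthetic)

# Toy example: 6 'fraud' points in 2-D, oversampled to 10 synthetic points.
fraud = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [0.2, 0.8]]
new_samples = smote_oversample(fraud, n_new=10, k=3)
```

Each synthetic point lies on the segment between a minority point and one of its neighbors, which is why SMOTE never generates samples outside the convex hull of the minority class.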
Figure 1. SMOTE as employed in this study, utilizing five nearest neighbors for random interpolation to generate minority samples.

GAN
A GAN comprises a generator G and a discriminator D that engage in a competitive training process to improve their respective objectives. The discriminator aims to correctly classify real samples x and fake samples G(z), where z represents the random noise, or latent space input, fed to the G. The D's predictions for real and generated samples are denoted as D(x) and D(G(z)), respectively. By considering real samples with a label of 1 and generated samples with a label of 0, the D's loss function is defined in Equation (3), where E calculates the error, or distance, between the D's prediction and the true label. The G's objective is to generate realistic fake samples from random noise that deceive the D into misclassifying them. The G's general loss function, defined in Equation (4), allows it to improve the quality of the generated samples based on the feedback received from the D's classification. As the G and D continue enhancing their performance, the quality of the generated minority samples improves.
The proposed GAN architecture, as shown in Figure 2, consists of a 5-layer FNN G with respective neuron counts of 100, 256, 128, 64, and 30. The G takes as input 100 random noise values sampled from a normal distribution. The LeakyReLU activation function (Equation (5)) is used in all hidden layers, and dropout layers with a dropout rate of 0.2 are added after each hidden layer to mitigate overfitting. The output layer employs the sigmoid activation function (Equation (6)) to produce values between 0 and 1. Similarly, the D is a 5-layer FNN with identical activation functions and dropout layers; however, the neuron counts are 30, 128, 64, 32, and 1. The D employs a stochastic gradient descent (SGD) optimizer with a learning rate of 0.05. The loss function depicted in Figure 2 is binary cross-entropy (Equation (7)), as the D's task involves binary classification. The GAN network also employs an optimizer with the same learning rate as the D, but its loss function utilizes the mean squared error metric (Equation (8), where y_i is the true label and ŷ_i is the predicted class) as feedback for the G.
The fraud data from the training dataset were utilized to train the D, enabling the network to recognize patterns in real fraud data and generate realistic fraud samples. Since there were only 394 fraudulent data points available for training, the batch size was reduced to 32. The number of training epochs was set to 1000 to allow sufficient time for the G and D to improve their performance. Following training, the G is employed to generate fraud samples according to the required number of minority samples.
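The two loss functions used as training feedback, binary cross-entropy for the D (Equation (7)) and mean squared error for the G (Equation (8)), can be sketched in NumPy as follows (function names are illustrative; a deep learning framework's built-in losses would be used in practice):

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy (Equation (7)), the discriminator's loss."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def mse(y_true, y_pred):
    """Mean squared error (Equation (8)), used as feedback for the G."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

# Discriminator loss over a batch of real (label 1) and generated (label 0) samples:
d_real = bce(np.ones(4), np.array([0.9, 0.8, 0.95, 0.7]))
d_fake = bce(np.zeros(4), np.array([0.1, 0.2, 0.05, 0.3]))
d_loss = d_real + d_fake
# Generator feedback: it wants D(G(z)) pushed towards the 'real' label 1.
g_loss = mse(np.ones(4), np.array([0.1, 0.2, 0.05, 0.3]))
```

Note the asymmetry described above: the D is penalized for misclassifying either batch, while the G is penalized only for how far D's verdict on its samples falls from 1.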

SMOTified-GAN
A GAN can learn patterns from minority data, resulting in more authentic minority samples. However, using random noise as input for the GAN's G can be seen as generating samples from scratch, making it more challenging to train the G to produce high-quality samples. By utilizing SMOTE-generated samples as input, the generation process becomes simpler, as the G begins with pre-existing fraud samples (Sharma et al. 2022). In the proposed approach, SMOTE was applied with five nearest neighbors to generate double the number of fraud samples. Figure 3 illustrates that the 788 SMOTE-generated samples were used as input for the GAN's G. The hyperparameters of the GAN in the SMOTified-GAN model remained the same as those of the regular GAN, except for the number of neurons in the input layer of the G, which was adjusted to 30 to match the 30 features present in the SMOTE-generated fraud samples.

SMOTE+GAN
To address the limitations of SMOTE and GAN, a hybrid approach was proposed and employed to enhance the ratio of fraudulent data in the training dataset. The SMOTE-generated and GAN-generated fraud samples were directly combined with the original training dataset without any alterations, as depicted in Figure 4. The combined dataset comprised an equal contribution from both the SMOTE- and GAN-generated samples, each amounting to half of the total required generated data.
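Assuming two pre-generated pools of synthetic fraud samples, the SMOTE+GAN combination step can be sketched as follows (function and argument names are illustrative):

```python
import numpy as np

def smote_plus_gan(smote_pool, gan_pool, n_required, seed=0):
    """Hybrid SMOTE+GAN sketch: draw half of the required synthetic
    fraud samples from each pre-generated pool and stack them
    unaltered, as in Figure 4."""
    rng = np.random.default_rng(seed)
    half = n_required // 2
    smote_pool = np.asarray(smote_pool, dtype=float)
    gan_pool = np.asarray(gan_pool, dtype=float)
    part_smote = smote_pool[rng.choice(len(smote_pool), size=half, replace=False)]
    part_gan = gan_pool[rng.choice(len(gan_pool), size=n_required - half, replace=False)]
    return np.vstack([part_smote, part_gan])

# e.g. 788 required samples: 394 from the SMOTE pool and 394 from the GAN pool
# (zeros/ones stand in for real 30-feature synthetic samples).
combined = smote_plus_gan(np.zeros((1000, 30)), np.ones((1000, 30)), 788)
```

The combined array would then be appended to the original training set before fitting the classifier.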

GANified-SMOTE
Another hybrid method, GANified-SMOTE, was implemented. The random interpolation makes SMOTE-generated samples susceptible to the noise present in the dataset. Consequently, the generated minority samples may be located near the boundary of the majority class, leading to higher misclassification rates. Conversely, the GAN can learn the underlying patterns of the minority class, reducing the impact of such noise. By utilizing GAN-generated data for the SMOTE interpolations, the limitations of SMOTE can be overcome. Additionally, applying SMOTE to the GAN-generated data can decrease reliance on the prominent patterns of the minority class, thereby mitigating overfitting. Figure 5 illustrates the utilization of fraud samples generated by the GAN, which are then processed with SMOTE to generate the necessary number of fraud samples. The resulting output from SMOTE is combined with the original training dataset, which contained 394 authentic fraud data points.
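A sketch of the GANified-SMOTE ordering, assuming the GAN has already been trained and its output is available as an array (names and the random stand-in for the generator output are illustrative); the interpolation mirrors Equation (2) but operates on GAN-generated rather than original fraud samples:

```python
import numpy as np

def ganified_smote(gan_samples, n_required, k=5, seed=0):
    """GANified-SMOTE sketch: apply SMOTE-style interpolation
    (Equation (2)) to GAN-generated fraud samples, producing
    n_required synthetic samples for the training set."""
    rng = np.random.default_rng(seed)
    base = np.asarray(gan_samples, dtype=float)
    out = []
    for _ in range(n_required):
        x = base[rng.integers(len(base))]
        dist = np.linalg.norm(base - x, axis=1)
        x_k = base[rng.choice(np.argsort(dist)[1:k + 1])]  # one of k neighbours
        out.append(x + rng.random() * (x_k - x))
    return np.array(out)

# Stand-in for trained generator output: 50 GAN-generated 'fraud' points.
gan_out = np.random.default_rng(1).random((50, 30))
extra_fraud = ganified_smote(gan_out, n_required=200)
```

The key design choice is the ordering: SMOTified-GAN runs SMOTE first and refines with the GAN, whereas GANified-SMOTE samples from the GAN first and then interpolates, so the noise of the original data never enters the interpolation step directly.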

Summary of Data Generation Methods
Table 1 presents an overview of the types and quantities of fraud data utilized in each data generation method. Two experiments were conducted for each method to assess the impact of varying amounts of generated data in the training dataset. In the first experiment (Test A), the training dataset was adjusted to achieve a balanced distribution of 50% fraud and 50% nonfraud samples. In the second experiment (Test B), only 788 fraud samples were generated, twice the number of the original fraud data in the training dataset. This choice was based on the finding (Fiore et al. 2019) that injecting twice as many GAN-generated fraud samples as the original fraud data produced the optimal outcome.
For both experiments, fraud samples were generated using the five data generation techniques after splitting the complete dataset into training and testing sets. The testing dataset was not utilized for data generation to ensure that the validation conducted using these unseen data reflects the model's performance when applied to real-world financial fraud detection systems, as these systems encounter unseen data.

FNN
A preliminary investigation was conducted to assess several hyperparameter configurations of the FNN along with SMOTE-generated samples to tackle the problem of class imbalance. The two most effective models were selected as classifiers to evaluate all the data generation techniques. Table 2 contains the hyperparameters used for these models. Both models employed the Rectified Linear Unit (ReLU) activation function for their hidden layers. To counter overfitting, dropout layers with a dropout rate of 0.1 were inserted after each hidden layer. The output layer utilized the sigmoid activation function to ensure that the output probabilities fall within the range of 0 to 1, representing the likelihood of a transaction being fraudulent. For the loss function, binary cross-entropy was employed. Due to the substantial size of the training dataset, a batch size of 128 was chosen, and the training process was executed over 100 epochs, allowing for multiple iterations to refine the model.

CNN
Similarly to the FNN, various hyperparameter configurations of the CNN were tested, and the two best-performing models were selected for further investigation. The hyperparameters for these models are presented in Table 2. Both models began with an input layer of dimensions (30, 1). Subsequently, a 1D convolutional layer and a max-pooling layer were incorporated, followed by a flattening layer and a dense layer consisting of 50 neurons utilizing the ReLU activation function, along with a dropout layer featuring a dropout rate of 0.1. The output layer consisted of a single neuron activated by the sigmoid function. The kernel size and pool size for both models were set to 3 and 2, respectively. The initial findings indicated that the CNN models reached a stable loss and accuracy after the 50th epoch. Consequently, the number of training epochs was set to 50, providing sufficient time to refine the models and observe their performance.

FNN+CNN
The FNN and CNN models tend to misclassify different nonfraudulent transactions, while both demonstrate an ability to identify the same fraudulent transactions. Consequently, this study integrated the two models to enhance the final prediction, aiming to reduce the false-positive rate within the fraud class. By leveraging the strengths of both models and combining their insights, it was anticipated that the integrated approach would yield improved accuracy and more reliable identification of fraudulent transactions.

Method I:
The final prediction is classified as fraudulent only if both the FNN and CNN predict the transaction as fraudulent. The detailed processes and decision steps of this method are depicted in Figure 6a. Both the FNN and CNN output a probability of a transaction being fraudulent, where a value greater than 0.5 is considered fraudulent. Hence, the integrated model predicts a transaction as fraudulent only if both models' outputs surpass 0.5. Intuitively, it is improbable for a nonfraudulent transaction to be classified as fraudulent by both models, given their tendency to learn distinct patterns associated with fraud and nonfraud. This integration reinforces the reliability of the fraud prediction, since it must satisfy both conditions.

Method II:
The initial study demonstrated that the first method successfully enhanced the precision of fraud detection but resulted in a decrease in recall. Therefore, another method was proposed to increase recall while maintaining high precision. In certain instances, one of the models produces a value close to 1, indicating a high probability of the transaction being fraudulent, while the other model generates a value below but close to 0.5. According to the first method, these transactions would be predicted as nonfraudulent. However, intuitively, such transactions are more likely to be fraud, since one of the models strongly indicates fraud. To address this scenario, the sum of the output values from both models is used to make the final prediction: if the sum exceeds a selected threshold, the prediction is fraudulent. The detailed processes and decision steps of this method are depicted in Figure 6b.
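Both decision rules can be sketched as follows (the threshold value 1.2 is illustrative; the study tested values from 1.1 to 1.9):

```python
import numpy as np

def method_one(p_fnn, p_cnn):
    """Method I: fraud only when BOTH models output a probability > 0.5."""
    return (np.asarray(p_fnn) > 0.5) & (np.asarray(p_cnn) > 0.5)

def method_two(p_fnn, p_cnn, threshold=1.2):
    """Method II: fraud when the summed outputs exceed a threshold."""
    return (np.asarray(p_fnn) + np.asarray(p_cnn)) > threshold

# First case: one model near 1 and the other just below 0.5, the scenario
# Method II is designed to catch; Method I would call it nonfraud.
p_fnn = np.array([0.95, 0.9, 0.3])
p_cnn = np.array([0.4, 0.6, 0.2])
```

Here Method I flags only the second transaction, while Method II also flags the first, illustrating how the summed rule recovers recall that the conjunction rule gives up.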

Deep Learning Models with SMOTE-Generated Data
In this study, two FNN and two CNN models were developed to determine the optimal configuration, and their specific hyperparameters are outlined in Table 2. The training process took place on a machine equipped with an 11th Gen Intel Core i7-11375H CPU, 16 GB RAM, Intel Iris Xe Graphics, and an NVIDIA GeForce RTX 3060 Laptop GPU. The training dataset consisted of an equal distribution of fraud and nonfraud data, with the fraud samples generated using the SMOTE technique (refer to Section 3.2.1). To evaluate their performance, these variations were tested on the testing dataset using various metrics, including training accuracy, loss, and time, as well as testing precision (PR), recall (RC), F1-score (F1), and root mean squared error (RMSE).
In Table 3, all the top-performing models achieved impeccable precision, recall, and F1-score (PR = RC = F1 = 1.00) for the nonfraud class, and their recalls in the fraud class were satisfactory (RC ≥ 0.85). However, their precision and F1-score in the fraud class did not meet the desired criteria. FNN1, which utilized a lower learning rate, demonstrated a higher precision and F1-score compared with FNN2. Similarly, CNN2, with fewer filters, exhibited a higher recall and F1-score compared with CNN1, albeit with a slightly lower precision. Consequently, FNN2 and CNN2 were selected as the top-performing models for integration (FNN+CNN), with selection based on comparing recall first, followed by precision. It is worth noting that the FNN models using the SGD optimizer yielded superior results compared with those employing Adam, while the opposite was observed for the CNN. Additionally, the CNN's training time was longer than that of the FNN despite the fewer epochs utilized.
Table 3. The results of the two top-performing models within the FNN and CNN variations. The selection of the best model was based on comparing their recalls first, followed by their precision.

This research employed the four selected FNNs and CNNs to evaluate the impact and performance of the generated data. The results can be found in Table 4. When employing the identical data generation method and incorporating an equal quantity of fraudulent data, both versions of the FNN and CNN yielded comparable outcomes. This demonstrates that the outcomes derived from a data generation method are not significantly influenced by the classifier's parameters. The FNN generally yielded a higher recall compared with the CNN, albeit with a lower precision. In Test A, both SMOTE and SMOTE+GAN exhibited significantly lower precision and F1-scores across all models, despite demonstrating a high recall. However, substantial improvement was observed in Test B for these two methods, where synthetic fraud samples were injected at twice the number of the original records. Additionally, SMOTE+GAN achieved the highest F1-score in three out of four models during Test B.
The proposed GANified-SMOTE generally yielded slightly higher precision and F1-scores than the GAN, despite having a lower recall. This can be attributed to the GAN's ability to capture the original fraud data's characteristics, resulting in the GAN-generated fraud samples being clustered in regions with a high concentration of the original fraud data. Applying SMOTE to the GAN-generated samples generates additional fraud samples in between them, potentially causing a 'blurring' effect on the fraud data's features. This could explain the generally lower recall of GANified-SMOTE in comparison with the GAN. As a trade-off, GANified-SMOTE achieves a lower false-positive rate, leading to higher precision compared with the GAN. When compared with SMOTified-GAN, the proposed GANified-SMOTE generally demonstrated slightly lower precision and F1-scores, while maintaining a similar recall. SMOTified-GAN generates fraud samples using SMOTE-generated samples as input, which can result in the production of more realistic and diverse samples. The SMOTified-GAN-generated samples tended to be more centrally distributed within fraud areas and less distributed within nonfraud areas, which could explain the higher precision and F1-score observed.

FNN+CNN
The top-performing models from the FNN and CNN were combined to create two distinct hybrid FNN+CNN methods, as illustrated in Figure 6. The results of the FNN+CNN approach for Methods I and II are depicted in Table 5. Method I exhibited improved precision for detecting fraudulent cases compared with using the FNN or CNN models alone. In Test A, SMOTE and SMOTE+GAN showed significant improvements in precision, despite a slight decrease in recall, particularly when compared with the FNN model. This decline can be attributed to the fact that fraud predictions must meet two distinct conditions, resulting in a reduced number of predicted fraud cases. Consequently, the fraud class's precision increases, since precision is determined by the ratio of true fraud cases to predicted fraud cases. However, this trade-off leads to a decrease in the fraud class's recall. Nevertheless, the overall F1-score exhibited a slight increase compared with the individual FNN and CNN models. In Method II, different thresholds ranging from 1.1 to 1.9 were tested to determine the optimal threshold value. Since Method II's goal was to enhance recall, the best threshold value was determined based on recall.
Overall, Method II yielded better results than Method I. The findings demonstrated that the proposed hybrid FNN+CNN approach in Method II outperformed the FNN and CNN models individually. Similar to the observations on the FNN and CNN, injecting twice the number of fraud samples as the original fraud data using SMOTE and SMOTE+GAN yielded better performance than a 50:50 distribution of fraud and nonfraud samples. The performance of GAN, SMOTified-GAN, and GANified-SMOTE was not significantly affected by the number of injected fraud samples. The proposed GANified-SMOTE technique achieved the highest precision for both integration methods and also exhibited a high F1-score and recall. This may be attributed to the individual FNN and CNN variants used in the hybrid model performing well with GANified-SMOTE. Moreover, GANified-SMOTE's performance was similar across the FNN and CNN variations. Therefore, it can be concluded that the proposed GANified-SMOTE can achieve high performance regardless of the number of injected fraud samples.

Comparison with Existing Studies
To evaluate the proposed methodologies, a comparison was made with previous studies that utilized the same dataset, as presented in Table 6. Given the trade-off between precision and recall, attaining flawless outcomes remains elusive for both existing and proposed models. The outcomes observed by Fiore et al. (2019) and Sharma et al. (2022) upon applying SMOTE, GAN, and SMOTified-GAN exhibited relatively modest recalls (below 0.80), indicating limited detection of fraudulent transactions. Consequently, their F1-scores generally trailed behind those of the proposed methods. This illustrates the proficiency of the proposed models in effectively identifying fraudulent transactions while upholding a minimal misclassification rate for nonfraudulent data.
All the implemented techniques exhibited higher recall rates compared with the existing studies. However, this improvement came at the expense of lower precision when compared with previous research. One potential explanation for this discrepancy could be the differences in the classifiers utilized. Previous studies Fiore et al. (2019); Sharma et al. (2022) employed classifiers with fewer than four layers, whereas the proposed classifier consisted of at least four. Consequently, the enhanced classifier was able to better learn the distinguishing characteristics of fraudulent data, improving the identification of such instances. However, this also led to an increased misclassification of nonfraudulent data.
Another factor that could have influenced the outcomes is the stochastic nature of the data generation methods (SMOTE and the GAN) and the deep learning models (the FNN). Despite the lower precision, the F1-scores of the proposed methodologies surpassed those of the previous studies, except for SMOTE on Test A. This observation highlights the significant impact of classifier parameters on performance, irrespective of the data generation methods employed. It also aligns with the previous findings, indicating an overall enhancement in the F1-score when utilizing a hybrid of the FNN and CNN, regardless of the specific data generation methods employed. Nonetheless, data generation methods can still affect the performance of the same classifier in different variations. The results from Fiore et al. (2019) and Sharma et al. (2022) demonstrated an increase in recall and F1-score when employing the GAN as opposed to SMOTE. However, in this study, the implemented GAN did not improve recall but only enhanced the F1-score on Test A. The challenges in training the GAN may have resulted in a lower quality of generated fraudulent samples. Conversely, SMOTE's random interpolation may not effectively capture the distinguishing characteristics of fraud instances. Therefore, combining SMOTE and GAN in a hybrid approach could result in the two complementing each other and a better representation of the fraudulent data. The proposed SMOTE+GAN demonstrated a slight improvement in recall on Test A compared with SMOTE. Additionally, the implemented SMOTified-GAN and the proposed GANified-SMOTE successfully improved the F1-score.

Conclusions
The present study introduces the SMOTE+GAN and GANified-SMOTE techniques as innovative solutions to class imbalance, offering financial institutions an effective tool for reducing losses due to fraudulent activities. Additionally, the integration of FNN and CNN for predicting transaction categories is proposed. The effectiveness of the newly proposed data generation methods was assessed against existing techniques using an FNN, a CNN, and FNN+CNN as classifiers. The outcomes highlight the effectiveness of GANified-SMOTE, particularly when coupled with the proposed FNN+CNN classifier, in raising the F1-score for fraudulent data. This high F1-score indicates the method's capacity to identify a substantial portion of fraudulent transactions with fewer misclassifications of legitimate transactions. Notably, GANified-SMOTE and SMOTified-GAN consistently performed well across varying quantities of generated minority samples. Furthermore, the research underscores the significant impact of the classifier's hyperparameter settings on classification performance, irrespective of the employed data generation method.
Because this experiment used an online-acquired dataset, it is crucial to recognize that the findings may not perfectly reflect real-world scenarios marked by ever-evolving fraudulent behaviors. Future work should validate the efficacy of the proposed methods within actual financial institutions. Moreover, while the experiment employed a labeled dataset with presumably accurate class labels, real-world datasets are often unlabeled and require comprehensive preprocessing. To tackle class labeling issues, future investigations could explore the potential of unsupervised learning in data generation. Furthermore, this study acknowledges that comparisons with existing research were limited, and factors such as classifier selection may have influenced the observed improvements. To enhance generalizability, future research should therefore involve additional classifiers and ablation studies, which would validate the performance of the data generation methods in diverse scenarios.

Figure 2. GAN architecture employed in this study, consisting of a 5-layer FNN generator and discriminator; the final minority samples are depicted by the blue square.

Figure 3. SMOTified-GAN architecture, which employed SMOTE-generated samples as the input for the generator, deviating from the traditional GAN approach that uses random noise; the final minority samples are depicted in the blue square.

Figure 4. SMOTE+GAN architecture, which generates minority samples by directly combining the SMOTE and GAN techniques; the generated samples are then merged with the original training dataset.

Figure 5. Proposed GANified-SMOTE architecture, which applies SMOTE to the GAN-generated samples, as indicated by the red dashed square.

Figure 6. Flowchart illustrating the two methods employed for FNN+CNN.

Table 1. Variations in the types and quantities of fraud data utilized for each data generation method.

Table 2. The hyperparameters of the two top-performing models within the FNN and CNN variations.

Table 4. Outcomes of the five data generation methods applied to four models, evaluated on two distinct minority sample sets.

Table 5. Evaluation of FNN2+CNN2 integrated with Methods I and II using different data generation techniques on two minority sample sets; the top-performing results for each measurement in Tests A and B are highlighted in bold.

Table 6. Comparison of results obtained by existing studies and the proposed methods. Test A includes twice as many generated fraud samples as original fraud samples, whereas Test B uses a 50:50 fraud-to-nonfraud distribution. The highest precision, recall, and F1-score are highlighted in bold.