Article

Machine Learning-Based Data Generative Techniques for Credit Card Fraud-Detection Systems

Faculty of Applied Sciences, Macao Polytechnic University, R. de Luis Gonzaga Gomes, Macao SAR, China
*
Author to whom correspondence should be addressed.
Mathematics 2026, 14(6), 975; https://doi.org/10.3390/math14060975
Submission received: 29 January 2026 / Revised: 9 March 2026 / Accepted: 11 March 2026 / Published: 13 March 2026

Abstract

This study investigates the pressing issue of credit card fraud in the context of evolving e-commerce platforms and the necessity for improved fraud detection mechanisms. Since the advent of credit cards, the surge in their usage has led to a corresponding increase in fraud rates, highlighting the need to establish strong detection systems to prevent such activities. This research proposes a novel approach by integrating two distinct credit card datasets and a comparative evaluation of four machine learning imputation techniques to address missing values. By leveraging machine learning algorithms and imputation methods, we aim to enhance the accuracy and reliability of fraud detection. Our findings reveal significant improvements in model performance, with the accuracy of the integrated dataset reaching 100%, representing a 6.05% improvement over the original datasets; this improvement was confirmed to be statistically significant. Using the CBPM method, we selected the model that best balances accuracy and time efficiency. This result emphasizes the importance of effective data integration and imputation in combating financial fraud. It has direct practical implications for financial institutions, regulators, fraud analysts, and financial policymakers, who can use this approach to increase detection efficiency, reduce false positives, and optimize decision-making processes. Consequently, the method also helps protect consumers and enhances the overall resilience and credibility of financial markets.

1. Introduction

Credit cards emerged in the 1960s, providing an alternative to cash and check transactions. This shift caused a rapid increase in credit card users and transaction volumes, which in turn led to a rise in credit card fraud [1]. Most businesses have transformed into online services to provide e-commerce and convenient access, thereby enhancing customer productivity and accessibility [2]. When conducting business and transactions online, neither the card nor the cardholder needs to be present [3]. This makes identifying potential borrowers and assessing the credit risk arising from borrower defaults the primary concern for financial institutions [4]. Credit card fraud involves the unauthorized use of another person’s credit card, typically without the knowledge of either the legitimate cardholder or the issuing financial institution [5]. Such fraudulent conduct encompasses a range of illicit actions, including the acquisition of goods or services without making payment and the improper transfer of funds from the account, among other similar schemes. These activities can be categorized into offline fraud, application fraud, and bankruptcy fraud [6]. Detecting and preventing credit card fraud is a vital part of the financial system, focusing on identifying and blocking fraudulent transactions [7]. Effective fraud monitoring strategies can help minimize economic losses, boost customer trust, and reduce complaints [8,9]. The substantial financial losses associated with credit card fraud necessitate urgent countermeasures. Financial institutions are therefore compelled to accelerate the design, development, and deployment of effective fraud-detection systems capable of promptly and accurately identifying fraudulent transactions. Consolidation of data originating from heterogeneous sources constitutes data integration, which aims to deliver a coherent and unified perspective of the information to end users. 
Data integration can be classified into two types: virtual and materialized. Virtual integration allows users to access data sources in real time through the integration system, often resulting in higher query costs. In contrast, materialized integration involves the system maintaining a replicated view of the data sources, which is commonly used in data warehousing and information system re-engineering [10]. Successful implementation of data integration relies on conceptual modeling and effective reasoning support, requiring the use of expressive logic to enhance integration capabilities. Despite extensive research on integrating heterogeneous information systems, most commercial solutions do not achieve complete data integration [11]. Missing data is a widespread challenge in modern scientific and engineering fields; it can substantially bias results, impair the validity of data processing and subsequent analysis, and diminish overall statistical power [12]. In general, missingness mechanisms are categorized into three types: missing completely at random (MCAR), missing at random (MAR), and missing not at random (NMAR) [13]. Under the MAR mechanism, the probability of missingness is determined exclusively by the values of the observed variables. For MCAR, the absence of data is independent of both observed and unobserved variables. In contrast, NMAR is characterized by missingness that depends on the unobserved (missing) values themselves, possibly in addition to the observed variables. Single imputation replaces missing data with a single estimate [14], offering simplicity and efficiency while maintaining sample size, though it may underestimate standard errors. Multiple imputation, developed by Donald B. Rubin in 1987 [15], is a robust method for handling missing data by generating several completed datasets through sampling from an imputation model, allowing analysts to compute and combine estimates to account for the uncertainty introduced by the missing values [16].
Multiple imputation is now widely recognized as the best general method for handling incomplete data across many fields. Nevertheless, this assumption does not hold universally. Leveraging continuous progress in machine learning techniques, numerous and varied ML-based approaches have been developed and applied to credit card fraud-detection problems using different datasets. A substantial number of credit card transaction datasets have been employed for model training in earlier investigations [6,17]. This study integrates two separate credit card datasets and applies imputation procedures to any missing values that arise, thereby enabling the identification of potential borrowers and the continuous assessment of their default risk. To obtain more precise and robust estimates, machine learning-driven imputation techniques—encompassing both single and multiple imputation approaches—were employed. In particular, the European Cardholders Dataset and the AmExpert credit dataset serve as the foundation for performing data integration, handling missing value imputation, and subsequently training machine learning classification models.
The rapid advancement of computer and database technologies has led to an explosion of data, far outpacing human ability to process it effectively [18]; businesses face significant challenges in processing and integrating this data. These challenges include not only ensuring the accuracy and timeliness of the data but also dealing with traditional data warehouse methods that can create data silos, progressively reducing the benefits of data warehouses [19]. When data cannot be effectively retrieved, observed, or interacted with, it loses its value for businesses. The data integration process merges information from various databases and sources, offering users a unified view. An effective data integration strategy allows companies to leverage insights from multiple sources to achieve their business goals. Such data integration constitutes a critical component of contemporary business intelligence architectures, since it governs the precision and currency of available information and, in turn, substantially shapes the capacity of an organization to make informed strategic decisions amid fast-evolving market conditions. Organizations should accord high priority to the careful design and effective deployment of data integration mechanisms so as to facilitate more timely and accurate real-time decision-making and thereby maintain operational efficiency [19]. By integrating data from various business areas such as finance, manufacturing, sales, and marketing, organizations enhance financial transparency and improve operational efficiency in their supply chains. This integration allows businesses to utilize customer behavior insights from combined datasets, maximizing satisfaction, loyalty, and profitability [20]. Organizations analyze this data to enhance its availability and reliability. Ultimately, effective data integration enables businesses to leverage information more strategically, thereby maintaining a competitive advantage. 
The main contributions of this study include the following:
(1)
Data and empirical contribution. We develop a novel integration of two datasets with entirely different features. The integrated dataset yields a 6.05% improvement in accuracy compared with the original datasets and achieves 100% accuracy after imputation; this improvement is statistically significant.
(2)
Methodological contribution. We introduce a robust integration pipeline together with a CBPM performance metric that jointly accounts for predictive accuracy and training time, allowing practitioners to select models that balance effectiveness and computational cost in real-world deployments.
(3)
Practical contribution. We provide actionable guidance for financial institutions and regulators on deploying the proposed approach to enhance fraud-detection systems. By improving detection efficiency and reducing false positives, the method helps protect consumers while enhancing the resilience and credibility of financial markets.
Following this Introduction, the paper is structured into three main sections. Section 2 addresses foundational concepts and mainly delineates the theoretical basis of the present work, encompassing techniques for data balancing, a range of machine learning algorithms documented in prior research, the conceptual framework of generative data algorithms, and the evaluation metrics commonly utilized in fraud-detection systems grounded in machine learning [21,22]. Section 3 presents the experimental results, offering detailed comparisons among diverse machine learning algorithms and feature reduction techniques. Additionally, this section introduces the optimized system architecture for a data generative machine learning framework tailored to credit card fraud detection. Lastly, Section 4 provides a concise summary of the principal contributions made by the present research and outlines several prospective avenues worthy of further exploration in subsequent studies.

2. Preliminaries

This study utilizes two publicly available real-world credit card datasets sourced from Kaggle. These two datasets provide valuable experiential data that directly reflects actual consumer transaction behaviors and fraud characteristics prevalent in the market, which is crucial for gaining a deep understanding of real-world complexities. The European credit cardholders dataset, serving as the primary source, comprises 284,807 non-fraudulent transactions and 492 fraudulent ones—equivalent to 0.172% of all records—and it is characterized by 30 predictive features. Complementing this, the AmExpert credit dataset encompasses 42,579 legitimate transactions alongside 2949 fraudulent cases (corresponding to 6.926% fraud prevalence) and is defined by 18 attributes [21]. These two datasets have found extensive application in the existing body of credit card fraud research [23,24,25]. In the course of dataset merging, the complete set of features originating from the European dataset was preserved, whereas two features deemed irrelevant from the AmExpert dataset were excluded. In the preprocessing stage, non-numeric data was label-encoded. The European dataset consists entirely of numeric data, while the American Express dataset contains four non-numeric features, which were also encoded using label encoding (see Table 1). Prior to initiating model training, systematic application of the StandardScaler ensured thorough standardization of the data [26,27]. By normalizing feature distributions and mitigating the effects of disparate measurement scales, this essential preprocessing procedure substantially enhanced the predictive performance of the resulting machine learning models [28].
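As an illustration of this preprocessing stage, the following sketch applies label encoding and standardization to a toy feature set using scikit-learn (the column names and values here are hypothetical, not drawn from the actual datasets):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy transactions: one non-numeric feature and one numeric feature
# (illustrative values only).
card_type = np.array(["visa", "master", "visa", "amex"])
amount = np.array([[120.0], [85.5], [300.0], [42.0]])

# Label-encode the non-numeric feature into integer codes.
encoded = LabelEncoder().fit_transform(card_type)

# Standardize the numeric feature to zero mean and unit variance.
scaled = StandardScaler().fit_transform(amount)

print(encoded)  # integer codes: amex=0, master=1, visa=2
print(scaled.mean(), scaled.std())  # approximately 0 and 1 after scaling
```

In practice the same encoder and scaler would be fitted on the training portion of the merged dataset and then applied to the test portion.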

2.1. Dataset

The resampling strategy is commonly used to address data imbalance issues, and it includes both over-sampling and under-sampling techniques. Over-sampling, which replicates instances belonging to the minority class, carries the risk of inducing model overfitting or magnifying existing noise within the data [7]. In contrast, under-sampling serves to decrease computational demands and enhance processing efficiency; this approach can be implemented either through random elimination of majority-class samples or by replacing selected instances with cluster centroids derived from subsets of the dataset [7,29]. To reduce the number of non-fraud samples, we employed random under-sampling on both datasets in this study. Our objective was to achieve a distribution in which the fraud and non-fraud classes each represent approximately 50%. Random under-sampling was applied to balance the European credit cardholders dataset, yielding a final composition of 483 non-fraudulent transactions and 492 fraudulent transactions (see Figure 1). In a parallel manner, the AmExpert credit dataset underwent the same balancing procedure, resulting in 2996 non-fraudulent transactions and 2949 fraudulent transactions (see Figure 2).
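The random under-sampling step described above can be sketched as follows (a minimal illustration on simulated labels; the class counts are hypothetical, not the actual dataset sizes):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Simulated imbalanced labels: 1000 non-fraud (0) and 50 fraud (1).
y = np.array([0] * 1000 + [1] * 50)

# Keep all minority samples, and randomly keep only as many majority
# samples as there are minority samples.
minority_idx = np.flatnonzero(y == 1)
majority_idx = rng.choice(np.flatnonzero(y == 0),
                          size=minority_idx.size, replace=False)
balanced_idx = np.concatenate([majority_idx, minority_idx])

balanced_y = y[balanced_idx]
print(np.bincount(balanced_y))  # [50 50], i.e. an approximately 50/50 split
```

The same selected indices would be used to subset the feature matrix so that features and labels stay aligned.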
The two selected datasets were divided into training and testing subsets to better support the performance evaluation of various machine learning models. Table 2 provides the corresponding counts of non-fraud and fraud cases in the training and testing subsets of the combined dataset.
While reducing the number of data samples can adversely affect key performance indicators like accuracy, precision, and recall, it is crucial to acknowledge that imbalanced training datasets can lead to bias in machine learning algorithms. Mitigating class imbalance within training datasets constitutes a critical prerequisite for constructing reliable and robust machine learning frameworks. Accordingly, the subsequent phase of the investigation applies an ensemble of machine learning algorithms to the prepared and categorized dataset. Training of these models is conducted exclusively on a rebalanced classification dataset that ensures equitable representation across all classes. Such class-balanced training markedly enhances both the predictive accuracy and the overall dependability of the models across various classification tasks.

2.2. Various Data Generation Techniques for Credit Card Fraud Detection

Imputation techniques enable the estimation of absent values by drawing upon the observed information within the dataset, thereby facilitating the completion of incomplete records [30]. Several straightforward methods have been introduced to address missing values, such as ignoring incomplete records or deleting them outright [13]. While these methods are easy to implement, they may pose significant risks if the proportion of missing values is excessively high, potentially leading to substantial impacts on the analysis results [31]. This study employed four machine learning-based imputation methods. Single imputation can quickly generate a complete dataset, making it simple and efficient for real-time analysis. Multiple imputation can produce multiple datasets to capture variability and improve the accuracy of statistical estimates, but it requires more computational resources [32]. The following introduces the definitions and basic characteristics of these four methods.
  • Classification and Regression Tree (CART) Imputation is a popular modeling method developed in recent decades that is also used as a supportive model for missing data imputation [33]. CART imputation is a method used for imputing missing data by treating each variable as a response variable and utilizing other variables to build the model. During the construction process, only the observed data is used to train the model, which is then employed to predict the missing values. When multiple variables have missing data, one can either exclude the missing data or use iterative methods for stepwise imputation. The CART imputation method is flexible and efficient, capable of handling complex data structures while generating intuitive visual results that are easy to interpret [34].
  • Gradient Boosting Tree (GBT) Imputation enhances the handling of missing data by constructing decision trees sequentially based on GBT principles [35,36]. Each tree corrects the errors of its predecessor, starting with a simple tree known as a weak learner. The iterative procedure focuses on those instances containing missing values that previous trees predicted incorrectly, and proceeds sequentially until either the predetermined number of trees has been constructed or a target level of predictive accuracy has been attained. GBT imputation has proven to be highly effective in accurately filling in missing data in tabular datasets [37,38].
  • K-Nearest Neighbour (KNN) Imputation has become one of the most popular approaches in recent years [39]. It fills in missing data by estimating values based on the nearest instances in the dataset. It predicts discrete attributes by identifying the most frequent value among the k-nearest neighbours and continuous attributes by calculating their mean. A key advantage of k-NN imputation is that it does not require explicit predictive models for each attribute with missing data, allowing for easy adaptation to various attributes by simply modifying the distance metric. Additionally, it can effectively handle instances with multiple missing values [30].
  • Random Forest (RF) Imputation is based on the random forest method proposed by Breiman in 2001 [40], making predictions by constructing multiple decision trees. RF imputation differs from traditional imputation methods by employing a random selection of features at each node for splitting, thus enhancing diversity among the trees while reducing the risk of overfitting [41]. The aggregation of multiple weak learners results in improved accuracy, robustness, and generalization compared to a single decision tree [42]. This increases randomness and enhances the stability of the model. Mainly, RF has only two main parameters: the number of variables in the random subset at each node and the number of trees in the forest [43]. However, a potential drawback of RF imputation is its computational cost, as it requires constructing multiple trees and repeated fitting during each training iteration, leading to increased resource consumption and computation time.
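As a concrete sketch of tree-based imputation, the following uses scikit-learn's IterativeImputer with a decision-tree estimator in the spirit of the CART approach described above (this is an illustrative stand-in, not the exact implementation used in this study; the data are toy values):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor

# Toy matrix where the second feature is roughly twice the first
# (illustrative values only).
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, np.nan],  # missing value to impute
    [4.0, 8.0],
])

# Each variable with missing entries is treated as a response and predicted
# from the remaining variables by a CART-style regression tree trained on
# the observed rows only.
imputer = IterativeImputer(estimator=DecisionTreeRegressor(random_state=0),
                           random_state=0)
X_full = imputer.fit_transform(X)
print(X_full[2, 1])  # tree prediction drawn from the observed targets
```

With several incomplete columns, the imputer cycles through them iteratively, which mirrors the stepwise imputation mentioned above.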
It is noted that CART, GBT, and RF are all tree-based models. CART addresses missing values in a structured way, while GBT improves prediction accuracy through ensemble learning. RF emphasizes random sampling to enhance model stability and predictive performance. KNN is a similarity-based imputation method that fills in missing values by utilizing nearby points. Our goal is to select the most appropriate imputation method to maximize data quality and the reliability of the analytical results. Algorithm 1 demonstrates the core imputation process of KNN, outlining key steps including neighbor selection, distance calculation, and value aggregation for imputation. By leveraging the inherent similarity among data points, KNN provides a robust method for handling missing values, often leading to improved predictive accuracy in machine learning tasks.
Algorithm 1: KNN missing value imputation Procedure
[Algorithm 1 is presented as a figure in the published version. In outline: for each record containing missing values, compute distances to the complete records, select the k nearest neighbours, and aggregate their observed values (the most frequent value for discrete attributes, the mean for continuous ones) to fill in the gap.]
Providing the KNN imputation algorithm is essential because it improves the transparency and reproducibility in the data analysis process. A clear and detailed algorithm allows researchers and practitioners to understand the principles behind each step, making it easier to adjust parameters for specific applications and ensuring that results can be consistently replicated across different studies.
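A compact, runnable version of this KNN imputation procedure can be sketched with scikit-learn's KNNImputer (toy data; the neighbour count and values are illustrative):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy feature matrix with one missing value (illustrative data).
X = np.array([
    [1.0, 10.0],
    [1.1, 11.0],
    [0.9, np.nan],  # missing value to be imputed
    [5.0, 50.0],
])

# Neighbours are found by a NaN-aware Euclidean distance over the observed
# features; the missing entry is filled with the mean of the neighbours'
# observed values in that column.
imputed = KNNImputer(n_neighbors=2).fit_transform(X)
print(imputed[2, 1])  # mean of the two nearest rows' second column: 10.5
```

Here the two nearest rows to the incomplete record (by the first feature) hold 10.0 and 11.0 in the missing column, so the imputed value is their mean.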

2.3. Various Machine Learning Models for Credit Card Fraud Detection

This study selects six prominent machine learning models for training and analysis, including both traditional and deep learning approaches. The traditional models specifically consist of ensemble-based techniques. Four of these algorithms have been applied to the same dataset [44,45,46,47]. The machine learning models used in this research are as follows:
  • Artificial Neural Network (ANN) [48] constitutes a computational framework inspired by biological neural systems. The architecture consists of numerous interconnected artificial neurons that transform input signals through successive weighted summations followed by nonlinear activation functions. In standard configurations, the network is organized hierarchically into an input layer responsible for receiving raw external information, one or several hidden layers that facilitate complex nonlinear transformations via inter-neuron signal propagation, and an output layer that generates the final computed results. The hidden layers allow the ANN to approximate complex functional relationships and recognize patterns through self-learning, while an ANN without hidden layers can only solve linear functions.
  • Convolutional Neural Network (CNN) [46] presents a specialized class of deep learning architectures that has achieved widespread adoption in domains such as image processing, natural language processing, audio signal analysis, and time-series modeling [49]. The framework is particularly tailored to grid-structured inputs and automatically extracts hierarchical spatial patterns by means of convolutional filters applied across multiple layers. CNNs provide key benefits in classification tasks by reducing the need for extensive human feature engineering and performing well in various recognition applications [50]. When dealing with high-dimensional trading data, CNNs can effectively capture temporal patterns and local dependencies through their hierarchical structure. For feature heterogeneity, CNNs provide the advantage of automatic feature extraction through convolutional layers. These layers learn to identify relevant patterns and variations in the input data, making them adaptable to the diverse characteristics found in real-world datasets. This capability allows CNNs to effectively process heterogeneous features without the need for extensive manual preprocessing [51].
  • Gradient-Boosted Decision Tree (GBDT) [45] constitutes an ensemble learning technique that constructs a powerful predictive model through sequential fitting of decision trees, where each successive tree corrects the residual errors of its predecessors. Prior studies have further demonstrated the utility of GBDT as a base learner when employing fixed-depth decision trees, thereby circumventing difficulties associated with the rapid exponential increase in complexity that accompanies unrestricted tree growth. In the process of building a decision tree, GBDT automatically analyzes and selects the features with the highest statistical information gain. By combining these features, it aims to better fit the training target, thereby effectively handling datasets with dense numerical features [36]. Additionally, its iterative approach to correcting predictions improves performance by focusing on complex patterns, making it highly suitable for a variety of predictive tasks.
  • K-Nearest Neighbor (KNN) [46] forms a non-parametric classification model by assigning class labels according to majority vote among the k closest training instances in the feature space [47]. The hyperparameter k, which determines the number of neighbors considered, is user-defined, with initial neighbor selection typically performed randomly and subsequently optimized via iterative performance assessment. A principal strength of the KNN algorithm lies in its conceptual simplicity and minimal implementation complexity, rendering it particularly suitable for individuals new to machine learning applications [52]. Additionally, KNN is highly effective for datasets that are not linearly separable, as it can capture complex decision boundaries through local voting mechanisms [53]. KNN performs well in multi-class classification tasks and provides robust performance even in the presence of noisy data, as the influence of outliers is mitigated by the collective voting of nearby neighbors. This characteristic enhances its effectiveness across various applications, such as recommendation systems, image recognition, and anomaly detection.
  • Long Short-Term Memory (LSTM) [54] is an advanced type of recurrent neural network designed to retain sequential data over time. It features gates and a memory cell that capture and store historical trends. Each LSTM consists of multiple cells that function as modules, facilitating the transfer of data along a transport line from one cell to another. The gates within each cell filter, retain, or add data based on sigmoidal activations, enabling selective passage. The Forget Gate regulates data retention, the Memory Gate selects and modifies new data for storage, and the Output Gate determines the final output based on the cell state and processed information. This architecture allows LSTMs to effectively manage long-range dependencies in sequential data.
  • Support Vector Machine (SVM) [47] is used for both classification and regression tasks and is well-known for its ability to establish optimal decision boundaries between different class distributions. SVM has strong generalization ability and is designed to maximize the margin between classes. This design allows the model to respond more effectively to unseen data, reducing the risk of overfitting and making it a robust classifier. Its performance in high-dimensional spaces is particularly impressive, especially when the number of features exceeds the number of samples, which makes SVM especially well-suited for analyzing high-dimensional data in areas such as natural language processing and image recognition [55]. However, SVM may perform unsatisfactorily when dealing with datasets that have imbalanced class distributions, include irrelevant data, or feature overlapping class samples.
These six machine learning models each have their own characteristics and are suitable for different types of data and analytical needs. ANN and CNN belong to deep learning methods; the former is mainly used for structured data, while the latter performs well in processing images and spatial data. GBDT is a powerful ensemble learning method that can improve prediction accuracy through continuous iteration, making it suitable for structured datasets. On the other hand, KNN is an instance-based method that emphasizes proximity, making it suitable for small datasets. LSTM focuses on time-series data and is particularly effective in handling long-term dependencies. Finally, SVM is very effective in classification tasks, but it has poorer adaptability to imbalanced data. These six classification models will be applied to the data both before and after missing values are imputed. By comparing the similarities and differences among various models, the effectiveness of different imputation methods can be evaluated, allowing for the selection of the model that best enhances data quality and prediction reliability. Table 3 summarizes the parameter settings for various classification models used in this study. Each model, including ANN, CNN, GBDT, KNN, LSTM, and SVM, is detailed along with its specified parameters. These settings are utilized for training and predicting across all models in classification tasks.
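To illustrate how such a comparison can be run, the sketch below trains three scikit-learn stand-ins for the traditional models (GradientBoostingClassifier for GBDT, KNeighborsClassifier for KNN, and SVC for SVM) on synthetic balanced data; the hyperparameters are illustrative and do not reproduce the settings in Table 3:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic balanced binary data standing in for the merged fraud dataset.
X, y = make_classification(n_samples=600, n_features=20,
                           weights=[0.5, 0.5], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

models = {
    "GBDT": GradientBoostingClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
}

# Fit each model and record its test accuracy for side-by-side comparison.
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: {scores[name]:.3f}")
```

The same loop structure extends naturally to the deep models (ANN, CNN, LSTM) and to timing each fit, which feeds into the CBPM evaluation discussed below.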

2.4. Combined Bivariate Performance Measure

The Combined Bivariate Performance Measure (CBPM) constitutes a novel evaluation metric specifically developed to assess the performance of systems or processes governed by two inherently conflicting objectives [56]. The CBPM is suitable for evaluating machine learning systems with two conflicting performance metrics, especially in cases where the two metrics exhibit opposing trends. For example, one metric may trend towards small-is-better (SIB), while the other trends towards bigger-is-better (BIB). The CBPM integrates multiple performance indicators into a single value, allowing for hyperparameter evaluation that influences these conflicting performance metrics and optimizing the overall performance evaluation process of the system. In this study, accuracy is an important metric; however, we should also consider the time required for training machine models and data imputation. When these two metrics are unrelated, we will apply the CBPM to provide a unified measure for evaluating performance outcomes, thereby validating the effectiveness of data imputation and model training. To quantify system performance, two functions, f(x) and g(x), are defined, representing performance within the continuous range x ∈ [x_a, x_b]. It is assumed that f(x) is decreasing wherever g(x) is increasing, or vice versa. The performance ratio functions are defined as follows [56]:

φ(x) = (f(x) − f_0)/(Δf + ε),  ρ(x) = (g_1 − g(x))/(Δg + ε),

where

Δf = |f(x_a) − f(x_b)|,  Δg = |g(x_a) − g(x_b)|,

and ε = 0.01 is a small tolerance that prevents zero-valued denominators in the calculations. Here, f_0 denotes the performance at the lower bound x_a, while g_1 denotes the performance at the upper bound x_b. The CBPM function ξ(x) is then

ξ(x) = φ(x) · ρ(x),

and the optimal value x* is obtained by solving

x* = { x | dξ(x)/dx = 0, x ∈ [x_a, x_b] }.
In a production environment, when deploying machine learning models, it is essential to not only consider accuracy but also prioritize training time. If training takes too long, it will delay the deployment of the model, thereby affecting the model’s ability to respond quickly to new fraud patterns. Given the dynamic nature of fraud, even minor improvements in training time can have a significant impact on operational efficiency. Therefore, in practical applications, it is crucial to optimize both accuracy and training time, especially when frequent retraining based on new data is required. The CBPM method can provide a comprehensive evaluation of performance, facilitating an effective balance between accuracy and efficiency. In this experiment, the previously mentioned CBPM performance measurement method will be employed to identify the optimal machine model based on accuracy and time metrics.
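To make the CBPM concrete, the sketch below evaluates ξ(x) = φ(x)·ρ(x) on a grid for a hypothetical decreasing curve f(x) and increasing curve g(x), and locates the stationary point dξ/dx = 0 numerically (all curves, bounds, and the grid resolution are illustrative assumptions, not values from this study):

```python
import numpy as np

# Illustrative curves on x in [0, 1]:
# f(x): a small-is-better metric (e.g. training time), decreasing;
# g(x): a bigger-is-better metric (e.g. accuracy), increasing.
x = np.linspace(0.0, 1.0, 1001)
f = 2.0 - x
g = np.sqrt(x)

eps = 0.01
f0 = f[0]                  # performance at the lower bound x_a
g1 = g[-1]                 # performance at the upper bound x_b
df = abs(f[0] - f[-1])
dg = abs(g[0] - g[-1])

phi = (f - f0) / (df + eps)
rho = (g1 - g) / (dg + eps)
xi = phi * rho

# Locate the stationary point dξ/dx = 0 via a discrete derivative.
dxi = np.gradient(xi, x)
x_star = x[np.argmin(np.abs(dxi))]
print(round(x_star, 3))  # close to the analytic stationary point x* = 4/9
```

For these particular curves the stationary point can be verified by hand: ξ is proportional to x^{3/2} − x, whose derivative vanishes at x = (2/3)² = 4/9.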

3. Results of the Experiment

In this section, we present the performance results of six machine learning algorithms applied to a single dataset. We also include the results from two single imputation methods and two multiple imputation methods that were used after merging two datasets, as detailed in Section 2.2. The present evaluation facilitates a thorough side-by-side assessment of four standard classification metrics—accuracy, precision, recall, and F1-score—together with the corresponding computational times required for model training. The complete workflow of the data imputation procedure is illustrated in Figure 3. The process commences with the ingestion of the two source datasets designated for integration, followed by application of the selected imputation strategy to address all missing entries within the combined dataset, thereby yielding a fully populated and gap-free dataset ready for subsequent analysis. Subsequently, the imputed dataset undergoes partitioning into training and test subsets, followed by model fitting, generation of predictions, and rigorous assessment of predictive performance via standard evaluation metrics and statistical hypothesis testing. Upon completion of the evaluation, if the merged and imputed dataset exhibits performance that is either comparable to or markedly superior to that achieved with the original incomplete datasets, the final model along with its performance outcomes are retained and reported. In cases where performance remains equivalent or inferior, an alternative imputation strategy is selected and pertinent hyperparameters are tuned before retraining the models.

3.1. Result Comparisons for Various ML Algorithms

The following tables summarize the predictive performance and computation times obtained on the two individual datasets with six classification algorithms, providing a baseline against which the benefits of the merged and imputed dataset can be judged in subsequent experimental phases. Table 4 reports the classification results on the European Cardholders dataset for the six approaches outlined earlier. Accuracies ranged from 90% to 94%, with the support vector machine (SVM) achieving the best result of 93.91%. All algorithms completed training and inference in under 10 s.
Table 5 displays the performance results for the AmExpert credit dataset. Among the methods evaluated, KNN had the lowest accuracy at 90.60%, while the other methods reached between 95% and 98%, with GBDT highest at 97.43%. KNN also recorded the shortest training time, near 0 s, whereas CNN averaged around 100 s. The near-zero training time of KNN reflects the fact that it is an instance-based method: fitting amounts to little more than storing the training data. CNN's long training time, by contrast, arises from iterative weight optimization and heavy matrix computations; its capacity to learn complex feature representations pays off only when that representational power translates into higher accuracy.
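The instance-based behaviour can be made concrete with a minimal 1-nearest-neighbour classifier, where "training" is nothing more than storing the data. This is a sketch of the idea, not the implementation used in the experiments:

```python
class OneNN:
    """Instance-based learner: fit() only stores the data, so training is near-instant."""

    def fit(self, X, y):
        self.X, self.y = X, y  # no optimization loop, no parameters to learn
        return self

    def predict(self, x):
        # All real work happens at inference time: scan every stored instance.
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in self.X]
        return self.y[dists.index(min(dists))]

clf = OneNN().fit([[0.0, 0.0], [1.0, 1.0]], [0, 1])
```

Because `fit` does no optimization, KNN's cost is shifted entirely to prediction, which is why its training time in Table 5 is essentially zero while its inference cost grows with the training set.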

3.2. Result Comparisons for Data Imputation

This study employed four distinct machine learning imputation methods to merge the datasets effectively, so as to cover a broad range of imputation techniques and identify the most effective one. Table 6 displays the accuracy, imputation time, and training time for the CART, GBT, KNN, and Random Forest (RF) imputations. All models achieved an accuracy of over 91%. Overall, the KNN imputation method had the highest accuracy, with every model exceeding 99% and total times of roughly 10 to 35 s for all algorithms except CNN. Notably, the CNN achieved a perfect accuracy of 100% with a total (imputation plus training) time of only 72.51 s, which is shorter than its total running time on the original single dataset. The CART imputation method had the lowest accuracy, ranging from 91% to 97%, with the other imputation methods falling in between. Regarding imputation and training time, all imputation methods took less than 95 s, except for RF imputation, which took up to 3940 s.
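KNN imputation fills each missing entry using the k most similar complete rows, with similarity judged only on the coordinates observed in the incomplete row. The following is a simplified stdlib-only sketch of the idea; the value of k and the toy matrix are illustrative, and this is not the exact implementation used in the experiments:

```python
def knn_impute(rows, k=2):
    """Replace None entries with the column mean over the k nearest complete rows."""
    complete = [r for r in rows if None not in r]
    out = []
    for r in rows:
        if None not in r:
            out.append(r[:])
            continue
        obs = [j for j, v in enumerate(r) if v is not None]
        # Rank complete rows by distance on the observed coordinates only.
        neigh = sorted(complete,
                       key=lambda c: sum((r[j] - c[j]) ** 2 for j in obs))[:k]
        out.append([v if v is not None
                    else sum(c[j] for c in neigh) / len(neigh)
                    for j, v in enumerate(r)])
    return out

# Toy matrix: the last row is missing its second feature.
rows = [[1.0, 2.0], [1.1, 2.1], [5.0, 6.0], [1.05, None]]
filled = knn_impute(rows, k=2)
```

Here the incomplete row is closest to the first two rows, so the gap is filled with the average of their second features rather than with the distant outlier, which is the property that makes KNN imputation attractive for locally structured data.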
Figure 4 displays a bar chart of the accuracy results obtained by each imputation method. From the trend in the chart, it is evident that the KNN imputation method achieved the best accuracy overall, while the CART imputation method had the lowest accuracy overall.
Figure 5 presents the CBPM values for the four imputation methods on both the training and testing datasets. The CBPM method combines accuracy and runtime to obtain the corresponding ξ(x) value for each imputation method and machine learning algorithm. As the 3D bar chart in Figure 5 shows, the GBDT algorithm with KNN imputation achieved the highest ξ(x) values on both datasets, demonstrating the usefulness of CBPM for identifying the most efficient imputation strategy among the selected machine learning algorithms.
These findings highlight the critical role of the imputation strategy in improving model performance, with KNN imputation emerging as our preferred choice among the four methods owing to its superior accuracy. Based on the CBPM evaluation, the most efficient model is the GBDT model with KNN imputation for both training and inference, reflecting the effectiveness of CBPM in selecting the best machine learning model. The KNN imputation method is therefore applied for all further model training in this study, and the results are presented in the next subsection.
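Conceptually, a CBPM-style selection combines a bigger-is-better (BIB) transform of accuracy, ϕ(x), with a smaller-is-better (SIB) transform of time, ρ(x), and ranks models by the combined score ξ(x). The sketch below uses simple ratio normalizations and invented model names purely for illustration; the exact CBPM transforms are defined in [56] and produce the values reported in Table 10.

```python
def cbpm_rank(results):
    """results: {name: (accuracy, seconds)} -> (names sorted best-first, scores).

    phi: bigger-is-better normalization of accuracy (best accuracy -> 1).
    rho: smaller-is-better normalization of time (fastest model -> 1).
    xi = phi * rho, so a model must do well on BOTH axes to rank first.
    """
    best_acc = max(a for a, _ in results.values())
    best_t = min(t for _, t in results.values())
    scores = {}
    for name, (acc, t) in results.items():
        phi = acc / best_acc  # BIB: ratio to the best accuracy
        rho = best_t / t      # SIB: ratio of the fastest time to this time
        scores[name] = phi * rho
    return sorted(scores, key=scores.get, reverse=True), scores

# Hypothetical models: B is the most accurate but far slower; C balances both.
results = {"A": (0.99, 10.0), "B": (1.00, 40.0), "C": (0.95, 8.0)}
order, scores = cbpm_rank(results)
```

Note how the multiplicative combination penalizes the slow model B despite its top accuracy, which is the same trade-off that leads CBPM to prefer GBDT over the slower 100%-accurate CNN in Table 10.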

3.3. Enhanced Credit Card Fraud-Detection System with Generative Dataset

This research applied several machine learning methods for imputation and compared their performance. KNN imputation emerged as the most effective approach and was therefore adopted for all subsequent imputation steps. Because the integration combined two credit card datasets with completely unrelated feature sets, the merged dataset exhibited a missingness rate of 50%. The test portion of this study comprised 1377 transactions, denoted x_i (see Figure 6), characterized by a total of 46 aggregated features f_i. All missing values created by the merging process were filled with the KNN imputation technique. Based on the model training results, the best-performing model was selected for training and prediction, ultimately detecting high-risk user transactions and producing binary classification outputs.
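The origin of the 50% missingness rate can be pictured as an outer concatenation: each row keeps its own features and receives missing markers for the other dataset's columns, so when the two feature blocks are of equal width, half of the merged matrix is missing. A stdlib sketch with hypothetical feature names:

```python
def merge_disjoint(rows_a, rows_b):
    """Concatenate two record lists with disjoint feature names; absent features become None."""
    all_feats = sorted({k for r in rows_a + rows_b for k in r})
    merged = [{f: r.get(f) for f in all_feats} for r in rows_a + rows_b]
    return merged, all_feats

def missing_rate(rows, feats):
    total = len(rows) * len(feats)
    missing = sum(1 for r in rows for f in feats if r[f] is None)
    return missing / total

# Toy stand-ins: anonymized PCA-style features vs. applicant attributes.
a = [{"v1": 0.3, "v2": 1.2}, {"v1": 0.7, "v2": 0.4}]
b = [{"age": 35, "income": 52000}, {"age": 41, "income": 61000}]
merged, feats = merge_disjoint(a, b)
```

Every merged row is half-empty by construction, which is exactly the gap pattern the KNN imputation step is then asked to fill.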
Table 7 presents the performance results with KNN-based imputation. All key evaluation metrics, including accuracy, precision, and recall, exceed 98%, with F1-score values ranging from 0.994 to 1. KNN posts the lowest accuracy at approximately 99.42%, while CNN achieves the highest accuracy, precision, and recall, all at 100%, with an F1-score of 1. This confirms the excellent performance of the model on the test set, indicating that it identifies every sample correctly. Overall, the consistency of the metrics on the merged dataset demonstrates stability and reliability, indicating that the KNN imputation process succeeded in enhancing data quality.
Figure 7 illustrates the Receiver Operating Characteristic Area Under the Curve (ROC-AUC) [57] of the six machine learning algorithms on the combined dataset, the European dataset, and the AmExpert dataset; the ROC-AUC measures a classifier's ability to separate positive and negative samples. A higher AUC indicates better performance; the average AUC values for the European and AmExpert datasets are 0.971 and 0.989, respectively, while the combined dataset achieves AUC values approaching 1. This indicates that the models effectively distinguish between positive and negative classes, demonstrating strong predictive capability.
Figure 8 shows the precision–recall curves and their area (PR-AUC) [58] for the six machine learning algorithms on the datasets, a measure well suited to classification performance on imbalanced data. A higher average precision (AP) signifies better performance; the average AP values for the European and AmExpert datasets are 0.978 and 0.988, respectively, while the combined dataset yields AP values approaching 1. The models are thus highly effective at identifying positive instances while minimizing false positives. These results validate the algorithms used and highlight the effectiveness of the combined dataset and the KNN imputation method in enhancing model accuracy.
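Both summary statistics reduce to simple score-ranking computations. The compact stdlib implementations below are equivalent in spirit to the library routines behind Figures 7 and 8: ROC-AUC as the probability that a random positive outscores a random negative, and AP as the mean of precision at each rank where a positive is retrieved.

```python
def roc_auc(labels, scores):
    """AUC: probability a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """AP: mean of precision@k over the ranks k at which a positive appears."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total, n_pos = 0, 0.0, sum(labels)
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / n_pos
```

A perfectly separating score list yields 1.0 on both measures, which is why AUC and AP values near 1 on the combined dataset indicate near-perfect ranking of fraudulent over legitimate transactions.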

3.4. Statistical Tests

Hypothesis testing was employed in this study to determine whether the observed improvements reflect real effects rather than random fluctuation. This method not only assessed differences between groups but also validated the effectiveness of KNN imputation. By reporting p-values, hypothesis testing quantifies the statistical significance of the results and thereby strengthens the credibility of the research. The significance level was set at 0.01 to confirm that the overall performance of the KNN-imputed dataset improved significantly over the original datasets. Setting α at this strict level ensures that only strong statistical evidence is accepted in support of the observed effects.
Table 8 and Table 9 compare the accuracies of the AmExpert and European datasets with those of the KNN-imputed dataset. The accuracy of the KNN-imputed dataset improved significantly across all classification models: the overall average accuracy rose by 4.5% relative to the AmExpert dataset and by 7.6% relative to the European dataset. With the significance level α set to 0.01, the p-values for all models indicated a significant improvement, leading to rejection of the null hypothesis in favor of H_1, i.e., the accuracies of all models on the KNN-imputed dataset increased significantly. This demonstrates that KNN imputation not only fills in missing values effectively but also enhances the dataset's performance in classification tasks.
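One concrete form such a comparison can take is a two-proportion z-test on the fraction of correctly classified test transactions. The paper's exact test may differ; the sketch below is an illustrative stdlib implementation, reusing the 1377-transaction test size from Section 3.3 and the KNN accuracies from Table 8 as example inputs.

```python
import math

def two_proportion_p_value(acc1, acc2, n1, n2):
    """Two-sided z-test for H0: two classifiers have equal true accuracy."""
    pooled = (acc1 * n1 + acc2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(acc1 - acc2) / se
    # P(|Z| > z) = erfc(z / sqrt(2)) under the normal approximation.
    return math.erfc(z / math.sqrt(2))

# Example: KNN on the original AmExpert data (90.60%) vs. the imputed data (99.42%).
p = two_proportion_p_value(0.9060, 0.9942, 1377, 1377)
```

With samples of this size, an 8.8-point accuracy gap yields a z-statistic above 10, so the p-value falls far below the 0.01 threshold, matching the "<0.001" entries in Tables 8 and 9.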
Based on the results obtained with KNN imputation, the CBPM method was employed for performance evaluation. Table 10 presents the optimization results for the various algorithms. The metric ϕ(x), calculated from accuracy, follows the bigger-is-better (BIB) criterion, while ρ(x), derived from training time, follows the smaller-is-better (SIB) principle; ξ(x) is the resulting CBPM value. The optimization results are illustrated in Figure 9, which displays a diagram of x* derived from the CBPM function and highlights GBDT as the best result among the algorithms.
Overall, the merged dataset exhibited substantially superior predictive performance relative to each of the two original datasets when considered separately. These findings provide strong empirical support for the efficacy of KNN-based imputation within the experimental framework and underscore the value of successfully integrating feature spaces from two fundamentally dissimilar credit card transaction datasets.

4. Conclusions

This paper proposed a novel approach that successfully integrates two datasets with entirely different features. We compared four imputation methods and employed six machine learning algorithms, addressing missing values effectively with the KNN imputation method. Using these machine learning data imputation techniques, we achieved an accuracy of 100% on the integrated dataset. Furthermore, our metric measurement method, the CBPM, identified GBDT as the most efficient machine learning model in terms of accuracy and training time. The results demonstrate the feasibility of the proposed data integration method and confirm that effective data imputation can significantly improve model accuracy. As financial institutions continue to face the challenges posed by fraud, our findings provide valuable insights into the design and implementation of more resilient detection mechanisms, as well as practical guidance for merging datasets. Because this study used only two credit card datasets, the findings are limited to users who have applied for or hold credit cards; the approach can detect fraud only within that population, not among potential applicants who have not applied. In future work, we aim to integrate datasets from different business sectors to identify opportunities for expanding customer capabilities and to evaluate fraud risk across various services.

Author Contributions

Conceptualization, S.-K.K.; methodology, S.-K.K.; software, X.F.; data reshaping, X.F.; writing—original draft, S.-K.K. and X.F.; writing—review and editing, S.-K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Macao Polytechnic University (MPU), under Grant RP/FCA-05/2024.

Data Availability Statement

The datasets for this paper are available on Kaggle repository (Credit Card Fraud Detection (https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud) (accessed on 31 August 2025) and AmExpert CodeLab 2021 (https://www.kaggle.com/datasets/pradip11/amexpert-codelab-2021/) (accessed on 31 August 2025)).

Acknowledgments

This paper was revised using AI/ML-assisted tools (Grok 4).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Banker, S.; Dunfield, D.; Huang, A.; Prelec, D. Neural mechanisms of credit card spending. Sci. Rep. 2021, 11, 4070. [Google Scholar] [CrossRef]
  2. Borah, L.; Saleena, B.; Prakash, B. Credit Card Fraud Detection Using Data Mining Techniques. Seybold Rep. 2020, 15, 2431–2436. [Google Scholar]
  3. da Costa, V.G.T.; de Leon Ferreira de Carvalho, A.C.P.; Barbon Junior, S. Strict Very Fast Decision Tree: A memory conservative algorithm for data stream mining. Pattern Recognit. Lett. 2018, 116, 22–28. [Google Scholar] [CrossRef]
  4. Tang, Q.; Tong, Z.; Yang, Y. Large portfolio losses in a turbulent market. Eur. J. Oper. Res. 2021, 292, 755–769. [Google Scholar] [CrossRef]
  5. Koralage, R. Data Mining Techniques for Credit Card Fraud Detection. Sustain. Vital Technol. Eng. Inform. 2019, 1–9. [Google Scholar]
  6. Makki, S.; Assaghir, Z.; Taher, Y.; Haque, R.; Hacid, M.S.; Zeineddine, H. An Experimental Study With Imbalanced Classification Approaches for Credit Card Fraud Detection. IEEE Access 2019, 7, 93010–93022. [Google Scholar] [CrossRef]
  7. Ghaleb, F.A.; Saeed, F.; Al-Sarem, M.; Qasem, S.N.; Al-Hadhrami, T. Ensemble Synthesized Minority Oversampling-Based Generative Adversarial Networks and Random Forest Algorithm for Credit Card Fraud Detection. IEEE Access 2023, 11, 89694–89710. [Google Scholar] [CrossRef]
  8. Tingfei, H.; Guangquan, C.; Kuihua, H. Using Variational Auto Encoding in Credit Card Fraud Detection. IEEE Access 2020, 8, 149841–149853. [Google Scholar] [CrossRef]
  9. Salazar, A.; Safont, G.; Vergara, L. Semi-Supervised Learning for Imbalanced Classification of Credit Card Transaction. In 2018 International Joint Conference on Neural Networks (IJCNN); IEEE: Piscataway, NJ, USA, 2018; pp. 1–7. [Google Scholar] [CrossRef]
  10. Calvanese, D.; Giacomo, G.D.; Lenzerini, M.; Nardi, D.; Rosati, R. Description Logic Framework for Information Integration; KR’98; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1998; pp. 2–13. [Google Scholar]
  11. Bleiholder, J.; Naumann, F. Data fusion. ACM Comput. Surv. (CSUR) 2009, 41, 1–41. [Google Scholar] [CrossRef]
  12. Kuang, S.; Huang, Y.; Song, J. Unsupervised data imputation with multiple importance sampling variational autoencoders. Sci. Rep. 2025, 15, 3409. [Google Scholar] [CrossRef] [PubMed]
  13. Little, R.; Rubin, D. Statistical Analysis with Missing Data; Wiley: New York, NY, USA, 2019. [Google Scholar]
  14. Donders, A.R.T.; van der Heijden, G.J.M.G.; Stijnen, T.; Moons, K.G.M. Review: A gentle introduction to imputation of missing values. J. Clin. Epidemiol. 2006, 59, 1087–1091. [Google Scholar] [CrossRef]
  15. Rubin, D.B. Multiple Imputation for Nonresponse in Surveys; John Wiley & Sons Inc.: New York, NY, USA, 1987. [Google Scholar]
  16. Murray, J.S. Multiple Imputation: A Review of Practical and Theoretical Findings. Statist. Sci. 2018, 33, 142–159. [Google Scholar] [CrossRef]
  17. Muslim, M.A.; Nikmah, T.L.; Pertiwi, D.A.; Dasril, Y. New Model Combination Meta-learner to Improve Accuracy Prediction P2P Lending with Stacking Ensemble Learning. Intell. Syst. Appl. 2023, 18, 200–204. [Google Scholar] [CrossRef]
  18. Liu, H.; Yu, L. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. Knowl. Data Eng. 2005, 17, 491–502. [Google Scholar] [CrossRef]
  19. Dayal, U.; Castellanos, M.; Simitsis, A.; Wilkinson, K. Data Integration Flows for Business Intelligence; EDBT ’09; Association for Computing Machinery: New York, NY, USA, 2009; pp. 1–11. [Google Scholar] [CrossRef]
  20. Nofal, M.I.; Yusof, Z.M. Integration of Business Intelligence and Enterprise Resource Planning within Organizations. Procedia Technol. 2013, 11, 658–665. [Google Scholar] [CrossRef]
  21. Feng, X.; Kim, S.K. Novel Machine Learning Based Credit Card Fraud Detection Systems. Mathematics 2024, 12, 1869. [Google Scholar] [CrossRef]
  22. Feng, X.; Kim, S.K. Statistical Data-Generative Machine Learning-Based Credit Card Fraud Detection Systems. Mathematics 2025, 13, 2446. [Google Scholar] [CrossRef]
  23. Rajora, S.; Li, D.L.; Jha, C.; Bharill, N.; Patel, O.P.; Joshi, S.; Puthal, D.; Prasad, M. A Comparative Study of Machine Learning Techniques for Credit Card Fraud Detection Based on Time Variance. In 2018 IEEE Symposium Series on Computational Intelligence; IEEE: Piscataway, NJ, USA, 2018; pp. 1958–1963. [Google Scholar] [CrossRef]
  24. Tanouz, D.; Subramanian, R.R.; Eswar, D.; Reddy, G.V.P.; Kumar, A.R.; Praneeth, C.V.N.M. Credit Card Fraud Detection Using Machine Learning. In IEEE Proceedings of ICICCS; IEEE: Piscataway, NJ, USA, 2021; pp. 967–972. [Google Scholar]
  25. El hlouli, F.Z.; Riffi, J.; Mahraz, M.A.; El Yahyaouy, A.; Tairi, H. Credit Card Fraud Detection Based on Multilayer Perceptron and Extreme Learning Machine Architectures. In Proceedings of the 2020 International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 9–11 June 2020. [Google Scholar]
  26. Batista, G.E.A.P.A.; Monard, M.C. An analysis of four missing data treatment methods for supervised learning. Appl. Artif. Intell. 2003, 17, 519–533. [Google Scholar] [CrossRef]
  27. Afriyie, J.K.; Tawiah, K.; Pels, W.A.; Addai-Henne, S.; Dwamena, H.A.; Owiredu, E.O.; Ayeh, S.A.; Eshun, J. A supervised machine learning algorithm for detecting and predicting fraud in credit card transactions. Decis. Anal. J. 2023, 6, 100163. [Google Scholar] [CrossRef]
  28. Developers, S.L. Sklearn. Preprocessing. StandardScaler. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html (accessed on 31 August 2025).
  29. Fernandez, A.; Garcia, S.; Galar, M.; Prati, R.C.; Krawczyk, B.; Herrera, F. Learning from Imbalanced Data Sets; Springer: Cham, Switzerland, 2018. [Google Scholar]
  30. Batista, G.; Monard, M.C. A Study of K-Nearest Neighbour as an Imputation Method. In Proceedings of the Soft Computing Systems—Design, Management and Applications, HIS 2002, Santiago, Chile, 1–4 December 2002; Volume 30, pp. 251–260. [Google Scholar]
  31. Pratama, I.; Permanasari, A.E.; Ardiyanto, I.; Indrayani, R. A review of missing values handling methods on time-series data. In Proceedings of the 2016 International Conference on Information Technology Systems and Innovation (ICITSI), Bali, Indonesia, 24–27 October 2016; pp. 1–6. [Google Scholar]
  32. Bertsimas, D.; Pawlowski, C.; Zhuo, Y.D. From Predictive Methods to Missing Data Imputation: An Optimization Approach. J. Mach. Learn. Res. 2018, 18, 1–39. [Google Scholar]
  33. Breiman, L.; Friedman, J.; Stone, C.; Olshen, R. Classification and Regression Trees; Chapman and Hall/CRC: New York, NY, USA, 1984. [Google Scholar]
  34. Chen, C.Y.; Chang, Y.W. Missing data imputation using classification and regression trees. PeerJ Comput. Sci. 2024, 10, e2119. [Google Scholar] [CrossRef]
  35. Friedman, J.; Hastie, T.; Tibshirani, R. Additive logistic regression: A statistical view of boosting (With discussion and a rejoinder by the authors). Ann. Stat. 2000, 28, 337–407. [Google Scholar] [CrossRef]
  36. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  37. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.U.; Polosukhin, I. Attention is All you Need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  38. Jolicoeur-Martineau, A.; Fatras, K.; Kachman, T. Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Valencia, Spain, 2–4 May 2024; Volume 238, pp. 1288–1296. [Google Scholar]
  39. Jäger, S.; Allhorn, A.; Bießmann, F. A Benchmark for Data Imputation Methods. Front. Big Data 2021, 30, 693674. [Google Scholar] [CrossRef]
  40. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  41. Rebai, S.; Ben Yahia, F.; Essid, H. A graphically based machine learning approach to predict secondary schools performance in Tunisia. Socio-Econ. Plan. Sci. 2020, 70, 100724. [Google Scholar] [CrossRef]
  42. Kehinde, T.; Oyedele, A.; Kareem, M.; Akpan, J.; Olanrewaju, O. Explainable DEA–Ensemble Approach with Golden Jackal Optimization: Efficiency Evaluation and Prediction for United States Information Technology Firms. Mach. Learn. Appl. 2025, 23, 100798. [Google Scholar] [CrossRef]
  43. Liaw, A.; Wiener, M.C. Classification and Regression by randomForest. R News 2007, 2, 18–22. [Google Scholar]
  44. Ileberi, E.; Sun, Y.; Wang, Z. Performance Evaluation of Machine Learning Methods for Credit Card Fraud Detection Using SMOTE and AdaBoost. IEEE Access 2021, 9, 165286–165294. [Google Scholar] [CrossRef]
  45. Alam, T.M.; Shaukat, K.; Hameed, I.A.; Luo, S.; Sarwar, M.U.; Shabbir, S.; Li, J.; Khushi, M. An Investigation of Credit Card Default Prediction in the Imbalanced Datasets. IEEE Access 2020, 8, 201173–201198. [Google Scholar] [CrossRef]
  46. Alarfaj, F.K.; Malik, I.; Khan, H.U.; Almusallam, N.; Ramzan, M.; Ahmed, M. Credit Card Fraud Detection Using State-of-the-Art Machine Learning and Deep Learning Algorithms. IEEE Access 2022, 10, 39700–39715. [Google Scholar] [CrossRef]
  47. Kalid, S.N.; Ng, K.H.; Tong, G.K.; Khor, K.C. A Multiple Classifiers System for Anomaly Detection in Credit Card Data with Unbalanced and Overlapped Classes. IEEE Access 2020, 8, 28210–28221. [Google Scholar] [CrossRef]
  48. Nur Ozkan-Gunay, E.; Ozkan, M. Prediction of bank failures in emerging financial markets: An ANN approach. J. Risk Financ. 2007, 8, 465–480. [Google Scholar] [CrossRef]
  49. Lu, H.; Setiono, R.; Liu, H. Effective data mining using neural networks. IEEE Trans. Knowl. Data Eng. 1996, 8, 957–961. [Google Scholar] [CrossRef]
  50. Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. 2020, 53, 5455–5516. [Google Scholar] [CrossRef]
  51. Razali, M.N.; Arbaiy, N.; Lin, P.C.; Ismail, S. Optimizing Multiclass Classification Using Convolutional Neural Networks with Class Weights and Early Stopping for Imbalanced Datasets. Electronics 2025, 14, 705. [Google Scholar] [CrossRef]
  52. Oded Maimon, L.R. Data Mining and Knowledge Discovery Handbook; Springer: New York, NY, USA, 2010. [Google Scholar]
  53. Taghizadeh-Mehrjardi, R.; Nabiollahi, K.; Minasny, B.; Triantafilis, J. Comparing data mining classifiers to predict spatial distribution of USDA-family soil groups in Baneh region, Iran. Geoderma 2015, 253–254, 67–77. [Google Scholar] [CrossRef]
  54. Siami-Namini, S.; Namin, A.S. Forecasting Economics and Financial Time Series: ARIMA vs. LSTM. arXiv 2018, arXiv:1803.06386. [Google Scholar] [CrossRef]
  55. Karamizadeh, S.; Abdullah, S.M.; Halimi, M.; Shayan, J.; Rajabi, M.J. Advantage and drawback of support vector machine functionality. In Proceedings of the 2014 International Conference on Computer, Communications, and Control Technology (I4CT), Langkawi, Malaysia, 2–4 September 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 63–65. [Google Scholar]
  56. Kim, S.K. Combined Bivariate Performance Measure. IEEE Trans. Instrum. Meas. 2024, 73, 1–4. [Google Scholar] [CrossRef]
  57. AbouGrad, H.; Sankuru, L. Online Banking Fraud Detection Model: Decentralized Machine Learning Framework to Enhance Effectiveness and Compliance with Data Privacy Regulations. Mathematics 2025, 13, 2110. [Google Scholar] [CrossRef]
  58. Yin, C.; Zhang, S.; Wang, J.; Xiong, N.N. Anomaly Detection Based on Convolutional Recurrent Autoencoder for IoT Time Series. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 112–122. [Google Scholar] [CrossRef]
Figure 1. Data balancing for European Cardholders dataset [22].
Figure 2. Data balancing for AmExpert credit dataset [22].
Figure 3. Flow diagram of experimental steps.
Figure 4. Comparison of accuracy across multiple imputation methods.
Figure 5. CBPM results; (left) for training dataset; (right) testing (or inference) dataset.
Figure 6. Enhanced credit card fraud-detection system.
Figure 7. ROC-AUC comparison of combined datasets with KNN imputation.
Figure 8. Precision–recall comparison of combined datasets with KNN imputation.
Figure 9. CBPM optimization results using KNN imputation.
Table 1. Encoded features in AmExpert credit dataset [22].

| Serial Number | Feature | Original Value → Encoded Value |
|---|---|---|
| 1 | gender | F → 0; M → 1 |
| 2 | owns_car | N → 0; Y → 1 |
| 3 | owns_house | N → 0; Y → 1 |
| 4 | occupation_type | Accountants → 0; Cleaning staff → 1; Cooking staff → 2; Core staff → 3; Drivers → 4; HR staff → 5; High skill tech staff → 6; IT staff → 7; Laborers → 8; Low-skill laborers → 9; Managers → 10; Medicine staff → 11; Private service staff → 12; Realty agents → 13; Sales staff → 14; Secretaries → 15; Security staff → 16; Unknown → 17; Waiters/barmen staff → 18 |
Table 2. Transaction sample distribution in the combined datasets.

| Dataset Name | Dataset Type | Class 0 (Non-Fraud) | Class 1 (Fraud) |
|---|---|---|---|
| Combined datasets | Training | 2787 | 2752 |
| Combined datasets | Testing | 692 | 689 |
Table 3. Parameter settings for classification models.

| Model | Parameters |
|---|---|
| ANN | Input layer: 128 neurons; hidden layers: 64 and 32 neurons; output layer: 1 neuron |
| CNN | Convolutional layers: 32-64-128 filters; pooling: max pooling; fully connected: 256-128-64-32-16 neurons; output: 1 neuron |
| GBDT | Estimators: 100; max depth: 3 |
| KNN | Neighbors: 5 |
| LSTM | LSTM layer: 64 neurons; output layer: 1 neuron |
| SVM | Penalty parameter: 1.0; kernel: rbf |
Table 4. Performance results of various ML algorithms on the European Cardholders dataset.

| Algorithm | Accuracy (%) | Precision (%) | Recall (%) | F1-Score | Training Time (s) |
|---|---|---|---|---|---|
| ANN [48] | 90.36 | 90.40 | 90.35 | 0.904 | 3.60 |
| CNN [46] | 91.37 | 91.49 | 91.36 | 0.914 | 9.64 |
| GBDT [45] | 92.39 | 92.61 | 92.37 | 0.924 | 0.94 |
| KNN [46] | 91.88 | 92.17 | 91.86 | 0.919 | 0.10 |
| LSTM [54] | 92.89 | 92.97 | 92.88 | 0.929 | 5.38 |
| SVM [47] | 93.91 | 94.22 | 93.89 | 0.939 | 0.12 |
Table 5. Performance results of various ML algorithms on the AmExpert credit dataset.

| Algorithm | Accuracy (%) | Precision (%) | Recall (%) | F1-Score | Training Time (s) |
|---|---|---|---|---|---|
| ANN [48] | 96.01 | 96.00 | 96.03 | 0.960 | 20.72 |
| CNN [46] | 96.62 | 96.62 | 96.66 | 0.966 | 97.04 |
| GBDT [45] | 97.43 | 97.48 | 97.50 | 0.975 | 1.71 |
| KNN [46] | 90.60 | 90.61 | 90.58 | 0.906 | 0.30 |
| LSTM [54] | 95.61 | 95.60 | 95.62 | 0.960 | 33.79 |
| SVM [47] | 95.40 | 95.41 | 95.45 | 0.954 | 2.85 |
Table 6. Comprehensive evaluation of imputation methods for missing values.

| Imputation Method | Algorithm | Accuracy (%) | Total Time (s) | Imputation Time (s) | Training Time (s) |
|---|---|---|---|---|---|
| CART | ANN | 95.50 | 23.04 | 0.21 | 22.83 |
| CART | CNN | 94.84 | 64.61 | 0.21 | 64.40 |
| CART | GBDT | 96.88 | 2.48 | 0.21 | 2.27 |
| CART | KNN | 91.72 | 0.62 | 0.21 | 0.41 |
| CART | LSTM | 95.35 | 30.02 | 0.21 | 29.81 |
| CART | SVM | 95.50 | 4.28 | 0.21 | 4.07 |
| GBT | ANN | 99.20 | 52.68 | 32.96 | 19.72 |
| GBT | CNN | 98.98 | 94.03 | 32.96 | 61.07 |
| GBT | GBDT | 99.06 | 35.38 | 32.96 | 2.42 |
| GBT | KNN | 98.55 | 33.36 | 32.96 | 0.40 |
| GBT | LSTM | 99.20 | 58.51 | 32.96 | 25.55 |
| GBT | SVM | 98.84 | 35.21 | 32.96 | 2.25 |
| KNN | ANN | 99.85 | 29.03 | 9.60 | 19.43 |
| KNN | CNN | 100.00 | 72.51 | 9.60 | 62.91 |
| KNN | GBDT | 99.85 | 13.12 | 9.60 | 3.52 |
| KNN | KNN | 99.42 | 9.99 | 9.60 | 0.39 |
| KNN | LSTM | 99.85 | 34.57 | 9.60 | 24.97 |
| KNN | SVM | 99.71 | 10.68 | 9.60 | 1.08 |
| RF | ANN | 98.33 | 3662.10 | 3636.74 | 25.36 |
| RF | CNN | 97.75 | 3940.49 | 3636.74 | 303.75 |
| RF | GBDT | 98.47 | 3645.21 | 3636.74 | 8.47 |
| RF | KNN | 95.72 | 3637.15 | 3636.74 | 0.41 |
| RF | LSTM | 98.40 | 3685.99 | 3636.74 | 49.25 |
| RF | SVM | 98.33 | 3639.22 | 3636.74 | 2.48 |
Table 7. Performance results of KNN-based imputation methods.

| Algorithm | Accuracy (%) | Precision (%) | Recall (%) | F1-Score |
|---|---|---|---|---|
| ANN [48] | 99.85 | 99.85 | 99.86 | 0.999 |
| CNN [46] | 100.00 | 100.00 | 100.00 | 1.000 |
| GBDT [45] | 99.85 | 99.85 | 99.86 | 0.999 |
| KNN [46] | 99.42 | 98.41 | 99.44 | 0.994 |
| LSTM [54] | 99.85 | 99.85 | 99.86 | 0.999 |
| SVM [47] | 99.71 | 99.71 | 99.71 | 0.997 |
Table 8. Evaluation of KNN-based generative data compared to the American Express credit dataset.

| Algorithm | Original (%) | Accuracy (%) | p-Value | Significant (α = 0.01)? | H1 Accepted? |
|---|---|---|---|---|---|
| ANN | 96.01 | 99.85 | <0.001 | Yes | Yes |
| CNN | 96.62 | 100.00 | <0.001 | Yes | Yes |
| GBDT | 97.43 | 99.85 | <0.001 | Yes | Yes |
| KNN | 90.60 | 99.42 | <0.001 | Yes | Yes |
| LSTM | 95.61 | 99.85 | <0.001 | Yes | Yes |
| SVM | 95.40 | 99.71 | <0.001 | Yes | Yes |
Table 9. Evaluation of KNN-based generative data compared to the European Cardholders dataset.

| Algorithm | Original (%) | Accuracy (%) | p-Value | Significant (α = 0.01)? | H1 Accepted? |
|---|---|---|---|---|---|
| ANN | 90.36 | 99.85 | <0.001 | Yes | Yes |
| CNN | 91.37 | 100.00 | <0.001 | Yes | Yes |
| GBDT | 92.39 | 99.85 | <0.001 | Yes | Yes |
| KNN | 91.88 | 99.42 | <0.001 | Yes | Yes |
| LSTM | 92.89 | 99.85 | <0.001 | Yes | Yes |
| SVM | 93.91 | 99.71 | <0.001 | Yes | Yes |
Table 10. CBPM for KNN-based imputation ML algorithm.

| Algorithm | Accuracy (%) | Training Time (s) | ϕ(x) | ρ(x) | ξ(x) |
|---|---|---|---|---|---|
| ANN | 99.85 | 29.03 | 0.982 | 0.555 | 0.545 |
| CNN | 100.00 | 72.51 | 1.000 | 0.282 | 0.282 |
| GBDT | 99.85 | 13.12 | 0.982 | 1.000 | 0.982 |
| KNN | 99.42 | 9.99 | 0.931 | 0.952 | 0.886 |
| LSTM | 99.85 | 34.57 | 0.982 | 0.268 | 0.264 |
| SVM | 99.71 | 10.68 | 0.965 | 0.968 | 0.934 |

Share and Cite

MDPI and ACS Style

Feng, X.; Kim, S.-K. Machine Learning-Based Data Generative Techniques for Credit Card Fraud-Detection Systems. Mathematics 2026, 14, 975. https://doi.org/10.3390/math14060975

