Article

Novel Machine Learning Based Credit Card Fraud Detection Systems

by Xiaomei Feng and Song-Kyoo Kim *

Faculty of Applied Sciences, Macao Polytechnic University, R. de Luis Gonzaga Gomes, Macao, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(12), 1869; https://doi.org/10.3390/math12121869
Submission received: 9 May 2024 / Revised: 9 June 2024 / Accepted: 13 June 2024 / Published: 15 June 2024
(This article belongs to the Special Issue Machine Learning and Finance)

Abstract

This research deals with the critical issue of credit card fraud, a problem that has escalated in the last decade due to the significant increase in credit card usage, largely driven by advances in international trade, e-commerce, and FinTech. With global losses projected to exceed USD 400 billion in the next decade, the urgent need for effective fraud detection systems is apparent. Our study leverages the power of machine learning (ML) and presents a novel approach to credit card fraud detection. We used the European cardholders dataset for model training, addressing the data imbalance issue that often hinders the effectiveness of the learning process. As a key innovative element, we introduce compact data learning (CDL), a powerful tool for reducing the size and complexity of the training dataset without sacrificing the accuracy of the ML system. Comparative experiments across various ML algorithms have shown that our CDL-adapted feature reduction outperforms conventional feature reduction methods. The findings of this research not only contribute to the theoretical foundations of fraud detection but also provide practical implications for the financial sector, which can benefit immensely from the enhanced fraud detection system.

1. Introduction

There has been a marked and swift surge in credit card users and transaction volume over the previous decade. This escalation is tied to advancements in international commerce, e-commerce, and financial technology, which have notably amplified the convenience of credit card use. Consequently, the ubiquity of credit card transactions has spurred an ongoing rise in credit card fraud. Credit card fraud involves unauthorized use of a credit card account, taking place when the cardholder or card issuer remains unaware of third-party usage. Fraudulent actors engage in acts such as procuring goods or services without payment or illicitly accessing account funds; such fraud takes forms including offline fraud, application fraud, bankruptcy fraud, and behavioral fraud [1]. The detection and prevention of credit card fraud are vital elements of financial systems aiming to identify and halt fraudulent transactions [2]. The deployment of efficient fraud surveillance strategies curbs economic losses, bolsters customer trust, and diminishes complaints [3,4].
Addressing the substantial financial losses tied to such fraudulent activities is pivotal. Recent data indicate that global losses from credit card fraud were USD 9.84 billion in 2011 [5], escalating to USD 28.65 billion in 2019 [6], an increase of USD 18.81 billion over eight years. Moreover, forecasts suggest that global credit card fraud losses may surpass USD 400 billion in the ensuing decade [7]. In 2020, there were 365,597 instances of fraud involving new credit accounts [8,9]. According to Federal Trade Commission (FTC) data [10,11], there were 459,297 credit card fraud cases in 2020, with 393,207 identified as credit card theft, a 44.6% increase from 271,927 cases in 2019. Consequently, the rapid development and implementation of credit card fraud detection systems by enterprises, particularly within the financial sector, is a pressing priority. The detection of credit card fraud within the financial sector could potentially be integrated with the field of economic criminology [12]. The efficiency and effectiveness of fraud detection systems could be enhanced by recognizing the limitations inherent in policing fraud through the application of modern technologies [13].
Leveraging the continual progress in machine learning (ML), a broad array of diverse ML systems has been deployed for credit card fraud detection across various datasets. Datasets used in preceding studies include the Credit Card Fraud dataset [1], the European Cardholders dataset [14,15,16,17], and the Lending Club Issued Loans dataset [18,19]. Several of these datasets are characterized by substantial data volume [1,18] and are trained with an extensive set of features [19]. Furthermore, it has been noted that data imbalance issues can potentially impede the effectiveness of the learning process in certain studies [17,20]. For our research, we utilized the European Cardholders dataset for ML model training and evaluation [14,17,21,22,23].
The escalating volume and rapid expansion of data in contemporary enterprises, coupled with their increasing diversity, have made data increasingly complex and high-dimensional. The primary contribution of this paper is the proposition of a unique compact data learning (CDL) approach aimed at enhancing model training efficiency. This approach seeks to optimize runtime speed by diminishing the sample size and minimizing data features.
Through the use of reduced sampling to decrease the dataset size and the implementation of a robust comparison and selection procedure for feature reduction methods, we effectively tackled the difficulties associated with training models on expansive datasets. Importantly, our methodology not only boosts runtime efficiency but also has a negligible impact on the original accuracy performance. The outcomes of this study offer valuable insights and pragmatic techniques for improving model training efficiency, particularly in situations involving data volume reduction and feature selection.
This article proceeds in three additional sections. Section 2 explores the preliminaries, primarily presenting the theoretical foundations of our study, including data balancing and the diverse machine learning algorithms previously implemented in existing research [8,14,17,24]. A brief overview of current feature reduction techniques is also provided in this section. Moreover, it introduces a novel feature reduction approach utilizing compact data learning [25,26]. CDL is a simple yet potent tool for downsizing the training dataset in terms of features and/or sample size with no harm to the accuracy of a machine learning system. Section 3 offers the experiment outcomes, comparing various ML algorithms and feature reduction methods with the CDL-adapted feature reduction. Lastly, Section 4 concludes with the performance comparisons for the innovative feature reduction.

2. Preliminaries

Due to the limited access to real-world credit card transaction data from companies, the European credit cardholders dataset, which contains 284,807 transactions, of which 492 (i.e., 0.172% of the total) belong to the fraud category, was applied in this paper. This dataset, which is publicly available on Kaggle, has been widely used in related studies [14,15,21,22,23].
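For reference, the dataset can be loaded and its imbalance inspected with a few lines of Python. This is a minimal sketch, assuming the file name creditcard.csv used in the Kaggle repository, not the authors' exact pipeline.

```python
import pandas as pd

# Load the European credit cardholders dataset (creditcard.csv on Kaggle).
df = pd.read_csv("creditcard.csv")

# The 'Class' column is 1 for fraud and 0 for non-fraud transactions.
counts = df["Class"].value_counts()
print(counts)                                      # 492 fraud samples in total
print(f"Fraud ratio: {counts[1] / len(df):.3%}")   # approximately 0.172%
```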

2.1. Data Balancing

The resampling approach, encompassing both over-sampling and under-sampling methods, is often utilized to mitigate issues of data imbalance. Over-sampling involves generating synthetic samples either by duplicating minority class samples or interpolating between them [27]. Nevertheless, over-sampling via the duplication of minority class samples may amplify the noise present in the data [2]. The Synthetic Minority Oversampling Technique (SMOTE) [27] and ensemble-based sampling approaches, which are typical over-sampling techniques, are found to be highly susceptible to the quality of the synthetically created samples. These techniques might introduce imprecision and lead to unstable model performance, as the learning process becomes overly dependent on the characteristics of the artificially generated samples [2]. On the other hand, under-sampling, targeted at addressing class imbalance in datasets and reducing computational burden for improved efficiency, involves either randomly eliminating samples from the majority class or replacing them with cluster centroids from a subset of samples [2,28]. In alignment with compact data learning (CDL) principles, under-sampling is considered more suitable for enhancing machine learning (ML) training efficiency. Consequently, the random under-sampling technique was applied to reduce the number of non-fraud samples, with the goal of attaining an approximately equal (roughly 50/50) distribution between the fraud and non-fraud classes. As a result of the random under-sampling method, we obtained 550 non-fraud transactions and 492 fraud transactions (see Figure 1).
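A minimal sketch of this random under-sampling step is shown below, reusing the df DataFrame from the loading snippet above; the helper name random_undersample and the fixed seed are illustrative choices, not taken from the paper.

```python
import pandas as pd

def random_undersample(df: pd.DataFrame, n_majority: int = 550,
                       seed: int = 42) -> pd.DataFrame:
    """Keep a random subset of non-fraud rows and all fraud rows."""
    fraud = df[df["Class"] == 1]                       # all 492 fraud samples
    non_fraud = df[df["Class"] == 0].sample(n=n_majority, random_state=seed)
    # Shuffle so the two classes are interleaved before splitting.
    return pd.concat([fraud, non_fraud]).sample(frac=1, random_state=seed)

balanced = random_undersample(df)   # 1042 rows, a roughly 50/50 class split
```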
In this research, the dataset was divided into training and testing (or inference) datasets for training and evaluating the various ML models. The training dataset accounts for 74% of the entire dataset, while the testing dataset accounts for 26% (see Table 1). This study utilized five ML models for model training and analysis: the RF method with AdaBoost (RF + AB), GBDT, KNN, CNN, and SVM. These five algorithms were trained on the optimized and balanced dataset.
Our experiments demonstrate that basic data balancing can boost training time efficiency by up to 24,000 times compared to unbalanced data. It is important to note that unbalanced training datasets can also introduce bias into ML systems. For this reason, unbalanced training datasets should be adjusted to develop a proper ML system, even though reducing data samples may potentially impair key performance metrics (e.g., accuracy, precision, and recall).
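As an illustration, the 74/26 split in Table 1 can be reproduced approximately with scikit-learn; the snippet below is a sketch that reuses the hypothetical balanced DataFrame from the under-sampling step above.

```python
from sklearn.model_selection import train_test_split

# All 30 input features (Time, V1-V28, Amount); 'Class' is the label.
X = balanced.drop(columns=["Class"])
y = balanced["Class"]

# test_size=0.26 yields roughly the sample counts reported in Table 1.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.26, random_state=42, stratify=y)
```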

2.2. Various Machine Learning Models for Credit Card Fraud Detection

In this section, we provide an overview of our research methodology. Various machine learning algorithms were tested to select the best model for credit card fraud detection. Among these models, we chose ensemble-based learning models and traditional machine learning models for our analysis [8,14,15,17]. Five ML algorithms were applied to the same datasets. The following ML models, trained with the balanced datasets, are considered for this research:
  • The random forest (RF) + adaptive boosting (AB) [14] method constructs a stronger classifier by training the random forest model, an ensemble learning model consisting of multiple decision trees built on random feature selection and bootstrap sampling [24]. This combined model adjusts the weights of samples based on the performance of the previous round’s classifier and strengthens the training of misclassified samples in the next round. The pairing of AdaBoost with the RF method enhances its robustness and improves the quality of classification for imbalanced credit card data.
  • The Gradient Boosted Decision Tree (GBDT) [17] model is an ensemble learning algorithm that iteratively trains a series of decision trees to build a powerful predictive model. Previous papers have used fixed-size decision trees as base learners in GBDT to avoid the exponential growth in complexity that unrestricted tree depth would incur.
  • K-Nearest Neighbor (KNN) [8] is a model that involves voting on local neighboring data points to build the classifier function [15,29,30]. The user sets the number of neighbors k; its value is typically initialized heuristically and fine-tuned through iterative evaluation.
  • A Convolutional Neural Network (CNN) [8] is a deep learning method that is widely used for image, text, audio, and time-series data. There are six different layers in the CNN model, namely, the input layer, convolutional layer, pooling layer, fully connected layer, SoftMax/logistic layer, and output layer; hidden layers with the same structure can have different numbers of channels per layer.
  • A Support Vector Machine (SVM) [15] supports both classification and regression tasks. The SVM is known for its capability to derive optimal decision boundaries between classes. However, it is not well suited to datasets exhibiting imbalanced class distribution, noise, and overlapping class samples.
The reproduced results based on the above models with the balanced dataset are presented in Table 2 in Section 3.1. It is noted that the performance results for the above ML models differ from the results reported in the original research because all of the above-mentioned studies were conducted on an unbalanced dataset.
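A minimal sketch of how these five models might be instantiated follows; the hyperparameters are illustrative scikit-learn defaults rather than the settings tuned in the paper (the estimator keyword assumes scikit-learn 1.2+), and the CNN is omitted since it would be built with a separate deep-learning library.

```python
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

models = {
    # RF + AB: AdaBoost boosting a random forest base estimator [14].
    "RF + AB": AdaBoostClassifier(
        estimator=RandomForestClassifier(n_estimators=100), n_estimators=10),
    "GBDT": GradientBoostingClassifier(),       # fixed-size trees as base learners
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {acc:.2%}")
```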

2.3. Various Feature Reduction Methods

Feature selection methods have been widely adopted in addressing high-dimensional problems due to their simplicity and efficiency [31]. Feature selection aids in data understanding, reduces computational demands, mitigates the curse of dimensionality, and enhances predictor performance [32]. The essence of feature selection lies in selecting a subset of input variables that effectively captures the input data while minimizing the influence of noise or irrelevant variables, so as to generate robust predictive outcomes [32,33].
  • Analysis of Variance (ANOVA) is a statistical method used to compare means across different groups by analyzing data variance. It is commonly used in feature selection to aid in inference and decision-making processes. This method has been used in a previous paper [34].
  • The feature importance method is a technique used to evaluate and quantify the importance of features in a machine learning model, which helps the user to understand the critical role of specific features in the predictive performance of a model.
  • The correlation heatmap is a graphical representation that visualizes pairwise correlations between variables in a dataset and is generated based on linear correlation coefficients. In the correlation heatmap, darker blue indicates a stronger negative correlation, while darker red indicates a stronger positive correlation.
  • The linear correlation coefficient is employed to quantify the strength and direction of the linear relationship between two variables [35].
The above four feature reduction methods were employed to select the features for training the machine learning models. The outputs of these methods were compared to determine the optimal feature reduction approach for further training. Resampling techniques were utilized to eliminate redundant data instances from the dataset.
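A minimal sketch of how these selection rules could be applied with scikit-learn and pandas follows; the 0.05 significance level matches Section 3.2, while the zero-score cutoffs and the rounding choice are assumptions made for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif

# ANOVA: keep features whose p-value falls below the 0.05 significance level.
f_scores, p_values = f_classif(X_train, y_train)
anova_features = X_train.columns[p_values < 0.05]

# Feature importance: drop features that a tree ensemble scores at zero.
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
importance_features = X_train.columns[rf.feature_importances_ > 0]

# Correlation with the target: drop features whose coefficient rounds to zero.
corr_with_target = balanced.corr()["Class"].drop("Class")
heatmap_features = corr_with_target[corr_with_target.round(2) != 0].index
```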

2.4. Compact Data Learning

Compact data design for machine learning entails the development of an optimized training dataset that maintains comparable machine learning accuracy while minimizing data volume [25]. Compact data learning (CDL) introduces a novel and applicable structure for enhancing a classification system by reducing the machine learning training data size [26]. Since CDL is an enhanced feature reduction method based on correlation, a correlation heatmap is directly applied to calculate the pair-wise comparison between the input features of the dataset. A correlation heatmap is a visualization instrument utilized to display the intensity of the correlation among variables. The Pearson correlation coefficient serves as a significant method for quantifying the affinity or relationship between multiple data variables [36,37,38]. The correlation score heatmap of all input features in the training dataset is shown in Figure A1 in Appendix A. Originating from the idea of compact data design, which provides optimal resources without the necessity of managing intricate big data, CDL distinguishes itself by offering a general, output-independent structure to optimize the ML training dataset. CDL serves as a specific framework intended to accelerate the machine learning training phase without sacrificing system precision. A typical form of the absolute correlation is given as follows [26]:
\[ r = \frac{\left| \mathbb{E}\left[ (X - \mu_X)(Y - \mu_Y) \right] \right|}{\sqrt{ \mathbb{E}\left[ (X - \mu_X)^{2} \right] \cdot \mathbb{E}\left[ (Y - \mu_Y)^{2} \right] }}, \quad r \in [0, 1], \tag{1} \]
where $\mu_X = \mathbb{E}[X]$ and $\mu_Y = \mathbb{E}[Y]$. The closer the absolute correlation value r is to 1, the higher the correlation; an absolute correlation value close to 0 indicates weak or no correlation between two variables. It is noted that CDL can be easily implemented from the correlation heatmap by using a simple algorithm (see Algorithm A1 in Appendix B). In our research, we employed the CDL-based feature reduction, and the accuracy of the trained models was evaluated using the two-sample Z-test method to determine the significance of the results, which helps us decide whether to accept or reject the outcomes of the model. In the subsequent section, we present and analyze the results of applying the CDL method. Identifying the optimal threshold for diminishing input features could serve as another research subject to enhance CDL. The absolute correlation threshold, denoted as r*, is formally defined as follows [26]:
\[ r^{*} = \operatorname*{arg\,min}_{r} \left\{ \text{True for } H_0 : \mathbb{E}\left[ G(\xi_r) \right] - \mathbb{E}\left[ G(\xi_1) \right] = 0 \right\}, \tag{2} \]
where $H_0$ is the null hypothesis for the two-sample Z-test, the function $G(\xi_r)$ provides the accuracy of a machine learning system, and $\xi_r$ is the revised set of input features selected from the correlation heatmap at the correlation threshold r. It is noted that several ML evaluations are required to obtain the optimal threshold r* from (2), and this threshold is data-dependent. In practice, however, the absolute correlation threshold is conventionally set based on industrial practices [26]. According to our industrial practices, CDL-based feature reduction gives the best performance when r* is around 0.7 to 0.9. Hence, we chose the threshold of the CDL feature reduction from this best practice (i.e., r* = 0.7).
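Equation (2) can be approximated by a simple grid search over candidate thresholds, retraining at each threshold and accepting it whenever the two-sample Z-test retains $H_0$. The sketch below is one plausible reading under the assumption of a pooled-proportion Z-statistic; train_fn, which stands in for a full retrain-and-evaluate cycle, and the candidate grid are hypothetical.

```python
import numpy as np

def two_sample_z(acc_a: float, acc_b: float, n: int) -> float:
    """Z-statistic comparing two accuracy proportions measured on n test samples each."""
    p = (acc_a + acc_b) / 2.0                    # pooled proportion
    se = np.sqrt(p * (1.0 - p) * (2.0 / n))
    return (acc_a - acc_b) / se

def find_threshold(train_fn, base_acc: float, n_test: int,
                   grid=(0.9, 0.8, 0.7, 0.6), z_crit: float = 1.96) -> float:
    """Return the smallest threshold r whose reduced feature set keeps H0 accepted."""
    r_star = 1.0                                 # r = 1 keeps every feature
    for r in grid:                               # progressively stricter reduction
        acc_r = train_fn(r)                      # accuracy with features kept at threshold r
        if abs(two_sample_z(acc_r, base_acc, n_test)) < z_crit:
            r_star = r                           # H0 accepted: keep shrinking
        else:
            break                                # significant accuracy loss: stop
    return r_star
```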

2.5. Performance Measures

The performance of the selected models was evaluated using a performance matrix, which compared the actual observations with the model predictions. The performance matrix encompassed metrics such as accuracy, precision, recall, and F1-score. The metrics were calculated across different classes: True Positives (TPs) refer to the number of correctly classified positive instances, while True Negatives (TNs) represent the number of correctly classified negative instances. False Positives (FPs) indicate the number of instances that are falsely classified as positive, and False Negatives (FNs) denote the instances that are falsely classified as negative [18,39]. Let N represent the total number of samples, and the evaluation metrics can be expressed using the following formulas:
\[ \mathrm{Accuracy} = \frac{N_{TP} + N_{TN}}{N}, \tag{3} \]
\[ \mathrm{Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}}, \tag{4} \]
\[ \mathrm{Recall} = \frac{N_{TP}}{N_{TP} + N_{FN}}, \tag{5} \]
\[ \mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{6} \]
Accuracy is the most commonly used and simplest model evaluation metric, providing a direct measure of the proportion of correctly classified samples. Precision measures the proportion of true positive samples among all samples predicted positive by the model. Recall measures the proportion of true positive samples among all actual positive samples. The F1-score is the harmonic mean of precision and recall, combining both into a single measure of performance. In this study, these four metrics were utilized to comprehensively evaluate the ML models, and their respective scores are presented to provide objective measures of model performance and aid in decision-making. Furthermore, we continuously examined accuracy in the application of the CDL method to ensure the performance level of the models.
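Equations (3)–(6) correspond directly to standard scikit-learn metrics; the following minimal sketch, which reuses the hypothetical models dictionary from Section 2.2, shows how the scores could be computed.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_pred = models["RF + AB"].predict(X_test)

# Counts of TN/FP/FN/TP from the confusion matrix (label 1 = fraud).
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

print(f"Accuracy:  {accuracy_score(y_test, y_pred):.2%}")   # (TP + TN) / N
print(f"Precision: {precision_score(y_test, y_pred):.2%}")  # TP / (TP + FP)
print(f"Recall:    {recall_score(y_test, y_pred):.2%}")     # TP / (TP + FN)
print(f"F1-score:  {f1_score(y_test, y_pred):.3f}")         # harmonic mean of the two
```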

3. Experiment Results

The upcoming discussion presents the optimization results of the five machine learning algorithms detailed in Section 2.2. Section 3.1 illustrates the performance metrics, namely, accuracy, precision, recall, and F1-score, as well as the execution time of the algorithms when trained on a balanced dataset and evaluated on the testing dataset. Advancing to Section 3.2, three feature reduction methods, specifically ANOVA, feature importance, and correlation heatmap, were implemented to generate accuracy results and determine the most effective feature reduction method. In conclusion, as detailed in Section 3.3, we applied the feature reduction method chosen in Section 3.2 to conduct feature filtering during training, aiming to confirm the acceptability of the accuracy results. Simultaneously, we also recorded the execution time performance of the ML algorithms.

3.1. Result Comparisons for ML Algorithms

The following discussion presents the performance results on the test dataset. Displayed in Table 2 are the original performance results for the five machine learning models. Notably, all models achieved performance scores ranging from 91.83% to 94.61% across the evaluation metrics. It is noted that the RF + AB model achieved the highest performance across all accuracy-related indicators, while the SVM provided the best performance in terms of training time. Among all models, the CNN model exhibited the longest running time, requiring 31.3 s to complete.

3.2. Result Comparisons for Feature Reduction

Based on the accuracy results shown in Table 2 in the previous section, the ANOVA, feature importance, and correlation heatmap methods were utilized to reduce the number of features in the dataset, and the training results of the models are presented in Table 3. In ANOVA, a significance level ( α ) of 0.05 , representing a 95 % confidence level, was selected. Subsequently, features with p-values lower than the established significance level were selected for model training. In feature importance analysis, features with a score of zero were removed as they have no predictive capability for the target variable. Similarly, in the correlation heatmap, features with a score of zero were not selected because they show no significant correlation with the target variable.
As shown in Table 3, the accuracy scores of the correlation heatmap group were consistently higher than or equal to those of the original group. In contrast, the ANOVA group and the feature importance group each have one or two accuracy scores that decreased. Consequently, the correlation heatmap approach was adopted for further model training.

3.3. Feature Reduction with CDL

The accuracy results obtained using different correlation threshold limits for feature selection in machine learning model training are showcased in Table 4, with the correlation threshold between each feature represented in the correlation heatmap in Figure A1. In this study, the null hypothesis assumes no significant accuracy difference between the two samples, at a significance level of 0.05. The findings indicate that when the correlation threshold r* is 0.7, the GBDT model showed a slight accuracy improvement, while the other four models experienced a decrease in accuracy. By applying the Z-test method, it was found that only the KNN model had a Z-score exceeding the critical value, leading to the rejection of the null hypothesis; the accuracy of the KNN model is therefore not acceptable. For the remaining models (GBDT, RF + AB, SVM, and CNN), no significant accuracy difference was found, leading to the acceptance of the null hypothesis and indicating that their accuracy remains acceptable despite the decreases for RF + AB, SVM, and CNN (i.e., no significant differences). The training times of all models were shorter than their original times, indicating a reduction in running time achieved through feature reduction (see Table 4).
Figure 2 presents the quantity and ratio of features prior to any feature removal, subsequent to feature selection using the heatmap, and following feature selection using the CDL technique. The CDL-based feature reduction is theoretically grounded in the absolute correlation and is easily derived from the correlation heatmap, yet according to our experiments, this simple method is more practical, and its efficiency is even better than that of the correlation heatmap approach (see Figure 2). Before feature selection, the original dataset consisted of 30 features. After applying heatmap-based feature selection, 24 features were selected, accounting for 80% of the total features. Employing the CDL technique for feature selection resulted in the selection of 18 features, representing 60% of the total features. Using the 60% of features selected by CDL, we trained the models and achieved accuracy performance comparable to that with the initial 30 features, as indicated in Table 4. These results highlight the potential effectiveness of our proposed CDL technique in reducing features while maintaining the desired accuracy performance.

4. Conclusions

This research deals with the development of a credit card fraud detection system that leverages machine learning and innovatively incorporates the compact data learning (CDL) method for feature reduction. The CDL method not only reduces the sample size through data balancing but also optimizes the feature size, all while maintaining the accuracy of the fraud detection system. The simplicity and universality of the CDL method make it an ideal choice for almost any machine learning training process. This study serves as a guide for adopting this cutting-edge method to boost ML training efficiency for credit card fraud detection. By integrating the CDL method into machine learning training, we can create more robust and reliable models. This not only improves the accuracy of fraud detection but also reduces the computational complexity, thus delivering quicker and more reliable outcomes. Therefore, this research offers a promising direction in the field of credit card fraud detection, demonstrating that the CDL method could be a game-changer in improving the detection of fraudulent activities, thereby enhancing the security of credit card transactions. While numerous challenges remain to be addressed in future research, this framework shows promise for application across various multidisciplinary domains, including economic criminology, accountancy, and legal matters. This research opens new avenues for enhancing the efficiency and effectiveness of fraud detection systems. Scholars active in these areas might find the research particularly beneficial, recognizing the broader context that could provide potential topics for our future research.

Author Contributions

Conceptualization, S.-K.K.; methodology, S.-K.K.; software, X.F.; data reshaping, X.F.; writing—original draft, S.-K.K. and X.F.; writing—review and editing, S.-K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Macao Polytechnic University (MPU) under Grant RP/FCA-04/2023.

Data Availability Statement

The datasets used for the current study are available in the Kaggle repository (https://www.kaggle.com/, accessed on 30 January 2024) [14,17,21,22,23].

Acknowledgments

This paper was revised by using AI/ML-assisted tools. The authors are thankful to the referees, whose comments were very constructive.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Correlation Heatmap

A correlation heatmap illuminates the relationships between variables in a dataset. By transforming linear correlation coefficients into distinct shades of blue and red, it offers an intuitive understanding of negative and positive correlations, respectively.
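For illustration, a figure in the style of Figure A1 can be produced with seaborn; this minimal sketch assumes the hypothetical balanced DataFrame from Section 2.1.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Pairwise linear correlation of the 30 input features plus the Class label.
corr = balanced.corr()

plt.figure(figsize=(12, 10))
sns.heatmap(corr, cmap="coolwarm", center=0, vmin=-1, vmax=1, square=True)
plt.title("Correlation heatmap of the European credit cardholders dataset")
plt.tight_layout()
plt.show()
```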
Figure A1. Correlation heatmap based on the European credit cardholders dataset.

Appendix B. Algorithm for CDL Feature Reduction

The seamless integration of CDL with the correlation heatmap is worth noting. This synergy is achieved through the implementation of a simple algorithm, making the process of uncovering data correlations both efficient and effective. This combination enhances the power and utility of data analysis, making it an appealing option for data-driven investigations.
Algorithm A1: CDL-based feature reduction algorithm
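Since the algorithm is published as a figure and is not reproduced here, the following Python sketch gives one plausible reading of Algorithm A1: every pair of input features whose absolute correlation exceeds the threshold r* loses one of its two members. The function name and the tie-breaking rule (keep the feature listed first) are assumptions, not the authors' exact pseudocode.

```python
import pandas as pd

def cdl_feature_reduction(X: pd.DataFrame, r_star: float = 0.7) -> list:
    """Drop one feature from every pair whose absolute correlation exceeds r_star."""
    corr = X.corr().abs()                 # pairwise absolute correlation, Equation (1)
    keep = list(X.columns)
    for i, a in enumerate(X.columns):
        for b in X.columns[i + 1:]:
            # When two retained features are highly correlated, discard the latter.
            if a in keep and b in keep and corr.loc[a, b] > r_star:
                keep.remove(b)
    return keep

reduced = cdl_feature_reduction(X_train)  # e.g., 18 of 30 features at r* = 0.7
```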

References

  1. Makki, S.; Assaghir, Z.; Taher, Y.; Haque, R.; Hacid, M.S.; Zeineddine, H. An Experimental Study with Imbalanced Classification Approaches for Credit Card Fraud Detection. IEEE Access 2019, 7, 93010–93022. [Google Scholar] [CrossRef]
  2. Ghaleb, F.A.; Saeed, F.; Al-Sarem, M.; Qasem, S.N.; Al-Hadhrami, T. Ensemble Synthesized Minority Oversampling-Based Generative Adversarial Networks and Random Forest Algorithm for Credit Card Fraud Detection. IEEE Access 2023, 11, 89694–89710. [Google Scholar] [CrossRef]
  3. Tingfei, H.; Guangquan, C.; Kuihua, H. Using Variational Auto Encoding in Credit Card Fraud Detection. IEEE Access 2020, 8, 149841–149853. [Google Scholar] [CrossRef]
  4. Salazar, A.; Safont, G.; Vergara, L. Semi-supervised learning for imbalanced classification of credit card transaction. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–7. [Google Scholar]
  5. Nilson Report 2019; HSN Consultants, Inc.: Santa Barbara, CA, USA, 2019.
  6. Nilson Report 2021; HSN Consultants, Inc.: Santa Barbara, CA, USA, 2021.
  7. Mullen, C. Card Industry Faces $400B in Fraud Losses Over Next Decade. 2022. Available online: https://www.paymentsdive.com/news/card-industry-faces-400b-in-fraud-losses-over-next-decade-nilson-says/611521/ (accessed on 31 January 2024).
  8. Alarfaj, F.K.; Malik, I.; Khan, H.U.; Almusallam, N.; Ramzan, M.; Ahmed, M. Credit Card Fraud Detection Using State-of-the-Art Machine Learning and Deep Learning Algorithms. IEEE Access 2022, 10, 39700–39715. [Google Scholar] [CrossRef]
  9. Dornadula, V.; Geetha, S. Credit Card Fraud Detection Using Machine Learning Algorithms. Procedia Comput. Sci. 2019, 165, 631–641. [Google Scholar] [CrossRef]
  10. Nguyen, N.; Duong, T.; Chau, T.; Nguyen, V.H.; Trinh, T.; Tran, D.; Ho, T. A Proposed Model for Card Fraud Detection Based on CatBoost and Deep Neural Network. IEEE Access 2022, 10, 96852–96861. [Google Scholar] [CrossRef]
  11. Intuit Inc. 25 Credit Card Fraud Statistics to Know in 2021; Intuit Inc.: Mountain View, CA, USA, 2022. [Google Scholar]
  12. Button, M.; Hock, B.; Shepherd, D. Economic Crime: From Conception to Response, 1st ed.; Routledge: London, UK, 2022. [Google Scholar]
  13. Hock, B.; Button, M. Non-Ideal Victims or Offenders? The Curious Case of Pyramid Scheme Participants. Vict. Offend. 2023, 18, 1311–1334. [Google Scholar] [CrossRef]
  14. Ileberi, E.; Sun, Y.; Wang, Z. Performance Evaluation of Machine Learning Methods for Credit Card Fraud Detection Using SMOTE and AdaBoost. IEEE Access 2021, 9, 165286–165294. [Google Scholar] [CrossRef]
  15. Kalid, S.N.; Ng, K.H.; Tong, G.K.; Khor, K.C. A Multiple Classifiers System for Anomaly Detection in Credit Card Data with Unbalanced and Overlapped Classes. IEEE Access 2020, 8, 28210–28221. [Google Scholar] [CrossRef]
  16. Taha, A.A.; Malebary, S.J. An Intelligent Approach to Credit Card Fraud Detection Using an Optimized Light Gradient Boosting Machine. IEEE Access 2020, 8, 25579–25587. [Google Scholar] [CrossRef]
  17. Alam, T.M.; Shaukat, K.; Hameed, I.A.; Luo, S.; Sarwar, M.U.; Shabbir, S.; Li, J.; Khushi, M. An Investigation of Credit Card Default Prediction in the Imbalanced Datasets. IEEE Access 2020, 8, 201173–201198. [Google Scholar] [CrossRef]
  18. Muslim, M.; Nikmah, T.; Pertiwi, D.A.A.; Subhan.; Unjung, J.; Yosza, D.; Iswanto. New Model Combination Meta-learner to Improve Accuracy Prediction P2P Lending with Stacking Ensemble Learning. Intell. Syst. Appl. 2023, 18, 200–204. [Google Scholar] [CrossRef]
  19. Madaan, M.; Kumar, A.; Keshri, C.; Jain, R.; Nagrath, P. Loan default prediction using decision trees and random forest: A comparative study. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1022, 012042. [Google Scholar] [CrossRef]
  20. Butaru, F.; Chen, Q.; Clark, B.; Das, S.; Lo, A.W.; Siddique, A. Risk and risk management in the credit card industry. J. Bank. Financ. 2016, 72, 218–239. [Google Scholar] [CrossRef]
  21. Rajora, S.; Li, D.L.; Jha, C.; Bharill, N.; Patel, O.P.; Joshi, S.; Puthal, D.; Prasad, M. A comparative study of machine learning techniques for credit card fraud detection based on time variance. In Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India, 18–21 November 2018; pp. 1958–1963. [Google Scholar]
  22. Tanouz, D.; Subramanian, R.R.; Eswar, D.; Reddy, G.V.P.; Kumar, A.R.; Praneeth, C.V.N.M. Credit card fraud detection using machine learning. In Proceedings of the 2020 4th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 13–15 May 2020; pp. 967–972. [Google Scholar]
  23. El hlouli, F.Z.; Riffi, J.; Mahraz, M.A.; El Yahyaouy, A.; Tairi, H. Credit card fraud detection based on multilayer perceptron and extreme learning machine architectures. In Proceedings of the 2020 International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 9–11 June 2020. [Google Scholar]
  24. Randhawa, K.; Loo, C.K.; Seera, M.; Lim, C.P.; Nandi, A.K. Credit Card Fraud Detection Using AdaBoost and Majority Voting. IEEE Access 2018, 6, 14277–14284. [Google Scholar] [CrossRef]
  25. Kim, S.K. Toward compact data from big data. In Proceedings of the 2020 15th International Conference for Internet Technology and Secured Transactions (ICITST), London, UK, 8–10 December 2020; pp. 1–5. [Google Scholar]
  26. Kim, S.K. Compact Data Learning For ML Classification. Axioms 2024, 13, 137. [Google Scholar] [CrossRef]
  27. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Int. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  28. Fernandez, A.; Garcia, S.; Galar, M.; Prati, R.C.; Krawczyk, B.; Herrera, F. Learning from Imbalanced Data Sets; Springer: Cham, Switzerland, 2018. [Google Scholar]
  29. Maimon, O.; Rokach, L. Data Mining and Knowledge Discovery Handbook; Springer: New York, NY, USA, 2010. [Google Scholar]
  30. Taghizadeh-Mehrjardi, R.; Nabiollahi, K.; Minasny, B.; Triantafilis, J. Comparing data mining classifiers to predict spatial distribution of USDA-family soil groups in Baneh region, Iran. Geoderma 2015, 253–254, 67–77. [Google Scholar] [CrossRef]
  31. Akogul, S. A Novel Approach to Increase the Efficiency of Filter-Based Feature Selection Methods in High-Dimensional Datasets With Strong Correlation Structure. IEEE Access 2023, 11, 115025–115032. [Google Scholar] [CrossRef]
  32. Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28. [Google Scholar] [CrossRef]
  33. Chuang, L.Y.; Chang, H.W.; Tu, C.J.; Yang, C.H. Improved binary PSO for feature selection using gene expression data. Comput. Biol. Chem. 2008, 32, 29–38. [Google Scholar] [CrossRef] [PubMed]
  34. Wu, C.; Yan, Y.; Cao, Q.; Fei, F.; Yang, D.; Lu, X.; Xu, B.; Zeng, H.; Song, A. sEMG Measurement Position and Feature Optimization Strategy for Gesture Recognition Based on ANOVA and Neural Networks. IEEE Access 2020, 8, 56290–56299. [Google Scholar] [CrossRef]
  35. Biesiada, J.; Duch, W.l. Feature Selection for High-Dimensional Data—A Pearson Redundancy Based Filter; Springer: Berlin/Heidelberg, Germany, 2007; pp. 242–249. [Google Scholar]
  36. Zhu, H.; You, X.; Liu, S. Multiple Ant Colony Optimization Based on Pearson Correlation Coefficient. IEEE Access 2019, 7, 61628–61638. [Google Scholar] [CrossRef]
  37. Benesty, J.; Chen, J.; Huang, Y.; Cohen, I. Pearson Correlation Coefficient; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–4. [Google Scholar]
  38. Adler, J.; Parmryd, I. Quantifying colocalization by correlation: The Pearson correlation coefficient is superior to the Mander’s overlap coefficient. Cytom. Part A 2010, 77A, 733–742. [Google Scholar] [CrossRef]
  39. Al-Asadi, M.A.; Tasdemír, S. Empirical Comparisons for Combining Balancing and Feature Selection Strategies for Characterizing Football Players Using FIFA Video Game System. IEEE Access 2021, 9, 149266–149286. [Google Scholar] [CrossRef]
Figure 1. Data balancing of credit card fraud dataset; (a) original dataset; (b) balanced dataset.
Figure 2. Proportion of feature reduction using different correlation techniques.
Table 1. Training and testing transaction samples.

Dataset/Classification    Class 0 (Non-Fraud)    Class 1 (Fraud)
Training                  400                    369
Testing                   150                    123
Table 2. Performance results of various machine learning algorithms.

Algorithm       Accuracy (%)   Precision (%)   Recall (%)   F1-Score   Training Time (s)
RF + AB [14]    94.14          94.61           93.72        0.940      0.345
GBDT [17]       93.41          93.86           92.98        0.933      0.937
SVM [15]        93.77          94.49           93.24        0.936      0.041
CNN [8]         92.67          93.87           91.94        0.925      31.30
KNN [8]         92.31          92.80           91.83        0.922      0.076
Table 3. Accuracy comparisons of different feature reduction methods.

Algorithm       Original (%)   ANOVA (%)   Feat. Imp. (%)   Heatmap (%)
RF + AB [14]    94.14          93.40       93.68            94.14
SVM [15]        93.77          93.04       93.77            93.77
GBDT [17]       93.41          93.77       94.04            94.87
CNN [8]         92.67          94.87       92.67            93.41
KNN [8]         92.31          92.31       92.31            93.77
Table 4. Performance comparisons of different correlation threshold limits.

                               CDL (r* = 0.7, α = 0.05)
Algorithm       Original (%)   Accuracy (%)   H0 Accepted?   Training Time (s)
GBDT [17]       93.41          93.77          Yes            0.578
RF + AB [14]    94.14          92.67          Yes            0.343
SVM [15]        93.77          91.94          Yes            0.040
CNN [8]         92.67          90.84          Yes            21.834
KNN [8]         92.31          87.18          No             0.058

