A Machine Learning Method with Hybrid Feature Selection for Improved Credit Card Fraud Detection
Abstract
1. Introduction
- First, the information gain (IG) technique is used for initial feature selection: it ranks the features in the credit card dataset, and only the top-ranked features are fed into the genetic algorithm (GA) wrapper, which reduces the search space and enhances classification performance.
- Second, the GA wrapper selects the feature subset that yields the best classification performance, with the extreme learning machine (ELM) as the learning algorithm inside the wrapper.
- Third, the G-mean replaces the conventional accuracy criterion as the fitness function in the GA wrapper, so that the recognition rate of the minority-class samples is taken into account and improved.
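The filter stage and the fitness criterion named in the contributions above can be sketched as follows. This is a minimal, pure-Python illustration rather than the authors' implementation: the equal-width binning used to discretize numeric features, the `bins` default, and the helper names `entropy`, `information_gain`, `rank_features`, and `g_mean` are all assumptions made for the example. G-mean is taken in its standard form, the square root of sensitivity times specificity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, bins=10):
    """Information gain of one numeric feature with respect to the labels.

    Assumption: the feature is discretized into `bins` equal-width bins.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature: one bin, zero gain
    groups = {}
    for v, y in zip(values, labels):
        idx = min(int((v - lo) / width), bins - 1)
        groups.setdefault(idx, []).append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels, k, bins=10):
    """Return the names of the k features with the highest information gain.

    `columns` maps a feature name to its list of values (hypothetical layout).
    """
    ranked = sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels, bins),
        reverse=True,
    )
    return ranked[:k]


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity (positive class = 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)
```

Because G-mean is a product of the two per-class recognition rates, a classifier that labels every transaction as legitimate scores 0 even though its accuracy on a heavily imbalanced dataset would be close to 1.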
2. Related Works
3. Materials and Methods
3.1. Credit Card Dataset
3.2. Information Gain
3.3. Genetic Algorithm
3.4. Extreme Learning Machine
4. Proposed Credit Card Fraud-Detection Approach
Algorithm 1. Proposed IG-GAW
5. Results and Discussion
5.1. Performance of the ELM Classifier with Filter, Wrapper, and Hybrid Feature Selection Methods
5.2. Performance Comparison with Baseline Classifiers and Recent Literature
5.3. Discussions
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Femila Roseline, J.; Naidu, G.; Samuthira Pandi, V.; Alamelu alias Rajasree, S.; Mageswari, N. Autonomous credit card fraud detection using machine learning approach. Comput. Electr. Eng. 2022, 102, 108132.
- Alharbi, A.; Alshammari, M.; Okon, O.D.; Alabrah, A.; Rauf, H.T.; Alyami, H.; Meraj, T. A Novel text2IMG Mechanism of Credit Card Fraud Detection: A Deep Learning Approach. Electronics 2022, 11, 756.
- Bin Sulaiman, R.; Schetinin, V.; Sant, P. Review of Machine Learning Approach on Credit Card Fraud Detection. Hum.-Centric Intell. Syst. 2022, 2, 55–68.
- Wang, D.; Chen, B.; Chen, J. Credit card fraud detection strategies with consumer incentives. Omega 2019, 88, 179–195.
- Nandi, A.K.; Randhawa, K.K.; Chua, H.S.; Seera, M.; Lim, C.P. Credit card fraud detection using a hierarchical behavior-knowledge space model. PLoS ONE 2022, 17, e0260579.
- Ileberi, E.; Sun, Y.; Wang, Z. Performance Evaluation of Machine Learning Methods for Credit Card Fraud Detection Using SMOTE and AdaBoost. IEEE Access 2021, 9, 165286–165294.
- Rtayli, N.; Enneya, N. Enhanced credit card fraud detection based on SVM-recursive feature elimination and hyper-parameters optimization. J. Inf. Secur. Appl. 2020, 55, 102596.
- Oo, M.C.M.; Thein, T. An efficient predictive analytics system for high dimensional big data. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 1521–1532.
- Huebner, J.; Fleisch, E.; Ilic, A. Assisting mental accounting using smartphones: Increasing the salience of credit card transactions helps consumer reduce their spending. Comput. Hum. Behav. 2020, 113, 106504.
- Pudjihartono, N.; Fadason, T.; Kempa-Liehr, A.W.; O’Sullivan, J.M. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. Front. Bioinform. 2022, 2, 927312.
- de-la-Bandera, I.; Palacios, D.; Mendoza, J.; Barco, R. Feature Extraction for Dimensionality Reduction in Cellular Networks Performance Analysis. Sensors 2020, 20, 6944.
- Bouaguel, W. A New Approach for Wrapper Feature Selection Using Genetic Algorithm for Big Data. In Intelligent and Evolutionary Systems; Springer: Cham, Switzerland, 2016; pp. 75–83.
- Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28.
- Bashir, S.; Khattak, I.U.; Khan, A.; Khan, F.H.; Gani, A.; Shiraz, M. A Novel Feature Selection Method for Classification of Medical Data Using Filters, Wrappers, and Embedded Approaches. Complexity 2022, 2022, e8190814.
- Kumar, A.; Bhatia, M.P.S.; Sangwan, S.R. Rumour detection using deep learning and filter-wrapper feature selection in benchmark twitter dataset. Multimed. Tools Appl. 2022, 81, 34615–34632.
- Wang, F.; Lu, X.; Chang, X.; Cao, X.; Yan, S.; Li, K.; Duić, N.; Shafie-khah, M.; Catalão, J.P. Household profile identification for behavioral demand response: A semi-supervised learning approach using smart meter data. Energy 2022, 238, 121728.
- Wang, Z.; Gao, S.; Zhou, M.; Sato, S.; Cheng, J.; Wang, J. Information-Theory-based Nondominated Sorting Ant Colony Optimization for Multiobjective Feature Selection in Classification. IEEE Trans. Cybern. 2022, 1–14.
- Rasool, A.; Tao, R.; Kamyab, M.; Hayat, S. GAWA–A Feature Selection Method for Hybrid Sentiment Classification. IEEE Access 2020, 8, 191850–191861.
- Ileberi, E.; Sun, Y.; Wang, Z. A machine learning based credit card fraud detection using the GA algorithm for feature selection. J. Big Data 2022, 9, 24.
- Al-Ahmad, B.; Al-Zoubi, A.M.; Abu Khurma, R.; Aljarah, I. An Evolutionary Fake News Detection Method for COVID-19 Pandemic Information. Symmetry 2021, 13, 1091.
- Soumaya, Z.; Drissi Taoufiq, B.; Benayad, N.; Yunus, K.; Abdelkrim, A. The detection of Parkinson disease using the genetic algorithm and SVM classifier. Appl. Acoust. 2021, 171, 107528.
- Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: A new learning scheme of feedforward neural networks. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541), Budapest, Hungary, 25–29 July 2004; Volume 2, pp. 985–990.
- Han, S.; Zhu, K.; Zhou, M.; Cai, X. Competition-Driven Multimodal Multiobjective Optimization and Its Application to Feature Selection for Credit Card Fraud Detection. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 7845–7857.
- Malik, E.F.; Khaw, K.W.; Belaton, B.; Wong, W.P.; Chew, X. Credit Card Fraud Detection Using a New Hybrid Machine Learning Architecture. Mathematics 2022, 10, 1480.
- Zioviris, G.; Kolomvatsos, K.; Stamoulis, G. Credit card fraud detection using a deep learning multistage model. J. Supercomput. 2022, 78, 14571–14596.
- Alarfaj, F.K.; Malik, I.; Khan, H.U.; Almusallam, N.; Ramzan, M.; Ahmed, M. Credit Card Fraud Detection Using State-of-the-Art Machine Learning and Deep Learning Algorithms. IEEE Access 2022, 10, 39700–39715.
- Van Belle, R.; Van Damme, C.; Tytgat, H.; De Weerdt, J. Inductive Graph Representation Learning for fraud detection. Expert Syst. Appl. 2022, 193, 116463.
- Esenogho, E.; Mienye, I.D.; Swart, T.G.; Aruleba, K.; Obaido, G. A Neural Network Ensemble with Feature Engineering for Improved Credit Card Fraud Detection. IEEE Access 2022, 10, 16400–16407.
- Zhang, Y.-F.; Lu, H.-L.; Lin, H.-F.; Qiao, X.-C.; Zheng, H. The Optimized Anomaly Detection Models Based on an Approach of Dealing with Imbalanced Dataset for Credit Card Fraud Detection. Mob. Inf. Syst. 2022, 2022, e8027903.
- Ala’raj, M.; Abbod, M.F.; Majdalawieh, M.; Jum’a, L. A deep learning model for behavioural credit scoring in banks. Neural Comput. Appl. 2022, 34, 5839–5866.
- Zhang, X.; Yu, L.; Yin, H.; Lai, K.K. Integrating data augmentation and hybrid feature selection for small sample credit risk assessment with high dimensionality. Comput. Oper. Res. 2022, 146, 105937.
- Yang, Y.; Fan, C.; Chen, L.; Xiong, H. IPMOD: An efficient outlier detection model for high-dimensional medical data streams. Expert Syst. Appl. 2022, 191, 116212.
- Chaquet-Ulldemolins, J.; Gimeno-Blanes, F.-J.; Moral-Rubio, S.; Muñoz-Romero, S.; Rojo Álvarez, J.-L. On the Black-Box Challenge for Fraud Detection Using Machine Learning (I): Linear Models and Informative Feature Selection. Appl. Sci. 2022, 12, 3328.
- Al-Yaseen, W.L.; Idrees, A.K.; Almasoudy, F.H. Wrapper feature selection method based differential evolution and extreme learning machine for intrusion detection system. Pattern Recognit. 2022, 132, 108912.
- Beheshti, Z. BMPA-TVSinV: A Binary Marine Predators Algorithm using time-varying sine and V-shaped transfer functions for wrapper-based feature selection. Knowl.-Based Syst. 2022, 252, 109446.
- Prashanth, S.K.; Shitharth, S.; Praveen Kumar, B.; Subedha, V.; Sangeetha, K. Optimal Feature Selection Based on Evolutionary Algorithm for Intrusion Detection. SN Comput. Sci. 2022, 3, 439.
- Xue, X.; Yao, M.; Wu, Z. A novel ensemble-based wrapper method for feature selection using extreme learning machine and genetic algorithm. Knowl. Inf. Syst. 2018, 57, 389–412.
- Salazar, A.; Safont, G.; Rodriguez, A.; Vergara, L. Combination of multiple detectors for credit card fraud detection. In Proceedings of the 2016 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Limassol, Cyprus, 12–14 December 2016; pp. 138–143.
- Vergara, L.; Salazar, A.; Belda, J.; Safont, G.; Moral, S.; Iglesias, S. Signal processing on graphs for improving automatic credit card fraud detection. In Proceedings of the 2017 International Carnahan Conference on Security Technology (ICCST), Madrid, Spain, 23–26 October 2017; pp. 1–6.
- Mienye, I.D.; Sun, Y. A Deep Learning Ensemble With Data Resampling for Credit Card Fraud Detection. IEEE Access 2023, 11, 30628–30638.
- Gkikas, D.C.; Theodoridis, P.K.; Beligiannis, G.N. Enhanced Marketing Decision Making for Consumer Behaviour Classification Using Binary Decision Trees and a Genetic Algorithm Wrapper. Informatics 2022, 9, 45.
- Mabdeh, A.N.; Al-Fugara, A.; Ahmadlou, M.; Al-Adamat, R.; Al Shabeeb, A.R. GIS-based landslide susceptibility assessment and mapping in Ajloun and Jerash governorates in Jordan using genetic algorithm-based ensemble models. Acta Geophys. 2022, 70, 1253–1267.
- Tao, P.; Sun, Z.; Sun, Z. An Improved Intrusion Detection Algorithm Based on GA and SVM. IEEE Access 2018, 6, 13624–13631.
- Kasongo, S.M. An Advanced Intrusion Detection System for IIoT Based on GA and Tree Based Algorithms. IEEE Access 2021, 9, 113199–113212.
- Credit Card Fraud Detection. Available online: https://kaggle.com/mlg-ulb/creditcardfraud (accessed on 26 October 2021).
- Lin, T.-H.; Jiang, J.-R. Credit Card Fraud Detection with Autoencoder and Probabilistic Random Forest. Mathematics 2021, 9, 2683.
- Mienye, I.D.; Obaido, G.; Aruleba, K.; Dada, O.A. Enhanced Prediction of Chronic Kidney Disease Using Feature Selection and Boosted Classifiers. In Intelligent Systems Design and Applications; Springer: Cham, Switzerland, 2022; pp. 527–537.
- Alhaj, T.A.; Siraj, M.M.; Zainal, A.; Elshoush, H.T.; Elhaj, F. Feature Selection Using Information Gain for Improved Structural-Based Alert Correlation. PLoS ONE 2016, 11, e0166017.
- Ebiaredoh-Mienye, S.A.; Swart, T.G.; Esenogho, E.; Mienye, I.D. A Machine Learning Method with Filter-Based Feature Selection for Improved Prediction of Chronic Kidney Disease. Bioengineering 2022, 9, 350.
- Katoch, S.; Chauhan, S.S.; Kumar, V. A review on genetic algorithm: Past, present, and future. Multimed. Tools Appl. 2021, 80, 8091–8126.
- Schulte, R.V.; Prinsen, E.C.; Hermens, H.J.; Buurke, J.H. Genetic Algorithm for Feature Selection in Lower Limb Pattern Recognition. Front. Robot. AI 2021, 8, 710806. Available online: https://www.frontiersin.org/articles/10.3389/frobt.2021.710806 (accessed on 23 November 2022).
- Kalita, K.; Dey, P.; Haldar, S.; Gao, X.-Z. Optimizing frequencies of skew composite laminates with metaheuristic algorithms. Eng. Comput. 2020, 36, 741–761.
- Jovanovic, D.; Antonijevic, M.; Stankovic, M.; Zivkovic, M.; Tanaskovic, M.; Bacanin, N. Tuning Machine Learning Models Using a Group Search Firefly Algorithm for Credit Card Fraud Detection. Mathematics 2022, 10, 2272.
- Prasetiyowati, M.I.; Maulidevi, N.U.; Surendro, K. Determining threshold value on information gain feature selection to increase speed and prediction accuracy of random forest. J. Big Data 2021, 8, 84.
- Xie, J.; Wang, M.; Xu, S.; Huang, Z.; Grant, P.W. The Unsupervised Feature Selection Algorithms Based on Standard Deviation and Cosine Similarity for Genomic Data Analysis. Front. Genet. 2021, 12, 684100. Available online: https://www.frontiersin.org/article/10.3389/fgene.2021.684100 (accessed on 15 January 2022).
- Van Hulse, J.; Khoshgoftaar, T.M.; Napolitano, A.; Wald, R. Threshold-based feature selection techniques for high-dimensional bioinformatics data. Netw. Model. Anal. Health Inform. Bioinform. 2012, 1, 47–61.
- Theodoridis, P.K.; Gkikas, D.C. Optimal Feature Selection for Decision Trees Induction Using a Genetic Algorithm Wrapper—A Model Approach. In Strategic Innovative Marketing and Tourism; Springer: Cham, Switzerland, 2020; pp. 583–591.
- Kumar, A.; Sinha, N.; Bhardwaj, A. A novel fitness function in genetic programming for medical data classification. J. Biomed. Inform. 2020, 112, 103623.
- Mienye, I.D.; Sun, Y. Effective Feature Selection for Improved Prediction of Heart Disease. In Pan-African Artificial Intelligence and Smart Systems; Springer: Cham, Switzerland, 2022; pp. 94–107.
- Costa-Carrapiço, I.; Raslan, R.; González, J.N. A systematic review of genetic algorithm-based multi-objective optimisation for building retrofitting strategies towards energy efficiency. Energy Build. 2020, 210, 109690.
- Maghawry, A.; Hodhod, R.; Omar, Y.; Kholief, M. An approach for optimizing multi-objective problems using hybrid genetic algorithms. Soft Comput. 2021, 25, 389–405.
- Blank, J.; Deb, K. A Running Performance Metric and Termination Criterion for Evaluating Evolutionary Multi- and Many-objective Optimization Algorithms. In Proceedings of the 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 19–24 July 2020; pp. 1–8.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Schapire, R.E. A brief introduction to boosting. IJCAI 1999, 99, 1401–1406.
- Cramer, J.S. The Origins of Logistic Regression. In Social Science Research Network; SSRN Scholarly Paper ID 360300; SSRN: Rochester, NY, USA, 2002.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
- Krzywinski, M.; Altman, N. Classification and regression trees. Nat. Methods 2017, 14, 8.
- Prusty, S.; Patnaik, S.; Dash, S.K. SKCV: Stratified K-fold cross-validation on ML classifiers for predicting cervical cancer. Front. Nanotechnol. 2022, 4, 972421. Available online: https://www.frontiersin.org/articles/10.3389/fnano.2022.972421 (accessed on 8 November 2022).
- Trevethan, R. Sensitivity, Specificity, and Predictive Values: Foundations, Pliabilities, and Pitfalls in Research and Practice. Front. Public Health 2017, 5, 307. Available online: https://www.frontiersin.org/article/10.3389/fpubh.2017.00307 (accessed on 25 January 2022).
- Mienye, I.D.; Sun, Y. A Survey of Ensemble Learning: Concepts, Algorithms, Applications, and Prospects. IEEE Access 2022, 10, 99129–99149.
- Obaido, G.; Ogbuokiri, B.; Swart, T.G.; Ayawei, N.; Kasongo, S.M.; Aruleba, K.; Mienye, I.D.; Aruleba, I.; Chukwu, W.; Osaye, F.; et al. An Interpretable Machine Learning Approach for Hepatitis B Diagnosis. Appl. Sci. 2022, 12, 11127.
- Mienye, I.D.; Sun, Y.; Wang, Z. Improved Predictive Sparse Decomposition Method with Densenet for Prediction of Lung Cancer. Int. J. Comput. 2020, 1, 533–541.
- Zain, A.M.; Haron, H.; Sharif, S. Application of GA to optimize cutting conditions for minimizing surface roughness in end milling machining process. Expert Syst. Appl. 2010, 37, 4650–4659.
- Mirjalili, S. Genetic Algorithm. In Evolutionary Algorithms and Neural Networks: Theory and Applications; Mirjalili, S., Ed.; Springer International Publishing: Cham, Switzerland, 2019; pp. 43–55.
- Mienye, I.D.; Kenneth Ainah, P.; Emmanuel, I.D.; Esenogho, E. Sparse noise minimization in image classification using Genetic Algorithm and DenseNet. In Proceedings of the 2021 Conference on Information Communications Technology and Society (ICTAS), Durban, South Africa, 10–11 March 2021; pp. 103–108.
- Zhu, H.; Liu, G.; Zhou, M.; Xie, Y.; Abusorrah, A.; Kang, Q. Optimizing Weighted Extreme Learning Machines for imbalanced classification and application to credit card fraud detection. Neurocomputing 2020, 407, 50–62.
- Alkhatib, K.I.; Al-Aiad, A.I.; Almahmoud, M.H.; Elayan, O.N. Credit Card Fraud Detection Based on Deep Neural Network Approach. In Proceedings of the 2021 12th International Conference on Information and Communication Systems (ICICS), Valencia, Spain, 24–26 May 2021; pp. 153–156.
- Yotsawat, W.; Wattuya, P.; Srivihok, A. A Novel Method for Credit Scoring Based on Cost-Sensitive Neural Network Ensemble. IEEE Access 2021, 9, 78521–78537.
- Kalid, S.N.; Ng, K.-H.; Tong, G.-K.; Khor, K.-C. A Multiple Classifiers System for Anomaly Detection in Credit Card Data With Unbalanced and Overlapped Classes. IEEE Access 2020, 8, 28210–28221.
- Mrozek, P.; Panneerselvam, J.; Bagdasar, O. Efficient Resampling for Fraud Detection During Anonymised Credit Card Transactions with Unbalanced Datasets. In Proceedings of the 2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC), Leicester, UK, 7–10 December 2020; pp. 426–433.
- Carta, S.; Ferreira, A.; Reforgiato Recupero, D.; Saia, R. Credit scoring by leveraging an ensemble stochastic criterion in a transformed feature space. Prog. Artif. Intell. 2021, 10, 417–432.
- Xie, Y.; Li, A.; Gao, L.; Liu, Z. A Heterogeneous Ensemble Learning Model Based on Data Distribution for Credit Card Fraud Detection. Wirel. Commun. Mob. Comput. 2021, 2021, e2531210.
- Saheed, Y.K.; Hambali, M.A.; Arowolo, M.O.; Olasupo, Y.A. Application of GA Feature Selection on Naive Bayes, Random Forest and SVM for Credit Card Fraud Detection. In Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), Sakheer, Bahrain, 8–9 November 2020; pp. 1091–1097.
- Verma, B.P.; Verma, V.; Badholia, A. Hyper-Tuned Ensemble Machine Learning Model for Credit Card Fraud Detection. In Proceedings of the 2022 International Conference on Inventive Computation Technologies (ICICT), Lalitpur, Nepal, 20–22 July 2022; pp. 320–327.
- Padhi, B.K.; Chakravarty, S.; Naik, B.; Pattanayak, R.M.; Das, H. RHSOFS: Feature Selection Using the Rock Hyrax Swarm Optimization Algorithm for Credit Card Fraud Detection System. Sensors 2022, 22, 9321.
- Ganji, V.R.; Chaparala, A.; Sajja, R. Shuffled shepherd political optimization-based deep learning method for credit card fraud detection. Concurr. Comput. Pract. Exp. 2023, 35, e7666.
- UCI Machine Learning Repository: Statlog (German Credit Data) Data Set. Available online: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data) (accessed on 5 December 2022).
- UCI Machine Learning Repository: Default of credit card clients Data Set. Available online: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients (accessed on 5 December 2022).
Parameter | Value |
---|---|
Population size | 50 |
Number of generations | 100 |
Crossover rate | 0.6 |
Mutation rate | 0.01 |
Fitness function | G-mean |
Stopping criteria | Max number of generations |
Type of mutation | Uniform mutation |
Type of crossover | Single point |
Parent selection method | Tournament selection |
Tournament size | 2 |
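As an illustration of how the settings in the table above fit together, the following sketch wires them into a plain genetic-algorithm search over binary feature masks. It is an outline under stated assumptions, not the paper's code: the names `GA_PARAMS`, `ga_feature_search`, and the helper functions are hypothetical, "uniform mutation" is read as redrawing a bit uniformly from {0, 1} with the mutation probability, and the `fitness` callable is left to the caller (in the proposed approach it would train an ELM on the masked features and return the G-mean).

```python
import random

# Values taken from the parameter table; the dictionary itself is hypothetical.
GA_PARAMS = {
    "population_size": 50,
    "generations": 100,      # stopping criterion: max number of generations
    "crossover_rate": 0.6,
    "mutation_rate": 0.01,
    "tournament_size": 2,
}


def tournament(population, scores, size, rng):
    """Tournament selection: best of `size` randomly drawn individuals."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: scores[i])]


def single_point_crossover(a, b, rng):
    """Swap the tails of two parents at one random cut point."""
    point = rng.randrange(1, len(a))  # requires at least two features
    return a[:point] + b[point:], b[:point] + a[point:]


def uniform_mutation(mask, rate, rng):
    """Assumed reading: each bit is redrawn from {0, 1} with probability `rate`."""
    return [rng.randint(0, 1) if rng.random() < rate else bit for bit in mask]


def ga_feature_search(n_features, fitness, params=GA_PARAMS, seed=0):
    """Return (best_mask, best_score) seen over all evaluated generations."""
    rng = random.Random(seed)
    size = params["population_size"]
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(size)]
    best_mask, best_score = None, float("-inf")
    for _ in range(params["generations"]):
        scores = [fitness(mask) for mask in population]
        for mask, score in zip(population, scores):
            if score > best_score:
                best_mask, best_score = mask[:], score
        children = []
        while len(children) < size:
            p1 = tournament(population, scores, params["tournament_size"], rng)
            p2 = tournament(population, scores, params["tournament_size"], rng)
            if rng.random() < params["crossover_rate"]:
                c1, c2 = single_point_crossover(p1, p2, rng)
            else:
                c1, c2 = p1[:], p2[:]
            children.append(uniform_mutation(c1, params["mutation_rate"], rng))
            children.append(uniform_mutation(c2, params["mutation_rate"], rng))
        population = children[:size]
    return best_mask, best_score
```

A toy call such as `ga_feature_search(8, lambda m: sum(m) / len(m))` exercises the loop without any classifier; swapping in a G-mean-based fitness changes only the callable, not the search itself.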
Classifier | Sensitivity | Specificity | AUC | G-Mean |
---|---|---|---|---|
ELM | 0.881 | 0.904 | 0.900 | 0.892 |
IG-ELM | 0.936 | 0.960 | 0.940 | 0.947 |
GAW | 0.949 | 0.962 | 0.950 | 0.955 |
IG-GAW | 0.997 | 0.994 | 0.990 | 0.994 |
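Assuming the standard definition of G-mean as the geometric mean of sensitivity and specificity, the last column of the table above can be checked against the first two. For the ELM row:

```latex
\begin{aligned}
\text{G-mean} &= \sqrt{\text{Sensitivity} \times \text{Specificity}} \\
              &= \sqrt{0.881 \times 0.904} \\
              &= \sqrt{0.796424} \\
              &\approx 0.892
\end{aligned}
```

This matches the reported value; the remaining rows of this table agree with the same formula to within about 0.002.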
Feature-Selection Method | Features |
---|---|
Complete feature set | V1, …, V28, Time, Amount, Class |
IG | Time, Amount |
GAW | , Time, Amount |
IG-GAW |
Classifier | Sensitivity | Specificity | AUC | G-Mean |
---|---|---|---|---|
AdaBoost | 0.889 | 0.918 | 0.900 | 0.903 |
LR | 0.752 | 0.916 | 0.810 | 0.829 |
RF | 0.869 | 0.940 | 0.890 | 0.904 |
SVM | 0.585 | 0.827 | 0.660 | 0.695 |
DT | 0.590 | 0.801 | 0.690 | 0.688 |
Proposed IG-GAW | 0.997 | 0.994 | 0.990 | 0.994 |
Reference | Algorithm | Sensitivity | Specificity | AUC |
---|---|---|---|---|
Zhu et al. [77] | Weighted ELM | 0.982 | - | 0.978 |
Alkhatib et al. [78] | DNN | 0.955 | - | 0.990 |
Yotsawat et al. [79] | CS-NNE | - | 0.936 | 0.980 |
Ileberi et al. [19] | GA-RF | 0.7256 | - | 0.950 |
Kalid et al. [80] | DT-NB | 0.872 | 1.000 | - |
Mrozek et al. [81] | Random forest-SMOTE | 0.829 | - | 0.910 |
Carta et al. [82] | Stochastic ensemble | 0.915 | - | 0.876 |
Xie et al. [83] | XGBoost-SMOTE | 0.988 | - | 0.970 |
Saheed et al. [84] | GA-SVM | 0.963 | 0.963 | - |
Verma et al. [85] | PSO-based Ensemble model | 0.97 | - | - |
Padhi et al. [86] | RHSO | 0.951 | - | - |
Ganji et al. [87] | DRN-SSPO | 0.912 | 0.902 | - |
This paper | Proposed IG-GAW | 0.997 | 0.994 | 0.990 |
Feature-Selection Method | Features |
---|---|
Complete feature set | Status of existing checking account, duration in month, credit history, purpose, credit amount, savings account, present employment since, installment rate as a percentage of disposable income, personal status and sex, other debtors, present residence since, property, age, other installment plans, housing, number of existing credits at this bank, job, number of dependents, telephone, foreign worker |
IG | Status of existing checking account, duration in month, credit history, purpose, credit amount, savings account, present employment since, installment rate as a percentage of disposable income, personal status and sex, other debtors, property, age, other installment plans, housing, number of dependents, foreign worker |
GAW | Status of existing checking account, duration in month, credit history, purpose, credit amount, savings account, present employment since, property, age, other installment plans, housing, number of dependents, foreign worker |
IG-GAW | Credit amount, status of existing checking account, duration in month, age, credit history, purpose, property, present employment since, housing
Classifier | Sensitivity | Specificity | AUC | G-Mean |
---|---|---|---|---|
AdaBoost | 0.785 | 0.892 | 0.810 | 0.837 |
LR | 0.688 | 0.813 | 0.700 | 0.748 |
RF | 0.796 | 0.904 | 0.830 | 0.850 |
SVM | 0.649 | 0.792 | 0.650 | 0.716 |
DT | 0.630 | 0.787 | 0.640 | 0.704 |
ELM | 0.704 | 0.830 | 0.710 | 0.763 |
IG-ELM | 0.796 | 0.903 | 0.810 | 0.847 |
GAW | 0.820 | 0.925 | 0.860 | 0.871 |
Proposed IG-GAW | 0.904 | 0.946 | 0.910 | 0.925 |
Feature-Selection Method | Features |
---|---|
Complete feature set | ID, LIMIT_BAL, SEX, EDUCATION, MARRIAGE, AGE, PAY_0, PAY_2, PAY_3, PAY_4, PAY_5, PAY_6, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5, BILL_AMT6, PAY_AMT1, PAY_AMT2, PAY_AMT3, PAY_AMT4, PAY_AMT5, PAY_AMT6 |
IG | SEX, PAY_0, PAY_2, PAY_3, PAY_4, PAY_5, PAY_6, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5, BILL_AMT6, PAY_AMT1, PAY_AMT2, PAY_AMT3, PAY_AMT4, PAY_AMT5, PAY_AMT6 |
GAW | PAY_0, PAY_2, PAY_4, PAY_5, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5, BILL_AMT6, PAY_AMT1, PAY_AMT2, PAY_AMT3, PAY_AMT4, PAY_AMT5, PAY_AMT6 |
IG-GAW | BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5, BILL_AMT6, PAY_AMT1, PAY_AMT2, PAY_AMT3, PAY_AMT6, PAY_AMT4, PAY_AMT5, PAY_0, PAY_2
Classifier | Sensitivity | Specificity | AUC | G-Mean |
---|---|---|---|---|
AdaBoost | 0.870 | 0.890 | 0.870 | 0.880 |
LR | 0.625 | 0.837 | 0.640 | 0.723 |
RF | 0.829 | 0.914 | 0.840 | 0.870 |
SVM | 0.626 | 0.819 | 0.650 | 0.716 |
DT | 0.574 | 0.773 | 0.610 | 0.666 |
ELM | 0.710 | 0.885 | 0.730 | 0.793 |
IG-ELM | 0.874 | 0.911 | 0.890 | 0.892 |
GAW | 0.899 | 0.920 | 0.900 | 0.909 |
Proposed IG-GAW | 0.945 | 0.961 | 0.940 | 0.952 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Mienye, I.D.; Sun, Y. A Machine Learning Method with Hybrid Feature Selection for Improved Credit Card Fraud Detection. Appl. Sci. 2023, 13, 7254. https://doi.org/10.3390/app13127254