MDPI - Publisher of Open Access Journals

16 pages, 2316 KiB

Open AccessArticle

Effective Feature Engineering and Classification of Breast Cancer Diagnosis: A Comparative Study

by Emilija Strelcenia and Simant Prakoonwit

BioMedInformatics 2023, 3(3), 616-631; https://doi.org/10.3390/biomedinformatics3030042 - 2 Aug 2023

Cited by 14 | Viewed by 7109

Breast cancer is among the most common cancers found in women, causing cancer-related deaths and making it a severe public health issue. Early prediction of breast cancer can increase the chances of survival and promote early medical treatment. Moreover, the accurate classification of benign cases can prevent cancer patients from undergoing unnecessary treatments. Therefore, the accurate and early diagnosis of breast cancer and the classification into benign or malignant classes are much-needed research topics. This paper presents an effective feature engineering method to extract and modify features from data and the effects on different classifiers using the Wisconsin Breast Cancer Diagnosis Dataset. We then use the feature to compare six popular machine-learning models for classification. The models compared were Logistic Regression, Random Forest, Decision Tree, K-Neighbors, Multi-Layer Perception (MLP), and XGBoost. The results showed that the Decision Tree model, when applied to the proposed feature engineering, was the best performing, achieving an average accuracy of 98.64%. Full article

(This article belongs to the Special Issue Feature Papers in Computational Biology and Medicine)

► Show Figures

Figure 1

26 pages, 3025 KiB

Open AccessArticle

A Survey on GAN Techniques for Data Augmentation to Address the Imbalanced Data Issues in Credit Card Fraud Detection

by Emilija Strelcenia and Simant Prakoonwit

Mach. Learn. Knowl. Extr. 2023, 5(1), 304-329; https://doi.org/10.3390/make5010019 - 11 Mar 2023

Cited by 53 | Viewed by 13133

Abstract

Data augmentation is an important procedure in deep learning. GAN-based data augmentation can be utilized in many domains. For instance, in the credit card fraud domain, the imbalanced dataset problem is a major one as the number of credit card fraud cases is in the minority compared to legal payments. On the other hand, generative techniques are considered effective ways to rebalance the imbalanced class issue, as these techniques balance both minority and majority classes before the training. In a more recent period, Generative Adversarial Networks (GANs) are considered one of the most popular data generative techniques as they are used in big data settings. This research aims to present a survey on data augmentation using various GAN variants in the credit card fraud detection domain. In this survey, we offer a comprehensive summary of several peer-reviewed research papers on GAN synthetic generation techniques for fraud detection in the financial sector. In addition, this survey includes various solutions proposed by different researchers to balance imbalanced classes. In the end, this work concludes by pointing out the limitations of the most recent research articles and future research issues, and proposes solutions to address these problems. Full article

(This article belongs to the Special Issue Privacy and Security in Machine Learning)

► Show Figures

Figure 1

27 pages, 11471 KiB

Open AccessArticle

Improving Classification Performance in Credit Card Fraud Detection by Using New Data Augmentation

by Emilija Strelcenia and Simant Prakoonwit

AI 2023, 4(1), 172-198; https://doi.org/10.3390/ai4010008 - 31 Jan 2023

Cited by 45 | Viewed by 11187

Abstract

In many industrialized and developing nations, credit cards are one of the most widely used methods of payment for online transactions. Credit card invention has streamlined, facilitated, and enhanced internet transactions. It has, however, also given criminals more opportunities to commit fraud, which has raised the rate of fraud. Credit card fraud has a concerning global impact; many businesses and ordinary users have lost millions of US dollars as a result. Since there is a large number of transactions, many businesses and organizations rely heavily on applying machine learning techniques to automatically classify or identify fraudulent transactions. As the performance of machine learning techniques greatly depends on the quality of the training data, the imbalance in the data is not a trivial issue. In general, only a small percentage of fraudulent transactions are presented in the data. This greatly affects the performance of machine learning classifiers. In order to deal with the rarity of fraudulent occurrences, this paper investigates a variety of data augmentation techniques to address the imbalanced data problem and introduces a new data augmentation model, K-CGAN, for credit card fraud detection. A number of the main classification techniques are then used to evaluate the performance of the augmentation techniques. These results show that B-SMOTE, K-CGAN, and SMOTE have the highest Precision and Recall compared with other augmentation methods. Among those, K-CGAN has the highest F1 Score and Accuracy. Full article

► Show Figures

Figure 1

Search Results (3)

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Saved Queries

Search Filter Reset All

Years

Feature Papers

Subjects

Journals

Article Types

Countries / Regions

Search Results (3)

Further Information

Guidelines

MDPI Initiatives

Follow MDPI