Class-Imbalance and Cost-Sensitive Learning

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (29 February 2024)

Special Issue Editor


Guest Editor
School of Computer Science, China University of Geosciences, Wuhan 430074, China
Interests: machine learning and data mining; Bayesian learning; nearest neighbor learning; decision tree learning; cost-sensitive learning; crowdsourcing learning; deep learning; classification; sorting; class probability estimation; clustering; regression; distance measurement; feature selection

Special Issue Information

Dear colleagues,

Cost-sensitive learning has been identified as one of the top 10 challenging problems in data mining research. Unlike cost-insensitive learning, cost-sensitive learning takes a wide variety of costs into consideration, including misclassification costs, test costs, teaching (labeling) costs, computation costs, and human–computer interaction costs. The goal of this type of learning is to minimize the total cost. It is one of the most active and important research areas in machine learning, and it plays an important role in real-world data mining applications such as medical diagnosis and defect detection.
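As a concrete illustration of this cost-minimization goal (a sketch added here for exposition, not a method from this Special Issue), the standard Bayes-optimal decision rule for misclassification costs picks, for each instance, the class with the smallest expected cost under a user-supplied cost matrix:

    import numpy as np

    def cost_sensitive_predict(probs, cost_matrix):
        """probs: (n_samples, n_classes) class-probability estimates.
        cost_matrix: (n_classes, n_classes), C[i, j] = cost of predicting
        class i when the true class is j. Returns the cost-minimizing class."""
        # Expected cost of predicting class i: sum_j C[i, j] * p(j | x).
        expected_costs = probs @ cost_matrix.T  # (n_samples, n_classes)
        return expected_costs.argmin(axis=1)

    # Hypothetical example: missing a positive (predicting 0 when the truth
    # is 1) costs ten times more than a false alarm, so the decision
    # threshold shifts toward class 1.
    probs = np.array([[0.85, 0.15], [0.60, 0.40]])
    C = np.array([[0.0, 10.0],
                  [1.0, 0.0]])
    print(cost_sensitive_predict(probs, C))  # -> [1 1]

With the uniform 0/1 cost matrix, this rule reduces to ordinary accuracy-maximizing classification, which is why cost-insensitive learning is the special case.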

The main challenges in the field of cost-sensitive learning include class-imbalanced cost-sensitive learning, active cost-sensitive learning, semisupervised cost-sensitive learning, multiclass cost-sensitive learning, and so on.

The aim of this Special Issue is to provide a forum for researchers to disseminate their latest research, up-to-date issues, and challenges in the field of cost-sensitive learning. Potential topics include, but are not limited to, the following:

  • recent developments and characterizations of cost-sensitive learning;
  • new evaluation criteria for cost-sensitive learning;
  • comparative theoretical and experimental analyses of new cost-sensitive learning models and algorithms, with validation through convincing computational experiments, performance measures, and others;
  • promising applications of cost-sensitive learning in real-world science and engineering domains.

Submissions should be original and unpublished. Extended versions of conference publications will be considered if they contain at least 50% new content.


Prof. Dr. Liangxiao Jiang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • cost-sensitive learning
  • machine learning
  • data mining
  • pattern recognition
  • artificial intelligence
  • intelligent data analysis

Published Papers (3 papers)


Research

15 pages, 385 KiB  
Article
Density-Based Clustering to Deal with Highly Imbalanced Data in Multi-Class Problems
by Julio Cesar Munguía Mondragón, Eréndira Rendón Lara, Roberto Alejo Eleuterio, Everardo Efrén Granda Gutiérrez and Federico Del Razo López
Mathematics 2023, 11(18), 4008; https://doi.org/10.3390/math11184008 - 21 Sep 2023
Cited by 1
Abstract
In machine learning and data mining applications, an imbalanced distribution of classes in the training dataset can drastically affect the performance of learning models. The class imbalance problem is frequently observed during classification tasks in real-world scenarios when the available instances of one class are much fewer than the amount of data available in other classes. Machine learning algorithms that do not consider the class imbalance can introduce a strong bias towards the majority class, while the minority class is usually neglected. Thus, sampling techniques have been extensively used in various studies to overcome class imbalances, mainly based on random undersampling and oversampling methods. However, there is still no definitive solution, especially in the domain of multi-class problems. A strategy that combines density-based clustering algorithms with random undersampling and oversampling techniques is studied in this work. To analyze the performance of the studied method, an experimental validation was carried out on a collection of hyperspectral remote sensing images, using a deep learning neural network as the classifier. This data bank contains six datasets with different imbalance ratios, from slight to severe. In the experimental results, the studied strategy outperformed other state-of-the-art methods as measured by the geometric mean of the precision, mainly for highly imbalanced datasets.
(This article belongs to the Special Issue Class-Imbalance and Cost-Sensitive Learning)
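For readers unfamiliar with geometric-mean metrics, the following sketch (our illustration, assuming the commonly used multi-class G-mean defined over per-class recall, which may differ in detail from the precision-based variant the paper reports) shows why a geometric mean is preferred over accuracy on imbalanced data: it collapses to zero whenever any single class is missed entirely, whereas the majority class can dominate accuracy.

    import numpy as np

    def geometric_mean_score(y_true, y_pred, n_classes):
        recalls = []
        for c in range(n_classes):
            mask = (y_true == c)
            # Per-class recall: fraction of class-c samples predicted as c.
            recalls.append((y_pred[mask] == c).mean() if mask.any() else 0.0)
        return float(np.prod(recalls) ** (1.0 / n_classes))

    y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])  # imbalanced: 6/2/1
    y_pred = np.array([0, 0, 0, 0, 0, 0, 1, 0, 2])
    print(geometric_mean_score(y_true, y_pred, 3))  # ~0.79 (accuracy: ~0.89)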

14 pages, 6883 KiB  
Article
Searching for Optimal Oversampling to Process Imbalanced Data: Generative Adversarial Networks and Synthetic Minority Over-Sampling Technique
by Gayeong Eom and Haewon Byeon
Mathematics 2023, 11(16), 3605; https://doi.org/10.3390/math11163605 - 21 Aug 2023
Abstract
Classification problems due to data imbalance occur in many fields and have long been studied in machine learning. Many real-world datasets suffer from the issue of class imbalance, which occurs when the sizes of classes are not uniform; thus, data belonging to the minority class are likely to be misclassified. It is particularly important to overcome this issue when dealing with medical data, because class imbalance inevitably arises from incidence rates within medical datasets. This study adjusted the imbalance ratio (IR) within the National Biobank of Korea dataset "Epidemiologic data of Parkinson's disease dementia patients" to values of 6.8 (raw data), 9, and 19 and compared four traditional oversampling methods with techniques using the conditional generative adversarial network (CGAN) and conditional tabular generative adversarial network (CTGAN). The results showed that when the classes were balanced with CGAN and CTGAN, the classifiers achieved better performance than with the more traditional oversampling techniques, based on the AUC and F1-score. We were able to expand the application scope of GANs, widely used for unstructured data, to structured data. We also offer a better solution for the imbalanced data problem and suggest future research directions.
(This article belongs to the Special Issue Class-Imbalance and Cost-Sensitive Learning)
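As background for the comparison above, the core interpolation step of SMOTE can be sketched in a few lines (a minimal illustration under simplifying assumptions; production implementations such as the one in the imbalanced-learn library handle edge cases this sketch omits): each synthetic point lies on the segment between a minority sample and one of its k nearest minority neighbors.

    import numpy as np

    def smote_sample(X_min, k=5, n_new=100, rng=None):
        """X_min: (n, d) minority-class samples; returns (n_new, d) synthetic points."""
        rng = rng or np.random.default_rng(0)
        # Pairwise squared distances within the minority class only.
        d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)
        nn = np.argsort(d2, axis=1)[:, :k]      # k nearest minority neighbors
        i = rng.integers(0, len(X_min), n_new)  # random base point
        j = nn[i, rng.integers(0, k, n_new)]    # random neighbor of that point
        lam = rng.random((n_new, 1))            # interpolation factor in [0, 1)
        return X_min[i] + lam * (X_min[j] - X_min[i])

    X_min = np.random.default_rng(1).normal(size=(20, 2))
    print(smote_sample(X_min, k=3, n_new=5).shape)  # (5, 2)

GAN-based oversamplers such as CGAN and CTGAN replace this linear interpolation with samples drawn from a learned generative model, which is what the study above finds advantageous on tabular medical data.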

13 pages, 1738 KiB  
Article
Positive-Unlabeled Learning for Network Link Prediction
by Shengfeng Gan, Mohammed Alshahrani and Shichao Liu
Mathematics 2022, 10(18), 3345; https://doi.org/10.3390/math10183345 - 15 Sep 2022
Cited by 1
Abstract
Link prediction is an important problem in network data mining, dedicated to predicting potential relationships between nodes in a network. Normally, network link prediction based on supervised classification is trained on a dataset consisting of a set of positive samples and a set of negative samples. However, well-labeled training datasets with positive and negative annotations are often inadequate in real-world scenarios, and such datasets contain a large number of unlabeled samples that may hinder the performance of the model. To address this problem, we propose a positive-unlabeled learning framework with network representation for network link prediction that uses only positive and unlabeled samples. We first learn representation vectors of nodes using a network representation method. Next, we concatenate the representation vectors of node pairs and feed them into different classifiers to predict whether the link exists or not. To alleviate data imbalance and enhance prediction precision, we adopt three types of positive-unlabeled (PU) learning strategies to improve the prediction performance, using traditional classifier estimation, a bagging strategy, and reliable negative sampling. We conduct experiments on three datasets to compare different PU learning methods and discuss their influence on the prediction results. The experimental results demonstrate that PU learning has a positive impact on predictive performance and that the improvement varies with different network structures.
(This article belongs to the Special Issue Class-Imbalance and Cost-Sensitive Learning)
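Of the three PU strategies the paper compares, "traditional classifier" estimation is the easiest to sketch. The fragment below is a hedged illustration of the classic Elkan–Noto (2008) correction, not the authors' exact implementation: train a classifier g to separate labeled positives (s = 1) from unlabeled samples (s = 0); under the selected-completely-at-random assumption, g(x) ≈ c · p(y = 1 | x) with c = p(s = 1 | y = 1), so dividing by an estimate of c recovers the true posterior.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def pu_fit_predict(X, s, X_test):
        """X: features; s: 1 for labeled positives, 0 for unlabeled.
        Returns estimated p(y = 1 | x) for X_test."""
        g = LogisticRegression(max_iter=1000).fit(X, s)
        # Estimate c = p(s=1 | y=1) as the mean score over labeled positives
        # (ideally a held-out split; training data keeps the sketch short).
        c = g.predict_proba(X[s == 1])[:, 1].mean()
        # Rescale the non-traditional scores into posterior estimates.
        return np.clip(g.predict_proba(X_test)[:, 1] / c, 0.0, 1.0)

    # Synthetic toy data: 100 positives, 100 negatives, only 40 positives labeled.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(1, 1, (100, 2)), rng.normal(-1, 1, (100, 2))])
    s = np.r_[np.ones(40), np.zeros(160)]
    print(pu_fit_predict(X, s, X[:5]).round(2))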
