Machine Learning Approaches for Imbalanced Domains: Emerging Trends and Applications

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 1 December 2024

Special Issue Editors


Guest Editor
Department of Mathematics and Computer Science, University of Cagliari, 09124 Cagliari, Italy
Interests: data mining and machine learning; high-dimensional data analysis; feature selection

Guest Editor
Department of Mathematics and Computer Science, University of Cagliari, 09124 Cagliari, Italy
Interests: computer vision; image processing; machine learning; deep learning; artificial intelligence; medical image analysis; biomedical image analysis

Special Issue Information

Dear Colleagues,

In many real-world domains, the data distribution is highly imbalanced since instances of some classes appear much more frequently than others. This poses a difficulty for machine learning algorithms as they tend to be biased towards the majority class. At the same time, the minority class is typically the most important from a data mining perspective as it may carry valuable knowledge.

Despite more than two decades of continuous research, several open issues remain in the field of imbalance learning, and recent trends increasingly focus on the interaction between class imbalance and other difficulties embedded in the nature of the data, such as the fast-growing data volume and dimensionality, the variability of concepts in time, or the presence of noise and data quality issues. New real-world problems continue to emerge that motivate researchers to focus on advanced learning strategies, which can involve data-level and algorithm-level approaches, to effectively deal with imbalanced datasets.
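To make the data-level/algorithm-level distinction concrete, here is a minimal sketch (illustrative only, not drawn from any paper in this issue) of the simplest data-level strategy, random oversampling, which duplicates minority-class instances until the classes balance; more sophisticated approaches such as SMOTE synthesize new instances instead:

```python
import random

def random_oversample(X, y, minority_label, seed=0):
    """Data-level rebalancing: duplicate minority-class instances
    until both classes have the same size (a toy stand-in for
    SMOTE-style synthetic oversampling)."""
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, t) for x, t in zip(X, y) if t != minority_label]
    # Draw duplicates at random until the class sizes match.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    data = majority + minority + extra
    rng.shuffle(data)
    Xb = [x for x, _ in data]
    yb = [t for _, t in data]
    return Xb, yb
```

Algorithm-level counterparts leave the data untouched and instead modify the learner, for example by weighting misclassification costs in inverse proportion to class frequencies.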

The aim of this Special Issue is to bring together contributions that discuss problems and solutions in this area, both from a methodological and an application-oriented perspective. Topics of interest include but are not limited to:

  • Data-level, algorithm-level, and hybrid approaches;
  • Machine learning, ensemble learning, and deep learning methods;
  • Multi-label and multi-class imbalanced learning;
  • Learning strategies for high-dimensional imbalanced data;
  • Learning strategies for imbalanced data streams;
  • Learning strategies for imbalanced visual data;
  • Noise robustness of learning methods in imbalanced settings;
  • Metrics and methodologies for model evaluation in imbalanced settings;
  • Real-world applications: industrial monitoring systems, fraud detection, intrusion detection, software defect prediction, medical diagnosis, object detection and image classification, computer vision, text mining, sentiment analysis, anomaly detection, and behavior analysis in social media.
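On the evaluation topic above, plain accuracy can look deceptively high on imbalanced data, which is why metrics such as per-class recall and its macro average (balanced accuracy) are preferred. A minimal illustration (not tied to any specific submission):

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls: a classifier that only ever predicts
    the majority class scores 0.5 on a binary problem, however skewed
    the class distribution is."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# A degenerate classifier that always predicts the majority class:
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)  # 0.95
```

Here plain accuracy reports 0.95 while balanced accuracy reports 0.5, exposing that the minority class is never detected.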

Dr. Barbara Pes
Dr. Andrea Loddo
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data mining and knowledge discovery
  • machine learning
  • deep learning
  • imbalance learning
  • case studies and real-world applications

Published Papers (2 papers)


Research

20 pages, 537 KiB  
Article
An Extensive Performance Comparison between Feature Reduction and Feature Selection Preprocessing Algorithms on Imbalanced Wide Data
by Ismael Ramos-Pérez, José Antonio Barbero-Aparicio, Antonio Canepa-Oneto, Álvar Arnaiz-González and Jesús Maudes-Raedo
Information 2024, 15(4), 223; https://doi.org/10.3390/info15040223 - 16 Apr 2024
Abstract
The most common preprocessing techniques used to deal with datasets having high dimensionality and a low number of instances—or wide data—are feature reduction (FR), feature selection (FS), and resampling. This study explores the use of FR and resampling techniques, expanding the limited comparisons between FR and filter FS methods in the existing literature, especially in the context of wide data. We compare the optimal outcomes from a previous comprehensive study of FS against new experiments conducted using FR methods. Two specific challenges associated with the use of FR are outlined in detail: finding FR methods that are compatible with wide data and the need for a reduction estimator of nonlinear approaches to process out-of-sample data. The experimental study compares 17 techniques, including supervised, unsupervised, linear, and nonlinear approaches, using 7 resampling strategies and 5 classifiers. The results demonstrate which configurations are optimal, according to their performance and computation time. Moreover, the best configuration—namely, k Nearest Neighbor (KNN) + the Maximal Margin Criterion (MMC) feature reducer with no resampling—is shown to outperform state-of-the-art algorithms.
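The distinction at the heart of this comparison can be sketched in a few lines (an illustrative toy, not the MMC reducer or the filters used in the study): feature selection keeps a subset of the original columns, while feature reduction replaces them with new axes built from all columns, which is why nonlinear reducers need a separate estimator to project out-of-sample data.

```python
def variance_select(X, k):
    """Feature selection: keep the k original columns with the
    highest variance (a simple univariate filter)."""
    n, d = len(X), len(X[0])
    variances = []
    for j in range(d):
        col = [row[j] for row in X]
        mu = sum(col) / n
        variances.append(sum((v - mu) ** 2 for v in col) / n)
    keep = sorted(range(d), key=lambda j: -variances[j])[:k]
    keep.sort()
    return [[row[j] for j in keep] for row in X], keep

def project(X, W):
    """Feature reduction: map each row onto new axes W (each axis a
    linear combination of all original features), as PCA or MMC do."""
    return [[sum(x * w for x, w in zip(row, axis)) for axis in W] for row in X]
```

Note that `variance_select` returns column indices that can be reused directly on new data, whereas projection-based reducers must store (or learn) a mapping `W` to transform out-of-sample rows.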
19 pages, 2175 KiB  
Article
An Evaluation of Feature Selection Robustness on Class Noisy Data
by Simone Pau, Alessandra Perniciano, Barbara Pes and Dario Rubattu
Information 2023, 14(8), 438; https://doi.org/10.3390/info14080438 - 03 Aug 2023
Abstract
With the increasing growth of data dimensionality, feature selection has become a crucial step in a variety of machine learning and data mining applications. In fact, it allows identifying the most important attributes of the task at hand, improving the efficiency, interpretability, and final performance of the induced models. In recent literature, several studies have examined the strengths and weaknesses of the available feature selection methods from different points of view. Still, little work has been performed to investigate how sensitive they are to the presence of noisy instances in the input data. This is the specific field in which our work wants to make a contribution. Indeed, since noise is arguably inevitable in several application scenarios, it would be important to understand the extent to which the different selection heuristics can be affected by noise, in particular class noise (which is more harmful in supervised learning tasks). Such an evaluation may be especially important in the context of class-imbalanced problems, where any perturbation in the set of training records can strongly affect the final selection outcome. In this regard, we provide here a two-fold contribution by presenting (i) a general methodology to evaluate feature selection robustness on class noisy data and (ii) an experimental study that involves different selection methods, both univariate and multivariate. The experiments have been conducted on eight high-dimensional datasets chosen to be representative of different real-world domains, with interesting insights into the intrinsic degree of robustness of the considered selection approaches.
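The general shape of such a robustness methodology can be sketched as follows, under loose assumptions (every function name below is hypothetical, not taken from the paper): rank features with a univariate score on the clean labels, re-rank after injecting class noise, and measure how much the two selected subsets overlap.

```python
import random

def top_k_by_mean_gap(X, y, k):
    """Univariate filter: score each feature by the absolute gap
    between its class-0 and class-1 means, keep the top k."""
    d = len(X[0])
    scores = []
    for j in range(d):
        c0 = [row[j] for row, t in zip(X, y) if t == 0]
        c1 = [row[j] for row, t in zip(X, y) if t == 1]
        scores.append(abs(sum(c0) / len(c0) - sum(c1) / len(c1)))
    return set(sorted(range(d), key=lambda j: -scores[j])[:k])

def flip_labels(y, rate, seed=0):
    """Inject class noise by flipping a fraction of the binary labels."""
    rng = random.Random(seed)
    idx = set(rng.sample(range(len(y)), int(rate * len(y))))
    return [1 - t if i in idx else t for i, t in enumerate(y)]

def jaccard(a, b):
    """Overlap between clean and noisy selections: 1.0 = fully robust."""
    return len(a & b) / len(a | b)
```

Averaging the Jaccard overlap over many noise injections and noise rates yields a robustness profile for each selection heuristic, which is the kind of comparison the experimental study carries out at scale.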