Special Issue "Feature Selection for High-Dimensional Data"

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (31 October 2017)

Special Issue Editors

Guest Editor
Dr. Verónica Bolón Canedo

Grupo LIDIA, Departamento de Computación, Facultad de Informática, Universidade da Coruña, 15071 A Coruña, Spain
Interests: machine learning; pattern recognition; feature selection; medical applications
Guest Editor
Dr. Noelia Sánchez-Maroño

Departamento de Computación, Facultad de Informática, Universidade da Coruña, 15071 A Coruña, Spain
Interests: artificial intelligence; machine learning; pattern recognition; feature selection
Guest Editor
Dr. Amparo Alonso-Betanzos

Departamento de Computación, Facultad de Informática, Universidade da Coruña, 15071 A Coruña, Spain
Interests: computer science; artificial intelligence; machine learning; feature selection; scalability issues in machine learning

Special Issue Information

Dear Colleagues,

Feature selection has become one of the most active research areas of the last few years, driven by the appearance of datasets containing hundreds of thousands of features. Feature selection is a valuable tool both for better modeling the underlying data-generation process and for reducing the cost of acquiring the features. Furthermore, from the machine learning perspective, because feature selection reduces the dimensionality of the problem, it can maintain or even improve an algorithm's performance while reducing computational costs.

Nowadays, the advent of Big Data has brought unprecedented challenges to machine learning researchers, who must deal with huge volumes of data, in terms of both instances and features, making the learning task more complex and computationally demanding than ever. Specifically, when the number of features is extremely large, the performance of learning algorithms can degenerate due to overfitting; learned models become more complex and therefore less interpretable; and the speed and efficiency of the algorithms decline as dataset size grows.

A vast body of feature selection methods exists in the literature, including filters based on distinct metrics (e.g., entropy, probability distributions, or information theory) as well as embedded and wrapper methods built around different induction algorithms. However, some of the most widely used algorithms were developed when datasets were much smaller, and today they cannot scale well, creating a need to readapt these successful algorithms to Big Data problems.
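To make the filter approach mentioned above concrete, the sketch below ranks discrete features by their empirical mutual information with the class label, one of the information-theoretic metrics cited in the call. This is a minimal stand-alone illustration, not any specific algorithm from the literature; the toy feature names and data are invented for the example.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), with counts scaled by n
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def filter_rank(features, labels):
    """Rank feature columns by mutual information with the labels, best first."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy dataset: f1 determines the label exactly, f2 is mostly noise.
features = {
    "f1": [0, 0, 1, 1, 0, 1],
    "f2": [1, 0, 1, 0, 0, 1],
}
labels = [0, 0, 1, 1, 0, 1]
print(filter_rank(features, labels))  # → ['f1', 'f2']
```

Because the score is computed per feature without training a model, a filter like this is cheap enough to scale, which is precisely why filters are attractive in the high-dimensional settings this issue targets.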

In this Special Issue, we invite investigators to contribute with their recent developments in feature selection methods for high-dimensional settings, as well as review articles that will stimulate the continuing efforts to understand the problems usually encountered in this field.

Topics of interest include, but are not limited to:

  • New feature selection methods
  • Ensemble methods for feature selection
  • Feature selection to deal with microarray data
  • Parallelization of feature selection methods
  • Missing data in the context of feature selection
  • Feature selection applications

Dr. Verónica Bolón Canedo
Dr. Noelia Sánchez-Maroño
Dr. Amparo Alonso-Betanzos
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, submit through the online submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 350 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Feature selection
  • Ensemble feature selection
  • Filters
  • Wrappers
  • Embedded methods

Published Papers (2 papers)


Research

Open Access Article: sCwc/sLcc: Highly Scalable Feature Selection Algorithms
Information 2017, 8(4), 159; doi:10.3390/info8040159
Received: 31 October 2017 / Revised: 1 December 2017 / Accepted: 2 December 2017 / Published: 6 December 2017
Abstract
Feature selection is a useful tool for identifying which features, or attributes, of a dataset cause or explain the phenomena that the dataset describes, and for improving the efficiency and accuracy of learning algorithms that discover such phenomena. Consequently, feature selection has been studied intensively in machine learning research. However, while feature selection algorithms that exhibit excellent accuracy have been developed, they are seldom used for analysis of high-dimensional data because such data usually include too many instances and features, which make traditional feature selection algorithms inefficient. To eliminate this limitation, we tried to improve the run-time performance of two of the most accurate feature selection algorithms known in the literature. The result is two accurate and fast algorithms, namely sCwc and sLcc. Multiple experiments with real social media datasets have demonstrated that our algorithms remarkably improve on the performance of the original algorithms. For example, we have two datasets, one with 15,568 instances and 15,741 features, and another with 200,569 instances and 99,672 features. sCwc performed feature selection on these datasets in 1.4 seconds and in 405 seconds, respectively. In addition, sLcc has turned out to be as fast as sCwc on average. This is a remarkable improvement because it is estimated that the original algorithms would need several hours to dozens of days to process the same datasets. In addition, we introduce a fast implementation of our algorithms: sCwc does not require any tuning parameter, while sLcc requires a threshold parameter, which can be used to control the number of features that the algorithm selects.
(This article belongs to the Special Issue Feature Selection for High-Dimensional Data)
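The abstract above notes that sLcc exposes a single threshold parameter controlling how many features are kept. As a hedged sketch of that general mechanism (not the actual Lcc implementation, whose internals are described in the paper), a relevance ranking cut off at the threshold behaves the same way: raising the threshold shrinks the selected subset. The scores and feature names here are invented for illustration.

```python
def select_by_threshold(scores, threshold):
    """Keep features whose relevance score meets the threshold, best first.

    `scores` maps feature name -> relevance in [0, 1]. A higher
    `threshold` yields a smaller selected subset, mirroring the role
    of sLcc's parameter as described in the abstract.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]

scores = {"f1": 0.92, "f2": 0.40, "f3": 0.15}
print(select_by_threshold(scores, 0.3))  # → ['f1', 'f2']
print(select_by_threshold(scores, 0.5))  # → ['f1']
```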

Open Access Article: Ensemble of Filter-Based Rankers to Guide an Epsilon-Greedy Swarm Optimizer for High-Dimensional Feature Subset Selection
Information 2017, 8(4), 152; doi:10.3390/info8040152
Received: 28 September 2017 / Revised: 19 October 2017 / Accepted: 20 November 2017 / Published: 22 November 2017
Abstract
The main purpose of feature subset selection is to remove irrelevant and redundant features from data, so that learning algorithms can be trained on a subset of relevant features. So far, many algorithms have been developed for feature subset selection, and most of them suffer from two major problems on high-dimensional datasets: First, some of these algorithms search a high-dimensional feature space without any domain knowledge about feature importance. Second, most of these algorithms were originally designed for continuous optimization problems, whereas feature selection is a binary optimization problem. To overcome these weaknesses, we propose a novel hybrid filter-wrapper algorithm, called Ensemble of Filter-based Rankers to guide an Epsilon-greedy Swarm Optimizer (EFR-ESO), for solving high-dimensional feature subset selection. The Epsilon-greedy Swarm Optimizer (ESO) is a novel binary swarm intelligence algorithm introduced in this paper as a novel wrapper. In the proposed EFR-ESO, we extract knowledge about feature importance through the ensemble of filter-based rankers and then use this knowledge to weight the feature probabilities in the ESO. Experiments on 14 high-dimensional datasets indicate that the proposed algorithm has excellent performance in terms of both classification error rate and minimizing the number of features.
(This article belongs to the Special Issue Feature Selection for High-Dimensional Data)
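The weighting step described above, turning an ensemble of filter rankings into per-feature probabilities that bias the swarm, can be sketched as follows. The aggregation rule here (mean reciprocal rank, min-max normalised) is an illustrative assumption, not necessarily the one used in EFR-ESO, and the feature names are invented.

```python
def ensemble_probabilities(rankings):
    """Combine several filter rankings into per-feature selection probabilities.

    Each ranking is a list of feature names, best first. Features ranked
    highly across rankers receive probabilities near 1; such probabilities
    could then bias the bit-flip decisions of a binary swarm optimizer.
    """
    # Mean reciprocal rank accumulated across the ensemble of rankers.
    score = {name: 0.0 for name in rankings[0]}
    for ranking in rankings:
        for pos, name in enumerate(ranking):
            score[name] += 1.0 / (pos + 1)
    # Min-max normalise into [0, 1] so the values read as probabilities.
    lo, hi = min(score.values()), max(score.values())
    return {name: (s - lo) / (hi - lo) for name, s in score.items()}

rankings = [["f1", "f2", "f3"], ["f1", "f3", "f2"], ["f2", "f1", "f3"]]
probs = ensemble_probabilities(rankings)
print(max(probs, key=probs.get))  # → f1, the most consistently top-ranked feature
```

Using an ensemble rather than a single filter hedges against any one ranker's bias, which is the motivation the abstract gives for injecting this knowledge into the wrapper search.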
