
Information Theoretic Feature Selection Methods for Big Data

A special issue of Entropy (ISSN 1099-4300). This special issue belongs to the section "Information Theory, Probability and Statistics".

Deadline for manuscript submissions: closed (15 December 2020) | Viewed by 18072

Special Issue Editors


Guest Editor
School of Computer Science and Engineering, Chung-Ang University, 84 Heukseok-ro, Heukseok-dong, Dongjak-gu, Seoul 06974, Republic of Korea
Interests: artificial intelligence; machine learning; deep learning

Guest Editor
Department of Computer Science and Engineering, Chung-Ang University, 84 Heukseok-ro, Heukseok-dong, Dongjak-gu, Seoul 06974, Republic of Korea
Interests: artificial intelligence; machine learning; neural architecture design; feature engineering

Special Issue Information

Dear Colleagues,

In recent years, with the emergence of Big Data, feature selection has become the focus of research and applications involving datasets with hundreds of thousands of variables. These areas include text processing, particularly of text from the Internet and social network services, genomics, medical informatics, entertainment, and education. Feature selection is the process of identifying the subset of features that is most predictive of a given outcome, thereby reducing the number of variables according to specific measures. Feature selection based on information-theoretic concepts such as interaction information, mutual information, and entropy can be found in all machine learning tasks. These tasks involve combinations of supervised or unsupervised learning, classification or regression, single-label or multi-label outputs, and single-task or multi-task time-series prediction, posing various challenges of significant interest.
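As a concrete illustration of the simplest such measure, the sketch below ranks features by their estimated mutual information with a class label and keeps the top k. This is a minimal sketch only; the synthetic dataset and the subset size k are illustrative placeholders.

```python
# Minimal sketch: mutual-information-based feature ranking.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=5, random_state=0)

mi = mutual_info_classif(X, y, random_state=0)  # one MI estimate per feature
k = 10                                          # illustrative subset size
selected = np.argsort(mi)[::-1][:k]             # indices of the top-k features
print("selected feature indices:", selected)
```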

Topics of Interest

This Special Issue aims to solicit and publish papers that provide a clear view of state-of-the-art feature selection methods based on information-related measures and theories for Big Data. We therefore encourage submissions in the following areas, among others:

  • Information-theoretic methods for feature selection based on interaction information, mutual information, and entropy, among other approaches;
  • Supervised, unsupervised, and semi-supervised feature selection methods for single-label, multi-label, multi-task, multi-instance, and time-series-linked Big Data, using information-, uncertainty-, or dependency-related measures;
  • Feature selection methods for missing, uncertain, and imbalanced data, concerning the concepts of information, uncertainty, or dependency;
  • Feature selection methods using single-objective and multi-objective meta-heuristic search methods—such as genetic algorithms, particle swarm optimization, and ant colony optimization—concerning the concepts of information, uncertainty, or dependency;
  • Information-, uncertainty-, and dependency-related feature selection methods in applications such as text processing, bioinformatics, medical informatics, urban computing, entertainment, education, and others.

Prof. Dr. Dae-Won Kim
Prof. Dr. Jaesung Lee
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Entropy is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Feature selection
  • Information theory
  • Information measure
  • Uncertainty measure
  • Dependency measure
  • Big Data

Published Papers (7 papers)


Research

15 pages, 1284 KiB  
Article
Monte Carlo Tree Search-Based Recursive Algorithm for Feature Selection in High-Dimensional Datasets
by Muhammad Umar Chaudhry, Muhammad Yasir, Muhammad Nabeel Asghar and Jee-Hyong Lee
Entropy 2020, 22(10), 1093; https://doi.org/10.3390/e22101093 - 29 Sep 2020
Cited by 1 | Viewed by 2660
Abstract
Complexity and high dimensionality are inherent concerns of big data, and feature selection has gained prime importance in coping with them by reducing the dimensionality of datasets. The compromise between maximum classification accuracy and minimum dimensionality is as yet an unsolved puzzle. Recently, Monte Carlo Tree Search (MCTS)-based techniques have been introduced that have attained great success in feature selection by constructing a binary feature selection tree and efficiently focusing on the most valuable features in the feature space. However, one challenging problem associated with such approaches is the tradeoff between the tree search and the number of simulations. With a limited number of simulations, the tree might not reach sufficient depth, thus inducing a bias towards randomness in feature subset selection. In this paper, a new algorithm for feature selection is proposed in which multiple feature selection trees are built iteratively in a recursive fashion. The state space of every successor feature selection tree is smaller than that of its predecessor, thus increasing the impact of the tree search in selecting the best features while keeping the number of MCTS simulations fixed. Experiments are performed on 16 benchmark datasets for validation, and performance is compared with state-of-the-art methods from the literature in terms of both classification accuracy and feature selection ratio.
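The paper's MCTS machinery is beyond a short excerpt, but the recursive shrinking loop it describes can be sketched as follows, with a simple mutual-information scorer standing in for the tree search; the dataset, sizes, and halving schedule are illustrative assumptions, not the paper's algorithm.

```python
# Sketch of the recursive idea: each round keeps the better-scoring half of
# the remaining features, so every successor search space is smaller than
# its predecessor. An MI scorer stands in for the MCTS tree search.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=64,
                           n_informative=6, random_state=0)

remaining = np.arange(X.shape[1])
target_size = 8                                 # illustrative stopping point
while len(remaining) > target_size:
    scores = mutual_info_classif(X[:, remaining], y, random_state=0)
    keep = np.argsort(scores)[::-1][: max(target_size, len(remaining) // 2)]
    remaining = remaining[keep]                 # successor state space shrinks
print("selected feature indices:", sorted(remaining))
```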

19 pages, 1911 KiB  
Article
Multi-Population Genetic Algorithm for Multilabel Feature Selection Based on Label Complementary Communication
by Jaegyun Park, Min-Woo Park, Dae-Won Kim and Jaesung Lee
Entropy 2020, 22(8), 876; https://doi.org/10.3390/e22080876 - 10 Aug 2020
Cited by 15 | Viewed by 2838
Abstract
Multilabel feature selection is an effective preprocessing step for improving multilabel classification accuracy, because it highlights features that are discriminative for multiple labels. Recently, multi-population genetic algorithms have gained significant attention in feature selection studies, owing to the enhanced search capability that communication among multiple populations provides over traditional genetic algorithms. However, conventional methods employ a simple communication process without adapting it to the multilabel feature selection problem, which results in poor-quality final solutions. In this paper, we propose a new multi-population genetic algorithm, based on a novel communication process, that is specialized for the multilabel feature selection problem. Our experimental results on 17 multilabel datasets demonstrate that the proposed method is superior to other multi-population-based feature selection methods.
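As a rough illustration of the multi-population setup, the toy sketch below evolves binary feature masks on several islands and periodically exchanges each island's best individual in a ring. The stand-in fitness function, sizes, and rates are assumptions; the paper's label-complementary communication scheme is not reproduced here.

```python
# Toy multi-population (island) GA for feature selection with ring migration.
import numpy as np

rng = np.random.default_rng(0)
n_features, pop_size, n_islands = 40, 20, 3
weights = rng.random(n_features)                 # assumed per-feature utilities

def fitness(mask):
    # Stand-in objective: reward useful features, penalize subset size.
    return mask @ weights - 0.2 * mask.sum()

islands = [rng.integers(0, 2, (pop_size, n_features)) for _ in range(n_islands)]

for gen in range(50):
    for i, pop in enumerate(islands):
        fit = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(fit)[-pop_size // 2:]]      # truncation selection
        cuts = rng.integers(1, n_features, pop_size)
        children = np.array([np.concatenate((
            parents[rng.integers(len(parents))][:c],
            parents[rng.integers(len(parents))][c:]))
            for c in cuts])                                  # one-point crossover
        flip = rng.random(children.shape) < 0.01             # bit-flip mutation
        islands[i] = np.where(flip, 1 - children, children)
    if gen % 10 == 0:                                        # communication step:
        best = [pop[np.argmax([fitness(ind) for ind in pop])] for pop in islands]
        for i in range(n_islands):
            islands[i][0] = best[(i + 1) % n_islands]        # ring migration

best = max((ind for pop in islands for ind in pop), key=fitness)
print("selected features:", np.flatnonzero(best))
```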

15 pages, 597 KiB  
Article
A Cooperative Coevolutionary Approach to Discretization-Based Feature Selection for High-Dimensional Data
by Yu Zhou, Junhao Kang and Xiao Zhang
Entropy 2020, 22(6), 613; https://doi.org/10.3390/e22060613 - 1 Jun 2020
Cited by 5 | Viewed by 2303
Abstract
Recent discretization-based feature selection methods show great advantages by introducing entropy-based cut-points for features, integrating discretization and feature selection into a single stage for high-dimensional data. However, current methods usually consider individual features independently, ignoring the interaction between features with cut-points and those without, which results in information loss. In this paper, we propose a cooperative coevolutionary algorithm based on the genetic algorithm (GA) and particle swarm optimization (PSO), which searches for the feature subsets with and without entropy-based cut-points simultaneously. For the features with cut-points, a ranking mechanism is used to control the probability of mutation and crossover in the GA. In addition, a binary-coded PSO is applied to update the indices of the selected features without cut-points. Experimental results on 10 real datasets verify the effectiveness of our algorithm in terms of classification accuracy compared with several state-of-the-art competitors.
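The binary-coded PSO half of such a cooperative scheme can be sketched as below: particles keep real-valued velocities over feature bits and sample inclusion through a sigmoid transfer function. The fitness here is an illustrative stand-in, not the paper's entropy-cut-point objective.

```python
# Toy binary-coded PSO for feature subset selection.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_particles = 30, 15
weights = rng.random(n_features)                # assumed per-feature utilities

def fitness(mask):
    return mask @ weights - 0.3 * mask.sum()    # reward utility, penalize size

vel = rng.normal(0, 1, (n_particles, n_features))
pos = (rng.random((n_particles, n_features)) < 0.5).astype(int)
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[np.argmax(pbest_fit)]

for _ in range(100):
    r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    prob = 1.0 / (1.0 + np.exp(-vel))           # sigmoid transfer function
    pos = (rng.random(vel.shape) < prob).astype(int)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[np.argmax(pbest_fit)]
print("selected features:", np.flatnonzero(gbest))
```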

12 pages, 581 KiB  
Article
Generalized Term Similarity for Feature Selection in Text Classification Using Quadratic Programming
by Hyunki Lim and Dae-Won Kim
Entropy 2020, 22(4), 395; https://doi.org/10.3390/e22040395 - 30 Mar 2020
Cited by 7 | Viewed by 2224
Abstract
The rapid growth of Internet technologies has led to an enormous increase in the number of electronic documents used worldwide. To organize and manage such big data of unstructured documents effectively and efficiently, text categorization has been employed in recent decades. To conduct text categorization tasks, documents are usually represented using the bag-of-words model, owing to its simplicity. In this representation, feature selection becomes essential for text classification because the full vocabulary induces an enormous feature space for the documents. In this paper, we propose a new feature selection method that considers term similarity to avoid selecting redundant terms. Term similarity is measured using a general method such as mutual information and serves as a second measure in feature selection alongside term ranking. To balance term ranking against term similarity, we use a quadratic-programming-based numerical optimization approach. Experimental results demonstrate that considering term similarity is effective and yields higher accuracy than conventional methods.
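A minimal sketch of this quadratic-programming idea follows: choose feature weights w on the simplex that maximize relevance r @ w minus a pairwise-similarity penalty w @ S @ w, then keep the highest-weighted terms. The synthetic data, the binarization step, and the MI-based similarity matrix are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: balance term relevance against pairwise term similarity via QP.
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

X, y = make_classification(n_samples=300, n_features=12,
                           n_informative=4, random_state=0)
Xb = (X > X.mean(axis=0)).astype(int)           # binarize as a bag-of-words stand-in

r = mutual_info_classif(Xb, y, discrete_features=True, random_state=0)
d = Xb.shape[1]
S = np.array([[mutual_info_score(Xb[:, i], Xb[:, j]) for j in range(d)]
              for i in range(d)])               # pairwise term similarity

res = minimize(lambda w: -(r @ w) + w @ S @ w,  # negate relevance to maximize it
               x0=np.full(d, 1.0 / d),
               bounds=[(0, 1)] * d,
               constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
print("top terms by weight:", np.argsort(res.x)[::-1][:5])
```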

17 pages, 477 KiB  
Article
CDE++: Learning Categorical Data Embedding by Enhancing Heterogeneous Feature Value Coupling Relationships
by Bin Dong, Songlei Jian and Ke Zuo
Entropy 2020, 22(4), 391; https://doi.org/10.3390/e22040391 - 29 Mar 2020
Viewed by 2254
Abstract
Categorical data are ubiquitous in machine learning tasks, and the representation of categorical data plays an important role in learning performance. The heterogeneous coupling relationships between features and feature values reflect characteristics of real-world categorical data that need to be captured in the representations. This paper proposes an enhanced categorical data embedding method, CDE++, which captures these heterogeneous feature value coupling relationships in the representations. Based on information theory and the hierarchical couplings defined in our previous work CDE (Categorical Data Embedding by learning hierarchical value coupling), CDE++ adopts mutual information and margin entropy to capture feature couplings and designs a hybrid clustering strategy to capture multiple types of feature value clusters. Moreover, an autoencoder is used to learn non-linear couplings between features and value clusters. The categorical data embeddings generated by CDE++ are low-dimensional numerical vectors that can be applied directly to clustering and classification, achieving the best performance compared with other categorical representation learning methods. Parameter sensitivity and scalability tests are also conducted to demonstrate the superiority of CDE++.
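The general value-coupling idea can be illustrated with a toy sketch: describe each categorical value by its co-occurrence profile with other features' values, cluster the profiles, and embed each object by the cluster memberships of its values. The sizes and the single k-means pass are assumptions; CDE++'s hybrid clustering and autoencoder stages are not reproduced here.

```python
# Toy value-coupling embedding for categorical data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
data = rng.integers(0, 4, (200, 5))             # 200 objects, 5 categorical features

# One profile per (feature, value) pair: its co-occurrence distribution
# over all values of every feature.
profiles, index = [], {}
for f in range(data.shape[1]):
    for v in np.unique(data[:, f]):
        rows = data[data[:, f] == v]
        prof = np.concatenate([np.bincount(rows[:, g], minlength=4) / len(rows)
                               for g in range(data.shape[1])])
        index[(f, v)] = len(profiles)
        profiles.append(prof)

k = 6                                           # illustrative number of value clusters
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(np.array(profiles))

# Embed each object by counting how its values spread over the value clusters.
embed = np.zeros((len(data), k))
for i, row in enumerate(data):
    for f, v in enumerate(row):
        embed[i, labels[index[(f, v)]]] += 1
print("embedding shape:", embed.shape)
```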

11 pages, 593 KiB  
Article
Weighted Mean Squared Deviation Feature Screening for Binary Features
by Gaizhen Wang and Guoyu Guan
Entropy 2020, 22(3), 335; https://doi.org/10.3390/e22030335 - 14 Mar 2020
Cited by 3 | Viewed by 2196
Abstract
In this study, we propose a novel model-free feature screening method for ultrahigh-dimensional binary features in binary classification, called weighted mean squared deviation (WMSD). Compared to the chi-square statistic and mutual information, WMSD provides more opportunities to binary features with occurrence probabilities near 0.5. In addition, the asymptotic properties of the proposed method are theoretically investigated under the assumption log p = o(n). In practice, the number of features is selected by a Pearson correlation coefficient method according to the power-law distribution property. Lastly, an empirical study of Chinese text classification illustrates that the proposed method performs well when the dimension of the selected features is relatively small.
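The screening workflow itself is simple and can be sketched as below: compute one statistic per binary feature and keep the top-ranked features. The statistic used here, a class-proportion-weighted squared deviation of class-conditional feature frequencies, is an illustrative assumption rather than the paper's exact WMSD definition.

```python
# Sketch of marginal feature screening for ultrahigh-dimensional binary data.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 2000                                # ultrahigh-dimensional binary design
X = (rng.random((n, p)) < 0.3).astype(int)
y = rng.integers(0, 2, n)

stat = np.zeros(p)
overall = X.mean(axis=0)
for c in (0, 1):
    w = np.mean(y == c)                         # class proportion as weight
    stat += w * (X[y == c].mean(axis=0) - overall) ** 2

keep = np.argsort(stat)[::-1][:50]              # illustrative screening cutoff
print("screened feature indices:", keep[:10])
```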

15 pages, 316 KiB  
Article
Information Theoretic Multi-Target Feature Selection via Output Space Quantization
by Konstantinos Sechidis, Eleftherios Spyromitros-Xioufis and Ioannis Vlahavas
Entropy 2019, 21(9), 855; https://doi.org/10.3390/e21090855 - 31 Aug 2019
Cited by 11 | Viewed by 2899
Abstract
A key challenge in information-theoretic feature selection is to estimate mutual information expressions that capture three desirable terms: the relevance of a feature to the output, and the redundancy and the complementarity between groups of features. The challenge becomes more pronounced in multi-target problems, where the output space is multi-dimensional. Our work presents an algorithm that captures these three desirable terms and is suitable for the well-known multi-target prediction settings of multi-label/dimensional classification and multivariate regression. We achieve this by combining two ideas: deriving low-order information-theoretic approximations for the input space, and using quantization algorithms to derive low-dimensional approximations of the output space. Under this framework we derive a novel criterion, Group-JMI-Rand, which captures various high-order target interactions. An extensive experimental study shows that the suggested criterion achieves competitive performance against various other information-theoretic feature selection criteria suggested in the literature.
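The output-space quantization step can be sketched minimally: cluster the multi-dimensional target into a single discrete variable, then rank features by mutual information with that quantized output. The synthetic data and the single k-means quantization are illustrative; the paper's Group-JMI-Rand criterion aggregates many random quantizations rather than one.

```python
# Sketch: quantize a multi-dimensional output, then score features by MI.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 30))                  # input features
Y = X[:, :3] @ rng.normal(size=(3, 5)) + 0.1 * rng.normal(size=(400, 5))  # 5 targets

q = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(Y)  # quantized output
mi = mutual_info_classif(X, q, random_state=0)  # relevance to the quantized target
print("top features:", np.argsort(mi)[::-1][:5])
```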
