Article

An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection

1 Center of Toxins, Immune-Response and Cell Signaling (CeTICS), Laboratório de Ciclo Celular, Instituto Butantan, Butantã, São Paulo-SP 05503-900, Brazil
2 Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo-SP 05503-900, Brazil
* Author to whom correspondence should be addressed.
Entropy 2020, 22(4), 492; https://doi.org/10.3390/e22040492
Received: 11 March 2020 / Revised: 3 April 2020 / Accepted: 4 April 2020 / Published: 24 April 2020
(This article belongs to the Special Issue Information-Theoretical Methods in Data Mining)
In Machine Learning, feature selection is an important step in classifier design. It consists of finding a subset of features that is optimal for a given cost function. One way to solve feature selection is to organize all possible feature subsets into a Boolean lattice and to exploit the fact that the costs of chains in that lattice describe U-shaped curves. Minimization of such a cost function is known as the U-curve problem. Recently, a study proposed U-Curve Search (UCS), an optimal algorithm for that problem, which was successfully used for feature selection. However, despite the algorithm's optimality, in computational assays the time required by UCS was exponential in the number of features. Here, we show that this scalability issue arises because the U-curve problem is NP-hard. We then introduce Parallel U-Curve Search (PUCS), a new algorithm for the U-curve problem. In PUCS, we present a novel way to partition the search space into smaller Boolean lattices, rendering the algorithm highly parallelizable. We also provide computational assays with both synthetic data and Machine Learning datasets, in which the performance of PUCS was assessed against UCS and other gold-standard feature selection algorithms.
Keywords: machine learning; supervised learning; information theory; mean conditional entropy; feature selection; classifier design; Support-Vector Machine; U-curve problem; Boolean lattice
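To make the cost function concrete, the sketch below estimates the mean conditional entropy of the label given a feature subset from samples, and pairs it with a brute-force search over the Boolean lattice of subsets, i.e., the exponential baseline that algorithms such as UCS and PUCS aim to improve upon. This is an illustrative simplification, not the paper's implementation: all names are ours, and the actual cost function used in the article includes safeguards (e.g., penalization of poorly estimated conditional distributions) that this sketch omits.

```python
import math
from collections import Counter
from itertools import combinations

def mean_conditional_entropy(samples, labels, subset):
    """Estimate H(Y | X_S) from (sample, label) pairs for a feature subset S.

    Each observed value x of the selected features contributes
    P(x) * H(Y | X_S = x), with probabilities estimated by relative frequency.
    """
    n = len(samples)
    groups = {}
    for x, y in zip(samples, labels):
        key = tuple(x[i] for i in subset)
        groups.setdefault(key, []).append(y)
    h = 0.0
    for ys in groups.values():
        p_x = len(ys) / n
        counts = Counter(ys)
        h_y_given_x = -sum((c / len(ys)) * math.log2(c / len(ys))
                           for c in counts.values())
        h += p_x * h_y_given_x
    return h

def exhaustive_search(samples, labels, n_features):
    """Brute-force minimization over the Boolean lattice of all 2^n subsets.

    Iterating subsets by increasing size means ties are resolved in favor
    of smaller subsets. This is the exponential baseline; UCS and PUCS
    prune the lattice using the U-shape of chain costs instead.
    """
    best_subset, best_cost = (), float("inf")
    for k in range(n_features + 1):
        for subset in combinations(range(n_features), k):
            cost = mean_conditional_entropy(samples, labels, subset)
            if cost < best_cost:
                best_subset, best_cost = subset, cost
    return best_subset, best_cost
```

For example, on a toy dataset where the label copies feature 0 and feature 1 is noise, the search selects the subset containing only feature 0, whose estimated conditional entropy is zero.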
MDPI and ACS Style

Estrela, G.; Gubitoso, M.D.; Ferreira, C.E.; Barrera, J.; Reis, M.S. An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection. Entropy 2020, 22, 492. https://doi.org/10.3390/e22040492