Open Access Article
Information 2017, 8(4), 159; doi:10.3390/info8040159

sCwc/sLcc: Highly Scalable Feature Selection Algorithms

1 Graduate School of Applied Informatics, University of Hyogo, Kobe 651-2197, Japan
2 Information Networking Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
3 Computer Centre, Gakushuin University, Tokyo 171-0031, Japan
4 Institute of Economic Research, Chiba University of Commerce, Chiba 272-8512, Japan
5 Center for Digital Humanities, University of California Los Angeles, Los Angeles, CA 90095, USA
These authors contributed equally to this work.
* Author to whom correspondence should be addressed.
Received: 31 October 2017 / Revised: 1 December 2017 / Accepted: 2 December 2017 / Published: 6 December 2017
(This article belongs to the Special Issue Feature Selection for High-Dimensional Data)

Abstract

Feature selection is a useful tool for identifying which features, or attributes, of a dataset cause or explain the phenomena that the dataset describes, and for improving the efficiency and accuracy of learning algorithms that discover such phenomena. Consequently, feature selection has been studied intensively in machine learning research. However, while feature selection algorithms with excellent accuracy have been developed, they are seldom applied to high-dimensional data, because such data usually include too many instances and features for traditional feature selection algorithms to process efficiently. To eliminate this limitation, we improved the run-time performance of two of the most accurate feature selection algorithms known in the literature. The result is two accurate and fast algorithms, namely sCwc and sLcc. Multiple experiments with real social media datasets have demonstrated that our algorithms improve remarkably on the performance of the original algorithms. For example, on one dataset with 15,568 instances and 15,741 features, and another with 200,569 instances and 99,672 features, sCwc performed feature selection in 1.4 seconds and 405 seconds, respectively. In addition, sLcc has turned out to be as fast as sCwc on average. This is a remarkable improvement, because the original algorithms are estimated to need several hours to dozens of days to process the same datasets. We also introduce a fast implementation of our algorithms: sCwc requires no tuning parameter, while sLcc takes a threshold parameter, which can be used to control the number of features that the algorithm selects.
Keywords: feature selection; consistency; high-dimensional data; scalability
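The abstract describes the algorithms only in general terms; the actual sCwc/sLcc procedures are given in the full text. As a rough illustration of consistency-based feature selection in general — keep a feature subset under which the feature-value patterns still determine the class labels, with a threshold controlling how much inconsistency is tolerated — here is a minimal, hypothetical sketch. The function names, the inconsistency measure, and the greedy one-pass elimination strategy are our own illustrative assumptions, not the paper's algorithms:

```python
from collections import defaultdict

def inconsistency_rate(X, y, features):
    """Fraction of instances whose feature-value pattern fails to
    determine the class label (0.0 means fully consistent)."""
    groups = defaultdict(lambda: defaultdict(int))
    for row, label in zip(X, y):
        key = tuple(row[f] for f in features)
        groups[key][label] += 1
    # Within each pattern group, every instance outside the majority
    # class is counted as inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values())
                       for c in groups.values())
    return inconsistent / len(y)

def backward_select(X, y, threshold=0.0):
    """Greedy backward elimination: drop a feature whenever the
    inconsistency rate of the remaining subset stays within `threshold`."""
    selected = list(range(len(X[0])))
    for f in sorted(selected):  # one fixed-order pass, for illustration
        trial = [g for g in selected if g != f]
        if trial and inconsistency_rate(X, y, trial) <= threshold:
            selected = trial
    return selected
```

In this sketch the threshold plays the role the abstract describes for sLcc's parameter: raising it tolerates more label inconsistency and therefore lets the search discard more features.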

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Share & Cite This Article

MDPI and ACS Style

Shin, K.; Kuboyama, T.; Hashimoto, T.; Shepard, D. sCwc/sLcc: Highly Scalable Feature Selection Algorithms. Information 2017, 8, 159.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Information (EISSN 2078-2489) is published by MDPI AG, Basel, Switzerland.