Open Access Article
Information 2016, 7(1), 6; doi:10.3390/info7010006

A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

1
Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
2
Guangxi Colleges and Universities Key Laboratory of Cloud Computing and Complex Systems, Guilin University of Electronic Technology, Guilin 541004, China
3
Key Laboratory of Cognitive Radio and Information Processing, Guilin University of Electronic Technology, Guilin 541004, China
*
Author to whom correspondence should be addressed.
Academic Editors: Yong Yu and Yu Wang
Received: 20 December 2015 / Revised: 27 January 2016 / Accepted: 29 January 2016 / Published: 15 February 2016
(This article belongs to the Special Issue Recent Advances of Big Data Technology)

Abstract

With the rapid growth of data volumes in network traffic classification, selecting traffic features efficiently has become a major challenge. Although a number of traditional feature selection methods built on the Hadoop MapReduce framework have been proposed, their execution time remains unsatisfactory because the processing involves numerous iterative computations. To address this issue, this paper proposes an efficient feature selection method for network traffic based on Spark, a newer parallel computing framework. In our approach, the complete feature set is first preprocessed using the Fisher score, and a sequential forward search strategy is used to generate candidate subsets. The optimal feature subset is then selected through successive iterations on the Spark computing framework. The implementation demonstrates that, while maintaining classification accuracy, our method reduces the time cost of modeling and classification and significantly improves the execution efficiency of feature selection.
Keywords: feature selection; Fisher score; sequential forward search; MapReduce; Spark
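
The two steps named in the abstract, Fisher-score preprocessing followed by a sequential forward search, can be illustrated with a minimal single-machine sketch. This is not the authors' Spark implementation: the function names, the `top_k` cutoff, the nearest-centroid evaluator, and the toy flow records are all assumptions made for illustration, and the Fisher score uses the common between-class over within-class scatter definition, which may differ in detail from the paper's.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """One score per feature: between-class scatter / within-class scatter."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between, within = 0.0, 0.0
        for rows in by_class.values():
            values = [r[j] for r in rows]
            mean_k = sum(values) / len(values)
            var_k = sum((v - mean_k) ** 2 for v in values) / len(values)
            between += len(values) * (mean_k - overall_mean) ** 2
            within += len(values) * var_k
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfect separator, or a constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier (hypothetical evaluator)."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, j in enumerate(features):
            acc[i] += row[j]
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def distance(point, label):
        return sum((p - q) ** 2 for p, q in zip(point, centroids[label]))

    correct = 0
    for row, label in zip(X, y):
        point = [row[j] for j in features]
        predicted = min(centroids, key=lambda c: distance(point, c))
        correct += predicted == label
    return correct / len(X)


def forward_search(X, y, top_k, evaluate=centroid_accuracy):
    """Keep the top_k features by Fisher score, then grow a subset greedily."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    remaining = ranked[:top_k]
    selected, best = [], 0.0
    while remaining:
        step_best, step_feature = best, None
        for j in remaining:  # Fisher-rank order, so ties favour higher-ranked features
            accuracy = evaluate(X, y, selected + [j])
            if accuracy > step_best:
                step_best, step_feature = accuracy, j
        if step_feature is None:  # no candidate improves the subset: stop
            break
        selected.append(step_feature)
        remaining.remove(step_feature)
        best = step_best
    return selected, best


# Toy flow records (made up): feature 0 separates the classes,
# feature 1 is noisy, feature 2 is constant.
X = [[1.0, 5.0, 3.0], [1.2, 1.0, 3.0], [0.8, 4.0, 3.0],
     [5.0, 2.0, 3.0], [5.2, 5.0, 3.0], [4.8, 1.0, 3.0]]
y = ["web", "web", "web", "p2p", "p2p", "p2p"]
selected, accuracy = forward_search(X, y, top_k=2)
print(selected, accuracy)
```

On this toy data the search keeps only feature 0, because adding the noisy feature cannot improve an already perfect training accuracy. Each pass of the `while` loop re-evaluates every remaining candidate, which is the kind of repeated computation the abstract argues is costly under MapReduce and better suited to Spark.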

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

MDPI and ACS Style

Wang, Y.; Ke, W.; Tao, X. A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark. Information 2016, 7, 6.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Information, EISSN 2078-2489, published by MDPI AG, Basel, Switzerland