Open Access Article

Synthetic Minority Oversampling Technique for Optimizing Classification Tasks in Botnet and Intrusion-Detection-System Datasets

1 Instituto Politecnico Nacional, ESIME Culhuacan, Mexico City 04440, Mexico
2 Department of Electrical Engineering, Faculty of Technology, University of Brasilia (UnB), Campus Universitario Darcy Ribeiro, Brasilia CEP 70910-900, Brazil
* Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(3), 794; https://doi.org/10.3390/app10030794
Received: 8 December 2019 / Revised: 16 January 2020 / Accepted: 17 January 2020 / Published: 22 January 2020
Security is currently an active research topic because of its impact on everyday information infrastructure. Machine-learning solutions have improved on classical detection practices, but detection tasks often work with imbalanced data, since the number of instances representing one or several malicious classes can vary significantly. With highly imbalanced data, classification models typically achieve high precision on the majority class, while minority classes are treated as noise because of the little information they provide. Well-known datasets used for malware-based analyses, such as botnet-attack and Intrusion Detection System (IDS) datasets, consist mainly of logs, records, or network-traffic captures; because these are raw data, they are not an ideal source of evidence. For example, the number of abnormal and constant connections generated by botnets or intruders within a network is considerably smaller than the number generated by benign applications. In most cases, inadequate dataset design can degrade a learning algorithm, resulting in overfitting and poor classification rates. To address these problems, we propose a resampling method based on the Synthetic Minority Oversampling Technique (SMOTE) combined with a grid-search optimization procedure. This work demonstrates improved classification results on botnet and IDS datasets by merging synthetically generated balanced data and tuning different supervised-learning algorithms.
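The abstract names SMOTE as the resampling step. As a minimal sketch, assuming numeric feature vectors and Euclidean distance, the core SMOTE idea (create a synthetic minority sample on the line segment between a minority sample and one of its k nearest minority-class neighbours) can be written in plain Python. This is the textbook technique, not the authors' implementation: the function names `smote` and `k_nearest`, the default `k=5`, and the choice of drawing base samples uniformly at random are illustrative assumptions, and the grid-search tuning step is not shown.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def k_nearest(points: Sequence[Vector], idx: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[idx] (Euclidean), excluding itself."""
    base = points[idx]
    dists = [(math.dist(base, p), j) for j, p in enumerate(points) if j != idx]
    dists.sort()  # ascending distance; ties broken by index
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Hypothetical helper: generate n_new synthetic minority samples by
    interpolating between a minority sample and a random one of its
    k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Neighbour lists are computed once, using minority samples only.
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))  # base sample
        j = rng.choice(neighbours[i])     # one of its k nearest neighbours
        gap = rng.random()                # uniform in [0, 1)
        x, nn = minority[i], minority[j]
        # New point lies on the segment from x towards nn.
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy data: 20 "benign" flows vs. 4 "botnet" flows, 2 features each.
    majority = [[float(i), float(i % 3)] for i in range(20)]
    minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 12.0], [12.0, 11.0]]
    new = smote(minority, n_new=len(majority) - len(minority), k=3)
    balanced_minority = minority + new
    print(len(majority), len(balanced_minority))  # prints: 20 20
```

In a pipeline like the one the abstract outlines, the synthetic samples would normally be added to the training split only, after which the classifiers are tuned (for example by grid search). Maintained implementations exist, such as `SMOTE` in the imbalanced-learn library.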
Keywords: imbalanced data; datasets; botnet detection; synthetic minority oversampling technique; machine learning; predictive models

Figure 1

MDPI and ACS Style

Gonzalez-Cuautle, D.; Hernandez-Suarez, A.; Sanchez-Perez, G.; Toscano-Medina, L.K.; Portillo-Portillo, J.; Olivares-Mercado, J.; Perez-Meana, H.M.; Sandoval-Orozco, A.L. Synthetic Minority Oversampling Technique for Optimizing Classification Tasks in Botnet and Intrusion-Detection-System Datasets. Appl. Sci. 2020, 10, 794.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
