# Towards a Reliable Comparison and Evaluation of Network Intrusion Detection Systems Based on Machine Learning Approaches


## Abstract


## 1. Introduction

## 2. Related Work

#### Recent Works and Methods

## 3. Network Datasets for NIDSs Evaluation

#### 3.1. UGR’16 Dataset

## 4. Methodology

#### 4.1. Feature Engineering (FE)

`ts`) into new observations that comprise valuable information from the original variables in the form of counters. For example, the `protocol` raw variable is transformed into two new ones: `protocol_tcp` and `protocol_udp`. These new variables count how many times the UDP and TCP protocols, highlighted in bold green and red in the figure, respectively, are seen in a minute. In the case of the `packets` raw variable, we bin it into what we consider a very low, low, medium, high or very high value according to its distribution. This allows transforming the original variable into five new ones: `npackets_verylow`, `npackets_low`, `npackets_medium`, `npackets_high` and `npackets_veryhigh`. Figure 2 shows an example of this transformation, where packet counts in the interval $[0,4)$ are considered very low while those in the interval $[4,21)$ are low. The same approach is used for the `bytes` variable, but with different interval magnitudes. Of course, such intervals can be customized on demand.

The observation `output-20160727t1343` comprises three different class labels: `background`, `dos` and `nerisbotnet`. In that case, the most frequent one is chosen as the class or label associated with that specific observation, i.e., `background` in the illustrated example.
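As a sketch of this FaaC-style aggregation, the following Python fragment builds counters for one minute of flows and picks the majority label. The bin edges beyond the two intervals quoted above ($[0,4)$ and $[4,21)$), the flow-record field names, and the function name are illustrative assumptions, not the paper's implementation (which relies on the FCParser tool):

```python
from collections import Counter

# Hypothetical bin edges for the packets counter; only [0, 4) ("very low")
# and [4, 21) ("low") come from the text, the rest are assumed.
PACKET_BINS = [(0, 4, "verylow"), (4, 21, "low"), (21, 100, "medium"),
               (100, 1000, "high"), (1000, float("inf"), "veryhigh")]

def faac_transform(flows):
    """Aggregate the flows seen in one minute into a single observation of
    counters (Feature-as-a-Counter), plus the majority class label."""
    counters = Counter()
    labels = Counter()
    for flow in flows:
        # Categorical variable -> one counter per category.
        counters[f"protocol_{flow['protocol'].lower()}"] += 1
        # Numerical variable -> binned into five counters.
        for lo, hi, name in PACKET_BINS:
            if lo <= flow["packets"] < hi:
                counters[f"npackets_{name}"] += 1
                break
        labels[flow["label"]] += 1
    # The most frequent label becomes the class of the observation.
    return dict(counters), labels.most_common(1)[0][0]

flows = [
    {"protocol": "TCP", "packets": 2, "label": "background"},
    {"protocol": "UDP", "packets": 7, "label": "background"},
    {"protocol": "TCP", "packets": 30, "label": "dos"},
]
counters, label = faac_transform(flows)
```

Applied to these three flows, the sketch yields `protocol_tcp = 2`, `protocol_udp = 1`, one flow in each of the very low, low and medium packet bins, and `background` as the majority label.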

#### 4.2. Feature Selection (FS)

#### 4.3. Data Pre-processing (DP)

#### 4.4. Hyper-parameters Selection (HS)

#### 4.5. Machine Learning (ML) Models

- Multinomial Logistic Regression (LR). It is the simplest linear model [40] and has been widely applied to diverse tasks. For a binary classification problem, LR models the dependent variable ${y}_{i}\in [0,1]$ as a linear combination of the independent variables ${\mathit{x}}_{\mathit{i}}$, as shown in Equation (4),$${y}_{i}=f({\beta}_{0}+{\beta}_{1}{x}_{i1}+{\beta}_{2}{x}_{i2}+\dots +{\beta}_{p}{x}_{ip}),$$where the coefficients $\mathit{\beta}$ are estimated by minimizing the squared error in Equation (5),$${\mathit{\beta}}^{*}=\underset{\mathit{\beta}}{\mathrm{arg\,min}}\phantom{\rule{5.0pt}{0ex}}{\left|\left|\mathit{y}-f\left(\mathit{\beta}{X}^{T}\right)\right|\right|}_{2}^{2}$$Since the problem addressed consists of several ($>2$) labels or classes, a One-vs-All approach was used, in such a way that K logistic regression models are trained (K being the number of labels or classes), each one solving the corresponding binary classification problem, and the overall prediction of the LR model is computed using the softmax function depicted in Equation (2).
- Support Vector Machine (SVC). It is a kernel-based method that uses a kernel function (radial basis, linear, polynomial, or any other) to map the original input space into a new space where predictions can be made more accurately. In this sense, the performance of an SVC is determined by the type of kernel function. In this work, the Linear Function (SVC-L) and Radial Basis Function (SVC-RBF) were tested. Linear and Gaussian kernel functions are depicted in Equations (6) and (7), respectively.$$K(x,z)={x}^{T}z$$$$K(x,z)=\mathrm{exp}\left(\frac{-{\left|\left|x-z\right|\right|}^{2}}{2{\sigma}^{2}}\right)$$
- Random Forest (RF). It is a tree-based non-linear bagging model for which multiple decision trees are fitted to different views of the observed data [41]. In this sense, each decision tree is fitted to a subset of the N samples (randomly sampled). Moreover, a random subset of the P input features is used within each node of a tree to determine which of them is used to expand it further. Overall RF predictions are computed by calculating the average (or weighted average according to the performance of each single decision tree on the out-of-bag samples) of the individual predictions provided by the multiple decision trees.
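The two kernel functions in Equations (6) and (7) translate directly into code; a minimal sketch (function names and the test vectors are ours, not the paper's):

```python
import numpy as np

def linear_kernel(x, z):
    # Equation (6): K(x, z) = x^T z
    return x @ z

def rbf_kernel(x, z, sigma=1.0):
    # Equation (7): K(x, z) = exp(-||x - z||^2 / (2 * sigma^2)),
    # where sigma is the Gaussian kernel width.
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])
k_lin = linear_kernel(x, z)  # orthogonal vectors give 0 under the linear kernel
k_rbf = rbf_kernel(x, x)     # identical vectors give 1 under the RBF kernel
```

Note that the RBF kernel always returns values in $(0,1]$, reaching 1 only when $x = z$, which is why it acts as a similarity measure between observations.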

#### 4.6. Performance Metrics (PM)

- Recall (R). It is also known as sensitivity or TPR (True Positive Rate) and represents the ability of the classifier to detect all the positive cases, as depicted by Equation (8),$$Recall\left(R\right)=\frac{TP}{TP+FN}$$
- Precision (P). It evaluates the ability of the classifier to avoid misclassifying negative samples as positive (i.e., to avoid false positives), and it is defined in Equation (9),$$Precision\left(P\right)=\frac{TP}{TP+FP}$$
- F1 score. It is the harmonic mean of the previous two values, as depicted in Equation (10). A high F1 score value (close to 1) means an excellent performance of the classifier.$$F1=\frac{2\times R\times P}{R+P}$$
- AUC. The AUC is a quantitative measurement of the area under the Receiver Operating Characteristic (ROC) curve, which is widely used as a performance metric in NIDSs in particular and IDSs in general [10,42]. The ROC curve compares the evolution of the TP rate versus the FP rate for different values of the classification threshold. Consequently, the AUC is a performance indicator such that a classifier with AUC = 1 behaves perfectly, i.e., it correctly classifies all the observations, while a random classifier obtains an AUC value around 0.5.
- Weighted average. For each class $i=1,\dots ,C$, $weighted\_avg\left(P{M}_{i}\right)$ computes the average of each metric $P{M}_{i}$ previously introduced, weighted by the corresponding support ${q}_{i}$ (the number of true observations of each class), with Q being the total number of observations. These weighted metrics are defined as shown in Equation (11).$$weighted\_avg\left(P{M}_{i}\right)=\frac{{\sum}_{i=1}^{C}P{M}_{i}\times {q}_{i}}{Q}$$
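To make Equations (8)–(11) concrete, the metrics can be computed from raw confusion-matrix counts as follows (a minimal sketch; function names and the example counts are ours):

```python
def precision_recall_f1(tp, fp, fn):
    """Equations (8)-(10) from true positive, false positive and
    false negative counts."""
    recall = tp / (tp + fn)       # Equation (8)
    precision = tp / (tp + fp)    # Equation (9)
    f1 = 2 * recall * precision / (recall + precision)  # Equation (10)
    return precision, recall, f1

def weighted_avg(per_class_metric, supports):
    """Equation (11): per-class metric averaged with class supports q_i
    as weights, Q being the total number of observations."""
    Q = sum(supports)
    return sum(pm * q for pm, q in zip(per_class_metric, supports)) / Q

# Hypothetical counts for one class: 8 TP, 2 FP, 2 FN.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
# Hypothetical two-class example: metrics 0.8 and 0.6 with supports 3 and 1.
w = weighted_avg([0.8, 0.6], [3, 1])
```

With these counts, precision, recall and F1 all equal 0.8, and the weighted average is 0.75, i.e., the majority class dominates the aggregate, which is exactly why the weighted average is reported alongside the per-class metrics for an imbalanced dataset such as UGR'16.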

## 5. Experimentation: UGR’16 as a Case Study

#### 5.1. Experimental Environment

#### 5.2. Results and Discussion

- DoS: Precision (SVC-RBF vs SVC-L), Recall (RF vs SVC-L), F1 (LR vs SVC-L, RF vs SVC-L, and LR vs RF), and AUC (LR vs SVC-L);
- Botnet: Precision (LR vs SVC-L), Recall (LR vs SVC-L and LR vs SVC-RBF), F1 (LR vs SVC-L and LR vs SVC-RBF), AUC (LR vs SVC-L and LR vs SVC-RBF);
- Scan: Recall (LR vs SVC-L, RF vs SVC-L and LR vs RF), F1 (RF vs SVC-L), AUC (LR vs SVC-L and RF vs SVC-L);
- Spam: Recall (LR vs SVC-L).

#### Comparison of the Proposed ML-based NIDS with a Previously Published One

## 6. Conclusions and Future Work

## Supplementary Materials

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

| Abbreviation | Definition |
|---|---|
| ABC | Artificial Bee Colony |
| AFS | Artificial Fish Swarm |
| AIDS | Anomaly-based IDS |
| AUC | Area Under the Curve |
| BGP | Border Gateway Protocol |
| DD | Derived Dataset |
| DNN | Deep Neural Network |
| DoS | Denial of Service |
| DT | Decision Tree |
| FaaC | Features as a Counter |
| FE | Feature Engineering |
| FNR | False Negative Rate |
| FPR | False Positive Rate |
| FSR | Forward Selection Ranking |
| GAN | Generative Adversarial Network |
| GBDT | Gradient Boosted DT |
| HS | Hyper-parameter Selection |
| HTTP | Hypertext Transfer Protocol |
| ICMP | Internet Control Message Protocol |
| ICT | Information and Communication Technology |
| IDS | Intrusion Detection System |
| IGMP | Internet Group Management Protocol |
| IoT | Internet of Things |
| ISP | Internet Service Provider |
| LASSO | Least Absolute Shrinkage and Selection Operator |
| LR | Logistic Regression |
| LSTM | Long Short-Term Memory |
| ML | Machine Learning |
| MSNM | Multivariate Statistical Network Monitoring |
| NIDS | Network IDS |
| PLS | Partial Least Squares |
| PM | Performance Metric |
| RF | Random Forest |
| RNN | Recurrent Neural Network |
| ROC | Receiver Operating Characteristic |
| SIDS | Signature-based IDS |
| SMTP | Simple Mail Transfer Protocol |
| SNMP | Simple Network Management Protocol |
| SSH | Secure Shell |
| SVM | Support Vector Machine |
| TCP | Transmission Control Protocol |
| ToS | Type of Service |
| TPR | True Positive Rate |
| UDP | User Datagram Protocol |

## References

- ENISA. ENISA Threat Landscape Report 2017. Available online: https://www.enisa.europa.eu/publications/enisa-threat-landscape-report-2017 (accessed on 22 September 2019).
- Chaabouni, N.; Mosbah, M.; Zemmari, A.; Sauvignac, C.; Faruki, P. Network Intrusion Detection for IoT Security Based on Learning Techniques. IEEE Commun. Surv. Tutor.
**2019**, 21, 2671–2701. [Google Scholar] [CrossRef] - ENISA. ENISA Threat Landscape Report 2018. Available online: https://www.enisa.europa.eu/publications/enisa-threat-landscape-report-2018 (accessed on 22 September 2019).
- Di Pietro, R.; Mancini, L.V. Intrusion Detection Systems; Springer: New York, NY, USA, 2008; Volume 38, p. XIV, 250. [Google Scholar]
- García-Teodoro, P.; Díaz-Verdejo, J.; Maciá-Fernández, G.; Vázquez, E. Anomaly-based network intrusion detection: Techniques, systems and challenges. Comput. Secur.
**2009**, 28, 18–28. [Google Scholar] [CrossRef] - University of California. KDD Cup 1999 Dataset. Available online: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (accessed on 20 September 2019).
- Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; pp. 1–6. [Google Scholar]
- Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 10–12 November 2015; pp. 1–6. [Google Scholar]
- Moustafa, N.; Slay, J. The Evaluation of Network Anomaly Detection Systems: Statistical Analysis of the UNSW-NB15 Data Set and the Comparison with the KDD99 Data Set. Inf. Sec. J. Glob. Perspect.
**2016**, 25, 18–31. [Google Scholar] [CrossRef] - Bhuyan, M.H.; Bhattacharyya, D.K.; Kalita, J.K. Network Anomaly Detection: Methods, Systems and Tools. IEEE Commun. Surv. Tutor.
**2014**, 16, 303–336. [Google Scholar] [CrossRef] - Liu, H.; Lang, B. Machine Learning and Deep Learning Methods for Intrusion Detection Systems: A Survey. Appl. Sci.
**2019**, 9, 4396. [Google Scholar] [CrossRef] [Green Version] - Camacho, J.; Maciá-Fernández, G.; Fuentes-García, N.M.; Saccenti, E. Semi-supervised Multivariate Statistical Network Monitoring for Learning Security Threats. IEEE Trans. Inf. Forensics Secur.
**2019**, 14, 2179–2189. [Google Scholar] [CrossRef] [Green Version] - Siddique, K.; Akhtar, Z.; Aslam Khan, F.; Kim, Y. KDD Cup 99 Data Sets: A Perspective on the Role of Data Sets in Network Intrusion Detection Research. Computer
**2019**, 52, 41–51. [Google Scholar] [CrossRef] - Maciá-Fernández, G.; Camacho, J.; Magán-Carrión, R.; García-Teodoro, P.; Therón, R. UGR‘16: A new dataset for the evaluation of cyclostationarity-based network IDSs. Comput. Secur.
**2018**, 73, 411–424. [Google Scholar] [CrossRef] [Green Version] - Rathore, M.M.; Ahmad, A.; Paul, A. Real time intrusion detection system for ultra-high-speed big data environments. J. Supercomput.
**2016**, 72, 3489–3510. [Google Scholar] [CrossRef] - Haider, W.; Hu, J.; Slay, J.; Turnbull, B.P.; Xie, Y. Generating realistic intrusion detection system dataset based on fuzzy qualitative modeling. J. Netw. Comput. Appl.
**2017**, 87, 185–192. [Google Scholar] [CrossRef] - Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP), Funchal, Portugal, 22–24 January 2018; pp. 108–116. [Google Scholar]
- Li, Z.; Rios, A.L.G.; Xu, G.; Trajković, L. Machine Learning Techniques for Classifying Network Anomalies and Intrusions. In Proceedings of the 2019 IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, 26–29 May 2019; pp. 1–5. [Google Scholar]
- Le, T.T.H.; Kim, Y.; Kim, H. Network Intrusion Detection Based on Novel Feature Selection Model and Various Recurrent Neural Networks. Appl. Sci.
**2019**, 9, 1392. [Google Scholar] [CrossRef] [Green Version] - Cordero, C.G.; Vasilomanolakis, E.; Wainakh, A.; Mühlhäuser, M.; Nadjm-Tehrani, S. On generating network traffic datasets with synthetic attacks for intrusion detection. arXiv
**2019**, arXiv:1905.00304. [Google Scholar] - Kabir, E.; Hu, J.; Wang, H.; Zhuo, G. A novel statistical technique for intrusion detection systems. Future Gener. Comput. Syst.
**2018**, 79, 303–318. [Google Scholar] [CrossRef] [Green Version] - Hajisalem, V.; Babaie, S. A hybrid intrusion detection system based on ABC-AFS algorithm for misuse and anomaly detection. Comput. Netw.
**2018**, 136, 37–50. [Google Scholar] [CrossRef] - Divekar, A.; Parekh, M.; Savla, V.; Mishra, R.; Shirole, M. Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives. In Proceedings of the 2018 IEEE 3rd International Conference on Computing, Communication and Security (ICCCS), Kathmandu, Nepal, 25–27 October 2018; pp. 1–8. [Google Scholar]
- Belouch, M.; El Hadaj, S.; Idhammad, M. Performance evaluation of intrusion detection based on machine learning using Apache Spark. Procedia Comput. Sci.
**2018**, 127, 1–6. [Google Scholar] [CrossRef] - Hussain, J.; Lalmuanawma, S. Feature Analysis, Evaluation and Comparisons of Classification Algorithms Based on Noisy Intrusion Dataset. Procedia Comput. Sci.
**2016**, 92, 188–198. [Google Scholar] [CrossRef] [Green Version] - García, S.; Grill, M.; Stiborek, J.; Zunino, A. An empirical comparison of botnet detection methods. Comput. Secur.
**2014**, 45, 100–123. [Google Scholar] [CrossRef] - Zhang, J.; Liang, Q.; Jiang, R.; Li, X. A Feature Analysis Based Identifying Scheme Using GBDT for DDoS with Multiple Attack Vectors. Appl. Sci.
**2019**, 9, 4633. [Google Scholar] [CrossRef] [Green Version] - García Cordero, C.; Hauke, S.; Mühlhäuser, M.; Fischer, M. Analyzing flow-based anomaly intrusion detection using Replicator Neural Networks. In Proceedings of the 2016 14th Annual Conference on Privacy, Security and Trust (PST), Auckland, New Zealand, 12–14 December 2016; pp. 317–324. [Google Scholar]
- Camacho, J.; Pérez-Villegas, A.; García-Teodoro, P.; Maciá-Fernández, G. PCA-based multivariate statistical network monitoring for anomaly detection. Comput. Secur.
**2016**, 59, 118–137. [Google Scholar] [CrossRef] [Green Version] - Gupta, K.K.; Nath, B.; Kotagiri, R. Layered Approach Using Conditional Random Fields for Intrusion Detection. IEEE Trans. Dependable Secur. Comput.
**2010**, 7, 35–49. [Google Scholar] [CrossRef] - Ring, M.; Wunderlich, S.; Scheuring, D.; Landes, D.; Hotho, A. A survey of network-based intrusion detection data sets. Comput. Secur.
**2019**, 86, 147–167. [Google Scholar] [CrossRef] [Green Version] - Camacho, J.; Maciá-Fernández, G.; Díaz-Verdejo, J.; García-Teodoro, P. Tackling the Big Data 4 vs for anomaly detection. In Proceedings of the 2014 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Toronto, ON, Canada, 27 April–2 May 2014; pp. 500–505. [Google Scholar]
- Camacho, J.; García-Giménez, J.M.; Fuentes-García, N.M.; Maciá-Fernández, G. Multivariate Big Data Analysis for intrusion detection: 5 steps from the haystack to the needle. Comput. Secur.
**2019**, 87, 101603. [Google Scholar] [CrossRef] [Green Version] - Pérez-Villegas, A.; García-Jiménez, J.; Camacho, J. FaaC (Feature-as-a-Counter) Parser—Github. Available online: https://github.com/josecamachop/FCParser (accessed on 20 December 2019).
- Friedman, J.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw.
**2010**, 33, 1–22. [Google Scholar] [CrossRef] [PubMed] [Green Version] - Freeman, D.; Chio, C. Machine Learning and Security; O’Reilly Media: Newton, MA, USA, 2018. [Google Scholar]
- Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res.
**2012**, 13, 281–305. [Google Scholar] - Magán Carrión, R.; Diaz-Cano, I. Evaluación de Algoritmos de Clasificación para la Detección de Ataques en Red Sobre Conjuntos de Datos Reales: UGR’16 Dataset como caso de Estudio. In Proceedings of the V Jornadas Nacionales de Investigación en Ciberseguridad, Cáceres, Spain, 5–7 June 2019; Universidad de Extremadura, Servicio de Publicaciones: Badajoz, Spain, 2019; pp. 46–52. [Google Scholar]
- Snoek, J.; Larochelle, H.; Adams, R.P. Practical Bayesian Optimization of Machine Learning Algorithms. In Advances in Neural Information Processing Systems 25; Curran Associates, Inc.: New York, NY, USA, 2012; pp. 2951–2959. [Google Scholar]
- Bishop, C. Pattern Recognition and Machine Learning; Information Science and Statistics; Springer Inc.: New York, NY, USA; Berlin, Germany, 2006. [Google Scholar]
- Breiman, L. Random Forests. Mach. Learn.
**2001**, 45, 5–32. [Google Scholar] [CrossRef] [Green Version] - Salo, F.; Injadat, M.; Nassif, A.B.; Shami, A.; Essex, A. Data Mining Techniques in Intrusion Detection Systems: A Systematic Literature Review. IEEE Access
**2018**, 6, 56046–56058. [Google Scholar] [CrossRef] - ff4ml: Free Framework for Machine Learning. Available online: https://github.com/ucadatalab/ff4ml (accessed on 26 December 2019).
- García, S.; Molina, D.; Lozano, M.; Herrera, F. A Study on the use of Non-parametric Tests for Analyzing the Evolutionary Algorithms’ Behaviour: A Case Study on the CEC’05 Special Session on Real Parameter Optimization. J. Heuristics
**2009**, 15, 617–644. [Google Scholar] [CrossRef]

**Figure 3.** Performance results of each ML model tested, for each class in the dataset and including a weighted average of the corresponding performance metric.

**Figure 4.** Comparison of the AUC performance result among the ML-based solutions proposed in this work and a previously published NIDS based on MSNM techniques.

| Work | Dataset | FE | FS | DP | HS | ML | PM |
|---|---|---|---|---|---|---|---|
| Siddique et al. [13] | KDDCup’99, NGIDS-DS | – | – | – | – | classical | A, TFR |
| Rathore et al. [15] | KDDCup’99 | – | proposed | – | – | classical | TFR |
| Sharafaldin et al. [17] | CICIDS2017 | existing | existing | – | – | classical | F1, P, R |
| Li et al. [18] | BGP, NSL-KDD | proposed | – | normalization | manual | deep learning | A, F1 |
| Le et al. [19] | ISCX12, NSL-KDD | – | proposed | – | – | deep learning | A, TFR, ROC |
| Cordero et al. [20] | MAWI | proposed | – | – | manual | deep learning | O |
| Camacho et al. [12] | UGR’16 | proposed | proposed | mean, normalization | manual | statistical | AUC |
| Kabir et al. [21] | KDDCup’99 | – | existing | – | – | classical | F1, P, R |
| Hajisalem et al. [22] | NSL-KDD, UNSW-NB15 | – | existing | – | – | other | A, TFR |
| Divekar et al. [23] | KDDCup’99, NSL-KDD, UNSW-NB15 | – | existing | mean | existing | classical | A, F1 |
| Belouch et al. [24] | UNSW-NB15 | – | – | – | – | classical | A, TFR |
| Hussain et al. [25] | KDDCup’99, NSL-KDD | – | proposed | – | – | classical | A, TFR, AUC |
| García et al. [26] | CTU-13 | – | – | – | – | other | A, TFR, O, F1, P, R |
| Zhang et al. [27] | MAWI | proposed | proposed | normalization | existing, manual | other, classical | A, TFR, F1, P, R |
| Magán-Carrión et al. | UGR’16 | existing | existing | normalization | existing | classical | F1, P, R, AUC |

**FE (Feature Engineering)**: existing (existing proposal), proposed (authors’ proposal).

**FS (Feature Selection)**: existing, proposed.

**DP (Data Pre-processing)**: mean, normalization.

**HS (Hyper-parameter Selection)**: existing, manual.

**ML (Machine Learning)**: classical, deep learning, statistical, other.

**PM (Performance Metric)**: A (Accuracy), TFR (any of TPR (True Positive Rate), FPR (False Positive Rate), TNR (True Negative Rate) or FNR (False Negative Rate)), F1, R (Recall), P (Precision), ROC, AUC, O (Others).

| Class | # of Flows | % |
|---|---|---|
| Background | ∼4000 M | 97.14 |
| Blacklist | ∼18 M | 0.46 |
| Botnet | ∼2 M | 0.04 |
| DoS | ∼9 M | 0.23 |
| SSH scan | 64 | ∼0 |
| Scan | ∼6 M | 0.14 |
| Spam | ∼78 M | 1.96 |
| UDP scan | ∼1 M | 0.03 |

| Description | # | Values |
|---|---|---|
| Source IP | 2 | public, private |
| Destination IP | 2 | public, private |
| Source port | 52 | HTTP, SMTP, SNMP, … |
| Destination port | 52 | HTTP, SMTP, SNMP, … |
| Protocol | 5 | TCP, UDP, ICMP, IGMP, Other |
| Flags | 6 | A, S, F, R, P, U |
| ToS | 3 | 0, 192, Other |
| # packets | 5 | very low, low, medium, high, very high |
| # bytes | 5 | very low, low, medium, high, very high |
| label | 8 | background, blacklist, botnet, dos, sshscan, scan, spam, udpscan |

| Class | # of Observations | % |
|---|---|---|
| Background | 30,091 | 63.65 |
| Botnet | 594 | 1.26 |
| DoS | 417 | 0.88 |
| SSH scan | 176 | 0.37 |
| Scan | 27 | 0.06 |
| Spam | 15,961 | 33.76 |
| UDP scan | 9 | 0.02 |

| Model | Class | P | R | F1 | AUC |
|---|---|---|---|---|---|
| LR | Background | 0.814 | 0.919 | 0.863 | 0.775 |
| | DoS | 0.933 | 0.915 | 0.923 | 0.957 |
| | Botnet | 0.965 | 0.891 | 0.926 | 0.945 |
| | Scan | 0.801 | 0.916 | 0.852 | 0.957 |
| | Spam | 0.797 | 0.606 | 0.688 | 0.764 |
| | Weighted avg. | 0.810 | 0.812 | 0.805 | 0.776 |
| RF | Background | 0.885 | 0.906 | 0.921 | 0.871 |
| | DoS | 0.973 | 0.884 | 0.925 | 0.942 |
| | Botnet | 0.977 | 0.922 | 0.948 | 0.961 |
| | Scan | 0.932 | 0.925 | 0.928 | 0.962 |
| | Spam | 0.917 | 0.749 | 0.824 | 0.857 |
| | Weighted avg. | 0.897 | 0.887 | 0.888 | 0.868 |
| SVC-RBF | Background | 0.839 | 0.937 | 0.885 | 0.810 |
| | DoS | 0.960 | 0.831 | 0.889 | 0.915 |
| | Botnet | 0.972 | 0.886 | 0.927 | 0.943 |
| | Scan | 0.941 | 0.536 | 0.652 | 0.768 |
| | Spam | 0.824 | 0.638 | 0.717 | 0.790 |
| | Weighted avg. | 0.837 | 0.832 | 0.827 | 0.806 |
| SVC-L | Background | 0.819 | 0.930 | 0.871 | 0.785 |
| | DoS | 0.957 | 0.898 | 0.926 | 0.948 |
| | Botnet | 0.968 | 0.899 | 0.932 | 0.949 |
| | Scan | 0.944 | 0.910 | 0.926 | 0.955 |
| | Spam | 0.829 | 0.610 | 0.703 | 0.773 |
| | Weighted avg. | 0.826 | 0.821 | 0.815 | 0.785 |

**Table 6.** Comparison of the classification performance between MSNM approaches in a state-of-the-art publication and the ML-based ones proposed in this work.

| NIDS | Model | Class | AUC |
|---|---|---|---|
| Our Proposal | LR | DoS | 0.957 |
| | | Botnet | 0.945 |
| | | Scan | 0.957 |
| | RF | DoS | 0.942 |
| | | Botnet | 0.961 |
| | | Scan | 0.962 |
| | SVC-RBF | DoS | 0.915 |
| | | Botnet | 0.943 |
| | | Scan | 0.768 |
| | SVC-L | DoS | 0.948 |
| | | Botnet | 0.949 |
| | | Scan | 0.955 |
| Camacho et al. | MSNM-MC | DoS | 0.969 |
| | | Botnet | 0.512 |
| | | Scan | 0.979 |
| | MSNM-AS | DoS | 0.983 |
| | | Botnet | 0.620 |
| | | Scan | 0.994 |
| | MSNM-R2R-PLS | DoS | 0.999 |
| | | Botnet | 0.771 |
| | | Scan | 1.000 |
| | MSNM-SVC-RBF | DoS | 0.999 |
| | | Botnet | 0.884 |
| | | Scan | 0.997 |
| | MSNM-SVC-L | DoS | 0.998 |
| | | Botnet | 0.808 |
| | | Scan | 0.997 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Magán-Carrión, R.; Urda, D.; Díaz-Cano, I.; Dorronsoro, B.
Towards a Reliable Comparison and Evaluation of Network Intrusion Detection Systems Based on Machine Learning Approaches. *Appl. Sci.* **2020**, *10*, 1775.
https://doi.org/10.3390/app10051775
