Sensors
  • Article
  • Open Access

23 April 2021

Detection and Classification of Malicious Flows in Software-Defined Networks Using Data Mining Techniques

1 NASK National Research Institute, 01-045 Warsaw, Poland
2 Ministry of National Defense, 01-045 Warsaw, Poland
* Author to whom correspondence should be addressed.
This article belongs to the Collection Intelligent Wireless Networks

Abstract

The increasing availability of mobile devices and applications, the progress in virtualisation technologies, and advances in the development of cloud-based distributed data centres have significantly stimulated the growing interest in the use of software-defined networks (SDNs) for both wired and wireless applications. Standards-based software abstraction between the network control plane and the underlying data forwarding plane, including both physical and virtual devices, provides an opportunity to significantly increase network security. In this paper, to secure SDNs against intruders’ actions, we propose a comprehensive system that exploits the advantages of SDNs’ native features and implements data mining to detect and classify malicious flows in the SDN data plane. The architecture of the system and its mechanisms are described, with an emphasis on flow rule generation and flow classification. The concept was verified in the SDN testbed environment that reflects typical SDN flows. The experiments confirmed that the system can be successfully implemented in SDNs to mitigate threats caused by different malicious activities of intruders. The results show that our combination of data mining techniques provides better detection and classification of malicious flows than other solutions.

1. Introduction

A traditional communication network comprises interconnected, individually configured devices that forward data packets. This design limits the flexibility of packet forwarding and network management and inhibits the introduction of new, more effective mechanisms. The increasing availability of mobile devices and applications, the progress in virtualisation techniques, and advances in the development of cloud-based distributed data centres have significantly stimulated interest in software-defined networks (SDNs). An SDN decouples the control plane from the data plane, which improves the flexibility and automation of network functions, creates favourable conditions for introducing innovations, and reduces the network's operating costs.
Although the implementation of software-based technology in wired networks is relatively easy and frequent, it also has benefits in the wireless domain [1]. For example, it enables better collaboration between access points in order to reduce radio-specific problems and enhance wireless network security.
The SDN architecture can also be successfully used in other areas. For example, the recent work on many-core systems-on-a-chip (MCSoCs) considered adopting the SDN concept to design low-cost, high-performance architecture for aperiodic and low-duty-cycle traffic between cores [2]. An MCSoC in smartphones or IoT devices has a huge number of processing cores and many memories connected to one another by an on-chip network; therefore, the introduction of the SDN architecture may significantly improve network management performance.
However, SDN technology has many vulnerabilities that can be exploited by an attacker to breach network security, as discussed by Kumar and Gupta [3]. Figure 1 shows the possible attacks and threat vectors targeting different components of the SDN architecture.
Figure 1. Possible attacks and threat vectors targeting SDN components.
Cyber-attacks can exploit, for example:
  • An incorrect configuration;
  • Kernel errors in the operating system of the SDN controller;
  • Incorrect permissions;
  • The insufficient validation of input data;
  • Software coding mistakes, e.g., a buffer overflow.
A basic threat is the takeover of the SDN controller, which may compromise the entire SDN because the attacker gains unauthorised control over the network devices. The level of vulnerability to such attacks depends mostly on the implementation of the SDN controller.
There are some solutions to detect and neutralise such threats, e.g., an intrusion detection system (IDS), an intrusion prevention system (IPS), or SDN controller replication mechanisms, as proposed by Gonzales et al. [4]. Lee et al. [5] discussed SDN security issues resulting from attacks on the northbound interface, which involve taking control of network applications or introducing malicious software. Such attacks cause illegal actions, e.g., the manipulation of flow rules, the redirection of packets to an unauthorised recipient, or the blocking of selected traffic flows. An attack can also target the southbound interface by exploiting vulnerabilities of the protocols used, which enables the attacker to monitor or interfere in the exchange of messages between the controller and network devices. The SDN controller can be treated as a single point of failure, and therefore, it is a particularly attractive target for denial-of-service (DoS) attacks from both the northbound and southbound interfaces. An attack from the northbound interface on one application can negatively affect another application that is not being directly attacked and may, for example, introduce many conflicting flow rules for many applications. In the case of reactive flow entries, where each packet that does not match the existing entries in the flow table is forwarded to the controller, the generation of a huge number of malicious flows can overload the SDN controller and, consequently, disable network control.
Several practical solutions have been proposed to secure SDNs. For example, Scott-Hayward et al. [6] discussed the challenges in securing an SDN from a persistent attacker and proposed a holistic approach to the development of the SDN security architecture. They also identified research directions essential for providing network security. A state-of-the-art solution proposed to secure SDNs was discussed in [7]. The authors classified security solutions in terms of SDN layers/interfaces, security measures, simulation environments, and security objectives, as well as providing their own view on potential security requirements and key enablers for securing SDNs.
However, standards-based software abstraction between the network control plane and the underlying data forwarding plane, including both physical and virtual devices, provides an opportunity to significantly increase network security by using native SDN features, as discussed by Shin et al. [8] and Yoon et al. [9].
We propose to take advantage of such features, mainly those related to the aggregation of various statistics from network devices, the openness to new applications that enable the proper processing of traffic data, and the integration of these applications with network control mechanisms. We also recommend using data mining techniques (DMTs) to detect and classify malicious flows in the SDN data plane. DMTs are increasingly used to detect unauthorised activities in complex, multi-service information systems. They also improve the efficiency and flexibility of intrusion detection, help to detect new types of threats, highlight symptoms of a specific attack, and precisely distinguish between malicious and legitimate activities.
Therefore, this study’s aims were twofold:
  • To present a comprehensive solution that can be successfully used in the SDN environment to mitigate threats caused by different malicious activities of intruders;
  • To demonstrate that our combination of data mining techniques provides better detection and classification of malicious flows compared with other solutions.
This study continues our preliminary work on assessing the rationale for using transformation techniques such as ICA and PCA to reduce the feature space and processing time [10] and on exploring methods of generating both normal and malicious flows that can be used to evaluate various SDN-based intrusion detection systems [11].
Briefly, the main contributions of this paper are as follows:
  • The elaboration of a flow rule generation mechanism that allows for the online adjustment of the granularity of rules to the current traffic volume and obtains a good balance between the number of captured features to precisely identify network traffic and the controller protection against flooding;
  • The extension of flow classifier functions to enable the examination of different classification methods by the appropriate selection of parameters for the technique used and their values, as well as the attributes of the learning phase;
  • The presentation of the Monitoring and Detection of Malicious Activities in SDN (MADMAS) system deployment in a virtual environment that allows for its examination in conditions similar to the real ones;
  • The demonstration of the performance of MADMAS system alternatives and their effectiveness in malicious flow detection and classification compared with other solutions described in the literature.
The rest of the paper is structured as follows. Section 2 presents a brief overview of available solutions that enable the detection of malicious activities in SDNs, with particular emphasis on the data mining techniques used. Section 3 provides an overview of the system architecture, showing the techniques used at individual stages of flow detection and classification and the mechanisms of flow rule generation and flow classification. Section 4 describes experiments that confirmed the MADMAS system's effectiveness in the detection and classification of malicious flows under conditions that reflect typical SDN activities. The paper concludes with some remarks and proposals for the future.

3. MADMAS Architecture

3.1. Architecture Overview

In a MADMAS system, the SDN controller acts as an intermediary platform for the centralised retrieval of traffic flow parameters from the switches. We assumed that measurements of traffic flow parameters, their processing, and flow feature selection should be performed in such a way and at such a level of detail that we can identify malicious hosts. Furthermore, the traffic measurement and feature processing mechanisms should use the native functions and protocols of the SDN, and the use of other mechanisms and protocols that are not part of the SDN environment should be limited.
The MADMAS system architecture, presented in Figure 2, consists of seven main components: a flow rules generator (FRG), flow reader (FR), basic features repository (BFR), additional features generator and flows repository (AFG), features pre-processing (FPP), flow classifier (FC), and control component (CC). The system operates in the network environment, cooperating directly with the SDN controller.
Figure 2. MADMAS architecture. The symbols used: FR—flow rule; FFP—first packet in a flow; CFP—features of FFP; FS—flow statistics; CP—basic flow features; X, XP—input vectors before and after pre-processing, respectively; CRL3, CRL4—attributes of flow granularity.
The FRG generates flow rules to ensure packet transfer over the network between source and destination nodes. The flow rules granularity technique allows us to distinguish sessions and connections. A mechanism that dynamically adjusts the granularity of the flow rules to the current traffic volume provides a good balance between the number of captured features needed to correctly identify the network traffic and the protection of the controller against flooding. The FRG is also responsible for collecting application layer data from the first packets of each flow and provides information about the reduction in flow granularity. A detailed description of this component is given in Section 3.2.
The FR sequentially reads the contents of flow tables and extracts data from flow rules (flow input port, source and destination addresses, layer 4 protocols, and source and destination TCP/UDP ports) and from flow statistics (maximum flow duration, number of bytes sent/transferred in the source/destination direction with or without the TCP PSH flag, number of packets sent in the source/destination direction with the TCP PSH flag set). For each composition of such data, the set of basic flow features CP is defined and stored in the BFR for further analysis. The FR is also responsible for the generation of application layer features. For UDP flows, the application layer data are taken directly from the payload of the first packet. For TCP flows, the application layer data are passed only after the three-way handshake. Therefore, the FR uses the PSH flag to distinguish such flows.
The additional features generator and repository specifies additional features based on the basic features and the content of flow tables. The set of additional features contains complementary data that reflect the interrelation of flows and changes in the value of some of their attributes, as well as data that enable the differentiation of traffic classes. Examples include the maximum value of the flow coefficient with different or the same ports, the maximum value of the flow factor for a given target host, the maximum value of the single flow coefficient, the maximum value of the flow repetition coefficient, and the maximum values of the layer 3 and layer 4 flow reduction coefficients. These features help to increase the effectiveness of malicious activity detection. Both additional and application layer features are stored in the repository for further use in flow classification.
The features pre-processing component carries out the initial phase of data mining. Based on the set of vectors X, which represent application layer data gained from the AFG, this component creates a set of vectors XP containing selected features that enable effective flow classification. It comprises four modules responsible for:
  • The processing of application layer data with a text mining technique that includes input data tokenisation, n-gram analysis of tokens, features pruning, and features transformation using independent component analysis (ICA);
  • The normalisation of the features for the unification of the numerical ranges of their values;
  • The linear transformation of features with principal component analysis (PCA) in order to highlight specific aspects of the data;
  • Feature selection for flow classification.
The transformation of the string of ASCII characters representing the application layer data into a set of tokens creates input data for n-gram analysis, which builds a feature space for a string by counting occurrences of substrings consisting of n tokens. The result of n-gram analysis is a vector that defines the frequency distribution of the substrings for each string representing application layer data. The token occurrence frequency is determined by the TF-IDF method [44]. The result is a vector containing the weights of the words occurring within the application layer data. The set of vectors can contain a large number of features, and therefore, additional processes are implemented to reduce their number. The first reduction process removes tokens for which the TF-IDF value is outside the given frequency range of occurrence. The set of vectors limited in this way is then reduced further with the ICA transformation [45].
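The tokenisation, n-gram counting, TF-IDF weighting, and pruning steps described above can be illustrated with a minimal sketch. It is not the MADMAS implementation: the tokenisation rule, the TF-IDF variant (relative term frequency times log(N/df)), and the function names `tokenise`, `ngrams`, `tfidf_vectors`, and `prune` are assumptions made for illustration, and the ICA step is omitted.

```python
import math
import re
from collections import Counter


def tokenise(payload):
    """Split an ASCII payload into lowercase alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", payload.lower())


def ngrams(tokens, n):
    """Return all contiguous substrings consisting of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(payloads, n=2):
    """Weight every n-gram of every payload with TF-IDF (assumed variant)."""
    docs = [Counter(ngrams(tokenise(p), n)) for p in payloads]
    # Document frequency: each payload counts once per distinct n-gram.
    doc_freq = Counter(gram for doc in docs for gram in doc)
    total = len(docs)
    vectors = []
    for doc in docs:
        length = sum(doc.values()) or 1
        vectors.append({
            gram: (count / length) * math.log(total / doc_freq[gram])
            for gram, count in doc.items()
        })
    return vectors


def prune(vectors, low, high):
    """Keep only the n-grams whose TF-IDF weight lies within [low, high]."""
    return [{g: w for g, w in vec.items() if low <= w <= high} for vec in vectors]


payloads = [
    "GET /index.html HTTP/1.1",
    "GET /admin.php HTTP/1.1",
    "USER root PASS toor",
]
vectors = prune(tfidf_vectors(payloads, n=2), low=0.1, high=1.0)
```

In this toy input, the bigram shared by the two HTTP requests receives a low weight and is pruned, while bigrams unique to one payload are kept.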
The normalisation of features aims at achieving a coherent dataset and leads to unification of the numerical ranges and values of the data. The normalised vectors of the features are further subject to PCA transformation for feature space reduction. During the selection, a set of the most significant features contained in the vectors of reduced dimensionality is created, which is then used for flow classification.
The flow classifier carries out tasks related to malicious flow detection and assigns each malicious flow an appropriate label, representing a class of specific illegal activity. The outcomes of classification can trigger reaction procedures, including the introduction of new flow rules to eliminate identified threats. In the present version, the FC can be configured directly for flow classification with a predefined technique or for the examination of different classification methods by appropriate selection of the parameters of the technique used and their values, as well as training attributes. A detailed description of the FC is given in Section 3.3.
The control component enables the system’s operator to introduce modifications/changes to a technique used for flow classification in order to obtain an accepted level of system effectiveness.

3.2. Flow Rules Generator

The flow rules generator, whose internal structure is shown in Figure 3, consists of three modules: incoming packets handler (IPH), application layer data recorder (ALDR), and a flow rules generator (FRG).
Figure 3. Structure of a flow rules generator. The symbols used: FR—flow rule; FFP—first packet in a flow; C7—application layer data of FFP; ID—identifier for joining FR and CFP; CPH—features from packet header; and CRL3, CRL4—attributes of flow granularity reduction.
The IPH module receives the first packets FFP sent by the controller from flows for which no match rules FR are found, and it retrieves the information necessary to create flow rules as well as to obtain application layer data. A copy of the unprocessed application layer data CL7, together with the generated identifier ID used to associate data with the flow, is sent to the ALDR module for further feature selection. The application layer data of the UDP flow are already contained in the payload of the first packet transferred to the IPH module. However, for the TCP flow, application layer data are transferred only after establishing a connection (three-way handshake) between source and destination nodes, so the first TCP packet cannot be used for such identification. To resolve this issue, additional differentiation of flows is introduced by using the TCP push flag that enables forwarding the TCP packet to the controller, together with application layer data.
The FRG module creates flow rules FR based on CPH data obtained from the packet header and the flow ID. A flow rule contains the identifier of the flow (ID), the rules of flow processing (An) that determine the output port of the packet forwarding device (PFD) to which packets are forwarded, and the set of flow matching attributes:
$$FR = \{ID, P_{in}, IP_{src}, IP_{dst}, p_{src}, p_{dst}, P_{L4}, psh, A_n\} \quad (1)$$
where Pin is the PFD input port, IPsrc is the source IP address, IPdst is the destination IP address, psrc is the L4 source port, pdst is the L4 destination port, PL4 is the L4 protocol, and psh is the status of the PSH flag.
The flow rule is removed if no new packet corresponding to it is received within the time frame tidle > 0. There are a number of benefits of reactive flow rule removal, including the ability to determine the flow duration, a reduction in the size of flow tables, and an increase in the level of flow granularity.
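To make the structure of a flow rule and its idle-timeout removal more concrete, the following sketch models a rule as a small data class. It is an illustration only: the field names, the use of `None` as a wildcard, and the helper `expire_idle` are hypothetical and do not come from the MADMAS code or from any controller API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowRule:
    """Fields follow the flow rule tuple; None acts as a wildcard (assumption)."""
    rule_id: int
    p_in: Optional[int]        # PFD input port
    ip_src: Optional[str]      # source IP address
    ip_dst: Optional[str]      # destination IP address
    p_src: Optional[int]       # L4 source port
    p_dst: Optional[int]       # L4 destination port
    proto_l4: Optional[str]    # L4 protocol
    psh: Optional[bool]        # status of the PSH flag
    action_out_port: int       # A_n: PFD output port for matching packets
    last_seen: float = 0.0     # time of the last matching packet

    def matches(self, pkt):
        """Return True if every non-wildcard field equals the packet's value."""
        fields = ("p_in", "ip_src", "ip_dst", "p_src", "p_dst", "proto_l4", "psh")
        return all(
            getattr(self, name) is None or getattr(self, name) == pkt.get(name)
            for name in fields
        )


def expire_idle(table, now, t_idle):
    """Keep only the rules that matched a packet within the last t_idle seconds."""
    return [rule for rule in table if now - rule.last_seen < t_idle]


rule = FlowRule(1, 3, "10.0.0.1", "10.0.0.2", 40000, 80, "tcp", True,
                action_out_port=2, last_seen=10.0)
packet = {"p_in": 3, "ip_src": "10.0.0.1", "ip_dst": "10.0.0.2",
          "p_src": 40000, "p_dst": 80, "proto_l4": "tcp", "psh": True}
hit = rule.matches(packet)
table = expire_idle([rule], now=20.0, t_idle=5.0)
```

Here the rule matches the packet, and it is dropped from the table once no packet has matched it for longer than the idle time.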
Flow rules have a specific level of granularity that allows us to identify the type of network traffic and distinguish sessions and connections. Increasing flow granularity allows more detail in capturing traffic features. In contrast, a reduction in flow granularity allows us to reduce the number of packets sent to the controller; however, this leads to a decrease in the level of details in measuring traffic characteristics. Increasing the granularity of flows results in passing more packets to the controller, which must be processed to implement flow rules. This can result in increased consumption of controller resources, which, in turn, can increase the delay in packet processing. A situation in which packets are sent to the controller in a number significantly exceeding the normal level of network traffic is interpreted as controller flooding. To avoid such adverse events, the regulation mechanism of flow granularity was introduced, as shown in Algorithm 1, which generates the values of granularity reduction attributes that are submitted to the basic feature repository.
Algorithm 1 Reduction in flow granularity
Input arguments:
  $\sigma_{L4}$: L4 reduction threshold;
  $\sigma_{L3}$: L3 reduction threshold;
  $T_{IP}$: set of IP addresses of packets sent to the controller;
  $T_P$: set of ports in the packet forwarding device (PFD).
Loop
  for each $IP_i$ in $T_{IP}$
    determine $\Delta P_i^{L4}$ for $IP_i$
    if $\Delta P_i^{L4}$ from $IP_{src}$ > $\sigma_{L4}$
      read $RC_{L4}$ for $IP_i$
      determine $t_{hard}^{L4}$ based on $\Delta P_i^{L4}$, $RC_{L4}$
      determine ID, $A_n$
      determine $FR_{L4}$ based on ID, $t_{hard}^{L4}$, $A_n$
      introduce $FR_{L4}$ to PFD
      determine $PC_{L4}$ for $IP_i$
      determine $CR_{L4}$ based on $\Delta P_i^{L4}$, $RC_{L4}$, $PC_{L4}$
      introduce ID, $CR_{L4}$ to BFR
      update $PC_{L4}$ for $IP_i$
  end
  for each $P_i$ in $T_P$
    determine $\Delta P_i^{L3}$ for $P_i$
    if $\Delta P_i^{L3}$ from $P_{in}$ > $\sigma_{L3}$
      read $RC_{L3}$ for $P_i$
      determine $t_{hard}^{L3}$ based on $\Delta P_i^{L3}$, $RC_{L3}$
      determine ID, $A_n$
      determine $FR_{L3}$ based on ID, $t_{hard}^{L3}$, $A_n$
      introduce $FR_{L3}$ to PFD
      determine $PC_{L3}$ for $P_i$
      determine $CR_{L3}$ based on $\Delta P_i^{L3}$, $RC_{L3}$, $PC_{L3}$
      introduce ID, $CR_{L3}$ to BFR
      update $PC_{L3}$ for $P_i$
  end
endloop
Output arguments:
  $FR_{L4}$, $FR_{L3}$, $t_{hard}^{L4}$, $t_{hard}^{L3}$, $CR_{L4}$, $CR_{L3}$
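The threshold test that opens each branch of Algorithm 1 can be sketched as follows. This is a simplified illustration rather than the authors' code: the function name `keys_over_threshold`, the event format (timestamp, key), and the one-second sliding window are assumptions. The key is the source IP address for the L4 branch and the PFD input port for the L3 branch; the computation of the reduced rule, its duration, and the reduction coefficient is deliberately left out.

```python
from collections import Counter


def keys_over_threshold(packet_in_events, now, sigma, window=1.0):
    """Return {key: packet count} for keys whose number of packets sent to the
    controller during the last `window` seconds exceeds the threshold sigma."""
    recent = Counter(
        key for timestamp, key in packet_in_events
        if now - window < timestamp <= now
    )
    return {key: count for key, count in recent.items() if count > sigma}


# Hypothetical packet-in log: (arrival time in seconds, source IP address).
events = [
    (0.10, "10.0.0.1"), (0.50, "10.0.0.1"), (0.90, "10.0.0.1"),
    (0.95, "10.0.0.2"), (-0.50, "10.0.0.2"),
]
flooding_sources = keys_over_threshold(events, now=1.0, sigma=2)
```

Only sources returned by this test would have a reduced-granularity rule installed; all other traffic keeps the full-granularity rules.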
The attributes are determined by the values of the packet parameters of the incoming flows in relation to:
  • IP source address (IPsrc): reduction at the L4 level;
  • PFD input port (Pin): reduction at the L3 level.
The flow rules take the following forms:
$FR_{L4} = \{ID, P_{in}, p_{src}, p_{dst}, A_n\}$: in the case of granularity reduction at the L4 level;
$FR_{L3} = \{ID, P_{in}, p_{dst}, A_n\}$: in the case of granularity reduction at the L3 level.
The attributes of the flow granularity reduction at the L4 level are determined according to the following formula:
$$CR_{L4} = \frac{1}{1 + e^{\alpha \Delta P_{L4} + PC_{L4} + RC_{L4}}} \quad (2)$$
where α is the shape factor, ΔPL4 is the number of packets sent from the given IPsrc address within t = 1 s, PCL4 is the total number of packets sent from the start until the end of the reduction, RCL4 is the number of previous reductions for the given IPsrc address, and CRL4 ∈ (0,1).
The attributes of the flow granularity reduction at the L3 level are determined according to the following formula:
$$CR_{L3} = \frac{1}{1 + e^{\alpha \Delta P_{L3} + PC_{L3} + RC_{L3}}} \quad (3)$$
where α is the shape factor, ΔPL3 is the number of packets sent from the given port Pin within t = 1 s, PCL3 is the total number of packets sent from the start until the end of the reduction, RCL3 is the number of previous reductions for the given port Pin, and CRL3 ∈ (0,1).
For a given flow rule, the additional parameter thard is also determined, according to Equations (4) and (5), which defines the time for which the reduced flow rule is introduced:
$$t_{hard}^{L4} = \frac{t_{L4}}{1 + e^{\alpha \Delta P_{L4} + RC_{L4}}} \quad (4)$$
$$t_{hard}^{L3} = \frac{t_{L3}}{1 + e^{\alpha \Delta P_{L3} + RC_{L3}}} \quad (5)$$
where tL3 and tL4 are the maximum values of thardL3 and thardL4, respectively.
After time thard, the reduced granularity flow is removed from the table, regardless of whether packets are being forwarded within this flow. This enables us to continue introducing flows of high granularity according to Equation (1).

3.3. Flow Classifier

The flow classifier (FC) performs tasks related to the detection of malicious flows in the SDN data plane using selected classification techniques. It is composed of three modules: switching, learning, and classification and visualisation (Figure 4). The switching module divides the set of features after pre-processing XP into the subsets XPU and XPK. XPU contains the vectors of input data for the learning phase of the selected algorithm, which is performed in the first stage of malicious flow detection. XPK contains the vectors of input data used for flow classification. The division of XP into XPU and XPK is determined by values of the PD parameters.
Figure 4. Structure of the flow classifier. The symbols used: XP—input vectors after features processing; XPU—input vectors for learning; XPK—input vectors for classification; PM—learned model parameters for classification and visualisation; and PD—hyperparameters for classification and visualisation.
The learning module carries out the process of choosing the parameter values of the selected classification method using the XPU learning subset. Based on this, the PM model is built, which is used to predict a class of flows for a new pattern whose input arguments are not included in the learning dataset.
The actual detection of malicious flows, which uses the XPK subset of input data, is performed by the classification and visualisation module, which executes two processes:
  • The detection of unauthorised activity and assigning to it an appropriate label of malicious action;
  • The visualisation of classification results.
The presented solution assumes that the following classification techniques can be used to detect undesirable flows:
  • Multilayer perceptron (MLP) and radial basis function (RBF);
  • Multipass self-organising map (MSOM);
  • Learning vector quantisation (LVQ) and hierarchical LVQ (HLVQ);
  • Support vector machine (SVM);
  • k-Nearest neighbour (k-NN).
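As an illustration of the simplest technique in this list, the sketch below classifies a flow feature vector with a plain k-nearest-neighbour vote. The feature values, the choice of Euclidean distance, and the function name `knn_classify` are assumptions made for the example; the study itself relied on MATLAB, WEKA, and RapidMiner implementations.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors.

    `train` is a list of (feature_vector, label) pairs; the distance is
    Euclidean, which is an assumption of this sketch.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy two-dimensional feature vectors with hypothetical class labels.
train = [
    ((0.0, 0.0), "N"), ((0.0, 1.0), "N"), ((1.0, 0.0), "N"),
    ((5.0, 5.0), "DoS"), ((5.0, 6.0), "DoS"), ((6.0, 5.0), "DoS"),
]
label = knn_classify(train, (5.5, 5.2), k=3)
```

The value of k plays the role of a modifiable parameter of the technique and would be tuned for the traffic at hand.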
To ensure the effectiveness of unauthorised flow detection, the MADMAS system allows us to modify the values of PD parameters of the applied classification technique, the list of which is given in Table 2. The type of parameter and its value depend on the technique used, as well as on the value of the following attributes:
  • psplit ∈ (0,1): learning/detection split ratio;
  • n: number of learning cycles if cross-validation is applied;
  • PM: parameters of a detection model.
Table 2. Modifiable parameters of malicious flow detection techniques.
The PD parameter values are defined by the MADMAS user, depending on the actual needs, and entered via the control module.
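The role of the split ratio in the switching module can be sketched as follows. The sketch assumes that psplit is the fraction of vectors assigned to the learning subset XPU, with the remainder forming XPK; the text does not state this explicitly, and the function name `split_features` and the random shuffling are illustrative choices.

```python
import random


def split_features(xp, p_split, seed=0):
    """Divide the pre-processed vectors XP into a learning subset (XPU) and a
    classification subset (XPK); p_split is taken as the learning fraction."""
    if not 0.0 < p_split < 1.0:
        raise ValueError("p_split must lie in the open interval (0, 1)")
    indices = list(range(len(xp)))
    random.Random(seed).shuffle(indices)  # fixed seed for repeatable splits
    cut = round(p_split * len(xp))
    xpu = [xp[i] for i in indices[:cut]]
    xpk = [xp[i] for i in indices[cut:]]
    return xpu, xpk


xp = [[float(i), float(i) / 2] for i in range(10)]
xpu, xpk = split_features(xp, p_split=0.7)
```

With ten input vectors and a ratio of 0.7, seven vectors go to learning and three to classification.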

4. MADMAS Examination

4.1. Experimental Setup

The aim of the study was to examine the effectiveness of MADMAS in the detection and classification of malicious flows under typical SDN traffic conditions. For this purpose, an experimental testbed environment was developed containing an SDN emulator, an OpenDaylight (ODL) controller containing MADMAS components, and data centre servers, all implemented on a single server hosting several virtual machines, as shown in Figure 5.
Figure 5. MADMAS testbed environment.
The flow rules generator was implemented as an OSGi network application in the ODL, based on a modification of the OpenDaylight L2 switch project. The FR read the flow rules using ODL REST API messages. The NoSQL Cassandra database (column family database) was used to store datasets of basic and additional flow features. The specialised tools MATLAB, WEKA, and RapidMiner were used for the implementation of classification techniques. The control component contained a set of dedicated tools and scripts for the automatic change of parameters of individual methods. The Mininet platform was used as the SDN emulator. The data centre side was emulated by Metasploitable 2 virtual machines. Generators of normal and malicious traffic were implemented on separate virtual machines.
To reflect typical SDN traffic conditions, five classes of flows were generated that represent both normal and malicious network activities:
  • Normal (N): correct flows between clients and servers;
  • Denial of service (DoS): actions aimed at making network resources unavailable to users;
  • Probe (P): actions aimed at scanning ports, vulnerabilities, or software versions;
  • Access by exploit (AE): actions enabling remote access to machines by exploiting vulnerabilities;
  • Access by password guess (APG): actions enabling access to remote machines through attempts of unauthorised login.
The list of applications used for traffic generation is presented in Table 3.
Table 3. Traffic application tools.
It was assumed that the generated traffic would be complex, preventing the direct detection of malicious flows. However, due to the complexity of real traffic, it was necessary to adopt some assumptions and simplifications that do not affect the credibility of the outcomes of system examination:
  • Services indicated in Table 3 are running on the servers;
  • Data are exchanged between servers and hosts;
  • Hosts initiate normal and malicious traffic;
  • Hosts do not cooperate with one another;
  • Unauthorised and normal traffic is generated simultaneously on separate virtual machines, with the parameters presented in Table 4.
Table 4. Generated traffic parameters.
Normal traffic is generated using client applications according to the Poisson distribution, while malicious traffic is generated according to the normal distribution. Each class of unauthorised action has subclasses, which define the detailed course of action and type of tools or exploits applied for attacks targeted at a server or network resources. For example, Nping is used to generate a flooding attack that affects both the performance of the SDN controller and the available data plane resources and can cause delays in flow matching.

4.2. Testing Conditions

It was assumed that malicious flow detection was performed in off-line passive mode, i.e., the core detection process occurs after the completion of flow feature measurements on a data mining platform. Test data for individual methods were stored in the repository and read for experimentation. This approach allowed for a comprehensive study and comparison of the effectiveness of selected classification techniques, as well as indicating the most effective one, tailored to the specificity of traffic flows in the SDN.
The detection and classification of malicious flows by MADMAS requires the introduction of a set of input vectors X to the FPP and FC. The MADMAS system was examined using repeated learning processes, i.e., k-fold cross-validation, with different learning datasets. The input dataset was divided into k = 10 parts, of which k − 1 were used for learning and the remaining part for testing. The procedure was repeated k times, changing the testing subset each time.
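The cross-validation scheme described above can be sketched as an index generator. This is a minimal illustration under stated assumptions: the function name `k_fold_indices` is hypothetical, and the folds are built by simple interleaving without shuffling or class stratification, which a real evaluation would probably add.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (learning_indices, testing_indices) pairs for k-fold
    cross-validation; every part serves as the testing subset exactly once."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    for i in range(k):
        testing = folds[i]
        learning = [idx for fold in folds[:i] + folds[i + 1:] for idx in fold]
        yield learning, testing


# Example: 25 input vectors split into k = 10 parts.
splits = list(k_fold_indices(25, k=10))
```

Each of the ten iterations trains on nine parts and tests on the remaining one, so every input vector is used for testing exactly once.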
The following metrics were used to evaluate flow classification performance:
  • Recall rate:
$$TPR = \frac{TP}{TP + FN}$$
where TP is a true positive and FN is a false negative;
  • Precision rate:
$$PPV = \frac{TP}{TP + FP}$$
where FP is a false positive;
  • F-measure (F1 score):
$$F_1 = \frac{2 \cdot PPV \cdot TPR}{PPV + TPR}$$
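The three classification metrics can be computed per traffic class from the true and predicted labels, as in the short sketch below. The function name `class_metrics`, the one-versus-rest treatment of a multi-class problem, and the convention of returning 0.0 for an undefined ratio are assumptions of this example.

```python
def class_metrics(y_true, y_pred, positive):
    """Return (TPR, PPV, F1) for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0   # recall
    ppv = tp / (tp + fp) if tp + fp else 0.0   # precision
    f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
    return tpr, ppv, f1


# Hypothetical labels for six flows.
y_true = ["DoS", "DoS", "DoS", "N", "N", "P"]
y_pred = ["DoS", "DoS", "N", "DoS", "N", "P"]
tpr, ppv, f1 = class_metrics(y_true, y_pred, positive="DoS")
```

For the DoS class in this toy example there are two true positives, one false negative, and one false positive, so all three metrics equal 2/3.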
In addition, the following time measures were used for system evaluation:
  • Average execution time:
$$AET = \frac{t_x}{n_f}$$
where tx is the cross-validation time and nf is the number of datasets used for cross-validation;
  • Flow transfer delay (round-trip time):
$$FRTT = t_r - t_s$$
where ts is the time of sending the first packet and tr is the time of receiving the response.
The experiments presented below aimed at:
  • Assessing the mechanism of granularity reduction;
  • Identifying the most suitable technique for detection and classification of malicious flows in the SDN environment.

4.3. Flow Granularity Reduction

The study of the flow granularity reduction mechanism was performed in two modes of the MADMAS system, i.e., with the mechanism on and off. In both cases, a source host generated ICMP packets with the given intensity IF for a set of receiving hosts. It was assumed that no flow rule existed for any generated packet, which forced it to be transferred to the SDN controller. After confirmation of each ICMP packet receipt, the FRTT was calculated and averaged at the end of the session.
The impact of a number of generated packets NP on the metric FRTT with the flow granularity reduction mechanism on and off is presented in Figure 6.
Figure 6. Effect of using flow granularity reduction for IF = 30,000 (packets/s) and NP < 3100.
The flow granularity reduction mechanism does not affect the FRTT value if the number of generated packets is relatively small (NP < 900). However, if the mechanism is off, along with an increase in the number of packets loading the controller, FRTT increases rapidly. The mechanism contributes to a significant reduction in flow transfer delay, which also translates into a reduced controller load.
Without the flow granularity reduction mechanism, a further increase in the number of packets transferred to the controller (Figure 7) leads to overloading, which blocks the introduction of new flow rules.
Figure 7. Effect of using flow granularity reduction for IF = 30,000 (packets/s) and NP > 6000.
If the flow granularity reduction mechanism is on, traffic flooding is significantly limited. FRTT remains low (FRTT 176.92 ms) regardless of the number of incoming packets. This confirms the usefulness of the mechanism when flows are introduced in reactive mode. The mechanism protects the SDN controller from flooding traffic that might be a form of DoS attack. The reduction in granularity was introduced only for a specific period, and this information was saved to the repository, which enabled us to constantly monitor the activity in the SDN.
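The effect described above can be illustrated with a toy model of a reactive flow table. This is a simplified sketch under stated assumptions, not the MADMAS mechanism: it assumes that granularity reduction means matching on fewer header fields (here, the source address only), it ignores rule installation latency and rule expiry, and all names and addresses are hypothetical.

```python
def count_packet_ins(packets, match_key):
    """Count packets that miss the flow table and are forwarded to the controller."""
    installed = set()
    packet_ins = 0
    for packet in packets:
        key = match_key(packet)
        if key not in installed:
            packet_ins += 1      # table miss: the packet is sent to the controller
            installed.add(key)   # the controller installs a rule for this key
    return packet_ins


def fine_key(packet):
    """Fine-grained rule: match on (source, destination)."""
    return (packet[0], packet[1])


def coarse_key(packet):
    """Reduced granularity (assumed): match on the source address only."""
    return (packet[0],)


# One source host sending to 100 distinct receivers (hypothetical addresses).
packets = [("10.0.0.1", f"10.0.1.{i}") for i in range(1, 101)]
fine_misses = count_packet_ins(packets, fine_key)
coarse_misses = count_packet_ins(packets, coarse_key)
```

In this model every new destination triggers a controller request under fine-grained matching, whereas the coarser rule absorbs all packets after the first one, which is consistent with the lower controller load reported above.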

4.4. MADMAS Evaluation

The effectiveness of MADMAS in malicious flow detection was examined in the testbed environment (see Figure 5) following the procedure shown in Figure 8.
Figure 8. Test procedure.
The procedure started with the generation of both normal and malicious traffic. The user hosts generated requests to servers while malicious hosts launched attacks using the tools specified in Table 3.
The MADMAS components acquired and processed the information sent to the controller from the data layer and created a set of input vectors X that were stored in the repository for further processing. Labels that defined the traffic class were assigned to the saved vectors for system validation. The traffic class was determined during its generation based on features that were not used in detection, e.g., a time stamp of traffic generation, IP address, etc. Then, pre-processing of features was performed, the classification technique under investigation was selected, and the values of its configuration parameters (see Table 2) were determined. Furthermore, tenfold cross-validation was performed, followed by evaluation of the obtained results. The procedure supports changing of the configuration parameters values to obtain the best flow classification results. The MADMAS components, used at specific stages of the procedure, are also shown in the right part of Figure 8.
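The validation step of the procedure can be sketched as follows. This is an illustrative example only: it uses a small k-NN classifier written with the Python standard library, synthetic two-dimensional vectors in place of the input vectors X, and hypothetical labels; it reports accuracy rather than the per-class metrics, and it measures AET as the total cross-validation time divided by the number of folds.

```python
import math
import random
import time
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(data, n_folds=10, k=3, seed=0):
    """Return (accuracy, AET) for n-fold cross-validation of the k-NN classifier."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::n_folds] for i in range(n_folds)]
    correct = 0
    start = time.perf_counter()
    for i, test_fold in enumerate(folds):
        # Train on all folds except the one held out for testing.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct += sum(knn_predict(train, x, k) == y for x, y in test_fold)
    t_x = time.perf_counter() - start
    return correct / len(rows), t_x / n_folds


# Synthetic labelled vectors: two well-separated clusters (not real flow data).
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "normal") for _ in range(50)]
data += [((rng.gauss(10, 1), rng.gauss(10, 1)), "malicious") for _ in range(50)]

accuracy, aet = cross_validate(data)
```

Changing `k` or swapping `knn_predict` for another classifier corresponds to the step in which the technique and its configuration parameters are selected.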
The initial phase of the experiment focused on the selection of techniques with the best ability to detect SDN flows. The obtained results, presented in Table 4, confirmed the usefulness of the SVM, k-NN, and HLVQ classifiers in the MADMAS system. For better clarity, fields with the best TPR, PPV, and AET values are marked in colour in Table 5. The advantage of these techniques over the others is that they detect all types of flows generated by hostile hosts, especially DoS, P, and APG attacks.
Table 5. Efficiency of selected techniques in SDN flow detection.
However, the use of data mining in a real SDN environment requires a quick reaction to undesirable flows. The lowest values of the AET metric were achieved for the LVQ1 and HLVQ1 classifiers, while the other classifiers had large AET values, which, taking into account their small TPR and PPV values, indicates their low usefulness in the considered application. By reducing the number of features by means of the principal component analysis (PCA) transformation [46] in the FPP, a significant reduction in AET was achieved for the k-NN, HLVQ, and SVM classifiers, i.e., 2.0-, 1.8-, and 1.3-fold, respectively. The use of the PCA transformation resulted in only a slight increase in the TPR and PPV values (Table 4), which confirms the low sensitivity of these techniques to a reduction in the number of features used. Therefore, we decided to use the k-NN, HLVQ, and SVM techniques for further tests of the MADMAS system.
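The idea behind reducing the number of features with PCA can be shown on a small example. The sketch below is not the FPP implementation: it extracts only the first principal component, uses power iteration on the sample covariance matrix instead of a full eigendecomposition, and runs on hypothetical three-feature vectors.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return the leading principal direction and the data projected onto it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred data.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    projected = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return v, projected


# Hypothetical vectors: feature 2 is twice feature 1, feature 3 is constant.
rows = [(0.0, 0.0, 1.0), (1.0, 2.0, 1.0), (2.0, 4.0, 1.0),
        (3.0, 6.0, 1.0), (4.0, 8.0, 1.0)]
direction, reduced = first_principal_component(rows)
```

Here three correlated features collapse into a single value per vector without loss of variance, which is why a classifier trained on the reduced vectors can run faster. The sketch assumes that at least one feature varies; otherwise the normalisation step divides by zero.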
During the main phase of the experiment, the efficiency of the MADMAS system in the detection and classification of malicious flows was compared against selected alternative mechanisms (see Table 6) described in [29,31,36].
Table 6. Compared systems.
For such a comparison to be credible, the alternative solutions should use the data collected by MADMAS from SDN flows, which are then processed and classified according to a specific concept, as shown in Figure 9. However, in our study, we used ready-made comprehensive solutions. None of these uses application layer features for classification purposes.
Figure 9. Implementation of alternative solutions.
As shown in Table 7, in all the cases considered, the MADMAS system was better able to detect malicious flows compared to other solutions. The best metric values for each attack class are shown in colour for better visibility.
Table 7. Summary of the results.
The better efficiency of the MADMAS system is particularly evident in the case of the access-by-exploit attack class, for which the following increments of classification performance metrics were obtained compared to other methods:
  • For TPR by 31.4%;
  • For PPV by 27.3%;
  • For F1 by 29.3%.
This confirms the purposefulness of data acquisition from the application layer and the use of those data for flow classification.
The best results of malicious flow classification were obtained for the MADMAS system based on the SVM, especially in the case of DoS and APG attacks. The MADMAS system is less effective in detecting probe attacks but is still significantly better compared to other solutions. This is because probe attacks, which use low-intensity port scanning and covert scanning, resemble normal traffic.
To demonstrate the impact of using application layer features on the effectiveness of flow classification, an additional experiment was performed based on the solution proposed by Bhargava et al. [31]. System 2 was modified to enable the use of ICA-based application layer features stored in the MADMAS repository. The results obtained with and without application layer features are presented in Table 8.
Table 8. Impact of the ICA-based application layer features on the effectiveness of System 2.
The inclusion of the data obtained from the application layer in the solution proposed by Bhargava et al. [31] results in a slight improvement in the efficiency of the flow classification, with a simultaneous slight increase in execution time. We would like to emphasise that more in-depth experiments should be performed for the complete assessment of such impact. In particular, the relationship between the transformation of the ICA-based application layer features and the machine learning technique used should be identified.
The ROC curves, shown in Figure 10, confirm a much better classification performance of the MADMAS system than the solution described in [29] for all the considered threats. The biggest difference was for probe and access-by-exploit attacks (Figure 10b,c), while the best curve shape was obtained for DoS and APG attacks.
Figure 10. ROC curves.
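For readers who wish to reproduce curves of this kind, an ROC curve is obtained by sweeping the decision threshold over the classifier scores and recording the false positive rate and TPR at each step. The following is a minimal sketch on hypothetical scores and binary labels (1 for malicious, 0 for normal), not the data behind Figure 10; it assumes that both classes are present.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve computed with the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical classifier scores and ground-truth labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

A curve that rises quickly towards the top-left corner, and hence an area close to 1, corresponds to the better curve shapes discussed above.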
Although an SVM allows the MADMAS system to obtain high classification performance, its time efficiency, expressed by AET, is much lower compared to the system based on HLVQ. This indicates the advisability of using an HLVQ-based system in SDNs with limited hardware resources (e.g., RAM, processor performance).

5. Conclusions

In this paper, we described the promising concept of using data mining techniques for the detection and classification of malicious flows in the SDN data plane, with a focus on the presentation of flow rule generation and flow classification mechanisms. The MADMAS system was implemented in a testbed environment, and its performance metrics were evaluated and compared with some alternative solutions.
The use of a virtual test environment with the SDN emulator and the existing SDN controller allowed the system to be tested in conditions similar to real ones. The experiments confirmed that the MADMAS system provides good flow detection performance for all types of malicious activities, in particular, probe and access-by-exploit attacks. The implementation of flow granularity reduction prevents flooding traffic from being passed to the SDN controller.
We also examined some classification techniques and assessed their applicability for malicious flow detection in the SDN data plane. The obtained results indicate that the use of the SVM for flow classification in the MADMAS system gives the best results in terms of classification performance. However, due to its low time efficiency, HLVQ seems to be a more appropriate solution for an SDN with limited hardware resources.
All MADMAS components are software-based; therefore, the system can easily be extended with additional procedures for flow generation and/or classification. The system architecture enables the identification and mitigation of threats caused by malicious actions in both fixed and wireless networks. However, a full assessment of the system's effectiveness in detecting malicious flows requires further studies, especially in a real SDN. The OpenFlow-based SDN environment described in [47], which was developed with a specific focus on the validation of SDN security mechanisms, could be successfully used for the MADMAS examination.

Author Contributions

Conceptualisation, M.A.; methodology, D.J. and M.A.; software, D.J.; investigation, D.J. and M.A.; formal analysis, D.J. and M.A.; validation, D.J.; visualisation, D.J.; writing—original draft, D.J.; writing—review and editing, M.A.; supervision, M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the National Centre for Research and Development, grant number CYBERSECIDENT/369195/I/NCBR/2017. The APC was funded by NASK National Research Institute.

Acknowledgments

This work was partially performed within the CYBERSECIDENT/369195/I/NCBR/2017 project supported by the National Centre for Research and Development within the framework of the CyberSecIdent Programme.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chaudet, C.; Haddad, Y. Wireless Software Defined Networks: Challenges and opportunities. In Proceedings of the 2013 IEEE International Conference on Microwaves, Communications, Antennas and Electronic Systems (COMCAS 2013), Tel Aviv, Israel, 21–23 October 2013; IEEE: Piscataway Township, NJ, USA, 2013; pp. 1–5.
  2. Berestizshevsky, K.; Even, G.; Fais, Y.; Ostrometzky, J. SDNoC: Software defined network on a chip. Microprocess. Microsyst. 2017, 50, 138–153.
  3. Kumar, H.; Gupta, P. SDN Security Issue and Resolution. Indian J. Appl. Res. 2017, 7, 654–656.
  4. Gonzalez, A.J.; Nencioni, G.; Helvik, B.E.; Kamisinski, A. A Fault-Tolerant and Consistent SDN Controller. In Proceedings of the 2016 IEEE Global Communications Conference (GLOBECOM), Washington, DC, USA, 4–8 December 2016; pp. 1–6.
  5. Lee, S.; Yoon, C.; Shin, S. The Smaller, the Shrewder. In Proceedings of the 2016 ACM International Workshop on Security in Software Defined Networks & Network Function Virtualization; ACM: New York, NY, USA, 2016; pp. 23–28.
  6. Scott-Hayward, S.; Natarajan, S.; Sezer, S. A Survey of Security in Software Defined Networks. IEEE Commun. Surv. Tutorials 2016, 18, 623–654.
  7. Akhunzada, A.; Ahmed, E.; Gani, A.; Khan, M.K.; Imran, M.; Guizani, S. Securing software defined networks: Taxonomy, requirements, and open issues. IEEE Commun. Mag. 2015, 53, 36–44.
  8. Shin, S.; Xu, L.; Hong, S.; Gu, G. Enhancing Network Security through Software Defined Networking (SDN). In Proceedings of the 2016 25th International Conference on Computer Communication and Networks (ICCCN), Waikoloa, HI, USA, 1–4 August 2016.
  9. Yoon, C.; Park, T.; Lee, S.; Kang, H.; Shin, S.; Zhang, Z. Enabling security functions with SDN: A feasibility study. Comput. Netw. 2015, 85, 19–35.
  10. Jankowski, D.; Amanowicz, M. A study on flow features selection for malicious activities detection in software defined networks. In Proceedings of the 2018 International Conference on Military Communications and Information Systems (ICMCIS), Warsaw, Poland, 22–23 May 2018; pp. 1–9.
  11. Jankowski, D.; Amanowicz, M. A method of network workload generation for evaluation of intrusion detection systems in SDN environment. In Proceedings of the 2016 International Conference on Military Communications and Information Systems (ICMCIS), Brussels, Belgium, 23–24 May 2016.
  12. Liao, H.-J.; Richard Lin, C.-H.; Lin, Y.-C.; Tung, K.-Y. Intrusion detection system: A comprehensive review. J. Netw. Comput. Appl. 2013, 36, 16–24.
  13. Umer, M.F.; Sher, M.; Bi, Y. Flow-based intrusion detection: Techniques and challenges. Comput. Secur. 2017, 70, 238–254.
  14. Kozik, R.; Choraś, M.; Hołubowicz, W. Evolutionary-based packets classification for anomaly detection in web layer. Secur. Commun. Networks 2016, 9, 2901–2910.
  15. Bhuyan, M.H.; Bhattacharyya, D.K.; Kalita, J.K. Network Anomaly Detection: Methods, Systems and Tools. IEEE Commun. Surv. Tutorials 2014, 16, 303–336.
  16. Boriah, S.; Chandola, V.; Kumar, V. Similarity Measures for Categorical Data: A Comparative Evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining, Philadelphia, PA, USA, 24–26 October 2008.
  17. Kruczkowski, M.; Niewiadomska-Szynkiewicz, E.; Kozakiewicz, A. FP-tree and SVM for Malicious Web Campaign Detection. In Intelligent Information and Database Systems. ACIIDS 2015; Nguyen, N., Trawiński, B., Kosala, R., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 9012, pp. 193–201.
  18. Kruczkowski, M.; Niewiadomska-Szynkiewicz, E. Comparative study of supervised learning methods for malware analysis. J. Telecommun. Inf. Technol. 2014, 4, 24–33.
  19. Buczak, A.L.; Guven, E. A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection. IEEE Commun. Surv. Tutorials 2016, 18, 1153–1176.
  20. Dua, S.; Du, X. Data Mining and Machine Learning in Cybersecurity; Auerbach Publications: New York, NY, USA, 2016; ISBN 9780429063756.
  21. Denatious, D.K.; John, A. Survey on data mining techniques to enhance intrusion detection. In Proceedings of the International Conference on Computer Communication and Informatics, Coimbatore, India, 10–12 January 2012; pp. 1–5.
  22. AbuHmed, T.; Mohaisen, A.; Nyang, D. A Survey on Deep Packet Inspection for Intrusion Detection Systems. Mag. Korea Telecommun. Soc. 2008, 24, 25–36.
  23. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009.
  24. Dhanabal, L.; Shantharajah, S.P. A study on NSL-KDD dataset for intrusion detection system based on classification algorithms. Int. J. Adv. Res. Comput. Commun. Eng. 2015, 4, 446–452.
  25. Mehdi, S.A.; Khalid, J.; Khayam, S.A. Revisiting Traffic Anomaly Detection Using Software Defined Networking. In Recent Advances in Intrusion Detection. RAID 2011. Lecture Notes in Computer Science; Sommer, R., Balzarotti, D., Maier, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 161–180.
  26. Dotcenko, S.; Vladyko, A.; Letenko, I. A fuzzy logic-based information security management for software-defined networks. In Proceedings of the 16th International Conference on Advanced Communication Technology, Pyeongchang, Korea, 16–19 February 2014; pp. 167–171.
  27. Giotis, K.; Argyropoulos, C.; Androulidakis, G.; Kalogeras, D.; Maglaris, V. Combining OpenFlow and sFlow for an effective and scalable anomaly detection and mitigation mechanism on SDN environments. Comput. Netw. 2014, 62, 122–136.
  28. Phaal, P.; Panchen, S.; McKee, N. InMon Corporation’s Sflow: A Method for Monitoring Traffic in Switched and Routed Networks. Available online: https://tools.ietf.org/pdf/rfc3176.pdf (accessed on 20 February 2021).
  29. Braga, R.; Mota, E.; Passito, A. Lightweight DDoS flooding attack detection using NOX/OpenFlow. In Proceedings of the IEEE Local Computer Network Conference, Denver, CO, USA, 10–14 October 2010.
  30. Sathya, R.; Thangarajan, R. Efficient anomaly detection and mitigation in software defined networking environment. In Proceedings of the 2nd International Conference on Electronics and Communication Systems (ICECS), Coimbatore, India, 26–27 February 2015.
  31. Bhargava, N.; Sharma, G.; Bhargava, R.; Mathuria, M. Decision tree analysis on J48 algorithm for data mining. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2013, 3, 1114–1119.
  32. Nakamura, R.Y.M.; Pereira, L.A.M.; Costa, K.A.; Rodrigues, D.; Papa, J.P.; Yang, X.-S. BBA: A Binary Bat Algorithm for Feature Selection. In Proceedings of the 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, Ouro Preto, Brazil, 22–25 August 2012.
  33. Mirjalili, S.; Mirjalili, S.M.; Yang, X.-S. Binary bat algorithm. Neural Comput. Appl. 2014, 25, 663–681.
  34. Le, A.; Dinh, P.; Le, H.; Tran, N.C. Flexible Network-Based Intrusion Detection and Prevention System on Software-Defined Networks. In Proceedings of the 2015 International Conference on Advanced Computing and Applications (ACOMP), Ho Chi Minh City, Vietnam, 23–25 November 2015.
  35. Ruggieri, S. Efficient C4.5 [classification algorithm]. IEEE Trans. Knowl. Data Eng. 2002, 14, 438–444.
  36. Tang, T.A.; Mhamdi, L.; McLernon, D.; Zaidi, S.; Ghogho, M. Deep learning approach for Network Intrusion Detection in Software Defined Networking. In Proceedings of the 2016 International Conference on Wireless Networks and Mobile Communications (WINCOM), Fez, Morocco, 26–29 October 2016.
  37. Queiroz, W.; Capretz, M.A.M.; Dantas, M. An approach for SDN traffic monitoring based on big data techniques. J. Netw. Comput. Appl. 2019, 131, 28–39.
  38. Tuan, N.N.; Hung, P.H.; Nghia, N.D.; Van Tho, N.; Van Phan, T.; Thanh, N.H. A DDoS Attack Mitigation Scheme in ISP Networks Using Machine Learning Based on SDN. Electronics 2020, 9, 413.
  39. Elsayed, M.S.; Le-Khac, N.-A.; Jurcut, A.D. InSDN: A Novel SDN Intrusion Dataset. IEEE Access 2020, 8, 165263–165284.
  40. Gomez-Rodriguez, J.R.; Sandoval-Arechiga, R.; Ibarra-Delgado, S.; Rodriguez-Abdala, V.I.; Vazquez-Avila, J.L.; Parra-Michel, R. A Survey of Software-Defined Networks-on-Chip: Motivations, Challenges and Opportunities. Micromachines 2021, 12, 183.
  41. Ruaro, M.; Caimi, L.L.; Moraes, F.G. A Systemic and Secure SDN Framework for NoC-Based Many-Cores. IEEE Access 2020, 8, 105997–106008.
  42. Ruaro, M.; Caimi, L.L.; Moraes, F.G. SDN-Based Secure Application Admission and Execution for Many-Cores. IEEE Access 2020, 8, 177296–177306.
  43. Chaves, C.; Azad, S.; Hollstein, T.; Sepúlveda, J. DoS Attack Detection and Path Collision Localization in NoC-Based MPSoC Architectures. J. Low Power Electron. Appl. 2019, 9, 7.
  44. Wu, H.C.; Luk, R.W.P.; Wong, K.F.; Kwok, K.L. Interpreting TF-IDF term weights as making relevance decisions. ACM Trans. Inf. Syst. 2008, 26, 1–37.
  45. Hyvärinen, A.; Karhunen, J.; Oja, E. Independent Component Analysis; Adaptive and Learning Systems for Signal Processing, Communications, and Control; John Wiley & Sons, Inc.: New York, NY, USA, 2001; ISBN 047140540X.
  46. Cao, L.J.; Chua, K.S.; Chong, W.K.; Lee, H.P.; Gu, Q.M. A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing 2003, 55, 321–336.
  47. Wrona, K.; Amanowicz, M.; Szwaczyk, S.; Gierlowski, K. SDN testbed for validation of cross-layer data-centric security policies. In Proceedings of the 2017 International Conference on Military Communications and Information Systems (ICMCIS), Oulu, Finland, 15–16 May 2017.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
