1. Introduction
OpenFlow was proposed as a communication protocol that gives a controller access to the forwarding plane of a network switch or router. The OpenFlow API and the stateful data plane [1] are widely used among researchers because they are straightforward to integrate with software controllers. Early vendors supporting OpenFlow included HP, NEC, and some others, and this list has since grown significantly. The combination of SDN [2] and OpenFlow [3] is currently the simplest way to perform Quality of Service (QoS) estimation, whenever and wherever, using self-guided, self-tuning components that continuously monitor and measure network performance and respond quickly to issues [4]. With the software-based OpenFlow API, a wide range of controller platforms has been developed [5] that let programmers build numerous applications, such as dynamic access control [6], network virtualization [7], energy-efficient networking [8], seamless virtual-machine migration, and user mobility.
Due to the underlying hardwired implementation of routing rules and other hurdles, conventional network equipment such as switches and routers is inflexible and cannot deal with varied forms of network traffic. A programmable SDN network, by contrast, can be constructed to meet the needs of the network operator, who can set bandwidth, latency, and other parameters, including packet loss and inactivity, to help meet the diverse requirements of cutting-edge network applications and services [9]. Although the hybrid deployment of SDN [10] has many advantages, including adaptability to budget constraints and central programmability of the network, it is limited by its structure. Thus, we need to find optimal solutions that make SDN more powerful and support a broad set of network functionalities. However, SDN is generally not completely adopted in networks for a variety of reasons. One significant cause is a restricted budget for new network infrastructure.
SDN management applications require reliable and timely information on network resources at various aggregation levels in order to respond to traffic changes. Traffic measurement is therefore critical in traffic engineering. It consists of three major subtasks: measuring network topology, measuring network traffic, and measuring network performance. Many SDN systems, however, employ conventional monitoring methods, which either need complicated extra modules at the switch or incur substantial measurement overhead. iSTAMP [11] is an intelligent traffic aggregation and measurement paradigm. It divides switch TCAM entries into two parts: wildcard rules for aggregate measurements, and fine-grained rules that deaggregate and directly measure the most informative flows for per-flow measurements. OpenMeasure [12] assumes that the aggregate matrix is provided in accordance with the underlying routing and flow aggregation principles. OpenTM [13] is a query-based monitoring solution for OpenFlow networks that estimates the traffic matrix (TM). It keeps track of all active network flows.
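The idea of query-based traffic-matrix estimation can be illustrated with a minimal sketch. It is not the algorithm of OpenTM or of any other cited system; it only assumes that a controller polls cumulative per-flow byte counters twice and converts the difference into a rate. The function name `traffic_matrix`, the flow-key layout `(src, dst, flow_id)`, and the polling interval are hypothetical.

```python
from collections import defaultdict


def traffic_matrix(prev, curr, interval_s):
    """Estimate a traffic matrix in bits per second from two counter polls.

    prev and curr map a hypothetical flow key (src, dst, flow_id) to a
    cumulative byte counter, as a controller might read it from switch
    flow statistics. interval_s is the time between the two polls.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    matrix = defaultdict(float)
    for key, byte_count in curr.items():
        src, dst, _flow_id = key
        # A flow absent from the previous poll is assumed to start at 0.
        delta = byte_count - prev.get(key, 0)
        if delta < 0:
            # Counter went backwards (for example, the rule was
            # reinstalled): count from zero instead.
            delta = byte_count
        matrix[(src, dst)] += delta * 8 / interval_s
    return dict(matrix)
```

Aggregating per-flow deltas by source and destination pair keeps the sketch independent of how the counters are collected; a real deployment would also have to decide which switches to poll and how often, which is where the polling overhead discussed above arises.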
Therefore, SDN traffic measurement [14] comes into consideration to break down the complexity of SDNs’ unresolved problems; it also supports different types of tasks for OpenFlow controllers through programmable interfaces. These software-defined measurement solutions help to support the diverse requirements of next-generation network applications and services by providing consistent measurement of flow parameters, such as bandwidth, packet loss, and latency. Moreover, whether the path carries that service or not, the flexibility of software-defined measurement still gives the network operator the capability to offer dynamic QoS that guarantees service quality between endpoints. SDN traffic measurement also allows several characteristics, some of which cannot be measured directly in traditional large networks, to be inferred in an indirect, nonintrusive, and statistical way. Although it is quite a mature technology these days, it is still challenging for programmers to analyze or predict future traffic measurements in order to provide dynamic QoS that guarantees service quality.
However, with the development of machine learning (ML) algorithms, we can make decisions on plenty of tasks using previous knowledge. In general, statistical machine learning, deep learning (DL), and deep reinforcement learning (DRL) are the main methods for dealing with the challenge of future prediction. ML is a widely used way to analyze traffic data, and it can give the network administrator recommendations for the future. DL is a more recent method that performs more efficiently and accurately than classic ML approaches and can extract more representative features to make a prediction. Different from these two approaches, DRL is more concerned with how to lead agents to take actions that achieve the final goals, and it interacts with the environment to make decisions. By analyzing the traffic data, SDN traffic measurements across the network can be predicted, and programmers can then prevent attackers from taking over the network controller, which provides satisfactory service and guarantees the QoS.
In this paper, we present a survey of the research relating to solutions in SDN traffic measurement that has been carried out until 2022. We first describe the current issues of the SDN, SDCN, and traffic measurement, and then analyze the existing approaches in terms of their aspects and contributions. To further improve the performance of SDN traffic measurement, we summarize the future scopes of ML-based optimization and networking deployment. The main contributions of this paper are as follows:
We identify the basic requirements for designing an SDN traffic measurement framework, which can be used as a standard design method in this area.
We provide a brief literature review of SDN and SDCN measurements, categorize them by network application and learning behavior, and summarize the future scope of research together with the existing challenges.
To further improve the measurement of traffic data, we summarize the possibilities for applying machine learning, deep learning, or deep reinforcement learning to traffic measurement. In more detail, we analyze previous machine-learning-based methods and then transfer the basic idea to other applications with different approaches.
Lastly, we conclude that machine learning is the best choice for designing a framework in which a set of algorithms can be used as a measurement library for SDN traffic measurement in the next generation of heterogeneous, complex, and hierarchical networks.
The remainder of this paper is organized as follows: In Section 2, we first give a brief description of SDN and OpenFlow and discuss their development and some measurement issues. Then, we describe the currently existing cellular network in Section 3 and discuss how to solve the security issues to improve the quality of SDCN. After this, in Section 4, we introduce SDN traffic measurement in detail. In addition, we categorize the previous state-of-the-art traffic measurement solutions for SDN/SDCN into four different segments based on the network application behavior described in Section 5. Additionally, we categorize a few ML SDN traffic measurement solutions based on their learning behavior to show the advantages of using ML in Section 6. After that, we list the future prospects of ML-based networking and nine future research directions opened in SDN and SDCN traffic measurement in Section 7 and Section 8, respectively. Lastly, Section 9 summarizes the conclusions.
3. Cellular Network
Even though there has been consistent development of constantly accessible networks [26,27], the security, threats, and scalability of cellular networks still need to be researched. Existing networks have many security issues, such as interface flooding, network element crashing, traffic eavesdropping, unauthorized data access, traffic modification, data modification, compromised network elements, malicious insiders, and theft of service [28]. These attacks generally occur in the base station, radio station, and Evolved Packet Core, and security can be strengthened and the network adequately managed by including SDN in the network. In this part, we discuss the structure of the currently existing networks and SDCN architectures, respectively.
The tremendous developments in technology and networks are forming a connected universe of billions of devices that communicate with each other. Current 3G/4G wireless technologies are extending IP connectivity and aim to provide faster Internet connections, multimedia applications, and a large number of services with improved performance. Despite these rapid improvements in network devices, the network does not adequately meet the demand for diversity and low latency that is anticipated in the 5G wireless network. 5G fundamentally provides user-centric connectivity, where various applications are accessed at a faster pace, at higher capacity, and at 1 ms latency. 5G is viewed as an essential instrument for realizing the Internet of Things (IoT) paradigm of connecting billions of devices, as it is capable of supporting machine-to-machine (M2M) communication at low cost and with low battery consumption.
3.1. Long-Term Evolution
In this subsection, we first discuss the LTE architecture, which is the main technology in communication networks, especially in the SDN context, and consists of two parts, E-UTRAN and EPC [29]. The user equipment (UE) connects to a base station, which directs traffic through a serving gateway (S-GW) over a GPRS Tunneling Protocol (GTP) tunnel. Here, the S-GW must deal with mobility in a user’s area and store a large amount of state, since users tend to retain their IP addresses when they move. Moreover, the S-GW sends traffic to the packet data network gateway (P-GW), which enforces quality-of-service policies and monitors traffic to perform charging for the users; see also [30,31]. The P-GW likewise connects to the Internet and other cellular data networks and acts as a firewall that blocks unwanted traffic.
In cooperation with the Mobility Management Entity (MME), which provides handover and mobility functions, the P-GW handles session setup, reconfiguration, and mobility. For instance, in response to a UE’s request for a dedicated session setup, the P-GW sends QoS and other session data to the S-GW. The S-GW in turn forwards the data to the MME, which then asks the base station to allocate radio resources and establish the connection with the UE. During a handoff of the UE, the source base station sends the handoff request to the target base station and, after receiving an acknowledgment, transfers the UE state to the target base station. The target base station also informs the MME that the UE has changed cells, and the former base station releases its resources; a new GTP tunnel is then created for the new base station.
The Policy and Charging Rules Function (PCRF) manages flow-based charging in the P-GW. The PCRF also provides the QoS authorization that decides how to treat each traffic flow, based on the user’s subscription profile. Moreover, the Home Subscriber Server (HSS) contains subscription data for every user and the associated MME. To give a more detailed illustration, Figure 3 shows the LTE architecture, comprising the Evolved Packet Core (EPC) and the Evolved UMTS Terrestrial Radio Access Network (E-UTRAN).
In general, the design of present cellular networks has a few major limitations. All traffic in the network must pass through the P-GW, which makes it a critical device in the EPC and makes it exceptionally costly. Centralizing data-plane functions at the cellular-Internet boundary forces all traffic through the P-GW, including traffic between users on the same cellular network, making it hard to host popular content inside the cellular network. Furthermore, the network devices have vendor-specific configuration interfaces with no programmability options and communicate through complex control-plane protocols with a large and growing number of tunable parameters. Above all, new types of architectures are needed to solve these critical problems [29].
3.2. Software-Defined Cellular Network (SDCN)
Next, in this part, we discuss the structure of SDCNs. Cellular networks are ripe for the introduction of SDN [29], where the network hardware performs basic packet-processing functions at the behest of applications running on a logically centralized controller. Because the cellular network is a wireless network, deploying SDN in it is not easy. SDN can distribute data-plane rules over multiple, less expensive network switches, reducing the scalability pressure on the packet gateway and enabling flexible handling of traffic that stays inside the cellular network. However, supporting real-time updates to many fine-grained packet-handling rules raises enormous scalability challenges. Mobility is also a big issue, and SDN integration can make a significant impact in solving this problem: supporting mobility requires forwarding state at the level of individual subscribers, and this state must change rapidly to avoid service disruptions.
A typical LTE system architecture comprises an Evolved UMTS Terrestrial Radio Access Network (E-UTRAN) and the System Architecture Evolution (SAE). The Evolved Packet Core (EPC) is the core component of SAE. The SDCN’s goal, in turn, is to create a large-scale, consistent sensing layer that can successfully interact with the top layers. SDCN appears to the top layer as a “large sensor”, concealing the internal communication complexity of the SDCN’s linked sensing networks. SDCN employs a divide-and-conquer-based discovery technique to establish the large-scale sensing network, which is meant to complete the discovery service for the whole network based on diverse communication and capability features among IoT devices, gateways, and users.
Abdul et al. [32] introduce and assess an SDN-based LTE network design that is intended to address existing LTE network constraints while also bringing various advantages to future wireless networks. The suggested architecture provides a centralized perspective for network operators to deploy new services that aid in the effective management of network resources. The results reveal that the architecture allows the controller to improve network performance by centrally controlling radio resource operations and benefiting from the global network perspective. Khan et al. [33] investigate the controller placement problem in SDCN while taking into account the uncertainty in cellular user locations. Moreover, it solves and evaluates
and CPPA using sample average approximation in conjunction with different linearization strategies. The major findings indicate the following benefits:
Joint compared with sequential optimization.
Stochastic compared with deterministic optimization.
Adaptive compared with static optimization.
The primary optimization criteria are as follows:
(a) Reduce the number of controllers.
(b) Reduce reaction time to various eNBs.
As opposed to , the eNB-controller assignment in CPPA adjusts to fluctuations in cellular user locations. However, because control functionality is performed by cloud services and thus benefits from commodity computing capabilities, integrating SDN in LTE networks gives benefits in terms of CAPEX and OPEX. Another advantage stems from the fact that in the transportation network, the data plane is simplified by eliminating the GTP tunnel. In the future, this will enable the use of commodity off-the-shelf OpenFlow switches to deliver the necessary data forwarding functionalities managed from the cloud.
There are four main ideas mentioned in [29]: flexible policies on subscriber attributes, scalability through local switch agents, flexible switch patterns and actions, and remote control of virtualized base-station resources. They were proposed to design the cellular network on top of the LTE structure, and they can also be used as a 5G network concept. A simple SDCN architecture is shown in Figure 4. On the other hand, Network Functions Virtualization (NFV) [34,35] is used for specific network functions, such as mobility, security, and handover, which are hosted in the data center, and it can also store user information, billing, and subscriptions. Thus, every logic component is separated from the data plane. As described in the earlier section of this paper, the OpenFlow protocol is responsible for the connection between controllers and switches/routers. In SDCN, OpenFlow is additionally used to apply rules directly to the base station from the controller, and the controller can be used for directing traffic through middleboxes, monitoring for network control, billing, mobility, QoS control, virtual cellular operators, or intercell interference management.
3.3. Security Challenges and Traffic Measurement
There are four groups of security issues in the cellular network [28] of SDCN, and these security challenges are closely related to traffic measurement. SDN overcomes the limitations of the heterogeneous network by decoupling the control plane from the forwarding plane and adding programmable capabilities; this separation gives operators resource provisioning and programmability to change and control the characteristics of the entire network [36]. Accordingly, SDN needs to address current network issues and enhance network security with highly effective and efficient mechanisms.
Although OpenFlow development is at its peak, there are still many challenges to be taken into consideration. The man-in-the-middle attack in particular is a big issue for SDCNs, because user information can be intercepted when data pass through the controller. An attacker can thus steal network data, change flow rules in the switches, or even insert wrong flow rules to take control of the entire network [37].
Moreover, OpenFlow itself creates some security issues, which raises the security challenges of SDCNs to the next level. The controller generally has a few modules for efficient network management and monitoring, which can be viewed as third-party applications [38]. Once these modules have been compromised in the controller, they cause adverse effects and introduce unexpected vulnerabilities in the entire system.
Therefore, to overcome these security challenges in SDCN and to set up a better and smarter network defense system, we have to apply currently developed SDN techniques to recognize malicious activities across different tasks and react to them ahead of time. Here, we believe SDN traffic measurement ought to be the prime choice for the following reasons:
Traffic measurement takes advantage of specific techniques to understand and evaluate network behavior, which can be extremely useful in identifying anomalous behavior ahead of time.
Traffic measurement can likewise be very helpful in deploying defenses and responses in a short time.
Traffic measurement is the way to understand the network status continuously, and it can be applied to a large network; together, these are viable approaches to building a better secured environment.
3.4. Comparison between SDCN and SDN LTE Architecture
The SDCN (Software-Defined Cloud Networking) architecture and the SDN LTE architecture are both intended to provide flexible and programmable networking capabilities. Yet, there are numerous important distinctions in terms of originality and advantages between these two systems.
Control scope: The SDN LTE architecture focuses on wireless network control and administration, namely LTE (Long-Term Evolution) networks, using a centralized controller that governs network activities, such as radio resource management, mobility management, and quality of service. The SDCN design, on the other hand, concentrates on data center network control and management, offering a consolidated view of network resources and enabling dynamic network provisioning and orchestration.
Network agility: Both designs attempt to promote network agility by providing programmable network interfaces and centralized administration. Nevertheless, the SDN LTE architecture is especially built to handle the dynamic allocation of radio resources, allowing network operators to maximize network performance in real time depending on changing network conditions. The SDCN design, on the other hand, delivers network agility through the virtualization of network resources, enabling on-demand network provisioning and more effective resource consumption.
Integration with cloud computing: The SDCN architecture is intended to be integrated with cloud computing platforms, allowing for the dynamic development and control of network services in the cloud. This gives cloud-based applications and services more scalability and flexibility. The SDN LTE architecture, on the other hand, is designed to provide mobile network operators with flexible networking capabilities and may not be directly connected with cloud computing platforms.
While both the SDN LTE architecture and the SDCN architecture offer programmable and flexible networking capabilities, their areas of concentration and advantages differ. The SDCN design provides virtualized network resources for cloud computing platforms, while the SDN LTE architecture allows dynamic resource allocation for wireless networks.
4. SDN Traffic Measurement
In this section, we introduce traffic measurement in SDN and the issues that distinguish it from measurement in current networks. Technology has advanced significantly in the previous several years, and one of the most significant aspects of SDN implementation is traffic analysis. The SDN controller acts as the central processing unit of the network, evaluating and monitoring real-time data flow. Monitoring and analyzing real-time data traffic is a key tool in any networking solution for viewing data packets as they travel from the source host to the destination host. Several comparisons were made in the study of [39]. First, the performance of the two most popular open source SDN controllers, POX and Ryu, was compared. Second, a large number of topologies were implemented in order to compare linear and tree topologies. Third, several packet sizes were compared, providing the researchers with knowledge of the best packet sizes in various scenarios [39].
New technology makes it difficult for network administrators to measure the status and dynamics of the network in effective and efficient ways [40], which is the basic requirement for explaining the kinds of network activities at various time scales. Congestion control and guaranteed performance are essential to ensure application performance, allowing network administrators, even in highly stressful circumstances, to meet users’ expectations for delivering applications with ensured QoS. There are some key points that need to be considered before choosing the way of traffic measurement, as follows:
Given the colossal scale and diversity of the traffic, SDN measurement tasks need a different way to measure the data traffic. Two types of measurement exist in SDN traffic measurement: active measurement and passive measurement [43].
4.1. Active Measurement
In the active measurement process, network flows are continuously checked for performance by sending dedicated probe packets over the network. The production traffic is perturbed by this probe traffic, which may create significant overhead. Thus, active measurement in SDN requires caution when it is adapted to the requirements of a centralized control design. Additionally, the deployment of active measurement devices increases the data volume by orders of magnitude [44]. As a result, the flow data acquired by the distributed devices in the network cannot supply the SDN controller with the essential information, and it is essential to limit the influence of the probing activity. Even though there are some design methods included in this area, such as artificial intelligence and robotic nature, the controller still faces communication bottlenecks, tight control, and performance optimization issues. It is necessary to know that this is a continuous process, and that the more often queries are run, the more congestion increases. This technique can put the controller at risk, since it cannot respond as quickly as the traffic increases.
4.2. Passive Measurement
By contrast, in passive measurement, real traffic is captured and queries are then run against it to obtain the traffic measurement. Passive measurement does not inject additional traffic, so it adds no overhead to the network. This makes it particularly valuable under heavy traffic, where it relies on sampling and statistical methods to obtain the traffic data. The primary challenge of passive measurement is that small flows might be missed along an SDN flow path, or that the very same packet may be sampled more than once, leading to measurement errors [45]. To obtain real-time measurements, sophisticated hardware must be deployed in the network so that results are available within tight time bounds. At present, a few emerging approaches attempt to overcome some of these critical challenges by exploiting the scalability of SDNs to offer programmable interfaces that achieve fine-grained measurements of network traffic flows. These existing studies propose either active measurement techniques or passive measurement strategies.
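Why sampling tends to miss small flows can be made concrete with a short worked example. Assuming, purely for illustration, that every packet is sampled independently with probability p, a flow of n packets goes completely unobserved with probability (1 - p)^n. The function name `miss_probability` is hypothetical, and real samplers (for example, deterministic 1-in-N sampling) behave somewhat differently.

```python
def miss_probability(flow_packets, sampling_rate):
    """Probability that independent per-packet sampling misses a whole flow.

    flow_packets is the number of packets in the flow; sampling_rate is
    the probability that any single packet is sampled (an assumption of
    this sketch, not a property of a specific switch).
    """
    if flow_packets < 0:
        raise ValueError("flow_packets must be non-negative")
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be in [0, 1]")
    return (1.0 - sampling_rate) ** flow_packets


if __name__ == "__main__":
    # Compare a short flow with a long one at a 1% sampling rate.
    for packets in (5, 50, 500):
        print(packets, round(miss_probability(packets, 0.01), 4))
```

Under this model the miss probability falls geometrically with flow length, which is why long flows are almost always observed while short ones are frequently lost.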
4.3. Requirement of SDN Traffic Measurement
SDN traffic measurement imposes requirements that must be followed when designing SDNs. Without accurate measurement, the network cannot be scalable, adaptable, and secure, and it is nearly impossible to design a higher-level network policy [46]. The basic part of an SDN is the software program, which can be tracked by an attacker. To avoid having to manually monitor what is happening inside the network at all times, it is necessary to apply artificial intelligence to the network, making meaningful use of big data. Thus, SDN traffic measurement is the critical step toward making the network scalable and adaptable for network operators. A robust network framework for SDN has several design goals that impact the network, and it needs to address the issues given below.
Accuracy: an SDN traffic measurement framework requires high measurement accuracy, which is crucial for every network function, and specifically in an SDN. Additionally, a measurement job should not create congestion or performance degradation.
Resource effectiveness: an SDN traffic measurement framework should use the Central Processing Unit (CPU) and memory for packet handling efficiently, because every measurement job consumes additional resources, and CPU usage therefore grows with the number of measurement tasks. Thus, traffic measurement solutions need to decrease resource usage while maintaining high accuracy.
Generality: an SDN traffic measurement framework should support a wide variety of measurement solutions. A framework that serves only one network function is costly, complex, unreliable, and unsuitable for next-generation heterogeneous networks.
Simplicity: an SDN traffic measurement framework should automatically mitigate the processing burden under high traffic load, and it should not require configuring every host. It needs to organize the data and the results so that an action can be deployed immediately in the network.
Papers related to SDN traffic measurement frameworks include designs such as OpenSketch [47], UnivMon [48], DREAM [49], SCREAM [50], and Trumpet [51], as well as the online measurement of large traffic data by Jose et al. [52], all of which support the requirements above. Such a framework contains a separate data plane that runs on the software switches in the network and a centralized control plane that collects data from all software switches for various types of measurements. Figure 5 shows that balanced overhead, generality, real-time decision making, and proper resource usage are the standards set for accurate SDN measurement.
4.4. Traffic Measurement Activities
There are several measurement activities for traffic measurement. Here, we have to determine what in the network needs to be measured. To answer this question, we summarize the measurement activities that are indispensable for SDNs/SDCNs.
4.4.1. Link Latency
Link latency is the delay between an input into a system and the desired outcome; the term is understood slightly differently in various contexts, and latency issues also vary from one system to another. Link latency strongly affects how usable and enjoyable devices and communications are, and latency-sensitive network applications rely on it to detect network malfunctions and then correct them [19]. For example, e-banking needs to communicate without excessive delay, so link latency plays a crucial role. By watching latency thresholds, the network administrator can reroute the network path to maintain QoS [20].
An example of link latency and how it can be measured is shown in Figure 6a. In general, it can be described as the latency in the communication between the user and the servers. Here, we need to calculate the latency between them as one quantity, from sending the request to receiving the response; then, we can devise a solution to guarantee the QoS.
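One common way for a controller to estimate the latency of a single link, sketched here under stated assumptions and not taken from the cited works, is to inject a probe packet at switch A, have it forwarded over the link to switch B, and receive it back at the controller. Subtracting half of each controller-to-switch round-trip time from the total probe time leaves an estimate of the one-way link latency. All names (`link_latency`, `needs_reroute`) and the example threshold are hypothetical.

```python
def link_latency(probe_total_s, rtt_ctrl_a_s, rtt_ctrl_b_s):
    """Estimate the one-way latency of link A->B from controller timings.

    probe_total_s: time from sending the probe to switch A until it
                   returns to the controller from switch B.
    rtt_ctrl_a_s:  round-trip time between controller and switch A.
    rtt_ctrl_b_s:  round-trip time between controller and switch B.
    Assumes the control channels are symmetric, so half of each RTT is
    attributed to the probe path.
    """
    estimate = probe_total_s - rtt_ctrl_a_s / 2 - rtt_ctrl_b_s / 2
    # Measurement noise can push the estimate below zero; clamp it.
    return max(estimate, 0.0)


def needs_reroute(latency_s, threshold_s):
    """Return True when the measured latency violates the QoS threshold."""
    return latency_s > threshold_s


if __name__ == "__main__":
    latency = link_latency(0.010, 0.004, 0.002)
    print(f"estimated link latency: {latency * 1000:.1f} ms")
    print("reroute:", needs_reroute(latency, threshold_s=0.005))
```

The threshold check mirrors the rerouting decision described above: once the estimate crosses the configured bound, the controller can select an alternative path.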
4.4.2. Network Topology
Network topology is the arrangement of elements such as the links and nodes of a communication network, giving a complete overview of the whole network with all the physical links among all network nodes [53,54,55]. Measuring and updating the topology plays an essential role in providing basic network capabilities. Moreover, network functions such as routing, troubleshooting, network management, and malware detection need topology information. Figure 6b shows a distributed topology in a simple network, where the nodes communicate with each other and each of them links to several users. To measure the traffic in different kinds of network topologies, we first need to clarify their architecture and then collect the messages.
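As a minimal sketch of what maintaining a topology view involves, the snippet below builds an adjacency map from a list of discovered links and checks whether the network is connected. The link list stands in for whatever discovery mechanism the controller uses; the function names and the switch identifiers are hypothetical.

```python
from collections import defaultdict, deque


def build_topology(links):
    """Build an undirected adjacency map from (node_a, node_b) link pairs."""
    adjacency = defaultdict(set)
    for node_a, node_b in links:
        adjacency[node_a].add(node_b)
        adjacency[node_b].add(node_a)
    return adjacency


def is_connected(adjacency):
    """Return True if every node is reachable from an arbitrary start node."""
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    # Breadth-first search over the discovered links.
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(adjacency)


if __name__ == "__main__":
    topo = build_topology([("s1", "s2"), ("s2", "s3"), ("s3", "s1")])
    print(sorted(topo["s2"]), is_connected(topo))
```

A connectivity check of this kind is the sort of query that routing or troubleshooting functions could run against the measured topology.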
4.4.3. Bandwidth
Bandwidth [56,57,58] is the maximum throughput or capacity of a communication link, whether logical or physical. The maximum capacity is bounded by the Shannon channel capacity theorem, which depends on the channel bandwidth and the signal-to-noise ratio. An example is shown in Figure 7a, in which the global branch has limited capacity per second, which may be divided among several local branches.
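The Shannon bound mentioned above is C = B log2(1 + S/N), where C is the capacity in bits per second, B is the channel bandwidth in hertz, and S/N is the linear signal-to-noise ratio. The following sketch simply evaluates this formula; the function names and the example values are illustrative assumptions.

```python
import math


def db_to_linear(snr_db):
    """Convert a signal-to-noise ratio from decibels to a linear ratio."""
    return 10.0 ** (snr_db / 10.0)


def shannon_capacity_bps(bandwidth_hz, snr_linear):
    """Upper bound on error-free throughput, C = B * log2(1 + S/N)."""
    if bandwidth_hz < 0 or snr_linear < 0:
        raise ValueError("bandwidth and SNR must be non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    # Example: a 20 MHz channel at 20 dB SNR (illustrative values only).
    capacity = shannon_capacity_bps(20e6, db_to_linear(20.0))
    print(f"{capacity / 1e6:.1f} Mbit/s")
```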
4.4.4. Heavy Hitters and Heavy Changers
Heavy hitters [
59] are very large flows that cause network congestion, making it hard for a link to handle the data. Thus, we have to set a threshold to decide whether the traffic is congested and then apply a remedy. A heavy changer is similar to a heavy hitter, in that it also causes congestion in the network. The difference is that a heavy changer is a flow whose change in byte count across two consecutive epochs exceeds a threshold.
Figure 7b shows an example of these two measurements, where the large flows exceed the capacity of the network and lead to congestion.
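The two definitions can be sketched directly over per-flow byte counts. The dictionaries and thresholds below are hypothetical inputs chosen for illustration; a real system would obtain the counts from switch counters or sketches.

```python
def heavy_hitters(byte_counts: dict, threshold: int) -> set:
    """Flows whose byte count in one epoch exceeds the threshold."""
    return {flow for flow, count in byte_counts.items() if count > threshold}


def heavy_changers(prev_epoch: dict, curr_epoch: dict, threshold: int) -> set:
    """Flows whose byte count changed by more than the threshold
    between two consecutive epochs (a missing flow counts as 0 bytes)."""
    flows = set(prev_epoch) | set(curr_epoch)
    return {
        flow
        for flow in flows
        if abs(curr_epoch.get(flow, 0) - prev_epoch.get(flow, 0)) > threshold
    }


# Assumed example: per-flow byte counts in two consecutive epochs.
epoch1 = {"f1": 9_000, "f2": 500, "f3": 7_000}
epoch2 = {"f1": 9_200, "f2": 6_500}
print(heavy_hitters(epoch2, 5_000))           # f1 and f2
print(heavy_changers(epoch1, epoch2, 5_000))  # f2 and f3
```

Note that f1 is a heavy hitter but not a heavy changer, while f3 is a heavy changer only because it disappeared in the second epoch.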
4.4.5. DDoS
DDoS [
60] stands for Distributed Denial of Service: a system receives a massive amount of traffic, exceeding a threshold, from many different sources, which leads to a breakdown in the network. The server is flooded with malicious flows until the link can no longer handle the traffic and the server stops responding to all requests. DDoS is one of the most common attacks, since it takes little time to launch and to take control of the network.
As shown in
Figure 8a, the attacker uses a controller to direct a large number of zombie hosts to send traffic to the victim, which causes congestion and brings down the victim’s system. Our measurement therefore needs to filter out the attack packets in real time to preserve QoS.
4.4.6. Superspreader
A superspreader [
61] is the opposite of a DDoS: a single source sends flows to more than a threshold number of destinations. An attacker can thus use one client’s computer as a superspreader to send a large amount of traffic to other destinations, so that the client’s computer cannot handle the traffic and may even stop responding. To avoid being attacked by a superspreader, we must disconnect the whole network from the Internet. Unlike in a DDoS, the attacker in
Figure 8b controls the victim’s computer to send a large number of flows to other destinations, and the victim cannot handle this traffic either.
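One common way to formalize these two cases, assumed here for illustration, is to count distinct sources per destination (DDoS victim) and distinct destinations per source (superspreader). The flow list and thresholds are hypothetical.

```python
from collections import defaultdict


def fan_in_out(flows):
    """Map each destination to its set of sources and each source to its
    set of destinations. `flows` is an iterable of (src, dst) pairs."""
    sources_of = defaultdict(set)
    destinations_of = defaultdict(set)
    for src, dst in flows:
        sources_of[dst].add(src)
        destinations_of[src].add(dst)
    return sources_of, destinations_of


def ddos_victims(flows, threshold: int) -> set:
    """Destinations contacted by more than `threshold` distinct sources."""
    sources_of, _ = fan_in_out(flows)
    return {dst for dst, srcs in sources_of.items() if len(srcs) > threshold}


def superspreaders(flows, threshold: int) -> set:
    """Sources contacting more than `threshold` distinct destinations."""
    _, destinations_of = fan_in_out(flows)
    return {src for src, dsts in destinations_of.items() if len(dsts) > threshold}
```

Duplicate (src, dst) pairs are counted once, so a single chatty pair does not trigger either detector.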
4.4.7. Entropy
Entropy [
62] characterizes the equilibrium state of the network. A network contains many links, which allow traffic to be distributed in many different ways. Thus, a network needs to use all links properly in real time. It is crucial to know the distribution of flows over the links, so that the links can be kept at the equilibrium state.
There are other important measurement activities, such as cardinality (the number of distinct flows in an epoch) and flow size distribution (the number of flows in different ranges of byte counts in an epoch), which are also valuable in the traffic measurement task.
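As a hedged illustration, the sketch below uses Shannon entropy over byte shares as one possible way to quantify how evenly traffic is spread, together with a simple cardinality count. Equating the equilibrium state described above with maximum Shannon entropy is an assumption of this example, and the input formats are hypothetical.

```python
import math


def traffic_entropy(byte_counts: dict) -> float:
    """Shannon entropy (in bits) of the byte share per link or flow.
    It is maximal when traffic is spread evenly and 0 when one entry
    carries everything."""
    total = sum(byte_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in byte_counts.values():
        if count > 0:
            share = count / total
            entropy -= share * math.log2(share)
    return entropy


def cardinality(flow_ids) -> int:
    """Number of distinct flows observed in an epoch."""
    return len(set(flow_ids))
```

With four equally loaded links the entropy is 2 bits, the maximum for four entries; any imbalance lowers the value.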
4.5. Shortcomings of Current Solutions to Traffic Measurement
The measurement of traffic in SDN (Software-Defined Network) and SDNC (Software-Defined Network Controller) settings can be difficult, owing to a variety of factors. The following are some issues with existing traffic measurement solutions:
Limited visibility: In traditional networks, network administrators have access to every device and can monitor traffic at every point in the network. However, in an SDN or SDNC environment, the network is abstracted, and network administrators have limited visibility into the network. This limited visibility makes it difficult to measure and monitor traffic effectively.
Lack of standardization: SDN and SDNC environments are relatively new, and there is a lack of standardization in terms of traffic measurement solutions. This lack of standardization makes it difficult for network administrators to choose the right traffic measurement solution, leading to compatibility and integration issues.
Overhead: Certain traffic monitoring systems rely on capturing and analyzing network traffic, which can itself generate a large volume of traffic and degrade network performance. This overhead may also reduce the accuracy of the measurements.
Security concerns: Traffic measurement solutions that capture and analyze network traffic can potentially expose sensitive data, making them a target for cyberattacks. It is important to implement proper security measures to protect the traffic measurement solutions and the data they collect.
Cost: Traffic measurement solutions can be expensive, particularly for large-scale networks. This cost can be a significant barrier to implementation, particularly for small and medium-sized businesses.
Overall, while traffic measurement solutions are crucial for monitoring and optimizing network performance, their implementation in SDN and SDNC environments requires careful consideration of the above factors. To overcome the challenges mentioned above, network administrators can consider the following solutions:
Use of flow-based monitoring: Flow-based monitoring measures traffic flows based on the source, destination, and protocol used. This method reduces the amount of data captured and analyzed, reducing the overhead generated by traffic measurement solutions. Flow-based monitoring is also a standardized method of traffic measurement, ensuring compatibility and integration with different SDN and SDNC environments.
Implementing network overlays: Network overlays provide a virtual network layer that can be used to monitor and measure traffic. Network overlays enable network administrators to have more visibility into the network, making it easier to measure and monitor traffic.
Implementing secure traffic measurement solutions: Security concerns can be addressed by implementing secure traffic measurement solutions that use encryption and authentication to protect sensitive data. These solutions can also be implemented using secure protocols such as HTTPS and SSH.
Leveraging open-source solutions: Open-source traffic measurement solutions can be used to reduce costs associated with commercial solutions. Open-source solutions can be customized to fit specific network requirements, making them more flexible than commercial solutions.
Selecting the right traffic measurement solution: Network administrators can select traffic measurement solutions based on the specific needs of their network. Choosing the right solution can reduce compatibility and integration issues, making it easier to implement traffic measurement solutions in SDN and SDNC environments.
Implementing traffic measurement solutions in SDN and SDNC environments requires careful consideration of the challenges mentioned above. By using flow-based monitoring, network overlays, and secure and open-source measurement solutions, and by selecting the solution that fits their network, administrators can effectively measure and monitor traffic in their SDN and SDNC environments. In the following section, we briefly look at several potential approaches for traffic measurement solutions.
5. SDN Traffic Measurement Solutions
In this section, we discuss existing SDN traffic measurement solutions, which can serve as a basis for further research on designing a secured SDCN architecture. We collected almost all the papers that proposed traffic measurement solutions until 2018. In the initial stage, network architectures were simple, less complex, and easy to deploy. Thus, in early research, the measurement requirement was driven by overhead, and most researchers concentrated on balancing overhead. At the same time, resource usage policy played a crucial role, and other solutions were introduced to obtain high accuracy with fewer resources. The complexity and heterogeneity of today’s networks stem from applications such as instant video calling [
63], online transactions, and social networking, so we also have to consider real-time traffic measurement for quick decision making. Furthermore, we need specific network measurements, such as latency monitoring and topology discovery, to determine the current status of the network. Here, we divide the traffic measurement solutions into four groups in
Figure 9, which are described in detail below.
5.1. Balance Overhead
Continuously monitoring the network introduces overhead, which must be traded off against measurement accuracy. To find a suitable balance between accuracy and overhead, Jose et al. [
52] proposed a measurement framework that collects substantial traffic data in commodity switches, where switches match packets against a small collection of wildcard rules available in Ternary Content Addressable Memory (TCAM). TCAM is a type of memory in which each bit of an entry can be 0, 1, or "don't care", so a single wildcard entry can cover many flows. This approach reduces overhead, because the switch can make decisions based on a few wildcard rules. By using TCAM, it is possible to store data or packet processing rules in the switch or router. The framework is evaluated using a Hierarchical Heavy Hitter (HHH) program to understand the tradeoff between accuracy and overhead. In this category of solutions, the matching rules need to be updated regularly, which is the primary concern of this measurement solution [
64].
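The wildcard matching described above can be sketched in software as a value/mask comparison with per-rule counters. This is only an illustration under assumptions: a real TCAM compares all entries in parallel in hardware, and the `TernaryRule` class, the priority scheme, and the example prefixes are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class TernaryRule:
    """Hypothetical wildcard rule: a mask bit of 1 must match, 0 is don't care."""
    value: int
    mask: int
    priority: int
    byte_count: int = 0


def prefix_rule(cidr: str, priority: int) -> TernaryRule:
    """Build a rule matching an IPv4 destination prefix such as 10.0.0.0/8."""
    net = ipaddress.ip_network(cidr)
    return TernaryRule(int(net.network_address), int(net.netmask), priority)


def lookup(rules, key: int):
    """Return the highest-priority rule whose masked bits equal the key's."""
    matches = [r for r in rules if (key & r.mask) == (r.value & r.mask)]
    return max(matches, key=lambda r: r.priority, default=None)


def count_packet(rules, dst_ip: str, size: int):
    """Add the packet size to the counter of the matching rule, if any."""
    rule = lookup(rules, int(ipaddress.ip_address(dst_ip)))
    if rule is not None:
        rule.byte_count += size
    return rule
```

Refining a coarse prefix into longer prefixes when its counter grows large is the kind of rule update that a hierarchical heavy hitter program performs between epochs.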
To further improve the above method, iSTAMP [
11] dynamically partitions the TCAM entries to allow fine-grained measurement of incoming flows. iSTAMP creates two partitions of the TCAM: one is used for aggregated measurements, and the other for deaggregated measurements. Flows are stamped for per-flow measurement if they are considered important. iSTAMP then jointly processes these two sets of measurements to estimate the size of all network flows using different optimization techniques.
On the other hand, methods such as OpenNetMon [
65] use OpenFlow to measure traffic parameters. OpenNetMon determines whether end-to-end QoS parameters are met for each flow through continuous monitoring with predefined rules. This active measurement fetches data from the switches, and the polling rate varies with the flow rate: when the flow rate changes, the number of queries increases, and when it stabilizes, the number of queries decreases. Another OpenFlow-based approach is OpenTM [
13]. Here, switches are simple forwarding devices, and the controller queries traffic data through OpenFlow flow entries in order to track every active flow in the network. OpenTM is an active, network-wide measurement approach, so it inevitably introduces overhead by periodically pulling statistics from switches across the network. Additionally, OpenTM uses different selection strategies to choose which switches to poll, which may lead to some measurement errors. Thus, OpenTM is not suitable for heavily congested networks.
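The adaptive polling idea can be sketched as a rule that shortens the polling interval when the measured rate changes and lengthens it when the rate is stable. The halving/doubling rule, the 20% change threshold, and the interval bounds are assumptions made for this example, not the parameters of OpenNetMon or OpenTM.

```python
def next_poll_interval(
    interval: float,
    prev_rate: float,
    curr_rate: float,
    *,
    change_threshold: float = 0.2,
    min_interval: float = 0.5,
    max_interval: float = 8.0,
) -> float:
    """Return the next polling interval in seconds.

    Poll faster (halve the interval) when the flow rate changed by more
    than `change_threshold` relative to the previous sample, otherwise
    poll slower (double the interval), within the given bounds.
    """
    baseline = max(abs(prev_rate), 1e-9)  # avoid division by zero
    relative_change = abs(curr_rate - prev_rate) / baseline
    if relative_change > change_threshold:
        return max(min_interval, interval / 2)
    return min(max_interval, interval * 2)
```

A controller loop would call this after each statistics reply and schedule the next request accordingly, which trades accuracy during rate changes against overhead during stable periods.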
Several other approaches have been proposed. Hash-based switches [
66] are used to collect the traffic flows in the SDN, which supports different measurement tasks, with a heavy hitter algorithm used to identify important traffic. However, the monitoring rules must be carefully distributed over the network. Y. Zhang et al. [
67] use a prediction-based algorithm for flow counting to detect anomalies. The measurement granularity changes dynamically along both the spatial and temporal dimensions. In the same way, anomaly detectors can instruct the flow collection module to provide fine-grained measurement data when an attack is expected, or to collect coarse-grained flow data otherwise. Furthermore, Payless [
68] is an SDN-based active monitoring framework that also considers both accuracy and overhead. The main advantage of Payless is that it uses an adaptive statistics collection algorithm to deliver accurate information in real time without incurring significant network overhead. It is implemented on top of the API of the Floodlight controller, which keeps overhead low and accuracy high.
Table 1 compares the traffic measurement solutions that focus on balancing overhead.
From the approaches mentioned above, we can generalize their characteristics as follows:
Almost all of these traffic measurement solutions concentrate on balancing overhead and accuracy.
TCAM is a vital component of the solutions in this part, and flow entries are stored in different partitions of the TCAM.
Collecting TCP sequence numbers in a sampled flow (sFlow) collector is a different approach, which is robust in hierarchical networks and popular among network operators.
5.2. Resource Usage Policy
As mentioned before, resource usage was not considered as a tradeoff in the above solutions. One proposed method, DREAM [
49], is a dynamic resource allocation software-defined measurement system that balances the desired accuracy level against the resources used for measurement tasks. In DREAM, resources are not allocated before the measurement task is executed; instead, the system considers resource usage and its impact on accuracy in order to obtain an accurate measurement. Furthermore, DREAM is evaluated with heavy hitter programs to demonstrate that it can support more concurrent tasks with higher accuracy than several alternatives. Moreover, alignment between source and destination is a necessary component for achieving a higher level of accuracy while reducing overhead.
Another approach was developed by Dusi et al. [
69], who studied a powerful proactive controller that requires a certain amount of space for flow entries in TCAM. However, the number of traffic flows can sometimes exceed the limit of TCAM entries. While equipping SDN switches with larger TCAMs is a possible alternative, it may come at the expense of higher operating and power consumption costs. The study recommends that controllers use resources efficiently by adopting a reactive control logic. As in DREAM, the study recommends that resources be allocated and released depending on the network load, the actual behavior of the flows, their granularity, and packet processing, so that they can be managed properly in the network. Additionally,
Baatdaat [
77] is another proposal, which uses NetFPGA programmable switches running OpenFlow and allows continuous dynamic flow scheduling. The proposed algorithm can adapt to immediate traffic bursts as well as to the average link load by using spare data center (DC) network capacity to mitigate the performance degradation of heavily utilized links.
A platform named
HONE, in which various types of measurement solutions can be combined, is proposed in [
70]. It presents a uniform interface for a diverse collection of measurements in SDN-based systems. Since continuously gathering statistics about network flows is costly, two strategies are proposed to address this issue. The first strategy builds a table of statistics collected from sources, destinations, and network devices, which minimizes network overhead by letting the controller query that table. The second strategy, known as parallel streaming, lets operators use these data by aggregating them across multiple hosts. If we want to apply HONE to an SDN, scalability will be the main challenge, because each host needs to install software that synchronizes the statistics so that meaningful queries can be answered.
Table 2 compares the traffic measurement solutions in terms of resource usage.
From the approaches mentioned above, we can generalize their characteristics as follows:
All of these traffic measurement solutions concentrate on how to decrease CPU resource usage while improving accuracy.
A powerful proactive controller requires more TCAM storage, but flow entries exceed the table limit under high traffic load. Thus, researchers applied optimization techniques to manage resources more efficiently.
One effective technique is to give different measurement tasks different priorities in resource usage according to their measurement requirements.
5.3. Real-Time Monitoring
Collecting statistics for a large number of flows in real time is a challenge in SDN traffic measurement. Frameworks built for time-sensitive applications and real-time analysis are needed to address this issue. A proposed method,
PLANCK [
71], is a real-time measurement framework that gathers real-time data for statistical analysis by exploiting standard features of the switches. Its main advantage is that it can be deployed in an SDN environment without changing network devices. PLANCK provides fast measurement by relying on port mirroring, as in traditional networks, which is a common way to perform traffic measurement: the mirrored traffic is passed to a variety of network analyzers and security applications. Nevertheless, the traffic volume may exceed the capacity of the mirror port, causing the switch to drop packets. To solve this issue, researchers proposed [
72] buffering the traffic measurements for later analysis. However, buffering is not a proper solution, because packet drops increase dramatically under high network load.
Slightly different from the above OpenFlow approach,
OpenSample [
72] is a sampling-based measurement method proposed by IBM. It uses sFlow [
73] packet samples to provide near-real-time measurements of both network load and individual flows by capturing packets from the network. OpenSample extracts the TCP sequence numbers from the captured packets, and the sFlow collector analyzes them to obtain accurate flow statistics. The Floodlight OpenFlow controller is used for the testbed. One of the main advantages of OpenSample is that it works for any TCP flow. Furthermore, it can handle heavy traffic and continuous flows without requiring any modification to the switches. These advantages make it very attractive and acceptable to global network operators. However, it is expensive and vendor-specific to deploy in an SDN.
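The use of TCP sequence numbers can be sketched as follows: two sampled packets of the same flow give the number of bytes sent between them, which yields a rate estimate. The function name and sample values are hypothetical, and the sketch assumes in-order samples with at most one sequence number wraparound between them.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def flow_rate_bps(t1: float, seq1: int, t2: float, seq2: int) -> float:
    """Estimate a TCP flow's rate in bits per second from two sampled
    packets, given their capture times (seconds) and sequence numbers."""
    if t2 <= t1:
        raise ValueError("the second sample must be later than the first")
    bytes_sent = (seq2 - seq1) % SEQ_SPACE  # handles a single wraparound
    return 8 * bytes_sent / (t2 - t1)


# Assumed example: 125,000 bytes sent in 0.5 s, i.e. 2 Mbit/s.
print(flow_rate_bps(0.0, 1_000, 0.5, 126_000))
```

Because the sequence number advances with every byte sent, even two samples per flow are enough for an estimate, whereas counting sampled packets alone would need many samples.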
Sketch-based measurement started with
Reversible Sketch [
74], which uses the flow header and hash functions to map flows into a smaller subspace and therefore requires less memory. In the first stage, it records the packet stream online using an FPGA board. In the next stage, it detects large changes in the flows. This framework is thus valuable under heavy traffic, since it uses only a small amount of memory. However, most sketch-based solutions are designed for one particular measurement task. To run several measurement tasks together, a set of algorithms must be combined to answer the desired query. Since running every algorithm individually on each packet is computationally demanding, a general framework for measurement solutions is needed.
A complete framework,
OpenSketch [
47], sketches the data in a three-stage pipeline of hashing, classification, and counting. The sketches it uses are only primitives that cannot be used directly for network measurement; they must be supplemented with additional components and operations to fully support a measurement task. The network administrator uses a measurement library and can combine a set of algorithms from it to obtain different types of measurements. OpenSketch can be used for several measurement activities, including heavy hitter detection, traffic flow measurement, and DDoS detection, bearing in mind that extensions are needed to collect the traffic measurements. A sketch is reversible if it not only stores traffic statistics but also efficiently answers queries on the measurements. When queried for a flow, the Count-Min sketch algorithm [
47] returns an estimate of the flow size. To identify heavy hitters that exceed a predetermined threshold, a Count-Min sketch can report a heavy hitter immediately by checking whether the flow of an arriving packet exceeds the threshold. However, if the threshold is not known in advance, heavy hitters must be queried for various thresholds, and all flows in the entire flow space must be queried and checked against the threshold to achieve higher accuracy in the measurement task. Overall, OpenSketch has many advantages and overcomes almost all of the other problems. Some of the essential benefits of using it are as follows:
Using hashing and only counting the flows increases the accuracy with low overhead.
It utilizes a much smaller amount of resources.
Having a measurement library for measurement reduces computational cost and time.
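To make the Count-Min sketch mentioned above concrete, here is a minimal software sketch. The class name, table dimensions, and the use of salted BLAKE2b as the hash family are choices made for this example; they are not the OpenSketch implementation, which runs its hashing and counting stages in switch hardware.

```python
import hashlib


class CountMinSketch:
    """Count-Min sketch: `depth` rows of `width` counters. Each key is
    hashed to one counter per row; the estimate is the row minimum."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # A different salt per row gives independent-looking hash functions.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def query(self, key: str) -> int:
        """Estimated count; never below the true count, possibly above it."""
        return min(
            self.rows[row][self._index(row, key)] for row in range(self.depth)
        )

    def is_heavy_hitter(self, key: str, threshold: int) -> bool:
        return self.query(key) > threshold
```

Hash collisions can only inflate a counter, so the estimate overshoots but never undershoots; taking the minimum over rows keeps the overshoot small. Checking `is_heavy_hitter` on each arriving packet corresponds to the fixed-threshold case described above.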
Deltoid [
75] updates, for every packet, the counters in each bucket, in which the flow header is encoded. This per-packet updating significantly increases the query cost and processing time, and it is not easy to recover the data from the data plane. Another solution,
FlowRadar [
76], solves this issue by mapping flows to counters through exclusive-or (XOR) operations, so that flows can be reconstructed easily by repeated XOR-ing. However, query cost and heavy computational overhead remain the basic challenges. The more recently proposed
UnivMon [
48] uses a single sketch to support different kinds of traffic measurements simultaneously. Nonetheless, it needs to update multiple components and remains computationally costly. After UnivMon,
SketchVisor [
78] proposed a new idea: a fast path. In heavy traffic scenarios, other solutions become computationally expensive and less accurate. The SketchVisor framework introduces a fast path that bypasses the normal path when network load increases. It counts only the flows on the fast path, and small flow counts can be obtained by subtracting the fast path flow counts from the total flow counts. Although sketch-based solutions are attractive for SDN traffic measurement, they need to be updated continuously by the network nodes.
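The XOR-based encoding and decoding idea attributed to FlowRadar above can be sketched as follows. This is a simplified illustration under several assumptions: flow identifiers are nonzero integers below 2^64 standing in for packet 5-tuples, a Python set replaces the Bloom filter that detects new flows, and the class and method names are hypothetical.

```python
import hashlib


class FlowRadarSketch:
    """Simplified XOR-encoded flow table with one sub-table per hash."""

    def __init__(self, cells_per_hash: int = 64, num_hashes: int = 3):
        self.m = cells_per_hash
        self.k = num_hashes
        n = self.m * self.k
        self.flow_xor = [0] * n      # XOR of the IDs of flows in the cell
        self.flow_count = [0] * n    # number of flows mapped to the cell
        self.packet_count = [0] * n  # packets of all flows in the cell
        self.seen = set()            # stand-in for the Bloom filter

    def _cells(self, flow_id: int):
        """One cell per sub-table, so a flow never hits a cell twice."""
        cells = []
        for i in range(self.k):
            digest = hashlib.blake2b(
                flow_id.to_bytes(8, "little"),
                digest_size=8,
                salt=i.to_bytes(8, "little"),
            ).digest()
            cells.append(i * self.m + int.from_bytes(digest, "little") % self.m)
        return cells

    def add_packet(self, flow_id: int) -> None:
        new_flow = flow_id not in self.seen
        self.seen.add(flow_id)
        for c in self._cells(flow_id):
            if new_flow:
                self.flow_xor[c] ^= flow_id
                self.flow_count[c] += 1
            self.packet_count[c] += 1

    def decode(self) -> dict:
        """Peel cells holding exactly one flow and remove that flow from
        its other cells by XOR-ing, until no such cell remains."""
        xor = list(self.flow_xor)
        cnt = list(self.flow_count)
        pkts = list(self.packet_count)
        decoded = {}
        progress = True
        while progress:
            progress = False
            for c in range(len(cnt)):
                if cnt[c] == 1:
                    flow_id, size = xor[c], pkts[c]
                    decoded[flow_id] = size
                    for d in self._cells(flow_id):
                        xor[d] ^= flow_id
                        cnt[d] -= 1
                        pkts[d] -= size
                    progress = True
        return decoded
```

Decoding succeeds only while the table is lightly loaded; when too many flows share cells, some flows stay undecoded, which reflects the query cost and computation concerns noted above.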
Table 3 compares the traffic measurement solutions for real-time monitoring.
From the above methods, we can summarize as follows:
All of these traffic measurement solutions concentrate on real-time network flow collection for quick decision making.
A single framework is used with a library of different measurement algorithms, so that network administrators can choose any combination of the algorithms to obtain the desired traffic measurement.
The sketch-based framework is the best method, though it needs a protocol other than OpenFlow. However, OpenFlow is now broadly accepted as an industry standard in data center settings and is increasingly being implemented.
5.4. Traffic Measurement for Specificity
In this part, we discuss some specific measurement frameworks of latency measurement, bandwidth measurement, and topology measurement. We already discussed the basic idea behind these measurement activities in the early part of the paper.
5.4.1. Latency Measurement
A latency measurement framework,
SLAM [
79], can measure the latency between any two network switches along a path by sending probe packets from one switch to the other. It estimates latency from the arrival times of the probe packets at the control plane, which is well suited to SDN. The accuracy of the latency estimate depends on the processing time at the first and last switches along the path, which varies over time. The basis of the latency estimate is the timestamp extracted from the packet, which implies that enlarging the packet header can be very useful for different SDN applications. Another latency measurement framework,
DPTH [
80], estimates the latency between two switches by sending timestamped packets from one switch to the other and directly computing the time difference, provided that the switches are synchronized. Some of the critical characteristics of DPTH are as follows:
Mitigating color bits by adding it in the header.
Adding an extra header to all packets, which appears to be costly in any case.
Different costs depending on the requirements of particular applications.
Processing delay to measure latency accurately.
Additionally,
MCPL [
81] presented a comprehensive study of latency measurement, covering aspects such as inbound latency, outbound latency, and hardware performance inconsistency. Message reception, rule addition or deletion, update time, and installation delay are all part of outbound latency. Additionally, packet timestamps can be used to obtain the timing. The latency measurement solutions illustrated above can be used under specific network application conditions.
Table 4 shows traffic measurement solutions for latency measurement.
5.4.2. Bandwidth Measurement
Available Bandwidth (ABW) [
82] is the maximum packet-forwarding rate that a new flow can obtain while sharing the path with the existing flows. Accurate bandwidth measurement in SDN can speed up network management and improve network QoS. When the controller collects data from the entire network, bandwidth measurements make it possible to diagnose and eliminate potential link problems, which also safeguards the network functions [
83].
ABW measurement is a general method for normal network conditions. To further improve ABW measurement,
SOMETIME [
84] is an active measurement method designed for wireless connectivity and applied in cellular broadband access networks to estimate ABW in an SDN environment. Because of the variety of devices in a cellular network, an accurate ABW measurement approach is expected to account for shared communication resources and the constraints this imposes. The OpenFlow protocol is exploited to easily separate the measured flows from other network traffic; the controller is the busiest module here, yet network performance is not much affected. However, the presence of control messages may introduce overhead, which can slow down the network.
Thus, ABW measurement needs to be more accurate and simpler to apply in network applications. A way to maintain QoS is provided by
MAPLE-Scheduler [
85], which is the first to measure effective bandwidth (EB) and check routing conditions. The edge and core parts of the network have different EB measurements, and deploying EB measurement at the edge of the network gives more flexibility and accuracy. However, its disadvantage is that the accuracy is not always consistent.
Table 5 shows traffic measurement solutions for bandwidth measurement.
5.4.3. Topology Discovery
OpenFlow Discovery Protocol (OFDP) [
86] discovers network links hop by hop, and the whole procedure can be simplified because all switch nodes establish a TLS session with the controller through a handshake in any case. In the initial stage, the controller collects packets from the whole network and establishes the TLS sessions. In the next stage, the controller sends discovery packets carrying the full information to the whole network. Each switch that receives such a packet forwards it to its nearest switches according to predefined rules, and the controller finally obtains the map of the full network. In addition, if any link breaks, the controller learns the new topology through synchronization. Although this is an effective way to discover the topology, its drawback is high overhead.
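The controller-side bookkeeping of such hop-by-hop discovery can be sketched as follows. Each report is assumed to be a tuple (src_switch, src_port, dst_switch, dst_port), meaning that a discovery packet sent out of src_port on src_switch came back to the controller from dst_port on dst_switch; this tuple format and the function names are assumptions of the example, not part of OFDP itself.

```python
from collections import defaultdict


def build_topology(reports):
    """Build {switch: {port: (neighbor_switch, neighbor_port)}} from
    discovery reports received by the controller."""
    adjacency = defaultdict(dict)
    for src_sw, src_port, dst_sw, dst_port in reports:
        adjacency[src_sw][src_port] = (dst_sw, dst_port)
    return dict(adjacency)


def neighbors(adjacency, switch) -> set:
    """Switches directly reachable from `switch`."""
    return {peer for peer, _ in adjacency.get(switch, {}).values()}


def remove_link(adjacency, switch, port) -> None:
    """Forget a link, for example after a port-down notification."""
    adjacency.get(switch, {}).pop(port, None)
```

Since every port of every switch needs its own discovery packet in each round, the number of controller messages grows with the total port count, which is one source of the overhead noted above.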
Moreover,
OFDP v2 [
87] improves on OFDP by limiting the number of packets that the controller has to send. The switch can rewrite the packets it receives, including the MAC address and port number, and then sends these packets out of every access port. Although the control overhead and CPU usage are reduced by almost half compared with the previous version, the constraints of OFDP remain unsolved, namely that refreshing the entire network structure takes a large amount of resources. Its advantage is good performance under high traffic.
SD-TDP [
88] is a topology discovery solution that reduces the controller’s load by breaking the entire network into smaller parts. In the initial stage, each active node (AN) in the network stays in a deactivated state and waits for the TDP request message from the controller. Once a TDP request message is received, the node status changes and a hierarchical structure is set up. The next discovery stage consists of the communication between the father nodes (FNs) and the ANs. Each FN gathers all the messages and communicates with the controller, which finally recovers the topology from the FNs. When the network topology changes, an FN automatically changes its state to AN. In this way, SD-TDP relieves the controller’s load and enhances its performance.
NetMagic [
89] is another topology measurement platform that can actively discover complete networks: the controller sends probe packets to all NetMagic devices in the network. When a NetMagic device receives packets from its neighbors, it informs the controller; the controller then collects the information from all NetMagic devices and obtains the topology of the entire network. NetMagic uses a hash function, which has the great advantage of using less onboard RAM. However, intradomain topologies can sometimes be very complex, which may lead to overhead, and the administrator does not know the current state of the network even though the controller manages the full network. Thus, the controller cannot make routing decisions in intradomain topologies.
W. Y. Huang et al. [
90] use a module called
ENVI to support a communication mechanism between the NOX and Floodlight controllers. Here, NOX sends each host interface message to ENVI, which stores all the data in a dedicated data structure. Floodlight also needs to communicate with ENVI and adds two data structures to the Floodlight controller to support this communication, which together form an intradomain map. In this way, the new data structures can be stored and processed among NOX, Floodlight, and ENVI, and NOX and Floodlight build their communication mechanism through ENVI. Nonetheless, the high complexity, low communication efficiency, and lack of high identification accuracy of this technique still need to be improved.
Table 6 shows traffic measurement solutions for topology measurement. In summary, the existing methods can be characterized as follows:
For latency measurement, the controller sends the control message, receives it back, and calculates the latency from the transmission and reception times; so, the controller is the main part of the whole measurement system.
For bandwidth measurement, end-to-end connection is used in wired and wireless networks to differentiate traffic flows and send flows to the controllers.
For topology measurement, the entire network structure is partitioned into a few units, and every node in the partitions is active to process the control message forward.
Clearly, traffic measurement in SDN is still not easy to perform. Researchers are still trying to solve the measurement problems in order to meet the minimum requirements of SDN traffic measurement solutions. By using statistical machine learning (ML) and deep learning, more accurate, real-time, and task-specific SDN traffic measurements with less resource usage and lower overhead can be achieved in complex, hierarchical, and heterogeneous next-generation 5G networks. In the next section, we discuss some ML-based SDN traffic measurements.
6. Machine Learning in Traffic Measurement
In recent years, ML has become popular in many fields, and researchers have widely used it to solve sophisticated problems in data science and networking. For example, it is used in search engines, such as Google, Bing, and Yahoo, and in web page ranking, disease analysis, and network management to discover the interests of customers and then make decisions and further analyses. Thus, ML can not only help to analyze data but can also predict future patterns by itself. It is also an emerging idea to use ML in traffic measurement to learn patterns in tasks such as routing, congestion control, QoS control, resource management, network management, fault analysis, and security management, and then predict their future trends. Especially in a wide area network, routing patterns are too complicated to be handled by an administrator. The main question at this point is whether the traditional networking system can be replaced by the modern SDN networking system, in which controllers make the decisions for network management. However, in highly heterogeneous networks, ML makes the controller more scalable and responsive when making decisions to manage the network. Many survey papers [
22,
91,
92,
93,
94,
95] have covered the application of ML to networking, and parts of them also describe how ML can be used in SDN traffic measurement.
Figure 10 gives an overview of this section.
6.1. Process of ML in Traffic Measurement
6.1.1. Data Preparation
Before applying ML in the network, we need to collect and store the packet data to be learned from, either offline or online [
96]. Various substantial offline datasets can be used, such as Cyber Risk Trust Archive [
97], Discovery in Databases Archive [
98], and Waikato Internet Traffic Storage [
99], and the passive measurement or active measurement framework [
100] can help to store further data. This dataset is then used to build a decision model that yields accurate predictive measurements. The gathered data are divided into training and test datasets; the training set is used to find the best parameters for the ML model, such as a neural network (NN). Finally, the test set is used to obtain an unbiased evaluation of the performance of the chosen model.
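The split described above can be sketched as follows. This is a minimal illustration, not a procedure taken from the cited works; the record layout, the `train_test_split` helper, and the 80/20 ratio are assumptions made for the example.

```python
import random


def train_test_split(records, test_ratio=0.2, seed=42):
    """Shuffle labeled flow records and split them into train and test sets."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_ratio)))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical flow records: (mean packet size in bytes, duration in s, label).
flows = [(60 + 10 * i, 0.1 * i, "web" if i % 2 == 0 else "video") for i in range(10)]
train_set, test_set = train_test_split(flows, test_ratio=0.2)
```

The test records are held back until the model is final, so that the reported performance is not biased by the training process.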
Figure 10.
Machine learning solutions in traffic measurement [
84,
85,
86,
96,
100,
101,
102,
103,
104,
105,
106,
107,
108,
109,
110,
111,
112,
113,
114,
115].
6.1.2. Feature Extraction
During training, we need a feature selection step, which reduces dimensionality in voluminous data and identifies discriminating features that lower computational overhead and increase the accuracy of machine learning models [
116]. Here, feature selection and extraction can be performed using different tools, for example, NetMate [
117], WEKA [
118], Python [
119], and MATLAB [
120]. Nonetheless, the extraction and selection procedures are constrained by the capabilities of the tool used. It is important to deliberately choose a set of features that strikes a balance between exploiting correlations and reducing overfitting, for higher accuracy and lower computational overhead. Then, ground truth must be established to relate formal descriptions to the classes of interest.
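As a minimal sketch of filter-style feature selection, the snippet below ranks features by the absolute Pearson correlation between each feature column and the class label and keeps the top k. The feature names and sample values are hypothetical, and correlation ranking is only one of many possible selection criteria; it is not the method of any tool cited above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_top_k(samples, labels, names, k):
    """Rank features by |correlation with the label| and keep the best k."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in samples]
        scores[name] = abs(pearson(column, labels))
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


# Hypothetical per-flow features; labels are 0 (class A) and 1 (class B).
names = ["mean_pkt_size", "duration", "constant_ttl"]
samples = [
    [100, 3.0, 64],
    [120, 1.0, 64],
    [900, 2.5, 64],
    [950, 1.5, 64],
]
labels = [0, 0, 1, 1]
selected, scores = select_top_k(samples, labels, names, k=1)
```

Here the constant feature carries no information and scores zero, which illustrates how selection removes columns that would only add computational overhead.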
6.1.3. Training and Testing
After extracting features from the dataset, we train a model on them with machine learning methods such as support vector machine (SVM), decision tree (DT), naive Bayes (NB), and so on. Moreover, we can use these models to evaluate the traffic measurement and update them each time new online data arrive. There are different techniques for labeling datasets using the features of a class. Essentially, labeling requires help from deep packet inspection (DPI) [
121,
122]. Once an ML model has been built and the ground truth has been established, it is critical to measure the performance of the model that will predict or estimate outcomes. There is no general way to identify one learning algorithm as the best model, and it is not reasonable to compare error rates alone [
18]. Performance metrics can be used to gauge different aspects of the model. In this paper, we discuss some ML techniques for traffic classification, prediction, routing, and resource management, which can be used in an SDN environment.
Figure 11 shows how ML can be used in network traffic measurement.
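To make the role of performance metrics concrete, the following sketch computes accuracy, precision, recall, and F1 for a binary classifier from its predictions. The label vectors are made-up values, and the `binary_metrics` helper is introduced here only for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical ground truth (1 = attack flow) and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

In this example the accuracy is higher than the precision and recall, which shows why a single error rate does not describe all aspects of a model, especially when the classes are imbalanced.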
6.2. Traffic Measurement Solutions Using ML
In this part, we discuss ML-based SDN traffic measurement solutions and then categorize them.
6.2.1. Traffic Classification
Today, when nearly everyone communicates over a network, the data that users send may be intercepted or damaged. Thus, many ML-based applications have been introduced to detect misuse-based attacks. In these applications, ML classifiers are used to assign the data to one of two groups: one is misuse-based, and the other is anomaly-based [
91]. ML classifiers can recognize the patterns of attacks in the large amount of data entering the network. Many past studies have used ML technology for misuse detection [
101,
102,
103,
104,
105,
123,
124,
125,
126]. Most of this work has been done offline: researchers collected data from the network and processed it on a computer using different ML tools. In offline detection, the classifiers are trained to separate attack data from normal data.
Cannady et al. [
124] introduced a very early NN-based virus detection system, which can be successful in protecting an SDN from viruses and works especially efficiently when detecting misuse data is not very complex. This work was done offline; five features were analyzed, i.e., TCP, ICMP, IP, header fields, and payload. The NN used nine (9) layers to classify the data into two categories. The number of neurons was determined through trial and error, the sigmoid function was used as the activation function of the neurons, and the system achieved 89% to 91% accuracy.
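To illustrate the sigmoid activation mentioned above, the sketch below trains a single sigmoid neuron with stochastic gradient descent to separate two classes. It is deliberately much smaller than the nine-layer network of the cited work and does not reproduce it; the two features and their values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, labels, epochs=500, lr=0.5):
    """Train one sigmoid neuron with per-sample gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the cross-entropy loss w.r.t. the pre-activation
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (attack) if the neuron output is at least 0.5, else 0 (normal)."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Hypothetical normalized features: (payload entropy, fraction of SYN packets).
samples = [[0.1, 0.1], [0.2, 0.15], [0.15, 0.2], [0.8, 0.9], [0.9, 0.8], [0.85, 0.95]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_neuron(samples, labels)
```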
6.2.2. Traffic Routing
In addition to traffic flow classification, traffic routing is also important in traffic measurement for managing traffic flows when congestion happens. It places demanding requirements on ML models, such as the ability to cope and scale with complex and dynamic network topologies, the ability to learn the correlation between the selected path and the perceived QoS, and the ability to predict the consequences of routing decisions. Thus, researchers prefer applying reinforcement learning (RL) to learn the control strategy rather than relying on human management. Several existing papers address the traffic routing problem using RL methods.
In [
106], Forster et al. used a Q-learning approach in a multicast routing protocol, called FROMS (Feedback Routing for Optimizing Multiple Sinks). The goal of FROMS is to route data efficiently, in terms of hop count, from one source to many mobile sinks in a WSN by finding the optimal shared tree. Additionally, Hu and Fei [
107] proposed QELAR, a model-based variant of the Q-routing algorithm, to provide faster convergence, route cost reduction, and energy preservation in underwater WSNs. More recently, a centralized SARSA with a softmax policy selection algorithm was applied by Lin et al. [
108] to achieve QoS-aware adaptive routing (QAR) in SDNs.
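As a rough illustration of SARSA with a softmax (Boltzmann) action selection, the sketch below learns next-hop values on a toy four-node topology in which the reward is the negative link delay. It is not the QAR algorithm of the cited work; the topology, delays, and hyperparameters are assumptions chosen for the example.

```python
import math
import random

# Hypothetical topology: node -> {next hop: link delay}. "D" is the destination.
GRAPH = {"S": {"A": 1.0, "B": 5.0}, "A": {"D": 1.0}, "B": {"D": 5.0}, "D": {}}


def softmax_choice(q, state, temperature, rng):
    """Sample a next hop with probability proportional to exp(Q / temperature)."""
    actions = list(GRAPH[state])
    weights = [math.exp(q[(state, a)] / temperature) for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def train_sarsa(episodes=300, alpha=0.5, gamma=0.9, temperature=1.0, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in GRAPH for a in GRAPH[s]}
    for _ in range(episodes):
        state = "S"
        action = softmax_choice(q, state, temperature, rng)
        while state != "D":
            reward = -GRAPH[state][action]  # lower delay means higher reward
            next_state = action
            if next_state == "D":
                next_action, target = None, reward
            else:
                # On-policy: the target uses the action actually selected next.
                next_action = softmax_choice(q, next_state, temperature, rng)
                target = reward + gamma * q[(next_state, next_action)]
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, action = next_state, next_action
    return q


def greedy_path(q, source="S", dest="D"):
    """Follow the highest-valued next hop from source to destination."""
    path = [source]
    while path[-1] != dest:
        node = path[-1]
        path.append(max(GRAPH[node], key=lambda a: q[(node, a)]))
    return path


q_values = train_sarsa()
best_path = greedy_path(q_values)
```

After training, the greedy path follows the low-delay links, while the softmax policy still allows occasional exploration of the slower route.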
6.2.3. Resource Management
Resource management in networking entails controlling the vital resources of the network and managing the different kinds of resources to keep the network stable. For programmers, it is easy to specify how the controller should act on instructions. However, management becomes more intelligent if we apply machine learning to uncover the relationship between resource measurement and management. Several methods use machine learning to deal with this problem.
Piamrat et al. [
109] proposed an admission control mechanism for wireless networks based on subjective quality of experience (QoE) perceived by end-users. This is in contrast to leveraging quantitative parameters, such as bandwidth, loss, and latency. To do so, they first chose configuration parameters, such as codec, bandwidth, loss, delay, and jitter, along with their value ranges. Then, the authors synthetically distorted a number of video samples by varying the chosen parameters. Moreover, Baldo et al. [
110] proposed an ML-based solution using MLP-NN to address the problem of user-driven admission control for VoIP communications in a WLAN. In their solution, a mobile device gathers measurements on the link congestion and the service quality of past voice calls.
Table 7 shows the comparison among all the SDN traffic measurement solutions in different categories.
6.3. Comparison of Proposed Approaches
Machine learning (ML) is being used more and more in network measurement to automate the analysis of enormous amounts of network traffic data and find trends and anomalies that may signal network faults or security risks. Among the most typical outcomes of ML procedures in network assessment are the following:
Anomaly detection: ML algorithms may be taught to detect unexpected network activity that may suggest a security breach, a misconfiguration, or a network infrastructure breakdown. Anomaly detection can assist network administrators in detecting and responding to threats or issues more rapidly.
Predictive maintenance: ML algorithms can anticipate when network components are likely to break by evaluating previous network performance data, allowing administrators to carry out preventative maintenance and reduce downtime.
Classification of network traffic: ML may be used to categorize network traffic based on application type, user behavior, or other variables. This can assist administrators in prioritizing bandwidth allocation, improving network speed, and identifying possible security risks.
Network optimization: Machine learning algorithms may evaluate network traffic patterns to identify bottlenecks or regions of congestion and then recommend modifications to increase network performance and minimize delay.
Capacity planning: ML can forecast future capacity demands and assist administrators in planning for future network development by evaluating network traffic trends and patterns.
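As a point of reference for the anomaly detection outcome listed above, the sketch below flags throughput samples whose z-score exceeds a threshold. This is a simple statistical baseline rather than a trained ML model, and the sample values and the threshold of 2.5 are assumptions for the example.

```python
import statistics


def find_anomalies(values, threshold=2.5):
    """Return the indices of samples whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical per-interval throughput samples in Mbit/s; the last one is a spike.
throughput = [100, 102, 98, 101, 99, 100, 103, 97, 100, 500]
anomalies = find_anomalies(throughput)
```

A learned detector would replace the fixed threshold with a model of normal behavior, but the interface stays the same: measurements go in and suspicious indices come out.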
One study analyzed and benchmarked several ML models applied to three distinct and disparate network measurement challenges: network attack detection, smartphone app anomaly detection, and QoE prediction in cellular networks [
132]. It covered a wide range of ML models, including supervised and semisupervised approaches, as well as ML ensembles such as bagging, boosting, and stacking. Real network measurements from operational networks were used to assess the proposed models. The results showed the superior performance of neural networks and decision trees in the analysis of network measurements from a variety of networking challenges. From
Table 8, we can see that decision-tree-based models outperform other single models in terms of accuracy and prediction.
7. Future Prospects of ML in Traffic Measurement
In the ML-based methods mentioned above, statistical machine learning algorithms uncover the patterns of routing, congestion control, QoS, and resource management to make predictions, and then help the controller manage the network in advance. To further improve performance and quality of service, other methods such as deep learning or deep reinforcement learning may help to find deep features of the packet flows. In this section, we analyze the feasibility of using different popular artificial intelligence approaches in traffic measurement tasks, and then we give the future prospects of traffic measurement in SDNs.
7.1. Statistical Machine Learning
Statistical machine learning is the typical approach to data analysis, and we described many existing ML-based traffic measurements in the last section. Here, we discuss some other methods to see whether they can be utilized in an SDN.
7.1.1. SVM
SVM is a machine learning algorithm that belongs to the family of generalized linear classifiers; it reduces generalization error by maximizing the geometric margin. Generally, in SDN traffic measurement, classifiers such as NB and DT are used to recognize the features of packet data to detect attacks. However, SVM is also a useful and convenient method for making predictions, and it is robust in a wide variety of cases. Ref. [
133] is an example that applies SVM to an SDN to classify the application traffic, and we can transfer it to other measurements as follows:
After receiving the packets data, the SVM can be used to detect whether an attack exists.
It can also monitor the resources and predict the trend of the traffic, such as whether the network will be busy in the future, to maintain QoS.
SVM is also a valuable algorithm to find the potential congestion in the network.
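The attack detection use case above can be sketched with a linear SVM trained by subgradient descent on the regularized hinge loss. This is a minimal illustration under stated assumptions, not the classifier of the cited work: the two scaled features, their values, and the hyperparameters are hypothetical, and labels are +1 (attack) and -1 (normal).

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(samples, labels, epochs=200, lr=0.1, reg=0.01):
    """Minimize the L2-regularized hinge loss with per-sample subgradient steps."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if y * (dot(w, x) + b) < 1:
                # Sample is inside the margin: move the hyperplane toward
                # classifying it correctly, and apply weight decay.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Sample is outside the margin: only the regularizer acts.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def svm_predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical features scaled to [-1, 1]: (packet rate, SYN ratio).
samples = [[-0.8, -0.6], [-0.7, -0.9], [-0.9, -0.7],
           [0.8, 0.7], [0.6, 0.9], [0.9, 0.8]]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(samples, labels)
```

The hinge loss only penalizes samples that fall inside the margin, which is what drives the separating hyperplane toward a large geometric margin.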
7.1.2. Decision Tree
Decision trees (DTs) are another statistical machine learning algorithm, which mainly depends on basic datasets. They filter out irrelevant features and build a tree that classifies samples along different branches. Thus, DTs can be used to predict the traffic trend based on the previous knowledge that is already applied to traffic classification [
117,
129]. Also using an online flow table, ref. [
134] utilized a DT to solve the Flow Table Congestion Problem (FTCP) to guarantee the quality of service. Beyond classification and prior knowledge of congestion, a decision tree can be applied in a wide range of cases, as follows:
As it generally relies on previous data to build the decision tree, we can apply it to resource management to help the controller control the resources.
It can also be used to classify the received flow for better storing or understanding.
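The core step of building a decision tree is choosing the split that best separates the classes. The sketch below searches for the single best threshold split by weighted Gini impurity; it is a generic illustration rather than the method of the cited work, and the feature names and values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(samples, labels):
    """Return (weighted impurity, feature index, threshold) of the best split."""
    best = None
    for j in range(len(samples[0])):
        values = sorted({row[j] for row in samples})
        # Candidate thresholds lie midway between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(samples, labels) if row[j] <= threshold]
            right = [y for row, y in zip(samples, labels) if row[j] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, threshold)
    return best


# Hypothetical features: (flow table occupancy in %, new flows per second).
samples = [[30, 120], [45, 300], [50, 80], [85, 150], [90, 310], [95, 100]]
labels = ["ok", "ok", "ok", "congested", "congested", "congested"]
split = best_split(samples, labels)
```

In this toy data, the occupancy feature separates the classes perfectly, while the flow arrival rate does not, so the tree would branch on occupancy first and ignore the less informative feature.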
7.1.3. K-Nearest Neighbor
K-nearest neighbor (KNN) is an instance-based supervised method: it assigns a sample to the class that is most common among its k nearest labeled neighbors, without building an explicit model as SVM and DT do. Its unsupervised counterpart, k-means clustering, divides the features into clusters without class knowledge. Ref. [
112] utilized k-means clustering and KNN to classify data; here, we can also apply these methods to the tasks of resource management and attack detection.
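A minimal sketch of nearest-neighbor classification is given below: a flow is assigned the majority label among its k closest labeled flows in feature space. The features, labels, and the choice of k = 3 are assumptions for illustration and are not taken from the cited work.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled samples."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical normalized features: (packet rate, mean packet size).
train = [
    ((0.10, 0.20), "normal"),
    ((0.20, 0.10), "normal"),
    ((0.15, 0.15), "normal"),
    ((0.90, 0.80), "attack"),
    ((0.80, 0.90), "attack"),
    ((0.85, 0.85), "attack"),
]
prediction_low = knn_predict(train, (0.2, 0.2))
prediction_high = knn_predict(train, (0.8, 0.8))
```

Because there is no training phase, new labeled flows can be added to `train` at any time, at the cost of a distance computation against every stored sample per query.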
7.1.4. Ensemble Learning
Ensemble learning builds a strong learner by combining several weak learners, such as NB and KNN, to improve the final prediction. Given several algorithms, each first analyzes the features of the data separately, and the final result is then obtained by majority vote among them. Kolomvatsos et al. [
135] developed an ensemble forecasting method to provide QoS in an SDN with its own prediction rule. However, it is worse than the original machine learning algorithm. Thus, in the future, we can apply ensemble learning to improve the results of tasks, although it may cost time and resources:
First, we can gather SVM, DT, and KNN or other methods to detect the attack among the flow data to make the final prediction.
It can also be used to classify the traffic application and vote to get the most probable one.
It can also learn the trend of the traffic flow and then decide whether there will be congestion and send instruction to the controller.
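The voting step can be sketched as follows. The three rule-based functions are hypothetical stand-ins for trained base models such as SVM, DT, and KNN; only the majority vote itself is the point of the example.

```python
from collections import Counter


def majority_vote(classifiers, sample):
    """Return the label predicted by most of the base classifiers."""
    votes = Counter(classifier(sample) for classifier in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical base learners, each looking at a different flow feature.
def rule_rate(flow):
    return "attack" if flow["pkts_per_s"] > 1000 else "normal"


def rule_size(flow):
    return "attack" if flow["mean_pkt_size"] < 80 else "normal"


def rule_syn(flow):
    return "attack" if flow["syn_ratio"] > 0.5 else "normal"


ensemble = [rule_rate, rule_size, rule_syn]
flood = {"pkts_per_s": 5000, "mean_pkt_size": 60, "syn_ratio": 0.2}
benign = {"pkts_per_s": 200, "mean_pkt_size": 60, "syn_ratio": 0.1}
flood_label = majority_vote(ensemble, flood)
benign_label = majority_vote(ensemble, benign)
```

Using an odd number of base learners avoids ties in the binary case; the extra time and resources mentioned above come from evaluating every base model for each sample.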
There are various other statistical ML algorithms that have not been mentioned above, and they can also be used for different kinds of detection or classification tasks in SDN traffic measurement.
Beyond the statistical machine learning algorithms, DL is a newer way to learn features from the collected data to make predictions [
136]. With deeper layers and more neural nodes, DL can dig out deeper representations and find relationships corresponding to the ground truth. Different kinds of neural networks can also be built to analyze various inputs, such as images, text, or sounds. Here, we take a look at DL networks to see how we can use them in SDN traffic measurement [
137].
7.2. Deep Learning
A recent spike in interest in DL can be explained by the fact that it has been shown to be effective in the resolution of complicated issues in a variety of fields, including networking, robotics, computer vision, and speech recognition. Several DL applications have been implemented because of the need to analyze network traffic in SDNs. In the following section, we describe relevant studies that have made use of DL in traffic classification.
7.2.1. Multilayer Perceptron
Multilayer Perceptron (MLP) is the basic deep learning architecture, which consists of several layers, each with a number of neurons. In addition to the input and output layers, even a single hidden layer can fit many kinds of nonlinear functions, and it is suitable for classification tasks on text data. Moreover, MLP can perform better than statistical machine learning algorithms due to its greater flexibility and adaptability, as well as its ability to build deep architectures to find deep features. To the best of our knowledge, there is no paper about MLP-based SDN traffic measurement; MLP could perhaps be used on such tasks as follows:
As with the ML methods above, we can apply MLP to traffic classification and resource management.
MLP can also be used to detect attacks among the received data.
It can predict the congestion and traffic trend in the future.
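To show what a hidden layer adds, the sketch below runs the forward pass of a tiny MLP with one hidden ReLU layer. The weights are set by hand, not trained, so that the network outputs 1 only when exactly one of two binary indicators is set, a pattern that no single linear unit can represent. The indicator interpretation is a hypothetical example.

```python
def relu(vector):
    return [max(0.0, v) for v in vector]


def dense(x, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


# Hand-set parameters (an assumption for illustration, not learned values).
W1 = [[1.0, 1.0], [1.0, 1.0]]   # hidden layer weights, 2 inputs -> 2 neurons
B1 = [0.0, -1.0]                # hidden layer biases
W2 = [[1.0, -2.0]]              # output layer weights, 2 hidden -> 1 output
B2 = [0.0]                      # output layer bias


def mlp_forward(x):
    hidden = relu(dense(x, W1, B1))
    return dense(hidden, W2, B2)[0]


# Inputs: two hypothetical binary indicators, e.g. (high rate, known source).
outputs = [mlp_forward(x) for x in ([0, 0], [1, 0], [0, 1], [1, 1])]
```

In practice the weights would be learned by backpropagation from labeled traffic data; the forward pass itself stays the same.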
7.2.2. Recurrent Neural Network
A recurrent neural network (RNN) is an extension of a conventional feed-forward neural network (such as MLP) which makes use of sequential information. An RNN performs the same computation for every element of a sequence, and its output depends on the previous computations. It can retain a memory of prior inputs and forget the irrelevant parts. Ref. [
136] developed a gated recurrent units (GRU) approach to provide SDNs with an intrusion detection system (IDS), which can detect attacks and achieve higher performance than the ML methods. Thus, we can apply RNN with big data analytics in SDN as follows:
With a short-term memory of the observed data, RNNs can take charge of resource management by learning from the controller instructions.
Long short-term memory (LSTM), one of the RNN algorithms, can be used to learn long-term dependencies in the data flow to classify traffic or detect attacks.
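The recurrence itself can be sketched with a plain (Elman-style) RNN cell that has a scalar hidden state. This is intentionally simpler than the gated GRU and LSTM cells mentioned above, and the weights and input sequences are hypothetical.

```python
import math


def rnn_final_state(sequence, w_x=1.0, w_h=0.5, bias=0.0):
    """Apply the same update to every element and return the last hidden state."""
    hidden = 0.0
    for x in sequence:
        # The new state mixes the current input with the previous state.
        hidden = math.tanh(w_x * x + w_h * hidden + bias)
    return hidden


# Two hypothetical traffic sequences with the same burst at different times.
early_burst = rnn_final_state([1.0, 0.0, 0.0])
late_burst = rnn_final_state([0.0, 0.0, 1.0])
```

Both sequences contain the same values, but the final states differ because the memory of the early burst has partly decayed. This order sensitivity is what a feed-forward network over unordered features lacks.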
7.2.3. Convolutional Neural Network
Different from the deep learning methods above, a convolutional neural network (CNN) is characterized by its convolution layers, which can extract patterns from grid-like data such as images. In general, a CNN is mainly used for classification on image-based datasets, and it can extract more in-depth representations with more convolution layers through the kernel filters. In SDN traffic measurement, ref. [
138] applied a CNN to controllers to choose the best path combination for packet forwarding in switches, where the input image is composed by the switches in time intervals. Inspired by this, we can use CNNs to handle many graph-based tasks as follows:
With the collected data at each specific time, we can gather the information in the spatial dimension and build up array data. Then, we can utilize a CNN to extract features from these array data to find the relationships.
We can also create two-dimensional arrays of encoded features over time steps, so that the CNN learns the temporal information from the columns of the array.
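The array-based idea above can be sketched with a single convolution filter applied to a small traffic matrix whose rows are switches and whose columns are time steps. The matrix values and the kernel are hypothetical; as in most CNN libraries, the operation is implemented as a cross-correlation without kernel flipping.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding ('valid' mode)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_rows) for dj in range(k_cols))
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


# Hypothetical load per switch (rows) over four time steps (columns).
traffic = [
    [1, 1, 5, 5],
    [2, 2, 2, 2],
    [0, 3, 3, 0],
]
# This kernel responds to a change between consecutive time steps.
rise_kernel = [[-1, 1]]
feature_map = conv2d_valid(traffic, rise_kernel)
```

Positive entries of the feature map mark a load increase and negative entries a decrease; a trained CNN would learn many such kernels instead of using a hand-picked one.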
Thus, DL is an efficient way to make classifications and predictions in traffic measurement problems and help the network gain prior knowledge.
7.3. Deep Reinforcement Learning
Different from the feature analysis with ML or DL, RL is concerned with how software agents should take actions at each time step in an environment to maximize some notion of cumulative reward. In more detail, RL trains the agents to choose the best action in each step so as to obtain a high score at the end. The learning process does not need any features; rather, it needs to analyze the state in each step. To further improve the performance of RL [
140], DRL adopts deep learning architecture [
139] and builds a deeper neural network to dig out the deep representation to find the relationship between the actions and states. Here, we consider whether DRL can be used in traffic measurement.
7.3.1. Deep Q Network
Q-learning is a model-free algorithm using delayed rewards that interacts with the environment through perceptions and actions. It builds up a Q-table to store the estimated value (expected cumulative reward) of each action in the corresponding state, and many SDN measurement tasks use Q-learning to deal with congestion and multiple control problems [
141,
142,
143]. However, with the rapid increase in data dimension, it is challenging to build a Q-table large enough to remember the experience. Thus, a Deep Q Network (DQN) replaces the Q-table with a neural network whose weights and biases learn low-dimensional features of the high-dimensional data. Recently, [
144] utilized DQN to consider the features of blockchain nodes and controllers jointly, and we can transfer this method into other applications as follows:
When receiving attack flows, DQNs can enhance network stability by limiting the attack flows and retaining normal communication between the users.
With the congestion problem, DQNs can help find the best way to divert flows.
Although DQNs cannot make classifications, they still can manage a variety of traffic to enhance the interaction.
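For reference, the tabular Q-learning update that a DQN approximates can be sketched in a few lines. The states, actions, reward, and initial values below are hypothetical, and a DQN would replace the dictionary lookup with the output of a neural network trained toward the same target.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    next_values = q_table[next_state].values()
    best_next = max(next_values) if next_values else 0.0
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical two-state congestion problem with two controller actions.
q_table = {
    "congested": {"reroute": 0.0, "wait": 0.0},
    "clear": {"reroute": 0.0, "wait": 2.0},
}
new_value = q_update(q_table, "congested", "reroute", reward=1.0, next_state="clear")
```

With these numbers the target is 1.0 + 0.9 * 2.0 = 2.8, so the stored value moves halfway from 0.0 toward it. The table needs one entry per state and action pair, which is why it becomes impractical for high-dimensional network states.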
7.3.2. Deep Deterministic Policy Gradient
Policy gradient is another reinforcement learning algorithm, which aims to find the best policy for choosing an action at each time step. It can perform better than Q-learning in that it is robust to policy degradation. Additionally, Deep Deterministic Policy Gradient (DDPG) improves on policy gradient by applying a deep neural network to dig out high-dimensional features to decide the best action in each step. To the best of our knowledge, no one has utilized DDPG in SDN traffic measurement; it may improve performance on the same tasks as DQNs.
Thus, DRL is suitable for managing SDNs and helping the controller take the proper actions in different tasks. In short, the three families of artificial intelligence algorithms above may make progress in traffic measurement and help to improve network performance, as summarized in
Figure 12.