2.1. Software-Defined Networking (SDN)
SDN is a modern networking approach that is flexible, manageable, cost-effective, and adaptable. Its objectives are to simplify network management, promote innovation, improve network resource utilization, and optimize network performance, which makes it well suited to high-bandwidth, dynamic applications. In large networks with hundreds of switches and routers, network management is difficult and time-consuming. SDN therefore seeks to improve how networks are built, developed, and maintained. The core concept of SDN is to logically centralize network control in an SDN controller, which controls and monitors network behavior [8].
Data traffic is increasing exponentially as a result of the rapid growth of intelligent devices and network technologies. Networks are becoming more heterogeneous and complex in order to manage larger numbers of devices and optimize traffic distribution. One possible way to address these issues is to increase network intelligence, and SDN makes it easier to apply learning-based techniques to the network [9]. Traditional methods of configuring, optimizing, and troubleshooting computer networks may be neither sufficient nor efficient given the size, heterogeneity, and complexity of current and future networks [10]. With the rising prevalence of smart programmable devices in networks, SDN offers security, energy efficiency, and network virtualization to enhance network performance [11]. Although SDN provides many benefits, such as scalability, agility, and ease of administration, security is one of the most important aspects to consider when deploying it [12]. The benefits of SDN and the reasons for its rapid adoption are depicted in Figure 1.
Because the SDN architecture is centralized and programmable and can collect real-time network data at the controller, it is feasible to use “intelligence” (ML techniques) for efficient routing and QoS provisioning [9]. By exploiting SDN’s programmability, ML can be used to provide real-time QoS solutions. SDN separates the control and forwarding functions of a network, which makes network control programmable and abstracts the underlying infrastructure from applications and network services. This design makes SDN a suitable choice for current and future networking needs. The SDN architecture consists of three main layers: (a) the application layer, (b) the control layer, and (c) the infrastructure layer. Each layer performs specific functions and communicates with the others through interfaces [13].
2.1.1. Infrastructure Layer
The infrastructure layer includes all physical components of a network, such as switches, routers, Open vSwitch (OVS) instances, and wireless access points. The main responsibilities of these devices are to receive client requests and to send the relevant information to the control layer.
2.1.2. Control Layer
The controller manages the flow of data in the network. Through the southbound APIs, the controller configures the forwarding devices, steers traffic through the data plane according to forwarding policies, and determines the best network path. In essence, the controller provides a centralized view of the network for better management and performance tuning, and it offers network abstraction for high-level network functionality. Examples of controllers include ONOS, OpenDaylight, Floodlight, Beacon, Ryu, and POX [14,15]. In particular, the southbound interface OpenFlow enables communication between the control layer and the infrastructure layer. Hu et al. [16] analyzed SDN controller performance in terms of throughput, response time, and similar metrics. To evaluate performance with respect to variables such as RTT, jitter, and delay, the authors used the Mininet emulator with a Ryu controller.
2.1.3. Application Layer
This layer hosts end-user applications such as network monitors, traffic delay measurement, traffic classifiers, and load balancers. The network monitor tracks incoming traffic and counts bytes, the traffic classifier uses ML to classify incoming traffic in real time, and the load balancer reroutes incoming traffic to the chosen controller based on the nature of the traffic. Separating the control and data planes in SDN enables effective domain-wide traffic routing and management. The controllers in the control plane are responsible for programming the forwarding devices, whereas the application plane, the top layer, is responsible for network programming and policy enforcement [1].
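To illustrate the byte-counting network monitor described above, the following minimal Python sketch converts cumulative byte counters, such as those a monitor might obtain by periodically polling flow statistics, into per-flow bit rates. The class name `ThroughputMonitor` and the polling model are hypothetical; a real monitor would read the counters through its controller’s statistics interface.

```python
from typing import Dict, Tuple


class ThroughputMonitor:
    """Hypothetical monitor: turns cumulative byte counters into bit rates."""

    def __init__(self) -> None:
        # Last observation per flow: (timestamp in seconds, cumulative byte count).
        self._last: Dict[str, Tuple[float, int]] = {}

    def update(self, flow_id: str, timestamp: float, byte_count: int) -> float:
        """Record a counter sample and return the rate (bit/s) since the previous one."""
        prev = self._last.get(flow_id)
        self._last[flow_id] = (timestamp, byte_count)
        if prev is None or timestamp <= prev[0]:
            return 0.0  # first sample, or no elapsed time: no rate yet
        elapsed = timestamp - prev[0]
        return (byte_count - prev[1]) * 8 / elapsed


monitor = ThroughputMonitor()
first = monitor.update("flow-1", 0.0, 0)
rate = monitor.update("flow-1", 2.0, 250_000)
print(rate)  # 1000000.0
```

Working from counter deltas rather than raw counters means the monitor needs no per-packet processing; only periodic samples are required.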
2.2. OpenFlow
OpenFlow (OF) enables the transmission of routing decisions and rules, such as those for load balancing, security, and quality of service (QoS), from the SDN controller to devices in the data plane. It is a widely used southbound API protocol for communication between network devices and the SDN controller. The three primary components of an OpenFlow network architecture are the OpenFlow controller, the OpenFlow protocol, and the OpenFlow switch [17]. OpenFlow switches contain one or more flow tables of flow entries, which the controller populates. Each flow entry includes match fields and actions. The match fields consist of packet header data, such as source and destination IP and MAC addresses, port numbers, and other details, while the actions specify how matching packets are processed. In addition to basic flow data such as IP addresses and port numbers, OpenFlow flow tables also provide statistical features, including packet and byte counts and flow duration [18].
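The flow-table structure just described (match fields, actions, and per-entry counters) can be sketched as a small Python data model. This is an illustrative simplification, not the OpenFlow wire format: the field names `ip_dst` and `tcp_dst` are hypothetical stand-ins for OpenFlow match fields, an omitted field acts as a wildcard, and the duration counter is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlowEntry:
    # Match fields: header values the packet must carry; omitted fields are wildcards.
    match: Dict[str, object]
    # Actions applied to matching packets, e.g. ["output:2"] or ["drop"].
    actions: List[str]
    priority: int = 0
    # Per-entry statistics counters.
    packet_count: int = 0
    byte_count: int = 0

    def matches(self, packet: Dict[str, object]) -> bool:
        return all(packet.get(name) == value for name, value in self.match.items())


@dataclass
class FlowTable:
    entries: List[FlowEntry] = field(default_factory=list)

    def add(self, entry: FlowEntry) -> None:
        self.entries.append(entry)
        # Highest priority first, so the first match found is the one applied.
        self.entries.sort(key=lambda e: e.priority, reverse=True)

    def lookup(self, packet: Dict[str, object], size: int) -> Optional[FlowEntry]:
        for entry in self.entries:
            if entry.matches(packet):
                entry.packet_count += 1
                entry.byte_count += size
                return entry
        return None  # table miss: no entry matches


table = FlowTable()
table.add(FlowEntry(match={"ip_dst": "10.0.0.2", "tcp_dst": 80}, actions=["output:2"], priority=10))
table.add(FlowEntry(match={"ip_dst": "10.0.0.3"}, actions=["drop"], priority=5))
hit = table.lookup({"ip_src": "10.0.0.1", "ip_dst": "10.0.0.2", "tcp_dst": 80}, size=1500)
miss = table.lookup({"ip_src": "10.0.0.1", "ip_dst": "10.0.0.9", "tcp_dst": 22}, size=64)
print(hit.actions, hit.packet_count, hit.byte_count)  # ['output:2'] 1 1500
print(miss)  # None
```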
When a packet arrives at an OpenFlow switch, the switch looks up the matching instruction (for example, drop or forward the packet) in its flow table. If the flow table contains no matching entry, the switch sends the packet to the controller; this is known as a packet-in. The controller chooses the appropriate instruction based on the packet’s payload, header information, and statistics, and then sends the packet back to the switch, which is known as a packet-out. Meanwhile, the flow table is updated, and subsequent packets of the same flow are handled according to the new instruction.
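The table-miss, packet-in, and packet-out cycle can be simulated in a few lines of Python. This sketch assumes a hypothetical MAC-learning policy that decides using Ethernet addresses only (not payload or statistics), and it models messages as method calls rather than real OpenFlow messages; in OpenFlow, the rule-installation step corresponds to a flow-modification (Flow-Mod) message.

```python
from typing import Dict, Tuple

FlowKey = Tuple[str, str]  # simplified exact-match key: (source MAC, destination MAC)


class Controller:
    """Hypothetical MAC-learning controller that handles table misses."""

    def __init__(self) -> None:
        self.mac_to_port: Dict[str, int] = {}
        self.packet_in_count = 0

    def packet_in(self, switch: "Switch", pkt: Dict[str, str], in_port: int) -> str:
        self.packet_in_count += 1
        self.mac_to_port[pkt["eth_src"]] = in_port  # learn where the sender is attached
        out_port = self.mac_to_port.get(pkt["eth_dst"])
        if out_port is None:
            return "flood"  # destination unknown: no rule is installed yet
        action = f"output:{out_port}"
        # Install a flow entry so later packets of this flow stay in the data plane.
        switch.flow_table[(pkt["eth_src"], pkt["eth_dst"])] = action
        return action  # returned with the packet, as in a packet-out


class Switch:
    def __init__(self, controller: Controller) -> None:
        self.controller = controller
        self.flow_table: Dict[FlowKey, str] = {}

    def receive(self, pkt: Dict[str, str], in_port: int) -> str:
        key = (pkt["eth_src"], pkt["eth_dst"])
        if key in self.flow_table:  # table hit: apply the stored action
            return self.flow_table[key]
        return self.controller.packet_in(self, pkt, in_port)  # table miss: packet-in


ctrl = Controller()
sw = Switch(ctrl)
a_to_b = {"eth_src": "A", "eth_dst": "B"}
b_to_a = {"eth_src": "B", "eth_dst": "A"}
r1 = sw.receive(a_to_b, in_port=1)  # B's location is still unknown
r2 = sw.receive(b_to_a, in_port=2)  # A is known; rule for B->A installed
r3 = sw.receive(a_to_b, in_port=1)  # B is now known; rule for A->B installed
r4 = sw.receive(a_to_b, in_port=1)  # served from the flow table, no packet-in
print(r1, r2, r3, r4, ctrl.packet_in_count)  # flood output:1 output:2 output:2 3
```

Only the first packets of each flow reach the controller; the fourth packet is handled entirely by the switch, which is what keeps the controller off the fast path.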
For load balancing, the architecture provides a global view of the network and continuous evaluation of the load on individual links, which makes it simple to modify forwarding rules so that the load is shared evenly. OpenFlow networks are based on treating all traffic as flows: different flows have different entries in a flow table, and different rules can be defined for each entry. More specifically, the OpenFlow protocol facilitates information exchange between network devices in the forwarding plane and the SDN controller.
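A controller with this global view of link loads can, for example, route each new flow over the candidate path whose most loaded link is least loaded. The Python sketch below illustrates that idea under stated assumptions: the topology, the load values in Mbit/s, and the bottleneck-minimizing rule are hypothetical choices for illustration, not a method taken from the cited works.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def bottleneck_load(path: List[str], link_load: Dict[Link, int]) -> int:
    """Load of the most heavily loaded link along a path (in Mbit/s)."""
    return max(link_load[(u, v)] for u, v in zip(path, path[1:]))


def pick_path(candidates: List[List[str]], link_load: Dict[Link, int]) -> List[str]:
    """Choose the candidate path whose bottleneck link is least loaded."""
    return min(candidates, key=lambda p: bottleneck_load(p, link_load))


def place_flow(path: List[str], rate: int, link_load: Dict[Link, int]) -> None:
    """Update the global load view after routing a flow of `rate` Mbit/s on `path`."""
    for u, v in zip(path, path[1:]):
        link_load[(u, v)] += rate


link_load = {("s1", "s2"): 600, ("s2", "s4"): 200, ("s1", "s3"): 300, ("s3", "s4"): 400}
paths = [["s1", "s2", "s4"], ["s1", "s3", "s4"]]

first = pick_path(paths, link_load)   # bottlenecks 600 vs 400: via s3
place_flow(first, 300, link_load)     # s1-s3 becomes 600, s3-s4 becomes 700
second = pick_path(paths, link_load)  # bottlenecks 600 vs 700: via s2
print(first, second)  # ['s1', 's3', 's4'] ['s1', 's2', 's4']
```

Because the controller updates its load view after each placement, consecutive flows alternate between paths instead of piling onto one.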
2.5. Load Balancing in SDN
The primary goal of load balancing is to prevent large variations in system load over extended periods, thereby improving system efficiency and the user experience. Given rising demand and resource depletion, the load balancing problem must be addressed to manage incoming traffic and resources effectively and to enhance network performance. In SDN, the controller is responsible for balancing the load to improve quality of service (QoS), which is one of the most crucial concerns [26].
Load balancing is a technique for allocating load across network components or processors to maximize network performance and improve QoS [27]. It helps to anticipate and avoid bottlenecks before they occur. Its objectives include increasing throughput, reducing response time, and optimizing traffic distribution. Because the SDN controller has a global view of the network, SDN load-balancing techniques can be more precise and achieve better performance. Owing to its industrial relevance, load balancing is one of the most important topics in SDN research. Neghabi et al. [28] presented a thorough review of the load balancing strategies used in SDN, categorized into two types: deterministic and non-deterministic. Round Robin, Weighted Round Robin, least connections, weighted least connections, and random load balancing approaches were demonstrated and evaluated on a single-switch architecture in [29].
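The five classic strategies named above differ only in how they pick the next server. The minimal Python sketch below shows each one as a `pick` method over a map of active connection counts; the class names, the weights, and the naive weighted round-robin expansion are illustrative assumptions, not the implementation evaluated in [29].

```python
import itertools
import random
from typing import Dict, List

Conns = Dict[str, int]  # server -> number of active connections


class RoundRobin:
    def __init__(self, servers: List[str]) -> None:
        self._cycle = itertools.cycle(servers)

    def pick(self, conns: Conns) -> str:
        return next(self._cycle)  # ignores current load


class WeightedRoundRobin:
    """Naive WRR: each server appears in the cycle `weight` times."""

    def __init__(self, weights: Dict[str, int]) -> None:
        self._cycle = itertools.cycle([s for s, w in weights.items() for _ in range(w)])

    def pick(self, conns: Conns) -> str:
        return next(self._cycle)


class LeastConnections:
    def pick(self, conns: Conns) -> str:
        return min(conns, key=conns.__getitem__)


class WeightedLeastConnections:
    def __init__(self, weights: Dict[str, int]) -> None:
        self.weights = weights  # higher weight = more capacity

    def pick(self, conns: Conns) -> str:
        # Active connections per unit of capacity.
        return min(conns, key=lambda s: conns[s] / self.weights[s])


class RandomChoice:
    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def pick(self, conns: Conns) -> str:
        return self._rng.choice(sorted(conns))


conns = {"s1": 3, "s2": 1, "s3": 2}
rr_picks = [RoundRobin(["s1", "s2", "s3"]).pick(conns)]
rr = RoundRobin(["s1", "s2", "s3"])
rr_picks = [rr.pick(conns) for _ in range(4)]
wrr = WeightedRoundRobin({"s1": 2, "s2": 1})
wrr_picks = [wrr.pick(conns) for _ in range(4)]
lc = LeastConnections().pick(conns)
wlc = WeightedLeastConnections({"s1": 4, "s2": 1, "s3": 1}).pick(conns)
print(rr_picks)   # ['s1', 's2', 's3', 's1']
print(wrr_picks)  # ['s1', 's1', 's2', 's1']
print(lc, wlc)    # s2 s1
```

Round Robin variants are stateless with respect to load, whereas the least-connections variants need up-to-date connection counts, which an SDN controller can obtain from its global view.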
The objectives of load balancing are to improve end-to-end QoS metrics, maximize throughput, minimize response time, avoid bottlenecks, reduce transmission latency, and optimize resource use [30]. Researchers have used several load balancing metrics, including packet loss, transmission hop count, root mean squared error, utilization ratio, throughput, response time, end-to-end latency, and traffic type. Based on a review of the state of the art and of research on various controllers, controller load balancing can be divided into logically centralized and physically distributed approaches. Physically distributed controller load balancing is further subdivided into hierarchical, horizontal (flat), and virtualization-based approaches.
Ahmad et al. [31] identified the variety of SDN controllers and the research challenges in different SDN control plane architectures, including distributed, hybrid, and centralized control planes. They assessed the best-known SDN controllers using four performance metrics: scalability, consistency, reliability, and security. Issues such as interoperability, consistency, network partitioning, controller placement and load balancing, and security were raised for distributed control plane architectures.
Ali et al. [32] propose ESCALB, an efficient slave-controller-allocation-based load balancing approach for multi-domain SDN-enabled IoT networks, which aims to efficiently migrate switches to controllers with idle resources.
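The general idea of migrating a switch from an overloaded controller to one with idle resources can be sketched as a single greedy step. This is not ESCALB: its actual selection rules are not described here, so the load model (requests per second per switch), the 0.8 overload ratio, and the heaviest-switch-first heuristic are all hypothetical assumptions.

```python
from typing import Dict, Optional, Tuple


def plan_migration(
    switch_load: Dict[str, int],   # switch -> request rate it generates (e.g. packet-in/s)
    assignment: Dict[str, str],    # switch -> controller currently managing it
    capacity: Dict[str, int],      # controller -> request rate it can handle
    threshold: float = 0.8,        # hypothetical overload ratio
) -> Optional[Tuple[str, str, str]]:
    """Return (switch, source, target) for one migration, or None if none is needed."""
    load = {c: 0 for c in capacity}
    for sw, c in assignment.items():
        load[c] += switch_load[sw]

    overloaded = [c for c in capacity if load[c] / capacity[c] > threshold]
    others = [c for c in capacity if c not in overloaded]
    if not overloaded or not others:
        return None
    src = max(overloaded, key=lambda c: load[c] / capacity[c])
    # Target: the controller with the most idle (unused) capacity.
    dst = max(others, key=lambda c: capacity[c] - load[c])

    # Try the heaviest switch first, as long as it does not overload the target.
    for sw in sorted((s for s, c in assignment.items() if c == src),
                     key=switch_load.__getitem__, reverse=True):
        if (load[dst] + switch_load[sw]) / capacity[dst] <= threshold:
            return sw, src, dst
    return None


switch_load = {"s1": 500, "s2": 400, "s3": 200}
assignment = {"s1": "c1", "s2": "c1", "s3": "c2"}
move = plan_migration(switch_load, assignment, {"c1": 1000, "c2": 1000})
idle = plan_migration(switch_load, assignment, {"c1": 2000, "c2": 1000})
print(move)  # ('s1', 'c1', 'c2')
print(idle)  # None
```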
Kumari et al. [33] and Li et al. [34] surveyed load balancing in both the data plane and the control plane, with a focus on SDN in data centers. Load balancing has been used to distribute network resources efficiently in order to improve both quality of service (QoS) and overall network performance, and it has proven to be essential.
Liu et al. [35] propose bidirectional switch migration based on load prediction (BSM-LP), which avoids unnecessary switch migrations by accurately predicting controller loads from historical load data with an ATT-GRU model. Their BSM algorithm improves migration efficiency while preventing overload of the target controller, and a second algorithm identifies and integrates isolated nodes to reduce how often they occur.
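The benefit of prediction, namely not migrating on a short-lived spike, can be shown with a much simpler predictor. The sketch below deliberately substitutes exponential smoothing for the ATT-GRU model used by BSM-LP; the smoothing factor, the 0.8 overload ratio, and the rule “migrate only if overloaded now and forecast to remain overloaded” are hypothetical assumptions.

```python
from typing import List


def smoothed_forecast(history: List[float], alpha: float = 0.5) -> float:
    """One-step-ahead load forecast by exponential smoothing.

    A deliberately simple stand-in for a learned predictor such as ATT-GRU.
    """
    if not history:
        raise ValueError("history must not be empty")
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def should_migrate(history: List[float], capacity: float,
                   threshold: float = 0.8, alpha: float = 0.5) -> bool:
    """Migrate only if the controller is overloaded now AND is forecast to stay so."""
    current = history[-1] / capacity
    predicted = smoothed_forecast(history, alpha) / capacity
    return current > threshold and predicted > threshold


spike = [300, 320, 310, 900]       # brief burst: forecast is 605
sustained = [850, 880, 900, 920]   # persistent overload: forecast is 901.25
print(should_migrate(spike, capacity=1000))      # False
print(should_migrate(sustained, capacity=1000))  # True
```

A threshold on the current load alone would trigger a migration for both histories; adding the forecast suppresses the needless migration caused by the spike.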
In our previous study [3], we highlighted the need to develop an intelligent load distribution strategy that measures the load statistics of multiple controllers, and to optimize the number and location of controllers, in order to achieve balanced controller load distribution throughout the ISP/Telco network.
2.6. Related Works
Using a global view of the network, the SDN controller decides how data are forwarded. As a result, it can manage network traffic “on demand” to meet application requirements. Traffic management is essential for ensuring the quality of service (QoS) of real-time business traffic, which is sensitive to packet loss and delay. Therefore, many researchers have focused on designing dynamic data-forwarding and path-planning algorithms that satisfy end-user requirements.
Network traffic classification is currently an active research topic in computer science. Understanding the variety of applications in use on a network is a crucial task for internet service providers (ISPs). Techniques for classifying internet traffic include port-based, payload-based, and ML-based approaches. The ML approach is currently the most commonly used; it has achieved highly accurate results and has been adopted by numerous researchers.
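Port-based classification, the simplest of the three approaches, can be sketched as a lookup of well-known port numbers. The mapping below covers only a small subset of IANA assignments and is illustrative; its fallback case shows why the method breaks down for applications that use dynamic or non-standard ports, which is one motivation for ML-based classifiers.

```python
# A small subset of IANA well-known ports mapped to application labels.
WELL_KNOWN_PORTS = {22: "ssh", 25: "smtp", 53: "dns", 80: "http", 443: "https"}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Port-based classification: label a flow by whichever endpoint uses a known port."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"  # dynamic or non-standard ports defeat this method


print(classify_by_port(51514, 443))   # https
print(classify_by_port(443, 51514))   # https (server-to-client direction)
print(classify_by_port(50000, 6881))  # unknown
```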
Shafiq et al. [36] discuss network traffic classification approaches step by step, using Support Vector Machine, C4.5 decision tree, Naive Bayes, and Bayes Net classifiers. According to their experimental results, the C4.5 classifier achieves notably high accuracy compared with the other classifiers.
Qazi et al. [37] introduce Atlas, a framework that uses ML-based classification techniques to incorporate fine-grained application awareness into SDN. The approach can classify network traffic in real time because it collects only the sizes of the first N packets of a network flow. Atlas was deployed on a wireless network and collects network traffic from devices running the Android OS.
Gasmelseed et al. [5] propose a mechanism using distributed controllers that splits traffic into TCP and UDP and employs a failover strategy to provide a high-availability environment that ensures reliability.
Amaral et al. [38] use three supervised classification techniques: Random Forest (RF), Stochastic Gradient Boosting (SGB), and Extreme Gradient Boosting (XGBoost). They deployed the platform in an enterprise network and collected traffic from the following applications: BitTorrent, Dropbox, Facebook, HTTP, LinkedIn, Skype, Vimeo, and YouTube. The data were collected through the OpenFlow protocol, and the features included packet size, packet inter-arrival time, source and destination IP/MAC/port, flow length, byte count, and packet count for the first five packets.
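Statistical features of this kind can be computed from the first few packets of a flow. The Python sketch below builds a fixed-length numeric vector from packet sizes, inter-arrival times, total bytes, and packet count; the input format of (arrival time, size) pairs and the zero-padding of short flows are assumptions, and the categorical IP/MAC/port fields are omitted for brevity.

```python
from typing import List, Tuple


def flow_features(packets: List[Tuple[float, int]], n: int = 5) -> List[float]:
    """Fixed-length feature vector from the first n packets of a flow.

    `packets` holds (arrival time in seconds, size in bytes) pairs in arrival order.
    Layout: n sizes, n-1 inter-arrival times, total bytes, packet count.
    """
    first = packets[:n]
    sizes = [float(size) for _, size in first]
    times = [t for t, _ in first]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    # Zero-pad short flows so that every flow yields a vector of the same length.
    sizes += [0.0] * (n - len(sizes))
    gaps += [0.0] * (n - 1 - len(gaps))
    return sizes + gaps + [sum(sizes), float(len(first))]


flow = [(0.0, 60), (0.5, 1500), (1.0, 1500)]
vec = flow_features(flow)
print(len(vec))  # 11
print(vec)  # [60.0, 1500.0, 1500.0, 0.0, 0.0, 0.5, 0.5, 0.0, 0.0, 3060.0, 3.0]
```

A fixed vector length matters because most supervised classifiers, including the tree ensembles mentioned above, expect every sample to have the same number of features.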
Raikar et al. [39] propose a supervised learning model for classifying data traffic in the SDN context. Three models, SVM, nearest centroid, and Naive Bayes, are used to categorize data traffic by application on an SDN platform. All three supervised learning models achieve a traffic classification accuracy above 90%.
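Of the three models, nearest centroid is the simplest to illustrate: training averages the feature vectors of each class, and prediction assigns the class with the closest average. The following Python sketch is a generic implementation of that technique, not the authors’ code; the two features, their values, and the class labels are hypothetical and assumed to be already normalized, since unscaled features would let the largest-valued feature dominate the distance.

```python
import math
from typing import Dict, List, Sequence


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[str]) -> Dict[str, List[float]]:
    """Nearest-centroid training: average the feature vectors of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical normalized features: (mean packet size, mean inter-arrival time).
X = [[0.10, 0.02], [0.12, 0.03], [0.90, 0.40], [0.85, 0.50]]
y = ["voip", "voip", "video", "video"]
centroids = fit_centroids(X, y)
print(predict(centroids, [0.15, 0.05]))  # voip
print(predict(centroids, [0.80, 0.45]))  # video
```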
Amiri et al. [40] describe a technique for improving the quality of service (QoS) of traffic flows in a cloud data center using SDN and ML-based traffic classification. The proposed real-time traffic classification module uses ML techniques to improve classification accuracy. Compared with Equal Cost Multi-Path (ECMP), the proposed optimization method improves bandwidth utilization. This improvement may be important for ensuring that users’ interactions with cloud data centers have sufficient quality of experience (QoE).
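For reference, the ECMP baseline typically hashes a flow’s 5-tuple to choose among equal-cost next hops, so every packet of a flow takes the same path regardless of the flow’s size or class. The Python sketch below shows that mechanism under stated assumptions: the SHA-256 hash, the field encoding, and the next-hop names are illustrative, since real switches use their own hardware hash functions.

```python
import hashlib
from typing import List, Tuple

# (source IP, destination IP, source port, destination port, protocol)
FiveTuple = Tuple[str, str, int, int, str]


def ecmp_next_hop(flow: FiveTuple, next_hops: List[str]) -> str:
    """Hash-based ECMP: all packets of a flow map to the same equal-cost next hop."""
    key = "|".join(str(part) for part in flow).encode()
    # A stable hash (unlike Python's per-process randomized str hash) keeps choices reproducible.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return next_hops[bucket % len(next_hops)]


hops = ["spine1", "spine2"]
flow = ("10.0.0.1", "10.0.1.5", 40000, 443, "tcp")
choice = ecmp_next_hop(flow, hops)
print(choice in hops, choice == ecmp_next_hop(flow, hops))  # True True
```

Because the hash ignores traffic characteristics, two large flows can land on the same link while another stays idle; classification-aware path selection is one way to avoid such collisions.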
To address the problem of mislabeled training data, Wang et al. [41] introduce Noise-resistant Statistical Traffic Classification (NSTC), which integrates reliability estimation and noise reduction into traffic classification. With large amounts of noisy training data, NSTC can substantially outperform state-of-the-art approaches in classification performance.
Eom et al. [42] propose an SDN-based system for classifying network traffic. They apply four ensemble algorithms, namely Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), and compare their classification performance in terms of accuracy, precision, recall, F1-score, training time, and classification time. Experiments using the proposed framework on a real-world network traffic dataset show that the ensemble-based classifiers perform well; notably, LightGBM achieves the best classification performance.
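The evaluation metrics named here follow standard definitions: accuracy is the fraction of correct predictions, and for each class, precision is TP/(TP+FP), recall is TP/(TP+FN), and F1 is their harmonic mean. The minimal Python sketch below computes them from label lists; the example labels are made up for illustration.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall, and F1-score for each class (one-vs-rest counts)."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


y_true = ["web", "web", "video", "video"]
y_pred = ["web", "video", "video", "video"]
scores = per_class_scores(y_true, y_pred)
print(accuracy(y_true, y_pred))          # 0.75
print(round(scores["web"]["f1"], 3))     # 0.667
print(round(scores["video"]["f1"], 3))   # 0.8
```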
Mohammed et al. [43] review existing deep learning (DL) techniques for classifying and predicting traffic in an SDN environment. Malik et al. [44] present Deep-SDN, a deep learning model for SDN that can quickly and reliably identify a variety of traffic applications. Compared with the state of the art, the proposed model achieved better accuracy, precision, recall, and F-measure, with an overall accuracy of 96%. The authors claim that the model can accurately and quickly identify different network traffic application types, making it useful for identifying online traffic.
Load balancing is a technique for dividing the workload among multiple resources to avoid overloading any of them [2]. Its objectives include maximizing throughput, minimizing response time, and optimizing traffic. Ejaz et al. [45] balance the load with a focus on the controller: incoming traffic is balanced by duplicating virtual SDN (vSDN) controllers and sharing the load between them. Using two vSDN controllers in the Mininet emulator, the authors experimentally confirmed load balancing in a Fat-Tree topology.
Hai et al. [46] provide load balancing in the control plane based on the load status and a dynamic weight coefficient of each controller, improving resource use, reliability, and resilience in the network. In addition, by using a predefined load threshold, the approach greatly reduces communication overhead compared with existing methods.
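The two ingredients mentioned here, a weighted controller load and a threshold that limits reporting, can be illustrated as follows. The exact formula of Hai et al. is not given in this section, so the resource names, the weights, and the report-only-on-threshold-crossing rule are hypothetical assumptions; the sketch only shows how a threshold can cut the number of load messages.

```python
from typing import Dict, List


def composite_load(utilization: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted combination of resource utilizations (each in [0, 1]).

    The weights are hypothetical and could be recomputed every round to make the
    coefficient dynamic; they are normalized so the result stays in [0, 1].
    """
    total = sum(weights.values())
    return sum(w * utilization[r] for r, w in weights.items()) / total


class ThresholdReporter:
    """Report a controller's load only when it crosses the threshold (either direction)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.overloaded = False
        self.reports_sent = 0

    def observe(self, load: float) -> bool:
        now_overloaded = load > self.threshold
        changed = now_overloaded != self.overloaded
        self.overloaded = now_overloaded
        if changed:
            self.reports_sent += 1  # only state changes generate messages
        return changed


load = composite_load({"cpu": 0.9, "mem": 0.5, "bw": 0.7}, {"cpu": 0.5, "mem": 0.2, "bw": 0.3})
print(round(load, 2))  # 0.76

reporter = ThresholdReporter(threshold=0.7)
samples: List[float] = [0.5, 0.6, 0.8, 0.85, 0.6]
sent = [reporter.observe(x) for x in samples]
print(sent, reporter.reports_sent)  # [False, False, True, False, True] 2
```

Five load samples produce only two messages here, compared with five if every sample were reported.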
Mousa et al. [47] provide an in-depth review of recently introduced SDN-based load balancing and routing methods. They cover several types of load balancing techniques, such as switch migration, routing, controller placement, traffic classification, heuristic algorithms, and AI-based techniques.
Maity et al. [48] propose CORE, a prediction-based technique consisting of mobility prediction, rule caching, and master controller assignment, to minimize controller overload and distribute dynamic traffic optimally while accounting for heterogeneous IoT devices.
Sapkota et al. [49] propose the Naked Mole-Rat (NMR) algorithm, a novel population-based meta-heuristic, to optimize controller placement based on switch-to-controller and controller-to-controller latency while maintaining load balance among the controllers. Two commonly used standard topologies, Ernet and Savvis, are used to demonstrate the approach.
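The placement objective described here can be made concrete with an exhaustive search, which is feasible only for small topologies and is what meta-heuristics such as NMR are designed to avoid. In the Python sketch below, the cost function (worst switch-to-controller latency plus worst controller-to-controller latency), nearest-controller assignment, the per-controller switch limit, and the toy line topology are all hypothetical assumptions, not the formulation used in [49].

```python
import itertools
from typing import Dict, List, Optional, Tuple


def place_controllers(
    nodes: List[str],
    latency: Dict[Tuple[str, str], float],  # symmetric, with latency[(a, a)] == 0
    k: int,                                 # number of controllers to place
    max_switches: int,                      # load-balance limit per controller
) -> Optional[Tuple[Tuple[str, ...], float]]:
    """Exhaustively choose k controller sites that minimize a latency-based cost."""
    best = None
    for sites in itertools.combinations(nodes, k):
        # Assign each switch to its nearest controller site.
        assignment = {n: min(sites, key=lambda c: latency[(n, c)]) for n in nodes}
        loads = [sum(1 for c in assignment.values() if c == s) for s in sites]
        if max(loads) > max_switches:  # load-balance constraint violated
            continue
        switch_ctrl = max(latency[(n, assignment[n])] for n in nodes)
        ctrl_ctrl = max((latency[(a, b)] for a, b in itertools.combinations(sites, 2)),
                        default=0.0)
        cost = switch_ctrl + ctrl_ctrl
        if best is None or cost < best[1]:
            best = (sites, cost)
    return best


positions = {"A": 0, "B": 1, "C": 2, "D": 3}  # line topology A-B-C-D, 1 ms per hop
latency = {(a, b): abs(pa - pb) for a, pa in positions.items() for b, pb in positions.items()}
best = place_controllers(list(positions), latency, k=2, max_switches=2)
print(best)  # (('B', 'C'), 2)
```

The number of candidate placements grows combinatorially with the number of nodes, which is why population-based meta-heuristics are used for realistic topologies.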
Xue et al. [50] propose a novel SDN load balancing approach, called G-ACO, which combines the Genetic Algorithm (GA) with Ant Colony Optimization (ACO). It benefits from the fast global search of GA and the efficient optimal-solution search of ACO. Computer simulations show that, by effectively achieving load balancing (LB), the proposed scheme significantly improves round trip time (RTT) and packet delivery ratio compared with the Round Robin (RR) and ACO algorithms.