Reliability Aware Multiple Path Installation in Software Defined Networking

Being a state-of-the-art network architecture, Software Defined Networking (SDN) decouples the control and management planes from the data plane of the forwarding devices by implementing both at a logically centralized entity called the controller. This makes network control and management simple and easy. Link failures occur frequently in computer networks. To deal with link failures, the existing approaches compute and install multiple paths for a flow at the switches without considering the reliability of the primary path. This incurs extra computation to compute the multiple paths, increased computation time, and additional traffic to install the extra flow rules in the network. In this research work, we propose a new approach that calculates link reliability and then installs a number of paths that depends on the reliability of the primary path. More specifically, if a primary path has higher reliability, then a smaller number of alternative paths is installed. This decreases the path computation time and the flow rule installation load at the controller. As a result, there are fewer flow rule entries in the switch flow tables, which in turn avoids flow table overflow. Simulation results show that our proposed approach performs better than the existing approach in terms of computational overhead at the controller, end-to-end delay for packet delivery, and traffic overhead for flow rule installation.


Application Plane
The application or management plane is the set of end-user applications that interact with the control plane. For example, security applications counter attacks and threats, a load balancer evenly distributes traffic among the links, and SDN programming languages (like Frenetic [8]) specify configuration and requirements at an abstract level.

Control Plane
The SDN control plane is the decision logic that specifies the action for a data packet based on the application plane and the network topology. The action for a data packet can be to forward it through a specific path, drop it, or send it to the controller. The controller communicates a decision to the data plane by installing flow rules at the switches along the path. The controller is aware of the network topology. The control plane can run on a single machine or on multiple machines.

Data Plane
The data plane in SDN consists of a set of forwarding devices (OpenFlow switches, routers, access points) that are commanded by the controller, using the OpenFlow protocol, to take an action for a data packet arriving at the switch. A data plane device establishes a secure connection with the SDN controller and has a flow table that consists of pattern and action fields. When a data packet arrives at a switch, the switch looks for a matching pattern in the flow table. If a match is found, the switch takes the corresponding action. Otherwise, the switch communicates with the controller via OpenFlow to compute the action for the data packet. Due to these numerous advantages, SDN has been adopted by many organizations [9].
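To illustrate this match-action behavior, the following minimal Python sketch models a flow table as an ordered list of (pattern, action) pairs; the field names and the `ask_controller` fallback are illustrative assumptions, not part of any particular switch implementation.

```python
# Minimal sketch of OpenFlow-style match-action lookup (illustrative only).

FLOW_TABLE = [
    # (pattern, action): a pattern matches when every listed field is equal.
    ({"dst_ip": "10.0.0.2"}, ("forward", 3)),   # forward out port 3
    ({"src_ip": "10.0.0.9"}, ("drop", None)),   # drop traffic from this host
]

def ask_controller(packet):
    # Placeholder: a real switch sends an OpenFlow packet-in message here.
    return ("forward", 1)

def lookup(packet):
    """Return the action for a packet, or fall back to the controller."""
    for pattern, action in FLOW_TABLE:
        if all(packet.get(field) == value for field, value in pattern.items()):
            return action
    return ask_controller(packet)

print(lookup({"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2"}))  # ('forward', 3)
```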

SDN Challenges
Besides these benefits, SDN also has several challenges, as follows. (i) Controller placement: every switch continuously communicates with the controller, so the controller needs to be placed at an optimal location to minimize the load on the controller, the delay toward the controller, and the distance from each switch to the controller [15].
(ii) Standardized interfaces: the communication between the application plane and the control plane, and between the control plane and the data plane, should be standardized and open source.
(iii) Link failure resiliency: providing good quality of service to users despite link failures is also a challenge. Link failures occur frequently in any computer network and degrade network performance if a link remains failed for a long time without recovery measures.

Figure 1(a) - Assumed Network View for Proactive and Reactive Flow Rule Installation
The existing approaches for link failure handling in SDN can be categorized into two types [16,17]: proactive and reactive approaches. To explain these approaches, suppose we have a network as shown in Figure 1(a).

(i)-Reactive Failure Recovery Mechanism [Restoration]
When a link fails, the switch informs the controller by sending a link failure event and asks the controller to compute an alternative path for the flows passing through the failed link. The controller then computes and installs alternative paths for all the flows passing through the disconnected link, and the data packets of these flows start to be forwarded along the new paths. The disadvantage of this approach is the long delay taken to inform the controller and subsequently to compute and install the alternative path. For example, there are two paths between switches A and G in the figure. If the link between switches A and B fails for the flow between switches A and G, then switch A informs the controller and the controller computes another path (A-C-D-G), as shown in Figure 1(b).

Figure 1(b) - Reactive failure recovery: the primary path A-B-G is installed by a flow rule command at time T1; on failure, the switch sends a failure notification (link state) to the controller at time T2, and the controller issues flow rule commands for the alternate path at time T3.

(ii)-Proactive Failure Recovery Mechanism [Protection]
In the proactive approach, the controller computes and installs one or more alternative paths in advance, along with the primary path, so that a switch can redirect traffic locally when a link fails.
Figure 1(c) - Two-Path Installation by the Controller Proactively
Thus, this mechanism is faster than the reactive failure recovery mechanism because less controller intervention is involved in the process [18][19][20]. However, it has the disadvantages of a longer delay for computing multiple paths, larger traffic overhead for installing multiple paths at the switches, and higher memory consumption at the switches for storing multiple paths in the forwarding table [22]. TCAM is an expensive and fast memory. The existing approaches for SDN store multiple paths for every flow at each switch regardless of path reliability, which can overflow the TCAM memory of the switches. More specifically, if the path for a flow is reliable (i.e., its failure chance is very low), then there is no need to install multiple paths. Installing a single path reduces the computation time at the controller, the traffic overhead of installing the flow rules along the path, and the TCAM memory consumption, since there is a single flow entry in the forwarding table of each switch.
Our proposed approach, called RAF, solves this problem as follows. Each switch periodically exchanges link failure information, along with other link state information, with the controller. The controller then computes the reliability of each link.
After receiving a request for path computation for a flow, the controller decides the number of paths to install based on the reliability level of the primary path. More specifically, if the primary path for a flow is highly reliable (in our proposed approach, if the reliability of the path is greater than 90%), then the controller computes and installs a single path. If the reliability level of the primary path is between 80% and 90%, then the controller computes and installs two paths for the flow at the data plane. More detail of the proposed solution is given in Chapter 4.
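As a concrete illustration of this step, the sketch below estimates each link's reliability from observed uptime and takes a path's reliability as the product of its link reliabilities; the thesis does not fix these formulas here, so both the formulas and the example link values are stated assumptions.

```python
# Sketch: estimating link and path reliability (assumed formulas).

def link_reliability(uptime, total_time):
    """Fraction of the observation window during which the link was up."""
    return uptime / total_time if total_time > 0 else 0.0

def path_reliability(path_links, link_rel):
    """A path works only if all its links work (independence assumed)."""
    r = 1.0
    for link in path_links:
        r *= link_rel[link]
    return r

# Example: the primary path s1-s4-s8 from the scenario in Chapter 3.
link_rel = {("s1", "s4"): 0.95, ("s4", "s8"): 0.95}
print(path_reliability([("s1", "s4"), ("s4", "s8")], link_rel))  # ~0.90
```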
Through simulation results, we show that our proposed approach performs better than the existing approaches in term of end-to-end delay, traffic overhead and computation time overhead.
The rest of the thesis is organized as follows. A comprehensive overview of the related literature is given in Chapter 2. Chapter 3 explains the problem statement through an example scenario.
The detail of our proposed approach is given in Chapter 4. Chapter 5 describes the simulation results. Finally, Chapter 6 concludes the thesis along with some future research directions.
Many researchers have provided different solutions for handling link failures; the details are as follows.
The authors in [18] state that the Fast Failover (FF) group is an OpenFlow switch feature used to detour flows to an alternate port of an OpenFlow switch when a link failure occurs, without a request to the controller. In this mechanism, flow rules carry a group ID in their action field.
This group ID invokes the group table entries, which forward packets to the switch port defined in an action bucket. When a failure occurs, the next available action bucket is activated and the previous action bucket is disabled. Figure 2.1 shows the group table entries and action buckets.
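To make the fast-failover behavior concrete, here is a small Python sketch of how a switch selects the first live bucket of an FF group locally, with no controller round trip; the data structures are illustrative stand-ins, not the OpenFlow wire format or the OVS implementation.

```python
# Sketch: local fast-failover bucket selection (illustrative, not OVS code).

# Each bucket watches a port and forwards out of that port while it is live.
ff_group = [
    {"watch_port": 1, "out_port": 1},  # primary
    {"watch_port": 2, "out_port": 2},  # backup, used only if port 1 is down
]

port_live = {1: False, 2: True}  # port 1 has just failed

def select_bucket(group, live):
    """Return the first bucket whose watched port is still up."""
    for bucket in group:
        if live.get(bucket["watch_port"], False):
            return bucket
    return None  # no live bucket: packets are dropped

print(select_bucket(ff_group, port_live))
# {'watch_port': 2, 'out_port': 2} -> traffic detours locally
```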
Another consideration in the literature is in-band versus out-of-band OpenFlow control networks. Out-of-band networks are widely used because they are easy to manage, but the controller needs a dedicated port on each switch for the control channel, while in-band control does not need port reservation. In-band network failure resilience is hard to manage because a single link failure can disrupt many data and control channels. Figure 2.2 presents an example of in-band and out-of-band networks.

Figure 2.2 - In-Band and Out-of-Band OpenFlow Networks
Another work identifies that the time taken for link failure detection contributes to the recovery delay. Figure 5 shows how the controller and a shadow controller execute rules recorded before a failure and reconfigure the switches when a link failure occurs.

By testing links using LLDP packets, the system involves less controller intervention in the recovery mechanism.
A route planning module is responsible for computing multiple routes based on updated network topology information. In this work, an SDN application can use only the logical paths configured through the OpenFlow protocol by the VLAN switch configuration module, in which multiple ports are configured with VLAN IDs. Host traffic is then forwarded over a computed logical path chosen randomly or by a round-robin algorithm. The modules involved in the CORONET fault tolerance system are elaborated in Figure 7. Packets affected by a link failure are processed using the flow table; for this purpose, the controller finds alternative paths using the Floyd algorithm while ignoring the failed links (see the sketch below). However, this is a post-failure (reactive) recovery action, not a proactive approach like VeriFlow, a layer between the controller and the data plane that evaluates flow rules proactively before installation [32].
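The following Python sketch shows the idea of recomputing routes while ignoring failed links, using Floyd-Warshall all-pairs shortest paths as the text suggests; the toy topology (matching Figure 1(a)) and unit link weights are assumptions for illustration.

```python
# Sketch: Floyd-Warshall shortest paths that skip failed links.

INF = float("inf")

def floyd_warshall(nodes, links, failed):
    """All-pairs shortest path lengths, ignoring any link marked failed."""
    dist = {u: {v: (0 if u == v else INF) for v in nodes} for u in nodes}
    for (u, v), w in links.items():
        if (u, v) not in failed and (v, u) not in failed:
            dist[u][v] = min(dist[u][v], w)
            dist[v][u] = min(dist[v][u], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

nodes = ["A", "B", "C", "D", "G"]
links = {("A", "B"): 1, ("B", "G"): 1,
         ("A", "C"): 1, ("C", "D"): 1, ("D", "G"): 1}
print(floyd_warshall(nodes, links, failed={("A", "B")})["A"]["G"])
# 3 -> the recovery route A-C-D-G
```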
SPIDER [33] is a pipelined packet processing design that applies stateful processing for local failure recovery in the switch. It provides this functionality using fully customizable probing: when a probe request does not return any packet from a failed link, the port status is set to down. When a port is declared down, another port becomes active to forward data. Using this mechanism, SPIDER guarantees failure recovery in a very short time. The behavioral model in this work is a finite state machine that performs per-flow processing.
Flow table compression is proposed in [34] to utilize OpenFlow switch memory efficiently.
This solution addresses the problems of data center networks, where link reliability, multipath availability, and immediate recovery are required. To ensure these requirements, a switch keeps multiple paths along with a primary path in its memory, which increases the usage of TCAM, a crucial resource. Compression is therefore performed locally in a switch for flow entries that have the same action outputs, merging them into fewer flow table entries.

Problem Statement
Suppose a packet is received at switch s1 and the corresponding flow rule is not installed in the forwarding table of s1. The switch s1 sends a packet-in toward the controller. The controller computes the primary path (s1-s4-s8), which has the maximum reliability (0.9, i.e., 90%). In this case, the existing approach [44] will compute all alternative paths and install all of them in the network, as shown in Figure 3.1(b). As the reliability of the primary path is high (90%), the other installed alternative paths are redundant. We suggest that in this case only the primary path should be installed, as shown in Figure 3.1(c). This reduces the computation overhead for computing multiple paths at the controller, the traffic overhead of installing flow rules along all alternative paths in the network, and the memory usage in the switches.

Proposed Approach
In this section, we formulate our proposed approach, called the Reliability-Aware Flow installation mechanism (RAF), which computes the primary path based on higher reliability. Then, we propose a variation of RAF, Distance-based RAF, which computes the primary path based on the joint value of higher reliability and shorter distance. We assume an out-of-band communication model between a switch and the controller, and a reactive flow installation mode. Our proposed approach contains the following components.

Path computation
As mentioned above, we assume a reactive flow installation mode. In this mode, when the network is configured and starts running, the flow tables at the switches are empty. When a data packet of a flow arrives at a switch, the switch looks for a matching entry in its forwarding table. If a match is found, the switch forwards the data packet according to the corresponding action of the flow table entry. Otherwise, the switch asks the controller to compute the action for the flow. After receiving the request from the switch, the controller checks the access control list (ACL) to determine whether the flow is allowed or denied. If the flow is denied, the controller installs a drop action at the switch. Otherwise, the controller computes the primary path using either of the following approaches.
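A minimal POX-style sketch of this reactive workflow is shown below; the ACL contents and the helpers `flow_key`, `compute_primary_path`, and `install_path` are placeholders we assume for illustration, while the PacketIn handling and flow-mod calls follow the standard POX OpenFlow 1.0 API.

```python
# Sketch: reactive flow installation in POX (OpenFlow 1.0 API).
from pox.core import core
import pox.openflow.libopenflow_01 as of

DENIED = set()  # hypothetical ACL: flow keys that must be dropped

def flow_key(packet):
    # Placeholder: identify a flow by its Ethernet source/destination.
    return (str(packet.src), str(packet.dst))

def compute_primary_path(packet):
    # Placeholder for RAF path computation (see RAF / Distance-based RAF).
    return []

def install_path(event, path, match):
    # Placeholder: the full system sends flow mods to every switch on the
    # path; here we only install a rule at the requesting switch.
    msg = of.ofp_flow_mod()
    msg.match = match
    msg.actions.append(of.ofp_action_output(port=of.OFPP_FLOOD))
    event.connection.send(msg)

def _handle_PacketIn(event):
    packet = event.parsed
    match = of.ofp_match.from_packet(packet, event.port)
    if flow_key(packet) in DENIED:
        drop = of.ofp_flow_mod()
        drop.match = match
        event.connection.send(drop)  # no actions appended -> drop
        return
    path = compute_primary_path(packet)
    install_path(event, path, match)

def launch():
    core.openflow.addListenerByName("PacketIn", _handle_PacketIn)
```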

RAF
In RAF, we compute the most reliable path as the primary path and then follow one of the following cases for computing and installing alternative paths.

Case I:
If the reliability of the primary path is more than 90%, RAF does not compute or install any other alternative path.

Case II:
If the reliability value of the primary path is between 80% and 90%, the controller installs two alternative paths.

Case III:
If the reliability value of the primary path is between 70% and 80%, the controller installs three alternative paths.

Case IV:
If the reliability value of the primary path is between 60% and 70%, the controller installs four alternative paths.

Case V:
If the reliability value of the primary path is between 50% and 60%, the controller installs five alternative paths.

Case VI:
If the reliability value of the primary path is between 0% and 50%, the controller installs all available alternative paths.
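The six cases above reduce to a simple threshold table; the sketch below encodes it, returning the number of alternative paths to install alongside the primary. The `ALL_PATHS` sentinel for Case VI is our own convention, not part of the thesis.

```python
# Sketch: number of alternative paths to install, per RAF Cases I-VI.

ALL_PATHS = -1  # sentinel for Case VI: install every available path

def num_alternative_paths(reliability):
    """Map primary-path reliability (0.0-1.0) to alternative path count."""
    if reliability > 0.90:
        return 0            # Case I: primary path only
    elif reliability > 0.80:
        return 2            # Case II
    elif reliability > 0.70:
        return 3            # Case III
    elif reliability > 0.60:
        return 4            # Case IV
    elif reliability > 0.50:
        return 5            # Case V
    else:
        return ALL_PATHS    # Case VI

print(num_alternative_paths(0.9025))  # 0 -> only s1-s4-s8 is installed
```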

Distance based RAF
Distance-based RAF computes the primary path based on the joint value of higher reliability and shorter path length. After this, the controller computes and installs multiple paths based on the reliability value of the primary path using Case I to Case VI in Section 4.1.1. A candidate selection sketch is given below.
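The thesis does not spell out here how reliability and distance are combined, so the following sketch assumes one plausible rule: prefer the highest reliability and break ties by shorter hop count, via a simple lexicographic key. The candidate paths and their values are illustrative.

```python
# Sketch: primary path selection for Distance-based RAF (assumed scoring).

candidate_paths = [
    # (path, reliability, hop_count) - illustrative values
    (["s1", "s4", "s8"], 0.90, 2),
    (["s1", "s2", "s5", "s8"], 0.90, 3),
    (["s1", "s3", "s6", "s7", "s8"], 0.85, 4),
]

def pick_primary(paths):
    """Highest reliability first; among equals, the shortest path wins."""
    return max(paths, key=lambda p: (p[1], -p[2]))

print(pick_primary(candidate_paths)[0])  # ['s1', 's4', 's8']
```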

Link Failure Process
The proposed method includes functionality to make a path unavailable when a link failure event occurs. A link failure is configured at the egress Ethernet interface of a switch by an OpenFlow command from the controller: the controller adds a flow rule entry binding the physical Ethernet interface to a logical port, or simply sets the port status to down. The failure procedure is activated when the aliveness time of a link reaches zero. The controller learns of the failure through periodic feature-reply messages and makes the decision to install an alternative path.
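In our emulation environment, this kind of failure can be reproduced with Mininet's link status API; the sketch below assumes a running Mininet `net` object with two switches named s1 and s2 and is only a minimal reproduction of the failure step, not the full RAF workflow.

```python
# Sketch: emulating a link failure in Mininet (assumes switches s1, s2).
from mininet.net import Mininet
from mininet.topo import LinearTopo

net = Mininet(topo=LinearTopo(k=2))  # two switches s1-s2, one host each
net.start()

# Take the s1-s2 link down to trigger the failure handling described above.
net.configLinkStatus("s1", "s2", "down")

# ... controller-side recovery would now install an alternative path ...
net.configLinkStatus("s1", "s2", "up")
net.stop()
```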

Simulation Tools
To evaluate the performance of the proposed approaches, we have used the POX controller and the Mininet emulator.

Mininet Simulator
The Mininet emulator was created to speed up research in OpenFlow and SDN; it allows emulation of both small and large networks without modification. Large-scale topologies of up to hundreds or thousands of nodes can be created and tested easily using Mininet, which provides simple command-line tools and an API. Mininet [46] offers ease of use, supporting easy creation, customization, sharing, and testing of SDN networks.
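For reference, a custom topology can be defined through Mininet's Python API and attached to an external POX controller; the sketch below builds a small three-switch chain (the real experiment used 9 interconnected OVS switches and 25 hosts) and assumes the controller listens on 127.0.0.1:6633.

```python
# Sketch: a small custom Mininet topology driven by a remote POX controller.
from mininet.net import Mininet
from mininet.node import RemoteController
from mininet.topo import Topo

class SmallChain(Topo):
    """Three switches in a chain, one host per switch (illustrative)."""
    def build(self):
        switches = [self.addSwitch("s%d" % i) for i in range(1, 4)]
        for i, sw in enumerate(switches):
            host = self.addHost("h%d" % (i + 1), ip="10.0.0.%d/16" % (i + 1))
            self.addLink(host, sw)
        for a, b in zip(switches, switches[1:]):
            self.addLink(a, b)

net = Mininet(topo=SmallChain(),
              controller=lambda name: RemoteController(name, ip="127.0.0.1",
                                                       port=6633))
net.start()
net.pingAll()  # basic connectivity check through the controller
net.stop()
```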

POX Controller
POX provides a framework for communicating with OpenFlow switches.
Developers can use POX to create an SDN controller in the Python language. It is a popular tool in academia for research on software-defined networks and network management applications. POX can be used immediately as a basic SDN controller by using the stock components that ship with it [48].

OpenFlow API
OpenFlow is the application programming interface for managing data plane devices that we used for testing our approach. OpenFlow emerged in 2008 and is continuously being updated.
This API lets custom user applications store their decisions in OpenFlow-enabled switches via control plane commands. We have used version 1.0, since the 1.0 switch specification is supported by the POX controller.

Python
We have used the Python programming language for management and data plane scripting and for the implementation of RAF, owing to its compatibility with the SDN controller and the Mininet emulator.

Experimental Results
We have 25 virtual end hosts and 9 OVS switches in our emulated network topology. The switches are interconnected. Host IP addresses are assigned from the 10.0.0.0/16 subnet, starting at 10.0.0.1. A single host in this network topology generates 10000 UDP data packets with an average of 62 bytes of data per packet.
The inter-packet delay controls the data rate of the socket interfaces. Because of the laptop's limited resources, the best results were collected using 5 hosts generating data packets.
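The UDP traffic described above can be generated with a short socket script; the packet count, payload size, and inter-packet delay below mirror the experiment's parameters, while the destination address, port, and exact delay value are assumptions, since the thesis script itself is not reproduced here.

```python
# Sketch: UDP traffic generator matching the experiment's parameters.
import socket
import time

DEST = ("10.0.0.2", 5005)   # receiving host and an assumed UDP port
PACKETS = 10000             # packets per sending host
PAYLOAD = b"x" * 62         # ~62 bytes of data per packet
DELAY = 0.001               # inter-packet delay controls the data rate

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for _ in range(PACKETS):
    sock.sendto(PAYLOAD, DEST)
    time.sleep(DELAY)       # pace the socket to the desired rate
sock.close()
```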
Experiments were performed repeatedly using the POX controller (2.2 eel version) and the Mininet framework, demonstrating the efficiency of our work and its benefits compared to the existing approach of installing all paths. All implementation scripts are written in Python. As the path length increases, the controller has a higher control packet transmission rate.

Conclusion
Software Defined Networking is an emerging computer network architecture that centralizes both the control and management planes. This centralization has many advantages, such as easy network control and management. Link failures occur frequently in a network. To deal with link failures, the existing approaches install all multiple paths in the network without considering the reliability value of the primary path. This causes more computation overhead at the controller to compute all the paths, larger traffic overhead in the network to install multiple paths at the switches, and more memory usage in the switches to store the multiple paths. To address this problem, we have proposed a new approach, called RAF, which considers the reliability level of the primary path and installs a number of alternative paths according to that reliability value. More specifically, if the primary path has high reliability, then there is no need to install multiple paths. The simulation results confirmed that our proposed approach gives better results by decreasing the computation load at the controller, the average end-to-end delay, and the traffic overhead for installing multiple paths at the data plane. As future work, we would like to extend our proposed work by computing link reliability using deep learning algorithms.