1. Introduction
Owing to the rapid evolution of communication technologies over the past few years, the number of terminals that access network resources has increased, and applications have diversified. Because of the coexistence of application services requiring different policies, network configuration and management have become very complicated [1]. In traditional networking environments consisting of network devices operated by vendor-specific control software, network flexibility is very low, making it difficult, time-consuming, and expensive to configure networks to meet these complex needs.
Software Defined Networking (SDN) technology, which operates a network by separating the control and data planes, has been shown to solve these problems and efficiently perform network configuration, control, and management. SDN switches operate in the data plane to forward packets according to the flow table information provided by the SDN controller in the control plane [2], which makes it easier to ensure Quality of Service (QoS) and recover from failures compared with traditional networks [3,4]. As the number of terminals and the various types of data served through the network have increased, QoS support is also an essential issue in the management of SDN networks. Several studies have been conducted to provide QoS in single-domain SDN networks [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21].
As networks grow in size, service providers have partitioned them into multiple administrative domains, called multi-domains, to address scaling and operational concerns. The SDN architecture supporting multi-domains generally consists of local SDN controllers that independently operate each domain and a global controller that integrates and manages them in a hierarchical manner [22].
To support end-to-end QoS for services passing through multiple domains, complex traffic engineering is required because the quality level that can be provided according to the resource conditions in these domains should be comprehensively considered [23]. Numerous studies have been conducted to support end-to-end QoS in multi-domain SDN networks [24,25,26,27,28,29,30,31,32,33,34,35,36,37].
Determining the end-to-end path that supports QoS in a multi-domain network requires a large amount of information exchange between the local and global controllers and very high computational complexity. To reduce the computational complexity, Effective Bandwidth (EB)-based methods have been proposed [5,6,8,9]. EB is defined as the minimum service rate required to maintain the given QoS requirements for a traffic source or arrival process [8,38]. Existing methods statistically calculate the EB using the amount of arrival and departure traffic measured by the network devices.
These methods have been applied to individual routers or to an end-to-end path within a single domain, but this approach may cause severe errors depending on the measurement and traffic delivery methods. Furthermore, Effective Delay (ED) is defined as the effective transmission time required to transmit a data stream; thus, ED differs from latency [39,40]. ED can be obtained from the EB curve of the backlog/service process. In a multi-domain environment, it is necessary to consider the network situation in every domain and the traffic delivery method between domains. However, from our investigation, it appears that there have been very few attempts to employ such a method.
In this paper, we propose an effective path selection method that satisfies QoS requirements for inter-domain flows based on EB theory and a Directed Acyclic Graph (DAG) in a multi-domain SDN architecture. With ED and DAG, the proposed method can determine the end-to-end path for inter-domain flows, guaranteeing their QoS much faster than conventional path selection methods.
The contributions of this study are as follows.
An ED measurement method that improves accuracy in a multi-domain environment is proposed by simultaneously applying the backlog process construction method and supermartingale theory.
Based on the ED derived in the multi-domain environment, we propose a flow decision method that satisfies the QoS requirements faster than the existing method through DAG.
We construct multi-domain simulation scenarios to conduct performance comparison and validation of the proposed and existing methods.
The remainder of this paper is organized as follows.
Section 2 summarizes the related work.
Section 3 introduces the proposed system architecture, measurement method of the ED, and a DAG-based flow selection method for QoS support in multi-domain networks.
Section 4 presents the performance evaluation of the proposed method. Finally,
Section 5 concludes the paper and discusses future work.
3. QoS Support Path Selection for Inter-Domain Flows
The existing flow determination methods described in the previous section involve a complete search for the entire path, which may cause performance degradation owing to the computational overhead required in the determination process. The proposed method can reduce the computational complexity by calculating the ED values combined with QoS metrics. The DAG algorithm is then applied to determine QoS-customized flows for end-to-end applications in multi-domain networking environments. In this section, the proposed inter-domain flow path selection method that considers a multi-domain SDN architecture is explained.
3.1. Preliminaries
Here, we present some definitions and theorems related to martingale theory, which is the basis of the proposed method for calculating the ED.
Let us consider a sequence of random variables $\{X_t\}_{t \geq 0}$. Then, we present the following definitions of the martingale process.
Definition 1 (Martingale Process [42,43]). The stochastic process $\{X_t\}$ is said to be a martingale if, for all $t \geq 0$,
$\mathbb{E}[|X_t|] < \infty$, (1)
$\mathbb{E}[X_{t+1} \mid X_1, \ldots, X_t] = X_t$. (2)
Equations (1) and (2) imply that, given all historical observations for the time interval $[1, t]$, the conditional expectation of the next observation for time $t+1$ is equal to that for time $t$.
Definition 2 (Supermartingale Process [42,43]). The stochastic process $\{X_t\}$ is called a supermartingale if, for all $t \geq 0$,
$\mathbb{E}[|X_t|] < \infty$, (3)
$\mathbb{E}[X_{t+1} \mid X_1, \ldots, X_t] \leq X_t$. (4)
Equations (3) and (4) indicate that the supermartingale process has the property that expectations fall over time.
For a queuing system, let $A(t)$ and $D(t)$ be the numbers of arrivals and departures, respectively, at time $t$. Let us define $A(s,t) = A(t) - A(s)$ and $D(s,t) = D(t) - D(s)$ for $0 \leq s \leq t$. Then, the following arrival and service martingales are defined.
Definition 3 (Arrival Martingale [44]). $A(t)$ is said to be an arrival martingale if, for every $t$, there exist a constant and a function such that the corresponding process is a supermartingale.
Definition 4 (Service Martingale [44]). $S(t)$ is called a service martingale if, for every $t$, there exist a constant and a function such that the corresponding process is a supermartingale.
As methods for obtaining the statistical delay boundaries required to describe the backlog process, approaches such as Markov's inequality or the Chernoff bound are commonly used to calculate the upper bound of a probability when the probability distributions are given. In this study, as a method of obtaining statistical delay boundaries, we utilize Doob's maximal inequality for supermartingales, which is given in Definition 5.
Definition 5 (Doob's Maximal Inequality [45]). Doob's maximal inequality can be used to calculate the upper bound of the statistical delay boundary: if $\{X_t\}$ is a nonnegative supermartingale, then for any $a > 0$,
$P\left(\sup_{t \geq 0} X_t \geq a\right) \leq \dfrac{\mathbb{E}[X_0]}{a}$. (7)
In Equation (7), $\sup(\cdot)$ represents the supremum or least upper bound [46] and $\mathbb{E}[\cdot]$ represents the expected value of the argument. The moment generating function of a random variable $X$ is defined for any $\theta$, provided the expectation exists, as $M_X(\theta) = \mathbb{E}[e^{\theta X}]$.
3.2. System Model
The main variables used to explain the network model and the proposed method are listed in
Table 2.
The multi-domain SDN architecture considered in this study is depicted in Figure 1, in which the network is partitioned into $L$ local domains. The $l$th domain ($1 \leq l \leq L$) consists of $N_l$ switches managed by a Local Controller (LC). The information of each switch is exchanged with its LC using the Link Layer Discovery Protocol (LLDP) so that the LC can manage data flows among switches. We denote the $n$-th switch in domain $l$ as $s_{l,n}$, and $e_{l,(n,m)}$ as the link between two switches $s_{l,n}$ and $s_{l,m}$, where $1 \leq n, m \leq N_l$ and $n \neq m$. Then, the $l$th domain is represented by a graph $G_l = (V_l, E_l)$, where $V_l = \{s_{l,n}\}$ and $E_l = \{e_{l,(n,m)}\}$.
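As a concrete illustration, the per-domain graph maintained by an LC (switches as vertices, links as edges) can be sketched as a small adjacency structure; all class, switch, and delay names below are hypothetical and only mirror the graph model described above, not the paper's implementation:

```python
# Sketch of a local-domain graph kept by an LC; names and values are illustrative.
class DomainGraph:
    def __init__(self, domain_id):
        self.domain_id = domain_id      # index l of the local domain
        self.adj = {}                   # switch -> {neighbor switch: link metric}

    def add_switch(self, n):
        self.adj.setdefault(n, {})

    def add_link(self, n, m, delay):
        # Bidirectional link between switches n and m with an example delay metric.
        self.add_switch(n)
        self.add_switch(m)
        self.adj[n][m] = delay
        self.adj[m][n] = delay

g1 = DomainGraph(1)
g1.add_link("s1", "s2", 2.0)
g1.add_link("s2", "s3", 1.5)
```

The LC can then run path searches over `adj`, while the GC keeps an analogous graph whose vertices are whole domains.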
Each LC communicates with the Global Controller (GC) to periodically exchange information on its domain and request path selection for flows beyond its domain area. In addition to maintaining the comprehensive information of all LCs, the GC considers each domain as a virtual node and manages a network graph representing the connections between them, $G = (V, E)$, where $V$ is the set of local domains and $e_{n,m} \in E$ denotes the link between two local domains $n$ and $m$.
The system model of a switch is shown in Figure 2. Let us consider the traffic from this switch to the next. Assume that there are $k$ flows arriving at a single queue with infinite capacity and a FIFO strategy. Furthermore, the capacity occupied by traffic transmitted to the next node through the service curve is described by the cumulative departure traffic.
The cumulative arrival and departure traffic is represented by the stochastic processes $A(t)$ and $D(t)$, respectively, for all $t \geq 0$. Subsequently, $D(t) \leq A(t)$. The server that deals with the traffic is characterized by the service curve $S(t)$. Each switch system in the local domain $l$ is then simplified with the corresponding $A(t)$, $D(t)$, and $S(t)$, as shown in Figure 2.
3.3. ED Calculation in a Local Domain
Here, we explain the method for calculating the ED for the traffic flow between two switches in a local domain. To generalize the explanation without being limited to a specific switch, the arrival, departure, and service processes at a switch in a local domain are expressed as $A(t)$, $D(t)$, and $S(t)$, respectively, omitting the subscripts $n$, $m$, and $l$.
Let the backlog and delay processes in the queue of the switch be $B(t)$ and $W(t)$, respectively. Then, we have
In Equation (10), $\inf(\cdot)$ represents the infimum or greatest lower bound [46]. As in [47], we assume that the arrival process $A(t)$ follows a leaky bucket shaper model with bucket rate $\rho$ and burst parameter $b$, which is widely applied to define the envelope function that provides the deterministic upper bound of an arrival process. Then, for all $0 \leq s \leq t$, there exist $\rho$ and $b$ satisfying $A(s,t) \leq \rho (t - s) + b$.
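As a quick illustration of the leaky-bucket envelope $A(s,t) \leq \rho(t-s) + b$, the sketch below checks whether a cumulative arrival trace conforms to given $(\rho, b)$ parameters; the trace and parameter values are illustrative, not taken from the paper:

```python
def conforms_to_leaky_bucket(arrivals, rho, b):
    """Check A(s,t) <= rho*(t-s) + b over every interval of a cumulative trace.

    arrivals[t] is the cumulative traffic A(t) at integer time slot t.
    """
    T = len(arrivals)
    for s in range(T):
        for t in range(s, T):
            if arrivals[t] - arrivals[s] > rho * (t - s) + b:
                return False
    return True

# A source with an initial burst of 3 units and a sustained rate of 2 units/slot
# conforms to the envelope (rho=2, b=3) but violates a tighter (rho=1, b=1).
trace = [0, 3, 5, 7, 9]
```

Checking every $(s, t)$ pair is quadratic; it is enough here to make the envelope definition concrete.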
From Equation (9), we can determine $D(t)$ for $t \geq 0$ by assuming that the boundary of the service process exists in the sense that there is a constant service rate, as follows:
We apply the leaky bucket traffic model used in Equation (12) to traffic generation in this study; existing studies have shown experimentally that this model can simulate traffic similar to real media traffic [48].
Figure 3 shows the relationship between the backlog and delay processes, $B(t)$ and $W(t)$, respectively, with respect to the arrival, departure, and service processes illustrated above. It should be noted that the service process $S(t)$ associates the departure process $D(t)$ with the arrival process $A(t)$.
From Equation (10), it can be written as
The complementary cumulative distribution function of $W(t)$ can then be expressed as follows:
To extend the analysis to serially connected switches, we can express the service curve in min-plus algebra as $D(t) \geq (A \otimes S)(t)$, relating the departure $D(t)$ and the arrival $A(t)$, where the $\otimes$ operator is defined as follows:
$(A \otimes S)(t) = \inf_{0 \leq s \leq t} \{A(s) + S(s, t)\}$.
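A minimal discrete-time sketch of the min-plus convolution $(f \otimes g)(t) = \min_{0 \leq s \leq t}\{f(s) + g(t-s)\}$, which is also how the service curves of serially connected switches are concatenated in network calculus; the curve values are illustrative:

```python
def min_plus_conv(f, g):
    """Discrete-time min-plus convolution (f ⊗ g)(t) = min_{0<=s<=t} f(s) + g(t-s).

    f and g are lists of curve values at integer times 0, 1, 2, ...
    """
    T = min(len(f), len(g))
    return [min(f[s] + g[t - s] for s in range(t + 1)) for t in range(T)]

# Concatenating two example service curves: the composite is again a curve in
# min-plus form, dominated by the slower server at each time scale.
S1 = [0, 1, 2, 3, 4]   # slower server, rate 1
S2 = [0, 2, 4, 6, 8]   # faster server, rate 2
S_net = min_plus_conv(S1, S2)
```

Here `S_net` equals `S1`: the end-to-end service is limited by the slower of the two servers, as expected from the theory.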
The arrival and departure at each node within the time interval are described as the stochastic behavior of random processes and service curves; for every $t$ with $0 \leq s \leq t$, they can be expressed as follows:
where $\varepsilon(\cdot)$ denotes an error function that provides a bound on the probability that the envelope will be violated, and the parameter is used to generate a linear delay boundary in the backlog process, representing the burstiness construct defined in [49] and the characteristics of the service envelope in network calculus. As described in [47], the characteristics of bounded burstiness were applied. Consider the case in which packets from a flow are transmitted via $W$ series-connected switches
$s_1$, $s_2$, $\cdots$, $s_W$ in the local domain $l$, as shown in Figure 4. Let the service curve of switch $s_i$ be $S_i$. Then, the service curve passing through the $W$ switches is given by [50]
$S_{net} = S_1 \otimes S_2 \otimes \cdots \otimes S_W$.
Using the service curve and backlog process characteristics of a single switch defined above, we can obtain the statistical delay bound for consecutively connected switches as follows:
Theorem 1 (Linear Statistical Delay Bound). Let us consider that the switches have a constant service rate under statistical independence assumptions. An arrival with a traffic rate $\rho$ and a magnitude $b$ representing the burstiness of the traffic passes through $W$ consecutively connected switches. Assuming a stable situation, the linear statistical delay bound can be calculated as
Proof. Similar to [51], we can prove the theorem as follows. Let $w_i$, $A_i$, $S_i$, and $D_i$, respectively, be the delay, arrival, service, and departure processes at the $i$-th switch along the serial $W$ switches, $1 \leq i \leq W$. From Equation (14), the delay bound at the $i$-th switch can be expressed as
As the two expressions are equivalent, we have
From the definitions of the supermartingale and Doob's maximal inequality, and by utilizing the statistical independence of the arrival and service processes, we have
Using the result in Equation (18), the end-to-end latency bound can be written as
Based on the statistical independence of the service processes, Equation (23) can be rearranged as follows:
Using the inequality condition, we obtain
Based on the violation probability being limited by the error function in Equation (17), we obtain the delay bound as follows:
By differentiating the right-hand side of Equation (27), the optimal value of the delay bound is calculated as
It can be confirmed that the result is given by Equation (29). When the stated condition holds, the delay bound achieves its minimum; otherwise, it approaches the maximum bound. Accordingly, we can obtain the optimal value as
Based on the conditions in Equation (31), the delay bound reaches its minimum. Substituting the optimal value into Equation (29) completes the proof. □
The proposed delay bound increases linearly with the number of connected switches, that is, $W$. The end-to-end delay bound given by Equation (19) can be utilized to determine whether or not the requested delay constraint for the traffic is satisfied.
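To illustrate how such a linear-in-$W$ bound can drive admission decisions, the sketch below uses the classical deterministic network-calculus bound for a $(\rho, b)$ leaky-bucket flow crossing $W$ rate-latency servers $\beta_{C,T}$, namely delay $\leq b/C + WT$ when $\rho \leq C$. This is a simpler deterministic analogue of the statistical bound in Theorem 1, not the paper's exact formula, and all parameter values are illustrative:

```python
def deterministic_delay_bound(rho, b, C, T, W):
    """Classical network-calculus delay bound for a (rho, b) leaky-bucket flow
    crossing W rate-latency servers beta_{C,T}: b/C + W*T.

    Like the statistical bound of Theorem 1, it grows linearly in W.
    Requires the stability condition rho <= C.
    """
    if rho > C:
        raise ValueError("unstable: arrival rate exceeds service rate")
    return b / C + W * T

def admit(rho, b, C, T, W, qos_delay):
    """Admission check: accept the flow only if the bound meets the QoS delay."""
    return deterministic_delay_bound(rho, b, C, T, W) <= qos_delay
```

For example, a flow with rate 2, burst 4, crossing 3 servers of rate 4 and latency 0.5 gets a bound of 2.5 time units, so it is admitted against a 3.0 requirement and rejected against a 2.0 requirement.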
For this purpose, it is also necessary to obtain the case from Equation (26) that satisfies the required condition. From Equation (26), when that condition holds, the following inequality is satisfied:
From Equation (30), Equation (32) can be rewritten as follows:
It should be noted that Equation (33) implies a condition that satisfies the stochastic delay bound in Equation (19).
3.4. Path Selection of a Flow Using ED
As the local switch determines the forwarding port of traffic based on its own flow table, a local switch that does not have a send/receive rule registered in the flow table must request information from the LC regarding which port should forward traffic to the destination. At this time, the LC must handle the flow decision process by considering (1) the presence of a transmission/reception pair in its own management domain and (2) the presence of a destination in a domain other than its own domain.
3.4.1. Case 1: Path Selection of a Flow within a Local Domain
Here, we consider a case that determines the path of a flow in a local domain $l$, which is represented by the graph $G_l$ as previously defined. When a switch in local domain $l$ receives a flow request with source (src), destination (dst), and traffic type (type), including traffic characteristics and QoS requirements, it forwards the request to its LC. The LC determines the set of possible paths ($P$) for the flow, calculates the probabilistic delay bounds for these paths, and selects the best path to satisfy the QoS requirements of the flow.
To find the set of available paths $P$ in the local domain, we utilize the multipath A-star algorithm proposed in [52]. For all possible paths in $P$, the delay bound can be calculated using Equation (19). Subsequently, if the lowest delay bound satisfies the QoS requirements, the path with the lowest delay bound is selected as the flow path. If no path satisfies the delay bound, the flow request is rejected.
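The Case 1 decision rule above (compute the bound per candidate path, pick the smallest, reject if it still violates the QoS delay) can be sketched as follows; path names and bound values are hypothetical, and `delay_bound` stands in for the Equation (19) calculation:

```python
def select_path(paths, delay_bound, qos_delay):
    """Pick the candidate path with the lowest delay bound; reject if none fits.

    paths: candidate paths from the multipath search;
    delay_bound: function mapping a path to its statistical delay bound;
    qos_delay: delay requirement of the requested flow.
    """
    if not paths:
        return None
    best = min(paths, key=delay_bound)
    return best if delay_bound(best) <= qos_delay else None

# Illustrative: precomputed bounds per candidate path in the local domain.
bounds = {("s1", "s2", "s4"): 12.0, ("s1", "s3", "s4"): 9.5}
path = select_path(list(bounds), bounds.get, qos_delay=10.0)
```

With the example bounds, the 9.5-bound path is selected; tightening the requirement to 5.0 rejects the flow.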
3.4.2. Case 2: Path Selection for an Inter-Domain Flow
Step 1: Construct DAG and Set Global Network
For flows that cannot be processed within an LC, the GC must determine the flow path by using a DAG built from the local-domain controller information. The graph maintained by the GC is a virtual graph over the local domains: each vertex is a local domain, and each edge is a link between local domains. A pair of inter-domain switches connecting two domains is represented by the corresponding gateway switches, and the inter-domain edges are represented accordingly. The GC can construct the connection relationships of the LCs through the inter-domain connection switch (gateway switch) information received from each LC and reflect them in the global domain graph of the GC.
Algorithm 1 describes the construction of the inter-domain connection information and the generation of a DAG for the global inter-domain network to which topological sorting is applied. First, a variable is initialized to hold the set in which the topological ordering of the entire network DAG is completed. The VisitList for the topological order sequence and the Visited variable for checking the visit status are initialized by iterating over each domain index. Subsequently, for each unvisited domain vertex, the dfsRecursive() function is called.
The dfsRecursive() function marks an unvisited domain vertex as visited and adds the vertex to the VisitList. It then checks each neighboring domain vertex and recursively calls dfsRecursive() on the non-visited neighbors. This process completes the topological order list from the first domain to the last until every domain has been visited, and returns the set containing the final topological ordering.
Algorithm 1: Construction of the DAG for the global network.
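A minimal sketch of the DFS-based topological ordering described in Algorithm 1; domain names are illustrative, and the real algorithm operates on the GC's domain graph:

```python
def topological_order(domains, neighbors):
    """DFS-based topological ordering of the global domain DAG, following the
    structure of Algorithm 1 (visited-set plus recursive visit list)."""
    visited, visit_list = set(), []

    def dfs_recursive(v):
        visited.add(v)
        for u in neighbors.get(v, ()):
            if u not in visited:
                dfs_recursive(u)
        visit_list.append(v)        # post-order: v appended after its successors

    for v in domains:
        if v not in visited:
            dfs_recursive(v)
    return visit_list[::-1]         # reverse post-order = topological order

# Illustrative domain DAG: D1 feeds D2 and D3, both of which feed D4.
dag = {"D1": ["D2", "D3"], "D2": ["D4"], "D3": ["D4"], "D4": []}
order = topological_order(["D1", "D2", "D3", "D4"], dag)
```

Any valid ordering places `D1` first and `D4` last; the GC can then walk this ordering when composing inter-domain paths.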
Step 2: Flow Path Selection on Multi-Domain
Using the generated topological-order subgraph list, the path for an inter-domain transmission/reception pair requested by a local switch is determined using Algorithm 2.
First, the cumulative bound used to determine whether the QoS is satisfied is initialized to zero, and the domain index of the switch where the transmission node is located is taken as the starting domain. In the source domain, the ED bound from the source switch must be calculated; thus, the bounds over the set of paths between the source switch and the entry switch of the next domain are computed, and the minimum bound is added to the cumulative bound. Here, the traffic origin node is the $n$-th node of the source domain, and the entry switch is the $m$-th switch of the next domain in the topological order.
Subsequently, for each intermediate domain of the inter-domain flow, the minimum bound between its inlet and outlet switches is calculated and accumulated, and it is checked whether the QoS is still satisfied. When the final destination domain is reached, the path from its incoming switch to the destination is calculated in the same manner as in the source domain, the minimum bound is reflected in the cumulative bound, and the QoS check is performed. The inter-domain flow decision is completed by delivering the traffic flow information to the LCs of the domains along the path from the source to the destination.
Algorithm 2: Inter-domain flow path selection.
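Step 2 can be sketched as a walk over the topologically ordered domain sequence that accumulates per-domain minimum ED bounds and rejects as soon as the QoS delay is exceeded; `min_bound` is a hypothetical helper standing in for the per-domain bound calculation, and the values are illustrative:

```python
def inter_domain_path_ok(domain_seq, min_bound, qos_delay):
    """Sketch of Algorithm 2: accumulate the minimum ED bound per domain
    segment along the topological order and check the QoS requirement.

    domain_seq: domains from source to destination in topological order;
    min_bound(l): minimum delay bound over candidate intra-domain paths of
    domain l (source-to-egress, ingress-to-egress, or ingress-to-destination).
    Returns (accepted, cumulative_bound).
    """
    cum_bound = 0.0
    for l in domain_seq:
        cum_bound += min_bound(l)
        if cum_bound > qos_delay:    # QoS already violated: reject early
            return False, cum_bound
    return True, cum_bound

# Illustrative per-domain minimum bounds along the route D1 -> D3 -> D4.
seg_bounds = {"D1": 3.0, "D3": 2.5, "D4": 1.0}
ok, total = inter_domain_path_ok(["D1", "D3", "D4"], seg_bounds.get, qos_delay=8.0)
```

The early rejection mirrors the per-domain QoS check in Algorithm 2: once the cumulative bound exceeds the requirement, later domains need not be evaluated.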
As shown in Algorithm 2, to obtain the path from the source to the destination in the entire requested flow, the path obtained by performing a topological sort is traversed to ensure that the QoS requirements for the traffic type are met. The process in which the GC receives the flow request from the LC and delivers it to the controller related to the flow through the algorithm is shown in
Figure 5.
By performing the above process, the proposed method can determine a flow that satisfies the QoS requirements within polynomial time, even when the number of domains is large.
In summary, switches report their queue statistics as defined in the OpenFlow specification [53], together with the information on the arrival and service processes that is additionally defined for the proposed method, to their LC periodically or aperiodically. The service rate is set to a fixed value as in [54], which enables stable estimation of the delay boundary. Each switch calculates the traffic rate based on the current queue statistics. With the information that the switches provide, the LC calculates the ED boundaries for source and destination node pairs using Equation (19). Among the various methods to transfer metric information between switches and controllers, LLDP control loops can be used. In this case, the overhead of the TLV (Type, Length, and Value) fields for the additional information transfer is 8 bytes, which is negligibly small.
Since the times at which switches send their statistics are not synchronized, the path determined by the LC may not be optimal [55]. However, LLDP messages are normally sent periodically, and immediately when significant changes occur. Thus, we assume that the lack of synchronization can be ignored.
When a flow passes through other domains, the LC requests the inter-domain path selection process from the GC by providing the ED boundary information of its intra-domain paths and the QoS requirements. Note that the GC maintains the inter-domain paths as a DAG based on the topology information received periodically from the LCs. Upon the request for an inter-domain flow, the GC determines the inter-domain path satisfying the QoS requirements and passes the path information to the LCs serving the flow.
4. Performance Evaluation
We implemented a network simulator using Riverbed Modeler 18.9 (formerly OPNET Modeler) with OpenDaylight 0.8.4 SDN controllers to verify the performance of the proposed method. We compared the proposed method (Proposed) with the Dijkstra algorithm without QoS consideration (Dijkstra) and DOLPHIN, which supports QoS in a multi-domain SDN environment [31] (DOLPHIN). We constructed three network topologies using the Abilene, GEANT, and Synth200 topologies with a variable number of hosts per domain, as shown in Figure 6. The number of local hosts per domain was configured to range from 50 to 200.
Table 3 shows the average number of flows according to the topology and the number of hosts. Traffic is generated by selecting an arbitrary source host, destination host, and QoS type. The number of flows increases with both the number of domains and the number of hosts.
Table 4 lists the traffic types with different delay constraints used in the experiments. There are three traffic types: Best-Effort data (BE Data), Voice over IP (VoIP), and Video on Demand (VOD), with different delay requirements, which were also used in [56,57]. The BE Data traffic has no QoS requirements and is treated with the lowest priority. VoIP traffic has the smallest delay requirement and the highest priority. VOD traffic also has a delay requirement, but it is less stringent than that of VoIP, and it has medium priority.
The CDF of the Maximum Link Utilization (MLU) for the Abilene topology is shown in Figure 7a. As shown in the figure, DOLPHIN and the proposed method, which apply QoS-aware traffic engineering, achieve lower link utilization than the simple path selection method without traffic engineering. In addition, the proposed method shows a lower MLU than DOLPHIN because it selects alternative paths faster.
The proposed method performed better than the compared methods in complex environments because its computation process is less complex when the topology is large. In particular, as shown in Figure 7b for GEANT and Figure 7c for Synth200, traffic engineering is applied efficiently, and most links are concentrated between 25% and 30% utilization.
The delay CDF graphs show the traffic delay CDFs in an environment with 50 hosts per domain and baseline traffic generated according to the parameters of each topology.
Figure 8a shows the delay for the Abilene topology; since the topology is relatively simple, delay arises from bottlenecks on links occupied at 70–80%. Figure 8b shows the delay for GEANT; since there are relatively many alternative paths to which traffic engineering can be applied, the delay is alleviated compared with the Abilene topology. For Synth200 in Figure 8c, which is a large-scale topology, the delay is low because the load can be evenly distributed over the links. Even in such an environment, the proposed method sets traffic-engineered flow paths with lower complexity, resulting in more flows with lower latency.
The average delay according to the number of hosts is shown in Figure 9. As the number of hosts increases, traffic saturation and, accordingly, traffic congestion increase.
Figure 9a shows the average latency in the Abilene topology. As the alternative paths are almost the same, the increase in latency is similar across methods as the number of hosts increases. However, Dijkstra sends traffic without route recalculation, so its latency increase with the number of hosts is greater than that of DOLPHIN and the proposed method. Figure 9b shows the average delay in the GEANT topology; the delay is lower than in the Abilene topology because GEANT has more alternative paths.
As shown in the MLU CDF graphs in Figure 7, when traffic increases with the number of hosts, the delay of Dijkstra and DOLPHIN, which exhibit high link saturation, is higher than that of the proposed method. As DOLPHIN uses a Dijkstra-based QoS support method, its delay is larger than that of the proposed method because of the processing time required for path rediscovery, although the slope of its delay increase with host count is milder. As the Synth200 topology in Figure 9c has many alternative paths and fewer bottlenecks than the other topologies, its latency is lower than that of the Abilene or GEANT topology. However, as the number of hosts increases, Dijkstra, which does not support QoS, shows a relatively high rate of delay increase, whereas DOLPHIN incurs delay from path search and thus has a higher delay than the proposed method.
As the mean delay is measured over both inter- and intra-domain paths, it is necessary to further measure whether end-to-end delay requirements are met for an accurate performance comparison. First, in the case of Dijkstra, the QoS performance is poor in all three topologies. In the case of DOLPHIN, which supports QoS, QoS is provided by its QoS support algorithm as the number of hosts increases, and in the Abilene topology of Figure 10a, which has few alternative paths, its QoS performance is almost similar to that of the proposed method. However, in environments with many alternative paths, a difference between DOLPHIN and Proposed emerges.
This is because the complexity required to compute the QoS requirements and derive the optimal flows is lower in the case of the proposed method.
Figure 10b shows a difference in QoS satisfaction between DOLPHIN and Proposed in the GEANT topology, where the number of domains is larger than in Abilene. Compared to Dijkstra, a large amount of traffic satisfies the QoS thanks to the QoS function of DOLPHIN. However, as the number of domains increases, the processing time of DOLPHIN increases, particularly for the 150- and 200-host cases, and its QoS satisfaction rate decreases.
Figure 10c shows that the average latency in the Synth200 topology is reduced compared to the GEANT topology because the number of domains increases and there are many alternative paths, resulting in fewer bottlenecks. In this environment, the QoS satisfaction can be lower than in GEANT because of sections in which the paths become longer.
In the case of Dijkstra, as the number of hosts increases, the amount of QoS-satisfied traffic decreases owing to the fixed routes of traffic transmission, whereas DOLPHIN increases the amount of satisfied traffic compared to Dijkstra owing to route modification for QoS support. With respect to the proposed method, it can be confirmed that the amount of traffic satisfying the QoS requirements is high, thanks to the fast processing time of the topological sort and the path selection considering the delay bound.
Simple Dijkstra approaches that do not support QoS have poor QoS satisfaction because they do not perform admission control for traffic with requirements. To increase QoS satisfaction, DOLPHIN and the proposed approach apply admission control: they perform traffic control for flows that do not satisfy the requirements of QoS-priority traffic, drop traffic that does not meet the requirements, and perform admission control for low-priority BE traffic. As shown in Figure 11, in the case of Abilene, which has few alternative paths, the 150- and 200-host cases with high traffic volumes show high drop rates for traffic that does not meet the requirements. Comparing the traffic drops for GEANT and Synth200, both methods drop VoIP traffic with strict requirements, and DOLPHIN shows higher drop rates than the proposed method for 150 and 200 hosts because of its high bandwidth and path calculation complexity.
5. Conclusions
There is a limitation to the application of QoS in a multi-domain environment owing to the imbalance of domain management information. Compared to traditional networks, SDN enables flexible network configuration and management through the control plane, and various studies have been conducted on it. This study proposed a flow determination method that supports QoS by applying an improved ED calculation in a multi-domain SDN environment. We proposed a DAG-based path selection method for inter-domain flows that satisfies QoS requirements by reducing the variability of the determinants within the ED boundaries of the local domains. To verify the performance of the proposed method, simulations were conducted comparing it with Dijkstra, which does not consider QoS, and DOLPHIN, which supports QoS in multi-domain environments.
Based on the comparison results by topology, the performance of DOLPHIN and the proposed method did not differ significantly in the Abilene topology because of its simple structure with fewer domains and limited alternative paths. In particular, the QoS performance in terms of traffic showed similar characteristics because the alternative paths were limited and the QoS-supporting paths were similar.
Differences were observed in the GEANT and Synth200 topologies because of the difference in computational complexity between the proposed method and DOLPHIN. Since the proposed method assumes that the time synchronization between the controller and the switches, which may drift in a real environment, is consistent, we consider an environment in which there is no time error in the envelope process for efficient delay calculation. However, since this is rare in real-world settings, for more general environments we should consider the control traffic overhead and adopt timestamping methods with less control overhead, as in [55]. In the future, we plan to conduct research on the generalization of the deciding factors.