Dynamic SDN Controller Load Balancing

Abstract


Approach
One approach, formulated as a satisfiability modulo theories (SMT) problem, takes into account the overhead derived from the switch-to-controller and controller-to-switch messages. This method achieves good results, as shown in [21]. In the approaches mentioned earlier in this section, all the balancing work is performed in the SC, which also gathers load data from the controllers and sends decisions back to them. "Hybrid Flow" suffers from a long run time caused by the dependency between the SC operation and the operations of the other controllers.

When the number of controllers or switches increases, the time required for the balancing operation increases as well. Table I summarizes the time complexity of the methods mentioned above. In this initial architecture, the "Reassignment" level is not sufficiently flexible for various algorithms, which motivated us to extend it. In this paper, the target is to leverage previous work [25], [26] and achieve load balancing among controllers, taking into account network scalability, algorithm flexibility, lower complexity, better optimization, and overhead reduction. To achieve these objectives, we use the DCF architecture and consider distance and load at the "Clustering" level, which influence the overhead and the response time at the "Reassignment" level, respectively.

Towards that target, the DCF architecture has been updated to enable the application of existing methods. In Figure 1 we show an illustration of our proposed architecture with an example of the re-clustering process, where the master of each cluster is shown in green and SC denotes the Super Controller. In this three-tier architecture, we observe two levels of load operations: "Clustering" and "Reassignments". At the top level, "Clustering", the SC organizes the controllers into clusters. At the lower level, "Reassignments", each cluster has a Master Controller (MC) responsible for the load balancing inside the cluster by dynamically reassigning switches to controllers. To define the "Clustering" problem for the high level of the load balancing operation, two aspects are considered: first, minimizing the differences between the clusters' loads, and second, minimizing the distances between the controllers in each cluster.

In this paper, we do not propose a new method for performing load balancing inside each cluster. In this section, the assignment problem of controllers to clusters is presented; it is considered here as a minimization problem with constraints.

We consider a control plane C with M controllers, denoted by C = {C_1, C_2, ..., C_M}, where C_i is a single controller. We assume that the processing power of each controller is the same and equal to P, which stands for the number of requests per second that it can handle. Let d_ij be the distance (number of hops) between C_i and C_j. We denote by G_i the i-th cluster and by G = {G_1, G_2, ..., G_K} the set of all clusters. We assume that M/K is an integer; it is the number of controllers per cluster. Thus, the size of each cluster is M/K, i.e., we assume that each cluster consists of the same number of controllers. Y denotes a matrix, handled by the SC, which encodes the matching of each controller to a single cluster. Each column of Y represents a cluster and each row a controller.
Figure 2 shows an example of such a matrix Y (9×3) corresponding to nine controllers split into three clusters. On the left we see the Y matrix before the re-clustering process, and on the right after it is completed.

Thanks to Y, one can know which controller is in which cluster as follows. If a controller is included in a cluster, then the entry at the controller's row and the cluster's column is 1; otherwise it is 0. For instance, we see that before re-clustering, controller number 5 is in cluster b. After re-clustering (the right matrix), controller 5 is no longer in b but has been moved to cluster c.
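The membership encoding above can be sketched directly. This is a minimal illustration of the binary assignment matrix; the 9×3 shape follows Figure 2, but the concrete memberships below are illustrative, not the paper's exact values:

```python
# Y[i][k] = 1 iff controller i belongs to cluster k.
def make_Y(assignment, K):
    """assignment[i] = cluster index (0-based) of controller i."""
    return [[1 if k == c else 0 for k in range(K)] for c in assignment]

# Clusters a, b, c -> columns 0, 1, 2. Controller 5 is row index 4.
before = make_Y([0, 0, 0, 1, 1, 1, 2, 2, 2], K=3)  # controller 5 in cluster b
after  = make_Y([0, 0, 0, 1, 2, 1, 2, 2, 2], K=3)  # controller 5 moved to cluster c

# Every controller belongs to exactly one cluster: each row of Y sums to 1.
assert all(sum(row) == 1 for row in before + after)
print(before[4], after[4])  # [0, 1, 0] [0, 0, 1]
```

The row-sum invariant is exactly the "each controller is associated with one cluster" constraint used later in the problem formulation.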

Therefore, Y is a binary M×K matrix. (Notation: the controller load l_j(t) is the average flow request rate of the j-th controller per second in time slot t; the Super Controller (SC) collects the controllers' loads from the masters and performs the re-clustering.)

As we mentioned in Section III, the first aspect of the high-level load balancing is to achieve balanced clusters (in this paper we assume that all controllers have the same processing capabilities, so balanced clusters are the overall optimal allocation). For this purpose, the gaps between their loads must be narrowed. A cluster load is defined as the sum of the average loads of the controllers included in it:

L_i(t) = Σ_{j ∈ G_i} l_j(t),

where i is the cluster number and G_i is the set of controllers in the cluster.

To measure how far a cluster's load is from the other clusters' loads, we derive the global average of the cluster loads:

Avg(t) = (1/K) Σ_{i=1}^{K} L_i(t),

where K is the number of clusters.

Then, we define the distance of a cluster's load from the global average load (denoted above Avg) as D_i(t) = |L_i(t) − Avg(t)|. In a second step, we define a metric that measures the total load difference between the clusters' loads as follows:

ς(t) = Σ_{i=1}^{K} |L_i(t) − Avg(t)|.

C. Distances between controllers within the same cluster

In this section, we focus on the second aspect of the high-level load balancing (mentioned in Section III-B). The rationale behind this distance optimization is as follows. Since in the initialization phase switches are matched to the closest controller, when we perform load balancing inside the same cluster we want the other controllers to also be close to each other; otherwise, a switch might now be matched to a new controller far from it. For that purpose, we define the maximal distance between controllers within the same cluster (over all the clusters) as follows:

n(t) = max_c max_{i,j ∈ c} d_ij,

where c is the cluster number and i, j are controllers in cluster c.
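The three quantities above translate directly into code. A minimal sketch (the variable names `loads`, `clusters`, and `dist` are ours) of the cluster load, the total imbalance, and the maximal intra-cluster distance:

```python
def cluster_load(cluster, loads):
    """L_i(t): sum of the average loads of the controllers in one cluster."""
    return sum(loads[c] for c in cluster)

def imbalance(clusters, loads):
    """Total imbalance: sum over clusters of |L_i - Avg|, Avg being the mean cluster load."""
    L = [cluster_load(g, loads) for g in clusters]
    avg = sum(L) / len(L)
    return sum(abs(x - avg) for x in L)

def max_intra_distance(clusters, dist):
    """n(t): maximal hop distance between two controllers of the same cluster."""
    return max(dist[i][j] for g in clusters for i in g for j in g if i != j)

loads = [10, 20, 30, 40]                      # per-controller average loads
clusters = [[0, 3], [1, 2]]                   # two perfectly balanced clusters
dist = [[0, 1, 2, 3],
        [1, 0, 1, 2],
        [2, 1, 0, 1],
        [3, 2, 1, 0]]                         # hop-count matrix d_ij
print(imbalance(clusters, loads))             # 0.0 -> perfectly balanced
print(max_intra_distance(clusters, dist))     # 3  (controllers 0 and 3 share a cluster)
```

Note the tension the text describes: the partition [[0, 3], [1, 2]] is perfectly load-balanced but pairs the two most distant controllers, which is what the constraint on n(t) guards against.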

Obviously, the best result would be to reach the minimum possible n(t): if the controllers are close to each other, then the overhead of the messages exchanged between them is less significant, whereas if they are far from each other, a multihop path is required, which clearly impacts the traffic on the control plane. However, if the constraint on n(t) is too strict, it might not allow us enough flexibility to perform load balancing. Therefore, we propose to define the minimum distance required to provide enough flexibility for the load balancing operation, denoted "minMaxDistance".

If the value of minMaxDistance is not large enough, it is possible to adjust it by adding an offset to it.

Finally, we denote by Cnt the constraint on the maximal distance, requiring n(t) ≤ Cnt = minMaxDistance + offset (Eq. 8).

Our goal is to find the best clustering assignment, as defined by Y(t), which minimizes ς(t) (Eq. 6) and at the same time fulfills the distance constraint (Eq. 8). Therefore, the problem can be formulated as: minimize ς(t) (Eq. 9) subject to the distance and cluster-size constraints. Such a constrained partition problem is, in general, computationally intractable and impractical (unless P = NP).
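Although the problem is intractable at scale, on tiny instances one can still enumerate every equal-size partition and obtain the exact optimum, which is useful for sanity-checking heuristics. A minimal sketch under our own data layout (`loads`, `dist`, and the distance bound `cnt` are illustrative names):

```python
from itertools import combinations

def balanced_partitions(items, k):
    """Yield all partitions of items into k equal-size groups (len divisible by k)."""
    items = list(items)
    if k == 1:
        yield [items]
        return
    size = len(items) // k
    first, rest = items[0], items[1:]          # pin the first item to avoid duplicates
    for group in combinations(rest, size - 1):
        g = [first] + list(group)
        remaining = [x for x in rest if x not in group]
        for sub in balanced_partitions(remaining, k - 1):
            yield [g] + sub

def optimal_clustering(loads, dist, k, cnt):
    """Exhaustive search: minimize total imbalance subject to max distance <= cnt."""
    best, best_val = None, float("inf")
    for part in balanced_partitions(range(len(loads)), k):
        if max(dist[i][j] for g in part for i in g for j in g if i != j) > cnt:
            continue                            # violates the distance constraint
        L = [sum(loads[c] for c in g) for g in part]
        avg = sum(L) / k
        val = sum(abs(x - avg) for x in L)
        if val < best_val:
            best, best_val = part, val
    return best, best_val

loads = [5, 1, 4, 2]
dist = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
print(optimal_clustering(loads, dist, k=2, cnt=10))  # ([[0, 1], [2, 3]], 0.0)
```

The number of balanced partitions grows super-exponentially in M, which is why the paper resorts to the two-phase approximation described next.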

In this paper, we propose an approximation algorithm to solve these problems. We adapt the K-Center problem solution for the initial clustering, and use game-theoretic techniques to satisfy our objective function under the distance constraint. In this section, we divide the DCC problem into two phases and present our solutions for each of them.

In the first phase, we define the initial clusters. We show some possibilities for the initialization that refer to distances between controllers and load differences between clusters. In the second phase, we improve the results: we further reduce the differences between cluster loads, without violating the distance constraint, by means of our replacement algorithm. We also discuss the connections between these two phases and the advantages of using this two-phase approach for optimizing the overall performance. The aim of the initial clustering is to provide the best possible starting point for the second phase.

There are two possibilities for the initialization. The first is to focus on the distance, that is, to seek an initial clustering which satisfies the distance constraint; the second is to focus on minimizing the load differences between clusters. Algorithm 1 is used to find the MCs. Given a set of centers C, the k-center clustering price of a point set P by C is ‖P‖_C^∞ = max_{p∈P} d(p, C). After Algorithm 1 finds the K masters, we partition the controllers among the masters while keeping the number of controllers in each group at most M/K, as illustrated in Heuristic 2.
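The capacity-limited nearest-master assignment of Heuristic 2 can be sketched as follows. This is our reading, not the paper's exact pseudocode: `masters` would come from the k-center step (Algorithm 1), and the data layout is assumed:

```python
def assign_to_masters(masters, num_controllers, dist, cap):
    """Attach each non-master controller to the nearest master whose cluster
    still has fewer than cap = M/K members (the master counts as a member)."""
    clusters = {m: [m] for m in masters}
    for c in range(num_controllers):
        if c in clusters:
            continue                                       # masters stay in place
        open_masters = [m for m in masters if len(clusters[m]) < cap]
        nearest = min(open_masters, key=lambda m: dist[c][m])
        clusters[nearest].append(c)
    return clusters

dist = [[0, 1, 9, 9],
        [1, 0, 9, 9],
        [9, 9, 0, 1],
        [9, 9, 1, 0]]
print(assign_to_masters([0, 3], 4, dist, cap=2))  # {0: [0, 1], 3: [3, 2]}
```

Because the total spare capacity K·(cap − 1) equals the number of non-master controllers, every controller finds an open cluster; the capacity check is what enforces the equal-size constraint.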

As depicted in Heuristic 2, lines 1-2 prepare the set S that contains the list of controllers to assign. Lines 3-5 define the initial empty clusters, with one master for each. In lines 7-15 (the while loop), the candidate clusters are those with fewer than M/K controllers, and each controller is assigned to the nearest master among these candidates. After the controllers are organized into clusters, we check the maximal distance between any two controllers in lines 16-19; this value is used as the "maxDistance" (used in Eq. 8).

2) Initial clustering based on load only: If the overhead generated by additional traffic to distant controllers is not an issue (for example, due to broadband links), then we should consider this type of initialization, which puts the emphasis on the controllers' loads. In this case, we must arrange the controllers into clusters according to their loads. To achieve a well-distributed load for all the clusters, we aim for a "min-max", i.e., we would like to minimize the load of the most loaded cluster. As mentioned earlier (in IV-A), we assume the same number of controllers in each cluster; we enforce this via a constraint on the size of each cluster (see Heuristic 3).

In the following, we present a greedy technique to partition the controllers into clusters (Heuristic 3). The basic idea is that each iteration fills the least loaded cluster with the most loaded remaining controller.

In Heuristic 3, line 1 sorts the controllers by load. In lines 2-9, each controller, starting with the heaviest one, is matched to the group with the minimum cost function Cost_g(C), provided the group size is less than M/K, where

Cost_g(C) = CurrentClusterSum + C_load. (13)

"CurrentClusterSum" is the sum of the controllers' loads already handled by cluster g, and C_load is the load of the controller that will be handled by that cluster. Regarding the time complexity, sorting the M controllers dominates. The outcomes of the two types of initialization presented so far (Section V-A), namely "distance" and "load", are used as input for the second phase.
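The greedy load-based initialization can be sketched as below; this follows our reading of Heuristic 3, with illustrative variable names:

```python
def greedy_load_clustering(loads, k):
    """Heaviest controller first, into the open cluster minimizing
    Cost_g(C) = CurrentClusterSum + C_load, with at most M/K controllers per cluster."""
    cap = len(loads) // k                      # equal-size constraint M/K
    order = sorted(range(len(loads)), key=lambda c: loads[c], reverse=True)
    clusters = [[] for _ in range(k)]
    sums = [0] * k                             # CurrentClusterSum per cluster
    for c in order:
        open_gs = [g for g in range(k) if len(clusters[g]) < cap]
        g = min(open_gs, key=lambda g: sums[g] + loads[c])   # Cost_g(C)
        clusters[g].append(c)
        sums[g] += loads[c]
    return clusters, sums

clusters, sums = greedy_load_clustering([9, 7, 5, 3], k=2)
print(clusters, sums)  # [[0, 3], [1, 2]] [12, 12]
```

Since C_load is the same for every candidate cluster of a given controller, minimizing Cost_g(C) is equivalent to picking the currently least loaded open cluster, which is the "longest processing time first" flavor of greedy scheduling.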

It should be noted that since the "maxDistance" constraint is an output of the distance-based initialization (Heuristic 2), the first phase is mandatory when the distance constraint is tight. On the other hand, the load-based initialization (Heuristic 3) is not essential for performing load balancing in the second phase, but it can accelerate its convergence.

In the second phase, we apply coalition game theory [28]. We can define a rule to transfer participants from one coalition to another. The outcome of the initial clustering process is a partition, denoted Θ, defined on the set C, that divides C into K clusters with M/K controllers each. Each controller is associated with one cluster; hence, the controllers that are connected to the same cluster can be considered participants in a given coalition. We now leverage coalition game theory in order to minimize the load differences between clusters, or to improve the balance if an initial load-balancing clustering has already been performed, as in V-A2.

A coalition structure is defined by a sequence B = {B_1, B_2, ..., B_l}, where each B_i is a coalition. In general, a coalition game is defined by the triplet (N, v, B), where v is a characteristic function, N is the set of elements to be grouped, and B is a coalition structure that partitions the N elements [28]. In our problem, the M controllers are the elements and G is the coalition structure, where each group of controllers G_i is a coalition. Therefore, we can define the coalition game by the triplet (M, v, G), where v = ς(t). The second phase can be considered a coalition formation game, in which each element can change its coalition provided this increases its benefit, as we define in the following.
For this purpose, we define the Replacement Value (RV). A replacement involves two controllers C_i and C_j with loads l_i(t) and l_j(t), respectively, and two clusters a and b with loads L_a and L_b, respectively. We use the notations "old" and "new" to indicate a value before and after the replacement.
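The RV equation itself is not reproduced above, so the following is our assumed form: the reduction in the two clusters' total distance from the global average Avg when C_i (in cluster a) and C_j (in cluster b) swap clusters.

```python
def replacement_value(L_a, L_b, l_i, l_j, avg):
    """Assumed RV: old minus new total distance of clusters a and b from Avg."""
    old = abs(L_a - avg) + abs(L_b - avg)
    new_L_a = L_a - l_i + l_j              # "new" loads after the replacement
    new_L_b = L_b - l_j + l_i
    new = abs(new_L_a - avg) + abs(new_L_b - avg)
    return old - new                       # positive -> the swap improves balance

# ReplacementRule sketch: perform the swap only when RV > 0.
print(replacement_value(L_a=14, L_b=6, l_i=5, l_j=1, avg=10))   # 8  -> swap
print(replacement_value(L_a=10, L_b=10, l_i=5, l_j=1, avg=10))  # -8 -> keep
```

Under this reading, a replacement is accepted exactly when it moves both cluster loads closer to the global average on aggregate, which matches the coalition-formation rule that an element changes coalition only if it increases its benefit.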

In Figure 3, the sum of the loads' distances from the global average before the replacement is x + y. In the other symmetrical options, the result is the same.

In Figure 4, the sum of distances from the global average before the replacement is x + y, and this sum after the replacement is x + (l_i(t) + l_j(t)) + (l_i(t) + l_j(t)) − y > x + y. In the other symmetrical options, the result is the same.
The RV is thus a value that can be greater than or less than zero. Using the RV, we define the following "ReplacementRule".

The time complexity of line 1 in Algorithm 4, i.e., finding the best replacement, is dominated by the search over candidate controller pairs. The DCC can run the second option without any distance constraint (Line 6). In Line 11, it chooses the best solution in such cases (referring to the minimal load differences) from the following three options.

In this section, our aim is to prove how close our algorithm is to the optimum. Because the capacities of the controllers are identical, the minimal difference between clusters is achieved when the controllers' loads are equally distributed among the clusters, i.e., when the clusters' loads are equal to the global average, namely ς(t) = 0. Since in the second phase, i.e., in the replacements, the full DCC algorithm is the one that sets the final partition and therefore determines the optimality, it is enough to provide a proof for it.

As mentioned before, the replacement process finishes when no replacement has a positive RV, at which time no replacement of any two controllers can improve the result. Figure 5 shows the situation for each pair of clusters at the end of the algorithm.

For each pair of clusters where the load of one cluster is above the global average and the load of the second is below it, the formula of Eq. 15 holds. We begin by considering the most loaded cluster and the most under-loaded cluster. When the cluster size is g, we define X_1 to contain the lowest g/2 controllers and X_2 to contain the next lowest g/2 controllers. In the same way, we define Y_1 to contain the highest g/2 controllers and Y_2 to contain the next highest g/2 controllers. In the worst case, the upper cluster holds the controllers from the Y_1 group and the lower cluster holds the controllers from the X_1 group. Since the loads of the clusters are balanced, according to Formula 15 we can take the lowest difference between a controller in the upper cluster and a controller in the lower cluster to obtain a bound on the sum of the distances of these two clusters' loads from the overall average. The sum of distances from the overall average of these two clusters is at most the difference between the two controllers, i.e., between the one with the lowest load among the g most loaded controllers and the one with the highest load among the g least loaded controllers.
We used this latter option to generate each scenario in the following figures.

First, we show that the bound for the ς(t) function is met. We used 30 controllers divided into 5 clusters and ran 60 different scenarios, each with a random topology and random controllers' loads. Figure 6 shows the optimality bound (Eq. 15), which appears as a dashed line. The innerBalance parameter sets the quality of the balance inside each cluster. Our algorithm balanced the load between clusters, and the different results indicate the quality of the balance. The distances between controllers were randomly chosen in the range [1, 100].

We ran these simulations with different cluster sizes: 2, 3, 5, 10, and 15. The results showed that when the cluster size increases, the gap between the difference bound and the actual difference also increases. We can also see that when the cluster size is too big (15) or too small (2), the final results are less balanced. The reason is that too small a cluster does not contain enough controllers for flexible balancing, while too big a cluster reduces the number of clusters and thus the flexibility between them. We got similar results when running 50 controllers with cluster sizes 2, 5, 10, and 25.
As the number of controllers increases, the distance between the difference bound and the actual difference increases, because the bound is calculated according to the worst-case scenario. Figure 7 shows this growing distance as the number of controllers increases; the results are for 5 controllers per cluster over 50 network scenarios.

We now turn to the number of replacements required. As shown in Figure 8, the actual number of replacements is lower than the bound. The results are for 30 controllers and 10 clusters over 40 different network scenarios (as explained above for Fig. 6). The number of clusters affects the number of replacements: as the number of clusters increases, the number of replacements increases. Figure 9 shows the average number of replacements over the 30 network configurations with 100 controllers, as the number of clusters increases.

As noted, the initialization of step 1 in the DCC algorithm reduces the number of replacements required in step 2. Figure 10 depicts the number of replacements required with and without the initialization of step 1. The results are for 75 controllers and 15 clusters over 50 different network scenarios.

As mentioned previously in Section V-A1, during the initialization we can also consider the constraint on the distance (although it is not mandatory; in Section V-A2 we presented an initialization based on load only). Thus, if a controller-to-controller maximal distance constraint is important, we have to compute the lower bound on the maximal distance. By adding the offset defined by the user to this lower bound, an upper bound called "Cnt" is calculated (Eq. 8). Figure 11 shows the final maximal distance.

Finally, we compare our method of dynamic clusters with a method of fixed clusters. As a starting point, the controllers are divided into clusters according to the distances between them (Heuristic 2). In each time cycle, the clusters are rearranged according to the controllers' loads of the previous time cycle. The change in the load status from cycle to cycle is defined by the following transition function:

f(l_i(t)) = min(l_i(t) + random(range), P) if random(0, 1) = 1, and max(l_i(t) − random(range), 0) otherwise,

so that each controller's load increases or decreases randomly. We set the range to 20 and P to 1000. Figure 12 depicts the results with 50 controllers partitioned into 10 clusters. The results show that the differences between the clusters' loads are lower when the clusters are dynamic.
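The cycle-to-cycle load transition described above can be sketched as follows. The min/max clamping to [0, P] is our reading of the formula; spread = 20 and P = 1000 follow the text:

```python
import random

def next_load(load, spread=20, P=1000):
    """Randomly increase or decrease a controller's load, clamped to [0, P]."""
    if random.randint(0, 1) == 1:
        return min(load + random.uniform(0, spread), P)   # random increase, capped at P
    return max(load - random.uniform(0, spread), 0)       # random decrease, floored at 0

random.seed(7)
trace = [995.0]
for _ in range(5):
    trace.append(next_load(trace[-1]))
print(trace)  # a short random walk that never leaves [0, 1000]
```

Applying this function to every controller once per cycle produces the drifting load pattern against which the dynamic and fixed clusterings are compared in Figure 12.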

Following Fig. 12, we ran simulations (see Table IV). We propose a system (made of multiple algorithms) that assigns controllers to clusters while optimizing the load balance under a constraint on the maximal distance between two controllers in the same cluster. We show that using dynamic clusters provides better results than fixed clustering.

In future research, we plan to explore the optimal cluster size, and allow clusters of different sizes.

An interesting direction concerns overlapping clusters. Another direction is to examine the required ratio