Solution of the Simultaneous Routing and Bandwidth Allocation Problem in Energy-Aware Networks Using Augmented Lagrangian-Based Algorithms and Decomposition

We discuss several algorithms for solving, in a decomposed way, a network optimization problem of simultaneous routing and bandwidth allocation in green networks, based on the augmented Lagrangian. The problem is difficult due to the nonconvexity caused by binary routing variables. The chosen algorithms, which are several versions of the Multiplier Method, including the Alternating Direction Method of Multipliers (ADMM), have been implemented in Python and tested on data from several networks. We derive formulations handling the inequality constraints for the Bertsekas, Tatjewski and SALA methods, which were originally formulated for problems with equality constraints. We also introduce modifications to the Bertsekas and Tatjewski methods, without which they do not work for an MINLP problem. The final comparison of the performance of these algorithms shows a significant advantage of the augmented Lagrangian algorithms that use decomposition for big problems. In our particular case of the simultaneous routing and bandwidth allocation problem, these algorithms seem to be the best choice.


Introduction
In 2022, data transmission networks consumed 260-360 TWh, or 1-1.5% of global electricity use [1]. Strong growth in demand for data center and network services is expected to continue at least until the end of this decade [2], especially because of video streaming and gaming. On the other hand, the global energy crisis has caused increases in the prices of electricity and energy commodities and the threat of their shortages. All this makes it more important than ever to use the network infrastructure efficiently, to guarantee the highest possible quality of service with the lowest possible energy consumption.
In the standard approach, network congestion control, along with active queue management algorithms, tries to minimize an aggregated cost with respect to source rates, assuming that routing is given and constant over the time scale of interest. However, it seems that it would be more effective to treat the transport and network layers together and minimize cross-layer costs on the time scale of routing changes, taking the energy component into account.
In paper [3], an approach to solving such a simultaneous routing and flow rate optimization problem in energy-aware computer networks was presented. The formulation considered two criteria: the first was the cost of poor quality of service, and the second was the energy consumption. It was shown earlier [4] that such problems, where routing variables are binary, are NP-hard even in the simplified version without the energy component, which means that for bigger networks it is impossible to obtain the optimal solution in a reasonable time frame. Ordinary Lagrangian relaxation cannot help us obtain a good assessment of the solution: as in most integer linear programming (ILP) and mixed-integer nonlinear programming (MINLP) problems [5,6], a duality gap can be observed [7] (that is, a nonzero difference between the minimum of the original problem and the maximum of the dual function), along with the violation of many constraints.
In the case of nonconvex problems, the most popular remedy to obtain strong duality (that is, a zero gap) and to ensure that the constraints are met is to use the augmented Lagrangian [8-14] and the multiplier (in other words, shifted penalty function) method based on it [15-17]. Since we solve a network MINLP problem, the only formulations of interest to us are those which involve nonconvex constraints or discrete sets. As is stated in [13], continuous optimization over concave constraints can be used to represent the integrality condition. The simplest method of converting a discrete set X̃ to an equality constraint with a continuous function, for any x ∈ X ⊆ R^n, where X̃ ⊆ X and X is compact, is to use the distance function d(x, X̃) = min_{z∈X̃} ∥x − z∥ and the constraint d(x, X̃) = 0 [13]. Other methods of continuous reformulation of mixed-integer optimization problems that may be effective in some problems are presented in [18].
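A minimal sketch of this distance-function reformulation (our own toy illustration, not taken from [13] or [18]) for the simplest binary case X̃ = {0, 1}:

```python
# Continuous reformulation of the integrality condition x in {0, 1}:
# d(x, Xtil) = min_{z in Xtil} |x - z| is a continuous function, and
# d(x, Xtil) = 0 holds exactly when x belongs to the discrete set Xtil.

def dist_to_discrete(x, points=(0.0, 1.0)):
    """Distance d(x, Xtil) from a scalar x to a finite discrete set Xtil."""
    return min(abs(x - z) for z in points)

# d vanishes exactly on the discrete set ...
print(dist_to_discrete(0.0))  # 0.0
print(dist_to_discrete(1.0))  # 0.0
# ... and is positive (a measure of "fractionality") elsewhere
print(dist_to_discrete(0.4))  # 0.4
```

Note that on [0, 1] this function equals min(x, 1 − x), which is concave, so the constraint d(x, X̃) = 0 is exactly the kind of nonconvex (concave) constraint mentioned above.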
Unfortunately, although MILP and MINLP optimization problems have been present from the very beginning in the most important works on the augmented Lagrangian approach [15], and they are very important in practical problems [19-22], most of the works dealing with nonconvex problems consider only nonconvexity of the objective functions [23,24]. There are relatively few proposed methods and algorithms that also consider nonconvexity of constraints [21,25-29], and even in a recent paper the following statement can be found: "Despite the recent development of distributed algorithms for nonconvex programs, highly complicated constraints still pose a significant challenge in theory and practice" [30]. The proposed algorithms are rather complicated, involving many levels and loops, corrections, linearizations, etc. [21,30-32].
In our paper, we try to solve the two-layer network optimization problem using specially adapted classical methods proposed in the literature for problems with nonconvex constraints, which also have some potential for parallelization. For this reason, the intensively developed methods of the ADMM family (Alternating Direction Method of Multipliers), which as Gauss-Seidel-type methods are inherently sequential, are treated on an equal footing with the others. If necessary, for example, when an algorithm has been formulated for a different type of constraints than those in our problem, we adapt it.
The paper is organized as follows. We first present the studied network problem in Section 2; it is one of the problems presented in [7]. Next, in Sections 3 and 4, we review, respectively, the basic augmented Lagrangian algorithms proposed for problems with nonconvex constraints and the methods of their decomposition suitable for parallel computing, with some modifications proposed by us. Section 5 describes the transformations of our problem needed to use the algorithms presented in Section 4. The results of numerical tests on network problems are shown in Section 6, and the conclusions follow in Section 7.

Network Optimization Problem of Simultaneous Routing and Bandwidth Allocation in Energy-Aware Networks
We can describe the problem of optimizing routing and flow rates simultaneously as identifying scalar flow rates x_w and routes (single paths) b_w that satisfy the network constraints for given source-destination pairs (S(w), D(w)), where w is a demand (flow) from the set W, and that deliver the minimal total cost. Routes b_w are vectors built of binary indicator variables b_wl stating whether a link l from the set L is used by the connection w.
The problem can be formulated in the following way [7]: ∑_{w∈W} y_wl ≤ c_l, ∀l ∈ L (4) where N is the set of all nodes and A is the set of all arcs of the network. This is the third formulation from the paper [7], denoted there by P_alt. The objective function (1) is quadratic and convex. The first component of the function f_w(x_w, b_w) expresses a cost for not delivering the full possible bandwidth for a connection w ∈ W; the second component expresses the total cost of energy used in the network by the connection w ∈ W. The flow conservation law equations are formulated with auxiliary real variables y_wl (the function e(·,·) numbers the arcs) and binary variables b_wl: the equality (2) and three inequalities (5)-(7). Constraints (7) force single-path routing, and constraints (6) keep the relation between the auxiliary variables and the binary variables.

Augmented Lagrangian Algorithms
We consider the constrained optimization problem of the form min_{x∈X} f(x) (9) s.t. h(x) = 0 (10) where the function f : R^n → R is continuous and differentiable, h : R^n → R^m is linear, m < n, and the set X ⊆ R^n is compact. For simplicity, we do not consider inequality constraints, because they can either be transformed to equality constraints (10) with the help of (additional) slack variables or be a part of the definition of the set X.
In the 1950s, to solve the minimization problem (9)-(10) in the convex case, the Lagrangian relaxation method was proposed [33-35], using the Lagrangian function L : R^(n+m) → R defined as L(x, λ) = f(x) + λ^T h(x), (11) where λ ∈ R^m is the vector of Lagrange multipliers. This method uses the associated dual functional q for (11), given as q(λ) = min_{x∈X} L(x, λ). According to duality theory, the solution maximizes the dual function with respect to the Lagrange multipliers vector; that is, we should calculate max_λ q(λ). Usually, we iterate alternately on x and λ in the following way: x^(k+1) = arg min_{x∈X} L(x, λ^k), λ^(k+1) = λ^k + ρ_k h(x^(k+1)), where ρ_k is a scalar stepsize coefficient. The major drawback of the Lagrangian relaxation method is the requirement that the problem possess a convex structure, which reduces its applicability. Applying it to nonconvex problems, including problems with discrete variables, can result in a duality gap, whereby the maximum of the dual function does not equal the minimum of the original problem.
Powell [36] and Hestenes [37] convexified the problem (9)-(10) by adding to the Lagrangian the squared norm of the equality constraints, L_ρ(x, λ) = f(x) + λ^T h(x) + (ρ/2)∥h(x)∥², where ρ > 0 is a scalar penalty parameter. This function is convex in the neighborhood of (x*, λ*) when ρ is taken sufficiently large and if at (x*, λ*) the second-order sufficient optimality condition is satisfied. The method of multipliers algorithm for solving problem (9)-(10), based on the augmented Lagrangian, consists of the following iterations: x^(k+1) = arg min_{x∈X} L_ρ(x, λ^k), (18) λ^(k+1) = λ^k + ρ h(x^(k+1)). (19) When X = R^n, to reduce the steps of the algorithm (18)-(19), a multiplier estimate function λ(x), derived from the first-order Karush-Kuhn-Tucker optimality condition [33], can be used [38,39] (formula (20)). This formula is particularly useful when λ(x) can be calculated analytically, e.g., when the functions h(x) are linear. Then, having λ(x), one can replace the two-level algorithm (18)-(19) with a single-level one.
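The two-level iteration (18)-(19) can be illustrated on a small equality-constrained quadratic problem (our own toy example, not the network problem of Section 2; for this quadratic case the inner minimization has a closed form):

```python
import numpy as np

# Method of multipliers (18)-(19) sketch on a toy equality-constrained QP:
#     minimize  x1^2 + x2^2   subject to  h(x) = x1 + x2 - 1 = 0.
# The augmented Lagrangian is x1^2 + x2^2 + lam*h(x) + (rho/2)*h(x)**2;
# for this problem its minimizer is available in closed form.

def method_of_multipliers(rho=10.0, iters=30):
    lam = 0.0
    for _ in range(iters):
        # x-update (18): stationarity 2*x_i + lam + rho*h(x) = 0 gives x1 = x2
        x = (rho - lam) / (2.0 + 2.0 * rho) * np.ones(2)
        h = x.sum() - 1.0
        # multiplier update (19): the shifted-penalty step
        lam = lam + rho * h
    return x, lam

x, lam = method_of_multipliers()
print(x, lam)  # approaches x* = (0.5, 0.5), lam* = -1
```

One can check analytically that the multiplier error shrinks by the factor 1/(1 + ρ) per iteration here, so larger ρ gives faster convergence, at the price of a more ill-conditioned inner problem in general.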

Classical Separable Problem Formulation
The problem (9)-(10) is said to be separable if it can be written as min_x ∑_{i=1}^{N} f_i(x_i) (22) s.t. ∑_{i=1}^{N} h_i(x_i) = 0 (23) x_i ∈ X_i, i = 1, . . ., N (24) where x_i ∈ R^(n_i) are subvectors of the decision variables, x = (x_1, . . ., x_N) ∈ R^n, n = ∑_{i=1}^{N} n_i, h_i : R^(n_i) → R^m are given vector functions that describe the constraints, and f_i : R^(n_i) → R, i = 1, . . ., N. The set X is the Cartesian product of the sets X_i in the corresponding subspaces.
The augmented Lagrangian for (22)-(24) can be written as L_ρ(x, λ) = ∑_{i=1}^{N} [f_i(x_i) + λ^T h_i(x_i)] + (ρ/2)∥∑_{i=1}^{N} h_i(x_i)∥². (25) The separable problem (22)-(24) yields a nonseparable augmented Lagrangian due to the last, quadratic penalty, term; hence, decomposition algorithms cannot be applied directly to the dual equivalent of the problem (22)-(24). To avoid the nonseparability, some tricks are necessary, which we describe in the following subsections. The most important conditions for convergence of the presented algorithms to the optimal point x* for sufficiently large ρ are that all functions from the problem statement are of class C² in a neighborhood of the optimal point x*, that the gradients of all active constraints are linearly independent at x*, and that the second-order sufficient optimality condition (17) is satisfied (for the ordinary Lagrangian of the problem).
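The coupling introduced by the quadratic penalty can be seen numerically (a toy check of ours): with h(x) = h₁(x₁) + h₂(x₂), the term (ρ/2)∥h(x)∥² contains the cross product ρ h₁(x₁)ᵀh₂(x₂), so it is not a sum of per-block functions.

```python
# The quadratic penalty couples the subproblems: with scalar constraint
# values h1 = h1(x1) and h2 = h2(x2), the term (rho/2)*(h1 + h2)**2
# contains the cross term rho*h1*h2, so the augmented Lagrangian of a
# separable problem (22)-(24) is not a sum of per-block terms.

def penalty(h1, h2, rho=2.0):
    return 0.5 * rho * (h1 + h2) ** 2

# If the penalty were separable, penalty(h1, h2) would equal
# penalty(h1, 0) + penalty(0, h2); the cross term breaks this:
print(penalty(1.0, 2.0))                      # 9.0
print(penalty(1.0, 0.0) + penalty(0.0, 2.0))  # 5.0 (difference = rho*h1*h2 = 4.0)
```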

Bertsekas Decomposition Method
Bertsekas [25] proposed a convexification method that preserves the separability of a modified Lagrangian. It belongs to the family of proximal point algorithms and may be interpreted as a multiplier method in disguise [40]. The penalty component now expresses not the squared norm of the constraint function but the squared distance between the vector x and a parameter vector s from its vicinity. In subsequent iterations, this vector approaches x, leading to convergence, while preserving separability. The resulting Lagrangian is L_ρ(x, λ, s) = f(x) + λ^T h(x) + (ρ/2)∥x − s∥². (26) This function is separable and locally convex for ρ from some interval, provided that the above-mentioned conditions are satisfied at the point x* [25]. In a decomposed form, the augmented Lagrangian (26) can be written as the sum L_ρ(x, λ, s) = ∑_{i=1}^{N} L_{ρ,i}(x_i, λ, s_i), where L_{ρ,i}(x_i, λ, s_i) = f_i(x_i) + λ^T h_i(x_i) + (ρ/2)∥x_i − s_i∥². The Bertsekas [25] approach was adapted by us to an MINLP problem. The most efficient version proved to be a two-level one, with an additional scaling coefficient β ≪ 1 in the formula for updating the Lagrange multipliers. This algorithm may be described by the steps (29)-(31), where ξ ∈ [0, 1) is a relaxation parameter.
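A minimal numerical sketch of this idea on a toy quadratic problem (ours; we take the relaxation ξ = 0, i.e., s is set directly to the last minimizer, and all parameter values are illustrative):

```python
import numpy as np

# Sketch of the Bertsekas-style convexification on a toy problem of ours:
#     minimize  x1^2 + x2^2   subject to  h(x) = x1 + x2 - 1 = 0.
# The penalty is the distance (c/2)*||x - s||^2 to an approximation point s,
# not the constraint norm; s tracks the inner minimizer (relaxation xi = 0),
# and the multiplier step is scaled by beta << 1.

def bertsekas_sketch(c=10.0, beta=0.05, iters=400):
    lam = 0.0
    s = np.zeros(2)
    for _ in range(iters):
        # x-update: stationarity 2*x_i + lam + c*(x_i - s_i) = 0
        x = (c * s - lam) / (2.0 + c)
        # approximation point moves to the new minimizer (xi = 0)
        s = x.copy()
        # scaled multiplier update
        lam = lam + beta * c * (x.sum() - 1.0)
    return x, lam

x, lam = bertsekas_sketch()
print(x, lam)  # approaches x* = (0.5, 0.5), lam* = -1
```

Because the penalty ∥x − s∥² splits coordinate-wise, the x-update decomposes into independent subproblems, which is exactly what is exploited by the decomposed form of (26).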

Tanikawa-Mukai Decomposition Method
When X = R^n in the Bertsekas algorithm, we may approximate the multipliers vector λ(x) using the direct formula (20). Unfortunately, when λ(x) substitutes for λ in (26), the resulting Lagrangian is no longer separable because of the term λ(x)^T h(x), which contains products of functions of different components x_i, i = 1, . . ., N. Hence, separability is preserved only if λ does not depend on x. To address this issue, Tanikawa and Mukai [26] replaced λ(x) with an approximation λ(s) and added a penalty term η h(s)^T M(s) h(x), so as to ensure that the minimum point ŝ is closer than s to x*. These improvements resulted in the formula (32), where η > 0 is a scalar and M(s) is a symmetric, positive-definite matrix whose elements are of class C¹. The Lagrangian (32) is locally strictly convex in x in the neighborhood of x* if ρ is taken large enough, and it has a unique local minimum point s* as a function of s.
In a decomposed form, (32) can be written as a sum of local functions, so the problem (22)-(24) for X = R^n can be solved by a two-level algorithm. Tatjewski and Engelmann [28] extended this approach to the more general case of the problem (22)-(24) involving local sets X_i given by inequality constraints (37). In this case, the Lagrange multipliers vector is calculated from a formula in which µ is the multipliers vector for the local inequality constraints.

Tatjewski Decomposition Method
A different approach to handling the nonseparable term in the function (25), with local sets X_i given by (37), was proposed by Tatjewski [27]. It consists in replacing the function (25) by the function (40), where s = (s_1, s_2, . . ., s_N) ∈ R^n is an approximation point; the function (40) is separable. The algorithm, which we adapted to an MINLP problem, has, as in the case of the Bertsekas algorithm, two levels and an additional scaling factor β ≪ 1 in the formula for updating the Lagrange multipliers, where ξ ∈ [0, 1) is a relaxation parameter.

SALA Decomposition Algorithm in ADMM Version
The Alternating Direction Method of Multipliers (ADMM) was proposed by Glowinski and Marrocco [41] and by Gabay and Mercier [42]. In its basic version, ADMM solves problems dependent on two subvectors of the decision variables vector, x and s, calculating the new estimates of the solutions (x*, s*) in a Gauss-Seidel fashion. For our particular nonconvex problem with local constraints, the best choice seems to be the Separated Augmented Lagrangian Algorithm (SALA) proposed by Hamdi et al. [29,43,44].
In SALA, the problem (22)-(24) is first reformulated into the form (46)-(49), where s_i ∈ R^m, i = 1, . . ., N are additional, artificial variables. To this formulation, the multiplier method with partial elimination of constraints [15] is then applied: the constraints (47) are eliminated by means of dualization and a penalty, while the constraints (48) are retained explicitly. The augmented Lagrangian for problem (46)-(49), with elimination of only the constraints (47), then has a separable form. The SALA method of multipliers can be summed up in a few stages. Using the ADMM approach for problem (52), we can separate the optimizations with respect to x and s. Moreover, the optimization with respect to x can be split among N subproblems; that is, we finally obtain the ADMM algorithm (55)-(59) with a Jacobi (i.e., parallelizable) part with respect to the x vector.
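For intuition, here is a generic two-block ADMM iteration (scaled form) on a toy consensus problem of ours, much simpler than the SALA formulation (46)-(49), but showing the Gauss-Seidel x/s alternation and the dual update:

```python
# Generic two-block ADMM sketch (scaled form) on a toy consensus problem:
#     minimize  (x - 1)^2 + (z - 2)^2   subject to  x - z = 0.
# Both block updates have a closed form for this quadratic case.

def admm_sketch(rho=2.0, iters=80):
    x = z = u = 0.0          # u is the scaled multiplier, u = lam / rho
    for _ in range(iters):
        # x-update: argmin_x (x - 1)^2 + (rho/2)*(x - z + u)^2
        x = (2.0 + rho * (z - u)) / (2.0 + rho)
        # z-update (Gauss-Seidel: uses the fresh x)
        z = (4.0 + rho * (x + u)) / (2.0 + rho)
        # scaled dual update
        u = u + (x - z)
    return x, z, u

x, z, u = admm_sketch()
print(x, z)  # both approach the consensus optimum 1.5
```

In SALA proper, the x-block additionally splits into N independent subproblems that can be solved in parallel (the Jacobi part of the algorithm (55)-(59)).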

Decomposition of the Network Problem
Unfortunately, not all the methods presented in Section 4 can be applied to solve our network problem (1)-(8). The Tanikawa-Mukai and Tatjewski-Engelmann methods from Section 4.3 are not suitable, because ours is a mixed-integer problem, for which the Karush-Kuhn-Tucker optimality conditions, which are the basis of the explicit formulas for calculating the Lagrange multipliers, are not well defined. Fortunately, all the remaining methods can be applied.
We will try to solve the problem (1)-(8) by decomposing it with respect to the flows w ∈ W. Hence, it will be convenient to introduce, for every flow w, the admissible set XBY_w resulting from the equations (2), (3) and (5)-(8). This set for the flow w will be defined by x_w, y_wl ∈ R, b_wl ∈ {0, 1}, l ∈ L : ∑_{i∈N|(i,j)∈A} y_{w,e(i,j)} − ∑_{k∈N|(j,k)∈A} y_{w,e(j,k)}

The Standard Multiplier Method without Decomposition
The augmented Lagrangian for (1)-(8) is given by (61), where z_l ≥ 0, l ∈ L are slack variables. To find, for every l ∈ L, the value z*_l that minimizes the augmented Lagrangian (61) in the current conditions (that is, for given y_wl, w ∈ W, l ∈ L), we solve the minimization problem (62). The unconstrained minimum of the expression in square brackets in (62) is attained at the scalar ẑ_l at which the derivative is zero; hence, the solution of problem (62) is the projection of ẑ_l onto the nonnegative half-line, z*_l = max(0, ẑ_l). Substituting this solution into the multiplier update at iteration k + 1 (67) and taking into account (66), we can solve (1)-(8) using the algorithm (18)-(19), which here takes a form in which the minimization is performed subject to (2), (3) and (5)-(8) (70), with ρ > 0.
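The elimination of the slack variables sketched above reduces, for each link, to a projection and yields the classical clipped multiplier update for inequality constraints. A small numerical check (our sketch; `load` stands for ∑_{w∈W} y_wl):

```python
# Elimination of the slack variable z_l for the capacity constraint
# sum_w y_wl + z_l = c_l, z_l >= 0 (our sketch of the derivation above).
# Minimizing the augmented Lagrangian terms over z_l >= 0 gives
# z_l* = max(0, c_l - load - lam_l/rho), and substituting z_l* into the
# multiplier step yields the familiar clipped update for inequalities:
# lam_l <- max(0, lam_l + rho*(load - c_l)).

def slack_opt(load, cap, lam, rho):
    """Optimal slack z_l* for a link with total load sum_w y_wl."""
    return max(0.0, cap - load - lam / rho)

def multiplier_update(load, cap, lam, rho):
    """lam_l + rho*(load + z_l* - cap), which equals max(0, lam_l + rho*(load - cap))."""
    return lam + rho * (load + slack_opt(load, cap, lam, rho) - cap)

# Violated constraint (load > cap): the multiplier grows.
print(multiplier_update(load=3.0, cap=2.0, lam=0.5, rho=2.0))  # 2.5
# Inactive constraint with a small multiplier: the update clips to 0.
print(multiplier_update(load=1.0, cap=2.0, lam=0.5, rho=2.0))  # 0.0
```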

The Bertsekas Method
The augmented Lagrangian in the Bertsekas method for the problem (1)-(8) will have the form (72), where z_l ≥ 0 are slack variables and λ_l ∈ R for l ∈ L. Formula (72) can be rewritten in an equivalent, decomposed form. To find, for every l ∈ L, the value z*_l that minimizes the augmented Lagrangian (72), we solve independently the problems (76). The unconstrained minimum of the expression in square brackets in (76) is attained at the scalar ẑ_l at which the derivative is zero, and the solution of problem (76) is z*_l = max(0, ẑ_l). Now, we can perform the subsequent iterations of the algorithm (29)-(31).

The Tatjewski Method
The Tatjewski Lagrangian of (1)-(8) in a decomposed form can be written as (80), where z_l ≥ 0, l ∈ L are slack variables. Formula (80) may be rewritten in an equivalent, decomposed way. To find, for every l ∈ L, the value z*_l that minimizes the augmented Lagrangian (80), we solve independently the problems (84). The unconstrained minimum of the expression enclosed in square brackets in (84) is attained at the scalar ẑ_wl at which the derivative is zero, and the solution of problem (84) is its projection onto the nonnegative half-line, max(0, ẑ_wl). Now, we can execute the iterations of the Tatjewski algorithm (43)-(45).

SALA ADMM Algorithm
In the network problem (1)-(8), the constraints (4) binding the flows are of the inequality type; hence, in our version of the problem formulation to be solved by the SALA algorithm (46)-(49), we convert them to equalities. The equality constraint for the flow w ∈ W in link l ∈ L is given below, together with the resulting augmented Lagrangian; the algorithm (55)-(59) adapted to this problem takes the corresponding form.

Numerical Tests
The algorithms described above were implemented and tested on ten network problems of different sizes. The tested networks consisted of loosely connected clusters of strongly connected nodes. The problems were first solved without decomposition and later with decomposition, which allows for future parallelization (however, our tests were performed in a sequential way).
The Python 3.7.7 programming language was used for all implementations and numerical experiments. To construct and evaluate the optimization problems, the amplpy package [45,46] was employed; the Gurobi solver was used for the optimization; and the NetworkX package [47] was applied to create and display the networks related to the optimization problems.
The tests were performed on a machine with an AMD Ryzen 5 4600H processor with Radeon Graphics at 3.00 GHz, 32 GB of RAM, and 512 GB of SSD storage, under the 64-bit Windows 10 Pro operating system.
Five distinct network topologies were created for the study. The first two topologies, the medium networks, had fewer variables and can be observed in Table 1 and Figure 1. The next two topologies, the large networks, had more variables and can be seen in Table 1 and Figure 2. The last topology, the extra-large network, had many more variables and can be seen in Table 1 and Figure 3. The source and destination nodes for each problem instance were generated randomly, along with the capacities of the links. Instances of the problem were created for all network topologies.

(a) For γ = 1, 2; δ = 1. (b) For γ = 1, 2; δ = 2.

In most cases, all algorithms based on the augmented Lagrangian method dealt very well with the duality gap (the biggest value of the gap was less than 5%; in most cases, it was below 0.5%), which is a serious issue when Lagrangian relaxation is applied to MINLP problems such as simultaneous optimization of routing and bandwidth allocation [3,4,7]. Among the algorithms tested, the standard augmented Lagrangian method achieved the objective value with the smallest relative error for the medium and large-sized problems. In most tests, the Tatjewski and ADMM SALA algorithms also produced competitive, though slightly worse, results, and the Bertsekas algorithm was usually, but not always, a little worse still. In the case of the very large problems, the Tatjewski algorithm proved to be the most precise; the ADMM SALA and Bertsekas algorithms were a little worse, but they all delivered approximations of the solutions with a very small relative error, less than 0.6%. The ordinary augmented Lagrangian algorithm did not converge within 8 h.
Regarding constraint violations, the algorithms are comparable: they all deliver (provided they converge) an admissible solution with the assumed accuracy of 10^(-6). As expected, none of the algorithms incurred any routing violations, indicating their ability to generate feasible routing solutions that satisfy the network flow conservation equations.
The analysis of the runtimes presented in Tables 2-4 and Figures 14-16 reveals that while the medium problems are solved very fast by one of the best commercial solvers (Gurobi), in the case of the large problems, the AL-based decomposed algorithms are competitive. The run times needed to deliver a good approximation of the optimal solution were 2-3 times shorter for the standard AL algorithm and 4-14 times (for one test example, even 31.5 times) shorter for the algorithms with decomposition. In the case of the very large problems, the decomposed algorithms were 1.6-26 times faster than the Gurobi solver. The standard augmented Lagrangian method required significantly more time to solve both the medium and large-sized problems and failed in the case of the very large problems. The results highlight the trade-offs between the various Lagrangian-based optimization algorithms. The standard augmented Lagrangian excels in finding optimal routing solutions in not-too-big problems, but at the cost of increased computational time. The Bertsekas algorithm offers a balanced approach, achieving competitive results with shorter runtimes. Tatjewski's algorithm maintains high accuracy of solutions for big problems, making it suitable for scenarios with stringent demands. For very big problems, ADMM SALA is a little worse in terms of the objective function, but it is quite fast.
It should be mentioned that the above AL-based methods will also work for more complicated two-layer network problems, e.g., with logarithmic objective functions [3].

Conclusions
We have discussed several algorithms for solving separable mixed-integer problems in a decomposed way and implemented them using Python and amplpy. We have tested these implementations on a network optimization problem of simultaneous routing and bandwidth allocation.
Since this network optimization problem has inequality constraints that bind the subproblems, while in the original presentations of these methods the binding constraints were equalities, we reformulated them accordingly.
Based on the results of the numerical experiments, we can conclude that all the augmented Lagrangian-based algorithms (standard, Bertsekas, Tatjewski and SALA) overcome the duality gap, which is a serious disadvantage of using the Lagrangian relaxation method to solve the two-layer routing and bandwidth allocation problem with single paths [3,4,7]. They deliver a very good approximation of the exact optimal solution in a relatively short time, usually 4-10 times shorter than that of exact commercial solvers.
Applying parallelization to independently solve the local optimization problems for particular flows should give an additional speed-up of the decomposed algorithms.
The choice of a particular AL-based optimization algorithm should be tailored to the specific requirements of the network problem. The standard augmented Lagrangian stands out as the go-to choice for small problems. The Bertsekas method offers a strong compromise between solution quality and runtime efficiency, and hence it is suitable for large-scale problems where decomposition is needed, while the Tatjewski method excels in solution quality. Ultimately, the selection of the algorithm should align with the priorities of the network optimization problem at hand.
Our formal analysis sheds light on the mathematical difficulties of Lagrangian-based optimization algorithms in the context of two-layer network problems, providing insights for practitioners and researchers in the fields of information technology and network optimization.
Our future work on the simultaneous routing and bandwidth allocation problem in energy-aware networks will concern QoS components of the objective function other than quadratic (e.g., logarithmic [3]), nonseparable problems, and more complicated models of the cost of energy, including unidirectional links.

Figures 4-13 show plots of the values of the objective function (1) and of the norm of the constraint residuals in subsequent iterations of all the optimization algorithms considered in Section 5.
(a) Objective function values. (b) Norm of residual values.

Figure 14. Run times for the Medium problems.

Figure 15. Run times for the Large problems.

Figure 16. Run times for the Extra-large problems.