Safe Approximation: An Efficient Solution for a Hard Routing Problem

Abstract: The Disjoint Connecting Paths problem and its capacitated generalization, called the Unsplittable Flow problem, play an important role in practical applications such as communication network design and routing. These tasks are NP-hard in general, but various polynomial-time approximations are known. Nevertheless, the approximations tend to be either too loose (allowing large deviation from the optimum) or too complicated, often rendering them impractical in large, complex networks. Therefore, our goal is to present a solution that provides a relatively simple, efficient algorithm for the Unsplittable Flow problem in large directed graphs, where the task is NP-hard and is known to remain NP-hard even to approximate up to a large factor. The efficiency of our algorithm is achieved by sacrificing a small part of the solution space. This also represents a novel paradigm for approximation. Rather than giving up the search for an exact solution, we restrict the solution space to a subset that is the most important for applications, and exclude only a small part that is marginal in some well-defined sense. Specifically, the sacrificed part only contains scenarios where some edges are very close to saturation. Since nearly saturated links are undesirable in practical applications, excluding near saturation is quite reasonable from the practical point of view. We refer to solutions that contain no nearly saturated edges as safe solutions, and call the approach safe approximation. We prove that this safe approximation can be carried out efficiently. That is, once we restrict ourselves to safe solutions, we can find the exact optimum by a randomized polynomial-time algorithm.


Introduction
In communication network design and routing, one often looks for disjoint routes that connect various sources with associated terminals. This motivates the Disjoint Connecting Paths problem, which is the following decision task: Input: a set of source-destination node pairs (s_1, t_1), ..., (s_k, t_k) in a graph. Task: Find edge disjoint paths P_1, ..., P_k, such that P_i connects s_i with t_i for each i. This is one of the classical NP-complete problems that appears already at the origins of NP-completeness theory, among the original problems of Karp [1]. It remains NP-complete both for directed and undirected graphs, as well as for the edge disjoint and vertex disjoint paths versions. The corresponding natural optimization problem, in which we look for the maximum number of terminator pairs that can be connected by disjoint paths, is NP-hard.
There is also a capacitated version of the Disjoint Connecting Paths problem, also known as the Unsplittable Flow problem. In this task a flow demand is given for each origin-destination pair (s_i, t_i), and a capacity value is known for each edge. The requirement is to find a system of paths, connecting the respective source-destination pairs and carrying the respective flow demands, such that the capacity constraint of each edge is obeyed, that is, the sum of the flows of paths that traverse the edge cannot be more than the capacity of the edge. The name Unsplittable Flow expresses the requirement that between each source-destination pair the flow must follow a single route; it cannot split. Note that the disjointness of the paths themselves is not required a priori in the Unsplittable Flow problem, but can be enforced by the capacity constraints, if needed. The Unsplittable Flow problem is widely regarded as a fundamental task in communication network design and routing applications. Note that it directly subsumes the Disjoint Connecting Paths problem as a special case, which can be obtained by taking all capacities and flow demands equal to 1. In this paper, after reviewing some existing results, we show that the Unsplittable Flow problem, which is known to be NP-hard (and its decision version is NP-complete), becomes efficiently solvable if we impose a mild and practically well justifiable restriction. Specifically, our key idea is that we "cut down" a small part of the solution space by slightly reducing the edge capacities. In other words, we exclude solutions that are close to saturating some edges, which would be practically undesirable anyway. We call the remaining solutions safe solutions: safe in the sense that no link is very close to saturation. Why is this slight capacity reduction useful?
Its usefulness is based on the fact, which we constructively prove in the paper, that if we restrict ourselves to safe solutions only, then the hard algorithmic problem becomes solvable by a relatively simple randomized polynomial time algorithm. With very high probability the algorithm finds the exact optimum in polynomial time, whenever there exists a safe solution. We call this approach safe approximation.

Previous Results
Considerable work has been done on the Disjoint Connecting Paths problem since its first appearance as an NP-complete task in the classic paper of Karp [1] in 1972.
One direction of research deals with finding the "heart" of the difficulty: the simplest restricted cases that still remain NP-complete. (Or NP-hard, if the optimization version is considered, where we look for the maximum number of connecting paths, allowing that possibly not all source-destination pairs will be connected.) Kramer and van Leeuwen [2] prove, motivated by VLSI layout design, that the problem remains NP-complete even for graphs as regular as a two-dimensional mesh. If we restrict ourselves to undirected planar graphs with each vertex having degree at most three, the problem also remains NP-complete, as proven by Middendorf and Pfeiffer [3]. The optimization version remains NP-hard for trees with parallel edges, although there the decision problem is already solvable in polynomial time [4].
The restriction that we only allow paths which connect each source node with a dedicated target is essential. If this is relaxed and we are satisfied with edge disjoint paths that connect each source s_i with some destination t_j, but not necessarily with t_i, then the problem becomes solvable with classical network flow techniques. Thus, the prescribed matching of sources and destinations causes a dramatic change in the problem complexity. Interestingly, the problem already becomes NP-complete if we require that just one of the sources is connected to a dedicated destination, while the rest are relaxed as above (Faragó [5]).
Another group of results produces polynomial-time algorithmic solutions for finding the paths, possibly using randomization, in special classes of graphs. For example, Middendorf and Pfeiffer [3] prove the following. Let us represent the terminator pairs by demand edges. These are additional edges that connect a source with its destination. If this extended graph is embeddable in the plane such that the demand edges lie in a bounded number of faces of the original graph, then the problem is solvable in polynomial time. (The faces are the planar regions bordered by the curves that represent the edges in the planar embedding, that is, in the drawing of the graph in the plane.) Thus, this special case requires that, beyond the planarity of the extended graph, the terminators are concentrated in a constant number of regions (independent of the graph size), rather than spread over the graph.
Broder, Frieze, Suen and Upfal [6] consider the case of random graphs and provide a randomized algorithm that, under some technical conditions, finds a solution with high probability in time O(nm^2) for a random graph of n vertices and m edges.
A deep theoretical result, due to Robertson and Seymour [7], is that for general undirected graphs the problem can be solved in polynomial time if the number k of paths to be found is constant (i.e., cannot grow with the size of the graph). This is important from the theoretical point of view, but does not lead to a practical algorithm, as analyzed by Bienstock and Langston [8]. On the other hand, for directed graphs the problem becomes NP-complete already for the smallest nontrivial case of k = 2, as proven by Even, Itai, and Shamir [9].
Another line of research aims at finding approximations to the optimization version. An algorithm is said to be an f(n)-approximation if it can connect a subset of the terminator pairs by disjoint paths such that the size of this subset is at least a 1/f(n) fraction of the optimum in a graph of n vertices. For example, in this terminology a 2-approximation algorithm always reaches at least half of the optimum, and an O(log n)-approximation reaches at least a c/log n fraction of the optimum, for n > n_0 with some constants c, n_0.
Various approximations have been presented in the literature. For example, Garg, Vazirani and Yannakakis [4] provide a 2-approximation for trees with parallel edges. Aumann and Rabani [10] give an O(log n)-approximation for the 2-dimensional mesh. Kleinberg and Tardos [11] present an O(log n)-approximation for a larger subclass of planar graphs, which they call "nearly Eulerian, uniformly high-diameter planar graphs." On the other hand, finding a constant factor, or even a logarithmic factor, approximation for general directed graphs is hopeless: Guruswami, Khanna, Rajaraman, Shepherd, and Yannakakis [12] showed that on directed graphs it is NP-hard to obtain an O(n^{1/2−ε}) approximation for n-vertex graphs, for any ε > 0.
For the general case the approximation factor min{√m, m/opt} = O(√m) is known to be achievable (Srinivasan [13]), where m is the number of edges and opt is the optimum, that is, the maximum number of disjoint connecting paths between the source-destination pairs. Similar bounds apply for the Unsplittable Flow problem as well. Bounds have also been found in terms of special (less trivial) graph parameters. For example, Kolman and Scheideler [14] prove that an efficient O(F) approximation exists, where F is the so-called flow number of the graph. Although the flow number can be computed in polynomial time [14], it is a rather indirect characterization of the graph. Numerous further related results are reviewed in the Handbook of Approximation Algorithms and Metaheuristics [15].
It is worth noting that in the capacitated version (the Unsplittable Flow problem) the approximation does not mean approximating the total carried flow. It is meant with respect to the number of source-destination pairs connected, such that for each pair its entire demand is carried. For example, if there are 30 source-destination pairs, each with a flow demand of four units, then a 2-approximation means routing at least 15 of these flows, each path carrying four units of flow. In other words, not only are the paths unsplittable, but also the flow demands. That is, if we connected all 30 source-destination pairs, but each route could carry only two units of flow, it would not count as a 2-approximation.
In summary, the landscape of the Disjoint Connecting Paths and the Unsplittable Flow problems can be characterized by the following facts:
• The decision problems are NP-complete for the general case.
• The approximation versions are NP-hard for the general case, even within a large approximation factor.
• Polynomial time solutions or better approximations exist only for special cases.
• Even in those cases when a polynomial time solution exists, it is either far from practical, or else it applies to a graph class that is too special to satisfy the demands of network design applications.

The Method of Safe Approximation
The various solutions referenced above tend to be rather complicated, which is certainly not helpful for real-life applications, in particular for large, complex networks. Our approach to providing a more practical solution to the Unsplittable Flow problem is based on the following idea. We "cut down" a small part of the solution space by slightly reducing the edge capacities. In other words, we exclude solutions that are close to saturating some edges, as explained below.
Let V_i be the given flow demand of the i-th connecting path. We normalize these demands such that V_i ≤ 1 for every i. Let C_j be the capacity of edge j. The graph is assumed directed and the edges are numbered from 1 through m. Recall that a feasible solution of the problem is a set of s_i−t_i (directed) paths, one for each i, satisfying the edge capacity constraints. The latter means that on each edge j the sum of the V_i values of those paths that traverse the edge does not exceed C_j. As mentioned earlier, deciding whether a feasible solution exists at all is a hard (NP-complete) problem.
On the other hand, not all feasible solutions are equally good from a practical viewpoint. For example, if a route system in a network saturates or nearly saturates some links, then it is not preferable, because it is close to being overloaded. For this reason, let us assign a parameter 0 < ρ_j < 1 to each edge j, such that ρ_j will act as a safety margin for the edge. More precisely, let us call a feasible solution a safe solution with parameters ρ_j, j = 1, ..., m, where m is the number of edges, if it uses at most C'_j = ρ_j C_j capacity on edge j. The parameter ρ_j will be close to 1 in the cases that are interesting to us. Now, the appealing thing is that if we restrict ourselves to only those cases when a safe solution exists, then the hard algorithmic problem becomes solvable by a relatively simple randomized polynomial time algorithm. With very high probability, the algorithm finds the exact optimum in polynomial time, whenever there exists a safe solution. We call this approach safe approximation.
The price is that we exclude those cases when a feasible solution might still exist, but there is no safe solution. This means that in these cases all feasible solutions are undesirable, because they all bring some edges close to saturation. In these marginal cases the algorithm may find no solution at all. Our approach constitutes a new avenue to approximation, in the sense that instead of giving up on finding an exact solution, we rather restrict the search space to a (slightly) smaller one. When, however, the algorithm finds any solution, then it provides the exact (not just approximate) optimum.
Let us choose the safety margin ρ_j for edge j in a graph of m edges as

ρ_j = 1 − √( 3 ln(2m) / C_j )     (1)

where C_j is the capacity of edge j, ln denotes the natural logarithm log_e, and e ≈ 2.718... is Euler's number (the base of the natural logarithm). Note that ρ_j tends to 1 with growing C_j, even if the graph also grows, as long as C_j grows faster than the logarithm of the graph size, which is very reasonable (note that doubling the number of edges increases the natural logarithm by less than 1). For example, if an edge has capacity C_j = 1000 units (measured in relative units, such that the maximum path flow demand is 1), and the graph has m = 200 edges, then ρ_j ≈ 0.87. Now we outline how the algorithm works. First, we describe a continuous multicommodity flow relaxation.
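As a quick sanity check of the safety margin, the following sketch computes it numerically. We assume here that formula (1) has the form ρ_j = 1 − √(3 ln(2m)/C_j), which reproduces the ρ_j ≈ 0.87 example in the text:

```python
import math

def safety_margin(capacity: float, num_edges: int) -> float:
    """Safety margin rho_j = 1 - sqrt(3 ln(2m) / C_j) (assumed form of (1))."""
    return 1.0 - math.sqrt(3.0 * math.log(2 * num_edges) / capacity)

# The example from the text: C_j = 1000 units, m = 200 edges.
rho = safety_margin(1000, 200)
print(round(rho, 2))  # -> 0.87
```

As expected, the margin grows toward 1 with the capacity: for C_j = 2000 and the same m, the margin is already about 0.91.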
Continuous multicommodity flow relaxation
Let V_i be the given flow demand of the i-th connecting path, with source node s_i and target node t_i. We refer to i as the type of the flow. For each such type, and for each edge j, assign a non-negative variable x_ij, representing the amount of type-i flow on edge j. The flow model is then expressed by the following constraints:

1.
Capacity constraints For every edge j:

∑_i x_ij ≤ ρ_j C_j     (2)

The meaning of (2) is that the flow on edge j, summed over all types, cannot be more than the capacity of the edge, reduced by the safety margin.

2.
Flow conservation For every node v, let In(v) denote the set of incoming edges of v. Similarly, let Out(v) be the set of outgoing edges of v. Then write, for every flow type i and for every node v except s_i and t_i:

∑_{j∈In(v)} x_ij = ∑_{j∈Out(v)} x_ij     (3)

The meaning of (3) is that for every flow type i, the type-i flow is conserved at every node except its source s_i and destination t_i. By conservation we mean that the total incoming type-i flow to v is equal to the total outgoing type-i flow from v.

3.
Source constraints For every flow type i (which has its source at node s_i):

∑_{j∈Out(s_i)} x_ij = V_i     (4)

The meaning of (4) is that source s_i should emit V_i amount of type-i flow.
Now we can describe our algorithm. The notations used are summarized in Table 1.

Procedure
Step 1 Multicommodity flow relaxation Solve the following linear program:

minimize ∑_i ∑_j x_ij     (5)

subject to the constraints (2), (3) and (4). In case there is no feasible solution, declare "no safe solution exists" and HALT.
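To make Step 1 concrete, the linear program can be assembled mechanically from the graph data. The following sketch (with a hypothetical helper `build_lp`, not part of the paper; variable x_ij is flattened to column i·m + j) constructs the objective and the constraint matrices corresponding to (2), (3) and (4):

```python
# A sketch of assembling the Step 1 linear program from the graph data.
# Edges are (u, v) pairs, commodities are (s_i, t_i, V_i) triples.
def build_lp(nodes, edges, capacities, margins, commodities):
    m, k = len(edges), len(commodities)
    idx = lambda i, j: i * m + j              # column index of variable x_ij
    c = [1.0] * (k * m)                       # objective (5): minimize total flow
    A_ub, b_ub, A_eq, b_eq = [], [], [], []
    for j in range(m):                        # capacity constraints (2)
        row = [0.0] * (k * m)
        for i in range(k):
            row[idx(i, j)] = 1.0
        A_ub.append(row)
        b_ub.append(margins[j] * capacities[j])
    for i, (s, t, V) in enumerate(commodities):
        for v in nodes:                       # flow conservation (3)
            if v == s or v == t:
                continue
            row = [0.0] * (k * m)
            for j, (u, w) in enumerate(edges):
                if w == v:
                    row[idx(i, j)] += 1.0     # incoming type-i flow
                if u == v:
                    row[idx(i, j)] -= 1.0     # outgoing type-i flow
            A_eq.append(row)
            b_eq.append(0.0)
        row = [0.0] * (k * m)                 # source constraint (4)
        for j, (u, w) in enumerate(edges):
            if u == s:
                row[idx(i, j)] = 1.0
        A_eq.append(row)
        b_eq.append(V)
    return c, A_ub, b_ub, A_eq, b_eq

# Tiny example: path graph 0 -> 1 -> 2, one commodity from node 0 to node 2.
c, A_ub, b_ub, A_eq, b_eq = build_lp(
    [0, 1, 2], [(0, 1), (1, 2)], [1.0, 1.0], [0.9, 0.9], [(0, 2, 0.5)])
print(b_ub)  # -> [0.9, 0.9]
```

The returned matrices can then be passed to any off-the-shelf LP solver, for example `scipy.optimize.linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)`.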

Step 2 Path Creation via Random Walk
For each source-destination pair (s_i, t_i), find a path via the following randomized procedure. Start at the source s_i and choose the next node randomly among the successor neighbors of s_i, with probabilities proportional to the i-th commodity flow values on the edges from s_i to these neighbors in the directed graph. Continue in a similar manner: at each node, choose the next one among its successor neighbors randomly, with probabilities proportional to the i-th commodity flow values. Finally, upon arrival at t_i, store the found (s_i, t_i) path.
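A minimal sketch of this flow-proportional random walk (the dictionary-based `flow` representation and the `max_steps` cutoff are our illustrative choices, not part of the paper's formal procedure):

```python
import random

# Step 2 sketch: flow-proportional random walk for one commodity i.
# `flow` maps each directed edge (u, v) to that commodity's LP flow value x_ij.
def random_walk_path(flow, s, t, max_steps):
    path, node = [s], s
    for _ in range(max_steps):
        if node == t:
            return path
        # Successor edges of the current node carrying positive type-i flow.
        out = [(v, f) for (u, v), f in flow.items() if u == node and f > 0]
        succs, weights = zip(*out)
        node = random.choices(succs, weights=weights)[0]  # flow-proportional
        path.append(node)
    return path

# If the LP flow is concentrated on a single path, the walk is deterministic:
flow = {("s", "a"): 1.0, ("a", "t"): 1.0, ("a", "b"): 0.0}
print(random_walk_path(flow, "s", "t", 10))  # -> ['s', 'a', 't']
```

When the flow of a commodity is split across several branches, the walk samples a path so that each edge is traversed with probability proportional to its flow value, which is exactly what the proof below exploits.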
Step 3 Feasibility Check and Repetition Having found a system of paths in the previous step, check whether it is a feasible solution (i.e., whether it can carry the flow demand of each path within the capacity bounds). If so, then HALT; else repeat from Step 2.
If after r repetitions (r is a fixed parameter) none of the runs is successful, then declare "No solution is found" and HALT.
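The feasibility check of Step 3 is straightforward to implement. A sketch, assuming paths are given as node lists and capacities as a dictionary keyed by directed edges (every path edge is assumed to appear in the dictionary):

```python
# Step 3 sketch: sum the demands of the paths crossing each edge and
# compare the resulting load with the (full, unreduced) edge capacity.
def is_feasible(paths, demands, capacities):
    """paths[i] is a node list for pair i; capacities maps (u, v) -> C."""
    load = {e: 0.0 for e in capacities}
    for path, demand in zip(paths, demands):
        for edge in zip(path, path[1:]):   # consecutive nodes form the edges
            load[edge] += demand
    return all(load[e] <= capacities[e] for e in capacities)

capacities = {("s", "a"): 1.0, ("a", "t"): 1.0}
print(is_feasible([["s", "a", "t"]], [0.6], capacities))               # -> True
print(is_feasible([["s", "a", "t"]] * 2, [0.6, 0.6], capacities))      # -> False
```

Note that the check uses the original capacities C_j, not the reduced ones: the safety margin is only needed to make the randomized rounding succeed, while the final path system must merely be feasible.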
It is clear from the above description that the algorithm has practically feasible complexity, since the most complex part is the linear programming. Note that while Step 2 is repeated r times, the linear program in Step 1 needs to be solved only once. The correctness of the algorithm is shown in the following theorem.

Theorem 1. If a safe solution exists with the safety margins given by (1), then the above algorithm finds a feasible solution with probability at least 1 − 1/2^r. If the algorithm declares "no safe solution exists," then indeed no safe solution exists.

Proof. Since a safe solution is also a feasible solution of the multicommodity flow relaxation, if there is no flow solution in Step 1, then no safe solution can exist either.
Step 2 transforms the flow solution into paths. To see that they are indeed paths, observe that looping cannot occur in the randomized branching procedure: if a cycle arose on the way, that would mean a cycle with all-positive flow values for a given commodity, which could be canceled from the flow of that commodity, contradicting the fact that the linear program finds a flow in which the sum of edge flows is minimum, according to the objective function (5). Furthermore, since looping cannot occur, the procedure must reach the destination in at most n steps, where n is the number of nodes. Now a key observation is that if we build the paths with the described randomization between the i-th source-destination pair, then the expected value of the load that these paths put on any given edge will be exactly the value of the i-th commodity flow on the link. This follows from the fact that the branching probabilities are flow-proportional.
From the above we know that the total expected load of an edge, arising from the randomly chosen paths, is equal to the total flow value on the edge. What we have to bound is the deviation of the actual load from this expected value. Let F_j be the flow (= expected load) on edge j. This arises in the randomized procedure as

F_j = ∑_i V_i Pr(X_i = 1),     (6)

where X_i is a random variable that takes the value 1 if the i-th path (i.e., the type-i flow) contributes to the edge load, otherwise it is 0. The construction implies that these random variables are independent, whenever j is a fixed edge and i runs over the different flow types. Now consider the random variable

Ψ_j = ∑_i V_i X_i.

By (6), we have E(Ψ_j) = F_j. The probability that the actual random value Ψ_j deviates from its expected value F_j by more than a factor of δ can be bounded by the tail inequality found in [16], which is a variant of the Chernoff-Hoeffding bound:

Pr(Ψ_j > (1 + δ)F_j) ≤ e^{−F_j δ²/3}.

It can be calculated from this [16] that if we want to guarantee that the bound does not exceed a given value ε > 0, then it is sufficient to satisfy

δ ≥ √( 3 ln(1/ε) / F_j ).

Let us choose ε = 1/(2m). Then we have δ = √( 3 ln(2m) / F_j ). Since the bound that we do not want to exceed is the edge capacity C_j, therefore, if

(1 + δ)F_j = F_j + √( 3 F_j ln(2m) ) ≤ C_j     (7)

is satisfied, then we have Pr(Ψ_j > C_j) ≤ 1/(2m). If this holds for all edges, it yields

Pr(∃ j : Ψ_j > C_j) ≤ m · 1/(2m) = 1/2.

Thus, the probability that the found path system is not feasible is at most 1/2. Repeating the procedure r times with independent randomness, the probability that none of the trials provides a feasible solution is bounded by 1/2^r, that is, the failure probability becomes very small already for moderate values of r. Finally, expressing F_j from (7), and using F_j ≤ C_j, we obtain that it suffices to have

F_j ≤ C_j − √( 3 C_j ln(2m) ) = ( 1 − √( 3 ln(2m)/C_j ) ) C_j = ρ_j C_j,

which shows that the safety margin is correctly chosen, thus completing the proof.
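The margin condition in the proof can also be verified numerically. Under our reading of the argument, a margin of the form ρ_j = 1 − √(3 ln(2m)/C_j) (our assumed form of (1)) guarantees that any load F_j ≤ ρ_j C_j satisfies F_j + √(3 F_j ln(2m)) ≤ C_j, which keeps the per-edge failure probability at most 1/(2m). A quick check across a range of capacities and edge counts:

```python
import math

def margin_suffices(C: float, m: int) -> bool:
    """Check that F = rho*C satisfies F + sqrt(3 F ln(2m)) <= C
    for the assumed margin rho = 1 - sqrt(3 ln(2m) / C)."""
    L = math.log(2 * m)
    rho = 1.0 - math.sqrt(3.0 * L / C)
    F = rho * C                       # the largest load a safe solution allows
    return F + math.sqrt(3.0 * F * L) <= C

# The inequality holds for all tested (C, m) combinations.
print(all(margin_suffices(C, m)
          for C in (100, 1000, 10000) for m in (50, 200, 1000)))  # -> True
```

Intuitively, the check works because F = C − √(3CL) implies √(3FL) ≤ √(3CL), so the two terms add up to at most C.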

Simulation Experiments
We have carried out simulation experiments to demonstrate how our approach performs in practice. Below we summarize the experimental results.
The calculation of the safety margin ρ for the edges is based on formula (1). Note that the safety margin is determined by the number of edges and the edge capacities, and its meaning is the fraction of the maximum edge capacity that can be used by our algorithm. This is shown in Table 2, and the same data is represented graphically in Figure 1. In our simulation experiments, the number of nodes ranges from 5 to 40, while the number of edges ranges from 7 through 62. The actual graph (network topology) is chosen randomly; an example is shown in Figure 2. For each number of edges we ran the simulation with two different edge capacities: C = 1000 and C = 2000 units. As the safety margin increases with higher capacities, the network can be better utilized, since the safety margin determines what percentage of the maximum capacity is utilized. We ran the simulation with two different traffic types (two commodities in the multicommodity flow); they represent two source-terminal pairs, which were chosen randomly. The results are shown in Table 3. The table shows, for various numbers of nodes and edges, how many iterations were needed to find the optimum, and what the optimum objective function value was in terms of total carried traffic. The same results are represented graphically in Figure 3. The results allowed us to make some observations:
• As the number of nodes and edges increases, that is, the network becomes larger, the algorithm tends to require more iterations to find the optimum, which, of course, is not surprising.
• By setting the repetition parameter to r = 8, the algorithm returned an optimal solution in 100% of the cases, which is very encouraging. It shows that one does not need an astronomical number of iterations to guarantee the optimum.
• In some cases (actually 25% of all cases we tried) already the first iteration achieved the optimum. Furthermore, in 75% of the cases at most 5 iterations were sufficient to reach the optimum, and we never needed more than 8. Of course, these are relatively small examples, but they reinforce the observation that a limited number of iterations suffices.

Conclusions
We have presented an efficient solution for the NP-hard Unsplittable Flow problem in directed graphs. The efficiency is achieved by sacrificing a small part of the solution space. The sacrificed part only contains scenarios where some edges are close to saturation, which is undesirable in practical applications anyway. The solutions that exclude such nearly saturated edges are called safe solutions.
The approach constitutes a new avenue to approximation, which we call safe approximation. It means that instead of giving up the exact solution, we rather restrict the search space to a (slightly) smaller one. When, however, the algorithm finds any solution, which happens with very high probability, then it is an exact (not just an approximate) solution.