Asymptotic Behavior of the Maximum Entropy Routing in Computer Networks

The maximum entropy method has been successfully used for underdetermined systems. The network design problem, with its routing and topology subproblems, is an underdetermined system and a good candidate for the application of the maximum entropy method. Wireless ad-hoc networks with rapidly changing topology and link quality, where the speed of recalculation is of crucial importance, have recently been successfully investigated by applying the maximum entropy method. In this paper we prove a theorem that establishes asymptotic properties of the maximum entropy routing solution. This result, besides being theoretically interesting, can be used to direct the initial approximation for iterative optimization algorithms and to speed up their convergence.

The basic idea of the maximum entropy method is to obtain a unique solution of an underdetermined system by introducing the additional constraint that the entropy function be maximized. Other methods used for solving underdetermined systems rely on the same technique: they introduce additional, artificial constraints that make the number of constraints equal to the number of unknowns. The difference is that the maximum entropy method introduces the most natural additional constraint: one that does not introduce any new, arbitrary, or unwarranted information. It uses only the information that is given and makes no assumptions about the missing information.
It is interesting to mention that, besides very pragmatic uses (like the one proposed in this paper, network design), there have been very extensive philosophical discussions about the real meaning of the maximum entropy principle. The predecessor of the maximum entropy principle is the principle of insufficient reason (James Bernoulli: "Ars Conjectandi," 1713). It states that in the absence of any information (knowledge), all outcomes should be considered equally possible. The principle of insufficient reason was involved in the discussions about prior probabilities (probabilities of a single event, expressing a state of knowledge) and relative frequencies. Relative frequencies became predominant, and some useful works of Laplace and Bayes were criticized. Shannon's work on information theory opened a new opportunity for the revitalization of the principle of insufficient reason, this time as the more sophisticated maximum entropy principle introduced by Jaynes [14,15]. These philosophical discussions about the real meaning of the maximum entropy method are interesting, but since the method has been successfully applied in many areas, for any new area of application the most important criterion is not how well we can explain the relation between the MEM and that area, but how useful the results obtained by applying the method are.

Definition of the MEM
The formal definition of the maximum entropy method is as follows. Suppose that for a discrete random variable $X$ the values $x_1, x_2, \dots, x_n$ that it can take are known, but the corresponding probabilities $p_1, p_2, \dots, p_n$ are not. Also, the expected values of $k < n - 1$ functions of $X$ (for example, the first $k$ moments) are known:
$$\sum_{i=1}^{n} p_i\, f_r(x_i) = E[f_r(X)], \qquad r = 1, 2, \dots, k.$$
In fact, we do not need to know the values $x_1, x_2, \dots, x_n$ or analytical expressions for the functions $f_r$, $r = 1, 2, \dots, k$. It is sufficient to know the values $f_r(x_i)$, $r = 1, 2, \dots, k$, $i = 1, 2, \dots, n$. Also, we do not have to start with probabilities $p_1, p_2, \dots, p_n$: we can start with any set of nonnegative numbers $t_1, t_2, \dots, t_n$ and then introduce
$$p_i = \frac{t_i}{\sum_{j=1}^{n} t_j}, \qquad i = 1, 2, \dots, n.$$
This gives us (together with $\sum_{i=1}^{n} p_i = 1$) $k + 1 < n$ constraints for $n$ unknown variables $p_1, p_2, \dots, p_n$. This system is underdetermined and has an infinite number of solutions. We want to find the unique solution that maximizes the entropy of the system. That is the best solution in the sense that it uses only the information given: it is neutral with respect to the missing information (it does not introduce any hidden assumptions). This additional constraint can be expressed as: maximize the entropy function
$$H = -K \sum_{i=1}^{n} p_i \ln p_i.$$
If $K = 1$ is selected, entropy is expressed in natural units (rather than in bits). This system can be solved by the method of Lagrange multipliers, and there is a standard algorithm to solve it. However, the function to be minimized is not convex even in the simplest case, when there is only one constraint: the expected value. The standard Newton-Raphson procedure does not work, but the Jacobian matrix of the system is symmetric and positive definite. This gives a scalar potential function which is strictly convex, and its minimum is easy to find. The use of the second-order Taylor expansion is recommended, as well as heuristics that avoid inverting the Jacobian matrix on each iteration [16].
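The standard Lagrange-multiplier approach can be sketched in Python as follows; this is a minimal illustration, not the algorithm of [16]. The maximum entropy solution has the form $p_i = \exp(-\sum_r \lambda_r f_r(x_i))/Z(\lambda)$, and the multipliers $\lambda_r$ minimize the strictly convex potential $\ln Z(\lambda) + \sum_r \lambda_r E[f_r(X)]$, whose Hessian is the symmetric, positive definite covariance matrix of the $f_r$. The sketch takes plain Newton steps on that potential (a second-order Taylor expansion) and, unlike the heuristic mentioned above, solves the linear system at every iteration and uses no line search. The function names `max_entropy` and `solve_linear` are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def max_entropy(f, targets, iters=100, tol=1e-12):
    """Maximum entropy probabilities subject to sum_i p_i f[r][i] = targets[r].

    f[r][i] is the known value f_r(x_i). The solution has the form
    p_i = exp(-sum_r lam_r f_r(x_i)) / Z(lam); the multipliers minimize the
    strictly convex potential ln Z(lam) + sum_r lam_r targets[r].
    Plain Newton steps, no line search.
    """
    k, n = len(f), len(f[0])
    lam = [0.0] * k
    for _ in range(iters):
        w = [math.exp(-sum(lam[r] * f[r][i] for r in range(k))) for i in range(n)]
        z = sum(w)
        p = [wi / z for wi in w]
        mean = [sum(p[i] * f[r][i] for i in range(n)) for r in range(k)]
        grad = [targets[r] - mean[r] for r in range(k)]  # gradient of the potential
        if max(abs(g) for g in grad) < tol:
            break
        # Hessian of the potential: covariance matrix of the f_r under p
        # (symmetric and positive definite for independent constraints).
        hess = [[sum(p[i] * f[r][i] * f[s][i] for i in range(n)) - mean[r] * mean[s]
                 for s in range(k)] for r in range(k)]
        step = solve_linear(hess, grad)
        lam = [lam[r] - step[r] for r in range(k)]
    return p


# Six-sided die whose only known property is the mean value 4.5.
p = max_entropy([[1, 2, 3, 4, 5, 6]], [4.5])
print([round(x, 4) for x in p])
```

For the die with known mean 4.5 (Jaynes' well-known illustration), the probabilities grow geometrically from about 0.054 for face 1 to about 0.347 for face 6; with mean 3.5 the sketch returns the uniform distribution.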

Network Design Problem
The network design problem (NDP) is an old, well-known NP-hard problem. It is a very interesting problem because it has great practical value, and since it is intractable, heuristics and suboptimal solutions have been used for decades. It is an open problem, and since the exact optimum generally cannot be computed, every new approach, including the MEM, is promising in the sense that the solution obtained can be better than previous ones, at least in some cases. The problem is intractable if a full and exact solution is required, and networks can have many hundreds of nodes (computers). Fortunately, experience has shown that network design can be done hierarchically and still be near optimal. An example is a network for a country: first, it can be decided where to put trunks between major cities, then small cities are connected to the nearest major cities, and then local networks are built inside the cities. This approach allows one to work with networks of at most 50 nodes at a time. This is a great help, but the problem is still intractable.
Computer networks consist of computers, called nodes, and communication lines, called links, that interconnect them. All data exchanged among nodes is divided into packets. A destination address is added to each packet, and the packets are sent to neighboring computers, which send them on to their neighbors, and so on, until the message reaches its destination.
The network design problem is:
• For given locations of nodes, traffic matrix (offered traffic for each pair of nodes) and cost matrix (cost to transfer a message for each pair of nodes)
• With performance constraints: reliability, delay (time that a message spends in the network), throughput
• Find values for the variables: topology (which nodes will be connected directly with a line and which will have to communicate indirectly, using other nodes as intermediate stations), line capacities (how much traffic each link will be able to carry), flow assignment, i.e., routing (which paths messages between any pair of nodes will follow)
• Minimize the cost (of building and maintaining the whole network).
Other formulations of the problem are: minimize delay for a given cost, or maximize throughput for a given cost and delay. It has been shown that all these problems are similar and that the same techniques can be applied to them.
The network design problem, which was investigated for many decades with emphasis on wide area networks, has recently been revitalized by its application to wireless ad-hoc networks and their subclasses: mobile ad-hoc networks (MANET) [17,18], wireless mesh networks (WMN) and wireless sensor networks (WSN) [19,20].

MEM for NDP
The problem of network design is to find a topology, routing and capacity assignment such that cost or delay is minimized. Once the topology and the routing are decided, there are exact methods for capacity assignment that minimize delay or cost. However, there are few theoretical results on how to select topology and routing. Most of the algorithms in use are heuristic, and many of them do not even have an intuitive justification other than "easy to calculate" or "the only simple thing we can do." Here we present an attempt to use the maximum entropy method to select the (initial) topology and routing.
Network design and analysis almost always involve underdetermined systems, especially when the routing policy has to be determined. The number of possible routings grows with the factorial of the number of nodes in the network, and the number of possible topologies is exponential in the number of links. The number of constraints (such as "everything that goes in must go out" for each node that is neither source nor sink) is typically polynomial in the number of nodes. That makes the network design problem a good candidate for the application of the maximum entropy method.
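To make these growth rates concrete, the Python sketch below counts, for a totally interconnected network of $n$ nodes, the simple paths between one fixed pair of nodes, the possible undirected topologies, and the flow-conservation equations of a multi-commodity formulation (one equation per source-destination pair and per intermediate node). These counting conventions and the function names are illustrative assumptions, not the exact formulation used by the authors.

```python
import math


def simple_paths(n):
    """Simple paths between two fixed nodes of a complete network of n nodes:
    choose and order k of the n - 2 other nodes as intermediaries, k = 0..n-2."""
    m = n - 2
    return sum(math.perm(m, k) for k in range(m + 1))


def topologies(n):
    """Undirected topologies: each of the n(n-1)/2 possible links is present or absent."""
    return 2 ** (n * (n - 1) // 2)


def conservation_constraints(n):
    """Flow conservation: one equation per (source, destination) pair and intermediate node."""
    return n * (n - 1) * (n - 2)


for n in (5, 10, 20):
    print(n, simple_paths(n), topologies(n), conservation_constraints(n))
```

For $n = 10$ this gives 109601 simple paths for a single pair of nodes and $2^{45}$ topologies, but only 720 conservation equations.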
To solve the NDP, most methods currently in use introduce new, artificial constraints. These artificial constraints have no justification other than that they make the number of unknowns equal to the number of constraints. The maximum entropy method has the nice property that it solves underdetermined systems without introducing any new, unwarranted information. Another advantage of the maximum entropy method is that it makes things as equal as possible. It is intuitively appealing that a network should not have overloaded or underutilized links, i.e., traffic should be distributed as equally as possible along all lines. The same goal can be attained by using some other function that has its maximum when all variables are equal. One very simple such function is the product of all variables. The product function looks simpler than the entropy function, which involves logarithms, but since partial derivatives are needed, the entropy function is better because it separates variables: $\partial H/\partial p_i = -K(\ln p_i + 1)$ depends only on $p_i$, whereas $\partial\left(\prod_j p_j\right)/\partial p_i = \prod_{j \neq i} p_j$ depends on all the other variables.
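A quick numerical comparison of the two candidate objectives illustrates both points: each is larger for the uniform distribution than for a skewed one, but only the entropy has partial derivatives that separate the variables. The Python sketch uses $K = 1$; the function names are hypothetical.

```python
import math


def entropy(p):
    """H = -sum p_i ln p_i (K = 1)."""
    return -sum(x * math.log(x) for x in p)


def product(p):
    return math.prod(p)


def grad_entropy(p):
    # dH/dp_i = -(ln p_i + 1): each component depends on p_i alone
    return [-(math.log(x) + 1) for x in p]


def grad_product(p):
    # d(prod)/dp_i = product of all other p_j: every component couples all variables
    return [math.prod(p[:i] + p[i + 1:]) for i in range(len(p))]


uniform, skewed = [0.25] * 4, [0.4, 0.3, 0.2, 0.1]
print(entropy(uniform) > entropy(skewed), product(uniform) > product(skewed))  # True True

other = [0.25, 0.5, 0.15, 0.1]  # same p_1, different remaining variables
print(grad_entropy(uniform)[0] == grad_entropy(other)[0])  # True
print(grad_product(uniform)[0] == grad_product(other)[0])  # False
```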
The MEM can be applied if the analysis starts with a totally interconnected network of $n$ nodes. Some lines are dropped later in the process of improving utilization or reducing cost. Once the topology becomes sparse enough, other, exact routing optimization methods can be introduced. To apply the maximum entropy method, we have to determine the variables of the system. Some combination of required traffic values can be used for that [21,22], since the MEM does not require starting with probabilities: an arbitrary set of numbers can be used and normalized by dividing each number by their sum.
The network design problem, primarily routing feasibility, has been adapted to the requirements of the maximum entropy method [16]. A computationally feasible algorithm was developed that implements the standard maximum entropy method; includes adjustments for problems that do not involve probabilities initially; calculates a function that substitutes for a large sparse matrix; includes a heuristic that speeds up calculations by avoiding inverting the Jacobian matrix at each iteration; determines the variables that define constraints for routing feasibility; includes additional constraints that steer the uniformity of the solution in the desired direction [23]; cancels opposing traffic; and excludes underutilized links. These additional constraints are "soft", which is a unique feature of this algorithm, in the sense that they do not have to be satisfied: the solution is pulled in the direction of satisfying them as much as possible.
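Two of the clean-up steps listed above, canceling opposing traffic and excluding underutilized links, can be illustrated with a small Python sketch. The dictionary representation of line loads, the function names and the fixed `threshold` are hypothetical; the actual criteria used in [16] are not reproduced here.

```python
def cancel_opposing(load):
    """Net flow per directed line: subtract the opposing traffic and keep only the
    direction that remains positive. `load` maps (k, l) -> traffic on line k -> l."""
    net = {}
    for (k, l), x in load.items():
        residual = x - load.get((l, k), 0.0)
        if residual > 0:
            net[(k, l)] = residual
    return net


def drop_underutilized(net, threshold):
    """Keep only lines whose net flow reaches the (hypothetical) threshold."""
    return {line: x for line, x in net.items() if x >= threshold}


load = {(1, 2): 5.0, (2, 1): 3.0, (1, 3): 2.0, (3, 1): 2.0, (2, 3): 0.5}
net = cancel_opposing(load)          # {(1, 2): 2.0, (2, 3): 0.5}
kept = drop_underutilized(net, 1.0)  # {(1, 2): 2.0}
print(net, kept)
```

Lines [1, 3] and [3, 1] carry equal opposing traffic, so they carry no net flow and disappear entirely.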
The proposed algorithm computes a reasonable solution that is robust with respect to the frequently required dynamic changes of the cost function. The maximum entropy solution can be a good starting point for further optimization, considering that a cost function with delay penalties involves queuing theory, which is usually computationally expensive. Alternatively, a guided version of the MEM can be used dynamically.
To improve this algorithm, theoretical results can help direct the initial approximation.

Asymptotic Behavior of the MEM Routing
The algorithm explained in the previous section uses the maximum entropy method as a basis, but it starts with a totally connected network and iteratively eliminates lines to decrease the objective function (a combination of cost and delay). Since this is a computationally expensive algorithm, it helps to select a better starting point to reduce convergence time. Here we prove a theorem that establishes the asymptotic behavior of maximum entropy routing in a network.
Theorem 1. For a totally interconnected network of $n$ nodes and a given offered load $L_{ij}$, $i, j = 1, \dots, n$, $i \neq j$ (required traffic from node $i$ to node $j$), the maximum entropy solution will asymptotically (when the total traffic on the network approaches infinity) distribute the net traffic (the traffic that remains after traffic of the same message type between the same nodes in opposing directions is canceled) along the direct path and the paths of length two in the ratio 2:1 (direct path versus each path of length two). That means that the offered load $L_{ij}$ for any pair of nodes $i$ and $j$ will be sent as follows: $\frac{2}{n}L_{ij}$ will be sent along the direct path $[i, j]$ and $\frac{1}{n}L_{ij}$ will be sent along each of the $n - 2$ paths of length two $[i, k] - [k, j]$, $k = 1, \dots, n$, $k \neq i, j$ (all paths of length two from node $i$ to node $j$ through any intermediary node $k$). Together these shares carry the whole load, since $\frac{2}{n} + \frac{n-2}{n} = 1$.
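Theorem 1 gives a closed-form routing that an iterative algorithm could use as its initial approximation. The Python sketch below builds it for one pair of nodes; the function name and the encoding of a path as a tuple of node numbers are hypothetical, and the result is meant only for the large-traffic limit.

```python
def asymptotic_routing(n, i, j, L):
    """Asymptotic maximum entropy split of load L from node i to node j (Theorem 1).

    Nodes are numbered 1..n; a path is a tuple of nodes. Valid only as T -> infinity.
    """
    routes = {(i, j): 2 * L / n}  # direct path carries 2/n of the load
    for k in range(1, n + 1):
        if k not in (i, j):
            routes[(i, k, j)] = L / n  # each of the n - 2 two-hop paths carries 1/n
    return routes


routes = asymptotic_routing(5, 1, 2, 10.0)
# {(1, 2): 4.0, (1, 3, 2): 2.0, (1, 4, 2): 2.0, (1, 5, 2): 2.0}
print(routes)
```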
To prove the theorem we need three lemmas.
Lemma 1. Consider a system with $n$ probabilities and two independent sets of constraints, where the first set applies to $p_1, \dots, p_k$ and the second set applies to $p_{k+1}, \dots, p_n$, $1 \le k < n$. If the sums of the two groups of probabilities are kept constant, the maximization of the entropy of the system can be done as two separate maximizations, one for each group of probabilities. The proof is trivial [24].
Lemma 2. To maximize the entropy of a totally interconnected network of $n$ nodes where the offered load is $L_{ij}$ (the only traffic on the network is from node $i$ to node $j$) and the total network traffic is kept constant, we only need to consider paths of length 1 and 2 (the direct path $[i, j]$ and the $n - 2$ paths of length two $[i, k] - [k, j]$, $k = 1, \dots, n$, $k \neq i, j$). All other lines will not carry any net traffic (opposing traffics will be equal).
Proof. In a totally interconnected network of $n$ nodes there are $n - 1$ lines going from node $i$ and $n - 1$ lines going to node $j$. All the traffic leaving node $i$ and all the traffic coming into node $j$ has to go along these lines. These lines contain all paths from node $i$ to node $j$ of length one and two. We know that maximization of entropy favors equal traffic everywhere. In order to push traffic $L_{ij}$ from node $i$ to node $j$, we have to disturb this equality by adding some traffic on the lines going from node $i$ and subtracting some traffic on the lines going to node $i$. The same is true for the lines adjacent to node $j$. We want to make this disturbance as small as possible, so we distribute the offered load along all possible lines adjacent to node $i$ and route it along the lines adjacent to node $j$. Lemma 3 will prove that we need to do this and show how to do it. What we are proving here is that we do not need any paths of length greater than two. Indeed, regardless of which paths we use later, we have to disturb the lines going from node $i$ by the amount of the offered load $L_{ij}$ (the offered load must be sent from node $i$). Also, regardless of the paths that are used, we have to disturb the lines going to node $j$ by the amount of the offered load $L_{ij}$ (the offered load has to arrive at node $j$). By introducing new paths (of length greater than two), we cannot decrease the amount of disturbance that we already have on the lines adjacent to nodes $i$ and $j$. All we can do is disturb some other lines that were undisturbed (opposing traffics were equal), and separating two probabilities that were equal (while their sum remains unchanged) can only decrease the entropy.
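Two entropy properties carry the arguments of Lemmas 1 and 2, and both are easy to check numerically. The first is the grouping property of entropy: with group sums $w_1$ and $w_2$, $H(p) = H(w_1, w_2) + w_1 H(p_1/w_1, \dots, p_k/w_1) + w_2 H(p_{k+1}/w_2, \dots, p_n/w_2)$, so when the group sums are fixed the two groups can be maximized separately (one way to see Lemma 1). The second is that two opposing lines with probabilities $p + e$ and $p - e$ contribute $-(p+e)\ln(p+e) - (p-e)\ln(p-e)$, which is strictly largest at $e = 0$ and, for small $e$, smaller by about $e^2/p$, so every additionally disturbed pair of lines costs entropy. A Python sketch with hypothetical function names:

```python
import math


def H(ps):
    """Shannon entropy in natural units (K = 1)."""
    return -sum(x * math.log(x) for x in ps if x > 0)


def grouped_entropy(p, k):
    """Entropy of p computed through the grouping property (groups p[:k] and p[k:])."""
    w1, w2 = sum(p[:k]), sum(p[k:])
    return (H([w1, w2])
            + w1 * H([x / w1 for x in p[:k]])
            + w2 * H([x / w2 for x in p[k:]]))


def pair_entropy(p, e):
    """Contribution of two opposing lines with probabilities p + e and p - e."""
    return -(p + e) * math.log(p + e) - (p - e) * math.log(p - e)


p = [0.1, 0.2, 0.3, 0.4]
print(H(p), grouped_entropy(p, 2))  # equal up to rounding
q = [0.15, 0.15, 0.3, 0.4]          # same group sums, first group made uniform
print(H(q) > H(p))                  # True: equalizing inside a group helps

for e in (1e-3, 1e-2):
    loss = pair_entropy(0.05, 0.0) - pair_entropy(0.05, e)
    print(f"e={e:g}: loss={loss:.3e}, e^2/p={e * e / 0.05:.3e}")
```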
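Lemma 3. For a totally connected network of $n \ge 3$ nodes, offered load $L_{ij}$ and total traffic on the network $T$ (with $L_{ij} < 2T/n$, so that all probabilities remain positive), the entropy is maximized if the traffic is routed along the direct path in the amount $T_1$ and along each of the $n - 2$ paths of length two in the amount $T_2$, where
$$T_2 = \frac{t}{d}\,L_{ij}, \qquad T_1 = L_{ij} - (n-2)\,T_2, \qquad d = \frac{n(n-1)\,L_{ij}}{2T},$$
and $t$ is the unique real root of $(n-2)\,t^3 - d\,t^2 + n\,t - d = 0$.
Proof. From Lemma 2 we know that we only need to consider the direct path and the paths of length two. Since all paths of length two are equivalent, there is no reason to treat any of them differently: all paths of length two will carry the same amount of traffic. The only thing we have to decide is how much of the traffic to send along the direct path and how much along the paths of length two.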
A totally interconnected network of $n$ nodes has $n(n-1)$ lines (one in each direction between any two nodes). Let us first satisfy the condition that the total traffic be equal to $T$. Put an equal amount of traffic on all lines, so that each line carries traffic $T/(n(n-1))$. When we normalize traffic to probabilities, each probability will be $p = 1/(n(n-1))$. This system has maximum entropy, but there is no net traffic on this network: all traffic cancels and there is no flow from node $i$ to node $j$. To push some net flow on line $[k, l]$ we have to increase the probability $p_{kl}$ and decrease the probability $p_{lk}$ by the same amount. The new probabilities will be $p_{kl} = p + e$ and $p_{lk} = p - e$. The net flow on line $[k, l]$ is now $(p + e)T - (p - e)T = 2eT$. We want to push traffic $L_{ij}$ from node $i$ to node $j$. Suppose that traffic $T_1$ goes along the direct path and traffic $T_2$ goes along each path of length two. Then
$$T_1 + (n-2)\,T_2 = L_{ij}.$$
The corresponding changes in probabilities, $e_1$ and $e_2$, follow from the net-flow relation:
$$e_1 = \frac{T_1}{2T}, \qquad e_2 = \frac{T_2}{2T}.$$
We have one direct path, whose two opposing lines now have probabilities $p + e_1$ and $p - e_1$, and $n - 2$ paths of length two, i.e., $2(n-2)$ pairs of opposing lines, each with probabilities $p + e_2$ and $p - e_2$. From the previous two equations we get
$$e_1 + (n-2)\,e_2 = \frac{L_{ij}}{2T}.$$
The entropy function is now:
$$H = -(p+e_1)\ln(p+e_1) - (p-e_1)\ln(p-e_1) - 2(n-2)\left[(p+e_2)\ln(p+e_2) + (p-e_2)\ln(p-e_2)\right] - (n-2)(n-3)\,p\ln p.$$
The last term is for the $(n-2)(n-3)$ lines that are not on any path of length one or two. These lines do not carry any net flow, do not depend on $e_1$ or $e_2$, and are not considered during optimization (the term vanishes when the derivative is taken).
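We want to maximize the entropy, so we substitute $e_1 = L_{ij}/(2T) - (n-2)\,e_2$, take the derivative with respect to $e_2$ and equate it to zero:
$$\frac{\partial H}{\partial e_2} = (n-2)\ln\frac{p+e_1}{p-e_1} - 2(n-2)\ln\frac{p+e_2}{p-e_2} = 0, \quad\text{i.e.,}\quad \frac{p+e_1}{p-e_1} = \left(\frac{p+e_2}{p-e_2}\right)^2.$$
We put $s = e_1/p$ and $t = e_2/p$ and get the equations
$$s = \frac{2t}{1+t^2}, \qquad s = d - (n-2)\,t,$$
where $d = L_{ij}/(2Tp) = n(n-1)L_{ij}/(2T)$. Eliminating $s$, the equation for $t$ is the cubic
$$(n-2)\,t^3 - d\,t^2 + n\,t - d = 0.$$
All probabilities stay positive only if $|s| < 1$ and $|t| < 1$, which requires $d < n - 1$; then $d^2 < 3n(n-2)$ for $n \ge 3$, so the derivative $3(n-2)t^2 - 2dt + n$ of the left-hand side is always positive and the cubic has exactly one real root. Since the left-hand side equals $-d < 0$ at $t = 0$ and $2(n-1-d) > 0$ at $t = 1$, this root lies in $(0, 1)$. In Cardano's formula for this cubic, this corresponds to the quantity $Q$ under the square root being positive, which holds in particular for $n \ge 8$. Because $H$ is a sum of concave functions of $e_2$, this stationary point is the maximum. The solution we were looking for is:
$$T_2 = 2Tp\,t = \frac{t}{d}\,L_{ij}, \qquad T_1 = L_{ij} - (n-2)\,T_2.$$
Proof of Theorem 1. To prove the theorem we only need to investigate the asymptotic behavior of the solution from Lemma 3. When $T \to \infty$, $d \to 0$. In that case the cubic gives $n\,t = d + d\,t^2 - (n-2)\,t^3$, so $t = d/n + O(d^3)$ and
$$T_2 = \frac{t}{d}\,L_{ij} \to \frac{1}{n}L_{ij}, \qquad T_1 = L_{ij} - (n-2)\,T_2 \to \frac{2}{n}L_{ij}.$$
Equivalently, $T_1/T_2 = e_1/e_2 = s/t = 2/(1+t^2) \to 2$, which is the 2:1 ratio stated in Theorem 1.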
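The limit in Theorem 1 can be watched numerically. From the first-order condition $((p+e_2)/(p-e_2))^2 = (p+e_1)/(p-e_1)$ with $t = e_2/p$, one obtains the cubic $(n-2)t^3 - dt^2 + nt - d = 0$, $d = n(n-1)L_{ij}/(2T)$. The Python sketch below solves it by bisection and cross-checks the result against a direct ternary-search maximization of the entropy over $e_2$ (valid because the entropy is concave in $e_2$). It assumes $n \ge 3$ and $L_{ij} < 2T/n$, so all probabilities stay positive; the function names are hypothetical.

```python
import math


def lemma3_split(n, L, T, iters=200):
    """(T1, T2) from the real root t in (0, 1) of (n-2)t^3 - d t^2 + n t - d = 0."""
    d = n * (n - 1) * L / (2 * T)
    if not (n >= 3 and 0 < d < n - 1):
        raise ValueError("need n >= 3 and 0 < L < 2T/n")

    def f(t):
        return (n - 2) * t**3 - d * t**2 + n * t - d

    lo, hi = 0.0, 1.0  # f(0) = -d < 0 and f(1) = 2(n - 1 - d) > 0
    for _ in range(iters):  # bisection; f is strictly increasing here
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    T2 = (lo + hi) / 2 / d * L  # T2 = (t / d) * L
    return L - (n - 2) * T2, T2


def brute_force_split(n, L, T, iters=200):
    """Maximize the network entropy over e2 directly (ternary search; H is concave)."""
    p = 1.0 / (n * (n - 1))
    c = L / (2 * T)  # e1 + (n - 2) e2 = L / (2T)

    def pair(e):  # two opposing lines with probabilities p + e and p - e
        return -(p + e) * math.log(p + e) - (p - e) * math.log(p - e)

    def H(e2):  # undisturbed lines only add a constant and are omitted
        return pair(c - (n - 2) * e2) + 2 * (n - 2) * pair(e2)

    lo = max(-p, (c - p) / (n - 2)) + 1e-12  # keep |e1| < p and |e2| < p
    hi = min(p, (c + p) / (n - 2)) - 1e-12
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if H(m1) < H(m2):
            lo = m1
        else:
            hi = m2
    e2 = (lo + hi) / 2
    return 2 * T * (c - (n - 2) * e2), 2 * T * e2


for T in (10.0, 1e3, 1e6):
    T1, T2 = lemma3_split(5, 1.0, T)
    print(f"T={T:g}: T1={T1:.6f}  T2={T2:.6f}  T1/T2={T1 / T2:.6f}")
```

For $n = 5$ and $L_{ij} = 1$, the ratio $T_1/T_2$ is about 1.92 at $T = 10$ and approaches 2 from below, with $T_2 \to 0.2 = L_{ij}/n$, as $T$ grows.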