An Efficient Mixed Integer Linear Programming Model for the Minimum Spanning Tree Problem

Finding a minimum spanning tree in a given network is a famous combinatorial optimization problem that appears in different engineering applications. Even though this problem is solvable in polynomial time, having efficient mathematical programming models is important as they can provide insights for formulating larger models that integrate other decisions in more complex applications. In the literature, there are ten different integer and mixed integer linear programming (MILP) models for this problem. They are variants of set packing, cuts, network flow and node level formulations. In addition, this paper introduces an efficient node level MILP model. Comparisons for the eleven models are provided. First, the models are compared in terms of the number of decision variables and the number of constraints. Then, computational comparisons using a commercial MILP solver on sets of randomly generated instances of different sizes are conducted. Results provide evidence that the proposed MILP model is competitive in terms of the computational time needed for proving optimality of generated solutions for instances with up to 50 nodes. Meanwhile, the LP relaxation of a multi-commodity flow MILP model which has integer polyhedron provides stable computational times despite its larger size.


Introduction
For a given undirected, connected network W = (N , A), where N is the set of nodes, A is the set of arcs and l a is the length of arc a ∈ A, the minimum spanning tree (MST) is a subset of arcs M ⊂ A that connect all nodes in N such that ∑ a∈M l a is minimum.To illustrate, Figure 1 shows a sample undirected, connected network in which N = {1, 2, 3, 4, 5, 6}, and A = {{i, j} : i, j ∈ N and i = j}.Each arc a ∈ A is represented in Figure 1 using a dashed line, and its associated length (l a ) is written on top of the dashed line.Figure 2 shows four different selected subsets of arcs which are represented by solid lines.In Figure 2a, the selected arcs do not form a spanning tree as they do not connect all nodes in the network.In Figure 2b, the selected arcs connect all nodes but they form the cycle 1-2-4-5-1.In cycles, an arc can be unselected to reduce the summation of the selected arcs' lengths while the involved nodes remain connected.Therefore, this set of arcs does not represent a minimum spanning tree.In Figure 2c, the selected arcs span all nodes in the network without forming a cycle, yet the summation of the selected arcs' lengths is not minimum.Meanwhile, Figure 2d shows selected arcs that form a spanning tree having the minimum summation of the arcs' lengths.Therefore, M = {{1, 2}, {1, 5}, {1, 6}, {3, 4}, {4, 5}}.1: (a) selected arcs are not sufficient to form a spanning tree that connects all nodes; (b) selected arcs are spanning all nodes but they form a cycle; (c) selected arcs form a spanning tree that does not have the minimum total arc length; and (d) selected arcs form a minimum spanning tree.
The problem of finding M is a famous combinatorial optimization problem for which polynomial time algorithms exist [1][2][3][4].As explained in [5], this problem appears in different engineering and service applications, particularly in designing computer, telecommunications, transportation and water supply networks.In addition, it has a number of computational applications such as clustering of data points in a plane [6], handwriting recognition [7] and providing approximate solutions for the traveling salesman problem [8].Some recent applications include cell nuclei segmentation [9], Alzheimer's classification [10], water looped network equilibrium [11] and characterizing local urban patterns [12].
As indicated by Magnanti and Wolsey [5], having alternate linear programming models for the problem of finding MST can offer theoretical and algorithmic insights into solving the problem.This can be extended to a larger application in which finding an MST is a subproblem.In the literature, there are ten different integer and mixed integer linear programming models for the MST problem.They are variants of set packing, cuts, network flow and node level formulations.The set packing formulation is based on defining constraints that restrict the cardinality of any selected subset of arcs not to exceed the number of connected nodes less than one.These constraints are used for eliminating cycles.In cuts formulations, a cut set is a subset of arcs that connect two subsets of nodes.Constraints that define lower bounds on the number of arcs in cut sets are used to ensure that all nodes are connected.The network flow formulations utilize artificial decision variables to represent quantities of commodity flows along the arcs.By maintaining consistent flow starting at a source node and ending at a destination node along the network, a tree is constructed.In the node level formulations, a level decision variable is associated with each node.Constraints that restrict the inclusion of arcs connecting a lower level node to a higher level one are used for eliminating cycles.
Some of the models are notorious for their combinatorial explosion as a result of their exponential number of constraints with respect to the number of nodes.Other models aim to have polynomial number of constraints while maintaining integer polyhedron for their linear programming relaxations.Such a property reduces the computational effort by directly using polynomial-time linear programming algorithms without the need for the non-polynomial MILP solution algorithms.However, this does not necessarily guarantee finding an optimal solution in reduced computational time.Therefore, any comparison between the models has to take into consideration two aspects: the first one is the size of the model represented by the number of decision variables and the number of constraints; and the second one being the computational time needed to prove optimality of a generated solution.
This paper presents a node level MILP model and compares it with existing models in terms of the number of decision variables and constraints.Numerical experiments are conducted using randomly generated instances of various sizes to provide computational comparisons between a selected subset of the models that have polynomial number of constraints in the size of the network represented by the number of nodes.IBM ILOG CPLEX version 12.8 [13] (IBM, Armonk, NY, USA) is used to generate solutions to these instances based on the compared MILP models.The models are entered to CPLEX using the optimization programming language (OPL) [14].The computational time needed for each model to prove optimality of a generated solution is used for comparison.
The rest of this paper is organized as follows.A review is provided in Section 2 for the existing integer and mixed integer linear programming models of the problem of finding MST.The proposed model is presented along with a proof for its validity in Section 3. In Section 4, a comparison is provided for the two aspects mentioned above; followed by the conclusion in Section 5.

Review of Existing Models
This section reviews the different integer and mixed integer linear programming models developed in the literature for the problem of finding M. The first nine models are provided in [5], and the tenth model is presented in [15].They are presented here with different exposition and insights, and for the sake of the integrality of this paper.To simplify comparisons between models, we consider a network in which |A| = n(n − 1)/2 where n = |N |, i.e., for every pair of nodes i and j ∈ N , there is an arc {i, j} ∈ A.

Cycle Elimination Model
The first model is a direct intuitive mathematical expression of the definition of the MST.A binary decision variable x a is used to represent the decision of whether arc a ∈ A will be included in the tree (x a = 1) or not (x a = 0).Accordingly, the first model is formulated as follows.
Minimize ∑ a∈A l a x a (1) Subject to: The set of constraints T 1 = {(2)-( 4)} defines the domain of the decision variables of the first model.Constraint (3) represents the condition that a tree in W has exactly n − 1 arcs.Constraints (4) take the form of set packing constraints, and they are referred to as cycle elimination constraints.Eliminating cycles is achieved by limiting the number of arcs included in the sub-tree connecting the nodes in every subset S not to exceed |S| − 1.Here, A(S) ⊂ A is the subset of all arcs connecting the nodes in S. Logically, all the aforementioned constraints ensure that all nodes in the network will be connected.The linear programming (LP) relaxation of this model has integer polyhedron.Even though this model has exponential number of cycle elimination constraints (2 n − n − 2), it has interesting theoretical properties that can be used to prove the optimality of the greedy algorithm in [2], as illustrated in [5].

Cut Sets Models
The second model is based on defining cut sets.For a non-empty subset S ⊂ N , a cut set is a subset of arcs, denoted C(S), that whenever removed from W will disconnect the nodes in S from the nodes in N − S.This model has definitions of the decision variables and the objective function that are similar to the first model.The set of constraints T 2 = {(2), (3), (5)} defines the domain of the decision variables of the second model.

∑ a∈C(S)
x a 1 ∀S ⊂ N where |S| ≥ 1 In T 2 , the cycle elimination constraints are replaced by connectivity constraints which ensure that for all subsets S ⊂ N , the size of the cut set is at least one.The latter set of constraints is also exponential in size which equals 2 n−1 − 1 after excluding duplicate cut sets.For n > 3, the number of Constraints ( 5) is less than the number of Constraints ( 4), yet it is proven in [5] that the LP relaxation of T 2 does not have integer polyhedron.
The third model is based on defining multicuts.For some K where 1 The multicut model utilizes the same decision variables and objective function as the first two models.Let C(S 1 , S 2 , . . ., S K ) ⊂ A be the arcs that whenever removed from W will result in K separate cliques.The set of Constraints T 3 = {(2), (3), (6)} defines the domain of the decision variables.∑ a∈C(S 1 ,S 2 ,...,S K ) Constraints (6) maintain connectivity in the MST for all node partitions satisfying the aforementioned properties.When K = 2, Constraints (6) reduce to Constraints (5).It is proven in [5] that, for K > 2, the LP relaxation of T 3 has integer polyhedron which makes it advantageous over the single cut domain T 2 .However, no clue is provided for the upper limit of K at which feasible trees can be obtained.The following proposition states an important property of this model.Proposition 1.The set of constraints T 3 does not prevent cycles when K = n − 1.
Proof.Observe that, for an arbitrary non-empty subset of nodes σ ⊂ N , the set of constraints T 3 , when K = n − 1, does not prevent the following condition: x {i,j} = 0 Therefore, cycles can be formed.
Regarding the size of the third model, the number of Constraints ( 6) is the Stirling number of the second kind which is denoted { n K } [16].It is known that { n n−1 } = n(n − 1)/2, which is polynomial, but there is no guarantee for feasible trees as indicated by Proposition 1.Meanwhile, { n K } is exponential in n when 2 ≤ K < n − 1.

Digraph Models
The MST is equivalent to a set of shortest paths that start at an arbitrary root node and connect it with every other node in the network.A model that finds such shortest paths with the minimum total length for all paths finds an MST as well.Based on this concept, the fourth model is defined.A digraph G = (N , D) is constructed by replacing every arc a = {i, j} ∈ A in W = (N , A) by directed arcs (i, j) and (j, i) ∈ D, both having a length of l a .For every pair of arcs (i, j) and (j, i) ∈ D, variables z (i,j) and z (j,i) are defined, respectively.An arc a ∈ A is included in M whenever either z (i,j) or z (j,i) equals 1. Constraints (8) define this relationship, and Constraints ( 9) and ( 10) define, respectively, the limit on the number of included arcs in M and the cycle elimination constraints.
where D(S) ⊂ D is the subset of all directed arcs connecting the nodes in S. Let α ∈ N denote the arbitrarily selected root node, and N − = N − {α}.To construct the set of shortest paths, it should be exactly one arc coming into every node in G except the root node.Constraints (11) represent this condition.
Meanwhile, the root node should not have any incoming arc from any other node.This is implied by Constraints ( 9) and ( 11).The fourth model, which is represented by the Objective Function (1) and the set of Constraints T 4 = {(2), (7)-( 11)}, is a directed version of the first model.It does not reduce the number of constraints in compensation of the increase in the number of decision variables.However, it may provide some insights for algorithmic approaches.
The fifth model is a directed version of the second model that is based on the digraph G = (N , D), arbitrarily selected root node α and the decision variables z (i,j) and z (j,i) .The set of constraints T 5 = {(2), ( 7)-( 9), ( 12)} defines the domain of the decision variables in the fifth model.
where D(S) ⊂ D is the subset of all directed arcs that originate from the nodes in the subset S ⊂ N − and end at nodes in the subset N − S in the digraph G.It is proven in [5] that the LP relaxation of the fifth model has equivalent polyhedron to the LP relaxation of the fourth model, and both are equivalent to the first model.

Single Commodity Flow Model
The sixth model is based on defining fictitious flow variables through the arcs of the network.It is presented in [5] as a single commodity flow model in which one arbitrary node is selected as the source node and the other nodes are defined as demand nodes.Let α denote the source node and N − = N − {α}.For each arc a = {i, j} ∈ A connecting nodes i and j, two non-negative decision variables, y i,j and y j,i , are defined to represent quantity of commodity flow from node i to node j and vice versa.At every node i ∈ N −1 , the demand is exactly one unit.Therefore, the difference between inflows and outflows must be exactly one.Meanwhile, the supply quantity at the source node α has to be exactly n − 1.These flow conservation constraints are formulated by Equations ( 13) and (14).
∑ {α,j}∈A y i,j and y j,i ≥ 0 ∀{i, j} ∈ A In the sixth model, The MST solution is captured by the binary decision variables {x a : a ∈ A} just like the first three models.For arc a = {i, j}, y i,j = 0 and y j,i = 0 whenever x a = 0. Constraints ( 16) and ( 17) represent this relationship.
where (n − 1) is the upper bound on the amount of flow on any arc throughout the network.Whenever x a = 1, Constraints ( 13) and ( 14) force y i,j or y j,i to take a value greater than zero as needed.The model defined by Objective Function (1) and the set of constraints T 6 = {(2), (3), ( 13)-( 17)} is based on a common approach used in formulating routing problems in which tying flow variables to binary arc inclusion-exclusion variables, along with flow conservation constraints, is used to replace the cycle elimination constraints.As shown in [5], the LP relaxation of this model results in non-integer solutions.The main advantage of this model compared to the previous ones is that it has polynomial number of constraints.

Multicommodity Flow Models
Based on the idea of the equivalence of the MST and the set of shortest paths from an arbitrary source node, the seventh model is defined.On the digraph G = (N , D), defined in Section 2.3, starting at the root node α, a single unit of commodity k is to be sent to node k ∈ N − .The decision variable f k d is used to represent the flow quantity along the directed arc d ∈ D of commodity k.The following flow conservation equations define part of the constraints on these variables.
By Constraints (18), every node k ∈ N − receives exactly one unit of commodity k which originates at the source node by Constraints (19), and flows through other nodes without being consumed by Constraints (20).Similar to the single commodity flow model, the flow decision variables are tied to the binary arc inclusion-exclusion decision variables z (i,j) by Constraints (22).
Accordingly, the seventh model is defined by Objective Function (1) and the set of constraints T 7 = {(2), ( 7)-( 9), ( 18)-( 23)}.It is shown in [5] that, based on the maximal flow-minimal cut theorem [17], the LP relaxation of the seventh model has an integer polyhedron identical to the fifth model.
The eighth model is an undirected version of the seventh model defined over the network W = (N , A).Similar to the single commodity flow model, the decision variables f k (i,j) and f k (j,i) are defined on arc {i, j} ∈ A. The integer decision variables z d of the seventh model are not used in the eighth model.The flow decision variables f k (i,j) and f k (j,i) are tied to the binary variables x a as represented by Constraints (24).
The eighth model is constituted by Objective Function (1) and the set of constraints T 8 = {(2), (3), ( 18)-( 21), (24)}.This reduces the number of decision variables compared to the seventh model without affecting the polyhedron of the LP relaxation as shown in [5].
Based on the eighth model, Magnanti and Wolsey [5] suggested a ninth model in which Constraints (24) are replaced by weaker Constraints (25).
However, Constraints (25) are found to be insufficient for generating feasible solutions as Constraints (26) have to be considered as well.
Therefore, the ninth model is defined by the set of Constraints T 9 = {(2), (3), ( 18)-( 21), (25), (26)}.Based on the maximal flow-minimal cut theorem, it is shown in [5] that the polyhedron of the LP relaxation of the ninth model is equivalent to the polyhedron of the LP relaxation of the second model.

Node Level Model
The node level formulation is based on the idea of assigning level decision variables for the nodes and using them to eliminate cycles.This is done by introducing level difference constraints that are tied to the arc inclusion/exclusion binary decision variables.This approach was originally introduced by Miller, Tucker and Zemlin for the traveling salesman problem [18].The tenth model presented here was introduced in [15] using LINGO mathematical programming modeling language.It utilizes the digraph G = (N , D) and the binary decision variables z d for d ∈ D defined in Section 2.3.Additional decision variables are defined to represent the node level in the tree, which are denoted V i for i ∈ N .An arbitrary root node α is selected and assigned a zero level.For other nodes k ∈ N − {α}, if z (i,k) = 1, the level of node k is higher than the level of node i by 1.The following constraints represent these relationships.
In addition, two domain constraints are defined for the level decision variables as follows.
To make sure that all nodes are connected while maintaining their level relationships, every node k ∈ N − {α} must have exactly one arc coming into it, while the root node α must have at least one arc going out of it.
Accordingly, the tenth model is represented by the set of Constraints T 10 = {(8), (27)-(33)}.No proof is provided in [15] for the validity of this model.However, this can be deduced since Constraints (32) and (33) ensure that all nodes are connected, and the level Constraints (27) to (30) eliminate cycles.It is important to note here that the structure of Constraints (28) along with Constraints (31)-(33) allow only one of the arcs (i, j) and (j, i) to be included in the solution but not both.

The Proposed MILP Model
The proposed eleventh model is another variant of the node level formulation.It utilizes the digraph G = (N , D) and the binary decision variables z d for all d ∈ D as defined in Section 2.3.The main idea behind the proposed model is based on the analogy between rivers which have multiple sources and a single sink and trees which have multiple branches and a single trunk.Neither rivers nor trees not have cycles.In the proposed model, an arbitrary node ω ∈ N is labeled as a sink node.Other nodes represent either sources of water or intermediate intersection points.In both cases, water only flows from a higher level node to a lower level one.We define decision variables V i for all i ∈ N to represent the water level at each node.The sink node has the lowest level, which is set to zero, while the other nodes have levels that must be greater than zero.Some constraints are introduced to link the z d variables with the level variables.We start with the following formulation.

∑
(ω,j)∈D z (ω,j) = 0 (36) V ω = 0 (38) The Objective Function (34) is the same as (1), defined using the decision variables z d , where l d for both d = (i, j) and d = (j, i) equals l a for a = {i, j}.Constraints (35) represent the condition that there is exactly one arc going out of every node i ∈ N − {ω}.Meanwhile, Constraint (36) ensures that the sink node does not have any arcs going out of it.These constraints imitate directions of water flow in rivers, and they ensure that the number of arcs included in the solutions is n − 1, which is equivalent to Constraint (9).
Constraints (37) define the relationship between the binary decision variables z (i,j) and the corresponding level variables V i and V j .It is formulated to make sure that if z (i,j) = 1, the level at node i is greater than the level at node j by one; otherwise, this constraint will be redundant.Constraints (38) sets the level of the sink node to zero.Constraints (39) are the domain constraints for the variables z d .The following theorem is necessary to proof the validity of the proposed model.Proof.(by contradiction) Suppose without loss of generality that, for the subset of nodes S = {α, α + 1, . . ., ν} ⊆ N − {ω}, where |S| ≥ 2, a cycle is formed by having z (ν,α) = 1 and z (i,i+1) = 1 ∀i, i + 1 ∈ S. Therefore, according to Constraints (37), V ν ≥ V α + 1 and V i ≥ V i+1 + 1 ∀i, i + 1 ∈ S. The latter leads to V α > V ν which contradicts the former.Proof.As a result of Theorem 1 and since Constraints (35), (36) and (39) ensure that there is exactly one arc going out from every node i ∈ N − {ω}, there has to be at least one arc coming into node ω.Therefore, a valid acyclic tree in which all nodes are connected is obtained by Constraints (35)-(39).
Therefore, the MILP model presented by Equations ( 34)-( 39) is sufficient to find a minimum spanning tree.However, additional bounding constraints for the decision variables help in reducing the number of iterations needed by a solution algorithm.Therefore, we define Constraints (40) and (41) to provide lower and upper bounds for the level decision variables.
In addition, valid inequalities [19] can be defined to strengthen the proposed formulation.Logically, since there must be at most one arc connecting every pair of nodes, the following constraints are added.
Proof.Since Formulations (34)-(39) results in solutions that do not contain cycles, as stated in Theorem 1 and Corollary 1, for every pair of nodes i and j ∈ N − {ω} either z (i,j) = 1 or z (j,i) = 1 but not both.Since Inequality (42) are mathematical representations of these conditions, they are valid inequalities for Formulations (34)-(39).
In addition, there must be at least one arc coming into the sink node ω.Since this is implied by Constraints (35) and (36), the following constraint is another valid inequality for Formulations (34)-(39).
Therefore, the set of constraints T 11 = {(35)-(43)} defines the domain of the decision variables of the proposed model.

Models Comparison
The first aspect to consider in comparing the eleven models presented herein is the memory size needed by an MILP solution algorithm, which is represented by the number of decision variables and the number of constraints.This aspect not only defines the memory requirements, but also reflects the computational time needed in each iteration of the solution algorithm.The second aspect is the computational time needed to prove optimality.Both aspects are not necessarily correlated.The following sections discuss these two aspects.

Model Size
A comparison of the characteristics of the eleven models and their sizes is presented in Table 1.For each model, an abbreviated name is assigned in Table 1 which is used in the following analysis, tables and figures to provide a meaningful reference.The size of each model is represented in Table 1 by the total number of variables, the number of integer variables and the number of constraints.The number of arcs in the network, A = n(n − 1)/2, is used to simplify the formulas used for the calculated numbers.Following the convention of the MILP solvers, the non-negativity, integer and binary constraints for all models are not counted.
As shown in Table 1, the first to fifth models suffer from the combinatorial explosion as a result of their exponential number of constraints.For the remaining models, the number of decision variables and constraints are polynomial in terms of the number of nodes.With respect to the number of decision variables, the tenth (D-level) and the proposed models have the lowest number of decision variables among the models with polynomial number of constraints.In terms of the number of integer variables, the LP relaxations of the seventh (D-m-flow) and eighth (m-flow-1) models have obvious advantage over the other models since they are proven to provide integer solutions without the need to explicitly define integer variables.This allows for solving them using polynomial time linear programming algorithms instead of the non-polynomial branch-and-bound or branch-and-cut algorithm.However, this does not necessarily mean that the computational time will be less as it is also affected by the model size.For the other models with polynomial number of constraints, the ninth (m-flow-2) model has the lowest number of integer variables.This may help in reducing the number of branch-and-bound iterations needed; however, again, it does not necessarily mean that the computational time is less.
The sixth (D-1-flow) model has the lowest number of constraints, followed by the tenth (D-level) and proposed models, respectively.The differences in the number of constraints between the D-1-flow, D-level and the proposed models are not significant, while the D-m-flow, m-flow-1 and m-flow-2 models have comparatively larger number of constraints.To demonstrate, Figure 3 presents a comparison between selected models in terms of their number of constraints at different values of the number of nodes.The difference between the number of constraints of the D-m-flow and the m-flow-2 models is A as given in Table 1, which is comparatively small.This explains the overlapping curves for these two models in Figure 3. Similarly, the difference between the number of constraints of D-1-flow and D-level models is n − 2 which is relatively small.
To further demonstrate the differences between the models with respect to their number of constraints, consider an instance with 500 nodes.The proposed model has a number of constraints that is 374,752 as opposed to 250,001 and 250,500 of D-1-flow and D-level models, respectively.Meanwhile, D-m-flow and m-flow-2 models are above 124 million constraints and their LP relaxations are almost 125 million constraints, since in the LP relaxations additional bounding constraints for binary variables are added.For m-flow-1 model, this number exceeds 3 billion.

Computational Time
The computational time needed to prove optimality of a generated solution is an important criterion that needs to be considered in comparing mathematical programming models.This comparison have to be done using a state-of-the-art software that implements efficient algorithmic techniques.IBM ILOG CPLEX version 12.8 fits this requirement as it implements concurrently a mix of primal/dual simplex and barrier methods for solving linear programming models.For MILP models, CPLEX implements branch-and-bound, branch-and-cut and dynamic search algorithms in addition to pre-processing algorithms for model reductions and heuristic methods for speeding up the search [13].For linear programming models, an optimality check is performed at each step, and the iterations continue until a proof of optimality is reached.For MILP models, a proof of optimality is concluded when the lower bound equals the upper bound.All computational experiments in this paper were conducted on a laptop computer with Intel Core i7 processor running at 2.70 GHz with 2 cores, 4 logical processors and 8 GB physical memory, under Microsoft Windows 10.
Random instances are generated by assigning two random coordinates for each node.The arc length between any two nodes equals the Euclidean distance between them based on their coordinates.This approach maintains the triangular inequality property in the randomly generated instances, which is commonly found in most, if not all, applications.The randomly generated coordinates and the calculated distances are prepared in Microsoft Excel and loaded through the interface provided by IBM ILOG CPLEX Optimization Studio.The loading time for the data as well as the model loading time are taken into consideration in the recorded computational times.
Preliminary experiments with less than 20 nodes do not show any significant differences in the computational times needed to prove optimality of generated solutions based on the models presented in this paper.Since the number of constraints of the first to fifth models is quite large for instances with 20 or more nodes (exceeds half million constraints), they are not considered in the computational comparison.Since the LP relaxations of the seventh (D-m-flow) and eighth (m-flow-1) models, denoted LP (D-m-flow) and LP (m-flow-1), respectively, have integer polyhedrons, they are used in the computational comparisons instead of their original MILP models.

Results for 30-Node Instances
The first set of experiments is based on 30-node instances.The coordinates are randomly generated integers following a uniform distribution between 1 and 100.For 100 randomly generated instances, Table 2 presents some descriptive statistics for the total computational time in seconds, and Figure 4 provides a comparison between the Box and Whisker plots of the models.The logarithm to the base 10 of the computational times is presented in Figure 4 for a better clarity of the distribution of the points.
Based on the statistics provided in Table 2, it is evident that the proposed model provides the best performance in terms of the average computational time, followed by the LP relaxation of the D-m-flow model and the D-level model respectively, both with slight differences.However, the LP relaxation of the D-m-flow model has the lowest standard deviation which makes it more robust to changes in the input data.Despite the relatively large variability of the computational times of the D-level and the proposed models, they both have median computational times that are almost half the median computational time of the LP relaxation of the D-m-flow model.This indicates that, in most cases, both models need less computational times, which occurred in 87 and 86 instances for the D-level and the proposed models, respectively, as the detailed results reveal.Meanwhile, the proposed model achieved less computational times in 74 instances compared to the D-level model.
The statistics of the D-1-flow model indicate that it has both higher average and standard deviation values for the computational time when compared to the LP relaxation of the D-m-flow, the D-level and the proposed models.Even though it is capable in some cases to find an optimal solution faster than these three models, which occurred in 12 cases, its computational time in 12 cases was more than twice the computational time of any one of the other three models.As shown in Figure 4, It is evident that the m-flow-2 model results in the worst computational time that went over two hours for one of the instances.Furthermore, its standard deviation of the computational time is the highest, which makes it unfavorable candidate when compared to the LP relaxation of the D-m-flow model which has slightly larger number of constraints.Similarly, even though the LP relaxation of the m-flow-1 model is a slight modification of the D-m-flow model, such a modification does not provide any benefit in terms of reducing the computational time.

Results for 40-Node Instances
The second set of experiments excludes both the LP relaxation of the m-flow-1 model and the m-flow-2 model from the comparison as based on the 30-node instances, they are not expected to provide competitive computational times.In the second set of experiments, 100 instances of 40 nodes are randomly generated.Similar to the previous set, node coordinates are random integers generated using uniform distribution between 1 and 100.Table 3 presents some descriptive statistics for the total computational time in seconds, and Figure 5 provides a comparison between the Box and Whisker plots of the models.The results of the 40-node set is coherent with the previous 30-node set.As shown in Table 3, the proposed model provides the best average computational time followed by the LP relaxation of the D-m-flow model and the D-level model respectively.Meanwhile, the performance of the D-1-flow model is comparatively very poor on average, it has the highest median and standard deviation values, and its computational time exceeded 1.5 h in one instance.
As illustrated in Figure 5, the LP relaxation of the D-m-flow model has the lowest variability in computational time; nevertheless, its median is more than double the medians of the proposed and the D-level models, which conforms with the results of the 30-node instances.The detailed results reveal that the proposed model is capable of achieving less computational time than the LP relaxation of the D-m-flow model in 81 instances, and less than the D-level model in 78 instances.

Results for 50 and 60-Node Instances
The remaining computations in this paper are intended to compare the proposed model with the LP relaxation of the D-m-flow model.The advantage of the proposed model is attributed to its smaller size, yet the algorithmic techniques used for solving it are not polynomial in time.On the other hand, the latter model is easier to solve using polynomial time algorithms, but due to its larger size, the computational time needed can be larger than the proposed model.Therefore, the intention of conducting computational experiments with 50-node and more than 50-node instances is to investigate and compare the trends of the computational times for both models.
Starting with 50-node instances, the coordinates of the nodes are randomly generated integers uniformly distributed between 1 and 500.Increasing the scale for the generated coordinates compared to previous instances is done to reduce the chances of having duplicate nodes with the same coordinates.However, this does not have any effect on the computational time.Table 4 presents some descriptive statistics for the total computational time in seconds, and Figure 6 provides a comparison between the Box and Whisker plots of the two models for all problem sizes investigated so far.
The statistics provided in Table 4 show that the median of the computational times of the proposed model is a little bit higher than half the median of the LP relaxation of the D-m-flow model which is not significantly different from the results of smaller size instances.However, the proposed model results in an average computational time that is higher than that of the LP relaxation of the D-m-flow model which represents a shift from the results of smaller-size instances.However, in general, the relative distributions of the computational time results of the two models for the 50-node instances are not significantly different from those of smaller instances, as depicted in Figure 6.For the 50-node instances, the proposed model achieved less computational times in 74 instances.Even though this number represents a relatively large proportion, it is less than the figures obtained for small size instances, which are 86 and 81 for the 30-node and 40-node instances, respectively.This signifies a downward trend for the proposed model.
To further investigate this trend, additional instances with 60 nodes are randomly generated.Similar to the 50-node instances, the coordinates of the nodes are randomly generated integers uniformly distributed between 1 and 500.only 30 randomly generated instances are sufficient to reveal the deterioration of the performance of the proposed model.It is capable of achieving less computational time in only 13 instances.The median of the computational time of the proposed model is found to be 108.2s, while, for the LP relaxation of the D-m-flow model, it is found to be 71.49s.Even though the proposed model can achieve relatively low computational times-less than 10 s in 10 cases-it takes more than 5 min in eight cases with slow convergence rate to a proven optimality.The LP relaxation of the D-m-flow model on the other hand is very stable in its computational time requirement with a mean of 73.42 s and relatively small standard deviation of 9.63 s.

Conclusions
Combinatorial optimization problems are characterized by logical relationships that can be represented mathematically using different models.This paper follows an assessment approach for such models that is based on two aspects.The first is the memory size needed; and the second is the computational time needed by a solution algorithm to prove optimality.This paper presents a node level MILP model for the famous combinatorial optimization problem of finding a minimum spanning tree in a given network.Despite the ten existing models in the literature and the polynomial time solution algorithms for this problem, introducing a model with a modified structure can provide more insights for modeling and solving larger problems in which the MST problem is a subproblem.
The proposed model is based on the analogy between rivers and trees, as both do not contain cycles.It is proven that the proposed model will result in valid minimum spanning trees.In terms of the model size, the proposed model has the third lowest number of constraints, which increases polynomially with the number of nodes.Meanwhile, the number of decision variables of the proposed model is the second lowest, which also increases polynomially with the number of nodes.
To compare the proposed model with the other models in terms of the computational time needed to prove optimality of generated solutions, computational experiments were conducted using a state-of-the-art commercial MILP solver.The computational comparison focused only on models that have polynomial number of constraints.These computational experiments were done sequentially using increasing number of nodes in a graph to filter out inefficient models early.Starting with 30-node instances, two multi-commodity flow-based models were found to be inefficient and were excluded from the next set of 40-node instances.The computational results for the 40-node instances showed that the proposed model produces the best results and is competitive with a linear programming relaxation of a digraph multi-commodity flow model.These two models were investigated further using 50-node instances, for which the results showed that the proposed model achieved less computational time in 74 cases out of 100.However, since that figure is found to be decreasing when compared to the previous results of 30-and 40-node instances, additional experiments with 60-node instances were conducted.The latter results suggest that the proposed model may not be efficient with instances of more than 50 nodes.For larger instances, the computational results show slow convergence for the lower and upper bounds.On the other hand, the LP relaxation of the digraph multi-commodity flow model demonstrated stable computational time results; however, its size is relatively large.

Figure 2 .
Figure2.Subsets of selected arcs connecting nodes in the sample network of Figure1: (a) selected arcs are not sufficient to form a spanning tree that connects all nodes; (b) selected arcs are spanning all nodes but they form a cycle; (c) selected arcs form a spanning tree that does not have the minimum total arc length; and (d) selected arcs form a minimum spanning tree.

Figure 3 .
Figure 3.Comparison between selected models based on the number of constraints.

Figure 4 .
Figure 4. Box and Whisker plots for the distribution of computational times in seconds of selected models for the 30-node instances.

Figure 5 .
Figure 5. Box and Whisker plots for the distribution of computational times in seconds of selected models for the 40-node instances.

Figure 6 .
Figure 6.Box and Whisker plots for the distribution of computational times in seconds of the LP relaxation of the D-m-flow model and the proposed model for the 30-, 40-and 50-node instances.

Table 1 .
Model characteristic and size comparison.

Table 2 .
Descriptive statistics of the computational times in seconds for the 30-node instances.

Table 3 .
Descriptive statistics of the computational times in seconds for the 40-node instances.

Table 4 .
Descriptive statistics of the computational times in seconds for the 50-node instances.