Article

Faster and Better Nested Dissection Orders for Customizable Contraction Hierarchies

Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
*
Author to whom correspondence should be addressed.
Algorithms 2019, 12(9), 196; https://doi.org/10.3390/a12090196
Submission received: 29 June 2019 / Revised: 26 August 2019 / Accepted: 4 September 2019 / Published: 16 September 2019
(This article belongs to the Special Issue Graph Partitioning: Theory, Engineering, and Applications)

Abstract

Graph partitioning has many applications. We consider the acceleration of shortest path queries in road networks using Customizable Contraction Hierarchies (CCH). CCH is based on computing a nested dissection order by recursively dividing the road network into parts. Recently, two flow-based graph bipartitioning algorithms, FlowCutter and Inertial Flow, have been proposed for road networks. While FlowCutter achieves high-quality results and thus fast query times, it is rather slow. Inertial Flow is particularly fast due to the use of geographical information while still achieving decent query times. We combine the techniques of both algorithms to achieve more than six times faster preprocessing than FlowCutter and even faster queries on the Europe road network. We show that, using 16 cores of a shared-memory machine, this preprocessing needs four minutes.

1. Introduction

The goal of graph partitioning is to divide a graph into a given number of roughly equally sized parts by removing a small number of edges or nodes. Graph partitioning has many practical applications, such as accelerating matrix multiplication, dividing compute workloads, image processing, circuit design, and the focus of this work, accelerating shortest path computations in road networks. For an overview of the state-of-the-art in graph partitioning, we refer the reader to a survey article [1].
Computing shortest paths in road networks is a fundamental building block in applications such as navigation systems (e.g., Google Maps), logistics planning, and traffic simulation. Unfortunately, Dijkstra's algorithm [2] takes over a second for a single query on continental-size road networks with tens of millions of nodes, rendering it infeasible for interactive scenarios. This has led to a large amount of research on speedup techniques [3], which often use an expensive preprocessing phase to enable fast queries. Arc-Flags [4,5] is one of the early techniques that use graph partitioning. It has frequently been used to enhance other techniques, e.g., SHARC [6] (combining shortcuts [7] and Arc-Flags), ReachFlags [8] (combining Reach [9] and Arc-Flags), and CHASE [8] (combining Contraction Hierarchies [10] and Arc-Flags). In navigation systems, the graph topology changes infrequently, but the metric (arc weights) changes frequently, e.g., due to traffic congestion or road closures. To accommodate this, modern speedup techniques split the preprocessing into an expensive metric-independent preprocessing phase and a fast metric-dependent customization phase. The two state-of-the-art techniques are Multilevel Overlays, also known as Customizable Route Planning [11], which use nested k-way partitions, and Customizable Contraction Hierarchies (CCHs) [12], which use nested dissection orders [13]. In this work, we focus on CCHs, which extend classic two-phase Contraction Hierarchies [10] to a three-phase approach with customization.
Contraction Hierarchies simulate contracting all nodes in a given order and insert shortcut arcs between the neighbors of a contracted node. These shortcuts represent paths via the contracted node. Shortest st path queries are answered by, e.g., a bidirectional Dijkstra search [2] from s and t, which only considers shortcut and original arcs to higher-ranked nodes. Thus, nodes which lie on many shortest paths should be ranked high in the order. Customizable Contraction Hierarchies [12] use contraction orders computed via recursive balanced node separators (nested dissection) in order to achieve a logarithmic search space depth with few added shortcuts. Node separators are considered to lie on many shortest paths, as any path between the separated components crosses the separator. The weights of the contraction hierarchy can then be quickly customized to any metric, allowing, e.g., the incorporation of real-time traffic information. The running time needed for the customization and the shortest path queries depends on the quality of the computed order. Previously proposed partitioning tools for computing separators in road networks include FlowCutter [14], Inertial Flow [15], KaHiP [16], Metis [17], PUNCH [18], and Buffoon [19]. KaHiP and Metis are general-purpose graph partitioning tools. PUNCH and Buffoon are special-purpose partitioners, which aim to exploit geographical features of road networks such as rivers or mountains. Rivers and mountains induce very small cuts and were dubbed natural cuts in [18]. PUNCH identifies and deletes natural cuts, then contracts the remaining components and subsequently runs a variety of highly randomized local search algorithms. Buffoon incorporates the idea of natural cuts into KaHiP, running its evolutionary multilevel partitioner instead of the flat local searches of PUNCH. In [14], it was shown that FlowCutter is also able to identify and leverage natural cuts. Inertial Flow is another special-purpose partitioner; it directly uses the geographic embedding of the road network.
We combine the idea of Inertial Flow to use geographic coordinates with the incremental cut computations of FlowCutter. This allows us to compute a series of cuts with suitable balances much faster than FlowCutter while still achieving high quality. In an extensive experimental evaluation, we compare our new algorithm InertialFlowCutter to the state-of-the-art. Thus far, FlowCutter has been the best method for computing CCH orders. InertialFlowCutter computes slightly better CCH orders than FlowCutter and is a factor of 5.7 and 6.6 faster on the road networks of the USA and Europe, respectively—our two most relevant instances. Using 16 cores of a shared-memory machine, we can compute CCH orders for these instances in four minutes.
In Section 2, we briefly present the existing Inertial Flow and FlowCutter algorithms and describe how we combined them. In Section 3, we describe the setup and results of our experimental study. We conclude with a discussion of our results and future research directions in Section 4.
This paper recreates the experiments from [14] and uses much of the same setup; there is therefore substantial content overlap. To keep this paper self-contained, we repeat the parts we use. Our contributions are the InertialFlowCutter algorithm, an improved Inertial Flow implementation, and a reproduction of the experiments from [14], including InertialFlowCutter and a newer KaHiP version.

2. Materials and Methods

After introducing preliminaries, we describe the existing bipartitioning algorithms FlowCutter and Inertial Flow on a high level, before discussing how to combine them into our new algorithm InertialFlowCutter. We refer the interested reader to [14] for implementation details and a more in-depth discussion of the FlowCutter algorithm. Then, we discuss our application, Customizable Contraction Hierarchies (CCH), what makes a good CCH order, and how we use recursive bisection to compute such orders.

2.1. Preliminaries

An undirected graph G = (V, E) consists of a set of nodes V and a set of edges E, where every edge {u, v} ∈ E is an unordered pair of distinct nodes. A directed graph G = (V, A) has directed arcs A ⊆ V × V instead of undirected edges. It is symmetric if for every arc (x, y) ∈ A, the reverse arc (y, x) is in A. For ease of notation, we do not distinguish between undirected and symmetric graphs in this paper, and we use them interchangeably, whichever better suits the description. Let n := |V| denote the number of nodes and let m := |E| denote the number of edges of an undirected graph. All graphs in this paper contain neither self-loops (x, x) nor multi-edges. H = (V′, A′) is a subgraph of G if V′ ⊆ V and A′ ⊆ A. The subgraph induced by a node set U ⊆ V is defined as G[U] := (U, A ∩ (U × U)), i.e., the graph with node set U and all arcs of G with both endpoints in U. The degree deg(x) := |{(x, y) ∈ A}| is the number of outgoing arcs of x. A path is a sequence of edges such that consecutive edges share a node. A graph is called k-connected if there are k node-disjoint paths between every pair of nodes. The k-connected components of a graph are the node-induced subgraphs that are inclusion-maximal with respect to k-connectivity. 1-connected components are called connected components, while 2-connected components are called biconnected components.

2.1.1. Separators and Cuts

Let V_1, V_2 ⊆ V be a bipartition of V = V_1 ∪ V_2 into two non-empty disjoint sets, called blocks. The cut induced by (V_1, V_2) is the set of edges cut(V_1, V_2) := {(v_1, v_2) ∈ E ∩ (V_1 × V_2)} between V_1 and V_2. The cut size is |cut(V_1, V_2)|. We often use the terms cut and bipartition interchangeably. Sometimes, we say a bipartition is induced by a set of cut edges. A node separator partition is a partition of V = Q ∪ V_1 ∪ V_2 into three disjoint sets (Q, V_1, V_2) such that there is no edge between V_1 and V_2. We call Q the separator and V_1, V_2 the blocks or components of the separator. |Q| is the separator size. For an ε ∈ [0, 1], a cut or separator is ε-balanced if max(|V_1|, |V_2|) ≤ (1 + ε) · n/2. We often call ε the imbalance, as larger values correspond to less balanced cuts. The balanced graph bipartitioning [balanced node separator] problem is to find an ε-balanced cut [separator] of minimum size. Let S, T ⊆ V be two fixed, disjoint, non-empty subsets of V. An edge cut [node separator] is an ST edge cut [node separator] if S ⊆ V_1 and T ⊆ V_2.

2.1.2. Maximum Flows

A flow network N = (V, A, S, T, c) is a simple symmetric directed graph (V, A) with two disjoint non-empty terminal node sets S, T ⊆ V, also called the source and target node set, as well as a capacity function c : A → ℝ_{≥0}. A flow in N is a function f : A → ℝ subject to the capacity constraint f(a) ≤ c(a) for all arcs a, flow conservation ∑_{(u,v) ∈ A} f((u, v)) = 0 for all non-terminal nodes v, and skew symmetry f((u, v)) = −f((v, u)) for all arcs (u, v). In this paper, we consider only unit flows and unit capacities, i.e., f : A → {−1, 0, 1} and c : A → {0, 1}. The value of a flow |f| := ∑_{s ∈ S, (s,u) ∈ A} f((s, u)) is the amount of flow leaving S. The residual capacity r_f(a) := c(a) − f(a) is the additional amount of flow that can pass through a without violating the capacity constraint. The residual network with respect to f is the directed graph N_f = (V, A_f), where A_f := {a ∈ A : r_f(a) > 0}. An augmenting path is an ST path in N_f. A node v is called source-reachable if there is a path from S to v in N_f. We denote the set of source-reachable nodes by S_r and define the set of target-reachable nodes T_r analogously. The flow f is a maximum flow if |f| is maximal among all possible flows in N. This is the case if and only if there is no augmenting path in N_f. The well-known max-flow min-cut theorem [20] states that the value of a maximum flow equals the capacity of a minimum ST edge cut. (S_r, V \ S_r) is the source-side cut, and (V \ T_r, T_r) is the target-side cut of a maximum flow.

2.2. Flowcutter

FlowCutter is an algorithm for the balanced graph bipartitioning problem. The idea of its core algorithm is to solve a sequence of incremental max-flow problems, which induce cuts with monotonically increasing cut size and balance, until the latest cut induces an ε-balanced bipartition. The flow problems are incremental in the sense that the terminal node sets S, T of the previous flow problem are subsets of the terminals in the next flow problem. This nesting allows us to reuse the flow computed in previous iterations.
Given randomly chosen starting terminal nodes s, t, we set S := {s}, T := {t} and compute a maximum ST flow. Then, we transform the source-reachable nodes S_r to sources if |S_r| ≤ |T_r|, or T_r to targets otherwise. Assume |S_r| ≤ |T_r| without loss of generality. Now, S induces a minimum ST cut C_S. If C_S is ε-balanced, the algorithm terminates. Otherwise, we transform one additional node, called the piercing node, to a source. The piercing node is chosen from the nodes incident to the cut C_S and not in S. This step is called piercing the cut C_S. It ensures that we will find a different cut in the next iteration. Subsequently, we augment the previous flow to a maximum flow that considers the new source node. These steps are repeated until the latest cut induces an ε-balanced bipartition. Algorithm 1 shows pseudocode for FlowCutter.
Algorithm 1: FlowCutter
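The pseudocode figure for Algorithm 1 is not reproduced here. As a rough substitute, the following C++ sketch mirrors the loop described above. It is a structural sketch only, assuming the graph and flow-state types and the helper functions are provided elsewhere; all names are illustrative and not taken from the FlowCutter code base, so the snippet compiles as a translation unit but omits the actual flow machinery.

```cpp
#include <cstddef>

// Structural sketch of the FlowCutter core loop (Section 2.2), not the original code.
struct Graph;      // symmetric unit-capacity graph
struct FlowState;  // current flow, terminal sets S and T, reachable sets S_r and T_r

// Assumed helpers; only their roles matter here.
void augment_to_max_flow(const Graph&, FlowState&);        // Ford-Fulkerson augmentations
std::size_t source_reachable_size(const FlowState&);       // |S_r|
std::size_t target_reachable_size(const FlowState&);       // |T_r|
void grow_sources_to_reachable(FlowState&);                // S := S_r
void grow_targets_to_reachable(FlowState&);                // T := T_r
bool current_cut_is_balanced(const FlowState&, double eps);
int  pick_piercing_node(const Graph&, const FlowState&, bool source_side);
void add_source(FlowState&, int v);
void add_target(FlowState&, int v);

// Repeats max-flow / cut / pierce steps until an eps-balanced bipartition is found.
void flow_cutter(const Graph& g, int s, int t, double eps, FlowState& state) {
  add_source(state, s);
  add_target(state, t);
  for (;;) {
    augment_to_max_flow(g, state);
    const bool pierce_source_side =
        source_reachable_size(state) <= target_reachable_size(state);
    if (pierce_source_side) grow_sources_to_reachable(state);  // smaller side becomes terminal set
    else                    grow_targets_to_reachable(state);
    if (current_cut_is_balanced(state, eps)) return;           // latest cut is eps-balanced
    const int p = pick_piercing_node(g, state, pierce_source_side);
    if (pierce_source_side) add_source(state, p);              // pierce the current cut
    else                    add_target(state, p);
  }
}
```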
A significant detail of the piercing step is that piercing nodes which are not reachable from the opposite side are preferred. Choosing such nodes for piercing does not create augmenting paths. Thus, the cut size does not increase in the next iteration. This is called the avoid-augmenting-paths heuristic. A secondary distance-based piercing heuristic is used to break ties when the avoid-augmenting-paths heuristic gives multiple choices. It chooses the node p which minimizes dist(p, t) − dist(s, p), where dist is the hop distance, precomputed via Breadth-First-Search from s and t. Roughly speaking, this attempts to prevent the cut sides from meeting before perfect balance is reached. It also has a geometric interpretation, which is explained in [14].
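To make the two piercing heuristics concrete, the following self-contained helper picks a piercing node from a candidate set. The candidate list, the target-reachability flags, and the two hop-distance arrays are assumed to be maintained by the surrounding algorithm; the function name is illustrative.

```cpp
#include <cstdint>
#include <vector>

// Chooses a piercing node among the candidates incident to the current source-side cut.
// Avoid-augmenting-paths first: prefer nodes that are not reachable from the target side;
// ties are broken by minimizing dist(p, t) - dist(s, p).
int pick_source_piercing_node(const std::vector<int>& candidates,
                              const std::vector<bool>& target_reachable,
                              const std::vector<int>& dist_from_s,
                              const std::vector<int>& dist_to_t) {
  int best = -1;
  bool best_avoids = false;
  std::int64_t best_score = 0;
  for (int p : candidates) {
    const bool avoids = !target_reachable[p];  // piercing p creates no augmenting path
    const std::int64_t score =
        static_cast<std::int64_t>(dist_to_t[p]) - static_cast<std::int64_t>(dist_from_s[p]);
    if (best == -1 || (avoids && !best_avoids) ||
        (avoids == best_avoids && score < best_score)) {
      best = p;
      best_avoids = avoids;
      best_score = score;
    }
  }
  return best;  // -1 if there is no candidate
}
```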
We choose the starting terminal nodes s and t uniformly at random. Experiments [14] indicate that 20 terminal pairs are sufficient to obtain high quality partitions of road networks.
For computing maximum flows, we use the basic Ford–Fulkerson algorithm [20], with Pseudo-Depth-First-Search for finding augmenting paths. Pseudo-Depth-First-Search directly marks all adjacent nodes as visited when processing a node. It can be implemented like Breadth-First-Search, using a stack instead of a queue.
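A minimal, self-contained version of such a traversal over plain adjacency lists could look as follows; in the actual flow computation, the traversal is additionally restricted to residual arcs.

```cpp
#include <vector>

// Pseudo-depth-first search: like BFS, every adjacent node is marked as visited
// as soon as its predecessor is processed, but the container is a stack, not a queue.
// Returns the nodes in the order in which they were popped from the stack.
std::vector<int> pseudo_dfs(const std::vector<std::vector<int>>& adj, int start) {
  std::vector<bool> visited(adj.size(), false);
  std::vector<int> stack = {start};
  std::vector<int> order;
  visited[start] = true;
  while (!stack.empty()) {
    const int u = stack.back();
    stack.pop_back();
    order.push_back(u);
    for (int v : adj[u]) {
      if (!visited[v]) {
        visited[v] = true;   // marked when discovered, exactly as in BFS
        stack.push_back(v);
      }
    }
  }
  return order;
}
```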
A major advantage of FlowCutter over other partitioning tools is the fact that it computes multiple cuts. From this set of cuts, we derive the Pareto cutset, which we define as the set of all nondominated cuts. A cut C_2 is dominated by a cut C_1 if C_2 has neither better balance nor smaller cut size than C_1. Instead of selecting a maximum imbalance a priori, we can select a good trade-off between cut size and imbalance from the Pareto cutset.
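As an illustration of the dominance rule, the following self-contained helper filters a list of cuts, each described by its cut size and the number of nodes on its smaller side (more nodes on the smaller side means better balance), down to the nondominated ones. Names are illustrative.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Returns the nondominated cuts, i.e., those for which no other cut is at least
// as balanced with a cut size that is at most as large.
std::vector<std::pair<int, int>>
pareto_cutset(std::vector<std::pair<int, int>> cuts) {  // (cut_size, smaller_side_size)
  // Sort by increasing cut size, breaking ties by decreasing balance.
  std::sort(cuts.begin(), cuts.end(), [](const auto& a, const auto& b) {
    return a.first != b.first ? a.first < b.first : a.second > b.second;
  });
  std::vector<std::pair<int, int>> pareto;
  int best_balance = -1;
  for (const auto& c : cuts) {
    if (c.second > best_balance) {   // strictly better balance than all smaller cuts
      pareto.push_back(c);
      best_balance = c.second;
    }
  }
  return pareto;
}
```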

2.3. Inertial Flow

Given a line ℓ in ℝ², Inertial Flow projects the geographic coordinates of the nodes onto their closest points on ℓ. The nodes are sorted by order of appearance on ℓ. For a parameter α ∈ [0, 0.5], the first α · n nodes are chosen as S. Analogously, the last α · n nodes are chosen as T. In the next step, a maximum ST flow is computed, from which a minimum ST cut is derived. Figure 1 illustrates the initialization. Instead of line, we use the term direction. In [15], α is set to 0.2 and four directions are used: West–East, South–North, Southwest–Northeast, and Southeast–Northwest. This simple approach works surprisingly well for road networks.
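A self-contained sketch of this initialization is shown below. It projects the node coordinates onto a direction vector and selects the first and last α · n nodes (rounded up here) as terminals; since only the ordering along the line matters, a scalar product replaces the explicit closest-point computation, and std::nth_element (as mentioned in Section 3.3) avoids a full sort. The names and the coordinate type are illustrative, and α ≤ 0.5 is assumed.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

struct Coordinate { double x, y; };  // geographic embedding of a node

struct Terminals { std::vector<int> sources, targets; };

// Selects the nodes that come first along the given direction as S and the nodes
// that come last as T (Inertial Flow initialization). Assumes 2 * ceil(alpha * n) <= n.
Terminals inertial_flow_terminals(const std::vector<Coordinate>& coord,
                                  double dir_x, double dir_y, double alpha) {
  const int n = static_cast<int>(coord.size());
  const int k = std::max(1, static_cast<int>(std::ceil(alpha * n)));
  std::vector<int> nodes(n);
  std::iota(nodes.begin(), nodes.end(), 0);
  // The ordering along the line only depends on this scalar product.
  auto key = [&](int v) { return coord[v].x * dir_x + coord[v].y * dir_y; };
  auto by_key = [&](int a, int b) { return key(a) < key(b); };
  // Partial ordering is enough: the k smallest keys in front, the k largest at the back.
  std::nth_element(nodes.begin(), nodes.begin() + k, nodes.end(), by_key);
  std::nth_element(nodes.begin() + k, nodes.end() - k, nodes.end(), by_key);
  Terminals t;
  t.sources.assign(nodes.begin(), nodes.begin() + k);
  t.targets.assign(nodes.end() - k, nodes.end());
  return t;
}
```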

2.4. Combining Inertial Flow and Flowcutter into InertialFlowCutter

One drawback of Inertial Flow is the restriction to an a priori chosen imbalance, i.e., a value of α . We enhance FlowCutter by initializing S and T in the same way as Inertial Flow, though with a smaller parameter α than proposed for Inertial Flow. Additionally, we pierce cuts with multiple nodes from the Inertial Flow order at once. We call this bulk piercing. This way, we enumerate multiple Inertial Flow cuts simultaneously, without having to restart the flow computations. Furthermore, we can skip some of the first, highly imbalanced cuts of FlowCutter that are irrelevant for our application.
We introduce three additional parameters γ_a, γ_o ∈ (0, 0.5] and δ ∈ (0, 1) to formalize bulk piercing. Let L be a permutation of the nodes, ordered according to a direction. For the source side, we use bulk piercing as long as S contains at most γ_a · n nodes. Furthermore, we limit ourselves to piercing the first γ_o · n nodes of L. The parameter δ influences the step size. The idea is to decrease the step size as the cuts become more balanced. When we decide to apply bulk piercing on the source side, we settle the next δ · ((1 − δ)/2 · n − |S|) nodes to S. To enforce the limit set by γ_o, we pierce fewer nodes if necessary. For the target side, we apply this analogously, starting from the end of the order. If bulk piercing cannot be applied, we revert to the standard FlowCutter method of selecting single piercing nodes incident to the cut. Additionally, we always prioritize the avoid-augmenting-paths heuristic over bulk piercing. In our experiments, we conduct a parameter study which yields α = 0.05, γ_a = 0.4, γ_o = 0.25, and δ = 0.05 as reasonable choices. In Figure 2, we show an example of the InertialFlowCutter piercing step.
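The following small helper computes how many nodes one bulk piercing step settles on the source side, using the step-size formula and the γ_a and γ_o limits stated above; the function name, the integer rounding, and the zero return value as a signal to fall back to single-node piercing are illustrative choices, not taken from the actual implementation.

```cpp
#include <algorithm>
#include <cstddef>

// Number of additional nodes settled to S in one bulk piercing step on the source side.
// n: number of nodes, source_size: |S|, already_pierced: how many nodes of the
// projection order L have been used for bulk piercing on this side so far.
// Returns 0 if bulk piercing is not allowed any more, i.e., the standard single-node
// piercing of FlowCutter should be used instead.
std::size_t bulk_piercing_step(std::size_t n, std::size_t source_size,
                               std::size_t already_pierced,
                               double delta, double gamma_a, double gamma_o) {
  if (source_size > gamma_a * n) return 0;        // S already too large for bulk piercing
  if (already_pierced >= gamma_o * n) return 0;   // used up the allowed prefix of L
  // Shrinking step size delta * ((1 - delta)/2 * n - |S|), as stated in Section 2.4.
  const double raw = delta * ((1.0 - delta) / 2.0 * n - static_cast<double>(source_size));
  std::size_t step = raw > 1.0 ? static_cast<std::size_t>(raw) : 1;
  // Never pierce beyond the first gamma_o * n nodes of the projection order.
  const std::size_t remaining = static_cast<std::size_t>(gamma_o * n) - already_pierced;
  return std::min(step, remaining);
}
```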

2.5. Running Multiple InertialFlowCutter Instances

To improve solution quality, we run q ∈ ℕ instances of InertialFlowCutter with different directions. An instance is called a cutter. We use the directions (cos(φ), sin(φ)) for φ = kπ/q and k ∈ {0, …, q − 1}. To include the directions proposed in [15], q should be a multiple of 4. To improve running time, we run cutters simultaneously in an interleaved fashion, as already proposed in [14]. We always schedule the cutters with the currently smallest flow value to either push one additional unit of flow or derive a cut. For the latter, we improve the balance by piercing the cut as long as this does not create an augmenting path. One standalone cutter runs in O(c · m), where c is the size of the largest output cut. Roughly speaking, this stems from performing one graph traversal, e.g., Pseudo-DFS, per unit of flow. The exact details can be found in [14]. Flow-based execution interleaving ensures that no cutter performs more flow augmentations than the other cutters. Thus, the running time for q cutters is O(q · c · m), where c is the size of the largest found cut among all cutters. We specifically avoid computing some cuts that the standalone cutters would find. Consider the simple example with q = 2, where the second cutter immediately finds a perfectly balanced cut of size c, but the first cutter only finds one cut of size C ≫ c. If the first cutter ran until a cut is found, we would have invested C · m work but should only have invested c · m.
In the case of InertialFlowCutter, it is actually important to employ flow-based interleaving and not just run a cutter until the next cut is found, as after a bulk piercing step, the next cut might be significantly larger. For road networks and FlowCutter, this difference is insignificant in practice, as the cut increases by just one most of the time.
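Two small, self-contained helpers make the direction choice and the scheduling priority concrete: the first generates the q equally spaced directions, the second returns a pending cutter with the currently smallest flow value, which is the invariant that flow-based interleaving maintains. The cutter state is reduced to a flow value and a finished flag here; the names are illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// The q equally spaced directions (cos(k*pi/q), sin(k*pi/q)) for k = 0, ..., q-1.
std::vector<std::pair<double, double>> cutter_directions(int q) {
  std::vector<std::pair<double, double>> dirs;
  const double pi = std::acos(-1.0);
  for (int k = 0; k < q; ++k) {
    const double phi = k * pi / q;
    dirs.emplace_back(std::cos(phi), std::sin(phi));
  }
  return dirs;
}

// Flow-based interleaving: always advance a cutter with the smallest flow value.
// flow_value[i] is the current flow of cutter i; finished cutters are skipped.
// Returns -1 if all cutters are finished.
int next_cutter_to_advance(const std::vector<int>& flow_value,
                           const std::vector<bool>& finished) {
  int best = -1;
  for (std::size_t i = 0; i < flow_value.size(); ++i) {
    if (finished[i]) continue;
    if (best == -1 || flow_value[i] < flow_value[best]) best = static_cast<int>(i);
  }
  return best;
}
```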

2.6. Customizable Contraction Hierarchies

A Customizable Contraction Hierarchy (CCH) is an index data structure which allows fast shortest path queries and fast adaptation to new metrics (arc weights) in road networks. It consists of three phases: a preprocessing phase which only uses the network topology but not the arcs weights, a faster customization phase which adapts the index to new arc weights, and a query phase which quickly answers shortest path queries.
The preprocessing phase computes a contraction order of the nodes, e.g., via nested dissection, and then simulates contracting all nodes in that order, inserting shortcut arcs between all neighbors of a contracted node. Shortcuts represent two-arc paths via the contracted node.
The customization phase assigns correct weights to the shortcuts by processing all arcs (u, v) in ascending order of the rank of u, i.e., the position of u in the order. To process an arc (u, v), it enumerates all triangles {u, w, v} where w has lower rank than u and v, and updates the weight of (u, v) if the path (u, w, v) is shorter.
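The following self-contained sketch performs this triangle relaxation for a symmetric metric, with the CCH given as upward adjacency lists (neighbors of strictly higher rank) and one weight per upward arc. It uses an equivalent node-by-node formulation: processing the nodes in ascending rank order and relaxing the arc between every pair of upward neighbors enumerates exactly the lower triangles described above. This is a simplified stand-in for the RoutingKit implementation, not its actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// CCH in upward representation: up[u] lists the neighbors of u with higher rank,
// weight[u][i] is the current weight of the upward arc (u, up[u][i]).
// Node IDs are assumed to be ranks, i.e., node 0 has the lowest rank.
struct CCH {
  std::vector<std::vector<int>> up;
  std::vector<std::vector<std::uint32_t>> weight;
};

// Basic customization for a symmetric metric: for every node w in ascending rank and
// every pair (a, b) of upward neighbors of w, relax the arc (a, b) with
// weight(w, a) + weight(w, b). This enumerates exactly the lower triangles {w, a, b}.
void basic_customization(CCH& cch) {
  const int n = static_cast<int>(cch.up.size());
  for (int w = 0; w < n; ++w) {
    for (std::size_t i = 0; i < cch.up[w].size(); ++i) {
      for (std::size_t j = 0; j < cch.up[w].size(); ++j) {
        const int a = cch.up[w][i];
        const int b = cch.up[w][j];
        if (a >= b) continue;                 // handle each unordered pair once, a lower-ranked
        // Find the arc (a, b); in a proper CCH it always exists because the
        // upward neighbors of w form a clique.
        for (std::size_t k = 0; k < cch.up[a].size(); ++k) {
          if (cch.up[a][k] == b) {
            const std::uint64_t via_w = static_cast<std::uint64_t>(cch.weight[w][i]) +
                                        static_cast<std::uint64_t>(cch.weight[w][j]);
            if (via_w < cch.weight[a][k])
              cch.weight[a][k] = static_cast<std::uint32_t>(via_w);
            break;
          }
        }
      }
    }
  }
}
```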
There are two different algorithms for st queries. For every shortest st path in the original graph, there is a shortest st path in the CCH that consists of two paths (s, …, x) and (x, …, t) such that the arcs in the first path go from lower-ranked to higher-ranked nodes, whereas the arcs in the second path go from higher-ranked to lower-ranked nodes [10]. The first, basic query algorithm performs a bidirectional Dijkstra search from s and t and relaxes only arcs to higher-ranked nodes. The second query algorithm uses the elimination tree of a CCH to avoid priority queues, which are typically a bottleneck. In the elimination tree, the parent of a node is its lowest-ranked upward neighbor. The ancestors of a node v are exactly the nodes in the upward search space of v in the basic query [21]. For an st query, the outgoing arcs of all nodes on the path from s to the root and all incoming arcs of all nodes on the path from t to the root are relaxed. The node z minimizing the distance from s to z plus the distance from z to t determines the distance between s and t.
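A compact, self-contained version of the elimination tree query for a symmetric metric is sketched below. It reuses the upward representation from the customization sketch, takes the elimination tree as a parent array (−1 for the root), and exploits that the two searches can only meet at common ancestors, which all lie on the path from s to the root. It is a simplification of the RoutingKit query, not its actual code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// up[u] / weight[u] describe the upward CCH arcs of node u (node IDs are ranks),
// parent[u] is the parent of u in the elimination tree (-1 for the root).
// Returns the shortest path distance between s and t, or a large sentinel if disconnected.
std::uint64_t elimination_tree_query(const std::vector<std::vector<int>>& up,
                                     const std::vector<std::vector<std::uint32_t>>& weight,
                                     const std::vector<int>& parent, int s, int t) {
  const std::uint64_t INF = std::uint64_t(1) << 62;
  std::vector<std::uint64_t> dist_s(up.size(), INF), dist_t(up.size(), INF);

  // Relax the upward arcs of every node on the path from v to the root.
  auto sweep = [&](int v, std::vector<std::uint64_t>& dist) {
    dist[v] = 0;
    for (int x = v; x != -1; x = parent[x]) {
      if (dist[x] == INF) continue;                  // not reached in the upward search
      for (std::size_t i = 0; i < up[x].size(); ++i) {
        const std::uint64_t cand = dist[x] + weight[x][i];
        if (cand < dist[up[x][i]]) dist[up[x][i]] = cand;
      }
    }
  };
  sweep(s, dist_s);
  sweep(t, dist_t);

  // Take the best meeting node z; common ancestors of s and t lie on s's root path.
  std::uint64_t best = INF;
  for (int z = s; z != -1; z = parent[z]) {
    if (dist_s[z] != INF && dist_t[z] != INF && dist_s[z] + dist_t[z] < best)
      best = dist_s[z] + dist_t[z];
  }
  return best;
}
```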
The query complexity is linear in the number of arcs incident to nodes on the paths from s and t to the root. Similarly, the customization running time depends on the number of triangles in the CCH. Fewer shortcuts result in less memory consumption and faster queries. We aim to minimize these metrics by computing high quality contraction orders.

2.7. Nested Dissection Orders for Road Networks

The framework to compute contraction orders is the same as for FlowCutter in [14] and our implementation builds upon theirs [22]. We only exchange the partitioning algorithm and parallelize it. For self-containedness, we repeat it here.

2.7.1. Recursive Bisection

We compute contraction orders via recursive bisection, using node separators instead of edge cuts. This method is also called nested dissection [13]. Let (Q, V_1, V_2) be a node separator partition. Then, we recursively compute orders for G[V_1] and G[V_2] and return the order of G[V_1], followed by the order of G[V_2], followed by Q. The nodes of Q can be in an arbitrary order; we opt for the input order. The recursion stops once the graphs are trees or cliques. For cliques, any order is optimal. For trees, we use an algorithm that computes an order with minimal elimination tree depth in linear time [23,24].
In the nested dissection implementation from [22], a recursive call first computes a node separator Q, then deletes the edges {(u, v) ∈ E : u ∈ Q or v ∈ Q} incident to that separator, and then reorders the nodes in preorder. Preorder is the order in which nodes are first visited in a Depth-First-Search from some starting node (in this implementation, the one with the smallest ID). The preorder identifies the connected components of the new graph, which are the subgraphs to recurse on, and assigns local node and arc identifiers for them. This is also done once at the beginning, without computing a separator, in case the input graph is disconnected.
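The recursion itself is easy to state in code. The sketch below returns a contraction order for any routine that computes a node separator partition; the separator routine is a parameter, the base case is simplified (small or degenerately partitioned graphs are ordered as-is instead of using the tree and clique algorithms), and all names are illustrative.

```cpp
#include <functional>
#include <vector>

// A node separator partition: label 0 = V1, 1 = V2, 2 = Q (the separator).
using SeparatorPartition = std::vector<int>;
using Graph = std::vector<std::vector<int>>;  // adjacency lists, local node IDs 0..n-1

// Nested dissection: order(V1), then order(V2), then Q (here in input order).
// find_separator is the pluggable bipartitioning routine (e.g., InertialFlowCutter).
std::vector<int> nested_dissection_order(
    const Graph& g,
    const std::function<SeparatorPartition(const Graph&)>& find_separator) {
  const int n = static_cast<int>(g.size());
  std::vector<int> order;
  if (n <= 2) {                                   // simplified base case
    for (int v = 0; v < n; ++v) order.push_back(v);
    return order;
  }
  const SeparatorPartition part = find_separator(g);
  int on_side[2] = {0, 0};
  for (int v = 0; v < n; ++v)
    if (part[v] == 0 || part[v] == 1) ++on_side[part[v]];
  if (on_side[0] == n || on_side[1] == n) {       // degenerate partition: no progress possible
    for (int v = 0; v < n; ++v) order.push_back(v);
    return order;
  }
  for (int side = 0; side <= 1; ++side) {
    // Build the induced subgraph of this side with a local-to-global ID mapping.
    std::vector<int> to_local(n, -1), to_global;
    for (int v = 0; v < n; ++v)
      if (part[v] == side) {
        to_local[v] = static_cast<int>(to_global.size());
        to_global.push_back(v);
      }
    Graph sub(to_global.size());
    for (int v = 0; v < n; ++v)
      if (part[v] == side)
        for (int w : g[v])
          if (part[w] == side) sub[to_local[v]].push_back(to_local[w]);
    // Recurse and translate the local order back to the IDs of g.
    for (int local : nested_dissection_order(sub, find_separator))
      order.push_back(to_global[local]);
  }
  for (int v = 0; v < n; ++v)
    if (part[v] == 2) order.push_back(v);          // the separator comes last (highest ranks)
  return order;
}
```

In this framework, swapping the partitioner only means passing a different find_separator callback, which matches the statement in Section 2.7 that only the partitioning algorithm is exchanged.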

2.7.2. Separators

InertialFlowCutter computes edge cuts. We use a standard construction [25] to model node capacities as edge capacities in flow networks. It expands the undirected input graph G = (V, E) into a directed graph G′ = (V′, A′). For every node v ∈ V, there is an in-node v_i and an out-node v_o in V′, joined by a directed arc (v_i, v_o), called the bridge arc of v. Furthermore, for every edge {u, v} ∈ E, there are two directed external arcs (u_o, v_i) and (v_o, u_i) in A′. Since we restrict ourselves to unit-capacity flow networks, we cannot assign infinite capacity to the external arcs, and thus the cuts contain both bridge arcs and external arcs. A bridge arc in the cut directly corresponds to a node in the separator. For each external arc in the cut, we place the incident node on the larger side of the cut in the separator.
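A self-contained sketch of this expansion is given below; it encodes the in-node of v as 2v and the out-node as 2v + 1 and marks which arcs of the result are bridge arcs. The encoding and the names are illustrative.

```cpp
#include <utility>
#include <vector>

struct ExpandedGraph {
  // Arc list of the expanded graph; in-node of v is 2*v, out-node of v is 2*v + 1.
  std::vector<std::pair<int, int>> arcs;
  std::vector<bool> is_bridge;  // is_bridge[i] is true if arcs[i] is a bridge arc
};

// Expands an undirected graph (given as an edge list over n nodes) so that a
// minimum edge cut in the expansion corresponds to a small node separator:
// every node v becomes a bridge arc (v_in, v_out), every edge {u, v} becomes
// the two external arcs (u_out, v_in) and (v_out, u_in). All arcs have unit capacity.
ExpandedGraph expand_for_node_separator(int n, const std::vector<std::pair<int, int>>& edges) {
  ExpandedGraph g;
  auto in_node  = [](int v) { return 2 * v; };
  auto out_node = [](int v) { return 2 * v + 1; };
  for (int v = 0; v < n; ++v) {
    g.arcs.emplace_back(in_node(v), out_node(v));                 // bridge arc of v
    g.is_bridge.push_back(true);
  }
  for (const auto& e : edges) {
    g.arcs.emplace_back(out_node(e.first), in_node(e.second));    // external arcs
    g.is_bridge.push_back(false);
    g.arcs.emplace_back(out_node(e.second), in_node(e.first));
    g.is_bridge.push_back(false);
  }
  return g;
}
```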

2.7.3. Choosing Cuts from the Pareto Cutset

InertialFlowCutter yields a sequence of nondominated cuts with monotonically increasing cut size and balance, whereas standard partitioners yield a single cut for some prespecified imbalance. We need to choose one cut to recurse on the sides of the corresponding separator. The expansion of a cut is its cut size divided by the number of nodes on the smaller side; it captures a certain trade-off between cut size and balance. We choose the cut with minimum expansion among those with ε < 0.6, i.e., with at least 20% of the nodes on the smaller side. While this approach is certainly not optimal, it works well enough. It is not clear how to choose the optimal cut without considering the whole hierarchy of cuts in deeper levels of the recursion.

2.7.4. Special Preprocessing

Road networks contain many nodes of degree 1 or 2. The graph size can be drastically reduced by eliminating them in a preprocessing step that is performed only once. First, we compute the largest biconnected component B in linear time using [26] and remove all edges between B and the rest of the graph G. The remaining graph usually consists of a large B and many tiny, often tree-like components. We compute orders for the components separately and concatenate them in an arbitrary order. The order for B is placed after the orders of the smaller components.
A degree-2 chain is a path (x, y_1, …, y_k, z) where deg(y_i) = 2 for all i, but deg(x) > 2 and deg(z) ≠ 2. We divide the nodes into two graphs G_{≥3} and G_{≤2}, containing the nodes of degree at least 3 and at most 2, respectively, by computing all degree-2 chains in linear time and splitting along them. If deg(z) > 2, we insert an edge between x and z, since z is in G_{≥3}. We compute contraction orders for the connected components of G_{≤2} separately and concatenate them in an arbitrary order. Since these components are paths, we can use the algorithm for trees. The order for G_{≥3} is placed after the one for G_{≤2}. We compute degree-2 chains by iterating over all arcs (x, y). If deg(x) > 2 and deg(y) ≤ 2, then x is the start of a degree-2 chain. We follow this chain recursively: as long as deg(y) = 2, the only arc of y that is not (y, x) comes next in the chain. If deg(y) = 1 or deg(y) ≥ 3, the chain is finished at y. This algorithm runs in linear time, as it considers every arc at most twice.
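The chain computation can be written as a short scan over all arcs; the self-contained sketch below collects the chains as node lists, following the description above. A chain whose endpoint z also has degree greater than 2 is reported from both ends here; deduplication is left out for brevity, and the names are illustrative.

```cpp
#include <vector>

// Returns degree-2 chains (x, y_1, ..., y_k, z): all inner nodes have degree 2,
// deg(x) > 2, and deg(z) != 2. Assumes a simple graph given as adjacency lists.
std::vector<std::vector<int>> degree2_chains(const std::vector<std::vector<int>>& adj) {
  const int n = static_cast<int>(adj.size());
  auto deg = [&](int v) { return static_cast<int>(adj[v].size()); };
  std::vector<std::vector<int>> chains;
  for (int x = 0; x < n; ++x) {
    if (deg(x) <= 2) continue;               // chains start at nodes of degree > 2
    for (int y : adj[x]) {
      if (deg(y) > 2) continue;              // the arc (x, y) does not start a chain
      std::vector<int> chain = {x};
      int prev = x, cur = y;
      while (deg(cur) == 2) {                // follow the unique continuation of the chain
        chain.push_back(cur);
        const int next = (adj[cur][0] == prev) ? adj[cur][1] : adj[cur][0];
        prev = cur;
        cur = next;
      }
      chain.push_back(cur);                  // cur = z with deg(z) = 1 or deg(z) >= 3
      chains.push_back(chain);
    }
  }
  return chains;
}
```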

2.8. Parallelization

Recursive bisection is straightforward to parallelize by computing orders on the separated blocks independently, using task-based parallelism. This only employs parallelism after the first separators have been found.
Therefore, we additionally parallelize InertialFlowCutter. The implementation of FlowCutter [22] contains a simple parallelization. In a round, all cutters with the currently smallest cut are advanced to their next cut in parallel. Before the next round, all threads synchronize. This approach exhibits poor core utilization since only a few cutters may have the smallest cut in a round and threads perform different amounts of work, leading to skewed load distribution.
Instead, we employ a simple wait-free task-based parallelization scheme, which guarantees that the cutters with the t currently smallest flow values are making progress, for t threads executing in parallel. Algorithm 2 illustrates this scheme in pseudocode. For every cutter, we store two atomic flags: a free flag, which indicates that currently no task holds this cutter, and an active flag, which indicates that this cutter can still yield a cut with better expansion than previously found cuts. In the beginning, every cutter is active and free.
Recall that q is the number of cutters. We create q tasks, and the task scheduler launches t ≤ q parallel tasks, potentially adding more when resources from other parts of the recursive bisection become available. A task executes a loop in which it first acquires a cutter with the currently smallest flow value out of the free and active cutters, then performs a chunk of work on it, and releases the cutter again. A chunk of work consists of deactivating the cutter if it cannot improve the expansion, and otherwise running one Pseudo-Depth-First-Search, which either pushes one unit of flow or derives a cut. If the cut has at least 20% of the nodes on the smaller side and improves the expansion, we acquire a lock and store the cut. A task terminates once it fails to acquire a free and active cutter, as there are then more tasks than active cutters.
If fewer than q tasks are running simultaneously, the tasks switch between cutters. If all cutters are acquired and the task's currently acquired cutter remains active, we continue working on it to avoid the overhead of releasing and immediately re-acquiring the same cutter. Note that, due to the parallelization, cuts are not necessarily enumerated in order of increasing cut size, and dominated cuts may also be reported.
Algorithm 2: Parallel InertialFlowCutter
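The pseudocode figure for Algorithm 2 is not reproduced here. The following self-contained toy program illustrates the acquire/work/release cycle with the two atomic flags described above, using plain std::thread instead of the Threading Building Blocks tasks of the actual implementation; a chunk of work is reduced to counting flow units, so the sketch demonstrates only the scheduling scheme, not the flow computation, and it omits the keep-the-current-cutter optimization.

```cpp
#include <atomic>
#include <functional>
#include <iostream>
#include <thread>
#include <vector>

// Toy model of the cutter scheduling from Section 2.8. A "cutter" only counts flow
// units here; in the real algorithm a chunk of work is one pseudo-depth-first search
// that pushes a unit of flow or derives a cut.
struct Cutter {
  std::atomic<bool> free{true};    // no task currently holds this cutter
  std::atomic<bool> active{true};  // cutter may still yield an improving cut
  std::atomic<int> flow{0};
  int flow_limit;                  // stand-in for "cannot improve expansion any more"
};

// Acquire a free and active cutter with the smallest current flow value, or -1 if none.
int acquire(std::vector<Cutter>& cutters) {
  while (true) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(cutters.size()); ++i) {
      if (!cutters[i].free.load() || !cutters[i].active.load()) continue;
      if (best == -1 || cutters[i].flow.load() < cutters[best].flow.load()) best = i;
    }
    if (best == -1) return -1;                 // no free and active cutter left
    bool expected = true;
    if (cutters[best].free.compare_exchange_strong(expected, false)) return best;
    // Another task grabbed it in the meantime; rescan.
  }
}

void task(std::vector<Cutter>& cutters) {
  int c;
  while ((c = acquire(cutters)) != -1) {       // terminate once no cutter is available
    if (cutters[c].flow.load() >= cutters[c].flow_limit)
      cutters[c].active.store(false);          // deactivate: no better cut possible
    else
      cutters[c].flow.fetch_add(1);            // one chunk of work: push one flow unit
    cutters[c].free.store(true);               // release the cutter
  }
}

int main() {
  const int q = 8, t = 4;                      // q cutters, t worker threads
  std::vector<Cutter> cutters(q);
  for (int i = 0; i < q; ++i) cutters[i].flow_limit = 100 + 10 * i;
  std::vector<std::thread> threads;
  for (int i = 0; i < t; ++i) threads.emplace_back(task, std::ref(cutters));
  for (auto& th : threads) th.join();
  for (int i = 0; i < q; ++i)
    std::cout << "cutter " << i << ": flow " << cutters[i].flow.load() << '\n';
  return 0;
}
```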

3. Results

In this section, we discuss our experimental setup and results.

3.1. Experimental Setup

In Section 3.6, we discuss our parameter study to obtain reasonable parameters for InertialFlowCutter. Our remaining experiments follow the setup in [14], comparing FlowCutter, KaHiP, Metis, and Inertial Flow to InertialFlowCutter, regarding CCH performance as well as cut sizes for different imbalances on the input graph without biconnectivity and degree-2 chain preprocessing. The latter experiments are referred to as top-level Pareto cut experiments. Our benchmark set consists of the road networks of Colorado, of California and Nevada, of the USA, and of Western Europe (see Table 1), made available during the DIMACS implementation challenge on shortest paths [27].
The CCH performance experiments compare the different partitioners based on the time to compute a contraction order, the median running time of nine customization runs, the average time of 10^6 random st queries, as well as the criteria introduced in Section 2.6. Unless explicitly stated as parallel, all reported running times are sequential on an Intel Xeon E5-1630 v3 Haswell processor clocked at 3.7 GHz with 10 MB L3 cache and 128 GB DDR4 RAM (2133 MHz). We additionally report running times for computing contraction orders in parallel on a shared-memory machine with two 8-core Intel Xeon Gold 6144 Skylake CPUs, clocked at 3.5 GHz with 24.75 MB L3 cache and 192 GB DDR4 RAM (2666 MHz). InertialFlowCutter is implemented in C++, and the code is compiled with g++ version 8.2 with optimization level 3. We use Intel's Threading Building Blocks library for shared-memory parallelism. Our InertialFlowCutter implementation and evaluation scripts are available on GitHub [28].

3.2. CCH Implementation

We used the CCH implementation in RoutingKit [29]. There are different CCH customization and query variants. RoutingKit implements basic customization with upper triangles instead of lower triangles, no witness searches, no precomputed triangles, and no instruction-level parallelism. We used the sequential customization. For queries, we used elimination tree search. There has been a recent, very simple improvement [30], which drastically accelerates elimination tree search for short-range queries. It is not implemented in RoutingKit, but random st queries tend to be long-range, so the effect would be negligible for our experiments.

3.3. Partitioner Implementations and Nested Dissection Setup

In [14], the KaHiP versions 0.61 and 1.00 are used. We did not re-run the preprocessing for those old versions of KaHiP but used the orders and order computation running times of [14]. We did re-run customizations and queries. The order computation running times are comparable as the experiments ran on the same machine. We added the latest KaHiP version 2.11, which is available on GitHub [31]. For all three versions, the strong preset of KaHiP was used. We refer to the three KaHiP variants as K0.61, K1.00, and K2.11. For the CCH order experiments, we kept versions K0.61 and K1.00 but omitted them for the top-level cut experiments because K2.11 is better for top-level cuts.
We used Metis 5.1.0, available from the authors’ website [32], which we denote by M in our tables.
We used InertialFlowCutter with 4, 8, 12, and 16 directions and denote the configurations by IFC4, IFC8, IFC12, and IFC16, respectively.
We used our own Inertial Flow implementation with the four directions proposed in [15]. It is available in our repository [28]. Instead of Dinic's algorithm [33], we used Ford–Fulkerson, as preliminary experiments indicated that it is faster. Furthermore, we filtered source nodes that are only connected to other sources and target nodes that are only connected to other targets. Instead of sorting the nodes along a direction, we partitioned the node array such that the first and last α · n nodes are the desired terminals, using std::nth_element. These optimizations reduce the running time from 1017 s [14] down to 450 s for a CCH order on Europe. Additionally, we used flow-based interleaving for Inertial Flow; this was already included in the Inertial Flow implementation used in [14]. We denote Inertial Flow by I in our tables.
The original FlowCutter implementation used in [14] is available on GitHub [22]. We used a slightly modified version that has been adjusted to use Intel's Threading Building Blocks instead of OpenMP for optional parallelism. All parallelism is disabled for FlowCutter in our experiments. We used FlowCutter with 3, 20, and 100 random source-target pairs and denote the variants by F3, F20, and F100, respectively.
Implementations of Buffoon [19] and PUNCH [18] are not publicly available. Therefore, these are not included in our experiments.
We now discuss the different node ordering setups used in the experiments. Metis offers its own node ordering tool ndmetis, which we used. For Inertial Flow, K1.00 and K2.11, we used a nested dissection implementation, which computes one edge cut per level and recurses until components are trees or cliques, which are solved directly. Separators are derived by picking the nodes incident to one side of the edge cut. For comparability with [12,14], we used an older nested dissection implementation for K0.61, which, on every level, repeatedly computes edge cuts until no smaller cut is found for ten consecutive iterations. For InertialFlowCutter and FlowCutter, we employed the setup that was proposed for FlowCutter in [14] that has also been described in Section 2.7. Our nested dissection implementation is based on the implementation in the FlowCutter repository [22]. We made minor changes and parallelized it, as described in Section 2.8.
We tried to employ the special preprocessing techniques for KaHiP 2.11. While this made order computation faster, the order quality was much worse regarding all criteria.
Starting with version 1.00, KaHiP includes a more sophisticated multilevel node separator algorithm [34]. It was omitted from the experiments in [14] because it took 19 hours to compute an order for the small California graph, using one separator per level, and did not finish in reasonable time on the larger instances. Therefore, we still exclude it.

3.4. Order Experiments

In this section, we compare the different partitioners with respect to the quality of the computed CCH orders and the running time of the preprocessing. Table 2 contains a large collection of metrics and measurements for the four road networks of California, Colorado, Europe, and the USA. Recall that the query time is averaged over 10^6 queries with distinct start and end nodes chosen uniformly at random, and the customization time is the median over nine runs. The order computation time is from a single run, since it is infeasible to run certain partitioners multiple times in a reasonable timeframe.

3.4.1. Quality

Over all nodes v, we report the average and maximum number of ancestors in the elimination tree, as well as the number of arcs incident to the ancestors. These metrics assess the search space sizes of an elimination tree query. The query times in Table 2 are correlated with the search space size, as expected. The partitioner with the smallest average number of nodes and arcs in the search space always yields the fastest queries. Furthermore, we report the number of arcs in the CCH, i.e., shortcut and original arcs, the number of triangles, and an upper bound on the treewidth, which we obtained by using the CCH order as an elimination ordering. A CCH is essentially a chordal supergraph of the input. Thus, CCHs are closely related to tree decompositions and elimination orderings. The relation between tree decompositions and Contraction Hierarchies is further explained in [35]. A low treewidth usually corresponds to good performance with respect to the other metrics. However, since the treewidth is determined by the largest bag in the tree decomposition, which may depend on the size of a few separators and disregards the size of all smaller separators, this is not always consistent. In the context of shortest path queries, a better average is preferable to a slightly reduced maximum.
On the California and USA road networks, IFC12 yields the fastest queries and smallest average search space sizes, while on Europe IFC8 does. On Colorado, our smallest road network, F100 is slightly ahead of the InertialFlowCutter variants by 0.2 to 0.3 microseconds query time. IFC16 yields the fastest customization times for Colorado, Europe, and the USA, while IFC12 yields the fastest customization times for California. Customization times are correlated with the number of triangles. However, for Europe and the USA, the smallest number does not yield the fastest customizations. Even though we take the median of nine runs, this may still be due to random fluctuations.
FlowCutter with at least 20 cutters has slightly worse average search space sizes and query times than InertialFlowCutter for California and the USA but falls behind for Europe. Thus, InertialFlowCutter computes the best CCH orders, with FlowCutter close behind. The different KaHiP variants and Inertial Flow compute the next best orders, while Metis is ranked last by a large margin.
The ratio between maximum and average search space size is most strongly pronounced for Inertial Flow. This indicates that Inertial Flow works well for most separators, but the quality degrades for a few. InertialFlowCutter resolves this problem.
There is an interesting difference in the number of cutters necessary for good CCH orders with InertialFlowCutter and FlowCutter. In [14], F20 is the recommended configuration. The performance differences between F20 and F100 are marginal (except for Europe). However, using just three cutters seems insufficient to get rid of bad random choices.
For the InertialFlowCutter variants, four cutters suffice most of the time. The search space sizes, query times, and customization times are very similar. This is also confirmed by the top-level cut experiments in Section 3.5. It seems the Inertial Flow guidance is sufficiently strong to eliminate bad random choices. Again, only for Europe, the queries for IFC4 are slower, which is why we recommend using IFC8. The better query running times justify the preprocessing time, which is twice as long.
Europe also stands out when comparing Inertial Flow query performances. For Europe, Inertial Flow only beats Metis, but, for the USA, it beats all KaHiP versions and Metis. The query performance difference of 57 microseconds between Inertial Flow and IFC4 for Europe suggests that the incremental cut computations of InertialFlowCutter make a significant difference and are worth the longer preprocessing times compared to Inertial Flow.

3.4.2. Preprocessing Time

Previously, CCH performance came at the cost of high preprocessing time. We compute better CCH orders than FlowCutter in a much shorter time.
KaHiP 0.61 and KaHiP 1.00 are by far the slowest. KaHiP 2.11 is faster than F100 but slower than F20. All InertialFlowCutter variants are faster than F20. IFC8 and F3 have similar running times. Metis is the fastest by a large margin, and Inertial Flow is the second fastest.
The two old KaHiP versions are slow for different reasons. As already mentioned, K0.61 computes at least 10 cuts per level, as opposed to K1.00 and K2.11. K1.00 is slow because its running time for top-level cuts with ε ≤ 0.2 increases unexpectedly, according to [14].
Using 16 cores and IFC8, we compute a CCH order of Europe in just 242 s, with 2258 s sequential running time on the Skylake machine. This corresponds to a speedup of 9.3 over the sequential version. See Table 3. Note that, due to using eight cutters, at most eight threads work on a single separator. Therefore, in particular for the top-level separator, at most 8 of the 16 cores are used. The top-level separator alone needs about 50 s using eight cores. Due to unfortunate scheduling and unbalanced separators, it happens also at later stages that a single separator needs to be computed before any further tasks can be created. Using eight cores, we get a much better speedup of 6.8 for Europe; up to four cores, we see an almost perfect speedup for all but the smallest road network. This is because some cutters need less running time than others. Thus, there is actually less potential for parallelism than the number of cutters suggests.

3.5. Pareto Cut Experiments

For the top-level cut experiments, we permute the nodes in preorder from a randomly selected start node, using the same start node for all partitioners. As discussed in Section 2.7, this is part of the recursive calls in the nested dissection implementation from [22]. We include it here to recreate the environment of a nested dissection on the top level.
In Table 4, Table 5, Table 6 and Table 7, we report the found cuts for various values of ε for all road networks. We use the partitioners KaHiP 2.11, IFC4, IFC8, IFC12, F3, F20, Metis, and Inertial Flow. We also report the actually achieved imbalance, the running time, and whether the sides of the cut are connected (•) or not (∘). We report ε = 0.0 only if perfect balance was achieved; otherwise, if the rounded value would be 0.0, we report <0.1%. KaHiP was not able to achieve perfect balance on any of the graphs when perfect balance was desired. We note this by crossing out the respective values. This is due to our use of the KaHiP library interface, which does not support enforcing balance. Metis simply rejects ε = 0, which is why we mark the corresponding entries with a dash. Perfect balance is not actually useful for the application; we include it solely to analyze the different Pareto cuts.
Note that for FlowCutter and InertialFlowCutter, the running time always includes the computation of all more imbalanced cuts, i.e., to generate the full set of cuts, only the running time of the perfectly balanced cut is needed, while for all other partitioners, the sum of all reported running times is needed.
Concerning running time, Metis wins, but almost all of its reported cuts are larger than the cuts reported by FlowCutter, InertialFlowCutter, and KaHiP. Inertial Flow is also quite fast but, due to its design, produces cuts that are much more balanced than desired and thus cannot achieve cuts as small as those of the other partitioners.
KaHiP achieves exceptionally small, highly balanced cuts on the Europe road network. On the other road networks, it is similar to or worse than F20 in terms of cut size. This is due to the special geography of the Europe road network. It excludes large parts of Eastern Europe, which is why there is a cut of size 2 and ε = 72.8% imbalance that separates Norway, Sweden, and Finland from the rest of Europe. For ε = 10%, KaHiP computes a cut with 112 edges, which separates the European mainland from the Iberian peninsula, Britain, Scandinavia minus Denmark, Italy, and Austria [14]. The Alps separate Italy from the rest of Europe. Britain is only connected via ferries, and the Iberian peninsula is separated from the remaining mainland by the Pyrenees. One side of the cut is not connected because the only ferry between Britain and Scandinavia runs between Britain and Denmark. FlowCutter is unable to find cuts with disconnected sides without a modified initialization. By handpicking terminals for FlowCutter, a similar cut with only 87 edges and 15% imbalance, which places Austria with the mainland instead, is found in [14]. However, it turns out that the FlowCutter CCH order using the 87-edge cut as a top-level separator is not much better than plain FlowCutter. This indicates that it does not matter at what level of recursion the different cuts are found.
For large imbalances, KaHiP seems unable to leverage the additional freedom to achieve the much smaller but more unbalanced cuts reported by InertialFlowCutter and FlowCutter. This has already been observed for previous versions of KaHiP [14]. In terms of running time, KaHiP and F20 are the slowest algorithms. InertialFlowCutter is, in all three configurations, an order of magnitude faster than F20. Up to a maximum ε of 10%, the three variants report almost the same cuts. Apart from the very imbalanced ε = 90% cuts, their cuts are also at most one edge worse than those of F20. Only for more balanced cuts do more cutters give a significant improvement. Here, in particular on the Europe road network, F20 is also significantly better than InertialFlowCutter. In the range between ε = 60% and ε = 10%, which is most relevant for our application, there is thus no significant difference between F20 and InertialFlowCutter, regardless of the number of cutters. This indicates that, on the top level, the first four directions already cover most cuts. On the other hand, for highly balanced cuts, the geographic initialization does not help much, as can be seen from the much worse cuts of InertialFlowCutter. Here, just having more cutters seems to help.

3.6. Parameter Configuration

In this section, we tune the parameters α, δ, γ_a, and γ_o of InertialFlowCutter. Our goal is to achieve much faster order computation without sacrificing CCH performance. Recall that α is the fraction of nodes initially fixed on each side, δ is, roughly speaking, a step size, γ_o is the threshold on the fraction of nodes of the projection order that may be used for bulk piercing on each side, and, similarly, γ_a is the threshold on the fraction of nodes already settled to a side up to which bulk piercing is applied. Table 8 shows a large variety of tested parameter combinations for InertialFlowCutter with eight directions on the road network of Europe. We selected the parameter set α = 0.05, δ = 0.05, γ_a = 0.4, γ_o = 0.25 based on query and order computation time. The best entries per column are highlighted in bold. Furthermore, color shades are scaled between the values in each column; darker shades correspond to lower values, which are better for every measure.
First, we consider the top part of Table 8, where we fix α to 0.05 and try different combinations of δ, γ_o, and γ_a. While the number of triangles and the customization times are correlated, interestingly, the top configurations for these two measures are not the same. The variations in search space sizes, customization time (27 ms), and query time (3 μs) are marginal. In the bottom part of Table 8, we try different values of α with the best choices for the other parameters. As expected, larger values of α accelerate the order computation and slightly slow down the queries. In summary, InertialFlowCutter is relatively robust to parameter choices other than α, which means users do not need to invest much effort in parameter tuning.

4. Discussion

We have presented InertialFlowCutter, an algorithm that exploits geographical information to quickly compute high-quality bipartitions of road networks. Our experiments show that we are able to compute nested dissection orders, as used for CCHs, six times faster than the previous state-of-the-art. Using 16 cores, we can compute a nested dissection order of the Europe road network in four minutes. This makes CCHs even more attractive to be applied in practice.
An open question is how to transfer the ideas of large initial terminal node sets and piercing multiple nodes simultaneously to graphs without geographical information. As FlowCutter also achieved quite good results on general graphs, albeit with slow running times [14], this might be an interesting direction for future research.

Author Contributions

Conceptualization, L.G., M.H., and D.W.; software, L.G., M.H., and T.N.U.; validation, L.G., M.H., and T.N.U.; formal analysis, L.G. and M.H.; investigation, L.G., M.H., and T.N.U.; resources, D.W.; data curation, L.G. and T.N.U.; writing—original draft preparation, L.G. and M.H.; writing—review and editing, L.G., M.H., T.N.U., and D.W.; visualization, L.G., M.H., and T.N.U.; supervision, D.W.; project administration, L.G. and M.H.; funding acquisition, D.W.

Funding

This research was partially funded by the Deutsche Forschungsgemeinschaft (DFG) under grants WA654/19-2 and WA654/22-2. The Article Processing Charges (APC) were funded by the KIT-Publication Fund of the Karlsruhe Institute of Technology.

Acknowledgments

We thank Ben Strasser for helpful discussions and for providing us the setup and code of the experiments conducted in [14].

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Buluç, A.; Meyerhenke, H.; Safro, I.; Sanders, P.; Schulz, C. Recent Advances in Graph Partitioning. In Algorithm Engineering—Selected Results and Surveys; Kliemann, L., Sanders, P., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2016; Volume 9220, pp. 117–158.
  2. Dijkstra, E.W. A Note on Two Problems in Connexion with Graphs. Numer. Math. 1959, 1, 269–271.
  3. Bast, H.; Delling, D.; Goldberg, A.V.; Müller–Hannemann, M.; Pajor, T.; Sanders, P.; Wagner, D.; Werneck, R.F. Route Planning in Transportation Networks. In Algorithm Engineering—Selected Results and Surveys; Lecture Notes in Computer Science; Kliemann, L., Sanders, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2016; Volume 9220, pp. 19–80.
  4. Möhring, R.H.; Schilling, H.; Schütz, B.; Wagner, D.; Willhalm, T. Partitioning Graphs to Speedup Dijkstra’s Algorithm. ACM J. Exp. Algorithm. 2006, 11, 1–29.
  5. Hilger, M.; Köhler, E.; Möhring, R.H.; Schilling, H. Fast Point-to-Point Shortest Path Computations with Arc-Flags. In The Shortest Path Problem: Ninth DIMACS Implementation Challenge; Demetrescu, C., Goldberg, A.V., Johnson, D.S., Eds.; DIMACS Book; American Mathematical Society: Providence, RI, USA, 2009; Volume 74, pp. 41–72.
  6. Bauer, R.; Delling, D. SHARC: Fast and Robust Unidirectional Routing. In Proceedings of the 10th Workshop on Algorithm Engineering and Experiments (ALENEX’08), San Francisco, CA, USA, 19 January 2008; Munro, I., Wagner, D., Eds.; SIAM: Philadelphia, PA, USA, 2008; pp. 13–26.
  7. Sanders, P.; Schultes, D. Engineering Highway Hierarchies. ACM J. Exp. Algorithm. 2012, 17, 1–40.
  8. Bauer, R.; Delling, D.; Sanders, P.; Schieferdecker, D.; Schultes, D.; Wagner, D. Combining Hierarchical and Goal-Directed Speed-Up Techniques for Dijkstra’s Algorithm. ACM J. Exp. Algorithm. 2010, 15, 1–31.
  9. Gutman, R.J. Reach-Based Routing: A New Approach to Shortest Path Algorithms Optimized for Road Networks. In Proceedings of the 6th Workshop on Algorithm Engineering and Experiments (ALENEX’04), New Orleans, LA, USA, 10 January 2004; SIAM: Philadelphia, PA, USA, 2004; pp. 100–111.
  10. Geisberger, R.; Sanders, P.; Schultes, D.; Vetter, C. Exact Routing in Large Road Networks Using Contraction Hierarchies. Transp. Sci. 2012, 46, 388–404.
  11. Delling, D.; Goldberg, A.V.; Pajor, T.; Werneck, R.F. Customizable Route Planning in Road Networks. Transp. Sci. 2017, 51, 566–591.
  12. Dibbelt, J.; Strasser, B.; Wagner, D. Customizable Contraction Hierarchies. ACM J. Exp. Algorithm. 2016, 21, 1–5.
  13. George, A. Nested Dissection of a Regular Finite Element Mesh. SIAM J. Numer. Anal. 1973, 10, 345–363.
  14. Hamann, M.; Strasser, B. Graph Bisection with Pareto Optimization. ACM J. Exp. Algorithm. 2018, 23, 1–2.
  15. Schild, A.; Sommer, C. On Balanced Separators in Road Networks. In Proceedings of the 14th International Symposium on Experimental Algorithms (SEA’15), Paris, France, 29 June–1 July 2015; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2015; pp. 286–297.
  16. Sanders, P.; Schulz, C. Think Locally, Act Globally: Highly Balanced Graph Partitioning. In Proceedings of the 12th International Symposium on Experimental Algorithms (SEA’13), Rome, Italy, 5–7 June 2013; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 7933, pp. 164–175.
  17. Karypis, G.; Kumar, V. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J. Sci. Comput. 1999, 20, 359–392.
  18. Delling, D.; Goldberg, A.V.; Razenshteyn, I.; Werneck, R.F. Graph Partitioning with Natural Cuts. In Proceedings of the 25th International Parallel and Distributed Processing Symposium (IPDPS’11), Anchorage, AK, USA, 16–20 May 2011; IEEE Computer Society: Washington, DC, USA, 2011; pp. 1135–1146.
  19. Sanders, P.; Schulz, C. Distributed Evolutionary Graph Partitioning. In Proceedings of the 14th Meeting on Algorithm Engineering and Experiments (ALENEX’12), Kyoto, Japan, 16 January 2012; SIAM: Philadelphia, PA, USA, 2012; pp. 16–29.
  20. Ford, L.R., Jr.; Fulkerson, D.R. Maximal flow through a network. Can. J. Math. 1956, 8, 399–404.
  21. Bauer, R.; Columbus, T.; Rutter, I.; Wagner, D. Search-space size in contraction hierarchies. Theor. Comput. Sci. 2016, 645, 112–127.
  22. Strasser, B. FlowCutter Implementation. Available online: https://github.com/kit-algo/flow-cutter/tree/cch-tree-order (accessed on 22 May 2019).
  23. Iyer, A.V.; Ratliff, H.D.; Vijayan, G. Optimal Node Ranking of Trees. Inf. Process. Lett. 1988, 28, 225–229.
  24. Schäffer, A.A. Optimal node ranking of trees in linear time. Inf. Process. Lett. 1989, 33, 91–96.
  25. Ahuja, R.K.; Magnanti, T.L.; Orlin, J.B. Network Flows: Theory, Algorithms, and Applications; Prentice Hall: Upper Saddle River, NJ, USA, 1993.
  26. Hopcroft, J.E.; Tarjan, R.E. Efficient Algorithms for Graph Manipulation. Commun. ACM 1973, 16, 372–378.
  27. Demetrescu, C.; Goldberg, A.V.; Johnson, D.S. (Eds.) The Shortest Path Problem: Ninth DIMACS Implementation Challenge; DIMACS Book; American Mathematical Society: Providence, RI, USA, 2009; Volume 74.
  28. Gottesbüren, L.; Hamann, M.; Uhl, T. InertialFlowCutter Implementation and Evaluation Scripts. Available online: https://github.com/kit-algo/InertialFlowCutter (accessed on 28 June 2019).
  29. Strasser, B. CCH Implementation in RoutingKit. Available online: https://github.com/RoutingKit/RoutingKit (accessed on 3 June 2019).
  30. Buchhold, V.; Sanders, P.; Wagner, D. Real-Time Traffic Assignment Using Fast Queries in Customizable Contraction Hierarchies. In Proceedings of the 17th International Symposium on Experimental Algorithms (SEA’18), L’Aquila, Italy, 27–29 June 2018; Leibniz International Proceedings in Informatics; pp. 27:1–27:15.
  31. Schulz, C. KaHiP Implementation. Available online: https://github.com/schulzchristian/KaHIP (accessed on 10 June 2019).
  32. Karypis, G.; Kumar, V. Metis Binary Distribution. Available online: http://glaros.dtc.umn.edu/gkhome/metis/metis/download (accessed on 10 June 2019).
  33. Dinitz, Y. Algorithm for Solution of a Problem of Maximum Flow in a Network with Power Estimation. Sov. Math.-Dokl. 1970, 11, 1277–1280.
  34. Sanders, P.; Schulz, C. Advanced Multilevel Node Separator Algorithms. In Proceedings of the 15th International Symposium on Experimental Algorithms (SEA’16), St. Petersburg, Russia, 5–8 June 2016; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2016; Volume 9685, pp. 294–309.
  35. Strasser, B.; Wagner, D. Graph Fill-In, Elimination Ordering, Nested Dissection and Contraction Hierarchies. In Gems of Combinatorial Optimization and Graph Algorithms; Schulz, A.S., Skutella, M., Stiller, S., Wagner, D., Eds.; Springer: Berlin/Heidelberg, Germany, 2015; pp. 69–82.
Figure 1. Inertial Flow projection and initialization. Each node is projected onto its closest point on the line spanned by the blue direction vector. S and T are highlighted.
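For illustration, the initialization step described in the caption of Figure 1 can be sketched as follows. This is a minimal, hypothetical example assuming two-dimensional node coordinates and a balance parameter alpha; all identifiers are made up here and the sketch is not taken from the InertialFlowCutter code [28].

```cpp
// Illustrative sketch (not the authors' code): Inertial Flow initialization.
// Nodes are ordered by their scalar projection onto a direction vector; the
// first alpha*n nodes in this order are fixed as sources S, the last alpha*n
// as sinks T.
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct Point { double x, y; };

struct SourceSink {
    std::vector<int> S, T;  // fixed source and sink nodes
};

SourceSink inertial_flow_init(const std::vector<Point>& coord,
                              Point direction, double alpha) {
    const std::size_t n = coord.size();
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);

    // Scalar projection of node v onto the direction vector.
    auto proj = [&](int v) {
        return coord[v].x * direction.x + coord[v].y * direction.y;
    };
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return proj(a) < proj(b); });

    // Fix the first and last alpha*n nodes (alpha should be below 0.5).
    const std::size_t k = static_cast<std::size_t>(alpha * n);
    SourceSink st;
    st.S.assign(order.begin(), order.begin() + k);
    st.T.assign(order.end() - k, order.end());
    return st;
}
```

The fixed sets then serve as the source and sink terminals of the subsequent flow computation.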
Figure 2. One piercing step of InertialFlowCutter. S in dark green, S_r \ S in bright green, T in bright blue, T_r \ T in dark blue. The vertical lines depict jumps in the Inertial Flow order. The cuts are depicted in red. The orange stripes in the left figure depict the piercing nodes for the source-side cut. In this case, they overlap with T_r, which means that in the next iteration, the cut size increases. The right figure shows the piercing nodes settled to S as well as the newly computed cuts and S_r, T_r. The previous source-side cut is dashed.
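A schematic outline of one such piercing step is given below. It only fixes the control flow suggested by the caption of Figure 2; the max-flow augmentation and the piercing heuristic are assumed to be supplied by the caller, and the sketch does not reproduce the FlowCutter or InertialFlowCutter implementations [22,28].

```cpp
// Illustrative outline (an assumption-laden sketch, not the authors' code):
// one piercing step grows the smaller side of the current cut by a piercing
// node and re-augments the flow, yielding the next cut on the Pareto front.
#include <functional>
#include <vector>

struct CutState {
    std::vector<bool> in_S, in_T;   // nodes settled to the source / sink side
    std::vector<int> source_cut;    // nodes of the current source-side cut
    std::vector<int> sink_cut;      // nodes of the current sink-side cut
    long long source_side_size = 0, sink_side_size = 0;
};

void piercing_step(CutState& cut,
                   const std::function<int(const CutState&, bool)>& choose_piercing_node,
                   const std::function<void(CutState&, int, bool)>& settle_and_augment) {
    // Grow the side that currently contains fewer nodes to improve balance.
    const bool pierce_source = cut.source_side_size <= cut.sink_side_size;

    // Pick a piercing node on the chosen cut; heuristics prefer nodes whose
    // addition does not immediately increase the cut size (e.g., nodes not
    // reachable from the opposite side, cf. the overlap with T_r in Figure 2).
    const int v = choose_piercing_node(cut, pierce_source);

    // Settle v to the chosen side and re-run flow augmentation until a new
    // minimum cut separating the enlarged terminal sets is found.
    settle_and_augment(cut, v, pierce_source);
}
```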
Table 1. Benchmark road networks.
Graph | n | m
Colorado | 436 · 10³ | 10⁶
California and Nevada | 1.9 · 10⁶ | 4.6 · 10⁶
USA | 24 · 10⁶ | 57 · 10⁶
Europe | 18 · 10⁶ | 44 · 10⁶
Table 2. Customizable Contraction Hierarchy (CCH) order experiments. The best values (smallest) per metric and graph are highlighted.
Graph / Order | Search Space Nodes (Avg., Max.) | Search Space Arcs [·10³] (Avg., Max.) | CCH Arcs [·10⁶] | CCH #Tri. [·10⁶] | Up. Tw. Bd. | Order Time [s] | Cust. [ms] | Query [μs]
ColM155.63546.122.013.763.91021.858.821.1
K0.61135.13574.621.616.772.41033837.166.416.9
K1.00136.43574.822.115.069.1991052.462.017.1
K2.11135.13634.722.814.968.4100924.661.816.9
I151.35436.237.715.073.91193.063.720.1
F3127.22774.114.412.847.4859.446.715.8
F20122.52633.813.812.543.88755.944.314.7
F100122.32633.813.812.543.787274.544.414.6
IFC4123.22613.913.712.544.11006.943.514.9
IFC8123.32613.913.712.543.910012.943.414.9
IFC12123.12633.914.012.543.68718.743.314.8
IFC16123.12623.914.012.543.58724.343.214.8
CalM275.554317.353.265.0364.11809.8310.147.9
K0.61187.74837.037.074.8342.416018,659.3316.424.9
K1.00184.94716.837.969.5334.41436023.6302.324.7
K2.11184.84496.836.569.5332.41624374.9300.824.7
I191.46057.153.468.8341.316116.0301.725.4
F3178.83616.224.959.2235.413257.9240.223.3
F20169.63835.626.358.0218.5132358.5229.721.9
F100169.63865.626.358.0218.31321759.2229.921.9
IFC4170.03805.626.258.0217.613242.3225.121.7
IFC8169.83805.626.258.0217.713279.0225.121.7
IFC12169.43805.626.257.9217.2132115.2224.821.6
IFC16170.23815.726.258.0218.4132151.9225.821.9
EurM1167.31914373.1765.9697.413,238.1828124.68302.3645.3
K0.61638.61224114.3284.1739.25782.5482213,091.14464.5229.1
K1.00652.51279113.4286.7683.35745.4451242,680.54169.7223.7
K2.11652.61198113.5262.4683.15637.744949,553.14125.0224.1
I732.61569149.6413.6674.05897.3516450.34177.1280.0
F3743.71156138.1283.7602.15004.24932227.93682.0262.0
F20622.31142106.6262.1588.34624.145416,130.53527.4211.4
F100615.51101103.2237.2588.44606.644979,176.63511.0206.7
IFC4663.01087108.8246.7589.34644.94471245.73506.8223.1
IFC8608.61092102.1246.7588.64587.14542448.13508.5203.8
IFC12611.11094103.3247.2588.84627.54543608.63511.9205.7
IFC16609.51092102.8246.7588.74616.64544780.23505.0204.2
USAM1020.91763273.6666.7861.712,738.5733171.97804.3491.4
K0.61575.5104171.3185.0979.07371.2366265,567.35449.2158.4
K1.00540.3106362.3208.1887.46483.3439315,942.64717.3135.6
K2.11543.7101563.2190.2887.46454.633668,828.14711.4137.2
I533.7137162.0290.9887.96820.5384439.54821.1135.5
F3512.092957.5163.0758.94845.63321813.03812.8126.2
F20491.286152.9154.0743.44425.231211,443.13610.6119.0
F100491.186452.8153.9743.64431.631156,934.73608.3118.5
IFC4491.786552.8153.4743.14421.93101028.43608.2118.4
IFC8491.485952.8153.4743.04423.63122022.93606.8121.7
IFC12490.786552.7153.4742.84409.73112977.33599.6118.1
IFC16491.186052.8153.4742.94421.83123938.53592.8118.2
Table 3. Running times in seconds of IFC8, using up to 16 cores of the Skylake CPU.
Graph | Metric | 1 Core | 2 Cores | 4 Cores | 8 Cores | 16 Cores
Col | Time [s] | 11.6 | 6.1 | 3.3 | 2.1 | 1.7
Col | Speedup | 1.0 | 1.9 | 3.5 | 5.5 | 6.8
Cal | Time [s] | 71.5 | 36.7 | 19.2 | 11.3 | 7.2
Cal | Speedup | 1.0 | 1.9 | 3.7 | 6.3 | 9.9
Eur | Time [s] | 2257.8 | 1160.0 | 600.7 | 334.2 | 241.8
Eur | Speedup | 1.0 | 1.9 | 3.8 | 6.8 | 9.3
USA | Time [s] | 1869.5 | 947.9 | 497.0 | 275.5 | 173.2
USA | Speedup | 1.0 | 2.0 | 3.8 | 6.8 | 10.8
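The speedup rows are the ratio of the single-core running time to the running time on the given number of cores; for Europe, for instance, 2257.8 s / 241.8 s ≈ 9.3 on 16 cores.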
Table 4. Pareto cuts of Colorado.
max ε | Achieved ε [%] (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I) | Cut Size (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I)
00.00.00.00.00.0 1.0 0.06048484444 35 259
10.40.60.20.80.81.00.00.14741382828343796
32.82.81.90.80.82.80.00.73636362828335770
54.34.34.30.80.84.42.70.92828282828323960
108.98.98.90.89.18.9<0.11.42222222822224346
2011.611.611.618.818.818.816.714.020202019191923027
3027.627.627.627.627.610.3<0.123.11414141414214421
5040.640.640.640.640.634.144.336.41212121212132214
7057.657.657.640.657.640.641.248.8111111121112128712
9081.289.081.283.587.389.447.481.596911859719
max ε | Are Sides Connected? (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I) | Running Time [s] (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I)
00.60.81.21.610.82.10.1
10.50.71.01.38.52.40.10.1
30.40.60.91.38.53.40.10.1
50.30.50.71.38.54.60.10.1
100.20.40.61.37.113.40.10.1
200.20.40.50.96.426.20.10.1
300.20.30.40.75.026.70.10.1
500.10.30.30.64.417.80.10.1
700.10.20.30.64.043.70.10.1
900.10.20.30.63.027.70.10.2
Table 5. Pareto cuts of California and Nevada.
max ε | Achieved ε [%] (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I) | Cut Size (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I)
00.00.00.00.00.0 1.0 0.05746468043 48 306
11.01.01.00.20.20.20.00.63935356131316393
32.32.32.32.42.32.3<0.11.12929295029295164
52.32.32.34.32.32.3<0.11.62929293429295662
102.32.32.35.32.32.30.30.62929292929294437
2016.716.716.75.316.716.7<0.12.72828282928284729
3016.716.716.75.316.72.3<0.15.52828282928295029
5042.342.342.35.349.12.333.340.8252525292429311827
7042.342.342.367.049.12.341.242.6252525282429334326
9085.485.485.490.089.849.147.485.6181818151424304018
max ε | Are Sides Connected? (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I) | Running Time [s] (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I)
02.53.55.011.858.910.60.3
11.93.04.311.149.613.00.70.3
31.42.53.510.147.522.70.70.3
51.42.53.58.147.536.80.70.4
101.42.53.57.247.574.50.70.4
201.42.33.47.246.0104.00.70.5
301.42.33.47.246.0172.60.70.6
501.22.02.97.240.2210.80.71.1
701.22.02.96.940.2227.60.71.5
900.81.42.13.923.5110.50.71.6
Table 6. Pareto cuts of Europe.
max ε | Achieved ε [%] (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I) | Cut Size (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I)
00.00.00.00.00.0 1.0 0.0311289288273271 148 1578
11.01.00.30.70.31.0<0.10.3274274243246224148393417
33.03.02.30.71.32.6<0.10.4259241238246219130434340
54.84.84.24.65.02.9<0.10.2226226215211207129452299
109.59.59.59.59.57.9<0.10.2188188188188188112468284
209.59.59.59.59.57.8<0.17.5188188188188188113403229
309.59.59.59.59.526.8<0.19.1188188188188188104463202
5049.049.049.09.543.78.233.39.52323231883911116,151188
7049.070.049.064.567.532.141.264.723202358228623,02138
9072.872.872.872.872.872.872.872.822222222
max ε | Are Sides Connected? (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I) | Running Time [s] (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I)
098.4183.3253.0503.63240.0141.55.1
191.9179.3232.1475.12965.1193.98.13.9
389.2169.0229.7475.12930.9352.48.14.8
581.9162.5216.3428.02839.4639.18.16.4
1067.1138.5192.9390.12647.12256.78.111.2
2067.1138.5192.9390.12647.13618.48.123.6
3067.1138.5192.9390.12647.12406.78.141.7
5010.716.724.1390.1613.54233.78.286.5
7010.714.924.1124.1361.83351.58.223.7
904.37.911.96.549.13353.08.14.8
Table 7. Pareto cuts of the USA.
max ε | Achieved ε [%] (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I) | Cut Size (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I)
00.00.00.00.00.0 0.8 0.0115115115115115 118 1579
10.60.60.60.60.60.50.00.4828282828294178406
32.32.32.32.32.32.4<0.10.1767676767673192257
53.83.83.83.83.83.80.00.1616161616161289186
103.83.83.83.83.83.8<0.13.261616161616125381
203.83.83.83.83.83.8<0.13.961616161616122261
303.83.83.83.83.83.8<0.13.961616161616123261
503.83.83.83.83.83.83.73.961616161616124261
7069.669.669.669.669.63.841.266.546464646466141,97661
9069.669.669.669.669.669.647.470.346464646464645,40946
max ε | Are Sides Connected? (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I) | Running Time [s] (IFC4, IFC8, IFC12, F3, F20, K2.11, M, I)
060.6102.0145.3246.91963.9179.26.5
143.781.5117.4234.51628.9246.510.95.1
340.475.7108.2223.91533.5691.410.95.5
531.860.286.0198.01290.51329.810.96.0
1031.860.286.0198.01290.51710.710.96.2
2031.860.286.0198.01290.52983.510.99.3
3031.860.286.0198.01290.54891.810.916.6
5031.860.286.0198.01290.55307.410.932.4
7024.444.763.2154.2985.15445.211.250.9
9024.444.763.2154.2985.110,637.711.261.2
Table 8. CCH performance of different parameter configurations of IFC8 on Europe. Bold values are the best in their category. Darker shades indicate better values.
Configuration (α, δ, γ_a, γ_o) | Search Space Nodes (Avg., Max.) | Search Space Arcs [·10³] (Avg., Max.) | CCH Arcs [·10⁶] | CCH #Tri. [·10⁶] | Up. Tw. Bd. | Order Time [s] | Cust. [ms] | Query [μs]
0.050.050.30.1610.21092102.8248.9588.74586.645429333388204.1
0.050.050.30.15608.61093102.2248.9588.64578.145426443380203.2
0.050.050.350.15608.61093102.2248.9588.64577.845426553385203.2
0.050.050.30.2610.61096103.0248.9588.94621.645425053400204.1
0.050.050.350.2610.51098103.0246.4588.94620.845424873396203.9
0.050.050.40.2610.31098102.9246.7588.94622.245424953400204.4
0.050.050.30.25610.51096103.0248.9588.94630.845424763404204.4
0.050.050.350.25610.61092103.1246.4588.94629.645424643403204.3
0.050.050.40.25608.61092102.1246.7588.64587.145424483401202.9
0.050.050.350.3610.61092103.0246.4588.84628.045424573396204.2
0.050.050.40.3609.51092102.9246.7588.74625.645424453398203.7
0.050.050.40.35609.61094102.9246.7588.84626.745424453404203.9
0.050.10.30.1610.31092102.9248.9588.84594.845429043391204.4
0.050.10.30.15610.71116103.1248.9588.84603.445425953413204.5
0.050.10.350.15608.51116102.2248.9588.74586.145425893399208.2
0.050.10.30.2612.41094103.8248.9588.94628.545425083402205.8
0.050.10.350.2610.11093102.8246.4588.84611.845424803396203.8
0.050.10.40.2610.31093102.9246.7588.94616.645424893398203.9
0.050.10.30.25612.31094103.7248.9588.94628.745425163400205.6
0.050.10.350.25610.11099102.8246.4588.84610.345424973392204.1
0.050.10.40.25608.21099102.0246.7588.64579.145424783388203.4
0.050.10.350.3610.21093102.9246.4588.84614.845424893396204.0
0.050.10.40.3609.51093102.9246.7588.84622.745424823395203.8
0.050.10.40.35609.61095102.9246.7588.84623.145424753397203.6
0.050.150.30.1610.31092102.9248.9588.84594.845429063396204.8
0.050.150.30.15610.71116103.1248.9588.84603.045425723393204.5
0.050.150.350.15608.61116102.2248.9588.74588.045425683396203.1
0.050.150.30.2612.61116103.8248.9589.04637.045425233407205.9
0.050.150.350.2610.31114102.9246.6588.94617.345424943400204.4
0.050.150.40.2610.51114102.9246.7588.94622.645425043402204.5
0.050.150.30.25612.61100103.8248.9589.14644.545425213406205.7
0.050.150.350.25610.31100102.9246.4588.94619.245425073398203.9
0.050.150.40.25608.81100102.1246.7588.74588.645424893393203.1
0.050.150.350.3610.41100102.9246.4588.94624.245425063401204.2
0.050.150.40.3609.61100102.9246.7588.84626.945424953393203.8
0.050.150.40.35609.71100102.9246.7588.94634.345424923408203.9
0.010.050.40.25609.01095102.4248.5588.74596.345428173387203.6
0.0250.050.40.25607.61095101.9248.5588.64585.245426583394203.1
0.0750.050.40.25633.01131110.8265.8588.94638.945024023416216.1
0.10.050.40.25641.91140112.7274.9589.04650.945121823417219.0
0.1250.050.40.25651.51118106.2263.8589.14618.047518003386211.9
0.150.050.40.25651.61108106.2263.3589.24616.147516563390216.6
