In-Path Oracles for Road Networks

: Many spatial applications beneﬁt from the fast answering to a seemingly simple spatial query: “Is a point of interest (POI) ‘in-path’ to the shortest path between a source and a destination?” In this context


Introduction
A bread and butter operation on a road network is finding the shortest path from a source to a destination and a natural corollary to that is the determination of whether a POI is along the shortest path or can be reached by making a small detour. This problem is called an "in-path" query. While this problem is deceptively simple, it almost always involves computing the shortest path or at least some part of the shortest path, which makes this operation slow. Given a set of POIs, the in-path query is a determination of which of the POIs lies on the shortest path or can be reached by taking a bounded detour from the shortest path. The detour can be expressed either by a fixed value (e.g., 1 km) or a percentage fraction (e.g., say 5% additional detour from the shortest path). Now, there are many applications that require answering millions of in-path queries on a road network. Consider the following real-life scenarios where the notion of in-path is of critical importance.

[Analysis or Simulation Queries]
A city wants to choose a COVID vaccination camp and wants to choose one among hundreds of possible locations. One of the considerations among others is that these camps should be in-path to many commuters from a target group whose home and work locations are known. The city takes the daily commute routes of their target group and chooses the one location for the camp that is in-path to the most number of commuters. Related use-cases here include evacuation or shelter planning [21][22][23][24][25][26], location planning for new businesses [27], choosing new sites for playgrounds, etc.
Both these queries require computing millions of in-path queries per second and at high throughput. When a user is interacting with a routing app, the window of opportunity for displaying coupons or gas stations along the way is very short. Answering in-path queries requires finding the shortest path or at least part of it. Our challenge in this paper is doing so without having to actually compute the shortest path during runtime.
We introduce an in-path oracle that is able to answer millions of in-path queries per second. An oracle [28][29][30][31] is a data structure that is able to answer a query not by computing but by looking up a precomputed result quickly. In our case, the in-path oracle is able to make the determination for any arbitrary source and destination if a POI is along the shortest path or not. The key innovation here is that to make this determination, we do not have to compute the shortest path between the sources and the destinations. The construction of the oracle involves precomputation and storing all the pertinent information needed to answer the in-path queries. The precomputation is done in an offline process which aids in our stated goal of building a spatial platform that can support in-path queries at high throughput. To the best of our knowledge, the high-throughput computation of "in-path" queries is not yet explored on a real road network. Our work builds on a body of road network oracles like distance oracles [29], and path oracles [30] that have been pioneered by the authors and others over the last decade. This work adds to the arsenal of high throughput operations one can perform on road networks. We list out the following important contributions in the paper.
[Throughput:] The in-path oracle relies on precomputing using an offline process so that queries can be answered at a high throughput rate. Once precomputation is done, our in-path oracle can answer 1.5M in-path queries per second on a single desktop machine. We compared our approach to the dual Dijkstra approach [12] and show that our approach is many orders of magnitude faster since it can answer in-path queries without computing the shortest path.
[Size:] The next consideration is the size of the oracles. We show both theoretically and experimentally that the size of our oracle is O(m × n) where n is the number of nodes in the road network and m is the number of POIs. In-path queries can be answered in O(log n + F) time where F is the number of POIs in the result. The complexity bound is achieved by appealing to the similarity to WSP decomposition [32][33][34][35][36] which provides us with the linear bound on the number of nodes on the road network. Note that the linear bound means that each POI requires just a few MB of storage and our approach can be scaled to a large number of POIs.
[Computation Time:] The precomputation takes about tens of minutes on a single machine for city-sized road networks. Note that, although not explored in the paper, since the computation is data parallel, there are straightforward techniques to speed up the precomputation by employing more machines.
[Interoperability:] The in-path oracle can be embedded, employed, and used in multiple ways. One of the methods described in this paper is embedding in a relational database system [37] and querying using SQL. This means that the in-path oracle can be used in complex querying scenarios.
Related work is discussed in Section 2. Next, Section 3 provides the background and problem statement. A dual Dijkstra based baseline algorithm for finding in-path POIs is provided in Section 4. We describe the in-path oracle in Section 5, while experimental results are provided in Section 6. Finally, concluding remarks are drawn in Section 7.

Related Work
In-path queries are closely related to shortest path findings and the prior work in this area can be categorised into two groups-oracle techniques [29][30][31] and node-importance based methods [38][39][40][41]. In the following, we provide an overview of a body of work in this area and broadly distinguish our in-path oracle from these approaches.

Path and Distance Oracles
Thorup and Zwick [31] introduced the theoretical frameworks for distance oracle and since then many approximate distance oracle approaches have been proposed that trade-off between preprocessing time, space requirements, and query time. Below we briefly describe some of the oracles built on road networks while noting that our in-path oracle extends the idea of spatial coherence in road networks.
For a road network with n nodes, Sankaranarayanan et al. [30] introduced a compact construct called the path distance oracle that takes O(n) storage to encode the n 2 shortest paths between each pair of nodes. The path oracle's is predicated on the observed coherence between the spatial locations of the source and destination nodes and the shortest paths between them, which makes it easier to organise source and destination nodes into groups that have common nodes or edges on the shortest paths. Precomputing the distances between all possible sources and destinations forms the foundation of shortest path retrieval in a path oracle. Sankaranarayanan and Samet [29] extend the idea to network distances and propose approximate distance oracles for road networks based on spatial coherence. The approximate shortest distances between all possible nodes in the spatial network are precomputed and stored using a concise amount of space.
Peng and Samet [42] introduced City Distance Oracles (CDO) for local area road networks spreading over several cities to achieve fast spatial analytic queries on a city road network. Peng et al. [43] also propose SPark and Distance Oracles (SPDO), which is an approximate distance oracle for spatial analysis queries on continental road networks on Apache Spark which results in high throughput. Finally, Peng et al. [44] recent development called Distance Oracle System (DOS) can quickly process a huge number of spatial analytic queries on large road networks.
SILC [45,46] is a method to take advantage of the spatial coherence property between a single to multiple destination vertices. The shortest paths and distances are precomputed and stored using efficient SILC encoding. The Path-Coherent Pairs Decomposition (PCPD) [30] decomposes a road network based on WSP decomposition and approximately estimates the road network distance and the shortest path between any pair of nodes in a spatial road network.

Node-Importance Based Methods
Node-importance-based methods [38][39][40][47][48][49] focus on computing the network distances in the road network. The oracle approaches use the path coherence of road networks and in contrast these approaches use the graph nature of road networks. The node-importance approaches use the observation that some nodes in a spatial road network are more important than other nodes and every shortest path passes through at least one of these nodes. Node-importance-based methods [41,[50][51][52][53][54] rank the nodes in a road network according to their importance and pre-calculate the shortest path distances among the important nodes to speed-up query processing. In contrast to these approaches, in-path oracles completely precompute away the graph information of a road network, relying on spatial information for answering in-path queries.

On the Way POI Search Approaches
Yoo and Shekhar [55] propose the in-route nearest neighbour query problem. Given a source and a destination, their method identifies a POI with the minimum deviation from the query path on the way to the destination. Chen et al. [56] explore the problem of path nearest neighbour query in dynamic road networks, where a user is travelling towards a given destination from his/her current location and wants to avail of a service along the way. So, the path nearest neighbour query differs from the conventional nearest neighbour query in the sense that the neighbour is a detour along the shortest path. Each path nearest neighbour query builds a shortest path between the user's current location and the destination and then searches for the POI (i.e., with the smallest detour distance) that is closest to the shortest path relative to the entire journey path and his/her current query location. Our dual Dijkstra is a good stand-in for these approaches and experimental results show that our approach outperforms these techniques by many orders of magnitude.

Detour Queries
Obstructed detour queries were introduced by Saha et al. [57]. In their problem formulation, obstacles such as private property or a fence, and obstructed detours make straight line navigation impossible. In this formulation, one can define the notion of nearest POI which is similar in flavour to the POI approaches discussed earlier.
Shang et al. [58] propose optimization methods to quickly process the best point detour queries on road networks. From a source to a destination location on the road network, a user preferred path P is given. The best point detour query seeks quick identification of the best detour POI, i.e. detour with the lowest detour cost on the path to be travelling along P. Nutanong et al. [59] explore the problem of determining the shortest route between two locations that contain a stopover of a specific type. They are interested in minimising the overall travel distance between two locations while covering a specific kind of stopover. Note that our work with in-path oracle is more fundamental in the sense that it succinctly encodes the notion of what POI is in-path to every possible source and destination. This means that one can formulate these detour problems on top of our oracle and be able to process them efficiently.

Background
In this section, we first develop the preliminary concepts before describing the problem setup.

Road Network
In our formulation, the road network is modeled as a directed, and weighted graph, G = (V, E, W), such that V corresponds to nodes which are road intersections, while E corresponds to edges which are directed road segments connecting two nodes. Let n, and m be the number of nodes and edges respectively in the road network. Edge weights represent either the travel distance or the travel times between two adjacent nodes v i and v j that have an edge between them. The edge weight w(v i , v j ) is strictly greater than zero. In other words, there are no negative or zero edge weights.

Shortest Distance
Given source s and destination t nodes, the shortest distance d N (s, t) is obtained by summing up edges along the shortest path between s and t. Note that since w(.) denotes either distance or trip time, d N (s, t) denotes the shortest path using an appropriate unit. Note that G is fully connected such that given any two nodes u and v, there is a path from u to v and vice-versa, although they may not have the same distance due to the directed nature of the road network.

Detour
Given source s and destination t nodes, let π(s, t) denote a simple path that is not necessarily the shortest. The detour of such a path is the difference in the network distance along π(s, t) compared to d N (s, t). Furthermore, it is fairly trivial to see that the detour of any path is greater or equal to zero.

Detour Bound
We are interested in finding detours that are bounded by a fraction such that the detours are bounded by × d N (s, t). For instance, if = 0.1, we allow up to 10% increase in the path distances to accommodate the detour, so, all paths between s and t of distance ≤ (1 + ) × d N (s, t) are acceptable.

POI
A point of interest (POI) is an arbitrary node on the road network through which we want to route the shortest paths. There are m such POIs in the road network and these can be dynamically added or deleted as the algorithm proceeds.

In-Path POI
A traveller starts from a source node s and is driving to the destination node t. A POI p is said to be in-path if there is a path that passes through p that is a detour bounded by . Now that we have provided the necessary details to define the problem setup, we can provide the problem statement that we solve in this paper.
Problem definition: We are given a road network G, set P of m POIs, and a detour bound . A driver travels from source s and destination t, we want to find the set of POIs in P that are "in-path" under the conditions specified. Now that we have provided our problem statement we can also discuss the limitations of our approach. First, we are interested in retrieving all the POIs that satisfy the detour constraints. Our focus here is maximising the throughput where one can answer millions of in-path queries a second using a single machine. Our solution is not geared towards a driver that wants to visit multiple POIs yet stay within the detour bound. In our model, the expectation is that the user is presented with the POI choices and may choose one of the in-path POI to visit. Such examples include coffee shops, restaurants, gas stations, vaccination clinics, etc. The driver is unlikely to visit another POI of the same kind. Our work is in the context of the placement of relevant POIs on a map as an opportunistic service where speed is of the essence, so, the composition of complex trips that include visiting multiple POIs is not the focus of this work.

Dijkstra-Based In-Path Algorithm
Over the years, there have been many variants, modifications, extensions, and casespecific (negative weight edges, small and fixed integer weight edges) alternatives proposed to Dijkstra's algorithm. Prominent among them are A [60], Bellman-Ford algorithm [61,62], Moore's algorithm [63], Dial's algorithm [64], etc. that improved the original algorithm or applied it the algorithm to different use-cases. For instance, Moore's algorithm uses a queue (instead of a priority queue) that can handle negative edge weights. In our problem formulation, we require a way of not only finding the shortest paths but also visiting the detour paths in an incremental fashion.
In this paper, we focus on developing the dual Dijkstra algorithm [12], a Dijkstra based variant for finding POIs within the specified detour tolerance limit . This approach serves as a baseline algorithm for our oracle-based approach as well as a useful mechanism to highlight a few properties that we will use later. Here we are given a source s, a destination t, detour limit , and a set of POIs P. The algorithm fetches the subset of POIs in P that are in-path with respect to the given source and destination.
Algorithm 1 sets up the dual Dijkstra's implementation for determining the in-path POIs for a given source s and destination t. Our algorithm makes use of a priority queue Q of objects O, which are made up of triples, v(O), d(O), and l(O), namely, the current node, road network distance, and the traversal label. To aid efficient storage representation, these handy functions are defined to access the fields from the object O. Note that l(O) can either be a backward or forward scan or a POI. The current state of the algorithm is stored in two data structures T and VISITED. T stores the forward and backward distances of a node from s and t. For instance, T(v, B) and T(v, F) provide the backward and forward distances of a node v, respectively. Another data structure VISITED(.) helps avoid revisiting already visited nodes and is also initially empty. Since we have two simultaneous traversals, this data structure needs to keep track of which traversal (i.e., B and/or F) has visited a node. For s and t, VISITED is marked TRUE, i.e., VISITED is initialised. We initialise the forward scans from s and the backward scans from t. Updation of the priority queue Q takes place with the forward and backward traversals from s and t respectively. The incoming and outgoing nodes are added into Q with their appropriate labels. Finally, d N is the estimate of the shortest path distance computed so far, and result stores the POIs found thus far.
The primary approach is presented in Algorithm 2, which is based on the aforementioned setup and initialisation. Line 1 illustrates the termination condition of the algorithm. When the Q is empty, the algorithm stops. In other words, when the whole graph has been traversed or if the element at the front of the queue is at a distance more than d N , the algorithm stops. We provide correctness proof for the algorithm in Lemma 1.
We start by looking into object O, which has been taken out of the Q front. When a POI is found at the front of the queue, the POI is in-path and is added to the result. This is shown in lines 2-5. In lines 6-10 it checks whether the current extracted object O from Q is already marked, i.e., we look at VISITED to check if the node has been already traversed. If so, we ignore it and continue with the traversal, otherwise, the node is marked as visited. To ensure that every object's label is correctly reflected in the VISITED data structure, we label every object whether the traversal is forward or backward. After that, in line 11, we check the corresponding entries of T, i.e. whether an update requires the current object's node v(O) and the label l(O). If the distance has already been computed, we stop trying to improve it. This is because of the best-first nature of the backward and forward traversals on G. At this point, we furthermore check whether both the labels (i.e., backward and forward) of node v(O) are computed. When both the entries (i.e., backward and forward distance) entries are computed, then in line 14 we compute the current shortest path distance and compare it with the existing shortest path distance. In case, the current shortest path distance is less than the existing shortest path distance, we replace the existing shortest path distance with the current shortest path distance. Then in line 19, we add an object triple into Q which stores the fully resolved node's current shortest path distance and the POI node. When we say fully resolved, we mean that both the forward and backward traversals have reached the POI after which it is added back to Q.

Algorithm 2
Determine if p is in-path to s, t for a given .
POI is found 4: continue 5: end if 6: if if Not EXISTS T(v(O), l(O)) then 12: Update 15: if d N > d * (1 + ) then Updated once. 16: 17: end if 18: if v(O) ∈ P then 19: Add fully resolved POI to priority queue 20: end if 21: end if 22: end if 23: if l(O) == F then Forward scan 24: for end for 27: else Backward Scan 28: for Proof. Suppose p ∈ P is a POI that may be in-path with respect to the shortest path between s and t. Here we have to be cognizant of two cases and handle them appropriately.

•
Case 1: p lies on the way from s to t, so to reach p, no detour is required. In other words, p is part of the shortest path. • Case 2: p does not lie on the way from s to t, so to reach p, a detour is required. The complexity, in this case, is that the determination of whether p is within the detour tolerance can only be made after the shortest distance between s and t has been determined. Therefore, the algorithm should seamlessly work for an arbitrary value of .
For Case 1, the algorithm can only stop when the object at the front of the queue is d(Q. f ront()) > d N . Note that d N ≥ d N (s, t), so we are guaranteed that all the nodes in the shortest path are visited by the traversal before the algorithm fetches the POIs corresponding to Case 2. Now for Case 2, similarly, the algorithm cannot stop before all nodes that have a distance of d N (s, t) * (1 + ) have been visited by both the forward and backward scans. Note that this means all the POIs qualifying for the detour requirement are retrieved by the algorithm.

In-Path Oracles
The objective of this paper is to design an in-path oracle to efficiently answer the problem defined in section 3 whether a given POI is in-path with respect to a given source and destination provided during query time. The goal of an in-path oracle is to answer this question quickly using precomputed results. In this paper, we define a new kind of oracle but the concept of oracles in itself is not new. For instance, a distance oracle [29] provides an -approximate shortest distance, while a path oracle [30] provides both an approximate distance as well as the shortest path. In contrast, the in-path oracle proposed here retrieves the POIs that are in-path to the shortest path of the driver travelling from s to a destination t, subject to a detour limit .
The key idea behind in-path oracles is that it exploits the spatial coherence [46] property in road networks which observes that spatially adjacent nodes in a road network share common characteristics. The shortest paths and distances between nodes and their spatial locations are coherent with one another which forms the key principle behind the path and distance oracles [29,30].
The in-path oracle expands on the spatial coherence property in road networks. We know from path coherence that a set of source nodes A and destination nodes B may share the same shortest paths if A and B are sufficiently far apart from one another but the nodes in A and B themselves are close to one another. Since they share common paths it means that we can now determine if a given POI is in-path with respect to this group of nodes as opposed to single pairs of shortest paths. This is a powerful idea since it allows us to define the "in-path" property on groups of nodes as opposed to pairs of nodes resulting in dramatic space savings and efficient access.
In the rest of this section, in Section 5.1, we establish the sufficient conditions by which we can determine if a POI p is in-path to a set of sources A and destinations B. Next, we provide the algorithm for computing the in-path oracle in Section 5.2. We discuss how the oracle is stored and indexed in Section 5.3. Finally, discuss how the oracle is queried using a few examples in Section 5.4.

Identifying In-Path Property
In this section, we define the in-path property for a set of source nodes A and a set of destination nodes B. The sources A and destinations B are represented by a bounding box (later we restrict the box to be a quadtree [65][66][67][68][69][70][71][72] block) that contains all the nodes in A (and B). Let s be a randomly chosen representative source node in A while t is a representative destination node in B. Let p be the POI to which we want to make a determination if all the shortest path from all the sources in A to all the destinations in B are in-path to p. In other words, we want to determine if block-pair (A, B) is in-path to p. Note that we make this determination for a single POI and then repeat the process for all the m POIs.
Let r F a be the forward radius from s denoting the farthest distance of a node in A from s. Similarly, r B a is the backward radius to s denoting the farthest distance from a node in A to s. We can compute a similar radius for t in B. We define r F b and r B b denoting the forward and backward radius of B, which is depicted in Figure 1. For every pair of source nodes in A to every destination node in B, one can define the shortest path. Among these shortest paths, one can bound the shortest and longest shortest paths. These are captured by the following lemmas.

Lemma 2. The length of any shortest path between A and B is greater than or equal to
Proof. This can be trivially shown by counter example. If the path is smaller than that quantity, then one can go from s to t using a shorter distance than d N (s, t). Let us choose an arbitrary s and t to be an arbitrary source and destination and, furthermore, the network distance between them is smaller than the unit derived above. Then one can take a shortest path s −→ s −→ t −→ t. Note that s −→ s is bounded by r F a and t −→ t is bounded by r B b . Hence, d N (s , t ) has to be greater than or equal to d N (s, t) − (r F a + r B b ) else d N (s, t) is not the shortest distance between s, t, which would be a contradiction.

Lemma 3. The length of any shortest path between A and B is at most
Proof. Let us choose an arbitrary s and t be an arbitrary source and destination and, furthermore, one can define a path as follows: . Hence, the lemma.

Lemma 4. (A, B) is in-path to p if the following condition is satisfied.
Proof. For any given node s , t in A, B, respectively, the network distance between d N (s , t ) is at least d N (s, t) − (r F a + r B b ) from Lemma 2. We can estimate a path that passed s −→ s −→ p −→ t −→ t . If p is in-path to s −→ t , then any path passing via p should be at most r B a + d N (s, p) + d N (p, t) + r F b . For the shortest possible path in A, B to be in-path with p, we have the following inequality: If the above inequality is satisfied for a particular pair of (A, B), then any shortest path between A and B is in-path to p.

Lemma 5. (A, B) is not in-path to p if the following condition is satisfied.
Proof. For any given node s , t in A, B, respectively, the network distance between d N ( if s , t are part of the shortest path between s, p and p, t, respectively. If the shortest path between s , t is not in-path to p, then we have the following inequality: If the above inequality is satisfied for a particular pair of (A, B), then any shortest path between A and B is not an in-path to p. Lemma 6. If both conditions in Lemmas 4 and 5 are not satisfied, then some sub-division of A, B is in-path to p.
Proof. Subdividing A and B reduces the radius of the blocks and could satisfy the conditions in Lemmas 4 and 5. In other words, if we cannot make a determination whether A, B is in-path to p, we need to keep subdividing it until children blocks are either empty, in-path, or not in-path. Figure 2 shows an example of both in-path and not in-path cases in practice. Here the POI is shown using a green circle. The set of sources and destinations are shown using blue colour. The common shortest path between a randomly chosen representative source and destination, and the resulting detour path are also shown. Now that the lemmas establishing the properties have been established, an algorithm for computing the in-path oracle for a road network is described next. The red dots indicate randomly chosen sources or destinations whose shortest paths are drawn.

Computing In-Path Oracle
Algorithm 3 provides the mechanics for computing the in-path oracle. The input to the algorithm is a road network consisting of nodes and edges. There is a quadtree on the spatial positions of the nodes such that R denotes the root block. Queue Q holds the quadtree block pairs. Note that Q is initialised with the block-pair (R, R). The algorithm continues till Q is empty (line 4). At every iteration of the algorithm, a block pair A, B is retrieved from the front of the queue. We choose a source s and destination t at random from A and B, respectively. We compute the shortest distance between s and t, i.e., d N (s, t), and the distance of the shortest path that passes through p, i.e., d N (s, p) and d N (p, t). We also compute the forward and backward radiuses of the blocks, i.e., r F a , r B a , r F b , r B b .

Algorithm 3
Determine in-path oracle for a given POI p. Insert all children block-pairs into Q 15: end while 16: return result Armed with the necessary components, one can use Lemma 4 if (A, B) denotes an in-path block-pair. If so, we add it to the result set. Similarly, Lemma 5 informs if the block-pair cannot be in-path in which case it is discarded. Finally, if both the conditions are not satisfied, we will simply break A, B into their 4 children blocks to form the resulting children blocks. These are inserted into Q. It is fairly easy to see that the children blocks might satisfy one of the two lemmas after a few decomposition steps. If one continues running the algorithm at the end we will get pairs of blocks such that they are all in-path with respect to p.
Referring back to the example in Figure 2a-b, one can see a block pair A, B denoting both the in-path and not in-path cases. Here we can see that Figure 2a corresponds to the case in line 9 where all the paths between A, B pass through the POI under consideration. The POI is shown by the green dot in the figures. Figure 2b corresponds to the case where we can determine that the p is definitely not in-path and the pair A, B can simply be discarded. This covers in line 11 in the construction algorithm. Finally, if some of the paths between A and B pass through p but some of them do not, we have to further subdivide. This is the case in line 13 where we have to further subdivide A, B into children block pairs and insert them into the priority queue. Now, we have established the principles of the in-path oracle decomposition, for a single POI, we can estimate the size of the resulting decomposition in Lemma 7.

Lemma 7.
The size of the in-path oracle for p is O( 1 2 n) since it is a Well-Separated Pair (WSP) decomposition of the road network.
Proof. The size of the in-path oracle uses the same arguments as in [29]. Since there is a bit of background information that connects road network oracles (e.g., distance, path, and now in-path) to WSP, the interested reader is referred to [29,30,45,46,73] for the necessary background information. Here we will only provide a rough sketch of how the bounds are established.
The first intuition is that the in-path oracle decomposition in Algorithm 3 is a WSP decomposition of the nodes of the road network. This can be seen from Lemmas 4 and 5 that the block-pair under consideration is in-path or not depending on the relative magnitude of the radius of the blocks to the distance separating the blocks. For instance, if the blocks are sufficiently "far" and their radiuses are small, the block may (or may not) satisfy the in-path condition. It is important to note that the WSP decomposition has an identical condition of determining if a block-pair is sufficiently far. Once the similarity has been established, we know that the size of the in-path oracle decomposition is O(s d n), where s is the separation factor [33] and d is the dimensionality of the embedding space which is 2 for the case of road networks. In our case, s is dependent on as we discuss next. Now, we can observe that, as increases (i.e., relaxed), more blocks easily satisfy the in-path oracle condition. Similarly, as decreases (i.e., made more stringent), more block-pairs can be marked off as not being in-path. So, there is a sweet-spot for that controls the size of the in-path oracle; setting it high or low results in a smaller in-path oracle. Since we are only interested in the worst-case size behaviour here, we can note that the size of the in-path oracle is inversely proportional to the . This can be intuitively observed from the fact that a smaller requires that the radiuses are much smaller than the distance between the blocks. In other words, the separation factor s is proportional to 1 2 where the factor 2 denotes the dimensionality of the embedding space.

Storing the In-Path Oracle
From Lemma 7 we know that a single POI results in an in-path oracle of size O( 1 2 n). For each entry, we store a block-pair (A, B) corresponding to the quadtree blocks as a four-dimensional Morton code [72,[74][75][76]. In this scheme, a pair of Morton codes of the same level can be bit-interleaved to form a 64-bit number with the level of the block stored in the most-significant bit position of the code. Here 64 bits is sufficient to represent a fairly small area of less than 100 meters on the surface of the road network. This variation is also linear quadtrees [72] and forms the basis of storing multidimensional quadtrees on a database system. In the following we provide the size and access time results for our in-path oracle. Proof. Given m POIs, we compute the in-path oracle for each POI. We store the identity of the POI along with each block-pair. The table containing these blocks is encoded as a four-dimensional Morton code, sorted and indexed by a B-tree [73,[77][78][79][80][81]. The size of this representation is O( 1 2 nm). Given a source s and destination t, we can first compute its four-dimensional Morton code, i.e., Z 4 (s, t) in O(1). Next, we can use that key to look for matching blocks using the B-tree index. The time complexity of this operation is O(log n + F), where F is the number of POIs that match the results. This leads us to the storage and access complexity of the in-path oracles.

Querying the In-Path Oracle
To illustrate how one can use the in-path oracle effectively, we consider a relational database [37,73,82] containing a few sample tables using which we can create a few scenarios. The oracle is stored as a table ORACLE (Z AB , poi_id). Here Z AB represents the blockpair that is known to be in-path with respect to poi_id. Here the attribute poi_id is a foreign key to another table POI (poi_id, Type, Name, ...) containing additional dimensional information about the POIs. Finally, there is a very large table of ACTIVE_TRIPS(trip_id, Z4_st) where the starting and ending locations are encoded Z4_st as 4-dimensional Morton codes. Note that the ORACLE table is indexed by (Z AB , poi_id) which means that one can binary search for a particular Z 4 pair in O(log n + log m) time, where n is the number of nodes and m is the number of POIs. Once we have the following setup, we can apply some interesting queries to the in-path oracles.
Find all POIs that are in-path to a given source s and destination t: Here we are given a source s and a destination t and we want to find all the POIs that are in-path to the given source and destination. In this case, the query is a lookup on the ORACLE table using the primary index to fetch the in-path poi_id and joining it with the POI table to provide the dimensional fields of the POIs. This query is expected to be very quick since it is a simple lookup using a primary index. One can do millions of such queries per second using a single machine hosting the database which means that one can trivially scale up this setup for use in the critical path of a popular navigation app to add an in-path layer during navigation.
SELECT poi_id, Type, Name FROM POI JOIN ORACLE using (poi_id) WHERE ORACLE.Z AB = Z4_st; In this setup, one can add new POIs by adding a row to the POI  As one can see, a fairly complex query can be reduced to a few lines of SQL using the in-path oracle which speaks to the real power of this abstraction.

Experimental Results
The experiments were conducted on a GPU consisting of an AMD Ryzen Threadripper 3960 CPU, which has multi-processing power. It has 48 threads and 24 cores, having 2.2 GHz clock speed, 128 GB RAM, 4 TB HDD, and 512 KB cache memory. The implementation of proposed algorithms is done using Python programming language, version 3.10, running in 64-bit Ubuntu Linux operating system, version 20.04.3, Kernel version 5.4.0-125.
Dataset: The road network dataset was obtained from the 9th DIMACS Implementation Challenge website [83]. We used two datasets in our evaluation, Washington, D.C. road network with 9559 nodes and 12,304 edges and San Francisco dataset which included 321,270 nodes and 800,172 edges. Each directed edge's weight represents the road network travel distances between two nodes. The spatial position of each node was also available which was normalised into a [0, 1) grid box.
Comparison Techniques: Our primary evaluation consists of dual Dijkstra baseline algorithm and comparing it against the proposed in-path oracle. In particular, we choose a source and destination node at random and ask both dual Dijkstra algorithm and in-path oracle to determine which of the POIs are in-path to the shortest path. The parameters that we change to examine how the algorithms perform under diverse settings are listed in Table 1. Each data point was repeated 100 times by randomly chosen sources and destinations for each time and then averaged across all execution runs. The POIs used in the experiments were randomly chosen by uniformly sampling the nodes in the road network. The sampling rate of the POI refers to the rate of choosing POIs among the nodes of the road network. For instance, 0.001 means that we roughly choose 1 node from a set of 1000 nodes. Recall that refers to the detour tolerance limit which is varied as a fraction over the shortest distance between the source and destination. We vary the epsilon value from 0.1 (i.e., 10%) to 5 (500%). Methodology: We collected the following metrics for each experiment: (1) the elapsed time spent waiting for the program to deliver its outputs (measured in seconds), and (2) the throughput of the algorithm using a single machine. For the dual Dijkstra approach, we additionally report the (3) largest size of the priority queue and (4) the number of POIs added to the priority queue as a way of quantifying how much of the road network had to be traversed by the algorithm. Note that among these experiments, the throughput approach provides the best indicator of the relative performance of the experiments. At the end of the day when these algorithms are deployed in production, the key question is how much machine resources are needed to support the incoming workload. Here, as one can see later, the in-path oracle is a much superior solution compared to the dual Dijkstra baseline approach.

Baseline Approach
In this section, we examine the performance of a general in-path query variant where the driver is travelling between a source and a destination. The experimental results are on the dual Dijkstra algorithm that we developed earlier, which serves as a credible variant for the in-path oracle approach.

Varying Detour Limit
The plot shown in Figure 3a demonstrates that the running time increases with the detour tolerance. It can be seen from the graphs that, as we increase the detour tolerance, the running time also increases. Similarly, the size of the priority queue increases when the detour tolerance limit increases, shown in Figure 3b. Finally, Figure 3c shows that the number of POIs added to the priority queue increases when the detour tolerance limit increases. Note that this result again underscores the observation from earlier that the algorithm is sensitive to detour tolerance. The larger the detour tolerance we allow, the larger is the search space and consequently slower is the query response time.

Varying POIs Sampling Rates
We vary the POIs sampling rate from 0.0001 to 0.1 while keeping the detour tolerance limit is fixed at 0.01. As mentioned earlier, we conducted each experiment 100 times, averaging the results. We track latency, the maximum size of the priority queue, and the number of POIs added to the priority queue during the course of the search process. The plot shown in Figure 4a shows that the running time increases with the POIs sampling rate. It can be seen from the graphs that as we increase the POIs sampling rate, running time also increases. Similarly, the priority queue size increases when the POIs sampling rate increases shown in Figure 4b. Finally, Figure 4c shows that the number of POIs added to the priority queue increases when the POIs sampling rate increases.  The key insight here is that as the detour tolerance limit increases, the search space of the algorithm increases and consequently the query latency. Although the sensitivity of the algorithm to the POI sampling rate does not seem very high. This is expected since the query is expected to return all the POIs found, so, there is no opportunity to stop the search process when a few POIs have been found.

In-Path Oracle
In this section, we examine the performance of oracle size with varying parameter values like detour limits and road network size for a fixed POI.

Varying Road Network Size
The road network used in the experiment was the San Francisco Bay area dataset, consisting of 321,270 road nodes and 800,172 road edges. For different sizes (number of nodes) of the road network, we recorded the size of the oracle. We considered the size of the road network between 1000 and 100,000 and kept the detour tolerance limit fixed at 0.25. Figure 5 illustrates the different sizes of oracles with varying numbers of nodes. One can see that the size of the oracle is linear in the number of nodes which confirms the earlier linear complexity result. This means that, for a road network of 100,000 nodes, the size of the in-path oracle is about 8 MB which is quite modest by today's computing standards. Note that computing the oracle took less than 30 min for all these cases using a single machine running a single processor thread. One can substantially decrease the query execution by using multiple threads or employing multiple machines. Since the algorithm is data parallel, this denotes a fairly trivial extension to our work.

Varying Detour Limit
For this experiment, we use the Washington D.C. road network dataset and vary the detour limits from 0.05 to 5. Figure 6 illustrates the size of the in-path oracle and one can see that the size of the oracle is bell-shaped. In particular, when is greater than 2 the size of the oracle sharply decreases since at that point many larger block-pairs would start becoming in-path thus there is no need to subdivide these blocks deeper. For instance, at = 5.0, most of the road network would be in-path, so, the size of the oracle would be quite small. In the case of the baseline approach, the latency increases with but in the case of in-path oracles the size of the oracle decreases with which is a desirable property.

In-Path Query Throughput Experiment
In this set of experiments, the throughput of the in-path queries is measured by the varying sampling rates and compared against baseline dual Dijkstra's algorithm which was provided in Section 4. The experiment used 100,000 nodes belonging to the San Francisco dataset. Experiments were done using a single machine with 20 parallel threads corresponding to the number of cores in the machines.
The POIs were randomly chosen and the corresponding in-path oracles were computed. They were inserted into a disk-resident B-tree. The experiments randomly generated source and destination node pairs and looked up the disk-based in-path oracle to find which of the POIs (if any) were in-path to the chosen node pairs. Internally, these queries looked up a disk-resident B-tree with a fixed 3 GB cache. The results shown here correspond to the warm-cache case where the experiments have been running for some period of time. At this point, the in-memory buffers have cached pages of the B-tree in memory. Running the experiment on a cold cache results in high variation in the throughput results, but this case is not interesting from a production standpoint. In our use-case, it was envisioned that the in-path oracle service would be running continuously, and hence the cache is seldom cold.
The number of POIs is chosen similarly to the previous experiments and measured in terms of the sampling rate. The average number of blocks per in-path oracles was about 1 M, which is roughly about 8 MB in memory. Note that the disk size of these oracles are considerably smaller since B-trees encode and compress the data and the resulting representation is an order of magnitude smaller on disk.
We can observe many things from this experimental setup. First of all, the throughput from the dual Dijkstra approach was about 100-200 in-path queries/second for most of the POI sampling rates. In this case, we ran 20 parallel versions of the setup in Figure 3. In this case, there is no opportunity for work-sharing among the threads, so, we just run them in separate threads that do not share anything other than the graph data structure. In the case of the in-path oracle experiments, they share the cache and the underlying B-trees. Note that the dual Dijkstra baseline algorithm searches the graph representation at runtime while in-path oracle looks up the precomputed representation, so, is expected to be much faster.
In particular, Figure 7 shows the throughput results as a function of the POI sampling rate. One can see that we get more than one million in-path queries per second. In other words, we can answer queries four orders of magnitude faster than the existing baseline algorithm. This is a very powerful result that shows that we can use just a few machines to answer in-path queries for very large map services. This means that our in-path oracle technique can also be used in extensive simulations where in-path considerations may be important. This exceptional improvement that we report in this paper is due to the sensible use of precomputation in solving fundamental and basic operations in road networks.

Conclusions
We aimed to solve a seemingly simple problem of which POI is in-path to a given source and destination. We demonstrated a new technique that is simple yet effective. We proposed an in-path oracle that is able to answer in-path queries without having to compute the shortest path between the source and destination. Our experimental results showed that one can perform millions of such queries per second on a road network, which are many orders of magnitude faster than an existing strawman approach. Furthermore, our approach can be embedded in a database system, and thus can be used in conjunction with complex queries. A drawback with any oracle technique is that major changes to road networks such as road closures may affect the precomputed results. However, it is possible to incorporate such dynamic changes by determining which block pairs in the in-path oracle have been invalidated by such road network updates. Future work will look at the dynamism of road networks and the applicability of in-path oracles for such use-cases.