Efficient Processing of All Nearest Neighbor Queries in Dynamic Road Networks

Abstract: The growing adoption of GPS-enabled smartphones has led to widespread use of location-based service applications. In the past few years, a significant number of studies have addressed the processing of all nearest neighbor (ANN) queries. An ANN query on a road network returns the closest data object for every query object. Most existing studies on ANN queries consider either Euclidean space or static road networks. Moreover, combining the nearest neighbor query with a join operation is expensive because it requires computing the distance between each pair of query objects and data objects. This study considers the problem of processing ANN queries on a dynamic road network where the edge weights, i.e., the traveling distance and time, vary with traffic conditions. To address this problem, a shared execution-based approach called standard clustered loop (SCL) is proposed that allows efficient processing of ANN queries on a dynamic road network. The key concept behind the shared execution technique is to exploit the coherence property of road networks by clustering objects that share common paths and processing the cluster as a single path. In an empirical study, the SCL method achieves significantly better performance than competitive methods and efficiently reduces the computational cost of processing ANN queries in various problem settings.


Introduction
Over the years, mobile technologies and location-based services (LBSs) have rapidly become popular, affording their users easy access to a variety of LBSs that aim to provide value-added experiences. At the same time, GPS-enabled devices, such as smartphones, wireless sensor networks (WSN), and navigation systems, generate a massive number of spatial query requests. The most common instances of spatial queries [1] associated with popular LBSs include shortest path queries, range queries [2,3], k-nearest neighbor (k-NN) queries [4,5], reverse k-NN queries [6], and preference queries [7][8][9]. To address the growing demand for such services, a significant amount of research has been conducted over the past few years to monitor and improve the processing of spatial queries. All these spatial queries extract data objects based on their distance from the query point.
The ANN query retrieves, for every query object, the data object at the least distance from it. ANN can be regarded as a variation of k-NN in which k is always equal to one for each query object in the entire query dataset. This paper explores ANN queries on a dynamic road network because it serves as a close approximation to real-world scenarios. Due to frequent traffic updates, the weights, i.e., traveling time and distance, change accordingly. ANN queries can be used in real-life applications such as "Find the nearest gas station for every car-parking lot" and, in the case of ride-sharing, "Find the nearest taxi for each matched customer". In a ride-sharing scenario, a car is shared by one or more passengers. Let us assume that groups of passengers are requesting the nearest vacant taxi cab via their smartphones. The groups of passengers and the taxi cabs are denoted as query objects and data objects, respectively. The scheduling system assigns to each group of passengers the taxi cab with the smallest traveling distance. The consequent demand for real-time reporting in ride-sharing can be met through improvements in ANN query processing. The aforementioned examples use snapshot queries because a large number of LBSs support only snapshot query processing [10][11][12][13][14][15], rather than continuous monitoring.
A naive approach to processing ANN queries is to scan the whole dataset, computing the distance between each query object and each data object. For large datasets, such a naive solution is not feasible because of its high computational complexity. Until now, most of the research on ANN queries has been done in Euclidean and metric space [16][17][18]. These approaches focus either on an indexing scheme whose pruning avoids scanning the entire dataset or on precomputation of shortest path distances. They are not suitable for dynamic road networks because materialized structures cannot dynamically recompute the shortest path between each pair of nodes, and they are limited to the computation of Euclidean distances. For example, consider a scenario with a huge number of query objects requesting their respective nearest data objects within the same time frame $t_1$. Indexing such a large number of query objects with regard to their locations and distances to evaluate the set of nearest neighbors (NNs) might be suitable for a one-time evaluation. Whenever a location update occurs at a later time frame $t_{i+1}$, the server would be required to rebuild the whole index from scratch, which would result in a huge computational burden. However, using a snapshot of the query locations at each time frame $t_{i+1}$ and clustering them would reduce redundant network traversal regardless of the movement of the query objects. Moreover, if a query object moves away from its current query cluster, re-clustering takes relatively less time than re-indexing, which makes the approach feasible for dynamic road environments.
Unfortunately, indexing methods cannot support answering ANN queries in dynamic road networks because of the computation overhead. Therefore, this study proposes an efficient algorithm to answer ANN queries on dynamic road networks by implementing the shared execution technique. The fundamental idea of the shared-execution technique is to batch the queries and execute them as a single query in order to achieve efficient load handling [19,20]. Standard clustered loop (SCL), based on the shared-execution technique, follows a similar rule, according to which groups of similar objects are clustered together. The working sets of SCL are constructed in two steps. In the first step, objects belonging to similar categories are clustered and arranged into a sequence. In the second step, a maximum of two NN queries are evaluated at the ends of each query object cluster and the results are assigned to all query objects belonging to that cluster. SCL exhibits low computational demands because it avoids the evaluation of redundant NN queries by employing shared-execution processing. SCL does not implement any materialization or precomputing scheme to process NN queries, and any existing nearest neighbor algorithm can be incorporated to process the ANN query. The primary contributions of this study can be outlined as follows:
• The shared-execution based SCL method is proposed for the efficient processing of ANN queries on dynamic road networks. Moreover, to the best of our knowledge, this is the first attempt to evaluate ANN queries on dynamic road networks.
• The proposed SCL method is simple and easy to implement. Only an optimized number of NN queries are evaluated by implementing clustering and shared execution.
• Extensive experiments with various settings are performed to measure the superiority of the proposed scheme for particular scenarios.
The rest of the paper is organized as follows. Related works have been discussed in Section 2. An overview of the associated concepts is presented in Section 3. The SCL algorithm is outlined in Sections 4 and 5. The time complexity of the proposed algorithm is discussed in Section 5.3. The experiments and evaluation of the proposed method are exhibited in Section 6 and further discussion in Section 7. Finally, the paper is concluded with future research direction in Section 8.

Dynamic Road Network
The processing of spatial queries on road networks has attracted many researchers, and the known developments can be chronicled as follows. Papadias et al. [21] introduced Incremental Euclidean Restriction (IER) and Incremental Network Expansion (INE), both of which apply multi-step kNN capable of retrieving high-dimension similarities. IER is an early kNN technique that inherits its characteristics from the A* search algorithm [17]. It operates on the assumption that the network distance between two distinct objects cannot be less than the Euclidean distance between them. INE, on the other hand, performs a spatial search by expanding the search region from a query object. The first data object discovered during the expansion is the data object closest to the query object. Samet et al. [22] proposed the SILC framework to mitigate the storage cost incurred during the decoupling of the objects from a large spatial network. SILC precomputes the shortest path between all possible pairs of nodes in a best-first search manner and uses quad-trees to store the identified shortest paths. Lee et al. [23] suggested route overlay and association directory (ROAD), a framework for searching spatial objects on a road network that cleanly separates the road network from the objects. It is a solution-based approach that uses some precomputed results and distance ranges instead of precise distances that consume redundant storage. Hence, ROAD aims to solve the problem of high precomputation and storage overhead by using search space pruning to improve the efficiency of the framework. Precomputation is an expensive operation because the result sets must be stored. To overcome this issue, Zhong et al. [24] proposed an efficient indexing scheme that recursively partitions the road network into equal sub-networks. The assembly-based method computes the shortest path distance for the single-pair shortest path (SPSP) query.
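The network expansion idea behind INE can be illustrated with a minimal sketch. It is not the implementation of [21]; it assumes that data objects sit exactly on nodes and that the graph is given as a dictionary of neighbor weights, and the names `ine_nearest` and `toy_graph` are hypothetical.

```python
import heapq

def ine_nearest(graph, source, data_nodes):
    """Expand the network outward from `source` in Dijkstra order and stop
    at the first settled node that hosts a data object."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u in data_nodes:
            return u, d  # first data object reached is the nearest one
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None, float("inf")  # no data object is reachable

# Hypothetical toy network: node -> {neighbor: edge weight}.
toy_graph = {
    "a": {"b": 2.0, "c": 5.0},
    "b": {"a": 2.0, "c": 1.0, "d": 7.0},
    "c": {"a": 5.0, "b": 1.0, "d": 3.0},
    "d": {"b": 7.0, "c": 3.0},
}
nearest = ine_nearest(toy_graph, "a", {"d"})
```

Because the expansion settles nodes in non-decreasing distance order, the search can terminate as soon as a data object is settled, without visiting the rest of the network.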
A few studies exist to process an NN query on a dynamic road environment [25,26]. In these studies, the authors employed precomputed grid structures that utilized in-memory data structures. However, this is inappropriate for large networks. A safe-region technique was proposed to efficiently compute the nearest neighbor queries for moving query objects [9,27]. Techniques involving safe-region computation cannot be used to process ANN queries on dynamic road networks considering the difference in problem definition. Heuristic methods were developed to compute the shortest path in dynamic road networks [28][29][30]. These methods depend on a lightweight indexing technique for route planning in dynamic road networks. However, these methods cannot be applied directly to process ANN queries on dynamic road networks considering the differences in the motivation of studies.

All Nearest Neighbor Queries
Clarkson [16] proposed three different algorithms to solve the problem of ANN computation in Euclidean space. Both the query and data objects were assumed to be of the same type, i.e., monochromatic data. All the data points were enclosed in small cubic cells of identical size. Then, the distance bounded by the nearest data cell represented the nearest neighbor of the query object. Zhang et al. [17] proposed a two-phase hash-based algorithm that loads pairs of data and query objects and divides them into buckets of equal size. Identical or overlapping buckets can then be used to identify the nearest neighbor of each query object. To date, there is only one study of the problem of processing ANN queries in road networks [18]. This study introduced virtual vertex traversal (VIVET), an index-based algorithm that performs a single traversal from a virtual node to all other existing nodes and stores their shortest-path distances in a precomputed array. Subsequently, a simple lookup in the array finds the nearest neighbor of any query object. However, the creation of an index is memory-intensive. No precomputation or indexing scheme can solve the problem of finding the ANN on dynamic road networks without incurring high computational costs. The proposed scheme differs from the existing studies in various aspects. First, the proposed method takes into account the dynamic nature of the road network, which cannot be addressed by using precomputation. Second, the implementation of shared execution enables efficient query processing. Lastly, to the best of our knowledge, this is the first attempt to evaluate ANN queries on dynamic road networks.

Preliminaries
This section formally defines the basic concepts used in this paper along with their characteristics. First, it establishes the fundamentals of dynamic road networks. Second, it provides insight into the classification of nodes, which is the basis for understanding shared execution. Lastly, the conventional definition of an ANN query is given.

Dynamic Road Network
The world is composed of various complex structures, such as biological networks, communication networks, power-grid networks, and road networks [31,32], that can be represented and analyzed using a graph in which the nodes and edges represent the entities and the relationships among them [33]. In this study, the road network is depicted as an undirected graph $G = \langle N, E, W \rangle$, where $N$, $E$, and $W$ denote the sets of nodes, edges, and weights, respectively. In general, each edge has a non-negative weight that represents its distance. A dynamic road network can be understood as one in which the positions of the moving objects and the edge weights vary with the traffic and road conditions at any time. Figure 1 depicts an example of a dynamic road network, in which objects $q_1$ to $q_5$ represent query objects (denoted by rectangles), and objects $d_1$ to $d_4$ represent data objects (denoted by triangles). Let us interpret this example in terms of a ride-sharing service, where data objects represent taxi-cabs and query objects represent passengers. The ride-sharing service sends to each passenger the taxi-cab located nearest to that passenger. Further, the travel time should be updated frequently by consulting the real-time traffic conditions. For example, the nearest taxi-cab for $q_5$ at time $t_1$ is $d_4$, as shown in Figure 1a. However, due to heavy traffic congestion in the locality of taxi-cab $d_4$, it becomes incapable of reaching $q_5$ faster than $d_3$. In this study, a dynamic graph is treated as a series of snapshots, where each snapshot is static in itself, while the graph changes from one snapshot to the next.
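The snapshot view can be illustrated with a small sketch of the taxi-cab example. The travel times below are hypothetical and are not read from Figure 1; they are chosen only so that the nearest taxi-cab of $q_5$ changes from $d_4$ to $d_3$ between two snapshots.

```python
def nearest_data_object(travel_time, query):
    """Return the data object with the smallest travel time to `query`
    within one snapshot (a static view of the network)."""
    candidates = travel_time[query]
    return min(candidates, key=candidates.get)

# Hypothetical travel times (in minutes) from q5 to two taxi-cabs.
snapshot_t1 = {"q5": {"d3": 8.0, "d4": 5.0}}
snapshot_t2 = {"q5": {"d3": 8.0, "d4": 12.0}}  # congestion near d4

answer_t1 = nearest_data_object(snapshot_t1, "q5")
answer_t2 = nearest_data_object(snapshot_t2, "q5")
```

Each snapshot is evaluated as a static problem; only the weights differ between snapshots, which is why a result computed for one snapshot cannot simply be reused for the next.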

Classification of Nodes
In this study, nodes are classified into three different categories. Here, the degree of a node refers to the number of adjacent nodes that it is connected with. If the degree of a node is equal to one or two, then the node is called terminal or intermediate node, respectively. If the degree of a node is more than two, it is called an intersection node.
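The classification by degree can be written down directly. The following is a minimal sketch assuming the graph is stored as an adjacency dictionary; the function name `classify_nodes` and the label for isolated nodes (degree zero, which the classification above does not cover) are assumptions made for illustration.

```python
def classify_nodes(adjacency):
    """Label every node by its degree: 1 = terminal, 2 = intermediate,
    more than 2 = intersection."""
    kinds = {}
    for node, neighbours in adjacency.items():
        degree = len(neighbours)
        if degree == 1:
            kinds[node] = "terminal"
        elif degree == 2:
            kinds[node] = "intermediate"
        elif degree > 2:
            kinds[node] = "intersection"
        else:
            kinds[node] = "isolated"  # degree 0, not covered by the text
    return kinds

# Hypothetical network: a path n1-n2-n3 with two extra branches at n3.
adjacency = {
    "n1": {"n2"},
    "n2": {"n1", "n3"},
    "n3": {"n2", "n4", "n5"},
    "n4": {"n3"},
    "n5": {"n3"},
}
kinds = classify_nodes(adjacency)
```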

Node Sequence and Segments
A node sequence $n_l n_{l+1} \ldots n_m$ represents a path between two nodes $n_l$ and $n_m$ in a road network, such that $n_l$ and $n_m$ are either intersection nodes or terminal nodes, and $n_{l+1}, \ldots, n_{m-1}$ are intermediate nodes. The node sequence from $n_l$ to $n_m$ is also called a node segment. The shortest path distance is the distance between a data object and a query object, whereas the length denotes the distance along the path from a query object to a data object in the same node sequence. Table 1 summarizes the notations used in this paper. For simplification of notation, $\overline{q_i q_j}$ is used to indicate the query object cluster, i.e., $q_i q_{i+1} \ldots q_j$, where $q_i, q_{i+1}, \ldots, q_j$ are the query objects lying in the same node segment.

Table 1. Notation and their meaning.

| Notation | Meaning |
| --- | --- |
| $n_l n_{l+1} \ldots n_m$ | Node sequence in $\overline{N}$, where $n_l$ and $n_m$ are either intersection nodes or terminal nodes |
| $q_i q_{i+1} \ldots q_j$ | Query sequence in $\overline{Q}$ generated from query objects $q_i, q_{i+1}, \ldots, q_j$ in the same node sequence |
| $D(\overline{q_i q_j})$ | Set of data objects located in query object cluster $\overline{q_i q_j}$ |
| $D_q$ | Set of data objects closest to the query object $q$ |
| $dist(q, d)$ | Shortest path distance between the data object and the query object |
| $len(q, d)$ | Distance of the segment connecting $q$ and $d$ such that they lie in the same segment |

All Nearest Neighbor Query
The ANN query in a road network returns, for every query object, the pair formed with its closest data object. Definition 1. Given two different object sets $Q$ and $D$, where $Q = \{q_1, q_2, \ldots, q_n\}$ and $D = \{d_1, d_2, \ldots, d_m\}$, the ANN query returns the set of pairs $\langle q, d \rangle \in Q \times D$ such that $d$ is the nearest neighbor of $q$.
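Definition 1 can be made concrete with the naive baseline that runs one full shortest-path search per query object. This is a sketch for reference only, not the proposed method; it assumes that objects are located on nodes, and the names `shortest_distances`, `naive_ann`, and `path_graph` are hypothetical.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra's algorithm: network distance from `source` to every
    reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def naive_ann(graph, query_nodes, data_nodes):
    """Pair every query object with its nearest data object by running
    one full search per query object."""
    result = {}
    for q in query_nodes:
        dist = shortest_distances(graph, q)
        result[q] = min(data_nodes, key=lambda d: dist.get(d, float("inf")))
    return result

# Hypothetical path network a-b-c-d-e with edge weights 1, 2, 2, 1.
path_graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"b": 2.0, "d": 2.0},
    "d": {"c": 2.0, "e": 1.0},
    "e": {"d": 1.0},
}
pairs = naive_ann(path_graph, ["b", "d"], ["a", "e"])
```

The cost grows with the number of query objects because every query object triggers its own traversal, which is the redundancy that shared execution aims to remove.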

Methodology
An overview of the proposed query processing methodology is depicted in Figure 2. Initially, the server captures a snapshot and creates a query processing module, which includes the query locations and the $n$ query requests generated at a time $t_1$ (Step 1 in Figure 2). Following this, the query processing module begins to search for the nearest data object (Step 2 in Figure 2). During the search, the SCL method first traverses the network and clusters all nodes that are in the same node segment, i.e., the intermediate nodes. The node clustering process takes place only once and is reused for subsequent snapshots (Step 3 in Figure 2). In addition, after each node cluster is scanned, the query objects located in the same node cluster are grouped together to form a cluster of query objects (Step 4 in Figure 2). Next, the NN query is evaluated at the boundaries of each query object cluster. For every query object cluster whose size is three or more, the generated NN results are used to compare the distances from the inner query objects to the boundary query objects. This process eliminates the necessity to perform a traversal for inner query objects. The results of the boundary query objects are assigned to the nearest inner query objects (Step 5 in Figure 2). Finally, the result set that contains the nearest data object for each query object is returned to the corresponding query users (Step 6 in Figure 2). At the next time frame $t_{i+1}$, the process is repeated to update the result set and return fresh query responses.

[Figure 2. Overview of the proposed query processing methodology. Labels: nearest neighbor query requests; traversing road network; result set (nearest data object).]

Clustering Algorithm
Clustering is a process that groups data points belonging to the same category. Two distinct types of clustering are used: node clustering and query object clustering. Figure 3a represents node clusters (denoted by different lines) in a road network. After clustering, the following node sequences are formed: $n_1 n_2 n_3$, $n_1 n_4 n_3$, and $n_1 n_3$.
Figure 3b represents the clustering process of query objects in the same node sequence. There are six query objects $q_1$ to $q_6$ and four data objects $d_1$ to $d_4$. Given the two object sets $Q = \{q_1, q_2, q_3, q_4, q_5, q_6\}$ and $D = \{d_1, d_2, d_3, d_4\}$, the query objects $q_1$, $q_5$, and $q_4$ in node sequence $n_1 n_2 n_3$ are clustered together into $\overline{q_1 q_5 q_4}$, the query objects $q_2$ and $q_3$ in the node sequence $n_1 n_4 n_3$ are clustered into $\overline{q_2 q_3}$, and the single query object $q_6$ in $n_1 n_3$ is clustered into $\overline{q_6}$. The bold lines represent the query object clusters. The set of query objects $Q = \{q_1, q_2, q_3, q_4, q_5, q_6\}$ is now converted into $\overline{Q} = \{\overline{q_1 q_5 q_4}, \overline{q_2 q_3}, \overline{q_6}\}$, where $\overline{Q}$ indicates the set of query object clusters.
Algorithm 1 outlines the process for clustering the query objects into query object clusters. The clustering process incorporates two steps: generation of the node sequences, followed by generation of the query object clusters. Line 6 checks whether the node is an intersection node or a terminal node. If it is either type, the paths from the node through its adjacent nodes are explored across intermediate nodes until another intersection or terminal node is reached, and the resulting node sequence is added to $\overline{N}$. In Lines 15-18, the algorithm looks for the query objects in each node sequence. The discovered objects are grouped together into a query object cluster, which is added to $\overline{Q}$ in Line 17. Line 19 returns the final set of query object clusters.

Algorithm 1
 4  Step 1: $\overline{N}$ is originated from $N$ and $E$
 5  foreach node $n_l \in N$ do
 6      if $n_l$ is an intersection node || $n_l$ is a terminal node then
 7          foreach edge $n_l n_{l+1} \ldots n_m \in E$ adjacent to $n_l$ do
 8              $n_l n_{l+1} \ldots n_m$ ← find_query_sequence($n_l$, $n_{l+1}$, $N$)
 9              $\overline{N}$ ← $\overline{N} \cup \{n_l n_{l+1} \ldots n_m\}$
10          end
11      else
12      end
13  end
14  Step 2: $\overline{Q}$ is originated from $Q$ and $\overline{N}$
15  foreach node sequence $n_l n_{l+1} \ldots n_m \in \overline{N}$ do
16      $q_i q_{i+1} \ldots q_j$ ← find_query_objects_in_node_sequence($n_l n_{l+1} \ldots n_m$)
17      $\overline{Q}$ ← $\overline{Q} \cup \{q_i q_{i+1} \ldots q_j\}$
18  end
19  return $\overline{Q}$
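The two clustering steps can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: it assumes a simple undirected graph stored as a dictionary of neighbor weights, assumes that each query object has already been matched to an edge `(u, v)` with an offset measured from `u`, and uses hypothetical edge weights for the network of Figure 3.

```python
def node_sequences(graph):
    """Step 1: split the graph into node sequences whose two ends are
    terminal or intersection nodes (degree != 2)."""
    def is_endpoint(node):
        return len(graph[node]) != 2

    sequences, seen = [], set()
    for start in graph:
        if not is_endpoint(start):
            continue
        for nxt in graph[start]:
            if frozenset((start, nxt)) in seen:
                continue  # this sequence was walked from its other end
            seq, prev, cur = [start], start, nxt
            seen.add(frozenset((prev, cur)))
            while not is_endpoint(cur):
                seq.append(cur)
                a, b = graph[cur]  # an intermediate node has two neighbours
                prev, cur = cur, (b if a == prev else a)
                seen.add(frozenset((prev, cur)))
            seq.append(cur)
            sequences.append(tuple(seq))
    return sequences

def cluster_query_objects(graph, sequences, queries):
    """Step 2: group query objects by node sequence, ordered along it.
    queries: name -> (u, v, offset from u along edge (u, v))."""
    edge_pos = {}
    for idx, seq in enumerate(sequences):
        start = 0.0
        for u, v in zip(seq, seq[1:]):
            edge_pos[(u, v)] = (idx, start, +1)
            edge_pos[(v, u)] = (idx, start + graph[u][v], -1)
            start += graph[u][v]
    members = {}
    for name, (u, v, offset) in queries.items():
        idx, base, sign = edge_pos[(u, v)]
        members.setdefault(idx, []).append((base + sign * offset, name))
    return {sequences[i]: [n for _, n in sorted(m)] for i, m in members.items()}

# Network shaped like Figure 3a; the edge weights are hypothetical.
graph = {
    "n1": {"n2": 4.0, "n4": 3.0, "n3": 5.0},
    "n2": {"n1": 4.0, "n3": 4.0},
    "n3": {"n2": 4.0, "n4": 3.0, "n1": 5.0},
    "n4": {"n1": 3.0, "n3": 3.0},
}
queries = {
    "q1": ("n1", "n2", 1.0), "q5": ("n1", "n2", 3.0), "q4": ("n2", "n3", 2.0),
    "q2": ("n1", "n4", 1.0), "q3": ("n3", "n4", 1.0), "q6": ("n1", "n3", 2.0),
}
sequences = node_sequences(graph)
clusters = cluster_query_objects(graph, sequences, queries)
```

With these inputs, the three node sequences and the clusters $q_1 q_5 q_4$, $q_2 q_3$, and $q_6$ of the running example are obtained. A network that is a single cycle without any terminal or intersection node is not handled by this sketch.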

Overview of SCL
Cho [34] suggests that evaluating NN queries at the two ends of the query object cluster is adequate to retrieve the nearest data object for every query object. According to Lemma 1, after evaluating at most two NN queries for $q_i$ and $q_j$, located at the boundaries of the query object cluster $q_i q_{i+1} \ldots q_j$, evaluating the NN query for the other query objects located inside $q_i q_{i+1} \ldots q_j$ is unnecessary. This is because their nearest data objects can be retrieved just by comparing the distances from the inner query objects $q_{i+1} \ldots q_{j-1}$ to $q_i$ and $q_j$ and assigning the answer data object of the closest boundary query object. This process eliminates the necessity to perform a traversal for inner query objects. Lemma 1. For every query object $q \in \overline{q_i q_j}$, there exists $D_q$, which is a subset of $D_{q_i} \cup D_{q_j} \cup D(\overline{q_i q_j})$, where $D_{q_i}$ ($D_{q_j}$) refers to the set of data objects nearest to the query object $q_i$ ($q_j$), and $D(\overline{q_i q_j})$ refers to the set of data objects lying in the query object cluster $\overline{q_i q_j}$.
Proof of Lemma 1. The correctness of Lemma 1 is proved by contradiction. Assume that $D_q \subseteq D_{q_i} \cup D_{q_j} \cup D(\overline{q_i q_j})$ is false, i.e., that there is a data object $d \in D_q$ such that $d \notin D_{q_i} \cup D_{q_j} \cup D(\overline{q_i q_j})$. Because $d \notin D_{q_i}$, $d$ is farther from $q_i$ than its nearest data object $d_{q_i}$, i.e., $dist(q_i, d_{q_i}) < dist(q_i, d)$. On the other hand, $d \notin D_{q_j}$, hence $d$ is farther from $q_j$ than its nearest data object $d_{q_j}$, i.e., $dist(q_j, d_{q_j}) < dist(q_j, d)$. Furthermore, $d$ does not belong to $D(\overline{q_i q_j})$, i.e., $d \notin D(\overline{q_i q_j})$. Hence, the shortest path connecting $q$ to $d$ must travel through either $q_i$ or $q_j$, and the distance from $q$ to $d$ is $dist(q, d) = \min\{len(q, q_i) + dist(q_i, d),\ len(q, q_j) + dist(q_j, d)\}$. Based on the aforementioned cases, $dist(q, d) > \min\{len(q, q_i) + dist(q_i, d_{q_i}),\ len(q, q_j) + dist(q_j, d_{q_j})\}$, which contradicts the assumption that there is a data object $d \in D_q$ such that $D_q \not\subseteq D_{q_i} \cup D_{q_j} \cup D(\overline{q_i q_j})$.
It is required to find the distance from the query object $q \in \overline{q_i q_j}$ to the nearest data object $d$. In Figure 4, the XY coordinates represent the length and distance from the query points to the data objects. The X coordinate refers to $len(q_i, q_j)$, whereas the Y coordinate refers to the distance from a query point to a data object, $dist(q, d)$. When there exists a path from $q$ to $d$, the distance is computed according to the cases below, and from the array holding the computed distances, only the minimum distance is extracted. Here, len and dist represent length and distance, respectively. (a) If $d \in D_{q_i}$, then $dist(q, d) = len(q, q_i) + dist(q_i, d)$; (b) if $d \in D_{q_j}$, then $dist(q, d) = len(q, q_j) + dist(q_j, d)$; (c) if $d \in \overline{q_i q_j}$, then $dist(q, d) = len(q, d)$. Table 2 shows the conditions to compute the distance from $q$ to $d$. A data object can belong to $D_{q_i}$, $D_{q_j}$, or $D(\overline{q_i q_j})$. Retrieving the closest data object from $D_{q_i}$ ($D_{q_j}$) is more significant than retrieving the set of data objects from $D(\overline{q_i q_j})$. The process of identifying the nearest-neighbor pairs is described in Algorithm 2. The shared execution process can be implemented to improve the execution time. First, $\Omega$, i.e., the set of object pairs, is initialized to the empty set. The algorithm comprises two steps. In the first step, the adjacent query objects are clustered together into query object clusters, and $Q$ is transformed to $\overline{Q}$. The second step involves the evaluation of the nearest neighbor query at the query object cluster $\overline{q_i q_j}$ to retrieve the nearest data object $d$.

Table 2. Evaluation of $dist(q, d)$ for $q \in \overline{q_i q_j}$ and $d \in D_{q_i} \cup D_{q_j} \cup D(\overline{q_i q_j})$.

| Condition | $dist(q, d)$ |
| --- | --- |
| $d \in D_{q_i}$ | $len(q, q_i) + dist(q_i, d)$ |
| $d \in D_{q_j}$ | $len(q, q_j) + dist(q_j, d)$ |
| $d \in D(\overline{q_i q_j})$ | $len(q, d)$ |
Algorithm 2
 4  Step 1: Adjacent query objects are clustered to form a segment
 5  $\overline{Q}$ ← Cluster_Query_Objects(q, n, e) ; /* from Algorithm 1 */
 6  Step 2: Nearest neighbor query is performed for each query object cluster $\overline{q_i q_j} \in \overline{Q}$
 7  foreach query object cluster $\overline{q_i q_j} \in \overline{Q}$ do
 8      if $|\overline{q_i q_j}| = 1$ then

The algorithm involves three different cases according to the number of query objects located in a query object cluster. Case-I: $|\overline{q_i q_j}| = 1$; Case-II: $|\overline{q_i q_j}| = 2$; and Case-III: $|\overline{q_i q_j}| \geq 3$. In Case-I, the NN query NN_Query($q_i$) is evaluated at $q_i$, because in this case the query object cluster $\overline{q_i q_j}$ consists solely of $q_i$. Following this, the NN query result is added to the partial join result $\Omega(q_i)$ in Lines 8-12. In Case-II, NN_Query($q_i$) and NN_Query($q_j$) are evaluated at $q_i$ and $q_j$, respectively. Then, the partial join results $\Omega(q_i)$ and $\Omega(q_j)$ are obtained, and their union $\Omega(q_i) \cup \Omega(q_j)$ is formed in Lines 13-19.
Finally, for Case-III, two NN queries are evaluated, and the search for the nearest data objects is performed at $q_i$ and $q_j$. For each inner query object $q \in q_{i+1} \ldots q_{j-1}$, the set of nearest neighbors of $q$ is extracted in Lines 12-24. According to Lemma 1, a partial join result $\Omega(\overline{q_i q_j})$ can be extracted from $D_{q_i} \cup D_{q_j} \cup D(\overline{q_i q_j})$ by applying the shared execution method. After the entire query object cluster has been processed, the union of the partial join results $\Omega(q_i) \cup \Omega(q_{i+1}) \cup \ldots \cup \Omega(q_j)$ is returned as the join result set in Lines 25-29.
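The three cases can be sketched for a single query object cluster as follows. This is a minimal sketch under stated assumptions, not the authors' Algorithm 2: positions are measured along the node segment, `nn_query` stands for any existing NN algorithm and returns a pair (data object, distance), and the boundary results in the example are hypothetical values.

```python
def scl_cluster(cluster, inner_data, nn_query):
    """Answer all query objects of one cluster with at most two NN queries.

    cluster:    list of (query name, position), sorted along the segment.
    inner_data: list of (data name, position) lying inside the cluster.
    nn_query:   callable, query name -> (data name, network distance).
    """
    (qi, pi), (qj, pj) = cluster[0], cluster[-1]
    result = {qi: nn_query(qi)}          # Case I: single query object
    if len(cluster) == 1:
        return result
    result[qj] = nn_query(qj)            # Case II: both boundary objects
    for q, p in cluster[1:-1]:           # Case III: inner query objects
        candidates = [
            (abs(p - pi) + result[qi][1], result[qi][0]),  # via q_i
            (abs(pj - p) + result[qj][1], result[qj][0]),  # via q_j
        ]
        # data objects inside the cluster are reached along the segment
        candidates += [(abs(p - pd), d) for d, pd in inner_data]
        distance, data = min(candidates)
        result[q] = (data, distance)
    return result

# Hypothetical boundary answers for the cluster q1 q5 q4.
calls = []
boundary_nn = {"q1": ("d1", 1.0), "q4": ("d3", 2.0)}

def nn_query(q):
    calls.append(q)  # record which NN queries are actually evaluated
    return boundary_nn[q]

result = scl_cluster([("q1", 0.0), ("q5", 2.0), ("q4", 6.0)], [], nn_query)
```

In this example only the two boundary objects trigger an NN query; the inner object $q_5$ is answered by comparing the candidate distances of Table 2.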
To bypass evaluating redundant NN queries, a simple heuristic has been adopted, where no NN query is computed at query points close to terminal nodes. For instance, as depicted in Figure 5, the graph consists of an intersection node $n_2$ with an adjacent query cluster $\overline{q_i q_j}$ that ends at a terminal node $n_1$. In this example, the NN query at $q_i$ is unnecessary because it holds that $D_q \subseteq D_{q_j} \cup D(\overline{n_2 q_j})$, and it is sufficient to evaluate the NN query at $q_j$.

Figure 5. Heuristic $D_q \subseteq D_{q_j} \cup D(\overline{n_2 q_j})$ for each query object $q \in \overline{q_i q_j}$.

Evaluation of SCL
In this section, the SCL algorithm is briefly discussed using Figure 3. Let $Q = \{q_1, q_2, q_3, q_4, q_5, q_6\}$ and $D = \{d_1, d_2, d_3, d_4\}$ be the given sets of query and data objects, respectively. The query objects from $Q$ have been clustered into the query object clusters $\overline{q_1 q_5 q_4}$, $\overline{q_2 q_3}$, and $\overline{q_6}$, all of which belong to $\overline{Q}$, as depicted in Figure 3b.
While processing $\overline{q_1 q_5 q_4}$, which contains $q_1$, $q_5$, and $q_4$, two NN queries are evaluated at $q_1$ and $q_4$ and the corresponding nearest data objects are retrieved. After evaluating the NN queries at the two ends of the query object cluster, $D_{q_1} = \{d_1\}$, $D_{q_4} = \{d_3\}$, and $D(\overline{q_1 q_5 q_4}) = \emptyset$ are obtained. The partial join result sets can be generated from this information as $\Omega(q_1) = \{\langle q_1, d_1 \rangle\}$ and $\Omega(q_4) = \{\langle q_4, d_3 \rangle\}$. No NN query is evaluated for $q_5$ because it is an inner query object of the query object cluster $\overline{q_1 q_5 q_4}$. Rather, simply applying Lemma 1 is enough to retrieve the NN for $q_5$ based on the relation $D_{q_1} \cup D_{q_4} \cup D(\overline{q_1 q_5 q_4}) = \{d_1, d_3\}$. For this purpose, it is necessary to compute the distance between $q_5$ and each candidate data object $d \in \{d_1, d_3\}$. It is evident that $d_1 \in D_{q_1} \cup D_{q_4} - D(\overline{q_1 q_5 q_4})$ and so, based on Table 3, the distance between $q_5$ and $d_1$ is given by $dist(q_5, d_1) = len(q_5, q_1) + dist(q_1, d_1) = 3$, as depicted in Figure 6a. Similarly, $d_3 \in D_{q_1} \cup D_{q_4} - D(\overline{q_1 q_5 q_4})$ and so, based on Table 3, the distance from $q_5$ to $d_3$ is $dist(q_5, d_3) = \min\{len(q_5, q_1) + dist(q_1, d_3),\ len(q_5, q_4) + dist(q_4, d_3)\} = \min\{9, 15\} = 9$, as shown in Figure 6b. Because $3 < 9$, $d_1$ is the nearest data object of $q_5$, i.e., $\Omega(q_5) = \{\langle q_5, d_1 \rangle\}$.

Table 3. Computation of nearest neighbor using the SCL method (columns: $\overline{q_i q_j}$, $q_i$, $q_j$, $D_{q_i}$, $D_{q_j}$, $D(\overline{q_i q_j})$, $q_{i+1} \ldots q_{j-1}$, $D_{q_i} \cup D_{q_j} \cup D(\overline{q_i q_j})$).

Next, the NN queries at the query object cluster $\overline{q_2 q_3}$ are evaluated. As the query object cluster consists of only two query objects, the NN queries corresponding to both query objects need to be evaluated. On retrieving the set of the nearest data objects, the respective partial join result set is generated for each query object.
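From Table 3, it is observed that $D_{q_2} = \{d_4\}$ and $D_{q_3} = \{d_4\}$, so the partial join result sets for $q_2$ and $q_3$ are $\Omega(q_2) = \{\langle q_2, d_4 \rangle\}$ and $\Omega(q_3) = \{\langle q_3, d_4 \rangle\}$. After the NN queries for $\overline{q_2 q_3}$ have been processed, the algorithm proceeds to the final query point $q_6$. In this case, a single NN query is evaluated at $q_6$ and the partial join result for $q_6$ is obtained as $\Omega(q_6) = \{\langle q_6, d_3 \rangle\}$. Finally, the union over all query object clusters is computed as $\Omega(\overline{q_1 q_5 q_4}) \cup \Omega(\overline{q_2 q_3}) \cup \Omega(\overline{q_6}) = \{\langle q_1, d_1 \rangle, \langle q_4, d_3 \rangle, \langle q_5, d_1 \rangle, \langle q_2, d_4 \rangle, \langle q_3, d_4 \rangle, \langle q_6, d_3 \rangle\}$, where $\Omega(\overline{q_1 q_5 q_4}) = \Omega(q_1) \cup \Omega(q_5) \cup \Omega(q_4)$ and $\Omega(\overline{q_2 q_3}) = \Omega(q_2) \cup \Omega(q_3)$.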
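The selection of the nearest data object for the inner query object $q_5$ in the example can be checked with a few lines. The sketch uses only the candidate distances stated in the example (3 for $d_1$, and the minimum of 9 and 15 for $d_3$); the variable names are hypothetical.

```python
# Candidate distances for q5, taken from the worked example.
candidates = {
    "d1": 3,            # len(q5, q1) + dist(q1, d1)
    "d3": min(9, 15),   # the shorter of the routes via q1 and via q4
}
nearest_q5 = min(candidates, key=candidates.get)
```

The result agrees with the pair containing $q_5$ and $d_1$ in the final join result.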

Complexity Analysis
The complexity of answering ANN queries using the proposed SCL method is analyzed as follows. The number of data objects is denoted as $|D|$, the number of query objects as $|Q|$, the number of node clusters as $|\overline{N}|$, and the number of query object clusters as $|\overline{Q}|$. The numbers of nodes and edges are denoted as $|N|$ and $|E|$, respectively.
The clustering process takes $O(|N| + |E|) + O(|\overline{N}|) = O(|N| + |E| + |\overline{N}|)$. Initially, the road network is traversed from a terminal or intersection node until another terminal or intersection node is reached. For the road network traversal, the SCL algorithm adopts a breadth-first search traversal with a worst-case time complexity of $O(|N| + |E|)$. Once the node clustering is completed, the algorithm linearly scans through the node clusters to find the query objects located in those node clusters. This linear scan takes $O(|\overline{N}|)$ time.
The query time complexity of the SCL algorithm depends upon the number of query object clusters, i.e., $|\overline{Q}|$. At most two NN queries are evaluated for each query object cluster, which amounts to at most $2 \times |\overline{Q}|$ NN queries. To find the shortest path from the end of each query object cluster to the nearest data object, Dijkstra's algorithm was implemented, which has a worst-case time complexity of $O(|N| + |E| \log |N|)$. Therefore, the time complexity of the SCL algorithm is expressed as $O(|\overline{Q}| \times (|N| + |E| \log |N|))$.
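The bound on the number of NN queries can be illustrated by counting evaluations per cluster. The helper below is a hypothetical sketch; the cluster sizes in the example are those of the running example of Figure 3b (clusters of three, two, and one query objects).

```python
def nn_query_count(cluster_sizes):
    """Number of NN queries SCL evaluates: one for a singleton cluster,
    two (one per boundary object) for any larger cluster."""
    return sum(1 if size == 1 else 2 for size in cluster_sizes)

cluster_sizes = [3, 2, 1]          # q1 q5 q4, q2 q3, and q6
scl_queries = nn_query_count(cluster_sizes)
naive_queries = sum(cluster_sizes)  # one NN query per query object
```

The saving grows with the cluster size, because every query object beyond the two boundary objects of a cluster is answered without a network traversal.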

Experimental Evaluation
This section introduces an experimental approach to the algorithm analysis. Further, it is subdivided into a description of the experimental environment and the presentation of the results followed by the discussions.

Experimental Setup
Real road datasets from California, San Joaquin, and Oldenburg, available in [35], were used to verify the performance of the proposed algorithm. The California dataset comprised 21,048 nodes and 21,693 edges, the San Joaquin dataset comprised 18,263 nodes and 23,784 edges, and the Oldenburg dataset comprised 6105 nodes and 7035 edges. Further details about the datasets are presented in Table 4. The proposed SCL algorithm was employed to answer the NN queries on these datasets. The query and data objects were assumed to move continuously in dynamic road networks, where the weights of the road segments (transit times and distances) were frequently updated and the distance between two objects was taken to be the length of the shortest path connecting them. ANN queries were taken to be snapshot-based rather than continuous queries, which required the system to store the current locations of the query and data objects. Thus, the movement of the query and data objects was not of much significance to the proposed method, whereas the frequent updates of network distances invalidated precomputed structures.
In each experiment, the numbers of data and query objects were varied. When the number of data objects was increased, the number of query objects was kept fixed, and vice versa. The setup parameters used for the experiments are shown in Table 5, where the bold values represent the defaults used throughout the experiments. Initially, 10 centroid datasets were generated following a Gaussian distribution, with the mean set to the centroid and the standard deviation set to 2% of the side-length. The distributions of the query and data objects were taken to follow the centroid distribution, unless stated otherwise. The SCL, VIVET [18], and INE [21] algorithms were employed to compute the nearest data object for all $n$ given query objects. The VIVET algorithm uses a lookup array table. Initially, it computes the distance from the virtual node and passes through every data object until it reaches the nearest node. Once it finds the nearest node, the distance table is updated, and a new distance is recorded in the lookup array table. The INE algorithm, in contrast, starts the traversal from each query object and terminates the search once it finds the nearest data object. The three algorithms were implemented in Java and run on a desktop PC running a 64-bit Linux OS with 16 GB RAM and a 32-core Intel processor at 2.10 GHz. The experiments were repeated five times, and the average values were recorded.
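The centroid distribution described in the setup can be approximated with the following sketch. It is an assumption-laden illustration, not the authors' generator: the centroids are drawn uniformly from a square of the given side-length, generated points are clamped to that square, the mapping of points onto road segments is omitted, and the function name `centroid_points` and its parameters are hypothetical.

```python
import random

def centroid_points(n_points, side_length, n_centroids=10,
                    sigma_ratio=0.02, seed=0):
    """Draw points around randomly placed centroids, with a Gaussian
    spread whose standard deviation is sigma_ratio * side_length."""
    rng = random.Random(seed)
    centroids = [
        (rng.uniform(0, side_length), rng.uniform(0, side_length))
        for _ in range(n_centroids)
    ]
    sigma = sigma_ratio * side_length
    points = []
    for _ in range(n_points):
        cx, cy = rng.choice(centroids)
        # clamp to the square so that every point stays inside the map
        x = min(max(rng.gauss(cx, sigma), 0.0), side_length)
        y = min(max(rng.gauss(cy, sigma), 0.0), side_length)
        points.append((x, y))
    return points

points = centroid_points(n_points=1000, side_length=10000.0)
```

A fixed seed keeps the generated workload reproducible across repeated runs of the same experiment.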

Experimental Results
Figures 7-9 compare the query processing times of the INE, SCL, and VIVET algorithms on the CAL, SANJ, and OLDEN datasets, respectively. The size of the query objects was set to 50 K by default while the size of the data objects varied from 20 K to 100 K, and vice versa when the size of the query objects varied. The figures also show the various data distribution combinations, i.e., (C, C), (C, U), (U, C), and (U, U), for the query and data objects. Figure 7a illustrates the effect of the data object size on query processing. The query processing time tends to grow with the number of data objects. However, the growth of the data object size has less impact on the efficiency of the SCL and INE algorithms. In fact, SCL performs six times better than VIVET because the query processing of SCL is not affected by changes in the size of the data objects. For VIVET, when the data object size increased, the algorithm spent 65% of the computation time traversing from the virtual node, passing through every data object to find the nearest node, and keeping the index table up to date. Figure 7b illustrates the effect of the query object size on query processing. The query processing time of all three algorithms increases with the number of query objects. Clearly, the SCL algorithm is two and four times faster than INE and VIVET, respectively. Moreover, when |Q| = 100 K, SCL required approximately 25 K NN query evaluations. The figure also shows that INE was faster than VIVET: since the query objects followed a centroid distribution, INE quickly found an NN result with less network traversal. As expected, VIVET performed much worse than SCL and INE because it had to check whether each data object affects the lookup table.
Note that each data object and query object is treated as a node in VIVET, which alters the actual node count after the original graph G is augmented. Figure 7c depicts the effect on query processing when the query and data objects followed the various data distribution combinations. Under the various distribution settings, SCL outperformed both the INE and VIVET algorithms. When the query objects and data objects are uniformly distributed, INE and SCL show similar performance. Nonetheless, VIVET is severely affected when the query and data objects are uniformly distributed, since the objects are sparsely scattered and lie far from each other. Figure 8a compares the query processing times of INE, SCL, and VIVET with respect to the data object size. For data object sizes of 20 K and 30 K, SCL and VIVET show very similar query processing times. When the data object size was increased to 100 K, the processing time of SCL declined because clustering the query objects reduces the number of NN query computations, and SCL is not heavily affected by the number of data objects. At the same time, the query processing time of VIVET shows a slight increase, since VIVET depends on the size of the data objects. For INE, excessive intersection among the edges occurred, causing it to expand the traversal to reach the nearest data object. However, as the number of data objects increased up to 70 K, a sharp drop in query processing time is observed for INE. Figure 8b compares the performance of INE, SCL, and VIVET with respect to the number of query objects. The number of NN queries that had to be evaluated was drastically reduced by the proposed SCL algorithm compared to the others. SCL is up to six times faster than INE, since it only had to evaluate approximately 30 K NN queries when |Q| = 100 K.
When |Q| was increased, INE required more iterations to compute the NN queries, and thus its processing time increased more rapidly than that of the other methods. Figure 8c shows that the SCL algorithm performed well on all tested distribution combinations except (U, C), for the same reason as in the case of the CAL dataset. The result shows that the performance of the INE algorithm degraded significantly because INE had to traverse a long path before it reached the NN data object. However, all methods show similar performance when the query objects followed the centroid distribution, that is, (C, U). Figure 9a illustrates the impact of the data object size on the performance of INE, SCL, and VIVET. The result shows that the data object size has a less significant impact on SCL, which follows a trend similar to that in Figure 7a. On average, VIVET incurred 90% of its computation cost while rebuilding the precomputation table. Figure 9b shows the effect of increasing the number of query objects on INE, SCL, and VIVET. When |Q| increased, the shared execution drastically reduced redundant NN query computations. In particular, the processing time required by the SCL algorithm was 67% and 90% less than that required by the INE and VIVET algorithms, respectively. Figure 9c depicts the query processing time when the query and data objects followed various distributions. The SCL method outperformed the INE and VIVET algorithms for every distribution combination. As expected, VIVET performed much worse than the other two algorithms, as it consumed almost 80% of the computation time keeping the index table up to date.
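The results above are reported both as percentage reductions in processing time and as "times faster" factors. As a reading aid, the sketch below converts one into the other, assuming that "N times faster" means the ratio of the baseline time to the SCL time; the paper does not state this definition explicitly, and the helper name `speedup_from_reduction` is hypothetical.

```python
def speedup_from_reduction(reduction_percent):
    """Convert 'X% less processing time' into an 'N times faster' factor.

    Assumes speedup = T_baseline / T_scl, so a reduction r gives 1 / (1 - r).
    """
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction must be in [0, 100)")
    return 1.0 / (1.0 - reduction_percent / 100.0)

# Reductions reported for the OLDEN dataset in Figure 9b.
for name, reduction in (("INE", 67), ("VIVET", 90)):
    factor = speedup_from_reduction(reduction)
    print(f"SCL vs {name}: {reduction}% less time ~ {factor:.1f}x faster")
```

Under this assumption, a 67% reduction corresponds to a factor of roughly three, and a 90% reduction to a factor of roughly ten.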

Discussion
From the above experimental results, it can be inferred that the query processing time differed greatly depending on whether the size of the query objects was fixed or varied. The proposed algorithm aims to reduce the cost of query computation when there is a huge number of query objects. The SCL algorithm was implemented on a central server that was responsible for handling a huge number of queries. Because the SCL algorithm utilized the shared-execution technique, it outperformed the INE [21] and VIVET [18] algorithms. Moreover, extending this work to a distributed environment [30,36,37] is likely to reduce the computation cost even further, which suggests a future research direction.
This study focused on the processing of ANN queries in a dynamic road environment. The first step is to cluster the whole map dataset and store it on the server. The previously clustered information is then used to re-cluster the query objects belonging to each cluster at time t_1. Because the ANN query is a snapshot-based query, the server must take a new snapshot after time t_{i+1} and re-cluster the query objects in order to evaluate further ANN queries. Figures 10 and 11 show the performance improvement of the SCL algorithm over the INE and VIVET algorithms in terms of query processing time on three different datasets, California (CAL), San Joaquin (SANJ), and Oldenburg (OLDEN), as the size of the query objects increases under two different distributions, i.e., (C, U) and (U, U). As is evident in the aforementioned results, the SCL algorithm performs best when a large number of query objects is involved. In addition, ANN queries are designed to cater to spatial query processing in the presence of a large number of query objects.
Figure caption: Performance improvement by increasing the query object size following the (U, U) distribution.
Figure 10 illustrates that the initial performance improvement of the proposed algorithm over INE on all three datasets starts from 40-60% and then drops to almost 7-22%. However, as the size of the query objects increases, the improvement progressively recovers to up to 60%. With respect to VIVET, the improvement of SCL starts from 42-82% and then drops slightly to 50%. When the size of the query objects is smaller than that of the data objects, the performance difference is significant, and as the query object size grows, the performance gap increases.
From Figure 11, it can be seen that the initial performance improvement of SCL over INE on all three datasets starts from 7-14% and grows significantly, to approximately 50%, when the query object size reaches 50 K. As the size keeps increasing, the improvement drops again to almost 30%. Overall, the performance of SCL with respect to INE improves as the query object size increases. When the query objects were densely distributed, the shared execution process reduced the processing time, but when the query objects were widely scattered following a uniform distribution, the query processing time increased for both the INE and SCL algorithms.
In addition, the query processing time of VIVET grows linearly because it depends on the number of data objects. This is because the VIVET algorithm was designed to process ANN queries in a static road network, and whenever a location update occurs, VIVET has to rebuild the index from scratch. On the other hand, owing to the shared execution processing, the number of NN queries evaluated by the SCL method decreased as the size of the data objects increased. When the data objects are uniformly distributed, the time required for each data object to locate its nearest node is longer, which is why the performance of SCL improved with respect to VIVET. VIVET performed worse as the size of the query objects kept increasing. This demonstrates that the SCL algorithm optimized the shared execution processing more effectively for large datasets.
As shown in Table 4, the ratios of intermediate nodes to the total number of nodes for the datasets were ≈0.94, ≈0.21, and ≈0.53, respectively. This shows that when the ratio of intermediate nodes to the total number of nodes is close to 1.0, the performance of the SCL algorithm increases moderately. Another parameter to take into consideration is the ratio of the number of node clusters to the total number of nodes. The CAL dataset has many intermediate nodes, which contribute to forming a few node clusters. However, for the SANJ dataset, the number of node clusters is greater than its total number of nodes. This can also be a contributing factor to the slight improvement in performance as the number of query objects increases. To sum up, first, the SCL algorithm is up to six times faster than INE and VIVET, particularly when the query objects exhibit a centroid distribution. Second, the INE algorithm often clearly outperforms VIVET when the number of data objects is larger than 50,000. The performance study implies that the shared execution technique reduces the number of NN query evaluations needed to process large query requests efficiently.
Limitations of VIVET in the dynamic road network
Figures 12 and 13 depict the precomputation and query lookup stages of the VIVET algorithm. To avoid overlapping network traversals and duplicated computation, VIVET traverses the network starting from a virtual node n* that connects to all data objects. For each node n_i, there is always one data object d_i on the shortest path from the virtual node such that d_i is the nearest neighbor of the node n_i. The results are then stored in a precomputation table that holds all N nodes. The VIVET algorithm assumes that data objects and query objects are located at the graph nodes. Once the precomputation is completed, an ANN query is answered by locating the node on which each query object q_i lies and returning the nearest data object d_i for that query object q_i.
The VIVET algorithm has been specifically designed to support ANN query processing in a static road network. A precomputation table that records the nearest data object for each node can be used to find the nearest data object for a query object. However, in a dynamic road network environment, the weight of an edge can change frequently, which invalidates the precomputed distances between query and data objects. Thus, in order to generate a valid query result, the precomputation table must be updated whenever there is a change in the network, which significantly increases the computation overhead. Figure 12 shows a precomputation table at time t_1. The NN results for two query objects q_1 and q_2 are shown in the green box. After the traversal visits all the vertices, starting from the virtual node n* and passing through the connected data objects d_1 and d_2, the precomputation table is filled with the nearest neighbors of the nodes. The precomputation table is also known as the NN array. The shortest paths from n* to q_1 and q_2 are {n* → n_4 → n_1} and {n* → n_9 → n_10 → n_11}, with distances of one and eight, respectively. The green box shows that the nearest neighbors of the query objects q_1 and q_2, lying on the nodes n_1 and n_11, are d_1 and d_2, respectively. Now suppose that at time t_{i+1} the weights of the edges (n_2, n_3), (n_4, n_7), (n_7, n_13), (n_13, n_12), (n_12, n_11), and (n_9, n_10) change due to certain traffic conditions. Then, the shortest path from n* to n_11 changes to {n* → n_7 → n_13 → n_12 → n_11} with a distance of seven. When the network distance changes, VIVET nullifies the NN array table; afterward, a traversal is initiated that recomputes the NN pairs, and finally, the NN array table is updated, as shown in Figure 13. The update also changes the NN result for q_2, which becomes d_1.
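The precompute-then-invalidate behavior described above can be sketched as a multi-source Dijkstra traversal, which is equivalent to starting from a virtual node joined to every data object by a zero-weight edge. This is a minimal illustration rather than the VIVET implementation: the five-node path graph, its weights, and the names `build_nn_array` and `set_weight` are hypothetical and do not reproduce the network of Figures 12 and 13.

```python
import heapq

def set_weight(graph, u, v, w):
    """Set the weight of the undirected edge (u, v)."""
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w

def build_nn_array(graph, data_nodes):
    """Return (dist, nearest): for every node, the distance to and the
    identity of its nearest data object, via multi-source Dijkstra."""
    dist = {node: float("inf") for node in graph}
    nearest = {node: None for node in graph}
    # Seeding every data node at distance 0 plays the role of the virtual node.
    heap = [(0.0, node, obj) for obj, node in data_nodes.items()]
    heapq.heapify(heap)
    while heap:
        d, node, obj = heapq.heappop(heap)
        if d >= dist[node]:
            continue  # node already settled at an equal or shorter distance
        dist[node] = d
        nearest[node] = obj
        for neighbor, weight in graph[node].items():
            if d + weight < dist[neighbor]:
                heapq.heappush(heap, (d + weight, neighbor, obj))
    return dist, nearest

# Hypothetical path network: n1 - n2 - n3 - n4 - n5.
graph = {}
for u, v, w in [("n1", "n2", 1.0), ("n2", "n3", 1.0),
                ("n3", "n4", 2.0), ("n4", "n5", 2.0)]:
    set_weight(graph, u, v, w)
data_nodes = {"d1": "n1", "d2": "n5"}

dist_before, nn_before = build_nn_array(graph, data_nodes)
set_weight(graph, "n2", "n3", 5.0)   # a traffic update changes one edge
dist_after, nn_after = build_nn_array(graph, data_nodes)  # full rebuild
print("NN of n3 before:", nn_before["n3"], "after:", nn_after["n3"])
```

In this sketch a single edge update forces the whole table to be rebuilt, which mirrors the overhead the text attributes to precomputation-based methods under frequent weight changes.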
Whenever a huge number of road network updates require fresh results, indexing techniques are less efficient because they incur computation overhead. Table 6 shows the time and space complexities of the INE, SCL, and VIVET algorithms. INE and SCL are almost identical when it comes to the ANN query lookup because both algorithms start traversing the road network from the query point and expand the adjacent edges until they reach the nearest neighbor data point. Thus, the input size for both INE and SCL is |Q|. However, SCL reduces the number of query points by clustering them, hence reducing the number of NN query evaluations. On the other hand, the ANN query lookup using VIVET is linear in the size of the query objects, i.e., |Q|. As mentioned earlier, INE and SCL do not use any precomputation technique; hence, their index maintenance cost is always O(1). VIVET, in contrast, relies on a light indexing scheme, and precomputed results are stored in an NN array lookup table; therefore, VIVET takes O((|E| + |N| + |D|) × log|N|) during the precomputation phase, and its run-time complexity is O((|E| + |N| + |D|) × log|N|) + |Q|. The total run-time of INE and SCL depends on the size of the query points, but for VIVET, the data object size also affects the run-time. In contrast, the SCL algorithm significantly reduces the number of query points by clustering them; hence, the size of |Q| will be relatively small compared with the size of |D|. To conclude, VIVET inherently performs worse in dynamic road networks than in static road networks because it has to repeat the precomputation whenever there is an update in the network distance.
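The lookup behavior described above, where the traversal starts at the query point and expands adjacent edges until the nearest data object is reached, can be sketched as follows. The sketch also shows why reducing the number of distinct query points reduces the number of NN evaluations. It is a simplified stand-in rather than the SCL algorithm: objects are assumed to sit on nodes, and queries are grouped only when they share the same node, whereas the paper clusters objects along shared paths. The graph and the names `ine_nearest` and `ann_shared` are hypothetical.

```python
import heapq

def ine_nearest(graph, source, objects_at):
    """Expand outward from `source` in Dijkstra order and stop at the first
    settled node that hosts a data object. Returns (object, distance)."""
    heap = [(0.0, source)]
    settled = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue
        settled.add(node)
        if node in objects_at:
            return objects_at[node], d
        for neighbor, weight in graph[node].items():
            if neighbor not in settled:
                heapq.heappush(heap, (d + weight, neighbor))
    return None, float("inf")

def ann_shared(graph, queries, objects_at):
    """Answer all queries, evaluating one NN search per distinct query node.
    Returns (results, number_of_nn_evaluations)."""
    cache = {}
    results = {}
    for q, node in queries.items():
        if node not in cache:
            cache[node] = ine_nearest(graph, node, objects_at)
        results[q] = cache[node]
    return results, len(cache)

# Hypothetical path network: n1 - n2 - n3 - n4 - n5 (undirected).
graph = {
    "n1": {"n2": 1.0},
    "n2": {"n1": 1.0, "n3": 1.0},
    "n3": {"n2": 1.0, "n4": 2.0},
    "n4": {"n3": 2.0, "n5": 2.0},
    "n5": {"n4": 2.0},
}
objects_at = {"n1": "d1", "n5": "d2"}
queries = {"q1": "n2", "q2": "n2", "q3": "n4", "q4": "n2"}

results, evaluations = ann_shared(graph, queries, objects_at)
print(results)
print("NN evaluations:", evaluations, "for", len(queries), "queries")
```

Because nothing is precomputed, an edge-weight update requires no index maintenance here; the next snapshot simply runs the searches again on the updated graph.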

Next, the space complexities of the algorithms are analyzed. The space complexity of the INE algorithm is O(|Q|) because NN queries are evaluated from all existing query objects. SCL takes O(|Q|) because the existing query objects are first scanned and then clustered to form query clusters, and for every query cluster |Q|, the memory consumption is 2×|Q| since, at most, two NN queries are required for a query segment. The VIVET algorithm initially creates an augmented graph G* from the original graph G. During the augmentation process, all the query objects and data objects are transformed into nodes, such that data objects and query objects lie on the nodes. Storing the additional nodes during the execution takes O(|N|) space. Consequently, the memory consumption of SCL is comparatively smaller than that of precomputation-based methods.

Conclusions
This study investigated efficient methods to process ANN queries in dynamic road networks. Specifically, ANN queries involve processing a huge number of query requests, which imposes a high computational burden. Therefore, this paper proposed an efficient framework to process ANN queries in a dynamic road network that reduces the computation cost. To enhance the efficiency and effectiveness of the proposed algorithm, the shared execution technique is adopted that initially creates a cluster of nodes, followed by clustering of query objects in order to bypass redundant NN evaluations.
To evaluate the performance of the proposed algorithm, a simulation experiment was conducted using real-world road network maps. Various data distribution combinations were used to evaluate the performance of the proposed framework. The experiments verified that the proposed algorithm outperforms the VIVET and INE algorithms. Further, the results showed that the SCL algorithm performs best when evaluating queries that involve a large number of query objects in a road segment. Motivated by the limitations of this work, as mentioned in the discussion section, future work can extend this approach to the problem of distributed query processing in dynamic road environments. This could help minimize the wireless communication and server computation costs, both of which depend heavily on the volume of the location-update stream generated by moving objects. Additionally, integrating the shared execution approach with a Markov chain state model could support AI-based recommendation systems, which is a further avenue for future research.