Top-k Spatial Preference Queries in Directed Road Networks

Abstract: Top-k spatial preference queries rank objects based on the scores of feature objects in their spatial neighborhood. Top-k preference queries are crucial for a wide range of location-based services such as hotel browsing and apartment searching. In recent years, a lot of research has been conducted on processing top-k spatial preference queries in Euclidean space. Although a few algorithms study top-k preference queries in road networks, they all focus on undirected road networks. In this paper, we investigate the problem of processing top-k spatial preference queries in directed road networks, where each road segment has a particular orientation. Computing the score of a data object requires examining the score of each feature object in its spatial neighborhood, which can lead to a high query processing time. We address this problem by pruning and grouping feature objects to reduce the number of feature objects that must be examined. Furthermore, we present an efficient algorithm called TOPS that can process top-k spatial preference queries in directed road networks. Experimental results indicate that our algorithm significantly reduces the query processing time compared with the Period approach for a wide range of problem settings.


Introduction
The exponential growth of handheld devices, together with the widespread availability of maps and inexpensive network bandwidth, has popularized location-based services. According to Skyhook, the number of location-based applications being developed each month is increasing exponentially. Thus, spatial queries such as k nearest neighbor, range, and reverse nearest neighbor queries [1][2][3][4][5] have received a significant amount of attention from the research community. However, most existing applications are limited to traditional spatial queries, which return objects based on their distances from the query point.
In this paper, we study the top-k spatial preference query, which returns a ranked list of the k best spatial objects based on their neighborhood facilities. Given a set of data objects $D = \{d_1, d_2, \ldots, d_n\}$, a top-k spatial preference query retrieves a set of k objects in D based on the quality of the facilities in their neighborhoods (the quality is calculated by aggregating the distance score and the non-spatial score). Many real-life scenarios illustrate the usefulness of preference queries. For example, if a real estate agency maintains a database of available apartments, a customer may want to rank the apartments based on neighboring facilities (e.g., markets, hospitals, and schools).
In another example, a tourist may be looking for hotels located near good-quality restaurants, cafes, or tourist spots. Figure 1 illustrates the data objects (i.e., hotels) as triangles, and two distinct feature datasets: solid rectangles denote cafes while hollow rectangles denote restaurants. The number on each edge denotes the cost of traveling on that edge, which can be considered as the amount of time required to travel along it (e.g., one directed segment in the figure has a cost of 2 and one undirected segment has a cost of 1). The number in parentheses over each feature object denotes the score of that feature object (e.g., one feature object has a score of 0.5). Now, let us consider a tourist looking for a hotel near cafes and restaurants. The tourist can restrict the range of the query, so we assume that the range is 3 (i.e., $r = 3$) in this example. The score of a hotel can be determined according to the following criteria: (1) the maximum quality for each feature in the neighborhood region; and (2) the aggregate of these qualities. For instance, if the hotels are ranked based on the scores of cafes only, the top hotel would be $d_1$ because the scores of $d_1$, $d_2$ and $d_3$ are 0.7, 0 and 0.5, respectively. It should be noted that the score of $d_2$ is 0 although a cafe lies within its range, because the road network is directed and no path connects $d_2$ to that cafe. In contrast, if the hotels are ranked based only on the scores of restaurants, the top hotel would be $d_2$ because the scores of $d_1$, $d_2$ and $d_3$ are 0, 0.8 and 0.5, respectively. Finally, if the hotels are ranked based on the summed scores of restaurants and cafes, then the top hotel would be $d_3$ because the scores of $d_1$, $d_2$ and $d_3$ are 0.7, 0.8 and 1, respectively.
Two basic factors are considered when ranking objects: (1) the spatial ranking, which is based on the distance; and (2) the non-spatial ranking, which ranks the objects based on the quality of the feature objects. Our top-k spatial preference query algorithm efficiently integrates these two factors to retrieve a list of data objects with the highest scores.
Unlike traditional top-k queries [6,7], top-k spatial preference queries require that the score of a data object is defined by the feature objects that satisfy a spatial neighborhood condition such as the range, nearest neighbor, or influence [8][9][10][11]. Therefore, every pair comprising a data object and a feature object needs to be examined in order to determine the score of the corresponding data object. In addition, processing top-k spatial preference queries in road networks is more complex than in Euclidean space, because the former requires the exploration of the spatial neighborhood along the given road network.
Top-k spatial preference queries are intuitive and have several useful applications, such as hotel browsing. Unfortunately, most existing algorithms focus on Euclidean space, and little attention has been given to road networks. Although a few algorithms exist for preference queries in road networks, they all consider undirected road networks. Motivated by these limitations, we propose a new approach to processing top-k spatial preference queries in directed road networks, where each road segment has a particular orientation (i.e., either directed or undirected). In our method, the feature objects in the road network are grouped together in order to reduce the number of pairs that need to be examined to find the top-k data objects. We propose a method for grouping feature objects based on pivot nodes, as described in detail in Section 4.2.1. All of the pairs of data objects and feature groups are mapped onto a distance-score space, and a subset of pairs is identified that is sufficient to answer spatial preference queries. In order to map the pairs, we describe a mathematical formula for computing the minimum and maximum distances in directed road networks between data objects and feature groups. Finally, we present our Top-k Spatial Preference Query Algorithm (TOPS), which can efficiently compute the top-k data objects using these pairs.
This study is an extended version of our previous investigation of top-k spatial preference queries in directed road networks [12], and it presents four new extensions that we did not consider in that preliminary study. The first extension is an enhanced grouping technique that can efficiently obtain multiple feature groups associated with a single pivot node (Section 4.2.1). The second extension adapts the proposed algorithms to neighborhood conditions other than the range score, e.g., the nearest neighbor score and the influence score (Section 5). In the third extension, we present an efficient incremental maintenance method for materialized skyline sets (Section 5). In the fourth extension, we conduct an extensive experimental study in which we increase the number of parameters used to evaluate the performance of TOPS; we also implement two versions of TOPS, $TOPS_{gr}$ and $TOPS_{in}$, and compare their performance with the Period approach (Section 6). In addition to these major extensions, we present a lemma to prove that the materialized skyline set is sufficient for determining the partial score of a data object $d \in D$. Furthermore, we discuss the limitations of algorithms designed for undirected road networks when they are applied to directed road networks (Section 4.4).
The contributions of this paper are summarized as follows:
• We propose an efficient algorithm called TOPS for processing top-k preference queries in directed road networks. To the best of our knowledge, this is the first study to address this problem.
• We present a method for grouping feature objects based on a pivot node. We show the mapping of data objects and feature groups in a distance-score space to generate a skyline set.
• We state lemmas for computing the minimum and maximum distances between the data object and the feature group.
• In addition, we propose a cost-efficient method for the incremental maintenance of materialized skyline sets.
• Based on experimental evaluations, we study the effects of applying our proposed algorithm with various parameters using a real-life road dataset.
The remainder of this paper is organized as follows. We discuss related research in Section 2. In Section 3, we define the primary terms and notation used in this study and formulate the problem. In Section 4, we explain the pruning and grouping of feature objects, as well as the mapping of pairs of data objects and feature groups to a distance-score space. Section 5 presents our proposed algorithm for processing top-k spatial preference queries in directed road networks. Our extensive experimental evaluations are presented in Section 6. Finally, we give our conclusion in Section 7.

Related Work
In this section, we review the previous algorithms proposed for ranking spatial data objects. Object ranking is a popular retrieval task in various applications. As in most relational database applications, we often want to rank the tuples using an aggregate score of attribute values. For example, a rental car agency maintains a database that contains information about cars available for rent. A potential customer wishes to view the top 10 options among the latest models with the lowest prices. In this case, the score of each car is expressed by the sum of two qualities: model and price. In spatial databases, the rankings are often associated with nearest neighbor queries. Thus, given a query point q, we are mainly interested in finding the set of nearest objects to it that meet a specific condition. Many algorithms [1,[13][14][15] have been proposed to retrieve the nearest neighbor objects both in Euclidean space and in road networks.
In the following subsections, we survey skyline queries (Section 2.1), briefly review feature-based spatial queries (Section 2.2), and discuss top-k spatial preference queries (Section 2.3).

Skyline Queries
In recent years, skyline query processing [16][17][18][19][20][21] has attracted much attention due to its suitability for decision-making applications such as top-k spatial preference queries. Lin et al. [16] studied the general spatial skyline (GSSKY) problem, which generates a minimal candidate set comprising the optimal solutions for any monotonic distance-based spatial preference query. They proposed an efficient progressive algorithm called P-GSSKY, which considerably reduces the number of non-promising objects during computation. Lee et al. [17] proposed two index-based approaches, called the augmented R-tree and the dominance diagram, for various skyline queries in Euclidean space, e.g., location-dependent skyline queries and reverse location-dependent skyline queries. Liu et al. [18] presented an algorithm for traditional top-k queries in road networks, which considers multiple attributes of data objects, including the data object's location. For example, when users want to find a hotel, they often consider several factors such as the distance to the hotel, the hotel rating, and the service quality. However, this method does not consider the relationship between a data object and a feature object; therefore, it cannot be applied to top-k spatial preference queries in road networks. Deng et al. [19] were the first to consider the problem of multi-source skyline queries in road networks. Recently, Cheema et al. [22] addressed skyline queries for moving queries, proposing a safe-zone-based algorithm for continuously monitoring a skyline query.

Feature-Based Spatial Queries
Xia et al. [23] proposed a novel algorithm for retrieving the top-t most influential spatial sites (e.g., residential apartments) based on their influence on the feature points (e.g., markets). The influence of each site p is determined by the sum of the scores of all the feature objects that have p as their closest site. Yang et al. [24,25] studied optimal location queries. Unlike [23], an optimal location query may return any point in the data space, not necessarily an object in the dataset, although the score computation method is similar to that in [23]. These algorithms [23][24][25] are specific to the particular query types mentioned above and cannot be applied to top-k spatial preference queries. In addition, they only deal with a single feature dataset, whereas a preference query considers multiple feature datasets.

Top-k Spatial Preference Queries
Several algorithms have been proposed for top-k spatial preference queries in Euclidean space. Yiu et al. [8,9] first considered computing the score of a data object p based on multiple feature object sets in its spatial neighborhood rather than a single feature object set as studied previously [21][22][23]. The score of a data object can be defined by three different spatial scores, i.e., the range, nearest neighbor, and influence scores; thus, Yiu et al. [8,9] designed several algorithms in three categories (Probing Algorithms, Branch and Bound Algorithms, and Feature Join Algorithms) for evaluating top-k spatial preference queries with these scores. In contrast to Yiu et al. [8,9], Rocha-Junior et al. [10] proposed a materialization technique that yields significant computational and I/O cost savings during query processing. They introduced a mapping of pairs of a data object and a feature object to a distance-score space. The minimal subset of the pairs is materialized, and it is sufficient to answer any spatial preference query in an efficient manner. However, these Euclidean-space algorithms [8][9][10] are not suitable for evaluating top-k spatial preference queries in road networks, because the distance between objects in a road network is determined by the shortest path connecting them.
Cho et al. [11] proposed a novel algorithm called ALPS for processing top-k preference queries in road networks, which extends the materialization technique [10] based on the distance-score space to road networks. To minimize the number of data objects, their methodology groups a set of data objects in a road segment and then converts the grouped data objects into a data segment. ALPS [11] can efficiently process top-k spatial preference queries in road networks, but it only works for undirected road networks.
This study distinguishes itself from existing studies in several aspects. Firstly, we study top-k spatial preference queries in a directed road network where each road segment has a particular orientation, whereas previous studies focus either on Euclidean space [8][9][10] or on undirected road networks [11]. Secondly, our approach is based on the grouping of feature objects, which reduces the computation cost. Lastly, we devise a mathematical formula to quickly calculate the minimum and maximum distances between the data object and the feature group in the road network. In recent years, different variations of preference queries have been studied in spatial road networks. Mouratidis et al. [26] studied preference queries in multi-cost transportation networks, where each road segment is associated with multiple cost values. Lin et al. [27] studied k multi-preference queries, which retrieve the top-k groups of facilities that minimize the traveling distance in the road network. These studies address problem scenarios that differ from ours, and their solutions are not appropriate in our problem domain. Furthermore, we present the detailed limitations of undirected-network algorithms in directed road networks in Section 4.4.

Preliminaries
Section 3.1 defines the terms and notations used in this paper. Section 3.2 formulates the problem using an example to illustrate the results obtained from top-k spatial preference queries.

Definition of Terms and Notations
Road Network: A road network is represented by a weighted directed graph $G = (N, E, W)$, where N, E, and W denote the node set, edge set, and edge distance matrix, respectively. Each edge is also assigned an orientation, which is either undirected or directed. An undirected edge is represented by $e = \overline{n_i n_j}$, where $n_i$ and $n_j$ are adjacent nodes, whereas a directed edge is represented as $e = \overrightarrow{n_i n_j}$ or $e = \overleftarrow{n_i n_j}$. Naturally, the arrows above the edges denote their associated directions.
Segment: A segment $s(p_i, p_j)$ is the part of an edge between two points $p_i$ and $p_j$ on the edge. An edge comprises one or more segments. An edge is also considered to be a segment whose end points are the nodes of the edge. To simplify the presentation, Table 1 lists the notations used in this study.
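The road network definition above can be sketched as a small data structure. This is only an illustrative sketch: the function name `build_adjacency`, the tuple layout `(u, v, weight, directed)`, and the three-node network are assumptions introduced here and do not come from the paper.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build successor lists from (u, v, weight, directed) tuples.

    An undirected edge is stored as two opposite arcs; a directed edge
    contributes only the arc u -> v.
    """
    adj = defaultdict(list)
    for u, v, w, directed in edges:
        adj[u].append((v, w))
        if directed:
            adj[v]  # touch the key so every node appears in the mapping
        else:
            adj[v].append((u, w))
    return dict(adj)

# Hypothetical three-node network, not taken from the paper's figures.
edges = [
    ("n1", "n2", 2.0, True),   # directed: n1 -> n2 only
    ("n2", "n3", 1.0, False),  # undirected: traversable both ways
]
adj = build_adjacency(edges)
```

Storing an undirected edge as two arcs lets a single shortest-path routine handle both orientations without special cases.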
Given a set of data objects $D$ and a collection of feature datasets $F_i$, the top-k spatial preference query returns the k data objects with the highest scores. The score of a data object $d \in D$ is defined by the scores of the feature objects $f \in F_i$ in the spatial neighborhood of the data object. Each feature object f has a non-spatial score, denoted as $s(f)$, which indicates the quality (goodness) of f and is graded in the range [0, 1].
The score $\gamma^{\theta}(d)$ of a data object d is determined by aggregating the partial scores $\gamma^{\theta}_i(d)$ with respect to a neighborhood condition $\theta$ and the ith feature dataset $F_i$. The aggregation function agg can be any monotone function (sum, max, min), but in this study we use the sum to simplify the discussion. We consider the range (rng), nearest neighbor (nn), and influence (inf) constraints as the neighborhood condition $\theta$. In particular, the score $\gamma^{\theta}(d)$ is defined as

$$\gamma^{\theta}(d) = \operatorname{agg}_{i}\, \gamma^{\theta}_{i}(d),$$

where $\theta \in \{rng, nn, inf\}$, $agg \in \{sum, max, min\}$, and the aggregate is taken over all feature datasets $F_i$. The partial score $\gamma^{\theta}_i$ is determined only by the feature objects that belong to the ith feature dataset $F_i$ and that satisfy the neighborhood condition $\theta$. In other words, $\gamma^{\theta}_i$ is defined by the highest score $s(f)$ for a single feature object $f \in F_i$ that satisfies the spatial constraint $\theta$. Similar to previous studies [9][10][11], the partial scores $\gamma^{\theta}_i$ for different neighborhood conditions $\theta$ are defined as follows:
• Range (rng) score of d: $\gamma^{rng}_i(d) = \max\{ s(f) \mid f \in F_i,\ dist(d, f) \le r \}$, and 0 if no feature object of $F_i$ lies within the range r.
• Nearest neighbor (nn) score of d: $\gamma^{nn}_i(d) = s(f)$, where $f \in F_i$ is the feature object nearest to d.
• Influence (inf) score of d: computed from $s(f)$ and $dist(d, f)$ for $f \in F_i$, decreasing as the distance grows (defined as in [9][10][11]).
Next, we evaluate the scores of data objects $d_1$, $d_2$ and $d_3$ in Figure 1. We consider the range constraint value $r = 3$. Table 2 summarizes the scores obtained for $d_1$, $d_2$ and $d_3$ using the aforementioned definitions of the neighborhood conditions {rng, nn, inf}. The score of a data object d is the sum of the partial scores over the feature sets. The range score of $d_1$ is calculated as $\gamma^{rng}(d_1) = \gamma^{rng}_1(d_1) + \gamma^{rng}_2(d_1) = 0.7$. The ith partial range score of a data object d is the maximum non-spatial score of the feature objects $f \in F_i$ that are within the range r. Therefore, the first partial score is $\gamma^{rng}_1(d_1) = 0.7$, because $s(a_1) = 0.7$ and $dist(d_1, a_1) \le r$. However, $\gamma^{rng}_2(d_1) = 0$ because there is no restaurant located within the defined range r. Similarly, the range scores of $d_2$ and $d_3$ are $\gamma^{rng}(d_2) = 0 + 0.8 = 0.8$ and $\gamma^{rng}(d_3) = 0.5 + 0.5 = 1.0$, respectively.
The nearest neighbor score of $d_1$ is calculated as $\gamma^{nn}(d_1) = \gamma^{nn}_1(d_1) + \gamma^{nn}_2(d_1) = 1.5$. The ith partial nearest neighbor score of a data object d is the score of the feature object $f \in F_i$ nearest to d. Therefore, the first partial score is $\gamma^{nn}_1(d_1) = 0.7$, because $a_1$ is the closest cafe to $d_1$ and $s(a_1) = 0.7$, whereas $\gamma^{nn}_2(d_1) = 0.8$ because $b_1$ is the closest restaurant to $d_1$ and $s(b_1) = 0.8$. Similarly, the nearest neighbor score of $d_2$ is $\gamma^{nn}(d_2) = 0.5 + 0.8 = 1.3$; the score of $d_3$ is listed in Table 2.
The influence score of $d_1$ is calculated as $\gamma^{inf}(d_1) = \gamma^{inf}_1(d_1) + \gamma^{inf}_2(d_1)$ (see Table 2). The ith partial influence score of a data object d is calculated using the score of the feature object $f \in F_i$ and its distance to the data object. Specifically, the influence score is inversely proportional to the distance between d and f. Therefore, the influence score decreases rapidly as the distance between the feature object f and the data object d increases.
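The range and nearest neighbor computations for $d_1$ can be reproduced with a short sketch. The helper names `range_partial` and `nn_partial` and the two network distances are hypothetical; only the scores $s(a_1) = 0.7$ and $s(b_1) = 0.8$, the range $r = 3$, and the resulting totals are taken from the text. The influence score is left out because its exact formula is not shown in this section.

```python
import math

def range_partial(dists, scores, r):
    """Highest s(f) among features with dist(d, f) <= r, or 0 if none."""
    in_range = [scores[f] for f, dist in dists.items() if dist <= r]
    return max(in_range, default=0.0)

def nn_partial(dists, scores):
    """s(f) of the reachable feature closest to d, or 0 if none is reachable."""
    reachable = {f: dist for f, dist in dists.items() if math.isfinite(dist)}
    if not reachable:
        return 0.0
    nearest = min(reachable, key=reachable.get)
    return scores[nearest]

# Assumed directed network distances from d1 (hypothetical values).
scores = {"a1": 0.7, "b1": 0.8}
cafes = {"a1": 2.0}         # within r = 3
restaurants = {"b1": 5.0}   # reachable, but outside r = 3
r = 3.0

# Sum aggregation over the two feature datasets (cafes, restaurants).
rng_d1 = range_partial(cafes, scores, r) + range_partial(restaurants, scores, r)
nn_d1 = nn_partial(cafes, scores) + nn_partial(restaurants, scores)
```

Under these assumed distances, `rng_d1` and `nn_d1` agree with the values 0.7 and 1.5 derived in the text.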

Finding Pivot Nodes
We now discuss the method used for computing the pivot nodes. In our approach, we group the feature objects based on the pivot nodes. Each feature object is associated with one pivot node, and thus feature objects sharing the same pivot node can be grouped together. The proposed scheme performs better when the number of feature groups is small. Therefore, the main objective is to find the minimum number of pivot nodes. Computing the minimum number of pivot nodes is a minimum vertex cover problem. A minimum vertex cover is a smallest set of nodes such that every edge of the graph is incident to at least one node in the set.

Definition 1:
A vertex cover of a graph $G = (V, E)$ is a subset $S \subseteq V$ such that if $(u, v) \in E$ then either $u \in S$ or $v \in S$ or both. In other words, a vertex cover is a subset of nodes that contains at least one end node of each edge.
Finding a minimum vertex cover is NP-hard (its decision version is NP-complete), and it is closely related to many other hard graph problems. Therefore, numerous studies have been conducted to design optimization and approximation techniques based on the Branch and Bound Algorithm, the Greedy Algorithm, and the Genetic Algorithm. We employ the technique proposed by Hartmann [28], which is based on the Branch and Bound algorithm, because it is a complete algorithm and is therefore guaranteed to find an optimal solution for various optimization problems, including the minimum vertex cover. The main tradeoff when using the Branch and Bound algorithm is that its running time increases for large graphs.
The Branch and Bound algorithm recursively explores the complete graph by determining the presence or absence of one node in the cover during each step of the recursive process, and then recursively solving the problem for the remaining nodes. The complete search space can be considered as a tree where each level determines the presence or absence of one node, and there are two possible branches to follow for each node: one corresponds to selecting the node for the cover, whereas the other corresponds to ignoring the node. A node that is selected for the cover is virtually removed from the graph together with all of its adjacent edges. The algorithm does not need to descend further into the tree when a cover has been found, i.e., when all of the edges are covered. Next, the backtracking process starts and the search continues at higher levels of the tree to identify a possibly smaller vertex cover. During backtracking, all of the covered nodes are reinserted in the graph. In this way, the subsets of nodes that yield legitimate vertex covers are determined, and the smallest of them is the minimum vertex cover. Let us consider the example in Figure 2, where the set of nodes $\{n_2, n_6, n_7\}$ constitutes a minimum vertex cover, i.e., it covers all of the edges in the given road network.
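The search described above can be sketched as follows. This is a minimal illustration and not Hartmann's implementation [28]: it branches on the two end nodes of an uncovered edge instead of on each node in turn, it ignores edge orientation (a cover only needs one end node per edge), and the function name `min_vertex_cover` and the example graph are assumptions rather than the network of Figure 2.

```python
def min_vertex_cover(edges):
    """Return a minimum vertex cover of an undirected edge list.

    Exhaustive branch and bound; the worst-case running time is exponential.
    """
    best = None

    def search(remaining, cover):
        nonlocal best
        # Bound: a partial cover already as large as the best cannot improve it.
        if best is not None and len(cover) >= len(best):
            return
        if not remaining:  # every edge is covered
            best = set(cover)
            return
        u, v = remaining[0]
        # Branch 1: put u in the cover and drop the edges it covers.
        search([e for e in remaining if u not in e], cover | {u})
        # Branch 2: put v in the cover instead.
        search([e for e in remaining if v not in e], cover | {v})

    search(list(edges), frozenset())
    return best

# Hypothetical five-node graph (not the road network of Figure 2).
edges = [("n1", "n2"), ("n2", "n3"), ("n3", "n4"), ("n4", "n5"), ("n2", "n5")]
pivots = min_vertex_cover(edges)
```

For this example graph the only cover of size two is {n2, n4}, so those two nodes would serve as pivot nodes.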
ISPRS Int. J. Geo-Inf. 2016, 5, 170

Pruning and Grouping
Top-k spatial preference queries return a ranked set of spatial data objects. Unlike traditional top-k queries, the rank of each data object is determined by the quality of the feature objects in its spatial neighborhood. Thus, computing the partial score of a data object d based on the feature set $F_i$ requires the examination of every pair of objects $(d, f)$. Therefore, for a large number of objects, the search space that needs to be explored to determine the partial score is also large, which makes it harder to process top-k spatial queries efficiently in directed road networks.
In Section 4.1, we discuss the dominance relation and then explain the pruning lemma. Section 4.2 presents the grouping algorithm and the computation of the feature group score $s(g)$, and discusses the computation of the minimum and maximum distances between data objects and feature groups. Section 4.3 describes the mapping of pairs of data objects and feature groups to the distance-score space. Finally, we discuss the limitations of undirected-network algorithms in directed road networks in Section 4.4.

Pruning
In this section, we present a method for finding the dominant feature objects, which are the only feature objects that contribute to the score of a data object. The feature objects that do not contribute to the score of a data object are pruned automatically. This dramatically reduces the search space, thereby significantly decreasing the computational cost. In order to make the pruning step more efficient, we use the pre-computed distances stored in a minimum distance table (MDT). The MDT stores the pre-computed distances between pairs of nodes $n_i$ and $n_j$ in a directed road network. Each tuple in the MDT is of the form $\{(n_i, n_j),\ dist(n_i, n_j)\}$, where $(n_i, n_j)$ is used as a search key for retrieving the value of $dist(n_i, n_j)$.
It should be noted that the network distance between two nodes, $n_i$ and $n_j$, is not symmetrical in a directed road network (i.e., in general $dist(n_i, n_j) \neq dist(n_j, n_i)$). Therefore, we need to insert a separate entry to retrieve the distance from $n_j$ to $n_i$. Figure 2 shows an example of a directed road network, which we employ throughout this section.
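A minimum distance table of this kind can be sketched as a dictionary keyed by ordered node pairs. The sketch assumes that the distances are pre-computed with Dijkstra's algorithm and that unreachable pairs are simply left out; the paper does not state how the MDT is populated, and the names `shortest_distances` and `build_mdt` as well as the one-way triangle are hypothetical.

```python
import heapq

def shortest_distances(adj, source):
    """Dijkstra over directed arcs; adj maps node -> list of (successor, weight)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def build_mdt(adj):
    """Map each ordered pair (n_i, n_j) to dist(n_i, n_j); unreachable pairs are omitted."""
    mdt = {}
    for ni in adj:
        for nj, d in shortest_distances(adj, ni).items():
            mdt[(ni, nj)] = d
    return mdt

# Hypothetical one-way triangle n1 -> n2 -> n3 -> n1.
adj = {"n1": [("n2", 2.0)], "n2": [("n3", 1.0)], "n3": [("n1", 4.0)]}
mdt = build_mdt(adj)
```

In this example the entry for (n1, n2) is 2.0, whereas the entry for (n2, n1) is 5.0 because the only route back passes through n3, which is why each direction needs its own key.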
Before presenting the pruning lemma, let us define some useful terminology.
Static Dimension: The static dimensions i (i.e., $1 \le i \le n$) are fixed criteria that are not changed by the motion of the query, such as the rank of a restaurant or its price.

Static Equality
Complete Dominance: To explain the pruning lemma, we need to define complete dominance. An object $o$ is completely dominated by another object $o'$ with respect to a data object d if $o' \succeq_s o$ (i.e., $o'$ is at least as good as $o$ in its static dimensions) and $dist(d, o') < dist(d, o)$. In other words, $o'$ completely dominates $o$ if $o'$ is equally good in terms of its static dimensions and it is also closer to the data object d. In Figure 2, the feature object $f_1$ completely dominates $f_2$ with respect to $d_1$.
Proof: The proof is straightforward and is therefore omitted. Intuitively, Lemma 1 states that f is a dominant object if f is closer to d than every other object $f'$ and it is at least as good as $f'$ in terms of its static dimensions (i.e., $f \succeq_s f'$).
In Figure 2, $f_2$ is not a dominant object of $d_1$ because a feature object $f_1$ exists such that $f_1 \succeq_s f_2$ and $dist(d_1, f_1) < dist(d_1, f_2)$. Hence, $f_2$ is completely dominated by $f_1$, and thus it is pruned, whereas $f_4$ is a dominant object because it is closer to $d_1$. Note that $f_3$ cannot be a dominant object for data object $d_1$ because we are considering a directed road network and no path exists from $d_1$ to $f_3$.
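The pruning rule can be sketched as follows, assuming a single static dimension (the non-spatial score) and reading "at least as good" as a greater-or-equal comparison. The function name `dominant_features`, the distances, and the scores are hypothetical and do not reproduce the values of Figure 2; they are chosen only so that the outcome mirrors the discussion above.

```python
import math

def dominant_features(candidates):
    """Keep the features that are not completely dominated by another feature.

    candidates maps a feature name to (distance from d, score s(f)).
    A feature f is pruned when some other f' has s(f') >= s(f) and a strictly
    smaller distance; unreachable features (infinite distance) are dropped.
    """
    reachable = {f: ds for f, ds in candidates.items() if math.isfinite(ds[0])}
    kept = []
    for f, (dist_f, score_f) in reachable.items():
        dominated = any(
            score_g >= score_f and dist_g < dist_f
            for g, (dist_g, score_g) in reachable.items()
            if g != f
        )
        if not dominated:
            kept.append(f)
    return kept

# Hypothetical distances and scores relative to d1 (not the values of Figure 2).
candidates = {
    "f1": (2.0, 0.8),
    "f2": (3.0, 0.6),       # farther than f1 and no better: pruned
    "f3": (math.inf, 0.9),  # no directed path from d1: dropped
    "f4": (1.0, 0.5),       # closest feature, so nothing dominates it
}
kept = dominant_features(candidates)
```

With these assumed values, `kept` contains f1 and f4, matching the dominant objects identified for $d_1$ in the text. The quadratic pairwise check is chosen for clarity; sorting the candidates by distance would avoid it for large feature sets.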
Figure 3 depicts the mapping of $D \otimes F_i$ to the distance-score space M. We formally define the distance-score space in Section 4.3. The black squares show the mapping of the pairs $d \otimes f$, where $d \in D$ and $f \in F_i$. Now, by applying the dominance relationship to the mapping in Figure 3a, we find that the pair $d_1 \otimes f_2$ is completely dominated by $d_1 \otimes f_1$. Therefore, the dominant pairs for $d_1$ and $F_i$ are $d_1 \otimes f_4$ and $d_1 \otimes f_1$. Figure 3b shows that $d_2 \otimes f_5$, $d_2 \otimes f_7$, $d_2 \otimes f_3$ and $d_2 \otimes f_1$ are dominated pairs. Similarly, Figure 3c shows the mapping of $d_3 \otimes F_i$, and it is clear that neither of the pairs $d_3 \otimes f_{10}$ and $d_3 \otimes f_9$ is dominated by any other pair.

Grouping
In this section, we describe our approach for grouping the feature objects. The pruning phase reduces the number of feature objects, but this number can be reduced further by merging them into groups. In addition, the grouping technique reduces the size of the skyline set and the number of entries in the R-tree [29], thereby enhancing the efficiency of the algorithm by minimizing the memory consumption required. As mentioned earlier, the score for a data object $d$ is computed from the scores of the feature objects $f \in F_i$, which requires examining every pair of objects $(d, f)$. The performance of the algorithm declines dramatically if the number of feature objects is excessively high. Therefore, the main purpose of grouping is to further reduce the number of feature objects, which consequently reduces the number of pairs. Thus, instead of evaluating the individual pairs $d \otimes f$, our algorithm evaluates $d \otimes g$, where $g$ denotes a feature group and the set of feature groups is represented as $G_i$. Grouping the feature objects has two main advantages:

1. It is easy to compute the highest score of a data object.
2. The computational cost and memory consumption are decreased by reducing the number of pairs.

Grouping Method
We now discuss the method for grouping feature objects based on pivot nodes. The technique for finding the minimum set of pivot nodes is described in Section 3.3. For grouping, each feature object is associated with one pivot node. In the pruning phase, we find the dominant feature objects for each data object. The dominant feature objects of each data object are grouped together if they are associated with the same pivot node. In our previous study [12], we grouped all the feature objects connected to one pivot node, thereby generating one feature group per pivot node. However, in some cases, more than one feature group may be associated with a single pivot node if the dominant feature objects of multiple data objects share the same pivot node.
Let us consider the same example shown in Figure 2, where node $n_2$ is the pivot node and the feature objects $f_1$, $f_2$, $f_3$ and $f_4$ are connected to it. As mentioned in Section 4.1, for data object $d_1$ the dominant objects are $f_1$ and $f_4$, whereas feature objects $f_2$ and $f_3$ are pruned. For data object $d_2$, however, $f_1$ and $f_3$ are dominant, whereas $f_2$ and $f_4$ are pruned. Therefore, two groups are formed, $g_1 = \{f_1, f_4\}$ and $g_2 = \{f_1, f_3\}$, both of which are associated with pivot node $n_2$. Table 3 summarizes the grouping of feature objects.
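As an illustration of this grouping rule, the following Python sketch forms one group per data object and pivot node. The function name `group_by_pivot` and the dictionary-based inputs are assumptions made for the example; only the membership of $g_1$ and $g_2$ comes from the text.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def group_by_pivot(
    dominant: Dict[str, List[str]],  # data object -> its dominant feature objects
    pivot_of: Dict[str, str],        # feature object -> associated pivot node
) -> Dict[str, Set[FrozenSet[str]]]:
    """Group the dominant features of each data object by their pivot node."""
    groups: Dict[str, Set[FrozenSet[str]]] = defaultdict(set)
    for _, features in dominant.items():
        per_pivot: Dict[str, Set[str]] = defaultdict(set)
        for f in features:
            per_pivot[pivot_of[f]].add(f)
        for pivot, members in per_pivot.items():
            # A set of frozensets keeps identical groups from being stored twice.
            groups[pivot].add(frozenset(members))
    return dict(groups)


dominant = {"d1": ["f1", "f4"], "d2": ["f1", "f3"]}
pivot_of = {"f1": "n2", "f2": "n2", "f3": "n2", "f4": "n2"}
groups = group_by_pivot(dominant, pivot_of)
print(groups)
```

For the example of Figure 2 this yields two groups under pivot node $n_2$, namely $\{f_1, f_4\}$ and $\{f_1, f_3\}$.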

Table 3. Grouping of feature objects (columns: Pivot Node, Groups).

Computation of the Group Score s(g)
Because each feature object has a separate score, the computation of the partial score $\gamma_i^{\theta}(d)$ becomes costly for a large number of feature objects. We therefore devised a method for calculating the partial scores based on the group score, denoted as $s(g)$. The group score is the highest score of any feature object in the group that satisfies the neighborhood condition. Table 3 shows that $g_1 = \{f_1, f_4\}$, $s(f_1) = 0.9$ and $s(f_4) = 0.5$. Therefore, $s(g_1) = 0.9$, which is the highest score among the feature objects belonging to $g_1$. The scores of the other groups can be computed in a similar fashion.
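The definition of the group score can be illustrated for the range condition with a short Python sketch. The helper name `group_range_score` is hypothetical, and $dist(d_1, f_4) = 1$ is an assumed value chosen for illustration; $s(f_1)$, $s(f_4)$ and $dist(d_1, f_1) = 3$ are taken from the running example.

```python
from typing import Dict, Iterable


def group_range_score(
    members: Iterable[str],
    score: Dict[str, float],
    dist_to: Dict[str, float],
    r: float,
) -> float:
    """Highest score among the group members within range r; 0.0 if none qualifies."""
    qualifying = [
        score[f] for f in members if dist_to.get(f, float("inf")) <= r
    ]
    return max(qualifying, default=0.0)


score = {"f1": 0.9, "f4": 0.5}
dist_d1 = {"f1": 3.0, "f4": 1.0}  # dist(d1, f4) = 1 is an assumed value

print(group_range_score(["f1", "f4"], score, dist_d1, r=4))  # 0.9
print(group_range_score(["f1", "f4"], score, dist_d1, r=2))  # 0.5
```

The second call shows why the group score depends on the neighborhood condition: with a smaller range, $f_1$ no longer qualifies and the score of $g_1$ falls back to that of $f_4$.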
The partial score $\gamma_i^{\theta}(d)$ can be defined in terms of $s(g)$ for each of the three neighborhood conditions: range (rng), nearest neighbor (nn) and influence (inf). We modify the formulae presented in Section 3.2 to compute the partial score $\gamma_i^{\theta}(d)$ by using the group score $s(g)$ instead of the feature score $s(f)$. The only difference is that $maxdist(d, g)$ is used instead of $dist(d, f)$ for the range and influence scores, whereas $mindist(d, g)$ is used instead of $dist(d, f)$ for the nearest neighbor score.
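For orientation, the three group-based partial scores may be written as follows. This is a reconstruction under assumptions, not a quotation of the original formulae: it assumes that Section 3.2 uses the usual range, nearest neighbor and influence definitions, with the exponential decay factor $2^{-dist/r}$ that also appears in the proof of Lemma 5, and it applies the substitution of $maxdist$ and $mindist$ described above.

```latex
\begin{align}
\gamma_i^{rng}(d) &= \max \{\, s(g) \mid g \in G_i,\ maxdist(d, g) \le r \,\} \\
\gamma_i^{nn}(d)  &= \max \{\, s(g) \mid g \in G_i,\ mindist(d, g) \le mindist(d, g')\ \ \forall g' \in G_i \,\} \\
\gamma_i^{inf}(d) &= \max \{\, s(g) \cdot 2^{-maxdist(d, g)/r} \mid g \in G_i \,\}
\end{align}
```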

Computation of the Distance between a Data Object and Feature Group
In this section, we present lemmas for computing the minimum and maximum distances between a data object and a feature group. The subset of pairs $d \otimes g$ retrieved in the grouping step is indexed in an R-tree [17], for which it is necessary to compute the minimum and maximum distances between a data object and a feature group.
Lemma 2 presents the computation of $mindist(d, g)$. Given a data object $d$ and a feature group $g$, $mindist(d, g)$ is expressed in terms of $\beta$, which denotes the boundary point of the feature group. Proof: This lemma is self-evident, so the proof is omitted.
We consider a directed road network; thus, to determine $maxdist(d, g)$, it is necessary to evaluate the distance from $d$ to a point $p$. In Figure 5, it is obvious that $dist(d, p) = len(d, \beta) + dist(\beta, \alpha) + len(\alpha, p)$. From this equation, we can observe that the distance increases with the value of $len(\alpha, p)$, so to obtain the maximum distance, the point $p$ must be very close to $d$, and thus we can say that $d \cong p$. The equation above can therefore be rewritten to obtain $maxdist(d, g)$. Proof: This lemma is self-evident, so the proof is omitted.
Table 4 summarizes the minimum and maximum distances along with the scores for the pairs $d \otimes g$ in Figure 2.

Table 4. Minimum and maximum distances and scores for the pairs $d \otimes g$ in Figure 2 (columns: Data Object, Feature Group, $d \otimes g$).
In the following, we define the dominance relation that determines the subset of pairs of $M$ comprising the skyline set of $M$, denoted as $S = SKY(M)$. The skyline set $S$ is the set of pairs $(d \otimes g) \in M$ that are not dominated by any other pair $(d' \otimes g') \in M$.
Let $SKY(d \otimes G_i)$ be the set of all pairs that are not dominated by any other pair in $d \otimes G_i$. Figure 6 shows the mapping of $D \otimes G_i$ in Table 4 to the distance-score space $M$: Figure 6a shows the mapping of $d_1 \otimes G_i$, Figure 6b shows the mapping of $d_2 \otimes G_i$ and Figure 6c shows the mapping of $d_3 \otimes G_i$. Pairs related to different data objects (e.g., $d_1 \otimes G_i$ and $d_2 \otimes G_i$) are incomparable. Finally, the skyline set for $D \otimes G_i$ is the union of the skyline sets of all the data objects $d \in D$. Thus, Figure 6d shows the skyline set $SKY(D \otimes G_i)$.
Observe that in Figure 6a, $d_1 \otimes g_1$ is mapped at 0.9 because $s(g_1) = 0.9$. Note that $s(g)$ can change according to the neighborhood condition. As explained earlier, $s(g_1) = 0.9$ because $g_1 = \{f_1, f_4\}$ and $s(f_1) = 0.9$. If we consider the range condition $r = 2$, then $s(g_1) \neq 0.9$ because $dist(d_1, f_1) = 3$, which does not satisfy the neighborhood condition. In this scenario, $s(g_1)$ changes to 0.5, which is the score of $f_4$. Thus, $d_1 \otimes g_1$ is mapped at 0.5.
Figure 7a shows the mapping of $SKY(D \otimes G_i)$ to $M$. Figure 7b, on the other hand, shows an R-tree that indexes the four pairs in $SKY(D \otimes G_i)$, assuming that the node capacity of the R-tree is set to 3. Index node $R_2$ encloses $d_1 \otimes g_1$ and $d_2 \otimes g_2$, whereas index node $R_3$ encloses $d_3 \otimes g_4$ and $d_2 \otimes g_3$. Finally, we present Lemma 5, which proves that $SKY(d \otimes G_i)$ is sufficient for determining the partial score of each data object $d \in D$. Before presenting Lemma 5, recall that each feature object $f \in g$ is a dominant feature object and is not dominated by any other feature object $f'$.
Lemma 5: $SKY(d \otimes G_i)$ is sufficient for determining the partial score $\gamma_i^{\theta}(d)$ of each data object $d \in D$.
Proof: Let us assume that $SKY(d \otimes G_i)$ is not sufficient for obtaining the partial score $\gamma_i^{\theta}(d)$ of a data object $d \in D$. This means that there is a feature group $g_b$ that contributes to $\gamma_i^{\theta}(d)$ such that $(d \otimes g_b) \notin SKY(d \otimes G_i)$. However, $(d \otimes g_b) \notin SKY(d \otimes G_i)$ implies that there is another pair $d \otimes g_a \sqsubset d \otimes g_b$ with $(d \otimes g_a) \in SKY(d \otimes G_i)$, which is equivalent to either $dist(d, g_a) \le dist(d, g_b)$ and $s(g_a) > s(g_b)$, or $dist(d, g_a) < dist(d, g_b)$ and $s(g_a) \ge s(g_b)$. Hence, the partial score of $d$ is $\gamma_i^{\theta}(d) = s(g_a)$ if $\theta = rng$ or $\theta = nn$, and $\gamma_i^{\theta}(d) = s(g_a) \times 2^{-dist(d, g_a)/r}$ if $\theta = inf$. This contradicts our assumption that $g_b$ contributes to $\gamma_i^{\theta}(d)$. Therefore, $SKY(d \otimes G_i)$ is sufficient for obtaining the component score of a data object $d \in D$.
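The skyline construction in the distance-score space can be sketched as follows. This is a simple quadratic illustration rather than the indexed computation used in the paper; the tuple layout, the use of a single distance value per pair, and all numeric values except $s(g_1) = 0.9$ are assumptions.

```python
from typing import List, Tuple

# (data object, feature group, distance, score); values below are hypothetical.
Pair = Tuple[str, str, float, float]


def dominates(a: Pair, b: Pair) -> bool:
    """True if a dominates b: same data object, no farther, no worse, better in one."""
    if a[0] != b[0]:
        return False  # pairs of different data objects are incomparable
    no_worse = a[2] <= b[2] and a[3] >= b[3]
    strictly_better = a[2] < b[2] or a[3] > b[3]
    return no_worse and strictly_better


def skyline(pairs: List[Pair]) -> List[Pair]:
    """Return the pairs that are not dominated by any other pair."""
    return [p for p in pairs if not any(dominates(q, p) for q in pairs)]


pairs = [
    ("d1", "g1", 3.0, 0.9),
    ("d1", "g5", 4.0, 0.6),  # farther and worse than d1-g1, hence dominated
    ("d2", "g2", 2.0, 0.7),
    ("d2", "g3", 1.0, 0.4),  # closer but worse than d2-g2, hence incomparable
]
result = skyline(pairs)
print(result)
```

Because pairs of different data objects never dominate each other, the result is automatically the union of the per-object skylines, as stated above.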

Limitations of Undirected Algorithms in Directed Road Networks
In contrast to undirected road networks, the network distance between two nodes is not symmetric in directed road networks, i.e., $dist(n_i, n_j) \neq dist(n_j, n_i)$. Figure 8a shows an undirected road network with two data objects $d_1$ and $d_2$ and two feature objects $f_1$ and $f_2$. To simplify the presentation, we consider a single feature dataset $F_i = \{\langle f_1, 0.6 \rangle, \langle f_2, 0.8 \rangle\}$.

Let us now evaluate the scores of the data objects. Suppose that the neighborhood condition is the range condition and that the range constraint is $r = 3$. The range score of $d_1$ is 0.8 because $s(f_2) = 0.8$ and $dist(d_1, f_2) \le r$. Similarly, the range score of $d_2$ is 0.6 because $s(f_1) = 0.6$ and $dist(d_2, f_1) \le r$. Therefore, $d_1$ is the top-1 result with $\gamma^{rng}(d_1) = 0.8$. Now, consider the directed road network shown in Figure 8b. In this case, the range score of $d_1$ is 0 because no feature object exists within distance $r$. However, the range score of $d_2$ remains the same because $f_1$ still exists within distance $r$. Therefore, in the directed road network, $d_2$ is the top-1 result with $\gamma^{rng}(d_2) = 0.6$. This example clearly demonstrates that an algorithm based on an undirected road network cannot be applied to directed road networks.
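The effect of edge orientation on the range score can be reproduced with a small Python sketch based on Dijkstra's algorithm. The topology and edge lengths below are hypothetical (they are not those of Figure 8) and were chosen only so that the scores match the example; objects are treated as graph nodes for simplicity.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]
Edge = Tuple[str, str, float]


def build(edges: List[Edge], directed: bool) -> Graph:
    """Build an adjacency list; undirected edges are inserted in both directions."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, []).append((v, w))
        if not directed:
            graph.setdefault(v, []).append((u, w))
    return graph


def shortest_dists(graph: Graph, src: str) -> Dict[str, float]:
    """Dijkstra's algorithm from src; unreachable nodes are absent from the result."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = du + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def range_score(graph: Graph, d: str, scores: Dict[str, float], r: float) -> float:
    """Highest feature score within network distance r of d; 0.0 if none."""
    dist = shortest_dists(graph, d)
    return max(
        (s for f, s in scores.items() if dist.get(f, float("inf")) <= r),
        default=0.0,
    )


# Hypothetical edges (from, to, length); orientation matters only when directed.
edges = [("f2", "d1", 2.0), ("f1", "d1", 4.0), ("d2", "f1", 1.0)]
scores = {"f1": 0.6, "f2": 0.8}

undirected = build(edges, directed=False)
directed = build(edges, directed=True)
print(range_score(undirected, "d1", scores, 3), range_score(undirected, "d2", scores, 3))
print(range_score(directed, "d1", scores, 3), range_score(directed, "d2", scores, 3))
```

In the undirected graph $d_1$ obtains the higher score, whereas in the directed graph $d_1$ cannot reach any feature object and $d_2$ becomes the top-1 result.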
The research study closest to our present work was presented by Cho et al. [11]. They proposed an algorithm called ALPS for processing preference queries in undirected road networks, in which the data objects in a road sequence are grouped to form a data segment. The motivation behind grouping data objects is that data objects in a sequence are close to each other, so it is more efficient to process them together than to handle each data object separately. However, ALPS falls short in answering preference queries in directed road networks, and we now explain why. Consider the directed road network in Figure 9, where there are four data objects $d_1$, $d_2$, $d_3$ and $d_4$ and two feature objects $f_1$ and $f_2$, which are denoted as triangles and rectangles, respectively. The data objects $d_1$ and $d_2$, which lie in the same sequence, are grouped and converted to the data segment $\overline{d_1 d_2}$, and the data objects $d_3$ and $d_4$ are converted to $\overline{d_3 d_4}$. Observe that the grouping of $\overline{d_3 d_4}$ is not valid because no path exists that connects $d_3$ to $d_4$, as shown in Figure 9.
Another issue is the indexing of data segment and feature object pairs in the R-tree for directed road networks. Let $dseg \otimes f$ denote a pair that consists of a data segment $dseg$ and a feature object $f$. To index a pair in the R-tree, it is necessary to compute the minimum and maximum distances between the data segment $dseg$ and the feature object $f$. In Figure 9, according to the ALPS computation, $mindist(\overline{d_1 d_2}, f_1) = 2$ and $maxdist(\overline{d_1 d_2}, f_1) = 4$, whereas the actual $maxdist(\overline{d_1 d_2}, f_1) = 6$. ALPS computes the minimum and maximum distances between the data segment and the feature object based on the assumption that $dist(d_1, d_2) = dist(d_2, d_1)$, but this assumption does not hold in a directed road network, where $dist(d_1, d_2) \neq dist(d_2, d_1)$. Thus, ALPS generates an erroneous R-tree, which leads to incorrect results. The above example demonstrates that ALPS does not work for directed road networks. For ALPS to answer preference queries in a directed road network, the method for grouping and for computing $mindist(dseg, f)$ and $maxdist(dseg, f)$ should be modified to consider the particular orientation of each road segment.
Comparing ALPS and TOPS conceptually, ALPS groups the data objects into data segments, whereas TOPS groups the feature objects. ALPS first groups the data objects into data segments and then prunes the dominated pairs, which may leave redundant pairs in the R-tree. In contrast, TOPS first prunes the pairs to avoid any redundant pair and then groups them based on pivot nodes. Therefore, the query processing time can be high in ALPS. Similarly, due to a higher number of redundant pairs, ALPS might use more disk space for indexing the skyline sets. However, the index construction time of ALPS can be better than that of TOPS because fewer skyline sets need to be generated due to the grouping of data objects.

Top-k Spatial Preference Query Algorithm
In this section, we present the Top-k Spatial Preference Query Algorithm (TOPS) for top-k spatial preference queries. TOPS is appropriate for all three neighborhood conditions (range, nearest neighbor and influence), but we discuss the range constraint to simplify the explanation. We then present the modifications necessary for supporting the nearest neighbor and influence scores. Our algorithm processes the top-k preference query by sequentially accessing the data objects in descending order of their partial scores. To achieve this, TOPS retrieves the qualifying data objects one by one during query processing in descending order of their partial scores, which can rapidly produce a set of k data objects with the highest scores.
Algorithm 1 computes the top-k data objects with the highest scores by aggregating the partial scores of the data objects retrieved from each max heap $H_i$. For each skyline set $SKY(D \otimes G_i)$, we employ a max heap $H_i$ to traverse the data objects in descending order of their component scores.
Algorithm 1 (surviving steps of the listing): for each data object $d \in D_c$ do ...; if $t \ge u$ and the number of data objects in $D_k = k$ then break the while loop.
Finally, for each candidate object in $D_c$, the upper bound score $\gamma_{ub}^{\theta}(d)$ is computed, taking into account the feature sets in which $d$ has not been seen so far. The maximum $\gamma_{ub}^{\theta}(d)$ is then set to $u$ (lines 16-19). If $t \ge u$, then no newly observed data object will end up in $D_k$. Therefore, the algorithm terminates and returns $D_k$ if $t \ge u$ and the number of data objects in $D_k = k$, or if all the heaps are exhausted (lines 20-22).
To illustrate our proposed algorithm, let us consider the example presented in Figure 7, where the R-tree indexes the skyline set $SKY(D \otimes G_i)$ constructed from the example in Figure 2. Recall that hotels correspond to the data objects $D = \{d_1, d_2, d_3\}$ and cafes correspond to the feature objects $F = \{f_1, f_2, \ldots, f_{10}\}$. Suppose that the client issues the following top-k spatial preference query: "Find two hotels that are associated with a high-grade cafe located within a distance of 4". In this query, $k = 2$ and $r = 4$. After pruning and grouping, the final generated skyline set is shown in Figure 7a. The algorithm checks all the qualifying pairs based on the neighborhood condition $r = 4$. Three pairs $\mu_1$, $\mu_3$ and $\mu_4$ are retrieved one at a time from $R$ and pushed onto a max heap $H$. Thus, $H = \{\mu_1, \mu_3, \mu_4\} = \{\langle d, \gamma^{rng}(d) \rangle \mid \langle d_1, 0.9 \rangle, \langle d_3, 0.8 \rangle, \langle d_2, 0.7 \rangle\}$. Finally, $d_1$ and $d_3$ are selected as the top-2 query result because they have scores of 0.9 and 0.8, respectively.
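The termination test on $t$ and $u$ can be illustrated with a generic threshold-style aggregation in Python. This sketch is not the listing of Algorithm 1: it assumes that the overall score is the sum of the partial scores, that each heap is given as a precomputed list of (data object, partial score) entries, and that the heaps are read in round-robin order. The function name `top_k` and the second feature set are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def top_k(lists: List[List[Tuple[str, float]]], k: int) -> List[Tuple[str, float]]:
    """Return k objects with the highest aggregate score (scores are lower bounds)."""
    # heapq is a min heap, so partial scores are negated to pop the largest first.
    heaps = [[(-s, d) for d, s in entries] for entries in lists]
    for h in heaps:
        heapq.heapify(h)
    seen: Dict[str, Dict[int, float]] = {}  # object -> {heap index: partial score}
    last = [0.0] * len(heaps)               # best score an unseen object can still get
    ranked: List[Tuple[str, float]] = []
    while any(heaps):
        for i, h in enumerate(heaps):
            if h:
                neg, d = heapq.heappop(h)
                seen.setdefault(d, {})[i] = -neg
                last[i] = -neg
            else:
                last[i] = 0.0  # exhausted heap: objects not seen here score 0
        lower = {d: sum(parts.values()) for d, parts in seen.items()}
        ranked = sorted(lower.items(), key=lambda item: (-item[1], item[0]))
        top, rest = ranked[:k], ranked[k:]
        if len(top) == k:
            t = top[-1][1]  # threshold: k-th best lower bound
            u = sum(last)   # upper bound of an object not seen in any heap
            for d, lo in rest:
                ub = lo + sum(last[i] for i in range(len(heaps)) if i not in seen[d])
                u = max(u, ub)
            if t >= u:
                break  # no remaining object can enter the result
    return ranked[:k]


cafes = [("d1", 0.9), ("d3", 0.8), ("d2", 0.7)]
print(top_k([cafes], 2))  # [('d1', 0.9), ('d3', 0.8)]

restaurants = [("d3", 0.6), ("d1", 0.4), ("d2", 0.1)]  # hypothetical second feature set
print([d for d, _ in top_k([cafes, restaurants], 2)])  # ['d3', 'd1']
```

With the single heap of the running example, the loop stops after two retrievals and returns $d_1$ and $d_3$ without ever reading the entry of $d_2$.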
Algorithm 2 returns the data objects $d \in H_i$ one by one in descending order of their partial scores $\gamma_i^{rng}(d)$. Initially, the heap $H_i$ contains the root node of an R-tree. $H_i$ comprises records $rc$, each of which can be either a data object or an R-tree node. Each time, the record $rc$ with the highest partial score is popped from $H_i$. If $rc$ indicates an R-tree node (line 3), then the algorithm verifies whether the feature group satisfies the neighborhood condition $maxdist(d, g) \le r$. If the entry satisfies the neighborhood condition, it is added to $H_i$ (lines 4 and 5). If it does not satisfy the neighborhood condition, then a feature object $f \in g$ exists such that $dist(d, f) > r$. Therefore, each feature object $f \in g$ needs to be examined to verify that $dist(d, f) \le r$ (line 8). All the qualifying records are inserted into the heap $H_i$, and the highest score among the qualifying feature objects is assigned as the group score $s(g)$ (lines 9 and 10). Finally, when a data object $d$ is found, it is returned as $d_{high}$ with the highest partial score (lines 15 and 17).
Algorithm 2 can be adapted with minor modifications to the nearest neighbor and influence scores. For the nearest neighbor score, the pairs $d \otimes f$ in which $f$ is not the nearest neighbor of $d$ are pruned. Thus, during the construction of $SKY(d \otimes G_i)$, the pairs are flagged to indicate whether or not $f$ is the nearest neighbor of $d$ (a 1 bit if $f$ is the nearest neighbor, and a 0 bit otherwise). For the influence score, the radius $r$ is only used to compute the score, and the score of a feature object is reduced in proportion to its distance from the data object. Therefore, the verification conditions of Algorithm 2 (lines 4 and 8) are removed for the influence score. Thus, for each feature object $f \in g$, the component influence score is computed with respect to the feature object and a corresponding entry is added to $H_i$.

Incremental Maintenance
In this section, we discuss the incremental maintenance of the skyline set during the insertion, deletion and updating of data and feature objects. We use an adaptation of the branch-and-bound skyline (BBS) dynamic skyline algorithm for incremental maintenance of the skyline set. First, we update $DP(D, F_i)$, which is retrieved during the pruning phase, and the skyline set $SKY(D \otimes G_i)$ is then updated based on the updated $DP(D, F_i)$ set. Once the dominant set is updated, updating the groups and $SKY(D \otimes G_i)$ is straightforward. Insertions and deletions of data objects $d \in D$ are fairly simple and cost-effective. When a new data object $d_{new}$ is inserted into $D$, all of the dominant pairs $d_{new} \otimes f$ are added to the $DP(d, F_i)$ set and the feature objects with the same pivot node form a feature group $g_{new}$. Next, the pairs $(d_{new} \otimes g_{new})$ are inserted into the skyline set $SKY(D \otimes G_i)$. If a data object $d_{deleted}$ is deleted, then the pairs $(d_{deleted} \otimes G_i)$ are deleted from the skyline set $SKY(D \otimes G_i)$ and all of the pairs $d_{deleted} \otimes f$ are deleted from the $DP(d, F_i)$ set. Updates to the spatial location of a data object $d$ are processed as a deletion followed by an insertion.
Next, we discuss the insertion, deletion and update of feature objects. The scores of feature objects are usually updated more frequently than their spatial locations. Therefore, the most frequent maintenance operation is updating the score of a feature object. Updating the score of a feature object $f \in F_i$ can potentially affect the score of a group $g \in G_i$, which may affect the materialized skyline set $SKY(d \otimes G_i)$. However, the updating cost is not very high due to the dominance relationship. Let us assume that the score of a feature object $f_{updated}$ has been updated. As a consequence, one of two cases occurs: (1) the dominant set is still valid, or (2) the dominant set is no longer valid. In the first case, the $DP(d, F_i)$ set is valid, which means that the materialized skyline sets are also valid, so there is no need for maintenance and the score of $f_{updated}$ is simply updated. If $f_{updated}$ has the highest score in its group, then the score of the group is also updated. In the second case, we check the dominance relationship. Only the score of the feature object is updated, so the maintenance algorithm simply applies the static dominance relation to update the $DP(d, F_i)$ set. First, we check whether $f \succeq_s f_{updated}$ for some feature object $f$; if so, the pairs $d \otimes f_{updated}$ are removed from the $DP(d, F_i)$ set. Next, we check whether any feature object is dominated by $f_{updated}$. If $f_{updated} \succeq_s f$, then the pairs $d \otimes f$ are removed from the $DP(d, F_i)$ sets. Finally, the information regarding the groups (i.e., the maximum and minimum distances and $s(g)$) is modified based on the updated $DP(D, F_i)$ set, and $SKY(D \otimes G_i)$ is regenerated.
Let us consider the addition of a new feature object $f_{new}$ to the feature object set $F_i$. First, we need to execute the dominance check. If a pair $d \otimes f_{new}$ is dominated by any other pair in the $DP(d, F_i)$ set, it does not affect the dominant set and is simply discarded. However, if it is not dominated by any other pair, then the maintenance algorithm issues a query to retrieve all of the pairs that are dominated by $d \otimes f_{new}$. If no pair is retrieved, then $d \otimes f_{new}$ is simply added to the $DP(d, F_i)$ set; otherwise, the maintenance algorithm also removes from $DP(d, F_i)$ all of the existing pairs that are dominated by $d \otimes f_{new}$. Similarly, the information regarding the groups is changed based on the updated $DP(D, F_i)$ set, and $SKY(D \otimes G_i)$ is regenerated.
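The insertion case can be sketched for a single data object as follows. The function name `insert_feature`, the tuple layout and the new feature objects `f11` and `f12` are hypothetical; the sketch uses a linear scan where the paper issues an index query, and it omits the subsequent update of the groups and of the skyline set.

```python
from typing import List, Tuple

# (feature object, distance from the data object, static score)
Entry = Tuple[str, float, float]


def insert_feature(dp: List[Entry], new: Entry) -> List[Entry]:
    """Insert a new pair into the dominant set of one data object."""
    _, new_dist, new_score = new
    # Dominance check: discard the new pair if an existing pair is at least
    # as good in score and strictly closer.
    if any(s >= new_score and dist < new_dist for _, dist, s in dp):
        return list(dp)
    # Otherwise remove every existing pair that the new pair dominates.
    kept = [
        (f, dist, s)
        for f, dist, s in dp
        if not (new_score >= s and new_dist < dist)
    ]
    return kept + [new]


dp_d1 = [("f1", 3.0, 0.9), ("f4", 1.0, 0.5)]   # dist(d1, f4) = 1 is assumed
after_good = insert_feature(dp_d1, ("f11", 2.0, 0.95))
after_poor = insert_feature(dp_d1, ("f12", 4.0, 0.3))
print(after_good)  # [('f4', 1.0, 0.5), ('f11', 2.0, 0.95)]
print(after_poor)  # [('f1', 3.0, 0.9), ('f4', 1.0, 0.5)]
```

The first insertion replaces $f_1$ because the new object is both closer and better, while the second insertion is discarded because it is dominated by $f_1$.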
Next, we explain the maintenance of the skyline set after the deletion of feature objects. First, we need to check whether $d \otimes f_{deleted} \in DP(d, F_i)$. If $d \otimes f_{deleted} \notin DP(d, F_i)$, then no further processing is required; otherwise, the maintenance algorithm is called. For incremental maintenance, we need to determine the set of pairs $d \otimes f$ that are exclusively dominated by $d \otimes f_{deleted}$. If such pairs exist, then the dominant pairs among them are computed and added to the $DP(d, F_i)$ set; otherwise, $d \otimes f_{deleted}$ is simply removed from the $DP(d, F_i)$ set. Finally, as mentioned earlier, the information regarding the groups is modified based on the updated $DP(D, F_i)$ set, and $SKY(D \otimes G_i)$ is regenerated.
Finally, we analyze the time complexities of adding, deleting and updating a data object and a feature object. As mentioned earlier, incremental maintenance is performed on $d \otimes f$ instead of $d \otimes g$; therefore, we analyze the time complexity in terms of updating and maintaining $DP(D, F_i)$. Lastly, the time complexity of updating a feature object follows from those of deletion and insertion, because the update (i.e., a location or score update) of a feature object $f_u$ can be handled as a deletion followed by an insertion.

Performance Evaluation
In this section, we describe the performance evaluation of our proposed algorithm TOPS based on simulation experiments. In Section 6.1, we describe our experimental settings. Section 6.2 presents the experimental results for the query processing time. Section 6.3 studies the materialization and maintenance costs. Finally, in Section 6.4, we present the performance comparison of TOPS and ALPS+.

Experimental Settings
All of our experiments were performed using a real road network [30] that comprises the main roads of North America, with 175,812 nodes and 179,178 edges. According to the American Hotel and Lodging Association [31], at the end of 2014 there were 53,432 hotels in the United States, which correspond to the data objects in this study. All of the algorithms were implemented in Java and run on a desktop PC with a Pentium 2.8 GHz processor and 4 GB of memory. The datasets were indexed by R-trees with a page size of 4 KB. Our results are the average values obtained from 20 experiments. In all of the experiments, we measured the total query processing time with respect to various parameters, as shown in Table 5. In each experiment, we varied only one parameter, whereas the others remained fixed at the bold default values.
We implemented and evaluated two versions of TOPS: TOPS$_{gr}$ and TOPS$_{in}$. TOPS$_{gr}$ groups the feature objects and then generates and stores the skyline set $SKY(D \otimes G_i)$, whereas TOPS$_{in}$ does not group the feature objects but instead generates and stores the skyline set over the data and feature object pairs, $SKY(D \otimes F_i)$. We compare both versions of TOPS with the Period approach, which computes the score for every data object by using the incremental network expansion (INE) and range network expansion (RNE) algorithms [32] to compute the nearest neighbor and range scores, respectively. The INE algorithm finds the k nearest neighbors in road networks using Dijkstra's algorithm [33]. The RNE algorithm is similar to the INE algorithm, except that it explores the network within a distance $r$ from a query point. We slightly modified the RNE algorithm in order to compute the influence scores of data objects. The Period method does not use any materialization scheme.

Experimental Results for Query Processing Time
Figure 10 shows the query processing times for TOPS$_{gr}$, TOPS$_{in}$ and the Period method for the range condition. Figure 10a shows the query processing time as a function of the number $k$ of requested data objects with the highest scores. The Period method incurs a constant query processing time regardless of the value of $k$ because it explores all the feature objects within the query range $r$ of each data object. In contrast, the query processing time of TOPS$_{gr}$ and TOPS$_{in}$ increases slightly with the value of $k$. Nevertheless, TOPS$_{gr}$ and TOPS$_{in}$ outperform the Period method. Figure 10b shows the query processing time as a function of the number $m$ of feature datasets. The query processing time of all the algorithms increases with the value of $m$, but that of the Period method increases more rapidly than those of TOPS$_{gr}$ and TOPS$_{in}$. This is mainly because TOPS$_{gr}$ and TOPS$_{in}$ use the materialized skyline sets, thereby reducing the computational overhead. Figure 10c compares the query processing times of Period, TOPS$_{gr}$ and TOPS$_{in}$ for different values of $r$. The experimental results reveal that the computational time increases for all of the algorithms as the range $r$ increases. This is mainly because the search space grows with $r$.
Figure 10d,e show the performance of Period, TOPS$_{gr}$ and TOPS$_{in}$ for different values of $N_D$ and $N_F$, respectively. The query processing time of Period increases with the values of $N_D$ and $N_F$ because the Period method investigates all of the feature objects within the query range of each data object. TOPS$_{gr}$ and TOPS$_{in}$ exhibit similar trends to each other because both algorithms explore the pairs in the skyline sets sequentially in descending order of the range score. However, TOPS$_{gr}$ scales better than TOPS$_{in}$ due to grouping, which leaves fewer pairs to investigate. Note that according to Figure 10e, increasing $N_F$ has little impact on the performance of TOPS$_{gr}$ and TOPS$_{in}$ because both materialize only the dominant pairs, and thus the number of pairs is not affected significantly by increasing $N_F$.
Figure 11 shows the query processing times for TOPS$_{gr}$, TOPS$_{in}$ and the Period method for the nearest neighbor condition. Figure 11a illustrates the effect of various values of $k$ on the query processing time of all the algorithms. Both TOPS$_{gr}$ and TOPS$_{in}$ clearly outperform the Period method, although the query processing time of Period is stable regardless of the value of $k$. This is mainly because the Period method continues the network expansion until the closest feature object is found for each data object. Figure 11b shows the query processing time as a function of the value of $m$. The query processing times increase rapidly with $m$ for all of the methods. However, TOPS$_{gr}$ and TOPS$_{in}$ perform better than Period in all cases. Figure 11c shows the effect of the number of data objects on the query processing times of Period, TOPS$_{gr}$ and TOPS$_{in}$. Observe that all three algorithms are sensitive to the number of data objects, but TOPS$_{gr}$ significantly outperforms Period, while TOPS$_{in}$ is comparable to TOPS$_{gr}$.
Figure 12 shows the query processing times for TOPS$_{gr}$, TOPS$_{in}$ and the Period method for the influence condition. Figure 12a illustrates the query processing time as a function of the value of $k$. According to Figure 12a, TOPS$_{gr}$ and TOPS$_{in}$ clearly outperform Period regardless of the value of $k$. As shown in Figure 12b, we varied the number of feature sets $m$, and the experimental results demonstrate that the query processing times increase for all three methods as the value of $m$ increases. TOPS$_{gr}$ clearly outperforms TOPS$_{in}$ in each case. Figure 12c shows the effect of $r$ on the query processing time. Notice that the query processing times of all three algorithms are sensitive to an increase in the range $r$ because the search space increases. In Figure 12d,e, we illustrate the effects on the query processing time of varying $N_D$ and $N_F$, respectively, which indicate that TOPS$_{gr}$ and TOPS$_{in}$ are always faster than Period, irrespective of the values of $N_D$ and $N_F$, because both algorithms employ materialized skyline sets.

Experimental Results for Materialization and Incremental Maintenance Costs
In this section, we compare only TOPS gr and TOPS in because the baseline method does not use any materialization or incremental maintenance scheme. Figure 13 shows the index construction time of TOPS gr and TOPS in for various cardinalities of data and feature objects. The index construction time of both methods increases with N D and N F , mainly because the number of pairs to be indexed increases with N D and N F . However, owing to the grouping technique, TOPS gr performs better in all cases. In Figure 14, we study the effect of the number of data objects and feature objects on the index size of TOPS gr and TOPS in . As shown in Figure 14a,b, the index size increases with the number of data and feature objects, respectively. However, TOPS gr consumes much less space than TOPS in because the grouping technique reduces the number of pairs to index.
Figure 15 compares the average elapsed times for inserting a data object and deleting a feature object. To measure these times, the insertion of a data object and the deletion of a feature object were each performed 500 times while all other parameters remained the same. As shown in Figure 15, the maintenance time of TOPS gr is slightly longer than that of TOPS in because, as mentioned in Section 5.1, DP(D, F i ) is updated first and SKY(D ⊗ G i ) is then updated accordingly. Figure 15a shows that the insertion time for a data object is not significantly affected by the number of data objects. This is because the leading factor in the insertion time is the generation of the dominant pairs of the new data object, whose cost is the same regardless of the number of data objects. The only factor that causes the slight increase in insertion time is the update of the materialized dominant set, which increases with N D . Figure 15b shows that deletions of feature objects are more expensive than insertions of data objects and that the average deletion time of a feature object is sensitive to the number of feature objects. This is mainly because the number of pairs that are exclusively dominated by the pairs of the deleted feature object increases with N F . Thus, the dominant sets for more new pairs must be determined, resulting in an increase in time.

Comparison of TOPS and ALPS +
In this section, we present a performance comparison of TOPS and ALPS + . Note that, in this section, TOPS gr is referred to as TOPS. As discussed earlier, ALPS was originally designed for processing preference queries in undirected road networks. To make a fair comparison, we modified ALPS to process top-k spatial preference queries in directed road networks; we call the modified version ALPS + . Specifically, we made two major modifications: first, we assume that only data objects that reside on bidirectional adjacent edges can be grouped together to create a data segment; second, we modified the technique for computing the distance between data segments and feature objects.
Figure 16 shows the query processing times of TOPS and ALPS + for the range condition. Figure 16a studies the effect of k on the query processing time of TOPS and ALPS + , whereas Figure 16b shows the effect of r on the performance of both algorithms. The experimental results reveal that the query processing time of both methods increases with k and r. However, TOPS clearly outperforms ALPS + in each case because the number of data and feature object pairs in ALPS + is higher than in TOPS. The main reason is that ALPS + first groups the data objects and then prunes the pairs based on the dominance relation, which may retain redundant pairs, whereas our proposed method first prunes the dominated pairs and then groups them to remove any redundant pairs.
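The pruning step that TOPS applies before grouping can be illustrated with a minimal sketch. It assumes that each pair of a data object d and a feature object f is reduced to a point (distance, score), and that one pair dominates another when it is no farther and scores strictly higher, or is strictly closer and scores at least as high. The function name `dominant_pairs` and the sample values are hypothetical and are not taken from the paper; grouping is not shown.

```python
def dominant_pairs(pairs):
    """Return the pairs of one data object that no other pair dominates.

    pairs: iterable of (feature_id, dist, score) tuples (illustrative layout).
    """
    # Sort by distance ascending, then by score descending, so that any
    # potential dominator of a pair is visited before the pair itself.
    ordered = sorted(pairs, key=lambda p: (p[1], -p[2]))
    kept = []
    best_score = float("-inf")  # highest score seen so far
    best_dist = None            # distance at which best_score was first seen
    for fid, dist, score in ordered:
        if score > best_score:
            # No closer (or equally close) pair has a score this high.
            kept.append((fid, dist, score))
            best_score, best_dist = score, dist
        elif score == best_score and dist == best_dist:
            # Identical point: neither pair dominates the other.
            kept.append((fid, dist, score))
    return kept


# Hypothetical pairs for a single data object.
pairs = [("f1", 2.0, 0.6), ("f2", 3.0, 0.9), ("f3", 5.0, 0.7), ("f4", 1.0, 0.4)]
result = dominant_pairs(pairs)
```

Here f3 is pruned because f2 is both closer and better scored, so only three of the four pairs would need to be materialized.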

Conclusions
In this paper, we studied top-k spatial preference queries in directed road networks. We proposed a new approach called TOPS to enhance the performance of top-k spatial preference queries in directed road networks. Our approach is based on the pruning and grouping of feature objects, thereby minimizing the number of subsets of pairs required to rank the data objects. Skyline pairs that are not dominated by other pairs are mapped onto the distance-score space, and a skyline set is then generated and indexed in an R-tree. To achieve this, we presented mathematical formulae for determining the minimum and maximum distances between a data object and a feature group. Furthermore, we proposed an efficient algorithm for processing top-k spatial preference queries while ensuring that the materialized information is kept up to date.
For experimental evaluation, we implemented two versions of TOPS, TOPS gr and TOPS in , and compared them with the Period approach. To be precise, TOPS gr uses materialized data and feature group pairs, whereas TOPS in uses the materialized data and feature object sets. Based on our experimental findings, both TOPS gr and TOPS in significantly outperform the Period approach in terms of query processing time for various parameters. TOPS gr and TOPS in are comparable in terms of query processing time, but TOPS gr is superior in terms of materialization costs.

Figure 1. Example of top-k spatial preference queries in a directed road network.
3 and d 2 ⊗ f 1 are dominated pairs. Similarly, Figure 3c shows the mapping of d 3 ⊗ F i , and it is clear that both pairs d 3 ⊗ f 10 and d 3 ⊗ f 9 are not dominated by any other pair.

Figure 3. Mapping of D ⊗ F i to the distance-score space. (a) d 1 ⊗ F i ; (b) d 2 ⊗ F i ; (c) d 3 ⊗ F i .

4.3. Mapping to Distance-Score Space
In this section, we formally define the search space of the top-k spatial preference queries by defining a mapping of the data objects d and any feature group g to a distance-score space. Let d ⊗ g denote a pair comprising a data object d ∈ D and a feature group g ∈ G i ; then d ⊗ g is represented as {[mindist(d, g), maxdist(d, g)], s(g)}. Each d ⊗ g pair is mapped to either a point or a line segment in the distance-score space M, defined by the axes dist(d, g) and s(g), where dist(d, g) corresponds to the distance between data object d and feature group g, and s(g) corresponds to the score of g.
Definition 2 (Mapping of D ⊗ G i to M): The mapping of pairs d ⊗ g comprising a data object d ∈ D and a feature group g ∈ G i to the 2-dimensional space M (called distance-score space) is D
which is equivalent to either $\mathrm{maxdist}(d, g_a) \le \mathrm{mindist}(d, g_b)$ and $s(g_a) > s(g_b)$, or $\mathrm{maxdist}(d, g_a) < \mathrm{mindist}(d, g_b)$ and $s(g_a) \ge s(g_b)$. Hence, the partial score of $d$ is $\gamma^{\theta}_i(d) = s(g_a)$ if $\theta = rng$ or $\theta = nn$, and $\gamma^{\theta}_i(d) = s(g_a) \times 2^{-\mathrm{maxdist}(d, g_a)/r}$ if $\theta = inf$. This contradicts our assumption that $g_b$ contributes to $\gamma^{\theta}_i(d)$. Therefore, $SKY(d \otimes G_i)$ is sufficient for obtaining the component score of a data object $d \in D$.
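The dominance condition and the partial score expressions used in this argument can be checked numerically with a small sketch. It assumes that each pair is stored as a (mindist, maxdist, score) triple; the names `dominates` and `partial_score` and the sample values are illustrative and do not come from the paper.

```python
def dominates(a, b):
    """True if pair a dominates pair b in the distance-score space.

    a, b: (mindist, maxdist, score) triples for d (x) g_a and d (x) g_b.
    """
    _, a_max, a_score = a
    b_min, _, b_score = b
    return ((a_max <= b_min and a_score > b_score)
            or (a_max < b_min and a_score >= b_score))


def partial_score(score, maxdist, r, theta):
    """Partial score contributed by a group, following the expressions above."""
    if theta in ("rng", "nn"):
        return score
    if theta == "inf":
        # Influence condition: the score decays with distance relative to r.
        return score * 2 ** (-maxdist / r)
    raise ValueError("theta must be 'rng', 'nn' or 'inf'")


# Hypothetical groups: g_a lies entirely closer to d and has the higher score.
g_a = (1.0, 2.0, 0.9)
g_b = (3.0, 4.0, 0.7)
r = 4.0
score_a = partial_score(g_a[2], g_a[1], r, "inf")
score_b = partial_score(g_b[2], g_b[1], r, "inf")
```

With these values g_a dominates g_b and also yields the larger partial score under every condition, which is the property the proof relies on.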

Figure 9. Example of ALPS in directed road network.
where there are four data objects d 1 , d 2 , d 3 and d 4 and two feature objects f 1 and f 2 , which are denoted as triangles and rectangles, respectively. The data objects d 1 and d 2 , which lie in the same sequence, are grouped and converted to the data segment d 1 d 2 , and data objects d 3 and d 4 are converted to d 3 d 4 . Observe that the grouping d 3 d 4 is not valid because no path exists that connects d 3 to d 4 , as shown in Figure 9.

Figure 10. Comparison of the query processing time for θ = rng. (a) Effect of k; (b) Effect of m; (c) Effect of r; (d) Effect of N D ; (e) Effect of N F .
compares the query processing time of Period, TOPS gr and TOPS in with different values of N F , which indicates that both TOPS gr and TOPS in scale better than Period.

Figure 11. Comparison of the query processing time for θ = nn. (a) Effect of k; (b) Effect of m; (c) Effect of N D ; (d) Effect of N F .

Figure 12. Comparison of the query processing time for θ = inf. (a) Effect of k; (b) Effect of m; (c) Effect of r; (d) Effect of N D ; (e) Effect of N F .

Figure 13. Index construction time. (a) Effect of N D ; (b) Effect of N F .

Figure 14. Index size. (a) Effect of N D ; (b) Effect of N F .

Figure 15. Incremental maintenance cost. (a) Effect of N D on insertion time of data object; (b) Effect of N F on deletion time of feature object.

Figure 16. Comparison of TOPS and ALPS + for θ = rng. (a) Effect of k on query processing time; (b) Effect of r on query processing time.

Table 1. Summary of notations used in this study.
n } and a set of m feature dataset

Table 2. Computation of the scores of data objects d 1 , d 2 , and d 3 .
The set of pairs in d ⊗ F i that are not dominated by any other pair is referred to as the dominant set DP(d, F i ).
Lemma 1: A feature object f is a dominant object if and only if for any other feature object f for which f s

Table 3. Summary of the grouping of feature objects.

Table 4. Summary of d ⊗ g in Figure
Proof: Let us assume that $SKY(d \otimes G_i)$ is not sufficient for obtaining the partial score $\gamma^{\theta}_i(d)$ of a data object $d \in D$. This means that there is a feature group $g_b$ that contributes to $\gamma^{\theta}_i(d)$. Now if $\theta = rng$ or $\theta = nn$, then $\gamma^{\theta}_i(d) = s(g_b)$, and if $\theta = inf$, then $\gamma^{\theta}_i(d) = s(g_b) \times 2^{-\mathrm{maxdist}(d, g_b)/r}$.
When the NextHighestRangeScoreObject(H i , r) method is called, the data object d high with the highest component score γ θ i (d high ) is popped from the max heap H i . Let D k be the current top-k set and R i be the most recent component score seen in H i . In addition, TOPS maintains a list of candidate data objects D c , which may become top-k data objects. R i is set to γ θ i (d high ) (line 6), and the lower bound score r θ lb (d high ) is updated using the aggregate function (line 7). If the number of data objects in D k is less than k, or r θ lb (d high ) is greater than the k-th highest score of the data objects in D k , then d high is added to D k . If d high is already in D c , it is removed from D c . If the number of data objects in D k reaches k + 1, the data object with the lowest r θ lb is moved from D k to D c (lines 8-12). Then, t is set to the lowest r θ lb of the data objects in D k (line 13). The upper bound score γ θ ub (d) is computed for each data object d ∈ D c .
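The bookkeeping described above (the top-k set D k , the candidate list D c , lower bounds, and the threshold t) can be sketched as follows. This is a simplified illustration under several assumptions that the text does not confirm: the aggregate function is a sum, a data object missing from a heap contributes a component score of 0, the upper bound of an object is its lower bound plus the most recent score R i of every heap in which it has not yet been seen, and processing stops once no candidate or unseen object can exceed t. The function name `tops_topk` and the sample heaps are hypothetical.

```python
import heapq


def tops_topk(heaps, k):
    """heaps: m lists of (component_score, data_id), one per feature set."""
    m = len(heaps)
    # heapq is a min heap, so scores are negated to emulate a max heap.
    H = [[(-s, d) for s, d in h] for h in heaps]
    for h in H:
        heapq.heapify(h)
    lb = {}      # lower bound per data object (sum of scores seen so far)
    seen = {}    # data object -> indices of the heaps it was popped from
    R = [max((s for s, _ in h), default=0.0) for h in heaps]
    top, cand = set(), set()   # D_k and D_c
    while any(H):
        for i in range(m):
            if not H[i]:
                R[i] = 0.0     # exhausted heap: nothing more to gain
                continue
            neg, d = heapq.heappop(H[i])
            R[i] = -neg
            lb[d] = lb.get(d, 0.0) + R[i]
            seen.setdefault(d, set()).add(i)
            if d in top:
                continue
            t = min(lb[x] for x in top) if len(top) >= k else float("-inf")
            if len(top) < k or lb[d] > t:
                top.add(d)
                cand.discard(d)
                if len(top) == k + 1:
                    worst = min(top, key=lambda x: lb[x])
                    top.remove(worst)
                    cand.add(worst)
            else:
                cand.add(d)
        if len(top) == k:
            t = min(lb[x] for x in top)
            # Assumed upper bound: lower bound plus R_i of every unseen heap.
            ub = lambda x: lb[x] + sum(R[j] for j in range(m) if j not in seen[x])
            if sum(R) <= t and all(ub(x) <= t for x in cand):
                break
    return sorted(top, key=lambda x: -lb[x])


# Hypothetical component scores for two feature sets.
heaps = [
    [(0.9, "d1"), (0.5, "d2"), (0.2, "d3")],
    [(0.8, "d2"), (0.3, "d1"), (0.1, "d3")],
]
top1 = tops_topk(heaps, 1)
top2 = tops_topk(heaps, 2)
```

In this example d3 is never popped: after two rounds its best possible total can no longer exceed t, so the loop stops early.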
Algorithm 1: TOPS(H i , k, r)
Input: H i : a max heap with entries in descending order of partial range score; k: number of requested data objects with highest score; r: range constraint
7: r θ lb (d high ) ← r θ lb (d high ) + γ θ i (d high )
8: If number of data objects in D k < k or r θ lb (d high ) > t then
9:

Algorithm 2: NextHighestRangeScore(H i , r)
Input: H i : a max heap with entries in descending order of partial range score; r: range constraint
Output: The next data object in H i with highest partial score
The time complexity of adding a data object d a is O(m|DP(d a , F i )||d a ⊗ F i | + m|DP(d a , F i )|log|DP(D, F i )|), where m is the number of feature datasets. Specifically, the dominant set of d a is generated for each of the m feature datasets, which has a time complexity of O(|DP(d a , F i )||d a ⊗ F i |). Then, the dominant set DP(D, F i ) is updated, which has a time complexity of O(|DP(d a , F i )|log|DP(D, F i )|). The time complexity of deleting a data object d d is O(m|DP(d d , F i )|log|DP(D, F i )|). Thus, the time complexity of updating a data object d u is O(m|DP(d u , F i )||d u ⊗ F i | + m|DP(d u , F i )|log|DP(D, F i )|) because updating a data object d u can be handled by a deletion of d u followed by an insertion. The time complexity of adding a feature object f a is O(|DP(D, F i )|). This is because, for each pair d ⊗ f dominant ∈ DP(D, F i ), a dominance check is performed to verify whether d ⊗ f a dominates d ⊗ f dominant or is dominated by it. Next, we analyze the time complexity of deleting a feature object f d . Let d ⊗ F d i be the set of d ⊗ f pairs that are exclusively dominated by d ⊗ f d ∈ DP(D, F i ). For each data object d ∈ D, the exclusive dominance region for d ⊗ f d is determined, which has a time complexity of O(|DP(d, F i )|). Then, the dominant set for the pairs that are exclusively dominated by d ⊗ f d is determined, which has a time complexity of O(|DP(d, F d i )||d ⊗ F d i |). Thus, the time complexity of deleting a feature object f d is O