Efficient Geometric Pruning Strategies for Continuous Skyline Queries

Jiping Zheng 1,2,3,*, Jialiang Chen 1 and Haixiang Wang 1 1 College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; chenjialiang@nuaa.edu.cn (J.C.); wanghaixiang@nuaa.edu.cn (H.W.) 2 Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 211106, China 3 School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW 2052, Australia * Correspondence: jzh@nuaa.edu.cn; Tel.: +86-25-8489-6490 (ext. 16116)


Introduction
The skyline query [1,2] is a useful operation for many important applications, including multi-criteria decision making. Given two certain, multi-dimensional tuples u and v, u dominates v iff u is no worse than v in all dimensions and strictly better than v in at least one dimension. Because smartphone use is growing rapidly and inexpensive positioning devices are widely available, location-based services (LBS) are increasingly popular. In LBS, skyline queries depend on the current location of the user, which changes continuously as the user moves. For example, a tourist looking for restaurants may want restaurants that are close to her/his location, cheap, and well reviewed. Since the distances between the tourist and the restaurants change as the tourist travels, the skyline needs to be updated continuously. In addition, the tourist may have a destination in mind and move towards it. As shown in Figure 1, the user at time $t_0$ (i.e., at position $q(t_0)$) moves in an upright direction (users may move in arbitrary directions, but at any given moment a user moves in only one direction). At the next time $t_1$, the user arrives at position $q(t_1)$. The ideal route may be the red one. However, for various reasons, such as picking up a friend, vehicle maintenance, or traffic congestion, the actual route may be the yellow or the blue one. In other real-time applications such as e-games and digital battle systems, the route between a player and her destination may not be straight either; instead, it is tortuous, like the routes in the restaurant example. While the player moves, she should keep an eye on the guardians who are close and most dangerous to her in terms of energy, weapons, etc. To find suitable restaurants during travel, or to escape guardians while moving to the player's destination, the intuitive approach to updating the skyline query results is to recompute them from scratch with an efficient algorithm such as branch-and-bound skyline (BBS) [3,4]. A more efficient solution, however, is to cache the last computed skyline results and to examine only those data objects that may enter or leave the skyline.
Note that existing approaches always assume that the motion of the query point is continuous and exactly calculable. Huang et al. [5] assumed that the query point moves continuously with a known velocity $(v_x, v_y)$. Lin et al. [6] and Guo et al. [7] modeled the motion of the query point as a line, and Lin et al. [6] also assumed that the query point moves within a certain range. For location privacy, the user's query point is sometimes assumed to vary within a disk region [8]. In this paper, we address motion that is presented incrementally over a series of discrete time steps, which is more practical for real applications.
Since discrete motion patterns suit moving points better, we adopt the incremental motion model for continuous skyline queries; that is, query points move incrementally in discrete time steps. Unlike the motion models in [5–8], given the drift error bound and the velocity estimate, the incremental motion model places no restrictions on the motion or on its predictability (although a direction is given in this paper, we place no restrictions on moving directions). Under the incremental motion model, we use geometric properties to prune the regions whose data objects cannot be in the final skyline results. To avoid recomputing the skyline from scratch while the query point moves, we maintain a data structure similar to kinetic data structures (KDSs) [9,10], which are well known in computational geometry. A KDS maintains a desired relationship among data by storing the data in structures specific to that relationship, and its contents do not change unless the relationship between some data points changes. Our data structure keeps a list of the skyline results at each time step. When the query point moves, the data structure decides which data objects enter or leave the skyline; it is implemented with an event-driven mechanism. Figure 2 illustrates the framework for processing continuous skyline queries under the incremental motion model. First, the input dataset is pruned using geometric properties of the incremental motion model. Then, we compute the initial skyline on the pruned dataset.
Event-driven mechanisms are then used to compute the continuous skyline results as the query point moves. In summary, our key contributions are as follows:
• We adopt an incremental motion model for continuous skyline queries that neither restricts the motion nor makes any predictions.
• By exploiting the geometric properties of a query point moving under the incremental motion model, we prune data points that cannot affect the skyline results, which accelerates the processing of continuous skyline queries.
• Instead of exact dominance, we propose probabilistic dominance under the incremental motion model for two data points that may dominate each other. By setting different thresholds on this probability, we determine the final query results, which is more realistic in real applications.
The rest of the paper is organized as follows. Section 2 introduces related work. Preliminaries are given in Section 3. Section 4 presents the geometric pruning strategies. Data structures and event-driven processing mechanisms are provided in Section 5. Extensions to specific motion patterns are proposed in Section 6. Results of comprehensive performance studies are discussed in Section 7. Section 8 concludes the paper.

Related Work
Skyline Queries. The skyline operator was introduced to the database community by Borzsonyi et al. [1] in 2001. Subsequent studies have focused on efficient skyline query processing. Tan et al. [11] developed bitmap and index techniques. Chomicki et al. [12] developed the sort-filter-skyline (SFS) algorithm, which improves Block Nested Loop (BNL) by pre-sorting the dataset. Several optimizations of SFS (e.g., [13]) further increase its efficiency. Kossmann et al. [14] presented a nearest neighbor (NN) algorithm that allows users to change preferences at runtime. Papadias et al. [4] proposed a progressive algorithm called branch-and-bound skyline (BBS), based on a nearest neighbor search technique supported by R-trees. Variations of the skyline operator have also been explored, such as skylines in distributed environments and road networks, skyline cubes, reverse skylines, and approximate skylines, to name a few; see [2] and [15] for more extensions. However, the above studies only consider static query points on static attributes. Sharifzadeh et al. [16] first introduced spatial skyline queries, which Deng et al. [17] call multi-source skyline queries in road networks. These works divide the attributes of data objects into two categories, spatial and non-spatial attributes, also called dynamic and static attributes (or dimensions). They also consider only static query points on dynamic attributes.

Motion Modeling and Skyline Processing. Another related area is monitoring continuous motion using kinetic data structures (KDSs) in computational geometry. Basch et al. [9] first proposed a conceptual framework for KDSs to continuously maintain evolving attributes of mobile data. A basic assumption of the KDS framework is that object trajectories are known. More recently, several efforts have dealt with data under much less restricted models of motion. Mount et al. [18] studied the maintenance of geometric structures in a setting where the trajectories are unknown. They separated the concerns of tracking the points and updating the geometric structure into two modules: the motion processor (MP) and the incremental motion algorithm (IM). They further presented a simple online model in which two agents, an observer and a builder, cooperate to maintain the incremental motion [19].
Motion considered in skyline queries includes moving query points and moving data objects. Huang et al. [5] proposed a kinetic data structure to update the skyline results, where the query point moves according to a predefined motion pattern (i.e., uniform motion along a straight line). Lee et al. [20] studied a similar problem. However, both approaches rely on the assumption that the velocities of the moving points are known. Unfortunately, this assumption does not hold in many real-world applications where the points (e.g., cars, tourists) frequently change their motion patterns (e.g., speed and direction). Furthermore, extending their techniques to scenarios with unknown velocities is non-trivial. Lin et al. [6] assumed that the query point lies in a predefined spatial range instead of at an exact location, or moves along a line segment. To handle the movement of the querying objects, an incremental version of the line-based skyline solution has been devised to reduce both the result set size and the computation cost. Hsueh et al. [21] presented an algorithm to update the skyline when data objects change their attribute values. Cheema et al. [22] proposed a safe-zone-based approach for monitoring moving skyline queries that allows queries to move arbitrarily; the query results need to be updated only when the query leaves its safe zone. In that framework, when a query point moves in the interval between two adjacent events, its trajectory stays within a safe zone. Vu et al. [23] introduced uncertainty on both the query point and Points Of Interest (POIs) for spatial skyline queries, represented as square disks with varied radii. Compared with this method, our incremental motion model considers not only the uncertainty (represented by a disk whose radius is determined by the maximum speed) but also the direction of motion. Another difference from these studies is that, as a preprocessing step, we use geometric properties to prune the data points that cannot belong to the final skyline results, which accesses fewer data points and thus saves CPU time. The authors of [3,4,24–26] studied dynamic skylines. However, [3,4,24] use the branch-and-bound method (BBS) to compute the skyline results from scratch. References [25,26] use a caching mechanism to accelerate dynamic/constrained skyline query processing; however, for continuous skyline queries, they rely on past queries and do not fully exploit the query results of the previous moment. In addition, the methods of Sacharidis et al. [25] are not practical due to the shortcomings of their bitmap coding mechanism.

Preliminaries
When the query point moves, the distances between it and the data points need to be estimated for further processing of skyline queries. In this section, we first estimate the location of the query point under the incremental motion model and then evaluate distance dominance between the data points for computing the skyline results. Table 1 summarizes the notation frequently used throughout the paper.

Problem Definition
Given $n$ data points in the dataset $D$, each point has $d$ non-spatial attributes, also called static attributes. Each point $p_i$ is stored as $(p(p_i), p_{i,1}, p_{i,2}, \ldots, p_{i,d})$, where $p(p_i)$ is the location of $p_i$ and $p_{i,j}$ is its $j$th static attribute. Thus, $p_i$ can be represented as $p_i = (p_{i,1}, p_{i,2}, \ldots, p_{i,d}, p_{i,d+1})$, where $p_{i,d+1}$ is the distance between the query point $q$ and $p_i$. Smaller values are preferred in every dimension.

Definition 1. (Static Dominance) For two data points $p_i$ and $p_j$, if $p_{i,k} \le p_{j,k}$ for all $k \in \{1, 2, \ldots, d\}$ (i.e., all attributes except the distance attribute) and $p_{i,k} < p_{j,k}$ for at least one $k$, we say that $p_i$ statically dominates $p_j$, denoted $p_i \prec_{sta} p_j$.

Definition 2. (Complete Dominance) For two data points $p_i$ and $p_j$, if $p_{i,k} \le p_{j,k}$ for all $k \in \{1, 2, \ldots, d+1\}$ and $p_{i,k} < p_{j,k}$ for at least one $k$, we say that $p_i$ completely dominates $p_j$, denoted $p_i \prec p_j$.
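As a minimal sketch of Definitions 1 and 2 (not the authors' implementation), a point can be held as a tuple of its $d$ static attributes followed by its distance to $q$. The function names below are hypothetical, and smaller values are assumed to be better, as in the definitions.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a is no worse than b everywhere and strictly better somewhere (smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def statically_dominates(p_i: Sequence[float], p_j: Sequence[float], d: int) -> bool:
    """Definition 1: compare only the first d (static) attributes."""
    return dominates(p_i[:d], p_j[:d])


def completely_dominates(p_i: Sequence[float], p_j: Sequence[float]) -> bool:
    """Definition 2: compare all d + 1 attributes, the last being the distance to q."""
    return dominates(p_i, p_j)


# Each point is (static attribute 1, static attribute 2, distance to q), so d = 2.
a = (1.0, 2.0, 5.0)
b = (1.0, 3.0, 0.5)
print(statically_dominates(a, b, d=2))  # True: a is better on attribute 2
print(completely_dominates(a, b))       # False: b is closer to q
```

Keeping the distance as the last coordinate lets both definitions share one dominance test.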
Although skyline processing in our problem involves both spatial and static attributes, some data points are always in the skyline no matter how the query point moves, because their non-spatial attributes guarantee that no other data point can dominate them. We denote this subset of skyline points as $S_{sta}$, the final skyline results as $S$, and the difference $S - S_{sta}$ as $S_{chg}$, since the points in $S_{chg}$ may stop being skyline points when the query point moves. $S_{chg}$ consists of two parts: $S_{in}$, which contains the data points from the non-skyline dataset $D - S$ that become skyline points when the query point moves, and $S_{remain}$, which contains the skyline points that do not move out of $S$ when the query point moves.
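The sets $S$, $S_{sta}$, and $S_{chg}$ can be illustrated for one fixed query position with a brute-force $O(n^2)$ sketch. It assumes that $S_{sta}$ is the skyline over the static attributes alone and that no skyline point shares identical static attributes with another point (with such ties, a static-skyline point could be completely dominated by a twin that is closer to $q$). The helper names are hypothetical.

```python
import math


def dominates(a, b):
    """Smaller is better in every coordinate (Definitions 1 and 2)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_indices(vectors):
    """Brute-force O(n^2) skyline: indices of vectors dominated by no other vector."""
    return {i for i, v in enumerate(vectors)
            if not any(dominates(w, v) for j, w in enumerate(vectors) if j != i)}


def partition_skyline(points, q):
    """points: list of ((x, y), static_attrs); q: current query position.

    Returns (S, S_sta, S_chg) as sets of point indices.
    """
    static = [attrs for _, attrs in points]
    full = [attrs + (math.dist(loc, q),) for loc, attrs in points]
    s_sta = skyline_indices(static)   # never dominated, wherever q is
    s = skyline_indices(full)         # skyline for this particular q
    return s, s_sta, s - s_sta


points = [((0, 0), (1, 5)), ((10, 0), (5, 1)), ((1, 0), (3, 3)),
          ((20, 0), (4, 5)), ((0, 0.5), (4, 4))]
s, s_sta, s_chg = partition_skyline(points, q=(0, 0))
print(sorted(s), sorted(s_sta), sorted(s_chg))  # [0, 1, 2, 4] [0, 1, 2] [4]
```

Point 4 is statically dominated by point 2 but is the closest point to $q$, so it belongs to $S_{chg}$: it can leave the skyline once $q$ moves away from it.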
Instead of snapshot skyline queries such as the I/O-optimal BBS method, we define continuous skyline queries as follows.

Definition 3. (Continuous Skyline Queries) Given the skyline results $S_0$ at time $t_0$, the skyline results $S_1$ at the next time $t_1$ are computed from $S_0$ by considering only two changed sets: $S_0 - S_1$, the data points that move out of the skyline of time $t_0$, and $S_1 - S_0$, the data points from $D - S_0$ that become skyline points.
From References [1,5,20,21], we know that the skyline result $S_0$ is small compared with $D$. However, the dataset $D - S_0$ is still large. Fortunately, not all data points in this dataset can become skyline points; in practice, given the motion of the query point $q$, only a very small part of $D - S_0$ can. Therefore, the computation overhead of continuous skyline queries can be reduced to some extent compared with snapshot skyline queries.

Incremental Motion Model
The incremental motion model was first proposed by Mount et al. [18]. Later, Cho et al. [19] used this model to maintain nets and net-tree structures. We briefly introduce the model below, with modifications adapted to our problem.

Query Point Position
Let $q$ be a query point. The position of $q$ in Euclidean space at time $t$ is denoted $q(t)$. The motion of $q$, $M$, is a finite sequence of positions sampled at discrete time instances: $M = \langle q(t_0), q(t_1), \ldots, q(t_N) \rangle$, where $t_{i-1} < t_i$. The interval $[t_{i-1}, t_i]$ between two consecutive instances is a time step, whose duration we denote $s = t_i - t_{i-1}$. Let $v$ denote the estimate of the query point's current velocity. The estimated displacement of the point over this step is $sv$, and its actual displacement is the vector $u = q(t_i) - q(t_{i-1})$. Let $|u|$ denote the Euclidean length of $u$. We use the drift of the point at time $t_i$ to represent the relative error between the actual and estimated displacements:

$$\mathrm{drift}(t_i) = \frac{|u - sv|}{s|v|}. \tag{1}$$

Let $\delta$ be the drift bound; we say that the motion $M$ satisfies the motion estimate if, for all time steps, the drift of $M$ relative to the velocity estimate $v$ is at most $\delta$. Given the velocity estimate $v$ and any time $t$, the estimated location of the point after an elapsed time $s$ is defined as $q(t, s) = q(t) + sv$. For example, the estimated location of $q(t_{i-1})$ after step $s$ is $q(t_{i-1}, s)$ (see Figure 3a). Let $B(q, r)$ denote a Euclidean ball of radius $r$ centered at point $q$. From the above definitions, we have

$$q(t_i) \in B\big(q(t_{i-1}, s),\ s\delta|v|\big). \tag{2}$$

Let $T = [t, t + s]$ be a time interval of duration $s$ starting at time $t$. If the motion $M$ satisfies a given motion estimate, then for each time instance $t + s' \in T$, $0 \le s' \le s$, the query point $q(t + s')$ lies within a Euclidean ball centered at $q(t, s')$, and the boundary of the region reachable under $M$ is determined by this series of balls (see Figure 3b). The following equation specifies the constraint on its position at any time:

$$q(t + s') \in B\big(q(t, s'),\ s'\delta|v|\big), \quad 0 \le s' \le s. \tag{3}$$
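A minimal sketch of the drift test, assuming 2D positions and a velocity estimate $v$ that stays fixed over the sampled steps; the function names are hypothetical. Checking that the drift is at most $\delta$ is the same as checking that $q(t_i)$ lies in the ball $B(q(t_{i-1}, s), s\delta|v|)$.

```python
import math


def drift(q_prev, q_curr, v, s):
    """Relative error |u - s*v| / (s*|v|) between actual displacement u and estimate s*v."""
    ux, uy = q_curr[0] - q_prev[0], q_curr[1] - q_prev[1]
    return math.hypot(ux - s * v[0], uy - s * v[1]) / (s * math.hypot(*v))


def satisfies_estimate(motion, v, delta):
    """motion: list of (t, (x, y)) samples in time order; True if every step has drift <= delta."""
    return all(drift(p0, p1, v, t1 - t0) <= delta
               for (t0, p0), (t1, p1) in zip(motion, motion[1:]))


def in_estimated_ball(q_prev, q_curr, v, s, delta):
    """Equivalent check: q_curr lies in B(q_prev + s*v, s*delta*|v|)."""
    cx, cy = q_prev[0] + s * v[0], q_prev[1] + s * v[1]
    return math.hypot(q_curr[0] - cx, q_curr[1] - cy) <= s * delta * math.hypot(*v)


motion = [(0, (0.0, 0.0)), (1, (1.0, 0.1)), (2, (2.0, 0.1))]
print(satisfies_estimate(motion, v=(1.0, 0.0), delta=0.2))   # True
print(satisfies_estimate(motion, v=(1.0, 0.0), delta=0.05))  # False
```

The first step drifts by 0.1 relative to the estimate $sv = (1, 0)$, so a bound of 0.2 is satisfied and a bound of 0.05 is not.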

Time Parameterized Distance Function
In our problem, the velocity estimate of a query point is determined by its positions at different time steps:

$$v = \frac{q(t_i) - q(t_{i-1})}{t_i - t_{i-1}}. \tag{4}$$

For simplicity, we assume that the query point $q$ moves in 2D space, with $v = (v_x, v_y)$ and position $q(t_i) = (x_i, y_i)$ at time $t_i$. We use $q_s$ to denote the center of the Euclidean ball; at time $t_i$, $q_s = (x_i, y_i)$. After an elapsed time $s = t_{i+1} - t_i$, the center $q_s$ of the incremental motion model moves to $(x_i + s v_x,\ y_i + s v_y)$. Then, for a data point $p$ located at $(x_p, y_p)$, the estimated distance between $q_s$ and $p$ at time $t_{i+1}$ is

$$\widehat{dist}(p, q_s) = \sqrt{(x_i + s v_x - x_p)^2 + (y_i + s v_y - y_p)^2}. \tag{5}$$

By Equation (3), the location of the query point satisfies $q(t_{i+1}) \in B(q(t_i, s), s\delta|v|)$. Since the location of the query point after an elapsed time $s$ lies in a Euclidean ball, the actual distance between $q(t_{i+1})$ and $p$ is constrained to a certain range (see Figure 3b). We use $I(p, q(t_{i+1}))$ to denote this range:

$$I\big(p, q(t_{i+1})\big) = \Big[\max\big(0,\ \widehat{dist}(p, q_s) - s\delta|v|\big),\ \widehat{dist}(p, q_s) + s\delta|v|\Big]. \tag{6}$$

Besides, $dist(p, q(t_{i+1}))$ denotes the actual distance between $p$ and $q(t_{i+1})$, which lies in the range $I(p, q(t_{i+1}))$ given by Equation (6).
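The distance range $I(p, q(t_{i+1}))$ can be sketched directly from the ball center and radius. The names below are hypothetical, and `intervals_overlap` is only an illustrative helper for spotting pairs whose dominance in distance is uncertain.

```python
import math


def estimated_distance(q_i, v, s, p):
    """Distance from p to the ball centre q_s = q(t_i) + s*v."""
    cx, cy = q_i[0] + s * v[0], q_i[1] + s * v[1]
    return math.hypot(p[0] - cx, p[1] - cy)


def distance_interval(q_i, v, s, delta, p):
    """Range of dist(p, q(t_{i+1})) when q(t_{i+1}) lies in B(q_s, s*delta*|v|)."""
    d_hat = estimated_distance(q_i, v, s, p)
    r = s * delta * math.hypot(*v)
    return max(0.0, d_hat - r), d_hat + r


def intervals_overlap(a, b):
    """True when neither distance range lies entirely below the other."""
    return a[0] <= b[1] and b[0] <= a[1]


q, v = (0.0, 0.0), (1.0, 0.0)
print(distance_interval(q, v, s=2.0, delta=0.5, p=(5.0, 0.0)))  # (2.0, 4.0)
print(distance_interval(q, v, s=2.0, delta=0.5, p=(2.0, 0.5)))  # (0.0, 1.5)
```

The lower end is clamped at 0 because $p$ itself may lie inside the ball.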

The Dominance Relationship of Distance
As the query point moves, its distance to each data point changes, and this distance cannot be computed exactly because we cannot predict the actual position of the query point. However, we know that the distance lies in a certain range, $I(p_i, q_s)$. If $dist(p_i, q_s) < dist(p_j, q_s)$, then $p_i$ is closer to the query point than $p_j$. Because both distances are uncertain and known only to lie within their ranges, the dominance relationship between $dist(p_i, q_s)$ and $dist(p_j, q_s)$ cannot be confirmed when the ranges overlap. We therefore need a principled method to compute the probability that $p_i$ is better than $p_j$ in distance.
Assume that, after an elapsed time $s$, the position of the query point is as shown in Figure 4: the radius of the incremental motion model is $s\delta|v|$, and $L$, the perpendicular bisector of $p_ip_j$, divides the ball into two regions, $D_i$ and $D_j$. Let $A$ and $B$ denote the two intersections of $L$ with the boundary of the ball. When the query point $q$ is within region $D_i$, $dist(p_i, q_s) < dist(p_j, q_s)$.
Lemma 1. Assume that the position of query point $q$ is uniformly distributed in the Euclidean ball. Let $2\theta$ be the angle $\angle AqB$ subtended at the estimated position of $q$ (the center of the ball) by the two intersections $A$ and $B$ of the perpendicular bisector of $p_ip_j$ with the boundary of the incremental motion model, measured on the side of $D_i$. Then the probability that $p_i$ is better than $p_j$ in distance, denoted $Pr(p_i <_{dist} p_j)$, is

$$Pr(p_i <_{dist} p_j) = \frac{2\theta - \sin 2\theta}{2\pi}. \tag{7}$$

Proof. As long as the query point is in region $D_i$, $p_i$ is closer to the query point than $p_j$; consequently, the probability that $p_i$ is better than $p_j$ in distance equals the probability that the query point falls in $D_i$. Since the possible positions of the query point are uniformly distributed in the ball, we only need to compute the ratio of the area of $D_i$ to the area of the whole ball.

The angle $\angle AqB$ is $2\theta$, so the area of the sector $AqB$ is

$$S_{AqB} = \frac{1}{2}(2\theta)R^2 = \theta R^2, \tag{8}$$

where $R = s\delta|v|$ denotes the radius of the ball. In addition, the (signed) area of triangle $ABq$ is

$$S_{\triangle ABq} = \frac{1}{2}R^2 \sin 2\theta. \tag{9}$$

So, the area of $D_i$ can be expressed as

$$S_{D_i} = S_{AqB} - S_{\triangle ABq} = \theta R^2 - \frac{1}{2}R^2 \sin 2\theta. \tag{10}$$

Then, we obtain

$$Pr(p_i <_{dist} p_j) = \frac{S_{D_i}}{\pi R^2} = \frac{2\theta - \sin 2\theta}{2\pi}. \tag{11}$$

When $2\theta > \pi$, $\sin 2\theta < 0$ and Equation (10) adds the triangle area, so the formula also covers the case in which $D_i$ contains the center of the ball.

From the above analysis, we can see that as the query point moves, the distance between it and each data point is uncertain and depends on the estimated position of the query point. Lemma 1 shows how to compute the probability that $p_i$ is better than $p_j$ in distance. We now define dominance in distance.

Definition 4. (Dominance in Distance) If the probability that $p_i$ is better than $p_j$ in distance exceeds a predefined threshold $\tau$, that is, $Pr(p_i <_{dist} p_j) > \tau$, we say that $p_i$ dominates $p_j$ in distance, denoted $p_i \prec_{dist} p_j$.
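The closed form of Lemma 1 can be sanity-checked numerically. The sketch below is only a check, not part of the method: it places the chord $AB$ on the vertical line $x = \cos\theta$ of a unit disk (the radius $R$ cancels in the ratio), so that the side $x > \cos\theta$ is bounded by an arc of central angle $2\theta$, and compares the formula with uniform rejection sampling. Function names are hypothetical.

```python
import math
import random


def prob_closer(theta):
    """Lemma 1: Pr(p_i <_dist p_j) = (2*theta - sin(2*theta)) / (2*pi)."""
    return (2 * theta - math.sin(2 * theta)) / (2 * math.pi)


def monte_carlo_prob(theta, n=100_000, seed=1):
    """Fraction of uniform points in the unit disk lying on the D_i side of chord AB."""
    rng = random.Random(seed)
    c = math.cos(theta)  # chord AB is the line x = c; the arc on the side x > c spans 2*theta
    inside = hits = 0
    while inside < n:
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # rejection sampling keeps only points in the disk
            inside += 1
            if x > c:
                hits += 1
    return hits / n


print(prob_closer(math.pi / 2))  # about 0.5: the bisector passes through the centre
print(round(prob_closer(1.0), 4), round(monte_carlo_prob(1.0), 2))
```

With 100,000 samples the estimate typically agrees with the closed form to about two decimal places.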

Definition 5. (Skylines based on Thresholds) Given a threshold $\tau$, if $p_i$ and $p_j$ cannot dominate each other in distance, and no other data point can dominate $p_i$ or $p_j$, then both belong to the skyline. The probability here is defined as the ratio of two areas (see Figure 4 and Equation (11)); thus, these skyline queries are not probabilistic skyline queries as in Pei et al. [27], where probabilities denote the occurrence of each possible world in the possible world semantics (PWS) space composed of uncertain data objects.
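To apply Definitions 4 and 5 to concrete points, $\theta$ must be obtained from the positions. In the sketch below (our own derivation with hypothetical names, assuming a ball radius $R > 0$), the ball center lies at signed distance $h$ from the bisector of $p_ip_j$, positive on $p_i$'s side. The chord $AB$ then subtends a half-angle $\arccos(h/R)$ on $D_j$'s side, so $\theta = \arccos(-h/R)$. Clamping $h$ to $[-R, R]$ covers balls lying entirely on one side of the bisector.

```python
import math


def signed_offset(c, p_i, p_j):
    """Signed distance from c to the perpendicular bisector of p_i p_j (positive on p_i's side)."""
    di2 = (c[0] - p_i[0]) ** 2 + (c[1] - p_i[1]) ** 2
    dj2 = (c[0] - p_j[0]) ** 2 + (c[1] - p_j[1]) ** 2
    return (dj2 - di2) / (2 * math.dist(p_i, p_j))


def pr_closer(p_i, p_j, center, radius):
    """Pr(p_i <_dist p_j) for q uniform in B(center, radius), using Lemma 1 (radius > 0)."""
    h = max(-radius, min(radius, signed_offset(center, p_i, p_j)))
    theta = math.acos(-h / radius)  # half of the central angle on the D_i side
    return (2 * theta - math.sin(2 * theta)) / (2 * math.pi)


def dominates_in_distance(p_i, p_j, center, radius, tau=0.5):
    """Definition 4: p_i dominates p_j in distance iff Pr(p_i <_dist p_j) > tau."""
    return pr_closer(p_i, p_j, center, radius) > tau


p_i, p_j = (0.0, 0.0), (2.0, 0.0)
print(dominates_in_distance(p_i, p_j, center=(-5.0, 0.0), radius=1.0))  # True
# Centre on the bisector: Pr is 0.5 both ways, so neither dominates (Definition 5).
print(dominates_in_distance(p_i, p_j, (1.0, 0.0), 1.0),
      dominates_in_distance(p_j, p_i, (1.0, 0.0), 1.0))  # False False
```

Because $Pr(p_i <_{dist} p_j) + Pr(p_j <_{dist} p_i) = 1$, any $\tau \ge 0.5$ guarantees that at most one of the two points dominates the other in distance.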

Pruning Using Geometric Properties
The final skyline results consist of the points that are not dominated by any other point, considering both the distance and all static dimensions. So, if $p_i$ cannot dominate $p_j$ in the static dimensions, $p_i$ cannot dominate $p_j$ after the distance dimension is also considered; that is, $p_i$ can dominate $p_j$ only if $p_i$ dominates or equals $p_j$ in all static dimensions. Thus, we can use the static skyline results and the initial spatial relations of the data points to reduce the size of the data and avoid unnecessary data accesses.

Lemma 2. For a query point $q$ at time $t$, let $sp_f$ be the point in $S_{sta}$ farthest from $q$. Then any point $p_t$ that cannot dominate $sp_f$ in distance is not in $S$.
Proof.Obviously, p t / ∈ S sta , thus ∃sp ∈ S sta , and sp dominates p t in static dimensions.Since sp f is the farthest point in S sta to the query point q, sp dominates sp f in distance and sp f dominates p t in distance; thus, p t is dominated by sp when considering distance and static dimensions, p t / ∈ S.
Lemma 2 provides a search bound for skyline query processing: before query processing, we can prune the unqualified data points, namely those dominated in distance by all points in $S_{sta}$. Furthermore, if the drift error bound $\delta$ and the estimated velocity $v$ of the query point are given, the extent of the ball in the incremental motion model is determined.

Lemma 3. As shown in Figure 5, let $L_1$ and $L_2$ be the tangent lines of the ball of the incremental motion model (see Figure 3b). Through query point $q$, we draw lines $H_1H_1'$ and $H_2H_2'$ perpendicular to $L_1$ and $L_2$, respectively. These lines partition the space into three parts: region A, region B, and the remaining area. Then, for any data point $p$ in region B, if there exists a data point $sk$ in region A with $sk \prec p$, $p$ cannot be in the final skyline while the query point resides in the region $\angle L_1qL_2$.

Proof. Initially, $p$ is dominated by $sk$, so $sk$ dominates (or equals) $p$ in all static dimensions, and $dist(sk, q) \le dist(p, q)$. There are two possible situations in the future: (1) $sk$ always stays in the skyline; (2) $sk$ leaves the skyline at some time step.
For situation 1, consider the extreme case in which $sk$ (denoted $sk_0$) lies on the boundary of region A ($qH_1$ or $qH_2$). Taking $q$ as the center and $dist(sk_0, q)$ as the radius, we draw a circle (see Figure 5). We only need to prove that if a data point $p$ in region B or outside this circle is initially dominated by $sk$, it will still be dominated by $sk$ in the future. Point $p_0$ is one of the extreme cases: the line through $q$ perpendicular to $p_0sk_0$ is $L_1$. Because the whole estimated region of $q$ lies on the right side of this line (the side of $sk_0$), $sk_0$ remains nearer to the query point than $p_0$ in the future, and $sk_0$ still dominates $p_0$.
For situation 2, if $sk$ leaves the skyline, there exists a point $sk' \prec sk$; in this case, $sk'$ dominates (or equals) $sk$ in all static dimensions, and $dist(sk', q) \le dist(sk, q)$. By situation 1, $sk \prec p$ still holds, so $sk' \prec p$ and $p$ can be safely pruned.
We use Lemma 3 to divide a dataset into several regions and check each data point according to the region it belongs to; we then prune the points that have no potential to enter the skyline $S$ in the future.

Lemma 4. On the basis of Lemma 3, we add the following constraint: the angle of the incremental motion model satisfies $\alpha \le 60^\circ$, as shown in Figure 6a. Let $L$ be the reverse extension of the angular bisector of $\angle L_1qL_2$; $H_2H_2'$ and $L$ divide the space into several parts, shown in dark grey and light grey. A data point $p$ cannot be in the final skyline while the query point $q$ still resides in the estimated region if one of the following two situations holds:
1. $p$ lies in region $D_1$ and is dominated by another point $sk$ in $C_1$;
2. $p$ lies in region $D_2$ and is dominated by another point $sk$ in $C_2$.
Note that $D_1$, $D_2$, $C_1$, and $C_2$ are the trapezoidal regions in Figure 6.
Proof.From the above analysis, for a data point p lying in region D 1 we only need to prove that sk still dominates p in distance.Similar to the proof of Lemma 3, when α ≤ 60 • , we draw the perpendicular bisector of p and sk, the estimated region of q is on the sk side of the perpendicular bisector.Figure 6 has shown an extreme situation.Because H 1 H 1 , H 2 H 2 are the perpendicular bisectors of L 1 and L 2 , respectively, α so the perpendicular bisector of p 0 sk 0 is the line L 2 , as the estimated region of q is on the sk 0 side of L 2 , sk 0 still dominates p 0 in the future.When p belongs to region D 2 = area(LqH 2 ) symmetrically to D 1 , similar to situation 1, p is dominated as shown in Figure 6b.Proof.Similar to the proof of Lemma 3, we first draw the perpendicular bisector of p and sk and what we need to prove is that estimated region of q is on the sk side of the perpendicular bisector.
Figure 7 shows the extreme case: we draw the perpendicular bisector of $sk_0$ and $p_0$, where $sk_0$ initially dominates $p_0$; in this situation, the perpendicular bisector passes through the start position of $q$. Additionally, $\angle L_1'qL_1 = \angle L_1qL_2 = \alpha$, so the perpendicular bisector is $L_1$. Since the estimated region of $q$ lies on the $sk_0$ side of $L_1$, $sk_0$ still dominates $p_0$ during the elapsed time $s$.
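The proofs of Lemmas 3 and 4 reduce to one geometric test: the whole estimated region of $q$ must lie on $sk$'s side of the perpendicular bisector of $sk$ and $p$. Assume the region over $[t, t+s]$ is the union of the balls $B(q(t, s'), s'\delta|v|)$, $0 \le s' \le s$. Each of these balls is a scaled copy of the last one about $q(t)$, so the union is the convex hull of $q(t)$ and the final ball, and a half-plane contains it iff it contains $q(t)$ and the whole final ball. The sketch below checks exactly that. It does not reproduce the region partition of Figures 5 and 6, and its names are hypothetical.

```python
import math


def signed_offset(c, a, b):
    """Signed distance from c to the perpendicular bisector of ab (positive on a's side)."""
    da2 = (c[0] - a[0]) ** 2 + (c[1] - a[1]) ** 2
    db2 = (c[0] - b[0]) ** 2 + (c[1] - b[1]) ** 2
    return (db2 - da2) / (2 * math.dist(a, b))


def sk_stays_closer(sk, p, q0, v, s, delta):
    """True iff every position allowed in [t, t+s] is at least as close to sk as to p."""
    end_center = (q0[0] + s * v[0], q0[1] + s * v[1])
    radius = s * delta * math.hypot(*v)
    # Half-plane {x : offset(x) >= 0} must contain the apex q0 and the whole final ball.
    return (signed_offset(q0, sk, p) >= 0
            and signed_offset(end_center, sk, p) >= radius)


sk, p = (0.0, 1.0), (0.0, -1.0)  # bisector is the x-axis; sk lies above it
print(sk_stays_closer(sk, p, (0.0, 0.0), v=(1.0, 1.0), s=1.0, delta=0.5))  # True
print(sk_stays_closer(sk, p, (0.0, 0.0), v=(1.0, 0.0), s=1.0, delta=0.5))  # False
```

In the first call the query point starts on the bisector, as in the extreme cases of the proofs, and the final ball stays above it. In the second call the final ball straddles the bisector, so $sk$ can lose its dominance in distance.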

Change of Skyline under Moving Contexts
When the query point moves, the dominance relationships between data points may change. As shown in Figure 8, at time $t_1$, $Pr(p_j <_{dist} p_i) = 1$, so $p_j \prec_{dist} p_i$; at time $t_5$, $Pr(p_i <_{dist} p_j) = 1$, so $p_i \prec_{dist} p_j$. The distance ranges of $p_i$ and $p_j$ to the query point $q$ overlap at times $t_2$, $t_3$, and $t_4$. Although the distance dominance relationship is uncertain at these moments, we can still compute the probabilities of dominance in distance.
At moment $t_3$, the perpendicular bisector of $p_ip_j$ passes through the center of the ball, so $Pr(p_j <_{dist} p_i) = 0.5$. With the threshold setting $0.5 \le \tau \le 1$, the dominance relationship in distance is likely to change, and at moment $t_3$ it is determined by $\tau$. Intuitively, we set $\tau = 0.5$ and call the moment $t$ at which the distance dominance relationship between two data points changes an intersection. A skyline point may leave the skyline after time $t$; conversely, a non-skyline point at time $t$ may enter the skyline. In Figure 8, after time $t_2$, $s_i$ must be dominated by a skyline point $s_j$. Those points that used to dominate the point before $t_2$ will stop dominating it. If $\tau > 0.5$, because of the change of the dominance relationship in distance, $p_j$ leaves the skyline later than $t_2$; similarly, $p_i$ enters the skyline earlier than $t_2$. That is, both $p_j$ and $p_i$ remain in the skyline as long as they cannot dominate each other in distance. If there is no intersection, the distance dominance relationship remains unchanged. Whether an intersection affects the skyline depends on which sets $p_i$ and $p_j$ belong to just before time $t$; clearly, not every intersection changes the skyline. Table 2 shows the possible dominance changes after an intersection.
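With $\tau = 0.5$, Lemma 1 gives probability $0.5$ exactly when the ball center lies on the perpendicular bisector of $p_ip_j$, whatever the radius. An intersection can therefore be located by finding when the estimated center $q(t, s) = q(t) + sv$ crosses that bisector. The sketch assumes the velocity estimate $v$ stays fixed over the horizon, and `intersection_time` is a hypothetical name. Because the signed offset of the center from the bisector is linear in $s$, the crossing time has a closed form.

```python
import math


def intersection_time(p_i, p_j, q0, v):
    """Elapsed time s > 0 at which q0 + s*v crosses the bisector of p_i p_j, or None."""
    n = math.dist(p_i, p_j)
    # Signed offset of q0 from the bisector, positive on p_i's side.
    f0 = (((q0[0] - p_j[0]) ** 2 + (q0[1] - p_j[1]) ** 2)
          - ((q0[0] - p_i[0]) ** 2 + (q0[1] - p_i[1]) ** 2)) / (2 * n)
    # Rate of change of that offset: v projected on the unit normal (p_i - p_j) / n.
    rate = (v[0] * (p_i[0] - p_j[0]) + v[1] * (p_i[1] - p_j[1])) / n
    if rate == 0:
        return None  # moving parallel to the bisector: the order never changes
    s_star = -f0 / rate
    return s_star if s_star > 0 else None


print(intersection_time((0.0, 0.0), (4.0, 0.0), (0.0, 0.0), (1.0, 0.0)))   # 2.0
print(intersection_time((0.0, 0.0), (4.0, 0.0), (0.0, 0.0), (-1.0, 0.0)))  # None
```

For $\tau > 0.5$ the switch happens later, when the ball has moved far enough past the bisector; that case would need the full probability of Lemma 1 rather than this zero crossing.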
We have the following theorems to describe these possibilities in detail.

Table 2. Possible dominance changes after an intersection (columns: Possibility; Before Intersection; After Intersection).
Theorem 1. An intersection has no influence on the skyline if one of the following conditions holds before $t$:
1. In all conditions, $p_i \in S_{sta}$ and $p_j \in S$;
2. In all conditions, $p_i \notin S$;
3. In condition C, $p_i \in S$ and $p_j \notin S$;
4. In condition B, $p_i \in S_{chg}$ and $p_j \in S$.
Note that conditions A, B, and C are shown in Table 2.

Proof.
1. If $p_i \in S_{sta}$, $p_i$ clearly does not leave the skyline. As $p_j \in S$, there are two situations. First, assume that $p_j \in S_{sta}$; then $p_j$ stays in the skyline and the skyline remains unchanged. Second, assume that $p_j \in S_{chg}$. Since $p_j \in S$, no point $p$ dominates $p_j$. Before time $t$, the points that could potentially dominate $p_j$ do not intersect with $p_j$, so $p_j$ stays in $S_{chg}$ and causes no change to the skyline.
2. Since $p_i \notin S$, before time $t$ there must be at least one skyline point $sk \in S$ dominating it. The intersection has no influence on $sk \prec_{dist} p_i$. If $p_j \in S_{sta}$, $p_j$ will not leave the skyline; if $p_j \in S_{chg}$, $p_j$ stays in the skyline for the same reason as in item (1); if $p_j \notin S$, the argument is the same as in item (1).
3. Since $p_i \in S$, there are two situations: $p_i \in S_{sta}$ and $p_i \in S_{chg}$. If $p_i \in S_{sta}$, $p_i$ stays in the skyline after the intersection. If $p_i \in S_{chg}$, then on the one hand, because $p_j \notin S$, $p_j$ cannot dominate $p_i$ in all static dimensions, so $p_i$ is still a skyline point after the intersection; on the other hand, since $p_j \notin S$ and $p_i$ and $p_j$ cannot dominate each other in distance, $p_j$ cannot dominate $p_i$, and there must be a point $p \in S$ that dominates $p_j$. Because there is no intersection between $p$ and $p_j$, $p \prec p_j$ still holds and $p_j \notin S$.
4. For the same reason as in item (1), $p_j$ stays in the skyline after the intersection. For $p_i \in S_{chg}$, assume that $p_j$ dominates $p_i$ in all static dimensions. After the intersection, $p_j$ and $p_i$ cannot dominate each other in distance, so $p_j$ cannot dominate $p_i$. Thus, $p_i \in S_{chg}$.
Theorem 2. An intersection may influence the skyline if one of the following conditions holds before $t$:
1. In condition A or B, $p_i \in S$ and $p_j \notin S$;
2. In condition A or C, $p_i \in S_{chg}$ and $p_j \in S$.
Note that conditions A, B, and C are shown in Table 2.

Proof.
1. First, assume that $p_i \in S_{sta}$, so $p_i$ stays in the skyline after the intersection. Since $p_j \notin S$, at least one skyline point in $S$ dominates it. If $p_i$ dominates $p_j$ before $t$, then after $t$ either $p_j \prec_{dist} p_i$ or $p_i$ and $p_j$ cannot dominate each other in distance. Consequently, in both situations $p_j$ can enter the skyline after $t$ unless another skyline point still dominates it.
2. Obviously, $p_j$ will not leave the skyline after $t$. Given $p_i \in S_{chg}$ and $p_j \in S$, if $p_j$ dominates $p_i$ in all static dimensions, then after $t$, $p_i$ leaves the skyline since $p_j \prec_{dist} p_i$.
According to Theorem 2, we only need to consider the following two cases in which the skyline may change:
1. Initially, $s_i \in S$, $s_n \notin S$, and $s_i \prec s_n$. After an intersection, $s_i$ no longer dominates $s_n$; whether $s_n$ then enters the skyline depends on whether $s_i$ is the only skyline point that dominates it.
2. Initially, $s_i \in S_{chg}$ and $s_j$ dominates $s_i$ in all static dimensions. After an intersection, $s_i$ can become dominated by $s_j$ and leave the skyline.

Continuous Skyline Query Processing
We now present continuous skyline query processing techniques based on the above analysis. A naive way is to call an existing algorithm at every time step to obtain continuous query results. Since we know the estimated position of the query point and the parameters of the associated incremental motion model, the results are credible. However, processing continuous skyline queries in this way traverses the entire dataset repeatedly, which inevitably increases running time and I/O cost. In practice, a query point does not move very fast, so the skyline does not change frequently. For example, the user looking for restaurants in Section 1 usually moves at a moderate speed, so the skyline results of the previous moment can be reused to process the skyline query of the next moment. If, instead, the speed is so high or the time interval so large that all skyline results of the previous time step change by the next moment, we would rather use snapshot skyline queries; this does not occur in our motivating examples. Therefore, in this paper we assume that query points do not move too fast and that the skyline results of the previous moment can be used for query processing at the next time step. We then use the strategies of the previous sections to compute the intersections that may influence the skyline and maintain the skyline results incrementally.
First, we compute the initial skyline. Then, we determine the moments at which the skyline may change and record the corresponding intersections. Whenever such an intersection is reached, we process it and determine further intersections for subsequent skyline updates.
Corollary 1. Assume that the distances of skyline points $s_i$, $s_j$, and $s_k$ to the estimated query point form an increasing sequence. Then the intersection between $s_i$ and $s_k$ cannot occur before the first intersection between a pair of adjacent points, i.e., $(s_i, s_j)$ or $(s_j, s_k)$.
Proof. Assume that $s_i$ and $s_j$, $s_j$ and $s_k$, and $s_i$ and $s_k$ intersect at times $t_x$, $t_y$, and $t_z$, respectively. Before any intersection, $s_i <_{dist} s_j <_{dist} s_k$, and we need to prove that $t_z$ cannot be earlier than both $t_x$ and $t_y$. Suppose, to the contrary, that $t_z < \min(t_x, t_y)$. Then, just after $t_z$, $s_k <_{dist} s_i$. However, since no intersection between adjacent points has happened yet, $s_i <_{dist} s_j$ and $s_j <_{dist} s_k$ both still hold, so $s_i <_{dist} s_k$ by transitivity, which contradicts $s_k <_{dist} s_i$. Therefore, $t_z \geq \min(t_x, t_y)$.
We therefore store the skyline points in a sequence sorted by their distances to the query point, and we only need to compute intersections between adjacent skyline points in this sequence.
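The corollary can be checked numerically under a simplifying assumption made only for this illustration: the query point moves in a straight line, $q(t) = q_0 + v t$. Setting $|q(t) - p_i|^2 = |q(t) - p_j|^2$ then gives a linear equation in $t$, so each pair of points crosses at most once. The helper `crossing_time` below is hypothetical, not part of the paper.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1]


def crossing_time(pi: Vec, pj: Vec, q0: Vec, v: Vec) -> float:
    """Earliest t >= 0 at which q(t) = q0 + v*t is equidistant from pi and pj.

    |q - pi|^2 = |q - pj|^2  <=>  q.(pi - pj) = (|pi|^2 - |pj|^2) / 2,
    which is linear in t for straight-line motion, so there is at most one crossing.
    """
    d = (pi[0] - pj[0], pi[1] - pj[1])
    denom = dot(v, d)
    if denom == 0:
        return math.inf  # q moves parallel to the bisector and never crosses it
    t = ((dot(pi, pi) - dot(pj, pj)) / 2 - dot(q0, d)) / denom
    return t if t >= 0 else math.inf  # a crossing in the past is irrelevant


q0, v = (0.0, 0.0), (1.0, 0.0)
s_i, s_j, s_k = (-1.0, 0.0), (0.0, 2.0), (3.0, 3.0)  # distances 1, 2, ~4.24 at t = 0

t_ij = crossing_time(s_i, s_j, q0, v)  # 1.5
t_jk = crossing_time(s_j, s_k, q0, v)  # 7/3
t_ik = crossing_time(s_i, s_k, q0, v)  # 2.125
assert t_ik >= min(t_ij, t_jk)  # the non-adjacent pair never crosses first
```

In this example $t_{ik}$ falls between $t_{ij}$ and $t_{jk}$: $s_i$ and $s_k$ cross only after $s_i$ and $s_j$ have swapped and become adjacent, which is why tracking adjacent pairs in the distance-sorted sequence is sufficient.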

Data Structure and Conditions
We use a doubly linked list named LL to store the current skyline points, sorted in ascending order of their distances to the query point. (Other data structures, such as a heap or an array, would also work, because these structures are not processed "on-line"; instead, they can be processed between two time steps.) Each skyline point $s_i$ in LL is stored as a tuple $(flag, dist, t_{valid}, t_{skip})$, where $flag$ indicates whether $s_i$ belongs to $S_{sta}$; $dist$ is the distance between $s_i$ and the query point $q$; $t_{valid}$ is the validity time of $s_i$, which is defined only for changing skyline points and records the time at which $s_i$ becomes dominated; and $t_{skip}$ is the time at which $s_i$ will exchange its position with its successor in LL (see the algorithms below for details).
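A minimal Python sketch of an LL entry, assuming the tuple layout described above. As the text allows an array instead of a linked list, a sorted Python list maintained with `bisect` stands in for the doubly linked list; `pid` is a hypothetical identifier added for illustration, and unset times default to infinity, as in the entry $(1, dist, \infty, \infty)$ of Algorithm 1.

```python
import bisect
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LLEntry:
    """One skyline point in LL, stored as (flag, dist, t_valid, t_skip)."""
    pid: int                    # hypothetical point identifier (not in the paper's tuple)
    flag: bool                  # True if the point belongs to S_sta
    dist: float                 # current distance to the query point q
    t_valid: float = math.inf   # time the point becomes dominated (changing points only)
    t_skip: float = math.inf    # time it swaps positions with its successor in LL


class SkylineList:
    """LL kept in ascending order of dist; a Python list stands in for the linked list."""

    def __init__(self) -> None:
        self.entries: List[LLEntry] = []

    def insert(self, e: LLEntry) -> int:
        """Insert e at its sorted position and return that index."""
        keys = [x.dist for x in self.entries]
        i = bisect.bisect_right(keys, e.dist)
        self.entries.insert(i, e)
        return i

    def successor(self, i: int) -> Optional[LLEntry]:
        """Next entry in distance order, or None for the last one."""
        return self.entries[i + 1] if i + 1 < len(self.entries) else None
```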
By Theorem 2, there are two situations that may cause the skyline to change. Let $t_{insec}$ denote the time of an intersection ($\tau = 0.5$):
1. Before $t_{insec}$, $s_i$ is a changing skyline point, $s_j$ is farther from the query point $q$ than $s_i$, and $s_j$ dominates $s_i$ in all static dimensions. Then, after $t_{insec}$, $s_i$ will be dominated by $s_j$ and leave the skyline.
2. Before $t_{insec}$, $s_i$ is a skyline point and $s_n$ is a nonskyline point dominated by $s_i$. Then, after $t_{insec}$, $s_n$ can enter the skyline, depending on whether $s_i$ is the only skyline point that dominates it.
To summarize the above analysis, we only need to consider the cases that may cause the skyline to change. For simplicity, this paper makes two assumptions on the threshold $\tau$:
1. The perpendicular bisector of $p_ip_j$ passes through the centre of the ball in the incremental motion model, so $\theta = 90^\circ$; by Lemma 1, this corresponds to $\tau = 0.5$. Before time $t_2$, $p_j \prec_{dist} p_i$, and after that $p_i \prec_{dist} p_j$ (as shown in Figure 8).
2. The perpendicular bisector of $p_ip_j$ passes through the point $C$ located at 1/4 of the diameter (see Figure 4), i.e., $|qC| = \frac{1}{2}|qA|$. Therefore, $\cos\theta = 0.5$ and $\theta = 60^\circ$, and the corresponding value of $Pr(p_i <_{dist} p_j)$ follows from Lemma 1.

For convenience of calculation, we set $Pr(p_i <_{dist} p_j) = \tau$.

Theorem 3. As shown in Figure 8, assume that the coordinates of $p_1$ and $p_2$ are $(x_1, y_1)$ and $(x_2, y_2)$, and that the query point starts from $(x_q, y_q)$ with velocity $v = (v_x, v_y)$. If $\tau = 0.5$, the intersection occurs when the query point meets the perpendicular bisector of $p_1p_2$; if $\tau$ is the value corresponding to $\theta = 60^\circ$, the intersection times are $t_2$ and $t_4$, as derived in the proof.

Proof. The perpendicular bisector of $p_1p_2$, denoted by $L$, passes through the midpoint of $p_1p_2$ and is perpendicular to it, so (for $y_1 \neq y_2$) it can be written as
$$L:\; y = kx + C,\qquad k = -\frac{x_2 - x_1}{y_2 - y_1},\qquad C = \frac{y_1 + y_2}{2} - k\,\frac{x_1 + x_2}{2}.$$
1. If $\tau = 0.5$, we only need to compute when the query point meets $L$, that is, the moment at which $p_1$ and $p_2$ are equidistant from the query point:
$$y_q + v_y t = k(x_q + v_x t) + C. \qquad (1)$$
2. If $\tau$ corresponds to $\theta = 60^\circ$, then at times $t_2$ and $t_4$ the query point satisfies the condition of 1/4 of the diameter (see Figure 4), while $t_1$ and $t_5$ are the moments at which $L$ is tangent to the boundary of the ball in the incremental motion model. We derive $t_4$ via $t_3$ and $t_5$; $t_2$ is obtained similarly. Let $v_\perp$ denote the component of the query point's velocity perpendicular to $L$. At time $t_5$, the distance from the query point to $L$ equals the current radius of the ball in the motion model, so
$$\frac{|k(x_q + v_x t_5) - (y_q + v_y t_5) + C|}{\sqrt{k^2 + 1}} = \delta |v|\, t_5.$$
According to (1), $t_3 = \frac{y_q - kx_q - C}{kv_x - v_y}$, so $v_\perp = \frac{\delta |v|\, t_5}{t_5 - t_3}$. From $t_3$ to $t_4$, the distance travelled along the direction of $v_\perp$ equals half of the radius, so $v_\perp (t_4 - t_3) = \frac{1}{2}\delta |v|\, t_4$. Solving these equations gives
$$t_4 = \frac{2 v_\perp t_3}{2 v_\perp - \delta |v|} = \frac{2\, t_3 t_5}{t_3 + t_5}. \qquad \square$$
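The quantities in the proof of Theorem 3 can be computed directly. The Python sketch below assumes straight-line motion $q(t) = (x_q, y_q) + v t$, a ball radius of $\delta|v|t$ as in the proof, and $y_1 \neq y_2$ so that $L$ has the slope form $y = kx + C$. It computes $t_3$ from (1), the two tangent times $t_1$ and $t_5$, and then $t_4 = 2t_3t_5/(t_3 + t_5)$, which follows algebraically from substituting $v_\perp = \delta|v|t_5/(t_5 - t_3)$ into $v_\perp(t_4 - t_3) = \frac{1}{2}\delta|v|t_4$. All function names and the numeric example are hypothetical, and the example assumes the ball grows slowly enough (small $\delta$) that both tangent times are positive.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def bisector(p1: Vec, p2: Vec) -> Tuple[float, float]:
    """(k, C) such that L: y = k*x + C is the perpendicular bisector of p1p2 (needs y1 != y2)."""
    (x1, y1), (x2, y2) = p1, p2
    if y1 == y2:
        raise ValueError("vertical bisector: y = kx + C form is undefined")
    k = -(x2 - x1) / (y2 - y1)
    C = (y1 + y2) / 2 - k * (x1 + x2) / 2
    return k, C


def t3_on_bisector(k: float, C: float, q: Vec, v: Vec) -> float:
    """Time at which q(t) = q + v*t lies on L, i.e. equation (1) solved for t."""
    return (q[1] - k * q[0] - C) / (k * v[0] - v[1])


def tangent_times(k: float, C: float, q: Vec, v: Vec, delta: float) -> List[float]:
    """Positive t with |k*x(t) - y(t) + C| / sqrt(k^2 + 1) = delta*|v|*t (t1 and t5)."""
    a = k * q[0] - q[1] + C
    b = k * v[0] - v[1]
    sr = math.hypot(k, 1.0) * delta * math.hypot(v[0], v[1])
    roots = []
    for num, den in ((a, sr - b), (-a, sr + b)):  # the two signs of the absolute value
        if den != 0 and num / den > 0:
            roots.append(num / den)
    return sorted(roots)


def t4_from(t3: float, t5: float) -> float:
    """Solve v_perp*(t4 - t3) = delta*|v|*t4/2 with v_perp = delta*|v|*t5/(t5 - t3)."""
    return 2 * t3 * t5 / (t3 + t5)


k, C = bisector((0.0, 0.0), (2.0, 2.0))        # L: y = -x + 2
q, v, delta = (0.0, 0.0), (1.0, 1.0), 0.5
t3 = t3_on_bisector(k, C, q, v)                 # 1.0
t1, t5 = tangent_times(k, C, q, v, delta)       # about 2/3 and 2
t4 = t4_from(t3, t5)                            # about 4/3
```

At the resulting $t_4$, the distance from the query point to $L$ is exactly half of the ball's radius, which is the 1/4-diameter condition used in the proof.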
As the query point moves, the distances between all data points and the query point vary, which may cause the skyline to change. According to the type of change, we define the following three events:

• Event exit. This occurs when a skyline point leaves the skyline, which can happen only to a changing skyline point. Assume that $s_i \in S_{chg}$ and that another skyline point $s_j$ has the potential to dominate $s_i$. If $s_i$ intersects with $s_j$ in distance and $Pr(s_j <_{dist} s_i) > \tau$, then $s_i$ will leave the skyline; that is, an exit event happens.

• Event in. This occurs when a nonskyline point enters the skyline. For a nonskyline point $s_n$ and any skyline point $s_i$ currently dominating it, if $s_n$ gets closer to the query point $q$ than $s_i$, then $s_i$ can no longer dominate it; that is, an in event happens. However, whether $s_n$ will enter the skyline depends on whether $s_i$ is the only point dominating it, which is checked when the event is processed.
• Event chg_ord. This occurs when two adjacent skyline points in LL swap their order. For a skyline point $s_i$, if it intersects with its successor $s_j$ and $s_j$ cannot dominate it, then $s_i$ and $s_j$ exchange positions in LL; that is, a chg_ord event happens. Note that here $s_j$ does not have the potential to dominate $s_i$; otherwise, an exit event would happen instead.
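The check performed when an in event is processed can be sketched in Python as follows. The sketch assumes that smaller values are better and that every point is described by the vector [distance to $q$, static attributes...] evaluated just after the event time, when the triggering skyline point $s_i$ no longer dominates $s_n$; the dictionary layout and the name `enters_skyline` are hypothetical.

```python
from typing import Dict, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Smaller is better: u is no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def enters_skyline(s_n: Sequence[float], skyline: Dict[str, Sequence[float]]) -> bool:
    """Process an in event for the nonskyline point s_n.

    Vectors are [distance to q, static attributes...] evaluated just after the
    event time. s_n enters only if no skyline point still dominates it, i.e.
    only if the triggering point s_i was its sole dominator.
    """
    return not any(dominates(vec, s_n) for vec in skyline.values())


s_n = [2.0, 3.0, 3.0]
only_s_i = {"s_i": [2.5, 2.0, 2.0]}                   # s_i is now farther than s_n
with_s_k = {**only_s_i, "s_k": [1.0, 1.0, 1.0]}       # s_k still dominates s_n
```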
As shown in Figure 9, the list LL contains the data points $\{sk_1, sk_2, sk_3, sk_4\}$, sorted in ascending order of their distances to the query point. Suppose that at time $t$, $Pr(sk_3 <_{dist} sk_2) > \tau$ and $sk_3$ dominates $sk_2$ in all static dimensions. Then an exit event happens, because $sk_2$ becomes dominated and leaves the skyline (see Figure 9b). If $s_n$ is a nonskyline point dominated only by $sk_3$ and, at time $t$, $dist(q, sk_3) > dist(q, s_n)$, then an in event happens, because $s_n$ enters the skyline (see Figure 9c). Finally, if at time $t$, $sk_3$ gets closer to the query point $q$ than $sk_2$ and $sk_3$ has no potential to dominate $sk_2$, then a chg_ord event happens, because $sk_2$ and $sk_3$ exchange their positions in the list (see Figure 9d).
A global queue Q is used to maintain all events that represent future skyline changes. Each event has the form $(time, type, self, rel)$, where $time$ is the time at which the event happens and $type$ records the kind of event; $self$ and $rel$ represent the skyline point and the relevant data point involved in the event, respectively. In an exit or chg_ord event, $self$ is the skyline point $s_i$ and $rel$ is its successor $s_j$. In an in event, $self$ is the skyline point and $rel$ is the relevant nonskyline point $s_n$.
Initially, LL contains all the current skyline points, while Q contains the events that will happen in the nearest future. As time elapses, events are dequeued and handled according to their types. Handling events and updating the skyline also generates new future events, so Q evolves as existing events are dequeued and new events are enqueued. After all due events are processed, LL contains exactly the correct skyline points with respect to the current position of the query point q.
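One natural way to realize Q is a binary min-heap keyed on event time. The Python sketch below assumes this choice (the paper does not fix the queue implementation); `Event`, `EventQueue`, and the counter-based tie-breaking are illustrative. Dequeuing one due event at a time lets events enqueued by a handler, if already due, be processed in the same loop.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class Event:
    """(time, type, self, rel); only time and an insertion counter are compared."""
    time: float
    seq: int                              # tie-breaker: equal times keep FIFO order
    kind: str = field(compare=False)      # "exit", "in", or "chg_ord"
    self_pt: str = field(compare=False)   # the skyline point s_i
    rel: str = field(compare=False)       # its successor s_j, or the nonskyline point s_n


class EventQueue:
    """Global event queue Q as a binary min-heap on event time."""

    def __init__(self) -> None:
        self._heap: List[Event] = []
        self._seq = itertools.count()

    def push(self, time: float, kind: str, self_pt: str, rel: str) -> None:
        heapq.heappush(self._heap, Event(time, next(self._seq), kind, self_pt, rel))

    def pop_due(self, now: float) -> Optional[Event]:
        """Dequeue the earliest event if it is due at time `now`, else return None."""
        if self._heap and self._heap[0].time <= now:
            return heapq.heappop(self._heap)
        return None

    def __len__(self) -> int:
        return len(self._heap)


Q = EventQueue()
Q.push(5.0, "exit", "s2", "s3")
Q.push(1.0, "chg_ord", "s1", "s2")
Q.push(3.0, "in", "s3", "n7")

handled = []
while (e := Q.pop_due(3.0)) is not None:
    handled.append(e.kind)  # a real handler would update LL and push new events here
```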

Event-Driven Mechanisms for Continuous Query Processing
For the static dataset, our method uses a simple 2D grid file index that divides the data space into $h \times v$ cells, and the data points within each cell are stored in one disk page. At the beginning of the algorithm, the static skyline is computed in advance. According to Lemma 2, the farthest distance is recorded in the variable $d_{farest}$ as a search bound, and the cells beyond $d_{farest}$ are pruned to reduce I/Os (see Figure 10). As shown in Algorithm 1, all permanent skyline points in $S_{sta}$ are initially inserted into LL according to their distances to the starting position of the query point q. First, we prune the dataset by utilizing geometric properties. Then, starting from the cell containing q's initial position, all grid cells are searched in a spiral manner, so that the cells on an inner surrounding circle are searched before those on an outer one, as shown in Figure 10. Next, we organize the dataset according to Lemmas 3, 4, and 5 using a heap H, which stores the cells or data points that may enter the skyline; the top of H is the cell or data point closest to q's estimated position. The points or cells in the heap are compared in turn with the current skyline points in LL, which is updated with deletions or insertions if necessary. After that, events are created for all skyline points in LL except the last one. Finally, the farthest skyline point is used to compute possible in events for the points farther than it.
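The spiral search can be read as visiting the grid in square rings of increasing Chebyshev distance around the query cell. The Python sketch below assumes square cells of side `size` with the grid origin at (0, 0) and q inside the grid; it models only the distance bound $d_{farest}$ of Algorithm 1 (not the dominance test on cells), and all function names are hypothetical.

```python
from typing import Iterator, List, Tuple

Cell = Tuple[int, int]


def spiral_rings(cx: int, cy: int, h: int, v: int) -> Iterator[List[Cell]]:
    """Yield the cells of an h x v grid ring by ring around (cx, cy).

    Ring r holds the in-grid cells whose Chebyshev distance to (cx, cy) is r,
    so inner rings are always visited before outer ones.
    """
    max_r = max(cx, h - 1 - cx, cy, v - 1 - cy)
    for r in range(max_r + 1):
        yield [(i, j)
               for i in range(cx - r, cx + r + 1)
               for j in range(cy - r, cy + r + 1)
               if max(abs(i - cx), abs(j - cy)) == r and 0 <= i < h and 0 <= j < v]


def min_dist_to_cell(q: Tuple[float, float], cell: Cell, size: float) -> float:
    """Smallest distance from point q to the square cell."""
    i, j = cell
    dx = max(i * size - q[0], 0.0, q[0] - (i + 1) * size)
    dy = max(j * size - q[1], 0.0, q[1] - (j + 1) * size)
    return (dx * dx + dy * dy) ** 0.5


def cells_within_bound(q, size: float, h: int, v: int, d_farest: float) -> List[Cell]:
    """Cells to scan, in spiral order, whose nearest point is closer than d_farest."""
    cx, cy = int(q[0] // size), int(q[1] // size)
    out: List[Cell] = []
    for ring in spiral_rings(cx, cy, h, v):
        near = [c for c in ring if min_dist_to_cell(q, c, size) < d_farest]
        if not near:
            break  # every outer ring is at least as far from q, so stop
        out.extend(near)
    return out
```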

Algorithm 1: Initialization
Input: the position of a query point q
Output: the initial skyline IS for q and the event queue Q with initial events
1 According to $S_{sta}$, insert an entry $(1, dist, \infty, \infty)$ into LL;
2 Determine the search bound $d_{farest} = dist(LL.last, q)$;
3 Prune the dataset by $S_{sta}$;
4 Insert the cell where q's position lies into H;
5 Scan the grid file from where q lies;
6 while !Empty(H) do
7 pop and process the top entry e of H;

Algorithm 2 details the CreateEvents procedure. For a given skyline point $s_i$, the algorithm first computes the time $t$ at which $s_i$ and its successor $s_j$ in LL will exchange their positions in the list, i.e., when $s_j$ starts to dominate $s_i$ in distance. If $t$ is later than $s_j$'s exchange time or $s_i$'s validity time, the intersection is ignored. Otherwise, it yields an exit event if $s_i \in S_{chg}$ (depending on $s_j$'s validity time), or a simple chg_ord event. Then, in events are computed for each nonskyline point $s_n$ by comparing its distance to q's estimated position with the distances of $s_i$ and $s_j$.
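The pairwise part of CreateEvents can be sketched in Python as below. The sketch is restricted to the $\tau = 0.5$ case (the swap happens when q crosses the perpendicular bisector) and to straight-line motion $q(t) = q_0 + v t$; it does not model the dependence on $s_j$'s validity time or the in-event computation for nonskyline points. `SkyPoint` and `create_swap_event` are hypothetical names, and smaller static values are assumed to be better.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Vec = Tuple[float, float]


@dataclass
class SkyPoint:
    loc: Vec
    static: Tuple[float, ...]
    in_s_sta: bool               # flag: True for permanent skyline points
    t_valid: float = math.inf
    t_skip: float = math.inf


def crossing_time(pi: Vec, pj: Vec, q0: Vec, v: Vec) -> float:
    """Earliest t >= 0 at which q0 + v*t is equidistant from pi and pj (inf if none)."""
    d = (pi[0] - pj[0], pi[1] - pj[1])
    denom = v[0] * d[0] + v[1] * d[1]
    if denom == 0:
        return math.inf
    rhs = (pi[0] ** 2 + pi[1] ** 2 - pj[0] ** 2 - pj[1] ** 2) / 2
    t = (rhs - (q0[0] * d[0] + q0[1] * d[1])) / denom
    return t if t >= 0 else math.inf


def static_dominates(u, w) -> bool:
    return all(a <= b for a, b in zip(u, w)) and any(a < b for a, b in zip(u, w))


def create_swap_event(s_i: SkyPoint, s_j: SkyPoint, q0: Vec, v: Vec) -> Optional[Tuple[float, str]]:
    """Event for s_i and its successor s_j in LL, or None if the crossing is irrelevant."""
    t = crossing_time(s_i.loc, s_j.loc, q0, v)
    if t == math.inf or t > s_j.t_skip or t > s_i.t_valid:
        return None              # s_j moves on, or s_i leaves, before they swap
    if not s_i.in_s_sta and static_dominates(s_j.static, s_i.static):
        return (t, "exit")       # once s_j is closer, it dominates s_i fully
    return (t, "chg_ord")        # a plain reordering of LL
```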
When the nearest event in Q becomes due, it is dequeued and processed, together with the points involved, according to its type. New events are then created once the new skyline is obtained (as shown in Algorithm 3). Whenever no due events remain in Q, the points in LL form the correct skyline at the current time.
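The per-type updates to LL can be sketched in Python as follows, assuming that LL is represented by a list of point ids in ascending distance order and that the uniqueness test for an in event has already been evaluated (passed in as `rel_can_enter`). The function `apply_event` is hypothetical; it returns the ids whose events would have to be recomputed by CreateEvents.

```python
from typing import List


def apply_event(ll: List[str], kind: str, self_pt: str, rel: str,
                rel_can_enter: bool = False) -> List[str]:
    """Apply one due event to LL (point ids in ascending distance order).

    Returns the ids whose events must be recomputed afterwards.
    """
    i = ll.index(self_pt)
    pred = [ll[i - 1]] if i > 0 else []  # predecessor of self, if any
    if kind == "exit":
        ll.pop(i)                 # s_i leaves the skyline
        return pred               # its predecessor now has a new successor
    if kind == "chg_ord":
        if i + 1 >= len(ll) or ll[i + 1] != rel:
            raise ValueError("chg_ord expects rel to be the successor of self")
        ll[i], ll[i + 1] = ll[i + 1], ll[i]
        return pred + [ll[i], ll[i + 1]]
    if kind == "in":
        if not rel_can_enter:     # s_i was not the only dominator of s_n;
            return []             # a new in event would be computed instead
        ll.insert(i, rel)         # s_n is now just closer to q than s_i
        return pred + [rel]
    raise ValueError(f"unknown event type: {kind}")
```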
According to Algorithm 3, each kind of event is processed as follows. For an exit event, $s_i$ is removed from the skyline list LL, and new events are created for its predecessor, whose successor has changed. For an in event, the nonskyline point $s_n$ is checked to determine whether $s_i$ is the only skyline point dominating it. If so, $s_n$ is inserted into LL and new events are computed for the relevant points; otherwise, a possible new in event is computed and enqueued. For a chg_ord event, LL is adjusted by exchanging the positions of $s_i$ and $s_j$; similarly, relevant events are created and enqueued for them and for their predecessor, if it exists.

Enclosed motion patterns
If the movement of a query point q follows an enclosed curve, the query point stays in the region surrounded by the curve. Figure 11a-c shows three typical enclosed motion patterns: circle, ellipse, and parabola. In this case, regardless of q's moving speed, the point moves along the curve and repeatedly influences the skyline. Owing to the characteristics of enclosed curves, exact upper and lower bounds of the query point's position can be obtained in all directions. Based on these curves, we can establish incremental motion models in different directions to obtain a series of virtual query points; the geometric pruning algorithm is then executed iteratively, filtering out most of the redundant points that have no influence on the skyline.
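For an ellipse, the exact bound of the trajectory in any direction $u$ is given by its support function, $\max_p \langle p, u\rangle = \langle c, u\rangle + \sqrt{a^2\langle u, e_1\rangle^2 + b^2\langle u, e_2\rangle^2}$, where $c$ is the centre, $a$ and $b$ are the semi-axes, and $e_1$, $e_2$ are the axis directions. The Python sketch below illustrates this standard formula only; it is not the paper's procedure, and how the bounds are turned into virtual query points is not reproduced. Function names are hypothetical.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def ellipse_support(center: Vec, a: float, b: float, phi: float, u: Vec) -> float:
    """max <p, u> over the ellipse with centre `center`, semi-axes a, b, rotated by phi."""
    e1 = (math.cos(phi), math.sin(phi))     # direction of semi-axis a
    e2 = (-math.sin(phi), math.cos(phi))    # direction of semi-axis b
    p1 = u[0] * e1[0] + u[1] * e1[1]
    p2 = u[0] * e2[0] + u[1] * e2[1]
    return center[0] * u[0] + center[1] * u[1] + math.hypot(a * p1, b * p2)


def ellipse_bbox(center: Vec, a: float, b: float, phi: float) -> Tuple[float, float, float, float]:
    """Axis-aligned lower/upper bounds of the trajectory: (x_min, x_max, y_min, y_max)."""
    x_max = ellipse_support(center, a, b, phi, (1.0, 0.0))
    x_min = -ellipse_support(center, a, b, phi, (-1.0, 0.0))
    y_max = ellipse_support(center, a, b, phi, (0.0, 1.0))
    y_min = -ellipse_support(center, a, b, phi, (0.0, -1.0))
    return x_min, x_max, y_min, y_max
```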

Motion patterns with bounds
If the movement of a query point q follows a curve with bounds, we can obtain its upper and lower bounds in one or several directions, while the remaining directions are unbounded or cannot be predicted (the sinusoid and the spiral, shown in Figure 11d,e, are two motion patterns of this kind). Owing to the properties of these curves, we can still use the starting position of the query point and the bounded directions to establish an incremental motion model, obtain a virtual query point, and apply the geometric pruning strategies. Although filtering is less effective than for enclosed patterns, it is still worth executing compared with snapshot skyline queries.

Motion patterns without bounds
If the movement of a query point q follows a curve without bounds, no bound can be obtained in any direction, and it is infeasible to apply the geometric pruning strategies based on the incremental motion model (examples of this kind of motion pattern are the peach and the Archimedean spiral in Figure 11f,g).
According to the characteristics of different types of trajectories, the curves can be classified as follows:
1. There exists an upper or lower bound in one or several directions (prerequisite). For this kind of curve, we can try to establish an incremental motion model by constructing a pair of tangent lines (e.g., for parabolas and logarithmic curves). The tangent lines must satisfy the following conditions: (a) there exists an upper or lower bound in some direction(s); (b) the pair of tangent lines intersect and generate a virtual query point; and (c) the angle between the pair of tangent lines satisfies the requirements of the geometric pruning framework.
2. There is no bound in any direction, or a pair of qualified tangent lines as in (1) cannot be obtained. In this case, the geometric pruning strategies cannot be adopted directly; instead, the characteristics of the curves must be taken into account, and the curves adapted to the framework by adding additional restrictions.
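Condition (b) can be illustrated on the parabola $y = x^2$, used here only as an assumed example curve: the tangents at $x = s$ and $x = t$ are $y = 2sx - s^2$ and $y = 2tx - t^2$, which meet at $((s+t)/2, st)$. The Python sketch below computes that virtual point and, as an assumption about which angle is meant in condition (c), the angle of the wedge at the virtual point that contains the arc between the two tangency points. The framework's actual angle requirement is not reproduced, and `tangent_pair` is a hypothetical name.

```python
import math
from typing import Tuple


def tangent_pair(s: float, t: float) -> Tuple[Tuple[float, float], float]:
    """Tangents to y = x^2 at x = s and x = t (s != t).

    Returns the intersection point (the virtual query point) and the angle, in
    degrees, of the wedge at that point containing the arc between the
    two tangency points.
    """
    if s == t:
        raise ValueError("need two distinct tangency points")
    vx, vy = (s + t) / 2, s * t          # y = 2s*x - s^2 meets y = 2t*x - t^2 here
    a = (s - vx, s * s - vy)             # towards the tangency point at x = s
    b = (t - vx, t * t - vy)             # towards the tangency point at x = t
    cross = a[0] * b[1] - a[1] * b[0]
    dot = a[0] * b[0] + a[1] * b[1]
    return (vx, vy), math.degrees(math.atan2(abs(cross), dot))
```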

Experiments
A naive approach to monitoring moving skylines is to call an existing algorithm such as the I/O-optimal BBS [3,4] to recompute the skyline whenever the results need to be updated. In this section, we compare the naive algorithms with the proposed methods under various factors that may affect their performance. All algorithms are implemented in standard C++ with STL support and compiled with GNU GCC 4.9.3. Experiments were run on a PC with a dual-core Intel Core i3-3240 3.40 GHz CPU and 4 GB of memory running Ubuntu Linux 14.04 LTS. The disk page size was fixed at 4096 bytes.
To generate datasets for the experiments, we first fetched real-life points of interest in California from the website [28] (see Figure 12), and then combined the real locations with nonspatial dimensions following different distributions: Independent, Correlated, Anti-correlated, and Zipf. The California dataset contains about 100 K real locations, and the nonspatial dimensions are generated as in [1]. Attribute values range from 1 to 1000. CPU time and I/O counts over 100 runs of skyline queries were used to measure the efficiency of the algorithms. The parameters used in the experiments are listed in Table 3. In particular, the following algorithms were evaluated: BBS, ex-BBS, CSQ, GP-CSQ, and GP2-CSQ. Since BBS is the most efficient method for computing skylines in static settings (where both the data points and the query point are static), we adopted it for comparison. When the location of the query point changed, we only modified "mindist" to adapt the basic BBS algorithm (see [4]). Besides, we also compared our methods with the method called "ex-BBS" in [5].
Effect of Cardinality. To generate datasets of different sizes, we randomly selected part of the real locations and combined them with two synthetic nonspatial dimensions, varying the dataset size from 10 K to 100 K. Then, we executed 100 (10 for anti-correlated) continuous skyline queries on each dataset. For each query, the starting position of the query point was set to (37° N, 120° W). The default velocity of the query point was (1, −1), i.e., it moved in the direction of the vector (1, −1). The threshold τ was fixed at 0.5. In Figure 13, the CPU cost of the original CSQ algorithm is higher than that of the BBS algorithms in some cases, because CSQ not only processes nonskyline objects but also computes the initial events used to maintain the skyline in the future; GP-CSQ and GP2-CSQ are generally faster, because the geometric pruning policies filter out a large number of unqualified data points and reduce the CPU cost of event computation. Note that in Figure 13d, CSQ takes much more time, because anti-correlated datasets yield large skylines and incur more events. Figure 14 shows that as cardinality increases, the I/Os of CSQ, GP-CSQ, and GP2-CSQ are nearly 10% lower than those of BBS, with GP-CSQ and GP2-CSQ slightly better than CSQ.
Effect of nonspatial dimensionality. We used a real 100 K road network dataset combined with two to five nonspatial dimensions to evaluate the effect of nonspatial dimensionality on our methods. Values on these nonspatial dimensions ranged from 1 to 1000, and the other parameters were set as shown in Table 3.
As shown in Figure 15, the pruning strategies save running time to a certain extent while supporting higher nonspatial dimensionality. In Figure 16, the CSQ algorithms have a clear advantage in I/Os, since they focus on dynamic attributes, while nonspatial attributes are considered only in dominance checking. Note that the efficiency of the geometric pruning strategies is slightly affected, because a data point is harder to dominate in higher nonspatial dimensionality, which makes it almost impossible to prune more data points.
Effect of Starting Positions. The effectiveness of the geometric pruning strategies clearly depends on the location of the query point. In this section, we simulated an object moving from the northwest to the southeast of California to verify the influence of different starting positions. The starting positions were chosen as (42° N, 125° W), (39.5° N, 122.5° W), (37° N, 120° W), and (34.5° N, 117.5° W); the other query parameters were chosen as in the previous experiments. We evaluated these four representative positions mainly to explore how the position affects the efficiency of the pruning strategies, and to obtain the general performance of the geometric pruning strategies.
As shown in Figures 17 and 18, the costs differ considerably, since the efficiency of the pruning operations is clearly affected by the position of the query point. The pruning operation of ex-BBS was almost disabled, since there exists a permanent skyline point far from the query position. As the query point approaches the center of the spatial area, the pruning operations of ex-BBS are more likely to fail. The geometric pruning strategies remain applicable except in the extreme situation in which the query point starts from the edge of the spatial area. Moreover, the proposed GP-CSQ and GP2-CSQ methods can filter out most of the unqualified data points in specific cases where the region to be pruned is extensive (e.g., at position (34.5° N, 117.5° W), saving about 60-80% of CPU time and I/Os). Note that in Figure 17d, the CPU cost of CSQ is much higher, since a large number of events need to be computed, while the costs of GP-CSQ and GP2-CSQ approach that of BBS owing to the geometric pruning strategies.
Effect of Moving Speed. In this section, we ran experiments in which the velocity of the query point varies from (1, −1) to (8, −8). We again use the real-world 100 K dataset, combined with two nonspatial attributes following correlated, independent, anti-correlated, and Zipf distributions.
Figures 19 and 20 show that the cost of CSQ increases with the query speed, because distance intersections between data points occur more frequently, so more events are generated and must be processed, consuming more I/Os and CPU time. Thanks to the proposed geometric pruning strategies, the I/O cost of GP-CSQ and GP2-CSQ is not very sensitive to speed. However, the CPU time for creating and handling events remains high when the query point moves very fast. Hence, CSQ, GP-CSQ, and GP2-CSQ are better suited to lower moving speeds; if the query point moves at high speed, we prefer to compute the skyline from scratch, i.e., to call the BBS algorithm at each time step.
Geometric Pruning Framework Efficiency Evaluation. In this section, we compare the effect of the geometric pruning framework under different kinds of motion patterns (enclosed, with bounds, and without bounds) to explore whether it is efficient and versatile enough. For simplicity, we take the ellipse, the sinusoid, and the parabola as representatives of these three motion patterns.
Figure 23 shows that the geometric pruning framework works efficiently as the cardinality of each dataset increases. In particular, for the ellipse, the pruning effect is very strong, since most of the data points can be eliminated by invoking the geometric pruning algorithm several times for an enclosed motion pattern. For the remaining motion patterns, about 60% of the initial candidates were pruned, which still substantially reduces the query execution time. Figure 24 shows the influence of nonspatial dimensionality. The proposed pruning strategy is slightly affected by higher nonspatial dimensionality, since data points are less likely to be dominated in higher dimensions, but the results indicate that the pruning operations are still worthwhile. Note that the remaining data sizes of the anti-correlated datasets are large because of their larger skylines.

Conclusions
In this paper, we address continuous skyline queries for moving query points under the incremental motion model. Geometric properties are fully exploited to prune the data points that cannot belong to the final skyline results, thus improving the efficiency of skyline query processing. Further, event-based mechanisms and a grid-file-index-based pruning policy are proposed to maintain continuous skyline results instead of computing them from scratch. Two efficient algorithms (GP-CSQ and GP2-CSQ) are proposed based on these geometric properties, and our extensive experiments show that they are more effective and efficient than existing methods.
There are many promising future directions. Firstly, suitable motion patterns for specific applications can be identified, and we can study how to alter the pruning strategies based on the corresponding geometric properties to adapt our framework to new motion patterns. Secondly, since we assume that only the query points and the point attributes are dynamic, we can study efficient algorithms for answering continuous skyline queries when the database itself changes (i.e., data points are inserted, updated, or deleted). Thirdly, future work can investigate using the proposed geometric pruning strategies to support other variants of skyline queries, such as reverse skyline queries [24], skyline cubes [29], spatial skyline queries [16], and probabilistic skylines [27]. While the results of comprehensive experiments indicate the superiority of our approaches based on the geometric pruning strategies over existing work, we believe that the explored geometric features and the proposed framework are also useful for skyline query variants not examined in this paper. Another interesting problem is to extend the geometric pruning strategies to other real applications (e.g., recommendation systems [30], mobile sensor networks [31,32], and surveillance systems [33]): whenever a query point is located and a snapshot query has been performed, the proposed geometric pruning strategies or their extensions can identify the small set of candidates that may affect the query result in the near future, without verifying all data points in the system, so that the query results can be maintained efficiently using this small candidate set.

Figure 1. Routes for finding restaurants while moving.

Figure 4. Dominance of distance for two data points.

Figure 8. Change of distance dominance for two data points.

Algorithm 1 (continued):
8 if e is a data point then
9 compare e with all the current skyline points in LL;
10 if not dominated then
11 insert e into LL;
12 if e is a cell then
13 if not dominated then
14 insert the child entries of e into H sequentially;
15 for each cell $cell_i$ on the next outer surrounding circle do
16 if $dist(cell_i, q) < d_{farest}$ || $cell_i$ is not dominated then
17 insert $cell_i$ into H;
18 else
19 break;
20 for each $s_i$ from LL.last.prev to LL.first do
21 CreateEvents($s_i$, q);
22 Compute possible in events with LL.last;

Figure 12. California's points of interest (circle dots and the star denote the starting positions).

Table 1. Summary of notations.

Table 2. Possibilities for an intersection of two data points.