Article

Cluster Nested Loop k-Farthest Neighbor Join Algorithm for Spatial Networks

Department of Software, Kyungpook National University, 2559 Gyeongsang-daero, Sangju-si 37224, Korea
ISPRS Int. J. Geo-Inf. 2022, 11(2), 123; https://doi.org/10.3390/ijgi11020123
Submission received: 4 December 2021 / Revised: 20 January 2022 / Accepted: 30 January 2022 / Published: 9 February 2022
(This article belongs to the Special Issue Spatio-Temporal and Constraint Databases)

Abstract

This paper considers k-farthest neighbor (kFN) join queries in spatial networks where the distance between two points is the length of the shortest path connecting them. Given a positive integer k, a set of query points Q, and a set of data points P, the kFN join query retrieves the k data points farthest from each query point in Q. There are many real-life applications using kFN join queries, including artificial intelligence, computational geometry, information retrieval, and pattern recognition. However, the solutions based on the Euclidean distance or nearest neighbor search are not suitable for our purpose due to the difference in the problem definition. Therefore, this paper proposes a cluster nested loop join (CNLJ) algorithm, which clusters query points (data points) into query clusters (data clusters) and reduces the number of kFN queries required to perform the kFN join. An empirical study was performed using real-life roadmaps to confirm the superiority and scalability of the CNLJ algorithm compared to the conventional solutions in various conditions.

1. Introduction

In this study, we investigate the efficient processing of k-farthest neighbor (kFN) join queries in spatial networks where the distance between two points is defined by the length of the shortest path connecting them. The kFN join combines each query point q in Q with the k data points in P that are farthest from the query point q, given a positive integer k, a set of query points Q, and a set of data points P. The kFN join query has real-life applications in recommender systems, where farthest neighbors can increase the variety of recommendations [1,2]. Farthest neighbor search is also an element in clustering applications [3], complete linkage clustering [4], and nonlinear dimensionality reduction algorithms [5]. Thus, being able to quickly process kFN join queries is an important practical concern for many applications [6,7,8,9,10,11,12,13,14].
Figure 1 shows an example of the kFN join between a set Q of query points and a set P of data points in a spatial network, where it is assumed that $k = 1$, $Q = \{q_1, q_2, q_3\}$, and $P = \{p_1, p_2, p_3, p_4\}$ are given. In this paper, the kFN join is denoted as $Q \ltimes_{kFN} P$. In this example, the data points farthest from $q_1$, $q_2$, and $q_3$ are $p_2$, $p_2$, and $p_3$, respectively, which can be represented by $Q \ltimes_{kFN} P = \{\langle q_1, p_2\rangle, \langle q_2, p_2\rangle, \langle q_3, p_3\rangle\}$. Conversely, the query points farthest from $p_1$, $p_2$, $p_3$, and $p_4$ are $q_1$, $q_1$, $q_3$, and $q_1$, respectively, which can be represented by $P \ltimes_{kFN} Q = \{\langle p_1, q_1\rangle, \langle p_2, q_1\rangle, \langle p_3, q_3\rangle, \langle p_4, q_1\rangle\}$. This simple example proves that the kFN join is not commutative, i.e., $Q \ltimes_{kFN} P \neq P \ltimes_{kFN} Q$; this study considers $Q \ltimes_{kFN} P$. The facility location problem, which determines a competitive location for a new facility such as a garbage incinerator, crematorium, chemical plant, supermarket, or police station, is an important real-life application of kFN join queries. In particular, determining the optimal facility location is still an open problem [15,16], and efficiently evaluating kFN join queries is remarkably useful for it. Assume that query points $q_1$ through $q_3$ represent unpleasant facilities such as garbage incinerators and chemical plants, whereas data points $p_1$ through $p_4$ represent available rental apartments. A kFN join between a set Q of unpleasant facilities and a set P of available rental apartments could then be phrased as "find ordered pairs of an unpleasant facility q and an available rental apartment p such that p is farther from q than the other available rental apartments." Naturally, $p_2$ or $p_3$ may be the competitive apartment in terms of the distance to the unpleasant facilities.
The kFN join query must repeatedly compute the distances between pairs of query and data points, which leads to a long query processing time. A simple solution to the kFN join query between a query set Q and a dataset P repeatedly scans all data points in P for each query point in Q to compute the distance of each pair $\langle q, p\rangle$. This simple solution is unacceptable in most cases because it retrieves candidate data points for each query point separately; it may, however, be adequate when the query points are uniformly distributed throughout the region. Despite their importance, kFN join queries have not received adequate attention for spatial networks. This paper proposes a cluster nested loop join (CNLJ) algorithm that efficiently processes kFN join queries in spatial networks. Specifically, query points (data points) are clustered into query clusters (data clusters) using the spatial network connectivity. The CNLJ algorithm exploits shared computation for query clusters to avoid unnecessary computations of the distances between query and data points. The CNLJ algorithm has several advantages over the traditional solution: (1) it clusters query points (data points) using the spatial network connectivity for shared computation, (2) it quickly retrieves candidate data points at once for the clustered query points, and (3) it does not retrieve candidate data points for each query point separately. To the best of our knowledge, this is the first attempt to study kFN join queries for spatial networks.
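For concreteness, the following is a minimal sketch of the baseline nested loop join described above. The point representation, the dist callback (standing in for a shortest path computation such as Dijkstra's algorithm), and all names are illustrative assumptions, not the paper's implementation.

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

using Point = int;  // a point is identified by an integer id in the network

// Baseline (nonclustering) kFN join: for each query point, scan every data
// point and keep the k largest distances, i.e., |Q| independent kFN queries.
// The caller supplies dist, assumed to return the shortest path distance.
std::vector<std::pair<Point, std::vector<Point>>>
baseline_kfn_join(int k, const std::vector<Point>& Q, const std::vector<Point>& P,
                  const std::function<double(Point, Point)>& dist) {
    std::vector<std::pair<Point, std::vector<Point>>> result;
    for (Point q : Q) {
        std::vector<std::pair<double, Point>> cand;      // (dist(q, p), p)
        cand.reserve(P.size());
        for (Point p : P) cand.emplace_back(dist(q, p), p);
        int kk = std::min<int>(k, static_cast<int>(cand.size()));
        // move the k largest distances to the front, in decreasing order
        std::partial_sort(cand.begin(), cand.begin() + kk, cand.end(),
                          std::greater<std::pair<double, Point>>());
        std::vector<Point> kfn;
        for (int i = 0; i < kk; ++i) kfn.push_back(cand[i].second);
        result.emplace_back(q, std::move(kfn));
    }
    return result;
}
```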
The primary contributions of this study are listed as follows:
  • This paper presents a cluster nested loop join algorithm for quickly evaluating spatial network kFN join queries. The CNLJ algorithm clusters query points before retrieving candidate data points for clustered query points all at once. As a result, it does not retrieve candidate data points for each query point multiple times.
  • The CNLJ algorithm’s correctness is demonstrated through mathematical reasoning. In addition, a theoretical analysis is provided to clarify the benefits and drawbacks of the CNLJ algorithm concerning query point spatial compactness.
  • An empirical study with various setups was conducted to demonstrate the superiority and scalability of the CNLJ algorithm. The CNLJ algorithm outperforms the conventional join algorithms by up to 50.8 times according to the results.
The remainder of this paper is organized as follows: Section 2 reviews related research and provides some background knowledge. Section 3 describes the clustering of query points (data points) and the computing of the maximum and minimum distances between a border point and a data cluster. Section 4 presents the CNLJ algorithm for rapidly evaluating kFN join queries in spatial networks. Section 5 presents the results of experiments using the CNLJ and conventional join algorithms with different setups. Finally, the conclusions of this study are discussed in Section 6.

2. Background

Section 2.1 reviews related work, and Section 2.2 defines the terms and notation used in this study.

2.1. Related Work

Many studies have considered spatial queries based on the farthest neighbor (FN) search [6,7,8,9,10,11,13,14,17,18,19,20]. Korn and Muthukrishnan [21] pioneered the concept of the reverse farthest neighbor (RFN) query to obtain the weak influence set. Given a set of data points P and a query point q, the RFN query retrieves the set of data points $p \in P$ such that q is their farthest neighbor among all points in $P \cup \{q\}$. This is the monochromatic RFN (MRFN) query [8,9,13,14,19]. Another version of the RFN query is the bichromatic reverse farthest neighbor (BRFN) query [10,13,14,22]. Given a set of data points P, a set of query points Q, and a query point q in Q, the BRFN query retrieves the set of data points p in P such that q is the farthest neighbor of p among all query points in Q. Many studies have addressed RFN queries in the Euclidean space [8,9,14,19,22] and in spatial networks [10,13]. Yao et al. [14] proposed the progressive farthest cell and convex hull farthest cell algorithms to answer RFN queries using an R-tree [23,24]. A solution to answer reverse kFN queries in the Euclidean space was presented for arbitrary values of k [22]. Liu et al. [19] proposed the concept of the group RkFN query in the obstacle space and presented a query optimization algorithm based on the Voronoi graph. Tran et al. [10] proposed a solution for RFN and RkFN queries in road networks by using Voronoi-diagram-related attributes and Dijkstra's algorithm. Xu et al. [13] presented efficient algorithms based on landmarks and hierarchical partitioning to process monochromatic and bichromatic RFN queries in spatial networks. The approximate version of the problem, known as the c-approximate farthest neighbor (c-AFN) search, has been actively studied because designing an efficient method for the exact FN search in high-dimensional space is difficult [6,17,18,25]. Huang et al. [18,25] introduced the concept of a reverse locality-sensitive hashing (LSH) family, developed reverse query-aware LSH functions, and proposed two hashing schemes for the high-dimensional c-AFN search over external memory. Liu et al. [17] developed an approximate algorithm with a theoretical guarantee for the high-dimensional c-AFN search over external memory. Curtin et al. [6] proposed an algorithm with an absolute approximation guarantee for the FN search in high-dimensional space and presented an information-theoretical measure of hardness to estimate the difficulty of the FN search problem. Farthest dominated location (FDL) queries were proposed in [26]: given a set of data points P with spatial and nonspatial attributes, a set L of candidate locations, and a design competence vector Ψ for L, an FDL query retrieves the location $s \in L$ that maximizes the distance to its nearest dominating object in P. Gao et al. [7] studied aggregate k-farthest neighbor (AkFN) queries, which are defined by aggregation functions such as min, max, and sum, and presented the MB and BF algorithms based on the R-tree [23,24]: given a set of data points P and a set of query points Q, an AkFN query retrieves the k data points in P with the largest aggregate distances to all query points in Q. Effective solutions to AkFN queries in spatial networks were also proposed [11].
Due to the differences in the properties of the shortest path distance and the Euclidean distance, existing solutions for the Euclidean space cannot be used directly to answer kFN join queries in spatial networks. Similarly, existing solutions for nearest neighbor search [27,28,29] cannot readily be adapted to farthest neighbor search problems because farness and nearness have different distance properties. Although the group computation of spatial queries has received considerable attention [19,27,30,31,32,33,34], group computation has not been applied to kFN join queries in spatial networks. Efficiently processing kFN join queries in spatial networks therefore requires new, sophisticated algorithms, for several reasons: first, the kFN join is a costly operation by definition; second, farthest neighbor search is more difficult than nearest neighbor search; finally, designing index structures that effectively support the FN search in spatial networks is difficult. Table 1 compares our problem scenario to existing studies in terms of the space domain, query type, and data type.

2.2. Notation and Formal Problem Description

Query and data points are placed in a spatial network G, and these points represent points of interest (POIs), as shown in Figure 1. Given two points q and p, $dist(q, p)$ is the length of the shortest path between q and p in G. Table 2 summarizes the symbols used in this study.
Definition 1.
kFN search [6,7,8,9,10,11,13,14]. Given a positive integer k, a query point q, and a set P of data points, the kFN search returns a set of k data points, denoted as $\Omega(q)$, such that $dist(q, p^+) \geq dist(q, p^-)$ holds for $p^+ \in \Omega(q)$ and $p^- \in P \setminus \Omega(q)$.
Definition 2.
kFN join. Given a positive integer k, a set of query points Q, and a set of data points P, the kFN join query, denoted as $Q \ltimes_{kFN} P$, returns ordered pairs of each query point q in Q and the set of k data points farthest from q. For simplicity, $Q \ltimes_{kFN} P$ is abbreviated to $Q \ltimes P$, which is formally defined by $Q \ltimes P = \{\langle q, \Omega(q)\rangle \mid q \in Q\}$. Note that the kFN join is not commutative, i.e., $Q \ltimes P \neq P \ltimes Q$.
Definition 3.
Spatial network [32,33,36,37,38]. A weighted undirected graph $G = \langle V, E, W \rangle$ is used to represent a spatial network, where V, E, and W represent the vertex set, edge set, and edge distance matrix, respectively. Each edge has a non-negative weight that indicates the network distance.
Definition 4.
Intersection, intermediate, and terminal vertices. Vertices can be divided into three categories based on their degree: (1) If the degree of a vertex is larger than or equal to 3, the vertex is referred to as an intersection vertex. (2) If the degree is 2, the vertex is an intermediate vertex. (3) If the degree is 1, the vertex is a terminal vertex.
Definition 5.
Vertex sequence, query segment, and data segment. A vertex sequence $\overline{v_l v_{l+1} \cdots v_m}$ denotes a path between two vertices $v_l$ and $v_m$ such that $v_l$ and $v_m$ are each either an intersection vertex or a terminal vertex, and the other vertices in the path, $v_{l+1}, \ldots, v_{m-1}$, are intermediate vertices. A query segment $\overline{q_i q_{i+1} \cdots q_j}$ denotes a line segment connecting query points $q_i, q_{i+1}, \ldots, q_j$, and a data segment $\overline{p_l p_{l+1} \cdots p_m}$ denotes a line segment connecting data points $p_l, p_{l+1}, \ldots, p_m$. For simplicity, $\overline{q_i q_{i+1} \cdots q_j}$ and $\overline{p_l p_{l+1} \cdots p_m}$ are abbreviated to $\overline{q_i q_j}$ and $\overline{p_l p_m}$, respectively, to reduce confusion.
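To make Definitions 3 and 4 concrete, the following sketch shows one possible adjacency list representation of a weighted undirected graph $G = \langle V, E, W \rangle$ together with the degree-based vertex classification; the type and function names are assumptions for illustration, not part of the paper.

```cpp
#include <vector>

// A weighted undirected spatial network G = <V, E, W> stored as adjacency lists.
struct Graph {
    struct Edge { int to; double w; };       // w: non-negative network distance
    std::vector<std::vector<Edge>> adj;      // adj[v]: edges incident to vertex v

    explicit Graph(int n) : adj(n) {}
    void add_edge(int u, int v, double w) {  // undirected: insert both directions
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    }
};

enum class VertexKind { Intersection, Intermediate, Terminal };

// Definition 4: classify a vertex by its degree.
VertexKind classify(const Graph& g, int v) {
    const std::size_t deg = g.adj[v].size();
    if (deg >= 3) return VertexKind::Intersection;
    if (deg == 2) return VertexKind::Intermediate;
    return VertexKind::Terminal;             // degree 1 (or an isolated vertex)
}
```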

3. Clustering Points and Computing Distances

In Section 3.1, we group query points (data points) by using the spatial network connection. We calculate the maximum and minimum distances between a border point and a data cluster in Section 3.2.

3.1. Clustering Query and Data Points Using Spatial Network Connection

Figure 2 illustrates an example of the kFN join $Q \ltimes P$, where $k = 2$, $Q = \{q_1, q_2, q_3, q_4\}$, and $P = \{p_1, p_2, \ldots, p_6\}$ are given. The example kFN join query requires that each query point q in Q find the two data points farthest from q.
Figure 3 shows an example of the two-step clustering method that groups nearby query points into query clusters. In the first step, query points in a vertex sequence are connected to form a query segment. In Figure 3a, query points $q_1$ and $q_2$ in the vertex sequence $\overline{q_1 q_2 v_2}$ are connected to become $\overline{q_1 q_2}$. Thus, three query segments $\overline{q_1 q_2}$, $q_3$, and $q_4$ are generated, as shown in Figure 3a. In the second step, adjacent query segments are connected through intersection vertices to form a query cluster. In Figure 3b, the intersection vertex $q_1$ connects the two query segments $\overline{q_1 q_2}$ and $q_4$. Similarly, $q_3$ and $q_4$ are linked by the intersection vertex $v_1$. Finally, $\overline{q_1 q_2}$ and $q_3$ are linked by the intersection vertex $v_2$. As a result, the three query segments $\overline{q_1 q_2}$, $q_3$, and $q_4$ are linked to form the query cluster $\{\overline{q_1 q_2 v_2}, \overline{q_1 q_4 v_1}, \overline{v_1 q_3 v_2}\}$. Note that a query cluster is a set of query segments. Accordingly, the set of query points $Q = \{q_1, q_2, q_3, q_4\}$ is converted into the set of query clusters $\overline{Q} = \{\{\overline{q_1 q_2 v_2}, \overline{q_1 q_4 v_1}, \overline{v_1 q_3 v_2}\}\}$. Let us define a border point of a query cluster $\overline{QC}$: when the query cluster $\overline{QC}$ and its nonquery complement $G - \overline{QC}$ meet at a point, that point is referred to as a border point of $\overline{QC}$. In this example, three border points $q_1$, $v_1$, and $v_2$ are found for $\overline{QC} = \{\overline{q_1 q_2 v_2}, \overline{q_1 q_4 v_1}, \overline{v_1 q_3 v_2}\}$. Thus, the set of border points of $\overline{QC}$ is $B(\overline{QC}) = \{q_1, v_1, v_2\}$.
Figure 4 shows an example of the two-step clustering method that groups neighboring data points into data clusters. Notably, the query and data points are clustered using the same two-step method. In the first step, data points $p_1$, $p_2$, and $p_3$ in the vertex sequence $\overline{v_1 p_2 v_3}$ are connected to become the data segment $\overline{p_1 p_2 p_3}$. Similarly, data points $p_4$ and $p_5$ in the vertex sequence $\overline{p_5 p_4 q_1}$ are linked to form the data segment $\overline{p_4 p_5}$. As a result, three data segments $\overline{p_1 p_2 p_3}$, $\overline{p_4 p_5}$, and $p_6$ are generated, as illustrated in Figure 4a. In the second step, the two data segments $\overline{p_4 p_5}$ and $p_6$ are joined by the intersection vertex $p_5$ to form the data cluster $\{\overline{p_4 p_5}, \overline{p_5 p_6}\}$. As a result, the set of data points $P = \{p_1, p_2, \ldots, p_6\}$ is transformed into the set of data clusters $\overline{P} = \{\{\overline{p_1 p_2 p_3}\}, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}\}$.
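A sketch of the two-step clustering under simplifying assumptions (points and segment endpoints are given as vertex ids, vertex sequences are provided as id lists, and a union–find structure merges segments that meet at a shared endpoint) might look as follows. It conveys the idea only; the paper's actual implementation may differ.

```cpp
#include <numeric>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Union-find structure used to merge query segments into query clusters.
struct DSU {
    std::vector<int> parent;
    explicit DSU(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int a, int b) { parent[find(a)] = find(b); }
};

// Step 1: inside each vertex sequence, maximal runs of query points form query
// segments (represented here by their first and last point). Step 2: segments
// that meet at the same (intersection) vertex are merged into one cluster.
std::vector<std::vector<int>> two_step_clustering(
    const std::vector<std::vector<int>>& sequences,   // vertex sequences of G
    const std::unordered_set<int>& points) {          // query (or data) points
    std::vector<std::pair<int, int>> segments;        // (first, last) of each segment
    for (const auto& seq : sequences) {
        int start = -1, last = -1;
        for (int v : seq) {
            if (points.count(v)) { if (start < 0) start = v; last = v; }
            else if (start >= 0) { segments.push_back({start, last}); start = -1; }
        }
        if (start >= 0) segments.push_back({start, last});
    }
    DSU dsu(static_cast<int>(segments.size()));
    std::unordered_map<int, int> seen;                // endpoint vertex -> segment index
    for (int i = 0; i < static_cast<int>(segments.size()); ++i)
        for (int v : {segments[i].first, segments[i].second}) {
            auto [it, fresh] = seen.emplace(v, i);
            if (!fresh) dsu.unite(i, it->second);     // two segments meet at vertex v
        }
    std::unordered_map<int, std::vector<int>> groups; // cluster root -> segment indices
    for (int i = 0; i < static_cast<int>(segments.size()); ++i)
        groups[dsu.find(i)].push_back(i);
    std::vector<std::vector<int>> clusters;
    for (auto& kv : groups) clusters.push_back(std::move(kv.second));
    return clusters;                                  // each cluster: a set of segments
}
```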

3.2. Computing Maximum and Minimum Distances from a Border Point to a Data Cluster

The maximum and minimum distances between a border point $b_q$ and a data cluster $\overline{PC}$ are computed in this section. The minimum and maximum distances between $b_q$ and $\overline{PC}$ are formally defined by $mindist(b_q, \overline{PC}) = \min\{dist(b_q, p) \mid p \in \overline{PC}\}$ and $maxdist(b_q, \overline{PC}) = \max\{dist(b_q, p) \mid p \in \overline{PC}\}$, respectively. The minimum distance between $b_q$ and $\overline{PC}$ can easily be calculated by $mindist(b_q, \overline{PC}) = \min\{dist(b_q, b_p) \mid b_p \in B(\overline{PC})\}$, where $b_p$ is a border point of the data cluster $\overline{PC}$. The maximum distance between $b_q$ and $\overline{PC}$ can be represented by $maxdist(b_q, \overline{PC}) = \max\{maxdist(b_q, \overline{p_l p_m}) \mid \overline{p_l p_m} \in \overline{PC}\}$, where $maxdist(b_q, \overline{p_l p_m})$ is the maximum distance between $b_q$ and a data segment $\overline{p_l p_m}$ in $\overline{PC}$.
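Under these formulas, a small sketch of the two bounds is given below. It assumes that, for a fixed border point $b_q$, each data segment is summarized by the shortest path distances d1 and d2 from $b_q$ to its two endpoints and by its length; the segment-level maximum follows the min envelope argument worked out with Figure 5 in the next paragraphs. All names are illustrative.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// A data segment viewed from a fixed border point b_q: d1 and d2 are the
// shortest path distances from b_q to the segment's two endpoints, and len is
// the segment's length along the network.
struct SegmentView { double d1, d2, len; };

// dist(b_q, p) for a point p at offset x inside the segment is
// min(d1 + x, d2 + (len - x)); by the triangle inequality |d1 - d2| <= len,
// the two branches cross inside the segment, so the maximum over x is
// (d1 + d2 + len) / 2, and the minimum is attained at an endpoint.
double maxdist_segment(const SegmentView& s) { return (s.d1 + s.d2 + s.len) / 2.0; }
double mindist_segment(const SegmentView& s) { return std::min(s.d1, s.d2); }

// Cluster-level bounds: maxdist is the largest segment maxdist, and mindist is
// the smallest endpoint distance (standing in here for the distances to the
// border points of the data cluster).
double maxdist_cluster(const std::vector<SegmentView>& cluster) {
    double best = 0.0;
    for (const auto& s : cluster) best = std::max(best, maxdist_segment(s));
    return best;
}
double mindist_cluster(const std::vector<SegmentView>& cluster) {
    double best = std::numeric_limits<double>::infinity();
    for (const auto& s : cluster) best = std::min(best, mindist_segment(s));
    return best;
}
```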
An example is used to illustrate how to compute the maximum and minimum distances between a border point $b_q$ and a data cluster $\overline{PC}$. Recall that the example kFN join query has three border points and two data clusters, i.e., $B(\overline{QC}) = \{q_1, v_1, v_2\}$ and $\overline{P} = \{\{\overline{p_1 p_2 p_3}\}, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}\}$. In this section, the maximum and minimum distances between $b_q$ and $\overline{PC}$ are computed for $b_q \in \{q_1, v_1, v_2\}$ and $\overline{PC} \in \{\{\overline{p_1 p_2 p_3}\}, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}\}$. The computations of $maxdist(q_1, \{\overline{p_1 p_2 p_3}\})$, $maxdist(q_1, \{\overline{p_4 p_5}, \overline{p_5 p_6}\})$, $maxdist(v_1, \{\overline{p_1 p_2 p_3}\})$, $maxdist(v_1, \{\overline{p_4 p_5}, \overline{p_5 p_6}\})$, $maxdist(v_2, \{\overline{p_1 p_2 p_3}\})$, and $maxdist(v_2, \{\overline{p_4 p_5}, \overline{p_5 p_6}\})$ are illustrated in Figures 5, 6, 7, 8, 9, and 10, respectively.
Figure 5 illustrates the computation of $maxdist(q_1, \{\overline{p_1 p_2 p_3}\})$. First, the distances from $q_1$ to the endpoints $p_1$ and $p_3$ of $\overline{p_1 p_2 p_3}$ evaluate to $dist(q_1, p_1) = 24$ and $dist(q_1, p_3) = 27$, respectively. Consider a point p in $\overline{p_1 p_2 p_3}$. Because p lies in $\overline{p_1 p_2 p_3}$, whose length is $len(\overline{p_1 p_2 p_3}) = 5$, the distance between $q_1$ and p is computed by $dist(q_1, p) = \min\{dist(q_1, p_1) + len(\overline{p_1 p}),\ dist(q_1, p_3) + len(\overline{p_3 p})\} = \min\{24 + len(\overline{p_1 p}),\ 27 + len(\overline{p_3 p})\}$. Let $x = len(\overline{p_1 p})$ for $0 \le x \le 5$. Then, $len(\overline{p_3 p}) = 5 - x$ because $len(\overline{p_1 p}) + len(\overline{p_3 p}) = 5$. We can rewrite $dist(q_1, p) = \min\{24 + x,\ 27 + (5 - x)\}$ for $0 \le x \le 5$. As shown in Figure 5, the maximum and minimum distances between $q_1$ and $\{\overline{p_1 p_2 p_3}\}$ are $maxdist(q_1, \{\overline{p_1 p_2 p_3}\}) = 28$ and $mindist(q_1, \{\overline{p_1 p_2 p_3}\}) = 24$, respectively. For convenience, the star symbol (★) in Figure 5 marks $maxdist(q_1, \{\overline{p_1 p_2 p_3}\}) = 28$.
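The maximum can also be obtained analytically from the min envelope; a short worked derivation with the values of Figure 5:

\[
dist(q_1, p) = \min\{\, 24 + x,\; 32 - x \,\}, \qquad 0 \le x \le 5 .
\]

The two branches cross where $24 + x = 32 - x$, i.e., at $x = 4$, giving

\[
maxdist(q_1, \{\overline{p_1 p_2 p_3}\}) = 24 + 4 = 28 ,
\]

while the minimum of the envelope is attained at the endpoint $x = 0$, giving $mindist(q_1, \{\overline{p_1 p_2 p_3}\}) = 24$.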
The maximum distance between the border point $q_1$ and the data cluster $\{\overline{p_4 p_5}, \overline{p_5 p_6}\}$ is represented by $maxdist(q_1, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}) = \max\{maxdist(q_1, \overline{p_4 p_5}),\ maxdist(q_1, \overline{p_5 p_6})\}$. The computations of $maxdist(q_1, \overline{p_4 p_5})$ and $maxdist(q_1, \overline{p_5 p_6})$ are illustrated in Figure 6a,b, respectively. The distances from $q_1$ to the endpoints $p_4$ and $p_5$ are $dist(q_1, p_4) = 5$ and $dist(q_1, p_5) = 8$, respectively. The maximum and minimum distances from $q_1$ to $\overline{p_4 p_5}$ are shown in Figure 6a as $maxdist(q_1, \overline{p_4 p_5}) = 8$ and $mindist(q_1, \overline{p_4 p_5}) = 5$, respectively. The distances from $q_1$ to the endpoints $p_5$ and $p_6$ of $\overline{p_5 p_6}$ are $dist(q_1, p_5) = 8$ and $dist(q_1, p_6) = 11$, respectively. The maximum and minimum distances from $q_1$ to $\overline{p_5 p_6}$ are calculated to be $maxdist(q_1, \overline{p_5 p_6}) = 11$ and $mindist(q_1, \overline{p_5 p_6}) = 8$, respectively, as shown in Figure 6b. Therefore, the maximum and minimum distances between $q_1$ and $\{\overline{p_4 p_5}, \overline{p_5 p_6}\}$ are $maxdist(q_1, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}) = 11$ and $mindist(q_1, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}) = 5$, respectively.
Figure 7 illustrates the computation of $maxdist(v_1, \{\overline{p_1 p_2 p_3}\})$ and $mindist(v_1, \{\overline{p_1 p_2 p_3}\})$. The distances from $v_1$ to the endpoints $p_1$ and $p_3$ of $\overline{p_1 p_2 p_3}$ are $dist(v_1, p_1) = 19$ and $dist(v_1, p_3) = 24$, respectively. Thus, the maximum and minimum distances between $v_1$ and $\{\overline{p_1 p_2 p_3}\}$ are $maxdist(v_1, \{\overline{p_1 p_2 p_3}\}) = 24$ and $mindist(v_1, \{\overline{p_1 p_2 p_3}\}) = 19$, respectively.
Figure 8 illustrates the computation of $maxdist(v_1, \{\overline{p_4 p_5}, \overline{p_5 p_6}\})$ and $mindist(v_1, \{\overline{p_4 p_5}, \overline{p_5 p_6}\})$. The maximum distance between $v_1$ and $\{\overline{p_4 p_5}, \overline{p_5 p_6}\}$ is computed by $maxdist(v_1, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}) = \max\{maxdist(v_1, \overline{p_4 p_5}),\ maxdist(v_1, \overline{p_5 p_6})\}$. The computations of $maxdist(v_1, \overline{p_4 p_5})$ and $maxdist(v_1, \overline{p_5 p_6})$ are illustrated in Figure 8a,b, respectively. The distances from $v_1$ to the endpoints $p_4$ and $p_5$ of $\overline{p_4 p_5}$ are $dist(v_1, p_4) = 10$ and $dist(v_1, p_5) = 9$, respectively. Thus, the maximum and minimum distances between $v_1$ and $\overline{p_4 p_5}$ are $maxdist(v_1, \overline{p_4 p_5}) = 11$ and $mindist(v_1, \overline{p_4 p_5}) = 9$, respectively, as shown in Figure 8a, where the star symbol (★) marks $maxdist(v_1, \overline{p_4 p_5}) = 11$. The distances from $v_1$ to the endpoints $p_5$ and $p_6$ of $\overline{p_5 p_6}$ are $dist(v_1, p_5) = 9$ and $dist(v_1, p_6) = 12$, respectively. As shown in Figure 8b, the maximum and minimum distances between $v_1$ and $\overline{p_5 p_6}$ are $maxdist(v_1, \overline{p_5 p_6}) = 12$ and $mindist(v_1, \overline{p_5 p_6}) = 9$, respectively. Thus, the maximum and minimum distances between $v_1$ and $\{\overline{p_4 p_5}, \overline{p_5 p_6}\}$ are $maxdist(v_1, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}) = 12$ and $mindist(v_1, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}) = 9$, respectively.
Figure 9 illustrates the computation of $maxdist(v_2, \{\overline{p_1 p_2 p_3}\})$ and $mindist(v_2, \{\overline{p_1 p_2 p_3}\})$. The distances from $v_2$ to the endpoints $p_1$ and $p_3$ of $\overline{p_1 p_2 p_3}$ are $dist(v_2, p_1) = dist(v_2, p_3) = 23$. Thus, the maximum and minimum distances between $v_2$ and $\{\overline{p_1 p_2 p_3}\}$ are $maxdist(v_2, \{\overline{p_1 p_2 p_3}\}) = 25.5$ and $mindist(v_2, \{\overline{p_1 p_2 p_3}\}) = 23$, respectively. Note that the star symbol (★) in Figure 9 marks $maxdist(v_2, \{\overline{p_1 p_2 p_3}\}) = 25.5$.
Figure 10 illustrates the computation of $maxdist(v_2, \{\overline{p_4 p_5}, \overline{p_5 p_6}\})$ and $mindist(v_2, \{\overline{p_4 p_5}, \overline{p_5 p_6}\})$. The maximum distance between $v_2$ and $\{\overline{p_4 p_5}, \overline{p_5 p_6}\}$ is computed by $maxdist(v_2, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}) = \max\{maxdist(v_2, \overline{p_4 p_5}),\ maxdist(v_2, \overline{p_5 p_6})\}$. The computations of $maxdist(v_2, \overline{p_4 p_5})$ and $maxdist(v_2, \overline{p_5 p_6})$ are illustrated in Figure 10a,b, respectively. The distances from $v_2$ to the endpoints $p_4$ and $p_5$ of $\overline{p_4 p_5}$ are $dist(v_2, p_4) = 8$ and $dist(v_2, p_5) = 5$, respectively. Thus, the maximum and minimum distances between $v_2$ and $\overline{p_4 p_5}$ are $maxdist(v_2, \overline{p_4 p_5}) = 8$ and $mindist(v_2, \overline{p_4 p_5}) = 5$, respectively, as shown in Figure 10a. The distances from $v_2$ to the endpoints $p_5$ and $p_6$ of $\overline{p_5 p_6}$ are $dist(v_2, p_5) = 5$ and $dist(v_2, p_6) = 8$, respectively. Thus, the maximum and minimum distances between $v_2$ and $\overline{p_5 p_6}$ are $maxdist(v_2, \overline{p_5 p_6}) = 8$ and $mindist(v_2, \overline{p_5 p_6}) = 5$, respectively, as shown in Figure 10b. Thus, the maximum and minimum distances between $v_2$ and $\{\overline{p_4 p_5}, \overline{p_5 p_6}\}$ are $maxdist(v_2, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}) = 8$ and $mindist(v_2, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}) = 5$, respectively.
Table 3 summarizes the maximum and minimum distances between the border points in $B(\overline{QC})$ and the data clusters in $\overline{P}$.

4. Cluster Nested Loop Join Algorithm for Spatial Networks

Section 4.1 describes the CNLJ algorithm. Section 4.2 shows how kFN queries are evaluated at the border points of query clusters. Finally, Section 4.3 evaluates the example kFN join query.

4.1. Cluster Nested Loop Join Algorithm

The CNLJ algorithm is described in Algorithm 1, which involves two steps. The two-step clustering method (lines 2–4), which is described in Section 3.1, is used to group nearby query points (data points) into query clusters (data clusters) in the first step. In the second step, the kFN join is performed for each query cluster in $\overline{Q}$ (lines 5–8). Finally, the kFN join result $\Omega(Q)$ is returned to the query user when the kFN join is complete for every query cluster in $\overline{Q}$ (line 9).
Algorithm 1 CNLJ(k, Q, P)
Input: k: number of FNs for q, Q: set of query points, and P: set of data points
Output: $\Omega(Q)$: set of ordered pairs of each query point q in Q and the set of k FNs for q, i.e., $\Omega(Q) = \{\langle q, \Omega(q)\rangle \mid q \in Q\}$
1: $\Omega(Q) \leftarrow \emptyset$ // the result set $\Omega(Q)$ is initialized to the empty set
2: // Step 1: query and data points are clustered, as presented in Section 3.1
3: $\overline{Q} \leftarrow$ two_step_clustering(Q) // query points are grouped into query clusters
4: $\overline{P} \leftarrow$ two_step_clustering(P) // data points are grouped into data clusters
5: // Step 2: the kFN join is performed for each query cluster in $\overline{Q}$, as presented in Algorithm 2
6: for each query cluster $\overline{QC} \in \overline{Q}$ do
7:   $\Omega(\overline{QC}) \leftarrow$ kFN_join(k, $\overline{QC}$, $\overline{P}$) // $\Omega(\overline{QC}) = \{\langle q, \Omega(q)\rangle \mid q \in \overline{QC}\}$
8:   $\Omega(Q) \leftarrow \Omega(Q) \cup \Omega(\overline{QC})$ // the kFN join result for $\overline{QC}$ is added to $\Omega(Q)$
9: return $\Omega(Q)$ // $\Omega(Q)$ is returned once the kFN join for every query cluster in $\overline{Q}$ is complete
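A C++ skeleton mirroring Algorithm 1 might look as follows; the container choices and the helper signatures (two_step_clustering for Section 3.1 and kfn_join for Algorithm 2, both supplied by the caller here) are assumptions for illustration.

```cpp
#include <functional>
#include <unordered_map>
#include <vector>

using Point = int;
using Cluster = std::vector<Point>;                               // one query/data cluster
using JoinResult = std::unordered_map<Point, std::vector<Point>>; // q -> its k FNs

// Algorithm 1 as a driver: cluster both point sets once, then perform the
// kFN join one query cluster at a time.
JoinResult cnlj(
    int k, const std::vector<Point>& Q, const std::vector<Point>& P,
    const std::function<std::vector<Cluster>(const std::vector<Point>&)>& two_step_clustering,
    const std::function<JoinResult(int, const Cluster&, const std::vector<Cluster>&)>& kfn_join) {
    JoinResult omega;                                  // result set, initially empty
    std::vector<Cluster> Qc = two_step_clustering(Q);  // step 1: query clusters
    std::vector<Cluster> Pc = two_step_clustering(P);  //         data clusters
    for (const Cluster& qc : Qc) {                     // step 2: join per query cluster
        JoinResult part = kfn_join(k, qc, Pc);         // Omega(QC) = {<q, Omega(q)>}
        omega.insert(part.begin(), part.end());        // merge into Omega(Q)
    }
    return omega;
}
```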
Algorithm 2 describes the kFN join algorithm for a query cluster $\overline{QC}$. First, kFN queries are evaluated at the border points of $\overline{QC}$ to collect the candidate data points for the query points in $\overline{QC}$ (lines 4–7), as described in Algorithm 3. Then, each query point q in $\overline{QC}$ retrieves the k FNs for q among the candidate data points in $\Omega(B(\overline{QC}))$ (lines 8–11), as detailed in Algorithm 4. Finally, the kFN join result $\Omega(\overline{QC})$ for the query points in $\overline{QC}$ is returned after each query point q in $\overline{QC}$ has retrieved its k FNs from the candidate data points (line 12).
Algorithm 2 kFN_join(k, $\overline{QC}$, $\overline{P}$)
Input: k: number of FNs for q, $\overline{QC}$: query cluster, and $\overline{P}$: set of data clusters
Output: $\Omega(\overline{QC})$: set of ordered pairs of each query point q in $\overline{QC}$ and the set of k FNs for q, i.e., $\Omega(\overline{QC}) = \{\langle q, \Omega(q)\rangle \mid q \in \overline{QC}\}$
1: $\Omega(\overline{QC}) \leftarrow \emptyset$ // the result set for query points in $\overline{QC}$ is initialized to the empty set
2: $\Omega(B(\overline{QC})) \leftarrow \emptyset$ // note that $\Omega(B(\overline{QC})) = \{\langle b_q, \Omega(b_q)\rangle \mid b_q \in B(\overline{QC})\}$
3: $l \leftarrow \max\{dist(b_{q_i}, b_{q_j}) \mid b_{q_i}, b_{q_j} \in B(\overline{QC})\}$ // l is the maximum distance between border points of $\overline{QC}$
4: // Step 1: a kFN query is evaluated at each border point $b_q$ of $\overline{QC}$
5: for each border point $b_q \in B(\overline{QC})$ do
6:   $\Omega(b_q) \leftarrow$ find_candidates(k, l, $b_q$, $\overline{P}$) // a kFN query is evaluated at $b_q$, as detailed in Algorithm 3
7:   $\Omega(B(\overline{QC})) \leftarrow \Omega(B(\overline{QC})) \cup \Omega(b_q)$ // $\Omega(B(\overline{QC}))$ collects candidate data points for query points in $\overline{QC}$
8: // Step 2: each query point q retrieves its k FNs among the candidate data points in $\Omega(B(\overline{QC}))$
9: for each query point $q \in \overline{QC}$ do
10:   $\Omega(q) \leftarrow$ retrieve_kFN(k, q, $\Omega(B(\overline{QC}))$) // $\Omega(B(\overline{QC}))$ is the set of candidate data points for q
11:   $\Omega(\overline{QC}) \leftarrow \Omega(\overline{QC}) \cup \{\langle q, \Omega(q)\rangle\}$
12: return $\Omega(\overline{QC})$ // $\Omega(\overline{QC})$ is returned once the kFN search is performed for each query point in $\overline{QC}$
Algorithm 3 describes the kFN query processing algorithm that finds candidate data points at a border point $b_q$ of a query cluster. Note that the kFN query result for $b_q$ includes candidate data points for the query points in $\overline{QC}$. The set of k FNs for $b_q$, $\Omega(b_q)$, is initialized to the empty set (line 1). The third argument l indicates the maximum distance between border points of $\overline{QC}$, i.e., $l = \max\{dist(b_{q_i}, b_{q_j}) \mid b_{q_i}, b_{q_j} \in B(\overline{QC})\}$. The sentinel distance, which determines whether a data point p is a candidate point for $\overline{QC}$, is initialized to $sntl\_dist \leftarrow 0$. The maximum and minimum distances from $b_q$ to the data clusters in $\overline{P}$ are computed, as described in Section 3.2. The data clusters are then sorted in decreasing order of their maximum distance to $b_q$ and explored sequentially. If the maximum distance from $b_q$ to the data cluster $\overline{PC}$ to be explored next is smaller than the sentinel distance, i.e., $maxdist(b_q, \overline{PC}) < sntl\_dist$, the remaining unexplored data clusters do not need to be considered, because their data points cannot be candidate data points for any query point in $\overline{QC}$. Otherwise (i.e., $maxdist(b_q, \overline{PC}) \geq sntl\_dist$), each data point p in $\overline{PC}$ is examined to determine whether p is a candidate point for the query points in $\overline{QC}$. For this, $dist(b_q, p)$ is computed. If $b_q$ is inside $\overline{PC}$, the distance from $b_q$ to p is simply computed using a graph search algorithm such as Dijkstra's algorithm [39]. Otherwise (i.e., if $b_q$ is outside $\overline{PC}$), the distance evaluates to $dist(b_q, p) = \min\{dist(b_q, b_p) + dist(b_p, p) \mid b_p \in B(\overline{PC})\}$, where $b_q$ is a border point of $\overline{QC}$ and $b_p$ is a border point of $\overline{PC}$. This is because, if $b_q$ is outside $\overline{PC}$, the shortest path from $b_q$ to p must pass through a border point $b_p$ of $\overline{PC}$, i.e., $b_q \rightarrow b_p \rightarrow p$. If $dist(b_q, p) \geq sntl\_dist$, then p is added to $\Omega(b_q)$ as a candidate data point for $\overline{QC}$. Because the sentinel distance grows as the search proceeds, redundant data points may remain in $\Omega(b_q)$ and should be removed. Thus, each data point p in $\Omega(b_q)$ is examined to verify that it still qualifies as a candidate data point, i.e., $dist(b_q, p) \geq sntl\_dist$; if it does not, it is removed from $\Omega(b_q)$. Finally, the kFN query result for $b_q$, $\Omega(b_q)$, is returned when the maximum distance from $b_q$ to the next data cluster $\overline{PC}$ is smaller than the sentinel distance (lines 10–12) or when every data cluster has been examined.
Algorithm 3 find_candidates(k, l, $b_q$, $\overline{P}$)
Input: k: number of FNs for q, l: maximum distance between border points of $\overline{QC}$, $b_q$: border point of $\overline{QC}$, and $\overline{P}$: set of data clusters
Output: $\Omega(b_q)$: set of k FNs for $b_q$
1: $\Omega(b_q) \leftarrow \emptyset$ // the set of k FNs for the border point $b_q$ is initialized to the empty set
2: $sntl\_dist \leftarrow 0$ // the sentinel distance $sntl\_dist$ is initialized to 0
3: // the maximum and minimum distances from $b_q$ to the data clusters in $\overline{P}$ are computed as explained in Section 3.2
4: for each data cluster $\overline{PC} \in \overline{P}$ do
5:   compute $maxdist(b_q, \overline{PC})$ and $mindist(b_q, \overline{PC})$
6: // the data clusters in $\overline{P}$ are sorted in decreasing order of their maximum distance to $b_q$
7: $\overline{P} \leftarrow$ sort_data_clusters($\overline{P}$) // $\overline{P}$ now contains the sorted data clusters for $b_q$
8: // data clusters are explored sequentially
9: for each sorted data cluster $\overline{PC} \in \overline{P}$ do
10:   if $maxdist(b_q, \overline{PC}) < sntl\_dist$ then
11:     // note that $sntl\_dist$ is updated in line 24
12:     go to line 26 // the remaining data clusters do not need to be explored
13:   // each data point p in $\overline{PC}$ is sequentially explored to find the k FNs for $b_q$
14:   for each data point $p \in \overline{PC}$ do
15:     // $dist(b_q, p)$ is computed for the two cases $b_q \in \overline{PC}$ and $b_q \notin \overline{PC}$
16:     if $b_q$ is inside $\overline{PC}$ then
17:       $dist(b_q, p)$ is computed using a graph search algorithm such as Dijkstra's algorithm [39]
18:     else
19:       $dist(b_q, p) \leftarrow \min\{dist(b_q, b_p) + dist(b_p, p) \mid b_p \in B(\overline{PC})\}$ // the path from $b_q$ to p is $b_q \rightarrow b_p \rightarrow p$
20:     // p is added to $\Omega(b_q)$ if $dist(b_q, p) \geq sntl\_dist$
21:     if $dist(b_q, p) \geq sntl\_dist$ then
22:       // $\Omega(b_q)$ collects candidate data points for query points in $\overline{QC}$
23:       $\Omega(b_q) \leftarrow \Omega(b_q) \cup \{p\}$ // p is added to $\Omega(b_q)$
24:       $sntl\_dist \leftarrow dist(b_q, p_{kth}) - l$ // $p_{kth}$ is the current kth FN of $b_q$
25: // redundant data points are removed from $\Omega(b_q)$ because they can be kFNs of no query point in $\overline{QC}$
26: for each data point $p \in \Omega(b_q)$ do
27:   if $dist(b_q, p) < sntl\_dist$ then
28:     $\Omega(b_q) \leftarrow \Omega(b_q) \setminus \{p\}$ // p cannot be a candidate data point for $\overline{QC}$ and is removed from $\Omega(b_q)$
29: return $\Omega(b_q)$ // $\Omega(b_q)$ is returned after candidate data points are collected for query points in $\overline{QC}$
Algorithm 4 describes how a query point q in $\overline{QC}$ retrieves the k FNs for q among the candidate data points in $\Omega(B(\overline{QC}))$. First, $\Omega(q)$ is initialized to the empty set. The distance between q and a candidate data point p is computed for the two cases $p \in \overline{QC}$ and $p \notin \overline{QC}$. If p is inside $\overline{QC}$, i.e., $p \in \overline{QC}$, the distance from q to p is simply computed using a graph search algorithm [39]. Otherwise (i.e., $p \notin \overline{QC}$), the distance evaluates to $dist(q, p) = \min\{dist(q, b_q) + dist(b_q, p) \mid b_q \in B(\overline{QC})\}$, because the shortest path from q to p must pass through a border point $b_q$ of $\overline{QC}$, i.e., $q \rightarrow b_q \rightarrow p$. When $dist(q, p)$ has been computed, two conditions are checked to determine whether the data point p is added to $\Omega(q)$: if the cardinality of $\Omega(q)$ is smaller than k, i.e., $|\Omega(q)| < k$, then p is simply added to $\Omega(q)$; otherwise, if p is farther from q than the current kth FN $p_{kth}$ of q, i.e., $dist(q, p) > dist(q, p_{kth})$, then p is added to $\Omega(q)$ and $p_{kth}$ is removed from $\Omega(q)$. Finally, when every candidate data point has been explored, the kFN query result for q, $\Omega(q)$, is returned.
Algorithm 4 retrieve_kFN(k, q, $\Omega(B(\overline{QC}))$)
Input: k: number of FNs for q, q: query point in $\overline{QC}$, and $\Omega(B(\overline{QC}))$: set of candidate data points for q
Output: $\Omega(q)$: set of k FNs for q
1: $\Omega(q) \leftarrow \emptyset$ // $\Omega(q)$ is initialized to the empty set
2: // $\Omega(B(\overline{QC}))$ is the set of candidate data points for q
3: for each candidate data point $p \in \Omega(B(\overline{QC}))$ do
4:   if p is inside $\overline{QC}$ then
5:     $dist(q, p)$ is computed using a graph search algorithm such as Dijkstra's algorithm [39]
6:   else
7:     $dist(q, p) \leftarrow \min\{dist(q, b_q) + dist(b_q, p) \mid b_q \in B(\overline{QC})\}$ // note that $dist(b_q, p)$ was computed in Algorithm 3
8:   // p is added to $\Omega(q)$ if it satisfies either of the two conditions below
9:   if $|\Omega(q)| < k$ then
10:     $\Omega(q) \leftarrow \Omega(q) \cup \{p\}$
11:   else if $|\Omega(q)| = k$ and $dist(q, p) > dist(q, p_{kth})$ then
12:     // note that $p_{kth}$ is the current kth FN of q
13:     $\Omega(q) \leftarrow \Omega(q) \cup \{p\} \setminus \{p_{kth}\}$
14: return $\Omega(q)$
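Algorithm 4 amounts to keeping the k candidates with the largest distances to q. A compact C++ rendering using a min-heap keyed on distance (so the current kth FN sits at the root and is evicted first) is sketched below; the dist_q callback stands in for lines 4–7, and every name is an assumption.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Point = int;

// Keep the k candidates farthest from q. The caller supplies dist_q, which
// computes dist(q, p) either by graph search (p inside the query cluster) or
// as the minimum over the cluster's border points (lines 4-7 of Algorithm 4).
std::vector<Point> retrieve_kfn(int k, Point q, const std::vector<Point>& candidates,
                                const std::function<double(Point, Point)>& dist_q) {
    using Entry = std::pair<double, Point>;           // (dist(q, p), p)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (Point p : candidates) {
        double d = dist_q(q, p);
        if (static_cast<int>(heap.size()) < k) {      // |Omega(q)| < k: insert p
            heap.push({d, p});
        } else if (!heap.empty() && d > heap.top().first) {  // p farther than p_kth
            heap.pop();                               // remove the current kth FN
            heap.push({d, p});
        }
    }
    std::vector<Point> kfn;                           // extracted in increasing distance
    while (!heap.empty()) { kfn.push_back(heap.top().second); heap.pop(); }
    return kfn;
}
```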
Lemma 1 proves that each query point q in a query cluster $\overline{QC}$ can retrieve the k FNs for q among the candidate data points in $\Omega(B(\overline{QC}))$.
Lemma 1.
Each query point q in a query cluster $\overline{QC}$ can retrieve the k FNs for q among the candidate data points in $\Omega(B(\overline{QC}))$.
Proof. 
Lemma 1 is proved by contradiction. Assume that there is a qualified data point p such that $p \in \Omega(q)$ and $p \notin \Omega(B(\overline{QC}))$. The qualified data point p is farther from q than the kth FN $p_{kth}$ of a border point $b_q$ of $\overline{QC}$, which means that $dist(q, p) > dist(q, p_{kth})$. According to Algorithm 3, it holds that $dist(q, b_q) \leq l$ and $dist(b_q, p_{kth}) - dist(b_q, p) > l$, where $l = \max\{dist(b_{q_i}, b_{q_j}) \mid b_{q_i}, b_{q_j} \in B(\overline{QC})\}$. Thus, the distance from q to p via $b_q$ is smaller than the distance from $b_q$ to $p_{kth}$, i.e., $dist(q, b_q) + dist(b_q, p) < dist(b_q, p_{kth})$, because $dist(q, b_q) \leq l$ and $dist(b_q, p_{kth}) > dist(b_q, p) + l$ are given. Clearly, $dist(q, b_q) + dist(b_q, p) < dist(b_q, p_{kth})$ implies that $dist(q, p) < dist(q, p_{kth})$, which contradicts the assumption that $dist(q, p) > dist(q, p_{kth})$. Therefore, each query point q in a query cluster $\overline{QC}$ can retrieve the k FNs for q among the candidate data points in $\Omega(B(\overline{QC}))$. □
The CNLJ and nonclustering join algorithms for spatial networks have different time complexities, as shown in Table 4. Notably, the CNLJ algorithm is orthogonal to the kFN query processing algorithm, which can easily be incorporated into the CNLJ algorithm; for simplicity, the simple solution for finding the k FNs of a single query point is used in this analysis. The time complexity of processing one kFN query is $O(|E| + |V| \log |V| + |P| \log |P|)$. The CNLJ algorithm evaluates at most $M \cdot |\overline{Q}|$ kFN queries, where M is the maximum number of border points of a query cluster, i.e., $M = \max\{|B(\overline{QC})| \mid \overline{QC} \in \overline{Q}\}$. The nonclustering join algorithm evaluates $|Q|$ kFN queries, because a kFN query must be evaluated for each query point separately. Thus, treating M as a small constant, the time complexities of the CNLJ and nonclustering join algorithms are $O(|\overline{Q}| \cdot (|E| + |V| \log |V| + |P| \log |P|))$ and $O(|Q| \cdot (|E| + |V| \log |V| + |P| \log |P|))$, respectively. These theoretical results imply that the CNLJ algorithm runs faster than the nonclustering join algorithm, particularly when $|\overline{Q}| \ll |Q|$, i.e., when the query points are densely clustered, and that it exhibits performance similar to the nonclustering join algorithm when $|\overline{Q}| \approx |Q|$, i.e., when the query points are not clustered.

4.2. Evaluating kFN Queries at Border Points

The CNLJ algorithm evaluates kFN queries at the border points of the query clusters. For the example kFN join query, the CNLJ algorithm evaluates kFN queries at the border points $q_1$, $v_1$, and $v_2$ rather than at the query points $q_1$, $q_2$, $q_3$, and $q_4$. First, the kFN query is evaluated at the border point $q_1$. The maximum and minimum distances between $q_1$ and each data cluster in $\overline{P}$ are computed, and the data clusters are sorted in descending order of their maximum distance to $q_1$. As shown in Figure 11, the two data clusters $\{\overline{p_1 p_2 p_3}\}$ and $\{\overline{p_4 p_5}, \overline{p_5 p_6}\}$ are arranged by their maximum distance to $q_1$ as $\overline{P} = \{\{\overline{p_1 p_2 p_3}\}, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}\}$, because $maxdist(q_1, \{\overline{p_1 p_2 p_3}\}) = 28$ and $maxdist(q_1, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}) = 11$, as described in Table 3. The border point $q_1$ thus investigates $\{\overline{p_1 p_2 p_3}\}$ first. After exploring $\{\overline{p_1 p_2 p_3}\}$, $q_1$ selects $p_2$ and $p_3$ as its two FNs, because $dist(q_1, p_1) = 24$, $dist(q_1, p_2) = 25$, and $dist(q_1, p_3) = 27$ are computed, as shown in Figure 5. The sentinel distance for $q_1$ is $sntl\_dist = 20$, because the maximum distance l between the border points of $\overline{QC}$ is $l = dist(q_1, v_1) = 5$, whereas the distance from $q_1$ to its second FN $p_2$ is $dist(q_1, p_2) = 25$. The set of candidate data points for the query points in $\overline{QC}$ is $\Omega(q_1) = \{p_1, p_2, p_3\}$, because $dist(q_1, p_1) \geq sntl\_dist$, $dist(q_1, p_2) \geq sntl\_dist$, and $dist(q_1, p_3) \geq sntl\_dist$. Clearly, $q_1$ does not examine the other data cluster $\{\overline{p_4 p_5}, \overline{p_5 p_6}\}$, because $sntl\_dist$ is larger than $maxdist(q_1, \{\overline{p_4 p_5}, \overline{p_5 p_6}\}) = 11$, as shown in Table 3.
The kFN queries at the other border points $v_1$ and $v_2$ are evaluated similarly. The two FNs of $v_1$ are $p_2$ and $p_3$, as illustrated in Figures 7 and 8. Thus, the set of candidate data points at $v_1$ is $\Omega(v_1) = \{p_1, p_2, p_3\}$, because the sentinel distance for $v_1$ is $sntl\_dist(v_1) = 15$. Similarly, the two FNs of $v_2$ are $p_1$ and $p_2$, as illustrated in Figures 9 and 10. Thus, the set of candidate data points at $v_2$ is $\Omega(v_2) = \{p_1, p_2, p_3\}$, because the sentinel distance for $v_2$ is $sntl\_dist(v_2) = 18$. Table 5 summarizes the sets of candidate data points for the border points $q_1$, $v_1$, and $v_2$ and their sentinel distances.

4.3. Evaluating an Example kFN Join Query

The CNLJ algorithm retrieves the k FNs for each query point in $\overline{QC}$ among the candidate data points in $\Omega(B(\overline{QC}))$. The example kFN join query requires two FNs for each query point, i.e., $k = 2$, and the set of candidate data points is $\Omega(B(\overline{QC})) = \{p_1, p_2, p_3\}$. Each of $q_1$, $q_2$, $q_3$, and $q_4$ retrieves its two FNs among the candidate data points $p_1$, $p_2$, and $p_3$, one query point at a time. Let us find the two FNs for $q_1$. The distances from $q_1$ to $p_1$, $p_2$, and $p_3$ are computed using the fact that the shortest path from $q_1$ to a candidate data point passes through a border point $b_q$. Accordingly, the length of the shortest path from $q_1$ to $p_1$ is $dist(q_1, p_1) = \min\{dist(q_1, b_q) + dist(b_q, p_1) \mid b_q \in \{q_1, v_1, v_2\}\} = \min\{dist(q_1, q_1) + dist(q_1, p_1),\ dist(q_1, v_1) + dist(v_1, p_1),\ dist(q_1, v_2) + dist(v_2, p_1)\} = \min\{24, 24, 27\} = 24$. Similarly, the distances from $q_1$ to $p_2$ and $p_3$ evaluate to $dist(q_1, p_2) = \min\{dist(q_1, b_q) + dist(b_q, p_2) \mid b_q \in \{q_1, v_1, v_2\}\} = 25$ and $dist(q_1, p_3) = \min\{dist(q_1, b_q) + dist(b_q, p_3) \mid b_q \in \{q_1, v_1, v_2\}\} = 27$. Thus, $p_2$ and $p_3$ are the two FNs for $q_1$, whose result set is $\Omega(q_1) = \{p_2, p_3\}$. Next, the two FNs for $q_2$ are retrieved among the candidate data points $p_1$, $p_2$, and $p_3$. As shown in Table 6, the distances from $q_2$ to $p_1$, $p_2$, and $p_3$ evaluate to $dist(q_2, p_1) = 25$, $dist(q_2, p_2) = 26$, and $dist(q_2, p_3) = 25$, respectively. Thus, $p_1$ and $p_2$ are the two FNs for $q_2$, whose result set is $\Omega(q_2) = \{p_1, p_2\}$. The two FNs for $q_3$ and $q_4$ are retrieved among the candidate data points in the same manner. Table 6 lists the distance from each query point $q \in \{q_1, q_2, q_3, q_4\}$ to each candidate data point $p \in \{p_1, p_2, p_3\}$ and the two FNs retrieved for each query point. Finally, the kFN join result is the union of the kFN query results for the query points in Q: $\Omega(Q) = \Omega(q_1) \cup \Omega(q_2) \cup \Omega(q_3) \cup \Omega(q_4) = \{\langle q_1, \{p_2, p_3\}\rangle, \langle q_2, \{p_1, p_2\}\rangle, \langle q_3, \{p_2, p_3\}\rangle, \langle q_4, \{p_2, p_3\}\rangle\}$.

5. Performance Evaluation

The CNLJ algorithm and its competitors are compared empirically in this section under a variety of conditions. Section 5.1 describes the experimental conditions, and Section 5.2 reports the results of the experiment.

5.1. Experimental Settings

Table 7 describes the two real-world roadmaps [40] used in the experiments. These roadmaps have different sizes and are parts of the road network of the United States. For convenience, the data universe was normalized to the unit square. The query and data points were generated to mimic the highly skewed distributions of POIs in the real world. First, the centroids $c_1, c_2, \ldots, c_m$ were chosen randomly inside the data universe, where m indicates the total number of centroids and varies between 1 and 10. The query and data points around each centroid follow a normal distribution whose mean is the centroid and whose standard deviation is $\sigma = 10^{-2}$. Table 8 shows the experimental parameter settings. In each experiment, a single parameter was varied within its range while the other parameters were kept at their default values (shown in bold).
The baseline algorithm, a nonclustering join algorithm that sequentially computes the k FNs of each query point in Q, was used as a benchmark for evaluating the CNLJ algorithm. We implemented and evaluated two versions of the proposed solution, CNLJ_NV and CNLJ_OPT. The naive version, CNLJ_NV, groups query points into query segments, as illustrated in Figure 3a; thus, CNLJ_NV evaluates at most two kFN queries per query segment. The optimized version, CNLJ_OPT, groups query points into query clusters using the two-step clustering method, as illustrated in Figure 3b. The source code for the empirical evaluation in this study is available on GitHub at https://github.com/Hyung-Ju-Cho/ (accessed on 8 February 2021). All join algorithms were implemented in C++ in the Microsoft Visual Studio 2019 development environment, and common subroutines were reused across the algorithms for similar tasks. The experiments were conducted on a desktop computer running Windows 10 with 32 GB of RAM and an 8-core processor (i9-9900) at 3.1 GHz. As in several existing studies of online map services [36,41], this empirical study assumes that the indexing structures of all algorithms reside in main memory so that kFN join queries can be evaluated quickly. The average time required to answer kFN join queries was measured over repeated experiments. Finally, the network distance between two points was computed quickly using the TNR method [42], which is easy to implement and demonstrates performance comparable to other shortest distance algorithms [38,41,43,44,45].
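As a hedged illustration of the workload generator described above (centroids uniform in the unit square; points normally distributed around a randomly chosen centroid with σ = 10⁻²), a sketch follows; mapping the generated coordinates onto road network vertices is omitted, and all names are assumptions.

```cpp
#include <algorithm>
#include <random>
#include <vector>

struct XY { double x, y; };

// Generate n points around m random centroids in the unit square, mimicking
// the skewed POI distributions used in the experiments (sigma = 0.01).
std::vector<XY> generate_points(int n, int m, double sigma = 0.01, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::uniform_int_distribution<int> pick(0, m - 1);
    std::vector<XY> centroids(m);
    for (auto& c : centroids) c = {uni(gen), uni(gen)};  // centroids chosen uniformly
    std::vector<XY> pts(n);
    for (auto& p : pts) {
        const XY& c = centroids[pick(gen)];              // each point picks a centroid
        std::normal_distribution<double> nx(c.x, sigma), ny(c.y, sigma);
        p = {std::clamp(nx(gen), 0.0, 1.0),              // keep points inside the
             std::clamp(ny(gen), 0.0, 1.0)};             // normalized data universe
    }
    return pts;
}
```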

5.2. Experimental Results

Figure 12 compares the proposed CNLJ_OPT and CNLJ_NV algorithms and the baseline algorithm on the NA roadmap. Each chart shows the kFN join query processing time and the number of kFN queries required to evaluate the kFN join query; in Figures 12–14, the numbers of kFN queries required by the CNLJ_OPT, CNLJ_NV, and baseline algorithms are shown in parentheses. Note that the CNLJ_OPT algorithm evaluates kFN queries at the border points of query clusters, the CNLJ_NV algorithm evaluates kFN queries at the endpoints of query segments, and the baseline algorithm evaluates kFN queries at the query points themselves. As a result, the baseline algorithm evaluates as many kFN queries as there are query points, $|Q|$. Figure 12a shows the query processing times of the three algorithms when the number of query points varies between 1000 and 5000, i.e., $1000 \le |Q| \le 5000$. For all values of $|Q|$, the CNLJ_OPT algorithm is faster than the CNLJ_NV and baseline algorithms. When $|Q| = 5000$, the CNLJ_OPT, CNLJ_NV, and baseline algorithms evaluate 281, 471, and 5000 kFN queries, respectively, and the CNLJ_OPT algorithm is accordingly 1.2 and 36.7 times faster than the CNLJ_NV and baseline algorithms, respectively. Figure 12b shows the query processing times when the number of data points varies from 1000 to 5000, i.e., $1000 \le |P| \le 5000$. Regardless of the $|P|$ value, the CNLJ_OPT, CNLJ_NV, and baseline algorithms evaluate 58, 96, and 1000 kFN queries, respectively. When $|P| = 3000$, the CNLJ_OPT algorithm outperforms the CNLJ_NV and baseline algorithms by 1.2 and 15.6 times, respectively. Figure 12c shows the query processing times when the number of required FNs varies between 1 and 16, i.e., $1 \le k \le 16$. For all values of k, the CNLJ_OPT algorithm outperforms the CNLJ_NV and baseline algorithms by 1.2 and 13.4 times, respectively. The query processing times of all three algorithms are unaffected by the k value, because the kFN query evaluation computes the distances from a query point to the data clusters regardless of the k value and then sorts the data clusters by these distances. Figure 12d shows the query processing times when the number of centroids for the query points in Q varies between 1 and 10, i.e., $1 \le |C_Q| \le 10$. As the $|C_Q|$ value increases, the difference between the query processing times of the algorithms decreases: the CNLJ_OPT algorithm is 13.3, 1.3, 1.6, 1.7, and 1.1 times faster than the baseline algorithm when $|C_Q|$ = 1, 3, 5, 7, and 10, respectively. The reason is that, as the $|C_Q|$ value increases, the query points become widely dispersed and the number of query clusters increases, which slows down the CNLJ_OPT algorithm. Figure 12e shows the query processing times when the number of centroids for the data points in P varies between 1 and 10, i.e., $1 \le |C_P| \le 10$. The kFN query processing time increases with the $|C_P|$ value, because as the $|C_P|$ value increases, the data points become widely dispersed and the number of data clusters to be examined by each kFN query also increases. To summarize, the CNLJ_OPT algorithm outperforms the CNLJ_NV and baseline algorithms in all cases, confirming that it benefits from clustering nearby query points and retrieving candidate data points for them all at once.
Figure 13 compares the performance of the CNLJ_OPT, CNLJ_NV, and baseline algorithms on the SJ roadmap. The experimental results on the SJ roadmap exhibit performance patterns similar to those on the NA roadmap. Figure 13a shows the query processing times when $1000 \le |Q| \le 5000$; the CNLJ_OPT algorithm is 1.2 and 6.0 times faster than the CNLJ_NV and baseline algorithms, respectively, when $|Q| = 5000$. Figure 13b shows the query processing times when $1000 \le |P| \le 5000$; the CNLJ_OPT algorithm is 1.2 and 4.5 times faster than the CNLJ_NV and baseline algorithms, respectively, when $|P| = 4000$, and the query processing times of all algorithms increase with the $|P|$ value. Figure 13c shows the query processing times when $1 \le k \le 16$; the CNLJ_OPT algorithm is 1.2 and 3.5 times faster than the CNLJ_NV and baseline algorithms, respectively, and the query processing times are nearly constant regardless of the k value. Figure 13d shows the query processing times when $1 \le |C_Q| \le 10$; the CNLJ_OPT algorithm is 3.5, 2.4, 1.8, 1.4, and 1.4 times faster than the baseline algorithm when $|C_Q|$ = 1, 3, 5, 7, and 10, respectively. This result shows that the distribution of the query points affects the query processing time of the CNLJ_OPT algorithm: when the query points are widely dispersed, the number of query clusters grows and the query processing time of the CNLJ_OPT algorithm increases. Figure 13e shows the query processing times when $1 \le |C_P| \le 10$; the CNLJ_OPT algorithm is 2.8, 4.0, 3.5, 2.9, and 3.0 times faster than the baseline algorithm when $|C_P|$ = 1, 3, 5, 7, and 10, respectively.
Figure 14 compares the performance of the CNLJ_OPT, CNLJ_NV, and baseline algorithms while the numbers of query and data points vary between 1000 and 10,000, i.e., $1000 \le |Q| \le 10{,}000$ and $1000 \le |P| \le 10{,}000$, to verify the scalability of the CNLJ_OPT algorithm. As shown in Figure 14a,c, the CNLJ_OPT algorithm runs faster than the CNLJ_NV and baseline algorithms for all values of $|Q|$, and the performance difference between them typically increases with $|Q|$. Specifically, when $|Q| = 10{,}000$, the CNLJ_OPT algorithm runs 36.6 and 5.3 times faster than the baseline algorithm on the NA and SJ roadmaps, respectively. As shown in Figure 14b,d, the CNLJ_OPT algorithm runs faster than the CNLJ_NV and baseline algorithms for all values of $|P|$. Specifically, when $|P| = 10{,}000$, the CNLJ_OPT algorithm runs 6.4 and 3.0 times faster than the baseline algorithm on the NA and SJ roadmaps, respectively. These experimental results confirm that the CNLJ_OPT algorithm scales better with both $|Q|$ and $|P|$ than the CNLJ_NV and baseline algorithms.

6. Discussion and Conclusions

Given a positive integer k, a set of query points Q, and a set of data points P, the kFN join query pairs each query point in Q with its k FNs in P. The kFN join query has various real-life applications, including recommender systems and computational geometry [6,7,8,9,10,11,12,13,14]. In particular, efficient processing of kFN join queries can aid in selecting a facility location that is farthest away from unpleasant facilities such as garbage incinerators, crematoriums, and chemical plants. In this study, a cluster nested loop join (CNLJ) algorithm was developed to efficiently answer kFN join queries in spatial networks; to the best of our knowledge, this is the first attempt to study kFN join queries in spatial networks. The CNLJ algorithm converts query points (data points) into query clusters (data clusters). It then retrieves candidate data points for the clustered query points all at once, eliminating the need to search for candidate data points for each query point separately. The query processing times of the CNLJ algorithm and the conventional join algorithms were compared empirically using real-life roadmaps under various conditions. The experimental results demonstrated that the CNLJ algorithm runs up to 50.8 times faster than the conventional join algorithms and scales better with the numbers of both data and query points. However, the CNLJ algorithm shows performance similar to the conventional join algorithms when the query points are uniformly distributed over the region. In future work, we intend to apply the proposed solution to various fields. First, when the dataset does not fit in main memory, we will build index structures on external memory. Second, we will conduct an empirical study that simulates real-life scenarios using real datasets. Third, we will improve the CNLJ algorithm for the efficient processing of kFN joins over query points that are uniformly scattered over the region.

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2020R1I1A3052713).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The author thanks the anonymous reviewers for their very useful comments and suggestions.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Said, A.; Kille, B.; Jain, B.J.; Albayrak, S. Increasing diversity through furthest neighbor-based recommendation. In Proceedings of the International Workshop on Diversity in Document Retrieval, Seattle, WA, USA, 12 February 2012; pp. 1–4.
2. Said, A.; Fields, B.; Jain, B.J.; Albayrak, S. User-centric evaluation of a k-furthest neighbor collaborative filtering recommender algorithm. In Proceedings of the International Conference on Computer Supported Cooperative Work and Social Computing, San Antonio, TX, USA, 23–27 February 2013; pp. 1399–1408.
3. Veenman, C.J.; Reinders, M.J.T.; Backer, E. A maximum variance cluster algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 1273–1280.
4. Defays, D. An efficient algorithm for a complete link method. Comput. J. 1977, 20, 364–366.
5. Vasiloglou, N.; Gray, A.G.; Anderson, D.V. Scalable semidefinite manifold learning. In Proceedings of the IEEE Workshop on Machine Learning for Signal Processing, Cancun, Mexico, 16–19 October 2008; pp. 368–373.
6. Curtin, R.R.; Echauz, J.; Gardner, A.B. Exploiting the structure of furthest neighbor search for fast approximate results. Inf. Syst. 2019, 80, 124–135.
7. Gao, Y.; Shou, L.; Chen, K.; Chen, G. Aggregate farthest-neighbor queries over spatial data. In Proceedings of the International Conference on Database Systems for Advanced Applications, Hong Kong, China, 22–25 April 2011; pp. 149–163.
8. Liu, J.; Chen, H.; Furuse, K.; Kitagawa, H. An efficient algorithm for arbitrary reverse furthest neighbor queries. In Proceedings of the Asia-Pacific Web Conference on Web Technologies and Applications, Kunming, China, 11–13 April 2012; pp. 60–72.
9. Liu, W.; Yuan, Y. New ideas for FN/RFN queries based nearest Voronoi diagram. In Proceedings of the International Conference on Bio-Inspired Computing: Theories and Applications, Huangshan, China, 12–14 July 2013; pp. 917–927.
10. Tran, Q.T.; Taniar, D.; Safar, M. Reverse k nearest neighbor and reverse farthest neighbor search on spatial networks. Trans. Large-Scale Data-Knowl.-Cent. Syst. 2009, 1, 353–372.
11. Wang, H.; Zheng, K.; Su, H.; Wang, J.; Sadiq, S.W.; Zhou, X. Efficient aggregate farthest neighbour query processing on road networks. In Proceedings of the Australasian Database Conference on Databases Theory and Applications, Brisbane, Australia, 14–16 July 2014; pp. 13–25.
12. Xiao, Y.; Liu, B.; Hao, Z.; Cao, L. A k-farthest-neighbor-based approach for support vector data description. Appl. Intell. 2014, 41, 196–211.
13. Xu, X.-J.; Bao, J.-S.; Yao, B.; Zhou, J.-Y.; Tang, F.-L.; Guo, M.-Y.; Xu, J.-Q. Reverse furthest neighbors query in road networks. J. Comput. Sci. Technol. 2017, 32, 155–167.
14. Yao, B.; Li, F.; Kumar, P. Reverse furthest neighbors in spatial databases. In Proceedings of the International Conference on Data Engineering, Shanghai, China, 29 March–2 April 2009; pp. 664–675.
15. Dutta, B.; Karmakar, A.; Roy, S. Optimal facility location problem on polyhedral terrains using descending paths. Theor. Comput. Sci. 2020, 847, 68–75.
16. Gao, X.; Park, C.; Chen, X.; Xie, E.; Huang, G.; Zhang, D. Globally optimal facility locations for continuous-space facility location problems. Appl. Sci. 2021, 11, 7321.
17. Liu, W.; Wang, H.; Zhang, Y.; Qin, L.; Zhang, W. I/O efficient algorithm for c-approximate furthest neighbor search in high-dimensional space. In Proceedings of the International Conference on Database Systems for Advanced Applications, Jeju, Korea, 24–27 September 2020; pp. 221–236.
18. Huang, Q.; Feng, J.; Fang, Q.; Ng, W. Two efficient hashing schemes for high-dimensional furthest neighbor search. IEEE Trans. Knowl. Data Eng. 2017, 29, 2772–2785.
19. Liu, Y.; Gong, X.; Kong, D.; Hao, T.; Yan, X. A Voronoi-based group reverse k farthest neighbor query method in the obstacle space. IEEE Access 2020, 8, 50659–50673.
20. Pagh, R.; Silvestri, F.; Sivertsen, J.; Skala, M. Approximate furthest neighbor in high dimensions. In Proceedings of the International Conference on Similarity Search and Applications, Glasgow, UK, 12–14 October 2015; pp. 3–14.
21. Korn, F.; Muthukrishnan, S. Influence sets based on reverse nearest neighbor queries. In Proceedings of the International Conference on Management of Data, Dallas, TX, USA, 16–18 May 2000; pp. 201–212.
22. Wang, S.; Cheema, M.A.; Lin, X.; Zhang, Y.; Liu, D. Efficiently computing reverse k furthest neighbors. In Proceedings of the International Conference on Data Engineering, Helsinki, Finland, 16–20 May 2016; pp. 1110–1121.
23. Beckmann, N.; Kriegel, H.-P.; Schneider, R.; Seeger, B. The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of the International Conference on Management of Data, Atlantic City, NJ, USA, 23–25 May 1990; pp. 322–331.
24. Guttman, A. R-trees: A dynamic index structure for spatial searching. In Proceedings of the International Conference on Management of Data, Boston, MA, USA, 18–21 June 1984; pp. 47–57.
25. Huang, Q.; Feng, J.; Fang, Q. Reverse query-aware locality-sensitive hashing for high-dimensional furthest neighbor search. In Proceedings of the International Conference on Data Engineering, San Diego, CA, USA, 19–22 April 2017; pp. 167–170.
26. Lu, H.; Yiu, M.L. On computing farthest dominated locations. IEEE Trans. Knowl. Data Eng. 2011, 23, 928–941.
27. Cho, H.-J. Efficient shared execution processing of k-nearest neighbor joins in road networks. Mob. Inf. Syst. 2018, 2018, 55–66.
28. He, D.; Wang, S.; Zhou, X.; Cheng, R. GLAD: A grid and labeling framework with scheduling for conflict-aware kNN queries. IEEE Trans. Knowl. Data Eng. 2021, 33, 1554–1566.
29. Yang, R.; Niu, B. Continuous k nearest neighbor queries over large-scale spatial-textual data streams. ISPRS Int. J. Geo-Inf. 2020, 9, 694.
30. Cho, H.-J.; Attique, M. Group processing of multiple k-farthest neighbor queries in road networks. IEEE Access 2020, 8, 110959–110973.
31. Reza, R.M.; Ali, M.E.; Hashem, T. Group processing of simultaneous shortest path queries in road networks. In Proceedings of the International Conference on Mobile Data Management, Pittsburgh, PA, USA, 15–18 June 2015; pp. 128–133.
32. Zhang, M.; Li, L.; Hua, W.; Zhou, X. Efficient batch processing of shortest path queries in road networks. In Proceedings of the International Conference on Mobile Data Management, Hong Kong, China, 10–13 June 2019; pp. 100–105.
33. Zhang, M.; Li, L.; Hua, W.; Zhou, X. Batch processing of shortest path queries in road networks. In Proceedings of the Australasian Database Conference on Databases Theory and Applications, Sydney, Australia, 29 January–1 February 2019; pp. 3–16.
34. Reza, R.M.; Ali, M.E.; Cheema, M.A. The optimal route and stops for a group of users in a road network. In Proceedings of the International Conference on Advances in Geographic Information Systems, Redondo Beach, CA, USA, 7–10 November 2017; pp. 1–10.
35. Kim, T.; Cho, H.-J.; Hong, H.J.; Nam, H.; Cho, H.; Do, G.Y.; Jeon, P. Efficient processing of k-farthest neighbor queries for road networks. J. Korea Soc. Comput. Inf. 2019, 24, 79–89.
36. Abeywickrama, T.; Cheema, M.A.; Taniar, D. k-nearest neighbors on road networks: A journey in experimentation and in-memory implementation. In Proceedings of the International Conference on Very Large Data Bases, New Delhi, India, 5–9 September 2016; pp. 492–503.
37. Lee, K.C.K.; Lee, W.-C.; Zheng, B.; Tian, Y. ROAD: A new spatial object search framework for road networks. IEEE Trans. Knowl. Data Eng. 2012, 24, 547–560.
38. Zhong, R.; Li, G.; Tan, K.-L.; Zhou, L.; Gong, Z. G-tree: An efficient and scalable index for spatial search on road networks. IEEE Trans. Knowl. Data Eng. 2015, 27, 2175–2189.
39. Cormen, T.H.; Leiserson, C.E.; Rivest, R.L.; Stein, C. Introduction to Algorithms, 3rd ed.; MIT Press and McGraw-Hill: Cambridge, MA, USA, 2009; pp. 643–683.
40. Real Datasets for Spatial Databases. Available online: https://www.cs.utah.edu/~lifeifei/SpatialDataset.htm (accessed on 4 October 2021).
41. Wu, L.; Xiao, X.; Deng, D.; Cong, G.; Zhu, A.D.; Zhou, S. Shortest path and distance queries on road networks: An experimental evaluation. In Proceedings of the International Conference on Very Large Data Bases, Istanbul, Turkey, 27–31 August 2012; pp. 406–417.
42. Bast, H.; Funke, S.; Matijevic, D. Ultrafast shortest-path queries via transit nodes. In Proceedings of the International Workshop on Shortest Path Problem, Piscataway, NJ, USA, 13–14 November 2006; pp. 175–192.
43. Geisberger, R.; Sanders, P.; Schultes, D.; Delling, D. Contraction hierarchies: Faster and simpler hierarchical routing in road networks. In Proceedings of the International Workshop on Experimental Algorithms, Cape Cod, MA, USA, 30 May–2 June 2008; pp. 319–333.
44. Li, Z.; Chen, L.; Wang, Y. G*-tree: An efficient spatial index on road networks. In Proceedings of the International Conference on Data Engineering, Macao, China, 8–11 April 2019; pp. 268–279.
45. Samet, H.; Sankaranarayanan, J.; Alborzi, H. Scalable network distance browsing in spatial databases. In Proceedings of the International Conference on Management of Data, Vancouver, BC, Canada, 9–12 June 2008; pp. 43–54.
Figure 1. Example of the kFN join of Q with P, where $Q = \{q_1, q_2, q_3\}$ and $P = \{p_1, p_2, p_3, p_4\}$.
Figure 2. Example of the kFN join of Q with P in a spatial network.
Figure 3. Two-step clustering method to group nearby query points into query clusters: (a) converting query points into query segments; (b) converting query segments into query clusters.
Figure 4. Two-step clustering method to group nearby data points into a data cluster: (a) converting data points into data segments; (b) converting data segments into data clusters.
Figure 5. $maxdist(q_1, \{\overline{p_1p_2p_3}\}) = 28$ and $mindist(q_1, \{\overline{p_1p_2p_3}\}) = 24$.
Figure 6. $maxdist(q_1, \{\overline{p_4p_5}, \overline{p_5p_6}\}) = 11$ and $mindist(q_1, \{\overline{p_4p_5}, \overline{p_5p_6}\}) = 5$: (a) $maxdist(q_1, \overline{p_4p_5}) = 8$ and $mindist(q_1, \overline{p_4p_5}) = 5$; (b) $maxdist(q_1, \overline{p_5p_6}) = 11$ and $mindist(q_1, \overline{p_5p_6}) = 8$.
Figure 7. $maxdist(v_1, \{\overline{p_1p_2p_3}\}) = 24$ and $mindist(v_1, \{\overline{p_1p_2p_3}\}) = 19$.
Figure 8. $maxdist(v_1, \{\overline{p_4p_5}, \overline{p_5p_6}\}) = 12$ and $mindist(v_1, \{\overline{p_4p_5}, \overline{p_5p_6}\}) = 9$: (a) $maxdist(v_1, \overline{p_4p_5}) = 11$ and $mindist(v_1, \overline{p_4p_5}) = 9$; (b) $maxdist(v_1, \overline{p_5p_6}) = 12$ and $mindist(v_1, \overline{p_5p_6}) = 9$.
Figure 9. $maxdist(v_2, \{\overline{p_1p_2p_3}\}) = 25.5$ and $mindist(v_2, \{\overline{p_1p_2p_3}\}) = 23$.
Figure 10. $maxdist(v_2, \{\overline{p_4p_5}, \overline{p_5p_6}\}) = 8$ and $mindist(v_2, \{\overline{p_4p_5}, \overline{p_5p_6}\}) = 5$: (a) $maxdist(v_2, \overline{p_4p_5}) = 8$ and $mindist(v_2, \overline{p_4p_5}) = 5$; (b) $maxdist(v_2, \overline{p_5p_6}) = 8$ and $mindist(v_2, \overline{p_5p_6}) = 5$.
Figure 11. Arranging data clusters in decreasing order of their maximum distance to $q_1$.
Figure 12. Comparison of kFN join query processing times for the NA roadmap: (a) $10^3 \leq |Q| \leq 5 \times 10^3$; (b) $10^3 \leq |P| \leq 5 \times 10^3$; (c) $1 \leq k \leq 16$; (d) $1 \leq |C_Q| \leq 10$; (e) $1 \leq |C_P| \leq 10$.
Figure 13. Comparison of kFN join query processing times for the SJ roadmap: (a) $10^3 \leq |Q| \leq 5 \times 10^3$; (b) $10^3 \leq |P| \leq 5 \times 10^3$; (c) $1 \leq k \leq 16$; (d) $1 \leq |C_Q| \leq 10$; (e) $1 \leq |C_P| \leq 10$.
Figure 14. Scalability test: (a) $10^3 \leq |Q| \leq 10^4$ for NA; (b) $10^3 \leq |P| \leq 10^4$ for NA; (c) $10^3 \leq |Q| \leq 10^4$ for SJ; (d) $10^3 \leq |P| \leq 10^4$ for SJ.
Table 1. Classification of related work.
References | Space Domain | Query Type | Data Type
[8,9,14,19] | Euclidean space | RkFN search | Monochromatic
[14,22] | Euclidean space | RkFN search | Bichromatic
[6,9,17,18,20,25] | Euclidean space | kFN search | —
[7] | Euclidean space | AkFN search | —
[26] | Euclidean space | FDL search | —
[13] | Spatial network | RkFN search | Monochromatic
[10,13] | Spatial network | RkFN search | Bichromatic
[35] | Spatial network | kFN search | —
[11] | Spatial network | AkFN search | —
This study | Spatial network | kFN join | —
Table 2. Symbols used in this paper and their meanings.
Symbol | Definition
$k$ | Number of requested FNs
$Q$ and $q$ | A set $Q$ of query points and a query point $q$ in $Q$, respectively
$P$ and $p$ | A set $P$ of data points and a data point $p$ in $P$, respectively
$\overline{v_l v_{l+1} \cdots v_m}$ | Vertex sequence where $v_l$ and $v_m$ are either an intersection vertex or a terminal vertex, and the other vertices $v_{l+1}, \ldots, v_{m-1}$ are intermediate vertices
$\overline{q_i q_{i+1} \cdots q_j}$ | Query segment connecting query points $q_i, q_{i+1}, \ldots, q_j$ in a vertex sequence (abbreviated $\overline{q_i q_j}$)
$\overline{p_l p_{l+1} \cdots p_m}$ | Data segment connecting data points $p_l, p_{l+1}, \ldots, p_m$ in a vertex sequence (abbreviated $\overline{p_l p_m}$)
$\overline{Q_C}$ and $\overline{P_C}$ | Set of query segments and set of data segments, respectively
$\overline{Q}$ and $\overline{P}$ | Set of query clusters and set of data clusters, respectively
$B(\overline{Q_C})$ and $B(\overline{P_C})$ | Sets of border points of $\overline{Q_C}$ and $\overline{P_C}$, respectively
$b_q$ and $b_p$ | Border points of $\overline{Q_C}$ and $\overline{P_C}$, respectively
$\Omega(q)$ | Set of the $k$ data points farthest from a query point $q$
$dist(q, p)$ | Length of the shortest path connecting points $q$ and $p$
$len(\overline{qp})$ | Length of the segment $\overline{qp}$
Table 3. Maximum and minimum distances between border points and data clusters.
$b_q$ | $\{\overline{p_1p_2p_3}\}$ | $\{\overline{p_4p_5}, \overline{p_5p_6}\}$
$q_1$ | $maxdist = 28$, $mindist = 24$ | $maxdist = 11$, $mindist = 5$
$v_1$ | $maxdist = 24$, $mindist = 19$ | $maxdist = 12$, $mindist = 9$
$v_2$ | $maxdist = 25.5$, $mindist = 23$ | $maxdist = 8$, $mindist = 5$
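The bounds in Table 3 can be reproduced from the network distances between a border point and the two end data points of a segment. The sketch below is a reconstruction, not the paper's code: it assumes the standard network-distance bounds for a segment (the minimum is attained at the nearer endpoint, the maximum at the interior point where the routes through the two endpoints meet) and infers the segment length len(p1p3) = 5, since that value makes every row of Table 3 agree with the border-point distances listed in Table 5.

```python
def segment_bounds(d_l, d_m, seg_len):
    """Distance bounds from a border point to a data segment whose two
    end data points are reached at network distances d_l and d_m."""
    mindist = min(d_l, d_m)              # nearer endpoint
    maxdist = (d_l + d_m + seg_len) / 2  # interior point where the two routes meet
    return mindist, maxdist

def cluster_bounds(per_segment_bounds):
    """A data cluster takes the extremes over its member segments,
    as illustrated in Figures 6, 8, and 10."""
    mins, maxs = zip(*per_segment_bounds)
    return min(mins), max(maxs)

# Reproducing the first column of Table 3 with the Table 5 distances,
# assuming len(p1p3) = 5 (inferred; it makes all three rows match):
print(segment_bounds(24, 27, 5))  # q1: (24, 28.0)
print(segment_bounds(19, 24, 5))  # v1: (19, 24.0)
print(segment_bounds(23, 23, 5))  # v2: (23, 25.5)
# Second column for q1, combining the per-segment bounds of Figure 6:
print(cluster_bounds([(5, 8), (8, 11)]))  # (5, 11)
```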
Table 4. Comparison of time complexities of the CNLJ and nonclustering join algorithms.
 | CNLJ Algorithm | Nonclustering Join Algorithm
Number of kFN queries to be evaluated | $M \cdot |\overline{Q}|$ | $|Q|$
Time complexity to evaluate one kFN search | $O(E + V \log V + P \log P)$ | $O(E + V \log V + P \log P)$
Time complexity to evaluate the kFN join | $O(|\overline{Q}| \cdot (E + V \log V + P \log P))$ | $O(|Q| \cdot (E + V \log V + P \log P))$
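To make the magnitude of this difference concrete: in the scalability test with |Q| = 10,000 query points, a nonclustering join must evaluate 10,000 separate kFN queries, whereas the CNLJ algorithm evaluates only M · |Q̄| of them. With, say, |Q̄| = 10 query clusters and M = 3 border points per cluster (the running example of Table 5 evaluates kFN queries at three border points, q1, v1, and v2), that amounts to 30 kFN searches in total, a reduction of more than two orders of magnitude in search invocations.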
Table 5. Results of kFN queries at $q_1$, $v_1$, and $v_2$ and their sentinel distances.
$b_q$ | $dist(b_q, p)$ | $sntl\_dist(b_q)$ | $\Omega(b_q)$
$q_1$ | $dist(q_1, p_1) = 24$, $dist(q_1, p_2) = 25$, $dist(q_1, p_3) = 27$ | $sntl\_dist(q_1) = 20$ | $\Omega(q_1) = \{p_1, p_2, p_3\}$
$v_1$ | $dist(v_1, p_1) = 19$, $dist(v_1, p_2) = 20$, $dist(v_1, p_3) = 24$ | $sntl\_dist(v_1) = 15$ | $\Omega(v_1) = \{p_1, p_2, p_3\}$
$v_2$ | $dist(v_2, p_1) = 23$, $dist(v_2, p_2) = 24$, $dist(v_2, p_3) = 23$ | $sntl\_dist(v_2) = 18$ | $\Omega(v_2) = \{p_1, p_2, p_3\}$
Table 6. Retrieval of two FNs for query points among candidate data points.
$q$ | $dist(q, p)$ | $\Omega(q)$
$q_1$ | $dist(q_1, p_1) = 24$, $dist(q_1, p_2) = 25$, $dist(q_1, p_3) = 27$ | $\Omega(q_1) = \{p_2, p_3\}$
$q_2$ | $dist(q_2, p_1) = 25$, $dist(q_2, p_2) = 26$, $dist(q_2, p_3) = 25$ | $\Omega(q_2) = \{p_1, p_2\}$ or $\Omega(q_2) = \{p_2, p_3\}$
$q_3$ | $dist(q_3, p_1) = 21$, $dist(q_3, p_2) = 22$, $dist(q_3, p_3) = 25$ | $\Omega(q_3) = \{p_2, p_3\}$
$q_4$ | $dist(q_4, p_1) = 21$, $dist(q_4, p_2) = 22$, $dist(q_4, p_3) = 26$ | $\Omega(q_4) = \{p_2, p_3\}$
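The refinement step behind Table 6 is an ordinary top-k selection over the candidate distances. A minimal sketch, with the distances for q2 hard-coded from the table (heapq.nlargest stands in for whatever selection routine an implementation would use):

```python
import heapq

# Candidate distances for q2, hard-coded from Table 6 (k = 2).
dists = {"p1": 25, "p2": 26, "p3": 25}

top2 = heapq.nlargest(2, dists, key=dists.get)
print(top2)  # ['p2', 'p1']; since dist(q2, p1) = dist(q2, p3) = 25,
             # {p2, p3} is an equally valid answer, as Table 6 notes.
```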
Table 7. Real-world roadmaps [40].
Name | Description | Vertices | Edges | Vertex Sequences
NA | Highways in North America | 175,813 | 179,179 | 12,416
SJ | City streets in San Joaquin, California | 18,263 | 23,874 | 20,040
Table 8. Experimental parameter settings.
Parameter | Range
Number of query points ($|Q|$) | 1, 2, 3, 4, 5, 7, 10 ($\times 10^3$)
Number of data points ($|P|$) | 1, 2, 3, 4, 5, 7, 10 ($\times 10^3$)
Number of FNs required ($k$) | 1, 2, 4, 8, 16
Distribution of query and data points | Centroid distribution
Number of centroids for query points in $Q$ ($|C_Q|$) | 1, 3, 5, 7, 10
Number of centroids for data points in $P$ ($|C_P|$) | 1, 3, 5, 7, 10
Standard deviation for the normal distribution ($\sigma$) | $10^{-2}$
Roadmap | NA, SJ
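For reproducibility, the centroid distribution of Table 8 can be approximated by drawing each point from a normal distribution around a uniformly chosen centroid. The generator below is a sketch under stated assumptions (coordinates normalized to the unit square, σ = 10⁻² as in the table, and points not yet snapped onto the road network), not the exact procedure used in the experiments:

```python
import random

def centroid_points(n, num_centroids, sigma=0.01, seed=42):
    """Sketch of a 'centroid distribution' generator (one plausible
    reading of Table 8): coordinates are assumed normalized to [0, 1],
    and each point is a Gaussian perturbation of a uniformly chosen
    centroid. Real experiments would also map points onto the network.
    """
    rng = random.Random(seed)
    centroids = [(rng.random(), rng.random()) for _ in range(num_centroids)]
    points = []
    for _ in range(n):
        cx, cy = rng.choice(centroids)
        x = min(max(rng.gauss(cx, sigma), 0.0), 1.0)  # clamp to the unit square
        y = min(max(rng.gauss(cy, sigma), 0.0), 1.0)
        points.append((x, y))
    return points

# e.g., |Q| = 5000 query points around |C_Q| = 5 centroids
Q = centroid_points(5000, 5)
```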
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
