An Adaptive Sweep-Circle Spatial Clustering Algorithm Based on Gestalt

An adaptive spatial clustering (ASC) algorithm is proposed in this study, which employs sweep-circle techniques and a dynamic threshold setting based on the Gestalt theory to detect spatial clusters. The proposed algorithm can automatically discover clusters in one pass, rather than through the modification of an initial model (for example, a minimal spanning tree, Delaunay triangulation or Voronoi diagram). It can quickly identify arbitrarily-shaped clusters while adapting efficiently to non-homogeneous density characteristics of spatial data, without the need for prior knowledge or parameters. The proposed algorithm is also well suited to data-streaming settings, where dynamic data arrive continuously, and therefore to spatial clustering in large data sets.


Introduction
Rapid advancements in geographic spatial information technology, generation, and collection have created exponential growth in spatial data, which has resulted in increasingly complex data structures. It is increasingly necessary to address the challenges involved in extracting useful information and knowledge from large-scale and highly complex spatial data sets. Mining spatial data sets is an effective way to obtain valuable information, and spatial clustering plays an indispensable role in spatial data mining research. Clustering is the process of grouping spatial data objects into a series of meaningful clusters so that objects within a particular cluster are similar to each other, while being dissimilar to objects in other clusters [1,2]. Spatial point clustering has been applied to a wide variety of fields, including urban planning, remote sensing, geographic information, bio-engineering, geology and minerals, as well as computer science [3][4][5]. The current methods for spatial clustering have been roughly classified into several categories.
These traditional approaches have been successful in managing a number of specific applications across different domains, but significant limitations exist. Most traditional clustering methods rely on user-specified arguments or a priori knowledge. Furthermore, these methods cannot manage clusters of irregular shapes or of different sizes and are not effective in sets with non-uniform inner density, outliers, or noise. In fact, no particular clustering method has been shown to be superior to its competitors with regard to all of the necessary aspects [25,26]. To date, the advantages and disadvantages of various algorithms have been extensively analyzed [26][27][28][29][30][31]. An analysis of the classical spatial clustering algorithms is shown in Table 1.
Spatial data are tied to geographical space and are processed in large volumes. Spatial objects are highly complex and require extensive computation, which means that clustering algorithms need to be highly efficient. Efficient spatial clustering algorithms are valuable for many real-world, dynamic applications [32]. Large data sets are challenging for computational systems when processed with conventional algorithms, particularly as the amount of spatial data increases exponentially in the real world. Popular traditional clustering algorithms require repeated access to the data set as well as multiple clustering operations, which means that their efficiency decreases with an increase in data set size [5,27,33]. This paper proposes an adaptive spatial clustering algorithm (ASC) that employs both sweep-circle techniques and a dynamic threshold setting based on Gestalt theory to detect spatial clusters. Empirical results and a comparison with traditional methods demonstrated that the proposed ASC can automatically discover clusters in one pass, rather than by modifying an initial model (for example, a minimal spanning tree, Delaunay triangulation (DT), or Voronoi diagram). It can quickly identify arbitrarily-shaped clusters and adapts to the non-homogeneous density characteristics of spatial data without the need for prior knowledge or parameters. It is also compatible with the dynamic, large-scale streaming data found in spatial clustering.
The remainder of this paper is organized as follows: In Section 2, the relation of ASC to previous methods is described. In Section 3, the proposed algorithm is explained in detail. Section 4 describes the ASC-based streaming process as applied to large data sets. Section 5 reports our analysis of the algorithm, including its time complexity and comparison with other clustering methods. Section 6 provides an example of the proposed algorithm applied to a real-world data set, while Section 7 concludes with an outlook for further research.

Plane-Sweep Techniques
The plane-sweep is a popular acceleration technique used to solve geometric problems in 2D Euclidean space [34]. This technique first sorts the geometric elements and then imagines a sweep-line that glides over the plane and stops at geometric elements (typically called "event points") [35], where the corresponding data structure is updated. The sweep-line never moves backwards across the event points.
The plane-sweep technique was initially applied to computational geometry problems [36]. Shamos and Hoey later applied a unidirectional plane-sweep algorithm that runs in O(n log n) time to determine whether or not a finite number of line segments in a plane have any intersections [37].
Bentley and Ottmann extended this algorithm from detecting the existence of an intersection to reporting all k intersections of n line segments in O((n + k) log n) time [38]. The sweep-line algorithm was also used to construct a Voronoi diagram, i.e., the dual of the Delaunay triangulation [39]. The Delaunay algorithm examined in this study is based on the plane-scattered point sets used by Žalik [35] and Zhou [40]. Žalik was the first to suggest the use of sweep-line techniques for spatial clustering [41].

Sweep-Circle Algorithm
The sweep-circle is another important sweep technique, where points are initially sorted according to their distances from a fixed pole O inside the convex hull of S. A circle C centered at O is assumed, with its radius increasing from 0 to +∞; the circle stops at event points, where the data structure is updated. The part of the problem that has been swept (inside the circle) is already solved, while the remaining part (outside the circle) is unsolved. Dehne and Klein [42] were the first to use a circle that emanates from a fixed point, which resulted in a Voronoi diagram. Adam [43], in addition to Biniaz and Dastghaibyfard (2012), suggested that the incremental sweep-circle algorithm was more suitable for constructing Delaunay triangulations [42] (Figure 1).

Data Stream Technique
The "data stream" is an unbounded, ordered sequence of information that can arrive consecutively in large quantities. However, a data stream can only be processed sequentially, in the order in which it is accessed. Data mining algorithms based on data streaming techniques are commonly used in obtaining data from satellite remote sensors, geographic information, network monitoring, and financial services. Typical traditional spatial clustering algorithms that repeatedly access entire data sets cannot be readily applied to data streams, as their high complexity and computational cost make it impossible for them to manage such a large amount of data. Data stream clustering algorithms have therefore become important in data mining research, and many algorithms based on data stream technology have been proposed, including the commonly-used one-pass algorithm [9,44–48]. This algorithm divides non-streaming data sets into data blocks that fit the available memory and sweeps the data objects in one pass. A traditional clustering algorithm can be applied to the data-streaming environment once the data blocks have arrived from the data stream [49]. For example, the K-Means and K-Medians algorithms [44] can be used to process large data sets, and the Squeezer algorithm allocates the data into similar groups for clustering using one-pass sweeping [46]. The BIRCH algorithm uses a clustering feature tree to minimize I/O requests prior to the one-pass sweeping for clustering [9]. Guha et al. also conducted valuable research on the one-pass algorithm using similar data sets [44,47].
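The block-wise, one-pass access pattern described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited algorithms: the helper names `iter_blocks` and `one_pass_centroid` and the parameter `block_size` are hypothetical, and the running centroid merely stands in for whatever per-block clustering step a real method would apply.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, float]


def iter_blocks(stream: Iterable[Point], block_size: int) -> Iterator[List[Point]]:
    """Yield consecutive blocks of at most block_size points.

    Each point is read exactly once, so the stream is swept in one pass
    and only one block is held in memory at a time.
    """
    it = iter(stream)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def one_pass_centroid(stream: Iterable[Point], block_size: int = 1000) -> Point:
    """Stand-in for a per-block step: maintain a running centroid."""
    n = 0
    sum_x = sum_y = 0.0
    for block in iter_blocks(stream, block_size):
        n += len(block)
        sum_x += sum(x for x, _ in block)
        sum_y += sum(y for _, y in block)
    if n == 0:
        raise ValueError("empty stream")
    return (sum_x / n, sum_y / n)
```

Because the state carried between blocks is only a count and two sums, memory use depends on `block_size` rather than on the length of the stream, which is the property that makes one-pass methods attractive for large data sets.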

Sweep-Line Clustering Algorithm
Žalik (2009) proposed an innovative, agglomerative hierarchical clustering algorithm for spatial data that uses a sweep-line and runs in O(n log n) time in the worst case. This algorithm does not rely on domain knowledge or modification of an initial model. Furthermore, it can determine clusters of arbitrary shapes when completing spatial clustering of large data sets. The algorithm uses two horizontal sweep-lines, $S_1$ and $S_2$, where $S_2$ trails $S_1$ at distance d. Assume that $S_1$ has already swept the points up to $p_{i-1}$ and grouped them into partial clusters in accordance with the proximity parameter d, while the points of the front line (AF) are sorted by their x coordinates. When $S_1$ encounters point $p_i$, $p_i$ is projected onto the AF, and a cluster is found by comparing the distances from $p_i$ to $p_l$ and from $p_i$ to $p_r$ with the proximity parameter d (Figure 2). When $S_1$ moves to the next point, $S_2$ follows it at distance d. The points that have been swept by $S_2$ are removed from the AF. If the projection misses the AF (which can also be empty), the corresponding end-point of the AF is tested to determine whether it is close enough to point $p_i$; otherwise, $p_i$ starts a new cluster [41]. It is difficult to determine a global parameter d that accounts for an uneven distribution of the data. If the parameter is set without a priori knowledge (or measured experimental results), it is difficult to find the true clusters accurately.
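The role of the global proximity parameter d can be illustrated with a simplified Python sketch. It is not Žalik's algorithm: instead of projecting each point onto the AF and testing only $p_l$ and $p_r$, it compares each new point with every point still lying between the two sweep-lines, and it merges clusters with a union-find structure. The function name `sweep_line_clusters` is hypothetical.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sweep_line_clusters(points: Sequence[Point], d: float) -> List[int]:
    """Label points so that points linked by distances <= d share a label."""
    # Event order: the leading sweep-line S1 moves in increasing y.
    order = sorted(range(len(points)), key=lambda i: (points[i][1], points[i][0]))
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    active = deque()  # indices of points between S2 and S1, in sweep order
    for i in order:
        y = points[i][1]
        # S2 trails S1 at distance d: points it has passed are dropped.
        while active and points[active[0]][1] < y - d:
            active.popleft()
        for j in active:
            if math.dist(points[i], points[j]) <= d:
                parent[find(i)] = find(j)  # merge the two clusters
        active.append(i)
    return [find(i) for i in range(len(points))]
```

With the five points (0, 0), (0.5, 0), (0, 0.5), (10, 10) and (10.4, 10), d = 1 yields two clusters, d = 0.3 yields five singletons and d = 20 yields a single cluster, which shows how strongly the result depends on one global value.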
ISPRS Int. J. Geo-Inf. 2017, 6, 272

ASC Algorithm
Let E be the Euclidean plane and d(x, y) the Euclidean distance between two points x and y of E; let S be a planar set of n points of E, which are called sites. In the polar coordinate system where the ASC algorithm is applied, the points $p_i$ are swept outwards from the initial frontier in order of increasing distance from the pole O (i.e., the sweep-circle center). Following this, $p_i$ is projected in the direction of O onto the frontier edge $(p_l, p_r)$ (Figure 3). According to Tobler, the first law of geography is that "everything is related to everything else, but near things are more related than distant things" [50]. Points are considered to be similar if they are within a specific distance of each other, such as points $p_i$ and $p_l$ or $p_i$ and $p_r$ (Figure 3). These distances are compared against a threshold value that determines the formation of clusters.
The algorithm proposed in this paper utilizes the Gestalt theory and the associated definition of the dynamic adaptive threshold. It can efficiently locate clusters of arbitrary shapes and adapt to the uneven density characteristics of spatial data, which avoids the need for preset global parameters, such as those necessary for DBSCAN, DENCLUE, and other algorithms [41]. ASC works in a four-phase process: basic conceptualization, initialization, clustering, and cluster merging.

Basic Concepts and Initialization
Cluster definitions. Given a collection of n discrete points $S = \{p_1, p_2, p_3, \ldots, p_n\}$ in the 2D plane ($\mathbb{R}^2$), clusters are defined by the degree of similarity between data points. Thus, S is divided into k clusters $C = \{C_1, C_2, \ldots, C_k\}$, where each $C_j \subseteq S$.
Determining the center of the sweep-circle. S corresponds to the coordinate set $\{p_1(x_1, y_1), p_2(x_2, y_2), p_3(x_3, y_3), \ldots, p_n(x_n, y_n)\}$, where the origin of the polar coordinate system $O(p_x, p_y)$ is the center of the sweep-circle. $O(p_x, p_y)$ is selected as the average of the largest $(x_{max}, y_{max})$ and smallest $(x_{min}, y_{min})$ coordinate values of the input S, i.e., $p_x = (x_{max} + x_{min})/2$ and $p_y = (y_{max} + y_{min})/2$.
Calculating the polar coordinates of input points and sorting. Each point $p_i(x_i, y_i)$ in Cartesian coordinates is transformed to $p_i(r_i, \theta_i)$ in polar coordinates about O, where $r_i = \sqrt{(x_i - p_x)^2 + (y_i - p_y)^2}$ and $\theta_i$ is the polar angle of $p_i$ about O. The points are then sorted by increasing r-coordinate, i.e., by increasing distance from O. If two points have the same r-coordinate, they are sorted by the secondary criterion θ. In the special case where the first point coincides with the origin O (i.e., its r-coordinate is zero), the point is removed from the list.
Constructing the initial frontier and clusters. The three points located nearest to the center O are used to form a triangle, where it is assumed that the three points are non-collinear. The three edges of the triangle form a closed polyline, which is referred to as the frontier. Spatial clustering algorithms can be based on various distances, such as the Euclidean distance, the Manhattan distance, or the Minkowski distance. This algorithm uses the Euclidean distance between data points as the distance measure for spatial clustering. Figure 4 shows an example of the three points nearest to the center O, which form the initial cluster.
Clustering the threshold. The threshold ε is compared with the distance $d(p_i, p_j)$ to determine whether two points are grouped into the same cluster. If the distance between the two points is less than or equal to this value, they belong to the same cluster; otherwise, they do not. The distance is calculated as follows:
$$d(p_i, p_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$
Clustering with a global threshold ε is sensitive to density changes, particularly the internal density changes within the clusters. To manage the gradual changes of the local density, we used the fact that the spatial data mining process obeys not only the objective law of the geographical entity itself, but also relates to the concept of recognition in cognitive psychology. Specifically, the Gestalt theory was taken into account.
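The initialization steps described in this section (choosing the center O, converting to polar coordinates, sorting, and forming the initial frontier) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the polar angle is computed with `atan2` and normalized to [0, 2π), which is one possible convention, and every point with r = 0 is dropped rather than only the first one.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sweep_center(points: Sequence[Point]) -> Point:
    """Center O: midpoint of the extreme x and y values of the input."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return ((max(xs) + min(xs)) / 2, (max(ys) + min(ys)) / 2)


def to_polar_sorted(points: Sequence[Point], center: Point) -> List[Tuple[float, float, Point]]:
    """Return (r, theta, point) triples sorted by r, then by theta."""
    px, py = center
    polar = []
    for x, y in points:
        r = math.hypot(x - px, y - py)
        if r == 0.0:
            continue  # a point coinciding with O is removed from the list
        theta = math.atan2(y - py, x - px) % (2 * math.pi)
        polar.append((r, theta, (x, y)))
    polar.sort(key=lambda t: (t[0], t[1]))
    return polar


def initial_frontier(polar: Sequence[Tuple[float, float, Point]]) -> List[Point]:
    """Triangle formed by the three points nearest to O (the initial cluster)."""
    if len(polar) < 3:
        raise ValueError("at least three points are required")
    a, b, c = (p for _, _, p in polar[:3])
    # The text assumes the three points are not collinear; check it here.
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    if cross == 0:
        raise ValueError("the three nearest points are collinear")
    return [a, b, c]
```

For the eight points (0, 0), (4, 0), (4, 4), (0, 4), (2, 3), (3, 2), (1, 2) and (2, 2), the center is (2, 2), the point (2, 2) is dropped, and the initial frontier consists of (3, 2), (2, 3) and (1, 2), with the tie in r broken by θ.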
The Gestalt theory summarizes the cognitive laws of human vision through the principles of pattern organization, and has been applied in pattern recognition and spatial clustering [14]. The main principle of the Gestalt perception model is that "the whole is greater than the sum of its parts," which suggests that people tend to perceptually recognize structural integrity and initially observe a visual object as a whole before breaking the object down into different parts [14]. Gestalt is interpreted through principles of visual recognition, such as proximity, similarity, closure, continuity, orientation, and common fate [51]. We combined a subset of Gestalt principles operating simultaneously to build the dynamic adaptive threshold model $\varepsilon(t)$. We define the mean of the triangle's perimeter $L_t$ as the adaptive dynamic threshold $\varepsilon(t)$. In this model, each new event point $p_i$ is matched to its threshold according to the following three Gestalt principles (Figure 5).

• Proximity, where objects placed close together tend to be perceived as a group.
In agreement with Tobler's First Law of Geography, proximity is the most important principle for spatial clustering (in addition to being the basis of continuity and closure). The easier it is to form continuity and closure among spatial data, the greater the similarity.
To build the dynamic threshold, relationships in terms of proximity help to define concepts such as "distance" and "place". The concept of "distance" explains the tendency to form a cluster when $p_i$ is close to two points of $\triangle\varepsilon(t)$, while the concept of "place" means that two triangles are adjacent to each other, such as $\triangle\varepsilon(1)$ and $\triangle\varepsilon(2)$, or $\triangle\varepsilon(2)$ and $\triangle\varepsilon(3)$. These can be used to form the dynamic thresholds $\varepsilon(1), \varepsilon(2), \varepsilon(3), \ldots, \varepsilon(t)$ as a group (Figure 6).

• Continuity, where spatial objects arranged in a logical order are easily perceived as a group or a continuous graph.
The dynamic thresholds require a particular order, and continuity relationships create formats such as a "series", which are perceived in a more permanent way because the principle of continuity is connected with the concept of integrity in perception.

• Closure, where the observer tends to prioritize closeness and "perfection" of objects. Thus, gaps between objects may be perceived as being filled to create a unified whole.
The closure tendency is valid for visual stimuli. Figure 6 shows that $p_i$ has a tendency to connect to $\triangle\varepsilon(t)$, which serves as a reminder of the whole.
Closure tendency is valid for visual stimuli. Figure 6 shows that p_i tends to close a triangle ∆ with the frontier, which serves as a reminder of the whole. Accordingly, when the event point p_i is projected onto the frontier toward O, the triangle that includes the hit frontier edge is identified and we define the adaptive dynamic threshold ε(t) as follows:

$$\varepsilon(t) = \alpha \cdot \overline{L}_t$$

where \overline{L}_t is the mean of the perimeter L_t of the triangle identified for the t-th event point and α is a constant factor. We can enlarge or reduce ε(t) by manipulating the value of α, although this affects the quality of the clusters and the hierarchical relation they reflect. The value of α is usually set to 1 for ASC.

The three Gestalt principles operate simultaneously within ASC to build the dynamic adaptive threshold model ε(t), which is shown in Figure 7. The input point p_4 is an event point projected on the edge (p_2, p_3) of triangle ∆p_1p_2p_3 toward O, where p_4 is closest to p_2 and p_3. This results in the formation of a cluster due to the rule of proximity. Under the closure rule, p_2 and p_3 combine with p_4 to form a simple triangle ∆p_4p_2p_3 adjacent to ∆p_1p_2p_3 with a common edge (p_2, p_3). Both the proximity and continuity Gestalt clusters occur at triangles ∆p_4p_2p_3 and ∆p_1p_2p_3, which maintain closely related spatial properties. Therefore, the distance of p_4 from p_2 and p_3 is used to form a cluster, where the mean of the perimeter (L_1) of triangle ∆p_1p_2p_3 forms the adaptive dynamic threshold ε(1). When a new event point p_5 is obtained, the mean of the perimeter (L_2) of triangle ∆p_2p_4p_5 serves as the adaptive dynamic threshold ε(2).

Thresholds similar to these are often set in real-world applications as situation-specific guidelines for users. For example, during urban planning, the threshold value is set according to the minimum radius of the public service area being covered. The threshold can vary, which still allows for the analysis of the distribution of buildings in residential, commercial or industrial areas. Furthermore, this threshold reflects the hierarchical structure of the relationships between different structures.
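The threshold rule above can be illustrated with a minimal sketch. It assumes that "the mean of the perimeter" means the perimeter divided by the three edges of the triangle (that is, the mean edge length); the function name `mean_perimeter_threshold` is hypothetical and does not come from the paper.

```python
import math


def mean_perimeter_threshold(a, b, c, alpha=1.0):
    """Adaptive threshold for the triangle (a, b, c).

    Assumption: the "mean of the perimeter" is read as perimeter / 3,
    i.e., the mean edge length, scaled by the constant factor alpha.
    """
    perimeter = math.dist(a, b) + math.dist(b, c) + math.dist(c, a)
    return alpha * perimeter / 3.0


# Example: a 3-4-5 right triangle has perimeter 12, so its mean edge length is 4.
eps = mean_perimeter_threshold((0.0, 0.0), (3.0, 0.0), (0.0, 4.0))
```

With α = 1 the threshold follows the local triangle size directly; a larger α loosens it and a smaller α tightens it.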

Clustering
In a system where the sweep-circle SC has already passed the first three points and assigned them to one cluster, the algorithm surrounds the points with a single closed bordering polyline (i.e., the frontier), as shown in Figure 8a. When SC grows and sweeps to the new point p_i, the projection of p_i toward O hits the edge (p_l, p_r) of the frontier. This projection hits the frontier because O lies inside the frontier and the new points lie outside of it. By connecting p_i with p_l and with p_r, the distances dist(p_i, p_l) and dist(p_i, p_r) are calculated and compared with the threshold ε(t). According to Equation (3), there are four possibilities when moving forward:

• dist(p_i, p_l) > ε(t) and dist(p_i, p_r) > ε(t): p_i is the first element of a new cluster.
• dist(p_i, p_l) > ε(t) and dist(p_i, p_r) ≤ ε(t): p_i is assigned to the cluster on its right side (C_r) (Figure 8b).
• dist(p_i, p_l) ≤ ε(t) and dist(p_i, p_r) > ε(t): p_i is assigned to the cluster on its left side (C_l) (Figure 8b).
• dist(p_i, p_l) ≤ ε(t) and dist(p_i, p_r) ≤ ε(t): if p_l and p_r are members of the same cluster, p_i is placed into that cluster. Otherwise, p_i is a merging point between the left and right clusters [41].
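The four cases listed above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name, the returned tuples and the way cluster indices are passed in are hypothetical and are not taken from the paper or from [41].

```python
import math


def classify_event_point(p_i, p_l, p_r, cluster_l, cluster_r, eps):
    """Decide what happens to event point p_i given the hit edge (p_l, p_r).

    Returns one of:
      ("new", None)                      -> p_i starts a new cluster
      ("join", cluster_index)            -> p_i joins an existing cluster
      ("merge", (cluster_l, cluster_r))  -> p_i is a merging point
    """
    near_l = math.dist(p_i, p_l) <= eps
    near_r = math.dist(p_i, p_r) <= eps
    if not near_l and not near_r:
        return ("new", None)
    if near_r and not near_l:
        return ("join", cluster_r)
    if near_l and not near_r:
        return ("join", cluster_l)
    # Both neighbours are within the threshold.
    if cluster_l == cluster_r:
        return ("join", cluster_l)
    return ("merge", (cluster_l, cluster_r))
```

Returning the merge as a pair of indices, rather than relabelling points immediately, matches the description that cluster indices are adjusted in a final phase.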
ISPRS Int. J. Geo-Inf. 2017, 6, 272

The frontier plays an important role in the discovery of clusters. To implement the frontier effectively, a heap or a balanced search tree (e.g., an AVL tree, B-tree or red-black tree) is often selected. In our case, a simple hash-table on a circular double-linked list is used to implement the algorithm efficiently and to ensure that large data sets are manipulated correctly (Figure 9). Each record of the frontier stores the key vertex index P_i and the index of the triangle T_i sharing its edge with the frontier (generating an adaptive threshold), in addition to the generated initial cluster index C_i. Fortunately, unlike the situation described in the literature [41], a projection in ASC never misses the frontier.
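A minimal sketch of such a frontier structure follows. The record fields (vertex, triangle and cluster indices) come from the description above; everything else is an assumption made for illustration: the class names, the choice of bucketing the frontier by polar angle θ into h equal sectors, and the lookup strategy. Insertion of new frontier vertices is omitted to keep the sketch short.

```python
import math

TWO_PI = 2.0 * math.pi


class FrontierNode:
    """One frontier record: vertex index P_i, triangle index T_i, cluster index C_i."""

    def __init__(self, theta, vertex, triangle, cluster):
        self.theta = theta      # polar angle of the vertex, in [0, 2*pi)
        self.vertex = vertex
        self.triangle = triangle
        self.cluster = cluster
        self.prev = self        # circular links, fixed up by Frontier
        self.next = self


class Frontier:
    """Circular double-linked list of nodes plus an angle-bucketed hash-table."""

    def __init__(self, nodes, h):
        # nodes: non-empty list sorted by strictly increasing theta (assumption)
        self.h = h
        self.buckets = [None] * h   # bucket -> node with the smallest theta in it
        for i, node in enumerate(nodes):
            nxt = nodes[(i + 1) % len(nodes)]
            node.next, nxt.prev = nxt, node
            b = self._bucket(node.theta)
            if self.buckets[b] is None:
                self.buckets[b] = node

    def _bucket(self, theta):
        return min(int(theta / TWO_PI * self.h), self.h - 1)

    def locate(self, theta):
        """Return the frontier edge (left, right) hit by a projection at angle theta."""
        b = self._bucket(theta)
        for step in range(self.h):  # nearest non-empty bucket at or before b
            node = self.buckets[(b - step) % self.h]
            if node is not None:
                break
        if node.theta > theta:
            if step == 0:           # theta precedes every node in its own bucket
                return node.prev, node
            while node.next.theta > node.theta:  # wrapped around: go to largest angle
                node = node.next
            return node, node.next
        while node.theta < node.next.theta <= theta:
            node = node.next
        return node, node.next
```

The hash-table narrows the search to one angular sector, and the linked list then gives the two neighbouring frontier vertices p_l and p_r in a few steps.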

Merging Clusters
The indices of the clusters must be merged, i.e., the initial clusters must be adjusted during the final phase in accordance with the merging points. We used a previously applied method [41] to merge the indices of the clusters (Figure 8b provides an example). In this example, the clusters C_l and C_r are merged via p_i and the smallest index value is preserved. Any point that does not belong to any cluster is treated as an outlier/noise.
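The index adjustment can be sketched with a union-find (disjoint-set) structure. This is a swapped-in technique chosen for illustration and is not necessarily the method of [41]; the function name and the input format (a label list with `None` for noise and a list of merge pairs) are hypothetical. The only property taken from the text is that the smallest index survives a merge.

```python
def merge_cluster_indices(labels, merges):
    """Relabel points so that merged clusters share their smallest index.

    labels: initial cluster index per point, or None for outliers/noise.
    merges: pairs (c_l, c_r) recorded at merging points during the sweep.
    """
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in merges:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Always attach the larger root to the smaller one,
            # so the root of a group is its smallest index.
            parent[max(ra, rb)] = min(ra, rb)

    return [None if c is None else find(c) for c in labels]


# Clusters 1, 2 and 3 are linked by two merging points; cluster 0 is untouched.
final = merge_cluster_indices([0, 1, 1, 2, None, 3], [(1, 2), (3, 2)])
```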

Point Collinearity
In the ASC algorithm, "point collinearity" occurs when more than one spatial point is located on the same θ of the polar coordinates. This is a special case that must be handled when setting the adaptive threshold ε(t) to ensure the stability of the algorithm. When the sweep-circle locates a new event point, the adaptive dynamic threshold is defined as the mean of the perimeter of the triangle that includes previous event points. When the projection of the next point p_5 hits the vertex p_4 of the triangle ∆p_2p_4p_3, the mean of the perimeter of this triangle is used as the threshold, which determines whether the points p_4 and p_5 are grouped into the same cluster (Figure 10). The threshold of p_6 is obtained according to the triangle ∆p_2p_5p_3. We have described the procedure in pseudo-code form in Algorithm 1.

ASC-Based Stream Clustering
Bezdek and Hathaway categorize any data set containing 10^8 objects as a "large data set" [52]. ASC extends the stream clustering technique to large spatial data sets by making a small number of sequential passes over the objects (ideally, a single pass) and clustering them within a limited amount of memory whose size is a fraction of the stream length. ASC-based stream clustering uses a two-stage online and offline approach, as found in most streaming algorithms. In the online stage, the data set is split into blocks that are divided until they fit into the computer's main memory, and the data points in each block are swept by an increasingly large circle. ASC is applied until all spatial data objects in the blocks are processed. In the current experiment, we implemented cluster indexing, which stores the data in units of clusters grouped by ASC within storage systems. In the offline stage, the user sets the threshold ε and the corresponding clustering number K is identified. Atom clusters from the online stage are repeatedly processed by ASC until the process is complete and the results are provided.
Online stage
• The large data set S is divided into a sequence of data blocks S = {X_1, X_2, ..., X_i} according to the memory size. A load monitor [53] ensures that the loaded spatial data fit into the main memory.
• ASC is applied to each data block X_i to form atom clusters.

Offline stage
• It is assumed that the user provides a suitable threshold value ε and that the clustering number K is set in advance for the obtained atom clusters. ASC is applied repeatedly until the final (macro) spatial clusters are formed, by processing retrieval queries from the cluster indexes into the adjacent data blocks.
The above algorithm can manage static data and can be extended for the processing of dynamic data.
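The online stage can be sketched as follows. This is a simplified illustration under several assumptions: a fixed `block_size` stands in for the memory-driven splitting and the load monitor [53], and `cluster_block` is a hypothetical callable representing one ASC run on a block; neither name comes from the paper.

```python
from itertools import islice


def iter_blocks(points, block_size):
    """Yield consecutive blocks of at most block_size points from a stream."""
    it = iter(points)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block


def online_stage(points, block_size, cluster_block):
    """Apply cluster_block to every block and collect the atom clusters.

    Each atom cluster is stored with the id of the block it came from,
    a stand-in for the cluster index described in the text.
    """
    atom_clusters = []
    for block_id, block in enumerate(iter_blocks(points, block_size)):
        for cluster in cluster_block(block):
            atom_clusters.append((block_id, cluster))
    return atom_clusters


# Placeholder clustering that treats each block as a single atom cluster.
atoms = online_stage(range(7), 3, lambda block: [block])
```

Because the blocks are read lazily from an iterator, only one block needs to be held in memory at a time, which is the property the online stage relies on.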

Time Complexity Analysis
All n spatial points are transformed to polar coordinates in O(n) and sorted according to their r-coordinates by Quicksort in O(n log n). The total time complexity of the initialization phase is therefore:

$$O(n) + O(n \log n) = O(n \log n)$$

The sweep-circle status is represented by the frontier, in which the edge hit by each projection must be located. This point location is performed through the hash-table. A previously reported formula [35] was used to determine the number of entries in the hash-table in ASC; it relates h, the size of the hash-table, to n, the number of table entries, and a constant factor k. According to a previous analysis [35], the relationship between the CPU time spent and h changes according to the value of k. If k is too small or too large, the computational time is significantly altered. In accordance with the previous literature [35], k was set to 100 for these experiments.

During the sweeping phase, each point is projected onto the frontier, and the hit edge is located through the hash-table. The threshold ε(t) that corresponds to the hit frontier edge is computed in constant time per point, i.e., O(n) in total. Determining the clusters from the frontier projections and their corresponding distances under the adaptive threshold requires O(n log n). In the final phase, adjusting the indices of the merged clusters requires O(n log n). The total expected time complexity of the proposed ASC algorithm is therefore:

$$O(n \log n) + O(n) + O(n \log n) + O(n \log n) = O(n \log n)$$
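The initialization phase can be sketched in a few lines. The pole O is passed in as a parameter because its choice is not restated in this section, and the function name is hypothetical. Python's built-in sort (Timsort) is used in place of Quicksort; both run in O(n log n) on average.

```python
import math


def to_polar_sorted(points, pole):
    """Convert points to polar coordinates about the pole and sort by r.

    Returns a list of (r, theta, original_index) tuples, with theta
    normalised to [0, 2*pi). Ties in r are broken by theta.
    """
    ox, oy = pole
    polar = []
    for idx, (x, y) in enumerate(points):
        r = math.hypot(x - ox, y - oy)
        theta = math.atan2(y - oy, x - ox) % (2.0 * math.pi)
        polar.append((r, theta, idx))
    polar.sort()  # O(n log n); the conversion loop above is O(n)
    return polar


# The sweep-circle then visits the points in order of increasing r.
events = to_polar_sorted([(2.0, 0.0), (0.0, 1.0), (-3.0, 0.0)], pole=(0.0, 0.0))
```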

Comparison and Analysis of Experimental Results
In order to determine whether the ASC clustering method is able to handle data with complex distributions, we utilized three 2D simulated spatial testing data sets (D1-D3) and a real-world spatial database. D1 and D2 are benchmark CHAMELEON data sets, which satisfy the similarity test requirements in terms of spatial proximity, thematic attributes, spatial distribution and hierarchy.
Data set D3 is very challenging for most clustering algorithms, as it contains clusters with arbitrary shapes, different densities, noise and distinctly uneven internal densities. Traditional clustering algorithms were also tested and compared, including K-Means (the most commonly used method), DBSCAN (which can determine arbitrarily-shaped clusters), CURE (which can identify clusters of more complex shapes and wide variances in size, while preferentially filtering the isolated points) and AMOEBA (which uses the Delaunay triangulation to adapt to clusters that are arbitrarily-shaped or of different densities without any a priori parameters). The proposed ASC sweep-circle algorithm was also compared with Žalik's sweep-line algorithm [41]. A real-world GIS data set was used in order to evaluate the proposed algorithm's ability to deal with actual spatial data.
D1 includes 8000 points with eight arbitrarily-shaped clusters of different densities and random noise, as seen in Figure 11a. DBSCAN, CURE and Žalik's sweep-line algorithm (from here on referred to simply as "Žalik") were all compared with ASC. For DBSCAN, MinPts was set to 4 and Eps was fixed to 5.4; for CURE, the shrink factor was set to 0.3 and the number of representative points was set to 12. Parameter d was set to 12 for the Žalik algorithm. The ASC algorithm automatically discovered arbitrary shapes, clusters of different densities and nested clusters (Figure 11b). It not only effectively detected all eight clusters but also correctly identified the noise in D1. The CURE algorithm was unable to identify spatial clusters with complex shapes and incorrectly defined less dense spatial data as noise. The DBSCAN algorithm could not readily adapt to the density variations among clusters, while the Žalik algorithm used global parameters that prevented it from identifying clusters with varying densities.
There were 10,000 points in data set D2 (Figure 12). We varied the scaling factor α, which changes the threshold in order to form clusters at different hierarchies. When α was small, the threshold was small and a larger number of clusters was created. Conversely, when α was large, the threshold was large and a smaller number of clusters formed (i.e., a relatively shallow hierarchy). When the value of α is close to 1, clusters are easily and accurately distinguished from noise. In fact, the two effects create a favorable balance. These implicit hierarchical relationships with different thresholds related to α are often used as the basis for analysis in practice [16].
Data set D3, containing 264 test points, was used to test the recognition effectiveness of the ASC algorithm on clusters with uneven internal densities and non-uniform data distribution. The clustering results of D3 by K-Means, DBSCAN (MinPts = 4 and Eps = 0.78), AMOEBA, Žalik (d = 0.0074) and the ASC algorithm are seen in Figure 13, which shows that the ASC algorithm was the most suitable for discovering clusters of uneven internal density. K-Means simply divided the clusters into several parts, while DBSCAN detected noise accurately but failed to separate nearby clusters. Both AMOEBA and Žalik failed to identify clusters of uneven internal density.
To illustrate the practical adaptability of ASC, we applied it to a real-world GIS data set obtained from the DCW (Digital Chart of the World), consisting of 15,067 position data points for Chinese cities, towns and villages pertaining to the population in 2002. The results showed that the algorithm adapted well and is effective for this manner of practical application (see Figure 14).

CPU Time
The actual computational time for data processing is a major concern in real-world applications, so we tested the proposed algorithm and compared it to several other methods accordingly. All algorithms were executed with the same development language, development environment, operating system and hardware (Intel Core i3-3220 CPU @ 3.30 GHz, 2 GB memory, and a Seagate SV35 7200 rpm disk with an access time of 14.7 ms).

CPU Time Spent for Clustering
The CPU time spent for clustering the data sets from Figures 11-14 is compared in Table 2. The CPU time is correlated with the number of test points and the number of clusters generated. When the number of test points was the same, more CPU time was spent on larger numbers of clusters. When the same number of clusters was obtained, the computational time decreased when there were fewer test points. Additional time was spent when clustering or merging many small clusters rather than one larger cluster.
Most current adaptive algorithms (AMOEBA and AUTOCLUST) were developed based on the Delaunay triangulation (i.e., high spatial proximity). Table 3 shows a comparison of the traditional adaptive method AUTOCLUST against the proposed algorithm. AUTOCLUST is a relatively new algorithm developed to manage complex data sets (such as those with clusters of varying density and arbitrary shapes). This algorithm is similar to the proposed algorithm in that it can adaptively discover spatial clusters without the need to set parameters in advance. The implementation class of AUTOCLUST can be obtained from the Web [29].

Table 2. CPU time spent for clustering.
Data set      Points    CPU time
Figure 12b    10,000    0.116
Figure 12c    10,000    0.124
Figure 12d    10,000    0.243
Figure 14     15,067    0.457

There are several methods available for constructing Delaunay triangulations. We used the Sweep Line (SL) algorithm, which is the fastest according to the literature [35]. The experimental results suggested that ASC is more efficient than AUTOCLUST (Table 3). AUTOCLUST required more CPU time for the Delaunay triangulation phase of the algorithm, as well as for dealing repeatedly with global and local uninteresting edges during the clustering process when the number of edges exceeded the number of points.
The CPU time spent by Žalik and ASC was compared (Table 4) using the same data set, although the corresponding phases of the two algorithms differ. In Žalik, initialization involved sorting the input points by Quicksort, which accounted for 22% of the total time spent. In ASC, initialization involved calculating the polar coordinates and sorting with Quicksort, which accounted for 30% of the total time. ASC employed the hash-table to speed up searching the event queue and locating positions on the border during the clustering and merging phases. It was not necessary to consider projections that surpassed the border. In Žalik, however, this portion of the computational time was spent both on missed projections and on deleting the useless AF. In ASC, 8-9% of the total time was spent calculating the polar coordinates of the input points and the adaptive thresholds of the event points. Overall, ASC required less computational time, and its running time could be reduced further if the polar coordinates of the input points were obtained in advance.
The ASC algorithm was also tested based on stream clustering techniques. Input spatial data sets were generated from high-resolution images obtained from Bing Maps (http://www.bing.com/maps/) and contained 1.3 GB of data points. As shown in Figure 15, varying thresholds ε (e.g., 100-600 m) and numbers of clusters K (e.g., 100-500) were provided to test the CPU time spent running the algorithm. The number of obtained clusters decreased as ε increased, while the running time of the algorithm was lower in the offline phase. Furthermore, an ε value in the range of 100-300 m resulted in relatively accurate clusters, as shown in Figure 15. A longer period of time was required to generate a larger number of adaptive sub-clusters during the online phase.

Practical Applications of ASC
In order to explore the feasibility of the ASC algorithm in real-world scenarios, it was used to forecast geological disasters and quake magnitude based on geography.The real-world spatial data set containing 1264 geological disaster spots in Congzuo was collected from the geologic hazard database at the land department of Guangxi Zhuang Autonomous Region, China.The clustering results are shown in Figure 16.
ASC successfully detected nine clusters with varying densities within the disaster spots, which were divided into a preliminary distribution range of disaster-prone geographical areas (Figure 16c).It is possible to set a specific threshold (e.g., ε = 1000 m) in order to find areas that are most prone to major disasters, which could be beneficial for early warning purposes (Figure 16d).

Conclusions
The most notable conclusions of this study can be summarized as follows:

• The Gestalt theory was successfully applied to enhance the adaptability of the spatial clustering algorithm. Both the sweep-circle technique and the dynamic threshold setting were employed to detect spatial clusters.
• The ASC algorithm can automatically locate clusters in a single pass, rather than through modifying an initial model (e.g., a minimal spanning tree, Delaunay triangulation or Voronoi diagram). The algorithm could quickly identify arbitrarily-shaped clusters and adapt to the non-homogeneous density characteristics of spatial data without necessitating a priori knowledge or parameters. The time complexity of the ASC algorithm was approximately O(n log n), where n is the size of the spatial database.
• Scalability in ASC was not limited by the size of the data set, which demonstrated that the algorithm is suitable for data streaming technology to cluster large, dynamic spatial data sets.
• The proposed algorithm was efficient, feasible, easily understood and easily implemented.

The vast amount of information contained in spatial data sets and their relative complexity represent challenges that are yet to be solved. In the future, we believe we may benefit from further exploiting the characteristics of human vision. Humans can easily form clusters connected by chains and/or necks in addition to creating Gaussian clusters [14]. However, ASC is unable to discover these special clusters. Additionally, in ASC, points that do not belong to any cluster are treated as outliers/noise, whereas multiple outliers or noise points could be processed as new, independent clusters. Finally, the algorithm can potentially be extended to clustering spatial data with a higher dimensionality than those discussed in the present study.

Figure 1. Incremental sweep-circle algorithm used to construct Delaunay triangulation.

Figure 2. Sweeping the points by the sweep-lines.

Figure 3. Sweeping the data set to obtain clusters.
3.1. Basic Concepts and Initialization
Cluster definitions. Given a collection of n discrete points S = {p_1, p_2, p_3, ..., p_n} in the 2D plane (R^2), we use the degree of similarity between data points to divide S into k clusters C = {C_1, C_2, ..., C_k}, C_i ⊆ S, where $\bigcup_{i=1}^{k} C_i = S$ and $C_i \cap C_j = \emptyset$ for $i \neq j$. This results in objects with high similarity being grouped into the same cluster and objects with high dissimilarity being divided into different clusters.

Figure 8. Adaptive spatial clustering (ASC) algorithm cluster basics with the following steps: (a) sweeping of the points; and (b) obtaining two clusters.

Figure 9. Hash-table on a circular double-linked list for sweep-circle clustering.

Figure 10. Vertexes located on the same line.

ISPRS Int. J. Geo-Inf. 2017, 6, 272 13 of 21

defined less dense spatial data as noise. The DBSCAN algorithm could not readily adapt to the density variations among clusters, while the Žalik algorithm used global parameters that prevented it from identifying clusters with varying densities.

Figure 11. Testing data set D1 of ASC: (a) Graph built by triangulation of D1; (b) clustering result by ASC; (c) clustering result by DBSCAN; (d) clustering result by CURE; and (e) clustering result by Žalik.

Figure 13. Clustering results of data set D3 by comparison: (a) Graph built by triangulation of D3; (b) clustering result by ASC; (c) clustering result by K-Means; (d) clustering result by DBSCAN; (e) clustering result by Žalik; and (f) clustering result by AMOEBA.

Figure 14. Clusters discovered by ASC in large spatial datasets: (a) graph built by triangulation of GIS datasets; and (b) clustering results of GIS large spatial datasets generated by ASC.

(http://www.bing.com/maps/) containing 1.3 GB of data points. As shown in Figure 15, varying thresholds for ε (e.g., 100-600 m) and numbers of clusters K (e.g., 100-500) were provided to test the CPU time spent running the algorithm. The number of clusters obtained decreased with an increase in ε, while the running time of the algorithm was lower in the offline phase. Furthermore, an ε value in the range of 100-300 m resulted in relatively accurate clusters, as shown in Figure 15. Generating a larger number of adaptive sub-clusters during the online phase required a longer period of time.

Figure 15. Speed comparison of stream clustering.

Figure 16. Spatial clustering results for disaster database by ASC: (a) Distribution of disaster data points; (b) description of spatial neighborhood relations via Delaunay triangulation; (c) clustering result of ASC; and (d) clustering result of user-defined threshold setting (ε = 1000 m).

Determining the center of the sweep-circle. S corresponds

Table 2. CPU time (s) spent for clustering.

Table 3. CPU time (s) spent by ASC and AUTOCLUST for clustering.

Table 4. CPU time (s) spent by ASC and Žalik for different algorithm phases.