# An Adaptive Sweep-Circle Spatial Clustering Algorithm Based on Gestalt


## Abstract


## 1. Introduction

## 2. Related work

#### 2.1. Plane-Sweep Techniques

#### 2.2. Sweep-Circle Algorithm

#### 2.3. Data Stream Technique

#### 2.4. Sweep-Line Clustering Algorithm

Consider two sweep-lines S_{1} and S_{2}, where the distance from S_{1} to the front of S_{2} is d. It is assumed that S_{1} sweeps the points up to p_{i−1} in accordance with the proximity parameter d to form partial clusters, while the points of the front line (AF) are sorted according to their x coordinates. When S_{1} encounters point p_{i}, p_{i} is projected onto the AF, and a cluster is found by comparing the distances from p_{i} to p_{l} and from p_{i} to p_{r} against the proximity parameter d (Figure 2). When S_{1} moves to the next point, S_{2} follows it at distance d, and the points that have been swept by S_{2} are removed from the AF. If the projection misses the AF (which can also be empty), the corresponding end-point of the AF is tested to determine whether it is close enough to point p_{i} to discover a new cluster [41]. It is difficult to determine a global parameter d that accounts for the uneven distribution of a data set: if the parameter is set without a priori knowledge (or measured experimental results), it is difficult to find the true clusters accurately.
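The fragility of a single global proximity parameter d can be seen in a toy one-dimensional sketch (our simplification for illustration, not the actual algorithm of [41]): no single d recovers both a dense and a sparse group.

```python
# Hypothetical 1D simplification of fixed-threshold sweep clustering:
# x-sorted points are chained together whenever consecutive gaps are <= d.
def sweep_cluster(points, d):
    """Group x-sorted points; a gap larger than d starts a new cluster."""
    points = sorted(points)
    clusters, current = [], [points[0]]
    for p in points[1:]:
        if p - current[-1] <= d:
            current.append(p)
        else:
            clusters.append(current)
            current = [p]
    clusters.append(current)
    return clusters

# A dense cluster (spacing 0.1) next to a sparse one (spacing 1.0):
data = [0.0, 0.1, 0.2, 5.0, 6.0, 7.0]
print(len(sweep_cluster(data, 0.5)))   # d tuned for the dense group -> 4 clusters
print(len(sweep_cluster(data, 1.0)))   # d tuned for the sparse group -> 2 clusters
```

Either choice of d splits one group or fuses points that should stay apart, which is exactly the motivation for ASC's adaptive threshold.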

## 3. ASC Algorithm

Let **E** be the Euclidean plane, d(x, y) the Euclidean distance between two points x and y of **E**, and **S** a planar set of n points of **E**, which are called sites. In the polar coordinate system in which the ASC algorithm operates, each point p_{i} is swept from the initial frontier in the outward direction according to its increasing distance from the pole O (i.e., the sweep-circle center). Following this, p_{i} is projected onto the segment of the frontier edge (p_{l}p_{r}) along the circle in the O-direction (Figure 3). According to Tobler, the first law of geography is that "everything is related to everything else, but near things are more related than distant things" [50]. Points are considered similar if they are within a specific distance of each other, such as points p_{i} and p_{l}, or p_{i} and p_{r} (Figure 3). These distances fall under a threshold value used to determine the formation of clusters.

#### 3.1. Basic Concepts and Initialization

**Cluster definitions**. Given a collection of n discrete points **S** = {p_{1}, p_{2}, p_{3}, ···, p_{n}} in the 2D space (**R**^{2}), we use the degree of similarity between data points to divide **S** into **k** clusters **C** = {C_{1}, C_{2}, …, C_{k}}, where C_{i} $\subseteq$ **S**, $\underset{i=1}{\overset{k}{\cup}} C_i = \mathbf{S}$, and $C_i \cap C_j = \varnothing$ (i ≠ j). This results in the grouping of objects with high similarity, while objects with high dissimilarity are divided into different clusters.

**Determining the center of the sweep-circle**. **S** corresponds to the coordinate set {p_{1}(x_{1}, y_{1}), p_{2}(x_{2}, y_{2}), p_{3}(x_{3}, y_{3}), ···, p_{n}(x_{n}, y_{n})}, where the origin of the polar coordinate system O(p_{x}, p_{y}) is the center of the sweep-circle. O(p_{x}, p_{y}) is selected as the average of the largest (x_{max}, y_{max}) and smallest (x_{min}, y_{min}) values of the input **S**.

**Calculating the polar coordinates of the input points and sorting**. The polar coordinates of the input points are calculated and sorted by increasing distance from O as follows: p_{i}(x_{i}, y_{i}) in Cartesian coordinates is transformed to p_{i}(r_{i}, θ_{i}), and the points are sorted according to their r-coordinate. If two points have the same r-coordinate, they are sorted by the secondary criterion θ. In the special case where a point coincides with the origin O (i.e., its r-coordinate is zero), the point is removed from the list.
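The initialization steps above can be sketched directly (function and variable names are ours, for illustration only):

```python
import math

def initialize(points):
    """Sketch of ASC initialization: choose the pole O as the midpoint of the
    bounding box, convert the points to polar coordinates about O, and sort
    them by increasing r (theta breaks ties); a point coinciding with O is
    dropped, as in the special case described in the text."""
    xs = [x for x, y in points]
    ys = [y for x, y in points]
    ox = (max(xs) + min(xs)) / 2.0
    oy = (max(ys) + min(ys)) / 2.0
    polar = []
    for x, y in points:
        r = math.hypot(x - ox, y - oy)
        theta = math.atan2(y - oy, x - ox)
        if r > 0:                      # drop a point equal to the pole O
            polar.append((r, theta, (x, y)))
    polar.sort(key=lambda p: (p[0], p[1]))
    return (ox, oy), polar

O, ordered = initialize([(0, 0), (4, 0), (4, 2), (0, 2), (2, 1)])
print(O)   # (2.0, 1.0) -- the point (2, 1) coincides with O and is removed
```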

**Constructing the initial frontier and clusters**. The three points located nearest to the center O are used to form a triangle, where it is assumed that the three points are non-collinear. The three edges of the triangle form a polyline, which is referred to as the frontier. Any spatial clustering algorithm should work with various distances, such as the Euclidean, Manhattan, or Minkowski distance; this algorithm uses the Euclidean distance between data points for spatial clustering. Figure 4 shows an example of the three points nearest to the center O, which form the initial cluster.

**Clustering threshold.** The threshold $\epsilon$ is defined on the distance measure d(p_{i}, p_{j}) and determines whether two points are grouped into the same cluster: if the distance between the two points is less than or equal to this value, they belong to the same cluster; otherwise, they do not. This is expressed as follows:

p = ∪ {p_{i}, p_{j}} | d(p_{i}, p_{j}) ≤ ε, (j ≠ i; p_{i}, p_{j} ∈ **S**)

Rather than a fixed global value, ASC uses an adaptive dynamic threshold $\epsilon$(t): we define the mean of the triangle's perimeter L_{t} as the adaptive dynamic threshold $\epsilon$(t). In this model, each new event point p_{i} is processed against the threshold according to the following three Gestalt principles (Figure 5).

**Proximity**, where objects placed close together tend to be perceived as a group. Here, p_{i} is close to two points of △$\epsilon$(t), while the concept of 'place' means that two triangles are adjacent to each other, such as △$\epsilon$(1) and △$\epsilon$(2), or △$\epsilon$(2) and △$\epsilon$(3). These can be used to form the dynamic thresholds $\epsilon$(1), $\epsilon$(2), $\epsilon$(3), ···, $\epsilon$(t) as a group (Figure 6).

**Continuity**, where spatial objects arranged in a logical order are easily perceived as a group or a continuous graph.

**Closure**, where the observer tends to prioritize closeness and “perfection” of objects. Thus, gaps between objects may be perceived as being filled to create a unified whole.

Here, p_{i} has a tendency to connect to ∆$\epsilon$(t), which evokes the whole.

When p_{i} is projected onto the frontier toward O, the triangle whose edge lies on the frontier is identified, and the adaptive dynamic threshold $\epsilon$(t) is defined as follows:

$\epsilon(t) = \frac{1}{3}\alpha L_{t}$

The user can adjust $\epsilon$(t) by manipulating the value of $\alpha$, although this may affect the quality of the clusters and reflect the hierarchical relation. The value of $\alpha$ is usually set to 1 for ASC.

Figure 7 shows an example of the adaptive dynamic threshold $\epsilon$(t). The input point p_{4} is an event point projected onto the edge (p_{2}, p_{3}) of triangle ∆p_{1}p_{2}p_{3} toward O, where p_{4} is closest to p_{2} and p_{3}. This results in the formation of a cluster under the rule of proximity. Under the closure rule, p_{2} and p_{3} combine with p_{4} to form a simple triangle ∆p_{4}p_{2}p_{3} adjacent to ∆p_{1}p_{2}p_{3}, with the common edge (p_{2}p_{3}). Both the proximity and continuity Gestalt clusters occur at triangles ∆p_{4}p_{2}p_{3} and ∆p_{1}p_{2}p_{3}, which maintain closely related spatial properties. Therefore, the distances of p_{4} from p_{2} and p_{3} are used to form a cluster, where the mean of the perimeter (L_{1}) of triangle ∆p_{1}p_{2}p_{3} forms the adaptive dynamic threshold $\epsilon$(1). When a new event point p_{5} is obtained, the mean of the perimeter (L_{2}) of triangle ∆p_{2}p_{4}p_{5} serves as the adaptive dynamic threshold $\epsilon$(2).
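The adaptive threshold for a given frontier triangle follows directly from its vertices; a minimal sketch (helper names are ours), with $\alpha$ = 1 as in ASC:

```python
import math

def adaptive_threshold(p1, p2, p3, alpha=1.0):
    """epsilon(t) = (alpha / 3) * L_t: the mean side length (perimeter / 3)
    of the triangle sharing its edge with the frontier."""
    def d(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    perimeter = d(p1, p2) + d(p2, p3) + d(p3, p1)
    return alpha * perimeter / 3.0

# 3-4-5 right triangle: perimeter 12, so epsilon(t) = 4 when alpha = 1
print(adaptive_threshold((0, 0), (3, 0), (3, 4)))   # 4.0
```

Lowering $\alpha$ tightens the threshold and yields more, smaller clusters; raising it has the opposite effect, consistent with the discussion above.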

#### 3.2. Clustering

When the sweep-circle encounters a new event point p_{i}, the projection of p_{i} hits the edge (p_{l}, p_{r}) of the frontier toward O. This projection always hits the frontier, since O lies inside the frontier and the new points lie outside of it. By connecting p_{i} with p_{l} and with p_{r}, the distances dist(p_{i}, p_{l}) and dist(p_{i}, p_{r}) are calculated, and the threshold $\epsilon$(t) is set accordingly. According to Equation (3), there are four possibilities when moving forward:

- dist(p_{i}, p_{l}) > $\epsilon$(t) and dist(p_{i}, p_{r}) > $\epsilon$(t): p_{i} is the first element of a new cluster.
- dist(p_{i}, p_{l}) > $\epsilon$(t) and dist(p_{i}, p_{r}) ≤ $\epsilon$(t): p_{i} is assigned to the cluster on its right (C_{r}) (Figure 8b).
- dist(p_{i}, p_{l}) ≤ $\epsilon$(t) and dist(p_{i}, p_{r}) > $\epsilon$(t): p_{i} is assigned to the cluster on its left (C_{l}) (Figure 8b).
- dist(p_{i}, p_{l}) ≤ $\epsilon$(t) and dist(p_{i}, p_{r}) ≤ $\epsilon$(t): if p_{l} and p_{r} are members of the same cluster, p_{i} is placed into that cluster; otherwise, p_{i} is a merging point between the left and right clusters [41].

Each frontier entry stores the point p_{i}, the index of the triangle T_{i} sharing its edge with the frontier (from which the adaptive threshold is generated), and the generated initial cluster index C_{i}. Fortunately, a projection in ASC cannot miss the frontier, unlike the sweep-line case previously mentioned in the literature [41].

#### 3.3. Merging Clusters

Clusters C_{l} and C_{r} are merged via p_{i}, and the smallest index value is preserved. In each list, any point that does not belong to any cluster is treated as an outlier/noise.
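Merging two clusters while preserving the smallest index can be handled with a union–find structure; the following is a sketch under our own assumptions (the paper does not specify the data structure):

```python
class ClusterIndex:
    """Union-find keyed by cluster index; merging keeps the smaller index,
    matching ASC's rule that the smallest index value is preserved."""
    def __init__(self):
        self.parent = {}

    def find(self, i):
        self.parent.setdefault(i, i)
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def merge(self, i, j):
        ri, rj = self.find(i), self.find(j)
        lo, hi = min(ri, rj), max(ri, rj)
        self.parent[hi] = lo          # the smallest index value survives
        return lo

ci = ClusterIndex()
ci.merge(2, 5)          # clusters 2 and 5 merge under index 2
print(ci.merge(5, 1))   # merging with cluster 1 keeps index 1
```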

#### 3.4. Point Collinearity

In the case of collinear points, the algorithm still defines the adaptive dynamic threshold $\epsilon$(t) to ensure its stability. The mean of the triangle perimeter (including previous event points) is defined as the adaptive dynamic threshold that applies when the sweep-circle locates a new event point. When the projection of the next point p_{5} hits the vertex p_{4} of the triangle ∆p_{2}p_{4}p_{3}, the mean of the perimeter can still be calculated as the threshold, which determines whether the points p_{4} and p_{5} are grouped into the same cluster (Figure 10). The threshold of p_{6} is obtained from the triangle ∆p_{2}p_{5}p_{3}.

**Algorithm 1.** ASC clustering algorithm.

```
Input: the 2D set S = {p_1, p_2, p_3, ···, p_n} of n points
Output: C
Initialization:
1:  select the pole O
2:  calculate p_i(r_i, θ_i) for all points in S
3:  sort S according to r
4:  create the first triangle
5:  compute ε(t), t ← 1, t < (n − 3)
6:  C_L ← Ø, C_R ← Ø, C_N ← Ø, C_S ← Ø
Clustering:
7:  for i ← 4 to n do
8:      ε(t) = (1/3) α L_t
9:      project p_i onto the frontier; it hits the edge (p_l, p_r)
10:     if d(p_i, p_l) > ε(t) and d(p_i, p_r) > ε(t) then
11:         C_N ← C_N ∪ p_i
12:     end if
13:     if d(p_i, p_l) > ε(t) and d(p_i, p_r) ≤ ε(t) then
14:         C_R ← C_R ∪ p_i
15:     end if
16:     if d(p_i, p_l) ≤ ε(t) and d(p_i, p_r) > ε(t) then
17:         C_L ← C_L ∪ p_i
18:     end if
19:     if d(p_i, p_l) ≤ ε(t) and d(p_i, p_r) ≤ ε(t) then
20:         C_S ← C_S ∪ p_i
21:     end if
22:     create triangle ∆_{i,l,r}
23:     t ← t + 1
24: end for
Merging:
25: C ← C_L ∪ C_R ∪ C_N ∪ C_S
```
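As a concrete (and simplified) rendering of Algorithm 1, the sketch below re-implements the main loop in Python. It is our illustrative reconstruction, not the authors' implementation: the frontier is a plain θ-sorted list rather than an efficient search structure, the lists C_N, C_R, C_L, C_S are replaced by a single label map, and degenerate (collinear) initial triangles are not handled.

```python
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def perim(a, b, c):
    return dist(a, b) + dist(b, c) + dist(c, a)

def asc(points, alpha=1.0):
    """Toy sketch of Algorithm 1. Points are sorted by distance from the pole
    O; the frontier is a theta-sorted ring, and each frontier edge carries the
    perimeter L_t of the triangle that created it, giving the adaptive
    threshold epsilon(t) = (alpha/3) * L_t. Assumes distinct points; the
    linear frontier scan is O(n) per point (a tree would give O(log n))."""
    ox = (max(p[0] for p in points) + min(p[0] for p in points)) / 2.0
    oy = (max(p[1] for p in points) + min(p[1] for p in points)) / 2.0
    theta = lambda p: math.atan2(p[1] - oy, p[0] - ox)
    order = sorted(points, key=lambda p: (dist(p, (ox, oy)), theta(p)))
    frontier = sorted(order[:3], key=theta)        # initial triangle
    cluster = {p: 0 for p in frontier}             # ...forms cluster 0
    next_id = 1
    L = perim(*frontier)
    edge_L = {(frontier[i], frontier[(i + 1) % 3]): L for i in range(3)}
    for pi in order[3:]:
        th = theta(pi)
        k = 0                                      # locate the hit edge
        while k < len(frontier) and theta(frontier[k]) < th:
            k += 1
        pl, pr = frontier[k - 1], frontier[k % len(frontier)]
        eps = alpha * edge_L.get((pl, pr), L) / 3.0   # epsilon(t)
        dl, dr = dist(pi, pl), dist(pi, pr)
        if dl > eps and dr > eps:                  # first point of a new cluster
            cluster[pi] = next_id
            next_id += 1
        elif dl > eps:                             # join the right cluster
            cluster[pi] = cluster[pr]
        elif dr > eps:                             # join the left cluster
            cluster[pi] = cluster[pl]
        else:                                      # join both; keep min index
            keep = min(cluster[pl], cluster[pr])
            drop = max(cluster[pl], cluster[pr])
            for q in list(cluster):
                if cluster[q] == drop:
                    cluster[q] = keep
            cluster[pi] = keep
        L = perim(pi, pl, pr)                      # create triangle (i, l, r)
        edge_L[(pl, pi)] = edge_L[(pi, pr)] = L
        frontier.insert(k, pi)
    return cluster

labels = asc([(0, 0), (1, 0), (0, 1), (20, 20), (22, 20), (20, 22)])
print(len(set(labels.values())))   # 2 clusters: one per point group
```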

## 4. ASC-Based Stream Clustering

Data sets on the order of 10^{8} objects are regarded as "large data sets" [52]. ASC extends the stream-clustering technique to large spatial data sets by performing a small number of sequential passes over the objects (ideally, a single pass) and clustering them using a memory space whose size is only a fraction of the stream length. ASC-based stream clustering uses a two-stage online/offline approach, as found in most streaming algorithms. In the online stage, the data set is split into blocks that fit into the computer's main memory as the data points are swept by an increasingly large circle, and ASC is applied until all spatial data objects in the blocks are processed. In the current experiment, we implemented cluster indexing, which stores the data in units of the clusters grouped by ASC within the storage system. In the offline stage, the user sets the threshold $\epsilon$ and the corresponding clustering number K is identified; the atom clusters from the online stage are repeatedly processed via ASC until the procedure is complete and the results are provided.

- The large data set S is divided into a sequence of data blocks S = {X_{1}, X_{2}, …, X_{i}} according to the memory size. A load monitor [53] ensures that the loaded spatial data fit into the main memory.
- ASC is applied to each data block X_{i} to form atom clusters C_{i} = {C_{1}, C_{2}, …, C_{l}}.
- It is assumed that the user provides a suitable threshold value $\epsilon$ and that the clustering number K is set in advance for the obtained atom clusters. ASC is repeatedly applied, processing retrieval queries from the cluster indexes into the adjacent data blocks, until the final (macro) spatial clusters are formed.
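The two-stage procedure above can be sketched as follows; `cluster_block` and `merge_1d` are toy stand-ins (our assumptions) for ASC on a block and for the offline merge, respectively:

```python
def stream_cluster(stream, block_size, cluster_block, merge_atoms):
    """Two-stage sketch of ASC-based stream clustering.
    Online: split the stream into memory-sized blocks and cluster each block
    into atom clusters. Offline: merge atom clusters into final (macro)
    clusters. The callables stand in for ASC and are illustration only."""
    atoms, block = [], []
    for obj in stream:                 # online stage: a single pass
        block.append(obj)
        if len(block) == block_size:   # block fits in main memory
            atoms.extend(cluster_block(block))
            block = []
    if block:
        atoms.extend(cluster_block(block))
    return merge_atoms(atoms)          # offline stage

def merge_1d(atoms, eps=1.0):
    """Toy offline merge: chain atom clusters whose means are within eps."""
    atoms = sorted(atoms, key=lambda a: sum(a) / len(a))
    merged = [list(atoms[0])]
    for a in atoms[1:]:
        if sum(a) / len(a) - sum(merged[-1]) / len(merged[-1]) <= eps:
            merged[-1].extend(a)
        else:
            merged.append(list(a))
    return merged

macro = stream_cluster([0.0, 0.2, 9.0, 9.1, 0.1, 9.2], block_size=2,
                       cluster_block=lambda b: [[x] for x in b],
                       merge_atoms=merge_1d)
print(len(macro))   # 2 macro clusters
```

The key property illustrated is that memory holds only one block plus the atom-cluster summaries, never the whole stream.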

## 5. Results and Discussion

#### 5.1. Time Complexity Analysis

The initialization phase computes the polar coordinates of the n points in O(n) and sorts them in O(nlogn):

T_{init} = O(n) + O(nlogn) = O(nlogn)

In the clustering phase, the adaptive threshold $\epsilon$(t) is computed in constant time per event point, i.e., O(n) in total; locating the frontier projections and evaluating the corresponding distances under the adaptive threshold requires O(nlogn); and merging the clusters with adjusted indices requires O(n). The time complexity of clustering is therefore:

T_{clus} = O(n) + O(nlogn) + O(n) = O(nlogn)

The total complexity is:

T_{total} = T_{init} + T_{locate} + T_{clus} = O(nlogn)

#### 5.2. Comparison and Analysis of Experimental Results

#### 5.3. CPU Time

#### 5.3.1. CPU Time Spent for Clustering

#### 5.3.2. CPU Time Spent for ASC-Based Stream Clustering

## 6. Practical Applications of ASC

## 7. Conclusions

- The Gestalt theory was successfully applied to enhance the adaptability of the spatial clustering algorithm. Both the sweep-circle technique and the dynamic threshold setting were employed to detect spatial clusters.
- The ASC algorithm can automatically locate clusters in a single pass, rather than by modifying an initial model (e.g., a minimum spanning tree, Delaunay triangulation, or Voronoi diagram). The algorithm quickly adapted to identify arbitrarily-shaped clusters and could handle the non-homogeneous density characteristics of spatial data without requiring a priori knowledge or parameters. The time complexity of the ASC algorithm was approximately O(nlogn), where n is the size of the spatial database.
- Scalability of ASC was not limited by the size of the data set, which demonstrated that the algorithm is suitable for data-streaming technology to cluster large, dynamic spatial data sets.
- The proposed algorithm was efficient, feasible, easily understood and easily implemented.

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Han, J.; Kamber, M.; Pei, J. Data Mining, 3rd ed.; Morgan Kaufmann: Boston, MA, USA, 2012.
2. Chen, J.; Lin, X.; Zheng, H.; Bao, X. A novel cluster center fast determination clustering algorithm. Appl. Soft Comput. **2017**, 57, 539–555.
3. Deng, M.; Liu, Q.; Cheng, T.; Shi, Y. An adaptive spatial clustering algorithm based on Delaunay triangulation. Comput. Environ. Urban Syst. **2011**, 35, 320–332.
4. Liu, Q.; Deng, M.; Shi, Y. Adaptive spatial clustering in the presence of obstacles and facilitators. Comput. Geosci. **2013**, 56, 104–118.
5. Bouguettaya, A.; Yu, Q.; Liu, X.; Zhou, X.; Song, A. Efficient agglomerative hierarchical clustering. Expert Syst. Appl. **2015**, 42, 2785–2797.
6. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 21 June–18 July 1965 and 27 December 1965–7 January 1966; pp. 281–297.
7. Ng, R.T.; Han, J. Efficient and effective clustering methods for spatial data mining. In Proceedings of the 20th International Conference on Very Large Data Bases, Santiago de Chile, Chile, 12–15 September 1994; pp. 144–155.
8. Guha, S.; Rastogi, R.; Shim, K. CURE: An efficient clustering algorithm for large databases. Inf. Syst. **1998**, 26, 35–58.
9. Zhang, T. BIRCH: An efficient data clustering method for very large databases. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, QC, Canada, 4–6 June 1996; pp. 103–114.
10. Karypis, G.; Han, E.-H.; Kumar, V. Chameleon: Hierarchical clustering using dynamic modeling. Computer **1999**, 32, 68–75.
11. Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231.
12. Ankerst, M.; Breunig, M.M.; Kriegel, H.-P.; Sander, J. OPTICS: Ordering points to identify the clustering structure. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, Philadelphia, PA, USA, 31 May–3 June 1999; pp. 49–60.
13. Hinneburg, A.; Keim, D.A. An efficient approach to clustering in large multimedia databases with noise. In Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 27–31 August 1998; pp. 58–65.
14. Zahn, C.T. Graph-theoretical methods for detecting and describing Gestalt clusters. IEEE Trans. Comput. **1971**, C-20, 68–86.
15. Estivill-Castro, V.; Lee, I. AUTOCLUST: Automatic clustering via boundary extraction for mining massive point-data sets. In Proceedings of the 5th International Conference on GeoComputation, London, UK, 23–25 August 2000.
16. Kang, I.-S.; Kim, T.-W.; Li, K.-J. A spatial data mining method by Delaunay triangulation. In Proceedings of the 5th ACM International Workshop on Advances in Geographic Information Systems, Las Vegas, NV, USA, 10–14 November 1997; pp. 35–39.
17. Wang, W.; Yang, J.; Muntz, R.R. STING: A statistical information grid approach to spatial data mining. In Proceedings of the 23rd International Conference on Very Large Data Bases, Athens, Greece, 25–29 August 1997; pp. 186–195.
18. Sheikholeslami, G.; Chatterjee, S.; Zhang, A. WaveCluster: A multi-resolution clustering approach for very large spatial databases. In Proceedings of the 24th International Conference on Very Large Data Bases, New York, NY, USA, 24–27 August 1998; pp. 428–439.
19. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B **1977**, 39, 1–38.
20. Gennari, J.H.; Langley, P.; Fisher, D. Models of incremental concept formation. Artif. Intell. **1989**, 40, 11–61.
21. Kohonen, T. Self-organized formation of topologically correct feature maps. Biol. Cybern. **1982**, 43, 59–69.
22. Schikuta, E. Grid-clustering: An efficient hierarchical clustering method for very large data sets. In Proceedings of the 13th International Conference on Pattern Recognition, Vienna, Austria, 25–29 August 1996; pp. 101–105.
23. Pei, T.; Zhu, A.X.; Zhou, C.; Li, B.; Qin, C. A new approach to the nearest-neighbour method to discover cluster features in overlaid spatial point processes. Int. J. Geogr. Inf. Sci. **2006**, 20, 153–168.
24. Tsai, C.-F.; Tsai, C.-W.; Wu, H.-C.; Yang, T. ACODF: A novel data clustering approach for data mining in large databases. J. Syst. Softw. **2004**, 73, 133–145.
25. Estivill-Castro, V.; Lee, I. AMOEBA: Hierarchical clustering based on spatial proximity using Delaunay diagram. In Proceedings of the 9th International Symposium on Spatial Data Handling, Beijing, China, 10–12 August 2000.
26. Estivill-Castro, V.; Lee, I. Argument free clustering for large spatial point-data sets via boundary extraction from Delaunay diagram. Comput. Environ. Urban Syst. **2002**, 26, 315–334.
27. Wei, C.P.; Lee, Y.H.; Hsu, C.M. Empirical comparison of fast partitioning-based clustering algorithms for large data sets. Expert Syst. Appl. **2003**, 24, 351–363.
28. Li, D.; Yang, X.; Cui, W.; Gong, J.; Wu, H. A novel spatial clustering algorithm based on Delaunay triangulation. In Proceedings of the International Conference on Earth Observation Data Processing and Analysis (ICEODPA), Wuhan, China, 28–30 December 2008; Volume 7285, pp. 728530–728539.
29. Liu, D.; Nosovskiy, G.V.; Sourina, O. Effective clustering and boundary detection algorithm based on Delaunay triangulation. Pattern Recognit. Lett. **2008**, 29, 1261–1273.
30. Nosovskiy, G.V.; Liu, D.; Sourina, O. Automatic clustering and boundary detection algorithm based on adaptive influence function. Pattern Recognit. **2008**, 41, 2757–2776.
31. Xu, D.; Tian, Y. A comprehensive survey of clustering algorithms. Ann. Data Sci. **2015**, 2, 165–193.
32. Zhao, Q.; Shi, Y.; Liu, Q.; Fränti, P. A grid-growing clustering algorithm for geo-spatial data. Pattern Recognit. Lett. **2015**, 53, 77–84.
33. Bolaños, M.; Forrest, J.; Hahsler, M. Clustering large datasets using data stream clustering techniques. In Data Analysis, Machine Learning and Knowledge Discovery; Spiliopoulou, M., Schmidt-Thieme, L., Janning, R., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 135–143.
34. Preparata, F.P.; Shamos, M.I. Computational Geometry: An Introduction; Springer-Verlag: New York, NY, USA, 1985.
35. Žalik, B. An efficient sweep-line Delaunay triangulation algorithm. Comput.-Aided Des. **2005**, 37, 1027–1038.
36. Alfred, U. A mathematician's progress. Math. Teach. **1966**, 59, 722–727.
37. Shamos, M.I.; Hoey, D. Geometric intersection problems. In Proceedings of the 17th Annual Symposium on Foundations of Computer Science, Houston, TX, USA, 25–27 October 1976; pp. 208–215.
38. Bentley, J.L.; Ottmann, T.A. Algorithms for reporting and counting geometric intersections. IEEE Trans. Comput. **1979**, C-28, 643–647.
39. Fortune, S. A sweepline algorithm for Voronoi diagrams. Algorithmica **1987**, 2, 153–174.
40. Zhou, P. Computational Geometry: Algorithm Design and Analysis, 4th ed.; Tsinghua University Press: Beijing, China, 2011.
41. Žalik, K.R.; Žalik, B. A sweep-line algorithm for spatial clustering. Adv. Eng. Softw. **2009**, 40, 445–451.
42. Biniaz, A.; Dastghaibyfard, G. A faster circle-sweep Delaunay triangulation algorithm. Adv. Eng. Softw. **2012**, 43, 1–13.
43. Adam, B.; Kauffmann, P.; Schmitt, D.; Spehner, J.-C. An increasing-circle sweep-algorithm to construct the Delaunay diagram in the plane. In Proceedings of the 9th Canadian Conference on Computational Geometry (CCCG), Kingston, ON, Canada, 11–14 August 1997.
44. Guha, S.; Mishra, N.; Motwani, R.; O'Callaghan, L. Clustering data streams. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science, Redondo Beach, CA, USA, 12–14 November 2000; pp. 359–366.
45. O'Callaghan, L.; Mishra, N.; Meyerson, A.; Guha, S.; Motwani, R. Streaming-data algorithms for high-quality clustering. In Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, USA, 26 February–1 March 2002; pp. 685–694.
46. He, Z.; Xu, X.; Deng, S. Squeezer: An efficient algorithm for clustering categorical data. J. Comput. Sci. Technol. **2002**, 17, 611–624.
47. Guha, S.; Meyerson, A.; Mishra, N.; Motwani, R.; O'Callaghan, L. Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Eng. **2003**, 15, 515–528.
48. Ding, S.; Zhang, J.; Jia, H.; Qian, J. An adaptive density data stream clustering algorithm. Cogn. Comput. **2016**, 8, 30–38.
49. Zheng, L.; Huo, H.; Guo, Y.; Fang, T. Supervised adaptive incremental clustering for data stream of chunks. Neurocomputing **2016**, 219, 502–517.
50. Tobler, W.R. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. **1970**, 46, 234–240.
51. Ellis, W.D. A Source Book of Gestalt Psychology; Kegan Paul, Trench, Trubner & Company: London, UK, 1938; p. 403.
52. Hathaway, R.J.; Bezdek, J.C. Extending fuzzy and probabilistic clustering to very large data sets. Comput. Stat. Data Anal. **2006**, 51, 215–234.
53. Cho, K.; Jo, S.; Jang, H.; Kim, S.M.; Song, J. DCF: An efficient data stream clustering framework for streaming applications. In Proceedings of the 17th International Conference on Database and Expert Systems Applications, Kraków, Poland, 4–8 September 2006; pp. 114–122.

**Figure 8.** Adaptive spatial clustering (ASC) algorithm cluster basics with the following steps: (**a**) sweeping of the points; and (**b**) obtaining two clusters.

**Figure 11.** Testing data set D1 of ASC: (**a**) graph built by triangulation of D1; (**b**) clustering result by ASC; (**c**) clustering result by DBSCAN; (**d**) clustering result by CURE; and (**e**) clustering result by Žalik.

**Figure 12.** Results largely dependent on parameters: (**a**) graph built by triangulation of D2; (**b**) 20 clusters obtained when $\alpha$ = 1; (**c**) 30 clusters obtained when $\alpha$ = 0.8; and (**d**) 8 clusters obtained when $\alpha$ = 1.2.

**Figure 13.** Clustering results of data set D3 by comparison: (**a**) graph built by triangulation of D3; (**b**) clustering result by ASC; (**c**) clustering result by K-means; (**d**) clustering result by DBSCAN; (**e**) clustering result by Žalik; and (**f**) clustering result by AMOEBA.

**Figure 14.** Clusters discovered by ASC in large spatial datasets: (**a**) graph built by triangulation of GIS datasets; and (**b**) clustering results of large GIS spatial datasets generated by ASC.

**Figure 16.** Spatial clustering results for the disaster database by ASC: (**a**) distribution of disaster data points; (**b**) description of spatial neighborhood relations via Delaunay triangulation; (**c**) clustering result of ASC; and (**d**) clustering result with a user-defined threshold setting ($\epsilon$ = 1000 m).

Category | Typical Algorithm | Shape of Suitable Data Set | Discovery of Clusters with Uneven Density | Scalability | Requirement of Prior Knowledge | Sensitivity to Noise/Outliers | Suitable for Large-Scale Data | Time Complexity
---|---|---|---|---|---|---|---|---
Partition | K-means | Convex | No | Middle | Yes | Highly | Yes | Low
 | CLARANS | Convex | No | Middle | Yes | Little | Yes | High
Hierarchy | BIRCH | Convex | No | High | Yes | Little | Yes | Low
 | CURE | Arbitrary | No | High | Yes | Little | Yes | Low
 | CHAMELEON | Arbitrary | Yes | High | Yes | Little | No | High
Density | DBSCAN | Arbitrary | No | Middle | Yes | Little | Yes | Middle
 | OPTICS | Arbitrary | Yes | Middle | Yes | Little | Yes | Middle
 | DENCLUE | Arbitrary | No | Middle | Yes | Little | Yes | Middle
Graph theory | MST | Arbitrary | Yes | High | Yes | Highly | Yes | Middle
 | AMOEBA | Arbitrary | Yes | High | No | Little | No | Middle
 | AUTOCLUST | Arbitrary | Yes | High | No | Little | No | Middle
Grid | STING | Arbitrary | No | High | Yes | Little | Yes | Low
 | CLIQUE | Arbitrary | No | High | Yes | Moderately | No | Low
 | WaveCluster | Arbitrary | No | High | Yes | Little | Yes | Low
Model | EM | Convex | No | Middle | Yes | Highly | No | Low

Data Set | Points | CPU Time (s)
---|---|---
Figure 13 | 264 | 0.014
Figure 11 | 8000 | 0.074
Figure 12b | 10,000 | 0.116
Figure 12c | 10,000 | 0.124
Figure 12d | 10,000 | 0.243
Figure 14 | 15,067 | 0.457

Data Set (Points) | ASC Clustering Time (s) | AUTOCLUST DT Time (s) | AUTOCLUST Clustering Time (s) | AUTOCLUST Total (s)
---|---|---|---|---
10,000 | 0.249 | 0.026 | 0.314 | 0.340
20,000 | 0.422 | 0.082 | 2.450 | 2.532
50,000 | 0.941 | 0.287 | 5.125 | 5.412
100,000 | 2.653 | 0.607 | 14.234 | 14.841
200,000 | 5.324 | 2.290 | 61.124 | 63.414

Phase | ASC (100,000) | Žalik (100,000) | ASC (200,000) | Žalik (200,000)
---|---|---|---|---
Initialization | 0.646 | 0.717 | 1.597 | 2.700
Clustering | 1.026 | 1.944 | 2.715 | 7.242
Merging | 0.481 | 0.624 | 1.012 | 2.332
Total | 2.153 | 3.285 | 5.324 | 12.274

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Zhan, Q.; Deng, S.; Zheng, Z. An Adaptive Sweep-Circle Spatial Clustering Algorithm Based on Gestalt. *ISPRS Int. J. Geo-Inf.* **2017**, *6*, 272. https://doi.org/10.3390/ijgi6090272