Accelerating Density Peak Clustering Algorithm

The Density Peak Clustering (DPC) algorithm is a new density-based clustering method. It spends most of its execution time on calculating the local density and the separation distance for each data point in a dataset. The purpose of this study is to accelerate its computation. On average, the DPC algorithm scans half of the dataset to calculate the separation distance of each data point. We propose an approach to calculate the separation distance of a data point by scanning only the neighbors of the data point. Additionally, the purpose of the separation distance is to assist in choosing the density peaks, which are the data points with both high local density and high separation distance. We propose an approach to identify non-peak data points at an early stage to avoid calculating their separation distances. Our experimental results show that most of the data points in a dataset can benefit from the proposed approaches to accelerate the DPC algorithm.


Introduction
Clustering is the process of categorizing objects into groups (called clusters) of similar objects and is a widely used data mining technique in both academic and applied research [1,2]. Many clustering methods appear in the literature, but they differ in their notion of similarity. For example, the k-means algorithm [3] represents each cluster by a centroid, and objects near the same centroid are deemed similar; the DBSCAN algorithm [4] defines a notion of density and deems the objects in a contiguous region whose density exceeds a specified threshold similar; some studies measure similarity using the concept of symmetry.
The k-means algorithm is an example of the partitioning-based clustering methods, and most of the partitioning-based clustering methods can find only spherical shaped clusters [5]. In contrast, the DBSCAN algorithm is an example of the density-based clustering methods, which can not only find clusters of arbitrary shapes but also detect outliers [5]. Although a density-based clustering method usually requires more execution time than a partitioning-based clustering method does, it can often discover meaningful clustering results that a partitioning-based clustering method cannot reveal. Several applications of clustering to real-world problems use both of these approaches to extract different clustering results of the same dataset, to highlight different aspects of the data.
The Density Peak Clustering (DPC) algorithm, proposed by Rodriguez and Laio [6], is a new density-based clustering method that has received much attention over the past few years [7-17]. It accelerates the clustering process by first searching for the density peaks in a dataset and then constructing clusters from the density peaks. To search for density peaks, DPC must calculate two quantities for each data point: the local density and the separation distance (see Section 2 for details) [9]. Then, data points with relatively high local density and separation distance are selected as the density peaks. Many works refer to the density peak of a cluster as the "center" of the cluster. Since density-based clustering methods yield clusters of arbitrary shapes, the notion of "center" is somewhat misleading. This work uses "density peak" instead of "center" to avoid confusion.
The contribution of this work is to propose two methods (called ADPC1 and ADPC2) that accelerate the DPC algorithm. The first method, ADPC1, accelerates the calculation of the separation distances and yields the same clustering results as the DPC algorithm. The second method, ADPC2, accelerates the DPC algorithm by identifying a significant portion of the non-peak data points early and avoiding the calculation of their separation distances. Since calculating the separation distances of all data points is a time-consuming step with O(N²) time complexity, where N is the number of data points, our proposed methods can significantly speed up the DPC algorithm.
The rest of this work is organized as follows: Section 2 reviews related work, with a focus on the DPC algorithm. Sections 3 and 4 propose our methods. Section 5 presents the experimental results. Finally, Section 6 concludes this study.

Clustering Methods
In the literature, clustering methods have been classified into several categories [18]: partitioning-based methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. Partitioning-based methods (e.g., k-means and possibilistic c-means) focus on discovering compact and hyperellipsoidally shaped clusters. With k-means, the clustering results are sensitive to outliers. The possibilistic c-means (PCM) method is resilient to outliers, but it requires additional parameters γ, one for each cluster. The adaptive PCM algorithm [19] allows the parameters γ to change as the algorithm evolves.
Hierarchical methods work by iteratively (or recursively) dividing a large cluster into small clusters (or by combining small clusters into a large cluster). As a result, their clustering results can be represented by a dendrogram. Bianchi et al. [20] proposed a clustering method that forms clusters by iteratively partitioning an undirected graph.
Density-based methods discover clusters that are continuous regions with a high local density within the regions. Unlike the partitioning-based methods, density-based methods yield clusters of arbitrary shapes. Grid-based methods use a grid data structure to quantize the data space into a finite number of cells and perform the clustering operations directly on the cells. Model-based methods try to fit the data to some mathematical model. Some clustering methods do not fit nicely into the above categorization. For example, subspace clustering [21] methods identify clusters based on their association with subspaces in high-dimensional spaces.

Density Peak Clustering Algorithm
As described in Section 1, the DPC algorithm [6] must calculate the local density and the separation distance for each data point. Given a dataset X, the local density ρ(x_i) of a data point x_i ∈ X is the number of data points in the neighborhood of x_i. That is:

$$\rho(x_i) = |B(x_i)|, \tag{1}$$

where B(x_i) denotes the neighborhood of x_i and is defined as the set of data points in X whose distance to x_i is less than a user-specified parameter d_c. That is:

$$B(x_i) = \{\, x_j \in X \mid d(x_i, x_j) < d_c,\ j \neq i \,\}, \tag{2}$$

where d(x_i, x_j) represents the distance between x_i and x_j. Notably, Equations (1) and (2) use the parameter d_c as a hard threshold to derive the neighborhood and the local density of a data point, respectively. The value of d_c can be chosen so that the average number of neighbors of a data point is around p% of the number of data points in X, and the suggested value [6] for p is between 1 and 2. For small datasets, Rodriguez and Laio [6] suggested using an exponential kernel to calculate the local density, as shown in Equation (3):

$$\rho(x_i) = \sum_{x_j \in X,\, j \neq i} \exp\!\left(-\left(\frac{d(x_i, x_j)}{d_c}\right)^{2}\right). \tag{3}$$

The separation distance δ(x_i) of x_i is the minimum distance from x_i to any other data point with a local density > ρ(x_i), or the maximum distance from x_i to any other data point in X if no data point with a local density > ρ(x_i) exists, as shown in Equation (4):

$$\delta(x_i) = \begin{cases} \min\limits_{j:\, \rho(x_j) > \rho(x_i)} d(x_i, x_j), & \text{if } \exists\, j \text{ with } \rho(x_j) > \rho(x_i) \\ \max\limits_{j} d(x_i, x_j), & \text{otherwise.} \end{cases} \tag{4}$$

For ease of exposition, we use σ(x_i) to denote the index j of the data point x_j that is the nearest to x_i among those with ρ(x_j) > ρ(x_i); if no such data point exists, σ(x_i) is set to i, as shown in Equation (5):

$$\sigma(x_i) = \begin{cases} \operatorname*{argmin}\limits_{j:\, \rho(x_j) > \rho(x_i)} d(x_i, x_j), & \text{if } \exists\, j \text{ with } \rho(x_j) > \rho(x_i) \\ i, & \text{otherwise.} \end{cases} \tag{5}$$

Notably, there may be more than one data point that is the nearest to x_i and has a local density > ρ(x_i). According to Laio's Matlab implementation of the DPC algorithm [22], if this situation happens, then σ(x_i) is randomly chosen from the indexes of those data points with the highest local density among all the data points that are the nearest to x_i and have a local density > ρ(x_i).
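To make these definitions concrete, the following sketch computes ρ, δ, and σ by brute force from a precomputed pairwise distance matrix. It is a minimal illustration of Equations (1)-(5) under our own naming and NumPy interface, not Laio's released code; tie-breaking among equally near denser points is left to argmin rather than the random choice described above.

```python
import numpy as np

def local_density(D, dc, kernel="threshold"):
    """Local density of every point, given the pairwise distance matrix D.
    kernel="threshold" follows Equations (1)-(2): count neighbors within dc.
    kernel="exponential" follows Equation (3): sum exp(-(d/dc)^2) over others."""
    if kernel == "threshold":
        return (D < dc).sum(axis=1) - 1              # minus 1 excludes the point itself
    return np.exp(-(D / dc) ** 2).sum(axis=1) - 1.0  # self term exp(0) = 1

def separation_distance(D, rho):
    """delta and sigma per Equations (4)-(5), by a brute-force scan."""
    n = len(rho)
    delta = np.empty(n)
    sigma = np.empty(n, dtype=int)
    for i in range(n):
        denser = np.where(rho > rho[i])[0]       # points with higher local density
        if denser.size == 0:                     # x_i has the maximal local density
            delta[i] = D[i].max()
            sigma[i] = i
        else:
            j = denser[np.argmin(D[i, denser])]  # nearest denser point
            delta[i] = D[i, j]
            sigma[i] = j
    return delta, sigma
```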
Once ρ(x_i) and δ(x_i) of each data point have been determined, the DPC algorithm uses the following assumption to select density peaks: if a data point x_i ∈ X is a density peak, then x_i must be surrounded by many data points (i.e., ρ(x_i) is large) and must be at a relatively high distance from other data points with a local density greater than ρ(x_i) (i.e., δ(x_i) is large). To assist in choosing the density peaks, the DPC algorithm plots each data point in a decision graph, a two-dimensional graph with the local density and the separation distance as the horizontal and vertical axes, respectively. Data points with both high local density and high separation distance are manually selected as the density peaks. Alternatively, one can set a threshold on γ(x_i) = ρ(x_i)δ(x_i) and select the data points with γ(x_i) greater than the threshold as density peaks [6].
After all density peaks have been determined, each density peak acts as the starting point of a cluster, and thus the number of density peaks equals the number of clusters. Each non-peak data point is assigned to the same cluster as its nearest data point of higher density, i.e., data point x_i is assigned to the cluster that contains x_{σ(x_i)}. Let y_i denote the cluster label of data point x_i; then y_i = y_{σ(x_i)}.
Algorithm 1 shows the DPC algorithm. Notably, it is important to sort the data points by their local densities in descending order in Step 2 so that the calculation of δ(x_i) and σ(x_i) in Step 3 and the cluster assignment in Step 6 can be done efficiently. Without Step 2, for each data point x_i, Step 3 would require scanning all data points in X to find the data points with a local density > ρ(x_i). With Step 2, Step 3 only needs to scan the data points located before x_i in X, which reduces the running time of Step 3 by half on average. Additionally, with Step 2, data points with higher local density are processed earlier in Step 6.
Since ρ(x_{σ(x_i)}) > ρ(x_i), y_{σ(x_i)} will be determined before y_i in Step 6, and thus Step 6 can complete the cluster assignment in O(N) time.

Algorithm 1. DPC algorithm.

Input: the set of data points X ∈ R^{N×M} and the parameters d_c for defining the neighborhood and d_r for selecting density peaks.
Output: the label vector of cluster indexes y ∈ R^{N×1}.
Algorithm:
1. Calculate ρ(x_i) for each x_i ∈ X using either Equation (1) or Equation (3).
2. Sort all data points in X by their local densities in descending order.
3. Calculate δ(x_i) and σ(x_i) for each x_i ∈ X using Equations (4) and (5).
4. Select the data points with ρ(x_i)δ(x_i) > d_r as density peaks.
5. For each density peak x_i, set y_i = i. // starting point of each cluster
6. For each non-peak data point x_i, set y_i = y_{σ(x_i)}. // cluster assignment
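Continuing the illustration, the following sketch carries out Steps 2 and 4-6 of Algorithm 1, using the automatic γ-threshold rule in place of manual selection from the decision graph. The function dpc_assign and its interface are our own, and it assumes ρ, δ, and σ were computed as in the earlier sketch.

```python
import numpy as np

def dpc_assign(rho, delta, sigma, d_r):
    """Steps 4-6 of Algorithm 1: select peaks with rho*delta > d_r, then give
    every non-peak point the label of its nearest denser point sigma(i)."""
    n = len(rho)
    y = np.full(n, -1, dtype=int)
    peaks = np.where(rho * delta > d_r)[0]
    y[peaks] = peaks                        # Step 5: each peak starts its cluster
    order = np.argsort(-rho)                # Step 2: descending local density
    for i in order:                         # Step 6: denser points labeled first,
        if y[i] == -1:                      # so y[sigma[i]] is already known
            y[i] = y[sigma[i]]
    return y
```

Note that the point with the globally maximal local density is assumed to pass the γ threshold and become a peak; otherwise its followers would inherit the placeholder label -1.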
Appendix A describes Laio's implementation details for Step 3 of the DPC algorithm. Specifically, we discuss how it handles two ambiguous situations when calculating the separation distance using Equation (4).

Accelerating DPC by Scanning Neighbors Only
As described earlier, for each data point x_i ∈ X, Step 3 of the DPC algorithm in Algorithm 1 must scan half of X on average to find x_{σ(x_i)}, i.e., the data point nearest to x_i and with a local density > ρ(x_i). Observation 1 shows that we can find x_{σ(x_i)} by scanning only the neighbors of x_i if the local density of x_i is less than the maximal local density of its neighbors. Most data points satisfy this condition, and the size of a data point's neighborhood is much smaller than the size of X, so the time complexity of Step 3 can be reduced from O(N²) to O(Nb), where N denotes the number of data points in X and b denotes the average neighborhood size.
Observation 1. If ρ(x_i) < max_{x_j ∈ B(x_i)} ρ(x_j) for some data point x_i ∈ X, then the data point nearest to x_i and with a local density > ρ(x_i) is in B(x_i), i.e., x_{σ(x_i)} ∈ B(x_i). This holds because some neighbor of x_i has a higher local density and lies within distance d_c of x_i, so the nearest data point with a local density > ρ(x_i) must also lie within distance d_c of x_i, i.e., inside B(x_i).
Based on Observation 1, we rewrite Equations (4) and (5) as Equations (6) and (7) below. In Algorithm 2, we propose an accelerated version of DPC (called ADPC1), which produces the same clustering results as DPC does, but in less time:

$$\delta(x_i) = \begin{cases} \min\limits_{x_j \in B(x_i):\, \rho(x_j) > \rho(x_i)} d(x_i, x_j), & \text{if } \rho(x_i) < \max\limits_{x_j \in B(x_i)} \rho(x_j) \\ \text{as defined in Equation (4)}, & \text{otherwise.} \end{cases} \tag{6}$$

$$\sigma(x_i) = \begin{cases} \operatorname*{argmin}\limits_{j:\, x_j \in B(x_i),\, \rho(x_j) > \rho(x_i)} d(x_i, x_j), & \text{if } \rho(x_i) < \max\limits_{x_j \in B(x_i)} \rho(x_j) \\ \text{as defined in Equation (5)}, & \text{otherwise.} \end{cases} \tag{7}$$
Algorithm 2. ADPC1 algorithm.

Input: the set of data points X ∈ R^{N×M} and the parameters d_c for defining the neighborhood and d_r for selecting density peaks.
Output: the label vector of cluster indexes y ∈ R^{N×1}.
Algorithm:
1. Calculate ρ(x_i) for each x_i ∈ X using either Equation (1) or Equation (3), and keep B(x_i) for later use.
2. Sort all data points in X by their local densities in descending order.
3. Calculate δ(x_i) and σ(x_i) for each x_i ∈ X using Equations (6) and (7).
4. Select the data points with ρ(x_i)δ(x_i) > d_r as density peaks.
5. For each density peak x_i, set y_i = i. // starting point of each cluster
6. For each non-peak data point x_i, set y_i = y_{σ(x_i)}. // cluster assignment
The parts that differ from the DPC algorithm in Algorithm 1 (Steps 1 and 3) are highlighted in red.
Notably, in Step 1 of Algorithm 1, the DPC algorithm uses B(x_i) to calculate the local density ρ(x_i), but afterwards B(x_i) is no longer needed. In Algorithm 2, however, the ADPC1 algorithm must keep B(x_i) for calculating δ(x_i) and σ(x_i) in Step 3. If the condition ρ(x_i) < max_{x_j ∈ B(x_i)} ρ(x_j) does not hold, then σ(x_i) and δ(x_i) are calculated in the same way as in the DPC algorithm, i.e., by scanning half of the dataset X on average. Since the local density of a data point is usually less than the maximal local density of its neighbors, ADPC1 can greatly reduce the execution time. Appendix B describes the implementation details for Step 3 of the ADPC1 algorithm.
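A sketch of Step 3 under Equations (6) and (7) follows, with the same assumptions as the earlier sketches plus neighbors[i], an integer index array for B(x_i) kept from Step 1; the naming is ours.

```python
import numpy as np

def separation_distance_adpc1(D, rho, neighbors):
    """delta and sigma per Equations (6)-(7): scan only B(x_i) when a denser
    neighbor exists (Observation 1); otherwise fall back to DPC's full scan."""
    n = len(rho)
    delta = np.empty(n)
    sigma = np.empty(n, dtype=int)
    for i in range(n):
        nb = neighbors[i]
        denser_nb = nb[rho[nb] > rho[i]]             # denser points inside B(x_i)
        if denser_nb.size > 0:                       # Observation 1 applies
            j = denser_nb[np.argmin(D[i, denser_nb])]
            delta[i], sigma[i] = D[i, j], j
        else:                                        # full scan, as in DPC
            denser = np.where(rho > rho[i])[0]
            if denser.size == 0:
                delta[i], sigma[i] = D[i].max(), i
            else:
                j = denser[np.argmin(D[i, denser])]
                delta[i], sigma[i] = D[i, j], j
    return delta, sigma
```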

Accelerating DPC by Skipping Non-Peaks
Both DPC and ADPC1 need to calculate the separation distance δ(x_i) for each data point x_i. Recall that the purpose of calculating δ(x_i) is to assist in determining whether x_i is a density peak. Therefore, if we can determine that x_i is a non-peak data point at an early stage, then there is no need to calculate δ(x_i). Observation 2 gives a necessary condition for a density peak, which can be applied to detect most non-peak data points in a dataset.
Observation 2. If ρ(x_i) < max_{x_j ∈ B(x_i)} ρ(x_j) for some data point x_i ∈ X, then x_i cannot be a density peak. Indeed, by Observation 1, x_{σ(x_i)} ∈ B(x_i), so δ(x_i) < d_c, which is too small for x_i to be selected as a density peak.
If x_i is not a density peak, then we can skip calculating δ(x_i) and simply assign it a small value, say 0. However, without calculating δ(x_i), we do not know σ(x_i), i.e., the index of the data point nearest to x_i and with a local density > ρ(x_i). Notably, σ(x_i) is needed for the cluster assignment in Step 6 of the DPC and ADPC1 algorithms. To resolve this problem, we use the index of the data point with the highest local density in the neighborhood of x_i as a surrogate for σ(x_i) and redefine Equations (6) and (7) as Equations (8) and (9) below:

$$\delta(x_i) = \begin{cases} 0, & \text{if } \rho(x_i) < \max\limits_{x_j \in B(x_i)} \rho(x_j) \\ \text{as defined in Equation (4)}, & \text{otherwise.} \end{cases} \tag{8}$$

$$\sigma(x_i) = \begin{cases} \operatorname*{argmax}\limits_{j:\, x_j \in B(x_i)} \rho(x_j), & \text{if } \rho(x_i) < \max\limits_{x_j \in B(x_i)} \rho(x_j) \\ \text{as defined in Equation (5)}, & \text{otherwise.} \end{cases} \tag{9}$$

Notably, Equations (8) and (9) only modify the first case of Equations (6) and (7), i.e., the case in which the local density of x_i is less than the maximal local density of its neighbors. Based on Equations (8) and (9), we propose another accelerated version of DPC (called ADPC2), which is the same as ADPC1 in Algorithm 2 except that Step 3 of ADPC2 uses Equations (8) and (9) instead of Equations (6) and (7) to calculate δ(x_i) and σ(x_i), as shown in Algorithm 3. Notably, because ADPC1 and ADPC2 calculate σ(x_i) differently, their clustering results can differ slightly from each other. Appendix C describes the implementation details for Step 3 of the ADPC2 algorithm. A sketch of this computation follows.
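This is a minimal sketch of Step 3 under Equations (8) and (9), with the same assumptions as the earlier sketches (distance matrix D, neighbor lists kept from Step 1); the names are ours, and edge cases follow our reading of the text rather than a released implementation.

```python
import numpy as np

def separation_distance_adpc2(D, rho, neighbors):
    """delta and sigma per Equations (8)-(9): a point with a denser neighbor
    cannot be a peak (Observation 2), so skip delta and use the densest
    neighbor as a surrogate for sigma."""
    n = len(rho)
    delta = np.empty(n)
    sigma = np.empty(n, dtype=int)
    for i in range(n):
        nb = neighbors[i]
        if nb.size > 0 and rho[nb].max() > rho[i]:
            delta[i] = 0.0                        # non-peak: a small placeholder
            sigma[i] = nb[np.argmax(rho[nb])]     # densest neighbor as surrogate
        else:                                     # possible peak: full scan as DPC
            denser = np.where(rho > rho[i])[0]
            if denser.size == 0:
                delta[i], sigma[i] = D[i].max(), i
            else:
                j = denser[np.argmin(D[i, denser])]
                delta[i], sigma[i] = D[i, j], j
    return delta, sigma
```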

Algorithm 3. ADPC2 algorithm.
Input: the set of data points X ∈ R^{N×M} and the parameters d_c for defining the neighborhood and d_r for selecting density peaks.
Output: the label vector of cluster indexes y ∈ R^{N×1}.
Algorithm:
1. Calculate ρ(x_i) for each x_i ∈ X using either Equation (1) or Equation (3), and keep B(x_i) for later use.
2. Sort all data points in X by their local densities in descending order.
3. Calculate δ(x_i) and σ(x_i) for each x_i ∈ X using Equations (8) and (9).
4. Select the data points with ρ(x_i)δ(x_i) > d_r as density peaks.
5. For each density peak x_i, set y_i = i. // starting point of each cluster
6. For each non-peak data point x_i, set y_i = y_{σ(x_i)}. // cluster assignment
The parts that differ from the DPC algorithm in Algorithm 1 (Steps 1 and 3) are highlighted in red.

Test Datasets
In this study, we use 12 well-known two-dimensional synthetic datasets to demonstrate the performance of the proposed algorithms. Dataset Spiral [23] consists of three spiral-shaped clusters. Dataset Flame [24] consists of two non-Gaussian clusters that differ in size and shape. Dataset Aggregation [25] consists of seven perceptually distinct (non-Gaussian) clusters. Dataset R15 [26] consists of 15 similar Gaussian clusters positioned on concentric circles. Dataset D31 [26] consists of 31 similar Gaussian clusters positioned along random curves. Datasets A1, A2, and A3 [27] contain 20, 35, and 50 circular clusters, respectively, where each cluster has 150 points. Datasets S1, S2, S3, and S4 [28] each contain 15 Gaussian clusters, where the degree of cluster overlap increases in the order S1 < S2 < S3 < S4. Appendix D gives a detailed characterization of these datasets.

Experiment Setup
The experiment was divided into two tests. Test 1 used a hard threshold to calculate the local density, as defined in Equations (1) and (2); Test 2 used an exponential kernel to calculate the local density, as defined in Equation (3). In both tests, the value of d c for defining the neighborhood is determined by the parameter p, as suggested in [6] and described in Section 2. We varied the value of p from 0.5 to 4 with a step size of 0.5. A large p implied a large d c and consequently a large neighborhood.
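For concreteness, one way to realize this rule, patterned on the percentile heuristic used in the published DPC code [22], is sketched below; `choose_dc` and its interface are our own, and the exact rounding in [6,22] may differ.

```python
import numpy as np

def choose_dc(D, p):
    """Pick d_c so that, on average, a point's neighborhood holds about p%
    of the dataset: take roughly the p-th percentile of all pairwise distances."""
    n = D.shape[0]
    dists = np.sort(D[np.triu_indices(n, k=1)])   # each pair counted once
    k = min(len(dists) - 1, max(0, int(round(len(dists) * p / 100.0)) - 1))
    return dists[k]
```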
In this experimental study, we compared the performance of the proposed ADPC1 and ADPC2 against DPC. Recall that both ADPC1 and ADPC2 accelerate the derivation of the separation distances of those data points whose local density is less than the maximal local density of their neighbors. Thus, we calculated the proportion (denoted by R̄) of such data points in a dataset for various p values, i.e., R̄ = N̄/N, where N̄ is the number of such data points in the dataset and N is the total number of data points in the dataset. Usually, both N̄ and R̄ grow with a larger neighborhood (i.e., a larger d_c or p). Thus, the proposed ADPC1 and ADPC2 should perform better with a larger p.
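The proportion R̄ can be computed directly from the quantities kept by ADPC1 and ADPC2; the helper below is hypothetical but follows the definition just given.

```python
import numpy as np

def proportion_accelerated(rho, neighbors):
    """R-bar = N-bar / N: the fraction of points whose local density is less
    than the maximal local density among their neighbors."""
    flags = [nb.size > 0 and rho[nb].max() > rho[i]
             for i, nb in enumerate(neighbors)]
    return float(np.mean(flags))
```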
Since the three algorithms differ only in how they calculate the separation distance, we collected and compared their execution times for calculating the local density and the separation distance, i.e., Steps 1 to 3 of Algorithms 1-3. Then, for ease of comparison, we calculated the percentage of execution time improvement of ADPC1 (or ADPC2) over DPC as the difference between the execution times of DPC and ADPC1 (or ADPC2), divided by the execution time of DPC.
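In symbols (our notation):

$$\text{improvement} = \frac{T_{\mathrm{DPC}} - T_{\mathrm{ADPC}}}{T_{\mathrm{DPC}}} \times 100\%,$$

where T_DPC and T_ADPC denote the measured execution times of Steps 1-3 of DPC and of ADPC1 (or ADPC2), respectively.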

Test 1: Use a Fixed Threshold for Local Density
In Test 1, a fixed threshold is used to determine the neighborhood for calculating the local density of each data point. Table 1 shows the value of R̄, i.e., the proportion of data points with a local density less than the maximal local density of their neighbors. According to Table 1, except for some combinations of small datasets (e.g., the Spiral, Flame, Aggregation, and R15 datasets) and small p values, the value of R̄ is greater than 80% in most cases, indicating that a large proportion of the data points in a dataset can benefit from ADPC1 and ADPC2 to accelerate the calculation of their separation distances. Please refer to Table A2 in Appendix E for the value of N̄, i.e., the number of data points with a local density less than the maximal local density of their neighbors.

A larger p implies a larger d_c, and thus a larger neighborhood range and probably more neighbors in the neighborhood. Intuitively, for a data point with a larger number of neighbors, it becomes less likely that the local density of the data point is greater than the maximal local density of its neighbors. Therefore, as the value of p increases, the value of R̄ tends to increase (with some exceptions).

Table 2 shows the percentage of execution time improvement of ADPC1 and ADPC2 over DPC. Except for the two small datasets Spiral and Flame at p = 0.5, both ADPC1 and ADPC2 substantially reduced the execution time of DPC. ADPC2 took less time than ADPC1 did for most dataset and p value combinations. For the execution times of the three algorithms, please see Table A4 in Appendix E.

For most cases in Table 1, the values of R̄ were large and did not change much as the value of p increased. As a result, the impact of p's value on the execution time improvement is not obvious in Table 2. To show the impact of R̄ on the percentage of execution time improvement, consider the case of dataset D31 at p = 3 and 3.5. In Table 1, the value of R̄ dropped from 87.94% at p = 3 to 65.61% at p = 3.5. The corresponding case in Table 2 shows that at p = 3, ADPC1 (or ADPC2) achieved an execution time improvement over DPC of 77.86% (or 80.44%). However, at p = 3.5, ADPC1 (or ADPC2) achieved an execution time improvement over DPC of only 47.06% (or 48.53%). This example shows that a large R̄ helps ADPC1 and ADPC2 achieve a high percentage of execution time improvement. However, if a small p is applied to a small dataset, then the resulting R̄ value is too small, causing ADPC2 to run slower than DPC (e.g., datasets Flame and Spiral at p = 0.5).

Test 2: Use an Exponential Kernel for Local Density
In Test 2, an exponential kernel (see Equation (3)) is used to calculate the local density of each data point. Table 3 shows the value of R̄ for various dataset and p combinations. Please refer to Table A3 in Appendix E for the value of N̄. Similar to Table 1 in Test 1, a large proportion of the data points can benefit from ADPC1 and ADPC2. Furthermore, each value of R̄ in Table 3 is greater than its corresponding value in Table 1. That is, for the same dataset and the same p value, an even larger proportion of data points can benefit from ADPC1 and ADPC2 when using the exponential kernel than when using a fixed threshold to calculate the local density. In Test 2, a larger p value always yields a larger R̄ value in Table 3. These results are consistent with those of Test 1.

Table 4 shows the percentage of execution time improvement of ADPC1 and ADPC2 over DPC. ADPC1 always took less time than DPC did, except at p = 0.5 for the Spiral dataset; ADPC2 always took less time than DPC did, except at p = 0.5 for the Flame dataset. In general, both ADPC1 and ADPC2 required substantially less execution time than DPC did. ADPC2 usually achieved a higher improvement than ADPC1 did; however, the difference is small. For the execution times of the three algorithms, please see Table A5 in Appendix E. Table 4. Percentage of execution time improvements over DPC (using an exponential kernel).

Comparing Tables 2 and 4 shows that the execution time improvement is greater in Test 1 than in Test 2. In Test 1, calculating the local density of a data point requires simply counting the number of data points in its neighborhood (see Equations (1) and (2)). However, in Test 2, calculating the local density of a data point is much more time-consuming because it requires evaluating an exponential function N − 1 times, where N is the number of data points in the dataset (see Equation (3)). The execution time collected in this study is the execution time for calculating the local density and the separation distance. All three algorithms use the same method to calculate the local density; they differ only in how they calculate the separation distance. That is, the execution time improvement of ADPC1 and ADPC2 over DPC is due entirely to the improvement in calculating the separation distance. Since much more time was spent on calculating the local density in Test 2 than in Test 1, the percentage of execution time improvement is smaller in Test 2 than in Test 1.

Conclusions
As discussed in Section 3, if the local density of a data point x_i is less than the largest local density of its neighbors, then ADPC1 and ADPC2 can reduce the time complexity of calculating the separation distance of x_i from O(N) to O(|B(x_i)|), where N denotes the number of data points in the dataset and |B(x_i)| denotes the number of neighbors of x_i. Thus, the effectiveness of both ADPC1 and ADPC2 depends on the proportion of the data points satisfying this condition. The experimental results in Tables 1 and 3 show that most data points in a dataset satisfy this condition, except for some small datasets using a small neighborhood setting. Consequently, both ADPC1 and ADPC2 improve the execution time of DPC, as shown in Tables 2 and 4. Furthermore, in most cases, ADPC2 requires less execution time than ADPC1 does.
Consider the case in which all data points in a contiguous region have the same local density. Then, no data point in the region has a local density less than the largest local density of its neighbors, and consequently, neither ADPC1 nor ADPC2 can accelerate the computation of the separation distance for the data points in this region. If the entire dataset contains many such regions, then the advantage of ADPC1 and ADPC2 diminishes. However, according to Tables 1 and 3, except for small datasets with a small neighborhood range (i.e., a small d_c), both ADPC1 and ADPC2 are advantageous.
The proposed methods focus on accelerating the calculation of the separation distance. However, it is also possible to improve the DPC algorithm by accelerating the calculation of the local density [9]. In addition, the DPC algorithm has several shortcomings that have received much attention in the literature. First, choosing proper values for DPC's parameters is not straightforward, yet it strongly affects the quality of the clustering results. To resolve this problem, Ref. [7] applied the concept of heat diffusion and Ref. [8] employed the potential entropy of the data field to determine the value of d_c. Additionally, Ref. [12] proposed a comparative technique to choose the density peaks. Thus, how to make the DPC algorithm more adaptive to datasets with less human intervention is worthy of further investigation.
The local density of a data point x i can be defined from two different perspectives. One is to specify a fixed distance and count the number of data points within the fixed distance from x i . The DPC algorithm adopted this perspective. Another perspective is to specify a fixed number of neighbors and measure the distances of these neighbors to x i . Refs. [13,14] adopted this perspective and defined new methods to calculate the local density based on the k-nearest neighbors of x i . Since the definition of the local density significantly affects the clustering results, how to choose a proper method to define the local density is an important issue worthy of further investigation for density-based clustering algorithms.
Our future work intends to extend the DPC algorithm into a hierarchical clustering algorithm. Conceptually, the DPC algorithm builds a directed acyclic graph of all data points with an out-degree ≤ 1. Then, it selects several data points from the graph as the density peaks. Finally, it removes the outgoing links of the density peaks and breaks the graph into several subgraphs, each of which represents a cluster. By adding an ordering on the density peaks and incrementally removing the outgoing links of the density peaks according to this ordering, it is possible to yield the clustering results as a dendrogram. Furthermore, integrating the notion of central symmetry [29] or point symmetry [30] with the DPC algorithm for the detection of symmetric objects is also worthy of further investigation.

Acknowledgments:
The author acknowledges the Innovation Center for Big Data and Digital Convergence at Yuan Ze University for supporting this study.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A. Implementation Details for Calculating Separation Distance in DPC
Consider the case of more than one data point with a local density equal to the maximal local density in X. According to Equation (4), the separation distance of any data point x_i with the maximal local density will be set to the maximal distance from x_i to any point in X, i.e., max_{x_j ∈ X} d(x_i, x_j). Consequently, all data points with the maximal local density have high separation distances and, thus, will be chosen as density peaks to form individual clusters, even though some of these data points may be near each other. Notably, many data points with an equal local density are less likely to occur when Equation (3) is used for calculating the local density because the Gaussian kernel in Equation (3) yields a floating-point value. However, the local density calculated using Equation (1) is an integer, and data points with an equal local density become common.
Laio's Matlab implementation of the DPC algorithm [22] resolves the above problem as follows. Recall that in Step 2 of the DPC algorithm in Algorithm 1, all data points in X are sorted by their local densities in descending order, i.e., ρ(x_i) ≥ ρ(x_j) for i < j. After Step 2, Laio used the ordering of the data points' positions in X, instead of the ordering on local density, for calculating the separation distances. Specifically, Laio used Equations (A1) and (A2) instead of Equations (4) and (5) to calculate the separation distances. Notably, in this work, we use x_i to denote the ith data point in X, and whenever the ordering of the data points in X is rearranged, the data point referred to as x_i also changes. Notably, it is possible that more than one data point has the same local density, but each position in X can only be taken by one data point:

$$\delta(x_i) = \begin{cases} \max\limits_{x_j \in X} d(x_i, x_j), & \text{if } i = 1 \\ \min\limits_{j < i} d(x_i, x_j), & \text{if } i \neq 1 \end{cases} \tag{A1}$$

$$\sigma(x_i) = \begin{cases} 1, & \text{if } i = 1 \\ \operatorname*{argmin}\limits_{j < i} d(x_i, x_j), & \text{if } i \neq 1 \end{cases} \tag{A2}$$

According to Equation (A1), only the separation distance of the first data point x_1 in X is set to the maximal distance, and for each data point x_i, i ≠ 1, we only scan those data points located before x_i in X. Notably, with Equations (A1) and (A2), it is possible that σ(x_i) = j but ρ(x_j) = ρ(x_i) because the ordering on local density after Step 2 of the DPC algorithm in Algorithm 1 is non-increasing rather than strictly decreasing. However, with Equations (4) and (5), if σ(x_i) = j and i ≠ j, then ρ(x_j) > ρ(x_i) must hold. Thus, the definition of the separation distance according to Equation (4) has been slightly modified in Equation (A1), and the difference is illustrated in Figure A1.
Figure A1. Difference between using Equation (4) and using Equation (A1) to calculate the separation distance.

Figure A2 shows Laio's implementation of Step 3 of the DPC algorithm based on Equations (A1) and (A2); only the first data point is handled differently from the rest of the data points. Figure A3 shows an implementation of Step 3 of the DPC algorithm based on Equations (4) and (5). Data points with the maximal local density are handled in the same manner in Figures A2 and A3. However, for data points with a local density less than the maximal local density, Figure A3 faithfully implements Equations (4) and (5) to ensure that no data point with the same local density as x_i is scanned when calculating δ(x_i), as illustrated in Figure A1.

Appendix B. Implementation Details for Calculating Separation Distance in ADPC1

Figure A4 gives a detailed description of Step 3 of the ADPC1 algorithm in Algorithm 2. Notably, before this step, all data points in X have been sorted by their local densities in descending order. To resolve the problem of multiple data points with the maximal local density, we adopt the same approach described in Appendix A, as shown in the first for loop of Figure A4. That is, only the separation distance of the first data point with the maximal local density in X is set to the maximal distance. For each other data point x_i, i ≠ 1, with the maximal local density, the separation distance δ(x_i) is set to the minimal distance from x_i to the data points located before x_i in X. The data points with the maximal local density are handled in the same manner in Figures A2, A3, and A4. The second for loop in Figure A4 applies Equations (6) and (7) to process the data points with a local density less than the maximal local density, as rendered in the sketch below.

Figure A4. Implementation details of Step 3 of the ADPC1 algorithm (in Algorithm 2) based on Equations (6) and (7).
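This is a Python rendering of the two-loop structure of Figure A4, under the same assumptions as the earlier sketches (pairwise distance matrix D, rows pre-sorted by descending ρ, and neighbors[i] holding the indexes of B(x_i)); it reflects our reading of the figure, not Laio's code.

```python
import numpy as np

def adpc1_step3(D, rho, neighbors):
    """Step 3 of ADPC1 with the tie handling of Appendix A: the first loop
    covers points tied at the maximal local density (Equations (A1)-(A2)),
    the second covers the rest via Equations (6)-(7)."""
    n = len(rho)
    delta = np.empty(n)
    sigma = np.empty(n, dtype=int)
    delta[0] = D[0].max()                 # first point only: maximal distance
    sigma[0] = 0
    i = 1
    while i < n and rho[i] == rho[0]:     # other points tied at the maximal density
        j = int(np.argmin(D[i, :i]))      # scan only points located before x_i
        delta[i], sigma[i] = D[i, j], j
        i += 1
    for i in range(i, n):                 # points below the maximal density
        nb = neighbors[i]
        denser_nb = nb[rho[nb] > rho[i]]
        if denser_nb.size > 0:            # Equations (6)-(7), first case
            j = denser_nb[np.argmin(D[i, denser_nb])]
        else:                             # fall back to a full scan
            denser = np.where(rho > rho[i])[0]
            j = denser[np.argmin(D[i, denser])]
        delta[i], sigma[i] = D[i, j], j
    return delta, sigma
```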

Appendix C. Implementation Details for Calculating Separation Distance in ADPC2

Figure A5 gives a detailed description of Step 3 of the ADPC2 algorithm in Algorithm 3. To resolve the problem of multiple data points with the maximal local density, we adopt the same approach described in Appendix A, as shown in the first for loop of Figure A5. The second for loop in Figure A5 applies Equations (8) and (9) to process the data points with a local density less than the maximal local density.

Figure A5. Implementation details of Step 3 of the ADPC2 algorithm (in Algorithm 3) based on Equations (8) and (9).

Appendix D. Datasets

Figures A6 and A7 show the data distributions of the 12 two-dimensional synthetic datasets used in Section 5. Table A1 describes the number of clusters and the number of points in these datasets.