Improving Density Peak Clustering by Automatic Peak Selection and Single Linkage Clustering

Abstract: Density peak clustering (DPC) is a density-based clustering method that has attracted much attention in the academic community. DPC works by first searching for density peaks in the dataset and then assigning each data point to the same cluster as its nearest higher-density point. One problem with DPC is the determination of the density peaks, where poor selection of the density peaks could yield poor clustering results. Another problem with DPC is its cluster assignment strategy, which often makes incorrect cluster assignments for data points that are far from their nearest higher-density points. This study modifies DPC and proposes a new clustering algorithm to resolve the above problems. The proposed algorithm uses the radius of the neighborhood to automatically select a set of likely density peaks, which are far from their nearest higher-density points. Using the potential density peaks as the density peaks, it then applies DPC to yield the preliminary clustering results. Finally, it uses single-linkage clustering on the preliminary clustering results to reduce the number of clusters, if necessary. The proposed algorithm avoids the cluster assignment problem in DPC because the cluster assignments for the potential density peaks are based on single-linkage clustering, not on DPC. Our performance study shows that the proposed algorithm outperforms DPC for datasets with irregularly shaped clusters.


Introduction
Clustering is the process of grouping data points such that each group contains similar data points, and the data points in different groups are dissimilar. It is a major task in data mining and has applications in many fields, such as marketing [1,2], image processing [3,4], bioinformatics [5] and finance [6]. For different applications, the notion of "similarity" or "dissimilarity" varies. For example, in customer segmentation, two customers are similar if they exhibit a similar spending profile, and thus the "distance" between their spending profiles is a good measure of dissimilarity. However, this distance may not be a good measure of dissimilarity for the application of identifying communities in a social network of people, where two persons in the same community are deemed similar. Notably, two persons far apart could be in the same community as long as there is a group of nearby persons between them. That is, a community is a densely populated region of people. For this type of application, the notion of similarity is related to the distance between the data points and the densities of the data points. Distance may also be defined differently for different applications. For example, symmetric distance is used for clustering analysis in [7].
Various clustering applications motivate the research community to develop many clustering methods to meet different clustering needs. Major clustering methods can be classified into the following categories: partitioning methods, hierarchical methods, density-based methods, and grid-based methods, among others.

Figure 1. Flame dataset, where each directed link connects a data point to its nearest higher-density point. The links whose starting points have high densities and are far from their nearest higher-density points are shown in red. The exponential kernel and p = 2 are adopted for density calculation (see Section 2.1 for details).

One drawback of DPC is that selecting the density peaks can be difficult and ineffective, and poor selection of the density peaks yields poor clustering results. Another drawback of DPC is its cluster assignment strategy, which assigns a non-peak data point and its nearest higher-density point to the same cluster. This strategy works fine for those data points that are not far from their nearest higher-density point. However, for those data points that are far from their nearest higher-density point, this strategy often yields a poor cluster assignment. For example, consider the three longest red directed links in Figure 1. If the starting point and the ending point of any of these three links are placed into the same cluster, DPC will incorrectly place the entire upper portion and part of the lower portion of the Flame dataset into one cluster.
The objective of this study is to eliminate the above two drawbacks of DPC. We propose a new clustering method that integrates DPC and single linkage clustering [8]. The proposed method automatically determines a set of potential density peaks by choosing those data points far from their nearest higher-density points. Notably, we use the term "potential" density peaks because this set may also contain the outliers of the dataset. The proposed method adopts the cluster assignment strategy of DPC only for those data points not far from their nearest higher-density points. Consequently, the proposed method will not yield the four red directed links in Figure 1. For those data points far away from their nearest higher-density points, single linkage clustering is adopted to reduce the number of clusters further.
The rest of this paper is organized as follows. Section 2 describes DPC and reviews related work. Section 3 proposes our method, and Section 4 presents the experimental results. Finally, Section 5 concludes this study.

Density Peak Clustering (DPC)
The DPC algorithm contains several major steps: calculating density and searching the nearest higher-density point for each point in the dataset, selecting density peaks, and assigning clusters. This section describes these steps in detail.
The density of a point is based on a user-specified parameter d_c representing the radius of a point's neighborhood. Given a dataset X, the local density ρ(x_i) of a point x_i ∈ X can be calculated as the number of points within the neighborhood of x_i, as shown below.

ρ(x_i) = Σ_{x_j ∈ X, x_j ≠ x_i} χ(d(x_i, x_j) − d_c)    (1)
where d(x_i, x_j) is the Euclidean distance between points x_i and x_j, and χ(t) = 1 if t < 0 and χ(t) = 0 otherwise. For small datasets, Rodriguez and Laio [15] suggested using an exponential kernel for calculating density, as shown below.

ρ(x_i) = Σ_{x_j ∈ X, x_j ≠ x_i} exp(−(d(x_i, x_j)/d_c)²)    (2)
The value of d_c can be set to the lower p% of all pairwise distances between points in X, i.e., the p-th percentile of the distance distribution. Consequently, the average number of neighbors of a point is about p% of the total number of points in X. Notably, point x_j is a neighbor of point x_i if d(x_i, x_j) < d_c. Rodriguez and Laio [15] suggested using 1 ≤ p ≤ 2.
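As a concrete illustration, the following is a minimal NumPy/SciPy sketch of the percentile rule for d_c and of the two density estimates in (1) and (2). The function names and the kernel switch are ours, not from the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def neighborhood_radius(X, p=2.0):
    """Set d_c to the lower p% of all pairwise distances (Section 2.1)."""
    return np.percentile(pdist(X), p)

def local_density(X, d_c, kernel="cutoff"):
    """rho(x_i) per Equation (1) ("cutoff") or Equation (2) ("exp")."""
    D = squareform(pdist(X))                  # full n x n distance matrix
    if kernel == "cutoff":
        # chi(d - d_c) = 1 iff d < d_c; subtract 1 to exclude the point itself
        return (D < d_c).sum(axis=1) - 1.0
    # exponential kernel; subtract the self term exp(0) = 1
    return np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0
```

With p = 2, each point has on average about 2% of the dataset inside its neighborhood, matching the parameter setting used later in Section 4.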
Let δ(x_i) denote the distance between point x_i and its nearest higher-density point. Then, δ(x_i) can be calculated as follows. Notably, for the point with the highest density in X, δ(x_i) is set to the largest distance between any two points in X, as shown in the second case of (3).

δ(x_i) = min{ d(x_i, x_j) : ρ(x_j) > ρ(x_i) } if some x_j has ρ(x_j) > ρ(x_i); otherwise δ(x_i) = max_{x_j, x_k ∈ X} d(x_j, x_k)    (3)

For ease of illustration, σ(x_i) is used to denote the nearest higher-density point of x_i. That is:

σ(x_i) = argmin_{x_j : ρ(x_j) > ρ(x_i)} d(x_i, x_j) if some x_j has ρ(x_j) > ρ(x_i); otherwise σ(x_i) = x_i    (4)

The data point with the highest density does not have any higher-density point, so for this data point, we set σ(x_i) to x_i, as shown in the second case of (4). Once ρ(x_i) and δ(x_i) are available for each point x_i ∈ X, Rodriguez and Laio [15] suggested selecting those points with high ρ(x_i) and high δ(x_i) as density peaks. One way to achieve this is to select those points with γ(x_i) greater than a specified threshold, where γ(x_i) = ρ(x_i)δ(x_i). One problem with this method is that ρ(x_i) and δ(x_i) are on different scales, and one of them may dominate the ordering of γ(x_i), resulting in a poor selection of density peaks. Another way is to select the density peaks manually with the assistance of the decision graph [15], a two-dimensional graph with ρ(x_i) and δ(x_i) as the horizontal and vertical coordinates, respectively. However, manual selection from the decision graph remains difficult and ineffective.
After the density peaks have been determined, the cluster assignment proceeds straightforwardly. Suppose that k points are selected as the density peaks. First, each density peak forms a new cluster, with a cluster label from 1 to k. Let η(x_i) denote the cluster label of the cluster containing point x_i. Because each non-peak point x_i is assigned to the same cluster as its nearest higher-density point σ(x_i), η(x_i) can be determined as follows:

η(x_i) = η(σ(x_i)) for each non-peak point x_i    (5)

Notably, because (5) is recursive, we must ensure that η(σ(x_i)) is calculated before η(x_i). This can be achieved by performing the cluster assignment for the non-peak points in descending order of their densities. Figure 2 shows the DPC algorithm.
Figure 2. The DPC algorithm:
1. Calculate ρ(x_i) for each x_i ∈ X using either (1) or (2).
2. Calculate δ(x_i) and σ(x_i) for each x_i ∈ X using (3) and (4).
3. Select those points with high ρ(x_i) and high δ(x_i) as the density peaks.
4. Form k clusters with labels from 1 to k, where k is the number of density peaks, and each cluster contains one density peak. Set the cluster label η(x_i) for each density peak accordingly.
5. For each non-peak point x_i, in descending order of ρ(x_i), set η(x_i) = η(σ(x_i)) using (5).
6. Return η(x_i) for each point x_i ∈ X.

Figure 3 illustrates the clustering process of DPC using a directed tree with a sink vertex. Each vertex represents a point in the dataset, each directed link connects a point to its nearest higher-density point, and the sink vertex is the point with the highest density in the dataset. After selecting the density peaks (i.e., the three red vertices in Figure 3, whose densities are based on Equation (1)), DPC decomposes the directed tree into the same number of directed subtrees. Each subtree has its sink vertex at a density peak. All points in a subtree form a cluster (shown as a gray ellipse region in Figure 3).
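The listing above translates almost line-for-line into code. Below is a compact NumPy sketch of steps 2–6, assuming the densities rho come from (1) or (2) (e.g., the local_density sketch above); selecting the top-k values of γ stands in for the manual decision-graph step, as in the experiments of Section 4.

```python
import numpy as np

def dpc_labels(X, rho, k):
    """Steps 2-6 of Figure 2: delta/sigma search, peak selection, assignment."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    order = np.argsort(-rho)                 # points by descending density
    delta = np.full(n, D.max())              # second case of Eq. (3)
    sigma = np.arange(n)                     # second case of Eq. (4)
    for pos in range(1, n):
        i = order[pos]
        higher = order[:pos]                 # all points denser than x_i
        j = higher[np.argmin(D[i, higher])]  # nearest higher-density point
        delta[i], sigma[i] = D[i, j], j
    peaks = np.argsort(-(rho * delta))[:k]   # top-k gamma as density peaks
    labels = np.full(n, -1)
    labels[peaks] = np.arange(k)
    for i in order:                          # descending density ensures
        if labels[i] < 0:                    # eta(sigma(x_i)) is already set
            labels[i] = labels[sigma[i]]
    return labels
```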

Variants of DPC
DPC has received much attention in the research community, and many variants of DPC have been proposed. This section reviews them from three perspectives: parameter setting, density peak selection, and computation acceleration.
Because DPC's parameters can affect the clustering performance, many studies focused on setting these parameters properly. For example, ref. [16] applied the concept of heat diffusion and [17] employed the potential entropy of the data field to assist in setting the radius d_c. Also, many studies suggested using k nearest neighbors to define density, instead of using the radius d_c [18–21]. Furthermore, ref. [22] suggested calculating two kinds of densities, one based on k nearest neighbors and one based on local spatial position deviation, to handle datasets with mixed density clusters.
As described in Section 1, selecting the density peaks in DPC can be difficult and ineffective. To resolve this problem, ref. [23] proposed a comparative technique to choose the density peaks, ref. [24] estimated density dips between points to determine the number of clusters, and [25] applied data detection to determine density peaks automatically. In [21], the optimal number of clusters was extracted from the results of hierarchical clustering. Furthermore, it may be more suitable for some datasets to locate a cluster by more than one density peak [26,27]. Generally speaking, the goal is to make the clustering process more adaptive to the dataset with less human intervention.
Several studies focused on accelerating DPC [28–30]. Recall that DPC needs to search the nearest higher-density point σ(x_i) for each point x_i (see Equation (4)). For each point x_i whose density is not the highest within its neighborhood, ref. [28] suggested that we can omit this step by simply setting σ(x_i) to the point with the highest density within the neighborhood of x_i. Because most points are not the point with the highest density in their respective neighborhoods, this method accelerates the calculation of σ(x_i) in DPC. Alternatively, ref. [29] accelerated the calculation of the density ρ(x_i) by integrating k-means with DPC. Also, ref. [30] used k nearest neighbors to accelerate the calculation of both ρ(x_i) and δ(x_i).
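To illustrate the shortcut of [28], here is a rough sketch of our reading of it: whenever a point is not the densest within its own neighborhood, its nearest higher-density point is replaced by the densest point in that neighborhood, skipping the global search. The function name and the -1 convention for the remaining points are our assumptions.

```python
import numpy as np

def sigma_shortcut(D, rho, d_c):
    """Approximate sigma(x_i) by the densest neighbor within radius d_c."""
    n = len(rho)
    sigma = np.full(n, -1)                    # -1: fall back to the full search
    for i in range(n):
        nbrs = np.flatnonzero((D[i] < d_c) & (np.arange(n) != i))
        if nbrs.size and rho[nbrs].max() > rho[i]:
            sigma[i] = nbrs[np.argmax(rho[nbrs])]
    return sigma
```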

The Proposed Method
This section describes the proposed method, which avoids the two drawbacks of DPC described in Section 1. Specifically, the proposed method does not need to manually select the density peaks, and it does not place a point and its nearest higher-density point in the same cluster if the two points are far apart. The proposed method contains five stages: build a directed tree, remove long links, generate a preliminary clustering, apply hierarchical clustering to the forest, and generate the flat clustering result. The proposed method, referred to as density peak single linkage clustering (DPSLC), is shown in Figure 4.
Figure 4. The DPSLC algorithm; its last step (step 14) performs single-linkage agglomerative clustering on the preliminary clustering result.

Recall from Section 2.1 and Figure 3 that DPC constructs a directed tree with a sink vertex. As with DPC, Stage 1 of DPSLC also constructs a directed tree. Figure 5 shows an example where each filled circle and the integer next to it represent a point and the point's density (based on Equation (1)), respectively, and each red dashed circle shows the neighborhood of a point. Each link connects a point to its nearest higher-density point; the red filled circle is the point with the highest density, which is also the sink vertex of the directed tree.

Stage 2 of DPSLC decomposes the directed tree into a forest of directed subtrees by removing those links that are too long. Specifically, DPSLC removes the link x_i → x_j if d(x_i, x_j) ≥ 2.1 d_c. Stage 2 of DPSLC will delete the three non-black links in Figure 5. The green link illustrates the case of linking from an outlier; the purple link illustrates the case of linking from a high-density point to its nearest higher-density point that is far away. Removing these two types of links allows DPSLC to break the connection between two regions that are not densely connected. The blue link connects the blue point to its nearest higher-density point in the gray region on the right of Figure 5. However, the two neighbors of the blue point (within the red dashed circle centered at the blue point) connect to the gray region on the left of Figure 5. Thus, removing the blue link breaks the connection between the blue point and the right gray region and yields a new region containing only the blue point. This new region will be combined with other regions later in Stage 4. After Stage 2, DPSLC yields a forest of four directed trees (shown as gray regions in Figure 5).
Stage 3 of DPSLC constructs a preliminary clustering result by forming a cluster for each directed tree in the forest. The four gray regions in Figure 5 show the four clusters generated in this stage.
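Stages 2 and 3 admit a simple sketch, assuming sigma holds each point's nearest higher-density point (e.g., computed as in the DPC sketch of Section 2.1): retained links are merged with a union-find, and each resulting directed tree becomes one preliminary cluster. The function name is ours.

```python
import numpy as np

def preliminary_clusters(D, sigma, d_c):
    """Stages 2-3 of DPSLC: drop links of length >= 2.1*d_c, then label
    each remaining directed subtree as one preliminary cluster."""
    n = len(sigma)
    parent = np.arange(n)                     # union-find over retained links

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]     # path halving
            i = parent[i]
        return i

    for i in range(n):
        j = sigma[i]
        if j != i and D[i, j] < 2.1 * d_c:    # Stage 2: keep short links only
            parent[find(i)] = find(j)
    roots = np.array([find(i) for i in range(n)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels                             # Stage 3: one cluster per tree
```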
Let C_i and C_j be two clusters in the preliminary clustering result of Stage 3. The single-linkage distance Δ_S(C_i, C_j) between C_i and C_j is the distance between the two closest points, one in each cluster:

Δ_S(C_i, C_j) = min{ d(x, y) : x ∈ C_i, y ∈ C_j }    (6)

The overlapping distance Δ_O(C_i, C_j) between C_i and C_j is the reciprocal of one plus the number of point pairs (x, y) satisfying d(x, y) < 2d_c for x ∈ C_i and y ∈ C_j:

Δ_O(C_i, C_j) = 1 / (1 + |{ (x, y) : x ∈ C_i, y ∈ C_j, d(x, y) < 2d_c }|)    (7)

Stage 4 of DPSLC performs agglomerative clustering on the preliminary clustering result of Stage 3 based on the single-linkage distance or the overlapping distance defined in (6) and (7). The overlapping distance is suitable for most datasets; however, for highly unbalanced datasets, the single-linkage distance is preferred. The two nearest clusters are repeatedly combined into one until only one cluster is left, and a dendrogram is generated to show the hierarchical relationship among clusters. Figure 6 shows the dendrogram for the example in Figure 5. The red, green, and purple links in Figure 6 are the three shortest single-linkage distances adopted in the dendrogram. Notably, the number of clusters in the preliminary clustering result generated in Stage 3 is usually small, so the hierarchical clustering in Stage 4 does not consume much time.

Stage 5 of DPSLC generates the flat clustering result. Given the desired number of clusters k, this can be done by a horizontal cut on the dendrogram generated in Stage 4. For example, in Figure 6, setting the cutting distance to a and b results in two and three clusters, respectively.
An additional parameter, min_pts, can be used to enforce that the final clustering result contains k large clusters, each with no fewer than min_pts points, and possibly some small clusters holding outliers in the dataset.
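The two cluster distances and the Stage 4–5 merging loop admit a direct, if naive, implementation. The following sketch merges the preliminary clusters down to k flat clusters; handling min_pts and emitting a full dendrogram are omitted, and the function names are ours.

```python
import numpy as np
from itertools import combinations

def cluster_distance(D, A, B, d_c, mode="overlap"):
    """Single-linkage distance (6) or overlapping distance (7)."""
    block = D[np.ix_(A, B)]                   # pairwise distances across A, B
    if mode == "single":
        return block.min()
    return 1.0 / (1.0 + np.count_nonzero(block < 2.0 * d_c))

def agglomerate(D, labels, k, d_c, mode="overlap"):
    """Stages 4-5: repeatedly merge the two nearest clusters until k remain
    (equivalent to a horizontal cut of the dendrogram)."""
    clusters = {c: np.flatnonzero(labels == c) for c in np.unique(labels)}
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda ab: cluster_distance(D, clusters[ab[0]],
                                                   clusters[ab[1]], d_c, mode))
        clusters[a] = np.concatenate([clusters[a], clusters.pop(b)])
    out = np.empty_like(labels)
    for new_id, members in enumerate(clusters.values()):
        out[members] = new_id
    return out
```

For the highly unbalanced dataset in Section 4, mode="single" would be passed instead, matching the parameter setting described there.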

Test Datasets
In this study, we used 12 well-known two-dimensional synthetic datasets to demonstrate the performance of the proposed algorithm. Table 1 describes the number of clusters and the number of points in these datasets; see Appendix A for the data distribution of these datasets. Dataset Spiral [32] consists of three spiral-shaped clusters. Dataset R15 [33] consists of 15 similar Gaussian clusters positioned on concentric circles. Dataset D31 [33] consists of 31 similar Gaussian clusters positioned along random curves. Dataset A1 [34] contains 20 circular clusters, where each cluster has 150 points. Datasets T300, T1000, and T2000 each contain two half-ring-shaped clusters, with densities increasing in the order T300 < T1000 < T2000. Dataset Flame [31] consists of two non-Gaussian clusters that differ in size and shape. Dataset Aggregation [35] consists of seven perceptually distinct (non-Gaussian) clusters. Dataset Jain [36] consists of two crescent-shaped clusters with different densities. Dataset SMS02 consists of three rectangular clusters of different sizes. Dataset Unbalance consists of eight clusters, three of which are dense and the other five sparse.

Table 2 shows the parameter setting of DPSLC. Parameter p = 2 is adopted to determine the radius d_c of the neighborhood, as described in Section 2.1. The number of clusters k is set to the exact number of clusters in the dataset, as specified in Table 1. Parameter min_pts is set to two, so the final clustering result contains k large clusters (each with no fewer than min_pts points) and, possibly, some small clusters of outliers. For all datasets except dataset Unbalance, the overlapping distance (see Equation (7)) is adopted to calculate the distance between two clusters in the preliminary clustering result generated at Stage 3 of DPSLC. Because dataset Unbalance contains clusters of extremely different densities, the single-linkage distance is adopted instead.

Experiment Setup
The experiment compares the performance of DPSLC and DPC. Table 3 shows the parameter setting of DPC. Parameter p is set to two, the same as in DPSLC. The top k data points with the highest γ(x_i) are selected as the density peaks, where γ(x_i) = ρ(x_i)δ(x_i) and k is set to the exact number of clusters in the dataset, as specified in Table 1. For each dataset, the clustering result C of DPSLC or DPC is compared against the ground truth T. The following four measures are collected:

• Homogeneity score measures the extent to which the data points in the same cluster according to C are indeed in the same cluster according to the ground truth T. The homogeneity score is between 0 and 1, where 1 indicates that C is a perfectly homogeneous labeling.
• Completeness score measures the extent to which the data points in the same cluster according to the ground truth T are placed in the same cluster according to C. The completeness score is between 0 and 1, where 1 indicates that C is a perfectly complete labeling.
• Adjusted Rand index (ARI) = (RI − Expected_Value(RI))/(max(RI) − Expected_Value(RI)), where RI (short for Rand index) is a similarity measure between two clustering results of the same dataset that considers all pairs of data points assigned to the same or different clusters in the two clustering results. ARI adjusts RI for chance, so random clustering results have an ARI close to 0. ARI can be negative if RI is less than its expected value. When two clustering results are identical, ARI = 1.
• Adjusted mutual information (AMI) adjusts mutual information (MI) to correct for the effect of chance agreement. Similar to ARI, random clustering results have an AMI close to 0. When two clustering results are identical, AMI = 1.
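The paper does not state which implementation computed these measures; all four are available in scikit-learn, so a minimal evaluation helper might look as follows.

```python
from sklearn import metrics

def evaluate(truth, pred):
    """The four measures reported in Table 4."""
    return {
        "homogeneity": metrics.homogeneity_score(truth, pred),
        "completeness": metrics.completeness_score(truth, pred),
        "ARI": metrics.adjusted_rand_score(truth, pred),
        "AMI": metrics.adjusted_mutual_info_score(truth, pred),
    }

# Label permutations do not matter: this scores 1.0 on all four measures.
print(evaluate([0, 0, 1, 1], [1, 1, 0, 0]))
```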

Experiment Results
Table 4 shows the performance results. DPSLC and DPC yield the same clustering results for the first six datasets in Table 4. The common characteristic of these six datasets is that they contain clusters that are nicely separated and of similar densities. Both approaches achieve excellent performance on these six datasets.

DPSLC outperforms DPC on the bottom six datasets in Table 4. The clusters in each of these six datasets are either not well separated or of very different densities. DPC performs poorly on these datasets, but DPSLC still achieves excellent clustering results. The rest of this section inspects the process of DPSLC for these datasets. DPC's clustering results are presented in Appendix B.

Figure 7 shows the process of applying DPSLC on dataset T300. The directed tree generated in Stage 1 contains several long links (see Figure 7a), which are subsequently removed in Stage 2 (see Figure 7b). The preliminary clustering result contains eight clusters, where the star symbols indicate the positions of the density peaks (see Figure 7c). The final clustering result contains two clusters.

Figure 8 shows the process of applying DPSLC on dataset Flame. Notice that the preliminary clustering result contains six clusters, including a cluster of two outliers in the top left corner (see Figure 8c). The final clustering result includes two large clusters and a small cluster of outliers (shown in the gray region in Figure 8d).

Figure 9 shows the process of applying DPSLC on dataset Aggregation. According to the ground truth in Figure A1i, the two data points placed in the wrong clusters by DPSLC are shown in the gray region in Figure 9d.

Figure 10 shows the process of applying DPSLC on dataset Jain. Dataset Jain contains one dense region and one sparse region. DPSLC breaks the dataset into 15 small groups in the preliminary clustering result (see Figure 10c) and coalesces them into two clusters in the final result (see Figure 10d).

Figure 11 shows the process of applying DPSLC on dataset SMS02. According to the ground truth in Figure A1k, the four data points placed in the wrong clusters by DPSLC are shown in the gray region in Figure 11d.

Figure 12 shows the process of applying DPSLC on dataset Unbalance, which contains three dense regions and five sparse regions. The densities of the dense regions and the sparse regions differ significantly.
Consequently, setting parameter p = 2 results in a small value for d_c, which yields many clusters with just one point in the sparse regions at Stage 3 of DPSLC, as shown in Figure 12c. Most of these clusters correctly coalesce at Stage 5 of DPSLC; however, two data points are still misidentified as outliers by DPSLC, shown in the gray region in Figure 12d.

The two gray regions in Figure 17d and the two gray regions in Figure 18d indicate data points where the results of DPSLC and the ground truth disagree. We manually inspect those data points in Figures 17d and 18d against their ground truth in Figure A1e,f. It appears that DPSLC makes a better cluster assignment than the ground truth does for those data points.

Conclusions
This paper proposes DPSLC to improve DPC. DPSLC effectively avoids assigning a data point to the same cluster as its nearest higher-density point when the two points are far apart. However, such a strategy could also yield many small clusters. For example, in the preliminary clustering result of dataset Unbalance, many small clusters appear on the right half of Figure 12c. DPSLC overcomes this problem by applying single-linkage agglomerative clustering to the preliminary clustering result. The performance results in Table 4 show that DPSLC performs well on those datasets where DPC falls short.
Density-based clustering approaches are based on the idea of searching for dense regions in a dataset. However, there is no de facto standard for what constitutes a dense region. In this study, we use a radius d_c to define the neighborhood and, subsequently, calculate the density of a data point [10,15]. Some other studies calculate the density of a data point using the distance to the data point's k-th nearest neighbor [18–21]. However, the proper value for either d_c or k depends on the characteristics of the dataset. Thus, clustering approaches whose value for d_c or k is adaptive to the dataset are worthy of investigation. In this study, we set the value of d_c so that each data point has, on average, about 2% of all data points within its neighborhood. Thus, the value of d_c is adaptive to the dataset to a small extent. However, more sophisticated strategies are needed.
Finally, using a single d_c or k may be insufficient for those datasets containing clusters with a wide range of densities. Using multiple values of d_c or k may be a better way to capture the patterns of these clusters. For example, persistent homology detects persistent topological features by inspecting data points over a wide range of scales [37]. Similarly, DPC could experiment with a wide range of d_c or k values to detect persistent clustering among data points. Alternatively, future research can also consider the integrated use of d_c and k.

Conflicts of Interest:
The authors declare no conflicts of interest.

Appendix A. Datasets
Figure A1 shows the data distribution and the ground-truth clustering of the 12 two-dimensional synthetic datasets used in Section 4.

Appendix B. Clustering Results of DPC
Figure A2 shows the clustering results of using DPC on the 12 datasets described in Section 4. The parameter p is set to two. The density peak selection criterion is based on γ(x_i) = ρ(x_i)δ(x_i), as described in Section 2.1, and the number of density peaks selected is set to the exact number of clusters in each dataset, as specified in Table 1. The star symbols in Figure A2 indicate the positions of the density peaks.