Density Peak Clustering Based on Relative Density under Progressive Allocation Strategy

Abstract: In traditional density peak clustering, when the density distribution of samples in a dataset is uneven, the density peak points are often concentrated in the region with dense sample distribution, which easily degrades clustering accuracy. In this paper, a density peak clustering algorithm based on relative density under a progressive allocation strategy (DPC-RD-PAS) is proposed. This algorithm uses the K-nearest neighbor method to calculate the local density of sample points. In addition, in order to avoid the domino effect during sample allocation, a new similarity calculation method is defined, and a progressive, near-to-far allocation strategy is used for the allocation of the remaining points. To evaluate the effectiveness of this algorithm, comparative experiments with five algorithms were carried out on classical artificial datasets and real datasets. Experimental results show that the proposed algorithm achieves higher clustering accuracy on datasets with an uneven density distribution.


Introduction
Clustering is an unsupervised machine learning [1][2][3] technique that aims to group objects according to the similarity relationship so that objects with high similarity are assigned to the same group and objects with high dissimilarity are isolated to different groups. Because clustering can discover the inherent structure information of objects, it has been widely used in image processing [4][5][6], fraud detection [7,8], information security [9,10], and medical applications [11,12].
In 2014, Rodriguez and Laio [13] proposed the density peaks clustering (DPC) algorithm. This algorithm classifies objects in two steps: (1) assuming that the cluster centers have a high local density and are relatively far away from each other, a decision graph is generated to select cluster centers that meet these assumptions; (2) noncentral points are assigned to the cluster of their nearest neighbor with higher density. Based on the above steps, DPC can not only effectively select cluster centers from the decision graph but also effectively allocate the remaining noncentral points. Benefiting from this simple and efficient clustering logic, DPC can achieve good clustering results on datasets with arbitrary shapes. However, DPC is not impeccable, and it still faces some problems to be improved. For example, this algorithm uses the Euclidean distance to calculate the density and to search for the density peaks, which is not suitable for a manifold structure [14], and the results are not satisfactory when processing some datasets with an uneven density. Further, the cluster allocation strategy of DPC may produce a domino effect; that is, the wrong allocation of one point may lead to the wrong allocation of all subsequent points. In order to overcome these problems, many researchers have improved and extended the original DPC algorithm. Du et al. [15] proposed the DPC-KNN algorithm based on the K-nearest neighbor (KNN) [16] concept. This algorithm changed the method of calculating the local density in DPC, combined density peak clustering with KNN, and considered the surrounding environment of objects. At the same time, this algorithm used principal component analysis to reduce the dimensionality of the data and improve the clustering performance on high-dimensional datasets.

Density Peak Clustering
DPC is a new clustering algorithm based on density and distance. This algorithm assumes that (1) each cluster center is surrounded by neighbors with low local density, and (2) the distance between the cluster center and any point with higher local density is relatively large. In DPC, each data point i is described by two important indicators: the local density ρ_i, and the distance δ_i between data point i and the nearest point with a higher density.
For the local density value of data point i, the DPC algorithm provides two calculation methods: the cutoff distance method and the kernel distance method, which are respectively defined as follows:

ρ_i = Σ_{j ≠ i} χ(d_ij − d_c), where χ(x) = 1 if x < 0 and χ(x) = 0 otherwise, (1)

ρ_i = Σ_{j ≠ i} exp(−(d_ij / d_c)²), (2)

where d_ij is the Euclidean distance between data points i and j, and d_c is the cutoff distance, a neighborhood radius set by the user. Therefore, the local density ρ_i is related to the number of points whose distance from data point i is less than the cutoff distance d_c. The local density obtained by Equation (1) is a discrete value, and that obtained by Equation (2) is a continuous value.
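As a concrete illustration, the two density measures can be sketched in plain Python (the function names and toy data are ours, not code from the paper):

```python
import math

def euclidean(a, b):
    # Straight-line distance between two points given as coordinate tuples.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cutoff_density(points, i, d_c):
    # Equation (1): count how many other points lie closer than d_c.
    return sum(1 for j in range(len(points))
               if j != i and euclidean(points[i], points[j]) < d_c)

def gaussian_density(points, i, d_c):
    # Equation (2): continuous density from a Gaussian kernel of width d_c.
    return sum(math.exp(-(euclidean(points[i], points[j]) / d_c) ** 2)
               for j in range(len(points)) if j != i)

pts = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0)]
rho_discrete = [cutoff_density(pts, i, d_c=0.5) for i in range(len(pts))]
rho_smooth = [gaussian_density(pts, i, d_c=0.5) for i in range(len(pts))]
```

The same cutoff distance d_c serves as a hard counting radius in the first form and as a kernel width in the second.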
The relative distance is defined as follows:

δ_i = min_{j: ρ_j > ρ_i} d_ij (3)

As shown in Equation (3), the relative distance of sample point i is the minimum distance d_ij to point j, where the condition on sample point j is that its local density is greater than that of sample point i. For the sample point i with the highest density, its relative distance is defined as follows:

δ_i = max_j d_ij (4)

The cluster center points are located at the top right of the decision graph; that is, the cluster centers have a high density and a large relative distance at the same time. To facilitate the selection of appropriate cluster center points in the decision graph, the following formula is defined:

γ_i = ρ_i · δ_i (5)

DPC clustering mainly includes two steps. The first step tries to find the density peaks. Based on the above analysis, we can find the appropriate cluster centers at the upper right of the decision graph, where the x-axis of the decision graph is the local density calculated by Equation (1) or (2), and the y-axis is the relative distance calculated by Equations (3) and (4). In the second step, each remaining sample point is allocated to the cluster to which its nearest neighbor with a higher density belongs. This nearest neighbor has already been obtained when calculating the relative distance, and, therefore, the DPC algorithm has high allocation efficiency.
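The whole two-step procedure fits in a short sketch (a simplified illustration under our own naming, using the kernel density of Equation (2); it assumes no exact density ties outside the chosen centers):

```python
import math

def dpc_sketch(points, d_c, n_centers):
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Local density via the Gaussian kernel, Equation (2).
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # Relative distance and nearest higher-density neighbour, Equations (3)-(4).
    delta, parent = [0.0] * n, [-1] * n
    for i in range(n):
        higher = [j for j in range(n) if rho[j] > rho[i]]
        if higher:
            parent[i] = min(higher, key=lambda j: dist[i][j])
            delta[i] = dist[i][parent[i]]
        else:
            delta[i] = max(dist[i])  # the highest-density point
    # Rank candidates by rho * delta: high density AND large relative distance.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_centers]
    labels = [-1] * n
    for cid, c in enumerate(centers):
        labels[c] = cid
    # Step two of DPC: descend by density, inheriting the parent's cluster.
    for i in sorted(range(n), key=lambda i: -rho[i]):
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels
```

On two well-separated blobs, `dpc_sketch(pts, 0.5, 2)` recovers the two groups; the single pass over points in decreasing density order is exactly what makes DPC fast but also what enables the domino effect discussed below.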
Although the experimental results show that DPC performs well in many cases, the allocation strategy on some non-uniform [22] density datasets has some shortcomings. Figure 1b describes the clustering results of DPC on Jain, a classic data set with an uneven density. In Figure 1, black solid pentagrams represent the cluster centers, and different colors represent different clusters. It can be seen that the density of the upper part of this dataset is significantly lower than that of the lower part when we use the local density calculation method of DPC. After selecting the points with a high local density and relative distance as cluster centers through the decision graph, we can see that the two cluster centers are both wrongly selected in the lower half of the dataset. Moreover, due to the wrong selection of cluster centers, a series of wrong assignments occur in the subsequent points.
As shown in Figure 2, on the Pathbased dataset, the DPC algorithm can select the correct cluster centers from the decision graph. However, as the remaining objects are allocated from high to low density, they are allocated to the cluster where the assigned points with higher density and the smallest relative distance are located. It can be seen from Figure 2b that the blue points are distributed first because of their high density and, thus, form a blue cluster. The points on the left ring should have been assigned to the pink cluster, but because their density is significantly lower than that of the blue cluster, they are incorrectly assigned to the blue cluster.

DPC-RD-PAS
DPC is easily affected by the cutoff distance, d_c, when calculating the density of the sample points. This is because the value of d_c is determined based on the global distribution of objects, ignoring the local information between objects, which easily causes the cluster centers to be concentrated in the area with dense objects (as shown in Figure 1). In view of this, our DPC-RD-PAS algorithm uses the K-nearest neighbor idea to define the local density calculation method and then calculates the local density of the sample points.

Relative K-Nearest Neighbor Local Density
Definition 1. Relative K-nearest neighbor local density. The local density calculated from the relative K-nearest neighbors around a sample point is called the relative K-nearest neighbor local density, which can be calculated as in Equation (6), where Γ(i) represents the set of the K-nearest neighbors of sample point i, and Γ̂(i) is the total set composed of the K-nearest neighbors of all objects in the set Γ(i).
By using Equation (6) to calculate the local density of the sample points, the possibility that the cluster centers are located in a relatively sparse region can be improved through the relative concept, so as to avoid the cluster centers being concentrated in the high-density region. This method is helpful to improve the correctness of the cluster centers' selection, especially for datasets with an uneven density distribution.
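Because the exact form of Equation (6) does not survive in this copy, the sketch below shows one plausible reading of the idea: a point's K-nearest-neighbor density divided by the average density over its neighborhood, so density is judged relative to the local surroundings rather than on a global scale. All names here (`knn_density`, `relative_knn_density`) are ours:

```python
import math

def knn_indices(points, i, k):
    # Indices of the k nearest neighbours of point i (excluding i itself).
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: math.dist(points[i], points[j]))
    return order[:k]

def knn_density(points, i, k):
    # Plain KNN density: inverse of the mean distance to the k neighbours.
    dists = [math.dist(points[i], points[j]) for j in knn_indices(points, i, k)]
    return k / (sum(dists) + 1e-12)

def relative_knn_density(points, i, k):
    # Hypothetical reading of Equation (6): point i's KNN density divided
    # by the mean KNN density over its neighbourhood set, so a point is
    # judged dense relative to its surroundings, not on the global scale.
    neigh = knn_indices(points, i, k)
    mean_neigh = sum(knn_density(points, j, k) for j in neigh) / len(neigh)
    return knn_density(points, i, k) / mean_neigh

pts = [(0, 0), (0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1),   # dense blob
       (10, 10), (11, 10), (9, 10), (10, 11), (10, 9)]     # sparse blob
rel_dense = relative_knn_density(pts, 0, k=4)    # centre of the dense blob
rel_sparse = relative_knn_density(pts, 5, k=4)   # centre of the sparse blob
```

On this toy input the absolute KNN densities of the two blob centres differ by an order of magnitude, yet their relative densities agree, which is the behaviour that keeps cluster centers from crowding into the dense region.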
In addition, our DPC-RD-PAS algorithm optimizes the allocation mode of DPC and adopts the strategy of multi-step progressive allocation.

Progressive Allocation
To introduce the multi-step progressive allocation strategy in detail, the following two definitions are given.

Definition 2. Nearest neighbors among unassigned points.
Within the K-nearest neighbors of each allocated point P, find all the unassigned points. Among all these unassigned points, the one nearest to its allocated point is regarded as the next point to be assigned.
For example, in Figure 3, take point P2 as the center to calculate the K-nearest neighbors, which can be divided into two groups: assigned points and unallocated points. The blue points (P1, P2, and P3) are assigned points, and the grey points (Q1, Q2, and Q3) are unallocated points. Find the nearest neighbor from the unallocated points to the assigned points, taking the shortest distance as the benchmark, and take the corresponding unallocated point as the point to be assigned. From Figure 3, we can obtain that d(Q1, P1) < d(Q2, P2) < d(Q3, P3). Therefore, point Q1 will be the nearest neighbor among the unassigned points.
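Definition 2 amounts to a small search over the current frontier; a sketch with our own names, where `assigned` is a boolean mask over the points:

```python
import math

def next_point_to_assign(points, assigned, k):
    # Definition 2: among the unassigned points appearing in the K-nearest
    # neighbours of any assigned point P, return the one whose distance to
    # its anchor P is smallest, together with that distance and the anchor.
    best = None   # (distance, unassigned index, anchor index)
    n = len(points)
    for p in range(n):
        if not assigned[p]:
            continue
        neigh = sorted((j for j in range(n) if j != p),
                       key=lambda j: math.dist(points[p], points[j]))[:k]
        for q in neigh:
            if not assigned[q]:
                d = math.dist(points[p], points[q])
                if best is None or d < best[0]:
                    best = (d, q, p)
    return best   # None if no unassigned point lies in any KNN range

# Mirroring Figure 3: three assigned points and three unassigned ones,
# arranged so that d(Q1, P1) < d(Q2, P2) < d(Q3, P3); Q1 is chosen first.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),    # P1, P2, P3 (assigned)
       (0.3, 0.0), (1.6, 0.0), (2.9, 0.0)]    # Q1, Q2, Q3 (unassigned)
d, q, p = next_point_to_assign(pts, [True, True, True, False, False, False], k=5)
```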
Definition 3. Relation degree. The K-nearest neighbors of point P and point Q are calculated respectively and sorted to obtain the sets Γ(P) and Γ(Q). The ranking position of point P in the Γ(Q) set is P_Q, and the ranking position of point Q in the Γ(P) set is Q_P. Then, the relation degree between point P and point Q is:

Rel(P, Q) = (P_Q + Q_P) / K (7)

The smaller the value of Rel, the higher the relation degree between the sample points. Assuming that point P has been assigned the corresponding cluster label, and point Q is one point to be assigned, we need to judge whether Q should be assigned the same cluster label as P by calculating the relation degree between point P and point Q.
If 0 < Rel < 0.5, the relation degree between point P and point Q is very high. We think these two points are very similar and, therefore, assign point Q the same cluster label as point P. If 0.5 < Rel < 1, the relation degree between point P and point Q is relatively high. We think these two points are similar, but we cannot assign a cluster label to point Q for the time being. If Rel > 1, the relation degree between point P and point Q is so low that point P cannot determine the cluster label of point Q. Figure 4 shows the different correlations between point P and point Q. Suppose K = 9, point P has been assigned a cluster label, and point Q is waiting to be assigned a cluster label.
As shown in Figure 4a, point Q is one of the K-nearest neighbors of point P, and Q P = 2, and point P is also one of the K-nearest neighbors of point Q, and P Q = 2. According to Equation (7), the Rel value can be calculated to be 4/9, indicating that the similarity between these two points is very high, and, therefore, a subordinate label of point P is assigned to point Q. In Figure 4b, point Q is one of the K-nearest neighbors of point P, and Q P = 5, and point P is also one of the K-nearest neighbors of point Q, and P Q = 7. According to Equation (7), the Rel value can be calculated to be 12/9. This value is greater than 1, so we cannot assign the dependent label of point P to point Q. It can be seen from Figure 4c that point Q is one of the K-nearest neighbors of point P, and Q P = 4, and point P is also one of the K-nearest neighbors of point Q, and P Q = 4. The Rel value calculated using Equation (7) is 8/9, which indicates that the relation degree between these two points is in an ambiguous area, and we cannot assign a cluster label to point Q temporarily.
An example of the P and Q ranking calculation is as follows: Calculate the ranking of P and Q, as shown in Figure 4d, where d(P,P) < d(Q,P) < d(m3,P) < d(m4,P) < d(m5,P) < d(m6,P) < d(m7,P) < d(m8,P) < d(m9,P). In the neighborhood of K = 9 centered on P, it can be seen that point Q is in second place, which means that its ranking position is 2.
(a) High relation degree between points P and Q.
(b) Medium relation degree between points P and Q.
(c) Low relation degree between points P and Q.
(d) Ranking relationship between P and Q.
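A short executable check of Definition 3 (our own function names; Rel = (P_Q + Q_P)/K reproduces the worked values 4/9, 12/9, and 8/9 with K = 9, and ranks are 1-based with the center point itself at rank 1, as in the Figure 4d example):

```python
import math

def rank_position(points, center, other):
    # 1-based rank of `other` in the distance ordering around `center`;
    # the center itself sits at rank 1, so with d(P,P) < d(Q,P) < ...
    # as in Figure 4d, point Q gets ranking position 2.
    order = sorted(range(len(points)),
                   key=lambda j: math.dist(points[center], points[j]))
    return order.index(other) + 1

def relation_degree(points, p, q, k):
    # Rel(P, Q) = (P_Q + Q_P) / K, matching the worked values from Figure 4.
    return (rank_position(points, q, p) + rank_position(points, p, q)) / k

def allocation_decision(rel):
    # Thresholds from the text: assign, defer, or leave undecided.
    if rel < 0.5:
        return "assign now"       # very high relation: inherit P's label
    if rel < 1.0:
        return "defer"            # similar, but decide in a later pass
    return "cannot decide"        # P cannot determine Q's label

# Ten points on a line; P = index 0 and Q = index 1 rank 2nd for each other.
pts = [(0.0, 0.0), (0.2, 0.0)] + [(float(x), 0.0) for x in range(1, 9)]
rel = relation_degree(pts, 0, 1, k=9)   # (2 + 2) / 9 = 4/9
```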


Steps of DPC-RD-PAS
After introducing the above concepts, the steps of our DPC-RD-PAS algorithm are designed as follows: Input: the value of K. Output: the clustering results.
Step 1: Pre-process and normalize the dataset.
Step 2: Calculate the local density of each sample point according to Equation (6) and the relative distance according to Equations (3) and (4).
Step 3: Select the cluster centers according to the decision diagram.
Step 4: Allocate the K-nearest neighbor points around the cluster centers to their corresponding clusters.
Step 5: Find the nearest neighbors among the unassigned points of all assigned points according to Definition 2 and calculate the relation degree between the assigned points and the unassigned points according to Definition 3.
Step 6: Assign all the unassigned points with the value of a relation degree between 0 and 0.5 to the cluster where the corresponding assigned point is located; update the sets of assigned points and unassigned points and recalculate the relation degree.
Step 7: If there are still unassigned points with a value of relation degree between 0 and 0.5, go to Step 6.
Step 8: Assign all the unassigned points with a value of relation degree between 0.5 and 1 to the cluster where the corresponding assigned point is located; update the sets of assigned points and unassigned points and recalculate the relation degree.
Step 9: If there are still unassigned points with a value of relation degree between 0.5 and 1, go to Step 8.
Step 10: If there are unassigned sample points, they will be allocated to the cluster where the nearest allocated sample points with a higher density are located, and the clustering process is complete.
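Steps 5-10 amount to two progressive passes (high-relation points first, then medium-relation points), each repeated until no more points qualify. A compact sketch with our own names, where `rel(p, q)` is the relation degree of Definition 3 and `neighbors(p)` yields candidate points near p:

```python
def progressive_allocation(labels, rel, neighbors):
    # labels: dict point -> cluster id for points already assigned in
    # Steps 3-4 (centers and their K-nearest neighbours); points absent
    # from the dict are unassigned.
    for lo, hi in [(0.0, 0.5), (0.5, 1.0)]:   # Steps 6-7, then Steps 8-9
        changed = True
        while changed:                         # repeat until no point qualifies
            changed = False
            for p in list(labels):             # snapshot: the frontier may grow
                for q in neighbors(p):
                    if q not in labels and lo < rel(p, q) <= hi:
                        labels[q] = labels[p]  # inherit p's cluster label
                        changed = True
    return labels  # Step 10 (nearest denser assigned point) is omitted here

# Toy run: points 0..5 on a line, centers 0 and 5 pre-assigned; labels
# spread outward pass by pass, so no single early mistake can cascade.
out = progressive_allocation({0: "a", 5: "b"},
                             rel=lambda p, q: abs(p - q) * 0.3,
                             neighbors=lambda p: [i for i in range(6) if i != p])
```

Re-running the inner loop until stability is what distinguishes this from DPC's single density-ordered pass: each newly assigned point enlarges the frontier before any lower-confidence assignment is attempted.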

Experimental Preparation
In order to verify the effectiveness of our DPC-RD-PAS algorithm, comparative experiments with the DPC, DPC-KNN, K-Means [23], DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [24], and DPCSA (DPC based on weighted local density Sequence and nearest neighbor Assignment) [25] algorithms were carried out. The experimental datasets include classic synthetic datasets and UCI datasets. The details of these datasets are listed in Tables 1 and 2. In order to quantify the quality of the clustering results, we selected three evaluation indicators to measure the accuracy of the clustering results, namely the AMI (Adjusted Mutual Information), the ARI (Adjusted Rand Index), and the FMI (Fowlkes-Mallows Index). The maximum value of each of these three indicators is 1, and the closer the values are to 1, the better the clustering results.
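As a concrete example of how such pair-counting indices work, the Fowlkes-Mallows index can be computed directly from two labelings as TP/sqrt((TP+FP)(TP+FN)) over all point pairs (a from-scratch sketch; in practice, library implementations such as scikit-learn's `fowlkes_mallows_score`, `adjusted_rand_score`, and `adjusted_mutual_info_score` are used):

```python
from itertools import combinations

def fowlkes_mallows(true_labels, pred_labels):
    # Count, over all point pairs, agreements between the two partitions:
    # TP = same cluster in both, FP = same in pred only, FN = same in true only.
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    denom = ((tp + fp) * (tp + fn)) ** 0.5
    return tp / denom if denom else 0.0

print(fowlkes_mallows([0, 0, 1, 1], [1, 1, 0, 0]))  # prints 1.0
```

Because only co-membership of pairs matters, the index is invariant to permuting cluster ids, which is essential when comparing a clustering against ground-truth classes.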
In order to ensure that the experimental results were more accurate and objective, in our experiments, we optimized the parameters of all the algorithms and referred to the optimal parameters provided by the SNN-DPC algorithm.

Results on Synthetic Datasets
In order to more clearly illustrate the clustering performance of our DPC-RD-PAS algorithm on the uneven density datasets, we graphically displayed the results on datasets Jain, Pathbased, and Spiral, as shown in Figures 5-7, respectively. In these three figures, different colors represent different clusters, the black pentagram represents the center of one cluster, and the grey "×" represents the unallocated sample points.
As shown in Figure 5, the Jain dataset is composed of two crescent moons, of which the sample points in the lower half are evenly distributed and the density is relatively high, so it is easy to concentrate the cluster centers in the lower half when calculating the local density according to DPC. The DPC-KNN algorithm attempts to solve this problem; although it modifies the measurement method of local density, the problem is not completely solved (as shown in Figure 5c), and the selection of cluster centers is still unsatisfactory. In the results of the DBSCAN algorithm, it can be clearly seen from Figure 5e that the upper part is wrongly divided into two clusters, and some sample points at the right corner are treated as noise points, so the clustering results are not particularly satisfactory. Because the K-Means algorithm has some disadvantages for non-spherical datasets, it is still not successful on the Jain dataset. Our DPC-RD-PAS algorithm adopts the concept of relative density, which is different from the concept of the K-nearest neighbor proposed by the DPC-KNN algorithm. It can not only consider the K-nearest neighbors of each sample point and shrink the calculation range from the global scale to the nearest neighbor points but also consider the K-nearest neighbors near the K-nearest neighbor sample points. This strategy enlarges the role of the surrounding points and can better find the cluster centers for uneven sample sets. The experimental results of our DPC-RD-PAS algorithm on the uneven dataset Jain confirm the correctness of this design idea.
The dataset, Pathbased, is composed of three classes, as shown in Figure 6. There are two dense classes in the middle, and the sparse ring sample points around them form the third class. On this dataset, the DPC, DPC-KNN, and DPCSA algorithms can all find the correct cluster centers, but there are joint errors in the allocation of the remaining sample points. Both DPC and DPC-KNN allocate the remaining points in order of density and assign each to the cluster of its nearest higher-density neighbor, which leads to allocation errors. The DBSCAN algorithm (as shown in Figure 6e) mistakenly treats the surrounding ring sample points as noise (the grey "×" sample points in Figure 6e). The K-Means algorithm still fails to allocate the Pathbased dataset correctly. The DPC-RD-PAS algorithm proposed in this paper improves the allocation strategy of the remaining sample points and achieves the optimal clustering results on this dataset.
As shown in Figure 7, the Spiral dataset is composed of three spirals. On this dataset, except for the K-Means algorithm, the other algorithms can all obtain the correct cluster centers. Our DPC-RD-PAS algorithm uses the relative density method to calculate the cluster centers, and it can not only find the correct cluster centers on this dataset but also correctly allocate the remaining points, as the other density-based algorithms do. This group of results shows that this algorithm can not only achieve good clustering performance on uneven datasets but also obtain satisfactory clustering results on some spiral datasets, such as Spiral.

As shown in Figure 8, the Flame dataset is composed of two types of clusters. It can be seen from this figure that each algorithm can obtain correct clustering results except for K-Means. The clustering performance of the DPC-RD-PAS algorithm is slightly inferior to that of DPC. The main disadvantage lies at the adjacent position of the two clusters.

Figure 9 illustrates the results on the Aggregation dataset, which consists of seven clusters. The clustering results of the DPC-RD-PAS algorithm are also inferior to those of DPC.
As with the Flame dataset, the errors are mainly concentrated at the junctions between clusters, which causes some sample points at the boundary of the orange region to be incorrectly allocated to the blue region. In the progressive allocation strategy of the DPC-RD-PAS algorithm, the unallocated points with high similarity are allocated first, then those with medium similarity, and finally those with low similarity; the allocated set is updated after each round until all points are allocated. On some specific datasets, such as Aggregation, this may perform poorly at cluster junctions, but the progressive allocation strategy better avoids the domino effect that DPC suffers when allocating the remaining points.
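The near-to-far idea can be sketched as a greedy loop. This is a hedged sketch: the inverse-distance similarity below is a hypothetical stand-in, not the paper's similarity definition. At each step, the unallocated point most similar to the allocated set receives the label of its most similar allocated neighbor, and the allocated set is updated immediately, so later decisions always rest on the most reliable assignments made so far.

```python
def progressive_allocate(points, labels):
    """points: list of (x, y); labels: dict {index: label} for the points
    already allocated around the cluster centers. Repeatedly allocate the
    unallocated point most similar to the allocated set, refreshing the
    allocated set after every assignment (near-to-far order)."""
    def sim(i, j):
        # Hypothetical similarity: inverse of squared Euclidean distance.
        d2 = (points[i][0] - points[j][0]) ** 2 + (points[i][1] - points[j][1]) ** 2
        return 1.0 / (1.0 + d2)

    labels = dict(labels)
    unallocated = set(range(len(points))) - set(labels)
    while unallocated:
        # Pick the (unallocated, allocated) pair with the highest similarity.
        i, j = max(((i, j) for i in unallocated for j in labels),
                   key=lambda p: sim(*p))
        labels[i] = labels[j]     # assign, then the allocated set grows
        unallocated.remove(i)
    return labels
```

In contrast to DPC's single pass, a point far from every center is only labeled after the nearer, more trustworthy points have already joined the allocated set.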
As shown in Figure 10, each algorithm can obtain ideal clustering results on the R15 dataset.
The specific clustering results on each dataset are shown in Table 3. The data in Table 3 not only includes the AMI, ARI, and FMI index values of the clustering results, but also gives the corresponding optimal parameters of each algorithm (the column represented by Arg-). The optimal values in the tables of this paper are shown in bold. It can be seen from this table that on the three datasets, Jain, Pathbased, and Spiral, with an uneven density distribution, the AMI, ARI, and FMI index values of our DPC-RD-PAS algorithm are the best. On the R15 and Aggregation datasets, the AMI, ARI, and FMI index values of the DPC-RD-PAS algorithm are close to the optimal DPC and DPC-KNN algorithms. The performance of the DPC-RD-PAS algorithm on the Flame dataset is relatively inferior, which is closely related to the data distribution characteristics of this dataset itself.

Figure 10. Clustering results on the R15 dataset.
Table 4 lists the clustering results of each algorithm on the six UCI datasets. On the Iris dataset, the indexes of our DPC-RD-PAS algorithm are slightly lower than those of the DPC-KNN and DPCSA algorithms; the declines in AMI, ARI, and FMI are 2.7%, 2.0%, and 1.3%, respectively. On the Seeds dataset, DPC-RD-PAS has the best clustering results: compared with DPC, DPC-KNN, DPCSA, DBSCAN, and K-Means, the AMI index increased by 4.25%, 4.25%, 14.82%, 30.16%, and 10.23%, respectively. On the WDBC, Libras, Wine, and Ecoli datasets, DPC-RD-PAS achieved relatively good clustering results. Especially on the WDBC dataset, the values of AMI, ARI, and FMI are 11.1%, 19.9%, and 1.3% higher, respectively, than those of the K-Means algorithm, which performed the second best on this dataset. On the Wine dataset, our DPC-RD-PAS algorithm also performed well, and its AMI, ARI, and FMI indexes improved by 7.93%, 14.47%, and 8.14%, respectively, compared with the DPC algorithm. In addition, it also achieved the best clustering results among the compared algorithms on the relatively high-dimensional Libras dataset.
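For reference, the pair-counting indices ARI and FMI used in Tables 3 and 4 can be computed directly from the four pair counts over all sample pairs; AMI requires the full mutual-information machinery and is omitted here. This is a generic sketch of the standard formulas, not tied to the experimental code.

```python
import math
from itertools import combinations

def pair_counts(true_labels, pred_labels):
    """Count sample pairs that fall in the same/different cluster
    under the true and predicted labelings."""
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_t = true_labels[i] == true_labels[j]
        same_p = pred_labels[i] == pred_labels[j]
        if same_t and same_p:
            n11 += 1          # same cluster in both
        elif same_t:
            n10 += 1          # same in truth, split in prediction
        elif same_p:
            n01 += 1          # split in truth, merged in prediction
        else:
            n00 += 1          # different in both
    return n11, n10, n01, n00

def ari(true_labels, pred_labels):
    a, b, c, d = pair_counts(true_labels, pred_labels)
    # Adjusted Rand Index via the pair-count identity.
    return 2 * (a * d - b * c) / ((a + b) * (b + d) + (a + c) * (c + d))

def fmi(true_labels, pred_labels):
    a, b, c, _ = pair_counts(true_labels, pred_labels)
    # Fowlkes-Mallows: geometric mean of pairwise precision and recall.
    return a / math.sqrt((a + b) * (a + c))
```

Both indices are invariant to relabeling the clusters, which is why a permuted but otherwise perfect clustering still scores 1.0.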

Running Time
The time complexity of the DPC algorithm is composed of three parts: calculating the distance matrix between samples, calculating the local density of each sample, and calculating the relative distance of each sample. Each part is O(n²), so the total time complexity is O(n²). The time complexity of the DPC-RD-PAS algorithm consists of the following five parts: (1) calculating the distance matrix between samples, O(n²); (2) calculating the relative local density of each sample, O(n²); (3) calculating the relative distance of each sample, O(n²); (4) the first allocation step, which assigns the k neighboring points around each cluster center, O(n); and (5) the second allocation step, which calculates the similarity of the unallocated points. Assuming the number of remaining unallocated points is m, with m < n, this step is O(m²) < O(n²), so the total time complexity of the DPC-RD-PAS algorithm is O(n²). However, repeatedly searching for candidates and checking whether the merging condition is met when calculating the similarity between unallocated and allocated points takes a relatively long time, which leads to a higher running time on some datasets.
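As a concrete illustration of the dominant O(n²) step, the distance matrix and a KNN-style local density can be sketched as follows. The Gaussian-kernel form here is a common KNN density definition, not the paper's exact relative-density formula.

```python
import math

def knn_local_density(points, k):
    """Compute a KNN-based local density for each point.
    The full pairwise distance matrix is the O(n^2) bottleneck shared
    by DPC and DPC-RD-PAS."""
    n = len(points)
    # O(n^2) distance matrix.
    dist = [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]
    density = []
    for i in range(n):
        # k nearest neighbors of point i, excluding the point itself.
        knn = sorted(dist[i])[1:k + 1]
        # Gaussian-style density: large when the k neighbors are close.
        density.append(math.exp(-sum(d * d for d in knn) / k))
    return density
```

Points inside a tight cluster get a density near 1, while an isolated point's density collapses toward 0, which is what lets the peaks stand out on the decision graph.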
We ran the experiments on a computer with a 1.4 GHz quad-core Intel i5 CPU and 8.0 GB of RAM. The DPC, DPC-KNN, and DPC-RD-PAS algorithms ran under Python 3.9, and the other algorithms ran under MATLAB 2018. To reduce the impact of the running environment, the algorithms run under MATLAB 2018 were excluded from the time comparison. In addition, to reduce fluctuations during program execution, each algorithm was run ten times on each dataset with the best parameters given in Tables 3 and 4; the running times shown in Table 5 are the averages of these runs. It can be seen that the multi-step allocation strategy of the DPC-RD-PAS algorithm consumes more time than the one-step allocation strategy of the DPC algorithm. Although both algorithms have O(n²) time complexity, their actual time consumption on real datasets differs: the DPC-RD-PAS algorithm is slower than the original DPC algorithm, but its running time is not as high as expected.

Conclusions
In order to improve the clustering performance of the DPC algorithm on datasets with an uneven density, we propose a density peak clustering algorithm based on relative density under a progressive allocation strategy, named DPC-RD-PAS. This algorithm inherits the advantages of the DPC algorithm and can quickly find the density peak points. At the same time, drawing on the idea of K-nearest neighbors, the concept of the relative K-nearest-neighbor local density is introduced to improve the calculation of the local density and the ability to select cluster centers on datasets with a non-uniform density. After obtaining the correct cluster centers, a progressive allocation strategy is designed to avoid cascading errors in the allocation of the remaining points. To evaluate the clustering performance of DPC-RD-PAS, comparative experiments were carried out on six artificial datasets and six real datasets. The experimental results show that DPC-RD-PAS achieves satisfactory clustering results on datasets with an uneven density distribution. How to automatically determine the optimal parameter k of the algorithm will be the focus of future work.