An Improved Density-Based Spatial Clustering of Applications with Noise Algorithm with an Adaptive Parameter Based on the Sparrow Search Algorithm
Abstract
1. Introduction
- (1) The Sparrow Search Algorithm (SSA) is combined with DBSCAN to improve the efficiency of the latter.
- (2) The proposed algorithm is tested on five synthetic datasets, seven real-world datasets, and five real images, and performs strongly on low- and medium-dimensional real data.
- (3) The parameter selection process of the DBSCAN algorithm is improved.
- (4) The loss of flexibility in current adaptive DBSCAN methods is reduced, which helps to maintain clustering quality.
2. DBSCAN Algorithm and Sparrow Search Algorithm
2.1. DBSCAN Algorithm
2.2. Sparrow Search Algorithm
3. Improved DBSCAN Algorithm Based on Sparrow Search Algorithm
3.1. Basic Idea
3.2. Determine the Parameter Range and the Optimal Number of Clusters
3.2.1. Adaptive Calculation of Parameter Ranges
3.2.2. Determine the Optimal Number of Clusters
3.3. Fitness Function Selection
3.4. Iterative Process of Parameter Optimization
3.5. Algorithm Pseudocode and Flowchart
Algorithm 1: SSA-DBSCAN
Input: Sample set D, population size S, number of iterations T
1:  Calculate the distance matrix D* for dataset D;
2:  Calculate the median of each column of D*;
3:  Obtain the range of Eps values;
4:  Obtain the range of MinPts values;
5:  Calculate the local density and the sample point distance, and determine the number of clusters n from their relationship;
6:  Generate the list of sparrow parameters based on Eps and MinPts;
7:  while (t < T)
8:      Obtain the silhouette coefficient s(i) of each individual sparrow whose clustering yields n clusters;
9:      The solution that matches the number of clusters and has the largest silhouette value is the current optimal solution;
10:     Rank the fitness values and find the current best and worst individuals;
11:     R2 = rand(1);
12:     for i = 1 : PD
13:         Update the sparrow's location using Equation (2);
14:     end for
15:     for i = (PD + 1) : S
16:         Update the sparrow's location using Equation (3);
17:     end for
18:     for l = 1 : SD
19:         Update the sparrow's location using Equation (4);
20:     end for
21:     Obtain the current new locations;
22:     If a new location is better than the previous one, update it;
23:     t = t + 1;
24: end while
25: Initialize the set of core objects: Ω = ∅;
26: for j = 1, 2, …, m
27:     Determine the Eps-neighborhood N_Eps(x_j) of sample x_j;
28:     if |N_Eps(x_j)| ≥ MinPts
29:         Add sample x_j to the set of core objects: Ω = Ω ∪ {x_j};
30:     end if
31: end for
32: Initialize the number of clusters: k = 0;
33: Initialize the set of unvisited samples: Γ = D;
34: while Ω ≠ ∅
35:     Record the current set of unvisited samples: Γ_old = Γ;
36:     Randomly select a core object o ∈ Ω and initialize the queue Q = ⟨o⟩;
37:     Γ = Γ \ {o};
38:     while Q ≠ ∅
39:         Fetch the first sample q from the queue Q;
40:         if |N_Eps(q)| ≥ MinPts
41:             Let Δ = N_Eps(q) ∩ Γ;
42:             Add the samples in Δ to the queue Q;
43:             Γ = Γ \ Δ;
44:         end if
45:     end while
46:     k = k + 1; generate cluster C_k = Γ_old \ Γ;
47:     Ω = Ω \ C_k;
48: end while
Output: Cluster division C = {C_1, C_2, …, C_k}
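To make the parameter search in Algorithm 1 concrete, the following is a minimal Python sketch using scikit-learn's DBSCAN and silhouette_score. It is illustrative only: the sparrow position updates of Equations (2)-(4) are replaced by a plain random search over the (Eps, MinPts) ranges, and the function name ssa_dbscan_sketch and its arguments are assumptions rather than the authors' implementation.

```python
# Illustrative sketch of the Algorithm 1 search loop (not the authors' code).
# The SSA position updates of Equations (2)-(4) are simplified here to a
# random search over the adaptively determined (Eps, MinPts) ranges.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

def ssa_dbscan_sketch(X, eps_range, minpts_range, n_target,
                      pop_size=20, iters=30, seed=0):
    """Return the (Eps, MinPts) pair whose DBSCAN clustering yields
    n_target clusters with the highest silhouette coefficient."""
    X = np.asarray(X)
    rng = np.random.default_rng(seed)
    best_params, best_sil = None, -np.inf
    for _ in range(iters):
        # One generation of candidate parameter pairs (the "sparrows").
        eps_cand = rng.uniform(eps_range[0], eps_range[1], pop_size)
        min_cand = rng.integers(minpts_range[0], minpts_range[1] + 1, pop_size)
        for eps, minpts in zip(eps_cand, min_cand):
            labels = DBSCAN(eps=eps, min_samples=int(minpts)).fit_predict(X)
            mask = labels != -1                      # ignore noise points
            n_clusters = len(set(labels[mask]))
            # Fitness is only evaluated for solutions matching n_target.
            if n_clusters != n_target or n_clusters < 2 or mask.sum() <= n_clusters:
                continue
            sil = silhouette_score(X[mask], labels[mask])
            if sil > best_sil:
                best_sil, best_params = sil, (float(eps), int(minpts))
    return best_params, best_sil
```

For example, ssa_dbscan_sketch(X, (0.1, 2.0), (3, 20), n_target=3) would return the best parameter pair found and its silhouette value; the final clustering is then produced by running DBSCAN once more with those parameters, as in steps 25-48 of Algorithm 1.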
3.6. Algorithm Complexity Analysis
4. Experiments and Results Analysis
4.1. Experimental Environment and Comparison Algorithms
4.2. Experimental Datasets and Clustering Evaluation Indicators
4.3. Analysis of Experimental Results on Synthetic Datasets
4.4. UCI Dataset Experiments
4.5. Image Segmentation
5. Overview of Experimental Findings and Sensitivity Assessment
5.1. Overview of Experimental Findings
5.2. Sensitivity Assessment
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Data | Observations | Classes | Dimensions |
|---|---|---|---|
| Aggregation | 788 | 7 | 2 |
| Compound | 399 | 6 | 2 |
| R15 | 600 | 15 | 2 |
| Spiral | 312 | 3 | 2 |
| Flame | 240 | 2 | 2 |
| Iris | 150 | 3 | 4 |
| Wine | 178 | 3 | 13 |
| Sym | 350 | 3 | 2 |
| Seeds | 210 | 3 | 7 |
| Zoo | 101 | 7 | 16 |
| Bank | 1372 | 2 | 4 |
| Ecoli | 336 | 7 | 8 |
| Dataset | Algorithms | ACC | AMI | ARI |
|---|---|---|---|---|
| Iris | SSA-DBSCAN | 0.920 | 0.837 | 0.889 |
| | RNN-DBSCAN | 0.696 | 0.582 | 0.664 |
| | KANN-DBSCAN | 0.614 | 0.401 | 0.388 |
| | AF-DBSCAN | 0.867 | 0.674 | 0.687 |
| | DBSCAN | 0.648 | 0.536 | 0.515 |
| Wine | SSA-DBSCAN | 0.643 | 0.367 | 0.249 |
| | RNN-DBSCAN | 0.741 | 0.495 | 0.359 |
| | KANN-DBSCAN | 0.535 | 0.156 | 0.152 |
| | AF-DBSCAN | 0.609 | 0.301 | 0.396 |
| | DBSCAN | 0.672 | 0.344 | 0.414 |
| Sym | SSA-DBSCAN | 0.929 | 0.854 | 0.913 |
| | RNN-DBSCAN | 0.736 | 0.785 | 0.742 |
| | KANN-DBSCAN | 0.808 | 0.665 | 0.682 |
| | AF-DBSCAN | 0.803 | 0.654 | 0.720 |
| | DBSCAN | 0.755 | 0.427 | 0.414 |
| Seeds | SSA-DBSCAN | 0.574 | 0.426 | 0.326 |
| | RNN-DBSCAN | 0.535 | 0.512 | 0.534 |
| | KANN-DBSCAN | 0.171 | 0.253 | 0.207 |
| | AF-DBSCAN | 0.523 | 0.428 | 0.343 |
| | DBSCAN | 0.519 | 0.332 | 0.235 |
| Zoo | SSA-DBSCAN | 0.819 | 0.814 | 0.706 |
| | RNN-DBSCAN | 0.842 | 0.673 | 0.595 |
| | KANN-DBSCAN | 0.610 | 0.689 | 0.614 |
| | AF-DBSCAN | 0.536 | 0.574 | 0.502 |
| | DBSCAN | 0.614 | 0.741 | 0.651 |
| Bank | SSA-DBSCAN | 0.882 | 0.693 | 0.776 |
| | RNN-DBSCAN | 0.837 | 0.677 | 0.747 |
| | KANN-DBSCAN | 0.796 | 0.581 | 0.673 |
| | AF-DBSCAN | 0.864 | 0.681 | 0.752 |
| | DBSCAN | 0.758 | 0.547 | 0.645 |
| Ecoli | SSA-DBSCAN | 0.592 | 0.748 | 0.684 |
| | RNN-DBSCAN | 0.418 | 0.627 | 0.599 |
| | KANN-DBSCAN | 0.399 | 0.619 | 0.652 |
| | AF-DBSCAN | 0.542 | 0.520 | 0.587 |
| | DBSCAN | 0.426 | 0.565 | 0.515 |
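The table above reports ACC, AMI, and ARI. AMI and ARI are available directly in scikit-learn; the ACC column is assumed here to be clustering accuracy obtained by matching predicted clusters to ground-truth classes with the Hungarian algorithm (this excerpt does not spell out the exact definition). The helper below is a sketch under that assumption, with noise points (label -1) counted as errors.

```python
# Sketch of the evaluation metrics, assuming ACC = Hungarian-matched accuracy.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    """Map each predicted cluster to one true class via the Hungarian
    algorithm and report the fraction of correctly assigned samples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(y_true)
    clusters = np.unique(y_pred[y_pred != -1])       # noise is never matched
    if clusters.size == 0:
        return 0.0
    # Contingency table: rows = predicted clusters, columns = true classes.
    w = np.array([[np.sum((y_pred == c) & (y_true == k)) for k in classes]
                  for c in clusters])
    row, col = linear_sum_assignment(-w)             # maximize matched counts
    return w[row, col].sum() / len(y_true)

def evaluate(y_true, y_pred):
    """Return the three indicators used in the comparison tables."""
    return {
        "ACC": clustering_accuracy(y_true, y_pred),
        "AMI": adjusted_mutual_info_score(y_true, y_pred),
        "ARI": adjusted_rand_score(y_true, y_pred),
    }
```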
| Image | Clustering Algorithms | BDE | PRI | Running Time (s) | Memory Size (MB) |
|---|---|---|---|---|---|
| Image one | SSA-DBSCAN | 4.9377 | 0.8827 | 50.93 | 158.49 |
| | DBSCAN | 5.0937 | 0.8875 | 7.59 | 34.81 |
| | k-means | 15.7662 | 0.6063 | 1.41 | 16.47 |
| | DPC | 15.4732 | 0.6402 | 36.18 | 132.14 |
| Image two | SSA-DBSCAN | 6.4651 | 0.8981 | 56.14 | 173.82 |
| | DBSCAN | 6.5748 | 0.8974 | 8.09 | 56.73 |
| | k-means | 11.6513 | 0.8319 | 1.48 | 17.69 |
| | DPC | 10.0235 | 0.8436 | 39.76 | 139.56 |
| Image three | SSA-DBSCAN | 7.4530 | 0.8996 | 78.18 | 219.51 |
| | DBSCAN | 7.5634 | 0.8934 | 10.74 | 60.27 |
| | k-means | 11.3806 | 0.8439 | 1.65 | 21.09 |
| | DPC | 10.0519 | 0.8398 | 48.97 | 153.48 |
| Image four | SSA-DBSCAN | 9.7218 | 0.8390 | 86.31 | 231.67 |
| | DBSCAN | 10.5754 | 0.8387 | 12.73 | 69.58 |
| | k-means | 17.4136 | 0.7379 | 1.69 | 22.89 |
| | DPC | 15.8168 | 0.7630 | 51.94 | 175.23 |
| Image five | SSA-DBSCAN | 7.5544 | 0.8837 | 69.72 | 195.48 |
| | DBSCAN | 7.6185 | 0.8523 | 8.79 | 58.31 |
| | k-means | 11.5680 | 0.7436 | 1.56 | 19.32 |
| | DPC | 9.9451 | 0.7615 | 41.85 | 147.96 |