Improving K-Means Clustering: A Comparative Study of Parallelized Version of Modified K-Means Algorithm for Clustering of Satellite Images
Abstract
1. Introduction
1.1. Overview of Satellite Image
1.2. Related Work
1.2.1. Overview of Clustering Algorithm
- Linear scalability with data volume;
- Support for asynchronous parallel execution across independent data slices, enabling non-blocking processing of extremely large datasets (e.g., multiple 1 TB segments) by distributing computational tasks concurrently across multi-core or distributed systems (see the sketch after this list);
- Flexibility to integrate algorithmic improvements such as advanced centroid initialization, iteration reduction, and outlier handling, as explored in this study.
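To make the second requirement concrete, the sketch below distributes independent data slices across worker processes with Python's `concurrent.futures`; the `process_slice` placeholder, slice sizes, and worker count are illustrative assumptions, not part of the study's implementation.

```python
# Minimal sketch of asynchronous, slice-parallel processing (assumptions noted above).
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def process_slice(data_slice: np.ndarray) -> np.ndarray:
    """Placeholder for per-slice work such as a distance/assignment pass."""
    return data_slice.sum(axis=-1)  # stand-in computation only

def process_in_parallel(slices, max_workers=8):
    # submit() schedules each slice without blocking, so large collections of
    # slices can stream through a fixed pool of workers.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(process_slice, s) for s in slices]
        return [f.result() for f in futures]

if __name__ == "__main__":
    cubes = np.random.rand(4, 256, 256, 16)  # four small stand-in "data cubes"
    results = process_in_parallel(list(cubes))
```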
1.2.2. Advancement in K-Means Clustering
1.2.3. Parallelization Techniques in K-Means Clustering
1.2.4. Motivation and Contribution of This Study
2. Materials
2.1. SDSU Derived Data Product
2.2. Hardware and Software Specifications
3. Methodology
3.1. Satellite Imagery Dataset
3.2. Algorithm Development
- Centroid Initialization Method;
- Nearest-Neighbor Iteration Calculation Reduction Method;
- Dynamic K-Means Sharp Method.
3.2.1. Parallelized Standard K-Means Algorithm
Algorithm 1: Pseudocode for the standard K-Means algorithm

Input: I = {I1, I2, I3, …, Id} // set of d data cubes (114 data cubes of size 3712 × 3712 × 16); K // number of desired clusters

Output: A set of K distinct clusters

Steps:
1. Randomly select K initial cluster centers (centroids) from the 114 data cubes, where each centroid is represented by the matrix μmn as shown in Equation (1).
2. While max(|μmn,old − μmn,new|) ≥ 0.0005 do:
3. For (i = 1 : number_of_images) // for all images
   For (x = 1 : number_of_rows)
   For (y = 1 : number_of_columns) // for all pixels
4. Calculate the Euclidean distance between Pxyn and each centroid μmn using Equation (2).
5. Assign Pxyn to the cluster with the nearest centroid (min(D(x, y, m))).
6. Update each centroid μmn. For each cluster m:
7. Calculate the new centroid using Equation (3).
8. Check for convergence:
9. Calculate the maximum change in centroids:
10. DiffMean = max(|μmn,old − μmn,new|)
11. If DiffMean < 0.0005: convergence achieved, exit loop.
12. Else: update the centroids (μmn = μmn,new)
13. and continue iterating.
End While
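As a concrete reference point, here is a minimal NumPy sketch of Algorithm 1 on a single (rows × columns × bands) cube, using the 0.0005 convergence tolerance from the pseudocode. It is didactic only: a full 3712 × 3712 × 16 cube with k = 160 would require a blocked or parallel distance computation rather than the single broadcast used here.

```python
# Sketch of Algorithm 1 on one small data cube; names and sizes are illustrative.
import numpy as np

def kmeans_cube(cube: np.ndarray, k: int, tol: float = 0.0005, max_iter: int = 500):
    pixels = cube.reshape(-1, cube.shape[-1])                 # (n_pixels, n_bands)
    rng = np.random.default_rng(0)
    centroids = pixels[rng.choice(len(pixels), k, replace=False)]  # step 1
    for _ in range(max_iter):
        # steps 3-5: nearest centroid for every pixel (Euclidean distance)
        dists = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # steps 6-7: recompute each centroid as the mean of its cluster
        new_centroids = np.array([
            pixels[labels == m].mean(axis=0) if np.any(labels == m) else centroids[m]
            for m in range(k)
        ])
        # steps 8-13: stop when the largest centroid change falls below tol
        diff = np.max(np.abs(new_centroids - centroids))
        centroids = new_centroids
        if diff < tol:
            break
    return labels.reshape(cube.shape[:2]), centroids
```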
Algorithm 2: Pseudocode for the parallelized standard K-Means algorithm

Input: I = {I1, I2, I3, …, Id} // set of d data cubes (114 data cubes of size 3712 × 3712 × 16); K // number of desired clusters

Output: A set of K distinct clusters
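The parallel structure of Algorithm 2 can be sketched as a map-reduce: each worker assigns the pixels of one data cube to the current centroids and returns per-cluster partial sums and counts, which the main process reduces into one global centroid update. The function names and pool size below are assumptions, not the authors' code.

```python
# Map-reduce sketch of a parallel K-Means update across data cubes (glue code assumed).
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def partial_stats(cube: np.ndarray, centroids: np.ndarray):
    """Map step: per-cluster pixel sums and counts for one cube."""
    pixels = cube.reshape(-1, cube.shape[-1])
    labels = np.linalg.norm(
        pixels[:, None, :] - centroids[None, :, :], axis=2
    ).argmin(axis=1)
    k, bands = centroids.shape
    sums, counts = np.zeros((k, bands)), np.zeros(k)
    np.add.at(sums, labels, pixels)     # unbuffered per-cluster accumulation
    np.add.at(counts, labels, 1)
    return sums, counts

def parallel_centroid_update(cubes, centroids, workers=8):
    """Reduce step: combine per-cube statistics into new global centroids."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_stats, cubes, [centroids] * len(cubes)))
    sums = np.sum([p[0] for p in parts], axis=0)
    counts = np.sum([p[1] for p in parts], axis=0)
    new_centroids = centroids.copy()
    nonempty = counts > 0
    new_centroids[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new_centroids
```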
3.2.2. Parallelized Centroid Initialization Method
Algorithm 3: Pseudocode for class initialization in the parallel centroid initialization method

Input: I = {I1, I2, I3, …, In} // set of n data cubes (114 data cubes of size 3712 × 3712 × 16); K // number of desired clusters

Output: A set of K distinct clusters

Steps (initialization): sampled_pixel_values ← 100 × 16 matrix of zeros (one row per pixel of a 10 × 10 spatial sampling grid, one column per band)
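One plausible reading of the 100 × 16 sample matrix above is that pixels are drawn on a regular 10 × 10 spatial grid (100 samples of 16 bands each) and initial centroids are then chosen from those samples. The greedy farthest-point selection below is an assumption used for illustration, not necessarily the authors' rule, and it requires k ≤ 100 per cube (the study would pool samples across the 114 cubes to reach k = 160).

```python
# Sketch of grid-sampled centroid initialization (selection rule is assumed).
import numpy as np

def init_centroids_from_grid(cube: np.ndarray, k: int) -> np.ndarray:
    rows, cols, bands = cube.shape
    ys = np.linspace(0, rows - 1, 10, dtype=int)        # 10 sample rows
    xs = np.linspace(0, cols - 1, 10, dtype=int)        # 10 sample columns
    samples = cube[np.ix_(ys, xs)].reshape(-1, bands)   # (100, bands)
    chosen = [0]                                        # greedy farthest-point picks
    for _ in range(k - 1):
        d_to_chosen = np.min(
            np.linalg.norm(samples[:, None, :] - samples[chosen][None, :, :], axis=2),
            axis=1,
        )
        chosen.append(int(d_to_chosen.argmax()))
    return samples[chosen]
```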
Algorithm 4: Pseudocode for the main classification
3.2.3. Parallelized Nearest-Neighbor Iteration Calculation Reduction Method
Algorithm 5: Pseudocode for the parallelized nearest-neighbor iteration calculation reduction method

Input: I = {I1, I2, I3, …, Id} // set of d data cubes (114 data cubes of size 3712 × 3712 × 16); K // number of desired clusters

Output: A set of K distinct clusters
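A common way to realize such an iteration-calculation reduction, and plausibly what the method's name refers to, is to cache each pixel's distance to its assigned centroid and repeat the full search over all K centroids only for pixels that have moved away from their centroid (a heuristic in the spirit of Shi et al.'s improved k-means). Whether this matches the authors' exact rule is an assumption.

```python
# Sketch of a distance-caching assignment pass (skip rule is an assumed heuristic).
import numpy as np

def reduced_assignment(pixels, centroids, labels, cached_dist):
    # Distance of every pixel to its currently assigned centroid.
    d_assigned = np.linalg.norm(pixels - centroids[labels], axis=1)
    moved_away = d_assigned > cached_dist        # only these need a full search
    if np.any(moved_away):
        d_all = np.linalg.norm(
            pixels[moved_away][:, None, :] - centroids[None, :, :], axis=2
        )
        labels[moved_away] = d_all.argmin(axis=1)
        d_assigned[moved_away] = d_all.min(axis=1)
    return labels, d_assigned

# On the first iteration, seed cached_dist with -inf so every pixel gets a full search.
```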
3.2.4. Parallelized Dynamic K-Means Sharp Method
- A. Outlier Detection
- B. New Centroid Update (both steps are sketched after Algorithm 6 below)
Algorithm 6: Pseudocode for the parallelized dynamic K-Means Sharp method

Input: I = {I1, I2, I3, …, Id} // set of d data cubes (114 data cubes of size 3712 × 3712 × 16); K // number of desired clusters

Output: A set of K distinct clusters
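Following the two steps named above, here is a minimal sketch of an outlier-robust centroid update in the style of K-means-sharp (Olukanmi and Twala): (A) flag pixels far from their centroid as outliers, then (B) update each centroid from inliers only. The mean-plus-3σ cutoff stands in for whatever dynamic threshold the authors actually use.

```python
# Sketch of (A) outlier detection and (B) inlier-only centroid update (threshold assumed).
import numpy as np

def sharp_update(pixels, labels, centroids):
    d = np.linalg.norm(pixels - centroids[labels], axis=1)
    new_centroids = centroids.copy()
    for m in range(len(centroids)):
        in_m = labels == m
        if not np.any(in_m):
            continue
        cutoff = d[in_m].mean() + 3.0 * d[in_m].std()   # (A) flag far pixels as outliers
        inliers = in_m & (d <= cutoff)
        if np.any(inliers):
            new_centroids[m] = pixels[inliers].mean(axis=0)  # (B) robust update
    return new_centroids
```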
3.2.5. Parallelized Centroid Initialization and Dynamic K-Means Sharp Method
Algorithm 7: Centroid initialization and dynamic K-Means Sharp method

Input: I = {I1, I2, I3, …, Id} // set of d data cubes (114 data cubes of size 3712 × 3712 × 16); K // number of desired clusters

Output: A set of K distinct clusters

Steps:
Phase 1: Identify the initial cluster centroids using Algorithm 3.
Phase 2: Update the centroids using Algorithm 6.
3.2.6. Parallelized Dynamic K-Means Sharp and Nearest-Neighbor Iteration Calculation Reduction Method
Algorithm 8: Dynamic K-Means Sharp and nearest-neighbor iteration calculation reduction method

Input: I = {I1, I2, I3, …, Id} // set of d data cubes (114 data cubes of size 3712 × 3712 × 16); K // number of desired clusters

Output: A set of K distinct clusters

Steps:
Phase 1: Assign each data point to the appropriate cluster using Algorithm 5.
Phase 2: Update the centroids using Algorithm 6.
3.2.7. Parallelized Centroid Initialization and Nearest-Neighbor Iteration Calculation Reduction Method
Algorithm 9: Centroid initialization and nearest-neighbor iteration calculation reduction method

Input: I = {I1, I2, I3, …, Id} // set of d data cubes (114 data cubes of size 3712 × 3712 × 16); K // number of desired clusters

Output: A set of K distinct clusters

Steps:
Phase 1: Identify the initial cluster centroids using Algorithm 3.
Phase 2: Assign each data point to the appropriate cluster using Algorithm 5.
3.2.8. Parallel Enhanced K-Means Method (PEKM)
Algorithm 10: Centroid initialization, dynamic K-Means Sharp, and nearest-neighbor iteration calculation reduction method

Input: I = {I1, I2, I3, …, Id} // set of d data cubes (114 data cubes of size 3712 × 3712 × 16); K // number of desired clusters

Output: A set of K distinct clusters

Steps:
Phase 1: Determine the initial cluster centroids using Algorithm 3.
Phase 2: Assign each data point to the appropriate cluster using Algorithm 5.
Phase 3: Update the centroids using Algorithm 6.
(A composition sketch follows.)
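To show how the three phases fit together, the sketch below chains the illustrative functions from the earlier sketches (`init_centroids_from_grid`, `reduced_assignment`, `sharp_update`); the glue code is an assumption, not the authors' implementation.

```python
# Composition sketch of PEKM's three phases (reuses the earlier sketch functions).
import numpy as np

def pekm(cube, k, tol=0.0005, max_iter=500):
    pixels = cube.reshape(-1, cube.shape[-1])
    centroids = init_centroids_from_grid(cube, k)          # Phase 1 (Algorithm 3)
    labels = np.zeros(len(pixels), dtype=int)
    cached = np.full(len(pixels), -np.inf)                 # forces a full first pass
    for _ in range(max_iter):
        labels, cached = reduced_assignment(pixels, centroids, labels, cached)  # Phase 2 (Algorithm 5)
        new_centroids = sharp_update(pixels, labels, centroids)                 # Phase 3 (Algorithm 6)
        if np.max(np.abs(new_centroids - centroids)) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return labels.reshape(cube.shape[:2]), centroids
```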
3.3. Convergence Criteria
- Reaching a predetermined maximum number of iterations;
- Fewer pixel reassignments per iteration than a set threshold;
- Centroid shifts falling below a specified distance threshold during an update cycle (0.0005 in this study); the three rules are combined in the sketch after this list.
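A minimal sketch of the three stopping rules combined into one check; apart from the study's 0.0005 centroid-shift tolerance, the thresholds are illustrative.

```python
# Sketch combining the three convergence criteria (illustrative thresholds).
import numpy as np

def converged(iteration, n_reassigned, old_centroids, new_centroids,
              max_iter=500, reassign_threshold=0, shift_tol=0.0005):
    if iteration >= max_iter:                      # rule 1: iteration budget reached
        return True
    if n_reassigned <= reassign_threshold:         # rule 2: few pixel reassignments
        return True
    shift = np.max(np.abs(new_centroids - old_centroids))
    return shift < shift_tol                       # rule 3: small centroid shift
```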
3.4. Performance Metrics
3.4.1. Convergence Speed
3.4.2. Clustering Quality
- Compute the Root Mean Square Error (RMSE) of each cluster (a reconstruction of the formula follows the definitions below);
- Compute the Average Root Mean Square Error (ARMSE) across all clusters;
- Vij: The jth valid pixel in the ith cluster.
- Cik: The centroid value for the kth band of the ith cluster.
- Ni: The number of valid pixels in the ith cluster.
- C: Total number of clusters.
- B: Total number of bands.
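From the definitions above, a plausible form of the two quantities (assuming V_{ijk} denotes the kth band value of the jth valid pixel in cluster i) is:

```latex
\mathrm{RMSE}_i = \sqrt{\frac{1}{N_i\,B}\sum_{j=1}^{N_i}\sum_{k=1}^{B}\bigl(V_{ijk}-C_{ik}\bigr)^{2}},
\qquad
\overline{\mathrm{RMSE}} = \frac{1}{C}\sum_{i=1}^{C}\mathrm{RMSE}_i
```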
3.4.3. Computational Efficiency
3.5. Comparative Analysis
4. Evaluation and Experimental Results
4.1. Convergence Speed Comparison
4.2. Computational Efficiency Comparison
4.3. Clustering Quality Evaluation
4.4. Image Cluster Output
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
References
| Parameter | Value |
| --- | --- |
| Number of data cubes | 114 |
| Data cube size | 3712 × 3712 × 16 (~1 GB) |
| Number of clusters (k) | 160 |
| Convergence criterion (max centroid shift) | 0.0005 |
| Method | Iterations Until Convergence |
| --- | --- |
| PSKM + CI | 348 |
| PSKM + KS | 282 |
| PSKM + NN | 326 |
| PSKM + CI + KS | 257 |
| PSKM + CI + NN | 292 |
| PSKM + KS + NN | 288 |
| PEKM | 234 |
| Method | Hours Until Convergence |
| --- | --- |
| PSKM + CI | 9965.47 |
| PSKM + KS | 6341.58 |
| PSKM + NN | 5860.90 |
| PSKM + CI + KS | 5237.19 |
| PSKM + CI + NN | 5130.24 |
| PSKM + KS + NN | 5243.95 |
| PEKM | 4230.43 |
| Method | RMSE |
| --- | --- |
| PSKM + CI | 0.01417 |
| PSKM + KS | 0.01332 |
| PSKM + NN | 0.01399 |
| PSKM + CI + KS | 0.01375 |
| PSKM + CI + NN | 0.01412 |
| PSKM + KS + NN | 0.01333 |
| PEKM | 0.01364 |