Abstract
In edge computing scenarios, data generated by distributed devices is dispersed, heterogeneous, and privacy-sensitive. This poses three major challenges for federated clustering: high communication overhead, difficulty adapting to non-IID data, and substantial risk of privacy leakage. To address these issues, this paper proposes a privacy-enhanced federated k-means clustering algorithm based on locality-sensitive hashing (LSH). The algorithm mines latent knowledge from multi-source distributed data while protecting data privacy. Its core innovation is to exploit the distance sensitivity of clustering pairs under LSH. This mitigates the non-IID problem while preserving privacy and achieves global clustering in a single communication round, which makes the method practical in communication-constrained environments. Specifically, each client first evaluates the dispersion of its local data, sets its number of local clusters dynamically according to that dispersion, and obtains initial cluster centers with k-means. It then encrypts these centers with LSH and uploads only the encrypted cluster information and the corresponding cluster weights to the server, so privacy is protected without relying on a trusted server. On the server side, a second, weighted k-means clustering is performed in the encrypted space to produce hashed global centers. Experimental results on the MNIST and CIFAR-10 datasets show that the method maintains robust clustering performance under non-IID data distributions. Moreover, its strict single-round client-to-server communication protocol substantially reduces communication overhead. The result is an efficient, adaptable, and privacy-preserving distributed data mining solution for resource-constrained edge computing environments.
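The client/server pipeline summarized above can be illustrated with a minimal, dependency-free Python sketch. It is not the paper's implementation. The abstract does not specify the dispersion measure, the rule mapping dispersion to a cluster count, or the LSH family, so the following are all hypothetical choices made only for illustration: mean distance to the centroid (`dispersion`), a clamped linear rule (`choose_k`), p-stable Gaussian LSH with a seed shared among clients (`make_lsh`), and the parameters `k_min`, `k_max`, `scale`, `n_hashes`, and `width`. The sketch shows the single-round structure: each client sends hashed centers plus cluster sizes once, and the server runs weighted k-means over those codes.

```python
import math
import random

def dispersion(points):
    """Hypothetical dispersion measure: mean Euclidean distance to the local centroid."""
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    return sum(math.dist(p, mean) for p in points) / len(points)

def choose_k(points, k_min=2, k_max=8, scale=1.0):
    """Hypothetical rule: more dispersed local data gets more local clusters."""
    k = round(k_min + scale * dispersion(points))
    return max(k_min, min(k_max, k, len(points)))

def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Lloyd's k-means where each point carries a weight; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: weighted mean of each cluster (empty clusters keep their center).
        for c in range(k):
            members = [(p, w) for p, w, l in zip(points, weights, labels) if l == c]
            total = sum(w for _, w in members)
            if total > 0:
                centers[c] = [sum(w * p[j] for p, w in members) / total
                              for j in range(len(points[0]))]
    return centers, labels

def make_lsh(dim, n_hashes=16, width=4.0, seed=42):
    """p-stable (Gaussian) LSH: h_i(x) = floor((a_i . x + b_i) / width).
    Assumption: clients share this seed; the server only ever sees the codes."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_hashes)]
    offsets = [rng.uniform(0, width) for _ in range(n_hashes)]
    def encode(x):
        return [math.floor((sum(a * v for a, v in zip(plane, x)) + b) / width)
                for plane, b in zip(planes, offsets)]
    return encode

def client_update(points, encode):
    """One client: pick k from dispersion, run local k-means, return hashed centers + sizes."""
    k = choose_k(points)
    centers, labels = weighted_kmeans(points, [1.0] * len(points), k)
    sizes = [labels.count(c) for c in range(k)]
    # Only hashed centers and cluster sizes leave the device; raw points never do.
    return [(encode(c), w) for c, w in zip(centers, sizes) if w > 0]

def server_aggregate(uploads, k_global):
    """Single round: weighted k-means over every client's hashed centers."""
    codes = [list(map(float, code)) for client in uploads for code, _ in client]
    weights = [w for client in uploads for _, w in client]
    hashed_centers, _ = weighted_kmeans(codes, weights, k_global)
    return hashed_centers

if __name__ == "__main__":
    rng = random.Random(1)

    def blob(cx, n):
        return [[rng.gauss(cx, 0.5), rng.gauss(cx, 0.5)] for _ in range(n)]

    # Non-IID split: each client sees mostly one of two blobs.
    clients = [blob(0, 40) + blob(10, 5), blob(10, 40) + blob(0, 5), blob(0, 20)]
    encode = make_lsh(dim=2)
    uploads = [client_update(pts, encode) for pts in clients]
    global_centers = server_aggregate(uploads, k_global=2)
    print(len(global_centers), "hashed global centers")
```

Quantizing Gaussian projections makes nearby centers likely to receive nearby integer codes. This is the kind of distance sensitivity the server-side weighted k-means relies on. The resulting hashed global centers live in code space rather than the original feature space.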