K-means, a commonly used clustering method, has been widely applied to smart meter data analysis. However, it must recompute the similarity between every data point and every cluster center in each iteration, which leads to high computational overhead. Moreover, analyzing electricity consumption data with K-means can leak users' private information, and current differential privacy techniques allocate a uniform privacy budget across the data, which reduces data availability. To reduce the computational overhead of smart meter data analysis and improve data availability while protecting data privacy, this paper proposes an adaptive differential privacy-based CK-means clustering scheme, named DPCK. Firstly, we propose a CK-means method that improves on K-means: by calculating the adjacent cluster center set and stability region for each cluster, it both reduces the number of computations between data points and centers and avoids repeated computation, thus effectively reducing the computational overhead of data analysis. Secondly, we design an adaptive differential privacy mechanism that adds Laplace noise using a separately calculated privacy budget for each cluster, which improves data availability while protecting data privacy. Finally, theoretical analysis demonstrates that DPCK provides differential privacy protection. Experimental results show that, compared to baseline methods, DPCK effectively reduces the computational overhead of data analysis and improves data availability by 11.3% while protecting user privacy.
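To make the idea of a per-cluster privacy budget concrete, the following is a minimal Python sketch and not the DPCK mechanism itself. The abstract does not state how each cluster's budget is calculated, so the allocation rule here (a share of the total budget inversely proportional to cluster size) is purely an assumption, as are the function names `allocate_budgets` and `noisy_centroids` and the assumption that the data are scaled to [0, 1] in every dimension.

```python
import numpy as np

def allocate_budgets(sizes, total_epsilon):
    """Hypothetical rule (not from the paper): split total_epsilon across
    clusters so that smaller clusters receive a larger share."""
    sizes = np.asarray(sizes, dtype=float)
    weights = 1.0 / sizes
    return total_epsilon * weights / weights.sum()

def noisy_centroids(points, labels, k, total_epsilon, rng):
    """Add Laplace noise to each cluster's coordinate sum, using that
    cluster's own budget, then divide by the cluster size.

    Assumes every cluster is non-empty and points lie in [0, 1]^dim.
    The count is left unperturbed for brevity, so this illustrates the
    noise step only and is not a complete differentially private release.
    """
    dim = points.shape[1]
    sizes = np.array([np.sum(labels == j) for j in range(k)])
    budgets = allocate_budgets(sizes, total_epsilon)
    centroids = np.empty((k, dim))
    for j in range(k):
        members = points[labels == j]
        # With data in [0, 1]^dim, one record changes the sum by at most
        # dim in L1 norm, so the Laplace scale is dim / epsilon_j.
        noise = rng.laplace(loc=0.0, scale=dim / budgets[j], size=dim)
        centroids[j] = (members.sum(axis=0) + noise) / sizes[j]
    return centroids, budgets

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.random((30, 2))               # synthetic stand-in for meter data
    labels = np.array([0] * 20 + [1] * 10)     # two clusters of unequal size
    centroids, budgets = noisy_centroids(points, labels, 2, 1.0, rng)
    print(budgets)
    print(centroids)
```

A larger budget means a smaller Laplace scale, so under this assumed rule the smaller cluster's sum is perturbed less than it would be under a uniform split of the same total budget.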