A Histogram Publishing Method under Differential Privacy That Involves Balancing Small-Bin Availability First
Abstract
1. Introduction
- A sanitization algorithm, SReB_GCA, is introduced that adopts relative error as its accuracy metric. This departs from many methods in the literature [15,19], which assess the histogram as a whole and optimize a cumulative mean squared error or absolute error. Under such aggregate metrics, an error of a given size counts the same whether it falls on a bin with a count of 2 or 2,000, so the significance of smaller bins may be inadvertently disregarded.
- The grouping rules under relative error are theoretically derived and analyzed. DP histogram publishing involves two types of error: the reconstruction error caused by grouping and the noise error caused by the injected Laplace noise. Analyzing their relative-error forms shows that sorting bins from small to large favors improving the utility of small bins first, a perspective not thoroughly explored in the previous literature. Additionally, a lower bound for the greedy grouping process of SReB_GCA is theoretically derived, which helps maximize the number of bins grouped and approximately optimizes the mean relative error of the histogram.
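To make the contrast with aggregate absolute error concrete, the following Python sketch applies the same absolute error of 2 to every bin of a small hypothetical histogram. The function names, the counts, and the sanity bound `delta` (which avoids division by zero on empty bins) are illustrative assumptions, not part of SReB_GCA.

```python
def mean_absolute_error(true_counts, noisy_counts):
    """Average |noisy - true| over all bins."""
    return sum(abs(n - t) for t, n in zip(true_counts, noisy_counts)) / len(true_counts)


def relative_errors(true_counts, noisy_counts, delta=1.0):
    """Per-bin |noisy - true| / max(true, delta); delta is an assumed sanity bound."""
    return [abs(n - t) / max(t, delta) for t, n in zip(true_counts, noisy_counts)]


def mean_relative_error(true_counts, noisy_counts, delta=1.0):
    """Average of the per-bin relative errors."""
    errors = relative_errors(true_counts, noisy_counts, delta)
    return sum(errors) / len(errors)


# Hypothetical histogram: two small bins and two large bins,
# each published with the same absolute error of 2.
true_counts = [2, 4, 400, 800]
noisy_counts = [4, 6, 402, 802]

print(mean_absolute_error(true_counts, noisy_counts))  # 2.0
print(relative_errors(true_counts, noisy_counts))      # [1.0, 0.5, 0.005, 0.0025]
print(mean_relative_error(true_counts, noisy_counts))
```

The mean absolute error treats all four bins identically, while the per-bin relative errors of the two small bins are orders of magnitude larger than those of the large bins.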
2. Related Work
3. Preliminaries
3.1. Differential Privacy
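For reference, the standard definition of ε-differential privacy is recalled below. The symbols (a randomized mechanism $\mathcal{M}$, neighboring datasets $D$ and $D'$ that differ in one record, and any set of outputs $S$) are notation chosen here and may not match the paper's own.

$$
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S]
\qquad \text{for all neighboring } D, D' \text{ and all } S \subseteq \mathrm{Range}(\mathcal{M}).
$$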
3.2. Laplace Mechanism
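The standard Laplace mechanism releases $f(D) + \mathrm{Lap}(\Delta f/\varepsilon)$, where $\Delta f$ is the L1 sensitivity of the query. For a histogram, adding or removing one record changes a single bin by 1, so $\Delta f = 1$ and each bin receives independent $\mathrm{Lap}(1/\varepsilon)$ noise. The minimal stdlib-only Python sketch below assumes hypothetical function names and samples Laplace noise as the difference of two Exp(1) draws; it is an illustration, not a hardened implementation (naive floating-point Laplace sampling is known to leak information in deployed systems).

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent Exp(1) variables is Laplace(0, 1),
    so scaling that difference gives Laplace(0, scale).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_histogram(counts, epsilon, sensitivity=1.0, seed=None):
    """Add independent Laplace(sensitivity / epsilon) noise to every bin.

    Adding or removing one record changes exactly one bin by 1, so the
    L1 sensitivity of a histogram is 1 (the default here).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]


# Hypothetical four-bin histogram released with epsilon = 0.5 (noise scale 2).
noisy = laplace_histogram([3, 0, 12, 7], epsilon=0.5, seed=42)
print([round(x, 2) for x in noisy])
```

Because the noise scale $1/\varepsilon$ is the same for every bin, a bin with count 3 and a bin with count 300 receive noise of the same magnitude, which is why the relative error of small bins is much larger.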
3.3. Composition Properties
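The two standard composition properties can be stated as follows, with $\mathcal{M}_i$ an $\varepsilon_i$-differentially private mechanism (notation assumed here).

$$
\text{Sequential: } \mathcal{M}_1, \dots, \mathcal{M}_k \text{ applied to the same } D
\;\Longrightarrow\; \Big(\textstyle\sum_{i=1}^{k} \varepsilon_i\Big)\text{-DP}
$$

$$
\text{Parallel: } \mathcal{M}_i \text{ applied to disjoint subsets } D_i \subseteq D
\;\Longrightarrow\; \big(\max_i \varepsilon_i\big)\text{-DP}
$$

In a grouping-based histogram method, the groups partition the bins and each record falls into exactly one bin, so adding $\mathrm{Lap}(1/\varepsilon)$ noise to every group total satisfies ε-DP by parallel composition; any budget spent on data-dependent grouping decisions typically adds to this by sequential composition.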
3.4. Relative Error
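One common form of relative error for histogram publishing, following the convention used in iReduct, divides the absolute error by the true count and uses a small sanity bound $\delta > 0$ so that bins with a count of 0 (the dataset table lists count ranges starting at 0) do not cause division by zero. The notation below ($H_i$ the true count of bin $i$, $\tilde{H}_i$ its published value, $n$ the number of bins) is assumed, and the paper's exact definition may differ.

$$
\mathrm{RE}_i = \frac{\lvert \tilde{H}_i - H_i \rvert}{\max(H_i, \delta)},
\qquad
\mathrm{MRE} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{RE}_i
$$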
4. Problem Statement and First-Cut Method
4.1. Problem Statement
4.2. First-Cut Method
5. Sanitization Algorithm
5.1. Grouping Rules
- (1) When , it has
- (2) When , it has

Therefore, is mainly composed of two parts: the part caused by grouping alone and the part caused by both grouping and the injected Laplace noise. It is easy to obtain the following:
- (1) When , decreases as increases;
- (2) When , two cases exist:
  - (i) if , then decreases as increases;
  - (ii) if , then increases as increases.
Two grouping rules follow from this analysis:
- Sorting the bins in ascending order of count favors improving the availability of small bins first.
- Bins with closer counts should be placed in the same group whenever possible to reduce the relative error.
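The two rules can be checked numerically under a simplified model, which is an assumption made here rather than the paper's exact formulation: Laplace noise of scale $1/\varepsilon$ is added to each group's total, and every bin is published as the noisy group mean, so a bin with count $c$ in group $G$ has error $(\bar{c}_G - c) + L$ with $L \sim \mathrm{Lap}(1/(\varepsilon\lvert G\rvert))$. For Laplace $L$ with scale $b$, $E\lvert a + L\rvert = \lvert a\rvert + b\,e^{-\lvert a\rvert/b}$, which splits the expected error into a grouping term $\lvert a\rvert$ and a term that depends on both grouping and noise, mirroring the two parts identified above. The function names, counts, and sanity bound `delta` are hypothetical.

```python
import math


def expected_relative_error(count, group, epsilon, delta=1.0):
    """Expected relative error of one bin published as its group's noisy mean.

    Assumed model: Laplace(1/epsilon) noise is added to the group total and
    each bin is published as noisy_total / len(group). The bin's error is
    a + L, where a = mean - count and L ~ Laplace(b) with
    b = 1 / (epsilon * len(group)); E|a + L| = |a| + b * exp(-|a| / b).
    """
    mean = sum(group) / len(group)
    a = abs(mean - count)                # reconstruction error from grouping
    b = 1.0 / (epsilon * len(group))     # noise scale shrinks as the group grows
    return (a + b * math.exp(-a / b)) / max(count, delta)


def mean_expected_relative_error(groups, epsilon, delta=1.0):
    """Average the expected relative error over every bin of every group."""
    errors = [expected_relative_error(c, g, epsilon, delta)
              for g in groups for c in g]
    return sum(errors) / len(errors)


# Hypothetical counts, sorted ascending as the first rule suggests.
counts = sorted([55, 4, 50, 6, 3, 60, 5, 52])
eps = 0.5
singletons = [[c] for c in counts]                   # no grouping at all
close_groups = [counts[0:4], counts[4:8]]            # neighbours with close counts
mixed_groups = [[3, 60], [4, 55], [5, 52], [6, 50]]  # small bins paired with large ones

for name, groups in [("singletons", singletons),
                     ("close", close_groups),
                     ("mixed", mixed_groups)]:
    print(name, round(mean_expected_relative_error(groups, eps), 3))
```

Under this model, grouping neighbours in sorted order gives a lower mean expected relative error than publishing each bin alone, because the shared noise is divided by the group size, while pairing small bins with large ones is far worse, because the reconstruction error $\lvert\bar{c}_G - c\rvert$ is divided by a small count.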
5.2. GGS
5.3. SReB_GCA
Algorithm 1 SReB_GCA
Algorithm 2 GGS
6. Experimental Evaluation
6.1. Experimental Settings
6.2. Experimental Results
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Torra, V. Data Privacy: Foundations, New Developments and the Big Data Challenge; Springer Press: Cham, Switzerland, 2017; pp. 1–21.
- Fung, B.C.M.; Wang, K.; Chen, R.; Yu, P.S. Privacy-preserving data publishing: A survey of recent developments. ACM Comput. Surv. 2010, 42, 1–53.
- Sweeney, L.A. K-anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzz. Knowl. Syst. 2002, 10, 557–570.
- Dwork, C. Differential privacy: A survey of results. In Proceedings of the International Conference on Theory and Applications of Models of Computation, Xi’an, China, 25–29 April 2008; pp. 1–19.
- Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference (TCC), New York, NY, USA, 4–7 March 2006; pp. 265–284.
- Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 2014, 9, 211–407.
- Huang, H.; Zhang, D.; Xiao, F.; Wang, K.; Gu, J.; Wang, R.; Wang, R. Privacy-preserving approach PBCN in social network with differential privacy. IEEE Trans. Netw. Serv. Man. 2020, 17, 931–945.
- Ou, L.; Qin, Z.; Liao, S.; Hong, Y.; Jia, X. Releasing correlated trajectories: Towards high utility and optimal differential privacy. IEEE Trans. Dep. Secur. Comput. 2020, 17, 1109–1123.
- Ying, C.; Jin, H.; Wang, X.; Luo, Y. Double insurance: Incentivized federated learning with differential privacy in mobile crowdsensing. In Proceedings of the 2020 International Symposium on Reliable Distributed Systems (SRDS), Shanghai, China, 21–24 September 2020; pp. 81–90.
- Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.B.; Mironov, I.; Talwar, K.; Zhang, L. Deep learning with differential privacy. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), Vienna, Austria, 24–28 October 2016; pp. 308–318.
- Bonomi, L.; Xiong, L. Mining frequent patterns with differential privacy. Proc. VLDB Endow. 2013, 6, 1422–1427.
- Barak, B.; Chaudhuri, K.; Dwork, C.; Kale, S.; McSherry, F.; Talwar, K. Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In Proceedings of the Symposium on Principles of Database Systems (PODS), Beijing, China, 11–13 June 2007; pp. 273–282.
- Acs, G.; Castelluccia, C.; Chen, R. Differentially private histogram publishing through lossy compression. In Proceedings of the International Conference on Data Mining (ICDM), Washington, DC, USA, 10–13 December 2012; pp. 1–10.
- Hay, M.; Rastogi, V.; Miklau, G.; Suciu, D. Boosting the accuracy of differentially private histograms through consistency. Proc. VLDB Endow. 2010, 3, 1021–1032.
- Kellaris, G.; Papadopoulos, S. Practical differential privacy via grouping and smoothing. Proc. VLDB Endow. 2013, 6, 301–312.
- Li, C.; Hay, M.; Miklau, G.; McGregor, A. Optimizing linear counting queries under differential privacy. In Proceedings of the Symposium on Principles of Database Systems (PODS), Indianapolis, IN, USA, 6–11 June 2010; pp. 123–134.
- Rastogi, V.; Nath, S. Differentially private aggregation of distributed time-series with transformation and encryption. In Proceedings of the International Conference on Management of Data (SIGMOD), Indianapolis, IN, USA, 6–10 June 2010; pp. 735–746.
- Xiao, X.; Wang, G.; Gehrke, J. Differential privacy via wavelet transform. In Proceedings of the International Conference on Data Engineering (ICDE), Long Beach, CA, USA, 1–6 March 2010; pp. 225–236.
- Xu, J.; Zhang, Z.; Xiao, X.; Yu, G. Differentially private histogram publication. In Proceedings of the International Conference on Data Engineering (ICDE), Arlington, VA, USA, 1–5 April 2012; pp. 32–43.
- Yuan, G.; Zhang, Z.; Winslett, M.; Xiao, X.; Yang, Y.; Hao, Z. Low-rank mechanism: Optimizing batch queries under differential privacy. Proc. VLDB Endow. 2012, 5, 1352–1363.
- Nelson, B.; Reuben, J. SoK: Chasing accuracy and privacy, and catching both in differentially private histogram publication. Trans. Data Priv. 2020, 13, 201–245.
- Zhang, X.; Chen, R.; Xu, J.; Meng, X.; Xie, Y. Towards accurate histogram publication under differential privacy. In Proceedings of the SIAM International Conference on Data Mining (SDM), Philadelphia, PA, USA, 24–26 April 2014; pp. 587–595.
- Ligett, K.; Neel, S.; Roth, A.; Waggoner, B.; Wu, Z.S. Accuracy first: Selecting a differential privacy level for accuracy-constrained ERM. In Proceedings of the International Conference on Neural Information Processing Systems (NIPS), Red Hook, NY, USA, 4–9 December 2017; pp. 2563–2573.
- Tao, T.; Li, S.; Huang, J.; Hou, S.; Gong, H. A Symmetry Histogram Publishing Method Based on Differential Privacy. Symmetry 2023, 15, 1099.
- Chen, Q.; Ni, Z.; Zhu, X.; Xia, P. Differential privacy histogram publishing method based on dynamic sliding window. Front. Comput. Sci. 2023, 17, 174809.
- Zou, Y.; Shan, C. Delay-tolerant privacy-preserving continuous histogram publishing method. In Proceedings of the 7th International Conference on Big Data and Computing, Shenzhen, China, 27–29 May 2022; pp. 88–95.
- Lei, H.; Li, S.; Wang, H. A weighted social network publishing method based on diffusion wavelets transform and differential privacy. Multimed. Tools Appl. 2022, 81, 20311–20328.
- Shoaran, M.; Thomo, A.; Weber, J. Differential privacy in practice. In Proceedings of the Workshop on Secure Data Management (SDM), Istanbul, Turkey, 27 August 2012; pp. 14–24.
- Xiao, X.; Bender, G.; Hay, M.; Gehrke, J. iReduct: Differential privacy with reduced relative errors. In Proceedings of the International Conference on Management of Data (SIGMOD), Athens, Greece, 12–16 June 2011; pp. 229–240.
- McSherry, F. Privacy integrated queries: An extensible platform for privacy-preserving data analysis. In Proceedings of the International Conference on Management of Data (SIGMOD), Providence, RI, USA, 29 June–2 July 2009; pp. 19–30.
- Liu, H.; Wu, Z.; Peng, C.; Tian, F.; Lu, H. Adaptive Gaussian mechanism based on expected data utility under conditional filtering noise. KSII Trans. Int. Inf. Syst. 2018, 12, 3497–3515.
- Chen, Y.; Xu, Z.; Chen, J.; Jia, S. B-DP: Dynamic collection and publishing of continuous check-in data with best-effort differential privacy. Entropy 2022, 24, 404.
- Bassily, R.; Smith, A. Local, private, efficient protocols for succinct histograms. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing (STOC), Portland, OR, USA, 14–17 June 2015; pp. 127–135.
- Ren, X.; Yu, C.M.; Yu, W.; Yang, S.; Yang, X.; McCann, J.A.; Yu, P.S. LoPub: High-dimensional crowdsourced data publication with local differential privacy. IEEE Trans. Inf. Forensics Secur. 2018, 13, 2151–2166.
- Li, H.; Xiong, L.; Jiang, X.; Liu, J. Differentially private histogram publication for dynamic datasets: An adaptive sampling approach. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), Melbourne, Australia, 19–23 October 2015; pp. 1001–1010.
- Ren, X.; Shi, L.; Yu, W.; Yang, S.; Zhao, C.; Xu, Z. LDP-IDS: Local Differential Privacy for Infinite Data Streams. In Proceedings of the 41st ACM SIGMOD International Conference on Management of Data (SIGMOD), Philadelphia, PA, USA, 12–17 June 2022; pp. 1064–1077.
Dataset | Number of Bins | Mean | Variance | Count Range |
---|---|---|---|---|
Waitakere | 7725 | 24.13 | 4764.57 | [0, 467] |
Search Log | 32,768 | 10.25 | 577.31 | [0, 496] |
NetTrace | 65,536 | 0.39 | 91.01 | [0, 1423] |
Social Network | 11,342 | 59.49 | 2995 | [1, 1678] |
Dataset | Method | | | |
---|---|---|---|---|
Waitakere | AHP | 0.018 | 0.203 | 0.467 |
Waitakere | SReB_GCA | 0.0004 | 0.056 | 0.171 |
Search Log | AHP | 0.054 | 0.103 | 0.189 |
Search Log | SReB_GCA | 0.0001 | 0.009 | 0.099 |
NetTrace | AHP | 0.153 | 0.572 | 1.229 |
NetTrace | SReB_GCA | 0.004 | 0.092 | 0.252 |
Social Network | AHP | 0.071 | 0.309 | 0.825 |
Social Network | SReB_GCA | 0.0001 | 0.003 | 0.099 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chen, J.; Zhou, S.; Qiu, J.; Xu, Y.; Zeng, B.; Fang, W.; Chen, X.; Huang, Y.; Xu, Z.; Chen, Y. A Histogram Publishing Method under Differential Privacy That Involves Balancing Small-Bin Availability First. Algorithms 2024, 17, 293. https://doi.org/10.3390/a17070293