A Differential Privacy Framework with Adjustable Efficiency–Utility Trade-Offs for Data Collection
Abstract
1. Introduction
- We introduce the differentially private fractional coverage model (DPFCM), which is designed for applications that can operate effectively with partial data collection. DPFCM specifies two fraction parameters, which are determined by the specific purpose of the application. DPFCM aims to collect at least a specified fraction of the total data elements, with the requirement that, for each collected element, data are also collected from at least a specified fraction of the users. This guarantees that both the breadth of data elements and the depth of user representation are maintained, supporting robust data utility even when only partial data are collected.
- We propose two probability-based approaches (a binomial model-based approach and a Chernoff bound-based approach) that determine the minimum number of data elements the server should collect from each user to satisfy the requirements of DPFCM. These approaches establish lower bounds on data collection, ensuring that the utility requirements of the model are satisfied while optimizing for efficiency.
- Finally, we validate the proposed framework through experiments on real-world datasets. The results show that DPFCM maintains high data utility and computational efficiency while reducing data collection requirements, confirming its practical value in real-world applications.
2. Related Work
3. Preliminary
4. Problem Definition and Baseline Approaches
4.1. Problem Definition
4.2. Baseline Approaches
Algorithm 1 Baseline Approach Based on LDP (Each Participating User Processing)
Algorithm 2 Baseline Approach Based on DDP (Each Participating User Processing)
5. Proposed Approach
- Impracticality of Applying DDP to All Data. The DDP-based baseline approach in Algorithm 2 is highly inefficient because it requires cryptographic techniques to be applied to all m data elements across n users. The complexity of these techniques scales with both the number of users and the size of the dataset [17,20], so in real-world scenarios where m (the number of data elements) is extremely large, the computational overhead becomes prohibitive, especially when n is also large.
- Non-Zero Data Variability. An alternative solution is to apply DDP only to data elements with non-zero values, similar to the LDP-based approach in Algorithm 1. In this approach, each user i perturbs its data by adding independent noise calibrated to h, where h represents the total number of users contributing to the aggregation for a specific data element. However, this solution is not feasible for sparse datasets, as each user typically has a different set of non-zero data elements. Since h depends on the number of users contributing non-zero values for a given data element, the server cannot determine h for each element without knowing the users’ individual non-zero data. Consequently, the noise variance cannot be accurately calibrated, and (ε, δ)-DP is not satisfied globally.
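The calibration problem in the second item can be illustrated with a short sketch. It assumes, purely for illustration, that each contributing user adds zero-mean Gaussian noise with variance sigma2 / h so that the aggregated noise has variance sigma2. The Gaussian choice and the names `sigma2`, `h_assumed`, and `h_actual` are hypothetical and are not taken from the paper.

```python
import random
import statistics


def aggregate_noise_variance(sigma2: float, h_assumed: int, h_actual: int) -> float:
    """Variance of the summed noise when each of h_actual users adds
    independent noise with variance sigma2 / h_assumed (assumed scheme)."""
    per_user_variance = sigma2 / h_assumed
    return h_actual * per_user_variance


def simulate_aggregate_noise(sigma2: float, h_assumed: int, h_actual: int,
                             trials: int = 5000, seed: int = 0) -> float:
    """Empirical variance of the aggregated Gaussian noise over many trials."""
    rng = random.Random(seed)
    per_user_std = (sigma2 / h_assumed) ** 0.5
    sums = [
        sum(rng.gauss(0.0, per_user_std) for _ in range(h_actual))
        for _ in range(trials)
    ]
    return statistics.pvariance(sums)


if __name__ == "__main__":
    # Users calibrate for 100 contributors, but only 25 hold a non-zero value.
    print(aggregate_noise_variance(4.0, h_assumed=100, h_actual=25))
    print(simulate_aggregate_noise(4.0, h_assumed=100, h_actual=25))
```

Under these assumptions, if fewer users contribute than the calibration assumed, the aggregated noise variance falls below the target, which is one way the global guarantee can fail when h is unknown.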
5.1. Definition of DPFCM
5.2. Overview of the DPFCM Framework
- Computing minimum user contributions: The data collection server calculates the minimum number of data elements that each user must contribute to satisfy the requirements of the DPFCM framework, and then distributes this number to all users (Section 5.3).
- Contributing data using DDP: Each user employs a DDP-based mechanism to report at least this minimum number of data elements to the server (Section 5.4).
- Secure aggregation: The server aggregates encrypted contributions for each data element, verifies that the threshold is satisfied, and securely decrypts the noisy values for qualifying elements (Section 5.5).
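As a rough illustration of the first step, the sketch below searches for the smallest per-user contribution under a simple binomial model. It assumes that each of n users selects k of the m data elements uniformly at random, so that the number of users X covering a given element follows Binomial(n, k/m), and it accepts the smallest k for which P(X <= ceil(beta * n) - 1) <= 1 - alpha. The selection model, the exact form of the condition, and the names `alpha`, `beta`, `k`, `n`, and `m` are assumptions made for this example, not the paper's definitions.

```python
from math import ceil, comb


def binomial_cdf(x: int, n: int, p: float) -> float:
    """Return P(X <= x) for X ~ Binomial(n, p)."""
    if x < 0:
        return 0.0
    upper = min(x, n)
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(upper + 1))


def min_contribution_binomial(n: int, m: int, alpha: float, beta: float) -> int:
    """Smallest k in 1..m such that an element is covered by at least
    ceil(beta * n) users with probability >= alpha (assumed condition)."""
    # Small epsilon guards against floating-point overshoot in beta * n.
    threshold = ceil(beta * n - 1e-9)
    for k in range(1, m + 1):
        p = k / m  # chance that one user selects a given element
        if binomial_cdf(threshold - 1, n, p) <= 1 - alpha:
            return k
    return m  # every user contributes all elements


if __name__ == "__main__":
    print(min_contribution_binomial(n=10, m=4, alpha=0.9, beta=0.3))
    print(min_contribution_binomial(n=10, m=4, alpha=0.99, beta=0.3))
```

A linear scan over k is used here for clarity; since the cumulative probability decreases as k grows, a binary search would work equally well for large m.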
5.3. Computing Minimum User Contributions for DPFCM
5.3.1. Binomial Model-Based Approach
5.3.2. Chernoff Bound-Based Approach
5.3.3. Algorithm for Computing Minimum User Contribution
Algorithm 3 Pseudocode for Computing the Minimum User Contribution
5.4. Contributing Data Using DDP
Algorithm 4 Pseudocode for the User-Side Processing of DPFCM
5.5. Secure Aggregation of User Contributions
Algorithm 5 Pseudocode for the Server-Side Processing of DPFCM
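5.6. Analysis of the Effect of the Two Fraction Parameters on the Minimum User Contribution
5.6.1. Effect of the Data-Element Fraction on the Minimum User Contribution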
- Binomial Model-Based Approach: Using the condition from Equation (9), an increase in the data-element fraction decreases the bound in the inequality, tightening it. To satisfy this tighter condition, the cumulative probability must decrease. This requires an increase in the minimum user contribution, as a larger contribution raises the probability that more users will select each data element, thus shifting the probability mass of the binomial distribution towards higher values of X.
- Chernoff Bound-Based Approach: From Equation (17), an increase in the data-element fraction results in a larger right-hand side. To maintain the inequality, the minimum user contribution must be increased so that the left-hand side remains greater than or equal to the right-hand side. Specifically, a larger contribution compensates by increasing both the quadratic term and the overall product on the left-hand side.
5.6.2. Effect of the User Fraction on the Minimum User Contribution
- Binomial Model-Based Approach: From Equation (9), the inequality involves a summation whose upper limit depends on the user fraction. Increasing the user fraction raises this upper limit, requiring the binomial probability mass to shift toward higher values of X. To achieve this, the minimum user contribution must increase, as a higher contribution ensures that more users contribute to each data element, thereby meeting the increased threshold.
- Chernoff Bound-Based Approach: From Equation (17), an increase in the user fraction raises a term that reduces the factor on the left-hand side. To restore balance, the minimum user contribution must be increased so that the left-hand side satisfies the inequality.
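The direction of both effects can be checked numerically with a sketch based on the standard multiplicative Chernoff lower-tail bound, P(X <= (1 - d) * mu) <= exp(-mu * d^2 / 2). The example assumes X ~ Binomial(n, k/m) with mu = n * k / m and d = 1 - beta * m / k, and it requires the bound to be at most 1 - alpha. This is one plausible form of such a condition; it is not claimed to match Equation (17), and the names `alpha`, `beta`, `k`, `n`, and `m` are hypothetical.

```python
from math import exp
from typing import Optional


def chernoff_lower_tail_bound(n: int, p: float, beta: float) -> float:
    """Upper bound on P(X <= beta * n) for X ~ Binomial(n, p),
    using exp(-mu * d**2 / 2) with mu = n * p and d = 1 - beta / p."""
    if p <= beta:
        return 1.0  # the bound is uninformative unless the mean exceeds beta * n
    mu = n * p
    d = 1 - beta / p
    return exp(-mu * d * d / 2)


def min_contribution_chernoff(n: int, m: int, alpha: float,
                              beta: float) -> Optional[int]:
    """Smallest k in 1..m whose Chernoff bound is at most 1 - alpha,
    or None if no k satisfies the (conservative) bound."""
    for k in range(1, m + 1):
        if chernoff_lower_tail_bound(n, k / m, beta) <= 1 - alpha:
            return k
    return None


if __name__ == "__main__":
    base = min_contribution_chernoff(n=1000, m=100, alpha=0.9, beta=0.3)
    higher_alpha = min_contribution_chernoff(n=1000, m=100, alpha=0.99, beta=0.3)
    higher_beta = min_contribution_chernoff(n=1000, m=100, alpha=0.9, beta=0.4)
    print(base, higher_alpha, higher_beta)
```

In this sketch, raising either parameter pushes the required k upward, which matches the qualitative behavior described in Sections 5.6.1 and 5.6.2. Because the Chernoff bound is conservative, the resulting k is generally no smaller than the value an exact binomial computation would give.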
5.6.3. Discussion on Selecting Appropriate Values for the Two Fraction Parameters
6. Experiments
6.1. Experimental Setup
6.2. Evaluation of Computation Methods in the DPFCM Framework
Scalability and Computational Considerations
6.3. Evaluation Results on Data Utility
7. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Rong, C.; Ding, J.; Li, Y. An interdisciplinary survey on origin-destination flows modeling: Theory and techniques. ACM Comput. Surv. 2024, 57, 1–49.
2. Behara, K.N.S.; Bhaskar, A.; Chung, E. A DBSCAN-based framework to mine travel patterns from origin-destination matrices: Proof-of-concept on proxy static OD from Brisbane. Transp. Res. Part C Emerg. Technol. 2021, 131, 103370.
3. Jia, J.S.; Lu, X.; Yuan, Y.; Xu, G.; Jia, J.; Christakis, N.A. Population flow drives spatio-temporal distribution of COVID-19 in China. Nature 2020, 582, 389–394.
4. Chen, R.; Li, L.; Ma, Y.; Gong, Y.; Guo, Y.; Ohtsuki, T.; Pan, M. Constructing mobile crowdsourced COVID-19 vulnerability map with geo-indistinguishability. IEEE Internet Things J. 2022, 9, 17403–17416.
5. Yu, Z.; Ma, H.; Guo, B.; Yang, Z. Crowdsensing 2.0. Commun. ACM 2021, 64, 76–80.
6. Kim, J.W.; Lim, J.H.; Moon, S.M.; Jang, B. Collecting health lifelog data from smartwatch users in a privacy-preserving manner. IEEE Trans. Consum. Electron. 2019, 65, 369–378.
7. Saura, J.R.; Ribeiro-Soriano, D.; Palacios-Marques, D. From user-generated data to data-driven innovation: A research agenda to understand user privacy in digital markets. Int. J. Inf. Manag. 2021, 60, 102331.
8. Jiang, H.; Li, J.; Zhao, P.; Zeng, F.; Xiao, Z.; Iyengar, A. Location privacy-preserving mechanisms in location-based services: A comprehensive survey. ACM Comput. Surv. 2021, 54, 1–36.
9. Dwork, C. Differential privacy. In Proceedings of the International Colloquium on Automata, Languages, and Programming, Venice, Italy, 12–15 July 2006; pp. 1–12.
10. Erlingsson, U.; Pihur, V.; Korolova, A. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, AZ, USA, 3–7 November 2014; pp. 1054–1067.
11. Wang, T.; Blocki, J.; Li, N.; Jha, S. Locally differentially private protocols for frequency estimation. In Proceedings of the USENIX Conference on Security Symposium, Berkeley, CA, USA, 14–16 August 2017.
12. Goryczka, S.; Xiong, L. A comprehensive comparison of multiparty secure additions with differential privacy. IEEE Trans. Dependable Secur. Comput. 2015, 14, 463–477.
13. Wei, Y.; Jia, J.; Wu, Y.; Hu, C.; Dong, C.; Liu, Z.; Chen, X.; Peng, Y.; Wang, S. Distributed differential privacy via shuffling versus aggregation: A curious study. IEEE Trans. Inf. Forensics Secur. 2024, 19, 2501–2516.
14. Bassily, R.; Smith, A. Local, private, efficient protocols for succinct histograms. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, Portland, OR, USA, 14–17 June 2015.
15. Geng, Q.; Kairouz, P.; Oh, S.; Viswanath, P. The staircase mechanism in differential privacy. IEEE J. Sel. Top. Signal Process. 2015, 9, 1176–1184.
16. Lyu, L.; Nandakumar, K.; Rubinstein, B.; Jin, J.; Bedo, J.; Palaniswami, M. PPFA: Privacy preserving fog-enabled aggregation in smart grid. IEEE Trans. Ind. Inform. 2018, 14, 3733–3744.
17. Xie, Q.; Jiang, S.; Jiang, L.; Huang, Y.; Zhao, Z.; Khan, S. Efficiency optimization techniques in privacy-preserving federated learning with homomorphic encryption: A brief survey. IEEE Internet Things J. 2024, 11, 24569–24580.
18. Truex, S.; Baracaldo, N.; Anwar, A.; Steinke, T.; Ludwig, H.; Zhang, R.; Zhou, Y. A hybrid approach to privacy-preserving federated learning. In Proceedings of the ACM Workshop on Artificial Intelligence and Security, London, UK, 15 November 2019.
19. Kadhe, S.; Rajaraman, N.; Koyluoglu, O.O.; Ramchandran, K. FastSecAgg: Scalable secure aggregation for privacy-preserving federated learning. arXiv 2020, arXiv:2009.11248.
20. Bell, J.H.; Bonawitz, K.A.; Gascon, A.; Lepoint, T.; Raykova, M. Secure single-server aggregation with (poly)logarithmic overhead. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Virtual, 9–16 November 2020; pp. 1253–1269.
21. Balle, B.; Bell, J.; Gascon, A.; Nissim, K. The privacy blanket of the shuffle model. In Proceedings of the International Cryptology Conference, Santa Barbara, CA, USA, 12–18 August 2019; pp. 638–667.
22. Scott, M.; Cormode, G.; Maple, C. Aggregation and transformation of vector-valued messages in the shuffle model of differential privacy. IEEE Trans. Inf. Forensics Secur. 2022, 17, 612–627.
23. Chen, E.; Cao, Y.; Ge, Y. A generalized shuffle framework for privacy amplification: Strengthening privacy guarantees and enhancing utility. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26–27 February 2024; pp. 11267–11275.
24. Li, K.; Zhang, H.; Liue, Z. A range query scheme for spatial data with shuffled differential privacy. Mathematics 2024, 12, 1934.
25. Andres, M.E.; Bordenabe, N.E.; Chatzikokolakis, K.; Palamidessi, C. Geo-indistinguishability: Differential privacy for location-based systems. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Berlin, Germany, 4–8 November 2013; pp. 901–914.
26. Kim, J.W.; Edemacu, K.; Jang, B. Privacy-preserving mechanisms for location privacy in mobile crowdsensing: A survey. J. Netw. Comput. Appl. 2023, 200, 103315.
27. Zhao, Y.; Yuan, D.; Du, J.T.; Chen, J. Geo-Ellipse-Indistinguishability: Community-aware location privacy protection for directional distribution. IEEE Trans. Knowl. Data Eng. 2023, 35, 6957–6967.
28. Fathalizadeh, A.; Moghtadaiee, V.; Alishahi, M. Indoor geo-indistinguishability: Adopting differential privacy for indoor location data protection. IEEE Trans. Emerg. Top. Comput. 2023, 12, 293–306.
29. Jin, W.; Xiao, M.; Guo, L.; Yang, L.; Li, M. ULPT: A user-centric location privacy trading framework for mobile crowd sensing. IEEE Trans. Mob. Comput. 2022, 21, 3789–3806.
30. Huang, P.; Zhang, X.; Guo, L.; Li, M. Incentivizing crowdsensing-based noise monitoring with differentially-private locations. IEEE Trans. Mob. Comput. 2021, 20, 519–532.
31. Zhang, P.; Cheng, X.; Su, S.; Wang, N. Area coverage-based worker recruitment under geo-indistinguishability. Comput. Netw. 2022, 217, 109340.
32. Song, S.; Kim, J.W. Adapting geo-indistinguishability for privacy-preserving collection of medical microdata. Electronics 2023, 12, 2793.
33. Tian, H.; Zhang, F.; Shao, Y.; Li, B. Secure linear aggregation using decentralized threshold additive homomorphic encryption for federated learning. arXiv 2021, arXiv:2111.10753.
34. T-Drive Trajectory Data Sample. 2018. Available online: https://www.microsoft.com/en-us/research/publication/t-drive-trajectory-data-sample (accessed on 1 August 2024).
0.1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
0.2 | 0.010 | 0.026 | 0.028 | 0.075 | 0.025 | 0 | 0 | 0 | 0 | 0 |
0.3 | 0.015 | 0.035 | 0.030 | 0.085 | 0.080 | 0 | 0 | 0 | 0 | 0 |
0.4 | 0.035 | 0.075 | 0.086 | 0.095 | 0.105 | 0 | 0 | 0 | 0 | 0 |
0.5 | 0.055 | 0.075 | 0.090 | 0.100 | 0.120 | 0 | 0 | 0 | 0 | 0 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kim, J.; Cho, S.-H. A Differential Privacy Framework with Adjustable Efficiency–Utility Trade-Offs for Data Collection. Mathematics 2025, 13, 812. https://doi.org/10.3390/math13050812