Improving Utility of Private Join Size Estimation via Shuffling

Liu, Xin; Mao, Yibin; Zhang, Meifan; Li, Mohan

doi:10.3390/math13213468

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.



Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China


Mathematics 2025, 13(21), 3468; https://doi.org/10.3390/math13213468

Submission received: 26 September 2025 / Revised: 25 October 2025 / Accepted: 28 October 2025 / Published: 30 October 2025

(This article belongs to the Topic Recent Advances in Security, Privacy, and Trust)


Abstract

Join size estimation plays a crucial role in query optimization, correlation computing, and dataset discovery. A recent study, LDPJoinSketch, has explored the application of local differential privacy (LDP) to protect the privacy of two data sources when estimating their join size. However, the utility of LDPJoinSketch remains unsatisfactory due to the significant noise introduced by perturbation under LDP. In contrast, the shuffle model of differential privacy (SDP) can offer higher utility than LDP, because its privacy guarantee comes from both shuffling and perturbation, so less perturbation noise is needed for the same level of privacy. Nevertheless, existing research on SDP primarily focuses on basic statistical tasks, such as frequency estimation and binary summation, and few studies address queries that involve join aggregation over two private data sources. In this paper, we investigate the problem of private join size estimation in the shuffle model. First, drawing inspiration from the success of sketches in summarizing data under LDP, we propose SDPJoinSketch, a sketch-based join size estimation algorithm under SDP that achieves greater utility than LDPJoinSketch. We present theoretical proofs of the privacy amplification and utility of our method. Second, to reduce the hash-collision error of the sketch, we separate high- and low-frequency items and propose an enhanced method, SDPJoinSketch+. Unlike LDPJoinSketch, which perturbs these frequency properties, we use secure encryption techniques to preserve them, further enhancing utility. Extensive experiments on both real-world and synthetic datasets validate the superior utility of our methods.
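As background for the sketch-based approach described in the abstract, the following is a minimal, non-private Python sketch of a Fast-AGMS-style (count-sketch) join size estimator. It is not SDPJoinSketch or SDPJoinSketch+. It omits all perturbation, shuffling, and encryption. The class name JoinSketch, the helper exact_join_size, the modular hash functions, and the parameters depth, width, and seed are illustrative assumptions. Both data sources build sketches that share the same hash and sign functions. The inner product of corresponding rows estimates the join size, and the median over rows damps hash-collision error.

```python
from __future__ import annotations

import random
import statistics
from collections import Counter

PRIME = (1 << 61) - 1  # Mersenne prime used for simple modular hashing (assumption)


class JoinSketch:
    """Hypothetical Fast-AGMS-style sketch of one join column (no privacy).

    Both parties must use the same depth, width, and seed so that they
    share bucket and sign hash functions.
    """

    def __init__(self, depth: int, width: int, seed: int) -> None:
        self.depth, self.width, self.seed = depth, width, seed
        rng = random.Random(seed)
        # Per-row coefficients: (a, b) for the bucket hash, (c, d) for the sign hash.
        self.coeffs = [
            tuple(rng.randrange(1, PRIME) for _ in range(4)) for _ in range(depth)
        ]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, row: int, item: int) -> tuple[int, int]:
        a, b, c, d = self.coeffs[row]
        bucket = ((a * item + b) % PRIME) % self.width
        sign = 1 if ((c * item + d) % PRIME) & 1 else -1
        return bucket, sign

    def update(self, item: int) -> None:
        # Add the item's +1/-1 sign to one bucket in every row.
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            self.table[row][bucket] += sign

    def estimate_join(self, other: JoinSketch) -> float:
        if (self.depth, self.width, self.seed) != (other.depth, other.width, other.seed):
            raise ValueError("sketches must share depth, width and seed")
        # Per row: an item's own terms contribute f_A(v) * f_B(v) (sign squared is 1);
        # cross terms from colliding items have random signs and cancel on average.
        row_estimates = [
            sum(x * y for x, y in zip(self.table[r], other.table[r]))
            for r in range(self.depth)
        ]
        # The median over rows suppresses rows spoiled by heavy-item collisions.
        return statistics.median(row_estimates)


def exact_join_size(a, b) -> int:
    """Exact equi-join size: sum over shared values v of count_A(v) * count_B(v)."""
    ca, cb = Counter(a), Counter(b)
    return sum(ca[v] * cb[v] for v in ca.keys() & cb.keys())


if __name__ == "__main__":
    a, b = [1, 1, 2, 3], [1, 2, 2, 4]
    sa, sb = JoinSketch(5, 1024, seed=7), JoinSketch(5, 1024, seed=7)
    for v in a:
        sa.update(v)
    for v in b:
        sb.update(v)
    print("exact:", exact_join_size(a, b))  # exact join size is 4
    print("sketch estimate:", sa.estimate_join(sb))
```

In this kind of estimator, collisions between high-frequency items dominate the error. That is the motivation the abstract gives for separating high- and low-frequency items in SDPJoinSketch+. The private variants additionally randomize each user's contribution before it reaches the sketch, which this illustration does not attempt.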

Keywords: differential privacy; shuffle model; join query; sketch

Share and Cite

MDPI and ACS Style

Liu, X.; Mao, Y.; Zhang, M.; Li, M. Improving Utility of Private Join Size Estimation via Shuffling. Mathematics 2025, 13, 3468. https://doi.org/10.3390/math13213468

AMA Style

Liu X, Mao Y, Zhang M, Li M. Improving Utility of Private Join Size Estimation via Shuffling. Mathematics. 2025; 13(21):3468. https://doi.org/10.3390/math13213468

Chicago/Turabian Style

Liu, Xin, Yibin Mao, Meifan Zhang, and Mohan Li. 2025. "Improving Utility of Private Join Size Estimation via Shuffling." Mathematics 13, no. 21: 3468. https://doi.org/10.3390/math13213468

APA Style

Liu, X., Mao, Y., Zhang, M., & Li, M. (2025). Improving Utility of Private Join Size Estimation via Shuffling. Mathematics, 13(21), 3468. https://doi.org/10.3390/math13213468

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
