Multidimensional Epidemiological Survey Data Aggregation Scheme Based on Personalized Local Differential Privacy

Abstract: In recent years, with the rapid development of intelligent technology, information security and privacy issues have become increasingly prominent. Epidemiological survey data (ESD) research plays a vital role in understanding the laws and trends of disease transmission. However, epidemiological investigations (EIs) involve a large amount of privacy-sensitive data, which could cause serious harm if leaked.


Introduction
In recent years, various infectious diseases, such as SARS, swine flu, Ebola, novel coronavirus epidemics, and influenza, have had a significant impact [1,2]. During these outbreaks, epidemiological investigations (EIs) have emerged as a crucial measure to curb their spread. EIs typically involve tracking patients, close contacts, and potential contacts and are conducted by health departments or other organizations. An EI identifies close contacts based on patients' basic information and behavior patterns; it then tracks and screens them and takes the necessary actions in settings such as hospitals, communities, and workplaces [3]. Accordingly, ensuring a timely and accurate EI is vital, with a focus on safeguarding personal privacy and information security.
In analyzing infectious disease transmission, the data used require stronger privacy protection than general health data. These data include the identities, health statuses, and close contact details of epidemiological survey objects (ESOs), which are highly sensitive to data owners. Emergencies intensify the urgency and volume of epidemic investigations, which are often recorded in physical form, thereby heightening information exposure risks. Information leaks have severe consequences, including repeated privacy breaches, misinformed public opinion, social panic, and potential harm. Excessive or insufficient desensitization of patient information by authorities may also lead to the disclosure of irrelevant data. Consequently, citizens' privacy must be prioritized during epidemic investigations, including by employing effective encryption and security measures to prevent sensitive information leaks.

Related Work
In epidemiological studies, there have been many research efforts on the encryption protection of ESD. Blumenberg C et al. [4] used the REDCap system to explore the advantages and limitations of an electronic data collection environment and resolved data inconsistency through real-time reporting and field verification, which is expected to save time and improve data quality. However, the study is limited in its methodological detail, discussion of limitations, replicability, and data quality analysis. Dong E et al. [5] introduced the COVID-19 dashboard, which provides real-time epidemic data for the world, and described the data collection process in detail. However, the inconsistency and time delay of the data affect their reliability and limit their applicability. Sperber A D et al. [6] compared two data collection methods, face-to-face interviews and internet surveys, and concluded that the preferences and participation of different audiences may bias each method. Moreover, these authors did not provide a detailed design and description of the data collection method. In summary, most studies only encrypt the ESD themselves and ignore the privacy protection of nonmedical data, so they may not fully protect users' private information; in addition, their methods cannot be replicated, their limitations are significant, and their data cannot be readily applied. For correlated data, such as influenza status, sending data separately may increase errors and reduce data accuracy and availability. In addition, correlations between non-sensitive and sensitive attributes may result in the disclosure of personally sensitive information. Therefore, when dealing with ESOs, we must consider these correlations and take privacy protection measures to ensure data security and privacy.
Differential privacy (DP) [7,8] is pivotal in balancing individual privacy and data availability, and it can also be used to resist membership inference attacks. In 2013, local differential privacy (LDP) [9,10] was proposed as a variant of DP [11]; it inherits the advantages of DP and removes the dependence on a trusted third party, thus improving the practicality of the model. In LDP, noise is added to every data item before it leaves the user, which effectively protects data privacy. LDP has a wide range of applications, including machine learning, network services, data statistics, and optimization, and it has been widely adopted in industry. For example, Apple uses it to protect users' mobile phone usage data, and Google uses LDP-based components (such as randomized aggregatable privacy-preserving ordinal response, RAPPOR) to collect user behavior data. In 2017, Wang et al. [12] proposed a pure LDP framework that generalizes and optimizes most existing LDP protocols, and they compared the accuracy and communication cost of different LDP protocols. They also introduced the optimized unary encoding (OUE) protocol, which achieves higher accuracy. In addition, the study compared a variety of encoding methods and provided suggestions for selecting protocols. Histogram encoding (HE) and unary encoding (UE) require O(d) communication cost; the OUE coding mechanism, which builds on UE, avoids the high computational cost of direct encoding (DE) and local hashing (LH) and provides improved accuracy and unbiased estimation results. Furthermore, UE, known for its simplicity and intuitiveness, is easy to implement and offers superior performance in terms of computational and communication costs. This makes it a practical choice for adoption in various applications. However, many existing LDP mechanisms overlook the varying privacy protection requirements of different data in practical scenarios. Consequently, they may increase estimation errors by overprotecting non-sensitive data. Therefore, customizing LDP for specific domains is essential.
Some protocols designed for protecting location privacy, such as Geo-Indistinguishability [13] and Private Drop [14], may not directly suit EI frequency estimation. In addition, some personalized DP protocols may introduce excessive noise or data truncation, affecting data quality. In 2019, to address these issues, Murakami et al. [15] introduced the utility-optimized LDP (ULDP) model, which reduces the protection of non-sensitive data according to the privacy requirements of different data, thereby improving data utility while protecting the privacy of sensitive data. However, current ULDP protocols, such as utility-optimized generalized randomized response (uGRR) and utility-optimized RAPPOR (uRAP), mainly rely on generalized randomized response (GRR) and symmetric unary encoding (SUE), and they face challenges in utility and communication cost in big data settings. In 2022, He et al. [16] proposed the utility-optimized optimized local hashing (uOLH) protocol for big data domains, which aims to achieve low communication costs and high data utility. Nonetheless, it targets big data and does not suit all scenarios, and complex data hashing increases the complexity and cost of implementation. Additionally, Cao et al. [17] devised a frequency estimation mechanism for set-valued data that conforms to the ULDP model, offering a privacy protection solution for set data. However, its applicability may be limited for other data types, which increases the complexity and cost of practical implementations. These studies aimed to enhance data utility while addressing privacy concerns. EIs present unique challenges, demanding rigorous data collection and privacy safeguards. Current LDP technology falls short, particularly in terms of data aggregation, analysis efficiency, accuracy, and compatibility with diverse data types. Hence, in-depth research is vital to develop customized protocols for EI that effectively balance data privacy and utility.
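For reference, the standard OUE protocol of Wang et al. [12], on which uOUE builds, can be sketched as follows. This is a minimal illustration, not the authors' code: each user one-hot encodes a value from a domain of size d, keeps the 1-bit with probability p = 1/2, and sets each 0-bit to 1 with probability q = 1/(e^ε + 1); the server inverts the expected bit counts to obtain unbiased frequency estimates. The function names `oue_perturb` and `oue_estimate` are our own.

```python
import math
import random


def oue_perturb(value, d, epsilon, rng):
    """One-hot encode `value` in range(d) and perturb it with OUE."""
    p = 0.5                            # probability that the true 1-bit stays 1
    q = 1.0 / (math.exp(epsilon) + 1)  # probability that a 0-bit becomes 1
    return [int(rng.random() < (p if k == value else q)) for k in range(d)]


def oue_estimate(reports, epsilon):
    """Unbiased frequency estimates from a list of OUE bit vectors."""
    n, d = len(reports), len(reports[0])
    p, q = 0.5, 1.0 / (math.exp(epsilon) + 1)
    counts = [sum(report[k] for report in reports) for k in range(d)]
    # E[counts[k]] = n * (f_k * p + (1 - f_k) * q); solve for f_k.
    return [(c / n - q) / (p - q) for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    # True frequencies 0.5, 0.3, 0.2, 0.0 over a domain of size d = 4.
    true_values = [0] * 10000 + [1] * 6000 + [2] * 4000
    reports = [oue_perturb(v, 4, 2.0, rng) for v in true_values]
    print([round(f, 3) for f in oue_estimate(reports, 2.0)])
```

The asymmetric choice p = 1/2 is what distinguishes OUE from symmetric unary encoding; with it, the estimate's variance no longer grows with d.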
Homomorphic encryption [18], notably the Paillier variant, facilitates calculations on encrypted data, thereby enhancing data privacy.Paillier homomorphic encryption (PHE) [19] permits encrypted data aggregation, ensuring privacy and security while enabling efficient data transmission.In epidemiological survey data collection, it safeguards privacy and also enhances transmission efficiency.This ensures that individual data remain encrypted during sensitive data aggregation and analysis, preventing the exposure of personal sensitive information.Consequently, PHE offers an efficient and dependable solution for data collaboration and privacy protection.
Given the particularity of EI data, the LDP perturbation process must ensure that ESD do not lose information and that their privacy is fully protected. Building on existing mechanisms, we improve the OUE protocol under the ULDP model to transmit EI data set information. This avoids the uniform perturbation of all data used by traditional LDP methods, and OUE coding avoids complex data hashing, which would increase implementation complexity and cost. As a result, lower communication costs and higher data utility can be achieved in the EI data domain.
Therefore, based on the OUE protocol [12], we propose a utility-optimized OUE (uOUE) protocol that conforms to the ULDP model, aiming to further improve the efficiency and accuracy of data coding, and we design a ULDP scheme based on uOUE. The uOUE protocol handles ESO data that contain both sensitive and non-sensitive values, ensuring data security and privacy, improving the accuracy of ESD frequency estimation, and avoiding privacy risks to patients' original data during transmission. In addition, the scheme uses PHE and an identity-based signature scheme to protect data from differential attacks. PHE can aggregate encrypted data in the ciphertext state to ensure data security and privacy. The identity-based signature scheme digitally signs the user's ESD to ensure data integrity and source credibility.
Our main contributions are as follows:
1. To meet users' personalized privacy protection requirements, we propose the uOUE protocol, which is based on the OUE mechanism and conforms to the ULDP model. By proving that uOUE satisfies the ULDP model and calculating the theoretical variance of its frequency estimates, we show that the protocol has a low communication cost and high data utility;
2. Considering the collection and processing of data in the EI scenario, we design a multidimensional ESD aggregation scheme based on personalized LDP (PLDP). This scheme ensures the security and integrity of personal privacy data while maintaining data availability, and it achieves secure, efficient, and accurate ESD aggregation;
3. Through a comparative analysis of the mean square error (MSE) and communication cost against five other LDP protocols, as well as experiments on two data sets, we show that our scheme offers higher practicability and performance in ESD aggregation. For multidimensional data aggregation, the scheme shows strong computational performance and more comprehensive functionality, and it balances computational efficiency and privacy protection in the ESD aggregation scenario, providing a practical solution for this field.

Organization
The rest of this paper is organized as follows: Section 2 introduces some preliminary knowledge; Section 3 gives the specific content of the uOUE mechanism; in Section 4, considering the personalized privacy requirements of users in the EI scenario, an ESD aggregation scheme based on uOUE is designed; Section 5 gives the theoretical proof and comparative analysis of the scheme; and finally, Section 6 contains a summary of our results.

Preliminary Knowledge
In this section, we briefly introduce utility-optimized local differential privacy (ULDP) and Paillier homomorphic encryption (PHE). Finally, we introduce the MSE, the utility evaluation metric used in this paper.

Utility-Optimized LDP
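ULDP [15]. Let $X_S \subseteq X$ be the set of sensitive data, $X_N = X \setminus X_S$ the set of non-sensitive data, $Y_S \subseteq Y$ the set of protected outputs, and $Y_N = Y \setminus Y_S$. A randomized mechanism $M$ satisfies $(X_S, Y_S, \varepsilon)$-ULDP if it satisfies the following two properties:
(1) For any output $y \in Y_N$, there exists a non-sensitive input $x_1 \in X_N$ such that
$$\Pr[M(x_1) = y] > 0, \tag{1}$$
and for any $x_2 \neq x_1$,
$$\Pr[M(x_2) = y] = 0. \tag{2}$$
(2) For any inputs $x_1, x_2 \in X$ and any output $y \in Y_S$,
$$\Pr[M(x_1) = y] \le e^{\varepsilon} \Pr[M(x_2) = y]. \tag{3}$$
Property (2) is the ε-LDP guarantee restricted to $Y_S$: an attacker cannot infer the exact original input from such an output, and when the privacy budget ε approaches 0, all data in X produce the same output with almost the same probability. Since ε controls the degree of privacy protection, the smaller (or larger) its value, the stronger (or weaker) the privacy guarantee.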
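To make the two ULDP properties concrete, the following sketch checks them for a mechanism with finite input and output sets, given as a table of probabilities Pr[M(x) = y]. It is an illustration only; the function name `satisfies_uldp` and the dictionary representation are our own assumptions, and outputs not listed in Y_S are treated as Y_N.

```python
import math


def satisfies_uldp(Q, X_N, Y_S, epsilon, tol=1e-12):
    """Check properties (1)-(3) of ULDP for a finite mechanism.

    Q maps each input x to a dict {output y: Pr[M(x) = y]}. X_N is the set of
    non-sensitive inputs; Y_S is the set of protected outputs, and every other
    output that occurs with positive probability is treated as part of Y_N.
    """
    inputs = list(Q)

    def prob(x, y):
        return Q[x].get(y, 0.0)

    produced = {y for x in inputs for y, pr in Q[x].items() if pr > 0}
    # Property (1): each y in Y_N comes from exactly one input, a non-sensitive one.
    for y in produced - set(Y_S):
        sources = [x for x in inputs if prob(x, y) > 0]
        if len(sources) != 1 or sources[0] not in X_N:
            return False
    # Property (2): on Y_S, any two inputs differ by at most a factor e^epsilon.
    bound = math.exp(epsilon)
    return all(prob(x1, y) <= bound * prob(x2, y) + tol
               for y in Y_S for x1 in inputs for x2 in inputs)


if __name__ == "__main__":
    # Toy mechanism: sensitive "s" always outputs the protected "s"; non-sensitive
    # "n" reveals itself with probability gamma and otherwise outputs "s".
    for gamma in (0.5, 0.9):
        Q = {"s": {"s": 1.0}, "n": {"n": gamma, "s": 1 - gamma}}
        print(gamma, satisfies_uldp(Q, {"n"}, {"s"}, 1.0))
```

In the toy example, revealing the non-sensitive value too often (γ = 0.9) makes the protected output "s" almost exclusively a sign of the sensitive input, violating (3) at ε = 1.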

PHE Algorithm
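PHE [19] is widely used in many privacy-preserving data aggregation schemes. Suppose $E(\cdot)$ is an encryption function under a given public key, and $a$ and $b$ are two plaintext messages. The additive homomorphism of the PHE algorithm is
$$E(a) \cdot E(b) \bmod N^2 = E\left(a + b \bmod N\right).$$
The PHE algorithm consists of three parts: key generation, encryption, and decryption. The detailed process is as follows:
• Key generation: randomly select two large primes p and q, and calculate $N = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$; select a generator $g \in \mathbb{Z}^{*}_{N^2}$, define $L(u) = (u-1)/N$, and compute $\mu = \left(L\left(g^{\lambda} \bmod N^2\right)\right)^{-1} \bmod N$. The public key is $(N, g)$, and the private key is $(\lambda, \mu)$.
• Encryption: for a message $m \in \mathbb{Z}_N$, select a random $r \in \mathbb{Z}^{*}_N$ and compute the ciphertext $c = g^{m} r^{N} \bmod N^2$.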
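The following toy sketch of textbook Paillier illustrates the additive homomorphism that lets an aggregator sum encrypted reports without decrypting them. It assumes the common generator choice g = N + 1 and tiny primes for readability; these parameters are our own and are far too small for real use, where much larger primes are required.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation from two distinct primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                                          # common choice of generator
    n2 = n * n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)        # mu = L(g^lam mod n^2)^-1 mod n
    return (n, g), (lam, mu)


def encrypt(pk, m):
    """c = g^m * r^n mod n^2 with a fresh random r coprime to n."""
    n, g = pk
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pk, sk, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    n, _ = pk
    lam, mu = sk
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n


def add_ciphertexts(pk, c1, c2):
    """Additive homomorphism: E(a) * E(b) mod n^2 decrypts to a + b mod n."""
    n, _ = pk
    return (c1 * c2) % (n * n)


if __name__ == "__main__":
    pk, sk = keygen(1009, 1013)   # toy primes for illustration only
    c = add_ciphertexts(pk, encrypt(pk, 17), encrypt(pk, 25))
    print(decrypt(pk, sk, c))     # 42
```

Because the product of ciphertexts decrypts to the sum of plaintexts, an aggregator can multiply many users' ciphertexts and only the private-key holder learns the total.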

Utility Evaluation
The MSE is used to evaluate the effectiveness of a protocol and experiment. It measures how far the estimated frequencies deviate from the real ones: the smaller the MSE, the more accurately the estimates describe the experimental data. The formal definition of the MSE is shown in (4):
$$\mathrm{MSE} = \frac{1}{|X|} \sum_{x \in X} \left(\hat{F}_x - F_x\right)^2, \tag{4}$$
where $F_x$ represents the real frequency of x and $\hat{F}_x$ represents its estimated frequency.
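A direct transcription of (4) is shown below as a small sketch; it assumes, as in the reconstruction above, that the squared errors are averaged over the |X| values of the data domain. The function name `mse` is our own.

```python
def mse(true_freq, est_freq):
    """Mean square error between true and estimated frequencies, as in (4)."""
    if len(true_freq) != len(est_freq) or not true_freq:
        raise ValueError("frequency vectors must be non-empty and of equal length")
    return sum((f_hat - f) ** 2 for f, f_hat in zip(true_freq, est_freq)) / len(true_freq)


if __name__ == "__main__":
    print(mse([0.5, 0.5], [0.4, 0.6]))  # approximately 0.01
```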

uOUE Mechanism
In this section, we propose the uOUE protocol, which is based on the OUE protocol, conforms to the ULDP model, and protects the privacy of categorical data. Unlike the traditional OUE protocol, uOUE takes the privacy budget into account and introduces the ULDP mechanism to reduce communication costs and effectively reduce the risk of information leakage while maintaining data accuracy.

Introduction to uOUE
The uOUE protocol encodes values as binary vectors, mapping each data item to its candidate value. Privacy perturbation is performed through vector operations that preserve the attributes of the original data. The protocol treats the sensitive and non-sensitive sets differently, applying distinct privacy protection methods: sensitive data undergo randomized response perturbation, whereas for non-sensitive data only the individual candidate value is perturbed. Frequency estimation on the perturbed vectors then achieves higher accuracy and utility. In short, the method reduces the protection of non-sensitive data, which improves frequency estimation accuracy and data availability while maintaining privacy.

Mechanism Description
Participants in the uOUE protocol include three parties: users, servers, and data users. Users hold the original data, encode and perturb them, and send them to the server. The server aggregates and statistically analyzes the perturbed data of all users, estimates the frequency distribution of the original data, and finally sends the results to the corresponding data users.
The original data domain is denoted by X, with |X| = d. It is divided into two parts: the sensitive data set $X_S$ and the non-sensitive data set $X_N$. The two do not intersect, that is, $X_S \cap X_N = \emptyset$ and $X_S \cup X_N = X$.
The uOUE scheme is divided into three steps: encoding, perturbation, and aggregation.The specific steps are as follows:

uOUE encoding
In uOUE, the data are first UE-encoded; that is, the categorical data held by the user are encoded as a d-bit vector v. Each bit corresponds to one value in the original data domain: if the user's value is the k-th value of the domain, the k-th bit of v is set to 1 and all other bits are set to 0.
Assume that there are n users, and each user holds raw data x. To reduce the subsequent communication cost, x is UE-encoded, and the encoding result is $v = (v_1, \ldots, v_{|X_S|}, v_{|X_S|+1}, \ldots, v_d) \in \{0,1\}^d$. When x is sensitive data, it is encoded as one of $\{e_1, \ldots, e_{|X_S|}\}$, and when x is non-sensitive data, its encoding is one of $\{e_{|X_S|+1}, \ldots, e_d\}$, where $e_k$ denotes the d-bit vector whose k-th bit is 1 and whose other bits are 0.
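The encoding step can be sketched as follows, assuming (as in the vector notation above) that the sensitive values occupy the first |X_S| positions of the domain. The function name `uoue_encode` is hypothetical, and the candidate values reuse the travel-destination example that the paper gives in the scheme description.

```python
def uoue_encode(x, sensitive_values, non_sensitive_values):
    """UE-encode x as a one-hot vector of length d = |X_S| + |X_N|.

    Sensitive values occupy the first |X_S| positions, so a sensitive x maps to
    some e_k with k <= |X_S| and a non-sensitive x to some e_k with k > |X_S|.
    """
    domain = list(sensitive_values) + list(non_sensitive_values)
    if len(set(domain)) != len(domain):
        raise ValueError("X_S and X_N must be disjoint and duplicate-free")
    if x not in domain:
        raise ValueError(f"{x!r} is not in the data domain")
    return [1 if value == x else 0 for value in domain]


if __name__ == "__main__":
    X_S = ["Beijing", "Shanghai"]   # sensitive candidate values
    X_N = ["Guangxi", "Hubei"]      # non-sensitive candidate values
    print(uoue_encode("Shanghai", X_S, X_N))  # [0, 1, 0, 0]
    print(uoue_encode("Hubei", X_S, X_N))     # [0, 0, 0, 1]
```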

uOUE perturbation method
Users encode and perturb their original data locally to generate the perturbed data v′. Depending on the sensitivity of x, Perturb(v) applies different perturbation methods.
Specifically, each bit $v_k$ of v is perturbed to obtain $v'_k$, as shown in (5). The resulting perturbed data v′ are sent to the server for aggregation and analysis.
For sensitive data x, each bit remains unchanged with a certain probability.
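Since the exact probabilities of Equation (5) are not reproduced here, the following is only a sketch of a uOUE-style perturbation under stated assumptions, not the authors' rule. It assumes that positions 0 to |X_S| − 1 hold the sensitive values and that (i) sensitive positions use OUE's probabilities (keep a 1 with p = 1/2, raise a 0 with q = 1/(e^ε + 1)), and (ii) a non-sensitive position keeps its 1 with probability γ, producing the reversible output, and never turns a 0 into a 1. The bound in `max_gamma` is our own calculation for this sketch: it is the largest γ for which every output with all non-sensitive bits 0 stays within the e^ε ratio of property (3).

```python
import math
import random


def max_gamma(epsilon):
    """Largest gamma for which, under the rule in uoue_perturb, any two inputs
    produce each output with all non-sensitive bits 0 within a factor e^epsilon.
    Derived for this sketch only; not taken from Equation (5)."""
    e = math.exp(epsilon)
    return (e - 1) / (2 * e)


def uoue_perturb(v, num_sensitive, epsilon, gamma, rng):
    """Hypothetical uOUE-style perturbation of a one-hot vector v.

    Positions 0..num_sensitive-1 are sensitive values; the rest are non-sensitive.
    """
    if not 0 < gamma <= max_gamma(epsilon):
        raise ValueError("gamma outside the range assumed by this sketch")
    p = 0.5                            # OUE: keep a sensitive 1-bit with prob. 1/2
    q = 1.0 / (math.exp(epsilon) + 1)  # OUE: raise a sensitive 0-bit with prob. q
    out = []
    for k, bit in enumerate(v):
        if k < num_sensitive:
            prob_one = p if bit else q
        else:
            # Non-sensitive position: keep a 1-bit with prob. gamma (a reversible
            # report); never raise a 0-bit, so a 1 here identifies the value exactly.
            prob_one = gamma if bit else 0.0
        out.append(int(rng.random() < prob_one))
    return out


if __name__ == "__main__":
    rng = random.Random(7)
    print(uoue_perturb([0, 0, 1, 0], 2, 1.0, 0.3, rng))
```

The design mirrors the ULDP idea: a 1 in a non-sensitive position may reveal that non-sensitive value, whereas every output that could come from a sensitive input stays within the e^ε bound.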

uOUE aggregation
After receiving the perturbed data from the users, the server counts, for each bit position k, the number of reports whose k-th bit in v′ is 1. From these counts, the server estimates the frequency of each original value x; the resulting estimates $\hat{F}_x$, which are close to the frequency distribution of the original data, are given by Equation (6).
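Because Equation (6) did not survive extraction, the estimator below is a sketch derived under an assumed perturbation rule rather than the authors' formula. It assumes that, at a sensitive position k, the user holding value k reports 1 with probability p = 1/2 and every other user with probability q = 1/(e^ε + 1), and that a non-sensitive position k can be 1 only for users holding value k, who report it with probability γ. Inverting these expected counts gives an unbiased estimate in both cases, matching the two cases of Theorem 2. The function name is hypothetical.

```python
import math


def uoue_estimate(counts, n, num_sensitive, epsilon, gamma):
    """Estimate value frequencies from per-bit counts of 1s (assumed uOUE rule).

    counts[k] is the number of the n reports whose k-th bit is 1.
    """
    p = 0.5
    q = 1.0 / (math.exp(epsilon) + 1)
    estimates = []
    for k, c in enumerate(counts):
        if k < num_sensitive:
            # Every user not holding value k contributes a 1 here with prob. q:
            # E[c] = n * (F_k * p + (1 - F_k) * q).
            estimates.append((c / n - q) / (p - q))
        else:
            # Only users holding value k can produce a 1 here: E[c] = n * F_k * gamma.
            estimates.append(c / (n * gamma))
    return estimates


if __name__ == "__main__":
    n, eps, gamma = 1000, 1.0, 0.3
    F = [0.4, 0.1, 0.3, 0.2]            # the first two values are sensitive
    p, q = 0.5, 1.0 / (math.exp(eps) + 1)
    expected = [n * (f * p + (1 - f) * q) for f in F[:2]] + [n * f * gamma for f in F[2:]]
    print(uoue_estimate(expected, n, 2, eps, gamma))  # approximately [0.4, 0.1, 0.3, 0.2]
```

Feeding in the expected counts recovers the true frequencies, which is exactly the unbiasedness property; with real, random counts the estimates scatter around them.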

ESD Aggregation Scheme Based on uOUE
ESD sensitivity analysis involves sensitive and non-sensitive attributes.For example, in an epidemiological record, attributes like {gender, age, symptoms, allergic drugs, chronic diseases} include sensitive (allergic drugs) and non-sensitive attributes (gender and chronic diseases).Sensitive attributes also have sensitive and non-sensitive candidate values.For instance, when examining regional attributes, an individual's travel destinations might include {Beijing, Shanghai, Guangxi, Hubei}, where Beijing and Shanghai are sensitive candidate values, and the others are not.
To enhance user data privacy, we designed an aggregation scheme based on the uOUE mechanism outlined in Section 3.This approach improves data utility by reducing non-sensitive data protection.It also introduces PHE and BLS-based short signatures [20] to boost data security.PHE enables the Epidemiological Data Control Center (EDCC) to merge encrypted data from multiple ESOs without decryption, ensuring strong data privacy and security while enhancing transmission efficiency.Meanwhile, the BLS-based short signature scheme offers efficient signing and verification, reducing communication and storage costs while maintaining high security and anonymity.This facilitates the efficient collection and aggregation of ESD while safeguarding authentication and data transmission privacy.

Scheme Model
This scheme has a three-tier structure composed of ESOs, epidemiological survey workers (ESWs), and the EDCC. The specific scheme model is shown in Figure 3. … , which is the candidate value of attribute $m_j$. The value $m_{ij}$ is 1 or 0: 1 indicates that the ESO has the candidate value, 0 indicates that it does not, and k indicates the position corresponding to the candidate value; 2. The EDCC selects a security parameter and two primes p and q, calculates $N = pq$ and $\lambda = \mathrm{lcm}(p-1, q-1)$, defines the function $L(\cdot)$, and calculates the public key and the sequence $a_1, a_2, \ldots, a_l$; 4. $\mathbb{G}_1$ and $\mathbb{G}_2$ are cyclic multiplicative groups with the same prime order $q_1$, where $\mathbb{G}_1$ is generated by P.
User registration, managed by the EDCC, yields a user's pseudonym and generates their public and private keys using the BLS short signature, as follows: 1. $U_i$ obtains the current timestamp $T_i$, calculates a hash value using their real identity $id_{U_i}$, and sends a registration request $\{id_{U_i}, T_i, H_3\}$ to the EDCC; 2. After receiving the user's registration request, the EDCC performs the check shown in (8).
$U_i$ perturbs the encoded data according to (5) to obtain $v'_i$; that is, each candidate value $v_{ijk}$ is perturbed according to its sensitivity: for sensitive data x, the probability remains unchanged, and the probability of … $ESW_h$ obtains … frequency statistics … and calculates the ciphertext according to (10): … If the equation holds, the EDCC accepts the report; 3. The EDCC aggregates the data according to (11) and (12) to obtain the ciphertext C.
Data Acquisition
The EDCC decrypts and analyzes the ESD. The specific steps are as follows: 1. Let …; 2. The EDCC performs frequency statistics on the ESD.
The frequency of the k-th candidate value of attribute $m_j$ can be estimated by the method shown in (14) and (15).

• If x is sensitive data, its frequency is estimated by (14).
• If x is non-sensitive data, its frequency is estimated by (15).

Scheme Analysis
This section provides a theoretical and comparative analysis of the uOUE protocol, demonstrating its low communication cost and high data utility.It also includes a security proof and comparative analysis of the ESD aggregation scheme based on uOUE.

Theoretical Analysis of uOUE Protocol
This section will introduce some related properties of the uOUE protocol and give the corresponding theoretical proof.

Proof of Theorem 1. For any $x_1, x_2 \in X$ and any output $A \in Y_S$, the probability of outputting the same result A satisfies (16).
Equation (16) satisfies the second property of the ULDP definition, namely (3). In uOUE, for any output $A \in Y_N$, there is only one original data item $x \in X_N$ that can be perturbed into this reversible output: the reversible output is produced if and only if the input is x, and then with probability γ. Therefore, uOUE also satisfies (1) and (2), the first property of the ULDP definition. In summary, the perturbation process of uOUE conforms to the ULDP model. □
Theorem 2. The result of uOUE frequency estimation is an unbiased estimation.
Proof of Theorem 2. For original data x, the estimated frequency is denoted by $\hat{F}_x$. If x is sensitive data, it follows from the perturbation process that $E[\hat{F}_x] = F_x$. If x is non-sensitive data, it likewise follows from the perturbation process that $E[\hat{F}_x] = F_x$. In summary, the frequency estimation result of uOUE is unbiased. □
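Theorem 3. In the uOUE protocol, the mean square error of the estimated frequency $\hat{F}_x$ is as follows.
Proof of Theorem 3. By Theorem 2, the estimator in Equation (6) is unbiased. Since $\mathrm{MSE} = \mathrm{Var}(\hat{F}_x) + \left(E[\hat{F}_x] - F_x\right)^2$ and the bias term is zero, the MSE equals the variance of $\hat{F}_x$.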

• x is sensitive:

Comparison of Theoretical Results
We compare the utility of the uOUE protocol with that of traditional LDP perturbation methods (GRR [12] and RAPPOR [12]) and existing ULDP perturbation methods (uGRR [15], uRAP [15], and uOLH [16]). Based on Theorem 3, we compute the MSE of the proposed and existing LDP mechanisms under the condition stated in Table 1. The results are displayed in Table 1, in which $F_{X_N}$ represents the actual total frequency of non-sensitive data.
In practical applications, most of the MSE comes from sensitive data, but sensitive data usually account for only part of the entire data set. Therefore, the MSE of utility-optimized mechanisms, which protect data according to personalized privacy requirements, is significantly smaller than that of mechanisms without utility optimization. Our improved mechanism is easy to implement in practical applications and performs better in terms of computational and communication costs. It protects sensitive data more precisely and reduces their impact on the overall error.
In addition to data utility, communication cost is an important criterion for evaluating a mechanism. We summarize the communication costs of existing ULDP protocols in Table 2.

As the privacy budget increases, the errors of the six protocols gradually decrease. However, in terms of data utility, uGRR falls significantly behind the other protocols. This difference is mainly due to the large original data domain used in the experiment, to which uGRR adapts poorly. For very large raw data domains, uOLH shows superior communication cost performance. In practical scenarios, especially when the original data domain is of moderate size, the uOUE protocol has lower communication overhead. Considering the unique application scenarios and data characteristics of ESD, the uOUE protocol effectively meets the needs of actual scenarios while ensuring privacy protection.

Experimental Settings:
Our experimental environment is set as follows: the operating system is Windows 10, the processor is an Intel Core i7-1165G7, the memory is 16.0 GB, and the development environment is PyCharm 2021.3.
We conducted experiments on two data sets: the COVID-19 data set [21][22][23] and the SARS virus data set [24].Their relevant parameter settings are also given in Table 3.In this subsection, we compare the four mechanisms under different privacy budgets, as shown in Figure 4 and Figure 5.
When the data domain size is d = 256, the MSE of all four mechanisms decreases as the privacy budget ε increases. Higher privacy budgets lead to more accurate frequency estimation and improved data utility but weaker privacy protection. Hence, for a given privacy budget, we must strive for precise statistical results while preserving data privacy; balancing data utility and privacy protection is essential.
In the experiment, we observed that data utility with the GRR mechanism was notably lower than with the other four mechanisms. This occurred because the experiment used relatively large data domains, which make uGRR more likely to report a value other than the true one and thus reduce frequency estimation accuracy, consistent with the characteristics of GRR.
From the figures, it is evident that uOLH excels with particularly large data domains. However, for medium-sized data domains, the enhanced uOUE mechanism performs better in practical applications, offering higher practicality and performance for the ESD aggregation scheme. In the experiment on the effect of d, the proportion of sensitive data is 0.5 and the privacy budget is ε = 1; the results are shown in Figure 6.

Security Proof and Analysis
The security of our scheme is based on the BLS signature. Under the Computational Diffie-Hellman (CDH) assumption and the random oracle model, the ESD aggregation scheme based on uOUE is unforgeable under an adaptive chosen message attack. … ϑ. Suppose that ① 𝒜 never issues two identical queries to the random oracle, and ② if 𝒜 requests a signature on message M, it has already queried the random oracle for the corresponding ϑ. Analysis: ℬ regards PK as its public key and a as its private key (ℬ does not actually know a); then $a\vartheta$ is the signature on the corresponding message under this key, … (1) If $i \neq j$, then there is a triple (…), and ℬ aborts; otherwise, ℬ outputs … . In the above process, if ℬ does not abort, its simulation is complete. When ℬ's guess is correct, the view of 𝒜 in the above reduction is identically distributed with its view in the real attack, for the following two reasons. (1) Each of the $q_{H_1}$ queries that 𝒜 makes to $H_1$ is answered with a random value, as follows: • when $i = j$, ℬ answers according to the randomness of $b_i$; • when $i \neq j$, ℬ answers … . In the real attack, $H_1$ is modeled as a random oracle. Therefore, the responses to 𝒜's hash queries have the same distribution as in the real attack.
Therefore, the view of 𝒜 in the above reduction is identically distributed with its view in the real attack; that is, the simulation of ℬ is complete.
If ℬ's guess is correct, then ℬ solves the CDH problem in group $\mathbb{G}_1$. Because the CDH problem in $\mathbb{G}_1$ is hard, the signature scheme used in this scheme is unforgeable under adaptive chosen message attack.

Functional Comparison
Our scheme achieves multidimensional ESD aggregation, guarantees user identity anonymity, and effectively defends against eavesdropping, active attacks, and differential attacks. Among existing schemes, homomorphic-based multiple data aggregation (HB-MDA) [25] can aggregate multidimensional data but not multisubset data. Fault-tolerant and flexible privacy-preserving multisubset data aggregation (FF-PPMA) [26] allows the control center to aggregate subsets, but clients can report only one type of data. Moreover, the signature mechanism of the DP-based multidimensional and multisubset data aggregation scheme (DP-MMDA) [27] is susceptible to adaptive chosen message attacks. As seen in Table 4, our proposed scheme offers significant functional advantages and a more comprehensive defense against various security threats.

Computational Overhead
In this section, we analyze the computational overhead of our scheme and compare it with that of HB-MDA, FF-PPMA, and DP-MMDA. Table 5 presents the core operations and their execution times. As shown in Table 6, our scheme accomplishes multidimensional and multiregion aggregation with lower computational expense and greater efficiency during the aggregation and decryption phases than the HB-MDA and FF-PPMA schemes. The HB-MDA scheme takes more time owing to the construction of a super-increasing sequence and the use of the PHE algorithm for encryption. By contrast, the DP-MMDA scheme reduces computational overhead by employing the Chinese remainder theorem to merge multidimensional data into composite data. The FF-PPMA scheme does not account for authentication and therefore incurs no signature generation overhead. In this study, we compare only the computational overhead of implementing multidimensional and multisubset data aggregation.
With an increasing number of data aggregations, the DP-MMDA scheme and our scheme excel in multidimensional data aggregation, outperforming the HB-MDA and FF-PPMA schemes in computational performance.Note that the DP-MMDA scheme aggregates grid data and is suitable for accumulating electricity, whereas our ESD aggregation involves multiattribute data, primarily used for statistical analysis.Nevertheless, we effectively integrate and analyze ESD across multiple regions and attributes.
We include a ULDP mechanism, enhancing privacy protection for dimension attributes, which is crucial for ESD analysis.With the LDP mechanism, our scheme demonstrates strong computational performance in multidimensional and multisubset data aggregation, delivering heightened data privacy.Our comprehensive analysis showcases that our scheme adeptly balances computational efficiency and privacy protection in large-scale data aggregation scenarios, thereby offering a valuable practical solution.

Summary
In the context of ESD aggregation, we propose the uOUE protocol, which conforms to the ULDP model, to improve data coding efficiency and accuracy. We also design an ESD aggregation scheme based on the uOUE protocol, which improves the accuracy of ESD frequency estimation without compromising the protection of sensitive data. We aim to mitigate the privacy risk associated with raw patient data traversing channels and servers, ensuring data integrity during processing while preserving privacy. These enhancements enable secure, efficient, and precise ESD aggregation, improving the accuracy and reliability of results while preventing data tampering and forgery. Our scheme also has potential practical value in multifunctional data aggregation [27][28][29] and in data aggregation combined with wearable medical devices.
Our future research will focus on data aggregation technology in big data environments, aiming to improve data utility in a personalized model and to develop a multilevel privacy ESD aggregation scheme. In the personalized model, we will explore adaptive data processing methods to meet different user scenarios and data needs. In designing the multilevel privacy ESD aggregation scheme, we will account for varying data sensitivity and privacy requirements, optimizing the use of data information while preserving privacy. These studies will advance the field, providing comprehensive, optimized solutions for data aggregation and privacy protection.
In the comparison by Wang et al. [12], direct encoding (DE) and local hashing (LH) required $O(\log d)$ or $O(\log n)$ communication costs (n is the number of users), and all protocols except DE required a computational cost of $O(n \cdot d)$ to estimate the frequencies of all values. When the number of values that users might input is large, UE was an effective coding method, and its communication cost, $O(d)$, was equivalent to that of HE. Therefore, when the number of possible input values satisfies $d > 3e^{\varepsilon} + 2$ and $d < n$, OUE is preferable, as it avoids the high computational cost of DE and LH.

• Decryption: the plaintext is obtained by the formula $m = \dfrac{L\left(c^{\lambda} \bmod N^2\right)}{L\left(g^{\lambda} \bmod N^2\right)} \bmod N$, where c is the ciphertext, g is the generator in the public key, and $L(u) = (u-1)/N$.
… probability deflection occurs, as shown in Figure 1; for non-sensitive data x, with probability … the reversible data $v_k$ are output, that is, $v'_k$ can be used to represent … . Examples are shown in Figures 1 and 2. After the perturbation is completed, the perturbed data are sent to the server.

$ESW_h$ aggregates these data after verifying the signatures, applies the PHE algorithm, and forwards the report to the EDCC, streamlining interaction and communication with the EDCC; 3. EDCC: the EDCC is central to the ESD aggregation scheme, acting as the aggregator. It holds a public/private key pair for semantically secure homomorphic encryption, as well as an identity-based public/private key pair. Its responsibilities include generating public and private key pairs for ESOs and ESWs and managing the transmission and verification of their data. To enable the aggregated ESD to be separated, the EDCC constructs a super-increasing sequence.

Figure 3. Scheme model of ESD aggregation based on uOUE mechanism.

… is established. If so, the EDCC computes a pseudonym PS for user $U_i$ based on their real identity $id_{U_i}$: the EDCC randomly selects … . If the values are equal, the pseudonym is used. Equation (7) describes this process.

Figure 4. The effect of ε on MSE.

3. The effect of d on MSE
Data domain size d also affects data utility. Since the data domains of the real data sets are fixed, this subsection evaluates simulated data sets of different sizes. The values of d used in the experiment are {16, 32, 64, …, 1024}.

Figure 6.The effect of d on MSE.

Theorem 4. If the CDH problem on $\mathbb{G}_1$ is hard, then the ESD aggregation scheme based on uOUE achieves existential unforgeability against adaptive chosen message attacks (EUF-CMA).
Proof of Theorem 4. Let $H_1$ be a random oracle. Algorithm ℬ takes the adversary 𝒜 (attacking the BLS short signature scheme) as a subroutine, and its goal is to calculate a …

(2) The responses that 𝒜 obtains to its signature queries are … (𝒜 has obtained this), so the signature responses obtained by 𝒜 are valid (relative to the public key it obtains).

Funding:
This research was funded by the National Natural Science Foundation of China (grant numbers 62262060 and 61662071), the Industrial Support Plan Project of the Gansu Provincial Department of Education (grant number 2022CYZC-17), and the Gansu Science and Technology Program (grant number 22JR5RA158).
Data Availability Statement: Data are contained within the article.
The ESO selects a random number …; $ESW_h$ also selects a private key …; $U_i$ fills in $x_i$, converts $x_i$ into a vector, and sends the report {…}.

Table 1. Comparison of MSE when …

Table 4. Comparison of privacy-preserving data aggregation schemes.

Table 5. Applications in each class.
Symbol | Operation | Execution time
$T_e$ | Exponential operation on $\mathbb{Z}_N$ | 11.256
$T_m$ | Multiplication operation on $\mathbb{Z}_N$ | 1.032

Table 6. Comparison of computational overhead.