Privacy-Preserving Federated Singular Value Decomposition

Singular value decomposition (SVD) is a fundamental technique widely used in various applications, such as recommendation systems and principal component analyses. In recent years, the need for privacy-preserving computations has been increasing constantly, which concerns SVD as well. Federated SVD has emerged as a promising approach that enables collaborative SVD computation without sharing raw data. However, existing federated approaches still need improvements regarding privacy guarantees and utility preservation. This paper moves a step further in these directions: we propose two enhanced federated SVD schemes focusing on utility and privacy, respectively. Using a recommendation system use-case with real-world data, we demonstrate that our schemes outperform the state-of-the-art federated SVD solution. Our utility-enhanced scheme (utilizing secure aggregation) improves the final utility and the convergence speed by more than 2.5 times compared with the existing state-of-the-art approach. In contrast, our privacy-enhancing scheme (utilizing differential privacy) provides more robust privacy protection while improving the same aspects by more than 25%.


Introduction
Advances in networking and hardware technology have led to the rapid proliferation of the Internet of Things (IoT) and decentralized applications. These advancements, including fog computing and edge computing technologies, enable data processing and analysis to be performed at node devices, avoiding the need for data aggregation. This naturally brings benefits such as efficiency and privacy, but on the other hand, it forces data analysis tasks to be carried out in a distributed manner. In this context, federated learning (FL) has emerged as a promising solution, allowing multiple parties to collaboratively train models without sharing raw data. Instead, only intermediate results are exchanged with an aggregator server, ensuring privacy preservation and decentralized data analysis [1].
With respect to machine learning tasks, research has shown that sensitive information can be leaked from the models [2][3][4][5]. For example, in [3], Shokri et al. demonstrated membership inference attacks against machine learning tasks. In such an attack, an attacker can determine whether a data sample has been used in the model training. This violates privacy if the data sample is sensitive. Despite its privacy-friendly design, FL suffers from similar privacy issues, as demonstrated by Nasr, Shokri and Houmansadr [5]. This makes it necessary to incorporate additional privacy protection mechanisms into FL and to make it rigorously privacy-preserving.
To mitigate information leakage, FL can be aided with other privacy-enhancing technologies, such as secure aggregation (SA) [6] and differential privacy (DP) [7]. SA hides the individual contributions from the aggregator server in each intermediate step in a way that does not affect the trained model's utility. In other words, the standalone updates are masked such that the masks cancel out during aggregation; therefore, the aggregated results remain intact. The masks can be seen as temporary noise; hence, the privacy protection does not extend to the aggregated data. In contrast, DP adds persistent noise to the model, i.e., it provides broader privacy protection but with an inevitable utility loss (due to the permanent noise). We differentiate between two DP settings depending on where the noise is injected. In local DP (LDP), the participants add noise to their updates, while in central DP (CDP), the server applies noise to the aggregate result. A comparison of LDP, CDP and SA is summarized in Table 1. While there are many privacy protection mechanisms, incorporating them into FL is not a trivial task and remains an open challenge [1].

                      Protects
                      The Individual Updates    The Aggregate
Secure Aggregation    yes                       no
Central DP            no                        yes
Local DP              yes                       yes

Among many data analysis methods, this paper focuses on singular value decomposition (SVD). Plainly, SVD factorizes a matrix into three new matrices. Originating from linear algebra, SVD has several interesting properties and conveys crucial insights about the underlying matrix. Hence, SVD has essential applications in data science, such as in recommendation systems [8,9], principal component analysis [10], latent semantic analysis [11], noise filtering [12,13], dimension reduction [14], clustering [15], matrix completion [16], etc. Existing federated SVD solutions fall into two categories: SVD over horizontally and vertically partitioned datasets [17]. In real-world applications, the former is much more common [18,19]; therefore, in this paper, we choose the horizontal setting and focus on the privacy protection challenges.

Related Work
The concept of privacy-preserving federated SVD has been studied in several works, which are briefly summarized below.
In the literature, many anonymization techniques have been proposed to enable privacy protection in federated machine learning and other tasks. Ref. [20] proposed substitute vectors and a length-based frequent pattern tree (LFP-tree) to achieve data anonymization. It focuses on what data can be published and how they can be published without associating subjects or identities. With the concept of data anonymization in mind, Ref. [21] proposed a strategy that decreases the correlation between the data and the identities. However, the utility of the data is affected. Ref. [22] focused on high-dimensional datasets, which are divided into different subsets; then, each subset is generalized with a novel heuristic method based on local re-coding. While these works contain interesting techniques, they do not directly offer a solution for privacy-preserving federated SVD. A more detailed analysis can be found in [1].
Technically speaking, the algorithms utilized to compute SVD are mostly iterative, such as the power iteration method [23]. Recently, these algorithms were adapted to a distributed setting to solve large-scale problems [24,25]. While these works tackle important issues and advance the field, they all disregard privacy issues: we are only aware of two federated SVD solutions in the literature explicitly providing a privacy analysis [18,19]. Hartebrodt et al. [19] proposed a federated SVD algorithm with a star-like architecture for high-dimensional data such that the aggregator cannot access the complete eigenvector matrix of the SVD results. Instead, each node device has access, but only to its shared part of the eigenvector matrix. In addition to the lack of a rigorous privacy analysis, its aim is different from most other federated SVD solutions, where the aim is to jointly compute a global feature space. In contrast, Guo et al. [18] proposed a federated SVD algorithm based on the distributed power method, where both the server and all the participants learn the entire eigenvector matrix. Their solution incorporated additional privacy-preserving features, such as participant and aggregator server noise injection, but without a rigorous privacy analysis. We improve upon this solution by pointing out an error in its privacy analysis and by providing tighter privacy protection with less noise. Overall, the existing works do not provide a privacy-preserving federated SVD solution with a rigorous analysis in our setting.

Contribution and Organization
This work focuses on a setting similar to Guo et al. [18], i.e., when the server and all the participants are expected to learn the final eigenvector matrix. As our main contribution, we improve the FedPower algorithm [18] from two perspectives, i.e., both from the privacy and utility points of view. Our detailed contributions are summarized below.

• Firstly, we point out several inefficiencies and shortcomings of FedPower, such as the avoidable double noise injection steps and the unclear and confusing privacy guarantee.

• Secondly, we propose a utility-enhanced solution, where the added noise is reduced due to the introduction of SA.

• Thirdly, we propose a privacy-enhanced solution, which (in contrast to FedPower) satisfies DP.

• Finally, we empirically validate our proposed algorithms by measuring the privacy-utility trade-off using a real-world recommendation system use-case.

The rest of the paper is organized as follows. In Section 2, we list the fundamental definitions of the relevant techniques used throughout the paper. In Section 3, we recap the scheme proposed by Guo et al. [18], while in Sections 4 and 5, we propose two improved schemes focusing on utility and privacy, respectively. In Section 6, we empirically compare the proposed schemes with the original work. Finally, in Section 7, we conclude the paper.

Singular Value Decomposition
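Let $M$ be an $s \times d$ matrix with $s \le d$. As shown in Figure 1, the full SVD of $M$ is a factorization of the form $U \Sigma V^T$, where $^T$ denotes the conjugate transpose. The left-singular vectors are $U = [u_1, u_2, \ldots, u_s] \in \mathbb{R}^{s \times s}$, the right-singular vectors are $V = [v_1, v_2, \ldots, v_d] \in \mathbb{R}^{d \times d}$, and the diagonal matrix with the singular values in decreasing order on its diagonal is $\Sigma = \mathrm{diag}\{\sigma_1, \sigma_2, \ldots, \sigma_s\} \in \mathbb{R}^{s \times d}$. The partial or truncated SVD [26,27] is used to find the top $k$ ($k \le s$) singular vectors and singular values $\Sigma = \mathrm{diag}\{\sigma_1, \sigma_2, \ldots, \sigma_k\}$.

Moreover, if $M$ is the composition of $n$ matrices, then the computation of the Power Method [23] (which iterates $Y = M'Z$ and $Z = \mathrm{orth}(Y)$ for $M' = \frac{1}{s} M^T M$) can be distributed. So, if $M = [M_1^T, M_2^T, \ldots, M_n^T]^T \in \mathbb{R}^{s \times d}$ with $s = \sum_{i=1}^{n} s_i$, where $M_i \in \mathbb{R}^{s_i \times d}$ and $M'_i = \frac{1}{s_i} M_i^T M_i$, then Equation (1) holds:

$$M' = \frac{1}{s} M^T M = \frac{1}{s} \sum_{i=1}^{n} M_i^T M_i = \sum_{i=1}^{n} \frac{s_i}{s} M'_i. \quad (1)$$

Thereby, $Y$ can be written as

$$Y = M' Z = \sum_{i=1}^{n} \frac{s_i}{s} M'_i Z,$$

which indicates that the Power Method can be processed in parallel by each data holder [18,28].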
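The decomposition of the Power Method over horizontally split data can be sketched in a few lines of NumPy. This is only a minimal illustration without any noise or privacy protection; the function names (`orth`, `distributed_step`, `distributed_power_method`), the synthetic matrix and the iteration count are assumptions made for the example and are not taken from [18].

```python
import numpy as np

def orth(Y):
    """Orthonormalize the columns of Y with a QR factorization."""
    Q, _ = np.linalg.qr(Y)
    return Q

def distributed_step(blocks, Z):
    """One multiplication Y = M'Z, assembled from per-holder contributions."""
    s = sum(M_i.shape[0] for M_i in blocks)
    Y = np.zeros_like(Z)
    for M_i in blocks:
        s_i = M_i.shape[0]
        local = (M_i.T @ (M_i @ Z)) / s_i   # M'_i Z, computed by holder i
        Y += (s_i / s) * local              # weighted by s_i / s
    return Y

def distributed_power_method(blocks, k, iters=200, seed=0):
    """Approximate the top-k right singular vectors of the stacked blocks."""
    rng = np.random.default_rng(seed)
    d = blocks[0].shape[1]
    Z = orth(rng.standard_normal((d, k)))
    for _ in range(iters):
        Z = orth(distributed_step(blocks, Z))
    return Z

# Synthetic data with three dominant directions (hypothetical example).
rng = np.random.default_rng(1)
scales = np.array([10.0, 8.0, 6.0, 1.0, 1.0, 1.0, 1.0, 1.0])
M = rng.standard_normal((60, 8)) * scales
blocks = np.array_split(M, 4, axis=0)       # four data holders

Z = distributed_power_method(blocks, k=3)
_, _, Vt = np.linalg.svd(M, full_matrices=False)
V_k = Vt[:3].T
# Subspaces are compared through their projectors, since the basis
# returned by the iteration is only determined up to a rotation.
gap = np.linalg.norm(Z @ Z.T - V_k @ V_k.T)
```

Each holder only needs its own block to compute its term of the sum, which is what makes the federated setting possible.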

Secure Aggregation
In simple terms, with SA, the original data of each node device are locally masked in a particular way and shared with the server, so when the masked data are aggregated on the server, the masks cancel out. Consequently, the server learns the aggregate but not the original unmasked intermediate results of the individual node devices. In the FL literature, many solutions have used the SA protocol of Bonawitz et al. [29]. We recap this protocol in Appendix A and use it in Section 4 to benchmark our enhanced SVD solution.
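The cancellation of the masks can be illustrated with a deliberately simplified sketch. It only shows pairwise additive masking over a finite ring; the key agreement, secret sharing and dropout handling of the protocol in [29] are omitted, and a single random generator stands in for the pairwise shared seeds. The function name `masked_updates` and the modulus are assumptions made for this example.

```python
import numpy as np

def masked_updates(updates, modulus, seed=0):
    """Add pairwise masks that cancel once all masked updates are summed."""
    rng = np.random.default_rng(seed)
    n = len(updates)
    masked = [u.copy() for u in updates]
    for u in range(n):
        for v in range(u + 1, n):
            # In a real protocol this mask is derived by u and v from a shared seed.
            mask = rng.integers(0, modulus, size=updates[0].shape)
            masked[u] = (masked[u] + mask) % modulus   # u adds the mask
            masked[v] = (masked[v] - mask) % modulus   # v subtracts the same mask
    return masked

R = 2**16
updates = [np.array([1, 2, 3]), np.array([10, 20, 30]), np.array([100, 200, 300])]
masked = masked_updates(updates, R)
aggregate = sum(masked) % R   # the server only sees the masked vectors
```

The server recovers the exact sum, while each individual masked vector looks uniformly random to it.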

Differential Privacy
Besides SA, DP is also extensively utilized in the FL literature. DP was introduced by Dwork et al. [30]; it ensures that the addition, removal, or modification of a single data point does not substantially affect the outcome of the data-based analysis. One of the core strengths of DP comes from its properties, called composition and post-processing, which we also utilize in this paper. The former ensures that the combined output of two DP mechanisms still satisfies DP but with a parameter change. The latter ensures that a transformation of the results of a DP mechanism does not affect the corresponding privacy guarantees. Typically, DP is enforced by injecting calibrated noise (e.g., Laplacian or Gaussian) into the computation.

Definition 1 (ε-Differential Privacy).
A randomized mechanism $\mathcal{M} : X \to R$ with domain $X$ and range $R$ satisfies ε-differential privacy if for any two adjacent inputs $x, x' \in X$ and for any subset of outputs $S \subseteq R$ it holds that

$$\Pr[\mathcal{M}(x) \in S] \le e^{\varepsilon} \cdot \Pr[\mathcal{M}(x') \in S]. \quad (2)$$

The variable ε is called the privacy budget, which measures the privacy loss. It captures the trade-off between privacy and utility: the lower its value, the more noise is required to satisfy Equation (2), resulting in higher utility loss. Another widely used DP notion is approximate DP, i.e., (ε, δ)-DP, where a small additive term δ is added to the right side of Equation (2). Typically, we are interested in values of δ that are smaller than the inverse of the database size. Although DP has been adopted in many domains [7], such as recommendation systems [31], we are not aware of any work besides [18] which adopts DP for SVD computation. Thus, as we later show a flaw in that work, we are the first to provide a distributed SVD computation with DP guarantees.
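As a small numerical illustration of the definition, the density ratio of the Laplace mechanism on two adjacent inputs can be checked against the bound $e^{\varepsilon}$. The Laplace mechanism is used here only because its density is easy to evaluate; the parameter values and the helper `laplace_pdf` are assumptions made for the example.

```python
import numpy as np

def laplace_pdf(y, mean, scale):
    """Density of the Laplace distribution with the given mean and scale."""
    return np.exp(-np.abs(y - mean) / scale) / (2.0 * scale)

eps = 0.5            # privacy budget
sensitivity = 1.0    # adjacent inputs change the query result by at most 1
scale = sensitivity / eps

# Query results on two adjacent inputs: f(x) = 0 and f(x') = 1.
ys = np.linspace(-20.0, 20.0, 2001)
ratio = laplace_pdf(ys, 0.0, scale) / laplace_pdf(ys, sensitivity, scale)
max_ratio = float(ratio.max())
bound = float(np.exp(eps))
```

On this grid the largest ratio matches $e^{\varepsilon}$ up to floating-point error, so the bound is tight for this mechanism.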

The FedPower Algorithm
Following Guo et al. [18], we assume there are $n$ node devices, and each device $i$ holds an independent dataset, an $s_i$-by-$d$ matrix $M_i$. Each row represents a record item, while the columns of each matrix correspond to the same feature space. Moreover, $M$ denotes the composition of the matrices, i.e., $M^T = [M_1^T, M_2^T, \ldots, M_n^T]$. The solution proposed by Guo et al. [18] is presented in Algorithm 1 with the following parameters.
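Algorithm 1. FedPower [18].
Input: Datasets $\{M_i\}_{i=1}^{n}$, target rank $k$, iteration rank $r \ge k$, number of iterations $T$, synchronous set $I_T^p$, and the variance of noises $(\sigma, \sigma')$
1: initialise $Z_0^{(i)} = Z_0 \in \mathbb{R}^{d \times r}$ with orthonormal columns
2: for $t = 1$ to $T$ do
3:   each node device $i$ computes $Y_t^{(i)} = M'_i Z_{t-1}^{(i)}$
4:   if $t \in I_T^p$ then
5:     each node device $i$ adjusts $Y_t^{(i)} = Y_t^{(i)} D_t^{(i)}$
6:     each node device $i$ adds the Gaussian noise (scaled by $\sigma$) to $Y_t^{(i)}$
7:     each node device $i$ sends $Y_t^{(i)}$ to the server
8:     the server performs perturbed aggregation of the received $Y_t^{(i)}$ into $Y_t$ with an extra Gaussian noise (scaled by $\sigma'$)
9:     the server broadcasts $Y_t$ to all node devices
10:    each node device $i$ sets $Y_t^{(i)} = Y_t$
11:  end if
12:  each node device $i$ performs orthogonalization: $Z_t^{(i)} = \mathrm{orth}(Y_t^{(i)})$
13: end for
14: return the approximated eigenspace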
In the proposed solution, each node device holds its raw data and processes the SVD locally, and its eigenvectors are aggregated on the server with the orthogonal Procrustes transformation (OPT) mechanism. The basic idea behind OPT is to find an orthogonal transformation matrix that maps one set of vectors onto another while preserving their relative characteristics. The aggregation result is then sent back for further iterations. More details (e.g., the computation of $D_t^{(i)}$) are given in [18].
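The idea behind the orthogonal Procrustes transformation can be sketched with the classical SVD-based solution. This is a generic textbook construction, not necessarily the exact way $D_t^{(i)}$ is computed in [18]; the function name `procrustes_rotation` and the test matrices are assumptions made for the example.

```python
import numpy as np

def procrustes_rotation(Z_local, Z_ref):
    """Orthogonal matrix D minimizing ||Z_local @ D - Z_ref||_F."""
    U, _, Vt = np.linalg.svd(Z_local.T @ Z_ref)
    return U @ Vt

rng = np.random.default_rng(0)
d, r = 8, 3
Z_ref, _ = np.linalg.qr(rng.standard_normal((d, r)))   # reference basis
Q, _ = np.linalg.qr(rng.standard_normal((r, r)))       # random rotation
Z_local = Z_ref @ Q                                     # same subspace, rotated basis

D = procrustes_rotation(Z_local, Z_ref)
aligned = Z_local @ D                                   # rotated back onto Z_ref
```

Aligning the local bases in this way before averaging avoids the situation where two devices hold the same subspace but their bases cancel each other out.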

Enhancing the Utility of FedPower
Adversary Model. Throughout this paper, we consider a semi-honest setup, i.e., where the clients and the server are honest but curious. This means that they follow the protocol truthfully, but in the meantime, they try to learn as much as possible about the dataset of other participants. We also assume that the server and the clients cannot collude, so the server cannot control node devices.
Utility Analysis of FedPower. It is not a surprise that adding Gaussian noise twice (i.e., the local and the central noise in Steps 6 and 8 in Algorithm 1) severely affects the accuracy of the final result. A straightforward way to increase the utility is to eliminate some of this noise. As highlighted in Table 1, the local noise protects the individual clients from the server. Moreover, it also protects the aggregate from other clients and from external attackers. On the other hand, the central noise merely covers the aggregate. Hence, if the protection level against the server is sufficient against other clients and external attackers, the central noise becomes obsolete.
Moreover, all the locally added noise accumulates during aggregation, which also negatively affects the utility of the final result. Loosely speaking, as shown in Table 1, CDP combined with SA could provide the same protection as LDP. Consequently, by utilizing cryptographic techniques with a single local noise, we can hide the individual updates and protect the aggregate as well.
Utility-Enhanced FedPower. We improve on FedPower [18] from two aspects: (1) we apply an SA protocol to hide the individual intermediate results of the node devices from the server, and (2) we use a secure multi-party computation (SMPC) protocol to enforce the CDP in a manner that is oblivious to the server. In SMPC, multiple parties can jointly compute a function over their private inputs without revealing those inputs to each other or to the server. More details on this topic can be found in the book [32]. We supplement the assumptions and the setup of Guo et al. [18] with a homomorphic encryption key pair generated by the server. The server holds the private key and shares the public key with all node devices. The remaining part of our solution is shown in Algorithm 2. To ease understanding, the pseudo code is simplified. The actual implementation is more optimized, e.g., the encrypted results are aggregated before decryption in Step 11, and in Step 7, the ciphertexts are re-randomized rather than generated from scratch. We describe all these tricks in Section 6.
By performing SA in Step 7, the server obtains the aggregated result with the Gaussian noises from all node devices. With the simple SMPC procedure (Steps 8-12), the server receives all Gaussian noises apart from the one of the randomly selected node device j (whose identity is hidden from the node devices). Then, in Step 13, it removes them from the output of the SA protocol. Compared with FedPower [18], our intermediate aggregation result only contains a single instance of Gaussian noise from the randomly chosen node device instead of n. Consequently, via SA and SMPC, the proposed utility-enhancing protocol reduces the locally added noise n-fold and completely eliminates the central noise.
Computational Complexity. Regarding computational complexity, we compare the proposed scheme with the original solution in Table 2. The major difference is that we have integrated SA to facilitate our new privacy protection strategy. Let SA_e and SA_s be the asymptotic computational complexities of SA on each node device and on the server side, respectively. Although we have added more operations, as seen in Table 2, we have distributed some computations to individual node devices. Most importantly, we no longer add secondary server-side Gaussian noise to the final aggregation result and only retain the Gaussian noise from one node device.
Analysis. As we mentioned in our adversarial model, the semi-honest server cannot collude with any of the node devices, which are also semi-honest. Thus, the server cannot eliminate the remaining noise from the final result. As for the node devices, since no one except the server is aware of the random index in Step 8, a node device only knows, apart from its own data, the aggregation result with the added noise, even if the retained noise comes from itself.
Algorithm 2. Utility-enhanced FedPower.
Input: Datasets $\{M_i\}_{i=1}^{n}$, target rank $k$, iteration rank $r$, number of iterations $T$, synchronous trigger $p$, the variance of noise $\sigma$, and key pair $(sk_{hm}, pk_{hm})$
Output: Approximated eigenspace $Z_T$
1: initialise $Z_0^{(i)} = Z_0 \in \mathbb{R}^{d \times r} \sim N(0, 1)^{d \times r}$ with orthonormal columns and generate an $r \times r$ zero matrix $P$ and an all-ones matrix $P'$ of the same size
2: for $t = 1$ to $T$ do
...
6:   each node device $i$ adds Gaussian noise (scaled by $\sigma$) to $Y_t^{(i)}$
7:   the SA protocol is executed among the server and all node devices, with inputs $Y_t^{(i)}$ and output $Y_t$
8:   the server chooses one random index $j \in [1, n]$ and encrypts $P$ and $P'$
9:   the server sends the values $C^{(j)}$ and $C^{(j')}$ to the appropriate node devices
10:  each node device $i$ computes $C^{(i)}$ from the received ciphertext and its own Gaussian noise
11:  each node device $i$ sends $C^{(i)}$ back to the server
12:  for all $i \in [1, n] \setminus \{j\}$, the server decrypts the received messages
13:  the server updates the aggregation result $Y_t$ by removing the decrypted noises
14:  the server performs orthogonalization $Z_t = \mathrm{orth}(Y_t)$
15:  the server broadcasts $Z_t$ to all node devices

Compared with the original solution by Guo et al. [18], we have improved the utility of the aggregation result by keeping the added noise from one single node device. As a side effect, the complexity has grown due to the SA protocol. This is a trade-off between result accuracy and solution efficiency.
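The noise-removal idea of Steps 8-13 can be sketched with a toy additively homomorphic scheme. The code below is a heavily simplified, scalar illustration: it implements textbook Paillier with two small, insecure primes, simulates the SA output as a plain sum, and encodes real values as fixed-point integers. All names (`keygen`, `encrypt`, `decrypt`, `to_signed`, `SCALE`) and all parameter values are assumptions made for this sketch and do not reproduce the authors' implementation.

```python
import math
import random

P_PRIME, Q_PRIME = 104_729, 1_299_709   # toy primes, far too small for real use

def keygen(p=P_PRIME, q=Q_PRIME):
    """Textbook Paillier with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)                 # modular inverse (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m, rng):
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (1 + (m % n) * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, sk, c):
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def to_signed(n, v):
    """Map a residue modulo n back to a signed integer."""
    return v - n if v > n // 2 else v

rng = random.Random(0)
n, sk = keygen()
SCALE = 1000                             # fixed-point encoding of real values

updates = [1.5, -2.0, 0.25, 4.0]         # one scalar per node device
update_int = [round(y * SCALE) for y in updates]
noise_int = [round(rng.gauss(0, 0.1) * SCALE) for _ in updates]

# Step 7 (simulated): SA reveals only the sum of the noisy updates.
aggregate = sum(y + e for y, e in zip(update_int, noise_int))

# Step 8: the server picks j and encrypts 0 for node j and 1 for the others.
j = rng.randrange(len(updates))
selectors = [encrypt(n, 0 if i == j else 1, rng) for i in range(len(updates))]

# Step 10: each node raises its ciphertext to its noise, i.e., Enc(b)^e = Enc(b * e).
replies = [pow(c, e % n, n * n) for c, e in zip(selectors, noise_int)]

# Steps 12-13: the server decrypts all replies but the j-th and removes the noises.
removed = sum(to_signed(n, decrypt(n, sk, c))
              for i, c in enumerate(replies) if i != j)
result = (aggregate - removed) / SCALE
expected = (sum(update_int) + noise_int[j]) / SCALE
```

Since node device j receives an encryption of 0, its reply decrypts to 0 regardless of its noise, so its noise is the only one left in the aggregate and no node can tell whether it was the selected one.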

Differentially Private Federated SVD Solution
Privacy Analysis of FedPower. Algorithm 1 injects noise both at the local (Step 6) and the global (Step 8) levels. Consequently, the claimed privacy protection of Algorithm 1 is (2ε, 2δ)-DP, which originates from (ε, δ)-LDP and (ε, δ)-CDP [18]. Firstly, as we highlighted in Table 1, LDP and CDP provide different privacy protections; hence, merely combining them is inappropriate, so the claim must be more precise. Instead, Algorithm 1 seems to provide (ε, δ)-DP for the clients from the server and stronger protection (due to the additional central noise) from other clients and external attackers.
Yet, this is still not entirely sound, as not all computations were included in the sensitivity calculation; hence, the noise scaling is incorrect. Indeed, the authors only considered the sensitivity of the multiplication with Z in Step 3 when determining the variance of the Gaussian noise in Step 6; however, the noise is only added after the multiplication with D in Step 5. Thus, the sensitivity of the orthogonalization is discarded.
Privacy-Enhanced FedPower. We improve on FedPower [18] from two aspects: (1) we incorporate clipping in the protocol to bound the sensitivity of the local operations performed by the clients and (2) we use SA with DP to obtain a strong privacy guarantee. For this reason, similar to FedPower [18], we assume that for all $i$ the elements of $M'_i = \frac{1}{s_i} M_i^T M_i$ are bounded by $m$. In Algorithm 1, the computations the nodes undertake (besides noise injection at Step 6) are in Steps 3, 5 and 12, where the last two could be either discarded for the sensitivity computation or completely removed, as explained below.

• Step 12: Orthogonalization is intricate, so its sensitivity is not necessarily traceable. To tackle this, we propose applying the noise before the orthogonalization, in which case the orthogonalization would not affect the privacy guarantee, as it would count as post-processing.

• Step 5: We remove this client-side operation from our privacy-enhanced solution, as it is not essential; only the convergence speed would be affected slightly.

The FedPower protocol with enhanced privacy is presented in Algorithm 3, where besides the orthogonalization, clipping is also performed with the bound ẑ. The only client operation which must be considered for the sensitivity computation (i.e., before noise injection) is Step 3. We calculate its sensitivity in Theorem 1.
Proof. To make the proof easier to follow, we remove the subscript round counter from the notation. Let us define $M$ and $\tilde{M}$ such that they are equal except at position $(i, j)$ with $1 \le i, j \le d$. Multiplying $Z$ with these from the left results in $Y$ and $\tilde{Y}$, respectively, which are the same except in row $i$:

$$Y_{i,\cdot} - \tilde{Y}_{i,\cdot} = (m_{ij} - \tilde{m}_{ij}) \, Z_{j,\cdot}.$$

Hence, the Euclidean distance of $Y$ and $\tilde{Y}$ boils down to this row $i$:

$$\|Y - \tilde{Y}\|_2 = |m_{ij} - \tilde{m}_{ij}| \cdot \|Z_{j,\cdot}\|_2 \le 2m \cdot \sqrt{r} \, \hat{z}.$$

It is known that adding Gaussian noise with $\sigma = \frac{s \sqrt{2 \ln(1.25/\delta)}}{\varepsilon}$ (where $s$ is the sensitivity) results in (ε, δ)-DP. As a corollary, we can state in Theorem 2 that a single round in Algorithm 3 is differentially private. An even tighter result was presented in [33]; we leave the exploration of this as future work. The best practice is to set δ as the inverse of the size of the underlying dataset, so there is a direct connection between the variance σ and the privacy parameter ε.

Theorem 2. If $T = 1$, then Algorithm 3 provides (ε, δ)-DP, where ε follows from the above noise formula with the sensitivity of Theorem 1.

Proof. Can be verified by combining the provided formula with the appropriate sensitivity.
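The sensitivity argument can be checked numerically. The sketch below assumes that the clipping bounds every element of $Z$ by ẑ, which gives the bound $2 m \hat{z} \sqrt{r}$ for a change in a single entry of the local matrix, and it uses the classical Gaussian mechanism calibration; the function names and the small random matrices are assumptions made for the example.

```python
import numpy as np

def sensitivity_bound(m, z_hat, r):
    """Upper bound on ||MZ - M~Z||_2 when M and M~ differ in one entry."""
    return 2.0 * m * z_hat * np.sqrt(r)

def gaussian_sigma(sensitivity, eps, delta):
    """Classical Gaussian mechanism calibration for (eps, delta)-DP."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

rng = np.random.default_rng(0)
d, r, m, z_hat = 6, 3, 0.05, 0.2

M1 = rng.uniform(-m, m, size=(d, d))
M2 = M1.copy()
M1[2, 4], M2[2, 4] = m, -m               # worst-case change at one position
Z = np.clip(rng.standard_normal((d, r)), -z_hat, z_hat)   # element-wise clipping

observed = float(np.linalg.norm(M1 @ Z - M2 @ Z))
bound = float(sensitivity_bound(m, z_hat, r))
sigma = float(gaussian_sigma(bound, eps=1.0, delta=1e-5))
```

The observed distance never exceeds the bound, and the required noise scale grows linearly with the sensitivity, which is why tighter clipping directly translates into less noise.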
Algorithm 3. Privacy-enhanced FedPower.
Input: Datasets $\{M_i\}_{i=1}^{n}$, target rank $k$, iteration rank $r$, number of iterations $T$, the clipping bound ẑ, the variance of noise $\sigma$
Output: Approximated eigenspace $Z_T$
...
each node device $i$ adds Gaussian noise
...

One can easily extend this result for $T > 1$ with the composition property of DP: Algorithm 3 satisfies $(T \cdot \varepsilon, T \cdot \delta)$-DP. Besides this basic loose composition, one can obtain better results by utilizing more involved composition theorems such as in [34]. We leave this for future work.
Analysis. Similarly to Section 4, we protect the individual intermediate results with SA. On the other hand, it is equivalent to generate n Gaussian noises with variance $\sigma$ and select one, or to generate n Gaussian noises with variance $\frac{\sigma}{n}$ and sum them all up. Consequently, instead of relying on an SMPC protocol to eliminate most of the local noise, we could merely scale them down. Combining SA with such a downsized local noise is, in fact, a common practice in FL: this is what distributed differential privacy (DDP) [35] does, i.e., DDP combined with SA provides LDP but with n times smaller noise, where n is the number of participants.
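The equivalence of the two noise-generation strategies can be verified empirically. The following sketch compares the sample variance of a single Gaussian noise with variance σ against the sum of n independent noises with variance σ/n; the number of trials and the concrete values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, var, trials = 100, 0.1, 20_000

# Strategy 1: keep a single noise with variance var.
single = rng.normal(0.0, np.sqrt(var), size=trials)

# Strategy 2: each of the n devices adds a noise with variance var / n,
# and the aggregation sums them up.
summed = rng.normal(0.0, np.sqrt(var / n), size=(trials, n)).sum(axis=1)

var_single = float(single.var())
var_summed = float(summed.var())
```

Both sample variances are close to σ, which is why the scaled-down local noise combined with SA can replace the SMPC step.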

Empirical Comparison
In order to compare our proposed schemes with FedPower, we implement the schemes in Python [36]. As we only encrypt 0 and 1 in Section 4, we optimize the performance by taking advantage of the utilized Paillier cryptosystem. More specifically, we re-randomize the corresponding ciphertexts to obtain new ciphertexts. In addition, we also exploit the homomorphic property: instead of decrypting each value (d × r × n times, where n is the number of node devices), we first calculate the product of all the ciphertexts (element-wise matrix multiplication) and then perform the decryption on a single matrix. In this way, we obtain the sum of all Gaussian noises more efficiently. The decryption result is the sum of noise which will be canceled in Algorithm 2. Furthermore, we prepare the encryptions of 0 and all keys of SA offline for each node device i.
Metric. We use the Euclidean distance to represent the similarity of two $m \times n$ matrices $A$ and $B$:

$$\mathrm{dist}(A, B) = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} (a_{ij} - b_{ij})^2}.$$

Let $Z$ denote the true eigenspace computed without any noise, let $Z_g(\sigma, \sigma')$ denote the eigenspace generated with Algorithm 1, let $Z_u(\sigma)$ denote the eigenspace generated with Algorithm 2, and let $Z_p(\sigma)$ denote the eigenspace generated with Algorithm 3.
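The metric can be computed in one line with NumPy. This is a minimal sketch assuming that the Euclidean distance between two matrices is taken element-wise (i.e., the Frobenius norm of their difference); the function name `dist` mirrors the notation of the text.

```python
import numpy as np

def dist(A, B):
    """Euclidean (Frobenius) distance between two matrices of equal shape."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    return float(np.sqrt(np.sum((A - B) ** 2)))

example = dist([[3.0, 0.0], [0.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]])
```

Lower values indicate that the computed eigenspace is closer to the noise-free one.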

Setup. For our experiments, we used the well-known NETFLIX rating dataset [37], and we pre-processed it similarly to [38] (instead of 10, we removed users and movies with fewer than 50 ratings). It consists of 96,310,835 ratings corresponding to 17,711 movies from 324,468 users. We split them horizontally into 100 random blocks to simulate node devices. Moreover, we set the security parameter to 128; thus, we adopt 3072 bits for N in the Paillier cryptosystem (this is equivalent to RSA-3072, which provides a 128-bit security level [39]). The iteration rank and the number of top eigenvectors are set to r = k = 10, and we keep the same synchronous trigger p = 4 as [18]. To compare FedPower with our enhanced solutions, we set the noise size for these algorithms as $\sigma = \sigma' = 0.1$. Moreover, for Algorithm 3 we bounded $M'_i$ with 0.05 and $Z_t^{(i)}$ with 0.2 for all possible i and t. Using Theorem 2, we can calculate that a single round corresponds to privacy budget ε = 30.6 with $\delta = 10^{-5}$.
In order to determine the number of global rounds T, we set up a small experiment. We built a data matrix M of size 3000 × 100 filled with integers in [0, 5], and randomly divided it among 100 node devices (each has at least 10 rows). We executed Algorithm 1 for 200 rounds and compared the distance between the aggregation result Z and the real singular values of M. From the result in Figure 2, we can see that convergence occurs around round 92, since the subsequent results vary only slightly (<1%). Thus, we set T = 92 for our experiments. The experiment is implemented in a Docker container with a 40-core Intel(R) Xeon(R) Silver 4210 CPU @ 2.20 GHz and 755 GB RAM. We run our experiments 10 times and take the average execution time.
Results. Firstly, we compare the efficiency of our enhanced schemes and the original algorithm. The computation times are presented in Table 3. Compared with FedPower, the overall computation burden of the devices increased by a factor of 39.68 for the utility-enhanced solution in Section 4 and only by a factor of 1.74 for the privacy-enhanced solution in Section 5. Concerning the server, the increase is ×6.97 and ×1.17, respectively. The rise in computational demand comes with benefits. Concerning Algorithm 2, significant progress is achieved in the utility while it offers a similar privacy guarantee as FedPower. Concerning Algorithm 3, the privacy guarantee is more robust, as it provides a formal DDP protection (while FedPower fails to satisfy DP). Moreover, it obtains a higher utility, which could make this solution preferable despite its computational overhead. We compare the distance between the results of each algorithm and the real eigenvalues, as shown in Figure 3, and the utility is improved (i.e., the distances are lower) with both Algorithms 2 and 3. Our utility-enhanced solution significantly outperforms FedPower: after 92 rounds, the obtained error of our scheme is almost three times (2.74×) smaller than that for FedPower. The final error of Algorithm 2 is $\mathrm{dist}(Z, Z_u(\sigma)) = 6.72$, while this value for Algorithm 1 is $\mathrm{dist}(Z, Z_g(\sigma, \sigma')) = 18.42$. Note that this level of accuracy (∼18.5) was obtained using our method in the 32nd round, i.e., almost three times (2.88×) faster. Hence, the superior convergence speed can compensate for most of the computational increase caused by SA and SMPC.
Let us shift our attention to our privacy-enhanced solution. In that case, we can see that besides more robust privacy protection, our solution offers better utility: Algorithms 1 and 3 obtain $\mathrm{dist}(Z, Z_g(\sigma, \sigma')) = 18.42$ and $\mathrm{dist}(Z, Z_p(\sigma)) = 13.94$ RMSE values, respectively, i.e., we acquired a 24% error reduction. Our method (with actual DP guarantees) achieved the same level of accuracy (∼18.5) already after 65 rounds, which is a 29% convergence speed increase.
We also compare our two proposed schemes in such a way that the size of the accumulated noises is equal. Besides the nature of noise injection (many small vs. one large), the only factor that differentiates the results is the clipping bounds. As expected, the error is 1.65× larger with clipping, i.e., $\mathrm{dist}(Z, Z_p(\frac{\sigma}{10})) = 11.11$ compared with $\mathrm{dist}(Z, Z_u(\sigma)) = 6.72$. Concerning the convergence speed, the utility-enhanced solution is 1.7× faster, reaching similar accuracy (∼11) in round 54. Note though that this result still vastly outperforms FedPower: the accuracy and the convergence speed are increased by 40% and 43%, respectively.
Finally, we study the effect of different levels of privacy protection on the accuracy of each algorithm. As we noticed in Figure 3, after the 60th round, the error ratios of the algorithms are reasonably stable, so for this experiment, we set T = 60. Since the clipping rate ẑ and the noise variance σ both contribute to the privacy parameter ε (as seen in Theorem 2), we varied each independently. Our results are presented in Figure 4. It is visible that the previously seen trends hold with other levels of privacy protection, making our proposed schemes favorable for a wide range of settings.

Conclusions
Motivated by Guo et al.'s distributed privacy-preserving SVD algorithm based on the federated power method [18], we have proposed two enhanced federated SVD schemes, focusing on utility and privacy, respectively. Both use secure aggregation to reduce the added noise, which reverts to the initial design intent and interest. Yet, the added cryptographic operations trade efficiency for superior performance (×10 better results) while providing either similar or superior privacy guarantee. Our work leaves several future research topics. One is to further investigate the computational complexity, particularly the secure aggregation, to achieve more efficient solutions. Another is to investigate the scalability of the proposed solutions, regarding larger datasets and different datasets in applications other than recommendation systems. In addition, scalability also concerns the number of node devices. Yet another topic is to look further into the security assumptions. For example, the security assumptions can be weakened so that the server is allowed to collude with one or more node devices.

s takes threshold t as an input and shares the corresponding inputs to a user subset $V \subseteq U$ such that $|V| \ge t$, and outputs a field element s.

2.2. $s_{u,v}$ is expanded using PRG into a random vector $p_{u,v} = \Delta_{u,v} \cdot \mathrm{PRG}(s_{u,v})$, where $\Delta_{u,v} = 1$ when $u > v$ and $\Delta_{u,v} = -1$ when $u < v$; moreover, define $p_{u,u} = 0$.
2.3. The node device u computes its own private mask vector $p_u = \mathrm{PRG}(b_u)$ and masks the input vector $x_u$ into $y_u \leftarrow x_u + p_u + \sum_{v \in U_2} p_{u,v} \pmod{R}$; then, $y_u$ is sent to the server.
2.4. If the server receives at least t messages (denote with $U_3 \subseteq U_2$ this set of node devices), it shares the node device set $U_3$ with all node devices in $U_3$.
Once the node device $u \in U_3$ receives the message, it returns the signature $\sigma_u \leftarrow \mathrm{SIG.sign}(d_u^{SK}, U_3)$.
Each node device u sends the shares $s_{v,u}^{SK}$ for node devices $v \in U_2 \setminus U_3$ and $b_{v,u}$ for node devices $v \in U_3$ to the server.
4.4. If the server receives at least t messages (denote with $U_5$ this set of node devices), it re-constructs, for each node device $u \in U_2 \setminus U_3$, $s_u^{SK} \leftarrow \mathrm{SS.recon}(\{s_{u,v}^{SK}\}_{v \in U_5}, t)$ and re-computes $p_{v,u}$ using the PRG for all $v \in U_3$.
4.5. The server also re-constructs, for all node devices $u \in U_3$, $b_u \leftarrow \mathrm{SS.recon}(\{b_{u,v}\}_{v \in U_5}, t)$ and re-computes $p_u$ using the PRG.
We summarize the asymptotic computational complexity of each node device and the server in Table A1. For simplicity of description, we assume that all devices participate in the protocol, that is, t = m. Since some operations can be considered as offline preconfiguration, we focus on online operations starting from masking messages in Step 2.3.

Figure 1. Singular value decomposition.

If $\bar{M} = \frac{1}{s} M^T M \in \mathbb{R}^{d \times d}$, then the Power Method [23] can be used to compute the top $k$ right singular vectors of $M$ and the top $k$ eigenvectors of $\bar{M}$. It works by iterating $Y = \bar{M} Z$ and $Z = \mathrm{orth}(Y)$, where both $Y$ and $Z$ are $d \times k$ matrices and $\mathrm{orth}(\cdot)$ is the orthogonalization of the columns with QR factorization. Moreover, if $M$ is the composition of $n$ matrices, then the computation of the Power Method can be distributed. So, if $M = [M_1^T, M_2^T, \ldots, M_n^T]^T \in \mathbb{R}^{s \times d}$ with $s = \sum_{i=1}^{n} s_i$, where $M_i \in \mathbb{R}^{s_i \times d}$ and $\bar{M}_i = \frac{1}{s_i} M_i^T M_i$, then Equation (1) holds. Thereby, $Y$ can be written as $Y = \sum_{i=1}^{n} \frac{s_i}{s} \bar{M}_i Z$.
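The following minimal sketch checks numerically that one distributed iteration gives the same result as the centralized one. It uses plain Python lists to stay dependency-free (a numerical library would be the practical choice), and modified Gram-Schmidt stands in for the QR-based orth(·). The helper names, the random data, and the partition sizes are illustrative assumptions.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def orth(Y):
    """Orthonormalize the columns of Y with modified Gram-Schmidt."""
    cols = []
    for col in transpose(Y):
        v = list(col)
        for q in cols:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        cols.append([a / norm for a in v])
    return transpose(cols)

def local_gram(M_i):
    """Node-side matrix (1 / s_i) * M_i^T M_i."""
    s_i = len(M_i)
    G = matmul(transpose(M_i), M_i)
    return [[g / s_i for g in row] for row in G]

def federated_power_step(local_grams, sizes, Z):
    """One iteration: Y = sum_i (s_i / s) * Mbar_i Z, then Z = orth(Y)."""
    s = sum(sizes)
    d, k = len(Z), len(Z[0])
    Y = [[0.0] * k for _ in range(d)]
    for G_i, s_i in zip(local_grams, sizes):
        Y_i = matmul(G_i, Z)  # computed locally by node i
        for r in range(d):
            for c in range(k):
                Y[r][c] += (s_i / s) * Y_i[r][c]
    return orth(Y)

rng = random.Random(0)
d, k = 4, 2
# Hypothetical data: three nodes holding 5, 7, and 3 rows of dimension d.
parts = [[[rng.gauss(0, 1) for _ in range(d)] for _ in range(s_i)]
         for s_i in (5, 7, 3)]
sizes = [len(P) for P in parts]
grams = [local_gram(P) for P in parts]

# Centralized reference: Mbar = (1 / s) * M^T M over the stacked rows.
M = [row for P in parts for row in P]
Mbar = local_gram(M)

Z0 = orth([[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)])
Z_fed = federated_power_step(grams, sizes, Z0)
Z_cen = orth(matmul(Mbar, Z0))
max_diff = max(abs(a - b) for ra, rb in zip(Z_fed, Z_cen) for a, b in zip(ra, rb))
```

Here `max_diff` stays at the level of floating-point rounding, since the weighted sum of the local products equals $\bar{M} Z$. Repeating `federated_power_step` would carry out the full iteration.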

• $T$: the number of local computations performed by each node device.
• $I_p^T$: the rounds where the node devices and the server communicate, i.e., $I_p^T = \{0, p, 2p, \ldots, p\lfloor T/p \rfloor\}$.
• $(\varepsilon, \delta)$: the privacy budget.
• $(\sigma, \sigma')$: the variance of noises added by the clients and the server, respectively:

Theorem 1. If we assume $|m_{ij}| \le m$ for all $i, j \in [1, d]$, then the sensitivity (calculated via the Euclidean distance) of the client-side operations (i.e., Step 3 in Algorithm 3) is bounded by 2

Figure 4. The effect of various privacy parameters on the accuracy for Algorithms 1-3.

Table 1. Comparing secure aggregation with local and central differential privacy.
• Key Agreement [43]: $\mathrm{KA.param}(k) \to pp$ takes a security parameter $k$ and returns some public parameters; $\mathrm{KA.gen}(pp) \to (s^{SK}, s^{PK})$ generates a secret/public key pair; $\mathrm{KA.agree}(s^{SK}_u, s^{PK}_v) \to s_{u,v}$ allows a user $u$ to combine its private key with the public key of another user $v$ into a private shared key between them.
• Authenticated Encryption [44]: AE.enc and AE.dec are algorithms for encrypting a plaintext with a public key and for decrypting a ciphertext with a secret key.
• Signature Scheme [45]: SIG.gen takes a security parameter $k$ and outputs a secret/public key pair; SIG.sign signs a message with a secret key and returns the relevant signature; and SIG.ver verifies the signature of the relevant message and returns a boolean bit indicating whether the signature is valid.

The complete execution of the protocol between node devices and the server is provided in the following.

If the server receives at least $t$ messages from individual node devices (denote by $U_1$ this set of node devices), then it broadcasts $\{(v, c^{PK}_v, s^{PK}_v, \sigma_v)\}_{v \in U_1}$ to all node devices in $U_1$.
1.1. Once a node device $u$ in $U_1$ receives the messages from the server, it verifies if all signatures are valid with $\mathrm{SIG.ver}(d^{PK}_u, c^{PK}_u \| s^{PK}_u, \sigma_u)$, where $u \in U_1$.
1.2. The node device $u$ samples a random element $b_u \leftarrow \mathbb{F}$ as a seed for a PRG.
1.3. The node device $u$ generates two sets of $t$-out-of-$|U_1|$ shares, one of $s^{SK}_u$: $\{(v, s^{SK}_{u,v})\}_{v \in U_1} \leftarrow \mathrm{SS.share}(s^{SK}_u, t, U_1)$, and one of $b_u$: $\{(v, b_{u,v})\}_{v \in U_1} \leftarrow \mathrm{SS.share}(b_u, t, U_1)$.
1.4. For each node device $v \in U_1 \setminus \{u\}$, $u$ computes $e_{u,v} \leftarrow \mathrm{AE.enc}(\mathrm{KA.agree}(c^{SK}_u, c^{PK}_v), u \| v \| s^{SK}_{u,v} \| b_{u,v})$ and sends them to the server.
1.5. If the server receives at least $t$ messages from individual node devices (denote by $U_2 \subseteq U_1$ this set of node devices), then it shares with each node device $u \in U_2$ all ciphertexts for it, $\{e_{u,v}\}_{v \in U_2}$.
2.1. For the node device $u \in U_2$, once the ciphertexts are received, it computes $s_{u,v} \leftarrow \mathrm{KA.agree}(s^{SK}_u, s^{PK}_v)$, where $v \in U_2 \setminus \{u\}$.
3.2. If the server receives at least $t$ messages (denote by $U_4 \subseteq U_3$ this set of node devices), it shares the set $\{u', \sigma_{u'}\}_{u' \in U_4}$.
4.1. Each node device $u$ verifies $\mathrm{SIG.ver}(d^{PK}_v, U_3, \sigma_v)$ for all $v \in U_4$.
4.2. For each node device $v \in U_2 \setminus \{u\}$, $u$ decrypts the ciphertext (received in the MaskedInputCollection round) $v' \| u' \| s^{SK}_{v,u} \| b_{v,u} \leftarrow \mathrm{AE.dec}(\mathrm{KA.agree}(c^{SK}_u, c^{PK}_v), e_{v,u})$ and asserts that u

Table A1. Asymptotic computational complexity of online operations.