Minimizing Computation and Communication Costs of Two-Sided Secure Distributed Matrix Multiplication under Arbitrary Collusion Pattern

This paper studies the problem of minimizing the total cost, including the computation cost and the communication cost, in a system of two-sided secure distributed matrix multiplication (SDMM) under an arbitrary collusion pattern. To perform SDMM, the two input matrices are split into blocks, blocks of random matrices are appended to protect the security of the two input matrices, and encoded copies of the blocks are distributed to the computing nodes for matrix multiplication. Our aim is to minimize the total cost, over all matrix splitting factors, the number of appended random matrices, and the distribution vector, while satisfying the security constraint of the two input matrices, the decodability constraint of the desired result of the multiplication, the storage capacity of the computing nodes, and the delay constraint. First, a strategy of appending zeros to the input matrices is proposed to overcome the divisibility problem of matrix splitting. Next, the optimization problem is divided into two subproblems with the aid of alternating optimization (AO), from which a feasible solution can be obtained. In addition, some necessary conditions for the problem to be feasible are provided. Simulation results demonstrate the superiority of our proposed scheme compared to the scheme without appending zeros and the scheme without alternating optimization.


Introduction
With the development of the Internet of Things (IoT), ubiquitous wireless devices can generate massive data via environment monitoring or target tracking [1]. However, due to limited power or hardware architecture, these wireless devices cannot satisfy the data processing and computation requirements by themselves. This inspires wireless devices to seek help from online computing nodes that can assist in computation and data processing. Furthermore, distributed computing nodes can be employed to further accelerate computation and data processing tasks, which means wireless devices can assign computation tasks to many different computing nodes, e.g., in Apache Spark [2] and MapReduce [3]. On the other hand, if the online computing nodes are untrustworthy, we should also guarantee data security. Hence, how to perform computation on data with the aid of distributed computing nodes in a secure fashion is an important problem.
In this paper, we focus on the secure distributed matrix multiplication (SDMM) problem [4-8]. In [7,8], a trace-mapping framework was employed to achieve communication-efficient SDMM schemes. The authors of [9] proposed a model of SDMM from an information-theoretic perspective: the user wishes to compute the product of two input matrices A and B with the aid of distributed computing nodes while guaranteeing the security of the information about the two input matrices. Two cases are considered: one-sided security and two-sided security. In the first case, the user only wants to protect the information security of matrix A, and B is a public matrix known to all computing nodes [10]. In the second case, the information security of both matrices A and B must be considered [9,11]. Information theft by the distributed computing nodes can be modeled by the collusion pattern, which has also been studied in problems of secret sharing [12] and private information retrieval [13,14]. Some of the existing literature has studied the SDMM problem under homogeneous collusion patterns, where up to l computing nodes may collude to obtain the information of the two input matrices [9,15-18]. To balance the tradeoff between the uplink and downlink cost, the work in [15] proposed two schemes based on secure cross subspace alignment. In [9], the authors characterized the fundamental limits of the minimum communication overhead for the SDMM problem under a homogeneous collusion pattern. The work in [16] proposed a scheme based on polynomial codes on sub-tasks assigned to computing nodes, which can mitigate straggling effects efficiently. In [18], the authors adopted random matrices to encode the two input matrices for the purpose of meeting the security requirement. Many encoded copies are then sent to different computing nodes for computation. Finally, the user receives the computation results from the computing nodes and recovers the product of the two input matrices. Two cases were considered: (1) encoding the input matrices without extra random matrices, i.e., the generalized polydot code, and (2) encoding the input matrices with random matrices to satisfy the security constraint, i.e., the secure generalized polydot code. They also showed the superiority of the proposed scheme in terms of the recovery threshold, i.e., the number of computation results needed for the user to decode the desired result without error, and the communication load between the user and the computing nodes, i.e., the amount of information downloaded from the computing nodes. Recently, rather than focusing on the homogeneous collusion pattern, ref. [19] studied the SDMM problem under an arbitrary collusion pattern. Considering two proposed performance metrics, the normalized download cost and the normalized upload cost, they provided the optimal scheme for the one-sided SDMM problem and an achievable scheme for the two-sided SDMM problem.
Both the private information retrieval problem and the SDMM problem considered in [14,19] deal with non-homogeneous collusion patterns. The common approach to these two problems is assigning different numbers of copies to different servers; intuitively, the servers that collude more will be assigned fewer copies. More specifically, in [14], the authors considered the ratio between the message size and the amount of information downloaded from the servers. The work of [19] then studied the SDMM problem under an arbitrary collusion pattern for a fixed matrix splitting factor, and different numbers of copies were distributed to different computing nodes based on the collusion pattern to minimize the normalized download and upload costs. However, the heterogeneity of the computing nodes in terms of storage capacity, communication capability, and computing capability was not taken into consideration. When full heterogeneity is taken into consideration, the number of copies assigned to each server will depend not only on its colluding behavior but also on its storage capacity, communication capability, and computing capability. Furthermore, a fixed matrix splitting factor may affect the performance of SDMM. Hence, in this work, we study the problem of two-sided SDMM under an arbitrary collusion pattern with flexible matrix splitting factors. Furthermore, in order to measure the communication and computation performance of the system, a new performance metric called the total cost, which is composed of the computation cost and the communication cost, is proposed in this paper. Additionally, the storage capability of the computing nodes and the delay requirement of the user are also considered. An optimization problem is then formulated to minimize the total cost, subject to the security constraint of the two input matrices, the decodability constraint of the desired result of the multiplication, the storage capacity of the computing nodes, and the delay constraint. In order to overcome the divisibility problem of matrix splitting, we also propose a strategy of appending zeros to the input matrices and discuss the feasible set of some matrix splitting factors for the optimality of the problem. Finally, an alternating optimization (AO) algorithm based on off-the-shelf solvers is adopted to obtain a feasible solution, and some necessary conditions for the feasibility of the problem are provided.
The contributions of our paper are summarized as follows:

• We propose a new performance metric, the total cost, which includes the communication cost and the computation cost, to measure the performance of the SDMM problem under an arbitrary collusion pattern. Our aim is to minimize the total cost, over all matrix splitting factors, the number of appended random matrices, and the distribution vector, while satisfying the security constraint of the two input matrices, the decodability constraint of the desired result of the multiplication, the storage capacity of the computing nodes, and the delay constraint.

• To overcome the divisibility problem of matrix splitting, we propose a strategy of padding zeros to the input matrices, which allows the input matrices to be split into an arbitrary number of blocks, in contrast to the scheme without appending zeros. Moreover, the value ranges of some matrix splitting factors are discussed for the optimality of the problem.

• The formulated optimization problem is solved by an AO algorithm based on off-the-shelf solvers. More specifically, for the optimization subproblem corresponding to the number of appended random matrices and the distribution vector, the relationship between the two can be found so that the subproblem is transformed into an integer linear program over the distribution vector, which can be solved by the MATLAB function "intlinprog". Furthermore, we also provide some necessary conditions to verify the feasibility of this subproblem. Then, for the optimization subproblem corresponding to the matrix splitting factors, by relaxing the ceiling function and integer constraints, the subproblem can be transformed into an integer geometric programming problem solved using "YALMIP". Simulation results show that our proposed scheme with padding zeros is superior to the scheme without appending zeros and the scheme without alternating optimization.
The rest of this paper is organized as follows: Section 2 introduces the system model of two-sided SDMM under an arbitrary collusion pattern. Section 3 proposes a zero-padding strategy, discusses the feasible set of some matrix splitting factors, and formulates an optimization problem. Section 4 provides the algorithm to solve the problem. Simulation results and conclusions are given in Sections 5 and 6, respectively.
Notation 1. In this paper, the following notations are used. [1 : N] denotes the set {1, 2, ..., N}; h_n represents the n-th column vector of the matrix h; 1_N denotes the N × 1 all-ones column vector; positive integers are represented by Z^+; natural numbers are denoted by N; and the ceiling function is denoted by ⌈·⌉.

System Model
As shown in Figure 1, we consider a user who wants to calculate the product of two input matrices A ∈ F^(T×S) and B ∈ F^(S×D). We suppose that T, S, and D are all integers and that the finite field F is sufficiently large. Due to its limited computational ability, the user wishes to split the two matrices A and B into blocks and upload them to N computing nodes for computation. At the same time, both matrices A and B contain sensitive information, and the user does not want to leak any information to the N computing nodes. We study the case where the computing nodes may collude with each other to obtain information about the two matrices A and B. We represent the colluding behavior by a collusion pattern P, which contains M colluding sets, i.e., P = {T_1, T_2, ..., T_M}. Here, T_m ⊆ [1 : N] is the m-th colluding set, which means that the computing nodes in T_m may collude to obtain the information of the two matrices. We make the following two assumptions about the collusion pattern P: (1) For ease of presentation, we only include the maximal colluding sets in P. For instance, a colluding set {3, 4, 5, 6} means that computing nodes 3, 4, 5, and 6 collude, which implies that the computing nodes in any subset of {3, 4, 5, 6} also collude. However, for ease of presentation, we do not include the subsets of {3, 4, 5, 6} in P.
(2) Every computing node must appear in at least one colluding set. This is because we assume that all computing nodes are curious, and no computing node can be trusted with the sensitive information of A and B.
A collusion pattern P can be represented by its incidence matrix B_P of size N × M: if computing node i belongs to the j-th colluding set of P, the (i, j)-th element of B_P is 1, and otherwise it is 0. For example, when P = {{1, 2, 3}, {1, 4}, {2, 4}, {3, 4}, {5}}, its incidence matrix is

B_P = [ 1 1 0 0 0
        1 0 1 0 0
        1 0 0 1 0
        0 1 1 1 0
        0 0 0 0 1 ].    (1)

Since the two matrices must be kept secure, the user encodes A and B before uploading them to the computing nodes for computation. Assume that there are N_1 encoded copies with N_1 ≥ N; the encoding functions map A and B to the encoded copies (A_i, B_i), i ∈ [1 : N_1]. The user distributes a subset of the encoded matrices to computing node n, where the indices of this subset are written as L_n, L_n ⊆ [1 : N_1]; this is termed the upload phase. Computing node n computes the products Z_i = A_i B_i, i ∈ L_n, and sends the computed results Z_i, i ∈ L_n, back to the user; this is termed the download phase.
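As a sanity check, the incidence matrix can be constructed mechanically from the colluding sets; the following sketch (function and variable names are illustrative) reproduces the example above.

```python
# Build the incidence matrix B_P of a collusion pattern: B_P[i][j] = 1
# iff computing node i+1 belongs to the (j+1)-th colluding set.
def incidence_matrix(pattern, n_nodes):
    return [[1 if node in colluding_set else 0 for colluding_set in pattern]
            for node in range(1, n_nodes + 1)]

P = [{1, 2, 3}, {1, 4}, {2, 4}, {3, 4}, {5}]  # example pattern from the text
B_P = incidence_matrix(P, 5)
for row in B_P:
    print(row)
```

Each column of the result corresponds to one colluding set, in the order the sets are listed in P.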
In order to ensure the security of matrices A and B, the following security constraint must be satisfied, which indicates that the computing nodes in each colluding set, even when putting their received copies together, cannot obtain any information about the two matrices.
In addition, the user must be able to decode the desired product C = AB from the answers received from all the computing nodes, i.e., the decodability constraint must be satisfied.

Matrix Encoding Scheme
We use the secure generalized polydot code (SGPD) in [18] to encode the two input matrices. First, as in (4), we split A into t × s blocks and B into s × d blocks, where T is divisible by t, S is divisible by s, and D is divisible by d. Then, each block A_{i,j} is of size t_0 × s_0 and each block B_{i,j} is of size s_0 × d_0, where we have defined t_0 = T/t, s_0 = S/s, and d_0 = D/d. In view of the security constraint (2), we append random matrices K_{i,j} ∈ F^((T/t)×(S/s)) and K'_{i,j} ∈ F^((S/s)×(D/d)) as in (6) and (7), where l_Δ rows of random matrices are appended to matrix A and l_Δ columns of random matrices are appended to matrix B, and l_Δ is a positive integer. Each element of the random matrices K_{i,j} and K'_{i,j} is generated in an i.i.d. fashion according to the uniform distribution on F. Note that (6) and (7) are just one way of appending random matrices; the other is given by Method 2 in [19]. For simplicity, we only study the case of (6) and (7); the other case of appending random matrices can be treated in a similar fashion.
In this case, the encoded matrices are generated according to (8) and (9), whose coefficients are non-zero elements of F. The N_1 generated encoded copies of (8) and (9), i.e., (A_i, B_i), i ∈ [1 : N_1], will be distributed to the computing nodes, where computing node n receives A_{L_n} and B_{L_n}, and L_n ⊆ [1 : N_1] is the index set of the encoded matrices distributed to computing node n. We assume that L_n, n ∈ [1 : N], form a partition of the set [1 : N_1], which means that each encoded copy is distributed to one and only one computing node. Upon receiving A_{L_n} and B_{L_n}, computing node n calculates Z_i = A_i B_i, i ∈ L_n, and returns Z_{L_n} to the user. We distribute the encoded matrices to the computing nodes in the following way.
Here, J_n is the number of distributed encoded matrices given to the n-th computing node, and J = (J_1, ..., J_N)^T denotes the distribution vector. Then, we have 1^T J = N_1. It has been proved in [19] that the security constraint (2) is satisfied when (10) holds, i.e., when the number of encoded copies held by every colluding set is less than l_Δ s. The physical meaning of (10) is that the number of encoded matrices for the computing nodes in every colluding set must be smaller than the minimal number of random matrices appended in A* or B*, which is l_Δ s. Furthermore, the decodability constraint (3) is guaranteed by the following inequality [19]: the number of encoded copies N_1 must be no smaller than l_Δ(sd + 2s) + ts(d + 1) − 1 for decoding the desired result C = AB without error.
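Given the forms of (10) and (11) described above, a candidate distribution vector J can be screened programmatically. The sketch below encodes the two tests as stated in the text; the function names and the example numbers are illustrative, not from the paper.

```python
# (10): the copies held by every colluding set must be fewer than l_delta * s.
def is_secure(J, pattern, l_delta, s):
    return all(sum(J[n - 1] for n in colluding_set) < l_delta * s
               for colluding_set in pattern)

# (11): the total number of copies N_1 = 1^T J must reach the recovery threshold.
def is_decodable(J, l_delta, t, s, d):
    return sum(J) >= l_delta * (s * d + 2 * s) + t * s * (d + 1) - 1

P = [{1, 2, 3}, {1, 4}, {2, 4}, {3, 4}, {5}]  # example pattern from the text
print(is_secure([1, 1, 1, 2, 3], P, l_delta=2, s=2))  # every set holds < 4 copies
```

In practice these two checks are exactly the feasibility tests applied to J before evaluating its cost.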

Storage, Communication and Computing Requirements of Each Computing Node
Each encoded copy (A_i, B_i) and its corresponding answer Z_i = A_i B_i require t_0 s_0 + s_0 d_0 + t_0 d_0 symbols of storage in total. If computing node n can store at least one encoded copy and its corresponding answer, i.e.,

E_n ≥ t_0 s_0 + s_0 d_0 + t_0 d_0,    (12)

then the computing node can store one encoded copy, return the corresponding result, and then retrieve another encoded copy from the user for further computation. Hence, (12) must be satisfied for all n ∈ [1 : N]. Written in vector form, we have (13). Suppose computing node n's computation speed is V_n multiplications per second; then, since computing each Z_i requires t_0 s_0 d_0 multiplications, the time it takes node n to complete the computation assigned to it is J_n t_0 s_0 d_0 / V_n. Further suppose that the uplink and downlink capacities between the user and computing node n are C_n^U and C_n^D symbols per second, respectively. Then, the upload delay incurred at computing node n is J_n (t_0 s_0 + s_0 d_0) / C_n^U, and the download delay incurred at computing node n is J_n t_0 d_0 / C_n^D. The total delay incurred at computing node n when assigned J_n encoded copies is therefore

Q_n = J_n [ (t_0 s_0 + s_0 d_0) / C_n^U + t_0 s_0 d_0 / V_n + t_0 d_0 / C_n^D ],

where we have assumed that a computing node can only perform one of the three actions at any time instant: compute, receive an upload, or send a download. This is also in line with the assumption that the computing nodes may not have enough memory to store all J_n copies at once; rather, a node receives one copy, computes, sends the result back to the user, and then retrieves the next copy. Thus, the total delay incurred for this computation is determined by the slowest node, and we require that the total delay is no larger than a given threshold Q_th, i.e.,

max_{n ∈ [1 : N]} Q_n ≤ Q_th.    (16)

Besides the delay constraint, cost should also be considered for efficient SDMM. More specifically, the cost we consider is comprised of the computation cost of the computing nodes and the data transmission cost, where the data transmission cost can be further divided into the upload and download transmission costs. Assume that the upload and download transmission costs for computing node n are c_n^U and c_n^D per symbol, respectively, and the computation cost of each multiplication at computing node n is c_n^C. Then, the total cost required for the user to perform the secure matrix multiplication of A and B is

U = U_U + U_D + U_C,    (17)

where U_U is the upload cost, given by U_U = Σ_{n=1}^{N} c_n^U (t_0 s_0 + s_0 d_0) J_n; U_D is the download cost, given by U_D = Σ_{n=1}^{N} c_n^D t_0 d_0 J_n; and U_C is the computation cost, given by U_C = Σ_{n=1}^{N} c_n^C t_0 s_0 d_0 J_n.
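The delay and cost expressions above can be evaluated directly. The following sketch assumes the per-copy quantities stated in the text (t_0 s_0 + s_0 d_0 uploaded symbols, t_0 s_0 d_0 multiplications, and t_0 d_0 downloaded symbols per copy); all parameter names are illustrative.

```python
# Per-node delay: the node handles one copy at a time, so upload,
# computation, and download times add up, J_n times over.
def node_delay(J_n, t0, s0, d0, V_n, CU_n, CD_n):
    per_copy = ((t0 * s0 + s0 * d0) / CU_n   # upload one encoded copy
                + t0 * s0 * d0 / V_n         # t0*s0*d0 multiplications
                + t0 * d0 / CD_n)            # download the result Z_i
    return J_n * per_copy

# Total delay (16): the slowest node determines the overall delay.
def total_delay(J, t0, s0, d0, V, CU, CD):
    return max(node_delay(j, t0, s0, d0, v, cu, cd)
               for j, v, cu, cd in zip(J, V, CU, CD))

# Total cost (17): upload + download + computation cost over all nodes.
def total_cost(J, t0, s0, d0, cU, cD, cC):
    U_U = sum(c * (t0 * s0 + s0 * d0) * j for c, j in zip(cU, J))  # upload
    U_D = sum(c * t0 * d0 * j for c, j in zip(cD, J))              # download
    U_C = sum(c * t0 * s0 * d0 * j for c, j in zip(cC, J))         # compute
    return U_U + U_D + U_C
```

These functions are the building blocks for evaluating constraint (16) and objective (17) for any candidate (J, t_0, s_0, d_0).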

Problem Formulation
In this work, we would like to jointly optimize the distribution vector J and the matrix splitting parameters (t, s, d, l_Δ) such that the cost of the user, defined in (17), is minimized. At the same time, the security constraint (10), the decodability constraint (11), the storage constraint (13), and the delay constraint (16) must be satisfied.

The Feasible Set of (t, s, d)
Since we are splitting the two matrices A and B as shown in (4), it is natural to assume that t, s, and d must take values such that T, S, and D are divisible by t, s, and d, respectively. For example, if T = 5, t can only take values in the set {1, 5}, because T = 5 is not divisible by 2, 3, or 4. However, this significantly limits the values that (t, s, d) can take and may result in a high cost for the user.
In this section, we propose a better and more general way: we allow any values of t, s, and d, and to make the matrices splittable, we append zeros to the original matrices, i.e., we append rows and columns of zeros to A so that its numbers of rows and columns become divisible by t and s, respectively, and append rows and columns of zeros to B so that its numbers of rows and columns become divisible by s and d, respectively. This increases the dimensions of the two matrices but enables us to split them into blocks in a more flexible way. For example, let A ∈ F^(5×4), i.e., T = 5, S = 4, and suppose we would like to take t = 2, s = 2. However, T = 5 is not divisible by t = 2. Then, we can append one row of zeros to A so that the appended matrix has dimension 6 × 4 and thus can be split with t = 2, s = 2.
More generally, we propose that for any (t, s, d) with t ∈ [1 : T], s ∈ [1 : S], d ∈ [1 : D], we may append (⌈T/t⌉t − T) rows of zeros to the bottom of matrix A and (⌈S/s⌉s − S) columns of zeros to the right side of matrix A. Similarly, we append (⌈S/s⌉s − S) rows of zeros to the bottom of matrix B and (⌈D/d⌉d − D) columns of zeros to the right side of matrix B. As a result, instead of (5), we have t_0 = ⌈T/t⌉, s_0 = ⌈S/s⌉, and d_0 = ⌈D/d⌉. As can be seen, not padding zeros and only using (t, s, d) that are divisors of (T, S, D) is a special case. Since we are considering padding zeros, it is also possible to have t > T, s > S, or d > D. We show in the next lemma that, for t and d, this will only increase the cost at the user, defined in (17).
Lemma 1. To minimize the cost at the user, i.e., (17), it is sufficient to consider t ∈ [1 : T] and d ∈ [1 : D].
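The padding rule can be sketched as follows, where matrices are plain lists of lists and the number of appended rows is computed as ⌈T/t⌉t − T (equivalently `(-T) % t` in Python); names are illustrative.

```python
# Zero-pad a matrix so its dimensions become divisible by (t, s),
# following the padding rule described above.
def pad_for_split(A, t, s):
    rows, cols = len(A), len(A[0])
    add_rows = (-rows) % t          # equals ceil(rows/t)*t - rows
    add_cols = (-cols) % s
    padded = [row + [0] * add_cols for row in A]
    padded += [[0] * (cols + add_cols) for _ in range(add_rows)]
    return padded

A = [[1] * 4 for _ in range(5)]     # T = 5, S = 4, as in the example
Ap = pad_for_split(A, t=2, s=2)     # padded to 6 x 4, splittable with t = s = 2
```

After padding, the blocks have dimensions t_0 × s_0 = ⌈T/t⌉ × ⌈S/s⌉, matching the ceiling-function definition used in the rest of the paper.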
Remark 1. The case of s is different from the cases of t and d. When (l_Δ, t, d) is fixed, we see from the security constraint (10) and the decodability constraint (11) that, on the one hand, increasing s increases the number of blocks, but on the other hand, it also relaxes the security constraint. When the computing nodes are heterogeneous, i.e., the computing nodes have different computation costs, upload transmission costs, and download transmission costs, an increase in s does not necessarily increase the total cost, because, due to the more relaxed security constraint, we can distribute more blocks to computing nodes with lower costs. As a result, when we apply the strategy of appending zeros, the optimal s may not take values in [1 : S].
After the above discussion, the problem described in Section 2.3 can be formally formulated as Problem (19): minimize the total cost (19a) over (J, l_Δ, t, s, d), subject to the security constraint (10) and the decodability constraint (11) in (19b), the storage constraint (19c), the delay constraint (19d), the block-dimension definition (19e), and the range constraints (19f). In (19c), the storage capacities of the N computing nodes are collected in the vector E = (E_1, ..., E_N)^T. In the cost function (19a), we have defined the upload transmission cost vector, the download transmission cost vector, and the computation cost vector of the N computing nodes as c^U = (c_1^U, ..., c_N^U)^T, c^D = (c_1^D, ..., c_N^D)^T, and c^C = (c_1^C, ..., c_N^C)^T, respectively. Note that the scheme of appending zeros makes the dimensions of every block of A and B equal to t_0 × s_0 = ⌈T/t⌉ × ⌈S/s⌉ and s_0 × d_0 = ⌈S/s⌉ × ⌈D/d⌉, respectively, as indicated by (19e). Furthermore, note that in (19f), while the values of t and d are limited to [1 : T] and [1 : D], respectively, the value of s does not have an upper bound due to Remark 1.

Algorithm Design
Due to the coupled variables, integer constraints, and the nonlinear constraints and objective function of Problem (19), it is hard to find a globally optimal or even a suboptimal solution.
In the following, we propose an algorithm to obtain a feasible solution.
The coupled variables in Problem (19) inspire us to utilize the alternating optimization (AO) technique. A feasible solution to Problem (19) can then be obtained by alternately solving the following two subproblems: one fixes (t, s, d) to optimize (J, l_Δ), and the other optimizes (t, s, d) given (J, l_Δ).
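The alternation can be sketched as a simple loop; a minimal sketch, assuming `solve_J_l` and `solve_tsd` stand in for the two subproblem solvers (both names and the stopping rule are illustrative, not from the paper):

```python
# Alternating optimization skeleton: solve the two subproblems in turn
# until the total cost stops decreasing.
def alternating_optimization(t, s, d, solve_J_l, solve_tsd,
                             max_iter=20, tol=1e-6):
    prev_cost = float("inf")
    for _ in range(max_iter):
        J, l_delta, cost = solve_J_l(t, s, d)   # subproblem 1: (t,s,d) fixed
        t, s, d, cost = solve_tsd(J, l_delta)   # subproblem 2: (J,l_delta) fixed
        if prev_cost - cost < tol:              # stop once the cost stalls
            break
        prev_cost = cost
    return J, l_delta, (t, s, d), cost
```

Since each subproblem is solved to feasibility (not global optimality), the loop yields a feasible point rather than a guaranteed optimum, consistent with the discussion above.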
4.1. Optimization Subproblem of (J, l_Δ) for a Fixed (t, s, d)
In this subsection, for a fixed (t, s, d), the optimization subproblem of Problem (19) corresponding to (J, l_Δ) is given as Problem (20): minimize the cost (20a) over (J, l_Δ), subject to (10) and (11) in (20b) and the remaining constraints of (19) in (20c). Note that when (t, s, d) is fixed, the corresponding (t_0, s_0, d_0) is also fixed according to (19e). Further note that when (t, s, d) is fixed, the objective function (20a) is a function of J only, and not of l_Δ. Due to the fact that J_n, c_n^U, c_n^D, c_n^C ≥ 0, n = 1, ..., N, the inequality (11) must be satisfied with equality when J* is optimal. Hence, (11) can be rewritten as the equality

1^T J = l_Δ(sd + 2s) + ts(d + 1) − 1.    (21)

With equality (21), l_Δ can be expressed as a function of J, i.e., l_Δ = (1^T J − ts(d + 1) + 1) / (sd + 2s). Then, substituting l_Δ in Problem (20) with this function of J, Problem (20) can be reformulated as Problem (22), an integer linear program with the single optimization variable J. This problem can be solved using the MATLAB function "intlinprog", which is based on the branch and bound (BnB) algorithm and the interior point method [20,21] and is typically used to solve integer linear programming problems such as (22).
For certain system parameters and (t, s, d) values, Problem (22) is not feasible. To identify a necessary condition for the feasibility of Problem (22), we have the following lemma.
Before presenting the lemma, we define the variable p as the smallest number of colluding sets that contain all computing nodes. For example, for the collusion pattern represented by the incidence matrix (1), p is equal to 3, because the three colluding sets {1, 2, 3}, {1, 4}, {5} include all computing nodes, and no 2 colluding sets in the collusion pattern can include all computing nodes.
Lemma 2. For fixed parameters (t, s, d), if Problem (20) is feasible, the inequalities (23) and (24) must be satisfied, where (t_0, s_0, d_0) satisfies (19e) and p is the smallest number of colluding sets that contain all computing nodes.
Proof. The constraint (10) can be rewritten as (25), where T_m is the m-th colluding set. Inequality (25) shows that the total number of encoded matrices received by the computing nodes in every colluding set cannot exceed the number of appended random matrices. Hence, from (25), we obtain (26). Thus, from (21), (24), and (26), we obtain the lower bound on l_Δ in (27). On the other hand, when p − d − 2 > 0 is not satisfied, there exists no feasible l_Δ. Next, we derive an upper bound on l_Δ. We have (28) and (29), where (28) follows from (21) and (29) follows from (20c). Hence, an upper bound on l_Δ is given by (30), where Y is as defined in (23).
If Problem ( 20) is feasible, we must have that p − d − 2 > 0, and the upper bound of l ∆ in (30) must be greater than or equal to the lower bound of l ∆ in (27).
Hence, the proof is complete.
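The quantity p in Lemma 2 is the optimum of a small set-cover problem, which can be checked by brute force for the example pattern whose incidence matrix is given in (1); the function name below is illustrative.

```python
from itertools import combinations

# Compute p: the smallest number of colluding sets covering all nodes,
# by exhaustively trying covers of size 1, 2, ... in turn.
def min_cover(pattern, n_nodes):
    nodes = set(range(1, n_nodes + 1))
    for k in range(1, len(pattern) + 1):
        for combo in combinations(pattern, k):
            if set().union(*combo) == nodes:
                return k
    return None  # unreachable under assumption (2): every node is in some set

P = [{1, 2, 3}, {1, 4}, {2, 4}, {3, 4}, {5}]  # example pattern from the text
print(min_cover(P, 5))  # the cover {1,2,3}, {1,4}, {5} has size 3
```

Exact set cover is NP-hard in general, so this exhaustive check is only practical for the small collusion patterns typical of the examples in this paper.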
Based on Lemma 2, Algorithm 1 is proposed to solve Problem (20), where we check the necessary conditions of the feasibility of Problem (20) before solving it using the MATLAB "intlinprog" function.
Since Problem (32) is obtained by relaxing the ceiling functions, we may face the situation where, even though the (t, s, d) found by YALMIP are integers, which we call (t̃, s̃, d̃), the corresponding T/t̃, S/s̃, and D/d̃ in Problem (32) may not be integers. In order to overcome this problem, for the converged solution (t*, s*, d*) of the AO, we check whether constraints (19c) and (19d) in the original Problem (19) are satisfied under the definition (19e). If they are, (t*, s*, d*) is taken as the solution to Problem (31), and the corresponding block dimensions (t_0, s_0, d_0) are obtained by padding zeros according to (19e). If they are not, we can employ an exhaustive search within a neighborhood of the converged solution (t*, s*, d*) for a feasible solution, or restart the algorithm with a new random initial point (t^(0), s^(0), d^(0)). In a time-constrained system, we can also forgo optimizing (t, s, d) and simply use the initial values (t^(0), s^(0), d^(0)) to obtain a timely solution.

Complexity Analysis
The complexity of Algorithm 2 per iteration mainly lies in Steps 3 and 4. In Step 3, the complexity of Algorithm 1 is derived from solving Problem (20) by the MATLAB function "intlinprog", which uses the BnB method. By omitting the lower-order terms, the main complexity of Algorithm 1 per iteration is O(2^N), where N is the number of computing nodes. In Step 4, similarly, the main complexity of solving Problem (32) by YALMIP with the BnB method is O(2^3), where 3 is the dimension of the optimization variables (t, s, d) [24]. Hence, neglecting the lower-order terms, the approximate computational complexity of Algorithm 2 per iteration is O(2^N) when N ≥ 3 and O(2^3) when N ≤ 2. As can be seen, the complexity scales exponentially with the number of computing nodes.
For simplicity, our proposed scheme is denoted by "Pro.". The following two benchmarks are considered for comparison with our proposed scheme: (1) "N/0.": in this scenario, we do not append zeros to the input matrices. The optimization subproblem corresponding to (t, s, d) for a fixed (J, l_Δ) is solved by exhaustive search over the feasible tuples (t, s, d) that are divisors of (T, S, D). Other details are similar to Algorithm 2. This corresponds to the optimal performance of AO when no zeros are appended.
Figure 3 plots the total cost versus the number of columns of matrix B, i.e., D. Similar to Figure 2, the difference between our proposed scheme and the "SE" scheme becomes larger as the dimensions of the input matrices increase. However, the "N/0." scheme achieves the same total cost as our proposed algorithm, which shows that, in this case, there is no need to pad zeros. Although the proposed scheme and the "N/0." scheme have the same performance, the proposed scheme has lower complexity because it avoids the exhaustive search of the "N/0." scheme.

Conclusions
In this paper, we investigated the problem of minimizing the total cost, comprised of the computation cost and the communication cost, in a system of two-sided SDMM under an arbitrary collusion pattern. To realize SDMM, we split the two input matrices into blocks and appended extra blocks of random matrices to guarantee the security of the two input matrices. The matrix multiplication is then calculated from the encoded copies at the computing nodes. Our aim is to minimize the total cost while ensuring the security constraint of the two input matrices, the decodability constraint of the desired result of the multiplication, the storage capacity of the computing nodes, and the delay constraint. The distribution vector, the number of appended random matrices, and all matrix splitting factors were optimized. In order to overcome the divisibility problem of matrix splitting, we first proposed a strategy of appending zeros to the two input matrices and then discussed the value ranges of some matrix splitting factors for the optimality of the problem. Next, an AO algorithm was provided to obtain a feasible solution. Furthermore, to verify the feasibility of the proposed optimization problem, some necessary conditions were provided. Numerical results demonstrated that our proposed scheme achieves a lower total cost compared to the scheme without appending zeros and the scheme without AO.