Straggler- and Adversary-Tolerant Secure Distributed Matrix Multiplication Using Polynomial Codes

Large matrix multiplications commonly take place in large-scale machine-learning applications. Often, the sheer size of these matrices prevents carrying out the multiplication at a single server. Therefore, these operations are typically offloaded to a distributed computing platform with a master server and a large number of workers in the cloud, operating in parallel. For such distributed platforms, it has been recently shown that coding over the input data matrices can reduce the computational delay by introducing a tolerance against straggling workers, i.e., workers whose execution time significantly lags behind the average. In addition to exact recovery, we impose a security constraint on both matrices to be multiplied. Specifically, we assume that workers can collude and eavesdrop on the content of these matrices. For this problem, we introduce a new class of polynomial codes with fewer non-zero coefficients than the degree + 1. We provide closed-form expressions for the recovery threshold and show that our construction improves the recovery threshold of existing schemes in the literature, in particular for larger matrix dimensions and a moderate to large number of colluding workers. In the absence of any security constraints, we show that our construction is optimal in terms of recovery threshold.


Introduction
Recently, tensor operations have emerged as an important ingredient of many signal processing and machine learning applications [1]. These operations are typically complex due to the large size of the associated tensors. Therefore, in the interest of a low execution time, such computations are often performed in a distributed fashion and outsourced to a cloud of multiple workers that operate in parallel over the distributed data set. These workers in many cases consist of commercial off-the-shelf servers that are characterized by failures and varying execution times. Such straggling servers are handled by state-of-the-art cloud computation platforms via a repetition of the computation task at hand. However, recent work has shown that encoding the input data may help alleviate the straggler problem and thus reduce the computation latency, which mainly depends on the number of stragglers present in the cloud computing environment; see [2,3]. More generally, it has been shown that coding can control the trade-off between computational delay and communication load between workers and master server [3][4][5][6]. In addition, the workers in the cloud may not be trustworthy, so the input and output of the partial computations need to be protected against unauthorized access. To this end, it has been shown that stochastic coding can help keep both input and output data secure from eavesdropping and colluding workers (see, for example, [7][8][9][10][11][12][13][14]).
In this work, we focus on the canonical problem of distributing the multiplication of two matrices A and B, i.e., C = AB, whose content should be kept secret from a prescribed number of colluding workers in the cloud. Our goal is to minimize the number of workers from which the partial result must be downloaded, the so-called recovery threshold, to recover the correct matrix product C.
Coded matrix computation was first addressed in the non-secure case by applying separate MDS codes to encode the two matrices [3]. In [5], polynomial codes were introduced, which improve on the recovery threshold of [3]. The recovery threshold was further improved by the so-called MatDot and PolyDot codes [15,16] at the expense of a larger download rate. In particular, PolyDot codes allow a flexible trade-off between the recovery threshold and the download rate, depending on the application at hand.
In [17,18], two different schemes are presented: an explicit scheme that improves on the recovery threshold of PolyDot codes, and a construction based on the tensor rank of matrix multiplication, which is optimal up to a factor of 2. In [19], a new construction for private and secure matrix multiplication is proposed based on entangled polynomial codes, which allows for a flexible trade-off between the upload rate and the download rate (equivalently, the recovery threshold). For small numbers of stragglers, [20] constructs schemes that outperform the entangled polynomial scheme. Recently, several attempts have been made to design coding schemes to further reduce upload and download rates, the recovery threshold, and computational complexity for both workers and server (see, for example, [21][22][23][24][25][26][27]). For example, in [21], bivariate polynomial codes were used to reduce the recovery threshold in specific cases. In [22], the authors considered new schemes for the private and secure case which outperform [19] for specific parameter regions. The work in [23] considered distributed storage repair codes, so-called field-trace polynomial codes, to reduce the download rate for specific partitions of matrices A and B. Very recently, the authors in [24] proposed a black-box coding scheme based on star products, which subsumes several existing works as special cases. In [25], a discrete Fourier transform-based scheme with low upload rates and encoding complexity is proposed. The work in [26] focused on selecting the evaluation points for the polynomial codes, providing a better upload rate than [9], but worse than [25].
In the following, we propose a new scheme for secure matrix multiplication, which provides explicit evaluation points for the polynomial codes, but unlike the work in [26], is also able to tolerate stragglers. Specifically, we exploit gaps in the underlying polynomial code. This is motivated by the observation that the recovery threshold can be improved by selecting the number of evaluation points to be equal to the number of only the non-zero coefficients in the polynomial [9,19]. In addition, selecting dedicated evaluation points has the advantage that the condition for security against colluding workers is automatically satisfied (see, for example, condition C2 in [27]). As such, our approach is able to provide a constructive scheme with provable security guarantees. Further, our coding scheme provides an advantage in terms of download rate in some cases, and is both straggler-tolerant and robust against Byzantine attacks on the workers.

This paper is organized as follows. In Section 2, the problem statement and the background are presented. Section 3 discusses the design and properties of our proposed scheme and provides performance guarantees with respect to the number of helper nodes needed for recovery, security, straggler tolerance, and resilience under Byzantine attacks. Section 4 extends the scheme of Section 3 by introducing gaps into the code polynomials and by studying its properties. Finally, Section 5 presents numerical results and comparisons with state-of-the-art schemes from the literature.

Problem Statement and Background
Let A and B be a pair of matrices over the finite field $\mathbb{F}_q$ whose product is well defined. We consider the problem of computing the product C = AB. The computation will be distributed among a number of helper nodes, each of which will execute a portion of the total calculation. We also assume that the user wishes to hide the data contained in the matrices A and B and that up to T honest but curious helper nodes may collude to deduce information about the contents of A and B. To divide the work among the helper nodes, the matrices A and B are divided into KM and ML blocks, respectively, of compatible dimensions, say $a \times r$ and $r \times b$. The matrices are also assumed to have independent and identically uniformly distributed entries from a sufficiently large field of cardinality q > N, where N denotes the number of servers to be employed (in fact, we will require q to exceed the degree of a polynomial P(x)Q(x), central to this scheme). Hence, for the given partition of A and B according to

$$A = \begin{pmatrix} A_{1,1} & \cdots & A_{1,M} \\ \vdots & \ddots & \vdots \\ A_{K,1} & \cdots & A_{K,M} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{1,1} & \cdots & B_{1,L} \\ \vdots & \ddots & \vdots \\ B_{M,1} & \cdots & B_{M,L} \end{pmatrix},$$

we obtain

$$C = AB = \begin{pmatrix} C_{1,1} & \cdots & C_{1,L} \\ \vdots & \ddots & \vdots \\ C_{K,1} & \cdots & C_{K,L} \end{pmatrix}, \qquad C_{k,\ell} = \sum_{m=1}^{M} A_{k,m} B_{m,\ell}.$$

The system model is displayed in Figure 1. We consider a distributed computing system with a master server and N helper nodes or workers. The master server is interested in computing the product C = AB. In Figure 1, the server holds the matrices A and B together with T matrices $R_t \in \mathbb{F}_q^{a \times r}$ and T matrices $S_t \in \mathbb{F}_q^{r \times b}$, $t \in [T]$, whose entries are independent and uniformly distributed. To keep the data secure and to leverage possible computational redundancy at the workers, the server sends encoded versions of the input matrices to the workers. This security constraint imposes the mutual information condition

$$I(A, B;\, A_{\mathcal{T}}, B_{\mathcal{T}}) = 0 \qquad (1)$$

between the pair (A, B) and their encodings $(A_{\mathcal{T}}, B_{\mathcal{T}})$ for all subsets $\mathcal{T} \subset [N]$ of maximum cardinality T.
The server generates a polynomial representation of A and the $R_t$ by constructing a polynomial $P(x) \in \mathbb{F}_q^{a \times r}[x]$. Likewise, a polynomial representation of B and the $S_t$ results in a polynomial $Q(x) \in \mathbb{F}_q^{r \times b}[x]$. The polynomial encodings that the p-th worker receives comprise the two polynomial evaluations $P(\alpha_p)$ and $Q(\alpha_p)$, for distinct evaluation points $\alpha_p \in \mathbb{F}_q$ with $p \in [N]$. The worker then computes the matrix product $P(\alpha_p)Q(\alpha_p)$ and sends it back to the server. The server collects a subset of $N_R \le N$ outputs from the workers, $\{P(\alpha_p)Q(\alpha_p)\}_{p \in \mathcal{N}_R}$ with $|\mathcal{N}_R| = N_R$. The size of the smallest possible subset $\mathcal{N}_R$ for which perfect recovery is obtained, i.e.,

$$H\big(AB \,\big|\, \{P(\alpha_p)Q(\alpha_p)\}_{p \in \mathcal{N}_R}\big) = 0, \qquad (2)$$

where H denotes the entropy function, is defined as the recovery threshold. The server then interpolates the underlying polynomial such that the correct product C = AB can be assembled from a combination of the interpolated polynomial coefficients $C_{i,j}$ (see Section 3 for details).
We further define the upload rate $R_u$ per worker as the sum of the dimensions of $P(\alpha_p)$ and $Q(\alpha_p)$, i.e., $R_u = (a + b)r$ field elements of $\mathbb{F}_q$. Likewise, the download rate or communication load $R_d$ is defined as the total number of field elements to be downloaded from the workers such that (2) is satisfied, i.e., $R_d = abN_R$.

Notation. For the remainder, we fix A, B, C to be matrices over $\mathbb{F}_q$ such that C = AB, and we fix K, M, L, a, b, r to be the integers as defined above. We define $[n] := \{1, \dots, n\}$ for any positive integer n. For each $k \in [K]$, $\ell \in [L]$, and $m \in [M]$, we write $A_{k,m}$, $B_{m,\ell}$, and $C_{k,\ell}$ to denote the $(k, m)$, $(m, \ell)$, and $(k, \ell)$ blocks of A, B, and C, respectively. The transpose of a matrix Z is denoted by $Z^t$.

Proposed Scheme
The scheme we propose uses a similar approach to the schemes in [9,19,27]. We will begin with the choices for exponents in P(x) and Q(x) and show that the desired blocks of C appear as coefficients of the product PQ. We discuss the maximum possible degree of PQ since it gives us an upper bound on the necessary evaluations, and hence workers, needed to interpolate PQ. In Section 3.3, we give explicit criteria for choices of evaluation points and prove that the scheme protects against collusion of up to T servers. Section 3.4 discusses the option to query additional servers to provide resilience against stragglers and Byzantine servers.
Section 4 uses ideas from the GASP scheme [9] to reduce the recovery threshold by examining how many coefficients in the product are already known to be zero.

Choice of Exponents and Maximal Degree
We propose the following scheme to outsource the computation among the worker servers. The model will incorporate methods to secure the privacy of the data held by the matrices A, B, and C.
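Let D := M + 2. For the given A and B, we define the polynomials:

$$\bar{P}(x) := \sum_{k=1}^{K} \sum_{m=1}^{M} x^{D(k-1)+m} A_{k,m}, \qquad \bar{Q}(x) := \sum_{\ell=1}^{L} \sum_{m=1}^{M} x^{DK(\ell-1)+M+1-m} B_{m,\ell}.$$

We now define the polynomials

$$P(x) := \bar{P}(x) + R(x), \qquad Q(x) := \bar{Q}(x) + S(x),$$

where R(x) and S(x) are a pair of matrix polynomials

$$R(x) := \sum_{t=1}^{T} x^{D(t-1)} R_t, \qquad S(x) := \sum_{t=1}^{T} x^{D(t-1)} S_t,$$

whose coefficients are $a \times r$ and $r \times b$ matrices over $\mathbb{F}_q$, respectively, chosen uniformly at random. In the next theorem, we show that the desired matrices $C_{k,\ell}$ appear as coefficients of the product PQ and can hence be retrieved by inspection of this product.

Theorem 1. For each pair $(k, \ell) \in [K] \times [L]$, the block $C_{k,\ell}$ arising in the product C = AB appears as the coefficient of $x^{D((k-1)+K(\ell-1))+M+1}$ in the product PQ.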
Proof. We calculate the product

$$PQ = \bar{P}\bar{Q} + \bar{P}S + R\bar{Q} + RS, \qquad \bar{P}\bar{Q} = \sum_{k,\ell} \sum_{m,m'} x^{D((k-1)+K(\ell-1))+M+1+m-m'} A_{k,m} B_{m',\ell}.$$

Consider the exponents modulo D. The first term in the sum above is the product $\bar{P}\bar{Q}$. An exponent of x in this term is congruent to $D - 1 \equiv M + 1 \bmod D$ if and only if $m = m'$, in which case its corresponding coefficient is $C_{k,\ell}$. In particular, the matrix block $C_{k,\ell}$ appears in the product $\bar{P}\bar{Q}$ as the coefficient of $x^{D((k-1)+K(\ell-1))+M+1}$.

We claim that no other exponent of x in $PQ - \bar{P}\bar{Q}$ is congruent to $M + 1 \bmod D$, from which the result will follow. Observe that the exponents in the second and third terms of the product (i.e., those of $\bar{P}S + R\bar{Q}$) are all between 1 and M modulo D, while every exponent of x in the fourth term, which is RS, is a multiple of D.
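The exponent argument in the proof of Theorem 1 can be checked by brute force for small parameters. The following is a minimal sketch, not the authors' code. It assumes a specific exponent layout that is consistent with Theorem 1: the block $A_{k,m}$ sits at exponent D(k−1)+m, the block $B_{m,\ell}$ at DK(ℓ−1)+M+1−m, and the random masks at multiples of D. The function names `exponent_layout` and `check_theorem_1` are hypothetical.

```python
from itertools import product


def exponent_layout(K, L, M, T):
    """Exponents used in this sketch (an assumed layout, see the lead-in)."""
    D = M + 2
    p_bar = {(k, m): D * (k - 1) + m
             for k in range(1, K + 1) for m in range(1, M + 1)}
    q_bar = {(m, l): D * K * (l - 1) + M + 1 - m
             for l in range(1, L + 1) for m in range(1, M + 1)}
    noise = [D * t for t in range(T)]  # exponents of the masks R(x) and S(x)
    return D, p_bar, q_bar, noise


def check_theorem_1(K, L, M, T):
    """True if each C_{k,l} is the only content at D((k-1)+K(l-1))+M+1."""
    D, p_bar, q_bar, noise = exponent_layout(K, L, M, T)
    for k, l in product(range(1, K + 1), range(1, L + 1)):
        target = D * ((k - 1) + K * (l - 1)) + M + 1
        hits = sorted((kk, m, mm, ll)
                      for (kk, m), e1 in p_bar.items()
                      for (mm, ll), e2 in q_bar.items()
                      if e1 + e2 == target)
        # Exactly the M products A_{k,m} B_{m,l} may land on the target.
        if hits != [(k, m, m, l) for m in range(1, M + 1)]:
            return False
    # Every term that involves a mask must avoid the residue M+1 modulo D.
    noisy = ([e + n for e in p_bar.values() for n in noise]
             + [n + e for e in q_bar.values() for n in noise]
             + [n1 + n2 for n1 in noise for n2 in noise])
    return all(e % D != M + 1 for e in noisy)
```

Calling `check_theorem_1(3, 2, 3, 3)` exercises the parameters of the worked example later in the paper; the check only inspects exponents and does not touch any matrix data.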
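In order to retrieve the polynomial PQ, we may evaluate P and Q at a number of distinct values $\alpha_1, \dots, \alpha_{N+1}$ in $\mathbb{F}_q^{\times}$. The values $P(\alpha_i)$ and $Q(\alpha_i)$ are found at a cost of zero non-scalar operations. Define $V := V(\alpha_1, \dots, \alpha_{N+1})$ to be the $(N+1) \times (N+1)$ Vandermonde matrix whose p-th row is $(1, \alpha_p, \alpha_p^2, \dots, \alpha_p^N)$. The (i, j)-entries of the coefficients of $PQ \in \mathbb{F}_q^{a \times b}[x]$ can be retrieved by computing the product

$$V^{-1} \big( [P(\alpha_1)Q(\alpha_1)]_{i,j}, \dots, [P(\alpha_{N+1})Q(\alpha_{N+1})]_{i,j} \big)^t$$

if the degree of PQ is at most N. Since this computation involves only $\mathbb{F}_q$-linear computations, the total non-scalar cost is the total cost of performing the N + 1 matrix products $P(\alpha_i)Q(\alpha_i)$. In the distributed computation scheme as shown in Figure 1, the server uploads each pair of evaluations $P(\alpha_i)$, $Q(\alpha_i)$ to the i-th worker node, which then computes the product $P(\alpha_i)Q(\alpha_i)$ and returns it to the server.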
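The evaluate, multiply, and interpolate pipeline described above can be illustrated with a toy run in which every block is a single field element (a = r = b = 1). This is a sketch under stated assumptions rather than the scheme as implemented by the authors: the field is the prime field of size 101, the exponent layout is the one assumed to be consistent with Theorem 1 (data blocks at D(k−1)+m and DK(ℓ−1)+M+1−m, masks at multiples of D), the evaluation points 1, 2, ... are chosen for convenience only, and the T-collusion condition of Section 3.3 is not checked. All names (`P_MOD`, `poly_eval`, `interpolate`, `run_toy`) are hypothetical.

```python
import random

P_MOD = 101  # assumed prime field size; must exceed deg(PQ) + 1


def poly_eval(coeffs, x):
    """Evaluate a sparse polynomial {exponent: coefficient} at x mod P_MOD."""
    return sum(c * pow(x, e, P_MOD) for e, c in coeffs.items()) % P_MOD


def interpolate(xs, ys):
    """Coefficients (low to high) of the degree < len(xs) interpolant."""
    n = len(xs)
    coeffs = [0] * n
    for i in range(n):
        num, denom = [1], 1
        for j in range(n):
            if j != i:
                # Multiply the numerator polynomial by (x - xs[j]).
                num = [(a - xs[j] * b) % P_MOD
                       for a, b in zip([0] + num, num + [0])]
                denom = denom * (xs[i] - xs[j]) % P_MOD
        scale = ys[i] * pow(denom, -1, P_MOD) % P_MOD
        for d in range(n):
            coeffs[d] = (coeffs[d] + scale * num[d]) % P_MOD
    return coeffs


def run_toy(K=2, L=2, M=2, T=2, seed=0):
    """Encode scalar blocks, let each 'worker' multiply, then decode C."""
    rng = random.Random(seed)
    D = M + 2
    A = {(k, m): rng.randrange(P_MOD)
         for k in range(1, K + 1) for m in range(1, M + 1)}
    B = {(m, l): rng.randrange(P_MOD)
         for m in range(1, M + 1) for l in range(1, L + 1)}
    # Data terms plus T random masking terms at multiples of D.
    P = {D * (k - 1) + m: v for (k, m), v in A.items()}
    Q = {D * K * (l - 1) + M + 1 - m: v for (m, l), v in B.items()}
    for t in range(T):
        P[D * t] = rng.randrange(P_MOD)
        Q[D * t] = rng.randrange(P_MOD)
    degree = max(P) + max(Q)
    xs = list(range(1, degree + 2))  # degree + 1 distinct non-zero points
    ys = [poly_eval(P, x) * poly_eval(Q, x) % P_MOD for x in xs]  # workers
    coeffs = interpolate(xs, ys)
    recovered = {(k, l): coeffs[D * ((k - 1) + K * (l - 1)) + M + 1]
                 for k in range(1, K + 1) for l in range(1, L + 1)}
    expected = {(k, l): sum(A[k, m] * B[m, l]
                            for m in range(1, M + 1)) % P_MOD
                for k in range(1, K + 1) for l in range(1, L + 1)}
    return recovered, expected
```

`run_toy()` returns the decoded blocks together with the directly computed product blocks, so the two dictionaries can be compared. Lagrange interpolation is used here instead of an explicit inverse of V; both solve the same linear system.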
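In this approach to reconstructing PQ, we require the participation of N + 1 worker nodes, where N is the degree of PQ. For this reason, we study this degree. Since $PQ = \bar{P}\bar{Q} + \bar{P}S + R\bar{Q} + RS$, we have the following result, wherein the values $N_1(K, L, M; T)$ to $N_4(K, L, M; T)$ correspond to the maximum possible degrees of $\bar{P}\bar{Q}$, $\bar{P}S$, $R\bar{Q}$, and $RS$, respectively. We write N(A, B; K, L, M; T) to denote the maximum possible degree of the polynomial PQ, as A, B, R, S range over all possible matrices of the stated sizes.

Proposition 1. The degree of PQ is upper bounded by N(A, B; K, L, M; T), where

$$N(A, B; K, L, M; T) = \max\{N_1, N_2, N_3, N_4\},$$
$$N_1(K, L, M; T) = (KL + 1)D - 4, \qquad N_2(K, L, M; T) = (K + T - 2)D + M,$$
$$N_3(K, L, M; T) = (KL - K + T - 1)D + M, \qquad N_4(K, L, M; T) = 2(T - 1)D.$$

Proposition 2. The following are equivalent.
1. $T > K$.
2. $N_3(K, L, M; T) > N_1(K, L, M; T)$.
3. $N_4(K, L, M; T) > N_2(K, L, M; T)$.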
Since T − K is an integer, we thus have that the following inequalities are equivalent to T > K: This shows that $N_3(K, L, M; T) > N_1(K, L, M; T)$ if and only if T > K. Similarly, using the 2nd and 3rd inequalities just above, we have from which we see that $N_4(K, L, M; T) > N_2(K, L, M; T)$ if and only if T > K.

Proposition 3. The following are equivalent.
1. $T > K(L - 1) + 1$.
2. $N_4(K, L, M; T) > N_3(K, L, M; T)$.
3. $N_2(K, L, M; T) > N_1(K, L, M; T)$.

Proof. We have the following inequalities: from which we deduce that $N_4(K, L, M; T) > N_3(K, L, M; T)$. We now show that $N_2(K, L, M; T) > N_1(K, L, M; T)$. We have: We tabulate (see Table 1) the value of N(K, L, M; T) based on the observations of Propositions 2 and 3.
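The case distinctions behind Propositions 2 and 3 can be verified numerically. The sketch below is an illustration under assumptions, not the authors' derivation: the four degree formulas are derived from an exponent layout assumed to be consistent with Theorem 1 (largest data exponents D(K−1)+M and DK(L−1)+M, largest mask exponent D(T−1)), and the two threshold conditions T > K and T > K(L−1)+1 are the ones suggested by the proofs. The names `max_degrees` and `propositions_hold` are hypothetical.

```python
from itertools import product


def max_degrees(K, L, M, T):
    """Maximal degrees of the four summands of PQ (assumed layout)."""
    D = M + 2
    n1 = D * (K * L + 1) - 4          # data times data
    n2 = D * (K + T - 2) + M          # data of P times mask S
    n3 = D * (K * L - K + T - 1) + M  # mask R times data of Q
    n4 = 2 * D * (T - 1)              # mask times mask
    return n1, n2, n3, n4


def propositions_hold(limit=6):
    """Check both equivalences for all 1 <= K, L, M, T <= limit."""
    for K, L, M, T in product(range(1, limit + 1), repeat=4):
        n1, n2, n3, n4 = max_degrees(K, L, M, T)
        if not ((T > K) == (n3 > n1) == (n4 > n2)):
            return False
        if not ((T > K * (L - 1) + 1) == (n4 > n3) == (n2 > n1)):
            return False
    return True
```

For the parameters of the worked example, `max(max_degrees(3, 2, 3, 3))` gives the degree 31 and `max(max_degrees(3, 2, 3, 6))` gives the degree 50 quoted in the text.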

AB versus $B^t A^t$
We compare the recovery threshold cost of calculating $B^t A^t$ rather than AB. It can be shown that it is always better to calculate AB whenever K ≥ L. That is, we show that $N(A, B; K, L, M; T) \le N(B^t, A^t; L, K, M; T)$ for K ≥ L. We consider all possible cases for the maximal degree in the following two theorems and remarks. Theorem 2. 1.

Proof.

1. Since T > K, and T < K(L − 1) + 1 by Propositions 2 and 3 we have that and so $N(A, B; K, L, M; T) = N_3(K, L, M; T)$. Similarly, since T > L, and T < L(K − 1) + 1, we have that $N(B^t, A^t; L, K, M; T) = N_3(L, K, M; T)$. Clearly, L < K if and only if:

2. By Propositions 2 and 3, the assumptions K ≥ T and T < K(L − 1) + 1 imply that $N(A, B; K, L, M; T) = N_1(K, L, M; T)$, while the assumptions T > L and T < L(K − 1) + 1 yield that $N(B^t, A^t; L, K, M; T) = N_3(L, K, M; T)$. Clearly, since T > L, we have M < D(T − L) and

3. From the given assumptions, by Propositions 2 and 3, we have $N(A, B; K, L, M; T) = N_4(K, L, M; T)$ and $N(B^t, A^t; L, K, M; T) = N_3(L, K, M; T)$. Since L(K − 1) + 1 ≥ T, as in the proof of Proposition 3, we have

4. For the given assumptions the statement follows immediately from Propositions 2 and 3.

5. From the given assumptions, by Propositions 2 and 3, we have $N(A, B; K, L, M; T) = N_1(K, L, M; T)$ and $N(B^t, A^t; L, K, M; T) = N_1(L, K, M; T)$. The rest follows immediately from

In this case, from Propositions 2 and 3, we have that $N(A, B; K, 1, M; T) = N_2(K, 1, M; T)$, and so the result follows. (ii) We see that

Remark 2. The remaining two cases lead to a contradiction and can hence never occur. Let T ≤ K and T > K(L − 1) + 1 and T > L(K − 1) + 1. By Remark 1, we have that L = 1 and we obtain the contradiction T ≤ K < T.

T-Collusion
Each query is masked with a polynomial of the form $\sum_{i=0}^{T-1} x^{iD} R_i$, where $R_i$ is chosen uniformly at random. A query is private in the case of T servers colluding if and only if the matrix

$$M(x_1, \dots, x_T) := \begin{pmatrix} 1 & x_1^{D} & \cdots & x_1^{(T-1)D} \\ \vdots & \vdots & & \vdots \\ 1 & x_T^{D} & \cdots & x_T^{(T-1)D} \end{pmatrix}$$

has full rank for any subset $\{x_1, \dots, x_T\}$ of T evaluation points. This is the same as condition C2 in [27].

Because of the very specific set of exponents used, we can give a more explicit condition for the invertibility of this matrix (Proposition 4).

Proof of Proposition 4. $M(x_1, \dots, x_T)$ is a Vandermonde matrix with entries $x_1^D, \dots, x_T^D$.

Proposition 5. A set of elements of $\mathbb{F}_q$ such that their D-th powers are pairwise different has size at most $N = \frac{q-1}{\gcd(q-1, D)} + 1$.

Proof. Fix a generator γ of $\mathbb{F}_q^{*}$. Then the image of the map $x \mapsto x^D$ from $\mathbb{F}_q$ to $\mathbb{F}_q$ is given by 0 together with all powers $\gamma^{Di}$ where 0 ≤ i < q − 1. Since $\gamma^D$ has multiplicative order $(q-1)/\gcd(q-1, D)$, there are exactly that many distinct non-zero D-th powers.

Corollary 1. Let T < q. If gcd(q − 1, D) = 1, then the scheme in Section 3 is secure against T-collusion for any choice of evaluation points.
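The count in Proposition 5 is easy to confirm experimentally for prime fields. The sketch below is illustrative only and assumes q = p is prime, so that arithmetic modulo p realizes the field; extension fields such as $\mathbb{F}_{64}$ need a different representation and are not covered. The function names `distinct_dth_powers` and `max_points` are hypothetical.

```python
from math import gcd


def distinct_dth_powers(p, D):
    """Number of distinct values of x**D as x ranges over the prime field F_p."""
    return len({pow(x, D, p) for x in range(p)})


def max_points(q, D):
    """Bound of Proposition 5: (q - 1) / gcd(q - 1, D) + 1."""
    return (q - 1) // gcd(q - 1, D) + 1
```

When gcd(p − 1, D) = 1, as in Corollary 1, the two functions both return p, which means every element of the field has its own D-th power and any set of evaluation points is admissible.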

Stragglers and Byzantine Servers
Considering the scheme as described in the previous section, we see that the responses are the coordinates of a codeword of a Reed-Solomon code. The polynomial that needs to be interpolated has degree at most N = N(K, L, M; T), and hence N + 1 evaluation points suffice for reconstruction. Any N + 1 evaluation points are admissible and hence we have the following theorem.

Theorem 4. The scheme in Section 3 is straggler resistant against S stragglers if N + 1 + S helper nodes are used.

Proof. The responses can be considered as a codeword in an [N + 1 + S, N + 1, S + 1] RS code, with S erasures. Since S is smaller than the minimum distance of the code, the full codeword and hence the interpolating polynomial can be recovered.

Similarly, we can use additional helper nodes to account for possible Byzantine servers whose responses are incorrect.

Theorem 5. The scheme in Section 3 is resistant against Byzantine attacks of up to B helper nodes if N + 1 + 2B helper nodes are used.

Proof. The responses can be considered as a codeword in an [N + 1 + 2B, N + 1, 2B + 1] RS code, with B errors. Since 2B is smaller than the minimum distance of the code, the full codeword and hence the interpolating polynomial can be recovered.

Combining both theorems gives us the following corollary.

Corollary 2. The scheme in Section 3 is resistant against S stragglers and B Byzantine helper nodes if N + 1 + S + 2B helper nodes are used.
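The node count in Corollary 2 and the erasure argument of Theorem 4 can be sketched as follows. This is a toy illustration under assumptions: the field is the prime field of size 97, the evaluation points are 1, 2, ..., and only the constant coefficient is reconstructed, by Lagrange evaluation at 0, from every possible set of surviving workers. Error correction for Byzantine workers is not implemented. The names `helpers_needed`, `lagrange_eval`, and `tolerates_stragglers` are hypothetical.

```python
from itertools import combinations


def helpers_needed(N, S=0, B=0):
    """Helper nodes for degree N with S stragglers and B Byzantine nodes."""
    return N + 1 + S + 2 * B


def lagrange_eval(xs, ys, x0, p):
    """Value at x0 of the interpolant through (xs, ys) over the prime field F_p."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x0 - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total


def tolerates_stragglers(coeffs, S, p=97):
    """True if any N + 1 of the N + 1 + S responses recover f(0)."""
    N = len(coeffs) - 1

    def f(x):
        return sum(c * pow(x, e, p) for e, c in enumerate(coeffs)) % p

    xs = list(range(1, helpers_needed(N, S) + 1))
    ys = [f(x) for x in xs]
    for kept in combinations(range(len(xs)), N + 1):
        sub_x = [xs[i] for i in kept]
        sub_y = [ys[i] for i in kept]
        if lagrange_eval(sub_x, sub_y, 0, p) != f(0):
            return False
    return True
```

For instance, `helpers_needed(31, S=2, B=1)` evaluates the expression N + 1 + S + 2B for a product polynomial of degree 31 with two stragglers and one Byzantine worker.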

Gaps in the Polynomial
The upper bound on the recovery threshold given by the maximum degree of the product PQ can be improved by using the fact that we need only as many servers as there are non-zero coefficients. Similar to considerations in [9], as a basic observation of linear algebra, we note that only as many evaluation points as there are possible non-zero coordinates are required to retrieve the required matrix coefficients of PQ. Let PQ have degree r − 1 and suppose that q ≥ r + 1. Let $\alpha_1, \dots, \alpha_r$ be distinct elements of $\mathbb{F}_q^{\times}$. Suppose that the zero coefficients of PQ are indexed by I and let i = r − |I|. There exist $j_1, \dots, j_i \in \{1, \dots, r\}$ such that the i × i matrix V, found by deleting the columns of $V(\alpha_{j_1}, \dots, \alpha_{j_i})$ indexed by I, is invertible. Then, each (s, t)-entry of the unknown coefficients of the polynomial $PQ \in \mathbb{F}_q^{a \times b}[x]$ can be retrieved by computing the product

$$V^{-1} \big( [P(\alpha_{j_1})Q(\alpha_{j_1})]_{s,t}, \dots, [P(\alpha_{j_i})Q(\alpha_{j_i})]_{s,t} \big)^t.$$

Theorem 6. The number N of non-zero terms in the product PQ satisfies

Proof. We have $P(x) = \bar{P}(x) + R(x)$ and $Q(x) = \bar{Q}(x) + S(x)$. Recall that $\bar{P}(x)$ and R(x) have disjoint support, as do $\bar{Q}(x)$ and S(x). From Theorem 1, for each $k \in [K]$ and $\ell \in [L]$, the block $C_{k,\ell}$ is the coefficient of $x^h$ with $h = D((k-1) + K(\ell-1)) + M + 1$. Clearly, each such h satisfies $h \equiv M + 1 \bmod D$. The degrees of terms arising in the product PQ are given by

$$D(i + Kz) + j + y + 2, \qquad (7)$$
$$D(i + t) + j + 1, \qquad (8)$$
$$D(u + Kz) + y + 1, \qquad (9)$$
$$D(u + t), \qquad (10)$$

for $i \in \{0, \dots, K-1\}$, $z \in \{0, \dots, L-1\}$, $j, y \in \{0, \dots, M-1\}$ and $u, t \in \{0, \dots, T-1\}$. The sequence (7) corresponds to terms that appear in the product $\bar{P}\bar{Q}$. By inspection, we see that no element θ in any of the sequences (8)-(10) satisfies $\theta \equiv -1 \bmod D$: in (8) this would require j = M and in (9) this would require y = M, contradicting our choices of j, y. The total number of distinct terms to be computed is the number of distinct integers appearing in the union $\mathcal{T}$ of the elements of the sequences (7)-(10). Let $U_0$ denote the set of integers appearing in (7). Observe that $U_0 = \{2, \dots, (LK + 1)D - 4\}$, unless M = 2, in which case $U_0 = \{j : 2 \le j \le 4LK, \ j \not\equiv 1 \bmod 4\}$. Consider the set We make the following observations with respect to U.
Consider the following sets.

Clearly, $U_1$ comprises the elements of the sequence (8) and the members of $U_3$ are exactly those of the sequence (10). For T ≥ K + 1, we have in which case $U_2$ is exactly the set of elements of (9). It follows that

Suppose first that M > 2. We thus have that $U = \mathcal{T}$ if L ≥ 2 and T ≤ K, or if L = T = 1; in either of these cases, PQ has at most non-zero terms. We summarize these observations as follows.
− 2 and so, applying inclusion-exclusion, we see that, if L ≥ 2, then In the case L = 1, we have $U_2 \subseteq U_1$, while if T ≤ K then the elements of (9) are contained in U. Therefore, $\mathcal{T} = U \cup U_1 \cup U_3$ and so for T ≥ 2 we have Similar to previous computations, we see $|\mathcal{T}|$ takes the same values as in the case for M > 2. If L ≥ 2 and T ≥ K + 1 then $\mathcal{T} = U_0 \cup U_2 \cup U_3$. Again using similar computations as before, we see in this case that $|\mathcal{T}|$ takes the same values as in the case for M > 2. Suppose that L ≥ 2 and T ≤ K. In this case, the integers appearing in (9) comprise the set We have $|U_0| = 3KL$ and moreover,

We will compute the product AB using 32 helper nodes, assuming that T = 3 servers may collude. Choose a pair of polynomials $R(x) = R_1 + R_6 x^5 + R_{11} x^{10}$ and $S(x) = S_1 + S_6 x^5 + S_{11} x^{10}$, whose non-zero matrix coefficients are chosen uniformly at random over $\mathbb{F}_q$. We have

$$\bar{P}(x) = A_{1,1}x + A_{1,2}x^2 + A_{1,3}x^3 + A_{2,1}x^6 + A_{2,2}x^7 + A_{2,3}x^8 + A_{3,1}x^{11} + A_{3,2}x^{12} + A_{3,3}x^{13},$$
$$\bar{Q}(x) = B_{3,1}x + B_{2,1}x^2 + B_{1,1}x^3 + B_{3,2}x^{16} + B_{2,2}x^{17} + B_{1,2}x^{18}.$$

Define $P(x) := \bar{P}(x) + R(x)$ and $Q(x) := \bar{Q}(x) + S(x)$. In Table 2, we show the exponents that arise in the product P(x)Q(x). The monomials corresponding to the computed data are 4, 9, 14, 19, 24, 29, shown in blue. The coefficients of $x^4, x^9, x^{14}, x^{19}, x^{24}$ and $x^{29}$ are, respectively, given by $C_{1,1}, C_{2,1}, C_{3,1}, C_{1,2}, C_{2,2}$ and $C_{3,2}$. Note that the total number of non-zero terms in PQ is LKD + M − 1 = 32, as predicted by Theorem 6. This also corresponds to the case for which PQ has degree $N_1(K, L, M; T) = N_1(3, 2, 3; 3) = 31$, which is consistent with Theorem 2. Therefore, 32 helper nodes are required to retrieve PQ and hence the coefficients $C_{k,\ell}$. If the matrices have entries over $\mathbb{F}_q$ with q = 64, then since gcd(q − 1, D) = gcd(63, 5) = 1, the user can retrieve the data securely in the presence of 3 colluding workers.
Suppose now that we have T = 6 colluding servers. In this case, we have T = 6 > 4 = LK/2 + 1 and L > 1 and so from Theorem 6, we expect the polynomial PQ to have at most (LK + T)D − K(M + L) − 1 = 44 non-zero coefficients. These exponents are shown in the corresponding degree table for our scheme (see Table 3). In this case, to protect against collusion by 6 workers, we require a total of 44 helpers. While the degree of PQ in this case is 50 (see Table 1), the coefficients corresponding to the exponents E = {34, 39, 44, 46, 47, 48, 49} are zero, and hence known a priori to the user. Let α be a root of $x^6 + x^4 + x^3 + x + 1 \in \mathbb{F}_2[x]$, so that α generates $\mathbb{F}_{64}^{\times}$.
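The counts of non-zero coefficients in the two examples (32 for T = 3 and 44 for T = 6) and the set E of exponents with zero coefficient can be reproduced by listing all sums of exponents. The sketch below is an illustration, assuming the exponent layout that is consistent with Theorem 1 and with the example polynomials R(x) and S(x): data exponents D(k−1)+m and DK(ℓ−1)+M+1−m, mask exponents at multiples of D. The function names `product_support` and `known_zero_exponents` are hypothetical.

```python
def product_support(K, L, M, T):
    """Exponents that can carry a non-zero coefficient in P(x)Q(x)."""
    D = M + 2
    noise = {D * t for t in range(T)}
    p_exp = {D * (k - 1) + m
             for k in range(1, K + 1) for m in range(1, M + 1)} | noise
    q_exp = {D * K * (l - 1) + M + 1 - m
             for l in range(1, L + 1) for m in range(1, M + 1)} | noise
    return {e1 + e2 for e1 in p_exp for e2 in q_exp}


def known_zero_exponents(K, L, M, T):
    """Exponents below the maximal degree whose coefficient is always zero."""
    support = product_support(K, L, M, T)
    return sorted(set(range(max(support) + 1)) - support)
```

The size of `product_support(K, L, M, T)` is an upper bound on the number of helper nodes needed when the gaps are exploited, because individual coefficients may still vanish for particular matrices.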
Let V be the 44 × 44 matrix obtained from $V(\alpha^i : i \in [63])$ by deleting the columns and rows indexed by E ∪ {51, . . ., 62}. It is readily checked (e.g., as here, using MAGMA [28]) that the determinant of V is $\alpha^{11}$ and in particular is non-zero. Therefore, we can solve the system to find the unknown coefficients of PQ via the computation $V^{-1} \big( P(\alpha^{i_j})Q(\alpha^{i_j}) : i_j \in [63] \setminus (E \cup \{51, \dots, 62\}) \big)^t$.

We remark that for the case of no collusion, Theorem 6 does not yield an optimal scheme. The proposition below outlines a modified scheme with a lower recovery threshold if secrecy is not a consideration.

Proposition 6. Define the polynomials: The following hold: 1.
For each (i, j

The number N of non-zero terms in the product of these two polynomials satisfies N ≤ KLM + M − 1.

outperforms the SGPD scheme. This comparison of the recovery threshold for the two schemes is well justified since they use the same division of the matrices and will have identical upload and download costs per server. The comparison in Figure 4 with the entangled codes scheme [17] and a newer scheme using roots of unity [26] shows that our new codes have a lower recovery threshold for a low number of colluding servers. Calculating the actual number of servers needed for the entangled scheme requires knowledge of the tensor rank of matrix multiplication. These ranks, or their best known upper bounds, are taken from [29,30]. It should be noted that the scheme in [26] requires that either ((L + 1)(K + T) − 1) | q or (KML + LT + KM + T) | q, where q is the field size. The requirements for our scheme outlined in Proposition 5 and Corollary 1 (i.e., that gcd(q − 1, D) = 1, q > N) are much less restrictive. The comparison with the GASP scheme is less straightforward since the partitioning in GASP has a fixed value of M = 1. The plot in Figure 5 shows the recovery thresholds for the GASP scheme with partitioning K = L = 3M as well as the recovery thresholds of our scheme for K = L = 3 and varying M from 1 to 5.
We compare here with the maximal degree of our scheme, not the number of non-zero coefficients, to show that the variant of our scheme that is able to mitigate stragglers and Byzantine servers achieves much lower recovery thresholds. Fixing K and L to be the same value across this comparison means that the download cost per server is the same for all our schemes and the K = L = 3 GASP scheme. Note that in the M = 1 case, we have identical partition and hence upload cost per server as the K = L = 3 GASP scheme, while for M = 2, we have identical upload cost with the K = L = 6 GASP scheme, and M = 5 corresponds to the K = L = 15 GASP scheme. We can see that the grid partitioning allows for a much lower recovery threshold when the upload cost is fixed. The outer partitioning of the GASP scheme allows for a low download cost per server that makes up for the higher recovery threshold. Explicitly, the outer partition into KM and LM blocks allows for a download rate of $N_{\mathrm{GASP}} \cdot \frac{ab}{M^2}$, where $N_{\mathrm{GASP}}$ is the recovery threshold for the GASP scheme. In contrast, the scheme presented in this paper will have a download rate of Nab if we partition into K × M and M × L blocks. It should be noted, though, that our construction allows explicit control of the field size needed. In contrast, the GASP scheme might have to choose its evaluation points from an extension field (see [9], Theorem 1) if the base field is fixed by the entries of the matrices A and B, or it requires a very large base field. This would greatly increase the computational cost and the rates at all steps of the scheme. For example, for K = 3, L = 3, T = 3, $\mathrm{GASP}_r$ uses N = 22 servers and the exponents for the randomness in one of the polynomials are 9, 10, 12. Then, there are no suitable evaluation points for q = 23, 25, 27, 29, 31, 32, 37, 41, 43 and so for these values of q, an extension field is required.
Furthermore, the scheme presented in this paper can be used in situations where stragglers or Byzantine servers are expected as described in Corollary 2.

Complexity
We summarize the cost of $\mathbb{F}_q$-arithmetic operations and transmission of $\mathbb{F}_q$ elements associated with this scheme, using N servers. We refer the reader to ([25], Table 1) and ([26], Table 1) to view the complexity of other schemes in the literature (note that the costs defined in [25] are normalized). There are various trade-offs in costs depending on the partitioning chosen (the proposed scheme is completely flexible in this respect), ability to handle stragglers and Byzantine servers, and constraints on the field size q.
We remark that additions in general are much less costly than $\mathbb{F}_q$-multiplications in terms of space and time: for example, if $q = 2^{\ell}$, then an addition has space complexity (number of AND and XOR gates) $O(\ell)$ and costs 1 clock in time, while multiplication has space complexity $O(\ell^2)$ and time complexity $O(\log_2(\ell))$ [31,32].
The encoding complexity of our scheme comes at the cost of evaluating the pair of polynomials P(x) and Q(x) each at N distinct elements of $\mathbb{F}_q$. This is equivalent to performing Nr(a + b) (scalar) polynomial evaluations in $\mathbb{F}_q$. Given $\alpha \in \mathbb{F}_q$, the (i, j)-entry of P(α) is an evaluation of an $\mathbb{F}_q$-polynomial with KM + T coefficients, while the (i, j)-entry of Q(α) is an evaluation of an $\mathbb{F}_q$-polynomial with LM + T coefficients. The decoding complexity is the cost of interpolating the polynomial $PQ \in \mathbb{F}_q^{a \times b}[x]$ using N evaluation points, when PQ has at most N unknown coefficients.
The cost of either polynomial evaluation at N points or interpolation of a polynomial of degree at most N − 1 has complexity $O(N \log^2 N \log\log N)$. Therefore, we have the following statement.

1. The encoding phase of the scheme presented in Section 3, using N servers, has complexity $O((a + b) r N \log^2 N \log\log N)$.
2. The decoding phase of the scheme presented in Section 3, using N servers, has complexity $O(ab N \log^2 N \log\log N)$.
3. The total upload cost of the scheme presented in Section 3, using N servers, is r(a + b)N.
4. The total download cost of the scheme presented in Section 3, using N servers, is abN.
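The two communication costs in items 3 and 4 are simple to tabulate for candidate partitions. The helper below is a small sketch that only evaluates the two stated expressions, counting field elements; the function name `communication_costs` is hypothetical.

```python
def communication_costs(a, b, r, N):
    """Total upload and download cost, in field elements, for N servers.

    Each server receives an a x r and an r x b evaluation (upload)
    and returns one a x b product (download).
    """
    upload = r * (a + b) * N
    download = a * b * N
    return upload, download
```

For example, `communication_costs(2, 3, 4, 32)` evaluates both expressions for blocks of size 2 × 4 and 4 × 3 with 32 servers, which makes it easy to compare partitions that share the same N.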

Conclusions
In this work, we addressed the problem of secure distributed matrix multiplication for C = AB in terms of designing polynomial codes for this setting. In particular, we assumed that A and B contain confidential data, which must be kept secure from colluding workers. Similar to some previous work also employing polynomial codes for distributed matrix multiplication, we proposed to deliberately leave gaps in the polynomial coefficients for certain degrees and provided a new code construction which is able to exploit these gaps to lower the recovery threshold. For this construction, we also presented new closed-form expressions for the recovery threshold as a function of the number of colluding workers and the specific number of submatrices that the matrices A and B are partitioned into during encoding. Further, in the absence of any security constraints, we showed that our construction is optimal in terms of recovery threshold. Our proposed scheme improves on the recovery threshold of existing schemes from the literature, in particular for large dimensions of A and a larger number of colluding workers, in some cases even by a large margin.

Figure 1. System model for secure matrix multiplication.

Proposition 4. The matrix $M(x_1, \dots, x_T)$ is invertible if and only if the elements $x_1^D, \dots, x_T^D$ are distinct.

Figure 5. Comparison of the maximal degree with the $\mathrm{GASP}_r$ scheme from [10].

Table 1. Summary table of maximal degree of PQ.

Table 2. Exponents of P(x)Q(x) for K = 3, L = 2, M = 3, T = 3. The monomial exponents which correspond to the computed data are shown in blue. The grey background marks noise exponents.

Table 3. Exponents of P(x)Q(x) for K = 3, L = 2, M = 3, T = 6. The monomial exponents which correspond to the computed data are shown in blue. The grey background marks noise exponents.