1. Introduction
The rapid advancement of big data processing and artificial intelligence has created substantial demand for low-latency computation of large-scale batch matrix multiplication, motivating the coded distributed batch matrix multiplication (CDBMM) problem. CDBMM aims to enable efficient and reliable distributed computation of a sequence of matrix products by leveraging coding techniques to mitigate the impact of stragglers. Specifically, the sources encode two batches of matrices into shares, which are then distributed to N worker nodes. Each worker computes a response from its received shares and sends it to a sink node. Some worker nodes may fail to respond in time due to unforeseen factors such as network latency, hardware failures, or resource contention; these nodes are referred to as stragglers. The sink node must be able to recover the desired matrix multiplication results from the received responses, provided that the number of stragglers remains within a tolerable limit.
Early approaches adopt Maximum Distance Separable (MDS) codes [1] and simply replicate half of the matrices unchanged across all worker nodes without encoding, leading to high computational overhead at the workers. To reduce worker complexity, polynomial codes [2] introduce algebraic encoding via polynomial evaluation, achieving the optimal recovery threshold under a specific matrix partitioning strategy, namely, partitioning the first matrix of each product row-wise and the second column-wise. Subsequent works such as MatDot and PolyDot codes [3] further optimize the trade-off between communication load and recovery threshold by exploiting matrix partitioning along both rows and columns. Generalized PolyDot codes [4] improve the recovery threshold of PolyDot codes by a factor of 2, while Entangled Polynomial (EP) codes [5] generalize polynomial codes to support arbitrary matrix partitioning and achieve the same recovery threshold as Generalized PolyDot. Building on EP codes, the scheme in [6] designs a flexible coding strategy that dynamically adapts to the number of stragglers and optimizes the download cost by utilizing all non-straggling workers. Lagrange Coded Computing (LCC) [7] extends these ideas to general batched computations using polynomial interpolation over finite fields, supporting both straggler resilience and privacy. Generalized Cross-Subspace Alignment (GCSA) codes [8] provide a powerful unifying framework that simultaneously supports arbitrary matrix partitioning and batch processing, achieving the best known performance.
Meanwhile, the growing demand for privacy protection has motivated the study of secure distributed batch matrix multiplication (SDBMM). To prevent adversaries from learning information about the two input batches from the shares distributed to a subset of worker nodes, SDBMM requires that any X colluding workers learn no information about the input matrices. Early secure extensions adapt polynomial codes to the SDBMM setting [9,10]. GASP codes [11] optimize the download cost by tailoring the degrees of the encoding polynomials to the partition parameters and the security threshold X. The authors of [12] propose two SDBMM schemes: one based on structured secret sharing and another built upon CSA codes. Generalized PolyDot codes are extended to SDBMM in [13], offering a tunable trade-off between recovery threshold and communication cost, and this framework is further generalized in [14] to support arbitrary collusion patterns. Bivariate polynomial codes are adapted to SDBMM in [15], balancing upload cost against average worker computation time. More recently, algebraic geometry codes have been employed to construct SDBMM schemes [16]. For scenarios requiring source privacy, where even the sink must remain ignorant of the input matrices, polynomial sharing [17] uses secret sharing among the workers to protect the inputs. However, it requires each worker node to securely exchange intermediate products with all others, resulting in significant inter-worker communication overhead. GCSA-NA codes [18] address this issue by allowing workers to pre-share randomness that is independent of the input matrices, thereby reducing inter-worker communication by orders of magnitude. Finally, the scheme in [19] improves upon X-secure GCSA [8] by achieving a lower recovery threshold for certain values of the security parameter X.
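To make the polynomial-code strategy concrete, the following is a minimal numerical sketch (our own toy example, not the construction of [2] verbatim): the first matrix is split into p row blocks and the second into q column blocks, each worker multiplies two polynomial evaluations, and the sink interpolates the product polynomial from any p·q responses, so N − p·q stragglers are tolerated. Exact arithmetic over small integers stands in for a finite field, and all parameter names (p, q, N) are ours.

```python
import numpy as np

# Toy polynomial-code sketch: compute C = A @ B with N workers,
# tolerating N - p*q stragglers.  All parameters are illustrative.
p, q, N = 2, 2, 6                      # A split into p row blocks, B into q col blocks
rng = np.random.default_rng(0)
A = rng.integers(0, 5, (4, 3)).astype(float)
B = rng.integers(0, 5, (3, 4)).astype(float)

A_blk = np.split(A, p, axis=0)         # row-wise partition of A
B_blk = np.split(B, q, axis=1)         # column-wise partition of B

xs = np.arange(1, N + 1, dtype=float)  # distinct evaluation points, one per worker
# Worker n receives the polynomial evaluations A~(x_n) and B~(x_n):
A_sh = [sum(A_blk[i] * x**i for i in range(p)) for x in xs]
B_sh = [sum(B_blk[j] * x**(p * j) for j in range(q)) for x in xs]

# Each worker multiplies its two shares; entrywise, the responses are
# evaluations of a polynomial of degree p*q - 1 in x.
resp = {n: A_sh[n] @ B_sh[n] for n in range(N)}

# Decode from any p*q = 4 responses (workers 1 and 4 straggle here).
fast = [0, 2, 3, 5]
V = np.vander(xs[fast], p * q, increasing=True)          # Vandermonde system
stacked = np.stack([resp[n].ravel() for n in fast])      # one row per response
coeffs = np.linalg.solve(V, stacked)                     # coefficient i + p*j = A_i B_j

C = np.block([[coeffs[i + p * j].reshape(2, 2) for j in range(q)]
              for i in range(p)])
assert np.allclose(C, A @ B)
```

The exponent pattern i + p·j keeps every useful product A_i B_j in its own coefficient, which is what yields the recovery threshold p·q regardless of N.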
While the aforementioned CDBMM and SDBMM frameworks have achieved remarkable progress in straggler mitigation and security guarantees, they commonly rely on a crucial yet often unrealistic assumption: that every source node can distribute encoded shares to all worker nodes. This full-connectivity assumption greatly simplifies code design but fails to reflect practical distributed computing environments, such as edge computing clusters, federated learning systems, or data centers with hierarchical network topologies, where communication links may be restricted by bandwidth limitations, access control policies, or physical proximity constraints. It is precisely this gap between the idealized CDBMM model and real-world deployment scenarios that motivates our work. When source–worker connectivity becomes partial, conventional global encoding strategies (e.g., EP codes, LCC codes) are no longer directly applicable, since each source can only influence a subset of the workers. This necessitates a new coding paradigm that respects the locality of encoding while preserving straggler tolerance and security, naturally leading to the formulation of the locally encoded secure distributed batch matrix multiplication (LESDBMM) problem.
The LESDBMM problem involves M pairs of source nodes, N worker nodes, and one sink node. The sink node can communicate with all worker nodes, while each pair of source nodes is connected only to a subset of the worker nodes. The communication connectivity between all source nodes and worker nodes is globally known. Each pair of source nodes encodes its two batches of matrices and distributes the generated shares exclusively to its connected worker nodes; such an encoding pattern is referred to as the "local encoding pattern". The worker nodes must remain oblivious to the values of the input matrices: any set of up to X colluding workers must learn no information about them. Each worker node computes a response from the received shares and sends it to the sink node. The sink node must be able to recover the desired matrix products as long as the number of stragglers does not exceed S. This work aims to construct an efficient and straggler-tolerant LESDBMM scheme. The key challenge lies in leveraging the local encoding pattern to design an encoding scheme such that the interference across different encoding subsets is aligned into as few dimensions as possible, thereby minimizing the communication cost required for interference elimination during decoding. Furthermore, the decoding scheme must mitigate the impact of randomly occurring stragglers on the locally encoded structure.
A closely related problem is X-secure T-private linear computation based on graph-based replicated/MDS-coded storage (GXSTPLC) [20]. In the GXSTPLC problem, K messages are partitioned into M message sets, and the messages of each set must be distributed, in a securely coded form, among a subset of the N servers. Any set of X colluding servers must not disclose any information about the stored messages. A user wishes to privately compute a linear combination of all messages. To this end, the user sends queries to the servers and recovers the desired linear combination from the answers returned by the servers. In this process, all servers must remain available, and any T colluding servers must learn nothing about the coefficients of the linear combination. Ref. [20] proposes the first asymptotically capacity-achieving GXSTPLC scheme for replicated storage, based on the idea of cross-subspace alignment (CSA) and a structure inspired by dual generalized Reed–Solomon (GRS) codes, demonstrating the optimality of CSA codes and dual GRS codes for interference alignment across message sets in this setting. Ref. [21] proposes a GXSTPLC scheme for MDS-coded storage based on CSA codes and exploits the idea of the CSA null shaper, rather than dual GRS codes, to enable interference alignment across message sets. When applied to the case of replicated storage, its rate matches the asymptotic capacity established in [20]. In fact, there is a connection between the LESDBMM problem and the GXSTPLC problem. If each matrix of the first batch is a row vector and each matrix of the second batch is a column vector, the desired matrix multiplication degenerates into M batches of pairwise vector inner products. By regarding the entries of the row vectors as the symbols of the messages in each message set and the entries of the column vectors as the corresponding linear-combination coefficients, any GXSTPLC scheme automatically yields an LESDBMM scheme. This work focuses on extending the batch vector inner product scheme yielded by the MDS-GXSTPLC scheme [21] to an LESDBMM scheme that applies to matrix multiplication of arbitrary dimensions and tolerates stragglers.
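The degeneration noted above is easy to sanity-check: when the first matrix is a 1 × k row vector and the second a k × 1 column vector, the matrix product collapses to the scalar inner product. A short check (variable names are ours):

```python
import numpy as np

# A 1 x k "matrix" times a k x 1 "matrix" is a 1 x 1 matrix whose single
# entry is the inner product of the two underlying vectors.
rng = np.random.default_rng(1)
a = rng.integers(0, 7, (1, 5))   # row vector
b = rng.integers(0, 7, (5, 1))   # column vector
assert (a @ b).shape == (1, 1)
assert (a @ b)[0, 0] == np.dot(a.ravel(), b.ravel())
```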
The main contribution of this work is the first LESDBMM scheme based on batch processing. Our scheme utilizes CSA codes and the CSA null shaper to achieve interference alignment across encoding subsets. We evaluate the scheme in terms of the straggler threshold (i.e., the maximum number of tolerable stragglers), upload cost, download cost, encoding complexity, worker node computation complexity, and decoding complexity. By comparing with the baseline scheme, we demonstrate that the optimization of download cost achieved by our scheme is non-trivial. Moreover, by adjusting a design parameter, we can achieve a trade-off between the performance metrics of the encoding and decoding phases. When the problem degenerates to the CDBMM setting, where all source nodes can distribute their shares to all workers, the performance of our scheme matches that of the CSA codes for CDBMM [8]. Hence, our scheme can be viewed as a generalization of the CSA codes for CDBMM to the LESDBMM setting.
The remainder of this paper is organized as follows.
Section 2 formally defines the problem of locally encoded secure distributed batch matrix multiplication.
Section 3 presents the main result.
Section 4 presents the proof of our main result along with an illustrative example.
Section 5 concludes the paper and discusses future research directions.
Notation: Bold symbols denote vectors and matrices, while calligraphic symbols denote sets. Following convention, the empty product is the multiplicative identity and the empty sum is the additive identity. For any two positive integers m ≤ n, [m : n] denotes the set {m, m + 1, …, n}, and we use the shorthand [n] for [1 : n]. For an index set I, X_I denotes the set {X_i : i ∈ I}. For a subset of integers S, S(i) denotes its i-th element in ascending order. For a matrix M and two integers i, j, M(i, j) denotes the element in the i-th row and j-th column of M. 0_{m×n} denotes the zero matrix of size m × n. The notation Õ(·) suppresses polylogarithmic factors; the exact polylog factor depends on whether the finite field supports the Fast Fourier Transform (FFT).
2. Problem Statement
Consider the LESDBMM problem shown in Figure 1, with M pairs of source nodes and N worker nodes. Within each pair, the first source node generates a batch of matrices, and the second generates a batch of matrices of compatible dimensions, so that the corresponding pairwise matrix products are well defined. A sink node with limited computing power demands these products. For this purpose, the sink node requires each source node to encode and send its matrices to the N worker nodes and have them assist in computing the matrix multiplications. Due to constrained communication links between the source nodes and the worker nodes, each pair of source nodes can only send their encoded matrices to a subset of the N worker nodes. The collection of these subsets is referred to as the "local encoding pattern". We can equivalently define the dual representation of the local encoding pattern: for each worker node n, define the index set of the encoded batches of matrices available at worker node n, i.e., the set of indices m such that worker node n is connected to the m-th pair of source nodes.
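The dual representation can be computed mechanically from the primal one. The sketch below (the pattern and all variable names are ours, purely illustrative) maps each worker to the index set of batches available to it:

```python
# Dual view of a local encoding pattern: conn[m] is the set of workers
# reachable by source pair m; avail[n] is the index set of batches whose
# shares worker n receives.  The example pattern is illustrative.
M, N = 3, 5
conn = {0: {0, 1, 2}, 1: {1, 2, 3}, 2: {2, 3, 4}}   # primal sets, m = 0..2

avail = {n: {m for m in range(M) if n in conn[m]} for n in range(N)}
assert avail == {0: {0}, 1: {0, 1}, 2: {0, 1, 2}, 3: {1, 2}, 4: {2}}
```

Both views carry the same information; the dual one is convenient when reasoning per worker, e.g., about which interference terms meet at a given node.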
Each source node encodes its batch of matrices according to a set of encoding functions and generates one share for each of its connected worker nodes; the share intended for worker node n is sent to that worker. The shares of the M batches of matrices are generated independently across the M pairs of source nodes.
Any group of up to X colluding worker nodes must learn no information about the input matrices; i.e., for any subset of at most X workers, the shares jointly available to them have zero mutual information with the two batches of input matrices.
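One standard way to meet such an X-collusion condition is Shamir-style polynomial sharing: pad the secret with X uniformly random matrix masks so that any X evaluations are jointly uniform. The following sketch is our own illustration, not the scheme of this paper; the field size and all parameters are ours. It shares a matrix and reconstructs it from X + 1 shares by Lagrange interpolation:

```python
import numpy as np

# X-secure sharing sketch: worker n gets A + x_n R_1 + ... + x_n^X R_X
# over GF(p), with R_i uniformly random masks.  Any X shares are jointly
# uniform and reveal nothing about A.  Toy parameters throughout.
p, X, N = 97, 2, 5
rng = np.random.default_rng(2)
A = rng.integers(0, p, (2, 2))
R = [rng.integers(0, p, (2, 2)) for _ in range(X)]   # random masks
xs = range(1, N + 1)                                 # nonzero evaluation points

shares = [(A + sum(R[i] * x**(i + 1) for i in range(X))) % p for x in xs]

# Reconstruction from X + 1 = 3 shares via Lagrange interpolation at 0,
# since the sharing polynomial evaluates to A at x = 0.
pts = [1, 2, 3]                    # evaluation points of shares[0..2]

def lag0(pts, k):                  # Lagrange coefficient l_k(0) mod p
    num, den = 1, 1
    for j, xj in enumerate(pts):
        if j != k:
            num = num * (-xj) % p
            den = den * (pts[k] - xj) % p
    return num * pow(den, -1, p) % p

A_rec = sum(shares[i] * lag0(pts, i) for i in range(3)) % p
assert np.array_equal(A_rec, A)
```

Schemes such as CSA codes enforce the same zero-leakage condition while additionally choosing structured evaluation points so that interference terms align during decoding.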
Upon receiving its shares, each worker node computes a response as a function of those shares and sends it to the sink node.
The sink node must be able to decode the desired products in the presence of up to S stragglers. Let the recovery set denote the index set of the N − S fastest worker nodes that send responses to the sink node. For any such set of responding workers, the sink node applies a decoding function to the received responses and recovers the desired products. We say that the encoding functions, the worker computation functions, and the decoding functions together form an LESDBMM scheme against S stragglers.
To evaluate the performance of an LESDBMM scheme defined above, we consider the straggler threshold, the communication cost, and the computation complexity. The straggler threshold S is defined as the maximum number of tolerable stragglers. The communication cost comprises the upload cost U, the number of symbols contained in all shares sent from the source nodes to the worker nodes, and the download cost D, the number of symbols contained in the responses downloaded by the sink node; here the size of any quantity C is counted in symbols of the underlying finite field. The computation complexity comprises the encoding complexity, the worker node computation complexity, and the decoding complexity, defined respectively as the order of the number of finite field arithmetic operations required to compute the shares of the two batches of matrices, the responses of the worker nodes, and the decoded products, each normalized by the size of the corresponding task.