4.1. Architectural Overview of FlowAgg-FE
FlowAgg-FE is structured as a layered architecture that aligns the cryptographic design of KS-IPFE with the systems concerns of streaming, robustness, and verifiability. The roles are as in Section 3: a population of clients
, an untrusted aggregator
, and two non-colluding helpers
. All parties share public cryptographic parameters and model hyperparameters; only the helpers hold FE function key shares. At a high level, each training round
t consists of three phases: (i) function selection and key materialization, (ii) client-side compression, encoding, and encryption, and (iii) server-side aggregation, threshold decryption, and verification.
In the function selection and key materialization phase, a key authority (which may be an initialization-time trusted party or a distributed setup protocol) runs the KS-IPFE setup algorithm to produce
, as described later in detail. The public key
is disseminated to all clients and to
, while the master secret key
is retained solely for generating function keys. To support changing linear functions on-the-fly (e.g., per-round masks, adaptive weighting, or auditing vectors), the authority issues fresh split keys
tagged with the round identifier
t. Helpers keep only the currently active shares (plus long-lived basis shares for
), which makes revocation as simple as deleting prior-round shares. Synchronization is handled by including the round id in helper responses and rejecting stale shares at
. The per-round key material is
elements per helper, which is negligible compared to per-round ciphertext traffic in our regimes. For a given training regime, the task owner specifies a family of allowable linear functionals over encoded blocks, such as (i) individual coordinates
, (ii) rows of a sketching matrix
used in PaS-Stream, or (iii) secret tag vectors
used for verifiability. For each such vector
, the authority runs
to obtain a pair of key shares
and distributes them to
and
, respectively. Because KS-IPFE keys are splittable and each share is individually simulatable, no single helper can recover inner products alone, yet together they can support decryption for any authorized linear functional. We note that although we model the key authority as a logical role, it is needed only at initialization (and when the authorized function set changes). To remove a single point of failure, the master secret
can be generated and held by a small committee via distributed key generation/threshold secret sharing, and function-key issuance can be made auditable [33]. Alternatively,
can be sealed inside an HSM/TEE so that only a rate-limited key-derivation interface is exposed and no bulk secret material is exportable.
In the client-side phase of round
t, the aggregator broadcasts the current global model
and, when PaS-Stream is enabled, the public sketching matrix
and the quantization configuration (e.g., the bit-width
b and scale ranges used by
). Each client
computes its clipped gradient according to Equation (1), obtaining
with
. Depending on the mode, the payload prior to encryption is either the full vector
(KS-IPFE only) or a compressed representation
that satisfies the unbiasedness property of Equation (8). The client then maps this payload into an integer vector via Equation (3), producing
for some working dimension
k (equal to
d in the full-precision case and to the sketch dimension in PaS-Stream). This vector is partitioned into contiguous blocks of size
D,
, each of which is encrypted independently under
using the KS-IPFE encryption algorithm. For every block, the client also computes a linearly homomorphic tag and a clipping/encoding proof that binds
to a valid, correctly clipped source vector. The collection of ciphertexts, tags, and proofs for all blocks is streamed towards
using a simple, nonce-based framing that preserves block ordering.
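The encode-then-partition step of the client pipeline can be sketched as follows. This is a minimal illustration, not the encoding of Equation (3): the fixed-point rule (scale, then round), the zero padding of the final block, and the names `encode_fixed_point` and `partition_blocks` are assumptions introduced here.

```python
from typing import List


def encode_fixed_point(values: List[float], scale: int) -> List[int]:
    """Map real values to integers by scaling and rounding (assumed encoding rule)."""
    return [round(v * scale) for v in values]


def partition_blocks(vec: List[int], block_size: int) -> List[List[int]]:
    """Split vec into contiguous blocks of length block_size, zero-padding the last one."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    padded = vec + [0] * (-len(vec) % block_size)
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


# Hypothetical payload of working dimension k = 5, split into blocks of size D = 2.
payload = [0.5, -0.25, 0.125, 1.0, -1.0]
blocks = partition_blocks(encode_fixed_point(payload, scale=8), block_size=2)
```

Each entry of `blocks` would then be encrypted, tagged, and proven independently, which is what allows the server to start aggregating before a client has finished uploading.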
The server-side phase begins as soon as
receives the first block ciphertexts. Rather than waiting for all clients,
incrementally forms aggregated ciphertexts for each block index
b by computing weighted sums across the subset of clients whose
b-th block has arrived, as in Equation (14). In parallel,
aggregates the corresponding tags, preserving linearity at the tag level. The aggregated ciphertext pair
and any necessary public metadata are then forwarded to both helpers. Each helper uses its share of the FE keys to compute decryption shares for the relevant function vectors (e.g., the coordinate basis vectors and the secret tag vector
), which are then combined at
to recover (i) the blockwise sums of encoded payloads and (ii) an independently reconstructed tag. The latter is cross-checked against the aggregated tag to detect any tampering or malformed ciphertexts, while the former yields either a block of the aggregated gradient
or, in PaS-Stream mode, a block of the aggregated sketch
. After rescaling and (if needed) applying
,
obtains an estimator of
and performs the model update for round
t.
A key feature of this architecture is that ciphertext aggregation and verification commute with functional decryption. Because KS-IPFE is additively homomorphic with respect to ciphertexts (Equation (4)) and the tags are linearly homomorphic (Equation (10)),
can aggregate encrypted blocks and tags independently, and the helpers can perform decryption on these aggregates without ever seeing per-client plaintexts. This property is crucial for scalability: it ensures that helper workload scales with the number of blocks per round, not with the number of clients, and that verification overhead remains modest.
4.2. KS-IPFE: Key-Splittable LWE-Based Construction
The KS-IPFE component of FlowAgg-FE provides the cryptographic substrate that allows the aggregator and the two helpers to recover only prescribed linear functionals of encrypted updates. It is tailored to the FL setting in Section 3 by (i) operating over high-dimensional encoded vectors arising from Equation (3), (ii) supporting ciphertext aggregation before decryption as in Equation (4), and (iii) splitting every function key into two individually simulatable shares held by non-colluding helpers. This subsection refines the construction in a more algebraic manner, explicitly tracking dimensions, noise terms, and the simulation-based security view.
We fix a block length
and view each FE plaintext as a block
. A working vector of dimension
k (either
for full gradients or
k equal to the sketch dimension in PaS-Stream) is partitioned into
blocks, so that the
b-th block is denoted
for
. Typical choices such as
balance the amortization of gadget-based packing with the number of function keys that must be instantiated per round. All arithmetic is in the residue ring
, where
is chosen together with the LWE dimension
and error distributions
to satisfy the noise budget in Equation (6). We write
m for the width of the public matrix
A, with
.
The construction follows a dual-style LWE template with gadget packing. Let be a block-diagonal gadget matrix implementing base-2 decomposition with digits per coordinate. Concretely, with D blocks , so that for any , the product represents the expansion of each coordinate of in base 2. There exists a (not necessarily unique) left inverse such that on the range of interest (i.e., for coefficients below the wrap-around threshold). This allows us to pass back and forth between the “digit” domain and the original coordinate domain inside the decryption algorithm.
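To make the digit-domain round trip concrete, the following sketch implements a base-2 gadget expansion and its left inverse for non-negative coordinates. The function names, the least-significant-first digit order, and the restriction to non-negative inputs are assumptions made for illustration; the paper's exact gadget matrix and range conditions are not reproduced here.

```python
from typing import List


def gadget_decompose(x: List[int], num_digits: int) -> List[int]:
    """Expand each coordinate of x into num_digits base-2 digits (least significant first)."""
    digits: List[int] = []
    for coord in x:
        if not 0 <= coord < (1 << num_digits):
            raise ValueError("coordinate outside the representable range")
        digits.extend((coord >> j) & 1 for j in range(num_digits))
    return digits


def gadget_recompose(digits: List[int], num_digits: int) -> List[int]:
    """Left inverse on the valid range: weight each digit by its power of two and sum."""
    return [
        sum(d << j for j, d in enumerate(digits[i:i + num_digits]))
        for i in range(0, len(digits), num_digits)
    ]


# Recomposition is linear, so it also applies to sums of digit vectors.
digits_a = gadget_decompose([5], 3)
digits_b = gadget_decompose([3], 3)
digit_sum = [a + b for a, b in zip(digits_a, digits_b)]
```

Because recomposition is a linear map, applying it to an aggregate of digit vectors returns the aggregate of the underlying coordinates, provided no wrap-around occurs.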
Setup. In the setup phase, the authority samples
where the entries of
S and
E are independent draws from subgaussian integer distributions
, respectively (e.g., discrete Gaussians with parameter
and
). We define
and publish the master public key
while retaining the master secret key
. Under the LWE assumption at parameters
, the joint distribution
is computationally indistinguishable from
, where
is uniform; consequently,
reveals no information about
S beyond what is implied by the security parameter.
Key generation and splitting. To authorize decryption of inner products with a block-level function vector
, the authority first embeds
into the gadget domain by computing
We interpret
as the vector of “digit weights” corresponding to the functional
acting on gadget-packed plaintexts. Using
, we compute a base key
Intuitively,
K is the dual secret associated with the function vector
, analogous to the secret key in standard LWE decryption.
We now choose a masking vector
uniformly at random and define the two key shares as
Each helper receives only its own share. The distribution of
(resp.
) is computationally indistinguishable from
, where
is uniform in
, since
W is uniform and independent of
K. In particular, any PPT adversary that corrupts at most one helper learns no additional information about
K beyond what is already implied by
and the allowed function outputs. This observation underlies the threshold property of KS-IPFE: decryption requires cooperation between
and
, while each helper’s view alone can be simulated from public information and oracle access to function outputs.
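The splitting step can be illustrated with plain additive secret sharing modulo q. This is a sketch under the assumption that the two shares take the form (W, K - W) for a uniform mask W; the exact share definition in Equation (12) is not reproduced here, and the modulus and vectors below are toy values.

```python
import secrets
from typing import List, Tuple


def split_key(key: List[int], q: int) -> Tuple[List[int], List[int]]:
    """Split key into two additive shares modulo q using a uniform mask."""
    mask = [secrets.randbelow(q) for _ in key]
    share_1 = mask
    share_2 = [(k - w) % q for k, w in zip(key, mask)]
    return share_1, share_2


def inner_mod(a: List[int], b: List[int], q: int) -> int:
    """Inner product of two integer vectors, reduced modulo q."""
    return sum(x * y for x, y in zip(a, b)) % q


q = 7681                              # toy modulus, far too small for security
key = [12, 7000, 345, 1]              # stand-in for the base key K
ciphertext_part = [5, 9, 2, 7680]     # stand-in for one ciphertext component
s1, s2 = split_key(key, q)

# Each helper computes a share of the inner product; the sum equals the unsplit result.
combined = (inner_mod(s1, ciphertext_part, q) + inner_mod(s2, ciphertext_part, q)) % q
```

Either share on its own is a uniform vector, which is the property the single-helper simulation argument relies on.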
Client encryption. For a single block payload
, derived from either
or its sketch as in Equation (3), client
forms the gadget-packed vector
We regard
as a column vector. The client then samples a fresh randomness vector
and error terms
and computes the ciphertext components
The per-block ciphertext is the pair
. Note that if we define the “ideal” noiseless ciphertext as
, then
differs from
by an additive error vector
.
Ciphertext aggregation. Once
has received ciphertexts for a given block index
b from a subset
of clients in round
t, it uses the fixed-point weights
(encoding the real weights
) to form the aggregated ciphertext
Writing
for the randomness used by
on block
b and
for its gadget-packed plaintext, we can decompose
The aggregated error vectors
remain subgaussian with parameter depending on
; Equation (6) is chosen precisely so that their magnitude is well within the decryption margin.
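The incremental aggregation described above amounts to a running weighted sum modulo q per block index. The sketch below is illustrative only: it treats a ciphertext as a single integer vector (the scheme uses a pair of components), and the class name `BlockAggregator` and the toy modulus are assumptions.

```python
from typing import List


class BlockAggregator:
    """Running weighted sum of ciphertext vectors for one block index, modulo q."""

    def __init__(self, length: int, q: int) -> None:
        self.q = q
        self.acc = [0] * length
        self.contributors: List[str] = []

    def add(self, client_id: str, ciphertext: List[int], weight: int) -> None:
        """Fold one client's ciphertext into the aggregate as soon as it arrives."""
        if len(ciphertext) != len(self.acc):
            raise ValueError("ciphertext length mismatch")
        self.acc = [(a + weight * c) % self.q for a, c in zip(self.acc, ciphertext)]
        self.contributors.append(client_id)


q = 97  # toy modulus
incoming = [("c1", [10, 20, 30], 2), ("c2", [5, 96, 1], 3), ("c3", [0, 1, 2], 1)]
agg = BlockAggregator(length=3, q=q)
for client_id, ct, weight in incoming:
    agg.add(client_id, ct, weight)

# Reference value: the same weighted sum computed in one batch.
batch = [sum(w * ct[j] for _, ct, w in incoming) % q for j in range(3)]
```

The streaming and batch results coincide, so the server never needs to buffer per-client ciphertexts for a block once they have been folded in.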
Threshold decryption and correctness. For each block and each authorized function vector
, helper
with share
computes a decryption share
where
and
as in Equation (12). Summing these shares yields
Denoting the total noise term in Equation (16) by
, we can write
Because
, and
on the relevant range, we have
where we used the identity
modulo
q when no wrap-around occurs. By Equation (6), the subgaussian tails of
ensure that
with probability at least
, so rounding to the nearest integer in the canonical interval
recovers
with overwhelming probability.
By choosing
for
, the helpers can reconstruct all
D coordinates of the blockwise weighted sum
from encrypted inputs. Mapping these coordinates back through the scaling factor
in Equation (3) then yields the corresponding portion of
or of its sketched variant, up to deterministic rounding error controlled by
and the clipping threshold
S.
Security intuition and cost. From a security perspective, KS-IPFE inherits the indistinguishability guarantees of LWE-based IPFE. An IND-style security game for KS-IPFE can be phrased as follows: An adversary chooses two message families
with the same values under all functions for which it holds key shares, receives encryptions of one of the two (chosen at random), along with one key share per function vector, and must guess which family was encrypted. Under the LWE assumption and the simulatable distribution of key shares in Equation (12), any PPT adversary controlling
and at most one helper has at most negligible advantage in this game. Intuitively, replacing
with uniform, then replacing ciphertexts with uniform, and finally replacing the corrupted helper's key share (masked by W) with a uniform vector, in successive hybrids, yields a distribution that depends only on the revealed function outputs.
On the cost side, one KS-IPFE encryption of a block requires forming
and two matrix-vector multiplications
and
, plus additions by small error terms. With NTT-friendly moduli and a structured choice of
, these multiplications can be implemented in
ring operations with a modest constant. Server-side aggregation is linear in the number of ciphertexts actually received and consists of scalar additions in
. Helper-side decryption uses
ring operations per decryption share (one inner product with
and one with
), so for
F distinct function vectors and
B blocks, the total number of decryption operations per round scales as
, independent of
. In the regimes we evaluate in Section 5, a configuration with
,
,
,
, and carefully tuned
yields a per-block decryption failure probability below
and supports several thousand contributing clients per round, while keeping per-client ciphertext sizes within a small constant factor of plaintext updates. More detailed game-based proofs (including the reduction to LWE and simulation of single-helper views) are provided in Appendix B.
4.3. Verifiable Aggregation with Commuting Checks
In addition to confidentiality and function privacy, FlowAgg-FE must enforce that the decryptions released to
coincide with valid linear aggregates of correctly clipped and encoded client updates, even in the presence of Byzantine clients and a potentially adversarial single helper. The verifiability layer is engineered so that all checks commute with the KS-IPFE aggregation pipeline: every operation can be written as an
-linear map applied either before or after ciphertext aggregation, and the corresponding verification conditions are preserved up to negligible statistical or computational error. This section refines the informal description from Section 3 into a more algebraic treatment.
4.3.1. Linearly Homomorphic Tags as a MAC over the FE Plaintext Space
Let
p be a large prime such that
and
p is co-prime with
q, and let
be the finite field of order
p. For each round
t, an authentication key space is defined as
, and a tag space as
. A per-round authentication key is sampled as
and is made available to all clients and to the KS-IPFE key generator (for the purpose of deriving
), but not to
. We treat
as an ephemeral, per-round secret used only to authenticate that the decrypted aggregate matches the client-supplied tags in that round.
is generated by the same (possibly distributed) authority that issues KS-IPFE keys, is delivered to clients over an authenticated channel, and is erased after round
t closes; helpers receive only FE key shares for the corresponding function vector
. The soundness of the LHT check requires that
does not learn
; if a compromised client discloses
to
, then the server could forge tags for that round. We make this collusion caveat explicit and note that confining tag computation to client TEEs (or distributing
only to trusted clients) mitigates it in deployments that require strong server-verifiability even under client–server collusion.
On this basis, we define the block-level message space for tags as
for some bound
that upper bounds the infinity norm of rounded, scaled blocks
produced by Equation (3) and the clipping constraint
. The LHT then induces an almost-linear MAC
For each block
b and client
, we set
For any finite subset of indices
and scalar weights
, we have the exact algebraic identity
which is the blockwise instantiation of Equation (10). In particular, define the aggregated encoded block
Then
exactly, as long as all operations are interpreted modulo
p for the tag and modulo
q for the encoding. When
, we recover the global aggregated tag
used by the protocol.
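The linearity of the tags can be checked numerically with a toy inner-product MAC over a prime field. The tag form (inner product of a secret key vector with the encoded block, reduced modulo p), the choice of prime, and the client data below are assumptions made for illustration rather than the paper's exact definition.

```python
import secrets
from typing import List


def lht_tag(key: List[int], block: List[int], p: int) -> int:
    """Inner-product tag <key, block> mod p (assumed tag form)."""
    return sum(k * m for k, m in zip(key, block)) % p


p = 2**61 - 1  # a Mersenne prime, chosen only for this demo
D = 4          # toy block length
key = [secrets.randbelow(p) for _ in range(D)]

# Hypothetical encoded blocks and integer weights for three clients.
blocks = {"c1": [3, -1, 0, 7], "c2": [-2, 5, 1, 1], "c3": [0, 0, 4, -6]}
weights = {"c1": 2, "c2": 1, "c3": 3}

# Aggregate the blocks first, or aggregate the tags first: both orders agree.
agg_block = [sum(weights[c] * blocks[c][j] for c in blocks) for j in range(D)]
agg_tag = sum(weights[c] * lht_tag(key, blocks[c], p) for c in blocks) % p
```

The server only ever computes `agg_tag` from client-supplied tags; it does not need the key to do so.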
Viewed as a MAC, the unforgeability of this LHT against an adversary not knowing reduces to the hardness of guessing a non-trivial linear relation over on authenticated messages. More precisely, if an adversary outputs such that is not in the -span of previously authenticated messages and , then under the random choice of , the probability of success is at most .
Lemma 1
(LHT statistical hiding). Fix any and sample . Let . If then τ is uniform over ; if then . In particular, when is hidden from , the tag does not leak the gradient norm or other distributional statistics beyond the degenerate event .
4.3.2. FE-Based Cross-Checking of Aggregates
The LHT alone only ensures that tags are consistent with the sum of messages within the tag domain; it does not bind tags to the KS-IPFE ciphertexts. To couple tags to ciphertexts, we reuse the FE plaintext space and instantiate an additional FE key for the same vector .
Let
be as above and let
denote its canonical embedding into
(e.g., by interpreting
as integers in
and viewing them modulo
q). The KS-IPFE key generator computes
with internal gadget expansion
as in Equation (12). These key shares are distributed to
and
.
Given the aggregated ciphertext
for block
b in round
t as in Equation (14) (so
), helper
computes the decryption share for function vector
as follows:
and transmits it to
. By the correctness analysis in Equation (16), their sum
and after gadget inversion and rounding we have, with probability at least
,
Since
and the magnitude of
is bounded by
B, the reduction modulo
q is injective on the relevant range, enabling a well-defined lifting to
as follows:
with
representing
. We then project
into
via the canonical map
(e.g., reduction modulo
p) to obtain
Under the consistency of parameters (specifically,
p and
q sufficiently large relative to
B and the number of clients), we have
with all but negligible probability. Hence the equality
serves as a soundness check tying the FE-decrypted aggregate to the aggregated tags. Any deviation that changes ciphertext contents (e.g., adversarial modification by a client or by a compromised helper) while leaving client-generated tags untouched will, with high probability, violate this equality.
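The lift-then-project step of the cross-check can be sketched as follows, assuming the decrypted value is an integer inner product that is small enough to be recovered by a centered lift modulo q. The moduli, the key and block values, and the function names are hypothetical.

```python
def centered_lift(x: int, q: int) -> int:
    """Representative of x mod q with the smallest absolute value (ties go to +q/2)."""
    r = x % q
    return r - q if r > q // 2 else r


def cross_check(decrypted_mod_q: int, aggregated_tag: int, q: int, p: int) -> bool:
    """Lift the FE output to the integers, project mod p, and compare with the tag."""
    return centered_lift(decrypted_mod_q, q) % p == aggregated_tag % p


q = 2**40   # toy FE modulus
p = 10007   # toy tag modulus (prime)
key = [1234, 9876, 42]       # tag key entries viewed as integers in [0, p)
agg_block = [10, -3, 7]      # aggregated encoded block

true_value = sum(k * m for k, m in zip(key, agg_block))  # integer inner product
decrypted = true_value % q   # what noise-free FE decryption would return
tag = true_value % p         # what the aggregated client tags would give
```

A decryption that has been shifted by even one unit no longer projects to the same residue modulo p, so the comparison fails.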
4.3.3. Commit-and-Prove for Clipping and Encoding
The consistency of tags and FE decryptions guarantees that the aggregator recovers a sum of certain integer vectors, but it does not by itself ensure that those integers arise from correctly clipped and scaled real-valued updates. To enforce semantic correctness of the encoding, each client engages in a commit-and-prove protocol.
Let
be a cyclic group of prime order
p with generators
such that the discrete logarithm
is unknown. For a block index
b, client
defines a bit-decomposition of
into
-bit chunks per coordinate (with
), and commits to each coordinate using a Pedersen-style vector commitment as follows:
where
are fixed, publicly known generators for the message space. The commitment is additively homomorphic as follows:
for any index set
J. This algebra exactly mirrors the linear aggregation in Equation (14) and the tag aggregation above.
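The additive homomorphism of Pedersen-style vector commitments can be demonstrated in a toy group. The sketch below uses the order-1019 subgroup of squares modulo the safe prime 2039; these parameters and generator choices are assumptions for illustration only and offer no security (in particular, discrete logarithms are trivial to compute at this size).

```python
from typing import List

P = 2039            # toy safe prime, P = 2 * Q + 1
Q = 1019            # prime order of the subgroup of squares mod P
H = 4               # blinding generator (a square mod P)
GENS = [9, 25, 49]  # message generators (squares mod P)


def commit(message: List[int], randomness: int) -> int:
    """Pedersen-style vector commitment h^r * prod_j g_j^(m_j) in the order-Q subgroup."""
    c = pow(H, randomness % Q, P)
    for g, m in zip(GENS, message):
        c = (c * pow(g, m % Q, P)) % P
    return c


m1, r1 = [3, 0, 5], 17
m2, r2 = [1, 4, 2], 900

# Multiplying commitments corresponds to adding messages and randomness.
product = (commit(m1, r1) * commit(m2, r2)) % P
summed = commit([a + b for a, b in zip(m1, m2)], r1 + r2)
```

This is the property that lets commitments be aggregated with the same coefficients as ciphertexts and tags.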
Client supplies, for each round, a non-interactive zero-knowledge proof of the following composite statement:
There exists and randomness such that (a) , (b) the blocks are consecutive slices of either or , and (c) for each b we have and the corresponding commitment equals .
Such proofs can be realized using standard inner-product arguments and range proofs (e.g., Bulletproofs-style constructions) that express the clipping condition as a quadratic constraint on the coordinates and the encoding relation as a bounded difference between and . Verification of is achieved by before aggregation, but thanks to homomorphism, commitments may also be aggregated and verified in batch with auxiliary information provided by helpers after decryption.
4.3.4. Commutativity and Asymptotic Overheads
Summarizing the algebraic structure, for each block
b and round
t we have three parallel linear maps as follows:
Each of these maps is
-linear in the sense that the image of the aggregated objects is equal to the aggregation of the images. Consequently, the following diagram commutes (up to negligible decryption error and modulo reductions):
A similar commuting diagram holds for commitments. Our commuting checks enforce that each accepted ciphertext/tag/commitment tuple is well-formed and that the server’s decrypted output equals the linear aggregate of the submitted (bounded, encoded) client updates. This provides an auditable enforcement point for bounded-energy (e.g., clipped) updates, but it does not, by itself, prevent within-bound model-poisoning attacks. In practice, monitoring can be done via (i) per-round rejection/abort rates from failed checks, (ii) aggregate statistics that are safe to reveal (e.g., the number of clipped updates, which can be reported by clients as a single bit), and (iii) standard training diagnostics (loss/accuracy curves) to flag anomalous rounds; robust aggregation or anomaly detection can be layered on top without changing the cryptographic core. This commutativity is the main reason the verification overhead scales with the number of blocks and not with the number of clients:
can aggregate ciphertexts, tags, and commitments using the same coefficients and rely on a constant number of FE decryptions and aggregate-proof verifications per block.
From an asymptotic viewpoint, if denotes the number of blocks per client in round t and F the number of distinct function vectors used for verification (coordinate basis plus and possibly a small number of additional selectors), then:
The number of FE decryptions per round is , independent of .
The number of scalar LHT evaluations per client is , and the communication cost of tags is bits.
The size of commitments and their proofs grows as group elements per client, which is dominated by the KS-IPFE ciphertexts for typical parameter regimes.
Section 5 confirms empirically that, under realistic choices of
D,
,
p, and
q, the verifiability layer adds only a low single-digit percentage to overall runtime and communication while significantly strengthening the integrity guarantees of FlowAgg-FE.
4.4. PaS-Stream: Rate-Adaptive Streaming Without Accuracy Bias
PaS-Stream instantiates a rate-adaptive transmission layer that composes Johnson–Lindenstrauss sketching, stochastic quantization, and KS-IPFE encryption into a single linear operator acting on client updates. Its design is such that (i) the estimator of the target aggregate
remains unbiased in the sense of Equation (8), (ii) the variance introduced by quantization is explicitly controlled, and (iii) partial receipt of blocks and client dropouts manifest as structured linear perturbations rather than protocol failures.
We consider a per-round sketching dimension
k (either fixed or adapted over time) and a public sketching matrix
. For concreteness, one may view
as a subsampled randomized orthonormal transform with entries in
, as in Equation (7), although the analysis below only requires a Johnson–Lindenstrauss-type concentration property. Let
be a stochastic quantizer with
output levels satisfying
for all
, where
is a variance parameter. We lift
to act componentwise on vectors.
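A standard example of an unbiased stochastic quantizer is stochastic rounding to an integer grid, sketched below. The unit grid and the function names are assumptions; the quantizer used in the paper has configurable bit-width and scale ranges, which are not modeled here. Exact rational arithmetic is used so that the unbiasedness can be checked without sampling error.

```python
import math
import random
from fractions import Fraction
from typing import List, Tuple


def rounding_distribution(x: Fraction) -> List[Tuple[int, Fraction]]:
    """Two-point output distribution of stochastic rounding on the integer grid."""
    lo = math.floor(x)
    frac = x - lo
    return [(lo, 1 - frac), (lo + 1, frac)]


def stochastic_round(x: float, rng: random.Random) -> int:
    """Sample one quantized value: round up with probability equal to the fractional part."""
    lo = math.floor(x)
    return lo + (1 if rng.random() < x - lo else 0)


def mean(dist: List[Tuple[int, Fraction]]) -> Fraction:
    return sum((v * prob for v, prob in dist), Fraction(0))


def variance(dist: List[Tuple[int, Fraction]]) -> Fraction:
    mu = mean(dist)
    return sum(((v - mu) ** 2 * prob for v, prob in dist), Fraction(0))


dist = rounding_distribution(Fraction(7, 4))
```

For a unit grid the variance is the product of the fractional part and its complement, so it never exceeds 1/4 per coordinate.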
Client pipeline (round t). For each client
, define the linear compression operator
Client
first computes its clipped gradient
according to Equation (1) and then its sketch
Next, it applies
to obtain a random quantized sketch
with the property that
and
. Combining this with Equations (2) and (8), we have
so the compressed aggregate is an unbiased estimator of the sketched target
. We now partition the
k-dimensional vector
into
contiguous blocks via a family of deterministic selection matrices
, each extracting
D coordinates
By construction,
is the
identity, and thus
. We write
to emphasize that this is the real-valued payload block for KS-IPFE and the tag layer.
Each block
is then encoded into an integer vector and packed using the KS-IPFE plaintext map as follows:
and encrypted into a block ciphertext
according to Equation (13). In parallel,
computes the linearly homomorphic tag
and then produces the clipping/encoding proof for this block. Each triple
consists of a monotone nonce
(e.g.,
encoded as a single integer), the ciphertext
, tag, and proof. These are streamed to
in non-decreasing order of
, and the nonce is bound into
(e.g., by hashing it into the LWE randomness) to prevent replay or reordering attacks. To validate
at scale,
only stores, for each active client
, the largest accepted nonce (or equivalently the next expected block index). This is
counters and can be garbage-collected at round end since nonces are round-scoped; e.g.,
active clients require
bytes for 64-bit counters. If
is deterministically encoded as
, the check reduces to duplicate suppression and does not require long-term per-client history.
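The per-client nonce bookkeeping described above can be sketched as a small round-scoped filter. The class name `NonceTracker`, the use of string client identifiers, and the decision to accept gaps in the nonce sequence (so that a lost block does not stall later ones) are assumptions made for this illustration.

```python
from typing import Dict


class NonceTracker:
    """Per-round replay filter that stores one counter per active client."""

    def __init__(self) -> None:
        self.next_expected: Dict[str, int] = {}

    def accept(self, client_id: str, nonce: int) -> bool:
        """Accept a block only if its nonce is at or beyond the next expected index."""
        expected = self.next_expected.get(client_id, 0)
        if nonce < expected:
            return False  # duplicate or reordered block
        self.next_expected[client_id] = nonce + 1
        return True

    def end_round(self) -> None:
        """Nonces are round-scoped, so all counters can be dropped at round end."""
        self.next_expected.clear()


tracker = NonceTracker()
results = [
    tracker.accept("a", 0),   # first block from client a
    tracker.accept("a", 1),   # next block in order
    tracker.accept("a", 1),   # replay of the same block
    tracker.accept("b", 0),   # independent counter for client b
]
```

State is a single integer per active client, which matches the constant per-client storage noted in the text.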
Server/helpers pipeline and rate adaptation. Upon receiving any subset of block messages from clients, the aggregator maintains, for each block index
b, an active index set
consisting of clients whose block-
b ciphertexts have arrived and passed basic syntactic checks (including proof verification). For that block, it computes the aggregated ciphertext and tag
Note that
can vary across blocks: a client may send early blocks promptly but drop out before later blocks. Let
be a target coverage parameter;
may decide to close block
b once
, discarding any later-arriving contributions to that block. This is the locus of rate adaptation: smaller
accelerates progress at the cost of using fewer client contributions per block. Closing blocks early can correlate contribution with device speed: if slower clients systematically belong to underrepresented groups (a common concern in non-IID settings such as FEMNIST), their effective weight in the aggregate may be reduced. Our sketching and quantization remain unbiased conditional on the received set
, but unbiasedness alone does not guarantee population- or device-level fairness [34,35]. As simple mitigations, one may (i) track each client’s effective participation over time and compensate in
, and/or (ii) periodically run full-coverage rounds (
) to reduce drift.
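The block-closing rule can be sketched as a per-block window that stops accepting contributions once a coverage target is met. The closing criterion used here (a fixed fraction of the expected client population) and the class name `BlockWindow` are assumptions; the paper's exact threshold expression is not reproduced.

```python
import math
from typing import Set


class BlockWindow:
    """Tracks which clients contributed to one block and closes it at a coverage target."""

    def __init__(self, num_clients: int, coverage: float) -> None:
        if not 0.0 < coverage <= 1.0:
            raise ValueError("coverage must lie in (0, 1]")
        self.needed = math.ceil(coverage * num_clients)
        self.received: Set[str] = set()
        self.closed = False

    def offer(self, client_id: str) -> bool:
        """Record a contribution; return False if the block is closed or it is a duplicate."""
        if self.closed or client_id in self.received:
            return False
        self.received.add(client_id)
        if len(self.received) >= self.needed:
            self.closed = True
        return True


window = BlockWindow(num_clients=4, coverage=0.5)
outcomes = [window.offer(c) for c in ["a", "a", "b", "c"]]
```

Logging `window.received` per block over many rounds is one simple way to obtain the per-client participation statistics that mitigation (i) relies on.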
For each closed block
b, the pair
is forwarded to both helpers, who evaluate it under KS-IPFE function keys for the standard basis vectors
and the tag selector
. Using Equation (15), helpers produce shares
for
,
. Aggregating shares and rounding as in Equation (16) yields, with overwhelming probability
so that
Dividing by
and undoing the fixed-point encoding recovers
where the approximation error is due only to rounding in Equation (3). In parallel, the tag decryption under
produces
, which is checked against
as in the previous subsection, ensuring consistency between ciphertext aggregates and tags.
Stacking all blocks, we define the recovered sketched aggregate as
By linearity of
and the unbiasedness of
, conditioning on the sets
we obtain
When all blocks are received,
for all
b, and the last expression collapses to
by
. Under rate adaptation (i.e., some
), the expectation equals
applied to a truncated aggregate in which each coordinate receives contributions only from those clients whose corresponding block arrived before closure. This aligns with the FL semantics where the effective aggregation set is the subset of clients that succeed in uploading their updates before the round deadline. Finally, the model update is computed from
via a pseudo-inverse of the sketch
which is the estimator used in Section 5. In the idealized full-participation regime,
=
, which reduces to
when
has orthonormal rows; in the non-ideal regime,
acts as a linear reconstruction operator for the partially observed sketch. Because decryptions and verifications occur blockwise, the server can begin updating coordinates associated with blocks that have already been closed and validated, while late blocks continue to stream, thereby tolerating both dropouts and heavy-tailed straggler behavior without violating the FE-based confidentiality guarantees.
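When the sketching matrix has orthonormal rows, its pseudo-inverse is its transpose, so reconstruction is a single matrix-vector product. The toy example below uses two rows of a normalized 4x4 Hadamard matrix as a stand-in sketching matrix; this choice, the helper names, and the test vector are assumptions for illustration.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


# Two rows of the normalized 4x4 Hadamard matrix: orthonormal rows, sketch dimension 2.
phi: Matrix = [[0.5, 0.5, 0.5, 0.5],
               [0.5, -0.5, 0.5, -0.5]]

gram = matmul(phi, transpose(phi))          # identity when rows are orthonormal
g: Matrix = [[1.0], [2.0], [3.0], [4.0]]    # stand-in aggregated update
sketch = matmul(phi, g)                     # what the server recovers after decryption
estimate = matmul(transpose(phi), sketch)   # pseudo-inverse reconstruction
```

The estimate is the projection of the update onto the row space of the sketching matrix: it differs from the original vector, but sketching it again reproduces the same sketch.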