A Novel Verifiable Functional Encryption Framework for Secure and Communication-Efficient Distributed Gradient Transmission Management

Tan, Ziya; Pan, Zijie; Liang, Ying; Yang, Shuyuan

doi:10.3390/electronics15050928


¹ School of Management, Guangdong University of Science and Technology, Dongguan 523070, China

² School of Computer Science and Engineering, Guangzhou University, Guangzhou 510006, China

* Authors to whom correspondence should be addressed.

Electronics 2026, 15(5), 928; https://doi.org/10.3390/electronics15050928

Submission received: 18 November 2025 / Revised: 15 December 2025 / Accepted: 16 December 2025 / Published: 25 February 2026

(This article belongs to the Special Issue Advancements in Distributed Intelligent Security Through AI-Driven Solutions)


Abstract

Secure and bandwidth-conscious transmission of model updates is a central bottleneck in distributed machine learning. Existing secure aggregation and homomorphic encryption pipelines either reveal more than the task requires or incur prohibitive computation and communication costs. We introduce a verifiable functional encryption (VFE) framework that releases only the intended linear functions of client gradients while providing end-to-end integrity and privacy guarantees under standard lattice assumptions. Our instantiation, FlowAgg-FE, combines two novel components. First, KS-IPFE, a key-splittable inner-product FE scheme, supports per-round weighted aggregation, vector packing, and on-the-fly function changes without client re-encryption; function keys are distributed across two non-colluding helpers, eliminating a single point of trust and enabling lightweight, homomorphically verifiable tags on decrypted outputs. Second, PaS-Stream is a rate-adaptive encryption-and-compression pipeline that couples sketch-based gradient compression with batched FE ciphertext streaming, ensuring unbiased aggregation in the presence of stragglers and dropouts. We further bind client-side clipping to zero-knowledge range proofs and offer an optional differentially private release layer that composes with FE to yield $(ε, δ)$-privacy. A prototype based on LWE demonstrates practicality across cross-device and cross-silo training: client uplink is reduced by 1.9–3.4× and server CPU time by 1.6× versus state-of-practice encrypted secure aggregation, with accuracy within 0.3% of plaintext baselines and correctness preserved under up to 30% client dropout. These results show that verifiable FE can make secure, communication-efficient gradient transmission viable, in line with the Special Issue’s theme of security and privacy in distributed machine learning.

Keywords: machine learning; functional encryption; gradient transmission

1. Introduction

Distributed machine learning, and federated learning (FL) in particular, have emerged as central paradigms for training models over data that remains on edge devices or organizational silos [1,2,3]. In such settings, each client $C_{i}$ computes a local update or gradient vector $v_{i, t}$ on its private data in round $t$, and an untrusted aggregator seeks only a global statistic such as the weighted sum

$$G_{t} = \sum_{i \in A_{t}} α_{i, t} v_{i, t},$$

where $A_{t}$ is the (possibly random) set of participating clients and $α_{i, t}$ are public aggregation weights. This pattern underlies cross-device FL in large-scale deployments [1,4], as well as cross-silo collaborations among institutions that cannot share raw data.

However, secure and communication-efficient transmission of $v_{i, t}$ remains a key bottleneck relevant to the Special Issue’s theme of security and privacy in distributed machine learning. On the one hand, secure aggregation protocols ensure that the aggregator learns only $G_{t}$ (or a related sum) and not individual $v_{i, t}$ [5]. On the other hand, the bandwidth cost of sending dense, high-dimensional updates, together with cryptographic noise and integrity artifacts, limits the scalability of training, especially for on-device settings with constrained uplink [2,3]. Homomorphic encryption (HE) can support rich computation over encrypted updates [6,7,8], but generic HE-based pipelines incur substantial ciphertext expansion, heavy computation, and often expose the aggregation circuit and its structure to the server.

Existing systems therefore face a tension between privacy, functionality, and efficiency. Secure aggregation based on pairwise masks or additively homomorphic cryptosystems reveals the exact aggregate $G_{t}$ to the server, but it does not provide function privacy: the function being computed is typically fixed (e.g., coordinate-wise sum) and known to all parties [5]. If the task owner wishes to change the function (for example, to aggregate only a subset of coordinates, or to work in a sketched space), new protocol instances or key material are required. Meanwhile, communication-efficient training methods, such as gradient quantization, sparsification, and sketching [9,10,11,12,13,14,15], primarily address bandwidth and do not by themselves guarantee cryptographic privacy of the compressed updates. Finally, integrity mechanisms that protect against Byzantine clients often rely on heavyweight zero-knowledge proofs or verifiable computation that do not commute with the linear aggregation structure of FL [16,17].

Functional encryption (FE) offers an appealing abstraction: decryption keys reveal only the output of a specified function $f$ on ciphertexts, while everything else about the underlying plaintexts remains hidden [18]. Inner-product FE (IPFE) schemes from LWE [19,20,21] instantiate this for linear functions, enabling one to obtain $\langle x, y \rangle$ from an encryption of $x$ and a key for $y$, and nothing else. These constructions align naturally with the linear statistics used in FL, but existing FE work has not directly addressed the system-level challenges of high-dimensional gradient transmission, client dropouts, and streaming under straggler-heavy networks. Moreover, most FE designs assume a single, fully trusted key authority and do not provide practical mechanisms to split decryption capabilities across non-colluding helpers in a way that supports verifiable streaming aggregation.

1.1. Challenges

This work is motivated by the following three intertwined challenges:

(1): Function-private aggregation of high-dimensional updates. The server should learn only the quantities that are strictly necessary for model updates—such as selected coordinates of $G_{t}$ or its image under a sketching operator—and nothing else about individual $v_{i, t}$ . Achieving this with FE requires handling vectors of dimension d in the millions, while maintaining practical key and ciphertext sizes [20,21].
(2): Communication and computation efficiency. Any secure transmission scheme must be competitive, in both bandwidth and latency, with the optimized secure aggregation pipelines deployed in practice [4,5]. This calls for integrating unbiased quantization and sketching [2,9,12,15] directly into the FE layer, preserving linearity so that the estimator of $G_{t}$ remains unbiased after decryption, while keeping ciphertext expansion modest.
(3): Verifiable correctness under partial trust. In realistic deployments, a single aggregator or helper may be compromised. We therefore seek a design where decryption power is split across two non-colluding helpers, and where the server can verify that the decrypted aggregates genuinely reflect the sum of honestly contributed (and properly clipped) updates, without learning any intermediate plaintext. Existing linearly homomorphic authentication and MAC schemes [22,23] provide building blocks, but they need to commute with the FE aggregation path and operate efficiently at gradient scale.

1.2. Our Approach and Contributions

We propose FlowAgg-FE, a verifiable functional encryption framework for secure and communication-efficient gradient transmission in distributed machine learning. At its core are the following two novel algorithmic components:

I.: KS-IPFE. A key-splittable, LWE-based inner-product FE scheme that supports blockwise linear aggregation over high-dimensional encoded gradients, with 2-of-2 threshold decryption across two non-colluding helpers.
II.: PaS-Stream. A rate-adaptive, streaming transmission pipeline that combines Johnson–Lindenstrauss sketching, unbiased quantization, and blockwise FE encryption to enable straggler-tolerant, bandwidth-efficient aggregation.

Together with commuting linearly homomorphic tags and lightweight clipping proofs, these components realize an end-to-end pipeline in which the server learns only the intended linear image of client updates, and can verify aggregate integrity, while clients enjoy reduced uplink. Concretely, our contributions are as follows:

Key-splittable inner-product FE for FL gradients. We design KS-IPFE, a dual-style LWE construction with gadget packing and explicit key splitting across two helpers. Each function key for a vector $y$ is split into two individually simulatable shares; only when combined do they reveal the true inner product with the aggregated ciphertext. This enables thresholded, function-private decryption of $G_{t}$ and on-the-fly function changes (e.g., different coordinate subsets or sketch spaces) without client-side re-encryption [18,20,21].
PaS-Stream: Unbiased, streaming compression integrated with FE. We introduce PaS-Stream, which applies JL-style sketching and unbiased quantization [2,9,15] to clipped client updates, partitions the resulting compressed vectors into blocks, and encrypts each block with KS-IPFE. This preserves linearity and unbiasedness through the encoding and FE layers, yields a rate-adaptive stream that tolerates stragglers and dropouts, and lowers per-round uplink by up to $3.4 \times$ compared to an optimized encrypted secure aggregation baseline.
Commuting verifiability via linearly homomorphic tags. We co-design linearly homomorphic tags [22,23] and blockwise clipping proofs that commute with KS-IPFE aggregation. Tags are derived from encoded blocks and cross-checked with FE decryptions under a hidden selector, ensuring that the released aggregates match the sum of properly clipped client updates, even when some clients or a single helper are malicious. This yields a lightweight, scalable integrity mechanism compatible with large-scale distributed training [4].
Implementation and empirical evaluation. We implement FlowAgg-FE with an LWE backend and efficient vector packing, and evaluate it on the CIFAR-10 and FEMNIST tasks [1,2]. Our results show that FE-based transmission can match plaintext and secure aggregation accuracy within $0.3 %$ absolute, while reducing per-client uplink by 1.9–3.4× and cutting server-side CPU time by up to $1.77 \times$ under realistic participation and straggler models.

Overall, FlowAgg-FE makes three conceptual contributions: (i) a key-splittable IPFE interface in which a coalition of $Srv$ with at most one helper has only a simulatable view, (ii) a streaming sketch/compression path that preserves unbiasedness under value-independent dropouts, and (iii) commuting integrity checks (LHT + commitments) that bind the decrypted aggregate to authenticated client inputs.

The rest of the paper is organized as follows. Section 2 discusses related work on functional encryption, secure aggregation, compression for distributed optimization, and verifiable computation. Section 3 formalizes the distributed learning setting, adversarial model, and cryptographic tools. Section 4 presents the KS-IPFE construction and the PaS-Stream protocol in detail. Section 5 reports empirical results on vision and character recognition benchmarks. Section 6 concludes with a discussion of limitations and future directions.

2. Related Work

Functional encryption (FE) offers a paradigm in which decryption keys reveal only specified function outputs on encrypted data, rather than the data itself. The conceptual foundations and formal security notions for FE were articulated by Boneh, Sahai, and Waters [18], while predicate and attribute-focused forms (e.g., inner-product predicates) were developed via pairing-based and lattice-based techniques [20,21,24]. Our framework targets linear function evaluation over high-dimensional client updates, aligning with inner-product FE (IPFE) and its optimized instantiations from standard assumptions [20,21]. In contrast to those works, KS-IPFE introduces a two-share, non-colluding key-splitting interface that enables thresholded functional decryption and verifiable cross-checks without revealing per-client contributions. While multi-party and threshold designs are well-studied for homomorphic and public-key encryption [6,7,8], comparable key-splitting for FE targeted to streaming federated workloads has not been systematized in prior works.

2.1. Functional and Homomorphic Encryption for Linear Evaluation

IPFE constructions from LWE provide succinct linear evaluation under worst-case hardness assumptions [19,20,21]. These schemes expose only $\langle x, y \rangle$ for chosen $y$, complementing classical homomorphic encryption (HE), which permits generic circuit evaluation but with heavier ciphertext growth and resource demands [6,7,8]. Our design adopts an LWE-style dual form with gadget packing, exploiting ciphertext additivity to aggregate client streams before functional decryption, thus revealing only $G_{t}$ rather than any individual $v_{i, t}$. Compared to HE-based secure aggregation pipelines, FE avoids publishing aggregation circuits, maintains function privacy (hiding $y$ unless authorized), and enables on-the-fly switching of $y$ with re-keying at the helpers’ end rather than client-side re-encryption [7,20,21]. Classic IPFE gives any holder of ${sk}_{y}$ the ability to evaluate $\langle x, y \rangle$ on each ciphertext, which is unsuitable for FL where no single entity should learn per-client values. Our KS-IPFE adds a 2-of-2 split with individually simulatable shares, enabling deployment where $Srv$ can collude with at most one helper without breaking confidentiality [25,26].

2.2. Secure Aggregation and Federated Learning at Scale

Secure aggregation protocols compute sums of client-held vectors without disclosing the summands and are a cornerstone of federated learning deployments [4,5,27,28]. FedAvg and its communication-efficient variants established cross-device FL as a practical training modality with client-side clipping and partial participation [1,2,3]. Our FlowAgg-FE differs in two aspects. First, we cryptographically restrict disclosure to the linear image chosen by the task owner (e.g., blockwise coordinates or sketched aggregates) via FE keys rather than bespoke MPC masks [5]. Second, our PaS-Stream couples unbiased sketching with blockwise FE to make aggregation rate-adaptive and resilient to stragglers, integrating naturally with production-grade FL orchestration [4,28,29,30].

2.3. Communication Reduction: Quantization, Sparsification, and Sketching

A large body of literature reduces uplink/round-trip costs by compressing gradients through quantization, sparsification, and randomized projections [2,9,10,11,12,13,14,15]. QSGD provides unbiased quantization with convergence guarantees [9]; 1-bit SGD and sign-based methods achieve extreme compression but may introduce bias unless error feedback is used [10,13,14]. Random projections in the Johnson–Lindenstrauss (JL) family preserve geometry with small distortion [15] and underpin sketched updates in FL [2]. PaS-Stream composes JL-type sketching with unbiased quantization while retaining linearity through the encoding and FE layers, so the estimator of $G_{t}$ remains unbiased after decryption. Unlike prior compression-only methods [9,11,12], we cryptographically enforce that only the intended linear image of the compressed stream can be recovered.

2.4. Verifiability and Linearly Homomorphic Authentication

Ensuring integrity of aggregated updates is critical under Byzantine behavior. General-purpose succinct NIZKs and range proofs (e.g., Groth–Sahai and Bulletproofs) offer expressive commit-and-prove tools with lightweight verification [16,17]. Our commuting checks instantiate linearly homomorphic tags to validate blockwise sums and bind them to encoded vectors, then cross-check via an FE decryption under a hidden selector. This approach parallels linearly homomorphic authentication/signature lines for linear subspaces and network-coded data [22,23], but tailors them to the FE aggregation pathway so that tags and decryptions agree on the same encoding. The result is a verifiability layer that adds minimal overhead and composes with our thresholded FE path, unlike heavy verifiable computation that would evaluate full-model updates inside a SNARK [16,17,31].

2.5. Summary and Positioning

In summary, our contribution bridges three threads: (i) IPFE from LWE for linear function release with function privacy [18,20,21,32]; (ii) secure aggregation and FL systems engineering [1,3,4,5]; and (iii) communication-efficient compression with unbiased estimators [2,9,12,15]. KS-IPFE provides a key-splittable FE layer that preserves privacy against any single server and supports on-the-fly function changes, while PaS-Stream delivers rate-adaptive, unbiased transmission whose integrity is checked by commuting, linearly homomorphic tags [17,22,23,31]. To our knowledge, this end-to-end co-design—functional encryption + unbiased sketching/quantization + commuting verifiability—has not been previously articulated or evaluated in the FL setting.

3. Preliminaries

This section fixes notation, describes the distributed learning setting, specifies the adversarial model and security goals, and recalls cryptographic and statistical primitives used by our framework. All symbols introduced here are used consistently throughout Section 4, Section 5 and Section 6.

3.1. Notation and System Model

Let $λ$ be the security parameter and $[n] := \{1, \dots, n\}$. Vectors are bold lowercase, matrices bold uppercase, and all default norms are $\ell_{2}$. For a real value $x$, let $\lfloor x \rceil$ denote rounding to the nearest integer. For modulus $q \in Z_{> 0}$ we write $Z_{q}$ for integers modulo $q$ and lift to vectors entrywise.

We consider cross-device or cross-silo training over $n$ clients $C = \{C_{i}\}_{i = 1}^{n}$ interacting with an untrusted aggregator $Srv$ and two non-colluding helpers $H_{A}, H_{B}$. Training proceeds in rounds $t = 1, \dots, T$. The global model is $w_{t} \in R^{d}$. In round $t$, a subset $A_{t} \subseteq [n]$ of available clients computes clipped gradients

$$v_{i, t} = \mathrm{clip}(\nabla_{w} \ell_{i}(w_{t}), S) \quad \text{with} \quad \| v_{i, t} \|_{2} \leq S, \tag{1}$$

for a fixed clipping threshold $S > 0$. The task-relevant linear functional released in each round is the weighted aggregate

$$G_{t} = \sum_{i \in A_{t}} α_{i, t} v_{i, t} \in R^{d}, \tag{2}$$

for public weights $α_{i, t} \in R$. Our framework reveals only $G_{t}$ (or a privatized variant) to $Srv$; no other function of any $v_{i, t}$ should be learned.
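As a plaintext reference point for Equations (1) and (2), the following minimal sketch computes clipped updates and their weighted aggregate with no cryptography involved. The norm-rescaling form of `clip`, the client data, and the weights are illustrative assumptions; the text fixes only the constraint $\| v_{i, t} \|_{2} \leq S$.

```python
import math

def clip(g, S):
    """Rescale g so that its l2 norm is at most S (one common clipping rule; assumed here)."""
    norm = math.sqrt(sum(x * x for x in g))
    if norm <= S or norm == 0.0:
        return list(g)
    return [x * (S / norm) for x in g]

def weighted_aggregate(updates, weights):
    """G_t = sum over participating clients of alpha_i * v_i (Equation (2))."""
    d = len(next(iter(updates.values())))
    G = [0.0] * d
    for i, v in updates.items():
        for j in range(d):
            G[j] += weights[i] * v[j]
    return G

# Hypothetical round with three available clients and d = 2.
S = 1.0
raw = {1: [3.0, 4.0], 2: [0.3, 0.4], 3: [-6.0, 8.0]}
clipped = {i: clip(g, S) for i, g in raw.items()}
alpha = {1: 0.5, 2: 0.25, 3: 0.25}  # public aggregation weights (made up)
G_t = weighted_aggregate(clipped, alpha)
```

In the framework itself, $Srv$ should only ever see a value like `G_t`, never the entries of `clipped`.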

To interface with lattice cryptography, clients encode $v_{i, t}$ into $x_{i, t} \in Z_{q}^{d}$ via a scaling factor $Δ > 0$:

$$x_{i, t} = \lfloor Δ v_{i, t} \rceil \bmod q, \quad \text{and} \quad \mathrm{dec}(z) = z / Δ. \tag{3}$$

All ciphertexts in round $t$ use the same modulus $q$ and scale $Δ$.

Symbol	Type	Meaning
$C, Srv, H_{A}, H_{B}$	sets/roles	clients, aggregator, helpers
$d, T, S$	integers/real	dimension, rounds, clipping threshold
$w_{t}$	$R^{d}$	global model at round t
$v_{i, t}$	$R^{d}$	clipped gradient of client i
$α_{i, t}$	$R$	aggregation weight for client i
$x_{i, t}$	$Z_{q}^{d}$	encoded gradient (Equation (3))
$q, Δ$	integers/real	modulus, fixed scaling factor
$Φ_{t}, Q_{b}$	matrix, map	sketching matrix, b-bit unbiased quantizer
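The fixed-point encoding of Equation (3) can be illustrated with a short sketch. It assumes that decoding first lifts residues to the centered range around zero so that negative coordinates survive the reduction modulo $q$; the text states only $\mathrm{dec}(z) = z / Δ$, so the `centered` helper and the concrete values of `q` and `delta` are assumptions made for illustration.

```python
def encode(v, delta, q):
    """x = round(delta * v) mod q, applied entrywise (Equation (3))."""
    return [round(delta * x) % q for x in v]

def centered(z, q):
    """Lift a residue in [0, q) to the centered range (assumed decoding convention)."""
    return z - q if z > q // 2 else z

def decode(x, delta, q):
    """dec(z) = z / delta after the centered lift."""
    return [centered(z, q) / delta for z in x]

q = 2**31 - 1     # a prime modulus (toy size)
delta = 2**16     # scaling factor
v1 = [0.6, -0.8]
v2 = [-0.25, 0.5]

# Encodings add modulo q, so the sum of encodings decodes to the sum of inputs
# up to one rounding error of at most 1 / (2 * delta) per summand.
x_sum = [(a + b) % q for a, b in zip(encode(v1, delta, q), encode(v2, delta, q))]
decoded = decode(x_sum, delta, q)
exact = [a + b for a, b in zip(v1, v2)]
```

The additivity used here is what lets encoded vectors be summed before any decoding takes place.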

3.2. Adversarial Model and Goals

We assume an adaptive PPT adversary that may corrupt $Srv$ and any strict subset of $\{H_{A}, H_{B}\}$, but not both helpers simultaneously. While our core model assumes that at most one of $\{H_{A}, H_{B}\}$ colludes with $Srv$, this can be instantiated operationally by placing the helpers under independent administrative domains (e.g., distinct cloud providers) and enforcing separation via auditing/contractual controls. As a defense-in-depth option, helper execution can be confined to Trusted Execution Environments (TEEs) with remote attestation so that key shares are provisioned only to attested code and decryption shares are released only for round-labeled aggregated ciphertexts. Finally, our masking-based key splitting in Equation (12) naturally generalizes to t-of-m helpers via secret sharing, reducing reliance on any single helper at the cost of additional helper messages.

Clients can be Byzantine and may deviate from the protocol (e.g., sending malformed ciphertexts). Network scheduling can cause stragglers and dropouts. Our goals are:

Confidentiality. No adversary controlling $Srv$ and at most one helper learns anything about individual $v_{i, t}$ beyond the value of the allowed function(s) (Equation (2)) and public metadata.
Function privacy. The structure of the function applied to encrypted data is hidden unless explicitly revealed via function keys.
Verifiable correctness. The value output to $Srv$ equals the prescribed function of honestly contributed inputs, except with negligible probability, even if clients or a single helper are malicious.
Robust aggregation. The protocol remains correct under client dropouts; stragglers do not block progress.

Our confidentiality and function-privacy claims are proved in a model where $Srv$ may collude with at most one helper. Concretely, the view of $Srv + H_{A}$ (or $Srv + H_{B}$) can be simulated given only the authorized aggregate outputs because each helper holds only a masked key share that is individually simulatable (Equation (5)). If both helpers collude with $Srv$, the system degrades to standard IPFE and confidentiality of individual contributions is no longer expected. In practice, the non-collusion assumption can be approximated by placing helpers in independent administrative domains or by confining helper logic to TEEs with attestation; we also note that the key-splitting technique extends to t-of-m helpers to reduce reliance on any single helper at the cost of extra helper messages.

3.3. Functional Encryption Background

A functional encryption (FE) scheme for a message space $M$ and function family $F$ consists of four PPT algorithms $(Setup, KeyGen, Enc, Dec)$, where $Setup(1^{λ}) \to (mpk, msk)$; $KeyGen(msk, f) \to {sk}_{f}$ for $f \in F$; $Enc(mpk, m) \to ct$ for $m \in M$; and $Dec({sk}_{f}, ct) \to f(m)$. The security notion guarantees that $ct$ reveals nothing about $m$ beyond what is implied by the outputs $f(m)$ for the keys ${sk}_{f}$ that the adversary holds.

Inner-Product FE (IPFE). We use a vector space $M = Z_{q}^{d}$ and functions $F = \{x \mapsto \langle x, y \rangle \bmod q : y \in Z_{q}^{d}\}$. An IPFE scheme satisfies ciphertext additivity: for encryptions $ct_{j} \leftarrow Enc(mpk, x_{j})$ under the same public key,

$$Dec({sk}_{y}, \sum_{j} ct_{j}) = \sum_{j} \langle x_{j}, y \rangle \bmod q. \tag{4}$$

Equation (4) lets the aggregator homomorphically combine client ciphertexts before functional decryption, ensuring that only an aggregate value is ever revealed. We require a 2-of-2 threshold variant in which $KeyGen$ returns a pair of shares $({sk}_{y}^{A}, {sk}_{y}^{B})$ distributed to $H_{A}$ and $H_{B}$. Each helper computes a decryption share $σ^{\circ} \leftarrow DecShare({sk}_{y}^{\circ}, CT)$ on an aggregate ciphertext $CT$, and a public combiner $Comb$ outputs the result

$$Comb(σ^{A}, σ^{B}) = \langle \sum_{j} x_{j}, y \rangle \bmod q, \quad \text{with } σ^{A} \text{ and } σ^{B} \text{ individually simulatable}. \tag{5}$$

No single helper can learn the function value alone. Classic IPFE gives any holder of ${sk}_{y}$ the ability to evaluate $\langle x, y \rangle$ on each ciphertext, which is unsuitable for FL where no single entity should learn per-client values. Our KS-IPFE adds a 2-of-2 split with individually simulatable shares (Equation (5)), enabling deployment where $Srv$ can collude with at most one helper without breaking confidentiality; the cost is an explicit helper-separation assumption that we analyze in Section 3.2.
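To make the combine property of Equation (5) concrete, the toy below splits a function vector $y$ into two additive masks modulo $q$ and recombines the two partial inner products. This shows only the algebra of a 2-of-2 split and is not the KS-IPFE construction: there is no encryption, the helpers here touch the plaintext aggregate `X` directly (which the real helpers never do), and the names `split_key`, `dec_share`, and `comb` are illustrative stand-ins.

```python
import secrets

def inner(x, y, q):
    """Inner product modulo q."""
    return sum(a * b for a, b in zip(x, y)) % q

def split_key(y, q):
    """Additive masking: y = y_A + y_B (mod q) with y_A uniform, so each share alone is uniform."""
    y_A = [secrets.randbelow(q) for _ in y]
    y_B = [(b - a) % q for a, b in zip(y_A, y)]
    return y_A, y_B

def dec_share(y_share, X, q):
    """Partial result computed from a single key share."""
    return inner(X, y_share, q)

def comb(sigma_A, sigma_B, q):
    """Public combiner: adds the two partial results modulo q."""
    return (sigma_A + sigma_B) % q

q = 2**31 - 1
X = [12, 7, 0, 5]   # hypothetical aggregated encoded block
y = [1, 0, 0, 1]    # function vector selecting coordinates 1 and 4

y_A, y_B = split_key(y, q)
sigma_A = dec_share(y_A, X, q)
sigma_B = dec_share(y_B, X, q)
result = comb(sigma_A, sigma_B, q)   # equals <X, y> mod q = 17
```

Because `y_A` is drawn uniformly and independently of `y`, either share on its own is distributed like random noise, which is the intuition behind shares being individually simulatable.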

3.4. LWE Tools and Encoding

Our constructions are LWE-based. Let $n_{l}$ be the LWE dimension, $q$ a prime modulus, and $χ$ a discrete Gaussian or subgaussian error distribution over $Z$. A sample $(a, b) \in Z_{q}^{n_{l}} \times Z_{q}$ is drawn as $a \leftarrow_{R} Z_{q}^{n_{l}}$, $b = \langle a, s \rangle + e \bmod q$ for secret $s \leftarrow_{R} Z_{q}^{n_{l}}$ and noise $e \leftarrow χ$. The hardness of distinguishing such samples from uniform is the LWE assumption at parameters $(n_{l}, q, χ)$.

Additive homomorphism. LWE encryption of messages in $Z_{q}$ is additively homomorphic: summing ciphertexts componentwise produces a valid encryption of the sum with controlled noise growth. Vector encryption follows by componentwise encoding or via gadget decomposition. With the encoding of Equation (3), the post-decryption rescaling by $Δ^{-1}$ recovers real-valued aggregates with bounded rounding error.
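The additive homomorphism can be seen in a toy secret-key LWE scheme. The sketch below is not the paper's scheme and is not secure: the dimension is tiny, the noise comes from Python's non-cryptographic `random` module, and the plaintext is embedded with a separate plaintext modulus `P` and scale `SCALE`, which are assumptions chosen so that decryption by rounding is easy to follow.

```python
import random

Q = 2**31 - 1       # prime modulus (toy size, not a secure parameter)
P = 2**12           # plaintext modulus for this toy (assumption)
SCALE = Q // P      # a message m is embedded as SCALE * m
N = 32              # LWE dimension (toy)

def keygen(rng):
    return [rng.randrange(Q) for _ in range(N)]

def encrypt(s, m, rng):
    """Return (a, b) with b = <a, s> + e + SCALE * m (mod Q)."""
    a = [rng.randrange(Q) for _ in range(N)]
    e = round(rng.gauss(0, 3.2))  # small rounded-Gaussian noise
    b = (sum(ai * si for ai, si in zip(a, s)) + e + SCALE * m) % Q
    return a, b

def add(ct1, ct2):
    """Componentwise sum of two ciphertexts."""
    a1, b1 = ct1
    a2, b2 = ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q

def decrypt(s, ct):
    """Remove <a, s>, then round away the accumulated noise."""
    a, b = ct
    noisy = (b - sum(ai * si for ai, si in zip(a, s))) % Q
    return round(noisy / SCALE) % P

rng = random.Random(0)
s = keygen(rng)
msgs = [17, 42, 5, 99, 1]
cts = [encrypt(s, m, rng) for m in msgs]

agg = cts[0]
for ct in cts[1:]:
    agg = add(agg, ct)
total = decrypt(s, agg)   # decrypts to the sum of the messages
```

Decryption of the sum succeeds as long as the summed noise stays below `SCALE / 2` and the message sum stays below `P`, which is the kind of headroom that noise budgeting has to reserve.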

Noise budgeting. Let $η$ upper bound the decryption noise after summing at most $B$ ciphertexts in a round. We pre-allocate $q$ and $Δ$ so that

$$η \ll q / 4 \quad \text{and} \quad Δ S \ll q / 8, \tag{6}$$

ensuring correctness for $B \leq |A_{t}|$. KS-IPFE decryption is exact on the encoded integers (up to negligible failure probability): LWE noise is only a correctness concern and does not introduce additional approximation error when rounding succeeds. Consequently, the total estimation error of FlowAgg-FE decomposes cleanly into (i) sketching error (zero-mean, with variance controlled by $k$ and the choice of $Φ_{t}$) and (ii) quantization/encoding error from $Q_{b}$ and scaling by $Δ$. In particular, for any coordinate value $u$, the integer encoding/decoding contributes a deterministic rounding term bounded as $|ε_{rnd}| \leq 1 / (2 Δ)$, while the stochastic quantizer remains unbiased, $E[ε_{q}] = 0$, with variance controlled by $b$. This addresses the worst-case numerical stability: no additional approximation is introduced by FE beyond the controlled quantization/sketching error.

3.5. Compression and Streaming

To reduce uplink, we use sketching and unbiased quantization before encryption. Let $Φ_{t} \in \{\pm 1 / \sqrt{k}\}^{k \times d}$ be a per-round public Johnson–Lindenstrauss transform with $k \ll d$. Define the sketch

$$s_{i, t} = Φ_{t} v_{i, t} \in R^{k}. \tag{7}$$

Let $Q_{b}$ be a stochastic quantizer that maps each real coordinate to one of $2^{b}$ levels, with $E[Q_{b}(z) \mid z] = z$ and bounded variance $V[Q_{b}(z)] \leq σ_{b}^{2}$ (e.g., randomized rounding with per-block scale). The transmitted payload encodes $u_{i, t} = Q_{b}(s_{i, t})$. Unbiasedness preserves aggregated expectations:

$$E[\sum_{i \in A_{t}} α_{i, t} u_{i, t}] = \sum_{i \in A_{t}} α_{i, t} s_{i, t} = Φ_{t} G_{t}. \tag{8}$$

A rate-adaptive streaming interface breaks $u_{i, t}$ into fixed-size chunks $u_{i, t}^{(b)}$ that are each encoded to $x_{i, t}^{(b)}$ and encrypted, enabling partial aggregation when stragglers drop.
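The sketch-then-quantize step can be prototyped in a few lines. The code below assumes a dense random sign matrix for $Φ_{t}$ and stochastic rounding onto $2^{b}$ evenly spaced levels over a symmetric range; the range `hi` is set from the Cauchy–Schwarz bound $|s_{j}| \leq \sqrt{d / k} \, S$, which holds because each row of $Φ_{t}$ has norm $\sqrt{d / k}$. The helper names and all parameter values are illustrative, not taken from the paper's implementation.

```python
import math
import random

def rademacher_sketch(k, d, rng):
    """k x d matrix with independent entries +1/sqrt(k) or -1/sqrt(k)."""
    c = 1.0 / math.sqrt(k)
    return [[c if rng.random() < 0.5 else -c for _ in range(d)] for _ in range(k)]

def matvec(Phi, v):
    return [sum(p * x for p, x in zip(row, v)) for row in Phi]

def quantize(z, b, lo, hi, rng):
    """Stochastic rounding of z in [lo, hi] onto 2^b evenly spaced levels, so E[Q(z)] = z."""
    step = (hi - lo) / (2**b - 1)
    pos = (z - lo) / step
    low = math.floor(pos)
    round_up = rng.random() < (pos - low)  # round up with probability equal to the fractional part
    return lo + (low + (1 if round_up else 0)) * step

rng = random.Random(7)
d, k, b, S = 64, 8, 4, 1.0
v = [S / math.sqrt(d)] * d          # a clipped update with l2 norm S
Phi = rademacher_sketch(k, d, rng)  # public per-round sketching matrix
s = matvec(Phi, v)                  # sketch, as in Equation (7)
hi = S * math.sqrt(d / k)           # |s_j| <= sqrt(d / k) * S
u = [quantize(z, b, -hi, hi, rng) for z in s]

# Empirical check of unbiasedness on the first sketch coordinate.
trials = 20000
mean_q = sum(quantize(s[0], b, -hi, hi, rng) for _ in range(trials)) / trials
```

Since both `matvec` and the expectation of `quantize` are linear in the input, averaging or weighting happens equally well before or after this step, which is the property Equation (8) relies on.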

3.6. Verifiability Primitives

We rely on two lightweight building blocks that commute with linear aggregation.

Linearly homomorphic tags (LHT). Let $κ \leftarrow_{R} Z_{p}^{d}$ be a secret tag key for a large prime $p$. A client computes

$$τ_{i, t} = \langle \lfloor Δ v_{i, t} \rceil, κ \rangle \bmod p, \tag{9}$$

and sends $τ_{i, t}$ alongside the FE ciphertext. Aggregation preserves linearity as follows:

$$\sum_{i \in A_{t}} α_{i, t} τ_{i, t} \equiv \langle \sum_{i \in A_{t}} α_{i, t} \lfloor Δ v_{i, t} \rceil, κ \rangle \pmod{p}. \tag{10}$$

Choosing $κ$ uniformly makes individual tags pseudorandom to $Srv$. In KS-IPFE we can also decrypt the same aggregate with function vector $κ$ to reproduce the right-hand side of Equation (10) and cross-check consistency without revealing any per-client information.
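The linearity in Equation (10) is easy to check numerically. In the sketch below, the tag modulus is the Mersenne prime $2^{61} - 1$, the encoded blocks and weights are made-up integers, and the weights are assumed to be integers so that the arithmetic modulo $p$ is exact; none of these choices come from the paper.

```python
import secrets

P = 2**61 - 1   # a Mersenne prime used as the tag modulus p (illustrative choice)

def tag(x, kappa):
    """tau = <x, kappa> mod p for an integer-encoded vector x (Equation (9))."""
    return sum(a * b for a, b in zip(x, kappa)) % P

d = 4
kappa = [secrets.randbelow(P) for _ in range(d)]   # secret tag key

# Hypothetical integer-encoded client vectors and integer weights.
xs = {1: [5, -3, 0, 7], 2: [-1, 2, 9, -4], 3: [6, 6, -2, 1]}
alpha = {1: 2, 2: 1, 3: 3}

tags = {i: tag(x, kappa) for i, x in xs.items()}

# Left-hand side of Equation (10): weighted sum of the individual tags.
agg_tag = sum(alpha[i] * tags[i] for i in xs) % P

# Right-hand side: tag of the weighted sum of the encoded vectors.
agg_x = [sum(alpha[i] * xs[i][j] for i in xs) for j in range(d)]
expected = tag(agg_x, kappa)

# A tampered aggregate fails the check unless <difference, kappa> = 0 mod p.
tampered = list(agg_x)
tampered[0] += 1
tamper_detected = tag(tampered, kappa) != agg_tag
```

Here `agg_tag == expected` always holds, while the tampered aggregate passes only if `kappa[0]` happens to be zero modulo `P`.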

Range and clipping proofs. Each client binds its encoded vector $x_{i, t}$ to a commitment $Com(x_{i, t}; r_{i, t})$ and proves in zero-knowledge that

$$\| v_{i, t} \|_{2} \leq S \quad \text{and} \quad x_{i, t} = \lfloor Δ v_{i, t} \rceil \bmod q,$$

using a commit-and-prove system with linear relations. These proofs are additively aggregatable: verifier checks succeed on the sum of commitments when all individual statements hold, matching the FE aggregation path.

4. Methodology

We now instantiate FlowAgg-FE with a concrete key-splittable inner-product functional encryption (KS-IPFE) scheme and a streaming transmission pipeline (PaS-Stream) that together realize the goals set out in Section 3. Unless stated otherwise, the plaintext vectors handled by FE in this section are the encoded payloads derived from either the full clipped gradients $v_{i, t}$ via Equation (3) or, when PaS-Stream is active, from the unbiased sketches $u_{i, t} = Q_{b}(Φ_{t} v_{i, t})$ via the same encoding. Viewing all FE plaintexts through this encoding lens keeps Equations (4)–(10) consistent and lets us reason about correctness and security directly at the level of encoded blocks.

4.1. Architectural Overview of FlowAgg-FE

FlowAgg-FE is structured as a layered architecture that aligns the cryptographic design of KS-IPFE with the systems concerns of streaming, robustness, and verifiability. The roles are as in Section 3: a population of clients $C = \{C_{i}\}$, an untrusted aggregator $Srv$, and two non-colluding helpers $H_{A}, H_{B}$. All parties share public cryptographic parameters and model hyperparameters; only the helpers hold FE function key shares. At a high level, each training round $t$ consists of three phases: (i) function selection and key materialization, (ii) client-side compression, encoding, and encryption, and (iii) server-side aggregation, threshold decryption, and verification.

In the function selection and key materialization phase, a key authority (which may be an initialization-time trusted party or a distributed setup protocol) runs the KS-IPFE setup algorithm to produce

(mpk, msk)

, as described later in detail. The public key

mpk

is disseminated to all clients and to

Srv

, while the master secret key

msk

is retained solely for generating function keys. To support changing linear functions on-the-fly (e.g., per-round masks, adaptive weighting, or auditing vectors), the authority issues fresh split keys

({sk}_{y_{t}}^{A}, {sk}_{y_{t}}^{B})

tagged with the round identifier t. Helpers keep only the currently active shares (plus long-lived basis shares for

{e_{j}}_{j \in [D]}

), which makes revocation as simple as deleting prior-round shares. Synchronization is handled by including the round id in helper responses and rejecting stale shares at

Srv

. The per-round key material is

O (m + D l_{g})

elements per helper, which is negligible compared to per-round ciphertext traffic in our regimes. For a given training regime, the task owner specifies a family of allowable linear functionals over encoded blocks, such as (i) individual coordinates

e_{j}

, (ii) rows of a sketching matrix

Φ_{t}

used in PaS-Stream, or (iii) secret tag vectors

κ_{t}

used for verifiability. For each such vector

y

, the authority runs

KeyGen

to obtain a pair of key shares

({sk}_{y}^{A}, {sk}_{y}^{B})

and distributes them to

H_{A}

and

H_{B}

, respectively. Because KS-IPFE keys are splittable and each share is individually simulatable, no single helper can recover inner products alone, yet together they can support decryption for any authorized linear functional. We note that although we model the key authority as a logical role, it is needed only at initialization (and when the authorized function set changes). To remove a single point of failure, the master secret

msk

can be generated and held by a small committee via distributed key generation/threshold secret sharing, and function-key issuance can be made auditable [33]. Alternatively,

msk

can be sealed inside an HSM/TEE so that only a rate-limited key-derivation interface is exposed and no bulk secret material is exportable.
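The share lifecycle described above (per-round shares tagged with t, long-lived basis shares, revocation by deletion, rejection of stale rounds) can be illustrated with a minimal sketch. The class name `HelperKeyStore`, its methods, and the string placeholders standing in for key shares are hypothetical and are not part of the FlowAgg-FE specification.

```python
class HelperKeyStore:
    """One helper's view of its function-key shares (illustrative only)."""

    def __init__(self):
        self.active_round = None
        self.round_shares = {}   # label -> share, for the active round only
        self.basis_shares = {}   # long-lived shares for e_1, ..., e_D

    def install_round(self, t, shares):
        # Installing round t overwrites (i.e., revokes) all earlier per-round shares.
        if self.active_round is not None and t <= self.active_round:
            raise ValueError("stale round id")
        self.active_round = t
        self.round_shares = dict(shares)

    def get(self, t, label):
        # Basis shares are round-independent; everything else must match the active round.
        if label in self.basis_shares:
            return self.basis_shares[label]
        if t != self.active_round:
            raise KeyError("no active share for round %d" % t)
        return self.round_shares[label]


store = HelperKeyStore()
store.basis_shares["e_1"] = "basis-share"
store.install_round(1, {"kappa": "share-r1"})
store.install_round(2, {"kappa": "share-r2"})
```

After the second call to `install_round`, a request for the round-1 share of `kappa` fails, while the basis share remains available in every round.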

In the client-side phase of round t, the aggregator broadcasts the current global model

w_{t}

and, when PaS-Stream is enabled, the public sketching matrix

Φ_{t}

and the quantization configuration (e.g., the bit-width b and scale ranges used by

Q_{b}

). Each client

C_{i} \in A_{t}

computes its clipped gradient according to Equation (1), obtaining

v_{i, t}

with

∥ v_{i, t} ∥_{2} \leq S

. Depending on the mode, the payload prior to encryption is either the full vector

v_{i, t}

(KS-IPFE only) or a compressed representation

u_{i, t} = Q_{b} (Φ_{t} v_{i, t})

that satisfies the unbiasedness property of Equation (8). The client then maps this payload into an integer vector via Equation (3), producing

x_{i, t} \in Z_{q}^{k}

for some working dimension k (equal to d in the full-precision case and to the sketch dimension in PaS-Stream). This vector is partitioned into contiguous blocks of size D,

x_{i, t}^{(1)}, \dots, x_{i, t}^{(B_{t})}

, each of which is encrypted independently under

mpk

using the KS-IPFE encryption algorithm. For every block, the client also computes a linearly homomorphic tag and a clipping/encoding proof that binds

x_{i, t}^{(b)}

to a valid, correctly clipped source vector. The collection of ciphertexts, tags, and proofs for all blocks is streamed towards

Srv

using a simple, nonce-based framing that preserves block ordering.
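As a rough illustration of the encoding and blocking steps, the sketch below maps a real-valued payload to residues modulo q via rounding of Δ·a and splits the result into blocks of length D. The values of `delta` and the payload are illustrative, the zero-padding of the last block is an assumption, and Python's `round` breaks ties to even, which may differ from the rounding rule intended in Equation (3).

```python
def encode(a, delta, q):
    """Map real coordinates to Z_q via x = round(delta * a) mod q."""
    return [round(delta * v) % q for v in a]


def center(x, q):
    """Lift a residue to its representative in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x


def decode(x, delta, q):
    """Undo the encoding, valid while |delta * a| stays below q/2."""
    return [center(v, q) / delta for v in x]


def split_blocks(x, D):
    """Contiguous blocks of length D; the last block is zero-padded (assumption)."""
    padded = x + [0] * (-len(x) % D)
    return [padded[i:i + D] for i in range(0, len(padded), D)]


q, delta = 2**32, 2**10            # delta is a hypothetical scaling factor
a = [0.5, -0.25, 0.125]
x = encode(a, delta, q)
blocks = split_blocks(x, 2)
```

Negative coordinates are stored as large residues and recovered by the centered lift, which is why the encoded magnitude must stay below q/2.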

The server-side phase begins as soon as

Srv

receives the first block ciphertexts. Rather than waiting for all clients,

Srv

incrementally forms aggregated ciphertexts for each block index b by computing weighted sums across the subset of clients whose b-th block has arrived, as in Equation (14). In parallel,

Srv

aggregates the corresponding tags, preserving linearity at the tag level. The aggregated ciphertext pair

(C_{1}^{(b)}, C_{2}^{(b)})

and any necessary public metadata are then forwarded to both helpers. Each helper uses its share of the FE keys to compute decryption shares for the relevant function vectors (e.g., the coordinate basis vectors and the secret tag vector

κ_{t}

), which are then combined at

Srv

to recover (i) the blockwise sums of encoded payloads and (ii) an independently reconstructed tag. The latter is cross-checked against the aggregated tag to detect any tampering or malformed ciphertexts, while the former yields either a block of the aggregated gradient

\sum_{i \in A_{t}} α_{i, t} v_{i, t}

or, in PaS-Stream mode, a block of the aggregated sketch

\sum_{i \in A_{t}} α_{i, t} u_{i, t}

. After rescaling and (if needed) applying

Φ_{t}^{⊤}

,

Srv

obtains an estimator of

G_{t}

and performs the model update for round t.

We note that a key feature of this architecture is that ciphertext aggregation and verification commute with functional decryption. Because KS-IPFE is additively homomorphic with respect to ciphertexts (Equation (4)) and the tags are linearly homomorphic (Equation (10)),

Srv

can aggregate encrypted blocks and tags independently, and the helpers can perform decryption on these aggregates without ever seeing per-client plaintexts. This property is crucial for scalability: it ensures that helper workload scales with the number of blocks per round, not with the number of clients, and that verification overhead remains modest.

4.2. KS-IPFE: Key-Splittable LWE-Based Construction

The KS-IPFE component of FlowAgg-FE provides the cryptographic substrate that allows the aggregator and the two helpers to recover only prescribed linear functionals of encrypted updates. It is tailored to the FL setting in Section 3 by (i) operating over high-dimensional encoded vectors arising from Equation (3), (ii) supporting ciphertext aggregation before decryption as in Equation (4), and (iii) splitting every function key into two individually simulatable shares held by non-colluding helpers. This subsection refines the construction in a more algebraic manner, explicitly tracking dimensions, noise terms, and the simulation-based security view.

We fix a block length

D \in Z_{> 0}

and view each FE plaintext as a block

x \in Z_{q}^{D}

. A working vector of dimension k (either

k = d

for full gradients or k equal to the sketch dimension in PaS-Stream) is partitioned into

B = ⌈ k / D ⌉

blocks, so that the b-th block is denoted

x^{(b)}

for

b \in [B]

. Typical choices such as

D \in {32, 64}

balance the amortization of gadget-based packing with the number of function keys that must be instantiated per round. All arithmetic is in the residue ring

Z_{q}

, where

q \in Z_{> 0}

is chosen together with the LWE dimension

n_{l}

and error distributions

χ, χ^{'}

to satisfy the noise budget in Equation (6). We write m for the width of the public matrix A, with

m = poly (λ)

.

The construction follows a dual-style LWE template with gadget packing. Let

G_{D} \in Z_{q}^{D l_{g} \times D}

be a block-diagonal gadget matrix implementing base-2 decomposition with

l_{g}

digits per coordinate. Concretely,

G_{D} = diag (g, \dots, g)

with D blocks

g = {(1, 2, \dots, 2^{l_{g} - 1})}^{⊤}

, so that for any

x \in Z_{q}^{D}

, the product

\hat{x} : = G_{D} x \in Z_{q}^{D l_{g}}

stacks, for each coordinate of

x

, its multiples by the powers of two

1, 2, \dots, 2^{l_{g} - 1}

. There exists a (not necessarily unique) left inverse

G_{D}^{†} \in Z_{q}^{D \times D l_{g}}

such that

G_{D}^{†} G_{D} \equiv I_{D} (\mod q)

on the range of interest (i.e., for coefficients below the wrap-around threshold). This allows us to pass back and forth between the “digit” domain and the original coordinate domain inside the decryption algorithm.
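To make the gadget notation concrete, the following sketch builds G_D for small toy parameters and one possible left inverse, namely the matrix that selects the weight-1 entry of each block. This particular choice of left inverse is an assumption made for illustration; the text only requires that some left inverse exists.

```python
def gadget_matrix(D, lg):
    """G_D = diag(g, ..., g) with g = (1, 2, ..., 2^(lg-1))^T; shape (D*lg) x D."""
    G = [[0] * D for _ in range(D * lg)]
    for j in range(D):
        for k in range(lg):
            G[j * lg + k][j] = 1 << k
    return G


def left_inverse(D, lg):
    """One possible left inverse: pick the weight-1 row of each block; shape D x (D*lg)."""
    Gd = [[0] * (D * lg) for _ in range(D)]
    for j in range(D):
        Gd[j][j * lg] = 1
    return Gd


def matvec(M, v, q):
    """Matrix-vector product reduced modulo q."""
    return [sum(a * b for a, b in zip(row, v)) % q for row in M]


q, D, lg = 2**32, 2, 3             # toy sizes, far below the paper's D and l_g
G, Gd = gadget_matrix(D, lg), left_inverse(D, lg)
x = [3, 5]
x_hat = matvec(G, x, q)            # gadget-packed vector of length D * lg
x_back = matvec(Gd, x_hat, q)      # applying the left inverse recovers x
```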

Setup. In the setup phase, the authority samples

A \leftarrow_{R} Z_{q}^{n_{l} \times m}, S \leftarrow χ^{m \times D l_{g}}, E \leftarrow {χ^{'}}^{n_{l} \times D l_{g}},

where the entries of S and E are independent draws from subgaussian integer distributions

χ, χ^{'}

, respectively (e.g., discrete Gaussians with parameter

σ

and

σ^{'}

). We define

B : = A S + E \in Z_{q}^{n_{l} \times D l_{g}},

(11)

and publish the master public key

mpk = (A, B, G_{D})

while retaining the master secret key

msk = S

. Under the LWE assumption at parameters

(n_{l}, q, χ^{'})

, the joint distribution

(A, B)

is computationally indistinguishable from

(A, U)

, where

U \leftarrow_{R} Z_{q}^{n_{l} \times D l_{g}}

is uniform; consequently,

mpk

computationally hides S from any PPT adversary.

Key generation and splitting. To authorize decryption of inner products with a block-level function vector

y \in Z_{q}^{D}

, the authority first embeds

y

into the gadget domain by computing

\tilde{y} : = G_{D}^{⊤} y \in Z_{q}^{D l_{g}} .

We interpret

\tilde{y}

as the vector of “digit weights” corresponding to the functional

〈 x, y 〉

acting on gadget-packed plaintexts. Using

msk = S

, we compute a base key

K : = S \tilde{y} \in Z_{q}^{m} .

Intuitively, K is the dual secret associated with the function vector

y

, analogous to the secret key in standard LWE decryption.

We now choose a masking vector

W \leftarrow_{R} Z_{q}^{m}

uniformly at random and define the two key shares as

{sk}_{y}^{A} = (\tilde{y}, W), {sk}_{y}^{B} = (\tilde{y}, K - W \mod q) .

(12)

Each helper receives only its own share. The distribution of

{sk}_{y}^{A}

(resp.

{sk}_{y}^{B}

) is identical to that of

(\tilde{y}, U_{m})

, where

U_{m}

is uniform in

Z_{q}^{m}

, since W is uniform and independent of K. In particular, any PPT adversary that corrupts at most one helper learns no additional information about K beyond what is already implied by

y

and the allowed function outputs. This observation underlies the threshold property of KS-IPFE: decryption requires cooperation between

H_{A}

and

H_{B}

, while each helper’s view alone can be simulated from public information and oracle access to function outputs.
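The additive splitting in Equation (12) can be sketched in a few lines. The function name `split_key` and the example key values are hypothetical; the only property illustrated is that the two mask components sum to K modulo q while each one is uniform on its own.

```python
import secrets


def split_key(K, q):
    """Return (W, K - W mod q) with W sampled uniformly from Z_q^m."""
    W = [secrets.randbelow(q) for _ in K]
    return W, [(k - w) % q for k, w in zip(K, W)]


q = 2**32
K = [7, 123456789, q - 1, 0]       # hypothetical base key in Z_q^4
WA, WB = split_key(K, q)
recombined = [(a + b) % q for a, b in zip(WA, WB)]
```

Because W is drawn independently of K, publishing either `WA` or `WB` alone conveys nothing about K, which is the simulatability property used in the text.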

Client encryption. For a single block payload

x \in Z_{q}^{D}

, derived from either

v_{i, t}

or its sketch as in Equation (3), client

C_{i}

forms the gadget-packed vector

\hat{x} : = G_{D} x \in Z_{q}^{D l_{g}} .

We regard

\hat{x}

as a column vector. The client then samples a fresh randomness vector

r \leftarrow_{R} Z_{q}^{n_{l}}

and error terms

e_{1} \leftarrow χ^{m}, e_{2} \leftarrow {χ^{'}}^{D l_{g}},

and computes the ciphertext components

c_{1} : = A^{⊤} r + e_{1} \in Z_{q}^{m}, c_{2} : = B^{⊤} r + \hat{x} + e_{2} \in Z_{q}^{D l_{g}} .

(13)

The per-block ciphertext is the pair

c t = (c_{1}, c_{2})

. Note that if we define the “ideal” noiseless ciphertext as

({\bar{c}}_{1}, {\bar{c}}_{2}) = (A^{⊤} r, B^{⊤} r + \hat{x})

, then

(c_{1}, c_{2})

differs from

({\bar{c}}_{1}, {\bar{c}}_{2})

by an additive error vector

(e_{1}, e_{2})

.

Ciphertext aggregation. Once

Srv

has received ciphertexts for a given block index b from a subset

A_{t} \subseteq [n]

of clients in round t, it uses the fixed-point weights

{\tilde{α}}_{i, t} \in Z_{q}

(encoding the real weights

α_{i, t}

) to form the aggregated ciphertext

C_{1}^{(b)} : = \sum_{i \in A_{t}} {\tilde{α}}_{i, t} c_{1, i}^{(b)}, C_{2}^{(b)} : = \sum_{i \in A_{t}} {\tilde{α}}_{i, t} c_{2, i}^{(b)} .

(14)

Writing

r_{i}^{(b)}

for the randomness used by

C_{i}

on block b and

{\hat{x}}_{i}^{(b)}

for its gadget-packed plaintext, we can decompose

C_{1}^{(b)} = A^{⊤} (\sum_{i \in A_{t}} {\tilde{α}}_{i, t} r_{i}^{(b)}) + \sum_{i \in A_{t}} {\tilde{α}}_{i, t} e_{1, i}^{(b)},

C_{2}^{(b)} = B^{⊤} (\sum_{i \in A_{t}} {\tilde{α}}_{i, t} r_{i}^{(b)}) + \sum_{i \in A_{t}} {\tilde{α}}_{i, t} {\hat{x}}_{i}^{(b)} + \sum_{i \in A_{t}} {\tilde{α}}_{i, t} e_{2, i}^{(b)} .

The aggregated error vectors

E_{1, t}^{(b)} : = \sum_{i \in A_{t}} {\tilde{α}}_{i, t} e_{1, i}^{(b)}, E_{2, t}^{(b)} : = \sum_{i \in A_{t}} {\tilde{α}}_{i, t} e_{2, i}^{(b)}

remain subgaussian with parameter depending on

| A_{t} |

; the parameters satisfying Equation (6) are chosen precisely so that their magnitude stays well within the decryption margin.

Threshold decryption and correctness. For each block and each authorized function vector

y \in Z_{q}^{D}

, helper

H_{\circ}

with share

{sk}_{y}^{\circ} = (\tilde{y}, W^{\circ})

computes a decryption share

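σ^{\circ} : = δ^{\circ} 〈 \tilde{y}, C_{2}^{(b)} 〉 - 〈 W^{\circ}, C_{1}^{(b)} 〉 \in Z_{q},

(15)

where the indicator

δ^{A} = 1, δ^{B} = 0

ensures that the first inner product is counted exactly once when the two shares are summed, and where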

W^{A} = W

and

W^{B} = K - W

as in Equation (12). Summing these shares yields

\begin{aligned}
σ^{A} + σ^{B} & = 〈\tilde{y}, C_{2}^{(b)}〉 - 〈K, C_{1}^{(b)}〉 \\
& = 〈\tilde{y}, B^{⊤} \sum_{i} {\tilde{α}}_{i, t} r_{i}^{(b)} + \sum_{i} {\tilde{α}}_{i, t} {\hat{x}}_{i}^{(b)} + E_{2, t}^{(b)}〉 \\
& \quad - 〈S \tilde{y}, A^{⊤} \sum_{i} {\tilde{α}}_{i, t} r_{i}^{(b)} + E_{1, t}^{(b)}〉 \\
& = 〈\sum_{i} {\tilde{α}}_{i, t} {\hat{x}}_{i}^{(b)}, \tilde{y}〉 + 〈E^{⊤} \sum_{i} {\tilde{α}}_{i, t} r_{i}^{(b)}, \tilde{y}〉 \\
& \quad + 〈\tilde{y}, E_{2, t}^{(b)}〉 - 〈S \tilde{y}, E_{1, t}^{(b)}〉 (\mod q) .
\end{aligned}

(16)

Denoting the total noise term in Equation (16) by

N_{t}^{(b)} (y)

, we can write

σ^{A} + σ^{B} \equiv 〈\sum_{i} {\tilde{α}}_{i, t} {\hat{x}}_{i}^{(b)}, \tilde{y}〉 + N_{t}^{(b)} (y) (\mod q) .

Because

{\hat{x}}_{i}^{(b)} = G_{D} x_{i, t}^{(b)}

, and

G_{D}^{†} G_{D} \equiv I_{D} (\mod q)

on the relevant range, we have

〈\sum_{i} {\tilde{α}}_{i, t} {\hat{x}}_{i}^{(b)}, \tilde{y}〉 = 〈\sum_{i} {\tilde{α}}_{i, t} G_{D} x_{i, t}^{(b)}, G_{D}^{⊤} y〉 = 〈\sum_{i} {\tilde{α}}_{i, t} x_{i, t}^{(b)}, y〉,

where we used the identity

〈 G_{D} x, G_{D}^{⊤} y 〉 = 〈 x, y 〉

modulo q when no wrap-around occurs. By Equation (6), the subgaussian tails of

N_{t}^{(b)} (y)

ensure that

{| N_{t}^{(b)} (y) |}_{\infty} ≪ q / 4

with probability at least

1 - 2^{- λ}

, so rounding to the nearest integer in the canonical interval

(- q / 2, q / 2]

recovers

z^{(b)} (y) : = Round (σ^{A} + σ^{B}) = 〈\sum_{i \in A_{t}} {\tilde{α}}_{i, t} x_{i, t}^{(b)}, y〉 \in Z,

with overwhelming probability.
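The algebra behind aggregate-then-decrypt can be checked on a toy instance. The sketch below makes several simplifying assumptions that are not part of the scheme: all error terms (E, e_1, e_2) are set to zero so that the identity holds exactly, which makes the instance insecure; l_g = 1, so that G_D is the identity and the gadget-domain vector coincides with y; and the inner product with C_2 is included by H_A only, so that the two shares sum to the first line of Equation (16). All sizes and helper names are illustrative.

```python
import random

q = 2**32
n_l, m, D = 4, 6, 3                      # toy sizes; l_g = 1, so G_D = I
rng = random.Random(0)                   # not a cryptographic RNG; demo only


def rand_mat(rows, cols, lo, hi):
    return [[rng.randint(lo, hi) for _ in range(cols)] for _ in range(rows)]


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(r, c)) % q for c in zip(*Y)] for r in X]


def transpose(X):
    return [list(c) for c in zip(*X)]


def matvec(X, v):
    return [sum(a * b for a, b in zip(r, v)) % q for r in X]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % q


# Setup: B = A S mod q (the error matrix E is zero in this sketch).
A = rand_mat(n_l, m, 0, q - 1)
S = rand_mat(m, D, -2, 2)
Bmat = matmul(A, S)


def keygen(y):
    K = matvec(S, y)                                     # base key K = S y
    W = [rng.randrange(q) for _ in range(m)]
    return (y, W), (y, [(k - w) % q for k, w in zip(K, W)])


def encrypt(x):
    r = [rng.randrange(q) for _ in range(n_l)]
    c1 = matvec(transpose(A), r)                         # A^T r, with e_1 = 0
    c2 = [(u + v) % q for u, v in zip(matvec(transpose(Bmat), r), x)]
    return c1, c2


def aggregate(cts, weights):
    C1 = [sum(w * ct[0][j] for ct, w in zip(cts, weights)) % q for j in range(m)]
    C2 = [sum(w * ct[1][j] for ct, w in zip(cts, weights)) % q for j in range(D)]
    return C1, C2


def dec_share(share, C1, C2, count_c2):
    y, W = share
    first = dot(y, C2) if count_c2 else 0                # counted by H_A only
    return (first - dot(W, C1)) % q


xs, weights, y = [[1, 2, 3], [4, 5, 6]], [2, 3], [1, 0, 1]
skA, skB = keygen(y)
C1, C2 = aggregate([encrypt(x) for x in xs], weights)
result = (dec_share(skA, C1, C2, True) + dec_share(skB, C1, C2, False)) % q
```

Here the weighted sum of the plaintext blocks is (14, 19, 24), so the combined shares equal 14 + 24 = 38 regardless of the randomness used, and neither helper ever handles a per-client ciphertext.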

By choosing

y = e_{j}

for

j = 1, \dots, D

, the helpers can reconstruct all D coordinates of the blockwise weighted sum

\sum_{i} {\tilde{α}}_{i, t} x_{i, t}^{(b)}

from encrypted inputs. Mapping these coordinates back through the scaling factor

Δ

in Equation (3) then yields the corresponding portion of

G_{t}

or of its sketched variant, up to deterministic rounding error controlled by

Δ

and the clipping threshold S.

Security intuition and cost. From a security perspective, KS-IPFE inherits the indistinguishability guarantees of LWE-based IPFE. An IND-style security game for KS-IPFE can be phrased as follows: an adversary chooses two message families

{x_{i}^{(b, 0)}}, {x_{i}^{(b, 1)}}

with the same values under all functions for which it holds key shares, receives encryptions of one of the two (chosen at random), along with one key share per function vector, and must guess which family was encrypted. Under the LWE assumption and the simulatable distribution of key shares in Equation (12), any PPT adversary controlling

Srv

and at most one helper has at most negligible advantage in this game. Intuitively, replacing

(A, B)

with uniform, then replacing ciphertexts with uniform, and finally replacing the key share mask W with uniform in hybrids yields a distribution that depends only on the revealed function outputs.

On the cost side, one KS-IPFE encryption of a block requires forming

\hat{x} = G_{D} x

and two matrix-vector multiplications

A^{⊤} r

and

B^{⊤} r

, plus additions by small error terms. With NTT-friendly moduli and a structured choice of

A, B

, these multiplications can be implemented in

O (n_{l} m)

ring operations with a modest constant. Server-side aggregation is linear in the number of ciphertexts actually received and consists of scalar multiply-add operations in

Z_{q}

. Helper-side decryption uses

O (m + D l_{g})

ring operations per decryption share (one inner product with

C_{2}^{(b)} \in Z_{q}^{D l_{g}}

and one with

C_{1}^{(b)} \in Z_{q}^{m}

), so for F distinct function vectors and B blocks, the total number of decryption operations per round scales as

O (F B (m + D l_{g}))

, independent of

| A_{t} |

. In the regimes we evaluate in Section 5, a configuration with

q = 2^{32}

,

n_{l} = 1024

,

l_{g} = 16

,

D = 64

, and carefully tuned

χ, χ^{'}

yields a per-block decryption failure probability below

2^{- 40}

and supports several thousand contributing clients per round, while keeping per-client ciphertext sizes within a small constant factor of plaintext updates. More detailed game-based proofs (including the reduction to LWE and simulation of single-helper views) are provided in Appendix B.

4.3. Verifiable Aggregation with Commuting Checks

In addition to confidentiality and function privacy, FlowAgg-FE must enforce that the decryptions released to

Srv

coincide with valid linear aggregates of correctly clipped and encoded client updates, even in the presence of Byzantine clients and a potentially adversarial single helper. The verifiability layer is engineered so that all checks commute with the KS-IPFE aggregation pipeline: every operation can be written as an

Z

-linear map applied either before or after ciphertext aggregation, and the corresponding verification conditions are preserved up to negligible statistical or computational error. This section refines the informal description from Section 3 into a more algebraic treatment.

4.3.1. Linearly Homomorphic Tags as a MAC over the FE Plaintext Space

Let p be a large prime such that

p > q

and p is co-prime with q, and let

Z_{p}

be the finite field of order p. For each round t, an authentication key space is defined as

K_{LHT} = Z_{p}^{D}

, and a tag space as

T_{LHT} = Z_{p}

. A per-round authentication key is sampled as

κ_{t} \leftarrow_{R} Z_{p}^{D},

and is made available to all clients and to the KS-IPFE key generator (for the purpose of deriving

{sk}_{κ_{t}}^{A}, {sk}_{κ_{t}}^{B}

), but not to

Srv

. We treat

κ_{t}

as an ephemeral, per-round secret used only to authenticate that the decrypted aggregate matches the client-supplied tags in that round.

κ_{t}

is generated by the same (possibly distributed) authority that issues KS-IPFE keys, is delivered to clients over an authenticated channel, and is erased after round t closes; helpers receive only FE key shares for the corresponding function vector

{\bar{κ}}_{t}

. The soundness of the LHT check requires that

Srv

does not learn

κ_{t}

; if a compromised client discloses

κ_{t}

to

Srv

, then the server could forge tags for that round. This collusion caveat is inherent to the design; confining tag computation to client TEEs (or distributing

κ_{t}

only to trusted clients) mitigates it in deployments that require strong server-verifiability even under client–server collusion.

On this basis, we define the block-level message space for tags as

M_{tag} = \{z \in Z^{D} : {∥ z ∥}_{\infty} \leq B\},

for some bound

B > 0

that upper bounds the infinity norm of rounded, scaled blocks

⌊ Δ a_{i, t}^{(b)} ⌉

produced by Equation (3) and the clipping constraint

∥ v_{i, t} ∥_{2} \leq S

. The LHT then induces an almost-linear MAC

{Tag}_{κ_{t}} : M_{tag} \to T_{LHT}, {Tag}_{κ_{t}} (z) = 〈 z, κ_{t} 〉 \mod p .

For each block b and client

C_{i}

, we set

τ_{i, t}^{(b)} = {Tag}_{κ_{t}} (⌊ Δ a_{i, t}^{(b)} ⌉) = 〈⌊ Δ a_{i, t}^{(b)} ⌉, κ_{t}〉 \mod p .

(17)

For any finite subset of indices

J \subseteq A_{t}

and scalar weights

{{\tilde{α}}_{j, t}}_{j \in J} \subseteq Z_{q}

, we have the exact algebraic identity

\begin{aligned}
T_{t, J}^{(b)} & : = \sum_{j \in J} {\tilde{α}}_{j, t} τ_{j, t}^{(b)} \mod p \\
& = \sum_{j \in J} {\tilde{α}}_{j, t} 〈⌊ Δ a_{j, t}^{(b)} ⌉, κ_{t}〉 \mod p \\
& = 〈\sum_{j \in J} {\tilde{α}}_{j, t} ⌊ Δ a_{j, t}^{(b)} ⌉, κ_{t}〉 \mod p,
\end{aligned}

(18)

which is the blockwise instantiation of Equation (10). In particular, define the aggregated encoded block

z_{t, J}^{(b)} : = \sum_{j \in J} {\tilde{α}}_{j, t} ⌊ Δ a_{j, t}^{(b)} ⌉ \in Z^{D} .

Then

T_{t, J}^{(b)} = {Tag}_{κ_{t}} (z_{t, J}^{(b)})

exactly, as long as all operations are interpreted modulo p for the tag and modulo q for the encoding. When

J = A_{t}

, we recover the global aggregated tag

T_{t}^{(b)}

used by the protocol.

Viewed as a MAC, the unforgeability of this LHT against an adversary not knowing

κ_{t}

reduces to the hardness of guessing a non-trivial linear relation over

Z_{p}

on authenticated messages. More precisely, if an adversary outputs

(z^{⋆}, τ^{⋆})

such that

z^{⋆}

is not in the

Z

-span of previously authenticated messages and

τ^{⋆} = {Tag}_{κ_{t}} (z^{⋆})

, then under the random choice of

κ_{t}

, the probability of success is at most

1 / p

.
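The linearity in Equation (18) is easy to confirm numerically. In the sketch below, the prime 2^61 − 1 merely stands in for a "large prime p", and the message blocks and weights are arbitrary illustrative values.

```python
import secrets

p = 2**61 - 1                      # a Mersenne prime, used as a stand-in for p


def tag(z, kappa):
    """Tag_kappa(z) = <z, kappa> mod p."""
    return sum(a * b for a, b in zip(z, kappa)) % p


D = 4
kappa = [secrets.randbelow(p) for _ in range(D)]   # per-round secret key
z1, z2 = [3, -1, 0, 7], [2, 5, -4, 1]              # encoded integer blocks
a1, a2 = 2, 3                                      # fixed-point weights

agg_of_tags = (a1 * tag(z1, kappa) + a2 * tag(z2, kappa)) % p
z_agg = [a1 * u + a2 * v for u, v in zip(z1, z2)]
tag_of_agg = tag(z_agg, kappa)
```

The two quantities `agg_of_tags` and `tag_of_agg` coincide for every choice of `kappa`, which is what allows the server to aggregate tags without knowing the key.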

Lemma 1

(LHT statistical hiding). Fix any

z \in M_{tag}

and sample

κ \leftarrow_{R} Z_{p}^{D}

. Let

τ = 〈 z, κ 〉 \mod p

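. Assume that the bound B defining the tag message space satisfies B < p, so that a nonzero z remains nonzero modulo p. If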

z \neq 0

then τ is uniform over

Z_{p}

; if

z = 0

then

τ = 0

. In particular, when

κ_{t}

is hidden from

Srv

, the tag does not leak the gradient norm or other distributional statistics beyond the degenerate event

z = 0

.
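For very small parameters, Lemma 1 can be verified by exhaustive enumeration over all keys. The values p = 5 and D = 2 below are chosen only to keep the enumeration tiny, and the message is assumed to be nonzero modulo p in the first case.

```python
from collections import Counter
from itertools import product


def tag_distribution(z, p):
    """Count how often each tag value occurs as kappa ranges over Z_p^D."""
    counts = Counter()
    for kappa in product(range(p), repeat=len(z)):
        counts[sum(a * b for a, b in zip(z, kappa)) % p] += 1
    return counts


dist_nonzero = tag_distribution([1, 2], 5)   # z nonzero modulo p
dist_zero = tag_distribution([0, 0], 5)      # degenerate case z = 0
```

For the nonzero message, every tag value in Z_5 is hit by exactly 5 of the 25 keys, so the tag is exactly uniform; for the zero message, all 25 keys give the tag 0.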

4.3.2. FE-Based Cross-Checking of Aggregates

The LHT alone only ensures that tags are consistent with the sum of messages within the tag domain; it does not bind tags to the KS-IPFE ciphertexts. To couple tags to ciphertexts, we reuse the FE plaintext space and instantiate an additional FE key for the same vector

κ_{t}

.

Let

κ_{t} \in Z_{p}^{D}

be as above and let

{\bar{κ}}_{t} \in Z_{q}^{D}

denote its canonical embedding into

Z_{q}^{D}

(e.g., by interpreting

κ_{t}

as integers in

[0, p)

and viewing them modulo q). The KS-IPFE key generator computes

({sk}_{{\bar{κ}}_{t}}^{A}, {sk}_{{\bar{κ}}_{t}}^{B}) \leftarrow KeyGen (msk, {\bar{κ}}_{t}),

with internal gadget expansion

{\tilde{κ}}_{t} = G_{D}^{⊤} {\bar{κ}}_{t}

as in Equation (12). These key shares are distributed to

H_{A}

and

H_{B}

.

Given the aggregated ciphertext

(C_{1}^{(b)}, C_{2}^{(b)})

for block b in round t as in Equation (14) (so

J = A_{t}

), helper

H_{\circ}

computes the decryption share for function vector

{\bar{κ}}_{t}

as follows:

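σ_{κ_{t}}^{\circ} : = δ^{\circ} 〈{\tilde{κ}}_{t}, C_{2}^{(b)}〉 - 〈W_{κ_{t}}^{\circ}, C_{1}^{(b)}〉 \mod q,

with

δ^{A} = 1, δ^{B} = 0

so that the first inner product enters the sum of the two shares exactly once, and transmits it to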

Srv

. By the correctness analysis in Equation (16), their sum

\begin{aligned}
Σ_{κ_{t}}^{(b)} & : = σ_{κ_{t}}^{A} + σ_{κ_{t}}^{B} \\
& \equiv 〈\sum_{i \in A_{t}} {\tilde{α}}_{i, t} {\hat{x}}_{i, t}^{(b)}, {\tilde{κ}}_{t}〉 + 〈E^{⊤} \sum_{i \in A_{t}} {\tilde{α}}_{i, t} r_{i}^{(b)}, {\tilde{κ}}_{t}〉 + \mathrm{noise} (\mod q),
\end{aligned}

and after gadget inversion and rounding we have, with probability at least

1 - 2^{- λ}

,

{dec}_{κ_{t}}^{(b)} : = Round (Σ_{κ_{t}}^{(b)}) = 〈\sum_{i \in A_{t}} {\tilde{α}}_{i, t} x_{i, t}^{(b)}, {\bar{κ}}_{t}〉 \mod q .

Since

x_{i, t}^{(b)} = ⌊ Δ a_{i, t}^{(b)} ⌉ \mod q

and the magnitude of

⌊ Δ a_{i, t}^{(b)} ⌉

is bounded by B, the reduction modulo q is injective on the relevant range, enabling a well-defined lifting to

Z

as follows:

{\tilde{z}}_{t}^{(b)} : = 〈\sum_{i \in A_{t}} {\tilde{α}}_{i, t} ⌊ Δ a_{i, t}^{(b)} ⌉, {\bar{κ}}_{t}〉 \in Z,

with

{dec}_{κ_{t}}^{(b)}

representing

{\tilde{z}}_{t}^{(b)} \mod q

. We then project

{dec}_{κ_{t}}^{(b)}

into

Z_{p}

via the canonical map

ϕ : Z_{q} \to Z_{p}

(e.g., reduction modulo p) to obtain

{\tilde{T}}_{t}^{(b)} : = ϕ ({dec}_{κ_{t}}^{(b)}) \in Z_{p} .

Under the consistency of parameters (specifically, p and q sufficiently large relative to B and the number of clients), we have

{\tilde{T}}_{t}^{(b)} = T_{t}^{(b)}

with all but negligible probability. Hence the equality

{\tilde{T}}_{t}^{(b)} \overset{?}{=} T_{t}^{(b)}

serves as a soundness check tying the FE-decrypted aggregate to the aggregated tags. Any deviation that changes ciphertext contents (e.g., adversarial modification by a client or by a compromised helper) while leaving client-generated tags untouched will, with high probability, violate this equality.

4.3.3. Commit-and-Prove for Clipping and Encoding

The consistency of tags and FE decryptions guarantees that the aggregator recovers a sum of certain integer vectors, but it does not by itself ensure that those integers arise from correctly clipped and scaled real-valued updates. To enforce semantic correctness of the encoding, each client engages in a commit-and-prove protocol.

Let

G

be a cyclic group of prime order p with generators

g, h \in G

such that the discrete logarithm

\log_{g} (h)

is unknown. For a block index b, client

C_{i}

defines a bit-decomposition of

x_{i, t}^{(b)} = (x_{i, t, 1}^{(b)}, \dots, x_{i, t, D}^{(b)}) \in Z_{q}^{D}

into

l_{enc}

-bit chunks per coordinate (with

l_{enc} \approx \log_{2} q

), and commits to each coordinate using a Pedersen-style vector commitment as follows:

Com (x_{i, t}^{(b)}; r_{i, t}^{(b)}) = g^{\sum_{j = 1}^{D} x_{i, t, j}^{(b)} γ_{j}} h^{r_{i, t}^{(b)}} \in G,

where

(γ_{1}, \dots, γ_{D}) \in Z_{p}^{D}

are fixed, publicly known generators for the message space. The commitment is additively homomorphic as follows:

\prod_{i \in J} Com {(x_{i, t}^{(b)}; r_{i, t}^{(b)})}^{{\tilde{α}}_{i, t}} = Com (\sum_{i \in J} {\tilde{α}}_{i, t} x_{i, t}^{(b)}; \sum_{i \in J} {\tilde{α}}_{i, t} r_{i, t}^{(b)}),

for any index set J. This algebra exactly mirrors the linear aggregation in Equation (14) and the tag aggregation above.
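The homomorphic property of the commitment can be checked in a toy group. The sketch below uses the order-11 subgroup of the multiplicative group modulo 23, with g = 2 and h = 3; in such a small group the discrete logarithm of h is trivially computable, so this instance is insecure and serves only to illustrate the algebra. The coefficients `gamma` and all messages are hypothetical.

```python
P, p = 23, 11                  # toy group: order-11 subgroup of Z_23^*
g, h = 2, 3                    # both elements have order 11 modulo 23
gamma = [1, 4, 9]              # hypothetical public coefficients in Z_p


def commit(x, r):
    """Com(x; r) = g^(sum_j x_j * gamma_j) * h^r in the toy group."""
    e = sum(xj * gj for xj, gj in zip(x, gamma)) % p
    return (pow(g, e, P) * pow(h, r % p, P)) % P


x1, r1 = [3, 1, 4], 5
x2, r2 = [2, 7, 1], 8
a1, a2 = 2, 3                  # aggregation weights

# Left side: weighted product of the individual commitments.
lhs = (pow(commit(x1, r1), a1, P) * pow(commit(x2, r2), a2, P)) % P
# Right side: commitment to the weighted sums of messages and randomness.
x_agg = [a1 * u + a2 * v for u, v in zip(x1, x2)]
rhs = commit(x_agg, a1 * r1 + a2 * r2)
```

Both sides agree because exponents combine linearly modulo the group order, mirroring the ciphertext and tag aggregation with the same weights.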

Client

C_{i}

supplies, for each round, a non-interactive zero-knowledge proof

π_{i, t}^{(b)}

of the following composite statement:

There exists $v_{i, t} \in R^{d}$ and randomness $r_{i, t}^{(1)}, \dots, r_{i, t}^{(B_{t})}$ such that (a) $∥ v_{i, t} ∥_{2} \leq S$ , (b) the blocks $a_{i, t}^{(b)}$ are consecutive slices of either $v_{i, t}$ or $Q_{b} (Φ_{t} v_{i, t})$ , and (c) for each b we have $x_{i, t}^{(b)} = ⌊ Δ a_{i, t}^{(b)} ⌉ \mod q$ and the corresponding commitment equals $Com (x_{i, t}^{(b)}; r_{i, t}^{(b)})$ .

Such proofs can be realized using standard inner-product arguments and range proofs (e.g., Bulletproofs-style constructions) that express the clipping condition as a quadratic constraint on the coordinates and the encoding relation as a bounded difference between

Δ a_{i, t}^{(b)}

and

x_{i, t}^{(b)}

. Verification of

π_{i, t}^{(b)}

is performed by

Srv

before aggregation, but thanks to homomorphism, commitments may also be aggregated and verified in batch with auxiliary information provided by helpers after decryption.

4.3.4. Commutativity and Asymptotic Overheads

Summarizing the algebraic structure, for each block b and round t we have three parallel linear maps as follows:

L_{FE} : {x_{i, t}^{(b)}}_{i \in A_{t}} \mapsto \sum_{i \in A_{t}} {\tilde{α}}_{i, t} x_{i, t}^{(b)},

L_{tag} : {⌊ Δ a_{i, t}^{(b)} ⌉}_{i \in A_{t}} \mapsto \sum_{i \in A_{t}} {\tilde{α}}_{i, t} τ_{i, t}^{(b)},

L_{com} : {Com (x_{i, t}^{(b)}; r_{i, t}^{(b)})}_{i \in A_{t}} \mapsto \prod_{i \in A_{t}} Com {(x_{i, t}^{(b)}; r_{i, t}^{(b)})}^{{\tilde{α}}_{i, t}} .

Each of these maps is

Z

-linear in the sense that the image of the aggregated objects is equal to the aggregation of the images. Consequently, the following diagram commutes (up to negligible decryption error and modulo reductions):

\begin{matrix} {a_{i, t}^{(b)}}_{i} & \overset{\text{encode + encrypt}}{\to} & {(c_{1, i}^{(b)}, c_{2, i}^{(b)})}_{i} \\ ↓ L_{tag} & & ↓ L_{FE} \\ T_{t}^{(b)} & \overset{\text{FE-decrypt under } κ_{t}}{\leftarrow} & \sum_{i} {\tilde{α}}_{i, t} x_{i, t}^{(b)} . \end{matrix}

A similar commuting diagram holds for commitments. Our commuting checks enforce that each accepted ciphertext/tag/commitment tuple is well-formed and that the server’s decrypted output equals the linear aggregate of the submitted (bounded, encoded) client updates. This provides an auditable enforcement point for bounded-energy (e.g., clipped) updates, but it does not, by itself, prevent within-bound model-poisoning attacks. In practice, monitoring can be done via (i) per-round rejection/abort rates from failed checks, (ii) aggregate statistics that are safe to reveal (e.g., the number of clipped updates, which can be reported by clients as a single bit), and (iii) standard training diagnostics (loss/accuracy curves) to flag anomalous rounds; robust aggregation or anomaly detection can be layered on top without changing the cryptographic core. This commutativity is the main reason the verification overhead scales with the number of blocks and not with the number of clients:

Srv

can aggregate ciphertexts, tags, and commitments using the same coefficients and rely on a constant number of FE decryptions and aggregate-proof verifications per block.

From an asymptotic viewpoint, if

B_{t}

denotes the number of blocks per client in round t and F the number of distinct function vectors used for verification (coordinate basis plus

κ_{t}

and possibly a small number of additional selectors), then:

The number of FE decryptions per round is $O (F B_{t})$ , independent of $| A_{t} |$ .
The number of scalar LHT evaluations per client is $O (B_{t} D)$ , and the communication cost of tags is $O (B_{t} \log p)$ bits.
The size of commitments and their proofs grows as $O (B_{t} \log q)$ group elements per client, which is dominated by the KS-IPFE ciphertexts for typical parameter regimes.

Section 5 confirms empirically that, under realistic choices of D,

B_{t}

, p, and q, the verifiability layer adds only a low single-digit percentage to overall runtime and communication while significantly strengthening the integrity guarantees of FlowAgg-FE.

4.4. PaS-Stream: Rate-Adaptive Streaming Without Accuracy Bias

PaS-Stream instantiates a rate-adaptive transmission layer that composes Johnson–Lindenstrauss sketching, stochastic quantization, and KS-IPFE encryption into a single linear operator acting on client updates. Its design is such that (i) the estimator of the target aggregate

G_{t}

remains unbiased in the sense of Equation (8), (ii) the variance introduced by quantization is explicitly controlled, and (iii) partial receipt of blocks and client dropouts manifest as structured linear perturbations rather than protocol failures.

We consider a per-round sketching dimension k (either fixed or adapted over time) and a public sketching matrix

Φ_{t} \in R^{k \times d}

. For concreteness, one may view

Φ_{t}

as a subsampled randomized orthonormal transform with entries in

{\pm 1 / \sqrt{k}}

, as in Equation (7), although the analysis below only requires a Johnson–Lindenstrauss-type concentration property. Let

Q_{b} : R \to A_{b}

be a stochastic quantizer with

2^{b}

output levels satisfying

E [Q_{b} (z) ∣ z] = z, V [Q_{b} (z) ∣ z] \leq σ_{b}^{2}

for all

z \in R

, where

σ_{b}^{2} = O (2^{- 2 b})

is a variance parameter. We lift

Q_{b}

to act componentwise on vectors.
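One standard quantizer with the stated unbiasedness property is stochastic rounding to a uniform grid, sketched below. The grid spacing `step` is an assumption (for b bits over a clipped range [−R, R], one could take step = 2R/(2^b − 1)); the paper's exact quantizer and level set may differ.

```python
import math
import random


def quantize(z, step, rng=random):
    """Stochastic rounding to the grid step * Z; round up with probability frac."""
    lo = math.floor(z / step)
    frac = z / step - lo               # lies in [0, 1)
    return step * (lo + (1 if rng.random() < frac else 0))


def exact_mean(z, step):
    """Expectation of quantize(z, step), computed in closed form."""
    lo = math.floor(z / step)
    frac = z / step - lo
    return (1 - frac) * step * lo + frac * step * (lo + 1)


sample = quantize(0.3, 0.25)           # one of the two neighbouring grid points
mean = exact_mean(0.3, 0.25)           # equals 0.3 up to floating-point error
```

The output is always one of the two grid points adjacent to z, so its variance is at most step²/4, consistent with a variance parameter that shrinks as the bit-width grows.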

Client pipeline (round t). For each client

C_{i} \in A_{t}

, define the linear compression operator

C_{t} : R^{d} \to R^{k}, C_{t} (v) : = Φ_{t} v .

Client

C_{i}

first computes its clipped gradient

v_{i, t}

according to Equation (1) and then its sketch

s_{i, t} : = C_{t} (v_{i, t}) = Φ_{t} v_{i, t} \in R^{k} .

Next, it applies

Q_{b}

to obtain a random quantized sketch

u_{i, t} : = Q_{b} (s_{i, t}) \in A_{b}^{k}

with the property that

E [u_{i, t} ∣ s_{i, t}] = s_{i, t}

and

V [u_{i, t} ∣ s_{i, t}] ⪯ σ_{b}^{2} I_{k}

. Combining this with Equations (2) and (8), we have

E [\sum_{i \in A_{t}} α_{i, t} u_{i, t} | {v_{i, t}}_{i}] = Φ_{t} \sum_{i \in A_{t}} α_{i, t} v_{i, t} = Φ_{t} G_{t},

so the compressed aggregate is an unbiased estimator of the sketched target

Φ_{t} G_{t}

. We now partition the k-dimensional vector

u_{i, t}

into

B_{t} : = ⌈ k / D ⌉

contiguous blocks via a family of deterministic selection matrices

{P^{(b)} \in {0, 1}^{D \times k}}_{b = 1}^{B_{t}}

, each extracting D coordinates

u_{i, t}^{(b)} : = P^{(b)} u_{i, t} \in R^{D}, s_{i, t}^{(b)} : = P^{(b)} s_{i, t}, \forall b \in [B_{t}] .

By construction,

\sum_{b} {(P^{(b)})}^{⊤} P^{(b)}

is the

k \times k

identity, and thus

u_{i, t} = \sum_{b} {(P^{(b)})}^{⊤} u_{i, t}^{(b)}

. We write

a_{i, t}^{(b)} : = u_{i, t}^{(b)}

to emphasize that this is the real-valued payload block for KS-IPFE and the tag layer.
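The selection matrices and the reconstruction identity can be sketched directly. When k is not a multiple of D, the sketch below assumes that the last matrix contains zero rows (equivalently, that the last block is zero-padded), which is one convention under which the sum of the products of transposed and original selection matrices is still the k × k identity.

```python
def selection_matrices(k, D):
    """P^(b) in {0,1}^(D x k): row r of block b selects coordinate b*D + r, if it exists."""
    n_blocks = -(-k // D)              # ceil(k / D)
    mats = []
    for b in range(n_blocks):
        P = [[0] * k for _ in range(D)]
        for r in range(D):
            if b * D + r < k:
                P[r][b * D + r] = 1
        mats.append(P)
    return mats


def apply(P, u):
    """Compute P u."""
    return [sum(a * v for a, v in zip(row, u)) for row in P]


def apply_T(P, ub):
    """Compute P^T u_b."""
    return [sum(P[r][c] * ub[r] for r in range(len(P))) for c in range(len(P[0]))]


u = [1.5, -2.0, 0.25, 4.0, 3.0]        # k = 5, illustrative sketch values
Ps = selection_matrices(len(u), 2)     # D = 2 gives three blocks
blocks = [apply(P, u) for P in Ps]
parts = [apply_T(P, ub) for P, ub in zip(Ps, blocks)]
rebuilt = [sum(col) for col in zip(*parts)]
```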

Each block

a_{i, t}^{(b)}

is then encoded into an integer vector and packed using the KS-IPFE plaintext map as follows:

x_{i, t}^{(b)} : = ⌊ Δ a_{i, t}^{(b)} ⌉ \mod q \in Z_{q}^{D}, {\hat{x}}_{i, t}^{(b)} : = G_{D} x_{i, t}^{(b)} \in Z_{q}^{D l_{g}},

and encrypted into a block ciphertext

(c_{1, i}^{(b)}, c_{2, i}^{(b)})

according to Equation (13). In parallel,

C_{i}

computes the linearly homomorphic tag

τ_{i, t}^{(b)} = 〈⌊ Δ a_{i, t}^{(b)} ⌉, κ_{t}〉 \mod p

and then produces the clipping/encoding proof for this block. Each tuple

(n_{i, t}^{(b)}, c t_{i, t}^{(b)}, τ_{i, t}^{(b)}, π_{i, t}^{(b)})

consists of a monotone nonce

n_{i, t}^{(b)} \in Z_{\geq 0}

(e.g.,

(t, b)

encoded as a single integer), the ciphertext

c t_{i, t}^{(b)}

, tag, and proof. These are streamed to

Srv

in non-decreasing order of

n_{i, t}^{(b)}

, and the nonce is bound into

c t_{i, t}^{(b)}

(e.g., by hashing it into the LWE randomness) to prevent replay or reordering attacks. To validate

n_{i, t}^{(b)}

at scale,

Srv

only stores, for each active client

i \in A_{t}

, the largest accepted nonce (or equivalently the next expected block index). This is

O (| A_{t} |)

counters and can be garbage-collected at round end since nonces are round-scoped; e.g.,

10^{4}

active clients require

\approx 8 \times 10^{4}

bytes for 64-bit counters. If

n_{i, t}^{(b)}

is deterministically encoded as

(t, b)

, the check reduces to duplicate suppression and does not require long-term per-client history.
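The per-client nonce check can be sketched as a dictionary of counters. The class name `NonceTracker` and its interface are hypothetical; the sketch assumes that an accepted nonce must be strictly larger than the last accepted one for the same client, so that duplicates and reordered blocks are both rejected.

```python
class NonceTracker:
    """Per-round replay filter holding one counter per active client."""

    def __init__(self):
        self.last = {}                 # client id -> largest accepted nonce

    def accept(self, client, nonce):
        # Nonces are non-negative, so -1 means "nothing accepted yet".
        if nonce <= self.last.get(client, -1):
            return False
        self.last[client] = nonce
        return True

    def end_round(self):
        # Nonces are round-scoped, so all state can be discarded.
        self.last.clear()


tracker = NonceTracker()
results = [
    tracker.accept("c1", 0),           # first block from c1
    tracker.accept("c1", 1),           # next block, in order
    tracker.accept("c1", 1),           # replayed block
    tracker.accept("c1", 0),           # reordered block
    tracker.accept("c2", 0),           # counters are independent per client
]
```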

Server/helpers pipeline and rate adaptation. Upon receiving any subset of block messages from clients, the aggregator maintains, for each block index b, an active index set

I_{t}^{(b)} \subseteq A_{t}

consisting of clients whose block-b ciphertexts have arrived and passed basic syntactic checks (including proof verification). For that block, it computes the aggregated ciphertext and tag

C_{1}^{(b)} : = \sum_{i \in I_{t}^{(b)}} {\tilde{α}}_{i, t} c_{1, i}^{(b)}, C_{2}^{(b)} : = \sum_{i \in I_{t}^{(b)}} {\tilde{α}}_{i, t} c_{2, i}^{(b)},

(19)

T_{t}^{(b)} : = \sum_{i \in I_{t}^{(b)}} {\tilde{α}}_{i, t} τ_{i, t}^{(b)} \mod p .

Note that

I_{t}^{(b)}

can vary across blocks: a client may send early blocks promptly but drop out before later blocks. Let

θ \in (0, 1]

be a target coverage parameter;

Srv

may decide to close block b once

| I_{t}^{(b)} | \geq θ | A_{t} |

, discarding any later-arriving contributions to that block. This is the locus of rate adaptation: smaller

θ

accelerates progress at the cost of using fewer client contributions per block. Closing blocks early can correlate contribution with device speed: if slower clients systematically belong to underrepresented groups (a common concern in non-IID settings such as FEMNIST), their effective weight in the aggregate may be reduced. Our sketching and quantization remain unbiased conditional on the received set

I_{t}^{(b)}

, but unbiasedness alone does not guarantee population- or device-level fairness [34,35]. As simple mitigations, one may (i) track each client’s effective participation over time and compensate in

α_{i, t}

, and/or (ii) periodically run full-coverage rounds (

θ = 1

) to reduce drift.
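To make the closure rule concrete, the following sketch replays a list of validated block arrivals and closes each block once its coverage reaches θ|A_t|. The function name `close_blocks` and the event format `(time, client, block)` are hypothetical; proof verification and ciphertext aggregation are omitted.

```python
def close_blocks(arrivals, num_active, theta):
    """Replay validated arrivals and apply the coverage-based closure rule.

    arrivals: iterable of (time, client, block) tuples (hypothetical format).
    Returns (included, closed): included maps block -> sorted client list,
    closed is the set of blocks that reached coverage theta * num_active.
    """
    included, closed = {}, set()
    for _, client, block in sorted(arrivals):
        if block in closed:
            continue  # late contribution to a closed block is discarded
        members = included.setdefault(block, set())
        members.add(client)
        if len(members) >= theta * num_active:
            closed.add(block)
    return {b: sorted(s) for b, s in included.items()}, closed
```

With four scheduled clients and θ = 0.5, a block closes after its second contribution, and a third client arriving later is ignored for that block only.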

For each closed block b, the pair

(C_{1}^{(b)}, C_{2}^{(b)})

is forwarded to both helpers, who evaluate it under KS-IPFE function keys for the standard basis vectors

e_{1}, \dots, e_{D}

and the tag selector

κ_{t}

. Using Equation (15), helpers produce shares

σ_{j}^{\circ, (b)} : = DecShare ({sk}_{e_{j}}^{\circ}, C_{1}^{(b)}, C_{2}^{(b)}), σ_{κ_{t}}^{\circ, (b)} : = DecShare ({sk}_{{\bar{κ}}_{t}}^{\circ}, C_{1}^{(b)}, C_{2}^{(b)}),

for

\circ \in {A, B}

,

j \in [D]

. Aggregating shares and rounding as in Equation (16) yields, with overwhelming probability,

z_{j, t}^{(b)} : = Round (σ_{j}^{A, (b)} + σ_{j}^{B, (b)}) = 〈\sum_{i \in I_{t}^{(b)}} {\tilde{α}}_{i, t} x_{i, t}^{(b)}, e_{j}〉,

so that

z_{t}^{(b)} : = {(z_{1, t}^{(b)}, \dots, z_{D, t}^{(b)})}^{⊤} = \sum_{i \in I_{t}^{(b)}} {\tilde{α}}_{i, t} x_{i, t}^{(b)} \in Z_{q}^{D} .

Dividing by

Δ

and undoing the fixed-point encoding recovers

{\hat{u}}_{t}^{(b)} : = Δ^{- 1} z_{t}^{(b)} \approx \sum_{i \in I_{t}^{(b)}} α_{i, t} u_{i, t}^{(b)},

where the approximation error is due only to rounding in Equation (3). In parallel, the tag decryption under

κ_{t}

produces

{\tilde{T}}_{t}^{(b)}

, which is checked against

T_{t}^{(b)}

as in the previous subsection, ensuring consistency between ciphertext aggregates and tags.
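The encode, aggregate, decode, and tag-check steps can be illustrated at the plaintext level, with the encryption layer left out entirely. In this sketch the weights are taken to be small integers, the tag modulus `p = 2**61 - 1` is a hypothetical choice, and the aggregate is assumed not to wrap modulo q; none of these choices is taken from the paper.

```python
import random

q = 2**32          # ciphertext modulus from the default parameters
p = 2**61 - 1      # hypothetical tag modulus
DELTA = 2**16      # fixed-point scaling factor

def encode(a):
    """Fixed-point encoding: round(DELTA * a) as signed integers."""
    return [round(DELTA * v) for v in a]

def tag(x, kappa):
    """Linearly homomorphic tag <x, kappa> mod p."""
    return sum(xi * ki for xi, ki in zip(x, kappa)) % p

def aggregate(blocks, weights):
    """Weighted sum reduced to Z_q, i.e. what blockwise decryption would return."""
    dim = len(blocks[0])
    return [sum(w * x[j] for w, x in zip(weights, blocks)) % q for j in range(dim)]

def centered(z):
    """Lift Z_q representatives to the signed range [-q/2, q/2)."""
    return [v - q if v >= q // 2 else v for v in z]

def decode(z):
    """Undo the fixed-point encoding."""
    return [v / DELTA for v in centered(z)]

def tags_consistent(z, client_tags, weights, kappa):
    """Check that the tag of the decoded aggregate equals the aggregated tag."""
    agg_tag = sum(w * t for w, t in zip(weights, client_tags)) % p
    return tag(centered(z), kappa) == agg_tag
```

Because both the aggregate and the tag are linear in the encoded vectors, any change to the aggregate that is not reflected in the tags makes `tags_consistent` fail, except with probability about 1/p over the choice of `kappa`.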

Stacking all blocks, we define the recovered sketched aggregate as

{\hat{U}}_{t} : = \sum_{b = 1}^{B_{t}} {(P^{(b)})}^{⊤} {\hat{u}}_{t}^{(b)} \in R^{k} .

(20)

By linearity of

P^{(b)}

and the unbiasedness of

u_{i, t}

, conditioning on the sets

{I_{t}^{(b)}}_{b}

we obtain

\begin{matrix} E [{\hat{U}}_{t} ∣ {v_{i, t}}_{i}, {I_{t}^{(b)}}_{b}] & = \sum_{b} {(P^{(b)})}^{⊤} \sum_{i \in I_{t}^{(b)}} α_{i, t} E [u_{i, t}^{(b)} ∣ v_{i, t}] \\ & = \sum_{b} {(P^{(b)})}^{⊤} \sum_{i \in I_{t}^{(b)}} α_{i, t} s_{i, t}^{(b)} \\ & = \sum_{i} α_{i, t} \sum_{b : i \in I_{t}^{(b)}} {(P^{(b)})}^{⊤} s_{i, t}^{(b)} . \end{matrix}

When all blocks are received,

I_{t}^{(b)} = A_{t}

for all b, and the last expression collapses to

Φ_{t} G_{t}

by

\sum_{b} {(P^{(b)})}^{⊤} P^{(b)} = I_{k}

. Under rate adaptation (i.e., some

I_{t}^{(b)} ⊊ A_{t}

), the expectation equals

Φ_{t}

applied to a truncated aggregate in which each coordinate receives contributions only from those clients whose corresponding block arrived before closure. This aligns with the FL semantics where the effective aggregation set is the subset of clients that succeed in uploading their updates before the round deadline. Finally, the model update is computed from

{\hat{U}}_{t}

via a pseudo-inverse of the sketch

{\hat{G}}_{t} : = Φ_{t}^{⊤} {\hat{U}}_{t} / Δ,

which is the estimator used in Section 5. In the idealized full-participation regime,

E [{\hat{G}}_{t} ∣ {v_{i, t}}]

=

Φ_{t}^{⊤} Φ_{t} G_{t}

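, which recovers

G_{t}

only in expectation over the draw of

Φ_{t}

, since E [Φ_{t}^{⊤} Φ_{t}] = I_{d} when the entries are i.i.d. \pm 1 / \sqrt{k}, whereas a fixed Φ_{t} with k < d yields a rank-k matrix Φ_{t}^{⊤} Φ_{t}; in the non-ideal regime,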

Φ_{t}^{⊤}

acts as a linear reconstruction operator for the partially observed sketch. Because decryptions and verifications occur blockwise, the server can begin updating coordinates associated with blocks that have already been closed and validated, while late blocks continue to stream, thereby tolerating both dropouts and heavy-tailed straggler behavior without violating the FE-based confidentiality guarantees.

5. Experiments

We empirically evaluate FlowAgg-FE—our KS-IPFE with PaS-Stream—on cross-device federated learning. Experiments target three questions: (i) Does FE-based transmission preserve model quality relative to plaintext and encrypted secure aggregation? (ii) What communication and compute savings are achieved? (iii) How robust is the system to client dropout and stragglers? All notation follows Section 3 and Section 4: clients compute clipped updates

v_{i, t}

(Equation (1)), the server seeks only the linear aggregate

G_{t}

(Equation (2)), and PaS-Stream uses

Φ_{t}

-sketching and unbiased quantization

Q_{b}

(Equations (7) and (8)) before KS-IPFE encryption.

5.1. Setup

We empirically evaluate FlowAgg-FE in a simulated cross-device federated learning environment that captures three interacting dimensions: (i) the statistical properties of the data distribution across clients, (ii) the training hyperparameters and model architectures, and (iii) the cryptographic and systems configuration of KS-IPFE and PaS-Stream. All experiments follow the notation of Section 3 and Section 4: each participating client

C_{i} \in A_{t}

produces a clipped update

v_{i, t}

(Equation (1)), the aggregator is interested only in the linear functional

G_{t}

(Equation (2)), and PaS-Stream applies

Φ_{t}

-sketching and unbiased quantization

Q_{b}

(Equations (7) and (8)) before KS-IPFE encryption. This subsection details the learning tasks, partitioning strategies, cryptographic parameters, and system environment used throughout the evaluation.

Tasks, models, and data partitioning. We consider two canonical FL tasks representative of vision and character recognition workloads:

CIFAR-10 [36]. A standard 10-class image classification task on $32 \times 32$ color images with $50,000$ training and $10,000$ test examples. We employ a ResNet-18 backbone with $d \approx 11.2$ M trainable parameters. The data is partitioned across $n = 1000$ virtual clients according to a Dirichlet distribution with concentration $α_{CIFAR} = 0.5$ over class labels, producing a moderately non-IID distribution in which clients see a biased subset of classes. In each round we sample $| A_{t} | = 100$ clients uniformly without replacement and perform $E = 1$ local epoch per client, for $T = 100$ global rounds.
FEMNIST [37]. A character recognition task derived from the Extended MNIST dataset, partitioned by writer. We use a small convolutional neural network (two convolutional layers followed by two fully connected layers) with $d \approx 1.5$ M parameters. The dataset is partitioned across $n = 3400$ clients (writers), each holding between 20 and 200 images; we model this using a Dirichlet distribution with $α_{FEMNIST} = 0.3$ to accentuate heterogeneity. Each round samples $| A_{t} | = 256$ clients, with $E = 1$ local epoch and $T = 120$ global rounds.
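A Dirichlet label partition of the kind described in the two task descriptions can be sketched with the standard library alone. This is one common variant (per class, draw client proportions from a symmetric Dirichlet and split that class's examples accordingly); the exact procedure used in the experiments is not specified in the text, so the function names and the splitting rule here are assumptions.

```python
import random

def dirichlet(alpha, k, rng):
    """Sample from a symmetric Dirichlet(alpha) via normalized Gamma draws."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [v / total for v in g]

def partition_by_label(labels, num_clients, alpha, seed=0):
    """Assign example indices to clients with per-class Dirichlet proportions."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        props = dirichlet(alpha, num_clients, rng)
        start, acc = 0, 0.0
        for cid, pr in enumerate(props):
            acc += pr
            # The last client takes the remainder so every index is assigned once.
            end = len(idx) if cid == num_clients - 1 else round(acc * len(idx))
            clients[cid].extend(idx[start:end])
            start = end
    return clients
```

Smaller `alpha` concentrates each class on fewer clients, which is why 0.3 yields a more heterogeneous split than 0.5.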

Unless otherwise specified, all methods (Plaintext-FedAvg, Encrypted SecAgg, KS-IPFE, and PaS-Stream) use identical optimization hyperparameters (learning rate schedule, momentum, weight decay, and clipping threshold S) and client participation patterns. Initial model weights are shared across methods, and each configuration is repeated with three random seeds (affecting client sampling, data shuffling, and cryptographic noise); we report the mean over seeds. We additionally include two recent secure aggregation baselines, SecAgg+ and BatchCrypt, to contextualize FlowAgg-FE against modern encrypted aggregation protocols under the same participation/latency model. Although we report end-to-end training on CIFAR-10 and FEMNIST for reproducibility, FlowAgg-FE’s cryptographic payload and helper work scale with the sketch dimension k (and block size D), not directly with the raw model dimension d. In particular, per-client encrypted uplink is

Θ (k)

(split into

B_{t} \approx k / D

fixed-size blocks), so the same configuration can support models with millions of parameters provided an appropriate k is chosen.

Table 1 summarizes the federated task configuration, highlighting the interaction between data partitioning and participation. The secure aggregation baselines (Table 2) run on the same tasks and inherit the configuration parameters

(n, | A_{t} |, T, E, α)

from Table 1.

Figure 1 visualizes the resulting distribution of per-client dataset sizes under this sampling procedure. The heavy right tails for both tasks reflect the presence of a small number of “heavy” clients, which interact non-trivially with PaS-Stream’s rate-adaptive behavior and dropout robustness.

Cryptographic and streaming parameters. All experiments use a single family of LWE and gadget parameters for KS-IPFE, and a fixed sketching dimension for PaS-Stream unless explicitly varied in ablations. The default cryptographic parameters are as follows:

Modulus $q = 2^{32}$ , LWE dimension $n_{l} = 1024$ , additive noise distributions $χ, χ^{'}$ with small standard deviations selected to satisfy Equation (6) for up to $| A_{t} |$ active clients,
Gadget base 2 with $l_{g} = 16$ digits per coordinate and block size $D = 64$ so that each plaintext block encodes 64 scaled coordinates,
Scaling factor $Δ = 2^{16}$ satisfying $Δ S ≪ q / 8$ to avoid wrap-around on clipped updates.

For PaS-Stream, we set the sketch dimension

k = 8192

and the quantization bit-width

b \in {8, 4}

; the resulting compression operator

Φ_{t} \in {\pm 1 / \sqrt{k}}^{k \times d}

is resampled every 10 rounds to mitigate potential adversarial alignment between the sketch and the data distribution. We use a 10-round rotation as a conservative balance between privacy (limiting long-term alignment/linkability of a fixed sketch) and optimizer stability (keeping the compression operator fixed long enough for momentum/error-feedback to adapt). We add a rotation-period sensitivity study in Section 5.3. The target coverage parameter for block closure in rate adaptation is

θ = 0.7

unless otherwise specified, meaning that a block is sealed once at least

70 %

of scheduled clients have successfully uploaded that block.

Table 3 summarizes the main cryptographic and streaming parameters, and Figure 2 depicts the simulated latency distribution used to emulate straggler behavior.

System environment. Experiments are executed on a cluster with two non-colluding helper processes and one aggregator process. Each helper runs on a 32-core 3.0 GHz CPU with 128 GB of RAM; the aggregator runs on a similar machine. The FL orchestration uses asynchronous RPC between clients and server, with simulated client processes replaying latency samples drawn from a Pareto distribution with shape parameter

1.2

and scale

1.0

s. The number of simulated clients (

n \in {1000, 3400}

) exceeds the per-round participation

| A_{t} |

, so that each round includes a fresh random subset of clients.
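The straggler model can be reproduced in a few lines. The sketch below draws Pareto latencies with the stated shape and scale and reports when a fraction θ of clients has responded; the helper names and the use of a single latency per client (rather than per block) are simplifying assumptions.

```python
import math
import random

def sample_latencies(num_clients, shape=1.2, scale=1.0, seed=0):
    """Draw per-client latencies (seconds) from a Pareto(shape) with minimum `scale`."""
    rng = random.Random(seed)
    # random.paretovariate returns values >= 1, so multiplying sets the scale.
    return [scale * rng.paretovariate(shape) for _ in range(num_clients)]

def closure_time(latencies, theta):
    """Time at which at least a fraction theta of clients has reported."""
    ordered = sorted(latencies)
    need = max(1, math.ceil(theta * len(ordered)))
    return ordered[need - 1]
```

With shape 1.2 the distribution has a finite mean but infinite variance, so waiting for full coverage (θ = 1) is dominated by the slowest client, whereas a partial-coverage threshold is far less sensitive to the tail.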

Figure 2 shows the empirical cumulative distribution function (CDF) of simulated client latencies. The heavy tail implies that a non-trivial fraction of clients are extreme stragglers, providing a realistic stress test for PaS-Stream’s ability to make progress with partial block coverage.

5.2. Main Results

We now compare FlowAgg-FE against the plaintext and encrypted secure aggregation baselines along three axes: final model quality, per-round communication cost, and per-round server-side compute. All experiments in this subsection use the setup of Section 5.1 with the default cryptographic and streaming parameters in Table 3. In addition to classic Encrypted SecAgg, we include SecAgg+ and BatchCrypt as modern encrypted baselines (Table 4 and Table 5). For readability, we focus on CIFAR-10 and FEMNIST; additional ablation results are deferred to the next subsection.

Accuracy. Table 4 reports final test accuracy after

T = 100

rounds on CIFAR-10 and

T = 120

rounds on FEMNIST, averaged over three random seeds. We observe that all secure methods match the plaintext baseline to within

0.3 %

absolute on both tasks. In particular, KS-IPFE in full-precision mode is almost indistinguishable from Encrypted SecAgg, while PaS-Stream with

b = 8

and

b = 4

introduces only minor degradation consistent with the unbiasedness guarantee of Equation (8) and the controlled variance of

Q_{b}

.

The small accuracy gaps between PaS-Stream and Plaintext-FedAvg can be attributed to two effects: (i) the additional variance introduced by

Q_{b}

, which effectively adds a small amount of noise to each coordinate of the sketched gradient, and (ii) the use of a finite sketch dimension

k = 8192

, which slightly distorts the geometry of the gradient space relative to the full d-dimensional model. In practice, these effects are dominated by the inherent noise of stochastic optimization, and the models converge to essentially the same generalization performance.

To corroborate the small gaps in Table 4, we include round-by-round convergence curves in Figure 3 and have verified that all baselines share the same optimizer, data pipeline, and participation schedule.

Communication and computation. We next quantify the per-round per-client uplink and the server-side CPU time. Uplink reflects the serialized size (in megabytes) of all messages sent by a client to the server in a given round, including KS-IPFE ciphertexts, tags, and proofs; CPU time aggregates the wall-clock time spent by the aggregator and both helpers. Table 5 summarizes these metrics on CIFAR-10, and Figure 4 and Figure 5 visualize the same data. While Table 5 and Table 6 report uplink and server CPU, client-side encryption cost is also important in cross-device FL; Appendix A gives a back-of-the-envelope estimate of this cost.

In addition to communication and server CPU, we provide an explicit accounting of helper-side storage/latency and key-management overhead (computed from Table 3, reported in Table 7). With

q = 2^{32}

(4 bytes/word),

D = 64

,

l_{g} = 16

(

D l_{g} = 1024

), and

m = 2048

, one split function key share has

(m + D l_{g}) = 3072

words (≈12 KB). Thus, storing the D basis shares for block decryption costs

\approx 0.77

MB per helper, and per-round refresh material (e.g., for

κ_{t}

) is only ≈12 KB per helper. Per block, each helper evaluates

(D + 1)

decryption shares, i.e., about

(D + 1) (m + D l_{g}) \approx 2.0 \times 10^{5}

modular multiply-adds, and the server-side nonce state is

O (| A_{t} |)

counters.
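The storage and work figures in this paragraph follow from simple arithmetic on the Table 3 parameters, which the following sketch reproduces. The variable names are illustrative; the only inputs are the values quoted in the text.

```python
WORD_BYTES = 4             # q = 2^32, so one word per 4 bytes
D, l_g, m = 64, 16, 2048   # block size, gadget digits, and m as quoted above

share_words = m + D * l_g                  # words in one split function key share
share_bytes = WORD_BYTES * share_words     # about 12 KB
basis_storage_bytes = D * share_bytes      # D basis shares held by each helper
ops_per_block = (D + 1) * share_words      # D basis functionals plus the tag selector

print(share_words, share_bytes, basis_storage_bytes, ops_per_block)
```

The basis-share storage comes to 786,432 bytes, i.e., 0.75 MiB or about 0.79 MB in decimal units, which is consistent with the rounded ≈0.77 MB figure (64 × 12 KB) given in the text.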

KS-IPFE already cuts uplink roughly in half compared to Encrypted SecAgg, even though it operates in full precision without sketching. This is largely due to (i) more efficient packing of plaintext coordinates into KS-IPFE blocks and (ii) avoiding some of the malleability-resistant padding required by the SecAgg pipeline. When PaS-Stream is enabled, the sketching dimension k and the quantization depth b further reduce the size of each client’s encoded update, yielding

2.5 \times

(8-bit) and

3.4 \times

(4-bit) reductions in uplink. On the compute side, FlowAgg-FE reduces helper and aggregator workload by lowering the number of effective plaintext coordinates that must be processed per round; PaS-Stream with

b = 4

achieves a

1.77 \times

reduction in CPU time relative to Encrypted SecAgg while preserving accuracy within

0.3 %

.

Figure 4 and Figure 5 display these results as grouped bar charts. Each bar corresponds to one of the four methods; the shading encodes the method as described in the caption, and numerical values are shown atop each bar for reference.

Overall, these results show that the combination of KS-IPFE and PaS-Stream achieves near-plaintext accuracy while significantly reducing both bandwidth and compute relative to a strong secure aggregation baseline. The gains arise from co-design across cryptography (functional encryption tailored to linear aggregation) and systems (sketching, quantization, and streaming), rather than from any single optimization in isolation.

5.3. Ablations and Stress Tests

We now probe the behavior of FlowAgg-FE under variations in quantization/sketch parameters and under adversarial systems conditions such as client dropout and heavy-tailed latencies. All experiments in this subsection are conducted on CIFAR-10 unless otherwise specified; FEMNIST exhibits qualitatively similar trends and is omitted for brevity. We focus on four questions: (i) how sensitive is model quality to the sketch dimension k and quantization depth b; (ii) how much communication is saved by more aggressive compression; (iii) how robust is PaS-Stream to random dropouts; and (iv) what overhead is induced by the verifiability layer under straggler-heavy latency distributions.

Quantization depth and sketch dimension. Recall that PaS-Stream applies a linear sketch

Φ_{t} \in R^{k \times d}

to each clipped client gradient and then applies a stochastic quantizer

Q_{b}

with bit-depth b. From Equation (8), the aggregated sketched update remains an unbiased estimator of

Φ_{t} G_{t}

, while the variance introduced by quantization scales as

σ_{b}^{2} = O (2^{- 2 b})

per coordinate. The sketch dimension k controls the Johnson–Lindenstrauss distortion and the amount of information preserved about the gradient direction.
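The unbiasedness and variance scaling of a b-bit stochastic quantizer can be checked empirically. The quantizer below is a generic uniform stochastic-rounding scheme over a clipped range; it is an assumption made for illustration and is not necessarily the exact Q_b of Equation (7).

```python
import math
import random

def stochastic_quantize(v, b, clip, rng):
    """Uniform stochastic rounding of each coordinate onto 2^b levels in [-clip, clip]."""
    levels = 2**b - 1
    step = 2 * clip / levels
    out = []
    for x in v:
        x = max(-clip, min(clip, x))
        pos = (x + clip) / step          # real-valued grid position
        lo = math.floor(pos)
        frac = pos - lo
        # Round up with probability frac, so the expected index equals pos.
        idx = lo + (1 if rng.random() < frac else 0)
        out.append(-clip + idx * step)
    return out
```

Each coordinate moves by at most one grid step, so the per-coordinate variance is at most step²/4, which shrinks as 2^{-2b} for fixed clipping range, in line with the scaling stated above.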

Table 6 reports an ablation over

(k, b)

, showing accuracy, uplink, and server CPU time on CIFAR-10. The “PaS-Stream” rows differ only in their compression parameters; all other aspects of the protocol are identical.

Moving from

(k, b) = (16384, 8)

to

(8192, 8)

yields a

1.27 \times

reduction in uplink with negligible impact on accuracy, while

(8192, 4)

further reduces uplink by

1.37 \times

and only degrades accuracy by

0.2

percentage points. At

k = 4096

, the sketch becomes more aggressive: the uplink drops to

0.60

MB per client per round at

b = 4

, but the accuracy falls to

91.0 %

, a

0.7 %

drop relative to the plaintext baseline. In practice,

(k, b) = (8192, 4)

appears to strike a favorable balance between compression and model quality.

Figure 6 visualizes the trade-off between compression and accuracy. The horizontal axis denotes the uplink reduction factor relative to Encrypted SecAgg, and the vertical axis shows the corresponding CIFAR-10 accuracy. Each marker corresponds to one configuration from Table 6; the monotone drop in accuracy as compression intensifies reflects the increasing variance of the estimator

{\hat{G}}_{t}

.

Dropout robustness and throughput under stragglers. We next examine robustness to random client dropout and heavy-tailed latency. For each method, we induce an independent per-client dropout probability

ρ \in {0 %, 10 %, 20 %, 30 %}

by randomly marking scheduled clients as unavailable on each round; the remaining active clients behave as in Section 5.1. PaS-Stream additionally employs rate adaptation: once a block reaches coverage

θ = 0.7

of the scheduled clients, it is closed and further messages for that block are ignored.

Table 8 reports final CIFAR-10 accuracy and a normalized throughput metric defined as the number of effective model updates per wall-clock minute (higher is better), under varying dropout rates for Encrypted SecAgg and PaS-Stream with

(k, b) = (8192, 8)

. Accuracy remains stable up to

ρ = 30 %

for both methods, with PaS-Stream tracking Encrypted SecAgg within

0.1

–

0.2 %

at each dropout level. Throughput decreases with

ρ

under Encrypted SecAgg because the protocol incurs coordination overhead due to missing contributions; by contrast, PaS-Stream’s blockwise decryption and early closure allow it to increase throughput slightly as

ρ

grows, effectively trading off some participation against faster rounds.

Figure 7 visualizes the accuracy drop as a function of

ρ

. The solid line corresponds to Encrypted SecAgg, and the dashed line to PaS-Stream; markers indicate the discrete dropout rates tested.

Finally, we evaluate the impact of heavy-tailed latencies under the Pareto model of Figure 2. Table 9 reports the relative overhead of the verifiability layer—linearly homomorphic tags and clipping/encoding proofs—on CIFAR-10 for KS-IPFE and PaS-Stream. We report the additional CPU time (absolute and as a percentage of the base KS-IPFE cost) and the additional communication per client per round.

Figure 8 shows the normalized throughput (effective model updates per minute) for KS-IPFE and PaS-Stream with and without the verifiability layer, under the same heavy-tailed latency configuration. Bars on the left of each pair correspond to the base protocol, and bars on the right to the verifiable variant. The small gaps between bars corroborate that the commuting verification layer adds only modest overhead while significantly strengthening integrity guarantees.

6. Conclusions

We have presented FlowAgg-FE, a novel verifiable functional encryption framework for secure and communication-efficient gradient transmission in distributed machine learning, tailored to the requirements of the Special Issue on security and privacy in distributed machine learning. At the cryptographic layer, our KS-IPFE scheme instantiates a key-splittable, LWE-based inner-product FE construction that supports high-dimensional, blockwise aggregation with 2-of-2 threshold decryption across two non-colluding helpers, thereby providing both function privacy and robustness against any single compromised server. At the systems layer, PaS-Stream integrates Johnson–Lindenstrauss sketching, unbiased quantization, and streaming FE encryption to produce rate-adaptive ciphertext flows that preserve unbiased estimation of the target aggregate

G_{t}

while tolerating client dropouts and stragglers. Commuting linearly homomorphic tags and clipping proofs add an efficient verifiability mechanism that ensures end-to-end integrity of the aggregated updates without exposing per-client gradients. Our empirical evaluation on CIFAR-10 and FEMNIST demonstrates that FlowAgg-FE matches plaintext and state-of-practice secure aggregation accuracy within

0.3 %

absolute, reduces per-client uplink by up to

3.4 \times

, and lowers server-side CPU time by up to

1.77 \times

under realistic participation patterns. These results indicate that carefully co-designed FE, compression, and verifiability can make function-private, scalable secure aggregation a practical building block for future federated and distributed learning systems. Future work includes extending KS-IPFE to richer families of linear and low-degree polynomial functions, exploring adaptive key rotation and revocation in dynamic client populations, and integrating our framework with production FL platforms and hardware accelerators.

Author Contributions

Conceptualization, Z.T., Z.P. and S.Y.; Methodology, Z.P.; Formal analysis, Z.T.; Investigation, Z.P.; Writing—original draft, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

Guangdong University of Science and Technology 2024 University-Level Project: Research on Innovation Strategies of Dongguan Cross-border E-commerce Driven by Digital Economy (GKY-2024KYZDW-14).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Parameter Selection

We briefly quantify the computational and communication overhead of FlowAgg-FE and justify the concrete parameter choices used in Section 5. Throughout, we assume that the block length D and gadget parameter

l_{g}

are fixed at setup and that all parties share a 32-bit word representation for elements of

Z_{q}

.

From Equation (13), a single client block encryption computes

c_{1} = A^{⊤} r + e_{1}, c_{2} = B^{⊤} r + \hat{x} + e_{2} .

The dominant costs are the matrix–vector products

A^{⊤} r

and

B^{⊤} r

. The former uses an

m \times n_{l}

matrix, costing

O (n_{l} m)

word operations; the latter uses a

D l_{g} \times n_{l}

matrix, costing

O (n_{l} D l_{g})

. For typical choices in which m and

D l_{g}

are of the same order (e.g.,

m \approx 2 D l_{g}

), the total per-block cost is

T_{enc}^{block} = Θ (n_{l} (m + D l_{g}))

ring multiplications and additions in

Z_{q}

, plus

O (D l_{g})

operations for gadget packing

\hat{x} = G_{D} x

and adding the small error vectors. A client with sketch dimension k encrypts

B_{t} = ⌈ k / D ⌉

blocks per round, so its total per-round encryption cost is

B_{t} T_{enc}^{block}

. Instantiating the above with our default parameters (Table 3) gives

D = 64

,

l_{g} = 16

,

n_{l} = 1024

, and typically

m \approx 2 D l_{g} = 2048

, hence

(m + D l_{g}) \approx 3072

. This yields about

n_{l} (m + D l_{g}) \approx 3.1 \times 10^{6}

ring mul-adds per encrypted block. With

k = 8192

and

B_{t} = k / D = 128

blocks per round, the per-round encryption work is roughly

4.0 \times 10^{8}

ring mul-adds. Using a conservative throughput of

10^{9}

32-bit operations/s and 2 W device power gives a back-of-the-envelope estimate of <0.5 s and <1 J per round (device- and implementation-dependent), suggesting feasibility on mobile-class hardware.
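The client-side estimate above can be reproduced directly. The throughput (10^9 operations/s) and power (2 W) are the assumptions stated in the text, and the variable names are illustrative.

```python
n_l, D, l_g, m, k = 1024, 64, 16, 2048, 8192   # default parameters quoted above

per_block_ops = n_l * (m + D * l_g)    # mul-adds for A^T r and B^T r in one block
blocks_per_round = -(-k // D)          # ceil(k / D)
per_round_ops = blocks_per_round * per_block_ops

seconds = per_round_ops / 1e9          # assumed 10^9 32-bit operations per second
joules = 2.0 * seconds                 # assumed 2 W device power

print(per_block_ops, blocks_per_round, per_round_ops, seconds, joules)
```

This gives about 3.1 × 10^6 operations per block, 4.0 × 10^8 per round, roughly 0.4 s, and roughly 0.8 J, in agreement with the bounds stated in the text.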

Server-side aggregation is purely additive. For a fixed block b, forming

C_{1}^{(b)}

and

C_{2}^{(b)}

as in Equation (14) requires, for each participating client, adding one vector in

Z_{q}^{m}

and one in

Z_{q}^{D l_{g}}

, i.e.,

T_{agg}^{block} = O (| A_{t} | (m + D l_{g}))

ring additions. Over all

B_{t}

blocks, aggregation scales linearly in both

| A_{t} |

and k, but does not depend on

n_{l}

.

Each helper performs decryption by computing, for each function vector

y

, two inner products as in Equation (15): one with

C_{2}^{(b)} \in Z_{q}^{D l_{g}}

and one with

C_{1}^{(b)} \in Z_{q}^{m}

. This costs

T_{dec}^{block} (y) = O (D l_{g} + m)

ring multiplications/additions. If we conceptually treat the decryption of a full D-dimensional block (i.e., the D coordinate functionals

y = e_{1}, \dots, e_{D}

) as a single “block decryption” and amortize shared work across coordinates, then per block we pay

O (D (D l_{g} + m))

operations in the worst case but with a small constant. With

k = 8192

and

D \in {32, 64}

, we have

B_{t} = k / D \in {256, 128}

, so there are 128–256 such block decryptions per round, independent of the number of clients; this is the quantity reported in Section 5. Additional decryptions for the tag selector

κ_{t}

and any auxiliary function vectors contribute only a small constant factor.

In terms of communication, one KS-IPFE ciphertext per block consists of

(c_{1}, c_{2}) \in Z_{q}^{m + D l_{g}}

, i.e.,

m + D l_{g}

words modulo q. With

q = 2^{32}

, each word fits in 4 bytes, so a single block ciphertext has size

{size}_{ct}^{block} = 4 (m + D l_{g}) bytes .

For example, with

D = 64

and

l_{g} = 16

(so

D l_{g} = 1024

) and a moderately overprovisioned

m = 2048

, we have

{size}_{ct}^{block} \approx 4 \times 3072 \approx 12

KB. A client with

k = 8192

transmits

B_{t} = 128

such blocks, totaling roughly

1.5

MB of ciphertexts per round. Linearly homomorphic tags add one element of

Z_{p}

per block (e.g., 8 bytes for a 64-bit p), and commitments/proofs add a small multiple of group elements; in our parameter regimes, these components together contribute less than

5 %

of the ciphertext bandwidth.

Finally, we justify the parameter set used in the experiments as follows:

q = 2^{32}, n_{l} = 1024, l_{g} = 16, Δ = 2^{16}, b \in {4, 8}, D = 64 .

The scaling factor

Δ = 2^{16}

and clipping threshold S are chosen so that

Δ S = 2^{16} S ≪ q / 8 = 2^{29},

ensuring that encoded coordinates do not wrap modulo q for all clipped updates and that the mapping

x \mapsto ⌊ Δ^{- 1} x ⌉

remains injective over the support of interest. The LWE parameters

(n_{l}, q, χ^{'})

are selected so that the aggregated decryption noise

η

per block obeys Equation (6), namely

η ≪ q / 4, η = O (σ_{χ^{'}} \sqrt{| A_{t} |} {∥ \tilde{y} ∥}_{2}),

for all authorized function vectors

y

and for

| A_{t} |

up to several thousand. Setting

σ_{χ^{'}}, σ_{χ}

in the low tens, standard LWE estimates yield a per-block decryption failure probability below

2^{- 40}

, and thus, by a union bound, an overall per-round failure probability below

10^{- 7}

when decrypting on the order of

10^{5}

blocks. This parameter point therefore simultaneously satisfies correctness (via Equation (6)), security (via the hardness of LWE at dimension

n_{l} = 1024

and modulus

q = 2^{32}

), and practicality (via per-client uplink and helper-time budgets reported in Section 5).

Appendix B. Additional Security Proofs

This appendix formalizes the confidentiality and verifiability claims for KS-IPFE and the LHT layer using standard game-based and simulation-based arguments. We present single-round, single-block experiments; multi-block security follows by a hybrid over blocks, and multi-round security follows because encryption randomness and tag keys are fresh each round. Throughout, we use the notation and algorithms defined in Section 3 and Section 4, and we refer to Equation (11) (public key structure), Equation (12) (key splitting), Equation (15) (partial decryption share), and Equation (5) (share combination).

Appendix B.1. KS-IPFE Interface (Explicit Algorithms)

We make explicit the KS-IPFE algorithms used by the protocol. The message space is a block vector

x \in M \subseteq Z_{q}^{D l_{g}}

(after gadget packing/integer encoding), and the function space is

y \in Y \subseteq Z_{q}^{D l_{g}}

corresponding to a linear functional

〈 x, y 〉

over

Z_{q}

.

Algorithms

$Setup (1^{λ}) \to (mpk, msk)$ : sample $A \leftarrow_{R} Z_{q}^{n_{l} \times m}$ and secret S and error E (as in Equation (11)), set $B = A S + E$ , and output $mpk = (A, B, G_{D})$ with $msk = S$ (and any auxiliary trapdoor/parameters as in Section 3).
$Enc (mpk, x) \to ct = (c_{1}, c_{2})$ : sample fresh randomness $r, e_{1}, e_{2}$ and output $c_{1} = A^{⊤} r + e_{1} \in Z_{q}^{m}$ and $c_{2} = B^{⊤} r + e_{2} + \hat{x} \in Z_{q}^{D l_{g}}$ , where $\hat{x}$ is the gadget-packed encoding of $x$ (Section 3).
$KeySplit (msk, y) \to ({sk}_{y}^{A}, {sk}_{y}^{B})$ : output two helper key shares according to Equation (12), where each share contains a uniformly random masking component (denoted W in Equation (12)) and any required derived vector $\tilde{y}$ .
$PartDec ({sk}_{y}^{\circ}, CT) \to σ_{y}^{\circ}$ : on an aggregated ciphertext $CT = (C_{1}, C_{2})$ , helper $\circ \in {A, B}$ outputs a decryption share $σ_{y}^{\circ}$ as in Equation (15).
$Comb (σ_{y}^{A}, σ_{y}^{B}) \to v$ : combine shares via Equation (5) and apply the rounding/decoding step to recover $v = 〈 \sum_{i} {\tilde{α}}_{i, t} x_{i}, y 〉$ (up to negligible decryption failure under the noise constraint).

Correctness follows from the standard LWE noise bound: under Equation (6), the rounding step in

Comb

succeeds with all but negligible probability, and the output equals the intended inner product over the encoded message space.
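To illustrate how the five algorithms fit together, the following toy sketch implements an LWE-style inner-product scheme with an additively split function key. It is not the paper's KS-IPFE: Equations (11), (12), and (15) are not reproduced here, so the additive split `sk = w + (S y - w)`, the message scaling `SCALE`, the omission of gadget packing, and the tiny dimensions are all assumptions chosen for readability, and the parameters offer no security.

```python
import random

q = 2**32            # modulus, as in the default parameters
n, m, L = 8, 16, 4   # toy dimensions (hypothetical, insecure)
SCALE = 2**16        # hypothetical message scaling so small noise rounds away
rng = random.Random(0)

def small(k):
    return [rng.randint(-1, 1) for _ in range(k)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % q

def matT_vec(M, v):
    """Return M^T v mod q, where M has len(v) rows."""
    return [sum(M[i][j] * v[i] for i in range(len(v))) % q for j in range(len(M[0]))]

# Setup: uniform A (n x m), small secret S (m x L), small error E, B = A S + E.
A = [[rng.randrange(q) for _ in range(m)] for _ in range(n)]
S = [small(L) for _ in range(m)]
E = [small(L) for _ in range(n)]
B = [[(sum(A[i][t] * S[t][j] for t in range(m)) + E[i][j]) % q for j in range(L)]
     for i in range(n)]

def enc(x):
    r = [rng.randint(0, 1) for _ in range(n)]
    e1, e2 = small(m), small(L)
    c1 = [(a + e) % q for a, e in zip(matT_vec(A, r), e1)]
    c2 = [(b + e + SCALE * xi) % q for b, e, xi in zip(matT_vec(B, r), e2, x)]
    return c1, c2

def add_ct(ct_a, ct_b):
    """Server-side aggregation: ciphertexts add componentwise."""
    return tuple([(u + v) % q for u, v in zip(a, b)] for a, b in zip(ct_a, ct_b))

def key_split(y):
    sk = [sum(S[t][j] * y[j] for j in range(L)) % q for t in range(m)]  # S y
    w = [rng.randrange(q) for _ in range(m)]   # uniform mask held by helper A
    return w, [(s - wi) % q for s, wi in zip(sk, w)]

def part_dec(share, c1):
    return (-dot(share, c1)) % q

def comb(sig_a, sig_b, c2, y):
    v = (dot(c2, y) + sig_a + sig_b) % q
    if v >= q // 2:
        v -= q                    # centered lift before rounding
    return round(v / SCALE)       # removes the small accumulated noise
```

Each helper's share is uniformly distributed on its own, and only the sum of the two partial decryptions cancels the mask `r^T A S y`, leaving `SCALE * <x, y>` plus a noise term that is small relative to `SCALE` at these toy sizes.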

Appendix B.2. Confidentiality of KS-IPFE Under One-Helper Leakage

Appendix B.2.1. Security Experiment: ${IND}_{KS - IPFE}^{1 H} (λ)$

A challenger runs $(mpk, msk) \leftarrow Setup (1^{λ})$ and gives $mpk$ to $\mathcal{A}$. The adversary controls $Srv$, may corrupt any subset of clients, and corrupts at most one helper (w.l.o.g. $H_{A}$). $\mathcal{A}$ may adaptively query two oracles. A key-share oracle $O_{share} (y)$ returns the corrupted helper’s share $sk_{y}^{A}$ produced by $KeySplit (msk, y)$. An honest-helper oracle $O_{part} (y, CT)$ returns $σ_{y}^{B} = PartDec (sk_{y}^{B}, CT)$ for any aggregated ciphertext $CT$.

At challenge time, $\mathcal{A}$ outputs two equal-length message families $\{x_{i}^{(0)}\}$ and $\{x_{i}^{(1)}\}$ in $\mathcal{M}$ subject to the standard FE side condition: for every $y$ queried to $O_{share}$ (and any $y$ queried subsequently), the authorized aggregate outputs coincide, i.e., $\langle \sum_{i} \tilde{α}_{i, t} x_{i}^{(0)}, y \rangle = \langle \sum_{i} \tilde{α}_{i, t} x_{i}^{(1)}, y \rangle$ in $Z_{q}$. The challenger samples $b \leftarrow_{R} \{0, 1\}$ and returns $ct_{i} \leftarrow Enc (mpk, x_{i}^{(b)})$ for all $i$ (equivalently, an aggregated ciphertext derived by linear homomorphism). $\mathcal{A}$ continues querying and outputs $b^{'}$. The advantage is $Adv_{\mathcal{A}}^{IND} = |\Pr [b^{'} = b] - \frac{1}{2}|$.
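The side condition in this experiment can be checked mechanically. The sketch below is a minimal illustration in plain Python; the function names (`aggregate`, `inner`, `admissible`) and the example values are hypothetical and are not part of the protocol.

```python
def aggregate(xs, weights, q):
    """Weighted sum of the client vectors, coordinate-wise, mod q."""
    dim = len(xs[0])
    return [sum(w * x[j] for w, x in zip(weights, xs)) % q for j in range(dim)]

def inner(u, v, q):
    """Inner product over Z_q."""
    return sum(a * b for a, b in zip(u, v)) % q

def admissible(xs0, xs1, weights, queried_ys, q):
    """True iff every queried y yields the same authorized aggregate output
    on both challenge families (the FE side condition)."""
    agg0 = aggregate(xs0, weights, q)
    agg1 = aggregate(xs1, weights, q)
    return all(inner(agg0, y, q) == inner(agg1, y, q) for y in queried_ys)

q = 2**32
weights = [1, 1]
family0 = [[1, 2], [3, 4]]      # aggregate (4, 6)
family1 = [[4, 0], [0, 5]]      # aggregate (4, 5)

ok_first_coord = admissible(family0, family1, weights, [[1, 0]], q)
ok_full_basis = admissible(family0, family1, weights, [[1, 0], [0, 1]], q)
```

Here the two families agree on the first coordinate of the aggregate but not the second, so the challenge is admissible when only $y = e_{1}$ is queried and inadmissible once $e_{2}$ is queried as well. When the whole coordinate basis is queried, the condition forces the two weighted aggregates to be equal.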

Appendix B.2.2. Simulation-Based Confidentiality: ${Real}^{1 H}$ vs. ${Ideal}^{1 H}$

The real experiment $Real^{1 H} (λ)$ is identical to the above interaction: $\mathcal{A}$ receives $mpk$, ciphertext blocks, corrupted-helper key shares, and honest-helper response shares for its oracle queries. In the ideal experiment $Ideal^{1 H} (λ)$, a simulator $\mathcal{S}$ is given only the permitted leakage consisting of public metadata (round id, weights, block indices) and the authorized aggregate outputs $\{\langle \sum_{i} \tilde{α}_{i, t} x_{i, t}^{(b)}, y \rangle\}_{y}$ for every function $y$ legitimately revealed by the protocol and by $\mathcal{A}$’s oracle queries, and must generate an indistinguishable transcript (ciphertexts, corrupted-helper key shares, and honest-helper shares). KS-IPFE is SIM-secure under one-helper leakage if for every PPT $\mathcal{A}$ there exists a PPT $\mathcal{S}$ such that $Real^{1 H} (λ) \approx_{c} Ideal^{1 H} (λ)$.

Appendix B.2.3. Theorem (Confidentiality Under LWE)

Assuming the LWE assumption holds at parameters $(n_{l}, q, χ^{'})$, for all PPT adversaries $\mathcal{A}$ we have $Adv_{\mathcal{A}}^{IND} \leq negl (λ)$; moreover, KS-IPFE satisfies $Real^{1 H} (λ) \approx_{c} Ideal^{1 H} (λ)$.

Appendix B.2.4. Proof Sketch: Explicit Simulator and Hybrids (Reduction to LWE)

Simulator construction. Given the permitted leakage and public metadata, $\mathcal{S}$ samples $A \leftarrow_{R} Z_{q}^{n_{l} \times m}$ and $B \leftarrow_{R} Z_{q}^{n_{l} \times D l_{g}}$ uniformly and sets $mpk = (A, B, G_{D})$. For every ciphertext block in the transcript, it outputs $(c_{1}, c_{2}) \leftarrow_{R} Z_{q}^{m} \times Z_{q}^{D l_{g}}$. For every corrupted-helper key-share query on $y$, it samples the masking component (denoted $W$ in Equation (12)) uniformly and outputs $sk_{y}^{A}$ with the same distribution as Equation (12). For every honest-helper share query $(y, CT)$, it samples a share $σ_{y}^{B} \in Z_{q}$ uniformly except that, when the combined output value $v = \langle \sum_{i} \tilde{α}_{i, t} x_{i}, y \rangle$ is among the permitted leakage for that query, it chooses $σ_{y}^{B}$ so that the combine rule in Equation (5) yields $v$ (treating the corrupted helper share as uniform by the lemma below).

Hybrid sequence. Let $H_{0}$ be the real experiment.

In $H_{1}$, replace the LWE-structured matrix $B = A S + E$ in $mpk$ with a uniform matrix $U \leftarrow_{R} Z_{q}^{n_{l} \times D l_{g}}$; by the LWE assumption (Equation (11)), $H_{0} \approx_{c} H_{1}$.

In $H_{2}$, conditioned on $(A, U)$, ciphertext blocks become message-independent. In $c_{2} = U^{⊤} r + e_{2} + \hat{x}$, the term $U^{⊤} r + e_{2}$ is computationally indistinguishable from uniform in $Z_{q}^{D l_{g}}$ for fresh $r, e_{2}$, and adding the fixed offset $\hat{x}$ preserves uniformity; $c_{1} = A^{⊤} r + e_{1}$ is independent of $\hat{x}$. Hence we can replace all ciphertext blocks by uniform samples, matching the simulator.

In $H_{3}$, replace corrupted-helper key shares by simulator-generated shares; this is distribution-preserving because the masking component $W$ in Equation (12) is uniform by construction.

In $H_{4}$, simulate honest-helper response shares: for any aggregated ciphertext $CT = (C_{1}, C_{2})$ and function $y$, the corrupted helper share is $σ_{y}^{A} = \langle \tilde{y}, C_{2} \rangle - \langle W, C_{1} \rangle$ (Equation (15)); since $W$ is uniform, $\langle W, C_{1} \rangle$ is uniform over $Z_{q}$ whenever $C_{1} \neq 0$, so $σ_{y}^{A}$ is uniform from the adversary’s perspective. Therefore the honest helper can choose $σ_{y}^{B}$ uniformly subject only to satisfying the combined output value via Equation (5), which matches the simulator’s choice.

The resulting distribution equals $Ideal^{1 H} (λ)$, establishing $Real^{1 H} (λ) \approx_{c} Ideal^{1 H} (λ)$ and implying negligible IND advantage.

Appendix B.2.5. Lemma (Simulatability of One-Helper Decryption Shares)

Fix any function $y$ and any aggregated ciphertext $CT = (C_{1}, C_{2})$. In the view of an adversary that knows $mpk$, $\tilde{y}$, and at most one helper key share, the corresponding partial share $σ_{y}^{A}$ (or $σ_{y}^{B}$) is statistically close to uniform over $Z_{q}$ conditioned on $(mpk, \tilde{y}, CT)$ except for the negligible event $C_{1} = 0$. This follows directly from Equation (15) because $\langle W, C_{1} \rangle$ is uniform for uniform $W$ and $C_{1} \neq 0$.

If both helpers collude with $Srv$, they can jointly evaluate authorized functionals on ciphertexts and recover per-ciphertext function outputs, which is outside our threat model. Section 3.2 discusses practical mitigations (independent domains, TEEs, and t-of-m generalizations).

Appendix B.3. Verifiability: LHT Commuting Checks (Formal Games and Leakage)

Appendix B.3.1. Verifiability Experiment and Soundness Bound

Define a verifiability experiment $Vfy (λ)$ where a challenger samples a fresh per-round tag key $κ_{t} \leftarrow_{R} Z_{p}^{D}$ and provides public parameters to an adversary $\mathcal{A}$ controlling $Srv$. For a closed block $(t, b)$ with honest client contributions, $\mathcal{A}$ outputs a candidate decrypted aggregate $\hat{u}_{t}^{(b)}$ and a tag value $\tilde{T}_{t}^{(b)}$ (and any auxiliary commitment/proof objects required by the protocol) that are accepted by the verifier. $\mathcal{A}$ wins if $\hat{u}_{t}^{(b)}$ differs from the honest linear aggregate but all checks pass.

If $κ_{t}$ is hidden from $Srv$ during round $t$, then for any PPT $\mathcal{A}$ the probability of winning is at most $1 / p$ per checked block, plus negligible terms from FE correctness and the soundness of any auxiliary proofs/commitments. The bound follows since passing the LHT equality check for a modified value requires $\langle Δ, κ_{t} \rangle \equiv 0 \pmod{p}$ for a nonzero difference vector $Δ$, which holds with probability $1 / p$ over uniform $κ_{t}$.
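To make the commuting check concrete, here is a minimal Python sketch of a linearly homomorphic tag of the form $\langle κ_{t}, u \rangle \bmod p$, which is the form implied by the soundness argument above. The exact tag construction is defined in Section 4.3 and is not reproduced here, so the function names, the modulus $p = 2^{61} - 1$, and the example vectors are all assumptions chosen for illustration.

```python
import secrets

P = 2**61 - 1   # a Mersenne prime, used here only as an illustrative tag modulus

def tag(kappa, u):
    """Tag of a block: <kappa, u> mod P."""
    return sum(k * x for k, x in zip(kappa, u)) % P

def aggregate_blocks(blocks, weights):
    """Weighted integer sum of client blocks (what decryption should return)."""
    dim = len(blocks[0])
    return [sum(w * blk[j] for w, blk in zip(weights, blocks)) for j in range(dim)]

def aggregate_tags(tags, weights):
    """The same weighted sum applied to the tags, mod P."""
    return sum(w * t for w, t in zip(weights, tags)) % P

def verify(kappa, candidate, agg_tag):
    """Commuting check: the tag of the aggregate must equal the aggregate of the tags."""
    return tag(kappa, candidate) == agg_tag

D = 4
kappa = [secrets.randbelow(P) for _ in range(D)]   # fresh per-round tag key
blocks = [[3, -1, 4, 1], [-5, 9, 2, 6]]            # two clients' encoded blocks
weights = [2, 1]

client_tags = [tag(kappa, blk) for blk in blocks]
honest = aggregate_blocks(blocks, weights)         # [1, 7, 10, 8]
agg_tag = aggregate_tags(client_tags, weights)

honest_ok = verify(kappa, honest, agg_tag)

tampered = list(honest)
tampered[0] += 1                                   # difference vector (1, 0, 0, 0)
tampered_ok = verify(kappa, tampered, agg_tag)
```

The honest aggregate always passes because the tag is linear. The tampered aggregate passes only if the inner product of the difference vector with the key is zero modulo $p$, which for this difference means the first key coordinate is zero, an event of probability $1 / p$.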

Appendix B.3.2. Lemma (Tag Hiding Given Unknown κ_t)

Fix any nonzero $z \in Z_{p}^{D}$ and sample $κ \leftarrow_{R} Z_{p}^{D}$. Then $τ = \langle z, κ \rangle \bmod p$ is uniform in $Z_{p}$. Consequently, if $κ_{t}$ is hidden from $Srv$, tag values are pseudorandom and leak no additional information about gradients beyond what is already revealed by authorized FE outputs, except for the degenerate event $z = 0$.
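For small parameters the lemma can be confirmed by exhaustive enumeration. The sketch below assumes that $p$ is prime (the appendix does not state this explicitly, but the $1 / p$ bound relies on nonzero elements of $Z_{p}$ being invertible); the values $p = 5$, $D = 3$, and the vector $z$ are arbitrary illustrative choices.

```python
from collections import Counter
from itertools import product

def tag_distribution(z, p):
    """Count how often each tag value <z, kappa> mod p occurs over all kappa in Z_p^D."""
    return Counter(
        sum(a * b for a, b in zip(z, kappa)) % p
        for kappa in product(range(p), repeat=len(z))
    )

p = 5
nonzero_dist = tag_distribution((2, 0, 3), p)   # nonzero z
zero_dist = tag_distribution((0, 0, 0), p)      # degenerate case z = 0
```

For the nonzero vector, every tag value in $Z_{5}$ is hit by exactly $p^{D-1} = 25$ of the 125 keys, so the tag is exactly uniform. For $z = 0$ all keys map to the tag 0, which is the degenerate event excluded by the lemma.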

If $κ_{t}$ becomes known to $Srv$, each tag is an additional linear measurement $\langle z, κ_{t} \rangle \bmod p$; while this does not directly expose norms, it is extra information. This motivates treating $κ_{t}$ as ephemeral round-scoped secret material and explicitly managing its lifecycle and access controls (Section 4.3).

Appendix C. Helper-Side API and State (Implementation-Facing)

Each helper $H_{\circ}$ for $\circ \in \{A, B\}$ is a stateless (or minimally stateful) service that returns threshold decryption shares for aggregated ciphertext blocks only. This explicit API clarifies the helper architecture and the trust boundary.

Helper state. Each helper stores long-lived split keys for the block-coordinate basis vectors, namely $\{sk_{e_{j}}^{\circ}\}_{j \in [D]}$, which are sufficient to produce decryption shares for each coordinate of a closed block. If verifiability is enabled, the helper additionally stores a small set of round-tagged split keys for auxiliary functions (e.g., $sk_{\bar{κ}_{t}}^{\circ}$ for LHT verification); these auxiliary shares are scoped to round $t$ and are deleted after the round completes. Helpers do not maintain per-client ciphertext buffers and do not track per-client nonces; nonce validation is performed at $Srv$.

Helper input. For each closed block $(t, b)$, the server sends the round and block identifiers $(t, b)$, the aggregated ciphertext block $CT_{t}^{(b)} = (C_{1, t}^{(b)}, C_{2, t}^{(b)})$, and a list of requested function identifiers $F$ (typically $\{e_{1}, \dots, e_{D}\}$ and optionally $\bar{κ}_{t}$). The request also includes an application-level context string for domain separation, binding the helper response to this protocol instance and round.

Helper output and server combine rule. The helper returns a set of decryption shares $\{σ_{f}^{\circ} (t, b)\}_{f \in F}$, where each share is computed as in Equation (15) using the stored split key $sk_{f}^{\circ}$. For each requested function $f$, $Srv$ combines the two helper shares (e.g., $σ_{f}^{A} (t, b) \oplus σ_{f}^{B} (t, b)$) according to Equation (5) to reconstruct the corresponding aggregate value for block $(t, b)$.

Minimality and privacy. Helpers only ever process aggregated ciphertext blocks (never per-client ciphertexts) and return only additive response shares. Under the threat model in which $Srv$ may collude with at most one helper, each helper’s view is individually simulatable, as formalized in Appendix B.
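The helper interface described in this appendix could be sketched as follows. This is a hypothetical Python rendering, not the authors' implementation: the class and field names (`SplitKey`, `HelperRequest`, `Helper`, `combine`) are invented for illustration, the share formula $\langle \text{coeffs}, C_{2} \rangle - \langle \text{mask}, C_{1} \rangle$ is an assumed stand-in for Equation (15), and the combine rule is taken to be addition modulo $q$ as an assumed stand-in for Equation (5).

```python
from dataclasses import dataclass
from typing import Optional

Q = 2**32

def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % Q

@dataclass(frozen=True)
class SplitKey:
    c2_coeffs: list                   # applied to C2 (all zeros for one helper, by assumption)
    c1_mask: list                     # masking component applied to C1
    round_id: Optional[int] = None    # None = long-lived basis key, else round-scoped

@dataclass(frozen=True)
class HelperRequest:
    round_id: int
    block_id: int
    C1: list
    C2: list
    functions: list                   # requested function identifiers
    context: str                      # domain-separation string

class Helper:
    def __init__(self, context):
        self.context = context
        self.keys = {}                # function id -> SplitKey; no per-client state

    def install(self, fid, key):
        self.keys[fid] = key

    def handle(self, req):
        """Return one decryption share per requested function."""
        if req.context != self.context:
            raise ValueError("context mismatch")
        shares = {}
        for fid in req.functions:
            key = self.keys[fid]
            if key.round_id is not None and key.round_id != req.round_id:
                raise KeyError(fid)
            shares[fid] = (dot(key.c2_coeffs, req.C2) - dot(key.c1_mask, req.C1)) % Q
        return shares

    def end_round(self, t):
        """Delete keys scoped to round t; long-lived keys are kept."""
        self.keys = {f: k for f, k in self.keys.items() if k.round_id != t}

def combine(shares_a, shares_b):
    return {f: (shares_a[f] + shares_b[f]) % Q for f in shares_a}

# Example: full key (y, z) with z split as W + K between the two helpers.
y, z, W = [1, 2], [7, 11, 13], [5, 6, 7]
K = [(a - b) % Q for a, b in zip(z, W)]
helper_a, helper_b = Helper("demo/v1"), Helper("demo/v1")
helper_a.install("f", SplitKey(y, W))
helper_b.install("f", SplitKey([0, 0], K))
helper_a.install("kappa_bar", SplitKey(y, W, round_id=7))

req = HelperRequest(round_id=7, block_id=0, C1=[10, 20, 30], C2=[100, 200],
                    functions=["f"], context="demo/v1")
shares_a, shares_b = helper_a.handle(req), helper_b.handle(req)
combined = combine(shares_a, shares_b)
helper_a.end_round(7)
```

The example only exercises the plumbing: the combined value equals $\langle y, C_{2} \rangle - \langle z, C_{1} \rangle \bmod q$, while each helper sees one additive part of $z$. The round-scoped key is dropped by `end_round`, mirroring the deletion of auxiliary keys after the round completes.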

Appendix D. Round-Level Message-Flow

We present one FL round t in pseudocode in Algorithm A1.

Algorithm A1 FlowAgg-FE: Round t message flow

Require: Global model $w_{t}$; active set $A_{t}$; sketch dim $k$; block size $D$; threshold $θ$; scale $Δ$; quantizer $Q_{b}$
Ensure: Updated model $w_{t + 1}$
// Server broadcast
$Srv$ samples or derives sketch spec $Φ_{t}$ (or seed $seed_{t}$) and broadcasts $(t, Φ_{t} \text{ or } seed_{t}, Q_{b}, Δ, D, B_{t} = \lceil k / D \rceil, θ)$.
// Key materialization (round-scoped)
Key authority sends to each helper $H_{\circ}$: round-tagged split keys $\{sk_{e_{j}}^{\circ}\}_{j \in [D]}$ (and $sk_{\bar{κ}_{t}}^{\circ}$ if verifiability enabled).
Key authority sends to each participating client: ephemeral tag key $κ_{t}$ (if enabled).
// Client-side (for each $i \in A_{t}$, in parallel)
for all $i \in A_{t}$ do
  Compute clipped update $v_{i, t}$.
  Compute sketch and quantize: $u_{i, t} \leftarrow Q_{b} (Φ_{t} v_{i, t}) \in R^{k}$.
  Encode/scale: $\hat{u}_{i, t} \leftarrow \lfloor Δ u_{i, t} \rceil \in Z^{k}$.
  Partition into blocks $\hat{u}_{i, t}^{(b)} \in Z^{D}$ for $b \in [B_{t}]$.
  for $b = 1$ to $B_{t}$ do
    Set nonce $n_{i, t}^{(b)}$ (monotone within round).
    Encrypt block: $ct_{i, t}^{(b)} \leftarrow Enc (mpk, \hat{u}_{i, t}^{(b)})$.
    Compute integrity metadata $meta_{i, t}^{(b)}$ (e.g., LHT tag/commitment/proof).
    Send to server: $(t, b, n_{i, t}^{(b)}, ct_{i, t}^{(b)}, meta_{i, t}^{(b)})$.
  end for
end for
// Server-side streaming aggregation and block closure
for $b = 1$ to $B_{t}$ do
  Initialize receive-set $I_{t}^{(b)} \leftarrow \emptyset$, aggregate ciphertext $CT_{t}^{(b)} \leftarrow 0$.
  while $| I_{t}^{(b)} | < θ | A_{t} |$ do
    Upon receiving a valid tuple from client $i$ for block $b$ (nonce ok), set $I_{t}^{(b)} \leftarrow I_{t}^{(b)} \cup \{i\}$.
    Update ciphertext aggregate: $CT_{t}^{(b)} \leftarrow CT_{t}^{(b)} + \tilde{α}_{i, t} \cdot ct_{i, t}^{(b)}$.
    Aggregate metadata analogously (if enabled).
  end while
  // Threshold decryption requests
  $Srv$ sends $(t, b, CT_{t}^{(b)}, F)$ to $H_{A}$ and $H_{B}$, where $F = \{e_{1}, \dots, e_{D}\}$ and optionally $\bar{κ}_{t}$.
  Each helper returns shares $\{σ_{f}^{\circ} (t, b)\}_{f \in F}$.
  Combine shares to recover decrypted block aggregate $\hat{U}_{t}^{(b)} = \sum_{i \in I_{t}^{(b)}} \tilde{α}_{i, t} \hat{u}_{i, t}^{(b)}$.
  Verify commuting checks for block $b$ using decrypted values and aggregated metadata (if enabled).
end for
Assemble $\hat{U}_{t} \in Z^{k}$ from $\{\hat{U}_{t}^{(b)}\}_{b = 1}^{B_{t}}$ and decode update.
Update model: $w_{t + 1} \leftarrow Update (w_{t}, \hat{U}_{t})$.
Delete round-scoped keys and ephemeral state (e.g., $κ_{t}$, per-round split keys).
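The server-side closure loop of Algorithm A1 can be sketched in a few lines of Python. To keep the example self-contained, plain integer vectors stand in for ciphertext blocks (the loop only relies on ciphertexts being additively homomorphic), nonce validation is assumed to have happened before a tuple reaches this function, and the names `close_block` and `arrivals` are hypothetical.

```python
def close_block(arrivals, weights, n_active, theta, block_len):
    """Accumulate weighted blocks in arrival order until the receive-set
    reaches the coverage threshold theta * n_active, then close the block.

    arrivals: iterable of (client_id, block) pairs, already nonce-validated.
    Returns (receive_set, aggregate), or None if the threshold is never met.
    """
    received = set()
    aggregate = [0] * block_len
    for client_id, block in arrivals:
        if client_id in received:
            continue                      # ignore duplicate submissions
        received.add(client_id)
        w = weights[client_id]
        aggregate = [a + w * c for a, c in zip(aggregate, block)]
        if len(received) >= theta * n_active:
            return received, aggregate    # block closes; later arrivals are stragglers
    return None

weights = {0: 1, 1: 1, 2: 2, 3: 1}
arrivals = [
    (2, [1, 1]),
    (0, [2, 0]),
    (2, [9, 9]),    # duplicate from client 2, skipped
    (3, [0, 5]),
    (1, [7, 7]),    # arrives after the block has closed
]
closed = close_block(arrivals, weights, n_active=4, theta=0.7, block_len=2)
```

With four active clients and $θ = 0.7$, the block closes as soon as three distinct clients have contributed, so client 1 is treated as a straggler and excluded from the receive-set for this block.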

References

McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A.y. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) 2017, Fort Lauderdale, FL, USA, 20–22 April 2017.
Konečný, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated Learning: Strategies for Improving Communication Efficiency. arXiv 2016, arXiv:1610.05492.
Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and Open Problems in Federated Learning. In Foundations and Trends in Machine Learning; Now Publishers Inc.: Hanover, MA, USA, 2021.
Bonawitz, K.; Eichner, H.; Grieskamp, W.; Huba, D.; Ingerman, A.; Ivanov, V.; Kiddon, C.; Konečný, J.; Mazzocchi, S.; McMahan, H.B.; et al. Towards Federated Learning at Scale: System Design. In Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA, 31 March–2 April 2019.
Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017.
Gentry, C. Fully Homomorphic Encryption Using Ideal Lattices. In Proceedings of the STOC ’09: Symposium on Theory of Computing, Bethesda, MD, USA, 31 May–2 June 2009.
Brakerski, Z.; Gentry, C.; Vaikuntanathan, V. (Leveled) Fully Homomorphic Encryption without Bootstrapping. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, Cambridge, MA, USA, 8–10 January 2012.
Cheon, J.H.; Kim, A.; Kim, M.; Song, Y. Homomorphic Encryption for Arithmetic of Approximate Numbers. In Advances in Cryptology—ASIACRYPT 2017, Proceedings of the 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, 3–7 December 2017; Springer: Cham, Switzerland, 2017.
Alistarh, D.; Grubic, D.; Li, J.; Tomioka, R.; Vojnovic, M. QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
Seide, F.; Fu, H.; Droppo, J.; Li, G.; Yu, D. 1-bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs. In Proceedings of the Interspeech 2014, Singapore, 14–18 September 2014.
Aji, A.F.; Heafield, K. Sparse Communication for Distributed Gradient Descent. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017.
Lin, Y.; Han, S.; Mao, H.; Wang, Y.; Dally, W.J. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
Bernstein, J.; Wang, Y.X.; Azizzadenesheli, K.; Anandkumar, A. SignSGD with Majority Vote is Communication Efficient and Fault Tolerant. arXiv 2018, arXiv:1810.05291.
Karimireddy, S.P.; Rebjock, Q.; Stich, S.U.; Jaggi, M. Error Feedback Fixes SignSGD and other Gradient Compression Schemes. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019.
Achlioptas, D. Database-friendly Random Projections: Johnson–Lindenstrauss with Binary Coins. J. Comput. Syst. Sci. 2003, 66, 671–687.
Groth, J.; Sahai, A. Efficient Non-interactive Proof Systems for Bilinear Groups. In Advances in Cryptology—EUROCRYPT 2008, Proceedings of the 27th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Istanbul, Turkey, 13–17 April 2008; Springer: Berlin/Heidelberg, Germany, 2008.
Bünz, B.; Bootle, J.; Boneh, D.; Poelstra, A.; Wuille, P.; Maxwell, G. Bulletproofs: Short Proofs for Confidential Transactions and More. In Proceedings of the 2018 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 20–24 May 2018.
Boneh, D.; Sahai, A.; Waters, B. Functional Encryption: Definitions and Challenges. In Theory of Cryptography, Proceedings of the 8th Theory of Cryptography Conference, Providence, RI, USA, 28–30 March 2011; Springer: Berlin/Heidelberg, Germany, 2011.
Regev, O. On Lattices, Learning with Errors, Random Linear Codes, and Cryptography. J. ACM 2009, 56, 34.
Agrawal, S.; Freeman, D.M.; Vaikuntanathan, V. Functional Encryption for Inner Product Predicates from Learning with Errors. In Advances in Cryptology—ASIACRYPT 2011, Proceedings of the 17th International Conference on the Theory and Application of Cryptology and Information Security, Seoul, Republic of Korea, 4–8 December 2011; Springer: Berlin/Heidelberg, Germany, 2011.
Abdalla, M.; Bourse, F.; Caro, A.D.; Pointcheval, D. Simple Functional Encryption Schemes for Inner Products. In Public-Key Cryptography—PKC 2015, Proceedings of the 18th IACR International Conference on Practice and Theory in Public-Key Cryptography, Gaithersburg, MD, USA, 30 March–1 April 2015; Springer: Berlin/Heidelberg, Germany, 2015.
Freeman, D.M. Improved Security for Linearly Homomorphic Signatures: A Generic Framework. In Public Key Cryptography—PKC 2012, Proceedings of the 15th International Conference on Practice and Theory in Public Key Cryptography, Darmstadt, Germany, 21–23 May 2012; Springer: Berlin/Heidelberg, Germany, 2012.
Catalano, D.; Fiore, D. Practical Homomorphic MACs for Arithmetic Circuits. In Advances in Cryptology—EUROCRYPT 2013, Proceedings of the 32nd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Athens, Greece, 26–30 May 2013; Springer: Berlin/Heidelberg, Germany, 2013.
Katz, J.; Sahai, A.; Waters, B. Predicate Encryption Supporting Disjunctions, Polynomial Equations, and Inner Products. In Advances in Cryptology—EUROCRYPT 2008, Proceedings of the 27th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Istanbul, Turkey, 13–17 April 2008; Springer: Berlin/Heidelberg, Germany, 2008.
Han, K.; Lee, W.K.; Karmakar, A.; Yi, M.K.; Hwang, S.O. QuripfeNet: Quantum-Resistant IPFE-Based Neural Network. IEEE Trans. Emerg. Top. Comput. 2024, 13, 640–653.
Dowerah, U.; Dutta, S.; Mitrokotsa, A.; Mukherjee, S.; Pal, T. Unbounded predicate inner product functional encryption from pairings. J. Cryptol. 2023, 36, 29.
Pan, Z.; Ying, Z.; Wang, Y.; Zhang, C.; Zhang, W.; Zhou, W.; Zhu, L. Feature-Based Machine Unlearning for Vertical Federated Learning in IoT Networks. IEEE Trans. Mob. Comput. 2025, 24, 5031–5044.
Pan, Z.; Ying, Z.; Wang, Y.; Wang, Y.; Zhang, Z.; Zhou, W.; Zhu, L. Robust Watermarking for Federated Diffusion Models with Unlearning-Enhanced Redundancy. IEEE Trans. Dependable Secur. Comput. 2025, 1–15.
Pan, Z.; Ying, Z.; Wang, Y.; Zhang, C.; Li, C.; Zhu, L. One-shot backdoor removal for federated learning. IEEE Internet Things J. 2024, 11, 37718–37730.
Fereidooni, H.; Marchal, S.; Miettinen, M.; Mirhoseini, A.; Möllering, H.; Nguyen, T.D.; Rieger, P.; Sadeghi, A.R.; Schneider, T.; Yalame, H.; et al. SAFELearn: Secure aggregation for private federated learning. In Proceedings of the 2021 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 27 May 2021; pp. 56–62.
Zhao, L.; Jiang, J.; Feng, B.; Wang, Q.; Shen, C.; Li, Q. Sear: Secure and efficient aggregation for byzantine-robust federated learning. IEEE Trans. Dependable Secur. Comput. 2021, 19, 3329–3342.
Pan, Z.; Zeng, J.; Cheng, R.; Yan, H.; Li, J. PNAS: A privacy preserving framework for neural architecture search services. Inf. Sci. 2021, 573, 370–381.
Gennaro, R.; Jarecki, S.; Krawczyk, H.; Rabin, T. Secure distributed key generation for discrete-log based cryptosystems. In Proceedings of the 17th International Conference on the Theory and Applications of Cryptographic Techniques, Prague, Czech Republic, 2–6 May 1999; pp. 295–310.
Mohri, M.; Sivek, G.; Suresh, A.T. Agnostic federated learning. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 4615–4625.
Pan, Z.; Li, C.; Yu, F.; Wang, S.; Wang, H.; Tang, X.; Zhao, J. Fedlf: Layer-wise fair federated learning. In Proceedings of the AAAI’24: AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 14527–14535.
Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009.
Caldas, S.; Wu, P.; Li, T.; Konečný, J.; McMahan, H.B.; Smith, V.; Talwalkar, A. LEAF: A Benchmark for Federated Settings. arXiv 2018, arXiv:1812.01097.

Figure 1. Empirical histogram of per-client dataset sizes under Dirichlet partitioning. The x-axis denotes samples per client. Darker bars correspond to the CIFAR-10 configuration, lighter bars to FEMNIST. Both exhibit long tails, with a majority of clients holding fewer than 150 local samples and a minority acting as heavy contributors.

Figure 2. Empirical CDF of simulated client latencies following a Pareto distribution with shape $1.2$ and scale $1.0$ s. Approximately $60\%$ of clients respond within $1.5$ s, while roughly $10\%$ exceed 3 s, modeling heavy-tailed straggler behavior that PaS-Stream must tolerate.

Figure 3. Convergence curves for CIFAR-10 and FEMNIST. The near-overlap between Plaintext-FedAvg and encrypted baselines confirms correct baselining; PaS-Stream tracks plaintext closely with a small final gap under 4-bit quantization.

Figure 4. Per-client uplink on CIFAR-10 under four methods. From left to right, bars correspond to Encrypted SecAgg (light gray), KS-IPFE (medium gray), PaS-Stream with 8-bit quantization (dark gray), and PaS-Stream with 4-bit quantization (black). PaS-Stream achieves up to $3.4 \times$ lower uplink than Encrypted SecAgg while preserving accuracy.

Figure 5. Server-side CPU time per round on CIFAR-10. From left to right, bars correspond to Encrypted SecAgg (light gray), KS-IPFE (medium gray), PaS-Stream with 8-bit quantization (dark gray), and PaS-Stream with 4-bit quantization (black). KS-IPFE already yields a $1.61 \times$ speedup; PaS-Stream further improves this to $1.77 \times$ with 4-bit quantization.

Figure 6. Accuracy vs. uplink reduction on CIFAR-10 for KS-IPFE and PaS-Stream configurations in Table 6. Each marker corresponds to a specific $(k, b)$; both KS-IPFE and PaS-Stream lie close to the Pareto frontier where moderate compression (e.g., $(8192, 4)$) achieves large bandwidth reduction with minimal accuracy loss.

Figure 7. CIFAR-10 accuracy under random client dropout. Solid line: Encrypted SecAgg; dashed line: PaS-Stream with $(k, b) = (8192, 8)$. Both remain within $0.3\%$ of the no-dropout baseline up to $ρ = 30\%$, with PaS-Stream tracking SecAgg closely at all dropout levels.

Figure 8. Normalized throughput under heavy-tailed client latencies (Pareto shape $1.2$). For each method, the lighter bar shows throughput without the verifiability layer and the darker bar with tags and clipping proofs enabled. The verifiability layer reduces throughput by at most $4\%$, while PaS-Stream retains a $1.2$–$1.3 \times$ advantage over KS-IPFE in all cases.

Table 1. Federated task configuration. “Non-IID skew” refers to the Dirichlet concentration parameter $α$ used to draw client label distributions; smaller $α$ implies stronger heterogeneity. $E$ is the number of local epochs per round.

Task	Model	n	$| A_{t} |$	T	E	Non-IID Skew $α$
CIFAR-10	ResNet-18	1000	100	100	1	0.5
FEMNIST	CNN (small)	3400	256	120	1	0.3

Table 2. Secure aggregation baselines (protocol configuration).

Protocol	Trust/Collusion Assumption	Dropout Handling
Encrypted SecAgg	Server does not learn individual updates	Designed for client dropouts
SecAgg+	Same goal as SecAgg; improved practicality	Designed for client dropouts
BatchCrypt	Secure aggregation via batching/crypto	Depends on protocol instantiation

Table 3. Cryptographic and streaming parameters used by KS-IPFE and PaS-Stream. Latency parameters describe the synthetic client latency model used to drive rate-adaptive behavior.

Parameter	Value	Description
q	$2^{32}$	LWE modulus
$n_{l}$	1024	LWE dimension
$l_{g}$	16	gadget digits per coordinate
D	64	block length (plaintext coordinates)
$Δ$	$2^{16}$	scaling factor for Equation (3)
k	8192	sketch dimension (PaS-Stream)
b	8 or 4	quantization bit-width
$θ$	$0.7$	minimum block coverage fraction
Latency shape	$1.2$	Pareto shape for client latency
Latency scale	$1.0$ s	Pareto scale (minimum latency)

Table 4. Final test accuracy (%) after 100 rounds (CIFAR-10) and 120 rounds (FEMNIST); mean over 3 seeds. $Δ$ Acc is the absolute difference to Plaintext-FedAvg.

Method	CIFAR-10	$Δ$ Acc (CIFAR)	FEMNIST	$Δ$ Acc (FEMNIST)
Plaintext-FedAvg	91.7	0.0	86.9	0.0
Encrypted SecAgg	91.7	0.0	86.9	0.0
KS-IPFE (full-precision)	91.6	−0.1	86.8	−0.1
PaS-Stream ( $b = 8$ )	91.5	−0.2	86.8	−0.1
PaS-Stream ( $b = 4$ )	91.4	−0.3	86.7	−0.2

Table 5. Per-round efficiency on CIFAR-10 (mean over rounds). Uplink is measured per client per round; CPU time is the total server-side time (aggregator + helpers) per round. The rightmost columns report multiplicative reductions relative to Encrypted SecAgg.

Method	Uplink (MB)	CPU (s/Round)	Uplink Reduction	CPU Reduction
Encrypted SecAgg	2.80	62.0	1.00×	1.00×
KS-IPFE (full-precision)	1.47	38.5	1.90×	1.61×
PaS-Stream ( $b = 8$ )	1.12	36.0	2.50×	1.72×
PaS-Stream ( $b = 4$ )	0.82	35.1	3.41×	1.77×

Table 6. CIFAR-10 ablation over sketch dimension k and quantization depth b. Accuracy is final test accuracy (%) after 100 rounds; uplink is per-client per-round communication; CPU is server-side time per round.

Method	$(k, b)$	Accuracy (%)	Uplink (MB)	CPU (s/Round)
KS-IPFE (full-precision)	–	91.6	1.47	38.5
PaS-Stream	$(16384, 8)$	91.6	1.42	37.9
PaS-Stream	$(8192, 8)$	91.5	1.12	36.0
PaS-Stream	$(8192, 4)$	91.4	0.82	35.1
PaS-Stream	$(4096, 8)$	91.2	0.78	34.6
PaS-Stream	$(4096, 4)$	91.0	0.60	34.2

Table 7. Analytical overhead beyond uplink and server CPU (using Table 3).

Quantity	Scaling	Example at Default Params
Split key share size (per function, per helper)	$m + D l_{g}$ words	3072 words $\approx 12$ KB
Helper key storage (basis $\{e_{j}\}$)	$D (m + D l_{g})$ words	$64 \times 12$ KB $\approx 0.77$ MB
Helper ephemeral key refresh (e.g., ${\bar{κ}}_{t}$ )	$m + D l_{g}$ words/round	≈12 KB per helper
Helper compute (per block)	$(D + 1) (m + D l_{g})$ mul-adds	≈ $2.0 \times 10^{5}$
Server nonce state	$O (| A_{t} |)$ counters	negligible vs. ciphertext buffers
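
The example column of Table 7 can be reproduced with a few lines of arithmetic. In the sketch below, $D$ and $l_{g}$ come from Table 3, while $m = 2048$ is not stated in this section and is inferred from the 3072-word share size; the 4-byte word size is likewise an assumption based on $q = 2^{32}$.

```python
D, l_g = 64, 16                  # block length and gadget digits (Table 3)
share_words_reported = 3072      # split key share size reported in Table 7
m = share_words_reported - D * l_g   # inferred, not stated: m = 2048
word_bytes = 4                   # assumed: one Z_q element with q = 2**32

share_words = m + D * l_g                    # per function, per helper
share_kb = share_words * word_bytes / 1024   # size of one split key share
basis_storage_kb = D * share_kb              # keys for e_1, ..., e_D
helper_mul_adds = (D + 1) * share_words      # D basis functions plus one auxiliary
```

Under these assumptions one share is 12 KB, the basis keys occupy 768 KB per helper (the table's roughly 0.77 MB), and a helper performs 199,680 multiply-adds per block, matching the table's approximate $2.0 \times 10^{5}$.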

Table 8. Robustness to random client dropout on CIFAR-10. Accuracy is final test accuracy (%); throughput is normalized relative to Encrypted SecAgg at $ρ = 0\%$.

Method	Dropout $ρ$	Accuracy (%)	Throughput	Throughput Gain
Encrypted SecAgg	0%	91.7	1.00	1.00×
Encrypted SecAgg	10%	91.5	0.93	0.93×
Encrypted SecAgg	20%	91.3	0.88	0.88×
Encrypted SecAgg	30%	90.9	0.81	0.81×
PaS-Stream (8b)	0%	91.5	1.28	1.28×
PaS-Stream (8b)	10%	91.4	1.31	1.31×
PaS-Stream (8b)	20%	91.2	1.33	1.33×
PaS-Stream (8b)	30%	90.9	1.35	1.35×

Table 9. Verifiability overhead on CIFAR-10 for KS-IPFE and PaS-Stream. CPU overhead is measured as the difference between total server-side CPU time with and without tags/proofs. Communication overhead is the additional per-client uplink.

Method	CPU Overhead (s/Round)	CPU Overhead (%)	Comm. Overhead (% of Uplink)
KS-IPFE (full-precision)	1.6	4.2%	2.1%
PaS-Stream ( $k = 8192, 8$ b)	1.5	4.3%	2.7%
PaS-Stream ( $k = 8192, 4$ b)	1.4	4.1%	2.9%
