Article

A Novel Verifiable Functional Encryption Framework for Secure and Communication-Efficient Distributed Gradient Transmission Management

1 School of Management, Guangdong University of Science and Technology, Dongguan 523070, China
2 School of Computer Science and Engineering, Guangzhou University, Guangzhou 510006, China
* Authors to whom correspondence should be addressed.
Electronics 2026, 15(5), 928; https://doi.org/10.3390/electronics15050928
Submission received: 18 November 2025 / Revised: 15 December 2025 / Accepted: 16 December 2025 / Published: 25 February 2026

Abstract

Secure and bandwidth-conscious transmission of model updates is a central bottleneck in distributed machine learning. Existing secure aggregation and homomorphic encryption pipelines either reveal more than the task requires or incur prohibitive computation and communication costs. We introduce a verifiable functional encryption (VFE) framework that releases only the intended linear functions of client gradients while providing end-to-end integrity and privacy guarantees under standard lattice assumptions. Our instantiation, FlowAgg-FE, combines two novel components. First, KS-IPFE, a key-splittable inner-product FE scheme, supports per-round weighted aggregation, vector packing, and on-the-fly function changes without client re-encryption; function keys are distributed across two non-colluding helpers, eliminating a single point of trust and enabling lightweight, homomorphically verifiable tags on decrypted outputs. Second, PaS-Stream is a rate-adaptive encryption-and-compression pipeline that couples sketch-based gradient compression with batched FE ciphertext streaming, ensuring unbiased aggregation in the presence of stragglers and dropouts. We further bind client-side clipping to zero-knowledge range proofs and offer an optional differentially private release layer that composes with FE to yield (ε, δ)-privacy. A prototype based on LWE demonstrates practicality across cross-device and cross-silo training: client uplink is reduced by 1.9–3.4× and server CPU time by 1.6× versus state-of-practice encrypted secure aggregation, with accuracy within 0.3% of plaintext baselines and correctness preserved under up to 30% client dropout. These results show that verifiable FE can make secure, communication-efficient gradient transmission viable, in line with the Special Issue's theme of security and privacy in distributed machine learning.

1. Introduction

Distributed machine learning, and federated learning (FL) in particular, has emerged as a central paradigm for training models over data that remain on edge devices or organizational silos [1,2,3]. In such settings, each client $C_i$ computes a local update or gradient vector $v_{i,t}$ on its private data in round $t$, and an untrusted aggregator seeks only a global statistic such as the weighted sum
$$G_t = \sum_{i \in A_t} \alpha_{i,t} \, v_{i,t},$$
where $A_t$ is the (possibly random) set of participating clients and $\alpha_{i,t}$ are public aggregation weights. This pattern underlies cross-device FL in large-scale deployments [1,4], as well as cross-silo collaborations among institutions that cannot share raw data.
However, secure and communication-efficient transmission of $v_{i,t}$ remains a key bottleneck relevant to the Special Issue's theme of security and privacy in distributed machine learning. On the one hand, secure aggregation protocols ensure that the aggregator learns only $G_t$ (or a related sum) and not individual $v_{i,t}$ [5]. On the other hand, the bandwidth cost of sending dense, high-dimensional updates—together with cryptographic noise and integrity artifacts—limits the scalability of training, especially for on-device settings with constrained uplink [2,3]. Homomorphic encryption (HE) can support rich computation over encrypted updates [6,7,8], but generic HE-based pipelines incur substantial ciphertext expansion, heavy computation, and often expose the aggregation circuit and its structure to the server.
Existing systems therefore face a tension between privacy, functionality, and efficiency. Secure aggregation based on pairwise masks or additively homomorphic cryptosystems reveals the exact aggregate $G_t$ to the server and, moreover, provides no function privacy: the function being computed is typically fixed (e.g., coordinate-wise sum) and known to all parties [5]. If the task owner wishes to change the function (for example, to aggregate only a subset of coordinates, or to work in a sketched space), new protocol instances or key material are required. Meanwhile, communication-efficient training methods—such as gradient quantization, sparsification, and sketching [9,10,11,12,13,14,15]—primarily address bandwidth and do not by themselves guarantee cryptographic privacy of the compressed updates. Finally, integrity mechanisms that protect against Byzantine clients often rely on heavyweight zero-knowledge proofs or verifiable computation that do not commute with the linear aggregation structure of FL [16,17].
Functional encryption (FE) offers an appealing abstraction: decryption keys reveal only the output of a specified function $f$ on ciphertexts, while everything else about the underlying plaintexts remains hidden [18]. Inner-product FE (IPFE) schemes from LWE [19,20,21] instantiate this for linear functions, enabling one to obtain $\langle x, y\rangle$ from an encryption of $x$ and a key for $y$, and nothing else. These constructions align naturally with the linear statistics used in FL, but existing FE work has not directly addressed the system-level challenges of high-dimensional gradient transmission, client dropouts, and streaming under straggler-heavy networks. Moreover, most FE designs assume a single, fully trusted key authority and do not provide practical mechanisms to split decryption capabilities across non-colluding helpers in a way that supports verifiable streaming aggregation.

1.1. Challenges

This work is motivated by the following three intertwined challenges:
(1) Function-private aggregation of high-dimensional updates. The server should learn only the quantities that are strictly necessary for model updates—such as selected coordinates of $G_t$ or its image under a sketching operator—and nothing else about individual $v_{i,t}$. Achieving this with FE requires handling vectors of dimension $d$ in the millions, while maintaining practical key and ciphertext sizes [20,21].
(2) Communication and computation efficiency. Any secure transmission scheme must be competitive in bandwidth and latency with the optimized secure aggregation pipelines deployed in practice [4,5]. This calls for integrating unbiased quantization and sketching [2,9,12,15] directly into the FE layer, preserving linearity so that the estimator of $G_t$ remains unbiased after decryption, while keeping ciphertext expansion modest.
(3) Verifiable correctness under partial trust. In realistic deployments, a single aggregator or helper may be compromised. We therefore seek a design where decryption power is split across two non-colluding helpers, and where the server can verify that the decrypted aggregates genuinely reflect the sum of honestly contributed (and properly clipped) updates, without learning any intermediate plaintext. Existing linearly homomorphic authentication and MAC schemes [22,23] provide building blocks, but they need to commute with the FE aggregation path and operate efficiently at gradient scale.

1.2. Our Approach and Contributions

We propose FlowAgg-FE, a verifiable functional encryption framework for secure and communication-efficient gradient transmission in distributed machine learning. At its core are the following two novel algorithmic components:
I. KS-IPFE. A key-splittable, LWE-based inner-product FE scheme that supports blockwise linear aggregation over high-dimensional encoded gradients, with 2-of-2 threshold decryption across two non-colluding helpers.
II. PaS-Stream. A rate-adaptive, streaming transmission pipeline that combines Johnson–Lindenstrauss sketching, unbiased quantization, and blockwise FE encryption to enable straggler-tolerant, bandwidth-efficient aggregation.
Together with commuting linearly homomorphic tags and lightweight clipping proofs, these components realize an end-to-end pipeline in which the server learns only the intended linear image of client updates, and can verify aggregate integrity, while clients enjoy reduced uplink. Concretely, our contributions are as follows:
  • Key-splittable inner-product FE for FL gradients. We design KS-IPFE, a dual-style LWE construction with gadget packing and explicit key splitting across two helpers. Each function key for a vector $y$ is split into two individually simulatable shares; only when combined do they reveal the true inner product with the aggregated ciphertext. This enables thresholded, function-private decryption of $G_t$ and on-the-fly function changes (e.g., different coordinate subsets or sketch spaces) without client-side re-encryption [18,20,21].
  • PaS-Stream: Unbiased, streaming compression integrated with FE. We introduce PaS-Stream, which applies JL-style sketching and unbiased quantization [2,9,15] to clipped client updates, partitions the resulting compressed vectors into blocks, and encrypts each block with KS-IPFE. This preserves linearity and unbiasedness through the encoding and FE layers, yields a rate-adaptive stream that tolerates stragglers and dropouts, and lowers per-round uplink by up to 3.4× compared to an optimized encrypted secure aggregation baseline.
  • Commuting verifiability via linearly homomorphic tags. We co-design linearly homomorphic tags [22,23] and blockwise clipping proofs that commute with KS-IPFE aggregation. Tags are derived from encoded blocks and cross-checked with FE decryptions under a hidden selector, ensuring that the released aggregates match the sum of properly clipped client updates, even when some clients or a single helper are malicious. This yields a lightweight, scalable integrity mechanism compatible with large-scale distributed training [4].
  • Implementation and empirical evaluation. We implement FlowAgg-FE with an LWE backend and efficient vector packing, and evaluate it on the CIFAR-10 and FEMNIST tasks [1,2]. Our results show that FE-based transmission can match plaintext and secure aggregation accuracy within 0.3% absolute, while reducing per-client uplink by 1.9–3.4× and cutting server-side CPU time by up to 1.77× under realistic participation and straggler models.
Overall, FlowAgg-FE makes three conceptual contributions: (i) a key-splittable IPFE interface in which a coalition of Srv with at most one helper has only a simulatable view, (ii) a streaming sketch/compression path that preserves unbiasedness under value-independent dropouts, and (iii) commuting integrity checks (LHT + commitments) that bind the decrypted aggregate to authenticated client inputs.
The rest of the paper is organized as follows. Section 2 discusses related work on functional encryption, secure aggregation, compression for distributed optimization, and verifiable computation. Section 3 formalizes the distributed learning setting, adversarial model, and cryptographic tools. Section 4 presents the KS-IPFE construction and the PaS-Stream protocol in detail. Section 5 reports empirical results on vision and character recognition benchmarks. Section 6 concludes with a discussion of limitations and future directions.

2. Related Work

Functional encryption (FE) offers a paradigm in which decryption keys reveal only specified function outputs on encrypted data, rather than the data itself. The conceptual foundations and formal security notions for FE were articulated by Boneh, Sahai, and Waters [18], while predicate and attribute-focused forms (e.g., inner-product predicates) were developed via pairing-based and lattice-based techniques [20,21,24]. Our framework targets linear function evaluation over high-dimensional client updates, aligning with inner-product FE (IPFE) and its optimized instantiations from standard assumptions [20,21]. In contrast to those works, KS-IPFE introduces a two-share, non-colluding key-splitting interface that enables thresholded functional decryption and verifiable cross-checks without revealing per-client contributions. While multi-party and threshold designs are well-studied for homomorphic and public-key encryption [6,7,8], comparable key-splitting for FE targeted at streaming federated workloads has not been systematized in prior work.

2.1. Functional and Homomorphic Encryption for Linear Evaluation

IPFE constructions from LWE provide succinct linear evaluation under worst-case hardness assumptions [19,20,21]. These schemes expose only $\langle x, y\rangle$ for chosen $y$, complementing classical homomorphic encryption (HE), which permits generic circuit evaluation but with heavier ciphertext growth and resource costs [6,7,8]. Our design adopts an LWE-style dual form with gadget packing, exploiting ciphertext additivity to aggregate client streams before functional decryption, thus revealing only $G_t$ rather than any individual $v_{i,t}$. Compared to HE-based secure aggregation pipelines, FE avoids publishing aggregation circuits, maintains function privacy (hiding $y$ unless authorized), and enables on-the-fly switching of $y$ with re-keying at the helpers' end rather than client-side re-encryption [7,20,21]. Classic IPFE gives any holder of $sk_y$ the ability to evaluate $\langle x, y\rangle$ on each ciphertext, which is unsuitable for FL, where no single entity should learn per-client values. Our KS-IPFE adds a 2-of-2 split with individually simulatable shares, enabling deployment where Srv can collude with at most one helper without breaking confidentiality [25,26].

2.2. Secure Aggregation and Federated Learning at Scale

Secure aggregation protocols compute sums of client-held vectors without disclosing the summands and are a cornerstone of federated learning deployments [4,5,27,28]. FedAvg and its communication-efficient variants established cross-device FL as a practical training modality with client-side clipping and partial participation [1,2,3]. FlowAgg-FE differs in two respects. First, we cryptographically restrict disclosure to the linear image chosen by the task owner (e.g., blockwise coordinates or sketched aggregates) via FE keys rather than bespoke MPC masks [5]. Second, PaS-Stream couples unbiased sketching with blockwise FE to make aggregation rate-adaptive and resilient to stragglers, integrating naturally with production-grade FL orchestration [4,28,29,30].

2.3. Communication Reduction: Quantization, Sparsification, and Sketching

A large body of the literature reduces uplink/round-trip costs by compressing gradients through quantization, sparsification, and randomized projections [2,9,10,11,12,13,14,15]. QSGD provides unbiased quantization with convergence guarantees [9]; 1-bit SGD and sign-based methods achieve extreme compression but may introduce bias unless error feedback is used [10,13,14]. Random projections in the Johnson–Lindenstrauss (JL) family preserve geometry with small distortion [15] and underpin sketched updates in FL [2]. PaS-Stream composes JL-type sketching with unbiased quantization while retaining linearity through the encoding and FE layers, so the estimator of $G_t$ remains unbiased after decryption. Unlike prior compression-only methods [9,11,12], we cryptographically enforce that only the intended linear image of the compressed stream can be recovered.

2.4. Verifiability and Linearly Homomorphic Authentication

Ensuring integrity of aggregated updates is critical under Byzantine behavior. General-purpose succinct NIZKs and range proofs (e.g., Groth–Sahai and Bulletproofs) offer expressive commit-and-prove tools with lightweight verification [16,17]. Our commuting checks instantiate linearly homomorphic tags to validate blockwise sums and bind them to encoded vectors, then cross-check via an FE decryption under a hidden selector. This approach parallels linearly homomorphic authentication/signature lines for linear subspaces and network-coded data [22,23], but tailors them to the FE aggregation pathway so that tags and decryptions agree on the same encoding. The result is a verifiability layer that adds minimal overhead and composes with our thresholded FE path, unlike heavy verifiable computation that would evaluate full-model updates inside a SNARK [16,17,31].

2.5. Summary and Positioning

In summary, our contribution bridges three threads: (i) IPFE from LWE for linear function release with function privacy [18,20,21,32]; (ii) secure aggregation and FL systems engineering [1,3,4,5]; and (iii) communication-efficient compression with unbiased estimators [2,9,12,15]. KS-IPFE provides a key-splittable FE layer that preserves privacy against any single server and supports on-the-fly function changes, while PaS-Stream delivers rate-adaptive, unbiased transmission whose integrity is checked by commuting, linearly homomorphic tags [17,22,23,31]. To our knowledge, this end-to-end co-design—functional encryption + unbiased sketching/quantization + commuting verifiability—has not been previously articulated or evaluated in the FL setting.

3. Preliminaries

This section fixes notation, describes the distributed learning setting, specifies the adversarial model and security goals, and recalls cryptographic and statistical primitives used by our framework. All symbols introduced here are used consistently throughout Section 4, Section 5 and Section 6.

3.1. Notation and System Model

Let $\lambda$ be the security parameter and $[n] := \{1, \dots, n\}$. Vectors are bold lowercase, matrices bold uppercase, and all default norms are $\ell_2$. For a real value $x$, let $\lfloor x \rceil$ denote rounding to the nearest integer. For a modulus $q \in \mathbb{Z}_{>0}$ we write $\mathbb{Z}_q$ for the integers modulo $q$ and lift to vectors entrywise.
We consider cross-device or cross-silo training over $n$ clients $\mathcal{C} = \{C_i\}_{i=1}^n$ interacting with an untrusted aggregator Srv and two non-colluding helpers $H_A$, $H_B$. Training proceeds in rounds $t = 1, \dots, T$. The global model is $w_t \in \mathbb{R}^d$. In round $t$, a subset $A_t \subseteq [n]$ of available clients computes clipped gradients
$$v_{i,t} = \mathrm{clip}\big(\nabla_w \ell_i(w_t), S\big) \quad \text{with } \|v_{i,t}\|_2 \le S, \qquad (1)$$
for a fixed clipping threshold $S > 0$. The task-relevant linear functional released in each round is the weighted aggregate
$$G_t = \sum_{i \in A_t} \alpha_{i,t} \, v_{i,t} \in \mathbb{R}^d, \qquad (2)$$
for public weights $\alpha_{i,t} \in \mathbb{R}$. Our framework reveals only $G_t$ (or a privatized variant) to Srv; no other function of any $v_{i,t}$ should be learned.
To interface with lattice cryptography, clients encode $v_{i,t}$ into $x_{i,t} \in \mathbb{Z}_q^d$ via a scaling factor $\Delta > 0$:
$$x_{i,t} = \lfloor \Delta \, v_{i,t} \rceil \bmod q, \quad \text{and} \quad \mathrm{dec}(z) = z / \Delta. \qquad (3)$$
All ciphertexts in round $t$ use the same modulus $q$ and scale $\Delta$.
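To make the encoding concrete, the following Python sketch (with illustrative values for $\Delta$ and $q$; the function names are ours, not the paper's implementation) realizes Equation (3) and shows that summing encoded vectors modulo $q$ and then decoding recovers the real-valued aggregate, provided the scaled sum stays below $q/2$:

```python
import numpy as np

def encode(v, delta, q):
    """Fixed-point encoding of Equation (3): x = round(delta * v) mod q."""
    return np.round(delta * v).astype(np.int64) % q

def decode(z, delta, q):
    """Lift to centered representatives in [-q/2, q/2), then rescale by 1/delta."""
    centered = ((z + q // 2) % q) - q // 2
    return centered / delta

q, delta = 2**40, 2**16            # illustrative modulus and scaling factor
v1 = np.array([0.5, -1.25, 0.0])   # toy clipped gradients
v2 = np.array([-0.5, 0.75, 2.0])

# Aggregation happens on encoded integers modulo q; decoding recovers v1 + v2
# up to rounding error, as long as |delta * (v1 + v2)| stays below q/2.
agg = (encode(v1, delta, q) + encode(v2, delta, q)) % q
assert np.allclose(decode(agg, delta, q), v1 + v2, atol=1 / delta)
```

Negative coordinates are represented by residues near $q$, which is why decoding first recenters into $[-q/2, q/2)$ before dividing by $\Delta$.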
Symbol | Type | Meaning
--- | --- | ---
$\mathcal{C}$, Srv, $H_A$, $H_B$ | sets/roles | clients, aggregator, helpers
$d$, $T$, $S$ | integers/real | dimension, rounds, clipping threshold
$w_t$ | $\mathbb{R}^d$ | global model at round $t$
$v_{i,t}$ | $\mathbb{R}^d$ | clipped gradient of client $i$
$\alpha_{i,t}$ | $\mathbb{R}$ | aggregation weight for client $i$
$x_{i,t}$ | $\mathbb{Z}_q^d$ | encoded gradient (Equation (3))
$q$, $\Delta$ | integers/real | modulus, fixed scaling factor
$\Phi_t$, $Q_b$ | matrix, map | sketching matrix, $b$-bit unbiased quantizer

3.2. Adversarial Model and Goals

We assume an adaptive PPT adversary that may corrupt Srv and any strict subset of { H A , H B } , but not both helpers simultaneously. While our core model assumes that at most one of { H A , H B } colludes with Srv , this can be instantiated operationally by placing the helpers under independent administrative domains (e.g., distinct cloud providers) and enforcing separation via auditing/contractual controls. As a defense-in-depth option, helper execution can be confined to Trusted Execution Environments (TEEs) with remote attestation so that key shares are provisioned only to attested code and decryption shares are released only for round-labeled aggregated ciphertexts. Finally, our masking-based key splitting in Equation (12) naturally generalizes to t-of-m helpers via secret sharing, reducing reliance on any single helper at the cost of additional helper messages.
Clients can be Byzantine and may deviate from the protocol (e.g., sending malformed ciphertexts). Network scheduling can cause stragglers and dropouts. Our goals are:
  • Confidentiality. No adversary controlling Srv and at most one helper learns anything about individual $v_{i,t}$ beyond the value of the allowed function(s) (Equation (2)) and public metadata.
  • Function privacy. The structure of the function applied to encrypted data is hidden unless explicitly revealed via function keys.
  • Verifiable correctness. The value output to Srv equals the prescribed function of honestly contributed inputs, except with negligible probability, even if clients or a single helper are malicious.
  • Robust aggregation. The protocol remains correct under client dropouts; stragglers do not block progress.
Our confidentiality and function-privacy claims are proved in a model where Srv may collude with at most one helper. Concretely, the view of Srv together with $H_A$ (or with $H_B$) can be simulated given only the authorized aggregate outputs, because each helper holds only a masked key share that is individually simulatable (Equation (5)). If both helpers collude with Srv, the system degrades to standard IPFE and confidentiality of individual contributions is no longer expected. In practice, the non-collusion assumption can be approximated by placing helpers in independent administrative domains or by confining helper logic to TEEs with attestation; we also note that the key-splitting technique extends to $t$-of-$m$ helpers to reduce reliance on any single helper at the cost of extra helper messages.

3.3. Functional Encryption Background

A functional encryption (FE) scheme for a message space $\mathcal{M}$ and function family $\mathcal{F}$ consists of four PPT algorithms
$$(\mathrm{Setup}, \mathrm{KeyGen}, \mathrm{Enc}, \mathrm{Dec}),$$
where $\mathrm{Setup}(1^\lambda) \to (\mathrm{mpk}, \mathrm{msk})$; $\mathrm{KeyGen}(\mathrm{msk}, f) \to sk_f$ for $f \in \mathcal{F}$; $\mathrm{Enc}(\mathrm{mpk}, m) \to ct$ for $m \in \mathcal{M}$; and $\mathrm{Dec}(sk_f, ct) \to f(m)$. The security notion guarantees that $ct$ reveals nothing about $m$ beyond what is implied by the outputs $f(m)$ for the keys $sk_f$ that the adversary holds.
Inner-Product FE (IPFE). We use the vector space $\mathcal{M} = \mathbb{Z}_q^d$ and functions $\mathcal{F} = \{\, y \mapsto \langle x, y\rangle \bmod q : y \in \mathbb{Z}_q^d \,\}$. An IPFE scheme satisfies ciphertext additivity: for encryptions $ct_j \leftarrow \mathrm{Enc}(\mathrm{mpk}, x_j)$ under the same public key,
$$\mathrm{Dec}\Big(sk_y, \sum_j ct_j\Big) = \sum_j \langle x_j, y\rangle \bmod q. \qquad (4)$$
Equation (4) lets the aggregator homomorphically combine client ciphertexts before functional decryption, ensuring that only an aggregate value is ever revealed. We require a 2-of-2 threshold variant in which $\mathrm{KeyGen}$ returns a pair of shares $(sk_y^A, sk_y^B)$ distributed to $H_A$ and $H_B$. Each helper computes a decryption share $\sigma \leftarrow \mathrm{DecShare}(sk_y^{\cdot}, CT)$ on an aggregate ciphertext $CT$, and a public combiner $\mathrm{Comb}$ outputs the result
$$\mathrm{Comb}(\sigma_A, \sigma_B) = \sum_j \langle x_j, y\rangle \bmod q, \quad \text{with } \sigma_A, \sigma_B \text{ individually simulatable}. \qquad (5)$$
No single helper can learn the function value alone. Classic IPFE gives any holder of $sk_y$ the ability to evaluate $\langle x, y\rangle$ on each ciphertext, which is unsuitable for FL, where no single entity should learn per-client values. Our KS-IPFE adds a 2-of-2 split with individually simulatable shares (Equation (5)), enabling deployment where Srv can collude with at most one helper without breaking confidentiality; the cost is an explicit helper-separation assumption that we analyze in Section 3.2.
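The share-combining algebra behind Equation (5) can be illustrated with a toy additive key split. This mimics only the 2-of-2 masking structure (not the actual LWE ciphertexts), and all names and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
q, d = 65537, 8                       # toy prime modulus and block length
y = rng.integers(0, q, d)             # authorized function vector
r = rng.integers(0, q, d)             # uniform mask
sk_A, sk_B = (y - r) % q, r           # 2-of-2 shares: sk_A + sk_B = y (mod q)

# Stand-ins for encoded client payloads; the server aggregates them first.
xs = [rng.integers(0, q, d) for _ in range(3)]
agg = np.sum(xs, axis=0) % q

sigma_A = int(agg @ sk_A) % q         # helper A's decryption share
sigma_B = int(agg @ sk_B) % q         # helper B's share (r is uniform, so
                                      # each share alone is independent of y)
combined = (sigma_A + sigma_B) % q    # public combiner Comb(sigma_A, sigma_B)

assert combined == int(agg @ y) % q   # equals <sum_j x_j, y> mod q
```

Because the mask $r$ is uniform, each share is individually simulatable in this toy setting; only their combination reveals the inner product with the aggregate.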

3.4. LWE Tools and Encoding

Our constructions are LWE-based. Let $n_\ell$ be the LWE dimension, $q$ a prime modulus, and $\chi$ a discrete Gaussian or subgaussian error distribution over $\mathbb{Z}$. A sample $(a, b) \in \mathbb{Z}_q^{n_\ell} \times \mathbb{Z}_q$ is drawn as $a \leftarrow_R \mathbb{Z}_q^{n_\ell}$, $b = \langle a, s\rangle + e \bmod q$, for a secret $s \leftarrow_R \mathbb{Z}_q^{n_\ell}$ and noise $e \leftarrow \chi$. The hardness of distinguishing such samples from uniform is the LWE assumption at parameters $(n_\ell, q, \chi)$.
Additive homomorphism. LWE encryption of messages in $\mathbb{Z}_q$ is additively homomorphic: summing ciphertexts componentwise produces a valid encryption of the sum with controlled noise growth. Vector encryption follows by componentwise encoding or via gadget decomposition. With the encoding of Equation (3), the post-decryption rescaling by $\Delta^{-1}$ recovers real-valued aggregates with bounded rounding error.
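A minimal symmetric-LWE demonstration of this additive homomorphism follows. The parameters are toy-sized, bounded integer noise stands in for $\chi$, and the message is placed above the noise via a scale factor; this is a didactic sketch, not the KS-IPFE construction:

```python
import numpy as np

rng = np.random.default_rng(1)
n_l, q, P = 64, 2**20, 2**8         # toy LWE dimension, ciphertext/plaintext moduli
scale = q // P                       # gap encoding: message sits above the noise
s = rng.integers(0, q, n_l)          # secret key

def enc(m):
    """Symmetric LWE encryption of m in Z_P with small bounded noise (toy chi)."""
    a = rng.integers(0, q, n_l)
    e = int(rng.integers(-4, 5))
    return a, (int(a @ s) + e + m * scale) % q

def dec(ct):
    a, b = ct
    noisy = (b - int(a @ s)) % q     # = m * scale + (accumulated noise) mod q
    return round(noisy / scale) % P  # rounding strips the noise

# Componentwise ciphertext addition is a valid encryption of the sum.
c1, c2 = enc(5), enc(7)
csum = ((c1[0] + c2[0]) % q, (c1[1] + c2[1]) % q)
assert dec(c1) == 5 and dec(csum) == 12
```

Summing ciphertexts also sums the noise terms, which is why the noise budget below must account for the number of summands per round.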
Noise budgeting. Let $\eta$ upper bound the decryption noise after summing at most $B$ ciphertexts in a round. We pre-allocate $q$ and $\Delta$ so that
$$\eta \le q/4 \quad \text{and} \quad \Delta S \le q/8, \qquad (6)$$
ensuring correctness for $B \ge |A_t|$. KS-IPFE decryption is exact on the encoded integers (up to negligible failure probability): LWE noise is only a correctness concern and does not introduce additional approximation error when rounding succeeds. Consequently, the total estimation error of FlowAgg-FE decomposes cleanly into (i) sketching error (zero-mean, with variance controlled by $k$ and the choice of $\Phi_t$) and (ii) quantization/encoding error from $Q_b$ and scaling by $\Delta$. In particular, for any coordinate value $u$, the integer encoding/decoding contributes a deterministic rounding term bounded as $|\varepsilon_{\mathrm{rnd}}| \le 1/(2\Delta)$, while the stochastic quantizer remains unbiased, $\mathbb{E}[\varepsilon_q] = 0$, with variance controlled by $b$. This addresses worst-case numerical stability: FE introduces no approximation error beyond the controlled quantization/sketching error.
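The deterministic rounding bound $|\varepsilon_{\mathrm{rnd}}| \le 1/(2\Delta)$ can be checked numerically with a trivial sanity sketch (the value of $\Delta$ and the test points are arbitrary illustrative choices):

```python
delta = 2**10  # illustrative scaling factor

# Encode then decode a coordinate (no modular wrap-around at this scale):
for u in [0.123456, -3.14159, 2.5, 0.0]:
    u_hat = round(u * delta) / delta
    # Rounding to the nearest grid point is off by at most half a grid step.
    assert abs(u_hat - u) <= 1 / (2 * delta)
```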

3.5. Compression and Streaming

To reduce uplink, we use sketching and unbiased quantization before encryption. Let $\Phi_t \in \{\pm 1/\sqrt{k}\}^{k \times d}$ be a per-round public Johnson–Lindenstrauss transform with $k \ll d$. Define the sketch
$$s_{i,t} = \Phi_t v_{i,t} \in \mathbb{R}^k. \qquad (7)$$
Let $Q_b : \mathbb{R} \to (2^b\text{-level grid})$ be a stochastic quantizer with $\mathbb{E}[Q_b(z) \mid z] = z$ and bounded variance $\mathbb{V}[Q_b(z)] \le \sigma_b^2$ (e.g., randomized rounding with a per-block scale). The transmitted payload encodes $u_{i,t} = Q_b(s_{i,t})$. Unbiasedness preserves aggregated expectations:
$$\mathbb{E}\Big[\sum_{i \in A_t} \alpha_{i,t} u_{i,t}\Big] = \sum_{i \in A_t} \alpha_{i,t} s_{i,t} = \Phi_t G_t. \qquad (8)$$
A rate-adaptive streaming interface breaks $u_{i,t}$ into fixed-size chunks $u_{i,t}^{(b)}$ that are each encoded to $x_{i,t}^{(b)}$ and encrypted, enabling partial aggregation when stragglers drop out.
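The sketch-then-quantize path of Equations (7) and (8) can be sketched as follows; averaging many independent quantizations empirically confirms the unbiasedness $\mathbb{E}[Q_b(s)] = s$. The dimensions and the randomized-rounding grid are our illustrative choices, not the paper's tuned configuration:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, b = 1024, 64, 4
# Per-round public JL transform: Rademacher entries scaled by 1/sqrt(k).
Phi = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)

def quantize(z, bits=b):
    """Stochastic quantizer Q_b: randomized rounding on a 2^bits-level grid,
    unbiased because we round up with probability equal to the fractional part."""
    lo, hi = float(z.min()), float(z.max())
    step = (hi - lo) / (2**bits - 1)
    scaled = (z - lo) / step
    frac = scaled - np.floor(scaled)
    return lo + (np.floor(scaled) + (rng.random(z.shape) < frac)) * step

v = rng.standard_normal(d)          # a toy clipped gradient
s = Phi @ v                         # sketch s = Phi v (Equation (7))

# Averaging many independent quantizations approaches s: E[Q_b(s)] = s.
u_mean = np.mean([quantize(s) for _ in range(2000)], axis=0)
assert np.max(np.abs(u_mean - s)) < 0.25
```

Because both $\Phi_t$ and $Q_b$ are linear/unbiased, the same property carries through the weighted sum over clients, which is exactly what Equation (8) states.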

3.6. Verifiability Primitives

We rely on two lightweight building blocks that commute with linear aggregation.
Linearly homomorphic tags (LHT). Let $\kappa \leftarrow_R \mathbb{Z}_p^d$ be a secret tag key for a large prime $p$. A client computes
$$\tau_{i,t} = \langle \lfloor \Delta v_{i,t} \rceil, \kappa \rangle \bmod p, \qquad (9)$$
and sends $\tau_{i,t}$ alongside the FE ciphertext. Aggregation preserves linearity:
$$\sum_{i \in A_t} \alpha_{i,t}\, \tau_{i,t} \equiv \Big\langle \sum_{i \in A_t} \alpha_{i,t} \lfloor \Delta v_{i,t} \rceil,\ \kappa \Big\rangle \pmod p. \qquad (10)$$
Choosing $\kappa$ uniformly makes individual tags pseudorandom to Srv. In KS-IPFE we can also decrypt the same aggregate with the function vector $\kappa$ to reproduce the right-hand side of Equation (10) and cross-check consistency without revealing any per-client information.
Range and clipping proofs. Each client binds its encoded vector $x_{i,t}$ to a commitment $\mathrm{Com}(x_{i,t}; r_{i,t})$ and proves in zero-knowledge that
$$\|v_{i,t}\|_2 \le S \quad \text{and} \quad x_{i,t} = \lfloor \Delta v_{i,t} \rceil \bmod q, \qquad (11)$$
using a commit-and-prove system with linear relations. These proofs are additively aggregatable: the verifier's checks succeed on the sum of commitments when all individual statements hold, matching the FE aggregation path.
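A small numeric sketch of the tag arithmetic in Equations (9) and (10) follows (toy modulus, plaintext integer blocks standing in for encoded payloads; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
p, d = 1_000_003, 16              # toy prime tag modulus and block length
kappa = rng.integers(1, p, d)     # secret tag key (nonzero entries for the demo)

def tag(x):
    """Linearly homomorphic tag of Equation (9): tau = <x, kappa> mod p."""
    return int(x @ kappa) % p

xs = [rng.integers(0, p, d) for _ in range(4)]   # encoded client blocks
alphas = [1, 1, 2, 1]                            # public integer weights

# Tags aggregate exactly like the encoded vectors (Equation (10)).
tau_agg = sum(a * tag(x) for a, x in zip(alphas, xs)) % p
x_agg = sum(a * x for a, x in zip(alphas, xs)) % p
assert tau_agg == tag(x_agg)

# Tampering with the aggregate is caught (except with probability ~1/p).
x_bad = x_agg.copy()
x_bad[0] = (x_bad[0] + 1) % p
assert tau_agg != tag(x_bad)
```

In the full protocol, the right-hand side of this check is reproduced through a KS-IPFE decryption under $\kappa$, so the server never handles per-client tags in the clear beyond their pseudorandom values.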

4. Methodology

We now instantiate FlowAgg-FE with a concrete key-splittable inner-product functional encryption (KS-IPFE) scheme and a streaming transmission pipeline (PaS-Stream) that together realize the goals set out in Section 3. Unless stated otherwise, the plaintext vectors handled by FE in this section are the encoded payloads derived from either the full clipped gradients $v_{i,t}$ via Equation (3) or, when PaS-Stream is active, from the unbiased sketches $u_{i,t} = Q_b(\Phi_t v_{i,t})$ via the same encoding. Viewing all FE plaintexts through this encoding lens keeps Equations (4)–(10) consistent and lets us reason about correctness and security directly at the level of encoded blocks.

4.1. Architectural Overview of FlowAgg-FE

FlowAgg-FE is structured as a layered architecture that aligns the cryptographic design of KS-IPFE with the systems concerns of streaming, robustness, and verifiability. The roles are as in Section 3: a population of clients C = { C i } , an untrusted aggregator Srv , and two non-colluding helpers H A , H B . All parties share public cryptographic parameters and model hyperparameters; only the helpers hold FE function key shares. At a high level, each training round t consists of three phases: (i) function selection and key materialization, (ii) client-side compression, encoding, and encryption, and (iii) server-side aggregation, threshold decryption, and verification.
In the function selection and key materialization phase, a key authority (which may be an initialization-time trusted party or a distributed setup protocol) runs the KS-IPFE setup algorithm to produce $(\mathrm{mpk}, \mathrm{msk})$, as described later in detail. The public key $\mathrm{mpk}$ is disseminated to all clients and to Srv, while the master secret key $\mathrm{msk}$ is retained solely for generating function keys. To support changing linear functions on-the-fly (e.g., per-round masks, adaptive weighting, or auditing vectors), the authority issues fresh split keys $(sk_{y_t}^A, sk_{y_t}^B)$ tagged with the round identifier $t$. Helpers keep only the currently active shares (plus long-lived basis shares for $\{e_j\}_{j \in [D]}$), which makes revocation as simple as deleting prior-round shares. Synchronization is handled by including the round id in helper responses and rejecting stale shares at Srv. The per-round key material is $O(m + D\,\ell_g)$ elements per helper, which is negligible compared to per-round ciphertext traffic in our regimes. For a given training regime, the task owner specifies a family of allowable linear functionals over encoded blocks, such as (i) individual coordinates $e_j$, (ii) rows of a sketching matrix $\Phi_t$ used in PaS-Stream, or (iii) secret tag vectors $\kappa_t$ used for verifiability. For each such vector $y$, the authority runs $\mathrm{KeyGen}$ to obtain a pair of key shares $(sk_y^A, sk_y^B)$ and distributes them to $H_A$ and $H_B$, respectively. Because KS-IPFE keys are splittable and each share is individually simulatable, no single helper can recover inner products alone, yet together they can support decryption for any authorized linear functional. We note that although we model the key authority as a logical role, it is needed only at initialization (and when the authorized function set changes).
To remove a single point of failure, the master secret msk can be generated and held by a small committee via distributed key generation/threshold secret sharing, and function-key issuance can be made auditable [33]. Alternatively, msk can be sealed inside an HSM/TEE so that only a rate-limited key-derivation interface is exposed and no bulk secret material is exportable.
In the client-side phase of round $t$, the aggregator broadcasts the current global model $w_t$ and, when PaS-Stream is enabled, the public sketching matrix $\Phi_t$ and the quantization configuration (e.g., the bit-width $b$ and scale ranges used by $Q_b$). Each client $C_i \in A_t$ computes its clipped gradient according to Equation (1), obtaining $v_{i,t}$ with $\|v_{i,t}\|_2 \le S$. Depending on the mode, the payload prior to encryption is either the full vector $v_{i,t}$ (KS-IPFE only) or a compressed representation $u_{i,t} = Q_b(\Phi_t v_{i,t})$ that satisfies the unbiasedness property of Equation (8). The client then maps this payload into an integer vector via Equation (3), producing $x_{i,t} \in \mathbb{Z}_q^k$ for some working dimension $k$ (equal to $d$ in the full-precision case and to the sketch dimension in PaS-Stream). This vector is partitioned into contiguous blocks of size $D$, $x_{i,t}^{(1)}, \dots, x_{i,t}^{(B_t)}$, each of which is encrypted independently under $\mathrm{mpk}$ using the KS-IPFE encryption algorithm. For every block, the client also computes a linearly homomorphic tag and a clipping/encoding proof that binds $x_{i,t}^{(b)}$ to a valid, correctly clipped source vector. The collection of ciphertexts, tags, and proofs for all blocks is streamed towards Srv using a simple, nonce-based framing that preserves block ordering.
The server-side phase begins as soon as Srv receives the first block ciphertexts. Rather than waiting for all clients, Srv incrementally forms aggregated ciphertexts for each block index b by computing weighted sums across the subset of clients whose b-th block has arrived, as in Equation (14). In parallel, Srv aggregates the corresponding tags, preserving linearity at the tag level. The aggregated ciphertext pair ( C 1 ( b ) , C 2 ( b ) ) and any necessary public metadata are then forwarded to both helpers. Each helper uses its share of the FE keys to compute decryption shares for the relevant function vectors (e.g., the coordinate basis vectors and the secret tag vector κ t ), which are then combined at Srv to recover (i) the blockwise sums of encoded payloads and (ii) an independently reconstructed tag. The latter is cross-checked against the aggregated tag to detect any tampering or malformed ciphertexts, while the former yields either a block of the aggregated gradient i A t α i , t v i , t or, in PaS-Stream mode, a block of the aggregated sketch i A t α i , t u i , t . After rescaling and (if needed) applying Φ t , Srv obtains an estimator of G t and performs the model update for round t.
We note that a key feature of this architecture is that ciphertext aggregation and verification commute with functional decryption. Because KS-IPFE is additively homomorphic with respect to ciphertexts (Equation (4)) and the tags are linearly homomorphic (Equation (10)), Srv can aggregate encrypted blocks and tags independently, and the helpers can perform decryption on these aggregates without ever seeing per-client plaintexts. This property is crucial for scalability: it ensures that helper workload scales with the number of blocks per round, not with the number of clients, and that verification overhead remains modest.

4.2. KS-IPFE: Key-Splittable LWE-Based Construction

The KS-IPFE component of FlowAgg-FE provides the cryptographic substrate that allows the aggregator and the two helpers to recover only prescribed linear functionals of encrypted updates. It is tailored to the FL setting in Section 3 by (i) operating over high-dimensional encoded vectors arising from Equation (3), (ii) supporting ciphertext aggregation before decryption as in Equation (4), and (iii) splitting every function key into two individually simulatable shares held by non-colluding helpers. This subsection refines the construction in a more algebraic manner, explicitly tracking dimensions, noise terms, and the simulation-based security view.
We fix a block length D ∈ Z_{>0} and view each FE plaintext as a block x ∈ Z_q^D . A working vector of dimension k (either k = d for full gradients or k equal to the sketch dimension in PaS-Stream) is partitioned into B = ⌈ k / D ⌉ blocks, so that the b-th block is denoted x^{(b)} for b ∈ [ B ] . Typical choices such as D ∈ { 32 , 64 } balance the amortization of gadget-based packing with the number of function keys that must be instantiated per round. All arithmetic is in the residue ring Z_q , where q ∈ Z_{>0} is chosen together with the LWE dimension n_l and the error distributions χ , χ′ to satisfy the noise budget in Equation (6). We write m for the width of the public matrix A , with m = poly ( λ ) .
The construction follows a dual-style LWE template with gadget packing. Let G_D ∈ Z_q^{D l_g × D} be a block-diagonal gadget matrix implementing base-2 packing with l_g digits per coordinate. Concretely, G_D = diag ( g , … , g ) with D blocks g = ( 1 , 2 , … , 2^{l_g − 1} )^⊤ , so that for any x ∈ Z_q^D , the product x̂ := G_D x ∈ Z_q^{D l_g} stacks the power-of-two multiples of each coordinate of x. There exists a (not necessarily unique) left inverse G_D^{−} ∈ Z_q^{D × D l_g} such that G_D^{−} G_D ≡ I_D ( mod q ) on the range of interest (i.e., for coefficients below the wrap-around threshold). This allows us to pass back and forth between the "digit" domain and the original coordinate domain inside the decryption algorithm.
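As a concreteness check, the gadget packing and one valid selector-style left inverse can be sketched in numpy (a toy illustration with dimensions of our own choosing; the construction only requires that G_D^{−} G_D ≡ I_D on the range of interest):

```python
import numpy as np

D, lg, q = 4, 5, 2**16
g = 2 ** np.arange(lg)                                       # g = (1, 2, ..., 2^{lg-1})
G = np.kron(np.eye(D, dtype=np.int64), g.reshape(-1, 1))     # G_D in Z^{D*lg x D}
# one valid left inverse: select the weight-1 digit of each coordinate block
Ginv = np.kron(np.eye(D, dtype=np.int64), np.eye(1, lg, 0, dtype=np.int64))

x = np.array([3, 141, 59, 26])
x_hat = (G @ x) % q                                          # gadget-packed block
assert np.array_equal((Ginv @ x_hat) % q, x)                 # G_D^- G_D = I_D here
```

Any matrix satisfying the left-inverse identity works; the digit selector above is simply the cheapest choice.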
Setup. In the setup phase, the authority samples
A ←_R Z_q^{n_l × m} , S ← χ^{m × D l_g} , E ← χ′^{n_l × D l_g} ,
where the entries of S and E are independent draws from the subgaussian integer distributions χ and χ′ , respectively (e.g., discrete Gaussians with parameters σ and σ′ ). We define
B : = A S + E Z q n l × D l g ,
and publish the master public key mpk = ( A , B , G D ) while retaining the master secret key msk = S . Under the LWE assumption at parameters ( n l , q , χ ) , the joint distribution ( A , B ) is computationally indistinguishable from ( A , U ) , where U R Z q n l × D l g is uniform; consequently, mpk reveals no information about S beyond what is implied by the security parameter.
Key generation and splitting. To authorize decryption of inner products with a block-level function vector y ∈ Z_q^D , the authority first embeds y into the gadget domain by computing
ỹ := ( G_D^{−} )^⊤ y ∈ Z_q^{D l_g} .
We interpret y ˜ as the vector of “digit weights” corresponding to the functional x , y acting on gadget-packed plaintexts. Using msk = S , we compute a base key
K : = S y ˜ Z q m .
Intuitively, K is the dual secret associated with the function vector y , analogous to the secret key in standard LWE decryption.
We now choose a masking vector W ←_R Z_q^m uniformly at random and define the two key shares as
sk_y^A = ( ỹ , W ) , sk_y^B = ( ỹ , K − W mod q ) .
Each helper receives only its own share. The distribution of sk y A (resp. sk y B ) is computationally indistinguishable from ( y ˜ , U m ) , where U m is uniform in Z q m , since W is uniform and independent of K. In particular, any PPT adversary that corrupts at most one helper learns no additional information about K beyond what is already implied by y and the allowed function outputs. This observation underlies the threshold property of KS-IPFE: decryption requires cooperation between H A and H B , while each helper’s view alone can be simulated from public information and oracle access to function outputs.
Client encryption. For a single block payload x Z q D , derived from either v i , t or its sketch as in Equation (3), client C i forms the gadget-packed vector
x ^ : = G D x Z q D l g .
We regard x̂ as a column vector. The client then samples a fresh short randomness vector r ← χ^{n_l} (keeping r short ensures that the cross-term E^⊤ r arising in decryption stays within the noise budget) and error terms
e_1 ← χ^m , e_2 ← χ′^{D l_g} ,
and computes the ciphertext components
c_1 := A^⊤ r + e_1 ∈ Z_q^m , c_2 := B^⊤ r + x̂ + e_2 ∈ Z_q^{D l_g} .
The per-block ciphertext is the pair ct = ( c_1 , c_2 ) . Note that if we define the "ideal" noiseless ciphertext as ( c̄_1 , c̄_2 ) = ( A^⊤ r , B^⊤ r + x̂ ) , then ( c_1 , c_2 ) differs from ( c̄_1 , c̄_2 ) by the additive error vector ( e_1 , e_2 ) .
Ciphertext aggregation. Once Srv has received ciphertexts for a given block index b from a subset A t [ n ] of clients in round t, it uses the fixed-point weights α ˜ i , t Z q (encoding the real weights α i , t ) to form the aggregated ciphertext
C_1^{(b)} := Σ_{i∈A_t} α̃_{i,t} c_{1,i}^{(b)} , C_2^{(b)} := Σ_{i∈A_t} α̃_{i,t} c_{2,i}^{(b)} .
Writing r i ( b ) for the randomness used by C i on block b and x ^ i ( b ) for its gadget-packed plaintext, we can decompose
C_1^{(b)} = A^⊤ Σ_{i∈A_t} α̃_{i,t} r_i^{(b)} + Σ_{i∈A_t} α̃_{i,t} e_{1,i}^{(b)} ,
C_2^{(b)} = B^⊤ Σ_{i∈A_t} α̃_{i,t} r_i^{(b)} + Σ_{i∈A_t} α̃_{i,t} x̂_i^{(b)} + Σ_{i∈A_t} α̃_{i,t} e_{2,i}^{(b)} .
The aggregated error vectors
E_{1,t}^{(b)} := Σ_{i∈A_t} α̃_{i,t} e_{1,i}^{(b)} , E_{2,t}^{(b)} := Σ_{i∈A_t} α̃_{i,t} e_{2,i}^{(b)} + E^⊤ Σ_{i∈A_t} α̃_{i,t} r_i^{(b)}
remain subgaussian with parameter depending on | A t | ; Equation (6) is chosen precisely so that their magnitude is well within the decryption margin.
Threshold decryption and correctness. For each block and each authorized function vector y ∈ Z_q^D , helper H_A uses its share sk_y^A = ( ỹ , W_A ) to compute the decryption share
σ_A := ⟨ ỹ , C_2^{(b)} ⟩ − ⟨ W_A , C_1^{(b)} ⟩ ∈ Z_q , while H_B computes σ_B := − ⟨ W_B , C_1^{(b)} ⟩ ∈ Z_q ,
where W_A = W and W_B = K − W as in Equation (12); the publicly computable term ⟨ ỹ , C_2^{(b)} ⟩ is thus included exactly once. Summing the shares yields
σ_A + σ_B = ⟨ ỹ , C_2^{(b)} ⟩ − ⟨ K , C_1^{(b)} ⟩ = ⟨ ỹ , B^⊤ R + Σ_i α̃_{i,t} x̂_i^{(b)} + Σ_i α̃_{i,t} e_{2,i}^{(b)} ⟩ − ⟨ S ỹ , A^⊤ R + E_{1,t}^{(b)} ⟩ = ⟨ Σ_i α̃_{i,t} x̂_i^{(b)} , ỹ ⟩ + ⟨ ỹ , E_{2,t}^{(b)} ⟩ − ⟨ S ỹ , E_{1,t}^{(b)} ⟩ ( mod q ) , where R := Σ_i α̃_{i,t} r_i^{(b)} and we used B ỹ = A S ỹ + E ỹ together with the definition of E_{2,t}^{(b)} .
Denoting the total noise term in Equation (16) by N_t^{(b)} ( y ) , we can write
σ_A + σ_B ≡ ⟨ Σ_i α̃_{i,t} x̂_i^{(b)} , ỹ ⟩ + N_t^{(b)} ( y ) ( mod q ) .
Because x̂_i^{(b)} = G_D x_{i,t}^{(b)} and ỹ = ( G_D^{−} )^⊤ y , with G_D^{−} G_D ≡ I_D ( mod q ) on the relevant range, we have
⟨ Σ_i α̃_{i,t} x̂_i^{(b)} , ỹ ⟩ = Σ_i α̃_{i,t} ⟨ G_D x_{i,t}^{(b)} , ( G_D^{−} )^⊤ y ⟩ = Σ_i α̃_{i,t} ⟨ x_{i,t}^{(b)} , y ⟩ ,
where we used the identity ⟨ G_D x , ( G_D^{−} )^⊤ y ⟩ = ⟨ x , y ⟩ modulo q when no wrap-around occurs. By Equation (6), the subgaussian tails of N_t^{(b)} ( y ) ensure that | N_t^{(b)} ( y ) | < q / 4 with probability at least 1 − 2^{−λ} , so rounding to the nearest integer in the canonical interval ( − q / 2 , q / 2 ] recovers
z^{(b)} ( y ) := Round ( σ_A + σ_B ) = ⟨ Σ_{i∈A_t} α̃_{i,t} x_{i,t}^{(b)} , y ⟩ ∈ Z ,
with overwhelming probability.
By choosing y = e j for j = 1 , , D , the helpers can reconstruct all D coordinates of the blockwise weighted sum i α ˜ i , t x i , t ( b ) from encrypted inputs. Mapping these coordinates back through the scaling factor Δ in Equation (3) then yields the corresponding portion of G t or of its sketched variant, up to deterministic rounding error controlled by Δ and the clipping threshold S.
Security intuition and cost. From a security perspective, KS-IPFE inherits the indistinguishability guarantees of LWE-based IPFE. An IND-style security game for KS-IPFE can be phrased as follows: An adversary chooses two message families { x i ( b , 0 ) } , { x i ( b , 1 ) } with the same values under all functions for which it holds key shares, receives encryptions of one of the two (chosen at random), along with one key share per function vector, and must guess which family was encrypted. Under the LWE assumption and the simulatable distribution of key shares in Equation (12), any PPT adversary controlling Srv and at most one helper has at most negligible advantage in this game. Intuitively, replacing ( A , B ) with uniform, then replacing ciphertexts with uniform, and finally replacing the key share mask W with uniform in hybrids yields a distribution that depends only on the revealed function outputs.
On the cost side, one KS-IPFE encryption of a block requires forming x̂ = G_D x and two matrix-vector multiplications A^⊤ r and B^⊤ r , plus additions of small error terms. With NTT-friendly moduli and a structured choice of A , B , these multiplications can be implemented in O ( n_l m ) ring operations with a modest constant. Server-side aggregation is linear in the number of ciphertexts actually received and consists of scalar additions in Z_q . Helper-side decryption uses O ( D l_g ) ring operations per decryption share (one inner product with C_2^{(b)} and one with C_1^{(b)} ), so for F distinct function vectors and B blocks, the total number of decryption operations per round scales as O ( F B D l_g ) , independent of | A_t | . In the regimes we evaluate in Section 5, a configuration with q = 2^{32} , n_l = 1024 , l_g = 16 , D = 64 , and carefully tuned χ , χ′ yields a per-block decryption failure probability below 2^{−40} and supports several thousand contributing clients per round, while keeping per-client ciphertext sizes within a small constant factor of plaintext updates. More detailed game-based proofs (including the reduction to LWE and simulation of single-helper views) are provided in Appendix B.
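The full KS-IPFE round trip (setup, key splitting, client encryption, server-side weighted aggregation, and two-helper decryption) can be sketched end to end in numpy. This is a toy instantiation with parameters of our own choosing, gadget packing omitted (effectively l_g = 1 , so x̂ = x and ỹ = y ), and ternary noise standing in for the subgaussian distributions; it illustrates the data flow, not a secure parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
q, n, m, D = 2**20, 64, 128, 8   # toy modulus, LWE dimension n_l, width m, block length D
DELTA = 2**12                    # fixed-point scale standing in for Equation (3)

def small(*shape):               # ternary stand-in for the subgaussian distributions chi, chi'
    return rng.integers(-1, 2, shape)

# Setup: B = A S + E (mod q); mpk = (A, B), msk = S
A = rng.integers(0, q, (n, m))
S, E = small(m, D), small(n, D)
B = (A @ S + E) % q

def keygen_split(y):
    """Base key K = S y, masked into two individually uniform shares."""
    K = (S @ y) % q
    W = rng.integers(0, q, m)            # uniform mask
    return (y, W), (y, (K - W) % q)      # sk_y^A, sk_y^B

def encrypt(x):
    """c1 = A^T r + e1, c2 = B^T r + x + e2, with short randomness r."""
    r = small(n)
    return (A.T @ r + small(m)) % q, (B.T @ r + x + small(D)) % q

def share_A(sk, C1, C2):                 # H_A includes the public term <y, C2> once
    y, W = sk
    return int(y @ C2 - W @ C1) % q

def share_B(sk, C1):                     # H_B contributes only -<W_B, C1>
    _, W = sk
    return int(-(W @ C1)) % q

def center(v):                           # canonical representative in [-q/2, q/2)
    return ((v + q // 2) % q) - q // 2

# Two clients; the server aggregates ciphertexts with fixed-point weights
a1, a2 = rng.integers(-5, 6, D), rng.integers(-5, 6, D)
ct1, ct2 = encrypt(DELTA * a1 % q), encrypt(DELTA * a2 % q)
w1, w2 = 2, 3
C1 = (w1 * ct1[0] + w2 * ct2[0]) % q
C2 = (w1 * ct1[1] + w2 * ct2[1]) % q

recovered = []
for j in range(D):                       # decrypt coordinate j via y = e_j
    e_j = np.zeros(D, dtype=np.int64)
    e_j[j] = 1
    skA, skB = keygen_split(e_j)
    sigma = (share_A(skA, C1, C2) + share_B(skB, C1)) % q
    recovered.append(round(center(sigma) / DELTA))

expected = (w1 * a1 + w2 * a2).tolist()  # recovered == expected exactly
```

With the toy bounds above the worst-case decryption noise is at most 965, below DELTA / 2 = 2048, so the weighted integer aggregate is recovered exactly.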

4.3. Verifiable Aggregation with Commuting Checks

In addition to confidentiality and function privacy, FlowAgg-FE must enforce that the decryptions released to Srv coincide with valid linear aggregates of correctly clipped and encoded client updates, even in the presence of Byzantine clients and a potentially adversarial single helper. The verifiability layer is engineered so that all checks commute with the KS-IPFE aggregation pipeline: every operation can be written as an Z -linear map applied either before or after ciphertext aggregation, and the corresponding verification conditions are preserved up to negligible statistical or computational error. This section refines the informal description from Section 3 into a more algebraic treatment.

4.3.1. Linearly Homomorphic Tags as a MAC over the FE Plaintext Space

Let p be a large prime such that p > q and p is co-prime with q, and let Z p be the finite field of order p. For each round t, an authentication key space is defined as K LHT = Z p D , and a tag space as T LHT = Z p . A per-round authentication key is sampled as
κ t R Z p D ,
and is made available to all clients and to the KS-IPFE key generator (for the purpose of deriving sk κ t A , sk κ t B ), but not to Srv . We treat κ t as an ephemeral, per-round secret used only to authenticate that the decrypted aggregate matches the client-supplied tags in that round. κ t is generated by the same (possibly distributed) authority that issues KS-IPFE keys, is delivered to clients over an authenticated channel, and is erased after round t closes; helpers receive only FE key shares for the corresponding function vector κ ¯ t . The soundness of the LHT check requires that Srv does not learn κ t ; if a compromised client discloses κ t to Srv , then the server could forge tags for that round. We now make this collusion caveat explicit and note that confining tag computation to client TEEs (or distributing κ t only to trusted clients) mitigates it in deployments that require strong server-verifiability even under client–server collusion.
On this basis, we define the block-level message space for tags as
M_tag = { z ∈ Z^D : ∥ z ∥_∞ ≤ B } ,
for some bound B > 0 on the infinity norm of the rounded, scaled blocks Δ a_{i,t}^{(b)} , which follows from Equation (3) and the clipping constraint ∥ v_{i,t} ∥_2 ≤ S . The LHT then induces an almost-linear MAC
Tag κ t : M tag T LHT , Tag κ t ( z ) = z , κ t mod p .
For each block b and client C i , we set
τ i , t ( b ) = Tag κ t Δ a i , t ( b ) = Δ a i , t ( b ) , κ t mod p .
For any finite subset of indices J A t and scalar weights { α ˜ j , t } j J Z q , we have the exact algebraic identity
T_{t,J}^{(b)} := Σ_{j∈J} α̃_{j,t} τ_{j,t}^{(b)} mod p = Σ_{j∈J} α̃_{j,t} ⟨ Δ a_{j,t}^{(b)} , κ_t ⟩ mod p = ⟨ Σ_{j∈J} α̃_{j,t} Δ a_{j,t}^{(b)} , κ_t ⟩ mod p ,
which is the blockwise instantiation of Equation (10). In particular, define the aggregated encoded block
z_{t,J}^{(b)} := Σ_{j∈J} α̃_{j,t} Δ a_{j,t}^{(b)} ∈ Z^D .
Then T t , J ( b ) = Tag κ t ( z t , J ( b ) ) exactly, as long as all operations are interpreted modulo p for the tag and modulo q for the encoding. When J = A t , we recover the global aggregated tag T t ( b ) used by the protocol.
Viewed as a MAC, the unforgeability of this LHT against an adversary not knowing κ t reduces to the hardness of guessing a non-trivial linear relation over Z p on authenticated messages. More precisely, if an adversary outputs ( z , τ ) such that z is not in the Z -span of previously authenticated messages and τ = Tag κ t ( z ) , then under the random choice of κ t , the probability of success is at most 1 / p .
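The tag algebra is a one-line inner product, and its linearity (the blockwise form of Equation (10)) can be checked directly; the sketch below uses a Mersenne prime as a stand-in for the round modulus p and toy payloads of our own choosing:

```python
import random

random.seed(1)
p, D = (1 << 61) - 1, 8                            # stand-in tag modulus and block length
kappa = [random.randrange(p) for _ in range(D)]    # per-round key kappa_t

def tag(z):                                        # Tag_kappa(z) = <z, kappa> mod p
    return sum(zi * ki for zi, ki in zip(z, kappa)) % p

z1 = [5, -3, 2, 0, 7, 1, -4, 6]                    # two encoded client blocks
z2 = [1, 2, -2, 3, 0, -5, 4, 2]
a1, a2 = 2, 3                                      # aggregation weights
lhs = (a1 * tag(z1) + a2 * tag(z2)) % p            # server-side tag aggregation
agg = [a1 * u + a2 * v for u, v in zip(z1, z2)]
assert lhs == tag(agg)                             # tags aggregate linearly
```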
Lemma 1 
(LHT statistical hiding). Fix any z ∈ M_tag and sample κ ←_R Z_p^D . Let τ = ⟨ z , κ ⟩ mod p . If z ≠ 0 then τ is uniform over Z_p ; if z = 0 then τ = 0 . In particular, when κ_t is hidden from Srv , the tag does not leak the gradient norm or other distributional statistics beyond the degenerate event z = 0 .

4.3.2. FE-Based Cross-Checking of Aggregates

The LHT alone only ensures that tags are consistent with the sum of messages within the tag domain; it does not bind tags to the KS-IPFE ciphertexts. To couple tags to ciphertexts, we reuse the FE plaintext space and instantiate an additional FE key for the same vector κ t .
Let κ t Z p D be as above and let κ ¯ t Z q D denote its canonical embedding into Z q D (e.g., by interpreting κ t as integers in [ 0 , p ) and viewing them modulo q). The KS-IPFE key generator computes
( sk κ ¯ t A , sk κ ¯ t B ) KeyGen ( msk , κ ¯ t ) ,
with internal gadget-domain embedding κ̃_t = ( G_D^{−} )^⊤ κ̄_t as in Equation (12). These key shares are distributed to H_A and H_B .
Given the aggregated ciphertext ( C_1^{(b)} , C_2^{(b)} ) for block b in round t as in Equation (14) (so J = A_t ), the helpers compute decryption shares for the function vector κ̄_t as
σ_{κ_t}^A := ⟨ κ̃_t , C_2^{(b)} ⟩ − ⟨ W_{κ_t}^A , C_1^{(b)} ⟩ mod q , σ_{κ_t}^B := − ⟨ W_{κ_t}^B , C_1^{(b)} ⟩ mod q ,
and transmit them to Srv . By the correctness analysis in Equation (16), their sum
Σ_{κ_t}^{(b)} := σ_{κ_t}^A + σ_{κ_t}^B ≡ ⟨ Σ_{i∈A_t} α̃_{i,t} x̂_{i,t}^{(b)} , κ̃_t ⟩ + ⟨ E^⊤ Σ_{i∈A_t} α̃_{i,t} r_i^{(b)} , κ̃_t ⟩ + noise ( mod q ) ,
and after gadget inversion and rounding we have, with probability at least 1 − 2^{−λ} ,
dec_{κ_t}^{(b)} := Round ( Σ_{κ_t}^{(b)} ) = ⟨ Σ_{i∈A_t} α̃_{i,t} x_{i,t}^{(b)} , κ̄_t ⟩ mod q .
Since x i , t ( b ) = Δ a i , t ( b ) mod q and the magnitude of Δ a i , t ( b ) is bounded by B, the reduction modulo q is injective on the relevant range, enabling a well-defined lifting to Z as follows:
z̃_t^{(b)} := ⟨ Σ_{i∈A_t} α̃_{i,t} Δ a_{i,t}^{(b)} , κ̄_t ⟩ ∈ Z ,
with dec κ t ( b ) representing z ˜ t ( b ) mod q . We then project dec κ t ( b ) into Z p via the canonical map ϕ : Z q Z p (e.g., reduction modulo p) to obtain
T ˜ t ( b ) : = ϕ dec κ t ( b ) Z p .
Under the consistency of parameters (specifically, p and q sufficiently large relative to B and the number of clients), we have
T ˜ t ( b ) = T t ( b )
with all but negligible probability. Hence the equality
T ˜ t ( b ) = ? T t ( b )
serves as a soundness check tying the FE-decrypted aggregate to the aggregated tags. Any deviation that changes ciphertext contents (e.g., adversarial modification by a client or by a compromised helper) while leaving client-generated tags untouched will, with high probability, violate this equality.
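The cross-check can be illustrated with plaintext stand-ins for the FE-decrypted aggregate (toy values of our own choosing; in the protocol the right-hand side comes from the κ_t-decryption followed by the projection ϕ):

```python
import random

random.seed(2)
p, D = (1 << 61) - 1, 6                            # stand-in tag modulus
kappa = [random.randrange(p) for _ in range(D)]    # hidden round key kappa_t
z1, z2 = [4, -1, 3, 2, 0, 5], [1, 1, -2, 0, 3, 2]  # two encoded client blocks
a1, a2 = 2, 3                                      # aggregation weights

T = (a1 * sum(x * k for x, k in zip(z1, kappa))
     + a2 * sum(x * k for x, k in zip(z2, kappa))) % p    # aggregated client tags
agg = [a1 * u + a2 * v for u, v in zip(z1, z2)]           # honest decrypted aggregate
T_tilde = sum(x * k for x, k in zip(agg, kappa)) % p
assert T_tilde == T                                       # honest run passes the check

agg[0] += 1                                               # tamper with one coordinate
T_bad = sum(x * k for x, k in zip(agg, kappa)) % p
assert T_bad != T                                         # mismatch detected w.h.p.
```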

4.3.3. Commit-and-Prove for Clipping and Encoding

The consistency of tags and FE decryptions guarantees that the aggregator recovers a sum of certain integer vectors, but it does not by itself ensure that those integers arise from correctly clipped and scaled real-valued updates. To enforce semantic correctness of the encoding, each client engages in a commit-and-prove protocol.
Let G be a cyclic group of prime order p with generators g , h ∈ G such that the discrete logarithm log_g ( h ) is unknown. For a block index b , client C_i defines a bit-decomposition of x_{i,t}^{(b)} = ( x_{i,t,1}^{(b)} , … , x_{i,t,D}^{(b)} ) ∈ Z_q^D into l_enc-bit chunks per coordinate (with l_enc ≤ ⌈ log_2 q ⌉ ), and commits to the block using a Pedersen-style vector commitment as follows:
Com ( x_{i,t}^{(b)} ; r_{i,t}^{(b)} ) = g^{ Σ_{j=1}^{D} x_{i,t,j}^{(b)} γ_j } h^{ r_{i,t}^{(b)} } ∈ G ,
where ( γ_1 , … , γ_D ) ∈ Z_p^D are fixed, publicly known exponent weights for the message space. The commitment is additively homomorphic as follows:
∏_{i∈J} Com ( x_{i,t}^{(b)} ; r_{i,t}^{(b)} )^{α̃_{i,t}} = Com ( Σ_{i∈J} α̃_{i,t} x_{i,t}^{(b)} ; Σ_{i∈J} α̃_{i,t} r_{i,t}^{(b)} ) ,
for any index set J. This algebra exactly mirrors the linear aggregation in Equation (14) and the tag aggregation above.
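The homomorphism in the displayed identity can be checked directly. The sketch below uses a deliberately tiny Schnorr-style group (the order-1019 subgroup of Z*_2039) and exponent weights of our own choosing, so it illustrates the algebra only; real deployments need a cryptographically sized group with independently generated g and h:

```python
# toy Pedersen parameters: order-1019 subgroup of Z*_2039 (illustrative only)
P, p = 2039, 1019          # group modulus and prime subgroup order
g, h = 4, 9                # toy generators (log_g h must be unknown in practice)
gammas = [7, 11, 13, 17]   # public message-space weights gamma_j, D = 4

def commit(x, r):          # Com(x; r) = g^{sum_j x_j * gamma_j} * h^r mod P
    e = sum(xj * gj for xj, gj in zip(x, gammas)) % p
    return (pow(g, e, P) * pow(h, r % p, P)) % P

x1, r1 = [3, 1, 4, 1], 5
x2, r2 = [2, 7, 1, 8], 9
a1, a2 = 2, 3              # aggregation weights
# prod_i Com(x_i; r_i)^{alpha_i} = Com(sum_i alpha_i x_i; sum_i alpha_i r_i)
lhs = (pow(commit(x1, r1), a1, P) * pow(commit(x2, r2), a2, P)) % P
xs = [a1 * u + a2 * v for u, v in zip(x1, x2)]
assert lhs == commit(xs, a1 * r1 + a2 * r2)
```

This mirrors the server-side batching: commitments are exponentiated by the same fixed-point weights used for ciphertext and tag aggregation.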
Client C i supplies, for each round, a non-interactive zero-knowledge proof π i , t ( b ) of the following composite statement:
  • There exist v_{i,t} ∈ R^d and randomness r_{i,t}^{(1)} , … , r_{i,t}^{(B_t)} such that (a) ∥ v_{i,t} ∥_2 ≤ S , (b) the blocks a_{i,t}^{(b)} are consecutive slices of either v_{i,t} or Q_b ( Φ_t v_{i,t} ) , and (c) for each b we have x_{i,t}^{(b)} = Δ a_{i,t}^{(b)} mod q and the corresponding commitment equals Com ( x_{i,t}^{(b)} ; r_{i,t}^{(b)} ) .
Such proofs can be realized using standard inner-product arguments and range proofs (e.g., Bulletproofs-style constructions) that express the clipping condition as a quadratic constraint on the coordinates and the encoding relation as a bounded difference between Δ a_{i,t}^{(b)} and x_{i,t}^{(b)} . Verification of π_{i,t}^{(b)} is performed by Srv before aggregation, but thanks to the homomorphism, commitments may also be aggregated and verified in batch using auxiliary information provided by the helpers after decryption.

4.3.4. Commutativity and Asymptotic Overheads

Summarizing the algebraic structure, for each block b and round t we have three parallel linear maps as follows:
L_FE : { x_{i,t}^{(b)} }_{i∈A_t} ↦ Σ_{i∈A_t} α̃_{i,t} x_{i,t}^{(b)} ,
L_tag : { Δ a_{i,t}^{(b)} }_{i∈A_t} ↦ Σ_{i∈A_t} α̃_{i,t} τ_{i,t}^{(b)} ,
L_com : { Com ( x_{i,t}^{(b)} ; r_{i,t}^{(b)} ) }_{i∈A_t} ↦ ∏_{i∈A_t} Com ( x_{i,t}^{(b)} ; r_{i,t}^{(b)} )^{α̃_{i,t}} .
Each of these maps is Z -linear in the sense that the image of the aggregated objects is equal to the aggregation of the images. Consequently, the following diagram commutes (up to negligible decryption error and modulo reductions):
{ a_{i,t}^{(b)} }_i  --(encode + encrypt)-->  { ( c_{1,i}^{(b)} , c_{2,i}^{(b)} ) }_i
        |  L_tag                                      |  L_FE , then FE-decrypt under κ_t
        v                                             v
     T_t^{(b)}   ==(checked via the projection ϕ)==   Σ_i α̃_{i,t} x_{i,t}^{(b)}
A similar commuting diagram holds for commitments. Our commuting checks enforce that each accepted ciphertext/tag/commitment tuple is well-formed and that the server’s decrypted output equals the linear aggregate of the submitted (bounded, encoded) client updates. This provides an auditable enforcement point for bounded-energy (e.g., clipped) updates, but it does not, by itself, prevent within-bound model-poisoning attacks. In practice, monitoring can be done via (i) per-round rejection/abort rates from failed checks, (ii) aggregate statistics that are safe to reveal (e.g., the number of clipped updates, which can be reported by clients as a single bit), and (iii) standard training diagnostics (loss/accuracy curves) to flag anomalous rounds; robust aggregation or anomaly detection can be layered on top without changing the cryptographic core. This commutativity is the main reason the verification overhead scales with the number of blocks and not with the number of clients: Srv can aggregate ciphertexts, tags, and commitments using the same coefficients and rely on a constant number of FE decryptions and aggregate-proof verifications per block.
From an asymptotic viewpoint, if B t denotes the number of blocks per client in round t and F the number of distinct function vectors used for verification (coordinate basis plus κ t and possibly a small number of additional selectors), then:
  • The number of FE decryptions per round is O ( F B t ) , independent of | A t | .
  • The number of scalar LHT evaluations per client is O ( B t D ) , and the communication cost of tags is O ( B t log p ) bits.
  • The size of commitments and their proofs grows as O ( B t log q ) group elements per client, which is dominated by the KS-IPFE ciphertexts for typical parameter regimes.
Section 5 confirms empirically that, under realistic choices of D, B t , p, and q, the verifiability layer adds only a low single-digit percentage to overall runtime and communication while significantly strengthening the integrity guarantees of FlowAgg-FE.

4.4. PaS-Stream: Rate-Adaptive Streaming Without Accuracy Bias

PaS-Stream instantiates a rate-adaptive transmission layer that composes Johnson–Lindenstrauss sketching, stochastic quantization, and KS-IPFE encryption into a single linear operator acting on client updates. Its design is such that (i) the estimator of the target aggregate G t remains unbiased in the sense of Equation (8), (ii) the variance introduced by quantization is explicitly controlled, and (iii) partial receipt of blocks and client dropouts manifest as structured linear perturbations rather than protocol failures.
We consider a per-round sketching dimension k (either fixed or adapted over time) and a public sketching matrix Φ_t ∈ R^{k × d} . For concreteness, one may view Φ_t as a subsampled randomized orthonormal transform with entries in { ± 1 / √k } , as in Equation (7), although the analysis below only requires a Johnson–Lindenstrauss-type concentration property. Let Q_b : R → A_b be a stochastic quantizer with 2^b output levels satisfying
E [ Q_b ( z ) ∣ z ] = z , V [ Q_b ( z ) ∣ z ] ≤ σ_b^2
for all z ∈ R , where σ_b^2 = O ( 2^{−2b} ) is a variance parameter. We lift Q_b to act componentwise on vectors.
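A minimal numpy sketch of one unbiased stochastic quantizer with 2^b uniform levels on a fixed range (the interface and the range are our own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def q_b(z, b, lo=-1.0, hi=1.0):
    """Stochastic b-bit quantizer: unbiased for z in [lo, hi], 2^b uniform levels."""
    levels = 2 ** b - 1
    t = (np.asarray(z, dtype=float) - lo) / (hi - lo) * levels   # position on the grid
    low = np.floor(t)
    up = (rng.random(np.shape(t)) < (t - low)).astype(float)     # round up w.p. frac(t)
    return lo + (low + up) * (hi - lo) / levels

z = np.array([0.33, -0.7, 0.05])
est = np.mean([q_b(z, 4) for _ in range(20000)], axis=0)         # empirical mean ~ z
```

Unbiasedness holds for inputs inside [lo, hi]; the empirical mean over many draws concentrates around z, with per-call variance shrinking as O(2^{-2b}).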
Client pipeline (round t). For each client C i A t , define the linear compression operator
C t : R d R k , C t ( v ) : = Φ t v .
Client C i first computes its clipped gradient v i , t according to Equation (1) and then its sketch
s i , t : = C t ( v i , t ) = Φ t v i , t R k .
Next, it applies Q b to obtain a random quantized sketch
u_{i,t} := Q_b ( s_{i,t} ) ∈ A_b^k
with the property that E [ u_{i,t} ∣ s_{i,t} ] = s_{i,t} and V [ u_{i,t} ∣ s_{i,t} ] ⪯ σ_b^2 I_k . Combining this with Equations (2) and (8), we have
E [ Σ_{i∈A_t} α_{i,t} u_{i,t} ∣ { v_{i,t} }_i ] = Φ_t Σ_{i∈A_t} α_{i,t} v_{i,t} = Φ_t G_t ,
so the compressed aggregate is an unbiased estimator of the sketched target Φ_t G_t . We now partition the k-dimensional vector u_{i,t} into B_t := ⌈ k / D ⌉ contiguous blocks via a family of deterministic selection matrices { P^{(b)} ∈ { 0 , 1 }^{D × k} }_{b=1}^{B_t} , each extracting D coordinates
u i , t ( b ) : = P ( b ) u i , t R D , s i , t ( b ) : = P ( b ) s i , t , b [ B t ] .
By construction, Σ_b ( P^{(b)} )^⊤ P^{(b)} is the k × k identity, and thus u_{i,t} = Σ_b ( P^{(b)} )^⊤ u_{i,t}^{(b)} . We write a_{i,t}^{(b)} := u_{i,t}^{(b)} to emphasize that this is the real-valued payload block for KS-IPFE and the tag layer.
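The unbiasedness of the quantized sketch aggregate can be checked empirically with a toy sign sketch and a fixed-step stochastic rounder (all dimensions, weights, and the rounder are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 32, 8
Phi = rng.choice([-1.0, 1.0], (k, d)) / np.sqrt(k)   # JL-style sign sketch
V = rng.standard_normal((3, d)) * 0.1                # three clipped client updates v_i
alpha = np.array([0.5, 0.3, 0.2])                    # aggregation weights

def stoch_round(z, step=0.05):                       # unbiased rounding to a step grid
    low = np.floor(z / step)
    return (low + (rng.random(z.shape) < z / step - low)) * step

G = alpha @ V                                        # target aggregate G_t
est = np.mean([sum(a * stoch_round(Phi @ v) for a, v in zip(alpha, V))
               for _ in range(40000)], axis=0)       # empirical mean ~ Phi @ G
```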
Each block a i , t ( b ) is then encoded into an integer vector and packed using the KS-IPFE plaintext map as follows:
x i , t ( b ) : = Δ a i , t ( b ) mod q Z q D , x ^ i , t ( b ) : = G D x i , t ( b ) Z q D l g ,
and encrypted into a block ciphertext ( c 1 , i ( b ) , c 2 , i ( b ) ) according to Equation (13). In parallel, C i computes the linearly homomorphic tag
τ i , t ( b ) = Δ a i , t ( b ) , κ t mod p
and then produces the clipping/encoding proof for this block. Each tuple
( n_{i,t}^{(b)} , ct_{i,t}^{(b)} , τ_{i,t}^{(b)} , π_{i,t}^{(b)} )
consists of a monotone nonce n_{i,t}^{(b)} ∈ Z_{≥0} (e.g., ( t , b ) encoded as a single integer), the ciphertext ct_{i,t}^{(b)} , tag, and proof. These are streamed to Srv in non-decreasing order of n_{i,t}^{(b)} , and the nonce is bound into ct_{i,t}^{(b)} (e.g., by hashing it into the LWE randomness) to prevent replay or reordering attacks. To validate n_{i,t}^{(b)} at scale, Srv stores only the largest accepted nonce (equivalently, the next expected block index) for each active client i ∈ A_t . This is O ( | A_t | ) counters and can be garbage-collected at round end since nonces are round-scoped; e.g., 10^4 active clients require 8 × 10^4 bytes for 64-bit counters. If n_{i,t}^{(b)} is deterministically encoded as ( t , b ) , the check reduces to duplicate suppression and does not require long-term per-client history.
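The per-client nonce bookkeeping fits in a few lines; the following sketch (class and function names are ours) keeps one largest-accepted-nonce counter per active client and rejects duplicates or stale blocks:

```python
def make_nonce(t, b):
    """Round-scoped monotone nonce: pack (t, b) into one integer."""
    return (t << 32) | b

class NonceTracker:
    """Server-side check: one 'largest accepted nonce' counter per active client."""
    def __init__(self):
        self.last = {}                       # client id -> largest accepted nonce

    def accept(self, client, nonce):
        if nonce <= self.last.get(client, -1):
            return False                     # duplicate or out-of-order: reject
        self.last[client] = nonce
        return True

trk = NonceTracker()
ok = [trk.accept("c1", make_nonce(7, b)) for b in (0, 1, 2)]   # in-order blocks accepted
replay = trk.accept("c1", make_nonce(7, 1))                    # stale block rejected
```

At round end the `last` map can simply be dropped, matching the round-scoped garbage collection described above.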
Server/helpers pipeline and rate adaptation. Upon receiving any subset of block messages from clients, the aggregator maintains, for each block index b, an active index set
I t ( b ) A t
consisting of clients whose block-b ciphertexts have arrived and passed basic syntactic checks (including proof verification). For that block, it computes the aggregated ciphertext and tag
C_1^{(b)} := Σ_{i∈I_t^{(b)}} α̃_{i,t} c_{1,i}^{(b)} , C_2^{(b)} := Σ_{i∈I_t^{(b)}} α̃_{i,t} c_{2,i}^{(b)} ,
T_t^{(b)} := Σ_{i∈I_t^{(b)}} α̃_{i,t} τ_{i,t}^{(b)} mod p .
Note that I_t^{(b)} can vary across blocks: a client may send early blocks promptly but drop out before later blocks. Let θ ∈ ( 0 , 1 ] be a target coverage parameter; Srv may decide to close block b once | I_t^{(b)} | ≥ θ | A_t | , discarding any later-arriving contributions to that block. This is the locus of rate adaptation: smaller θ accelerates progress at the cost of using fewer client contributions per block. Closing blocks early can correlate contribution with device speed: if slower clients systematically belong to underrepresented groups (a common concern in non-IID settings such as FEMNIST), their effective weight in the aggregate may be reduced. Our sketching and quantization remain unbiased conditional on the received set I_t^{(b)} , but unbiasedness alone does not guarantee population- or device-level fairness [34,35]. As simple mitigations, one may (i) track each client's effective participation over time and compensate in α_{i,t} , and/or (ii) periodically run full-coverage rounds ( θ = 1 ) to reduce drift.
For each closed block b, the pair ( C 1 ( b ) , C 2 ( b ) ) is forwarded to both helpers, who evaluate it under KS-IPFE function keys for the standard basis vectors e 1 , , e D and the tag selector κ t . Using Equation (15), helpers produce shares
σ_j^{ℓ,(b)} := DecShare ( sk_{e_j}^{ℓ} , C_1^{(b)} , C_2^{(b)} ) , σ_{κ_t}^{ℓ,(b)} := DecShare ( sk_{κ̄_t}^{ℓ} , C_1^{(b)} , C_2^{(b)} ) ,
for ℓ ∈ { A , B } and j ∈ [ D ] . Aggregating shares and rounding as in Equation (16) yields, with overwhelming probability,
z_{j,t}^{(b)} := Round ( σ_j^{A,(b)} + σ_j^{B,(b)} ) = ⟨ Σ_{i∈I_t^{(b)}} α̃_{i,t} x_{i,t}^{(b)} , e_j ⟩ ,
so that
z_t^{(b)} := ( z_{1,t}^{(b)} , … , z_{D,t}^{(b)} ) = Σ_{i∈I_t^{(b)}} α̃_{i,t} x_{i,t}^{(b)} ∈ Z_q^D .
Dividing by Δ and undoing the fixed-point encoding recovers
û_t^{(b)} := Δ^{−1} z_t^{(b)} ≈ Σ_{i∈I_t^{(b)}} α_{i,t} u_{i,t}^{(b)} ,
where the approximation error is due only to rounding in Equation (3). In parallel, the tag decryption under κ t produces T ˜ t ( b ) , which is checked against T t ( b ) as in the previous subsection, ensuring consistency between ciphertext aggregates and tags.
Stacking all blocks, we define the recovered sketched aggregate as
Û_t := Σ_{b=1}^{B_t} ( P^{(b)} )^⊤ û_t^{(b)} ∈ R^k .
By linearity of P ( b ) and the unbiasedness of u i , t , conditioning on the sets { I t ( b ) } b we obtain
E [ Û_t ∣ { v_{i,t} }_i , { I_t^{(b)} }_b ] = Σ_b ( P^{(b)} )^⊤ Σ_{i∈I_t^{(b)}} α_{i,t} E [ u_{i,t}^{(b)} ∣ v_{i,t} ] = Σ_b ( P^{(b)} )^⊤ Σ_{i∈I_t^{(b)}} α_{i,t} s_{i,t}^{(b)} = Σ_i α_{i,t} Σ_{b : i∈I_t^{(b)}} ( P^{(b)} )^⊤ s_{i,t}^{(b)} .
When all blocks are received, I_t^{(b)} = A_t for all b, and the last expression collapses to Φ_t G_t by Σ_b ( P^{(b)} )^⊤ P^{(b)} = I_k . Under rate adaptation (i.e., some I_t^{(b)} ⊊ A_t ), the expectation equals Φ_t applied to a truncated aggregate in which each coordinate receives contributions only from those clients whose corresponding block arrived before closure. This aligns with the FL semantics where the effective aggregation set is the subset of clients that succeed in uploading their updates before the round deadline. Finally, the model update is computed from Û_t via a pseudo-inverse of the sketch,
Ĝ_t := Φ_t^† Û_t ,
which is the estimator used in Section 5. In the idealized full-participation regime, E [ Ĝ_t ∣ { v_{i,t} } ] = Φ_t^† Φ_t G_t , which coincides with G_t in expectation over the random draw of Φ_t (and exactly when k = d with orthonormal Φ_t ); in the non-ideal regime, Φ_t^† acts as a linear reconstruction operator for the partially observed sketch. Because decryptions and verifications occur blockwise, the server can begin updating coordinates associated with blocks that have already been closed and validated, while late blocks continue to stream, thereby tolerating both dropouts and heavy-tailed straggler behavior without violating the FE-based confidentiality guarantees.
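The reconstruction step can be sanity-checked with a toy subsampled randomized orthonormal transform, for which Φ_t Φ_t^⊤ = ( d / k ) I_k and the pseudo-inverse reduces to a scaled transpose (construction details here are our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 64, 16
signs = rng.choice([-1.0, 1.0], d)                   # random sign flips
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))     # orthonormal mixing matrix
rows = rng.choice(d, k, replace=False)               # row subsampling
Phi = np.sqrt(d / k) * (Q * signs)[rows]             # k x d sketching matrix
# rows are orthogonal: Phi Phi^T = (d/k) I_k, so Phi^dagger = (k/d) Phi^T
Phi_pinv = (k / d) * Phi.T

G = rng.standard_normal(d)                           # stand-in aggregate G_t
U = Phi @ G                                          # sketched aggregate
G_hat = Phi_pinv @ U                                 # reconstruction Phi^dagger U
```

The recovered G_hat is the orthogonal projection of G onto the row space of Phi, matching the projection semantics discussed above.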

5. Experiments

We empirically evaluate FlowAgg-FE—our KS-IPFE with PaS-Stream—on cross-device federated learning. Experiments target three questions: (i) Does FE-based transmission preserve model quality relative to plaintext and encrypted secure aggregation? (ii) What communication and compute savings are achieved? (iii) How robust is the system to client dropout and stragglers? All notation follows Section 3 and Section 4: clients compute clipped updates v i , t (Equation (1)), the server seeks only the linear aggregate G t (Equation (2)), and PaS-Stream uses Φ t -sketching and unbiased quantization Q b (Equations (7) and (8)) before KS-IPFE encryption.

5.1. Setup

We empirically evaluate FlowAgg-FE in a simulated cross-device federated learning environment that captures three interacting dimensions: (i) the statistical properties of the data distribution across clients, (ii) the training hyperparameters and model architectures, and (iii) the cryptographic and systems configuration of KS-IPFE and PaS-Stream. All experiments follow the notation of Section 3 and Section 4: each participating client C i A t produces a clipped update v i , t (Equation (1)), the aggregator is interested only in the linear functional G t (Equation (2)), and PaS-Stream applies Φ t -sketching and unbiased quantization Q b (Equations (7) and (8)) before KS-IPFE encryption. This subsection details the learning tasks, partitioning strategies, cryptographic parameters, and system environment used throughout the evaluation.
Tasks, models, and data partitioning. We consider two canonical FL tasks representative of vision and character recognition workloads:
  • CIFAR-10 [36]. A standard 10-class image classification task on 32 × 32 color images with 50,000 training and 10,000 test examples. We employ a ResNet-18 backbone with d 11.2 M trainable parameters. The data is partitioned across n = 1000 virtual clients according to a Dirichlet distribution with concentration α CIFAR = 0.5 over class labels, producing a moderately non-IID distribution in which clients see a biased subset of classes. In each round we sample | A t |   = 100 clients uniformly without replacement and perform E = 1 local epoch per client, for T = 100 global rounds.
  • FEMNIST [37]. A character recognition task derived from the Extended MNIST dataset, partitioned by the authors. We use a small convolutional neural network (two convolutional layers followed by two fully connected layers) with d 1.5  M parameters. The dataset is partitioned across n = 3400 clients (writers), each holding between 20 and 200 images; we model this using a Dirichlet distribution with α FEMNIST = 0.3 to accentuate heterogeneity. Each round samples | A t |   = 256 clients, with E = 1 local epoch and T = 120 global rounds.
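The label-Dirichlet partitioning described above can be sketched as follows. This is an illustrative assumption on our part: the paper does not fix the exact assignment procedure, and the function name `partition_by_label` and the contiguous-slice allocation are ours; Dirichlet proportions are drawn via `random.gammavariate` from the stated concentration α.

```python
import random
from collections import defaultdict

def dirichlet(alpha, size, rng):
    """Sample a probability vector from Dirichlet(alpha, ..., alpha) via Gamma draws."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    s = sum(g)
    return [x / s for x in g]

def partition_by_label(labels, n_clients, alpha, seed=0):
    """Assign each class's examples to clients with Dirichlet-distributed
    proportions, yielding a non-IID split whose skew grows as alpha shrinks."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    clients = [[] for _ in range(n_clients)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        props = dirichlet(alpha, n_clients, rng)
        # convert proportions into contiguous slice boundaries over this class
        start, total, cum = 0, len(idxs), 0.0
        for c in range(n_clients):
            cum += props[c]
            end = total if c == n_clients - 1 else int(round(cum * total))
            clients[c].extend(idxs[start:end])
            start = end
    return clients

# tiny demo: 10 classes, 1000 examples, 20 clients, alpha = 0.5 as for CIFAR-10
labels = [i % 10 for i in range(1000)]
parts = partition_by_label(labels, 20, alpha=0.5)
```

Smaller α concentrates each class on fewer clients, reproducing the heterogeneity controlled by α_CIFAR = 0.5 and α_FEMNIST = 0.3.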
Unless otherwise specified, all baselines (Plaintext-FedAvg, Encrypted SecAgg, KS-IPFE, and PaS-Stream) use identical optimization hyperparameters (learning rate schedule, momentum, weight decay, and clipping threshold S) and client participation patterns. Initial model weights are shared across methods and each configuration is repeated with three random seeds (affecting client sampling, data shuffling, and cryptographic noise) to report mean performance. We additionally include two recent secure aggregation baselines, SecAgg+ and BatchCrypt, to contextualize FlowAgg-FE against modern encrypted aggregation protocols under the same participation/latency model. Although we report end-to-end training on CIFAR-10 and FEMNIST for reproducibility, FlowAgg-FE’s cryptographic payload and helper work scale with the sketch dimension k (and block size D), not directly with the raw model dimension d. In particular, per-client encrypted uplink is Θ ( k ) (split into B t k / D fixed-size blocks), so the same configuration can support models with millions of parameters provided an appropriate k is chosen.
Table 1 summarizes the federated task configuration, highlighting the interaction between data partitioning and participation. The secure aggregation baselines (Table 2) run on the same tasks and inherit the configuration parameters ( n , | A t | , T , E , α ) from Table 1.
Figure 1 visualizes the resulting distribution of per-client dataset sizes under this sampling procedure. The heavy right tails for both tasks reflect the presence of a small number of “heavy” clients, which interact non-trivially with PaS-Stream’s rate-adaptive behavior and dropout robustness.
Cryptographic and streaming parameters. All experiments use a single family of LWE and gadget parameters for KS-IPFE, and a fixed sketching dimension for PaS-Stream unless explicitly varied in ablations. The default cryptographic parameters are as follows:
  • Modulus q = 2 32 , LWE dimension n l = 1024 , additive noise distributions χ , χ ′ with small standard deviations selected to satisfy Equation (6) for up to | A t | active clients,
  • Gadget base 2 with l g = 16 digits per coordinate and block size D = 64 so that each plaintext block encodes 64 scaled coordinates,
  • Scaling factor Δ = 2 16 satisfying Δ S ≤ q / 8 to avoid wrap-around on clipped updates.
For PaS-Stream, we set the sketch dimension k = 8192 and the quantization bit-width b ∈ { 8 , 4 } ; the resulting compression operator Φ t ∈ { ± 1 / √ k } k × d is resampled every 10 rounds to mitigate potential adversarial alignment between the sketch and the data distribution. We use a 10-round rotation as a conservative balance between privacy (limiting long-term alignment/linkability of a fixed sketch) and optimizer stability (keeping the compression operator fixed long enough for momentum/error-feedback to adapt). We add a rotation-period sensitivity study in Section 5.3. The target coverage parameter for block closure in rate adaptation is θ = 0.7 unless otherwise specified, meaning that a block is sealed once at least 70 % of scheduled clients have successfully uploaded that block.
Table 3 summarizes the main cryptographic and streaming parameters, and Figure 2 depicts the simulated latency distribution used to emulate straggler behavior.
System environment. Experiments are executed on a cluster with two non-colluding helper processes and one aggregator process. Each helper runs on a 32-core 3.0 GHz CPU with 128 GB of RAM; the aggregator runs on a similar machine. The FL orchestration uses asynchronous RPC between clients and server, with simulated client processes replaying latency samples drawn from a Pareto distribution with shape parameter 1.2 and scale 1.0  s. The number of simulated clients ( n { 1000 , 3400 } ) exceeds the per-round participation | A t | , so that each round includes a fresh random subset of clients.
Figure 2 shows the empirical cumulative distribution function (CDF) of simulated client latencies. The heavy tail implies that a non-trivial fraction of clients are extreme stragglers, providing a realistic stress test for PaS-Stream’s ability to make progress with partial block coverage.
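A few lines illustrate why θ-coverage block closure helps under this latency model; the simulation below is an illustrative assumption of ours (not the paper's harness), using the stated Pareto(shape 1.2, scale 1 s) latencies and θ = 0.7.

```python
import random

# Illustrative straggler simulation: client upload latencies follow
# Pareto(shape=1.2, scale=1 s) as in the experimental setup; a block is
# sealed once theta = 0.7 of its scheduled clients have uploaded.
random.seed(42)
n_clients, theta = 100, 0.7

latencies = sorted(random.paretovariate(1.2) for _ in range(n_clients))
t_close_theta = latencies[int(theta * n_clients) - 1]  # time at 70% coverage
t_close_full = latencies[-1]                           # time waiting for everyone
```

With shape 1.2 the tail is heavy (infinite variance), so waiting for the slowest client typically costs an order of magnitude more wall-clock time than sealing at 70% coverage.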

5.2. Main Results

We now compare FlowAgg-FE against the plaintext and encrypted secure aggregation baselines along three axes: final model quality, per-round communication cost, and per-round server-side compute. All experiments in this subsection use the setup of Section 5.1 with the default cryptographic and streaming parameters in Table 3. For readability, we focus on CIFAR-10 and FEMNIST. In addition to classic Encrypted SecAgg, we include SecAgg+ and BatchCrypt as modern encrypted baselines (Table 4 and Table 5); additional ablation results are deferred to the next subsection.
Accuracy. Table 4 reports final test accuracy after T = 100 rounds on CIFAR-10 and T = 120 rounds on FEMNIST, averaged over three random seeds. We observe that all secure methods match the plaintext baseline to within 0.3 % absolute on both tasks. In particular, KS-IPFE in full-precision mode is almost indistinguishable from Encrypted SecAgg, while PaS-Stream with b = 8 and b = 4 introduces only minor degradation consistent with the unbiasedness guarantee of Equation (8) and the controlled variance of Q b .
The small accuracy gaps between PaS-Stream and Plaintext-FedAvg can be attributed to two effects: (i) the additional variance introduced by Q b , which effectively adds a small amount of noise to each coordinate of the sketched gradient, and (ii) the use of a finite sketch dimension k = 8192 , which slightly distorts the geometry of the gradient space relative to the full d-dimensional model. In practice, these effects are dominated by the inherent noise of stochastic optimization, and the models converge to essentially the same generalization performance.
To corroborate the small gaps in Table 4, we include round-by-round convergence curves in Figure 3 and verify that all baselines share the same optimizer, data pipeline, and participation schedule.
Communication and computation. We next quantify the per-round per-client uplink and the server-side CPU time. Uplink reflects the serialized size (in megabytes) of all messages sent by a client to the server in a given round, including KS-IPFE ciphertexts, tags, and proofs; CPU time aggregates the wall-clock time spent by the aggregator and both helpers. Table 5 summarizes these metrics on CIFAR-10, and Figure 4 and Figure 5 visualize the same data. While Table 5 and Table 6 report uplink and server CPU, client-side encryption cost is also important in cross-device FL; Appendix A gives a back-of-the-envelope estimate indicating sub-second, sub-joule per-round encryption on mobile-class hardware.
In addition to communication and server CPU, we provide an explicit accounting of helper-side storage/latency and key-management overhead (computed from Table 3, reported in Table 7). With q = 2 32 (4 bytes/word), D = 64 , l g = 16 ( D l g = 1024 ), and m = 2048 , one split function key share has ( m + D l g ) = 3072 words (≈12 KB). Thus, storing the D basis shares for block decryption costs ≈ 0.77  MB per helper, and per-round refresh material (e.g., for κ t ) is only ≈12 KB per helper. Per block, each helper evaluates ( D + 1 ) decryption shares, i.e., about ( D + 1 ) ( m + D l g ) ≈ 2.0 × 10 5 modular multiply-adds, and the server-side nonce state is O ( | A t | ) counters.
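These helper-side figures can be re-derived mechanically from the Table 3 defaults; a short sketch (variable names are ours, and the ≈0.77 MB quoted above matches the computed value up to KB-vs-KiB unit rounding):

```python
# Recompute the helper-side accounting from the default parameters
# (q = 2^32 -> 4-byte words, D = 64, l_g = 16, m = 2048).
bytes_per_word = 4
D, l_g, m = 64, 16, 2048

share_words = m + D * l_g                  # words per split function-key share
share_bytes = share_words * bytes_per_word # ~12 KB per share
basis_storage_mb = D * share_bytes / 1e6   # D basis shares stored per helper
dec_ops_per_block = (D + 1) * share_words  # mul-adds for (D+1) decryption shares
```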
KS-IPFE already cuts uplink roughly in half compared to Encrypted SecAgg, even though it operates in full precision without sketching. This is largely due to (i) more efficient packing of plaintext coordinates into KS-IPFE blocks and (ii) avoiding some of the malleability-resistant padding required by the SecAgg pipeline. When PaS-Stream is enabled, the sketching dimension k and the quantization depth b further reduce the size of each client’s encoded update, yielding 2.5 × (8-bit) and 3.4 × (4-bit) reductions in uplink. On the compute side, FlowAgg-FE reduces helper and aggregator workload by lowering the number of effective plaintext coordinates that must be processed per round; PaS-Stream with b = 4 achieves a 1.77 × reduction in CPU time relative to Encrypted SecAgg while preserving accuracy within 0.3 % .
Figure 4 and Figure 5 display these results as grouped bar charts. Each bar corresponds to one of the four methods; the shading encodes the method as described in the caption, and numerical values are shown atop each bar for reference.
Overall, these results show that the combination of KS-IPFE and PaS-Stream achieves near-plaintext accuracy while significantly reducing both bandwidth and compute relative to a strong secure aggregation baseline. The gains arise from co-design across cryptography (functional encryption tailored to linear aggregation) and systems (sketching, quantization, and streaming), rather than from any single optimization in isolation.

5.3. Ablations and Stress Tests

We now probe the behavior of FlowAgg-FE under variations in quantization/sketch parameters and under adversarial systems conditions such as client dropout and heavy-tailed latencies. All experiments in this subsection are conducted on CIFAR-10 unless otherwise specified; FEMNIST exhibits qualitatively similar trends and is omitted for brevity. We focus on four questions: (i) how sensitive is model quality to the sketch dimension k and quantization depth b; (ii) how much communication is saved by more aggressive compression; (iii) how robust is PaS-Stream to random dropouts; and (iv) what overhead is induced by the verifiability layer under straggler-heavy latency distributions.
Quantization depth and sketch dimension. Recall that PaS-Stream applies a linear sketch Φ t ∈ R k × d to each clipped client gradient and then applies a stochastic quantizer Q b with bit-depth b. From Equation (8), the aggregated sketched update remains an unbiased estimator of Φ t G t , while the variance introduced by quantization scales as σ b 2 = O ( 2 − 2 b ) per coordinate. The sketch dimension k controls the Johnson–Lindenstrauss distortion and the amount of information preserved about the gradient direction.
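A minimal unbiased stochastic quantizer of the kind Q b describes might look as follows. The uniform grid over [ − scale , scale ] and the function name are our assumptions (the paper does not spell out the exact grid), but the randomized-rounding step is what yields E [ Q b ( x ) ] = x with per-coordinate variance on the order of 2 − 2 b .

```python
import random

def stochastic_quantize(x, b, scale, rng):
    """Unbiased b-bit stochastic quantizer: map x/scale onto a uniform grid
    with 2^b levels over [-1, 1] and round up or down with probability
    proportional to the remainder, so that E[Q_b(x)] = x."""
    levels = (1 << b) - 1
    t = (x / scale + 1.0) / 2.0 * levels        # position on the grid
    lo = int(t)
    q = lo + (1 if rng.random() < t - lo else 0)
    return (2.0 * q / levels - 1.0) * scale

# empirical unbiasedness check at b = 4
rng = random.Random(1)
x, b, scale = 0.3137, 4, 1.0
est = sum(stochastic_quantize(x, b, scale, rng) for _ in range(200_000)) / 200_000
```

Halving b coarsens the grid by 2×, which quadruples the per-coordinate rounding variance, matching the σ b 2 = O ( 2 − 2 b ) scaling above.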
Table 6 reports an ablation over ( k , b ) , showing accuracy, uplink, and server CPU time on CIFAR-10. The “PaS-Stream” rows differ only in their compression parameters; all other aspects of the protocol are identical.
Moving from ( k , b ) = (16,384, 8) to ( 8192 , 8 ) yields a 1.27 × reduction in uplink with negligible impact on accuracy, while ( 8192 , 4 ) further reduces uplink by 1.37 × and only degrades accuracy by 0.2 percentage points. At k = 4096 , the sketch becomes more aggressive: the uplink drops to 0.60  MB per client per round at b = 4 , but the accuracy falls to 91.0 % , a 0.7 % drop relative to the plaintext baseline. In practice, ( k , b ) = ( 8192 , 4 ) appears to strike a favorable balance between compression and model quality.
Figure 6 visualizes the trade-off between compression and accuracy. The horizontal axis denotes the uplink reduction factor relative to Encrypted SecAgg, and the vertical axis shows the corresponding CIFAR-10 accuracy. Each marker corresponds to one configuration from Table 6; the monotone drop in accuracy as compression intensifies reflects the increasing variance of the estimator G ^ t .
Dropout robustness and throughput under stragglers. We next examine robustness to random client dropout and heavy-tailed latency. For each method, we induce an independent per-client dropout probability ρ { 0 % , 10 % , 20 % , 30 % } by randomly marking scheduled clients as unavailable on each round; the remaining active clients behave as in Section 5.1. PaS-Stream additionally employs rate adaptation: once a block reaches coverage θ = 0.7 of the scheduled clients, it is closed and further messages for that block are ignored.
Table 8 reports final CIFAR-10 accuracy and a normalized throughput metric defined as the number of effective model updates per wall-clock minute (higher is better), under varying dropout rates for Encrypted SecAgg and PaS-Stream with ( k , b ) = ( 8192 , 8 ) . Accuracy remains stable up to ρ = 30 % for both methods, with PaS-Stream tracking Encrypted SecAgg within 0.1 – 0.2 % at each dropout level. Throughput decreases with ρ under Encrypted SecAgg because the protocol incurs coordination overhead due to missing contributions; by contrast, PaS-Stream's blockwise decryption and early closure allow it to increase throughput slightly as ρ grows, effectively trading off some participation against faster rounds.
Figure 7 visualizes the accuracy drop as a function of ρ . The solid line corresponds to Encrypted SecAgg, and the dashed line to PaS-Stream; markers indicate the discrete dropout rates tested.
Finally, we evaluate the impact of heavy-tailed latencies under the Pareto model of Figure 2. Table 9 reports the relative overhead of the verifiability layer—linearly homomorphic tags and clipping/encoding proofs—on CIFAR-10 for KS-IPFE and PaS-Stream. We report the additional CPU time (absolute and as a percentage of the base KS-IPFE cost) and the additional communication per client per round.
Figure 8 shows the normalized throughput (effective model updates per minute) for KS-IPFE and PaS-Stream with and without the verifiability layer, under the same heavy-tailed latency configuration. Bars on the left of each pair correspond to the base protocol, and bars on the right to the verifiable variant. The small gaps between bars corroborate that the commuting verification layer adds only modest overhead while significantly strengthening integrity guarantees.

6. Conclusions

We have presented FlowAgg-FE, a novel verifiable functional encryption framework for secure and communication-efficient gradient transmission in distributed machine learning, tailored to the requirements of the Special Issue on security and privacy in distributed machine learning. At the cryptographic layer, our KS-IPFE scheme instantiates a key-splittable, LWE-based inner-product FE construction that supports high-dimensional, blockwise aggregation with 2-of-2 threshold decryption across two non-colluding helpers, thereby providing both function privacy and robustness against any single compromised server. At the systems layer, PaS-Stream integrates Johnson–Lindenstrauss sketching, unbiased quantization, and streaming FE encryption to produce rate-adaptive ciphertext flows that preserve unbiased estimation of the target aggregate G t while tolerating client dropouts and stragglers. Commuting linearly homomorphic tags and clipping proofs add an efficient verifiability mechanism that ensures end-to-end integrity of the aggregated updates without exposing per-client gradients. Our empirical evaluation on CIFAR-10 and FEMNIST demonstrates that FlowAgg-FE matches plaintext and state-of-practice secure aggregation accuracy within 0.3 % absolute, reduces per-client uplink by up to 3.4 × , and lowers server-side CPU time by up to 1.77 × under realistic participation patterns. These results indicate that carefully co-designed FE, compression, and verifiability can make function-private, scalable secure aggregation a practical building block for future federated and distributed learning systems. Future work includes extending KS-IPFE to richer families of linear and low-degree polynomial functions, exploring adaptive key rotation and revocation in dynamic client populations, and integrating our framework with production FL platforms and hardware accelerators.

Author Contributions

Conceptualization, Z.T., Z.P. and S.Y.; Methodology, Z.P.; Formal analysis, Z.T.; Investigation, Z.P.; Writing—original draft, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

Guangdong University of Science and Technology 2024 University-Level Project: Research on Innovation Strategies of Dongguan Cross-border E-commerce Driven by Digital Economy (GKY-2024KYZDW-14).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Parameter Selection

We briefly quantify the computational and communication overhead of FlowAgg-FE and justify the concrete parameter choices used in Section 5. Throughout, we assume that the block length D and gadget parameter l g are fixed at setup and that all parties share a 32-bit word representation for elements of Z q .
From Equation (13), a single client block encryption computes
c 1 = A r + e 1 , c 2 = B r + x ^ + e 2 .
The dominant costs are the matrix–vector products A r and B r . The former uses an m × n l matrix, costing O ( n l m ) word operations; the latter uses a D l g × n l matrix, costing O ( n l D l g ) . For typical choices in which m and D l g are of the same order (e.g., m 2 D l g ), the total per-block cost is
T enc block = Θ ( n l ( m + D l g ) )
ring multiplications and additions in Z q , plus O ( D l g ) operations for gadget packing x ^ = G D x and adding the small error vectors. A client with sketch dimension k encrypts B t = k / D blocks per round, so its total per-round encryption cost is B t T enc block . Instantiating the above with our default parameters (Table 3) gives D = 64 , l g = 16 , n l = 1024 , and typically m = 2 D l g = 2048 , hence ( m + D l g ) = 3072 . This yields about n l ( m + D l g ) ≈ 3.1 × 10 6 ring mul-adds per encrypted block. With k = 8192 and B t = k / D = 128 blocks per round, the per-round encryption work is roughly 4.0 × 10 8 ring mul-adds. Using a conservative throughput of 10 9 32-bit operations/s and 2 W device power gives a back-of-the-envelope estimate of <0.5 s and <1 J per round (device- and implementation-dependent), suggesting feasibility on mobile-class hardware.
Server-side aggregation is purely additive. For a fixed block b, forming C 1 ( b ) and C 2 ( b ) as in Equation (14) requires, for each participating client, adding one vector in Z q m and one in Z q D l g , i.e.,
T agg block = O ( | A t | ( m + D l g ) )
ring additions. Over all B t blocks, aggregation scales linearly in both | A t | and k, but does not depend on n l .
Each helper performs decryption by computing, for each function vector y , two inner products as in Equation (15): one with C 2 ( b ) ∈ Z q D l g and one with C 1 ( b ) ∈ Z q m . This costs
T dec block ( y ) = O ( D l g + m )
ring multiplications/additions. If we conceptually treat the decryption of a full D-dimensional block (i.e., the D coordinate functionals y = e 1 , … , e D ) as a single “block decryption” and amortize shared work across coordinates, then per block we pay O ( D ( D l g + m ) ) operations in the worst case but with a small constant. With k = 8192 and D ∈ { 32 , 64 } , we have B t = k / D ∈ { 256 , 128 } , so there are 128–256 such block decryptions per round, independent of the number of clients; this is the quantity reported in Section 5. Additional decryptions for the tag selector κ t and any auxiliary function vectors contribute only a small constant factor.
In terms of communication, one KS-IPFE ciphertext per block consists of ( c 1 , c 2 ) ∈ Z q m + D l g , i.e., m + D l g words modulo q. With q = 2 32 , each word fits in 4 bytes, so a single block ciphertext has size
size ct block = 4 ( m + D l g ) bytes .
For example, with D = 64 and l g = 16 (so D l g = 1024 ) and a moderately overprovisioned m = 2048 , we have size ct block = 4 × 3072 ≈ 12  KB. A client with k = 8192 transmits B t = 128 such blocks, totaling roughly 1.5  MB of ciphertexts per round. Linearly homomorphic tags add one element of Z p per block (e.g., 8 bytes for a 64-bit p), and commitments/proofs add a small multiple of group elements; in our parameter regimes, these components together contribute less than 5 % of the ciphertext bandwidth.
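The size accounting above can be reproduced in a few lines (a sketch using the default Table 3 parameters; variable names are ours):

```python
# Recompute the ciphertext-size accounting from the default parameters.
q_bytes = 4                        # q = 2^32 -> one 4-byte word per Z_q element
D, l_g, m, k = 64, 16, 2048, 8192

ct_block_bytes = q_bytes * (m + D * l_g)  # size of one (c1, c2) block ciphertext
blocks_per_round = k // D                 # B_t = k / D
uplink_bytes = blocks_per_round * ct_block_bytes  # per-client ciphertext uplink
tag_bytes = blocks_per_round * 8          # one 64-bit Z_p tag per block
```

The ciphertext uplink comes to about 1.5 MB per client per round, with tags adding roughly 1 KB, consistent with the <5% overhead figure above.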
Finally, we justify the parameter set used in the experiments as follows:
q = 2 32 , n l = 1024 , l g = 16 , Δ = 2 16 , b { 4 , 8 } , D = 64 .
The scaling factor Δ = 2 16 and clipping threshold S are chosen so that
Δ S = 2 16 S ≤ q / 8 = 2 29 ,
ensuring that encoded coordinates do not wrap modulo q for all clipped updates and that the mapping x Δ 1 x remains injective over the support of interest. The LWE parameters ( n l , q , χ ) are selected so that the aggregated decryption noise η per block obeys Equation (6), namely
η ≤ q / 4 , η = O ( σ χ | A t | ∥ y ˜ ∥ 2 ) ,
for all authorized function vectors y and for | A t | up to several thousand. Setting σ χ , σ χ ′ in the low tens, standard LWE estimates yield a per-block decryption failure probability below 2 − 40 , and thus an overall per-round failure probability well below 10 − 8 when decrypting on the order of 10 5 blocks. This parameter point therefore simultaneously satisfies correctness (via Equation (6)), security (via the hardness of LWE at dimension n l = 1024 and modulus q = 2 32 ), and practicality (via per-client uplink and helper-time budgets reported in Section 5).

Appendix B. Additional Security Proofs

This appendix formalizes the confidentiality and verifiability claims for KS-IPFE and the LHT layer using standard game-based and simulation-based arguments. We present single-round, single-block experiments; multi-block security follows by a hybrid over blocks, and multi-round security follows because encryption randomness and tag keys are fresh each round. Throughout, we use the notation and algorithms defined in Section 3 and Section 4, and we refer to Equation (11) (public key structure), Equation (12) (key splitting), Equation (15) (partial decryption share), and Equation (5) (share combination).

Appendix B.1. KS-IPFE Interface (Explicit Algorithms)

We make explicit the KS-IPFE algorithms used by the protocol. The message space is a block vector x ∈ M ⊆ Z q D l g (after gadget packing/integer encoding), and the function space is y ∈ Y ⊆ Z q D l g corresponding to a linear functional ⟨ x , y ⟩ over Z q .
Algorithms.
  • Setup ( 1 λ ) ( mpk , msk ) : sample A R Z q n l × m and secret S and error E (as in Equation (11)), set B = A S + E , and output mpk = ( A , B , G D ) with msk = S (and any auxiliary trapdoor/parameters as in Section 3).
  • Enc ( mpk , x ) ct = ( c 1 , c 2 ) : sample fresh randomness r , e 1 , e 2 and output c 1 = A r + e 1 ∈ Z q m and c 2 = B r + e 2 + x ^ ∈ Z q D l g , where x ^ is the gadget-packed encoding of x (Section 3).
  • KeySplit ( msk , y ) ( sk y A , sk y B ) : output two helper key shares according to Equation (12), where each share contains a uniformly random masking component (denoted W in Equation (12)) and any required derived vector y ˜ .
  • PartDec ( sk y ℓ , CT ) σ y ℓ : on an aggregated ciphertext CT = ( C 1 , C 2 ) , helper ℓ ∈ { A , B } outputs a decryption share σ y ℓ as in Equation (15).
  • Comb ( σ y A , σ y B ) v : combine shares via Equation (5) and apply the rounding/decoding step to recover v = ∑ i α ˜ i , t ⟨ x i , y ⟩ (up to negligible decryption failure under the noise constraint).
Correctness follows from the standard LWE noise bound: under Equation (6), the rounding step in Comb succeeds with all but negligible probability, and the output equals the intended inner product over the encoded message space.
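To make the interface concrete, the following is a toy, deliberately insecure sketch of the five algorithms with 2-of-2 additive key splitting. It simplifies the construction in several ways we have chosen for clarity (tiny parameters, B = S ⊤ A without the public-key error term E, no gadget packing), so it illustrates the API and the share-combination algebra rather than the paper's exact scheme.

```python
import random

# Toy, *insecure* sketch of the KS-IPFE interface (Setup / Enc / KeySplit /
# PartDec / Comb). Parameters are far too small for security, and the
# public-key error term E is omitted, so this is illustrative only.
q, n, m, ell, Delta = 2**32, 64, 96, 8, 2**16
rng = random.Random(7)

def setup():
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
    S = [[rng.randint(-1, 1) for _ in range(ell)] for _ in range(m)]  # small secret
    B = [[sum(S[i][j] * A[i][t] for i in range(m)) % q for t in range(n)]
         for j in range(ell)]                                          # B = S^T A
    return (A, B), S

def enc(mpk, x):
    A, B = mpk
    r = [rng.randrange(q) for _ in range(n)]
    e1 = [rng.randint(-2, 2) for _ in range(m)]
    e2 = [rng.randint(-2, 2) for _ in range(ell)]
    c1 = [(sum(A[i][t] * r[t] for t in range(n)) + e1[i]) % q for i in range(m)]
    c2 = [(sum(B[j][t] * r[t] for t in range(n)) + e2[j] + Delta * x[j]) % q
          for j in range(ell)]
    return c1, c2

def key_split(S, y):
    sky = [sum(S[i][j] * y[j] for j in range(ell)) % q for i in range(m)]  # sk_y = S y
    W = [rng.randrange(q) for _ in range(m)]            # uniform mask
    return W, [(sky[i] - W[i]) % q for i in range(m)]   # 2-of-2 additive shares

def part_dec_A(share_A, y, ct):       # helper A: <y, c2> - <W, c1>
    c1, c2 = ct
    return (sum(y[j] * c2[j] for j in range(ell))
            - sum(share_A[i] * c1[i] for i in range(m))) % q

def part_dec_B(share_B, ct):          # helper B: -<sk_y - W, c1>
    c1, _ = ct
    return (-sum(share_B[i] * c1[i] for i in range(m))) % q

def comb(sigma_A, sigma_B):
    v = (sigma_A + sigma_B) % q
    if v > q // 2:
        v -= q                        # centered representative
    return round(v / Delta)           # rounding strips the small noise

mpk, msk = setup()
x = [3, 1, 4, 1, 5, 9, 2, 6]
y = [2, 0, 1, 0, 3, 0, 0, 1]
ct = enc(mpk, x)
share_A, share_B = key_split(msk, y)
value = comb(part_dec_A(share_A, y, ct), part_dec_B(share_B, ct))  # <x, y>
```

Note how each helper share is individually uniform (the mask W cancels only when the two shares are combined), which is exactly the property the one-helper simulatability lemma below formalizes.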

Appendix B.2. Confidentiality of KS-IPFE Under One-Helper Leakage

Appendix B.2.1. Security Experiment: IND KS - IPFE 1 H ( λ )

A challenger runs ( mpk , msk ) ← Setup ( 1 λ ) and gives mpk to A . The adversary controls Srv , may corrupt any subset of clients, and corrupts at most one helper (w.l.o.g. H A ). A may adaptively query two oracles. A key-share oracle O share ( y ) returns the corrupted helper’s share sk y A produced by KeySplit ( msk , y ) . An honest-helper oracle O part ( y , CT ) returns σ y B = PartDec ( sk y B , CT ) for any aggregated ciphertext CT . At challenge time, A outputs two equal-length message families { x i ( 0 ) } and { x i ( 1 ) } in M subject to the standard FE side condition: for every y queried to O share (and any y queried subsequently), the authorized aggregate outputs coincide, i.e., ∑ i α ˜ i , t ⟨ x i ( 0 ) , y ⟩ = ∑ i α ˜ i , t ⟨ x i ( 1 ) , y ⟩ in Z q . The challenger samples b R { 0 , 1 } and returns ct i ← Enc ( mpk , x i ( b ) ) for all i (equivalently, an aggregated ciphertext derived by linear homomorphism). A continues querying and outputs b ′ . The advantage is Adv A IND = | Pr [ b ′ = b ] − 1 / 2 | .

Appendix B.2.2. Simulation-Based Confidentiality: Real 1 H vs. Ideal 1 H

The real experiment Real 1 H ( λ ) is identical to the above interaction: A receives mpk , ciphertext blocks, corrupted-helper key shares, and honest-helper response shares for its oracle queries. In the ideal experiment Ideal 1 H ( λ ) , a simulator S is given only the permitted leakage consisting of public metadata (round id, weights, block indices) and the authorized aggregate outputs { ∑ i α ˜ i , t ⟨ x i , t ( b ) , y ⟩ } y for every function y legitimately revealed by the protocol and by A ’s oracle queries, and must generate an indistinguishable transcript (ciphertexts, corrupted-helper key shares, and honest-helper shares). KS-IPFE is SIM-secure under one-helper leakage if for every PPT A there exists a PPT S such that Real 1 H ( λ ) ≈ c Ideal 1 H ( λ ) .

Appendix B.2.3. Theorem (Confidentiality Under LWE)

Assuming the LWE assumption holds at parameters ( n l , q , χ ) , for all PPT adversaries A we have Adv A IND ≤ negl ( λ ) ; moreover, KS-IPFE satisfies Real 1 H ( λ ) ≈ c Ideal 1 H ( λ ) .

Appendix B.2.4. Proof Sketch: Explicit Simulator and Hybrids (Reduction to LWE)

Simulator construction. Given the permitted leakage and public metadata, S samples A R Z q n l × m and B R Z q n l × D l g uniformly and sets mpk = ( A , B , G D ) . For every ciphertext block in the transcript, it outputs ( c 1 , c 2 ) R Z q m × Z q D l g . For every corrupted-helper key-share query on y , it samples the masking component (denoted W in Equation (12)) uniformly and outputs sk y A with the same distribution as Equation (12). For every honest-helper share query ( y , CT ) , it samples a share σ y B ∈ Z q uniformly except that, when the combined output value v = ∑ i α ˜ i , t ⟨ x i , y ⟩ is among the permitted leakage for that query, it chooses σ y B so that the combine rule in Equation (5) yields v (treating the corrupted helper share as uniform by the lemma below).
Hybrid sequence. Let H 0 be the real experiment. In H 1 , replace the LWE-structured matrix B = A S + E in mpk with a uniform matrix U R Z q n l × D l g ; by the LWE assumption (Equation (11)), H 0 ≈ c H 1 . In H 2 , conditioned on ( A , U ) , ciphertext blocks become message-independent: c 2 = U r + e 2 + x ^ is computationally indistinguishable from uniform in Z q D l g for fresh r , e 2 , and adding the fixed offset x ^ preserves uniformity; c 1 = A r + e 1 is independent of x ^ . Hence we can replace all ciphertext blocks by uniform samples, matching the simulator. In H 3 , replace corrupted-helper key shares by simulator-generated shares; this is distribution-preserving because the masking component W in Equation (12) is uniform by construction. In H 4 , simulate honest-helper response shares: for any aggregated ciphertext CT = ( C 1 , C 2 ) and function y , the corrupted helper share is σ y A = ⟨ y ˜ , C 2 ⟩ − ⟨ W , C 1 ⟩ (Equation (15)); since W is uniform, ⟨ W , C 1 ⟩ is uniform over Z q whenever C 1 ≠ 0 , so σ y A is uniform from the adversary’s perspective. Therefore the honest helper can choose σ y B uniformly subject only to satisfying the combined output value via Equation (5), which matches the simulator’s choice. The resulting distribution equals Ideal 1 H ( λ ) , establishing Real 1 H ( λ ) ≈ c Ideal 1 H ( λ ) and implying negligible IND advantage.

Appendix B.2.5. Lemma (Simulatability of One-Helper Decryption Shares)

Fix any function y and any aggregated ciphertext CT = ( C 1 , C 2 ) . In the view of an adversary that knows mpk , y ˜ , and at most one helper key share, the corresponding partial share σ y A (or σ y B ) is statistically close to uniform over Z q conditioned on ( mpk , y ˜ , CT ) except for the negligible event C 1 = 0 . This follows directly from Equation (15) because ⟨ W , C 1 ⟩ is uniform for uniform W and C 1 ≠ 0 .
If both helpers collude with Srv , they can jointly evaluate authorized functionals on ciphertexts and recover per-ciphertext function outputs, which is outside our threat model. Section 3.2 discusses practical mitigations (independent domains, TEEs, and t-of-m generalizations).

Appendix B.3. Verifiability: LHT Commuting Checks (Formal Games and Leakage)

Appendix B.3.1. Verifiability Experiment and Soundness Bound

Define a verifiability experiment Vfy ( λ ) where a challenger samples a fresh per-round tag key κ t R Z p D and provides public parameters to an adversary A controlling Srv . For a closed block ( t , b ) with honest client contributions, A outputs a candidate decrypted aggregate u ^ t ( b ) and a tag value T ˜ t ( b ) (and any auxiliary commitment/proof objects required by the protocol) that are accepted by the verifier. A wins if u ^ t ( b ) differs from the honest linear aggregate but all checks pass. If κ t is hidden from Srv during round t, then for any PPT A the probability of winning is at most 1 / p per checked block, plus negligible terms from FE correctness and the soundness of any auxiliary proofs/commitments. The bound follows since passing the LHT equality check for a modified value requires ⟨ Δ , κ t ⟩ ≡ 0 ( mod p ) for a nonzero difference vector Δ , which holds with probability 1 / p over uniform κ t .
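A minimal sketch of the LHT check under stated assumptions (the toy modulus and tag function below are our illustrative choices; the exact tag format is defined by the protocol):

```python
import random

# Minimal sketch of a linearly homomorphic tag over Z_p:
# tag(u) = <u, kappa> mod p for a round-scoped secret key kappa.
# Tags add under vector addition, so the verifier can check a claimed
# aggregate against the sum of per-client tags; a forged aggregate passes
# only if its difference vector is kappa-orthogonal (probability 1/p).
p = (1 << 61) - 1                      # a 61-bit Mersenne prime modulus
D = 64
rng = random.Random(3)
kappa = [rng.randrange(p) for _ in range(D)]

def tag(u):
    return sum(ui * ki for ui, ki in zip(u, kappa)) % p

u1 = [rng.randrange(1000) for _ in range(D)]   # two honest block contributions
u2 = [rng.randrange(1000) for _ in range(D)]
agg = [(a + b) % p for a, b in zip(u1, u2)]

homomorphic_ok = tag(agg) == (tag(u1) + tag(u2)) % p   # linearity of the tag
forged = list(agg)
forged[0] = (forged[0] + 1) % p                        # tamper with one coordinate
forgery_detected = tag(forged) != tag(agg)
```

Because the tamper changes the tag by kappa[0] mod p, the check fails unless that key coordinate happens to be zero, mirroring the 1/p soundness bound above.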

Appendix B.3.2. Lemma (Tag Hiding Given Unknown κt)

Fix any nonzero z ∈ Z p D and sample κ R Z p D . Then τ = ⟨ z , κ ⟩ mod p is uniform in Z p . Consequently, if κ t is hidden from Srv , tag values are pseudorandom and leak no additional information about gradients beyond what is already revealed by authorized FE outputs, except for the degenerate event z = 0 .
If κ t becomes known to Srv , each tag is an additional linear measurement ⟨ z , κ t ⟩ mod p ; while this does not directly expose norms, it is extra information. This motivates treating κ t as ephemeral round-scoped secret material and explicitly managing its lifecycle and access controls (Section 4.3).

Appendix C. Helper-Side API and State (Implementation-Facing)

Each helper H ℓ , ℓ ∈ { A , B } , is a stateless (or minimally stateful) service that returns threshold decryption shares for aggregated ciphertext blocks only. This explicit API clarifies the helper architecture and the trust boundary.
Helper state. Each helper stores long-lived split keys for the block-coordinate basis vectors, namely { sk e j } j ∈ [ D ] , which are sufficient to produce decryption shares for each coordinate of a closed block. If verifiability is enabled, the helper additionally stores a small set of round-tagged split keys for auxiliary functions (e.g., sk κ ¯ t for LHT verification); these auxiliary shares are scoped to round t and are deleted after the round completes. Helpers do not maintain per-client ciphertext buffers and do not track per-client nonces; nonce validation is performed at Srv .
Helper input. For each closed block (t, b), the server sends the round and block identifiers (t, b), the aggregated ciphertext block CT_t^(b) = (C_{1,t}^(b), C_{2,t}^(b)), and a list of requested function identifiers F (typically {e_1, …, e_D} and optionally κ̄_t). The request also includes an application-level context string for domain separation, binding the helper response to this protocol instance and round.
Helper output and server combine rule. The helper returns a set of decryption shares {σ_f^(t,b)}_{f∈F}, where each share is computed as in Equation (15) using the stored split key sk_f. For each requested function f, Srv combines the two helper shares (e.g., σ_f^A(t, b) + σ_f^B(t, b)) according to Equation (5) to reconstruct the corresponding aggregate value for block (t, b).
Minimality and privacy. Helpers only ever process aggregated ciphertext blocks (never per-client ciphertexts) and return only additive response shares. Under the threat model in which Srv may collude with at most one helper, each helper’s view is individually simulatable, as formalized in Appendix B.
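The share-and-combine flow can be sketched with a toy additive key split over Z_q. The vector dimension, the `helper_share` function, and the ciphertext component names below are simplifications assumed for illustration; they stand in for the paper's Equations (15) and (5), not the actual KS-IPFE operations.

```python
import random

q = 2**32  # modulus, matching Table 3's q
n = 16     # toy vector dimension (illustrative)

random.seed(1)

# Toy split of a function key sk_f into additive shares held by H_A and H_B.
sk_f = [random.randrange(q) for _ in range(n)]
sk_f_A = [random.randrange(q) for _ in range(n)]
sk_f_B = [(s - a) % q for s, a in zip(sk_f, sk_f_A)]  # sk_f = sk_f_A + sk_f_B (mod q)

def helper_share(sk_share, C1):
    # Each helper sees only the aggregated ciphertext component C1
    # (never per-client ciphertexts) and returns an additive response share.
    return sum(k * c for k, c in zip(sk_share, C1)) % q

C1 = [random.randrange(q) for _ in range(n)]  # aggregated ciphertext block part
sigma_A = helper_share(sk_f_A, C1)
sigma_B = helper_share(sk_f_B, C1)

# Server combine rule: the two shares add up to the full-key inner product.
assert (sigma_A + sigma_B) % q == helper_share(sk_f, C1)
```

Because each helper's output is a single additive share under a uniformly random key share, neither helper alone learns the full-key value, which mirrors the one-helper-collusion threat model above.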

Appendix D. Round-Level Message-Flow

We present one FL round t in pseudocode in Algorithm A1.
Algorithm A1 FlowAgg-FE: Round t message flow
Require: Global model w_t; active set A_t; sketch dim k; block size D; threshold θ; scale Δ; quantizer Q_b
Ensure: Updated model w_{t+1}
 1: // Server broadcast
 2: Srv samples or derives sketch spec Φ_t (or seed seed_t) and broadcasts (t, Φ_t or seed_t, Q_b, Δ, D, B_t = k/D, θ).
 3: // Key materialization (round-scoped)
 4: Key authority sends to each helper H_ℓ: round-tagged split keys {sk_{e_j}}_{j∈[D]} (and sk_{κ̄_t} if verifiability enabled).
 5: Key authority sends to each participating client: ephemeral tag key κ_t (if enabled).
    // Client-side (for each i ∈ A_t, in parallel)
 6: for all i ∈ A_t do
 7:     Compute clipped update v_{i,t}.
 8:     Compute sketch and quantize: u_{i,t} ← Q_b(Φ_t v_{i,t}) ∈ R^k.
 9:     Encode/scale: û_{i,t} ← ⌊Δ · u_{i,t}⌉ ∈ Z^k.
10:     Partition into blocks û_{i,t}^(b) ∈ Z^D for b ∈ [B_t].
11:     for b = 1 to B_t do
12:         Set nonce n_{i,t}^(b) (monotone within round).
13:         Encrypt block: ct_{i,t}^(b) ← Enc(mpk, û_{i,t}^(b)).
14:         Compute integrity metadata meta_{i,t}^(b) (e.g., LHT tag/commitment/proof).
15:         Send to server: (t, b, n_{i,t}^(b), ct_{i,t}^(b), meta_{i,t}^(b)).
16:     end for
17: end for
    // Server-side streaming aggregation and block closure
18: for b = 1 to B_t do
19:     Initialize receive-set I_t^(b) ← ∅ and aggregate ciphertext CT_t^(b) ← 0.
20:     while |I_t^(b)| < θ|A_t| do
21:         Upon receiving a valid tuple from client i for block b (nonce ok), set I_t^(b) ← I_t^(b) ∪ {i}.
22:         Update ciphertext aggregate: CT_t^(b) ← CT_t^(b) + α̃_{i,t} · ct_{i,t}^(b).
23:         Aggregate metadata analogously (if enabled).
24:     end while
25:     // Threshold decryption requests
26:     Srv sends (t, b, CT_t^(b), F) to H_A and H_B, where F = {e_1, …, e_D} and optionally κ̄_t.
27:     Each helper returns shares {σ_f^(t,b)}_{f∈F}.
28:     Combine shares to recover decrypted block aggregate Û_t^(b) = Σ_{i∈I_t^(b)} α̃_{i,t} û_{i,t}^(b).
29:     Verify commuting checks for block b using decrypted values and aggregated metadata (if enabled).
30: end for
31: Assemble Û_t ∈ Z^k from {Û_t^(b)}_{b=1,…,B_t} and decode update.
32: Update model: w_{t+1} ← Update(w_t, Û_t).
33: Delete round-scoped keys and ephemeral state (e.g., κ_t, per-round split keys).
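The client-side steps 7–10 of Algorithm A1 can be sketched as follows. The sign-random-projection sketch, the uniform quantizer, and all parameter values here are illustrative placeholders, not the paper's concrete Φ_t and Q_b; encryption is omitted, since the point is the sketch → quantize → scale → partition pipeline that produces the per-block integer vectors.

```python
import math
import random

random.seed(7)

# Illustrative (toy) parameters, loosely echoing Table 3.
d, k, D, Delta, b = 256, 64, 16, 2**16, 8

# A sign-random-projection sketch matrix Phi_t, derived from the round seed.
Phi = [[random.choice((-1, 1)) / math.sqrt(k) for _ in range(d)] for _ in range(k)]

def quantize(x, bits):
    # Toy stand-in for Q_b: uniform rounding to 2^(bits-1)-1 levels in [-1, 1].
    levels = 2 ** (bits - 1) - 1
    return max(-1.0, min(1.0, round(x * levels) / levels))

v = [random.uniform(-0.01, 0.01) for _ in range(d)]                 # clipped update v_{i,t}
u = [quantize(sum(r[j] * v[j] for j in range(d)), b) for r in Phi]  # sketch + quantize
u_hat = [round(Delta * x) for x in u]                               # scale to integers

# Partition into B_t = k / D blocks of length D, one FE ciphertext each.
B_t = k // D
blocks = [u_hat[i * D:(i + 1) * D] for i in range(B_t)]
assert len(blocks) == B_t and all(len(blk) == D for blk in blocks)
```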

References

  1. McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A.y. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) 2017, Fort Lauderdale, FL, USA, 20–22 April 2017. [Google Scholar]
  2. Konečný, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated Learning: Strategies for Improving Communication Efficiency. arXiv 2016, arXiv:1610.05492. [Google Scholar]
  3. Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and Open Problems in Federated Learning. In Foundations and Trends in Machine Learning; Now Publishers Inc.: Hanover, MA, USA, 2021. [Google Scholar]
  4. Bonawitz, K.; Eichner, H.; Grieskamp, W.; Huba, D.; Ingerman, A.; Ivanov, V.; Kiddon, C.; Konečný, J.; Mazzocchi, S.; McMahan, H.B.; et al. Towards Federated Learning at Scale: System Design. In Proceedings of the 2nd SysML Conference, Palo Alto, CA, USA, 31 March–2 April 2019. [Google Scholar]
  5. Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017. [Google Scholar]
  6. Gentry, C. Fully Homomorphic Encryption Using Ideal Lattices. In Proceedings of the STOC ’09: Symposium on Theory of Computing, Bethesda, MD, USA, 31 May–2 June 2009. [Google Scholar]
  7. Brakerski, Z.; Gentry, C.; Vaikuntanathan, V. (Leveled) Fully Homomorphic Encryption without Bootstrapping. In Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, Cambridge, MA, USA, 8–10 January 2012. [Google Scholar]
  8. Cheon, J.H.; Kim, A.; Kim, M.; Song, Y. Homomorphic Encryption for Arithmetic of Approximate Numbers. In Advances in Cryptology—ASIACRYPT 2017, Proceedings of the 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, 3–7 December 2017; Springer: Cham, Switzerland, 2017. [Google Scholar]
  9. Alistarh, D.; Grubic, D.; Li, J.; Tomioka, R.; Vojnovic, M. QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  10. Seide, F.; Fu, H.; Droppo, J.; Li, G.; Yu, D. 1-bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs. In Proceedings of the Interspeech 2014, Singapore, 14–18 September 2014. [Google Scholar]
  11. Aji, A.F.; Heafield, K. Sparse Communication for Distributed Gradient Descent. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017. [Google Scholar]
  12. Lin, Y.; Han, S.; Mao, H.; Wang, Y.; Dally, W.J. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. In Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  13. Bernstein, J.; Wang, Y.X.; Azizzadenesheli, K.; Anandkumar, A. SignSGD with Majority Vote is Communication Efficient and Fault Tolerant. arXiv 2018, arXiv:1810.05291. [Google Scholar]
  14. Karimireddy, S.P.; Rebjock, Q.; Stich, S.U.; Jaggi, M. Error Feedback Fixes SignSGD and other Gradient Compression Schemes. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019. [Google Scholar]
  15. Achlioptas, D. Database-friendly Random Projections: Johnson–Lindenstrauss with Binary Coins. J. Comput. Syst. Sci. 2003, 66, 671–687. [Google Scholar] [CrossRef]
  16. Groth, J.; Sahai, A. Efficient Non-interactive Proof Systems for Bilinear Groups. In Advances in Cryptology—EUROCRYPT 2008, Proceedings of the 27th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Istanbul, Turkey, 13–17 April 2008; Springer: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
  17. Bünz, B.; Bootle, J.; Boneh, D.; Poelstra, A.; Wuille, P.; Maxwell, G. Bulletproofs: Short Proofs for Confidential Transactions and More. In Proceedings of the 2018 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 20–24 May 2018. [Google Scholar]
  18. Boneh, D.; Sahai, A.; Waters, B. Functional Encryption: Definitions and Challenges. In Theory of Cryptography, Proceedings of the 8th Theory of Cryptography Conference, Providence, RI, USA, 28–30 March 2011; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  19. Regev, O. On Lattices, Learning with Errors, Random Linear Codes, and Cryptography. J. ACM 2009, 56, 34. [Google Scholar] [CrossRef]
  20. Agrawal, S.; Freeman, D.M.; Vaikuntanathan, V. Functional Encryption for Inner Product Predicates from Learning with Errors. In Advances in Cryptology—ASIACRYPT 2011, Proceedings of the 17th International Conference on the Theory and Application of Cryptology and Information Security, Seoul, Republic of Korea, 4–8 December 2011; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  21. Abdalla, M.; Bourse, F.; Caro, A.D.; Pointcheval, D. Simple Functional Encryption Schemes for Inner Products. In Public-Key Cryptography—PKC 2015, Proceedings of the 18th IACR International Conference on Practice and Theory in Public-Key Cryptography, Gaithersburg, MD, USA, 30 March–1 April 2015; Springer: Berlin/Heidelberg, Germany, 2015. [Google Scholar]
  22. Freeman, D.M. Improved Security for Linearly Homomorphic Signatures: A Generic Framework. In Public Key Cryptography—PKC 2012, Proceedings of the 15th International Conference on Practice and Theory in Public Key Cryptography, Darmstadt, Germany, 21–23 May 2012; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  23. Catalano, D.; Fiore, D. Practical Homomorphic MACs for Arithmetic Circuits. In Advances in Cryptology—EUROCRYPT 2013, Proceedings of the 32nd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Athens, Greece, 26–30 May 2013; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
  24. Katz, J.; Sahai, A.; Waters, B. Predicate Encryption Supporting Disjunctions, Polynomial Equations, and Inner Products. In Advances in Cryptology—EUROCRYPT 2008, Proceedings of the 27th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Istanbul, Turkey, 13–17 April 2008; Springer: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
  25. Han, K.; Lee, W.K.; Karmakar, A.; Yi, M.K.; Hwang, S.O. QuripfeNet: Quantum-Resistant IPFE-Based Neural Network. IEEE Trans. Emerg. Top. Comput. 2024, 13, 640–653. [Google Scholar] [CrossRef]
  26. Dowerah, U.; Dutta, S.; Mitrokotsa, A.; Mukherjee, S.; Pal, T. Unbounded predicate inner product functional encryption from pairings. J. Cryptol. 2023, 36, 29. [Google Scholar] [CrossRef]
  27. Pan, Z.; Ying, Z.; Wang, Y.; Zhang, C.; Zhang, W.; Zhou, W.; Zhu, L. Feature-Based Machine Unlearning for Vertical Federated Learning in IoT Networks. IEEE Trans. Mob. Comput. 2025, 24, 5031–5044. [Google Scholar] [CrossRef]
  28. Pan, Z.; Ying, Z.; Wang, Y.; Wang, Y.; Zhang, Z.; Zhou, W.; Zhu, L. Robust Watermarking for Federated Diffusion Models with Unlearning-Enhanced Redundancy. IEEE Trans. Dependable Secur. Comput. 2025, 1–15. [Google Scholar] [CrossRef]
  29. Pan, Z.; Ying, Z.; Wang, Y.; Zhang, C.; Li, C.; Zhu, L. One-shot backdoor removal for federated learning. IEEE Internet Things J. 2024, 11, 37718–37730. [Google Scholar] [CrossRef]
  30. Fereidooni, H.; Marchal, S.; Miettinen, M.; Mirhoseini, A.; Möllering, H.; Nguyen, T.D.; Rieger, P.; Sadeghi, A.R.; Schneider, T.; Yalame, H.; et al. SAFELearn: Secure aggregation for private federated learning. In Proceedings of the 2021 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 27 May 2021; pp. 56–62. [Google Scholar]
  31. Zhao, L.; Jiang, J.; Feng, B.; Wang, Q.; Shen, C.; Li, Q. Sear: Secure and efficient aggregation for byzantine-robust federated learning. IEEE Trans. Dependable Secur. Comput. 2021, 19, 3329–3342. [Google Scholar] [CrossRef]
  32. Pan, Z.; Zeng, J.; Cheng, R.; Yan, H.; Li, J. PNAS: A privacy preserving framework for neural architecture search services. Inf. Sci. 2021, 573, 370–381. [Google Scholar] [CrossRef]
  33. Gennaro, R.; Jarecki, S.; Krawczyk, H.; Rabin, T. Secure distributed key generation for discrete-log based cryptosystems. In Proceedings of the 17th International Conference on the Theory and Applications of Cryptographic Techniques, Prague, Czech Republic, 2–6 May 1999; pp. 295–310. [Google Scholar]
  34. Mohri, M.; Sivek, G.; Suresh, A.T. Agnostic federated learning. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 4615–4625. [Google Scholar]
  35. Pan, Z.; Li, C.; Yu, F.; Wang, S.; Wang, H.; Tang, X.; Zhao, J. Fedlf: Layer-wise fair federated learning. In Proceedings of the AAAI’24: AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 14527–14535. [Google Scholar]
  36. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
  37. Caldas, S.; Wu, P.; Li, T.; Konečný, J.; McMahan, H.B.; Smith, V.; Talwalkar, A. LEAF: A Benchmark for Federated Settings. arXiv 2018, arXiv:1812.01097. [Google Scholar]
Figure 1. Empirical histogram of per-client dataset sizes under Dirichlet partitioning. The x-axis denotes samples per client. Darker bars correspond to the CIFAR-10 configuration, lighter bars to FEMNIST. Both exhibit long tails, with a majority of clients holding fewer than 150 local samples and a minority acting as heavy contributors.
Figure 2. Empirical CDF of simulated client latencies following a Pareto distribution with shape 1.2 and scale 1.0 s. Approximately 60 % of clients respond within 1.5 s, while roughly 10 % exceed 3 s, modeling heavy-tailed straggler behavior that PaS-Stream must tolerate.
Figure 3. Convergence curves for CIFAR-10 and FEMNIST. The near-overlap between Plaintext-FedAvg and encrypted baselines confirms correct baselining; PaS-Stream tracks plaintext closely with a small final gap under 4-bit quantization.
Figure 4. Per-client uplink on CIFAR-10 under four methods. From left to right, bars correspond to Encrypted SecAgg (light gray), KS-IPFE (medium gray), PaS-Stream with 8-bit quantization (dark gray), and PaS-Stream with 4-bit quantization (black). PaS-Stream achieves up to 3.4 × lower uplink than Encrypted SecAgg while preserving accuracy.
Figure 5. Server-side CPU time per round on CIFAR-10. From left to right, bars correspond to Encrypted SecAgg (light gray), KS-IPFE (medium gray), PaS-Stream with 8-bit quantization (dark gray), and PaS-Stream with 4-bit quantization (black). KS-IPFE already yields a 1.61 × speedup; PaS-Stream further improves this to 1.77 × with 4-bit quantization.
Figure 6. Accuracy vs. uplink reduction on CIFAR-10 for KS-IPFE and PaS-Stream configurations in Table 6. Each marker corresponds to a specific ( k , b ) ; both KS-IPFE and PaS-Stream lie close to the Pareto frontier where moderate compression (e.g., ( 8192 , 4 ) ) achieves large bandwidth reduction with minimal accuracy loss.
Figure 7. CIFAR-10 accuracy under random client dropout. Solid line: Encrypted SecAgg; dashed line: PaS-Stream with ( k , b ) = ( 8192 , 8 ) . Both remain within 0.3 % of the no-dropout baseline up to ρ = 30 % , with PaS-Stream tracking SecAgg closely at all dropout levels.
Figure 8. Normalized throughput under heavy-tailed client latencies (Pareto shape 1.2 ). For each method, the lighter bar shows throughput without the verifiability layer and the darker bar with tags and clipping proofs enabled. The verifiability layer reduces throughput by at most 4 % , while PaS-Stream retains a 1.2 1.3 × advantage over KS-IPFE in all cases.
Table 1. Federated task configuration. “Non-IID skew” refers to the Dirichlet concentration parameter α used to draw client label distributions; smaller α implies stronger heterogeneity. E is the number of local epochs per round.
Task | Model | n | |A_t| | T | E | Non-IID skew α
CIFAR-10 | ResNet-18 | 1000 | 100 | 100 | 1 | 0.5
FEMNIST | CNN (small) | 3400 | 256 | 120 | 1 | 0.3
Table 2. Secure aggregation baselines (protocol configuration).
Protocol | Trust/Collusion Assumption | Dropout Handling
Encrypted SecAgg | Server does not learn individual updates | Designed for client dropouts
SecAgg+ | Same goal as SecAgg; improved practicality | Designed for client dropouts
BatchCrypt | Secure aggregation via batching/crypto | Depends on protocol instantiation
Table 3. Cryptographic and streaming parameters used by KS-IPFE and PaS-Stream. Latency parameters describe the synthetic client latency model used to drive rate-adaptive behavior.
Parameter | Value | Description
q | 2^32 | LWE modulus
n_l | 1024 | LWE dimension
l_g | 16 | gadget digits per coordinate
D | 64 | block length (plaintext coordinates)
Δ | 2^16 | scaling factor for Equation (3)
k | 8192 | sketch dimension (PaS-Stream)
b | 8 or 4 | quantization bit-width
θ | 0.7 | minimum block coverage fraction
Latency shape | 1.2 | Pareto shape for client latency
Latency scale | 1.0 s | Pareto scale (minimum latency)
Table 4. Final test accuracy (%) after 100 rounds (CIFAR-10) and 120 rounds (FEMNIST); mean over 3 seeds. Δ Acc is the absolute difference to Plaintext-FedAvg.
Method | CIFAR-10 | ΔAcc (CIFAR) | FEMNIST | ΔAcc (FEMNIST)
Plaintext-FedAvg | 91.7 | 0.0 | 86.9 | 0.0
Encrypted SecAgg | 91.7 | 0.0 | 86.9 | 0.0
KS-IPFE (full-precision) | 91.6 | −0.1 | 86.8 | −0.1
PaS-Stream (b = 8) | 91.5 | −0.2 | 86.8 | −0.1
PaS-Stream (b = 4) | 91.4 | −0.3 | 86.7 | −0.2
Table 5. Per-round efficiency on CIFAR-10 (mean over rounds). Uplink is measured per client per round; CPU time is the total server-side time (aggregator + helpers) per round. The rightmost columns report multiplicative reductions relative to Encrypted SecAgg.
Method | Uplink (MB) | CPU (s/round) | Uplink Reduction | CPU Reduction
Encrypted SecAgg | 2.80 | 62.0 | 1.00× | 1.00×
KS-IPFE (full-precision) | 1.47 | 38.5 | 1.90× | 1.61×
PaS-Stream (b = 8) | 1.12 | 36.0 | 2.50× | 1.72×
PaS-Stream (b = 4) | 0.82 | 35.1 | 3.41× | 1.77×
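The reduction columns follow directly from the raw measurements; a quick arithmetic check (values copied from Table 5, reductions computed relative to Encrypted SecAgg):

```python
# Per-client uplink (MB) and server CPU (s/round) from Table 5.
uplink = {"SecAgg": 2.80, "KS-IPFE": 1.47, "PaS-8b": 1.12, "PaS-4b": 0.82}
cpu    = {"SecAgg": 62.0, "KS-IPFE": 38.5, "PaS-8b": 36.0, "PaS-4b": 35.1}

# Multiplicative reductions relative to Encrypted SecAgg.
red_up  = {m: uplink["SecAgg"] / v for m, v in uplink.items()}
red_cpu = {m: cpu["SecAgg"] / v for m, v in cpu.items()}

assert round(red_up["KS-IPFE"], 2) == 1.90
assert round(red_up["PaS-8b"], 2) == 2.50
assert round(red_up["PaS-4b"], 2) == 3.41
assert round(red_cpu["KS-IPFE"], 2) == 1.61
assert round(red_cpu["PaS-4b"], 2) == 1.77
```

These ratios match the abstract's headline figures (1.9–3.4× uplink, 1.6× server CPU).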
Table 6. CIFAR-10 ablation over sketch dimension k and quantization depth b. Accuracy is final test accuracy (%) after 100 rounds; uplink is per-client per-round communication; CPU is server-side time per round.
Method | (k, b) | Accuracy (%) | Uplink (MB) | CPU (s/round)
KS-IPFE (full-precision) | n/a | 91.6 | 1.47 | 38.5
PaS-Stream | (16,384, 8) | 91.6 | 1.42 | 37.9
PaS-Stream | (8192, 8) | 91.5 | 1.12 | 36.0
PaS-Stream | (8192, 4) | 91.4 | 0.82 | 35.1
PaS-Stream | (4096, 8) | 91.2 | 0.78 | 34.6
PaS-Stream | (4096, 4) | 91.0 | 0.60 | 34.2
Table 7. Analytical overhead beyond uplink and server CPU (using Table 3).
Quantity | Scaling | Example at Default Params
Split key share size (per function, per helper) | m + D·l_g words | 3072 words ≈ 12 KB
Helper key storage (basis {e_j}) | D(m + D·l_g) words | 64 × 12 KB ≈ 0.77 MB
Helper ephemeral key refresh (e.g., κ̄_t) | m + D·l_g words/round | ≈12 KB per helper
Helper compute (per block) | (D + 1)(m + D·l_g) mul-adds | ≈2.0 × 10^5
Server nonce state | O(|A_t|) counters | negligible vs. ciphertext buffers
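The example column can be reproduced from Table 3's defaults. Two values below are inferences rather than stated parameters: m = 2048 is back-computed from the 3072-word share size (m + D·l_g with D = 64, l_g = 16), and a word is taken as 4 bytes since q = 2^32.

```python
# Assumed/inferred values: m = 2048 (from the 3072-word share size),
# 4-byte words (since the LWE modulus q = 2^32 fits one 32-bit word).
m, D, l_g = 2048, 64, 16
word_bytes = 4

share_words = m + D * l_g
assert share_words == 3072                       # "3072 words" entry
assert share_words * word_bytes == 12 * 1024     # ≈ 12 KB per share

storage_mb = D * share_words * word_bytes / 1e6  # basis {e_1..e_D}
assert 0.75 < storage_mb < 0.80                  # ≈ 0.77 MB entry

mul_adds = (D + 1) * share_words                 # per-block helper compute
assert abs(mul_adds - 2.0e5) / 2.0e5 < 0.01      # ≈ 2.0 × 10^5 entry
```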
Table 8. Robustness to random client dropout on CIFAR-10. Accuracy is final test accuracy (%); throughput is normalized relative to Encrypted SecAgg at ρ = 0 % .
Method | Dropout ρ | Accuracy (%) | Throughput | Throughput Gain
Encrypted SecAgg | 0% | 91.7 | 1.00 | 1.00×
Encrypted SecAgg | 10% | 91.5 | 0.93 | 0.93×
Encrypted SecAgg | 20% | 91.3 | 0.88 | 0.88×
Encrypted SecAgg | 30% | 90.9 | 0.81 | 0.81×
PaS-Stream (8b) | 0% | 91.5 | 1.28 | 1.28×
PaS-Stream (8b) | 10% | 91.4 | 1.31 | 1.31×
PaS-Stream (8b) | 20% | 91.2 | 1.33 | 1.33×
PaS-Stream (8b) | 30% | 90.9 | 1.35 | 1.35×
Table 9. Verifiability overhead on CIFAR-10 for KS-IPFE and PaS-Stream. CPU overhead is measured as the difference between total server-side CPU time with and without tags/proofs. Communication overhead is the additional per-client uplink.
Method | CPU Overhead (s/round) | CPU Overhead (%) | Comm. Overhead (% of Uplink)
KS-IPFE (full-precision) | 1.6 | 4.2% | 2.1%
PaS-Stream (k = 8192, 8b) | 1.5 | 4.3% | 2.7%
PaS-Stream (k = 8192, 4b) | 1.4 | 4.1% | 2.9%

Share and Cite

MDPI and ACS Style

Tan, Z.; Pan, Z.; Liang, Y.; Yang, S. A Novel Verifiable Functional Encryption Framework for Secure and Communication-Efficient Distributed Gradient Transmission Management. Electronics 2026, 15, 928. https://doi.org/10.3390/electronics15050928

