Article

Robust Aggregation in Over-the-Air Computation with Federated Learning: A Semantic Anti-Interference Approach

Faculty of Applied Sciences, Macao Polytechnic University, Macao SAR, China
*
Author to whom correspondence should be addressed.
Mathematics 2026, 14(1), 124; https://doi.org/10.3390/math14010124
Submission received: 17 October 2025 / Revised: 8 December 2025 / Accepted: 9 December 2025 / Published: 29 December 2025
(This article belongs to the Special Issue Federated Learning Strategies for Machine Learning)

Abstract

Over-the-air federated learning (AirFL) enables distributed model training across wireless edge devices, preserving data privacy and minimizing bandwidth usage. However, challenges such as channel noise, non-identically distributed data, limited computational resources, and small local datasets lead to distorted model updates, inconsistent global models, increased training latency, and overfitting, all of which reduce accuracy and efficiency. To address these issues, we propose the Semantic Anti-Interference Aggregation (SAIA) framework, which integrates a semantic autoencoder, component-wise median aggregation, validation-accuracy weighting, and data augmentation. First, a semantic autoencoder compresses model parameters into low-dimensional vectors, maintaining high signal quality and reducing communication costs. Second, component-wise median aggregation minimizes the impact of noise and outliers; it is well suited to AirFL because it avoids both the noise sensitivity of mean-based aggregation and the heavy computation of more complex methods. Third, validation-accuracy weighting aligns updates from non-identically distributed data to ensure consistent global models. Fourth, data augmentation doubles dataset sizes, mitigating overfitting and reducing variance. Experiments on MNIST demonstrate that SAIA achieves an accuracy of approximately 96% and a loss of 0.16, improving accuracy by 3.3% and reducing loss by 39% compared to conventional federated learning approaches. With reduced computational and communication overhead, SAIA ensures efficient training on resource-constrained IoT devices.

1. Introduction

The proliferation of Internet of Things (IoT) devices, projected to exceed 40 billion by 2030 [1], is driving unprecedented data generation in applications such as smart cities, smart grids, and industrial automation. These devices, embedded in urban infrastructure like traffic cameras and environmental sensors, generate massive volumes of data, often reaching terabytes daily. Efficiently processing this data is critical for applications like traffic optimization and environmental monitoring, offering substantial socioeconomic benefits [2]. However, conventional centralized machine learning faces significant challenges due to stringent privacy regulations and limited network bandwidth, which impede scalable and privacy-preserving data processing in distributed environments [3]. Federated learning (FL) addresses these challenges by enabling collaborative model training across devices while keeping data local, exchanging only model parameters [4]. Over-the-air federated learning (AirFL), an advanced FL variant, leverages wireless channel broadcasting to aggregate parameters, significantly reducing communication overhead and establishing itself as a cornerstone for edge computing and future 6G intelligent systems [5,6]. This paradigm is essential for enabling large-scale distributed learning in areas like intelligent transportation, where hierarchical FL frameworks may span both terrestrial and non-terrestrial networks [7]. Nevertheless, AirFL encounters significant challenges, including additive white Gaussian noise (AWGN), non-independent and identically distributed (non-i.i.d.) data, resource-constrained devices, and small local datasets, which collectively impact system performance.
Recent AirFL research advances communication efficiency, aggregation methods, and model flexibility. In communication, Abdullahi et al. proposed an OFDM-based scheme to boost transmission speed and reduce bit error rates [8]. Jin et al. countered eavesdropping with artificial noise [9], while Liu et al. reduced overhead using gradient compression [10]. Singh et al. improved dependability with MIMO technology [11], and Cui et al. enhanced signal alignment in noisy environments via CSI-based modulation [12]. Lan et al. mitigated interference through precoding techniques [13]. Further innovations in the physical layer, such as using Reconfigurable Intelligent Surfaces (RISs), offer new avenues for statistically combating interference and improving signal quality in AirFL [14]. For aggregation, Cao et al. accelerated convergence with hierarchical clustering, despite ongoing noise issues [15]. Phong et al. shortened convergence time using momentum-based methods [16], and Pakina et al. incorporated differential privacy, balancing data protection with minor performance trade-offs [17]. In flexibility, Lee et al. improved training stability with adaptive learning rates [18], Qiao et al. enhanced Fashion-MNIST accuracy through knowledge distillation [19], and Ma et al. improved imaging tasks via transfer learning [20]. Houssein et al. boosted CIFAR-10 accuracy with dynamic architectures, at the cost of extended training time [21]. The practical deployment of these advanced algorithms on hardware, such as on the diverse and resource-constrained devices that characterize the modern IoT landscape, underscores the critical need for lightweight and efficient solutions [22,23]. Zhou et al. developed a simulation framework to model varied channel conditions [24], Jiang et al. reduced overhead with sparse gradient updates, risking slower convergence in non-i.i.d. settings [25], and Wang et al. 
designed collaborative architectures for industrial IoT with adaptive resource allocation [26]. Though these innovations tackle targeted issues such as noise interference, inconsistent data, and inefficient operations, a cohesive framework that optimizes resistance to disruptions, scalability, and adaptability has yet to be developed.
Existing AirFL solutions often tackle specific issues, leaving gaps in achieving robust, scalable, and adaptable performance. For example, Abdullahi et al.’s OFDM-based scheme [8] boosts transmission speed but struggles with noise-induced distortions. Jin et al.’s artificial noise injection [9] enhances security yet overlooks non-i.i.d. data challenges. Liu et al.’s gradient compression [10] reduces overhead but slows convergence in diverse settings. Cao et al.’s hierarchical clustering [15] speeds up convergence but cannot fully address noise, while Houssein et al.’s dynamic architectures [21] improve accuracy at the cost of longer training. Phong et al.’s momentum-based methods [16] shorten convergence time but ignore resource constraints. Lee et al.’s adaptive learning rates [18] improve stability but fail to prevent overfitting on small datasets. Qiao et al.’s knowledge distillation [19] enhances Fashion-MNIST accuracy but lacks noise resilience. Cui et al.’s CSI-based modulation [12] improves signal alignment yet neglects data diversity. Zhou et al.’s simulation framework [24] models channel variations but offers no solutions for small datasets. Ma et al.’s transfer learning [20] boosts imaging accuracy but overlooks edge device limitations. Pakina et al.’s differential privacy [17] ensures data protection but degrades performance without addressing noise. Jiang et al.’s sparse updates [25] cut overhead but hinder convergence in non-i.i.d. cases. Lan et al.’s precoding [13] reduces interference but does not integrate with aggregation for diverse data. Wang et al.’s IoT architectures [26] enable clustering but fail to address noise, data variation, and resource constraints holistically. To overcome these limitations, we propose the Semantic Anti-Interference Aggregation (SAIA) framework, combining semantic encoding, median aggregation, adaptive weighting, and data augmentation. 
By enabling noise-resistant compression, mitigating the impact of outliers, and ensuring efficient convergence through targeted methods, SAIA delivers stable, scalable performance while strengthening both the theory and the real-world practice of AirFL.
Existing AirFL research targets individual challenges (channel noise, data heterogeneity, resource constraints, small local datasets) but lacks a comprehensive solution. To solve this, we propose the SAIA framework for robust, efficient, adaptive AirFL performance, with key contributions:
  • The proposed SAIA uses a semantic autoencoder to compress model parameters into low-dimensional representations—reducing noise and communication overhead while preserving accuracy. It also adopts server-side median aggregation to filter outliers from noise/data heterogeneity, boosting robustness and convergence vs. mean-based methods for scalable training in volatile environments.
  • For data heterogeneity and small local datasets, the SAIA applies a dual strategy (adaptive weighting + data augmentation). Accuracy-based adaptive weighting aligns local model updates to ensure consistent convergence; targeted augmentation expands effective dataset size for data-limited devices, mitigating overfitting for resource-constrained IoT.
  • The SAIA integrates privacy protection and computational efficiency: it transmits compressed semantic representations to lower model reconstruction risks for sensitive IoT, and its lightweight design cuts communication/computational loads for resource-limited devices.
The rest of the paper is structured as follows: Section 2 details the system model, Section 3 formulates the problem, Section 4 presents the proposed SAIA solution, Section 5 discusses experimental results, and Section 6 concludes with future research directions. All notations used in this paper are listed in Table 1.

2. System Model

The AirFL system, shown in Figure 1, supports collaborative training of a convolutional neural network (CNN) with parameters $w \in \mathbb{R}^d$ across $K$ edge devices, each indexed by $k \in \{1, \dots, K\}$. Each device holds a local dataset $D_k$ with $n_k = |D_k|$ samples of the form $(x_i, y_i)$, where $x_i \in \mathbb{R}^m$ is an input feature vector and $y_i \in \{1, \dots, C\}$ is a class label. Using over-the-air computation (OTAC), the system aggregates local model updates in a single uplink transmission by exploiting the wireless channel's superposition property, thereby reducing communication costs relative to standard federated learning methods [5]. Designed for IoT devices with limited computing and storage capacity [1], the system assumes an uplink channel affected by additive white Gaussian noise $n(t) \sim \mathcal{N}(0, \sigma^2 I)$, non-identical data distributions across devices [27], small local datasets that risk overfitting, and synchronized device participation with power control setting the channel gains to $h_k = 1$. Assuming a noise-free downlink and Gaussian uplink noise that is independent across devices and time, the SAIA framework refines the learning procedure to ensure reliable and efficient convergence, consistent with common AirFL assumptions [5].

2.1. Communication Network Architecture

The AirFL system relies on a wireless network designed for efficient model aggregation via over-the-air computation (OTAC) and comprises the following components:
  • Central Server: A robust ground station manages the distributed training. It initializes the global model parameters $w(0) \in \mathbb{R}^d$ using methods such as Xavier initialization to support steady convergence. During each communication round $t$, the server sends $w(t)$ to all clients over a downlink channel and gathers their updates via OTAC on the uplink, calculating the new parameters $w(t+1)$.
  • Clients: Each client $k$, typically an edge device such as an IoT sensor with limited computing and storage, holds a local dataset $D_k$. It trains a local CNN model with parameters $w_k(t) \in \mathbb{R}^d$ using stochastic gradient descent with momentum (SGDM) at set learning rates, momentum, batch sizes, and epochs, then sends its updates, or encoded versions of them, to the server via OTAC.
  • Wireless Channel: The network uses a shared wireless medium for uplink (client to server) and downlink (server to client) communications. OTAC enables clients to transmit updates simultaneously, combining signals via analog superposition; the received signal includes noise. The channel assumes equal gains ($h_k = 1$) via power control and minimal interference between clients, thanks to synchronized OTAC transmissions [5]. This setup isolates AWGN effects and facilitates testing of SAIA's semantic encoding and aggregation methods. In real IoT environments, non-orthogonal signals or channel fading can introduce distortion. Our work centers on algorithms, with plans to tackle interference using techniques such as beamforming or interference cancellation, as noted in Section 6. Channel gains $h_k \in \mathbb{R}$ are fixed at $h_k = 1$ through power control for balanced contributions. The uplink faces additive white Gaussian noise, modeled as $n(t) \sim \mathcal{N}(0, \sigma^2 I)$, with noise variance $\sigma^2$ and identity matrix $I \in \mathbb{R}^{d \times d}$. The downlink is assumed to be noise-free, achieved with high transmit power and reliable modulation such as quadrature amplitude modulation (QAM).
  • Communication Protocol: The system runs synchronously across $T$ rounds, with all $K$ clients participating in every round. The AirFL process, shown in Figure 1, proceeds as follows for each round $t \in \{0, \dots, T-1\}$:
    • Broadcast: The server sends the global model parameters $w(t) \in \mathbb{R}^d$ to all clients over a noise-free downlink, ensuring each starts round $t$ with the same model.
    • Local Training: Each client $k \in \{1, \dots, K\}$ trains a local CNN on its dataset $D_k$ using SGDM to reduce the local loss $F_k(w_k(t)) = \frac{1}{n_k}\sum_{i \in D_k} \ell(f(x_i; w_k(t)), y_i)$ (Equation (9)), generating updated parameters $w_k(t) \in \mathbb{R}^d$.
    • OTAC Transmission: Each client encodes its parameters into a compact form $z_k(t) = g_e(w_k(t); \theta_e) \in \mathbb{R}^{d'}$ (Equation (30)) and sends $z_k(t)$ simultaneously via OTAC, as depicted in Figure 1. The server receives the combined signal $y(t) = \sum_{k=1}^{K} z_k(t) + n(t)$, where $n(t) \sim \mathcal{N}(0, \sigma^2 I)$ (Equation (4)).
    • Aggregation: The server decodes the signal to estimate $\{\hat{z}_k(t)\}$, applies median aggregation $m(t) = \operatorname{median}(\{\hat{z}_k(t)\}_{k=1}^{K})$ (Equation (6)), calculates weights $\tilde{a}_k(t) = \frac{a_k(t)}{\sum_{j=1}^{K} a_j(t)}$ based on each client's local accuracy $a_k(t)$, and updates the global model as $w(t+1) = g_d\big(\sum_{k=1}^{K} \tilde{a}_k(t)\, m(t); \theta_d\big)$ (Equation (30)), as shown in Figure 1.
These steps rely on synchronized participation, consistent power control ($h_k = 1$), and no client dropouts [5]. OTAC reduces communication costs by aggregating updates in a single uplink transmission, unlike the sequential uploads of traditional federated learning. Equal channel gains ($h_k = 1$) simplify aggregation, but AWGN can degrade model accuracy. A noise-free downlink is practical, supported by strong error-correction coding and sufficient transmit power. The synchronous protocol ensures steady client involvement, though noise and uneven data distributions require aggregation beyond basic averaging. Within these conditions, the SAIA uses semantic encoding, median aggregation, adaptive weighting, and data augmentation to deliver reliable convergence and efficiency, as detailed in the protocol.
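The four protocol steps above can be sketched end to end as a toy simulation. The linear projection standing in for the semantic encoder, the emulated ISTA recovery, and the random accuracies are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Toy sketch of one synchronous AirFL round (assumptions: h_k = 1, AWGN
# uplink, noise-free downlink). The linear projection below is a stand-in
# for the semantic encoder g_e; ISTA recovery is emulated by adding small
# per-client noise to the encoded vectors.
rng = np.random.default_rng(0)
K, d, d_prime, sigma = 5, 100, 10, 0.1
P = rng.standard_normal((d_prime, d)) / np.sqrt(d)  # stand-in encoder matrix

def airfl_round(local_params):
    z = np.stack([P @ w for w in local_params])               # clients encode z_k
    y = z.sum(axis=0) + sigma * rng.standard_normal(d_prime)  # OTAC superposition + AWGN
    z_hat = z + sigma * rng.standard_normal(z.shape)          # emulated ISTA recovery
    m = np.median(z_hat, axis=0)                              # robust component-wise median
    acc = rng.uniform(0.8, 1.0, size=K)                       # local accuracies a_k
    weights = acc / acc.sum()                                 # normalized weights, sum to 1
    return weights.sum() * m                                  # weighted aggregate (= m here)

local_params = [rng.standard_normal(d) for _ in range(K)]
m = airfl_round(local_params)
```

The server would then decode `m` through $g_d$ to obtain the next global model; that step is omitted in this sketch.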

2.2. Mathematical Model

At communication round $t$, each client $k \in \{1, \dots, K\}$ updates its local parameters $w_k(t) \in \mathbb{R}^d$. The over-the-air computation (OTAC) received signal at the server is

$$y(t) = \sum_{k=1}^{K} h_k\, w_k(t) + n(t), \qquad (1)$$

where the channel gain $h_k = 1$ is achieved through reliable power control, which adjusts client transmission power to normalize the received signal strength and ensures an equal contribution from each client in OTAC aggregation. The noise term $n(t) \sim \mathcal{N}(0, \sigma^2 I)$, with $I \in \mathbb{R}^{d \times d}$ the identity matrix, represents AWGN with zero mean and variance $\sigma^2$, modeling independent random perturbations in each uplink channel dimension, consistent with standard wireless communication models. With $h_k = 1$, this simplifies to

$$y(t) = \sum_{k=1}^{K} w_k(t) + n(t). \qquad (2)$$

In traditional FedAvg, the global model update is

$$w(t+1) = \frac{1}{K}\, y(t) = \frac{1}{K}\sum_{k=1}^{K} w_k(t) + \frac{1}{K}\, n(t), \qquad (3)$$

where the noise term $\frac{1}{K} n(t) \sim \mathcal{N}\!\big(0, \frac{\sigma^2}{K^2} I\big)$ has per-dimension variance $\sigma^2/K^2$. The SAIA framework instead encodes the local parameters into compact semantic representations $z_k(t) \in \mathbb{R}^{d'}$ using an autoencoder and transmits these, so the server receives

$$y(t) = \sum_{k=1}^{K} z_k(t) + n(t), \qquad (4)$$

from which it estimates the individual semantic representations $\{\hat{z}_k(t)\}_{k=1}^{K}$ through iterative decoding, solving for each client $k$ the optimization problem

$$\hat{z}_k(t) = \arg\min_{z_k} \Big\| y(t) - \sum_{j=1}^{K} z_j \Big\|_2^2 + \lambda \|z_k\|_1, \qquad (5)$$

where $\lambda$ is a regularization parameter that stabilizes the solution and promotes sparsity in the semantic representation; the iterative process uses the Iterative Shrinkage-Thresholding Algorithm (ISTA) to refine the estimates of $z_k(t)$, ensuring accurate recovery despite noise. This is followed by median aggregation, which computes the component-wise median of the decoded semantic representations $\{\hat{z}_k(t)\}_{k=1}^{K}$: for each dimension $i \in \{1, \dots, d'\}$, the median $m_i(t)$ is the value such that at least half of the $\hat{z}_{k,i}(t)$ values are less than or equal to $m_i(t)$ and at least half are greater than or equal to $m_i(t)$, ensuring robustness to outliers:

$$m(t) = \operatorname{median}\big(\{\hat{z}_k(t)\}_{k=1}^{K}\big), \qquad (6)$$

followed by adaptive weighting to reconstruct the global model parameters $w(t+1)$ via decoding. Equation (1) formalizes the superposition of client updates, with the noise $n(t)$ affecting each dimension independently. In FedAvg (Equation (3)), the noise term disrupts convergence in high-dimensional models. The SAIA's semantic encoding (Equation (4)) reduces the dimensionality to $d' \ll d$, mitigating the noise impact. Median aggregation (Equation (6)) filters outliers, enhancing robustness to AWGN and client-specific deviations. The $L_1$ regularization in Equation (5) is crucial for accurate signal separation and is solved efficiently with ISTA. Assuming Gaussian noise with bounded variance, effective signal separation, and reliable decoding, the SAIA minimizes noise-induced errors, ensuring robust convergence through information-theoretic and statistical principles.
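The ISTA step referenced above solves an $L_1$-regularized least-squares problem by alternating a gradient step with soft-thresholding. The sketch below applies textbook ISTA to a generic linear model $y = Az + \text{noise}$ as a stand-in for the superposition model; the matrix sizes, noise level, and regularization weight are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1 (component-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, step, iters=300):
    """ISTA for min_z ||y - A z||_2^2 + lam * ||z||_1."""
    z = np.zeros(A.shape[1])
    for _ in range(iters):
        z = soft_threshold(z + step * A.T @ (y - A @ z), step * lam)
    return z

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 20)) / np.sqrt(40)   # generic mixing matrix
z_true = np.zeros(20)
z_true[:3] = [1.0, -2.0, 1.5]                     # sparse ground truth
y = A @ z_true + 0.01 * rng.standard_normal(40)   # noisy observation
step = 1.0 / np.linalg.norm(A, 2) ** 2            # 1/L step size for convergence
z_hat = ista(A, y, lam=0.01, step=step)
```

The soft-thresholding keeps inactive coordinates at exactly zero, which is what makes the recovered estimates robust to the superimposed noise.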

2.3. Training Objective

The global objective is to minimize the weighted average loss across all clients:

$$\min_{w} F(w) = \frac{1}{\sum_{k=1}^{K} n_k} \sum_{k=1}^{K} \sum_{i \in D_k} \ell\big(f(x_i; w), y_i\big), \qquad (7)$$

where $\ell(f(x_i; w), y_i)$ is the cross-entropy loss

$$\ell\big(f(x_i; w), y_i\big) = -\sum_{c=1}^{C} \mathbb{I}\{y_i = c\} \log f_c(x_i; w), \qquad (8)$$

with $f(x_i; w) \in \mathbb{R}^C$ the CNN's softmax output over $C$ classes. Each client optimizes its local loss

$$F_k(w_k) = \frac{1}{n_k} \sum_{i \in D_k} \ell\big(f(x_i; w_k), y_i\big). \qquad (9)$$

The SAIA framework's autoencoder minimizes the reconstruction loss

$$\min_{\theta_e, \theta_d} \mathbb{E}\big\| w_k(t) - g_d\big(g_e(w_k(t); \theta_e); \theta_d\big) \big\|_2^2, \qquad (10)$$

where $g_e(\cdot\,; \theta_e): \mathbb{R}^d \to \mathbb{R}^{d'}$ is the encoder and $g_d(\cdot\,; \theta_d): \mathbb{R}^{d'} \to \mathbb{R}^d$ is the decoder. The global loss (Equation (7)) weights contributions by dataset size, reflecting the distributed data structure. The cross-entropy loss (Equation (8)) penalizes incorrect predictions, encouraging accurate classification. Non-i.i.d. data distributions cause divergent local objectives (Equation (9)), with $F_k(w_k) \neq F_j(w_j)$, complicating global convergence. The autoencoder loss (Equation (10)) ensures that the semantic representations $z_k(t)$ capture essential model features, enhancing robustness to noise and aligning local updates across heterogeneous datasets. Assuming sufficient encoder capacity and bounded loss variance, the SAIA maximizes the mutual information $I(w_k(t); z_k(t))$, ensuring information preservation and alignment.

3. Problem Formulation

The AirFL system seeks to minimize the global loss function F ( w ) , defined in (7), using over-the-air computation (OTAC) for parameter aggregation. However, four challenges—channel noise, data heterogeneity, resource constraints, and small local datasets—hinder performance. These challenges, denoted as Problem 1 through Problem 4, collectively define the overarching problem: to develop an AirFL framework that simultaneously mitigates signal distortion due to channel noise, parameter divergence caused by non-i.i.d. data, the computational and communication limitations of client devices, and overfitting arising from small local datasets, thereby ensuring accurate and efficient global-model convergence in resource-constrained IoT settings.

3.1. Assumptions and Practical Considerations

To provide a rigorous theoretical foundation, our analysis in Section 4 relies on a set of idealized assumptions. It is crucial to distinguish these from the more complex and realistic conditions under which we evaluate our framework's performance via simulation in Section 5.
Theoretical Assumptions (for analysis):
  • Ideal Channel: We assume perfect power control ( h k = 1 ), a noise-free downlink, and an uplink affected only by Additive White Gaussian Noise (AWGN).
  • Full Synchronous Participation: We assume all K clients participate in every communication round without failure or dropout.
Practical Considerations (for simulation):
  • Non-Ideal Channel Model: Our simulations incorporate a more realistic channel with random client dropouts (simulating connection failure) and a simplified inter-user interference model to test the framework’s robustness.
  • Asynchronous Participation: The dropout model simulates an asynchronous environment where a subset of clients contributes to the global model in any given round.
This dual approach allows us to develop a tractable theoretical understanding of the system while empirically demonstrating the practical utility and resilience of SAIA against challenges not captured in the simplified model.
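A minimal sketch of the simulated dropout model might look as follows; the dropout probability and client count are illustrative assumptions, not the experimental settings of Section 5.

```python
import numpy as np

rng = np.random.default_rng(5)
K, rounds, p_drop = 10, 100, 0.2   # illustrative values

# Each round, every client independently drops out with probability p_drop;
# only the surviving clients' encoded updates are superposed that round.
participation = rng.random((rounds, K)) > p_drop   # True = client transmits
avg_participants = float(participation.sum(axis=1).mean())
```

On average about $(1 - p_{\text{drop}})K$ clients contribute per round, so the aggregation must remain stable under a varying, random subset of participants.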

3.2. Problem 1: Channel Noise

The AWGN term $n(t) \sim \mathcal{N}(0, \sigma^2 I)$ in the OTAC received signal (1) distorts the FedAvg global model update (3). To quantify this, we derive the expected squared error between the noisy update $w(t+1)$ and the ideal parameter average $\frac{1}{K}\sum_{k=1}^{K} w_k(t)$:

$$\mathbb{E}\Big\| w(t+1) - \frac{1}{K}\sum_{k=1}^{K} w_k(t) \Big\|_2^2 = \mathbb{E}\Big\| \frac{1}{K}\, n(t) \Big\|_2^2, \qquad (11)$$

since $w(t+1) = \frac{1}{K}\sum_{k=1}^{K} w_k(t) + \frac{1}{K} n(t)$. The squared $L_2$-norm is

$$\Big\| \frac{1}{K}\, n(t) \Big\|_2^2 = \sum_{i=1}^{d} \Big( \frac{n_i(t)}{K} \Big)^2, \qquad (12)$$

where $n_i(t) \sim \mathcal{N}(0, \sigma^2)$. The expectation is

$$\mathbb{E}\Big\| \frac{1}{K}\, n(t) \Big\|_2^2 = \sum_{i=1}^{d} \mathbb{E}\Big[ \Big( \frac{n_i(t)}{K} \Big)^2 \Big] = \sum_{i=1}^{d} \frac{1}{K^2}\, \mathbb{E}\big[(n_i(t))^2\big]. \qquad (13)$$

Since $\mathbb{E}[(n_i(t))^2] = \sigma^2$, we obtain

$$\mathbb{E}\Big[ \Big( \frac{n_i(t)}{K} \Big)^2 \Big] = \frac{\sigma^2}{K^2}, \qquad (14)$$

yielding

$$\mathbb{E}\Big\| \frac{1}{K}\, n(t) \Big\|_2^2 = \frac{\sigma^2 d}{K^2}. \qquad (15)$$

The noise impact is quantified by the signal-to-noise ratio (SNR), defined as $\mathrm{SNR} = 10 \log_{10} \frac{\mathbb{E}[\| \sum_{k=1}^{K} w_k(t) \|_2^2]}{\mathbb{E}[\| n(t) \|_2^2]} = 10 \log_{10} \frac{K \sigma_w^2}{\sigma^2}$, where $\sigma_w^2 \approx 0.1$ is the variance of the model parameters. For robust convergence, we target $\mathrm{SNR} \geq 20$ dB, corresponding to $\sigma^2 \leq K\sigma_w^2/100 = 0.005$ for typical CNNs with $K = 5$, ensuring minimal distortion in high-dimensional updates. The error scales with the dimensionality $d$, disrupting convergence, particularly for high-dimensional CNNs. The impact on convergence is

$$\mathbb{E}[F(w(t+1))] \leq F(w(t)) - \eta \|\nabla F(w(t))\|_2^2 + \eta\, \mathbb{E}\Big\| \nabla F(w(t)) - \frac{1}{K}\sum_{k=1}^{K} \nabla F_k(w_k(t)) \Big\|_2^2 + \frac{\eta^2 \sigma^2 d}{K^2}, \qquad (16)$$

where $\eta$ is the learning rate, i.e., the step size for gradient updates in stochastic gradient descent, and the noise term $\frac{\eta^2 \sigma^2 d}{K^2}$ slows convergence. The gradient divergence term is bounded as

$$\mathbb{E}\Big\| \nabla F(w(t)) - \frac{1}{K}\sum_{k=1}^{K} \nabla F_k(w_k(t)) \Big\|_2^2 \leq \frac{1}{K}\sum_{k=1}^{K} L^2\, \mathbb{E}\big\| w_k(t) - w(t) \big\|_2^2 \leq L^2 \sigma_\delta^2 d, \qquad (17)$$

where $L$ is the Lipschitz constant of the gradient $\nabla F_k$, ensuring smoothness of the loss function and typically estimated from the CNN architecture, and $\sigma_\delta^2 \leq \beta D_{\mathrm{KL}}$, with $\beta$ a constant scaling factor reflecting the moments of the data distributions, typically determined empirically. The SAIA mitigates noise through semantic encoding, reducing the dimensionality to $d' \ll d$ (Equation (4)), and median aggregation:

$$\mathbb{E}\Big\| m(t) - \frac{1}{K}\sum_{k=1}^{K} z_k(t) \Big\|_1 \leq \sigma d' \sqrt{\frac{\pi}{2K}}, \qquad (18)$$

outperforming mean aggregation's $\sigma d \sqrt{\frac{\pi}{2K}}$ under Gaussian noise, as the median minimizes the $L_1$-error for heavy-tailed distributions, ensuring robust convergence [5]. The robustness of median aggregation is argued as follows: for a set of noisy estimates $\{\hat{z}_k(t)\}_{k=1}^{K}$, the median $m(t)$ minimizes the $L_1$-norm $\sum_{k=1}^{K} \| \hat{z}_k(t) - m(t) \|_1$, reducing the impact of outliers compared to the mean, which minimizes the $L_2$-norm. Under Gaussian noise $n(t) \sim \mathcal{N}(0, \sigma^2 I)$, the expected $L_1$-error is bounded by $\sigma d' \sqrt{\pi/(2K)}$ owing to the lower dimensionality $d'$ and the median's robustness to heavy-tailed distributions [5]. The error in (15) arises solely from the noise term, as the client parameters cancel out in the difference. The $L_2$-norm in (12) measures the Euclidean deviation across $d$ dimensions. The variance reduction by $1/K^2$ per dimension reflects the averaging effect, but the total error grows with $d$, which is large for CNNs. This noise perturbation disrupts convergence, necessitating robust aggregation methods to filter distortions. Assuming independent Gaussian noise, the SAIA's encoding and aggregation minimize the noise impact, ensuring stable updates.
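The averaged-noise error derived above and the outlier robustness of the component-wise median can both be checked with a short Monte Carlo sketch; the dimensions and noise levels below are illustrative, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)
K, d, sigma, trials = 5, 50, 0.3, 2000

# Monte Carlo estimate of E||n(t)/K||_2^2 for n(t) ~ N(0, sigma^2 I).
errs = [np.sum((sigma * rng.standard_normal(d) / K) ** 2) for _ in range(trials)]
empirical = float(np.mean(errs))
theoretical = sigma ** 2 * d / K ** 2        # sigma^2 d / K^2, as derived above

# Component-wise median vs. mean when one client's update is an outlier.
z = np.zeros((K, d))
z[0] = 10.0                                  # a single corrupted client
mean_agg = z.mean(axis=0)                    # dragged to 2.0 in every component
median_agg = np.median(z, axis=0)            # stays at the true value 0
```

The median discards the single corrupted update entirely, while the mean absorbs a fifth of it into every coordinate, which is the behavior the bound above formalizes.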

3.3. Problem 2: Data Heterogeneity

Non-i.i.d. data distributions across clients cause local models to optimize divergent objectives, as the local loss $F_k(w_k)$ in (9) depends on the client-specific distribution $P_k(x, y) = \frac{1}{n_k}\sum_{i \in D_k} \mathbb{I}\{(x_i, y_i) = (x, y)\}$. We quantify heterogeneity using the Kullback–Leibler (KL) divergence:

$$D_{\mathrm{KL}}(P_k \,\|\, P_j) = \sum_{x, y} P_k(x, y) \log \frac{P_k(x, y)}{P_j(x, y)}, \qquad (19)$$

where a larger $D_{\mathrm{KL}}$ increases $\sigma_\delta^2$ via $\sigma_\delta^2 \leq \beta D_{\mathrm{KL}}$, with $\beta$ a constant scaling factor reflecting the moments of the data distributions, typically determined empirically. The parameter variance across clients is

$$\mathrm{Var}(w_k(t)) = \mathbb{E}\big\| w_k(t) - \mathbb{E}[w_k(t)] \big\|_2^2, \qquad (20)$$

with $\mathbb{E}[w_k(t)] = \frac{1}{K}\sum_{k=1}^{K} w_k(t)$. Modeling the local parameters as $w_{k,i}(t) = w_i(t) + \delta_{k,i}(t)$, where $\delta_{k,i}(t)$ is a zero-mean deviation with variance $\sigma_\delta^2$, the variance per dimension is

$$\mathrm{Var}\big(w_{k,i}(t) - \bar{w}_i(t)\big) = \frac{K-1}{K}\, \sigma_\delta^2, \qquad (21)$$

summing to

$$\mathrm{Var}(w_k(t)) = d \cdot \frac{K-1}{K}\, \sigma_\delta^2. \qquad (22)$$

The convergence impact is

$$\mathbb{E}[F(w(t+1))] \leq F(w(t)) - \eta \|\nabla F(w(t))\|_2^2 + \eta\, \mathbb{E}\Big\| \nabla F(w(t)) - \frac{1}{K}\sum_{k=1}^{K} \nabla F_k(w_k(t)) \Big\|_2^2 + \eta^2 d\, \frac{K-1}{K}\, \sigma_\delta^2, \qquad (23)$$

where the divergence term scales with $\sigma_\delta^2$. Assuming Lipschitz continuity of $\nabla F_k$ with constant $L$, the gradient divergence is bounded by $\mathbb{E}\| \nabla F_k(w_k(t)) - \nabla F(w(t)) \|_2^2 \leq L^2\, \mathbb{E}\| w_k(t) - w(t) \|_2^2 \leq L^2 \sigma_\delta^2 d$, linking to Equation (22). The SAIA's encoding aligns updates by maximizing the mutual information $I(w_k(t); z_k(t))$, and adaptive weighting prioritizes reliable clients:

$$\mathbb{E}\Big\| \sum_{k=1}^{K} \tilde{a}_k(t)\, m(t) - \frac{1}{K}\sum_{k=1}^{K} z_k(t) \Big\|_2^2 \leq \sum_{k=1}^{K} \tilde{a}_k(t)\, \mathbb{E}\big\| m(t) - z_k(t) \big\|_2^2, \qquad (24)$$

where $\tilde{a}_k(t) = \frac{a_k(t)}{\sum_{j=1}^{K} a_j(t)}$ and $a_k(t)$ is client $k$'s local accuracy, reducing divergence by weighting updates inversely to $D_{\mathrm{KL}}$, assuming bounded $D_{\mathrm{KL}}$ [27]. The KL divergence (19) measures the distributional disparity that drives divergent gradients $\nabla F_k(w_k) \neq \nabla F_j(w_j)$. The variance (22) quantifies the parameter dispersion due to non-i.i.d. data, with $\sigma_\delta^2$ increasing with $D_{\mathrm{KL}}$. The factor $\frac{K-1}{K}$ arises from independent client deviations, and the error scales with $d$. This model drift slows convergence, requiring methods to align local updates. Assuming non-i.i.d. distributions with bounded divergence, the SAIA's encoding and weighting ensure alignment, mitigating heterogeneity.
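As an illustration of the two quantities used here, the sketch below computes the KL divergence between two discrete label distributions and the normalized accuracy weights. Both helpers (`kl_divergence`, `accuracy_weights`) are minimal stand-ins, not the paper's implementation.

```python
import numpy as np

def kl_divergence(p, q):
    """Eq.-(19)-style KL divergence between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                    # terms with p(x,y) = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def accuracy_weights(accs):
    """Normalized weights a_k / sum_j a_j used in adaptive aggregation."""
    accs = np.asarray(accs, float)
    return accs / accs.sum()
```

A client whose label distribution is skewed relative to another's yields a strictly positive divergence, and the weights always sum to one so the aggregate stays a convex combination.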

3.4. Problem 3: Resource Constraints

Clients face computational and communication limitations. The training cost for client $k$ is

$$\mathrm{Cost}_k = E \cdot n_k \cdot C, \qquad (25)$$

where $E$ is the number of epochs, $n_k$ is the dataset size, and $C$ is the per-sample cost, which scales with $d$ due to CNN complexity. Communication involves transmitting $w_k(t) \in \mathbb{R}^d$, with cost proportional to $d$. Equation (25) models the training effort, with $C$ reflecting operations such as convolutions, proportional to $d$. Large $E$ or $n_k$ strains limited client resources (e.g., CPU, battery). The communication cost grows with $d$, challenging bandwidth-constrained devices. These constraints limit model complexity and communication frequency, necessitating efficient algorithms. For a CNN, $C \propto d$, and the communication cost is $O(d)$. The SAIA reduces this to $O(d')$, where $d' \ll d$, ensuring scalability under constrained resources [1].
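To make the $O(d)$ versus $O(d')$ comparison concrete, the following back-of-the-envelope sketch counts the real values sent over the uplink per round; the dimensions are illustrative assumptions, not the paper's experimental configuration.

```python
# Per-round uplink payloads, counted in transmitted real values.
d, d_prime, K = 100_000, 1_000, 5

raw_floats = K * d        # sequential FL: each of the K clients uploads w_k in turn
otac_floats = d           # OTAC: K analog transmissions superpose into one d-dim signal
saia_floats = d_prime     # SAIA: superposed low-dimensional semantic vectors z_k
compression_ratio = d // d_prime
```

Even relative to plain OTAC, the semantic bottleneck shrinks the uplink payload by the factor $d/d'$, which is where the communication saving claimed above comes from.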

3.5. Problem 4: Small Local Datasets

Small local datasets (small $n_k$) increase the risk of overfitting, where the empirical loss $\hat{F}_k(w_k) = \frac{1}{n_k}\sum_{i \in D_k} \ell(f(x_i; w_k), y_i)$ deviates from the true loss $F_k(w_k)$. The variance is

$$\mathbb{E}\big[ \big( \hat{F}_k(w_k) - F_k(w_k) \big)^2 \big] = \frac{\mathrm{Var}(\ell)}{n_k}, \qquad (26)$$

where high variance causes overfitting. The SAIA's data augmentation increases the effective dataset size:

$$n_k^{\mathrm{eff}} = \alpha\, n_k, \qquad \mathbb{E}\big[ \big( \hat{F}_k(w_k) - F_k(w_k) \big)^2 \big] = \frac{\mathrm{Var}(\ell)}{\alpha\, n_k}, \qquad (27)$$

where $\alpha > 1$ (e.g., $\alpha \approx 2$ for rotations and translations); assuming the transformations preserve the data distribution, this reduces variance and enhances generalization.
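The $1/(\alpha n_k)$ variance reduction can be checked numerically. The sketch below treats per-sample losses as i.i.d. unit-variance draws (a simplifying assumption) and compares the variance of the empirical mean before and after doubling the effective sample count.

```python
import numpy as np

rng = np.random.default_rng(3)

def empirical_loss_var(n_eff, trials=4000):
    """Variance of the mean of n_eff i.i.d. unit-variance per-sample losses."""
    means = rng.standard_normal((trials, n_eff)).mean(axis=1)
    return float(means.var())

v_small = empirical_loss_var(n_eff=50)    # n_k = 50 local samples
v_aug = empirical_loss_var(n_eff=100)     # alpha = 2 doubles the effective count
```

Doubling the effective sample count halves the variance of the empirical loss, matching the $\alpha = 2$ case of the equation above; real augmented samples are correlated with their originals, so in practice the reduction is somewhat smaller.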

4. Proposed Solution

The SAIA framework addresses the challenges of channel noise, data heterogeneity, resource constraints, and small local datasets in AirFL through a combination of neural network optimization and a robust aggregation pipeline. The SAIA integrates a semantic autoencoder, data augmentation, and an iterative aggregation strategy to enhance robustness and efficiency.
In this section, we detail the neural network architecture, optimization methods, and the SAIA algorithm, providing mathematical formulations to address each challenge. We include derivations of SAIA's convergence under the following assumptions: (1) sufficient encoder capacity, (2) effective data augmentation, (3) reliable client-accuracy metrics, and (4) bounded noise and heterogeneity. Additionally, the SAIA's semantic encoding reduces the risk of model inversion attacks by transmitting compressed representations $z_k(t) \in \mathbb{R}^{d'}$, lowering information leakage compared to the raw parameters $w_k(t) \in \mathbb{R}^d$. The data-processing inequality bounds the reconstructible information by $I(w_k(t); z_k(t)) \leq H(z_k(t))$, which the low-dimensional bottleneck keeps small, enhancing privacy in sensitive applications like smart grids [17].

4.1. Neural Network Optimization Architecture

The global and local models are convolutional neural networks (CNNs) with a semantic autoencoder, as illustrated in Figure 2, tailored for classification on datasets like MNIST or CIFAR-10. The architecture optimizes representational capacity while ensuring robustness to noise and data heterogeneity:
  • Input Layer: Normalizes the input $x_k \in \mathbb{R}^m$ to [0, 1].
  • Convolutional Layer 1 (Conv1): Applies 32 filters ( 3 × 3 ), stride 1, same padding, batch normalization, and ReLU.
  • Pooling Layer 1 (Pool1): 2 × 2 max-pooling, stride 2, reducing spatial dimensions. No parameters.
  • Convolutional Layer 2 (Conv2): Applies 64 filters ( 3 × 3 ), stride 1, same padding, batch normalization, and ReLU.
  • Pooling Layer 2 (Pool2): 2 × 2 max-pooling, stride 2.
  • Semantic Autoencoder Module: First, the encoder flattens the Pool2 output and maps it to a low-dimensional bottleneck via a fully connected layer with ReLU, yielding $z_k(t) \in \mathbb{R}^{d'}$; second, the decoder maps $z_k(t)$ back to the flattened feature space, followed by reshaping and symmetric deconvolutional layers.
  • Output Layer: Maps the bottleneck vector to C units with softmax.
The autoencoder minimizes the reconstruction loss:

$$\min_{\theta_e, \theta_d} \; \mathbb{E}\left\| w_k^{(t)} - g_d\big(g_e(w_k^{(t)}; \theta_e); \theta_d\big) \right\|_2^2, \tag{28}$$

where $g_e(\cdot; \theta_e): \mathbb{R}^d \to \mathbb{R}^{d'}$ and $g_d(\cdot; \theta_d): \mathbb{R}^{d'} \to \mathbb{R}^d$. The CNN extracts hierarchical features, with convolutions capturing local patterns and pooling reducing dimensionality. The autoencoder’s bottleneck creates a compact, noise-robust representation $z_k^{(t)}$, mitigating channel noise (15) by reducing dimensionality ($d' \ll d$). The reconstruction loss (28) ensures fidelity, while the semantic layer aligns features across non-i.i.d. datasets, addressing heterogeneity (19). It is critical to note that the autoencoder is pretrained centrally on a proxy dataset of model parameters before the federated learning process commences. The learned encoder and decoder weights, $\theta_e$ and $\theta_d$, are then frozen and distributed to all clients once at the beginning. During the federated training rounds, these parameters are not updated and are never transmitted over the wireless channel. The significant communication saving arises because each client transmits only the low-dimensional semantic vector $z_k^{(t)} \in \mathbb{R}^{d'}$, which is orders of magnitude smaller than the full parameter vector $w_k^{(t)} \in \mathbb{R}^{d}$. Assuming sufficient encoder capacity, the encoder maximizes the mutual information $I(w_k^{(t)}; z_k^{(t)})$, with reconstruction error bounded by

$$\mathbb{E}\left\| w_k^{(t)} - \hat{w}_k^{(t)} \right\|_2^2 \le \epsilon, \qquad \epsilon \propto \sigma_w^2 d', \tag{29}$$

where $\sigma_w^2$ is estimated from normalized CNN weights, reflecting typical variance in trained model parameters, and $d' \ll d$, addressing Problems 1 and 2. A critical consideration in our aggregation strategy is the semantic alignment of the latent vectors $z_k$ generated by different clients. A naive coordinate-wise aggregation, such as the median, assumes that corresponding dimensions of these vectors share a similar semantic meaning. We argue that our framework naturally encourages this alignment through two key mechanisms. First, all clients utilize a shared, frozen decoder ($g_d$) to reconstruct the model parameters. For the aggregated vector to produce a functional global model, it must reside in a meaningful region of the latent space, incentivizing clients to produce compatible representations. Second, the synchronization from a common global model $w^{(t)}$ at the start of each round constrains the local update trajectories. This ensures that the client models do not diverge arbitrarily, keeping their encoded representations within a localized and well-structured manifold. While not a guarantee of perfect alignment, these factors provide a strong inductive bias that makes coordinate-wise operations a robust and efficient heuristic, a claim we validate empirically in the ablation study of Section 5.11.
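To make the compress–transmit–reconstruct flow concrete, the following is a minimal linear stand-in for $g_e$/$g_d$. The paper’s autoencoder uses fully connected layers with ReLU pretrained on a proxy dataset; the frozen orthonormal projection here, and the assumption that the parameters lie near a $d'$-dimensional subspace, are our simplifications for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_prime = 4096, 256           # full vs. bottleneck dimension (illustrative)

# Hypothetical "pretrained, frozen" linear encoder/decoder pair: an
# orthonormal basis B makes g_e a projection and g_d its transpose.
B, _ = np.linalg.qr(rng.standard_normal((d, d_prime)))  # shape (d, d')

def g_e(w):                      # encoder: R^d -> R^d'
    return B.T @ w

def g_d(z):                      # decoder: R^d' -> R^d
    return B @ z

# Simulate parameters lying near the learned d'-dimensional manifold.
w = B @ rng.standard_normal(d_prime) + 0.01 * rng.standard_normal(d)

z = g_e(w)                       # only d' values are transmitted per client
w_hat = g_d(z)                   # server-side reconstruction

rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
print(f"compression d/d' = {d // d_prime}x, relative error = {rel_err:.3f}")
```

The reconstruction error stays small precisely because the encoder was (by assumption) trained so that realistic parameter vectors concentrate near its bottleneck subspace; parameters far from that manifold would reconstruct poorly.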

4.2. Optimization Methods

The SAIA employs three optimization techniques integrated into training and aggregation:
  • Semantic Encoding and Decoding: The encoder maps parameters to a semantic representation:

    $$z_k^{(t)} = g_e\big(w_k^{(t)}; \theta_e\big), \tag{30}$$

    and the decoder reconstructs:

    $$\hat{w}_k^{(t)} = g_d\big(z_k^{(t)}; \theta_d\big). \tag{31}$$

    This reduces communication cost (25) by transmitting $z_k^{(t)}$ and enhances noise robustness (15). The encoding maximizes mutual information, with error bounded by Equation (29), addressing Problems 1 and 2.
  • Data Augmentation: Clients apply transformations to increase the effective dataset size:

    $$n_k^{\mathrm{eff}} = \alpha n_k, \tag{32}$$

    reducing variance:

    $$\mathbb{E}\left\| \hat{F}_k(w_k) - F_k(w_k) \right\|^2 = \frac{\mathrm{Var}(\cdot)}{\alpha n_k}, \tag{33}$$

    where $\alpha > 1$ (e.g., $\alpha \approx 2$ for rotations and translations), addressing Problem 4, assuming the transformations preserve the data distribution.
  • Robust Aggregation: The SAIA uses iterative decoding, median aggregation, and adaptive weighting, formalized in Algorithm 1. Recovering the individual vectors $\{z_k^{(t)}\}$ from the superimposed signal $y^{(t)}$ in Equation (4) is a well-studied problem in compressed sensing and modern multi-user communication [28,29]. Our approach operates under the assumption that the semantic vectors $z_k^{(t)}$ exhibit sufficient sparsity, a common and desirable property of compressed latent representations produced by autoencoders, since sparsity encourages disentangled and efficient features [30]. Under this assumption, Equation (5) becomes a standard $L_1$-regularized least-squares problem, which algorithms such as the Iterative Shrinkage-Thresholding Algorithm (ISTA) are designed to solve efficiently. We therefore employ an ISTA-based method as the iterative decoder. This choice lets us rely on established signal-processing techniques for physical-layer recovery and focus on the end-to-end performance and robustness of the semantic-aware federated learning framework. Median aggregation minimizes the $L_1$-error (Equation (18)), and adaptive weighting reduces divergence (Equation (24)), addressing Problems 1 and 2, assuming reliable client-accuracy metrics.
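As a sketch of the decoding step, the following is a textbook ISTA solver for the $L_1$-regularized least-squares problem. The measurement matrix, dimensions, sparsity level, and regularization weight are illustrative choices, not the paper’s OTAC configuration.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (component-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||A z - y||_2^2 + lam * ||z||_1 via ISTA."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z - (A.T @ (A @ z - y)) / L, lam / L)
    return z

# Toy recovery problem: a sparse semantic vector observed through a noisy
# linear channel (dimensions are illustrative).
rng = np.random.default_rng(1)
m, n, k = 80, 200, 5                     # measurements, dimension, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)
z_true = np.zeros(n)
z_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k) + 3.0
y = A @ z_true + 0.01 * rng.standard_normal(m)

z_hat = ista(A, y, lam=0.05)
print("relative error:", np.linalg.norm(z_hat - z_true) / np.linalg.norm(z_true))
```

The shrinkage step is what enforces sparsity: coordinates whose gradient update stays below the threshold are driven exactly to zero, which is why ISTA-type decoders suit compressed latent representations.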
Semantic encoding (30) compresses parameters, reducing noise impact and communication overhead. Data augmentation (32) lowers variance (33), enhancing generalization for small datasets. The aggregation pipeline, detailed below, filters noise and aligns divergent updates. The SAIA’s convergence is bounded by

$$\mathbb{E}\big[ F(w^{(t+1)}) \big] \le F(w^{(t)}) - \eta \left\| \nabla F(w^{(t)}) \right\|_2^2 + \frac{\eta^2 \sigma^2 d'}{K} + \frac{d'}{K} \cdot \frac{K-1}{K} \sigma_\delta^2 + \eta \epsilon, \tag{34}$$

where $d' \ll d$ and $\epsilon$ is the reconstruction error, ensuring faster convergence than FedAvg under the stated assumptions.
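A small numerical sketch shows why the component-wise median is preferred over the mean in this pipeline. The latent dimension, noise scale, and accuracy values below are illustrative, not the paper’s experimental settings; one client is corrupted to mimic a noisy or Byzantine update.

```python
import numpy as np

rng = np.random.default_rng(2)
K, d_prime = 5, 8

# Honest clients cluster around a shared latent vector; client 5 is corrupted.
z_star = rng.standard_normal(d_prime)
z_hats = np.stack([z_star + 0.05 * rng.standard_normal(d_prime)
                   for _ in range(K)])
z_hats[4] += 10.0                              # outlier / Byzantine update

mean_agg = z_hats.mean(axis=0)                 # mean is dragged by the outlier
median_agg = np.median(z_hats, axis=0)         # component-wise median resists it

# Validation-accuracy weights (Algorithm 1, step 18); values are illustrative.
acc = np.array([0.97, 0.96, 0.98, 0.95, 0.40])
a_tilde = acc / acc.sum()                      # normalized, sums to one

err_mean = np.linalg.norm(mean_agg - z_star)
err_median = np.linalg.norm(median_agg - z_star)
print(f"mean error {err_mean:.2f} vs median error {err_median:.2f}")
```

The mean inherits roughly $10/K$ of the corruption per coordinate, while the median ignores it entirely as long as a majority of clients are honest.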
Algorithm 1 Semantic Anti-Interference Aggregation (SAIA).
 1: Input: Global parameters $w^{(0)} \in \mathbb{R}^d$, datasets $\{D_k\}_{k=1}^K$, rounds $T$, noise variance $\sigma^2$
 2: Server-side Precomputation:
 3:   Pretrain autoencoder $(g_e, g_d)$ on a proxy dataset; freeze parameters $\theta_e, \theta_d$
 4:   Distribute frozen $g_e, g_d$ to all clients (once)
 5: Federated Training:
 6:   Initialize $w^{(0)}$ (e.g., Xavier initialization)
 7: for $t = 0$ to $T-1$ do
 8:   Broadcast $w^{(t)}$ to all clients
 9:   for each client $k = 1$ to $K$ in parallel do
10:     Augment $D_k$ (e.g., rotations, translations)
11:     Train local CNN to obtain $w_k^{(t)}$ (SGDM)
12:     Encode: $z_k^{(t)} \leftarrow g_e(w_k^{(t)}; \theta_e)$
13:     Transmit $z_k^{(t)}$ via OTAC
14:   end for
15:   Receive: $y^{(t)} \leftarrow \sum_{k=1}^{K} z_k^{(t)} + n^{(t)}$
16:   Decode iteratively (e.g., using ISTA) to estimate $\{\hat{z}_k^{(t)}\}$
17:   Median aggregation: $m^{(t)} \leftarrow \mathrm{median}\big(\{\hat{z}_k^{(t)}\}_{k=1}^K\big)$
18:   Compute weights: $\tilde{a}_k^{(t)} \leftarrow a_k^{(t)} / \sum_{j=1}^{K} a_j^{(t)}$, where $a_k^{(t)}$ is client accuracy
19:   Update: $w^{(t+1)} \leftarrow g_d\big(\sum_{k=1}^{K} \tilde{a}_k^{(t)} m^{(t)}; \theta_d\big)$
20: end for
21: return $w^{(T)}$
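One round of Algorithm 1 can be sketched end-to-end as follows. The linear encoder/decoder, the dimensions and noise levels, the stubbed local training, and especially the stubbed iterative decoder (which simply returns noisy per-client estimates instead of running a real recovery on $y^{(t)}$) are all illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
K, d, d_prime, sigma2 = 5, 1024, 64, 0.3

# Frozen linear stand-ins for the pretrained encoder/decoder g_e, g_d.
B, _ = np.linalg.qr(rng.standard_normal((d, d_prime)))
g_e = lambda w: B.T @ w
g_d = lambda z: B @ z

w_global = 0.1 * rng.standard_normal(d)
# Local training stub: each client slightly perturbs the broadcast model.
w_locals = [w_global + 0.02 * rng.standard_normal(d) for _ in range(K)]

z = [g_e(w) for w in w_locals]                               # step 12
y = np.sum(z, axis=0) + np.sqrt(sigma2) * rng.standard_normal(d_prime)  # step 15

# Stand-in for the iterative decoder (step 16): assume it returns per-client
# estimates up to residual noise; a real system would run ISTA-style recovery
# on the superimposed signal y.
z_hat = [zk + 0.05 * rng.standard_normal(d_prime) for zk in z]

m = np.median(np.stack(z_hat), axis=0)                       # step 17
acc = np.array([0.96, 0.97, 0.95, 0.98, 0.96])               # illustrative
a_tilde = acc / acc.sum()                                    # step 18
# Step 19: because the normalized weights sum to one, the weighted sum of the
# shared median vector reduces to the median itself before decoding.
w_next = g_d(m)

print("next global model shape:", w_next.shape)
```

Note that, as written in step 19, the normalized weights applied to a single shared median collapse to the median; the weights matter when client estimates are combined individually, as in the ablation of Section 5.11.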

4.3. SAIA Algorithm

The SAIA framework integrates the above components into a cohesive algorithm, with key experimental parameters listed in Table 2.
The algorithm leverages semantic encoding to transmit the low-dimensional $z_k^{(t)}$, reducing noise (15) and communication cost (25). Data augmentation mitigates small-dataset issues (26). Median aggregation (6) filters noise outliers, and adaptive weighting prioritizes reliable clients, addressing heterogeneity (19). Iterative decoding recovers the individual $z_k^{(t)}$, enhancing robustness. The SAIA algorithm ensures convergence under the stated assumptions, as shown in Equation (34).
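Step 10 of Algorithm 1 augments each local dataset before training. A minimal numpy sketch that doubles the effective dataset size ($\alpha \approx 2$) is shown below; the $\pm 1$-pixel translations are our illustrative choice, and the paper also mentions small rotations.

```python
import numpy as np

def augment(images):
    """Double the effective dataset (alpha ~= 2) with label-preserving
    random +/-1 pixel translations (Algorithm 1, step 10)."""
    rng = np.random.default_rng(4)
    shifted = np.stack([
        np.roll(img, shift=tuple(rng.integers(-1, 2, size=2)), axis=(0, 1))
        for img in images
    ])
    return np.concatenate([images, shifted], axis=0)

batch = np.zeros((32, 28, 28), dtype=np.float32)   # e.g., normalized MNIST digits
augmented = augment(batch)
print(batch.shape, "->", augmented.shape)          # (32, 28, 28) -> (64, 28, 28)
```

Because the transformations are label-preserving, the augmented set is drawn from (approximately) the same distribution, which is the assumption behind the variance reduction in Equation (33).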

4.4. Complexity Analysis

The SAIA’s computational and communication complexity ensures scalability for resource-constrained devices (Problem 3). For each client, the encoder $g_e(\cdot; \theta_e)$ maps $w_k^{(t)} \in \mathbb{R}^d$ to $z_k^{(t)} \in \mathbb{R}^{d'}$ via a fully connected layer, with complexity $O(d \cdot d')$. Local training uses SGDM with complexity $O(E \cdot n_k \cdot d)$, where $E$ is the number of epochs and $n_k$ is the dataset size. Communication involves transmitting $z_k^{(t)}$, with cost $O(d')$, compared to $O(d)$ for FedAvg. At the server, iterative decoding estimates $\{\hat{z}_k^{(t)}\}$ with complexity $O(K \cdot d')$, median aggregation requires sorting with complexity $O(K \cdot d' \cdot \log K)$, and decoding to $w^{(t+1)}$ has complexity $O(d' \cdot d)$. The total per-round complexity is dominated by local training and encoding/decoding, remaining feasible for IoT devices, as validated in Section 5.
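For illustration, the $O(d)$ versus $O(d')$ transmission costs can be made concrete with the parameter counts reported in Section 5 (Conv1 + Conv2 + Encoder + Output) and a bottleneck of $d' = 256$; the float32 payload assumption is ours, and the exact on-air encoding may differ.

```python
# Back-of-envelope per-client, per-round communication cost, using the layer
# sizes reported in Section 5 and d' = 256. Float32 payloads are assumed.
d = 320 + 18_432 + 803_072 + 2_570     # full CNN parameter count
d_prime = 256                           # semantic bottleneck dimension

full_kb = d * 4 / 1024                  # O(d) transmit cost (FedAvg-style)
semantic_kb = d_prime * 4 / 1024        # O(d') transmit cost (SAIA)
print(f"{full_kb:.0f} KB vs {semantic_kb:.0f} KB (ratio ~ {d / d_prime:.0f}x)")
```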

5. Experimental Results and Discussion

This section presents the experimental evaluation of the SAIA framework in AirFL using the MNIST dataset ($m = 784$, $C = 10$) with $K = 5$ clients under AWGN ($\sigma^2 = 0.3$), following the system model in Section 2. The dataset is subsampled to 1000 images (700 training, 300 validation). To ensure the reliability and statistical significance of our findings, all experiments were conducted 5 times using different random seeds {0, 42, 123, 2024, 9999}, and results are reported as ‘mean ± standard deviation’. Our code and dataset splits are available to ensure reproducibility. The global model, a CNN with a semantic autoencoder (Conv1: 32 filters, 320 parameters; Conv2: 64 filters, 18,432 parameters; Encoder: 803,072 parameters; Output: 2570 parameters), is trained for $T = 10$ rounds, with each client performing local updates using SGDM (10 epochs, learning rate 0.01, momentum 0.9, batch size 32). The SAIA is compared with FedAvg in terms of accuracy, loss, client consistency, parameter updates, computational overhead, and classification performance. We clarify that the primary goal of this research is to validate the effectiveness of the SAIA protocol in resisting Byzantine attacks and communication interference. The choice of MNIST allows for high iteration efficiency and accurate isolation of the protocol’s robustness mechanisms, which aligns with the resource constraints of our target IoT devices.

5.1. Noise Sensitivity Analysis

To evaluate the SAIA’s robustness to varying noise levels (Problem 1), experiments with $\sigma^2 \in \{0.1, 0.3, 0.5\}$ were conducted. Table 3 presents the validation accuracy for the SAIA and FedAvg, demonstrating the SAIA’s superior performance across noise levels.
Figure 3 shows the accuracy over rounds for different noise levels. As noise increases, accuracy decreases, but the SAIA maintains relatively high performance, stabilizing around 0.95–0.97, demonstrating its noise robustness via semantic encoding and median aggregation.
Figure 4 shows the loss over rounds for different noise levels. Higher noise leads to increased loss, but the SAIA shows lower and more stable loss values.

5.2. Alignment with Problem Objectives

To align with Section 3, we map experimental metrics to each problem:
  • Problem 1 (Channel Noise): Accuracy and loss metrics evaluate the SAIA’s ability to mitigate noise-induced distortions, validated by semantic encoding and median aggregation (Equations (15) and (18)).
  • Problem 2 (Data Heterogeneity): Client consistency across skewed datasets tests the SAIA’s alignment of non-i.i.d. updates, validated by adaptive weighting (Equation (24)).
  • Problem 3 (Resource Constraints): Training time and communication overhead assess the SAIA’s scalability, validated by low-dimensional encoding (Equation (25)).
  • Problem 4 (Small Datasets): Classification performance and generalization errors test the SAIA’s mitigation of overfitting, validated by data augmentation (Equation (27)).

5.3. Performance Metrics

After 10 rounds, the SAIA achieves a final validation accuracy of 95.67% ± 0.41% and a loss of 0.1605 ± 0.012, outperforming FedAvg’s 92.33% ± 0.65% accuracy and 0.2633 ± 0.021 loss. The SAIA’s client accuracy range narrows to 98.50–99.50% by Round 10, compared to FedAvg’s 88.57–97.14%, reflecting robustness to data skew. Parameter variance, reflecting update divergence due to heterogeneity (20), is lower for the SAIA (0.05–0.12) than FedAvg (0.15–0.28). The SAIA’s improved performance is attributed to semantic encoding and median aggregation (18), which mitigate AWGN effects, and semantic-weighted aggregation, which aligns updates under non-i.i.d. conditions (19). These results validate the SAIA’s solutions to Problems 1 and 2, as encoding reduces noise-induced errors and weighting minimizes divergence.

5.4. Convergence Behavior

Figure 5 shows the round-wise accuracy and loss for the SAIA and FedAvg over 10 rounds. The SAIA’s accuracy rises from 93.00% (Round 1) to 95.67% (Round 6), stabilizing thereafter, while FedAvg increases from 91.33% to 92.33%. The SAIA’s loss decreases from 0.2025 to 0.1605, a 20.7% reduction, whereas FedAvg’s loss rises from 0.2413 to 0.2633, highlighting the SAIA’s faster and more stable convergence.
The faster convergence of the SAIA (3.34% higher accuracy than FedAvg) results from semantic encoding, which reduces the update dimensionality to $d' = 256$ (from $d \approx 1.6 \times 10^6$) to lower noise impact (15), and median aggregation, which filters outliers (18). The significant loss reduction highlights the SAIA’s stability in non-i.i.d. settings. These results validate the SAIA’s solutions to Problems 1 and 2, aligning with the convergence bounds in Equations (16) and (23).

5.5. Client Consistency and Parameter Updates

Table 4 presents client-level performance, considering data skew. The SAIA achieves high accuracy (98.50%–99.50%) for all clients by Round 10, despite significant skews (e.g., Client 1: 60% digits 0–2; Client 2: 70% digits 3–5), compared to FedAvg’s wider range of 88.57–97.14%. Client 2, with the most severe skew, improves from 90.71% to 98.50%, reflecting robustness. Test set accuracy (95.00%–96.50%) confirms generalization, though the small dataset size and MNIST’s simplicity may inflate validation accuracy. The SAIA’s losses range from 0.0018 to 0.0563.
Figure 6 depicts client accuracy and parameter update magnitude over 10 rounds. All clients reach 98.50% or higher accuracy. The parameter update magnitude decreases from approximately 0.5 in Round 2 to 0.1 in Round 10, indicating convergence. The SAIA’s consistency across clients, even with severe skew, stems from semantically weighted aggregation that aligns features (19). The decreasing parameter update magnitude confirms global model convergence. This validates SAIA’s solution to Problem 2.

5.6. Computational Overhead

Figure 7 illustrates the training time per round for the SAIA, ranging from 40.42 to 51.00 s, comparable to FedAvg. Communication overhead is 0.4 MB per round for both, as the SAIA uses compressed representations ($z_k^{(t)} \in \mathbb{R}^{256}$) (25). The stable training time indicates that the SAIA’s anti-interference mechanisms add negligible overhead, making it suitable for resource-constrained devices. This addresses Problem 3.

5.7. Classification Performance

Figure 8 presents the confusion matrix for the SAIA’s final model on the validation set, with an accuracy of 95.67%. High diagonal values indicate accurate classification across all classes. Minor errors are due to digit similarities, but SAIA’s robustness to noise (15) ensures high accuracy. This validates the SAIA’s solutions to Problems 1 and 4.

5.8. Interference Sensitivity Analysis

Figure 9 illustrates accuracy over rounds for different interference scales. Higher interference reduces accuracy, but the SAIA demonstrates resilience, maintaining levels above 0.94. Figure 10 shows the corresponding loss, where SAIA again maintains lower and more stable values.

5.9. Effect of Number of Clients

Figure 11 shows accuracy for a varying number of clients. More clients generally improve accuracy due to more robust aggregation, with seven clients achieving the highest stability around 0.96.

5.10. Dropout Probability Analysis

Figure 12 and Figure 13 depict loss and accuracy for different dropout probabilities. Higher dropout increases loss and decreases accuracy, but the SAIA handles up to 0.3 dropouts with accuracy above 0.94, showcasing its robustness to client dropouts.

5.11. Ablation Study

To quantitatively isolate the contribution of our adaptive aggregation strategy, we conducted a targeted ablation study. The experiment was designed with five clients, where one client (Client 5) was deliberately designated as a “bad” client and trained with noisy labels to simulate a Byzantine or poorly performing device. Figure 14 compares the performance of our full SAIA framework, which uses an adaptive, variance-based weighting scheme, against a naive baseline. In this baseline, the global model is simply replaced by the model of one randomly selected active client in each round, representing an aggregation strategy with no robustness.
As shown in Figure 14, the baseline exhibits highly unstable performance. Its accuracy fluctuates wildly and plummets in rounds where the bad client is randomly selected for the global update. In contrast, our proposed method demonstrates marked resilience: the adaptive weighting scheme quickly learns to assign a negligible weight to the bad client, effectively ignoring its corrupting updates and achieving smooth, monotonic convergence. This experiment validates the necessity and effectiveness of our robust aggregation strategy in heterogeneous and potentially adversarial environments. The semantic encoding, median aggregation, and data augmentation in the SAIA framework are designed to work synergistically, and their effects are difficult to isolate completely without compromising system integrity. We therefore restrict the ablation to the most critical defense module, adaptive weighting, and plan a complete, component-level quantitative dissection in an extended version of this work.
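The qualitative effect of adaptive weighting can be reproduced in a toy simulation. The setup below is a simplified stand-in for the experiment of Figure 14: synthetic update vectors replace trained models, a uniform-mean baseline replaces the random-client baseline, and the accuracy scores are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
K, T, d_prime = 5, 20, 16
good = rng.standard_normal(d_prime)    # update direction honest clients share

# Accuracy-style scores: honest clients score high, the noisy-label client low.
acc = np.array([0.95, 0.96, 0.94, 0.95, 0.35])
a_tilde = acc / acc.sum()              # adaptive weights (Algorithm 1, step 18)

err_adaptive, err_uniform = [], []
for _ in range(T):
    # Clients 1-4 send noisy copies of the good direction; client 5 ("bad",
    # trained on noisy labels) sends an opposing, high-variance update.
    U = np.stack([good + 0.1 * rng.standard_normal(d_prime) for _ in range(K - 1)]
                 + [-good + rng.standard_normal(d_prime)])
    err_adaptive.append(np.linalg.norm(a_tilde @ U - good))
    err_uniform.append(np.linalg.norm(U.mean(axis=0) - good))

print(f"adaptive {np.mean(err_adaptive):.2f} vs uniform {np.mean(err_uniform):.2f}")
```

Down-weighting the low-accuracy client shrinks its contribution to the aggregate, so the adaptive error stays well below that of the uniform mean, mirroring the stability gap seen in Figure 14.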

5.12. Analysis of Privacy Enhancement

To quantify the privacy benefits of semantic encoding, we simulated a model inversion attack. An attacker at the server attempts to reconstruct a client’s model parameters from the received signals. We measured the mean squared error (MSE) between the attacker’s reconstructed parameters and the client’s true parameters for two scenarios: an attack on SAIA (receiving the noisy semantic vector z ^ k ) and an attack on FedAvg (receiving the noisy raw parameters w ^ k ). Our results show that the reconstruction error is an order of magnitude higher when attacking SAIA compared to FedAvg. The semantic encoding acts as a strong, non-linear transformation that makes it significantly more difficult to invert the process and recover the original model weights, thus providing a tangible privacy enhancement.

5.13. Evaluation and Insights

The experimental results demonstrate the SAIA’s effectiveness in AirFL under noisy conditions ( σ 2 = 0.3 ). Averaged over five runs, SAIA achieves 3.34% higher accuracy (95.67% ± 0.41% vs. 92.33% ± 0.65%) and 39.0% lower loss (0.1605 ± 0.012 vs. 0.2633 ± 0.021) than FedAvg, attributable to semantic encoding and median aggregation. The framework ensures high client consistency, with all clients reaching 98.50–99.50% accuracy by Round 10, even under severe data skew, highlighting its ability to handle non-i.i.d. data. The decreasing magnitude of parameter updates confirms model convergence, while the stable training time and low communication overhead make SAIA practical for edge devices. These results align with Problems 1–4, validating the SAIA’s theoretical solutions. The privacy enhancement from semantic encoding further supports its use in secure applications like smartgrid monitoring [17]. While the primary focus was on protocol-level validation, future work should extend this evaluation to larger datasets like CIFAR-10 to further explore the framework’s scalability.

6. Conclusions and Future Works

The proposed SAIA framework effectively tackles critical challenges in AirFL, including channel noise, data heterogeneity, resource constraints, and small local datasets. By combining semantic encoding, median aggregation, adaptive weighting, and data augmentation, it ensures stable convergence and scalability. Our comprehensive experiments, validated over multiple runs for statistical significance, show that SAIA achieves a validation accuracy of 95.67% ± 0.41% on MNIST under challenging conditions, significantly outperforming standard FedAvg. Our ablation study quantitatively confirms the critical contribution of the adaptive weighting scheme in defending against Byzantine clients. While not individually quantified through separate ablation studies, semantic encoding, median aggregation, and data augmentation are theoretically designed to collectively enhance robustness and contribute significantly to the overall superior performance. Furthermore, we have quantitatively demonstrated that semantic encoding provides a tangible privacy enhancement against model inversion attacks. The SAIA outperforms the conventional FedAvg baseline, making it suitable for sensitive applications like smart-grid monitoring.
To improve practical applicability, future work will explore modeling inter-user interference using correlated channel models, such as Rayleigh fading, and mitigating it through precoding or orthogonal access techniques. Additionally, incorporating reconfigurable intelligent surfaces [31,32,33] into the proposed system may further improve signal quality and system adaptability in real-world deployments. Client dropouts will be addressed through partial aggregation strategies, and noisy downlink will be managed using error-correcting codes. Additionally, performance will be evaluated on more complex datasets, such as CIFAR-10, under varying noise conditions. These extensions, combined with dynamic semantic compression, will further strengthen the proposed SAIA as a robust, privacy-preserving framework for distributed learning.

Author Contributions

Conceptualization, J.-C.J., C.-T.L., and K.W.; methodology, J.-C.J., C.-T.L., and K.W.; software, J.-C.J.; validation, J.-C.J.; formal analysis, J.-C.J. and K.W.; writing—original draft preparation, J.-C.J.; writing—review and editing, C.-T.L. and K.W.; supervision, C.-T.L., K.W., and B.K.N.; project administration, C.-T.L., K.W., and B.K.N.; funding acquisition, C.-T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Development Fund of Macao (FDCT) under Grant 0033/2023/RIA1 and Macao Polytechnic University Research Grant RP/FCA-13/2022.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AirFL	Over-the-Air Federated Learning
AWGN	Additive White Gaussian Noise
CNN	Convolutional Neural Network
FL	Federated Learning
IoT	Internet of Things
KL	Kullback–Leibler (Divergence)
OTAC	Over-the-Air Computation
SAIA	Semantic Anti-Interference Aggregation
SGDM	Stochastic Gradient Descent with Momentum
SNR	Signal-to-Noise Ratio

References

  1. Zikria, Y.B.; Ali, R.; Afzal, M.K.; Kim, S.W. Next-generation internet of things (iot): Opportunities, challenges, and solutions. Sensors 2021, 21, 1174. [Google Scholar] [CrossRef]
  2. Rathore, M.M.; Paul, A.; Hong, W.H.; Seo, H.; Awan, I.; Saeed, S. Exploiting IoT and big data analytics: Defining smart digital city using real-time urban data. Sustain. Cities Soc. 2018, 40, 600–610. [Google Scholar] [CrossRef]
  3. Parikh, D.; Radadia, S.; Eranna, R.K. Privacy-Preserving Machine Learning Techniques, Challenges and Research Directions. Int. Res. J. Eng. Technol. 2024, 11, 499. [Google Scholar]
  4. Wen, J.; Zhang, Z.; Lan, Y.; Cui, Z.; Cai, J.; Zhang, W. A survey on federated learning: Challenges and applications. Int. J. Mach. Learn. Cybern. 2023, 14, 513–535. [Google Scholar] [CrossRef]
  5. Yang, K.; Jiang, T.; Shi, Y.; Ding, Z. Federated Learning via Over-the-Air Computation. IEEE Trans. Wirel. Commun. 2020, 19, 2022–2035. [Google Scholar] [CrossRef]
  6. Ji, J.; Lam, C.T.; Wang, K.; Ng, B.K. AMNED: An Efficient Framework for Spiking Neuron Coding in AirComp Federated Learning. IEEE Access 2025, 13, 138970–138985. [Google Scholar] [CrossRef]
  7. Naseh, D.; Bozorgchenani, A.; Shinde, S.S.; Tarchi, D. Unified Distributed Machine Learning for 6G Intelligent Transportation Systems: A Hierarchical Approach for Terrestrial and Non-Terrestrial Networks. Network 2025, 5, 41. [Google Scholar] [CrossRef]
  8. Abdullahi, M.; Cao, A.; Zafar, A.; Xiao, P.; Hemadeh, I.A. A generalized bit error rate evaluation for index modulation based OFDM system. IEEE Access 2020, 8, 70082–70094. [Google Scholar] [CrossRef]
  9. Jin, C.; Chang, Z.; Hu, F.; Luan, M.; Hämäläinen, T. Enhanced Physical Layer Security for Full-Duplex Facultative Symbiotic Radio: A Pattern Switching and Multi-Device Scheduling Strategy. In Proceedings of the 2025 IEEE Wireless Communications and Networking Conference (WCNC), Milan, Italy, 24–27 March 2025; pp. 1–6. [Google Scholar]
  10. Liu, Y.; Qu, Z.; Wang, J. Compressed Hierarchical Federated Learning for Edge-Level Imbalanced Wireless Networks. IEEE Trans. Comput. Soc. Syst. 2025, 12, 3131–3142. [Google Scholar] [CrossRef]
  11. Singh, S.; Kumar, M.; Kumar, R. Powering the future: A survey of ambient RF-based communication systems for next-gen wireless networks. IET Wirel. Sens. Syst. 2024, 14, 265–292. [Google Scholar] [CrossRef]
  12. Cui, Y.; Guo, J.; Wen, C.K.; Jin, S. Communication-efficient personalized federated edge learning for massive mimo csi feedback. IEEE Trans. Wirel. Commun. 2023, 23, 7362–7375. [Google Scholar] [CrossRef]
  13. Lan, M.; Ling, Q.; Xiao, S.; Zhang, W. Quantization bits allocation for wireless federated learning. IEEE Trans. Wirel. Commun. 2023, 22, 8336–8351. [Google Scholar] [CrossRef]
  14. Shi, W.; Yao, J.; Xu, W.; Xu, J.; You, X.; Eldar, Y.C.; Zhao, C. Combating interference for over-the-air federated learning: A statistical approach via RIS. IEEE Trans. Signal Process. 2025, 73, 936–953. [Google Scholar] [CrossRef]
  15. Cao, X.; Lyu, Z.; Zhu, G.; Xu, J.; Xu, L.; Cui, S. An overview on over-the-air federated edge learning. IEEE Wirel. Commun. 2024, 31, 202–210. [Google Scholar] [CrossRef]
  16. Phong, N.H.; Santos, A.; Ribeiro, B. PSO-convolutional neural networks with heterogeneous learning rate. IEEE Access 2022, 10, 89970–89988. [Google Scholar] [CrossRef]
  17. Pakina, A.K.; Pujari, M. Differential privacy at the edge: A federated learning framework for GDPR-compliant TinyML deployments. IOSR J. Comput. Eng. 2024, 26, 52–64. [Google Scholar]
  18. Lee, H.S.; Lee, J.W. Adaptive transmission scheduling in wireless networks for asynchronous federated learning. IEEE J. Sel. Areas Commun. 2021, 39, 3673–3687. [Google Scholar] [CrossRef]
  19. Qiao, Y.; Adhikary, A.; Kim, K.T.; Zhang, C.; Hong, C.S. Knowledge distillation assisted robust federated learning: Towards edge intelligence. In Proceedings of the ICC 2024—IEEE International Conference on Communications, Denver, CO, USA, 9–13 June 2024; IEEE: New York, NY, USA, 2024; pp. 843–848. [Google Scholar]
  20. Ma, B.; Yin, X.; Tan, J.; Chen, Y.; Huang, H.; Wang, H.; Xue, W.; Ban, X. FedST: Federated style transfer learning for non-IID image segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2024; Volume 38, pp. 4053–4061. [Google Scholar]
  21. Houssein, E.H.; Sayed, A. Boosted federated learning based on improved Particle Swarm Optimization for healthcare IoT devices. Comput. Biol. Med. 2023, 163, 107195. [Google Scholar] [CrossRef]
  22. Baqer, M. Lightweight Federated Learning Approach for Resource-Constrained Internet of Things. Sensors 2025, 25, 5633. [Google Scholar] [CrossRef] [PubMed]
  23. Ridolfi, L.; Naseh, D.; Shinde, S.S.; Tarchi, D. Implementation and evaluation of a federated learning framework on raspberry PI platforms for IoT 6G applications. Future Internet 2023, 15, 358. [Google Scholar] [CrossRef]
  24. Zhou, X.; Liang, W.; Kawai, A.; Fueda, K.; She, J.; Wang, K.I.K. Adaptive segmentation enhanced asynchronous federated learning for sustainable intelligent transportation systems. IEEE Trans. Intell. Transp. Syst. 2024, 25, 6658–6666. [Google Scholar] [CrossRef]
  25. Wang, Z.; Zhou, Y.; Shi, Y.; Zhuang, W. Interference management for over-the-air federated learning in multi-cell wireless networks. IEEE J. Sel. Areas Commun. 2022, 40, 2361–2377. [Google Scholar] [CrossRef]
  26. Wang, S.; Chen, M.; Brinton, C.G.; Yin, C.; Saad, W.; Cui, S. Performance optimization for variable bitwidth federated learning in wireless networks. IEEE Trans. Wirel. Commun. 2023, 23, 2340–2356. [Google Scholar] [CrossRef]
  27. Zhu, G.; Liu, D.; Du, Y.; You, C.; Zhang, J.; Huang, K. Toward an Intelligent Edge: Wireless Communication Meets Machine Learning. IEEE Commun. Mag. 2020, 58, 19–25. [Google Scholar] [CrossRef]
  28. Donoho, D.L. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306. [Google Scholar] [CrossRef]
  29. Bockelmann, C.; Pratas, N.K.; Güllü, H.; Nikopour, H.; Au, K.; Stefanovic, C.; Popovski, P. Massive MIMO for machine-to-machine communication. IEEE Commun. Mag. 2016, 54, 162–169. [Google Scholar]
  30. Bengio, Y.; Courville, A.; Vincent, P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828. [Google Scholar] [CrossRef]
  31. Wang, K.; Lam, C.T.; Ng, B.K. RIS-Assisted High-Speed Communications with Time-Varying Distance-Dependent Rician Channels. Appl. Sci. 2022, 12, 11857. [Google Scholar] [CrossRef]
  32. Wang, K.; Lam, C.T.; Ng, B.K. Positioning Information Based High-Speed Communications with Multiple RISs: Doppler Mitigation and Hardware Impairments. Appl. Sci. 2022, 12, 7076. [Google Scholar] [CrossRef]
  33. Wang, K.; Lam, C.T.; Ng, B.K. Hardware aging analysis for reconfigurable intelligent surfaces. IET Electron. Lett. 2023, 59, e12714. [Google Scholar] [CrossRef]
Figure 1. The AirFL system model with OTAC-based aggregation, showing simultaneous client transmissions and server aggregation under AWGN.
Figure 2. The neural network architecture with a semantic autoencoder for AirFL, showing convolutional layers, pooling, and the encoder–decoder structure for robust parameter compression.
Figure 3. Accuracy with different noise levels (5 Clients, dropout = 0.2, interference = 0.2). The plot shows the mean accuracy over 5 runs, with error bars indicating the standard deviation. Lines are styled with distinct markers for clarity.
Figure 4. Loss with different noise levels (5 clients, dropout = 0.2, interference = 0.2). The plot shows the mean loss over 5 runs, with error bars indicating the standard deviation. Lines are styled with distinct markers for clarity.
Figure 5. Accuracy and loss scenario comparison (5 clients, interference = 0.2, noise = 0.3). Results are averaged over 5 runs. The SAIA (with anti-interference) consistently outperforms the version without, especially under client dropout.
Figure 6. Client accuracy and parameter update magnitude over 10 rounds for SAIA. The decreasing update magnitudes indicate successful global model convergence, while high client accuracies demonstrate robustness to data heterogeneity.
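The convergence diagnostic plotted in Figure 6 is simple to reproduce: track the L2 norm of the change in the global parameters between successive rounds. The sketch below uses synthetic updates whose scale shrinks over T = 10 rounds (as in the paper's setup) purely to illustrate the metric; the values are not the paper's measurements.

```python
import numpy as np

# Sketch of the Figure 6 diagnostic: parameter update magnitude per round,
# measured as ||w(t) - w(t-1)||_2. Synthetic updates shrink over time to
# mimic a converging global model; all numbers here are illustrative.
rng = np.random.default_rng(1)

w_prev = rng.normal(scale=0.1, size=1000)
magnitudes = []
for t in range(10):                                      # T = 10 rounds
    step = rng.normal(scale=0.05 / (t + 1), size=1000)   # shrinking updates
    w_new = w_prev + step
    magnitudes.append(float(np.linalg.norm(w_new - w_prev)))
    w_prev = w_new

print([round(m, 3) for m in magnitudes])
```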
Figure 7. Training time per round for SAIA on MNIST. The stable times indicate that SAIA’s additional processing (encoding and robust aggregation) adds negligible computational overhead, making it suitable for resource-constrained devices.
Figure 8. Confusion matrix for SAIA on the MNIST validation set. The model achieves 95.67% accuracy, with high diagonal values indicating robust classification across all digits.
Figure 9. Accuracy with different interference scales (5 clients, dropout = 0.2, noise = 0.3). Results are averaged over 5 runs. SAIA maintains high accuracy even as interference increases, demonstrating its resilience.
Figure 10. Loss with different interference scales (5 clients, dropout = 0.2, noise = 0.3). Results are averaged over 5 runs. SAIA’s loss remains lower and more stable compared to scenarios with higher interference.
Figure 11. Accuracy with different numbers of clients (dropout = 0.2, interference = 0.2, noise = 0.3). Results are averaged over 5 runs. Performance generally improves with more clients due to more robust aggregation.
Figure 12. Loss with different dropout probabilities (5 clients, interference = 0.2, noise = 0.3). Results are averaged over 5 runs. SAIA’s performance degrades gracefully as dropout increases, showing its robustness to client unavailability.
Figure 13. Accuracy with different dropout probabilities (5 clients, interference = 0.2, noise = 0.3). Results are averaged over 5 runs. The framework maintains high accuracy even with up to 30% client dropout per round.
Figure 14. Ablation study of the adaptive weighting strategy. The solid blue line (Proposed) shows SAIA’s performance, which effectively identifies and down-weights the “bad” client, leading to stable convergence. The dashed red line (Ablation Baseline) shows the erratic performance of a naive random-client selection, which collapses when the bad client is chosen.
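The weighting mechanism probed in the Figure 14 ablation can be sketched directly: each client's update is weighted by its validation accuracy a_k(t), so a "bad" client contributes less to the global average. The sum-normalization below is an assumption for illustration (this excerpt does not give the exact normalization), and all accuracy/update values are synthetic.

```python
import numpy as np

# Sketch of validation-accuracy weighting: weights proportional to each
# client's validation accuracy a_k(t), normalized to sum to 1 (assumed
# normalization). Client 5 plays the "bad" client from the ablation.
acc = np.array([0.98, 0.97, 0.99, 0.96, 0.40])    # client validation accuracies
weights = acc / acc.sum()                          # down-weights the bad client

updates = np.array([[1.0], [1.1], [0.9], [1.0], [9.0]])  # toy 1-D updates
global_update = (weights[:, None] * updates).sum(axis=0)
naive_mean = updates.mean()

print(weights.round(3))
print(float(global_update[0]), float(naive_mean))
```

The weighted aggregate sits closer to the well-behaved clients' updates than the naive mean does, which is the stabilizing effect Figure 14 attributes to the adaptive weighting.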
Table 1. Symbols and their descriptions.
| Symbol | Description |
|---|---|
| w | Global CNN parameters optimized across clients. |
| K | Number of edge devices, typically 5. |
| D_k | Client k's non-i.i.d. dataset. |
| n_k | Local dataset size, affects overfitting. |
| x_i | Input data for local training. |
| y_i | Sample label, C classes. |
| n(t) | Uplink AWGN, variance σ². |
| σ² | AWGN intensity, tested 0.1–0.5. |
| I | Identity matrix for noise independence. |
| h_k | Channel gain, normalized to 1. |
| w_k(t) | Client k's local CNN parameters. |
| z_k(t) | Encoded parameters, reduces noise. |
| d | CNN parameter count, large. |
| d′ | Encoded representation size, typically 256. |
| y(t) | Superposed updates with noise. |
| m(t) | Median of decoded representations, robust. |
| ã_k(t) | Adaptive weights by client accuracy. |
| a_k(t) | Client k's validation accuracy. |
| g_e(·; θ_e) | Encoder maps to semantic representation. |
| g_d(·; θ_d) | Decoder reconstructs from semantic encoding. |
| F(w) | Global weighted average loss. |
| F_k(w_k) | Client k's non-i.i.d. loss. |
| ℓ(·) | Cross-entropy loss for predictions. |
| f(·; w) | CNN softmax output, C classes. |
| η | SGDM learning rate, typically 0.01. |
| L | Lipschitz constant, bounds smoothness. |
| σ_δ² | Client parameter variance, non-i.i.d. |
| D_KL | KL divergence, measures data heterogeneity. |
| β | Scales σ_δ² to D_KL. |
| SNR | Signal-to-noise ratio, ≥20 dB. |
| σ_w² | CNN weight variance, typically 0.1. |
| Cost_k | Client k's training cost. |
| E | Epochs per dataset, typically 10. |
| C | Per-sample computational cost. |
| n_k^eff | Augmented dataset size, reduces overfitting. |
| α | Dataset augmentation factor, typically 2. |
| ε | Autoencoder reconstruction error bound. |
| I(·;·) | Mutual information in semantic encoding. |
| T | Total AirFL rounds, typically 10. |
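The aggregate m(t) in Table 1, the component-wise median of the K decoded client representations, is what gives SAIA its robustness to noise and outliers. A minimal sketch of why this beats a plain mean, on synthetic updates with one corrupted client:

```python
import numpy as np

# Sketch of component-wise median aggregation m(t): take the median of each
# coordinate across the K client updates. One client is corrupted by a large
# offset; the mean is dragged toward it, the median is not. Toy dimensions.
rng = np.random.default_rng(2)

K, d_prime = 5, 8                                    # K clients, toy code size
true_update = np.ones(d_prime)
clients = true_update + rng.normal(scale=0.1, size=(K, d_prime))
clients[0] += 50.0                                   # one outlier client

mean_agg = clients.mean(axis=0)                      # pulled far off by client 0
median_agg = np.median(clients, axis=0)              # stays near the true update

print(float(np.abs(mean_agg - true_update).max()))
print(float(np.abs(median_agg - true_update).max()))
```

The same contrast holds per coordinate for the full d′ = 256 representation, at O(K log K) cost per coordinate, far cheaper than the "complex methods" the abstract contrasts against.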
Table 2. Key experimental parameters for Algorithm 1.
| Parameter | Value |
|---|---|
| Model parameter dimension (d) | ≈8.2 × 10⁵ |
| Semantic vector dimension (d′) | 256 |
| Regularization (λ in Equation (5)) | 0.01 |
| ISTA iterations (decoding) | 50 |
| Total rounds (T) | 10 |
| Local epochs (E) | 10 |
| Learning rate (η) | 0.01 |
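Table 2's decoding settings (50 ISTA iterations, λ = 0.01) correspond to the standard iterative soft-thresholding algorithm for an ℓ1-regularized least-squares decode. The sketch below shows the algorithm itself on a toy measurement model y = Az + n; the problem sizes and the assumption of a sparse code are illustrative, not the paper's exact decoding setup.

```python
import numpy as np

# Sketch of ISTA decoding: minimize 0.5*||y - A z||^2 + lam*||z||_1 by
# alternating a gradient step (step size 1/L, L = spectral norm squared)
# with soft-thresholding. 50 iterations and lam = 0.01 follow Table 2;
# the sparse measurement model below is an illustrative assumption.
rng = np.random.default_rng(3)

m, n = 64, 256
A = rng.normal(size=(m, n)) / np.sqrt(m)
z_true = np.zeros(n)
z_true[rng.choice(n, size=8, replace=False)] = rng.normal(size=8)
y = A @ z_true + rng.normal(scale=0.01, size=m)   # noisy measurement

lam = 0.01
step_L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of grad
z = np.zeros(n)
for _ in range(50):                               # 50 ISTA iterations
    grad = A.T @ (A @ z - y)                      # grad of 0.5*||y - A z||^2
    u = z - grad / step_L
    z = np.sign(u) * np.maximum(np.abs(u) - lam / step_L, 0.0)  # soft-threshold

rel_err = np.linalg.norm(z - z_true) / np.linalg.norm(z_true)
print(f"relative reconstruction error after 50 ISTA steps: {rel_err:.3f}")
```

With the exact step size 1/L, each ISTA iteration is guaranteed not to increase the objective, which is why a fixed small iteration budget like 50 is a reasonable latency/accuracy trade-off for decoding at the server.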
Table 3. Validation accuracy vs. noise variance ( σ 2 ) for the SAIA and FedAvg on MNIST.
| Noise Variance (σ²) | SAIA Accuracy | FedAvg Accuracy |
|---|---|---|
| 0.1 | 98.33% ± 0.21% | 95.67% ± 0.45% |
| 0.3 | 97.00% ± 0.35% | 94.33% ± 0.58% |
| 0.5 | 95.67% ± 0.41% | 92.00% ± 0.72% |
Table 4. Client-level accuracy and loss after 10 rounds.
| Client | Data Skew | Acc. (SAIA) | Acc. (FedAvg) | Loss (SAIA) |
|---|---|---|---|---|
| 1 | 0–2 (60%) | 99.50% | 95.71% | 0.0294 |
| 2 | 3–5 (70%) | 98.50% | 88.57% | 0.0563 |
| 3 | 6–8 (50%) | 99.50% | 94.29% | 0.0512 |
| 4 | 1–4 (40%) | 99.00% | 96.43% | 0.0301 |
| 5 | Balanced | 99.50% | 97.14% | 0.0018 |
Ji, J.-C.; Lam, C.-T.; Wang, K.; Ng, B.K. Robust Aggregation in Over-the-Air Computation with Federated Learning: A Semantic Anti-Interference Approach. Mathematics 2026, 14, 124. https://doi.org/10.3390/math14010124