Efficient Federated Learning Method FedLayerPrune Based on Layer Adaptive Pruning
Abstract
1. Introduction
- (1) Layer-adaptive pruning strategy: We propose a principled mechanism that dynamically allocates pruning rates based on layer type, network depth, and gradient-based importance scores. Specifically, sensitive convolutional layers receive conservative pruning, while redundant fully connected layers undergo more aggressive compression, thereby preserving critical feature extraction capabilities.
- (2) Heterogeneity-aware aggregation mechanism: We introduce a dual mechanism combining sample-size weighted averaging with mask consensus voting for global model aggregation. This design explicitly accounts for data distribution heterogeneity across clients, enhancing model robustness and generalization under non-IID settings.
- (3) Dynamic pruning rate scheduling: We design a progressive pruning scheduler that coordinates with the training process, maintaining low pruning rates during early convergence-critical phases and gradually increasing compression intensity in later rounds, achieving an effective balance between communication efficiency and model convergence quality.
- (4) Comprehensive experimental evaluation: Systematic experiments on CIFAR-10, MNIST, and Fashion-MNIST datasets demonstrate that FedLayerPrune achieves up to 68.3% communication reduction compared with FedAvg while keeping accuracy loss within 2%, and exhibits superior robustness under severe non-IID conditions compared to existing baselines.
2. Related Works
2.1. Federated Learning Optimization
2.2. Model Compression Techniques
- Unstructured pruning: Removes individual weight connections, achieving high compression ratios but producing sparse models that require specialized hardware or sparse computation libraries for acceleration. Dynamic sparse training methods such as RigL [6] have been proposed to maintain performance during the pruning process.
- Structured pruning: Operates at the level of channels, filters, or entire layers, preserving network regularity and enabling efficient deployment on general-purpose hardware platforms.
2.3. Communication Optimization in Federated Learning
- Gradient compression: Techniques including quantization [17,21], sparsification, and truncation reduce the communication burden by compressing gradient or parameter updates. Advanced approaches such as adaptive gradient quantization [22,23,24] and predictive coding [25] have been proposed to further improve communication efficiency.
- Model compression: Directly simplifies the transmitted model to reduce per-round data volume.
2.4. Non-IID Data Handling
2.5. Layer-Wise Techniques
2.6. Summary and Positioning of FedLayerPrune
3. Our Method: FedLayerPrune
3.1. Problem Definition
3.2. Design of FedLayerPrune
- Local training: Each selected client performs E local epochs of SGD on its private dataset.
- Importance evaluation: Each client estimates parameter importance scores using Fisher information approximation (computed via a single additional backward pass on a mini-batch), with exponential moving average smoothing for stability.
- Layer-adaptive pruning: Binary masks and sparse parameters are generated according to layer-specific sensitivity coefficients and the temporal pruning schedule.
- Heterogeneity-aware aggregation: The server performs sample-size weighted aggregation on the uploaded sparse models and applies mask consensus voting to produce the global mask, followed by periodic regrowth of pruned connections. A client-side sketch of this workflow is given below.
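To make the per-round client workflow concrete, the following is a minimal PyTorch-style sketch. The helper names `estimate_fisher_importance` and `build_layer_masks` are hypothetical placeholders standing in for the procedures of Sections 3.2.2 and 3.2.1 (sketched later); only the ordering of steps (local SGD, importance scoring, masking, sparse upload) is taken from the description above.

```python
import torch

def client_round(model, loader, loss_fn, epochs, lr, importance_ema, round_idx):
    """One FedLayerPrune-style client round, following the four steps listed above.

    `estimate_fisher_importance` and `build_layer_masks` are hypothetical helpers
    corresponding to Sections 3.2.2 and 3.2.1, respectively.
    """
    # (1) Local training: E local epochs of SGD on the private dataset.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

    # (2) Importance evaluation: Fisher approximation with EMA smoothing.
    importance_ema = estimate_fisher_importance(model, loader, loss_fn, importance_ema)

    # (3) Layer-adaptive pruning: per-layer masks from sensitivity and schedule.
    masks = build_layer_masks(model, importance_ema, round_idx)

    # (4) Upload only retained parameters plus the binary masks.
    sparse_update = {
        name: (p.detach() * masks[name], masks[name])
        for name, p in model.named_parameters()
    }
    return sparse_update, importance_ema
```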
3.2.1. Layer Adaptive Pruning Strategy
- Convolutional layers (Conv): conservative pruning rate;
- Batch normalization (BN): no pruning;
- Fully connected layers (FC): aggressive pruning rate;
- By depth: shallow and deep layers receive different base rates (type- and depth-based factors can be combined).
- Structured channel pruning: Sort convolutional channels/filters by their norm and retain the top-ranked channels;
- Unstructured sparsity: Within the retained channels, apply fine-grained 0/1 masks according to weight magnitude or Fisher scores (see the sketch after this list).
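The following is a minimal sketch of how layer-adaptive rates and the hybrid structured/unstructured masks could be assembled. The sensitivity coefficients, the linear schedule, and the use of the same rate for the channel and magnitude stages are illustrative assumptions; the paper's actual coefficients and temporal schedule (Equation (7)) should be substituted.

```python
import torch

# Illustrative sensitivity coefficients by layer type (assumed values, not the
# paper's settings): BN layers are never pruned, conv layers conservatively,
# fully connected layers aggressively.
SENSITIVITY = {"bn": 0.0, "conv": 0.5, "fc": 1.5}

def schedule_factor(round_idx, total_rounds, warmup=0.2):
    """Progressive schedule: no extra pruning during early rounds, then a
    linear ramp. A simple ramp is assumed here in place of Equation (7)."""
    start = int(warmup * total_rounds)
    if round_idx < start:
        return 0.0
    return min(1.0, (round_idx - start) / max(1, total_rounds - start))

def layer_pruning_rate(layer_type, round_idx, total_rounds,
                       base_rate=0.3, max_rate=0.7):
    """Combine base/max rates, layer-type sensitivity, and the schedule."""
    rate = base_rate + (max_rate - base_rate) * schedule_factor(round_idx, total_rounds)
    return min(0.95, rate * SENSITIVITY[layer_type])

def hybrid_prune_conv(weight, rate):
    """Hybrid pruning for a conv weight of shape (out_ch, in_ch, kH, kW):
    structured channel selection by L1 norm, then unstructured magnitude
    masking inside the retained channels (same rate reused for simplicity)."""
    out_ch = weight.shape[0]
    keep_ch = max(1, int(round(out_ch * (1.0 - rate))))
    channel_norms = weight.abs().flatten(1).sum(dim=1)      # L1 norm per filter
    kept = torch.topk(channel_norms, keep_ch).indices
    mask = torch.zeros_like(weight)
    mask[kept] = 1.0
    # Fine-grained 0/1 mask by weight magnitude within retained channels.
    retained = (weight * mask).abs()
    threshold = torch.quantile(retained[retained > 0], rate)
    return mask * (weight.abs() >= threshold).float()
```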
3.2.2. Parameter Importance Assessment
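Section 3.2 states that importance scores are estimated with a diagonal Fisher information approximation computed via one additional backward pass on a mini-batch and smoothed with an exponential moving average. A minimal sketch under those assumptions follows; the EMA factor and the single-batch estimate are illustrative stand-ins for the paper's Equations (9) and (10).

```python
import torch

def estimate_fisher_importance(model, loader, loss_fn, ema_scores=None, ema_alpha=0.9):
    """Diagonal Fisher approximation: squared gradients from a single extra
    mini-batch backward pass, smoothed with an exponential moving average.
    `ema_alpha` is an assumed smoothing factor, not the paper's value."""
    model.zero_grad()
    x, y = next(iter(loader))          # one mini-batch for the importance estimate
    loss_fn(model(x), y).backward()

    scores = {}
    for name, p in model.named_parameters():
        fisher = p.grad.detach() ** 2 if p.grad is not None else torch.zeros_like(p)
        if ema_scores is None:
            scores[name] = fisher
        else:
            scores[name] = ema_alpha * ema_scores[name] + (1.0 - ema_alpha) * fisher
    model.zero_grad()
    return scores
```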
3.2.3. Heterogeneity-Aware Aggregation
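Based on the description in Section 3.2, the server combines sample-size weighted averaging of the uploaded sparse models with mask consensus voting; the periodic regrowth step is omitted here. A minimal sketch, using the 0.3 voting threshold listed in the notation table; weighting the votes by sample size is an illustrative choice (plain vote counting is equally plausible).

```python
import torch

def aggregate(sparse_updates, sample_counts, vote_threshold=0.3):
    """Server-side aggregation of sparse client models.

    sparse_updates: list of dicts {name: (masked_params, mask)}, one per client.
    sample_counts:  list of local dataset sizes n_k.
    Returns aggregated dense parameters and the consensus global mask.
    """
    total = float(sum(sample_counts))
    global_params, global_masks = {}, {}

    for name in sparse_updates[0].keys():
        # Sample-size weighted average of the (masked) parameters.
        weighted = sum(
            (n / total) * upd[name][0]
            for upd, n in zip(sparse_updates, sample_counts)
        )
        # Mask consensus voting: keep a position if enough clients retained it.
        votes = sum(
            (n / total) * upd[name][1]
            for upd, n in zip(sparse_updates, sample_counts)
        )
        mask = (votes >= vote_threshold).float()
        global_params[name] = weighted * mask
        global_masks[name] = mask
    return global_params, global_masks
```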
3.3. Algorithm Pseudocode
Algorithm 1: FedLayerPrune
3.4. Complexity and Communication Cost Analysis
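As a worked illustration of the per-round communication accounting, the sketch below assumes 32-bit floats for retained weights plus a one-bit-per-parameter bitmap for each layer's mask (the bitmap encoding cost listed in the notation table). The layer sizes and retention rates in the example are illustrative, not the paper's configuration.

```python
def upload_bytes(layer_dims, retention_rates, bytes_per_weight=4):
    """Per-round upload size for one client: retained float32 weights
    plus a 1-bit-per-parameter bitmap mask for every layer."""
    total = 0.0
    for d_l, keep in zip(layer_dims, retention_rates):
        total += keep * d_l * bytes_per_weight   # retained parameters
        total += d_l / 8.0                       # bitmap mask: d_l / 8 bytes
    return total

# Illustrative example: three layers of 1M, 5M, and 5M parameters
# at 80%, 60%, and 30% retention, respectively.
dims = [1_000_000, 5_000_000, 5_000_000]
keeps = [0.8, 0.6, 0.3]
dense = sum(dims) * 4 / 1e6
sparse = upload_bytes(dims, keeps) / 1e6
print(f"dense upload: {dense:.1f} MB, pruned upload: {sparse:.1f} MB")
```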
4. Experiments
4.1. Experimental Setup
4.1.1. Datasets and Data Heterogeneity Modeling
- CIFAR-10 [33]: This dataset contains 60,000 color images of size 32 × 32 pixels, covering 10 mutually exclusive object categories (airplanes, automobiles, birds, etc.). We follow the standard split, using 50,000 images for training and 10,000 images for testing. With moderate image content complexity, CIFAR-10 is widely used to evaluate the performance of deep neural networks in federated learning scenarios.
- MNIST: As a classic handwritten digit recognition dataset, MNIST contains 70,000 grayscale images of size 28 × 28 pixels, covering 10 digit categories from 0 to 9. The standard configuration uses 60,000 images for training and 10,000 images for testing. Although MNIST is relatively simple, it holds significant value for validating the fundamental effectiveness of algorithms.
- Fashion-MNIST [34]: This dataset maintains the same image specifications and quantity distribution as MNIST, but replaces the recognition targets with 10 categories of clothing items (T-shirts, trousers, sweaters, etc.). Fashion-MNIST offers higher complexity than MNIST, providing a more challenging classification task that helps verify the adaptability of algorithms across different difficulty levels.
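The heterogeneity results in Section 4.2 are reported for concentration values of 0.1, 0.5, and 1.0, which is consistent with the widely used Dirichlet label-partitioning scheme for simulating non-IID clients. Assuming that scheme (the paper's exact partitioning procedure may differ), a minimal sketch:

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Partition sample indices across clients with a per-class Dirichlet(alpha)
    prior. Smaller alpha (e.g., 0.1) yields more skewed, non-IID splits, while
    alpha around 1.0 approaches a balanced distribution."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        # Fraction of class c assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        splits = (np.cumsum(proportions) * len(idx)).astype(int)[:-1]
        for client, part in enumerate(np.split(idx, splits)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Example: 10 clients over a toy 10-class label vector with alpha = 0.1.
toy_labels = np.random.randint(0, 10, size=5000)
parts = dirichlet_partition(toy_labels, num_clients=10, alpha=0.1)
```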
4.1.2. Neural Network Architecture Configuration
- ResNet-18 [35] (Adapted Version): For the 32 × 32 input size of CIFAR-10, we adapted the standard ResNet-18 with modifications including adjusting the stride of the initial convolutional layer from 2 to 1, removing the first max pooling layer, and adjusting the channel configuration of residual blocks. The modified model contains 4 residual block groups with 2 residual blocks per group, totaling approximately 11.2 M parameters. This architecture represents the widely used residual network family in modern deep learning, whose skip connection characteristics pose unique challenges for pruning algorithms.
- Convolutional Neural Network (CNN): For the MNIST and Fashion-MNIST datasets, we designed a lightweight yet effective CNN architecture. The network contains two convolutional layers (with 32 and 64 filters, respectively), each followed by ReLU activation and max pooling, and two fully connected layers (128 hidden units and 10 output units). The total parameter count is approximately 1.2 M. This relatively simple architecture facilitates analyzing the effects of layer-wise pruning strategies; a sketch is given after this list. For comparison with lightweight mobile architectures, we also consider the design principles of MobileNets [36] in our pruning strategy.
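A minimal PyTorch sketch of the described CNN. The kernel size, padding, and pooling window are not fully specified in the text above, so the values used here (3 × 3 kernels with padding 1, 2 × 2 max pooling) are assumptions, and the exact parameter count depends on those choices.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Two conv layers (32 and 64 filters), each followed by ReLU and max
    pooling, then FC layers with 128 hidden units and 10 outputs.
    Kernel size/padding are assumed (3x3, padding 1) for 28x28 inputs."""
    def __init__(self, in_channels=1, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                      # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                      # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Quick shape check on a dummy MNIST-sized batch.
out = SmallCNN()(torch.randn(2, 1, 28, 28))
assert out.shape == (2, 10)
```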
4.1.3. Baseline Methods and Comparison Algorithms
- FedAvg [1] (Federated Averaging): The classic federated learning algorithm proposed by McMahan et al., without any model compression mechanism, serving as the uncompressed upper-bound reference baseline.
- FixedPrune-50 and FixedPrune-70: Static pruning strategies with fixed pruning rates (50% and 70%, respectively). These methods apply identical pruning rates to all layers without considering inter-layer differences or training dynamics, enabling evaluation of the impact of uniform pruning intensities on model performance.
- FedPrune [26]: A dynamic pruning method that adjusts global pruning rates based on training progress but does not differentiate the importance variations among different network layers, forming a direct contrast with our layer-adaptive strategy.
- FedDST [7]: A representative dynamic sparse training method for federated learning that employs sparse-to-sparse training with periodic topology updates. FedDST serves as a strong baseline for evaluating the benefits of layer-adaptive pruning over uniform dynamic sparsification.
- FedProx [14]: A federated optimization method that introduces proximal regularization to handle client heterogeneity. While not a pruning method, FedProx serves as a reference for evaluating robustness under non-IID conditions.
4.1.4. Hyperparameter Configuration and Experimental Environment
4.2. Experimental Results and Analysis
4.2.1. Comprehensive Evaluation of Model Accuracy
4.2.2. Convergence Analysis and Training Dynamics
4.2.3. Communication Cost Analysis
4.2.4. In-Depth Analysis of Layer-Wise Pruning Rate Distribution
4.2.5. Systematic Analysis of Ablation Study
4.2.6. Investigation of Data Heterogeneity Effects
4.2.7. Scalability Analysis with Partial Participation
5. Discussion
5.1. Analysis of Method Advantages
5.2. Informal Convergence Analysis
5.3. Computational Overhead and Efficiency Analysis
5.4. Practical Deployment and Application Potential
5.5. Limitations and Future Research Directions
- Theoretical convergence guarantees: While our informal analysis (Section 5.2) provides intuition, a rigorous convergence proof incorporating dynamic masks, layer-adaptive rates, and heterogeneous aggregation remains open. Establishing a formal communication-accuracy trade-off bound would significantly strengthen the theoretical contribution.
- Larger-scale evaluation: Although our scalability experiments demonstrate promising results, evaluation on larger client populations and more complex datasets (CIFAR-100, Tiny-ImageNet) would better reflect cross-device FL scenarios. System-level latency measurements incorporating actual network delays would also strengthen the practical assessment.
- Adaptive layer sensitivity learning: The current layer sensitivity coefficients are manually assigned based on layer type and depth. Future work could explore data-driven approaches, such as meta-learning or neural architecture search, to automatically learn optimal per-layer pruning configurations.
- Client-specific pruning strategies: Currently, all clients adopt the same pruning strategy determined by the global model structure. Personalizing pruning rates based on individual client constraints (computing power, bandwidth, and data volume) could further improve the overall system efficiency.
- Hardware-aware structured pruning: The current hybrid pruning strategy produces semi-structured sparsity patterns. Incorporating hardware-aware constraints to produce fully structured sparsity (e.g., channel pruning) would enable direct inference acceleration on general-purpose hardware without sparse computation libraries.
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A.Y. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 20–22 April 2017; PMLR: Cambridge, MA, USA, 2017; Volume 54, pp. 1273–1282. [Google Scholar]
- Liu, B.; Lyu, N.; Guo, Y.; Xu, Y.; Zhu, S.; Wu, Z.; Shi, C.; Zhong, Y. Recent Advances on Federated Learning: A Systematic Survey. Neurocomputing 2024, 597, 128019. [Google Scholar] [CrossRef]
- Wen, J.; Zhang, Z.; Lan, Y.; Cui, Z.; Cai, J.; Zhang, W. A Survey on Federated Learning: Challenges and Applications. Int. J. Mach. Learn. Cybern. 2023, 14, 513–535. [Google Scholar] [CrossRef] [PubMed]
- Han, S.; Pool, J.; Tran, J.; Dally, W.J. Learning Both Weights and Connections for Efficient Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 7–12 December 2015; Curran Associates: Red Hook, NY, USA, 2015; Volume 28, pp. 1135–1143. [Google Scholar]
- Frankle, J.; Carbin, M. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Evci, U.; Gale, T.; Menick, J.; Castro, P.S.; Elsen, E. Rigging the Lottery: Making All Tickets Winners. In Proceedings of the 37th International Conference on Machine Learning (ICML), Virtual, 13–18 July 2020; PMLR: Cambridge, MA, USA, 2020; pp. 2943–2952. [Google Scholar]
- Bibikar, S.; Vikalo, H.; Wang, Z.; Chen, X. Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Virtual, 22 February–1 March 2022; AAAI Press: Palo Alto, CA, USA, 2022; Volume 36, pp. 6080–6088. [Google Scholar]
- Jiang, Z.; Wang, Y.; Zhan, C.; Liu, J.; Huang, C. Computation and Communication Efficient Federated Learning With Adaptive Model Pruning. IEEE Trans. Mob. Comput. 2023, 22, 5765–5781. [Google Scholar] [CrossRef]
- Huang, H.; Zhuang, W.; Chen, C.; Lyu, L. FedMef: Towards Memory-efficient Federated Dynamic Pruning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 27548–27557. [Google Scholar]
- Li, A.; Sun, J.; Wang, B.; Duan, L.; Li, S.; Chen, Y.; Li, H. LotteryFL: Empower Edge Intelligence with Personalized and Communication-Efficient Federated Learning. In Proceedings of the IEEE/ACM Symposium on Edge Computing (SEC), San Jose, CA, USA, 14–17 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 68–79. [Google Scholar]
- Hoefler, T.; Alistarh, D.; Ben-Nun, T.; Dryden, N.; Peste, A. Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks. J. Mach. Learn. Res. 2021, 22, 1–124. [Google Scholar]
- Wang, Z.; Xu, Y.; Xu, J.; Yang, Y.; Zhou, X.; Zhang, J. Towards Efficient Federated Learning: Layer-Wise Pruning-Quantization Scheme and Coding Design. Entropy 2023, 25, 1205. [Google Scholar]
- Diao, E.; Ding, J.; Tarokh, V. HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 3–7 May 2021. [Google Scholar]
- Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated Optimization in Heterogeneous Networks. In Proceedings of the Machine Learning and Systems (MLSys), Austin, TX, USA, 2–4 March 2020; MLSys: Indio, CA, USA, 2020; Volume 2, pp. 429–450. [Google Scholar]
- Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. SCAFFOLD: Stochastic Controlled Averaging for Federated Learning. In Proceedings of the 37th International Conference on Machine Learning (ICML), Virtual, 13–18 July 2020; PMLR: Cambridge, MA, USA, 2020; Volume 119, pp. 5132–5143. [Google Scholar]
- Wang, J.; Liu, Q.; Liang, H.; Joshi, G.; Poor, H.V. Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Virtual, 6–12 December 2020; Curran Associates: Red Hook, NY, USA, 2020; Volume 33, pp. 7611–7623. [Google Scholar]
- Alistarh, D.; Grubic, D.; Li, J.; Tomioka, R.; Vojnovic, M. QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; Curran Associates: Red Hook, NY, USA, 2017; Volume 30, pp. 1707–1718. [Google Scholar]
- Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge Distillation: A Survey. Int. J. Comput. Vis. 2021, 129, 1789–1819. [Google Scholar] [CrossRef]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar] [CrossRef]
- Chen, S.; Wang, H.; Zhang, Y.; Yang, L. Few-Shot Image Classification Algorithm Based on Global-Local Feature Fusion. Electronics 2025, 14, 456. [Google Scholar]
- Sun, J.; Chen, T.; Giannakis, G.B.; Yang, Q.; Yang, Z. Lazily Aggregated Quantized Gradient Innovation for Communication-Efficient Federated Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 2031–2044. [Google Scholar] [CrossRef] [PubMed]
- Liu, H.; He, F.; Cao, G. Communication-Efficient Federated Learning for Heterogeneous Edge Devices Based on Adaptive Gradient Quantization. In Proceedings of the IEEE INFOCOM 2023, New York, NY, USA, 17–20 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–10. [Google Scholar]
- Mao, Y.; Zhao, Z.; Yan, G.; Liu, Y.; Lan, T.; Song, L.; Ding, W. Communication Efficient Federated Learning with Adaptive Quantization. ACM Trans. Intell. Syst. Technol. 2022, 13, 1–26. [Google Scholar] [CrossRef]
- Zhao, Z.; Mao, Y.; Shi, Z.; Liu, Y.; Lan, T.; Ding, W.; Zhang, X.P. AQUILA: Communication Efficient Federated Learning with Adaptive Quantization in Device Selection Strategy. IEEE Trans. Mob. Comput. 2023, 23, 7363–7376. [Google Scholar] [CrossRef]
- Yue, K.; Jin, R.; Wong, C.W.; Dai, H. Communication-Efficient Federated Learning via Predictive Coding. IEEE J. Sel. Top. Signal Process. 2022, 16, 369–380. [Google Scholar] [CrossRef]
- Jiang, Y.; Wang, S.; Valls, V.; Ko, B.J.; Lee, W.H.; Leung, K.K.; Tassiulas, L. Model Pruning Enables Efficient Federated Learning on Edge Devices. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 10374–10386. [Google Scholar] [CrossRef] [PubMed]
- Babakniya, S.; Kundu, S.; Kundu, S.; Venkatesh, S.; Paiva, A.R.C.; Pal, S. FLASH: Concept Drift Adaptation via Federated Learning for Heterogeneous Edge Networks. In Proceedings of the IEEE 43rd International Conference on Distributed Computing Systems (ICDCS), Hong Kong, China, 18–21 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 190–202. [Google Scholar]
- Alam, S.; Liu, L.; Yan, M.; Zhang, M. FedRolex: Model-Heterogeneous Federated Learning with Rolling Sub-Model Extraction. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), New Orleans, LA, USA, 28 November–9 December 2022; Curran Associates: Red Hook, NY, USA, 2022; Volume 35, pp. 29677–29690. [Google Scholar]
- Gao, L.; Fu, H.; Li, L.; Chen, Y.; Xu, M.; Xu, C.Z. FedDC: Federated Learning with Non-IID Data via Local Drift Decoupling and Correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 10102–10111. [Google Scholar]
- Zhang, Z.; Chen, Y.; Wang, X.; Li, B. Non-IID Data in Federated Learning: A Systematic Review with Taxonomy, Metrics, Methods, Frameworks and Future Directions. arXiv 2024, arXiv:2411.12377. [Google Scholar] [CrossRef]
- Li, X.; Jiang, M.; Zhang, X.; Kamp, M.; Dou, Q. FedBN: Federated Learning on Non-IID Features via Local Batch Normalization. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 3–7 May 2021. [Google Scholar]
- Ma, X.; Zhang, J.; Guo, S.; Xu, W. Layer-wised Model Aggregation for Personalized Federated Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 10092–10101. [Google Scholar]
- Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Master’s Thesis, University of Toronto, Toronto, ON, Canada, 2009. [Google Scholar]
- Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv 2017, arXiv:1708.07747. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 770–778. [Google Scholar]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]




| Method | Layer-Adaptive Pruning | Dynamic Scheduling | Heterogeneity- Aware Aggreg. | Mask Regrowth | Pruning Type |
|---|---|---|---|---|---|
| PruneFL [26] | × | ✓ | × | × | Unstructured |
| HeteroFL [13] | × | × | Partial | × | Structured |
| FedDST [7] | × | ✓ | × | ✓ | Unstructured |
| FedMP [8] | Partial | ✓ | × | × | Hybrid |
| LotteryFL [10] | × | × | × | ✓ | Unstructured |
| FedRolex [28] | × | ✓ | Partial | × | Structured |
| FedLP-Q [12] | ✓ | × | × | × | Structured |
| FedLayerPrune (Ours) | ✓ | ✓ | ✓ | ✓ | Hybrid |
| Symbol | Definition |
|---|---|
| K | Number of clients |
|  | Dataset and its size of client k |
|  | Model parameters and their dimension |
|  | Parameters and dimension of layer l |
|  | Layer-wise retention and pruning rates |
|  | Binary mask of layer l |
|  | Parameter importance score (diagonal Fisher information, Equation (9)) |
|  | Layer sensitivity coefficient (type- and depth-dependent) |
|  | Temporal scheduling factor (Equation (7)) |
|  | Base and maximum pruning rates |
|  | Mask consensus voting threshold (default 0.3) |
| R | Regrowth interval in rounds (default 5) |
|  | Regrowth ratio per cycle (default 0.05) |
|  | EMA smoothing factor for importance scores (Equation (10)) |
|  | Mask encoding cost (bitmap, d/8 bytes for a layer with d parameters) |
| Method | CIFAR-10 (ResNet-18) Total Comm. (GB) | CIFAR-10 Reduction vs. FedAvg | CIFAR-10 Avg. Sparsity | MNIST (CNN) Total Comm. (MB) | MNIST Reduction vs. FedAvg | MNIST Avg. Sparsity | Fashion-MNIST (CNN) Total Comm. (MB) | Fashion-MNIST Reduction vs. FedAvg | Fashion-MNIST Avg. Sparsity |
|---|---|---|---|---|---|---|---|---|---|
| FedAvg | 20.16 | - | 0% | 576.0 | - | 0% | 576.0 | - | 0% |
| FedProx | 20.16 | 0% | 0% | 576.0 | 0% | 0% | 576.0 | 0% | 0% |
| FixedPrune-50 | 10.08 | 50.0% | 50.0% | 288.0 | 50.0% | 50.0% | 288.0 | 50.0% | 50.0% |
| FixedPrune-70 | 6.05 | 70.0% | 70.0% | 172.8 | 70.0% | 70.0% | 172.8 | 70.0% | 70.0% |
| FedPrune | 8.47 | 58.0% | 42.3% | 242.9 | 57.8% | 41.8% | 244.1 | 57.6% | 41.5% |
| FedDST | 7.26 | 64.0% | 48.5% | 207.4 | 64.0% | 48.2% | 209.1 | 63.7% | 47.8% |
| FedLayerPrune | 6.45 | 68.3% | 52.7% | 185.2 | 67.8% | 51.9% | 186.8 | 67.6% | 51.5% |
| Method | CIFAR-10 (α = 0.1) | CIFAR-10 (α = 0.5) | CIFAR-10 (α = 1.0) | MNIST (α = 0.1) | MNIST (α = 0.5) | MNIST (α = 1.0) | Fashion-MNIST (α = 0.1) | Fashion-MNIST (α = 0.5) | Fashion-MNIST (α = 1.0) |
|---|---|---|---|---|---|---|---|---|---|
| FedAvg | 89.2 ± 0.3 | 92.1 ± 0.2 | 93.5 ± 0.2 | 96.8 ± 0.2 | 98.2 ± 0.1 | 98.7 ± 0.1 | 87.3 ± 0.4 | 89.8 ± 0.3 | 91.2 ± 0.2 |
| FedProx | 89.5 ± 0.3 | 91.8 ± 0.2 | 93.2 ± 0.2 | 96.9 ± 0.2 | 98.0 ± 0.1 | 98.5 ± 0.1 | 87.6 ± 0.3 | 89.5 ± 0.3 | 90.9 ± 0.2 |
| FixedPrune-50 | 85.6 ± 0.5 | 88.9 ± 0.4 | 90.7 ± 0.3 | 94.2 ± 0.3 | 96.5 ± 0.2 | 97.3 ± 0.2 | 84.1 ± 0.5 | 87.2 ± 0.4 | 88.9 ± 0.3 |
| FixedPrune-70 | 81.3 ± 0.7 | 85.2 ± 0.6 | 87.8 ± 0.5 | 91.5 ± 0.5 | 94.1 ± 0.4 | 95.6 ± 0.3 | 80.2 ± 0.8 | 84.3 ± 0.6 | 86.7 ± 0.5 |
| FedPrune | 87.1 ± 0.4 | 90.3 ± 0.3 | 91.8 ± 0.3 | 95.6 ± 0.3 | 97.3 ± 0.2 | 97.9 ± 0.1 | 85.7 ± 0.4 | 88.4 ± 0.3 | 90.1 ± 0.3 |
| FedDST | 87.5 ± 0.4 | 90.8 ± 0.3 | 92.2 ± 0.3 | 95.9 ± 0.3 | 97.5 ± 0.2 | 98.1 ± 0.1 | 86.0 ± 0.4 | 88.7 ± 0.3 | 90.4 ± 0.2 |
| FedLayerPrune | 88.7 ± 0.3 | 91.5 ± 0.2 | 92.9 ± 0.2 | 96.4 ± 0.2 | 97.8 ± 0.1 | 98.4 ± 0.1 | 86.8 ± 0.3 | 89.3 ± 0.2 | 90.8 ± 0.2 |
| Method Variant | Layer-Adaptive | Dynamic Pruning | Heterogeneity-Aware | Accuracy (%) | Comm. Cost (GB) | Acc. Drop |
|---|---|---|---|---|---|---|
| FedLayerPrune (Full) | ✓ | ✓ | ✓ | 91.5 ± 0.2 | 6.45 | – |
| w/o Layer-Adaptive | × | ✓ | ✓ | 89.8 ± 0.3 | 7.12 | 1.7% |
| w/o Dynamic Adjustment | ✓ | × | ✓ | 90.2 ± 0.3 | 7.89 | 1.3% |
| w/o Heterogeneity-Aware | ✓ | ✓ | × | 90.6 ± 0.3 | 6.52 | 0.9% |
| Basic Pruning Only | × | × | × | 88.1 ± 0.4 | 8.47 | 3.4% |
| Consensus Voting Threshold | Accuracy (%) | Comm. Cost (GB) | Avg. Sparsity | Mask Stability |
|---|---|---|---|---|
| 0.1 | 91.2 ± 0.3 | 7.03 | 48.1% | Low |
| 0.3 (default) | 91.5 ± 0.2 | 6.45 | 52.7% | Moderate |
| 0.5 | 91.3 ± 0.2 | 6.12 | 55.3% | High |
| 0.7 | 90.4 ± 0.4 | 5.68 | 59.8% | Very High |
| Method | Mean Acc. (%) | Std. Dev. | Worst Client (%) | Best Client (%) |
|---|---|---|---|---|
| FedAvg | 89.2 | 3.1 | 83.5 | 94.2 |
| FedProx | 89.5 | 2.7 | 84.8 | 93.9 |
| FixedPrune-70 | 81.3 | 5.8 | 71.2 | 89.1 |
| FedDST | 87.5 | 3.6 | 81.3 | 92.8 |
| FedLayerPrune | 88.7 | 2.9 | 83.1 | 93.5 |
| Method | Accuracy (%) | Total Comm. (GB) | Reduction | Rounds to 85% |
|---|---|---|---|---|
| FedAvg | 90.8 ± 0.4 | 50.40 | – | 12 |
| FedProx | 91.1 ± 0.3 | 50.40 | 0% | 11 |
| FixedPrune-70 | 83.1 ± 0.8 | 15.12 | 70.0% | 28 |
| FedDST | 89.2 ± 0.5 | 18.65 | 63.0% | 16 |
| FedLayerPrune | 90.1 ± 0.3 | 16.13 | 68.0% | 14 |