TinySLFL: A Flash-Endurance-Aware Federated Edge Learning Framework with Layer-Wise Delayed Aggregation for Resource-Constrained Microcontrollers
Abstract
1. Introduction
- We identify and formalize flash program/erase (P/E) endurance as a first-class optimization target in federated learning on MCUs, and we provide an end-to-end device-lifetime evaluation methodology that complements the single-round cost metrics common in prior work.
- We design TinySLFL, a hardware-aware FL framework whose layer-wise training and delayed aggregation are provably SRAM-bounded and reduce the per-round flash writes from to , where K is the network depth.
- We propose a three-mechanism dynamic aggregation strategy that improves Non-IID accuracy and shortens the round budget, and complement it with a fault-tolerant streaming protocol for realistic lossy-network MCU deployments.
- We provide experiments spanning three vision benchmarks, real ESP32-S3 measurements, full-module ablation, and a hyperparameter sensitivity analysis, establishing TinySLFL as a practical pathway for long-term, sustainable on-device federated adaptation.
2. Related Work
2.1. On-Device Training on Microcontrollers
2.2. Federated Learning Under Resource and Data Heterogeneity
2.3. Flash Memory Endurance and Wear-Aware Systems
3. Preliminaries and Problem Formulation
3.1. System Model and Federated Objective
3.2. Constraint I: The SRAM Memory Wall
3.3. Constraint II: Flash Endurance
3.4. Co-Optimization Objective
4. The TinySLFL Framework
4.1. Problem Formulation
4.2. Client Side: Layer-Wise Training with Delayed Aggregation
4.2.1. Layer-Wise Training Scheduler
4.2.2. Delayed-Aggregation Protocol
Algorithm 1. Client side: layer-wise training with delayed streaming upload

- Require: round t; global model; local data; layer count K; per-layer learning rates; local epochs E
- Ensure: layer-wise updates and loss statistics uploaded to the server
- Receive the global model; persist it to Flash ▹ Phase 1: single write
- for k = 1 to K do ▹ Phase 2: streaming layer-wise upload
  - if layer k is frozen then continue ▹ layer frozen by server
  - Load layer k into SRAM; allocate gradient/optimizer buffers
  - for each of the E local epochs do
    - train layer k on the local data with its learning rate
  - end for
  - Compute the layer-wise loss reduction
  - Send the layer-k update and loss reduction with CRC
  - Free SRAM of layer k ▹ Phase 3: zero Flash write-back
- end for
- return round completion after all non-frozen layers have been uploaded or skipped
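The three phases of Algorithm 1 can be sketched on a host machine in plain Python. This is a minimal illustration under stated assumptions, not the authors' firmware: `FlashStore`, `pack_update`, `check_crc`, `client_round`, and the `train_epoch` callback are hypothetical names, the frame layout is invented for the example, and CRC-32 from `zlib` stands in for whichever checksum the protocol actually uses.

```python
import struct
import zlib
from typing import Callable, Dict, List, Set, Tuple

Layer = List[float]
# Hypothetical callback: runs one local epoch on layer k and returns
# (new_weights, loss_before_epoch, loss_after_epoch).
TrainEpoch = Callable[[int, Layer], Tuple[Layer, float, float]]


class FlashStore:
    """Toy stand-in for flash that counts full-model persists."""

    def __init__(self) -> None:
        self.layers: Dict[int, Layer] = {}
        self.persist_count = 0

    def persist_model(self, model: Dict[int, Layer]) -> None:
        self.layers = {k: list(w) for k, w in model.items()}
        self.persist_count += 1

    def load_layer(self, k: int) -> Layer:
        return list(self.layers[k])  # working copy, i.e. the "SRAM" buffer


def pack_update(k: int, weights: Layer, loss_drop: float) -> bytes:
    """Assumed frame: layer id, loss reduction, float32 weights, CRC-32."""
    payload = struct.pack(f"<Hf{len(weights)}f", k, loss_drop, *weights)
    return payload + struct.pack("<I", zlib.crc32(payload))


def check_crc(frame: bytes) -> bool:
    """Receiver-side check of the trailing CRC-32."""
    payload, (crc,) = frame[:-4], struct.unpack("<I", frame[-4:])
    return zlib.crc32(payload) == crc


def client_round(
    global_model: Dict[int, Layer],
    frozen: Set[int],
    train_epoch: TrainEpoch,
    send: Callable[[bytes], None],
    flash: FlashStore,
    epochs: int,
) -> int:
    """Run one round and return the number of layer frames sent."""
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    flash.persist_model(global_model)  # Phase 1: the only flash write
    sent = 0
    for k in sorted(global_model):  # Phase 2: one layer at a time
        if k in frozen:
            continue  # layer frozen by the server
        w = flash.load_layer(k)
        first_loss = None
        last_loss = 0.0
        for _ in range(epochs):
            w, before, after = train_epoch(k, w)
            if first_loss is None:
                first_loss = before
            last_loss = after
        send(pack_update(k, w, first_loss - last_loss))
        sent += 1
        del w  # Phase 3: buffer released, nothing written back to flash
    return sent
```

In this sketch the trained weights leave the device only through `send`, so `flash.persist_count` stays at 1 per round however many layers are trained.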
4.2.3. Fault Tolerance, State Alignment, and Straggler Control
4.3. Server Side: Dynamic Aggregation
4.3.1. Layer-Wise Learning-Rate Scheduling
4.3.2. Loss-Aware Freezing
4.3.3. Accuracy-Guided Selective Aggregation
4.4. Complexity, Overhead, and Formal Guarantees
Algorithm 2. Server side: dynamic aggregation

- Require: Round t; participating clients ; global model ; freezing threshold ; accuracy threshold ; frozen set ; reactivation margin ; minimum unfreeze rate ; best proxy accuracy
- Ensure: Updated global model ; next-round learning rates ; frozen set
- for k = 1 to K do
  - if then ; continue ▹ skip frozen layer
  - end if
  - Compute via (11)–(12) ▹ Step 1: loss-aware freezing
  - if then ;
    - if and then ; ▹ unfreeze next layer
    - end if
  - else
    - Update via (10)
    - Form by replacing layer k of current ; compute via (13) ▹ Step 2: accuracy-guided filtering
    - if then ▹ commit
    - else ▹ reject
    - end if
  - end if
- end for
- Compute round-end proxy accuracy
- if then
  - Select ; ▹ recent or drift-sensitive layers
  - For , set ▹ reactivate/unfreeze
- end if
- if all layers remain frozen then
  - return and terminate global rounds
- end if
- Broadcast and to all clients
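The control flow of Algorithm 2 can be approximated in a few lines of Python. The sketch below is deliberately simplified and rests on several assumptions: an unweighted mean stands in for the update rule of Eq. (10), the mean client loss reduction stands in for the freezing statistic of Eqs. (11)–(12), the `proxy_acc` callback stands in for Eq. (13), reactivation unfreezes every frozen layer instead of a selected subset, and the "unfreeze next layer" branch and learning-rate scheduling are omitted. All function and parameter names are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Layer = List[float]
Model = Dict[int, Layer]


def mean_layer(updates: List[Layer]) -> Layer:
    """Unweighted element-wise mean; a stand-in for Eq. (10)."""
    return [sum(col) / len(updates) for col in zip(*updates)]


def server_round(
    model: Model,
    updates: Dict[int, List[Layer]],
    loss_drops: Dict[int, List[float]],
    frozen: Set[int],
    proxy_acc: Callable[[Model], float],
    freeze_thr: float,
    acc_tol: float,
    best_acc: float,
    margin: float,
) -> Tuple[Model, Set[int], float, bool]:
    """Return (new_model, new_frozen_set, new_best_acc, all_frozen)."""
    model = {k: list(w) for k, w in model.items()}
    frozen = set(frozen)
    for k in sorted(model):
        if k in frozen:
            continue  # skip frozen layer
        drops = loss_drops[k]
        # Step 1: loss-aware freezing (assumed statistic: mean loss reduction).
        if sum(drops) / len(drops) < freeze_thr:
            frozen.add(k)
            continue
        # Step 2: accuracy-guided filtering on the server proxy set.
        candidate = dict(model)
        candidate[k] = mean_layer(updates[k])
        if proxy_acc(candidate) >= proxy_acc(model) - acc_tol:
            model = candidate  # commit
        # otherwise reject: layer k keeps its current weights
    acc = proxy_acc(model)
    if acc < best_acc - margin:
        frozen.clear()  # simplified reactivation: unfreeze everything
    best_acc = max(best_acc, acc)
    all_frozen = len(frozen) == len(model)
    return model, frozen, best_acc, all_frozen
```

A caller would stop issuing global rounds once `all_frozen` is true, mirroring the termination check at the end of Algorithm 2.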
5. Results
5.1. Experimental Setup
| Item | Configuration |
|---|---|
| Datasets | CIFAR-10, SVHN, FEMNIST |
| Non-IID partition | CIFAR-10/SVHN: Dirichlet ; FEMNIST: writer-identity partition; clients |
| Backbone | ResNet-8 (∼70 KB), pre-trained on Tiny-ImageNet |
| Local epochs/batch size | /effective batch 32 (hardware batch 1, accumulation 32 in both simulation and device runs) |
| Optimizer | SGD, momentum , , layer-wise decay |
| Server thresholds | , , |
| Server proxy set | Held-out validation pool (20% of partitioned data): CIFAR-10 10,000; SVHN ∼14,651; FEMNIST ∼800–1,200 |
| MCU model | Espressif ESP32-S3; 512 KB SRAM; 8 MB NOR flash |
| Flash erase block | 4 KB |
| Wear estimation | Server-side simulation with inferred erase-block tracing |
| Simulation seeds | 5 random seeds |
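The setup pairs a hardware batch of 1 with 32 accumulation steps to reach an effective batch of 32. A minimal sketch of that pattern follows, using a toy linear model with squared loss purely for illustration. The function names are hypothetical, and momentum and layer-wise decay are left out to keep the example short.

```python
from typing import Iterable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]


def sample_grad(w: Sequence[float], x: Sequence[float], y: float) -> List[float]:
    """Gradient of the squared error (w.x - y)^2 for a single sample."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [2.0 * err * xi for xi in x]


def accumulated_sgd(
    w: Sequence[float],
    stream: Iterable[Sample],
    lr: float,
    accum: int = 32,
) -> List[float]:
    """Process one sample at a time; apply one SGD step every `accum` samples.

    Only a single gradient buffer the size of `w` is kept between samples.
    A trailing group with fewer than `accum` samples is dropped.
    """
    w = list(w)
    buf = [0.0] * len(w)
    seen = 0
    for x, y in stream:
        g = sample_grad(w, x, y)  # hardware batch of 1
        buf = [b + gi for b, gi in zip(buf, g)]
        seen += 1
        if seen == accum:
            # Averaging the buffer reproduces the mean gradient of the batch.
            w = [wi - lr * b / accum for wi, b in zip(w, buf)]
            buf = [0.0] * len(w)
            seen = 0
    return w
```

Because the weights are held fixed while the buffer fills, the resulting step matches a single step on the mean gradient of the 32 samples, while peak activation memory stays at the single-sample level.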

| Method | Peak SRAM (KB) | FW/Round | Latency/Round (s) | Energy/Round (J) |
|---|---|---|---|---|
| FedAvg | OOM | — | — | — |
| SLT | 310 | 558 | 235 | |
| TinySLFL | 320 | 18 | 192 | |
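One way to read the FW/Round column is as a count of erase blocks touched per round. The sketch below traces which 4 KB blocks a write covers. Treating FW as touched erase blocks, and the model as exactly 70 KB stored from an aligned offset, are assumptions made for illustration; `blocks_touched` and `trace_writes` are hypothetical names and not the authors' tracing tool.

```python
from collections import Counter
from typing import Iterable, Tuple

BLOCK_BYTES = 4 * 1024  # flash erase block size from the setup table


def blocks_touched(offset: int, length: int, block: int = BLOCK_BYTES) -> range:
    """Indices of the erase blocks covered by a write of `length` bytes."""
    if length <= 0:
        return range(0)
    first = offset // block
    last = (offset + length - 1) // block
    return range(first, last + 1)


def trace_writes(writes: Iterable[Tuple[int, int]], block: int = BLOCK_BYTES) -> Counter:
    """Tally how often each erase block is touched by (offset, length) writes."""
    counts: Counter = Counter()
    for offset, length in writes:
        counts.update(blocks_touched(offset, length, block))
    return counts


# One aligned persist of a model assumed to be exactly 70 KB.
model_bytes = 70 * 1024
per_round = trace_writes([(0, model_bytes)])
print(len(per_round))  # 18
```

Under these assumptions a single full-model persist touches 18 blocks, which matches the value reported for TinySLFL, and the SLT figure of 558 equals 31 × 18.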

| Method | Simulation Acc. (%) | On-Device Acc. (%) |
|---|---|---|
| SLT | | |
| TinySLFL | | |

| Method | CIFAR-10 | SVHN | FEMNIST |
|---|---|---|---|
| FedAvg † | 90,031 | 93,782 | 90,031 |
| FedProx † | 82,529 | 61,896 | 31,886 |
| SLT | 24,552 | 23,436 | 12,834 |
| TinySLFL | 342 | 486 | 288 |
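Assuming the entries above are cumulative flash writes to reach the target accuracy, they can be cross-checked against the per-round FW values (558 for SLT, 18 for TinySLFL) reported with the hardware measurements. The short sketch below recovers the implied number of rounds and the SLT-to-TinySLFL reduction factor. The interpretation of the table and the helper names are assumptions.

```python
from typing import Dict

FW_PER_ROUND: Dict[str, int] = {"SLT": 558, "TinySLFL": 18}

WEAR_TO_TARGET: Dict[str, Dict[str, int]] = {
    "SLT": {"CIFAR-10": 24552, "SVHN": 23436, "FEMNIST": 12834},
    "TinySLFL": {"CIFAR-10": 342, "SVHN": 486, "FEMNIST": 288},
}


def implied_rounds(method: str, dataset: str) -> int:
    """Rounds to target, assuming wear = FW/round x rounds."""
    rounds, remainder = divmod(WEAR_TO_TARGET[method][dataset], FW_PER_ROUND[method])
    if remainder:
        raise ValueError("wear is not a whole multiple of FW/round")
    return rounds


def reduction_factor(dataset: str) -> float:
    """How many times fewer flash writes TinySLFL needs than SLT."""
    return WEAR_TO_TARGET["SLT"][dataset] / WEAR_TO_TARGET["TinySLFL"][dataset]


for ds in ("CIFAR-10", "SVHN", "FEMNIST"):
    print(ds, implied_rounds("SLT", ds), implied_rounds("TinySLFL", ds),
          f"{reduction_factor(ds):.1f}x")
```

Every SLT and TinySLFL entry divides evenly by its FW/round value, giving 44, 42, and 23 rounds for SLT against 19, 27, and 16 for TinySLFL, which supports reading the table as FW/round multiplied by rounds-to-target.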
5.2. Accuracy and Wear-to-Target Comparison
Device Lifetime Projection
5.3. Real-Device Validation on ESP32-S3
5.4. Ablation Study and Sensitivity Analysis
5.4.1. Ablation Study
5.4.2. Freeze/Unfreeze Dynamics
5.4.3. Sensitivity Analysis
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest




| Method | CIFAR-10 | SVHN | FEMNIST |
|---|---|---|---|
| FedAvg | 68.41 ± 0.23 | 90.83 ± 0.16 | 71.68 ± 0.23 |
| FedProx | 69.80 ± 0.31 | 91.22 ± 0.25 | 80.65 ± 0.26 |
| SLT | 71.31 ± 0.28 | 90.31 ± 0.17 | 80.54 ± 0.35 |
| TinySLFL | 76.55 ± 0.18 | 92.23 ± 0.21 | 81.82 ± 0.38 |

| Method | Peak FW/Round | Lifetime (Light) | Lifetime (Frequent) |
|---|---|---|---|
| FedAvg | OOM | — | — |
| SLT | 558 | days | days |
| TinySLFL | 18 | >15 years | >9 months |
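A lifetime projection of this kind can be approximated with a simple wear budget. The sketch below assumes ideal wear leveling over a fixed pool of erase blocks; the endurance rating, pool size, and rounds per day are hypothetical inputs chosen by the caller, and the function names are illustrative. It is not the projection model used to produce the table.

```python
def lifetime_rounds(endurance_cycles: int, pool_blocks: int, fw_per_round: int) -> float:
    """Rounds until the pool's total P/E budget is spent (ideal wear leveling)."""
    if fw_per_round <= 0:
        raise ValueError("fw_per_round must be positive")
    return endurance_cycles * pool_blocks / fw_per_round


def lifetime_days(
    endurance_cycles: int,
    pool_blocks: int,
    fw_per_round: int,
    rounds_per_day: float,
) -> float:
    """Projected lifetime in days for a given training frequency."""
    if rounds_per_day <= 0:
        raise ValueError("rounds_per_day must be positive")
    return lifetime_rounds(endurance_cycles, pool_blocks, fw_per_round) / rounds_per_day


# Hypothetical inputs: the same device and schedule for both methods.
common = dict(endurance_cycles=100_000, pool_blocks=64, rounds_per_day=24.0)
ratio = lifetime_days(fw_per_round=18, **common) / lifetime_days(fw_per_round=558, **common)
print(round(ratio, 6))  # 31.0
```

Whatever values are chosen for the hypothetical inputs, the ratio between the two methods depends only on their FW/round, so under this model TinySLFL outlasts SLT by a factor of 558 / 18 = 31.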

| Method | Snapshot (J) | Training (J) | Wi-Fi (J) | Idle (J) | Total/Round (J) | Inference/Sample (mJ) |
|---|---|---|---|---|---|---|
| SLT | | | | | | |
| TinySLFL | | | | | | |

| 0.70 | 0.90 | 0.95 | |
|---|---|---|---|
| 72.86 | 74.09 | 72.99 | |
| 74.66 | 76.55 | 73.17 | |
| 73.78 | 75.11 | 74.82 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Tao, Y.; Jia, J.; Deng, T. TinySLFL: A Flash-Endurance-Aware Federated Edge Learning Framework with Layer-Wise Delayed Aggregation for Resource-Constrained Microcontrollers. Electronics 2026, 15, 2084. https://doi.org/10.3390/electronics15102084

