FEM-Based Hybrid Compression Framework with Pipeline Implementation for Efficient Deep Neural Networks on Tiny ImageNet
Abstract
1. Introduction
- Design a unified FEM-integrated hybrid compression framework that coherently combines FP16, structured pruning, and KD.
- Evaluate its adaptability across both high-capacity (ResNet50) and lightweight (MobileNetV3) architectures.
- Establish an efficient paradigm for deploying deep learning models in resource-constrained environments such as embedded vision, mobile AI, and edge intelligence.
- Enable real-time inference for deployment in mobile and edge AI scenarios, including IoT and autonomous systems such as drones.
- A unified FEM-based hybrid compression framework is proposed, which integrates mixed-precision computation, structured pruning, and KD into a single optimization pipeline.
- An FEM is proposed to strengthen inter-channel dependencies and retain semantic richness under aggressive compression.
- The proposed framework is validated on ResNet50 and MobileNetV3 using the Tiny ImageNet dataset to demonstrate its scalability across architectures.
- Experimental results demonstrate an effective trade-off between accuracy, computational efficiency, and resource utilization, achieving a Top-1 accuracy gain of approximately 6 percentage points over recent state-of-the-art methods and up to 24 percentage points over the baseline model, memory savings of at least 32.26%, and a reduction in inference latency of approximately 66%.
2. Related Work
3. Theoretical Analysis
3.1. Convolutional Neural Networks
3.2. Residual Learning and ResNet50
3.3. MobileNetV3
4. Materials and Methods
4.1. Framework Overview
4.2. Feature Enhancement Module
- Shallow MLP: A lightweight path consisting of a linear layer, Rectified Linear Unit (ReLU) activation, dropout, and an output projection.
- Deep MLP: A deeper path with additional layers for modeling more complicated feature relationships.
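The shallow path described above (linear layer, ReLU, dropout, output projection) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the dropout probability `p=0.1`, and the toy weights are hypothetical, and how the shallow and deep paths are combined inside the FEM is not shown here.

```python
import random

def linear(x, weight, bias):
    """Affine map y = W x + b for one vector x (weight is a list of rows)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def relu(x):
    """Element-wise Rectified Linear Unit."""
    return [max(0.0, v) for v in x]

def dropout(x, p, training, rng):
    """Inverted dropout: active only in training, identity at inference."""
    if not training or p == 0.0:
        return list(x)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]

def shallow_path(x, w1, b1, w2, b2, p=0.1, training=False, rng=None):
    """Linear -> ReLU -> dropout -> output projection (p is an assumed value)."""
    rng = rng or random.Random(0)
    hidden = relu(linear(x, w1, b1))
    hidden = dropout(hidden, p, training, rng)
    return linear(hidden, w2, b2)

# Toy example: a 2-dimensional descriptor projected to a single output.
example_out = shallow_path(
    [1.0, -2.0],
    w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
    w2=[[2.0, 3.0]], b2=[0.5],
)
```

A deeper path would follow the same pattern with additional linear and activation stages between the first layer and the output projection.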
| Algorithm 1 Feature Enhancement Module (FEM) Procedure |
|---|
| Require: Input feature map |
| Ensure: Enhanced feature representation |
4.3. Mixed-Precision Training
4.4. Structured Channel Pruning
4.5. Knowledge Distillation
- $\mathcal{L}_{CE}$: cross-entropy loss between the student predictions and the ground-truth labels $y$.
- $\mathcal{L}_{KL}$: Kullback–Leibler divergence between the softened output distributions of the teacher and the student models.
- $T$: temperature parameter (set to 4.0) that controls the softening of the probability distributions.
- $\alpha$: weighting factor (set to 0.3) that balances the contributions of the soft distillation loss and the hard classification loss.
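The terms listed above can be combined as in the following sketch, which uses the temperature (4.0) and distillation weight (0.3) from the training configuration. Two points are assumptions rather than statements from the text: the weight is applied to the soft term and its complement to the hard term, and the soft term is scaled by the squared temperature, as in the original distillation formulation of Hinton et al. Label smoothing is omitted for brevity, and all function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(student_logits, label):
    """Hard-label loss: negative log-probability of the true class."""
    return -math.log(softmax(student_logits)[label])

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.3):
    """Weighted sum of the soft (teacher) and hard (label) terms.

    Assumption: alpha weights the soft term, (1 - alpha) the hard term.
    """
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard = cross_entropy(student_logits, label)
    return alpha * soft + (1.0 - alpha) * hard

# Toy logits for a 3-class problem.
student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.2, -2.0]
loss_value = kd_loss(student, teacher, label=0)
```

When the student reproduces the teacher's logits exactly, the soft term vanishes and only the weighted cross-entropy remains, which is a quick sanity check for any implementation of this loss.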
4.6. Pipeline Flow
- Initialize the backbone network (ResNet50 or MobileNetV3) and integrate the FEM module.
- Train the model using FP32 to establish a performance baseline.
- Apply FP16 training using AMP to reduce computational cost.
- Perform structured channel pruning by removing low-importance filters based on L1-norm criteria, followed by fine-tuning.
- Apply KD to transfer knowledge from the teacher model to the compressed student model.
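The L1-norm pruning step in the flow above can be sketched as follows: each filter is scored by the sum of the absolute values of its weights, and the lowest-scoring 30% are removed. This is a minimal illustration on nested lists rather than the authors' code; the function names are hypothetical, and the sketch does not cover rebuilding the layers or the fine-tuning that follows.

```python
def l1_norm(weights):
    """Sum of absolute values over a nested list of any depth."""
    if isinstance(weights, (int, float)):
        return abs(weights)
    return sum(l1_norm(w) for w in weights)

def select_filters(filters, prune_percent=30):
    """Return the indices of the filters to keep, in their original order.

    An integer percentage is used so the number of pruned filters is
    computed without floating-point truncation surprises.
    """
    n_prune = len(filters) * prune_percent // 100
    norms = [l1_norm(f) for f in filters]
    ranked = sorted(range(len(filters)), key=lambda i: norms[i])
    pruned = set(ranked[:n_prune])
    return [i for i in range(len(filters)) if i not in pruned]

# Ten toy filters whose L1 norms grow with the index.
toy_filters = [[[float(i + 1), -float(i + 1)]] for i in range(10)]
kept = select_filters(toy_filters, prune_percent=30)
```

Because whole filters are removed, the corresponding input channels of the next layer must also be dropped, which is what makes the pruning structured rather than a sparsity mask.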
4.7. Dataset and Preprocessing
- Random horizontal flipping
- Random rotation ()
- Color jittering (brightness, contrast, and saturation variations)
- Normalization using the ImageNet mean and standard deviation
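Two of the steps above, random horizontal flipping and normalization, can be sketched without any framework. The per-channel mean and standard deviation are the commonly used ImageNet statistics; the function names and the flip probability are hypothetical. Rotation and color jittering are omitted because their ranges are not given here.

```python
import random

# Widely used ImageNet channel statistics (RGB, pixel values scaled to [0, 1]).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def random_horizontal_flip(image, p, rng):
    """Mirror a [C][H][W] nested-list image left to right with probability p."""
    if rng.random() < p:
        return [[row[::-1] for row in channel] for channel in image]
    return image

def normalize(image, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Subtract the channel mean and divide by the channel std."""
    return [
        [[(px - m) / s for px in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]

# Toy 3-channel image with one row of two pixels per channel.
toy_image = [[[0.485, 1.0]], [[0.456, 0.0]], [[0.406, 0.5]]]
flipped = random_horizontal_flip(toy_image, p=1.0, rng=random.Random(0))
normalized = normalize(toy_image)
```

Augmentations such as flipping are applied only to training images, whereas normalization is applied to training and evaluation images alike.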
4.8. Training Configuration
4.9. Reproducibility
5. Experiments and Results
5.1. Experimental Setup and Training Details
5.2. Evaluation Metrics
- Top-1 Accuracy: Equation (1)
- Memory Reduction (MR%): Equation (2)
- Latency (ms): Equation (3)
- Compression Ratio (CR): Equation (4)
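As a worked check of the metrics listed above, the sketch below applies the usual relative-reduction definition to the memory and latency values reported in the results tables. The formulas are assumptions in the sense that they are the standard definitions, chosen because they agree with the reported reductions; whether the compression ratio is computed from parameter counts or from memory is not stated, so the parameter-based version shown here is only one possibility. All function names are hypothetical.

```python
def top1_accuracy(predictions, labels):
    """Percentage of samples whose predicted class equals the label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return 100.0 * correct / len(labels)

def relative_reduction(baseline, compressed):
    """Percentage reduction of a cost (memory, latency) versus the baseline."""
    return 100.0 * (baseline - compressed) / baseline

def compression_ratio(baseline, compressed):
    """Baseline size divided by compressed size (assumed definition)."""
    return baseline / compressed

# ResNet50 + FEM: FP32 baseline versus final pipeline model.
resnet_mr = relative_reduction(536.0, 363.1)    # memory in MB, about 32.26
resnet_lat = relative_reduction(9.48, 3.16)     # latency in ms, about 66.67
resnet_cr = compression_ratio(32.53, 24.66)     # parameters in millions

# MobileNetV3 + FEM: FP32 baseline versus final pipeline model.
mobilenet_mr = relative_reduction(107.8, 72.76)  # about 32.5
mobilenet_lat = relative_reduction(6.78, 3.57)   # about 47.3
```

Reductions computed this way are relative to the FP32 baseline with the FEM attached, so they describe the effect of the compression stages and not the cost of the FEM itself.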
5.3. Results for ResNet50 + FEM
5.4. Results for MobileNetV3 + FEM
Comparative Analysis Between Architectures
5.5. Ablation Analysis of the FEM Module
5.6. Comparative Analysis with State-of-the-Art Methods
5.7. Stochastic Stability Verification
5.8. Summary of Experimental Findings
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| AI | Artificial Intelligence |
| AMP | Automatic Mixed Precision |
| CNNs | Convolutional Neural Networks |
| CR | Compression Ratio |
| CUDA | Compute Unified Device Architecture |
| DNNs | Deep Neural Networks |
| FEM | Feature Enhancement Module |
| FLOPs | Floating Point Operations |
| FP16 | 16-bit Floating Point |
| FP32 | 32-bit Floating Point |
| GAP | Global Average Pooling |
| GPU | Graphics Processing Unit |
| IoT | Internet of Things |
| KD | Knowledge Distillation |
| MLP | Multi-Layer Perceptron |
| MR | Memory Reduction |
| NAS | Neural Architecture Search |
| RAM | Random Access Memory |
| ReLU | Rectified Linear Unit |
| SE | Squeeze-and-Excitation |
| VRAM | Video Random Access Memory |
References
- Li, Z.; Li, H.; Meng, L. Model compression for deep neural networks: A survey. Computers 2023, 12, 60.
- Lee, H.; Lee, N.; Lee, S. A method of deep learning model optimization for image classification on edge device. Sensors 2022, 22, 7344.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2016; pp. 770–778.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2019; pp. 1314–1324.
- Li, M.; Huang, Z.; Chen, L.; Ren, J.; Jiang, M.; Li, F.; Fu, J.; Gao, C. Contemporary advances in neural network quantization: A survey. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–10.
- Stanford Vision Lab. Tiny ImageNet Visual Recognition Challenge. 2015. Available online: http://cs231n.stanford.edu/tiny-imagenet-200.zip (accessed on 10 January 2026).
- Ahmed Zaid, D.; Djamaa, B.; Benatia, M.A. Efficient and dynamic layer-wise structured N:M pruning of deep neural networks. Neurocomputing 2025, 653, 131090.
- Deng, C.; Cheng, J.; Su, Y.; An, Z.; Yang, Z.; Xia, Z.; Zhang, Y.; Wang, S. WideTopo: Improving foresight neural network pruning through training dynamics preservation and wide topologies exploration. Neural Netw. 2025, 194, 108136.
- Wu, D.; Wang, Y.; Fei, Y.; Gao, G. A Novel Mixed-Precision Quantization Approach for CNNs. IEEE Access 2025, 13, 49309–49319.
- Rakka, M.; Fouda, M.E.; Khargonekar, P.; Kurdahi, F. A review of state-of-the-art mixed-precision neural network frameworks. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 7793–7812.
- Zhang, R.; Jiang, H.; Wang, W.; Liu, J. Optimization Methods, Challenges, and Opportunities for Edge Inference: A Comprehensive Survey. Electronics 2025, 14, 1345.
- Rokh, B.; Azarpeyvand, A.; Khanteymoori, A. A comprehensive survey on model quantization for deep neural networks in image classification. ACM Trans. Intell. Syst. Technol. 2023, 14, 1–50.
- Kim, G.I.; Hwang, S.; Jang, B. Efficient compressing and tuning methods for large language models: A systematic literature review. ACM Comput. Surv. 2025, 57, 1–39.
- Tmamna, J.; Ayed, E.B.; Fourati, R.; Gogate, M.; Arslan, T.; Hussain, A.; Ayed, M.B. Pruning deep neural networks for green energy-efficient models: A survey. Cogn. Comput. 2024, 16, 2931–2952.
- Wang, S.; Zhu, Q. Channel modulus normalization for CNN image classification. Multimed. Syst. 2024, 30, 305.
- Dantas, P.V.; Da Silva, W.S.; Cordeiro, L.C.; Carvalho, C.B. A comprehensive review of model compression techniques in machine learning. Appl. Intell. 2024, 54, 11804–11844.
- Lian, Y.; Peng, P.; Jiang, K.; Xu, W. Cross-layer importance evaluation for neural network pruning. Neural Netw. 2024, 179, 106496.
- Zhao, H.; Guan, R.; Man, K.L.; Yu, L.; Yue, Y. RePaIR: Repaired pruning at initialization resilience. Neural Netw. 2025, 184, 107086.
- Mondal, M.; Das, B.; Lall, B.; Singh, P.; Roy, S.D.; Joshi, S.D. Feature independent filter pruning by successive layers analysis. Comput. Vis. Image Underst. 2023, 236, 103828.
- Waheed, Z.; Khalid, S.; Riaz, S.M.; Khawaja, S.G.; Tariq, R. Resource-Restricted Environments Based Memory-Efficient Compressed Convolutional Neural Network Model for Image-Level Object Classification. IEEE Access 2022, 11, 1386–1406.
- Xu, Y.; Khan, T.M.; Song, Y.; Meijering, E. Edge deep learning in computer vision and medical diagnostics: A comprehensive survey. Artif. Intell. Rev. 2025, 58, 93.
- Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge Distillation: A Survey. Int. J. Comput. Vis. 2021, 129, 1789–1819.
- Hao, Z.; Guo, J.; Han, K.; Tang, Y.; Hu, H.; Wang, Y.; Xu, C. One-for-all: Bridge the gap between heterogeneous architectures in knowledge distillation. Adv. Neural Inf. Process. Syst. 2023, 36, 79570–79582.
- Francy, S.; Singh, R. Edge AI: Evaluation of model compression techniques for convolutional neural networks. arXiv 2024, arXiv:2409.02134.
- Jiang, C.; Hou, M.; Wang, H. An Adaptive Compression Method for Lightweight AI Models of Edge Nodes in Customized Production. Sensors 2026, 26, 383.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 4510–4520.
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Micikevicius, P.; Narang, S.; Alben, J.; Diamos, G.; Elsen, E.; Garcia, D.; Ginsburg, B.; Houston, M.; Kuchaiev, O.; Venkatesh, G.; et al. Mixed precision training. arXiv 2017, arXiv:1710.03740.
- He, Y.; Zhang, X.; Sun, J. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2017; pp. 1389–1397.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
- Cugu, I.; Akbas, E. A Deeper Look into Convolutions via Eigenvalue-based Pruning. arXiv 2021, arXiv:2102.02804.
- Hou, Y.; Ma, Z.; Liu, C.; Wang, Z.; Loy, C.C. Network pruning via resource reallocation. Pattern Recognit. 2024, 145, 109886.
- Zhang, W.; Guo, Y.; Wang, J.; Zhu, J.; Zeng, H. Collaborative Knowledge Distillation. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 7601–7613.
- Zhang, Z.; Liu, T.; Gao, J.; Yang, M.; Luo, W.; Lin, F. TDR-Model: Tomato Disease Recognition Based on Image Dehazing and Improved MobileNetV3 Model. IEEE Access 2024, 13, 852–865.
- Shahriar, T. Comparative Analysis of Lightweight Deep Learning Models for Memory-Constrained Devices. arXiv 2025, arXiv:2505.03303.


















| Parameter | Value |
|---|---|
| Input resolution | |
| Optimizer | Adamax |
| Learning Rate | (Cosine Annealing) |
| Epochs | 100 (maximum, early stopping applied) |
| Batch Size | 128 |
| Pruning Ratio | 30% |
| Temperature (T) | 4.0 |
| Distillation Alpha (α) | 0.3 |
| Weight Decay | |
| Gradient Clipping | 1.0 |
| Loss function | Cross-entropy with label smoothing |
| Category | Specification |
|---|---|
| Workstation | MSI Titan 18 HX |
| CPU | Intel Core Ultra 9 285 HX (2.80 GHz) |
| RAM | 64 GB DDR5 (6400 MT/s) |
| Storage | 4 TB NVMe SSD |
| GPU | NVIDIA RTX 5090 GPU (24 GB VRAM) |
| Display | 18-inch 3840 × 2400 MiniLED, 120 Hz |
| Operating System | Windows 11 |
| Programming Language | Python 3.11.5 |
| Deep Learning Framework | PyTorch 2.1 |
| GPU Acceleration | CUDA 12.0 |
| Metric | FP32 Baseline | FP16 | Pruning (30%) | KD | Final Pipeline Model | Impact vs. Baseline |
|---|---|---|---|---|---|---|
| Accuracy (%) | 81.63 | 81.51 | 74.63 | 80.67 | 80.87 | |
| Latency (ms) | 9.48 | 5.42 | 3.33 | 3.18 | 3.16 | |
| Throughput (img/s) | 143.4 | 144.4 | 176.4 | 478.8 | 500.7 | |
| Memory (MB) | 536.0 | 410.7 | 458.8 | 375.1 | 363.1 | |
| Parameters (M) | 32.53 | 32.53 | 27.08 | 24.66 | 24.66 | |
| Metric | FP32 Baseline | FP16 | Pruning (30%) | KD | Final Pipeline Model | Impact vs. Baseline |
|---|---|---|---|---|---|---|
| Accuracy (%) | 75.37 | 75.01 | 72.15 | 65.61 | 66.29 | |
| Latency (ms) | 6.78 | 3.50 | 3.05 | 2.81 | 3.57 | |
| Throughput (img/s) | 150.1 | 111.3 | 130.5 | 452.3 | 593.1 | |
| Memory (MB) | 107.8 | 82.7 | 37.3 | 61.8 | 72.76 | |
| Parameters (M) | 5.29 | 5.29 | 3.58 | 2.98 | 3.57 | |
| Configuration | Accuracy (%) | Latency (ms) | Throughput (img/s) | Memory (MB) |
|---|---|---|---|---|
| ResNet50 (FP32, without FEM) | 57.56 | 4.71 | 116.6 | 406.5 |
| ResNet50 + FEM | 81.63 | 9.48 | 143.4 | 536.0 |
| Hybrid Compression Pipeline + FEM | 80.87 | 3.16 | 500.7 | 363.1 |
| Method | Backbone | Compression Strategy | Top-1 Acc. | Memory Red. (%) | Latency Red. (%) |
|---|---|---|---|---|---|
| Cugu and Akbas [32] | ResNet50 | Eigenvalue-based Pruning | 55.30 | ∼30 | ∼25 |
| Hou et al. [33] | ResNet50 | PEEL Layer-wise Pruning | 74.80 | ∼45 | – |
| Ahmed Zaid et al. [7] | ResNet50 | Structured N:M Pruning | 56.40 | ∼30 | – |
| Deng et al. [8] | ResNet50 | WideTopo Foresight Pruning | 63.80 | ∼25 | – |
| Zhang et al. [34] | ResNet50 | KD | 69.70 | – | – |
| Proposed FEM | ResNet50 | Hybrid (FEM + FP16 + Pruning + KD) | 80.87 | 32.26 | 66 |
| Method | Backbone | Technique | Top-1 Acc. | Memory Red. (%) | Latency Red. (%) |
|---|---|---|---|---|---|
| Kumar et al. [35] | MobileNetV3 | Lightweight Pruning | 58.50 | – | – |
| Shahriar [36] | MobileNetV3 | Distillation-based Training | 72.54 | – | – |
| Proposed FEM | MobileNetV3 | FEM-integrated Hybrid Compression | 66.29 | 32.5 | 47.3 |
| Seed | Accuracy (%) |
|---|---|
| 42 | 80.42 |
| 133 | 80.53 |
| 999 | 80.52 |
| Mean ± Std | 80.49 ± 0.06 |
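The summary row of the seed table above can be reproduced with Python's standard `statistics` module. The sketch below computes both the sample and the population standard deviation; the sample version (dividing by n − 1) appears to match the reported 0.06, although which estimator was used is an inference and is not stated in the text.

```python
import statistics

# Top-1 accuracy (%) per random seed, taken from the table above.
seed_accuracy = {42: 80.42, 133: 80.53, 999: 80.52}
values = list(seed_accuracy.values())

mean_acc = statistics.mean(values)
std_sample = statistics.stdev(values)       # divides by n - 1
std_population = statistics.pstdev(values)  # divides by n

summary = f"{mean_acc:.2f} ± {std_sample:.2f}"
print(summary)
```

With only three runs the standard deviation is a coarse estimate, so the small spread is best read as an indication of stability and not as a tight confidence bound.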
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Hamza, A.; Tuama, A.; Mohamed Moubark, A. FEM-Based Hybrid Compression Framework with Pipeline Implementation for Efficient Deep Neural Networks on Tiny ImageNet. Big Data Cogn. Comput. 2026, 10, 131. https://doi.org/10.3390/bdcc10050131

