FEM-Based Hybrid Compression Framework with Pipeline Implementation for Efficient Deep Neural Networks on Tiny ImageNet

Hamza, Areej; Tuama, Amel; Mohamed Moubark, Asraf

doi:10.3390/bdcc10050131

by Areej Hamza ¹,*, Amel Tuama ¹,* and Asraf Mohamed Moubark ²

¹ Technical Engineering College for Computer and AI, Northern Technical University, Kirkuk 36001, Iraq

² Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia

* Authors to whom correspondence should be addressed.

Big Data Cogn. Comput. 2026, 10(5), 131; https://doi.org/10.3390/bdcc10050131

Submission received: 8 February 2026 / Revised: 4 April 2026 / Accepted: 17 April 2026 / Published: 22 April 2026

Abstract

The high accuracy achieved by deep learning techniques has made them indispensable in computer vision applications. However, their substantial memory demands and high computational complexity limit their deployment in resource-constrained environments. To address this challenge, this study introduces a Feature Enhancement Module (FEM) as part of a unified hybrid compression framework that combines mixed-precision quantization and structured pruning to improve model efficiency. Experimental results on the Tiny ImageNet dataset using ResNet50 and MobileNetV3 architectures demonstrate the strong adaptability and scalability of the proposed approach. Compared with state-of-the-art compression methods, the proposed FEM-based framework achieves up to 6% improvement in Top-1 accuracy, while reducing memory usage by 32.26% and improving inference speed by 66%. Furthermore, the ablation study demonstrates that incorporating the FEM module leads to up to 24% improvement over the baseline model, highlighting its effectiveness. The results further show that FEM effectively preserves inter-channel feature representation stability even under aggressive compression, making it well suited for real-time processing and practical Artificial Intelligence (AI) applications. By maintaining semantic richness while significantly reducing computational cost, the proposed method bridges the gap between high-performance deep models and lightweight, deployable solutions. Overall, the FEM-based hybrid compression framework establishes a scalable and architecture-independent foundation for sustainable deep learning in resource-limited environments.

Keywords: Feature Enhancement Module; hybrid model compression; deep neural networks; model optimization; compression pipeline

1. Introduction

In recent years, Convolutional Neural Networks (CNNs) have played an important role in Artificial Intelligence (AI) applications such as image processing, object detection, classification, and segmentation. However, CNNs consume substantial computational resources and memory, which poses a major challenge for real-time and embedded deployment on Internet of Things (IoT) devices [1,2].

Although architectures such as ResNet50 [3] and MobileNetV3 [4] have improved inference efficiency, hardware limitations remain. Model compression techniques, including pruning, quantization, and knowledge distillation (KD), have therefore become common in deep learning optimization [1,5]. However, achieving a stable balance between accuracy preservation and computational reduction remains a challenge. Some traditional pruning techniques eliminate critical feature channels, especially on fine-grained datasets such as Tiny ImageNet [6], which consists of 200 object categories. Moreover, these techniques typically operate post-training; they therefore neglect internal feature dependencies and generalize unstably across architectures [5,7,8]. Mixed-precision computation using 16-bit floating point (FP16) and 32-bit floating point (FP32) is widely adopted to reduce Graphics Processing Unit (GPU) memory consumption and improve computational efficiency; however, it can introduce gradient instability, particularly in lightweight architectures such as MobileNetV3 [9,10]. Although hybrid quantization mitigates this issue, it lacks mechanisms for maintaining feature representation under reduced numerical precision [5,11]. Similarly, KD has proven effective for model compression [12,13] and is essential for preserving semantic richness in compressed networks [14,15].

Most existing model compression frameworks rely on pruning, quantization, and distillation techniques that are typically applied independently, which limits their ability to preserve stable feature representations and maintain consistent performance, particularly across heterogeneous architectures and under aggressive compression settings. This limitation motivates the development of a unified hybrid compression framework [12,13].

Unlike traditional pruning, quantization, or distillation methods, which are often applied independently with limited consideration of feature dependencies, the proposed Feature Enhancement Module (FEM) adopts a dual-branch structure to enhance feature representations before compression, ensuring stable feature abstraction and consistent convergence across different architectures [15,16].

This study proposes an FEM integrated within a hybrid compression pipeline that combines FP16 mixed-precision training, structured pruning, and KD [13] to overcome the limitations described above. The FEM is based on a dual-branch structure that enables stable feature abstraction under aggressive optimization [15,16]. Unlike previous strategies, it conditions the feature maps before compression, which helps to minimize accuracy degradation and keep convergence consistent across architectures [13]. The aims of this research are summarized as follows:

Designing a unified FEM-integrated hybrid compression framework that coherently combines FP16, structured pruning, and KD.
Evaluating its adaptability across both high-capacity (ResNet50) and lightweight (MobileNetV3) architectures.
Establishing an efficient paradigm for deploying deep learning models in resource-constrained environments such as embedded vision, mobile AI, and edge intelligence.
Enabling real-time inference for deployment in mobile and edge AI scenarios, including IoT and autonomous systems such as drones.

The main contributions of this study are summarized as follows:

A unified FEM-based hybrid compression framework is proposed, which integrates mixed-precision computation, structured pruning, and KD into a single optimization pipeline.
An FEM is proposed to enhance inter-channel dependencies and retain semantic richness under aggressive compression.
The proposed framework is validated on ResNet50 and MobileNetV3 using the Tiny ImageNet dataset to demonstrate its architecture-agnostic scalability.
Experimental results demonstrate an effective trade-off between accuracy, computational efficiency, and resource utilization: approximately 6% improvement in Top-1 accuracy compared with recent state-of-the-art methods, up to 24% improvement over the baseline model, a 32.26% reduction in memory usage, and approximately 66% reduction in inference latency.

The key novelty of this work lies in integrating a FEM within a unified hybrid compression framework for deep neural networks (DNNs). Unlike conventional model compression approaches that primarily focus on reducing computational cost, the proposed framework enhances feature representation before applying compression techniques. By combining feature enhancement with mixed-precision computation, structured pruning, and KD, the proposed pipeline achieves significant reductions in memory usage and inference latency while preserving classification accuracy on the Tiny ImageNet dataset.

2. Related Work

Several compression techniques suffer from instability and hardware sensitivity, particularly in edge AI and embedded environments. Comprehensive reviews by Li et al. [1] and Dantas et al. [16] highlight that compression should be approached not only as structural reduction but also as a process that ensures representational stability, hardware scalability, and cross-architecture generalization in sustainable AI systems.

Lian et al. [17] proposed a cross-layer importance evaluation approach that achieved a 50% memory reduction on ResNet50 with only small losses of 0.93% and 0.43% in Top-1 and Top-5 accuracy, respectively. Zhao et al. [18] followed a similar direction with the RePaIR pruning strategy, which improved model robustness by integrating initialization topology information and achieved up to a 1.7% accuracy gain with the same sparse pruning mask on Tiny ImageNet. However, the authors of [19] reported that aggressive pruning can eliminate semantically significant filters, which reduces model generalization and convergence stability.

Like pruning, quantization has become a widely adopted technique that reduces computational cost by lowering numerical precision. FPSL reduces Floating Point Operations (FLOPs) by 42.7% while maintaining Top-1 accuracy for ResNet50. Waheed et al. [20] proposed a collaborative pruning and quantization framework that reduced the computation and memory requirements of CNNs, saving up to 91% of model memory; it lowered the parameter bit width and cut the parameter count almost in half with a negligible accuracy drop. Wu et al. [9] varied precision according to layer sensitivity using a layer-wise adaptive quantization scheme, which improved the compression rate by 2.15% and the average bit width by 0.77% on ResNet50 and 0.12% on ResNet20, minimizing memory usage while preserving inference accuracy. On the other hand, aggressive quantization has been observed to amplify gradient instability and diminish expressiveness in lightweight architectures such as MobileNetV3 [21]. These limitations led to the development of feature attention mechanisms and knowledge transfer strategies that stabilize quantized models.

KD has emerged as a powerful approach for compressing DNNs while preserving generalization. Gou et al. [22] presented a comprehensive overview of KD methods for CNNs and Transformers, showing that feature-based and relational distillation strategies enhance task transferability. Hao et al. [23] proposed a one-for-all KD framework, OFA-KD, which significantly improves distillation performance between heterogeneous architectures and outperforms existing KD methods.

As hybrid compression frameworks began to appear, Francy and Singh [24] introduced a joint pruning and quantization system that achieved up to an 89.7% size reduction, a 95% reduction in the number of parameters and MACs, and a 3.8% accuracy gain. Deployed on an edge device, the final compressed model reached a high accuracy of 92.5% and a low inference time of about 20 ms, which validates the strength of compression techniques for real-world edge computing applications. Jiang [25] proposed an adaptive lightweight AI model compression approach designed for efficient real-time deployment on edge devices. Wang and Zhu [15] proposed channel modulus normalization to reinforce inter-channel dependencies and overcome the limited cross-architecture generalization of previous techniques; it improved the classification accuracy of ResNet50 by 0.57%, 3.17%, 1.11%, 2.28%, 0.61%, and 0.14% on CIFAR10, CIFAR100, FaceScrub, Tiny ImageNet, ImageNet (100), and ImageNet (1000), respectively. Xu et al. [8] demonstrated that attention-based normalization improves both stability and accuracy in compressed CNNs. Researchers have thus begun to adopt hybrid compression systems when designing efficient deep learning architectures.

In contrast to existing studies that typically focus on individual compression techniques such as pruning, quantization, or distillation, the proposed framework integrates these approaches within a unified optimization pipeline enhanced by a dedicated FEM. While prior methods mainly aim to reduce computational complexity, the proposed approach emphasizes preserving feature representation quality before compression. This design enables a more favorable balance between model efficiency and predictive accuracy, particularly for resource-constrained deployment scenarios.

3. Theoretical Analysis

This study concentrates on both high-capacity and lightweight CNN architectures, represented by ResNet50 and MobileNetV3, respectively, and discusses the trade-off between their representational power and computational efficiency.

3.1. Convolutional Neural Networks

CNNs are now widely used in image classification and recognition tasks, which places increasing demands on memory bandwidth and activation storage. In this study, CNN architectures are selected mainly because of their high representational ability and sensitivity to compression techniques, which makes them good candidates for assessing the effectiveness of the proposed FEM-based hybrid framework.

3.2. Residual Learning and ResNet50

ResNet50 is a 50-layer deep residual network that is widely used in image classification. It employs bottleneck residual blocks composed of a sequence of 1 × 1, 3 × 3, and 1 × 1 convolutional layers combined with residual skip connections. This design supports efficient feature extraction while keeping high representational capacity, allowing the network to scale in depth without excessive computational cost. ResNet50 performs strongly in feature extraction and classification across application domains [3] and is an excellent choice for high-capacity applications where accuracy is essential. Although ResNet50 achieves high accuracy, it often falls short in latency-critical, real-time edge deployments compared to lightweight models such as MobileNetV3 [4]. Figure 1 shows the overall structure of the ResNet50 model as a hierarchy of residual blocks, where each block consists of 1 × 1, 3 × 3, and 1 × 1 convolutional layers integrated with identity skip connections to enhance feature reuse and gradient flow [3]. In this work, ResNet50 serves as the high-capacity baseline and backbone for testing the robustness and effectiveness of the proposed FEM-based hybrid compression framework under aggressive pruning and quantization.

3.3. MobileNetV3

MobileNetV3 is a lightweight CNN architecture designed to balance recognition accuracy with computational efficiency in embedded vision and mobile systems. It builds on the MobileNetV2 design, which reduces computational complexity while preserving representational power [26]. MobileNetV3 integrates Neural Architecture Search (NAS) and NetAdapt techniques to automatically optimize the network topology under latency and power constraints [4]. Adaptive channel-wise recalibration is employed to enhance feature selectivity and improve information flow across layers. MobileNetV3 has become a widely adopted backbone due to its efficiency-oriented design, and it remains competitive under strict latency and computational constraints when compared with alternative efficient architectures such as EfficientNet [27]. Figure 2 shows a schematic of the MobileNetV3-Large architecture reconstructed from the structural specifications provided in [4]. It depicts the progressive transformation of feature map resolutions through convolutional and bottleneck blocks that integrate depthwise separable convolutions, Squeeze-and-Excitation (SE) modules, and the Hard-Swish (HS) activation function, reflecting the optimized balance between representational accuracy and computational efficiency in mobile deep learning applications. In this study, MobileNetV3 is chosen as the lightweight baseline model to validate the scalability and real-time effectiveness of the proposed FEM-based hybrid compression framework for mobile vision systems on resource-constrained edge devices.

A comparison between ResNet50 and MobileNetV3 architectures is presented in Figure 3, highlighting the differences in their design and computational characteristics.

Figure 3 illustrates the key architectural differences between ResNet50, which relies on deep bottleneck residual blocks for high-capacity feature extraction [3], and MobileNetV3, which is based on inverted residual structures, depthwise separable convolutions, SE modules, and lightweight activation functions [4]. In summary, ResNet50 relies on depth and residual learning to achieve high recognition accuracy, while MobileNetV3 achieves memory efficiency through lightweight architectural components and automated design optimization. This contrast motivates the proposed framework: by addressing memory usage, computational efficiency, and accuracy preservation, it aims to bridge the gap between performance-oriented and resource-aware CNN designs, enabling scalable and efficient deployment.

4. Materials and Methods

4.1. Framework Overview

This research introduces a modular, unified compression framework for DNNs built from three stages: mixed-precision training, structured channel pruning, and KD, which together balance computational efficiency and predictive accuracy. The framework is applied to the two architectures described above, ResNet50 and MobileNetV3, each augmented with a custom FEM to improve feature richness before compression. The design of the proposed hybrid framework is motivated by the complementary strengths of its components. The FEM is introduced at an early stage to enhance feature representation and stabilize inter-channel relationships before applying compression. Mixed-precision training is then employed to reduce computational cost while maintaining numerical stability. Structured channel pruning further eliminates redundant features, and KD is finally applied to recover potential performance degradation. This sequential integration ensures a balance between compression efficiency and model accuracy across different network architectures.

The feature enhancement process (see Figure 4 and Figure 5) is first integrated within the backbone network. The complete framework is illustrated in Figure 6, which highlights the sequential application of all framework stages. The pipeline starts with input preprocessing followed by feature extraction using either ResNet50 or MobileNetV3. Extracted features are passed through the FEM. Subsequently, the model undergoes three optimization stages: FP16 mixed-precision training, structured channel pruning for model size reduction, and KD to preserve accuracy. These stages are applied sequentially, with each component complementing the next to yield a compact yet accurate model suitable for resource-constrained deployment.

4.2. Feature Enhancement Module

This module is positioned directly after the backbone network and before the final classification head, and it contains two distinct Multi-Layer Perceptron (MLP) branches:

Shallow MLP: A lightweight path consisting of a linear layer, Rectified Linear Unit (ReLU) activation, dropout, and an output projection.
Deep MLP: A deeper path with additional layers for modeling more complicated feature relationships.

The outputs from both branches are fused using learnable weights, and the fused feature is then passed through a channel attention mechanism, which consists of a bottleneck MLP followed by a sigmoid activation function that recalibrates the channels. Finally, the recalibrated feature is added to the original feature map to improve feature representation. The FEM introduces only a marginal training overhead, given that it consists of lightweight MLP layers and simple attention operations. This extra cost is small compared to the overall complexity of the backbone and does not have a major impact on training efficiency. Figure 4 shows how the FEM is combined with a ResNet50 backbone, and Figure 5 illustrates how it is applied to the MobileNetV3 architecture. The detailed procedure of the proposed FEM is summarized in Algorithm 1.

Algorithm 1 Feature Enhancement Module (FEM) Procedure

Require: Input feature map $F \in R^{H \times W \times C}$

Ensure: Enhanced feature representation $F_{out}$

1: Compute shallow features: $F_{s} = {MLP}_{s} (F)$
2: Compute deep features: $F_{d} = {MLP}_{d} (F)$
3: Adaptive feature fusion: $F_{f} = w_{1} \cdot F_{s} + w_{2} \cdot F_{d}$
4: Channel attention weights: $A = σ (MLP (GAP (F_{f})))$, where $GAP (\cdot)$ denotes Global Average Pooling.
5: Feature recalibration: $F_{e} = A ⊙ F_{f}$
6: Residual enhancement: $F_{out} = F + F_{e}$
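The six steps of Algorithm 1 can be traced with a small forward pass. The following is a minimal sketch in plain Python, not the authors' implementation: it assumes the FEM receives an already pooled feature vector (so the GAP step reduces to the identity), omits dropout as at inference time, and uses hypothetical names and sizes (`FEM`, `hidden`, `reduction`, fixed fusion weights of 0.5) that do not come from the paper.

```python
import math
import random


def linear(x, weight, bias):
    """Dense layer: weight has shape (out, in), x has length in."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def init_linear(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    scale = 1.0 / math.sqrt(n_in)
    weight = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


def mlp(x, layers):
    """Apply linear layers with ReLU between them (none after the last)."""
    for i, (weight, bias) in enumerate(layers):
        x = linear(x, weight, bias)
        if i < len(layers) - 1:
            x = relu(x)
    return x


class FEM:
    """Hypothetical dual-branch feature enhancement on a pooled feature vector."""

    def __init__(self, channels, hidden, reduction=4, seed=0):
        rng = random.Random(seed)
        # Shallow branch: linear -> ReLU -> output projection.
        self.shallow = [init_linear(channels, hidden, rng),
                        init_linear(hidden, channels, rng)]
        # Deep branch: one additional hidden layer.
        self.deep = [init_linear(channels, hidden, rng),
                     init_linear(hidden, hidden, rng),
                     init_linear(hidden, channels, rng)]
        # Bottleneck MLP used by the channel attention.
        squeezed = channels // reduction
        self.attn = [init_linear(channels, squeezed, rng),
                     init_linear(squeezed, channels, rng)]
        # Fusion weights are learnable in training; fixed here for illustration.
        self.w1, self.w2 = 0.5, 0.5

    def forward(self, f):
        f_s = mlp(f, self.shallow)                                    # step 1
        f_d = mlp(f, self.deep)                                       # step 2
        f_f = [self.w1 * s + self.w2 * d for s, d in zip(f_s, f_d)]   # step 3
        a = sigmoid(mlp(f_f, self.attn))                              # step 4
        f_e = [ai * fi for ai, fi in zip(a, f_f)]                     # step 5
        return [x + e for x, e in zip(f, f_e)]                        # step 6
```

Because of the residual connection in step 6, setting both fusion weights to zero makes the module return its input unchanged, which is a convenient sanity check when wiring such a block into a backbone.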

Figure 4 and Figure 5 show the internal architecture of the FEM integrated with ResNet50 and MobileNetV3, respectively. The outputs of both branches are adaptively fused and then passed through a bottleneck channel attention layer. Finally, the enhanced features are combined with the original input through a residual connection, which boosts the representation while preserving the original information. Each stage of the FEM-based pipeline builds on the result of the previous one, creating a coherent end-to-end compression process that improves model efficiency while maintaining accuracy. The FP32 baseline for ResNet50 is itself integrated with the FEM before the compression stages (FP16, pruning, or KD) are applied, so that all reported improvements reflect optimization effects rather than structural differences between the original model and FEM-enhanced networks.

It is worth noting that the Global Average Pooling (GAP) operation is implicitly incorporated within the backbone architectures (ResNet50 and MobileNetV3), and is therefore not explicitly shown in Figure 4 for simplicity, although it is functionally present in the overall pipeline.

The proposed FEM is integrated after the backbone feature extractor and before the final classification layer. For the ResNet50 architecture, the output feature vector from the backbone has a dimensionality of 2048, which is used as the input to the FEM module.

The FEM module employs a dual-branch architecture to capture complementary feature interactions. The first branch performs lightweight feature transformation, while the second branch models deeper feature relationships through additional fully connected layers. The outputs of both branches are adaptively fused using learnable fusion weights. A channel attention mechanism is then applied to emphasize informative feature channels. Finally, a residual connection combines the enhanced representation with the original feature vector to preserve semantic information while improving feature expressiveness. The introduction of the FEM module slightly increases the number of trainable parameters but provides a richer feature representation prior to compression, which helps preserve model accuracy during the subsequent compression stages. The detailed analysis of the resulting model parameters, computational cost (FLOPs), and memory footprint after integrating the FEM module is reported in Section 5.3.

4.3. Mixed-Precision Training

Automatic Mixed Precision (AMP) is employed during training to reduce GPU memory usage and increase training speed. Most network layers operate in FP16 precision, while FP32 is retained for numerically sensitive components such as batch normalization and loss computation. This configuration follows the approach of Micikevicius et al. (2018) and uses PyTorch’s built-in AMP module for ease of implementation [28].
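The numerical trade-off behind mixed precision can be illustrated without a GPU. The sketch below uses only Python's `struct` module to round values to IEEE 754 half precision; it shows that FP16 halves storage per value and that a very small gradient underflows to zero unless it is scaled before the FP16 cast. Loss scaling is a standard part of the mixed-precision training recipe, but the gradient magnitude and scale factor here are hypothetical values chosen for illustration, not settings reported in the paper.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Storage cost per value: FP16 uses half the bytes of FP32.
bytes_fp16 = struct.calcsize("<e")
bytes_fp32 = struct.calcsize("<f")

small_grad = 1e-8      # hypothetical gradient magnitude
loss_scale = 1024.0    # hypothetical loss-scaling factor

# Cast directly to FP16: the value is below the smallest FP16 subnormal.
unscaled = to_fp16(small_grad)

# Scale first, cast to FP16, then unscale in higher precision.
scaled = to_fp16(small_grad * loss_scale)
recovered = scaled / loss_scale
```

Here `unscaled` is exactly zero, while `recovered` stays within a fraction of a percent of the original value, which is why numerically sensitive operations are kept in FP32.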

4.4. Structured Channel Pruning

This stage eliminates redundant filters to reduce model complexity. Channel importance is estimated using the L1-norm of the convolutional filter weights, and approximately 30% of the least significant filters are removed from each selected convolutional layer within the residual (ResNet50 bottleneck) or inverted residual blocks [29]. The pruning ratio of 30% was chosen as a trade-off between compression and preservation of performance: it yields a significant reduction in model size and computational cost without causing severe accuracy degradation. The corresponding batch normalization layers and subsequent convolutional layers are updated accordingly to maintain architectural consistency. After pruning, the network is fine-tuned for several epochs with the same optimizer and scheduler to recover potential accuracy degradation and stabilize performance, so that computational cost is reduced while most of the model’s capacity is retained.
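The ranking and removal step can be sketched as follows. This is an illustrative plain-Python example under simplifying assumptions, not the authors' code: each filter is a flat list of weights, each input-channel kernel of the following layer is reduced to a single scalar, and the function names are hypothetical.

```python
def l1_norm(filt):
    """Sum of absolute weights of one filter given as a flat list."""
    return sum(abs(w) for w in filt)


def filters_to_prune(filters, ratio=0.3):
    """Return the sorted indices of the lowest-L1-norm filters."""
    n_prune = int(len(filters) * ratio)
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    return sorted(ranked[:n_prune])


def prune_layer(filters, next_layer_filters, ratio=0.3):
    """Drop pruned output filters and the matching input channels downstream.

    next_layer_filters[j][i] is the (scalar, for illustration) weight that
    filter j of the next layer applies to input channel i.
    """
    drop = set(filters_to_prune(filters, ratio))
    kept = [f for i, f in enumerate(filters) if i not in drop]
    next_kept = [[w for i, w in enumerate(f) if i not in drop]
                 for f in next_layer_filters]
    return kept, next_kept


# Ten toy filters whose L1 norms are 0.2 * k for the k values listed.
toy_filters = [[0.1 * k, -0.1 * k] for k in (5, 1, 9, 3, 7, 2, 8, 4, 10, 6)]
toy_next = [list(range(10)), list(range(10, 20))]

pruned_idx = filters_to_prune(toy_filters)
kept_filters, kept_next = prune_layer(toy_filters, toy_next)
```

Removing an output filter also removes one input channel from the next layer, which is why the following convolution (and, in a real network, the batch normalization statistics) must be sliced with the same indices.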

4.5. Knowledge Distillation

The aim here is to compress the network by transferring knowledge from a teacher model to a student model (from large to small) in such a way that the student learns from both the ground truth and the softened outputs of the teacher. The total loss function is defined as shown in Equation (1):

$$L_{total} = (1 - α) L_{CE} (y_{s}, y) + α \cdot T^{2} \cdot KL \left( Softmax \left( \frac{z_{s}}{T} \right) ∥ Softmax \left( \frac{z_{t}}{T} \right) \right) \qquad (1)$$

where:

$L_{CE}$ : cross-entropy loss between the student predictions $(y_{s})$ and the ground-truth labels $(y)$ .
$KL$ : Kullback–Leibler divergence between the softened output distributions of the teacher $(z_{t})$ and the student $(z_{s})$ models.
$T$ : temperature parameter (set to $4.0$ ) that controls the softening of the probability distributions.
$α$ : weighting factor (set to $0.3$ ) that balances the contributions of the soft distillation loss and the hard classification loss.

This formulation enables the student model to generalize better under supervision from the teacher [30].
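As a worked illustration of Equation (1), the loss for a single sample can be computed directly from logits. The sketch below is plain Python with hypothetical function names; it takes the defaults $α = 0.3$ and $T = 4.0$ from the text and passes the arguments to the KL term in the order written in Equation (1). Many implementations place the teacher distribution first, which corresponds to swapping the two arguments of `kl_divergence`.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Hard-label cross-entropy for one sample."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kd_loss(student_logits, teacher_logits, label, alpha=0.3, temperature=4.0):
    """Weighted sum of the hard loss and the temperature-softened KL term."""
    hard = cross_entropy(student_logits, label)
    soft = kl_divergence(softmax(student_logits, temperature),
                         softmax(teacher_logits, temperature))
    return (1.0 - alpha) * hard + alpha * temperature ** 2 * soft
```

When the student and teacher logits coincide, the KL term vanishes and the loss reduces to $(1 - α) L_{CE}$; any disagreement between the softened distributions adds a positive penalty scaled by $α T^{2}$.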

4.6. Pipeline Flow

The proposed training and compression pipeline begins by initializing the backbone network coupled with the FEM. The model is first trained in full FP32 precision to establish a reference for expected performance. Subsequently, FP16 training with AMP is used to decrease the computational cost. Structured pruning then removes redundant parameters while preserving critical features. Finally, KD with soft targets transfers knowledge from the teacher model and further enhances the performance of the compressed model. The complete pipeline of the proposed framework is shown in Figure 6.

To ensure clarity and reproducibility, the overall methodology can be summarized in the following sequential steps:

Initialize the backbone network (ResNet50 or MobileNetV3) and integrate the FEM module.
Train the model using FP32 to establish a performance baseline.
Apply FP16 training using AMP to reduce computational cost.
Perform structured channel pruning by removing low-importance filters based on L1-norm criteria, followed by fine-tuning.
Apply KD to transfer knowledge from the teacher model to the compressed student model.

This step-wise formulation clarifies the implementation procedure and enhances reproducibility. As shown in Figure 6, the training pipeline begins with FP32 training to establish a robust baseline model, followed by FP16 optimization to improve computational efficiency while maintaining numerical stability. Structured channel pruning is applied to reduce model complexity, and the final KD stage transfers knowledge from the full-capacity model to a compressed student network.

The resulting student model represents the final optimized model of the proposed framework after sequential application of all compression stages. No additional operations are performed between the last two blocks in Figure 6. The framework allows flexible integration and independent analysis of each stage.

4.7. Dataset and Preprocessing

Tiny ImageNet-200 [6], originally introduced by the Stanford Vision Lab as part of the CS231n course “Convolutional Neural Networks for Visual Recognition”, was used as the dataset. It consists of 200 object categories with an original resolution of $64 \times 64$ pixels and was used with its predefined training, validation, and test splits. All images were resized to $224 \times 224$ pixels to ensure compatibility with the input layer dimensions of the CNN architectures.

The pipeline of preprocessing includes a series of data augmentation and normalization operations that are designed to enhance the generalization ability of the proposed models, including the following transformations:

Random horizontal flipping
Random rotation ( $\pm 15^{\circ}$ )
Color jittering (brightness, contrast, and saturation variations)
Normalization using the ImageNet mean and standard deviation

Data loading and augmentation were implemented using PyTorch’s DataLoader and torchvision.transforms modules [31] with a batch size of 128. Figure 7 illustrates the complete preprocessing and augmentation pipeline applied to Tiny ImageNet images: resizing, random flipping, rotation, color jittering, normalization, and batching. These transformations ensure consistent input dimensions, increase data diversity, and improve model generalization for both ResNet50 and MobileNetV3 backbones.
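Two of the listed transformations, horizontal flipping and normalization, are simple enough to write out explicitly. The following plain-Python sketch operates on an image stored as nested lists in channel, row, column order with values in [0, 1]. The mean and standard deviation are the commonly used ImageNet statistics, which the paper does not list explicitly, and the flip probability of 0.5 is an assumption; resizing, rotation, and color jittering are omitted for brevity.

```python
import random

# Commonly used ImageNet channel statistics (assumed, not quoted from the paper).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def horizontal_flip(image):
    """Mirror every row of every channel left to right."""
    return [[row[::-1] for row in channel] for channel in image]


def normalize(image, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Subtract the per-channel mean and divide by the per-channel std."""
    return [[[(pixel - m) / s for pixel in row] for row in channel]
            for channel, m, s in zip(image, mean, std)]


def augment(image, rng, flip_prob=0.5):
    """Randomly flip, then normalize (flip_prob is a hypothetical default)."""
    if rng.random() < flip_prob:
        image = horizontal_flip(image)
    return normalize(image)


# A tiny 3-channel image with one row and two columns.
toy_image = [[[0.485, 1.0]], [[0.456, 0.0]], [[0.406, 0.5]]]
toy_output = augment(toy_image, random.Random(0), flip_prob=1.0)
```

A pixel equal to the channel mean maps to zero after normalization, so the flipped toy image has a zero in the second column of its first channel.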

4.8. Training Configuration

All experiments were conducted on the Tiny ImageNet dataset. The models were trained using a batch size of 128 for up to 100 epochs. The Adamax optimizer was employed with an initial learning rate of $5 \times 10^{-5}$ and a weight decay between $5 \times 10^{-4}$ and $10^{-3}$, depending on the compression stage. A cosine annealing learning rate scheduler was applied during training to improve convergence stability.
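The effect of cosine annealing on the learning rate can be seen from its closed form. The sketch below assumes the standard schedule $η_t = η_{min} + \frac{1}{2}(η_{max} - η_{min})(1 + \cos(π t / T_{max}))$ with the initial rate and epoch budget stated above; the minimum rate of zero and the function name are assumptions, since the paper does not report them.

```python
import math


def cosine_annealing_lr(epoch, total_epochs=100, base_lr=5e-5, min_lr=0.0):
    """Learning rate at a given epoch under a single cosine decay cycle."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * cos_term


# Learning rate at the start, midpoint, and end of training.
schedule = [cosine_annealing_lr(e) for e in (0, 50, 100)]
```

The rate starts at the initial value, passes through half of it at the midpoint, and decays smoothly to the minimum at the final epoch.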

The same training setup was used across FP32 training, FP16, structured pruning fine-tuning, and KD to ensure fair comparison between compression techniques. Early stopping was applied during training based on validation accuracy to prevent overfitting, and the best-performing model was selected accordingly. The test set was strictly held out and was not used during training, validation, or model selection. All reported results are based on test set evaluation, while the validation set was used solely for model selection and performance monitoring.

The training objective was based on cross-entropy loss with label smoothing. Data augmentation was applied as described in Section 4.7 to improve model generalization.

To ensure experimental reproducibility, all models were trained using the same hyperparameter configuration. The detailed training settings used throughout the experiments are summarized in Table 1.

4.9. Reproducibility

All training procedures were implemented in PyTorch, with fixed random seeds and deterministic backend settings to ensure reproducibility. The code base is designed to be modular and extensible, facilitating easy adaptation to new datasets and architectures.

5. Experiments and Results

5.1. Experimental Setup and Training Details

The software and hardware setup used for the experiments is summarized in Table 2.

Both ResNet50 and MobileNetV3 models were trained and optimized under identical preprocessing conditions. The Tiny ImageNet dataset was employed, consisting of 200 object categories. All images were resized to $224 \times 224$ pixels and augmented using random horizontal flips, random rotations, and color jittering. Subsequently, the images were normalized using the ImageNet mean and standard deviation.
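The normalization and flip steps can be illustrated without any framework. The sketch below uses the widely used ImageNet channel statistics; the paper does not list the exact values, so they are an assumption here, and the helper names are hypothetical.

```python
# Commonly used ImageNet channel statistics (assumed; not listed in the paper).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


def horizontal_flip(image):
    """Mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


black = normalize_pixel((0, 0, 0))
flipped = horizontal_flip([[1, 2, 3], [4, 5, 6]])
```

In a real pipeline the flip would be applied with some probability per sample, and rotation and color jitter would be added in the same way.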

Model training was performed using a batch size of 128 and an initial learning rate of $5 \times 10^{-5}$, employing the Adamax optimizer. A cosine annealing learning rate scheduler with warm restarts was applied to stabilize convergence, while early stopping with a patience of 15 epochs was used to mitigate overfitting. Each model was trained for a maximum of 100 epochs, and the best-performing checkpoints were selected based on validation accuracy. The test set was strictly held out and was not used during training, validation, or model selection.
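The early stopping rule described above can be sketched as a small helper class. The patience of 15 epochs comes from the text; the class name, the `min_delta` parameter, and the bookkeeping of the best epoch are illustrative assumptions.

```python
class EarlyStopping:
    """Stop training when validation accuracy has not improved for `patience` epochs."""

    def __init__(self, patience=15, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # assumed tolerance; not reported in the paper
        self.best_acc = float("-inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch, val_acc):
        """Record one epoch and return True when training should stop."""
        if val_acc > self.best_acc + self.min_delta:
            self.best_acc = val_acc
            self.best_epoch = epoch  # a real training loop would save the checkpoint here
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```

A training loop would call `step` once per epoch with the validation accuracy and restore the checkpoint from `best_epoch` after stopping, which matches the checkpoint selection described in the text.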

Both ResNet50 and MobileNetV3 pipelines underwent a series of optimization stages, including FP16 quantization, structured pruning, KD, and integration of the FEM, with the objective of achieving an effective balance between model accuracy and computational efficiency. To evaluate stochastic stability, the final hybrid compression pipeline was executed in three independent runs using different random seeds. The reported results correspond to the mean performance across these runs.

5.2. Evaluation Metrics

To quantify the trade-off between recognition quality and computational cost, model performance was evaluated using both accuracy-oriented and efficiency-oriented metrics. The following quantitative indicators were employed:

(1): Top-1 Accuracy ($Acc_{1}$):

$$Acc_{1} = \frac{N_{correct}}{N_{total}} \times 100 \tag{2}$$

where $N_{correct}$ represents the number of correctly classified samples, and $N_{total}$ denotes the total number of test samples.

(2): Memory Reduction (MR%):

$$MR\,(\%) = \frac{M_{FP32} - M_{optimized}}{M_{FP32}} \times 100 \tag{3}$$

MR reflects the percentage reduction in GPU memory consumption compared to the FP32 baseline.

(3): Latency (ms):

Latency (ms) is used to measure the average inference time per image, where lower values indicate faster model execution. It is computed as the time required by the model to process a single input sample.

(4): Compression Ratio (CR):

$$CR = \frac{P_{baseline}}{P_{compressed}} \tag{4}$$

where $P_{baseline}$ and $P_{compressed}$ refer to the total number of parameters before and after optimization, respectively. A higher compression ratio indicates a more compact and efficient model.

The purpose of employing these metrics is to jointly assess recognition accuracy and computational efficiency, ensuring that performance improvements are achieved without incurring excessive resource overhead.
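The four metrics above translate directly into code. The following sketch implements Equations (2) to (4) and a simple wall-clock latency measurement, then applies them to the ResNet50 + FEM values reported in Table 3. The function names and the `warmup` parameter are illustrative assumptions; on a GPU, the device would also need to be synchronized before reading the timer.

```python
import time


def top1_accuracy(n_correct, n_total):
    """Equation (2): percentage of correctly classified samples."""
    return n_correct / n_total * 100.0


def memory_reduction(m_fp32, m_optimized):
    """Equation (3): percentage memory reduction relative to the FP32 baseline."""
    return (m_fp32 - m_optimized) / m_fp32 * 100.0


def compression_ratio(p_baseline, p_compressed):
    """Equation (4): ratio of parameter counts before and after optimization."""
    return p_baseline / p_compressed


def mean_latency_ms(predict, samples, warmup=2):
    """Average wall-clock inference time per sample, in milliseconds."""
    for sample in samples[:warmup]:
        predict(sample)  # warm-up calls are not timed
    start = time.perf_counter()
    for sample in samples:
        predict(sample)
    return (time.perf_counter() - start) / len(samples) * 1000.0


# Worked example with the ResNet50 + FEM values from Table 3.
mr = memory_reduction(536.0, 363.1)   # FP32 baseline vs. final pipeline, in MB
cr = compression_ratio(32.53, 24.66)  # parameters in millions
print(f"MR = {mr:.2f}%, CR = {cr:.2f}")  # MR = 32.26%, CR = 1.32
```

The computed memory reduction reproduces the 32.26% reported for the final pipeline model.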

5.3. Results for ResNet50 + FEM

All results obtained from the ResNet50 architecture augmented with the FEM across the stages of the hybrid compression framework, including FP16, structured pruning (30%), and KD, are summarized in Table 3.

The reported results correspond to the best-performing configurations obtained after multiple experiments with varied hyperparameters, chosen to balance recognition accuracy and computational efficiency across the compression pipeline. FP32 refers to the baseline ResNet50 + FEM model before any compression is applied, FP16 to training with mixed precision, pruning to a 30% structured pruning ratio, and distillation to the student model trained with KD. The final pipeline is the fully optimized ResNet50 + FEM configuration. The transition from FP32 to FP16 achieves a 23.4% reduction in memory consumption (from 536.0 MB to 410.7 MB) and a 42.8% improvement in inference latency, while maintaining nearly identical accuracy ($-0.12\%$). This confirms that mixed-precision optimization effectively enhances computational efficiency with minimal representational degradation.

Structured pruning further reduced latency from 5.42 ms to 3.33 ms, corresponding to a 38.6% improvement. This stage reduced the total number of parameters by approximately 16.7%. As expected, a corresponding accuracy degradation of 6.88% was observed, which can be attributed to the removal of less critical convolutional filters.
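To make the idea of removing less critical filters concrete, the sketch below ranks the filters of one layer by L1 norm and keeps the largest 70%. The L1-norm criterion is an assumption chosen for illustration; this section does not state which importance score the authors used, and the function name is hypothetical.

```python
def select_filters_to_keep(filters, prune_ratio=0.30):
    """Return the sorted indices of filters kept after L1-norm ranking.

    `filters` holds one flattened weight list per output channel.
    The L1-norm criterion is an illustrative assumption.
    """
    l1_norms = [sum(abs(w) for w in f) for f in filters]
    n_prune = int(round(len(filters) * prune_ratio))
    # Rank filters from smallest to largest norm; the first n_prune are removed.
    ranked = sorted(range(len(filters)), key=lambda i: l1_norms[i])
    return sorted(ranked[n_prune:])


# Ten toy filters whose norms grow with the index.
toy_filters = [[0.1 * (i + 1), -0.1 * (i + 1)] for i in range(10)]
kept = select_filters_to_keep(toy_filters)
```

Because whole filters are removed, the pruned layer is a smaller dense layer, which is what allows the latency reduction to be realized on standard hardware.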

The KD stage effectively compensates for this accuracy loss by restoring accuracy to 80.67%, and the final pipeline model reaches 80.87% with a 32.26% reduction in memory usage compared to the FP32 baseline (Table 3). This demonstrates that the proposed FEM integration preserves feature discrimination even under aggressive model compression.
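A common formulation of the distillation objective is sketched below for a single sample. The temperature of 4.0 and the alpha of 0.3 are taken from Table 1; the assignment of alpha to the hard-label term, the $T^2$ scaling of the soft term, and the use of plain cross-entropy for the hard term are assumptions based on the standard formulation, not details confirmed in this section.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, target, temperature=4.0, alpha=0.3):
    """alpha * hard-label cross-entropy + (1 - alpha) * T^2 * KL(teacher || student).

    Which term alpha weights is an assumption; the paper only reports its value.
    """
    hard_loss = -math.log(softmax(student_logits)[target])
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft_loss = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0.0)
    return alpha * hard_loss + (1.0 - alpha) * temperature ** 2 * soft_loss
```

When the student reproduces the teacher logits exactly, the soft term vanishes and only the weighted hard-label loss remains.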

Compared with conventional ResNet50 compression techniques, the proposed FEM-based framework achieves superior computational efficiency while preserving recognition accuracy. These results highlight the effectiveness of the FEM in preserving feature representation and maintaining high accuracy while significantly improving computational efficiency.

The training and validation curves of the ResNet50 + FEM model shown in Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12 illustrate smoother convergence trends, indicating faster optimization and reduced overfitting compared to the original ResNet50 baseline. The number of training epochs differs between stages because early stopping based on validation performance was applied: each model was trained until convergence rather than for a fixed number of epochs, in order to avoid overfitting.

5.4. Results for MobileNetV3 + FEM

Experiments were conducted using the MobileNetV3 + FEM configuration under the Tiny ImageNet dataset and training conditions described previously. FP16, structured pruning (30%), and KD were independently applied to evaluate their impact on recognition accuracy, computational efficiency, and memory utilization. The experimental results are summarized in Table 4.

When applied to the MobileNetV3 backbone, the proposed FEM-based hybrid compression framework emphasizes computational efficiency and lower resource consumption, with only a minimal loss of accuracy compared to heavier architectures. This trade-off is expected and acceptable in scenarios where efficiency, latency, and memory constraints are critical. The transition from FP32 to FP16 brought a significant reduction in memory usage (23.3%) and inference latency (48.4%) while keeping classification accuracy nearly unchanged (a relative change of −0.48%), which indicates improved hardware efficiency without a significant change in the feature representation. Structured pruning with a ratio of 30% further improved runtime performance, reducing latency by 55.0% relative to the FP32 baseline (from 6.78 ms to 3.05 ms), with throughput slightly below the baseline (130.5 vs. 150.1 img/s in Table 4). Although pruning lowered accuracy by 3.22 percentage points relative to the baseline (from 75.37% to 72.15%), this degradation is acceptable given the overall gain in efficiency.

Finally, after the KD stage, the inference time was 2.81 ms, the throughput was 452.3 images per second, and the Top-1 accuracy was 65.61%. These results confirm that lightweight architectures such as MobileNetV3 can effectively benefit from the teacher–student paradigm without requiring extensive structural redundancy. The scalability and convergence behavior across lightweight architectures is illustrated by the training and validation curves in Figure 13, Figure 14, Figure 15, Figure 16 and Figure 17, which exhibit stable convergence with reduced overfitting tendencies, reinforcing the framework’s adaptability to compact neural models.

Compared to existing lightweight compression approaches, the proposed FEM-based framework enables MobileNetV3 to achieve superior efficiency with minimal degradation in recognition accuracy. These findings further confirm that the proposed FEM-based framework is effective for lightweight architectures, achieving a favorable balance between efficiency and accuracy.

Regarding training convergence and generalization, both architectures exhibited smooth and stable convergence across all optimization phases. The FP16 mixed-precision configuration accelerated convergence while preserving accuracy and generalization. In contrast, the structured pruning phase introduced minor loss oscillations, which rapidly stabilized, indicating the model’s capacity for adaptive recalibration. Furthermore, KD narrowed the gap between training and validation curves, confirming effective teacher–student knowledge transfer and strong feature preservation under compression.

Comparative Analysis Between Architectures

The comparison between ResNet50 + FEM and MobileNetV3 + FEM demonstrates the architectural flexibility of the proposed FEM-based hybrid compression framework for backbone networks with different computational capacities. To guarantee a fair and methodologically sound evaluation, both models were subjected to the same optimization stages, which included FP16 mixed-precision training, structured pruning, and KD. As expected, ResNet50 + FEM maintained higher Top-1 accuracy at all compression stages, which can be explained by its deeper residual structure and stronger representational ability. In contrast, MobileNetV3 + FEM proved more computationally efficient, with a significant reduction in memory consumption and inference latency and a moderate decrease in accuracy. These characteristics make MobileNetV3 + FEM especially suitable for resource-constrained and real-time edge deployment scenarios. Overall, the obtained results suggest that the proposed FEM-based framework is architecture-independent: it provides stable accuracy when coupled with high-capacity models and sufficient efficiency when coupled with lightweight architectures. This behavior indicates the robustness and generalization of the FEM module across backbone designs, rather than the superiority of one architecture over another.

Figure 18 illustrates the comparative trends of Top-1 accuracy for both architectures across the entire compression pipeline.

5.5. Ablation Analysis of the FEM Module

To evaluate the contribution of the proposed FEM, an ablation analysis was conducted by comparing three configurations: the baseline CNN model, the FEM-enhanced model, and the final hybrid compression pipeline. All experiments were conducted under identical training settings, and the reported results correspond to the mean and standard deviation obtained across three independent runs to ensure statistical reliability. The baseline model represents the original FP32 network without the FEM module.

As shown in Table 5, the baseline model achieves a Top-1 accuracy of 57.56% on the Tiny ImageNet dataset, with a latency of 4.71 ms and memory consumption of 406.5 MB. After integrating the proposed FEM module into the backbone architecture, the classification accuracy increases to 81.63%. This improvement indicates that the FEM module substantially enhances feature representation and improves the discriminative capability of the network prior to compression. However, this enhancement introduces additional computational cost and memory usage due to the extra feature processing layers.

To address this overhead, the proposed hybrid compression pipeline is applied, which includes mixed-precision training, structured pruning, and KD. The compressed FEM-enhanced model achieves 80.87% Top-1 accuracy while reducing memory usage to 363.1 MB and improving inference latency to 3.16 ms. This indicates that the compression pipeline effectively reduces computational complexity while preserving most of the accuracy gained by the FEM module. Overall, the ablation results demonstrate that the FEM module plays a key role in improving feature representation and classification accuracy, while the hybrid compression pipeline effectively reduces computational and memory requirements with minimal accuracy degradation.

5.6. Comparative Analysis with State of the Art Methods

Prior ResNet50-based compression studies have demonstrated notable improvements in computational efficiency; however, many of these approaches exhibit accuracy degradation or limited scalability when applied to deeper architectures. For instance, Akbaş et al. [32] reported an eigenvalue-based pruning strategy achieving 55.3% Top-1 accuracy on ResNet50 with a 30% pruning ratio, while Hou et al. [33] introduced the PEEL layer-wise pruning method, maintaining 74.8% accuracy with an estimated 50% reduction in FLOPs. Similarly, Ahmed Zaid et al. [7] proposed a structured pruning approach achieving 56.4% accuracy, while Deng et al. [8] presented the WideTopo foresight pruning framework, reporting 63.8% accuracy at 25% density. Distillation-based optimization was explored by Zhang et al. [34], who achieved 69.7% accuracy on Tiny ImageNet with improved convergence stability.

The proposed model achieves a Top-1 accuracy of 80.87% after applying the hybrid compression pipeline, as shown in Table 6, while delivering substantial reductions in memory consumption and inference latency. These results suggest that the integration of the FEM strengthens representational capacity and feature stability, particularly under aggressive compression. The MobileNetV3 results in Table 7 demonstrate the framework’s scalability to lightweight architectures under identical training and evaluation conditions. No prior work has reported Tiny ImageNet results for a unified hybrid pipeline that jointly integrates feature enhancement, mixed-precision computation, structured pruning, and KD, which positions the proposed method as a strong and up-to-date approach.

Due to differences in experimental settings and hardware platforms, the comparison is indicative rather than a strictly controlled side-by-side evaluation. Hardware specifications can strongly affect latency and memory results. Accordingly, latency and memory reduction values are reported for reference only, while Top-1 accuracy, which is hardware-independent, is used as the main metric for comparative analysis. To ensure a fair comparison with previously published methods (Section 5.5), all evaluated approaches were compared under comparable experimental conditions. Specifically, the Tiny ImageNet dataset was used with the same input resolution and evaluation protocol. The reported results correspond to the Top-1 classification accuracy evaluated on the test set. Results of prior methods were obtained from their respective publications using the same dataset and evaluation metrics, ensuring a fair and consistent comparison with the proposed framework.

5.7. Stochastic Stability Verification

Deep neural network training is inherently stochastic due to random initialization, data shuffling, and hardware-level parallelism. To evaluate the robustness of the proposed framework against such stochastic effects, we conducted three independent runs of the final hybrid compression pipeline using different fixed random seeds (42, 133, and 999). All experiments were executed under identical training configurations and evaluation protocols. The obtained accuracies were 80.42%, 80.53%, and 80.52%, respectively. The mean accuracy across the three runs was 80.49% with a standard deviation of ±0.06, indicating extremely low performance variance across different random initializations. This confirms that the proposed FEM-based hybrid compression framework maintains stable optimization behavior and is not sensitive to stochastic training variations. The results of the stochastic stability evaluation are summarized in Table 8.

The reported mean and standard deviation quantify the statistical stability of the proposed framework and demonstrate that the obtained results are consistent across different random initializations.
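The reported statistics can be reproduced from the three per-seed accuracies with the Python standard library. The sketch below is only a check of the arithmetic; it shows that the reported ±0.06 corresponds to the sample standard deviation (denominator $n - 1$).

```python
import statistics

# Top-1 accuracy (%) of the final pipeline for seeds 42, 133, and 999.
accuracies = [80.42, 80.53, 80.52]

mean_acc = statistics.mean(accuracies)
sample_std = statistics.stdev(accuracies)       # n - 1 denominator
population_std = statistics.pstdev(accuracies)  # n denominator

print(f"mean = {mean_acc:.2f}, sample std = {sample_std:.2f}")  # mean = 80.49, sample std = 0.06
```

With only three runs, the sample standard deviation is the more conservative of the two estimates.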

5.8. Summary of Experimental Findings

The experimental results demonstrate that the proposed FEM-based hybrid compression framework balances recognition accuracy and computational efficiency effectively across diverse neural architectures. Through systematic optimization, including mixed-precision computation, structured pruning, and KD, the proposed framework consistently reduces memory consumption and inference latency while maintaining competitive recognition performance. Under unified experimental conditions, the 81.63% Top-1 accuracy for ResNet50 + FEM and 75.37% for MobileNetV3 + FEM indicate strong scalability, robustness, and adaptability across architectures with varying computational capacities.

6. Discussion

The results show that the proposed FEM-based hybrid compression framework is effective across different network architectures and optimization stages. Integrating the FEM into both ResNet50 and MobileNetV3 improved feature selection and inter-channel representation stability, which enables the networks to retain semantic richness under high compression and mixed-precision constraints.

The FEM boosted inter-channel communication and discriminative feature learning, leading to significant performance improvements. In particular, the ablation analysis shows an improvement of approximately 24% in Top-1 accuracy compared to the baseline CNN model, while comparisons with existing compression methods indicate improvements of up to 6% on the Tiny ImageNet dataset.
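The two headline gains can be checked against the reported tables with a few lines of arithmetic. The sketch below uses the accuracies from Table 5 and Table 6; it suggests that the 24% and 6% figures are differences in percentage points rather than relative changes. The function names are illustrative.

```python
def point_gain(new_acc, ref_acc):
    """Difference between two accuracies, in percentage points."""
    return new_acc - ref_acc


def relative_gain(new_acc, ref_acc):
    """Relative change of new_acc with respect to ref_acc, in percent."""
    return (new_acc - ref_acc) / ref_acc * 100.0


# Table 5: ResNet50 + FEM (81.63%) vs. ResNet50 without FEM (57.56%).
fem_points = point_gain(81.63, 57.56)
fem_relative = relative_gain(81.63, 57.56)

# Table 6: proposed pipeline (80.87%) vs. the strongest prior result, Hou et al. [33] (74.80%).
sota_points = point_gain(80.87, 74.80)

print(f"{fem_points:.2f} points, {fem_relative:.1f}% relative, {sota_points:.2f} points")
# 24.07 points, 41.8% relative, 6.07 points
```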

Furthermore, the FEM-based approach achieved higher efficiency, reducing memory consumption by 32.26% and inference latency by approximately 66%, as reported in Table 3. These results demonstrate that the effectiveness of the proposed model originates not only from parameter compression, but also from integrating the FEM within the hybrid optimization pipeline.

7. Conclusions

This study proposes a unified hybrid compression framework that combines the FEM with mixed-precision training, structured channel pruning, and KD. By enhancing feature stability and inter-channel robustness, the proposed approach is effective for both high-capacity (ResNet50) and lightweight (MobileNetV3) architectures. Experimental results on the Tiny ImageNet dataset indicate that the framework achieves up to 6% improvement in Top-1 accuracy compared with existing compression methods, and up to 24% improvement compared to the baseline CNN model, while reducing memory usage by 32.26% and achieving approximately 66% faster inference. These findings are consistent with the experimental results and confirm the effectiveness of the proposed FEM-based framework in balancing accuracy and computational efficiency. Thus, the FEM can be considered an efficient technique for preserving semantic richness and feature detection under high compression, and it is adaptable to both high-capacity and lightweight convolutional networks.

Future work will focus on evaluating the proposed FEM-based framework on larger and more complex datasets, as well as extending it to transformer-based or hybrid CNN–ViT architectures.

Author Contributions

Conceptualization, A.T.; methodology, A.H. and A.T.; software, A.H.; validation, A.H. and A.T.; formal analysis, A.H. and A.T.; investigation, A.H.; resources, A.H.; data curation, A.H.; writing—original draft preparation, A.H.; writing—review and editing, A.T. and A.M.M.; visualization, A.H.; supervision, A.T.; project administration, A.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in this study (Tiny ImageNet) is publicly available from the official Stanford CS231N repository at http://cs231n.stanford.edu/tiny-imagenet-200.zip (accessed on 10 January 2026). This study did not involve the creation of new data.

Acknowledgments

The authors would like to acknowledge the support provided by their respective institutions. The authors also thank the reviewers for their valuable comments and suggestions that helped improve the quality of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
AMP	Automatic Mixed Precision
CNNs	Convolutional Neural Networks
CR	Compression Ratio
CUDA	Compute Unified Device Architecture
DNNs	Deep Neural Networks
FEM	Feature Enhancement Module
FLOPs	Floating Point Operations
FP16	16-bit Floating Point
FP32	32-bit Floating Point
GAP	Global Average Pooling
GPU	Graphics Processing Unit
IoT	Internet of Things
KD	Knowledge Distillation
MLP	Multi-Layer Perceptron
MR	Memory Reduction
NAS	Neural Architecture Search
RAM	Random Access Memory
ReLU	Rectified Linear Unit
SE	Squeeze-and-Excitation
VRAM	Video Random Access Memory

References

1. Li, Z.; Li, H.; Meng, L. Model compression for deep neural networks: A survey. Computers 2023, 12, 60.
2. Lee, H.; Lee, N.; Lee, S. A method of deep learning model optimization for image classification on edge device. Sensors 2022, 22, 7344.
3. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2016; pp. 770–778.
4. Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for mobilenetv3. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2019; pp. 1314–1324.
5. Li, M.; Huang, Z.; Chen, L.; Ren, J.; Jiang, M.; Li, F.; Fu, J.; Gao, C. Contemporary advances in neural network quantization: A survey. In Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 30 June–5 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–10.
6. Lab, S.V. Tiny ImageNet Visual Recognition Challenge. 2015. Available online: http://cs231n.stanford.edu/tiny-imagenet-200.zip (accessed on 10 January 2026).
7. Ahmed Zaid, D.; Djamaa, B.; Benatia, M.A. Efficient and dynamic layer-wise structured N:M pruning of deep neural networks. Neurocomputing 2025, 653, 131090.
8. Deng, C.; Cheng, J.; Su, Y.; An, Z.; Yang, Z.; Xia, Z.; Zhang, Y.; Wang, S. WideTopo: Improving foresight neural network pruning through training dynamics preservation and wide topologies exploration. Neural Netw. 2025, 194, 108136.
9. Wu, D.; Wang, Y.; Fei, Y.; Gao, G. A Novel Mixed-Precision Quantization Approach for CNNs. IEEE Access 2025, 13, 49309–49319.
10. Rakka, M.; Fouda, M.E.; Khargonekar, P.; Kurdahi, F. A review of state-of-the-art mixed-precision neural network frameworks. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 7793–7812.
11. Zhang, R.; Jiang, H.; Wang, W.; Liu, J. Optimization Methods, Challenges, and Opportunities for Edge Inference: A Comprehensive Survey. Electronics 2025, 14, 1345.
12. Rokh, B.; Azarpeyvand, A.; Khanteymoori, A. A comprehensive survey on model quantization for deep neural networks in image classification. ACM Trans. Intell. Syst. Technol. 2023, 14, 1–50.
13. Kim, G.I.; Hwang, S.; Jang, B. Efficient compressing and tuning methods for large language models: A systematic literature review. ACM Comput. Surv. 2025, 57, 1–39.
14. Tmamna, J.; Ayed, E.B.; Fourati, R.; Gogate, M.; Arslan, T.; Hussain, A.; Ayed, M.B. Pruning deep neural networks for green energy-efficient models: A survey. Cogn. Comput. 2024, 16, 2931–2952.
15. Wang, S.; Zhu, Q. Channel modulus normalization for CNN image classification. Multimed. Syst. 2024, 30, 305.
16. Dantas, P.V.; Da Silva, W.S.; Cordeiro, L.C.; Carvalho, C.B. A comprehensive review of model compression techniques in machine learning. Appl. Intell. 2024, 54, 11804–11844.
17. Lian, Y.; Peng, P.; Jiang, K.; Xu, W. Cross-layer importance evaluation for neural network pruning. Neural Netw. 2024, 179, 106496.
18. Zhao, H.; Guan, R.; Man, K.L.; Yu, L.; Yue, Y. RePaIR: Repaired pruning at initialization resilience. Neural Netw. 2025, 184, 107086.
19. Mondal, M.; Das, B.; Lall, B.; Singh, P.; Roy, S.D.; Joshi, S.D. Feature independent filter pruning by successive layers analysis. Comput. Vis. Image Underst. 2023, 236, 103828.
20. Waheed, Z.; Khalid, S.; Riaz, S.M.; Khawaja, S.G.; Tariq, R. Resource-Restricted Environments Based Memory-Efficient Compressed Convolutional Neural Network Model for Image-Level Object Classification. IEEE Access 2022, 11, 1386–1406.
21. Xu, Y.; Khan, T.M.; Song, Y.; Meijering, E. Edge deep learning in computer vision and medical diagnostics: A comprehensive survey. Artif. Intell. Rev. 2025, 58, 93.
22. Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge Distillation: A Survey. Int. J. Comput. Vis. 2021, 129, 1789–1819.
23. Hao, Z.; Guo, J.; Han, K.; Tang, Y.; Hu, H.; Wang, Y.; Xu, C. One-for-all: Bridge the gap between heterogeneous architectures in knowledge distillation. Adv. Neural Inf. Process. Syst. 2023, 36, 79570–79582.
24. Francy, S.; Singh, R. Edge ai: Evaluation of model compression techniques for convolutional neural networks. arXiv 2024, arXiv:2409.02134.
25. Jiang, C.; Hou, M.; Wang, H. An Adaptive Compression Method for Lightweight AI Models of Edge Nodes in Customized Production. Sensors 2026, 26, 383.
26. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 4510–4520.
27. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
28. Micikevicius, P.; Narang, S.; Alben, J.; Diamos, G.; Elsen, E.; Garcia, D.; Ginsburg, B.; Houston, M.; Kuchaiev, O.; Venkatesh, G.; et al. Mixed precision training. arXiv 2017, arXiv:1710.03740.
29. He, Y.; Zhang, X.; Sun, J. Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2017; pp. 1389–1397.
30. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531.
31. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32.
32. Cugu, I.; Akbas, E. A Deeper Look into Convolutions via Eigenvalue-based Pruning. arXiv 2021, arXiv:2102.02804.
33. Hou, Y.; Ma, Z.; Liu, C.; Wang, Z.; Loy, C.C. Network pruning via resource reallocation. Pattern Recognit. 2024, 145, 109886.
34. Zhang, W.; Guo, Y.; Wang, J.; Zhu, J.; Zeng, H. Collaborative Knowledge Distillation. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 7601–7613.
35. Zhang, Z.; Liu, T.; Gao, J.; Yang, M.; Luo, W.; Lin, F. TDR-Model: Tomato Disease Recognition Based on Image Dehazing and Improved MobileNetV3 Model. IEEE Access 2024, 13, 852–865.
36. Shahriar, T. Comparative Analysis of Lightweight Deep Learning Models for Memory-Constrained Devices. arXiv 2025, arXiv:2505.03303.

Figure 1. Schematic representation of the ResNet50 architecture (adapted from [3]).

Figure 2. Schematic representation of the MobileNetV3-Large architecture, adapted for the Tiny ImageNet dataset (200 classes), based on the original design proposed by Howard et al. [4].

Figure 3. Architectural comparison between ResNet50 and MobileNetV3, illustrating key differences in network design, computational complexity, and efficiency-oriented architectural components.

Figure 4. Structure of FEM with ResNet50.

Figure 5. Structure of FEM with MobileNetV3.

Figure 6. Sequential training and compression stages of the proposed framework.

Figure 7. Data Preprocessing and Augmentation Pipeline for Tiny ImageNet.

Figure 8. Training and validation accuracy (left) and loss (right) curves of ResNet50 + FEM (FP32 baseline).

Figure 9. Training and validation accuracy (left) and loss (right) curves of ResNet50 + FEM (FP16 mixed precision).

Figure 10. Training and validation accuracy (left) and loss (right) curves of ResNet50 + FEM after structured pruning and fine-tuning.

Figure 11. Training and validation accuracy (left) and loss (right) curves for ResNet50 + FEM during KD in the student–teacher configuration, illustrating the convergence behavior and performance of the student model.

Figure 12. Training and validation accuracy (left) and loss (right) curves for ResNet50 + FEM final pipeline, illustrating the convergence behavior and performance of the optimized student model.

Figure 13. Training and validation accuracy (left) and loss (right) curves of MobileNetV3 + FEM (FP32 baseline).

Figure 14. Training and validation accuracy (left) and loss (right) curves of MobileNetV3 + FEM (FP16 mixed precision).

Figure 15. Training and validation accuracy (left) and loss (right) curves of MobileNetV3 + FEM after structured pruning and fine-tuning.

Figure 16. Training and validation accuracy (left) and loss (right) curves of MobileNetV3 + FEM during KD (student–teacher configuration).

Figure 17. Training and validation accuracy (left) and loss (right) curves of the MobileNetV3 + FEM pipeline across the compression stages.

Figure 18. Top-1 accuracy trends of ResNet50 + FEM and MobileNetV3 + FEM across compression stages, showing the adaptability of the proposed FEM-based framework to different backbone architectures.

Table 1. Training Hyperparameters Configuration.

Parameter	Value
Input resolution	$224 \times 224$
Optimizer	Adamax
Learning Rate	$5 \times 10^{- 5}$ (Cosine Annealing)
Epochs	100 (maximum, early stopping applied)
Batch Size	128
Pruning Ratio	30%
Temperature (T)	4.0
Distillation Alpha ( $α$ )	0.3
Weight Decay	$5 \times 10^{- 4}$
Gradient Clipping	1.0
Loss function	Cross-entropy with label smoothing

Table 2. Hardware and Software Configuration.

Category	Specification
Workstation	MSI Titan 18 HX
CPU	Intel Core Ultra 9 285 HX (2.80 GHz)
RAM	64 GB DDR5 (6400 MT/s)
Storage	4 TB NVMe SSD
GPU	NVIDIA RTX 5090 GPU (24 GB VRAM)
Display	18-inch 3840 × 2400 MiniLED, 120 Hz
Operating System	Windows 11
Programming Language	Python 3.11.5
Deep Learning Framework	PyTorch 2.1
GPU Acceleration	CUDA 12.0

Table 3. Experimental results for ResNet50 + FEM across the hybrid compression pipeline.

Metric	FP32 Baseline	FP16	Pruning (30%)	KD	Final Pipeline Model	Impact vs. Baseline
Accuracy (%)	81.63	81.51	74.63	80.67	80.87	$-0.93\%$
Latency (ms)	9.48	5.42	3.33	3.18	3.16	$-66.7\%$
Throughput (img/s)	143.4	144.4	176.4	478.8	500.7	$+249\%$
Memory (MB)	536.0	410.7	458.8	375.1	363.1	$-32.26\%$
Parameters (M)	32.53	32.53	27.08	24.66	24.66	$-24.2\%$

Table 4. Experimental results for MobileNetV3 + FEM across the hybrid compression pipeline.

Metric	FP32 Baseline	FP16	Pruning (30%)	KD	Final Pipeline Model	Impact vs. Baseline
Accuracy (%)	75.37	75.01	72.15	65.61	66.29	$-12.05\%$
Latency (ms)	6.78	3.50	3.05	2.81	3.57	$-47.3\%$
Throughput (img/s)	150.1	111.3	130.5	452.3	593.1	$+295\%$
Memory (MB)	107.8	82.7	37.3	61.8	72.76	$-32.50\%$
Parameters (M)	5.29	5.29	3.58	2.98	3.57	$-32.5\%$

Table 5. Ablation analysis of the proposed FEM module and hybrid compression pipeline.

Configuration	Accuracy (%)	Latency (ms)	Throughput (img/s)	Memory (MB)
ResNet50 (FP32, without FEM)	57.56	4.71	116.6	406.5
ResNet50 + FEM	81.63	9.48	143.4	536.0
Hybrid Compression Pipeline + FEM	80.87	3.16	500.7	363.1

Table 6. Indicative comparison between the proposed FEM-based framework and previously reported ResNet50-based methods on Tiny ImageNet.

Method	Backbone	Compression Strategy	Top-1 Acc. (%)	Memory Red. (%)	Latency Red. (%)
Akbaş et al. [32]	ResNet50	Eigenvalue-based Pruning	55.30	∼30	∼25
Hou et al. [33]	ResNet50	PEEL Layer-wise Pruning	74.80	∼45	–
Ahmed Zaid et al. [7]	ResNet50	Structured N:M Pruning	56.40	∼30	–
Deng et al. [8]	ResNet50	WideTopo Foresight Pruning	63.80	∼25	–
Zhang et al. [34]	ResNet50	KD	69.70	–	–
Proposed FEM	ResNet50	Hybrid (FEM + FP16 + Pruning + KD)	80.87	32.26	66.7

Note: All results reported for the proposed method were obtained on the held-out test set; the validation set was used only during training, for model selection and hyperparameter tuning. Results for the compared methods are taken from the original publications, which may employ different validation protocols. The comparison should therefore be interpreted as indicative rather than strictly controlled.

Table 7. Indicative comparison of MobileNet-based methods on Tiny ImageNet.

Method	Backbone	Technique	Top-1 Acc. (%)	Memory Red. (%)	Latency Red. (%)
Kumar et al. [35]	MobileNetV3	Lightweight Pruning	58.50	–	–
Shahriar et al. [36]	MobileNetV3	Distillation-based Training	72.54	–	–
Proposed FEM	MobileNetV3	FEM-integrated Hybrid Compression	66.29	32.5	47.3

Note: All results reported for the proposed method were obtained on the held-out test set; the validation set was used only during training, for model selection and hyperparameter tuning. Results for the compared methods are taken from the original publications, which may employ different validation protocols. The comparison should therefore be interpreted as indicative rather than strictly controlled.

Table 8. Stochastic Stability Evaluation of the Final Hybrid Pipeline (ResNet50 + FEM).

Seed	Accuracy (%)
42	80.42
133	80.53
999	80.52
Mean ± Std	80.49 ± 0.06
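
The summary row of Table 8 can be checked directly from the three per-seed accuracies. The sketch below uses Python's standard `statistics` module. Which estimator the authors used is not stated, so both the sample and population forms are computed, and the variable names are illustrative.

```python
import statistics

# Per-seed Top-1 accuracy (%) copied from Table 8
seed_accuracy = {42: 80.42, 133: 80.53, 999: 80.52}
values = list(seed_accuracy.values())

mean = statistics.mean(values)
sample_std = statistics.stdev(values)       # divides by n - 1
population_std = statistics.pstdev(values)  # divides by n

print(f"Mean ± sample std: {mean:.2f} ± {sample_std:.2f}")
print(f"Population std:    {population_std:.2f}")
```

The sample standard deviation reproduces the reported 80.49 ± 0.06, whereas the population form rounds to 0.05. With only three seeds, either figure is a rough indication of run-to-run variability.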


