Adaptive and User-Friendly Framework for Image Classification with Transfer Learning Models
Abstract
1. Introduction
- (a)
- Key features of ALF:
- (i)
- (ii)
- User-Friendly Interface: Built with Streamlit, it offers a simple web interface where users can upload images, define classes, and train models without coding [7].
- (iii)
- Automation of Model Training and Deployment: The system automates the entire pipeline, from preprocessing user data through training to providing model downloads for local use.
- (iv)
- (v)
- Multi-Model Flexibility: Users can train and compare various transfer learning backbones (e.g., MobileNet, ResNet, Inception, EfficientNet) within the same GUI framework.
- (vi)
- Transparency: Internal callbacks such as early stopping and learning-rate schedulers are accompanied by transparent reporting of evaluation metrics, learning curves, and other visualizations.
- (vii)
- Versatile Applications: Suitable for various fields such as healthcare, education, and retail, it enables users to tackle specific classification challenges with ease.
- (b)
- Literature review:
- (i)
- Transfer Learning in Image Classification: Transfer learning has reshaped image classification by allowing pre-trained models to be fine-tuned to state-of-the-art accuracy with minimal computational cost and training data [8,9,10]. For example, Sheng et al. (2018) demonstrated the power of MobileNetV1's depth-wise separable convolutions in reducing parameter counts and computational overhead [11]. Sandler et al. (2018) enhanced this architecture with inverted residuals and linear bottlenecks [12], making MobileNetV2 particularly well suited to resource-constrained devices. Furthermore, EfficientNetV2, introduced by Mingxing Tan and Quoc V. Le in 2021, achieved state-of-the-art results through compound scaling, balancing depth, width, and resolution in a manner that is both faster and more accurate [13]. Various studies have also used pre-trained models such as ResNet and Inception. ResNet introduced residual learning, mitigating the vanishing gradient problem and allowing much deeper networks to be trained effectively [14]. More recent image classification backbones such as Vision Transformer (ViT), Swin Transformer, and ConvNeXt have achieved strong results on large-scale datasets. Although these architectures are not used in the current implementation of ALF, they are expected to be added in future releases, further extending the framework's adaptability and performance.
- (ii)
- Machine Learning Accessibility: Several research efforts have aimed to simplify machine learning for non-technical users. Google's Teachable Machine, reviewed by Carney et al. (2020), illustrates how user-friendly tools can broaden access to AI: it lets users train classification models for their own needs without any coding experience [3]. Kacorri (2017) examined the role of teachable machines in accessibility, particularly in the design of assistive technologies for people with disabilities [4]. Together, these works underscore the importance of intuitive interfaces in bridging the gap between AI technologies and their end users.
- (iii)
- Data-Driven Optimization for Image Classification: Data preprocessing and optimization techniques are central to building robust classification models [15]. Works using MobileNet architectures cite image resizing and normalization as necessary preprocessing steps to match model input requirements. EfficientNetV2's progressive learning strategy highlights how dynamically scaling data dimensions can improve model performance. Owing to their versatility, transfer learning models have been applied in diverse fields. For example, Mijwil et al. used MobileNetV1 for brain tumor classification, reporting over 97% accuracy with high sensitivity and specificity [16]. Similarly, the use of ResNet and Inception models in medical imaging and object recognition highlights their utility in real-world applications.
2. Methodology
- (a)
- Dataset Overview
- (b)
- Flow of Methodology
- (i)
- Data Preprocessing: Preprocessing ensures that the data is well prepared for use in machine learning models [15,19]. It involves resizing, normalizing, and splitting the data. MobileNetV2 expects input images of shape 224 × 224 × 3, while InceptionV3 expects 299 × 299 × 3; resizing should preserve the aspect ratio so that images are not distorted [15,16]. Normalization scales pixel values to the range [0, 1] by dividing each value by 255 [15,20]; this standardization stabilizes computations and speeds up learning. Splitting divides the dataset into a training set (60%), used to teach the model the relationships between input images and their labels, with augmentation techniques such as flipping, cropping, or rotation applied to enhance diversity; a validation set (20%), which acts as a checkpoint during training, guiding parameter adjustments and detecting overfitting early; and a test set (20%), held out for final evaluation on unseen data.
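The normalization and 60/20/20 split described above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' code; `normalize` and `train_val_test_split` are hypothetical helper names, and resizing is omitted since it depends on the image library in use.

```python
import random

def normalize(pixels):
    """Scale raw 8-bit pixel values to [0, 1] by dividing by 255."""
    return [p / 255.0 for p in pixels]

def train_val_test_split(samples, seed=42):
    """Shuffle and split samples 60/20/20 into train/validation/test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed for reproducible splits
    n = len(items)
    n_train = int(n * 0.6)
    n_val = int(n * 0.2)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

For example, a dataset of 100 labeled images yields 60 training, 20 validation, and 20 test samples.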
- (ii)
- Models: The choice of model plays an important role in achieving accurate results in image classification tasks. By using transfer learning, it is possible to achieve higher accuracy with reduced training time. Each transfer learning model has unique strengths, as shown in Table 1.
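All of the backbones in Table 1 can be wired up with the same transfer learning pattern: a frozen pre-trained feature extractor topped with a fresh classification head. The following is a minimal sketch (not the authors' actual implementation), assuming TensorFlow/Keras; `build_classifier` is a hypothetical helper, and `weights=None` is used only so the sketch runs without downloading ImageNet weights — in practice `weights="imagenet"` is what makes this transfer learning.

```python
import tensorflow as tf

def build_classifier(num_classes, input_shape=(224, 224, 3)):
    """Attach a new classification head to a frozen MobileNetV2 backbone."""
    base = tf.keras.applications.MobileNetV2(
        include_top=False, weights=None, input_shape=input_shape)
    base.trainable = False  # freeze the pre-trained feature extractor
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Swapping in another backbone (e.g., `tf.keras.applications.InceptionV3` with a 299 × 299 × 3 input) changes only the first line, which is what makes a multi-model GUI practical.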
- (iii)
- Model Performance Evaluation: Evaluating how well a model performs is crucial for understanding its ability to generalize to unseen data. Several approaches are used. Metrics such as accuracy, precision, recall, and F1-score are calculated to compare the strengths and weaknesses of different models [23,24]. Learning curves visually track accuracy and loss across the training and validation phases; these curves help detect problems such as overfitting [25].
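The four metrics above all derive from the confusion matrix counts (TP, FP, FN, TN). A small self-contained sketch of the binary case — `classification_metrics` is an illustrative helper, not part of ALF:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1-score from
    binary confusion matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0   # guard division by zero
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For multi-class problems these are computed per class and then averaged (macro or weighted).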
- (iv)
- Model Deployment: Once trained, the models are released for use in real-world applications. Users can interact with them in two ways: through the web interface, where they upload images and obtain predictions with confidence scores directly; or by downloading the trained model file (.h5) for offline deployment or integration into other systems.
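The confidence scores shown alongside predictions are typically the softmax of the model's raw outputs. A minimal stdlib-only sketch of that conversion — the function names are illustrative, not ALF's API:

```python
import math

def confidence_scores(logits, class_names):
    """Convert raw model outputs (logits) into softmax probabilities,
    one confidence score per class."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(class_names, exps)}

def predict_label(logits, class_names):
    """Return the class with the highest confidence."""
    scores = confidence_scores(logits, class_names)
    return max(scores, key=scores.get)
```

A model using a softmax output layer (as in the tables above) returns these probabilities directly, in which case only the argmax step is needed.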
- (v)
- User Interface Development: Figure 2 shows the user interface of the ALF application, built with Streamlit 1.35.0, an open-source Python framework. The UI was designed to make interaction simple and easy, even for non-technical users: it allows users to select models, adjust parameters, and train transfer learning models with minimal effort, keeping the entire process seamless, accessible, and efficient.
- Interface Layout: To declutter the main interface, a sidebar keeps the available user settings organized. The UI is divided into sections for model selection, dataset upload, parameter selection and customization, and result display.
- Model Selection and Configuration: ALF offers a drop-down menu for selecting among the available transfer learning models. Customizable parameters include the number of classes, batch size, and number of epochs, preserving control over model training.
- Dataset Management: ALF allows for users to upload image datasets for classification tasks.
- User Feedback: Real-time progress bars were integrated to keep users informed of the training progress. Error messages and class summary sections update constantly based on user input, as shown in Figure 3.
- Graph Visualization: Upon completion of the training process, visualization of learning curves (for accuracy and loss plots per epoch) is displayed for user transparency.
- Download Options and Model Evaluation: A tabular summary of evaluation metrics such as training accuracy, precision, recall, F1-score, validation accuracy, and time taken is presented. A download button allows for users to retrieve trained models, which can be used for deployment in offline applications.
- On-Screen Prediction: Users can upload images for prediction of unlabeled data using the trained model.
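The UI elements described above map almost one-to-one onto Streamlit widgets. The sketch below is not the authors' source code; widget labels, option lists, and the `class_summary` helper are all hypothetical, chosen only to illustrate how such a layout could be wired.

```python
def class_summary(uploads):
    """Summarize uploaded images per class for the sidebar,
    e.g. {"dogs": 35, "cats": 40} -> "cats: 40 images | dogs: 35 images"."""
    return " | ".join(f"{name}: {n} images"
                      for name, n in sorted(uploads.items()))

def main():
    # Streamlit is imported inside main() so the pure helper above
    # can be tested without Streamlit installed.
    import streamlit as st

    st.sidebar.header("Settings")
    model_name = st.sidebar.selectbox(
        "Backbone",
        ["MobileNetV2", "ResNetV2", "InceptionV3", "EfficientNetV2"])
    epochs = st.sidebar.slider("Epochs", 1, 50, 10)
    batch_size = st.sidebar.selectbox("Batch size", [16, 32, 64])

    st.title("ALF: Adaptive Learning Framework")
    files = st.file_uploader("Upload images", accept_multiple_files=True)
    if st.button("Train"):
        progress = st.progress(0)  # real-time training progress bar
        # ... preprocessing, training, learning-curve plots, download button

# In a real app: save as app.py, call main() at the bottom,
# and launch with `streamlit run app.py`.
```

Streamlit re-runs the script on every interaction, which is why the whole workflow (upload, configure, train, visualize, download) can live in one linear script.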
3. Results
- Learning Curves:
- Confusion Matrix:
4. Discussion
5. Limitations and Future Scope
6. Conclusions
- InceptionV2 is preferable when maximum precision and recall are required, whereas MobileNetV3 and EfficientNetV2 provide a balanced trade-off between speed and accuracy, making them more suitable for real-time or edge deployments.
- The framework performs well on small and imbalanced datasets, showing strong generalization, even without early stopping or advanced regularization techniques.
- Future work will focus on integrating modern architectures such as Vision Transformers, applying optimization strategies such as pruning and quantization, and incorporating explainable AI (XAI) components to improve interpretability.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
AI | Artificial Intelligence |
ALF | Adaptive Learning Framework |
CNN | Convolutional Neural Network |
F1-score | Harmonic Mean of Precision and Recall |
FN | False Negative |
FP | False Positive |
GUI | Graphical User Interface |
h5 | HDF5 (Hierarchical Data Format version 5) |
NAS | Neural Architecture Search |
SE | Squeeze-and-Excitation |
TN | True Negative |
TP | True Positive |
UI | User Interface |
ViT | Vision Transformer |
XAI | Explainable Artificial Intelligence |
References
- McNeely-White, D.; Beveridge, J.R.; Draper, B.A. Inception and ResNet Features are (Almost) Equivalent. Cogn. Syst. Res. 2020, 59, 312–318. [Google Scholar] [CrossRef]
- Mathew, M.P.; Mahesh, T.Y. Object Detection Based on Teachable Machine. J. VLSI Des. Signal Process. 2021, 7, 20–26. [Google Scholar] [CrossRef]
- Carney, M.; Webster, B.; Alvarado, I.; Phillips, K.; Howell, N.; Griffith, J.; Jongejan, J.; Pitaru, A.; Chen, A. Teachable machine: Approachable web-based tool for exploring machine learning classification. In Proceedings of the Conference on Human Factors in Computing Systems—Proceedings, Association for Computing Machinery, Honolulu, HI, USA, 25–30 April 2020. [Google Scholar] [CrossRef]
- Kacorri, H. Teachable machines for accessibility. ACM SIGACCESS Access. Comput. 2017, 119, 10–18. [Google Scholar] [CrossRef]
- Salehi, A.W.; Khan, S.; Gupta, G.; Alabduallah, B.I.; Almjally, A.; Alsolai, H.; Siddiqui, T.; Mellit, A. A Study of CNN and Transfer Learning in Medical Imaging: Advantages, Challenges, Future Scope. Sustainability 2023, 15, 5930. [Google Scholar] [CrossRef]
- Kim, H.E.; Cosa-Linan, A.; Santhanam, N.; Jannesari, M.; Maros, M.E.; Ganslandt, T. Transfer learning for medical image classification: A literature review. BMC Med. Imaging 2022, 22, 69. [Google Scholar] [CrossRef] [PubMed]
- Kannan, M.K.J.; Sengar, A.; Bhardwaj, A.; Singh, A.; Jain, H.; Shrivastava, V. Simplifying Machine Learning: A Streamlit-Powered Interface for Rapid Model Development with PyCaret. Int. J. Innov. Res. Comput. Commun. Eng. 2024, 12, 5857–5871. [Google Scholar]
- Desai, C. Image Classification Using Transfer Learning and Deep Learning. Int. J. Eng. Comput. Sci. 2021, 10, 25394–25398. [Google Scholar] [CrossRef]
- Ariefwan, M.R.M.; Diyasa, I.G.S.M.; Hindrayani, K.M. InceptionV3, ResNet50, ResNet18 and MobileNetV2 Performance Comparison on Face Recognition Classification. Literasi Nusant. 2021, 4, 1–10. [Google Scholar]
- Hussain, M.; Bird, J.J.; Faria, D.R. A study on CNN transfer learning for image classification. In Advances in Intelligent Systems and Computing; Springer: Berlin/Heidelberg, Germany, 2019; pp. 191–202. [Google Scholar] [CrossRef]
- Sheng, T.; Feng, C.; Zhuo, S.; Zhang, X.; Shen, L.; Aleksic, M. A Quantization-Friendly Separable Convolution for MobileNets. In Proceedings of the 2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), Williamsburg, VA, USA, 25 March 2018. [Google Scholar] [CrossRef]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Tan, M.; Le, Q.V. EfficientNetV2: Smaller Models and Faster Training. 2021. Available online: https://github.com/google/ (accessed on 20 May 2025).
- Deep Learning and Transfer Learning Approaches for Image Classification. Available online: https://www.researchgate.net/publication/333666150 (accessed on 20 May 2025).
- Dharavath, K.; Amarnath, G.; Talukdar, F.A.; Laskar, R.H. Impact of image preprocessing on face recognition: A comparative analysis. In Proceedings of the 2014 International Conference on Communication and Signal Processing, Melmaruvathur, India, 3–5 April 2014; pp. 631–635. [Google Scholar] [CrossRef]
- Mijwil, M.M.; Doshi, R.; Hiran, K.K.; Unogwu, O.J.; Bala, I. MobileNetV1-Based Deep Learning Model for Accurate Brain Tumor Classification. Mesopotamian J. Comput. Sci. 2023, 2023, 32–41. [Google Scholar] [CrossRef] [PubMed]
- Jeong, H. Feasibility Study of Google’s Teachable Machine in Diagnosis of Tooth-Marked Tongue. J. Dent. Hyg. Sci. 2020, 20, 206–212. [Google Scholar] [CrossRef]
- Mahesh, T.R.; Thakur, A.; Gupta, M.; Sinha, D.K.; Mishra, K.K.; Venkatesan, V.K.; Guluwadi, S. Transformative Breast Cancer Diagnosis using CNNs with Optimized ReduceLROnPlateau and Early Stopping Enhancements. Int. J. Comput. Intell. Syst. 2024, 17, 14. [Google Scholar] [CrossRef]
- Ciresan, D.C.; Meier, U.; Masci, J.; Maria Gambardella, L.; Schmidhuber, J. Flexible, High Performance Convolutional Neural Networks for Image Classification. In Proceedings of the IJCAI Proceedings-International Joint Conference on Artificial Intelligence, Barcelona, Spain, 16–22 July 2011. [Google Scholar]
- Alruwaili, M.; Shehab, A.; Abd El-Ghany, S. COVID-19 Diagnosis Using an Enhanced Inception-ResNetV2 Deep Learning Model in CXR Images. J. Healthc. Eng. 2021, 2021, 6658058. [Google Scholar] [CrossRef] [PubMed]
- Dong, K.; Zhou, C.; Ruan, Y.; Li, Y. MobileNetV2 Model for Image Classification. In Proceedings of the 2020 2nd International Conference on Information Technology and Computer Application, ITCA 2020, Guangzhou, China, 18–20 December 2020; IEEE: Piscataway, NJ, USA; pp. 476–480. [Google Scholar] [CrossRef]
- Hindarto, D. Revolutionizing Automotive Parts Classification Using InceptionV3 Transfer Learning. Int. J. Softw. Eng. Comput. Sci. (IJSECS) 2023, 3, 324–333. [Google Scholar] [CrossRef]
- Krishnapriya, S.; Karuna, Y. Pre-trained deep learning models for brain MRI image classification. Front. Hum. Neurosci. 2023, 17, 1150120. [Google Scholar] [CrossRef] [PubMed]
- Hossin, M.; Sulaiman, M.N. A Review on Evaluation Metrics for Data Classification Evaluations. Int. J. Data Min. Knowl. Manag. Process 2015, 5, 1–11. [Google Scholar] [CrossRef]
- Didyk, L.; Yarish, B.; Beck, M.A.; Bidinosti, C.P.; Henry, C.J. Strategies and Impact of Learning Curve Estimation for CNN-Based Image Classification. 2023. Available online: http://arxiv.org/abs/2310.08470 (accessed on 25 June 2025).
Models | Strengths |
---|---|
MobileNetV1 | Lightweight and efficient for mobile applications, employing depth-wise separable convolutions to reduce computational load [11,16]. |
MobileNetV2 | Adds inverted residuals and linear bottlenecks, striking a balance between performance and efficiency for edge devices [12,21]. |
MobileNetV3 | Uses Neural Architecture Search and SE blocks for enhanced attention, offering small and large variants for speed-accuracy trade-offs |
ResNetV1 | Introduced residual learning to train deeper networks effectively, overcoming the vanishing gradient problem. |
ResNetV2 | Refines ResNetV1 by reordering layers in residual blocks and applying batch normalization before each weight layer, stabilizing training and improving accuracy [5]. |
InceptionV2 | Enhances the original Inception with Batch Normalization and parallel convolutions to capture multi-scale features. |
InceptionV3 | Optimized with factorized convolutions, auxiliary classifiers, and label smoothing for precise classifications [22]. |
EfficientNetV2 | Balances width, depth, and resolution with compound scaling, achieving high accuracy and faster training speed while minimizing parameters [13]. |
Model | Precision (%) | Recall (%) | F1-Score (%) | Validation Accuracy (%) | Testing Accuracy (%) | Time Taken (Seconds) |
---|---|---|---|---|---|---|
CNN | 73.57 | 72.56 | 71.89 | 73.22 | 70.78 | 644.28 |
MobileNetV1 | 98.99 | 99.11 | 99.04 | 99.68 | 99.02 | 361.18 |
MobileNetV2 | 99.46 | 99.42 | 99.44 | 99.84 | 99.41 | 389.03 |
MobileNetV3 | 99.59 | 99.67 | 99.63 | 99.84 | 99.61 | 339.33 |
ResNetV1 | 98.80 | 98.89 | 98.83 | 100 | 98.82 | 1126.67 |
ResNetV2 | 99.57 | 99.61 | 99.59 | 100 | 99.61 | 1184.17 |
InceptionV2 | 100 | 100 | 100 | 99.68 | 100 | 555.51 |
InceptionV3 | 99.83 | 99.79 | 99.81 | 99.35 | 99.80 | 749.95 |
EfficientNetV2 | 99.21 | 99.29 | 99.25 | 99.84 | 99.22 | 397.38 |
Share and Cite
Khatri, M.; Sahoo, M.; Sayyad, S.; Sayyad, J. Adaptive and User-Friendly Framework for Image Classification with Transfer Learning Models. Future Internet 2025, 17, 370. https://doi.org/10.3390/fi17080370