Deep Learning and Transfer Learning Models for Indian Food Classification

Patel, Jigarkumar Ambalal; Labana, Dileep Laxmansinh; Lakhani, Gaurang Vinodray; Vaghela, Rashmika Ketan

doi:10.3390/engproc2026124014

Open Access Proceeding Paper

Deep Learning and Transfer Learning Models for Indian Food Classification^†

by Jigarkumar Ambalal Patel ^1,*, Dileep Laxmansinh Labana ^2, Gaurang Vinodray Lakhani ^1 and Rashmika Ketan Vaghela ^3

^1 Government Polytechnic, Bhuj 370001, India

^2 R C Technical Institute, Ahmedabad 380060, India

^3 Government Polytechnic, Gujarat Technological University, Ahmedabad 380006, India

^* Author to whom correspondence should be addressed.

^† Presented at the 6th International Electronic Conference on Applied Sciences, 9–11 December 2025; Available online: https://sciforum.net/event/ASEC2025.

Eng. Proc. 2026, 124(1), 14; https://doi.org/10.3390/engproc2026124014

Published: 3 February 2026

(This article belongs to the Proceedings of The 6th International Electronic Conference on Applied Sciences)


Abstract

This study examines the use of deep learning and transfer learning models for classifying images of Indian cuisine. Indian cuisine, characterized by its extensive diversity and intricate presentation, poses considerable challenges for food recognition owing to variations in ingredients, texture, and visual appearance. To address these challenges, we used a custom Convolutional Neural Network (CNN) and pre-trained transfer learning models, namely DenseNet121, InceptionV3, MobileNetV3, VGG16, and Xception. The study used a dataset comprising 13 food categories and applied preprocessing techniques such as HSV conversion, noise reduction, and edge detection to improve image quality. Model performance was assessed using accuracy, precision, recall, and F1-score. The custom CNN performed poorly (about 50% test accuracy), and a substantial gap between training and validation accuracy indicated overfitting. In contrast, DenseNet121, InceptionV3, and Xception generalized better, each attaining 92% accuracy with precision, recall, and F1-score between 91% and 92%. MobileNetV3 (91% accuracy) and VGG16 (86% accuracy) produced reliable but slightly lower results. The results highlight the efficacy of transfer learning in food image classification and indicate that fine-tuned, pre-trained models markedly improve classification accuracy. This research advances the creation of intelligent food recognition systems applicable in dietary monitoring, nutrition tracking, and health management.

Keywords:

Indian food; deep learning; transfer learning; DenseNet; Inception; MobileNet; VGG16; Xception

1. Introduction

Food serves as a critical lens for understanding Indian culture, identity, and social structure, reflecting dimensions such as caste, class, religion, ethnicity, and lifestyle. In India’s rapidly evolving socio-economic landscape, food has emerged as a prominent form of cultural expression shaped by globalization and increased consumption. While these changes have expanded dietary choices, they have also contributed to unhealthy eating patterns and a rising prevalence of obesity, emphasizing the importance of effective dietary monitoring and calorie management.

Recent advances in computer vision and machine learning have enabled automated food recognition and calorie estimation through image analysis, offering a practical solution for promoting healthier dietary behaviors [1,2]. The increasing availability of food images from social media, mobile applications, and online platforms has facilitated the creation of large-scale datasets, accelerating research in this domain. Automated dish recognition not only supports personal diet tracking and meal planning but also has applications in epidemiological research and the formulation of public health dietary guidelines [3].

Dietary assessment methods based on visual data can be broadly categorized into image-assisted and image-based approaches. Image-assisted methods complement traditional text-based assessments with manual image analysis, whereas image-based techniques enable fully automated dietary evaluation without human intervention. Although both approaches yield reliable results, image-based systems significantly reduce the workload of nutrition professionals and enhance scalability for real-world applications.

Despite significant progress, food image recognition remains a challenging task due to high intra-class variability, occlusion of food items, irregular shapes, and difficulties in segmentation under complex backgrounds. Additional challenges include accurate volume estimation for liquid foods, non-uniform food density affecting calorie estimation, high computational costs for real-time deployment, and the lack of comprehensive nutritional databases accounting for variations in recipes and cooking methods [4]. Addressing these challenges is essential for improving the accuracy, reliability, and applicability of food recognition systems in practical health-oriented applications.

The primary objective of this research is to evaluate the effectiveness of deep learning-based food image classification for Indian cuisine using transfer learning techniques. In this study, multiple pre-trained convolutional neural network architectures, including DenseNet, Inception, MobileNet, VGG16, and Xception, are systematically investigated and compared to assess their classification performance. Particular emphasis is placed on the MobileNetV3 architecture due to its computational efficiency and suitability for real-time and resource-constrained environments. Through comparative performance evaluation, this work aims to identify an optimal transfer learning framework for accurate and efficient Indian food classification.

2. Related Work

Reddy, U.K. et al. (2022) [5] developed an automated food classification system using deep learning techniques. SqueezeNet and VGG-16 models were implemented for real-time food image classification in healthcare applications. After dataset augmentation and hyper-parameter tuning, VGG-16 outperformed SqueezeNet, achieving an accuracy of 85.07% due to superior feature extraction.

Nithiyaraj, E.E. et al. (2021) [6] proposed a transfer learning-based food classification framework using AlexNet on the India-Food-21-Categories-Small dataset. Data augmentation was applied to address dataset limitations. The model achieved 96.6% accuracy within five training epochs and supported calorie monitoring and dietary habit analysis.

Survase, A.S. et al. (2022) [7] introduced a CNN-based approach for Indian food classification using multiple pre-trained transfer learning models. The models reduced computational cost and supported the classification of food categories and recipes as well as automated dietary analysis.

Giramkar, J. et al. (2023) [8] presented a transfer learning-based CNN framework evaluated on an Indian food dataset with 11 classes. InceptionV3, VGG16, VGG19, and ResNet models were compared, with InceptionV3 achieving the highest accuracy of 87.9% and providing calorie and ingredient information.

Ramesh, A. et al. (2020) [9] developed a real-time food detection and classification system using an SSD model integrated with CNN architectures. The system achieved an overall accuracy of 97.6%, demonstrating effectiveness for automatic food localization and classification.

Sharma, A. et al. (2023) [10] proposed a custom CNN incorporating edge and shape features for food image classification. The model achieved 92.89% accuracy on benchmark datasets and improved to 96.27% when combined with InceptionV3 features on a South Indian food dataset.

Zhao, H. et al. (2019) [11] introduced a few-shot and many-shot fusion learning model for food classification and created the NTU-IndianFood107 dataset. The model achieved a top-1 accuracy of 72% and effectively handled limited and large-sample food categories.

Liu, Y.C. et al. (2022) [12] proposed an EfficientNet-based framework for multi-dish food recognition with bounding-box detection. The system achieved a mean average precision of up to 92%, enabling real-time recognition and improved dietary reporting.

Patel, J. et al. (2022) [13] proposed a transfer learning-based Indian food image classification system using the MobileNetV3 architecture, achieving an overall classification accuracy of approximately 90%.

3. Methodology

In the present research, a self-designed CNN and transfer learning with DenseNet, Inception, MobileNet, VGG16, and Xception are used. Our Indian food classification system is shown in Figure 1.

3.1. Indian Food Dataset

The 13_Indian_food dataset [14] used in this study consists of approximately 3500 images distributed across 13 commonly consumed Indian food categories, with each category containing more than 250 images. This relatively balanced class distribution helps minimize training bias and ensures fair comparative evaluation across different deep learning and transfer learning models. Although Indian cuisine comprises hundreds of regional and traditional dishes, the selected categories represent visually distinctive and frequently consumed foods, making the dataset suitable for benchmarking classification performance under controlled experimental conditions.

It is acknowledged that broader category coverage would further enhance real-world applicability. Future extensions of this work may include expanding the dataset by crawling food images from social media and recipe platforms, crowdsourced data collection through mobile applications, and collaborations with catering services, restaurants, and food delivery platforms to capture real-world variations in lighting, background, and presentation styles.

Preprocessing on Dataset

Upon acquiring the dataset, image preprocessing was performed to enhance image quality and ensure consistency. The research employed a diverse dataset comprising 13 food categories and applied preprocessing techniques such as HSV color space conversion, noise reduction, and edge detection to improve visual clarity and feature representation. Additionally, standard preprocessing operations including normalization, contrast enhancement, and image resizing were carried out. Data augmentation methods such as random rotation, horizontal flipping, cropping, and illumination variation were applied during training to reduce overfitting, increase data diversity, and improve model robustness, particularly for transfer learning models. As part of the rescaling process, pixel values were normalized by dividing each value by 255 to facilitate efficient model training.
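The rescaling, HSV conversion, and horizontal flipping steps described above can be illustrated on a tiny example. The following is a minimal sketch that uses only the Python standard library; the helper names `rescale_pixel`, `rgb_pixel_to_hsv`, and `horizontal_flip` are hypothetical, and the library and parameter choices of the actual pipeline are not specified in this paper.

```python
import colorsys


def rescale_pixel(rgb):
    """Map 8-bit RGB values (0-255) to floats in [0, 1] by dividing by 255."""
    return tuple(channel / 255.0 for channel in rgb)


def rgb_pixel_to_hsv(rgb):
    """Convert one 8-bit RGB pixel to HSV, with each component in [0, 1]."""
    r, g, b = rescale_pixel(rgb)
    return colorsys.rgb_to_hsv(r, g, b)


def horizontal_flip(image):
    """Mirror an image stored as a list of rows (each row a list of pixels)."""
    return [list(reversed(row)) for row in image]


# A tiny 1 x 2 "image": one pure-red pixel and one mid-grey pixel.
image = [[(255, 0, 0), (128, 128, 128)]]
hsv_image = [[rgb_pixel_to_hsv(p) for p in row] for row in image]
flipped = horizontal_flip(image)
```

In HSV, the pure-red pixel has full saturation and value, while the grey pixel has zero saturation, which is one reason the colour space is often used to separate colour information from brightness.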

3.2. Feature Extraction and Classification

Feature extraction and classification were performed with convolutional networks: a self-designed CNN model and several transfer learning models.

3.2.1. CNN (Self Designed)

The input image is 224 × 224 pixels with 3 channels, and the convolutional layers progressively double the number of filters from 32 to 256, each followed by a 2 × 2 max-pooling layer to reduce dimensionality while preserving key features. The resulting feature maps are flattened and fed into a dense layer with 256 units using ReLU activation, followed by a final dense layer with 13 units and a SoftMax activation function for food image classification. Basic architectural tuning was explored but did not significantly improve performance. Advanced optimizations such as Batch Normalization, alternative activation functions, attention mechanisms, and kernel optimization are identified as future research directions. The model summary is presented in Figure 2.
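To make the layer arithmetic concrete, the sketch below traces feature-map sizes and parameter counts through the stack described above. It assumes four convolutional blocks (32, 64, 128, and 256 filters), 3 × 3 kernels, and 'same' padding; the kernel size and padding are not stated in the text, so the resulting numbers are illustrative and may differ from the model summary in Figure 2. The function name `trace_cnn` is hypothetical.

```python
def trace_cnn(input_size=224, in_channels=3, filters=(32, 64, 128, 256),
              kernel=3, dense_units=256, num_classes=13):
    """Trace output shapes and parameter counts for a conv/max-pool stack.

    Assumes 'same' padding (convolution keeps the spatial size) and
    2 x 2 max-pooling with stride 2 (halves the spatial size).
    """
    size, channels, total_params = input_size, in_channels, 0
    shapes = []
    for f in filters:
        # Conv weights: kernel * kernel * in_channels * filters, plus one bias per filter.
        total_params += kernel * kernel * channels * f + f
        size //= 2  # 2 x 2 max-pooling
        channels = f
        shapes.append((size, size, channels))
    flat = size * size * channels
    total_params += flat * dense_units + dense_units          # hidden dense layer
    total_params += dense_units * num_classes + num_classes   # 13-way output layer
    return shapes, flat, total_params


shapes, flat, total_params = trace_cnn()
```

Under these assumptions the flattened vector has 14 × 14 × 256 = 50,176 elements, and the first dense layer accounts for most of the roughly 13.2 million parameters. That is large relative to a dataset of about 3500 images, which is consistent with the overfitting reported for this model.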

3.2.2. Transfer Learning with MobileNetV3

MobileNetV3 [13] was employed for transfer learning using ImageNet pre-trained weights to enhance food classification performance. The model was used for feature extraction, followed by a dense layer with 1280 units and ReLU activation. Final classification was performed using a dense layer with units equal to the number of food categories (13). The model summary is presented in Figure 3.

3.2.3. Transfer Learning with DenseNet-121

DenseNet-121 [15] is a convolutional neural network with densely connected layers that promote feature reuse and mitigate the vanishing gradient problem. Its architecture is well suited for capturing fine-grained details in Indian food images, making it effective for accurate food classification. The model summary is presented in Figure 4.
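Dense connectivity means that each layer receives the concatenated feature maps of all preceding layers in its block, so the channel count grows by a fixed growth rate per layer. The sketch below traces this for the standard DenseNet-121 configuration (growth rate 32, 64 initial channels, blocks of 6, 12, 24, and 16 layers, and transition layers that halve the channel count). The function name `densenet_channels` is hypothetical, and the code is bookkeeping only, not a trainable model.

```python
def densenet_channels(block_layers=(6, 12, 24, 16), growth_rate=32,
                      init_channels=64, compression=0.5):
    """Return the channel count at the end of each dense block."""
    channels = init_channels
    per_block = []
    for i, layers in enumerate(block_layers):
        # Each layer adds `growth_rate` new feature maps to the running concatenation.
        channels += layers * growth_rate
        per_block.append(channels)
        # A transition layer follows every block except the last.
        if i < len(block_layers) - 1:
            channels = int(channels * compression)
    return per_block


channels_per_block = densenet_channels()
```

The final block ends with 1024 channels, which is the size of the feature vector passed to a classification head after global pooling.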

3.2.4. Transfer Learning with Inception

InceptionV3 [16], a later refinement of the GoogLeNet (Inception v1) architecture, employs multi-scale feature extraction using multiple filter sizes within each Inception module. This architecture is effective for capturing diverse textures and structural details in Indian food images, enabling improved classification performance. The model summary is presented in Figure 5.

3.2.5. Transfer Learning with Xception

Xception is an advanced variant of the Inception architecture that utilizes depthwise separable convolutions to efficiently learn both local and global features. Its capability to capture complex hierarchical patterns makes it well suited for Indian food classification, where textures and visual variations differ widely across dishes. The model summary is presented in Figure 6.
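The efficiency of depthwise separable convolutions can be seen by counting weights. The sketch below compares a standard convolution with its separable counterpart for one illustrative layer; the layer size (3 × 3 kernel, 128 input channels, 256 output channels) is an assumption chosen for the example and is not taken from the Xception model summary, and the function names are hypothetical.

```python
def standard_conv_weights(kernel, in_ch, out_ch):
    """Weights in a standard convolution (biases omitted)."""
    return kernel * kernel * in_ch * out_ch


def separable_conv_weights(kernel, in_ch, out_ch):
    """Weights in a depthwise separable convolution (biases omitted)."""
    depthwise = kernel * kernel * in_ch   # one k x k filter per input channel
    pointwise = in_ch * out_ch            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


standard = standard_conv_weights(3, 128, 256)
separable = separable_conv_weights(3, 128, 256)
reduction = standard / separable
```

For this layer the separable form needs 33,920 weights instead of 294,912, a reduction of roughly 8.7 times.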

3.2.6. Transfer Learning with VGG

VGG16 [16] is a deep convolutional neural network recognized for its simple and consistent design. It uses a series of layers with small convolutional filters combined with max-pooling operations, and it remains effective at learning and extracting features from food images. However, VGG16 has a relatively large number of parameters, which makes it computationally heavier than the other architectures considered here. The model summary is presented in Figure 7.

3.3. Evaluation

The performance of the classification system is evaluated using standard metrics widely adopted in the scientific community. A confusion matrix is employed to analyze the model’s effectiveness through key parameters, including true positive (TP), true negative (TN), false positive (FP), and false negative (FN). Based on these parameters, evaluation metrics such as accuracy, precision, recall, and F1-score are computed.
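The relationship between the confusion matrix and these metrics can be sketched as follows. The example assumes macro-averaging over classes, which the paper does not specify, and uses a made-up 3-class confusion matrix rather than results from this study; the function names are hypothetical.

```python
def per_class_counts(cm, k):
    """Return (TP, FP, FN, TN) for class k; cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(n)) - tp
    fn = sum(cm[k]) - tp
    tn = sum(map(sum, cm)) - tp - fp - fn
    return tp, fp, fn, tn


def macro_metrics(cm):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    n = len(cm)
    total = sum(map(sum, cm))
    accuracy = sum(cm[k][k] for k in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp, fp, fn, _ = per_class_counts(cm, k)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy 3-class confusion matrix (made-up numbers, not results from this study).
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 2, 8]]
accuracy, precision, recall, f1 = macro_metrics(cm)
```

In a multi-class setting, TP, FP, FN, and TN are computed per class in a one-vs-rest fashion, and the per-class precision, recall, and F1-score are then averaged.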

4. Implementation and Results

The experiments in this study were run on Kaggle (kaggle.com), an online data science platform that provides hosted notebook environments. The following hyper-parameters were used for the experiments.

Image Size: 224 × 224.

Epochs: 30 for the self-designed CNN and 10 for the transfer learning models.

Optimizer: Adam.

Loss Function: categorical_crossentropy.

Initial learning rate: 0.0001.

Batch size: 32.
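The loss listed above, categorical cross-entropy, compares a one-hot label with the softmax output of the network. A minimal standard-library sketch follows; the function names are hypothetical, the true class index is arbitrary, and in practice the loss is computed by the deep learning framework.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


NUM_CLASSES = 13
label = [1.0 if i == 4 else 0.0 for i in range(NUM_CLASSES)]  # arbitrary true class

# A model that scores every class equally predicts 1/13 for each class.
uniform_loss = categorical_crossentropy(label, softmax([0.0] * NUM_CLASSES))

# A confident, correct prediction gives a loss near zero.
confident_logits = [10.0 if i == 4 else 0.0 for i in range(NUM_CLASSES)]
confident_loss = categorical_crossentropy(label, softmax(confident_logits))
```

The uniform case gives a loss of ln 13 (about 2.565), which is a useful reference point: a 13-class model whose loss stays near this value has learned little beyond chance.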

The self-designed CNN reached a testing accuracy of 50%. The testing results of all models are compared in Table 1.

Misclassifications mainly occur among visually similar dishes with overlapping textures, colors, or presentation styles, often due to complex backgrounds, occlusion, and lighting variations. Transfer learning models show stable performance under moderate viewpoint and illumination changes, while detailed robustness evaluation under severe noise and extreme angles is identified as future work.

5. Conclusions

This study investigated Indian food image classification by comparing a custom-built convolutional neural network (CNN) with several well-known transfer learning models, namely DenseNet-121, InceptionV3, MobileNetV3, VGG16, and Xception. Each model was evaluated using accuracy, precision, recall, and F1-score to ensure a comprehensive comparison. The custom CNN achieved an accuracy of only about 50%, indicating a limited feature learning capability. In contrast, the transfer learning models performed substantially better, which we attribute to their pre-trained feature representations. DenseNet-121, InceptionV3, and Xception achieved around 92% across all evaluation metrics, MobileNetV3 followed closely at about 91%, and VGG16 scored 85–86% (Table 1). Overall, these findings demonstrate the effectiveness of pre-trained deep learning models, which provide enhanced feature extraction and improved classification accuracy for diverse Indian food images. Future work will focus on expanding dataset diversity, conducting preprocessing ablation studies, improving custom CNN architectures, and performing detailed robustness evaluations under real-world conditions to further enhance generalization and practical applicability.

Author Contributions

Conceptualization, Methodology—J.A.P., Software—D.L.L., Writing, Visualization—R.K.V., Validation, Supervision—G.V.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Kiss, Á.; Kovács, A. Food recognition using neural network on mobile device. In Proceedings of the 2023 IEEE 17th International Symposium on Applied Computational Intelligence and Informatics (SACI), Timisoara, Romania, 23–26 May 2023; pp. 755–760.
2. Christian, S.; Murwantara, I.M.; Lazarusli, I. A Mobile Application for Food and Its Ingredients Detection Using Deep Learning. In Proceedings of the 2022 1st International Conference on Technology Innovation and Its Applications (ICTIIA), Tangerang, Indonesia, 23 September 2022; pp. 1–6.
3. Banerjee, S.; Mondal, A.C. Nutrient Food Prediction Through Deep Learning. In Proceedings of the 2021 Asian Conference on Innovation in Technology (ASIANCON), Pune, India, 27–29 August 2021; pp. 1–5.
4. Freitas, C.N.C.; Cordeiro, F.R.; Macario, V. MyFood: A Food Segmentation and Classification System to Aid Nutritional Monitoring. In Proceedings of the 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Porto de Galinhas, Brazil, 7–10 November 2020; pp. 234–239.
5. Reddy, U.K.; Swathi, S.; Rao, M.S. Deep Learning Technique for Automatically Classifying Food Images. Math. Stat. Eng. Appl. 2022, 71, 2362–2370.
6. Nithiyaraj, E.E.; Rajaseela, S. Indian Food Image Recognition using a Deep Learning Approach. Indian J. Food Eng. 2021, 1, 777–783.
7. Survase, A.S.; Sawant, K.; Lokhande, R.; Pawar, S.; Korde, P. Food Image Classification and Nutrition Detection Using CNN. Iconic Res. Eng. J. 2022, 5, 10–11.
8. Giramkar, J.; Thakur, A.; Hole, M.; Darekar, O.; Manwatkar, A. Indian Food Image Classification with Transfer Learning. In Proceedings of the 2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS), Bengaluru, India, 20–21 December 2019; Volume 5.
9. Ramesh, A.; Sivakumar, A.; Angel, S.S. Real-time Food-Object Detection and Localization for Indian Cuisines Using Deep Neural Networks. In Proceedings of the 2020 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT), Hyderabad, India, 20–21 December 2020; pp. 1–6.
10. Sharma, A.; Kumar, C. Analysis & Numerical Simulation of Indian Food Image Classification Using Convolutional Neural Network. Int. J. New Pract. Manag. Eng. 2023, 12, 127–132.
11. Zhao, H.; Yap, K.-H.; Kot, A.C.; Duan, L.; Cheung, N.-M. Few-Shot and Many-Shot Fusion Learning in Mobile Visual Food Recognition. In Proceedings of the 2019 IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, 26–29 May 2019.
12. Liu, Y.-C.; Onthoni, D.D.; Sahoo, P.K. Deep-Learning-Assisted Multi-Dish Food Recognition Application for Dietary Intake Reporting. Electronics 2022, 11, 1626.
13. Patel, J.; Talsania, H.; Modi, K. Performance Evaluation of Indian Food Image Classification System Using Transfer Learning with MobileNetV3. SAMRIDDHI J. Phys. Sci. Eng. Technol. 2022, 14, 398–405.
14. Available online: https://www.kaggle.com/datasets/jigarsharp/13-indian-food-250 (accessed on 24 March 2025).
15. Patel, J.A.; Lakhani, G.V.; Vaghela, R.K.; Labana, D.L. DenseMobile Net: Deep Ensemble Model for Precision and Innovation in Indian Food Recognition. Eng. Proc. 2025, 87, 3.
16. Shah, S.R.; Qadri, S.; Bibi, H.; Shah, S.M.W.; Sharif, M.I.; Marinello, F. Comparing Inception V3, VGG 16, VGG 19, CNN, and ResNet 50: A Case Study on Early Detection of a Rice Disease. Agronomy 2023, 13, 1633.

Figure 1. Indian Food Classification System Architecture.

Figure 2. Model Summary of CNN.

Figure 3. Model Summary of MobileNetV3.

Figure 4. Model Summary of DenseNet-121.

Figure 5. Model Summary of Inception.

Figure 6. Model Summary of Xception.

Figure 7. Model Summary of VGG.

Table 1. Comparison of Testing Results on Different Evaluation Metrics.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
CNN	50	47	47	47
DenseNet-121	92	92	91	91
Inception V3	92	92	91	92
MobileNetV3	91	91	92	91
VGG16	86	86	85	86
Xception	92	92	91	92

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

