DenseMobile Net: Deep Ensemble Model for Precision and Innovation in Indian Food Recognition
Abstract
1. Introduction
- The proposed method for Indian cuisine recognition uses DenseNet-121 and MobileNetV3 weights pre-trained on the ImageNet dataset and works well even when the dataset is imbalanced.
- Deep ensemble learning leads to improved accuracy in recognizing a wide variety of Indian foods and could also facilitate new discoveries in how we engage with and comprehend food.
- The proposed approach demonstrates enhanced accuracy compared to the current state-of-the-art methods for food classification.
2. Related Work
Researcher | Algorithm | Dataset | Accuracy |
---|---|---|---|
Md Tohidul Islam [9] | CNN (Self-Designed) | Food-11 | 74.70% |
Yuzhen Lu [10] | BoF with SVM | Custom-10 Classes | 90% |
VijayaKumari G [11] | EfficientNetb0 | Food-101 | 80% |
Chang Liu [12] | CNN | Food-101 | 77.4% |
Michele De Bonis [13] | GoogleNet | UPMC 101 | 70% |
Michele De Bonis [13] | SqueezeNet | ETHZ 101 | 60% |
Rajayogi [14] | InceptionV3 | Indian Food | 87.9% |
N. Hnoohom [15] | GoogleNet | Thai Fast Food | 88.33% |
3. Methodology
3.1. Indian Food Dataset
3.2. Data Pre-Processing
3.3. Feature Extractions and Classification
3.3.1. Transfer Learning with MobileNetV3
Architecture of MobileNetV3
1. Input Layer: Receives an image, represented as a tensor, as the data to be processed.
2. Convolution Block: MobileNetV3 employs inverted residual blocks with linear bottlenecks in its convolutional architecture. The blocks are composed of a sequence of depthwise and pointwise (1 × 1) convolutions, which decreases computational complexity while preserving the capacity to capture significant features.
3. Activation Function: MobileNetV3 uses the hard-swish activation function. It introduces non-linearity into the network, which helps the model capture complex patterns and improves overall performance.
4. Squeeze-and-Excitation (SE) Blocks: These blocks adaptively recalibrate the feature maps by learning the relative importance of different channels, which increases the representational power of the model.
5. Fully Connected Layer: Before the fully connected layer is applied, the preceding layer flattens the feature maps, collapsing their spatial dimensions to produce a feature vector of fixed length regardless of the size of the input image.
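Three of the components listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the toy feature maps, and the weight matrices `w1` and `w2` are hypothetical, and the use of hard-sigmoid as the SE gate is an assumption based on the published MobileNetV3 design.

```python
def relu6(x):
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hard_swish(x):
    """Hard-swish activation: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


def hard_sigmoid(x):
    """Piecewise-linear gate in [0, 1], assumed here for the SE block."""
    return relu6(x + 3.0) / 6.0


def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise one."""
    return kernel * kernel * c_in + c_in * c_out


def squeeze_excite(feature_maps, w1, w2):
    """Rescale each channel by a learned gate.

    feature_maps: list of channels, each a 2D list of floats.
    w1: (reduced x channels) weights, w2: (channels x reduced) weights.
    Both weight matrices are hypothetical placeholders for learned values.
    """
    # Squeeze: one global-average value per channel.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excite: small bottleneck with ReLU, then one gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [hard_sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    return [
        [[value * gate for value in row] for row in ch]
        for ch, gate in zip(feature_maps, gates)
    ]


if __name__ == "__main__":
    print(standard_conv_params(3, 32, 64))        # 18432
    print(depthwise_separable_params(3, 32, 64))  # 2336
```

For a 3 × 3 convolution mapping 32 channels to 64, the depthwise separable form needs 2336 weights instead of 18432, which is the kind of saving the convolution block relies on.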
3.3.2. Transfer Learning with DenseNet-121
Architecture of DenseNet
1. Initial Layers: Before images are fed into the dense blocks, convolution and pooling layers extract a significant amount of information from them. Downsampling is applied to the feature maps to reduce their size and the computation required by later layers.
2. Dense Block Layers: To make information flow between layers easier, the DenseNet architecture uses dense connections. Each dense block is made up of batch normalization, an activation function, and a convolution layer. The output of each layer is passed to all the layers that follow it, so the nth layer receives the feature maps of all earlier layers. In the standard DenseNet formulation, its input can be written as $x_n = H_n([x_0, x_1, \ldots, x_{n-1}])$, where $[x_0, x_1, \ldots, x_{n-1}]$ denotes the concatenation of the feature maps produced by layers $0$ to $n-1$ and $H_n$ is the composite function of the nth layer.
3. Transition Layers: These are the layers located between the dense blocks, and they perform the downsampling in the network. The network is divided into four dense blocks, and a transition layer after a block reduces the dimensions of the feature maps before the next block. Each transition layer consists of batch normalization, a rectified linear unit (ReLU) activation function, a 1 × 1 convolution layer, and a 2 × 2 max pooling layer. Max pooling is used instead of average pooling for downsampling in order to detect salient characteristics, such as edges, while efficiently reducing the dimensionality of the features; average pooling, by contrast, extracts features more smoothly. Compression is used to reduce the number of feature maps, lower the risk of overfitting, and improve the generalizability of the model.
4. Fully Connected Layers: Two layers make up the last section of the model. The first applies global average pooling to flatten the feature maps, producing a linear array with 1024 nodes. This array is the input to the second, fully connected layer, which serves as the classifier. Each of its 26 neurons represents a different food class.
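The channel bookkeeping behind the 1024-node array, and the difference between the two pooling choices discussed above, can be traced with a short sketch. The growth rate of 32, the block sizes (6, 12, 24, 16), the 64 initial channels, and the compression factor of 0.5 are the standard DenseNet-121 configuration and are assumptions here, since the text does not state them; the function names are hypothetical.

```python
def densenet_channel_trace(block_layers=(6, 12, 24, 16), growth_rate=32,
                           init_channels=64, compression=0.5):
    """Return (stage name, channel count) pairs for a DenseNet-style network.

    Each layer in a dense block appends `growth_rate` feature maps to the
    concatenated input; each transition layer keeps a `compression`
    fraction of the channels. Defaults are assumed, not taken from the text.
    """
    channels = init_channels
    trace = []
    for index, num_layers in enumerate(block_layers, start=1):
        channels += num_layers * growth_rate
        trace.append(("dense_block_%d" % index, channels))
        if index < len(block_layers):  # no transition after the last block
            channels = int(channels * compression)
            trace.append(("transition_%d" % index, channels))
    return trace


def pool_2x2(feature_map, reduce_fn):
    """Non-overlapping 2x2 pooling (stride 2) over a 2D list with even sides."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            window = [feature_map[r][c], feature_map[r][c + 1],
                      feature_map[r + 1][c], feature_map[r + 1][c + 1]]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for stage, channels in densenet_channel_trace():
        print(stage, channels)

    toy_map = [[0, 0, 1, 1],
               [0, 8, 1, 1],
               [2, 2, 0, 0],
               [2, 2, 0, 4]]
    print(pool_2x2(toy_map, max))   # [[8, 1], [2, 4]]
    print(pool_2x2(toy_map, mean))  # [[2.0, 1.0], [2.0, 1.0]]
```

Under these assumptions the last dense block ends with 1024 channels, which agrees with the size of the pooled feature vector described above. On the toy map, max pooling keeps the isolated peaks (8 and 4) while average pooling smooths them away.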
3.3.3. Ensemble Learning with MobileNetV3 and DenseNet-121
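The text here does not say how the outputs of the two networks are combined. One common option, shown below purely as an assumption, is soft voting: the class-probability vectors of the two models are averaged and the class with the highest mean probability is predicted. The function names, the equal default weighting, and the three-class example are hypothetical.

```python
def soft_vote(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two class-probability vectors.

    probs_a and probs_b stand for the softmax outputs of two models
    (for example MobileNetV3 and DenseNet-121) for the same image.
    The weighting scheme is an assumption, not the authors' stated method.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must predict the same set of classes")
    weight_b = 1.0 - weight_a
    return [weight_a * pa + weight_b * pb for pa, pb in zip(probs_a, probs_b)]


def predict_class(probs):
    """Index of the class with the highest probability."""
    return max(range(len(probs)), key=lambda i: probs[i])


if __name__ == "__main__":
    model_a = [0.6, 0.3, 0.1]  # hypothetical output of the first model
    model_b = [0.2, 0.7, 0.1]  # hypothetical output of the second model
    combined = soft_vote(model_a, model_b)
    print(predict_class(model_a), predict_class(model_b), predict_class(combined))
```

In this toy case the two models disagree, and the averaged vector favours class 1 because the second model is more confident in its choice than the first.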
3.4. Performance Evaluation
4. Implementation and Results
4.1. Results with MobileNetV3
4.2. Results with DenseNet-121
4.3. Results with Ensemble Model
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Sadler, C.R.; Grassby, T.; Hart, K.; Raats, M.; Sokolović, M.; Timotijevic, L. Processed food classification: Conceptualisation and challenges. Trends Food Sci. Technol. 2021, 112, 149–162.
2. Lee, K.-S. Multispectral Food Classification and Caloric Estimation Using Convolutional Neural Networks. Foods 2023, 12, 3212.
3. Shah, B.; Kanani, P.; Joshi, P.; Pandya, G.; Kulkarni, D.; Patil, N.; Kurup, L. Traditional Indian Food Classification Using Shallow Convolutional Neural Network. Int. J. Intell. Syst. Appl. Eng. 2024, 12, 769.
4. Hiremath, G.; Mathew, J.A.; Boraiah, N.K. Hybrid Statistical and Texture Features with DenseNet 121 for Breast Cancer Classification. Int. J. Intell. Eng. Syst. 2023, 16, 24–34.
5. Patel, J.; Modi, K. Indian Food Image Classification and Recognition with Transfer Learning Technique Using MobileNetV3 and Data Augmentation. Eng. Proc. 2023, 56, 197.
6. Lo, F.P.W.; Sun, Y.; Qiu, J.; Lo, B. Image-based food classification and volume estimation for dietary assessment: A review. IEEE J. Biomed. Health Inform. 2020, 24, 1926–1939.
7. Hameed, K.; Chai, D.; Rassau, A. Texture-based latent space disentanglement for enhancement of a training dataset for ANN-based classification of fruit and vegetables. Inf. Process. Agric. 2023, 10, 85–105.
8. Ren, X.; Wang, Y.; Huang, Y.; Mustafa, M.; Sun, D.; Xue, F.; Wu, F. A CNN-Based E-Nose Using Time Series Features for Food Freshness Classification. IEEE Sens. J. 2023, 23, 6027–6038.
9. Islam, M.T.; Siddique, B.N.K.; Rahman, S.; Jabid, T. Food image classification with convolutional neural network. In Proceedings of the 2018 International Conference on Intelligent Informatics and Biomedical Sciences, Bangkok, Thailand, 21–24 October 2018; pp. 257–262.
10. Lu, Y. Food Image Recognition by Using Convolutional Neural Networks (CNNs). arXiv 2019, arXiv:1612.00983.
11. VijayaKumari, G.; Vutkur, P.; Vishwanath, P. Food classification using transfer learning technique. Glob. Transit. Proc. 2022, 3, 225–229.
12. Liu, C.; Cao, Y.; Luo, Y.; Chen, G.; Vokkarane, V.; Ma, Y. Deepfood: Deep learning-based food image recognition for computer-aided dietary assessment. In Inclusive Smart Cities and Digital Health: 14th International Conference on Smart Homes and Health Telematics, Wuhan, China, 25–27 May 2016; Springer: Cham, Switzerland, 2016; Volume 14, pp. 37–48.
13. De Bonis, M.; Amato, G.; Falchi, F.; Gennaro, C.; Manghi, P. Deep learning techniques for visual food recognition on a mobile app. In Multimedia and Network Information Systems: Proceedings of the 11th International Conference MISSI, Wrocław, Poland, 12–14 September 2018; Springer: Cham, Switzerland, 2019; Volume 11, pp. 303–312.
14. Rajayogi, J.R.; Manjunath, G.; Shobha, G. Indian food image classification with transfer learning. In Proceedings of the 2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS), Bengaluru, India, 20–21 December 2019; pp. 1–4.
15. Hnoohom, N.; Yuenyong, S. Thai fast food image classification using deep learning. In Proceedings of the 2018 International ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI-NCON), Chiang Rai, Thailand, 25–28 February 2018; pp. 116–119.
16. Mehta, R.; Singh, K.K. An efficient ear recognition technique based on deep ensemble learning approach. Evol. Syst. 2023, 15, 771–787.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Patel, J.A.; Lakhani, G.V.; Vaghela, R.K.; Labana, D.L. DenseMobile Net: Deep Ensemble Model for Precision and Innovation in Indian Food Recognition. Eng. Proc. 2025, 87, 3. https://doi.org/10.3390/engproc2025087003