An Image Recognition Method for the Foods of Northern Shaanxi Based on an Improved ResNet Network
Abstract
1. Introduction
- (1) Construct the FoodResNet18 network architecture, introducing an asymmetric convolution (AC) block to enhance multi-scale feature extraction.
- (2) Design a deep-shallow collaborative attention module to dynamically fuse local details with global features.
- (3) Adopt an adaptive step-size decay strategy to optimize the training process.
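The adaptive step-size decay mentioned in contribution (3) is not specified in detail here; a generic step-decay schedule is one plausible instance. The decay factor and interval below are illustrative assumptions, not the paper's actual settings:

```python
def step_decay_lr(base_lr, epoch, decay_factor=0.5, decay_every=10):
    """Return the learning rate for a given epoch under step decay:
    the rate is multiplied by decay_factor every decay_every epochs."""
    return base_lr * (decay_factor ** (epoch // decay_every))

# Example: a base rate of 0.01 halves at epochs 10, 20, 30, ...
print(step_decay_lr(0.01, 0))   # 0.01
print(step_decay_lr(0.01, 10))  # 0.005
print(step_decay_lr(0.01, 25))  # 0.0025
```

Step decay of this kind lets training take large steps early and progressively smaller steps as the loss surface flattens near a minimum.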
2. Food Image Recognition Based on CNN
2.1. Basic Principles of CNN
2.1.1. Hierarchical Structure and Function
2.1.2. Loss Function and Training
2.1.3. Core Advantages
- (1) Local perception: each convolutional kernel attends only to a local region, reducing computational complexity.
- (2) Weight sharing: the same convolution kernel traverses the entire input, reducing the number of parameters.
- (3) Hierarchical feature extraction: shallow layers capture low-level features such as edges and textures, while deep layers extract semantic information.
- (4) Translation invariance: pooling operations make the model insensitive to changes in target position.
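The parameter savings from local perception and weight sharing can be made concrete with a quick count. The layer sizes here are illustrative, not taken from the paper:

```python
# Fully connected layer mapping a 32x32x3 image to a 32x32 feature map:
# every output unit owns a separate weight for every input value.
fc_params = (32 * 32 * 3) * (32 * 32)   # 3,145,728 weights

# Convolutional layer producing a same-size map with one shared 3x3 kernel
# over the 3 input channels: the kernel is reused at every spatial position.
conv_params = 3 * 3 * 3 + 1             # 28 weights (including bias)

print(fc_params // conv_params)         # the conv layer is ~112,000x smaller
```

The gap grows with input resolution, since the convolutional parameter count is independent of the spatial size of the feature map.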
2.2. Optimization Design of CNN
2.2.1. AC Block Fused with Asymmetric Convolution
2.2.2. Global Feature Calibration Based on Attention Mechanism
2.2.3. Residual Structure Optimization and Lightweight Design
3. FoodResNet18 Model Structure
- An attention module shared by deep and shallow layers, enhancing global feature extraction.
- An enhancement block structure, strengthening local detail extraction.
- Asymmetric convolutions and skip connections, mitigating network degradation.
3.1. Enhancement Block
- (1) Multi-branch convolution group
- (2) Feature fusion
- (3) Skip connection
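The additivity that underlies ACNet-style asymmetric convolution (Ding et al.) can be checked numerically: the sum of the 3x3, 1x3, and 3x1 branch outputs equals a single convolution with the element-wise sum of the compatibly padded kernels. The toy `conv2d` helper and shapes below are illustrative, not the paper's implementation:

```python
import numpy as np

def conv2d(x, k):
    """Valid-mode 2D cross-correlation (no padding, stride 1)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k33 = rng.standard_normal((3, 3))
k13 = rng.standard_normal((1, 3))
k31 = rng.standard_normal((3, 1))

# Embed the 1x3 and 3x1 kernels into 3x3 frames (zeros elsewhere),
# aligned to the centre row/column as in ACNet.
k13_p = np.zeros((3, 3)); k13_p[1, :] = k13[0]
k31_p = np.zeros((3, 3)); k31_p[:, 1] = k31[:, 0]

branch_sum = conv2d(x, k33) + conv2d(x, k13_p) + conv2d(x, k31_p)
fused = conv2d(x, k33 + k13_p + k31_p)
print(np.allclose(branch_sum, fused))  # True
```

This linearity is what lets the three training-time branches be folded into a single 3x3 kernel at inference time, so the extra branches cost nothing after deployment.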
3.2. Deep-Shallow Shared Attention Residual Module
- (1) Hierarchical attention adaptation
- (2) Parameter sharing mechanism
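The exact form of the shared attention module is not given in this outline; the squeeze-and-excitation (SE) channel attention cited in the references is a representative building block. The sketch below uses random illustrative weights (`w1`, `w2`) and a toy feature map, not the paper's trained parameters:

```python
import numpy as np

def se_attention(x, w1, w2):
    """Squeeze-and-Excitation channel attention on a (C, H, W) feature map:
    squeeze via global average pooling, excite with two FC layers, then
    rescale each channel by its gate in (0, 1)."""
    s = x.mean(axis=(1, 2))                  # squeeze: (C,)
    h = np.maximum(w1 @ s, 0.0)              # excitation FC1 + ReLU: (C//r,)
    g = 1.0 / (1.0 + np.exp(-(w2 @ h)))      # excitation FC2 + sigmoid: (C,)
    return x * g[:, None, None]              # recalibrate channels

rng = np.random.default_rng(0)
C, r = 8, 2                                  # channels, reduction ratio
x = rng.standard_normal((C, 6, 6))
w1 = rng.standard_normal((C // r, C)) * 0.1  # illustrative random weights
w2 = rng.standard_normal((C, C // r)) * 0.1
y = se_attention(x, w1, w2)
print(y.shape)  # (8, 6, 6)
```

Sharing one such module's parameters across deep and shallow stages, as the paper's design suggests, would keep the attention overhead small while calibrating features at multiple depths.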
4. Preprocessing of Food Image Data in Northern Shaanxi
4.1. The Food Image Dataset of Northern Shaanxi
4.2. Food Image Preprocessing
5. Analysis of Food Image Data
5.1. Model Training
5.1.1. Optimizer and Parameter Settings
5.1.2. Learning Rate Adjustment Strategy
5.1.3. Training Parameters and Data Configuration
5.1.4. Training Process Analysis
5.1.5. Experimental Conclusion
5.2. Comparative Analysis
- (1) Lightweight and high-performance: FoodResNet18 achieved 85.26% Top-1 accuracy with a model size of 71.2 MB, exceeding the Top-1 accuracy of SOTA models such as Arch-D (82.07%) and significantly outperforming the traditional ResNet series.
- (2) Edge device adaptation: compared with ResNet-101 (136 MB), the model size is reduced by 48%, making it better suited to resource-constrained scenarios such as food-service robots and mobile apps.
- (3) Domain-specific optimization: the modular design enhances fine-grained feature extraction, addressing the distinctive challenges of Chinese food image recognition.
5.3. Ablation Experiment
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhao, R. History of Chinese Food Culture; China Light Industry Press: Beijing, China, 2014. [Google Scholar]
- Min, W.; Jiang, S.; Liu, L.; Rui, Y.; Jain, R. A survey on food computing. ACM Comput. Surv. 2020, 52, 1–36. [Google Scholar] [CrossRef]
- Bossard, L.; Guillaumin, M.; Van Gool, L. Food-101—Mining discriminative components with random forests. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2014; pp. 446–461. [Google Scholar] [CrossRef]
- Yanai, K.; Kawano, Y. Food image recognition using deep convolutional network with pretraining and fine-tuning. In Proceedings of the 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Turin, Italy, 29 June–3 July 2015; pp. 1–6. [Google Scholar] [CrossRef]
- Islam, M.; Siddique, M.; Rahman, S.; Jabid, T. Food image classification with convolutional neural network. In Proceedings of the 2018 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), Bangkok, Thailand, 21–24 October 2018; pp. 257–262. [Google Scholar] [CrossRef]
- Yunus, R.; Arif, O.; Afzal, H.; Amjad, M.F.; Abbas, H.; Bokhari, H.N.; Haider, S.T.; Zafar, N.; Nawaz, R. A framework to estimate the nutritional value of food in real time using deep learning techniques. IEEE Access 2019, 7, 2643–2652. [Google Scholar] [CrossRef]
- Metwalli, A.; Shen, W.; Wu, C. Food image recognition based on densely connected convolutional neural networks. In Proceedings of the 2020 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, 19–21 February 2020; pp. 27–32. [Google Scholar] [CrossRef]
- Liao, E.H.; Li, H.; Wang, H.; Pang, X.W. Food image recognition based on convolutional neural networks. J. South China Norm. Univ. 2019, 51, 113–119. [Google Scholar]
- Sheng, G.; Min, W.; Zhu, X.; Xu, L.; Sun, Q.; Yang, Y.; Wang, L.; Jiang, S. A Lightweight Hybrid Model with Location-Preserving ViT for Efficient Food Recognition. Nutrients 2024, 16, 200. [Google Scholar] [CrossRef] [PubMed]
- Li, X.; Yin, A.; Choi, H.Y.; Chan, V.; Allman-Farinelli, M.; Chen, J. Evaluating the Quality and Comparative Validity of Manual Food Logging and Artificial Intelligence-Enabled Food Image Recognition in Apps for Nutrition Care. Nutrients 2024, 16, 2573. [Google Scholar] [CrossRef] [PubMed]
- Bianco, R.; Marinoni, M.; Coluccia, S.; Carioni, G.; Fiori, F.; Gnagnarella, P.; Edefonti, V.; Parpinel, M. Tailoring the Nutritional Composition of Italian Foods to the US Nutrition5k Dataset for Food Image Recognition: Challenges and a Comparative Analysis. Nutrients 2024, 16, 3339. [Google Scholar] [CrossRef] [PubMed]
- Nfor, K.A.; Theodore Armand, T.P.; Ismaylovna, K.P.; Joo, M.-I.; Kim, H.-C. An Explainable CNN and Vision Transformer-Based Approach for Real-Time Food Recognition. Nutrients 2025, 17, 362. [Google Scholar] [CrossRef] [PubMed]
- Li, Y.; Liu, M. Food image recognition method based on iterative clustering and confidence screening mechanism. J. Electron. Imaging 2025, 34, 043025. [Google Scholar] [CrossRef]
- Jagadesh, B.N.; Mantena, S.V.; Sathe, A.P.; Prabhakara Rao, T.; Lella, K.K.; Pabboju, S.S.; Vatambeti, R. Enhancing food recognition accuracy using hybrid transformer models and image preprocessing techniques. Sci. Rep. 2025, 15, 5591. [Google Scholar] [CrossRef] [PubMed]
- Xiong, Y. Food Image Recognition based on ResNet. Appl. Comput. Eng. 2023, 8, 605–611. [Google Scholar] [CrossRef]
- Liu, Y.Z. Automatic food recognition based on efficientnet and ResNet. J. Phys. Conf. Ser. 2023, 2646, 012037. [Google Scholar] [CrossRef]
- Xiao, Z.; Diao, G.; Deng, Z. Fine grained food image recognition based on swin transformer. J. Food Eng. 2024, 380, 112134. [Google Scholar] [CrossRef]
- Kim, Y.D. Consumer Usability Test of Mobile Food Safety Inquiry Platform Based on Image Recognition. Sustainability 2024, 16, 9538. [Google Scholar] [CrossRef]
- Bu, L.; Hu, C.; Zhang, X. Recognition of food images based on transfer learning and ensemble learning. PLoS ONE 2024, 19, e0296789. [Google Scholar] [CrossRef] [PubMed]
- Ding, X.; Guo, Y.; Ding, G.; Han, J. ACNet: Strengthening the kernel skeletons for powerful CNN via asymmetric convolution blocks. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1911–1920. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Chen, J.; Ngo, C. Deep-based ingredient recognition for cooking recipe retrieval. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 32–41. [Google Scholar] [CrossRef]
| Method | Top-1/% | Top-5/% | Size/MB |
| --- | --- | --- | --- |
| Arch-D | 82.07 | 95.89 | \ |
| Dense-Food [6] | 81.23 | 95.47 | \ |
| ResNet-101 | 75.21 | 91.22 | 136 |
| ResNet-18 | 75.38 | 91.88 | 43 |
| FoodResNet18 | 85.26 | 96.11 | 71.2 |
| Method | Top-1/% | Top-5/% | Parameter Quantities/M |
| --- | --- | --- | --- |
| ResNet-18 | 75.38 | 91.88 | 11.26 |
| ResNet-18+E | 84.08 | 95.28 | 18.60 |
| ResNet-18+A | 83.22 | 95.15 | 14.79 |
| FoodResNet18 | 85.26 | 96.11 | 19.34 |
Share and Cite
Ma, Y.; Liu, J.; Cui, A. An Image Recognition Method for the Foods of Northern Shaanxi Based on an Improved ResNet Network. Mathematics 2025, 13, 2572. https://doi.org/10.3390/math13162572