NutritionVerse3D2D: Large 3D Object and 2D Image Food Dataset for Dietary Intake Estimation
Abstract
1. Introduction
- Collection of quality 3D objects for food items, in conjunction with their associated weight, food name, and nutritional value (NV-3D).
- Methodology for view synthesis using NV-3D to create an associated 2D food image dataset, namely NutritionVerse-Synth (NV-Synth).
- Introduction of a 2D food image validation dataset, NutritionVerse-Real (NV-Real), enriched with both diet information and segmentation masks.
- Exploration of the benefits of incorporating depth information in food estimation tasks, accompanied by comprehensive experimental results.
- Valuable insights into the synergistic utilization of synthetic and real data to enhance the accuracy of diet estimation methods.
- Analysis of leveraging different deep-learning model architectures to improve dietary intake estimation.
2. Related Work
2.1. Food Datasets
2.2. Dietary Intake Estimation Methods
3. Materials and Methods
3.1. Data Collection of 3D Objects
Item-Specific Challenges
3.2. Synthesis Dataset (NV-Synth)
3.3. Validation Dataset (NV-Real)
4. Results and Discussion
- using depth information
- incorporating synthetic data
- leveraging different deep-learning model architectures
4.1. Using Depth Information
4.2. Incorporating Synthetic Data
4.3. Leveraging Different Deep-Learning Model Architectures
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Davis, M.R. Despite Pandemic, Percentage of Older Adults Who Want to Age in Place Stays Steady. 2021. Available online: https://www.overleaf.com/project/6902d3b535892b7c7552d53d (accessed on 11 October 2022).
- Ahmed, T.; Haboubi, N. Assessment and management of nutrition in older people and its importance to health. Clin. Interv. Aging 2010, 5, 207–216.
- Keller, H.H.; Østbye, T.; Richard, G. Nutritional risk predicts quality of life in elderly community-living Canadians. J. Gerontol. Ser. A 2004, 59, M68–M74.
- Kaiser, M.J.; Bauer, J.M.; Rämsch, C.; Uter, W.; Guigoz, Y.; Cederholm, T.; Thomas, D.R.; Anthony, P.S.; Charlton, K.E.; Maggio, M.; et al. Frequency of Malnutrition in Older Adults: A Multinational Perspective Using the Mini Nutritional Assessment. J. Am. Geriatr. Soc. 2010, 58, 1734–1738.
- Subar, A.F.; Kirkpatrick, S.I.; Mittl, B.; Zimmerman, T.P.; Thompson, F.E.; Bingley, C.; Willis, G.; Islam, N.G.; Baranowski, T.; McNutt, S.; et al. The Automated Self-Administered 24-Hour Dietary Recall (ASA24): A Resource for Researchers, Clinicians, and Educators from the National Cancer Institute. J. Acad. Nutr. Diet. 2012, 112, 1134–1137.
- Kipnis, V.; Subar, A.F.; Midthune, D.; Freedman, L.S.; Ballard-Barbash, R.; Troiano, R.P.; Bingham, S.; Schoeller, D.A.; Schatzkin, A.; Carroll, R.J. Structure of dietary measurement error: Results of the OPEN biomarker study. Am. J. Epidemiol. 2003, 158, 14–21.
- Freedman, L.S.; Commins, J.M.; Moler, J.E.; Arab, L.; Baer, D.J.; Kipnis, V.; Midthune, D.; Moshfegh, A.J.; Neuhouser, M.L.; Prentice, R.L.; et al. Pooled results from 5 validation studies of dietary self-report instruments using recovery biomarkers for energy and protein intake. Am. J. Epidemiol. 2014, 180, 172–188.
- Freedman, L.S.; Commins, J.M.; Moler, J.E.; Willett, W.; Tinker, L.F.; Subar, A.F.; Spiegelman, D.; Rhodes, D.; Potischman, N.; Neuhouser, M.L.; et al. Pooled results from 5 validation studies of dietary self-report instruments using recovery biomarkers for potassium and sodium intake. Am. J. Epidemiol. 2015, 181, 473–487.
- Elbert, S.P.; Dijkstra, A.; Oenema, A. A Mobile Phone App Intervention Targeting Fruit and Vegetable Consumption: The Efficacy of Textual and Auditory Tailored Health Information Tested in a Randomized Controlled Trial. J. Med. Internet Res. 2016, 18, e147.
- Zhang, W.; Yu, Q.; Siddiquie, B.; Divakaran, A.; Sawhney, H. “Snap-n-Eat”: Food Recognition and Nutrition Estimation on a Smartphone. J. Diabetes Sci. Technol. 2015, 9, 525–533.
- Williamson, D.A.; Allen, H.R.; Martin, P.D.; Alfonso, A.J.; Gerald, B.; Hunt, A. Comparison of digital photography to weighed and visual estimation of portion sizes. J. Am. Diet. Assoc. 2003, 103, 1139–1145.
- Rusu, A.; Randriambelonoro, M.; Perrin, C.; Valk, C.; Álvarez, B.; Schwarze, A.K. Aspects Influencing Food Intake and Approaches towards Personalising Nutrition in the Elderly. J. Popul. Ageing 2020, 13, 239–256.
- Ciocca, G.; Napoletano, P.; Schettini, R. Food Recognition: A New Dataset, Experiments, and Results. IEEE J. Biomed. Health Inform. 2017, 21, 588–598.
- Ando, Y.; Ege, T.; Cho, J.; Yanai, K. DepthCalorieCam: A Mobile Application for Volume-Based FoodCalorie Estimation Using Depth Cameras. In Proceedings of the 5th International Workshop on Multimedia Assisted Dietary Management, MADiMa ’19, Nice, France, 21 October 2019; pp. 76–81.
- Tai, C.e.A.; Keller, M.; Nair, S.; Chen, Y.; Wu, Y.; Markham, O.; Parmar, K.; Xi, P.; Keller, H.; Kirkpatrick, S.; et al. NutritionVerse: Empirical Study of Various Dietary Intake Estimation Approaches. In Proceedings of the 8th International Workshop on Multimedia Assisted Dietary Management, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 11–19.
- Beijbom, O.; Joshi, N.; Morris, D.; Saponas, S.; Khullar, S. Menu-Match: Restaurant-Specific Food Logging from Images. In Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 5–9 January 2015; pp. 844–851.
- Myers, A.; Johnston, N.; Rathod, V.; Korattikara, A.; Gorban, A.; Silberman, N.; Guadarrama, S.; Papandreou, G.; Huang, J.; Murphy, K. Im2Calories: Towards an Automated Mobile Vision Food Diary. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1233–1241.
- Liang, Y.; Li, J. Computer vision-based food calorie estimation: Dataset, method, and experiment. arXiv 2017, arXiv:1705.07632.
- Thames, Q.; Karpur, A.; Norris, W.; Xia, F.; Panait, L.; Weyand, T.; Sim, J. Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 8903–8911.
- Wu, X.; Fu, X.; Liu, Y.; Lim, E.P.; Hoi, S.C.; Sun, Q. A Large-Scale Benchmark for Food Image Segmentation. In Proceedings of the ACM International Conference on Multimedia, Chengdu, China, 20–24 October 2021; pp. 506–515.
- Kaur, P.; Sikka, K.; Wang, W.; Belongie, S.J.; Divakaran, A. FoodX-251: A Dataset for Fine-grained Food Classification. arXiv 2019, arXiv:1907.06167.
- Matsuda, Y.; Hoashi, H.; Yanai, K. Recognition of multiple-food images by detecting candidate regions. In Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, Melbourne, VIC, Australia, 9–13 July 2012; pp. 25–30.
- Min, W.; Wang, Z.; Liu, Y.; Luo, M.; Kang, L.; Wei, X.; Wei, X.; Jiang, S. Large Scale Visual Food Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 9932–9949.
- Chen, X.; Zhu, Y.; Zhou, H.; Diao, L.; Wang, D. Chinesefoodnet: A large-scale image dataset for chinese food recognition. arXiv 2017, arXiv:1705.02743.
- Bossard, L.; Guillaumin, M.; Van Gool, L. Food-101—Mining Discriminative Components with Random Forests. In Proceedings of the European Conference on Computer Vision—ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer: Cham, Switzerland, 2014; pp. 446–461.
- Karabay, A.; Varol, H.A.; Chan, M.Y. Improved food image recognition by leveraging deep learning and data-driven methods with an application to Central Asian Food Scene. Sci. Rep. 2025, 15, 14043.
- Karabay, A.; Bolatov, A.; Varol, H.A.; Chan, M.Y. A central asian food dataset for personalized dietary interventions. Nutrients 2023, 15, 1728.
- Min, W.; Liu, L.; Wang, Z.; Luo, Z.; Wei, X.; Wei, X.; Jiang, S. Isia food-500: A dataset for large-scale food recognition via stacked global-local attention network. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 393–401.
- Romero-Tapiador, S.; Tolosana, R.; Lacruz-Pleguezuelos, B.; Marcos-Zambrano, L.J.; Bazán, G.X.; Espinosa-Salinas, I.; Fierrez, J.; Ortega-Garcia, J.; de Santa Pau, E.C.; Morales, A. Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition. In Proceedings of the Computer Vision and Pattern Recognition Conference, Nashville, TN, USA, 11–15 June 2025; pp. 430–439.
- Marin, J.; Biswas, A.; Ofli, F.; Hynes, N.; Salvador, A.; Aytar, Y.; Weber, I.; Torralba, A. Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 187–203.
- Salvador, A.; Hynes, N.; Aytar, Y.; Marin, J.; Ofli, F.; Weber, I.; Torralba, A. Learning Cross-Modal Embeddings for Cooking Recipes and Food Images. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 3068–3076.
- Chen, J.; Ngo, C.W. Deep-Based Ingredient Recognition for Cooking Recipe Retrieval. In Proceedings of the 24th ACM International Conference on Multimedia, MM ’16, Amsterdam, The Netherlands, 15–19 October 2016; pp. 32–41.
- Chang, A.X.; Funkhouser, T.; Guibas, L.; Hanrahan, P.; Huang, Q.; Li, Z.; Savarese, S.; Savva, M.; Song, S.; Su, H.; et al. ShapeNet: An Information-Rich 3D Model Repository. arXiv 2015, arXiv:1512.03012.
- Wu, T.; Zhang, J.; Fu, X.; Wang, Y.; Ren, J.; Pan, L.; Wu, W.; Yang, L.; Wang, J.; Qian, C.; et al. Omniobject3d: Large-vocabulary 3D object dataset for realistic perception, reconstruction and generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 803–814.
- Pouladzadeh, P.; Shirmohammadi, S.; Al-Maghrabi, R. Measuring Calorie and Nutrition From Food Image. IEEE Trans. Instrum. Meas. 2014, 63, 1947–1956.
- Bolaños, M.; Radeva, P. Simultaneous food localization and recognition. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 3140–3145.
- Konstantakopoulos, F.S.; Georga, E.I.; Fotiadis, D.I. A review of image-based food recognition and volume estimation artificial intelligence systems. IEEE Rev. Biomed. Eng. 2023, 17, 136–152.
- Mezgec, S.; Seljak, B. NutriNet: A Deep Learning Food and Drink Image Recognition System for Dietary Assessment. Nutrients 2017, 9, 657.
- Kelly, P.; Marshall, S.J.; Badland, H.; Kerr, J.; Oliver, M.; Doherty, A.R.; Foster, C. An Ethical Framework for Automated, Wearable Cameras in Health Behavior Research. Am. J. Prev. Med. 2013, 44, 314–319.
- Mok, T.M.; Cornish, F.; Tarr, J. Too Much Information: Visual Research Ethics in the Age of Wearable Cameras. Integr. Psychol. Behav. Sci. 2015, 49, 309–322.
- Apple. Apple. 2022. Available online: https://www.apple.com/ca/iphone/ (accessed on 25 October 2022).
- Polycam. Polycam—LiDAR & 3D Scanner for iPhone & Android. 2022. Available online: https://poly.cam/ (accessed on 25 October 2022).
- Chambers, J.; Hullette, T.; Gharge, P. The Best 3D Scanner Apps of 2022 (iPhone & Android). 2022. Available online: https://all3dp.com/2/best-3d-scanner-app-iphone-android-photogrammetry/ (accessed on 25 October 2022).
- Government of Canada. Canadian Nutrient File (CNF)—Search by Food. 2022. Available online: https://food-nutrition.canada.ca/cnf-fce/ (accessed on 25 February 2023).
- McHenry, K.; Bajcsy, P. An overview of 3D data content, file formats and viewers. Natl. Cent. Supercomput. Appl. 2008, 1205, 22.
- Tai, C.e.A.; Keller, M.; Kerrigan, M.; Chen, Y.; Nair, S.; Xi, P.; Wong, A. NutritionVerse-3D: A 3D Food Model Dataset for Nutritional Intake Estimation. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023. Women in Computer Vision (WiCV).
- NVIDIA. NVIDIA Isaac Sim. 2023. Available online: https://developer.nvidia.com/isaac-sim (accessed on 21 July 2023).
- Roboflow, Version 1.0; Computer Vision: 2022. Available online: https://research.roboflow.com/citations (accessed on 29 September 2023).
- Government of Canada. Percent Daily Value. 2019. Available online: https://www.canada.ca/en/health-canada/services/understanding-food-labels/percent-daily-value.html (accessed on 29 September 2023).
- Osilla, E.V.; Safadi, A.O.; Sharma, S. Calories; StatPearls Publishing: Treasure Island, FL, USA, 2018.
- Szegedy, C.; Ioffe, S.; Vanhoucke, V. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. arXiv 2016, arXiv:1602.07261.
- Wu, B.; Xu, C.; Dai, X.; Wan, A.; Zhang, P.; Tomizuka, M.; Keutzer, K.; Vajda, P. Visual Transformers: Token-based Image Representation and Processing for Computer Vision. arXiv 2020, arXiv:2006.03677.
- He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked Autoencoders Are Scalable Vision Learners. arXiv 2021, arXiv:2111.06377.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Li, F.F. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.

| Work | Public | # Images | # Items | Real | Mixed | # Angles | Depth Info | Annotation Masks | Calories | Mass | Protein | Fat | Carbohydrate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [14] | ✓ | 18 | 3 | Y | N | 1 | ✓ | ||||||
| [16] | ✓ | 646 | 41 | Y | Y | 1 | ✓ | ||||||
| [17] | ✓ | 50,374 | 201 | Y | Y | 1 | ✓ | ||||||
| [18] | ✓ | 2978 | 160 | Y | N | 2 | ✓ | ✓ | |||||
| [19] | ✓ | 5006 | 555 | Y | Y | 4 | ✓ | | ✓ | ✓ | ✓ | ✓ | ✓ |
| [35] | | 3000 | 8 | Y | Y | 2 | ✓ | ✓ | ✓ | | | | |
| NV-Real | ✓ | 889 | 45 | Y | Y | 4 | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| NV-Synth | ✓ | 84,984 | 45 | N | Y | 12 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

| Num of Food 6D-Poses | Num of Camera Angles |
|---|---|
| 2 | 4 |
| 3 | 2 |
| 4 | 2 |

| Property | Quantifier | Example |
|---|---|---|
| Texture | Low | Cheese Block |
| Texture | High | Granola Bar |
| Volume | Low | Grape |
| Volume | High | Apple |
| Thickness | Low | Potato Chip |
| Thickness | High | Salad Chicken Strip |
| Fragility | Low | Chicken Wing |
| Fragility | High | Tuna Rice Ball |

| Item_Id | fwg | Calories | Fat | Carbohydrates | Protein | Calcium |
|---|---|---|---|---|---|---|
| id-11-red-apple-145g | 145 | 85.55 | 0.29 | 20.39 | 0.39 | 0.01 |
| id-12-carrot-9g | 9 | 3.69 | 0.02 | 0.86 | 0.08 | 0.00 |
| id-13-salad-beef-strip-1g | 1 | 2.15 | 0.08 | 0.00 | 0.32 | 0.00 |
| id-14-salad-beef-strip-7g | 7 | 15.05 | 0.58 | 0.00 | 2.26 | 0.00 |
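
The sample rows above suggest two conventions that a short sketch can make concrete. The item weight appears both in the `fwg` column and as the trailing `…g` token of `Item_Id`, and the two salad-beef-strip rows have calories that scale linearly with weight (2.15 kcal at 1 g, 15.05 kcal at 7 g). The Python sketch below parses an ID and checks both observations on the four rows shown. The function names `parse_item_id` and `calories_per_gram` are illustrative and not part of any dataset tooling, and the ID pattern is an assumption inferred only from these four examples.

```python
import re

# Rows copied from the table above: (Item_Id, fwg, Calories).
ROWS = [
    ("id-11-red-apple-145g", 145, 85.55),
    ("id-12-carrot-9g", 9, 3.69),
    ("id-13-salad-beef-strip-1g", 1, 2.15),
    ("id-14-salad-beef-strip-7g", 7, 15.05),
]

# Assumed pattern "id-<number>-<name>-<weight>g", inferred from the four IDs shown.
ITEM_ID_RE = re.compile(r"^id-(\d+)-(.+)-(\d+)g$")


def parse_item_id(item_id):
    """Split an Item_Id into (number, food name, weight in grams)."""
    match = ITEM_ID_RE.match(item_id)
    if match is None:
        raise ValueError(f"unexpected Item_Id format: {item_id!r}")
    number, name, weight = match.groups()
    return int(number), name, int(weight)


def calories_per_gram(calories, weight_g):
    """Energy density implied by one table row."""
    return calories / weight_g


for item_id, fwg, kcal in ROWS:
    number, name, weight = parse_item_id(item_id)
    # The weight encoded in the ID should agree with the fwg column.
    assert weight == fwg
    print(f"{name}: {weight} g, {calories_per_gram(kcal, fwg):.2f} kcal/g")
```

Both salad-beef-strip rows come out at the same energy density, which is consistent with per-item nutrition being derived by scaling a per-gram value by the measured weight.
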

| Initial Weights | Depth | Calories (kcal) | Mass (g) | Protein (g) | Fat (g) | Carb (g) |
|---|---|---|---|---|---|---|
| ImageNet | No | 161.9 | 84.6 | 17.0 | 8.7 | 19.9 |
| Nutrition5k | No | 134.2 *** | 72.7 *** | 17.9 *** | 9.1 *** | 22.0 *** |
| ImageNet | Yes | 249.5 *** | 103.3 *** | 25.4 *** | 14.6 *** | 21.2 *** |
| Nutrition5k | Yes | 214.4 *** | 82.3 | 24.5 *** | 13.5 *** | 19.8 |

| Initial Weights | Scenario | Calories (kcal) | Mass (g) | Protein (g) | Fat (g) | Carb (g) |
|---|---|---|---|---|---|---|
| ImageNet | A | 485.5 | 175.2 | 39.6 | 26.2 | 55.7 |
| Nutrition5k | A | 1083.0 *** | 443.9 *** | 96.3 *** | 55.5 *** | 64.8 *** |
| ImageNet | B | 471.5 | 139.7 *** | 33.2 *** | 23.8 ** | 51.4 *** |
| Nutrition5k | B | 497.7 | 170.5 | 36.0 * | 25.9 | 52.9 ** |
| ImageNet | C | 290.1 *** | 117.9 *** | 25.2 *** | 15.6 *** | 26.4 *** |
| Nutrition5k | C | 489.7 | 188.2 | 32.7 ** | 24.6 | 59.6 |

| Base | Comp. | Calories (kcal) | Mass (g) | Protein (g) | Fat (g) | Carb (g) | Combined |
|---|---|---|---|---|---|---|---|
| Inception-ResNet | No | 290.1 | 117.85 | 25.2 | 15.6 | 26.4 | 475.1 |
| Inception-ResNet | Yes | 305.5 * | 162.1 *** | 35.2 ** | 17.4 * | 51.3 *** | 571.5 |
| ViT | No | 253.7 | 98.4 | 22.1 | 14.3 | 24.2 | 412.6 |
| ViT | Yes | 311.9 * | 149.5 ** | 27.3 | 16.2 | 46.8 *** | 551.7 |
| M-AutoE | No | 463.3 *** | 144.3 * | 32.0 * | 24.4 *** | 53.5 *** | 717.5 |
| M-AutoE | Yes | 476.2 *** | 152.0 ** | 30.6 * | 22.9 *** | 56.6 *** | 737.3 |
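
The Combined column is not defined in the table itself, but its values are consistent with the sum of the five per-task columns. The Python sketch below checks that reading against the six rows above. It is an inference from the numbers, not a definition taken from the authors, and the 0.15 tolerance is an assumption that allows for rounding in the reported values.

```python
# Rows copied from the table above:
# (Base, Comp., [Calories, Mass, Protein, Fat, Carb], reported Combined).
ROWS = [
    ("Inception-ResNet", "No", [290.1, 117.85, 25.2, 15.6, 26.4], 475.1),
    ("Inception-ResNet", "Yes", [305.5, 162.1, 35.2, 17.4, 51.3], 571.5),
    ("ViT", "No", [253.7, 98.4, 22.1, 14.3, 24.2], 412.6),
    ("ViT", "Yes", [311.9, 149.5, 27.3, 16.2, 46.8], 551.7),
    ("M-AutoE", "No", [463.3, 144.3, 32.0, 24.4, 53.5], 717.5),
    ("M-AutoE", "Yes", [476.2, 152.0, 30.6, 22.9, 56.6], 737.3),
]

# Assumed slack for rounding in the published figures.
TOLERANCE = 0.15


def combined_gap(per_task, reported):
    """Difference between the summed per-task values and the reported Combined."""
    return sum(per_task) - reported


# Collect rows whose reported Combined differs from the sum by more than the slack.
mismatches = []
for base, comp, per_task, reported in ROWS:
    gap = combined_gap(per_task, reported)
    if abs(gap) > TOLERANCE:
        mismatches.append((base, comp, round(gap, 2)))

print(mismatches)
```

Under this reading, five of the six rows agree within rounding. The M-AutoE row with Comp. set to Yes sums to 738.3 against a reported 737.3, so either one entry in that row or the summation rule differs from what is assumed here.
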

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tai, C.-e.A.; Keller, M.; Nair, S.; Chen, Y.; Wu, Y.; Markham, O.; Parmar, K.; Xi, P.; Wong, A. NutritionVerse3D2D: Large 3D Object and 2D Image Food Dataset for Dietary Intake Estimation. Data 2025, 10, 180. https://doi.org/10.3390/data10110180