Computer Vision for Food Data Analysis: Methods, Challenges, and Applications

A special issue of Journal of Imaging (ISSN 2313-433X). This special issue belongs to the section "Computer Vision and Pattern Recognition".

Deadline for manuscript submissions: 31 December 2025 | Viewed by 2738

Special Issue Editors


Dr. Jiangpeng He
Guest Editor
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, USA
Interests: food image processing; computer vision; deep learning

Dr. Luotao Lin
Guest Editor
Department of Individual, Family, and Community Education, University of New Mexico, Albuquerque, NM 87131, USA
Interests: nutrition epidemiology; food classification; food recognition

Dr. Bhalaji Nagarajan
Guest Editor
Department of Mathematics and Computer Science, Universitat de Barcelona, Gran Via de les Corts Catalanes, 08007 Barcelona, Spain
Interests: deep learning; machine learning; computer vision

Special Issue Information

Dear Colleagues,

Computer vision has achieved remarkable progress across a wide range of downstream tasks, demonstrating promising performance and application potential in real-world scenarios. However, food image analysis is a domain where current computer vision algorithms still face significant limitations, revealing a substantial performance gap between general computer vision tasks and food-specific applications. This gap stems from several unique characteristics of food data: (1) intra-class variation (the same dish can appear drastically different), (2) inter-class similarity (different dishes may look very similar), (3) complex compositional structure (multiple ingredients prepared in various ways), and (4) significant cultural diversity. These challenges are further compounded by the nature of real-world food data available on the internet, which is often ambiguous, unannotated, noisy, watermarked, and of varying quality.

The importance of addressing these challenges has grown rapidly with the widespread adoption of smartphones and social media platforms, which continuously generate massive amounts of food-related visual data. While food is a fundamental element of human society, the technical difficulties it presents also make it an important domain for advancing computer vision itself. Robust computer vision systems for food analysis could enable breakthrough applications across multiple domains, from automated dietary monitoring and nutritional assessment to cross-cultural food recognition. Furthermore, with an estimated 200,000 basic dishes worldwide, food recognition raises fundamental research questions about classification scalability to massive numbers of fine-grained categories, robustness to noisy and limited labels, and the development of more sophisticated visual understanding models.
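To make the noisy-label challenge above concrete, the sketch below shows one widely used robust objective, the symmetric cross-entropy loss of Wang et al. (ICCV 2019), as it might be applied to a food classifier trained on web-scraped images. The class count, weighting coefficients, and the log(0) substitute are illustrative assumptions, not values tied to any particular dataset discussed in this Special Issue.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SymmetricCrossEntropy(nn.Module):
    """Symmetric cross-entropy: alpha * CE + beta * RCE.

    The reverse term RCE = -sum_k p(k|x) * log q(k|x) reduces the influence
    of samples whose (possibly wrong) label disagrees with a confident
    prediction, which helps when food labels are scraped from the web.
    """

    def __init__(self, num_classes: int = 101, alpha: float = 0.1,
                 beta: float = 1.0, log_zero: float = -4.0):
        super().__init__()
        self.num_classes = num_classes   # e.g. a Food-101-sized label space (assumption)
        self.alpha, self.beta = alpha, beta
        self.log_zero = log_zero         # finite stand-in for log(0) in the one-hot target

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        ce = F.cross_entropy(logits, targets)                         # forward term
        probs = F.softmax(logits, dim=1)
        one_hot = F.one_hot(targets, self.num_classes).float()
        log_q = torch.where(one_hot > 0,                              # log of the one-hot target:
                            torch.zeros_like(one_hot),                # log(1) = 0 on the true class,
                            torch.full_like(one_hot, self.log_zero))  # clipped log(0) elsewhere
        rce = -(probs * log_q).sum(dim=1).mean()                      # reverse term
        return self.alpha * ce + self.beta * rce

# Usage with dummy data:
# loss_fn = SymmetricCrossEntropy(num_classes=101)
# loss = loss_fn(torch.randn(8, 101), torch.randint(0, 101, (8,)))
```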

Recent advances in computer vision, particularly in generative AI, large language models, and multi-modal learning, present promising new directions for addressing these challenges. This Special Issue aims to explore innovative approaches that advance our ability to analyze and understand food through visual data. We seek contributions that advance both fundamental computer vision techniques and practical applications in food-related tasks. Submissions may include, but are not limited to, the following topics:

  • Food image/video generation and generative AI;
  • Food video analysis and action recognition;
  • Food 3D model reconstruction;
  • Food portion/nutrition value estimation;
  • Food manipulation understanding;
  • Food image quality analysis/inspection;
  • Eating and cooking action recognition;
  • Multi-modal food data analysis;
  • Food ontologies and LLM-based models for food data analysis;
  • Visual question answering for food;
  • Food data analysis and uncertainty modeling;
  • Learning with noisy food labels;
  • Continual, self-supervised, semi-supervised, and unsupervised learning for food;
  • Food classification/detection/segmentation in 2D/3D.

Dr. Jiangpeng He
Dr. Luotao Lin
Dr. Bhalaji Nagarajan
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form to submit your manuscript. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as accepted) and are listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • food image analysis
  • generative AI
  • deep learning
  • multi-modal analysis
  • action recognition
  • 3D reconstruction
  • visual understanding
  • food recognition
  • machine learning

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (2 papers)


Research

23 pages, 3326 KB  
Article
Hybrid Multi-Scale Neural Network with Attention-Based Fusion for Fruit Crop Disease Identification
by Shakhmaran Seilov, Akniyet Nurzhaubayev, Marat Baideldinov, Bibinur Zhursinbek, Medet Ashimgaliyev and Ainur Zhumadillayeva
J. Imaging 2025, 11(12), 440; https://doi.org/10.3390/jimaging11120440 - 10 Dec 2025
Abstract
Undetected fruit crop diseases are a major threat to agricultural productivity worldwide and frequently cause farmers large financial losses. Disease detection based on manual field inspection is time-consuming, unreliable, and unsuitable for large-scale monitoring. Deep learning approaches, in particular convolutional neural networks, have shown promise for automated plant disease identification, yet they still face significant obstacles: poor generalization across complex visual backgrounds, limited robustness to disease symptoms at varying scales, and high computational requirements that hinder deployment on resource-constrained edge devices. To overcome these drawbacks, we propose a Hybrid Multi-Scale Neural Network architecture (HMCT-AF with GSAF) for accurate and efficient fruit crop disease identification. HMCT-AF combines a Vision Transformer-based structural branch, which captures long-range dependencies and high-level contextual patterns, with multi-scale convolutional branches that extract fine-grained local information. These complementary features are adaptively combined by the novel GSAF module, which enhances model interpretability and classification performance. We evaluate the model on both PlantVillage (controlled environment) and CLD (real-world in-field conditions), observing consistent performance gains that indicate strong resilience to natural lighting variations and background complexity. With an accuracy of up to 93.79%, HMCT-AF with GSAF outperforms vanilla Transformer models, EfficientNet, and traditional CNNs. These findings show that the model captures scale-variant disease symptoms well and can support real-time agricultural applications on edge-compatible hardware. HMCT-AF with GSAF thus provides a viable basis for intelligent, scalable plant disease monitoring systems in modern precision farming.
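The attention-based fusion at the core of this kind of architecture can be sketched in a few lines: features from a local (multi-scale CNN) branch and a global (ViT) branch are projected to a common width and combined through a learned gate. The module below is a hypothetical illustration of gated feature fusion, not the authors' actual GSAF implementation; all layer sizes and branch interfaces are assumptions.

```python
import torch
import torch.nn as nn

class GatedAttentionFusion(nn.Module):
    """Illustrative gated fusion of a local (multi-scale CNN) feature vector
    and a global (ViT) feature vector; a sketch, not the paper's GSAF module."""

    def __init__(self, dim_local: int, dim_global: int, dim_out: int, num_classes: int):
        super().__init__()
        self.proj_local = nn.Linear(dim_local, dim_out)    # align CNN features
        self.proj_global = nn.Linear(dim_global, dim_out)  # align ViT features
        self.gate = nn.Sequential(                         # per-channel gate in [0, 1]
            nn.Linear(2 * dim_out, dim_out),
            nn.Sigmoid(),
        )
        self.classifier = nn.Linear(dim_out, num_classes)

    def forward(self, f_local: torch.Tensor, f_global: torch.Tensor) -> torch.Tensor:
        l = self.proj_local(f_local)                       # (B, dim_out)
        g = self.proj_global(f_global)                     # (B, dim_out)
        a = self.gate(torch.cat([l, g], dim=1))            # how much to trust the local branch
        fused = a * l + (1.0 - a) * g                      # adaptively weighted combination
        return self.classifier(fused)

# Usage with dummy branch outputs (e.g. pooled ResNet features and a ViT CLS token):
# fusion = GatedAttentionFusion(dim_local=2048, dim_global=768, dim_out=512, num_classes=10)
# logits = fusion(torch.randn(4, 2048), torch.randn(4, 768))
```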

19 pages, 6354 KB  
Article
Extract Nutritional Information from Bilingual Food Labels Using Large Language Models
by Fatmah Y. Assiri, Mohammad D. Alahmadi, Mohammed A. Almuashi and Ayidh M. Almansour
J. Imaging 2025, 11(8), 271; https://doi.org/10.3390/jimaging11080271 - 13 Aug 2025
Viewed by 2124
Abstract
Food product labels serve as a critical source of information, providing details about nutritional content, ingredients, and health implications. These labels enable Food and Drug Authorities (FDA) to ensure compliance and take necessary health-related and logistics actions. Product labels are also essential for online grocery stores to offer reliable nutrition facts and to help customers make informed dietary decisions. Unfortunately, product labels are typically available only as images, requiring organizations and online stores to transcribe them manually, a process that is time-consuming and highly prone to human error, especially for multilingual labels. Our study investigates the challenges and effectiveness of leveraging large language models (LLMs) to extract nutritional elements and values from multilingual food product labels, with a specific focus on Arabic and English. A comprehensive empirical analysis was conducted on a manually curated dataset of 294 food product labels, comprising 588 transcribed nutritional elements and values in both languages, which served as the ground truth for evaluation. The findings reveal that while LLMs extracted English elements and values more accurately than Arabic ones, our post-processing techniques significantly enhanced their accuracy, with GPT-4o outperforming GPT-4V and Gemini.
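To give a rough sense of this extraction pipeline, the sketch below sends a label image to a vision-capable LLM through the OpenAI Python client and asks for nutritional elements and values in both languages. The prompt wording, output schema, and error handling are assumptions for illustration; the paper's own prompts and post-processing steps are not reproduced here.

```python
import base64
from openai import OpenAI  # assumes the official OpenAI Python client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_label(image_path: str) -> str:
    """Ask a vision-capable LLM to transcribe a bilingual (Arabic/English)
    nutrition facts label; returns the raw model output for later post-processing."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    prompt = (
        "Extract every nutritional element and its value from this food label. "
        "Report each element in both Arabic and English, with its numeric value "
        "and unit, as a JSON list."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # one of the models compared in the paper
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    # The study notes that raw LLM output still benefits from post-processing
    # (e.g. unit normalization, Arabic/English alignment) before evaluation.
    return response.choices[0].message.content
```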
