DietQA: A Comprehensive Framework for Personalized Multi-Diet Recipe Retrieval Using Knowledge Graphs, Retrieval-Augmented Generation, and Large Language Models

Tsampos, Ioannis; Marakakis, Emmanouil

doi:10.3390/computers14100412


by Ioannis Tsampos * and Emmanouil Marakakis *

Department of Electrical & Computer Engineering, Hellenic Mediterranean University, 71410 Heraklion, Greece

* Authors to whom correspondence should be addressed.

Computers 2025, 14(10), 412; https://doi.org/10.3390/computers14100412

Submission received: 31 July 2025 / Revised: 22 September 2025 / Accepted: 24 September 2025 / Published: 29 September 2025

(This article belongs to the Special Issue Natural Language Processing (NLP) and Large Language Modelling (2nd Edition))


Abstract

Recipes available on the web often lack nutritional transparency and clear indicators of dietary suitability. While searching by title is straightforward, exploring recipes that meet combined dietary needs, nutritional goals, and ingredient-level preferences remains challenging. Most existing recipe search systems do not effectively support flexible multi-dietary reasoning in combination with user preferences and restrictions. For example, users may seek gluten-free and dairy-free dinners with suitable substitutions, or compound goals such as vegan and low-fat desserts. Recent systematic reviews report that most food recommender systems are content-based and often non-personalized, with limited support for dietary restrictions, ingredient-level exclusions, and multi-criteria nutrition goals. This paper introduces DietQA, an end-to-end, language-adaptable chatbot system that integrates a Knowledge Graph (KG), Retrieval-Augmented Generation (RAG), and a Large Language Model (LLM) to support personalized, dietary-aware recipe search and question answering. DietQA crawls Greek-language recipe websites to extract structured information such as titles, ingredients, and quantities. Nutritional values are calculated using validated food composition databases, and dietary tags are inferred automatically based on ingredient composition. All information is stored in a Neo4j-based knowledge graph, enabling flexible querying via Cypher. Users interact with the system through a natural-language chatbot interface, where they can express preferences for ingredients, nutrients, dishes, and diets, and filter recipes based on multiple factors such as ingredient availability, exclusions, and nutritional goals. DietQA supports multi-diet recipe search by retrieving both compliant recipes and those adaptable via ingredient substitutions, explaining how each result aligns with user preferences and constraints.
An LLM extracts intents and entities from user queries to support rule-based Cypher retrieval, while the RAG pipeline generates contextualized responses using the user query and preferences, retrieved recipes, statistical summaries, and substitution logic. The system integrates real-time updates of recipe and nutritional data, supporting up-to-date, relevant, and personalized recommendations. It is designed for language-adaptable deployment and has been developed and evaluated using Greek-language content. DietQA provides a scalable framework for transparent and adaptive dietary recommendation systems powered by conversational AI.

Keywords:

recipe recommendation; dietary reasoning; knowledge graph (KG); chatbot; personalized nutrition; retrieval-augmented generation (RAG); large language models (LLMs); ingredient substitution; food computing; natural language interface

1. Introduction

The rise in digital health and nutrition applications underscores a pressing need for intelligent systems that can guide individuals toward healthier eating. Unhealthy diets are a major contributing factor to non-communicable diseases like heart conditions and diabetes [1]. Personalized nutrition—providing tailored dietary advice to individuals based on their unique health profiles and preferences—has emerged as a promising approach to mitigate these issues [2]. Traditional “one-size-fits-all” diet recommendations often fail to engage users, whereas personalized guidance can help users adhere to healthy, enjoyable diets aligned with their needs [3]. In this context, recipe recommendation and meal-planning systems have evolved to incorporate personal factors such as nutritional requirements, taste preferences, allergies, and health goals. However, selecting appropriate recipes from large online collections remains challenging for users. They face an information overload—tens of thousands of recipes to choose from—with limited support for filtering by healthfulness or personal criteria and preferences [4]. This often leads to suboptimal meal choices or user disengagement with diet-focused recipe search apps. Recent literature reviews concur that food recommender systems are predominantly content-based and ML-driven yet often non-personalized, with limited support for dietary constraints, ingredient-level exclusions, and multi-criteria nutrition goals, leaving combined diet- and preference-aware retrieval under-served [5,6,7,8].

Recent advancements in artificial intelligence for food computing offer a pathway to smarter diet assistance. In particular, knowledge graphs (KGs) and conversational AI have proven valuable in making recipe recommendations more personalized and explainable [9,10]. A knowledge graph (KG) can organize recipes, ingredients, and nutrition facts into a semantic network, enabling reasoning about ingredient relationships, dietary constraints, and health impacts. Prior work [10] has shown that integrating KGs into recommender systems provides extra context and improves recommendation relevance and explainability [11]. For example, mapping recipe ingredients and attributes into a KG allows the system to find connections (e.g., two soups that are similar in content and cuisine) that would be missed by simple keyword matching. At the same time, chatbot interfaces driven by large language models (LLMs) such as ChatGPT and Gemma 2 (12B) have become popular for their ability to understand natural language queries and engage in dialogue [12]. In the food domain, conversational agents can let users ask complex questions (“What’s a good low-carb soup I can cook tonight?”) and receive interactive, context-aware suggestions, rather than just static search results. Early recipe chatbots were rule-based and limited to canned Q&A, but modern systems employ transformer-based natural language understanding and multi-turn dialogue management to handle nuanced queries and follow-up questions [13]. These AI-driven developments create an opportunity to fuse structured nutritional knowledge with free-form conversation, delivering a diet assistant that is both accurate and user-friendly.

This paper presents DietQA, an intelligent recipe-based question answering system integrated into a diet application, which exemplifies such a fusion. DietQA distinguishes itself by combining a robust food KG with an LLM-powered conversational agent to satisfy users’ culinary and nutritional information needs [14,15]. The system allows users to ask questions about recipes, ingredients, diets, and micronutrient options in natural language and receive answers that are tailored to their dietary profile [16]. For instance, a user following a gluten-free and high-protein diet can ask, “What are some gluten-free dinner recipes rich in protein?”, and the system will respond with a curated list of personalized recipe suggestions, each accompanied by detailed nutritional breakdowns and ingredient substitutions (if necessary) to ensure compliance with the user’s dietary restrictions [17,18]. In addition, the system considers individual preferences, such as specific micronutrient targets and liked or disliked foods, and explains how each recommendation aligns with these criteria, enhancing both relevance and user trust. The underlying KG encodes relationships between recipes, ingredients, nutrients, and diet guidelines, enabling the system to perform semantic filtering beyond simple keyword matching [19]. By combining KG retrieval with LLM generation, every answer remains grounded in verified facts. Users can further interact with the system by asking follow-ups like, “Which of these recipes offers the best protein-to-calorie ratio?”. Through this interactive Question Answering (QA) paradigm, recipe search becomes a conversation, where the system can clarify needs, justify its recommendations, and adapt to user feedback.

Prior diet and recipe recommenders often rely on keyword filtering or health-weighted re-ranking, which struggle to represent compositional constraints (diet tags plus numeric rules) and provide limited explanations. KG-based systems improve structure but rarely combine multi-diet constraint satisfaction with diet-aware substitution in dialogue, while LLM-only chatbots risk producing ungrounded or non-compliant answers. In our corpus, recipe coverage falls sharply as diet constraints accumulate, but substitution recovers many otherwise unsatisfied queries, highlighting the need for factual reasoning before generation.

To overcome these limitations, we adopt a Retrieval-Augmented Generation (RAG) design. First, Cypher queries are executed over the KG to enforce hard constraints and compute structured insights such as nutrient statistics, compliance checks, and possible substitutions. These retrieved facts are then passed to an LLM, which generates grounded, personalized explanations and supports multi-turn dialogue. This combination reduces hallucination, preserves safety in a health context, and provides reliable support for Greek-language queries by grounding them in canonical KG entities rather than relying solely on free-text processing.

The architecture is designed to be language-adaptable. The reasoning and retrieval modules (Cypher queries over the KG, substitution logic, and statistical analysis) are language-agnostic, while the conversational interface is supported by an instruction-tuned multilingual LLM, with localization required only for corpora, nutrient databases, and dietary taxonomies.

The principal contributions of this work are as follows:

KG-QA for Multi-Diet Reasoning and Substitution: Formulates diet-aware recipe search as knowledge-graph question answering with full Boolean constraints across ingredients, nutrients and diet tags, plus diet-aware ingredient substitution to recover feasible results even if direct matches fail; the study quantifies coverage and substitution lift across increasing diet complexity.
Hybrid Symbolic–Neural Pipeline with Explanations: Combines Cypher-based constraint enforcement and KG reasoning with RAG-style LLM generation to produce grounded, explainable answers; introduces a hard vs. soft constraints model (strict compliance requirements vs. preference-guided ranking) and supports post-retrieval analytics for transparent rationale.
Language-Adaptable Architecture for Cross-Cultural Use: Keeps retrieval/reasoning language-agnostic and uses a multilingual LLM at entity extraction and the interface, so porting to new locales mainly requires local corpora, nutrient databases, and diet taxonomies, not algorithmic changes.
Usability-Centred Chat and GUI Design: Delivers a conversational UI enhanced with visual pills, tag clouds, sliders, and filter panels for immediate refinement; achieves strong usability outcomes in a mobile-friendly, accessible interface.
Scalable, Performant System Implementation: Provides a modular client–server stack on Neo4j with bounded traversals and caching; offers a complexity model showing per-query cost scales with the candidate set and empirical sub-second latencies with negligible substitution overhead.

2. Related Work

2.1. Recipe Web Scraping, Semantic Extraction, and Database Storage

Building a rich knowledge base of recipes and food information starts with web scraping and semantic data extraction, since much of the world’s recipe knowledge is available only in unstructured forms on websites. A variety of methods have been used in prior research to collect and structure recipe data at scale [4,14]. The preprocessing challenges are significant. For example, Ref. [11] noted that some recipe names in their flower dataset were misleading, requiring the deletion of irrelevant entries during cleaning. One important extraction task is ingredient phrase parsing: separating each ingredient line into its quantity, unit, ingredient name, and any preparation notes. The goal is to normalize ingredient names by removing extraneous descriptors [19]. The paper in [20] describes cleaning user-contributed recipes where ingredients appear in varying forms; for example, one recipe might list “large onion” and another “sliced onion”, both referring to “onion”. By stripping adjectives (“large”) and preparation notes (“sliced”), they mapped such variants to canonical ingredient names.

Prior work has tackled this with rule-based approaches and statistical NLP [2,3,9]. For example, the Ingredient Phrase Tagger [21] is a known tool that uses a trained conditional random field model to tag tokens in ingredient strings (e.g., “2 cups of sliced onions” becomes: “amount: 2, unit: cups, ingredient: onion, preparation: sliced”). Others have used simpler heuristic approaches combined with lexicons of units. Recipe1M [22], one of the most widely used recipe datasets, was constructed by scraping a variety of websites and ended up with around one million recipes with structured ingredients and instructions. It has since been used in tasks from image-to-recipe retrieval to nutritional analysis.

In our project, we similarly employ NLP preprocessing to standardize ingredients: we remove quantity and unit information and use a combination of rule-based filtering and spaCy’s entity lemmatization to isolate the core ingredient terms [23]. This normalization enables accurate linkage to nutritional databases and recipe aggregation (e.g., aligning “baking soda” with “bicarbonate of soda”). The processed ingredients, their links to recipes and nutrition, and semantic relationships are stored in a Neo4j graph, addressing key challenges in recipe data organization [17,19].
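The normalization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain regular expressions instead of spaCy, handles English text only (DietQA itself processes Greek-language content), and the lexicons `UNITS`, `DESCRIPTORS`, and `ALIASES` are hypothetical examples chosen for illustration.

```python
import re

# Hypothetical lexicons for illustration only; a real system needs far larger lists.
UNITS = {"cup", "cups", "tbsp", "tsp", "g", "kg", "ml", "l"}
DESCRIPTORS = {"large", "small", "sliced", "chopped", "fresh", "diced"}
ALIASES = {"bicarbonate of soda": "baking soda"}  # map variants to one canonical name

# Leading quantity such as "2", "1.5" or "1/2".
QUANTITY = re.compile(r"^\s*\d+(?:[.,/]\d+)?\s*")


def normalize_ingredient(line: str) -> str:
    """Strip quantity, unit, and descriptors; return a canonical ingredient name."""
    text = QUANTITY.sub("", line.lower())
    tokens = re.findall(r"[a-z]+", text)
    # Drop a leading unit and an optional "of" that directly follows it.
    if tokens and tokens[0] in UNITS:
        tokens = tokens[1:]
        if tokens and tokens[0] == "of":
            tokens = tokens[1:]
    # Remove adjectives and preparation notes.
    tokens = [t for t in tokens if t not in DESCRIPTORS]
    name = " ".join(tokens)
    return ALIASES.get(name, name)
```

With these assumptions, “1 large onion” and “sliced onion” both map to “onion”, and “1 tsp bicarbonate of soda” maps to “baking soda”. The sketch performs no lemmatization, so plural forms are left unchanged; that is the part a lemmatizer would cover.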

2.2. AI in Recipe Recommendation and Personalized Nutrition

Early recipe recommendation systems borrowed techniques from general recommenders, like those for movies or products, focusing on predicting user ratings or preferences for recipes. Traditional approaches often modelled recipe similarity based on ingredients or collaborative filtering, without any special consideration for health or nutrition [4]. Over time, the unique challenges of food recommendation became evident: unlike movies, recommending unhealthy foods can have direct negative impacts on user health. Hence, recent research in personalized nutrition and recipe recommendation has increasingly integrated health factors and personal constraints into the recommendation process [3]. For example, Trattner and Elsweiler showed that adding a post-filtering step to re-rank recommended recipes according to their healthiness scores improved the nutritional quality of suggestions [24]. Their findings highlighted that one can balance “tastiness” with “healthiness” in recommendations (recommending slightly healthier recipes that users still enjoy), though this often involves trade-offs with predictive accuracy [3]. Other works explored simple constraint-based methods, such as filtering out very high-calorie recipes or suggesting ingredient substitutions to reduce calories, salt, or fats [17,20]. These early health-focused recommenders demonstrated the value of accounting for nutrition (e.g., balancing user preference for tasty dishes with healthier choices), but many relied on relatively basic linear combinations of taste and health scores or heuristic re-ranking strategies applied after generating recommendations.

The introduction of KGs into recipe recommendation has been a key development [9,19]. The paper in [10] proposes a health-guided recipe recommendation system that leverages graph neural networks over a large-scale food KG to jointly model user preferences and nutritional aspects of recipes. Their system models semantic relationships across two distinct graphs—one reflecting user taste preferences and the other capturing recipe healthiness—and integrates them through a knowledge transfer mechanism. This enables the recommendation of recipes that align with users’ prior choices while gradually encouraging healthier options. As demonstrated through a case study, their KG-based recommender system also provides explainability by offering reasoning paths within the graph.

KGs also enabled incorporating external health knowledge (like dietary guidelines or ingredient health effects) as side information [11,13,25]. The paper in [16] takes a related approach by formulating food recommendation as knowledge base question answering (KBQA). Their framework, pFoodReQ, models user queries together with dietary preferences and health guidelines as structured constraints, enabling more accurate and health-conscious recipe suggestions. The system personalizes results by enforcing dietary constraints, and by handling explicit user requirements including allergies, nutrition needs, and food logs, within a unified Question Answering (QA) framework. It introduces novel techniques for interpreting numerical comparisons and negations in user queries, achieving substantial performance gains over traditional non-personalized recommenders. Another work [26] introduced a software tool that enriches Food-Energy-Water KGs using machine learning and semantic embeddings, enhancing decision-making and information retrieval by adding semantically related triples, relations, and class classifications. DietQA similarly treats recipe recommendation as a constraint-based QA task over a KG, enriching it with dietary attributes and supporting personalized, health-aware answers through symbolic reasoning and natural language understanding.

Another dimension of personalization in food recommenders is multi-criteria optimization to satisfy both preferences and nutritional needs [2]. Toledo et al. [1] presented a framework for daily menu planning that combines user taste preferences with nutritional requirements using a two-stage approach: first filtering out recipes incompatible with a user’s profile (e.g., with allergens or disliked ingredients), then optimizing a day’s menu to meet nutrient targets (calories, protein, etc.). They highlighted that managing nutritional data and user preferences simultaneously is crucial for realistic personalized diet plans. Zioutos et al. in [27] developed SHARE, a hybrid recommendation system that integrates collaborative filtering with content-based and knowledge-based methods to generate comprehensive weekly meal plans, incorporating users’ health histories and chronic conditions while enabling dynamic adaptation through real-time user feedback and constraint modifications. Their system demonstrates notable flexibility by allowing users to apply hard and soft constraints, personalized filtering, and positive weight assignments, ensuring the recommendations evolve with changing user preferences and health requirements. DietQA adopts a similar two-stage logic, using hard constraints to strictly filter out recipes that violate dietary restrictions, and soft constraints to rank the remaining options based on user preferences such as favoured ingredients or dietary style, while lowering the rank of disliked or less suitable items. DietQA operates at the level of real-time, per-query recommendations within an interactive dialogue [12].
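The two-stage logic described above (hard constraints as strict filters, soft constraints as ranking signals) can be sketched as follows. This is an in-memory illustration under stated assumptions, not DietQA's Cypher-based implementation: the `Recipe` fields, the `retrieve` function, and the simple +1/-1 scoring rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    title: str
    ingredients: set[str]
    diet_tags: set[str] = field(default_factory=set)
    kcal: float = 0.0


def retrieve(recipes, required_diets, excluded, max_kcal, liked, disliked):
    """Stage 1 drops recipes that violate hard constraints; stage 2 ranks the rest."""
    compliant = [
        r for r in recipes
        if required_diets <= r.diet_tags       # every required diet tag is present
        and not (excluded & r.ingredients)     # no excluded ingredient appears
        and r.kcal <= max_kcal                 # numeric ceiling is respected
    ]

    def soft_score(r):
        # Hypothetical scoring: +1 per liked ingredient, -1 per disliked one.
        return len(liked & r.ingredients) - len(disliked & r.ingredients)

    return sorted(compliant, key=soft_score, reverse=True)
```

In this sketch a recipe containing an excluded ingredient can never be returned, however well it matches the user's likes, whereas a disliked ingredient only lowers a recipe's position in the ranking.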

2.3. Knowledge Graphs and Food Ontologies

The use of ontologies and KGs in food computing has expanded significantly in the past few years. Early foundational work in cooking ontologies established the core organizational principles that inform modern food KGs. The work in [28] developed one of the first comprehensive cooking ontologies specifically designed for dialogue systems, organizing the cooking domain into four main modules: actions, food, recipes, and kitchen utensils, along with three auxiliary modules covering units and measures, equivalencies, and plate types. Their approach of separating domain knowledge from system logic through ontological modelling laid important groundwork for subsequent KG architectures in food computing.

Building on this foundation, more recent developments have produced large-scale, integrated resources such as FoodKG [29], a comprehensive food KG with about 67 million triples integrating recipes with a food ontology and nutritional databases. The KG incorporates data from over 1 million recipes [22], nutritional information from USDA’s National Nutrient Database, and relevant portions of the FoodOn ontology. FoodKG provides a structured representation of entities like ingredients (and their hierarchy), recipes, nutrients, and their inter-relations (e.g., recipe X uses ingredient Y, ingredient Y contains nutrient Z). By leveraging FoodKG, researchers can perform sophisticated reasoning in tasks such as recipe similarity, ingredient substitution, and health-oriented food analysis. The structured nature of the KG enables systems to identify connections between recipes based on shared ingredients and categorical relationships, allowing for more nuanced food recommendations that can increase diversity while maintaining relevance to user preferences and dietary constraints. DietQA extends this framework to better serve personalized dietary reasoning. It encompasses a rich and interconnected set of food-related entities, including recipes, ingredients, nutrients, and dietary classifications. DietQA’s KG explicitly encodes nutritional properties (e.g., protein, carbohydrates) and dietary tags (e.g., vegan, gluten-free) at the ingredient level, allowing the system to infer the overall dietary suitability and nutritional composition of each recipe. This enables logical reasoning and aggregation across multiple dimensions of dietary relevance.
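The ingredient-to-recipe inference described above can be sketched in a few lines. The sketch assumes that a recipe carries a diet tag only when every one of its ingredients does, and that nutrient totals are quantity-weighted sums; the `INGREDIENTS` table, its per-100 g values, and the `recipe_profile` function are illustrative placeholders, not data or code from DietQA.

```python
# Hypothetical ingredient-level facts: diet tags and nutrients per 100 g.
# The numbers are placeholders, not values from a food composition database.
INGREDIENTS = {
    "chickpeas": {"tags": {"vegan", "gluten-free"}, "protein": 19.0, "carbs": 61.0},
    "olive oil": {"tags": {"vegan", "gluten-free"}, "protein": 0.0, "carbs": 0.0},
    "feta": {"tags": {"gluten-free"}, "protein": 14.0, "carbs": 4.0},
}


def recipe_profile(quantities_g):
    """Infer recipe-level diet tags and nutrient totals from {ingredient: grams}."""
    tag_sets = [INGREDIENTS[name]["tags"] for name in quantities_g]
    # A recipe keeps a diet tag only if every ingredient carries that tag.
    tags = set.intersection(*tag_sets) if tag_sets else set()
    totals = {
        nutrient: sum(
            INGREDIENTS[name][nutrient] * grams / 100.0
            for name, grams in quantities_g.items()
        )
        for nutrient in ("protein", "carbs")
    }
    return tags, totals
```

Under these assumptions, adding feta to an otherwise vegan recipe removes the vegan tag while leaving gluten-free intact, which is the kind of aggregation across dietary dimensions the paragraph refers to.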

KGs also support ingredient substitution and recipe adaptation, enabling personalized cooking assistance. Users frequently seek ways to adjust recipes to suit health goals, preferences, or available ingredients. Prior research has used graphs to suggest ingredient alternatives that play similar roles in recipes. For example, an ingredient ontology might link almond milk to cow’s milk via a “substitute for” relationship. Fatemi et al. in [30] developed GISMo, a graph-based ingredient substitution module that combined recipe context with a generic ingredient similarity graph to predict plausible substitutions. This showed that encoding ingredient relationships in a graph (e.g., grouping functionally similar ingredients) can improve substitution suggestions. More recently, LLMs have been applied to the substitution task: Senath et al. in [18] report state-of-the-art results using an LLM augmented with ingredient knowledge, achieving about 21% Hit@1 accuracy (precision of the top suggestion) on a standard substitution benchmark. This outperforms earlier methods and underscores that combining factual knowledge (e.g., which ingredients are interchangeable) with LLM reasoning is a promising approach for complex food tasks. DietQA uses KGs for constraint-aware recipe adaptation, such as replacing high-fat ingredients with lower-fat alternatives. However, it does not incorporate advanced machine learning or predictive substitution methods.
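A rule-based, diet-aware substitution of the kind described above might look like the following sketch. It is an assumption-laden illustration: the `SUBSTITUTIONS` and `DIET_TAGS` tables and the `adapt` function are hypothetical, and in DietQA such relationships would live in the knowledge graph rather than in Python dictionaries.

```python
# Hypothetical substitution table: diet -> {non-compliant ingredient: alternative}.
SUBSTITUTIONS = {
    "vegan": {"cow's milk": "almond milk", "butter": "olive oil"},
    "gluten-free": {"wheat flour": "rice flour"},
}

# Hypothetical ingredient-level diet tags.
DIET_TAGS = {
    "cow's milk": {"gluten-free"},
    "almond milk": {"vegan", "gluten-free"},
    "butter": {"gluten-free"},
    "olive oil": {"vegan", "gluten-free"},
    "wheat flour": {"vegan"},
    "rice flour": {"vegan", "gluten-free"},
    "sugar": {"vegan", "gluten-free"},
    "egg": {"gluten-free"},
}


def adapt(ingredients, diets):
    """Return (adapted ingredients, swaps made), or None if no compliant version exists."""
    adapted, swaps = [], {}
    for item in ingredients:
        current = item
        for diet in diets:
            if diet not in DIET_TAGS[current]:
                # Look up a diet-specific alternative for the offending ingredient.
                current = SUBSTITUTIONS.get(diet, {}).get(current)
                if current is None:
                    return None
        # A swap made for one diet must not break another requested diet.
        if any(diet not in DIET_TAGS[current] for diet in diets):
            return None
        if current != item:
            swaps[item] = current
        adapted.append(current)
    return adapted, swaps
```

Returning the `swaps` mapping alongside the adapted list keeps every change traceable, so a response can state exactly which ingredient was replaced and why.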

Finally, KGs contribute greatly to the transparency and explainability of recommendations. Unlike black-box models, a KG-backed system can provide a clear rationale for its suggestions by tracing the graph. For instance, a system might explain: “Recommended Chicken with Eggplant because it is high in protein, low in fat, and all ingredients comply with your gluten-free and dairy-free preferences”. It could even highlight which ingredients contribute to those levels. Such explanations build user trust, especially in diet-related applications where users need to trust that the recommendations align with their health goals [14]. Prior studies have noted that a strength of KG-based recommenders is the ability to offer reasoning along graph paths as an explanation [10]. DietQA builds and utilizes a dedicated KG to support semantic reasoning over recipes, ingredients, dietary classifications, and potential substitutions, forming the backbone of its personalized, explainable, and nutrition-aware question answering system.

2.4. RAG, Food-Specific LLMs and Conversational Diet Assistants

The integration of Retrieval-Augmented Generation (RAG)—where external structured or unstructured data is retrieved and incorporated into LLM-based responses—has gained traction in food and health-related question answering. Several recent systems use RAG to improve personalization and factual grounding.

The convergence of conversational AI and dietary assistance has led to a new generation of systems that incorporate user context, nutritional reasoning, and knowledge retrieval. FoodGPT [31] is a domain-specific language model focused on food safety and food testing. It uses an incremental pretraining strategy on scientific food safety documents and integrates with a curated food safety KG via retrieval and context integration to support factual question answering. This setup allows it to answer technical questions (e.g., safe contaminant limits) with reduced hallucination. FoodGPT demonstrates the value of knowledge grounding and is optimized for high-precision QA in food science contexts rather than interactive use.

FoodSky [32] is a domain-specific large language model (LLM) for the culinary field. It employs a Hierarchical Topic Retrieval-Augmented Generation (HTRAG) framework: when faced with a question, FoodSky retrieves related information from external knowledge sources—such as recipe databases, cooking encyclopedias, and nutrition tables—and injects that into the prompt to improve generation quality. This hierarchical retrieval ensures that generated responses are enriched with factual knowledge, enabling FoodSky to score competitively on professional chef and dietitian certification exams in China (~67% accuracy), significantly outperforming baseline LLMs. However, FoodSky is not interactive, nor does it offer symbolic query handling or constraint satisfaction. It does not manage structured user profiles or dietary preferences, nor can it explain why a specific recipe was chosen.

ChatDiet [33] introduces a dual-knowledge RAG system for personalized nutrition, with an LLM orchestrator retrieving data from (a) a personal model—capturing user-specific health data via a causal inference graph—and (b) a population model with general dietary knowledge. At runtime, relevant facts from both sources guide the LLM’s responses. Personalisation is enabled through causal discovery (using SAM and CDT) and ITE estimation with DoWhy. The system offers explainability via step-by-step reasoning and BM25 retrieval, linking nutrients to quantified health effects (e.g., “Vitamin E extends REM sleep by 3.3 per unit based on your data”). Despite its strengths in personalisation and causal reasoning, ChatDiet does not support symbolic recipe retrieval or structured dietary constraint composition. It lacks fine-grained filtering using logical operators—such as excluding ingredients or enforcing nutrient limits (e.g., “exclude mushrooms” or “under 400 kcal with quinoa”). In contrast, DietQA enables real-time recipe adaptation via KG logic and user-defined constraints, supporting both strict filtering and soft re-ranking for dietary and preference-based personalisation.

HealthGenie [15] is an interactive dietary assistant that combines an LLM with a curated nutrition-focused KG. It refines free-text queries (e.g., “vitamin C–rich meals for scurvy”) and retrieves a task-specific subgraph linking foods, nutrients, health conditions, and recipes. This subgraph is visualized as an interactive node-link graph, allowing users to adjust graph depth, exclude ingredients, and guide results through direct manipulation. These interactions are logged and reintegrated into the LLM, which provides updated textual rationales and recipe suggestions. The system supports both English and Chinese. A 12-person study showed lower cognitive load and higher transparency compared to a text-only baseline.

HealthGenie and DietQA share a hybrid LLM–KG architecture for personalized recipe guidance, with common design traits: (a) a structured food KG with symbolic reasoning and path traversal, (b) an LLM that interprets queries and grounds answers in KG facts, (c) a persistent user profile used in every step for tailored recommendations, and (d) rationale generation that links outputs back to user constraints. Both report strong usability and trust in small-scale studies.

The two platforms diverge sharply in how they turn this common core into an interface. (a) Interface: HealthGenie exposes a dynamic node-link graph that users can directly manipulate: clicking a node includes or removes an ingredient, and a depth slider changes the level of detail. DietQA, by contrast, surfaces visual pills for recognized search terms, a tag-cloud of ingredients, and sliders for nutrient ceilings, while the user has no immediate access to the underlying graph. (b) Personalisation engine: HealthGenie blends causal discovery (SAM + CDT) with DoWhy individual-treatment-effect estimation, merging a user’s longitudinal data with a population model. DietQA applies rule-based profile filters expressed as Cypher predicates. (c) Constraint logic: HealthGenie lets users add or drop ingredients through the graph UI yet cannot compose rich symbolic expressions, so it struggles with compound constraints like “under 400 kcal and no mushrooms.” DietQA’s Cypher layer supports full Boolean filtering across ingredients, nutrients and diet tags, enabling complex multi-constraint reasoning. (d) Recipe rescue: HealthGenie re-runs the query after node-level exclusions or inclusions; it lacks a substitution taxonomy. DietQA encodes explicit diet-specific alternatives (e.g., vegan, gluten-free) and can swap non-compliant ingredients, recovering results when a direct match fails.

In sum, the two systems share a KG-centred, LLM-augmented foundation but pursue opposite interface philosophies: HealthGenie prioritizes interactive graph transparency and causal personalisation, whereas DietQA opts for conversational depth and formal constraint satisfaction powered by Cypher and substitution rules.

To summarize the preceding review, Table 1 compares DietQA with the most related systems in food knowledge graphs and LLM-based dietary recommendation. As shown in the table, existing systems address only parts of this space, such as KG-based reasoning (FoodKG, pFoodReQ, HealthGenie, Health-guided recipe recommendation), conversational LLMs (FoodSky, ChatDiet, FoodGPT), or substitution (GISMo), but none integrate them fully. DietQA combines multi-diet reasoning, diet-aware substitutions, symbolic constraints, and quantitative evaluation of multi-diet coverage and substitution lift within a single framework.

3. Methodology

3.1. System Overview

DietQA combines structured knowledge processing with LLMs to deliver personalized question answering. The system integrates symbolic reasoning over a food KG with natural language understanding and generation. Figure 1 illustrates the system’s end-to-end pipeline from user query to final response generation. The pipeline includes several interconnected modules: the Intent Classifier and Entity Extraction Module leverages an LLM to interpret the user query and extract relevant intents and entities. These are passed to the Cypher Query Construction Module, which formulates graph queries by integrating user-specific constraints retrieved from a relational MySQL database. The Information Retrieval Module then executes the queries against the Neo4j KG. The retrieved data are processed by the Nutritional Data Analysis Module, which computes structured insights such as statistical summaries and nutritionally grouped recipe clusters. These outputs, together with the query context and user preferences, are forwarded to the RAG-based Response Generation Module, which employs an LLM to generate fluent, context-aware, and grounded natural language responses.

The final output is presented to the user as the System Response. To maintain up-to-date knowledge, the system includes a Web Crawler and Recipes & Ingredients Extraction Module that continuously gathers new recipe data, while the Ingredient Diet and Nutritional Data Extraction Module is used to enrich the KG with reliable dietary and nutritional information.

DietQA is implemented as a Client–Server architecture composed of modular components as shown in Figure 2. The design separates the backend—which handles natural language processing, KG querying, and LLM-based answer generation—from the frontend—which provides the user-facing application interface.

The methodology of DietQA combines structured knowledge representation, natural language understanding, and LLM-generated response. At the core of the system lies a domain-specific KG, which serves as the structured backbone for interpreting, constraining, and answering user questions about recipes and nutrition. The KG encodes detailed information about recipes, ingredients, nutrients, dietary suitability, and allowable substitutions as interconnected nodes and relationships. It is automatically populated through a combination of web crawling and semi-structured data extraction. This structured representation enables fine-grained semantic querying and supports dietary constraint enforcement.

User queries are processed using an LLM that identifies intents and extracts relevant entities, which are then normalized and semantically categorized to guide Cypher query formulation for execution against the KG. The system follows a RAG approach, where Cypher queries are executed on the KG, and the retrieved results undergo sequential processing, i.e., post-retrieval analysis and structured insights extraction. These processed outputs feed into the Retrieval-Augmented Response Generation Module, which manages search context explanation, personalized result presentation, and context-aware dialogue management. This architecture enables interactive, grounded, and user-adaptive feedback through an integrated, end-to-end pipeline. The following subsections describe each stage of the methodology, from KG construction to adaptive flows and complete pipeline execution.

3.2. Knowledge Graph and Dictionary Construction

At the core of DietQA is a KG tailored for recipe-centric question answering. The KG provides a unified representation of recipe data, nutritional information, and dietary domain knowledge, enabling the system to reason about complex queries. We constructed the KG by integrating multiple data sources, drawing upon existing food ontologies and databases.

Recipe Dataset.

The integration of natural language processing (NLP) and KGs has significantly advanced food computing applications, particularly in recipe understanding, dietary recommendation, and ingredient substitution. However, most existing systems have been developed for high-resource languages like English, which benefit from rich corpora and mature NLP tools. Greek, by contrast, is a medium-resource language with limited structured data and few domain-specific resources—especially in the food domain [5,34,35,36]. To address this gap, we developed a domain-specific knowledge graph of Greek recipes using web-scraped data. This approach enables the extraction of structured semantic information from unstructured Greek-language recipe text, populating the KG with meaningful entities and relationships.

For the ingredient extraction and semantic annotation of the recipes, we followed the methodology described in [23]. The KG is populated automatically using a web scraping and NLP pipeline. Recipes are collected from public Greek-language websites, and the title, ingredient list, and images are extracted for each entry. The ingredient lines are further processed to extract structured triples consisting of quantity, unit, and ingredient name. All quantities and ingredient names are normalized and mapped to standard forms. To address out-of-vocabulary cases and expand coverage, we employ a translate-and-match bootstrap: when the crawler encounters an ingredient absent from the KG, an LLM translates the label to a canonical string; we then query authoritative repositories (USDA, FSANZ) and apply an exact, normalized string-match gate. If a single match is found, we import the item’s nutrient profile, category, and dietary labels, apply predefined normalization rules, create the ingredient node with full provenance/versioning, and link it to the corresponding category and diet nodes. Deduplication and numeric guards are enforced prior to insertion; cases with no match, multiple matches, or missing category/diet classes are held in staging for human review. Upon recipe ingestion, the pipeline calculates nutrition totals and macronutrient energy fractions from the mapped ingredient quantities and stores them as properties on the corresponding recipe node, enabling downstream analysis and KG-QA.
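The exact, normalized string-match gate described above can be illustrated with a minimal Python sketch. The normalization rule (case folding, accent stripping, punctuation removal), the function names `normalize_label` and `match_gate`, and the repository rows are all assumptions made for illustration; the actual USDA and FSANZ record formats and the DietQA normalization rules are not reproduced here.

```python
import unicodedata


def normalize_label(label: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace (assumed rule)."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_marks.lower())
    return " ".join(cleaned.split())


def match_gate(translated_label, repository):
    """Return ("import", item) for exactly one normalized match, else ("review", candidates)."""
    key = normalize_label(translated_label)
    hits = [item for item in repository if normalize_label(item["name"]) == key]
    if len(hits) == 1:
        return "import", hits[0]
    # Zero or multiple matches are held for human review, as in the text.
    return "review", hits


# Hypothetical repository rows, not real USDA/FSANZ records.
repository = [
    {"name": "Oat flour", "source": "USDA"},
    {"name": "Olive oil", "source": "USDA"},
    {"name": "Olive Oil", "source": "FSANZ"},
]

print(match_gate("  Oat-Flour ", repository))  # single match, safe to import
print(match_gate("olive oil", repository))     # two matches, held for review
```

Requiring exactly one hit keeps the automatic path conservative: an ambiguous label never silently picks one of several nutrient profiles.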

The resulting KG comprises 17,587 recipes of 2248 distinct dishes collected from 221 Greek culinary websites, encompassing 3365 distinct ingredients. The KG schema is shown in Figure 3a. Each recipe is represented as a node labelled Recipe, annotated with metadata including its title, URL, and image URL. Each ingredient is represented by a node labelled Ingredient. Recipes are linked to their constituent ingredients via a CONTAINS relationship, which is enriched with properties capturing the quantity, unit, and the percentage contribution of the ingredient to the total recipe weight. For instance, the recipe Greek Stuffed Tomatoes is connected to ingredients such as rice, tomato, olive oil, onion, and parsley, each via a CONTAINS edge that reflects the specific role and weight of the ingredient in that dish. Ingredient nodes are annotated with nutritional properties: protein, carbohydrates, fat, and calories. Recipe-level nutritional values are computed by aggregating the nutrient content of individual ingredients, weighted by their proportional contribution. This graph structure enables expressive and compositional queries over nutritional content. For example, identifying low-fat recipes involves traversing from recipe nodes to their ingredients via CONTAINS relationships and applying threshold-based filters on the aggregated fat values. Figure 3 displays a subgraph of a recipe for Cupcakes connected to its ingredients via CONTAINS relationships. Each edge represents the use of a specific ingredient in the recipe. A duplicated CONTAINS relationship indicates that the same ingredient appears twice in the ingredient list, reflecting separate mentions or uses within the recipe.
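As a rough illustration of the aggregation step, the sketch below sums ingredient nutrients over CONTAINS edges and derives each ingredient's share of the recipe weight. It assumes that quantities have already been converted to grams and that ingredient nutrients are stored per 100 g; both are assumptions, and the nutrient values, the `Contains` class, and the function names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Contains:
    """One CONTAINS edge: an ingredient and its quantity in grams (assumed unit)."""
    ingredient: str
    grams: float


# Illustrative per-100 g values, not taken from the DietQA KG.
NUTRIENTS_PER_100G = {
    "rice": {"protein": 7.0, "carbs": 80.0, "fat": 1.0, "kcal": 360.0},
    "olive oil": {"protein": 0.0, "carbs": 0.0, "fat": 100.0, "kcal": 900.0},
}


def recipe_totals(edges):
    """Sum each nutrient over all edges, scaled by the edge quantity."""
    totals = {"protein": 0.0, "carbs": 0.0, "fat": 0.0, "kcal": 0.0}
    for edge in edges:
        per_100g = NUTRIENTS_PER_100G[edge.ingredient]
        for nutrient in totals:
            totals[nutrient] += per_100g[nutrient] * edge.grams / 100.0
    return totals


def weight_shares(edges):
    """Percentage of total recipe weight per ingredient; duplicate edges accumulate."""
    total_grams = sum(edge.grams for edge in edges)
    shares = {}
    for edge in edges:
        shares[edge.ingredient] = shares.get(edge.ingredient, 0.0) + 100.0 * edge.grams / total_grams
    return shares


edges = [Contains("rice", 200.0), Contains("olive oil", 50.0)]
print(recipe_totals(edges))
print(weight_shares(edges))
```

Accumulating shares per ingredient name means a duplicated CONTAINS edge, as in the Cupcakes example, contributes to a single combined share.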

Diet compliance and alternative ingredients.

Beyond representing raw nutrient values, the KG also encodes dietary compatibility and substitution guidelines obtained from trusted resources like Food Allergy Research & Education (FARE) and Allergy & Asthma Network. Each specific diet is modelled as a node labelled Diet, and every Ingredient node is connected to Diet nodes through a directed ALLOWED_IN relationship. This relationship includes a compliance property with one of three values: True, False and Conditional, indicating whether the ingredient is strictly allowed, disallowed, or permitted under specific conditions for that diet. Additionally, the KG supports diet-aware ingredient substitution through directed relationships between ingredient nodes. These substitution edges are diet-specific and reflect semantic compatibility within a given dietary framework. For example, the ingredient node Wheat Flour is connected to Oat Flour via the GLUTEN_FREE_ALTERNATIVE relationship, allowing the system to suggest Oat Flour as a gluten-free substitute when Wheat Flour is excluded due to a False compliance value with respect to the :Diet {name: “Gluten-Free”} node. Similarly, alternative relationships such as VEGAN_ALTERNATIVE, DAIRY_FREE_ALTERNATIVE and LOW_FAT_ALTERNATIVE connect ingredients that serve as suitable replacements under the constraints of vegan, dairy-free and low-fat diets, respectively (Figure 3a). These substitution relationships enable the graph to support dynamic recipe adaptation and flexible constraint satisfaction in diet-specific recommendation scenarios.
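To make the interplay between ALLOWED_IN compliance values and diet-specific alternative edges concrete, here is a small in-memory Python sketch. The dictionaries stand in for graph edges and contain illustrative entries only; the helper name `check_ingredient` and the diet-to-relationship table are assumptions, not part of the published system.

```python
# Illustrative stand-ins for graph edges; the real KG content is not reproduced here.
ALLOWED_IN = {
    ("wheat flour", "Gluten-Free"): "False",
    ("oat flour", "Gluten-Free"): "True",
    ("butter", "Vegan"): "False",
}

# Assumed mapping from a diet name to its alternative relationship type.
ALTERNATIVE_REL = {
    "Gluten-Free": "GLUTEN_FREE_ALTERNATIVE",
    "Vegan": "VEGAN_ALTERNATIVE",
    "Dairy-Free": "DAIRY_FREE_ALTERNATIVE",
    "Low-Fat": "LOW_FAT_ALTERNATIVE",
}

ALTERNATIVES = {
    ("wheat flour", "GLUTEN_FREE_ALTERNATIVE"): ["oat flour"],
}


def check_ingredient(ingredient, diet):
    """Return (compliance, substitutes); compliance is None when no edge is known."""
    compliance = ALLOWED_IN.get((ingredient, diet))
    substitutes = []
    if compliance == "False":
        relationship = ALTERNATIVE_REL.get(diet)
        substitutes = list(ALTERNATIVES.get((ingredient, relationship), []))
    return compliance, substitutes


print(check_ingredient("wheat flour", "Gluten-Free"))
print(check_ingredient("butter", "Vegan"))
```

Substitutes are only looked up for disallowed ingredients, and only along the relationship type that belongs to the requested diet, which mirrors the diet-specific nature of the edges.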

The KG database functions as a service that accepts Cypher queries and returns structured results for downstream processing. The KG’s expressiveness allows combining multiple criteria seamlessly. Compared to a keyword search, the graph query can interpret the semantic roles (e.g., inclusion vs. exclusion of ingredients, numeric comparisons on nutrients) correctly and retrieve only those recipes that satisfy all conditions. Previous work [16] has emphasized the need for handling negations and numerical comparisons in food queries. Our knowledge representation is designed to support these out of the box.

We also maintain a dictionary stored in a relational database that includes canonical names and known synonyms for ingredients, dishes, diets, and course types. The dictionary is used during the entity validation phase to ensure that the entities extracted by the LLM from the user query are correctly mapped to their corresponding nodes in the KG. The dictionary is populated automatically from the KG by extracting the node labels and types, and can be further enriched manually to capture synonyms and spelling variations.

3.3. Natural Language Understanding and Query Parsing

Users interact with DietQA through natural language questions, which can range from simple requests (“Find a chicken salad recipe with low fats”) to complex ones (“I’m looking for a vegan dessert with bananas and oat, but without any sugar or dairy”). The role of the query parsing module is to transform these free-text queries into structured constraints executable by the KG. This pipeline includes the following steps: intent and entity extraction via an LLM, normalization and semantic categorization of extracted entities, and query construction.

3.3.1. Intent and Entity Extraction via LLM

DietQA employs a prompt-engineered LLM to jointly perform intent classification and entity extraction from user queries. Given an input utterance Q, the LLM outputs:

Intent set: $I = \{i_{1}, \dots, i_{n}\}$ (e.g., include_ingredient, display_recipe)
Entity set: $E = \{e_{1}, \dots, e_{m}\}$ (dishes, ingredients, diets)

The intents and the entities are returned as structured representations in JSON format. The prompt to the LLM explicitly defines a set of supported intent types and enforces strict mappings for recognized entities. Each query is parsed, producing semantically annotated output that includes all the user’s constraints and multi-intent interpretations. This approach enables robust handling of ambiguous, informal, or linguistically varied expressions, without relying on rigid syntactic patterns or handcrafted rules. Compared to earlier BERT-based intent classifiers and NER models, the LLM provides broader generalization and cleaner integration by leveraging its pretrained language understanding capabilities and prompt conditioning.
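A minimal sketch of how such JSON output might be consumed is shown below. The JSON layout (an `intents` array of objects with `intent` and `entity` keys) is an assumption, since the exact schema is not given in the text; the intent names are those mentioned in this section, and `parse_llm_output` is a hypothetical helper.

```python
import json

# Intent names taken from the examples in this section; the full supported set is not listed.
SUPPORTED_INTENTS = {
    "include_ingredient", "exclude_ingredient", "include_course",
    "add_diet_restriction", "cal_lower_than", "prot_higher_than", "display_recipe",
}


def parse_llm_output(raw: str):
    """Parse the LLM's JSON reply and keep only intents from the supported set."""
    payload = json.loads(raw)
    parsed = []
    for item in payload.get("intents", []):
        intent = item.get("intent")
        if intent in SUPPORTED_INTENTS:
            parsed.append((intent, item.get("entity")))
    return parsed


# Assumed reply for "I want a main course without eggs and low in fat".
raw_reply = """
{"intents": [
  {"intent": "include_course", "entity": "main course"},
  {"intent": "exclude_ingredient", "entity": "eggs"},
  {"intent": "add_diet_restriction", "entity": "low fat"},
  {"intent": "order_pizza", "entity": "margherita"}
]}
"""

for intent, entity in parse_llm_output(raw_reply):
    print(intent, "->", entity)
```

Filtering against a fixed intent set is one simple way to enforce the strict mappings the prompt asks for, since any label the model invents is dropped before query construction.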

Intent Classification: Although DietQA primarily deals with recipe-finding queries, the system can handle various question types—for example, nutritional queries (“How much protein is in quinoa?”), comparison questions (“Which has more carbs, rice or pasta?”), or dietary compliance questions (“Is honey suitable for a vegan diet?”). The intent distinguishes between, for example, a direct recipe request, a nutritional fact question, a request to display a recipe list, or a follow-up question about the retrieved results. The LLM supports multi-intent parsing, allowing complex queries such as “I want a main course without eggs and low in fat” to be mapped into distinct intents like include_course, exclude_ingredient and add_diet_restriction, each with the appropriate entity values. The system can handle exclusions and negation (e.g., “without eggs”, “no peanuts”, “except for chicken”) by assigning the appropriate intent, such as exclude_ingredient. It also supports numeric comparisons (e.g., “under 500 calories”, “at least 20 g protein”) by mapping them to specific quantitative intents, such as cal_lower_than, prot_higher_than. Based on the identified intent, the system routes the query to the corresponding functional module—for example, recipe retrieval, factual answering, or context-aware follow-up handling.

Entity Extraction: The system extracts structured entities and constraints from free-form input. The LLM is prompted to identify relevant elements across multiple dimensions of recipe search, aligned with the system’s defined intents. These include ingredients to include or exclude, dietary restrictions (e.g., Gluten-Free, High-Protein), course types (e.g., Breakfast, Main Course), dish names, and constraint expressions such as calorie or nutrient limits.

Normalization and Semantic Categorization of the Entities.

The raw entities returned by the LLM are passed through a normalization and categorization layer. This component applies the following:

Exact and fuzzy string matching against the predefined tags dictionary to resolve lexical variants and synonyms (e.g., “veggie” to “vegetarian”).
Grammar-based rules to normalize the entity values to predefined grammatical number and case. Case handling is particularly important in Greek, where word forms vary significantly due to inflectional morphology.

The semantic annotation of the extracted values is validated and mapped against the entities defined in the KG, ensuring alignment with its schema. Each recognized entity is associated with a corresponding semantic category present in the graph (e.g., chicken: ingredient, vegan: diet, moussaka: dish, breakfast: course), which defines its functional role in the query structure. This mapping step enhances semantic consistency with the KG structure and contributes to accurate query construction.
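The exact-then-fuzzy lookup can be sketched with Python's standard `difflib` module. The dictionary entries, the similarity cutoff of 0.8, and the function name `normalize_entity` are illustrative assumptions; the Greek grammar-based rules for number and case are not modelled here.

```python
from difflib import get_close_matches

# Canonical name -> (semantic category, known synonyms); illustrative entries only.
DICTIONARY = {
    "vegetarian": ("diet", ["veggie"]),
    "chicken": ("ingredient", []),
    "moussaka": ("dish", ["mousaka"]),
    "breakfast": ("course", []),
}

# Surface form -> canonical name, covering canonical names and synonyms.
SURFACE_INDEX = {}
for canonical, (_category, synonyms) in DICTIONARY.items():
    SURFACE_INDEX[canonical] = canonical
    for synonym in synonyms:
        SURFACE_INDEX[synonym] = canonical


def normalize_entity(value, cutoff=0.8):
    """Map a raw entity to (canonical name, category), or None if nothing is close enough."""
    key = value.strip().lower()
    if key not in SURFACE_INDEX:
        matches = get_close_matches(key, list(SURFACE_INDEX), n=1, cutoff=cutoff)
        if not matches:
            return None
        key = matches[0]
    canonical = SURFACE_INDEX[key]
    return canonical, DICTIONARY[canonical][0]


print(normalize_entity("veggie"))  # resolved through a synonym
print(normalize_entity("chiken"))  # resolved through fuzzy matching
print(normalize_entity("xyz"))     # no sufficiently close entry
```

Returning `None` for unmatched values lets the caller decide whether to ask the user for clarification rather than guess a node.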

3.3.2. Knowledge Graph Query Construction

After receiving the parsed entities that define the query constraints, along with the user’s stored preferences, the system formulates a Cypher query using predefined templates. To support complex recipe retrieval based on user constraints, we designed a multi-phase Cypher query. The query is parameterized and capable of handling ingredient-level preferences, macronutrient filtering, dietary restrictions, and substitution logic, returning recipes that are both relevant and nutritionally compliant.

The query accepts inputs including required ($includeList) and excluded ($excludeList) ingredients, dietary restrictions ($diets), nutritional thresholds such as maximum calories ($maxCal) and minimum protein ($minProt or $minProteinPercentage), as well as recipe-level filters like dish name ($dish) and course type ($course). An additional flag, $allowAlternatives, determines whether the system should attempt to replace diet non-compliant ingredients with suitable alternatives.
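One way to picture the step from parsed intents to query parameters is the following Python sketch. The parameter names come from the text above; the mapping tables and the `build_parameters` helper are assumptions, and the dish parameter is left unset because no dish-related intent name is given in this section.

```python
# Assumed mapping from intent names to list-valued and scalar Cypher parameters.
LIST_PARAMS = {
    "include_ingredient": "includeList",
    "exclude_ingredient": "excludeList",
    "add_diet_restriction": "diets",
}
SCALAR_PARAMS = {
    "include_course": "course",
    "cal_lower_than": "maxCal",
    "prot_higher_than": "minProt",
}


def build_parameters(intents, allow_alternatives=False):
    """Fold (intent, value) pairs into a parameter dictionary for a Cypher template."""
    params = {
        "includeList": [], "excludeList": [], "diets": [],
        "maxCal": None, "minProt": None,
        "dish": None, "course": None,
        "allowAlternatives": allow_alternatives,
    }
    for intent, value in intents:
        if intent in LIST_PARAMS:
            params[LIST_PARAMS[intent]].append(value)
        elif intent in SCALAR_PARAMS:
            params[SCALAR_PARAMS[intent]] = value
    return params


intents = [
    ("include_course", "Main Course"),
    ("exclude_ingredient", "egg"),
    ("add_diet_restriction", "Low-Fat"),
    ("cal_lower_than", 500),
]
print(build_parameters(intents, allow_alternatives=True))
```

Passing values as parameters rather than splicing them into the query text keeps the template fixed, which also makes the query structure easy to hash for caching.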

The query consists of the following three main stages:

Stage 1: Recipe Retrieval via Precomputed Properties. Recipes are matched along with their ingredients, optionally filtered by dish name and course. Ingredient generalization is supported via IS_A relationships, allowing the system to account for hierarchical concepts (e.g., cheddar is a type of cheese). The filtering logic ensures that all required ingredients are present, and all excluded ingredients (or their subclasses) are omitted. Recipe-level nutritional information is retrieved along with macronutrient percentages; these values were aggregated across all ingredients when the recipe was inserted into the KG, using both declared nutrient values and estimates derived from Atwater factors (4 kcal/g for protein and carbohydrates, 9 kcal/g for fat).
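The Atwater-based estimate mentioned in Stage 1 amounts to a few lines of arithmetic. The sketch below uses the factors stated in the text; the function name and the example gram values are illustrative.

```python
# Atwater factors in kcal per gram, as stated in the text.
ATWATER = {"protein": 4.0, "carbs": 4.0, "fat": 9.0}


def energy_fractions(protein_g, carbs_g, fat_g):
    """Return (estimated kcal, percentage of energy per macronutrient)."""
    kcal = {
        "protein": protein_g * ATWATER["protein"],
        "carbs": carbs_g * ATWATER["carbs"],
        "fat": fat_g * ATWATER["fat"],
    }
    total = sum(kcal.values())
    if total == 0:
        return 0.0, {name: 0.0 for name in kcal}
    return total, {name: 100.0 * value / total for name, value in kcal.items()}


# 30 g protein, 45 g carbohydrates, 20 g fat: 120 + 180 + 180 = 480 kcal.
total_kcal, fractions = energy_fractions(30.0, 45.0, 20.0)
print(total_kcal, fractions)
```

For this example, protein contributes 25% of energy, and carbohydrates and fat 37.5% each.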

Stage 2: Diet Compliance Filtering. Dietary constraints are processed in two layers. First, macronutrient-based diets such as High-Protein and Low-Fat are applied through threshold rules (e.g., protein ≥ 20% of energy, fat ≤ 3 g or ≤30% of energy). Then, ingredient-level compatibility is checked for traditional diets like Vegan or Gluten-Free. The system supports both absolute (allowed = ‘true’) and conditional permissions (allowed = ‘conditional’) per ingredient-diet pair. Recipes are categorized as fully compliant, non-compliant, or conditionally accepted if substitutions are permitted.
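The two layers of Stage 2 could be sketched as follows. The thresholds are the ones quoted above; the basis of the 3 g fat limit is not specified in the text, so the sketch simply takes a fat value in grams as given. The categorization rule, in particular treating a recipe as conditionally accepted when every problematic ingredient is either conditional or has a substitute, is an assumption about how the three categories are derived.

```python
def is_high_protein(protein_pct):
    """Protein supplies at least 20% of energy."""
    return protein_pct >= 20.0


def is_low_fat(fat_g, fat_pct):
    """Fat is at most 3 g, or supplies at most 30% of energy."""
    return fat_g <= 3.0 or fat_pct <= 30.0


def categorize(compliance_by_ingredient, substitutes, allow_alternatives):
    """Classify a recipe for one diet from per-ingredient 'true'/'false'/'conditional' flags."""
    problematic = [name for name, flag in compliance_by_ingredient.items() if flag != "true"]
    if not problematic:
        return "fully compliant"
    # Assumed rule: each problematic ingredient must be conditional or replaceable.
    fixable = all(
        compliance_by_ingredient[name] == "conditional" or substitutes.get(name)
        for name in problematic
    )
    if allow_alternatives and fixable:
        return "conditionally accepted"
    return "non-compliant"


flags = {"wheat flour": "false", "sugar": "true"}
subs = {"wheat flour": ["oat flour"]}
print(categorize(flags, subs, allow_alternatives=True))   # conditionally accepted
print(categorize(flags, subs, allow_alternatives=False))  # non-compliant
```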

Stage 3: Ingredient-Level Analysis and Substitution. A detailed breakdown of each recipe’s ingredient composition is provided, including weight contributions, nutrient impact, and energy contribution per component. If $allowAlternatives = true, the query follows alternative relationship paths (e.g., VEGAN_ALTERNATIVE) to identify substitute ingredients that preserve diet compliance. The output includes:

Aligned diets: Dietary patterns fully satisfied by the original recipe.
Conflicting ingredients: Components that conflict with the specified dietary constraints.
Suggested substitutes: Compatible alternatives derived from substitution relationships.
Conditional ingredients: Composite foods whose dietary compliance depends on the properties of their constituent ingredients.

This query leverages graph-based reasoning over a semantically structured KG, incorporating hierarchical ingredient classification and conditional execution (via apoc.do.when) to generate interpretable and user-adaptive recipe recommendations. The design balances flexibility, semantic richness, and fine-grained filtering, making it suitable for deployment in real-world diet-aware retrieval systems.

In addition to recipe recommendations, the retrieval layer handles factual queries (e.g., “How many calories are in broccoli?”). In such cases, the system queries the relevant ingredient node and directly extracts the requested property.

3.4. Retrieval-Augmented Generation (RAG)

3.4.1. Cypher Query Execution

The query is executed against the KG to retrieve candidate answers. In recipe-finding queries, the answer candidates are a set of recipes that satisfy all the constraints. The query output includes, in addition to the recipe identifiers, key attributes necessary for formulating informative responses—such as the recipe name, energy, macronutrient values, and their percentage breakdowns. The system also retrieves supplementary data such as dietary tags, substitution mappings, and ingredient-level contributions. The results are filtered and sorted according to the query parameters. Factual queries typically return numeric values, categorical labels, or structured lists of items.

To improve performance, each Cypher query is hashed using its structure and parameters, generating a unique cache key. A fixed-size in-memory Least Recently Used (LRU) cache stores the results. When a repeated or semantically equivalent query is detected, the system retrieves the corresponding result directly from the cache, bypassing the Neo4j database and reducing response times. If no match is found, the query is executed and its result is cached for future reuse.
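A compact Python sketch of this caching scheme is given below. It is not the DietQA implementation: the class name, the SHA-256 key, and the default size are assumptions, and "semantically equivalent" is approximated only by collapsing whitespace in the query text and sorting parameter keys.

```python
import hashlib
import json
from collections import OrderedDict


class QueryCache:
    """Fixed-size LRU cache keyed by a hash of the query text and its parameters."""

    def __init__(self, max_size=128):
        self.max_size = max_size
        self._store = OrderedDict()

    @staticmethod
    def make_key(query, params):
        # Whitespace and parameter order do not change the key.
        canonical = " ".join(query.split()) + "|" + json.dumps(params, sort_keys=True, default=str)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_run(self, query, params, run):
        key = self.make_key(query, params)
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        result = run(query, params)       # cache miss: execute against the database
        self._store[key] = result
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result


calls = []


def fake_run(query, params):
    """Stand-in for a database call; records each execution."""
    calls.append(query)
    return [{"recipe": "example"}]


cache = QueryCache(max_size=2)
cache.get_or_run("MATCH (r:Recipe)  RETURN r", {"maxCal": 500, "course": "Dessert"}, fake_run)
cache.get_or_run("MATCH (r:Recipe) RETURN r", {"course": "Dessert", "maxCal": 500}, fake_run)
print(len(calls))  # the second call is served from the cache
```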

3.4.2. Post-Retrieval Analysis and Nutritional Clustering of Recipes

After retrieving the query results, the system performs post-processing to compute structured insights such as calorie ranges, macronutrient distributions, and nutritional breakdowns by percentage. It also analyses recipe metadata, examining the distribution across meal types, the frequency of common and rare ingredients, and the degree of overlap with the user’s available pantry items.

To identify groups of nutritionally similar recipes, we apply K-Means clustering, which partitions data into k groups by iteratively assigning each recipe to the nearest centroid and updating centroids to minimize within-cluster variance. Recipes are represented as a seven-dimensional feature vector (total calories; grams of protein, fat, and carbohydrates; and the percentage of energy from each macronutrient). Features are z-score normalized, that is, rescaled so that each has mean 0 and standard deviation 1. A z-score indicates how many standard deviations $\sigma$ a data point $x$ deviates from the mean $\mu$, calculated as $z = (x - \mu)/\sigma$. This ensures comparability across scales and reduces skew, making clusters closer to spherical in practice. The optimal number of clusters k is chosen by maximizing the silhouette score, which measures both cohesion within clusters and separation between them (higher values indicate better-defined groups). We use K-Means with k chosen via silhouette-based optimization because (i) centroids are directly interpretable as nutritional archetypes that support filtering, explanations, and preference-aware ranking against user macro goals (e.g., “low fat”, “high protein”); (ii) hard assignments yield clear groups that integrate cleanly with our downstream logic; and (iii) the method’s speed and stability suit an interactive pipeline. For each cluster we compute centroid summaries (the average calories and macronutrient percentages across all recipes in the cluster, together with their minimum and maximum values) so that each cluster can be presented as a human-readable nutritional archetype. This design favours interpretability, determinism, and low latency.
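The procedure can be sketched in standard-library Python as follows. This is a didactic sketch rather than the system's code: the farthest-point initialization is an assumption chosen to make the run deterministic, the example uses two features instead of seven for brevity, and a production pipeline would more likely rely on a numerical library.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Rescale every feature column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # guard against constant columns
    return [[(x - mu) / sigma for x, mu, sigma in zip(row, mus, sigmas)] for row in rows]


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with deterministic farthest-point initialization (assumed)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            updated.append([mean(col) for col in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean(own)
        b = min(
            mean(math.dist(p, q) for q, label in zip(points, labels) if label == other)
            for other in clusters if other != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(scores)


def choose_k(points, candidates):
    """Pick the candidate k with the highest silhouette score."""
    return max(candidates, key=lambda k: silhouette(points, kmeans(points, k)[0]))


# Two well-separated toy groups in two features.
raw = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
scaled = zscore_columns(raw)
best_k = choose_k(scaled, [2, 3])
print(best_k, kmeans(scaled, best_k)[0])
```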

Once recipes are assigned to clusters, we compute descriptive statistics for each group, including mean, minimum, and maximum values across all nutritional features. These summaries yield interpretable nutritional profiles for each cluster and are used downstream to rank or filter recipe groups according to user-defined macronutrient preferences. The extracted information feeds into the RAG-based Response Generation Module to enhance its ability to generate grounded and context-aware answers.
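Given cluster assignments, the per-cluster summaries and a preference-based ordering might look like the sketch below. The field names, the sample recipes, and the ranking by a single feature mean are illustrative assumptions; the text does not specify the exact scoring used to rank clusters.

```python
from statistics import mean


def summarize_clusters(recipes, labels, fields=("kcal", "protein_pct", "fat_pct")):
    """Mean, minimum, and maximum of each feature per cluster, plus cluster size."""
    groups = {}
    for recipe, label in zip(recipes, labels):
        groups.setdefault(label, []).append(recipe)
    summary = {}
    for label, members in groups.items():
        summary[label] = {
            field: {
                "mean": mean(r[field] for r in members),
                "min": min(r[field] for r in members),
                "max": max(r[field] for r in members),
            }
            for field in fields
        }
        summary[label]["size"] = len(members)
    return summary


def rank_clusters(summary, field, higher_is_better=True):
    """Order cluster labels by the mean of one feature (a simplified ranking rule)."""
    return sorted(summary, key=lambda label: summary[label][field]["mean"], reverse=higher_is_better)


recipes = [
    {"kcal": 400, "protein_pct": 35, "fat_pct": 20},
    {"kcal": 500, "protein_pct": 30, "fat_pct": 25},
    {"kcal": 700, "protein_pct": 12, "fat_pct": 45},
    {"kcal": 650, "protein_pct": 15, "fat_pct": 40},
]
labels = [0, 0, 1, 1]
summary = summarize_clusters(recipes, labels)
print(rank_clusters(summary, "protein_pct"))                      # high protein first
print(rank_clusters(summary, "fat_pct", higher_is_better=False))  # low fat first
```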

3.4.3. Retrieval-Augmented Response Generation

The final stage of the DietQA pipeline involves generating human-readable, personalized answers that explain and justify the retrieved results in relation to the user’s initial query, nutritional preferences, and filtering constraints. The RAG-based Response Generation Module is responsible for producing natural language answers based on the results returned from the KG and the user’s original query. It interfaces with the Ollama LLM inference server to formulate prompt-based calls that include the user query, relevant recipe results, structured insights, and system guidelines. Depending on the context, prompts are tailored to explain the result set, answer factual or follow-up user questions, or provide recipe suggestions in a human-readable format. The module ensures that all responses are grounded in the retrieved data, in line with the RAG approach.

The system provides structured explanations at multiple levels. First, it clarifies the search context, detailing how the user’s query was interpreted in terms of dietary constraints, filtering parameters, and ingredient exclusions. Second, it presents aggregated insights, including statistical distributions, nutritional clusters, and high-level patterns across retrieved recipes. Finally, it offers fine-grained explanations for individual recipes, covering their nutritional profiles, ingredient composition, dietary compatibility, and relevant substitutions.

Search Context and Results Explanation.

Once candidate recipes have been retrieved, the system initiates a multi-step interaction that guides the user through the exploration of results. The first step is the explanation of search terms and user preferences, where the system explicitly articulates, in natural language, the active search parameters such as dietary restrictions, macronutrient targets, ingredient exclusions or pantry limitations, and any selected meal type or dish category. This ensures that the user is aware of how their input has been interpreted and sets a clear context for subsequent interaction.

Based on the retrieved candidate recipes, the system generates an aggregated explanation of the results and their statistics in the context of the user’s goals, describing not only how the retrieved recipes satisfy the initial criteria, but also how they compare across different nutritional axes. The system presents a detailed explanation of the results, focusing on how they relate to the user’s specified ingredients, preferences, and nutritional constraints. This explanation serves both as a summary of the search and as an interpretive guide for understanding the suitability of the retrieved recipes. In addition to nutritional trends, the system incorporates metadata derived from the analysis of the result set. This includes identifying the distribution of recipes across different types of meals and highlighting the most and least commonly used ingredients within the result set. It also evaluates how many of the ingredients overlap with those already available in the user’s pantry. These insights help users assess both the nutritional relevance and the practical accessibility of the result set before proceeding to recipe-level decisions or additional filtering.

To further support user understanding, the system leverages the recipe clustering output to generate human-readable explanations for each nutritional group, highlighting how the aggregate characteristics of these clusters align with the user’s stated preferences and search intent. Instead of exposing users to raw statistical data of clusters, the system converts the macronutrient profiles into concise, interpretable descriptions that clearly indicate how well each cluster satisfies or deviates from the user’s criteria, such as prioritizing high protein content or reducing fat intake. These explanations are integrated into the filtering process, helping users quickly assess which clusters are most compatible with their search constraints. By translating nutritional patterns into clear, actionable categories, the system supports the user in making informed decisions.

Consider a user search, “I’m looking for a dish with chicken and tomatoes”, where the user’s preferences indicate high protein, low fat and carbohydrates, and a small number of ingredients. Additionally, the user has specified dislikes for artichokes, broad beans, okra, and mushrooms. Table 2 illustrates how DietQA explains and contextualizes the search context and the retrieved results by aligning them with the user’s query, dietary preferences, and nutritional goals, offering structured summaries and nutritional groupings to support informed decision-making.

Personalized Results Display and Justification.

When results are finally displayed, each recipe is presented alongside a short, natural-language explanation of why it was selected and how it meets (or partially meets) the user’s query and preferences. This includes a per-recipe breakdown of nutritional values and ingredient matches, as well as dietary compatibility. Users also have the option to switch to an available ingredient ranking, where recipes are sorted based on their compatibility with pantry ingredients, enabling practical decision-making rooted in ingredient availability.

A notable feature of DietQA is the use of the KG to retrieve explanatory context alongside the answer. When formulating a response, particularly in the case of recommendations, it is important to explain why a recipe meets the specified criteria. The system retrieves contextual data such as which ingredients contributed to the recipe’s eligibility and any associated dietary annotations.

For example, if a recipe is recommended as suitable for a vegan diet, the KG contains relationships such as (:Recipe)-[:CONTAINS]->(:Ingredient)-[:ALLOWED_IN]->(:Diet {name: $Diet}) for each ingredient in the recipe. This allows the system to justify the result with a statement like: “This recipe is vegan because it includes only vegan-compliant ingredients.” Similarly, based on the query’s nutritional calculations, the system can explain that a recipe is labelled “High-Protein” because the calories contributed by protein exceed 20% of the total—providing a clear, data-driven rationale for the classification. Such explanations enhance both transparency and educational value for the user.

In addition to dietary compliance, the system can also explain how a recipe is adapted to meet dietary requirements through ingredient substitutions. When a recipe includes an ingredient that is incompatible with a specific diet, the KG is used to identify suitable alternatives via relations such as (:Recipe)-[:CONTAINS]->(A:Ingredient)-[:GLUTEN_FREE_ALTERNATIVE]->(B:Ingredient). The recipe’s revised alignment with the diet, after one or more substitutions, is then explained with contextual justifications—for example: “This recipe includes wheat flour, which is not gluten-free, but it can be substituted with rice flour to comply”. By presenting such substitutions directly within the personalized results, DietQA supports flexible, user-aligned recommendations while clearly communicating how each adaptation impacts dietary compatibility.

If the KG lacks the required information (e.g., a nutrient value is missing), the system will either indicate that it could not find an answer or fall back to the LLM’s general knowledge. In the latter case, DietQA explicitly flags that the answer is not grounded in the KG and advises the user to verify it with a trusted source.

Consider the previous example with the user search, “I’m looking for a dish with chicken and tomatoes,” and the associated preferences for high protein, low fat and carbohydrates, minimal ingredients, and exclusions such as okra and mushrooms. The results are presented in an explanatory context that evaluates how well each retrieved recipe aligns with these constraints (Table 3).

Context-Aware Dialogue Management.

When the user poses a question about the results, the system forwards the query—together with the retrieved recipes, structured insights, and metadata—to the LLM, enabling context-aware and data-grounded response generation. This mechanism supports conversational continuity by maintaining awareness of previously mentioned items. For example, if the user asks, “What are the nutritional values of these recipes?”, the system resolves the reference to the last results. By default, all follow-up references bind to the most recent result set. The relevant data retrieved from the KG are used by the LLM to generate a new response. To maintain coherence across turns, a running dialogue summary is dynamically updated and rewritten at each step, ensuring that the model has access to the most recent interaction history without exceeding context limits. The system uses a summary of previous turns and re-queries the KG as needed at each step to respond to follow-up questions.

To preserve coherence in multi-turn dialogue, DietQA resolves contradictory constraints deterministically. When a user introduces a new constraint that conflicts with an earlier one for the same canonical entity, the latest input overrides the prior setting after normalization (e.g., a previous exclude eggs is replaced by include eggs). If a single utterance simultaneously includes and excludes the same entity (e.g., “cake with eggs excluding eggs”), the parser flags the inconsistency and the dialogue manager pauses execution to request clarification before any KG query is issued.
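These two rules, latest input wins across turns and same-utterance contradictions trigger clarification, can be sketched as follows. The state representation (a mapping from canonical entity to "include" or "exclude"), the exception class, and the function name are assumptions made for illustration.

```python
class ConstraintConflict(Exception):
    """Raised when one utterance both includes and excludes the same entity."""


def merge_turn(state, turn):
    """Apply one utterance's (action, entity) pairs to the dialogue state.

    Entities are assumed to be normalized already. A contradiction inside the
    utterance raises ConstraintConflict; otherwise the new values override
    any earlier setting for the same entity.
    """
    seen = {}
    for action, entity in turn:
        if entity in seen and seen[entity] != action:
            raise ConstraintConflict(entity)
        seen[entity] = action
    updated = dict(state)
    updated.update(seen)
    return updated


state = {"eggs": "exclude", "sugar": "exclude"}
state = merge_turn(state, [("include", "eggs")])
print(state)  # the earlier exclusion of eggs is replaced

try:
    merge_turn(state, [("include", "eggs"), ("exclude", "eggs")])
except ConstraintConflict as conflict:
    print("clarification needed for:", conflict)
```

Because the conflict is detected before the state is touched, no KG query is issued with a contradictory constraint set.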

3.5. User Decision and Adaptive Flow

Based on the retrieved data, filter controls are dynamically generated and displayed. These controls reflect the actual variation in recipe data present in the current result set and allow the user to refine the search further. Filters include toggles, sliders or selection prompts for additional dietary preferences, specific ingredient inclusion/exclusion, recipe courses, or nutritional data boundaries. The system only presents filters that are meaningful given the returned results, avoiding irrelevant or redundant options. These filters are designed to complement, rather than replace, the ongoing dialogue with the system. Users can open them in a separate, interactive modal panel to explore available refinement options visually and selectively. Each user action on the filter controls is followed by a system message that explains how the new criteria have affected the results.
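The idea of showing only meaningful filters can be illustrated with a short sketch that derives controls from the variation actually present in the result set. The field names and the control descriptors are hypothetical; the real front end and its data model are not described at this level of detail.

```python
def available_filters(recipes, numeric_fields=("kcal", "protein_g")):
    """Offer a control only when the result set actually varies on that attribute."""
    if not recipes:
        return {}
    filters = {}
    courses = sorted({recipe["course"] for recipe in recipes})
    if len(courses) > 1:
        filters["course"] = {"type": "select", "options": courses}
    for field in numeric_fields:
        values = [recipe[field] for recipe in recipes]
        low, high = min(values), max(values)
        if low < high:
            filters[field] = {"type": "slider", "min": low, "max": high}
    return filters


results = [
    {"course": "Main Course", "kcal": 400, "protein_g": 30},
    {"course": "Main Course", "kcal": 550, "protein_g": 30},
]
# Only calories vary here, so only a calorie slider is offered.
print(available_filters(results))
```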

At every stage, users retain the freedom to engage with these filters directly or continue the interaction solely via natural language. For example, following the results explanation, the system offers suggestions to the user—such as indicating which nutritional group best matches the query—and then prompts for input on whether to request further clarification or proceed directly to viewing the recipes. This decision point ensures that users retain full control over the interaction, with the flexibility to either explore the reasoning behind the suggestions, ask follow-up questions about the result set, or continue by examining the top-ranked recipes.

3.6. DietQA System Pipeline Algorithm

The pseudocode in Algorithm 1 summarizes the main execution flow of the DietQA system, integrating structured KG access, natural language query understanding, and response generation through a retrieval-augmented language model.

Algorithm 1. DietQA System Pseudocode

Input: q (query), u (user profile)
Output: R (ranked recipes) OR A (factual answer)

session ← InitializeSession(u)
profile ← LoadUserProfile(u)
dictionary ← LoadDictionary()
repeat
    hardConstraints ← ∅
    repeat
        if hardConstraints = ∅ then
            // Extract and Normalize
            intent, entities ← LLM_ExtractIntentEntities(q)
            entities ← NormalizeAndValidate(entities, dictionary)
            // Build Query
            hardConstraints ← MapToConstraints(intent, entities, profile)
            softPreferences ← MapToPreferences(profile)
            cypherQuery ← BuildQuery(intent, hardConstraints)
        end if
        // Execute
        results ← ExecuteQuery(cypherQuery)
        if results.type = “recipes” then
            R ← results.recipes
            // Analyze and Rank
            stats ← ComputeStats(R)
            clusters ← ClusterByNutrition(R)
            for each item in [clusters, R] do
                scores[item] ← ScoreItem(item, profile)
            end for
            rankedClusters ← Sort(clusters, scores)
            rankedRecipes ← Sort(R, scores)
            // Display and Interact
            repeat
                Display(rankedRecipes, stats)
                action ← GetUserAction()
                switch action.type
                    case “display_recipes”:
                        ShowRecipeCards(rankedRecipes, profile)
                    case “soft_filter”:
                        rankedRecipes ← Rerank(rankedRecipes, action.data)
                    case “hard_filter”:
                        hardConstraints ← Update(hardConstraints, action.data)
                        break
                    case “new_query”:
                        q ← action.query
                        hardConstraints ← ∅
                        break
                end switch
            until action.type ∈ {“hard_filter”, “new_query”, “exit”}
        else // factual answer
            A ← GenerateAnswer(results.answer, profile)
            Display(A)
        end if
    until hardConstraints ≠ ∅ or results.type = “factual”
until UserExit()
return {lastResults: R OR A, profile: profile}

3.7. Computational Complexity and Scalability

Setup and notation.

In the deployed configuration, each :recipe node stores precomputed nutrition totals (calories; grams of protein, fat, carbohydrates) and macro percentages. Recipes are connected to ingredients via CONTAINS relationships. Let

$N_{recipes}$ : total recipes in the KG
$R$ : candidates after property filtering (e.g., dish, course, numeric thresholds)
$s = R / N_{recipes}$ : selectivity of indexed predicates
$D$ : number of requested diet labels in the query
$\bar{i}$ : average ingredients per recipe
$m = |Inc| + |Exc|$ : include/exclude tokens
$c$ : ingredient category tokens (e.g., “meat”)
$h$ : taxonomy depth for IS_A (bounded, typically $\leq 2$ )
$b$ : branching factor of *_ALTERNATIVE edges; $L \leq 2$ : maximum alternative path length
$n$ : recipes passed to analysis/display
$T$ : LLM tokens (prompt + context)

Complexity Alignment with Query Execution.

The complexity analysis mirrors the Cypher query, which comprises three parts: Part A performs retrieval with include/exclude handling, Part B applies diet filtering, and Part C generates explanations and, optionally, searches for alternatives. For clarity, we describe four logical stages that map to these parts. Stage 1 (Part A, early) applies property-based retrieval and numeric filters over indexed recipe attributes, executing index seeks and range scans with time and memory bounded by $O(s \cdot N_{recipes}) \approx O(R)$. Stage 2 (Part A, mid) enforces ingredient include/exclude tokens and category terms by traversing CONTAINS edges and performing hash-set membership tests, while expanding categories via a bounded IS_A hierarchy of depth $h$. The time bound is $O(R \cdot \bar{i}) + O(m) + O(c \cdot h)$. Stage 3 (Part B) validates diet compliance by checking each candidate recipe’s ingredients against the requested diet labels via ALLOWED_IN relations, with cost $O(R \cdot \bar{i} \cdot D)$. Stage 4 (Part C, optional) explores diet-typed *_ALTERNATIVE edges up to depth $L$ with branching factor $b$; the worst-case addition is $O(b^{L})$ per ingredient, i.e., $O(\bar{i} \cdot b^{L})$ per recipe.
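The bounded alternative search of Stage 4 can be illustrated with a small breadth-first traversal. The sketch below is an assumption-laden stand-in for the graph query, not the Cypher used by DietQA: the function name `find_substitute`, the in-memory adjacency dictionary `alternatives`, and the `allowed` set (ingredients permitted by the requested diet) are all hypothetical. It shows the two controls named in the text, a depth limit $L$ and short-circuiting on the first valid substitute.

```python
from collections import deque


def find_substitute(ingredient, alternatives, allowed, max_depth=2):
    """Return the first diet-compliant substitute within max_depth hops, else None."""
    queue = deque([(ingredient, 0)])
    seen = {ingredient}
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # bounded traversal: never expand past L hops
        for alt in alternatives.get(node, ()):
            if alt in seen:
                continue
            if alt in allowed:
                return alt  # short-circuit on the first valid substitute
            seen.add(alt)
            queue.append((alt, depth + 1))
    return None


# Hypothetical substitution edges for a dairy-free request.
edges = {"butter": ["margarine", "ghee"], "margarine": ["olive oil"]}
substitute = find_substitute("butter", edges, allowed={"olive oil"})
```

With branching factor $b$ and depth limit $L$, the loop visits at most on the order of $b^{L}$ nodes per ingredient, which is the quantity that appears in the Stage 4 bound.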

Post-retrieval analysis and LLM stages.

For the displayed set of $n$ recipes, z-score normalization over a fixed 7-dimensional feature vector costs $O(n)$, and K-Means with small $k$ and standard iterations is near-linear in $n$. Intent/entity extraction and grounded answer generation by the LLM scale with the number of tokens $T$, i.e., $O(T)$, and are independent of the graph size.
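As a brief illustration of why the normalization step is linear in $n$, the following sketch standardizes each feature column with two passes over the data. It uses only the Python standard library; the function name `zscore_columns` is hypothetical, the example uses two features instead of seven for brevity, and mapping constant columns to zero is an assumption made here to avoid division by zero.

```python
from statistics import fmean, pstdev


def zscore_columns(rows):
    """Standardize each column of a list of equal-length feature vectors."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]   # one pass per column
    stds = [pstdev(col) for col in columns]   # population standard deviation
    return [
        # Constant columns (std == 0) are mapped to 0.0 by assumption.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Two recipes described by (calories, fat %) as a toy feature vector.
normalized = zscore_columns([[300.0, 20.0], [500.0, 20.0]])
```

For a fixed number of features, the work is proportional to the number of recipes, consistent with the $O(n)$ term above.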

Total per-query cost.

Let $1_{inc}, 1_{alt}, 1_{exp} \in \{0,1\}$ indicate the presence of include/exclude terms, alternatives, and explanations, respectively. A compact upper bound is:

$$T_{query} = O(s \cdot N_{recipes}) + O(R \cdot \bar{i} \cdot D) + 1_{inc} \cdot [O(R \cdot \bar{i}) + O(m) + O(c \cdot h)] + 1_{exp} \cdot O(n \cdot \bar{i} \cdot \log \bar{i}) + 1_{alt} \cdot O(R_{alt} \cdot \bar{i} \cdot b^{L}) + O(T),$$

where $R_{alt} \leq R$ is the number of recipes that actually enter the alternatives step.
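To see how the individual terms of the bound compare, the sketch below evaluates each one for a set of parameter values. It is purely illustrative: the function name `query_cost_terms` is hypothetical, hidden constants are dropped, the logarithm is taken base 2 (asymptotic notation does not fix the base), the index-retrieval term is expressed directly as $R$ using $O(s \cdot N_{recipes}) \approx O(R)$, and the numbers in the usage example are invented rather than measured.

```python
import math


def query_cost_terms(r, i_bar, d, m=0, c=0, h=2, n=0, r_alt=0, b=0, depth=2,
                     tokens=0, inc=False, exp=False, alt=False):
    """Return the per-term operation counts of the T_query bound (constants dropped)."""
    return {
        "index_retrieval": r,                                   # O(s * N_recipes) ~ O(R)
        "diet_validation": r * i_bar * d,                       # O(R * i * D)
        "include_exclude": (r * i_bar + m + c * h) if inc else 0,
        "explanations": (n * i_bar * math.log2(i_bar)) if exp else 0,
        "alternatives": (r_alt * i_bar * b ** depth) if alt else 0,
        "llm": tokens,                                          # O(T)
    }


# Invented values: 200 candidates, 8 ingredients per recipe, 2 requested diets.
terms = query_cost_terms(r=200, i_bar=8, d=2, m=3, c=1, h=2, n=10,
                         r_alt=50, b=3, depth=2, tokens=2000,
                         inc=True, exp=True, alt=True)
```

In this invented setting the alternatives term (50 · 8 · 3² = 3600) is the largest graph-side contribution even though only a quarter of the candidates enter that step, which illustrates why bounding $L$ and short-circuiting matter.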

Scalability considerations.

As the corpus and query complexity grow, per-query latency and throughput evolve predictably under these bounds. With respect to corpus size and selectivity ($N_{recipes}, s$), indexed property retrieval scales as $O(s \cdot N_{recipes}) \approx O(R)$, while very broad queries ($s \to 1$) approach $O(N_{recipes})$. For the ingredient inventory ($\bar{i}$), relationship-based steps scale with the average ingredients per recipe: include/exclude checks at $O(R \cdot \bar{i})$ and diet validation at $O(R \cdot \bar{i} \cdot D)$; normalization and de-duplication at ingest help keep $\bar{i}$ stable. Regarding the diet taxonomy ($D$), per-query cost grows linearly with the number of requested diets through the $O(R \cdot \bar{i} \cdot D)$ term and is independent of the total number of supported diet types.

For category constraints ($c, h$), category tokens (e.g., “meat”) introduce a bounded preprocessing cost $O(c \cdot h)$ for IS_A expansion (typically $h \leq 2$); subsequent traversal cost follows the include/exclude bound. In the substitution graph ($b, L$), the optional alternative search contributes $O(R_{alt} \cdot \bar{i} \cdot b^{L})$ in the worst case; edge curation, bounding $L \leq 2$, and short-circuiting on the first valid substitute keep this term controlled. For the analysis set size ($n$), post-retrieval normalization is $O(n)$ and clustering remains near-linear in $n$ with a fixed, low-dimensional feature vector and small $k$. The LLM component scales with tokens $O(T)$ and is decoupled from KG size; summarizing and de-duplicating evidence bounds $T$ as data grow.

In terms of memory and streaming, Stages 1–3 stream over candidates with memory $O(R)$; per-recipe lists for explanations/substitutions are materialized only when requested, adding $O(n \cdot \bar{i})$. Along the ingest/update path, precomputing recipe-level nutrition and macro shares costs $O(\bar{i})$ per recipe, while diet alignment is computed at query time via recipe–ingredient–diet relationships, consistent with the $O(R \cdot \bar{i} \cdot D)$ bound. For operational scaling, throughput improves with read replicas for query traffic and application-level caching for repeated interactions; consistent indexing, bounded traversal depths, and back-pressure/timeout guards help maintain service-level objectives as $N_{recipes}$, $\bar{i}$, and concurrency increase.

In practice, well-specified queries with selective predicates exhibit stable, near-linear behaviour in $R$ and $\bar{i}$, while more complex interactions (many diets $D$, category expansions, or substitutions) degrade predictably under the stated bounds and remain tractable under the configured limits. Scalability is maintained because per-query cost grows with the candidate set $R$ rather than the total corpus, while indexing, read replicas, caching, and bounded traversal depths keep latency stable as $N_{recipes}$ and concurrency increase.

4. User Interface Design and Usability

A primary objective of DietQA is to make advanced AI capabilities accessible to non-technical users through a user interface (UI) that emphasizes clarity, control, and responsiveness. The interface was designed to facilitate intuitive interaction, support complex multi-constraint filtering, and provide visual feedback that enhances user trust and engagement.

4.1. UI Elements

The frontend of DietQA is implemented as a mobile-friendly web application that provides an intuitive, chat-driven user experience. The main UI elements include:

Preferences Panel: Users can customize their dietary preferences using a dedicated panel, as illustrated in Figure A1. This includes toggling nutrient emphasis (e.g., favouring high protein or low fat), specifying liked and disliked ingredients, indicating forbidden ingredients (e.g., due to allergies or strict dietary rules), and listing which ingredients are currently available in their pantry. Preferences are input via dynamic search fields and visual tags, with each category colour-coded for clarity. These settings inform subsequent queries, ensuring that recommendations align with the user’s goals and constraints.
Chat Interface: This is the core interaction space where users engage with DietQA through natural language. The interface acts like a messaging app, with user queries and system replies displayed in conversational bubbles (Figure A2). System responses go beyond plain text, incorporating interactive visual elements such as sliders, filter controls, nutrient visualization charts, and recipe images—enhancing user engagement and understanding.
Filters Panel: Accessible directly from the chat interface, the Filters Panel offers a toolkit for refining search results in real time across nutritional, categorical, and compositional filters (Figure A3). It supports diverse interaction types, including an ingredient tag cloud, drag-and-drop gestures, option selections, and sliders, organized into Ingredients, Dish name, Meal type, Diet, and Nutritional Sliders tabs.
Result Recipe Cards: Recommended recipes are initially presented as conversational messages. Optionally, they can be displayed in a structured list format enriched with dietary tags, key ingredient highlights, and graphical representations of macronutrient distribution (Figure A3). This visual enhancement helps users quickly compare nutritional profiles and suitability to their preferences. Selecting a result expands the full recipe view, including ingredients and dietary compatibility checks.
Recipe Card Detail View: When a recipe is selected, a detailed card opens displaying the dish image, basic metadata, per-100 g nutritional information, and diet compatibility. A breakdown of ingredient-level contributions shows how each item affects calories and macronutrients. Users can also view pantry matches and ranking analytics. Visual elements like nutrient bars and expandable ingredient details enhance understanding and support informed decision-making.

4.2. Conversational Interaction

The primary interface is conversational, allowing users to interact naturally by asking questions as they would to a human (Figure A2). This design leverages the familiarity of messaging apps, using a clear prompt area (“Ready to find some recipes! What are you looking for?”) to encourage free-form queries. Responses are presented in friendly, context-aware chat bubbles, fostering an approachable atmosphere and lowering the intimidation of interacting with a complex system. During backend processing, the system displays a “Thinking…” indicator to maintain responsiveness. This interaction style aligns with Human–Computer Interaction principles, which show that conversational agents reduce cognitive load, increase engagement, and improve accessibility—especially in health-related contexts where users may lack technical expertise.

Users are free to engage with the system in a flexible, non-linear manner. They can pose a new query at any point, refine their request through natural language, ask follow-up questions about the returned recipes, initiate an entirely new search, or shift the conversation to a nutritional or dietary topic—all without needing to follow a predefined interaction path. This open-ended design supports diverse user goals and encourages exploration, ensuring that users retain full control over how they navigate and interact with DietQA.

4.3. Visual Query Feedback

Immediately after a query is processed, the system presents the search terms as colour-coded pills (Figure 4) alongside the system’s response, both rendered within the same dialogue bubble to maintain a unified, conversational interface. The pills follow a consistent visual scheme: dish names appear in blue, included ingredients in green, exclusions in red, and dietary tags in grey. Each pill can be removed with a single action, enabling users to refine their query without re-entering text. Above the results, the system displays a concise natural language summary that describes both the user’s original query and their active dietary preferences. This allows users to quickly validate what the system has interpreted and is searching for on their behalf. Within the system’s response, key search terms, such as ingredients, exclusions, or dietary tags, are also visually highlighted (e.g., “The search for cookies based on your preferences brought up 4 recipes, which are mostly Gluten-Free”). This technique of echoing and emphasizing the user’s input within the answer serves as a form of confirmation feedback, reinforcing that the system has correctly understood and applied the intended constraints.

4.4. Filter Controls and Exploratory Refinement

Beyond the chat interface, DietQA includes a structured filter panel that enables detailed exploratory refinement across the following dimensions: Ingredients, Dish Category, Course Type, Diet, Calories, Macronutrients, and the number of ingredients per recipe (Figure A3). These filters are dynamically populated based on the current result set and support real-time feedback as users adjust them.

Ingredients Tab: Displays a tag cloud of all ingredients found in the retrieved recipes, with font size indicating their frequency. This visualization helps users quickly grasp which ingredients are most common in the current result set. To refine the query, users can drag an ingredient to a “cooking pot” icon to explicitly require it, or to a “trash” icon to exclude it from the results. This gesture-based interaction offers an intuitive way to adjust filters, immediately triggering a modified query execution.

Dish Category, Course Type: These two tabs present categorical facets—such as cookies, soups, or breakfast—as clearly labelled checkbox options. Each category is accompanied by the number of recipes it appears in, providing immediate feedback on its relevance to the current result set. Users can select or deselect categories to filter recipes in real time, allowing for quick exploration and refinement based on dish category or course type.

Diet Tab: Allows for the selection of dietary tags such as Dairy-Free or Vegan. Each tag is accompanied by the number of recipes that match it in the current result set. Only diets present in the retrieved results are displayed, reducing cognitive load and preventing zero-result interactions.

Nutritional Sliders Tab: Offers interactive sliders for Calories, Fat %, Protein %, Carbs %, and total Ingredient Count. The available ranges are automatically computed from the current result set. Any adjustment triggers instant re-rendering of both the filtered results and the sliders themselves. Users can iteratively narrow their constraints and immediately observe the updated number of matching recipes and their corresponding aggregated nutritional properties.
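The way these tabs are populated from the current result set can be sketched as a single aggregation pass. The example below is a simplified illustration under stated assumptions: the function name `build_filters` and the recipe dictionary keys (`diets`, `ingredients`, `calories`, `fat_pct`, `protein_pct`, `carbs_pct`) are hypothetical, and hiding a slider when all values are identical is one possible reading of showing only meaningful filters.

```python
from collections import Counter


def build_filters(recipes):
    """Derive diet counts, ingredient frequencies, and slider ranges from results."""
    diet_counts = Counter(tag for r in recipes for tag in r["diets"])
    ingredient_counts = Counter(ing for r in recipes for ing in r["ingredients"])
    ranges = {}
    for key in ("calories", "fat_pct", "protein_pct", "carbs_pct"):
        values = [r[key] for r in recipes]
        # Assumption: a slider is only useful if the values actually vary.
        if values and min(values) < max(values):
            ranges[key] = (min(values), max(values))
    return {
        "diets": dict(diet_counts),              # only diets present in the results
        "ingredients": dict(ingredient_counts),  # frequencies can drive tag-cloud font size
        "ranges": ranges,
    }


results = [
    {"diets": ["Vegan", "Dairy-Free"], "ingredients": ["oats", "banana"],
     "calories": 320, "fat_pct": 20, "protein_pct": 10, "carbs_pct": 70},
    {"diets": ["Dairy-Free"], "ingredients": ["oats", "egg"],
     "calories": 410, "fat_pct": 20, "protein_pct": 25, "carbs_pct": 55},
]
filters = build_filters(results)
```

Because every option is derived from recipes that are already in the result set, selecting any single displayed diet tag cannot lead to an empty result.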

The filter panel complements the conversational interface by providing an intuitive, real-time mechanism for refining searches through dynamically generated controls, enabling users to adjust constraints efficiently and immediately see the impact on the results; this empowers users to stay in control of the search process and promotes confidence in the system’s responsiveness and transparency.

4.5. Accessibility and Responsiveness

Because many users are likely to use a diet application on their smartphones, the UI is mobile-responsive and follows accessibility standards, including high-contrast colours, scalable fonts, and large touch targets. The chat bubbles are sized for readability on small screens, and buttons and toggles are touch-friendly.

In conclusion, DietQA’s UI is crafted to balance advanced backend capabilities with a clear, user-friendly experience. By supporting natural language interaction, offering intuitive visual feedback, and allowing users to personalize their dietary preferences, the interface ensures usability and engagement. A well-designed UI enhances the overall experience and fosters long-term adoption, encouraging users to return to DietQA for recipe discovery. The Evaluation section (Section 6) examines how these design choices translate into real-world performance and user satisfaction.

5. System Implementation

DietQA is implemented as a modular client–server architecture, optimized for responsiveness, modularity, and secure on-premise operation. The system integrates a Neo4j-based KG for structured reasoning, a quantized LLM server for natural language understanding and response generation, and a web-based frontend for user interaction.

5.1. Backend and Data Infrastructure

The backend is implemented in Python 3.12, using a modular architecture composed of orchestrated services. At its core is a Neo4j graph database, which stores recipes, ingredients, nutritional data, diet constraints, and substitution relationships. All Cypher queries are issued via a custom Information Retrieval Module, which builds multi-stage queries incorporating filtering logic, nutrient aggregation, diet compliance checks, and substitution detection.

The KG is populated automatically using a web scraping and NLP pipeline. Recipes are collected from public Greek-language websites using Selenium for browser automation and BeautifulSoup for HTML parsing. Recipe titles, ingredient lists, and images are extracted. The ingredient lines are further processed using spaCy.

In parallel, a MySQL relational database holds user-specific static data such as dietary profiles and pantry contents. It also stores a structured dictionary for entity normalization.

Query results are cached in memory using a Python-level fixed-size Least Recently Used (LRU) cache, reducing redundant database access. Queries are identified by hashing their Cypher structure and parameters.
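A fixed-size LRU cache keyed by a hash of the query text and parameters can be sketched as follows. This is a minimal illustration rather than the DietQA code: the class name `QueryCache`, the whitespace normalization of the Cypher string, the use of SHA-256 over a JSON payload, and the `run` callback standing in for the database call are all assumptions.

```python
import hashlib
import json
from collections import OrderedDict


class QueryCache:
    """Fixed-size LRU cache keyed by a hash of Cypher text and parameters."""

    def __init__(self, max_size=256):
        self.max_size = max_size
        self._store = OrderedDict()

    @staticmethod
    def key(cypher, params):
        # Collapse whitespace so formatting differences do not cause misses.
        payload = json.dumps({"q": " ".join(cypher.split()), "p": params},
                             sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_run(self, cypher, params, run):
        k = self.key(cypher, params)
        if k in self._store:
            self._store.move_to_end(k)       # mark as most recently used
            return self._store[k]
        result = run(cypher, params)         # cache miss: execute the query
        self._store[k] = result
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result
```

A caller would pass a function that executes the query against the graph database as `run`; identical query text and parameters then return the stored result without a second round trip.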

5.2. LLM Server and Prompt Pipeline

All prompt-based interactions are routed to a local Ollama server, hosting a quantized instance of the gemma3:12b model. This server performs all LLM-driven tasks including intent classification, entity extraction, query clarification, summarization, and response generation.

The entire backend—including the Neo4j graph database, relational database and the Ollama LLM server—runs on a dedicated Ubuntu 22.04.5 LTS machine with the following specifications:

Dual AMD EPYC 7742 CPUs (128 cores, 256 threads)
1 TiB RAM
8× NVIDIA A100-SXM4 GPUs (40 GB each)
14 TB RAID0 array for data storage
1.7 TB RAID1 array for the operating system

This high-performance setup allows for fast inference even with complex prompts. LLM inference latency ranges from 3 to 16 s, depending on prompt complexity and length.

5.3. Frontend Architecture and Technologies

The frontend of DietQA is a lightweight, browser-based web application built using standard HTML and JavaScript, enhanced by modern UI libraries for dynamic interaction. The interface is styled with Tailwind CSS, ensuring mobile responsiveness and a clean design. Vue.js 3 is used to manage reactive components and UI state, particularly within the chatbot and preferences interfaces. For lightweight interactivity and collapsible UI elements, Alpine.js (with the Collapse plugin) is employed, complementing Vue without heavy overhead. js-cookie is used for managing client-side session data. The system supports markdown-formatted responses using Marked.js, allowing rich text rendering within the chat interface. Custom JavaScript modules handle core UI logic, user input, and communication with the backend, enabling a seamless conversational experience.

The frontend is powered by Flask, which handles HTTP requests and routes them to appropriate backend logic. HTML pages are generated using Jinja2 templating, allowing server-side insertion of dynamic content—such as user preferences, recipe results, and filter elements—directly into the rendered pages.

Overall, the frontend is designed for clarity, modularity, and ease of use, acting as a lightweight client that delegates most of the application logic to the backend.

5.4. Deployment, Performance, and Monitoring

The entire backend, including the Neo4j graph, Ollama server, and API services, is containerized using Docker and deployed on a Linux-based infrastructure. The system is secured via HTTPS, with user-specific data stored in an isolated relational database to ensure privacy and compliance with data sovereignty requirements.

DietQA’s architecture is purposefully optimized for low-latency interaction, even in the presence of complex backend semantics. User queries are processed through a hybrid pipeline: symbolic reasoning is first executed via Cypher queries on the Neo4j KG, followed by neural response generation through the Ollama LLM server. Graph queries typically resolve in under 500 milliseconds; however, in cases involving high-volume result sets, latency can extend up to 3 s. When supported by the Python-level in-memory LRU cache, query times can drop below 1 millisecond. This caching mechanism allows the system to efficiently handle repeated or semantically similar queries, reducing load on the KG and significantly improving responsiveness.

LLM inference is the most time-consuming stage, with response times ranging from 3 to 16 s, depending on prompt complexity. To balance fluency and latency, DietQA employs a quantized LLM.

A real-time logging and monitoring subsystem tracks system behaviour, query patterns, parsing errors, and KG failures. These logs are used to improve prompt templates, expand the tags dictionary, and support iterative refinement.

6. Evaluation

We evaluated DietQA through a combination of quantitative performance tests and a qualitative user study to demonstrate its effectiveness, accuracy, and usability. The evaluation aims to answer the following questions:

How well does DietQA perform in retrieving correct and relevant recipes in response to user queries (accuracy and constraint satisfaction)?
How effectively does the system respond to diet-constrained queries across diverse dishes, and what is the contribution of ingredient substitution mechanisms to overall query resolution coverage?
Does DietQA’s integration of personalization improve the quality of recommendations compared to non-personalized baselines?
How do users perceive the system in terms of usefulness, ease of use, and satisfaction, and does the interface support their needs?

6.1. Accuracy and Retrieval Performance

We created a benchmark dataset consisting of 60 pairs of user preference profiles and corresponding natural language queries, covering a wide range of realistic dietary and ingredient-based scenarios. These pairs were manually constructed based on real-world examples, sourced from web discussion groups and in-person interviews. The queries include a variety of types, such as simple dietary requests (e.g., “low-carb recipes”), multi-constraint queries (e.g., “lunch under 600 kcal with chicken and broccoli, no dairy”), and information-seeking questions related to nutrition and diet (e.g., “how much protein is in tofu?”, “is oatmeal suitable for a gluten-free diet?”). This diversity was designed to reflect the types of queries users are likely to pose in real-world diet support contexts. The content of the queries was aligned with the available data of the Recipe KG, ensuring that no out-of-scope or unsupported questions were included in the benchmark. For each query and its associated user context, we manually (a) identified and extracted the relevant search constraints and preferences, (b) queried the Recipe KG to retrieve the matching recipes, and (c) compiled a reference dataset that reflects the intended dietary and semantic criteria for subsequent evaluation.

To evaluate system performance, we submitted the same natural language queries and user preference profiles to DietQA and collected its top-k recipe suggestions. Each output was compared against the manually curated reference dataset using standard information retrieval and constraint-based evaluation metrics: Top-1 Accuracy, Recall@3, nDCG@5, and Constraint Satisfaction Rate. Since both the reference answers and system outputs were generated over the same Recipe KG schema and based on identical dietary and semantic constraints, the evaluation process was fully automated and free from annotation noise.

The benchmark included both recipe recommendation queries and factual nutritional questions (e.g., “How much protein is in tofu?” or “Is almond milk suitable for a low-fat diet?”). For recipe queries, we assessed whether DietQA’s recommendations matched the ground truth set of recipes that satisfied all constraints. Top-1 Accuracy measured the proportion of queries where the highest-ranked recipe was correct; Recall@3 captured whether a valid answer appeared in the top three results; and nDCG@5 reflected the ranking quality among the top five. For factual queries, accuracy was defined as an exact match with the correct numeric or categorical response. Finally, we computed the Constraint Satisfaction Rate, defined as the percentage of recipe suggestions that fully complied with all explicit user and profile-based constraints (e.g., excluding disallowed ingredients). This ensured that results were not only relevant but also safe and appropriate. Table 4 summarizes DietQA’s performance across these metrics in comparison to our baseline retrieval system.
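The four recipe-level metrics can be written compactly in Python. The sketch below is one plausible formulation under stated assumptions, not the evaluation script used in the study: relevance is treated as binary (a recipe is either in the reference set or not), Recall@3 is computed as a hit indicator following the description above, nDCG uses the common $1/\log_2(\text{rank}+1)$ discount, and all function names are hypothetical.

```python
import math


def top1_accuracy(ranked, relevant):
    """1.0 if the highest-ranked recipe is in the reference set."""
    return 1.0 if ranked and ranked[0] in relevant else 0.0


def hit_at_k(ranked, relevant, k=3):
    """1.0 if any of the top-k recipes is in the reference set."""
    return 1.0 if any(r in relevant for r in ranked[:k]) else 0.0


def ndcg_at_k(ranked, relevant, k=5):
    """Binary-relevance nDCG with a log2 position discount."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, r in enumerate(ranked[:k]) if r in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def constraint_satisfaction_rate(recipes, forbidden):
    """Share of returned recipes containing no forbidden ingredient."""
    ok = sum(1 for ingredients in recipes if not (set(ingredients) & forbidden))
    return ok / len(recipes) if recipes else 0.0
```

Per-query values would then be averaged over the benchmark to obtain the reported figures.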

The offline evaluation demonstrates that DietQA ranks a correct recipe first in approximately 9 out of 10 queries (Top-1 = 88.3%) and includes at least one such recipe within its top three recommendations 93.3% of the time. This Recall@3 score indicates that the system reliably surfaces relevant options for users to consider. Additionally, an nDCG@5 of 0.908 shows that relevant recipes are not only found but also prioritized toward the top of the result list, since nDCG measures how closely the ranking matches an ideal ordering based on relevance. A Constraint Satisfaction Rate of 90.0% further confirms that the majority of returned recipes fully adhere to the specified dietary restrictions, reinforcing the system’s practical validity in nutrition-focused contexts.

In terms of answering factual nutrition questions, DietQA achieved 94.0% accuracy on our test set (covering topics such as calorie and micronutrient values of specific foods). The remaining 6% of errors were due to linguistic mismatches between user phrasing and the terminology used in the KG.

These results indicate that DietQA can serve as an automated, reliable recommendation engine, even with strict multi-constraint queries. The ~10% gap in constraint satisfaction highlights areas for potential model refinement. While all benchmark queries were designed to be answerable based on the KG’s content and dictionary, natural linguistic variation still posed challenges for robust intent and entity extraction. A key contributing factor is the presence of linguistic mismatches between the way users naturally express dietary needs or preferences and the formal terminology used within the knowledge graph. These mismatches can lead to partial or incorrect mapping of constraints. Additionally, errors or omissions in the extraction of intents and entities from natural language queries may result in incomplete filtering or constraint logic. Addressing these issues through improved natural language understanding, expanded lexical coverage, and more robust query parsing could further increase DietQA’s precision and reliability in complex retrieval scenarios.

6.2. Diet-Constrained Query Evaluation

To systematically evaluate the system’s ability to handle complex dietary constraints, we generated 34,470 queries spanning 2248 distinct dishes, each combined with 15 different dietary scenarios. These scenarios reflect all possible non-empty combinations of four primary diets—Vegan, Gluten-Free, Dairy-Free, and Low-Fat—capturing the full range of single and multi-diet restrictions a user might apply.

Each of the 34,470 benchmark queries was evaluated against a 17,587-recipe corpus. With only the base ingredient index, 7233 queries (21.0%) returned at least one match (Table 5). Enabling ingredient substitution rescued an additional 2485 queries (7.2%), raising overall coverage from 21.0% to 28.2%. Substitution also enriched 1542 of the directly matched queries with new recipes. Notably, substitution-only matches (2485 queries) amount to a 34.4% uplift over direct matches and constitute 25.6% of all successful queries. These findings demonstrate that our substitution engine not only supplements existing coverage but extends the system’s ability to satisfy a further 7.2% of user requests—and that about one quarter of all successful queries rely exclusively on substitution.
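The percentages in this paragraph follow directly from the three reported counts, as the short calculation below shows. The counts are taken from the text; the function name `coverage_summary` and the dictionary keys are illustrative.

```python
def coverage_summary(total_queries, direct, substitution_only):
    """Derive coverage percentages (one decimal) from the raw query counts."""
    successful = direct + substitution_only

    def pct(x, y):
        return round(100.0 * x / y, 1)

    return {
        "direct_coverage": pct(direct, total_queries),
        "substitution_gain": pct(substitution_only, total_queries),
        "overall_coverage": pct(successful, total_queries),
        "uplift_over_direct": pct(substitution_only, direct),
        "share_of_successful": pct(substitution_only, successful),
    }


summary = coverage_summary(total_queries=34_470, direct=7_233,
                           substitution_only=2_485)
# summary["overall_coverage"] is 28.2, i.e. (7233 + 2485) / 34470
```

The uplift (34.4%) is measured relative to the directly matched queries, whereas the 25.6% share is measured relative to all 9718 successful queries, which is why the two figures differ.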

Table 6 reports, for each level of diet-constraint complexity (1–4 diets), the following metrics: the total number of queries issued and how many returned ≥1 recipe; the aggregate count of all retrieved recipes; the overall coverage rate (percent of queries with results); the average number of results per successful query; the count of unique dishes covered by direct retrieval and the additional unique dishes unlocked via substitution; and the coverage lift from substitution.

We quantified both the total number of recipe matches and the count of unique dishes returned for each possible combination of dietary constraints, ranging from a single restriction up to all four simultaneously. Our analysis clearly demonstrates that diet complexity is the most significant factor influencing system performance: as each additional dietary constraint is imposed, the coverage of both total matches and unique recipes declines sharply. This compounding effect underscores the escalating challenge of satisfying multiple concurrent requirements.

The chart in Figure 5a shows how the total volume of retrieved recipes breaks down between direct index matches and substitution-enabled matches as dietary constraints increase from one to four filters:

X-axis (Diet Complexity): Moves left to right from queries with a single diet restriction up to queries combining all four diets.

Orange (Base Coverage): The core number of recipes returned without any substitutions. It starts very high for simple, one-diet queries (~14,530 recipes) but plunges as constraints are added, falling to just over 200 recipes when four diets must all be satisfied simultaneously.

Light blue (Substitution Coverage): The extra recipes brought in by allowing ingredient substitutions. Substitution adds a meaningful lift at every level—around 3300 additional recipes for one and two diets, tapering to about 200 extra recipes at four diets.

Overall Shape: The stacked area shrinks dramatically from ~17,840 total recipes at one diet to roughly 430 at four diets, highlighting how base coverage collapses under multiple filters and how substitution consistently rescues a substantial fraction of otherwise-lost recipes.

Overall, the chart shows that direct recipe coverage drops quickly as more diet filters are added, while substitution helps offset this decline and becomes increasingly important for maintaining useful recipe results.

The chart in Figure 5b presents two linked views of how coverage evolves as dietary constraints accumulate:

Stacked Bars (left axis): For each level of diet complexity, from 1 to 4 diets, the lower (blue) segment shows the absolute number of unique dishes covered by the base index alone, while the upper (orange) segment shows how many additional unique dishes are unlocked through substitution. Base coverage declines steeply from roughly 1085 dishes at one diet down to about 228 at four diets, while substitution steadily makes up an ever-larger portion of the total.

Black Line (right axis): This plots the percentage lift in coverage provided by substitution at each diet level. Beginning with a ~23% uplift for single-diet queries, the boost rises steeply to over 100% for four-diet queries—underscoring substitution’s crucial role once direct matches become scarce.

Combined, these views illustrate that as diet complexity increases and base coverage contracts, substitution not only recovers lost matches but progressively accounts for the majority of total coverage.

To complement the coverage analysis, we quantified the computational cost of the benchmark workload. Each of the 34,470 queries was executed against the full pipeline, and the system recorded end-to-end latencies in milliseconds. To measure the impact of ingredient substitution, every query was executed twice, once with substitution disabled and once with substitution enabled, and we measured the latency of each run as well as the aggregate time across the pair.

Table 7 presents the resulting latency measurements. It first reports performance for the aggregated base and substitution pairs, followed by the two configurations individually. Results are then stratified by dietary constraint complexity, from 1 to 4 concurrent diets. For each complexity level, three rows are provided, namely base, substitution, and aggregated. The columns report standard summary metrics that capture central tendency (mean and median), tail behaviour (p95 and p99), and range (minimum and maximum).
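As an illustration of how such summary columns can be derived from raw per-query timings, the following minimal sketch uses Python's standard `statistics` module. The paper does not state which percentile interpolation method was used, so the `"inclusive"` method here is an assumption, and the sample data are synthetic placeholders rather than measurements from the study.

```python
import statistics


def latency_summary(latencies_ms):
    """Summarise per-query latencies (in milliseconds)."""
    # quantiles(n=100) returns the 99 percentile cut points p1..p99.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p95": cuts[94],   # 95th percentile
        "p99": cuts[98],   # 99th percentile
        "min": min(latencies_ms),
        "max": max(latencies_ms),
    }


# Synthetic placeholder data, NOT measurements from the paper.
sample = list(range(1, 101))
summary = latency_summary(sample)
print(summary)
```

The same function can be applied separately to the base runs, the substitution runs, and the per-query sums of the two to obtain the three rows reported for each complexity level.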

According to the metrics, the end-to-end response time for DietQA remains consistent across different levels of dietary constraint complexity. Median latencies for single-diet queries (93.2 ms) and those involving up to four simultaneous diets (91.9 ms) stay below 100 ms, indicating that adding more dietary filters does not substantially increase the typical processing time. The similarity in median values across all scenarios suggests that the system’s retrieval and validation stages handle constraint complexity without significant delays.

Comparing runs with and without substitutions shows that the ingredient replacement mechanism contributes only a slight increase in processing time. Base queries execute in around 92.4 ms, and substitution-enhanced queries require about 93.4 ms, adding only approximately 1 ms to the total. This narrow difference highlights that the substitution logic integrates smoothly into the overall pipeline without causing major slowdowns. These trends are consistent with the computational model in which tighter filters reduce the effective candidate set; more constraints lead to fewer join operations and cheaper ranking, offsetting any added logical complexity of the predicate.

The aggregated results (base plus substitution) maintain response times well under 500 ms even at the 95th percentile. The 95th percentile latency for aggregated queries is 402.3 ms, and the 99th percentile is 575.4 ms, demonstrating that worst-case query execution rarely exceeds 600 ms and that most user interactions fall within an interactive range.

Overall, the system stays fast and predictable, with stable tails (p95 of about 400 ms and p99 of about 575 ms), and substitution adds only a small, fixed overhead. Each query first uses indexes and diet filters to narrow the graph to a small candidate set, and the work that follows scales with that set rather than with the total number of recipes.

To assess how latency is affected by the total number of recipes, we evaluated two deployments on corpus-specific query sets, contrasting a smaller Recipe KG (4395 recipes) with a larger one (17,587 recipes; 4×). The smaller KG was produced by randomly removing 75% of recipes from the original dataset. For each corpus, the query set was constructed by pairing the unique dishes of the KG with all non-empty combinations of the four primary diets (1–4 diets). Table 8 presents the experimental metrics for the two recipe databases. It lists dataset size and composition, the evaluation workload, the outcomes of running the workload, time and efficiency measures, latency statistics, and the coverage lift from substitution.
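The query-set construction described above can be sketched as a Cartesian product of dishes and diet subsets. Four diets give 2^4 − 1 = 15 non-empty combinations; if the 34,470-query benchmark was built this way, that would correspond to 34,470 / 15 = 2298 unique dishes, although this figure is our inference and is not stated in the text. The diet names are taken from the labels the KG is reported to support; the dish names and function names are hypothetical.

```python
from itertools import combinations, product

# The four diet labels the KG is reported to handle.
DIETS = ("Vegan", "Gluten-Free", "Dairy-Free", "Low-Fat")


def diet_combinations(diets=DIETS):
    """All non-empty subsets of the diets, from size 1 up to len(diets)."""
    return [
        combo
        for k in range(1, len(diets) + 1)
        for combo in combinations(diets, k)
    ]


def build_query_set(dishes):
    """Pair every dish with every non-empty diet combination."""
    return list(product(dishes, diet_combinations()))


# Hypothetical dish names, for illustration only.
toy_dishes = ["moussaka", "lentil soup", "spanakopita"]
queries = build_query_set(toy_dishes)

print(len(diet_combinations()))  # number of diet combinations
print(len(queries))              # dishes x combinations
```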

Scaling the corpus materially improved coverage from 20.5% to 28.2% (+37%) and increased mean results per query by about 2.5×, while tail latency remained stable (p95 ≈ 402 ms; p99 ≈ 575 ms). Mean and median per-query latency rose only modestly (14% and 17%), indicating limited sensitivity to corpus size within this range. System efficiency improved substantially with scale. The overall time per result decreased from 101.1 ms to 65.0 ms (−36%), and throughput on successful queries rose from 9.9 to 15.4 results per second. The correlation between latency and result count remained weak (ρ ≈ 0.03–0.14 across runs), suggesting that runtime is dominated by fixed costs and bounded candidate sets rather than by the absolute corpus size. Ingredient substitution contributes materially to recall with negligible overhead. In both corpora it yields about 37% additional results beyond base retrieval, with the largest uplifts on multi-diet queries.

Both runs contain rare extreme-latency outliers of similar rate and magnitude. Overall, enlarging the recipe database substantially improves recall and per-result efficiency while keeping tail latency stable, supporting the claim that performance is largely independent of corpus size. Collectively, these measurements substantiate the analytical claims of §3.7 and demonstrate predictable, sub-second performance under realistic multi-diet workloads.

6.3. Negation and Exclusion Recognition

In order to assess the system’s ability to recognize and operationalise negation and exclusion in user queries, we evaluated how expressions such as “without eggs”, “no peanuts”, and “except for chicken” are detected, normalized to canonical entities, and enforced as hard constraints within the KG and Cypher retrieval pipeline. We assembled a 50-query Greek dataset comprising realistic recipe requests, each containing one or more explicit negative constructions (e.g., “without”, “no”, “do not include”, “except for”, “omit”, “avoid”) and their combinations; for example, “I want a high-protein salad; avoid peanuts and walnuts, and don’t add honey,” and “I want main courses without meat, except for chicken, and without cheese.” Each explicit exclusion mention in the queries was annotated as a distinct entity. A system-identified exclusion was considered correct only if the system first detected the exclusion and its entity mention, and then, after normalization, mapped it to the corresponding canonical KG entity.

We report micro-averaged precision, recall, and F1 at the entity level. The 50 queries contained 109 explicit exclusion entities. True positives were exclusions correctly detected and mapped; false positives were exclusions detected where none existed; false negatives were gold exclusions the system failed to detect. On this set the system produced 107 true positives, 0 false positives, and 2 false negatives. These counts yield precision = 107/(107 + 0) = 100.0%, recall = 107/(107 + 2) ≈ 98.2%, and F1 ≈ 99.1%. In summary, on a 50-query set with 109 explicit exclusions, the system achieved 100.0% precision, 98.2% recall, and 99.1% F1 for exclusion and negation recognition. These results demonstrate that the system can reliably handle exclusions and negations through dedicated intents and canonicalization, a non-trivial capability in morphologically rich languages such as Greek.
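The reported scores can be reproduced from the three entity-level counts. The following sketch shows the computation; the function name is ours.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Micro-averaged precision, recall, and F1 from entity-level counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts reported above: 107 true positives, 0 false positives, 2 false negatives.
precision, recall, f1 = precision_recall_f1(tp=107, fp=0, fn=2)

print(f"precision={precision:.1%} recall={recall:.1%} f1={f1:.1%}")
```

Because counts are pooled over all 109 gold entities before the ratios are taken, queries with many exclusions weigh more heavily than queries with a single exclusion.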

6.4. User Study

We conducted a comprehensive user evaluation study to assess the effectiveness and usability of DietQA in real-world scenarios. Eight participants were recruited for this evaluation, representing a diverse sample of potential users with varying dietary needs and technological familiarity. Each participant was assigned a set of typical tasks that simulated authentic interactions with a dietary recommendation system.

6.4.1. Study Design and Methodology

The study employed a mixed-methods approach combining task-based evaluation with standardized questionnaires. Participants were encouraged to interact with the DietQA system naturally and freely, using conversational language as they would when speaking to a human assistant. All interactions were logged with explicit participant consent, enabling detailed post hoc analysis of user behaviour patterns, query formulations, and system responses.

Following the completion of all assigned tasks, participants provided feedback through two validated assessment instruments. The first was a custom post-study questionnaire featuring eight statements measured on standard 5-point Likert scales. These statements assessed the critical dimensions of user experience. Additionally, participants completed the widely validated 10-item System Usability Scale (SUS), a standardized instrument that provides a reliable measure of perceived system usability.

6.4.2. Task Design and Coverage

The evaluation protocol encompassed ten distinct task categories (Table 9), each crafted to elicit different facets of DietQA’s functionality while simulating realistic user behaviour. This task structure was designed to evaluate both the breadth and depth of DietQA’s capabilities, from simple ingredient-based searches to complex multi-constraint optimization problems that mirror real-world dietary planning challenges.

In order to contextualize system responses, participants were first instructed to configure their personalized dietary profiles within the interface. This setup phase included specification of preferred and disliked ingredients, explicitly excluded items (e.g., due to allergies or dietary rules), active dietary regimes (e.g., vegan, low-fat), nutritional objectives (e.g., high protein), and currently available pantry ingredients. This initial configuration ensured that DietQA had the necessary information to provide personalized and context-aware recipe suggestions during the evaluation.

Participants were then instructed to perform each task type three times using different criteria, ensuring thorough coverage of the system’s functionality while capturing variation in individual dietary needs and query phrasing. They were informed that their preferences were already stored in the system and did not need to be repeated during the conversation. This setup allowed the evaluation to assess how effectively DietQA adapted its recommendations by combining the pre-defined user profiles with the dynamic parameters introduced in each interaction.

6.4.3. Assessment Instruments

To capture both subjective experience and objective usability, we combined a custom User-Experience (UX) questionnaire with the standardized System Usability Scale (SUS).

Custom UX questionnaire. Participants rated eight statements on a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree). The items targeted four key dimensions: perceived usability, usefulness, satisfaction, and trust in the system’s recommendations (Table 10).

System Usability Scale. Immediately after the UX survey, participants completed the 10-item SUS (Table 11). Following the standard SUS procedure, negatively worded items were reverse-scored, individual responses were summed, and the total was multiplied by 2.5 to yield a composite score on a 0–100 scale.

Together, these two instruments provide a balanced evaluation: the custom questionnaire offers domain-specific insight into DietQA’s perceived value as a dietary assistant, whereas the SUS supplies a widely recognized benchmark for overall usability.
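For readers unfamiliar with SUS scoring, the procedure summarised above can be sketched as follows. This follows the standard published SUS convention, in which odd-numbered (positively worded) items contribute the response minus 1 and even-numbered (negatively worded) items contribute 5 minus the response; the study's own scoring script is not shown in the paper, so this is an illustration rather than the authors' implementation.

```python
def sus_score(responses):
    """Compute a 0-100 SUS score from ten responses on a 1-5 scale.

    Items are given in questionnaire order (item 1 first).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("responses must be between 1 and 5")
        if item_number % 2 == 1:
            total += response - 1   # positively worded item
        else:
            total += 5 - response   # negatively worded item, reverse-scored
    return total * 2.5


# Best possible answers: agree fully with positive items, disagree with negative ones.
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))
# Neutral answers throughout.
print(sus_score([3] * 10))
```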

6.4.4. Evaluation Metrics and Analysis Framework

The study employed multiple quantitative metrics to provide a comprehensive assessment of DietQA’s performance and user acceptance:

User Experience Metrics: From the custom survey responses, we calculated average ratings for each of the four primary dimensions—usability, usefulness, satisfaction, and trust. Raw scores on the 5-point scale were converted to percentage values (0–100%) to facilitate interpretation and comparison across different aspects of user experience.

System Usability Assessment: The SUS scores were computed following the standard methodology, which involves reverse-scoring negatively worded items, summing all responses, and applying a multiplicative factor to produce the final 0–100 scale score. SUS scores can be interpreted using established benchmarks, where scores above 68 are considered above average, scores above 80 indicate excellent usability, and scores below 50 suggest significant usability concerns.

Qualitative Analysis: Beyond quantitative metrics, the logged interaction data provided rich qualitative insights into user behaviour patterns, common query formulations, system response quality, and areas for potential improvement. This qualitative component complemented the numerical assessments by revealing nuanced aspects of user experience that standardized scales might not capture.

The combination of task-based evaluation, standardized usability assessment, and custom user experience measurement provided a robust framework for understanding DietQA’s effectiveness as a practical dietary assistance tool while identifying specific strengths and areas for future development.
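The conversion from mean Likert ratings to percentages is not spelled out in the text. The reported pairs (for example, 4.75 and 95%, or 4.25 and 85%) are consistent with dividing the mean by the scale maximum, so the sketch below assumes that rule. The individual ratings in the example are hypothetical and were chosen only to show how eight responses could average to 4.625, which rounds to the reported 4.63 and maps to 92.5%.

```python
from statistics import mean


def likert_to_percent(mean_rating: float, scale_max: int = 5) -> float:
    """Express a mean Likert rating as a percentage of the scale maximum.

    Assumption: percent = 100 * mean / scale_max, inferred from the
    reported figures rather than stated by the authors.
    """
    return round(100 * mean_rating / scale_max, 1)


# Hypothetical ratings from eight participants (not the study's raw data).
example_ratings = [5, 5, 5, 5, 5, 4, 4, 4]
example_mean = mean(example_ratings)

print(example_mean, likert_to_percent(example_mean))
print(likert_to_percent(4.75), likert_to_percent(4.25))
```

Under this rule the lowest possible rating maps to 20% rather than 0%, which is worth keeping in mind when comparing these percentages with the 0–100 SUS scale.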

6.4.5. User Study Results and Analysis

The user study provided rich insights into DietQA’s real-world usability and usefulness. Table 12 presents a summary of user experience metrics gathered from post-study questionnaires.

The individual ratings demonstrate consistently strong performance across all dimensions. On average, participants rated ease of use at 4.63 out of 5 (92.5%), indicating that the interface and interaction flow were highly intuitive. Recommendation usefulness and overall satisfaction both scored 4.25 (85%), reflecting clear and relevant suggestions. Personalization received the highest average of 4.75 (95%), showing that users felt the system effectively tailored recipes to their dietary profiles. Trust in the system’s accuracy averaged 4.38 (87.5%), highlighting users’ confidence in the factual correctness and reliability of the answers, while enjoyment of suggested recipes and intent to continue using the system both averaged 4.25 (85%), indicating positive engagement and adoption potential. Results indicate that DietQA provides real value in a way users would integrate into their routine, which is the ultimate test of a practical system.

To analyse usability by dimension, we first computed the SUS question-level percentage scores by applying the standard SUS scoring procedure. Each SUS question was then mapped to a thematic usability dimension (e.g., ease of use, confidence, complexity, and consistency) based on the intent of the question as defined in the original SUS framework and supported by prior usability studies. For example, responses to “I thought the system was easy to use” were grouped under the Ease of Use dimension, while “I found the system unnecessarily complex” was associated with Complexity/Simplicity. Once grouped, the percentage scores of all questions within the same dimension were averaged to produce an aggregate score for that usability dimension.

Ease of Use and Confidence were the highest-scoring dimensions, each averaging 90.6%, indicating that users found the system intuitive and felt assured while using it. These results affirm that DietQA’s interface design and interaction style support effective and low-friction user experiences.

Cumbersomeness, derived from a negatively worded SUS item and reverse-scored, also received a high score of 87.5%. This indicates that users largely did not find the system cumbersome, further supporting the conclusion that the interface is lightweight and easy to interact with.

Complexity/Simplicity and Consistency received the lowest—but still positive—scores, both averaging 78.1%. These slightly lower values suggest minor areas for improvement, possibly related to subtle variations in system responses or occasional complexity when handling multi-constraint queries. Nonetheless, the results indicate generally positive perceptions across these dimensions and do not point to critical usability issues.

Overall, the analysis confirms that DietQA performs well in core usability domains—particularly in ease of use, confidence, and effortlessness—while also highlighting areas for incremental refinement.

DietQA met its objectives in the evaluation, achieving high accuracy and strong user approval. Users responded favourably to the personalized, conversational experience, indicating that such a system can be a practical addition to diet apps to improve user engagement and outcomes. In the next section, we reflect on the implications of these results, discuss limitations observed, and outline future enhancements.

7. Discussion

This research presents DietQA, a novel framework that integrates knowledge graphs, retrieval-augmented generation, and large language models to address fundamental challenges in personalized nutrition guidance. Our approach to nutritional guidance combines three complementary components. First, we employ KG reasoning, building a semantically rich graph that encodes 17.5 K Greek recipes, 3.3 K ingredients, their nutritional attributes, and associated diet tags to support constraint-aware querying, dietary filtering, and ingredient substitution. Second, we provide a conversational natural language interface, allowing users to express complex, multi-constraint dietary requests in natural language. Third, a real-time adaptation engine merges graph query outputs with user profiles and context to generate personalized recommendations on the fly, blending symbolic reasoning with dynamic response generation rather than relying on static search filters.

7.1. Implications for Personalized Diet and Health Technology

DietQA demonstrates the feasibility of fine-grained, constraint-based question answering in an end-user application. It grounds its responses in a KG of nutrients, diet tags, and substitutions—linking user profiles, queries, and compliant recipes, allowing the system to be explanatory and transparent in its recommendations. This transparency is particularly important in health contexts; users are more likely to trust and follow dietary advice when they understand the reasoning behind it. This could improve adherence to dietary recommendations. The positive user feedback on trust and satisfaction in our study reinforces the notion that data-backed, transparent recommendations are well-received and meaningful to users.

Real-world diets often involve multiple simultaneous restrictions. Constraint stacking quickly overwhelms not only simple keyword search but also conventional filtering methods. Our large-scale evaluation of synthetic single- and multi-diet-constrained queries highlights the growing importance of ingredient substitution as dietary complexity increases. While direct matches alone suffice for a subset of queries, particularly in single-diet cases, their effectiveness drops sharply as more constraints are combined. The ability to substitute ingredients plays a pivotal role in extending the system’s usefulness: it not only recovers a meaningful number of previously unmatched queries but also enhances those that were covered by the base index. In many cases with three or four dietary filters, substitution contributes more than half of the total results retrieved.

DietQA distinguishes between “hard” and “soft” constraints: the former are inviolable filters (e.g., exclude shellfish; must be gluten-free), while the latter guide ranking (e.g., user prefers spinach; target high protein). This dual treatment mirrors real clinical workflows in which safety or adherence rules take precedence, but taste and convenience still matter. When user profiles are not used during retrieval, the system may return inappropriate results, such as meat-based dishes for vegetarian users. Participant ratings support that the layered constraint model generates recommendations that feel personally relevant and tailored.
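The two-tier treatment of constraints can be illustrated with a small in-memory sketch: hard constraints remove candidates outright, and soft constraints only reorder what remains. DietQA enforces its constraints in Cypher over the KG, so the data classes, field names, and scoring weights below are hypothetical and serve only to show the principle.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    title: str
    ingredients: set
    diet_tags: set
    protein_g: float  # protein per serving (illustrative field)


@dataclass
class Profile:
    required_diets: set = field(default_factory=set)         # hard
    excluded_ingredients: set = field(default_factory=set)   # hard
    preferred_ingredients: set = field(default_factory=set)  # soft
    prefer_high_protein: bool = False                        # soft


def satisfies_hard(recipe: Recipe, profile: Profile) -> bool:
    """Hard constraints: every required diet present, no excluded ingredient."""
    has_all_diets = profile.required_diets <= recipe.diet_tags
    has_excluded = bool(profile.excluded_ingredients & recipe.ingredients)
    return has_all_diets and not has_excluded


def soft_score(recipe: Recipe, profile: Profile) -> float:
    """Soft constraints: reward preferred ingredients and, optionally, protein."""
    score = float(len(profile.preferred_ingredients & recipe.ingredients))
    if profile.prefer_high_protein:
        score += recipe.protein_g / 100.0  # arbitrary illustrative weight
    return score


def recommend(recipes, profile):
    """Filter by hard constraints, then rank by soft preferences."""
    candidates = [r for r in recipes if satisfies_hard(r, profile)]
    return sorted(candidates, key=lambda r: soft_score(r, profile), reverse=True)


recipes = [
    Recipe("spinach pie", {"spinach", "flour"}, {"Vegan"}, 8.0),
    Recipe("shrimp rice", {"shrimp", "rice"}, {"Gluten-Free"}, 20.0),
    Recipe("spinach lentil stew", {"spinach", "lentils"}, {"Gluten-Free", "Vegan"}, 18.0),
    Recipe("rice salad", {"rice", "tomato"}, {"Gluten-Free", "Vegan"}, 5.0),
]
profile = Profile(
    required_diets={"Gluten-Free"},
    excluded_ingredients={"shrimp"},
    preferred_ingredients={"spinach"},
    prefer_high_protein=True,
)
ranked_titles = [r.title for r in recommend(recipes, profile)]
print(ranked_titles)
```

In this toy example the shrimp dish is removed despite having the highest protein value, which reflects the precedence of safety rules over preferences described above.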

Population-level nutrition guidelines are often hard for individuals to apply to everyday meals. DietQA bridges this gap by translating dietary profiles and numeric targets into executable KG queries, enabling it to “do the math” in real time—filtering, ranking, or flagging ingredients based on compliance. For instance, users can request low-fat meals and receive computed responses based on ingredient content. This ability to apply abstract targets at the point of decision helps users make informed choices and encourages better adherence.
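To make the idea of translating a profile into an executable KG query more concrete, the sketch below assembles a parameterized Cypher query from a few profile fields. The graph schema used here (a `Recipe` node with `diet_tags`, `ingredient_names`, `fat_g`, and `title` properties) and the 3.0 g fat threshold are assumptions made purely for illustration; the paper's actual schema and query templates may differ substantially.

```python
def build_recipe_query(diets=(), excluded=(), max_fat_g=None, limit=10):
    """Build a parameterized Cypher query and its parameter map.

    The node label and property names are hypothetical, not DietQA's schema.
    """
    clauses = []
    params = {"limit": limit}
    if diets:
        # Every requested diet must appear among the recipe's tags.
        clauses.append("all(d IN $diets WHERE d IN r.diet_tags)")
        params["diets"] = list(diets)
    if excluded:
        # No excluded ingredient may appear in the recipe.
        clauses.append("none(x IN $excluded WHERE x IN r.ingredient_names)")
        params["excluded"] = list(excluded)
    if max_fat_g is not None:
        # Numeric nutrition target expressed as a simple threshold.
        clauses.append("r.fat_g <= $max_fat_g")
        params["max_fat_g"] = max_fat_g
    where = ("WHERE " + "\n  AND ".join(clauses) + "\n") if clauses else ""
    query = "MATCH (r:Recipe)\n" + where + "RETURN r.title AS title\nLIMIT $limit"
    return query, params


query, params = build_recipe_query(
    diets=["Vegan", "Gluten-Free"],
    excluded=["peanuts"],
    max_fat_g=3.0,  # hypothetical low-fat threshold
)
print(query)
print(params)
```

Passing user-supplied values as query parameters rather than concatenating them into the query text keeps the query structure fixed and avoids injection issues, which is a common practice with graph databases.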

Furthermore, DietQA’s conversational design addresses common usability issues in diet apps, like repetitive input and inflexible interactions; users interacted in free text, set their preferences once, and asked natural follow-ups without needing to restate constraints. The combination of conversational AI with personalization has important implications for user engagement, addressing a common challenge in diet apps where users often lose interest over time. This design yielded strong usability scores, including ease of use, satisfaction, and intention to continue use, suggesting that conversational delivery, coupled with visible personalisation, supports sustained engagement.

Beyond its conversational flow, DietQA’s interface was designed to be intuitive and visually supportive, ensuring that users can interact effortlessly with complex dietary features. Participants responded positively to the interface, with usability feedback indicating that they were able to navigate and operate the system with ease. These impressions were reflected in the evaluation metrics: ease of use averaged 92.5%, satisfaction 85%, and intention to continue use also 85%. Overall, the interface played a key role in making DietQA feel both accessible and practical for everyday use.

DietQA addresses key limitations of free-form generative models by grounding its recommendations in a verified recipe database and enforcing explicit dietary constraints. Unlike unconstrained text generation approaches, which may produce inaccurate, non-compliant, or hallucinated content, DietQA ensures that all outputs are both nutritionally valid and semantically aligned with user requirements. This structured retrieval and reasoning framework enhances reliability, interpretability, and trustworthiness in dietary recommendation scenarios.

Building on this design, our empirical study shows that the system maintains interactive responsiveness under compound diet constraints, while ingredient substitution expands the set of valid recommendations with only negligible computational overhead. The evaluation further indicates stable tail behaviour and favourable scaling as the corpus grows, reinforcing the system’s practical reliability for real-world use.

Beyond recipe retrieval, users frequently seek quick nutrient facts. DietQA answered such factual nutrition questions with ~94% accuracy in controlled testing, with residual errors linked mainly to linguistic mismatches between user phrasing and KG terminology—highlighting the importance of synonym dictionaries and robust entity normalization in domain chatbots. Reliable fact answers are a prerequisite for clinical utility; hallucinated nutrient values could mislead users managing conditions like renal disease or diabetes.

All evaluation data and the recipe KG in this study were derived from Greek-language sources, ensuring cultural and linguistic relevance for the target audience. Nutritional values for ingredients were translated and integrated from established English-language databases. At the conversational edge, an instruction-tuned Gemma 3 LLM (gemma3:12b) supports 140+ languages, enabling the chatbot to understand and respond in multiple languages without changes to application logic. The retrieval and reasoning layers are language-agnostic: user utterances are normalized to canonical KG entities, constraints are enforced in Cypher, and responses are grounded in retrieved facts. Successful deployment in other regions therefore requires localization at multiple levels, including ingestion of region-specific recipe corpora, alignment to appropriate national or regional nutrient databases, and adaptation of dietary taxonomies to reflect local norms. We therefore describe DietQA as language-adaptable: the reasoning layer is language-agnostic, and deployment to a new language requires only localized resources (corpora, nutrient databases, diet taxonomies), not algorithmic changes. In line with this language-adaptable design, our evaluation of negation and exclusion handling shows near-perfect performance in Greek, underscoring robustness in a morphologically rich setting.

The strong signals of usability and personalization observed in the current study suggest that the conversational interface, combined with knowledge-based reasoning, can generalize effectively. However, empirical validation in diverse linguistic and dietary contexts remains essential to confirm its transferability.

7.2. Limitations

While the results of the DietQA prototype are encouraging, several technical and methodological shortcomings must be acknowledged before positioning the system for wider deployment. We group these limitations into four broad categories—nutritional accuracy, knowledge-graph coverage, conversational performance, and evaluation scope.

7.2.1. Nutritional and Culinary Fidelity

Ingredient-level estimates vs. cooked reality. All nutrient values are calculated from raw-ingredient data. Heat-induced losses, water uptake, or fat rendering that occur during cooking are not yet modelled, so reported numbers can diverge from the values in a final product.
Cooking methods ignored. Preparation techniques (e.g., sautéing vs. steaming) strongly influence nutrient retention and energy density, but DietQA calculations are presently based only on the contributing ingredients.
Ingredient substitutions may alter the final product. The system suggests ingredient replacements to make a recipe compliant with specific dietary constraints. However, these substitutions are selected based on predefined tag compatibility rather than full nutritional or culinary equivalence. As a result, the proposed substitute may differ in flavour, texture, or nutritional profile—potentially impacting the final outcome of the dish. The system’s role is limited to proposing alternatives, while the user retains full control over whether to accept and apply the substitution.
Restricted nutrient and diet coverage. At present the KG stores only three macronutrients (protein, carbohydrates, fat) plus calories for each ingredient and handles four diet labels (Vegan, Gluten-Free, Dairy-Free and Low-Fat). This schema prevents queries about other health-critical nutrients (e.g., fibre, sodium, vitamins, minerals) or alternative eating patterns (e.g., Mediterranean, DASH, ketogenic). Broadening the KG to a richer nutrient panel and a more comprehensive diet taxonomy is therefore essential.
Out-of-vocabulary ingredients. If a newly scraped recipe contains even one ingredient absent from the KG, the entire recipe is discarded because essential nutrient and diet-fit metadata are missing. Enriching the KG therefore requires manual curation or automated connection to external food composition sources before such recipes can be included.
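Two of the limitations above, raw-ingredient aggregation and the discarding of recipes with unknown ingredients, can be illustrated together. The sketch below sums per-100 g values over ingredient quantities and returns nothing when any ingredient is missing from the lookup table. The nutrient figures are illustrative placeholders rather than values from a food composition database, and the function and table names are hypothetical.

```python
# Illustrative per-100 g values only, NOT taken from a food composition database.
NUTRIENTS_PER_100G = {
    "lentils": {"kcal": 350.0, "protein": 25.0, "carbs": 60.0, "fat": 1.0},
    "olive oil": {"kcal": 900.0, "protein": 0.0, "carbs": 0.0, "fat": 100.0},
}


def recipe_nutrition(ingredients_g, table=NUTRIENTS_PER_100G):
    """Sum raw-ingredient nutrients for a recipe.

    ingredients_g maps ingredient name to quantity in grams.
    Returns None if any ingredient is missing from the table, mirroring
    the discard rule for out-of-vocabulary ingredients. No cooking
    losses are modelled.
    """
    totals = {"kcal": 0.0, "protein": 0.0, "carbs": 0.0, "fat": 0.0}
    for name, grams in ingredients_g.items():
        per_100g = table.get(name)
        if per_100g is None:
            return None  # one unknown ingredient invalidates the recipe
        for key in totals:
            totals[key] += per_100g[key] * grams / 100.0
    return totals


known = recipe_nutrition({"lentils": 200, "olive oil": 10})
unknown = recipe_nutrition({"lentils": 100, "mastiha": 5})
print(known)
print(unknown)
```

A cook-loss or retention factor per ingredient and preparation method could be applied inside the inner loop, which is the kind of refinement the first two limitations point to.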

7.2.2. Conversational Performance

Latency introduced by LLM reasoning. Even with a quantised on-premise model, the RAG pipeline adds several seconds of delay compared with pure KG queries. While users in the study tolerated the lag, it remains a barrier to real-time kitchen use.
Context-window erosion. DietQA summarizes prior turns to keep the dialogue within the LLM’s context length, but long, multi-topic conversations can still drift, leading to loss of nuance or forgotten constraints.
Lack of feedback loop. The system currently lacks any mechanism for collecting user feedback—neither explicit (e.g., thumbs-up/down) nor implicit (e.g., recipe clicks, ignored results). Without these signals, it cannot adapt its recommendations or learn user preferences over time.

7.2.3. Evaluation Boundaries

Short-term, small-sample study. The usability evaluation involved 8 participants performing predefined tasks within a limited timeframe. While useful for initial insights, this short-term snapshot cannot capture long-term adherence, behavioural change, or the system’s robustness under sustained, real-world use.
Generalisability. All queries and recipes were evaluated in Greek; further work is needed to validate performance in other languages and cultural contexts.

7.2.4. Implications for Future Iterations

To overcome the identified limitations and enhance the system’s reliability, several key developments are necessary. First, nutritional estimates should account for cook-loss factors or draw from laboratory-validated recipe data to more accurately reflect the nutritional content of prepared dishes. Second, the KG must be enriched through automated integration with public nutrition databases and ontology alignment—extending its scope to include a comprehensive set of micronutrients (e.g., vitamins, minerals) and a more diverse range of dietary models. Third, improved dialogue management strategies, such as hierarchical memory or retrieval-augmented context handling, are needed to preserve conversational coherence over extended interactions. Finally, longitudinal user studies should be conducted to collect implicit feedback and monitor real-world health outcomes. Together, these enhancements are critical for transitioning DietQA from a prototype to a robust, personalized nutrition assistant with clinical and cross-cultural relevance.

7.3. Extensions and Future Work

Future extensions of DietQA may enhance both its technical architecture and its user-centred functionalities. On the technical side, incorporating vector-based semantic retrieval mechanisms could complement the existing symbolic query pipeline, enabling more flexible and context-aware matching between user queries and recipe content. Expanding the recipe schema to include structured cooking instructions and methods would enable additional filtering dimensions—such as preparation time and complexity—thereby improving the contextual relevance of recommendations. Moreover, incorporating detailed procedural steps could enhance the accuracy of nutritional data estimation by enabling more precise modelling of ingredient transformations and cooking losses. Additionally, linking ingredients to real-time pricing data via external APIs would enable cost-aware retrieval and recommendation, incorporating dynamic price estimation into the query process. This enhancement would increase the system’s practical utility for users managing dietary goals within financial constraints.

A second promising direction involves the integration of temporal dietary tracking. Currently, DietQA processes each query in isolation, without accounting for prior user behaviour. Coupling the system with a meal logging component—such as a food diary—would enable context-sensitive reasoning over the course of a day. For example, the system could tailor dinner recommendations based on nutritional intake logged for breakfast and lunch, helping users maintain adherence to daily dietary targets. In parallel, incorporating computer vision could further streamline interaction: allowing users to photograph their pantry or refrigerator and automatically extract ingredient availability via image recognition would help ground recommendations in the user’s actual food inventory.
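As a rough illustration of the meal-logging idea, the sketch below subtracts logged breakfast and lunch intake from daily targets and keeps only the dinner candidates that fit what is left. It is not part of DietQA: the names `DAILY_TARGETS`, `remaining_budget`, and `fits_budget`, and all numeric values, are hypothetical.

```python
# Hypothetical daily upper limits; the values are illustrative only.
DAILY_TARGETS = {"kcal": 2000.0, "fat_g": 65.0, "carbs_g": 250.0}


def remaining_budget(targets, logged_meals):
    """Subtract everything logged so far from the daily limits."""
    remaining = dict(targets)
    for meal in logged_meals:
        for nutrient, amount in meal.items():
            if nutrient in remaining:
                remaining[nutrient] -= amount
    return remaining


def fits_budget(recipe, remaining):
    """A recipe fits if no tracked nutrient exceeds what is left for the day."""
    return all(recipe.get(n, 0.0) <= left for n, left in remaining.items())


# Invented food-diary entries for one day.
breakfast = {"kcal": 450.0, "fat_g": 15.0, "carbs_g": 60.0}
lunch = {"kcal": 700.0, "fat_g": 30.0, "carbs_g": 80.0}
left = remaining_budget(DAILY_TARGETS, [breakfast, lunch])

# Invented dinner candidates (per-serving values).
dinner_candidates = {
    "light_option": {"kcal": 520.0, "fat_g": 12.0, "carbs_g": 55.0},
    "rich_option": {"kcal": 780.0, "fat_g": 35.0, "carbs_g": 40.0},
}
suitable = [name for name, r in dinner_candidates.items() if fits_budget(r, left)]
```

In a real integration the remaining budget would more plausibly be turned into additional query constraints rather than applied as a post-filter, but the arithmetic is the same.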

Enhancing the modality of interaction also presents opportunities for broader accessibility and engagement. Future versions could implement native voice-based dialogue through a custom in-app voice interface. However, such integration introduces challenges in maintaining conversational continuity and in conveying dense information, such as ingredient lists or nutrient breakdowns, through speech alone. Additionally, a user feedback loop built on ratings, likes, or saved recipes could refine recommendation accuracy over time by learning user preferences and surfacing more relevant results based on historical interaction patterns.

8. Conclusions

DietQA demonstrates the feasibility and utility of fine-grained, constraint-based question answering in the context of personalized nutrition. By combining symbolic reasoning over a structured recipe KG with neural generation capabilities, the system supports natural language queries that reflect complex dietary needs. Its UI enables intuitive interaction through both dialogue and interactive visual controls, facilitating user-friendly refinement and exploration.

The system’s ability to incorporate user preferences, ingredient availability, and dietary constraints leads to highly personalized and context-aware recipe recommendations. Offline benchmarks show strong retrieval accuracy and constraint compliance (88.3% Top-1 accuracy and 90.0% constraint satisfaction; Table 4), while the user study indicates high usability, trust, and satisfaction (92.5%, 87.5%, and 85.0% positive ratings, respectively; Table 12). Moreover, when the knowledge base grew fourfold from 4.4 K to 17.6 K recipes, mean query latency rose by only 13.8% and p95/p99 latency remained essentially unchanged (Table 8), indicating that the system scales to larger knowledge bases and richer, multi-constraint scenarios without compromising responsiveness. These findings support the effectiveness of DietQA’s hybrid architecture and interaction model.

Future work will focus on improving natural language understanding to better align user phrasing with the KG vocabulary, supporting multi-turn dialogue for more complex interactions, and integrating temporal tracking to adapt recommendations based on users’ daily dietary patterns. We will also broaden the KG to include additional nutrients and a wider set of diet types, enabling richer dietary reasoning and wider applicability.

Author Contributions

Conceptualization, E.M. and I.T.; methodology, I.T.; software, I.T.; validation, E.M. and I.T.; formal analysis, E.M.; investigation, I.T.; resources, I.T.; data curation, I.T.; writing—original draft preparation, I.T.; writing—review and editing, I.T. and E.M.; visualization, I.T.; supervision, E.M. All authors have read and agreed to the published version of the manuscript.

Funding

This publication was financed by the Project “Strengthening and optimizing the operation of MODY services and academic and research units of the Hellenic Mediterranean University”, project number 80860, funded by the Public Investment Program of the Greek Ministry of Education and Religious Affairs.

Data Availability Statement

The data underpinning this study are not publicly accessible, as this research forms part of an ongoing PhD thesis. Data may be provided upon reasonable request after the dissertation has been completed and formally published, in accordance with institutional and ethical guidelines.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

API	Application Programming Interface
AI	Artificial Intelligence
BM25	Best Match 25
BERT	Bidirectional Encoder Representations from Transformers
CDT	Causal Distillation Trees
DASH	Dietary Approaches to Stop Hypertension
FARE	Food Allergy Research & Education
FSANZ	Food Standards Australia New Zealand
HTRAG	Hierarchical Topic Retrieval-Augmented Generation
ITE	Individual Treatment Effect
JSON	JavaScript Object Notation
KG	Knowledge Graph
LLM	Large Language Model
LRU	Least Recently Used
NER	Named Entity Recognition
nDCG	Normalized Discounted Cumulative Gain
QA	Question Answering
RAG	Retrieval-Augmented Generation
SAM	Structural Agnostic Modelling
SUS	System Usability Scale
USDA	United States Department of Agriculture
UX	User Experience

Appendix A

This appendix provides additional illustrations (Figures A1–A3) that complement the main text.

Figure A1. User’s Preferences Panel. Three UI cards show (i) Nutrient Emphasis (controls for Calories, Fat, Protein, Carbs and Ingredient Count), (ii) Available Ingredients (Pantry)—searchable pantry tags: pumpkin (κολοκύθα); tomato (ντομάτα); potatoes (πατάτες); peppers (πιπεριές); onions (κρεμμύδια); eggplants (μελιτζάνες); mustard (μουστάρδα); chicken (κοτόπουλο); salt (αλάτι); pepper (πιπέρι); black pepper (μαύρο πιπέρι); carrot (καρότο). (iii) Ingredient Preferences—Like: chicken fillet (κοτόπουλο φιλέτο); chicken breast (στήθος κοτόπουλου); turkey (γαλοπούλα); beef (μοσχάρι). Dislike: artichokes (αγκινάρες); fresh broad beans (κουκιά φρέσκα); mushrooms (μανιτάρια). Forbidden: peanut (φυστίκι); almonds (αμύγδαλα).

Figure A2. The conversational user interface in Greek, with its English translation.

Figure A3. DietQA results in the Greek UI. Left: filters, diet toggles, tag clouds, and descriptive counts. Right: ranked recipe cards with pantry match, missing ingredients, and macronutrient bar.

References

1. Toledo, R.Y.; Alzahrani, A.A.; Martinez, L. A Food Recommender System Considering Nutritional Information and User Preferences; Springer: Singapore, 2019.
2. Azzi, R.; Despres, S.; Diallo, G. NutriSem: A Semantics-Driven Approach to Calculating Nutritional Value of Recipes. In Proceedings of the WorldCIST 2020, Budva, Montenegro, 7–10 April 2020.
3. Chen, M.; Jia, X.; Gorbonos, E.; Hong, C.T.; Yu, X.; Liu, Y. Eating Healthier: Exploring Nutrition Information for Healthier Recipe Recommendation. arXiv 2020, arXiv:2003.07027v1.
4. Bondevik, J.N.; Bennin, K.E.; Babur, Ö.; Ersch, C. A systematic review on food recommender systems. Expert Syst. Appl. 2024, 238, 122166.
5. Bakagianni, J.; Pouli, K.; Gavriilidou, M.; Pavlopoulos, J. A Systematic Survey of Natural Language Processing for the Greek Language. Patterns 2025, 6, 101313.
6. Trattner, C.; Elsweiler, D. Food Recommender Systems: Important Contributions, Challenges and Future Research Directions. arXiv 2017, arXiv:1711.02760.
7. Tran, T.N.T.; Atas, M.; Felfernig, A.; Stettinger, M. An overview of recommender systems in the healthy food domain. J. Intell. Inf. Syst. 2018, 50, 501–526.
8. Orue-Saiz, I.; Kazarez, M.; Mendez-Zorrilla, A. Systematic Review of Nutritional Recommendation Systems. Appl. Sci. 2021, 11, 12069.
9. Min, W.; Liu, C.; Xu, L.; Jiang, S. Applications of knowledge graphs for food science and industry. Patterns 2022, 3, 100484.
10. Li, D.; Zaki, M.J.; Chen, C.-H. Health-guided recipe recommendation over knowledge graphs. J. Web Semant. 2023, 75, 100743.
11. Cui, J.; Zhang, X.; Zheng, D. Construction of recipe knowledge graph based on user knowledge demands. J. Inf. Sci. 2023, 51, 881–895.
12. Khilji, A.F.U.R.; Manna, R.; Laskar, S.R.; Pakray, P.; Das, D.; Bandyopadhyay, S.; Gelbukh, A. CookingQA: Answering Questions and Recommending Recipes Based on Ingredients. Arab. J. Sci. Eng. 2021, 46, 3701–3712.
13. Zheng, W.; Cheng, H.; Yu, J.X.; Zou, L.; Zhao, K. Interactive natural language question answering over knowledge graphs. Inf. Sci. 2019, 481, 141–159.
14. Chatterjee, U.; Giunchiglia, F.; Madalli, D.P.; Maltese, V. Modeling Recipes for Online Search. In On the Move to Meaningful Internet Systems: OTM 2016 Conferences; LNCS; Springer: Cham, Switzerland, 2016.
15. Gao, F.; Zhao, X.; Xia, D.; Zhou, Z.; Yang, R.; Lu, J.; Jiang, H.; Park, C.; Li, I. HealthGenie: Empowering Users with Healthy Dietary Guidance through Knowledge Graph and Large Language Models. arXiv 2025, arXiv:2504.14594.
16. Chen, Y.; Subburathinam, A.; Chen, C.; Zaki, M.J. Personalized Food Recommendation as Constrained Question Answering over a Large-scale Food Knowledge Graph. In Proceedings of the Fourteenth ACM International Conference on Web Search and Data Mining (WSDM’21), Online, 8–12 March 2021.
17. Ławrynowicz, A.; Wróblewska, A.; Adrian, W.T.; Kulczyński, B.; Gramza-Michałowska, A. Food Recipe Ingredient Substitution Ontology Design Pattern. Sensors 2022, 22, 1095.
18. Senath, T.; Athukorala, K.; Costa, R.; Ranathunga, S.; Kaur, R. Large Language Models for Ingredient Substitution in Food Recipes using Supervised Fine-tuning and Direct Preference Optimization. arXiv 2024, arXiv:2412.04922v1.
19. Bajaj, V.; Panda, R.B.; Dabas, C.; Kaur, P. Graph Database for Recipe Recommendations. In Proceedings of the 7th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO), Noida, India, 29–31 August 2018.
20. Tang, Y.S.; Zheng, A.H.; Lai, N. Healthy Recipe Recommendation Using Nutrition and Ratings Models. In Proceedings of the AAAI 2019, Honolulu, HI, USA, 27 January–1 February 2019.
21. The New York Times. Nytimes/Ingredient-Phrase-Tagger: Extract Structured Data from Ingredient Phrases Using Conditional Random Fields. Available online: https://github.com/nytimes/ingredient-phrase-tagger (accessed on 27 July 2025).
22. Marín, J.; Biswas, A.; Ofli, F.; Hynes, N.; Salvador, A.; Aytar, Y.; Weber, I.; Torralba, A. Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 187–203.
23. Tsampos, I.; Marakakis, E. A Knowledge Graph Question Answering System for Personalized Nutrition and Recipes Recommendation. In Pervasive Computing Technologies for Healthcare: 18th EAI International Conference, PervasiveHealth 2024, Heraklion, Crete, Greece, 17–18 September 2024, Proceedings, Part I; Springer Nature AG: Cham, Switzerland, 2025; Volume 611, pp. 61–79.
24. Trattner, C.; Elsweiler, D. Investigating the Healthiness of Internet-Sourced Recipes. In Proceedings of the WWW Conference, Perth, Australia, 3–17 May 2017.
25. Huang, X.; Zhang, J.; Li, D.; Li, P. Knowledge Graph Embedding Based Question Answering. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM’19), Melbourne, VIC, Australia, 11–15 February 2019.
26. Gharibi, M.; Zachariah, A.; Rao, P. FoodKG: A Tool to Enrich Knowledge Graphs Using Machine Learning Techniques. Front. Big Data 2020, 3, 12.
27. Zioutos, K.; Kondylakis, H.; Stefanidis, K. Healthy Personalized Recipe Recommendations for Weekly Meal Planning. Computers 2023, 13, 1.
28. Ribeiro, R.; Batista, F.; Pardal, J.P.; Mamede, N.J.; Pinto, H.S. Cooking an Ontology. In Proceedings of the AIMSA 2006, Varna, Bulgaria, 12–15 September 2006.
29. Haussmann, S.; Seneviratne, O.; Chen, Y.; Ne’eman, Y.; Codella, J.; Chen, C.-H.; McGuinness, D.L.; Zaki, M.J. FoodKG: A Semantics-Driven Knowledge Graph for Food Recommendation. In Proceedings of the International Semantic Web Conference (ISWC), Auckland, New Zealand, 26–30 October 2019.
30. Fatemi, B.; Duval, Q.; Girdhar, R.; Drozdzal, M.; Romero-Soriano, A. Learning to Substitute Ingredients in Recipes. arXiv 2023, arXiv:2302.07960.
31. Qi, Z.; Yu, Y.; Tu, M.; Tan, J.; Huang, Y. FoodGPT: A Large Language Model in Food Testing Domain with Incremental Pre-training and Knowledge Graph Prompt. arXiv 2023, arXiv:2308.10173.
32. Zhou, P.; Min, W.; Fu, C.; Jin, Y.; Huang, M.; Li, X.; Mei, S.; Jiang, S. FoodSky: A Food-oriented Large Language Model that Passes the Chef and Dietetic Examination. arXiv 2024, arXiv:2406.10261v1.
33. Yang, Z.; Khatibi, E.; Nagesh, N.; Abbasian, M.; Azimi, I.; Jain, R.; Rahmani, A.M. ChatDiet: Empowering personalized nutrition-oriented food recommender chatbots through an LLM-augmented framework. Smart Health 2024, 32, 100465.
34. Tsampos, I.; Marakakis, E. Querying Knowledge Graphs in Greek Language. In Proceedings of the 17th International Conference on PErvasive Technologies Related to Assistive Environments, Crete, Greece, 26–28 June 2024.
35. Serderidis, K.; Konstantinidis, I.; Meditskos, G.; Peristeras, V.; Bassiliades, N. d2kg: An integrated ontology for knowledge graph-based representation of government decisions and acts—The Greek Programme Diavgeia case. Semant. Web 2024, 15, 1677–1699.
36. Tsampos, I.; Marakakis, E. A Medical Question Answering System with NLP and Graph Database. In Proceedings of the HeDAI 2023 Workshop, Co-Located with EDBT/ICDT 2023 Joint Conference, Ioannina, Greece, 28–31 March 2023.

Figure 1. DietQA System Pipeline.

Figure 2. DietQA System Frontend and Backend Architecture.

Figure 3. (a) The recipe’s Neo4j KG schema. (b) Example subgraph showing a recipe node Cupcakes with Nutella (Cupcakes με νουτέλλα) connected to its ingredients, i.e. eggs (αυγά), Nutella (νουτέλλα), self-rising flour (φαρίνα) and mascarpone (μασκαρπόνε), via CONTAINS relationships. Greek terms in parentheses reflect the original labels in the KG.

Figure 4. Search terms extracted from the original Greek query translated as “Dairy-Free chocolate chip cookies with oat, excluding eggs”. Terms are displayed as colour-coded pills: chocolate chip cookies, oat (βρώμη), eggs (αυγά), and Dairy-Free.

Figure 5. (a) Graph of Total Queries Coverage by Query Complexity, (b) Graph of Unique Dishes Coverage and Coverage Lift by Query Complexity.

Table 1. Comparison of DietQA with the most related systems. A ✓ indicates that the system supports the corresponding feature, while a ✗ indicates it does not.

System Features	DietQA	HealthGenie [15]	FoodKG [29]	pFoodReQ [16]	Health-Guided Recipe Recom. [10]	FoodSky [32]	ChatDiet [33]	FoodGPT [31]	GISMo [30]
Uses a food KG	✓	✓	✓	✓	✓	✓	✗	✓	✓
Formulated as KG-QA	✓	✓	✓	✓	✗	✗	✗	✗	✗
Multi-diet combination reasoning	✓	✓	✗	✓	✗	✗	✗	✗	✗
Diet-aware ingredient substitutions	✓	✓	✓	✗	✓	✗	✗	✗	✓
Conversational, multi-turn QA	✓	✓	✗	✗	✗	✓	✓	✗	✗
Symbolic constraint composition	✓	✓	✓	✓	✗	✗	✗	✗	✗
Explanation/rationale	✓	✓	✗	✗	✓	✓	✓	✗	✗
Quantitative multi-diet coverage and substitution lift	✓	✗	✗	✗	✗	✗	✗	✗	✗

Table 2. Nutritional Grouping of Retrieved Recipes for Preference-Based Comparison.

You’re looking for recipes with chicken and tomato, with a focus on high protein content, low fat and low carbohydrates, and a small number of total ingredients. You don’t like artichokes, broad beans, okra, or mushrooms.
Let’s look at the options:
Your search returned 44 recipes, mainly featuring chicken and tomato-based dishes, including main meals, salads, pizzas, and soups. Most recipes are suitable for gluten-free and dairy-free diets, and many are high in protein and low in fat.
To help you choose, we’ve grouped the recipes into two categories based on their nutritional profile:
Group 1 (Lighter): These recipes (18) have about 121 calories per 100 g, with high protein content (23%), moderate fat (27%), and fairly high carbohydrates (50%). Ideal if you’re looking for something light and protein-rich.
Group 2 (Richer): These recipes (26) have about 220 calories per 100 g, with lower protein content (15%), much higher fat (43%), and lower carbohydrates (18%). This group is richer in fats and may be more filling, but it doesn’t align well with your preference for low fat.

Table 3. Contextualized Recipe Evaluation Based on User Preferences.

The top three results are: calzone, chicken/turkey giouvetsi, and beer-roasted chicken, with 11, 10, and 8 ingredients, respectively.
Let’s look at how well these match your preferences:
Pan-fried Calzone: High in fat and carbs, so not ideal if you’re looking for low-fat, low-carb meals. You already have tomato, but you’ll need turkey, ham, tortillas, gouda, mozzarella, and peppers.
Chicken or Turkey Giouvetsi: A good option! Moderate in carbs, high in protein, and low in fat. You already have chicken, tomato, onion, pepper, and salt. You’ll need orzo, turkey, chicken stock cube, tomato paste, and garlic.
Beer-Roasted Chicken: Interesting choice, but high in fat. You already have tomato, chicken, salt, and black pepper. You’ll need beer, mustard powder, red pepper, and margarine.
Chicken with Eggplant in Pot: A very good choice! Low in calories, high in protein, and low in fat. You already have tomato and chicken. You’ll need tsakonian eggplants and oil.
Rooster Soup with Trahana: Also a good option, moderate in carbs. You already have potatoes, zucchini, tomato, onion, and chicken. You’ll need leeks, rooster, and celery.
Chicken with Lentil Rice: An interesting option, but moderate in fat. You already have chicken, tomato, and onion. You’ll need lentils, rice, and balsamic vinegar.
Chicken with Peas in Clay Pot: Not the best option, as it’s quite fatty. You have chicken, tomato, carrot, and onion.
Chicken with Okra: Not ideal either, as it’s high in fat and carbs, and you don’t like okra. You have tomato, onion, chicken, pepper, and salt.
Summary:
The Chicken or Turkey Giouvetsi and Chicken with Eggplant in Pot are the best matches for your preferences: high protein, low fat.
The rest have drawbacks in terms of fat, carbs, or required ingredients.
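The "you already have / you'll need" split in the responses above amounts to a set comparison between a recipe's ingredients and the user's pantry. A minimal sketch follows, using the Giouvetsi entry as data. The function name `pantry_split`, the match ratio, and the assumption that ingredient names have already been normalised to a shared vocabulary are ours for illustration; this is not a description of DietQA's implementation.

```python
def pantry_split(recipe_ingredients, pantry):
    """Split a recipe's ingredients into those on hand and those missing.

    Returns (have, need, match_ratio), preserving the recipe's ingredient order.
    Assumes names are already normalised (e.g., singular form, same language).
    """
    pantry = set(pantry)
    have = [i for i in recipe_ingredients if i in pantry]
    need = [i for i in recipe_ingredients if i not in pantry]
    ratio = len(have) / len(recipe_ingredients) if recipe_ingredients else 0.0
    return have, need, ratio


# Pantry loosely based on Figure A1, with names normalised by hand.
pantry = ["pumpkin", "tomato", "potato", "onion", "eggplant", "mustard",
          "chicken", "salt", "pepper", "black pepper", "carrot"]

# Ingredients of the Giouvetsi entry as listed in Table 3.
giouvetsi = ["chicken", "tomato", "onion", "pepper", "salt",
             "orzo", "turkey", "chicken stock cube", "tomato paste", "garlic"]

have, need, ratio = pantry_split(giouvetsi, pantry)
```

Exact string matching is the simplest choice; matching Greek surface forms to KG ingredient nodes would need a normalisation step that is outside this sketch.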

Table 4. Evaluation metrics for recipe retrieval and constraint satisfaction performance.

Top-1 Accuracy	Recall@3	nDCG@5	Constraint Satisfaction
88.3%	93.3%	0.908	90.0%
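For readers unfamiliar with the ranking metrics in Table 4, the sketch below shows one common way to compute Recall@k and nDCG@k for a single query. The linear-gain DCG variant, the choice to normalise against the ideal ordering of the retrieved list, and the example relevance labels are assumptions for illustration; the exact formulation used in the evaluation may differ.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / len(relevant_ids)


def dcg_at_k(gains, k):
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains[:k]))


def ndcg_at_k(gains, k):
    """DCG divided by the DCG of the same gains sorted in descending order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Invented example: five ranked recipe ids and a set of relevant ids.
ranked = ["r1", "r2", "r3", "r4", "r5"]
relevant = {"r1", "r3", "r9"}

recall3 = recall_at_k(ranked, relevant, 3)
gains = [1 if r in relevant else 0 for r in ranked]
ndcg5 = ndcg_at_k(gains, 5)
```

Per-query values like these would then be averaged over the benchmark queries to obtain figures comparable to those in the table.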

Table 5. Diet-constrained query resolution outcomes.

Covered Query Category	# Queries	Percentage of Covered Queries
All Covered Queries (≥1 recipe)	9718	100.0%
Covered by Direct Match	7233	74.4%
- Direct-Only Coverage	5691	58.6%
Covered by Substitution	4027	41.4%
- Substitution-Only Coverage	2485	25.6%
- Supplemented Coverage *	1542	15.9%

* “Supplemented Coverage” refers to queries that had at least one direct match and gained additional recipes via substitution.

Table 6. Diet complexity summary: queries, coverage, and substitution boosts.

# Diets	Total Queries	Queries with Results	Total Results	Coverage	Avg Results/Successful Query	Unique Dishes from Direct Match	Additional Unique Dishes from Substitution Coverage	Coverage Lift Using Substitution
1	9192	4335	17,836	47.2%	4.11	1084	250	+23%
2	13,788	3682	11,580	26.7%	3.15	614	319	+52%
3	9192	1473	3493	16.0%	2.37	368	320	+87%
4	2298	228	345	9.9%	1.51	228	257	+113%
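The derived columns in Table 6 follow directly from the raw counts. The short check below recomputes coverage, average results per successful query, and coverage lift. The formulas are inferred from the column names rather than stated in the table, and they reproduce the reported figures after rounding.

```python
# Raw counts from Table 6:
# (diets, total queries, queries with results, total results,
#  unique dishes from direct match, additional unique dishes via substitution)
ROWS = [
    (1, 9192, 4335, 17836, 1084, 250),
    (2, 13788, 3682, 11580, 614, 319),
    (3, 9192, 1473, 3493, 368, 320),
    (4, 2298, 228, 345, 228, 257),
]


def derive(row):
    """Recompute the derived columns for one row of the table."""
    diets, total, answered, results, direct, extra = row
    return {
        "diets": diets,
        "coverage_pct": round(100 * answered / total, 1),   # queries with >= 1 recipe
        "avg_results": round(results / answered, 2),        # per successful query
        "lift_pct": round(100 * extra / direct),            # extra dishes vs. direct
    }


derived = [derive(r) for r in ROWS]

# Totals across all diet-combination sizes.
total_queries = sum(r[1] for r in ROWS)
covered_queries = sum(r[2] for r in ROWS)
overall_coverage_pct = round(100 * covered_queries / total_queries, 1)
```

The totals tie the tables together: the 34,470 queries match Table 7, the 9718 covered queries match Table 5, and the overall coverage of 28.2% matches Table 8.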

Table 7. Summary of end-to-end latency (milliseconds) across 34,470 queries, covering all diet combinations from 1 to 4 and comparing runs with and without substitutions.

Span	Mean	Median	P95	P99	Min	Max
Aggregated queries (base + substitution)	213.7	186.7	402.3	575.4	169.78	1450.9
Base (no substitution)	105.2	92.4	197.7	345.1	83.9	967.9
With substitution	108.5	93.4	201.5	362.3	83.7	1323.9
1 diet, base	107.0	93.2	199.1	348.1	83.8	940.3
1 diet, substitution	109.0	93.7	201.7	364.9	84.4	1153.4
1 diet, aggregated	216.0	187.7	410.0	577.6	170.8	1300.9
2 diets, base	105.2	92.2	198.4	345.8	83.9	967.9
2 diets, substitution	108.7	93.3	202.5	363.7	83.7	1323.9
2 diets, aggregated	213.9	186.5	410.7	575.7	169.8	1416.4
3 diets, base	103.7	91.9	181.4	339.0	84.4	771.8
3 diets, substitution	107.8	93.2	201.1	358.2	84.4	1307.5
3 diets, aggregated	211.5	186.2	371.5	545.6	170.1	1450.9
4 diets, base	104.0	91.9	197.8	333.5	84.8	756.7
4 diets, substitution	107.9	93.2	200.5	360.4	84.6	1118.2
4 diets, aggregated	212.0	185.7	399.4	551.3	172.0	1212.1

Table 8. Experimental metrics for two databases of 4.4 K and 17.6 K recipes.

Metric	4.4 K Recipes DB	17.6 K Recipes DB	Change
Number of Recipes	4395	17,587	×4.0
Avg Ingredients per Recipe	9.24	9.25	≈same
Unique Dishes	1363	2298	+68.6%
Queries	20,445	34,470	+68.6%
Total Results	7885	33,258	×4.2
Coverage	20.5%	28.2%	+37.6%
No-result rate	79.5%	71.8%	−7.7 pp
Avg results/query	0.39	0.97	×2.50
Time per result (overall)	101.1 ms	65.0 ms	−36%
Results per second (successful)	9.9	15.4	+55%
Mean total latency/query	187.9 ms	213.7 ms	+13.8%
Median total latency/query	159.1 ms	186.7 ms	+17.4%
p95/p99 latency	406/578 ms	402/575 ms	≈same
Coverage Lift using Substitution	+36.6%	+37.8%	≈same

Table 9. User Study Task Set.

Task #	Task Description
1	Find a suitable breakfast that fits your usual dietary preferences or restrictions
2	Ask a nutrition-related question you are genuinely curious about (e.g., low-carb recipes, protein content, or alternatives to dairy)
3	Search for a lunch option under a specific micronutrient threshold that includes preferred ingredients while excluding disliked ones
4	Look for a meal from a specific dietary framework (e.g., Mediterranean, ketogenic) that is also high in protein
5	Find a dessert recipe that satisfies your preferences (e.g., low-fat, gluten-free) using varied expression methods
6	Perform complex queries with multiple dietary constraints (e.g., low-carb, high-protein, dairy-free) and evaluate result relevance
7	Request recipes incorporating specific ingredients within a chosen dietary framework and verify alignment with stated goals
8	Search for recipes featuring a particular main ingredient and explore preparation methods using preferred pantry items
9	Inquire about calorie content and macronutrient breakdown of commonly consumed ingredients or food items
10	Formulate one or more personalized questions reflecting genuine dietary needs, preferences, or nutritional curiosities

Table 10. User Experience Assessment Items.

Statement #	Statement
1	The system was easy to use
2	The recipe suggestions were helpful for my needs
3	I was satisfied with the quality and clarity of the answers
4	The system respected my dietary preferences and restrictions
5	I trust the nutritional information and recommendations given by the system
6	I felt the recommendations were tailored to my personal dietary needs
7	I enjoyed trying the recipes suggested by the system
8	I would continue using this system if it were available in my regular diet app

Table 11. System Usability Scale Items.

Item #	Statement
1	I think that I would like to use this system frequently
2	I found the system unnecessarily complex (reverse-scored)
3	I thought the system was easy to use
4	I think that I would need the support of a technical person to be able to use this system (reverse-scored)
5	I found the various functions in this system were well integrated
6	I thought there was too much inconsistency in this system (reverse-scored)
7	I would imagine that most people would learn to use this system very quickly
8	I found the system very cumbersome to use (reverse-scored)
9	I felt very confident using the system
10	I needed to learn a lot of things before I could get going with this system (reverse-scored)
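The reverse-scored items above matter when computing the overall SUS score. Under the standard SUS procedure, each odd-numbered item contributes (rating − 1), each even-numbered item contributes (5 − rating), and the sum is multiplied by 2.5 to give a score between 0 and 100. The sketch below applies that procedure to one hypothetical participant; the ratings are invented for illustration and are not study data.

```python
def sus_score(ratings):
    """Compute a System Usability Scale score from ten 1-5 Likert ratings.

    `ratings` must be ordered by item number (item 1 first).
    """
    if len(ratings) != 10:
        raise ValueError("SUS requires exactly 10 item ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must be between 1 and 5")
    total = 0
    for item_number, rating in enumerate(ratings, start=1):
        if item_number % 2 == 1:
            total += rating - 1   # positively worded items 1, 3, 5, 7, 9
        else:
            total += 5 - rating   # reverse-scored items 2, 4, 6, 8, 10
    return total * 2.5


# Hypothetical participant: agrees with positive items, disagrees with negative ones.
example_ratings = [5, 2, 4, 1, 4, 2, 5, 1, 4, 2]
example_score = sus_score(example_ratings)
```

A study-level SUS value is usually reported as the mean of the per-participant scores.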

Table 12. User Study Evaluation Results (Likert scale ratings converted to percentage agreement).

Aspect	Score (Positive %)
Ease of Use (interface and interaction)	92.5%
Usefulness of recommendations	85.0%
Satisfaction with response quality	85.0%
Personalization satisfaction	95.0%
Trust in system accuracy	87.5%
Adoption likelihood (continued use)	85.0%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
