Article

A GraphRAG-Based Question-Answering System for Explainable and Advanced Reasoning over Air Quality Insights

by Christos Mountzouris *, Grigorios Protopsaltis and John Gialelis
Department of Electrical and Computer Engineering, University of Patras, Rion, 26 504 Patras, Greece
*
Author to whom correspondence should be addressed.
Submission received: 18 January 2026 / Revised: 18 February 2026 / Accepted: 6 March 2026 / Published: 10 March 2026

Abstract

Exposure to poor indoor air quality (IAQ) conditions represents a major public health concern, with adverse effects on human health and well-being. The adoption of innovative technological solutions can support timely risk awareness, enable informed decision-making, and ultimately mitigate this health burden. In this context, Large Language Models (LLMs) emerge as a promising technological avenue through the Retrieval-Augmented Generation (RAG) paradigm, which extends their inherent natural language understanding capabilities with explicit access to external knowledge bases, enabling evidence-grounded reasoning and informed recommendations. The present work introduces an integrated GraphRAG-based Question Answering (QA) system that couples a domain-specific knowledge graph encoding fundamental IAQ concepts and relationships with a RAG-based natural language interface, thereby enabling explainable, context-aware, and advanced analytical reasoning over IAQ data. The evaluation results demonstrate the effectiveness of the proposed QA system across both retrieval and generation stages. The retrieval mechanism achieved a context recall of 0.914 and a precision of 0.838, while the generation mechanism attained a faithfulness score of 0.906 and an answer relevancy score of 0.891.

1. Introduction

Air pollution is a leading environmental risk factor for public health that imposes a substantial global burden of morbidity and mortality. In 2023 alone, air pollution was associated with an estimated 7.9 million deaths worldwide, corresponding to a loss of 232 million healthy life years [1]. About 35% of these deaths were attributable to household air pollution, largely in low- and middle-income countries, where limited access to electricity forces reliance on solid fuels for heating and cooking [2,3].
Mitigating the health burden associated with air pollution necessitates bottom-up initiatives that raise risk awareness and foster environmental literacy. Equally important is the democratization of the technological landscape through affordable and accessible tools that support this goal. The widespread adoption of low-cost sensors has recently driven a shift toward decentralized IAQ monitoring that reduces technical and economic barriers at household and community scales [4,5,6]. The ongoing advances in Machine Learning (ML) and Artificial Intelligence (AI) have enabled predictive analytics that support proactive and adaptive IAQ management strategies, anomaly detection, and personalized health recommendations [7,8,9,10,11]. The emergence of Large Language Models (LLMs) has opened new avenues for transparency and interpretability in IAQ insights, supporting reasoning about causal patterns between IAQ conditions, exposure, and health risks [12,13,14].
LLMs have reshaped how information is aggregated, accessed, and interpreted across digital environments. Nevertheless, three critical limitations persist [15]. First, LLMs are susceptible to hallucinations, often generating statements that appear plausible yet are factually incorrect [16,17]. This stems from their training objective of optimizing next-token prediction, which inherently prioritizes linguistic coherence over factual veracity. Second, LLMs are constrained by a temporal knowledge cut-off, as their knowledge is frozen at the end of their training cycle, and by their complete isolation from private and closed-source data resources [18,19]. Third, LLMs exhibit a persistent explainability deficit: their internal processes remain highly opaque, and their post hoc explanations are prone to hallucinations, obscuring the model’s true computational and reasoning pathways [20,21].
The Retrieval-Augmented Generation (RAG) architecture aims to mitigate these intrinsic limitations of LLMs. The fundamental idea behind RAG is the augmentation of LLMs with external knowledge at inference time, complementing their internal parametric memory [22]. Two coupled components underpin RAG architectures: a retriever and a generator. The retriever encodes the user’s question into a high-dimensional vector space and then performs a nearest-neighbor search to source the most semantically similar information, stored in the external knowledge base. The generator appends the retrieved context to the system’s prompt to facilitate context-aware reasoning and produce a factually grounded response.
GraphRAG extends the conventional RAG architecture by leveraging knowledge graphs to retrieve contextual information. This combination is particularly effective when the external knowledge is highly structured and governed by explicit semantic and ontological relationships [23,24]. While the external knowledge is represented as flat text embeddings in conventional RAG architectures, implicitly discarding much of its semantic and ontological structure, the GraphRAG paradigm explicitly encodes the semantics among entities.
The present study introduces a GraphRAG Question-Answering (QA) system that enables explainable reasoning, ad hoc analysis, and interactive exploration of IAQ insights and predictive analytics. While the GraphRAG architecture typically focuses on entity-relationship retrieval from unstructured text, the proposed design introduces a dual-source retrieval architecture. A Knowledge Graph (KG) serves both as a direct source for regulatory and IAQ domain knowledge queries, and as a semantic orchestrator that parameterizes queries to a distributed time-series database (TSDB). By coupling static domain knowledge with sensor measurements, this architecture transforms raw sensor readings into regulatory-grounded and semantically contextualized IAQ assessments. This architecture also moves beyond the passive information consumption of state-of-the-practice IAQ platforms, replacing rigid dashboards with an interactive, natural language interface capable of on-demand, multi-hop analytical reasoning. The main contributions of this study are summarized as follows:
  • A structured semantic representation of key entities and their relationships for the IAQ domain through a flexible and scalable KG schema, including air pollutants and climatic conditions, sensor and building infrastructure, environmental and health guidelines, and IAQ predictive models.
  • A dual-source retrieval architecture that implements a unified orchestration layer where the KG transforms natural language intent into deterministic queries for a TSDB. This design differentiates the system from generic GraphRAG frameworks by using the ontological structure of the KG to parameterize data retrieval rather than simply retrieving static text nodes.
  • A GraphRAG architecture that integrates a KG and a time-series database for the retrieval of IAQ domain knowledge and historical measurements to support context-aware reasoning in question answering.
  • A performance evaluation of the GraphRAG QA system, measuring its retrieval precision, contextual understanding, and the quality of generated answers.
The present study adopted the Design Science Research (DSR) framework to implement and validate the GraphRAG QA system. Section 2 details the system’s objectives and demonstrates its artifacts. Section 3 reports the system’s performance evaluation. Section 4 discusses the results. Section 5 outlines limitations and future research directions.

2. Methodology

2.1. Problem Identification

Limited causal interpretability and inadequate contextual reasoning have been consistently reported as critical deficiencies of IAQ applications in recent systematic reviews [7,8,11,25]. This deficiency is particularly pronounced when delivering predictive analytics, due to the limited interpretability inherent to AI/ML models. The lack of transparency regarding the contribution of individual predictors and the uncertainty of individual predictions also remains largely unaddressed. The opaqueness of AI/ML models further extends to decision support algorithms for IAQ management, which often do not reveal the causal paths and reasoning chains leading to specific interventions. Another critical challenge is knowledge fragmentation across environmental standards and health guidelines, which impedes the traceability of IAQ assessments to their authoritative origins. Their advanced semantic and reasoning capabilities position GraphRAG architectures as a promising solution to these challenges.
In parallel to these limitations, a broader usability gap extends across the IAQ domain: the passive interaction paradigm dominates the user interfaces of applications for smart environments, where rigid dashboards restrict ad hoc querying. User interfaces are often designed for expert users with domain knowledge, thereby creating a significant engagement barrier for broader stakeholder groups. Transitioning toward a natural language interaction paradigm that leverages the generative capabilities of LLMs could significantly bridge this gap.

2.2. Definition of the QA System’s Objectives

The proposed GraphRAG QA system aims to serve as an IAQ-domain expert that responds in natural language to users’ questions related to individual IAQ exposure and assessment through structured, context-aware reasoning grounded explicitly in an external knowledge base. This external knowledge base encompasses a KG that encodes a structured semantic representation of fundamental concepts and relationships in the IAQ domain, and a time-series database that stores historical IAQ measurements and predictive analytics of IAQ conditions. Notably, the system’s responses are decoupled from the internal parametric memory of the LLMs, thereby ensuring factual integrity.
The operational scope of the QA system is constrained to questions that can be answered through: (i) single-hop or multi-hop inferential reasoning over the knowledge encoded in the KG; (ii) targeted retrieval and aggregation of historical IAQ measurements and predictive analytics stored in the time-series database; and (iii) hybrid retrieval across both the KG and the time-series database. The QA system’s queryable parameters are grounded in the sensing infrastructure described in Section 2.4. Queries concerning climatic and IAQ conditions at the space level are answered using measurements from the static sensors at the relevant spatial abstraction level (e.g., room), whereas queries concerning individual exposure are addressed using measurements from the associated portable devices. Table 1 summarizes the climatic and IAQ parameters considered within the operational scope of the QA system. Table 2 provides examples of competency questions for the QA system.

2.3. Design of the Knowledge Graph (KG) Schema

Knowledge graphs (KGs) are systematic domain knowledge formalizations through shared semantics, providing a high-level abstraction layer that facilitates inferential reasoning and structured information retrieval. Attributed graphs constitute a graph-based data modeling paradigm, where nodes represent entities and directed, typed edges encode semantic relationships between them. The integration of attribute-value pairs into the graph primitives supports semantically rich annotations that enable more expressive graph-based traversal and reasoning.
The proposed QA system employs an attributed graph to organize IAQ domain knowledge and to support reasoning over user questions. IAQ measurements are delegated to a dedicated time-series database, as detailed in Section 2.4, to prevent unbounded graph growth and maintain efficient graph traversal. This decoupling ensures that the graph-based reasoning remains performant as the volume of IAQ measurements grows, and supports scalability and long-term maintainability of the QA system. Figure 1 illustrates the conceptual structure of the attributed graph.
Building entities serve as the highest-level spatial abstraction in the attributed graph, decomposed hierarchically into Apartment and Room entities to represent spatial partitions in the built environment. The one-to-many inclusion relationship hasApartment connects Building entities to Apartment entities, and in turn, Apartment entities are linked to Room entities through the one-to-many hasRoom relationship. User entities represent the occupants of the built environment, associated with Apartment entities through a one-to-one spatial association relationship, denoted as occupies. While a room-level attribution could enable more precise semantic grounding, the absence of a reliable mechanism for tracking occupants’ position necessitates this apartment-level attribution.
At the sensing infrastructure level, the attributed graph distinguishes between PortableDevice and StaticDevice entities. PortableDevice entities define a sensing abstraction at the user level that captures IAQ conditions in the occupant’s immediate surroundings; these entities maintain a one-to-one correspondence with User entities, denoted as hasPortableDevice. Conversely, StaticDevice entities provide a room-level sensing abstraction that characterizes ambient IAQ conditions; StaticDevice and Room entities are linked by the one-to-one hasInstalledDevice assignment relationship. Both entities at the sensing infrastructure level encompass attribute-value pairs describing their unique identification, technical and manufacturer specifications.
In the attributed graph, air pollutants and climatic conditions are modeled as first-class entities rather than device attributes. This modeling approach facilitates a modular and scalable representation of the monitored parameters. Pollutant and ClimaticCondition entities are semantically connected to sensing infrastructure entities through many-to-many measurement relationships, denoted as measuresPollutant and measuresClimaticCondition, respectively. Both environmental parameter entities encompass attribute-value pairs capturing their unique identifiers, standard and alternative nomenclature aliases, and units of measure.
Pollutant entities are further enriched with a set of interconnected entities that encode fundamental concepts in IAQ assessment. First, PollutantThreshold entities define concentration boundaries that delineate qualitative exposure categories for pollutants; these are associated with the corresponding Pollutant entities through a one-to-many relationship, denoted as hasThreshold. Second, ExposureDuration entities capture the temporal dimension of IAQ exposure, specifying the aggregation time window over which concentration thresholds are evaluated. PollutantThreshold and ExposureDuration entities are connected through a one-to-many appliesToDuration relationship. Third, HealthEffect entities represent the health implications of exposure, maintaining a mapping to exposure scenarios through the one-to-one hasHealthEffect relationship with ExposureDuration entities. Lastly, both HealthEffect and PollutantThreshold entities are mapped to Authority entities through the documentedBy and issuedBy relationships, respectively. This mapping enables traceability of IAQ exposure thresholds and associated health effects to environmental standards and health guidelines defined by authoritative bodies. The threshold values, exposure durations, and associated health effects encoded in these entities are grounded in established IAQ guidelines issued by authoritative bodies, including the World Health Organization (WHO) and the Environmental Protection Agency (EPA), following a systematic review of the relevant literature [26,27,28].
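The multi-hop traversal that this part of the schema enables can be sketched in plain Python over a toy in-memory graph. The node labels and relationship types mirror the schema above, while the dictionary layout and helper functions are illustrative assumptions; the actual system stores these entities in a graph database queried via Cypher.

```python
# Toy in-memory rendition of the attributed graph; labels and relationship
# types follow the schema in the text, the storage layout is illustrative.
graph = {
    "nodes": {
        "pm25": {"label": "Pollutant", "name": "PM2.5", "unit": "ug/m3"},
        "t1": {"label": "PollutantThreshold", "category": "Unhealthy", "value": 35.0},
        "d1": {"label": "ExposureDuration", "window": "24h"},
        "h1": {"label": "HealthEffect", "description": "Aggravated asthma"},
        "who": {"label": "Authority", "name": "WHO"},
    },
    "edges": [
        ("pm25", "hasThreshold", "t1"),
        ("t1", "appliesToDuration", "d1"),
        ("d1", "hasHealthEffect", "h1"),
        ("t1", "issuedBy", "who"),
        ("h1", "documentedBy", "who"),
    ],
}

def follow(graph, source, relation):
    """Return targets reachable from `source` via edges typed `relation`."""
    return [t for s, r, t in graph["edges"] if s == source and r == relation]

def threshold_context(graph, pollutant_id):
    """Multi-hop walk: Pollutant -> Threshold -> Duration -> HealthEffect/Authority."""
    context = []
    for th in follow(graph, pollutant_id, "hasThreshold"):
        for dur in follow(graph, th, "appliesToDuration"):
            context.append({
                "category": graph["nodes"][th]["category"],
                "value": graph["nodes"][th]["value"],
                "window": graph["nodes"][dur]["window"],
                "health_effects": [graph["nodes"][h]["description"]
                                   for h in follow(graph, dur, "hasHealthEffect")],
                "issued_by": [graph["nodes"][a]["name"]
                              for a in follow(graph, th, "issuedBy")],
            })
    return context

print(threshold_context(graph, "pm25"))
```

Walking these typed relationships is what lets an answer cite both the numeric threshold and the authority that issued it, which is the traceability property emphasized above.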
PredictionModel entities encode the provenance and configuration of the IAQ forecasting models, associated with Pollutant entities through the one-to-one predictsPollutant relationship. PredictionHorizon entities capture the forecasting horizon supported by each predictive model, with distinct nodes representing discrete intervals in multi-step forecasting or specific hyperparameter configurations. The one-to-many supportsHorizon relationship maps PredictionModel entities to PredictionHorizon entities. In turn, PredictionHorizon entities are linked to PredictionUncertainty entities through the one-to-one appliesTo relationship, which quantifies the model’s reliability for a specific forecasting horizon, encompassing attribute-value pairs describing confidence intervals and error metrics. PredictionModel entities are further associated with PredictiveFeature entities through the one-to-many relationship usesFeature that enables the QA system to provide transparent reasoning about the primary drivers of a forecast.
The PredictionModel and ExplanationMethod entities are connected through the one-to-many supportsExplanation relationship, which captures the underlying technical approach for the interpretation of forecasts. These methods include local and global attribution methods such as Shapley Additive Explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) ensuring that the rationale for every prediction is grounded in a specific diagnostic framework. The ExplanationArtifact entities are linked to ExplanationMethod entities via the produces relationship, persisting attribute-value pairs for the specific quantifiable attribution scores that explain forecasts.
A detailed specification of the core entity properties is provided in Table A1, including their associated data types and their description. A description of the semantics behind entities relationships of the attributed graph is provided in Table A2. A set of example Cypher queries is provided in Table A3.

2.4. Data Management Platform for Context-Aware Retrieval of IAQ Measurements and Predictions

The GraphRAG QA system interfaces with a Data Management Platform (DMP) that integrates measurements from a static sensor layer and a portable sensing layer. The static sensor layer utilizes low-cost sensing devices built upon the Smart Citizen Kit (SCK), an open-source environmental monitoring platform that provides modular sensing units based on a data board equipped with an ARM Cortex-M0+ (SAMD21) microcontroller and an ESP8266 Wi-Fi module [29]. In their base configuration, these static units use the Sensirion SHT31 sensor to monitor air temperature and relative humidity, and are also expanded via specialized sensors to monitor gaseous pollutants and particulate matter, including the Sensirion SCD30 for CO2, the Sensirion SEN55 for PM1, PM2.5, PM4, and PM10, and the Sensirion SFA30 for formaldehyde. The portable layer employs lab-developed devices that feature a dual-board architecture centered on the STM32WL55 LoRa SoC, facilitating real-time data transmission over a LoRaWAN network. These devices integrate the Sensirion SHT41 for temperature and humidity, the Sensirion SGP41 for VOC index, the Sensirion SCD40 for CO2, and the Sensirion SPS30 for particulate matter (PM1, PM2.5, PM4, and PM10).
All collected data are ingested into the FIWARE-based DMP via the Next Generation Service Interface (NGSI) protocol [30]. The DMP utilizes a Context Broker to manage real-time attribute-level state updates to entities that conform to Smart Data Models [31]. Both IAQ measurements and short-term predictions are represented as distinct entities with standardized semantics, metadata, and temporal markers, ensuring semantic interoperability across heterogeneous sources. These entities are stored in CrateDB, a distributed TSDB, which the GraphRAG system queries via RESTful API endpoints. This architecture supports multi-level querying at both the static and portable sensing device layers by accepting structured arguments for specific environmental parameters, temporal windows, and statistical aggregation functions.
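The multi-level querying described above can be illustrated with a minimal request builder. The endpoint path and parameter names below are assumptions for illustration only, not the DMP's actual API surface.

```python
# Hypothetical construction of a parameterized request against the DMP's
# time-series endpoint; path and query-parameter names are assumptions.
from urllib.parse import urlencode

def build_tsdb_query(base_url, device_id, attribute, window_hours, aggregation):
    """Assemble a RESTful query for aggregated historical measurements."""
    params = {
        "attrs": attribute,           # environmental parameter, e.g. "co2"
        "lastN_hours": window_hours,  # temporal window
        "aggrMethod": aggregation,    # statistical aggregation, e.g. "avg"
    }
    return f"{base_url}/entities/{device_id}?{urlencode(params)}"

url = build_tsdb_query("https://dmp.example.org/v2", "StaticDevice-17", "co2", 24, "avg")
print(url)
```

The same builder shape covers both sensing layers, since static and portable devices differ only in the entity identifier supplied.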
It should be stressed that the IAQ forecasting models are treated as supporting components in the proposed GraphRAG architecture. Their detailed formulation, training procedures, and comparative evaluation are beyond the scope of this paper; selected aspects of these models have been reported in prior studies involving diverse real-world deployments [32,33,34].

2.5. Design of the Intention Detection Mechanism

The intention detection mechanism validates incoming questions for the QA system based on their semantic relevance to the IAQ domain. It evaluates the user’s intent against the QA system’s semantic and functional boundaries through a two-step cascading filtering approach that balances computational overhead and classification precision. In the first step, incoming questions are mapped into dense vector representations using the Sentence Transformers all-MiniLM-L6-v2 model. These latent representations are then compared against the pre-computed embeddings of a reference corpus using cosine similarity. This step is designed to reject ambiguous and irrelevant questions, reducing unnecessary LLM calls while ensuring high precision among automatically accepted questions. In the second step, incoming questions that do not exceed a specific similarity threshold at the first step are re-evaluated using an LLM-as-a-judge module based on the OpenAI gpt-4o model. This module is designed to emulate a human annotator’s heuristic judgement by applying a structured validation rubric, producing a binary eligibility verdict.
Let $u \in \Sigma^*$ denote a user's incoming question in natural language, transformed into a dense vector $\mathbf{u} = f_{se}(u) \in \mathbb{R}^{384}$, where $f_{se}: \Sigma^* \to \mathbb{R}^{384}$ is the embedding function of the all-MiniLM-L6-v2 model. Let $R = \{r_1, r_2, \ldots, r_{30}\}$ denote the natural language questions of a reference corpus, where each question $r_i \in \Sigma^*$ is also transformed into a dense vector $\mathbf{r}_i = f_{se}(r_i) \in \mathbb{R}^{384}$. Let $\tau$ denote the similarity threshold for the first-stage classifier. Equation (1) defines the cosine similarity function. Equations (2) and (3) describe the eligibility decision at the first and the second stage, respectively. The overall binary eligibility decision is formalized in Equation (4).

$$\sigma(u) = \max_{i \in \{1,2,\ldots,30\}} \cos(\mathbf{u}, \mathbf{r}_i) = \max_{i \in \{1,2,\ldots,30\}} \frac{\mathbf{u}^{\top}\mathbf{r}_i}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{r}_i \rVert} \tag{1}$$

$$\hat{y}_1(u) = \mathbb{1}\left[\sigma(u) \geq \tau\right], \quad \text{where } \mathbb{1}(\cdot) \in \{0,1\} \tag{2}$$

$$\hat{y}_2(u) = g(u), \quad \text{where } g(\cdot) \in \{0,1\} \tag{3}$$

$$\hat{y}(u) = \mathbb{1}\left[\sigma(u) \geq \tau\right] + \mathbb{1}\left[\sigma(u) < \tau\right] \cdot \hat{y}_2(u) \tag{4}$$
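The two-stage cascade formalized in Equations (2)–(4) can be sketched as follows. The embedding model and the LLM-as-a-judge are replaced by injectable stubs so the control flow is runnable without external services; the stub vectors and judge rule are assumptions for illustration.

```python
# Minimal sketch of the two-stage eligibility cascade; `embed` stands in
# for all-MiniLM-L6-v2 and `llm_judge` for the gpt-4o adjudicator.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def is_eligible(question, embed, reference_vectors, llm_judge, tau=0.7):
    """Stage 1: accept if max cosine similarity >= tau; else defer to stage 2."""
    u = embed(question)
    sigma = max(cosine(u, r) for r in reference_vectors)
    if sigma >= tau:            # y_hat_1 = 1: accepted without an LLM call
        return True
    return llm_judge(question)  # y_hat_2: binary verdict from the judge

# Toy usage with a stub embedder and judge (2-D vectors for readability).
refs = [[1.0, 0.0]]
embed = lambda q: [0.9, 0.1] if "co2" in q else [0.1, 0.9]
judge = lambda q: "air" in q
print(is_eligible("what is the co2 level", embed, refs, judge))
```

Note that the judge is only invoked on the low-similarity path, which is the source of the LLM-call savings claimed above.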
The similarity threshold $\tau$ was calibrated to minimize out-of-scope (OOS) leakage in the first-stage binary eligibility decision $\hat{y}_1(u)$. A human-annotated set $D$ of 90 questions was constructed, comprising 45 in-scope and 45 out-of-scope questions. For each question $u \in D$, the similarity score $\sigma(u)$ was computed against the reference corpus $R$, and the corresponding first-stage eligibility $\hat{y}_1(u)$ was derived. A grid search over $\tau \in [0.3, 0.8]$ with step 0.025 was then conducted, and the OOS False Accept Rate (FAR) was estimated as the fraction of OOS questions in $D$ for which $\hat{y}_1(u) = 1$, i.e., questions incorrectly admitted by the first-stage filter. To ensure strict filtering, the leakage constraint was set to $FAR \leq 5\%$, allowing at most two OOS questions in $D$ to be accepted by the first-stage filter. Among candidate thresholds satisfying this constraint, $\tau = 0.7$ was selected as it minimized OOS leakage while preserving a high rate of direct first-stage acceptance for in-scope questions, thereby reducing reliance on the second-stage adjudicator $g(\cdot)$ for clearly in-scope questions.
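The grid-search calibration can be sketched with synthetic similarity scores; the scores and set sizes below are made up for the example and are not the study's annotated data.

```python
# Sweep tau over [0.3, 0.8] in 0.025 steps; among thresholds whose OOS
# False Accept Rate stays within the leakage budget, keep the one that
# maximizes direct first-stage acceptance of in-scope questions.
def calibrate_tau(in_scope_scores, oos_scores, max_far=0.05):
    grid = [round(0.3 + 0.025 * i, 3) for i in range(21)]  # 0.300 .. 0.800
    best = None
    for tau in grid:
        far = sum(s >= tau for s in oos_scores) / len(oos_scores)
        if far > max_far:
            continue  # violates the OOS leakage constraint
        accept = sum(s >= tau for s in in_scope_scores) / len(in_scope_scores)
        if best is None or accept > best[1]:
            best = (tau, accept)
    return best  # (selected tau, in-scope first-stage acceptance rate)

# Synthetic similarity scores for six in-scope and six OOS questions.
in_scope = [0.85, 0.82, 0.78, 0.74, 0.71, 0.66]
oos = [0.62, 0.55, 0.48, 0.41, 0.33, 0.28]
print(calibrate_tau(in_scope, oos))
```

Because lower thresholds admit more OOS questions, the constraint effectively picks the smallest tau that keeps leakage within budget, which also maximizes direct acceptance.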

2.6. Design of the GraphRAG Context Retrieval Mechanism

The reference corpus $Q$ represents a collection of natural language questions covering the intended query space of the QA system. Each question $q \in Q$ in the reference corpus also encapsulates a structured metadata object defining executable primitives for knowledge retrieval. For questions targeting historical IAQ measurements and predictions of IAQ conditions, the metadata object defines the parameters of an HTTP request to a specific endpoint of the DMP, including the environmental attributes, temporal windows, and the statistical aggregation function. For questions targeting contextual knowledge of the IAQ domain, such as thresholds for IAQ exposure and associated health effects, or questions requiring knowledge for interpreting IAQ predictions, the metadata object specifies the parameters of a Cypher query. These queries traverse entities and their underlying semantic relationships, as defined in the schema of the attributed graph. For questions requiring complex reasoning, the specific primitives for coordinated execution are defined.
For the proposed QA system, the reference corpus $Q$ serves as a superset of the narrower set $R$ used to validate the user's intent. The corpus construction follows a hybrid human-LLM approach: 40 questions were human-authored to establish a foundation grounded in the system's supported functionalities and established IAQ terminology; 60 additional questions were generated using the OpenAI gpt-4o-mini model as paraphrased variations to increase semantic diversity and linguistic coverage. To ensure corpus integrity, all LLM-generated questions underwent human-in-the-loop curation to verify their domain relevance and alignment with the system's operational scope, effectively preventing OOS questions from entering the reference set.
Let $Q_H \subset Q$ denote the subset of human-authored questions and $Q_L \subset Q$ denote the subset of LLM-generated questions. Equation (5) formalizes the composition of the reference corpus $Q$.

$$Q = Q_H \cup Q_L, \quad Q_H \cap Q_L = \emptyset, \quad \text{where } |Q_H| = 40 \text{ and } |Q_L| = 60 \tag{5}$$

Let $J = \{i_1, i_2, \ldots, i_M\}$ denote the QA system's intent space encompassing $M$ distinct functional intents. Let $f: Q \to J$ be a mapping function that assigns each question $q \in Q$ to a specific functional intent $i_k \in J$. Equation (6) imposes a surjectivity constraint, ensuring that every functional intent is represented in the reference corpus.

$$\forall i_k \in J,\ \exists q \in Q:\ f(q) = i_k \tag{6}$$

Each question $q \in Q$ in the reference corpus is transformed into a dense vector $\mathbf{q} = f_{te}(q) \in \mathbb{R}^{1536}$, where $f_{te}: \Sigma^* \to \mathbb{R}^{1536}$ represents the embedding function of the OpenAI text-embedding-3-small model. Dense vectors are then stacked into an embeddings matrix $E$, and are subsequently stored in a ChromaDB vector database. Equation (7) defines the embeddings matrix $E$ for the reference corpus $Q$.

$$E = \begin{bmatrix} f_{te}(q_1) \\ f_{te}(q_2) \\ \vdots \\ f_{te}(q_{100}) \end{bmatrix} = \begin{bmatrix} \mathbf{q}_1 \\ \mathbf{q}_2 \\ \vdots \\ \mathbf{q}_{100} \end{bmatrix} \in \mathbb{R}^{100 \times 1536} \tag{7}$$

At inference time, the incoming question $u$ is transformed into a dense vector $\mathbf{u}$ using $f_{te}: \Sigma^* \to \mathbb{R}^{1536}$ to ensure dimensional alignment with the pre-computed embeddings stored in matrix $E$. This dense vector is queried against ChromaDB to identify questions from the reference corpus $Q$ that approximate the user's semantic intent. The cosine similarity between $\mathbf{u}$ and each $\mathbf{q}_i$ is computed, retrieving the $k$ questions with the highest scores, along with their structured metadata objects. Equation (8) defines the retrieval operation for the $k = 3$ most similar canonical questions. Equation (9) presents the resulting context set for augmentation, where $M(q)$ denotes the metadata object for question $q$.

$$Q_{ret} = \operatorname*{arg\,max}^{(k)}_{q_i \in Q} \cos(\mathbf{u}, \mathbf{q}_i) = \operatorname*{arg\,max}^{(k)}_{q_i \in Q} \frac{\mathbf{u}^{\top}\mathbf{q}_i}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{q}_i \rVert}, \quad \text{where } k = 3 \tag{8}$$

$$C_{aug} = \{\, M(q) \mid q \in Q_{ret} \,\} \tag{9}$$
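A minimal sketch of this top-k retrieval step follows, using hand-written low-dimensional vectors in place of text-embedding-3-small embeddings and a plain list in place of ChromaDB; the intent labels in the metadata are illustrative.

```python
# Top-k retrieval of canonical questions and their metadata objects by
# cosine similarity; toy 3-D vectors stand in for 1536-D embeddings.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def retrieve_top_k(u, corpus, k=3):
    """Rank reference questions by similarity and return their metadata objects."""
    scored = sorted(corpus, key=lambda item: cosine(u, item["vector"]), reverse=True)
    return [item["metadata"] for item in scored[:k]]

corpus = [
    {"vector": [0.9, 0.1, 0.0], "metadata": {"intent": "temporal_analysis"}},
    {"vector": [0.1, 0.9, 0.0], "metadata": {"intent": "health_assessment"}},
    {"vector": [0.0, 0.1, 0.9], "metadata": {"intent": "explainable_prediction"}},
    {"vector": [0.8, 0.2, 0.0], "metadata": {"intent": "temporal_analysis"}},
]
print(retrieve_top_k([1.0, 0.0, 0.0], corpus, k=3))
```

The returned metadata objects, not the question texts themselves, are what feed the augmentation set, since they carry the executable retrieval primitives.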

2.7. Design of the GraphRAG Augmented Generation Mechanism

The augmented generation mechanism implements a dual-LLM pipeline. The first LLM is based on the OpenAI gpt-4o model, configured with a temperature of 0 to prioritize deterministic output. It acts as a neural semantic parser that translates the user's question into an executable retrieval directive, encompassing either graph traversal logic or parameterized API requests. A few-shot in-context learning strategy is employed to ensure strict compliance with the syntax and schema definitions of the targeted retrieval source. The inference prompt is constructed on-the-fly, augmenting the user's question $u$ with the context set $C_{aug}$ comprising the $k = 3$ most similar canonical questions and their associated gold-standard executable retrieval directives. In addition, the prompt is augmented with the formal schema definition of the attributed graph, and the operational constraints and syntactic specifications governing the DMP API.
The second LLM is also based on the OpenAI gpt-4o model and acts as a verbalizer, ingesting the retrieved data to generate a coherent answer in natural language. Once the generated directives have been executed and the corresponding knowledge or data retrieved, these results are injected into the inference prompt alongside the original user question $u$ to generate the final response. The verbalizer's role is strictly limited to semantic realization and linguistic formulation, ensuring that the final answer is based exclusively on the provided context, without incorporating any external information.
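The few-shot prompt assembly for the semantic parser might look as follows; the section headers, template wording, and exemplar directive below are assumptions about the design, not the study's actual prompts.

```python
# Hypothetical few-shot prompt construction for the neural semantic parser.
def build_parser_prompt(question, retrieved, graph_schema, api_spec):
    """Augment the user question with retrieved exemplars plus static schema context."""
    shots = "\n\n".join(
        f"Question: {ex['question']}\nDirective: {ex['directive']}" for ex in retrieved
    )
    return (
        "You translate IAQ questions into executable retrieval directives.\n\n"
        f"Graph schema:\n{graph_schema}\n\n"
        f"API specification:\n{api_spec}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Question: {question}\nDirective:"
    )

# Illustrative exemplar with a made-up gold-standard directive.
exemplars = [
    {"question": "What was the average CO2 in the kitchen yesterday?",
     "directive": '{"endpoint": "/entities/StaticDevice-03", "attrs": "co2", "aggrMethod": "avg"}'},
]
prompt = build_parser_prompt(
    "What was the maximum PM2.5 in the bedroom last week?",
    exemplars,
    "(:Pollutant)-[:hasThreshold]->(:PollutantThreshold)",
    "GET /entities/{id} with attrs, lastN_hours, aggrMethod",
)
print(prompt)
```

Placing the schema and API constraints before the exemplars keeps the static context identical across calls, while only the exemplars and the trailing question vary per query.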

3. Results

3.1. Evaluation Metrics for the GraphRAG QA System’s Performance

The performance of the proposed GraphRAG QA system is evaluated using the Retrieval-Augmented Generation Assessment (RAGAS) framework, which provides dedicated evaluation metrics for the retrieval and generation artifacts of a RAG system. Given the semantic and contextual nature of these metrics, the RAGAS framework leverages the LLM-as-a-judge paradigm to compute the corresponding scores. In the present study, the evaluation metrics are computed using the OpenAI gpt-4o model as the LLM-as-a-judge.
The evaluation of the retrieval artifacts assesses the RAG’s precision and recall, indicating whether it retrieves relevant and sufficient context from the knowledge base to respond to the user’s question. In GraphRAG architectures, these metrics reflect whether the executable query directives successfully retrieve the complete supporting knowledge to respond to the user’s question. The evaluation of the generation artifacts measures the factual consistency of the generated answer and its semantic relevance with respect to the user’s question. Since GraphRAG modifies only the retrieval and reasoning mechanism of a conventional RAG architecture, the generation evaluation metrics remain architecture-agnostic.
The metrics used to evaluate the retrieval performance are: (i) context recall that measures the proportion of ground truth claims that are covered by the retrieved context, and in the proposed GraphRAG reflects whether the executable query retrieves all the necessary information from the attributed graph and the time-series database; (ii) context precision that evaluates the proportion of relevant information within the retrieved context, and in the proposed GraphRAG evaluates whether the retrieved graph elements and the API responses are strictly relevant to the query.
The metrics used to evaluate the generation performance are: (i) faithfulness that assesses the factual consistency of the generated answer against the retrieved context, ensuring that the answer is derived solely from retrieved knowledge and not from the LLM’s parametric memory; (ii) answer relevancy that measures how pertinent the generated answer is to the user’s original question, penalizing answers that are factually correct but fail to address the specific intent of the question.
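The claim-level intuition behind these metrics can be illustrated with a toy scorer. RAGAS delegates the support judgement to an LLM; the sketch below substitutes a naive substring check purely to make the proportion computation concrete, and the claims and context are invented examples.

```python
# Toy claim-level scorer: context recall and faithfulness both reduce to
# "what fraction of claims does a given text support"; here support is a
# case-insensitive substring check instead of an LLM judgement.
def proportion_supported(claims, context):
    supported = sum(1 for c in claims if c.lower() in context.lower())
    return supported / len(claims)

ground_truth_claims = ["PM2.5 exceeded 35 ug/m3", "the 24h window applies"]
retrieved_context = "PM2.5 exceeded 35 ug/m3 over the last day; the 24h window applies per WHO."
print(proportion_supported(ground_truth_claims, retrieved_context))
```

Scoring ground-truth claims against the retrieved context gives the recall-shaped metric, whereas scoring the generated answer's claims against that same context gives faithfulness.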

3.2. Evaluation Dataset for the QA System

A comprehensive set of 155 competency questions was constructed to evaluate the GraphRAG QA system's performance, spanning the entire intent space. This set comprises questions related to temporal analysis of IAQ conditions, predictive analytics, explainable predictions, compliance with IAQ environmental standards, health assessments, IAQ domain knowledge, and sensing infrastructure retrieval. Each question is paired with the corresponding gold-standard retrieval directive, consisting of executable Cypher queries, structured API request objects, or a combination of the above. Table 3 presents the distribution of competency questions across categories, including the specific variables addressed in each.

3.3. Evaluation Results for the QA System—Baseline

To benchmark the performance of the GraphRAG QA system, a Text2Query baseline was established using the same underlying KG and TSDB. In this approach, the user’s question is passed directly to the OpenAI gpt-4o model, configured with a temperature of 0, without prior intent classification or retrieval orchestration. The LLM is prompted in a zero-shot manner to generate the appropriate Cypher queries and structured API requests, augmented solely with the formal schema definition of the attributed graph and the syntactic specifications of the DMP API. No reference corpus, few-shot examples, or intent-parameterized metadata objects are incorporated into the inference prompt. The generated queries are executed directly against the respective databases, and the raw results are passed to the same verbalizer LLM for natural language generation under identical configuration. This design ensures that the baseline operates under the same data infrastructure and static schema context as the proposed system, isolating the contribution of the few-shot retrieval orchestration layer.
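A zero-shot prompt of this kind might be assembled as sketched below. The schema snippet, API specification string, function name, and prompt wording are illustrative assumptions, not the authors' exact artifacts; only the overall shape (static schema context, no few-shot examples) follows the baseline described above.

```python
# Illustrative Text2Query prompt assembly (hypothetical schema excerpt
# and wording; the authors' actual prompt is not reproduced here).

GRAPH_SCHEMA = """
(:Building)-[:hasApartment]->(:Apartment)-[:hasRoom]->(:Room)
(:Room)-[:hasInstalledDevice]->(:StaticDevice)-[:measuresPollutant]->(:Pollutant)
"""

API_SPEC = "GET /measurements?deviceID=&pollutant=&from=&to=&agg={avg|max|min}"

def build_text2query_prompt(question: str) -> str:
    """Augment the user question with static schema context only --
    no few-shot examples, no intent classification."""
    return (
        "You translate questions about indoor air quality into executable "
        "Cypher queries and/or structured API requests.\n"
        f"Graph schema:\n{GRAPH_SCHEMA}\n"
        f"Time-series API:\n{API_SPEC}\n"
        f"Question: {question}\n"
        "Return only the query or queries, with no explanation."
    )

prompt = build_text2query_prompt("What was the average CO2 in the bedroom yesterday?")
# The prompt would then be sent to gpt-4o with temperature=0, e.g. via
# client.chat.completions.create(model="gpt-4o", temperature=0, ...)
```

Because the model must emit a syntactically correct query in a single inference step from this static context alone, errors compound on multi-hop questions, which is consistent with the recall degradation reported below.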
At the retrieval level, the Text2Query baseline achieved an overall context recall of 0.843 and a context precision of 0.711, indicating that approximately 28.9% of the retrieved context is not directly relevant to the query. On temporal analysis questions, the baseline exhibited relatively high performance, achieving context recall scores between 0.92 and 0.95, which reflects the relatively straightforward mapping between temporal natural language queries and structured API queries. However, performance degraded substantially for questions requiring multi-hop reasoning over the attributed graph, where context recall dropped below 0.70. This decline is attributed to the inherent difficulty of generating syntactically correct and semantically complete multi-hop Cypher queries in a single inference step, particularly for questions involving explainable predictions, where the LLM must traverse complex schema relationships without the guidance of an intent classification and retrieval orchestration layer.

3.4. Evaluation Results for the QA System—GraphRAG

The GraphRAG QA system reported strong retrieval performance, achieving a context recall of 0.914 and a context precision of 0.838. The particularly high context recall reflects the system’s effectiveness in retrieving the necessary evidence from the external knowledge base to answer user questions. The context precision of 0.838 further indicates that approximately 16.2% of the retrieved context is not directly relevant to the query. The system achieved near-perfect context recall scores (0.98–1.00) on questions related to temporal analysis, demonstrating the deterministic advantage of structured retrieval. By accurately mapping the user’s intent to API parameters (e.g., timestamps, aggregation functions), the system eliminates the ambiguity often found in semantic search. The system also exhibited strong performance on questions requiring multi-hop reasoning over the attributed graph, as well as hybrid queries requiring time-series and attributed graph retrievals, achieving context recall scores between 0.86 and 0.94.
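The deterministic mapping from a classified temporal intent to unambiguous API parameters can be sketched as follows. The intent fields, request shape, and window labels are assumptions for illustration, not the system's actual interface.

```python
# Sketch of deterministic intent-to-parameter mapping for temporal
# analysis (hypothetical field names; a fixed clock keeps it reproducible).
from datetime import datetime, timedelta

def temporal_intent_to_api_params(intent: dict) -> dict:
    """Translate a classified temporal-analysis intent into an exact
    time-series API request -- no embedding similarity involved."""
    now = datetime(2026, 1, 15, 12, 0)  # fixed reference time for the example
    window = {"last_24h": timedelta(hours=24),
              "last_7d": timedelta(days=7)}[intent["window"]]
    return {
        "pollutant": intent["pollutant"],
        "from": (now - window).isoformat(),
        "to": now.isoformat(),
        "agg": intent.get("aggregation", "avg"),
    }

params = temporal_intent_to_api_params(
    {"pollutant": "CO2", "window": "last_24h", "aggregation": "max"}
)
# params["from"] -> "2026-01-14T12:00:00", params["agg"] -> "max"
```

Because every field of the request is fixed by the classified intent, the retrieved context is exactly the evidence needed, which is one plausible reading of the near-perfect recall on temporal questions.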
The evaluation of the GraphRAG QA system’s generation artifacts also revealed strong performance, reflected by a faithfulness score of 0.906 and an answer relevancy score of 0.891. Both metrics remained relatively stable across the different question categories, indicating that they capture general-purpose generation capabilities rather than behavior driven by specific question types. In addition, the strong convergence between factual grounding and semantic relevance indicates that the system effectively utilizes the retrieved context to construct answers that are both accurate and directly responsive to user intent, minimizing the trade-off between factual grounding and answer relevance.
Lower recall scores (0.75–0.82) were observed for questions related to explainable predictions, stemming from a fundamental architectural trade-off in the proposed KG design: storing explanation artifacts as JSON objects rather than decomposed graph entities introduced retrieval difficulties that scale with explanation method complexity. While simple global explanation methods, such as Random Forest feature importance or linear regression coefficients, produce flat importance vectors that can be trivially stored as node attributes, complex instance-level explanation methods generate hierarchical artifacts requiring nested data structures. SHAP outputs include per-instance base values, individual feature contributions, and potentially second-order interaction effects; LIME produces approximation coefficients, sample weights, and local model fidelity metrics; attention mechanisms yield multi-head weight matrices across transformer layers. These complex structures were encapsulated as JSON blobs to preserve atomic semantic integrity. However, JSON blobs are opaque to graph query optimizers: queries seeking predictions where specific features dominated SHAP contributions required client-side deserialization of all ExplanationArtifact nodes rather than database-level filtering. Error analysis revealed that retrieval failures occurred when relevant complex explanation components existed within the JSON but failed to surface in top-k semantic search, owing to the inability of embedding models to capture nested attribute structure. Query optimization strategies, including algorithm-based pre-filtering, two-stage retrieval, and hybrid indexing, partially mitigated these limitations, but structural constraints persisted for relational operations within SHAP and LIME components.
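The limitation can be made concrete: filtering predictions by their dominant SHAP feature forces every artifact blob to be deserialized on the client, since the database cannot index inside the JSON payload. The node and field names below follow the ExplanationArtifact schema in Appendix A; the toy payloads and filtering helper are an illustrative sketch.

```python
# Client-side filtering over JSON-encapsulated explanation artifacts
# (toy payloads; field names follow the Appendix A schema).
import json

artifacts = [
    {"artifactID": "EXAR01",
     "featureWeights": json.dumps({"lagged_CO2": 0.71, "temperature": 0.12})},
    {"artifactID": "EXAR02",
     "featureWeights": json.dumps({"lagged_CO2": 0.05, "humidity": 0.64})},
]

def artifacts_where_feature_dominates(artifacts, feature, threshold=0.5):
    """Every blob must be parsed in application code: the graph query
    optimizer cannot push this predicate down into the database."""
    hits = []
    for node in artifacts:
        weights = json.loads(node["featureWeights"])
        if weights.get(feature, 0.0) >= threshold:
            hits.append(node["artifactID"])
    return hits

dominant = artifacts_where_feature_dominates(artifacts, "lagged_CO2")
# -> ["EXAR01"]: only the first artifact's lagged_CO2 weight exceeds 0.5
```

Had `featureWeights` been decomposed into first-class graph entities, the same predicate could have been expressed as an indexed `WHERE` clause, which is precisely the decomposition strategy proposed as future work.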

4. Discussion

The experimental results demonstrate that the proposed GraphRAG QA system achieves strong retrieval and generation performance across a diverse intent space, spanning temporal analysis, predictive analytics, explainable predictions, compliance monitoring, health impact assessment, domain knowledge retrieval, and sensing infrastructure inspection. In particular, the system achieved a context recall of 0.914 and a context precision of 0.838, confirming its ability to retrieve complete and semantically relevant supporting evidence through deterministic graph traversal and structured time-series querying. These results demonstrate the effectiveness of executable query directives against an attributed graph and a time-series database, in contrast to conventional vector-based RAG architectures that rely exclusively on semantic similarity.
This design represents a fundamental shift from generic GraphRAG implementations. Conventional GraphRAG architectures treat the KG primarily as a supplementary textual source, retrieving graph nodes as additional context to augment semantic similarity-based generation. In contrast, the proposed system elevates the attributed graph to the role of a semantic orchestrator, leveraging its ontological structure and encoded regulatory logic to parameterize structured queries against the time-series database. This architectural distinction ensures that the generated answers are not merely semantically proximate to the user’s query, but are functionally grounded in the convergence of sensor measurements and authoritative domain guidelines.
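This orchestration pattern — graph traversal recovering the regulatory parameters that then drive the time-series query — can be sketched as below. The Cypher string follows the Appendix A schema, while the request shape and function names are illustrative assumptions.

```python
# Sketch of the attributed graph as semantic orchestrator: regulatory
# context recovered from the KG parameterizes the TSDB request
# (hypothetical request shape; Cypher follows the Appendix A schema).

THRESHOLD_QUERY = """
MATCH (p:Pollutant {pollutantName: "CO2"})-[:hasThreshold]->(pt:PollutantThreshold)
      -[:appliesToDuration]->(ed:ExposureDuration)
WHERE pt.thresholdCategory = "High"
RETURN pt.limitValue AS limit, ed.exposureTimeWindow AS hours,
       ed.exposureAggregationType AS agg
"""

def build_tsdb_request(graph_row: dict, device_id: str) -> dict:
    """Turn the regulatory parameters returned by the graph query into a
    compliance check against the time-series database."""
    return {
        "deviceID": device_id,
        "pollutant": "CO2",
        "window_hours": graph_row["hours"],   # exposure window from the KG
        "agg": graph_row["agg"].lower(),      # aggregation mandated by the guideline
        "exceeds": graph_row["limit"],        # regulatory concentration boundary
    }

# A row shaped like the graph query's result (toy values).
row = {"limit": 1000.0, "hours": 8, "agg": "Average"}
request = build_tsdb_request(row, "SDEV01")
```

The resulting answer is thus grounded in the convergence of measured values and the encoded guideline, rather than in semantic proximity alone.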
The near-perfect context recall observed in temporal analysis queries (0.98–1.00) highlights the deterministic advantage of structured retrieval for time-series analytics. By accurately mapping user intent to API parameters such as temporal windows and aggregation functions, the system eliminates the ambiguity commonly associated with embedding-based semantic search. This finding suggests that symbolic query execution offers greater reliability than embedding-based retrieval for analytical and compliance-oriented IAQ queries, where correctness and traceability are critical, particularly given the direct health implications of non-compliance with regulatory thresholds.
Beyond temporal queries, the consistently high recall scores (0.86–0.94) observed in multi-hop reasoning and hybrid retrieval scenarios demonstrate the system’s capacity to traverse complex semantic dependencies between pollutants, exposure duration, regulatory thresholds, and health effects. This capability stems from the explicit encoding of semantic and ontological relationships in the attributed graph, enabling transparent reasoning chains grounded in authoritative environmental and health guidelines.
The retrieval challenges observed for explainable prediction queries highlight a broader tension in KG-based systems between schema flexibility and query expressiveness. The decision to encapsulate complex explanation artifacts as JSON blobs prioritized generalizability across heterogeneous explanation methods, but at the cost of rendering these structures opaque to graph traversal and semantic search. This trade-off is not unique to the proposed system; it reflects a more general limitation in representing semi-structured, method-specific outputs within fixed graph schemas.
The evaluation of the generation stage further reflects the strong performance of the proposed architecture. The system achieved a faithfulness score of 0.906 and an answer relevancy score of 0.891, indicating strong factual grounding and semantic alignment with user intent. The stability of these metrics across question categories suggests that the system’s generation capabilities are consistent and not contingent on specific intent types. This is consistent with the architectural design of GraphRAG, which modifies only the retrieval and reasoning layers while preserving a conventional generation stage.
The proposed GraphRAG QA system’s findings align with and extend recent advances in domain-specific GraphRAG architectures. Yu et al. demonstrated that schema-driven path identification in IoT environments prevents hallucinations when data is unavailable, a benefit similarly observed in the proposed system, where structured intent classification maps user queries to specific KG and TSDB retrieval paths rather than relying on semantic similarity alone [35]. Jiang et al. showed that separating conceptual knowledge from instance-level data in a dual-layer KG improves semantic consistency in domain-specific applications. The proposed architecture follows a conceptually analogous design by decoupling static domain knowledge from sensor measurements, enabling regulatory-grounded assessments that neither layer could support independently [36]. Furthermore, Papageorgiou et al. reported that multi-source orchestration in their agentic framework achieved the highest faithfulness among retrieval strategies, albeit with increased latency [37]. This trade-off is consistent with the proposed system’s dual-retrieval paradigm, where the orchestration of graph traversal and time-series queries introduces additional complexity but yields substantial improvements in context recall for multi-hop reasoning. Collectively, these works reinforce the finding that domain-specific GraphRAG systems benefit most from structured retrieval orchestration rather than single-source retrieval strategies. The present work extends this insight by demonstrating that coupled KG-TSDB integration is particularly effective for domains requiring both static semantic reasoning and temporal data analysis.

5. Conclusions

The present study introduces a GraphRAG-based QA system that supports natural language queries for ad hoc temporal analysis, explainable predictive analytics, and personalized health impact assessment based on individual exposure. Extending the conventional RAG architecture, the proposed QA system integrates a hybrid external knowledge base comprising a domain-specific attributed graph and a time-series database. This architectural approach enables deterministic reasoning through structured semantic queries over IAQ domain knowledge and sensor measurements. Unlike conventional RAG architectures that abstract external knowledge into flat embeddings, this design preserves the semantic and ontological structure of domain knowledge. Specifically, the architecture is distinguished by its use of the attributed graph as a semantic orchestrator that provides the necessary ontological constraints and regulatory logic to drive structured data retrieval from the time-series database.
The system was evaluated with the RAGAS evaluation framework, using a comprehensive set of 155 competency questions spanning its entire intent space. The experimental results demonstrated strong retrieval and generation performance. The system achieved a context recall of 0.914 and a context precision of 0.838, confirming its ability to retrieve complete and semantically relevant supporting evidence through deterministic graph traversal and structured time-series querying. In addition, the generation component achieved a faithfulness score of 0.906 and an answer relevancy score of 0.891, indicating that the produced answers are both factually grounded and closely aligned with user intent.
These evaluation results validate the effectiveness of the GraphRAG paradigm for IAQ reasoning and highlight its advantages over conventional vector-based RAG architectures in scenarios that require multi-hop reasoning, hybrid knowledge–measurement retrieval, regulatory compliance verification, and explainable forecasting. By preserving the semantic and ontological structure of IAQ knowledge, the proposed system enables transparent reasoning chains and traceable assessments grounded in authoritative environmental and health guidelines. Furthermore, the proposed QA system moves beyond passive dashboards and rigid visualization pipelines in IAQ applications, enabling interactive exploration of insights through natural language, and contributing to environmental literacy, citizen empowerment, and evidence-based decision making for healthier indoor environments.
Scaling the proposed GraphRAG-based QA system for real-world deployment introduces operational considerations that lie beyond the scope of this work. A primary concern is the system’s latency, as multi-hop queries could increase computational overhead; asynchronous processing or model quantization approaches should be explored to ensure real-time responsiveness. In addition, transitioning from a static corpus to dynamic data updates is essential for maintaining an up-to-date KG without frequent re-indexing. Finally, improving user experience requires robust error handling, such as incorporating confidence-score thresholds to manage out-of-scope (OOS) queries and provide reliable fallbacks during edge-case interactions.
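A confidence-threshold fallback of the kind suggested above could be sketched as follows; the threshold value, function name, and fallback wording are illustrative assumptions rather than part of the deployed system.

```python
# Sketch of confidence-gated intent routing with an out-of-scope
# fallback (hypothetical threshold and message).

def route_with_fallback(intent_label: str, confidence: float,
                        threshold: float = 0.6):
    """Execute retrieval only when the intent classifier is confident;
    otherwise return a safe fallback instead of running a query."""
    if confidence < threshold:
        return ("fallback",
                "I could not confidently interpret this question. "
                "Could you rephrase it in terms of indoor air quality?")
    return ("route", intent_label)

confident = route_with_fallback("temporal_analysis", 0.91)   # routed normally
uncertain = route_with_fallback("unknown_intent", 0.32)      # fallback path
```

Gating on classifier confidence keeps low-certainty questions from producing ungrounded queries, trading a small amount of coverage for more reliable edge-case behavior.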
Multiple research avenues warrant future investigation to advance and further elaborate this promising QA system for the IAQ domain. First, the attributed graph should be further extended to model additional dimensions of environmental monitoring and health implications. This would improve the completeness of the encoded domain knowledge and enhance the system’s capacity to support real-world use cases. Second, the GraphRAG QA system should be further evaluated using an extended user intent space and a broader set of competency questions curated from diverse groups of IAQ stakeholders. This would enhance user acceptance and serve as a human-feedback loop for performance improvements. Third, decomposition strategies for selectively materializing high-frequency query targets as first-class graph entities, while retaining JSON encapsulation for rarely accessed components, should be explored. Fourth, the proposed QA system should be integrated into IAQ applications, such as digital twins, to explore its synergistic potential with other cutting-edge technologies.

Author Contributions

Conceptualization, C.M. and J.G.; Methodology, C.M.; Software, C.M. and G.P.; Validation, C.M., G.P. and J.G.; Investigation, C.M. and G.P.; Data curation, C.M. and G.P.; Supervision, J.G.; Project administration, J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are not publicly available due to privacy restrictions. The source code is available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1 provides a detailed specification of the entity properties and attribute constraints defined within the attributed graph schema.
Table A1. Specification of entity properties and attribute descriptions within the attributed graph.
Entity | Property | Data Type | Description and Constraints
Building | buildingID | String | Unique identifier of the building entity (e.g., “BLD01”)
 | buildingType | String | Functional building type constrained to a predefined set (e.g., “Residential”, “Office”)
 | buildingLocation | List<Float> | Geographical coordinates (latitude, longitude) of the building
Apartment | apartmentID | String | Unique identifier of the apartment entity (e.g., “APRT01”)
 | apartmentFloor | Integer | Vertical level of the apartment within the building
 | apartmentArea | Float | Total surface area of the apartment in square meters
Room | roomID | String | Unique identifier of the room entity (e.g., “RM01”)
 | roomType | String | Functional room type constrained to a predefined set (e.g., “Bedroom”)
 | roomArea | Float | Total surface area of the room in square meters
User | userID | String | Unique identifier of the individual occupant (e.g., “USER01”)
 | userFullName | String | Full name of the user for personalized responses (e.g., “John Doe”)
 | userRespiratoryProfile | JSON | Health metadata specifying user respiratory conditions and sensitivity level
StaticDevice | staticDeviceID | String | Unique identifier of the static sensing device (e.g., “SDEV01”)
 | staticDeviceSpecs | JSON | Metadata defining static device specifications (e.g., MAC address, connectivity protocol)
 | manufacturerSpecs | JSON | Metadata defining manufacturer specifications (e.g., manufacturer name, model version)
PortableDevice | portableDeviceID | String | Unique identifier of the portable sensing device (e.g., “PDEV01”)
 | portableDeviceSpecs | JSON | Metadata defining portable device specifications (e.g., MAC address, connectivity protocol)
 | manufacturerSpecs | JSON | Metadata defining manufacturer specifications (e.g., manufacturer name, model version)
Pollutant | pollutantID | String | Unique identifier of the pollutant (e.g., “POL01”)
 | pollutantName | String | Standardized nomenclature of the pollutant (e.g., “CO2”)
 | pollutantAlias | List<String> | Alternative nomenclature of the pollutant (e.g., “Carbon dioxide”)
 | unitsOfMeasure | String | Standardized units of measure for the pollutant (e.g., “ppm”)
ClimaticCondition | climaticConditionID | String | Unique identifier of the climatic condition (e.g., “CCON01”)
 | climaticConditionName | String | Standardized nomenclature of the climatic condition (e.g., “Air Temperature”)
 | climaticConditionAlias | List<String> | Alternative nomenclature of the climatic condition (e.g., “Temperature”)
 | unitsOfMeasure | String | Standardized units of measure of the climatic condition (e.g., “°C”)
PollutantThreshold | thresholdCategory | String | Qualitative associated risk level (e.g., “High”, “Moderate”)
 | limitValue | Float | Concentration boundary for the risk level category (e.g., “1000”)
 | limitUnit | String | Standardized measurement unit for the threshold (e.g., “ppm”)
ExposureDuration | exposureTimeWindow | Integer | Temporal window for pollutant exposure measured in hours
 | exposureAggregationType | String | Statistical aggregation method for exposure (e.g., “Average”, “Maximum”)
HealthEffect | healthEffectID | String | Unique identifier of the health outcome attributed to pollutant exposure (e.g., “HL01”)
 | healthEffectDescription | String | Description of the health outcome attributed to pollutant exposure
 | severityLevel | String | Classification of the health impact (e.g., “Acute”, “Chronic”)
Authority | authorityName | String | Health or environmental authority name that issued the guideline (e.g., “WHO”, “EPA”)
 | documentTitle | String | Title of the environmental or health guideline issued by the authority
 | documentPubYear | Integer | Year when the guideline was officially issued (e.g., 2021)
 | documentPubYear | Integer | Year when the guideline was last ratified (e.g., 2021)
PredictiveModel | modelID | String | Unique identifier of the instance of the forecasting model (e.g., “PRD01”)
 | modelType | String | High-level architecture of the forecasting model (e.g., “LSTM”, “ARIMA”)
 | modelTemporalResolution | Integer | Duration of the forecasting model’s single temporal step measured in minutes (e.g., 5)
 | trainingDate | Datetime | Timestamp of the forecasting model’s last training cycle
PredictionHorizon | horizonValue | Integer | Numerical value of the method’s forecast lead time (e.g., “60”)
 | horizonUnit | String | Temporal unit for the lead time (e.g., minutes)
PredictionUncertainty | confidenceInterval | List<Float> | Probabilistic bounds of the prediction confidence intervals (e.g., [0.85, 0.95])
 | errorMetricName | String | Type of error metric used (e.g., “RMSE”, “MAE”)
 | errorMetricValue | Float | Numerical value for the specified error metric used
PredictiveFeature | featureName | String | Name of the predictor variable (e.g., “lagged_CO2”)
 | featureLag | Integer | Number of discrete temporal steps relative to prediction time t (e.g., 1, 2)
ExplanationMethod | methodID | String | Unique identifier of the diagnostic framework (e.g., “EXMETH01”)
 | methodType | String | Classification of the explanation approach (e.g., “Global”, “Local”)
 | algorithmName | String | Specific technique used to generate the explanation (e.g., “SHAP”, “LIME”)
ExplanationArtifact | artifactID | String | Unique identifier of the specific explanatory artifact (e.g., “EXAR01”)
 | featureWeights | JSON | Mapping of input features to their specific contribution scores
 | baselineValue | Float | Reference value from which the explanation starts (e.g., SHAP base value)
 | artifactFormat | String | Data representation type (e.g., “Vector”, “Contribution Map”)
Table A2 provides a description of the core semantic relationships in the KG schema.
Table A2. Description of the core semantic relationships in the KG schema.
Relationship | Source → Target | Attribute | Type | Description
hasApartment | Building → Apartment | isPrivateApartment | Boolean | Flag indicating whether a spatial unit is private
hasRoom | Apartment → Room | isPrimaryRoom | Boolean | Flag indicating the room’s prioritization
occupies | User → Apartment | isPermanentOccupant | Boolean | Flag indicating permanent occupation
 | | occupiesSince | Datetime | Timestamp when the occupancy period started
hasInstalledDevice | Room → StaticDevice | installationHeight | Float | Vertical distance from the floor in meters
 | | installationDate | Datetime | Timestamp when the sensing device was installed
 | | installationContext | String | Description of the device’s spatial environment (e.g., “Near HVAC”)
hasPortableDevice | Room → PortableDevice | deploymentDate | Datetime | Timestamp when the sensing device was deployed
 | | deploymentContext | String | Description of the device’s deployment environment (e.g., “at the bedside table”)
measuresPollutant | Device → Pollutant | isPrimarySource | Boolean | Flags whether the sensor is the main reference for a specific pollutant in a room
 | | samplingFrequency | Integer | Measurement interval in seconds (e.g., 60)
 | | accuracy | Float | Measurement precision specific to the pollutant
measuresClimaticCondition | Device → ClimaticCondition | isPrimarySource | Boolean | Flags whether the sensor is the main reference for a specific climatic condition
 | | samplingFrequency | Integer | Measurement interval in seconds (e.g., 60)
 | | accuracy | Float | Measurement precision specific to the climatic condition
hasThreshold | Pollutant → PollutantThreshold | isRecommended | Boolean | Flag specifying whether the threshold is an advisory guideline or a legally binding standard
hasHealthEffect | PollutantThreshold → HealthEffect | causalityType | String | Defines the link nature (e.g., “Symptomatic”)
usesFeature | PredictiveModel → PredictiveFeature | influenceScore | Float | Contribution score of the specific predictor to the model output
 | | significanceMetric | String | Metric type used for the score (e.g., “p-value”, “Weight”, “Importance”)
 | | featureLag | Integer | Number of discrete temporal steps relative to prediction time t (e.g., 1, 2)
hasUncertainty | PredictiveModel → PredictionUncertainty | confidenceLevel | Float | Confidence level for a prediction (e.g., 0.95)
Table A3 provides representative Cypher queries covering the main question categories.
Table A3. Representative Cypher queries covering the main question categories.
Spatial Navigation:
MATCH (b:Building {buildingID: "BLD01"})-[:hasApartment]->(a:Apartment)
      -[:hasRoom]->(r:Room {roomType: "Bedroom"})
      -[:hasInstalledDevice]->(d:StaticDevice)
RETURN d.staticDeviceID, r.roomID

Pollutant Threshold and Health Effects:
MATCH (p:Pollutant)-[:hasThreshold]->(pt:PollutantThreshold)
      -[:appliesToDuration]->(ed:ExposureDuration)
      -[:hasHealthEffect]->(he:HealthEffect)
WHERE pt.thresholdCategory = "High"
RETURN pt.limitValue, pt.limitUnit, he.healthEffectDescription, he.severityLevel

Predictive Model Feature Analysis:
MATCH (pm:PredictiveModel {modelType: "LSTM"})-[:usesFeature]->(pf:PredictiveFeature)
RETURN pf.featureName, pf.featureLag
ORDER BY pf.featureLag ASC

User Exposure Context:
MATCH (u:User {userID: "USER01"})-[:occupies]->(a:Apartment)
      -[:hasRoom]->(r:Room)-[:hasInstalledDevice]->(d:StaticDevice)
      -[:measuresPollutant]->(p:Pollutant)-[:hasThreshold]->(pt:PollutantThreshold)
      -[:appliesToDuration]->(ed:ExposureDuration)
      -[:hasHealthEffect]->(he:HealthEffect)
WHERE he.severityLevel = "Chronic"
RETURN u.userFullName, r.roomID, p.pollutantName,
       pt.limitValue, pt.limitUnit, ed.exposureTimeWindow,
       he.healthEffectDescription, he.severityLevel

References

  1. Health Effects Institute. State of Global Air 2025: A Report on Air Pollution and Its Role in the World’s Leading Causes of Death; Health Effects Institute: Boston, MA, USA, 2025. [Google Scholar]
  2. World Health Organization. Household Air Pollution. Available online: https://www.who.int/news-room/fact-sheets/detail/household-air-pollution-and-health (accessed on 20 December 2025).
  3. United Nations Office for Disaster Risk Reduction. Household Air Pollution. Available online: https://www.undrr.org/understanding-disaster-risk/terminology/hips/en0101 (accessed on 20 December 2025).
  4. Morawska, L.; Thai, P.K.; Liu, X.; Asumadu-Sakyi, A.; Ayoko, G.; Bartonova, A.; Bedini, A.; Chai, F.; Christensen, B.; Dunbabin, M.; et al. Applications of low-cost sensing technologies for air quality monitoring and exposure assessment: How far have they gone? Environ. Int. 2018, 116, 286–299. [Google Scholar] [CrossRef]
  5. Liu, X.; Jayaratne, R.; Thai, P.; Kuhn, T.; Zing, I.; Christensen, B.; Lamont, R.; Dunbabin, M.; Zhu, S.; Gao, J.; et al. Low-cost sensors as an alternative for long-term air quality monitoring. Environ. Res. 2020, 185, 109438. [Google Scholar] [CrossRef]
  6. Hernández-Gordillo, A.; Ruiz-Correa, S.; Robledo-Valero, V.; Hernández-Rosales, C.; Arriaga, S. Recent advancements in low-cost portable sensors for urban and indoor air quality monitoring. Air Qual. Atmos. Health 2021, 14, 1931–1951. [Google Scholar] [CrossRef]
  7. Garcia, A.; Saez, Y.; Harris, I.; Huang, X.; Collado, E. Advancements in air quality monitoring: A systematic review of IoT-based air quality monitoring and AI technologies. Artif. Intell. Rev. 2025, 58, 275. [Google Scholar] [CrossRef]
  8. Saini, J.; Dutta, M.; Marques, G. Machine learning for indoor air quality assessment: A systematic review and analysis. Environ. Model. Assess. 2025, 30, 417–434. [Google Scholar] [CrossRef]
  9. Méndez, M.; Merayo, M.G.; Núñez, M. Machine learning algorithms to forecast air quality: A survey. Artif. Intell. Rev. 2023, 56, 10031–10066. [Google Scholar] [CrossRef]
  10. Ogundiran, J.; Asadi, E.; Gameiro da Silva, M. A systematic review on the use of AI for energy efficiency and indoor environmental quality in buildings. Sustainability 2024, 16, 3627. [Google Scholar] [CrossRef]
  11. Latoń, D.; Grela, J.; Ożadowicz, A.; Wiśniewski, L. Artificial intelligence and machine learning approaches for indoor air quality prediction: A comprehensive review of methods and applications. Energies 2025, 18, 5194. [Google Scholar] [CrossRef]
  12. Amangeldy, B.; Tasmurzayev, N.; Imankulov, T.; Baigarayeva, Z.; Izmailov, N.; Riza, T.; Abdukarimov, A.; Mukazhan, M.; Zhumagulov, B. AI-powered building ecosystems: A narrative mapping review on the integration of digital twins and LLMs for proactive comfort, indoor environmental quality, and energy management. Sensors 2025, 25, 5265. [Google Scholar] [CrossRef]
  13. Dai, T.; Wang, F.; Chen, Q. Application of large language models in the design of indoor air distribution systems for office environments. Build. Environ. 2025, 285, 113647. [Google Scholar] [CrossRef]
  14. Chen, A.; Du, J.; Rodriguez, A.; Rodriguez, R.; Higgins, J.; Podmore, R.; Liu, R.; Ilao, E.; Degilla, S.; Bibiano, J.; et al. Viability of applying large language models to indoor climate sensor and health data for scientific discovery. In Proceedings of the IEEE Global Humanitarian Technology Conference (GHTC), Radnor, PA, USA, 23–26 October 2024. [Google Scholar] [CrossRef]
Figure 1. Conceptual design of the attributed graph modeling IAQ entities and their relationships.
Table 1. Climatic and IAQ parameters included in the operational scope of the QA system.

| Parameter | Type | Sensing Layer |
|---|---|---|
| Air temperature | Climatic | Static sensors, Portable sensors |
| Relative humidity | Climatic | Static sensors, Portable sensors |
| CO2 | IAQ | Static sensors, Portable sensors |
| PM1, PM2.5, PM4, PM10 | IAQ | Static sensors, Portable sensors |
| VOC index | IAQ | Static sensors, Portable sensors |
| Formaldehyde | IAQ | Static sensors |
Table 2. Examples of competency questions, their question type, and their specific data sources.

| Category | Example of Competency Question | Data Sources * |
|---|---|---|
| Factual | What is the 8 h exposure threshold for indoor CO2 based on the WHO? | KG |
| | Which health effects are associated with long-term PM2.5 exposure? | KG |
| | Which pollutants are measured by my portable device? | KG |
| Relational | Did the CO2 in my bedroom exceed the short-term exposure limits yesterday? | KG + TSDB |
| | Provide me a prediction for the PM2.5 in my living room for the next 1 h. | KG + TSDB |
| | Based on my exposure to PM2.5 this morning, what is the respiratory risk? | KG + TSDB |
| Analytical | What was the peak PM2.5 concentration in my bedroom during the last day? | TSDB |
| | How many hours this week did the CO2 in the kitchen stay below 1000 ppm? | TSDB |
| | What was my average exposure to PM2.5 over the past 30 days? | TSDB |
| Summarizing | Provide me a weekly compliance summary for PM2.5 across all the rooms. | KG + TSDB |
| | Summarize the overall health risk profile for my apartment during the last week. | KG + TSDB |
| | Report the differences in CO2 levels across all the rooms yesterday. | KG + TSDB |

* KG denotes knowledge graph; TSDB denotes time-series database.
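The category-to-source mapping in Table 2 (factual questions resolved from the knowledge graph alone, analytical questions from the time-series database, relational and summarizing questions from both) can be sketched as a routing step. The following is an illustrative sketch only, not the paper's implementation: the keyword heuristic is a hypothetical stand-in for the system's actual question classification.

```python
# Illustrative routing of competency questions to data sources, following
# the category-to-source mapping in Table 2. The keyword heuristic below
# is a hypothetical stand-in for a real classifier.
CATEGORY_SOURCES = {
    "factual": {"KG"},
    "relational": {"KG", "TSDB"},
    "analytical": {"TSDB"},
    "summarizing": {"KG", "TSDB"},
}

def classify(question: str) -> str:
    """Crude keyword-based category guess (for illustration only)."""
    q = question.lower()
    if any(w in q for w in ("summary", "summarize", "report")):
        return "summarizing"
    if any(w in q for w in ("average", "peak", "how many", "max", "min")):
        return "analytical"
    if any(w in q for w in ("my bedroom", "my living room", "prediction", "risk")):
        return "relational"
    return "factual"

def route(question: str) -> set[str]:
    """Return the set of data sources needed to answer the question."""
    return CATEGORY_SOURCES[classify(question)]
```

For instance, under this sketch an analytical question such as "What was my average exposure to PM2.5 over the past 30 days?" routes to the time-series database alone, while a summarizing question routes to both the knowledge graph and the time-series database.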
Table 3. Competency question categories and evaluation variables for the GraphRAG QA system.

| Category | Description | Variables | Questions |
|---|---|---|---|
| Temporal Analysis | Historical measurements of air pollutants and climatic conditions across the occupant's space. | Target pollutant or climatic condition (e.g., CO2, PM2.5, air temperature); temporal window (e.g., timestamp, hour, days, week, month); temporal aggregation (e.g., average, standard deviation, count, max, min); temporal operators (e.g., before, now, this morning) | 35 |
| Predictive Analytics | Short-term forecasting of IAQ conditions across the occupant's space. | Target pollutant (e.g., CO2 levels, PM2.5 levels); temporal horizon (e.g., minutes, hour) | 20 |
| Compliance Monitoring | Comparison of historical IAQ measurements against established exposure thresholds and guidelines. | Target pollutant (e.g., CO2 levels, PM2.5 levels); target guideline (e.g., WHO thresholds); exposure duration (e.g., hour, days, week, month) | 20 |
| Health Impact | Comparison of historical IAQ measurements against established guidelines for health risk assessment. | Target pollutant (e.g., CO2 levels, PM2.5 levels); target guideline (e.g., WHO health guidelines); exposure duration (e.g., hour, days, week, month) | 20 |
| Explainable Predictions | Interpretation of predictions based on individual feature contributions, confidence, and explainability artifacts. | Target model (e.g., ARIMA, Random Forest, Gradient Boosting, LSTM); predictive factors (e.g., rolling CO2 mean, lagged PM2.5 concentrations); prediction uncertainty and model error metrics; explainability artifacts (e.g., feature importance, correlation coefficients) | 25 |
| Domain Knowledge | General knowledge of fundamental IAQ concepts. | IAQ and environmental guidelines (e.g., EPA guidelines); IAQ and health guidelines (e.g., WHO guidelines); IAQ exposure and health effects (e.g., long-term PM2.5 exposure) | 20 |
| Sensing Devices | Sensor infrastructure specifications. | Device type (e.g., portable device, static device); device spatial allocation (e.g., apartment-level, room-level); sensing capabilities (e.g., pollutants, climatic conditions) | 15 |
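The Compliance Monitoring variables above combine a target pollutant, a guideline threshold, and an exposure window, which at the time-series level reduce to simple aggregations over the measurement history. As a minimal stdlib-only sketch (the hourly readings and the 1000 ppm CO2 threshold from Table 2 are illustrative values, not the paper's data):

```python
from datetime import datetime

# Hypothetical hourly CO2 readings (timestamp, ppm) for one room.
readings = [
    (datetime(2025, 9, 1, h), ppm)
    for h, ppm in [(8, 650), (9, 980), (10, 1150), (11, 1240), (12, 920)]
]

def hours_below(readings, threshold_ppm):
    """Count hourly readings below an exposure threshold (compliance check)."""
    return sum(1 for _, ppm in readings if ppm < threshold_ppm)

def mean_exposure(readings):
    """Average concentration over the window (temporal aggregation)."""
    return sum(ppm for _, ppm in readings) / len(readings)

print(hours_below(readings, 1000))   # hours compliant with the 1000 ppm guideline
print(round(mean_exposure(readings), 1))
```

In the actual system such aggregations would be pushed down to the time-series database query rather than computed in application code; the sketch only makes the shape of the computation explicit.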
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Mountzouris, C.; Protopsaltis, G.; Gialelis, J. A GraphRAG-Based Question-Answering System for Explainable and Advanced Reasoning over Air Quality Insights. Air 2026, 4, 6. https://doi.org/10.3390/air4010006
