Artificial Intelligence and Machine Learning Approaches for Indoor Air Quality Prediction: A Comprehensive Review of Methods and Applications

Latoń, Dominik; Grela, Jakub; Ożadowicz, Andrzej; Wisniewski, Lukasz

doi:10.3390/en18195194

¹ Department of Power Electronics and Energy Control Systems, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, AGH University of Krakow, 30-059 Krakow, Poland

² Institute Industrial IT (inIT), Technische Hochschule OWL, 32657 Lemgo, Germany

* Authors to whom correspondence should be addressed.

Energies 2025, 18(19), 5194; https://doi.org/10.3390/en18195194

Submission received: 1 September 2025 / Revised: 24 September 2025 / Accepted: 25 September 2025 / Published: 30 September 2025

(This article belongs to the Collection Energy Efficiency and Environmental Issues)

Abstract

Indoor air quality (IAQ) is a critical determinant of health, comfort, and productivity, and is strongly connected to building energy demand due to the role of ventilation and air treatment in HVAC systems. This review examines recent applications of Artificial Intelligence (AI) and Machine Learning (ML) for IAQ prediction across residential, educational, commercial, and public environments. Approaches are categorized by predicted parameters, forecasting horizons, facility types, and model architectures. Particular focus is given to pollutants such as CO₂, PM2.5, PM10, VOCs, and formaldehyde. Deep learning methods, especially LSTM and GRU networks, achieve superior accuracy in short-term forecasting, while hybrid models integrating physical simulations or optimization algorithms enhance robustness and generalizability. Importantly, predictive IAQ frameworks are increasingly applied to support demand-controlled ventilation, adaptive HVAC strategies, and retrofit planning, contributing directly to reduced energy consumption and carbon emissions without compromising indoor environmental quality. Remaining challenges include data heterogeneity, sensor reliability, and the limited interpretability of deep models. This review highlights the need for scalable, explainable, and energy-aware IAQ prediction systems that align health-oriented indoor management with energy efficiency and sustainability goals. Such approaches directly support policy priorities, including the EU Green Deal and the Fit for 55 package, advancing both occupant well-being and low-carbon smart building operation.

Keywords:

IAQ; IAQ prediction; machine learning; deep learning; building automation

1. Introduction

Indoor Air Quality (IAQ) has become a critical concern in the design and operation of modern buildings, as it directly influences human health, well-being, and cognitive performance. With accelerating urbanization and increasingly indoor lifestyles, people spend up to 90% of their time inside buildings [1]. Consequently, the quality of the indoor environment has emerged as a public health priority and a determinant of urban sustainability. Poor IAQ is associated with a multitude of environmental and health stressors. Indoor environments are exposed to several pollutants, including carbon dioxide (CO₂), volatile organic compounds (VOCs), and particulate matter (PM2.5, PM10). In addition, factors such as high humidity and inadequate ventilation have been shown to harm human health, contributing to respiratory problems, a higher incidence of asthma, headaches, fatigue, and reduced cognitive function. The resulting cluster of symptoms is commonly referred to as sick building syndrome (SBS), a condition linked to prolonged exposure to suboptimal indoor environments. From an economic point of view, inadequate IAQ imposes significant social burdens through higher healthcare costs, reduced workforce productivity, and increased absenteeism. Equally important, ventilation and air treatment are among the most energy-intensive operations in buildings, representing a major share of HVAC energy demand. Predictive IAQ management is therefore essential not only for health and comfort but also as a key driver of energy efficiency and environmental sustainability. This dual perspective is consistent with current policy frameworks such as the EU Energy Efficiency Directive and the European Green Deal, which emphasize minimizing energy use while safeguarding indoor environmental quality.
Research conducted in the United States has indicated that improving IAQ could generate economic benefits of 8 to 11 billion dollars annually [2,3].

While IAQ is typically defined as the condition of indoor air with respect to factors affecting the health and comfort of building occupants, its assessment depends on a set of key indicators. These include concentrations of CO₂ (an indicator of ventilation effectiveness and occupancy), VOCs (emitted by construction materials, furniture, and cleaning products), PM2.5, and PM10, as well as relative humidity (RH) and temperature; the latter two influence both thermal comfort and the likelihood of microbial growth. In addition, the Air Quality Index (AQI) has become a widely accepted composite metric of air quality status. The authors of Ref. [4] argue that effective IAQ assessment requires reliable monitoring of these parameters both in real time and over extended periods. The factors influencing IAQ can be broadly categorized as external (for example, outdoor pollutants that infiltrate through mechanical or passive ventilation) and internal (for example, human activity, cooking, smoking, and the use of household chemicals). Emissions from construction and furnishing materials, including paints, adhesives, and composite woods, are a substantial source of VOCs, while occupants themselves emit CO₂ and bioaerosols. The overall impact of these pollutants is modulated by ventilation efficiency, occupancy density, and building usage patterns [5].
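The AQI mentioned above is commonly computed per pollutant by linear interpolation between tabulated concentration breakpoints, with the overall index taken as the largest pollutant sub-index. The sketch below illustrates that calculation only. The breakpoint table is the pre-2024 US EPA 24-hour PM2.5 table, chosen purely as an example (other agencies, and the 2024 EPA revision, use different values), and the function names are hypothetical.

```python
import math

# Illustrative only: pre-2024 US EPA 24-hour PM2.5 breakpoints as
# (C_lo, C_hi) in µg/m³ mapped to (I_lo, I_hi) index values.
# Other agencies, and the EPA's 2024 revision, use different tables.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def sub_index(concentration, breakpoints=PM25_BREAKPOINTS):
    """Piecewise-linear AQI sub-index for one pollutant concentration."""
    # Truncate to one decimal (EPA convention for PM2.5); the tiny offset
    # guards against floating-point values such as 35.39999...
    c = math.floor(concentration * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= c <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    raise ValueError("concentration outside the breakpoint table")


def composite_aqi(sub_indices):
    """The reported index is the worst (largest) pollutant sub-index."""
    return max(sub_indices)
```

For example, a PM2.5 reading of 20.0 µg/m³ falls in the second band and maps to a sub-index of 68; the composite index would then be the maximum over all monitored pollutants.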

To facilitate the analysis and prediction of IAQ, contemporary systems depend on heterogeneous and often intricate datasets. The primary data sources are as follows [6]:

Sensor data: measurements of environmental variables obtained with technologies such as metal oxide semiconductor (MOS) sensors, nondispersive infrared (NDIR) sensors, and optical particle counters. Despite the increasing prevalence of low-cost sensors, these devices frequently suffer from drift, noise, and reduced accuracy;
Weather data: external temperature, humidity, wind speed, and pollutant concentrations, which are needed to contextualize indoor air dynamics and to forecast indoor conditions;
Building system data: control and monitoring data such as heating, ventilation, and air conditioning (HVAC) operational parameters, ventilation rates, and control schedules, frequently collected through Building Management Systems (BMSs);
Occupancy data: derived from passive infrared (PIR) sensors, cameras, WiFi or Bluetooth tracking, or building calendars and schedules. These data are key indicators of internal pollutant generation, particularly of CO₂.

Despite the availability of such extensive data, the literature reports numerous persistent challenges. The performance of predictive models depends on the integrity of the data on which they are built, and this integrity is frequently undermined by the following factors [7]:

Missing data caused by sensor failures or transmission errors;
Sensor noise and long-term drift, particularly in low-cost devices;
Data discontinuities resulting from irregular sampling;
A lack of standardization across data sources, including units, formats, and sampling frequencies;
Synchronization difficulties in multi-source systems, for example, when aligning sensor data with HVAC and occupancy logs.
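Several of the data-quality problems listed above (missing readings, irregular sampling, and misaligned sources) are commonly handled by resampling every stream onto a shared time grid before modeling. The following is a minimal sketch under stated assumptions: timestamps in seconds, linear interpolation across short gaps only, and longer gaps left as explicit missing values rather than invented. The function name `regularize` and the `max_gap` threshold are hypothetical, and sensor drift correction is not addressed.

```python
from bisect import bisect_left


def regularize(samples, step, max_gap):
    """Resample irregular (timestamp, value) readings onto a fixed grid.

    samples: (t_seconds, value) pairs sorted by time; value is None for a
             failed or lost reading.
    step:    grid spacing in seconds (e.g., one shared grid for all sources).
    max_gap: longest gap in seconds bridged by linear interpolation; grid
             points inside longer gaps are returned as None.
    """
    valid = [(t, v) for t, v in samples if v is not None]
    if not valid:
        return []
    times = [t for t, _ in valid]
    grid = []
    t = times[0]
    while t <= times[-1]:
        i = bisect_left(times, t)
        if times[i] == t:
            grid.append((t, valid[i][1]))                  # reading lies on the grid
        else:
            (t0, v0), (t1, v1) = valid[i - 1], valid[i]    # neighbours around t
            if t1 - t0 <= max_gap:
                w = (t - t0) / (t1 - t0)
                grid.append((t, v0 + w * (v1 - v0)))
            else:
                grid.append((t, None))                     # long gap: flag, do not invent
        t += step
    return grid


# Hypothetical CO₂ log (ppm) with a failed reading and a long outage.
co2_log = [(0, 400.0), (60, 410.0), (180, None), (240, 430.0), (600, 500.0)]
aligned = regularize(co2_log, step=60, max_gap=300)
```

Applying the same `step` to HVAC and occupancy logs gives all sources a common index, which addresses the synchronization issue; the explicit `None` values let a downstream model or imputation step decide how to treat long outages.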

Traditional IAQ monitoring systems are effective in providing real-time snapshots; however, they are fundamentally limited in their ability to support anticipatory or adaptive control. Their spatial coverage is limited, they lack predictive capability, and they are often expensive to install and maintain. Moreover, they cannot respond dynamically to fluctuations in building usage or to external environmental changes. In contrast, integrating Artificial Intelligence (AI) and Machine Learning (ML) into IAQ management opens new opportunities for dynamic, data-driven control and forecasting. These techniques enable the following:

The detection of complex nonlinear relationships among IAQ parameters;
Short- and mid-term forecasting of pollutant concentrations;
The fusion of multi-source data (sensors, weather, occupancy);
Decision support in HVAC/BMS systems.

Recent studies have demonstrated the effectiveness of AI and ML models, such as random forests (RFs), support vector machines (SVMs), long short-term memory networks (LSTMs), and convolutional neural networks (CNNs), in predicting IAQ [8]. These models offer promise not only in conventional building environments but also in closed ecological systems (CESs) (e.g., habitats, shelters), where resource constraints necessitate highly efficient air quality control and minimal reliance on infrastructure.

The objective of this review is to present a structured synthesis of recent research on AI- and ML-based IAQ prediction systems. The key contributions of this article are as follows:

A comprehensive taxonomy of IAQ prediction approaches, categorized by predicted variables, building typologies, model types, and data fusion strategies;
The identification of data quality issues and their implications for model performance;
An exploration of application scenarios across a wide range of settings, including office buildings, educational institutions, and isolated or resource-constrained environments;
The identification of open challenges and research gaps, with a particular focus on model generalization, personalization, and integration into building automation frameworks.

This review aims to serve as a resource for researchers, engineers, and practitioners working in smart building technologies, intelligent environmental control, and urban resilience planning. By consolidating existing knowledge and delineating future directions, it seeks to advance the field toward more robust, adaptive, and human-centric IAQ management systems. The originality of this review lies in its cross-cutting perspective. Although several earlier works have summarized IAQ and ML research, to our knowledge, no previous study has simultaneously (i) provided a systematic taxonomy of approaches along predicted parameters, facility types, prediction strategies, and methodological classes; (ii) explicitly linked IAQ prediction to energy efficiency and sustainability frameworks such as the EU Green Deal and Fit for 55; and (iii) highlighted the integration of predictive models with HVAC automation, retrofit planning, and Digital Twin environments. This combined focus on both methodological classification and sustainability-oriented implications distinguishes the present review from previous studies and underscores its contribution at the intersection of occupant health, smart building operation, and climate policy.

Section 2 describes the review methodology, detailing the bibliometric strategies, selection criteria, and scope of literature analyzed. Section 3 provides an overview of the current state of research, describing the dominant modeling paradigms, performance metrics, and observed trends. Section 4 presents a classification of IAQ prediction studies according to the type of predicted parameters, modeling strategies, and application contexts. Section 5 offers a cross-sectional analysis and comparative discussion of selected works, emphasizing similarities, differences, and methodological patterns. Finally, Section 6 summarizes the key findings, outlines existing challenges, and proposes future research directions in the field of AI-based IAQ prediction.

2. Methodology of the Review

This review is based on a systematic survey of the scientific and technical literature published between 2020 and 2025. The aim was to identify and categorize recent trends, methods, and research directions related to the use of AI, particularly ML and deep learning (DL), to predict and control IAQ in smart buildings and environments.

The literature review process consisted of two main phases. The first phase of this study involved a methodical search across the major bibliometric databases, namely Web of Science, Scopus, and Google Scholar. The search queries included combinations of keywords such as indoor air quality, IAQ prediction, ML, DL, and building automation. The initial output was filtered based on three key criteria: relevance, scientific rigor, and the availability of quantitative performance metrics. The results of this bibliometric search are summarized in Table 1, which presents the number of research and review articles retrieved per category and database.

In the second phase, a targeted search was conducted on the platforms of established scientific publishers, including Springer, Elsevier (ScienceDirect), MDPI, IEEE Xplore, ACM Digital Library, Taylor & Francis, and Wiley Online Library. This phase concentrated on identifying high-impact contributions, namely journal articles, reviews, and selected conference papers, that demonstrated real-world implementation, methodological soundness, and model validation. Table 2 presents the results of this publisher-specific search, indicating the thematic scope and publication volumes across platforms and subcategories.

The review was deliberately limited to the period 2020–2025 to capture the most recent methodological advances in AI/ML-based IAQ prediction. Earlier work has already been extensively discussed in prior surveys and reviews, and repeating those findings would dilute the focus on current progress. As other reviews have also emphasized, IAQ prediction is now widely addressed in the broader literature, which further justifies concentrating on the most recent and application-oriented studies. Table 1 summarizes the results of general keyword-based bibliometric searches, while Table 2 complements them with targeted queries across publisher platforms, highlighting the thematic scope and distribution of publication types. The scope of this review is limited to peer-reviewed scientific publications, including journal articles, conference proceedings, and book chapters published by recognized academic and technical publishers. Non-peer-reviewed sources, nonscientific reports, and preprints lacking validation or reproducible data were excluded, so that the analysis rests exclusively on validated and reproducible findings. This decision may introduce some bias, since very recent preprints or technical reports occasionally contain innovative methodologies not yet available in peer-reviewed venues; it was made to maintain robustness and comparability across studies, even at the expense of absolute completeness. This approach strengthens the reliability of the conclusions and ensures that they represent consolidated research progress. The review focuses specifically on AI-based prediction techniques and does not discuss hardware design or IAQ regulatory frameworks in detail.
The analysis of data from bibliometric and publisher databases confirms growing scientific interest in applying AI, particularly ML and DL, to IAQ prediction. Although general IAQ topics generate high publication volumes, the count narrows considerably when filtered for predictive approaches using ML or DL, reflecting the increasing specialization of this field. Research activity on ML-based IAQ prediction has surged, with publication counts rising consistently across platforms. DL-based studies, although fewer overall, are expanding rapidly, particularly in time-series forecasting and real-time applications. These methods are increasingly applied in real-world settings, such as residential, educational, and public buildings, where prediction models support ventilation control, energy optimization, and occupant health monitoring. Despite the substantial body of technical contributions, review papers that integrate IAQ with AI methods remain scarce, particularly those spanning both ML and DL. This indicates a gap in consolidated knowledge that this article aims to address. Furthermore, the presence of IAQ-AI studies in engineering- and computing-focused databases (e.g., IEEE, ACM) indicates growing convergence across disciplines. Overall, the literature shows a clear trend toward intelligent predictive IAQ systems, with ML dominating current research and DL gaining prominence. These observations substantiate the need for a systematic synthesis of approaches, which is provided in the following sections.

3. State of the Art-Related Research

Following the preliminary survey and filtering process, a final subset of 71 publications was selected for detailed analysis. The studies were selected based on several key criteria, namely, relevance to IAQ prediction in real or simulated environments; clarity and repeatability of the applied methodology; availability of input and output parameters; presence of quantitative model performance evaluation; and representativeness of various AI-based modeling paradigms. The reviewed studies cover a wide range of indoor contexts, including residential apartments, classrooms, university laboratories, healthcare facilities, transportation hubs, commercial spaces, and industrial environments.

The analyzed studies span a range of modeling objectives and time horizons, from short-term and real-time estimation to long-term control and optimization. Most of the research concentrates on predicting future IAQ states (e.g., CO₂, PM2.5, VOCs) from past and present sensor data, occupancy patterns, meteorological data, and system behavior. A smaller subset of studies is dedicated to real-time prediction, enabling rapid adaptation of ventilation or purification systems in dynamic environments such as schools or hospitals. Table 3 compiles the most relevant publications, including key parameters, object types, methodologies, and performance indicators.

3.1. Modeling Approaches

A wide variety of computational techniques is applied in the reviewed studies, reflecting the complexity and diversity of IAQ prediction tasks. Table 4 shows the most prominent methods; the reviewed articles used various variants and hybrids of these methods.

To assess and compare the quality of the models, the studies employ a diverse set of performance metrics. These reflect different dimensions of predictive accuracy, robustness, and practical applicability. The following metrics are the most frequently used:

$R^{2}$ (coefficient of determination). Indicates the proportion of the variability in the target variable that is explained by the model. When the value approaches 1.0, it is indicative of an excellent fit (e.g., $R^{2} > 0.95$ for certain LSTM models predicting CO₂). In contrast, values less than 0.5 signify inadequate predictive capacity. In the reviewed studies, $R^{2}$ values range from as low as 0.03 (weak Gaussian Process Regression (GPR) model) to nearly 1.0 (optimized DL models with feature selection);
MAE (Mean Absolute Error) and RMSE (Root Mean Square Error). These measure the absolute prediction error in physical units (e.g., ppm for CO₂ or µg/m³ for PM2.5). RMSE is more sensitive to outliers because errors are squared before averaging. The best-performing models, typically hybrid or deep architectures, attain the lowest MAE/RMSE values (e.g., <5 ppm for CO₂ or <2 µg/m³ for PM2.5);
MAPE (Mean Absolute Percentage Error). Expressing prediction error as a percentage facilitates comparison across different scales. MAPE values below 10% are commonly regarded as indicative of highly accurate models; however, models for Total Volatile Organic Compounds (TVOC) or bioaerosols frequently exceed this range, owing to the elevated measurement uncertainty and sensor limitations associated with these pollutants;
AUC (Area Under the Curve) and F1-score. These two ML performance metrics appear in classification-based IAQ assessments (e.g., forecasting acceptable vs. unacceptable air quality states). High AUC values (>0.9) and F1-scores (>0.85) indicate strong discriminatory capability;
Interval-based metrics, such as 95% prediction interval coverage or confidence interval coverage, are used for probabilistic models (e.g., Bayesian Neural Networks, Quantile Random Forest (QRF)). Models with high interval coverage (i.e., ≥85%) have been shown to provide not only accurate point predictions but also reliable uncertainty estimates.
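The metrics listed above can be written down compactly; the plain-Python sketch below implements $R^{2}$, MAE, RMSE, MAPE, the F1-score for a binary "unacceptable air quality" label, and empirical interval coverage (AUC is omitted for brevity). It is an illustration of the standard definitions, not the evaluation code of any reviewed study, and the example values are hypothetical.

```python
import math


def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def rmse(y, yhat):
    # Squaring before averaging makes RMSE more sensitive to outliers than MAE.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def mape(y, yhat):
    # Percentage error; undefined if any observed value is zero.
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)


def f1(labels, predicted):
    """F1-score for binary labels, where 1 marks unacceptable air quality."""
    tp = sum(1 for a, b in zip(labels, predicted) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(labels, predicted) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(labels, predicted) if a == 1 and b == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def interval_coverage(y, lower, upper):
    """Share of observations that fall inside their predicted interval."""
    return sum(1 for a, lo, hi in zip(y, lower, upper) if lo <= a <= hi) / len(y)


# Hypothetical CO₂ observations and forecasts (ppm).
observed = [400.0, 500.0, 600.0, 700.0]
forecast = [410.0, 490.0, 610.0, 690.0]
```

With these hypothetical values, every forecast is off by 10 ppm, so MAE and RMSE both equal 10 ppm while $R^{2}$ is 0.992; the two error measures diverge only when errors are uneven, which is why the reviewed studies usually report both.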

3.2. Observations and Trends

The reviewed literature reveals several important trends regarding the performance and applicability of different modeling approaches. CO₂ emerges as the most frequently modeled IAQ parameter, largely due to its strong correlation with occupancy and its importance for ventilation control. Predictive models for CO₂ often achieve high $R^{2}$ values (up to 0.999), with MAE as low as 1–5 ppm in short-horizon forecasts. In contrast, PM2.5 prediction is more challenging due to greater variability caused by occupant activities, infiltration, and sensor noise. Nonetheless, DL models, particularly LSTM-based architectures, demonstrate promising performance in capturing the temporal patterns of PM2.5 and PM10. For pollutants such as TVOC, formaldehyde, or bioaerosols, modeling remains difficult; several studies report elevated MAPE or lower $R^{2}$ values, primarily due to sparse training data, complex chemical behavior, and higher sensor uncertainty.

A recurring finding is the superiority of hybrid methods, particularly those that combine deep learning with physical models or employ optimization techniques for hyperparameter tuning. These methods not only achieve high accuracy but also generalize better across building types and environmental conditions. Real-time deployment is an area of increasing importance: several studies demonstrate the feasibility of running prediction models on edge devices, with training times under 30 min and inference several orders of magnitude faster than conventional CFD-based simulations. Overall, the diversity of metrics and modeling strategies reflects the multifaceted nature of the IAQ prediction problem, calling for context-specific solutions that balance accuracy, computational cost, and interpretability.

Figure 1 shows the percentage share of each category under the criteria used to classify the reviewed articles and reviews. As illustrated in Figure 1, the distribution of reviewed studies highlights several dominant trends. CO₂ prediction represents the largest share of research efforts, reflecting its strong link with occupancy, ventilation efficiency, and building energy performance. PM2.5 and PM10 are the next most frequently studied pollutants, while VOCs, formaldehyde, and bioaerosols appear less often due to higher measurement uncertainty and limited dataset availability. From a methodological perspective, ML techniques (e.g., Random Forest, SVM, Gradient Boosting) dominate, offering a balance between accuracy and interpretability. DL models, particularly LSTM and GRU networks, show increasing adoption, especially for time-series forecasting tasks. Hybrid approaches, though less common, often deliver the highest accuracy by combining AI with physical models (e.g., CFD, mass balance) or optimization algorithms.
In terms of prediction strategies, most studies address ahead prediction (short- to mid-term forecasting), enabling proactive ventilation and IAQ management. Real-time prediction remains less common but is gaining traction with the spread of IoT-based sensing and edge computing. Overall, the figure underscores two major research trends: (i) the central role of CO₂ and PM forecasting in IAQ-related studies and (ii) a growing shift toward hybrid and deep learning architectures that better capture nonlinear and temporal dependencies in complex indoor environments. A detailed analysis of the papers from Table 4 is presented and discussed in Section 4.

4. Classification of Studies on the IAQ Prediction

Section 3 provided a comprehensive overview of the methodological landscape in IAQ forecasting, emphasizing the growing importance of AI and ML techniques. Building on that foundation, this section classifies recent research according to the specific IAQ parameters predicted, highlighting key modeling strategies and technological integrations.

4.1. Classification by Predicted IAQ Parameters

Among the various parameters that define IAQ, CO₂ concentration has received the most extensive attention due to its strong correlation with ventilation efficiency, occupant well-being, and energy use.

4.1.1. CO₂ Concentration Prediction

Predicting indoor CO₂ concentration is vital for maintaining healthy air quality, ensuring occupant comfort, and optimizing energy performance in buildings. AI and ML methods have proven particularly effective for modeling complex and dynamic indoor environments, enabling real-time and accurate forecasting.

Recent studies have predominantly employed neural network-based models, particularly Multilayer Perceptron (MLP) and LSTM networks, for dynamic CO₂ prediction [17,25,42,74]. Taheri and Razban [17] compared several ML algorithms—including SVM, AdaBoost, RF, GBM, Logistic Regression, and MLP—highlighting MLP’s superior performance in dynamic prediction scenarios, with energy savings in HVAC systems reaching up to 51.4%. Zhu et al. [25] demonstrated LSTM’s high accuracy in short-term forecasting using real-time Internet of Things (IoT) sensor data. Dudkina et al. [74] further confirmed the advantage of ML over traditional physical models, with LSTM and k-Nearest Neighbors (KNN) showing strong performance in short-term prediction contexts.
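MLP and LSTM forecasters of the kind compared above are usually trained on supervised pairs built by sliding a fixed look-back window over the sensor series. The sketch below shows that framing together with a naive persistence baseline, a common reference point when reporting forecast skill. The `lookback` and `horizon` values, the function names, and the CO₂ series are assumptions for illustration, not parameters reported in [17,25,74]; an MLP or LSTM would replace `persistence_forecast` as the model consuming each window.

```python
def make_windows(series, lookback, horizon):
    """Turn a univariate series into (input window, target) pairs.

    Each input holds `lookback` consecutive readings; the target is the
    reading `horizon` steps after the last reading in the window.
    """
    X, y = [], []
    for end in range(lookback, len(series) - horizon + 1):
        X.append(series[end - lookback:end])
        y.append(series[end + horizon - 1])
    return X, y


def persistence_forecast(window):
    """Naive baseline: assume the last observed value persists."""
    return window[-1]


# Hypothetical CO₂ readings (ppm) at a fixed sampling interval.
co2 = [400, 420, 440, 460, 480, 500]
X, y = make_windows(co2, lookback=3, horizon=1)
baseline_mae = sum(abs(persistence_forecast(x) - t) for x, t in zip(X, y)) / len(y)
```

Windows should be split chronologically (earlier windows for training, later ones for testing) so that the evaluation does not leak future readings into training.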

A broad array of ML algorithms, such as Decision Trees (DTrs), RF, GPR, and Ridge Regression, has also been explored. Kallio et al. [33] emphasized the simplicity and competitive accuracy of DTrs compared with more complex RF models, advocating their practical deployment in real-time applications. Kapoor et al. [58] showed that GPR could achieve outstanding accuracy ($R^{2}$ = 0.9887; RMSE = 4.20 ppm), especially when leveraging comprehensive environmental data collected via IoT.
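GPR, which Kapoor et al. [58] reported as highly accurate, predicts with the posterior mean of a Gaussian process and additionally returns a predictive variance. The sketch below is a self-contained illustration for a single scalar input (for example, time or one environmental feature) with a squared-exponential kernel. The hyperparameters (`length_scale`, `variance`, `noise`) are fixed by hand here, whereas practical implementations usually fit them by maximizing the marginal likelihood; nothing in it reflects the configuration used in [58].

```python
import math


def rbf(a, b, length_scale, variance):
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gpr_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean and latent variance of a GP fitted to centred targets."""
    y_mean = sum(y_train) / len(y_train)
    centred = [v - y_mean for v in y_train]
    K = [[rbf(a, b, length_scale, variance) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = solve_cholesky(L, centred)            # (K + noise*I)^-1 y
    means, variances = [], []
    for x in x_test:
        k_star = [rbf(x, b, length_scale, variance) for b in x_train]
        means.append(y_mean + sum(k * a for k, a in zip(k_star, alpha)))
        v = solve_cholesky(L, k_star)
        variances.append(variance - sum(k * w for k, w in zip(k_star, v)))
    return means, variances
```

Far from the training inputs the kernel terms vanish, so the prediction reverts to the training mean with the full prior variance; this built-in widening of uncertainty is one reason GPR is attractive when IoT data are sparse.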

Hybrid approaches that integrate ML with Computational Fluid Dynamics (CFD), optimization algorithms, or Building Information Modeling (BIM) have also shown great promise. Bian and Shi [10] developed a neural operator transformer that combines ML with CFD to enhance real-time prediction and ventilation system optimization. Similarly, Li et al. [45] fused CFD simulations with a Back-Propagation Neural Network (BPNN) optimized by PSO, enabling fast and precise CO₂ predictions. Hou et al. [14] implemented BIM in conjunction with Extreme Learning Machines (ELMs) and a Grey Wolf Optimizer (GWO) to improve ventilation control and IAQ management.
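Li et al. [45] optimized a BPNN with particle swarm optimization (PSO). The following is a minimal, generic global-best PSO sketch that minimizes any objective over box bounds. In a setting like theirs, the objective might be the validation error of a network as a function of its hyperparameters or initial weights, but that mapping is an assumption here; the inertia and acceleration coefficients are common textbook defaults, not values from the cited study.

```python
import random


def pso(objective, bounds, n_particles=20, n_iter=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimisation (minimisation).

    bounds: list of (low, high) pairs, one per search dimension.
    Returns the best position found and its objective value.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                  # personal bests
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]      # swarm-wide best

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move, then clip the particle back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

In a hyperparameter-tuning use, `objective` would wrap model training and return, for example, validation RMSE for a candidate parameter vector; that wrapper is hypothetical and not shown. Each evaluation then costs one training run, which is why swarm size and iteration count are usually kept small.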

The integration of digital twins (DTs) and IoT sensor networks has further strengthened ML-based prediction frameworks. Arsiwala et al. [77] introduced a comprehensive DT system combining IoT, BIM, and AI-based prediction for real-time monitoring and visualization of CO₂-equivalent emissions, supporting proactive retrofitting strategies for climate-neutral buildings. Economic considerations are increasingly being incorporated into CO₂ prediction tools. Lee et al. [49] proposed a forecasting model for Energy Recovery Ventilation (ERV) systems that integrates predictive modeling with cost–benefit analysis to balance IAQ improvements and operational expenses.

In parallel, researchers have also proposed simplified, adaptive models aimed at broader usability. Dutta and Roy [27] introduced an accessible SARIMAX-based model achieving practical accuracy (RMSE = 26.45 ppm), suitable for everyday IAQ monitoring. Segala et al. [40] developed an adaptive neural network capable of rapid retraining and reliable short-term forecasting, ensuring ease of use in varied occupancy scenarios. Furthermore, Tagliabue et al. [13] proposed an IoT-based system incorporating LSTM-based recurrent neural networks for real-time CO₂ prediction in educational facilities. Their model achieved high predictive accuracy ($R^{2}$ = 0.92) and effectively supported HVAC automation, enhancing comfort, mitigating cognitive performance decline, and improving productivity.

In summary, the literature demonstrates significant advancements in CO₂ prediction through AI and ML, showcasing high accuracy, real-time adaptability, and successful integration with IoT, CFD, DT, and economic assessment tools. Several studies further demonstrate that accurate CO₂ forecasting enables demand-controlled ventilation, reducing HVAC energy consumption by up to 50% while maintaining acceptable air quality levels. These findings provide direct evidence that predictive IAQ systems can contribute substantially to energy efficiency targets and climate mitigation strategies.
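The link between CO₂ forecasts and ventilation energy can be illustrated with the steady-state mass balance of a single well-mixed zone, $C = C_{out} + G/Q$, where $G$ is the CO₂ generation rate and $Q$ the outdoor airflow; solving gives $Q = G/(C_{limit} - C_{out})$, the airflow needed to hold a CO₂ limit. The sketch below is an illustration under explicit assumptions, not a method from the cited studies: the per-person generation rate of 0.005 L/s (a typical order of magnitude for a sedentary adult), the 420 ppm outdoor level, the 1000 ppm limit, and the occupant counts are all hypothetical inputs.

```python
def required_airflow(occupants, c_limit_ppm, c_out_ppm, gen_l_per_s_person=0.005):
    """Outdoor airflow (L/s) that holds steady-state CO₂ at c_limit_ppm.

    Single well-mixed zone at steady state: C = C_out + G / Q, hence
    Q = G / (C_limit - C_out), with concentrations as volume fractions.
    gen_l_per_s_person is an assumed CO₂ generation rate per occupant.
    """
    if c_limit_ppm <= c_out_ppm:
        raise ValueError("limit must exceed the outdoor concentration")
    generation = occupants * gen_l_per_s_person               # L/s of CO₂
    return generation / ((c_limit_ppm - c_out_ppm) * 1e-6)     # L/s of outdoor air


# Sizing for a hypothetical design occupancy vs. a forecast occupancy.
design = required_airflow(30, 1000, 420)      # about 258.6 L/s
forecast = required_airflow(12, 1000, 420)    # about 103.4 L/s
```

Because the required airflow scales linearly with occupancy in this simplified model, ventilating for 12 forecast occupants instead of 30 design occupants cuts outdoor airflow by 60%. The resulting energy saving depends on the HVAC system and climate, and a real predictive controller would also account for the transient build-up of CO₂ rather than steady state alone.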

4.1.2. PM2.5 Concentration Prediction

Numerous studies have investigated the use of ML and DL methods to predict indoor PM2.5 concentrations in a range of environments. Recurrent neural architectures such as LSTM, Gated Recurrent Units (GRUs), and their hybrids have demonstrated robust predictive capabilities in both residential and commercial settings. For instance, Dai et al. [26] employed a recurrent neural network (RNN) augmented with an autoencoder to forecast PM2.5 levels in a residential apartment, achieving a median prediction error of only 8.3 µg/m³ for 30 min intervals. Similarly, Sharifuddin et al. proposed an LSTM-GRU hybrid to enhance learning stability in commercial buildings, while Lagesse et al. demonstrated that LSTM models outperformed classical regression methods in a large office space, even without external PM data inputs [28,30].

Building on these foundations, more sophisticated model architectures have emerged. Lu et al. [39] developed a hybrid model combining temporal convolutional networks and multi-head attention (MHATCN) with LightGBM, which achieved R² = 0.92 in subway environments and excelled at temporal pattern extraction. Wang et al. [60] proposed a “soft sensor” for predicting PM2.5 levels in subway stations that integrated kernel-based dimensionality reduction, ensemble learning, and interpretability through SHAP values. Zhou and Yang [36] extended these ideas to hospital environments by incorporating occupancy data derived from image analysis into their SO-LSTS time series model. Their approach, which combined subjective (AHP) and objective (entropy-based) feature weighting, yielded an 89% improvement over classical baselines, underscoring the importance of human activity patterns in dynamic indoor environments.

Several studies have emphasized practical relevance by improving robustness or by proposing strategies intended for real-world deployment. Guo et al. [55] proposed a DL pipeline (RF-CNN-GRU) specifically designed to process incomplete sensor data from subway systems, using RF to impute missing values, a CNN for feature extraction, and a GRU for sequential forecasting. This approach outperformed both standalone and hybrid baselines. Wu et al. [38] proposed a hybrid model integrating Partial Least Squares (PLS) with a variational autoencoder (VAE), which improved R² by up to 30% by increasing the model’s resilience to noise and optimizing feature representation.

Concurrent research has concentrated on integrating predictive tools with IAQ management. Kim et al. [43] used a mass-balance model to estimate real-time PM2.5 concentrations during cooking in Korean homes, showing that prediction models can inform optimal ventilation and filtration strategies under varying conditions. Bao et al. [42] proposed a fuzzy logic-enhanced CNN-LSTM model for intelligent HVAC systems that improves interpretability without compromising predictive performance. Meanwhile, Ferreira et al. [52] demonstrated the applicability of the Fuzzy ARTMAP neural network in a residential bedroom, using sliding-window time series data to achieve high accuracy with minimal computational overhead. The MAE ranged from 0.26 to 7.65 µg/m³, making the approach well suited to real-time implementation in constrained environments.
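The exact formulation used by Kim et al. [43] is not given here. As an assumed illustration, the sketch below integrates a generic well-mixed, single-zone mass balance, dC/dt = P·a·C_out − (a + k)·C + S(t)/V, where P is a penetration factor, a the air exchange rate, k a deposition rate, S an indoor emission rate, and V the room volume. All parameter values, including the 15 min cooking burst, are illustrative assumptions rather than values from [43], and `simulate_pm25` is a hypothetical name.

```python
from typing import Callable, List, Union

Emission = Union[float, Callable[[float], float]]


def simulate_pm25(c0: float, c_out: float, emission_ug_per_h: Emission,
                  volume_m3: float, aer_per_h: float, deposition_per_h: float,
                  penetration: float = 1.0, dt_h: float = 1 / 60,
                  steps: int = 60) -> List[float]:
    """Integrate dC/dt = P*a*C_out - (a + k)*C + S(t)/V with explicit Euler steps.

    C and C_out in ug/m3, a (air exchange) and k (deposition) in 1/h,
    S in ug/h (constant or a function of time in h), V in m3.
    Returns the concentration at every step, starting with c0.
    """
    c = c0
    trace = [c]
    for i in range(steps):
        t = i * dt_h
        s = emission_ug_per_h(t) if callable(emission_ug_per_h) else emission_ug_per_h
        dcdt = (penetration * aer_per_h * c_out
                - (aer_per_h + deposition_per_h) * c
                + s / volume_m3)
        c += dcdt * dt_h
        trace.append(c)
    return trace


# Illustrative cooking event: a 15 min emission burst, then decay (1 min steps, 135 min total).
cooking = simulate_pm25(c0=10.0, c_out=15.0,
                        emission_ug_per_h=lambda t: 20000.0 if t < 0.25 else 0.0,
                        volume_m3=150.0, aer_per_h=0.5, deposition_per_h=0.4,
                        penetration=0.8, steps=135)
```

Because the fixed point of the update is (P·a·C_out + S/V)/(a + k), the same function can be used to sanity-check steady-state levels for a given ventilation scenario; under a further assumption, an air purifier could be represented as an additional first-order loss term added to k.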

In support of interpretability and operational simplicity, several works employed transparent, computationally lightweight models. Park et al. [72] proposed a time-segmented multiple linear regression (MLR) approach using two years of sensor data from homes, showing that dividing models by hour increased explanatory power by up to 9%. In a similar study, Jung et al. achieved 75% explained variance in Taiwanese homes by incorporating behavioral and structural parameters into their MLR framework [48]. Guo et al. proposed a classification-based alternative using CatBoost, which accurately identified threshold exceedance of PM2.5 (AUC = 0.949), thus offering a low-latency solution for smart building control [55].
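Park et al. [72] are reported to have increased explanatory power by dividing their regression model by hour. A minimal sketch of that idea, assuming one independent ordinary least-squares model per hour of day fitted with NumPy, is shown below; the function names and the choice of predictors are hypothetical and not taken from [72].

```python
from typing import Dict, Sequence

import numpy as np


def fit_hourly_mlr(hours: Sequence[int], X, y) -> Dict[int, np.ndarray]:
    """Fit one ordinary least-squares model per hour of day.

    hours: hour (0-23) of each sample; X: (n_samples, n_features) predictors such as
    outdoor PM2.5, temperature and humidity; y: indoor PM2.5.
    Returns {hour: coefficients}, intercept first.
    """
    hours_arr = np.asarray(hours)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    models = {}
    for h in np.unique(hours_arr):
        mask = hours_arr == h
        design = np.column_stack([np.ones(mask.sum()), X[mask]])  # prepend intercept column
        coef, *_ = np.linalg.lstsq(design, y[mask], rcond=None)
        models[int(h)] = coef
    return models


def predict_hourly_mlr(models: Dict[int, np.ndarray], hours: Sequence[int], X) -> np.ndarray:
    """Predict each sample with the model fitted for its hour of day."""
    X = np.asarray(X, dtype=float)
    return np.array([models[int(h)][0] + row @ models[int(h)][1:]
                     for h, row in zip(hours, X)])
```

Segmenting by hour lets coefficients such as the indoor/outdoor transfer vary with daily routines, at the cost of fewer training samples per model.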

Another significant trend is the use of probabilistic models to address uncertainty and achieve scale. Dai et al. developed large-scale Bayesian Neural Networks (BNNs) trained on nationwide datasets in China, achieving R² = 0.70 and MAE = 9.45 µg/m³. These models have been used to provide robust probabilistic predictions of indoor PM2.5 exposure across 330 cities, holding promise for population-scale risk assessment and policy planning [8,68].

Collectively, these studies illustrate the rapid evolution of AI-driven approaches to IAQ modeling. These models reflect a convergence of temporal modeling, probabilistic inference, data fusion, and model interpretability, while also underscoring the importance of context—residential, commercial, or public transportation—and the need to align models with the limitations and opportunities of real-world sensor systems.

4.1.3. PM10 Concentration Prediction

A significant development in IAQ management systems is the application of AI and ML to predict PM10 concentrations. Particulate matter in this size range is a critical indicator of air quality in enclosed environments, and elevated concentrations have been associated with serious health consequences.

Several studies have explored the potential of data-driven approaches for predicting indoor PM10 levels using IoT-based monitoring systems. Studies [21,41] proposed predictive frameworks deployed in residential and institutional indoor environments, utilizing sensor networks to collect parameters such as PM10, PM2.5, CO₂, VOCs, temperature, and humidity. The system described in [41] employed the XGBoost regressor, achieving high predictive performance (R² = 0.99, RMSE = 0.48), while [21] utilized the RF algorithm with similarly strong results (R² = 0.996, RMSE = 0.594). The findings of both studies demonstrate the feasibility of integrating ML models into real-time IAQ decision-support systems, particularly in the context of ventilation control and energy-efficient building management.

In addition to these approaches, studies [66,67] explored alternative predictive strategies. In [67], a hybrid model was proposed that combines a fuzzy inference system (FIS) with two optimization algorithms: PSO and a GA. The optimized models (FIS-GA and FIS-PSO) substantially outperformed the baseline FIS model, with RMSE values of 0.998 and 1.0746, respectively, underscoring the value of metaheuristic optimization for improving the predictive accuracy of IAQ applications. In contrast, Ref. [66] concentrated on educational facilities and developed statistical models for predicting indoor PM10 concentrations based solely on outdoor environmental parameters, including ambient PM10 and meteorological data. Using multiple linear regression and variable transformations, the models attained R² values ranging from 0.43 to 0.64, depending on the facility type. This represents a cost-effective alternative to direct indoor monitoring when resources are constrained.
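Ref. [67] tunes a fuzzy inference system with PSO and GA, but the FIS structure is not described here. The sketch below therefore shows only a generic global-best PSO minimizing an RMSE objective; a two-parameter linear model stands in, purely as an assumption, for the FIS parameters that would be tuned in practice. The inertia and acceleration coefficients are common default choices rather than values from [67], and all names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(objective: Callable[[List[float]], float],
                 bounds: Sequence[Tuple[float, float]],
                 n_particles: int = 30, iterations: int = 200,
                 inertia: float = 0.7, c_personal: float = 1.5,
                 c_social: float = 1.5, seed: int = 0) -> Tuple[List[float], float]:
    """Global-best particle swarm optimization over a box-bounded search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # each particle's best position so far
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Toy stand-in for model tuning: recover w and b of y = w*x + b by minimizing RMSE.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def rmse_objective(params: List[float]) -> float:
    w, b = params
    return math.sqrt(sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


best_params, best_rmse = pso_minimize(rmse_objective, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

PSO needs only objective evaluations, not gradients, which is why it suits non-differentiable structures such as rule-based fuzzy systems.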

In contrast to residential and public-sector settings, study [22] focused on IAQ prediction in industrial environments, particularly the cleanrooms used in semiconductor manufacturing facilities. The authors developed a multilevel RNN model that incorporated IoT and LoRa sensor data along with seasonal, humidity, and temperature variations. The multilevel RNN outperformed conventional LSTM models, underscoring the efficacy of hierarchical DL architectures for real-time IAQ forecasting in highly controlled, mission-critical environments.

4.1.4. Prediction of Other Parameters (Formaldehyde, VOCs, RH)

Several studies have applied AI to predict indoor pollutant levels, particularly VOCs, from environmental parameters such as temperature, humidity, and the presence of secondary emission sources. Liu et al. [39] developed an LSTM-based model for forecasting formaldehyde concentrations that accounts for adsorption and desorption on various fabrics under dynamic thermal and humidity conditions. The model demonstrated high predictive accuracy, with an MAE below 0.5 and a MAPE below 10%. In another study, Majdi et al. [51] used a radial basis function neural network to predict indoor VOC levels from temperature, humidity, and CO₂ data retrieved from a smart building’s HVAC system. The approach yielded a prediction error of approximately 3%, underscoring its suitability for real-time air quality control in smart environments. In a similar vein, D’Amico et al. [76] investigated VOC emissions from building materials through a combination of box-model and CFD-based simulations. Their findings emphasized the pivotal role of ventilation strategies in maintaining acceptable pollutant concentrations, and the authors proposed design guidelines to support health-oriented building planning.

Alternative approaches make use of indirect indicators of air quality to predict spatial and temporal variations. Tian et al. [31] proposed a novel method for estimating the non-uniform distribution of IAQ under mixed ventilation conditions. This method uses dynamic air temperature and velocity data generated through pulsating air supply. The model, which is based on SVM and validated with CFD simulations and experimental data, enabled an accurate prediction of air age—a proxy for freshness—without requiring direct pollutant measurements. In a related context, Fu et al. [71] assessed the feasibility of using airborne particles, particularly PM1.0 and PM2.5, as reliable tracers to estimate real-time air exchange rates (AER). The findings demonstrated a robust correlation between fine particulate concentrations and AER, thereby providing a pragmatic approach for the management of IAQ and thermal losses in real-world building operations [71].
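The text does not state how [71] converts particle readings into AER. One textbook approach, shown here only as an assumed illustration, is the concentration-decay method: once an indoor source stops, and if outdoor infiltration is negligible, ln C decreases linearly with slope −(a + k), where a is the air exchange rate and k is a particle deposition rate that must be known or assumed (k = 0 for an inert tracer gas). The function names are hypothetical.

```python
import math
from typing import Sequence


def loss_rate_per_h(times_h: Sequence[float], conc: Sequence[float]) -> float:
    """Total first-order loss rate (1/h) from the least-squares slope of ln(C) vs time."""
    if len(times_h) != len(conc) or len(conc) < 2:
        raise ValueError("need at least two paired readings")
    logs = [math.log(c) for c in conc]  # requires strictly positive concentrations
    t_mean = sum(times_h) / len(times_h)
    l_mean = sum(logs) / len(logs)
    num = sum((t - t_mean) * (l - l_mean) for t, l in zip(times_h, logs))
    den = sum((t - t_mean) ** 2 for t in times_h)
    return -num / den


def air_exchange_rate_per_h(times_h: Sequence[float], conc: Sequence[float],
                            deposition_per_h: float) -> float:
    """AER estimate: measured particle loss rate minus an assumed deposition rate."""
    return loss_rate_per_h(times_h, conc) - deposition_per_h
```

The result is only as good as the assumed deposition rate, which depends on particle size and room surfaces; this is one reason a tracer-independent method needs validation against reference measurements.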

System-level solutions for IAQ monitoring and anomaly detection are addressed in the works of Fanca et al. [64] and Espinosa et al. [73]. Fanca et al. [64] developed a client-server platform for continuous monitoring of indoor and outdoor air quality using commercial Netatmo sensors. The system facilitated user interaction through an Android application and enabled proactive interventions based on deviations in temperature, humidity, and CO₂ levels in occupational environments. Espinosa et al. [73], in turn, employed Singular Spectrum Analysis (SSA) to analyze and smooth air quality time series data and to identify anomalous fluctuations. Their dual-stage method, which assesses both signal level and temporal dynamics, was shown to outperform classical noise-reduction techniques and offers significant potential for integration into IoT-based smart building frameworks.

4.1.5. Multiple Parameter Prediction (Different Combinations of Parameters/Factors)

A comprehensive synthesis of recent literature on the prediction of multiple IAQ parameters using AI and ML reveals several recurring methodological paradigms and application contexts across various building types. The findings of these studies consistently highlight the effectiveness of data-driven approaches in modeling and managing indoor pollutant concentrations and comfort parameters, especially in dynamic and resource-constrained environments.

A significant amount of research has been dedicated to DL models, with a particular emphasis on RNN, LSTM, GRU, and hybrid models such as CNN-LSTM or LSTM-DNN. These architectures have been shown to be particularly effective for time-series forecasting, with the capacity to capture complex temporal dependencies and nonlinearities within high-resolution environmental datasets. A substantial body of research conducted in educational and public-use buildings has demonstrated the capacity of such models to predict concentrations of CO₂, PM (PM1, PM2.5, PM10), TVOCs, and other pollutants with a high degree of accuracy, often exceeding the capabilities of traditional statistical methods. Furthermore, these models demonstrated considerable potential for generalization across diverse geographic locations and climate zones when adequately trained, thus rendering them viable instruments for large-scale implementation [11,15,16,19,20,24,34,37,43,44,46,50,59,61,65].

Another emerging research domain concerns the integration of IAQ prediction with real-time HVAC system control, particularly within heritage and commercial buildings where retrofitting sensor networks is often impractical. In these studies, researchers employed GRU and RNN-based models together with CFD simulations and DT to enable proactive regulation of ventilation settings. Such systems optimize HVAC operation based on forecasts of pollutant levels (e.g., CO₂, NO₂, SO₂), keeping IAQ within acceptable limits while improving energy efficiency and reducing intervention latency. By integrating sensor data, physical simulation, and AI prediction, these platforms represent a substantial step towards fully autonomous, context-aware environmental control systems [29,50,53].

A distinct research direction focuses on the design and planning stages of building renovation, particularly within the framework of deep energy retrofits. The PredicTAIL methodology exemplifies this approach, integrating simulation-based tools to predict ten indoor environmental quality (IEQ) parameters, including CO₂, PM2.5, temperature, formaldehyde, and acoustic/light-related metrics, prior to physical interventions. The method relies on a range of specialized tools (e.g., TRNSYS, IDA ICE, and ACOUBAT), enabling designers to predict the effect of alternative retrofit strategies on indoor conditions. Such predictive modeling supports evidence-based decision-making and risk mitigation in renovation scenarios, helping to ensure that energy efficiency goals do not compromise occupant health or comfort [23,56,59,62].

The reviewed literature also demonstrates a strong interest in low-cost and scalable IAQ management solutions through IoT-enabled platforms. These systems, which combine sensor networks with ML algorithms (including RF, GBM, and LSTM), are especially suited for use in resource-limited settings such as public schools or homes without HVAC infrastructure. Several studies have validated the feasibility of accurate IAQ prediction using minimal sensor setups, or even substituting direct measurements with proxy variables such as occupancy schedules, window operation, and meteorological data. These findings emphasize the potential for the deployment of ML-based IAQ forecasting without the necessity for full-scale instrumentation, thereby democratizing access to air quality insights [9,12,32,35,37,63,70,75,78].

A significant extension of the predictive IAQ paradigm is evident in the pre-occupancy stage of building design. For instance, the i-IAQ toolbox integrates pollutant simulation into early design workflows using a multi-zone mass-balance framework. This tool enables architects and engineers to assess post-occupancy levels of CO₂, PM10, PM2.5, NO₂, O₃, and TVOCs based on layout, building materials, ventilation scenarios, and emission source libraries. Validated in a real apartment case study, it demonstrated high potential for informing design decisions that minimize pollutant buildup, thereby contributing to the growing field of healthy building design [18].

In a comparable manner, studies such as [35] advocate highly generalizable and cost-effective models designed specifically for naturally ventilated educational facilities. Using readily available inputs, including occupancy patterns, window and door use, and outdoor weather conditions, the authors trained a class-weighted RF model to classify IAQ and thermal comfort levels without direct environmental measurements. This approach eliminates the need for permanent sensor infrastructure while preserving model robustness, offering substantial advantages for large-scale deployment across public school systems.

Beyond ML, analytical modeling remains relevant. A dynamic mass-balance model, as outlined in [69], provides a mechanistic framework for predicting PM2.5 and PM2.5–10 concentrations in office environments. The model incorporates factors including particle infiltration, occupant resuspension, and air purifier operation, and it provides actionable insights into the optimal duration and timing of air purifier operation. Its validation against ASTM D5157 and k-fold cross-validation benchmarks demonstrated excellent predictive performance, underscoring the continued relevance of physics-based models, especially when calibrated with empirical data.
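The class-weighted RF of [35] compensates for imbalanced IAQ and comfort classes. How [35] set the weights is not stated here; a common heuristic, shown below as an assumption, weights each class by n_samples / (n_classes × class_count), which is the rule scikit-learn applies when `class_weight="balanced"` is passed to `RandomForestClassifier`. The helper name and the example labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical IAQ labels for a classroom dataset in which "poor" periods are rare.
weights = balanced_class_weights(["good"] * 6 + ["moderate"] * 3 + ["poor"])
```

Rare classes such as "poor" air quality thus receive proportionally larger weights, so misclassifying them costs more during training.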

In conclusion, the reviewed studies collectively demonstrate that AI and ML approaches, when tailored to the specific operational, architectural, and environmental context of a building, can deliver accurate and actionable predictions of IAQ parameters. These approaches support both real-time environmental control during operation and strategic planning at the design stage. Furthermore, the convergence of AI, IoT, and simulation technologies can enable scalable, cost-effective, and proactive IAQ management systems, with significant implications for energy efficiency, occupant health, and resilience to external air quality hazards.

4.2. Classification by Type of Facilities

Another significant aspect of classification pertains to the type of facility in which IAQ prediction models are employed, as different environments present distinctive challenges in terms of occupancy patterns, emission sources, and control strategies.

4.2.1. Residential Facilities

A substantial body of research has applied advanced ML and DL techniques, particularly LSTM and RNN, to predict multiple IAQ parameters in residential settings. These models have proven effective at capturing temporal dynamics and dependencies in time-series data, which is critical for accurately forecasting pollutants such as PM2.5, CO₂, formaldehyde, and other environmental indicators. Several studies have used such approaches to predict IAQ metrics in real time or over short-term forecasting horizons (see, for example, [16,25,26,47,48,64,75,78]), achieving high predictive accuracy (e.g., MAPE < 10%, R² > 0.9). Notably, [48] emphasized the significance of incorporating contextual and behavioral factors, including building characteristics, floor level, laundry habits, and differential indoor–outdoor CO₂ levels, which substantially improved the performance of its PM2.5 prediction model. These findings emphasize the importance of tailoring input variables to the specific residential context to enhance model generalization and relevance.

Another research line addresses the integration of predictive models within comprehensive IoT architectures and DT systems for proactive IAQ management. A few studies present intelligent residential systems that combine sensor networks with ML models (e.g., RF, artificial neural networks, Bayesian neural networks) to forecast pollutant levels and to automate environmental control measures such as ventilation and purification [9,18,41,54,75,77]. These systems frequently incorporate real-time interfaces or 3D visualization dashboards, allowing end-users to monitor air quality continuously and take immediate corrective action. Such implementations highlight the practical applicability of AI-based IAQ prediction in improving indoor comfort and health through active system response.

Several studies [8,12,46,50,52,68,72] approach predictive modeling from an explanatory standpoint, aiming to identify the key drivers of indoor pollutant variability and to assess their interactions. These works used regression-based, ensemble, and neural network models to analyze the impact of variables such as mechanical and natural ventilation, occupant activity patterns, building envelope configuration, and external pollution levels. Their findings showed that IAQ in dwellings is highly sensitive to local conditions and that prediction accuracy improves when models incorporate granular, high-frequency data inputs. They also reinforced the importance of accounting for diurnal and seasonal variability, as demonstrated in [72], which employed hour-specific regression models to reflect time-of-day dynamics in indoor PM2.5 concentration.

Moreover, certain contributions have extended the scope of IAQ prediction to encompass health-related and sustainability-oriented outcomes. For instance, Ref. [50] developed DL models linking real-time indoor PM2.5 levels to peak expiratory flow rates (PEFRs) in children with asthma, illustrating the critical value of accurate air quality forecasting in domestic environments. Concurrently, Ref. [77] proposed a DT framework integrating the IoT, BIM, and ML to predict equivalent CO₂ (eCO₂) emissions from residential spaces, thereby aligning IAQ monitoring with comprehensive Net-Zero emission strategies. This approach has the dual benefits of enhancing IEQ and facilitating policy-driven retrofitting and energy efficiency planning.

Collectively, these studies emphasize the growing maturity and multidimensionality of AI-based IAQ prediction in residential buildings, showing that such approaches can progress beyond pollutant estimation toward integrative, responsive, and health-centered indoor environmental management.

4.2.2. Non-Residential Facilities

A substantial number of studies have demonstrated the efficacy of AI and ML techniques in predicting various parameters of IAQ in public buildings, including offices, schools, universities, fitness centers, healthcare facilities, and industrial environments. A wide range of DL architectures has been employed for this purpose, including LSTM, GRU, convolutional-recurrent hybrids (CNN-LSTM), artificial neural networks (ANNs), ELM, and fuzzy logic-based hybrids. The integration of these models with data from IoT infrastructures, BMS, CFD simulations, building information modeling (BIM), as well as environmental and occupancy sensors, has enabled real-time forecasting and adaptive control of HVAC systems.

It is particularly noteworthy that studies [14,20,23,31,59,77] report the use of sophisticated and integrated prediction frameworks, in which the predictive core relies on CNN-LSTM, GRU, or ELM networks that have been optimized with PSO, grey wolf optimization (GWO), or genetic algorithms. These models facilitate the concurrent optimization of air quality, thermal comfort (e.g., PMV), and energy consumption, thereby providing a comprehensive approach to intelligent indoor environmental control. As demonstrated in [29,31], certain implementations also incorporate DT platforms and CFD-based airflow simulations, thereby facilitating predictive environmental regulation in historical or sensor-constrained buildings.

A substantial body of literature focuses on predicting indoor CO₂ concentrations, which are highly correlated with occupancy levels and temporal activity patterns. Numerous studies have demonstrated that algorithms including LSTM, GPR, SARIMAX, and MLP can accurately predict CO₂ dynamics from inputs such as temperature, humidity, window status, and occupancy rates [13,17,27,33,40,49,58,74]. These predictive capabilities enable demand-controlled ventilation strategies in lecture halls, open-plan offices, and meeting rooms, yielding considerable reductions in HVAC energy consumption, by over 50% in some cases [17].
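Forecasts of occupancy or CO₂ translate into ventilation demand through a mass balance. Under a well-mixed, steady-state assumption, the outdoor airflow needed to hold indoor CO₂ at a setpoint is Q = N·G / (C_set − C_out), with N occupants and a per-person generation rate G. The sketch below applies this relation; the default G of 0.0052 L/s per person (a commonly cited value for sedentary adults), the occupancy forecast, and the function name are assumptions, not taken from [17].

```python
from typing import List


def required_outdoor_airflow_lps(occupants: float, setpoint_ppm: float,
                                 outdoor_ppm: float,
                                 gen_lps_per_person: float = 0.0052) -> float:
    """Steady-state outdoor airflow (L/s) that holds a well-mixed zone at the CO2 setpoint.

    Q = N * G / (C_set - C_out), with concentrations converted from ppm to volume fraction.
    The default G (L/s per person, sedentary adult) is an assumed typical value.
    """
    if setpoint_ppm <= outdoor_ppm:
        raise ValueError("setpoint must exceed the outdoor CO2 concentration")
    return occupants * gen_lps_per_person / ((setpoint_ppm - outdoor_ppm) * 1e-6)


# Hypothetical hourly occupancy forecast for a classroom, converted to airflow set-points.
forecast_occupancy: List[int] = [0, 12, 25, 25, 8]
airflow_plan = [required_outdoor_airflow_lps(n, 1000, 420) for n in forecast_occupancy]
```

Supplying this airflow instead of a fixed design rate sized for peak occupancy is the basic mechanism through which DCV saves energy during partially occupied periods. In practice, ventilation standards also require a minimum area-based rate, which this sketch ignores.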

In educational institutions, including schools, kindergartens, and daycare centers, researchers have prioritized the prediction of particulate matter (PM2.5, PM10) and CO₂ because of their direct impact on children’s cognitive function and well-being. Many studies [21,23,24,35,43,61,63,66] have demonstrated that models such as ANN, LSTM, XGBoost, and RF can perform robustly even with a limited number of sensors. Notably, the IndoAirSense framework [24] successfully estimated IAQ in unmonitored rooms based on cross-room sensor data. In [61], models trained in one country generalized well to schools in other countries. Furthermore, the findings in [62,66] indicated that reliable IAQ prediction can be achieved using only outdoor environmental and meteorological data.

High-occupancy environments such as subway stations and industrial halls have also been studied extensively. Research in [22,38,39,57,60,70] shows that advanced models, such as LSTM, LightGBM, CNN-GRU, and latent variable-enhanced autoencoders, are resilient to data loss and noise and offer high predictive accuracy under dynamic, pollution-intensive conditions. These models are valuable in public transit and industrial contexts, where pollution events are localized and highly variable over time.

A considerable number of studies have concentrated on predicting multiple IAQ parameters concurrently to support comprehensive environmental management. Several investigations [15,19,34,37,42,45,51,53,65,73] have shown that combining multiple data sources, attention mechanisms, FIS, and model optimization strategies can yield high-accuracy forecasts, even for parameters that are difficult to measure directly, such as bioaerosols, TVOC, perceived air age, and material-based VOC emissions. Studies [37,45,73,76] further underscore the role of predictive models in early warning systems, anomaly detection, and the evaluation of passive emissions from building materials, highlighting their relevance for smart building design.

Furthermore, although study [11] concentrated on greenhouse environments, it provides valuable insights into temperature–humidity coupling and control strategies that can be applied to controlled public indoor spaces, such as conservatories and exhibition halls. Similarly, Ref. [28] proposes a hybrid model integrating LSTM and adaptive GRU algorithms to predict PM2.5 levels in commercial buildings, effectively addressing the vanishing-gradient problem common in conventional RNNs. Finally, Ref. [29] proposes an SVM-based model for predicting spatial air quality gradients under pulsed ventilation scenarios, illustrating the potential for fine-grained, zone-specific IAQ management.

A collective analysis of these studies reveals a converging conclusion: the implementation of AI and ML techniques for IAQ prediction in public buildings not only enhances occupant health and comfort but also contributes substantially to energy-efficient operation and the execution of intelligent building strategies. The models’ adaptability, scalability, and effectiveness render them suitable for deployment across a wide variety of building types and technological maturity levels, thus making them highly promising tools for both retrofitting and new construction contexts.

4.3. Classification by Prediction Strategy

In addition to the predicted parameters and building types, IAQ forecasting models can also be classified by their temporal and methodological strategies. This section distinguishes between predictive approaches used for forward-looking operational control, design-phase estimation, and real-time inference based on indirect parameters.

4.3.1. Forward Prediction of IAQ

A dominant trend in the current literature is short- to medium-term forecasting of IAQ variables to support real-time ventilation control and air quality management. Most studies address time horizons ranging from a few minutes to 24 h ahead, which aligns with operational needs in residential, commercial, and public environments. Longer-range models, which forecast up to 24 h in advance, are usually used for strategic planning. These models use LSTM, fuzzy ARTMAP, or ARIMA and are trained on sensor data sampled every 5–60 min and often aggregated hourly. Applications include forecasting IAQ in fitness centers and apartments to identify favorable periods for occupancy or ventilation [8,9,15,16,52,78]. The most frequently addressed forecasting horizon spans 10 to 360 min (i.e., up to about 6 h), enabling dynamic HVAC control in response to changes in occupancy and pollution levels. Studies in this category use rolling forecasts based on IoT sensor data sampled at one-minute or hourly intervals. Popular models include MLP, LSTM, GRU, and RF, which are applied to CO₂, PM2.5, PM10, and TVOC [13,17,20,22,25,26,28,30,33,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,51,53,54,57,59,60,65,67,68,69,70,73,75,77].

Such models support demand-controlled ventilation (DCV) systems, particularly in office spaces, classrooms, and healthcare settings [17,20,27,33,49,58]. In industrial and transport environments such as semiconductor cleanrooms and metro stations, short-term PM forecasts facilitate prompt mitigation [22,39,57,60,70].

Ultra-short-horizon models (5–30 min) are used for real-time applications involving high volatility, such as in homes, day care centers, and shopping malls. These models frequently use LSTM, ANN, sequential state analysis (SSA), radial basis function (RBF), GRU, and hybrid models (e.g., FL-CNN-LSTM, KPCA-AdaBoost-LSTM), which are trained on minute-level data [11,19,23,24,27,29,31,32,34,42,50,56,64,74,76].

Forecast accuracy also depends on data resolution and model design. Iterative strategies, in which past predictions inform future ones, enhance continuity [8,15,52,78]. Techniques such as PSO, GA, and SHAP can further improve robustness in noisy environments [23,45,60,67].
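The iterative (recursive) strategy mentioned above can be sketched as follows: a one-step-ahead model is applied repeatedly, and each prediction is appended to the input window for the next step. The linear extrapolation used in the example is a hypothetical placeholder for a trained LSTM or GRU, and the function names are illustrative.

```python
from typing import Callable, List, Sequence


def recursive_forecast(history: Sequence[float],
                       one_step: Callable[[List[float]], float],
                       lookback: int, horizon: int) -> List[float]:
    """Multi-step forecast by feeding each one-step prediction back as an input."""
    if len(history) < lookback:
        raise ValueError("history is shorter than the lookback window")
    window = list(history[-lookback:])
    forecasts: List[float] = []
    for _ in range(horizon):
        nxt = one_step(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # drop the oldest value, append the new prediction
    return forecasts


# Placeholder one-step model (linear extrapolation); a trained LSTM/GRU would go here.
def extrapolate(window: List[float]) -> float:
    return 2 * window[-1] - window[-2]


co2_forecast = recursive_forecast([400.0, 410.0, 420.0], extrapolate, lookback=2, horizon=3)
```

Because later steps consume earlier predictions, errors can accumulate as the horizon grows, which is one reason accuracy typically degrades for longer horizons.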

In summary, AI-based IAQ prediction systems are increasingly designed for short-term, real-time deployment in building management. Longer time horizons support preventive planning, while short- to medium-term forecasts of pollutant levels enable dynamic HVAC adaptation. This not only ensures healthier indoor environments but also reduces unnecessary energy use in ventilation and air conditioning, directly supporting building energy efficiency objectives.

4.3.2. IAQ Estimation During Building Design (No Time Horizon)

Several studies focus on predicting IAQ in the early stages of building design or renovation, before the building is occupied. These methods estimate future pollutant concentrations based on building parameters by integrating physical modeling with AI-based data analytics.

Mass-balance and multi-zone simulation tools, such as CONTAM, have been used to predict the concentrations of CO₂, PM2.5, PM10, VOCs, NO₂, O₃, and formaldehyde based on the design of the building envelope, ventilation strategies, infiltration, and outdoor pollution [18,62]. Frameworks such as i-IAQ and PredicTAIL support evidence-based decision-making by identifying IAQ risks and recommending mitigation strategies during the design process. Other studies have applied ML models (e.g., RF, XGBoost, and elastic net) trained on parametric simulation data to predict IAQ indicators, primarily CO₂ as a proxy for ventilation efficiency [12,14]. These models incorporated design parameters such as HVAC configuration, glazing, airflow rates, and occupancy profiles. SHAP was used to prioritize features, and optimization methods such as GWO were used to tune hyperparameters.

The findings emphasize the significant influence of ventilation-related parameters and reveal the trade-offs between IAQ and energy use, which are relevant for sustainable retrofitting. Overall, these tools support architects and engineers in the design phase, helping them to create healthier, more energy-efficient buildings by integrating IAQ as a core performance metric.

4.3.3. Inference of IAQ from Indirect Parameters (Sensor Fusion)

A growing body of research explores the use of indirect or proxy variables, such as outdoor air quality, occupancy, or building usage patterns, to infer indoor pollutant levels in real time, particularly where sensor infrastructure is limited. Park et al. [72] introduced a time-segmented multiple linear regression (MLR) model that predicts indoor PM2.5 levels from minute-level outdoor sensor data together with temperature and humidity data; hourly segmentation improved accuracy by 9%. Guo et al. [55] applied a CatBoost classifier to predict PM2.5 threshold exceedance in offices from five sensor-based inputs, achieving an AUC of 0.949.
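The idea behind time-segmented regression can be illustrated with a minimal sketch that fits a separate ordinary least squares line for each hour of the day. It assumes hour-of-day segments and, for brevity, a single predictor (outdoor PM2.5); the model in [72] also used temperature and humidity, and its exact segmentation scheme is not reproduced here. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def fit_ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b * x with one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_segmented(records: Iterable[tuple[int, float, float]]) -> dict[int, tuple[float, float]]:
    """Fit one (intercept, slope) pair per segment.

    Each record is (hour_of_day, outdoor_pm25, indoor_pm25); the hour is the
    segment key (an assumption of this sketch).
    """
    by_hour = defaultdict(lambda: ([], []))
    for hour, outdoor, indoor in records:
        xs, ys = by_hour[hour]
        xs.append(outdoor)
        ys.append(indoor)
    return {hour: fit_ols(xs, ys) for hour, (xs, ys) in by_hour.items()}


def predict(models: dict[int, tuple[float, float]], hour: int, outdoor: float) -> float:
    """Indoor PM2.5 estimate using the regression fitted for that hour."""
    a, b = models[hour]
    return a + b * outdoor


# Illustrative data: infiltration behaves differently in the morning and evening.
data = [(8, 10.0, 7.0), (8, 20.0, 12.0), (8, 30.0, 17.0),
        (18, 10.0, 13.0), (18, 20.0, 21.0), (18, 30.0, 29.0)]
models = fit_segmented(data)
print(predict(models, 18, 10.0))
```

Segmenting lets each hour carry its own intercept and slope, which is one simple way to capture daily patterns in ventilation and occupancy without a nonlinear model.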

In educational settings, Ren et al. [61] compared ANN, NARX, and LSTM models using a large dataset from mechanically ventilated classrooms. NARX achieved an R² of between 0.81 and 0.87 for PM prediction across schools in China and the US. Guak et al. [66] proposed PM10 prediction models based solely on outdoor air quality and weather data in schools, thus avoiding the need for indoor instrumentation.

From a systems perspective, Marzouk and Atef [63] developed an IoT–AI platform that combines low-cost sensors and cloud services to monitor PM2.5, CO, CO₂, temperature and humidity. This system enables reliable, real-time IAQ management in academic buildings. Fu et al. [71] demonstrated that PM10 and PM2.5 can be used to estimate AER in real time, with normalized errors of less than 10%—a promising alternative to tracer gas methods.
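As background on AER estimation, the classical decay approach fits a first-order exponential to a falling concentration. The sketch below is a generic illustration and is not the method of [71]: it assumes a well-mixed space and a known background level, and fits ln(C − C_bg) against time by least squares. For a tracer gas the fitted rate equals the AER; for particles it lumps AER with deposition, which must be subtracted using a separately known (here hypothetical) deposition rate.

```python
import math


def decay_rate_per_hour(times_h: list[float], conc: list[float], background: float = 0.0) -> float:
    """Estimate the first-order loss rate k (1/h) from a concentration decay.

    Assumes C(t) - C_bg = (C0 - C_bg) * exp(-k * t) in a well-mixed space and
    fits ln(C - C_bg) against t by ordinary least squares; k is minus the slope.
    """
    ys = [math.log(c - background) for c in conc]
    n = len(times_h)
    mt, my = sum(times_h) / n, sum(ys) / n
    stt = sum((t - mt) ** 2 for t in times_h)
    sty = sum((t - mt) * (y - my) for t, y in zip(times_h, ys))
    return -sty / stt


def air_exchange_rate(total_loss_per_h: float, deposition_per_h: float = 0.0) -> float:
    """AER = total loss rate minus other losses (deposition is an assumed input)."""
    return total_loss_per_h - deposition_per_h


# Synthetic decay sampled every 15 min above a 400 ppm background (illustrative).
times = [0.25 * i for i in range(9)]
conc = [400.0 + 1000.0 * math.exp(-0.8 * t) for t in times]
k = decay_rate_per_hour(times, conc, background=400.0)
print(round(k, 3))
```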

These studies demonstrate a clear trend towards sensor fusion, real-time analytics, and predictive control. By combining environmental sensing with ML, researchers have developed scalable, cost-efficient systems suitable for diverse building types, especially in cases where direct monitoring is impractical or expensive.

4.4. Classification by Applied Methods

The prediction of IAQ has been approached through a variety of modeling techniques that differ in complexity, data requirements, and deployment potential. This section groups these approaches into four methodological classes.

4.4.1. Mathematical and Statistical Methods

Traditional statistical models have long been established in IAQ prediction, offering interpretable and data-efficient solutions. Time series models, including ARIMA and SARIMAX, have been used extensively to forecast key IAQ variables such as CO₂, PM2.5, and AQI. They have proven effective in short-term forecasting, with MAPE frequently below 10% [16,27,60,78], and their effectiveness is particularly evident in univariate CO₂ forecasting and in multivariate settings that incorporate environmental covariates [20,53]. ARIMA has also been employed in data preprocessing pipelines, for example for outlier detection, with R² values of up to 0.96 in multi-pollutant prediction [75].
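Because the results in this section are reported as MAE, RMSE, MAPE, and R², the common textbook definitions are sketched below for reference. Individual studies may use variants (for example, normalized or percentage forms), so these helpers are illustrative rather than a reproduction of any cited evaluation.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of the pollutant (e.g., ppm)."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error; penalizes large errors more than MAE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error (%); undefined for zero observations."""
    if any(a == 0 for a in y_true):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


observed = [400.0, 500.0, 600.0]   # illustrative CO2 readings (ppm)
predicted = [410.0, 490.0, 600.0]
print(mae(observed, predicted), rmse(observed, predicted), mape(observed, predicted), r2(observed, predicted))
```

Since MAPE divides by the observed value, the same absolute error yields a smaller MAPE at high CO₂ levels, which is worth keeping in mind when comparing studies.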

Regression-based methods, notably MLR, have been used extensively in domestic, office, and educational environments. In many cases, variables had to be transformed to better satisfy linearity assumptions and improve model fit [30,48,61,66], yielding R² values of 0.43 to 0.74. Comparisons with time series regression (TSR) indicate that TSR performs better in certain contexts [49]. Ridge regression has also shown strong short-term accuracy (MAE ≈ 1 ppm), making it suitable for edge computing deployment [33].
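Ridge regression's appeal for edge deployment comes from its closed-form solution and tiny inference cost (one dot product). The sketch below assumes NumPy is available and solves w = (XᵀX + αI)⁻¹Xᵀy on centered data so that the intercept is not penalized; the lagged-reading features in the usage note are hypothetical, and this is not the configuration used in [33].

```python
import numpy as np


def fit_ridge(X, y, alpha: float):
    """Closed-form ridge regression; returns (weights, intercept).

    Features and target are centered first, so the intercept is not shrunk.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict_ridge(X, w, b):
    """Inference is a single matrix-vector product, cheap enough for a microcontroller-class device."""
    return np.asarray(X, dtype=float) @ w + b


# Illustrative data following y = 1 + 2*x1 - x2 exactly.
X = [[1, 2], [2, 1], [3, 5], [4, 3]]
y = [1, 4, 2, 6]
w_ols, b_ols = fit_ridge(X, y, alpha=0.0)     # alpha = 0 reduces to ordinary least squares
w_ridge, b_ridge = fit_ridge(X, y, alpha=5.0)  # larger alpha shrinks the weights
print(w_ols, b_ols, w_ridge)
```

In a CO₂ forecasting setting, the columns of X could be, for example, the last few sensor readings; only `w` and `b` would need to be stored on the device.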

Mass-balance models offer mechanistic insight into how ventilation, infiltration, and emission sources influence pollutant concentrations. Although explicit error metrics were not always reported, these models generally agreed well with measured data in scenarios such as cooking, material off-gassing, or building retrofits [18,49,54,62,69,76].
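A minimal single-zone, well-mixed CO₂ mass balance, V·dC/dt = Q·(C_out − C) + 10⁶·G, illustrates the mechanism; multi-zone tools such as CONTAM generalize it to interconnected zones. Assumptions of this sketch: constant outdoor airflow Q (m³/h), constant CO₂ generation G (m³/h), concentrations in ppm, and forward-Euler integration. The classroom size, occupancy, and per-person generation rate in the usage example are illustrative values, not data from the cited studies.

```python
def simulate_co2(
    c0_ppm: float,
    c_out_ppm: float,
    volume_m3: float,
    airflow_m3_per_h: float,
    generation_m3_per_h: float,
    hours: float,
    dt_h: float = 0.01,
) -> float:
    """Integrate V dC/dt = Q (C_out - C) + 1e6 * G with forward Euler.

    Single well-mixed zone, constant Q and G; the 1e6 factor converts the
    generated CO2 volume (m3/h) into ppm per unit room volume.
    """
    steps = round(hours / dt_h)
    c = c0_ppm
    for _ in range(steps):
        dc_dt = (airflow_m3_per_h * (c_out_ppm - c) + 1e6 * generation_m3_per_h) / volume_m3
        c += dc_dt * dt_h
    return c


def steady_state_co2(c_out_ppm: float, airflow_m3_per_h: float, generation_m3_per_h: float) -> float:
    """Equilibrium concentration obtained by setting dC/dt = 0."""
    return c_out_ppm + 1e6 * generation_m3_per_h / airflow_m3_per_h


# Illustrative classroom: 200 m3, 600 m3/h outdoor air (3 air changes per hour),
# 25 occupants at an assumed 0.018 m3/h of CO2 each.
g = 25 * 0.018
print(round(steady_state_co2(420.0, 600.0, g)))  # 1170
print(round(simulate_co2(420.0, 420.0, 200.0, 600.0, g, hours=1.0)))
```

The steady-state expression makes the design trade-off explicit: halving the airflow doubles the CO₂ rise above outdoor levels, which is the IAQ-versus-energy tension discussed for retrofits.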

Discrete-time Markov chains (DTMCs) support probabilistic IAQ forecasting, with an MAE as low as 4.75% reported in [32]. Correlation-based techniques, particularly those used to estimate AER from PM data, have been shown to outperform conventional tracer gas methods in accuracy and operational simplicity [71].
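A DTMC forecaster of this kind can be sketched by discretizing IAQ readings into states (for example, AQI categories), counting observed transitions to estimate a row-stochastic matrix P, and propagating a probability vector one step at a time. The state coding and the handling of never-left states (kept absorbing here) are assumptions of this sketch, not details from [32].

```python
def fit_transition_matrix(states: list[int], n_states: int) -> list[list[float]]:
    """Maximum-likelihood estimate of P[i][j] = Pr(next = j | current = i)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # Assumption: a state never left in the data is treated as absorbing.
            matrix.append([1.0 if j == i else 0.0 for j in range(n_states)])
    return matrix


def forecast_distribution(current_state: int, matrix: list[list[float]], steps: int) -> list[float]:
    """Probability of each state after `steps` transitions (row vector times P^steps)."""
    n = len(matrix)
    dist = [0.0] * n
    dist[current_state] = 1.0
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical state coding: 0 = good, 1 = moderate, 2 = poor.
observed_states = [0, 0, 1, 1, 2, 1, 0, 0, 1]
P = fit_transition_matrix(observed_states, n_states=3)
print(forecast_distribution(2, P, steps=2))
```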

A few studies have integrated statistical modeling with data cleaning and imputation. For instance, ARIMA has been combined with ARMA, MICE, KNN, and EM to preprocess low-cost sensor datasets, thereby enhancing robustness and reliability [75].
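KNN imputation, one of the preprocessing options listed above, fills a missing value with the mean of that variable in the k most similar records. The sketch below is a simplified, hypothetical implementation: distances use only the columns observed in both rows, and features are not rescaled, whereas in practice they would usually be standardized first. It does not reproduce the pipeline of [75].

```python
import math
from typing import Optional


def knn_impute(rows: list[list[Optional[float]]], k: int = 2) -> list[list[float]]:
    """Fill each missing value (None) with the mean of that column in the k nearest rows.

    Distance is the root-mean-square difference over the columns observed in
    both rows; only originally observed values are used as donors.
    """
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                diffs = [(a - b) ** 2 for a, b in zip(row, other)
                         if a is not None and b is not None]
                if diffs:
                    candidates.append((math.sqrt(sum(diffs) / len(diffs)), other[j]))
            if not candidates:
                raise ValueError(f"no donor rows available for row {i}, column {j}")
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical columns: temperature (°C), relative humidity (%), PM2.5 (µg/m³).
readings = [
    [22.0, 40.0, 10.0],
    [22.5, 41.0, 12.0],
    [22.2, None, 11.0],   # humidity sensor dropout
    [27.0, 60.0, 50.0],
]
print(knn_impute(readings, k=2)[2])
```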

In summary, mathematical and statistical models remain highly relevant for IAQ prediction, especially when transparency, interpretability, or constrained computational resources are prioritized. Although they lack the flexibility of ML or DL methods, they are easy to calibrate and integrate into real-time systems, which ensures their continued applicability.

4.4.2. Machine Learning

Classical ML approaches have enabled more flexible, data-driven IAQ prediction and frequently outperform traditional models in accuracy and scalability. QRF has proven particularly effective for PM2.5 prediction in residential and office settings, with prediction interval coverage rates of 85–88% and R² up to 0.70 [8,9,31].
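The coverage rates quoted for QRF are the share of observations that fall inside the predicted intervals, often called the prediction interval coverage probability (PICP). A minimal sketch, assuming the lower and upper quantile predictions have already been produced by a model such as QRF; it also reports the mean interval width, since a high coverage rate is only meaningful together with reasonably narrow intervals.

```python
def interval_coverage(y_true: list[float], lower: list[float], upper: list[float]) -> float:
    """Fraction of observations inside [lower, upper] (PICP)."""
    if not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("inputs must have the same length")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)


def mean_interval_width(lower: list[float], upper: list[float]) -> float:
    """Average width of the prediction intervals (same units as the target)."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Illustrative PM2.5 observations and interval bounds (µg/m³).
obs = [10.0, 12.0, 15.0, 20.0]
lo_q = [8.0, 11.0, 16.0, 18.0]
hi_q = [12.0, 13.0, 18.0, 22.0]
print(interval_coverage(obs, lo_q, hi_q), mean_interval_width(lo_q, hi_q))
```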

RF and XGBoost models have been widely adopted for multi-pollutant forecasting (e.g., PM2.5, PM10, CO₂, TVOC, NO₂, formaldehyde), frequently achieving R² > 0.9 and low RMSE, even with limited training data [12,21,24,33,35,37,41,44,59,64,72,77]. Their robustness and generalizability across a range of indoor environments were key to their successful integration with IoT platforms and DT.

In offices, schools, and shopping malls, ML algorithms including MLP and ANN have been used to predict particulate matter (PM2.5, PM10) and VOCs, with reported R² values of 0.78 to 0.95 and a normalized mean square error (NMSE) of 0.46 to 0.49 [17,19,24,30,44,64]. These models also proved useful where direct sensor coverage is lacking. SVM models applied to CO₂ forecasting and categorical IAQ classification have achieved correlation coefficients of up to 0.99 and, in hospital settings, classification accuracies of up to 99.813% [16,17,37,58,74].

Ensemble models such as CatBoost and GBM have been applied to binary and multiclass classification, with a focus on threshold exceedances; CatBoost achieved an AUC of 0.949, outperforming competing models [17,55,59].
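The AUC reported for CatBoost can be read as the probability that a randomly chosen exceedance case receives a higher score than a randomly chosen non-exceedance case. The sketch below computes it directly from that pairwise (Mann–Whitney) definition, assuming binary labels (1 = threshold exceeded) and real-valued model scores; it is O(n·m) and intended for illustration rather than large datasets.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative exceedance labels and classifier scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Because AUC depends only on ranking, it is insensitive to the decision threshold, which suits exceedance problems where the operating threshold may be chosen later.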

Other methods, such as RBFNN, KNN, and SGD, play supporting roles in VOC prediction, data imputation, and real-time DT modeling [51,75,77]. Collectively, classical ML models offer strong predictive accuracy, fast training, and good interpretability. They are particularly effective in moderate-scale applications where sufficient computational resources and labeled data are available.

4.4.3. Deep Learning

DL models are increasingly adopted to capture complex temporal and nonlinear dependencies in IAQ datasets. LSTM networks dominate the literature and achieve high accuracy across a range of time horizons, with reported performance including RMSE < 2 µg/m³ for PM2.5 [30,33], MAE = 15.4 ppm for CO₂ [53], and MAPE < 10% for formaldehyde [40]. These models have been deployed successfully in residential, commercial, and transport settings [16,17,24,30,33,47,53,59,61,70,74,75].
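To make the LSTM mechanism concrete, the following NumPy sketch implements one forward step of a standard LSTM cell (input, forget, and output gates plus a candidate cell state) and a linear read-out for a one-step forecast. The weights are untrained placeholders; in the cited studies they would be learned with a DL framework, and the gate ordering, shapes, and input features are conventions chosen for this sketch.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    stacked in the (assumed) order [input, forget, candidate, output].
    """
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b               # pre-activations of all four gates
    i = sigmoid(z[:hidden])                    # input gate: how much new content to write
    f = sigmoid(z[hidden:2 * hidden])          # forget gate: how much old memory to keep
    g = np.tanh(z[2 * hidden:3 * hidden])      # candidate cell content
    o = sigmoid(z[3 * hidden:])                # output gate
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t


def lstm_forecast(sequence, W, U, b, w_out, b_out) -> float:
    """Run the cell over a (T, D) sequence and map the last hidden state to one value."""
    hidden = U.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in np.asarray(sequence, dtype=float):
        h, c = lstm_step(np.atleast_1d(x_t), h, c, W, U, b)
    return float(w_out @ h + b_out)


# Hypothetical setup: 12 time steps of 3 scaled inputs (e.g., CO2, temperature, RH), 8 hidden units.
rng = np.random.default_rng(0)
D, H = 3, 8
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
w_out = rng.normal(size=H)
seq = rng.normal(size=(12, D))
print(lstm_forecast(seq, W, U, b, w_out, b_out=0.0))
```

The forget gate lets the cell state carry information across many steps, which is why LSTMs cope better with slow IAQ dynamics (such as CO₂ build-up over a lesson) than plain RNNs.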

RNNs have been employed in specific short-term applications, including PM10 prediction and health response modeling, and in certain cases have outperformed LSTMs on tailored datasets [22,26,50]. Bayesian neural networks (BNNs) have been used to add uncertainty quantification to IAQ forecasts, providing useful confidence estimates, with R² up to 0.70 and RMSE between 9 and 13 µg/m³ [8,68].

Conventional ANN and MLP models retained their relevance in settings characterized by limited data or low data complexity, providing an acceptable baseline level of performance [13,23,43,44,46,56].

Overall, DL methods, most notably LSTM, are well suited to real-time, multi-parameter IAQ prediction, offering scalability, adaptability, and potential for integration with smart infrastructure.

4.4.4. Hybrid Models

Hybrid models combine the strengths of multiple techniques to enhance prediction accuracy and operational flexibility. CNN-LSTM-DNN models used for PM10/PM2.5 forecasting in metro stations combine spatial, sequential, and dense processing layers [70]. Integrating LSTM with KNN, EM, MICE, or ARMA has enhanced the robustness of low-cost IoT sensor systems, with accuracies of up to 96% [75].

Metaheuristic algorithms such as GWO and EPSO have been used to optimize DL networks for thermal comfort and pollutant forecasting [14,20,28]. Coupling CFD-based models with PSO-optimized neural networks yielded a MAPE below 1% for CO₂ [45], and combining FIS with GA/PSO resulted in an RMSE of 0.998 [67].
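Metaheuristic tuning of the kind described above can be illustrated with a basic global-best particle swarm optimizer. This is the textbook PSO, not EPSO or GWO, and the quadratic objective in the usage example is a hypothetical stand-in for, say, validation error as a function of two hyperparameters; the inertia and acceleration coefficients are common defaults, not values from the cited studies.

```python
import random
from typing import Callable, Sequence


def pso_minimize(
    objective: Callable[[list[float]], float],
    bounds: Sequence[tuple[float, float]],
    n_particles: int = 20,
    iterations: int = 100,
    w: float = 0.7,     # inertia: how much of the previous velocity is kept
    c1: float = 1.5,    # cognitive weight: pull toward each particle's own best
    c2: float = 1.5,    # social weight: pull toward the swarm's best
    seed: int = 0,
) -> tuple[list[float], float]:
    """Global-best PSO with positions clamped to `bounds`; returns (best position, best value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical objective with its minimum at (3, -1).
best, best_val = pso_minimize(lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2,
                              bounds=[(-10.0, 10.0), (-10.0, 10.0)])
print(best, best_val)
```

PSO needs only objective evaluations, not gradients, which is why it is attractive for tuning hyperparameters such as layer sizes or learning rates, although each evaluation may mean retraining a network.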

Feature selection and reduction methods (e.g., KPCA, mRMR, LASSO) embedded in hybrid pipelines such as KPCA-AdaBoost-LSTM and LATCN have yielded R² improvements exceeding 30% and RMSE reductions greater than 14% [38,39,60,65]. Logic-based systems (FL-CNN-LSTM, ICA-RBP) and multi-criteria decision support (SO-LSTS) have been shown to enhance robustness under uncertainty and improve context specificity [36,42,52]. DT-based frameworks have coupled GRU, CNN, and RF models with CFD simulations, showing improved responsiveness, although error metrics were rarely reported [29,57]. Signal filtering methods such as SSA have been shown to stabilize forecasts and improve anomaly detection for PM2.5 and CO₂ [73].

Hybrid approaches have been shown to offer high adaptability and accuracy, particularly in complex real-time IAQ management scenarios where no single model suffices.

5. Gaps, Barriers, and Opportunities with Future Perspective

Despite the growing number of studies on AI/ML applications in IAQ prediction, the field faces persistent limitations. A major gap is the narrow focus on single parameters—mainly PM2.5 or CO₂—while neglecting VOCs, humidity, or emissions from indoor materials, which hinders comprehensive modeling [8,12,51,76]. Moreover, many models lack contextual inputs such as occupancy, user behavior, or ventilation settings, essential for capturing real-world IAQ variability [35,36,64].

Models often suffer from limited scalability, being trained on data from single buildings or controlled experiments, reducing generalizability across different building types and climates [24,37,61]. The low spatial resolution of most approaches further limits their applicability in large or multi-zone environments [31,67].

Another barrier is data quality: many studies rely on low-cost sensors prone to noise and drift, yet robust preprocessing (e.g., imputation, outlier removal) is rarely applied [19,25,66]. While DL models (e.g., LSTM, CNN-LSTM) offer high accuracy, their computational complexity and lack of interpretability limit practical deployment [20,41,53].

Few studies conduct comparative evaluations across algorithms under unified conditions, hindering model selection [32,60]. Furthermore, predictive models are rarely integrated with HVAC control systems or DT, reducing their operational impact [13,30,77].

Finally, oversimplified physical assumptions, absence of standardized datasets and metrics, and the neglect of secondary emissions from materials weaken robustness and reproducibility [47,63,72]. Addressing these issues is vital to advance AI-driven IAQ solutions for smart and healthy buildings.

While current research on AI/ML for IAQ prediction reveals notable limitations, it also outlines clear directions for future advancements. A key opportunity lies in the development of hybrid and multi-source models, which combine data from sensors, user behavior, building characteristics, and simulation tools such as CFD or BIM to improve spatial and temporal accuracy [13,18,31,51]. Enhancing model contextualization through environmental and behavioral dynamics—such as occupancy, ventilation type, or user activity—can increase prediction reliability across varied building uses, including schools, hospitals, and heritage sites [33,35,37,71]. Equally important is the integration of predictive models with real-time control systems and DT, enabling proactive building operation and adaptive ventilation based on forecasted IAQ conditions [29,49,62,77].

Scalability and transferability can be addressed through multi-building datasets collected across diverse typologies and climatic zones, supporting the creation of robust, generalizable solutions [37,43,46,72]. Simultaneously, there is a need for computationally efficient, lightweight models—such as LightGBM or GRU—suitable for edge deployment in smart home or resource-constrained environments [39,40,67].

Further progress can be achieved by integrating health-related indicators (e.g., exposure risk or PEFR in children with asthma) [50], employing federated learning to preserve data privacy [20], and introducing interpretability and uncertainty quantification mechanisms (e.g., SHAP, Bayesian methods) to foster trust and support informed decision-making [19,60,65].

Finally, future efforts should prioritize standardized IAQ datasets [56], model optimization using metaheuristics [20], and energy-aware hybrid approaches, which jointly address air quality and energy efficiency [30,34,72]. These directions are essential to bridge current research gaps and build practical, scalable, and health-responsive IAQ prediction systems.

To consolidate the findings from Section 4 and Section 5, a SWOT analysis is presented below in Table 5. It summarizes the key strengths, weaknesses, opportunities, and threats associated with AI and ML approaches for IAQ prediction, highlighting both current capabilities and strategic directions for future research and deployment.

6. Conclusions

This review provides a comprehensive synthesis of recent advances in using AI and ML to predict IAQ across different building types, pollutants, and deployment contexts. The literature reviewed highlights a growing interest in data-driven IAQ forecasting methods that enable proactive environmental control, energy efficiency and health-focused building operation. Using a taxonomy based on predicted parameters, modeling strategies, and facility types, we demonstrate how predictive models, particularly those using DL and hybrid architectures, are transforming IAQ management from passive monitoring to intelligent decision support.

Key findings indicate that models based on LSTM, GRU, and hybrid DL structures achieve the highest prediction accuracy, particularly when applied to time-series data enriched with contextual information such as occupancy, ventilation modes, and external weather conditions. Hybrid systems that combine AI with physical modeling or simulation tools (e.g., CFD or BIM) and those that employ optimization algorithms or feature selection techniques demonstrate strong generalization and robustness under real-world conditions. However, several challenges remain. Many models are trained on localized datasets, which limits scalability and cross-building transferability. Few studies address the interpretability of DL predictions, and integration with building automation and control systems remains the exception rather than the rule. Furthermore, the lack of standardized datasets and performance benchmarks restricts progress in comparative assessment and model validation.

Future research should prioritize developing scalable, lightweight and explainable models that are suitable for real-time deployment in environments with limited resources. Promising avenues include using federated learning for privacy-preserving training, integrating predictive IAQ models with DT platforms and HVAC control systems, and incorporating health-related metrics to support occupant-centered interventions. Such predictive IAQ frameworks will not only safeguard occupant health and comfort but also play a decisive role in improving building energy efficiency, reducing operational emissions, and contributing to broader sustainability goals. By linking indoor air quality management with energy performance optimization, AI and ML-based approaches directly support the transition toward low-carbon, resilient, and smart buildings. Multimodal sensor fusion, uncertainty quantification and benchmark datasets will also be essential for building trustworthy, generalizable and energy-efficient IAQ forecasting systems that align with the goals of smart, sustainable building design.

Author Contributions

Conceptualization, J.G. and L.W.; methodology, L.W.; validation, L.W. and A.O.; formal analysis, D.L. and A.O.; investigation, A.O., J.G. and D.L.; resources, D.L.; data curation, D.L., J.G. and A.O.; writing—original draft preparation, D.L. and J.G.; writing—review and editing, A.O.; visualization, D.L.; supervision, L.W. and J.G.; project administration, J.G.; funding acquisition, D.L. and J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering of the AGH University of Krakow as part of a research subsidy for young scientists (Dean’s grants) in 2025.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Zhao, B.; Shi, S.; Ji, J.S. The WHO Air Quality Guidelines 2021 promote great challenge for indoor air. Sci. Total Environ. 2022, 827, 154376. [Google Scholar] [CrossRef] [PubMed]
Fisk, W.J. The ventilation problem in schools: Literature review. Indoor Air 2017, 27, 1039–1051. [Google Scholar] [CrossRef]
Felgueiras, F.; Mourão, Z.; Moreira, A.; Gabriel, M.F. Indoor environmental quality in offices and risk of health and productivity complaints at work: A literature review. J. Hazard. Mater. Adv. 2023, 10, 100314. [Google Scholar] [CrossRef]
Sun, S.; Zheng, X.; Villalba-Díez, J.; Ordieres-Meré, J. Indoor Air-Quality Data-Monitoring System: Long-Term Monitoring Benefits. Sensors 2019, 19, 4157. [Google Scholar] [CrossRef]
Paterson, C.; Sharpe, R.; Taylor, T.; Morrissey, K. Indoor PM2.5, VOCs and asthma outcomes: A systematic review in adults and their home environments. Environ. Res. 2021, 202, 111631. [Google Scholar] [CrossRef]
Xu, Q.; Goh, H.C.; Mousavi, E.; Rafsanjani, H.N.; Varghese, Z.; Pandit, Y.; Ghahramani, A. Towards Personalization of Indoor Air Quality: Review of Sensing Requirements and Field Deployments. Sensors 2022, 22, 3444. [Google Scholar] [CrossRef]
Chojer, H.; Branco, P.; Martins, F.; Alvim-Ferraz, M.; Sousa, S. Development of low-cost indoor air quality monitoring devices: Recent advancements. Sci. Total Environ. 2020, 727, 138385. [Google Scholar] [CrossRef]
Dai, H.; Wu, N.; Dong, Z.; Ren, J.; Gao, Y.; Zhao, B. Comparison and evaluation of machine learning models for predicting indoor PM2.5 concentrations on a large spatiotemporal scale. Build. Simul. 2025, 18, 1453–1466. [Google Scholar] [CrossRef]
Jha, S. Enhancing Indoor Air Quality Through Smart Home Automation System. Master’s Thesis, Arcada University of Applied Sciences, Helsinki, Finland, 2025. [Google Scholar]
Bian, Y.; Shi, Y. Data-driven operator learning for energy-efficient building control. arXiv 2025, arXiv:2504.21243. [Google Scholar]
Oussous, S.A.; Lail, D.M.; Bouayadi, R.E.; Amine, A. Deep Learning Innovations for Greenhouse Climate Prediction: Insights From a Spanish Case Study. IEEE Access 2025, 13, 64810–64821. [Google Scholar] [CrossRef]
Dehghan, F.; Porras-Amores, C.; Khanmohammadi, L.; Labib, R. Evaluating Machine Learning Models for Sustainable Building Design: Energy, Emissions, and Comfort Metrics. Build. Environ. 2025, 285, 113582. [Google Scholar] [CrossRef]
Tagliabue, L.C.; Cecconi, F.R.; Rinaldi, S.; Ciribini, A.L.C. Data driven indoor air quality prediction in educational facilities based on IoT network. Energy Build. 2021, 236, 110782. [Google Scholar] [CrossRef]
Hou, F.; Ma, J.; Kwok, H.H.; Cheng, J.C. Prediction and optimization of thermal comfort, IAQ and energy consumption of typical air-conditioned rooms based on a hybrid prediction model. Build. Environ. 2022, 225, 109576. [Google Scholar] [CrossRef]
Karaiskos, P.; Munian, Y.; Martinez-Molina, A.; Alamaniotis, M. Indoor air quality prediction modeling for a naturally ventilated fitness building using RNN-LSTM artificial neural networks. Smart Sustain. Built Environ. 2024; ahead-of-print. [Google Scholar] [CrossRef]
Yao, H.; Shen, X.; Wu, W.; Lv, Y.; Vishnupriya, V.; Zhang, H.; Long, Z. Assessing and predicting indoor environmental quality in 13 naturally ventilated urban residential dwellings. Build. Environ. 2024, 253, 111347. [Google Scholar] [CrossRef]
Taheri, S.; Razban, A. Learning-based CO₂ concentration prediction: Application to indoor air quality control using demand-controlled ventilation. Build. Environ. 2021, 205, 108164. [Google Scholar] [CrossRef]
Yang, S.; Mahecha, S.D.; Moreno, S.A.; Licina, D. Integration of Indoor Air Quality Prediction into Healthy Building Design. Sustainability 2022, 14, 7890. [Google Scholar] [CrossRef]
Lee, J.Y.; Miao, Y.; Chau, R.L.; Hernandez, M.; Lee, P.K. Artificial intelligence-based prediction of indoor bioaerosol concentrations from indoor air quality sensor data. Environ. Int. 2023, 174, 107900. [Google Scholar] [CrossRef] [PubMed]
Nguyen, T.P. AIoT-based indoor air quality prediction for building using enhanced metaheuristic algorithm and hybrid deep learning. J. Build. Eng. 2025, 105, 112448. [Google Scholar] [CrossRef]
Saini, J.; Dutta, M.; Marques, G. Indoor Air Quality Monitoring with IoT: Predicting PM10 for Enhanced Decision Support. In Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), Sakheer, Bahrain, 8–9 November 2020; pp. 504–508. [Google Scholar] [CrossRef]
Nurcahyanto, H.; Prihatno, A.T.; Alam, M.M.; Rahman, M.H.; Jahan, I.; Shahjalal, M.; Jang, Y.M. Multilevel RNN-Based PM10 Air Quality Prediction for Industrial Internet of Things Applications in Cleanroom Environment. Wirel. Commun. Mob. Comput. 2022, 2022, 1874237. [Google Scholar] [CrossRef]
Cho, J.H.; Moon, J.W. Integrated artificial neural network prediction model of indoor environmental quality in a school building. J. Clean. Prod. 2022, 344, 131083. [Google Scholar] [CrossRef]
Sharma, P.K.; Mondal, A.; Jaiswal, S.; Saha, M.; Nandi, S.; De, T.; Saha, S. IndoAirSense: A framework for indoor air quality estimation and forecasting. Atmos. Pollut. Res. 2021, 12, 10–22. [Google Scholar] [CrossRef]
Zhu, Y.; Al-Ahmed, S.A.; Shakir, M.Z.; Olszewska, J.I. LSTM-Based IoT-Enabled CO₂ Steady-State Forecasting for Indoor Air Quality Monitoring. Electronics 2022, 12, 107. [Google Scholar] [CrossRef]
Dai, X.; Liu, J.; Li, Y. A recurrent neural network using historical data to predict time series indoor PM2.5 concentrations for residential buildings. Indoor Air 2021, 31, 1228–1237. [Google Scholar] [CrossRef] [PubMed]
Dutta, J.; Roy, S. IndoorSense: Context based indoor pollutant prediction using SARIMAX model. Multimed. Tools Appl. 2021, 80, 19989–20018. [Google Scholar] [CrossRef]
Rahim, M.S.A.; Yakub, F.; Omar, M.; Ghani, R.A.; Salim, S.A.Z.S.; Masuda, S.; Dhamanti, I. Prediction of Indoor Air Quality Using Long Short-Term Memory with Adaptive Gated Recurrent Unit. E3S Web Conf. 2023, 396, 01095. [Google Scholar] [CrossRef]
Zhang, J.; Poon, K.H.; Kwok, H.H.; Hou, F.; Cheng, J.C. Predictive control of HVAC by multiple output GRU—CFD integration approach to manage multiple IAQ for commercial heritage building preservation. Build. Environ. 2023, 245, 110802. [Google Scholar] [CrossRef]
Lagesse, B.; Wang, S.; Larson, T.V.; Kim, A.A. Predicting PM_2.5 in Well-Mixed Indoor Air for a Large Office Building Using Regression and Artificial Neural Network Models. Environ. Sci. Technol. 2020, 54, 15320–15328. [Google Scholar] [CrossRef]
Tian, X.; Zhang, Y.; Lin, Z. Predicting non-uniform indoor air quality distribution by using pulsating air supply and SVM model. Build. Environ. 2022, 219, 109171. [Google Scholar] [CrossRef]
Rastogi, K.; Barthwal, A.; Lohani, D.; Acharya, D. An IoT-based Discrete Time Markov Chain Model for Analysis and Prediction of Indoor Air Quality Index. In Proceedings of the 2020 IEEE Sensors Applications Symposium (SAS), Kuala Lumpur, Malaysia, 9–11 March 2020; pp. 1–6. [Google Scholar] [CrossRef]
Kallio, J.; Tervonen, J.; Räsänen, P.; Mäkynen, R.; Koivusaari, J.; Peltola, J. Forecasting office indoor CO₂ concentration using machine learning with a one-year dataset. Build. Environ. 2021, 187, 107409. [Google Scholar] [CrossRef]
Sassi, M.S.H.; Fourati, L.C. Deep Learning and Augmented Reality for IoT-based Air Quality Monitoring and Prediction System. In Proceedings of the 2021 International Symposium on Networks, Computers and Communications (ISNCC), Dubai, United Arab Emirates, 31 October–2 November 2021; pp. 1–6. [Google Scholar] [CrossRef]
Miao, S.; Gangolells, M.; Tejedor, B. Data-driven model for predicting indoor air quality and thermal comfort levels in naturally ventilated educational buildings using easily accessible data for schools. J. Build. Eng. 2023, 80, 108001. [Google Scholar] [CrossRef]
Zhou, Y.; Yang, G. A predictive model of indoor PM2.5 considering occupancy level in a hospital outpatient hall. Sci. Total Environ. 2022, 844, 157233. [Google Scholar] [CrossRef] [PubMed]
Baqer, N.S.; Albahri, A.S.; Mohammed, H.A.; Zaidan, A.A.; Amjed, R.A.; Al-Bakry, A.M.; Albahri, O.S.; Alsattar, H.A.; Alnoor, A.; Alamoodi, A.H.; et al. Indoor air quality pollutants predicting approach using unified labelling process-based multi-criteria decision making and machine learning techniques. Telecommun. Syst. 2022, 81, 591–613. [Google Scholar] [CrossRef]
Wu, Q.; Geng, Y.; Wang, X.; Wang, D.; Yoo, C.; Liu, H. A novel deep learning framework with variational auto-encoder for indoor air quality prediction. Front. Environ. Sci. Eng. 2024, 18, 8. [Google Scholar] [CrossRef]
Lu, Y.; Wang, J.; Wang, D.; Yoo, C.; Liu, H. Incorporating temporal multi-head self-attention convolutional networks and LightGBM for indoor air quality prediction. Appl. Soft Comput. 2024, 157, 111569. [Google Scholar] [CrossRef]
Segala, G.; Doriguzzi-Corin, R.; Peroni, C.; Gazzini, T.; Siracusa, D. A Practical and Adaptive Approach to Predicting Indoor CO₂. Appl. Sci. 2021, 11, 10771. [Google Scholar] [CrossRef]
Saini, J.; Dutta, M.; Marques, G. Internet of Things Based Environment Monitoring and PM10 Prediction for Smart Home. In Proceedings of the 2020 International Conference on Innovation and Intelligence for Informatics, Computing and Technologies (3ICT), Sakheer, Bahrain, 20–21 December 2020; pp. 1–5. [Google Scholar]
Bao, R.; Zhou, Y.; Jiang, W. FL-CNN-LSTM: Indoor Air Quality Prediction Using Fuzzy Logic and CNN-LSTM Model. In Proceedings of the 2022 2nd International Conference on Electrical Engineering and Control Science (IC2ECS), Nanjing, China, 16–18 December 2022; pp. 986–989. [Google Scholar] [CrossRef]
Kim, J.; Hong, Y.; Seong, N.; Kim, D.D. Assessment of ANN Algorithms for the Concentration Prediction of Indoor Air Pollutants in Child Daycare Centers. Energies 2022, 15, 2654. [Google Scholar] [CrossRef]
Gaowa, S.; Zhang, Z.; Nie, J.; Li, L.; A-ru, H.; Yu, Z. Using artificial neural networks to predict indoor particulate matter and TVOC concentration in an office building: Model selection and method development. Energy Built Environ. 2025, 6, 750–761. [Google Scholar] [CrossRef]
Li, L.; Zhang, Y.; Fung, J.C.; Qu, H.; Lau, A.K. A coupled computational fluid dynamics and back-propagation neural network-based particle swarm optimizer algorithm for predicting and optimizing indoor air quality. Build. Environ. 2022, 207, 108533. [Google Scholar] [CrossRef]
Kim, M.K.; Cremers, B.; Liu, J.; Zhang, J.; Wang, J. Prediction and correlation analysis of ventilation performance in a residential building using artificial neural network models based on data-driven analysis. Sustain. Cities Soc. 2022, 83, 103981. [Google Scholar] [CrossRef]
Lu, L.; Huang, X.; Zhou, X.; Guo, J.; Yang, X.; Yan, J. High-performance formaldehyde prediction for indoor air quality assessment using time series deep learning. Build. Simul. 2024, 17, 415–429. [Google Scholar] [CrossRef]
Jung, C.C.; Lin, W.Y.; Hsu, N.Y.; Wu, C.D.; Chang, H.T.; Su, H.J. Development of Hourly Indoor PM2.5 Concentration Prediction Model: The Role of Outdoor Air, Ventilation, Building Characteristic, and Human Activity. Int. J. Environ. Res. Public Health 2020, 17, 5906. [Google Scholar] [CrossRef]
Lee, Y.K.; Kim, Y.I.; Lee, W.S. Development of CO₂ Concentration Prediction Tool for Improving Office Indoor Air Quality Considering Economic Cost. Energies 2022, 15, 3232. [Google Scholar] [CrossRef]
Woo, J.; Lee, J.H.; Kim, Y.; Rudasingwa, G.; Lim, D.H.; Kim, S. Forecasting the Effects of Real-Time Indoor PM2.5 on Peak Expiratory Flow Rates (PEFR) of Asthmatic Children in Korea: A Deep Learning Approach. IEEE Access 2022, 10, 19391–19400. [Google Scholar] [CrossRef]
Majdi, A.; Alrubaie, A.J.; Al-Wardy, A.H.; Baili, J.; Panchal, H. A novel method for Indoor Air Quality Control of Smart Homes using a Machine learning model. Adv. Eng. Softw. 2022, 173, 103253. [Google Scholar] [CrossRef]
de Assis Pedrobon Ferreira, W.; Grout, I.; da Silva, A.C.R. Application of a Fuzzy ARTMAP Neural Network for Indoor Air Quality Prediction. In Proceedings of the 2022 International Electrical Engineering Congress (iEECON), Khon Kaen, Thailand, 9–11 March 2022; pp. 1–4. [Google Scholar] [CrossRef]
Gabriel, M.; Auer, T. LSTM Deep Learning Models for Virtual Sensing of Indoor Air Pollutants: A Feasible Alternative to Physical Sensors. Buildings 2023, 13, 1684. [Google Scholar] [CrossRef]
Kim, Y.; Shin, D.; Hong, K.; Lee, G.; Kim, S.B.; Park, I.; Kim, H.; Kim, Y.; Han, B.; Hwang, J. Prediction of indoor PM_2.5 concentrations and reduction strategies for cooking events through various IAQ management methods in an apartment of South Korea. Indoor Air 2022, 32, e13173. [Google Scholar] [CrossRef] [PubMed]
Guo, Z.; Wang, X.; Ge, L. Classification prediction model of indoor PM2.5 concentration using CatBoost algorithm. Front. Built Environ. 2023, 9, 1207193.
Zhang, H.; Srinivasan, R.; Yang, X. Simulation and Analysis of Indoor Air Quality in Florida Using Time Series Regression (TSR) and Artificial Neural Networks (ANN) Models. Symmetry 2021, 13, 952.
Guo, Z.; Yang, C.; Wang, D.; Liu, H. A novel deep learning model integrating CNN and GRU to predict particulate matter concentrations. Process Saf. Environ. Prot. 2023, 173, 604–613.
Kapoor, N.R.; Kumar, A.; Kumar, A.; Kumar, A.; Mohammed, M.A.; Kumar, K.; Kadry, S.; Lim, S. Machine Learning-Based CO₂ Prediction for Office Room: A Pilot Study. Wirel. Commun. Mob. Comput. 2022, 2022, 9404807.
Mohammadshirazi, A.; Kalkhorani, V.A.; Humes, J.; Speno, B.; Rike, J.; Ramnath, R.; Clark, J.D. Predicting airborne pollutant concentrations and events in a commercial building using low-cost pollutant sensors and machine learning: A case study. Build. Environ. 2022, 213, 108833.
Wang, J.; Wang, D.; Zhang, F.; Yoo, C.; Liu, H. Soft sensor for predicting indoor PM2.5 concentration in subway with adaptive boosting deep learning model. J. Hazard. Mater. 2024, 465, 133074.
Ren, J.; He, J.; Novoselac, A. Predicting indoor particle concentration in mechanically ventilated classrooms using neural networks: Model development and generalization ability analysis. Build. Environ. 2023, 238, 110404.
Wei, W.; Wargocki, P.; Ke, Y.; Bailhache, S.; Diallo, T.; Carré, S.; Ducruet, P.; Sesana, M.M.; Salvalai, G.; Espigares-Correa, C.; et al. PredicTAIL, a prediction method for indoor environmental quality in buildings undergoing deep energy renovation based on the TAIL rating scheme. Energy Build. 2022, 258, 111839.
Marzouk, M.; Atef, M. Assessment of Indoor Air Quality in Academic Buildings Using IoT and Deep Learning. Sustainability 2022, 14, 7015.
Puscasiu, A.P.; Fanca, A.; Gota, D.I.; Valean, H. Monitoring and Prediction of Indoor Air Quality for Enhanced Occupational Health. Intell. Autom. Soft Comput. 2023, 35, 925–940.
Shi, T.; Yang, W.; Qi, A.; Li, P.; Qiao, J. LASSO and attention-TCN: A concurrent method for indoor particulate matter prediction. Appl. Intell. 2023, 53, 20076–20090.
Guak, S.; Kim, K.; Yang, W.; Won, S.; Lee, H.; Lee, K. Prediction models using outdoor environmental data for real-time PM10 concentrations in daycare centers, kindergartens, and elementary schools. Build. Environ. 2021, 187, 107371.
Saini, J.; Dutta, M.; Marques, G. A novel application of fuzzy inference system optimized with particle swarm optimization and genetic algorithm for PM10 prediction. Soft Comput. 2022, 26, 9573–9586.
Dai, H.; Liu, Y.; Wang, J.; Ren, J.; Gao, Y.; Dong, Z.; Zhao, B. Large-scale spatiotemporal deep learning predicting urban residential indoor PM2.5 concentration. Environ. Int. 2023, 182, 108343.
Park, S.B.; Park, J.H.; Jo, Y.M.; Song, D.; Heo, S.; Lee, T.J.; Park, S.; Koo, J. Development and validation of a dynamic mass-balance prediction model for indoor particle concentrations in an office room. Build. Environ. 2022, 207, 108465.
Bakht, A.; Sharma, S.; Park, D.; Lee, H. Deep Learning-Based Indoor Air Quality Forecasting Framework for Indoor Subway Station Platforms. Toxics 2022, 10, 557.
Fu, N.; Kim, M.K.; Huang, L.; Liu, J.; Chen, B.; Sharples, S. Investigating the reliability of estimating real-time air exchange rates in a building by using airborne particles, including PM1.0, PM2.5, and PM10: A case study in Suzhou, China. Atmos. Pollut. Res. 2024, 15, 101955.
Park, S.Y.; Yoon, D.K.; Park, S.H.; Jeon, J.I.; Lee, J.M.; Yang, W.H.; Cho, Y.S.; Kwon, J.; Lee, C.M. Proposal of a Methodology for Prediction of Indoor PM2.5 Concentration Using Sensor-Based Residential Environments Monitoring Data and Time-Divided Multiple Linear Regression Model. Toxics 2023, 11, 526.
Espinosa, F.; Bartolomé, A.B.; Hernández, P.V.; Rodriguez-Sanchez, M.C. Contribution of Singular Spectral Analysis to Forecasting and Anomalies Detection of Indoors Air Quality. Sensors 2022, 22, 3054.
Dudkina, E.; Crisostomi, E.; Franco, A. Prediction of CO₂ in Public Buildings. Energies 2023, 16, 7582.
Tran, Q.A.; Dang, Q.H.; Le, T.; Nguyen, H.T.; Le, T.D. Air Quality Monitoring and Forecasting System using IoT and Machine Learning Techniques. In Proceedings of the 2022 6th International Conference on Green Technology and Sustainable Development (GTSD), Nha Trang City, Vietnam, 29–30 July 2022; pp. 786–792.
D’Amico, A.; Pini, A.; Zazzini, S.; D’Alessandro, D.; Leuzzi, G.; Currà, E. Modelling VOC Emissions from Building Materials for Healthy Building Design. Sustainability 2020, 13, 184.
Arsiwala, A.; Elghaish, F.; Zoher, M. Digital twin with Machine learning for predictive monitoring of CO₂ equivalent from existing buildings. Energy Build. 2023, 284, 112851.
Rakib, M.; Haq, S.; Hossain, M.I.; Rahman, T. IoT Based Air Pollution Monitoring & Prediction System. In Proceedings of the 2022 International Conference on Innovations in Science, Engineering and Technology (ICISET), Chittagong, Bangladesh, 26–27 February 2022; pp. 184–189.
Salman, H.A.; Kalakech, A.; Steiti, A. Random Forest Algorithm Overview. Babylon. J. Mach. Learn. 2024, 2024, 69–79.
James, G.; Witten, D.; Hastie, T.; Tibshirani, R.; Taylor, J. Linear Regression. In An Introduction to Statistical Learning: With Applications in Python; Springer International Publishing: Cham, Switzerland, 2023; pp. 69–134.
Pradhan, A. Support vector machine—A survey. Int. J. Emerg. Technol. Adv. Eng. 2012, 2, 82–85.
Schulz, E.; Speekenbrink, M.; Krause, A. A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions. J. Math. Psychol. 2018, 85, 1–16.
Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. 2013, 7, 21.
Kramer, O. K-Nearest Neighbors. In Dimensionality Reduction with Unsupervised Nearest Neighbors; Springer: Berlin/Heidelberg, Germany, 2013; Volume 51, pp. 13–23.
Salehinejad, H.; Sankar, S.; Barfett, J.; Colak, E.; Valaee, S. Recent Advances in Recurrent Neural Networks. arXiv 2018, arXiv:1801.01078.
Graves, A. Long Short-Term Memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45.
Dey, R.; Salem, F.M. Gate-variants of Gated Recurrent Unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; pp. 1597–1600.
Ranstam, J.; Cook, J.A. LASSO regression. Br. J. Surg. 2018, 105, 1348.
Shumway, R.H.; Stoffer, D.S. ARIMA Models. In Time Series Analysis and Its Applications: With R Examples; Springer International Publishing: Cham, Switzerland, 2017; pp. 75–163.
Alharbi, F.R.; Csala, D. A Seasonal Autoregressive Integrated Moving Average with Exogenous Factors (SARIMAX) Forecasting Model-Based Time Series Approach. Inventions 2022, 7, 94.

Figure 1. Percentage share of each category under the criteria used to classify the examined papers and reviews.

Table 1. The literature review results from bibliometric databases.

Database	Publication Type	IAQ	IAQ Prediction	IAQ Prediction + Machine Learning	IAQ Prediction + Deep Learning
Web of Science	Articles	1121	111	30	14
Web of Science	Reviews	155	11	3	0
Scopus	Articles	1203	99	25	15
Scopus	Reviews	162	16	5	0
Google Scholar	Articles	17,400	16,600	13,400	12,300
Google Scholar	Reviews	7560	4320	1730	1590

Table 2. The literature review results from publisher databases.

Database	Publication Type	IAQ	IAQ Prediction	IAQ Pred. + ML	IAQ Pred. + DL
Springer	Articles	592	166	73	45
Springer	Reviews	76	29	21	13
Science Direct	Articles	2713	1051	462	353
Science Direct	Reviews	357	169	105	87
MDPI	Articles	261	32	7	2
MDPI	Reviews	49	5	1	0
IEEE Xplore	Conf. papers	113	15	10	1
IEEE Xplore	Journals	16	3	2	3
Taylor & Francis	Articles	299	182	23	41
Taylor & Francis	Reviews	7	3	1	4
ACM Digital Library	All types	50	34	29	25
ACM Digital Library	Reviews	0	0	0	0
Wiley Online Library	Journal papers	280	165	38	40
Wiley Online Library	Books	63	39	21	19

Table 3. The list of selected publications.

Paper Ref.	IAQ Parameters	Object Type/Building Type	Strategy	Methods/Algorithm Type	Results
[8]	PM2.5	Residential	Ahead prediction	Machine Learning (Quantile Random Forest—QRF, Gaussian Process Regression—GPR); Deep Learning (Bayesian Neural Network—BNN)	BNN: $R^{2}$ = 0.48–0.72, MAE = 7.82–16.3 µg/m³, 95% interval coverage = 88%; QRF: $R^{2}$ = 0.24–0.95, MAE = 3.09–20.0 µg/m³, interval coverage = 85–92%; GPR: $R^{2}$ = 0.03–0.69, weak coverage, wide intervals, tendency to overfit on the test set.
[9]	CO₂, VOC, PM2.5, RH, Temp.	Residential	Ahead prediction	Machine Learning (Random Forest Regressor)	Temp: $R^{2}$ = 0.94, MAE = 0.04; Humidity: $R^{2}$ = 0.92, MAE = 0.31; CO₂: $R^{2}$ = 0.96, MAE = 13.89; VOC: $R^{2}$ = 0.98, MAE = 15.38; PM2.5: $R^{2}$ = 0.84, MAE = 1.44
[10]	CO₂	Public (Classroom)	Ahead prediction	Deep Learning (Ensemble Neural Operator Transformer based on CFD-trained PDE surrogate modeling)	CO₂ prediction relative error: 10.9% (test), 5.9% (train); ventilation control reduced energy consumption by 34–56% vs. max control while maintaining CO₂ < 1200 ppm with 0% violations; 250,000× faster than CFD simulation.
[11]	Temp, RH, Dew Point	Public (Greenhouse)	Ahead prediction	Deep Learning (ANN, LSTM-RNN, LSTM-ANN, GRU, Power-LSTM—PLSTM)	PLSTM with input Hi-Di-Rs: $R^{2}$ = 0.9999, RMSE = 0.022, MAE = 0.016. Best among all models. Other DL models: $R^{2}$ = 0.918–0.998, varying by input set. GRU second-best with $R^{2}$ = 0.9997 (Hi-Di-To).
[12]	IAQ (CO₂ ppm), Primary Energy Consumption, CO₂ Emissions, PPD, VDH	Residential (Apartment)	No time horizon	Machine Learning (Random Forest, XGBoost, ANN, SVR, KNN, Linear Regression)	XGBoost: $R^{2}$ = 0.9593, RMSE = 33.12 ppm. RF: $R^{2}$ = 0.9250. Linear Regression baseline: $R^{2}$ ≈ 0.5. SHAP: airflow (P9) was top IAQ driver.
[13]	CO₂	Public (Educational–University Laboratory)	Ahead prediction	Deep Learning–Recurrent Neural Network (RNN), specifically LSTM	Training $R^{2}$ = 0.93, Test $R^{2}$ = 0.88, Global $R^{2}$ = 0.92, MSE = 75 ppm (10.6% of avg. CO₂ level 712 ppm); robust despite occupancy variability
[14]	CO₂	Public (University Lecture Theatre)	No time horizon	Hybrid models (GWO + ELM − ML)	$R^{2}$ = 0.95, RMSE = 5.17 ppm for CO₂ prediction; 95% faster than CFD for 289 scenarios
[15]	CO₂, TVOC, PM2.5, PM10	Public (Athletic/Fitness Center)	Ahead prediction	Deep Learning (LSTM–RNN architecture)	Availability (time slots with acceptable IAQ): TVOC = 92.8%, CO₂ = 89.2%, PM10 = 85.7%; combined (TVOC + CO₂ + PM10) = 82.1%
[16]	CO₂, PM2.5, Formaldehyde, RH	Residential	Ahead prediction	Mathematical/Statistical methods (ARIMA); Machine Learning (SVM); Deep Learning (LSTM, BPNN)	Best for CO₂: LSTM ($R^{2}$ = 0.97, RMSE = 1.35 ppm); SVM for PM2.5 (lowest error); Formaldehyde: all models within sensor error (±20 µg/m³); PM2.5: all models showed reduced predictability (MAPE up to 91.85% for LSTM)
[17]	CO₂	Public (Classroom)	Ahead prediction	Machine Learning (SVM, RF, LR, AdaBoost, Gradient Boosting), Deep Learning (MLP)	MLP: $R^{2}$ = 0.91 (1 h), 0.805 (24 h); RMSE = 34.3–54.78 ppm; MLP best across all horizons
[18]	CO₂, TVOC, PM10, PM2.5, NO₂, O₃	Residential	No time horizon	Mathematical/Statistical methods (mass balance equation + multizone airflow simulation)	Good agreement for CO₂, PM10, PM2.5, O₃ (≤20% error); underestimation for TVOC and NO₂ by 3–4× (source database limitations)
[19]	Bioaerosols (bacteria, fungi, pollen), PM2.5, PM10	Public (Office Shopping Mall)	Ahead prediction	Machine Learning (RF, XGBoost, Linear, Lasso); Deep Learning (MLP, LSTM, RNN)	LSTM best overall: WI ≈ 0.75–0.80 (up to 60 min ahead); PM prediction: 90% accuracy; bioaerosols: 60–80% accuracy depending on particle type
[20]	CO₂, PM2.5, TVOC, Temp, RH	Public (Commercial Terminal)	Ahead prediction	Hybrid models (CNN-LSTM + EPSO)	EPSO-CNN-LSTM consistently outperforms CNN, LSTM, RNN, GRU: $R^{2}$ up to 0.98; MAE ↓ by 9–52%; MAPE ↓ by 2–70%; stable across 4 datasets (20-fold CV)
[21]	PM10	Public (Canteen in Campus)	Ahead prediction	Machine Learning (XGBoost)	$R^{2}$ = 0.99, RMSE = 0.483, MAE = 0.284, MAPE = 3.24%, accuracy = 98.15%; model uses CO₂, VOC, RH to forecast PM10
[22]	PM10	Public (Cleanroom in factory)	Ahead prediction	Deep Learning (Multilevel RNN, LSTM)	Multilevel RNN outperformed LSTM: $R^{2}$ = 0.61 vs. 0.42 (test), RMSE = 0.36 vs. 0.51, MAE = 0.26 vs. 0.33; best time-step = 22
[23]	CO₂, PM10, PM2.5	Public (School)	Ahead prediction	Deep Learning (Integrated ANN, MIMO comparison, MOGA optimization)	RMSE: CO₂ = 0.88, PM10 = 0.46, PM2.5 = 0.66; $R^{2}$: CO₂ = 1.0, PM10 = 0.999, PM2.5 = 0.998; 90–97% of samples within ±1 unit prediction error
[24]	CO₂, PM2.5	Public (University Classrooms)	Ahead prediction	Machine Learning (MLP, XGBoost); Deep Learning (LSTM-wF)	Estimation: 95.86% accuracy (with AER); Forecasting: 94–96% accuracy; Forecast error max 16%; LSTM-wF faster than Bi-LSTM/DBU-LSTM
[25]	CO₂	Residential	Ahead prediction	Deep Learning (LSTM: single, stacked, bidirectional)	Best: Bidirectional LSTM ($R^{2}$ = 0.981, RMSE = 16.77 ppm, MAE = 8.95 ppm); steady-state CO₂ prediction error: 5.5%; step-ahead horizon = 1 min
[26]	PM2.5	Residential	Ahead prediction	Deep Learning (Autoencoder + RNN)	Best RMSE = 17.28 µg/m³, $R^{2}$ = 0.799; Median error = 8.3 µg/m³ (30 min horizon); model captures trends for indoor PM2.5 control
[27]	CO₂	Public (University Lab)	Ahead prediction	Mathematical/Statistical methods (SARIMAX)	RMSE = 26.45 ppm, Accuracy ≈ 97.36%, $R^{2}$ = 0.907 (10-fold CV); model uses exogenous context data for 3-day prediction
[28]	PM2.5	Public (US Embassy)	Ahead prediction	Hybrid models (LSTM + GRU)	RMSE: LSTM = 0.3186, LSTM-GRU = 0.2034; LSTM-GRU better at tracking temporal trends and reducing gradient error
[29]	CO₂, RH, Temp, NO₂, SO₂	Public (Commercial heritage)	Ahead prediction	Hybrid models (Multiple Output GRU + CFD)	MAE improved by 11.3% (RH), $R^{2}$ = 0.922 (CO₂); CFD + GRU reduced HVAC delay by 10 min; IAQ improved by up to 20% via predictive regulation
[30]	PM2.5	Public (Office)	Ahead prediction	Mathematical/Statistical methods (MLR, PLS, DLM, LASSO); Machine Learning (ANN); Deep Learning (LSTM)	Best: LSTM–RMSE = 1.73 µg/m³, $R^{2}$ = 0.83, IA = 0.94; ANN RMSE = 2.38 µg/m³, MLR RMSE = 3.07 µg/m³; 670 hourly observations used
[31]	Air Age (IAQ parameter)	Public (a room model with a pulsating air supply)	Ahead prediction	Machine Learning (XGBoost)	RMSE = 0.483, $R^{2}$ = 0.99, MAPE = 3.24%, Accuracy = 98.15%
[32]	PM2.5, PM10, CO (IAQ Index)	Public (University rooms)	Ahead prediction	Mathematical/Statistical methods (Discrete-Time Markov Chain–DTMC)	MAE = 4.75% (avg. prediction error); return periods for AQI states predicted with ≤6.6% error
[33]	CO₂	Public (Cubicles, Meeting Rooms)	Ahead prediction	Mathematical/Statistical methods (Ridge); Machine Learning (DT, RF); Deep Learning (MLP)	MAE: 1 min ahead ≈ 1 ppm, 5 min ≈ 4–5 ppm, 15 min ≈ 12–13 ppm; DT nearly matched RF; MLP not better
[34]	PM2.5, PM10, CO	Public (University building)	Ahead prediction	Deep Learning (RNN, LSTM–single and stacked)	LSTM used for >2 h AQ forecasting; model accuracy declines beyond 4 h horizon; no quantitative metrics provided; RNN found effective for <1 h
[35]	CO₂, TC (Thermal Comfort)	Public (Schools–Primary and Secondary)	Ahead prediction	Machine Learning (Class-weighted Random Forest)	Accuracy = 0.9718, $R^{2}$ = 0.9600 (test); robust over 20 runs (mean $R^{2}$ = 0.9584); best performance with 22 selected features including occupancy, activity, and window use
[36]	PM2.5	Public (Hospital Outpatient Hall)	Ahead prediction	Hybrid Models (SO-LSTS = Informer + AHP + Entropy Weighting)	$R^{2}$ = 0.860, RMSE = 6.258, MAE = 5.620; significantly outperformed XGBoost, KNN, SVM, BP; occupancy level improved model accuracy by 54%
[37]	CO₂, CO, NO₂, O₃, VOC, Formaldehyde, PM, Temp, RH	Public (Hospital: surgical rooms, pharmacy, women’s ward)	Ahead prediction	Machine Learning (SVM, Logistic Regression, Decision Tree, Random Forest, AdaBoost, KNN, NB, Neural Network)	Real dataset: SVM achieved 99.81% accuracy, LR 99.26%, DT 98.18%; Simulated dataset: RF 90.09%, DT 88.96%, AdaBoost 87.73%
[38]	PM2.5	Public (Subway Station–Platform Area)	Ahead prediction	Hybrid models (PLS + VAER)	PLS-VAER: $R^{2}$ = 0.722, RMSE = 0.136, MAE = 0.092; better than VAER alone ($R^{2}$ = 0.635, RMSE = 0.156); 14.71% ↓ RMSE vs. VAER
[39]	PM2.5	Public (Subway Station–City Hall, Seoul)	Ahead prediction	Hybrid Models (KPCA + mRMR + MHATCN + LightGBM)	$R^{2}$ = 0.92, RMSE = 6.01 µg/m³, MAE = 4.36, MAPE = 20.58; outperformed baseline models including LSTM and TCN
[40]	CO₂	Public (Retail Stores, Offices, Meeting Rooms)	Ahead prediction	Deep Learning (1D Convolutional Neural Network–CNN)	RMSE ≈ 40–50 ppm (1 day of data); ≈15 ppm (after 7 days); ≈10 ppm (after 30 days); trainable on edge device in 25 min
[41]	PM10	Residential (Smart Home Kitchen Area)	Ahead prediction	Machine Learning (Random Forest)	$R^{2}$ = 0.996, RMSE = 0.594, MAE = 0.337, MAPE = 3.90%, Overall accuracy = 97.72%
[42]	PM2.5	Public	Ahead prediction	Hybrid Models (FL-CNN-LSTM)	RMSE (Test): FL-CNN-LSTM = 0.0592; better than CNN-LSTM (0.0711) and LSTM (0.0624); 3-h prediction horizon
[43]	CO₂, PM2.5, VOC	Public (Child daycare center)	Ahead prediction	Deep Learning (ANN–LM, BR, BFGS algorithms)	LM: $R^{2}$ = 0.989 (CO₂), 0.983 (PM2.5), 0.977 (VOC); RMSE = 18.7–36.5; BR/BFGS lower accuracy; 5-min interval
[44]	PM2.5, PM10, TVOC	Public (Public–Substation building)	Ahead prediction	Deep Learning (BP-ANN, MLNN, LSTM), Machine Learning (Random Forest–TVOC only)	MLNN: PM2.5 and PM10 → $R^{2}$ = 0.78–0.81, NMSE = 0.46–0.49 µg/m³. RF (TVOC): Accuracy = 89.2%. MLNN-TVOC regression: only 25.8% accuracy; MLNN better for PMs.
[45]	CO₂	Public (Office-like experimental chamber)	Ahead prediction	Hybrid models (CFD + BPNN + PSO)	BPNN: MAPE = 0.42–0.95%, $R^{2}$ = 0.93–0.97; BPNN-PSO: deviation from target < 7.38%, CO₂ reduced by 20%, up to 41.2%; 23.5% faster vs. ANN-GA
[46]	CO₂, VOC	Residential (Apartment in Switzerland)	Ahead prediction	Deep Learning (FFNN, RNN–both with LM-BP algorithm)	FFNN: CO₂ CVRMSE = 10.06–18.74%, NMBE = 2.18–1.58%; VOC CVRMSE = 16.70–19.86%. RNN: CO₂ error = 3.18–5.49%, VOC error = 4.53–4.72%.
[47]	Formaldehyde (Cf)	Residential (Simulated–Fabric-covered building, six Chinese cities)	Ahead prediction	Deep Learning (LSTM)	MAPE = 9.1–24.7%, MAE = 0.18–1.93 µg/m³, RMSE = 0.26–2.29 µg/m³; LSTM outperformed RNN; accuracy robust for input uncertainty up to 20%
[48]	PM2.5	Residential (93 households—children’s bedrooms in Taiwan)	Ahead prediction	Mathematical/Statistical methods (Multiple Linear Regression—MLR)	$R^{2}$ = 0.74; RMSE = 5.41 µg/m³; cross-validation $R^{2}$ = 0.72–0.78 (avg. 0.75); model includes outdoor PM2.5, CO₂ diff, building type, floor, human activity
[49]	CO₂	Public (one-person and shared office rooms)	Ahead prediction	Mathematical/Statistical methods (Mass balance equation + ventilation dynamics)	RMSE = 38 ppm (validation); optimized ERV reduced CO₂ > 1000 ppm events; economic analysis supported ERV sizing
[50]	PM2.5, CO₂, RH, Temp	Residential (Homes of asthmatic children)	Ahead prediction	Deep Learning (RNN–GRU, DNN)	RNN: RMSE = 42.5, MAPE = 14% (avg); best model: 3 GRU layers; performance improved with 10 min IAQ granularity
[51]	VOC	Public (Food Court in Smart Building–Kian Center 2)	Ahead prediction	Machine Learning (Radial Basis Function Neural Network—RBFNN)	MAPE = 3.51% (best setting); trained on 1104 samples (138 days), tested on 24 samples (3 days)
[52]	PM2.5	Residential (Bedroom)	Ahead prediction	Hybrid Models (Fuzzy ARTMAP Neural Network)	MAE = 2.28 (avg), range 0.26–7.65 µg/m³ for 24 h ahead prediction; network trained online with 1008 samples
[53]	CO₂, PM2.5, VOC	Public (Open-space in high-rise, 35 occupants)	Ahead prediction	Deep Learning (LSTM)	CO₂: MAE = 15.4 ppm, $R^{2}$ = 0.47; PM2.5: MAE = 0.3 µg/m³, $R^{2}$ = 0.88; VOC: MAE = 20.1 IAQI, $R^{2}$ = 0.87
[54]	PM2.5	Residential (Apartment in South Korea)	Ahead prediction	Mathematical/Statistical methods (Mass balance model)	Prediction error for PM2.5 at 30 min: 1–7 µg/m³; model validated under various cooking and ventilation scenarios
[55]	PM2.5	Public (Office in Beijing)	Real-time prediction	Machine Learning (CatBoost)	AUC = 0.949, F1-score = 0.883, Precision-Recall AUC = 0.928; model outperformed MLP, GBDT, LR, DT, KNN
[56]	PM2.5, PM10, NO₂	Public (Laboratory/Office in Florida, USA)	Ahead prediction	Mathematical/Statistical methods (MLR, TSR); Deep Learning (ANN–MLP)	ANN (best model): $R^{2}$ = 0.9994 (PM2.5), 0.9995 (PM10), 0.9014 (NO₂); RMSE = 0.0816, 0.0782, 3.17 (respectively)
[57]	PM2.5	Public (Subway station—Chungmuro, Seoul)	Ahead prediction	Hybrid models (RF-CNN-GRU)	MAE = 8.61, MAPE = 0.249, RMSE = 10.56, $R^{2}$ = 0.8704–best among 7 baselines
[58]	CO₂	Public (Office 24 m² in India)	Ahead prediction	Machine Learning (GPR, SVM, DT, ANN, EL, LR)	Optimized GPR: $R^{2}$ = 0.9776, RMSE = 4.20 ppm, MAE = 3.35 ppm, NS = 0.9817, a20 = 1
[59]	CO₂, NO₂, O₃, PM1, PM2.5, PM10, HCHO, TVOC	Public (Commercial building: offices, labs, conference rooms—Berkeley, CA, USA)	Ahead prediction	Machine Learning (Random Forest, Gradient Boosting); Deep Learning (LSTM)	LSTM best: Adjusted $R^{2}$ up to 90% (e.g., PM2.5 = 87.14%, PM10 = 86.28%), MSE < 0.001 for most pollutants (1 h prediction); TVOC/HCHO much worse ($R^{2}$ < 20%)
[60]	PM2.5	Public (Subway station—Seoul)	Ahead prediction	Hybrid models (KPCA + AdaBoost-LSTM)	Hall: $R^{2}$ = 0.9007, RMSE = 10.31 µg/m³, MAPE = 38.13%; Platform: $R^{2}$ = 0.8995, RMSE = 7.03 µg/m³, MAPE = 24.58%
[61]	PM1, PM2.5, PM10	Public (Mechanically ventilated high school classrooms—USA and China)	Real-time prediction	Deep Learning (LSTM); Mathematical/Statistical methods (NARX)	NARX best for PM1/PM2.5: $R^{2}$ = 0.81–0.87, RMSE = 0.45–1.27 µg/m³, MAPE = 41–55%; PM10: all models weaker, best LSTM RMSE = 16.76 µg/m³, $R^{2}$ = 0.04
[62]	CO₂, Formaldehyde, Benzene, PM2.5	Public (Hotel and Office undergoing deep renovation—Europe)	No time horizon	Mathematical/Statistical methods (TRNSYS, IDA ICE, MATHIS-QAI, ACOUBAT, PHANIE)	PM2.5: factor of 2 deviation; CO₂: 1031–2072 ppm; Formaldehyde: 7–71 µg/m³; results within acceptable range for IEQ simulation-based TAIL rating
[63]	CO₂, CO, PM2.5, Temp, RH, Pressure	Public (Academic building—University classrooms)	Real-time prediction	Deep Learning (LSTM)	LSTM accurately predicted IAQ parameters using IoT data; no numerical metrics ($R^{2}$, RMSE) reported, but the model demonstrated effective real-time performance
[64]	RH (Relative Humidity)	Residential (Sleeping room)	Ahead prediction	Machine Learning (Decision Forest Regression—DFR, Boosted DTR, BLR, LR, NNR)	Best: DFR–RMSE = 1.314–1.466, CoD = 0.97–0.974; NNR worst (RMSE ≈ 3.3); Azure ML Studio evaluation
[65]	PM1, PM2.5, PM10, PM > 10	Public (School building)	Ahead prediction	Hybrid models (LASSO + Attention + Temporal Convolutional Network—LATCN)	PM2.5: RMSE = 10.94, MAE = 5.94, $R^{2}$ = 0.912; PM10: RMSE = 13.75, MAE = 7.11, $R^{2}$ = 0.898; LATCN outperforms LSTM, GRU, RNN, ATCN, LTCN
[66]	PM10	Public (Daycare centers, kindergartens, elementary schools—Korea)	Real-time prediction	Mathematical/Statistical methods (Multiple Linear Regression—MLR)	$R^{2}$ = 0.64 (daycare), 0.45 (kindergartens), 0.43 (elementary); RMSE: 26.7, 18.9, 19.9 µg/m³ (10 min interval)
[67]	PM10	Public (Cafeteria—India)	Ahead prediction	Hybrid models (Fuzzy Inference System—FIS + PSO, FIS + GA)	FIS-GA: RMSE = 0.998; FIS-PSO: RMSE = 1.0746; FIS alone: RMSE = 2.0894 (all on normalized data)
[68]	PM2.5	Residential (Urban housing across multiple Chinese cities)	Ahead prediction	Deep Learning (Spatiotemporal Neural Network)	MAE = 6.19 µg/m³, RMSE = 8.40 µg/m³, $R^{2}$ = 0.74; consistent accuracy across 330 cities; validated on independent dataset
[69]	PM2.5, PM10	Public (University office room)	Ahead prediction	Mathematical/Statistical methods (Dynamic mass-balance + Least Squares Optimization)	PM2.5 prediction: r = 0.883, NMSE = 0.085; PM10 prediction: r = 0.882, NMSE = 0.083; AP effectiveness: PM2.5 = 86.4%, PM10 = 86%
[70]	PM2.5, PM10	Public (Subway platform—Korea)	Ahead prediction	Deep Learning (Hybrid CNN-LSTM-DNN)	PM10: RMSE = 8.94, MAE = 6.44, $R^{2}$ = 0.55; PM2.5: RMSE = 10.1, MAE = 6.81, $R^{2}$ = 0.35
[71]	AER (based on PM1.0, PM2.5, PM10)	Public (Office in Suzhou, China)	Real-time prediction	Mathematical/Statistical methods (mass balance equations + empirical correlations)	PM1.0: NME = 2.3–18.3%, r = 0.87–0.99; PM2.5: NME = 2.4–38.2%, r = 0.94–0.99; PM10: NME > 30% (less accurate)
[72]	PM2.5	Residential (Korea—two homes)	Real-time prediction	Machine Learning (Multiple Linear Regression—MLR)	$R^{2}$ = 0.25 (global model); the best hour model (H4): $R^{2}$ = 0.34, RMSE = 3.34 µg/m³, MAE = 2.55 µg/m³
[73]	IAQ Index (VOC, CO, NOx), Temp, RH	Public (Fire Department—Spain)	Ahead prediction	Hybrid models (Statistical + Machine Learning Tree Partition + SSA preprocessing)	Without SSA: FIT ≈ 82.2%, MSE = 1.516; with SSA: FIT = 99.12%, MSE = 0.0035, calculation time ↓ 51%
[74]	CO₂	Public (University classroom—Italy)	Ahead prediction	Machine Learning (Regression, k-NN), Deep Learning (LSTM)	LSTM: MAPE = 18%, RMSE = 253 ppm, $R^{2}$ = 0.79; KNN: MAPE = 22%, RMSE = 290 ppm, $R^{2}$ = 0.71; Regression: MAPE = 24%, RMSE = 347 ppm, $R^{2}$ = 0.69
[75]	PM2.5, CO₂, CO	Residential (District 12, Ho Chi Minh City)	Ahead prediction	Mathematical/Statistical methods (ARIMA), Deep Learning (LSTM), Machine Learning (FFNN)	Best: ARIMA for CO₂ ($R^{2}$ = 0.963, RMSE = 48.8) and PM2.5 (RMSE = 0.0043); FFNN weakest across all metrics
[76]	VOC (TVOC)	Public (Office and Meeting Room—CNR Pisa, Italy)	Ahead prediction	Mathematical/Statistical methods (Box-model with mass balance + CFD via CONTAM)	Box-model: TVOC = 0.48–1.65 µg/m³ (Low VOC case), <10% of IAGV (600 µg/m³); CFD: VOC peaks = 50–100 µg/m³ near emission zones
[77]	CO₂	Residential (Apartment—Belfast, UK)	Ahead prediction	Machine Learning (SGD regressor)	MSE = 0.6 vs. baseline MSE = 1.049; predicted trends match actual (e.g., peak eCO₂ in evenings in bedroom)
[78]	PM2.5, CO, NH₃, AQI	Residential (Indoor testbed—Bangladesh)	Ahead prediction	Mathematical/Statistical methods (ARIMA)	MAPE: Temp = 2.82%, Humidity = 4.70%, PM2.5 = 6.92%, CO = 10.12%, NH₃ = 10.3%, AQI = 5.8% (≈90–97% accuracy)
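The Results column of Table 3 mixes several accuracy metrics ($R^{2}$, RMSE, MAE, MAPE, CV(RMSE), NMBE), and reported values are only comparable when their definitions match. As a reference point, the following Python sketch uses common textbook definitions. Individual studies may use variants (for example adjusted $R^{2}$, an NMBE denominator of (n − p)·ȳ as in some building-energy calibration guidelines, or the opposite sign convention), so this is not a reconstruction of any reviewed paper's evaluation code. Note also that MSE is expressed in squared units (e.g., ppm²), whereas RMSE and MAE share the units of the target.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = _mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (e.g. ppm)."""
    return math.sqrt(_mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return _mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def mape(y_true, y_pred):
    """Mean absolute percentage error in %; undefined if any observed value is 0."""
    return 100.0 * _mean([abs((t - p) / t) for t, p in zip(y_true, y_pred)])


def cv_rmse(y_true, y_pred):
    """Coefficient of variation of the RMSE: RMSE divided by the observed mean, in %."""
    return 100.0 * rmse(y_true, y_pred) / _mean(y_true)


def nmbe(y_true, y_pred):
    """Normalized mean bias error in %, defined here as sum(obs - pred) / sum(obs).

    Under this sign convention, positive values indicate under-prediction.
    """
    return 100.0 * sum(t - p for t, p in zip(y_true, y_pred)) / sum(y_true)
```

For example, with hypothetical CO₂ observations of 400, 600, 800 and 1000 ppm and predictions of 420, 580, 820 and 980 ppm, RMSE and MAE are both 20 ppm and $R^{2}$ = 0.992, while NMBE is 0% because the over- and under-predictions cancel. This illustrates why a near-zero bias metric alone does not indicate an accurate model.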

Table 4. The descriptions of selected methods.

Method	Type	Description
Random Forest	Machine Learning	Random Forest is an ensemble algorithm that constructs multiple decision trees using randomly selected subsets of data and features, aggregating their outputs through majority voting or averaging. This approach effectively reduces overfitting by decreasing correlation among individual trees, resulting in improved accuracy and robustness in predictive tasks [79].
Linear Regression	Machine Learning	Linear regression models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. The coefficients in this equation represent the intercept and slopes, quantifying the expected change in the dependent variable for a one-unit change in each predictor, estimated by minimizing the sum of squared residuals. This method assumes a linear, additive relationship and provides interpretable parameters useful for quantifying associations and making continuous predictions [80].
Support Vector Machine	Machine Learning	Support Vector Machine constructs an optimal hyperplane that maximizes the margin between distinct classes in a high-dimensional space, thus enhancing classification accuracy. It handles both linearly separable and non-linear data through kernel functions that map input data into higher-dimensional spaces where linear separation is feasible. The method explicitly controls the trade-off between classifier complexity and misclassification, providing robust generalization capabilities [81].
Gaussian Process Regression	Machine Learning	Gaussian Process Regression (GPR) is a non-parametric Bayesian method for modeling unknown functions by defining a distribution over possible functions consistent with observed data. It uses a kernel (covariance) function to encode prior assumptions about the function’s smoothness and structure, enabling both interpolation and uncertainty quantification through posterior predictions derived from Gaussian process theory. Hyperparameters are optimized by maximizing the marginal likelihood, which implicitly balances model complexity against data fit, and predictions are conditioned directly on the observed data [82].
Gradient Boosting	Machine Learning	Gradient Boosting Machines (GBM) are an ensemble learning technique that sequentially fits weak base-learners—typically decision trees—to the negative gradient of a specified loss function, iteratively improving the model’s predictive accuracy. By combining the outputs of these base-learners, GBM constructs a strong predictive model that is highly customizable through the choice of loss functions and base-learner types, making it effective for both regression and classification tasks. The algorithm inherently balances bias and variance by controlling overfitting via regularization techniques such as shrinkage and early stopping [83].
k-Nearest Neighbors	Machine Learning	k-Nearest Neighbors (k-NN) is a non-parametric, instance-based method that has no explicit training phase beyond storing the labeled data. To predict a new sample, it identifies the k training samples closest to it under a chosen distance metric (commonly Euclidean) and returns the majority class among them for classification, or their (optionally distance-weighted) average for regression. Its accuracy depends on the choice of k, the distance metric, and feature scaling, and its prediction cost grows with the size of the training set [84].
Artificial Neural Networks	Deep Learning	Artificial Neural Networks (ANNs) are computational models inspired by the structure and function of biological neural networks, designed to simulate the way the human brain processes information. ANNs consist of interconnected nodes (neurons) organized in layers—input, hidden, and output—which collectively learn to recognize patterns, classify data, and make predictions through adaptive weight adjustments during training. Their ability to model complex, nonlinear relationships makes them highly effective in diverse applications, including pattern recognition, machine learning, and data analysis [85].
Recurrent Neural Networks	Deep Learning	Recurrent Neural Networks (RNNs) are a class of artificial neural networks designed to process sequential data by maintaining an internal state that captures information from previous inputs. Unlike feedforward networks, RNNs include feedback loops, allowing them to model temporal dependencies and exhibit dynamic temporal behavior. This architecture enables RNNs to handle tasks such as time-series prediction, speech recognition, and natural language processing, where context and sequence order are critical. However, training RNNs can be challenging due to issues like vanishing or exploding gradients, which limit their ability to learn long-term dependencies effectively [85].
Long Short-Term Memory	Deep Learning	Long Short-Term Memory (LSTM) is a recurrent neural network architecture designed to mitigate the vanishing gradient problem, allowing it to learn long-term dependencies in sequential data. By using memory cells whose contents are regulated by input, forget, and output gates, LSTM networks retain and access information over extended sequences, making them well suited to tasks such as speech recognition, time-series analysis, and natural language processing [86].
Gated Recurrent Units	Deep Learning	Gated Recurrent Units (GRUs) are a streamlined variant of the Long Short-Term Memory (LSTM) architecture for modeling sequential data with recurrent neural networks (RNNs). GRUs employ two gating mechanisms, the update gate and the reset gate, to regulate the flow of information between hidden states, which improves the handling of long-term dependencies compared to traditional RNNs. Because they have fewer parameters than LSTMs and no separate cell state, GRUs typically train faster and use less memory while often achieving comparable performance, making them a widely adopted choice for tasks such as speech recognition, natural language processing, and time-series analysis [87].
Lasso	Statistical/Mathematical	Least Absolute Shrinkage and Selection Operator (LASSO) is a penalized regression method that reduces overfitting by shrinking regression coefficients toward zero through an L1 penalty; because some coefficients are set exactly to zero, it also performs variable selection. The tuning parameter $\lambda$, typically selected via cross-validation, controls the degree of shrinkage. While LASSO can improve predictive accuracy on high-dimensional data, the shrinkage introduces bias in the coefficient estimates, which limits their interpretability [88].
ARIMA	Statistical/Mathematical	Autoregressive Integrated Moving Average (ARIMA) models are versatile tools for analyzing and forecasting time series data that combine three components: autoregression (AR), differencing (I), and moving average (MA). The AR component captures the dependence of current values on past observations, the differencing component stabilizes the mean by removing trends (seasonal differencing, used in the seasonal extension SARIMA, addresses seasonality), and the MA component models the current value as a linear combination of past error terms. Thanks to the differencing step, ARIMA models can handle nonstationary series that become stationary after differencing, which makes them widely applicable in fields such as economics, finance, and environmental science. Their flexibility allows them to adapt to diverse temporal patterns, including trends, cycles, and stochastic fluctuations [89].
SARIMAX	Statistical/Mathematical	Seasonal Autoregressive Integrated Moving Average with Exogenous Factors (SARIMAX) extends the SARIMA framework by incorporating exogenous variables. It combines seasonal and non-seasonal autoregressive (AR), differencing (I), and moving average (MA) components with external regressors (X), enabling it to capture temporal dependencies, seasonal patterns, and the influence of external variables on the target series. The model is therefore well suited to datasets with both seasonal and non-seasonal structure that are also affected by external factors such as weather conditions, economic indicators, or policy changes, and it is widely used in energy forecasting, economics, and environmental studies. Handling multiple input variables and seasonal effects simultaneously addresses limitations of simpler models and can yield more reliable forecasts, provided the exogenous inputs are informative and their future values are known or can themselves be forecast [90].
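
To make the boosting loop described for Gradient Boosting concrete, the following is a minimal sketch, assuming squared-error loss (whose negative gradient is simply the residual) and single-split decision stumps on one feature. The names `fit_stump` and `gradient_boost` and the toy data are hypothetical; production libraries such as XGBoost or LightGBM use deeper trees, subsampling, and additional regularization.

```python
# Minimal gradient boosting for 1-D regression with squared-error loss.
# Hypothetical sketch: each weak learner is a single-split decision stump.


def fit_stump(xs, residuals):
    """Return a stump x -> value that minimizes squared error on the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds between distinct x values
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # all x identical: fall back to a constant learner
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit an additive model F(x) = mean(y) + learning_rate * sum_m h_m(x)."""
    base = sum(ys) / len(ys)  # F_0: constant prediction
    stumps = []

    def predict(x):
        return base + learning_rate * sum(h(x) for h in stumps)

    for _ in range(n_rounds):
        # For L = 0.5 * (y - F)^2 the negative gradient w.r.t. F is the residual y - F.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


# Toy, made-up data: hour index vs. a CO2-like reading (ppm).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [400, 420, 450, 600, 650, 700, 710, 720]
model = gradient_boost(xs, ys)
print(round(model(2), 1), round(model(7), 1))
```

Here `learning_rate` plays the role of the shrinkage mentioned in the table; early stopping would correspond to choosing `n_rounds` by monitoring error on a held-out validation set.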
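
For the k-Nearest Neighbors method, a prediction for a new sample is formed from the targets of its k closest training samples. A minimal regression sketch follows, assuming Euclidean distance and unweighted averaging; the function name `knn_regress` and the toy occupancy/ventilation/CO₂ data are hypothetical.

```python
import math


def knn_regress(train_X, train_y, query, k=3):
    """Predict the mean target of the k training samples nearest to `query`."""
    # Pair each training target with its Euclidean distance to the query point.
    scored = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest = scored[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy, made-up data: (occupants, window_open) -> CO2 in ppm.
train_X = [(2, 1), (3, 1), (5, 1), (10, 0), (12, 0)]
train_y = [450, 480, 600, 1100, 1250]
print(knn_regress(train_X, train_y, (11, 0), k=2))
```

Because the distance mixes features on different scales (here a head count and a binary flag), features are usually standardized before applying k-NN; weighting neighbours by inverse distance is a common variant.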
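
The layered structure and adaptive weight adjustment described for ANNs can be illustrated with a minimal sketch: a network with one tanh hidden layer, fitted to a toy nonlinear curve by backpropagation and per-sample gradient descent. The function name `train_mlp`, the layer size, the learning rate, and the data are illustrative assumptions; practical IAQ models would normally be built with a framework such as PyTorch or Keras and use many input features.

```python
import math
import random


def train_mlp(data, hidden=8, epochs=500, lr=0.05, seed=0):
    """Train a 1-input, 1-output MLP with one tanh hidden layer by per-sample gradient descent."""
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                                # hidden biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                           # output bias

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    def mse():
        return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)

    initial = mse()
    for _ in range(epochs):
        for x, y in data:
            h, out = forward(x)
            err = out - y  # dL/d(out) for L = 0.5 * (out - y)^2
            for j in range(hidden):
                grad_h = err * w2[j] * (1 - h[j] ** 2)  # backpropagate through tanh
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_h * x
                b1[j] -= lr * grad_h
            b2 -= lr * err
    return (lambda x: forward(x)[1]), initial, mse()


data = [(x / 10, (x / 10) ** 2) for x in range(-10, 11)]  # toy target y = x^2
predict, loss_before, loss_after = train_mlp(data)
print(f"MSE before: {loss_before:.4f}, after: {loss_after:.4f}")
```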
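
The vanishing-gradient limitation noted for RNNs follows from the chain rule: the gradient of a late hidden state with respect to an early one is a product of per-step Jacobians. A minimal sketch with a scalar hidden state (a deliberate simplification; real RNNs use vector states and Jacobian matrices) shows this product shrinking geometrically when every factor is below 1 in magnitude. The function name and parameter values are illustrative.

```python
import math


def rnn_state_gradient(w, u, inputs, h0=0.0):
    """Run a scalar RNN h_t = tanh(w*h_{t-1} + u*x_t) and return dh_T/dh_0."""
    h, grad = h0, 1.0
    for x in inputs:
        h = math.tanh(w * h + u * x)
        grad *= w * (1.0 - h * h)  # chain rule: dh_t/dh_{t-1} = w * (1 - h_t^2)
    return grad


inputs = [0.5] * 50
for steps in (1, 10, 50):
    print(steps, rnn_state_gradient(0.9, 1.0, inputs[:steps]))
```

Because every factor here is at most 0.9 in magnitude, the influence of the initial state decays at least as fast as 0.9^T; factors larger than 1 in magnitude would instead make the product grow, which is the exploding-gradient case. LSTM and GRU gating mitigates this by providing additive paths along which information and gradients can flow.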
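
The gating mechanism described for LSTM can be sketched with a single-unit cell using the standard forget/input/output gate equations. This is a minimal illustration, assuming scalar input and a hidden size of 1; the parameter dictionaries and the helper `cell_after` are hypothetical. It shows that a forget gate near 1 lets the cell state persist across many steps, while a forget gate near 0 erases it.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, params):
    """One step of a single-unit LSTM; params maps gate -> (w_x, w_h, bias)."""
    def pre(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h + b

    f = sigmoid(pre("forget"))       # fraction of the old cell state to keep
    i = sigmoid(pre("input"))        # fraction of the candidate to write
    o = sigmoid(pre("output"))       # fraction of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate cell content
    c_new = f * c + i * g            # additive memory update
    h_new = o * math.tanh(c_new)     # new hidden state / output
    return h_new, c_new


def cell_after(params, steps=20, c0=1.0, x=0.5):
    """Feed a constant input for `steps` steps starting from cell state c0."""
    h, c = 0.0, c0
    for _ in range(steps):
        h, c = lstm_step(x, h, c, params)
    return c


remember = {"forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0),
            "output": (0.0, 0.0, 0.0), "candidate": (1.0, 1.0, 0.0)}
forget = dict(remember, forget=(0.0, 0.0, -10.0))
print(cell_after(remember), cell_after(forget))
```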
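
The update and reset gates described for GRUs can be sketched in the same single-unit style. This minimal illustration assumes scalar input and hidden state and follows the convention of Cho et al. (2014), in which the update gate weights the previous state (some texts swap z and 1 - z); the parameter dictionaries and the helper `state_after` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x, h, params):
    """One step of a single-unit GRU; params maps gate -> (w_x, w_h, bias)."""
    wz, uz, bz = params["update"]
    wr, ur, br = params["reset"]
    wh, uh, bh = params["candidate"]
    z = sigmoid(wz * x + uz * h + bz)  # update gate
    r = sigmoid(wr * x + ur * h + br)  # reset gate: how much past state feeds the candidate
    h_tilde = math.tanh(wh * x + uh * (r * h) + bh)  # candidate state
    # z near 1 keeps the previous state; z near 0 replaces it with the candidate.
    return z * h + (1.0 - z) * h_tilde


def state_after(params, steps=20, h0=0.8, x=0.5):
    """Feed a constant input for `steps` steps starting from hidden state h0."""
    h = h0
    for _ in range(steps):
        h = gru_step(x, h, params)
    return h


keep = {"update": (0.0, 0.0, 10.0), "reset": (0.0, 0.0, 0.0),
        "candidate": (1.0, 0.0, 0.0)}
overwrite = dict(keep, update=(0.0, 0.0, -10.0))
print(state_after(keep), state_after(overwrite))
```

The GRU needs three weight sets (update, reset, candidate) versus the LSTM's four and carries no separate cell state, which is the source of the parameter savings mentioned in the table.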
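
The L1 shrinkage described for LASSO can be sketched with cyclic coordinate descent, where each coefficient update is a soft-thresholding step. The sketch assumes centred data with no intercept and the objective $\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1$ (the same scaling used by scikit-learn's `Lasso`, where the penalty weight is called `alpha`); the function names and toy data are hypothetical.

```python
def soft_threshold(a, lam):
    """S(a, lam) = sign(a) * max(|a| - lam, 0), the proximal step of the L1 penalty."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes centred y and centred columns of X (no intercept term).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b


# Toy, made-up, centred data: feature 0 drives y, feature 1 is orthogonal noise.
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-6, -3, 0, 3, 6]  # y = 3 * feature 0
for lam in (0.0, 1.0, 10.0):
    print(lam, lasso_cd(X, y, lam))
```

With this orthogonal design, the coefficient of the informative feature shrinks from the least-squares value 3.0 (at λ = 0) to 2.5 (at λ = 1), the noise feature stays exactly at zero, and a large enough λ sets every coefficient to zero; in practice λ would be chosen by cross-validation.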
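
To make the interplay of the AR and I components concrete, here is a minimal sketch of an ARIMA(1,1,0) model without a constant: the series is differenced once, an AR(1) coefficient is estimated on the differences by conditional least squares, and forecasts are integrated back to levels. The MA part is omitted (q = 0), and the estimator is a simplification of the maximum-likelihood fitting used by standard tools; the function names and the data are hypothetical.

```python
def fit_arima_110(series):
    """Fit ARIMA(1,1,0) without constant by conditional least squares; return phi."""
    d = [b - a for a, b in zip(series, series[1:])]  # I: first differences
    num = sum(d[t] * d[t - 1] for t in range(1, len(d)))
    den = sum(d[t - 1] ** 2 for t in range(1, len(d)))
    return num / den  # AR(1) coefficient on the differenced series


def forecast_arima_110(series, phi, steps):
    """Iterate the AR(1) recursion on differences and integrate back to levels."""
    level = series[-1]
    diff = series[-1] - series[-2]
    out = []
    for _ in range(steps):
        diff = phi * diff  # expected next difference (future shocks have mean zero)
        level += diff      # undo the differencing
        out.append(level)
    return out


# Toy, made-up series whose first differences halve each step (an exact AR(1), phi = 0.5).
series = [400, 408, 412, 414, 415, 415.5, 415.75]
phi = fit_arima_110(series)
forecast = forecast_arima_110(series, phi, steps=2)
print(phi, forecast)
```

Full (p, d, q) orders, seasonal terms, and exogenous regressors (the SARIMAX case) are available in statsmodels through `statsmodels.tsa.arima.model.ARIMA` and `statsmodels.tsa.statespace.sarimax.SARIMAX`; note that SARIMAX forecasts require values of the exogenous inputs over the forecast horizon.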

Table 5. SWOT analysis of AI/ML-based approaches for IAQ prediction.

SWOT Element	Description
Strengths	High prediction accuracy, especially with LSTM, GRU, CNN-LSTM, and hybrid models. Ability to learn from real-time data streams. Scalability and adaptability to various building environments. Integration with IoT, BMS, BIM, Digital Twin, and HVAC systems. Ability to detect nonlinear relationships and perform multi-source data fusion.
Weaknesses	Low data quality from low-cost sensors (noise, drift, data loss). Frequent focus on single parameters (e.g., CO₂, PM2.5) rather than comprehensive IAQ modeling. Lack of interpretability in deep learning models. Absence of standardized datasets and evaluation metrics. Models often tested in isolation, which limits comparability.
Opportunities	Development of hybrid models combining AI with physics-based approaches (e.g., CFD, mass balance). Integration with HVAC control and Digital Twin for automated response to forecasts. Incorporation of contextual factors (occupancy, user activity, ventilation). Lightweight models for edge deployment. Use of federated learning and interpretability tools (e.g., SHAP, BNN).
Threats	Limited generalizability of models to other building types and climates. High computational complexity of DL models—deployment barrier. Lack of integration with real-world building automation systems. Risk of oversimplifying physical processes or misinterpreting results. Absence of widely available reference datasets for training and benchmarking.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Latoń, D.; Grela, J.; Ożadowicz, A.; Wisniewski, L. Artificial Intelligence and Machine Learning Approaches for Indoor Air Quality Prediction: A Comprehensive Review of Methods and Applications. Energies 2025, 18, 5194. https://doi.org/10.3390/en18195194


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

4.1.1. CO₂ Concentration Prediction