Artificial Intelligence and Machine Learning Approaches for Indoor Air Quality Prediction: A Comprehensive Review of Methods and Applications
Abstract
1. Introduction
- Sensor data—measurements of environmental variables obtained with technologies such as metal oxide semiconductor (MOS) sensors, nondispersive infrared (NDIR) sensors, and optical particle counters. Despite their increasing prevalence, low-cost sensors frequently exhibit drift, noise, and reduced accuracy;
- Weather data—external temperature, humidity, wind speed, and outdoor pollutant concentrations, which contextualize indoor air dynamics and support forecasting;
- Building system data—heating, ventilation, and air conditioning (HVAC) operational parameters, ventilation rates, and control schedules, frequently collected through Building Management Systems (BMSs);
- Occupancy data—derived from passive infrared (PIR) sensors, cameras, WiFi or Bluetooth tracking, or building calendars and schedules. These data are key indicators of internal pollutant generation, particularly CO2.
- Missing data caused by sensor failures or transmission errors;
- Sensor noise and long-term drift, particularly in low-cost devices;
- Data discontinuities resulting from irregular sampling;
- Lack of standardization across data sources (units, formats, sampling frequency);
- Synchronization difficulties in multi-source systems, for example when aligning sensor data with HVAC and occupancy logs.
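The resampling and synchronization issues above can be sketched in a few lines. The following minimal, illustrative Python helper (the `regrid` function and the sample values are hypothetical, not taken from any reviewed study) aligns irregular sensor readings onto a fixed time grid by linear interpolation, leaving `None` where no surrounding observations exist:

```python
from datetime import datetime, timedelta

def regrid(samples, grid):
    """Resample irregular (timestamp, value) pairs onto a fixed time grid.

    Values inside the observed range are linearly interpolated; grid points
    before the first or after the last observation are left as None.
    """
    samples = sorted(samples)
    out = []
    j = 0
    for t in grid:
        # Advance j so samples[j] is the last observation at or before t.
        while j + 1 < len(samples) and samples[j + 1][0] <= t:
            j += 1
        ta, va = samples[j]
        if ta > t or (ta < t and j + 1 >= len(samples)):
            out.append(None)                      # outside observed range
        elif ta == t:
            out.append(va)                        # exact hit
        else:
            tb, vb = samples[j + 1]               # bracketing observation
            w = (t - ta).total_seconds() / (tb - ta).total_seconds()
            out.append(va + w * (vb - va))        # linear interpolation
    return out
```

Once every stream (sensors, HVAC logs, occupancy) is regridded to the same step, row-wise fusion becomes trivial; in practice a library such as pandas would replace this hand-rolled helper.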
- The detection of complex nonlinear relationships among IAQ parameters;
- Short- and mid-term forecasting of pollutant concentrations;
- The fusion of multi-source data (sensors, weather, occupancy);
- Decision support in HVAC/BMS systems.
- A comprehensive taxonomy of IAQ prediction approaches, categorized by predicted variables, building typologies, model types, and data fusion strategies;
- The identification of data quality issues and their implications for model performance;
- The exploration of application scenarios across a wide range of settings, including office buildings, educational institutions, and isolated or resource-limited environments;
- The identification of open challenges and research gaps, with a particular focus on model generalization, personalization, and integration into building automation frameworks.
2. Methodology of the Review
3. State of the Art-Related Research
3.1. Modeling Approaches
- R² (coefficient of determination). Indicates the proportion of variability in the target variable explained by the model. Values approaching 1.0 indicate an excellent fit (e.g., for certain LSTM models predicting CO2), whereas values below 0.5 signify inadequate predictive capacity. In the reviewed studies, R² ranges from as low as 0.03 (a weak Gaussian Process Regression (GPR) model) to nearly 1.0 (optimized DL models with feature selection);
- MAE (Mean Absolute Error) and RMSE (Root Mean Square Error). These measure the absolute prediction error in physical units (e.g., ppm for CO2 or µg/m3 for PM2.5). RMSE is more sensitive to outliers because of the squaring operation. The best-performing models, typically hybrid or deep architectures, achieve the lowest MAE/RMSE values (e.g., <5 ppm for CO2 or <2 µg/m3 for PM2.5);
- MAPE (Mean Absolute Percentage Error). Expresses the prediction error as a percentage, facilitating comparison across different scales. MAPE values below 10% are widely regarded as indicating highly accurate models; however, models for Total Volatile Organic Compounds (TVOC) or bioaerosols frequently exceed this range because of elevated uncertainty and sensor limitations;
- AUC (Area Under the Curve) and the F1-score. These appear in classification-based IAQ assessments (e.g., forecasting acceptable vs. unacceptable air quality states). High AUC values (>0.9) and favorable F1-scores (>0.85) indicate strong discriminatory capability;
- Interval-based metrics, such as 95% prediction interval coverage or confidence interval coverage, are used in probabilistic models (e.g., Bayesian Neural Networks, Quantile Random Forest—QRF). Models with high interval coverage (≥85%) provide not only accurate point predictions but also reliable uncertainty estimates.
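As a concrete reference for the metric definitions above, the following Python sketch computes R², MAE, RMSE, MAPE, and prediction interval coverage from paired observation/prediction lists (function names and the toy CO2 values are illustrative, not taken from any reviewed study):

```python
import math

def iaq_metrics(y_true, y_pred):
    """Point-prediction metrics for an IAQ model: R², MAE, RMSE, MAPE (%)."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "R2": 1 - ss_res / ss_tot,                                  # explained variance
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,  # physical units
        "RMSE": math.sqrt(ss_res / n),                               # outlier-sensitive
        "MAPE": 100 / n * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)),
    }

def interval_coverage(y_true, lower, upper):
    """Fraction of observations falling inside the prediction interval."""
    hits = sum(l <= t <= u for t, l, u in zip(y_true, lower, upper))
    return hits / len(y_true)
```

For example, for observations [400, 500, 600] ppm and predictions [410, 490, 620] ppm, the absolute errors are 10, 10, and 20 ppm, so MAE ≈ 13.3 ppm, RMSE = √200 ≈ 14.1 ppm, and R² = 1 − 600/20000 = 0.97.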
3.2. Observations and Trends
4. Classification of Studies on the IAQ Prediction
4.1. Classification by Predicted IAQ Parameters
4.1.1. CO2 Concentration Prediction
4.1.2. PM2.5 Concentration Prediction
4.1.3. PM10 Concentration Prediction
4.1.4. Prediction of Other Parameters (Formaldehyde, VOC, RH)
4.1.5. Multiple Parameter Prediction (Different Combinations of Parameters/Factors)
4.2. Classification by Type of Facilities
4.2.1. Residential Facilities
4.2.2. Non-Residential Facilities
4.3. Classification by Prediction Strategy
4.3.1. Forward Prediction of IAQ
4.3.2. IAQ Estimation During Building Design (No Time Horizon)
4.3.3. Inference of IAQ from Indirect Parameters (Sensor Fusion)
4.4. Classification by Applied Methods
4.4.1. Mathematical and Statistical Methods
4.4.2. Machine Learning
4.4.3. Deep Learning
4.4.4. Hybrid Models
5. Gaps, Barriers, and Opportunities with Future Perspective
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhao, B.; Shi, S.; Ji, J.S. The WHO Air Quality Guidelines 2021 promote great challenge for indoor air. Sci. Total Environ. 2022, 827, 154376. [Google Scholar] [CrossRef] [PubMed]
- Fisk, W.J. The ventilation problem in schools: Literature review. Indoor Air 2017, 27, 1039–1051. [Google Scholar] [CrossRef]
- Felgueiras, F.; Mourão, Z.; Moreira, A.; Gabriel, M.F. Indoor environmental quality in offices and risk of health and productivity complaints at work: A literature review. J. Hazard. Mater. Adv. 2023, 10, 100314. [Google Scholar] [CrossRef]
- Sun, S.; Zheng, X.; Villalba-Díez, J.; Ordieres-Meré, J. Indoor Air-Quality Data-Monitoring System: Long-Term Monitoring Benefits. Sensors 2019, 19, 4157. [Google Scholar] [CrossRef]
- Paterson, C.; Sharpe, R.; Taylor, T.; Morrissey, K. Indoor PM2.5, VOCs and asthma outcomes: A systematic review in adults and their home environments. Environ. Res. 2021, 202, 111631. [Google Scholar] [CrossRef]
- Xu, Q.; Goh, H.C.; Mousavi, E.; Rafsanjani, H.N.; Varghese, Z.; Pandit, Y.; Ghahramani, A. Towards Personalization of Indoor Air Quality: Review of Sensing Requirements and Field Deployments. Sensors 2022, 22, 3444. [Google Scholar] [CrossRef]
- Chojer, H.; Branco, P.; Martins, F.; Alvim-Ferraz, M.; Sousa, S. Development of low-cost indoor air quality monitoring devices: Recent advancements. Sci. Total Environ. 2020, 727, 138385. [Google Scholar] [CrossRef]
- Dai, H.; Wu, N.; Dong, Z.; Ren, J.; Gao, Y.; Zhao, B. Comparison and evaluation of machine learning models for predicting indoor PM2.5 concentrations on a large spatiotemporal scale. Build. Simul. 2025, 18, 1453–1466. [Google Scholar] [CrossRef]
- Jha, S. Enhancing Indoor Air Quality Through Smart Home Automation System. Master’s Thesis, Arcada University of Applied Sciences, Helsinki, Finland, 2025. [Google Scholar]
- Bian, Y.; Shi, Y. Data-driven operator learning for energy-efficient building control. arXiv 2025, arXiv:2504.21243. [Google Scholar]
- Oussous, S.A.; Lail, D.M.; Bouayadi, R.E.; Amine, A. Deep Learning Innovations for Greenhouse Climate Prediction: Insights From a Spanish Case Study. IEEE Access 2025, 13, 64810–64821. [Google Scholar] [CrossRef]
- Dehghan, F.; Porras-Amores, C.; Khanmohammadi, L.; Labib, R. Evaluating Machine Learning Models for Sustainable Building Design: Energy, Emissions, and Comfort Metrics. Build. Environ. 2025, 285, 113582. [Google Scholar] [CrossRef]
- Tagliabue, L.C.; Cecconi, F.R.; Rinaldi, S.; Ciribini, A.L.C. Data driven indoor air quality prediction in educational facilities based on IoT network. Energy Build. 2021, 236, 110782. [Google Scholar] [CrossRef]
- Hou, F.; Ma, J.; Kwok, H.H.; Cheng, J.C. Prediction and optimization of thermal comfort, IAQ and energy consumption of typical air-conditioned rooms based on a hybrid prediction model. Build. Environ. 2022, 225, 109576. [Google Scholar] [CrossRef]
- Karaiskos, P.; Munian, Y.; Martinez-Molina, A.; Alamaniotis, M. Indoor air quality prediction modeling for a naturally ventilated fitness building using RNN-LSTM artificial neural networks. Smart Sustain. Built Environ. 2024; ahead-of-print. [Google Scholar] [CrossRef]
- Yao, H.; Shen, X.; Wu, W.; Lv, Y.; Vishnupriya, V.; Zhang, H.; Long, Z. Assessing and predicting indoor environmental quality in 13 naturally ventilated urban residential dwellings. Build. Environ. 2024, 253, 111347. [Google Scholar] [CrossRef]
- Taheri, S.; Razban, A. Learning-based CO2 concentration prediction: Application to indoor air quality control using demand-controlled ventilation. Build. Environ. 2021, 205, 108164. [Google Scholar] [CrossRef]
- Yang, S.; Mahecha, S.D.; Moreno, S.A.; Licina, D. Integration of Indoor Air Quality Prediction into Healthy Building Design. Sustainability 2022, 14, 7890. [Google Scholar] [CrossRef]
- Lee, J.Y.; Miao, Y.; Chau, R.L.; Hernandez, M.; Lee, P.K. Artificial intelligence-based prediction of indoor bioaerosol concentrations from indoor air quality sensor data. Environ. Int. 2023, 174, 107900. [Google Scholar] [CrossRef] [PubMed]
- Nguyen, T.P. AIoT-based indoor air quality prediction for building using enhanced metaheuristic algorithm and hybrid deep learning. J. Build. Eng. 2025, 105, 112448. [Google Scholar] [CrossRef]
- Saini, J.; Dutta, M.; Marques, G. Indoor Air Quality Monitoring with IoT: Predicting PM10 for Enhanced Decision Support. In Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), Sakheer, Bahrain, 8–9 November 2020; pp. 504–508. [Google Scholar] [CrossRef]
- Nurcahyanto, H.; Prihatno, A.T.; Alam, M.M.; Rahman, M.H.; Jahan, I.; Shahjalal, M.; Jang, Y.M. Multilevel RNN-Based PM10 Air Quality Prediction for Industrial Internet of Things Applications in Cleanroom Environment. Wirel. Commun. Mob. Comput. 2022, 2022, 1874237. [Google Scholar] [CrossRef]
- Cho, J.H.; Moon, J.W. Integrated artificial neural network prediction model of indoor environmental quality in a school building. J. Clean. Prod. 2022, 344, 131083. [Google Scholar] [CrossRef]
- Sharma, P.K.; Mondal, A.; Jaiswal, S.; Saha, M.; Nandi, S.; De, T.; Saha, S. IndoAirSense: A framework for indoor air quality estimation and forecasting. Atmos. Pollut. Res. 2021, 12, 10–22. [Google Scholar] [CrossRef]
- Zhu, Y.; Al-Ahmed, S.A.; Shakir, M.Z.; Olszewska, J.I. LSTM-Based IoT-Enabled CO2 Steady-State Forecasting for Indoor Air Quality Monitoring. Electronics 2022, 12, 107. [Google Scholar] [CrossRef]
- Dai, X.; Liu, J.; Li, Y. A recurrent neural network using historical data to predict time series indoor PM2.5 concentrations for residential buildings. Indoor Air 2021, 31, 1228–1237. [Google Scholar] [CrossRef] [PubMed]
- Dutta, J.; Roy, S. IndoorSense: Context based indoor pollutant prediction using SARIMAX model. Multimed. Tools Appl. 2021, 80, 19989–20018. [Google Scholar] [CrossRef]
- Rahim, M.S.A.; Yakub, F.; Omar, M.; Ghani, R.A.; Salim, S.A.Z.S.; Masuda, S.; Dhamanti, I. Prediction of Indoor Air Quality Using Long Short-Term Memory with Adaptive Gated Recurrent Unit. E3S Web Conf. 2023, 396, 01095. [Google Scholar] [CrossRef]
- Zhang, J.; Poon, K.H.; Kwok, H.H.; Hou, F.; Cheng, J.C. Predictive control of HVAC by multiple output GRU—CFD integration approach to manage multiple IAQ for commercial heritage building preservation. Build. Environ. 2023, 245, 110802. [Google Scholar] [CrossRef]
- Lagesse, B.; Wang, S.; Larson, T.V.; Kim, A.A. Predicting PM2.5 in Well-Mixed Indoor Air for a Large Office Building Using Regression and Artificial Neural Network Models. Environ. Sci. Technol. 2020, 54, 15320–15328. [Google Scholar] [CrossRef]
- Tian, X.; Zhang, Y.; Lin, Z. Predicting non-uniform indoor air quality distribution by using pulsating air supply and SVM model. Build. Environ. 2022, 219, 109171. [Google Scholar] [CrossRef]
- Rastogi, K.; Barthwal, A.; Lohani, D.; Acharya, D. An IoT-based Discrete Time Markov Chain Model for Analysis and Prediction of Indoor Air Quality Index. In Proceedings of the 2020 IEEE Sensors Applications Symposium (SAS), Kuala Lumpur, Malaysia, 9–11 March 2020; pp. 1–6. [Google Scholar] [CrossRef]
- Kallio, J.; Tervonen, J.; Räsänen, P.; Mäkynen, R.; Koivusaari, J.; Peltola, J. Forecasting office indoor CO2 concentration using machine learning with a one-year dataset. Build. Environ. 2021, 187, 107409. [Google Scholar] [CrossRef]
- Sassi, M.S.H.; Fourati, L.C. Deep Learning and Augmented Reality for IoT-based Air Quality Monitoring and Prediction System. In Proceedings of the 2021 International Symposium on Networks, Computers and Communications (ISNCC), Dubai, United Arab Emirates, 31 October–2 November 2021; pp. 1–6. [Google Scholar] [CrossRef]
- Miao, S.; Gangolells, M.; Tejedor, B. Data-driven model for predicting indoor air quality and thermal comfort levels in naturally ventilated educational buildings using easily accessible data for schools. J. Build. Eng. 2023, 80, 108001. [Google Scholar] [CrossRef]
- Zhou, Y.; Yang, G. A predictive model of indoor PM2.5 considering occupancy level in a hospital outpatient hall. Sci. Total Environ. 2022, 844, 157233. [Google Scholar] [CrossRef] [PubMed]
- Baqer, N.S.; Albahri, A.S.; Mohammed, H.A.; Zaidan, A.A.; Amjed, R.A.; Al-Bakry, A.M.; Albahri, O.S.; Alsattar, H.A.; Alnoor, A.; Alamoodi, A.H.; et al. Indoor air quality pollutants predicting approach using unified labelling process-based multi-criteria decision making and machine learning techniques. Telecommun. Syst. 2022, 81, 591–613. [Google Scholar] [CrossRef]
- Wu, Q.; Geng, Y.; Wang, X.; Wang, D.; Yoo, C.; Liu, H. A novel deep learning framework with variational auto-encoder for indoor air quality prediction. Front. Environ. Sci. Eng. 2024, 18, 8. [Google Scholar] [CrossRef]
- Lu, Y.; Wang, J.; Wang, D.; Yoo, C.; Liu, H. Incorporating temporal multi-head self-attention convolutional networks and LightGBM for indoor air quality prediction. Appl. Soft Comput. 2024, 157, 111569. [Google Scholar] [CrossRef]
- Segala, G.; Doriguzzi-Corin, R.; Peroni, C.; Gazzini, T.; Siracusa, D. A Practical and Adaptive Approach to Predicting Indoor CO2. Appl. Sci. 2021, 11, 10771. [Google Scholar] [CrossRef]
- Saini, J.; Dutta, M.; Marques, G. Internet of Things Based Environment Monitoring and PM10 Prediction for Smart Home. In Proceedings of the 2020 International Conference on Innovation and Intelligence for Informatics, Computing and Technologies (3ICT), Sakheer, Bahrain, 20–21 December 2020; pp. 1–5. [Google Scholar]
- Bao, R.; Zhou, Y.; Jiang, W. FL-CNN-LSTM: Indoor Air Quality Prediction Using Fuzzy Logic and CNN-LSTM Model. In Proceedings of the 2022 2nd International Conference on Electrical Engineering and Control Science (IC2ECS), Nanjing, China, 16–18 December 2022; pp. 986–989. [Google Scholar] [CrossRef]
- Kim, J.; Hong, Y.; Seong, N.; Kim, D.D. Assessment of ANN Algorithms for the Concentration Prediction of Indoor Air Pollutants in Child Daycare Centers. Energies 2022, 15, 2654. [Google Scholar] [CrossRef]
- Gaowa, S.; Zhang, Z.; Nie, J.; Li, L.; A-ru, H.; Yu, Z. Using artificial neural networks to predict indoor particulate matter and TVOC concentration in an office building: Model selection and method development. Energy Built Environ. 2025, 6, 750–761. [Google Scholar] [CrossRef]
- Li, L.; Zhang, Y.; Fung, J.C.; Qu, H.; Lau, A.K. A coupled computational fluid dynamics and back-propagation neural network-based particle swarm optimizer algorithm for predicting and optimizing indoor air quality. Build. Environ. 2022, 207, 108533. [Google Scholar] [CrossRef]
- Kim, M.K.; Cremers, B.; Liu, J.; Zhang, J.; Wang, J. Prediction and correlation analysis of ventilation performance in a residential building using artificial neural network models based on data-driven analysis. Sustain. Cities Soc. 2022, 83, 103981. [Google Scholar] [CrossRef]
- Lu, L.; Huang, X.; Zhou, X.; Guo, J.; Yang, X.; Yan, J. High-performance formaldehyde prediction for indoor air quality assessment using time series deep learning. Build. Simul. 2024, 17, 415–429. [Google Scholar] [CrossRef]
- Jung, C.C.; Lin, W.Y.; Hsu, N.Y.; Wu, C.D.; Chang, H.T.; Su, H.J. Development of Hourly Indoor PM2.5 Concentration Prediction Model: The Role of Outdoor Air, Ventilation, Building Characteristic, and Human Activity. Int. J. Environ. Res. Public Health 2020, 17, 5906. [Google Scholar] [CrossRef]
- Lee, Y.K.; Kim, Y.I.; Lee, W.S. Development of CO2 Concentration Prediction Tool for Improving Office Indoor Air Quality Considering Economic Cost. Energies 2022, 15, 3232. [Google Scholar] [CrossRef]
- Woo, J.; Lee, J.H.; Kim, Y.; Rudasingwa, G.; Lim, D.H.; Kim, S. Forecasting the Effects of Real-Time Indoor PM2.5 on Peak Expiratory Flow Rates (PEFR) of Asthmatic Children in Korea: A Deep Learning Approach. IEEE Access 2022, 10, 19391–19400. [Google Scholar] [CrossRef]
- Majdi, A.; Alrubaie, A.J.; Al-Wardy, A.H.; Baili, J.; Panchal, H. A novel method for Indoor Air Quality Control of Smart Homes using a Machine learning model. Adv. Eng. Softw. 2022, 173, 103253. [Google Scholar] [CrossRef]
- de Assis Pedrobon Ferreira, W.; Grout, I.; da Silva, A.C.R. Application of a Fuzzy ARTMAP Neural Network for Indoor Air Quality Prediction. In Proceedings of the 2022 International Electrical Engineering Congress (iEECON), Khon Kaen, Thailand, 9–11 March 2022; pp. 1–4. [Google Scholar] [CrossRef]
- Gabriel, M.; Auer, T. LSTM Deep Learning Models for Virtual Sensing of Indoor Air Pollutants: A Feasible Alternative to Physical Sensors. Buildings 2023, 13, 1684. [Google Scholar] [CrossRef]
- Kim, Y.; Shin, D.; Hong, K.; Lee, G.; Kim, S.B.; Park, I.; Kim, H.; Kim, Y.; Han, B.; Hwang, J. Prediction of indoor PM2.5 concentrations and reduction strategies for cooking events through various IAQ management methods in an apartment of South Korea. Indoor Air 2022, 32, e13173. [Google Scholar] [CrossRef] [PubMed]
- Guo, Z.; Wang, X.; Ge, L. Classification prediction model of indoor PM2.5 concentration using CatBoost algorithm. Front. Built Environ. 2023, 9, 1207193. [Google Scholar] [CrossRef]
- Zhang, H.; Srinivasan, R.; Yang, X. Simulation and Analysis of Indoor Air Quality in Florida Using Time Series Regression (TSR) and Artificial Neural Networks (ANN) Models. Symmetry 2021, 13, 952. [Google Scholar] [CrossRef]
- Guo, Z.; Yang, C.; Wang, D.; Liu, H. A novel deep learning model integrating CNN and GRU to predict particulate matter concentrations. Process Saf. Environ. Prot. 2023, 173, 604–613. [Google Scholar] [CrossRef]
- Kapoor, N.R.; Kumar, A.; Kumar, A.; Kumar, A.; Mohammed, M.A.; Kumar, K.; Kadry, S.; Lim, S. Machine Learning-Based CO2 Prediction for Office Room: A Pilot Study. Wirel. Commun. Mob. Comput. 2022, 2022, 9404807. [Google Scholar] [CrossRef]
- Mohammadshirazi, A.; Kalkhorani, V.A.; Humes, J.; Speno, B.; Rike, J.; Ramnath, R.; Clark, J.D. Predicting airborne pollutant concentrations and events in a commercial building using low-cost pollutant sensors and machine learning: A case study. Build. Environ. 2022, 213, 108833. [Google Scholar] [CrossRef]
- Wang, J.; Wang, D.; Zhang, F.; Yoo, C.; Liu, H. Soft sensor for predicting indoor PM2.5 concentration in subway with adaptive boosting deep learning model. J. Hazard. Mater. 2024, 465, 133074. [Google Scholar] [CrossRef]
- Ren, J.; He, J.; Novoselac, A. Predicting indoor particle concentration in mechanically ventilated classrooms using neural networks: Model development and generalization ability analysis. Build. Environ. 2023, 238, 110404. [Google Scholar] [CrossRef]
- Wei, W.; Wargocki, P.; Ke, Y.; Bailhache, S.; Diallo, T.; Carré, S.; Ducruet, P.; Sesana, M.M.; Salvalai, G.; Espigares-Correa, C.; et al. PredicTAIL, a prediction method for indoor environmental quality in buildings undergoing deep energy renovation based on the TAIL rating scheme. Energy Build. 2022, 258, 111839. [Google Scholar] [CrossRef]
- Marzouk, M.; Atef, M. Assessment of Indoor Air Quality in Academic Buildings Using IoT and Deep Learning. Sustainability 2022, 14, 7015. [Google Scholar] [CrossRef]
- Puscasiu, A.P.; Fanca, A.; Gota, D.I.; Valean, H. Monitoring and Prediction of Indoor Air Quality for Enhanced Occupational Health. Intell. Autom. Soft Comput. 2023, 35, 925–940. [Google Scholar] [CrossRef]
- Shi, T.; Yang, W.; Qi, A.; Li, P.; Qiao, J. LASSO and attention-TCN: A concurrent method for indoor particulate matter prediction. Appl. Intell. 2023, 53, 20076–20090. [Google Scholar] [CrossRef]
- Guak, S.; Kim, K.; Yang, W.; Won, S.; Lee, H.; Lee, K. Prediction models using outdoor environmental data for real-time PM10 concentrations in daycare centers, kindergartens, and elementary schools. Build. Environ. 2021, 187, 107371. [Google Scholar] [CrossRef]
- Saini, J.; Dutta, M.; Marques, G. A novel application of fuzzy inference system optimized with particle swarm optimization and genetic algorithm for PM10 prediction. Soft Comput. 2022, 26, 9573–9586. [Google Scholar] [CrossRef]
- Dai, H.; Liu, Y.; Wang, J.; Ren, J.; Gao, Y.; Dong, Z.; Zhao, B. Large-scale spatiotemporal deep learning predicting urban residential indoor PM2.5 concentration. Environ. Int. 2023, 182, 108343. [Google Scholar] [CrossRef]
- Park, S.B.; Park, J.H.; Jo, Y.M.; Song, D.; Heo, S.; Lee, T.J.; Park, S.; Koo, J. Development and validation of a dynamic mass-balance prediction model for indoor particle concentrations in an office room. Build. Environ. 2022, 207, 108465. [Google Scholar] [CrossRef]
- Bakht, A.; Sharma, S.; Park, D.; Lee, H. Deep Learning-Based Indoor Air Quality Forecasting Framework for Indoor Subway Station Platforms. Toxics 2022, 10, 557. [Google Scholar] [CrossRef]
- Fu, N.; Kim, M.K.; Huang, L.; Liu, J.; Chen, B.; Sharples, S. Investigating the reliability of estimating real-time air exchange rates in a building by using airborne particles, including PM1.0, PM2.5, and PM10: A case study in Suzhou, China. Atmos. Pollut. Res. 2024, 15, 101955. [Google Scholar] [CrossRef]
- Park, S.Y.; Yoon, D.K.; Park, S.H.; Jeon, J.I.; Lee, J.M.; Yang, W.H.; Cho, Y.S.; Kwon, J.; Lee, C.M. Proposal of a Methodology for Prediction of Indoor PM2.5 Concentration Using Sensor-Based Residential Environments Monitoring Data and Time-Divided Multiple Linear Regression Model. Toxics 2023, 11, 526. [Google Scholar] [CrossRef]
- Espinosa, F.; Bartolomé, A.B.; Hernández, P.V.; Rodriguez-Sanchez, M.C. Contribution of Singular Spectral Analysis to Forecasting and Anomalies Detection of Indoors Air Quality. Sensors 2022, 22, 3054. [Google Scholar] [CrossRef]
- Dudkina, E.; Crisostomi, E.; Franco, A. Prediction of CO2 in Public Buildings. Energies 2023, 16, 7582. [Google Scholar] [CrossRef]
- Tran, Q.A.; Dang, Q.H.; Le, T.; Nguyen, H.T.; Le, T.D. Air Quality Monitoring and Forecasting System using IoT and Machine Learning Techniques. In Proceedings of the 2022 6th International Conference on Green Technology and Sustainable Development (GTSD), Nha Trang City, Vietnam, 29–30 July 2022; pp. 786–792. [Google Scholar] [CrossRef]
- D’Amico, A.; Pini, A.; Zazzini, S.; D’Alessandro, D.; Leuzzi, G.; Currà, E. Modelling VOC Emissions from Building Materials for Healthy Building Design. Sustainability 2020, 13, 184. [Google Scholar] [CrossRef]
- Arsiwala, A.; Elghaish, F.; Zoher, M. Digital twin with Machine learning for predictive monitoring of CO2 equivalent from existing buildings. Energy Build. 2023, 284, 112851. [Google Scholar] [CrossRef]
- Rakib, M.; Haq, S.; Hossain, M.I.; Rahman, T. IoT Based Air Pollution Monitoring & Prediction System. In Proceedings of the 2022 International Conference on Innovations in Science, Engineering and Technology (ICISET), Chittagong, Bangladesh, 26–27 February 2022; pp. 184–189. [Google Scholar] [CrossRef]
- Salman, H.A.; Kalakech, A.; Steiti, A. Random Forest Algorithm Overview. Babylon. J. Mach. Learn. 2024, 2024, 69–79. [Google Scholar] [CrossRef] [PubMed]
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R.; Taylor, J. Linear Regression. In An Introduction to Statistical Learning: With Applications in Python; Springer International Publishing: Cham, Switzerland, 2023; pp. 69–134. [Google Scholar] [CrossRef]
- Pradhan, A. Support vector machine—A survey. Int. J. Emerg. Technol. Adv. Eng. 2012, 2, 82–85. [Google Scholar]
- Schulz, E.; Speekenbrink, M.; Krause, A. A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions. J. Math. Psychol. 2018, 85, 1–16. [Google Scholar] [CrossRef]
- Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. 2013, 7, 21. [Google Scholar] [CrossRef]
- Kramer, O. K-Nearest Neighbors. In Dimensionality Reduction with Unsupervised Nearest Neighbors; Springer: Berlin/Heidelberg, Germany, 2013; Volume 51, pp. 13–23. [Google Scholar] [CrossRef]
- Salehinejad, H.; Sankar, S.; Barfett, J.; Colak, E.; Valaee, S. Recent Advances in Recurrent Neural Networks. arXiv 2018, arXiv:1801.01078. [Google Scholar] [CrossRef]
- Graves, A. Long Short-Term Memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45. [Google Scholar] [CrossRef]
- Dey, R.; Salem, F.M. Gate-variants of Gated Recurrent Unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; pp. 1597–1600. [Google Scholar] [CrossRef]
- Ranstam, J.; Cook, J.A. LASSO regression. Br. J. Surg. 2018, 105, 1348. [Google Scholar] [CrossRef]
- Shumway, R.H.; Stoffer, D.S. ARIMA Models. In Time Series Analysis and Its Applications: With R Examples; Springer International Publishing: Cham, Switzerland, 2017; pp. 75–163. [Google Scholar] [CrossRef]
- Alharbi, F.R.; Csala, D. A Seasonal Autoregressive Integrated Moving Average with Exogenous Factors (SARIMAX) Forecasting Model-Based Time Series Approach. Inventions 2022, 7, 94. [Google Scholar] [CrossRef]
Database | Publication Type | IAQ | IAQ Prediction | IAQ Prediction + Machine Learning | IAQ Prediction + Deep Learning
---|---|---|---|---|---
Web of Science | Articles | 1121 | 111 | 30 | 14
Web of Science | Reviews | 155 | 11 | 3 | 0
Scopus | Articles | 1203 | 99 | 25 | 15
Scopus | Reviews | 162 | 16 | 5 | 0
Google Scholar | Articles | 17,400 | 16,600 | 13,400 | 12,300
Google Scholar | Reviews | 7560 | 4320 | 1730 | 1590
Database | Publication Type | IAQ | IAQ Prediction | IAQ Pred. + ML | IAQ Pred. + DL
---|---|---|---|---|---
Springer | Articles | 592 | 166 | 73 | 45
Springer | Reviews | 76 | 29 | 21 | 13
Science Direct | Articles | 2713 | 1051 | 462 | 353
Science Direct | Reviews | 357 | 169 | 105 | 87
MDPI | Articles | 261 | 32 | 7 | 2
MDPI | Reviews | 49 | 5 | 1 | 0
IEEE Xplore | Conf. papers | 113 | 15 | 10 | 1
IEEE Xplore | Journals | 16 | 3 | 2 | 3
Taylor & Francis | Articles | 299 | 182 | 23 | 41
Taylor & Francis | Reviews | 7 | 3 | 1 | 4
ACM Digital Library | All types | 50 | 34 | 29 | 25
ACM Digital Library | Reviews | 0 | 0 | 0 | 0
Wiley Online Library | Journal papers | 280 | 165 | 38 | 40
Wiley Online Library | Books | 63 | 39 | 21 | 19
Paper Ref. | IAQ Parameters | Object Type/Building Type | Strategy | Methods/Algorithm Type | Results |
---|---|---|---|---|---|
[8] | PM2.5 | Residential | Ahead prediction | Machine Learning (Quantile Random Forest—QRF, Gaussian Process Regression—GPR); Deep Learning (Bayesian Neural Network—BNN) | BNN: R² = 0.48–0.72, MAE = 7.82–16.3 µg/m3, 95% interval coverage = 88%; QRF: R² = 0.24–0.95, MAE = 3.09–20.0 µg/m3, interval coverage = 85–92%; GPR: R² = 0.03–0.69, weak coverage, wide intervals, overfit tendency on test set. |
[9] | CO2, VOC, PM2.5, RH, Temp. | Residential | Ahead prediction | Machine Learning (Random Forest Regressor) | Temp: R² = 0.94, MAE = 0.04; Humidity: R² = 0.92, MAE = 0.31; CO2: R² = 0.96, MAE = 13.89; VOC: R² = 0.98, MAE = 15.38; PM2.5: R² = 0.84, MAE = 1.44 |
[10] | CO2 | Public (Classroom) | Ahead prediction | Deep Learning (Ensemble Neural Operator Transformer based on CFD-trained PDE surrogate modeling) | CO2 prediction relative error: 10.9% (test), 5.9% (train) Ventilation control reduced energy consumption by 34–56% vs. max control while maintaining CO2 < 1200 ppm, 0% violations; 250,000× faster than CFD simulation. |
[11] | Temp, RH, Dew Point | Public (Greenhouse) | Ahead prediction | Deep Learning (ANN, LSTM-RNN, LSTM-ANN, GRU, Power-LSTM—PLSTM) | PLSTM with input Hi-Di-Rs: R² = 0.9999, RMSE = 0.022, MAE = 0.016. Best among all models. Other DL models: R² = 0.918–0.998, varying by input set. GRU second-best with R² = 0.9997 (Hi-Di-To). |
[12] | IAQ (CO2 ppm), Primary Energy Consumption, CO2 Emissions, PPD, VDH | Residential (Apartment) | No time horizon | Machine Learning (Random Forest, XGBoost, ANN, SVR, KNN, Linear Regression) | XGBoost: R² = 0.9593, RMSE = 33.12 ppm. RF: R² = 0.9250. Linear Regression baseline: R² ≈ 0.5. SHAP: airflow (P9) was top IAQ driver. |
[13] | CO2 | Public (Educational–University Laboratory) | Ahead prediction | Deep Learning–Recurrent Neural Network (RNN), specifically LSTM | Training R² = 0.93, Test R² = 0.88, Global R² = 0.92, MSE = 75 ppm (≈10.6% of avg. CO2 level of 712 ppm); robust despite occupancy variability |
[14] | CO2 | Public (University Lecture Theatre) | No time horizon | Hybrid models (GWO + ELM − ML) | R² = 0.95, RMSE = 5.17 ppm for CO2 prediction; 95% faster than CFD for 289 scenarios |
[15] | CO2, TVOC, PM2.5, PM10 | Public (Athletic/Fitness Center) | Ahead prediction | Deep Learning (LSTM–RNN architecture) | Availability (time slots with acceptable IAQ): TVOC—92.8%, CO2—89.2%, PM10—85.7% Combined (TVOC + CO2 + PM10): 82.1% |
[16] | CO2, PM2.5, Formaldehyde, RH | Residential | Ahead prediction | Mathematical/Statistical methods (ARIMA); Machine Learning (SVM); Deep Learning (LSTM, BPNN) | Best for CO2: LSTM (R² = 0.97, RMSE = 1.35 ppm); SVM for PM2.5 (lowest error); Formaldehyde: all models within sensor error (±20 µg/m3); PM2.5: all models showed reduced predictability (MAPE up to 91.85% for LSTM) |
[17] | CO2 | Public (Classroom) | Ahead prediction | Machine Learning (SVM, RF, LR, AdaBoost, Gradient Boosting), Deep Learning (MLP) | MLP: R² = 0.91 (1 h), 0.805 (24 h); RMSE = 34.3–54.78 ppm; MLP best across all horizons |
[18] | CO2, TVOC, PM10, PM2.5, NO2, O3 | Residential | No time horizon | Mathematical/Statistical methods (mass balance equation + multizone airflow simulation) | Good agreement for CO2, PM10, PM2.5, O3 (≤20% error); underestimation for TVOC and NO2 by 3–4× (source database limitations) |
[19] | Bioaerosols (bacteria, fungi, pollen), PM2.5, PM10 | Public (Office Shopping Mall) | Ahead prediction | Machine Learning (RF, XGBoost, Linear, Lasso); Deep Learning (MLP, LSTM, RNN) | LSTM best overall: WI ≈ 0.75–0.80 (up to 60 min ahead); PM prediction: 90% accuracy; bioaerosols: 60–80% accuracy depending on particle type |
[20] | CO2, PM2.5, TVOC, Temp, RH | Public (Commercial Terminal) | Ahead prediction | Hybrid models (CNN-LSTM + EPSO) | EPSO-CNN-LSTM consistently outperforms CNN, LSTM, RNN, GRU: R² up to 0.98; MAE ↓ by 9–52%; MAPE ↓ by 2–70%; stable across 4 datasets (20-fold CV) |
[21] | PM10 | Public (Canteen in Campus) | Ahead prediction | Machine Learning (XGBoost) | R² = 0.99, RMSE = 0.483, MAE = 0.284, MAPE = 3.24%, accuracy = 98.15%; model uses CO2, VOC, RH to forecast PM10 |
[22] | PM10 | Public (Cleanroom in factory) | Ahead prediction | Deep Learning (Multilevel RNN, LSTM) | Multilevel RNN outperformed LSTM: R² = 0.61 vs. 0.42 (test), RMSE = 0.36 vs. 0.51, MAE = 0.26 vs. 0.33; best time-step = 22 |
[23] | CO2, PM10, PM2.5 | Public (School) | Ahead prediction | Deep Learning (Integrated ANN, MIMO comparison, MOGA optimization) | RMSE: CO2 = 0.88, PM10 = 0.46, PM2.5 = 0.66; R²: CO2 = 1.0, PM10 = 0.999, PM2.5 = 0.998; 90–97% samples within ±1 unit prediction error |
[24] | CO2, PM2.5 | Public (University Classrooms) | Ahead prediction | Machine Learning (MLP, XGBoost); Deep Learning (LSTM-wF) | Estimation: 95.86% accuracy (with AER); Forecasting: 94–96% accuracy; Forecast error max 16%; LSTM-wF faster than Bi-LSTM/DBU-LSTM |
[25] | CO2 | Residential | Ahead prediction | Deep Learning (LSTM: single, stacked, bidirectional) | Best: Bidirectional LSTM—R2 = 0.981, RMSE = 16.77 ppm, MAE = 8.95 ppm; steady-state CO2 prediction: error 5.5%; step-ahead horizon = 1 min |
[26] | PM2.5 | Residential | Ahead prediction | Deep Learning (Autoencoder + RNN) | Best RMSE = 17.28 µg/m3, R2 = 0.799; Median error = 8.3 µg/m3 (30 min horizon); model captures trends for indoor PM2.5 control |
[27] | CO2 | Public (University Lab) | Ahead prediction | Mathematical/Statistical methods (SARIMAX) | RMSE = 26.45 ppm, Accuracy ≈ 97.36%, R2 = 0.907 (10-fold CV); model uses exogenous context data for 3-day prediction |
[28] | PM2.5 | Public (US Embassy) | Ahead prediction | Hybrid models (LSTM + GRU) | RMSE: LSTM = 0.3186, LSTM-GRU = 0.2034; LSTM-GRU better at tracking temporal trends and reducing gradient error |
[29] | CO2, RH, Temp, NO2, SO2 | Public (Commercial heritage) | Ahead prediction | Hybrid models (Multiple Output GRU + CFD) | MAE improved by 11.3% (RH), R2 = 0.922 (CO2); CFD + GRU reduced HVAC delay by 10 min; IAQ improved by up to 20% via predictive regulation |
[30] | PM2.5 | Public (Office) | Ahead prediction | Mathematical/Statistical methods (MLR, PLS, DLM, LASSO); Machine Learning (ANN); Deep Learning (LSTM) | Best: LSTM—RMSE = 1.73 µg/m3, R2 = 0.83, IA = 0.94; ANN RMSE = 2.38 µg/m3, MLR RMSE = 3.07 µg/m3; 670 hourly observations used |
[31] | Air Age (IAQ parameter) | Public (a room model with a pulsating air supply) | Ahead prediction | Machine Learning (XGBoost) | RMSE = 0.483, R2 = 0.99, MAPE = 3.24%, Accuracy = 98.15% |
[32] | PM2.5, PM10, CO (IAQ Index) | Public (University rooms) | Ahead prediction | Mathematical/Statistical methods (Discrete-Time Markov Chain–DTMC) | MAE = 4.75% (avg. prediction error); return periods for AQI states predicted with ≤6.6% error |
[33] | CO2 | Public (Cubicles, Meeting Rooms) | Ahead prediction | Mathematical/Statistical methods (Ridge); Machine Learning (DT, RF); Deep Learning (MLP) | MAE: 1 min ahead ≈ 1 ppm, 5 min ≈ 4–5 ppm, 15 min ≈ 12–13 ppm; DT nearly matched RF; MLP not better |
[34] | PM2.5, PM10, CO | Public (University building) | Ahead prediction | Deep Learning (RNN, LSTM–single and stacked) | LSTM used for >2 h AQ forecasting; model accuracy declines beyond 4 h horizon; no quantitative metrics provided; RNN found effective for <1 h |
[35] | CO2, TC (Thermal Comfort) | Public (Schools–Primary and Secondary) | Ahead prediction | Machine Learning (Class-weighted Random Forest) | Accuracy = 0.9718, = 0.9600 (test); robust over 20 runs (mean = 0.9584); best performance with 22 selected features including occupancy, activity, and window use |
[36] | PM2.5 | Public (Hospital Outpatient Hall) | Ahead prediction | Hybrid Models (SO-LSTS = Informer + AHP + Entropy Weighting) | R2 = 0.860, RMSE = 6.258, MAE = 5.620; significantly outperformed XGBoost, KNN, SVM, BP; occupancy level improved model accuracy by 54% |
[37] | CO2, CO, NO2, O3, VOC, Formaldehyde, PM, Temp, RH | Public (Hospital: surgical rooms, pharmacy, women’s ward) | Ahead prediction | Machine Learning (SVM, Logistic Regression, Decision Tree, Random Forest, AdaBoost, KNN, NB, Neural Network) | Real dataset: SVM achieved 99.81% accuracy, LR 99.26%, DT 98.18%; Simulated dataset: RF 90.09%, DT 88.96%, AdaBoost 87.73% |
[38] | PM2.5 | Public (Subway Station–Platform Area) | Ahead prediction | Hybrid models (PLS + VAER) | PLS-VAER: R2 = 0.722, RMSE = 0.136, MAE = 0.092; better than VAER alone (R2 = 0.635, RMSE = 0.156); 14.71% ↓ RMSE vs. VAER |
[39] | PM2.5 | Public (Subway Station–City Hall, Seoul) | Ahead prediction | Hybrid Models (KPCA + mRMR + MHATCN + LightGBM) | R2 = 0.92, RMSE = 6.01 µg/m3, MAE = 4.36, MAPE = 20.58; outperformed baseline models including LSTM and TCN |
[40] | CO2 | Public (Retail Stores, Offices, Meeting Rooms) | Ahead prediction | Deep Learning (1D Convolutional Neural Network–CNN) | RMSE ≈ 40–50 ppm (1 day of data); ≈15 ppm (after 7 days); ≈10 ppm (after 30 days); trainable on edge device in 25 min |
[41] | PM10 | Residential (Smart Home Kitchen Area) | Ahead prediction | Machine Learning (Random Forest) | R2 = 0.996, RMSE = 0.594, MAE = 0.337, MAPE = 3.90%, Overall accuracy = 97.72% |
[42] | PM2.5 | Public | Ahead prediction | Hybrid Models (FL-CNN-LSTM) | RMSE (Test): FL-CNN-LSTM = 0.0592; better than CNN-LSTM (0.0711) and LSTM (0.0624); 3-h prediction horizon |
[43] | CO2, PM2.5, VOC | Public (Child daycare center) | Ahead prediction | Deep Learning (ANN–LM, BR, BFGS algorithms) | LM: R2 = 0.989 (CO2), 0.983 (PM2.5), 0.977 (VOC); RMSE = 18.7–36.5; BR/BFGS lower accuracy; 5-min interval |
[44] | PM2.5, PM10, TVOC | Public (Substation building) | Ahead prediction | Deep Learning (BP-ANN, MLNN, LSTM), Machine Learning (Random Forest–TVOC only) | MLNN: PM2.5 and PM10 → R2 = 0.78–0.81, NMSE = 0.46–0.49 µg/m3; RF (TVOC): Accuracy = 89.2%; MLNN-TVOC regression: only 25.8% accuracy; MLNN better for PMs |
[45] | CO2 | Public (Office-like experimental chamber) | Ahead prediction | Hybrid models (CFD + BPNN + PSO) | BPNN: MAPE = 0.42–0.95%, R2 = 0.93–0.97; BPNN-PSO: deviation from target < 7.38%, CO2 reduced by 20%, up to 41.2%; 23.5% faster vs. ANN-GA |
[46] | CO2, VOC | Residential (Apartment in Switzerland) | Ahead prediction | Deep Learning (FFNN, RNN–both with LM-BP algorithm) | FFNN: CO2 CVRMSE = 10.06–18.74%, NMBE = 2.18–1.58%; VOC CVRMSE = 16.70–19.86%. RNN: CO2 error = 3.18–5.49%, VOC error = 4.53–4.72%. |
[47] | Formaldehyde (Cf) | Residential (Simulated–Fabric-covered building, six Chinese cities) | Ahead prediction | Deep Learning (LSTM) | MAPE = 9.1–24.7%, MAE = 0.18–1.93 µg/m3, RMSE = 0.26–2.29 µg/m3; LSTM outperformed RNN; accuracy robust for input uncertainty up to 20% |
[48] | PM2.5 | Residential (93 households—children’s bedrooms in Taiwan) | Ahead prediction | Mathematical/Statistical methods (Multiple Linear Regression—MLR) | R2 = 0.74; RMSE = 5.41 µg/m3; cross-validation R2 = 0.72–0.78 (avg. 0.75); model includes outdoor PM2.5, CO2 diff, building type, floor, human activity |
[49] | CO2 | Public (one-person and shared office rooms) | Ahead prediction | Mathematical/Statistical methods (Mass balance equation + ventilation dynamics) | RMSE = 38 ppm (validation); optimized ERV reduced CO2 > 1000 ppm events; economic analysis supported ERV sizing |
[50] | PM2.5, CO2, RH, Temp | Residential (Homes of asthmatic children) | Ahead prediction | Deep Learning (RNN–GRU, DNN) | RNN: RMSE = 42.5, MAPE = 14% (avg); best model: 3 GRU layers; performance improved with 10 min IAQ granularity |
[51] | VOC | Public (Food Court in Smart Building–Kian Center 2) | Ahead prediction | Machine Learning (Radial Basis Function Neural Network—RBFNN) | MAPE = 3.51% (best setting); trained on 1104 samples (138 days), tested on 24 samples (3 days) |
[52] | PM2.5 | Residential (Bedroom) | Ahead prediction | Hybrid Models (Fuzzy ARTMAP Neural Network) | MAE = 2.28 (avg), range 0.26–7.65 µg/m3 for 24 h ahead prediction; network trained online with 1008 samples |
[53] | CO2, PM2.5, VOC | Public (Open-space in high-rise, 35 occupants) | Ahead prediction | Deep Learning (LSTM) | CO2: MAE = 15.4 ppm, R2 = 0.47; PM2.5: MAE = 0.3 µg/m3, R2 = 0.88; VOC: MAE = 20.1 IAQI, R2 = 0.87 |
[54] | PM2.5 | Residential (Apartment in South Korea) | Ahead prediction | Mathematical/Statistical methods (Mass balance model) | Prediction error for PM2.5 at 30 min: 1–7 µg/m3; model validated under various cooking and ventilation scenarios |
[55] | PM2.5 | Public (Office in Beijing) | Real-time prediction | Machine Learning (CatBoost) | AUC = 0.949, F1-score = 0.883, Precision-Recall AUC = 0.928; model outperformed MLP, GBDT, LR, DT, KNN |
[56] | PM2.5, PM10, NO2 | Public (Laboratory/Office in Florida, USA) | Ahead prediction | Mathematical/Statistical methods (MLR, TSR); Deep Learning (ANN–MLP) | ANN: R2 (PM2.5), 0.9995 (PM10), 0.9014 (NO2); RMSE = 0.0816, 0.0782, 3.17 (respectively); ANN was the best model |
[57] | PM2.5 | Public (Subway station—Chungmuro, Seoul) | Ahead prediction | Hybrid models (RF-CNN-GRU) | MAE = 8.61, MAPE = 0.249, RMSE = 10.56, R2 = 0.8704—best among 7 baselines |
[58] | CO2 | Public (Office 24 m2 in India) | Ahead prediction | Machine Learning (GPR, SVM, DT, ANN, EL, LR) | Optimized GPR: R2 = 0.9776, RMSE = 4.20 ppm, MAE = 3.35 ppm, NS = 0.9817, a20 = 1 |
[59] | CO2, NO2, O3, PM1, PM2.5, PM10, HCHO, TVOC | Public (Commercial building: offices, labs, conference rooms—Berkeley, CA, USA) | Ahead prediction | Machine Learning (Random Forest, Gradient Boosting); Deep Learning (LSTM) | LSTM best: Adjusted R2 up to 90% (e.g., PM2.5 = 87.14%, PM10 = 86.28%), MSE < 0.001 for most pollutants (1 h prediction); TVOC/HCHO much worse (R2 < 20%) |
[60] | PM2.5 | Public (Subway station—Seoul) | Ahead prediction | Hybrid models (KPCA + AdaBoost-LSTM) | Hall: R2 = 0.9007, RMSE = 10.31 µg/m3, MAPE = 38.13%; Platform: R2 = 0.8995, RMSE = 7.03 µg/m3, MAPE = 24.58% |
[61] | PM1, PM2.5, PM10 | Public (Mechanically ventilated high school classrooms—USA and China) | Real-time prediction | Deep Learning (LSTM); Mathematical/Statistical methods (NARX) | NARX best for PM1 and PM2.5: R2 = 0.81–0.87, RMSE = 0.45–1.27 µg/m3, MAPE = 41–55%; PM10: all models weaker, best LSTM RMSE = 16.76 µg/m3, R2 = 0.04 |
[62] | CO2, Formaldehyde, Benzene, PM2.5 | Public (Hotel and Office undergoing deep renovation—Europe) | No time horizon | Mathematical/Statistical methods (TRNSYS, IDA ICE, MATHIS-QAI, ACOUBAT, PHANIE) | PM2.5: factor of 2 deviation; CO2: 1031–2072 ppm; Formaldehyde: 7–71 µg/m3; results within acceptable range for IEQ simulation-based TAIL rating |
[63] | CO2, CO, PM2.5, Temp, RH, Pressure | Public (Academic building—University classrooms) | Real-time prediction | Deep Learning (LSTM) | LSTM accurately predicted IAQ parameters using IoT data; no numerical metrics (R2, RMSE) reported, but model demonstrated effective real-time performance |
[64] | RH (Relative Humidity) | Residential (Sleeping room) | Ahead prediction | Machine Learning (Decision Forest Regression—DFR, Boosted DTR, BLR, LR, NNR) | Best: DFR–RMSE = 1.314–1.466, CoD = 0.97–0.974; NNR worst (RMSE ≈ 3.3); Azure ML Studio evaluation |
[65] | PM1, PM2.5, PM10, PM > 10 | Public (School building) | Ahead prediction | Hybrid models (LASSO + Attention + Temporal Convolutional Network—LATCN) | PM2.5: RMSE = 10.94, MAE = 5.94, R2 = 0.912; PM10: RMSE = 13.75, MAE = 7.11, R2 = 0.898; LATCN outperforms LSTM, GRU, RNN, ATCN, LTCN |
[66] | PM10 | Public (Daycare centers, kindergartens, elementary schools—Korea) | Real-time prediction | Mathematical/Statistical methods (Multiple Linear Regression—MLR) | R2 = 0.64 (daycare), 0.45 (kindergartens), 0.43 (elementary); RMSE: 26.7, 18.9, 19.9 µg/m3 (10 min interval) |
[67] | PM10 | Public (Cafeteria—India) | Ahead prediction | Hybrid models (Fuzzy Inference System—FIS + PSO, FIS + GA) | FIS-GA: RMSE = 0.998; FIS-PSO: RMSE = 1.0746; FIS alone: RMSE = 2.0894 (all on normalized data) |
[68] | PM2.5 | Residential (Urban housing across multiple Chinese cities) | Ahead prediction | Deep Learning (Spatiotemporal Neural Network) | MAE = 6.19 µg/m3, RMSE = 8.40 µg/m3, R2 = 0.74; consistent accuracy across 330 cities; validated on independent dataset |
[69] | PM2.5, PM10 | Public (University office room) | Ahead prediction | Mathematical/Statistical methods (Dynamic mass-balance + Least Squares Optimization) | PM2.5 prediction: r = 0.883, NMSE = 0.085; PM10 prediction: r = 0.882, NMSE = 0.083; AP effectiveness: PM2.5 = 86.4%, PM10 = 86% |
[70] | PM2.5, PM10 | Public (Subway platform—Korea) | Ahead prediction | Deep Learning (Hybrid CNN-LSTM-DNN) | PM10: RMSE = 8.94, MAE = 6.44, R2 = 0.55; PM2.5: RMSE = 10.1, MAE = 6.81, R2 = 0.35 |
[71] | AER (based on PM1.0, PM2.5, PM10) | Public (Office in Suzhou, China) | Real-time prediction | Mathematical/Statistical methods (mass balance equations + empirical correlations) | PM1.0: NME = 2.3–18.3%, r = 0.87–0.99; PM2.5: NME = 2.4–38.2%, r = 0.94–0.99; PM10: NME > 30% (less accurate) |
[72] | PM2.5 | Residential (Korea—two homes) | Real-time prediction | Machine Learning (Multiple Linear Regression—MLR) | R2 = 0.25 (global model); the best hour model (H4): R2 = 0.34, RMSE = 3.34 µg/m3, MAE = 2.55 µg/m3 |
[73] | IAQ Index (VOC, CO, NOx), Temp, RH | Public (Fire Department—Spain) | Ahead prediction | Hybrid (Statistical + Machine Learning Tree Partition + SSA preprocessing) | Without SSA: FIT ≈ 82.2%, MSE = 1.516; With SSA: FIT = 99.12%, MSE = 0.0035, calculation time ↓ 51% |
[74] | CO2 | Public (University classroom—Italy) | Ahead prediction | Machine Learning (Regression, k-NN), Deep Learning (LSTM) | LSTM: MAPE = 18%, RMSE = 253 ppm, R2 = 0.79; KNN: MAPE = 22%, RMSE = 290 ppm, R2 = 0.71; Regression: MAPE = 24%, RMSE = 347 ppm, R2 = 0.69 |
[75] | PM2.5, CO2, CO | Residential (District 12, Ho Chi Minh City) | Ahead prediction | Mathematical/Statistical methods (ARIMA), Deep Learning (LSTM), Machine Learning (FFNN) | Best: ARIMA—CO2: R2 = 0.963, RMSE = 48.8; PM2.5: ARIMA RMSE = 0.0043; FFNN weakest across all metrics |
[76] | VOC (TVOC) | Public (Office and Meeting Room—CNR Pisa, Italy) | Ahead prediction | Mathematical/Statistical methods (Box-model with mass balance + CFD via CONTAM) | Box-model: TVOC = 0.48–1.65 µg/m3 (Low VOC case), <10% of IAGV (600 µg/m3); CFD: VOC peaks = 50–100 µg/m3 near emission zones |
[77] | CO2 | Residential (Apartment—Belfast, UK) | Ahead prediction | Machine Learning (SGD regressor) | MSE = 0.6 vs. baseline MSE = 1.049; predicted trends match actual (e.g., peak eCO2 in evenings in bedroom) |
[78] | PM2.5, CO, NH3, AQI | Residential (Indoor testbed—Bangladesh) | Ahead prediction | Mathematical/Statistical methods (ARIMA) | MAPE: Temp = 2.82%, Humidity = 4.70%, PM2.5 = 6.92%, CO = 10.12%, NH3 = 10.3%, AQI = 5.8% (≈90–97% accuracy) |
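The studies above are compared almost entirely through a small set of error metrics: RMSE, MAE, MAPE, and R2. For reference, a minimal pure-Python sketch of these standard definitions (the function names and sample CO2 values are illustrative, not taken from any reviewed study):

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large deviations quadratically.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    # Mean absolute error: average magnitude of the residuals.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent; undefined if y_true contains zeros.
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SSE/SST (fraction of variance explained).
    mean = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean) ** 2 for t in y_true)
    return 1 - sse / sst

# Illustrative measured vs. predicted CO2 concentrations (ppm).
measured = [450.0, 480.0, 520.0, 600.0, 580.0]
predicted = [455.0, 470.0, 530.0, 590.0, 585.0]
```

Because RMSE squares the residuals, a model with occasional large misses scores worse on RMSE than on MAE, which is one reason the tables report both.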
Method | Type | Description |
---|---|---|
Random Forest | Machine Learning | Random Forest is an ensemble algorithm that constructs multiple decision trees using randomly selected subsets of data and features, aggregating their outputs through majority voting or averaging. This approach effectively reduces overfitting by decreasing correlation among individual trees, resulting in improved accuracy and robustness in predictive tasks [79]. |
Linear Regression | Machine Learning | Linear regression models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. The coefficients in this equation represent the intercept and slopes, quantifying the expected change in the dependent variable for a one-unit change in each predictor, estimated by minimizing the sum of squared residuals. This method assumes a linear, additive relationship and provides interpretable parameters useful for quantifying associations and making continuous predictions [80]. |
Support Vector Machine | Machine Learning | Support Vector Machine constructs an optimal hyperplane that maximizes the margin between distinct classes in a high-dimensional space, thus enhancing classification accuracy. It handles both linearly separable and non-linear data through kernel functions that map input data into higher-dimensional spaces where linear separation is feasible. The method explicitly controls the trade-off between classifier complexity and misclassification, providing robust generalization capabilities [81]. |
Gaussian Process Regression | Machine Learning | Gaussian Process Regression (GPR) is a non-parametric Bayesian method for modeling unknown functions by defining a distribution over possible functions consistent with observed data. It utilizes a kernel (covariance) function to encode prior assumptions about the function’s smoothness and structure, enabling both interpolation and uncertainty quantification through posterior predictions derived from Gaussian process theory. The algorithm optimizes hyperparameters via marginal likelihood maximization and scales predictions using only the observed data, implicitly balancing complexity and fit [82]. |
Gradient Boosting | Machine Learning | Gradient Boosting Machines (GBM) are an ensemble learning technique that sequentially fits weak base-learners—typically decision trees—to the negative gradient of a specified loss function, iteratively improving the model’s predictive accuracy. By combining the outputs of these base-learners, GBM constructs a strong predictive model that is highly customizable through the choice of loss functions and base-learner types, making it effective for both regression and classification tasks. The algorithm inherently balances bias and variance by controlling overfitting via regularization techniques such as shrinkage and early stopping [83]. |
k-Nearest Neighbors | Machine Learning | k-Nearest Neighbors (k-NN) is a non-parametric, instance-based method that predicts the label or value of a query point from the k closest training samples under a chosen distance metric, commonly Euclidean distance. Classification is typically performed by majority vote among the neighbors, and regression by averaging their target values, optionally weighted by distance. Because the method stores the training data and defers computation to prediction time, it requires no explicit training phase, but its prediction cost grows with dataset size and its accuracy is sensitive to the choice of k and to feature scaling [84]. |
Artificial Neural Networks | Deep Learning | Artificial Neural Networks (ANNs) are computational models inspired by the structure and function of biological neural networks, designed to simulate the way the human brain processes information. ANNs consist of interconnected nodes (neurons) organized in layers—input, hidden, and output—which collectively learn to recognize patterns, classify data, and make predictions through adaptive weight adjustments during training. Their ability to model complex, nonlinear relationships makes them highly effective in diverse applications, including pattern recognition, machine learning, and data analysis [85]. |
Recurrent Neural Networks | Deep Learning | Recurrent Neural Networks (RNNs) are a class of artificial neural networks designed to process sequential data by maintaining an internal state that captures information from previous inputs. Unlike feedforward networks, RNNs include feedback loops, allowing them to model temporal dependencies and exhibit dynamic temporal behavior. This architecture enables RNNs to handle tasks such as time-series prediction, speech recognition, and natural language processing, where context and sequence order are critical. However, training RNNs can be challenging due to issues like vanishing or exploding gradients, which limit their ability to learn long-term dependencies effectively [85]. |
Long-Short Term Memory | Deep Learning | Long Short-Term Memory (LSTM) is a recurrent neural network architecture designed to overcome the vanishing gradient problem, allowing it to learn long-term dependencies in sequential data. By using memory cells and gating mechanisms, LSTM networks effectively retain and access information over extended sequences, making them highly suitable for tasks like speech recognition, time-series analysis, and natural language processing [86]. |
Gated Recurrent Units | Deep Learning | Gated Recurrent Units (GRUs) are a streamlined variant of the Long Short-Term Memory (LSTM) architecture, designed to efficiently model sequential data in recurrent neural networks (RNNs). GRUs employ two gating mechanisms—the update gate and reset gate—to regulate the flow of information between hidden states, enabling improved handling of long-term dependencies compared to traditional RNNs. With fewer parameters than LSTMs, GRUs offer computational efficiency while maintaining comparable performance, making them a widely adopted choice for tasks such as speech recognition, natural language processing, and time-series analysis. Their simplified structure facilitates faster training and reduced memory usage without sacrificing effectiveness in capturing temporal patterns [87]. |
Lasso | Statistical/ Mathematical | Least Absolute Shrinkage and Selection Operator (LASSO) is a penalized regression method that reduces overfitting by shrinking regression coefficients toward zero through an L1 penalty, effectively performing variable selection. The tuning parameter λ, optimized via cross-validation, controls the degree of shrinkage. While LASSO improves predictive accuracy in high-dimensional data, it may introduce bias in coefficient estimates, limiting their interpretability [88]. |
ARIMA | Statistical/ Mathematical | Autoregressive Integrated Moving Average (ARIMA) models are versatile tools for analyzing and forecasting time series data by combining three key components: autoregression (AR), differencing (I), and moving averages (MAs). The AR component captures the dependence of current values on past observations, the differencing component stabilizes the mean by removing trends or seasonality, and the MA component models the error structure as a linear combination of past error terms. ARIMA models are particularly effective for handling nonstationary data, making them widely applicable in fields such as economics, finance, and environmental science. Their flexibility allows them to adapt to diverse temporal patterns, including trends, cycles, and stochastic fluctuations [89]. |
SARIMAX | Statistical/ Mathematical | Seasonal Autoregressive Integrated Moving Average with Exogenous Factors (SARIMAX) is an advanced time series forecasting model that extends the traditional SARIMA framework by incorporating exogenous variables. SARIMAX integrates seasonal components (SARIMA) with external influencing factors (X), enabling it to capture complex temporal dependencies, seasonal patterns, and the impact of external variables on the target series. This model is particularly effective for datasets with both seasonal and non-seasonal structures, as well as those influenced by external factors such as weather conditions, economic indicators, or policy changes. By combining autoregressive (AR), differencing (I), moving average (MA), and exogenous (X) components, SARIMAX enhances predictive accuracy and adaptability, making it a robust tool for applications in energy forecasting, economics, and environmental studies. Its ability to handle multiple input variables and seasonal effects simultaneously addresses limitations of simpler models, providing more reliable long-term forecasts [90]. |
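To make the simplest entry in the table concrete: ordinary least squares, the estimator behind linear regression (and, with lagged values as predictors, the AR component of ARIMA-family models), has a closed form in the single-predictor case. A minimal pure-Python sketch with hypothetical CO2 readings, where the previous sample is used as the sole predictor of the next one (data and function name are our own, not from the reviewed studies):

```python
def fit_simple_ols(x, y):
    """Closed-form OLS for y = a + b*x, minimizing the sum of squared residuals."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # slope b = cov(x, y) / var(x); intercept a = mean(y) - b * mean(x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

# Hypothetical one-step-ahead forecast: predict the next CO2 reading (ppm)
# from the current one using lag-1 as the single predictor.
co2 = [420, 450, 500, 560, 610, 640, 650]
x = co2[:-1]   # lag-1 predictor
y = co2[1:]    # target: the following reading
a, b = fit_simple_ols(x, y)
next_pred = a + b * co2[-1]
```

The same lag-construction idea, with more lags and exogenous inputs, underlies most of the statistical baselines (MLR, ARIMA, SARIMAX) compared in the studies above.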
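The instance-based idea behind k-NN can likewise be sketched in a few lines. The example below does k-NN regression on a single feature; the data (outdoor vs. indoor PM2.5) and the function name are illustrative assumptions, not values from the reviewed studies:

```python
def knn_regress(train_x, train_y, query, k=3):
    """Predict the target at `query` as the mean of the k nearest training targets."""
    # Sort training pairs by distance to the query point, then average the k closest.
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Illustrative data: outdoor PM2.5 (x) vs. indoor PM2.5 (y), both in µg/m3.
outdoor = [10.0, 12.0, 15.0, 20.0, 30.0, 45.0]
indoor = [8.0, 9.0, 11.0, 14.0, 20.0, 28.0]
estimate = knn_regress(outdoor, indoor, query=18.0, k=3)
```

With multiple features the absolute difference would be replaced by a Euclidean (or other) distance, and the features would normally be scaled first, since k-NN is sensitive to feature magnitudes.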
SWOT Element | Description |
---|---|
Strengths | |
Weaknesses | |
Opportunities | |
Threats | |
Latoń, D.; Grela, J.; Ożadowicz, A.; Wisniewski, L. Artificial Intelligence and Machine Learning Approaches for Indoor Air Quality Prediction: A Comprehensive Review of Methods and Applications. Energies 2025, 18, 5194. https://doi.org/10.3390/en18195194