Comparative Development of Machine Learning Models for Short-Term Indoor CO2 Forecasting Using Low-Cost IoT Sensors: A Case Study in a University Smart Laboratory
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Collection
2.2. Sensor Network and Hardware Implementation
2.3. Data Aggregation and Security
2.4. Feature Engineering and Dataset
2.5. Data Preprocessing
2.6. Modeling and Evaluation Framework
3. Results
3.1. Model Benchmarking (Single Split)
3.2. Model Performance Assessment via 5-Fold Cross-Validation
3.3. Fold-Based Performance via Walk-Forward Cross-Validation
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| AutoGluon | Automated Machine Learning Framework for Tabular Data |
| BLE | Bluetooth Low Energy |
| CI | Confidence Interval |
| CO2 | Carbon Dioxide |
| DCV | Demand-Controlled Ventilation |
| DNN | Deep Neural Network |
| GBM | Gradient Boosting Machine |
| HVAC | Heating, Ventilation, and Air Conditioning |
| IAQ | Indoor Air Quality |
| IoT | Internet of Things |
| KNN | K-Nearest Neighbors |
| MA | Moving Average |
| MA3 | 3 h Moving Average |
| MA6 | 6 h Moving Average |
| MAE | Mean Absolute Error |
| MAPE | Mean Absolute Percentage Error |
| MPC | Model Predictive Control |
| MQTT | Message Queuing Telemetry Transport |
| MSE | Mean Squared Error |
| NDIR | Non-Dispersive Infrared |
| PIR | Passive Infrared |
| PM2.5 | Particulate Matter with Diameter ≤ 2.5 µm |
| R2 | Coefficient of Determination |
| RBF | Radial Basis Function |
| RMSE | Root Mean Squared Error |
| SHAP | Shapley Additive exPlanations |
| Std3 | 3 h Rolling Standard Deviation |
| SVR | Support Vector Regression |
| TLS | Transport Layer Security |
| TVOC | Total Volatile Organic Compounds |
| UTC | Coordinated Universal Time |

| Rank | Model | R2 | MAE (ppm) | MSE (ppm²) | RMSE (ppm) | MAPE (%) |
|---|---|---|---|---|---|---|
| 1 | AutoGluon (Ensemble) | 0.9429 | 32.97 | 3496.77 | 59.13 | 4.18 |
| 2 | Random Forest | 0.9470 | 33.44 | 3248.87 | 56.99 | 4.41 |
| 3 | Hist. Gradient Boosting | 0.9372 | 34.74 | 3848.55 | 62.04 | 4.42 |
| 4 | Manual Stacking Ensemble | 0.9436 | 37.41 | 3456.76 | 58.79 | 5.24 |
| 5 | SVR (Support Vector Regression) | 0.9445 | 40.86 | 3403.27 | 58.34 | 6.45 |
| 6 | Gradient Boosting (GBM) | 0.9147 | 42.45 | 5232.59 | 72.34 | 5.51 |
| 7 | Linear Regression | 0.9330 | 43.31 | 4110.73 | 64.11 | 6.24 |
| 8 | KNN (K-Nearest Neighbors) | 0.8455 | 57.90 | 9472.54 | 97.33 | 7.28 |
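
The error columns in the table above (R2, MAE, MSE, RMSE, MAPE) follow the standard regression-metric definitions listed in the abbreviations. As a minimal sketch of how such values are typically computed, the Python function below evaluates all five from paired observations and predictions. The function name `regression_metrics` and the sample CO2 readings are hypothetical illustrations, not the authors' code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, RMSE and MAPE (%) for paired observations.

    Assumes y_true contains no zeros (MAPE divides by the observed value)
    and is not constant (R2 divides by the total sum of squares).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n                 # mean absolute error
    mse = sum(e * e for e in errors) / n                  # mean squared error
    rmse = math.sqrt(mse)                                 # same unit as the target
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined for a constant target")
    r2 = 1.0 - (mse * n) / ss_tot                         # 1 - SS_res / SS_tot

    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Hypothetical CO2 readings in ppm, for illustration only.
observed = [400.0, 500.0, 600.0]
predicted = [410.0, 490.0, 600.0]
print(regression_metrics(observed, predicted))
```

Because RMSE is the square root of MSE, the two columns can be cross-checked row by row; for example, the square root of 3496.77 is approximately 59.13, matching the first row.
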

| Model | R2 | MAE (ppm) | MSE (ppm²) | RMSE (ppm) | MAPE (%) |
|---|---|---|---|---|---|
| Standard Stacking Ensemble | 0.940 ± 0.03 | 34.37 ± 4.36 | 3515 ± 1004 | 58.57 ± 9.18 | 5.06 ± 0.72 |
| Linear Regression | 0.938 ± 0.02 | 35.73 ± 3.91 | 3639 ± 976 | 59.70 ± 8.68 | 5.28 ± 0.55 |
| SVR (Support Vector Regression) | 0.914 ± 0.01 | 30.54 ± 3.41 | 5515 ± 1905 | 73.28 ± 12.02 | 3.89 ± 0.50 |
| Random Forest | 0.918 ± 0.04 | 34.02 ± 5.35 | 4887 ± 1641 | 68.88 ± 11.92 | 4.50 ± 0.91 |
| Gradient Boosting (GBM) | 0.918 ± 0.03 | 37.52 ± 3.81 | 4924 ± 1023 | 69.77 ± 7.51 | 5.17 ± 0.72 |
| Hist. Gradient Boosting | 0.921 ± 0.01 | 33.34 ± 3.72 | 4874 ± 1107 | 69.41 ± 7.50 | 4.32 ± 0.66 |
| AutoGluon (Ensemble) | 0.921 ± 0.03 | 33.93 ± 5.12 | 4604 ± 1472 | 66.86 ± 11.56 | 4.59 ± 0.85 |
| KNN (K-Nearest Neighbors) | 0.875 ± 0.04 | 47.78 ± 3.60 | 7555 ± 1514 | 86.57 ± 8.94 | 6.52 ± 0.47 |

| Model | Fold | Train (%) | Test (%) | R2 | MAE (ppm) | MSE (ppm²) | RMSE (ppm) | MAPE (%) |
|---|---|---|---|---|---|---|---|---|
| AutoGluon (Ensemble) | Fold 1 | 23.8 | 15.3 | 0.774 | 90.28 | 24,715 | 157.21 | 11.33 |
| AutoGluon (Ensemble) | Fold 2 | 39.0 | 15.3 | 0.866 | 59.90 | 8084 | 89.91 | 8.04 |
| AutoGluon (Ensemble) | Fold 3 | 54.3 | 15.3 | 0.935 | 27.64 | 1570 | 39.62 | 5.17 |
| AutoGluon (Ensemble) | Fold 4 | 69.5 | 15.3 | 0.959 | 23.37 | 1660 | 40.74 | 3.42 |
| AutoGluon (Ensemble) | Fold 5 | 84.8 | 15.3 | 0.947 | 32.80 | 3476 | 58.96 | 4.22 |
| Random Forest | Fold 1 | 23.8 | 15.3 | 0.779 | 76.71 | 24,198 | 155.56 | 8.29 |
| Random Forest | Fold 2 | 39.0 | 15.3 | 0.928 | 42.14 | 4324 | 65.75 | 5.49 |
| Random Forest | Fold 3 | 54.3 | 15.3 | 0.949 | 18.91 | 1244 | 35.28 | 3.24 |
| Random Forest | Fold 4 | 69.5 | 15.3 | 0.942 | 22.97 | 2320 | 48.16 | 3.15 |
| Random Forest | Fold 5 | 84.8 | 15.3 | 0.945 | 34.93 | 3616 | 60.13 | 4.48 |
| Hist. Gradient Boosting | Fold 1 | 23.8 | 15.3 | 0.781 | 92.37 | 23,909 | 154.62 | 12.30 |
| Hist. Gradient Boosting | Fold 2 | 39.0 | 15.3 | 0.919 | 43.33 | 4865 | 69.75 | 6.02 |
| Hist. Gradient Boosting | Fold 3 | 54.3 | 15.3 | 0.957 | 18.91 | 1038 | 32.22 | 3.37 |
| Hist. Gradient Boosting | Fold 4 | 69.5 | 15.3 | 0.955 | 21.37 | 1804 | 42.47 | 2.97 |
| Hist. Gradient Boosting | Fold 5 | 84.8 | 15.3 | 0.935 | 33.66 | 4297 | 65.55 | 4.18 |
| Manual Stacking Ensemble | Fold 1 | 23.8 | 15.3 | 0.934 | 60.39 | 7176 | 84.71 | 8.74 |
| Manual Stacking Ensemble | Fold 2 | 39.0 | 15.3 | 0.835 | 82.00 | 9937 | 99.68 | 12.76 |
| Manual Stacking Ensemble | Fold 3 | 54.3 | 15.3 | 0.927 | 30.81 | 1772 | 42.10 | 6.05 |
| Manual Stacking Ensemble | Fold 4 | 69.5 | 15.3 | 0.931 | 32.91 | 2777 | 52.70 | 5.34 |
| Manual Stacking Ensemble | Fold 5 | 84.8 | 15.3 | 0.950 | 43.78 | 3292 | 57.38 | 6.77 |
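
The train and test shares in the walk-forward table above are consistent with an expanding-window scheme: each fold trains on all data up to a cut-off, tests on the next block of fixed size, and then absorbs that block into the following fold's training set (23.8% plus one 15.3% block gives roughly 39.0%, and so on). The sketch below generates such index splits in Python. It assumes a chronologically ordered dataset, an initial training share of 23.8%, and equal-sized test blocks; the function name `walk_forward_splits` and the sample count of 1000 are hypothetical, and this is not the authors' implementation.

```python
def walk_forward_splits(n_samples, n_folds=5, initial_train_frac=0.238):
    """Yield (train_indices, test_indices) for an expanding-window split.

    Indices are assumed to be in chronological order, so every test block
    lies strictly after the training window that precedes it.
    """
    if not 0.0 < initial_train_frac < 1.0:
        raise ValueError("initial_train_frac must be between 0 and 1")

    first_train = int(round(n_samples * initial_train_frac))
    test_size = (n_samples - first_train) // n_folds
    if first_train < 1 or test_size < 1:
        raise ValueError("not enough samples for the requested folds")

    for k in range(n_folds):
        train_end = first_train + k * test_size   # training window grows each fold
        test_end = train_end + test_size          # test block has constant size
        yield range(0, train_end), range(train_end, test_end)


# Hypothetical dataset length, for illustration only.
n = 1000
for fold, (train_idx, test_idx) in enumerate(walk_forward_splits(n), start=1):
    train_share = 100.0 * len(train_idx) / n
    test_share = 100.0 * len(test_idx) / n
    print(f"Fold {fold}: train {train_share:.1f}%, test {test_share:.1f}%")
```

Unlike shuffled k-fold cross-validation, this scheme never trains on observations recorded after the test block, which is why the first fold, with the smallest training window, tends to show the largest errors in the table.
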
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Baigarayeva, Z.; Boltaboyeva, A.; Kalpeyeva, Z.; Uskenbayeva, R.; Turmakhan, M.; Kakharov, A.; Anartayeva, A.; Moldagulova, A. Comparative Development of Machine Learning Models for Short-Term Indoor CO2 Forecasting Using Low-Cost IoT Sensors: A Case Study in a University Smart Laboratory. Algorithms 2026, 19, 328. https://doi.org/10.3390/a19050328

