Comparative Development of Machine Learning Models for Short-Term Indoor CO2 Forecasting Using Low-Cost IoT Sensors: A Case Study in a University Smart Laboratory

Baigarayeva, Zhanel; Boltaboyeva, Assiya; Kalpeyeva, Zhuldyz; Uskenbayeva, Raissa; Turmakhan, Maksat; Kakharov, Adilet; Anartayeva, Aizhan; Moldagulova, Aiman

doi:10.3390/a19050328


by Zhanel Baigarayeva ¹,²,³, Assiya Boltaboyeva ¹,²,³,*, Zhuldyz Kalpeyeva ¹, Raissa Uskenbayeva ¹, Maksat Turmakhan ¹,³,*, Adilet Kakharov ²,³, Aizhan Anartayeva ¹ and Aiman Moldagulova ¹

¹ Institute of Automation and Information Technology, Satbayev University, Almaty 050013, Kazakhstan

² Faculty of Information Technologies and Artificial Intelligence, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan

³ LLP “Kazakhstan R&D Solutions”, Almaty 050056, Kazakhstan

* Authors to whom correspondence should be addressed.

Algorithms 2026, 19(5), 328; https://doi.org/10.3390/a19050328

Submission received: 23 January 2026 / Revised: 20 April 2026 / Accepted: 22 April 2026 / Published: 24 April 2026

(This article belongs to the Special Issue Emerging Trends in Distributed AI for Smart Environments)


Abstract

Unlike reactive systems, mechanical ventilation controlled by CO₂ concentration operates at a target efficiency that dynamically increases whenever the target CO₂ level is exceeded. This approach eliminates the typical ‘dead-time’ and prevents air quality degradation by ensuring the system adjusts its performance immediately in response to concentration changes. This study focuses on the development and evaluation of data-driven predictive models for near-term indoor CO₂ forecasting that can be integrated into pre-occupancy ventilation strategies, rather than on designing a complete control scheme. Experimental data were collected over four months in a 48 m² smart laboratory configured as an open-plan office, where a heterogeneous IoT sensing architecture logged synchronized time-series measurements of CO₂ and microclimate variables (temperature, relative humidity, PM₂.₅, TVOCs), together with acoustic noise levels and appliance-level energy consumption used as indirect occupancy-related signals. Raw telemetry was transformed into a 22-feature state vector using a structured feature engineering method incorporating z-score standardization, cyclic time encodings, multi-horizon CO₂ lags, rolling statistics, momentum features, and non-linear interactions to represent temporal autocorrelation and daily periodicity. The study benchmarks multiple regression paradigms, including simple baselines and ensemble methods, and finds that an automated multi-level stacked ensemble achieved the highest predictive fidelity for short-term forecasting, with a Mean Absolute Error (MAE) of 32.97 ppm across an observed CO₂ range of 403–2305 ppm, representing improvements of approximately 24% and 43% over Linear Regression and K-Nearest Neighbors (KNN), respectively. Temporal diagnostics showed strong phase alignment with observed CO₂ rises during occupancy transitions and statistically reliable prediction intervals. 
Five-fold walk-forward cross-validation confirmed the temporal stability of these results, with top models achieving consistent R² values of 0.93–0.95 across Folds 2–5. These results demonstrate that, within a single-room university laboratory setting, historical sensor data from low-cost IoT devices can support accurate short-term CO₂ forecasting, providing a predictive layer that could support future proactive ventilation scheduling aimed at reducing CO₂ lag at the start of occupancy while avoiding unnecessary ventilation runtime. Generalization to other building types and occupancy profiles requires further validation.

Keywords:

indoor air quality (IAQ); carbon dioxide forecasting; machine learning (ML); Internet of Things (IoT); demand-controlled ventilation (DCV); feature engineering; smart buildings

1. Introduction

Decarbonizing the built environment has placed growing pressure on the fundamental tension between minimizing building energy consumption and safeguarding Indoor Air Quality (IAQ) [1,2]. Buildings account for approximately 32 percent of global energy demand and 34 percent of energy-related CO₂ emissions, with HVAC systems representing the single largest end-use, consuming 40–60 percent of total building energy. Any increase in outdoor air ventilation rate directly amplifies heating or cooling loads, creating an energy penalty that scales with airflow volume and the outdoor-to-indoor thermal gradient. Modern energy codes and post-pandemic guidelines, which mandate higher outdoor air fractions to mitigate pathogen transmission, have intensified this conflict [3,4]. The primary IAQ indicators within this context are indoor CO₂ concentration (a surrogate for per-person ventilation adequacy), Total Volatile Organic Compounds (TVOCs), and Relative Humidity (RH), whose deviation from the recommended 40–60 percent range promotes microbial growth and occupant discomfort [5,6]. Maintaining these parameters within acceptable thresholds while constraining ventilation energy constitutes the central challenge of contemporary building climate control [7].

The most widely adopted strategy for reconciling ventilation adequacy with energy conservation is CO₂-based Demand-Controlled Ventilation (DCV), which maintains a baseline outdoor air rate per ASHRAE Standard 62.1 and proportionally increases supply as CO₂ rises above the ambient baseline [8]. Despite reducing HVAC energy consumption by 20–50 percent [9,10], DCV is fundamentally reactive: compounding delays from sensor lag, control latency, and mechanical ramp-up create a “ventilation dead-time” of 5–20 min during which CO₂ can exceed 1000 ppm, causing measurable decrements in cognitive performance [11,12]. Schedule-based control fails in flexible, hybrid workspaces due to chronic over- or under-ventilation [13,14], while PIR-based occupancy detection eliminates only the measurement lag—the dominant mechanical and dilution delays persist [11,15]. No purely reactive strategy can eliminate dead-time; only a proactive, predictive approach that forecasts CO₂ dynamics in advance can allow the HVAC system to complete its ramp-up before occupancy onset.

The maturation of Machine Learning (ML) and the Internet of Things (IoT) has enabled predictive IAQ systems, with Random Forest and deep learning, particularly LSTM networks, emerging as the most effective model families [16]. Hybrid frameworks such as BO-EMD-LSTM have demonstrated MAE reductions exceeding 55 percent over standalone models [11,12,13,14,15,16,17], and architectures incorporating cyclic feature engineering and time-lag inputs further improve accuracy [18,19,20,21,22]. However, most high-performing models rely on expensive research-grade sensors and lack deployment under real-world operational conditions [23]. There is a pronounced scarcity of studies demonstrating accurate CO₂ forecasting with low-cost, consumer-grade IoT sensors, which suffer from environmental cross-sensitivity and demand advanced feature engineering [24,25,26]. Addressing this gap is essential for scalable, cost-effective IAQ improvements across the existing building stock.

The novelty of this study lies in the development and comparative evaluation of short-term CO₂ forecasting models using low-cost commercial IoT sensors in a real operational environment. This research focuses on constructing a modeling pipeline that integrates physics-informed feature engineering, multi-horizon autoregressive structures, cyclic temporal encoding, and ensemble learning. The study provides a rigorous benchmarking analysis across multiple regression paradigms under consistent evaluation conditions. Additionally, the work demonstrates that forecasting accuracy approaches the intrinsic measurement uncertainty of standard NDIR sensors, revealing a practical hardware-imposed performance ceiling. The other contribution of this research lies in the deliberate and successful utilization of noisy, low-cost, consumer-grade IoT sensors—specifically, Aqara TVOC Air Quality Monitor and Aqara Temperature and Humidity Sensor devices communicating via the Zigbee 3.0 protocol—as the sole data acquisition infrastructure. Unlike previous studies that assume clean, high-fidelity sensor data or employ expensive industrial instrumentation, this work directly confronts the data quality challenges of commercial IoT sensors, including measurement noise, quantization effects, and intermittent data transmission, and demonstrates that these challenges can be effectively overcome through advanced feature engineering rather than hardware upgrades. Importantly, the multi-modal framework mitigates the model’s blindness to contextual dynamics by incorporating noise and energy consumption as proxy indicators of occupancy-driven and unscheduled activities. These auxiliary signals provide visibility into irregular events and non-stationary schedules that characterize demand-controlled ventilation, enabling the model to anticipate CO₂ changes beyond simple autoregressive trends. 
The feature engineering pipeline combines cyclic time encodings of hour and day with multi-resolution CO₂ lag features, capturing both periodic occupancy behavior and short-term gas dynamics. This integration of temporal context, autoregressive memory, and contextual proxies allows forecasting from inherently noisy low-cost sensors. By bridging the gap between ML-driven predictive ventilation theory and affordable IoT deployment, this work offers a promising and potentially scalable approach for proactive IAQ management, subject to validation across additional building types and occupancy profiles. The proposed predictive framework provides the foundation to shift future HVAC operation from reactive threshold control to anticipatory scheduling, ensuring high indoor air quality at occupancy onset without unnecessary continuous over-ventilation.

2. Materials and Methods

2.1. Data Collection

The experimental measurements for this study were conducted over a continuous four-month period in a 48 m² smart laboratory at Al-Farabi Kazakh National University. Regarding its primary operational use, the facility functions as both a dedicated research laboratory and an actively used open-plan office. It features eight functional workstations, ceiling-mounted LED luminaires, and a comprehensive HVAC suite. The experimental space is serviced by a centralized Variable Air Volume (VAV) HVAC system equipped with CO₂-based Demand-Controlled Ventilation (DCV). To isolate the performance of the predictive modeling framework and eliminate the unpredictable effects of natural infiltration, all windows remained strictly closed during the entire data collection period; thus, no passive ventilation was utilized. Under standard reactive operation, the mechanical ventilation system does not completely shut off but continuously maintains a baseline outdoor air flow rate of 50 m³/h to dilute background emissions such as TVOCs. When the indoor CO₂ concentration exceeds the predefined target threshold of 800 ppm, the system dynamically ramps up its target efficiency, increasing the outdoor air supply proportionally up to a maximum design capacity of 250 m³/h. The room typically accommodates up to 8 occupants during standard working hours (09:00 to 18:00), which directly drives the rapid CO₂ accumulation observed in the dataset. As illustrated in Figure 1, the cyber-physical architecture integrates a heterogeneous cluster of sensing and metering devices to capture the dynamic interplay between indoor microclimate conditions, occupant behavior, and energy consumption.
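The proportional ramp described above can be illustrated with a minimal sketch. The baseline flow (50 m³/h), maximum flow (250 m³/h), and 800 ppm threshold come from the text; the concentration at which the maximum flow is reached (`full_ppm`) is not reported and is a purely hypothetical value chosen for illustration, as is the linear shape of the ramp.

```python
def outdoor_air_flow(co2_ppm, threshold=800.0, full_ppm=1200.0,
                     base=50.0, max_flow=250.0):
    """Outdoor air flow (m3/h) for a linear CO2-proportional ramp.

    full_ppm is a hypothetical assumption: the source does not state the
    concentration at which the system reaches its design capacity.
    """
    if co2_ppm <= threshold:
        # At or below the threshold only the baseline flow is supplied.
        return base
    # Fraction of the ramp covered, capped at 1.0 (design capacity).
    fraction = min((co2_ppm - threshold) / (full_ppm - threshold), 1.0)
    return base + fraction * (max_flow - base)


flows = [outdoor_air_flow(c) for c in (600.0, 1000.0, 2000.0)]
```

Under these assumed parameters the flow stays at the baseline below the threshold and saturates at the design capacity for high concentrations.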

2.2. Sensor Network and Hardware Implementation

To ensure high-fidelity data acquisition suitable for machine learning, the physical placement and accuracy of the sensor network were rigorously configured. The primary environmental parameters were captured using a distributed network of commercial IoT devices. Specifically, four Aqara Temperature and Humidity T1 sensors were mounted at occupant desk height to measure temperature (accuracy: ±0.3 °C), relative humidity (±3%), and barometric pressure (±0.12 kPa). One Aqara TVOC Air Quality Monitor was installed on a central column to monitor TVOC concentrations (±0.01 mg m⁻³). To address potential concerns regarding the reliability of low-cost commercial sensors, the manufacturer-reported accuracies were empirically validated in-house through periodic gas-zero and salt-bath calibration checks, ensuring research-grade data integrity. Furthermore, to ensure high-fidelity occupancy detection—a critical parameter for demand-controlled ventilation—the sensing layer utilized the Aqara Presence FP2 sensor. Unlike traditional passive infrared (PIR) sensors, the FP2 employs millimeter-wave (mmWave) radar technology, enabling the identification of both moving and stationary occupants across multiple defined zones. This capability ensures that HVAC and lighting systems remain active even when users are present but motionless. Complementing this, Aqara Motion Sensors were deployed to detect rapid entry/exit events, while Aqara Door and Window sensors monitored the physical state of room boundaries to automatically suspend air conditioning during natural ventilation. Infrastructure safety and energy dynamics were further resolved using Aqara Water-Leak sensors and Zigbee 3.0 Smart Plugs, respectively, providing minute-resolution data on power consumption. No additional standalone air purification or filtration devices were used during the monitoring period beyond the HVAC system configuration.

2.3. Data Aggregation and Security

The central nervous system of this infrastructure is a Raspberry Pi 4 running the Home Assistant platform, which functions as a unified gateway and time-series logger. This gateway normalizes telemetry from diverse protocols—including Zigbee, Bluetooth Low Energy (BLE), and MQTT—into a standardized format. To guarantee rigorous temporal alignment, every measurement is stamped with Coordinated Universal Time (UTC) and stored in a local InfluxDB time-series database. Device clocks were strictly synchronized via an on-premises NTP server every 10 min, maintaining a worst-case drift below 10 ms. Raw measurements were sampled continuously at 1 Hz over the four-month campaign. Security was prioritized by encrypting all MQTT traffic, specifically from the Winsen air quality modules, using Transport Layer Security (TLS).

2.4. Feature Engineering and Dataset

Following data collection, the raw telemetry provided an initial base dataset of 16 primary features, integrating instantaneous environmental metrics (humidity, temperature, PM₂.₅, TVOC), acoustic indicators, and energy data. However, to construct the final dataset required for the Machine Learning Layer, this raw data underwent a rigorous feature engineering process. To capture the cyclical nature of building operations without introducing numerical discontinuities, temporal data were encoded using trigonometric functions: the sine and cosine of the hour and the sine and cosine of the day. Furthermore, to account for thermal inertia, the dataset incorporated lagged variables, including historical CO₂ concentrations at 1 h, 3 h, and 24 h intervals, alongside previous-state values for noise, humidity, and temperature. Through this mathematical expansion, the initial 16 raw features were transformed into the final 22-feature forecasting-oriented state vector. The selection of this final feature set was guided by three explicit physical and operational criteria. First, CO₂ lag terms at 1 h, 3 h, and 24 h were selected to represent immediate autocorrelation, the thermodynamic time constant of the room’s air volume, and daily occupancy periodicity, an approach consistent with established autoregressive feature design for indoor gas forecasting [27]. Second, cyclic sine-cosine encodings of hour and day were chosen because standard integer encoding of time introduces artificial discontinuities at period boundaries, which distort the model’s perception of schedule continuity [28]. Third, acoustic noise level and appliance energy consumption were retained as proxy features because direct occupancy labeling was unavailable, and prior work has demonstrated that non-intrusive ambient signals can reliably represent occupancy intensity to support ventilation scheduling [29]. 
The necessity and relative contribution of each feature group were subsequently confirmed empirically through the ablation study reported in Section 3. This resulted in a high-quality dataset of 1602 complete hourly records.

2.5. Data Preprocessing

The transformation of raw, high-frequency sensor telemetry into a predictive model-ready format required the development of a rigorous preprocessing pipeline designed to address the inherent stochasticity and temporal disconnects characteristic of real-world environmental data streams [30]. While the raw time-series data provides the foundational state of the environment, it inherently lacks the explicit temporal context required for high-precision forecasting. To bridge this gap, the initial base dataset of 16 raw physical features was mathematically expanded and transformed into a structured 22-feature state vector (X_t) specifically engineered to capture system inertia. Since real-world sensor networks are prone to transmission artifacts, missing data points were initially handled using a forward-fill imputation strategy [31]. However, to prevent the propagation of artificial static trends, a strict temporal threshold was applied: continuous data gaps exceeding two hours were explicitly discarded rather than imputed. This filtering step ensures that the model is not trained on synthetic ‘plateaus’ that contradict the dynamic physical volatility of gas diffusion, as recommended in air quality monitoring protocols [32]. Subsequently, to ensure numerical stability during the training of gradient-based algorithms, all continuous variables underwent z-score standardization, z = (x − μ)/σ, where μ and σ denote the mean and standard deviation of each variable, as this normalization step is critical for minimizing convergence time in optimization landscapes [33].
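The gap-limited imputation and the standardization step can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes an hourly series in which missing readings are marked as `None`, treats a gap of up to two consecutive hours as fillable, and uses the population standard deviation. The function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def forward_fill_limited(values, max_gap):
    """Forward-fill runs of None no longer than max_gap; longer runs stay None."""
    filled = list(values)
    i, n = 0, len(filled)
    while i < n:
        if filled[i] is None:
            j = i
            while j < n and filled[j] is None:
                j += 1                      # j is the first index after the gap
            if i > 0 and (j - i) <= max_gap:
                for k in range(i, j):
                    filled[k] = filled[i - 1]
            i = j
        else:
            i += 1
    return filled


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


co2 = [600.0, None, None, 640.0, None, None, None, 700.0]  # hypothetical readings
filled = forward_fill_limited(co2, max_gap=2)
kept = [v for v in filled if v is not None]   # rows in over-long gaps are dropped
z = zscore(kept)
```

In the example, the two-hour gap is filled with the last valid reading, whereas the three-hour gap is left empty and its rows are discarded before standardization.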

To address the cyclical nature of building operations, we fundamentally restructured the temporal representation of the dataset. Recognizing that standard linear numerical encoding introduces artificial discontinuities (e.g., the numerical jump between 23:00 and 00:00), we projected the temporal components onto a continuous manifold. The hour of the day and day of the week were transformed into four trigonometric variables (h_{\sin}, h_{\cos}, d_{\sin}, d_{\cos}) using sine and cosine functions:

h_{\sin} = \sin\left(\frac{2 \pi h}{24}\right)

(1)

h_{\cos} = \cos\left(\frac{2 \pi h}{24}\right)

(2)

d_{\sin} = \sin\left(\frac{2 \pi d}{7}\right)

(3)

d_{\cos} = \cos\left(\frac{2 \pi d}{7}\right)

(4)

where h represents the hour of the day, and d represents the day of the week, while the constants 24 and 7 correspond to the full daily and weekly periodic cycles, respectively.

This continuous encoding, widely validated in cyclical feature engineering literature [34], enables the regression algorithms to seamlessly learn recurrent daily patterns without boundary effects.
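A minimal sketch of Equations (1)–(4) shows why the encoding removes the boundary effect: on the unit circle, 23:00 is as close to 00:00 as 01:00 is, whereas the raw integers 23 and 0 are 23 units apart. The helper name `cyclic_encode` is illustrative only.

```python
import math


def cyclic_encode(value, period):
    """Map a periodic quantity onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


h00 = cyclic_encode(0, 24)
h01 = cyclic_encode(1, 24)
h12 = cyclic_encode(12, 24)
h23 = cyclic_encode(23, 24)

# Euclidean distances between encoded hours.
d_23_00 = math.dist(h23, h00)   # one hour apart across midnight
d_01_00 = math.dist(h01, h00)   # one hour apart within the day
d_12_00 = math.dist(h12, h00)   # half a day apart

# The same helper with period=7 covers the day-of-week terms.
sunday_like = cyclic_encode(6, 7)
```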

Building upon this temporal framework, we constructed the core feature set by aggregating seven fundamental raw inputs: Carbon Dioxide (CO₂), Temperature, Humidity, PM₂.₅, Total Volatile Organic Compounds (TVOC), Noise Level, and Energy Consumption. However, since instantaneous readings often fail to capture the diffusive inertia of gas dynamics, a comprehensive set of Lagged Variables was introduced. We explicitly engineered three autoregressive terms for CO₂ at intervals of 1 h, 3 h, and 24 h. The 3 h lag was selected empirically to correlate with the thermodynamic time constant of the laboratory’s air volume, allowing the model to account for the physical delay between occupancy onset and peak gas saturation [35]. This acts alongside the 1 h (immediate autocorrelation) and 24 h (daily seasonality) lags to endow the model with a comprehensive “memory” of environmental states [36].

Finally, to enhance model sensitivity to micro-trends, we engineered a set of advanced statistical features. Specifically, 3 h and 6 h Moving Averages (MA₃, MA₆) were calculated to filter high-frequency sensor jitter and isolate the underlying accumulation trajectory [37]. Additionally, a Rolling Standard Deviation (Std₃) was computed to quantify environmental volatility. To complete the state vector, a Momentum feature

\Delta \mathrm{CO}_{2} = \mathrm{CO}_{2}(t) - \mathrm{CO}_{2}(t - k)

was introduced to capture the velocity of the gas diffusion process. In this formulation, k denotes the time-lag interval (where k = 1 h). By explicitly modeling the rate of change, this feature allows the architecture to anticipate rapid surges in occupancy before they reach critical saturation levels [38].
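The lag, rolling, and momentum features can be sketched on a short hourly series. This is an illustrative sketch under stated assumptions: the windows are trailing and end at the current hour t (the text does not say whether the original pipeline shifts them by one step relative to the forecast target), the rolling standard deviation is the sample estimate, and all names and values are hypothetical.

```python
from statistics import mean, stdev


def lag(series, k):
    """Value observed k steps earlier; None where no history exists."""
    return [None] * k + list(series[: len(series) - k])


def rolling(series, window, fn):
    """Apply fn to a trailing window ending at each step t."""
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)            # not enough history yet
        else:
            out.append(fn(series[t + 1 - window: t + 1]))
    return out


def momentum(series, k=1):
    """Difference CO2(t) - CO2(t - k)."""
    return [None if t < k else series[t] - series[t - k]
            for t in range(len(series))]


co2 = [420.0, 430.0, 450.0, 500.0, 560.0, 610.0]   # hypothetical hourly ppm
features = {
    "co2_lag1": lag(co2, 1),
    "co2_lag3": lag(co2, 3),
    "co2_ma3": rolling(co2, 3, mean),
    "co2_std3": rolling(co2, 3, stdev),
    "co2_momentum": momentum(co2, k=1),
}
```

Rows whose features are `None` lack sufficient history and would be dropped before training.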

2.6. Modeling and Evaluation Framework

To develop a robust and operationally viable inference engine capable of supporting pre-occupancy ventilation control with high reliability, a comparative study was meticulously designed to evaluate three distinct modeling paradigms, ranging from simple baseline regressions to complex, computationally intensive automated ensembles. This hierarchical approach was necessary to investigate the trade-off between computational complexity and predictive accuracy, specifically in the context of non-linear environmental dynamics. Initially, a diverse set of standalone algorithms was implemented to establish performance benchmarks for standard HVAC control logic and to determine the baseline predictability of the dataset. Linear Regression served as the primary baseline to rigorously test if the microclimate dynamics are linearly separable, providing a reference point for the complexity of the underlying function [39]. K-Nearest Neighbors was employed to capture local similarities in the multi-dimensional state vector, relying on the manifold assumption that similar past environmental states yield similar future outcomes [40]. Additionally, Support Vector Regression with a Radial Basis Function (RBF) kernel was utilized due to its theoretical ability to project input vectors into a higher-dimensional space where non-linear boundaries between occupancy states become linearly separable [41]. Furthermore, tree-based ensembles, specifically Random Forest [42] and Gradient Boosting [43], were deployed to test the effectiveness of decision tree architectures in handling the non-linear inertia of air quality without the need for extensive parameter tuning, utilizing the mechanisms of bagging and boosting to reduce variance and bias respectively.

To strictly test the efficacy of traditional ensemble methods for control applications, a Manual Stacking Regressor was constructed as a benchmark architecture. This model aggregates predictions from two powerful base learners—Random Forest and Gradient Boosting—using a Ridge Regression meta-learner [44]. The strategic choice of Ridge Regression as the aggregator allows for a robust linear combination of the tree-based outputs, ensuring that the meta-model does not overfit to the noise of the base learners. The objective function minimized by the meta-learner is defined as:

J(w) = \sum_{i=1}^{n} \left( y_{i} - w^{T} \hat{y}_{i} \right)^{2} + \lambda \| w \|_{2}^{2}

(5)

where \hat{y}_{i} represents the vector of predictions from the base models, w is the weight vector, and \lambda is the regularization parameter. This L2 regularization term (\lambda \| w \|_{2}^{2}) is critical, as it penalizes extreme weights and ensures a stable fusion of the deterministic tree-based models, verifying whether a regularized combination offers sufficient precision for ventilation setpoint optimization compared to individual learners.
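For two base learners, the minimizer of the objective in Equation (5) has a closed form obtained from the normal equations (PᵀP + λI)w = Pᵀy, where the columns of P hold the base-model predictions. The sketch below solves this 2×2 system directly. It follows Equation (5) literally, so no intercept is fitted, which may differ from library defaults. The prediction values are hypothetical, not taken from the study.

```python
def ridge_meta_weights(preds_a, preds_b, y, lam):
    """Solve (P^T P + lam*I) w = P^T y for a two-column prediction matrix P."""
    s_aa = sum(a * a for a in preds_a) + lam
    s_bb = sum(b * b for b in preds_b) + lam
    s_ab = sum(a * b for a, b in zip(preds_a, preds_b))
    r_a = sum(a * t for a, t in zip(preds_a, y))
    r_b = sum(b * t for b, t in zip(preds_b, y))
    det = s_aa * s_bb - s_ab * s_ab
    # Cramer's rule for the 2x2 system.
    w_a = (r_a * s_bb - r_b * s_ab) / det
    w_b = (r_b * s_aa - r_a * s_ab) / det
    return w_a, w_b


def objective(w, preds_a, preds_b, y, lam):
    """J(w): squared error of the blended prediction plus the L2 penalty."""
    w_a, w_b = w
    sse = sum((t - (w_a * a + w_b * b)) ** 2
              for a, b, t in zip(preds_a, preds_b, y))
    return sse + lam * (w_a ** 2 + w_b ** 2)


# Hypothetical base-model predictions (ppm) and observed targets.
y = [600.0, 800.0, 1000.0, 1200.0]
rf = [610.0, 790.0, 1020.0, 1180.0]
gb = [590.0, 815.0, 990.0, 1210.0]
lam = 1.0

w = ridge_meta_weights(rf, gb, y, lam)
j_opt = objective(w, rf, gb, y, lam)
```

Because J(w) is strictly convex for λ > 0, any perturbation of the returned weights increases the objective, which offers a simple check of the solution.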

However, to overcome the inherent limitations of manual model selection and to minimize the operational risk of control errors, we utilized the AutoGluon AutoML framework, which automatically generated a stacked ensemble model [45]. This architecture provides the architectural depth needed for pre-occupancy forecasting by utilizing a base layer of heterogeneous models trained with k-fold bagging, including LightGBM, CatBoost, and Random Forest. The ensemble included Deep Neural Networks as base learners, contributing non-linear modeling capabilities [46]. The final prediction is generated by a weighted ensemble layer, which dynamically assigns importance to the most reliable models according to their validation performance. This ensures that the control signal remains stable even if individual base models fluctuate due to sensor noise or outliers. The resulting architecture combines these heterogeneous learners to provide the precision necessary for real-time control [47].

To rigorously quantify the operational precision of the developed architectures, we employed a multi-metric evaluation approach. The final 20% of the temporal dataset was reserved strictly for testing using a hold-out strategy [48]. We defined five key performance indicators to assess model fidelity from different perspectives:

R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}

(6)

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} | y_{i} - \hat{y}_{i} |

(7)

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}

(8)

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(9)

\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right|

(10)
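The five indicators in Equations (6)–(10) can be computed in a few lines. The following sketch uses only the standard library; the three observation/prediction pairs are hypothetical and serve only to make the arithmetic easy to follow.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, RMSE and MAPE (in percent) for paired sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAPE": 100.0 / n * sum(abs(e / t) for e, t in zip(errors, y_true)),
    }


# Hypothetical CO2 values in ppm: errors are -20, +30 and -10.
m = regression_metrics([500.0, 800.0, 1100.0], [520.0, 770.0, 1110.0])
```

For these values the MAE is 20 ppm and the MSE is 1400/3 ≈ 466.7 ppm², illustrating how the single 30 ppm error weighs more heavily in the squared metric.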

We prioritized the Mean Absolute Error (MAE) as the primary selection criterion, as it provides a linear representation of the average prediction error in parts per million (ppm), directly corresponding to physical ventilation thresholds [5]. However, to assess the model’s sensitivity to large outliers, we concurrently analyzed the Mean Squared Error (MSE). Furthermore, to ensure the model is not a “black box” and to validate the physical consistency of its decision-making process, we integrated the SHAP (SHapley Additive exPlanations) framework. Unlike standard feature importance methods (e.g., Gini impurity), which can be biased towards high-cardinality features, SHAP values guarantee consistency and local accuracy based on cooperative game theory. The contribution \phi_{j} of each feature j is calculated as the weighted average of its marginal contributions across all possible feature coalitions [47]:

\phi_{j} = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|! \, (|F| - |S| - 1)!}{|F|!} \left[ f(S \cup \{j\}) - f(S) \right]

(11)

where F is the full feature set, S is a coalition of features that excludes j, and f(S) is the model output obtained using only the features in S.

This interpretability layer is critical for verifying that the model relies on causal environmental factors (e.g., accumulation trends) rather than spurious correlations.
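The Shapley weighting in Equation (11) can be made concrete with a brute-force enumeration over all coalitions. This is only feasible for a handful of features; practical SHAP implementations rely on faster exact or approximate algorithms. The value function below is a hypothetical toy (additive contributions plus one interaction), not the trained model, and the feature names are illustrative.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value_fn):
    """Exact Shapley values by enumerating every coalition S of F \\ {j}."""
    n = len(features)
    phi = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (|F| - |S| - 1)! / |F|! depends only on |S|.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {j}) - value_fn(s))
        phi[j] = total
    return phi


def toy_value(coalition):
    """Hypothetical model output for a coalition of known features."""
    base = {"co2_ma3": 40.0, "co2_lag1": 15.0, "noise": 5.0}
    v = sum(base[f] for f in coalition)
    if "noise" in coalition and "co2_lag1" in coalition:
        v += 4.0                     # interaction shared between the two
    return v


names = ["co2_ma3", "co2_lag1", "noise"]
phi = shapley_values(names, toy_value)
```

The interaction term is split equally between the two features involved, and the contributions sum to the value of the full coalition, reflecting the local accuracy property mentioned above.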

The error distribution approximates a Gaussian (normal) curve centered near zero, defined by the probability density function:

P(x) = \frac{1}{\sigma \sqrt{2 \pi}} e^{- \frac{(x - \mu)^{2}}{2 \sigma^{2}}}

(12)

Statistical tests on the residuals reveal a near-zero Skewness (−0.02) and a Kurtosis of 2.8, indicating a slightly platykurtic distribution with thin tails. This implies that extreme prediction errors are rare. As shown in Figure 2, the majority of errors fall within the narrow range of ±50 ppm, which is comparable to the inherent hardware measurement uncertainty of standard NDIR CO₂ sensors, suggesting that the model approaches the practical accuracy limit imposed by the sensing instrumentation.
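The residual diagnostics can be reproduced with moment-based estimators. The sketch assumes the Pearson convention for kurtosis (a normal distribution scores 3), which is consistent with a value of 2.8 being described as platykurtic; the exact estimator used in the study is not stated. The residual values are hypothetical.

```python
def skewness_kurtosis(residuals):
    """Moment-based skewness and Pearson kurtosis (normal distribution = 3)."""
    n = len(residuals)
    mu = sum(residuals) / n
    m2 = sum((r - mu) ** 2 for r in residuals) / n
    m3 = sum((r - mu) ** 3 for r in residuals) / n
    m4 = sum((r - mu) ** 4 for r in residuals) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def share_within(residuals, bound):
    """Fraction of residuals whose absolute value does not exceed bound."""
    return sum(1 for r in residuals if abs(r) <= bound) / len(residuals)


# Hypothetical symmetric residuals: skewness is exactly zero.
skew, kurt = skewness_kurtosis([-2.0, -1.0, 0.0, 1.0, 2.0])

# Hypothetical residuals in ppm checked against the +/-50 ppm band.
share = share_within([-60.0, -10.0, 0.0, 20.0, 55.0], 50.0)
```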

To further strengthen the evaluation framework and ensure that reported performance metrics reflect temporally consistent model behavior rather than the outcome of a single favorable data partition, a 5-fold walk-forward cross-validation strategy was additionally employed using scikit-learn’s TimeSeriesSplit. Unlike standard k-fold cross-validation, which partitions data randomly without regard to temporal ordering, this approach strictly preserves the chronological structure of the time series by ensuring that each training fold contains only observations that precede the corresponding test fold. The training set expands progressively across folds, from 23.8% of the total data in Fold 1 to 84.8% in Fold 5, while the test set remains fixed at approximately 15.3% per fold. To prevent data leakage at the feature level, all temporal features, including lag variables, rolling means, and rolling standard deviations, were recomputed independently within each fold using only the available training portion, ensuring that no future information influenced feature construction at any stage of the validation process. This evaluation strategy was applied to the four best-performing models identified through the initial benchmarking analysis, with results reported in Section 3.3.
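The expanding-window logic can be sketched without any library dependency. The function below mirrors the general behavior of a walk-forward splitter with a fixed test size. The test size of 245 records is an assumption chosen to approximate the reported 15.3% share of 1602 records, so the resulting fold boundaries are illustrative and not guaranteed to match those used in the study.

```python
def walk_forward_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) with an expanding training window."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_splits):
        start = first_test_start + fold * test_size
        # Training data always precede the test block chronologically.
        yield range(0, start), range(start, start + test_size)


folds = list(walk_forward_splits(n_samples=1602, n_splits=5, test_size=245))
train_shares = [round(100.0 * len(train) / 1602, 1) for train, _ in folds]
```

Lag and rolling features would then be recomputed inside each fold from the training indices only, as described above, so that no test-period information enters feature construction.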

3. Results

3.1. Model Benchmarking (Single Split)

The consolidated performance metrics for all evaluated algorithms are presented in Table 1. The quantitative analysis reveals a distinct performance hierarchy in which the AutoGluon ensemble emerged as the superior architecture for accurate short-term forecasting, achieving the lowest MAE of 32.97 ppm and a Mean Absolute Percentage Error (MAPE) of just 4.18%. Interestingly, the Random Forest model exhibited a marginally superior Coefficient of Determination (R² = 0.9470) and a lower Mean Squared Error (MSE = 3248.87). This discrepancy indicates that while Random Forest provides a tighter statistical fit regarding squared residuals (penalizing large outliers more heavily), the AutoGluon Ensemble is more effective at minimizing the average absolute deviation (MAE). Since future proactive ventilation strategies depend linearly on concentration thresholds, the Ensemble’s superiority in MAE makes it the preferred choice for this specific application [48]. Furthermore, the ensemble approach outperformed the Linear Regression model by approximately 24% and showed a 43% improvement over the instance-based K-Nearest Neighbors model (MAE = 57.90).
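Two short calculations help interpret these figures. The first reproduces the relative MAE improvement over KNN from the two values quoted in the text. The second uses hypothetical residuals (not the study's) to show how one model can rank first on MAE while another ranks first on MSE: a single large error is penalized quadratically by MSE but only linearly by MAE.

```python
def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    return sum(e * e for e in errors) / len(errors)


# Relative improvement of the ensemble (32.97 ppm) over KNN (57.90 ppm).
knn_improvement_pct = 100.0 * (57.90 - 32.97) / 57.90

# Hypothetical residuals in ppm for two models on the same four samples.
uniform_errors = [30.0, 30.0, 30.0, 30.0]    # consistently moderate errors
spiky_errors = [10.0, 10.0, 10.0, 70.0]      # mostly small, one large miss

ranking = {
    "uniform": (mae(uniform_errors), mse(uniform_errors)),   # (30.0, 900.0)
    "spiky": (mae(spiky_errors), mse(spiky_errors)),         # (25.0, 1300.0)
}
```

In this toy case the "spiky" model wins on MAE and the "uniform" model wins on MSE, the same qualitative pattern as the AutoGluon and Random Forest results above.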

While Table 1 demonstrates tight error margins, in industrial HVAC practice, acceptable CO₂ margins are often relatively broad [5]. The primary value of the proposed model lies not merely in minimizing numerical error, but in its contextual awareness of the events driving CO₂ fluctuations. A common limitation of predictive models is that they blindly follow measured CO₂ data without understanding the context. However, by integrating multi-modal telemetry, specifically noise and energy consumption, our model gains visibility into the irregular events and non-stationary schedules that represent the core challenge of demand-controlled ventilation.

To rigorously evaluate the predictive models, the filtered dataset of 1602 hourly records (approximately 67 days) was chronologically partitioned using an 80/20 holdout validation strategy. Specifically, the model was trained on the first 80% of the data, which corresponds to 1281 h or approximately 53 days of continuous observation. The evaluation results and performance metrics (Table 1) were calculated on the remaining 20% (321 h, or roughly 13 days) of unseen testing data. To assess the model’s behavior under dynamic load conditions, we analyzed the temporal alignment between the predicted values and the actual sensor telemetry. As illustrated in Figure 3, the forecasting results display a continuous sequence of approximately 11 days extracted directly from this unseen testing set. This graphical analysis demonstrates a high degree of phase synchronization: the predicted trajectory (Red) closely follows the rising slopes of the actual measurements (Blue), indicating that the model correctly identifies the onset of accumulation phases triggered by human presence. Following these accumulation phases, the data also exhibits a rapid decrease in CO₂ concentration. It is important to contextualize this steep clearance rate: it is a direct physical consequence of the experimental room’s ventilation strategy. The 48 m² space relies entirely on a centralized Variable Air Volume (VAV) system, as all windows were kept strictly closed to eliminate unquantifiable passive ventilation. While the room typically accommodates a peak level of 8 occupants—driving the rapid CO₂ rise—the mechanical ventilation dynamically ramps up from its 50 m³/h baseline to a maximum design capacity of 250 m³/h once the 800 ppm threshold is exceeded. This peak extraction capacity efficiently and rapidly flushes the accumulated gas from the compact room volume, resulting in the sharp downward trajectories that the predictive model successfully tracks. 
We further conducted a Diurnal Error Profiling analysis, which revealed that the highest residuals occur during the “morning transient phase” (08:00–10:00), where the derivative of concentration, $\frac{dC}{dt}$, is maximal. However, even in these high-velocity transition zones, the model maintains an error within safe operational limits. Crucially, in the steady-state high-concentration zones exceeding 1000 ppm, the model exhibits a slight smoothing effect, acting as a low-pass filter that ignores high-frequency sensor noise while preserving the global trend. We further calculated the 95% Confidence Interval (CI) for the predictions ($CI = \hat{y} \pm 1.96 \times SE$), confirming that 96.4% of the actual data points fall within the predicted confidence intervals.
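The empirical coverage check behind the 96.4% figure can be illustrated with a short sketch. The text does not state how SE was obtained, so the code below assumes a single standard error estimated as the sample standard deviation of the residuals; the function name and the toy values are hypothetical and are not the study's data.

```python
import statistics


def interval_coverage(y_true, y_pred, se, z=1.96):
    """Fraction of observations that fall inside y_pred ± z * se."""
    inside = sum(1 for y, p in zip(y_true, y_pred) if abs(y - p) <= z * se)
    return inside / len(y_true)


# Toy CO2 values in ppm, for illustration only.
y_true = [410.0, 455.0, 620.0, 880.0, 1010.0, 760.0, 520.0, 430.0]
y_pred = [420.0, 470.0, 600.0, 850.0, 1100.0, 775.0, 515.0, 425.0]

residuals = [y - p for y, p in zip(y_true, y_pred)]
se = statistics.stdev(residuals)  # assumed definition of SE
coverage = interval_coverage(y_true, y_pred, se)
print(f"SE = {se:.1f} ppm, coverage = {coverage:.1%}")
```

A coverage close to the nominal 95% suggests that the interval width is neither too optimistic nor too conservative for the evaluated data.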

The feature importance analysis, conducted using the SHAP framework described in Section 2.3, confirms the physical validity of the model. As visualized in the SHAP summary plot (Figure 4), the rolling statistical feature CO2_MA3 (3 h Moving Average) is the undisputed dominant predictor. This ranking is physically significant: it indicates that the model prioritizes the smoothed accumulation trend over immediate, potentially noisy sensor readings (CO2_Lag1). By relying on the 3 h average, the architecture effectively filters out transient fluctuations caused by local air turbulence, focusing instead on the stable, inertial rise in concentration that characterizes true occupancy events. To empirically quantify the necessity of these features, we performed a post hoc Ablation Study (summarized in Figure 5). The removal of the Lagged Variables group resulted in a massive performance degradation, increasing the MAE from 32.97 ppm to 127.27 ppm (a 286.0% degradation), confirming that system inertia is the primary driver of predictability. Furthermore, the removal of the Noise and Energy group caused a dramatic performance degradation (spiking the MAE to 80.12 ppm, a 143.0% degradation), specifically during peak occupancy hours. This empirically demonstrates that the model utilizes these variables not just as secondary features, but as crucial contextual indicators of irregular events. Standard reactive systems cannot distinguish between routine fluctuations and event-driven concentration changes. Our model addresses this limitation by using acoustic and energy spikes as immediate contextual triggers to interpret the causes behind dynamic CO₂ increases [49]. The SHAP analysis (Figure 4) further demonstrates that high values of energy and noise actively drive the model to anticipate CO₂ accumulation, effectively capturing the non-stationary schedules inherent to real-world environments. 
Taken together, the ablation results (Figure 5) indicate that, without these contextual triggers, the model accounts for occupancy-driven dynamics far less accurately.
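The degradation percentages quoted for the ablation study follow from the relative increase in MAE over the full-feature baseline. The short check below reproduces that arithmetic from the MAE values reported in the text; the function name `relative_degradation` is introduced here for illustration only.

```python
def relative_degradation(baseline_mae, ablated_mae):
    """Percentage increase in MAE relative to the full-feature baseline."""
    return (ablated_mae - baseline_mae) / baseline_mae * 100.0


BASELINE_MAE = 32.97  # ppm, full feature set
ablations = {
    "lagged variables removed": 127.27,   # ppm
    "noise and energy removed": 80.12,    # ppm
}

for name, mae in ablations.items():
    print(f"{name}: {relative_degradation(BASELINE_MAE, mae):.1f}%")
# lagged variables removed: 286.0%
# noise and energy removed: 143.0%
```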

Finally, a detailed diagnostic evaluation confirmed the statistical reliability of the predictions. Figure 6 shows that data points cluster tightly along the identity line (y = x) throughout the full operational range of 400–1600 ppm, indicating no saturation effects. Further analysis of the residuals (Figure 7) reveals that the error distribution approximates a Gaussian curve centered near zero. The residuals have near-zero skewness (0.02) and a kurtosis of 2.8, slightly below the Gaussian value of 3, indicating a mildly platykurtic distribution with thin tails. This implies that extreme prediction errors are rare [50]. The majority of errors fall within roughly 50 ppm, which is directly comparable to the inherent hardware measurement uncertainty of standard NDIR CO₂ sensors.
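For reference, the two shape statistics can be computed from the residuals with the moment-based definitions below. The sketch assumes the reported kurtosis uses the Pearson convention, under which a Gaussian has kurtosis 3 (consistent with 2.8 being described as platykurtic); the function name and the toy residuals are illustrative.

```python
def skewness_and_kurtosis(residuals):
    """Moment-based skewness and Pearson kurtosis (Gaussian -> 0 and 3)."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n  # variance
    m3 = sum((r - mean) ** 3 for r in residuals) / n  # third central moment
    m4 = sum((r - mean) ** 4 for r in residuals) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Symmetric toy residuals (ppm): skewness is zero by construction.
toy_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew, kurt = skewness_and_kurtosis(toy_residuals)
print(f"skewness = {skew:.2f}, kurtosis = {kurt:.2f}")
```

Libraries that report excess kurtosis subtract 3 from the second value, so the convention should be checked before comparing numbers.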

To further investigate the temporal stability of the model, we conducted a Diurnal Error Profiling analysis (Figure 8). The boxplot distribution reveals that the prediction residuals remain remarkably tight during nocturnal and late-evening hours (21:00–06:00), where the environment is stable. However, the spread of the residuals and the presence of outliers increase during the morning transient phase, specifically between 10:00 and 11:00. This period represents the highest derivative of CO₂ concentration due to simultaneous occupant arrival and equipment activation. Despite this increased volatility, the median error across all hours remains centered near the zero-error line, confirming that the model does not suffer from systematic drift or hourly bias.
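A diurnal error profile of this kind amounts to grouping residuals by hour of day and summarizing each group. The sketch below uses the median and the min-max range as a simplified stand-in for the boxplot statistics in Figure 8; the function name and the toy residuals are assumptions.

```python
from collections import defaultdict
from statistics import median


def hourly_residual_profile(hours, residuals):
    """Median, min-max spread, and count of residuals for each hour of day."""
    buckets = defaultdict(list)
    for hour, res in zip(hours, residuals):
        buckets[hour].append(res)
    return {
        hour: {"median": median(vals), "spread": max(vals) - min(vals), "n": len(vals)}
        for hour, vals in sorted(buckets.items())
    }


# Toy residuals (ppm): a quiet night hour and a volatile morning hour.
hours = [3, 3, 3, 10, 10, 10]
residuals = [-2.0, 0.0, 2.0, -60.0, 5.0, 70.0]
profile = hourly_residual_profile(hours, residuals)

for hour, stats in profile.items():
    print(hour, stats)
```

A median near zero in every hour, combined with a spread that widens only in transition hours, is the pattern the text describes as the absence of hourly bias.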

3.2. Model Performance Assessment via 5-Fold Cross-Validation

Table 2 presents the performance of all eight models evaluated via standard 5-fold cross-validation, with results reported as mean ± standard deviation across folds. The Standard Stacking Ensemble achieved the highest R² of 0.940 ± 0.03 alongside an MAE of 34.37 ± 4.36 ppm and RMSE of 58.57 ± 9.18 ppm, indicating the strongest overall fit among all evaluated models. Linear Regression followed closely with R² of 0.938 ± 0.02 and RMSE of 59.70 ± 8.68 ppm, a notably competitive result for a simple linear model that suggests strong linear relationships exist within the CO₂ feature space.

Hist. Gradient Boosting and AutoGluon both recorded R² of 0.921, with AutoGluon showing a lower MSE of 4604 ± 1472 compared to Hist. Gradient Boosting’s 4874 ± 1107, suggesting marginally better handling of prediction variance. Random Forest and Gradient Boosting (GBM) achieved identical R² values of 0.918, though Random Forest demonstrated a lower MAE of 34.02 ± 5.35 ppm versus GBM’s 37.52 ± 3.81 ppm. SVR recorded an R² of 0.914 ± 0.01; however, it achieved the lowest MAPE of 3.89 ± 0.50% and the lowest MAE of 30.54 ± 3.41 ppm among all models, indicating particularly strong performance on relative prediction accuracy despite its lower explained variance. KNN performed weakest across all metrics with R² of 0.875 ± 0.04 and RMSE of 86.57 ± 8.94 ppm, suggesting limited capacity to capture the temporal dynamics of CO₂ concentration from the given feature set.

Overall, the standard deviation values across folds remained relatively low for most models, with Linear Regression and SVR showing the tightest variance in R² at ±0.02 and ±0.01 respectively, reflecting stable and consistent performance across different data partitions.
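The metrics compared in Table 2 follow the standard definitions sketched below, together with the mean ± standard deviation summary across folds. The text does not specify whether the sample or population standard deviation was used; the sketch assumes the sample version, and all names and toy values are illustrative.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE, MAPE (%) and R² for one evaluation fold."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape, "R2": r2}


def mean_and_std(values):
    """Mean and sample standard deviation across folds (assumed convention)."""
    return statistics.mean(values), statistics.stdev(values)


# Toy fold (ppm), for illustration only.
metrics = regression_metrics([400.0, 500.0, 600.0, 700.0],
                             [410.0, 490.0, 620.0, 680.0])
print(metrics)
print(mean_and_std([30.0, 34.0, 38.0]))
```

Note that MAPE weights errors at low concentrations more heavily than MAE does, which is one reason a model can rank differently on the two metrics.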

3.3. Fold-Based Performance via Walk-Forward Cross-Validation

To assess the temporal consistency and predictive stability of the top four models, 5-fold walk-forward cross-validation was implemented using scikit-learn’s TimeSeriesSplit. Unlike standard k-fold cross-validation, this approach strictly preserves the chronological ordering of the time-series data, ensuring that each model is trained exclusively on past observations and evaluated on subsequent unseen data. Across all five folds, the test set consistently comprised approximately 15.3% of the total data, as shown in Table 3, while the training set expanded progressively from 23.8% in Fold 1 to 84.8% in Fold 5, reflecting an incremental learning structure consistent with real-world sequential CO₂ prediction scenarios. To prevent data leakage, all temporal features, including lag variables, rolling means, and rolling standard deviations, were recomputed independently within each fold using only the training portion, ensuring that no future information influenced feature construction at any stage of the validation process.
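The expanding-window logic can be sketched in plain Python as shown below. This is a stand-in that mimics the default splitting behavior of an expanding-window splitter, not scikit-learn's implementation and not the authors' code. The fold proportions it produces need not match those in Table 3, because the exact split parameters are not stated in the text. The helper names and the placeholder series are assumptions.

```python
def walk_forward_splits(n_samples, n_splits=5):
    """Yield (train, test) index ranges with an expanding training window."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for k in range(n_splits):
        start = first_test_start + k * test_size
        yield range(0, start), range(start, start + test_size)


def trailing_mean(values, window=3):
    """Mean of the previous `window` values; None until enough history exists."""
    return [
        sum(values[i - window:i]) / window if i >= window else None
        for i in range(len(values))
    ]


co2 = [400.0 + (i % 24) * 10.0 for i in range(1602)]  # placeholder series

for fold, (train_idx, test_idx) in enumerate(walk_forward_splits(len(co2)), start=1):
    train_series = [co2[i] for i in train_idx]
    # The rolling feature is rebuilt from the training slice only, so no
    # test-period value can enter a training feature.
    train_ma3 = trailing_mean(train_series)
    print(fold, len(train_idx), len(test_idx), train_idx[-1] < test_idx[0])
```

Each fold trains on everything before its test window, so later folds see more history, which is the mechanism behind the weaker Fold 1 results discussed next.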

As expected under walk-forward validation, Fold 1 yielded comparatively lower performance across all models due to the limited training data available at that stage. AutoGluon Ensemble recorded an R² of 0.774 and MAE of 90.28 ppm in Fold 1, improving consistently across subsequent folds and reaching R² values of 0.93–0.95 from Fold 3 onward, with MAE as low as 23.37 ppm in Fold 4. This progressive improvement is consistent with the model learning genuine temporal CO₂ patterns rather than overfitting to any single time window.

Random Forest followed a similar trajectory, with R² rising from 0.779 in Fold 1 to 0.945 by Fold 5 and RMSE declining from 155.56 to 60.13. Hist Gradient Boosting demonstrated the most consistent stabilization among individual models, achieving R² of 0.95 across Folds 3 and 4 with RMSE values of 32.22 and 42.47 respectively, indicating strong capacity to model non-linear CO₂ dynamics once sufficient historical data is available within the fold structure.

The Manual Stacking Ensemble exhibited a somewhat different pattern, with Fold 1 already achieving R² of 0.93, suggesting that ensemble diversity partially compensates for limited training data. Performance remained stable across Folds 3–5 with R² values of 0.92–0.95, though MAPE values were comparatively higher across folds, indicating some sensitivity to low-magnitude CO₂ readings.

Across all models, the standard deviation in performance across folds was low from Fold 2 onward, indicating that the results are not an artifact of a single favorable train-test split but reflect consistent predictive behavior across multiple temporal evaluation windows. The walk-forward framework therefore characterizes each model’s performance more thoroughly than a single chronological split.

4. Discussion

The results of this study indicate that the proposed predictive framework is well suited for anticipatory ventilation scheduling applications, as it learns the temporal structure that governs CO₂ accumulation rather than relying solely on threshold exceedance detection [51]. In our benchmarking analysis, the AutoGluon stacked ensemble achieved the lowest MAE (32.97 ppm) while maintaining strong goodness-of-fit metrics, outperforming simpler baselines such as Linear Regression and KNN. These findings confirm the value of multi-model ensembling for short-horizon environmental forecasting tasks. The model’s performance is consistent with the broader pre-occupancy ventilation concept: achieving acceptable indoor conditions at the start of occupancy requires reliable short-term forecasting rather than purely reactive sensing [52]. Empirical pre-ventilation studies similarly highlight that pollutant levels can be reduced prior to occupancy when airflow timing is appropriately scheduled [50]. In this context, the present work contributes the predictive modeling component required to support such strategies.

The superior performance of the AutoGluon stacked ensemble (MAE of 32.97 ppm) over simple baselines like Linear Regression and KNN highlights the non-linear, inertial nature of indoor gas diffusion. From a physical modeling perspective, this result is consistent with the view that indoor CO₂ evolution behaves as a dynamic system with memory rather than a static regression problem. The concentration at any given hour is not solely a function of instantaneous occupancy, but the cumulative effect of prior emissions, ventilation effectiveness, and mixing efficiency. While linear models capture the complex temporal dependencies of CO₂ accumulation only in part, the ensemble architecture appears to capture key aspects of the temporal dynamics observed in this open-plan office setting. By heavily leveraging the 3 h moving average (CO2_MA3) and multi-horizon lags, the model prioritizes the physical momentum of gas accumulation over transient, localized sensor noise. This is a critical advantage for real-world HVAC applications, where reacting to high-frequency anomalies often leads to excessive actuator hunting and energy waste [53].
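To make the role of these inputs concrete, the sketch below builds a one-hour lag, a 3 h moving average, and a cyclic hour-of-day encoding from an hourly CO₂ series. It is an assumption-laden illustration: `CO2_Lag1` and `CO2_MA3` are named in the text, but whether the authors' moving average includes the current reading is not stated (the sketch uses only the three preceding readings), and `hour_sin` and `hour_cos` are hypothetical column names.

```python
import math


def build_features(co2, hours):
    """Return one feature row per hour, using only past CO2 readings."""
    rows = []
    for i in range(3, len(co2)):
        window = co2[i - 3:i]                  # three previous hourly readings
        angle = 2 * math.pi * hours[i] / 24    # position on the 24 h cycle
        rows.append({
            "CO2_Lag1": co2[i - 1],
            "CO2_MA3": sum(window) / 3,
            "hour_sin": math.sin(angle),
            "hour_cos": math.cos(angle),
            "target": co2[i],
        })
    return rows


# Toy morning ramp (ppm), for illustration only.
co2 = [400.0, 420.0, 480.0, 600.0, 750.0]
hours = [6, 7, 8, 9, 10]
rows = build_features(co2, hours)

for row in rows:
    print(row)
```

The moving average changes more slowly than the single lag during the ramp, which illustrates why it acts as a smoothed indicator of accumulation rather than of instantaneous fluctuation.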

A key reason the model performs well is its ability to synchronize with the “phase” of real building use, capturing rapid CO₂ rises during entry periods and stabilizing predictions during steady-state occupancy [23]. This is exactly the type of temporal behavior leveraged by forecasting-based DCV approaches, including LSTM-driven CO₂ prediction with occupant-related inputs, where accurate short-horizon trajectories enable proactive airflow regulation under operational CO₂ targets [23]. Our feature importance results are consistent with a physically plausible explanation: the dominance of CO₂ moving averages and lag terms suggests that the model may be capturing gas inertia and short-term accumulation trends more than overfitting to noisy instantaneous spikes, though this interpretation is based on model behavior within this dataset rather than direct causal evidence. This is aligned with predictive DCV formulations that combine time-series forecasting with optimization [54]. The ablation finding that removing lagged history causes a large performance drop further suggests that historical context plays an important role in reliable anticipatory forecasting within this setting [54]. This finding has important implications for practical deployment. Many commercial DCV implementations rely on instantaneous CO₂ readings without embedding temporal smoothing or autoregressive structure into the control logic. Our results suggest that such designs inherently ignore the dominant time constant of the room. By explicitly encoding multi-horizon lags and rolling statistics, the proposed framework internalizes the delay between occupancy onset and concentration peak. In addition, energy consumption and noise acted as useful proxies, increasing predicted CO₂ during high-activity periods, which matches prior evidence that occupancy profiles can be inferred from non-CO₂ signals (e.g., metered power) to support ventilation scheduling when direct occupancy labels are scarce [55]. 
A major challenge in contemporary Demand-Controlled Ventilation (DCV) is accurately estimating occupancy without deploying privacy-invasive technologies, such as high-resolution cameras [56]. Our findings demonstrate that non-intrusive ambient proxy signals, specifically acoustic noise and appliance-level energy consumption, can effectively capture unscheduled occupancy surges. As suggested by the ablation study, these variables appear to carry significant predictive weight beyond secondary features. Within this dataset, their removal substantially degraded performance during peak occupancy hours, which is consistent with the interpretation that the model associates these signals with occupancy-driven CO₂ changes, though this does not establish a direct causal relationship. This presents a potentially scalable and privacy-compliant architecture for continuous air quality monitoring, though validation across diverse building types and occupancy patterns is needed before broader generalizability can be claimed.

The temporal stability of these findings was further examined through 5-fold walk-forward cross-validation using scikit-learn’s TimeSeriesSplit, which evaluates model performance across successive temporal windows rather than a single fixed split. The comparatively lower performance in Fold 1, where training data comprised only 23.8% of the total observations, is consistent with the known sensitivity of time-series models to insufficient historical context and should not be interpreted as a methodological weakness [57]. Rather, it reflects the realistic early-deployment phase in which limited observational history constrains predictive capacity. The convergence of performance metrics from Fold 2 onward, with low standard deviations across folds, indicates that the reported accuracy levels are stable and reproducible within the temporal structure of this dataset. Notably, the Manual Stacking Ensemble displayed a distinct pattern, achieving R² of 0.93 in Fold 1 despite minimal training data, suggesting that architectural diversity within the ensemble partially compensates for limited temporal context, though this advantage diminished in Fold 2 before restabilizing in subsequent folds, pointing to sensitivity around the particular CO₂ dynamics present in that temporal window. Overall, these cross-validation results support the conclusion that the single-split benchmarking results in Section 3.1 are not an artifact of a favorable data partition but reflect consistent model behavior across the temporal structure of the dataset.

From an operational robustness standpoint, our ensemble also behaved as a low-pass filter in high-concentration regimes, attenuating stochastic sensor noise while preserving the global trend, which may reduce actuator hunting and potentially improve equipment longevity in threshold-driven systems, although this would need to be confirmed through closed-loop deployment in real operational conditions [23]. The statistical reliability result, where 96.4% of observed points fell within the 95% prediction interval, is practically important because control decisions must be stable under uncertainty rather than only accurate on average [23]. From an HVAC operational perspective, the fundamental reason for implementing demand-controlled ventilation is to balance the trade-off between CO₂ concentration and energy use. Our highly accurate predictions serve as the essential foundation for this optimization. By reliably identifying the exact phase of CO₂ accumulation, the model provides the predictive data necessary for future HVAC systems to confidently turn off mechanical ventilation during unoccupied or low-demand periods, thereby saving energy, while proactively ramping up just before major occupancy transitions to maintain air quality. This risk-aware framing is directly compatible with optimization-based control layers, where our predictive distributions can be translated into constrained schedules that dynamically balance air quality targets with energy costs [58]. In this respect, MPC-based strategies provide a natural mechanism for converting forecasts into actionable pre-occupancy ventilation sequences, including cost-aware operation under time-varying electricity prices and other external constraints [58]. Online-learning variants of predictive control further suggest a pathway to maintain performance when conditions drift, by updating the forecasting model during operation rather than relying on a fixed, static fit [27].
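One simple way to act on a forecast together with its uncertainty is to flag the hours whose upper interval bound exceeds a CO₂ target. The rule below is a hypothetical illustration, far simpler than an MPC formulation and not part of the study: the function name, the toy forecast, and the 40 ppm standard error are assumptions, while the 800 ppm threshold is the ventilation threshold mentioned in Section 3.1.

```python
def hours_needing_preventilation(forecast_ppm, se_ppm, threshold_ppm=800.0, z=1.96):
    """Indices of forecast hours whose upper interval bound exceeds the threshold."""
    return [
        i for i, value in enumerate(forecast_ppm)
        if value + z * se_ppm > threshold_ppm
    ]


# Toy hourly forecast (ppm) and an assumed standard error.
forecast = [450.0, 520.0, 700.0, 790.0, 910.0, 640.0]
flagged = hours_needing_preventilation(forecast, se_ppm=40.0)
print(flagged)  # [3, 4]
```

Using the upper bound rather than the point forecast makes the rule conservative: the hour forecast at 790 ppm is flagged even though its point estimate is below the threshold.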

The implications for occupants are straightforward: by potentially reducing the dead-time typical of reactive HVAC, pre-occupancy prediction may improve perceived freshness and comfort at the start of the workday and could reduce the frequency of excursions beyond acceptable ranges, although these outcomes remain to be validated through closed-loop deployment [50]. Field evidence from occupancy-oriented HVAC operation shows that aligning control with actual space use can deliver large energy reductions while maintaining comfort targets, reinforcing the practical relevance of occupancy-aware, anticipatory strategies. It should be noted, however, that the present work’s evaluation remains offline and limited to a single laboratory environment, and direct energy savings have not been quantified in this study [28]. Pre-occupancy outcomes also depend on airflow distribution and ventilation design choices, and comparative studies of diffuse ceiling versus mixing ventilation illustrate that “starting early” works best when the delivery mechanism supports uniform conditions during the pre-occupancy period [29].

While this study establishes high predictive validity offline, transitioning these forecasts into an active, closed-loop Model Predictive Control (MPC) system introduces practical engineering challenges. Real-time edge deployment requires balancing computational overhead with predictive accuracy. Although automated ensembles provide superior precision, their computational footprint may present resource constraints when deployed on low-power IoT gateways like Raspberry Pi. Future implementations must explore containerized microservices architectures or model distillation techniques to ensure seamless, low-latency inference without exhausting edge resources.

Several limitations remain. While demand-controlled ventilation is fundamentally driven by irregular events, our multi-modal approach heavily mitigates the model’s blindness to context by using noise and energy proxies to identify unscheduled activities. However, limitations remain for extreme anomalies completely unrepresented in the training data, such as major holidays or facility-wide unusual experiments where even proxy signals behave unpredictably, as ingrained daily and weekly periodicities may temporarily override the sensor context [28].

While the proposed multi-modal methodology is designed to be scalable, the direct transferability of the trained model to other building types remains a significant limitation that requires careful consideration. The high predictive weight assigned to proxy signals, specifically appliance-level energy consumption and acoustic noise, is inherently tied to the functional profile of a university laboratory and open-plan office. In this specific setting, workstation use, PC monitors, and localized conversations strongly correlate with human presence. However, in spaces with different functional dynamics, such as cafeterias, transit hubs, or industrial workshops, these proxy relationships may decouple entirely. For example, a room adjacent to a busy street or containing heavy machinery running on continuous schedules would present high noise and energy baselines completely disconnected from indoor human occupancy, potentially rendering these features uninformative or causing false positive ventilation triggers. Furthermore, the learned temporal relationships within the model cannot be universally generalized. The dominance of the 3 h moving average (CO2_MA3) and of specific multi-horizon lags reflects the model fitting the unique thermodynamic time constant, air volume (set by the 48 m² floor area), and spatial inertia of the experimental room. If this exact model were deployed in a large lecture hall with a much greater volume or a different HVAC mixing efficiency, the rate of gas accumulation would drastically change, rendering the learned temporal lags inaccurate. Therefore, while the framework of integrating cyclic encodings and contextual proxies is broadly applicable, the specific feature weights, moving average windows, and lag intervals learned in this study are strictly localized. 
Deploying this architecture in new environments will necessitate a period of localized data collection and model re-tuning to capture the distinct acoustic, electrical, and physical signatures of the new space.

The model can also be sensitive to disturbances not fully represented in the current sensing setup, such as window opening, unusual door traffic, changes in infiltration, or manual HVAC interventions, which can shift the CO₂ dynamics and reduce accuracy on atypical days. Regarding measurement precision, our error diagnostics indicate that most residuals fall within roughly 50 ppm. This is comparable to the inherent uncertainty of common NDIR CO₂ sensors, suggesting the model may be approaching a hardware-limited accuracy ceiling unless sensing quality and calibration are improved [30]. This concern is amplified in long deployments, where low-cost sensors can drift and gradually bias inputs, motivating calibration strategies, drift monitoring, or redundancy through alternative sensing modalities [57]. Virtual sensing is one complementary direction, where learned pollutant estimates reduce dependence on dense physical instrumentation and can improve robustness when some sensors fail or degrade [58]. From an implementation perspective, our evaluation is largely offline. While the forecasts look reliable, we have not yet demonstrated how the predictor behaves once it is embedded in a real control loop. In practice, closed-loop operation can reveal issues such as overshoot, frequent on–off cycling, or degraded comfort when constraints like minimum runtime and airflow limits are enforced. Furthermore, the present study focuses exclusively on CO₂ concentration as the primary predicted variable. However, indoor air quality is inherently multi-factorial, and a ventilation strategy optimized solely for CO₂ may not simultaneously optimize other pollutants such as VOCs or particulate matter. Extending the modeling framework to jointly forecast multiple IAQ indicators represents an important direction for future research. 
Lastly, although the ensemble model is computationally efficient during inference, automated ensembling can increase the complexity and computational cost of periodic retraining compared to simpler individual models. This consideration may become relevant in edge-only deployments where hardware resources and update cycles are constrained [51].

5. Conclusions

This study addressed the limitations of reactive Demand-Controlled Ventilation (DCV) systems, which inherently suffer from a “ventilation dead-time” that exposes occupants to degraded indoor air quality. We proposed and validated a predictive framework that leverages historical sensor data to forecast short-term CO₂ dynamics. This provides the predictive component that could enable a transition from reactive correction to proactive pre-occupancy scheduling, pending integration with a closed-loop control system and validation under real operational conditions. The primary scientific contribution of this work lies in demonstrating that highly accurate predictive forecasts can be extracted exclusively from low-cost, consumer-grade IoT sensors. By employing a rigorous feature engineering pipeline, the framework successfully compensated for the intrinsic measurement noise of affordable hardware. The integration of cyclic temporal encodings, multi-horizon autoregressive lags, and contextual proxies—specifically acoustic noise and energy consumption—allowed the models to capture both periodic occupancy patterns and irregular event-driven CO₂ surges. Comprehensive benchmarking across multiple regression paradigms revealed that an automated multi-level stacked ensemble (AutoGluon) delivered the optimal performance for short-term forecasting, achieving a Mean Absolute Error (MAE) of 32.97 ppm.

The practical implications of these findings are substantial. This research suggests a promising and potentially scalable approach for retrofitting buildings with advanced predictive capabilities, pending validation across diverse room types, occupancy patterns, and HVAC configurations. The findings indicate that, within the studied single-room environment, anticipating CO₂ accumulation dynamically may allow facility managers and future automated systems to initiate pre-occupancy ventilation sequences, with the potential to improve air quality upon occupant arrival while reducing the energy penalty associated with continuous over-ventilation.

Despite these promising results, this study has specific limitations that must be acknowledged. First, the experimental validation was conducted in a single open-plan smart laboratory under strictly controlled physical conditions, including permanently closed windows. Consequently, the model’s robustness to unpredictable natural infiltration and mixed-mode ventilation strategies remains untested. Second, extreme unscheduled anomalies, such as major facility-wide events entirely unrepresented in the training data, may temporarily reduce forecasting accuracy, as ingrained temporal periodicities could momentarily override proxy signals. Furthermore, diagnostic evaluations indicated that the majority of prediction errors fall within roughly 50 ppm, suggesting the predictive model may be approaching a hardware-imposed accuracy ceiling characteristic of standard consumer-grade NDIR sensors.

Future research will focus on extending this predictive framework to multi-zone commercial environments with highly variable occupancy profiles. Additionally, while this study establishes high predictive validity in an offline evaluation setting, transitioning these forecasts into an active, closed-loop Model Predictive Control (MPC) system represents the next critical operational step. Future investigations should also explore virtual sensing techniques and expand the modeling architecture to jointly forecast multiple indoor environmental pollutants, such as PM_2.5 and TVOCs, ultimately advancing the development of truly intelligent, energy-efficient, and healthy built environments.

Author Contributions

Conceptualization, Z.B., A.B., Z.K., R.U. and M.T.; methodology, Z.B., A.B., M.T., A.K., A.A. and A.M.; software, Z.B., A.B., M.T. and A.K.; validation, Z.K., R.U., A.A. and A.M.; formal analysis, Z.B., A.B., M.T., A.A. and A.M.; investigation, Z.B., A.B., M.T., A.K., A.A. and A.M.; resources, M.T., A.A. and A.M.; data curation, Z.B., A.B., Z.K., R.U., M.T. and A.K.; writing—original draft preparation, Z.B., A.B., M.T., A.K. and A.A.; writing—review and editing, Z.K., R.U. and A.M.; visualization, Z.B., A.B. and M.T.; supervision, Z.K., R.U., A.A. and A.M.; project administration, Z.K., R.U. and A.M.; funding acquisition, Z.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Science Committee of the Ministry of Science and Higher Education of the Republic of Kazakhstan (Grant number: BR24993051).

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

Authors Zhanel Baigarayeva, Assiya Boltaboyeva, Maksat Turmakhan and Adilet Kakharov were employed by the company LLP “Kazakhstan R&D Solutions”. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
AutoGluon	Automated Machine Learning Framework for Tabular Data
BLE	Bluetooth Low Energy
CI	Confidence Interval
CO₂	Carbon Dioxide
DCV	Demand-Controlled Ventilation
DNN	Deep Neural Network
GBM	Gradient Boosting Machine
HVAC	Heating, Ventilation, and Air Conditioning
IAQ	Indoor Air Quality
IoT	Internet of Things
KNN	K-Nearest Neighbors
MA	Moving Average
MA3	3 h Moving Average
MA6	6 h Moving Average
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
MPC	Model Predictive Control
MQTT	Message Queuing Telemetry Transport
MSE	Mean Squared Error
NDIR	Non-Dispersive Infrared
PIR	Passive Infrared
PM_2.5	Particulate Matter with Diameter ≤ 2.5 µm
R²	Coefficient of Determination
RBF	Radial Basis Function
RMSE	Root Mean Squared Error
SHAP	Shapley Additive exPlanations
Std3	3 h Rolling Standard Deviation
SVR	Support Vector Regression
TLS	Transport Layer Security
TVOC	Total Volatile Organic Compounds
UTC	Coordinated Universal Time

Figure 1. Smart IoT sensing-and-control architecture for indoor microclimate optimization.

Figure 2. Architecture of the 3-Level Automated Stacking Ensemble developed using the AutoGluon framework.

Figure 3. Comparative Analysis of Actual (Blue) vs. Predicted (Red) CO₂ Concentration Levels using the AutoGluon Stacked Ensemble over a 150 h validation sequence.

Figure 4. SHAP summary plot illustrating the impact of feature values on the model output.

Figure 5. Ablation study results demonstrating the impact of different feature groups on the model’s Mean Absolute Error (MAE).

Figure 6. Scatter plot of Predicted vs. Actual CO₂ concentrations showing the regression fit.

Figure 7. Histogram of prediction residuals, showing an approximately normal distribution of errors.

Figure 8. Diurnal error profiling: distribution of model residuals by hour of day across the 24 h cycle.

Table 1. Comparative performance analysis of machine learning models for CO₂ concentration forecasting (models ranked by ascending MAE).

Rank	Model	R²	MAE (ppm)	MSE (ppm²)	RMSE (ppm)	MAPE (%)
1	AutoGluon (Ensemble)	0.9429	32.97	3496.77	59.13	4.18
2	Random Forest	0.9470	33.44	3248.87	56.99	4.41
3	Hist. Gradient Boosting	0.9372	34.74	3848.55	62.04	4.42
4	Manual Stacking Ensemble	0.9436	37.41	3456.76	58.79	5.24
5	SVR (Support Vector Regression)	0.9445	40.86	3403.27	58.34	6.45
6	Gradient Boosting (GBM)	0.9147	42.45	5232.59	72.34	5.51
7	Linear Regression	0.9330	43.31	4110.73	64.11	6.24
8	KNN (K-Nearest Neighbors)	0.8455	57.90	9472.54	97.33	7.28
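
The columns of Table 1 follow the standard definitions of the regression metrics. As a minimal sketch (not the authors' evaluation code), the function below computes them in plain Python; the name `regression_metrics` and the five sample readings are hypothetical and serve only as illustration. The last lines check two relationships that can be verified from the reported numbers: RMSE is the square root of MSE, and the relative MAE improvements quoted in the abstract follow from the MAE column.

```python
import math

def regression_metrics(actual, predicted):
    """Return R2, MAE, MSE, RMSE and MAPE (%) for paired observations."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE divides by the observed value, so it assumes no zero readings.
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - (mse * n) / ss_tot
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

# Hypothetical CO2 readings in ppm, for illustration only (not study data).
actual = [420.0, 480.0, 650.0, 900.0, 1200.0]
predicted = [430.0, 470.0, 640.0, 930.0, 1180.0]
metrics = regression_metrics(actual, predicted)

# Consistency check on the AutoGluon row of Table 1: RMSE = sqrt(MSE).
rmse_from_mse = math.sqrt(3496.77)

# Relative MAE improvement of AutoGluon over two baselines from Table 1.
improvement_vs_lr = 100.0 * (43.31 - 32.97) / 43.31
improvement_vs_knn = 100.0 * (57.90 - 32.97) / 57.90
```

Because MAE weights all errors linearly while RMSE penalises large errors more heavily, a model can rank first on MAE without having the lowest RMSE or the highest R², which is the pattern visible between the first two rows of Table 1.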

Table 2. Comparative Model Performance via 5-Fold Standard K-Fold Cross-Validation (Mean ± Standard Deviation).

Model	R²	MAE (ppm)	MSE (ppm²)	RMSE (ppm)	MAPE (%)
Standard Stacking Ensemble	0.940 ± 0.03	34.37 ± 4.36	3515 ± 1004	58.57 ± 9.18	5.06 ± 0.72
Linear Regression	0.938 ± 0.02	35.73 ± 3.91	3639 ± 976	59.70 ± 8.68	5.28 ± 0.55
SVR (Support Vector Regression)	0.914 ± 0.01	30.54 ± 3.41	5515 ± 1905	73.28 ± 12.02	3.89 ± 0.50
Random Forest	0.918 ± 0.04	34.02 ± 5.35	4887 ± 1641	68.88 ± 11.92	4.50 ± 0.91
Gradient Boosting (GBM)	0.918 ± 0.03	37.52 ± 3.81	4924 ± 1023	69.77 ± 7.51	5.17 ± 0.72
Hist. Gradient Boosting	0.921 ± 0.01	33.34 ± 3.72	4874 ± 1107	69.41 ± 7.50	4.32 ± 0.66
AutoGluon (Ensemble)	0.921 ± 0.03	33.93 ± 5.12	4604 ± 1472	66.86 ± 11.56	4.59 ± 0.85
KNN (K-Nearest Neighbors)	0.875 ± 0.04	47.78 ± 3.60	7555 ± 1514	86.57 ± 8.94	6.52 ± 0.47

Table 3. Model Performance Across Temporal Folds via 5-Fold Walk-Forward Cross-Validation (TimeSeriesSplit).

Model	Fold	Train (%)	Test (%)	R²	MAE (ppm)	MSE (ppm²)	RMSE (ppm)	MAPE (%)
AutoGluon (Ensemble)	Fold 1	23.8	15.3	0.774	90.28	24,715	157.21	11.33
	Fold 2	39.0	15.3	0.866	59.90	8084	89.91	8.04
	Fold 3	54.3	15.3	0.935	27.64	1570	39.62	5.17
	Fold 4	69.5	15.3	0.959	23.37	1660	40.74	3.42
	Fold 5	84.8	15.3	0.947	32.80	3476	58.96	4.22
Random Forest	Fold 1	23.8	15.3	0.779	76.71	24,198	155.56	8.29
	Fold 2	39.0	15.3	0.928	42.14	4324	65.75	5.49
	Fold 3	54.3	15.3	0.949	18.91	1244	35.28	3.24
	Fold 4	69.5	15.3	0.942	22.97	2320	48.16	3.15
	Fold 5	84.8	15.3	0.945	34.93	3616	60.13	4.48
Hist. Gradient Boosting	Fold 1	23.8	15.3	0.781	92.37	23,909	154.62	12.30
	Fold 2	39.0	15.3	0.919	43.33	4865	69.75	6.02
	Fold 3	54.3	15.3	0.957	18.91	1038	32.22	3.37
	Fold 4	69.5	15.3	0.955	21.37	1804	42.47	2.97
	Fold 5	84.8	15.3	0.935	33.66	4297	65.55	4.18
Manual Stacking Ensemble	Fold 1	23.8	15.3	0.934	60.39	7176	84.71	8.74
	Fold 2	39.0	15.3	0.835	82.00	9937	99.68	12.76
	Fold 3	54.3	15.3	0.927	30.81	1772	42.10	6.05
	Fold 4	69.5	15.3	0.931	32.91	2777	52.70	5.34
	Fold 5	84.8	15.3	0.950	43.78	3476	57.38	6.77
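
The train and test shares in Table 3 are consistent with an expanding training window followed by a fixed-size test block, which is how scikit-learn's TimeSeriesSplit behaves when a fixed test size is set. The sketch below reproduces that splitting logic in plain Python under stated assumptions: the function name `walk_forward_splits`, the series length of 2000 samples, and the test block of 305 samples are all hypothetical, chosen only so that the resulting shares land close to those in the table. The study's actual sample counts and configuration are not given in this section.

```python
def walk_forward_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) with an expanding training window.

    Each test block directly follows its training window in time, so no
    future sample is used to fit the model evaluated on that block.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train_idx = range(0, test_start)
        test_idx = range(test_start, test_start + test_size)
        yield train_idx, test_idx

# Hypothetical sizes (assumption): picked to approximate the shares in Table 3.
N_SAMPLES, N_SPLITS, TEST_SIZE = 2000, 5, 305

splits = list(walk_forward_splits(N_SAMPLES, N_SPLITS, TEST_SIZE))
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    train_share = 100.0 * len(train_idx) / N_SAMPLES
    test_share = 100.0 * len(test_idx) / N_SAMPLES
    print(f"Fold {fold}: train {train_share:.2f}%, test {test_share:.2f}%")
```

With these assumed sizes the first fold trains on under a quarter of the series, which may help explain why every model in Table 3 shows its largest errors in Fold 1 or Fold 2 and improves as the training window grows.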

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
