Empirical Evaluation of a CNN-ResNet-RF Hybrid Model for Occupancy Rate Prediction in Passive Ultra-Low-Energy Buildings

Liu, Yiwen; Xue, Yibing; Liu, Chunlu; Wang, Runyu

doi:10.3390/urbansci10030150



¹ School of Architecture and Urban Planning, Shandong Jianzhu University, Jinan 250101, China

² School of Architecture and Built Environment, Deakin University, Geelong, VIC 3220, Australia

³ College of Architecture & the Built Environment, Thomas Jefferson University, Philadelphia, PA 19107, USA

* Author to whom correspondence should be addressed.

Urban Sci. 2026, 10(3), 150; https://doi.org/10.3390/urbansci10030150

Submission received: 6 January 2026 / Revised: 5 March 2026 / Accepted: 6 March 2026 / Published: 11 March 2026

(This article belongs to the Topic Geospatial AI: Systems, Model, Methods, and Applications)


Abstract

Accurate occupancy information is critical for optimizing energy efficiency in buildings. Hybrid machine learning models have demonstrated great potential in previous studies; however, their application in passive ultra-low-energy buildings remains underexplored. This study conducts an empirical evaluation of real-time occupancy rate prediction using a CNN-ResNet-RF hybrid model based on multi-source environmental and behavioral data from a passive ultra-low-energy educational building. The model integrates Convolutional Neural Networks (CNN) for local feature extraction, Residual Networks (ResNet) to enhance deep feature representation, and Random Forests (RF) for ensemble-based generalization. Indoor CO₂ concentration exhibits the strongest linear correlation with occupancy rate (r = 0.54), indicating a meaningful association with occupancy dynamics. The model demonstrates strong predictive performance on the test set, with a coefficient of determination (R²) of 0.964, a root mean square error (RMSE) of 0.054, and a residual prediction deviation (RPD) exceeding 5. Compared with baseline models such as CNN, RF, and CNN-RF, the proposed framework exhibits generally lower prediction errors and improved stability. Further lightweight compression experiments reveal that the structured compact CNN-ResNet-RF-25 variant achieves even better accuracy (R² = 0.9748, RMSE = 0.0449, RPD = 6.327) while substantially reducing model complexity, demonstrating strong deployment potential in resource-constrained environments.

Keywords: machine learning; occupancy rate prediction; CNN-ResNet-RF model; passive ultra-low-energy buildings; multi-source sensor data

1. Introduction

With the continuous development of intelligent building technologies and advanced energy management systems, obtaining accurate and real-time building occupancy information has become a key prerequisite for improving energy efficiency and optimizing operation strategies [1,2,3,4,5]. Based on reliable occupancy data, HVAC, lighting, and outlet equipment can be more precisely aligned with actual usage demand, thereby reducing energy waste during unoccupied or low-usage periods and improving overall operational efficiency. Occupancy rate prediction plays an increasingly important role as a core technology for dynamic control and on-demand adjustment [6,7,8].

Occupancy rate continuously influences indoor physical variables through the interaction between occupants and building systems. Occupant entry, movement, and behavioral choices, such as switching appliances, opening windows, or unlocking doors, directly or indirectly trigger operational responses from building systems, thereby altering energy consumption levels, air quality, and local environmental conditions. These behavior-induced changes transform occupancy status into observable operational signals, providing the physical foundation for occupancy rate prediction based on sensor data.

In recent years, advancements in sensor technology have significantly enhanced the ability to collect multi-source data, and data-driven modeling techniques have evolved rapidly [9,10,11]. From traditional machine learning algorithms (such as decision trees and random forests) [12,13] to deep learning architectures (such as convolutional neural networks and recurrent neural networks) [14], a variety of methods have been widely applied to uncover the complex relationships between environmental variables and human behavior. These models have demonstrated strong performance in conventional building scenarios, largely due to the pronounced variations in environmental parameters and well-defined occupancy patterns, which provide a rich informational foundation for model training.

However, the applicability of these modeling methods in passive and ultra-low-energy buildings has not yet been systematically investigated. Compared with conventional buildings, where occupancy-related environmental signals typically exhibit pronounced and rapid fluctuations, passive ultra-low-energy buildings operate under highly stable conditions due to high-performance building envelopes and stable operational strategies, resulting in overall low-dynamic characteristics of indoor environmental variables such as temperature and humidity. Nevertheless, the opening of doors and windows remains an unavoidable aspect of real-world operation. When such actions occur, the indoor air renewal process can be significantly disrupted over short time scales, leading to noticeable fluctuations in air quality indicators such as carbon dioxide concentration [15,16].

Against this backdrop of low-dynamic operation, physical variables measured within rooms often exhibit two distinctly different temporal characteristics simultaneously: on the one hand, over larger time scales, these variables follow highly stable patterns governed by building operation strategies and envelope performance; on the other hand, over smaller time scales, physical variables that are directly influenced by occupant actions may still exhibit localized and transient abrupt responses. In such contexts, single-model approaches may struggle to detect weak occupancy signals and to distinguish between long-term operational stability and short-term occupant-induced changes, thereby limiting prediction robustness. Therefore, it is necessary to adopt a hybrid modeling framework that combines foundational feature extraction models such as CNNs with residual structures and ensemble learning methods. Such a framework enhances the representation of multi-scale features and improves both the robustness and accuracy of occupancy rate prediction in passive ultra-low-energy buildings.

To address the modeling challenges associated with low-dynamic indoor environments, a hybrid deep learning framework (CNN-ResNet-RF) is applied to the task of real-time occupancy rate prediction in passive ultra-low-energy buildings. The model integrates a CNN for local feature extraction, a ResNet for enhanced deep feature representation, and an RF for ensemble-based result generalization. This study evaluates the applicability and robustness of the hybrid model in real-world passive building scenarios, particularly under conditions where occupancy-related signal variations are subtle and complex, and compares it with baseline models. A comprehensive multi-source dataset was collected from a passive ultra-low-energy educational building, including CO₂ concentration, temperature, humidity, energy consumption, and user behavior logs.

Accordingly, the following objectives are established:

To improve model generalizability under passive building conditions, the prediction performance of the CNN-ResNet-RF-based method is evaluated using real-world data from an ultra-low-energy building.
To address the limitations of single-sensor input, we examine the contribution of multi-source environmental and behavioral data fusion to the robustness and accuracy of occupancy rate prediction.
To enhance the modeling of occupancy dynamics, we investigate the combined effects of residual deep learning and ensemble learning in capturing sequential variations in occupancy.

2. Literature Review

2.1. Data-Driven Occupancy Prediction Using Machine Learning

Early occupancy estimation methods primarily relied on CO₂-based mass balance models, in many cases combined with Kalman-filter-based state estimation to infer occupancy as a latent variable, while vision-based systems later emerged with high accuracy but limited deployability [17]. In recent years, machine learning techniques have been extensively applied in occupancy prediction studies within the building sector [18]. Relevant research indicates that by learning the variation patterns of indoor environmental variables such as carbon dioxide concentration, temperature, and humidity, machine learning models can effectively infer the occupancy status of building spaces. This provides data-driven insights for energy consumption control, HVAC system scheduling, and indoor environmental optimization.

In terms of modeling methods, most studies adopt a supervised learning framework based on non-intrusive environmental sensing data. The CNN–XGBoost hybrid model proposed by Mohammadabadi et al. [19] automatically extracts environmental time-series features via CNN and employs XGBoost for classification, outperforming traditional machine learning models in residential building experiments. This study demonstrates that when environmental variables exhibit significant responsiveness to human activity, combining deep feature extraction with ensemble learning can substantially enhance prediction performance. Similarly, Wang et al. [20] proposed a two-layer occupancy detection framework that combines rule-based human activity recognition with machine-learning classifiers using non-intrusive temperature and motion sensor data. Their results further validated the role of multi-source perception information in enhancing the robustness of occupancy prediction. However, under passive or low-dynamic operating conditions, the response relationship between CO₂ and occupancy may be significantly weakened. Huang et al. [21] demonstrated that leveraging spatial characteristics of indoor CO₂ distributions provides more discriminative information for occupancy detection than relying solely on temporal CO₂ variations, particularly in naturally ventilated buildings.

In a comparative study of different machine learning algorithms, Banihashemi et al. [22] conducted a systematic evaluation of models such as Random Forest (RF) and XGBoost based on non-invasive environmental sensing data. The results indicate that under well-designed feature construction, tree-based models can achieve high predictive accuracy without requiring complex model architectures. Similarly, Singh et al. [23] compared multiple algorithms by incorporating feature selection mechanisms, further highlighting that model performance improvements do not solely depend on structural complexity but are closely tied to factors such as feature relevance and redundancy control. These studies collectively demonstrate that in application scenarios with relatively stable environmental signal variations, traditional machine learning methods often achieve a more reasonable balance between computational efficiency and predictive performance.

Regarding dynamic occupancy prediction, some studies have attempted to incorporate time-series models to capture the temporal evolution of personnel occupancy. For instance, Zeleny et al. [24] combined Wi-Fi probe requests with a GRU model to achieve favorable occupancy prediction results under high-resolution temporal data conditions, demonstrating that recurrent neural networks possess advantages in dynamic modeling when behavioral patterns exhibit temporal continuity. However, Banihashemi et al. [22] also showed that in certain practical scenarios, traditional models based on environmental variables can achieve competitive dynamic prediction performance even without explicitly incorporating RNN structures.

Beyond specific model architectures, research has also expanded occupancy monitoring at the system and application levels. Huang et al. [25] focused on the deployability and engineering implementation of low-cost IoT occupancy counting systems, emphasizing feasible pathways for rapid deployment in real-world building environments. Ahmad et al. [26] systematically discussed the necessity of privacy-preserving occupancy monitoring solutions in public and non-residential buildings from both application and ethical perspectives. Concurrently, Han et al. [27] explored real-time monitoring of personnel distribution in complex or emergency scenarios through spatial partitioning and deep learning modeling. These studies broaden the application dimensions of occupancy modeling.

However, for passive or low-dynamic building environments characterized by limited environmental variable fluctuations and low sampling frequencies, the applicability of these methods remains under-validated. Under such conditions, the advantages of temporal models may be diminished, and complex sensing systems face substantial engineering implementation constraints. Therefore, it is necessary to re-evaluate the modeling value and performance boundaries of different machine learning methods specifically for low-dynamic, low-temporal-resolution scenarios.

2.2. Evaluation Metrics and Comparative Performance Across Occupancy Tasks

In occupancy-related research, the reporting methods for model performance and the selection of evaluation metrics directly impact the comparability of different study results. As summarized in Table 1, existing studies exhibit significant differences in task definition, leading to divergent evaluation metric systems. Overall, relevant work primarily focuses on two core task paradigms: one addresses occupancy state recognition, typically modeled as a classification task; the other focuses on modeling continuous occupancy rates or occupancy intensity, which is closer to a regression problem.

For studies targeting occupancy state recognition (occupied/unoccupied) or entry/exit event detection (±1/0), the problem is usually addressed using a classification modeling framework. Consequently, accuracy, precision, recall, and F1 score become the most commonly used evaluation metrics. For instance, Mohammadabadi et al. [19] constructed a CNN–XGBoost model using CO₂, temperature, and humidity data in mechanically ventilated residences. They primarily evaluated model performance using F1 score and confusion matrices, with MAE additionally employed to characterize misclassification severity, thereby mitigating the impact of class imbalance on evaluation results. Similarly, Wang et al. [20] validated their approach using multiple machine-learning models, achieving accuracy and F1-score values of approximately 0.95. Under multi-source non-invasive sensing conditions, Banihashemi et al. [22] systematically compared models including Random Forest, XGBoost, and artificial neural networks. In office building scenarios, they achieved accuracies exceeding 0.95 and F1 scores ranging from 0.92 to 0.93. This demonstrates that under well-designed feature engineering, traditional tree-based models can perform comparably to more complex deep learning methods in classification tasks. Similar trends emerge in high-density or highly dynamic scenarios. For instance, the DMFF-Transformer model [8], which integrates video, audio, and multi-sensor inputs, achieved approximately 97% accuracy in high-density classroom environments. However, its performance heavily relies on high perceptual density and complex data acquisition conditions.

Beyond occupancy detection methods relying on environmental sensors, some studies attempt to infer occupancy status using low-resolution or indirect signals. For instance, Liang and Wang [28] proposed a Transformer–RNN hybrid model based on low-resolution smart meter data for occupancy recognition in residential settings. Although the correlation between electricity data and human activity is relatively indirect, such models still achieve useful occupancy recognition performance in home scenarios. However, when classification tasks rely on low-resolution or indirect measures of human activity as input signals, even deep learning models exhibit significantly constrained predictive performance. ABODE-Net achieves occupancy state recognition accuracy of approximately 0.86–0.92 and F1 scores around 0.82–0.86 [29], significantly lower than results from studies based on environmental sensing or visual perception.

In contrast, when research objectives shift from “state discrimination” to modeling continuous occupancy levels or occupancy rates, the problem becomes fundamentally closer to a regression task. Such studies often focus on occupancy intensity, expected occupancy, or continuous trend changes rather than discrete occupancy labels. Therefore, in modeling continuous occupancy rates or occupancy intensity, studies typically employ error-based metrics such as RMSE and MAE. In some research that maps continuous occupancy to occupancy state classification within time windows, ranking-based metrics like Average Precision are also introduced to supplement the evaluation of a model’s classification capability across different prediction intervals. For instance, Kim [30] constructed an LSTM model for campus room occupancy based on historical usage data and climate variables, reporting average precision to assess prediction capabilities across different time windows. Qolomany et al. [31] similarly employed regression metrics such as RMSE in Wi-Fi-based occupancy prediction to measure continuous forecasting errors.

As shown in Table 1, the performance variations across different studies indicate that model performance is not solely determined by algorithmic complexity. Instead, it is closely related to the task paradigm, building type, occupancy intensity, input data modality, and temporal resolution. In high-dynamic, strong-signal application scenarios, classification models combined with metrics such as Accuracy and F1 typically provide a reliable representation of model discriminative performance. Conversely, in low-dynamic or passively operated building environments, changes in occupancy levels are often more subtle and better described through regression modeling. Consequently, the predictive performance of such models should be evaluated using error-based metrics such as RMSE and MAE. Based on this understanding, this paper models the occupancy rate prediction problem as a regression task and selects RMSE and MAE as the primary evaluation metrics. This approach more accurately reflects the error characteristics and practical application value of occupancy intensity prediction under low-dynamic, passive building conditions.
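As a minimal sketch of the error-based metrics discussed above, the following Python function computes RMSE, MAE, R², and RPD for paired observed and predicted occupancy rates. The function name and the sample values are hypothetical. RPD is taken here as the sample standard deviation of the observations divided by the RMSE, which is a common definition but an assumption with respect to this study.

```python
import math
import statistics


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R2 and RPD for paired observations and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally sized sequences with at least 2 items")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)                  # sum of squared errors
    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - sse / sst
    # Assumption: RPD = sample standard deviation of observations / RMSE.
    rpd = statistics.stdev(y_true) / rmse
    return {"rmse": rmse, "mae": mae, "r2": r2, "rpd": rpd}


# Hypothetical occupancy rates, for illustration only.
observed = [0.0, 0.25, 0.5, 0.25]
predicted = [0.05, 0.20, 0.45, 0.30]
metrics = regression_metrics(observed, predicted)
```

Under this definition, RPD is approximately 1/√(1 − R²) for large samples, so an R² near 0.96 corresponds to an RPD slightly above 5.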

3. Case Study

3.1. Case Description

The passive ultra-low-energy teaching and laboratory building is located at the new campus of Shandong Jianzhu University in Jinan, Shandong Province, China, and has been recognized as one of the first national passive house demonstration projects. This project was constructed following the German Passive House Design Standards and represents China’s first demonstration project employing prefabricated construction methods for ultra-low-energy buildings, as shown in Figure 1. Figure 1 is an architectural drawing obtained from the building’s internal design documentation, used to illustrate the overall layout and the position of the test room. The building has a footprint area of 1547.43 m², a total floor area of 9696.30 m², and a height of 23.96 m, comprising six above-ground floors [33].

The building envelope was designed in strict accordance with the performance targets of passive ultra-low-energy buildings, with particular emphasis on thermal-bridge-free detailing and airtight construction strategies. The roof assembly achieves a thermal transmittance (U-value) of 0.14 W/(m²·K) and consists of a multilayer build-up including reinforced concrete structural slab, extruded polystyrene (XPS) thermal insulation, SBS-modified bituminous waterproofing membranes, and protective concrete and mortar layers arranged to ensure thermal continuity and moisture control. Similarly, the external wall system attains a U-value of 0.14 W/(m²·K) and adopts an externally insulated configuration composed of graphite polystyrene (GPS) insulation, an autoclaved aerated concrete (AAC) structural layer, and interior and exterior finishing layers designed to maintain thermal performance and airtightness [33]. Detailed layer configurations are illustrated in Figure 2.

The room investigated in this study is Room 605, located on the south side of the teaching building. The passive building serves as office and educational spaces for faculty and students, with primary functional areas including research rooms on the north and south sides and laboratories on the east and west sides. Figure 3 shows simplified floor-plan drawings obtained from the same internal design documentation, used to contextualize the spatial location of Room 605 and to clarify the geometric boundaries relevant for sensor deployment and environmental interpretation. Longitudinal and transverse sectional views passing through Room 605 are further illustrated in Figure 4 to provide vertical spatial context and envelope configuration.

The target room (Room 605) is on the south side of the building and has plan dimensions of 7.8 m × 7.5 m and a clear height of 3.7 m, with a 1.5 m × 2.1 m window. Room 605 is a furnished shared research office designed for a maximum capacity of 12 workstations. Under typical operating conditions, it is intermittently occupied by approximately 2–5 users, primarily engaged in sedentary activities such as computer-based research and administrative work. All measurements and simulations were conducted under normal use conditions, with the room furnished and occupied according to its usual usage pattern. The room is mainly occupied on weekdays (Monday–Friday) from 7:00 a.m. to 10:00 p.m., with breaks at 12:00–1:00 p.m. and 6:00–7:00 p.m.; weekend and holiday occupancy is infrequent. A monitoring campaign was carried out from 1 October to 30 December 2024, with hourly records collected between 6:00 a.m. and 11:00 p.m.

3.2. Data Collection

To support the analysis and modeling presented in later sections, the study first documents the actual on-site sensor deployment in Room 605. Figure 5 provides a concise visual summary of the actual sensor deployment in Room 605. The layouts are derived from internal architectural drawings and field installation records, offering spatial context for interpreting the environmental, air quality, occupancy, and energy-use data.

Thermal environment sensor locations: According to the “Standard for Building Thermal Environment Test Method” (JGJ/T 347-2014) [34], for rooms with an area ranging from 30 m² to less than 60 m², the measurement points for indoor temperature and humidity should be arranged along the longest diagonal of the room at quarter intervals. The sensor height should be 0.6 m above the floor to represent seated occupant conditions and 1.1 m for standing occupant conditions. Figure 5a plots three temperature–humidity loggers along the room diagonal on a scaled floor plan, indicating how the thermal sensors were positioned to comply with JGJ/T 347-2014.
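The quarter-interval rule can be illustrated with a short Python sketch that places three points on the floor-plan diagonal of a 7.8 m × 7.5 m room. The function name and the choice of coordinate origin (one room corner) are assumptions made for illustration; the actual logger coordinates are those shown in Figure 5a.

```python
def diagonal_quarter_points(length_m, width_m):
    """Return the 1/4, 1/2 and 3/4 points on the diagonal from (0, 0) to (length, width)."""
    return [(length_m * f, width_m * f) for f in (0.25, 0.5, 0.75)]


# Plan dimensions of Room 605 as given in Section 3.1.
points = diagonal_quarter_points(7.8, 7.5)
floor_area = 7.8 * 7.5  # about 58.5 m2, inside the 30-60 m2 band of the standard

for x, y in points:
    print(f"logger at x = {x:.3f} m, y = {y:.3f} m")
```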

Air quality sensor locations: Following the “Standard for Indoor Air Quality Control Design of Buildings” (JGJ/T 461-2019) [35], rooms with areas between 50 m² and 200 m² should be equipped with two sensors. These sensors should be installed in areas with good airflow, at a height ranging between 1.0 m and 1.5 m above the floor, while avoiding regions near ventilation openings or air ducts with high airflow velocity. Figure 5b shows the two CO₂/air-quality sensors placed along the same diagonal line on the room plan, indicating their positioning within the well-mixed airflow zone.

Occupancy sensor locations: The arrangement of the PIR sensors was chosen to suit the size and shape of the room. For square rooms larger than 50 m², a five-point arrangement was used: one sensor mounted at the center of the ceiling to monitor the main activity areas, and four mounted in the corners at a height of 2.4–2.7 m and oriented toward the center of the room to ensure coverage without blind spots. In addition to the sensor data, occupancy labels generated from user activity logs were introduced as reference values. These logs are aggregated by hour. They capture sedentary and other static states more reliably than the PIR sensors, compensate for the sensors' weaknesses in low-activity situations, and provide an accurate count of the people present, which improves the accuracy of the model analysis. Figure 5c depicts this five-point ceiling layout on the floor plan, making explicit how the PIR sensors were distributed to achieve full coverage and minimize blind zones in occupancy detection.

Energy consumption sensor locations: This passive, low-energy office room has 12 workstations, typically occupied by 2–4 people, with a relatively constant pattern of energy consumption. The room is equipped with four sets of three-tube fluorescent lamps, a vertical water heater, and two less frequently used monitors with a lower load. Three smart sockets were installed to monitor power usage: two on the power strips at the main workstations to record office equipment usage, and one on the water heater to monitor its intermittent high power draw. The lighting system is hard-wired rather than connected to smart sockets, and its energy consumption is included in the overall assessment through theoretical estimates. Figure 5d locates the three smart sockets along two horizontal measurement lines in the room, highlighting the areas where plug-level energy use was monitored.

3.3. Dataset Description

Based on the deployed sensors, a multi-source dataset was constructed to simultaneously capture external drivers and indoor responses influencing the occupancy dynamics of passive buildings. The recorded variables include meteorological data (outdoor temperature, relative humidity, wind velocity, rainfall), indoor environmental parameters (indoor temperature and humidity, CO₂ concentration, door/window status), temporal descriptors, and plug-level energy consumption. Together, these variables represent the major physical and behavioral factors affecting occupancy behavior in a low-dynamic indoor environment. Table 2 lists all input features and the target variable used in the model, together with their abbreviations, data types, and units. By specifying these attributes, the table clarifies the structure of the dataset and facilitates the understanding of the preprocessing and modeling workflow. Table 3 presents the technical specifications of the sensors that generated the variables, documenting measurement ranges and accuracies. This information is used to characterize the quality of the raw data, and the reliability of the multi-source inputs used in model development can therefore be evaluated.

In this study, occupancy refers to the number of people present in the monitored room during each hourly sampling interval, obtained through PIR sensor data and manually recorded logs. The occupancy rate is defined as the ratio between this hourly occupant count and the maximum designed capacity of the room, providing a continuous ground-truth label used for model training and evaluation.
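The label definition above amounts to a single division, sketched below in Python. The capacity of 12 comes from Section 3.1; the function name, the hourly counts, and the decision to reject counts above capacity are assumptions made for illustration.

```python
MAX_CAPACITY = 12  # designed workstations in Room 605 (Section 3.1)


def occupancy_rate(count, capacity=MAX_CAPACITY):
    """Convert an hourly occupant count into an occupancy rate in [0, 1]."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not 0 <= count <= capacity:
        # Assumption: counts outside [0, capacity] are treated as data errors.
        raise ValueError("count must lie between 0 and capacity")
    return count / capacity


# Hypothetical hourly occupant counts, for illustration only.
hourly_counts = [0, 2, 5, 3]
rates = [occupancy_rate(c) for c in hourly_counts]
```

With 2–5 typical users, the labels under this definition would mostly fall between about 0.17 and 0.42.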

4. Methodology

Occupancy rate prediction is defined as the core methodological task. Based on the actual data-processing and modeling procedures implemented in this study, Figure 6 was generated to illustrate the overall workflow of the proposed occupancy rate prediction framework. It outlines how multi-source environmental and behavioral data are collected, preprocessed, and fed into the CNN–ResNet–RF model to generate continuous occupancy-rate estimates. This integrated pipeline clarifies the methodological sequence adopted in this study and supports the development of intelligent, occupancy-aware energy management strategies for low-energy buildings.

The prediction task is formulated as a supervised regression problem. The regression target is the hourly occupancy rate of the monitored room, defined as a continuous variable between 0 and 1. Let x_t∈R¹⁰ denote the vector of multi-source environmental and behavioral features at hour t (corresponding to the 10 input independent variables listed in Table 2), and let y_t ∈ [0, 1] denote the occupancy-rate label defined in Section 3.3. The CNN–ResNet–RF and baseline models learn a mapping from x_t to ŷ_t, optimized using mean-squared-error loss, as defined in Equation (1).

f_{θ} : x_{t} \mapsto {\hat{y}}_{t}

(1)
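For reference, the mean-squared-error objective mentioned above can be written in its standard form. Here N, the number of training samples, is notation introduced for illustration and does not appear in the original formulation.

```latex
\mathcal{L}(\theta) = \frac{1}{N} \sum_{t=1}^{N} \left( y_{t} - f_{\theta}(x_{t}) \right)^{2}
```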

4.1. Data Preprocessing

To ensure comparability across heterogeneous variables, outliers were capped at the 1st–99th percentiles, and missing values (<2.5%) were imputed using within-hour means. Continuous features were standardized using z-score normalization (Equation (2)) and, where required by model input ranges, min–max normalized to [0, 1] (Equation (3)). All preprocessing steps were fitted exclusively on the training folds to prevent data leakage.

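X_{\text{z-score}} = \frac{X_{i} - \overline{X}}{\sigma}

(2)

where \overline{X} and \sigma denote the mean and standard deviation of the feature.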

X_{\text{min-max}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}

(3)
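The preprocessing steps can be sketched in Python as follows, under several assumptions: the function names and sample CO₂ values are hypothetical, percentiles use linear interpolation, the population standard deviation is used for the z-score, and the within-hour mean imputation is omitted for brevity. The point of the sketch is that capping bounds and scaling statistics are learned from the training data only and then reused unchanged on held-out data.

```python
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def fit_preprocessor(train_values):
    """Learn capping bounds and scaling statistics from training data only."""
    ordered = sorted(train_values)
    low, high = percentile(ordered, 1), percentile(ordered, 99)
    capped = [min(max(v, low), high) for v in train_values]
    return {
        "low": low, "high": high,
        "mean": statistics.fmean(capped),
        "std": statistics.pstdev(capped),  # assumption: population std
        "min": min(capped), "max": max(capped),
    }


def transform(values, params, method="zscore"):
    """Cap to the training percentiles, then apply z-score or min-max scaling."""
    capped = [min(max(v, params["low"]), params["high"]) for v in values]
    if method == "zscore":
        return [(v - params["mean"]) / params["std"] for v in capped]
    span = params["max"] - params["min"]
    return [(v - params["min"]) / span for v in capped]


# Hypothetical CO2 readings in ppm, for illustration only.
train_co2 = [420, 450, 480, 510, 540, 600, 650, 700, 900, 2500]
test_co2 = [400, 800, 3000]

params = fit_preprocessor(train_co2)            # fitted on the training fold
train_z = transform(train_co2, params)          # Equation (2)
test_scaled = transform(test_co2, params, method="minmax")  # Equation (3)
```

Because the held-out values are capped to the training percentiles before scaling, the min-max output stays within [0, 1] even for readings outside the training range.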

4.2. CNN–ResNet–RF Modeling Framework

The proposed CNN–ResNet–RF model integrates convolutional feature extraction, residual feature refinement, and Random Forest regression into a unified hybrid architecture for continuous occupancy-rate prediction. To the best of our knowledge, this specific sequential integration has not been systematically reported in the existing literature. This hybrid design is motivated by the low-dynamic and weak-signal characteristics of passive buildings, where CNN-based feature learning helps capture local patterns across multi-source inputs, residual connections stabilize deeper representations and mitigate degradation in learning, and the RF regressor improves robustness and generalization under limited, noisy, and heterogeneous sensor data [36,37,38]. Figure 7 illustrates the overall architecture and the conceptual flow between components. The following subsections describe the implementation details of each module.

4.2.1. CNN Module: Low-Level Feature Extraction

The CNN module takes 10-dimensional normalized feature vectors as input and passes them sequentially through two one-dimensional convolutional layers with kernel sizes of 3 and 2, respectively. Combined with batch normalization and ReLU activation functions, these layers extract local features. The subsequent two max-pooling layers progressively reduce spatial dimensions while preserving critical temporal feature information. The resulting feature maps are flattened before being fed into a fully connected layer that represents higher-level features. Figure 8 illustrates the CNN front-end architecture adopted in this study, depicting the transformation path of input feature vectors through the convolutional, pooling, and fully connected layers.
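To make the dimension reduction concrete, the sketch below traces the feature-map length through the layers using the standard output-length formula for one-dimensional convolution and pooling. Only the input length (10) and the kernel sizes (3 and 2) come from the text. The stride, the absence of padding, the pooling size of 2, and the layer order are assumptions, so the resulting lengths are illustrative and may differ from the authors' network.

```python
def output_length(n, kernel, stride=1, padding=0):
    """Output length of a 1D convolution or pooling layer."""
    if n + 2 * padding < kernel:
        raise ValueError("kernel is larger than the padded input")
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel size, stride); pooling size and strides are assumed values.
layers = [
    ("conv1", 3, 1),
    ("conv2", 2, 1),
    ("pool1", 2, 2),
    ("pool2", 2, 2),
]

length = 10  # 10-dimensional input feature vector
trace = [("input", length)]
for name, kernel, stride in layers:
    length = output_length(length, kernel, stride)
    trace.append((name, length))

for name, n in trace:
    print(f"{name}: length {n}")
```

With a short input of length 10, padding and pooling choices strongly affect how much of the sequence survives to the fully connected layer, which is why these assumptions matter.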

4.2.2. ResNet Module: High-Level Feature Extraction

To mitigate the vanishing gradient problem in deep networks and stabilize temporal features, this study introduces residual blocks following CNN layers. The measured physical variables in Room 605 exhibit dual temporal characteristics under passive ultra-low-energy operation. These quasi-stable and transient dynamics indicate that predictive information is embedded both in persistent baseline patterns and in short-term response signals. Within this context, successive nonlinear transformations in convolutional layers may gradually emphasize dominant variations while potentially attenuating subtle baseline information that remains informative under low-dynamic conditions. Residual connections mitigate this risk by preserving identity mappings of earlier representations, enabling deeper layers to refine feature abstractions without discarding the underlying temporal and structural characteristics of the input signals. Figure 9 illustrates the residual block architecture within the hybrid model, demonstrating how the intermediate transformation F(x) is added to the identity mapping to form the residual output. This approach maintains the stability of input information throughout the deep feature extraction process.
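The identity-mapping idea can be sketched in a few lines of Python. This is a simplified stand-in, not the block shown in Figure 9: the transformation F(x) is assumed here to be a single linear map followed by ReLU, whereas the actual block presumably uses convolution and normalization layers. All names and values are hypothetical.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in vector]


def linear(vector, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * a for w, a in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, weights, bias):
    """Return F(x) + x, where F is a linear map followed by ReLU (assumed form)."""
    fx = relu(linear(x, weights, bias))
    return [f + a for f, a in zip(fx, x)]


x = [0.5, -1.0]  # hypothetical feature vector

# If the learned transformation contributes nothing, the input passes through unchanged.
zero_w = [[0.0, 0.0], [0.0, 0.0]]
unchanged = residual_block(x, zero_w, [0.0, 0.0])

# With a non-trivial F, the block refines the input instead of replacing it.
identity_w = [[1.0, 0.0], [0.0, 1.0]]
refined = residual_block(x, identity_w, [0.0, 0.0])
```

The first call shows why subtle baseline information is not lost: even when F(x) is zero, the skip connection carries x forward intact.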

4.2.3. RF Module: Deep-Feature Regression

After convolutional and residual feature extraction, the RF regressor maps the high-level feature representations to continuous occupancy-rate predictions in the range 0–1. By aggregating the outputs of multiple decision trees, RF improves robustness and reduces overfitting.

Figure 10 illustrates the RF regression mechanism, showing how individual decision trees generate independent estimates that are averaged to form the final prediction.
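The averaging step can be sketched in a few lines. The three single-split "trees" below are hypothetical stand-ins with invented thresholds and leaf values. They are not trees fitted in this study and serve only to show how independent estimates are combined.

```python
from statistics import mean


def make_stump(feature_index, threshold, low_value, high_value):
    """Build a one-split regression tree (hypothetical values)."""
    def stump(features):
        if features[feature_index] <= threshold:
            return low_value
        return high_value
    return stump


def forest_predict(trees, features):
    """Average the independent estimates of all trees."""
    return mean([tree(features) for tree in trees])


trees = [
    make_stump(0, 0.5, 0.1, 0.7),
    make_stump(1, 0.3, 0.2, 0.8),
    make_stump(0, 0.6, 0.0, 0.9),
]
prediction = forest_predict(trees, [0.55, 0.4])
print(round(prediction, 3))
```

Because every leaf value is an average of training targets that lie in the range 0–1, the mean over trees also stays within that range, which matches the bounded occupancy-rate output described above.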

4.3. Model Training, Evaluation, and Optimization

4.3.1. Training Configuration

This section presents the unified workflow adopted for training and evaluating the CNN–ResNet–RF hybrid model and the baseline models. Figure 11 organizes the model-training procedure into a schematic form derived from the study’s experimental design. It shows how the dataset undergoes preprocessing, how the cleaned and transformed features are repeatedly divided into training and evaluation subsets, and how these subsets are used to train and assess the CNN, RF, CNN-RF, and CNN–ResNet-RF models. The flowchart summarizes the full experimental workflow, from data preparation to iterative training and evaluation, ensuring that all models are compared under a consistent and reproducible sampling strategy. In each evaluation round, the trained model outputs the predicted occupancy rate ŷ, which is compared with the observed occupancy rate y to quantify prediction performance.

Model implementation was conducted in MATLAB R2023b (MathWorks, Natick, MA, USA) with GPU acceleration. Input data were normalized prior to training to ensure numerical stability and consistent scaling for the CNN–ResNet components. The model was trained for 70 epochs using a batch size of 16 and the Adam optimizer (learning rate = 0.001, β₁ = 0.9, β₂ = 0.999, ε = 1 × 10⁻⁸). The mean-squared error (MSE) served as the objective loss function. For the RF regression component, a fixed hyperparameter configuration was adopted to ensure reproducibility and fair comparison across models. Specifically, the number of trees was set to 100, the minimum number of samples per leaf node was set to 5, and the number of features considered at each split was set to the square root of the deep feature dimension produced by the preceding network. Bootstrap sampling was enabled, and all RF models shared identical hyperparameter settings across experiments.
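For reference, the reported settings can be collected in a small configuration sketch. The dictionary keys and the helper function are hypothetical names. The rounding applied to the square-root rule is an assumption, since the text does not state how a non-integer square root is handled. The feature dimension of 96 used in the example assumes that the deep features are the outputs of the 96-neuron fully connected layer described in Section 4.3.3.

```python
import math

# Values reported in the text; key names are illustrative only.
TRAINING_CONFIG = {
    "epochs": 70,
    "batch_size": 16,
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "beta1": 0.9,
    "beta2": 0.999,
    "epsilon": 1e-8,
    "loss": "mse",
}

FOREST_CONFIG = {
    "n_trees": 100,
    "min_samples_leaf": 5,
    "bootstrap": True,
}


def features_per_split(deep_feature_dim):
    """Square-root rule for the number of features tried at each split.

    Rounding down is an assumption made for this sketch.
    """
    return max(1, math.isqrt(deep_feature_dim))


print(features_per_split(96))
```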

To assess model stability and detect overfitting, a repeated 60–40% train–test splitting procedure was employed: in each round, the dataset was randomly shuffled and then partitioned into 60% training and 40% test samples. This configuration enhances robustness against sampling variability and enables fair comparisons among different models.
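The repeated splitting procedure can be sketched as follows. The number of rounds and the random seed are hypothetical, since the text does not report them, and the function name is illustrative.

```python
import random


def repeated_splits(n_samples, n_rounds, train_fraction=0.6, seed=0):
    """Yield (train_indices, test_indices) for each round.

    The indices are reshuffled before every partition, so each round
    evaluates the models on a different 60/40 split.
    """
    rng = random.Random(seed)
    n_train = round(train_fraction * n_samples)
    for _ in range(n_rounds):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[:n_train], indices[n_train:]


# Hypothetical sizes: 100 samples, 5 rounds.
splits = list(repeated_splits(n_samples=100, n_rounds=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Reusing the same index splits for every model is one way to keep the comparison between the CNN, RF, CNN-RF, and CNN–ResNet–RF models consistent within each round.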

4.3.2. Performance Evaluation Metrics

Model performance was quantitatively assessed using five standard regression metrics: the coefficient of determination (R²), root mean square error (RMSE), mean squared error (MSE), mean absolute error (MAE), and residual prediction deviation (RPD), as defined in Equations (4)–(8). These metrics jointly characterize goodness of fit, absolute error magnitude, and predictive reliability relative to the natural variability of occupancy-rate data. Higher R² and RPD values and lower RMSE, MSE, and MAE indicate better predictive accuracy.

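R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(4)

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(5)

\mathrm{MSE} = \frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}

(6)

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - \hat{y}_{i} |

(7)

\mathrm{RPD} = \frac{\mathrm{SD}_{y}}{\mathrm{RMSE}}

(8)

Here, yᵢ and ŷᵢ denote the observed and predicted occupancy rates of sample i, ȳ is the mean of the observed values, n is the number of samples, and SD_y is the standard deviation of the observed values.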
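A minimal sketch of how Equations (4)–(8) can be computed is given below. The use of the sample standard deviation (n − 1 in the denominator) for the RPD is an assumption, because the text does not state which form was used. The three data points are invented for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute R2, RMSE, MSE, MAE, and RPD for paired observations."""
    n = len(y_true)
    y_mean = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n
    # Sample standard deviation of the observations (assumption).
    sd_y = math.sqrt(ss_tot / (n - 1))
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "MSE": mse,
        "MAE": mae,
        "RPD": sd_y / rmse,
    }


# Invented example values, not data from the study.
metrics = regression_metrics([0.0, 0.5, 1.0], [0.1, 0.5, 0.9])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under the sample-SD assumption, RPD² = n / ((n − 1)(1 − R²)) when both metrics are computed on the same set, so R² and RPD carry closely related information.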

4.3.3. Design of Lightweight Variants

Building on the baseline architecture, the lightweight design was introduced to enhance computational efficiency and enable real-time deployment on resource-constrained platforms. The baseline CNN–ResNet–RF model employs two convolutional layers, each with 8 filters, followed by batch normalization, ReLU activation, and 2 × 2 max-pooling, as well as a fully connected (FC) layer containing 96 neurons before the Random Forest regressor. In the lightweight configuration, the number of convolutional filters and the FC width were proportionally reduced while maintaining all other hyperparameters and regularization terms identical to the baseline. Three compressed variants were implemented—CNN–ResNet–RF-75, CNN–ResNet–RF-50, and CNN–ResNet–RF-25—which retained 75%, 50%, and 25% of the original convolutional filters, respectively. Consequently, their convolutional parameters and multiply–accumulate operations (MACs) decreased by approximately 25%, 50%, and 75% relative to the baseline, with corresponding reductions in model size, training time per epoch, and inference latency. The structural integrity of the residual connections and batch normalization was preserved to ensure stable training. All models adopted identical training configurations to ensure fair comparison among variants.
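The proportional reduction can be made concrete with a short sketch that derives layer widths from the baseline values stated above (8 filters per convolutional layer and 96 FC neurons). The function and dictionary key names are hypothetical. The resulting widths follow from the stated retention ratios rather than from a published table.

```python
BASELINE_FILTERS = 8   # filters per convolutional layer (from the text)
BASELINE_FC_UNITS = 96  # neurons in the fully connected layer (from the text)


def scaled_widths(keep_ratio):
    """Layer widths after retaining `keep_ratio` of the baseline width."""
    return {
        "conv_filters": round(BASELINE_FILTERS * keep_ratio),
        "fc_units": round(BASELINE_FC_UNITS * keep_ratio),
    }


variants = {
    "CNN-ResNet-RF-75": scaled_widths(0.75),
    "CNN-ResNet-RF-50": scaled_widths(0.50),
    "CNN-ResNet-RF-25": scaled_widths(0.25),
}
for name, widths in variants.items():
    print(name, widths)
```

With these ratios the three variants use 6, 4, and 2 filters per convolutional layer and 72, 48, and 24 FC neurons, respectively.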

5. Results

5.1. Feature Correlation Analysis and Predictor Categorization

To understand how environmental and behavioral variables contribute to occupancy-rate variation in a passive ultra-low-energy building, this section first examines their statistical relationships before model training. The objective is to identify which inputs carry meaningful occupancy-related signals under low-dynamic indoor conditions and to classify predictors according to their relevance.

The correlation heatmap in Figure 12 was generated by computing pairwise Pearson correlation coefficients across all features in the dataset, visualized to reveal linear dependencies among variables. It shows that variables coming from the indoor loop (CO₂, indoor temperature/humidity, plug-level energy, window/door state) form a tight cluster, while outdoor meteorological factors are only weakly linked to occupancy. This confirms that, in a passive ultra-low-energy building with stable envelope performance, occupancy-induced signals are mainly reflected in indoor air and plug-load responses rather than in outdoor drivers.
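The pairwise computation behind such a heatmap can be sketched as follows. The column names and values are invented placeholders, not measurements from Room 605, and the helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series.

    Assumes both series have non-zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of named columns."""
    names = list(columns)
    return {
        a: {b: pearson_r(columns[a], columns[b]) for b in names}
        for a in names
    }


# Invented toy data for illustration only.
toy = {
    "occupancy_rate": [0.0, 0.2, 0.5, 0.8, 0.4],
    "co2_ppm": [420.0, 480.0, 650.0, 820.0, 600.0],
    "outdoor_temp": [12.0, 9.0, 11.0, 10.0, 12.5],
}
matrix = correlation_matrix(toy)
print(round(matrix["occupancy_rate"]["co2_ppm"], 3))
```

Each cell of the heatmap corresponds to one entry of this matrix. The coefficient captures only linear dependence, which is why nonlinear model-based importance can rank features differently.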

The boxplots in Figure 13 provide distribution-level evidence of how indoor environmental variables respond to different occupancy levels. These boxplots were generated by grouping all hourly samples by occupancy category and summarizing their statistical distribution (median, quartiles, and outlier range) for each environmental feature. CO₂ concentration exhibits the clearest upward shift, consistent with its comparatively strong correlation with occupancy rate and its sensitivity to human presence. Indoor energy use also increases with higher occupancy and shows greater variability at peak levels, reflecting intensified appliance and plug-load activity. By contrast, indoor temperature and humidity change only modestly, consistent with the stable thermal performance of the building, and outdoor variables show almost no systematic variation across occupancy groups, underscoring their limited relevance for occupancy inference. These distributional patterns further support the feature-relevance trends identified in the correlation analysis.

Figure 14 presents the feature-wise correlation coefficients with occupancy rate as a bar plot, providing a clear visual comparison of their relative strength. It shows that indoor CO₂ concentration has the strongest linear correlation with occupancy rate (r = 0.54), followed by indoor temperature, plug-load energy, and opening/closing status (r ≈ 0.09–0.20). This ranking clarifies the relative linear association of each feature with occupancy rate. Note that no feature elimination was performed; all variables listed in Table 2 were retained in the modeling process to preserve potential nonlinear and interaction effects.

At the model level, the feature importance results from the random forest further reveal the mechanism of occupancy signals in passive ultra-low-energy buildings. Unlike simple linear correlations, feature importance reflects the actual contribution of each variable to reducing prediction error in a nonlinear decision-making process. As shown in Figure 15, outlet-level energy consumption and door or window status dominate the prediction process, indicating that in passive buildings with stable operating conditions and limited environmental fluctuations, occupancy behavior is more readily identified through behavioral and equipment response variables that directly reflect human activity. Although indoor CO₂ concentration exhibited the strongest linear correlation in the preceding analysis, its feature importance was relatively low, suggesting that its predictive contribution may be partially captured by behavior-related variables. In contrast, outdoor meteorological factors exhibited negligible or inconsistent contributions. This indicates that in passive buildings, the explanatory power of external environmental changes for occupancy rate prediction is significantly weakened by building envelope characteristics and operational strategies. These findings suggest that for low-dynamic building environments, emphasizing behavioral and equipment response features is more conducive to constructing robust occupancy rate prediction models than relying solely on environmental physical quantities.

5.2. Performance Evaluation of Occupancy Rate Prediction

5.2.1. Performance of Individual Models

This subsection evaluates the performance of different models in occupancy rate prediction, focusing on their ability to predict hourly occupancy rates. Because the prediction task is formulated as a supervised regression problem, the predicted occupancy-rate values on the held-out test set are compared directly with the measured rates. These results are illustrated through scatter plots against the 1:1 reference line (Figure 16, Figure 17, Figure 18 and Figure 19), which reveal how well each model reconstructs the underlying occupancy-rate patterns and where systematic deviations occur. The plots provide a clear basis for assessing variance capture, bias tendencies, and the overall predictive reliability of each model.

In the training phase, the CNN model performs the weakest among the four. As shown in Figure 16, although predictions are relatively accurate at low occupancy levels, errors increase substantially in the mid- and high-occupancy intervals, indicating limited ability to model nonlinear indoor feature relationships. Performance declines further in the testing phase: when occupancy exceeds 0.6, deviations grow rapidly and prediction variance becomes large, revealing poor generalization and sensitivity to previously unseen conditions.

As shown in Figure 17, in the training phase, the CNN-RF model shows improved fitting ability over the pure CNN, with predictions tightly clustered around the ideal line in the 0.3–0.8 interval. However, systematic underestimation persists near extreme occupancy levels (~0.9), indicating that shallow CNN features still struggle to encode high-intensity events. In the testing phase, the CNN-RF model generalizes better than the CNN model. Its performance remains acceptable in the 0.2–0.6 range, but prediction variance increases significantly at higher occupancy rates, reflecting reduced robustness when confronted with unseen complex patterns.

The performance of the RF model on the training set is comparable to that of CNN-ResNet-RF, with most points distributed near the ideal line. As shown in Figure 18, its strong accuracy in the 0.2–0.8 interval suggests that key indoor variables are well captured by ensemble-based decision structures. However, under high-occupancy conditions (~0.9), the model occasionally underestimates peaks, indicating limitations in modeling rapid shifts. During testing, the RF model continues to show strong generalization. Although slight deviations and increased dispersion are observed at higher occupancy levels, the overall trend alignment remains stable, indicating robust predictive performance for practical applications.

The CNN-ResNet-RF model exhibits good fitting performance in the training phase, with predicted values highly consistent with the actual occupancy rates. As shown in Figure 19, the predictions align closely with the 1:1 reference line, indicating small residual dispersion and only a few outliers. This demonstrates that the model effectively captures the dominant variation patterns of occupancy rate without significant overfitting. In the testing phase, the CNN-ResNet-RF model maintains reasonable generalization performance, although dispersion increases relative to the training phase. Mild over- and under-estimation still occur within the 0.2–0.8 range, but these deviations remain limited and likely reflect the sparse representation of certain environmental fluctuations in the training data. Overall, the model shows high predictive accuracy with room for refinement in mid-range transitions.

5.2.2. Metric Results and Cross-Model Comparison

To assess how the four models behave across the full range of occupancy conditions, their test-set predictions were directly compared with the ground-truth values using sample-aligned prediction curves (Figure 20, Figure 21, Figure 22 and Figure 23). These plots were generated by pairing each predicted occupancy-rate value with its corresponding measured value at the same time index, and overlaying both series to visualize point-wise agreement. This representation highlights how closely each model follows the temporal pattern of the true occupancy rate, where deviations emerge, and whether errors cluster during rapid changes or remain stable under flat conditions. In the error distribution panels, the horizontal axis shows the prediction error (y_pred − y_true) and the vertical axis indicates error density. In the zoomed prediction panels, the horizontal axis represents the sample index in the test set and the vertical axis represents the occupancy rate, with curves corresponding to y_true and y_pred.

The CNN–ResNet–RF model maintains the tightest prediction alignment within the zoomed test interval, showing small and relatively stable deviations from the true values. By contrast, the RF model exhibits slightly larger departures from the ground truth at several peak points, indicating that its predictions fluctuate more around sharp changes. This behavior suggests that the hybrid structure enhances the model’s ability to follow rapid variations in the occupancy trajectory, whereas the pure RF tends to respond more conservatively to sharp peaks. A different weakness appears in the CNN and CNN–RF models: their prediction errors increase noticeably during segments where the occupancy curve remains relatively flat, suggesting that these architectures rely more heavily on pronounced feature gradients. Taken together, these observations show how the four architectures differ in their capacity to track rapid transitions while maintaining accuracy under smoother conditions.

Figure 24 provides a unified view of prediction-error behavior across the four models. The left panel shows model-specific histograms of raw prediction errors, while the right panel presents smoothed error-density curves on a common scale, enabling direct comparison of error dispersion and tail behavior. Together, these visualizations characterize how tightly each model’s errors are concentrated around zero and how frequently large deviations occur. The CNN–ResNet–RF model exhibits the most compact and symmetric distribution centered near zero, indicating the lowest error variability. In contrast, the CNN model shows the widest spread with heavier tails, reflecting larger variance and more frequent extreme errors, while the RF and CNN–RF models demonstrate intermediate behavior.

Overall, model stability in passive buildings is associated with model depth and with each model’s ability to respond to changing input patterns: small but correlated changes in variables such as CO₂ concentration and energy use are better handled when residual paths enforce temporal continuity, while sparse, high-impact events such as door or window operations are still effectively captured by the RF back end. The combined evidence from temporal plots and error distributions helps explain the superior consistency of the CNN–ResNet–RF architecture across both steady and transition periods.

Figure 25 and Figure 26 are included to provide a comprehensive and comparable evaluation of regression accuracy across the four models beyond individual case analyses. Figure 25 is generated by computing standard regression metrics (R², RMSE, MSE, and RPD) on the same test set for all architectures, while Figure 26 visualizes the corresponding MAE distributions using violin plots derived from the same predictions.

Together, the figures show that the CNN–ResNet–RF model achieves the best overall accuracy, while the RF model performs at a comparable level on this hourly, low-dynamic dataset, and both outperform the CNN and CNN–RF baselines. The consistent performance ordering (CNN–ResNet–RF ≈ RF > CNN–RF > CNN) indicates that integrating residual learning and RF-based regression improves robustness without eliminating the strengths of tree ensembles. These results support the conclusion that hybrid deep–ensemble architectures offer a reliable and competitive approach for occupancy prediction in passive ultra-low-energy buildings.

5.3. Performance of Lightweight CNN–ResNet–RF Variants

To support the lightweighting analysis, Table 4 and Figure 27 were introduced to evaluate whether model compression preserves predictive fidelity while reducing architectural complexity under a unified evaluation setting. Figure 27a shows that the predicted values from all compressed variants cluster closely around the 1:1 reference line, indicating that the core input–output mapping learned by the full model is largely retained after compression. Figure 27b compares the sample-wise series across models and demonstrates that the compressed variants follow the peaks and valleys of the true values with only minor deviations, illustrating consistent sample-level correspondence across the test set. Table 4 then summarizes the corresponding regression metrics for all compression levels using the same test set.

The compression experiment reveals that the original architecture is moderately over-parameterized relative to the low-dynamic characteristics of this dataset. When the number of convolutional filters and the width of the fully connected layer are reduced to 25% of the baseline configuration, the model does not lose accuracy; instead, it slightly improves to R² = 0.9748, RMSE = 0.0449, and RPD = 6.327. This outcome suggests that the residual blocks perform stable feature routing and can therefore be retained. It also suggests that a smaller model actually generalizes better on repetitive, hourly passive-building data. The 50% variant also maintains high accuracy (R² > 0.96, RPD > 5), which makes it a realistic option for deployment on mid-range edge devices. Only the 75% variant shows a small decline across all metrics, suggesting that lightweighting needs to be applied decisively rather than marginally to obtain a good accuracy–complexity trade-off. For the intended application—real-time or near-real-time occupancy-aware control in passive ultra-low-energy buildings, where sensor signals are weak but the control cycle is relatively slow—this result is important: a compact hybrid model that can run locally on building controllers is more valuable than a larger model that only performs well in an offline laboratory setting.

6. Conclusions

The results demonstrate the application of the CNN-ResNet-RF-based method for real-time occupancy rate prediction in passive ultra-low-energy buildings. The proposed approach effectively captures the complex relationships among environmental variables, occupancy rate dynamics, and energy consumption, thereby providing a valuable basis for improving both energy management and occupant comfort. The research highlights several contributions. First, it applies a CNN-ResNet-RF hybrid model for occupancy rate prediction in passive ultra-low-energy buildings by integrating multi-source feature extraction with ensemble learning to model occupant behavior. Second, it constructs a multi-source dataset that includes CO₂ concentration, temperature, humidity, energy consumption, and user activity logs, which supports a more comprehensive modeling of occupancy rate dynamics. Third, it provides a thorough evaluation of the applicability and robustness of the CNN-ResNet-RF method, demonstrating superior accuracy and deployment potential compared with single models and conventional hybrid approaches. Finally, it assesses compressed variants of the CNN-ResNet-RF model and shows that reducing the number of CNN filters can maintain predictive performance, offering practical guidance for deployment in resource-constrained environments.

Despite these promising outcomes, certain limitations should be acknowledged. The model was trained and tested using data from a single room within a building, which raises questions about its generalizability to other building types and occupancy contexts; future research should therefore extend the study to multiple rooms and buildings. Moreover, occupancy rate was treated as a single aggregated variable without distinguishing between different occupant roles, activities, or preferences, which may limit the model’s ability to capture behavior-specific patterns relevant to advanced occupancy-aware control strategies. Additionally, the dataset used in this study only covers autumn and early winter, making it necessary for future investigations to include multi-seasonal datasets to evaluate model performance under broader climatic conditions. Nevertheless, the proposed framework is based on commonly available non-intrusive environmental and behavioral variables. Therefore, although its performance has only been validated in a single room and limited seasonal context, the modeling pipeline remains structurally adaptable to other rooms and building scenarios, subject to further empirical validation.

Furthermore, the model’s predictive performance under extremely low occupancy scenarios merits further attention. When a space is nearly vacant or occupied by only one or two individuals, changes in indoor environmental conditions are typically minimal, and the associated signal patterns may become less clear, potentially increasing prediction uncertainty. Although the model demonstrates stable overall performance on the current dataset, its reliability in near-vacant situations would benefit from validation using datasets containing a higher proportion of low-occupancy samples. In addition, this study models occupancy rate as a continuous aggregated metric without explicitly distinguishing between individual and group behaviors. In practical contexts, however, environmental responses associated with group activities may differ from those generated by individual actions. Because the current framework does not explicitly account for these behavioral distinctions, patterns arising from collective interactions may not be fully separable from those produced by solitary occupants. Future research could incorporate behavior-oriented features or integrate occupancy prediction with activity recognition techniques to improve behavioral interpretability and modeling granularity.

From a practical perspective, advanced occupancy rate prediction methods such as the one presented in this study can play a crucial role in supporting building energy management systems by informing HVAC operations, potentially reducing energy consumption while maintaining occupant comfort. Future research should continue to expand the model’s application to a wider variety of building types, explore the use of transfer learning to adapt to unseen data, and refine occupancy rate prediction strategies to support the development of intelligent and energy-efficient building environments.

Author Contributions

Conceptualization, Y.L.; Investigation, Y.L.; Software, Y.L.; Visualization, Y.L.; Writing—original draft preparation, Y.L.; Methodology, Y.X., R.W. and C.L.; Resources, Y.X.; Supervision, Y.X. and C.L.; Writing—review and editing, Y.X. and C.L.; Formal analysis, C.L.; Data curation, R.W.; Project administration, R.W.; Validation, R.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available from the corresponding author upon reasonable request, subject to privacy and institutional restrictions.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT (OpenAI), GPT-4 for language editing. The authors reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wang, X.; Kang, X.; Zhou, X.; Sun, H.; Shuai, Z.; Yan, D. Investigating the predictability of building occupancy time series using an information entropy-based approach. Build. Environ. 2025, 289, 114105. [Google Scholar] [CrossRef]
Liang, X.; Hong, T.; Shen, G.Q. Improving the accuracy of energy baseline models for commercial buildings with occupancy data. Appl. Energy 2016, 179, 247–260. [Google Scholar] [CrossRef]
Zhang, W.; Wu, Y.; Calautit, J.K. A review of occupancy rate prediction through machine learning for enhancing energy efficiency, air quality, and thermal comfort in the built environment. Renew. Sustain. Energy Rev. 2022, 167, 112704. [Google Scholar] [CrossRef]
Rueda, L.; Agbossou, K.; Cardenas, A.; Henao, N.; Kelouwani, S. A comprehensive review of approaches to building occupancy detection. Build. Environ. 2020, 180, 106966. [Google Scholar] [CrossRef]
Balvedi, B.F.; Ghisi, E.; Lamberts, R. A review of occupant behavior in residential buildings. Energy Build. 2018, 174, 495–505. [Google Scholar] [CrossRef]
Wang, W.; Chen, J.; Hong, T. Occupancy rate prediction through machine learning and data fusion of environmental sensing and Wi-Fi sensing in buildings. Autom. Constr. 2018, 94, 233–243. [Google Scholar] [CrossRef]
Wang, W.; Chen, J.; Song, X. Modeling and predicting occupancy profile in office space with a Wi-Fi probe-based Dynamic Markov Time-Window Inference approach. Build. Environ. 2017, 124, 130–142. [Google Scholar] [CrossRef]
Sun, K. DMFF: Deep multimodel feature fusion for building occupancy detection. Build. Environ. 2024, 253, 111355. [Google Scholar] [CrossRef]
Zhang, W.; Calautit, J.; Tien, P.W.; Wu, Y.; Wei, S. Deep learning models for vision-based occupancy detection in high occupancy buildings. J. Build. Eng. 2024, 98, 111355. [Google Scholar] [CrossRef]
Bouyakhsaine, K.; Brakez, A.; Draou, M. Prediction of residential building occupancy using machine learning with integrated sensor and survey data: Insights from a living lab in Morocco. Energy Build. 2024, 319, 114519. [Google Scholar] [CrossRef]
Karjou, P.F.; Saryazdi, S.K.; Stoffel, P.; Müller, D. Practical design and implementation of IoT-based occupancy monitoring systems for office buildings: A case study. Energy Build. 2024, 323, 114852. [Google Scholar] [CrossRef]
Kim, J.; Choi, A.; Moon, H.J.; Moon, J.W.; Sung, M. Occupancy estimation using IoT sensors and machine learning: Incorporating ventilation system operating state and preprocessed differential pressure data. Build. Environ. 2023, 246, 110979. [Google Scholar] [CrossRef]
Chen, Z.; Masood, M.K.; Soh, Y.C. A fusion framework for occupancy estimation in office buildings based on environmental sensor data. Energy Build. 2016, 133, 790–798. [Google Scholar] [CrossRef]
Ma, T.-Y.; Faye, S. Multistep electric vehicle charging station occupancy rate prediction using hybrid LSTM neural networks. Energy 2022, 244, 123217. [Google Scholar] [CrossRef]
Kitsopoulou, A.; Bellos, E.; Tzivanidis, C. An up-to-date review of passive building envelope technologies for sustainable design. Energies 2024, 17, 4039. [Google Scholar] [CrossRef]
Han, F.; Liu, B.; Dermentzis, G. Operational performance evaluation of a Passive house residential building in northern China based on long-term monitoring in winter. Indoor Built Environ. 2023, 32, 1464–1486. [Google Scholar] [CrossRef]
Ke, Y.-P.; Mumma, S.A. Using carbon dioxide measurements to determine occupancy for ventilation controls. ASHRAE Trans. 1997, 103, 365–374. Available online: https://www.aivc.org/sites/default/files/airbase_10515.pdf (accessed on 20 November 2025).
Caballero-Peña, J.; Osma-Pinto, G.; Rey, J.M.; Nagarsheth, S.; Henao, N.; Agbossou, K. Analysis of the building occupancy estimation and prediction process: A systematic review. Energy Build. 2024, 313, 114230. [Google Scholar] [CrossRef]
Mohammadabadi, A.; Rahnama, S.; Afshari, A. Indoor Occupancy Detection Based on Environmental Data Using CNN-XGBoost Model: Experimental Validation in a Residential Building. Sustainability 2022, 14, 14644. [Google Scholar] [CrossRef]
Wang, C.; Jiang, J.; Roth, T.; Nguyen, C.; Liu, Y.; Lee, H. Integrated sensor data processing for occupancy detection in residential buildings. Energy Build. 2021, 237, 110810. [Google Scholar] [CrossRef]
Huang, Q.; Syndicus, M.; Frisch, J.; van Treeck, C. Spatial features of CO₂ for occupancy detection in a naturally ventilated school building. Indoor Environ. 2024, 1, 100018. [Google Scholar] [CrossRef]
Banihashemi, F.; Weber, M.; Deghim, F.; Zong, C.; Lang, W. Occupancy modeling on non-intrusive indoor environmental data through machine learning. Build. Environ. 2024, 254, 111382. [Google Scholar] [CrossRef]
Singh, A.; Kansal, V.; Gaur, M.; Pandey, M.S.; Khanna, A.; Gupta, D.; Kansal, V.; Fortino, G.; Hassanien, A.E. Predicting Smart Building Occupancy Using Machine Learning. In Proceedings of the Third Doctoral Symposium on Computational Intelligence, DoSCI 2022; Lecture Notes in Networks and Systems; Springer: Singapore, 2023; Volume 479, pp. 145–151.
Zeleny, O.; Fryza, T.; Bravenec, T.; Azizi, S. Detection of Room Occupancy in Smart Buildings. Radioengineering 2024, 33, 432–441.
Huang, Q.; Rodriguez, K.; Whetstone, N.; Habel, S. Rapid Internet of Things (IoT) prototype for accurate people counting towards energy efficient buildings. J. Inf. Technol. Constr. 2019, 24, 1–13.
Ahmad, J.; Larijani, H.; Emmanuel, R.; Mannion, M.; Javed, A. Occupancy detection in non-residential buildings–A survey and novel privacy preserved occupancy monitoring solution. Appl. Comput. Inform. 2021, 17, 279–295.
Han, L.; Feng, H.; Liu, G.; Zhang, A.; Han, T. A real-time intelligent monitoring method for indoor evacuee distribution based on deep learning and spatial division. J. Build. Eng. 2024, 92, 109764.
Liang, X.; Wang, H. Hybrid transformer-RNN architecture for household occupancy detection using low-resolution smart meter data. arXiv 2023, arXiv:2308.14114.
Luo, Z.; Qi, R.; Li, Q.; Zheng, J.; Shao, S. ABODE-Net: An attention-based deep learning model for non-intrusive building occupancy detection using smart meter data. In Proceedings of the 7th International Conference on Smart Computing and Communication (SmartCom 2022), New York, NY, USA, 18–20 November 2022.
Kim, J. LSTM-based space occupancy rate prediction towards efficient building energy management. arXiv 2020, arXiv:2012.08114.
Qolomany, B.; Al-Fuqaha, A.; Benhaddou, D.; Gupta, A. Role of deep LSTM neural networks and WiFi networks in support of occupancy rate prediction in smart buildings. arXiv 2017, arXiv:1711.10355.
Rafi, M.R.; Hu, F.; Li, S.; Song, A.; Zhang, X.; O’Neill, Z. Deep Weighted Fusion Learning (DWFL)-based multi-sensor fusion model for accurate building occupancy detection. Energy AI 2024, 17, 100379.
Li, D.; Fan, Z.; Zhang, S.; Pu, H.; Chu, H.; Wu, Z. Practice Exploration of the First Prefabricated Steel-Structure Passive Building in China: Teaching and Experimental Complex Building of Shandong Jianzhu University. Donggan: Eco-City Green Build. 2017, 48–57. (In Chinese)
JGJ/T 347-2014; Standard of Test Methods for Thermal Environment of Buildings. Ministry of Housing and Urban-Rural Development of the People’s Republic of China (MOHURD), Industry standard of China: Beijing, China, 2014.
JGJ/T 461-2019; Design Standard for Controlling Indoor Air Quality of Public Buildings. Ministry of Housing and Urban-Rural Development of the People’s Republic of China (MOHURD), Industry standard of China: Beijing, China, 2019.
Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
Donahue, J.; Jia, Y.; Vinyals, O.; Hoffman, J.; Zhang, N.; Tzeng, E.; Darrell, T. DeCAF: A deep convolutional activation feature for generic visual recognition. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 21–26 June 2014; Available online: https://proceedings.mlr.press/v32/donahue14.html (accessed on 15 December 2025).
Sagi, O.; Rokach, L. Ensemble learning: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1249.

Figure 1. Aerial View of the Passive Ultra-Low-Energy Teaching Building.

Figure 2. Building envelope construction details: (a) Roof construction detail; (b) External wall construction detail.

Figure 3. Building floor plans: (a) First Floor Plan; (b) Typical Floor Plan. A–B indicate section lines and the dashed box marks the test room.

Figure 4. Longitudinal and transverse sections passing through Room 605: (a) Section A–A; (b) Section B–B.

Figure 5. Sensor Distribution. (a) Thermal environment sensor distribution; (b) air quality sensor distribution; (c) occupancy status sensor distribution; (d) energy consumption sensor distribution.

Figure 6. Occupancy Rate Modeling Workflow for Low-Energy Buildings.

Figure 7. CNN–ResNet–RF architecture and module flow.

Figure 8. CNN-based Feature Extraction Architecture. Numbers denote the key parameters of each layer (kernel size, filters, stride, pooling window size, and fully connected neurons).

Figure 9. Residual Module Structure.

Figure 10. Schematic Diagram of the Random Forest Prediction Process.

Figure 11. Development workflow of occupancy rate prediction models. Blue blocks indicate training samples, and beige blocks indicate evaluation samples.

Figure 12. Correlation Heatmap Between Environmental Features and Occupancy.

Figure 13. Boxplots of Key Continuous Environmental Variables Grouped by Occupancy Levels. The occupancy rate is divided into three intervals (0.00–0.30, 0.30–0.53, and 0.53–1.00). Circles represent outliers.

Figure 14. Feature-Wise Correlation Coefficients with Occupancy Level.

Figure 15. Random Forest-Based Feature Importance Ranking.

Figure 16. CNN train/test regression.

Figure 17. CNN-RF train/test regression.

Figure 18. RF train/test regression.

Figure 19. CNN-ResNet-RF train/test regression.

Figure 20. Test Set Prediction Performance of the CNN-ResNet-RF Model: (a) distribution of prediction errors; (b) zoomed comparison between predicted and observed occupancy rates.

Figure 21. Test Set Prediction Performance of the RF Model: (a) distribution of prediction errors; (b) zoomed comparison between predicted and observed occupancy rates.

Figure 22. Test Set Prediction Performance of the CNN-RF Model: (a) distribution of prediction errors; (b) zoomed comparison between predicted and observed occupancy rates.

Figure 23. Test Set Prediction Performance of the CNN Model: (a) distribution of prediction errors; (b) zoomed comparison between predicted and observed occupancy rates.

Figure 24. Error distribution analysis of four models: (a) histograms; (b) smoothed density curves.

Figure 25. Comparison of Regression Performance Among Four Models.

Figure 26. Comparison of MAE Performance Across Different Prediction Models.

Figure 27. Prediction performance of compressed CNN–ResNet–RF models: (a) scatter plot; (b) time-series comparison.

Table 1. Comparative Review of Occupancy Rate Prediction Studies and Their Applicability to Passive Ultra-Low-Energy Buildings.

References	Building Type	Input Data	Method	Reported Performance Metrics
Sun (2024) [8]	High-density classrooms	Video, audio, sensors	DMFF-Transformer	Accuracy: 97%; Recall: 98.23%; Precision: 96.24%
Mohammadabadi et al. (2022) [19]	Mechanically ventilated houses	CO₂, temperature, and humidity	CNN + XGBoost	F1 score (primary, imbalanced data): up to 0.986; MAE (misclassification rate): 0.011–0.073
Wang et al. (2021) [20]	Residential building	Temperature, motion	Two-layer ML-based occupancy detection	Accuracy ≈ 95%; F1-score ≈ 95%
Huang et al. (2024) [21]	Naturally ventilated school classrooms	Multi-point CO₂ sensors	SVM with spatial CO₂ features	Accuracy ≈ 80–89%; F1-score reported; RMSE for quantity
Huang et al. (2019) [25]	Auditorium/Thermal zone	Active infrared interruption events	Active infrared people counting IoT prototype	Accuracy: 97%; Electricity reduction: 12%
Liang & Wang (2023) [28]	Housing	Smart meter	Transformer-RNN	Accuracy: 0.9166; Precision: 0.9323; Recall: 0.9623; F1-Score: 0.9470; ROC-AUC: 0.9331
Luo et al. (2022) [29]	Multifamily housing	Power data	ABODE-Net	Accuracy: 0.8649–0.9176; F1 score: 0.8198–0.8643
Kim (2020) [30]	Campus rooms	Historical use, climate	LSTM	Average precision: 0.993–0.999
Qolomany et al. (2017) [31]	Commercial/residential	Wi-Fi	LSTM vs. ARIMA	RMSE reduction: 80.9–93.4%
Rafi et al. (2024) [32]	Laboratories/classrooms	Camera, carbon dioxide, passive infrared	DWFL Fusion	Accuracy: 94%
This study	Passive ultra-low-energy consumption teaching building	CO₂, temperature and humidity, energy consumption, behavior logs, and meteorology	CNN-ResNet-RF	R² = 0.96; RMSE = 0.054; RPD > 5

Table 2. Features of Input Data.

Variable	Features	Abbreviation	Type	Unit
Independent variable	Time of the day	H	Numerical	1, 2, 3, …
	Outdoor temperature	T_out	Numerical	°C
	Outdoor humidity	RH_out	Numerical	%
	Wind velocity	V_out	Numerical	m/s
	Rain/no rain	R	Categorical	Rain = 1; no rain = 0
	Indoor temperature	T_in	Numerical	°C
	Indoor humidity	RH_in	Numerical	%
	Indoor CO₂ concentration	C_in	Numerical	ppm
	Door/Window status	W_in	Categorical	open = 0; closed = 1
	Energy consumption	EC_plug	Numerical	W
Dependent variable	Occupancy	O	Numerical	0–1
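
As an illustration of how one record with the variables in Table 2 might be assembled into a model input, the following Python sketch defines a container for a single observation. The class name `Observation`, the `FEATURE_ORDER` tuple, and the validation rules are assumptions made for this example and are not taken from the study's code; only the abbreviations, units, and categorical encodings follow Table 2.

```python
from dataclasses import dataclass
from typing import List

# Column order of the ten independent variables, following Table 2.
FEATURE_ORDER = (
    "H", "T_out", "RH_out", "V_out", "R",
    "T_in", "RH_in", "C_in", "W_in", "EC_plug",
)


@dataclass(frozen=True)
class Observation:
    """One time step of input data (hypothetical container, not the authors' code)."""
    H: float        # time of day
    T_out: float    # outdoor temperature, degrees C
    RH_out: float   # outdoor humidity, %
    V_out: float    # wind velocity, m/s
    R: int          # rain = 1, no rain = 0
    T_in: float     # indoor temperature, degrees C
    RH_in: float    # indoor humidity, %
    C_in: float     # indoor CO2 concentration, ppm
    W_in: int       # door/window status: open = 0, closed = 1
    EC_plug: float  # energy consumption, W
    O: float        # occupancy rate (dependent variable), 0 to 1

    def __post_init__(self) -> None:
        # Assumed sanity checks based on the encodings listed in Table 2.
        if self.R not in (0, 1):
            raise ValueError("R must be 0 (no rain) or 1 (rain)")
        if self.W_in not in (0, 1):
            raise ValueError("W_in must be 0 (open) or 1 (closed)")
        if not 0.0 <= self.O <= 1.0:
            raise ValueError("O must lie in [0, 1]")

    def features(self) -> List[float]:
        """Return the independent variables as floats in FEATURE_ORDER."""
        return [float(getattr(self, name)) for name in FEATURE_ORDER]
```

Calling `features()` on an instance returns a ten-element list in the column order of Table 2, with the target `O` kept separate. Any values passed in when trying this out would be placeholders, not measurements from the study.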

Table 3. Technical Specifications of Installed Sensors.

Parameter	Sensor	Sensitivity and Accuracy
Temperature	Testo 174H Mini Temperature and Humidity Logger (Testo SE & Co. KGaA, Titisee-Neustadt, Germany)	Measurement range: −20 to +70 °C; Accuracy: ±0.5 °C
Humidity	Testo 174H Mini Temperature and Humidity Logger (Testo SE & Co. KGaA, Titisee-Neustadt, Germany)	Measurement range: 0 to 100% RH; Accuracy: ±3% RH
CO₂	HT-2000 Digital CO₂ Detector (HTI Instruments Co., Ltd., Shenzhen, Guangdong, China)	Measurement range: 0–9999 ppm; Accuracy: ±50 ppm or ±5% of reading (0–5000 ppm)
Energy consumption	TP-Link HS110 Smart Plug Power Monitor (TP-Link Technologies Co., Ltd., Shenzhen, China)	Measurement range: 0.02–16 A (max power 3680 W); Accuracy: ±2%; Supports data export at minute-to-hour intervals; Wi-Fi enabled and supports remote data acquisition
Occupancy	HC-SR501 PIR Motion Sensor (Commercial supplier, Shenzhen, China)	Detection radius: 4–5 m within a 110–120° fan-shaped range

Table 4. Comparative Regression Performance of the Original and Compressed CNN-ResNet-RF Models.

Model	R²	RMSE	MSE	RPD
CNN-ResNet-RF	0.96399	0.053617	0.0028747	5.2962
CNN-ResNet-RF-25	0.9748	0.0449	0.00201	6.327
CNN-ResNet-RF-50	0.9613	0.0556	0.00309	5.110
CNN-ResNet-RF-75	0.9596	0.0568	0.00322	5.003
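
The four metrics in Table 4 are linked: MSE is the square of RMSE, and RPD is commonly computed as the standard deviation of the observed values divided by RMSE. The Python sketch below computes the metrics under those standard definitions and checks that the MSE and RMSE columns of Table 4 agree to within rounding. The function names, the use of the sample (n − 1) standard deviation for RPD, and the tolerance are assumptions for illustration, not details confirmed by the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MSE and RPD for paired observations and predictions."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_true = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    sst = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if sst == 0:
        raise ValueError("observed values are constant; R2 and RPD are undefined")
    mse = sse / n
    rmse = math.sqrt(mse)
    sd = math.sqrt(sst / (n - 1))  # assumption: sample standard deviation
    return {
        "R2": 1.0 - sse / sst,
        "RMSE": rmse,
        "MSE": mse,
        "RPD": sd / rmse if rmse > 0 else math.inf,
    }


# (RMSE, MSE) pairs copied from Table 4.
TABLE_4 = {
    "CNN-ResNet-RF": (0.053617, 0.0028747),
    "CNN-ResNet-RF-25": (0.0449, 0.00201),
    "CNN-ResNet-RF-50": (0.0556, 0.00309),
    "CNN-ResNet-RF-75": (0.0568, 0.00322),
}


def mse_matches_rmse(rmse, mse, tol=1e-5):
    """True when the reported MSE equals RMSE squared up to rounding (tol is assumed)."""
    return abs(rmse ** 2 - mse) <= tol


if __name__ == "__main__":
    for model, (rmse, mse) in TABLE_4.items():
        print(model, mse_matches_rmse(rmse, mse))
```

With the assumed tolerance of 1e-5, every row of Table 4 satisfies MSE ≈ RMSE², so the two columns are mutually consistent.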
