1. Introduction
As global energy demand continues to increase [1], there is a need for efficient and sustainable solutions and technologies. This matter is particularly important for buildings. The Global Status Report for Buildings and Construction 2024/2025 states that in 2023 the buildings and construction sector consumed around 32% of global energy [2]. In the European Union, buildings account for approximately 42% of total energy consumption [3].
To address the high energy consumption in buildings, the concept of smart buildings has emerged as a transformative solution. Smart buildings integrate advanced technologies such as Building Energy Management and Control Systems (BEMCS), Internet of Things (IoT) sensors, and artificial intelligence to monitor and optimize energy use in real time [4]. These systems enable buildings to autonomously adjust heating, cooling, lighting, and ventilation based on environmental conditions and occupant behavior, significantly reducing energy waste.
Occupancy prediction plays a key role in the development of smart building management systems, enabling significant reductions in energy consumption while maintaining occupant comfort [4,5]. In the European Union, the revised Energy Performance of Buildings Directive (Directive (EU) 2024/1275 [6]) emphasizes the adoption of smart technologies to achieve a zero-emission building stock by 2050, with occupancy detection highlighted as a key component for energy efficiency. Studies report that occupancy-based control strategies can reduce energy consumption by up to 55% for HVAC systems and by up to 78.5% for lighting [7]. In addition to energy savings, occupancy information can be used to improve occupant comfort, strengthen building security, and support space utilization analysis [8].
A large number of studies on occupancy detection and estimation have been conducted in recent years, and various methods have been proposed to address this problem, including those based on cameras, environmental sensors, PIR sensors, smart meters, Wi-Fi, Bluetooth, UWB radars, and microphones [5,7].
Among the available techniques, camera-based approaches generally offer the highest accuracy. With cameras and computer vision algorithms, it is possible not only to detect the presence of people but also to count them precisely and determine their activities in real time. However, camera-based systems are expensive, require large storage capacity, and have high processing requirements, which makes them impractical when occupancy detection is needed across many small rooms. Installing cameras can also raise privacy concerns, as occupants may feel uncomfortable being monitored [5,9]. Additionally, the performance of camera-based occupancy prediction has been reported to suffer from sudden lighting changes and occlusion [10,11,12,13]. Due to these limitations, non-intrusive and occlusion-independent approaches based on environmental sensors have gained popularity.
The principle underlying environmental sensor-based occupancy detection is that human presence influences local indoor environmental conditions, so information about occupancy can be deduced by analyzing environmental sensor data. These sensors typically monitor parameters such as temperature, relative humidity, air pressure, CO2 concentration, VOC levels, and light intensity [14]. There are two main reasons why environmental sensors are widely studied for occupancy detection. First, such methods help protect occupant privacy. Second, environmental sensors are generally inexpensive and easy to install. Additionally, some of these sensors may already be present in buildings for purposes such as indoor environment monitoring or appliance control (e.g., HVAC). In such cases, occupancy detection based on environmental sensors can rely on existing building infrastructure, making the approach both more economical and more sustainable.
The most widely studied environmental sensor for occupancy detection is the CO2 sensor [7]. This is largely because human respiration is typically the predominant source of indoor CO2 in residential and commercial environments [15]. CO2 sensors can be used for both occupancy detection [16,17] and occupancy estimation [18,19,20,21]; however, estimation is a more challenging task and typically shows lower accuracy than binary detection [22]. It has been reported that the precision of occupancy estimation decreases as the number of occupants increases [23]. The literature identifies two commonly used approaches for deriving occupancy information from CO2 sensor measurements: physical modeling and statistical (or data-driven) modeling [19].
Physical modeling approaches rely on solving mass balance equations. In addition to indoor CO2 measurements, the mass balance equation requires knowledge of parameters such as room volume, ventilation rate, per-person CO2 generation, and outdoor CO2 concentration [24]. Parameters such as the occupant CO2 generation rate, ventilation rate, and outdoor CO2 concentration are often assumed rather than measured. In reality, these parameters can vary due to factors that are difficult to predict or measure. In addition, the mass balance equation relies on the assumption that the air in the room is perfectly mixed [22]. All of these assumptions result in a simplified representation of real indoor conditions, which limits the accuracy of the estimated occupancy. Wolf et al. [25] propose addressing these limitations by combining the mass balance equation with statistical parameter estimation.
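To make the physical approach concrete, the following minimal sketch (not used in this study) solves the standard single-zone, perfectly mixed CO2 mass balance, V·dC/dt = Q·(C_out − C) + n·G, for the occupant count n. The function name, the outdoor level of 420 ppm, and the per-person generation rate of 0.0187 m³/h (≈0.0052 L/s, a commonly assumed value for sedentary adults) are illustrative assumptions.

```python
import numpy as np


def occupants_from_mass_balance(c_ppm, dcdt_ppm_per_h, volume_m3, vent_m3_per_h,
                                c_out_ppm=420.0, gen_m3_per_h_per_person=0.0187):
    """Solve V*dC/dt = Q*(C_out - C) + n*G for the occupant count n.

    Assumes a single, perfectly mixed zone. CO2 values are converted from ppm
    to volume fractions so that every term is expressed in m^3/h.
    """
    c = np.asarray(c_ppm, dtype=float) * 1e-6
    dcdt = np.asarray(dcdt_ppm_per_h, dtype=float) * 1e-6
    c_out = c_out_ppm * 1e-6
    # Generation term = accumulation minus net ventilation inflow.
    return (volume_m3 * dcdt - vent_m3_per_h * (c_out - c)) / gen_m3_per_h_per_person
```

Every parameter on the right-hand side (V, Q, G, C_out) must be known or assumed, which is exactly why errors in any of them propagate directly into the estimated n.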
Statistical modeling is primarily based on the application of machine learning algorithms. Commonly used algorithms include Support Vector Machines (SVM), Artificial Neural Networks (ANN), and K-Nearest Neighbors (KNN) [7]. Statistical modeling can achieve higher occupancy prediction accuracy than physical modeling [19] but usually requires a training period and the collection of ground-truth data. Moreover, studies have shown that statistical models developed for one location often do not generalize well to other locations [26,27].
All of this shows that although CO2 sensor-based occupancy prediction is very promising, challenges remain in developing easy-to-use algorithms. Another common issue reported in the literature is the slow response time of CO2 sensors, which can introduce detection delays and cause short-term occupancy changes to be missed [28,29,30]. However, the most significant problem of CO2 sensor-based occupancy detection, especially in naturally ventilated buildings, is the occurrence of window and door opening events. Such events can easily alter the CO2 dynamics in a room, causing predictive algorithms to report false departures or false occupancy. Primarily for these reasons, the reported prediction accuracy for naturally ventilated buildings is lower than that for mechanically ventilated ones [22,31]. In some studies [32], such events are intentionally excluded during testing, so the findings do not fully capture the real-world performance of the evaluated methods.
Due to the inherent limitations of CO2 sensors, many researchers have explored combining CO2 sensors with other sensors to enhance the accuracy and robustness of occupancy prediction models [29,33,34,35,36]. While such approaches demonstrate good performance, some studies employ a very large number of sensors, making these systems economically unviable. An occupancy prediction system should therefore achieve good performance with the smallest possible number of sensors.
Considering the limitations of CO2 sensors, a good approach is to combine them with fast-response sensors that are independent of indoor airflow, such as PIR or sound sensors. Several studies have already investigated the joint use of CO2 and PIR sensors for occupancy detection [37,38,39,40].
Jiang et al. [37] tested a combination of CO2 and PIR sensors in a mechanically ventilated office room. The PIR sensor was placed near the door to detect occupant arrivals and departures. Their study showed that adding a PIR sensor can completely eliminate the detection delay for arrivals and reduce the detection delay for departures. A limitation of this study is that the results were obtained from only one day.
Stephen Gage [38] investigated the combination of CO2 and PIR sensors in a residential house. Gage tested the CO2–PIR sensor combination under four scenarios: (a) doors and windows closed; (b) doors open and windows closed; (c) doors closed and windows open; and (d) doors and windows open. The experiment showed that when windows were open, occupancy detection based on CO2 measurements became unreliable, and in such cases the system had to rely primarily on the PIR sensor. A limitation of this study is that it did not apply specific occupancy detection algorithms, so no quantitative comparison between the scenarios was provided.
In general, combining CO2 and PIR sensors can reduce occupancy detection delay and improve performance in scenarios where windows and doors are open. However, PIR sensors also have several limitations. A commonly reported issue is their inability to detect stationary occupants or very small movements [41,42], which is particularly problematic in office environments where occupants are often seated and relatively still. In addition, PIR sensors have a limited detection range, making sensor placement and the required number of sensors very important [40,42]. Because PIR sensors detect infrared radiation, their sensitivity also depends on indoor temperature: in hot environments PIR sensors tend to be less sensitive, whereas in colder environments they may become oversensitive and react to the movement of warm air [43,44].
Regarding CO2 and sound sensors, few studies have examined the performance of this combination on its own. This can be explained by the fact that, unlike PIR sensors, sound sensors are rarely pre-installed in buildings, which makes their application less common. On the other hand, simple microphones are inexpensive; for example, the MEMS microphone ICS-43434 used in this work costs only around EUR 3. Moreover, several multi-sensor studies [26,34] have shown that sound levels in a room are strongly correlated with the presence of people, highlighting the value of audio features for occupancy detection systems.
Some studies even achieve good results using only audio features for occupancy prediction [45,46,47,48,49,50]. The main difference between these works and the previously mentioned multi-sensor studies is that audio-only approaches use more advanced audio features that describe the acoustic environment more accurately. This shows that, unlike PIR-based methods, audio-based detection is flexible: it can be limited to simple processing, such as measuring sound levels in a room [23,26,33,34], or it can be sophisticated enough to distinguish between different speakers [50].
Audio-based methods also have limitations. The main one is that audio-based occupancy prediction can be misled by non-human acoustic events, such as outdoor construction noise or background audio from media devices, which may result in false presence detections. Such techniques are therefore more suitable for environments where non-human sounds are rare; otherwise, these sounds must be compensated for through advanced algorithms or sensor fusion.
Because the combination of CO2 and audio for occupancy detection is promising but has rarely been studied on its own, this study evaluates the performance of this sensor combination in a naturally ventilated building. The aims of this work are as follows:
Test the performance of the model with different features.
Examine the impact of sensor placement on model performance.
Evaluate audio feature contribution to model generalizability.
3. Occupancy Detection Model Development
3.1. CO2 Data Processing and Feature Extraction
CO2 data are usually noisy due to sensor limitations and natural CO2 fluctuations within a room, which is why filtering is required. Many studies have demonstrated the importance of CO2 data filtering for improving model performance [18,37]. All subsequent data processing and filtering operations were conducted in discrete 24 h intervals.
Because a simple moving average fails to remove the medium- and high-frequency noise present in the CO2 sensor data, a more capable filtering method is needed to obtain the underlying trend. In this work, we first applied a Savitzky–Golay filter to the raw CO2 data in Python (v3.12) using the SciPy library. The filter was configured as a 2nd-order polynomial with a window length of 350 samples, which helped remove high- and medium-frequency noise. Then, to further eliminate random noise, a simple moving average was applied on top of the previously filtered data using the rolling function from the Pandas library. The moving average was set as centered to avoid phase lag and used a 60-sample (≈30 min) window. The centered moving average was used for offline preprocessing and evaluation; real-time deployment would require causal filtering, which is outside the scope of this work.
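A minimal sketch of this two-stage smoothing, assuming one day of CO2 readings in a pandas Series sampled roughly every 30 s (so 60 samples ≈ 30 min). The function name and the `min_periods=1` edge handling are our assumptions. The text reports a 350-sample Savitzky–Golay window; the sketch defaults to 351 because older SciPy releases accept only odd window lengths.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter


def smooth_co2(co2: pd.Series, sg_window: int = 351, sg_order: int = 2,
               ma_window: int = 60) -> pd.Series:
    """Two-stage offline smoothing of one day of CO2 readings (ppm)."""
    # Stage 1: Savitzky-Golay filter with a 2nd-order polynomial removes
    # medium- and high-frequency noise while preserving the trend shape.
    sg = savgol_filter(co2.to_numpy(dtype=float), window_length=sg_window,
                       polyorder=sg_order)
    # Stage 2: centered moving average (no phase lag, hence offline only).
    # min_periods=1 keeps values near the day boundaries (assumption).
    return (pd.Series(sg, index=co2.index)
            .rolling(ma_window, center=True, min_periods=1)
            .mean())
```

Because `center=True` uses future samples, this sketch matches only the offline evaluation setting; a causal variant would drop the centering and accept some phase lag.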
From the CO2 data, we also computed the first and second derivatives, as these features have often been used in previous works and have proven to be effective for occupancy detection models.
The first derivative was computed from the filtered CO2 data using the gradient function from the NumPy library. To mitigate artifacts caused by irregular sampling intervals, a uniform time step was used, calculated as the mean of all sampling intervals and divided by 3600 s (i.e., converted to hours) so that the derivative is expressed in ppm/h. Additional filtering was applied to the first derivative to reduce signal noise; the best results were obtained by applying two consecutive moving averages with a 40-sample window (≈20 min).
The second derivative was calculated in the same manner, by applying the NumPy gradient function to the filtered first derivative. No additional filtering was required, as the prior filtering sufficiently reduced noise in the sensor data.
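The derivative steps can be sketched as follows, assuming the smoothed CO2 Series has a DatetimeIndex. The function name and the choice of centered moving averages for the two 40-sample passes are assumptions; the text does not state whether these passes were centered.

```python
import numpy as np
import pandas as pd


def co2_derivatives(co2_smooth: pd.Series, ma_window: int = 40):
    """First (ppm/h) and second (ppm/h^2) derivatives of smoothed CO2."""
    # Uniform time step: mean of all sampling intervals, converted to hours.
    dt_h = co2_smooth.index.to_series().diff().dt.total_seconds().mean() / 3600.0
    d1 = pd.Series(np.gradient(co2_smooth.to_numpy(dtype=float), dt_h),
                   index=co2_smooth.index)
    # Two consecutive 40-sample moving averages (centering is an assumption).
    for _ in range(2):
        d1 = d1.rolling(ma_window, center=True, min_periods=1).mean()
    # Second derivative from the filtered first derivative, no extra filtering.
    d2 = pd.Series(np.gradient(d1.to_numpy(), dt_h), index=co2_smooth.index)
    return d1, d2
```

For a linear ramp of 100 ppm/h, `d1` is 100 everywhere and `d2` is close to zero, which is a quick sanity check for the time-step scaling.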
As a result, we obtained the following four CO2 features:
All of these CO2 features were fed to our occupancy detection model. Figure 3 shows a graphical representation of all CO2 features.
3.2. Audio Data Processing and Feature Extraction
Most of the audio processing was performed on the microcontroller. Pulse-code modulated (PCM) data from the microphone was transferred to the microcontroller’s memory using direct memory access (DMA), which enabled continuous data reading with low power consumption. From the raw samples, the sound dBFS (decibels relative to full scale) value was calculated every 100 milliseconds. Based on these measurements, the average sound dBFS value over a 30 s period was computed. In addition, three sound thresholds were defined, and the number of times the sound level exceeded each threshold during the 30 s interval was counted. The purpose of counting sound events was to preserve information about short-duration sounds without requiring frequent data transmission. The thresholds were chosen to detect quiet, medium, and loud sounds and to evaluate which threshold correlates best with occupancy. The first threshold was set to −67 dBFS, the second to −60 dBFS, and the third to −50 dBFS. All three thresholds were empirically determined in Room 1. We first measured background noise as well as noise generated by common activity events such as walking, talking, typing, and door closing. Based on these measurements, we selected three thresholds capable of capturing events with different loudness levels.
As a result, every 30 s, the following four audio features were sent to the database:
The average dBFS value;
The number of times the sound exceeded the first threshold (T1);
The number of times it exceeded the second threshold (T2);
The number of times it exceeded the third threshold (T3).
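The per-window feature computation described above runs on the microcontroller; the following Python sketch only mirrors its logic for illustration. It assumes 24-bit signed PCM samples (full scale 2^23), a 16 kHz sampling rate, dBFS defined as the RMS level of each 100 ms block relative to full scale, and an arithmetic mean of the block levels in dB. All of these, along with the function names, are assumptions rather than the firmware's actual implementation.

```python
import numpy as np

FULL_SCALE = 2 ** 23                      # assumed 24-bit signed PCM samples
THRESHOLDS_DBFS = (-67.0, -60.0, -50.0)   # T1, T2, T3 from the text


def block_dbfs(block: np.ndarray) -> float:
    """RMS level of one 100 ms block of PCM samples, in dBFS."""
    rms = np.sqrt(np.mean(block.astype(np.float64) ** 2))
    return 20.0 * np.log10(max(rms, 1e-12) / FULL_SCALE)  # floor avoids log(0)


def audio_features(pcm: np.ndarray, fs: int = 16_000, block_s: float = 0.1):
    """Features for one 30 s window: mean dBFS and T1/T2/T3 exceedance counts."""
    n = int(fs * block_s)                             # samples per 100 ms block
    blocks = pcm[: len(pcm) // n * n].reshape(-1, n)  # 300 blocks for 30 s
    levels = np.array([block_dbfs(b) for b in blocks])
    counts = [int(np.sum(levels > t)) for t in THRESHOLDS_DBFS]
    return {"avg_dbfs": float(levels.mean()),
            "t1": counts[0], "t2": counts[1], "t3": counts[2]}
```

A steady tone at −55 dBFS, for example, yields 300 exceedances of T1 and T2 and none of T3, while silence yields zero counts for all three thresholds.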
None of these audio features contains personal information, so they do not compromise occupant privacy.
The raw sound features are inherently very noisy, and our model did not perform well on them, so additional processing was required. Because the second- and third-threshold counters generally output zero values when the room is unoccupied and random spikes when someone is present, good results were achieved by applying the permutation entropy algorithm to these features. This algorithm quantifies the randomness of time-series data: the more random the data, the higher the resulting entropy value. For the permutation entropy calculation, we used the AntroPy Python library, which provides a time-efficient implementation of this algorithm. Because permutation entropy measures only the randomness of a time series, independently of its amplitude, it had almost no effect on the noisy average dBFS data and on the first-threshold data. For these features, we instead applied a moving-maximum filter with a 60-sample window. The results of these processing steps on the audio data are shown in Figure 4, Figure 5, Figure 6 and Figure 7.
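The study used AntroPy's `perm_entropy`. To keep the following sketch self-contained, it instead implements normalized permutation entropy directly in NumPy. The embedding order (3), delay (1), 20-sample rolling entropy window, and column names (`avg_dbfs`, `t1`, `t2`, `t3`) are assumptions, since the text does not report them.

```python
import math

import numpy as np
import pandas as pd


def permutation_entropy(x, order=3, delay=1):
    """Normalized permutation entropy: 0 = fully regular, 1 = maximally random."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    if n <= 0:
        return np.nan
    # Delay embedding: row i holds (x[i], x[i + delay], ..., x[i + (order-1)*delay]).
    emb = np.column_stack([x[k * delay: k * delay + n] for k in range(order)])
    # The ordinal pattern of each row is the permutation that sorts it
    # (a stable sort, so runs of tied values such as zeros give one pattern).
    patterns = np.argsort(emb, axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)) / math.log2(math.factorial(order)))


def process_audio(df, entropy_window=20, max_window=60):
    """Rolling entropy for the T2/T3 counters, moving maximum for avg dBFS and T1."""
    out = pd.DataFrame(index=df.index)
    for col in ("t2", "t3"):
        out[f"{col}_pe"] = df[col].rolling(entropy_window).apply(
            permutation_entropy, raw=True)
    for col in ("avg_dbfs", "t1"):
        out[f"{col}_max"] = df[col].rolling(max_window, min_periods=1).max()
    return out
```

An all-zero counter (empty room) yields an entropy of exactly 0, whereas irregular spikes yield values near 1, regardless of spike height; this amplitude independence is the property the text relies on.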
3.3. Model Training
In this study, we employ a Random Forest classifier as the primary predictive model. Random Forest is widely used in occupancy and indoor-environment prediction tasks due to its robustness to noise, its ability to model nonlinear relationships, and its comparatively low computational cost. Its adoption in related works further supports its suitability for the present problem [58,59,60].
While recurrent neural networks such as LSTM or GRU are well suited to the classification and regression of time-series data, their advantages are less pronounced in the present setting. First, the CO2 changes relevant to occupancy occur over short periods; we capture them with window-based features, and lag features could partially compensate for the lack of explicit temporal memory. Second, integrating various audio features with CO2 features would require additional model complexity, whereas Random Forests naturally accommodate heterogeneous feature types. Finally, the lower computational cost of tree-based models makes them more suitable for deployment in building monitoring systems and light enough to allow repeated experimentation.
Random Forest constructs a set of decision trees, each trained on a bootstrap sample of the data and considering a randomly selected subset of features at each split. During inference, each tree provides an independent prediction, and the final output is obtained by voting across the trees (scikit-learn's implementation averages the trees' class-probability estimates rather than counting hard votes). This ensemble procedure reduces overfitting and stabilizes performance, particularly in heterogeneous sensor environments such as indoor spaces with variable occupancy patterns.
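The aggregation step can be illustrated with scikit-learn on synthetic stand-in data (the features and labels below are hypothetical, not sensor data). The sketch shows that `RandomForestClassifier.predict_proba` equals the mean of the individual trees' probability estimates, which is soft voting, and contrasts it with the hard-vote share of trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))                  # stand-in for sensor features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in occupancy labels

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Every fitted tree predicts independently.
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
# Hard voting: fraction of trees voting "occupied" (class 1).
hard_share = (per_tree[:, :, 1] > 0.5).mean(axis=0)
# Soft voting: mean class probability, which is what scikit-learn uses.
soft = per_tree.mean(axis=0)
```

With fully grown trees the leaves are usually pure, so hard and soft voting rarely disagree; the distinction matters more for depth-limited trees.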
For data preparation, we first load the occupancy data for each day of the study period. Sensor readings (CO2 and sound features) are fetched directly from the database for a given 24 h period, and the preprocessing and feature extraction steps described in the previous sections are applied to each 24 h interval. Each record is then assigned an occupancy label based on its timestamp. Finally, the days are split into training, development, and test sets, and, depending on the experiment, all sensor readings within a given split are merged into one contiguous dataset.
Unless otherwise specified, we use a fixed temporal split for all experiments (from Room 1): 13 days for training, 2 days for development, and 3 days for testing. This corresponds to data split proportions of 0.72, 0.11, and 0.17, respectively.
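A minimal sketch of the day-level split, assuming the per-day DataFrames are stored in a dict keyed by date and that the split is chronological (training days first). The text specifies only the day counts, so both the ordering and the function name are assumptions. With 18 days, the proportions are 13/18 ≈ 0.72, 2/18 ≈ 0.11, and 3/18 ≈ 0.17.

```python
import pandas as pd


def temporal_split(daily: dict, n_train=13, n_dev=2, n_test=3):
    """Chronological day-level split; each split is merged into one DataFrame."""
    days = sorted(daily)
    if len(days) != n_train + n_dev + n_test:
        raise ValueError(f"expected {n_train + n_dev + n_test} days, got {len(days)}")
    bounds = [0, n_train, n_train + n_dev, len(days)]
    # Whole days stay together, so no single day leaks across splits.
    return tuple(
        pd.concat([daily[d] for d in days[lo:hi]])
        for lo, hi in zip(bounds[:-1], bounds[1:])
    )
```

Splitting by whole days rather than by individual records keeps strongly autocorrelated neighboring samples out of different splits.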
Model hyperparameters were first tuned manually to achieve stable performance in the baseline configuration with only CO2 features. After identifying a configuration that performed well, these model hyperparameters were used for all subsequent experiments. Additional fine-tuning did not yield meaningful improvements, suggesting that the chosen configuration provides a good trade-off between complexity and generalization across different feature sets evaluated in this work. The final configuration employs 100 trees (n_estimators = 100) and a maximum tree depth of 30 (max_depth = 30).
All data preparation, feature engineering, and model training procedures were implemented using the Python scientific computing stack, specifically pandas and numpy for tabular data manipulation and the ‘RandomForestClassifier’ implementation from scikit-learn for model fitting and inference.
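The training step with the reported hyperparameters can be sketched as follows. The synthetic eight-column feature matrix is a hypothetical stand-in for the CO2 and audio features, and `random_state` is our addition for reproducibility; the text does not report a seed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score


def train_occupancy_model(X_train, y_train, seed=0):
    """Random Forest with the hyperparameters reported in the text."""
    model = RandomForestClassifier(n_estimators=100, max_depth=30,
                                   random_state=seed)
    return model.fit(X_train, y_train)


# Synthetic stand-in data: 8 feature columns, label driven by the first one.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 8))
y = (X[:, 0] > 0).astype(int)
model = train_occupancy_model(X[:1500], y[:1500])
f1 = f1_score(y[1500:], model.predict(X[1500:]))
```

The held-out rows come after the training rows, mirroring the temporal split rather than a random shuffle.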
The methodology of data acquisition, dataset processing, and training and testing of the occupancy detection model is illustrated in Figure 8.
4. Experiments and Results
4.1. Performance Metrics
To evaluate the performance of occupancy detection for each case, we use the following performance metrics for the occupied state: recall, precision, and F1-score.
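Recall is the fraction of correctly predicted occupied instances out of all actually occupied instances. It is calculated as shown in Equation (1), where TP (true positives) is the number of occupied instances predicted as occupied, FN (false negatives) is the number of occupied instances predicted as unoccupied, and FP (false positives) is the number of unoccupied instances predicted as occupied.
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{1}$$
Precision represents the fraction of correctly predicted occupied instances out of all instances predicted as occupied (both correct and incorrect). Precision is calculated as shown in Equation (2).
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$
F1-score is the harmonic mean of precision and recall and is calculated as shown in Equation (3). It provides a more balanced assessment of model performance by capturing both the model’s ability to detect occupied instances and the accuracy of those detections.
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{3}$$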
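The three metrics for the occupied class can be computed from the confusion counts as in the following sketch. The function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
import numpy as np


def occupied_metrics(y_true, y_pred, occupied=1):
    """Recall, precision and F1-score for the occupied class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == occupied) & (y_true == occupied))
    fp = np.sum((y_pred == occupied) & (y_true != occupied))
    fn = np.sum((y_pred != occupied) & (y_true == occupied))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return float(recall), float(precision), float(f1)


# Example: TP = 3, FP = 1, FN = 1
print(occupied_metrics([0, 1, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 1, 0, 1, 0]))
# -> (0.75, 0.75, 0.75)
```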
4.2. Different Feature Test
In this test, we compare the performance of CO2-only occupancy detection, CO2 combined with average dBFS, and CO2 combined with the full set of audio features. Because prior studies [23,26,33,34] often rely on a single audio intensity metric, we include the average dBFS case as a separate baseline.
The model performance for each case is summarized in Table 5. For this test, the model was trained using data from all four sensor nodes in Room 1, and the reported performance represents the average across all nodes.
The results show that using only the average dBFS as an audio feature does not substantially improve model performance; the F1-score improves by only 2.3%. When the sound-threshold features are included, however, the improvement increases to 7% compared with the CO2-only case. A non-parametric bootstrap test with 10,000 resamples yielded a 95% confidence interval for the F1-score difference of [0.031, 0.056], with p < 0.0001. This indicates that, as with the CO2 data, proper audio feature selection can substantially affect occupancy detection performance. In our case, the sound-threshold counters performed better than the average dBFS because averaging the sound level over a period of time can easily obscure short acoustic events, whereas threshold-based counting preserves information about those events.
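The text does not detail the bootstrap procedure, so the following is only one plausible sketch: a paired bootstrap that resamples test instances with replacement, evaluates both models on the same resample, takes the 2.5th and 97.5th percentiles of the F1 differences as the 95% interval, and reports a one-sided p-value as the share of resamples with no improvement. The function names and the p-value definition are assumptions.

```python
import numpy as np


def f1_occupied(y_true, y_pred):
    """F1 for the occupied class (label 1): 2TP / (2TP + FP + FN)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def paired_bootstrap_f1(y_true, pred_a, pred_b, n_boot=10_000, seed=0):
    """95% CI and one-sided p-value for F1(model B) - F1(model A)."""
    y_true, pred_a, pred_b = (np.asarray(v) for v in (y_true, pred_a, pred_b))
    rng = np.random.default_rng(seed)
    n = len(y_true)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # same resampled instances for both models
        diffs[i] = (f1_occupied(y_true[idx], pred_b[idx])
                    - f1_occupied(y_true[idx], pred_a[idx]))
    ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])
    p_value = np.mean(diffs <= 0)  # share of resamples with no improvement
    return (ci_low, ci_high), p_value
```

Resampling individual 30 s records treats them as independent. Because consecutive records are autocorrelated, a block bootstrap over contiguous segments or days would likely give wider, more conservative intervals.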
It is also worth mentioning that for the second and third thresholds we used the entropy of the threshold-exceedance data rather than the raw counts. For these features, it proved more effective to use an amplitude-independent measure of signal randomness (permutation entropy) than to rely on amplitude information. When we repeated the test using a moving-maximum filter on the second and third thresholds, the performance improvement was only 3.5%. Among all three thresholds, the highest correlation with occupancy was achieved by the third threshold, as it was high enough to avoid capturing outside noise while still being sensitive to many internal events.
Figure 9 shows the predicted and actual occupancy for each feature set, plotted using data from sensor node 111. The CO2 and CO2 + dBFS models failed to detect the last occupancy period on Test Day 1. During the final hours of occupancy on that day, the door was open, which caused the CO2 level to decrease. On Test Day 3, the CO2 and CO2 + dBFS models predicted false occupancy after the occupant had left. This was likely because the models misinterpreted the CO2 decrease caused by the occupant’s departure as a decrease caused by open doors or windows while the occupant was still present. In both situations, the model trained with all audio features performed better and was less likely to misinterpret occupancy states. These results show that audio features can make CO2-sensor-based occupancy detection more robust in naturally ventilated spaces.
In general, the tests were conducted in a quiet environment, although occasional loud external noises were captured by the room microphones.
Figure 10 shows an example from Test Day 1, illustrating the recorded data for all three sound thresholds. As can be seen, the first threshold captured a significant amount of external noise before occupants were present in Room 1. The higher thresholds (second and third) were more robust, although they still captured a small amount of noise. This highlights the importance of selecting thresholds that are not too low in order to effectively separate indoor acoustic events from external noise.
Despite observing some external noise, we did not observe false occupancy detections in the results. This is because our model relies on both audio and CO2 measurements, which compensate for each other in such situations. As discussed previously, audio features can assist during periods of intense ventilation, when CO2-only approaches may fail, while CO2 measurements help maintain reliable detection in acoustically noisy environments, where audio-only methods could otherwise lead to false positives.
4.3. Sensor Location Test
In this test, we evaluate the influence of sensor location on occupancy detection performance. We trained separate models using data from each sensor node in Room 1.
Table 6 shows the results for each sensor using only CO2 features, and Table 7 shows the results using both CO2 and audio features.
The data show that in both cases the worst-performing sensor node was node 109, located between the two windows, and the best-performing node was node 114, located near workstation A. When only CO2 features were used, the performance difference between the best and worst nodes was approximately 9.8%. When both CO2 and audio features were used, this difference decreased to 8%. We conclude that, in our case, the inclusion of audio features reduced the impact of sensor placement, although it did not eliminate it entirely. Based on our observations, for both sound and CO2 sensing, proximity to occupants resulted in higher detection performance. Similar observations have been reported in other studies [61,62,63].
Sensor node 114 achieved the best performance because Room 1 was most frequently occupied by the person at workstation A. As a result, the CO2 sensor at this location often exhibited a faster response, and the microphone was better able to detect quiet activities such as typing or chair movement. Although node 109 was the closest to the person at workstation B, this sensor showed the worst performance. Since the tests were conducted during the colder months, we can assume that the indoor air tended to flow away from the windows due to the temperature difference between the indoor and outdoor environments. This likely made the location of sensor node 109 inefficient, as the CO2 was pushed away from it. In general, the CO2 sensor on node 109 exhibited the slowest response to occupancy among all nodes. Additionally, when the person at workstation B arrived first, sensors 109 and 114 responded almost simultaneously, but when the person at workstation A arrived first, a noticeable delay was observed between the responses of the two sensors.
Although the location of node 109 was inefficient for CO2 sensing due to unfavorable airflow, combining CO2 with audio features helped minimize the impact of airflow on occupancy detection performance. For node 109, we observed the largest performance increase when audio features were added.
Figure 11 shows an example of occupancy prediction graphs for nodes 109 and 114. It can be seen that, for both nodes, the inclusion of audio features reduced occupancy detection delay and decreased false occupancy detections.
4.4. Different Room Test
Occupancy detection models developed using data from a single room usually perform poorly when applied to data from other rooms. In the case of CO2-based models, this is understandable because differences in room volume and ventilation cause CO2 dynamics to vary, leading the model to misinterpret unfamiliar data. The purpose of this test is to evaluate whether audio features can make the model more robust across different environments.
Our model was trained only on data from Room 1. For this evaluation, we used data from Room 2, which is almost identical to Room 1, and Room 3, which differs substantially from Room 1. In Rooms 2 and 3, we placed multiple sensor nodes to reduce the influence of sensor location on the results. The reported performance metrics for each room represent the average values across all sensor nodes in that room.
Table 8 shows the performance results for each room using only CO2 features, and Table 9 shows the results obtained when both CO2 and audio features are used.
Using only the CO2 features, the model performed well on the data from Room 2: the performance drop compared with Room 1 was only 4.9%. However, when audio features were added, the performance drop increased to 7.1%. This shows that the audio features did not improve the model’s generalizability to Room 2 and, to some extent, even made the results worse; for example, recall decreased when audio features were included. One-day examples of predicted occupancy in Room 2 for both cases are shown in Figure 12.
When we analyzed the audio data from Room 2, we found that this room had poor noise isolation: even when no occupant was present, some noise from outside was still captured. Consequently, the audio features were unable to reliably distinguish occupancy states in this environment. The poor results can also be attributed to our use of entropy-based audio features: even when the outside noise had a lower amplitude than the noise generated by occupant activities, the entropy values remained high. When we reran the model using amplitude-based audio features instead of entropy, the F1-score for Room 2 increased to 0.92. This was the only case in all of our tests where amplitude-based audio features outperformed entropy-based features. This result is even better than the performance observed for Room 1, likely because Room 1 contained many short departure events that the model was unable to identify correctly.
For Room 3, the F1-score decreased by 26.7% compared with Room 1 when only CO2 features were used. However, when both CO2 and audio features were included, the performance drop decreased to 21.3%. This shows that, for Room 3, the audio features improved the generalizability of the model trained on Room 1. Unlike Room 2, Room 3 did not suffer from outside noise, which contributed to the improved results.
Figure 13 shows an example of the occupancy prediction graphs for one test day in Room 3. Although the model achieved an F1-score of only 0.75 for the CO2 + audio feature set, the graphs indicate that it successfully detected the occupant’s first arrival, lunch break, and final departure. Unlike the previous rooms, the data from Room 3 contain many short departure events that the model failed to identify. In situations where precise detection of such short departures is not required, the results for Room 3 can be considered satisfactory.
Additionally, we observed that background noise levels differed slightly between rooms. For example, all sensors in Rooms 2 and 3 measured lower dBFS values during quiet nighttime periods than those in Room 1. This difference also affected the behavior of the sound thresholds in each room. All thresholds were set in Room 1; however, because acoustic conditions vary between rooms, this resulted in slightly different outcomes. For instance, the first threshold in Room 1, set to −67 dBFS, was intended to detect very quiet acoustic events, whereas the same threshold in Rooms 2 and 3 captured medium-level noise events. In our case, because we used an entropy-based measure rather than absolute threshold counts, this variation did not have a major impact on the results. However, in scenarios where audio features rely on absolute sound levels, such differences should be carefully considered. In such cases, automatic calibration methods should be employed if the model is intended for use across different rooms.
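One simple form of automatic calibration, not evaluated in this study, is to keep each threshold at a fixed margin above the room's own background level. The following sketch estimates that level as a low percentile of night-time average dBFS readings and shifts the Room 1 thresholds accordingly. The night-time window (00:00 to 05:00), the 10th percentile, the function names, and the Room 1 reference background level passed as `reference_floor` are all hypothetical, because the text does not report the measured background levels.

```python
import numpy as np
import pandas as pd

ROOM1_THRESHOLDS = np.array([-67.0, -60.0, -50.0])  # T1, T2, T3 set in Room 1


def noise_floor(avg_dbfs: pd.Series, night_hours=(0, 5), q=0.10) -> float:
    """Estimate background level as a low percentile of night-time dBFS readings."""
    hours = avg_dbfs.index.hour
    night = avg_dbfs[(hours >= night_hours[0]) & (hours < night_hours[1])]
    return float(night.quantile(q))


def calibrate_thresholds(room_floor: float, reference_floor: float) -> np.ndarray:
    """Shift Room 1 thresholds so they keep the same margin above the local floor."""
    return ROOM1_THRESHOLDS + (room_floor - reference_floor)
```

For instance, if a room's floor were 5 dB below the (hypothetical) Room 1 reference, all three thresholds would move down by 5 dB, so that T1 again targets very quiet events rather than medium-level ones.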
5. Conclusions and Discussions
This study evaluated whether audio features can enhance CO2-based occupancy detection in naturally ventilated office environments. For both the CO2-only and the CO2 + audio feature sets, we evaluated model performance, the importance of sensor location, and the ability of the model to generalize to other rooms.
We found that the use of appropriate audio features is crucial for achieving good occupancy detection performance. In our case, simple averaged audio intensity did not perform as well as audio event counts. We also observed differences depending on whether the model received amplitude-based audio features or entropy-based features. Future work should further investigate audio feature selection and audio-processing techniques.
For both CO2 and audio sensing, proximity to the occupants is important for achieving high performance, but CO2 measurements are also strongly influenced by airflow patterns in the room. In cases where a CO2 sensor must operate under unfavorable airflow conditions, audio features can help maintain satisfactory occupancy detection performance.
In the tests using data from different rooms, we obtained generally satisfactory results. The audio features showed the ability to create more generalizable models, but it was also important that each room provided suitable acoustic conditions in order to achieve good performance.
As our study was conducted in late autumn, the results cannot be directly generalized to other seasons. Future work is required to investigate the performance of CO2- and audio-based occupancy detection across different seasons, particularly during summer, when natural ventilation is more intense.
Overall, the study confirms that combining CO2 sensing with audio features is a promising approach for improving occupancy detection in naturally ventilated buildings. The main advantage of audio is that it can be processed in many different ways. The approach used in this work is only one of many possible methods that could be applied to the occupancy detection problem.