Investigation of Audio Feature Application for CO2 Sensor-Based Occupancy Detection Enhancement

Skromule, Marija; Kozlovskis, Rainers; Tiscenko, Deniss; Judvaitis, Janis

doi:10.3390/buildings16030545

Institute of Electronics and Computer Science, LV-1006 Riga, Latvia

Author to whom correspondence should be addressed.

Buildings 2026, 16(3), 545; https://doi.org/10.3390/buildings16030545

Submission received: 10 December 2025 / Revised: 23 January 2026 / Accepted: 26 January 2026 / Published: 28 January 2026

(This article belongs to the Section Building Energy, Physics, Environment, and Systems)

Abstract

This study investigates the integration of audio features with CO₂ sensor data to enhance occupancy detection accuracy in naturally ventilated office environments. Accurate occupancy detection is pivotal for smart building energy management, yet CO₂-based methods respond too slowly and are sensitive to air circulation changes due to internal convection. In this article, we propose combining CO₂ sensors with audio features from MEMS microphones to improve occupancy detection accuracy and shorten response times. We use a Random Forest classifier and evaluate the results across two scenarios: CO₂-only and CO₂ combined with audio features. Results show that incorporating audio features into the occupancy detection algorithm yields a significant increase in detection accuracy and speed, especially when the environment is subject to frequent air circulation changes due to internal convection, such as those caused by opening and closing windows and doors. Combining CO₂ and audio sensing offers a promising, cost-effective approach to occupancy detection in smart buildings, yet more research on advanced audio processing and feature selection is necessary.

Keywords:

occupancy detection; CO₂; audio; sound; naturally ventilated

1. Introduction

As global energy demand continues to increase [1], there is a need for efficient and sustainable solutions and technologies. This matter is particularly important for buildings. The Global Status Report for Buildings and Construction 2024/2025 states that in 2023 the buildings and construction sector consumed around 32% of global energy [2]. In the European Union, buildings account for approximately 42% of total energy consumption [3].

To address the high energy consumption in buildings, the concept of smart buildings has emerged as a transformative solution. Smart buildings integrate advanced technologies such as Building Energy Management and Control Systems (BEMCS), Internet of Things (IoT) sensors, and artificial intelligence to monitor and optimize energy use in real time [4]. These systems enable buildings to autonomously adjust heating, cooling, lighting, and ventilation based on environmental conditions and occupant behavior, significantly reducing energy waste.

Occupancy prediction plays a key role in the development of smart building management systems, enabling significant reductions in energy consumption while maintaining occupant comfort [4,5]. In the European Union, the revised Energy Performance of Buildings Directive (EU/2024/1275 [6]) emphasizes the adoption of smart technologies to achieve a zero-emission building stock by 2050, with occupancy detection highlighted as a key component for energy efficiency. Studies report that occupancy-based control strategies can reduce energy consumption by up to 55% for HVAC systems and by up to 78.5% for lighting [7]. In addition to energy savings, occupancy information can be used to improve occupant comfort, strengthen building security, and support space utilization analysis [8].

A large number of studies have been conducted on occupancy detection and estimation in recent years, and various methods have been proposed to address this problem, including those based on cameras, environmental sensors, PIR sensors, smart meters, Wi-Fi, Bluetooth, UWB radars and microphones [5,7].

Among the available techniques, camera-based approaches generally offer the highest accuracy. With cameras and computer vision algorithms, it is possible not only to detect the presence of people but also to precisely count them and determine their activities in real time. The disadvantages of camera-based systems include their high cost, the need for large storage capacity and high processing requirements. This makes them impractical when occupancy detection is needed across many small rooms, and installing cameras can raise privacy concerns, as occupants may feel uncomfortable being monitored [5,9]. Additionally, it has been reported that the performance of camera-based occupancy prediction can suffer from sudden lighting changes and occlusion [10,11,12,13]. Due to these limitations of the camera-based methods, non-intrusive and occlusion-independent environmental sensor–based approaches have gained popularity.

The principle underlying environmental sensor-based occupancy detection is that human presence influences local indoor environmental conditions, and by analyzing environmental sensor data, information about occupancy can be deduced. These sensors typically monitor parameters such as temperature, relative humidity, air pressure, CO₂ concentration, VOC levels, and light intensity [14]. There are two main reasons why environmental sensors are widely studied for occupancy detection. First, such methods help protect occupant privacy. Second, environmental sensors are generally inexpensive and easy to install. Additionally, some of these sensors may already be present in buildings for purposes such as indoor environment monitoring or appliance control (e.g., HVAC). In such cases, occupancy detection based on environmental sensors can rely on existing building infrastructure, making the approach both more economical and more sustainable.

The most widely studied environmental sensor for occupancy detection is the CO₂ sensor [7]. This is largely due to the fact that human respiration is typically the predominant contributor to indoor CO₂ concentrations in residential and commercial environments [15]. CO₂ sensors can be used for both occupancy detection [16,17] and estimation [18,19,20,21]; however, occupancy estimation is a more challenging task and typically shows lower accuracy than binary detection [22]. It has been reported that the precision of occupancy estimation decreases as the number of occupants increases [23]. The literature identifies two commonly used approaches for deriving occupancy information from CO₂ sensor measurements: physical modeling and statistical (or data-driven) modeling [19].

Physical modeling approaches rely on solving mass balance equations. In addition to indoor CO₂ measurements, the mass balance equation requires knowledge of parameters such as room volume, ventilation rate, per-person CO₂ generation, and outdoor CO₂ concentration [24]. Parameters such as occupant CO₂ generation rate, ventilation rate, and outdoor CO₂ concentration are often assumed rather than measured. In reality, these parameters can vary due to factors that are difficult to predict or measure. In addition, the mass balance equation relies on the assumption that the air in the room is perfectly mixed [22]. All of these assumptions result in a simplified representation of real indoor conditions, which limits the accuracy of the estimated occupancy. Wolf et al. [25] propose addressing these limitations by combining the mass balance equation with statistical parameter estimation.
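For orientation, the single-zone, well-mixed CO₂ mass balance that such models typically build on can be written as below. This is the standard textbook form, not an equation taken from the cited works, and all symbols are introduced here for illustration only: $V$ is the room volume, $C(t)$ the indoor CO₂ concentration, $C_{\mathrm{out}}$ the outdoor CO₂ concentration, $Q$ the ventilation airflow rate, $G$ the per-person CO₂ generation rate, and $N(t)$ the number of occupants.

```latex
V \frac{dC(t)}{dt} = N(t)\, G + Q \left( C_{\mathrm{out}} - C(t) \right)
\quad\Longrightarrow\quad
N(t) = \frac{V \, \dfrac{dC(t)}{dt} + Q \left( C(t) - C_{\mathrm{out}} \right)}{G}
```

The rearranged form makes the sensitivity visible: any error in the assumed $Q$, $G$, or $C_{\mathrm{out}}$ propagates directly into the estimated $N(t)$, and an unobserved change in $Q$ (for example, an opened window) is indistinguishable from a change in occupancy.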

Statistical modeling is primarily based on the application of machine learning algorithms. Commonly used algorithms include Support Vector Machines (SVM), Artificial Neural Networks (ANN), and K-Nearest Neighbors (KNN) [7]. Statistical modeling can achieve higher occupancy prediction accuracy than physical modeling [19] but usually requires a training period and the collection of ground-truth data. Moreover, studies have shown that statistical models developed for one location often do not generalize well to other locations [26,27].

All of this shows that although CO₂ sensor-based occupancy prediction is very promising, there are still challenges in developing easy-to-use algorithms. Another common issue reported in the literature is the slow response time of CO₂ sensors, which can introduce delays in detection and result in missed short-term occupancy changes [28,29,30]. However, the most significant problem of CO₂ sensor-based occupancy detection, especially in naturally ventilated buildings, is the occurrence of window and door opening events. Such events can easily alter the CO₂ dynamics in a room, causing predictive algorithms to report false departures or false occupancy. Primarily for these reasons, the reported prediction accuracy for naturally ventilated buildings is lower than that for mechanically ventilated ones [22,31]. In some studies [32], such events are intentionally excluded during testing, resulting in findings that do not fully capture the real-world performance of the evaluated methods.

Due to the inherent limitations of CO₂ sensors, many researchers have explored CO₂ sensor combinations with other sensors to enhance the accuracy and robustness of occupancy prediction models [29,33,34,35,36]. While such an approach demonstrates good performance, some studies employ a very large number of sensors, making these systems economically unviable. It is important for an occupancy prediction system to achieve good performance with the smallest possible number of sensors.

Considering the limitations of CO₂ sensors, a good approach is to combine them with fast-response sensors that are independent of indoor airflow, such as PIR or sound sensors. Several studies have already investigated the joint use of CO₂ and PIR sensors for occupancy detection [37,38,39,40].

Jiang et al. [37] tested a combination of CO₂ and PIR sensors in a mechanically ventilated office room. The PIR sensor was placed near the door to detect occupant arrivals and departures. Their study showed that adding a PIR sensor can completely eliminate the detection delay for arrivals and reduce the detection delay for departures. A limitation of this study is that the results were obtained from only one day.

Stephen Gage [38] investigated the combination of CO₂ and PIR sensors in a residential house. Gage tested the CO₂–PIR sensor combination under four scenarios: (a) doors and windows closed; (b) doors open and windows closed; (c) doors closed and windows open; and (d) doors open and windows open. The experiment showed that when windows were open, occupancy detection based on CO₂ measurements becomes unreliable, and in such cases, the system has to rely primarily on the PIR sensor. A limitation of this study is that it did not apply specific occupancy detection algorithms, so no quantitative comparison between the scenarios was provided.

In general, combining CO₂ and PIR sensors can reduce occupancy detection delay and improve performance in scenarios where windows and doors are open. However, PIR sensors also have several limitations. A commonly reported issue is their inability to detect stationary occupants or very small movements [41,42], which is particularly problematic in office environments where occupants are often seated and relatively still. In addition, PIR sensors have a limited detection range, making sensor placement and the required number of sensors very important [40,42]. Because PIR sensors detect infrared radiation, their sensitivity also depends on indoor temperature. In hot environments, PIR sensors tend to be less sensitive, whereas in colder environments they may become oversensitive and react to the movement of warm air [43,44].

Regarding CO₂ and sound sensors, there are not many studies that have separately examined the performance of this combination. This can be explained by the fact that, unlike PIR sensors, sound sensors are rarely pre-installed in buildings, which makes their application less common. On the other hand, simple microphones are not very expensive; for example, the MEMS microphone ICS-43434 used in this work costs only around EUR 3. Moreover, several multi-sensor studies [26,34] have shown that sound levels in a room have a strong correlation with the presence of people, highlighting the importance of audio feature application for occupancy detection systems.

There are even studies that achieve good results using only audio features for occupancy prediction [45,46,47,48,49,50]. The main difference between these works and the previously mentioned multi-sensor studies is that audio-only approaches use more advanced audio features that describe the environment more accurately. This shows that, unlike PIR-based methods, audio-based detection is more flexible. It can be limited to simple audio processing methods, such as measuring sound levels [23,26,33,34] in a room, or it can be sophisticated enough to distinguish between different speakers [50].

It is worth mentioning that audio-based methods also have their limitations. The main limitation is that audio-based occupancy prediction can be misled by non-human acoustic events such as outdoor construction noise or background audio from media devices, which may result in false presence detections. This means that such techniques are more suitable for environments where non-human sounds are rare; otherwise, these sounds should be compensated for through advanced algorithms or sensor fusion.

Because the combination of CO₂ with audio for occupancy detection is very promising and has rarely been studied separately, this study seeks to evaluate the performance of this sensor combination in a naturally ventilated building. The aims of this work are as follows:

Test the performance of the model with different features.
Examine the impact of sensor placement on model performance.
Evaluate audio feature contribution to model generalizability.

2. Setup

2.1. Data Collection

The data is collected using a custom-designed sensor unit (Figure 1) that includes a photoacoustic NDIR CO₂ sensor (SCD41) from Sensirion and a MEMS microphone (ICS-43434) from TDK InvenSense. At the time of the tests, the cost of the SCD41 CO₂ sensor was approximately EUR 17, while the cost of the ICS-43434 microphone was around EUR 3. The specification for the CO₂ sensor is shown in Table 1, and for the microphone in Table 2.

The data is transmitted wirelessly through a RAK3172 LoRa module from RAKwireless Technology Limited (Shenzhen, China). CO₂ and sound data from each sensor node are sent to the gateway every 30 s and then forwarded and stored in the database by the infrastructure provided by EDI TestBed [51,52,53,54,55,56].

For ground-truth collection, we installed cameras in each room. From the video recordings, we manually extracted information about occupancy and the states of doors and windows. We constructed occupancy data with 5 min occupancy resolution. Consequently, departure events that result in the room being unoccupied for less than 5 min were discarded and filled in as occupied, as detecting such short absences with high precision is difficult and often unnecessary. We assume that such events can be ignored.
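The treatment of short absences described above can be sketched as follows. This is a minimal illustration, not the authors' labeling procedure: it assumes the ground truth is available as one boolean label per 30 s sample, and the function name `fill_short_absences` and its parameters are hypothetical.

```python
def fill_short_absences(occupied, sample_period_s=30, min_absence_s=300):
    """Relabel unoccupied runs shorter than min_absence_s as occupied.

    Only gaps bounded by occupied samples on both sides are filled, so an
    empty room at the start or end of the record is left untouched.
    """
    labels = [bool(v) for v in occupied]
    min_len = min_absence_s // sample_period_s  # 10 samples for 5 min at 30 s
    n = len(labels)
    i = 0
    while i < n:
        if labels[i]:
            i += 1
            continue
        # Find the end of the current unoccupied run [i, j).
        j = i
        while j < n and not labels[j]:
            j += 1
        if i > 0 and j < n and (j - i) < min_len:
            labels[i:j] = [True] * (j - i)
        i = j
    return labels


# 90 s absence (3 samples) is filled; 6 min absence (12 samples) is kept.
raw_labels = [True] * 4 + [False] * 3 + [True] * 2 + [False] * 12 + [True]
filled_labels = fill_short_absences(raw_labels)
```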

2.2. Test Room Description

The tests were performed in three office-type rooms at the Institute of Electronics and Computer Science in Riga, Latvia. All rooms are located on the third floor and are naturally ventilated. The parameters of each room are listed in Table 3. A sketch of the test rooms and the locations of the sensor nodes in these rooms is shown in Figure 2.

As part of our experimental setup, we installed multiple sensor nodes in each room to determine optimal sensor locations and, in other tests, to minimize the impact of specific sensor placement on the results. However, all results presented in this work generally represent a scenario with a single sensor node in a room. During the data recording period, the occupants of each room were asked to behave naturally in order to obtain realistic results. This means that occupants were allowed to open windows or leave doors open; however, since the data was collected in late autumn, the windows were not opened very often. From all recorded data, we calculated the percentage of time the room was occupied and unoccupied, as well as the total number of hours during which the doors and windows were open. This information for each room is presented in Table 4.

We believe that there was little to no impact of the Hawthorne effect [57] on our results; on the contrary, we made efforts to avoid it. During the first iteration of the experiments, when occupants were asked to manually record occupancy and door and window states, we noticed that they tended to leave the room less often or change door and window states less frequently. After cameras were installed, occupants were freed from this obligation and behaved more naturally. We placed the cameras at angles comfortable for the occupants, where their workspace was not exposed, while entrances and windows remained visible. Moreover, many occupants in the test rooms did not know the purpose of the experiment and therefore could not know what type of behavior was expected or preferable.

Room 1 was the main room used for our tests. For this room, we recorded 18 days of data during October and November of 2025. All 18 recorded days were working days when the room was occupied. The room contains three workstations, but the person assigned to workstation C was not present during the test period, so the room was typically occupied by 1–2 people. It has an area of 33.6 m² and a ceiling height of 2.7 m, and its two windows are oriented toward the North–Northwest (NNW). We placed four sensor nodes in this room: three nodes in the corners and one node between the two windows. All nodes were placed at a height of 1.5 m.

Room 2 was not used for model training; therefore, only three days of data were recorded for this room during October of 2025. Room 2 is almost identical to Room 1. Both have similar volumes and both face North–Northwest (NNW). In the building, Room 1 and Room 2 are located on the same floor and are adjacent to each other. Because these rooms are so similar, the data from Room 2 was used to test the performance of the occupancy detection model trained on Room 1.

Similarly to Room 1, the occupancy of Room 2 during the test days ranged from 1 to 2 people. Four sensor nodes were placed in this room, with all nodes located in the corners at a height of 1.5 m. We could not place the sensor nodes in the same manner as in Room 1 because, in Room 2, the space between the two windows is occupied by a closet.

For Room 3, we also recorded three days of data during November of 2025, and the purpose of this room was the same as for Room 2. However, Room 3 differs significantly from Rooms 1 and 2. It is a small office-type room intended for a single occupant. The room has an area of 9.8 m² and a ceiling height of 2.7 m. Its windows are oriented toward the South–Southeast (SSE). In this room, three sensor nodes were placed at a height of 1.5 m.

3. Occupancy Detection Model Development

3.1. CO₂ Data Processing and Feature Extraction

CO₂ data is usually somewhat noisy due to sensor limitations and natural CO₂ fluctuations within a room, which is why data filtering is required. Many studies have demonstrated the importance of CO₂ data filtering for improving model performance [18,37]. Subsequent data processing and filtering operations were conducted in discrete 24 h intervals.

Because a simple moving average fails to remove the medium- and high-frequency noise present in the CO₂ sensor data, a more capable filtering method is needed to obtain the underlying trend. In this work, we first applied a Savitzky–Golay filter to the raw CO₂ data in Python (v3.12) using the SciPy library. The filter was configured as a 2nd-order polynomial with a window length of 350 samples, which helped remove high- and medium-frequency noise. Then, to further eliminate random noise, a simple moving average was applied on top of the previously filtered data using the rolling function from the Pandas library. The moving average was set as centered to avoid phase lag and used a 60-sample (≈30 min) window. The centered moving average was used for offline preprocessing and evaluation; real-time deployment would require causal filtering, which is outside the scope of this work.
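The two-stage filtering described above might look roughly like the sketch below. It is not the authors' code: the function name `filter_co2` is hypothetical, the edge handling (`min_periods=1`) is an assumption, and the Savitzky–Golay window is set to 351 rather than the 350 stated in the text so that the sketch also runs on SciPy releases that require an odd window length. The synthetic input is made up for demonstration.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter


def filter_co2(raw_ppm, sg_window=351, sg_order=2, ma_window=60):
    """Savitzky-Golay smoothing followed by a centered moving average."""
    raw = np.asarray(raw_ppm, dtype=float)
    # Stage 1: 2nd-order polynomial fit over a long window.
    smoothed = savgol_filter(raw, window_length=sg_window, polyorder=sg_order)
    # Stage 2: centered 60-sample (about 30 min) moving average.
    # min_periods=1 (an assumption) avoids NaN values at the edges.
    averaged = (
        pd.Series(smoothed)
        .rolling(window=ma_window, center=True, min_periods=1)
        .mean()
    )
    return averaged.to_numpy()


# Synthetic 24 h record at one sample per 30 s (2880 samples).
rng = np.random.default_rng(0)
t = np.arange(2880)
raw = 450 + 300 * np.exp(-(((t - 1500) / 400.0) ** 2)) + rng.normal(0, 15, t.size)
filtered = filter_co2(raw)
```

Because the moving average is centered, each output sample depends on future input samples, which is why this form suits offline evaluation only.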

From the CO₂ data, we also computed the first and second derivatives, as these features have often been used in previous works and have proven to be effective for occupancy detection models.

From the previously filtered CO₂ data, the first derivative was computed using the gradient function from the NumPy library. To mitigate artifacts caused by irregular sampling intervals, a uniform Δt was used, calculated as the average of all sampling intervals and subsequently divided by 3600 s to express the derivative in ppm/h. Additional filtering was applied to the first derivative to reduce the noise in the signal. The best results were obtained by applying two consecutive moving averages using a 40-sample window (≈20 min).

The second derivative was calculated in the same manner, by applying the NumPy gradient function to the filtered first derivative. No additional filtering was required, as the prior filtering sufficiently reduced noise in the sensor data.
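A compact sketch of the derivative computation is given below, under stated assumptions: the helper names `moving_average` and `co2_derivatives` are hypothetical, and the moving average is implemented with `np.convolve` in `"same"` mode, which attenuates values near the edges and may differ from the implementation used in the study.

```python
import numpy as np


def moving_average(x, window):
    """Centered moving average; values near the edges are attenuated."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")


def co2_derivatives(filtered_ppm, timestamps_s, ma_window=40):
    """Return the first (ppm/h) and second (ppm/h^2) derivatives."""
    t = np.asarray(timestamps_s, dtype=float)
    # Uniform step: mean sampling interval, converted from seconds to hours.
    dt_h = np.mean(np.diff(t)) / 3600.0
    d1 = np.gradient(np.asarray(filtered_ppm, dtype=float), dt_h)
    # Two consecutive 40-sample (about 20 min) moving averages.
    d1 = moving_average(moving_average(d1, ma_window), ma_window)
    # Second derivative from the already filtered first derivative.
    d2 = np.gradient(d1, dt_h)
    return d1, d2


# Sanity check on a synthetic ramp rising at exactly 100 ppm/h.
ts = np.arange(0, 7200, 30)
ramp = 400 + 100 * ts / 3600.0
d1, d2 = co2_derivatives(ramp, ts)
mid = len(ts) // 2
```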

As a result we have the following four CO₂ features:

Raw CO₂ in ppm;
Filtered CO₂ in ppm;
Filtered first derivative of CO₂ in ppm/h;
Second derivative of CO₂ in ppm/h².

All of these CO₂ features were fed to our occupancy detection model. Figure 3 shows a graphical representation of all CO₂ features.

3.2. Audio Data Processing and Feature Extraction

Most of the audio processing was performed on the microcontroller. Pulse-code modulated (PCM) data from the microphone was transferred to the microcontroller’s memory using direct memory access (DMA), which enabled continuous data reading with low power consumption. From the raw samples, the sound dBFS (decibels relative to full scale) value was calculated every 100 milliseconds. Based on these measurements, the average sound dBFS value over a 30 s period was computed. In addition, three sound thresholds were defined, and the number of times the sound level exceeded each threshold during the 30 s interval was counted. The purpose of counting sound events was to preserve information about short-duration sounds without requiring frequent data transmission. The thresholds were chosen to detect quiet, medium, and loud sounds and to evaluate which threshold correlates best with occupancy. The first threshold was set to −67 dBFS, the second to −60 dBFS, and the third to −50 dBFS. All three thresholds were empirically determined in Room 1. We first measured background noise as well as noise generated by common activity events such as walking, talking, typing, and door closing. Based on these measurements, we selected three thresholds capable of capturing events with different loudness levels.

As a result, every 30 s, the following four audio features were sent to the database:

The average dBFS value;
The number of times the sound exceeded the first threshold (T1);
The number of times it exceeded the second threshold (T2);
The number of times it exceeded the third threshold (T3).

None of these audio features contains personal information, so they do not compromise occupant privacy.
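The logic behind the four transmitted audio features can be illustrated with the sketch below. The actual processing ran on the microcontroller, so this Python version only mirrors the idea. Several details are assumptions not stated in the text: the level of each 100 ms frame is computed from its RMS value, an exceedance is counted once per frame whose level is above the threshold, and the average is the arithmetic mean of the per-frame dBFS values. The function names are hypothetical.

```python
import math

THRESHOLDS_DBFS = (-67.0, -60.0, -50.0)  # T1, T2, T3 from the text


def frame_dbfs(samples, full_scale=1.0, floor_dbfs=-120.0):
    """RMS level of one frame in dBFS (assumed definition)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return floor_dbfs  # avoid log10(0) for digital silence
    return 20.0 * math.log10(rms / full_scale)


def summarize_interval(frame_levels_dbfs, thresholds=THRESHOLDS_DBFS):
    """Average level and per-threshold exceedance counts for one interval."""
    average = sum(frame_levels_dbfs) / len(frame_levels_dbfs)
    counts = [
        sum(1 for level in frame_levels_dbfs if level > threshold)
        for threshold in thresholds
    ]
    return average, counts


# One 30 s interval = 300 frames of 100 ms: quiet background plus three events.
levels = [-75.0] * 297 + [-65.0, -55.0, -45.0]
avg_dbfs, exceed_counts = summarize_interval(levels)
half_scale_level = frame_dbfs([0.5] * 10)
```

In this example the three short events barely move the 30 s average, yet each one is visible in the counters, which is the motivation for transmitting the counts alongside the average.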

The received sound data is inherently very chaotic. With such data, our model did not perform well, so additional processing was required. Because the data from the second and third thresholds generally outputs zero values when the room is unoccupied and random spikes when someone is present, good results were achieved by applying the permutation entropy algorithm to these datasets. This algorithm calculates the randomness of time-series data: the more random the data is, the higher the resulting entropy values. For permutation entropy calculation, we used the AntroPy Python library, which provides a time-efficient implementation of this algorithm. Because the permutation entropy algorithm measures only the randomness of a time-series, independent of its amplitude, it had almost no effect on the noisy average dBFS data and on the first-threshold data. Instead, for these features, we applied a moving-maximum filter with a 60-sample window. The results of these processing steps on the audio data are shown in Figure 4, Figure 5, Figure 6 and Figure 7.
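To make the two post-processing steps concrete, the sketch below implements permutation entropy and a moving maximum using only the Python standard library. It is an illustrative reimplementation, not the AntroPy routine used in the study. The embedding order of 3, the trailing 60-sample window for the rolling entropy, and the trailing alignment of the moving maximum are all assumptions, since the text does not specify them.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, normalize=True):
    """Shannon entropy of ordinal patterns of length `order` (delay 1)."""
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: series[i + k]))
        for i in range(len(series) - order + 1)
    )
    total = sum(patterns.values())
    entropy = sum(-(c / total) * math.log2(c / total) for c in patterns.values())
    if normalize:
        entropy /= math.log2(math.factorial(order))
    return abs(entropy)  # abs() only turns a possible -0.0 into 0.0


def rolling_permutation_entropy(series, window=60, order=3):
    """Entropy over a trailing window ending at each sample."""
    result = []
    for end in range(1, len(series) + 1):
        chunk = series[max(0, end - window):end]
        result.append(permutation_entropy(chunk, order) if len(chunk) >= order else 0.0)
    return result


def moving_maximum(series, window=60):
    """Maximum over a trailing window ending at each sample."""
    return [max(series[max(0, end - window):end]) for end in range(1, len(series) + 1)]


# Threshold counts: all zeros while empty, irregular spikes while occupied.
quiet = [0] * 60
active = [0, 3, 1, 4, 0, 2] * 10
entropy_quiet = permutation_entropy(quiet)
entropy_active = permutation_entropy(active)
```

A constant series produces a single ordinal pattern and therefore zero entropy, while irregular spikes produce several patterns and a positive value, regardless of how large the spikes are.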

3.3. Model Training

In this study, we employ a Random Forest classifier as the primary predictive model. Random Forests are widely used in occupancy and indoor-environment prediction tasks due to their robustness to noise, ability to model nonlinear relationships, and comparatively low computational cost. Their adoption in related works further supports their suitability for the present problem [58,59,60].

While recurrent neural networks such as LSTM or GRU are well suited for classification and regression of time-series data, their advantages are less pronounced in the present setting. First, the CO₂ changes relevant to occupancy happen over short periods, which we capture using window-based features and could further compensate for with lag features. Second, integration of various audio features with CO₂ features would require additional model complexity, whereas Random Forests naturally accommodate heterogeneous feature types. Finally, the lower computational cost of tree-based models makes them more suitable for deployment in building monitoring systems, and they are lightweight enough to allow repeated experimentation.

Random Forest operates by constructing a set of decision trees, each trained on a bootstrap sample of the data and a randomly selected subset of features at each split. During inference, each tree provides an independent prediction, and the final output is obtained via majority voting. This ensemble procedure reduces overfitting and stabilizes performance, particularly in heterogeneous sensor environments such as indoor spaces with variable occupancy patterns.

For data preparation, we first load the occupancy data corresponding to each day from the study period. Sensor readings (CO₂ and sound features) are fetched directly from the database for a given 24 h period. We then apply preprocessing and feature extraction steps as described in the previous section in discrete 24 h intervals. Here, each record is assigned an occupancy label based on its timestamp. Then a training data split is performed, and, depending on the experiment, all sensor readings within a given data split are merged into one contiguous data set.

Unless otherwise specified, we use a fixed temporal split for all experiments (from Room 1): 13 days for training, 2 days for development, and 3 days for testing. This corresponds to data split proportions of 0.72, 0.11, and 0.17, respectively.
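The stated proportions follow directly from the day counts, as the short sketch below shows. The day labels are placeholders, and taking the development and test days from the end of the chronologically ordered list is an assumption, since the text does not say which of the 18 days were assigned to each split.

```python
def temporal_split(days, n_train=13, n_dev=2, n_test=3):
    """Split chronologically ordered recording days into train/dev/test."""
    ordered = sorted(days)
    if len(ordered) != n_train + n_dev + n_test:
        raise ValueError("day counts do not match the number of recorded days")
    train = ordered[:n_train]
    dev = ordered[n_train:n_train + n_dev]
    test = ordered[n_train + n_dev:]
    return train, dev, test


# Placeholder labels for the 18 recorded days in Room 1.
recorded_days = [f"day_{i:02d}" for i in range(1, 19)]
train_days, dev_days, test_days = temporal_split(recorded_days)
proportions = [
    round(len(part) / len(recorded_days), 2)
    for part in (train_days, dev_days, test_days)
]
```

Splitting by whole days, rather than by shuffled samples, keeps temporally adjacent (and therefore highly correlated) samples from leaking between the training and test sets.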

Model hyperparameters were first tuned manually to achieve stable performance in the baseline configuration with only CO₂ features. After identifying a configuration that performed well, these model hyperparameters were used for all subsequent experiments. Additional fine-tuning did not yield meaningful improvements, suggesting that the chosen configuration provides a good trade-off between complexity and generalization across different feature sets evaluated in this work. The final configuration employs 100 trees (n_estimators = 100) and a maximum tree depth of 30 (max_depth = 30).

All data preparation, feature engineering, and model training procedures were implemented using the Python scientific computing stack, specifically pandas and numpy for tabular data manipulation and the ‘RandomForestClassifier’ implementation from scikit-learn for model fitting and inference.
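A minimal sketch of fitting the classifier with the reported hyperparameters is shown below. Only `n_estimators=100` and `max_depth=30` come from the text. The two synthetic features, the sample counts, and the fixed `random_state` are assumptions made so the example is self-contained and reproducible; they do not reflect the study's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples = 600

# Synthetic stand-ins for one CO2 feature and one audio feature.
occupied = rng.integers(0, 2, size=n_samples)
co2_ppm = 450 + 250 * occupied + rng.normal(0, 40, n_samples)
audio_entropy = 0.6 * occupied + rng.normal(0, 0.1, n_samples)
features = np.column_stack([co2_ppm, audio_entropy])

# Hyperparameters as reported; random_state is added for reproducibility.
model = RandomForestClassifier(n_estimators=100, max_depth=30, random_state=0)
model.fit(features[:450], occupied[:450])

predicted = model.predict(features[450:])
accuracy = float((predicted == occupied[450:]).mean())
```

Because tree splits depend only on the ordering of values within each feature, the CO₂ values in ppm and the entropy values near 0 to 1 can be combined without rescaling.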

The methodology of data acquisition, dataset processing and training and testing the occupancy detection model is illustrated in Figure 8.

4. Experiments and Results

4.1. Performance Metrics

To evaluate the performance of occupancy detection for each case, we use the following performance metrics for the occupied state: recall, precision, and F1-score.

Recall is a metric that represents the fraction of correctly predicted occupied instances out of all actual occupied instances. Recall is calculated as shown in Equation (1).

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \qquad (1)$$

Precision represents the fraction of correctly predicted occupied instances out of all instances predicted as occupied (both correct and incorrect). Precision is calculated as shown in Equation (2).

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \qquad (2)$$

F1-score is a metric that combines both precision and recall and is calculated as shown in Equation (3). It provides a more balanced assessment of model performance by capturing both the model’s ability to detect occupied instances and the accuracy of those detections.

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (3)$$
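Equations (1)–(3) translate directly into code. The sketch below computes the three metrics for the occupied state from binary labels; the function name and the convention of returning 0.0 when a denominator is zero are illustrative choices, and the labels are made up.

```python
def occupied_state_metrics(y_true, y_pred):
    """Return (precision, recall, F1) for the occupied (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # Equation (1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0     # Equation (2)
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)   # Equation (3)
    return precision, recall, f1


# 3 true positives, 1 false negative, 1 false positive.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted_labels = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = occupied_state_metrics(actual, predicted_labels)
```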

4.2. Different Feature Test

In this test, we compare the performance of CO₂-only occupancy detection, CO₂ combined with average dBFS, and CO₂ combined with the full set of audio features. Because prior studies [23,26,33,34] often rely on a single audio intensity metric, we include the average dBFS case as a separate baseline.

The model performance for each case is summarized in Table 5. For this test, the model was trained using data from all four sensor nodes in Room 1, and the reported performance represents the average across all nodes.

The results show that using only the average dBFS as an audio feature does not significantly improve model performance; the improvement in the F1 score is only 2.3%. When sound-threshold features are included, however, the improvement over the CO₂-only case increases to 7%. We assessed this difference with a non-parametric bootstrap test with 10,000 resamples, which gave a 95% confidence interval for the F1 score difference of [0.031, 0.056], with p < 0.0001. This indicates that, similar to the CO₂ data, proper audio feature selection can substantially affect occupancy detection performance. In our case, the sound-threshold counters performed better than the average dBFS because averaging the sound over a period of time can easily obscure short acoustic events, whereas threshold-based counting preserves information about those events.
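One common way to obtain such an interval is a paired percentile bootstrap over test samples, sketched below. The text does not state the resampling unit or the interval method, so both are assumptions here, as are the function names and the synthetic predictions. The demonstration uses 1000 resamples to keep the run time short; the study reports 10,000.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class; 0.0 if there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def bootstrap_f1_difference(y_true, pred_a, pred_b, n_resamples=10_000,
                            alpha=0.05, seed=0):
    """Percentile CI for F1(pred_b) - F1(pred_a), resampling paired samples."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        truth = [y_true[i] for i in idx]
        a = [pred_a[i] for i in idx]
        b = [pred_b[i] for i in idx]
        diffs.append(f1_score(truth, b) - f1_score(truth, a))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic example: model B repeats only a subset of model A's errors.
truth = [1] * 100 + [0] * 100
model_a = [0] * 15 + [1] * 85 + [1] * 15 + [0] * 85
model_b = [0] * 2 + [1] * 98 + [1] * 3 + [0] * 97
ci_low, ci_high = bootstrap_f1_difference(truth, model_a, model_b, n_resamples=1000)
```

Resampling the same indices for both models keeps the comparison paired, so the interval reflects the difference between the models rather than the variability of each model taken separately.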

It is also worth mentioning that for the second and third thresholds we used the entropy of the threshold-exceedance data rather than the raw counts. For these features, it proved more effective to use an amplitude-independent measure of signal randomness (permutation entropy) than to rely on amplitude information. When we repeated the test using a moving-maximum filter on the second and third thresholds, the performance improvement was only 3.5%. Among all three thresholds, the highest correlation with occupancy was achieved by the third threshold, as it was high enough to avoid capturing outside noise while still being sensitive to many internal events.

Figure 9 shows the graphs of predicted and actual occupancy for each feature set. The graphs were plotted using data from sensor node 111. From the graphs, we can see that the CO₂ and CO₂ + dBFS models failed to detect the last occupancy period on Test Day 1. During the final hours of occupancy on that day, the door was open, which caused the CO₂ level to decrease. On Test Day 3, the CO₂ and CO₂ + dBFS models predicted false occupancy after the occupant had left. This was likely because the models misinterpreted the CO₂ decrease caused by the occupant’s departure as a CO₂ decrease caused by open doors or windows while the occupant was still present. In both situations, the model trained with all audio features performed better. It was less likely to misinterpret occupancy states. These results show that audio features can make CO₂-sensor-based occupancy detection more robust in naturally ventilated spaces.

In general, the tests were conducted in a quiet environment, although occasional loud external noises were captured by the room microphones. Figure 10 shows an example from Test Day 1, illustrating the recorded data for all three sound thresholds. As can be seen, the first threshold captured a significant amount of external noise before occupants were present in Room 1. The higher thresholds (second and third) were more robust, although they still captured a small amount of noise. This highlights the importance of selecting thresholds that are not too low in order to effectively separate indoor acoustic events from external noise.
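The effect of threshold height can be sketched with a simple exceedance counter. In the example below, only the −67 dBFS value of the first threshold comes from the text; the other two threshold values, the window length, and all sample levels are hypothetical, and the arithmetic mean of dBFS values is an assumed stand-in for the average dBFS feature.

```python
def count_exceedances(dbfs_samples, threshold_dbfs):
    """Number of samples in a window that are louder than the threshold."""
    return sum(1 for v in dbfs_samples if v > threshold_dbfs)


def mean_dbfs(dbfs_samples):
    """Plain arithmetic mean of the dBFS values in a window."""
    return sum(dbfs_samples) / len(dbfs_samples)


# First threshold from the text; second and third are hypothetical.
THRESHOLDS_DBFS = (-67.0, -60.0, -50.0)

# Hypothetical windows of 600 samples each.
window_quiet = [-75.0] * 600                       # empty, silent room
window_outside = [-75.0] * 590 + [-65.0] * 10      # faint external noise
window_events = [-75.0] * 598 + [-40.0] * 2        # two short indoor events

for name, window in [("quiet", window_quiet),
                     ("outside noise", window_outside),
                     ("indoor events", window_events)]:
    counts = [count_exceedances(window, t) for t in THRESHOLDS_DBFS]
    print(f"{name}: mean = {mean_dbfs(window):.2f} dBFS, counts = {counts}")
```

In this toy case the two short indoor events shift the window mean by only about 0.1 dB, yet they are counted at every threshold, while the faint external noise is registered only by the lowest threshold.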

Despite observing some external noise, we did not observe false occupancy detections in the results. This is because our model relies on both audio and CO₂ measurements, which compensate for each other in such situations. As discussed previously, audio features can assist during periods of intense ventilation, when CO₂-only approaches may fail, while CO₂ measurements help maintain reliable detection in acoustically noisy environments, where audio-only methods could otherwise lead to false positives.

4.3. Sensor Location Test

In this test, we evaluate the influence of sensor location on occupancy detection performance. We trained separate models using data from each sensor node in Room 1. Table 6 shows the results for each sensor using only CO₂ features, and Table 7 shows the results using both CO₂ and audio features.

The data shows that in both cases the worst-performing sensor node was the node with ID 109, which was located between the two windows, and the best-performing node was the one with ID 114, located near workstation A. When only CO₂ features were used, the performance difference between the best and worst node was approximately 9.8%. When both CO₂ and audio features were used, this difference decreased to 8%. We can conclude that, in our case, the inclusion of audio features reduced the impact of sensor placement, although it did not eliminate it entirely. Based on our observations, both for sound and CO₂ sensing, proximity to occupants resulted in higher detection performance. Similar observations have been reported in other studies as well [61,62,63].
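The percentages above can be approximately reproduced from the rounded F1 scores in Tables 6 and 7. The sketch below assumes that the reported difference is a relative difference with the worse-performing node as the baseline; this convention is inferred from the numbers and is not stated in the text.

```python
def relative_gap_percent(best, worst):
    """Relative difference in percent, using the worse score as baseline."""
    return (best - worst) / worst * 100.0


# Rounded F1 scores per sensor node, taken from Tables 6 and 7.
f1_co2 = {103: 0.85, 109: 0.81, 111: 0.87, 114: 0.89}
f1_co2_audio = {103: 0.88, 109: 0.87, 111: 0.92, 114: 0.94}

gap_co2 = relative_gap_percent(max(f1_co2.values()), min(f1_co2.values()))
gap_audio = relative_gap_percent(max(f1_co2_audio.values()),
                                 min(f1_co2_audio.values()))

# Absolute F1 gain per node when audio features are added.
gain = {node: f1_co2_audio[node] - f1_co2[node] for node in f1_co2}
largest_gain_node = max(gain, key=gain.get)

print(f"CO2 only: {gap_co2:.1f}%  CO2 + audio: {gap_audio:.1f}%")
print(f"Largest gain: node {largest_gain_node}")
```

With the rounded table values this gives roughly 9.9% and 8.0%, close to the reported 9.8% and 8%; the small deviation is presumably due to rounding in the tables.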

Sensor node 114 achieved the best performance because Room 1 was most frequently occupied by the person at workstation A. As a result, the CO₂ sensor at this location often exhibited a faster response, and the microphone was better able to detect quiet activities such as typing or chair movement. Although node 109 was the closest to the person at workstation B, this sensor showed the worst performance. Since the tests were conducted during the colder months, we can assume that the indoor air tended to flow in a direction away from the windows due to the temperature difference between the indoor and outdoor environments. This likely made the location of sensor node 109 inefficient, as the CO₂ was pushed away from it. In general, the CO₂ sensor on node 109 exhibited the slowest response to occupancy among all nodes. Additionally, when the person at workstation B arrived first, sensors 109 and 114 responded almost simultaneously, but when the person at workstation A arrived first, a noticeable delay was observed between the responses of the two sensors.

Although the location of node 109 was inefficient for CO₂ sensing due to unfavorable airflow, combining the CO₂ data with audio features helped minimize the impact of airflow on occupancy detection performance. For node 109, we observed the largest performance increase when audio features were added. Figure 11 shows an example of occupancy prediction graphs for nodes 109 and 114. It can be seen that, for both nodes, the inclusion of audio features reduced occupancy detection delay and decreased false occupancy detections.

4.4. Different Room Test

Occupancy detection models developed using data from a single room usually perform poorly when applied to data from other rooms. In the case of CO₂-based models, this is understandable because differences in room volume and ventilation cause CO₂ dynamics to vary, leading the model to misinterpret unfamiliar data. The purpose of this test is to evaluate whether audio features can make the model more robust across different environments.

Our model was trained only on data from Room 1. For this evaluation, we used data from Room 2, which is almost identical to Room 1, and Room 3, which differs substantially from Room 1. In Rooms 2 and 3, we placed multiple sensor nodes to reduce the influence of sensor location on the results. The reported performance metrics for each room represent the average values across all sensor nodes in that room.
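The room-level averaging described above can be sketched as follows: metrics are computed separately for each sensor node and then averaged over the nodes of the room. The node names, labels, and predictions are hypothetical, and the choice to average per-node F1 scores (rather than recomputing F1 from the averaged precision and recall) is an assumption.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1 (1 = occupied, 0 = unoccupied)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def room_average(y_true, predictions_by_node):
    """Average each metric over all sensor nodes in one room."""
    per_node = [precision_recall_f1(y_true, pred)
                for pred in predictions_by_node.values()]
    return {key: sum(m[key] for m in per_node) / len(per_node)
            for key in ("precision", "recall", "f1")}


# Hypothetical ground truth and per-node predictions for one room.
y_true = [0, 1, 1, 1, 0, 0, 1, 0]
predictions_by_node = {
    "node_a": [0, 1, 1, 0, 0, 0, 1, 0],  # misses one occupied sample
    "node_b": [0, 1, 1, 1, 1, 0, 1, 0],  # one false occupancy detection
}

room_metrics = room_average(y_true, predictions_by_node)
print({k: round(v, 3) for k, v in room_metrics.items()})
```

Averaging across nodes reduces the dependence of the reported room-level score on any single sensor position.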

Table 8 shows the performance results for each room using only CO₂ features, and Table 9 shows the results obtained when both CO₂ and audio features are used.

Using only the CO₂ features, the model performed well on the data from Room 2. The performance drop for Room 2 compared to Room 1 was only 4.9%. However, when audio features were added, the performance drop increased to 7.1%. This shows that the audio features did not improve the model’s generalizability and, to some extent, even made the results worse. For example, the recall metric decreased when audio features were included. One-day examples of predicted occupancy in Room 2 for both cases are shown in Figure 12.

When we analyzed the audio data from Room 2, we found that this room suffered from poor noise isolation. Even when the occupant was not present, some noise from outside was still captured. Consequently, the audio features were unable to reliably distinguish occupancy states in this environment. The poor results can also be attributed to our use of entropy-based audio features: even when the outside noise had a lower amplitude than the noise generated by occupant activities, the entropy values remained high. When we reran the model using amplitude-based audio features instead of entropy, the F1-score for Room 2 increased to 0.92. This was the only case in all of our tests where amplitude-based audio features outperformed entropy-based features. This result is even better than the performance observed for Room 1, because Room 1 contained many short departure events that the model was unable to identify correctly.

For Room 3, the F1-score decreased by 26.7% compared to Room 1 when only CO₂ features were used. However, when both CO₂ and audio features were included, the performance drop decreased to 21.3%. This shows that, for Room 3, the audio features improved the generalizability of the model trained on Room 1. Unlike Room 2, Room 3 did not suffer from outside noise issues, which contributed to the improved results.

Figure 13 shows an example of the occupancy prediction graphs for one test day in Room 3. Although the model achieved an F1-score of only 0.75 for the CO₂ + audio feature set, the graphs indicate that it successfully detected the occupant’s first arrival, lunch break, and final departure. Unlike the previous rooms, the data from Room 3 contains many short departure events that the model failed to identify. In situations where precise detection of such short departures is not required, the results for Room 3 can be considered satisfactory.

Additionally, we observed that background noise levels differed slightly between rooms. For example, all sensors in Rooms 2 and 3 measured lower dBFS values during quiet nighttime periods than those in Room 1. This difference also affected the behavior of the sound thresholds in each room. All thresholds were set in Room 1; however, because acoustic conditions vary between rooms, this resulted in slightly different outcomes. For instance, the first threshold in Room 1, set to −67 dBFS, was intended to detect very quiet acoustic events, whereas the same threshold in Rooms 2 and 3 captured medium-level noise events. In our case, because we used an entropy-based measure rather than absolute threshold counts, this variation did not have a major impact on the results. However, in scenarios where audio features rely on absolute sound levels, such differences should be carefully considered. In such cases, automatic calibration methods should be employed if the model is intended for use across different rooms.
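One possible form of such automatic calibration is to define each threshold as an offset above the room's own noise floor, estimated from quiet nighttime data. The sketch below is only an illustration of this idea, not a method evaluated in this study: the use of the median as the noise-floor estimate, the offsets, and all sample values are hypothetical. The Room 1 values are chosen so that the first threshold lands at the −67 dBFS mentioned above.

```python
import statistics


def estimate_noise_floor(night_dbfs):
    """Noise floor as the median of quiet nighttime dBFS samples.

    The median is used so that a few loud outliers (for example a passing
    vehicle) do not shift the estimate.
    """
    return statistics.median(night_dbfs)


def calibrate_thresholds(night_dbfs, offsets_db):
    """Room-specific thresholds: noise floor plus fixed offsets in dB."""
    floor = estimate_noise_floor(night_dbfs)
    return [floor + offset for offset in offsets_db]


# Hypothetical offsets above the noise floor for the three thresholds.
OFFSETS_DB = (8.0, 15.0, 25.0)

# Hypothetical nighttime samples; the second room is quieter at night.
room1_night = [-75.5, -75.0, -74.5, -75.0, -60.0]
room2_night = [-80.0, -79.5, -80.5, -80.0, -80.0]

print(calibrate_thresholds(room1_night, OFFSETS_DB))
print(calibrate_thresholds(room2_night, OFFSETS_DB))
```

With relative thresholds of this kind, the lowest threshold would keep targeting very quiet events in every room instead of drifting toward medium-level events in rooms with a lower background level.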

5. Conclusions and Discussions

This study evaluated whether audio features can enhance CO₂-based occupancy detection in naturally ventilated office environments. Separately for the CO₂-only and the CO₂ + audio feature sets, we evaluated model performance, the importance of sensor location, and the ability of the model to generalize to other rooms.

We found that the use of appropriate audio features is crucial for achieving good occupancy detection performance. In our case, simple averaged audio intensity did not perform as well as audio event counts. We also observed differences depending on whether the model received amplitude-based audio features or entropy-based features. Future work should further investigate audio feature selection and audio-processing techniques.

For both CO₂ and audio sensing, proximity to the occupants is important for achieving high performance, but CO₂ measurements are also strongly influenced by airflow patterns in the room. In cases where a CO₂ sensor must operate under unfavorable airflow conditions, audio features can help maintain satisfactory occupancy detection performance.

In the tests using data from different rooms, we obtained generally satisfactory results. The audio features showed the ability to create more generalizable models, but it was also important that each room provided suitable acoustic conditions in order to achieve good performance.

As our study was conducted in late autumn, the results cannot be directly generalized to other seasons. Future work is required to investigate the performance of CO₂- and audio-based occupancy detection across different seasons, particularly during summer, when natural ventilation is more intense.

Overall, the study confirms that combining CO₂ sensing with audio features is a promising approach for improving occupancy detection in naturally ventilated buildings. The main advantage of audio is that it can be processed in many different ways. The approach used in this work is only one of many possible methods that could be applied to the occupancy detection problem.

Author Contributions

Conceptualization, M.S. and J.J.; methodology, M.S.; software, M.S., D.T. and R.K.; validation, M.S. and R.K.; formal analysis, J.J.; investigation, M.S.; resources, D.T. and R.K.; data curation, R.K.; writing—original draft preparation, M.S. and R.K.; writing—review and editing, J.J.; visualization, M.S. and D.T.; supervision, J.J.; project administration, J.J.; funding acquisition, J.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101092161 project “One Step Open DBL solution (openDBL)”.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ritchie, H.; Rosado, P.; Roser, M. Energy Production and Consumption. Our World Data. 2020. Available online: https://ourworldindata.org/energy-production-consumption (accessed on 22 September 2025).
2. United Nations Environment Programme. Not Just Another Brick in the Wall: The Solutions Exist—Scaling Them Will Build on Progress and Cut Emissions Fast. Global Status Report for Buildings and Construction 2024/2025. Available online: https://wedocs.unep.org/handle/20.500.11822/47214 (accessed on 2 October 2025).
3. European Environment Agency. Addressing the Environmental and Climate Footprint of Buildings. 2024. Available online: https://build-up.ec.europa.eu/system/files/2024-10/TH-01-24-001-EN-N_Addressing-env-impact-buildings_FINAL.pdf (accessed on 22 September 2025).
4. Lavrinovica, I.; Judvaitis, J.; Laksis, D.; Skromule, M.; Ozols, K. A Comprehensive Review of Sensor-Based Smart Building Monitoring and Data Gathering Techniques. Appl. Sci. 2024, 14, 10057.
5. Khan, I.; Zedadra, O.; Guerrieri, A.; Spezzano, G. Occupancy prediction in iot-enabled smart buildings: Technologies, methods, and future directions. Sensors 2024, 24, 3276.
6. Directive (EU) 2024/1275 of the European Parliament and of the Council of 24 April 2024 on the Energy Performance of Buildings (Recast). Official Journal of the European Union, L Series, 8 May 2024. Available online: https://eur-lex.europa.eu/eli/dir/2024/1275/oj/eng (accessed on 22 September 2025).
7. Rueda, L.; Agbossou, K.; Cardenas, A.; Henao, N.; Kelouwani, S. A comprehensive review of approaches to building occupancy detection. Build. Environ. 2020, 180, 106966.
8. Chaudhari, P.; Xiao, Y.; Cheng, M.M.C.; Li, T. Fundamentals, Algorithms, and Technologies of Occupancy Detection for Smart Buildings Using IoT Sensors. Sensors 2024, 24, 2123.
9. Chen, Z.; Jiang, C.; Xie, L. Building occupancy estimation and detection: A review. Energy Build. 2018, 169, 260–270.
10. Gursel Dino, I.; Kalfaoglu, E.; Işeri, O.; Erdogan, B.; Kalkan, S.; Alatan, A. Vision-based estimation of the number of occupants using video cameras. Adv. Eng. Inform. 2022, 53, 101662.
11. Wei, S.; Tien, P.; Zhang, W.; Wei, Z.; Wang, Z.; Calautit, J.K. DeepVision based detection for energy-efficiency and indoor air quality enhancement in highly polluted spaces. J. Build. Eng. 2024, 84, 108530.
12. Sun, K.; Liu, P.; Xing, T.; Zhao, Q.; Wang, X. A fusion framework for vision-based indoor occupancy estimation. Build. Environ. 2022, 225, 109631.
13. Choi, H.; Um, C.Y.; Kang, K.; Kim, H.; Kim, T. Application of vision-based occupancy counting method using deep learning and performance analysis. Energy Build. 2021, 252, 111389.
14. Chitnis, S.; Somu, N.; Kowli, A. Occupancy estimation with environmental sensors: The possibilities and limitations. Energy Built Environ. 2025, 6, 96–108.
15. Alberts, W. Indoor air pollution: NO, NO₂, CO, and CO₂. J. Allergy Clin. Immunol. 1994, 94, 289–295.
16. Kampezidou, S.I.; Ray, A.T.; Duncan, S.; Balchanos, M.G.; Mavris, D.N. Real-time occupancy detection with physics-informed pattern-recognition machines based on limited CO₂ and temperature sensors. Energy Build. 2021, 242, 110863.
17. Bovo, M.; Agyeman, R.; Arif, M.; Rinner, B. Evaluation of Occupancy Detection with Distributed Environmental Sensors for IoT Applications. In Proceedings of the 2024 20th International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT), Abu Dhabi, United Arab Emirates, 29 April–1 May 2024; pp. 416–423.
18. Jiang, C.; Masood, M.K.; Soh, Y.C.; Li, H. Indoor occupancy estimation from carbon dioxide concentration. Energy Build. 2016, 131, 132–141.
19. Zuraimi, M.; Pantazaras, A.; Chaturvedi, K.; Yang, J.; Tham, K.; Lee, S. Predicting occupancy counts using physical and statistical CO₂-based modeling methodologies. Build. Environ. 2017, 123, 517–528.
20. Arief-Ang, I.; Salim, F.; Hamilton, M. CD-HOC: Indoor Human Occupancy Counting using Carbon Dioxide Sensor Data. arXiv 2017, arXiv:1706.05286.
21. Kańtoch, E.; Augustyniak, P. Occupancy Estimation in Academic Laboratory: A CO₂-Based Algorithm Incorporating Temporal Features for 1–16 Occupants. Electronics 2025, 14, 1377.
22. Calì, D.; Matthes, P.; Huchtemann, K.; Streblow, R.; Müller, D. CO₂ based occupancy detection algorithm: Experimental analysis and validation for office and residential buildings. Build. Environ. 2015, 86, 39–49.
23. Zhang, X.; Zhou, T.; Kokogiannakis, G.; Xia, L.; Wang, C. Estimating the number of occupants and activity intensity in large spaces with environmental sensors. Build. Environ. 2023, 243, 110714.
24. Franco, A.; Leccese, F. Measurement of CO₂ concentration for occupancy estimation in educational buildings with energy efficiency purposes. J. Build. Eng. 2020, 32, 101714.
25. Wolf, S.; Calì, D.; Krogstie, J.; Madsen, H. Carbon dioxide-based occupancy estimation using stochastic differential equations. Appl. Energy 2019, 236, 32–41.
26. An, D.; Winterberger, S.; Biallas, M.; Paice, A. Occupancy Detection with Environmental Sensors Using Motion Sensors as Proxy Labels. In Proceedings of the 2025 17th International Conference on Human System Interaction (HSI), Ulsan, Republic of Korea, 16–19 July 2025; pp. 1–6.
27. Yang, Z.; Li, N.; Becerik-Gerber, B.; Orosz, M. A Multi-Sensor Based Occupancy Estimation Model for Supporting Demand Driven HVAC Operations. In Proceedings of the 2012 Spring Simulation Multiconference, Orlando, FL, USA, 26–30 March 2012; Volume 44.
28. Gruber, M.; Trüschel, A.; Dalenbäck, J.O. CO₂ sensors for occupancy estimations: Potential in building automation applications. Energy Build. 2014, 84, 548–556.
29. Pedersen, T.H.; Nielsen, K.U.; Petersen, S. Method for room occupancy detection based on trajectory of indoor climate sensor data. Build. Environ. 2017, 115, 147–156.
30. Han, H.; Jang, K.J.; Han, C.; Lee, J. Occupancy estimation based on CO₂ concentration using dynamic neural network model. In Proceedings of the 34th AIVC-3rd TightVent-2nd Cool Roofs’-1st Venticool Conference, Athens, Greece, 25–26 September 2013.
31. Huang, Q.; Syndicus, M.; Frisch, J.; van Treeck, C. Spatial features of CO₂ for occupancy detection in a naturally ventilated school building. Indoor Environ. 2024, 1, 100018.
32. Han, H. Estimation of Occupancy in a Naturally Ventilated Room using Bayesian Method Based on CO₂ Concentration. Int. J. Mech. Syst. Eng. 2017, 3, 123.
33. Singh, A.P.; Jain, V.; Chaudhari, S.; Kraemer, F.A.; Werner, S.; Garg, V. Machine Learning-Based Occupancy Estimation Using Multivariate Sensor Nodes. In Proceedings of the 2018 IEEE Globecom Workshops (GC Wkshps), Abu Dhabi, United Arab Emirates, 9–13 December 2018; pp. 1–6.
34. Amayri, M.; Arora, A.; Ploix, S.; Bandhyopadyay, S.; Ngo, Q.D.; Badarla, V.R. Estimating occupancy in heterogeneous sensor environment. Energy Build. 2016, 129, 46–58.
35. Zikos, S.; Tsolakis, A.; Meskos, D.; Tryferidis, A.; Tzovaras, D. Conditional Random Fields-based approach for real-time building occupancy estimation with multi-sensory networks. Autom. Constr. 2016, 68, 128–145.
36. Candanedo, L.M.; Feldheim, V. Accurate occupancy detection of an office room from light, temperature, humidity and CO₂ measurements using statistical learning models. Energy Build. 2016, 112, 28–39.
37. Jiang, C.; Chen, Z.; Png, L.C.; Bekiroglu, K.; Srinivasan, S.; Su, R. Building Occupancy Detection from Carbon-dioxide and Motion Sensors. In Proceedings of the 2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV), Singapore, 18–21 November 2018; pp. 931–936.
38. Gage, S. Is Anyone in the Room? ARENA J. Archit. Res. 2019, 4, 2.
39. Blazevic, M.; Arz von Straussenburg, A.F.; Riehle, D.M. Sensing the Unseen–Using CO₂ as a Key Indicator for Occupancy Detection in Smart Collaboration Spaces. Procedia Comput. Sci. 2024, 239, 1312–1319.
40. Gunay, B.; Fuller, A.; Beausoleil-Morrison, I. Detecting occupants’ presence in office spaces: A case study. eSim 2016, 9, 185–195.
41. Cheng, C.C.; Lee, D. Enabling Smart Air Conditioning by Sensor Development: A Review. Sensors 2016, 16, 2028.
42. Kim, S.; Moon, H.; Yoon, Y. Improved Occupancy Detection Accuracy using PIR and Door Sensors for a Smart Thermostat. Build. Simul. 2017, 15, 2753–2758.
43. Occupancy Sensors (Motion Detectors)|PIR, Ultrasonic, Microwave Sensors. Available online: https://www.shine.lighting/products/occupancy-sensor/ (accessed on 21 November 2025).
44. CO₂ Sensor vs. Motion Detector. 2024. Available online: https://senseair.com/co%E2%82%82-sensor-vs-motion-detector/ (accessed on 21 November 2025).
45. Santiago, G.; Jiménez, M.; Aguilar, J.; Montoya, E. Audio Feature Engineering for Occupancy and Activity Estimation in Smart Buildings. Electronics 2021, 10, 2599.
46. Al Hossain, F.; Tonmoy, M.; Lover, A.; Corey, G.; Alam, M.A.U.; Rahman, T. Crowdotic: A Privacy-Preserving Hospital Waiting Room Crowd Density Estimation with Non-speech Audio. In Proceedings of the 25th International Workshop on Mobile Computing Systems and Applications, San Diego, CA, USA, 28–29 February 2024; pp. 79–85.
47. Valle, R. ABROA: Audio-based room-occupancy analysis using Gaussian mixtures and Hidden Markov models. In Proceedings of the 2016 Future Technologies Conference (FTC), San Francisco, CA, USA, 6–7 December 2016; pp. 1270–1273.
48. Chen, S.; Epps, J.; Ambikairajah, E.; Le, P.N. An Investigation of Crowd Speech for Room Occupancy Estimation. In Proceedings of the Interspeech 2017, Stockholm, Sweden, 20–24 August 2017; pp. 324–328.
49. Ghaffarzadegan, S.; Reiss, A.; Ruhs, M.; Duerichen, R.; Feng, Z. Occupancy Detection in Commercial and Residential Environments Using Audio Signal. In Proceedings of the Interspeech 2017, Stockholm, Sweden, 20–24 August 2017; pp. 3802–3806.
50. Huang, Q.; Ge, Z.; Lu, C. Occupancy Estimation in Smart Buildings using Audio-Processing Techniques. arXiv 2016, arXiv:1602.08507.
51. Judvaitis, J.; Salmins, A.; Nesenbergs, K. Network data traffic management inside a TestBed. In Proceedings of the 2016 Advances in Wireless and Optical Communications (RTUWO), Riga, Latvia, 3–4 November 2016; pp. 152–155.
52. Salmins, A.; Judvaitis, J.; Balass, R.; Nesenbergs, K. Mobile wireless sensor network TestBed. In Proceedings of the 2017 25th Telecommunication Forum (TELFOR), Belgrade, Serbia, 21–22 November 2017; pp. 1–4.
53. Lapsa, D.; Balass, R.; Judvaitis, J.; Nesenbergs, K. Measurement of current consumption in a wireless sensor network testbed. In Proceedings of the 2017 25th Telecommunication Forum (TELFOR), Belgrade, Serbia, 21–22 November 2017; pp. 1–4.
54. Judvaitis, J.; Nesenbergs, K.; Balass, R.; Greitans, M. Challenges of DevOps ready IoT Testbed. In Proceedings of the MDE4IoT/ModComp@ MoDELS, 2019, Munich, Germany, 15–17 September 2019; pp. 3–6.
55. Judvaitis, J.; Balass, R.; Greitans, M. Mobile iot-edge-cloud continuum based and devops enabled software framework. J. Sens. Actuator Netw. 2021, 10, 62.
56. Balass, R.; Medvedevs, V.; Mackus, A.I.; Ormanis, J.; Ancans, A.; Judvaitis, J. Precise realtime current consumption measurement in IoT TestBed. Open Res. Eur. 2024, 3, 27.
57. McCarney, R.; Warner, J.; Iliffe, S.; Van Haselen, R.; Griffin, M.; Fisher, P. The Hawthorne Effect: A randomised, controlled trial. BMC Med. Res. Methodol. 2007, 7, 30.
58. Mao, S.; Yuan, Y.; Li, Y.; Wang, Z.; Yao, Y.; Kang, Y. Room Occupancy Prediction: Exploring the Power of Machine Learning and Temporal Insights. Am. J. Appl. Math. Stat. 2024, 12, 1–9.
59. Koklu, M.; Tutuncu, K. Tree based classification methods for occupancy detection. IOP Conf. Ser. Mater. Sci. Eng. 2019, 675, 012032.
60. Parzinger, M.; Hanfstaengl, L.; Sigg, F.; Spindler, U.; Wellisch, U.; Wirnsberger, M. Comparison of different training data sets from simulation and experimental measurement with artificial users for occupancy detection—Using machine learning methods Random Forest and LASSO. Build. Environ. 2022, 223, 109313.
61. Varnosfaderani, M.P.; Heydarian, A.; Jazizadeh, F. Using Statistical Models to Detect Occupancy in Buildings through Monitoring VOC, CO₂, and other Environmental Factors. arXiv 2022, arXiv:2203.04750.
62. Khalil, H.; Wainer, G.; Dunnigan, Z. Cell-DEVS Models for CO₂ Sensors Locations in Closed Spaces. In Proceedings of the 2020 Winter Simulation Conference (WSC), Orlando, FL, USA, 14–18 December 2020; pp. 692–703.
63. Jin, M.; Bekiaris-Liberis, N.; Weekly, K.; Spanos, C.J.; Bayen, A.M. Sensing by Proxy: Occupancy Detection Based on Indoor CO₂ Concentration. In Proceedings of the Ninth International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, Nice, France, 19–24 July 2015.

Figure 1. Custom-designed sensor board including CO₂ sensor and MEMS microphone.

Figure 2. Sketch of the test rooms (red dots indicate the location of the sensor nodes).

Figure 3. Graphical representation of all CO₂ features. Data label explanation: CO2—raw CO₂, CO2_sma—filtered CO₂, CO2_dx1—first CO₂ derivative, CO2_dx1_sma—filtered first CO₂ derivative, CO2_dx2—second CO₂ derivative.

Figure 4. Average dBFS sound data processing. Blue—raw sound data, orange—calculated moving maximum of sound data.

Figure 5. First threshold sound data processing. Blue—raw sound data, orange—calculated moving maximum of sound data.

Figure 6. Second threshold sound data processing. Blue—raw sound data, orange—calculated entropy of sound data.

Figure 7. Third threshold sound data processing. Blue—raw sound data, orange—calculated entropy of sound data.

Figure 8. Model development methodology.

Figure 9. Occupancy prediction graphs from sensor node 111: (a) Test day 1, only CO₂ features. (b) Test day 2, only CO₂ features. (c) Test day 3, only CO₂ features. (d) Test day 1, CO₂+dBFS features. (e) Test day 2, CO₂+dBFS features. (f) Test day 3, CO₂+dBFS features. (g) Test day 1, CO₂+all audio features. (h) Test day 2, CO₂+all audio features. (i) Test day 3, CO₂+all audio features.

Figure 10. Example of recorded noise for Test Day 1: (a) Occupancy graph. (b) Data from the first sound threshold. (c) Data from the second sound threshold. (d) Data from the third sound threshold.

Figure 11. Occupancy prediction graphs: (a) Node 109, only CO₂ features. (b) Node 114, only CO₂ features. (c) Node 109, CO₂ and audio features. (d) Node 114, CO₂ and audio features.

Figure 12. Occupancy prediction graphs for Room 2: (a) CO₂ features only. (b) CO₂ and audio features.

Figure 13. Occupancy prediction graphs for Room 3: (a) CO₂ features only. (b) CO₂ and audio features.

Table 1. SCD41 CO₂ sensor specification.

Output Range	Measurement Accuracy (Range 400–1000 ppm)	Measurement Accuracy (Range 1001–2000 ppm)	Measurement Accuracy (Range 2001–5000 ppm)
0–40,000 ppm	±(50 ppm + 2.5% of Reading)	±(50 ppm + 3% of Reading)	±(40 ppm + 5% of Reading)

Table 2. ICS-43434 microphone specification.

Frequency Range	Sensitivity	Signal-to-Noise Ratio	Resolution
60 Hz–20 kHz	−26 dBFS	65 dBA	24 bit

Table 3. Test room parameters.

Room Name	Area	Volume	Windows	Doors
Room 1	33.6 m²	91.06 m³	2	1
Room 2	32.34 m²	86.99 m³	2	1
Room 3	9.83 m²	26.84 m³	1	1

Table 4. Description of collected dataset for rooms.

Room Name	Occupied Time	Unoccupied Time	Doors Open	Windows Open
Room 1	30.1%	69.9%	7.9 h	14.8 h
Room 2	38.9%	61.1%	0.1 h	0.3 h
Room 3	18.9%	81.1%	7.3 h	0 h

Table 5. Model performance for different feature sets.

Features	Recall	Precision	F1
Only CO₂ features	0.82	0.88	0.85
CO₂ features and dBFS	0.84	0.90	0.87
All CO₂ and sound features	0.91	0.90	0.91

Table 6. Model performance for each sensor node from Room 1 using only CO₂ features.

Sensor ID	Recall	Precision	F1
103	0.81	0.90	0.85
109	0.75	0.88	0.81
111	0.84	0.90	0.87
114	0.89	0.88	0.89

Table 7. Model performance for each sensor node from Room 1 using CO₂ and audio features.

Sensor ID	Recall	Precision	F1
103	0.87	0.90	0.88
109	0.84	0.91	0.87
111	0.91	0.94	0.92
114	0.94	0.93	0.94

Table 8. Model performance for data from different rooms using only CO₂ features.

Room	Recall	Precision	F1
Room 1	0.82	0.88	0.85
Room 2	0.80	0.81	0.81
Room 3	0.97	0.52	0.67

Table 9. Model performance for data from different rooms using CO₂ and audio features.

Room	Recall	Precision	F1
Room 1	0.91	0.90	0.91
Room 2	0.77	0.89	0.83
Room 3	0.99	0.60	0.75
