Measuring Indoor Occupancy through Environmental Sensors: A Systematic Review on Sensor Deployment

The COVID-19 pandemic has changed our common habits and lifestyle. Occupancy information has become more valuable because of the restrictions put in place to reduce the spread of the virus. Over the years, several authors have developed methods and algorithms to detect or estimate occupancy in enclosed spaces, and different types of sensors have been installed in these spaces to enable such measurements. However, new researchers and practitioners often find it difficult to estimate the number of sensors needed to collect the data, the time needed to sense, and the technical details of sensor deployment. Therefore, this systematic review provides an overview of the types of environmental sensors used to detect or estimate occupancy, the places selected to carry out experiments, details about the placement of the sensors, the characteristics of the datasets, and the models and algorithms developed. Furthermore, with the information extracted from the selected studies, a technique to calculate the number of environmental sensors to be deployed is proposed.


Introduction
Occupancy information refers to the presence of occupants in a building, their movement, and their behavior. The occupancy information can be used to optimize a building's energy consumption and reduce energy waste [1]. Furthermore, in a world where social distancing and space occupancy limitation policies have been enforced due to the COVID-19 pandemic, monitoring systems become tools not only to improve the management of spaces but also to save human lives [2].
To acquire occupancy-related information, there are different sensing approaches. For instance, intrusive sensors, such as cameras combined with pattern recognition, are used to count people; however, personal privacy is a concern during implementation. In contrast, non-intrusive sensor types, such as passive infrared (PIR), ultrasonic, and acoustic sensors, can only determine whether a room is occupied, not the actual number of occupants [3].
Environmental sensors are frequently used in occupancy modeling because of their non-intrusive nature, their flexibility in sensor selection and combination, and their ability to provide continuous data streams for real-time occupancy modeling [4]. Most environmental sensors can measure CO₂ concentration, temperature, relative humidity (RH), airspeed, particulate matter (PM), and volatile organic compounds (VOC) [5]. Figure 1 presents the commonly used sensors for CO₂, temperature, RH, and barometric pressure. The available CO₂ sensors include the MQ135 [6], CL11 (which also measures temperature and RH) [7][8][9], SenseAir S8 [10], and HOBO MX1102 [Zhou2020], among others. To measure temperature, RH, and barometric pressure, some commercial sensors are the SENSIRION STS31 [11], BME280 [2,12], and MHB-382SD [8].
Occupancy modeling approaches are divided into categories based on their level of accuracy. These approaches include binary detection of the occupant's presence (occupancy detection) and counting the number of occupants (occupancy estimation) [4]. However, some authors have also developed models to estimate occupancy levels as well as the social interaction [13] and the status of a person (to determine whether the person is alive or not) [14].
Regarding practical implementations of occupancy detection and prediction, researchers have proposed various models based on common statistical models. These include the Hidden Markov Model (HMM) and its variations [15][16][17], models based on Bayes' theorem [7,18], and supervised Machine Learning models such as Support Vector Machine (SVM) [19,20], Random Forest (RF) [21,22], and the popular Artificial Neural Networks (ANN) and their variants [23,24]. Furthermore, some researchers have proposed combining multiple environmental parameters in the models to obtain higher precision and accuracy [25][26][27].
Existing review articles have provided a comprehensive overview of current solutions for occupancy estimation and detection using different categories of sensors [28][29][30][31]. All of them discuss and compare the characteristics of the sensors, including their advantages and disadvantages. Other reviews address the modeling techniques and evaluation for occupancy inference [32][33][34]. Furthermore, some authors have carried out an extensive review incorporating the type of sensors, prediction models, and potential uses of low-cost sensors in buildings [35][36][37]. Nevertheless, none of them addresses the installation aspects or the number of sensors that need to be placed in a specific enclosed space. As a consequence, deployments are often guided by intuition, which increases time and cost.
Hence, the purpose of this systematic review is to identify articles that present indoor occupancy approaches using at least one environmental sensor. That is, the aim is to find out indoor occupancy models, the number of sensors installed in occupancy environments, a description of the enclosed spaces, and the details about sensor deployment. The main interest of carrying out this research lies in reviewing how sensors are installed in testbed scenarios and the least number of sensors required to generate adequate data for analysis. This information can be beneficial for future research works that are focused on indoor occupancy.
The content of this work is organized as follows: Section 2 presents the methodology used to perform this systematic review. Section 3 provides the results obtained from this review, and Section 4 gives the discussion of the results. Finally, Section 5 presents the conclusions of this study.

Materials and Methods
This systematic review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist. The PRISMA statement was designed to help systematic reviewers transparently report why the review was done, what the authors did, and what they found. The PRISMA statement comprises a checklist of 27 items recommended for reporting in systematic reviews and an "explanation and elaboration" paper that provides additional reporting guidance for each item along with sample reports [38].
Furthermore, a second methodology developed by Kofod-Petersen [39] was considered in this study. The Kofod-Petersen method helps researchers conduct a structured literature review within Computer Science.
The review process has been broken down into several steps. First of all, relevant questions have been identified, and a specific strategy has been followed to answer them. This strategy is described along with the specific search strings and keywords used. Next, the inclusion and exclusion criteria for the selection of relevant literature are defined. Subsequently, data extraction and synthesis are carried out based on the already conducted search. Finally, the risk of bias and limitations of this systematic review process are discussed. The aforementioned steps for this systematic review are described in detail in the following subsections.

Research Questions
The purpose of this systematic review is to establish a relationship between the number of environmental sensors to collect data and the enclosed space in which occupancy levels will be estimated. The objectives for establishing this relationship are:

Search Process
For this systematic review, the well-known scientific database SCOPUS [40] is used to find relevant literature. The search process was initiated on 5 July 2021 and concluded on 10 September 2021. The search results were saved in SCOPUS and the selected publications were downloaded and imported to JabRef reference manager.
The main search keywords used were "Occupancy estimation", "Occupancy detection", "Occupancy Levels", "Occupancy" as well as "environmental variables" and "environmental sensors".

Inclusion and Exclusion Criteria
The inclusion and exclusion criteria used for screening and selecting relevant literature from the search results are defined in Table 1.

Table 1. Inclusion and exclusion criteria.

Inclusion criteria
IC1: Publications whose titles contain the word "occupancy" and consider at least one environmental variable (e.g., CO₂, temperature, RH, etc.).
IC2: Articles that contain keywords that match the defined keywords.
IC3: The abstracts include search keywords or have a detectable relationship with the selected theme.
IC4: Articles that include at least one environmental sensor in their experiments.
IC5: The publication is available in full text in an open manner or through any of Tecnologico de Monterrey's subscriptions.

Exclusion criteria
EC1: No match to the inclusion criteria.
EC2: Duplicate publication.
EC3: Research that involves datasets from other authors.
EC4: Theses, books, and preprint studies.

Study Selection
For the selection of articles, the criteria involved reviewing the document title and the abstract and skimming the article. Furthermore, the inclusion and exclusion criteria were also applied. A selection process based on the PRISMA flowchart [38], presented in Figure 2, was also used. The selected keywords provided around 3807 publications. Among these articles, 1063 mainly focus on occupancy prediction, estimation, and detection. Finally, 93 studies were selected, as only these studies fulfilled the search and inclusion criteria (Table 2).
Table 2. Summary of selected studies.

Data Extraction and Synthesis
The general information for the registration of articles should include information regarding the type of sensor and quantity, dimension of the enclosed space, design use of the room, the data collection period, and the model or method used to estimate occupancy levels as well as the accuracy obtained.
The data extracted from the selected literature were tabulated in an Excel spreadsheet structured around the fields listed above.
After the data extraction step, the extracted data are analyzed to answer the research questions. For RQ1 and RQ2, the types of sensors used to estimate/detect indoor occupancy are listed, and the respective numbers of sensors installed in each enclosed space are analyzed. For RQ3, an analysis of the sensor deployment is reported, whereas for RQ4, the studies that utilized data fusion methods are listed. Finally, for RQ5, Machine Learning methods are analyzed based on their trend over time as well as the characteristics of the dataset used.

Risk of Bias
The risk of bias started during the initial query search in the database as the search produced only the literature that was published between 2009 and 2021. Moreover, the possible subjectivity of the inclusion and exclusion criteria defined by the authors can also increase the bias in the selection process. Furthermore, there could be publications that have been missed during the search process as the search was only performed through the SCOPUS database.
In addition to the aforementioned biases, this systematic review has focused on publications involving at least one environmental sensor to estimate/detect indoor occupancy.

Results
This section presents the findings from the extracted data based on the questions provided in Section 2. In the "Study Characteristics" section, a brief description of the selected publications is presented. Next, RQ1 and RQ2 are answered in the respective sections "Type of Sensor and Indoor Place Characteristics" and "Place Dimension vs. Total Number of Sensors Deployed." This is followed by the "Sensor Deployment Specifications" section, where the proposed locations and the height of the sensors are detailed (RQ3). The "Data Fusion" section addresses the methods proposed by different authors and answers RQ4. Finally, the "Algorithms and Datasets" section discusses the approaches for occupancy estimation/detection, answering RQ5.

Study Characteristics
Interest in indoor occupancy detection and estimation using environmental sensors has been growing over the years and has led to an increase in the annual output of articles in the related domain from 2009 to 2021 (Figure 3). Lam et al. [41], pioneers in this domain, developed algorithms to calculate the number of occupants based on the analysis of the environmental data obtained. Later, in 2012, research interest in this domain rose sharply. From 2012 to 2017, the number of scientific publications on occupancy estimation increased significantly. The year with the most publications is 2017 (17 papers), while in the following years, the publication trend has slowly decreased.
Moreover, there are about 160 authors involved in this research area. Of these, 15 authors (9.35%) have published at least three articles included in this systematic review. In total, 100 authors (62.5%) have only one publication, while 42 authors (26.25%) have two publications, indicating that a limited group of researchers (three publications, representing 1.87%) has focused on this domain. The top 15 authors who have published at least three papers in the domain of indoor occupancy estimation are shown in Figure 6. The top two researchers, M.K. Masood and Y.C. Soh, collaborate closely.
The institution with the highest number of publications related to indoor occupancy estimation is Nanyang Technological University (nine publications, representing 9.67%) in Singapore, followed by Institut Polytechnique de Grenoble (four publications, representing 4.30%) in France, University of Southern California (four publications, representing 4.30%) in the USA, and Sciences pour la Conception, l'Optimisation et la Production de Grenoble G-SCOP (four publications, representing 4.30%) in France. Nevertheless, more than 130 institutions have conducted research in this field; of those 130, 99 institutions have only one publication, 25 institutions have two, and 6 institutions have three publications. Figure 7 presents the top 10 institutions that have at least three publications.
When reviewing keywords from the literature, the keywords with a minimum co-occurrence of five are presented in a network map (Figure 8), which was constructed using the VOSviewer software, version 1.6.17 [42]. The size of the nodes and the words in Figure 8 represents their respective weight: the bigger the node and the word, the greater their weight. The distance between two nodes reflects the strength of their relationship; that is, a shorter distance reveals a stronger relationship. A line joining two keywords indicates that they have appeared together, and the thicker the line, the more often they co-occur. Nodes with the same color belong to the same cluster [43].
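The pair counts that underlie a keyword co-occurrence map can be illustrated with a minimal Python sketch. It is not the VOSviewer algorithm: the function name `cooccurrence_pairs`, the way the threshold is applied to pair counts, and the sample keyword lists are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_pairs(keyword_lists, min_count=5):
    """Count how often two keywords appear in the same publication.

    keyword_lists: one list of keywords per publication (hypothetical input).
    min_count: assumed threshold; pairs seen fewer times are dropped.
    """
    pair_counts = Counter()
    for keywords in keyword_lists:
        # Normalise case and drop duplicates so a pair counts once per paper.
        unique = sorted({k.strip().lower() for k in keywords})
        pair_counts.update(combinations(unique, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


# Made-up keyword lists, for illustration only.
papers = [
    ["carbon dioxide", "occupancy detection"],
    ["Carbon dioxide", "occupancy detection", "energy efficiency"],
    ["carbon dioxide", "energy efficiency"],
]
links = cooccurrence_pairs(papers, min_count=2)
```

In a network map, each surviving pair would become an edge whose thickness grows with its count.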
The keyword "carbon dioxide" has the highest frequency of 43. Other keywords with a high frequency include "occupancy detection" (24), "learning system" (21), "machine learning" (21), and "energy efficiency" (20). On the other hand, keywords such as "wireless sensor network", "social interaction", "information theory", and "environmental sensor networks" have the lowest frequency of one.
The network map shows that the keyword "carbon dioxide" has a relationship with the keywords "energy efficiency", "occupancy detection", "occupancy detections", "office buildings", and "building occupancy".
Finally, Figure 9 shows the research trends in indoor occupancy resolution, presenting how the desired precision has changed and evolved over time. Estimating the number of occupants in a place is the most common resolution; 46 publications (49.46%) focused on it. The second most common resolution is the binary detection of a person, which has been addressed in 35 publications (37.63%), followed by indoor occupancy levels, which have been studied in 29 publications (31.18%). It is important to point out that of the 93 publications, 18 focused on more than one resolution.

Type of Sensor and Indoor Place Characteristics
Indoor occupancy is one of the important sources of information for designing smart buildings. However, challenges such as user privacy, communication limits, and a sensor's computational capability make it difficult to develop occupancy monitoring systems [44]. Figure 10 shows the types of sensors that have been put into use by year, from 2009 to 2021. The type of sensor that has been used most often over the years is the CO₂ sensor. Similarly, sensors that can measure temperature and RH are also widely used for enclosed spaces. In 2017, 15 publications (16.12%) used such sensors to collect environmental data. As secondary sensors, the PIR and light/luminescence sensors have been discussed in 27 publications (29.03%) over the years, followed by the acoustic sensor, discussed in 21 publications (representing 22.5%). While the interest in the PIR sensor increased in the years 2017 and 2019 (five publications, representing 5.37%), in 2019, the interest in the light/luminescence sensor also increased (five publications, representing 5.37%). Previously, the acoustic sensor was the most discussed in 2012, by four publications (4.30%).
Regarding the number of publications reporting on a type of sensor, 84 publications (90.32%) documented the use of the CO₂ sensor with other sensors, while 13 publications (13.97%) described the use of the CO₂ sensor only. For instance, Zuraimi et al. [45] installed four CO₂ sensors within a lecture theater (876 m²), obtaining a root-mean-square error (RMSE) between 19.6% and 27.4%. Other authors who had installed only one CO₂ sensor in places with areas between 12 and 40 m² obtained results with an accuracy value between 69.96% and 99.52% [46,47], between 88% and 94% [10], 86% [48], and an RMSE value of 77% [49]. In places with areas between 89 and 186 m², the results had an accuracy value between 75.5% and 96.5% [50], 85.57% [7], 94% [51], and an RMSE value of 60.44% [52].
On the other hand, 32 publications (34.40%) have only used environmental sensors. Of these 32 publications, only four publications (4.30%) have used temperature and RH sensors as the main sensors. Their results have accuracies between 83.33% and 87.03% [53], and between 95.2% and 97% [12]. Viani et al. [20,54] installed between 23 and 28 sensors in a place with an area of 1196 m² to estimate occupancy levels. Their results show that the detection phase was able to correctly recognize more than 82% of the environmental events related to occupancy variations. In addition, Fiebig et al. [55] installed six VOC sensors to detect presence and estimate the occupancy levels in a place with an area of 60 m², obtaining F1 scores for a binary classifier between 62% and 94%, while for multiclass, scores were between 15% and 94%. On the other hand, Weekly et al. [56] used eight PM sensors combined with eight airflow sensors in a corridor to detect presence.
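Because several of the results above are reported as RMSE values, a minimal Python sketch of the metric may help. The function names are hypothetical, and expressing the RMSE as a percentage of room capacity is only one possible normalisation, assumed here because the cited studies' exact normalisation is not stated in this review.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between true and estimated occupant counts."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_percent(actual, predicted, capacity):
    """RMSE as a percentage of room capacity (an assumed normalisation)."""
    return 100.0 * rmse(actual, predicted) / capacity


# Made-up occupant counts, for illustration only.
true_counts = [10, 12, 8]
estimated = [11, 12, 6]
error = rmse(true_counts, estimated)
error_pct = rmse_percent(true_counts, estimated, capacity=20)
```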

Place Dimension vs. Total Number of Sensors Deployed
In various scenarios, the size of the enclosed space varied significantly, causing environmental variables to behave differently. For instance, the physical size of a room is the primary factor in determining its ability to dissipate heat. The larger its area, the lower the temperature rise due to the heat generated in it [64]. Therefore, the number of sensors to be installed in a place should be in accordance with the surface area in order to obtain reliable information.
There are around 145 test-bed scenarios in which the authors of the studies considered in this systematic review have conducted experiments. However, not all studies share the dimensions of their enclosure. Figure 12 shows the size of the 65 venues (area in square meters) and their co-occurrence in research works. Regarding the size of the test-bed scenarios, the most common scenarios include offices or apartments of 22 m² or laboratories of 186 m². The smallest is an office of 5 m², and the largest is a museum of 1196 m². Other frequently documented sizes are offices with an area between 11 and 15 m² (12 test-bed places, which represent 18.46%). They are followed by spaces with areas between 16 and 20 m² (nine test-bed places, representing 13.84%) and those with areas between 21 and 25 m² (eight test-bed places, representing 12.30%). However, some studies have measured and collected data from places with areas between 66 and 152 m² and larger areas from 306 to 1196 m².
Other authors have described the enclosed spaces based on their occupancy capacity. Figure 13 presents the size of 21 places that were measured by their capacity. Offices and laboratories with a capacity of one to five occupants are the most used in the studies (10 test-bed places, representing 47.61%). These are followed by spaces with a capacity of six to 10 occupants (seven test-bed places, representing 33.33%), and laboratories with a capacity of 31 to 35 people (four test-bed places, representing 19.04%).
Regarding the number of sensors deployed in the enclosed space, some researchers have deployed around 240 sensors in an office and achieved 91% accuracy [65]. In contrast, other researchers have installed a single sensor in an office with an area of 186 m², obtaining 94% accuracy [51]. For instance, Han et al. [66] deployed a total of eight CO₂, temperature, RH, and PIR sensors, as well as three VOC sensors, in an office of 62.93 m². In contrast, Szczurek et al. [67] installed only one sensor for CO₂, temperature, and RH in a classroom with an area of 66.24 m².
On the other hand, 24 studies (25.80%) have conducted experiments in places with areas between 300 and 990 m², in which the number of installed CO₂, temperature, RH, VOC, PM, PIR, acoustic, and light/luminescence sensors increased. For example, Hobson et al. [68] installed a total of 26 CO₂ and PIR sensors as well as one light/luminescence sensor and plug meter in a 991 m² floor. Additionally, only three studies (3.22%) have considered large spaces (with an area of more than 1000 m²), in which the sensors used focused mainly on measuring CO₂, temperature, and RH. These studies also had secondary sensors such as PIR, light/luminescence, and plug meters installed in smaller numbers.

Sensor Deployment Specifications
Sensor deployment, a method of placing sensors in the desired area, is considered a challenging issue for researchers and developers [69]. In wireless sensor networks (WSNs), sensor deployment is a fundamental problem because it determines the coverage and connectivity of a WSN and its robustness against attacks. In addition, efficient sensor deployment can prolong the lifecycle of WSNs by reducing energy consumption [69].
Figure 15 illustrates the different installation locations and their heights. These have been extracted from 64 publications that share the details of the sensor deployment. The analysis covers the enclosed spaces (92 places) in which the authors carried out their research. It can be observed that the center of a room is the most common location for CO₂ sensors (23 scenarios, representing 25%), temperature sensors (13 scenarios, representing 14.13%), and RH sensors (11 scenarios, representing 11.95%). Furthermore, these sensors were installed close to the occupants of the place.
CO₂ sensors are also often installed in HVAC ducts (nine scenarios, representing 9.78%), and PIR sensors are commonly installed near a door (15 scenarios, representing 16.30%). Mounting a sensor on a wall (17 scenarios, representing 18.47%) is a more common choice than placing it on a table (nine scenarios, representing 9.78%). It was also reported that few sensors were placed on the ceiling (three scenarios, representing 3.26%).
Regarding the height, CO₂ (11 scenarios, representing 11.95%), temperature (10 scenarios, representing 10.86%), and RH (10 scenarios, representing 10.86%) sensors are usually placed at about 100 cm from the ground. CO₂ sensors were also reported to have been placed at 110 cm and 160 cm from the ground (seven scenarios each, representing 7.60% each). In contrast, for the temperature and RH sensors, the second most common height is 150 cm from the ground (five scenarios each, representing 5.43% each).

Data Fusion
Several definitions of the term "data fusion" are presented in the literature. These definitions differ mainly in their degree of generality and the specific research areas for which they have been used. One of the earliest and most popular definitions, at least in the multisensor area, was introduced by the Joint Directors of Laboratories and the US Department of Defense. According to them, data fusion is defined as: "A process dealing with the association, correlation, and the combination of data and information from single and multiple sources to achieve refined position and identity estimates, and complete and timely assessments of situations and threats as well as their significance" [70].
Figure 16 presents the methods and algorithms implemented in the 26 studies that have specified the use of data fusion. It shows that parameter combinations (12 publications) are the most commonly used approach for sensor data fusion. The authors have combined different parameters to find the combination that achieves the best accuracy [4,57,71,72]. The second most used approach is the combination of relevant features (seven publications) obtained from Information Gain or Information Theory [41,73,74,75]. Chen et al. [76] have proposed merging the output of data-driven models with occupancy models using a Particle Filter algorithm. Alternatively, Das et al. [26] have developed a framework to fuse data at the edge node. The data are temporarily stored in a data stream buffer, where each piece of data retains its spatial-temporal properties. The fusion module, which uses a Kalman filter, then correlates measurements of an entity from multiple points and reduces data redundancy.
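The parameter-combination approach described above can be sketched as an exhaustive search over sensor subsets. The Python example below is a minimal illustration and not the procedure of any cited study: it assumes a 1-nearest-neighbour classifier scored by leave-one-out accuracy as a stand-in model, and the function names and sensor readings are hypothetical.

```python
from itertools import combinations


def loo_accuracy(rows, labels, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only looks at the feature columns listed in cols."""
    hits = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[c] - rows[j][c]) ** 2 for c in cols),
        )
        hits += labels[nearest] == labels[i]
    return hits / len(rows)


def best_combination(rows, labels, names):
    """Score every non-empty subset of parameters and return the best one."""
    scores = {}
    for k in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), k):
            subset = tuple(names[c] for c in cols)
            scores[subset] = loo_accuracy(rows, labels, cols)
    best = max(scores, key=scores.get)  # first subset with the top score
    return best, scores


# Made-up readings: (CO2 in ppm, temperature in C, RH in %); label 1 = occupied.
names = ["co2", "temperature", "rh"]
rows = [(420, 21.0, 40), (430, 23.5, 42), (410, 22.0, 45),
        (900, 21.5, 41), (950, 23.0, 44), (870, 22.5, 43)]
labels = [0, 0, 0, 1, 1, 1]
best, scores = best_combination(rows, labels, names)
```

The number of subsets grows as 2^n - 1, so an exhaustive search of this kind is only practical for a handful of parameters; the features are also left unscaled here, which a real study would likely address.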
Among the specific algorithms reported, the conditional random field (CRF) appears most often (two publications). The CRF is a relatively new type of discriminative probabilistic graphical model for labeling sequence data, in which each feature is a real value associated with a numerical weight [16,77].
Almost all of the authors assert the improvement of the accuracy obtained from the models by using data fusion [25,66,68,78] except for Wang et al. [1], who state, "The fused dataset does not necessarily improve model accuracy but shows a better robustness for occupancy prediction".

Algorithms and Datasets
Numerous occupancy estimation approaches have been proposed and applied to various problems recently. Occupancy estimation provides information on the presence of occupants (whether or not they are present), occupancy density, the actual number of occupants, and individual occupant location [79]. The models extracted from the 93 selected publications include statistical, analytical, probabilistic, stochastic, and machine learning models. Figure 17 presents the trends in the use of models over the years per study.
On the other hand, the datasets used to train each of the models have particular characteristics. The most important for this systematic review are the availability of data, whether the data have labels, and the time-stamp resolution in these works (see Table A1). Only 11 studies (11.82%) mention the availability of their datasets; that is, the data can be downloaded for experimentation by anyone. Regarding the labeled datasets, 87 studies (93.54%) have used data with labels to train and test their models. Only one study (1.07%) has used both labeled and unlabeled data to carry out its experiments [13]. The models developed in that study involve Linear Regression (LR), Instance-Based learning with parameter k (IBK), RF, K-means, Hierarchical Cluster Analysis (HCA), Fuzzy C-means, and k-medoids, which provide an accuracy between 88.7% and 97.1%.
Of the five studies (5.37%) that used unlabeled datasets, three developed models based on HMM, achieving accuracies of 90.24% [80] and between 89% and 91% [81,82], and one used Bayesian Networks (BN) [65]. The remaining study developed unsupervised ML algorithms such as HCA and a logical flow chart [78], obtaining an error between 7% and 23%.

Discussion
To answer RQ1, it is necessary to discuss not only the number of sensors but also the places where they are installed. Each place has its own characteristics, not only in terms of its size but also in terms of the equipment present in it and its construction. This may be why researchers differ in choosing the number of sensors to be installed in spaces of similar size. For instance, in an office with an area between 19 and 22.5 m², Diaz et al. [83] deployed two CO₂ sensors, two temperature sensors, two RH sensors, one window/door status sensor, and three plug meters, among other sensors. They used the CO₂ concentration and the electricity consumption of the computer as indicators of the occupancy level. In a similar space, Candanedo et al. [71] installed one CO₂ sensor, one temperature sensor, one RH sensor, and one light sensor. Their developed models included Linear Discriminant Analysis (LDA), Classification And Regression Tree (CART), RF, and Gradient Boosting Machines (GBM). Furthermore, the combinations of parameters tested by Candanedo et al. obtained an accuracy between 32.68% and 99.33%.
In contrast, there are studies in which the area is larger than 100 m², and fewer sensors are deployed. For example, Rastogi et al. [6] installed one CO₂, temperature, RH, and infrared proximity sensor in a 524.5 m² classroom. Their models included Multiple Linear Regression (MLR) and Quantile Linear Regression, and the coefficients of determination (R²) were 0.88 and 0.91, respectively. Jiang et al. [51] used a single CO₂ sensor in a 186 m² office together with the Feature Scaled Extreme Learning Machine (FS-ELM) model, which achieved 94% accuracy with a tolerance of four occupants.
According to the 93 selected publications (see Table A2), the authors mostly prefer to carry out their experiments in offices with an area between 5.03 and 62.92 m², and between 97 and 634.17 m², and in classrooms with areas between 41 and 524.25 m². These scenarios are easier to monitor because they are within the universities with which the researchers are affiliated. Fewer investigations have selected public spaces such as museums (1196 m²) [20,54], hospital rooms (33 m²) [57], cinema theaters (300 occupants) [46,47], and an elderly care institution [59].
By analyzing the dimensions of the places, it is possible to classify spaces by size and thereby define what counts as a small or a large space. To avoid subjective judgments about the dimensions of a place, it is proposed that spaces with an area between 1 and 70 m² be considered small, whereas spaces with an area between 71 and 300 m² be considered medium-sized. Finally, spaces with an area of 301 m² or more should be considered large.
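An accuracy reported "with a tolerance" of a few occupants can be read as the share of estimates that fall within that distance of the true count. The Python sketch below illustrates this reading; it is an assumption about the metric rather than the definition used in [51], and the function name and sample counts are hypothetical.

```python
def tolerance_accuracy(actual, predicted, tolerance=0):
    """Share of estimates within +/- tolerance occupants of the true count."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(a - p) <= tolerance for a, p in zip(actual, predicted))
    return hits / len(actual)


# Made-up occupant counts, for illustration only.
true_counts = [0, 5, 12, 20, 8]
estimated = [0, 9, 13, 15, 8]
exact = tolerance_accuracy(true_counts, estimated)                 # exact matches only
within_four = tolerance_accuracy(true_counts, estimated, tolerance=4)
```

With a tolerance of zero the metric reduces to ordinary classification accuracy, which is why tolerance-based figures are not directly comparable with exact-count accuracies.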
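The proposed size classes can be written as a small Python helper. This is a sketch: the function name is hypothetical, and the handling of fractional areas that fall between the stated ranges (for example, 70.5 m²) is an assumption, since the text gives the ranges in whole square meters.

```python
def size_class(area_m2):
    """Classify an enclosed space by floor area in square meters.

    Ranges follow the proposal in the text; areas between the stated
    whole-number limits are assigned to the larger class by assumption.
    """
    if area_m2 <= 0:
        raise ValueError("area must be positive")
    if area_m2 <= 70:
        return "small"
    if area_m2 <= 300:
        return "medium"
    return "large"


# Areas mentioned in this review: an office, a laboratory, and a museum.
classes = [size_class(a) for a in (22, 186, 1196)]
```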
Furthermore, using all the information extracted from the publications selected in this systematic review, it is possible to estimate the number of sensors to be installed using the proposed linear regression model presented in Equation (1), where X is the area in m².
To develop this linear regression, data were extracted from 66 publications that share the test-bed dimension in m² and the number of sensors deployed in their experiments. Furthermore, it is important to point out that the sensors considered in these studies are only for measuring CO₂, temperature, and RH. Figure 18 shows the proposed linear regression, which obtained an R² of 0.757.
For example, the last column of Table A2 shows the results of the theoretical estimation of the number of sensors to be deployed using the proposed Equation (1). The results coincide with 30 studies included in this systematic review. In 12 publications, the estimation differs by one sensor from the number of sensors actually deployed. Moreover, for investigations where the space is large, the estimation is extremely close to the actual number of installed sensors. In other words, it is possible to figure out how many sensors to place according to the size of the selected space. Nevertheless, this equation does not ensure optimal sensing and will need to be tested with more scenarios to obtain reliable results.
To answer RQ2, the analysis shows that it is possible to obtain high accuracies using only environmental variables. In total, 38.70% of the publications use only environmental measures. For instance, Kampezidou et al. [84] have used one CO₂ and temperature sensor in a 12.96 m² room. Their study proposes an approach that includes a physics-informed pattern-recognition machine (PI-PRM) to detect occupancy, which achieves 97% accuracy. Vela et al. [12] carried out an indoor occupancy-level estimation by deploying one temperature, RH, and atmospheric pressure sensor in a university gym (33 occupants) and in a living room (32 m²). Their models involve SVM, k-NN, and DT, which obtained an accuracy between 95% and 97%.
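How such a regression could be fitted and then used is sketched below in Python. The data points are made up for illustration, so the resulting slope and intercept are not the coefficients of Equation (1); the function names and the choice to round the prediction and keep at least one sensor are likewise assumptions.

```python
def fit_line(areas, counts):
    """Ordinary least-squares fit of counts = slope * area + intercept."""
    n = len(areas)
    mean_x = sum(areas) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in areas)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(areas, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination R^2 of the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(areas, counts))
    ss_tot = sum((y - mean_y) ** 2 for y in counts)
    return slope, intercept, 1 - ss_res / ss_tot


def estimate_sensors(area_m2, slope, intercept):
    """Round the prediction and never return fewer than one sensor."""
    return max(1, round(slope * area_m2 + intercept))


# Hypothetical (area in m2, sensors deployed) pairs, not taken from Table A2.
areas = [10, 20, 30, 40]
counts = [1, 2, 2, 3]
slope, intercept, r2 = fit_line(areas, counts)
suggested = estimate_sensors(60, slope, intercept)
```

A fit of this kind only summarises what previous studies deployed; it says nothing about whether those deployments were sufficient for reliable sensing.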
The possibility of adding another type of sensor depends on the requirements, cost, and expected outcomes of the research. As for the optional sensors, the most widely used is the PIR, followed by the light and acoustic sensors.
Regarding RQ3, the placement of sensors in an enclosed area can influence how reliable the collected data are. Based on the selected studies, the most common location for CO₂, temperature, and RH sensors is the center of the room, ensuring that they are close to the occupants. Another option is to mount them on a wall or place them on a table. Moreover, the sensors are commonly installed 1 m from the ground.
Regarding RQ4, data fusion improves the models for occupancy detection or estimation. Most of the publications (83.87%) have used more than one type of sensor in their experiments. However, only 27.95% have explicitly specified the use of data fusion. The most used method is to try different combinations of parameters until the combination that provides the highest accuracy is found. In addition, some authors have implemented more sophisticated methods, such as edge node fusion using Kalman Filter [26], Particle Filter [76], ANFIS [27], and BP-ANN [85]. All but one of these publications show that data fusion improves the accuracy of models to detect or estimate occupancy; the exception is one study whose results contradict the benefits of data fusion [1].
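The parameter-combination approach described above can be sketched as an exhaustive search over sensor subsets. The snippet below is only an illustration under stated assumptions: the readings are synthetic, the four channel names are hypothetical, and a simple nearest-centroid classifier stands in for whichever model a given study actually used.

```python
import random
from itertools import combinations

FEATURES = ("co2", "temperature", "rh", "light")  # hypothetical sensor channels


def make_samples(n, rng):
    """Synthetic readings; only CO2 and temperature depend on occupancy here."""
    samples = []
    for i in range(n):
        occupied = i % 2  # alternate labels so both classes are always present
        row = {
            "co2": rng.gauss(800 if occupied else 450, 120),            # ppm
            "temperature": rng.gauss(23.5 if occupied else 22.5, 0.8),  # deg C
            "rh": rng.gauss(45, 5),                                     # %
            "light": rng.gauss(300, 100),                               # lux
        }
        samples.append((row, occupied))
    return samples


def accuracy(train, test, feats):
    """Fit a nearest-centroid classifier on `feats` and score it on `test`."""
    stats = {}
    for f in feats:
        vals = [row[f] for row, _ in train]
        mean = sum(vals) / len(vals)
        std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5 or 1.0
        stats[f] = (mean, std)

    def vec(row):
        # z-score each channel with the training statistics
        return [(row[f] - stats[f][0]) / stats[f][1] for f in feats]

    centroids = {}
    for label in (0, 1):
        vecs = [vec(row) for row, y in train if y == label]
        centroids[label] = [sum(col) / len(vecs) for col in zip(*vecs)]

    def predict(row):
        v = vec(row)
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))

    return sum(predict(row) == y for row, y in test) / len(test)


def search_combinations(train, validation):
    """Score every non-empty subset of FEATURES and return the best one."""
    scores = {feats: accuracy(train, validation, feats)
              for k in range(1, len(FEATURES) + 1)
              for feats in combinations(FEATURES, k)}
    best = max(scores, key=scores.get)
    return best, scores


rng = random.Random(0)
train, validation = make_samples(300, rng), make_samples(100, rng)
best, scores = search_combinations(train, validation)
print("best combination:", best, "accuracy:", round(scores[best], 3))
```

Because the winning subset is chosen on the validation split, its score is optimistic; a separate held-out test set would be needed to report an unbiased accuracy. The search grows exponentially with the number of channels, which is acceptable for the handful of environmental parameters considered here.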
Finally, to answer RQ5, it is important to discuss the models as well as the datasets used to train them. The extracted information shows that supervised Machine Learning algorithms such as SVM, RF, DT, and ANN are very popular among researchers, in addition to the models based on the Bayes Theorem and HMM. Since 2016, very few authors have carried out experiments using HMM and unlabeled data to estimate/detect indoor occupancy. In contrast, unsupervised and Dynamic Machine Learning models have attracted little interest from researchers so far. No study was found in which Semi-Supervised Machine Learning models have been used.
For instance, Crivello et al. [13] presented a system that is able to perform room occupancy detection and social interaction identification, using data coming from both energy consumption information and the environment (temperature and RH). Their aim was to determine room occupancy status and to detect socialization events in the monitored room. In order to use unsupervised methods, their approach relied on a minimal set of domain-based knowledge, such as the number of workers assigned to each room and the fact that, during each day, most of the time spent by them is on performing a usual daily activity which involves social interactions. The unsupervised clustering techniques implemented were K-medoids, K-means, hierarchical clustering, and fuzzy C-means. These four methods have a fixed number of clusters: two clusters when the goal is room occupancy detection and three clusters when the interest is in the identification of social interactions. The accuracy obtained for occupancy detection in their study was between 88.7% and 97.1%, and for socialization, it was between 93% and 95.4%.
These investigations make it clear that data without labels can be used to detect/estimate occupancy in enclosed spaces. However, only five authors have ventured into this field, even though working without labels reduces the cost and time of data collection.
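To illustrate how occupancy detection can work without labels, the following sketch clusters readings into a fixed number of two groups with a plain k-means loop. It does not reproduce the system of Crivello et al. [13]: the power and temperature readings are synthetic, the function names are hypothetical, and the rule that the cluster with the higher power draw is the occupied one is an assumed piece of domain knowledge. The ground-truth labels are generated only to check the result and are never shown to the clustering step.

```python
import random


def standardize(points):
    """Z-score each column so that no single sensor dominates the distance."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, max_iter=100):
    """Lloyd's k-means with k fixed to 2; returns cluster indices and centroids."""
    # Deterministic start (a choice of this sketch): extremes of the first column.
    centroids = [min(points), max(points)]
    labels = []
    for _ in range(max_iter):
        labels = [0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
                  for p in points]
        updated = []
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[k])
        if updated == centroids:  # assignments can no longer change
            break
        centroids = updated
    return labels, centroids


# Synthetic, unlabeled readings: (power draw in W, temperature in deg C).
rng = random.Random(1)
truth, readings = [], []
for i in range(200):
    occupied = i % 2
    truth.append(occupied)  # kept aside for evaluation only
    readings.append((rng.gauss(180 if occupied else 40, 25),
                     rng.gauss(23.0 if occupied else 21.5, 0.5)))

labels, centroids = two_means(standardize(readings))
# Assumed domain knowledge: the cluster with the higher power draw is "occupied".
occupied_cluster = 0 if centroids[0][0] > centroids[1][0] else 1
predicted = [int(lab == occupied_cluster) for lab in labels]
agreement = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
print(f"agreement with held-back ground truth: {agreement:.2%}")
```

The synthetic clusters are deliberately well separated; real environmental data overlap far more, so the agreement obtained here says nothing about the accuracy to expect in a deployment.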

Conclusions
This systematic review presented a discussion on occupancy estimation/detection, sensor deployment, and a possible way to set the number of sensors, depending on the area of the enclosed space. The aim is to help researchers and practitioners to identify the most viable sensor placement to detect and estimate occupancy according to their objectives and performance demands.
After applying the inclusion and exclusion criteria to the articles found in the SCOPUS database, 93 articles from 2009 to 2021 were considered and discussed. The selected studies made it possible to achieve the objectives and answer the research questions of this systematic review. The largest share of the studies (21.5%) was conducted in the USA, followed by Singapore (10.75%) and China (9.67%). The keyword analysis showed that "carbon dioxide" has the highest frequency, with 43 occurrences. Other keywords with a high frequency include "occupancy detection" (24), "learning system" (21), "machine learning" (21), and "energy efficiency" (20). A summary of the findings of this systematic review is presented according to each research question:
RQ1: Most of the studies (61.29%) concentrate on collecting data from offices with an area between 5 and 66 m². However, the number of sensors used in these studies varies from author to author. Therefore, a linear regression model is proposed as a tool to calculate the number of sensors to be deployed according to the dimensions of the place.
RQ2: The results show that 90.32% of the studies considered include CO₂ sensing as the main environmental parameter. However, 4.30% of the studies consider temperature and RH as priority measures.
RQ3: In total, 68.81% of the publications share the details of the sensor deployment from 92 places where the authors have conducted their research. The researchers preferred placing sensors that measure CO₂, temperature, and RH in the middle of the room, at a height of 100 cm from the floor. Furthermore, these sensors are preferably installed close to the occupants.
RQ4: Regarding data fusion, only 27.95% of the studies specified the use of data fusion methods. Among them, parameter combination is the most used method, followed by the combination of relevant features.
RQ5: The most frequently used Machine Learning algorithm is SVM (20.43% of the research works), followed by RF (15.05%) and ANN (12.90%), including their sub-models as well. Results show that five publications specify the use of unlabeled data to detect/estimate indoor occupancy. However, the implementation of unsupervised models using environmental variables is almost unexplored.
Future research should focus on exploring models that can use unlabeled or semi-labeled data. Furthermore, it is important to study other methods to fuse data, since the current studies have made use of only the most basic level of data fusion. Finally, it is important to develop a tool to set the number of sensors to be installed and to evaluate the linear regression proposed in this systematic review. This would make experiments cheaper to carry out while keeping them trustworthy.
Although all the research questions were answered, the current study also has limitations. The defined inclusion and exclusion criteria limit the scope of this study. Consequently, this systematic review does not provide details about monitoring systems that do not involve environmental parameters. Furthermore, the publications were obtained from only one database (SCOPUS), and the search was restricted to publications from 2009 to September 2021.

Acknowledgments:
The authors would like to thank Tecnologico de Monterrey and Conacyt for the PhD scholarship.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

The column Num. Occ. refers to the dimension of the place expressed as its capacity in number of persons.