Enhancing Indoor Air Quality Estimation: A Spatially Aware Interpolation Scheme

Abstract: The comprehensive and accurate assessment of the indoor air quality (IAQ) in large spaces, such as offices or multipurpose facilities, is essential for IAQ management. It is widely recognized that various IAQ factors affect the well-being, health, and productivity of indoor occupants. In indoor environments, it is important to assess the IAQ in places where it is difficult to install sensors due to space constraints. Spatial interpolation is a technique that uses sample values of known points to predict the values of other unknown points. Unlike in outdoor environments, spatial interpolation is difficult in large indoor spaces due to various constraints, such as being separated into rooms by walls or having facilities such as air conditioners or heaters installed. Therefore, it is necessary to identify independent or related regions in indoor spaces and to utilize them for spatial interpolation. In this paper, we propose a spatial interpolation technique that groups points with similar characteristics in indoor spaces and utilizes the characteristics of these groups for spatial interpolation. We integrated the IAQ data collected from multiple locations within an office space and subsequently conducted a comparative experiment to assess the accuracy of our proposed method in comparison to commonly used approaches, such as inverse distance weighting (IDW), kriging, natural neighbor interpolation, and the radial basis function (RBF). Additionally, we performed experiments using the publicly available Intel Lab dataset. The experimental results demonstrate that our proposed scheme outperformed the existing methods, obtaining better predictions by reflecting the characteristics of regions with similar properties within the indoor space.


Introduction
People spend the majority of their time indoors, accounting for roughly 90% of the day, which has led to a growing interest in the energy efficiency, indoor air quality (IAQ), and user comfort of buildings [1]. It is widely recognized that various IAQ factors affect the well-being, health, and productivity of indoor occupants [2,3]. Effective monitoring and a comprehensive understanding of the IAQ of indoor spaces are essential for energy-saving design and for improving human comfort. There is active research on techniques to effectively monitor and manage the IAQ utilizing IoT technologies, as well as on the impact of the IAQ on the health of indoor occupants [4][5][6][7][8][9].
Spatial interpolation is a technique that uses sample values at known locations to predict the values at other, unknown locations. This is achieved by collecting information about the environment using sensors or other data sources and then using statistical, deterministic, or machine learning techniques to estimate the value of a variable at another location [10][11][12][13]. Accurately assessing IAQ parameters, such as temperature, humidity, CO2 concentration, and particulate matter (PM), is critical to optimizing energy use in these environments as well as maintaining occupant health and comfort [14]. The application of spatial interpolation techniques has important implications for IAQ management because it can inform the optimization of ventilation and air conditioning systems, reduce energy consumption, and maintain a healthy indoor environment for occupants. Therefore, identifying important IAQ parameters in indoor spaces and developing accurate measurement techniques and data analysis methods have become active research areas.
For large indoor spaces, such as offices, smart buildings, smart factories, schools, etc., there is a limit on the number of IAQ sensors that can be installed in the indoor space due to space constraints or financial limitations [14]. In addition, large indoor spaces are often divided into multiple rooms by walls or have other constraints, such as the presence of equipment such as air conditioners, heaters, and other structures [15,16].
Choi [17] proposed a spatial interpolation method to improve the accuracy of PM estimation based on a weighted correction according to the known importance of each point. Kaligambe [18] used an extreme gradient boosting (XGBoost) model to estimate the unmeasured room temperature, humidity, and CO2 concentrations using a limited number of sensors in a three-story smart building in Japan. Zhou [19] proposed a cross-sample learning algorithm that obtains a spatial graph model of the sensors based on the horizontal and vertical effects of gravity on humidity and uses it to learn the coefficient elements of labeled locations to predict the state of unlabeled locations. Machine-learning-based methods are relatively data-intensive and have a high computational cost [20]. Choi [14] developed an accurate IAQ distribution map for large spaces using spatial interpolation methods. In their study, 18 sensors were installed in a library's reading room, with 14 used for data collection. Their study identified the optimal spatial interpolation method for each IAQ factor, determined the ideal number and layout of sensors, and confirmed the map's effectiveness. Huang [21] conducted a study to select the optimal sensor installation locations under the constraints of an indoor space. Their study compared two sampling methods for indoor air distribution measurement: the gridded method and the slope-based method. The data collected through each method were interpolated using the ordinary kriging method. As a result, the slope-based sampling method had a smaller interpolation error than the gridded method, and the authors recommended the slope-based sampling method for indoor air distribution measurement.
Spatial interpolation in indoor environments must carefully consider the spatial structure of the environment, the distance between the data points, and the characteristics of the IAQ data. Spatial structures, such as walls or equipment such as air conditioners, can introduce a significant spatial variation in these parameters, especially in large spaces. Therefore, when interpolating data from unmeasured points using data obtained from IAQ sensors installed in indoor environments, it is necessary to consider both the spatial constraints and the distance between the unmeasured point and the other data points.
Due to spatial constraints, sensors installed within an indoor space may be grouped together where there is a high degree of data correlation between them. Sensors that are highly correlated with each other can be thought of as having a high degree of similarity between the data collected from each sensor. When spatial interpolation is performed, it is possible to predict more accurate values by referring to the data of points with high similarity to the point to be predicted and utilizing these data for spatial interpolation. In this paper, we propose a spatial interpolation technique that groups points with similar characteristics in an indoor space and utilizes the characteristics of these groups for spatial interpolation.

Related Works
There are several techniques that are commonly used for spatial interpolation, including inverse distance weighting (IDW), kriging, natural neighbor interpolation, and the radial basis function (RBF). These methods have been used primarily for outdoor air quality interpolation, and there is little research on IAQ interpolation considering complex indoor spaces.
IDW is a simple interpolation method that uses a weighted average of the values of the nearest data points to interpolate the values at new locations [4]. The weights are determined by the inverse of the distances between the new locations and the data points. The closer a data point is to the new location, the higher its weight will be. IDW is a fast and easy-to-implement method, but it can provide inaccurate results when the data have a strong spatial structure, as it does not take into account the spatial autocorrelation of the data [22][23][24][25][26].
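As a minimal sketch of this weighting, the following Python function estimates a query value from its k nearest sensors with weights equal to the inverse distance. The coordinates and readings are made-up illustrative values, and the power of 1 on the distance is an assumption; many IDW implementations expose a configurable power parameter.

```python
import numpy as np

def idw_estimate(coords, values, query, k=3, eps=1e-12):
    """Estimate the value at `query` as an inverse-distance-weighted
    average of its k nearest data points (plain IDW, power = 1)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.linalg.norm(coords - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]        # indices of the k closest points
    w = 1.0 / (d[nearest] + eps)       # weight = inverse distance
    return float(np.sum(w * values[nearest]) / np.sum(w))

# Query point midway between two sensors reading 400 and 600:
print(idw_estimate([(0, 0), (10, 0)], [400.0, 600.0], (5, 0), k=2))  # → 500.0
```

With the query midway between the two sensors the weights are equal, so the estimate is the plain average; moving the query toward one sensor pulls the estimate toward that sensor's reading.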
Kriging is a geostatistical interpolation method that uses spatial autocorrelation to interpolate the values at new locations based on the values of nearby data points [27]. The method takes into account the spatial variability of the data and uses a weighted average of the values of the nearest data points to interpolate the values at new locations [28][29][30][31]. The weights are determined by the spatial autocorrelation structure of the data, which describes how the values at different locations are related to each other. Kriging is a popular method for spatial interpolation, as it can provide accurate results, especially when the data have a strong spatial correlation [25,32,33].
Natural neighbor interpolation is a spatial interpolation method developed by Robin Sibson [34]. It is based on the Voronoi tessellation of a set of discrete spatial points. The method uses a weighted average of the values of the nearest data points to interpolate the values at new locations. The weights are determined by the geometric relationship between the data points and the new locations, taking into account the shapes and sizes of the data clusters. Natural neighbor interpolation is beneficial when there is a high density of measured values and is particularly reliable when there is limited information on the distribution of these values [35][36][37][38][39][40]. However, since natural neighbor interpolation relies on Thiessen (Voronoi) polygons constructed from the measured points, it cannot interpolate beyond the range of the measured values [14].
RBF interpolation is a method that uses radial basis functions to approximate the unknown values at new locations [41]. By using radial basis functions, it becomes possible to deal with higher-dimensional problems in a way similar to dealing with two- and three-dimensional problems [42][43][44]. RBF interpolation can provide accurate results and is computationally efficient, but it can be sensitive to the choice of the radial basis functions and the parameters used in the interpolation process [45].
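A small sketch of RBF interpolation using scipy's `RBFInterpolator`; the sensor layout, readings, and the `"linear"` kernel are illustrative assumptions, and swapping the kernel is exactly the sensitivity to the choice of basis function mentioned above.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Four hypothetical sensor locations and CO2 readings (illustrative values).
coords = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
values = np.array([400.0, 500.0, 450.0, 550.0])

# Fit an RBF interpolant; try kernel="thin_plate_spline" to see how
# sensitive the result is to this choice.
rbf = RBFInterpolator(coords, values, kernel="linear")
estimate = float(rbf(np.array([[5.0, 5.0]]))[0])  # value at the room center
```

With zero smoothing (the default) the interpolant reproduces each sensor reading exactly at the sensor locations.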

Basic Concepts
All the locations in the space of interest are referred to as points. A point is considered to be a data point if a sensor is installed to measure a value at that specific location. Points without an associated sensor are referred to as unmeasured points. A specific point for which a value is to be predicted is defined as a query point. We denote the Euclidean distance between two points, p and q, as d(p, q).
A set that contains one or more points is referred to as a group. Given a group g that contains points p_1, . . . , p_n, we define the group distance GD(g) of group g as follows in Equation (1):

GD(g) = max_{1 ≤ i < j ≤ n} d(p_i, p_j). (1)
The group distance of a group is the maximum distance between any two points within the group.
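In Python, the group distance of Equation (1) is simply the largest pairwise Euclidean distance within the group; the point set below is an illustrative example.

```python
import numpy as np

def group_distance(points):
    """GD(g): the maximum Euclidean distance between any two points
    in the group (Equation (1)); 0 for a single-point group."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < 2:
        return 0.0
    # Pairwise difference tensor; GD is the largest pairwise distance.
    diff = pts[:, None, :] - pts[None, :, :]
    return float(np.sqrt((diff ** 2).sum(-1)).max())

# 3-4-5 right triangle: the hypotenuse is the group distance.
print(group_distance([(0, 0), (3, 0), (3, 4)]))  # → 5.0
```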
Let G(p) denote the group containing point p. The virtual distance VD(p, q) between two points, p and q, is defined as follows in Equation (2).
The virtual distance can be regarded as a measure of the distance between two points that reflects the group information.


Indoor Spatial Interpolation Scheme
We propose a spatial interpolation scheme that leverages the spatial constraints inherent to indoor environments. Figure 1 shows the flow chart of the proposed spatial interpolation scheme. The proposed scheme consists of two stages: a preprocessing stage and an interpolation stage. In the preprocessing stage, groups are assigned to all the points in the indoor space through the group clustering algorithm and the group assignment algorithm. The interpolation stage uses the group assignment information obtained in the preprocessing stage, the query point, and the data values of each data point to predict the value at the query point: the group assignment information is used to select the nearest neighbors to reference during interpolation, and the prediction is calculated based on the virtual distance between the query point and each nearest neighbor.

Group Clustering
Clustering is a popular technique used in unsupervised machine learning to group similar data points together. K-means is a widely used clustering algorithm that partitions a set of objects into k clusters such that the within-cluster sum of squared distances (also known as the within-group sum of squared errors, or WGSS) is minimized [46,47].
The K-mode clustering algorithm is a partitional clustering algorithm that aims to minimize the sum of the dissimilarity between data points and their assigned cluster modes [48]. The K-mode clustering algorithm is a variation of the K-means algorithm that is suitable for categorical data. The algorithm starts by randomly selecting K initial cluster modes, which are vectors that represent the mode of each categorical variable in the cluster. The distance between a data point and a cluster mode is measured using the Hamming distance, which is defined as the number of variables that differ between the two vectors. The K-means algorithm selects a centroid, which can be a virtual data point that may not correspond to any actual data point in the dataset. In contrast to K-means, the K-mode algorithm selects one of the data points in the cluster as the centroid. This is achieved by finding the mode, which is the most common value, of each of the categorical variables in the cluster.
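The two ingredients of K-mode described above, the Hamming distance and the column-wise mode, can be sketched in a few lines of Python; the categorical attribute values ("wall", "window", and so on) are invented purely for illustration.

```python
from collections import Counter

def hamming(u, v):
    """Number of categorical variables on which two vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def cluster_mode(vectors):
    """Column-wise mode: the most common value of each categorical variable."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*vectors))

points = [("wall", "window"), ("wall", "door"), ("open", "window")]
print(hamming(points[0], points[1]))  # → 1
print(cluster_mode(points))           # → ('wall', 'window')
```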

Group clustering is defined as the process of partitioning all the data points within an indoor space into multiple clusters, with the objective of ensuring that each cluster comprises points that exhibit similar spatial characteristics. The objective is to create homogeneous groups in which the points within each group are more similar to each other than to the points in other groups. In this paper, the K-mode clustering algorithm is used to cluster the data points, because the K-means algorithm may select a centroid that does not correspond to an actual data point in the dataset, whereas the K-mode algorithm always selects one of the actual data points in the cluster as the centroid. Assuming that data values collected from data points with similar spatial characteristics have similar values, this paper uses the mean squared difference (MSD) as the dissimilarity measure for the K-mode clustering algorithm. Suppose that n data points p_1, . . . , p_n are given and that each data point p_i has m data values y_i^1, . . . , y_i^m for i = 1, . . . , n. The MSD for two data points p_i and p_j is defined as follows in Equation (3):

MSD(p_i, p_j) = (1/m) Σ_{k=1}^{m} (y_i^k − y_j^k)^2. (3)
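Equation (3) can be sketched directly in Python; the two series of readings below are illustrative values.

```python
import numpy as np

def msd(yi, yj):
    """Mean squared difference (Equation (3)) between the m data values
    of two data points, used as the K-mode dissimilarity measure."""
    yi, yj = np.asarray(yi, dtype=float), np.asarray(yj, dtype=float)
    return float(np.mean((yi - yj) ** 2))

# Two sensors whose readings differ by 2 at every time step:
print(msd([400.0, 410.0, 420.0], [402.0, 412.0, 422.0]))  # → 4.0
```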

Group Assignment
Group assignment refers to the process of assigning each unmeasured point in an indoor space to one of the groups created during the group clustering process. For every unmeasured point, the nearest data point is identified, and the unmeasured point is assigned the same group as that data point. After the group assignment process, every point in the indoor space belongs to a specific group. Algorithm 1 outlines the group assignment process.
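A minimal Python sketch of this assignment step: each unmeasured point simply inherits the group label of its nearest data point. The coordinates and group labels are illustrative.

```python
import numpy as np

def assign_groups(data_coords, data_groups, unmeasured_coords):
    """Assign each unmeasured point the group of its nearest data point."""
    dp = np.asarray(data_coords, dtype=float)
    out = []
    for u in np.asarray(unmeasured_coords, dtype=float):
        nearest = int(np.argmin(np.linalg.norm(dp - u, axis=1)))
        out.append(data_groups[nearest])
    return out

# Two data points in groups "A" and "B"; two unmeasured points near them.
groups = assign_groups([(0, 0), (10, 0)], ["A", "B"], [(1, 1), (9, 2)])
print(groups)  # → ['A', 'B']
```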

Group-Preferred K-Nearest Neighbor (GPKNN)
Let q be a query point and n be the number of data points in the same group as q. In contrast to the K-nearest neighbor (KNN) algorithm, which simply finds the K data points nearest to q, the proposed group-preferred K-nearest neighbor algorithm prioritizes data points belonging to the same group as q: it identifies the K data points with the smallest virtual distance from q. Replacing the Euclidean distance function in the K-nearest neighbor algorithm with the virtual distance function proposed in this paper yields results equivalent to those of the group-preferred K-nearest neighbor algorithm. Algorithm 2 presents the group-preferred K-nearest neighbor algorithm.
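A Python sketch of the equivalence noted above: GPKNN is KNN run under the virtual distance. Since VD(p, q) is defined earlier in the paper (Equation (2)), the distance function is taken here as a caller-supplied parameter; with the plain Euclidean distance the function reduces to ordinary KNN.

```python
import numpy as np

def gpknn(query, data_points, k, virtual_distance):
    """Group-preferred K-nearest neighbors: the K data points with the
    smallest *virtual* distance to the query. `virtual_distance` stands
    in for the paper's VD(p, q) and is supplied by the caller."""
    d = [virtual_distance(p, query) for p in data_points]
    order = np.argsort(d)[:k]
    return [data_points[i] for i in order]

# With plain Euclidean distance, GPKNN reduces to ordinary KNN:
euclid = lambda p, q: float(np.hypot(p[0] - q[0], p[1] - q[1]))
print(gpknn((0, 0), [(1, 0), (5, 0), (2, 0)], 2, euclid))  # → [(1, 0), (2, 0)]
```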

Spatial Interpolation
We propose two types of indoor spatial interpolation methods modified from the existing IDW and kriging algorithms.
The SSI method modifies IDW to consider the spatial constraints of indoor environments. IDW is one of the most widely used deterministic interpolation techniques. This method assumes that values measured at closer distances have a greater weight than values measured at greater distances. Since the influence of a known value is inversely proportional to its distance from an unknown data point, this method gives greater weight to the values closest to the predicted location, with the weight decreasing with distance [10]. Let q be a given query point in an indoor space. IDW estimates the data value of the given query point as a weighted sum of the data values measured at the surrounding data points, as in Equation (4) [17]:

ŷ(q) = Σ_{i=1}^{K} ω_i y(p_i), (4)

where K is the number of data points used for the estimation, p_i is the i-th nearest data point to q, y(p_i) is the measured value at p_i, ω_i is the weight assigned to y(p_i), and ŷ(q) is the estimated value at q. This method selects neighboring data points close to the query point and gives greater weight to the measured values of the points closer to the query point. Let λ(p, q) be the inverse of d(p, q) as in Equation (5) [17]:

λ(p, q) = 1 / d(p, q). (5)
The weights of the IDW method are computed using Equation (6) [17]:

ω_i = λ(p_i, q) / Σ_{j=1}^{K} λ(p_j, q). (6)
The SSI method uses Equation (7) to compute the weights of the selected K points:

ω_i = μ(p_i, q) / Σ_{j=1}^{K} μ(p_j, q), (7)

where μ(p, q) is defined as the inverse of the virtual distance between p and q, as shown in Equation (8):

μ(p, q) = 1 / VD(p, q). (8)
The IDW method utilizes a weighted average of the measurements of closer data points to estimate the value at a query point. The weights are determined by the reciprocal of the distance between the query point and the data points: data points that are closer to the query point have higher weights. The SSI method instead calculates the weights using the virtual distance: the smaller the virtual distance between a data point and the query point, the higher the weight assigned to that data point. Algorithm 3 presents the algorithm for the SSI method. Let DPSet = {p_1, . . . , p_n} be the set of all the data points in an indoor space, and let y(p_i) be the data value of p_i for i = 1, . . . , n.
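Combining Equations (7) and (8) with the group-preferred neighbor selection gives a compact sketch of the SSI estimator. The virtual-distance function VD(p, q) is passed in by the caller; in the usage line it is replaced by the Euclidean distance purely for illustration.

```python
import numpy as np

def ssi_estimate(query, data_points, values, k, virtual_distance, eps=1e-12):
    """SSI: inverse *virtual* distance weighting (Equations (7) and (8)).
    `virtual_distance` stands in for the paper's VD(p, q); we only assume
    it returns a nonnegative number, small for similar same-group points."""
    vd = np.array([virtual_distance(p, query) for p in data_points])
    nearest = np.argsort(vd)[:k]       # group-preferred K nearest neighbors
    mu = 1.0 / (vd[nearest] + eps)     # Equation (8): mu = 1 / VD
    w = mu / mu.sum()                  # Equation (7): normalized weights
    return float(np.sum(w * np.asarray(values, dtype=float)[nearest]))

euclid = lambda p, q: float(np.hypot(p[0] - q[0], p[1] - q[1]))
est = ssi_estimate((2, 0), [(0, 0), (8, 0)], [400.0, 600.0], 2, euclid)
```

For this example the virtual distances are 2 and 6, giving weights 0.75 and 0.25 and hence an estimate of 450.0.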
The SSK method modifies the kriging method to reflect the spatial constraints of indoor environments. Kriging is also a weighted combination of monitored values; however, this approach uses the spatial autocorrelation among the data to determine the weights rather than assuming a function of the inverse distance. The first step in a kriging analysis is to fit a function to the empirical variogram, which describes the degree of dissimilarity between two observations separated by a given distance. In general, the semivariance increases as the distance between points increases, indicating that points closer together tend to have more similar values than those farther apart [10]. The kriging method is the best linear unbiased estimator (BLUE), which provides not only the estimated value but also the estimation error at each point [27]. The basic form of the method is shown in Equation (9) [27]:

ŷ(p_0) − m(p_0) = Σ_{i=1}^{K} ω_i [y(p_i) − m(p_i)], (9)
where m(p_i) is the expected value of y(p_i), and ω_i is the kriging weight, determined so as to minimize the variance of the estimation error ŷ(p_0) − y(p_0). Here, y(p) is a random field over a point p consisting of a trend m(p) and a residual R(p), where the residual is a random field with zero mean. The covariance of the residuals, which is used to determine the weights of the method, is assumed to be isotropic, meaning that the covariance between two points depends only on their distance, as in Equation (10) [27]:

cov(R(p), R(p + h)) = C_R(h), (10)
where h is the distance between p and p + h, cov(R(p), R(p + h)) is the covariance of the random variables R(p) and R(p + h), and C_R(h) is the isotropic covariance that depends only on h. Various models, such as the spherical, exponential, and wave models, can be used to calculate the isotropic covariance C_R(h). There are three main kriging variants, (i) simple kriging, (ii) ordinary kriging, and (iii) kriging with a trend, which differ in the treatment of the trend component m(p). The SSK method calculates the covariance of the residuals as in Equation (11):

cov(R(p), R(q)) = C_R(VD(p, q)). (11)
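As an illustration of one such model, a spherical covariance can be written in a few lines of Python; the sill and range values below are arbitrary placeholders, not values from the paper.

```python
import numpy as np

def spherical_covariance(h, sill=1.0, rng=10.0):
    """Spherical model for the isotropic covariance C_R(h): it decays
    from the sill at h = 0 and is exactly 0 beyond the range `rng`."""
    h = np.asarray(h, dtype=float)
    inside = sill * (1.0 - 1.5 * (h / rng) + 0.5 * (h / rng) ** 3)
    return np.where(h < rng, inside, 0.0)

print(spherical_covariance(0.0))   # → 1.0 (full covariance at zero lag)
print(spherical_covariance(20.0))  # → 0.0 (beyond the range)
```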
In the SSK method, the virtual distance is used instead of the actual distance between two points, so that the interpolation takes into account the spatial similarity between the two points. In the kriging method, the weights are determined by the spatial autocorrelation structure of the data, which describes how the values at different locations are related to each other. The SSK method utilizes the virtual distance to incorporate the group information of a query point and a data point when determining the spatial autocorrelation. Algorithm 4 presents the algorithm for the SSK method. Let DPSet = {p_1, . . . , p_n} be the set of all the data points in an indoor space, and let y(p_i) be the data value of p_i for i = 1, . . . , n.

Experimental Results and Discussion
We now evaluate our methods using two datasets, an office dataset and the Intel Lab dataset. The performance of the six methods, including our proposed methods (IDW, ordinary kriging, natural neighbor interpolation, RBF, SSI, and SSK), was assessed by comparing their root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and R^2, as shown in Equations (12)-(15), respectively [49]:

RMSE = √( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)^2 ), (12)

MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|, (13)

MAPE = (100/n) Σ_{i=1}^{n} |(y_i − ŷ_i) / y_i|, (14)

R^2 = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)^2 / Σ_{i=1}^{n} (y_i − ȳ)^2, (15)

where n is the total number of points, y_i is the actual value of the i-th point, ȳ is the mean of the actual values, and ŷ_i is the estimated value of the i-th point. A spherical covariance model is used in the kriging and SSK methods. To evaluate performance on a dataset, each data point is in turn treated as an unmeasured point: its value is estimated using the other data points, and its error is the difference between the estimated and actual values. The final RMSE is calculated from these error values. In the following experiments, N denotes the number of groups, and K denotes the number of neighbors. The IDW, kriging, natural neighbor, and RBF methods are independent of N and depend only on K, whereas the proposed SSI and SSK methods depend on both N and K. For each method, we used a subset of the given dataset to explore the combination of N and K that yields the minimum RMSE. The values of N and K obtained through this exploration were utilized for performance validation on the remaining data. For our experiments, we utilized the following software libraries: numpy 1.23.5, pandas 1.5.3, matplotlib 3.7.1, scikit-learn 1.2.2, PyKrige 1.7.0, and MetPy 1.4.1.
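The four metrics of Equations (12)-(15) can be computed in a few lines of numpy; the true and predicted values below are illustrative.

```python
import numpy as np

def metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (in %), and R^2 as in Equations (12)-(15)."""
    y, yhat = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    rmse = float(np.sqrt(np.mean((y - yhat) ** 2)))
    mae = float(np.mean(np.abs(y - yhat)))
    mape = float(np.mean(np.abs((y - yhat) / y)) * 100.0)
    r2 = float(1.0 - np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2))
    return rmse, mae, mape, r2

# Two points, each off by 10 units:
rmse, mae, mape, r2 = metrics([400.0, 500.0], [410.0, 490.0])
```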

Experimental Results on an Office Dataset
For the evaluation, we set up 14 IAQ data points, labeled IAQ01 to IAQ14, in an office space as shown in Figure 2. An IAQ sensor is installed at each data point. In the figure, the lines represent the walls that segregate the rooms, the bottom left is defined as the origin, and the horizontal and vertical arrows are the x- and y-axes, respectively. Table 1 shows the specifications of the CO2 and temperature sensors used in the office space. Table 2 lists the x- and y-coordinates of the 14 data points.

Table 2. Coordinate values of each data point.

Data Point   X Location (cm)   Y Location (cm)
IAQ01        100               243
IAQ02        126               354
IAQ03        187               335
IAQ04        265               249
IAQ05        392               335
IAQ06        511               283
IAQ07        637               384
IAQ08        387               178
IAQ09        507               111
IAQ10        603               176
IAQ11        325               8
IAQ12        386               15
IAQ13        591               19
IAQ14        62                354

We developed a data collection system to collect the air quality data from the IAQ sensors. Figure 3 shows the architecture of the system. Each sensor sends a packet, including the air quality data, every minute to the packet processing module of the system through the Transmission Control Protocol (TCP). The packet processing module parses the received packets and validates them. When the packets are valid, the data storing module stores them in the data repository. The stored data can be searched using the data searching module and displayed on the web in chart or table form.
In this experiment, we collected the CO2 concentration and temperature data every minute over a 5-day period from 29 June 2020 to 3 July 2020. For each of the 14 data points, we collected an average of 530 data points per day, totaling 37,086 data points for the CO2 concentration and temperature data, respectively. Using the June 29 data, we varied N from 2 to 7 and K between 3, 6, 9, 12, and 14 to find the N and K values with the minimum RMSE. Using the found N and K values, we performed interpolation experiments on the data from 30 June to 3 July to verify the performance.

Experimental Results for CO2 Data
The color-coded representation of the groups assigned to each point in the office space through the group clustering and group assignment processes is shown in Figure 4. The number of groups varies from two to seven.

Figure 5 shows the RMSE values for each method when the number of groups in the CO2 data varies from two to seven. The RBF method is excluded from Figure 5 and the subsequent figures due to its large RMSE value compared to the other methods. Figure 6 shows the RMSE values for each method when the number of neighbors varies from 3 to 14. The IDW method had its minimum RMSE value when K = 14, the kriging method when K = 12, the natural neighbor method when K = 6, and the RBF method when K = 14. The SSI method had its minimum RMSE value when N = 6 and K = 3, and the SSK method when N = 6 and K = 6. Table 4 shows the results of a performance experiment using the optimal values of N and K for each method based on four days of data from 30 June to 3 July. As shown in Table 4, the proposed SSI and SSK methods show better performance metrics compared to the other methods. Figure 7 shows the heatmaps for each method, displaying the estimated values for all the points when N = 5 and K = 6. In the case of the natural neighbor method, the areas outside the sensor range were not interpolated, so they are not displayed in the figure.

Experimental Results for Temperature Data
The color-coded representation of the groups assigned to each point in the office space through the group allocation and group assignment processes is shown in Figure 8. The number of groups varies from two to seven. Figure 9 shows the RMSE values for each method as the number of groups in the temperature data varies from two to seven. The IDW method had its minimum RMSE value when K = 3, the kriging method when K = 3, the natural neighbor method when K = 14, and the RBF method when K = 12. The SSI method had its minimum RMSE value when N = 6 and K = 6, and the SSK method when N = 6 and K = 12. Table 6 shows the results of a performance experiment using the optimal values of N and K for each method based on four days of data from 30 June to 3 July. As shown in Table 6, the proposed SSI and SSK methods show slightly better performance metrics compared to the other methods. Figure 11 shows the heatmaps of the predicted values for each point when N = 5 and K = 6. For the natural neighbor method, the areas outside the sensor range were not interpolated and are not shown in the figure.

Figure 11. Heatmap plots for temperature data when N = 5 and K = 6.

Figure 12 shows the CO2 values for IAQ01, IAQ02, and IAQ03 measured on 29 June. From Figure 12, we can observe that although the distance between IAQ02 and IAQ01 is greater than the distance between IAQ02 and IAQ03, the data from IAQ02 and IAQ01 are much more similar than the data from IAQ02 and IAQ03. It is known that CO2 is highly correlated within an independent room separated by walls. Figure 13 shows the result of dividing the sensors into three groups based on the CO2 data using the group allocation and group assignment algorithms proposed in this paper. As shown in Figure 13, IAQ02 belongs to Group 1, the same group as IAQ01, while IAQ03 belongs to Group 3, a different group from IAQ01 and IAQ02.

Figure 12. Graph comparing CO2 data for IAQ01, IAQ02, and IAQ03.

From Figure 14, we can observe that although the distance between IAQ11 and IAQ03 is greater than the distance between IAQ11 and IAQ10, the data from IAQ11 and IAQ03 are much more similar than the data from IAQ11 and IAQ10. We attribute this to the influence of various conditions in the office, such as air conditioning and the structure of the space. Figure 15 shows the result of dividing the sensors into three groups based on the temperature data using the group allocation and group assignment algorithms proposed in this paper. As shown in Figure 15, IAQ11 belongs to Group 1, the same group as IAQ03, while IAQ10 belongs to Group 3, a different group from IAQ11 and IAQ03.

Various IAQ parameters, including CO2, temperature, relative humidity, and light intensity, have distinct physics, and IAQ parameters are influenced not only by the layout of indoor spaces but also by these underlying physical properties. We assume that even if the physics of the IAQ parameters differ, those physics would still be reflected in the collected data. Therefore, we believe that the sensor grouping algorithm proposed in this paper partially reflects the spatial constraints on IAQ parameters.

Experimental Results Based on the Intel Lab Dataset
We evaluated our methods using the sensing data collected and made publicly available by Intel Labs in 2004 [50,51]. This dataset provides the x and y coordinates of 54 sensors deployed in the Intel Berkeley Research lab between 28 February and 5 April 2004. In the dataset, the temperature, humidity, light, and voltage data were collected at intervals of 31 s. The sensors were arranged in the lab according to Figure 16 [51].

For this experiment, we used five days of data from 28 February 2004 to 3 March 2004, averaged per minute. Missing data were imputed using a linear method. Sensors 5 and 28 were excluded from the experiment due to a significant amount of missing data, resulting in a total of 52 sensors being used. We used the 28 February data for validation: we varied N over 3, 6, 9, 12, 15, 18, and 21 and K over 5, 10, 15, 20, 25, 30, 35, 40, 45, and 52 to find the N and K values with the minimum RMSE. Using these values, we performed an interpolation experiment on the data from 29 February to 3 March to validate the performance.
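The preprocessing described above (minute averaging plus linear imputation) can be sketched with pandas. The DataFrame layout, one column per sensor indexed by the raw sample timestamps, is our assumption, not something the dataset prescribes.

```python
import pandas as pd

def preprocess(readings: pd.DataFrame) -> pd.DataFrame:
    """Average raw readings per minute and linearly impute gaps.

    readings: one column per sensor, indexed by the timestamps of the
    raw ~31 s samples.
    """
    per_minute = readings.resample("1min").mean()   # minute averages
    return per_minute.interpolate(method="linear")  # fill missing minutes
```

Sensor columns with too many gaps (like sensors 5 and 28 in the experiment above) would be dropped before this step, since linear imputation across long outages is unreliable.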
Figure 17 shows the RMSE values for the experiment with the number of groups set to 3, 6, 9, 12, 15, 18, and 21 for the temperature data. The RBF method is excluded from the figure and the subsequent figures due to its large RMSE value compared to the other methods. Figure 18 shows the RMSE values for the experiment with the number of neighbors set to 5, 10, 15, 20, 25, 30, 35, 40, 45, and 52 for the temperature data. The IDW method had its minimum RMSE value when K = 5, the kriging method when K = 5, the natural neighbor method when K = 5, and the RBF method when K = 25. The SSI method had its minimum RMSE value when N = 6 and K = 5, and the SSK method when N = 6 and K = 30. Table 7 shows the results of performance experiments based on four days of data from 29 February to 3 March using the optimal values of N and K for each method. As shown in Table 7, the proposed SSI method provides slightly better performance metrics compared to the other methods, while the SSK method performs slightly worse in this case.

Figure 19. Heatmap plots for temperature data when N = 5 and K = 7.

Figure 20 shows the RMSE values for the experiment with the number of groups set to 3, 6, 9, 12, 15, 18, and 21 for the humidity data. Figure 21 shows the RMSE values for the experiment with the number of neighbors set to 5, 10, 15, 20, 25, 30, 35, 40, 45, and 52 for the humidity data. The IDW method had its minimum RMSE value when K = 5, the kriging method when K = 5, the natural neighbor method when K = 5, and the RBF method when K = 25. The SSI method had its minimum RMSE value when N = 6 and K = 5, and the SSK method when N = 9 and K = 15. Table 8 shows the results of performance experiments based on four days of data from 29 February to 3 March using the optimal values of N and K for each method. As shown in Table 8, the proposed SSI and SSK methods show slightly better performance metrics compared to the other methods. Figure 22 shows the heatmap of the predicted values for humidity.

Figure 23 shows the RMSE values for the experiment with the number of groups set to 3, 6, 9, 12, 15, 18, and 21 for the light data. Figure 24 shows the RMSE values for the experiment with the number of neighbors set to 5, 10, 15, 20, 25, 30, 35, 40, 45, and 52 for the light data. The IDW method had its minimum RMSE value when K = 5, the kriging method when K = 5, the natural neighbor method when K = 5, and the RBF method when K = 25. The SSI method had its minimum RMSE value when N = 6 and K = 5, and the SSK method when N = 9 and K = 15. Table 9 shows the results of performance experiments based on four days of data from 29 February to 3 March using the optimal values of N and K for each method. As shown in Table 9, the proposed SSI and SSK methods show better performance metrics compared to the other methods.
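The RMSE comparisons reported above rest on a hold-out procedure: hide a sensor, predict its reading from the others, and measure the error. A generic leave-one-out evaluation can be sketched as follows; the `interpolate` callback standing in for any of the compared methods is our own abstraction, not the authors' code.

```python
import numpy as np

def loo_rmse(sensor_xy, sensor_values, interpolate):
    """Leave-one-out RMSE: hide each sensor in turn, predict it from
    the rest, and compare with the true reading.

    interpolate(target_xy, xy, values) -> float may be any of the
    compared methods (IDW, kriging, natural neighbor, ...).
    """
    xy = np.asarray(sensor_xy, dtype=float)
    vals = np.asarray(sensor_values, dtype=float)
    errors = []
    for i in range(len(vals)):
        others = np.arange(len(vals)) != i      # mask out sensor i
        pred = interpolate(xy[i], xy[others], vals[others])
        errors.append(pred - vals[i])
    return float(np.sqrt(np.mean(np.square(errors))))
```

Running this once per method and per parameter setting (N, K) yields exactly the kind of RMSE grids summarized in Tables 7 through 9.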

Conclusions
In this paper, we proposed an interpolation scheme for IAQ data that considers the spatial constraints of indoor environments. The proposed scheme was compared with commonly used methods, such as IDW, kriging, natural neighbor interpolation, and RBF, and was found to be more accurate in terms of the RMSE. The results of the experiment demonstrate that our proposed scheme could improve the accuracy of air quality estimation in indoor environments.
Our findings have important implications for IAQ management in various settings, including smart buildings, smart factories, schools, offices, and other similar environments. The accurate measurement and estimation of air quality parameters are crucial for maintaining occupant health and comfort and optimizing energy use in indoor spaces. Our proposed interpolation scheme can provide more accurate estimates of air quality parameters, which can inform the optimization of ventilation and air conditioning systems and ultimately lead to a healthier indoor environment for occupants. The integration of IAQ estimation with energy management and occupant behavior modeling can lead to more comprehensive and effective IAQ management strategies. In this paper, a DCP-based k-mode clustering method is used to group sensors with similar characteristics. However, the performance of the proposed spatial interpolation method may be affected when the internal structure of the indoor space is changed. To address this, further research is needed on how to regroup sensors when the performance is degraded or the indoor space is changed. Additionally, this study does not include an analysis of important factors related to indoor air quality, such as particulate matter (PM) and volatile organic compounds (VOCs), which presents an area for future research.

Data Availability Statement: The Intel Lab data are available at http://db.csail.mit.edu/labdata/labdata.html (accessed on 10 May 2023). The office data presented in this study are available from the authors upon reasonable request.