Real-Time Reconstruction of Contaminant Dispersion from Sparse Sensor Observations with Gappy POD Method

Abstract: Real-time estimation of three-dimensional field data for enclosed spaces is critical to HVAC control. This task is challenging, especially for large enclosed spaces with complex geometry, due to the nonuniform distribution and nonlinear variations of many environmental variables. Moreover, constructing and maintaining a network of sensors to fully cover the entire space is very costly, and insufficient sensor data might deteriorate system performance. Facing such a dilemma, gappy proper orthogonal decomposition (POD) offers a solution to provide three-dimensional field data with a limited number of sensor measurements. In this study, a gappy POD method for real-time reconstruction of contaminant distribution in an enclosed space is proposed by combining the POD method with a limited number of sensor measurements. To evaluate the gappy POD method, a computational fluid dynamics (CFD) model is utilized to perform a numerical simulation to validate the effectiveness of the gappy POD method in reconstructing contaminant distributions. In addition, the optimal sensor placement is given based on a quantitative metric to maximize the reconstruction accuracy, and the sensor placement constraints are also considered during the sensor design process. The gappy POD method is found to yield accurate reconstruction results. Further work will include the implementation of real-time control based on the POD method.


Introduction
Automatic control of the HVAC system plays a significant role in improving indoor air quality [1][2][3][4] and reducing building energy consumption [5][6][7][8]. Real-time estimation of contaminant distribution inside any enclosed space could provide immediate feedback to the control of ventilation systems and, thus, is of great significance. However, the temporal evolution of contaminant distribution is characterized by complex nonlinear dynamics. It is challenging to reconstruct the spatiotemporal distribution of contaminants for real-time control of the ventilation system [9].
There are three main approaches to constructing indoor field data: spatial data interpolation, physics-based simulation, and the data-driven approach. Generally, ordinary Kriging is the most widely used spatial interpolation method [10] and can produce an effective estimate of an indoor thermal map [11] and pollutant distributions [11][12][13] based on sensor measurements. However, the accuracy of the interpolation depends strongly on the sensor placement and the number of sensors. Typically, a large number of sensors is required to achieve adequate spatial resolution [11]. On the other hand, physics-based models such as computational fluid dynamics (CFD) [14][15][16], fast fluid to guarantee satisfactory HVAC performance, which, however, is usually cost-expensive. Thus, it is of great significance to extrapolate the three-dimensional distribution of indoor air contaminants based on limited sensor measurements.
In this paper, a gappy POD method for the real-time reconstruction of contaminant distribution in an enclosed space is proposed by combining the POD method with a limited number of sensor measurements. Even though gappy POD has been confirmed to perform well in the field of turbulent flow sensing, its performance in the real-time reconstruction of contaminant distributions remains to be investigated. To evaluate gappy POD for the reconstruction of indoor environments, CFD models are utilized to perform a simulation study that validates the reconstruction results. The article is organized as follows. Section 2 briefly introduces the gappy POD method and the algorithm for optimal sensor placement. The reconstruction results of the contaminant distribution are presented in Section 3. Finally, the main conclusions are presented in Section 4.

Standard POD
The spatiotemporal distribution of contaminants is characterized by complex nonlinear dynamics. The POD method provides a powerful tool to decompose the high-dimensional dynamics into basic POD modes that represent the coherent structures of the contaminant distribution. The snapshot method was introduced by Sirovich et al. for determination of the basic POD modes [46]. The snapshots $X$ are formed by collecting the time-series data,

$$X = \left[ x(t_1), x(t_2), \ldots, x(t_m) \right] \qquad (1)$$

where $x(t_k)$ is a vector of $n$ elements representing the spatial distribution of the contaminants at time $t_k$. The snapshot $x(t_k)$ could be obtained from CFD simulation results. The correlation matrix $R$ of size $m \times m$ is defined by

$$R = X^{T} X \qquad (2)$$

Then, the eigenvectors $\phi_1, \phi_2, \ldots, \phi_m$ of the correlation matrix $R$ and the corresponding eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ are computed. The basic POD modes $\varphi_j$ are given by a linear combination of snapshots,

$$\varphi_j = X \phi_j = \sum_{k=1}^{m} \phi_j^{(k)} x(t_k) \qquad (3)$$

where $\phi_j$ denotes the $j$th eigenvector and $\phi_j^{(k)}$ its $k$th element. The eigenvalues are ordered in terms of the importance of their corresponding POD modes ($\lambda_1 > \lambda_2 > \lambda_3 > \cdots > \lambda_m$). A larger eigenvalue means that the corresponding basic POD mode plays a more important role in describing the contaminant distribution. By using the first $p$ POD modes, the contaminant concentration could be reconstructed as

$$\tilde{x} = \sum_{i=1}^{p} b_i \varphi_i \qquad (4)$$

where $b_i$ is the temporal coefficient of the $i$th basic POD mode, and $\tilde{x}$ is the vector representing the estimate of the contaminant distribution in the enclosed space. For POD applications, the most important issue is the determination of the coefficients of the basic POD modes.
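The snapshot method described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this study: the snapshot matrix is synthetic rather than taken from CFD, the function name `snapshot_pod` and the tolerance `rel_tol` are hypothetical, and the modes are scaled to unit norm by dividing by the square root of the eigenvalue, a normalization the text does not specify. The coefficients here are obtained by projecting full snapshots, which is only possible when the whole field is known.

```python
import numpy as np

def snapshot_pod(X, rel_tol=1e-10):
    """Snapshot-method POD of an n x m snapshot matrix X (columns are snapshots).

    Returns (modes, eigenvalues): modes is n x r with unit-norm columns and
    eigenvalues are sorted in descending order.
    """
    R = X.T @ X                          # m x m correlation matrix
    lam, V = np.linalg.eigh(R)           # symmetric eigenproblem, ascending order
    order = np.argsort(lam)[::-1]        # reorder to descending
    lam, V = lam[order], V[:, order]
    keep = lam > rel_tol * lam[0]        # drop numerically zero eigenvalues
    lam, V = lam[keep], V[:, keep]
    modes = (X @ V) / np.sqrt(lam)       # linear combination of snapshots (assumed unit-norm scaling)
    return modes, lam

# Synthetic stand-in for CFD snapshots: 3 spatial patterns mixed over 30 time steps.
rng = np.random.default_rng(0)
n, m = 200, 30
X = rng.standard_normal((n, 3)) @ rng.standard_normal((3, m))

modes, lam = snapshot_pod(X)
p = 3
b = modes[:, :p].T @ X                   # temporal coefficients by full-field projection
X_rec = modes[:, :p] @ b                 # reconstruction from the first p modes
rel_err = np.linalg.norm(X - X_rec) / np.linalg.norm(X)
```

Solving the small m x m eigenproblem instead of an n x n one is what makes the snapshot method attractive when the number of grid points far exceeds the number of snapshots.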

Gappy POD
Gappy POD provides an effective method for calculating the POD coefficients from sparse sensor observations. The mask vector $H$ of $n$ elements records the sensor locations: if a sensor is located at the $i$th grid point, the $i$th element of $H$ is equal to one; otherwise, it is equal to zero. The incomplete vector $\hat{x}$ of $n$ elements holds the measured contaminant concentrations at the sensor locations, while its remaining elements are missing. The full vector $\tilde{x}$ of $n$ elements represents the reconstructed contaminant concentration. Gappy POD reconstructs the full vector $\tilde{x}$ from the incomplete vector $\hat{x}$. The POD coefficients are obtained by solving the linear regression in Equation (5),

$$M b = f \qquad (5)$$

where

$$M_{ij} = \left( \varphi_i, \varphi_j \right)_H \qquad (6)$$

and

$$f_i = \left( \hat{x}, \varphi_i \right)_H \qquad (7)$$

Here, $(u, v)_H = \sum_{k=1}^{n} H_k u_k v_k$ denotes the inner product restricted to the sensor locations. The coefficient vector $b$ is obtained by solving Equation (5), and the spatial distribution of contaminants can then be immediately reconstructed through Equation (4).
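A minimal sketch of the coefficient solve in Equation (5) follows, assuming the standard gappy POD form in which M and the right-hand side are inner products of the POD modes restricted to the sensor locations. The mode matrix is a random orthonormal stand-in rather than modes from CFD snapshots, and the names `gappy_coefficients`, `Phi`, and `x_gappy` are hypothetical.

```python
import numpy as np

def gappy_coefficients(Phi, H, x_gappy):
    """Solve M b = f for the POD coefficients from masked data.

    Phi     : n x p matrix whose columns are the first p POD modes
    H       : length-n mask (1 at sensor locations, 0 elsewhere)
    x_gappy : length-n vector, meaningful only where H == 1
    """
    mask = np.asarray(H).astype(bool)
    Phi_s = Phi[mask, :]                 # modes sampled at the sensor locations
    M = Phi_s.T @ Phi_s                  # p x p masked inner products of the modes
    f = Phi_s.T @ x_gappy[mask]          # masked inner products with the data
    return np.linalg.solve(M, f)

# Synthetic example: orthonormal columns stand in for POD modes.
rng = np.random.default_rng(1)
n, p, n_sensors = 500, 4, 12
Phi, _ = np.linalg.qr(rng.standard_normal((n, p)))
b_true = rng.standard_normal(p)
x_full = Phi @ b_true                    # "true" field lying in the span of the modes

H = np.zeros(n)
H[rng.choice(n, n_sensors, replace=False)] = 1.0
x_gappy = H * x_full                     # unmeasured entries set to zero

b = gappy_coefficients(Phi, H, x_gappy)
x_rec = Phi @ b                          # full-field reconstruction from the p modes
```

Because the synthetic field lies exactly in the span of the modes, the recovered coefficients match the true ones up to round-off. With real measurements the solve is a least-squares fit, and its sensitivity to noise is governed by the conditioning of M.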

Sensor Placement
According to Equation (5), the POD coefficients are obtained through data regression. The condition number of the matrix $M$ is a good proxy for the solution accuracy of Equation (5): a smaller condition number gives a more stable solution. The optimal placement of the $N$ sensors could be determined by

$$H^{*} = \arg\min_{H} \kappa(M) \quad \text{subject to} \quad \sum_{i=1}^{n} H_i = N \qquad (8)$$

where $\kappa(M)$ is the condition number of $M$.
The sensor placement could be obtained based on a greedy algorithm to minimize κ(M) [37]. Firstly, loop over all possible placement points, evaluate the condition number of M for each point, and choose the point that minimizes κ(M) to locate the first sensor. Then, determine the location of the next sensor that could minimize the condition number κ(M). Repeat the previous steps until all the N sensors are placed in appropriate locations. For sensors with placement constraints, the number of possible sensor placement points would be fewer, and the only change we have to make is to loop over locations within location constraints for the optimal sensor placement.

Numerical Model
In the present study, a turbulence model based on the Reynolds-Averaged Navier-Stokes (RANS) equations is adopted. According to previous studies focusing on gaseous contaminant dispersion in an enclosed space, the Eulerian method is able to provide an effective solution for simulating an indoor concentration distribution [26,47]. Therefore, the Eulerian method is used to predict the contaminant distributions in this study. Moreover, it is assumed that there is no chemical reaction during the gaseous contaminant dispersion process.
The experimental results given by Yuan et al. were utilized to validate the reliability of our numerical model [48]. The experiments were conducted in a test chamber with a size of 5.16 m × 3.65 m × 2.43 m (length × width × height) under displacement ventilation. The total flow rate was 0.05 m³/s. The supply inlet was placed at the middle of the side wall near the floor, and the exhaust device of size 0.43 m × 0.43 m was fixed at the center of the ceiling. Two heated manikins, two computers, and six lamps were used to simulate heat sources. SF6 was used to simulate contaminants from occupants, and the contaminant source was placed above the simulated occupants. Figure 1 compares the velocity and dimensionless SF6 concentration profiles between the numerical results and the experimental data. The agreement between the CFD results and the experimental data is reasonably good. Discrepancies between the simulated results and experimental values could be observed in the upper part of the space. Similar discrepancies were also found in the simulation study by Yuan et al. [49], who conducted this experiment and attributed them to recirculating flows in the upper part of the space. Moreover, it should be noted that the uncertainties of the experimental data are 0.01 m/s for the velocity and 10% for the SF6 concentration [49].
Energies 2020, 13, 1956

Model Setup
In this study, the case is set in an enclosed space with dimensions of 4 m (L) × 3 m (W) × 2.5 m (H). Both the airflow inlet and the outlet are located on the left wall, each with dimensions of 0.6 m × 0.3 m. The inlet is located in the upper part of the left-side wall, 0.3 m beneath the ceiling, while the outlet is located in the lower part of the left-side wall, 0.3 m above the floor. A gaseous contaminant source is located in the center of the room. Detailed configurations of the ventilation system are shown in Figure 2. Moreover, the sensor locations are assumed to be constrained to the left and the back walls.
The reconstruction of the contaminant distribution is decomposed into two steps: the offline stage and the online stage (Figure 3). In the offline stage, the snapshots are decomposed into basic POD modes, and the optimal sensor placement is determined. In the online stage, the temporal coefficients for POD modes are obtained based on the sparse sensor measurements through linear regression and then are applied for reconstruction of the contaminant distribution through a linear combination of the dominant POD modes. The POD-based online-offline algorithm has already been applied to many cases for optimal control of dynamic systems, including optimal control of the water flooding reservoir [50] and optimal control of the indoor temperature [38,41]. The detailed procedure is described as follows:
(1) Collect a set of snapshots with a combination of different inlet velocities and contaminant source strengths. In case I, we first obtain the steady contaminant distribution with the CFD model for an inlet velocity of 1 m/s, with the gaseous contaminant steadily released at a source strength of 1 mL per cubic meter per second. Using the steady contaminant distribution as the initial condition, the inlet velocity experiences a step change from 1 m/s to 2 m/s, and the source strength remains unchanged. The simulation results are recorded as snapshots until the next steady state is reached. The snapshots are recorded every 1 s, and 180 snapshots are obtained during the response process in case I. Case II follows the same procedure as case I; the only difference is that the gaseous contaminant is steadily released with a source strength of 2 times higher than the source strength in case I. A total of 320 snapshots are recorded for case II. Finally, a snapshot matrix combining the snapshots of both case I and case II is formed.
(2) Decompose the snapshots to obtain the POD modes, and then determine the optimal sensor placement by minimizing the condition number in Equation (8). It should be noted that the number of sensors should be larger than the number of dominant POD modes so that the regression underlying the sensor placement is not an underdetermined problem.
(3) In the online stage, the temporal coefficients of the POD modes are determined based on the real-time measurements from the sparse sensors. The CFD simulations in the test case are used to provide sensor observations for implementing gappy POD reconstruction and to validate the reconstruction results. It is noteworthy that gappy POD in the online step does not rely on any CFD simulation; the CFD simulation here serves only to evaluate the gappy POD model. In the test case, the contaminant is steadily released at a source strength of 1.5 times higher than the source strength in case I. We first obtain the steady contaminant distribution with the CFD model for an inlet velocity of 1.2 m/s, and then the inlet velocity is subject to a 0.4 m/s positive step to 1.6 m/s. The gappy POD method can be applied for reconstruction of the contaminant distribution in this test case because the airflow pattern in our test case shares similar coherent structures with the snapshots. Moreover, Sempey et al. confirmed that the POD method performs well in reconstructing system dynamics with boundary conditions different from the conditions used for constructing snapshots [42].
(4) Reconstruct the spatiotemporal distribution of the contaminants based on the basic POD modes and the corresponding temporal coefficients.

POD Decomposition
The snapshots of the contaminant distributions are of great importance for characterizing the system dynamics, and the dominant POD modes are built based on the snapshot ensemble. In detail, the contaminant is released from the center of the enclosed space, and the concentration is highest around the contaminant source. The contaminant distribution evolves with time as a response to the step increase of the inlet velocity from 1 m/s to 2 m/s. A total of 500 POD modes and 500 corresponding eigenvalues are obtained from the POD decomposition. The exponential decay of the normalized eigenvalues is demonstrated in Figure 4. The normalized eigenvalues are calculated by dividing each eigenvalue by the sum of all 500 eigenvalues. They give the percentage of energy contained in the corresponding POD modes and thus measure the importance of the POD modes. The normalized eigenvalues in Figure 4 are ordered in terms of their ability to describe the spatiotemporal distribution of the contaminants. It could be observed that the first few modes account for most of the system's energy. For example, the first, second, and third POD modes account for 66.63%, 17.83%, and 4.26% of the total system energy, respectively. In addition, the first six POD modes contain about 95% of the total system energy, while the first 16 POD modes contain about 99% of the total system energy. As the normalized eigenvalue decreases, the corresponding POD mode exhibits less meaningful spatial structures and contributes less to reconstructing the contaminant dispersion process.
Figure 4. The exponential decay of the normalized eigenvalue. The eigenvalue is normalized by dividing by the sum of the 500 eigenvalues. The first 64 POD modes contain more than 99.9% of the total system energy.
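The normalization and the choice of the number of dominant modes can be illustrated with a short sketch. The eigenvalue spectrum below is a hypothetical geometric decay, not the spectrum of this study, and the function name `modes_for_energy` is an assumption introduced for illustration.

```python
import numpy as np

def modes_for_energy(eigenvalues, target):
    """Return (p, fractions, cumulative): p is the smallest number of leading
    modes whose cumulative normalized eigenvalues reach the target fraction."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    fractions = lam / lam.sum()          # normalized eigenvalues (energy fractions)
    cumulative = np.cumsum(fractions)
    p = int(np.searchsorted(cumulative, target)) + 1
    return min(p, lam.size), fractions, cumulative

# Hypothetical spectrum: each eigenvalue is half of the previous one.
lam = 0.5 ** np.arange(20)
p95, fractions, cumulative = modes_for_energy(lam, 0.95)
p99, _, _ = modes_for_energy(lam, 0.99)
```

Applied to the values reported above, the same cumulative sum gives 66.63% + 17.83% + 4.26% = 88.72% of the total system energy for the first three modes.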

Sensor Placement
It is challenging to optimize the sensor placement to maximize the reconstruction accuracy, since there are thousands of potential locations for sensor placement. A quantitative framework is used to determine the sensor locations by minimizing the condition number of the matrix M in Equation (8). The sensor locations are assumed to be constrained to the left and the back walls. For example, the optimal placement of 20 sensors with location constraints is demonstrated in Figure 5. A sensor could be placed at any of the approximately 3000 grid points on the left and the back walls. The optimization strategy is to pick, based on the greedy algorithm, the 20 points from the 3000 potential locations that minimize the condition number of M in Equation (8). It should be noted that increasing either the number of POD modes or the number of sensors results in a higher computational cost for finding the optimal sensor locations. However, the optimal sensor placement is computed in the offline stage and does not affect the speed of the real-time contaminant reconstruction in the online stage. Moreover, it can be observed that the sensors with location constraints are always kept away from the contaminant source. This protects the sensors from long-term exposure to high-concentration contaminants and can extend their service lives.
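As a rough illustration of the offline cost, assume exactly $C = 3000$ candidate locations (the text gives this number only approximately) and $N = 20$ sensors, and assume that the greedy search evaluates one condition number per remaining candidate at each step. The number of evaluations of $\kappa(M)$ is then

$$\sum_{k=0}^{N-1} (C - k) = NC - \frac{N(N-1)}{2} = 20 \cdot 3000 - 190 = 59{,}810,$$

whereas an exhaustive search over all possible subsets would require $\binom{3000}{20}$ evaluations, which is on the order of $10^{51}$. Under these assumptions, the greedy strategy trades a guarantee of global optimality for a search that is tractable in the offline stage.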

Gappy Flow Reconstruction
Quantifying the indoor air quality plays an important role in analyzing indoor occupant exposure [51]. Figure 6 demonstrates the estimation of the contaminant distribution based on 16 dominant POD modes with sensor placement constraints. It should be noted that the reconstruction accuracy is limited by the number of dominant POD modes chosen for estimation of the contaminant distribution, and the first 16 modes contain more than 99% of the total system energy. As shown in Figure 6, the gappy POD method exhibits a high reconstruction accuracy and performs especially well in estimating contaminant concentrations higher than 0.02 ppm. This is because 16 POD modes capture a sufficiently detailed structure for reconstruction of the contaminant dispersion process. A difference between the gappy POD reconstruction and the CFD simulation can be observed at t = 20 s, because it is difficult to reconstruct the contaminant distribution during this period of dramatic variations in the airflow field.
For further evaluation of the gappy POD method, the estimated contaminant concentration along the line x = 1.5 m, y = 1 m is compared (Figure 7). The difference between the POD reconstruction and the CFD simulation at t = 20 s, which is also visible in Figure 6, is due to an airflow pattern slightly different from that of the snapshots. However, gappy POD performs well in most conditions, because the airflow pattern in our test case shares similar coherent structures with the snapshots in most conditions. Moreover, Sempey et al. confirmed that the POD method performs well in reconstructing system dynamics with boundary conditions different from the conditions used for constructing snapshots [42].
The reconstruction accuracy could be improved by increasing the number of dominant POD modes. This is because more POD modes contribute more detailed information for the description of the pollutant distribution. In our study, we found that 16 dominant POD modes are sufficient for estimation of the contaminant distribution in the enclosed space with sensor placement constraints.

Discussions and Conclusions
Real-time estimation of the contaminant distribution is essential for ventilation control and indoor air quality management. Moreover, it is known that automatic control is of great importance for improving a system's energy efficiency in engineering applications [52][53][54][55]. However, it is challenging to reconstruct spatiotemporal distributions of contaminants in an enclosed space, due to the nonuniform distribution and nonlinear variations of indoor contaminants. As an essential part of active control, the sensor measurements only provide limited information about contaminant concentrations near sensor locations. Usually, a large number of sensors is required to guarantee the satisfactory performance of HVAC control, which might result in high costs for constructing and maintaining a sensor network in practice. Facing such a dilemma, gappy POD offers a solution to provide real-time contaminant distributions with a limited number of sensor measurements.
In this study, a gappy POD method for the real-time reconstruction of pollutant distribution in an enclosed space is proposed by combining the POD method with a limited number of sensor measurements. In fact, the spatial distribution of indoor contaminants is often represented by a combination of a few dominant patterns, and this inherent property enables reconstruction of the contaminant distribution with sparse sensor observations. Moreover, our study gives the optimal sensor placement based on a quantitative metric in order to maximize the reconstruction accuracy. The sensor placement constraints are also considered during the sensor design process. It should be noted that the reconstruction accuracy is limited by the number of dominant POD modes chosen for estimation of the pollutant distributions. For example, the first six POD modes only contain about 95% of the total system energy, while the first 16 POD modes contain about 99% of the total system energy. According to our study, reconstruction based on 16 POD modes is sufficient for accurate reconstruction of the pollutant distribution in the enclosed space.
For feedback control of the HVAC, gappy POD is able to provide a reliable estimation of three-dimensional data for the indoor environment with a high fidelity and low computational cost. Considering the low-order nature of the POD model, a closed-loop control might be achieved by controlling the coefficient of the POD modes, and the estimated POD coefficients could provide the necessary information for driving the HVAC actuators. Further work will include real-time control of the dynamic system based on the gappy POD method.