Bayesian Calibration for Office-Building Heating and Cooling Energy Prediction Model

Abstract: Conventional building energy models (BEMs) for heating and cooling energy-consumption prediction are not accurate without calibration, and the commonly used manual calibration method demands high expertise from modelers. Bayesian calibration (BC) is a novel method with great potential in BEM, with many successful applications in calibrating unknown parameters and in retrofit analysis. However, there is still a lack of studies on prediction-model calibration. There are two main challenges in developing a calibrated prediction model: (1) poor generalization ability and (2) limited data availability. To tackle these challenges and create an energy prediction model for office buildings in Guangdong, China, this paper characterizes and validates the BC method to calibrate a quasi-dynamic BEM with a comprehensive database including geometry information for various office buildings. A case study then analyzes the effectiveness and performance of the calibrated prediction model. The results show that BC effectively and accurately calibrates the quasi-dynamic BEM for prediction purposes. The calibrated model accuracy (monthly CV(RMSE) of 0.59% and hourly CV(RMSE) of 19.35%) meets the requirements of ASHRAE Guideline 14. With the calibrated prediction model, this paper provides a new way to improve the data quality and integrity of existing building energy databases, which will further benefit their usability.


Introduction
With rapid urbanization, building energy consumption has attracted increasing attention globally. Building energy modeling (BEM) is widely used as the most effective method to assess building energy efficiency. Generally, building energy models lack consideration of the urban context and are typically based on over-simplifications and strong assumptions, leading to prediction deviations of up to 90% [1]. To improve the confidence level of models, calibration is an essential step in the modeling process.
Currently, two BEM calibration approaches can be adopted: the trial-and-error approach and Bayesian calibration (BC) [2,3]. The trial-and-error approach iteratively adjusts unknown input parameters and assumptions [4]. It poses a significant challenge for the modeler because the accuracy of the model relies heavily on the modeler's experience and knowledge of buildings and modeling. For the same reason, this method requires considerable time and effort from both the user and the computer to achieve an accurate result [5]. Additionally, the model is adjusted to match historical energy data, which weakens its ability to forecast future energy use.
The BC method alleviates the problems presented by the trial-and-error method. Instead of adjusting model parameters and assumptions to match predicted values with measured data, the BC framework attempts to understand the model uncertainty and retain the consistency between simulation data and measured data [6]. It is an automated calibration technique based on quantifiable evidence rather than experts' intervention. This paper makes two main contributions. (1) It develops a Bayesian calibrated model with generalization ability: the model can not only reproduce the energy profiles of the original buildings (included in the training set) but also predict the energy consumption of other buildings (not included in the training set) from their space layout and geometry information. (2) Instead of dynamic simulation tools, this paper investigates the effectiveness of BC in giving results within the ASHRAE-required margin when dealing with models with fewer input parameters from the quasi-dynamic simulation method. With the improvement in generalization ability, this calibrated simulation method offers a new way of solving the data quality and integrity issues of existing building energy databases, which will greatly improve the usability of existing big-data resources.
In summary, this paper shows that the quasi-dynamic simulation method can achieve an accuracy level similar to that of the dynamic simulation method after BC. It also effectively improves the generalization ability of the calibrated model by using a comprehensive dataset including geometry information, so that the model can be applied to general office buildings. With this improvement, this paper also contributes to improving the quality and integrity of existing building energy databases.

Research Framework
This paper aims to develop a dedicated BC model based on a quasi-dynamic BEM, which can (1) deal with diverse, inaccurate geometry parameters and material data, and (2) predict the heating and cooling energy consumption of office buildings. To achieve this objective, this study collected energy data and building information from 10 office buildings located in Guangdong, China. Another building (with the same data structure) was recorded in a separate dataset to carry out a case study testing the performance and feasibility of the BC prediction model. These buildings were selected according to building scale, including both open and unit office buildings. A total of 8 key parameters (shown in Table 1) were surveyed to generate the building geometry. Two years of hourly historical energy data and the corresponding weather data were gathered to form the energy dataset. With the collected information, this paper calibrates the quasi-dynamic BEM ISO 13790 [29] with the BC method [6].
The preparation work has three parts:
1. A pre-calibration primary model based on ISO 13790 was developed;
2. A comprehensive energy dataset containing simulation data of the primary model and real-world measured data was established;
3. A dedicated hypothesis for Bayesian inference was made for the proposed calibration.
The main calibration contains five steps:
1. Use sensitivity analysis (SA) to identify sensitive parameters and construct the calibration parameter Θ;
2. Assume prior probability density functions at the weakly informative level;
3. Combine measured data z (collected from office buildings in Guangdong, China) and simulation data y (generated by ISO 13790) in a Gaussian process (GP) emulator to generate the prior distribution;
4. Explore the posterior distributions of the calibration parameter Θ, the location parameter β, and the hyperparameter φ using the Gibbs sampling approach for MCMC sampling;
5. Evaluate the calibrated model for both the convergence of multiple MCMC chains and the prediction accuracy.
Finally, a case study was carried out to analyze the calibrated model. The whole methodology framework of applying BC to ISO 13790 is shown in Figure 1.

Simulation Tool: ISO 13790
This paper uses ISO 13790 instead of EnergyPlus as the simulation tool to mitigate the problem of over-parameterization and reduce the difficulty of data collection. This step aims to develop a primary energy prediction tool using ISO 13790 and to identify the essential parameters in this model for generating high-quality simulation data. The ISO 13790 calculation method [29] iterates the building thermal-mass temperature in specific time steps by modeling the balance of the building's thermal gains and losses. The building is simulated as an equivalent resistance-capacitance network with three nodes (surface node, internal air node, and thermal mass node), known as the 5R1C model (five resistances, one capacitance). The schematic of the 5R1C model is shown in Figure 2.
Based on the schematic of the 5R1C model (Figure 2), the heating/cooling energy load calculation requires 5 heat transfer coefficients (H_ve, H_tr,is, H_tr,w, H_tr,em, and H_tr,ms), 2 heat gains (φ_sol and φ_int), and 4 temperatures (θ_H,set, θ_C,set, θ_ext, and θ_t−1). The calculation model for ISO 13790 is shown in Figure 3. In the first part, necessary building properties are used in dataset establishment, and the specific values are described in Section 2.3, Table 1. In the second part, 19 input variables are used in the SA, and the relevant information is introduced in Section 2.5, Table 2. In the third part, process variables are involved in the simulation, but these calculation processes can be regarded as black boxes and are not discussed in the calibration process. The meanings of the process variables are shown in Table 3. Finally, the model outputs the simulation result as heating and cooling energy consumption.
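To make the RC-network idea concrete, the sketch below steps a drastically simplified single-node heat balance through one hour. It is an illustration only, not the ISO 13790 algorithm: the three nodes and five conductances of the 5R1C model are collapsed into one node and two conductances, and all numeric values (H_ve, H_tr, C_m, the setpoint) are placeholder assumptions, not the paper's inputs.

```python
def rc_step(theta_m_prev, theta_ext, phi_int, phi_sol,
            H_ve=120.0, H_tr=300.0, C_m=1.65e8,
            theta_C_set=24.0, dt=3600.0):
    """One hourly step of a single-node RC heat balance in the spirit of
    the 5R1C model. Ventilation (H_ve) and transmission (H_tr) are in
    W/K, C_m is the thermal-mass capacitance in J/K. Returns the updated
    thermal-mass temperature [degC] and the ideal cooling load [W] that
    holds the cooling setpoint."""
    # Net heat flow into the mass: internal + solar gains plus
    # ventilation/transmission exchange with the outdoor air.
    gains = phi_int + phi_sol + (H_ve + H_tr) * (theta_ext - theta_m_prev)
    theta_m = theta_m_prev + dt * gains / C_m   # free-floating update
    # Ideal cooling: remove exactly the energy above the setpoint.
    load = max(0.0, (theta_m - theta_C_set) * C_m / dt)
    return min(theta_m, theta_C_set), load

# A hot summer hour: the zone drifts above 24 degC and needs cooling.
theta, load = rc_step(theta_m_prev=26.0, theta_ext=33.0,
                      phi_int=8000.0, phi_sol=5000.0)
```

Iterating such a step over 8760 hours, with the full set of nodes and conductances, is what produces the hourly simulation outputs y used later in the calibration.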


Dataset Establishment
The entire dataset contains two parts: the measured dataset D_m and the simulation dataset D_s. The measured dataset D_m is a combination of the variable inputs matrix x and the calibration target z, including the building design information and corresponding energy data of 10 office buildings collected in Guangdong, China. The specific index is shown in Table 1.
The variable inputs matrix x = (x_1, x_2, ..., x_q1), which contains q_1 non-sensitive parameters, was determined by collecting building design information. The calibration target, defined as z = (z_1, z_2, ..., z_n)^T, contains n measured energy data points, and each element of z is one hour of measured heating and cooling energy consumption. According to Table 1, the energy record for each building is a 2 × 8760 matrix (two years of hourly data). With the 10 office buildings integrated to compose D_m, the length of z is n = 10 × 2 × 8760 = 175,200. For better understanding, the data points in z are illustrated in Figure 4.
Each row of the measured dataset D_m therefore has the form (x_1,i, ..., x_q1,i, z_i). The simulation dataset D_s has three parts: the variable inputs matrix x, the calibration input Θ, and the corresponding simulation results y. The x and Θ come from the input variables of the ISO 13790 calculation model, divided according to the sensitivity analysis result. The variable inputs x = (x_1, x_2, ..., x_q1), which include q_1 non-sensitive parameters, take determined values for each simulation run. The calibration input Θ = (Θ_1, Θ_2, ..., Θ_q2), which includes q_2 sensitive parameters, takes unknown values that need to be calibrated. The simulation result is defined as y = (y_1, y_2, ..., y_N), covering N simulation runs (N = 100 in this study). The mapping from simulation input to result is denoted η(x, Θ), so that y_j = η(x_j, Θ_j). Each row of the simulation dataset D_s has the form (x_1,j, ..., x_q1,j, Θ_1,j, ..., Θ_q2,j, y_j).
Therefore, the dataset to be analyzed is the combination of y and z, d = (y^T, z^T)^T.
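The dataset layout described above can be sketched as follows. Random values stand in for the real measured and simulated data; the sizes q_1 = 9, q_2 = 10, N = 100, and n = 10 × 2 × 8760 follow the text.

```python
import numpy as np

# Sizes from the paper: q1 non-sensitive inputs, q2 calibration inputs,
# n measured hours (10 buildings x 2 years x 8760 h), N simulation runs.
q1, q2, n, N = 9, 10, 175_200, 100
rng = np.random.default_rng(0)

x_m = rng.random((n, q1))        # measured-building inputs (normalized)
z = rng.random(n)                # measured hourly energy -> calibration target
D_m = np.column_stack([x_m, z])  # rows: (x_1,i ... x_q1,i, z_i)

x_s = rng.random((N, q1))        # simulation inputs
theta = rng.random((N, q2))      # sampled calibration inputs
y = rng.random(N)                # ISO 13790 outputs, y_j = eta(x_j, theta_j)
D_s = np.column_stack([x_s, theta, y])  # rows: (x, theta, y)

d = np.concatenate([y, z])       # combined data vector analyzed in BC
```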

Bayesian Inference
BC [6] represents all unknown inputs as the parameter Θ. The posterior distribution of Θ is calculated using the measured data, and the uncertainty of Θ is quantified. Then, the calibrated model can obtain the prediction distribution based on the posterior distribution of Θ. The preparation of calibration includes: (1) model hypothesis and (2) model analysis.

Model Hypothesis
Based on Kennedy and O'Hagan's formulation, the uncertainty of ISO 13790 can come from four aspects: parameter uncertainty, model inadequacy, residual variability, and observation error. The relationship between y = η(x, Θ) and z can be represented as Equation (3):

z_i = ζ(x_i) + e_i,  with  ζ(x_i) = ρ·η(x_i, Θ) + δ(x_i),    (3)

where ζ(x_i) is the true process for the i-th set of variable inputs x_i; e_i is the observation error for the i-th sample; ρ is the regression parameter of the simulation model; and δ(x_i) is the model inadequacy function, which is independent of the simulation result η(x_i, Θ).

Model Analysis
The observation error e_i, which accounts for the residual variability as well, can be assumed to follow an independent normal distribution N(0, λ_e), where λ_e is an unknown variable and one of the hyperparameters to be estimated in the next step. It is not related to the variable input x and is the same for the true process ζ(x_i) and the simulation process η(x_i, Θ), so that Equation (3) can be simplified as:

z_i = ρ·η(x_i, Θ) + δ(x_i) + e_i,  e_i ~ N(0, λ_e).    (4)

As in Equation (3), η(x_i, Θ) and δ(x_i) are independent in Equation (4).

Step 1: Sensitivity Analysis
Under ideal conditions, all the uncertain parameters in the BEM calculation model should be calibrated [20]. However, due to data quantity/quality limitations, it is common practice to identify the uncertain parameters using an SA technique [30]. Tian [31] conducted a comprehensive review of various SA strategies in BEM and stated that the input parameters and their ranges should be defined based on the research purpose. This study chose the Sobol method [32] for ISO 13790 to determine which parameters need to be calibrated. All inputs are assumed to be independently and uniformly distributed within their ranges, which are determined from the measured building properties. The ISO 13790 model is treated as a black-box function with a d-dimensional input vector X_M and a scalar output Y_M (the subscript M denotes the model).
A total of 19 input variables for the ISO 13790 calculation model and their ranges are shown in Table 2. In the first attempt, the SA was conducted with d = 19. For a more detailed analysis and further clarification of the influence of each variable, the two most influential parameters, GFA and θ_t−1, were used to divide the cases. The SA was then conducted again for different building scales (GFA = 10 k, 50 k, 100 k, 150 k (m²)) and different initial temperatures (θ_t−1 = 10, 20, 30 (°C)). Therefore, the dimension of the input vector X_M is 17 (d = 17) for the second SA.
The sensitivity of the input parameters was estimated by the first-order index S_i and the total-effect index S_Ti, calculated as follows:

First-order index:  S_i = Var_{X_i}(E_{X_∼i}(Y|X_i)) / Var(Y)

Total-effect index:  S_Ti = E_{X_∼i}(Var_{X_i}(Y|X_∼i)) / Var(Y)

where X_i is the i-th element of the vector X_M; X_∼i denotes all elements of X_M except X_i; and Y is the heating and cooling energy consumption calculated from the input vector X_M. With the (quasi) Monte Carlo method, both indices can be estimated from generated sample matrices [33]. Two N × d sample matrices, X_A and X_B, were first generated with a Sobol sequence. Then, d further N × d matrices X_AB^(i) (with i = 1, 2, ..., d) were built by taking the i-th column from X_B and the remaining columns from X_A. Correspondingly, Y_A, Y_B, and Y_AB^(i) were calculated by running ISO 13790.
For the S_i calculation, Var_{X_i}(E_{X_∼i}(Y|X_i)) can be estimated as

Var_{X_i}(E_{X_∼i}(Y|X_i)) ≈ (1/N) Σ_{j=1}^{N} Y_B,j (Y_AB^(i),j − Y_A,j).

For the S_Ti calculation, E_{X_∼i}(Var_{X_i}(Y|X_∼i)) can be estimated as

E_{X_∼i}(Var_{X_i}(Y|X_∼i)) ≈ (1/(2N)) Σ_{j=1}^{N} (Y_A,j − Y_AB^(i),j)².

The S_i and S_Ti were compared with a threshold, and the parameters exceeding the threshold were defined as sensitive parameters that need to be calibrated.
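These estimators can be sketched in a few lines of Python. The minimal implementation below uses plain Monte Carlo sampling in place of a true Sobol low-discrepancy sequence, and a toy linear function stands in for ISO 13790; both are simplifying assumptions for illustration.

```python
import numpy as np

def sobol_indices(func, bounds, N=4096, seed=0):
    """First-order (S_i) and total-effect (S_Ti) Sobol indices via the
    Saltelli (2010) and Jansen estimators."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = len(lo)
    A = lo + (hi - lo) * rng.random((N, d))   # sample matrix X_A
    B = lo + (hi - lo) * rng.random((N, d))   # sample matrix X_B
    yA, yB = func(A), func(B)
    var_y = np.var(np.concatenate([yA, yB]))
    S, ST = np.empty(d), np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                   # X_AB^(i): column i from X_B
        yABi = func(ABi)
        S[i] = np.mean(yB * (yABi - yA)) / var_y          # Saltelli estimator
        ST[i] = 0.5 * np.mean((yA - yABi) ** 2) / var_y   # Jansen estimator
    return S, ST

# Toy stand-in for ISO 13790: X1 dominates, X2 is weak, X3 is inert.
f = lambda X: X[:, 0] + 0.1 * X[:, 1]
S, ST = sobol_indices(f, bounds=[(0, 1)] * 3)
```

For this toy model the indices behave as expected: the dominant X1 scores near 1, the weak X2 near 0.01, and the inert X3 falls below any reasonable threshold, which is exactly the screening logic used in Step 1.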

Step 2: Assume Prior Distributions of Calibrated Parameters
Three informative levels (non-informative, weakly informative, and specific informative) can be used to assume prior distributions. Given the information available in the measured dataset, there is no strong evidence that any calibrated parameter is close to a specific value. Therefore, weakly informative priors (a normal distribution with a standard deviation of 0.2) are recommended rather than highly constrained, specific informative priors [34]. These prior probability distributions were defined on the normalized range [0, 1] of the calibration parameters Θ. The mean value of each prior distribution is determined by the average value of the normalized calibrated variable.
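A weakly informative prior of this kind can be sketched with SciPy. Truncating the normal to the normalized range [0, 1] is an assumption made here for illustration; the paper only states the normal form and the standard deviation of 0.2.

```python
import numpy as np
from scipy.stats import truncnorm

def weak_prior(mean, sd=0.2, lo=0.0, hi=1.0):
    """Weakly informative prior: N(mean, sd) truncated to the
    normalized parameter range [lo, hi]."""
    a, b = (lo - mean) / sd, (hi - mean) / sd   # standardized bounds
    return truncnorm(a, b, loc=mean, scale=sd)

prior = weak_prior(0.5)              # e.g. a normalized setpoint parameter
samples = prior.rvs(size=10_000, random_state=0)
```

The sd = 0.2 prior covers most of [0, 1] but still concentrates mass around the mean, which is the "weakly informative" compromise described above.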

Step 3: Prior Distribution (Gaussian-Process Emulator)
A Gaussian process based on the multivariate normal distribution was adopted to represent η(x, Θ) and δ(x) as two normal distributions. The parameters of the multivariate Gaussian distribution are the mean vector m and the covariance function c.
Adopting a linear model with a weak prior distribution, m can be expressed as a linear function of the inputs with coefficient vectors β_η and β_δ. For brevity, β_η and β_δ were combined into the location parameter β = (β_η^T, β_δ^T)^T. The covariance function c can be calculated with the Gaussian product correlation form:

c_η((x, Θ), (x′, Θ′)) = (1/λ_η) ∏_{k=1}^{q1} β_η,k^{4(x_k − x′_k)²} ∏_{l=1}^{q2} β_η,q1+l^{4(Θ_l − Θ′_l)²}

c_δ(x, x′) = (1/λ_δ) ∏_{k=1}^{q1} β_δ,k^{4(x_k − x′_k)²}

Then, the covariance function of z can be specified as

cov(z_i, z_j) = c_η((x_i, Θ), (x_j, Θ)) + c_δ(x_i, x_j) + (1/λ_e)·I(i = j).

This formulation introduces several unknown hyperparameters into the calibration and inference process: the precision hyperparameters (λ_η and λ_δ) and two sets of correlation hyperparameters (β_η and β_δ). The parameters related to the covariance functions (λ_η, λ_δ, β_η, and β_δ) were combined and denoted as the hyperparameter ϕ.
All remaining unknown quantities (ρ, λ_e, and ϕ) are combined as the hyperparameter φ. Therefore, the complete set of parameters comprises the calibration parameter Θ, the location parameter β, and the hyperparameter φ. It is reasonable to expect the calibration parameter Θ to be independent of the others, so the joint prior distribution factorizes as p(Θ, β, φ) = p(Θ)·p(β, φ).
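One common choice in BC studies expresses the GP correlation as a product over input dimensions, c(x, x′) = (1/λ) ∏_k β_k^{4(x_k − x′_k)²} with β_k ∈ (0, 1); the sketch below implements that form under the assumption of normalized inputs and purely illustrative hyperparameter values.

```python
import numpy as np

def gp_cov(X1, X2, beta, lam):
    """Product-form GP covariance: c(x, x') = (1/lam) * prod_k
    beta_k ** (4 * (x_k - x'_k)**2). Each beta_k in (0, 1) controls the
    correlation length along dimension k; lam is a precision, so the
    marginal variance on the diagonal is 1/lam."""
    # Pairwise squared distances per input dimension: shape (n1, n2, d).
    D2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return np.prod(beta ** (4.0 * D2), axis=-1) / lam

X = np.random.default_rng(0).random((5, 3))       # 5 normalized input points
C = gp_cov(X, X, beta=np.full(3, 0.5), lam=2.0)   # emulator covariance at X
```

Smaller β_k means faster decay of correlation along dimension k; β_k near 1 makes the output nearly insensitive to that input, which is why these correlation hyperparameters are themselves inferred during calibration.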

Step 4: Posterior Distribution (Gibbs Sampling Approach)
To sample from the multivariate posterior distribution, an MCMC algorithm was used to compute the probability density function of the calibration parameters. The Gibbs sampling approach is one such MCMC algorithm. To assess the posterior distribution p(Θ, β, φ | d), a high-dimensional Gibbs sampling algorithm was applied to obtain the sample sequence, which forms a Markov chain. Because a Gibbs sampler updates one parameter (or block of parameters) at a time, the chain moves along the coordinate axes of the sample space; the resulting samples converge to the common stationary distribution p(Θ, β, φ | d).
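The axis-by-axis moves of Gibbs sampling can be illustrated on a toy target whose full conditionals are known in closed form, a standard bivariate normal. The paper's actual sampler runs over Θ, β, and φ; this is only a minimal illustration of the mechanism.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=5000, burn=500, seed=0):
    """Toy Gibbs sampler for a standard bivariate normal with correlation
    rho, alternating the closed-form full conditionals
    x | y ~ N(rho*y, 1 - rho^2) and y | x ~ N(rho*x, 1 - rho^2)."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho ** 2)
    x = y = 0.0
    draws = []
    for t in range(n_iter):
        x = rng.normal(rho * y, sd)  # move parallel to the x-axis
        y = rng.normal(rho * x, sd)  # move parallel to the y-axis
        if t >= burn:                # discard burn-in draws
            draws.append((x, y))
    return np.array(draws)

samples = gibbs_bivariate_normal(rho=0.8)
```

After burn-in, the retained draws reproduce the target's moments, which is the sense in which the chain "converges to a common stationary distribution."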

Step 5: Evaluation and Validation
The performance of the calibrated model needs to be evaluated and validated against two criteria: the convergence of the MCMC chains, and the agreement between the model-predicted energy consumption and the actual measured energy consumption.

Evaluation of Convergence for Multiple MCMC Chains
Gelman and Rubin's approach [35] was applied to assess the convergence of multiple MCMC chains. Convergence is indicated by the potential scale reduction factor R̂. For m chains of n simulated observations each, R̂ was calculated from the between-sequence variance B and the within-sequence variance W:

σ̂²₊ = ((n − 1)/n)·W + (1/n)·B,  R̂ = sqrt(σ̂²₊ / W),

where σ̂²₊ is the pooled estimate of the posterior variance. When R̂ is close to 1, the potential for σ̂²₊ to decrease further is limited, and each of the m chains of n simulated observations is close to the target distribution. For convergence, R̂ should be within approximately 1 ± 0.1.
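The R̂ computation can be sketched directly from the formula above (a minimal version, without the split-chain refinement used in modern samplers):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for an (m, n) array of m
    chains with n draws each."""
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)   # between-sequence variance
    W = chains.var(axis=1, ddof=1).mean()     # within-sequence variance
    var_plus = (n - 1) / n * W + B / n        # pooled variance estimate
    return float(np.sqrt(var_plus / W))

rng = np.random.default_rng(1)
mixed = rng.normal(size=(4, 2000))                      # four well-mixed chains
stuck = mixed + np.array([[0.0], [0.0], [0.0], [5.0]])  # one chain far away
```

Four chains sampling the same distribution give R̂ ≈ 1, while shifting one chain away from the others inflates the between-sequence variance and pushes R̂ well above the 1 ± 0.1 acceptance band.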

Validation of Prediction Accuracy
The performance of the model was evaluated using the mean bias error (MBE), the coefficient of variation of the root mean square error (CV(RMSE)), and the mean absolute percentage error (MAPE). A total of 25% of the samples in the measured dataset D_m were held out as the testing dataset and used for validation. With y_i the predicted value, z_i the measured value, z̄ the mean of the measured values, and n the number of samples:

MBE = [Σ_{i=1}^{n} (y_i − z_i) / Σ_{i=1}^{n} z_i] × 100% is the normalized sum of the deviations and reflects the overall over-estimation or under-estimation of the model.

CV(RMSE) = [sqrt(Σ_{i=1}^{n} (y_i − z_i)² / n) / z̄] × 100% is determined by the RMSE, which is not affected by the sign of the deviation; it reflects the variability in the agreement between the simulated results and the measured values.

MAPE = [(1/n) Σ_{i=1}^{n} |y_i − z_i| / z_i] × 100% reflects the percentage error of each prediction.

Model performance was evaluated on different time scales (hourly and monthly). For each time scale, a dedicated statistical threshold was set in line with the ASHRAE guideline [36]; details are shown in Table 4.

Table 4. Threshold limits of statistical metrics for model evaluation [36].
Time scale | CV(RMSE) | MBE
Hourly     | 30%      | 10%
Monthly    | 15%      | 5%
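The three metrics can be computed directly from their definitions. Note that ASHRAE Guideline 14's NMBE and CV(RMSE) include a degrees-of-freedom correction (n − p) that is omitted here for simplicity; the toy measured/predicted series is invented for illustration.

```python
import numpy as np

def mbe(pred, meas):
    """Mean bias error, %: normalized sum of deviations (sign-sensitive)."""
    return 100.0 * np.sum(pred - meas) / np.sum(meas)

def cv_rmse(pred, meas):
    """Coefficient of variation of the RMSE, % (sign-insensitive)."""
    rmse = np.sqrt(np.mean((pred - meas) ** 2))
    return 100.0 * rmse / np.mean(meas)

def mape(pred, meas):
    """Mean absolute percentage error, %: per-point percentage error."""
    return 100.0 * np.mean(np.abs(pred - meas) / meas)

z = np.array([10.0, 12.0, 8.0, 11.0])   # measured hourly energy (toy)
y = np.array([10.5, 11.5, 8.0, 11.4])   # predicted hourly energy (toy)
```

MBE can hide errors that cancel in sign, which is why the validation pairs it with CV(RMSE) and MAPE on each time scale before comparing against the Table 4 thresholds.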

Result of Step 1: Input Variable and Calibration Variable
In the first attempt, SA on the 19 input variables of the calculation model (numbered as in Table 2) was conducted with the Sobol method, and the results are shown in Figure 5. It can be seen from Figure 5 that the S_i and S_Ti of variable 1 (GFA) and variable 14 (θ_t−1) are so large that they make it difficult to analyze the sensitivity of the remaining 17 variables. Therefore, GFA and θ_t−1 were used to make a reasonable division of the cases and to conduct a further, more detailed SA.
For the remaining 17 parameters shown in Table 2, SA was conducted under 12 cases with different building scales (GFA = 10 k, 50 k, 100 k, 150 k (m²)) and different initial temperatures (θ_t−1 = 10, 20, 30 (°C)); the results (S_i and S_Ti) are shown in Figure 6. This paper defines the threshold as 0.05 (the dashed line in the figure), and the parameters exceeding the threshold are defined as sensitive parameters. As the GFA of a building can always be obtained as a true value, it is not considered a calibrated variable. Therefore, the last-hour room temperature θ_t−1 and parameters 1 (L), 2 (W), 4 (WWR), 10 (DHI), 11 (DNI), 12 (θ_ext), 13 (θ_H,set), 14 (θ_C,set), and 16 (HourNum) were together defined as Θ = (Θ_1, Θ_2, ..., Θ_10) for BC. According to the results of the SA, the 19 input parameters of ISO 13790 were re-divided into the input variable x (with q_1 = 9) and the calibration variable Θ (with q_2 = 10). Detailed information is displayed in Table 5a and Table 5b, respectively.

Table 5. (a) Input variables: x_1 GFA; x_2 Day of year; x_3 Story height; x_4 Exterior window U-value; x_5 Exterior wall U-value; x_6 Roof U-value; x_7 Orientation; x_8 Mechanical ventilation; x_9 Internal heat gain. (b) Calibration variables: Θ_1 Last-hour room temperature; Θ_2 Hour of day; Θ_3 External temperature; Θ_4 Diffuse horizontal irradiance; Θ_5 Direct normal irradiance; Θ_6 Building length; Θ_7 Building width; Θ_8 Average window-wall ratio; Θ_9 Heating setpoint; Θ_10 Cooling setpoint.

Result of Step 2: Prior Distributions of Calibrated Model Parameters
The prior distribution functions of the calibrated parameters were assumed at the weakly informative level (a normal distribution with a standard deviation of 0.2). These prior probability distributions were defined on the normalized range [0, 1] of the calibration parameters Θ. The mean value of each prior distribution was determined by the average value of the normalized calibrated variable. The detailed information is shown in Table 6 (for example, Θ_9, heating setpoint: N(0.5, 0.2); Θ_10, cooling setpoint: N(0.5, 0.2)).

Result of Step 3: Prior Distributions of Hyperparameters in Gaussian Process Emulator
The hyperparameters of the GP model include β_δ,1, β_δ,2, ..., β_δ,q1; β_η,1, β_η,2, ..., β_η,q1; β_η,q1+1, β_η,q1+2, ..., β_η,q1+q2; λ_η; λ_δ; and λ_e. The detailed prior distribution assumptions for each hyperparameter are given in Table 7. Figure 7 shows the prior and posterior distributions of the 10 calibration variables. It can be seen that using the weakly informative prior distribution can initially determine the form of the posterior distribution. Meanwhile, the shift of the posterior distribution away from the prior distribution indicates that the measured data affect the posterior distribution. This good combination of information from the prior distribution and the measured data enables the prior distribution to exclude unreasonable parameter values without being strong enough to exclude meaningful values in the measured dataset.

Result of Step 5: Validation of Model Accuracy and Convergence
The Gelman-Rubin statistic R̂ was used to check for adequate convergence, and its value should be approximately 1 in this model. The R̂ values are within 1 ± 0.1 for all calibration parameters and hyperparameters of the GP model, which means every parameter in the model is well converged. Regarding model accuracy validation, the training-set and testing-set results are shown in Table 8. According to the thresholds set by ASHRAE Guideline 14, the model can be considered well calibrated and satisfies the accuracy requirement. Furthermore, the low CV(RMSE) and MBE indicate good consistency between the calibrated predictions and the measurements.

Discussion and Case Study
To demonstrate the performance of the calibrated model, historical energy data from an additional office building (included in neither the training set nor the test set) were used as a new dataset to carry out a case study. It is an office building with a GFA of 63,960 m². The specific properties of this building are shown in Table 9. The overall performance of the calibrated model is shown in Table 10. As can be seen from the table, the monthly CV(RMSE) is 0.73% and the hourly CV(RMSE) is 20.97%, markedly improved from 33.13% and 153.98%, respectively, in the uncalibrated model. Furthermore, it satisfies the ASHRAE Guideline 14 accuracy requirements. Figure 8 compares the monthly predictions before and after calibration with the measured data, which provides a basic understanding of the calibration performance. Figure 9 shows the corresponding monthly absolute percentage error (APE) for a more detailed illustration. As shown in Table 10, the MAPE of the calibrated model's monthly prediction is 0.64%, down from 30.34% in the uncalibrated model. In addition, Figure 9 shows that all APE values are within 2.5%, which means the APE performance of the model is stable for every month. It is worth noting that the prediction error for February is slightly higher than that of the other months, which can also be seen in Figure 10. It is speculated that the number of samples for mixed cooling and heating months is insufficient in the sample data, so the calibrated model cannot make a perfectly accurate prediction. For hourly prediction, the annual average hourly prediction CV(RMSE) is 20.97% and the MBE is −0.03%, which are in line with ASHRAE Guideline 14. The detailed hourly prediction performance in each month is shown in Figure 10; the hourly prediction CV(RMSE) for all typical days, except 15 February and 15 October, is less than 30%.
The weaker performance in February and October may be attributed to the fast switching between cooling and heating conditions in these two months. The insufficient number of such samples in the dataset makes the model's simulation accuracy for mixed heating and cooling months slightly lower than that for purely heating or cooling months. Figure 11 shows the hourly prediction performance of the calibrated model on the design days. In this case study, the winter design day is 21 January and the summer design day is 21 July. It can be seen from the figure that the calibrated model is very successful in predicting the energy consumption of the summer design day: it corrected the daily maximum and minimum energy consumption and achieved high accuracy, with a CV(RMSE) of 5.55%. The prediction for the winter design day is also greatly improved compared with the uncalibrated model, and the overall trend is correct. Although the biggest difference between prediction and measurement reaches 52.39% (at 14:00), the average CV(RMSE) for the day is still within 30%. Based on the results of the case study, the findings can be summarized as follows: (1) a dataset combining building energy consumption data with building geometry and material information is useful for a BC-calibrated prediction model; (2) the calibrated quasi-dynamic simulation method can predict the energy consumption of buildings not in the training set; (3) the effect of calibration is obvious: it greatly improves the model's performance on all three metrics (MAPE, MBE, and CV(RMSE)), and both the CV(RMSE) and MBE satisfy the requirements of ASHRAE Guideline 14.

Conclusions
This paper provides a well-calibrated office-building prediction BEM using the BC method, based on an established historical energy dataset, including geometry information, for Guangdong, China. It demonstrates the process of applying BC to ISO 13790 with a comprehensive dataset of real energy data from office buildings. Based on the results of the case study, this paper also shows that the prediction results are in line with the requirements for an hourly prediction model in ASHRAE Guideline 14.
Step-by-step results and key findings are listed as follows:
1. The Sobol method was used as the SA method. With a threshold of 0.05, 10 of the 19 input values in the ISO 13790 energy calculation method were determined to be calibration variables.
2. With less confidence in the measured building geometry and material data, the prior probability hypothesis can take a weakly informative level. Taking weakly informative prior distributions leads to a good combination of information from the prior distribution and the measured data, resulting in a proper posterior probability density function with accurate prediction results.
3. A calibrated building energy model was established that meets the accuracy thresholds set by ASHRAE [36], with an hourly prediction CV(RMSE) of 20.97% and an MBE of −0.03%, below the respective 30% and 10% limits.
4. The Gelman-Rubin statistic R̂ was used to evaluate model convergence and to check that the posterior distributions mixed well and gathered on a common stationary distribution.
These findings support the following conclusions. The quasi-dynamic simulation method can achieve an accuracy level similar to that of the dynamic simulation method after BC. The generalization ability of the calibrated model can be improved with a comprehensive database that includes geometry and material information. The new model can be applied to office buildings not included in the training set and can satisfy the prediction requirements set by ASHRAE. With this improvement, this calibrated prediction model can also be applied to improve the quality and integrity of existing building energy databases.