Evaluation of Model Calibration Method for Simulation Performance of a Public Hospital in Brazil

Abstract: This work presents an extensive study on methodologies to calibrate electric energy consumption in buildings. A comparison between several calibration methodologies shows different approaches addressing the same issue, suggesting the lack of a single methodology that is reproducible for every building. Additionally, no methodology fits the Brazilian public building context, which is characterized by the predominance of Unitary Air Conditioning Systems (UACS) and by buildings that have operated for more than 30 years. A new calibration methodology for performance simulation is proposed to deal with such features. The methodology is separated into two evidence-based steps according to the size of the Heating, Ventilation and Air Conditioning (HVAC) systems used to control buildings’ indoor environments: the first step is dedicated to calibrating medium- and large-sized HVAC systems, and the second step is dedicated to calibrating small-sized HVAC systems. The University Hospital of the University of São Paulo (UH-USP) is used as a test bed to implement the proposed methodology. Accuracy indicators show that the methodology is efficient, in terms of both accuracy and the time needed to perform the calibration, for calibrating a simulation of the whole UH-USP building and its Chilled Water Plant on a monthly basis. However, regarding the simulation of UACS, the application of the methodology was inconclusive. This study leaves open the question of the trade-off between increasing model outcome accuracy and the strictness of accuracy indicators applied to UACS and poorly automated large-sized air conditioners.


Introduction
The use of computational programs to simulate energy performance of buildings has already become a consolidated practice [1]. These Building Energy Performance Simulators (BEPS) are capable of representing the behavior of the electrical energy consumption of the different types of equipment used inside buildings and estimating, with low-risk, energy-efficiency potentials [2][3][4]. Among end uses, HVAC systems, due to their increasing participation in energy consumption in buildings, deserve attention, especially in developing countries [5].
The applicability of a building energy model is directly dependent on satisfying a core condition: representing the real phenomena with acceptable accuracy [6]. Improvement of the reliability of the simulation results requires a calibration phase, which consists of the process of adjusting the input parameters to represent the building in the model, to obtain a simulation output that is close to the real measured data [7]. Existing buildings tend to undergo system degradation, changes in use, and unexpected faults over time [8]. Therefore, the calibration phase is a must, especially for aged buildings, such as most Brazilian public buildings that have been in operation for more than two decades [9]. In order to deal with such features, various methodologies have been proposed to calibrate energy modelling tools.
Despite the effectiveness of these methods for the buildings on which they were applied, the specific features of each building need to be considered and represented. Most calibration works deal with more uniform and standardized buildings, such as office buildings, which have energy consumption profiles that are rather predictable over time. Additionally, they mostly have large-sized HVAC systems, which are fueled by gas to meet the heating demand. However, such features do not fit most buildings in regions, like Brazil, that have specific bioclimatic conditions. In Brazilian buildings, according to ELETROBRAS [10], HVAC systems account for 48% of the total energy consumption, and Unitary Air Conditioning Systems (UACS), such as window-type and split air conditioners, are predominant. Conversely, indoor heating equipment is very rare in Brazilian buildings due to the warmer climate [11], and fuel use is limited mainly to cooking and on-site energy generation [12]. As a result, electric energy represented 92.3% of the total energy consumed in Brazilian service-sector buildings in 2019 [13].
Evidence-based methods use information and measurements collected from a real building to perform a calibration of the energy model [14]. The advantage of this methodology is its easy reproducibility. Calibration methodologies that use this approach partially or in full have been reported by González et al. [15], Allesina et al. [16], and Ahmed et al. [17]. Other methods use a data-driven step in order to find the model which most accurately represents the real phenomena [18][19][20]. Recently, studies involving purely data-driven approaches as substitutes for BEPS as a tool to forecast energy consumption have been gaining relevance [21][22][23][24][25], but they are still limited as tools to evaluate different strategies for building retrofit scenarios [26].
In the work reported here, a new calibration methodology is developed to address typical building features prevalent in regions with climatic conditions such as those of Brazil, with a predominance of UACS and buildings with operation times of more than two decades. In order to demonstrate the efficacy of this methodology, it is used to calibrate the hourly electric energy consumption of a teaching hospital building in Brazil: the University Hospital of the University of São Paulo.
In order to proceed with the introduction, it is necessary to present a state-of-the-art review of the two main aspects of BEPS calibration: (1) statistical indicators which describe the accuracy of a simulation model and (2) the calibration methodologies available in the literature and their limitations when used in the context of Brazilian public buildings. The proposed methodology steps are presented in Section 2. The case study building is then described and evaluated in Section 3. The final section is dedicated to implementing and evaluating the outcomes of the proposed methodology.

Accuracy Evaluation of Building Energy Models
Given the impossibility of collecting all possible input parameters to be entered into simulation tools, certain simplifications must be implemented. The need to adopt such simplifications is part of the very design of simulation models [27]. The simulation output should reach a minimum accuracy level, as part of the measurement and verification (M&V) plan. Thus far, three guidelines for M&V have been provided, which recommend the use of statistical indicators for evaluating simulation accuracy.
The first of these is the Mean Bias Error (MBE), calculated by Equation (1). Its value expresses the difference, positive or negative, between the measured and simulated data points [28]. It is not recommended to use the MBE as a stand-alone index, because it is vulnerable to the compensation effect, when positive and negative values contribute to reducing the final MBE value [7,28,29]. Thus, to eliminate this effect, the coefficient of variation of root mean square error (CV RMSE) (Equation (2)) is used.
where s_i indicates an element of the vector of simulation results and m_i indicates the element with the same index in the vector of measurement data; i is the index ranging from 1 to N, where N is the size of each vector, which must be the same for both; and m̄ represents the mean of the measurement vector. For an evaluation over one year, N is at most 8760 at hourly resolution, 12 at monthly resolution, and 1 at annual resolution.
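Equations (1) and (2) are referred to in the text above. A form consistent with the symbol definitions, and commonly used in M&V guidelines, is sketched below, with \bar{m} denoting the mean of the measurement vector. The sign convention (measured minus simulated) and the use of N rather than a degrees-of-freedom correction in the denominator are assumptions.

```latex
\begin{align}
\mathrm{MBE}\,(\%) &= \frac{\sum_{i=1}^{N} (m_i - s_i)}{\sum_{i=1}^{N} m_i} \times 100 \tag{1} \\
\mathrm{CV\,RMSE}\,(\%) &= \frac{1}{\bar{m}} \sqrt{\frac{\sum_{i=1}^{N} (m_i - s_i)^2}{N}} \times 100 \tag{2}
\end{align}
```

Because the differences in Equation (1) keep their sign, over- and under-predictions can cancel each other; squaring them in Equation (2) removes that cancellation.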

Review of Existing Calibration Methodologies and Case Studies
To compare several methodologies, five aspects were defined:
• Approach type: according to the definition provided by Coakley et al. [7]: manual, when model adjustments are all manually made by the user, and automated, when the approach involves at least one step not directly driven by the user.
• Guidance level: evaluates the extent to which the proposed methodology is explained. The reproducibility of the proposed methodologies may be constrained if they are not thoroughly explained. Detailed types are papers dedicating exclusive sections to providing further information regarding the methodology's steps, easing its implementation by less experienced practitioners. Conversely, certain papers, denominated as case studies, emphasize the implementation of the methodology over its explanation. Finally, generic methodologies provide little information regarding the methodology and its implementation.
• Calibration core: details the pivotal point on which the methodology stands. The variables that are analyzed provide the desired goals in the calibration process, for which additional tools might be used.
• Extra Software: indicates the need to use additional software to analyze the performance of the model calibration. When another software is used, the goal for its use is identified.
• Model accuracy evaluation: consolidated indicators have already been presented, such as MBE and CV RMSE. Moreover, a graphical comparison or the differences in energy consumption integrated by all assessing periods could be used.
Table 1 shows a comparison of different methods found in the literature based on the aspects mentioned previously. It can be noted that recent publications tended to propose automated approaches, which require the use of additional software to aid the calibration process. Despite reducing the time for performing a calibration, the use of these additional resources requires specific skills on the part of the users. Thus, they need to master not only BEPS, but the tools responsible for the automation process, which may make the methodology implementation more difficult.
The analysis of all calibration methods shows many calibration features that have been studied so far. It can be seen that there is a lack of a reproducible approach suitable for all kinds of buildings. Despite this, almost all created methodologies refer to MBE and CV RMSE as indicators of simulation accuracy.
Among all methodologies, the one proposed by Raftery et al. [14] seems to best fit buildings that have undergone an energy audit, because it systematizes the different phases of the calibration, which facilitates decisions about which adjustments should be made and when. It also proposes using version control software, which can easily be replaced by a simpler solution, such as a spreadsheet that tracks versions and model changes.
Even so, this methodology needs some adaptations when used to calibrate a Brazilian building. Silva et al. [30] show that the methodology starts from some assumptions that have little to do with the reality of public buildings, such as the existence of an energy model since the building's design phase. In addition, when evidence runs out, electric energy demand could be used to perform an input sensitivity analysis. Finally, the methodology does not address the possibility of equipment degradation over time, which makes it difficult to collect evidence and reproduce it through simulation.

Proposal of a Methodology for Calibration of Building Energy Use Simulation Model
A new methodology is developed to address specific features of Brazilian public buildings, such as operation for more than 30 years, the predominance of UACS, underautomated HVAC control, and the lack of evidence regarding the energy consumption system. Although it was built with these features in mind, we expect that it could be used for other building typologies, to calibrate either a whole-building or any sub-metered electric energy simulation. The method is partially evidence-based, so it relies on investigation of the building for data collection [38].
Furthermore, this methodology considers cases where an initial model is already available. In this case, the steps illustrated in Figure 1 aim to update the building model owing to deviations between its simulation and measurements. As buildings are dynamic systems, they are vulnerable to an eventual change in use over time, and these deviations are inherent to building operation. Moreover, deviations may arise from new measurements that were not available when the first model was created.
The method is composed of two main phases. In phase A, an initial model is created with zoning that details the environments served by medium and large equipment. Such equipment has a cooling capacity starting from 17.6 kW, and mainly includes self-contained units, Variable Refrigerant Volume/Flow (VRV/VRF) systems, and central chilled water plants. Phase B is related to the buildings and/or spaces with small package equipment, such as window-type, split, or multi-split heat pumps and air conditioners.
The division of this method into two phases according to the level of disaggregation of the thermal zones is due to a practical reason: a more detailed model requires additional computational processing and model preparation time; thus, phase B, which exhibits greater model disaggregation, occurs later. In this manner, the phases are ordered from higher to lower aggregation of thermal zones, to optimize the time required for calibration. Figure 1 presents a visual representation of the proposed methodology steps.

Calibration Simulation Plan
According to the steps suggested by Pan et al. [35], this phase is dedicated to selecting the simulation objective and simulation software. Furthermore, the authors recommend defining the statistical indices below, through which the simulation of the entire building can be considered calibrated. Simulation practitioners can adopt one of the three different sets of threshold indices available in the literature, which are presented in Table 2. These indices are intended for whole-building electric energy calibration.

Table 2. Threshold values of MBE (%) and CV RMSE (%) indicated by the M&V guidelines.
The choice of the standard must be consistent with the measurement type and the measurement level, these being a whole building or any sub-meter demand. For example, the calibration approach suggested by Monfet et al. [49] intends to predict the airflow rate, supply and return air temperatures, and whole-building cooling loads. As such, the reference data are the airflow rate and the supply and return air temperatures. Therefore, the indicator range presented in Table 2 cannot be used as a quality index for such data.
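A threshold check of this kind can be sketched as follows. The limits used here are the commonly cited ASHRAE Guideline 14 values for whole-building calibration; they are an assumption in this sketch and should be confirmed against Table 2 before use. The function and dictionary names are hypothetical.

```python
# Commonly cited ASHRAE Guideline 14 acceptance limits (assumed here;
# confirm against Table 2 and the guideline adopted for the project).
THRESHOLDS = {
    "monthly": {"mbe": 5.0, "cv_rmse": 15.0},
    "hourly": {"mbe": 10.0, "cv_rmse": 30.0},
}


def is_calibrated(mbe_pct, cv_rmse_pct, resolution):
    """Return True when both indicators are within the limits for the resolution."""
    limits = THRESHOLDS[resolution]
    # MBE is signed, so its absolute value is compared with the limit.
    return abs(mbe_pct) <= limits["mbe"] and cv_rmse_pct <= limits["cv_rmse"]


# Hypothetical indicator values for one model version.
monthly_ok = is_calibrated(-4.2, 12.0, "monthly")
monthly_fail = is_calibrated(6.0, 12.0, "monthly")
hourly_ok = is_calibrated(6.0, 12.0, "hourly")
```

As noted above, such limits apply to electric energy data; they should not be reused for airflow or temperature reference data.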

Figure 1. Steps of the proposed calibration methodology (flowchart elements include the iterative parametric analysis for adjusting the inputs of large and medium-sized HVAC equipment, the check of whether model accuracy is within the adopted normative limit, and the correction of model errors). ## represents the version index that is created after input adjustments.

Model Versions
In this phase, all available evidence is organized into hierarchical order. This order is as follows: (1) data-logged electrical measurements; (2) data-logged non-electrical measurements; (3) spot or short-term measurement; (4) direct observation (site surveys); (5) operator and personnel interviews; (6) operation documents (such as operations and maintenance (O&M) manuals); (7) commissioning documents (such as as-built drawings); (8) benchmark studies and best practice guides; (9) standards, specifications, and guidelines; and (10) design-stage information (such as the initial model). This order is derived from the recommendations provided by Raftery et al. [14].
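The hierarchy above can be represented as an ordered enumeration, so that when several sources give different values for the same input, the highest-ranked source is chosen. This is a minimal sketch; the class and function names and the numerical values are hypothetical.

```python
from enum import IntEnum


class Evidence(IntEnum):
    """Evidence classes; a lower number means a higher priority."""
    DATA_LOGGED_ELECTRICAL = 1
    DATA_LOGGED_NON_ELECTRICAL = 2
    SPOT_MEASUREMENT = 3
    DIRECT_OBSERVATION = 4
    INTERVIEWS = 5
    OPERATION_DOCUMENTS = 6
    COMMISSIONING_DOCUMENTS = 7
    BENCHMARKS = 8
    STANDARDS = 9
    DESIGN_STAGE = 10


def best_source(candidates):
    """Return the (evidence class, value) pair with the highest priority."""
    return min(candidates, key=lambda item: item[0])


# Hypothetical example: three sources disagree on a lighting power density (W/m2).
candidates = [
    (Evidence.STANDARDS, 10.8),
    (Evidence.DIRECT_OBSERVATION, 8.5),
    (Evidence.DESIGN_STAGE, 12.0),
]
rank, value = best_source(candidates)
```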
The availability of data within the "data-logged electrical measurements" evidence class aids the decision as to whether the simulation accuracy will also be evaluated at the sub-facility level. Creating different simulation versions according to each input adjustment makes the visualization of the model accuracy improvement easier, as demonstrated by Kissock et al. [50] and Yoon et al. [33]. Raftery et al. [14] suggested the same practice, but with the use of version control software to track each created version and its respective adjustment.
A sensitivity analysis can be used to identify the input value that provides the best match between the simulation output and measurement data.
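Version tracking of this kind does not require dedicated software. A minimal sketch of a version sheet kept as CSV is given below; the column names, version labels, and indicator values are all hypothetical and serve only to illustrate the bookkeeping.

```python
import csv
import io

FIELDS = ["version", "adjustment", "evidence", "mbe_pct", "cv_rmse_pct"]


def add_version(log, version, adjustment, evidence, mbe_pct, cv_rmse_pct):
    """Append one model version and its accuracy indicators to the log."""
    log.append({
        "version": version,
        "adjustment": adjustment,
        "evidence": evidence,
        "mbe_pct": mbe_pct,
        "cv_rmse_pct": cv_rmse_pct,
    })


def render_sheet(log):
    """Serialize the log as CSV text that can be opened in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(log)
    return buffer.getvalue()


# Hypothetical entries; the numbers are placeholders, not results from the study.
log = []
add_version(log, "A-00", "initial model", "design-stage information", 38.0, 41.0)
add_version(log, "A-01", "weather data replaced",
            "data-logged non-electrical measurements", 27.5, 33.2)
sheet = render_sheet(log)
```

Keeping one row per adjustment makes it easy to see which change improved or worsened each indicator.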

Initial Model
Two basic data types are necessary for constructing an initial model: the first is related to the building geometry and indoor layout information; the second includes the weather data representing the site at which the building is located.
In the initial model, internal load and envelope parameters can be taken from standards, such as ASHRAE 90.1 (2010) [51] and ASHRAE 189.1 (2009) [52]. Regarding the HVAC system, the initial model must identify equipment types and assign them to thermal zones according to the layout of the building's indoor spaces, only to determine which space is served by which air conditioner type. In this step, there is no need for further characterization of HVAC systems, so the software's "autosizing" input should be used.

Thermal Zone Delimitation
The delimitation of thermal zones begins with defining the type of conditioning system in operation, because this definition will set the number of calibration phases. Different conditioning systems must serve different thermal zones; however, the spaces served by the same conditioning system do not necessarily form a single thermal zone.
In both phases A and B, the breakdown of thermal zones is made according to several recommendations. Raftery et al. [14] suggested that each thermal zone should be associated with environments with homogeneous internal loads. These internal loads may relate to the lighting intensity and schedule, the intensity and schedule of the electrical equipment, the occupancy profile, and the level of activity of occupants, among others. The authors also suggest that thermal zones should be separated according to their position relative to the exterior.
In addition to the criteria defined by Raftery et al. [14], it is appropriate to propose two others. The first is that unconditioned environments can be added to a single thermal zone, even if they have different functions. The second regards the representation of the thermal conduction differences between floors. Thus, it is suggested to separate thermal zones that are on different floors, even when they are served by the same HVAC system. This recommendation is in line with the commercial model examples available on the U.S. Department of Energy website [53].
Finally, in phase B, a lower level of thermal zone disaggregation can be created based on the availability of measurements of equipment operating within a delimited boundary. This criterion is more explicit for environmental conditioning systems, as they serve environments with identifiable limits. However, one cannot rule out the possibility of obtaining isolated measurements from rooms or sectors for the other end uses.

Schedule
Schedules adopted in initial models are derived from standards, such as those from [51,52]. Therefore, to obtain a best-fit model, these schedules should be changed when high-hierarchy data are available. If occupancy sensing equipment is not available, indirect methods for deriving schedules may be used. For example, employees' electronic time-clock records can be used to capture day-by-day occupant variability. Furthermore, if the building is unoccupied over weekends, the approach of Kim et al. [54] can be used to define a schedule based on electricity consumption.
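As an illustration of such an indirect method, the sketch below turns clock-in and clock-out hours into a 24-value fractional occupancy schedule normalized to the peak hour. The record format (integer start and end hours, with the end hour exclusive) and the sample shifts are assumptions made for this example only.

```python
def occupancy_schedule(shifts):
    """Build a 24-value schedule (fractions of peak) from (start, end) hours.

    Hours are integers from 0 to 23 and the end hour is exclusive. Shifts that
    cross midnight wrap around. A record with start == end counts as empty.
    """
    counts = [0] * 24
    for start, end in shifts:
        hour = start
        while hour != end:
            counts[hour] += 1
            hour = (hour + 1) % 24
    peak = max(counts)
    return [count / peak if peak else 0.0 for count in counts]


# Hypothetical records: ten day-shift employees and three night-shift employees.
shifts = [(7, 19)] * 10 + [(19, 7)] * 3
schedule = occupancy_schedule(shifts)
```

The resulting list can be transcribed into the hourly fractional schedule of the simulation tool; weekday and weekend records would be processed separately.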

Error Correction Phase
As the improved model versions arise from gradual adjustments, the identification of error in certain initially adjusted parameters will lead to the correction of all phases following this initial phase. Reserving a correction phase avoids the need to correct all subsequent versions, thereby accelerating the calibration process.

Parametric Analysis
When the simulation model does not reach the accuracy threshold, we suggest proceeding with an iterative parametric analysis. In this analysis, the most influential parameters are varied to determine the values that bring the simulation output within the accuracy limits. The suggested variation range is 20%, according to the approach in O'Neill et al. [37], and the variation step is defined by the designer.
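The loop described above can be sketched as follows. The 20% range is interpreted here as ±20% around the base value, which is an assumption, and `toy_simulation` is a made-up stand-in for a BEPS run, not a building model.

```python
def parametric_sweep(simulate, base_value, measured_total, span=0.20, steps=9):
    """Vary one input within +/- span of its base value.

    Returns the (value, MBE in %) pair with the smallest absolute MBE.
    """
    best = None
    for k in range(steps):
        factor = 1.0 - span + (2.0 * span) * k / (steps - 1)
        value = base_value * factor
        simulated_total = simulate(value)
        mbe = 100.0 * (measured_total - simulated_total) / measured_total
        if best is None or abs(mbe) < abs(best[1]):
            best = (value, mbe)
    return best


def toy_simulation(lighting_density_w_m2):
    """Hypothetical linear response (MWh per year); replace with a real run."""
    return 300.0 + 20.0 * lighting_density_w_m2


value, mbe = parametric_sweep(toy_simulation, base_value=10.0, measured_total=520.0)
```

With nine steps the input is varied in 5% increments; a finer step trades simulation time for resolution, which is why the step is left to the designer.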
To increase the search spectrum of a parameter or a set of optimal parameters, one can also use an automated tool, as proposed by Liu and Henze [55]. In addition to increasing the search field, the use of these tools makes the calibration faster and requires a shorter time for the professionals involved in the simulation, who would otherwise have to adjust the parameters manually.
The parametric analysis can also include fine-tuning of the density of the lighting or electrical equipment identified as the most uncertain in phase A. This step relies on the assumption that, at this point, all loads are calibrated except for this one. Thus, all that remains is to modify the uncertain intensity to obtain a fully calibrated model. The modification is carried out by multiplying the intensity by a factor, which is the ratio of the measured energy consumption to the simulated energy consumption, both integrated over the studied period. This step is repeated until the simulation falls within the limits of the M&V manuals.
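A minimal sketch of this fine-tuning loop is shown below. The 5% MBE limit, the iteration cap, and the linear `toy_simulation` (fixed loads plus one uncertain load) are assumptions used only to make the example runnable.

```python
def fine_tune(simulate, density, measured_total, mbe_limit=5.0, max_iter=20):
    """Scale the uncertain density by measured/simulated until |MBE| <= limit.

    Returns the list of (density, MBE in %) pairs that were evaluated.
    """
    history = []
    for _ in range(max_iter):
        simulated_total = simulate(density)
        mbe = 100.0 * (measured_total - simulated_total) / measured_total
        history.append((density, mbe))
        if abs(mbe) <= mbe_limit:
            break
        # Correction factor: measured energy divided by simulated energy.
        density *= measured_total / simulated_total
    return history


def toy_simulation(equipment_density_w_m2):
    """Hypothetical model: 400 MWh of calibrated loads plus the uncertain load."""
    return 400.0 + 12.0 * equipment_density_w_m2


history = fine_tune(toy_simulation, density=15.0, measured_total=520.0)
final_density, final_mbe = history[-1]
```

Because the factor is applied to a single load while the other loads stay fixed, one multiplication does not close the gap entirely, which is why the step is repeated.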

Short Description of the Building
The case study is the University Hospital of Universidade de São Paulo (UH-USP) on the Butantã campus. It has been in operation since 1968 and has approximately 1500 employees. The floor area totals 36,000 m², distributed over six floors, four of which exhibit an "H"-shaped layout, as illustrated in Figure 2.
The building contains a high diversity of space types. As it works as a hospital as well as a training and teaching facility, classrooms are spread all over the building. The main medical care area is on the second floor, where the surgery, child-birth, and emergency care rooms and the laboratories are located. The kitchen and sanitation sections are located on the first floor. The fourth floor is dedicated to a baby nursery and child hospital beds. Hospital beds are concentrated on the fifth floor.
In addition to the heterogeneity among floors, the occupation varies within the same floor. For example, the main administrative sector is on the third floor. The child and neonatal intensive care unit (ICU) is located on the very same floor. The sixth floor is divided into post-surgical hospital beds and the adult ICU. UH-USP operates on a 24 h basis, seven days a week. Based on employee interviews, night occupancy is reduced to 30% of the peak occupancy, which generally occurs in the morning. However, certain sectors operate in typical office hours. Moreover, there are areas with no constant occupation schedule, such as class, conference, and break rooms.

Electrical Facility Characterization
To evaluate the potential for energy reductions, the building has been subjected to an energy audit since September 2016. To satisfy the program objectives, some key electrical facilities have been measured. Figure 3 shows box plots representing the hourly electrical demand of the whole hospital facility. Average demand is represented by the gray line and ranges from 600 to 700 kW. Integrating over the whole data-logged period results in 5194 MWh of electrical energy consumption.
Two air-cooled chilled water plants (CWPs) exist in the building, which have an 833 kW cooling capacity. Moreover, there are 208 split-ductless systems, which control the temperature of an area of approximately 5362 m². The HVAC system as a whole has a rated refrigeration capacity of 2419 kW. The CWPs have a poor automation system, and one of these, CWP1, serves the surgery and child-birth sectors. CWP1 is attached to 19 air handling units (AHUs), which are partly automated. According to the facility management staff, the AHU that regulates the temperature of the common areas, such as corridors and the reception lobby, is controlled by a temperature set-point. However, the AHUs serving the surgery rooms are manually triggered on a need-to-use basis. The other Chilled Water Plant, CWP2, is controlled by a timer that makes the system available for a certain period of the day. This period is changed by operators throughout the year.
The hourly measurements of the electrical facilities also encompass CWP1 and 19 split-ductless air conditioners. The electrical energy demand of CWP1 was measured over a whole year, and the results are presented in Figure 4. The average demand, represented by the gray line, shows values between 40 and 100 kW. Excluding outlier observations, the electric demand ranges from 25 to 125 kW (minimum to maximum). The seasonality of the data shows that the electric demand is highest in the summer. The electrical energy demand of split air conditioners was measured over different periods, ranging from one to two months. Due to the large number of split systems, the choice of which equipment to measure took two criteria into consideration: the equipment capacity and the operating hours. The energy demand was measured for the equipment with the most common capacities and with the longest operating hours in the areas served. An itinerant meter was used, and routines were implemented to measure the energy consumption of split-type air conditioners. Figure 5 presents the energy demand profile of four split-type air conditioners with different refrigeration capacities, serving different hospital spaces.
The diversity of occupancy profiles is corroborated by Figure 5. There are spaces with air conditioning demand during the whole week on a 24 h basis. On the other hand, there are air conditioners that remain on for only a few hours on a few days of the week.
In addition to the diversity between different space types, some unexpected occupancy behavior was identified from the measured energy demand profiles. For example, there was an office space operating on a 24 h basis, due to misuse. The library room air conditioner was turned on for short periods even on Saturdays and Sundays. Among all measured UACS, the ones which serve ambulatory and laboratory rooms consume the most energy and operate uninterruptedly. Only two air conditioners work 5 days a week: the one which serves the library (12 h per day) and the one which serves the physiotherapy room (10 h per day).

Comparison of Other Test Beds
The volume of data from UH-USP is compared with the data available from the buildings on which the reviewed methodologies were tested. As presented in Section 1.2, some comparison aspects should be considered. The first one concerns the building type. Office buildings tend to have well-delimited occupancy during business hours. On the other hand, there are building typologies whose occupancy patterns are not constant, such as hospitals and retail stores.
The type of electrically operated HVAC system is presented as well. The identified type can be CWP, for Chilled Water Plant; UACS, Unitary Air Conditioning System; GHPS, Geothermal Heat Pump System; ERHS, Electric Resistance Heating System. Naturally Ventilated (NV) types are identified when the studied building has strategies for natural ventilation.
Another comparison is related to the calibration reference, which describes the real-time measurement level considered to perform the calibration. The most common calibration references are: Whole Building (WB), Lights (L), Plug Loads (PL) and HVAC. Other types of references can be used as well.
The measurement period and temporal resolution are compared as well. The former is the period over which the measurement was made, while the latter is the time step at which the measurement was stored. Each one refers to three different calibration references: whole building (WB), weather-independent loads (such as lights, plug loads, elevators, and so on), and HVAC (which refers to a specific HVAC system component, if separately measured).
A comparison of tested buildings in light of these aspects is presented in Table 3, where it is shown how calibration of the HVAC level can be considered as a rather understudied issue. Among the 16 assessed papers, only one used a whole year period of HVAC measurements. For the analysis that was carried out in this paper, the most representative UACS in the building were selected based on their operation schedules and types of room use (nursery, office, etc.). The selection is necessary to provide the best evaluation of the impact of such equipment in the energy profile of the building with a limited budget for such measurements. These measurements were carried out during a full year period. Most of the papers in the literature focus on office buildings that, usually, have a quite stable and, therefore, predictable schedule of operation. It can be seen that, for the papers that were analyzed, there was a variety of types of HVAC systems. The building here evaluated is the only reported hospital building that has an HVAC system based on UACS. According to Neto and Fiorelli [59], as UACS are more vulnerable to the occupant's behavior, their simulation and model calibration tends to present more inaccuracies and the process of model calibration is more difficult. To improve the accuracy of the model calibration for buildings where occupancy and users' influence has a high impact, one should employ additional techniques such as calibration coupled with statistical occupancy estimation suggested by Li et al. [60].
Among all evaluated works, there is a single methodology paper that addresses the use of apparatus to measure occupancy. The study reported by Li et al. [60] shows that a model calibrated using actual occupancy provides a statistically more accurate estimation than a model calibrated using estimated occupancy data.

Calibration
The proposed methodology was used to calibrate an EnergyPlus (version 8.9) model that simulates the UH-USP building energy consumption. OpenStudio (version 2.5) was used to aid with the visual-dependent steps, such as geometry and HVAC component insertion. The threshold suggested by ASHRAE [46] (Table 2) was chosen to evaluate model accuracy.

Preparation
All the evidence obtained from the UH-USP was classified into the hierarchical order presented in Section 2.1. A total of 22 pieces of evidence could be considered and Table 4 displays all the evidence in a hierarchical order. The data-logged electrical measurement indicates that there are three calibration levels for evaluating simulation output: WB, CWP, and split air conditioning.

Phase A
The adjustments made in phase A are outlined in Table 5. Seven evidence-based adjustments were planned for the initial model. However, three further adjustments were necessary to correct identified errors. In this phase, the average simulation runtime is 3 min.

Phase B
In phase B, thermal zones are segregated to represent split-type air conditioners, which make up 40% of the total HVAC cooling capacity of the building. The new thermal zoning scheme of phase B resulted in a conditioned area 46% smaller than in phase A. This smaller conditioned area is a result of splitting the thermal zones to represent the areas conditioned by small HVAC equipment. This splitting also increased the simulation runtime, from 3 to 18 min on average.
In general, the HVAC energy demand tended to be lower in phase B. This was expected, as the phase A model presents a larger conditioned area and the lighting and electric equipment loads are calculated from a W/m² ratio; a larger area results in a greater thermal load from these sources. Throughout the period, the total difference was 214 MWh, which is equivalent to 45% of the total annual measured energy consumption of CWP1. Table 6 presents all adjustments made in phase B. It includes five evidence-based and three error-correction changes. Pieces of evidence for calibration were exhausted in the BE3 version. A manual parametric analysis loop was then used to bring the MBE below its limit. The input parameter selected to minimize the simulation discrepancy was the AHU deck temperature, which was changed 10 times using a step of 0.5 °C.

Versions Accuracy Evaluation
Mean Bias Error (MBE) evaluates energy simulation models by showing the average difference between simulated and measured electric energy data. Figure 6 illustrates the changes in the MBE across all created versions. As phase B is not dedicated to calibrating the CWP1 simulation, the MBE for CWP1 was omitted during this phase. The Unitary Air Conditioners calibration level represents the mean of the comparisons between the 19 measured pieces of equipment and their respective simulations. To facilitate visualization of the results, all MBE values were converted into their absolute values. The percentage values represent both the hourly and monthly resolutions. Moreover, a straight line is plotted to represent the monthly normative limit (the most rigorous), conforming to ASHRAE [46] and U.S [48]. The reduction in the conditioned area in phase B increased the MBE for CWP1 by 35%. This increase indicates the importance of selecting a thermal zoning strategy that best represents the conditioned areas in the studied building.
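For reference, one common way of writing the MBE is given below. The symbols are assumptions introduced here for illustration ($S_i$ for simulated energy and $M_i$ for measured energy in interval $i$, with $n$ intervals). The sign convention and the normalization differ among guidelines, so this form is not necessarily the exact expression used in the cited standards.

```latex
\mathrm{MBE} = \frac{\sum_{i=1}^{n} \left( S_i - M_i \right)}{\sum_{i=1}^{n} M_i} \times 100\%
```

Because the numerator sums signed differences, overestimates in some intervals can offset underestimates in others.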
In general, the adjustments are effective in lowering the deviations between simulation and measurement in terms of MBE. The increase in the MBE of the CWP1 simulation is also noticeable at the whole building level, albeit at a lower magnitude.
In the final part of the calibration process, only the whole building and CWP1 could be considered calibrated in terms of MBE. Moreover, the evidence-based adjustments alone were not sufficient for simulation calibration. In terms of the split comparison, the adjustments were not capable of bringing the simulation within the MBE limit. In the final version, the split equipment simulation still exhibits an MBE of approximately 45%, as high as in the first version.
Because positive and negative deviations cancel each other out and thus reduce the MBE, we also used the Coefficient of Variation of the Root Mean Square Error (CV RMSE). CV RMSE expresses the root mean square of the differences between simulated and measured energy, normalized by the mean measured value, so deviations of opposite sign cannot offset each other. A comparison of the monthly CV RMSE values is presented in Figure 7. Owing to the short measurement period, the CV RMSE of the split system was not evaluated at the monthly resolution.
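The compensation effect can be illustrated with a small sketch. The data are invented for the example, and the CV RMSE here divides by the number of points n; some guideline formulations apply a degrees-of-freedom correction instead, so treat this as a simplified form.

```python
import math
from typing import List

def mbe_percent(simulated: List[float], measured: List[float]) -> float:
    """Signed bias as a percentage of total measured energy."""
    return 100.0 * sum(s - m for s, m in zip(simulated, measured)) / sum(measured)

def cv_rmse_percent(simulated: List[float], measured: List[float]) -> float:
    """Root mean square error as a percentage of mean measured energy."""
    n = len(measured)
    mean_measured = sum(measured) / n
    rmse = math.sqrt(sum((s - m) ** 2 for s, m in zip(simulated, measured)) / n)
    return 100.0 * rmse / mean_measured

# Invented series: the simulation alternately over- and underestimates by 20.
measured = [100.0, 100.0, 100.0, 100.0]
simulated = [120.0, 80.0, 120.0, 80.0]

mbe = mbe_percent(simulated, measured)
cv = cv_rmse_percent(simulated, measured)
print(mbe, cv)
```

In this example the MBE is zero even though every interval is off by 20%, while the CV RMSE reports the full 20% deviation.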
As opposed to the MBE, the monthly CV RMSE of the CWP1 simulation could not meet the normative limit. This divergence could stem from the compensation effect to which the MBE is vulnerable. However, the whole building simulation met the CV RMSE limit in most versions. Finally, Figure 8 presents a CV RMSE comparison at an hourly temporal resolution, together with the hourly CV RMSE normative limit.
The whole building simulation was the only level that met the hourly CV RMSE limit. Although a reduction in the MBE was not evident for the split system simulation, its hourly CV RMSE was reduced. These deviations were expected, since this HVAC system type is vulnerable to stochastic elements, such as occupant behavior. CWP1 also did not meet the CV RMSE limit. Similar to split air conditioners, in the UH-USP operation routine, this system is exposed to stochastic elements of occupant behavior. However, these elements come from a different source. Split systems are affected mainly by door opening/closing and occupant setpoint temperature preferences, while CWP1 is affected by the times at which surgeries occur.

Conclusions
In this study, we have proposed a methodology for the calibration of energy use in buildings, considering the reality of a Brazilian public teaching hospital. The application of the methodology to calibrate UH-USP energy consumption demonstrated that it is efficient for calibrating buildings where UACS are predominant, in terms of both outcome accuracy and time. Regarding outcome accuracy, the whole building simulated energy met the minimal accuracy threshold at both hourly and monthly timesteps. According to the main M&V guidelines, the model should therefore be usable as a retrofit evaluation tool.
Regarding time efficiency, the process duration is reduced when the building's thermal zone delimitation is split into the two proposed steps. This two-step approach also shows that when thermal zones are subdivided to represent rooms served by UACS, HVAC energy consumption falls by more than 40%. Calibration processes must therefore treat thermal zone delimitation as an important aspect.
However, considering all statistical indicators and limits used for the whole building analysis, UACS cannot be considered calibrated, and neither can the CWP1 hourly simulated energy. One solution could be the use of an optimization tool to minimize deviations. These tools require parametric analyses, which use a large number of simulations to determine the input that best fits the simulation output to the measurement. For example, taking the number of simulation runs required to calibrate the model of Yang and Becerik-Gerber [20], namely 1362, 17 days would be needed to complete the calibration process on a conventional personal computer. Such a long simulation time could make parametric analysis impracticable.
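The 17-day figure is consistent with a simple back-of-the-envelope calculation, sketched below. It assumes that every run takes the 18 min average runtime reported for phase B and that runs execute one after another on a single machine; the text does not state how the figure was derived, so this is a plausibility check only.

```python
def total_runtime_days(n_runs: int, minutes_per_run: float) -> float:
    """Total wall-clock time in days for sequential simulation runs."""
    minutes_per_day = 60 * 24
    return n_runs * minutes_per_run / minutes_per_day

# 1362 runs (from the text) at the phase B average of 18 min per run (assumed).
days = total_runtime_days(1362, 18)
print(round(days, 1))
```

Running simulations in parallel would shorten the wall-clock time roughly in proportion to the number of simultaneous runs, although the total computing effort would stay the same.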
As shown in the paper, other studies tend to use automated approaches, especially optimization algorithms, to maximize model simulation accuracy. Another tendency is the use of dedicated apparatus to measure actual occupancy data. However, as occupancy data describe occupant behavior in the past, they cannot be expected to foresee future occupant behavior, even more so when important energy uses depend on this behavior, such as UACS and poorly automated HVAC systems in surgery rooms.
However, besides improving the model's capacity to describe a real phenomenon by increasing the simulation tools' capabilities, another path would be to create specific statistical indicators and minimal statistical thresholds. These indicators could differ from those suggested for whole building evaluation, the MBE and CV RMSE. For HVAC simulation, such an indicator should consider the simulation goal. In this context, if the annual performance is analyzed, a monthly MBE could be enough. However, if the goal is to design a new air conditioner, the simulation must at least be able to match the period of the highest/lowest temperature, or the period in which the maximum number of occupants in the building was observed. This conclusion is also presented by Li et al. [60], who state that a model built for one purpose will be statistically inaccurate when used for another purpose.
The highly complex task of representing the energy consumption of a teaching hospital such as UH-USP suggests that developing a better accuracy indicator may be the most suitable alternative. In this context, this work intends to be the first to show calibration challenges regarding a building typology with highly stochastic occupant-dependent variables.