Effectiveness of Automatic and Manual Calibration of an Office Building Energy Model

Featured Application: This work contributes evidence on the strengths and weaknesses of manual and automatic calibration of building dynamic simulation models, helping to improve the quality of building retrofit investigations.

Abstract: Energy reduction can benefit from the improvement of energy efficiency in buildings. For this purpose, simulation models can be used both as diagnostic and prognostic tools, reproducing the behaviour of the real building as accurately as possible. High modelling accuracy can be achieved only through calibration. Two approaches can be adopted: manual or automatic. Manual calibration consists of an iterative trial-and-error procedure that requires high skill and expertise of the modeler. Automatic calibration relies on mathematical and statistical methods that mostly use optimization algorithms to minimize the difference between measured and simulated data. This paper compares a manual calibration procedure with an automatic calibration method developed by the authors, coupling dynamic simulation, sensitivity analysis and automatic optimization using IDA ICE, Matlab and GenOpt, respectively. Differences, advantages and disadvantages are evidenced by applying both methods to a dynamic simulation model of a real office building in Rome, Italy. Although both methods require high expertise from operators and showed good results in terms of accuracy, automatic calibration presents better performance and consistently helps speed up the procedure.


Introduction
The construction sector has a primary role in CO2 reduction in Europe, since buildings account for around 40% of total energy consumption and generate almost 36% of greenhouse gas emissions [1]. Recent data on world energy consumption in both residential and commercial buildings are presented by Allouhi et al., 2015 [2], together with an overview of measures and policies adopted by different countries for the reduction of energy consumption in buildings. They showed how rapidly developing Asian economies, chiefly India and China, are seeking to curb the dramatic increase of building energy consumption driven by the fast urbanization rate [3].
Worldwide energy reduction will then benefit from the improvement of energy efficiency in buildings.
This objective can be reached using two different and complementary approaches: measurements and simulations [4]. Simulation can be used to estimate the energy performance of the building; the agreement between model and measurements is typically quantified by statistical performance indices such as the Root Mean Square Error:

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (s_i − m_i)² )    (1)

where s and m refer to simulated and measured data, respectively. These indices are mostly evaluated for energy consumption, for which acceptance criteria have been introduced by various organizations, as shown in Table 1.

Table 1. Acceptance criteria (MBE monthly / CVRMSE monthly / MBE hourly / CVRMSE hourly, %): [18]: ±20 / – / 10 / 20; Federal Energy Management Program [19]: ±5 / 15 / 10 / 30.

Concerning calibration performed with environmental variables, such as indoor air temperature, the model is usually considered calibrated when the RMSE is within the uncertainty of the measurements; however, no reference values of the performance indices are commonly provided. A wide body of literature deals with BEPS calibration, as confirmed by reviews on this topic [20][21][22]. Essentially, two approaches can be adopted for model calibration: the manual approach and the automatic approach.
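As an illustration, the indices used throughout the paper (RMSE, CVRMSE and NRMSE) can be computed as in the following sketch (Python with NumPy standing in for the authors' tooling; the function and variable names are ours):

```python
import numpy as np

def calibration_indices(s, m):
    """Return RMSE, CVRMSE (%) and NRMSE (%) between simulated (s)
    and measured (m) series of equal length."""
    s = np.asarray(s, dtype=float)
    m = np.asarray(m, dtype=float)
    rmse = np.sqrt(np.mean((s - m) ** 2))        # Root Mean Square Error
    cvrmse = 100.0 * rmse / np.mean(m)           # normalized by the measured mean
    nrmse = 100.0 * rmse / (m.max() - m.min())   # normalized by the measured range
    return rmse, cvrmse, nrmse
```

A model is typically deemed acceptable when these indices fall below thresholds such as those in Table 1.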
Appl. Sci. 2019, 9, 1985

Manual calibration consists of an iterative and pragmatic process that involves the fine tuning of the input variables in a trial-and-error procedure. It uses building characteristics data from audits, energy use data and zone monitoring to gain insight into the physical and operational characteristics of the building. The objective is to minimize the difference between measured and simulated output variables, such as energy consumption gathered from audit processes or indoor temperature trends obtained through environmental monitoring. Graphical techniques are widely used in manual calibration, as in the development of systematic and evidence-based models [23]. Manual calibration requires high skill and expertise of the modeler, who modifies the inputs based principally on experience. Apart from skill, the process usually requires a considerable amount of time to be completed. Usually, to better understand the process, the input variables are changed one at a time, the simulation is run and, for each simulation, the output is compared with that of the original model. Many studies in the literature adopt manual calibration [23][24][25][26]. Manual calibration can also be used to gain detailed knowledge of the physical and operational characteristics of the building [27,28]. For example, Cornaro et al. 2016 [29] calibrated a model of a complex historical building and used manual calibration as a tool to identify various wall layers made by the superposition of unknown materials of different ages.
Automatic calibration relies on mathematical and statistical methods that mostly use optimization functions to minimize the difference between measured and simulated data. Many automatic procedures also include sensitivity analysis to reduce the number of inputs to the optimization tool and speed up the computing time [13,15,[30][31][32]]. Various algorithms have been used for automatic calibration, among them the Bayesian approach [33], the pattern-based approach [34], evolutionary algorithms [35,36] and particle swarm optimization [13,16,37]. In general, both automatic and manual calibration are either time-consuming or costly. Manual calibration requires the time of an experienced analyst, while automatic calibration mainly requires computing power and time to complete the process. Therefore, it is difficult to determine whether one method prevails over the other. Apart from one case in which the difference between the two approaches has been evidenced [38], literature studies scarcely assess this issue. For this reason, the authors believe that the proposed evaluation, even if related to a specific case study, can be informative for the literature. This paper introduces an automatic calibration procedure developed by the authors that couples dynamic simulation and sensitivity analysis with automatic optimization, using IDA ICE, Matlab and GenOpt, respectively. This automatic calibration method has been compared with a manual procedure to evidence differences, advantages and disadvantages by applying both of them to a dynamic simulation model of a real office building in Rome, Italy. The comparison has been made in terms of accuracy of prediction of the real building's consumption bills by the manual and automatic energy models calibrated using the indoor temperature profiles. Section 2 illustrates the methodology used for model calibration (both manual and automatic).
Section 3 presents the case study, the monitoring campaigns and the model construction. Results regarding manual and automatic calibration are discussed in Section 4.

Materials and Methods
Both the manual and the automatic procedure consist of a first phase of data collection and selection, then a calibration phase and finally a check of the models through cross validation. The calibration phase differed according to the methodology. A multi-stage calibration process was carried out for both approaches, involving first the envelope calibration and then the heating plant calibration, using experimental data from short-term monitoring campaigns carried out in different periods of the year [16,39]. The first step involved the use of sensitivity analysis only for the automatic calibration. The whole process can be summarized as shown in Figure 1.

Data Collection and Selection
An initial dynamic building simulation model was developed with IDA-ICE 4.7.1 by EQUA Simulation [9], starting from the available information concerning the building. IDA ICE is a simulation application for the multi-zonal and dynamic study of indoor climate and energy use. IDA ICE can be used for complete energy and design studies, involving the envelope, the systems, the plant and the control strategies. The flexible architecture of the software allows the user to develop and expand it with new capabilities. Additional features, like parametric simulation runs and visual scripting, support the user in a parametric design process. The coupling with optimization engines like GenOpt is available directly in the program.
From the original drawings, it was possible to define the context, the geometry and the thermal zones of the building. Wall and roof layers, as well as openings, were specified according to the material properties.
Finally, plant, equipment and user-related data allowed the setting of the plant system (in terms of set points and time schedules) and of the thermal gains.
Apart from the physical data related to the initial building model, other information was specifically required for the calibration procedure. To define the climate file used in the simulations, and to obtain data to be compared with the simulation outputs, a measurement campaign was set up. As multi-stage calibration was applied, a multi-phase campaign was carried out.

Since the number of possible parameters influencing calibration was relatively high, a sensitivity analysis using the Elementary Effects (EE) method, based on the Morris random sampling method [40], was performed before the automatic calibration process to identify the most relevant ones.
Coupling MatLab with IDA-ICE, the EE method was applied considering the hourly simulated indoor temperature (T) for the period of each single stage of the whole calibration. According to the method, the number of EEs of each parameter (r), the number of parameters (k) and the number of levels (L) in which the parameters range were considered. A MatLab script was built that could open the IDA ICE environment iteratively, varying only one of the candidate inputs at each step while all the others were fixed to their previous value. N = r (k + 1) simulations were then performed, and the difference between the simulated T and the T of the initial guess model was expressed in terms of Mean Absolute Error (MAE). A single EE was defined as follows:

EE_i = MAE( f(x_1, x_2, . . . , x_i + ∆x, . . . , x_k), f(x) ) / ∆x

where f(x_1, x_2, . . . , x_i + ∆x, . . . , x_k) and f(x) are the T trends for the current and the first guess model, respectively. The absolute value of the mean (µ*) and the standard deviation (σ) of the distribution of the EEs, together with the ratio σ/µ*, were finally evaluated. The relevance of a parameter is related to µ*: the larger µ* is, the more the corresponding input contributes to the dispersion of the output. σ is a measure of non-linear and/or interaction effects of the corresponding input, so the ratio σ/µ* expresses the nature of the dependence of each parameter's effect (σ/µ* < 0.1 linear, 0.1 ≤ σ/µ* < 0.5 monotonic, 0.5 ≤ σ/µ* < 1 almost monotonic, σ/µ* ≥ 1 non-linear and/or non-monotonic) [41].
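A minimal sketch of this screening step is shown below (illustrative Python standing in for the authors' MatLab/IDA ICE coupling; `model` is a placeholder for one IDA ICE run returning the simulated T series, and all names and defaults are ours):

```python
import numpy as np

def morris_ee(model, bounds, r=10, L=4, seed=0):
    """Elementary Effects screening of k parameters.

    Each EE is the MAE between the perturbed and the base T series,
    divided by the normalized step delta. Returns mu* (mean of |EE|),
    sigma (std of the EEs) and the ratio sigma/mu* per parameter.
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    k = len(bounds)
    delta = L / (2.0 * (L - 1))                  # standard Morris step (unit cube)
    denorm = lambda u: bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])
    ee = np.empty((r, k))
    for j in range(r):                           # r random trajectories
        u = rng.integers(0, L // 2, size=k) / (L - 1)   # grid base point in [0, 1]
        t_base = np.asarray(model(denorm(u)))
        for i in range(k):                       # perturb one input at a time
            up = u.copy()
            up[i] += delta
            t_pert = np.asarray(model(denorm(up)))
            ee[j, i] = np.mean(np.abs(t_pert - t_base)) / delta
    mu_star = np.mean(np.abs(ee), axis=0)
    sigma = np.std(ee, axis=0, ddof=1)
    return mu_star, sigma, sigma / mu_star
```

With r trajectories this costs the N = r(k + 1) simulations mentioned above; parameters with σ/µ* < 0.1 then behave linearly in the sense of the classification in the text.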

Calibration
Although part of the logical procedure is similar, the calibration phase is treated separately according to the method involved.
The calibration was performed considering all the candidate parameters for minimizing the RMSE, CVRMSE and NRMSE between simulated and measured T. The manual and automatic procedures were carried out by two different master's students with similar experience in building simulation modelling, supervised by a senior researcher.

Manual Calibration
In the manual procedure, the model was calibrated using the hourly indoor T measured in the different periods of the monitoring campaign.
The statistical indices were used to evaluate the goodness of the modifications, applied manually and iteratively to the model according to the experience of the operator and the observation of previous steps.
In order to facilitate the assessment of the ongoing calibration process, other rating elements and strategies were also involved. First, a graphic comparison of the plots of measured and simulated T was used to qualitatively estimate the direction of the iterative process. Analysis of the heat balance and of the surface heat fluxes through walls, slabs and glazed surfaces helped to identify the most relevant construction elements in terms of heat transfer, determining the elements to be modified at each manual iterative step. Finally, Taylor's diagram [42] was used to more completely evaluate the validity of the solutions found so far, using three statistical indexes, standard deviation (SD), correlation coefficient (R) and centred RMSE (E'), described by the following equations, respectively:

SD_x = sqrt( (1/n) Σ_{i=1}^{n} (x_i − x̄)² )

R = (1/n) Σ_{i=1}^{n} (s_i − s̄)(m_i − m̄) / (SD_s · SD_m)

E' = sqrt( (1/n) Σ_{i=1}^{n} [ (s_i − s̄) − (m_i − m̄) ]² )

where x in SD is a general parameter, replaced by s and m in SD_s and SD_m respectively.
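The three Taylor-diagram statistics can be sketched as follows (illustrative Python; the names are ours). The identity E'² = SD_s² + SD_m² − 2·SD_s·SD_m·R is what allows the three indices to be plotted on a single diagram:

```python
import numpy as np

def taylor_stats(s, m):
    """Return SD_s, SD_m, correlation R and centred RMSE E'
    for simulated (s) and measured (m) series."""
    s = np.asarray(s, dtype=float)
    m = np.asarray(m, dtype=float)
    sd_s, sd_m = s.std(), m.std()                 # population standard deviations
    ds, dm = s - s.mean(), m - m.mean()           # centred anomalies
    r = np.mean(ds * dm) / (sd_s * sd_m)          # correlation coefficient
    e_prime = np.sqrt(np.mean((ds - dm) ** 2))    # centred RMSE
    return sd_s, sd_m, r, e_prime
```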

Automatic Calibration
In the automatic procedure, the model was again calibrated using the hourly indoor T measured in the different monitoring campaigns, but involving a simulation-based optimization method.
This automated method was based on the coupling of a simulation program and an optimization engine, which provides several optimization algorithms. Optimization settings (variables, constraints, selected algorithm, etc.) and an objective function are needed by the optimization engine. The engine then sends a call signal to the simulation program to run a simulation and obtain a resulting scenario. If the output of the simulation satisfies the stop criteria of the algorithm, the optimal solution has been found and the process is concluded. Otherwise, the optimization engine elaborates and sends a new set of input data to the simulation program, calls for a new simulation run and repeats the process until at least one stop criterion is met (Figure 2).
In this calibration process, the objective function was the Cumulative Squared Error (CSE) of simulated (s) and measured (m) data:

CSE = Σ_{i=1}^{n} (s_i − m_i)²

The function was built in IDA-ICE including:
a. a Source-File, containing all the T data measured in a specific room for the monitoring campaign period;
b. a Zone-Sensor, referring to the thermal zone to which the room corresponds, in order to extract the simulated T data to be compared with the measured ones;
c. a list of mathematical operators, to build the function.
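As a sketch, the objective function reduces to a plain sum of squared deviations over the monitored period (illustrative Python, not the IDA-ICE implementation):

```python
def cse(simulated, measured):
    """Cumulative Squared Error between paired hourly series."""
    return sum((s - m) ** 2 for s, m in zip(simulated, measured))
```

Unlike RMSE, the CSE is not normalized; it only serves as the quantity the optimizer drives toward zero.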
Using GenOpt [43,44] as the optimization engine, via the parametric-runs macro embedded in the IDA ICE environment, a hybrid algorithm, combining the Particle Swarm Optimization (PSO) algorithm for a first global search with the Hooke-Jeeves (HJ) algorithm for a subsequent local search, minimized the objective function.
The PSO algorithm is a metaheuristic population-based algorithm that makes use of the group behaviour of an ensemble of particles, intended as a social organism. This "swarm" of particles searches the solution space using simple rule-based decisions combined with randomized decisions, sharing information about the best solution found so far. The PSO algorithm does not require any previous knowledge of the objective function or its derivative, which allows it to deal with discontinuities. It can also search very large search spaces, tends not to "get stuck" in local optima and can evaluate a large number of cost-function candidates. However, although the algorithm always finds excellent solutions, reaching the global optimum cannot be guaranteed [45,46].
The HJ algorithm is a direct search algorithm that, starting from a (user-)selected base point, performs a local search using a defined step within the variability range of the input parameters. If the exploration does not decrease the cost, the step is reduced and the search continues from the best solution found so far. If, conversely, the objective function decreases, a temporary best value is found and assumed as the new base point for the following search. The process continues as long as the cost function keeps decreasing. The HJ algorithm is not gradient-based: it can easily handle discontinuities but may be attracted by local optima if the first base point has not been wisely selected [47].
The hybrid algorithm involved in the calibration process takes advantage of the strengths of both algorithms while reducing their disadvantages. Indeed, PSO rapidly converges to a set of very good solutions (potential optima), the best of which is taken as the base point for HJ. Subsequently, the direct search algorithm investigates the neighbourhood of this base point more deeply, searching for the global optimum.
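The two-phase strategy can be sketched as follows (a compact illustrative Python implementation of the PSO-then-HJ idea, not GenOpt's actual algorithms; all parameter values and names are ours):

```python
import numpy as np

def pso_then_hj(f, bounds, n_particles=20, pso_iters=50, seed=0):
    """Minimize f over a box: PSO global phase, then Hooke-Jeeves
    pattern search started from the best PSO particle."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = len(lo)

    # --- Phase 1: Particle Swarm Optimization (global search) ---
    x = rng.uniform(lo, hi, size=(n_particles, d))
    v = np.zeros_like(x)
    pbest = x.copy()
    pcost = np.array([f(p) for p in x])
    gbest = pbest[pcost.argmin()].copy()
    for _ in range(pso_iters):
        r1, r2 = rng.random((2, n_particles, d))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        cost = np.array([f(p) for p in x])
        improved = cost < pcost
        pbest[improved], pcost[improved] = x[improved], cost[improved]
        gbest = pbest[pcost.argmin()].copy()

    # --- Phase 2: Hooke-Jeeves (local refinement from the PSO best) ---
    best, fbest = gbest, f(gbest)
    step = 0.1 * (hi - lo)
    while step.max() > 1e-7:
        moved = False
        for i in range(d):
            for sign in (1.0, -1.0):             # exploratory moves per coordinate
                trial = best.copy()
                trial[i] = np.clip(trial[i] + sign * step[i], lo[i], hi[i])
                ftrial = f(trial)
                if ftrial < fbest:
                    best, fbest, moved = trial, ftrial, True
        if not moved:
            step *= 0.5                          # shrink the step when stuck
    return best, fbest
```

In the paper's setting, `f` would be one IDA ICE simulation returning the CSE for a given parameter vector.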
The calibration was performed as a single-objective optimization problem, considering only the candidate parameters selected by the sensitivity analysis. Once the process was over, the goodness of the result could be evaluated again in terms of RMSE, CVRMSE and NRMSE, also to facilitate the comparison with the manual procedure.

Building Description
The building selected for the case study is the administrative headquarters of CIRM (acronym for International Radio Medical Centre), an organisation that provides radio assistance and medical rescue services to international ships during navigation.

The structure, intended for office and outpatient use, was built in the early 1960s in Rome in the E.U.R. district (latitude 41°49'48.3" N, longitude 12°28'35.3" E, altitude 30 m a.s.l.).

The building (Figure 3) consists of three floors: a basement (partially below ground level on three sides), a ground floor and a first floor. The structure is a concrete frame with brick cladding and three different external finishes: exposed brick (ground and first floors), and tuff blocks and white marble slabs in the south-east and north-west areas of the basement, respectively. The total floor area is about 650 m² for a gross volume of approximately 2400 m³.

In winter, heating is provided by a centralised natural gas boiler feeding radiators, while electric boilers provide domestic hot water (DHW). On the ground and first floors, independent air conditioners for cooling are available in a few rooms only.
The basement was refurbished in the late 1990s, with new windows and radiators and the introduction of a centralized HVAC system for cooling. The ground and first floors are still in their original state. Since independent air conditioners for cooling are available in a few rooms only, and the centralized HVAC system was not active in the basement, the model was not calibrated for cooling; the summer period was instead used to calibrate the building envelope.

Measurement Campaign
Modelling and calibration required a monitoring campaign to measure both outdoor and indoor climate data. Outdoor measurements were carried out in order to use more representative climate data within the simulations. A portable measuring station was used.
The station consisted of a copper anemometer (7911, Davis Instruments, Hayward, CA, USA) with a wind speed range of 0.5–89 m/s and an accuracy of ±1 m/s for speed and ±7° for wind direction, a silicon photodiode solarimeter (SP Series, Apogee Instruments) with a range of 0–350 mV and a sensitivity of 0.20 mV per W m⁻², and a thermo-hygrometer with an anti-radiation screen (Hygroclip2, Rotronic, Bassersdorf, Switzerland) with an accuracy of ±0.1 °C for temperature and ±0.8% for relative humidity. The same portable measuring system also provided an indoor station equipped with a thermo-hygrometer (Hygroclip2, Rotronic, Bassersdorf, Switzerland) with an accuracy of ±0.1 °C for temperature and ±0.8% for relative humidity, for indoor climate measurements. A control data logger (CR1000, Campbell Scientific, Logan, UT, USA), with a 2 MB flash processor for the operating system, 4 MB SRAM for program and data memory, 16 analog single-ended inputs, 100 Hz scan speed and ±0.06% accuracy on analog measurements, was used to acquire both outdoor and indoor data at a one-minute time rate.
To obtain a larger spectrum of information about the building conditions, three measurement campaigns were carried out in different periods of the year and in different rooms. Outdoor air temperature, relative humidity, horizontal irradiance and indoor air temperature were collected in each survey (Figure 4). During the first period, Measurement Survey A (21–26 June 2015), the outdoor station was on top of the building's roof and the indoor station in a room of the first floor with north-west exposure; Figure 4a presents the environmental data recorded during this period. In the second survey, Measurement Survey B (28 July–1 August 2015), outdoor measurements were taken from a balcony on the first floor, since it was not possible to reach the rooftop with the available cabling of the outdoor instruments connected to the indoor station, which was located in a room in the basement with south-east exposure (Figure 4b).
For the third period, Measurement Survey C (13–18 December 2015), again due to logistic problems, only the indoor station was used; weather data from the E.U.R. district, considered representative of the building's surroundings, were used for the outdoor conditions. The indoor measurement was taken in a room on the first floor with west exposure (Figure 4c).
During the three monitoring periods, the indoor measuring station was located in the centre of the reference room, far from walls, openings and direct heat sources (natural and artificial light, active equipment and plants), at a height of 1.5 m from the floor.

Energy Consumption
Bills of monthly natural gas consumption were available for the comparison of the calibration techniques. The consumption reported in the bills was estimated by the energy company using measured data from past years and the outdoor ambient conditions of the billing period. Since delivered energy for fuel heating is expressed in IDA-ICE in kWh, while the bills indicated the amount of monthly consumed gas in scm (standard cubic metres), a conversion from scm to kWh was carried out, considering (as reported in the bills) a GCV (Gross Calorific Value) of 39.64 MJ/m³ (Table 2).
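The conversion amounts to multiplying the gas volume by the gross calorific value and changing MJ into kWh (1 kWh = 3.6 MJ). A one-line sketch (Python; the names are ours):

```python
def scm_to_kwh(volume_scm, gcv_mj_per_m3=39.64):
    """Convert natural gas volume (standard cubic metres) to energy (kWh)
    using the gross calorific value reported in the bills."""
    return volume_scm * gcv_mj_per_m3 / 3.6   # 3.6 MJ per kWh
```

With the billed GCV of 39.64 MJ/m³, 1 scm therefore corresponds to about 11.01 kWh.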

Model Construction
Starting from the drawings' information, and assuming part of the characteristics of the construction elements from buildings of the same typology and construction period (an inspection of the wall layers was not allowed), a first-guess model was built. The plant system was then modelled and added.

Envelope
According to the original drawings, a geometrically representative and well-oriented 3D model was built (Figure 5), including the high trees in the surroundings (for their shading effect), the excavated part of the garden and the division of each floor into several thermal zones, 43 in total (Figure 6), representing the internal rooms. IDA-ICE's database was loaded with material property information to model the construction elements (Table 3).
Finally, concerning the construction elements, three different wall packages and three slab typologies were defined. Two window types with corresponding internal shading, one glazed internal door and two door types were specified (Table 4). Thermal bridges and infiltration rates were set to typical values, as shown in Table 5.
The first guess model was adopted as starting point for both calibration processes.

Plant
The plant model referred to the heating system only, which consists of a boiler, several water radiators (almost one in each room/thermal zone) and a controller for the three-way valve (Figure 7). According to the operating manual, the burner-boiler system was given a maximum heating capacity of 200 kW and an efficiency of 0.93.
Concerning the water radiators, the emitted power P is dynamically calculated by IDA-ICE as

P = K · l · dT^N (11)

where l is the length of the radiator, dT is the instantaneous temperature difference between the water and the zone air, K is the emitted power per unit of equipment length and N is an exponent that, like K, depends on the material and the structure of the radiator.
For the radiators at the basement level, according to the manual, it was possible to determine all the technical specifications. A single element of 0.08 × 0.08 × 0.77 m³, at ΔT = 50 °C, was given an exponent coefficient of N = 1.339 and a maximum power at capacity of 147.4 W (the total value then depends only on the radiator length).
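Equation (11) implies that an element's emission scales from its nominal rating as P(dT) = P_nom · (dT/dT_nom)^N. A minimal sketch using the basement element's data (the function name and the zero-emission guard are our assumptions):

```python
def radiator_element_power(dt, p_nominal=147.4, dt_nominal=50.0, n=1.339):
    """Emitted power [W] of one radiator element, scaled from its nominal
    rating at dt_nominal with the characteristic exponent N (Equation (11))."""
    if dt <= 0:
        return 0.0  # assumed: no emission when water is not warmer than zone air
    return p_nominal * (dt / dt_nominal) ** n
```

At half the nominal temperature difference the element emits well below half its rated power, reflecting the exponent N > 1.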
For the radiators at the ground and first floor levels, since technical data were not available, a first guess of N = 1.28 (typical for ordinary radiators) was given and the maximum power at capacity was calculated as suggested in UNI 10200 [48] (Equation (12)) from the external convection surface S, the volume V of the radiator and an experimental parameter C related to the material and typology of the equipment (for finned steel radiators the given value is C = 22500 at ΔT = 60 °C).

At the three-way valve, supplied water can be mixed with return water in order to adjust the water temperature to reach the desired ambient conditions. The controller is supposed to manage the valve according to a preselected heating curve (coupled values of outdoor T and supplied water T), attempting to reach a desired indoor ambient temperature set point (25 °C). It has to be noted that, since the two calibrations were carried out by different operators, in the manual calibration a standard heating curve provided by IDA ICE was considered, while for the automatic calibration the real heating curve from the controller specifications was used (water T = 90 °C for outdoor T = −2.5 °C and water T = 35 °C for outdoor T = 20 °C).
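The controller's heating curve used in the automatic calibration can be sketched as a linear interpolation between the two specified points; clamping outside the specified outdoor-temperature range is our assumption:

```python
def supply_water_temp(t_out, curve=((-2.5, 90.0), (20.0, 35.0))):
    """Supply water temperature [°C] as a function of outdoor temperature,
    by linear interpolation between the controller's heating-curve points."""
    (x0, y0), (x1, y1) = curve
    if t_out <= x0:       # assumed: clamp below the coldest specified point
        return y0
    if t_out >= x1:       # assumed: clamp above the warmest specified point
        return y1
    return y0 + (y1 - y0) * (t_out - x0) / (x1 - x0)
```

Halfway along the outdoor range (at 8.75 °C) the curve returns the midpoint supply temperature of 62.5 °C.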
During winter, the heating system is on every day according to the working hours schedule (8:00-17:00).

Occupancy, Equipment and Lights
Occupancy, equipment and lights were generally scheduled as present/on according to working hours (all days, 8:00-17:00). Occupants were considered to have a metabolic rate of 1.2 MET; nevertheless, only 11 of the 44 rooms (office rooms) were constantly occupied during working hours. The occupancy of the other rooms (meeting rooms, ambulatory, archive, etc.) was considered not influential. Equipment basically consists of workstations (computer, monitor, printer, etc.), whose power (related to their thermal gain) was estimated at 100 to 200 W and was the object of further calibration. Each lighting element was given a power of 100 W.
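The resulting hourly internal-gain schedule for one occupied office room can be sketched as follows; the 150 W workstation figure is the midpoint of the 100-200 W first-guess range, and the occupant heat output (about 125 W at 1.2 MET for a seated adult) is our assumption:

```python
def internal_gains_w(hour, occupied=True):
    """Total internal gains [W] for one office room at a given hour (0-23)."""
    if not occupied or not (8 <= hour < 17):   # outside working hours: all off
        return 0.0
    occupant = 125.0       # ~1.2 MET for a seated adult (assumed value)
    workstation = 150.0    # midpoint of the 100-200 W first-guess range
    lighting = 100.0       # one 100 W lighting element
    return occupant + workstation + lighting
```

Rooms flagged as unoccupied (33 of the 44) contribute no internal gains at any hour.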

Results
Both manual and automatic calibration were performed for the correct modelling of the CIRM administrative headquarters building. Both calibration methods included a multi-stage process. Three tuning phases, corresponding to the three monitoring periods, were involved:
• Calibration A (first floor): using five days of summer data (22-26 June), a first tuning was given to the building envelope. Calibration was performed referring to an office room situated in the north part of the first floor of the building. No occupants were included, and lights, equipment and plants were turned off.
• Calibration B (basement): using a further five days of summer data (28 July-1 August), a second building envelope tuning was performed. Calibration referred to a conference room situated in the south part of the basement. Again, no occupants, lights, equipment and plants were considered.
• Calibration C (heating system): using five days of winter data (14-18 December), the heating system was tuned. Calibration was related to a north-west office room situated on the first floor.
Internal gains due to occupants, equipment and lights were scheduled according to working time during the day.
Calibrations A and C, due to adjustments of the monitoring instruments and room conditions during the initial phases of the campaigns, started from the second monitored day. Since different operators carried out the two calibration processes in parallel, apart from the results of each step, the strategies adopted were sometimes different. Regarding the computational time, both procedures were executed on a PC equipped with a quad core i7-2600 (3.4 GHz) processor and 8 GB RAM.

Calibration A (First Floor)
Calibration A aimed to provide a first tuning to the building envelope, focusing on the ground and first floors. The process involved parameters related to the external wall with brick cladding (layers and bricks conductivity), the covering roof (layers), the integrated window shading for the original windows (multiplier for glass conductivity and solar gain), thermal bridges and infiltrations (Table 6).
The calibration required over 1 month of work and attempts to achieve the final solution. Figure 8 shows the output of the 51 most representative simulations, where the sole run time of each simulation took 5-7 min to be completed.

Calibration B (Basement)
The second calibration process was directed again to the building envelope but concentrated on the basement floor. Calibration B involved parameters related to the external wall with tuff cladding and the external wall with marble cladding (layers), the new windows (glass thermal transmittance) and their integrated shading (multiplier for solar gain). Details are reported in Table 7. The calibration required almost 2 weeks to achieve the final solution (Figure 9). In the figure, only the 25 most representative simulations are shown (where the run of each simulation took 4-5 min to be completed).

Calibration C (Heating System)
The tuning objective for Calibration C was the heating system. Since the first-guess model simulated a lower temperature than measured (an almost constant difference of 1 °C), the water radiators' emitted power was considered the main element to be involved in this tuning process. Parameters related to the emitted power equation, namely the exponent N (Equation (11)), and to the maximum emitted power at capacity equation from UNI 10200, namely the experimental parameter C (Equation (12)), were then considered (Table 8). The calibration needed about 3 weeks to achieve the final solution, and the 15 simulations (where the run of each simulation took 10-12 min to be completed) are shown in Figure 10.

Calibration A (First Floor)
Initially, k = 33 parameters, involving thermal bridges, infiltration, walls' layers, emissivity and reflectance, windows and shading properties, were considered (details reported in Table 9; T = thickness, E = emissivity, R = reflectance, λ = thermal conductivity, mult. g = multiplier for glass solar gain, mult. U = multiplier for glass thermal transmittance). Subsequently, considering for each parameter a number of EEs r = 10 and a number of discretized levels L = 6 to search in a fixed uncertainty range of ±20% from the first-guessed value, SA was performed and, after 17 h of computational time, the number of parameters was reduced to 10 (Figure 11). The input parameters were then selected, and it was possible to set and run the optimization problem for the calibration. Once started, the optimization needed 437 simulations and about 5 h of total computation time to achieve the final solution (Figure 12).
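The screening step can be illustrated with a stripped-down one-at-a-time elementary effects sketch on the unit hypercube (a simplification of the full Morris trajectory design; all names are illustrative). Each of the k inputs is perturbed r times by the standard step Δ = L/(2(L−1)) from randomly chosen grid points, and the mean absolute effect μ* ranks the parameters:

```python
import random

def morris_mu_star(f, k, r=10, L=6, seed=0):
    """One-at-a-time elementary effects screening: returns mu* per input."""
    rng = random.Random(seed)
    delta = L / (2 * (L - 1))                  # standard Morris step, 0.6 for L = 6
    grid = [i / (L - 1) for i in range(L)]     # L discretized levels in [0, 1]
    starts = [v for v in grid if v + delta <= 1 + 1e-12]  # keep x + delta in range
    abs_ees = [[] for _ in range(k)]
    for _ in range(r):
        x = [rng.choice(starts) for _ in range(k)]
        fx = f(x)
        for i in range(k):                     # perturb one input at a time
            xi = list(x)
            xi[i] += delta
            abs_ees[i].append(abs((f(xi) - fx) / delta))
    return [sum(e) / r for e in abs_ees]
```

For a linear test function the ranking recovers the coefficient magnitudes exactly; in the calibration, the unit interval of each input would be mapped onto the ±20% uncertainty range around its first-guess value.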

Calibration B (Basement)
Using the final result of the first stage as the new initial model, in this second phase k = 17 parameters were considered (details reported in Table 10; T = thickness, E = emissivity, R = reflectance, λ = thermal conductivity, mult. g = multiplier for glass solar gain, mult. U = multiplier for glass thermal transmittance). Taking into account the identical settings of the previous case (number of EEs r = 10, number of levels L = 6 and fixed uncertainty range of ±20% from the initial value), SA was performed and, after 9 h of computational time, the number of parameters was reduced to 9 (Figure 13). The selected input parameters were involved in the second optimization problem for the calibration. Once started, the optimization needed 408 simulations and about 4.5 h of total computation time to achieve the final solution (Figure 14).
Calibration B included envelope-related parameters. Referring to the first calibration period and monitoring the simulated T in the office room used for Calibration A, a check simulation was done to evaluate the effect of the changes introduced in the above-mentioned parameters on the previous results. This comparison results in a cross validation, since the calibrated model outputs were compared with experimental data coming from a different time period and a different zone of the building. Not only was the effect restrained, but it was also positive in terms of the accuracy of the model (Figure 15).

Calibration C (Heating System)
Starting from the result of the second stage as the new initial model, in this third phase the envelope of the model was considered as calibrated and the tuning process focused on the plant.
As most of the plant-related parameters were known or deducible and, conversely to the manual calibration case, changes in the emitted power equation parameters did not give sensitive model improvements, the highest uncertainty was considered to be related to the controller for the supply heating water temperature.
A first calibration attempt was made trying to identify the most appropriate heating curve. Although the statistical parameters for temperature were good for the model resulting from this calibration process, the validation did not confirm the goodness of the process. Although the whole-year consumption was similar for the real building and the energy model, the monthly amount of consumed gas turned out to be considerably higher for the warmer periods (1-27 November 2014 and 13 March-14 April 2015) and lower for the colder one (24 December 2014-22 January 2015), which resulted in a too high CVRMSE value. A new strategy was then adopted. Referring to the real fuel consumption from the bills (Table 2), it was possible to notice that, despite the change in the outdoor conditions across the months, the amount of consumed gas did not present relevant variations. This suggested an almost constant trend for the supply heating water temperature, which could be justified by assuming that, due to the long duration of service, the controller is not working properly anymore.
According to this new hypothesis, a second and definitive attempt was done to calibrate the plant system.
Since the number of pre-selected parameters (Table 11) was restricted (k = 4), no SA was performed in this case. Involving these input parameters, the third optimization problem for the calibration was set up and run. Once started, the optimization needed 299 simulations and about 6 h of total computation time to achieve the final solution (Figure 16).

Discussion
Tables 12-14 show the final model settings for both the manual and automatic procedures. Table 12 presents the infiltration and thermal bridges values that were not modified during either calibration process. Different results and configurations in Calibration A led to a different starting model for Calibration B: while the manual model simulated a lower temperature than measured, the automatic model, in reverse, provided a higher value of indoor T. The calibrations then attempted to raise the internal T (increasing solar gain and insulation) or to decrease it (through emissivity, reflectance and wall thickness) for the manual and automatic procedures, respectively.
Concerning the plant-related parameters, the manual model keeps a standard heating curve, which gives lower mean supply water T values but higher emitted power for the radiators. The automatic model instead has an almost constant value for the supply water T and radiator parameters obtained as standard from UNI 10200. The two approaches attempted to increase the simulated indoor T, for the manual procedure, or to adjust the indoor T value while keeping an almost constant fuel consumption, for the automatic procedure.
Nevertheless, both methods show good and physically plausible results in having representative models for the case study.
Comparing results from both methods (Table 15), it is possible to observe how automatic calibration (in bold) leads to better solutions in terms of statistical parameters (in every stage of the tuning process). In addition, the time needed to achieve the above-mentioned results (even including SA, when applied) is strongly reduced through automation, which also allows the evaluation of a higher number of scenarios.

Comparison with the Energy Bills
Data related to the heating system consumption of the building and the manual and automatic calibrated models (MCM and ACM, respectively) were compared (both monthly and seasonally) in order to assess the magnitude of the error made in terms of energy consumption, considering the relative Bias Error (BE) and CVRMSE.
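The two statistics can be sketched as follows; the sign convention for BE is chosen so that overestimation by the model gives a positive value, consistent with the reported results, and the function names are ours:

```python
from math import sqrt

def bias_error_pct(measured, simulated):
    """Relative bias error [%]: positive when the model overestimates."""
    return 100.0 * (sum(simulated) - sum(measured)) / sum(measured)

def cvrmse_pct(measured, simulated):
    """Coefficient of variation of the RMSE [%] over the measured mean."""
    n = len(measured)
    rmse = sqrt(sum((m - s) ** 2 for m, s in zip(measured, simulated)) / n)
    return 100.0 * rmse / (sum(measured) / n)
```

Note that symmetric over- and underestimations cancel in BE but not in CVRMSE, which is why both indices are needed to judge a calibrated model.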
The climate file used for the energy simulation for the period of interest was built through data coming from the ESTER lab, University of Rome Tor Vergata, Rome, Italy.
The fuel heating consumption comparison for the MCM (Table 16) showed good monthly values for BE, barely higher (in absolute value) than 10%. The global value for BE exceeded the limits fixed by the ASHRAE guideline, but the CVRMSE was optimal.
The fuel heating consumption comparison for the ACM is shown in Table 17. As predicted, the consumption period corresponding to the same part of the year as the calibration week showed the best result. Worse but still acceptable results were obtained for the warmer periods (1-27 November 2014 and 13 March-14 April 2015), as well as for the colder one (24 December 2014-22 January 2015). Global values for BE and CVRMSE were optimal according to the bounds fixed by the ASHRAE guideline. The small disagreement between simulated and real data could be explained by the fact that primary energy consumption was estimated, and not directly measured, by the gas company. The energy consumption comparison between the MCM, the ACM and the real bills (Figures 17 and 18) shows that the MCM generally underestimates the energy consumption (only the period 18 February-12 March has a higher value for simulated energy, at 5.92%) with an almost constant BE value (around −10%).
This leads to a lower simulated seasonal consumption, probably due to the insufficient supply water T of the standard heating curve, which is not completely suitable for the improvements in the radiators' power emission. On the other hand, the ACM does not show a constant trend, generally overestimating colder periods and underestimating warmer ones. This leads to a more balanced seasonal BE value, which globally slightly overestimates the energy consumption. In addition, even if the BE results for 1-27 November and 24 December-22 January are worse for the automatic calibration, the consumption of the other periods is definitely more accurate, resulting in a lower CVRMSE value. So, for this specific case, the ACM fits the real energy consumption better than the MCM. This conclusion could imply that the ACM could provide a more reliable prediction of energy consumption than the MCM in cases in which the energy model is used to evaluate the energy performance of building retrofit solutions.

Conclusions
The work presented introduces an automatic calibration procedure and compares it against a manual process to highlight differences, advantages and disadvantages by applying both of them to a dynamic simulation model of a real office building in Rome. The methodology for automatic calibration includes SA for selecting the most relevant parameters and simulation-based optimization to minimize the difference between the simulated and measured data, performed automatically by coupling IDA ICE with MatLab and with GenOpt, respectively. It has to be considered that the main calibration activity was pre-eminently focused on the envelope, in part on the plant and only marginally on the equipment. Although occupancy and lighting can generally be relevant for calibration procedures, they were not considered in this analysis since they were not influential in this particular study. So, in principle, the results of this paper can be considered valid for cases in which these parameters can be neglected. Both manual and automatic calibrations show good results in terms of accuracy and a physically plausible representation of the case study, although automation presents finer tuning results, even in a reduced time frame. For instance, referring to the envelope calibration, the manual procedure gave an RMSE of 0.35 °C and 0.62 °C for Calibration A (first floor) and Calibration B (basement) respectively, while the automatic procedure showed an RMSE of 0.27 °C and 0.42 °C. Concerning the heating system calibration, an RMSE of 0.54 °C and 0.39 °C resulted from the manual and automatic processes, respectively. Automatic calibration also results in a better prediction of energy consumption, since the comparison with the real consumption bills produces a BE of 2.11% and a CVRMSE of 8.37%, versus a BE of −7.69% and a CVRMSE of 9.66% for the manual process.
Nevertheless, automatic calibration appears still strictly linked to the operator expertise, since the quality of the results depends on the settings and monitoring of the optimization procedure.