Energym: A Building Model Library for Controller Benchmarking

We introduce the Python-based open-source library Energym, a building model library to test and benchmark building controllers. The incorporated building models are presented with a brief explanation of their function, location and technical equipment. Furthermore, the library structure is described, highlighting the necessary features to provide the benchmarking and control capabilities, i


Introduction
Buildings play an important role in the total energy consumption and greenhouse gas (GHG) emissions worldwide. According to [1], 36% of the global final energy is used in buildings (30% in building operation) and buildings account for 39% of GHG emissions (28% in operation, see also [2]). Alongside building renovations, smart control strategies will be key technological enablers for reducing buildings' GHG footprints and meeting the Paris Agreement goals.
The current standard for controlling Heating, Ventilation and Air-Conditioning (HVAC) systems is formed by rule-based and proportional-integral (PI) controllers [3,4], but their rather simple nature, combined with possible tuning errors, can lead to sub-optimal control behavior [5]. Therefore, automated and efficient building control provides the chance to significantly reduce energy consumption and emissions. Recent research approaches cover the fields of (robust) model predictive control (MPC) (see e.g., [6][7][8][9]), adaptive or learning-based MPC (see e.g., [10]), and reinforcement learning (RL) (see e.g., [11][12][13]). A comprehensive overview of MPC and data-driven approaches for building control can be found in [14,15]. Yet many of the aforementioned studies suffer from the non-standardized evaluation of their control performance. Some were demonstrated in simulations (e.g., [8]), others at real sites (e.g., [7]), but most of them on a single building or simulated model, and for a rather short period of time (e.g., one-day and one-week experiments for [8], and five-day experiments for [7]). Hence, a direct comparison of the performance of the control methods is impossible. Moreover, for industrial purposes, it is desirable to create scalable

Buildings Overview
Energym includes 11 simulation models to date (three Modelica models and eight EnergyPlus models). The EnergyPlus models are all updated to the current EnergyPlus version 9.4. An overview of the installed technical equipment and its controllability is given in Table 1. A description of each model's inputs and outputs is provided in Appendix B. The models differ in size, number of rooms, usage profile, technical equipment, controllability, and climate zone. The seven buildings that form the basis of the 11 models are listed below. Four of them are available in two versions, which differ either in the control (e.g., controlling thermostat setpoints vs. controlling the equipment directly) or in the installed equipment. The buildings have the following characteristics.
• Apartments: A residential building with four stories, each being one apartment, and eight thermal zones (two per story). It is located in Spain and has a central geothermal Heat Pump (HP) providing heat to all apartments. The building envelope is fictitious, based on typical Spanish construction materials, but the HP was calibrated with a real HP located in the IREC laboratory (see Section 4.2).
• Apartments2: This building shares its envelope with the Apartments building, but differs in the details of the technical equipment: each apartment possesses its own air-to-water HP and its own heating storage tank. In this building, the electrical systems (solar panels, battery) were calibrated with real systems.
Table A6.
The calibration methodology used for the models is explained in Section 4.

Design Features
Energym is designed to work with different controller types, including rule-based controllers (RBC), MPC controllers and RL-based controllers. Hence, the building environments and their interface are provided, but the controller structure is not prescribed and is left to the user. Moreover, model performance evaluation is not based on fixed rewards (as in Gym) but is implemented via KPIs that can be computed by the user after an evaluation run. The main features of the library are outlined below (full documentation of the library, describing usage and installation, is available at https://bsl546.github.io/energym-pages/ (accessed on 14 April 2021)).

Standardized Evaluation
For each model outlined in Table 1, a physical objective to be reached is predefined. This objective might be, e.g., the minimization of the CO2 emissions related to the building operation. The controllers also have to satisfy thermal constraints to guarantee occupant comfort. These two quantities (objective and constraints) are tracked with the implemented KPIs; see Section 3.3, Table 2. For each building, the evaluation phase with the predefined KPIs is run over a fixed period of time and under predefined weather conditions.
Wrappers are implemented to cope with different controller needs. In particular, wrappers are provided to rescale inputs and/or outputs between given bounds in a min-max fashion. The scaling can be beneficial for optimization-based controllers like MPC, due to the model and solver structure used. For RL controllers, an RL wrapper is provided that changes the outputs of the step method to match those of the Gym library exactly, i.e., outputs, reward, done, info. One slight change with respect to Gym, however, is that the reward design is left to the user and must be specified at wrapper initialization. This design choice gives users freedom in the reward design phase, the main objective of any controller being to minimize the predefined KPIs. Similarly, for controller speed-up (in particular for MPC), a downsampling wrapper is provided to reduce computation time, making it possible to solve the control problem less frequently than the standard step method would impose.
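The wrapper ideas described above can be illustrated with a minimal, self-contained sketch. It does not use the Energym wrapper classes: `ToyEnv`, `RLStyleWrapper`, `scale`, `unscale`, the bounds of 16 to 26 °C and the toy building response are all hypothetical and serve only to show min-max scaling and a Gym-like step output with a user-defined reward.

```python
from typing import Callable, Dict, List, Tuple

Bounds = Dict[str, Tuple[float, float]]


def scale(values: Dict[str, float], bounds: Bounds) -> Dict[str, float]:
    """Map each physical value to [0, 1] using its (lower, upper) bounds."""
    return {k: (v - bounds[k][0]) / (bounds[k][1] - bounds[k][0]) for k, v in values.items()}


def unscale(values: Dict[str, float], bounds: Bounds) -> Dict[str, float]:
    """Inverse of scale: map values in [0, 1] back to physical units."""
    return {k: bounds[k][0] + v * (bounds[k][1] - bounds[k][0]) for k, v in values.items()}


class ToyEnv:
    """Hypothetical stand-in for a building model; step() returns an outputs dict."""

    def __init__(self) -> None:
        self.temp = 18.0

    def step(self, inputs: Dict[str, List[float]]) -> Dict[str, float]:
        setpoint = inputs["Z01_T_Thermostat_sp"][0]
        self.temp += 0.5 * (setpoint - self.temp)  # crude first-order response (assumed)
        return {"Z01_T": self.temp}


class RLStyleWrapper:
    """Wraps step() so that it returns (outputs, reward, done, info)."""

    def __init__(self, env: ToyEnv, reward_fn: Callable[[Dict[str, float]], float], max_steps: int) -> None:
        self.env = env
        self.reward_fn = reward_fn  # reward design is supplied by the user
        self.max_steps = max_steps
        self.steps = 0

    def step(self, inputs: Dict[str, List[float]]):
        outputs = self.env.step(inputs)
        self.steps += 1
        done = self.steps >= self.max_steps
        return outputs, self.reward_fn(outputs), done, {}


# Assumed bounds for the thermostat setpoint, in degrees Celsius.
input_bounds: Bounds = {"Z01_T_Thermostat_sp": (16.0, 26.0)}
action = unscale({"Z01_T_Thermostat_sp": 0.5}, input_bounds)

env = RLStyleWrapper(ToyEnv(), reward_fn=lambda out: -abs(out["Z01_T"] - 21.0), max_steps=2)
outputs, reward, done, info = env.step({k: [v] for k, v in action.items()})
```

Keeping the reward as a constructor argument mirrors the design choice stated above: the environment stays fixed while each user decides how the tracked quantities are turned into a learning signal.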

Forecasting Capabilities
For designing controllers such as MPC, it is important to have descriptions of external disturbances. For this, we provide weather forecasts (including irradiance and temperatures), optionally given by the exact values in the used weather files or by stochastic variations of those. Furthermore, we provide forecasts that are highly relevant for certain models: EV usage schedules for the Apartments and Apartments2 buildings, and electricity mix forecasts for the Seminarcenter. Random seeds to generate the forecasts are fixed in evaluation mode to ensure reproducibility of the results.
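As an illustration of seeded stochastic forecasts, the following sketch perturbs exact future values with Gaussian noise. The function name `stochastic_forecast`, the noise model (standard deviation growing with the square root of the horizon) and the sample values are assumptions made for this example; they are not taken from the library.

```python
import random
from typing import List


def stochastic_forecast(exact: List[float], sigma: float, seed: int) -> List[float]:
    """Perturb exact future values with Gaussian noise that grows with the horizon."""
    rng = random.Random(seed)  # local generator: the same seed gives the same forecast
    return [x + rng.gauss(0.0, sigma * (k + 1) ** 0.5) for k, x in enumerate(exact)]


# Outdoor temperatures as they would be read from a weather file (made-up values).
exact_temps = [5.0, 5.5, 6.2, 7.0]

eval_a = stochastic_forecast(exact_temps, sigma=0.3, seed=42)
eval_b = stochastic_forecast(exact_temps, sigma=0.3, seed=42)  # identical to eval_a
perfect = stochastic_forecast(exact_temps, sigma=0.0, seed=0)  # exact values
```

Fixing the seed makes two evaluation runs see the same disturbance forecast, which is the reproducibility property described above; setting `sigma` to zero recovers the exact-forecast option.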

Usage and Code Example

Basic Structure and Usage
After importing Energym, a model can be created by calling the make method and specifying the name of the model and other optional parameters, i.e., the starting day of the simulation, the number of simulated days, the used weather file, and the used KPIs, all of which use default values if not specified upon initialization. The interaction with the model, i.e., passing control inputs and receiving outputs, is done with the step method. Control inputs are Python dictionaries, with the setpoint name as key and input as value (possibly a list with multiple entries for multiple consecutive inputs), e.g., {"Z01_T_Thermostat_sp": [21]} (or {"Z01_T_Thermostat_sp": [21,22,21]}). Outputs are also defined as dictionaries using the predefined output names as keys. The main inputs and outputs for each model are given in Appendix B. A full list is available in the online documentation. The Wrapper class is implemented to provide input-output wrapper functionalities. Weather and stochastic disturbances forecasts are available with the get_forecast method.
For the tracking of the KPIs, a KPI object is initialized for each model; it automatically records the necessary data. Calling the method get_kpi returns the evaluation for a specified time interval (by default all the completed steps) as a dictionary. More details on handling the KPIs and on the default ones are given in Section 3.3.
A simple usage example is given in Appendix A.1.

Performance Evaluation
A pre-compiled FMU is provided for each building model and can be used with different weather files. This allows the users to train their controllers (i.e., RL agents or models for MPC) with different weather files, while the weather file for final evaluation is fixed. These fixed weather conditions over a predefined period of time ensure comparability of the control performances via the implemented KPIs. The characteristics of these fixed evaluation scenarios are displayed in Table 2. The defined KPIs fall into the categories of thermal comfort (related to temperature constraints) and objective KPI (related to the objective to minimize).

KPI Definition
For the thermal comfort, a range of acceptable temperatures is defined. The tracked KPIs are the average deviation from the target temperatures for each controlled thermal zone and the total number of range violations. Let the desired temperature range be defined by the interval I = [a, b]. Then the average deviation d(T, I) for temperature measurements T = {t_i : i = 1, . . . , N} is defined as

$$d(T, I) = \frac{1}{N} \sum_{i=1}^{N} \min_{s \in I} |t_i - s|. \tag{1}$$

The number of total violations v(T, I) is defined as

$$v(T, I) = \sum_{i=1}^{N} \delta(t_i, I), \tag{2}$$

where δ(t, I) = 1 if t ∉ I and δ(t, I) = 0 otherwise.
The average energy exchanged with the grid is tracked for the models based on the Apartments and Apartments2 buildings. Let E_prod = {e_prod,i : i = 1, . . . , N} be the set of N consecutive measurements of produced energy and E_con = {e_con,i : i = 1, . . . , N} of consumed energy. Then the average energy exchange e(E_prod, E_con) is defined as

$$e(E_{prod}, E_{con}) = \frac{1}{N} \sum_{i=1}^{N} |e_{prod,i} - e_{con,i}|. \tag{3}$$

In the evaluation scenario, the goal is to minimize this quantity and therefore maximize the self-consumption of produced energy.
The objective for the Offices, MixedUse, SimpleHouse and SwissHouse buildings is to minimize their power consumption. Let the mean power demand for N simulation steps be given by D = {d_i : i = 1, . . . , N}. The minimization objective is again given by averaging over the measurements, so the average power demand p(D) is defined as

$$p(D) = \frac{1}{N} \sum_{i=1}^{N} d_i. \tag{4}$$

The environments based on the Seminarcenter building track the CO2 emissions for the installed gas boiler and the varying electricity mix. A minimization of this emission is the focus of their evaluation scenario. Let the emission values be given by C = {c_i : i = 1, . . . , N}. The computed KPI for those measurements is the average emission g(C) defined as

$$g(C) = \frac{1}{N} \sum_{i=1}^{N} c_i. \tag{5}$$

Instead of using predefined KPIs, it is also possible to define custom KPIs. An example of this is given in Appendix A.2.
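The KPI definitions of this section can be sketched in a few lines of Python. The sketch assumes that the deviation of a measurement is its distance to the comfort interval (zero inside the range) and that the grid exchange is the absolute difference between produced and consumed energy. The function names and sample data are hypothetical and are not the library's implementation.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def avg_deviation(temps: Sequence[float], interval: Interval) -> float:
    """Mean distance of each measurement to the range [a, b]; zero when inside."""
    a, b = interval
    return sum(max(a - t, 0.0, t - b) for t in temps) / len(temps)


def total_violations(temps: Sequence[float], interval: Interval) -> int:
    """Number of measurements that fall outside the range [a, b]."""
    a, b = interval
    return sum(1 for t in temps if t < a or t > b)


def avg_energy_exchange(produced: Sequence[float], consumed: Sequence[float]) -> float:
    """Mean absolute difference between produced and consumed energy (assumed form)."""
    return sum(abs(p - c) for p, c in zip(produced, consumed)) / len(produced)


def average(values: Sequence[float]) -> float:
    """Plain mean, as used for the average power demand and average emissions."""
    return sum(values) / len(values)


# Made-up measurements for one zone and a comfort range of 19 to 24 degrees Celsius.
temps = [18.0, 19.5, 21.0, 24.5, 25.0]
comfort = (19.0, 24.0)

dev = avg_deviation(temps, comfort)        # (1.0 + 0 + 0 + 0.5 + 1.0) / 5 = 0.5
viol = total_violations(temps, comfort)    # three values lie outside the range
exchange = avg_energy_exchange([1.0, 2.0, 0.0], [0.5, 2.0, 1.5])
demand = average([400.0, 600.0, 500.0])
```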

Calibration Methodology
Distinct methodologies (see Table 1) for calibration have been used, depending on whether the entire building or just the technical systems were calibrated with real data sets. After calibration and validation with standard metrics (see e.g., Section 4.1.2), the model responses to control actions were further tested independently by team members to ensure that physical expectations were met (setpoint responses, energy consumption patterns by system activation, etc.).

Building Calibration
The Offices, MixedUse and Seminarcenter buildings have been calibrated using the three-step methodology presented in [24]. A short overview of the method is given below.

Method
In the first step, data are collected from the test sites at a 15 min sampling rate. Collected features include weather parameters (outside temperature, ground temperature, relative humidity, irradiance, atmospheric pressure, wind speed), indoor climate (room temperatures and relative humidity), as well as technical equipment parameters (water temperature and flow, on/off status) and electric consumption disaggregated by sources. Standard data pre-processing techniques are applied to the collected data to improve their usability, namely: gap reconstruction (via interpolation), removal of sensor malfunction periods, and on/off status reconstruction for technical systems for which this signal was not made directly available.
In a second step, the buildings are modelled and their envelope is calibrated using the collected data. The calibration uses free oscillation data, i.e., periods where the HVAC equipment is off, to eliminate HVAC interference. As described in depth in [24], envelope calibration is realized through a parametric analysis, a sensitivity analysis and a genetic algorithm simulation (using the NSGA-II genetic algorithm; see [25]) guided by an appropriate objective function (based on normalized root mean square error (NRMSE), determination coefficient, and normalized mean bias error (NMBE)).
In a third step, HVAC and technical equipment are introduced and configured to supply the building demand. A detailed HVAC model is added to the previously calibrated envelope model and simulated in EnergyPlus. Known HVAC equipment parameters are set to technical specification values, different performance curves are determined for each of the components, while unknown parameters are set either based on technical information of similar equipment, or calculated based on test site data. This detailed model undergoes a new calibration process similar to the one used for the envelope, i.e., a parametric/sensitivity analysis followed by a genetic algorithm simulation guided by a new objective function. This calibration is performed until the simulation model uncertainty indices are acceptable within the expected KPIs; see Section 4.1.2.

Model Evaluation
Three metrics were used to assess the quality of building models: the NMBE, the coefficient of variation of the RMSE (CVRMSE) and the coefficient of determination $R^2$; see e.g., [26]. Model acceptance is based on the threshold values recommended by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) and the International Performance Measurement and Verification Protocol (IPMVP); see e.g., [26].
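For readers unfamiliar with these metrics, a minimal sketch of their computation is given below. It follows one commonly used formulation, with p denoting the number of adjustable model parameters in the degrees-of-freedom correction; the sign convention of the NMBE and the value of p vary between guideline editions, so both are assumptions here, as are the function names and the sample data.

```python
import math
from typing import Sequence


def nmbe(measured: Sequence[float], simulated: Sequence[float], p: int = 1) -> float:
    """Normalized mean bias error in percent (measured minus simulated)."""
    n = len(measured)
    mean_m = sum(measured) / n
    return 100.0 * sum(m - s for m, s in zip(measured, simulated)) / ((n - p) * mean_m)


def cvrmse(measured: Sequence[float], simulated: Sequence[float], p: int = 1) -> float:
    """Coefficient of variation of the RMSE in percent."""
    n = len(measured)
    mean_m = sum(measured) / n
    sse = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return 100.0 * math.sqrt(sse / (n - p)) / mean_m


def r_squared(measured: Sequence[float], simulated: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Made-up measured and simulated energy values.
measured = [10.0, 12.0, 14.0, 16.0]
simulated = [11.0, 12.0, 13.0, 16.0]

bias = nmbe(measured, simulated)      # errors cancel out, so the bias is zero
spread = cvrmse(measured, simulated)  # about 6.28 %
fit = r_squared(measured, simulated)  # 1 - 2/20 = 0.9
```

The example shows why the NMBE alone is not sufficient: compensating errors yield zero bias, while the CVRMSE still reports the scatter.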

Example: The MixedUse Building
Calibration of the envelope parameters for the MixedUse building has been performed with free oscillation data as described in Section 4.1.1. The HVAC system of the MixedUse building is made of two independent main technical systems: a Variable Refrigerant Volume Unit (VRV) and an air-to-water HP. The initial performance curves of these two systems have been fitted with linear estimations to reproduce the suppliers' technical documentation with the performance equations of the corresponding EnergyPlus objects. For the MixedUse VRV system a total of 14 different curves were required, from cooling and heating capacity for low/high outdoor conditions including its boundary curves to piping length correction and defrosting; see Figures A5 and A6 in Appendix C.1.
Key parameters were then estimated in the next step (nominal power, design airflow, design supply temperature, efficiency, ...) via the optimization process with the NSGA-II genetic algorithm in order to find the combination of parameter values that results in the best fit of energy consumption while maintaining the indoor climate of the building. The results for the VRV system are displayed in Table A7 and the results for the HP system in Table A8 in Appendix C.1.
Finally, the evaluation period took place during Summer 2020, between the months of June and August (i.e., on data not used for identification). Results are displayed in Table 3. It should be noted that during the evaluation period the HP underwent a series of malfunctions and had to be repaired. This is why evaluation results for the HP are not presented here.

Heat Pump Calibration
For the Apartments building, heating is covered by means of a centralized water-to-water geothermal HP system that provides hot water for the indoor fan coil units and the Domestic Hot Water (DHW) tanks. This HP model has been calibrated using real data from a test bench facility installed at the IREC laboratory. The method used is based on the work presented in [27] for HP identification and the water-to-water HP model developed in [28]. Equations (6) and (7) from [28] represent the fitting of the heating thermal power and of the electric power consumed by the HP. The experimental data used to fit the equations have been obtained by operating the HP at full load in heating mode (control of return temperature to the condenser of the HP).
A constant volumetric flow rate was used in the experiment, as the pump was operating with a constant flow; hence $\dot{V}_L = \dot{V}_{L,ref}$ and $\dot{V}_S = \dot{V}_{S,ref}$. Coefficients were fitted to the data using ordinary least squares. The $Q_h$ calculation residuals range from 0.01% to 3.94% of the $Q_h$ value (max. deviation of 1.45 kW for a nominal heating power of 36.8 kW) and are displayed in Figure 1a. The corresponding $R^2$ value for the heating power fitting is 0.985. The $P_h$ residuals range from 0.02% to 4.78% of the electric power consumption value (max. deviation of 0.29 kW for an electrical power consumption of 6.24 kW) and are depicted in Figure 1b. The $R^2$ value for the electric power fitting is 0.988.
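The least-squares fit can be sketched as follows. The sketch assumes the equation-fit form commonly used for water-to-water heat pumps (for instance in EnergyPlus), in which the normalized heating power is a linear function of the normalized load-side and source-side inlet temperatures and of the two flow ratios. With constant flow rates, both flow ratios equal one, so their coefficients merge into the constant term and three coefficients remain. The operating points, the reference temperature value and the coefficients below are synthetic and hypothetical; they are not the test bench data.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


T_REF = 283.15  # reference temperature in K (assumed value)

# Synthetic operating points: (load-side inlet, source-side inlet) in K.
points = [(303.15, 278.15), (308.15, 283.15), (313.15, 285.15),
          (318.15, 281.15), (305.15, 288.15)]
true_coeffs = [-1.5, -2.0, 4.5]  # made-up coefficients used to generate the data

# Regressors: constant (absorbing the flow terms) and two normalized temperatures.
X = [[1.0, t_load / T_REF, t_source / T_REF] for t_load, t_source in points]
q_ratio = [sum(c * x for c, x in zip(true_coeffs, row)) for row in X]

fitted = ols(X, q_ratio)  # recovers true_coeffs up to rounding error
```

The same procedure applies to the electric power by replacing `q_ratio` with the normalized power measurements.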

Modelica Models
Modelica models are developed with components from the LBNL Modelica Buildings Library [29]. While one of the strengths of EnergyPlus is the ease with which large and realistic envelope models can be built, Modelica models with large and complex envelopes are harder to design. The strength of Modelica is instead the realism and flexibility it offers for modeling and controlling technical systems like HPs, storage tanks, and AHUs. This is why the currently included Modelica models come with very simple envelopes but complex and realistic technical systems, the opposite case being covered by the available EnergyPlus models. Future Modelica models with more complex envelopes are in preparation and will be integrated into the library. The authors also do not exclude incorporating models that use Modelica for the control systems and EnergyPlus for the envelopes.
The envelope model used for SimpleHouse and SwissHouse is a simple first order model calibrated with the thermal peak power, minimum outdoor temperatures and the building free oscillation time constant. The indoor temperature is averaged over the house geometry and modeled by a scalar T(t) obeying Equation (8).
Assuming equilibrium at very cold temperatures, the thermal conductance G [W/K] is deduced by setting the left-hand side equal to zero in these conditions. G is then inferred to be equal to the ratio of the thermal peak power over the indoor-outdoor temperature difference for the four coldest consecutive days over the last 20 years (see SIA norm CT 2028:2010 [30]). Knowing the thermal conductance G and inferring the time constant τ of the envelope from the available building data, we derive the heat capacity C [J/K] = Gτ.
For SwissHouse models, these G and C correspond to an overall U-value of 0.5 W/(m² K) and to a heat capacity per surface unit of 0.
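The envelope calibration described in this section can be illustrated with a short numerical sketch. It assumes a first-order model of the form C dT/dt = G (T_out − T) + Q, where Q is the heating power; this form is consistent with the description above but is an assumption, as are the function names and all numerical values (peak power, design temperatures, time constant).

```python
from typing import List


def conductance(peak_power_w: float, t_indoor_c: float, t_out_design_c: float) -> float:
    """G [W/K]: at equilibrium, peak heating power balances the envelope losses."""
    return peak_power_w / (t_indoor_c - t_out_design_c)


def heat_capacity(g_w_per_k: float, tau_s: float) -> float:
    """C [J/K] = G * tau."""
    return g_w_per_k * tau_s


def simulate(t0: float, t_out: float, q_heat_w: float, g: float, c: float,
             dt_s: float, steps: int) -> List[float]:
    """Explicit Euler integration of C dT/dt = G (T_out - T) + Q."""
    temps = [t0]
    for _ in range(steps):
        t = temps[-1]
        temps.append(t + dt_s * (g * (t_out - t) + q_heat_w) / c)
    return temps


# Made-up design data: 6 kW peak power, 20 C indoors, -10 C design outdoor temperature.
G = conductance(peak_power_w=6000.0, t_indoor_c=20.0, t_out_design_c=-10.0)
C = heat_capacity(G, tau_s=50 * 3600.0)  # assumed time constant of 50 h

# Free oscillation (heating off) and operation at peak power, over one time constant.
free = simulate(t0=20.0, t_out=-10.0, q_heat_w=0.0, g=G, c=C, dt_s=900.0, steps=200)
held = simulate(t0=20.0, t_out=-10.0, q_heat_w=6000.0, g=G, c=C, dt_s=900.0, steps=200)
```

With these numbers, G is 200 W/K and C is 3.6e7 J/K. The heated case stays at 20 °C, which is the equilibrium condition used to deduce G, while the free-oscillation case decays towards the outdoor temperature with time constant τ.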

Conclusions and Future Directions
The library Energym presented in this paper aims at providing building models and standardized evaluation scenarios and metrics to develop, test, and benchmark controllers. With diverse equipment configurations calibrated with real measurement data, Energym has been designed to ease the development and deployment of "swiss knife" data-driven controllers for buildings. The used calibration methods and results have been outlined.
Two main axes of research are foreseen for future works: The extension of the library itself and the development of data-driven control methods tested on the library.
For the former, Energym is intended to grow by gaining new building models. We are currently developing new models (both EnergyPlus and Modelica models) that will be incorporated into the library in the near future. Moreover, by releasing Energym as open source, we encourage and welcome model contributions from the building simulation community (to add a new model, please contact the authors).
For the latter, MPC and RL-based control strategies will be extensively tested on many buildings of the library to showcase its benchmarking capabilities. Furthermore, since running models in parallel is possible with Energym, we aim to investigate scenarios with multiple models in a district setting and related control problems. Finally, an additional goal of Energym is to increase the engagement of the Machine Learning community, in particular the RL community, with problems related to reducing energy consumption and CO2 emissions.

Acknowledgments:
We thank Stephan Dasen for his strong involvement in the models testing phase, as well as Insero for the work done on the Danish site.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Code Examples
Appendix A.1. Basic Usage Example
A simple example of the usage of the library is given below. It demonstrates the interaction with the simulation model for 100 timesteps, assuming a function get_input() has been implemented, that computes the control input for the current measured state of the model and a forecast for the next 10 timesteps. The chosen parameters are arbitrary and just fulfill demonstrative purposes.
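The interaction loop described here can be sketched as follows. So that the sketch runs without Energym installed, a stand-in class replaces the simulation model; with the library, the model would instead be created with its make method. `StubModel`, its toy dynamics, the argument name `forecast_length`, the forecast key `T_out` and the rule inside `get_input` are hypothetical; only the method names step, get_forecast and get_kpi and the input dictionary format follow the description in the text.

```python
from typing import Dict, List


class StubModel:
    """Hypothetical stand-in exposing step, get_forecast and get_kpi."""

    def __init__(self) -> None:
        self.temp = 18.0
        self.t_out = [5.0 + 0.1 * k for k in range(200)]  # made-up outdoor temperatures
        self.k = 0
        self.temps: List[float] = []

    def step(self, inputs: Dict[str, List[float]]) -> Dict[str, float]:
        setpoint = inputs["Z01_T_Thermostat_sp"][0]
        # Toy dynamics: pull towards the setpoint, small loss to the outside.
        self.temp += 0.3 * (setpoint - self.temp) + 0.01 * (self.t_out[self.k] - self.temp)
        self.k += 1
        self.temps.append(self.temp)
        return {"Z01_T": self.temp}

    def get_forecast(self, forecast_length: int) -> Dict[str, List[float]]:
        return {"T_out": self.t_out[self.k:self.k + forecast_length]}

    def get_kpi(self) -> Dict[str, int]:
        violations = sum(1 for t in self.temps if not 19.0 <= t <= 24.0)
        return {"tot_viol": violations, "steps": len(self.temps)}


def get_input(outputs: Dict[str, float], forecast: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Toy rule: use a higher setpoint when cold weather is forecast."""
    setpoint = 22.0 if min(forecast["T_out"]) < 10.0 else 21.0
    return {"Z01_T_Thermostat_sp": [setpoint]}


env = StubModel()
outputs = {"Z01_T": env.temp}  # initial measurement
for _ in range(100):
    forecast = env.get_forecast(forecast_length=10)  # next 10 timesteps
    inputs = get_input(outputs, forecast)
    outputs = env.step(inputs)
kpis = env.get_kpi()
```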

Appendix A.2. KPI Example
Default KPIs are defined for each model, but the user can also define custom KPIs to be tracked. This is done by specifying a Python dictionary containing the information of the variables of interest and KPI computation method. An example dictionary for the KPIs looks as follows.

    kpi_dict = {
        "kpi1": {"name": "Fa_Pw_All", "type": "avg"},
        "kpi2": {"name": "Z01_T", "type": "tot_viol", "target": [19, 24]},
    }

For more information on the KPI implementation, we refer to the documentation.
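To make the meaning of such a dictionary concrete, the sketch below evaluates it on recorded data. The function `evaluate` and the recorded values are hypothetical and do not reproduce the library's KPI class; the sketch only assumes that "avg" denotes a plain average and that "tot_viol" counts values outside the target range.

```python
from typing import Dict, List

kpi_dict = {
    "kpi1": {"name": "Fa_Pw_All", "type": "avg"},
    "kpi2": {"name": "Z01_T", "type": "tot_viol", "target": [19, 24]},
}


def evaluate(kpis: Dict[str, dict], records: Dict[str, List[float]]) -> Dict[str, float]:
    """Compute each KPI from the recorded values of its variable."""
    results: Dict[str, float] = {}
    for key, spec in kpis.items():
        values = records[spec["name"]]
        if spec["type"] == "avg":
            results[key] = sum(values) / len(values)
        elif spec["type"] == "tot_viol":
            low, high = spec["target"]
            results[key] = sum(1 for v in values if v < low or v > high)
        else:
            raise ValueError(f"unsupported KPI type: {spec['type']}")
    return results


# Made-up recorded outputs: total power in W and one zone temperature in degrees Celsius.
records = {"Fa_Pw_All": [1000.0, 1500.0, 500.0], "Z01_T": [18.5, 20.0, 24.5]}
results = evaluate(kpi_dict, records)
```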

Appendix B. Building Descriptions
In this part, we give a short description of the buildings and of the inputs and outputs of the simulation models that are related to the KPIs (other outputs not entering the KPI calculation, like flow rate and flow temperature, are not listed). The common output variables for all EnergyPlus-based models are given in Table A1. Complete input/output references and in-depth explanations of the buildings can be found in the online documentation. The bounds given in the tables are not used to cut off values (unless the specific cut-off wrapper is used), but are used by default by the input/output scaling wrappers to scale the signals to values close to or within the [0,1] interval.

Appendix B.1. Apartments and Apartments2 Buildings
The envelope is the same for both Apartments and Apartments2 buildings; see Figure A1. The envelope is made of building elements used in Spain in the period from 1991 to 2007. The building model consists of four identical apartments, each split into two thermal zones (Figure A1). The active surface area of the PV panels is 58 m², with an inclination of 40° and a south orientation. The PV EnergyPlus component has a rated electric power output of 10.75 kW and the inverter efficiency is 0.95. In addition, occupancy, appliance and lighting consumption follow stochastic profiles that differentiate the behavior of each dwelling, resulting in different energy demands. The DHW profiles are based on the European standard (EN16147, 2011). The difference between Apartments and Apartments2 lies in their thermal systems. Apartments has a central geothermal HP, directly connected to hot water tanks (one per apartment) used only for DHW consumption, and to a heating loop providing heat to the entire building. Apartments2 does not have this central heating system, but possesses four storage tanks (supplying heating and DHW to each apartment), each fed by a dedicated air-to-water HP.
Both buildings possess a stationary battery with a capacity of 10 kWh and a maximum charging and discharging power of 4 kW. In Apartments, there is one electric vehicle (EV) with a capacity of 20 kWh and a maximum charging power of 3.7 kW. In Apartments2, two EVs with the same characteristics are present. Usage schedules are stochastic and forecasts are provided via the forecast API.
The evaluation weather file used for Apartments and Apartments2 is given by the identifier ESP_CT_Barcelona_ElPratAP1 and should not be used in the training process. Control inputs and the most relevant outputs for the Apartments and Apartments2 models are listed in Table A2.

Appendix B.2. Offices Building
The Offices building is located in Greece and includes 25 conditioned rooms with a total area of 643.73 m² (see Figure A2). Of those 25 rooms, 14 are controllable with thermostats (2 storage rooms, 2 lobbies, 4 seminar rooms, 1 meeting room, and 5 offices). Water-to-air fan coil units are used to condition the spaces, with hot water provided by an oil boiler and chilled water by an electrical air-to-water chiller. The evaluation weather file for the Offices building is given by the identifier GRC_TC_Lamia1 and should not be used in the training process. The inputs and some outputs are described in Table A3.

Appendix B.3. MixedUse Building
The MixedUse building is a 566.38 m² building located in Greece with 13 thermal zones, of which eight are controllable with thermostats (see Figure A3). The installed HVAC system consists of two AHUs, one dedicated exclusively to thermal zones 5, 6 and 7 and a second one serving the remaining thermal zones.
The first system, dedicated to TZ-5, 6 and 7, is composed of an air loop, an AHU that includes water coils, and two supply water loops: one with a Heat Pump Water Heater (HPWH) and the other with a chiller for cooling.
The second system, serving the entire facility, consists of an air loop with an AHU that has direct expansion (DX) coils. In addition, the zones served by this system have variable refrigerant flow (VRF) terminal units as part of the air-conditioning system. The evaluation weather file for the MixedUse building is given by the identifier GRC_TC_Lamia1 and should not be used in the training process. The control inputs and KPI-related outputs are displayed in Table A4.

Appendix B.4. Seminarcenter Building
The Seminarcenter building is a one-story building located in Denmark and includes 22 conditioned rooms on 1278.94 m² (see Figure A4). Five of the 22 rooms are divided into two thermal zones and 18 rooms are controllable with thermostats. Heating of the rooms is provided by water convectors with hot water from a buffer tank. For the buffer tank and the DHW, air-to-water HPs are used to supply the heating demand and an additional gas boiler is available in case the HPs cannot provide enough heating.
The evaluation weather file for the Seminarcenter buildings is given by the identifier DNK_MJ_Horsens2 and should not be used in the training process. The control inputs to both simulation models and some outputs are described in Table A5.

Appendix B.5. SimpleHouse and SwissHouse
SimpleHouse and SwissHouse represent two residential one-family houses. The entire house is modeled with a single thermal zone in both cases. A Carnot heat pump is connected to the room model and provides heat via radiator for the rad models (SimpleHouseRad-v0 and SwissHouseRad-v0), or via floor heating for the slab model.
The evaluation weather file for the SimpleHouse and SwissHouse buildings is given by the identifier CH_ZH_Maur and should not be used in the training process. The control inputs to both simulation models and some outputs are described in Table A6.

For cooling mode, the obtained VRV capacity curves have a CV(RMSE) of 0.05% (Low), 0.10% (High) and 0.09% (Boundary), with an $R^2$ above 99% for the three cases. The electric input curves have a CV(RMSE) of 3.59% (Low), 3.06% (High) and 0.07% (Boundary), with an $R^2$ of 91%, 96% and 99%, respectively. For heating mode, the equipment capacity curves have a CV(RMSE) of 0.35% (Low), 1.07% (High) and 0.01% (Boundary), with an $R^2$ above 99% for the three cases. The CV(RMSE) of the electric input curves is 0.41% (Low), 0.75% (High) and 0.01% (Boundary), with an $R^2$ above 99% for the three cases.
For cooling, the calculated capacity curve of the MixedUse HP unit has a CV(RMSE) of 0.93% with an $R^2$ of 99.4%, while its electric input has a CV(RMSE) of 7.07% with an $R^2$ of 94.5%. For heating, the calculated curve has a CV(RMSE) of 0.15% with an $R^2$ above 99%. Its COP curve has a CV(RMSE) of 0.19% with an $R^2$ above 99%; see Figure A6.

Figure A6. Heat pump performance curve comparison, technical specification displayed in black and calculated "z" value in red for heating, and blue for cooling.