Simulation-Based Evaluation and Optimization of Control Strategies in Buildings

Over the last several years, a great amount of research work has been focused on the development of model predictive control techniques for the indoor climate control of buildings, but, despite the promising results, this technology is still not adopted by the industry. One of the main reasons for this is the increased cost associated with the development and calibration (or identification) of mathematical models of special structure used for predicting future states of the building. We propose a methodology to overcome this obstacle by replacing these hand-engineered mathematical models with a thermal simulation model of the building developed using detailed thermal simulation engines such as EnergyPlus. As designing better controllers requires interacting with the simulation model, a central part of our methodology is the control improvement (or optimisation) module, facilitating two simulation-based control improvement methodologies: one based in multi-criteria decision analysis methods and the other based on state-space identification of dynamical systems using Gaussian process models and reinforcement learning. We evaluate the proposed methodology in a set of simulation-based experiments using the thermal simulation model of a real building located in Portugal. Our results indicate that the proposed methodology could be a viable alternative to model predictive control-based supervisory control in buildings.


Introduction
Building Management Systems (BMSs) provide monitoring and control capability for multiple sub-systems including Heating, Ventilation and Air Conditioning (HVAC). Modern BMSs additionally offer some energy management functionality based on a set of low-level controllers, such as thermostats. Up to 20% of the energy used for HVAC and lighting system operation is lost due to poor configuration of such controllers or due to equipment faults [1]. Even in the case of good controller configurations, it is very hard to encapsulate efficient control strategies within a static set of simple rules, especially in large buildings with complex energy or renewable systems installed; here, all generation, distribution and room-level systems need to operate in a coordinated manner and be able to adapt to stochastic generation patterns that strongly depend on weather conditions [2].
Recently, this manual configuration and control tuning process has been replaced by a methodology developed following two axes [2,3]: • Utilise predictions instead of historical data: Here, instead of gathering and analysing historical data from the building to improve the efficiency of specific controllers, forecasts (e.g., on weather, occupancy, equipment gains, etc.) are utilised to determine the (near-)future state (e.g., thermal/visual comfort conditions) of the building. This can enable for a continuous adaptation of the control logic to the (predicted) needs of the building and the microclimatic conditions of each site. • Automated controller tuning: Here, an optimisation process is defined (utilising the available predictions) to design efficient controllers in an automated and laborious-free manner. This approach has been made possible by two key technologies: (i) the application of Model Predictive Control (MPC) paradigm in buildings (e.g., see [2][3][4][5][6][7][8][9][10][11][12][13][14]); and (ii) the development of scalable BMS, e.g., the Johnson's Controls METASYS R , which provide a remote (cloud-based) layer for hosting computationally demanding services, such as supervisory-level control design.
In MPC application in buildings, typically, we assume that a model of the controlled process (i.e., the real building in our case) as well as (weather, occupancy, etc.) forecasts for a predefined time window are available. This way, a constrained optimisation problem can be formulated and solved for a predefined period in the future, generating a control signal (or control strategy) which is optimised for the system on the basis of a defined cost function and a set of (visual, thermal, air-quality, etc.) comfort constraints. The quality of the final control strategy is evaluated under a set of Key Performance Indicators (KPIs). A high-level schematic of the overall approach is shown in Figure 1.
Despite the fact that MPC for building energy management and indoor climate regulation has been an active research topic, a plethora of research work has been published in the field and impressive results in real-world and simulated applications have been reported in relevant projects and publications, it has really not found its way into the market. Very few companies include some simple aspects of MPC in their offerings [15,16] and these implementations are far from being fully-fledged MPC solutions. This hesitation of companies to include MPC in their product lines cannot be only attributed to conservatism on the side of the industry; some of the mechanics of MPC violate specific needs of building owners and operators.
One of the biggest obstacles towards market adoption is that MPC requires a model of the building [2], usually of a very specific linear or bi-linear structure [3,[17][18][19][20][21]. In addition, in holistic use cases, all involved generation and distribution systems need to be modelled too [21,22]-and there have been only few notable efforts for model standardisation (e.g., see [23,24]).
Model development induces the largest monetary costs in MPC applications [6], but what has been rather overlooked is that there are several other practical steps towards deploying MPC in a real building that also yield increased costs: 1. installing additional sensors/meters required for accurate state estimation of the building to minimise the model/reality mismatch; 2. configuration of the overall MPC solution (including model calibration) requires significant amount of time and the involvement of high-qualified engineers; and 3. monitoring and solution debugging times of MPC solutions are significantly higher compared to the deployment of traditional knowledge-based controllers.
In several cases, the achieved cost savings might not be high enough to achieve a favourable Return of Investment (ROI) [6].
In addition, as analysed in [2,25], there is a cost (e.g., express as productivity loss [26,27]) associated with the inability of most MPC-oriented bi-linear models to accurately estimate occupant thermal sensation, e.g., under the guidelines of ISO 7730 [28] or ASHRAE 55 [29], as most MPC applications in buildings define room thermal comfort as a pre-defined region of air [30] or operative [3] temperatures with only few exceptions [31][32][33][34]. This limitation becomes more restricting as personalised thermal comfort models [35][36][37][38][39] have increasingly started becoming the norm. In an effort to overcome these limitations, we propose a methodology that utilises Building Energy Performance (BEP) simulation models, developed using BEP simulation engines such as EnergyPlus [40] or Modelica [41] in place of the purpose-built models for MPC. A standardised automatic creation of BEP simulation models could expedite the BEP simulation modelling process, making it more cost-effective and less vulnerable to modelling errors. To this direction, the automated data transformation from Building Information Models (BIM) data to input data of thermal simulation tools has received considerable attention recently [42][43][44][45][46][47]. Concerning the BIMs availability, regulatory requirements and technological advances in the field of building information modelling are resulting in increased adoption and availability of BIM data. At the same time, BIM-based workflows are increasingly adopted in professional practice, which has resulted on comprehensive and usable BIM models' availability. Hence, adoption of such an automated BEP simulation models' generation from BIM data is not far into the future (see [48] and the references therein for a thorough review and algorithmic implementations). Even though other model-assisted control design approaches have also used these types of models, the novelty of our work lies in the methods we define for generating optimised control strategies through interacting with the model: (i) simulation-based evaluation using multi-criteria decision analysis methods; and (ii) simulation-based optimisation using Gaussian process state space models. Our proposed approach combines ideas from multi-criteria decision theory and both the data-driven control [49][50][51] and reinforcement learning [52][53][54] domains and is evaluated as a proof-of-concept utilising the simulation model of a real building located in Portugal.
The remainder of this paper is structured as follows: Section 2 provides an overview of the MPC methodology in buildings; Section 3 describes the proposed methodology; Section 4 presents the experimental setup and the main results; and Section 5 presents the final conclusions and concrete next steps towards future work.

Model Predictive Control in Buildings
A more formal definition of the MPC formulation can be seen in the following Equation [3,4]: subject to: Here, we have the following: • t is the discrete time-step (with its length depending on the application), T is the prediction horizon.
x t is a vector of values for each time-step t for the states of the system (e.g., wall and air temperatures [3]).

•
The admissible control actions u t are a vector of values for each time-step t, with each value corresponding to a control action (e.g., temperature set-point, hot water flow, etc.) for all controllable systems of the building. • w t is a vector of exogenous, uncontrollable factors affecting the system, like weather conditions or user actions. As discussed previously, we have some (reliable) predictionsŵ t for these disturbances. • U ⊆ R m and X ⊆ R n are closed sets defining the admissible controls and states respectively, while W ⊆ R v is the closed set of exogenous uncorrelated factors. • f : R (n+v) × R m → R n is a (possibly non-linear) function that describes the system dynamics. • g t ∈ R is the value of the cost (or objective) function to be minimised at time t and represents a performance measure, estimating the efficiency of a given control strategy. In the buildings domain, this index can either be economical (e.g., minimise operational cost) or environmental (e.g., maximise the net energy produced or minimise CO 2 emissions). • C(x t ) ≤ 0 is a function usually describing thermal comfort constraints for the building occupants.
To account for modelling/prediction errors, only the first control action is applied to the building and the whole process re-initiates.
Starting from this generic definition, we can distinguish different categories of MPC algorithms for buildings, depending on the type of model used and the associated optimisation approach for solving the defined constrained optimisation problem. In the remainder of this section, we provide a high-level overview of these categories, but a detailed analysis is beyond the scope of this paper. We refer the interested reader to [2,5] and the references therein for a thorough analysis.
Physics-based models These models are designed based on first principles, utilising knowledge of the underlying physical process of the building and are usually analogous to electrical Resistance-Capacitance (RC) networks [3,6,[17][18][19][20][21]. In an effort to reduce the model design-and calibration-associated costs for these models, a number of MPC approaches based on system identification methods [18,50,51,[55][56][57][58] have been proposed. The problem here is that some initial (quasi-)random actions need to be performed to the real building (called persistent excitation signals in system identification theory) [51,56,59,60] to satisfy key theoretical assumptions on reliable statistical identification [55,61]. In real operation, this requirement is almost always infeasible without compromising the comfort or the health of the occupants (note that this is a problem closely related to the safe exploration/exploitation dilemma that has been extensively studied in reinforcement learning [52,53,62,63]). It is conceivable that a poorly identified model can lead to severe compromise of occupant comfort conditions; the model will provide inaccurate predictions that will lead to unsuitable controllers [64].
Data-driven models Here, a linear or non-linear regression model is fitted using available data from the real building and then utilised for the MPC application. Several types of regression functions have been reported in the literature, e.g. neural networks, support vector regression, autoregressive moving average, regression trees, etc. (see [5,65] for an overview). These models have the same data requirements as the system identification methods described before while they also suffer from poor extrapolation properties.
BEP simulation models When these models are utilised for control design, the main concern is the optimisation algorithm to be used, as they are characterised by high simulation times. A common approach is to assume that the simulation model is an exact model of the real building and construct a surrogate regression model off-line; this model is orders of magnitude faster compared to the original model so it could be used for MPC-like on-line control design [55,66]. From our experience with real-building control experiments [2,4,67], the assumption that the original simulation model (and thus also the surrogate model derived from it) can actually predict all states of the real building, under all different combinations of weather conditions and occupant actions, usually does not hold in practice. Instead, what is required is a continuous calibration process, for example as defined in the digital twin paradigm [68].
A second approach that has been reported to address the sample complexity problem is to directly replace the (bi-)linear model required in classical MPC with the detailed thermal simulation model. Early work here [69,70] has used global constrained optimisation approaches (such as genetic algorithms), but as the number of required simulation calls is quite high (and would grow exponentially with the number of posed comfort constraints [70]), these algorithms would scale poorly in realistic building control scenarios.
More recent work in the same direction [2,4,67,[71][72][73][74] uses the available simulation model to construct a simple meta-model of the building online, which is able to predict the behavior of the building only for a limited prediction horizon. This meta-model is used for the optimisation of the control strategy, in an effort to reduce the number of simulation calls to the "expensive" simulation model. The problem with most approaches here stems from the multi-objective formulation of the optimisation algorithm [2,4,71], in which the objective function to be optimised is a properly weighted combination of a cost function (e.g., the total energy consumption) and a term that expresses the thermal discomfort for the entire building (in most cases, average discomfort values for all offices are calculated, in contrast to normative guidelines [28,75]). In large-scale real-world setups, hand-tuning a proper weighting between the active constraints being generic enough to capture all the real-world variations (e.g., varying daily weather conditions, occupant actions, etc.) is a formidable task. For example, in an office building with 30 offices, where thermal and visual comfort constraints need to be met, minimising the energy consumption of the building while giving priority to thermal over visual comfort constraints requires: (i) proper normalisation between the energy consumption, the thermal comfort and visual comfort constraints values so no term is dominating the combined cost function [76]; and (ii) adjusting the relative importance of visual over thermal comfort constraints Finally, constrained Bayesian optimisation has been used to address these shortcomings [2], but, even though this approach resulted in a successful real-world application, Bayesian optimisation does not scale well in problems with a high number of optimisation parameters [77].

The Proposed Approach
Taking into account all the above, in the present work, we define a methodology that builds upon some of the ideas that enable the great potential of MPC, but at the same time satisfies the requirements of building owners on reduced cost and more transparent control decisions [78]. To achieve this, in the heart of our methodology lies a set of Building Energy Performance (BEP) simulation models, developed to use BEP simulation engines such as EnergyPlus or Modelica, instead of using purpose-built models for every study building.
Here, we utilise the simulation model in an on-line fashion, but we deviate from previous work by developing and evaluating two simulation-based control improvement/optimisation methods (analysed in the remainder of this section): (i) simulation-based evaluation using multi-criteria decision analysis methods; and (ii) simulation-based optimisation using Gaussian process state space models. The high-level view of the overall methodology-which is similar to the classical MPC work flow-is shown in Figure 2, where we have the following steps: 1. Design Phase: • The Control Improvement module evaluates candidate control strategies utilising the simulation model under a set of pre-defined objectives (e.g., energy consumption, thermal comfort in each controller space, etc).

•
When the Design Phase finishes, the best resulting strategy is communicated and deployed to the real building.

Application Phase:
• The deployed controllers are used to generate new control actions in each control time-step for each controllable system of the building (possibly also utilising in-building and weather sensor measurements).

Simulation-Based Evaluation Using Multi-Criteria Decision Analysis Methods
One simple way of designing controllers is to adapt the methodology used in [79] for selecting the best voyage plans in ships in the building control domain. In this work, a module generates a set of candidate voyage plans from port A to port B and a Multi-Criteria Decision Making (MCDA) algorithm [80] selects the best route according to a pre-defined set of criteria.
Equivalently, the building management team could provide a list with pre-defined control strategies (e.g., more or less pre-cooling hours, different room setpoint schedules, different heating curves for the boiler operation, etc.) along with a list of objectives (called criteria in MCDA literature) to be optimised (e.g., total energy consumption, monetary cost, thermal and visual comfort, etc.). In addition, for each objective/criterion, a "preference" value (e.g., a number between 0 and 10) needs to be provided, expressing the relative "importance" of each criterion. Finally, each control strategy is evaluated on the available simulation model and based on the different values for each criterion an MCDA algorithm returns a ranking for each control strategy and the highest-ranking control is applied to the building. The overall methodology is shown in Figure 3.
The main advantage of MCDA methodologies is that in problems where multiple criteria have to be taken into account, the solution they provide is not solely a mathematical outcome, but also reflects the preferences and priorities of the decision makers, as indicated by their preferences. Following the work of [79], we have selected the PROMETHEE (Preference Ranking Organisation Method for Enrichment Evaluation) II methodology [81][82][83] as the main MCDA algorithm, which is a widely-used method based on pairwise comparisons of the different scenarios (control strategies in our case) to be evaluated.
To apply PROMETHEE II one has to define the set of alternative control strategies to be ranked, the set of objectives/criteria to be used for the ranking, as well as weights reflecting the preferences and priorities of the building management team with respect to the criteria. Based on these, a multi-criteria table is created, which includes the evaluation of each alternative control strategy against each considered criterion and a preference function is defined for each criterion that reflects the preference of an action from another according to this criterion. In PROMETHEE II, there are six preference functions for pairwise comparison of alternative scenarios; here, we use the Gaussian criterion since it is more robust and requires less fine-tuning (see [79] for more details): Here, p is the evaluation criterion, P p (S i , S j ) is the preference of control strategy S i over an alternative control strategy S j on the specific criterion, c p (S i ) and c p (S j ) are the evaluations of the alternatives S i and S j for the given criterion and σ p is a threshold that define the shape of the corresponding preference function.
Subsequently, an aggregated preference index pr(S i , S j ), expressing the overall preference of the one alternative over the other, is calculated for each pair of alternative actions (S i , S j ) as follows: with N the total number of considered criteria and w p the weight of each of them. This result is then used to define two additional indices for each alternative control strategy: • The positive outranking flow φ + (S i ) = ∑ S j pr(S i ,S j ) N−1 expresses the degree to which a control strategy S i is preferred over all the other control strategies S j .

•
The negative outranking flow φ − (S i ) = ∑ S j pr(S j ,S i ) N−1 expresses the degree to which all the other control strategies S j are preferred over this specific one.
, which reflects the final utility of the control strategy S i , and is used to rank the alternatives. Finally, the best control strategy (the one having the highest net outranking flow) is applied to the building.

Simulation-Based Optimisation Using Gaussian Process State Space Models
In contrast to the MCDA-based approach analysed in the previous section, here, we illustrate an optimisation-based solution to the control design problem. The main difference is that we no longer use static control strategies as input and rank them, but we design a new control strategy suitable for the building control problem at hand by interacting dynamically with the available simulation model.
As the simulation time of the building simulation model can be a significant bottleneck, the defined optimisation approach needs to be data efficient to facilitate a small number of simulation calls. This (usually) implies that we need to trade-off optimality of the solution for time-efficiency; the discovered locally optimal solution can still be significantly better compared to a set of static rules consisting of the baseline control in most buildings [2,4,67].
For our solution, we build upon recent advances in model-based reinforcement learning [53,62] and data-driven control [50,51,84]. These approaches use non-parametric models (Gaussian Processes (GPs) for regression [85]) to train a regression model that will mimic the dynamics of the controlled system (i.e., the building in our case). Then, an optimisation algorithm utilises the learned model and the controller discovered is applied in the real system. After each run in the real system, the available data are augmented with new data from the latest run and the model is re-trained. The main property of these methods is that they have exhibited unprecedented data efficiency compared to previous methods [62], which in our case implies that we can minimise the number of simulation calls to the model. GPs are non-parametric regression models [85,86] and as such they do not require an a priori fixed model structure (for example, the degree of polynomial regression functions). This property implies that the same regression model can be used "as-is" regardless of differences in the properties of the building or the HVAC system.
A GP (GP(m(x), k(x, x ))) is a collection of random variables, any finite number of which has a joint Gaussian distribution. As stated in [87], it is similar to a function, but instead of returning a scalar f (x) for every x, it returns the mean and variance of a normal distribution over the possible values of f at x.
If we have l training samples [(x 1 , y 1 ), . . . , (x l , y l )] ∈ D with x ∈ R n and y ∈ R and zero mean function, we can sample from the prior by sampling values from the multivariate normal distribution N (0, K) at x 1:l to generate pairs x 1:l , f(x 1:l ). The covariance matrix (or kernel) K is given by [87]: Now, if we want to predict the value of a new point x l+1 , then f(x 1:l ) and f (x l+1 ) are jointly Gaussian: Thus, the predictive distribution is the following: The choice of kernel determines the smoothness properties of the underlying function. Even though there is a plethora of kernels available [85] (and it is even possible to design new kernels by combining existing ones [88]), the most commonly used are the following: • The Squared Exponential (SE) covariance function, defined as: where d a hyperparameter that controls the width of the kernel.

•
The Matérn covariance functions, defined as follows: with ν, d > 0 the hyperparameters and Γ(ν) and H ν the Gamma and Bessel functions of order ν, respectively.
Using GPs, we can identify a dynamical model of the simulated building [84] and re-formulate the constrained optimisation setup of Equation (1) as follows: Here, we have the following: • t, u * , u t , U, X, W, g t and C(·) have the same definitions as in Equation (1). •ŵ t is a vector of available (weather, occupancy, etc.) predictions. •x 0 is provided by the simulator. • GP represents a set of GPs that provide an estimate of the statesx t+1 at time t + 1, taking into account the statesx t , exogenous predictionsŵ t and applied actions u t at time t. • s := [σ 1 , σ 2 , . . . , σ T ] is a vector containing the prediction variance in all time-steps t. • is a design parameter. By requiring s ∞ ≤ , we can implicitly allow for more or less exploration in the optimisation algorithm (i.e., favouring or not control actions that lead to "uncertain" states), simply by changing the value of .
To summarise, the simulation-based optimisation algorithm using GPs (called GP_SS in short in the remaining of the text) consists of the following steps: 1. An initial set of random simulations is performed in the detailed thermal simulation model. Note that, since we are operating in the simulation world, we can safely explore random or drastic actions with no effect to the real-building occupants. 2. A set of GPs is trained using the simulated state, action and disturbance data. 3. A constrained local optimisation algorithm is tasked to solve the constrained optimisation problem defined in Equation (9), utilising the GP models for evaluating the effect of different control parameters through rollouts, as shown in Figure 4. 4. The best parameters discovered are simulated in the "expensive" thermal simulation model and the GPs are re-trained in a new dataset augmented with the most recent simulation data. 5. The process re-iterates until convergence or a time-out occurs.

Closed-Loop Control Extension
We have to emphasise here that, following the work in [2,4], the optimised control strategies are not necessarily time-series of setpoints to be applied to the actuators; they can be optimised parameters of the knowledge-based controllers already deployed in the building. For example, if the supply water temperature of a radiant system is regulated using a heating curve with the external temperature as input, then the optimised controller of the proposed methodology would be a new heating curve, adapted, e.g., to the weather predictions of the following day. In this case, we would define a set of feedback parametric controllers in Equation (9) as follows: with θ ∈ R L the vector of control parameters to be optimised instead of u. This extension, apart from reducing the complexity of the problem (as usually the number of parameters would be much lower compared to the time-series controllers of the open loop solution), would also address a long standing requirement of building managers for more transparency and "explainability" of deployed control solutions [78]-a requirement that still cannot be fulfilled by any MPC-like control solution, despite some relevant research efforts in this direction [89][90][91].

The Example Building
As a running example, one of the MOEEBIUS project buildings was used: the Mafra City Hall in the northwest of Lisbon District (Portugal) shown in Figure 5. It is a four stories building (three over ground and one semi-exposed basement), which sums around 4800 m 2 , with its main façade west oriented, with large openings and a central atrium. It houses about 200 permanent staff and about 50 visitors per day, including the mayor's office and administrative spaces. Electricity is used as energy carrier to cover all buildings' needs (more information about the specific building as well as the other MOEEBIUS Project buildings can be found in the project website: www.moeebius.eu).
A simulation model of the building was created using EnergyPlus. The architectural plans and material properties of walls and glazings were gathered from the municipality. The 3D geometrical modelling and the thermal properties definition (roof, floors/ceilings, external walls, internal walls, openings, etc.) were carried out using DesignBuilder (v4.7.0.027, Design-Builder Software Ltd., Gloucestershire, UK) [92], from which the first version of EnergyPlus Input Data File (IDF) was exported. The building is located on a top of a hill, so the surrounding ground was included in the model, but any effect of shading by neighbourhood constructions is neglectable. Once exported to EnergyPlus, the model was enriched with zone and activity modelling (activity schedules, occupancy densities, metabolic rates, equipment internal gains and thermal comfort and internal air quality setpoints) and the simulation time-step was set to 10-min intervals.
In the last step, the thermal/electrical loads and distributed generation systems existing in the building were introduced into the model. Here, as the HVAC system represents the main load of the building, special attention was paid to include a reliable model, reproducing the typology and topology of the building's actual systems. The heating and cooling demand of the building is covered by 13 Variable Refrigerant Flow (VRF) centralised systems that can provide heating and cooling simultaneously according to the specific needs of each of the spaces of the building. Each of the VRF systems is formed by an external unit, connected to several internal units deployed at zone level that can provide independent temperature control to each of the conditioned rooms. Technically, a VRF system is an Air-Conditioning (AC) system, consisting of one external and several space-level terminal units, which operates by varying the refrigerant flow rate, using variable speed compressors (in the external units) and electronic expansion valves (in the space-level terminal units), to meet the heating and/or cooling demands of the building. The ventilation system of the building is formed by several independent supply and exhaust fans, connected respectively to supply and exhaust duct networks, enabling internal air renewal, however not providing exhaust air heat recovery functionality. All of them operate according to constant air volume control strategies. Concerning the Domestic Hot Water (DHW), small electrical domestic water heaters are installed in specific zones (e.g., cafe and restrooms), due to low DHW demands at these areas. Finally, the thermal/electrical loads related to indoor lighting, cafeteria appliances and offices/IT equipment were introduced into the model according to their specification data-sheets.
Occupant thermal comfort was evaluated using the Fanger predicted mean vote PMV index-following relevant standards [28]-which predicts the thermal sensation of a large group of people on a seven-point scale from too cold (−3) to too hot (+3). For the PMV estimation in our simulation model, constant values of metabolic rate (108 W/person corresponding to light-office activities), clothing level (0.5 clo) and in-space air velocity (0.137 m/s) were considered. Note here that, instead of using the Fanger index, any personalised thermal comfort model [35][36][37][38][39] can be seamlessly integrated in the overall methodology; the only change will be in the thermal discomfort calculation methodology.

The Experimental Setup
As a proof-of-concept, we illustrate our methodology in simulation (i.e., we demonstrate only the upper part of Figure 2), by a one-day control design experiment using both the MCDA and GP_SS approaches. The simulation day was 1 (July 1st -a hot Summer day) and we utilised a Meteonorm weather file [93] for Lisbon, Portugal.
For our experiment, we focused on VRF5 control, which serves Rooms 21, 22, 87 and 88. Rooms 21 and 87 have seven occupants each and Rooms 22 and 88 have only one. The occupants were assumed to enter the building at 08:15 in the morning and exit at 17:15 in the afternoon. Our task was to minimise the total energy consumption measured at the VRF unit while maintaining comfortable interiors in all related rooms. We used an open-loop supervisory controller consisting of nine setpoint values to be applied on every occupied hour (08:00-17:00); all the necessary low-level control logic for the setpoint tracking were defined in the EnergyPlus model developed. Note that we did not use a different setpoint for each room but rather defined an HVAC zone where the controller of all rooms included is the same. This is a standard practice for control complexity reduction in buildings both in industry and in research applications (e.g., see [94][95][96] and the references therein).
We have to emphasise here that, even though the control setpoint was the same for all rooms, the thermal comfort evaluation of the generated control solution was performed in each controlled room separately. Following the relevant standards [28,29], we used the Fanger index for thermal comfort evaluation and required that the Fanger PMV value in each office is within (−0.5, +0.5) in all time-steps. In each time-step t and for each room i, we defined the following comfort constraint: Summarising, in connection with the classical MPC setup (Equation (1)), we had the following setup: • The building is controlled only during the occupancy period (08:00-17:00).

•
The control actions are hourly setpoints, which are the same for the four control offices. Thus, the control time-step is one hour.

•
The thermal comfort constraints are defined according to Fanger PMV index and are posed as illustrated in Equation (11).

•
The (predicted) exogenous factors in our case are the weather conditions (provided by the weather file) and the occupancy patterns (which are assumed fixed and known in advance).

•
Since we consider that the predictions for the disturbances are accurate, there is no need to re-design our control strategy at each control time-step (i.e., every hour), so for the work presented here we design the entire control strategy once for the entire day. Of course, in future application in the real building frequent control, (re-)design will be required to compensate for all unpredicted disturbances.

•
The evolution of the system dynamics is provided by the EnergyPlus simulator; as described earlier we do not have access to the internal equations of the simulator but we can only observe the input-output relationship.

•
The objective/cost function to be minimised is the total energy consumption measured at the VRF5 unit, which serves the four controllable offices.
To evaluate the quality and efficiency of the improved control strategies under the MCDA and GP_SS experiments, we defined a simple Rule-Based (RB) controller of the building, which maintained a constant setpoint for all offices throughout the day. This setpoint was defined at 24.5 • C, after a set of test simulations with different setpoints and consultation of the relevant thermal comfort standards [28]. The performance of the Rule-Based controller can be seen in Figure 6a

The Software Framework
For the simulation study presented here, we defined a co-simulation setup. Both the MCDA and GP_SS approaches were implemented in Python and, in each iteration of the applied algorithm, a controller (i.e., the setpoint values in our case) was evaluated in the EnergyPlus simulation model. The control logic (i.e., applying the first control action at 08:00 and the subsequent setpoints every 1 h) was implemented in Matlab and in each simulation time-step a control action was communicated from Matlab to EnergyPlus. EnergyPlus simulated the building dynamics for the following time-step and the building states and weather conditions for this time-step were communicated back in Matlab.
This process repeated until the end of the simulation period (one day for our experiment here) and the results were communicated back to the Control Optimisation algorithm for further processing. The coupling between Matlab and EnergyPlus was achieved using the Building Controls Virtual Test Bed (BCVTB) co-simulation software (v8.6, Lawrence Berkeley National Laboratory, Cyclotron Rd, Berkeley, CA, USA) [97]. For all the experiments, we used a MacBook Pro laptop, equipped with a 2.8 GHz Intel Core i7 processor and 16 GB of RAM and we did not use any form of parallel computation. For the GP_SS implementation, we used the GPy library [98] for the Gaussian Process definition and the Sequential Least Squares Programming (SLSQP) algorithm [99] of the pyOpt Python library [100] for the constrained optimisation setup.

Approach Using Multi-Criteria Decision Analysis
For the MCDA approach, we ran 12 simulations where the controller was a static setpoint throughout the day, similar to the rule-based controller. Here, we tried all setpoint values from 21.5 • C to 27 • C and we incremented the setpoint by 0.5 • C in each simulation. The results for each controller are shown in Table 1.
To illustrate the behaviour and performance of the MCDA approach, we implemented a series of experiments with different weights in the criteria, with the criteria being the energy consumption and thermal comfort violations shown in Table 1. The weights can take numbers from 0 to 10, with 0 meaning that this criterion is not important and 10 meaning that it is quite important (note that the weights of all the criteria do not need to add to 10. We are not trying to achieve a relevant weighting). Note that we assigned the same weight in all thermal comfort violations, since it is rare in real world applications that we prefer comfort in one room over another. Thus, we represented the importance of energy savings with the value of w 1 and the importance of thermal comfort with the value of w 2 . The different weighting configurations (shown in Table 1) are the following: 1. Weights: w 1 = 10 and w 2 = 0.

•
Reasoning: This is a weighting combination that gives priority to the energy consumption.

•
Result: The ranking of the strategies from PROMETHEE II algorithm is as expected, since the algorithm prioritises the solutions taking into account only the energy consumption criterion.
• Reasoning: This is a weighting combination that gives priority to the thermal comfort conditions in the building.

•
Result: The algorithm favours solutions that maintain comfortable interiors, but the highest ranking controller is the one with the minimum consumption among all the controllers that achieve comfort. This is one of the properties of the algorithm: even though we indicated that the energy consumption has no priority for us, the fact that this criterion needs to be minimised too leads to this sorting. We have to note here that, even though Strategy #9 leads to limited discomfort, it is not among the high-ranking strategies. This is due to the high priority given to the comfort criterion, which results into treating the constraint violations as hard constraints.
• Reasoning: This is a weighting combination that gives equal priority to both criteria.

•
Result: Here, the comfort violations are treated as soft constraints and the ranking of strategies is quite balanced, but the highest ranking strategy determined by the algorithm (Strategy #10) leads to significant discomfort in Office 87.
Concluding, for a fair comparison both with the baseline controller and the GP_SS solution, we have selected Strategy #9 (i.e., setpoint at 25.5 • C, shown in Figure 6b) as the "best one", which results in 25% savings compared to the rule-based controller.  Note that, even though we refer to the GP approximation as "GP model", we defined 13 GP regressors-one for each target variable. When the model is trained, we utilise it for defining and solving the constrained optimisation problem of Equation (9) offline. Once the optimisation process finishes, the new controller is evaluated in the simulation model, the GP training data are augmented with the outcome of the new iteration and the process is repeated.
The resulting controller after five simulation/optimisation steps is shown in Figure 6c. The entire process (20 initial simulations + 5 simulation and optimisation steps) requires 90 min in our hardware configuration and leads to 32.9% energy improvement compared to the baseline controller and 9.5% compared to the "best" MCDA controller (see Table 2 for the consumption values, by allowing a small amount of discomfort in Room 87 only (0.2 as an absolute value calculated using Equation (11)).
As can be seen in Figure 6, these increased savings are due to the initial performance of the baseline controller. Here, the setpoint applied (24.5 • C) maintains the Fanger PMV values close to zero, being "unaware" constrained relaxation for PMV (−0.5, +0.5). The GP_SS and MCDA approaches on the other hand are capable of exploiting the trade-off between energy consumption and thermal comfort and discover control strategies that maintain PMV values close to the high bound of the constraint, thus saving energy. Another interesting observation is that there is not much variability to the control setpoints of the best control solution designed by the GP_SS approach. This is because the cooling system is air-based and capable of covering the cooling demand, so there are no time-delay effects, as we see in buildings equipped with systems such as radiators [4,25] or Thermally Activated Building Slabs (TABS) [2,25]. This would imply that only for this HVAC system a more targeted setpoint/control parameter search for the MCDA approach could lead to the same results as the fully-fledged optimisation methodology followed in GP_SS approach. Alternatively, an optimisation problem with only one parameter (i.e., a constant setpoint for the entire day) could also lead to an improved control strategy compared to the baseline controller.

Conclusions
In the work presented here, we defined and evaluated a model-assisted control optimisation methodology in buildings that is able to overcome the modelling cost problem of traditional MPC methodologies by replacing the mathematical model required in MPC with a detailed thermal simulation model of the building. A central part of our methodology is the control improvement (or optimisation) module, facilitating two simulation-based control improvement methodologies: one based in multi-criteria decision analysis methods and the other based on state-space identification of dynamical systems using Gaussian process models and model-based reinforcement learning. We evaluated both methods in a simulation study utilising an existing building in Portugal and we verified that they can generate more efficient controllers compared to a baseline control strategy, utilising only a small number of simulation calls.
More specifically, as indicated by our experiments, the flexibility of the GP_SS methodology to define different control setpoints in each control time-step can lead to more efficient control strategies. Here, including the thermal comfort bounds as constraints in the optimisation problem allows the algorithm to discover a good trade-off between thermal comfort conditions and energy consumption. Of course, the discovered controller does not correspond to the global optimal solution, but still is more efficient compared to the baseline control strategy.
The main benefit of the MCDA approach on the other hand is its simplicity and its negligible computational overhead. Despite this, it is almost impossible to define a set of meaningful and comfort preserving control strategies a priori for all types of buildings and controlled systems-some kind of control improvement or fine-tuning will always be required. In addition, we have seen that the method is particularly sensitive to the selection of the criteria weighting; here, the decision maker might need to actively take part in the final control strategy selection (e.g., by manually selecting one of the top N control strategies and not necessarily the highest ranked one).
We have to acknowledge that, for large buildings with centralised HVAC systems, the proposed approach might not scale. In the work presented here, we controlled only one of the 13 VRF units of the building, which implies that, if an online control design solution were required, we would need to run 13 control optimisation processes in parallel-a configuration that increases the computing infrastructure requirements and thus the associated cost. In addition, in the case of buildings with centralised control systems, the number of controlled parameters would be significantly higher, which would lead to a much harder constrained optimisation problem that would necessitate a higher number of simulation calls.
Of course, one could address this problem by utilising various simulation speed-up/simplification techniques reported in the literature (e.g., see [94][95][96] and the references therein), but still in the general case the posed constrained optimisation problem might be too large for this method. Another option would be to formulate the problem as a Reinforcement Learning setup and take advantage of recent advances in deep reinforcement learning research [101][102][103] combined with simulation to reality control transfer approaches [104,105] to mitigate the computationally complex problem of control design in the off-line phase.
Finally, as an immediate next step, we plan to apply the proposed methodology in the same real building in Portugal within the MOEEBIUS H2020 European Project, in an effort to evaluate the effect inaccurate predictions and user actions potentially have in the efficiency of our methodology.
Author Contributions: G.K. and C.M. formulated the methodology for the Gaussian Process State Space models approach for simulation-based control optimization in buildings. G.K. designed the experiments. G.K. and G.G. analysed the results. V.S., P.d.A.-C. and A.R.-A. developed the EnergyPlus simulation model of the Mafra City Hall building. P.d.A.-C. generated the 3D model images and provided the building photographs. G.G. and D.R. set up the co-simulation environment and G.G. supported the experiments. N.P. formulated and implemented the Multi-Criteria Decision Making-based approach for simulation-based control evaluation in buildings. G.K., G.G., P.d.A.-C. and N.P. wrote the paper. S.S. and C.M. proofread the paper and helped improve various sections in the text. G.G. oversaw the entire work, offered key insights in the design and extension of the experiments and the analysis of the results and proofread the paper.
Funding: Research leading to these results has been partially supported by the Modelling Optimization of Energy Efficiency in Buildings for Urban Sustainability (MOEEBIUS) project. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 680517. Georgios Giannakis and Dimitrios Rovas gratefully acknowledge financial support from the European Commission H2020-EeB5-2015 project "Optimised Energy Efficient Design Platform for Refurbishment at District Level" under Contract #680676 (OptEEmAL). Georgios Kontes and Christopher Mutschler gratefully acknowledge financial support from the Federal Ministry of Education and Research of Germany in the framework of Machine Learning Forum (grant number 01IS17071). Georgios Kontes, Natalia Panagiotidou, Simone Steiger and Gunnar Gruen gratefully acknowledge use of the services and facilities of the Energie Campus Nürnberg. The APC was funded by MOEEBIUS project. This paper reflects only the authors' views and the Commission is not responsible for any use that may be made of the information contained therein.

Conflicts of Interest:
The authors declare no conflict of interest.