Dynamic Simulation-Based Surrogate Model for the Dimensioning of Building Energy Systems

Abstract: In recent decades, building design and operation have been an important field of study, due to the significant share of buildings in global primary energy consumption and the time that most people spend indoors. As such, multiple studies focus on aspects of building energy consumption and occupant comfort optimization. The scientific community has recognized the importance of operation optimization through retrofitting actions for on-site building energy systems, achieved by the use of simulation techniques, surrogate modeling, and the guidance of existing building performance and indoor occupancy standards. However, it remains unclear whether this methodology can be extended towards the early stages of thermal system and/or building design. To this end, the present study provides a building thermal system design optimization methodology. A data set of minimum thermal system power, for a typical range of building characteristics, is generated, according to the criterion of occupant discomfort in degree hours. Accordingly, a surrogate model, providing a configurable correlation of the above set of thermal system dimensioning solutions, is developed using regression model fitting techniques. Computational results indicate that such a model could provide both a desirable computational simplification and accuracy on par with existing thermal load calculation standards and simplified system dimensioning methods.


Introduction
Buildings account for a significant share of total global energy consumption. More specifically, in 2018, building energy consumption in the European Union reached approximately 40% of total primary energy [1]. In the past few decades, the engineering community has developed several ways to design and dimension building envelopes and Heating, Ventilation and Air Conditioning (HVAC) systems, so as to achieve a comfortable indoor environment for occupants. International standards, such as ANSI/ASHRAE/IES Standard 90 [2,3] and EPBD [4], aid engineers with generalized calculation methodologies for energy efficient buildings and thermal system dimensioning. These standards provide calculation methodologies for the monthly/annual heating and cooling loads of buildings, as well as for the energy consumed by HVAC systems. The ISO EN 13790 standard [5] provides a monthly calculation procedure for the required thermal loads. It also contains a methodology for modeling the thermal behavior of the building on an hourly basis, namely the 5R1C (5 Resistance, 1 Capacitance) model, thus capturing more dynamic thermal states.
Besides energy consumption, occupant comfort in indoor environments has become a growing concern, as it is evident that people tend to spend most of their time indoors [6]. Occupant comfort aspects can be divided into thermal, acoustic, and visual, as well as com- [...]
• white-box models, in which the physical equations of the model are fully known
• black-box models, which use optimization techniques to approximate physical equations and discover patterns
• grey-box models, which are an in-between type of white and black-box models (physical model is partially known)
Surrogate modeling can use simulated or real-time data from buildings to find correlations between environmental conditions and energy consumed. Furthermore, smart building technologies can contribute greatly to on-site system dimensioning. Namely, smart controllers, with the help of machine learning structures, can use surrogate model methodology as a retrofitting tool for existing buildings [24]. This could enable them to further optimize HVAC systems' operation and design.
Many studies are concentrated on optimizing real time operation of existing building cases, in terms of cost and/or comfort [25][26][27][28]. Jiang et al. [29] used a reinforcement learning algorithm to optimize HVAC energy cost in the presence of variable electricity cost profiles. Other studies develop methodologies for predicting building thermal loads [30][31][32][33][34]. Guelpa et al. [35] developed a model suitable for simulating the thermal behavior of buildings and substations that are connected to district heating networks. Westermann et al. [36] developed a convolutional neural network as a surrogate model that was trained using annual weather data and simulated buildings with various characteristics across multiple locations in Canada. This model is able to estimate required heating and cooling loads of buildings in locations with different climate zones.
On the other hand, there seem to be opportunities to optimize not only existing HVAC operation, but effectively dimension heating systems from the design stages of HVAC systems, or even buildings. Caldas and Norford [37] presented an optimization algorithm for both the building envelope and the HVAC system design and operation. Carlos and Nepomuceno [38] proposed a simplified calculation methodology for heating loads to be used as a reliable estimation during the early stages of building design. Thrampoulidis et al. [39] developed a surrogate model using machine learning that offers optimized retrofit solutions for thermal systems of single and multi-family residences in the city of Zurich. In the work of Asadi et al. [40] a Multi-Objective Optimization (MOO) model is proposed for providing retrofitting solutions in school buildings, using a Genetic Algorithm (GA) and an Artificial Neural Network (ANN). However, it is noted that deciding the single optimal solution among a set of multiple optimal ones requires understanding of each building case. All in all, the literature seems to be concentrated on developing surrogate models to effectively predict building thermal behavior and using that to optimize HVAC operation, mostly in terms of energy consumption and occupant comfort. A minor part of the studies focuses on using surrogate modeling to simplify building and energy system dimensioning methodologies that could possibly outperform existing standard methodologies.
The aim of this work is to develop an optimization methodology that yields the nominal thermal power required by an indoor space for a specific thermal comfort level. More specifically, a surrogate model is implemented using surrogate model-based optimization software. The proposed surrogate model is an equation that correlates the building characteristics and thermal discomfort with the thermal system nominal power and can be used in the early stages of HVAC design. Its distinguishing feature is that it not only takes building characteristics into account, but also allows consideration of the degree of achieved occupant thermal comfort. Another trait of this methodology is that parameters and their value ranges are configurable, making the methodology flexible in its use. It may assist engineers or machine learning models in calculating the required thermal power, as it imposes a much lower computational load than a standard dynamic method. An existing surrogate model also means that the user does not need to implement machine learning architectures to search for the optimal solution. These two advantages allow it to be integrated into an engineer's daily work as a helpful tool. In the following sections, the methodology is presented, describing the assumptions and procedure for formulating the surrogate model. In the computational study section, the results of this procedure are presented, as well as a comparison with existing building standards. The discussion and conclusions summarize the main findings of this work, providing an insight into the hypotheses and generalization potential, while future work is also discussed.

Materials and Methods
The developed methodology uses a simulation model, namely the ISO 13790 hourly model [5], to provide the necessary data for the analysis. As a final result, it provides both a non-linear and a simpler multilinear equation that allow the calculation of the required thermal power of the indoor spaces of a building for a certain comfort level. The steps of the methodology are illustrated in Figure 1.
The simulation model was implemented in MATLAB and functions according to the ISO 13790 hourly dynamic model. To begin with, random values of the building characteristics were generated, as well as of the nominal power of the heating systems. More specifically, each building data batch case was derived from multiple combinations of building characteristics (see Table 1), nominal power of heating systems and discomfort settings. Simulations of the respective building set-ups were conducted, creating data batches (N_batch = 16,000) from each run. The source code that was used for the generation of the data batches is provided in the Supplementary Materials section. During each simulation, each building is exposed to environmental conditions, and heating systems operate during specified occupancy periods, in an attempt to achieve occupant thermal comfort. Both outdoor conditions and occupancy periods are elaborated further in later parts of this section.

Table 1. Building characteristics used as parameters in the surrogate model.
Symbol | Description | Unit | Range
U_m | Average building envelope thermal heat transmission coefficient | [...] | [...]
C_m | Average thermal capacity | J·K^−1 | 80,000-370,000
V_inf | Average air infiltration rate | m^3·h^−1 | 50-700

The simulation was conducted two times for each random building. In the first simulation, the time period was the month during which the lowest outdoor temperatures were observed. The second simulation took place during the whole heating period, for the climatic zone in which the case building is situated, according to the Greek implementation of EPBD [41]. In this way, we are able to determine which time period is more suitable for this simulation methodology.
Data batches contain information regarding the occupied space temperature, over the course of each simulated time period, as well as the hourly heating energy demanded, to maintain indoor thermal comfort. ALAMO (Automated Learning of Algebraic Models for Optimization) is a software used for exporting high accuracy surrogate models based on simulation data or experiments and was developed by Cozad et al. [42]. In this study, this software was used in order to find the most accurate correlation possible between the building characteristics and the required heating power during occupancy periods.
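To illustrate what fitting a regression surrogate to simulated data involves, the following is a minimal sketch using ordinary least squares in plain Python. It is not ALAMO and does not reproduce its algorithm or interface; it is a much simpler stand-in. The four inputs are assumed to represent building characteristics and the discomfort level already scaled to [0, 1], and `toy_power` with its coefficients is a hypothetical generator of demonstration data, not a result of this study.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_multilinear(X, y):
    """Least-squares fit of y ~ b0 + b1*x1 + ... via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear(AtA, Aty)


def predict(coef, x):
    """Evaluate the fitted multilinear model at one input vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def toy_power(x):
    """Hypothetical linear 'required power' in W, used only to create demo data."""
    u, c, v, d = x
    return 2000.0 + 9000.0 * u + 1200.0 * c + 3900.0 * v - 2500.0 * d


rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(200)]  # scaled inputs
y = [toy_power(x) for x in X]
coef = fit_multilinear(X, y)
print([round(c, 3) for c in coef])
```

Scaling the inputs to a common range before fitting keeps the normal equations well conditioned, which matters here because the physical parameters differ by several orders of magnitude.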
In this work, the term 'thermal discomfort' was used. It is noted that two discomfort calculation methods were used (binary and degree-hour) [43,44] and a comparison was made between the fitting results of each method, when using ALAMO (see Section 3).
Binary-type hourly discomfort (BD) can be described as:
[Equation (1)]
where:
T_in: indoor air temperature
T_set: set point indoor air temperature.
The binary-type discomfort percentage (BDP) is calculated as:
[Equation (2)]
where:
N_hours: total hours that elapsed over a simulation.
The discomfort in degree-hours (DDH) can be defined as:
[Equation (3)]
The normalized discomfort in degree hours (DDH_N) can be described as:
[Equation (4)]
As indicated in the experimental study of Favero et al. [45], discomfort could be put in a broader context and conceived as a tolerance. As such, T_min could indicate the limit at which the environmental conditions are considered totally unacceptable by the occupant. Therefore, the occupant is expected to have a temperature that is perceived as neutral and a minimum temperature, below which the thermal discomfort is maximized (100% discomfort); within this context, the T_min value is arbitrary. It should be noted that the degree-hour discomfort method provides information not only on how many hours the systems did not manage to provide comfort, but also on the temperature difference needed to achieve it in each hour.
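The discomfort metrics can be sketched in code as follows. Because the expressions of Equations (1)-(4) are not reproduced in this text, the forms below are assumptions based on the verbal definitions: an hour counts as uncomfortable when T_in < T_set, the hourly degree deficit is clipped at T_set − T_min (100% discomfort), and normalization divides by the maximum possible deficit over all hours. Function names are hypothetical.

```python
def binary_discomfort(t_in, t_set):
    """Assumed form of BD: 1 if the hour is below the set point, else 0."""
    return 1 if t_in < t_set else 0


def bdp(temps, t_set):
    """Assumed form of BDP: share of uncomfortable hours, in percent."""
    return 100.0 * sum(binary_discomfort(t, t_set) for t in temps) / len(temps)


def ddh(temps, t_set, t_min):
    """Assumed form of DDH: summed hourly deficit, clipped at t_set - t_min."""
    span = t_set - t_min
    return sum(min(max(t_set - t, 0.0), span) for t in temps)


def ddh_n(temps, t_set, t_min):
    """Assumed form of DDH_N: DDH divided by its maximum possible value."""
    return ddh(temps, t_set, t_min) / (len(temps) * (t_set - t_min))


# Four illustrative hourly indoor temperatures in degrees Celsius.
hourly_temps = [20.0, 19.0, 18.0, 17.0]
print(bdp(hourly_temps, t_set=20.0))                # 75.0
print(ddh(hourly_temps, t_set=20.0, t_min=18.0))    # 5.0
print(ddh_n(hourly_temps, t_set=20.0, t_min=18.0))  # 0.625
```

In this example the hours at 19 °C and 17 °C both count once in the binary measure, whereas the degree-hour measure weights them by 1 K and 2 K (clipped) respectively, which is the extra information the degree-hour method carries.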
Using the surrogate regression models created by ALAMO, we obtained a correlation between nominal thermal power and building characteristics. Moreover, data regarding thermal system power over a certain time period can also be extracted for every combination of building characteristics and thermal system power. Table 1 presents the building characteristics used as parameters in the surrogate model. For each building, the values are picked at random from the specified ranges. The data ranges in Table 1 were obtained from the Greek adaptation of EPBD (U_m and V_inf values) [41] and the ISO EN 13790 standard (C_m values) [5].
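The random selection of building characteristics can be sketched as below. This is an illustration only, not the MATLAB code of the Supplementary Materials: uniform sampling is an assumption, the C_m and V_inf ranges follow Table 1, and the U_m range is a hypothetical placeholder. The names `PARAMETER_RANGES` and `sample_buildings` are introduced here for illustration.

```python
import random

# Configurable parameter ranges. C_m and V_inf follow Table 1; the U_m range
# is a hypothetical placeholder chosen only for this example.
PARAMETER_RANGES = {
    "U_m": (0.4, 3.5),             # W/(m^2 K), assumed
    "C_m": (80_000.0, 370_000.0),  # J/K
    "V_inf": (50.0, 700.0),        # m^3/h
}


def sample_buildings(n, ranges, seed=None):
    """Draw n random buildings, each parameter uniformly within its range."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        for _ in range(n)
    ]


buildings = sample_buildings(16_000, PARAMETER_RANGES, seed=42)
print(len(buildings), sorted(buildings[0]))
```

Keeping the ranges in a single dictionary mirrors the configurability described above: changing a range or adding a parameter does not require touching the sampling routine.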
Finally, the proposed surrogate model will be compared with the existing simplified calculation procedure, as instructed by the Greek implementation of the EPBD, called REPB (Regulation on the Energy Performance of Buildings) [41]. More specifically, the thermal power that is required is calculated using the following simplified equation:
[Equation (5)]
In this study, a case building model is used for testing the proposed methodology (see Figure 2). It is a three-story building with a pilotis underneath and a conventional rooftop. Its walls are made of a mixture of brickwork (70%) and reinforced concrete (30%), and the windows are double-glazed. It is assumed to have been constructed prior to 1979, the year in which thermal insulation regulation was first implemented in Greece; therefore, the building does not adhere to regulations regarding insulation sufficiency. More information on the case building characteristics is provided in Table 2. The data used in Table 2 were obtained from on-site inspection, according to the Greek adaptation of EPBD [41].
Since the data set is based on the existing hourly method of the ISO EN 13790 standard, the assumptions under which the model was developed and validated are presented below:

• Indoor air relative humidity is assumed to be at 50%.
• Clothing of occupants is 1.0 clo.
• According to the psychrometric chart of ASHRAE 55 [10], with the aforementioned humidity and clothing levels, expected occupants' thermal comfort ranges from 18 °C to 20 °C. That means T_set = 18-20 °C and T_min = T_set − 2 K (see Equation (3)).
• Every floor is considered to be a single thermal zone. As such, every room belongs to the conditioned zone.
• Thermal systems operate during occupancy periods only.
• There are no objects around the building that provide shade.
• Occupancy timings exclude 08:00-16:00 for weekdays, with 24-h occupancy at weekends.
• Outdoor conditions: Outdoor temperature and solar radiation data refer to the TRY (Test Reference Years) generated by the National Observatory of Athens [46].
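The occupancy assumption in the list above can be expressed as a small helper. This is a sketch rather than the study's implementation: the function name `is_occupied` is hypothetical, and treating the weekday absence as the half-open interval from 08:00 up to, but not including, 16:00 is an assumption.

```python
from datetime import datetime, timedelta


def is_occupied(ts):
    """Return True if the building is occupied at timestamp ts.

    Weekends are occupied all day; on weekdays the building is assumed
    to be empty from 08:00 up to (not including) 16:00.
    """
    if ts.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return True
    return not (8 <= ts.hour < 16)


# Count occupied hours over one week starting on a Monday.
start = datetime(2021, 1, 4)  # a Monday
occupied_hours = sum(is_occupied(start + timedelta(hours=h)) for h in range(168))
print(occupied_hours)
```

Since thermal systems operate during occupancy periods only, such a mask would also determine the hours in which heating is allowed to run in a simulation.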

Computational Study
In this section, the results of the surrogate model creation procedure will be shown, and a comparison between the extracted models and existing standards will be made. As has been described before, a data set has been generated, including the optimal thermal system power solution for each combination of building characteristics. The criterion for choosing the optimal power for each data set refers to the minimum thermal power value that satisfies the desired discomfort level (see Equation (4)).
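The selection criterion described above, choosing the minimum thermal power that satisfies the desired discomfort level, can be sketched as follows. The candidate pairs of nominal power and resulting normalized discomfort are hypothetical numbers standing in for simulation outputs, and `minimum_power` is a name introduced here for illustration.

```python
def minimum_power(candidates, max_discomfort):
    """Return the smallest power whose discomfort does not exceed the limit.

    candidates: iterable of (nominal_power_W, normalized_discomfort) pairs,
    for example one pair per simulated heating system size of a building.
    Returns None when no candidate meets the limit.
    """
    feasible = [power for power, discomfort in candidates
                if discomfort <= max_discomfort]
    return min(feasible) if feasible else None


# Hypothetical simulation outcomes for one building.
runs = [(4000, 0.20), (6000, 0.06), (8000, 0.01), (10000, 0.0)]
print(minimum_power(runs, max_discomfort=0.05))  # 8000
print(minimum_power(runs, max_discomfort=0.0))   # 10000
```

Relaxing the discomfort limit can only keep or lower the selected power, which is why treating comfort as a configurable parameter affects the dimensioning result.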
First, binary and degree-hour thermal discomfort are compared. Figures 3 and 4 show the fitting results of the data (depicted as "measured values" in the figures), while the "predicted values" come from the ALAMO regression value estimation. The data are derived from the simulation of the building case, having the heat loss coefficient as a varying parameter; the respective data batches are not the same, so that cross validation can be performed. As shown in Figure 4, the degree-hour normalized discomfort method increases fitting accuracy when compared to the simple binary discomfort percentage presented in Figure 3. This is verified through the fitting performance comparison presented in Table 3. In addition, when comparing monthly to seasonal simulations, there is not much of a difference in results. Despite this, an accurate model for calculating nominal thermal power for the whole heating season is more general and, therefore, more impartial.
At this point, the complete surrogate model will be presented and discussed. The model is generated using the building characteristics shown in Table 1.
Equations (6) and (7) describe the correlation of the best seasonal surrogate model that was exported from ALAMO, in terms of thermal discomfort and energy demand, respectively. Moreover, a complete view of the accuracy of these surrogate models, when examining cross validation results, is provided in Table 4.
For the case of 100% comfort level (DDH_N values very near to 0, due to logarithmic correlation), and set point temperature (T_set) equal to 20 °C, implementation of Equation (6) leads to the results presented in Figure 5a, while the respective results for Equation (7) are presented in Figure 5b.
Figure 5. [...] Table 1). (a) Heating system dimensioning; (b) seasonal energy demand of the building.
As can be seen from Figure 5a,b, it is possible for the nominal system heating power to be calculated, provided the specific building characteristics, such as average heat transmission coefficient, average thermal capacity and air infiltration rate are known. It must be remembered that results are subject to the daily occupancy schedule that is set during the simulation phase, as well as the value of discomfort level that is selected.
In the dimensioning procedure, building characteristics and thermal discomfort are the independent variables. For this reason, Equation (6) is transformed into the following multilinear equation (Equation (8)). Linearity was chosen in order to further simplify the equation. Fitting results of the multilinear equation are presented in Table 4. As can be observed in Table 5, no significant accuracy penalty emerges when using the multilinear equation as a surrogate model instead of the non-linear seasonal model (Equation (6)).
At this point, the proposed surrogate model will be compared against both the ISO 13790 hourly method and the REPB. The results of the model at total comfort are almost identical to the simulation data. In order to compare the REPB line with the final surrogate model, a reduction to two dimensions is needed. When comparing Equations (5) and (8), both share two common parameters: the average building heat transmission coefficient U_m and the air infiltration rate V_inf. Therefore, in Figure 6a,b, two graphs are presented, each of them having one of the two parameters fixed at a randomly chosen value, while the other varies along the horizontal axis.
The results show that the heat transmission coefficient, and consequently insulation, is treated differently by each method when calculating required system power. As observed in Figure 6a, REPB tends to underdimension the required power for well-insulated buildings and overdimension it for poorly insulated ones, compared to the hourly method and the surrogate model. The infiltration effect is treated similarly by both methods (Figure 6b).
Another observation is that reducing comfort by 5% (from 100% to 95%) reduces the required system power considerably. It should be recalled that the EPBD and ISO EN 13790 standards do not take occupant comfort into account, which is vital to attaining an optimal thermal system with regard to both user satisfaction and the reduction in initial installation and operation cost.

Discussion
The above results indicate that there is much to be considered when discussing optimal thermal system dimensioning. This study indicated that a surrogate model stemming from a dynamic hourly method of existing thermal load calculation standards could not only offer calculation simplification, which would be valuable to engineers, but could also be as accurate as existing simplified system dimensioning methods, if not more so. Moreover, user thermal comfort should be taken into account, not only during the daily operation of the thermal systems, but also from the early stages of thermal system dimensioning. Furthermore, the degree-hour discomfort measure provides useful information that the simple hourly (binary) measure fails to provide, which is in agreement with other studies that implement it [43,44]. The proposed work managed to reduce the required thermal load by implementing thermal comfort as a configurable parameter (in the form of thermal discomfort, as presented in Equations (6) and (8)) that can be changed at will.

Conclusions
In this study, a surrogate model that is able to effectively dimension the required thermal power of a building is developed. According to the results, a non-linear seasonal model can provide the dimensioning of the systems for the heating period. The parameters involved include the average heat loss coefficient, heat capacity, and air infiltration; the desired degree of discomfort is imposed using a degree-hour approach instead of a binary one. The effect of the respective parameters on system dimensioning is evident. Especially regarding thermal comfort, reducing the comfort level by 5% results in a considerable reduction of required system power. A multilinear regression approach has also been shown to be reliable.
The developed model features both simplicity and satisfactory accuracy when compared to the existing ISO EN 13790 simple hourly method. The simplified calculation procedure that derives from the Greek adaptation of the EPBD tends to provide different estimations of required thermal power in extreme cases of building thermal insulation, when compared to the EN 13790 method and the proposed model. Moreover, both standards are unable to take occupant preferences into account in their current state, as far as thermal comfort is concerned.
Future work includes implementing this methodology across several climatic zones of Greece, and the testing and addition of several more building characteristics to the methodology, leading to better generalization. In addition, various occupancy schedules and thermal comfort settings could also be considered in the improvement of the method. Lastly, the examination and implementation of more standards, simulation procedures, as well as real-time measurements, could be of use for further development of this methodology.
Supplementary Materials: The MATLAB code used for the generation of data batches is available online at this URL: https://github.com/gpanaras/-Dynamic-simulation-based-surrogate-modelfor-the-dimensioning-of-building-energy-systems.

Funding:
We acknowledge support of this work by the project "Development of New Innovative Low Carbon Footprint Energy Technologies to Enhance Excellence in the Region of Western Macedonia" (MIS 5047197) which is implemented under the Action "Reinforcement of the Research and Innovation Infrastructure", funded by the Operational Programme "Competitiveness, Entrepreneurship and Innovation" (NSRF 2014-2020), and co-financed by Greece and the European Union (European Regional Development Fund).

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.