Probabilistic Load Forecasting Optimization for Building Energy Models via Day Characterization

Accurate load forecasting in buildings plays an important role for grid operators, demand response aggregators, building energy managers, owners, customers, etc. Probabilistic load forecasting (PLF) becomes essential to understand and manage the building’s energy-saving potential. This research explains a methodology to optimize the results of a PLF using a daily characterization of the load forecast. The load forecast provided by a calibrated white-box model and a real weather forecast was classified and hierarchically selected to perform a kernel density estimation (KDE) using only similar days from the database characterized quantitatively and qualitatively. A real case study is presented to show the methodology using an office building located in Pamplona, Spain. The building monitoring, both inside—thermal sensors—and outside—weather station—is key when implementing this PLF optimization technique. The results showed that thanks to this daily characterization, it is possible to optimize the accuracy of the probabilistic load forecasting, reaching values close to 100% in some cases. In addition, the methodology explained is scalable and can be used in the initial stages of its implementation, improving the values obtained daily as the database increases with the information of each new day.


Introduction
Smart cities, smart grids, and smart buildings [1][2][3], demand response and demand side management [4,5], renewable energy sources and their integration [6], nearly zero energy buildings [7], model predictive control [8], electric vehicles [9], etc., are perhaps some of the research areas that have received more emphasis in recent years. Global warming due to climate change, the scarcity of natural resources, and the growing global increase in energy demand are some of the reasons for this search for greater efficiency and smart energy use. In these fields, buildings play an important role since, for example, in the European Union (EU), they are responsible for 40% of energy consumption [10].
There are several approaches to reduce the energy consumption of buildings, which can be classified into two main groups: those based on "passive" strategies, understanding "passive" as those ones that do not require an active management of the building, as the improvement of building envelopes [11], the replacement of systems (HVAC, DHW, etc.) with more efficient ones [12], the use of renewable energies, etc.; and those based on "active" strategies that basically try to achieve an optimal management of the building as a whole [13,14].
Both approaches require reliable building energy models (BEMs) to test, verify, and quantify each measure and its improvements. However, characterizing a building in a simulation model is not an easy task [15]. In fact, there are different types of building energy models, depending on the methodology used to obtain them, which have their advantages and disadvantages [16]. They fall into three categories: black-box models, those based on kernel density estimation (KDE). A kernel is a standardized weighting function that assigns different weights to the observations. It depends on a parameter "h", the bandwidth, which determines the amount of smoothing applied in the estimation [52]. Although obtaining the KDE function requires more computational time, the PLF results provide valid prediction intervals since the prediction interval coverage probability (PICP) values are in many cases above the prediction interval nominal confidence (PINC), which is 80% [53]. Furthermore, as Gonzales-Fuentes et al. [54] explained, "What KDE has in its favor is that it requires less samples to provide a good estimate". This allows obtaining good PLFs in initial states, where the amount of data used to obtain the KDE function is low.
Research on probabilistic load forecasting is increasing in the literature; however, it is mainly focused on black-box models, which cannot show the link between the inputs and the forecast building loads. A previous research work from the authors Lucas Segarra et al. [53] aimed to fill this gap by providing a probabilistic load forecasting methodology that considers the weather prediction uncertainty using white-box models (BEMs), in particular calibrated BEMs using EnergyPlus [55], instead of black-box models. That methodology transforms the point load forecast provided by a BEM into a probabilistic load forecast using historical data based on outdoor and indoor conditions provided by building monitoring. The case study employed up to a total of 12 months of data (two years (two heating seasons)), and the results showed that the mean prediction interval width (MPIW) decreased as the amount of data used to perform the KDE function increased, maintaining good prediction interval coverage probabilities (PICPs).
This paper continued the research on PLF applied to white-box models or BEMs and went one step further and optimized the PLF values obtained by analyzing and classifying the data used to generate the KDE function. In this sense, it answers the question raised in that research on "how the seasonality of the weather data influences the results".
To this end, a methodology was developed to characterize the days so that each KDE function is obtained using only information from days with similar characteristics, classified by different criteria, whether quantitative (amount of energy), qualitative (form of energy demand), or a mixture of both. Classifying each day is more logical than assuming seasonality in the data, since within each season, there will surely be days with different weather behaviors. Due to the purpose of the PLF being to obtain the demand/consumption of a specific day (short-term load forecasting) [56,57], this time period was used to obtain both the KDE function and its uncertainty map based on both quantitative and qualitative characteristics. Thus, when a load forecasting is made, it is first characterized in order to select the characterization criteria of the training days that will construct its uncertainty map. This enables obtaining more accurate PLFs, reaching in some cases, as will be seen in the case study, almost 100% accuracy.
To make the methodology scalable, a script that establishes a hierarchy in the cascade of all the proposed characterization criteria was developed. After analyzing and classifying the day under study, the script selects which criterion best meets each day. Obviously, at the beginning, not all the characterization criteria have enough information to obtain a valid KDE function, so the script selects the best among the existing ones and "feeds" each criterion with the data of the new day. As a result, this methodology can be used at the initial stages of each case study, since as the days go by, the data feedback allows classifying and improving the PLFs obtained.
The main contributions of the research are: the optimization of the probabilistic load forecasting using a characterization of the daily load forecast through the use of white-box models; and the continuous improvement of the results by feeding back the database as the days go by, allowing an early implementation. This paper is organized as follows: Section 2 summarizes the calculation process of the PLF and of its uncertainty maps and explains the methodology to obtain an optimal PLF based on the days' characterization. Section 3 focuses on the description of the case study for which the methodology was implemented, and Section 4 shows the results, including the evaluation of the methodology. Finally, Section 5 presents the discussion and conclusions of the study.

Methodology
This section explains first the probabilistic load forecast (PLF) approach using building energy models (BEMS) and accounting for weather forecast uncertainty and, then, the process to characterize the training days used to generate the optimized PLF and the uncertainty map.

Probabilistic Load Forecast Using BEMs
This section explains the methodology to obtain probabilistic load forecasts taking into account the weather forecast uncertainty and using BEMs, which was previously published by the authors in [53]. The technique requires two procedures: the simulation process to determine the historical impact of the weather forecast data on the load provided by the BEM and the probabilistic processing of the simulation outputs. Figure 1 shows the overall schema of the probabilistic load forecasting methodology.  The first procedure is the simulation process. In order to reduce to a minimum the uncertainty in the load forecast due to building energy model accuracy, the proposed methodology employs calibrated building energy models (BEMs) based on EnergyPlus simulation software, which is an accurate representation of the behavior of the real building. Since the PLF procedure is based on the comparison of the load using observed and forecast weather data, it is required to explain how the forecast and observed data are implemented in the BEM and the simulation procedure, with special attention given to the weather files' creation, the indoor temperature, and the simulation periods.
The weather files, in EPW (EnergyPlus weather file) format, are created respecting the thermal history of the building. The process of creating the weather file starts with the collection of daily forecast weather data from an external provider and the measured weather data from an on-site weather station installed in the building surroundings. Then, one weather file, called the combined weather file, is generated for each day with the measured weather information (historical data) and forecast data.
In order to correctly reflect the past thermal behavior in the BEM taking into account all loads of the building (HVAC system, people, lighting, electric equipment, etc.), the building's indoor temperature measurements obtained from the sensors installed in the building are introduced into the model. The actual indoor temperature is introduced into the simulation model through an external file and configured as a dynamic set-point for the HVAC system. The simulation output is the energy demand required by the model to follow it. This way, the uncertainty in the load forecast due to internal gains and occupant behavior is avoided. In this sense, sensors installed in the building are fundamental devices since they connect the simulation model with the real world, increasing its accuracy by taking into account aspects that are difficult to virtualize [58].
For each day of analysis, one simulation is performed, which is configured to run 15 days before the baseline day (Day 0) with the measured weather data to ensure that the thermal history of the building is captured. The loads on Day 0 and n days of the forecast are obtained as the outputs of each simulation. These results (ordered and classified) are subsequently used in the probabilistic process. The simulation process, the inputs (weather and measured temperature file), and the simulation period are explained in Figure 2.   The second procedure of the PLF technique employed is the probabilistic processing of the point forecasts, which are the simulation outputs obtained from the BEM. First, the distribution of the residuals, which are the energy load differences provided by the BEM when it is fed the observed and forecast weather data, is studied through a probabilistic histogram. Second, to obtain a smooth curve that represents the data, a probability density estimation is performed using the Gaussian kernel density estimate (KDE). The objective of this procedure is to obtain the expected probability that the load forecast error is below a certain value. The cumulative distribution function (CDF) or S-curve is employed to represent the probability that the variable (here, the load forecast error due to the weather forecast) will be less than or equal to a certain value. Finally, from the CDF plot of each forecast hour, prediction intervals (PIs) of load forecast errors are extracted and transformed into an hourly map of uncertainty. The schema in Figure 3 shows the complete process. The resulting hourly map of the uncertainty of the load forecast due to weather forecast data shows an overview of the probability that the energy demand error is below a certain value for each hour. The map is constructed with the available hours ahead of the forecast time, and it is read as follows: for hour n, there is a% probability that the energy demand error is less than y kWh. This map can then be applied to the load forecast provided by the BEM by using the intervals of the energy demand error with their probabilities to the load forecast outputs of the model. The result is the load forecast with the probability error due to weather forecast data.
For a more detailed explanation of the PLF process, the previous paper from the authors can be consulted [53]. That study aimed at showing the methodology for providing a probabilistic load forecast using building energy models. The present study goes one step further and aims to provide an optimized PLF results by studying how to characterize the days used for the generation of the uncertainty map.

Optimal Probabilistic Load Forecast
This section shows how the PLF process explained in the previous section can be optimized by the correct selection of the training days that generate the PLF. The concept is that each day of study is individualized and has its own optimal training days in order to compare the future building's energy behavior to similar days in the past. The steps to generate this optimal PLF for each day of study are four: (1) definition of the characterization criteria; (2) generation of the PLF for each one of the characterization criteria and for each day of the database; (3) analysis of the minimum training days needed for each characterization criteria so that the obtained PLF works better than the previous criteria; and (4) ordering the criteria to generate a hierarchical criteria application list. The following paragraphs give the explanations of the four steps.

First
Step: Definition of the Characterization Criteria The first step is to define which are the characterization criteria when selecting the training days of the PLF. The key of the proposed methodology is that the selection of the data used for the generation of the uncertainty map and the PLF results is based on a daily unit and not on the seasonality of the data. The use of the seasonal or monthly criteria is discarded since it might happen that different days from different seasons cause a similar energy demand in the building regardless of the season or month in which they belong or that in a given month or season might be days with different behaviors that are negatively affecting the uncertainty map and PLF results. This methodology selects for each day of study similar days from the database attending to its energy demand. In order to characterize the days of the database and find the best set of training days for each day of study, four filters of selection criteria were established. These filters are ordered from highest to lowest filtrate grain, and one filter includes the previous one; in other words, they are cumulative. The schema of the proposed filters and the day characterization are shown in Figure    Filter 0: baseline. All the database available was used for the uncertainty map calculation and PLF for each day of study. It should be highlighted that only days with energy demand requirements are included in this Filter 0. For example, if the HVAC system is turned off on Sundays, Sundays will not be part of the database. Filter 1: filtered by type of energy demand. The database was divided in heating and cooling days. For example, when the day of study is a heating day, its uncertainty map is generated using all the heating days from the database. Filter 2: filtered by the building's use. It is usual that depending on the weekday, the building has a different use and HVAC schedule. For example, it is common that in an office building, working days and Saturdays have their own schedule. Therefore, this filter splits the database depending on the use of the building. In this research, working days and Saturdays were considered, but for other cases, this will depend on the specific use and schedule of the building of study. Filter 3: filtered by similar energy demand. Days with a similar energy demand to the forecast energy demand of the day of study were selected for the creation of the uncertainty map. The energy demand was used to categorize the days instead other indicators as weather data since it reflects the energy effect that the combination of climatic parameters (outdoor temperature, humidity, solar radiation, etc.) generates in the building. Three different ways of identifying the similarity between the forecast energy demand of the day of study and the energy demand from the days of the data base were established: 1.
Quantitative similarity: The days are similar in terms of the amount of energy demanded. This was analyzed using two different criteria: (1) by percentage with jumps of 5% with respect to energy demand (from 5% to 25%) and (2) by the amount of energy with jumps of 10 kWh (from 10 to 50 kWh). We selected 10 kWh because it matched with 10% of the mean daily energy demand of the building used in this study. For other cases with other mean daily energy demand, the jumps in kWh will be adjusted for the specific case.

2.
Qualitative similarity: The days are similar in terms of the shape of the required energy demand curve. Two indexes were used for this characterization: mean absolute error (MAE) (Equation (1)), which measures the average magnitude of the error in the units of the variable of interest, and the coefficient of determination (R 2 ) (Equation (2)), which allows measuring the linear relationship of the two patterns [59]. A maximum limit for the MAE of 5 kWh and a minimum limit for R 2 of 75% were established. 3.
Combination of the quantitative and qualitative criteria: The days selected as training days fulfill at once the quantitative and qualitative criteria defined in each case.

Second Step: PLF Calculation for Each Criterion
The second step is the PLF calculation for each day of the available database, which is performed for each one of the characterization criteria defined in the previous step. The PLF process follows the procedure explained in Section 2.1. For each day and criterion, three parameters are obtained: on the one hand, two indicators for the prediction interval assessment: the prediction interval coverage probability (PICP) and the mean prediction interval width (MPIW); and on the other hand, the number of training days with which the uncertainty map was generated, in other words the number of days that meet the criteria established in the previous step.
The PICP value measures the reliability of the predictions and shows the percentage of the real values that will are covered by the upper and lower bounds. The larger the PICP, the more likely that the real values are within the prediction interval. It can be defined as: in which H is the number of samples and C i is a Boolean variable defined as follows: where L i and U i are the lower and upper PI bounds of target y i , respectively. The PICP ranges between 0 and 100%. The prediction interval is considered valid if the PICP value is greater than the prediction interval nominal confidence (PI NC = 100(1 − α)%), where α represents the probability of error.
A complementary metric is used to assess the prediction interval widths, the mean prediction interval width (MPIW), and it is defined as:

Third
Step: Definition of the Minimum Training Days The third step defines what are the minimum training days required for each criterion since the ability of the PLF and the uncertainty map to foresee the error in the energy demand due to the weather forecast directly depends on the amount of training data used in the procedure. We analyzed the minimum number of training days required for each criterion to obtain improved results with respect to the results from the criterion of the previous filter.
The requirement that was established to consider that a criterion works better than the previous filter is that the average PICP of the days that meet the condition that have equal or more than x training days is greater than the average PICP of those same days for the criterion of the previous filter. The analysis was carried out for all available number of training days. At the moment in which, for a given minimum number of training days, the mean PICP of the criterion studied is higher than the mean PICP of the criterion of the previous filter, this criterion is considered to be better. It may happen that regardless of the number of days of training, the result of the average PICP does not exceed in any case that of the previous filter. In this case, it is considered that the criterion studied does not improve the previous filter, and it is discarded.

Fourth Step: Ordering of the Characterization Criteria
After the analysis of the minimum training days, the criteria that do not generate improvements with respect to the criterion of the previous filter are discarded. Afterward, the criteria that do produce improvements are ordered according to the average PICP value. It is important to highlight that this PICP value is calculated using the days that meet the specific criterion. In this way, a hierarchy of criteria can be generated according to their ability to improve the PLF of the previous filter. The result is a list of criteria application ordered based on the ability to improve the PLF results.
Once the hierarchical list of criteria application is obtained based on the available data set, the process to use this methodology is applied as follows. First, the day of study is characterized by defining the type of energy demand for Filter 1 (heating or cooling) and the type of use of the building for Filter 2 (working day or Saturday). To define Filter 3, based on the similarity in the energy demand, the weather forecast is introduced in the BEM, and the required energy demand of the building is obtained. This energy demand is then compared with the forecast energy demand of the days from the database. This comparison allows counting how many days from the database meet the different criteria with respect the day of study; for example, how many days in the database have heating demand, are working days, and have and energy demand that is equal to or greater than 50 kWh with respect the forecast energy demand of the day of study. The last step is to apply the list of criteria application through a cascading process with conditional structure. The first criterion is selected, and it is analyzed if the minimum training days required are met. If the minimum training days required are fulfilled, this criterion is applied; if this is not fulfilled, the second criterion is selected to be analyzed, and so on, until one criterion meets the minimum training days. If the minimum training days requirement of any of the criteria for this Filter 3 is not met, the previous level of filtering (Filter 2) is applied. This procedure is explained below using a case study.

Description of the Case Study
In this section, the case study used to apply the proposed methodology is presented. The administrative building of the Architecture School at the Universidad de Navarra in Pamplona (Spain) was chosen to be the case study. This building, used for administration purposes and by postgraduate students of the School of Architecture, was built in 1974. It is a 760 square meter single-story building whose layout consists of a succession of staff offices, an administration area, an open work space, and classrooms. The building has a concrete structure. The walls are built of red brick fabric, and the windows have aluminum frames and air chambers. The building has a heating and cooling system. Figure 5 shows the building's outdoor photograph, the weather station located on the building's roof, and the simulation model. The building has an office schedule. On Saturdays, the building is used only in the mornings, and it is not used on Sundays. The building energy model employed in the case study was developed using the EnergyPlus engine. In the load forecasting field, when using BEMs, it is important to take into account the three main sources of uncertainty: BEM accuracy, building use, and external conditions. This case study used a calibrated BEM, obtained using a calibration methodology explained in the authors' previous papers [59,[61][62][63][64]. Using a calibrated model allows minimizing the uncertainty due to BEM accuracy. Regarding the building's use and its internal gains, no uncertainty was considered since the BEM used indoor temperatures measured in each thermal zone to take into account the indoor conditions, as explained in Section 2.1.
Regarding the external conditions, they were introduced into the model through the weather files, and both the observed and the forecast weather data were required to generated these files. The observed weather data were obtained from the weather station installed on the building, which provides measurements with hourly intervals for nine climate parameters: temperature, humidity, direct and diffuse irradiation, atmospheric pressure, rainfall, wind speed, and wind direction. Figure 5 shows the weather station location. Meteoblue [65] is the commercial service that is used to provide the forecast weather information. Meteoblue uses a multimodel/machine learning approach to calculate the forecast weather data using both its own meteorological models (nonhydrostatic meso-scale modeling) and third-party models for the simulations. More information about Meteoblue's forecast weather data process is available on its web page [65]. In this study, the weather forecast data were obtained at 09:00 on each day.
The period in which all of the required data were available was from 13 June 2019 to 31 January 2020. The cooling system was connected from 13 June 2019 to 18 October 2019 and the heating system from 19 October 2019 to 31 January 2020.
In the following sections, the methodology is illustrated showing how the four steps explained in Section 2.2 were applied in this case study. Then, we evaluate the ability of the proposed methodology to get optimized PLF results by characterizing the training days applying the methodology to the last 15 working days of the data set available (validation period).

Results
In this section, the results of applying the proposed methodology to get optimized PLF results by characterizing the training days are presented. Due to the paper's extent, the results of the case study were focused on days when the heating system was turned on and were classified as working days. The four steps explained in Section 2 are presented using the data from the case study, and the methodology was validated showing the PLF results with the four levels of filtering using the last 15 working days of January 2020 (validation period).

First
Step: Definition of the characterization criteria First, the characterization criteria were established. The following Table 1 shows the four characterization filters of the training days for the calculation of the PLFs. Within the Filter 3, similar energy demand, it shows as well the three ways of identifying the similarity between the forecast energy demand of the day of study and the energy demand from the days of the data set.

Second
Step: PLF calculation for each criterion For each day of the available database, the PLF calculation was performed for each characterization criteria, from Filter 0 to Filter 3. For each case, we obtained the PICP and MPIW values and the number of training days, that is the number of days that met the criteria and with which the PLF map was generated. The following graph Figure 6 shows the results for two sample days (21 October 2019 and 21 November 2019). The graph shows the PICP, MPIW, and training days used in the PLF for all criteria. In the case of 21 November 2019, all criteria had data to generate the uncertainty map; in other words, there was at least one day in the database that met the requirements of each criterion. However, in the example of 21 October 2019, many criteria did not have training days available that met the requirement in each case, especially for qualitative similarities. The graphs also show how the characterization of the days improved the results gradually. For example, on 21 October 2019, the PICP value for All (Filter 0) started at 41% (below the confidence level) and it grows to 70% for HTG (Filter 1), to 75% for HTG-Work (Filter 2) and from 80% to 90% for the criteria from Filter 3, which is above the confidence level (80%).
The following Table 2 shows the results for the 66 heating working days of the data set. The first column indicates the filter to which the criterion belongs. The second indicates the selection criteria for training days. The third column shows the number of days with respect to the 66 days in which there was training that met the criteria. For example, in the case of HTG-Work±20%+MAE+R2, only 38 days of the 66 had at least one day of training. The fourth column "training days" shows the days (or ranges of days) with which the uncertainty maps were constructed for each criterion. The fifth column shows the mean PICP for the days in which the uncertainty map was available, and the sixth shows the mean MPIW for those days.
Regarding the results shown in this table, in the first three filters, a clear improvement in the mean PICP was appreciated as the filter gets smaller, going from a mean PICP of 55.9% for the All criterion, where the training days included all the days from the database, at a mean PICP of 83% when only the HTG and working days were used to generate the maps (HTG-Work). In Filter 3, when similar energy demand filters were applied in the training days' selection criteria, not every day from the database was a possible training day that met these criteria; therefore, most filters had less than 66 days. On the other hand, the range of training days for each criterion was very wide, for example from one to 36 days in the case of HTG-Work±5%. That is, within the same criteria, there were days that had very few training days, and other days had many training days since it depended on the forecast load provided by the BEM and the available data from the past. It should be noted that the table shows the mean PICP, but the PICP results for each day were very different depending on the number of training days available. For example, in the case of HTG-Work±5%, although the mean PICP was 82.5%, there were some days that achieved PICP values of 100% when there were enough available training days. Therefore, the next step was to analyze from how many days of training the result of applying Filter 3 improved the indices of the previous filter (Filter 2, HTG-Work), that is define the minimum days of training for each criterion to be effective.

Third
Step: Definition of the minimum training days The third step analyzes the minimum training days for each criterion. The following Table 3 shows for each criterion from Filter 3 (filtered by similar energy demand) the minimum training days required by each criterion to improve the PICP value from the previous filter (Filter 2, HTG-Work) and the values of the mean PICP for both Filters 2 and 3. The table shows that six criteria from Filter 3 did not achieve an improvement in the mean PICP values. In Appendix A, we present Tables A1-A3 showing how the minimum training days were selected.

Fourth
Step: Ordering of the characterization criteria The aim of this step is to define the order of application of the characterization criteria. Once the minimum training days necessary for each criterion to improve the results of the previous filter have been established, the criteria that do not imply an improvement in these results are discarded. In this case study, six criteria of the 21 initially proposed did not improve the PICP results of Filter 2 and, consequently, were discarded. These six criteria are marked in the previous Table 3 with a hyphen. The 15 criteria that succeeded in improving the PICP results of the previous Filter 2 were ordered according to the mean PICP obtained. Table 4 shows these criteria ordered by the PICP (from 94.4% of the first criteria to 76.2% to the fifteenth) and the minimum required training days to be applied.     HTG_Work_±20+MAE+R2 76.2% 7 The table above is the hierarchical list of criteria application ordered by their ability to improve the PLF results. The following section presents how this was used.

Validation of the Methodology
This section presents how the methodology was applied, showing the results obtained for the last 15 heating working days of the available data set. These were Days 10, 14-17, 20-24, and 27-31 of January, 2020. The following Table 5 shows an example, using the last day of the test days (31 January 2020), of how the criterion from Filter 3 (filtered by similar energy demand) was selected. The table shows the order of application of the criteria as calculated in the previous section and the minimum training days necessary to obtain an improvement with respect to the criterion of the previous filter (Filter 2, HTG-Work). The table also indicates the number of training days for this specific day of study (in this example, 31 January 2020) and if the minimum days were met in each criterion. The table is given in descending order, and the first criterion in which the minimum training days were met is the one to be selected, in this case HTG-Work±5%. In the event that the minimum days of training were not met for any criterion, the criterion of the previous filter (Filter 2, HTG-Work) would be applied.
The results for the testing days (the last 15 working days of January 2020) are shown in the following Table 6. It compares the results of the PICP and MPIW values for each one of the days and for each one of the filters (from Filter 0 to Filter 3), as well as the mean of all the days. It is observed how the PICP increases as the characterization of the day grows. The MPIW also increases, but in a measured way. Regarding the jump from Filter 2 to Filter 3, Filter 2 being the filter employed in the previous paper from the authors, it should be noted that the mean PICP of the days studied grew from 78.6% to 82.2%, that is by using the energy demand similarity filter, the mean PICP exceeded the confidence level. Analyzing the results for each day, it was observed that of the 15 days, fourteen days had a PICP equal to or greater than Filter 2, HTG-Work PICP, and of them, eight increased the PICP results. Only one day (15 January 2020) provided a lower PICP when applying Filter 3 (83.3%) compared to Filter 2 (87.5%), but it should be noted that the result was still above the confidence level (80%). In order to show graphically how the prediction of the building's energy demand improved as the characterization of the day was adjusted using similar training days to the day of study, a graph with the results for a day of study is shown in Figure 7. Specifically, we show the last day of the test period, 31 January 2020. The graph shows the real energy demand (red line), the forecast energy demand provided by the model using the weather forecast (blue line), and the probabilistic load forecast for each filter. It was observed how as the characterization of the days was adjusted and the filter more accurate (from Filter 0 in orange shadow to Filter 3 in grey shadow), the PLF adapted better to the real energy demand curve, since the shadow of the prediction intervals better covered the real energy demand. Applying the criterion of Filter 3, where the similarity of the energy demand was applied, means significantly improving the PLF. For the day presented in the graph, the PICP values increased from 12.5% for Filter 0 (All) to 87.5% for Filter 3 (HTG-Work±5%) (see the results in Table 6). The better adjustment of the PLF to the real energy demand can be seen graphically especially between Hours 10 and 12, when the real energy demand only fell within the PLF range obtained with the criterion of Filter 3.

Conclusions
In the current energy context, load forecasting is necessary to optimize the use of energy in buildings. Forecasting future energy demand necessarily requires dealing with the effect of uncertainties. One of the most important uncertainties and that affect the load forecasting results more is weather forecast. Traditional load forecasts based on point predictions are not able to reflect properly its effect on the energy forecasts. However, probabilistic load forecast allows taking into account the uncertainties in the energy load forecast. Previous research from the authors presented a methodology that provides the probabilistic load forecast while accounting for the inherent uncertainty in forecast weather data using a white-box model (BEM). The result is an hourly map of the uncertainty of the load forecast, which allows the point load forecasting provided by the BEM to be converted into a probabilistic load forecast. The methodology fixes the other uncertainties such as building model accuracy and user behavior in order to focus the research on the weather forecast uncertainty. This study aimed to go one step further and get optimized PLF results by providing a methodology to select the best amount of data for the training days that generates the map of uncertainty. The methodology was based on each day of study being individualized and it being provided with its own optimal training days in order to compare the future building's energy behavior to similar days in the past.
A case study in a real office building in Pamplona, Spain, was presented. Eight months of data from the building's monitoring, weather station, and weather forecast were available for the study with the cooling system connected from June 2019 to October 2019 and the heating system from October 2019 to January 2020. The methodology establishes four levels of filtering: Filter 0, baseline with all the data available; Filter 1, filtered by type of energy demand; Filter 2, filtered by the type of use of the building; and Filter 3, filtered by similar energy demand. The similarity of the energy demand from Filter 3 is identified by three different ways: quantitative similarity, qualitative similarity, and a combination of both.
As an example of applying the methodology in a real case scenario, the paper showed how the methodology was developed, focusing on working days that have the heating system, and it is validated by applying it to the last 15 working days of the data available (last 15 working days of January 2020). The PICP and MPIW indicators were used to compared the results for each filter. The results showed how the PICP value increased as more accurate filters when selecting the training days were applied. Analyzing Filter 3 application and comparing it to the previous Filter 2, which was the filter employed in the previous paper from the authors, it was seen how the mean PICP grew from 78.6% to 82.2%, which exceeded the confidence level when the energy demand similarity filter was applied, and PICP values up to 95.8% were obtained for some testing days. The use of the energy demand characterization criteria of Filter 3 provided that 14 days of the 15 days of study achieved a PICP value equal to or greater than the previous filter criterion, and eight of them increased this value.
The results showed how important the selection of the training days for the generation of the uncertainty map and the PLF calculation is. The paper showed as well how individualizing the training days and selecting them using the days characterization proposed by this methodology considerably improved the PICP results, which means that this methodology reached better probabilistic load forecasts. In conclusion, the application of this methodology allows obtaining an optimized prediction of the near-future energy demand of the building taking into account the weather forecast uncertainty, one of the most important sources of uncertainty in the building load prediction field. It is a useful tool for any application in which the future load forecast is required.
One important characteristic of this methodology is that it is constantly fed back the newly measured and gathered data from the building, weather station, and weather forecast provider. As the available data increase, the methodology gains robustness since the procedure is based on the energy characterization of the training days that configured the uncertainty map and the PLF, and the more data, the more possible training days for each criterion there are. The methodology can be easily automatized to directly incorporate the new data to redo the PLF calculations for each criterion. Funding: The researches G.R.R. and C.F.B. were financed by the Government of Navarra, from "BIM to BEM (B&B)" with Agreement Number 0011-1365-2020-000227.

Acknowledgments:
We would like to thank the School of Architecture of the Universidad de Navarra for providing the data of the school located in Pamplona (Spain).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A
The following Tables A1-A3 show how the minimum training days of the case study were defined. The tables present for each criterion from Filter 3 (filtered by similar energy demand) the comparison between the mean PICP applying the energy demand similarity filter studied in each case and the mean PICP of the criterion of the previous filter (HTG-Work), for all minimum training days. It should be noted that, in order to allow the comparison between criteria, the mean PICP in both cases is calculated for the days that have the minimum number of training days analyzed in each case available. The table allows quickly seeing for which minimum number of training days the results are improved when applying the energy demand similarity criterion from Filter 3. These minimum training days are highlighted in the table for each criterion with a box. It is remarkable that for some criteria, it is not possible to improve the HTG-Work criteria.