Electrical Consumption Profile Clusterization: Spanish Castilla y León Regional Health Services Building Stock as a Case Study

Health Services building stock is usually the top energy consumer in the Administrative sector, by a considerable margin. Therefore, energy consumption supervision, prediction, and improvement should be carried out for this group as a priority. Most prior studies in this field have characterized the energy consumption of buildings based on complex simulations, which tend to be limited by modeling restrictions and assumptions. In this paper, an improved method for the clusterization of buildings based on their electrical energy consumption is proposed and, then, reference profiles are determined by examining the variation of energy consumption over the typical yearly consumption period. The temporal variation has been analyzed by evaluating the evolution over time of the area consumption index through data mining and statistical clusterization techniques. The proposed methodology has been applied to the building stock of the Health Services in the Castilla y León region in Spain, based on three years of historical monthly electrical energy consumption data for over 250 buildings. This building stock consists of hospitals, health centers (with and without emergency services) and a miscellaneous set of administrative and residential buildings. Results reveal five distinct electrical consumption profiles that have been associated with five reference buildings, permitting a significant improvement in demand estimation compared to merely using the classical energy consumption indicators.


Introduction
In order to comply with European Union (EU) perspectives on energy generation and consumption by EU members for the 2030 and 2050 horizons, a significant increase in the penetration of renewable energy sources (RES) and a reduction of energy needs, through energy savings and efficiency policies, are mandatory. Both approaches are especially relevant in the transport and buildings sectors, and within the latter, Public Administration building stock is of special relevance.
In addition, reliable energy indexes must be developed in order to supervise the evolution of the consumed energy, which is ultimately associated with greenhouse gas emissions. This detailed supervision is intended to assist energy planners in achieving local, national, and European targets for energy savings and efficiency.
Typically, the evaluation of a building's energy behavior is carried out via computer simulation, and energy labels are assigned based on the simulation results and their comparison with reference buildings. This method has been shown to be highly accurate in some specific cases [1], but considerable dispersion appears when analyzing large buildings, such as hospitals, which are usually characterized by complex air conditioning equipment and high electric energy consumption. Moreover, most energy labeling methods do not currently consider the electric energy consumption of the building, since its thermal demand (heating and/or cooling) is usually noticeably larger (both in magnitude and in cost) than its electric demand, and since electric needs are strongly associated with the level of occupation and activity. However, as nearly Zero Energy Buildings (nZEBs) and Positive Energy Buildings (PEBs) are deployed, the energy needs associated with the thermal behavior of the building are steadily decreasing, and it is expected that building energy needs will soon be mainly associated with electric energy [2].
However, it may be difficult to carry out representative simulations of the electric energy needs of a building, due to their relationship with the fluctuating level of activity. The recently developed "big data" and "data mining" techniques may help to process real measurements from a large number of buildings' metering systems. Real power demand profiles can therefore be created through suitable data treatment, instead of using simplified power demand profiles, which significantly improves the final modeling of the facilities [3].
In this study, the authors propose a novel method to identify and classify buildings according to their electric energy demand profile over a calendar year by applying clustering techniques. The aim of this work is to identify general classes of buildings according to their electric energy consumption and to improve not only the demand estimations, but also the aggregation of electric energy profiles of different buildings, which is particularly useful for centralized energy purchasing, energy consumption monitoring by activity sector and fast identification of abnormal consumption behaviors. Other applications of the proposed method are related to the optimization of the definition of Power Purchase Agreements (PPAs) in the public sector or the optimal introduction of the Electric Vehicle (EV), among many others [4][5][6][7]. As a case study, this methodology has been applied to the health system building stock of the Castilla y León region in Spain, which consists of approximately 250 buildings of different sizes and final uses.
This paper is divided into four sections. The first section includes the introduction to the topic, the reference framework in the EU zone and a description of the building stock of the health system of the Castilla y León region, which has been used as a case study. The second section, entitled "Materials and Methods", describes the proposed methodology and the origin of the data used. The scope and limitations of the work are also presented in this section. The next section presents and discusses the obtained results, while the final section collects the authors' conclusions and proposals for future research in the field.

Innovations Introduced by the 2018/844/EU Directive
On 30 May 2018, EU Directive 2018/844 [8] was published, updating Directive 2010/31/EU [9] on the energy performance of buildings and Directive 2012/27/EU [10] on energy efficiency. In this document, the EU declares its commitment to developing a sustainable, competitive, secure, and decarbonized energy system, while at the same time it recalls the commitments of the Energy Union and the Energy and Climate Policy Framework for 2030. The EU Commission finds the need to provide Member States and investors a clear vision to guide their policies and investment decisions, including national milestones and actions for energy efficiency to be accomplished over the short-term (2030), mid-term (2040), and long-term (2050). Then, it is required that Member States specify the expected output of their long-term renovation strategies and monitor developments by establishing domestic progress indicators, which are subject to national conditions and developments.
In order to meet the proposed goals in the energy field, the EU concludes that Member States and investors need to apply new measures, and it focuses on the need to decarbonize the building stock, responsible for approximately 36% of all CO2 emissions in the Union, as soon as possible. This conclusion is in line with those of the 2015 Paris Agreement on climate change. To achieve a highly energy efficient and decarbonized building stock and to ensure that the long-term renovation strategies result in the necessary progress towards the transformation of existing buildings into nZEBs, or even PEBs, clear guidelines should be provided and, more importantly, measurable, targeted actions should be established [11,12].
Each long-term renovation strategy should be in line with applicable planning and should encompass, among other conditions: (i) an overview of the national building stock; (ii) policies and actions to target all public buildings; and (iii) an evidence-based estimate of expected energy savings and wider benefits, establishing measurable progress indicators. Moreover, databases for energy performance certificates should permit data collection on the measured or calculated energy consumption of the buildings covered, including at least the public buildings stock.
It should be noted that EU Directive 2018/844 points out the real need to determine the energy performance of a building based on its calculated or actual energy use and it shall reflect typical energy use, not only for space heating, cooling or domestic hot water, but also for lighting and other electrical technical building systems [8].

Power Consumption of the Health System in the Castilla y León Region
The Autonomous Community of Castilla y León in Spain is the sixth largest region of the country, having almost 2.5 million inhabitants in 2018 and with health services that are divided into 39 different areas including primary care, specialized health, and administrative sections. It serves the medical needs of over two million patients with 7.81 health professionals per every thousand potential patients [13].
The building stock of the health system in Castilla y León consists of different sets of buildings, which are usually classified into hospitals and health centers. The latter may also be organized into health centers with and without emergency services. Clinics, residences and administrative buildings and warehouses are the minority and they are usually classified as "others". Table 1 shows the inventory description of each category, focusing on their electrical energy needs, while Figure 1 shows their distribution. Figure 1a helps introduce the reader to the energy context, showing the geographical distribution of the average annual Area Consumption Index (ACI), while the pie chart shown in Figure 1b depicts the electrical energy consumption distribution for the described administrative categories.
It can be observed that the majority of the annual electricity consumption comes from the hospitals (almost 81% of the total). The other categories represent approximately 25 GWh·year⁻¹ of annual electric consumption. Moreover, the variation in total electricity consumption, evaluated through the standard deviation, is relatively small on an annual basis over the evaluated period, from January 2015 to December 2017. Finally, it should be noted that the building classification provided is valid for administrative purposes, but is ill-suited to an energy analysis. Thus, one of the main objectives of this paper is to present the results of a newly proposed method to identify reference buildings from an energy perspective, which may differ from a purely administrative classification.

Building Sustainability, Energy Indexes, and Annual Electric Energy Profiles
Several authors claim that over recent times, world energy consumption has increased disproportionately in relation to population growth, mainly as a result of economic development and a lack of social awareness [14]. Thus, many studies have attempted to assess the sustainability of the energy consumption at a global level, from a demand side perspective, concluding that the building industry requires more attention and more effective actions than other sectors due to its high energy consumption [15,16]. So, a growing number of countries have introduced energy-efficient strategies in their public-use buildings. Currently, energy consumption in public buildings is 40% greater than that of residential buildings and 30% of the non-residential buildings in Europe are public buildings [14]. Therefore, evaluation of building energy efficiency and energy conservation is extremely necessary [17].
Many authors have considered the intense energy consumption problem of the building sector by considering thermal consumption [17][18][19] and therefore, different solutions highlighting bioclimatic architecture strategies [20] have been proposed with great success. Bioclimatic architectural systems have demonstrated that they can effectively contribute to the reduction of energy consumption while considering potential construction solutions at both passive and active levels. These analyses have been carried out not only for residential buildings, but also for industrial ones where energy savings through the incorporation of automation techniques are difficult to afford and when there is no single directive or standardized method of estimating and validating the energy consumption process in such buildings [18].
Although thermal comfort needs in Northern European countries have a low impact on power consumption, as they are usually satisfied by gas boilers, warmer countries face high electricity demands in public buildings due to air conditioning needs [19]. Furthermore, these systems are quite sensitive to slight outdoor temperature changes, and climate change has forced engineers to find and design sustainable low-energy systems, especially for public buildings [19]. Identifying the building parameters that significantly impact energy performance is an important step for enabling the reduction of the heating and cooling energy loads [21]. Moreover, as the application of energy savings and energy efficiency directives increases, especially in European countries, thermal demand is becoming electric demand due to the intensive use of electric heat pumps. So, the analysis of power consumption in buildings is becoming much more relevant today than in the past, when it was several orders of magnitude lower than thermal demand. In addition, monitoring can provide advanced visualization and data analysis tools to achieve energy savings and peak power optimization [22,23].

Several authors have conducted different studies to obtain reference indexes for energy consumption of buildings. At this point, we should highlight the work of Rodríguez-González, A.B. et al. [24] who attempted to propose a standardized energy efficiency index for buildings relating the energy consumption within a building to reference consumption. These authors focused on the need to establish adequate standardized levels of performance and separation of building types to avoid making unfair comparisons between buildings. Moreover, this type of index may be used to detect abnormal behaviors at selected time scales [25]. These indexes may be developed not only for health care facilities, but also for educational, office and residential buildings [26,27].
It is advisable to consider the distinction made in the VDI 3807 standard between building demand and characteristic consumption [28]. While the demand value is calculated according to the acknowledged rules of technology, using assumptions such as boundary conditions, standardized types of use and scenarios, the characteristic consumption is determined based on measured and corrected consumption values. In this work, the methodology was applied to true consumption values from monthly measurements made over 3 years. Thus, the results are expressed in terms of energy consumption instead of energy demand, although the results may be applied to estimate the electrical energy demands of future buildings.
This sort of consumption analysis may be used during building operation, e.g., as an initial value for the assessment of the energy consumption of a particular building, to compare buildings of the same type and use, or for periodic assessments of the actual consumption and user behavior, as a tool for management and control.
Furthermore, it should be considered that the isolated analysis of the energy indexes offers only a partial perspective of the energy behavior of a facility or building. So, this sort of analysis must be conducted after defining a multi-level structure in which building energy indexes may be aggregated and disaggregated, according to the purposes of the analysis. This aggregation capability can further explain changes over time. At the same time, the aggregated structure of the analysis helps to separate energy trends based on their source: (i) activity level, (ii) structure, or (iii) energy intensity.
When evaluating the energy consumption of a building, measurements should be independent of building size. The Area Consumption Index (ACI) and Occupation Consumption Index (OCI) are the most widely used. As for the ACI calculation, which is usually a more reliable indicator than OCI when the electric energy consumption is analyzed, the final use of the energy will define the surface value to be considered: the gross surface (BGF), the net surface (NF), the occupied surface (OF) or the heated/cooled surface (HF). The occupied surface is used in most applications, but this value is rarely known or available, and therefore, net surface value is used instead [28]. In this case, the reference area has been defined as the sum of all habitable gross floor areas of the building. In most cases, the habitable area is similar to the heatable area, which, according to VDI 3807 and DIN 277 standards, is calculated by subtracting major non-heatable gross floor areas from the building's gross floor area. The reference area of buildings in which only the entire storeys are heated, is identical to the storey area, which in general, may be taken from the building proposal. In the absence of these data, according to the DIN 277 standard [29], the assignable area for main uses (NF) was used. Also in the absence of this value, the building's total gross floor area was used (in accordance with the German Energy Saving Ordinance, EnEV [30]). The VDI 3807 Part 2 standard estimates the NF/BGF ratio at 85% for hospitals and health centers.
Thus, the reference index in this study has been calculated and defined as the Area Consumption Index, whose expression may be seen in Equation (1). This index has been determined on a monthly or annual basis.
$$\mathrm{ACI} = \frac{E}{A} \qquad (1)$$
In Equation (1), E is the corrected energy during the time period and A the reference area. In contrast to heating or cooling energy consumption, corrections for outdoor temperature effects are not usually needed for electrical energy consumption. Nevertheless, when the measured period is not a full calendar year (365 days), corrections must be made accordingly [31].
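As a minimal sketch of Equation (1), the snippet below computes the ACI as energy divided by reference area and scales a partial measurement period to a 365-day year. The linear day-count scaling, the function name `area_consumption_index` and the sample figures are assumptions made for illustration; the text only states that a correction is required [31], not how it is performed.

```python
def area_consumption_index(energy_kwh, area_m2, days_measured=365):
    """Return the ACI (energy per unit of reference area) for a 365-day year.

    The linear day-count scaling is an illustrative assumption, not the
    correction procedure of the cited standard.
    """
    if area_m2 <= 0:
        raise ValueError("reference area must be positive")
    if days_measured <= 0:
        raise ValueError("days_measured must be positive")
    corrected_energy = energy_kwh * 365.0 / days_measured
    return corrected_energy / area_m2


# Hypothetical building: 120,000 kWh over a full year and 2,000 m2.
full_year = area_consumption_index(120_000, 2_000)
# Same building with only 292 days of valid measurements (96,000 kWh).
partial_year = area_consumption_index(96_000, 2_000, days_measured=292)
print(full_year, partial_year)
```

Both calls describe the same consumption rate, so the corrected annual index should coincide.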
The characteristic energy consumption value can be used for predicting the energy consumption of a large building inventory [24,32,33]. Moreover, these values can help to estimate the future electric energy demand of certain areas based on the known building areas and building use types, which is especially useful for energy planning studies [26].
Finally, when considering characteristic energy consumption, it must be borne in mind that changes in building inventory, equipment, or occupation may occur, affecting the significance of the average value of the characteristic energy consumption. However, consumption variations over recurring periods, such as calendar years, tend to remain invariant or show only very slight differences. Furthermore, when comparing characteristic energy consumption values of buildings in other countries with the values in this work, the boundary conditions prevailing in those countries should also be considered.
As a novel approach in this work, the reference buildings obtained will be defined not only by the characteristic value, but also by the profile of the energy consumption throughout a calendar year, which is useful for increasing the accuracy of short-term consumption estimates [34,35]. Seasonal variations may then also be observed and, in some cases, this may help to detect abnormal energy behaviors from the point of view of consumption dynamics [36].

Materials and Methods
The methodology that has been proposed to obtain the optimal reference electrical energy profiles may be summarized in the following algorithmic steps (see [...] 10% have been chosen as stop criteria.
(h) Once the optimum number of clusters has been established, the reference values for each cluster are calculated as the clusters' centroids for each variable.
(i) The accuracy of the obtained reference electrical consumption profiles is evaluated according to several statistical estimators.
(j) Classified buildings are geo-referenced and plotted on 2D maps.

Database Description
The "Regional Department of Energy of Castilla y León" or EREN (Ente Regional de la Energía de Castilla y León) promotes an innovative application called the OPTE (Power Tariff Optimization tool) which intends to homogenize public energy contracts (both for fuels and electric energy) by helping local energy managers to implement optimization tools. One of these tools, already deployed, collects the true power consumption of each Public Building registered in the platform. Thus, energy managers from SACYL (the Regional Health System) have registered each managed building, including hospitals, health centers, and administrative buildings through the facility's CUPS (Universal Code for the Power Supply Point).
Each building or installation in the OPTE is characterized by a unique and invariant identifier, called the IDOPTE. This identifier permits the connection with other databases where other information may be provided, such as cadastral data, address, building manager, etc.
By default, the OPTE organizes the buildings database according to an administrative criterion for accounting purposes and, although some pre-analysis tools are being implemented in the platform, no data analysis is provided, other than descriptive reports.
Hourly and monthly average power demand (provided by the Distribution System Operator) of each building since 2015 is available on the platform for downloading. Other installation data, such as type of energy contract, costs, pricing periods, etc. are also provided with the power measurements. For this study, monthly data from January 2015 until December 2017 have been used.

Data Filtering-Acceptance and Exclusion Rules
Initially, 354 buildings and facilities were available in the database, but a filtering process was applied in order to discard errors and outliers which could affect the results. Thus, the following exclusion rules have been applied:

1. Those building references having no available data on surface were discarded.
2. Those building references having gaps or errors in the power measurements were discarded.
3. Those building references whose power measurements data breached normality of the data set were discarded.
4. Those building references whose power measurements data breached homoscedasticity of the data set were discarded.
Thus, of the 354 initial samples (building references), clustering techniques were applied to only 259, which implies a data rejection rate of 26.84%, considered acceptable.
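The first two exclusion rules can be sketched as a simple record filter, shown below together with the arithmetic behind the rejection rate. The record layout (`area_m2`, `monthly_kwh`), the requirement of 36 monthly readings (three years) and the treatment of negative readings as errors are assumptions for illustration; rules 3 and 4 (normality and homoscedasticity) need statistical tests that the text does not specify and are not covered here.

```python
def passes_basic_rules(record, expected_months=36):
    """Check rule 1 (known surface) and rule 2 (no gaps or obvious errors)."""
    area = record.get("area_m2")
    readings = record.get("monthly_kwh", [])
    if area is None or area <= 0:
        return False  # rule 1: no usable surface data
    if len(readings) != expected_months:
        return False  # rule 2: missing months
    # Rule 2: a missing or negative reading is treated as an error (assumption).
    return all(r is not None and r >= 0 for r in readings)


# Hypothetical records, not taken from the OPTE database.
buildings = [
    {"id": "A", "area_m2": 1500.0, "monthly_kwh": [100.0] * 36},
    {"id": "B", "area_m2": None, "monthly_kwh": [100.0] * 36},
    {"id": "C", "area_m2": 800.0, "monthly_kwh": [50.0] * 35 + [None]},
]
kept = [b["id"] for b in buildings if passes_basic_rules(b)]
print(kept)

# Rejection rate reported in the text: 95 of 354 samples discarded.
rejection_rate = 100.0 * (354 - 259) / 354
print(round(rejection_rate, 2))
```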

Yearly Energy Consumption Profile Definition
Before applying the clustering methods, the average monthly ACI values were calculated for each building in the data set, obtaining 12 variables per sample. Then, the data were normalized by dividing each average monthly ACI value by the maximum value of the sample, so that the resulting values lie in the interval from 0 to 1, both included.
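The averaging and normalization step can be illustrated as follows. This is a sketch under stated assumptions: the function name `relative_monthly_profile`, the input ordering (whole years, January to December) and the sample readings are hypothetical.

```python
import numpy as np


def relative_monthly_profile(monthly_kwh, area_m2):
    """Average the monthly ACI over the available years and scale by the maximum.

    `monthly_kwh` is assumed to hold whole years ordered January to December.
    """
    aci = np.asarray(monthly_kwh, dtype=float).reshape(-1, 12) / area_m2
    mean_aci = aci.mean(axis=0)        # 12 average monthly ACI values
    return mean_aci / mean_aci.max()   # relative values; the maximum equals 1


# Hypothetical building: one base year repeated with +10% and -10% variations.
base_year = np.array([12, 11, 10, 8, 7, 9, 11, 10, 8, 9, 10, 12]) * 1000.0
readings = np.concatenate([base_year, 1.1 * base_year, 0.9 * base_year])
profile = relative_monthly_profile(readings, area_m2=2500.0)
print(np.round(profile, 3))
```

Because every value is divided by the sample maximum, buildings of very different sizes become comparable in shape, which is what the subsequent clustering relies on.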
To define the yearly energy consumption profile on a monthly basis, the relative monthly ACI values of each sample were fitted to a polynomial function by the least squares method. Polynomials of several degrees were evaluated first, with the R² estimator as the goodness-of-fit indicator. Figure 3 shows both the average value of the R² estimator and its relative increase as the degree of the polynomial increases. It should be noted that the higher the degree of the polynomial, the higher the accuracy, but also the greater the oscillations of the fitted function, possibly leading to incorrect conclusions. In Figure 3, it may be observed that the average R² value increases with the polynomial degree, but the relative increments diminish from the 4th degree onward. In addition, the higher the polynomial degree, the more variables are introduced in the analysis. Thus, a compromise must be made. Figure 4 shows the histograms of the R² values for 3rd degree (a) and 4th degree (b) polynomial fits. It can be observed that, although the average R² value with a 4th degree polynomial only improves by 6.65% with respect to a 3rd degree fit, 53% of the R² values with a 4th degree polynomial are higher than 0.75 and 84% are higher than 0.50, in contrast to only 48% and 62%, respectively, for the 3rd degree fit.
Thus, for analysis and clustering purposes, the monthly behavior of the electric energy consumption of each building sample has been defined as follows:
$$y = a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0 \qquad (2)$$
where x is an integer value in the interval [1,12] representing the month of the year; a_4, a_3, a_2, a_1 and a_0 are the adjustment coefficients of the polynomial function (which will be the clustering values) and y is the average relative monthly ACI value.
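A least-squares fit of this kind, together with the R² indicator used to compare polynomial degrees, could look like the sketch below. It relies on NumPy's `polyfit` and `polyval`; the function name `fit_profile`, the usual definition of R² as one minus the ratio of residual to total sum of squares, and the synthetic cosine-shaped profile are assumptions, not the authors' implementation or data.

```python
import numpy as np


def fit_profile(y, degree=4):
    """Fit a 12-month relative ACI profile by least squares.

    Returns the coefficients (highest degree first) and the R^2 of the fit.
    """
    y = np.asarray(y, dtype=float)
    x = np.arange(1, 13)                      # months 1..12
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return coeffs, 1.0 - ss_res / ss_tot


# Synthetic profile with a summer minimum (illustrative only).
months = np.arange(1, 13)
y = 0.8 + 0.2 * np.cos(2 * np.pi * (months - 1) / 12)
coeffs3, r2_3 = fit_profile(y, degree=3)
coeffs4, r2_4 = fit_profile(y, degree=4)
print(len(coeffs4), round(r2_3, 3), round(r2_4, 3))
```

The five coefficients returned for the 4th degree fit play the role of a_4 to a_0, that is, the five variables that describe each building in the clustering stage.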

Data Typification
In order to improve the performance of the clusterization algorithm, the clustering variables (the coefficients of the adjustment polynomial) have been typified (standardized). This typification has been carried out by subtracting the mean value and dividing the result by the standard deviation, as seen in Equation (3). Normality and homoscedasticity of the data are then verified after the transformation.
$$\tilde{a}_i = \frac{a_i - \bar{a}_i}{\sigma_{a_i}} \qquad (3)$$
In Equation (3), $\tilde{a}_i$ is the typified variable and $a_i$ is the i-th coefficient of the adjustment polynomial function (-), where i belongs to {0, 1, 2, 3, 4}; $\bar{a}_i$ and $\sigma_{a_i}$ denote the mean value and the standard deviation of $a_i$ over the data set.
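The typification described above amounts to a column-wise z-score. In the sketch below, each row is one building and each column one polynomial coefficient; the function name `typify`, the use of the sample standard deviation (`ddof=1`) and the random test matrix are assumptions, since the text does not say which estimator of the standard deviation was used.

```python
import numpy as np


def typify(coeff_matrix):
    """Subtract the column mean and divide by the column standard deviation."""
    coeff_matrix = np.asarray(coeff_matrix, dtype=float)
    mean = coeff_matrix.mean(axis=0)
    std = coeff_matrix.std(axis=0, ddof=1)  # sample standard deviation (assumption)
    return (coeff_matrix - mean) / std


# Hypothetical coefficient matrix: 8 buildings x 5 coefficients.
rng = np.random.default_rng(1)
coefficients = rng.normal(loc=2.0, scale=3.0, size=(8, 5))
typified = typify(coefficients)
print(np.round(typified.mean(axis=0), 6))
```

After this step, all five coefficients share a common scale, so no single coefficient dominates the Euclidean distances used by the clustering algorithm.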

Clustering
The so-called "data clustering techniques" are intended to find clusters in a dataset in such a way that the data items in each cluster share some characteristics. These techniques are at the core of exploratory data mining and have been widely applied in statistics. Clustering analysis is not a single algorithm, but rather a family of algorithms with many different orientations that can be applied to find clusters in a dataset.
Some authors prefer to define clustering as a multi-target optimization problem involving a distance function, a density function and the number of defined clusters. Moreover, clustering analysis is not an automatic process, but rather an iterative one (interactive multi-target optimization) which implies a trial-and-error procedure [37,38].
Clustering may be hard (each member belongs to only one group) or soft (each member may belong to several groups simultaneously, with different degrees of membership). Furthermore, there are multiple clustering techniques and algorithms, which can be classified into four main categories:

• Connectivity models: based on the distance analysis of the connections. Hierarchy methods are included in this category.
• Centroid models: each group is represented by a vector of the mean values of the parameters (centroid). The most representative model in this category is the k-means [3].
• Distribution models: groups are modeled by statistical distributions, such as the normal multivariate distribution (Expectation-maximization algorithm).
• Density models: groups are defined as dense regions connected in the data space (DBSCAN or OPTICS algorithms).
The most appropriate algorithm for clustering depends on the problem characteristics and, most often, it must be selected experimentally using the researchers' past experience [5,38].
In this case, two types of clustering techniques have been used and compared: hierarchical and non-hierarchical methods, although only figures for the non-hierarchical methods are shown. The hierarchical clustering algorithm was applied only to estimate the number of clusters that may be significant in the dataset. Depending on the shape of the dendrogram, different hierarchical methods may be more appropriate. In this case, Ward's method seems to be the most adequate, as other authors have suggested [38].
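To illustrate how a hierarchical method can hint at the number of clusters, the sketch below performs a naive agglomerative clustering with Ward's criterion and returns the cost of each successive merge. It is a didactic, cubic-time implementation under the assumption of Euclidean distances, not the software used in the study; the function name `ward_merge_costs` is hypothetical.

```python
def ward_merge_costs(points):
    """Agglomerate points with Ward's criterion; return the cost of each merge.

    Each cluster is stored as (size, centroid). The cost of merging clusters
    A and B is |A||B| / (|A| + |B|) * ||c_A - c_B||^2, i.e., the increase in
    the within-cluster sum of squares caused by the merge.
    """
    clusters = [(1, tuple(float(v) for v in p)) for p in points]
    costs = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose merge is cheapest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ni, ci), (nj, cj) = clusters[i], clusters[j]
                d2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
                cost = ni * nj / (ni + nj) * d2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (ni, ci), (nj, cj) = clusters[i], clusters[j]
        centroid = tuple((ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((ni + nj, centroid))
        costs.append(cost)
    return costs
```

A sharp jump in the sequence of merge costs indicates that two well-separated groups are being joined, which is a common heuristic for reading a plausible number of clusters off a dendrogram.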
Because of its high performance and simplicity, the k-means algorithm (Lloyd's algorithm) is applied as the non-hierarchical clustering method [39]. This algorithm finds the k centroids of the k clusters and assigns each member to a cluster according to its distance to the centroids. The underlying optimization problem is NP-hard and, therefore, only approximations of the solution are feasible to compute. Since the algorithm only finds local optima, it must be executed repeatedly with random initial conditions. It should be noted that, as this algorithm optimizes the clusters' centroids, it can perform poorly when defining the borders between clusters [4].
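The procedure described above can be sketched as follows: Lloyd's algorithm is run from several random initializations and the run with the lowest within-cluster sum of squares is kept. This is a minimal pure-Python illustration; the function names, the number of restarts and the fixed seed are assumptions made for the example, not settings reported in the study.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from k randomly chosen initial centroids."""
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged to a (possibly local) optimum
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return inertia, labels, centroids


def kmeans(points, k, restarts=20, seed=0):
    """Keep the best of several random restarts (lowest inertia)."""
    rng = random.Random(seed)
    return min((lloyd(points, k, rng) for _ in range(restarts)),
               key=lambda run: run[0])
```

The restarts matter because a single run may stall in a poor local optimum, depending on which points are drawn as initial centroids.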

Statistic Estimators for Accuracy Evaluation
Once the reference energy consumption profiles have been established according to the clustering results, the following statistical estimators, whose definitions may be easily found in the literature [40], have been evaluated to determine their accuracy with respect to the dataset: the coefficient of determination (R²), the mean absolute difference (MAD), the mean bias difference (MBD), the root mean square difference (RMSD) and the mean absolute percentage difference (MAPD). The RMSD value points to the short-term behavior of the model, while the MBD value describes its long-term performance. It should be noted that a few differences of high magnitude with regard to the reference values will significantly increase the RMSD. Conversely, over-estimations may be canceled out by under-estimations in the MBD.
Some authors express these estimators in terms of "errors" instead of "differences".
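For reference, the estimators can be computed as in the sketch below. It uses common textbook definitions, which are an assumption here and may differ in detail from those in [40]: differences are taken as estimate minus true value (so a negative MBD indicates under-estimation) and R² is computed as one minus the ratio of the residual to the total sum of squares. The function name `accuracy_estimators` is hypothetical.

```python
from math import sqrt


def accuracy_estimators(true_values, estimates):
    """Return R2, MAD, MBD, RMSD and MAPD (%) for paired true/estimated values."""
    n = len(true_values)
    diffs = [e - t for t, e in zip(true_values, estimates)]  # estimate - true
    mbd = sum(diffs) / n
    mad = sum(abs(d) for d in diffs) / n
    rmsd = sqrt(sum(d * d for d in diffs) / n)
    mapd = 100.0 * sum(abs(d) / abs(t) for d, t in zip(diffs, true_values)) / n
    mean_true = sum(true_values) / n
    ss_tot = sum((t - mean_true) ** 2 for t in true_values)
    ss_res = sum(d * d for d in diffs)
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": r2, "MAD": mad, "MBD": mbd, "RMSD": rmsd, "MAPD": mapd}


# One over-estimation and one under-estimation of equal size cancel in the
# MBD but still contribute to the MAD, RMSD and MAPD.
example = accuracy_estimators([1.0, 2.0, 4.0], [1.5, 1.5, 4.0])
```

In this example the MBD is zero even though two of the three estimates are off, which is exactly the cancellation effect mentioned above.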

Clusterization Results
By applying Ward's hierarchical clusterization algorithm, it is observed that the appropriate number of clusters to be set in the non-hierarchical methods falls within the range of 2 to 6. The k-means non-hierarchical clusterization method is then applied with 2, 3, 4, 5 and 6 clusters, respectively. Results are shown in Figure 5.

As seen in Figure 5, although increasing the number of clusters can reduce the error value (relative sum of the squared intra-cluster distances) and increase the explained value (relative sum of the squared inter-cluster distances), the optimal number of clusters is 5, as the explained value reaches 85.1%, while the relative increment of defining one extra cluster (6 clusters in total) is lower than 10% (7.76%).
Figure 6 includes the results of the clustering. In Figure 6a-e all members of each clustering class are plotted, as well as the reference electric energy consumption profile, defined as the centroid value of each class. Figure 6f allows the comparison between the five classes' reference electric energy consumption profiles.
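The error and explained values used to select the number of clusters can be computed as in the following sketch. It assumes that both quantities are expressed relative to the total sum of squared distances to the global centroid, so that they add up to one; the function name `explained_fraction` is hypothetical.

```python
def explained_fraction(points, labels):
    """Return (error, explained) as fractions of the total sum of squares.

    error     = intra-cluster sum of squares / total sum of squares
    explained = inter-cluster sum of squares / total sum of squares
              = 1 - error
    """
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sum_sq(rows, center):
        return sum(sum((a - c) ** 2 for a, c in zip(r, center)) for r in rows)

    total = sum_sq(points, centroid(points))
    within = 0.0
    for label in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == label]
        within += sum_sq(members, centroid(members))
    error = within / total
    return error, 1.0 - error
```

Evaluating this function for the partitions obtained with 2 to 6 clusters yields the kind of curve from which the marginal gain of each additional cluster can be judged.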
As seen in Figure 6, classes 1, 4 and 5 show a high variance throughout the year, whereas classes 2 and 3 have a more uniform energy profile. Buildings in classes 1, 2 and 4 consume most of their electric energy over the winter months, whereas buildings in classes 3 and 5 behave in precisely the opposite manner. Buildings characterized by a random consumption trend, e.g., alternating months with high and low consumption ratios, were not observed in the dataset.
Apart from some exceptional cases, all samples in each class seem to deviate little from the reference profile. Thus, the buildings' characterization appears to be appropriate.
Table 2 shows the characteristic coefficients of the polynomial adjustment for each class, while Tables 3 and 4 show the average relative monthly ACI values and the average monthly ACI values for the reference profiles, respectively. Table 5 shows the standard deviations. The coefficients of each class's reference profile are determined by the centroids of the defined clusters. These centroids minimize the sums of the intra-cluster distances. It should be noted that there are no hospitals in class 1, only one administrative building in each of classes 1 and 5, and just one hospital in class 4.
As can be seen in Table 4, the proposed classification clusters buildings that follow the same temporal electrical consumption profile but that can have very different monthly ACI values. It is noteworthy that the most intensive energy consumers are the health centers without emergency services in most categories, but especially in class 1. Hospitals and buildings from the "Others" category also show high energy intensity values.
In Table 5, a high standard deviation for health centers without emergencies is also observed for class 1. This suggests that there may be a high number of abnormal buildings in this category, because health centers with the class 1 electric consumption profile are usually old centers located in rural areas. This sort of building should be analyzed in detail through energy audits. The other buildings show an acceptable standard deviation, in the range between 1 and 9 kWh·m⁻²·month⁻¹. Moreover, as expected, the highest standard deviation values occur in the winter season for all classes, due to the use of electrical heating appliances in some cases.
Abbreviations used in the tables: Glob. = global. Hosp. = Hospital. HCE = Health center with emergencies. HC = Health center without emergencies. Inv. = Inventory. n/a = not applicable.

Results Evaluation
The adequacy of the obtained reference electric energy consumption profiles has been evaluated through five different statistical estimators, and the results are summarized in Table 6. These profiles have been expressed in relative terms with respect to the maximum.
Table 6. Statistic estimators results for the reference electric energy consumption profiles.
Class R² (-) MAD (-) MBD (-) RMSD (-) MAPD (%) Samples
As seen in Table 6, all classes, except for 3 and 5, show a high value of the R² estimator. MAD values are better interpreted together with the MAPD values. The latter fall in the range of 11% to 32%, which is relatively low. Paradoxically, the highest MAPD value corresponds to class 4, which has the second-highest R² value. The lowest MAPD value corresponds to class 2.
On the other hand, all RMSD values are lower than 0.042 and the MBD values are negative, meaning that the reference profiles tend to under-estimate the average relative monthly ACI values, although the differences can be considered small (low RMSD values).
Table 7 shows the administrative breakdown for each class, both in absolute and relative terms. It may be observed that hospitals mainly belong to the class 2, 3 and 5 consumption profiles, which means approximately constant consumption or small increments over the average in the summer months. On the other hand, health centers seem to belong mainly to classes 2 and 3, and few distinctions between those with and without emergency services are observed. Finally, those buildings classified in the "Others" category show electric energy consumption profiles from classes 2, 3 and 4. Most buildings with a class 1 consumption profile are health centers, while classes 2, 3, 4 and 5 mainly identify health centers having emergency services.
Table 8 shows the relative error, or difference, between the true average relative ACI values and the model estimations, expressed as a percentage of the true average relative ACI. It can be seen that the results are low for all categories, apart from some extreme values for "Others" buildings, especially in class 4 for the autumn season and in class 5 for June. This is explained by the highly miscellaneous nature of this category. Positive values in this table mean that the model underestimates, while negative values mean that it overestimates. No overall overestimation or underestimation behavior of the proposed method is observed, either across building types or across time periods.
On the other hand, Figure 7 represents the correlations between the true and the estimated values for each building type, including all classes. Very high determination coefficients are observed in all cases, with the health centers with emergencies showing the worst correlation.
Finally, Figure 8 shows the geographical distribution of the classified buildings. It is hard to identify a spatial pattern in this case, but it may be observed that class 4 buildings are located mainly along the borders (rural and mountainous areas). On the other hand, class 3 buildings prevail in the southern half of the region. Class 2 and 5 buildings seem to be homogeneously spread throughout the region, whereas class 1 buildings are highly concentrated in three very small areas (two in the north and one in the south).

Conclusions
Results of the proposed case study, which involved the building stock of the Health System of the Castilla y León region in Spain, reveal five distinct reference electric energy profiles. According to several statistical estimators, these profiles have been shown to estimate the future energy demands of these buildings very accurately.
The proposed energy classification shows significant differences from the classical administrative classification, which should be taken into account by energy managers and energy planners.
In the case study, most hospitals are characterized by a uniform consumption profile, whereas health centers show significant seasonal variations between winter and summer periods. However, from this point of view, only slight differences exist between health centers with and without emergency services.
The five reference consumption profiles obtained show great accuracy for power demand estimations. Health centers show the worst performance because of the high diversity within this category, arising from very different building ages, locations, maintenance programs and final uses.
Moreover, it has been observed that the proposed clusterization groups together buildings with the same temporal electrical consumption profile even if they have very different monthly ACI values. This will help to find abnormal behaviors even from an aggregated point of view, which is particularly useful when auditing electrical energy bills. In the case study, it is noteworthy that the most intensive energy consumers are the health centers without emergency services in most categories, but especially in class 1. Hospitals and buildings from the "Others" category also show high energy intensity values.
Finally, no overall overestimation or underestimation behavior of the proposed method is observed, either across building types or across time periods. It can also be observed that, while class 4 buildings are located mainly in rural and mountainous areas, class 3 buildings prevail in the southern half of the region. Class 2 and 5 buildings seem to be homogeneously spread throughout the region, whereas class 1 buildings are highly concentrated in urban areas.
Future work in this area should combine these approaches with the more classical ones, which tend to consider only static energy consumption indexes.
Author Contributions: All co-authors have collaborated equally in the conception and design of the content, the performance of the studies, the analysis of the data and the writing of the paper.
Funding: This research was funded by EREN (Ente Regional de la Energía de Castilla y León), under the research project in collaboration with Laboratorio de Inspección Técnica de Minas (LITEM), entitled: "Análisis de consumos horarios de contratos eléctricos de la Administración Autónoma". The APC was funded by MDPI.

Conflicts of Interest: The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.