Custom Outlier Detection for Electrical Energy Consumption Data Applied in Case of Demand Response in Block of Buildings

The aim of this paper is to provide an extended analysis of outlier detection, using probabilistic and AI techniques, applied in a demo pilot demand response in blocks of buildings project, based on real experiments and energy data collection with detected anomalies. A numerical algorithm was created to differentiate between natural energy peaks and outliers, so that data cleaning could be applied first. Then, the impact on the energy baseline used for the demand response computation was calculated with improved precision, relative to other referenced methods and to the original data processing. For the demo pilot project implemented in the Technical University of Cluj-Napoca block of buildings, without cleaning the energy baseline data, in some cases it was impossible to compute the established key performance indicators (peak power reduction, energy savings, cost savings, CO2 emissions reduction), or the resulting values were far higher (>50%) and unrealistic. Therefore, in real-case business models, it is crucial to apply outlier removal. In recent years, both companies and academic communities have pooled their efforts into generating new abstractions, interfaces, approaches for scalability, and crowdsourcing techniques. Quantitative and qualitative methods have been created with the aim of error reduction and have been covered in multiple surveys and overviews addressing outlier detection.


Introduction
Demand response applied to aggregations of blocks of buildings can provide significant benefits: on the one hand to consumers and prosumers, and on the other hand by decreasing pressure on the transmission and distribution system operators and by sharing the responsibility and benefits of the generators with the rest of the power chain [1].
A demand response in blocks of buildings demo pilot project was implemented during the period 2016-2019 in 12 public buildings of the Technical University of Cluj-Napoca (TUCN), consisting of the effective testing of four different automated and/or manual demand response scenarios. The demand response in blocks of buildings (DR-BoB) project started by selecting public buildings from four different campus locations of the university, having different power signature profiles and different HVAC (heating, ventilation, and air conditioning) systems. Then, in order to effectively implement the demand response (DR) scenarios, a building energy management system (BEMS) was installed at the blocks of buildings (BoBs) technical sites, for online visualization of the energy data and to record a baseline of the energy use and local renewable energy sources (RES) generation. During the following one-year period (12 months), demand response (DR) testing scenarios were implemented on different time schedules, for different combinations of the blocks of buildings (BoBs) involved, using automated or manual control, including the voluntary involvement of students and academic and administrative staff [2]. Thus, when the evaluation of the achieved results followed, the baseline generation issue was the most significant one to be properly solved. This paper focuses on this topic, detailing the key approaches regarding outlier detection, recorded data cleaning, and baseline construction. Replication of the proposed approach and methodology can be considered, at least for demand response effectiveness evaluation, for all ranges of consumers or prosumers, as the key performance indicators (1-peak power reduction; 2-energy savings; 3-CO2 reduction; 4-cost savings; etc.) are easy to apply on a clearly established baseline [3].
Demand response projects can be a great opportunity in local communities, not only for residential energy users, but also for large pools of public buildings belonging to local authorities (e.g., schools), utility companies, and chains of commercial buildings [4]. Collecting clean electrical energy data from this type of 'end user' is a must in order to achieve efficient results.
Regarding outlier detection, some definitions of the 'most common' data issues can be summarized from the extensive data science literature. One definition that stands out is given by Barnett and Lewis [5], who define an outlier as an observation or a set of observations that is inconsistent with the rest of the data. According to Edwin de Jonge and Mark van der Loo, "Outliers do not equal errors. They should be detected, but not necessarily removed. Their inclusion in the analysis is a statistical decision" [6]. It is important to mention that, after a significant understanding of the data, it can be concluded that not all detected outliers necessarily need to be removed or replaced; they can be meaningful observations in the long term. In response, many distinct outlier detection methods have been developed in the literature [7,8]. Moreover, in the process of detecting anomalous values, it is very difficult to detect multiple outliers because of the masking effect, which occurs when an outlier cannot be detected due to the presence of others [9]. This issue was addressed by sequentially correcting the anomalies in order to reduce the masking effect [10,11]. In [12], outlier detection techniques were presented as probabilistic models with parametric and nonparametric approaches, statistical models, and machine learning algorithms with clustering-based and classification-based techniques. In the probabilistic models, probability distribution functions were proposed to detect anomalous data as the values which have the highest probability of falling outside a given threshold.
There are two types of probabilistic approaches: (a) parametric, where the data is analyzed against an already known distribution function; and (b) nonparametric, where the data is measured based on a density or distance function, and the data that does not behave like the majority of the tested population is considered an outlier [13,14]. In most parametric and probabilistic outlier detection methods, Gaussian distribution functions and the median absolute deviation are used [15]. Parametric models are prone to fail because most of the distributions are univariate and the underlying distribution of the observations needs to be known in advance. Although both the median and the mean estimate the central tendency, they differ in their sensitivity to abnormal values: the median, together with the median absolute deviation, represents the statistical dispersion of the data set, and these measures are more robust than the mean and the standard deviation. The "breakdown point" [16] is one of the indicators used to characterize the insensitivity of the median method; it represents the maximum fraction of contaminated data that can be present in the tested data set without changing the final results. The one problematic case for the median method occurs when more than half of the values are infinite [17,18]. Nonparametric methods were used on multidimensional data sets using different techniques such as k-nearest neighbors [19] and the Parzen window [20][21][22]. In addition to the most common nonparametric methods, the following were applied: ranking or scoring data based on differences and similarities [23], Gaussian mixture models [24], and probabilistic ensemble models combining the density-based local outlier factor (LOF) with distance-based k-nearest neighbors [25].
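The robustness gap between the mean and the median described above is easy to demonstrate numerically. The following minimal sketch (with made-up hourly readings) shows how a single corrupted value drags the mean far away while the median and the median absolute deviation (MAD) stay stable:

```python
import statistics

# Hypothetical hourly energy readings (kWh) with one corrupted value at the end.
readings = [42.0, 41.5, 43.2, 42.8, 41.9, 400.0]

mean_val = statistics.mean(readings)      # dragged far upward by 400.0
median_val = statistics.median(readings)  # barely affected by the outlier

# Median absolute deviation (MAD): a robust measure of spread around the median.
mad = statistics.median(abs(x - median_val) for x in readings)

print(f"mean={mean_val:.1f}, median={median_val:.1f}, MAD={mad:.2f}")
```

Here the corrupted reading pushes the mean above 100 kWh, while the median remains near the typical 42 kWh level, illustrating the high breakdown point of median-based statistics.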
Even though probabilistic models are often used, the probabilities can be unavailable or limited due to low correlations in the data, which is why qualitative cleaning methods can achieve better results in detecting the correct tuples to clean and in reducing the information loss [26].
Statistical methods like autoregressive moving averages [27][28][29][30] and linear regression models were proposed in [31] and used for outlier detection, even if it is hard to detect polynomial functions in real time [32,33]. In general, most statistical methods are based on historical data and are used for offline anomaly detection, even though some of them have been described as heavyweight online anomaly detection techniques [34,35]. Supervised (classification-based) and unsupervised (clustering-based) machine learning methods have been used to identify outliers in fraud, health care, image processing, and network intrusions [36][37][38][39]. Among the most feasible machine learning methods for detecting outliers in an unsupervised environment are the clustering-based methods such as k-means [40] or density-based spatial clustering of applications with noise (DBSCAN) [41]. DBSCAN presents some advantages over the k-nearest neighbors method, such as automatically adjusting the number of clusters to be computed and the ability to isolate the outliers in individual clusters. Neural networks [42][43][44] and support vector machines [45] were also used for anomalous data detection. Functional dependency thresholding with Bayesian optimization for functional dependency data cleaning purposes was successfully tested on synthetic and real data [46]; relaxed functional dependencies were also detected using an improved discovery algorithm relying on a lattice-structured search space with a new pruning strategy [47].
There has always been a debate on universality in numerical computation [48], thus providing an opportunity for the authors to test some of the most common techniques from the literature [12] in the context of energy consumption. Still, every data set may have its own 'personal character' or bias. In [49], preliminary empirical testing of outlier detection was conducted by the authors using probabilistic, statistical, and machine learning techniques on data from the Technical University of Cluj-Napoca's swimming pool complex. The study highlighted the possibility that some outlier detection techniques would not be able to differentiate between natural energy peaks (the custom bias or particularity of the data) and abnormal data without an additional support function. Therefore, a more detailed investigation has been carried out for all four of the TUCN's DR-BoB pilot site locations, which present different energy consumption patterns, with a focus on the proper tuning of the implemented/tested outlier detection techniques and the evaluation of their effect on baseline construction.
After a short summary of the demand response in blocks of buildings (DR-BoB) project implemented at the Technical University of Cluj-Napoca (TUCN), which creates the research context of this paper, and a detailed state of the art regarding outlier detection techniques, the second section of the paper presents a brief description of the TUCN pilot locations and the implemented DR-BoB system architecture with its components. The third section introduces the applied baseline evaluation approach, showcasing the implemented outlier detection techniques, with a special highlight on the proper tuning parameter values to be applied for each investigated outlier detection technique in the case of hourly energy consumption data. Additionally, to test and validate the detected outlier data points, an integrated custom scoring method is presented in Section 3. The results obtained after the outlier detection and removal processes were applied are highlighted in Section 4, along with the baseline consumption curves constructed on the original and the cleaned data sets. Final conclusions and comments are outlined in the last section of the article.

Overview of the Implemented Demand Response System
Regarding the implemented demand response pilot project, the following clarifications have to be made about the existing monitoring system:

1. The only form of energy involved in the demand response pilot process is electricity.

2. When the project started, there was only one system for measuring and settling electricity consumption, the one installed by the electricity distribution system operator (DSO). It comprised three-phase electronic meters, mounted in a semi-direct current and voltage measurement scheme: the currents were measured through current reducers and the voltages were taken directly from the grid. The meters in each location allow remote data transmission, but the direct beneficiary of this data is the local electricity distribution system operator (DSO). The technical university came into possession of the measurements by directly requesting this data from the beneficiary. The measured data had a sampling frequency of 1 h.

3. One of the main objectives of the demand response pilot project was to implement a system for monitoring and remote management of electricity consumption in each involved group of buildings. The monitoring [50] and remote management system has a relatively simple architecture, monitoring only a small number of consumers in each location; therefore, the equipment is considered to have a proportional impact on the project. This objective could be achieved only after identifying all the electricity consumers and their operating regimes and establishing their importance according to their energy consumption.

4. It is also specified that within the technical university there is only one type of monitoring system, the one described at point 2, which also serves as the settlement system in the relation with the local electricity distribution system operator (DSO) and, further, with the electricity supplier. The only exceptions are the four groups of buildings involved in the demand response pilot project.
The monitoring system of the energy consumption in the pilot buildings for the demand response project has the following structure:

• Semi-direct mounted meters, with intermediate measurement (current reducers) for current and direct measurement for voltage;
• A communication module between the meters and the PLC or data totalizer;
• A communication module between the PLC or data totalizer and the computer on which the application with the graphical interface is running;
• A local data server/dashboard for storing the measured data;
• The computer on which the application with the graphical interface runs;
• Monitors mounted at the main access ways of the buildings, on which the monitoring system is displayed and where certain elements can be viewed by the occupants of the building.

Figure 1 shows the general view/graphical interface of the monitoring system implemented at the four pilot site locations, with the corresponding real-time consumption. The related daily consumption (kWh) and absorbed power (kW) are presented. The total or aggregate electricity consumption is also presented [51] for the four pilot groups of buildings:

• The block of buildings of the Faculty of Electrical Engineering;
• The block of buildings of the Marasti Students Campus;
• The Faculty of Building Services;
• The swimming pool complex.
Through the monitoring system, the four groups of Technical University of Cluj-Napoca (TUCN) pilot buildings were interconnected with other pilot sites in Europe; in this way, the demand response events were tested on a complex platform with a series of integrated utilities and functionalities. The role of this platform was to manage and aggregate all the factors that influence the development of demand response events and all the participating pilot sites in a centralized process of reaching targets for consumption and energy cost reduction, power peaks, CO2 emissions, etc. Figure 2 shows an overview of the aggregated system, which includes a series of data, information, and equipment for the centralized management of several pilot sites. The main functionalities and subsystems are highlighted as follows [52]:

1. Market emulation (ME) unifies information from the energy market (from the electricity distribution system operator (DSO), the electricity transmission system operator (TSO), energy suppliers, aggregators, etc.) and from regional meteorological operators about the factors that influence energy consumption and production. It receives notifications and forwards them to the pilot sites for demand response (DR) events and communicates with the local energy manager (LEM) to provide relevant information on weather conditions.

2. The local energy manager (LEM) [53] may take over a range of information (energy consumption, energy production, energy storage) directly from energy consumers or indirectly through the monitoring systems or the building management system/building energy management system (BMS/BEMS) implemented at each pilot site. The LEM also has the role of performing the necessary calculations in order to quantify the results obtained by implementing demand response events, through baseline forecasting algorithms and key performance indicator (KPI) calculations. At the same time, it conveys the obtained results further to the demand response management system (DRMS).

3. The demand response management system (DRMS) is an intermediary between the local energy manager (LEM) and the market emulator; it facilitates the exchange of information needed to achieve the predicted events in high-performance conditions. Also, through another system functionality called the consumer portal (CP), it transmits information regarding the development of events: market notifications, implementation periods, the equipment that should be involved, and the obtained results quantified by the LEM, involving the Technical University of Cluj-Napoca (TUCN) staff, students, or other stakeholders.

4. The consumer portal (CP) makes all the relevant information about demand response events available to building owners, administrators, and occupants. In short, the consumer portal (CP) is the interface between all market participants in demand response and the system functionalities.

5. The electricity network has the role of supplying electricity to the consumers in each building or group of buildings; it is also the element through which the electricity produced and/or stored locally at each location is injected into the network.
By using the various interconnected technologies presented above, with clear and specific functionalities, the possibility of interaction between energy consumers and the facility management teams responsible for demand response programs is created. In addition, the use of advanced predictive control and forecasts, which can provide an accurate and highly detailed view of the operation of the building groups and their consumers in terms of energy consumption and production, influences the various types of behaviors.

Baseline Determination
The biggest challenge in evaluating the effectiveness of a demand response (DR) event is to properly determine what the real energy consumption would have been in the absence of the event. Hence, to calculate the key performance indicators (KPIs) [54] for DR events taking place within the four Technical University of Cluj-Napoca (TUCN) blocks of buildings, it was imperative to identify a baseline consumption level for each day when demand response events took place. The starting point in determining the baseline reference consumption was a database composed of hourly electricity consumption values, from 2014 until the fall of 2019, for each block of buildings within TUCN included in the pilot project.
The hourly electricity consumption data were taken from two main sources: the local electricity distribution system operator (DSO), as settlement meter records for each analyzed block of buildings (for the historical energy consumption between 2014 and 2018), and the energy monitoring system presented in the previous section (Section 2) and implemented within the demand response pilot project (for the energy consumption data from 2018 onwards). Given the different nature of energy consumption in the analyzed buildings at different times of the year, an energy profiling action had to be applied for each block of buildings separately. The applied energy profiling determined the average daily consumption schedules (baseline consumptions) for each day of the week separately and for similar days in terms of occupants' activity, such as weekdays and weekends, while also looking for activity patterns. Taking into consideration that the analyzed blocks of buildings belong to the Technical University of Cluj-Napoca, their activity patterns mostly correlate with the academic year schedule: teaching semesters, examination periods, and student vacations; in other words, there is a link between energy consumption and the social system of the tested buildings [55]. An exception is the swimming pool complex, where the activity pattern is correlated with weather conditions (an additional outdoor pool is operated during summer periods) and special sporting events.
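The profiling step described above amounts to grouping hourly records by day type and hour of day and averaging. The sketch below is illustrative only (the record layout and function name are assumptions, and the weekday/weekend split is a simplification of the per-day-of-week profiling used in the project):

```python
from collections import defaultdict
from datetime import datetime

def baseline_profile(records):
    """Average hourly consumption per (day-type, hour-of-day) bucket.

    records: iterable of (datetime, kWh) pairs.
    Returns a dict {("weekday"|"weekend", hour): average kWh}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, kwh in records:
        day_type = "weekend" if ts.weekday() >= 5 else "weekday"
        key = (day_type, ts.hour)
        sums[key] += kwh
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Example: two Mondays and one Saturday, all at 10:00.
recs = [
    (datetime(2018, 3, 5, 10), 120.0),   # Monday
    (datetime(2018, 3, 12, 10), 130.0),  # Monday
    (datetime(2018, 3, 10, 10), 60.0),   # Saturday
]
profile = baseline_profile(recs)
```

In the project, this grouping would be run per building and per day of the week; the same structure extends naturally to finer keys such as (weekday-name, hour).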
To obtain the proper average daily schedule (baseline consumption) for each analyzed block of buildings, a first selection (correction) of the energy consumption data was made by identifying atypical activity patterns at the day level and excluding them from the baseline consumption curve evaluation. For these corrections, the days considered atypical were listed: the beginning of the calendar year, national holidays, the holiday at the end of the first semester examination session, the Easter holiday, the first day of May, the holiday at the end of the second semester examination session, and the summer vacation, when the hourly consumption profile is very different from that of days with regular activities, depending on the specifics of each group of buildings. The second correction consisted of eliminating from the baseline evaluation, for each day of the week, the energy consumption data related to the demand response (DR) events that were carried out. During the events, but also 2 h before and 2 h after each event, the consumption profile underwent changes that would have had a negative impact on generating the baseline reference level. Even with all these assumptions taken into consideration, the generated baseline consumption curves are only as good as the input energy consumption data used for the study. Consequently, a data cleaning process with outlier detection and data correction was applied, as presented in the next section.
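The two corrections above (dropping atypical days, and masking the DR event windows including the 2 h before and after each event) can be sketched as a filter over the hourly records. The data layout and names below are hypothetical, not the project's actual code:

```python
from datetime import date, datetime, timedelta

def clean_for_baseline(records, atypical_days, dr_events):
    """Drop hourly records on atypical days and within +/-2 h of DR events.

    records: list of (datetime, kWh); atypical_days: set of date objects;
    dr_events: list of (start_datetime, end_datetime) pairs.
    """
    kept = []
    for ts, kwh in records:
        if ts.date() in atypical_days:
            continue  # first correction: atypical day-level patterns
        in_dr = any(start - timedelta(hours=2) <= ts <= end + timedelta(hours=2)
                    for start, end in dr_events)
        if not in_dr:  # second correction: DR event windows (+/-2 h)
            kept.append((ts, kwh))
    return kept

# Illustrative data: 1 May is atypical; a DR event runs 2 May, 09:00-11:00.
records = [
    (datetime(2018, 5, 1, 10), 50.0),   # atypical day -> dropped
    (datetime(2018, 5, 2, 8), 100.0),   # inside the +/-2 h DR window -> dropped
    (datetime(2018, 5, 2, 14), 90.0),   # kept
]
kept = clean_for_baseline(records,
                          atypical_days={date(2018, 5, 1)},
                          dr_events=[(datetime(2018, 5, 2, 9),
                                      datetime(2018, 5, 2, 11))])
```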
To test the universal data approach of the methods, the authors applied the most common input threshold/parameter values from the literature. The interquartile range (IQR) method helps not only in outlier detection but also in predicting the spread [56] of energy consumption, yet it is tied to its mathematical limitation of flagging only the values that fall outside the tested threshold range. The same can be observed in the mathematical model of the median absolute deviation (MAD) [57,58], where the limitations are given by the threshold distance from the median value. The threshold value chosen for testing the interquartile range (IQR) method is 1.5 [59], and for the MAD method it is 3. The advantage of using the IQR and median absolute deviation (MAD) models is more related to 'tracking' and maintaining a permanent control spread that will identify the extreme values in most cases. A confirmation of an outlier by both methods should always be taken into consideration for further investigation.
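A minimal sketch of the two methods with the thresholds quoted above (1.5 for IQR, 3 for MAD). The 1.4826 scaling factor, which makes the MAD comparable to a standard deviation under normality, is a common convention and an assumption here, as it is not stated in the text:

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (k = 1.5 as in the text)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lo or v > hi]

def mad_outliers(values, threshold=3.0):
    """Indices of values more than `threshold` scaled MADs from the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # degenerate case: more than half the values are identical
    # 1.4826 makes the MAD consistent with the standard deviation for normal data.
    return [i for i, v in enumerate(values)
            if abs(v - med) / (1.4826 * mad) > threshold]
```

On a toy series such as `[10, 11, 12, 11, 10, 12, 11, 100]`, both functions flag only the last value, which mirrors the paper's observation that a point confirmed by both methods deserves further investigation.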
The local outlier factor (LOF) method [60] can achieve good results when the outliers are located in dense regions of normal data, which means that the accuracy of this method can be reduced when it is exposed to a highly volatile data set. In the performed analysis, values of k equal to 2, 3, 4, 5, 25, and 50 were used. It was determined that the most suitable k values for hourly energy consumption data sets are 2 and 3, which are the most common in the literature [61], and also the value of 25, which was empirically tested with good accuracy compared with the other values. For the density-based spatial clustering of applications with noise (DBSCAN) method [62,63], the authors used as input parameters 0.5 for epsilon (ε), as a default value, and a minimum number of points equal to 5. For these values, the highest silhouette score [64] was obtained following various testing scenarios, in which the authors applied different epsilon (ε) values (from 0.1 to 0.9) and various minimum numbers of points (5, 10, 20, and 50).
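To illustrate how DBSCAN isolates outliers in low-density regions, the following is a compact, dependency-free 1-D sketch using the parameter values quoted above (ε = 0.5, minPts = 5). In practice, a library implementation (e.g., scikit-learn's `DBSCAN` and `LocalOutlierFactor`) would be applied to the actual consumption data:

```python
def dbscan_1d(values, eps=0.5, min_pts=5):
    """Minimal DBSCAN for 1-D data. Returns a label per point; -1 marks noise."""
    n = len(values)
    labels = [None] * n
    # Neighborhoods: indices within eps of each point (the point itself included).
    neigh = [[j for j in range(n) if abs(values[j] - values[i]) <= eps]
             for i in range(n)]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neigh[i]) < min_pts:
            labels[i] = -1            # provisionally noise (possible outlier)
            continue
        cluster += 1                  # start a new cluster from this core point
        labels[i] = cluster
        queue = list(neigh[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # noise reachable from a core point -> border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neigh[j]) >= min_pts:
                queue.extend(neigh[j])  # expand only through core points
    return labels
```

For a dense group of readings near 1.1 kWh plus one stray reading at 10.0, the dense group forms cluster 0 and the stray point is labeled -1, i.e., it ends up isolated exactly as described above.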

Outlier Verification and Validation
Because the detected outliers need to be validated, a support function called the "Custom Scoring Method" (CSM) was designed to analyze the output of any outlier detection method and to decide, in the context of electrical energy consumption, whether the detected anomalous value is a natural energy peak or abnormal data. The method compares the input (detected outlier values) with the average energy consumption of four data clusters for the same interval of time and similar days (workdays or weekends).
An overview of the applied outlier verification methodology is presented in Figure 3. The data is collected on the local server and then analyzed using the interquartile range method, the median absolute deviation, and density-based spatial clustering of applications with noise, with the parameters presented in the previous section. To verify and validate the outliers identified by the above-presented outlier detection techniques, an outlier ranking system has been developed and implemented. As a first step, for each identified outlier data point, four data clusters have been selected from the available historical data sets, consisting of similar energy consumption values for the same hour of the day, as follows: Cluster A-energy consumption data from the same location for similar days: weekdays (Monday to Friday) or weekends (Saturday and Sunday), from the same year as the analyzed outlier data point (national holidays not included).
Cluster B-energy consumption data from the same location for similar days: weekdays (Monday to Friday) or weekends (Saturday and Sunday), from the same period of the year (a two-month period, from the 15th of the previous month to the 15th of the next month) and the same year as the analyzed outlier data point (national holidays not included).
Cluster C-energy consumption data for similar days: weekdays (Monday to Friday) or weekends (Saturday and Sunday), from the entire available historical data sets, for the same location as the analyzed outlier data point (all years, national holidays not included).
Cluster D-energy consumption data for similar days: weekdays (Monday to Friday) or weekends (Saturday and Sunday), from the same period of the year (a two-month period, from the 15th of the previous month to the 15th of the next month) of each year within the entire available historical data sets, for the same location as the analyzed outlier data point (all years, national holidays not included).
The above-mentioned data clusters were selected in order to accurately identify the general daily energy consumption patterns specific to each pilot location and each day type (Clusters A and C), to catch seasonal energy consumption pattern changes (Clusters B and D), and to avoid being influenced by changes in building infrastructure (replacement of old equipment or acquisition of new equipment, which could significantly change the average energy consumption from one year to another) (Clusters A and B).
As a second step, a score (from 0 to 5) is given to the analyzed outlier data point, according to each of the above-presented data clusters, based on how close the corresponding hourly energy consumption value of the outlier is to the central/average energy consumption of the data cluster. For this, the average, the minimum, and the maximum energy consumption over the closed P% of the data points from a data cluster are evaluated according to Equation (1), where P can be 75%, 90%, or 60% of all the data points from a cluster:

avgW_P% = (1/N_P%) · Σ_{i=1..N_P%} W_i, minW_P% = min_i(W_i), maxW_P% = max_i(W_i), (1)

with: avgW_P% -the average hourly energy consumption; minW_P% -the minimum hourly energy consumption; maxW_P% -the maximum hourly energy consumption over the closed P% of the data points from a cluster; N_P% -the number of data points within the closed P% range; and W_i -an hourly energy consumption data value from the specified data range within a cluster. The applied scoring algorithm is mathematically described through Equations (2) to (6) and graphically represented in Figure 4. Namely, if the hourly energy consumption corresponding to the analyzed outlier lies within the closed 75% range of a specific data cluster, then the analyzed data point is not a real outlier and therefore the score is set to 0, see (2). If the analyzed outlier data point lies between the 75% range limit and the restrictive 90% data range limit, then it is a mild outlier and the score is set to 1, see (3). If the outlier hourly energy consumption value exceeds the restrictive 90% data range limit, then we have a real outlier data point and the score is set to 2, see (4). If the outlier data value not only exceeds the restrictive 90% data range limit but is more than 25% higher or lower than it, then a high outlier is found and the score is increased to 3, see (5), while if it is more than 50% higher or lower, an extreme outlier is detected and the score is increased to 5, see (6).
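The scoring rules described for Equations (2)-(6) can be sketched as follows. Two interpretations made here are assumptions rather than verbatim reconstructions: the 'closed P%' range is read as the central P fraction of the sorted cluster, and the 25%/50% excess is measured relative to the violated 90% range limit:

```python
def central_range(cluster, p):
    """(min, max) of the central p-fraction of a sorted cluster -- a
    hypothetical reading of the paper's 'closed P% of the data points'."""
    s = sorted(cluster)
    drop = int(len(s) * (1 - p) / 2)  # symmetric trim on both tails
    core = s[drop:len(s) - drop] or s
    return core[0], core[-1]

def outlier_score(value, cluster):
    """Score a candidate outlier from 0 to 5 per the rules of Eqs. (2)-(6)."""
    lo75, hi75 = central_range(cluster, 0.75)
    lo90, hi90 = central_range(cluster, 0.90)
    if lo75 <= value <= hi75:
        return 0                  # not a real outlier (Eq. (2))
    if lo90 <= value <= hi90:
        return 1                  # mild outlier (Eq. (3))
    # Beyond the 90% range: measure the excess relative to the violated limit.
    limit = hi90 if value > hi90 else lo90
    excess = abs(value - limit) / limit  # assumes limit > 0 (energy data)
    if excess > 0.50:
        return 5                  # extreme outlier (Eq. (6))
    if excess > 0.25:
        return 3                  # high outlier (Eq. (5))
    return 2                      # real outlier (Eq. (4))
```

In the paper this scoring is applied once per cluster (A through D), producing four scores per candidate outlier.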
with Out_j denoting the hourly energy consumption value of the jth analyzed outlier data point. Based on the scores obtained for data clusters A and B, a first mark is evaluated for each analyzed outlier data point as a weighted average of the two scores, according to Equation (9), while a second mark is computed as a weighted average of the scores obtained for data clusters C and D, according to Equation (10). The final rank is obtained as the sum of the two marks computed for the analyzed outlier data point, see Equation (11). If the final rank is lower than 3, then the data point is not considered a valid outlier. The adjustment of the valid outliers was conducted using one of the most popular techniques in the literature, namely shape-preserving piecewise cubic spline interpolation [65].
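A sketch of the rank combination (Equations (9)-(11)) and of the outlier adjustment. The weights in `final_rank` are placeholders, since this excerpt does not list them, and linear interpolation stands in for the shape-preserving piecewise cubic interpolation used in the paper (e.g., scipy's `PchipInterpolator`) to keep the example dependency-free:

```python
def final_rank(score_a, score_b, score_c, score_d,
               w_ab=(0.6, 0.4), w_cd=(0.6, 0.4)):
    """Sum of two weighted-average marks (Eqs. (9)-(11)).
    The weights are hypothetical; the excerpt does not state their values."""
    mark1 = w_ab[0] * score_a + w_ab[1] * score_b   # clusters A and B (Eq. (9))
    mark2 = w_cd[0] * score_c + w_cd[1] * score_d   # clusters C and D (Eq. (10))
    return mark1 + mark2                            # Eq. (11)

def adjust_outliers(series, valid_idx):
    """Replace validated outliers by interpolating between clean neighbors.
    Linear interpolation is a stand-in for the shape-preserving piecewise
    cubic interpolation referenced in the paper."""
    cleaned = list(series)
    bad = set(valid_idx)
    for i in sorted(bad):
        left = next((j for j in range(i - 1, -1, -1) if j not in bad), None)
        right = next((j for j in range(i + 1, len(series)) if j not in bad), None)
        if left is None or right is None:
            continue  # outlier at the series boundary: left untouched here
        t = (i - left) / (right - left)
        cleaned[i] = series[left] * (1 - t) + series[right] * t
    return cleaned
```

With equal scores of 2 from all four clusters, the final rank is 4 (a valid outlier, since 4 ≥ 3), whereas four scores of 1 give a rank of 2 and the point is kept unchanged.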

Data Cleaning
The need for interconnectivity and for balance between the production and consumption of electrical energy makes clean data an "impetuous agreement" [49]. Any forecasting process [66,67] or statistical model can be jeopardized by the absence of a cleaned data set.

Swimming Pool Complex
It has been observed that, from a common ground of 413 outliers, only 322 were validated through the system. The same process was conducted over the combined data of the DBSCAN results and 95.2% of the observations were validated as real outliers. Because the LOF method had a larger number of detected issues, it was decided to run the k = 2 and k = 3 detected outliers independently. For both local outlier factor (LOF) computations, the results lacked accuracy: only 755 of the 3512 outliers detected for k = 2 were valid, and only 292 of the 1628 detected for k = 3, as shown in Table 1. After the scoring process, the valid outputs were analyzed in one database. It turned out that, across all the available methods, there is a total of 1468 unique valid outliers. Some of the methods validated the same data point as an anomaly: all methods detected 23 common data points, 63 were detected by three of them, and 416 by any two methods that had a common value (see Table 2). Thus, a total of 502 common outliers were identified. Despite that, due to the low percentage of the already filtered outliers through the implemented scoring method, compared with the whole data set, it was decided to adjust all 1468 unique outliers (see Figure 5).
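The consensus counting above (points flagged by all four methods, by exactly three, or by any two) amounts to set arithmetic over the per-method outlier index sets. A minimal sketch with made-up indices, not the paper's data:

```python
from collections import Counter

def consensus_counts(method_outliers):
    """Count how many data points were flagged by exactly k detection methods.

    method_outliers: dict mapping method name -> set of outlier indices.
    Returns a dict {k: number of points flagged by exactly k methods}.
    """
    votes = Counter()
    for idx_set in method_outliers.values():
        votes.update(idx_set)                 # one vote per method that flags a point
    return dict(Counter(votes.values()))      # histogram of vote counts

# Hypothetical index sets for illustration only:
methods = {
    "IQR":    {1, 2, 3, 4},
    "MAD":    {2, 3, 4, 5},
    "LOF":    {3, 4, 6},
    "DBSCAN": {4, 7},
}
print(consensus_counts(methods))  # point 4 is flagged by all four methods, point 3 by three of them
```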

Faculty of Electrical Engineering
To confirm and extend the previous observations, the data collected from the Faculty of Electrical Engineering, the Faculty of Building Services, and the Marasti Students Campus were analyzed using the proposed methodology. In the first iteration, the data from the Faculty of Electrical Engineering block of buildings were analyzed using the interquartile range (IQR) and median absolute deviation (MAD) algorithms on each year individually, to extract yearly outliers, and then the same process was executed on the entire data set. Since data can be analyzed in multiple ways and the authors wanted to highlight the most obvious outliers, the intersection of the two processes was taken, the result being represented by IQR/comb and MAD/comb. It was concluded that the IQR method detected 1.4% of all data as outliers, and MAD 3.7%. The local outlier factor (LOF) process was also conducted on all the data, and the same outlier percentages were obtained for k = 2 and k = 3, with less than 1% for k = 25. In the case of the density-based spatial clustering of applications with noise (DBSCAN) process, the intersection of two different density-based processes was taken: one based on the energy consumption value and the hour (DBSCAN 1) and the other based on the energy consumption value and the day of the week (DBSCAN 2). The testing was conducted on yearly data and on the entire data set, respectively. Given that no intersection elements were found between the two processes, the combination (reunion) of the results obtained with these two approaches was applied. The data in Table 3 show that, in total, the DBSCAN method detected 2.2% of the data as outliers. With the aim of understanding the validity of the tested methods, the implemented scoring method was applied to the outlier outputs.
Given that the interquartile range (IQR) and median absolute deviation (MAD) are both 'spread control' based methods, the intersection of these two was used as input for the outlier scoring/validation. Based on this process, 966 unique outliers were adjusted from the data set (see Figure 6).
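The two "spread control" methods can be sketched as follows, using the conventional 1.5×IQR Tukey fences and the 0.6745-scaled modified z-score for MAD. These cutoffs are the textbook defaults and are assumptions on our part; the paper's tuned thresholds are not restated here.

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Indices falling outside the Tukey fences Q1 - k*IQR .. Q3 + k*IQR."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return np.where((x < q1 - k * iqr) | (x > q3 + k * iqr))[0]

def mad_outliers(x, threshold=3.5):
    """Indices whose modified z-score 0.6745*(x - median)/MAD exceeds the threshold."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    modified_z = 0.6745 * (x - med) / mad
    return np.where(np.abs(modified_z) > threshold)[0]

# Intersection of the two 'spread control' methods, used as scoring input:
x = np.array([10, 11, 9, 10, 12, 10, 11, 9, 10, 100.0])
common = np.intersect1d(iqr_outliers(x), mad_outliers(x))
print(common)  # the injected spike at index 9
```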

Faculty of Building Services
For the data collected from the Faculty of Building Services, the same test was applied using the interquartile range (IQR) and median absolute deviation (MAD) processes. It was observed that, from all the data, IQR detected 2% of the data as outliers and MAD 3% (see Table 4). In the case of the local outlier factor (LOF) process, for yearly data, 2.2% of the data were outliers for k = 2, 0.8% for k = 3, and 0.09% for k = 25, respectively. When the entire data set was tested, 6.8% of the data were obtained as outliers for k = 2 and 0.8% for k = 3, while for k = 25 the algorithm found only one outlier. Compared with the other methods, the density-based spatial clustering of applications with noise (DBSCAN) marked only 0.4% of the data as outliers. Even if the total number of anomalous data points is low, the method indicated a silhouette score equal to 0.99 for both DBSCAN 1 and DBSCAN 2. The implemented scoring method was used to validate the results from Table 4 and Figure 7.
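Running LOF with several neighbourhood sizes, as done above for k = 2, 3, and 25, can be sketched with scikit-learn's LocalOutlierFactor; the series below is synthetic and for illustration only.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def lof_outliers(consumption, k):
    """Indices flagged by the local outlier factor with k neighbours.

    consumption: 1-D hourly energy series, reshaped to the 2-D array
    scikit-learn expects; -1 labels mark outliers.
    """
    X = np.asarray(consumption, dtype=float).reshape(-1, 1)
    labels = LocalOutlierFactor(n_neighbors=k).fit_predict(X)
    return np.where(labels == -1)[0]

# Synthetic series: 200 normal readings around 10 kWh plus one 60 kWh spike.
series = np.r_[np.random.default_rng(0).normal(10, 0.5, 200), [60.0]]
for k in (2, 3, 25):   # the three neighbourhood sizes tested in the paper
    print(k, len(lof_outliers(series, k)))
```

As the paper's results suggest, the number of flagged points depends strongly on k, which is why the detections are validated through the scoring method before adjustment.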

Marasti Students Campus
Regarding the Marasti Students Campus data, abnormal behaviors were observed for the interquartile range (IQR) and median absolute deviation (MAD) algorithms. For both methods and approaches, zero outliers were detected for most of the tested years, even if a visual data interpretation suggests otherwise. Continuing the analysis with the local outlier factor (LOF) process, 5% of the data were detected as outliers for k = 2, 1.5% for k = 3, and 0.8% for k = 25, respectively. Running the same exercise on the entire historical energy consumption data set, the same proportions of outliers were found for k = 2 and k = 3, while for k = 25 only 0.02% of the data were detected as outliers (see Table 5). In the case of the density-based spatial clustering of applications with noise (DBSCAN), when the energy consumption and hour values were used for the clustering process (DBSCAN 1), an average of 52% of the data was marked as outliers, but no intersection between the yearly data sets and the entire data set results was found. In the second part of the analysis, the opposite was observed for DBSCAN 2 (when the energy consumption and the day of the week were considered for the clustering process): only 2.2% of the data were marked as outliers (see Table 5 and Figure 8). It is worth mentioning that, for this experiment, there was also no outlier intersection between the yearly data and total data approaches. Moreover, the same could be noticed between the DBSCAN 1 and DBSCAN 2 results. Therefore, to understand the huge consistency gap that occurred between the IQR, MAD, LOF, and DBSCAN processes for this data set, the standard deviation (SD) of all the data sets/pilot locations was calculated and compared.
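The two DBSCAN feature pairings described above can be sketched as follows; the eps/min_samples values and the synthetic daily-profile data are illustrative assumptions, not the paper's tuned parameters.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

def dbscan_outliers(consumption, second_feature, eps=0.5, min_samples=5):
    """Indices labelled as noise (-1) by DBSCAN over a 2-D feature space.

    second_feature is the hour of day (DBSCAN 1) or the day of the week
    (DBSCAN 2); eps/min_samples are illustrative defaults only.
    """
    X = np.column_stack([consumption, second_feature]).astype(float)
    X = StandardScaler().fit_transform(X)   # put kWh and hour/day on one scale
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    return np.where(labels == -1)[0]

# 30 days of synthetic hourly readings with one injected anomaly:
rng = np.random.default_rng(1)
hours = np.tile(np.arange(24), 30)
load = 10 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.3, hours.size)
load[100] = 80.0
print(dbscan_outliers(load, hours))   # DBSCAN 1-style pairing (value, hour)
```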

Impact on Baseline Construction
Given that the efficiency of the load reduction (the demand response effect) can be quantified only against the baseline curve [68], an investigation of the impact of the cleaning methods on this methodology is mandatory. The aim of this study is to understand whether integrating data cleaning algorithms into a complex automated demand response platform can improve the baseline accuracy.
Due to the fact that all of the tested locations are defined by different social-energetic behavior [60], the baseline analysis was conducted independently for each of the investigated locations, in accordance with their specific energy profiles.
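One simple way to picture the baseline curve and the standard deviation figures reported in the following subsections is an hourly-mean baseline with a per-hour dispersion around it. This is our own simplified sketch, not the baseline methodology of [68]:

```python
import numpy as np

def hourly_baseline(consumption, hours):
    """24-value consumption baseline: mean load for each hour of the day."""
    consumption = np.asarray(consumption, dtype=float)
    hours = np.asarray(hours)
    return np.array([consumption[hours == h].mean() for h in range(24)])

def baseline_sd(consumption, hours):
    """Average per-hour standard deviation around the baseline curve."""
    consumption = np.asarray(consumption, dtype=float)
    hours = np.asarray(hours)
    return np.mean([consumption[hours == h].std() for h in range(24)])

# The SD reduction achieved by cleaning is then simply:
#   reduction = baseline_sd(original, hours) - baseline_sd(cleaned, hours)
```

Comparing baseline_sd on the original and cleaned series, per day type (all days, weekdays, weekends), reproduces the kind of SD-reduction figures tabulated below.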

Swimming Pool Complex
For the swimming pool complex location, where throughout the summer period (June 1-August 31) an additional open-air swimming pool is open for public usage, the consumption baseline investigation was conducted using data collected throughout the full year, the summer period, and the rest of the year. Based on the results obtained from the full-year energy consumption baseline analysis, it was concluded that, after the cleaning process, the yearly standard deviation of the baseline was reduced on average by 0.52% for all days in the data set, by 0.43% for weekdays, and by 0.75% for weekends (see Table 6, Figures 9 and 10).
For the summer period consumption, a reduction of the standard deviation (SD) of 0.36% for all the data, 0.41% for weekday data, and 0.23% for weekends was obtained (see Table 7). For the rest-of-the-year data, an increase to 0.63% of the SD reduction was observed for all the data, with 0.41% for weekday data and 1.2% for weekend data (see Table 8). In order to provide a better understanding of the impact of the data cleaning on the consumption profile, a plot was created for both weekdays and weekends, for both the original and cleared full-year data sets (see Figure 11).

Figure 9. Energy consumption variation range compared to hourly average weekdays consumption for original and cleared data sets.


Figure 10. Energy consumption variation range compared to hourly average weekends consumption for original and cleared data sets.

Figure 11. Daily consumption baseline for the original and cleared data sets.


Faculty of Electrical Engineering
The energy profile of the Faculty of Electrical Engineering is more dynamic during the semesters; therefore, the analysis was conducted for the full-year data and for the 1st and 2nd semester data separately. The same approach was applied both for the Faculty of Building Services and for the Marasti Students Campus locations.
The results showed that, for the baseline curve of the full-year data set, an average standard deviation (SD) reduction of 0.79% was obtained for all days, 0.45% for weekdays, and 1.6% for weekends (see Table 9, Figures 12 and 13). The consumption baselines for the original and cleared data sets are presented in Figure 14. For the 1st semester (see Table 10), an average SD reduction of 1.09% for all days was recorded, with 0.51% for weekdays and 2.57% for weekends, while for the 2nd semester (see Table 11) the reductions were 0.24% for all days, 0.26% for weekdays, and 0.19% for weekend days.

Figure 14. Daily consumption baseline for the original and cleared data sets.

Faculty of Building Services
At the Faculty of Building Services location, the results showed that, due to the data cleaning process, a standard deviation (SD) reduction of 0.22% for all days, 0.24% for weekdays, and 0.17% for weekends was recorded over the entire historical data set (see Table 12, Figures 15 and 16). It was observed that, for the 1st semester, the SD reduction increased to 0.31% for weekdays, with 0.27% for all the data and 0.18% for weekends (see Table 13). For the 2nd semester, an average SD reduction of 0.16% for all days, 0.17% for weekdays, and 0.13% for weekend days was noticed (see Table 14). The daily consumption baselines for the original and cleared data sets are showcased in Figure 17.

Marasti Students Campus
Even if the Marasti Students Campus data set is the most volatile and issues were encountered with the DBSCAN algorithm during the outlier detection process, after the adjustment of the detected and validated outliers, an average standard deviation reduction for the full-year data of 0.66% for all days, 0.61% for weekdays, and 0.78% for weekend days was recorded (see Table 15, Figures 18 and 19). For the 1st semester, an average standard deviation (SD) reduction of 0.65% for all the data, 0.74% for weekdays, and 0.44% for weekend days, respectively, was noticed (see Table 16). In the 2nd semester, an SD decrease of 0.67% for all days, 0.59% for weekdays, and 0.87% for weekend days was computed (see Table 17). The daily consumption baseline for the original and cleared data is highlighted in Figure 20.

Figure 19. Energy consumption variation range compared to hourly average weekend consumption for original and cleared data sets.

Conclusions
During the DR-BOB "Demand Response in Blocks of Buildings" project, the energy demand of 12 buildings at four different locations was monitored, and 36 demand response events were successfully implemented during an evaluation period of one year. In order to improve the quality of the baseline data and of the key performance indicator (KPI) evaluation, a data cleaning process was proposed (interquartile range, median absolute deviation, local outlier factor, density-based spatial clustering of applications with noise, and an intelligent scoring method). The numerical results showed that, even in the case of different energy profiles, the cleaned data reduced the standard deviation of the baseline in all cases, by an average of 0.41%, which means that the nature of the data is not affected by the removal of outliers, while more accuracy is gained in the baseline. More than that, this highlights the fact that a cleaning process applied to energy data, before any qualitative or quantitative processing, can significantly improve the quality of the results. It is also important to mention that, in addition to the proposed outlier detection techniques, a custom integrated function was created in order to differentiate between natural energy peaks and real outliers.
For the demand response project implemented at the level of the Technical University of Cluj-Napoca's block of buildings, in some cases it was impossible to compute the required KPIs (power reduction, energy savings, cost savings, CO2 emissions reduction) without the data cleaning process, or the resulting values were far higher (>50%) and not realistic. This is why, in real-case business models, in order to rely upon the demand response in blocks of buildings (DR-BoB) demo pilot, it is crucial to use outlier removal techniques. The presented improved baseline data cleaning process will facilitate benchmarking and validating the integration of newly incorporated RES technologies [69] and their associated KPIs in the RE-COGNITION "REnewable COGeneration and storage techNologies IntegraTIon for energy autONomous buildings" project.