Holistic Evaluation of Demand Response Events in Real Pilot Sites: From Baseline Calculation to Evaluation of Key Performance Indicators

Abstract: Explicit demand response plays a significant role in the future energy grid transition, as it involves end consumers in smart grid activities and, at the same time, exploits the potential of flexibility, giving grid operators the opportunity to accommodate a total amount of energy without the need to reinforce the grid infrastructure. To evaluate the success of a demand response program, and thus its advantages, it is fundamental to have an accurate baseline consumption curve along with meaningful key performance indicators. In this work, we propose a novel way of calculating the baseline consumption using artificial intelligence techniques. In particular, regression models have been applied to a database of historical data. In order to present a complete evaluation of demand response programs, we present five key performance indicators (KPIs). The KPIs have been selected so as to depict the success of the explicit demand response program. We suggest a novel way of evaluating two of the five KPIs using a quantitative approach. We also apply the proposed methodology for baseline calculation and KPI evaluation in a practical example: two pilot sites have been used, and real-life scenarios of demand response events have been applied to actual nonindustrial consumers, especially residential consumers. The baseline has been calculated for these pilot sites and the KPIs have been evaluated for them. The presented results complete the picture of evaluating a real-life demand response program and show the effectiveness of the selected approach. The proposed schemes for baseline calculation and KPI evaluation can be used by the scientific community for evaluating future demand response programs, especially in the residential sector.


Introduction
In recent years, the grid has undergone great transformations, with various technologies implemented in order to integrate renewable energy sources and contribute to better energy management. In this context, demand response (DR) programs play a fundamental role. In such programs, incentives are given to consumers in order to change their consumption pattern, so as to avoid having great peaks in overall consumption, thus avoiding great investments in order to reinforce the grid [1].
Demand response attracts not only the interest of the research community, but also that of policy makers and stakeholders. In Europe, the recently approved Directive (EU 2019/944) [2] and Regulation (EU 2019/943) [3] for the internal market for electricity address demand response in several articles. According to the International Energy Agency, explicit demand response can be defined as incentive-based programs for consumers to shift their consumption, including automatic remuneration [4].

• Distributed building DR reliability;
• Energy consumption reduction in pilot sites;
• Energy cost savings in pilot demonstration sites;
• Reduction of CO2 emissions in pilot sites;
• Peak load reduction during pilot demonstration activities.
We propose the calculation procedure of two of the above KPIs, namely, the distributed building DR reliability and the peak load reduction during pilot demonstration activities, whereas the calculation methodology of the other three is extracted from the existing research achievements in the literature. The two KPIs whose evaluation is proposed in this paper are fundamental for the overall assessment of a demand response program, as they give a clear indication of the successfulness of the demand response events and overall peak load reduction.
For the realization of the KPI evaluation, the usage of the baseline is important. This has been calculated based on historical readings from the pilot sites and using machine learning algorithms.
The contributions and added value of this paper are summarised as follows:
• We propose a structured way of calculating the baseline for energy consumption to be used for demand response purposes, while we compare the selected model for baseline calculation with the other two models.
• We apply this baseline calculation methodology in real-life scenarios, by implementing it in real pilot sites, where consumers participate in demand response events.
• We further use this baseline in order to calculate the key performance indicators that characterize the demand response program itself. Five KPIs are used for this scope.
• We propose two novel KPIs and suggest a way to calculate them, namely the distributed building DR reliability and the peak load reduction during pilot demonstration activities.
• Therefore, we propose a complete way of assessing a demand response program step-by-step: baseline calculation methodology creation; application of baseline to real-pilot sites; use of baseline for KPIs calculation; and suggestion of novel KPIs for the demand response programs evaluation.
• The proposed methodology for baseline calculation and KPI evaluation can be utilised in future works by the scientific community for the evaluation of demand response programs.
The rest of the paper is structured as follows: Section 2 describes the pilot sites and the demand response events that have been used for this work. Section 3 presents the baseline calculation methodology, the models and the parameters used for this purpose. Section 4 shows the KPIs that have been used for evaluating the real-life demand response scenarios studied in this work. Section 5 shows the results obtained with respect to the KPIs evaluation. Finally, Section 6 concludes the paper.

Pilot Sites and DR Events Description
In this Section, we describe the two pilot sites that have been considered for this work and the demand response events that have been realized. The pilot sites were selected according to the specific needs of the DRIMPAC project, which aims at exploiting the flexibility assets of residential and tertiary buildings. The project offers the necessary ICT infrastructure and develops the required protocols and standards for facilitating the participation of flexibility assets in energy markets. For this reason, it was fundamental to have multiple pilot sites comprising residential and tertiary buildings. Both pilot sites described in this work include residential and office building consumers. Demand response programs have been applied in order to examine the flexibility that can be offered by such end-customers. Therefore, the sites contribute to the overall project's needs. On the other hand, the work described here is important in order to define the baseline for the final evaluation of the success of such demand response programs.
During the pilot site demand response programs, several assets have been used, namely: building management systems or energy management systems, smart meters, heating, ventilation and air conditioning (HVAC) systems (including thermostats), domestic hot water heating, and smart home solutions such as washing machines, dishwashers and ovens.
During demand response events, such assets were automatically switched on/off in order to alter energy consumption and reduce the peaks in the grid. For this reason, the energy management systems played an important role in enabling the control of assets within the buildings.

Cypriot Pilot Site
The pilot site of Cyprus is within the university campus. The range of campus buildings includes educational facilities, office buildings, restaurants and sport centres, for a total covered area of 80,000 m². These buildings have been constructed to be energy efficient, since they are used daily by hundreds of students and over 1500 personnel. There are three PV installations in the campus: a 70 kWp and a 150 kWp rooftop system and a small 175 kWp PV farm. The installation of a 5 MWp solar park, combined with 2.35 MWh of electrochemical storage, is intended to contribute to the self-sufficiency capability of the campus.
There exist several building management systems from various vendors, like Siemens, Johnson Controls and Honeywell, with which DRIMPAC solutions can be tested. In addition, supervisory, control and monitoring equipment is installed in order to oversee building operations. The building energy management systems from Siemens can support explicit demand response programs in buildings, since there is direct control through a custom central EMS [18].
Several campus buildings participate in the DRIMPAC project, selected for their heterogeneous set of services and functional requirements, namely:
• Administration Building-ADM (Tertiary)
• Library Building (Tertiary)
• Faculty of Economics and Management-FEB (Tertiary)
• UCY Residential Blocks (Residential)
• PV Technology Laboratory nanogrid (Large DER)
Figure 1 shows the main buildings in the Cypriot pilot site and the connections among different devices/buildings. It can be observed that building energy management systems or central energy management systems have been used in order to achieve demand response events for this pilot site.
With respect to the Demand Response events that took place in this pilot site, a total number of 250 DR signals were sent during the period from the 17 June 2022 to the 15 July 2022. All the DR events lasted for 15 min and were sent at random periods during the day. These DR events have been considered for the evaluation of the KPIs and all the associated calculations.

Spanish Pilot Site
The Spanish pilot site entails three groups of facilities [18]:
1. Joven Futura residential neighbourhood: It includes four independent buildings, each one comprising 30 eight-story flats, for a total area of around 6500 m². The buildings are managed by an external facility manager. The annual electricity consumption is around 100 MWh, with an associated cost of 19,000 euros. All houses are equipped with smart meters, whereas some are also equipped with smart thermostats.
2. Parque Cientifico de Murcia: A single office building with several facilities (e.g., offices, meeting rooms, a kitchen) covering an area of 3600 m². Its annual consumption is around 1.5 GWh, with an associated energy cost of 160,000 euros. It is worth noting that 75% of the energy is consumed by a data centre, whereas the remaining 25% is consumed by offices. The building is equipped with smart devices, i.e., thermostats, temperature and illumination sensors.
3. Magalia Business Centre: This office building covers an area of 7800 m². In addition to offices, the building includes an auditorium, underground parking, a restaurant plus kitchen, and meeting rooms. There is an in-house facility manager that manages the building. The annual energy consumption is around 120 MWh, with a cost of 12,000 euros. It is equipped with smart devices, like smart thermostats and temperature and light sensors.
In addition to the three locations being close to one another, they are handled by the same energy provider; thus, we refer to them as one pilot site. Accordingly, the same approach has been used for all three locations and the same concept has been applied when performing demand response programs in these locations.
The schematics of the buildings in the Spanish pilot site are illustrated in Figure 2. It can be observed how the various appliances/loads within the buildings are connected to electricity measuring points in order to enable demand response programs.

With respect to the Spanish pilot site, 248 DR events were sent both to residential and office buildings. The DR events were sent during the period between 5 July 2022 and 15 July 2022, with each event lasting 15 min. These DR events occurred during random times of the day. For the residential houses, 204 events were sent, whereas for the offices, 44 events were sent. For the evaluation of the KPIs, these 248 DR events have been considered and the calculations have been based on them.

Baseline Methodology Calculation
In this Section, we give the description of the baseline methodology calculation we suggest. We give details about the models that have been considered for such calculation, along with the parameters that have been considered and the steps that we followed. We describe how we ran the proposed algorithms and we give information about the usage of historical data obtained from the pilot sites.
First, there are three important steps for applying any possible methodology/algorithm for the baseline calculation:
1. Data acquisition and processing;
2. Exploratory data analysis;
3. Model application and baseline calculation.
As the name suggests, the first step has to do with data acquisition. It is related to obtaining all necessary historical data from the pilot sites in order to utilize it for the baseline calculation. Such data entail mainly an energy consumption profile, together with several crucial parameters such as luminance, humidity and temperature, where available.
The second step has to do with performing a statistical analysis of the available data, so as to come up with trends and identify outliers, if any, spot errors in the collected data and verify that the data used to extract the baseline are correct and will lead to the best possible results.
The third step has to do with applying a specific model in order to extract the baseline. As mentioned in the Introduction, [14] provides a thorough analysis of different methodologies for the baseline calculation. In this work, we take into account three different methodologies, extracting the baseline and comparing the results from each method. The three methodologies were selected to suit the available data and the nature of the pilot sites, as these are two important parameters, according to [14]. Complexity was an important issue in addition to accuracy; the target was to reach maximum accuracy while keeping complexity at relatively low levels. In relation to the methods for the baseline calculation described in [14], the following table lists the selected methodologies for this work and the reasoning behind this choice.

As can be seen from Table 1, there are many methodologies that can be applied in order to extract the baseline. The day-matching technique was not selected, as the accuracy reported is relatively low. With respect to the control group methodology, this is usually applied when there are no historical data, and the optimal target group is new users. This was not the case for our two pilot sites, as historical data were available and, for this reason, no control group was available. The self-reported baseline is mainly a way for a specific consumer to report the baseline consumption. In our case, we needed to have a structured methodology for baseline calculation for each pilot site. Therefore, this methodology was not selected in our study. The need to have a concrete model for baseline calculation was also the reason why probabilistic estimation techniques were not chosen. Finally, the unsupervised learning techniques were not selected due to their reported increased complexity.
Energies 2023, 16, 6048
Table 1. Baseline calculation methodologies, as listed in [14], and their relation to this work.
We selected two main categories, namely, the regression models and the neural networks. In particular, with respect to regression models, we used two methodologies: the multivariable regression analysis, where multiple variables are taken into consideration, and the univariate linear regression model. We also used neural networks for the baseline calculation, in particular convolutional neural networks. The three resulting models were selected because it is reported that they provide medium to high accuracy while maintaining complexity at medium/high levels. All three models are concrete methodologies reported in the literature review and used widely for baseline calculation. We compared all three techniques and kept the most promising one in terms of results to be used for baseline energy consumption calculation.

Methodology
In the following, we give a detailed description of the steps required for the baseline calculation. The first two steps are the data acquisition and the exploratory data analysis. The third step has to do with the application of a specific model for the baseline estimation. Since we applied three different algorithms at this third step, we analyze each one separately.

Data Acquisition and Processing
In this subsection, we describe the process of acquiring the data from the pilot sites, which will be used as the basis for estimating the baseline consumption.
A REST application programming interface (REST API) was used to retrieve timeseries data. The duration covered by the data points available through the API varied, depending on the equipment installed in the pilot sites and its operational status. An additional API was used, namely the "prosumers configuration retrieval API", which gave useful information about the kind of loads in one or multiple buildings and the equipment installed in the pilot sites.
Energy consumption data were available with a time granularity of 15 min or 30 min depending on the data availability. In addition, smart boxes were used in order to retrieve data related to humidity, temperature and luminance, referring to indoor values. For the Cypriot pilot site, the luminance was not available, therefore, global horizontal irradiance was used. These values served for the multivariable regression model, where they have been used as the energy demand determinants.
The procedure to retrieve data is as follows:
• In order to ensure that only trusted users had access to data, a POST request to the user authentication API was used to obtain a user authentication token.
• In order to request the data for analysis via the timeseries data retrieval API, a GET request was used. The key parameters included the item UUID (universally unique identifier) for which timeseries data are retrieved. This unique identifier determines, among others, the type of load measured, the measurement unit and the location of the measurement (in which place of the residence the measurement took place).
• The format of data gathered was JSON. A cleaning process took place, so as to use only the necessary parts of the information and store this useful data in a suitable CSV format for the following steps of the analysis.
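The cleaning step at the end of this procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the project: the JSON layout (a `values` list of records with `timestamp` and `value` fields next to other metadata) and the function name `timeseries_json_to_csv` are hypothetical, since the response schema of the timeseries data retrieval API is not specified here.

```python
import csv
import io
import json


def timeseries_json_to_csv(raw_json: str) -> str:
    """Keep only timestamp/value pairs from a JSON payload and return CSV text.

    The payload layout is hypothetical; adapt the field names to the
    response actually returned by the API.
    """
    payload = json.loads(raw_json)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "value"])
    for record in payload.get("values", []):
        # Skip incomplete records instead of writing blank cells.
        if "timestamp" not in record or "value" not in record:
            continue
        writer.writerow([record["timestamp"], record["value"]])
    return buffer.getvalue()


# Hypothetical example payload; the last record has no value and is dropped.
example = json.dumps({
    "itemUuid": "hypothetical-uuid",
    "unit": "kWh",
    "values": [
        {"timestamp": "2022-06-17T10:00:00Z", "value": 1.25, "quality": "ok"},
        {"timestamp": "2022-06-17T10:15:00Z", "value": 1.5, "quality": "ok"},
        {"timestamp": "2022-06-17T10:30:00Z"},
    ],
})
csv_text = timeseries_json_to_csv(example)
```

Only the two fields needed for the later analysis steps are kept; everything else in the payload is discarded at this point.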

Exploratory Data Analysis (EDA)
The exploratory data analysis delivers a statistical analysis of the data in question. It helps in identifying the structure of the data and possible patterns. It is fundamental in order to detect possible errors that occurred during the data acquisition process, identify outliers, uncover empirical relationships, indicate statistical assumptions and propose possible models for the data [19]. Exploratory data analysis is therefore essential to guarantee the quality of the available data before it is used for the baseline calculation.
For the correct implementation of analytical tools and statistical methods on the available data, it is important that this is stored together with a correct timestamp. This condition is valid not only for energy consumption data, which are the historical data used for estimating the energy baseline, but also for crucial parameters influencing energy consumption, like weather conditions, e.g., temperature, luminance and humidity. Such parameters impact energy consumption, since good or bad weather has a direct effect on energy consumption. Thus, all such parameters are decisive for the final energy baseline calculation, and it is necessary that they are accompanied with a correct timestamp.
Part of the exploratory data analysis can include possible correction techniques for the available data. For example, it is common when acquiring big data to have missing data samples. This can induce problems, like introducing bias, creating issues in data processing and reducing efficiency [20]. There are various techniques to compensate for missing data, although their applicability might depend on the nature of the missing data. In our case, the time series was retrieved through the REST API. The issues that came up during data retrieval and the techniques applied to compensate for these issues, if any, are listed as follows:

• The first problem arose when the corresponding smart box/smart meter was offline, meaning that no data were transmitted. To deal with this issue during the EDA, the period during which these devices were offline was not considered in the analysis. Less than 2% of the total data were identified as missing due to this problem. It was not possible to compensate for such missing data, as the deactivation of the devices occurred completely randomly.
• A second problem was that, although smart boxes/smart meters were online, the data values were not recorded properly. Less than 1% of the total data were missing due to this problem. To compensate for this, the "fillGaps" API parameter was utilized in order to fill in gaps in the data series based on adjacent values.
• A third problem had to do with the parameters influencing energy consumption, e.g., temperature, luminance and humidity. Specifically, the vector size was not the same for all these parameters, due to the sensor used for each measurement. Therefore, the vector with the minimum size had to be used, and the remaining vectors were adjusted accordingly.
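The three corrections listed above can be illustrated with a small sketch. It rests on assumptions that are not stated in the text: missing readings are represented as `None`, gap filling uses the mean of the two adjacent values (the exact behaviour of the "fillGaps" parameter is not described), and vectors are aligned by truncating them to the shortest length. All function names are hypothetical.

```python
from typing import List, Optional, Sequence


def fill_isolated_gaps(series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace a single missing reading by the mean of its two neighbours.

    Longer runs of missing readings (e.g. an offline device) are left as None.
    """
    filled = list(series)
    for i in range(1, len(series) - 1):
        left, right = series[i - 1], series[i + 1]
        if series[i] is None and left is not None and right is not None:
            filled[i] = (left + right) / 2
    return filled


def drop_missing(series: Sequence[Optional[float]]) -> List[float]:
    """Exclude readings that are still missing after gap filling.

    With timestamped rows one would drop the whole row instead.
    """
    return [v for v in series if v is not None]


def align_to_shortest(*vectors: Sequence[float]) -> List[List[float]]:
    """Truncate every vector to the length of the shortest one."""
    n = min(len(v) for v in vectors)
    return [list(v[:n]) for v in vectors]


energy = [1.0, None, 3.0, None, None, 6.0]
filled = fill_isolated_gaps(energy)   # only the isolated gap is filled
usable = drop_missing(filled)         # the two-sample outage is excluded
temperature, humidity = align_to_shortest(
    [20.0, 21.0, 22.0, 23.0, 24.0],
    [50.0, 51.0, 52.0],
)
```

Treating isolated gaps and longer outages differently mirrors the distinction made above between improperly recorded values and offline devices.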

Multivariable Linear Regression Model
As mentioned in the beginning of this section, we have applied a multivariable linear regression model based on several independent variables in order to calculate the energy baseline. These independent variables are used to predict the output, which in our case is energy consumption. Thus, energy consumption is our dependent variable, and the independent variables are luminance, temperature and humidity. The variables considered in this work have been chosen according to the literature review presented in the Introduction, focusing mainly on previous projects in the field and calculations of the energy baseline. Further investigation would be necessary to define whether or not other variables (e.g., population density, unit effectiveness) can be crucial for the energy baseline, which is left as future work.
This method for calculating the baseline belongs to the supervised machine learning algorithms and requires a series of data and multiple data variables to be executed.
For such a model to be applied, the series of input data needs to be partitioned into training and testing sets. Usually, most of the data are allocated for training the model. In our case, we allocated 70% of the data for training and 30% for testing the model. The mathematical representation of the model is given in the following equation:
$$\text{Energy consumption} = a \cdot x_1 + b \cdot x_2 + c \cdot x_3 + \text{intercept}$$
where $x_1$, $x_2$ and $x_3$ denote the three independent variables, a, b, and c are the coefficients for each of the variables and intercept is a constant. In order to evaluate the model's performance, two metrics were used, among them the root mean square error (RMSE). Another metric similar to RMSE is the Mean Absolute Error (MAE). However, the one that is mostly used in regression models is the RMSE, which penalizes large forecast errors more severely.
In the following chapter, we present the results of this model (together with the associated linear coefficients for the three independent variables) when applied on the Cypriot and Spanish pilot sites.
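The fitting and evaluation procedure described in this subsection can be sketched as follows. This is an illustrative sketch only: the data are synthetic and noise-free, the 70/30 split is taken in chronological order (the text does not state how the split was made), and the least-squares fit is done through the normal equations in plain Python rather than with whatever library was actually used.

```python
import math
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares fit; returns [a, b, c, intercept]."""
    rows = [list(r) + [1.0] for r in X]  # constant column for the intercept
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(coeffs: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    *weights, intercept = coeffs
    return [sum(w * v for w, v in zip(weights, row)) + intercept for row in X]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic determinants and a known linear relation (no noise).
X = [[float(i), float((i * i) % 7), float((3 * i) % 5)] for i in range(20)]
y = [0.5 * r[0] + 2.0 * r[1] - 1.0 * r[2] + 4.0 for r in X]

split = (7 * len(X)) // 10            # 70% training, 30% testing
coeffs = fit(X[:split], y[:split])    # [a, b, c, intercept]
y_hat = predict(coeffs, X[split:])
test_rmse = rmse(y[split:], y_hat)
test_mae = mae(y[split:], y_hat)
```

Because the synthetic data contain no noise, the fitted coefficients match the generating ones and both error metrics are close to zero; on measured data the RMSE is never smaller than the MAE, which is why it reacts more strongly to occasional large errors.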

Univariate Linear Regression Model
The univariate linear regression model considers only historical energy consumption data and disregards the other variables considered in the previous subsection, namely temperature, humidity and luminance. A linear regression is fitted to the data using a lag feature. For the creation of the lag feature, the observations of the target series are shifted so as to appear as if they had occurred later in time. We have used a one-step lag feature; in general, the concept is the same when multiple steps of shifting are used. The resulting model is described as follows:
$$\text{Target} = \text{weight} \cdot \text{lag} + \text{intercept}$$
where:
Target: forecasted energy consumption;
weight: a coefficient factor produced by the model;
lag: energy consumption from one step behind in time;
intercept: a coefficient factor produced by the model.
With this model, every predicted value is calculated from the value of the previous step, after applying the weight and intercept factors to it. In particular, we have used 96 values corresponding to the energy consumption values of one day (15 min intervals between the values). These values have been used by our model in order to extract the energy consumption of the following day. This concept was used in order to produce the energy baseline based on historical data only.
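A one-step lag regression of this kind can be sketched as follows. The series is synthetic, and the recursive roll-forward used to obtain the 96 values of the following day is an assumption made for illustration; the text does not state how the next-day values were produced from the one-step model.

```python
from typing import List, Sequence, Tuple


def fit_lag_model(series: Sequence[float]) -> Tuple[float, float]:
    """Fit Target = weight * lag + intercept with a one-step lag feature."""
    lag = list(series[:-1])     # value one step behind in time
    target = list(series[1:])   # value to be forecast
    n = len(lag)
    mean_lag = sum(lag) / n
    mean_target = sum(target) / n
    sxx = sum((x - mean_lag) ** 2 for x in lag)
    sxy = sum((x - mean_lag) * (t - mean_target) for x, t in zip(lag, target))
    weight = sxy / sxx
    intercept = mean_target - weight * mean_lag
    return weight, intercept


def forecast(last_value: float, weight: float, intercept: float, steps: int) -> List[float]:
    """Roll the one-step model forward, feeding each prediction back as the lag."""
    out = []
    value = last_value
    for _ in range(steps):
        value = weight * value + intercept
        out.append(value)
    return out


# Synthetic day of 96 quarter-hour readings following y[t] = 0.8 * y[t-1] + 1.
day = [10.0]
for _ in range(95):
    day.append(0.8 * day[-1] + 1.0)

weight, intercept = fit_lag_model(day)
next_day = forecast(day[-1], weight, intercept, steps=96)
```

On this synthetic series the fit recovers the generating weight and intercept; real consumption data would of course leave a residual error.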

Convolutional Neural Networks (CNN)
Convolutional neural networks are deep learning models that can be applied to time series forecasting. There are several types of models under this category that can be applied to historical data of energy consumption in order to predict the energy baseline consumption. The goal has been to develop a model capable of forecasting the energy consumption of the following 7 days, where each day is divided into 48 or 96 intervals of 30 or 15 min each. Since there are multiple forecasting steps, this is a multistep time series forecasting problem. Given that the model uses multiple input variables, it can be characterized as a multivariable multistep time series forecasting model. To evaluate the model, the RMSE metric has been used; it is calculated for each 30 or 15 min time interval for which a prediction of the energy baseline is made. Therefore, we end up having multiple intervals over the considered seven-day period for which the energy baseline is forecasted.
It is worth mentioning that the so-called walk-forward validation scheme is used in order for the model to forecast the energy baseline consumption. This means that, after the model executes a 7-day prediction for the energy baseline, the actual consumption of these 7 days is used in order to make a prediction for the following week. This is beneficial since the model gets to have the most accurate data in order to execute the energy baseline forecasting. Further details on the specific training and architecture of the CNN are provided next.
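The walk-forward scheme and the per-interval RMSE can be sketched as follows. To keep the example self-contained, the CNN is replaced by a naive stand-in model that repeats the last observed week, and each "week" has only three intervals instead of 7 × 48 or 7 × 96; both choices are simplifications for illustration, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def naive_weekly_forecast(history: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in model: predict the next week as a copy of the last observed week."""
    return list(history[-1])


def walk_forward(
    train_weeks: Sequence[Sequence[float]],
    test_weeks: Sequence[Sequence[float]],
    forecast_fn: Callable[[Sequence[Sequence[float]]], List[float]],
) -> List[List[float]]:
    """Forecast one week at a time, then add that week's actuals to the history."""
    history = [list(w) for w in train_weeks]
    predictions = []
    for actual_week in test_weeks:
        predictions.append(forecast_fn(history))
        history.append(list(actual_week))  # actual consumption becomes available
    return predictions


def rmse_per_interval(
    actual_weeks: Sequence[Sequence[float]],
    predicted_weeks: Sequence[Sequence[float]],
) -> List[float]:
    """One RMSE value per forecast interval, computed across all test weeks."""
    scores = []
    for s in range(len(actual_weeks[0])):
        squared = [(a[s] - p[s]) ** 2 for a, p in zip(actual_weeks, predicted_weeks)]
        scores.append(math.sqrt(sum(squared) / len(squared)))
    return scores


train = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
test = [[3.0, 4.0, 5.0], [5.0, 6.0, 7.0]]
predictions = walk_forward(train, test, naive_weekly_forecast)
scores = rmse_per_interval(test, predictions)
```

The second forecast is based on the actual values of the first test week, not on the first forecast, which is the defining property of walk-forward validation.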
For time-series analysis with CNNs, the input data are represented as a 3D tensor with dimensions (samples, timesteps, channels). "Samples" refer to individual time-series sequences, "timesteps" denote the number of observations in each sequence, and "channels" represent univariate time series or variables. In our case, the univariate variables are the historical time-series data of energy consumption, implying that the "channels" dimension is set to 1.
Essentially, the data are structured to capture the temporal nature of the time series, with each sample having a series of observations over time. Before feeding the data into the CNN model, proper preprocessing has been applied, including normalization and handling missing values.
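As a rough illustration of the (samples, timesteps, channels) layout, the following sketch slices a univariate series into overlapping input windows and multistep targets. The function name `to_supervised` and the window lengths are assumptions chosen for the example, not the settings used in the study.

```python
def to_supervised(series, n_in, n_out):
    """Slide a window over a univariate series.

    Returns X shaped (samples, n_in, 1) and y shaped (samples, n_out),
    both as nested Python lists.
    """
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        window = series[start:start + n_in]
        target = series[start + n_in:start + n_in + n_out]
        X.append([[value] for value in window])  # trailing axis holds the single channel
        y.append(list(target))
    return X, y


series = [float(i) for i in range(10)]
X, y = to_supervised(series, n_in=4, n_out=2)
shape = (len(X), len(X[0]), len(X[0][0]))  # (samples, timesteps, channels)
```

With a ten-point series, four input steps and two output steps, this yields five samples, each with four timesteps and one channel.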
Furthermore, we used a model with one convolution layer with 16 filters and a kernel size of three. This means that the input sequence of seven days is read with a convolutional operation three time steps at a time and that this operation is performed 16 times. A pooling layer reduces the size of these feature maps by one-fourth before the internal representation is flattened to one long vector. This vector is then interpreted by a fully connected layer before the output layer predicts the next seven days in the sequence.
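To make the data flow through the convolution, pooling and flattening steps concrete, the sketch below traces the sequence lengths with plain Python. Several details are assumptions that the text does not specify: no padding, a stride of one, a pooling window of two and 15 min data (7 × 96 input steps). The averaging kernel is only a placeholder for learned weights.

```python
def conv1d_valid(seq, kernel):
    """1D convolution without padding and with stride 1 (assumed settings)."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]


def max_pool1d(seq, pool_size):
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    return [max(seq[i:i + pool_size])
            for i in range(0, len(seq) - pool_size + 1, pool_size)]


N_FILTERS = 16          # from the text
KERNEL_SIZE = 3         # from the text
POOL_SIZE = 2           # assumption, not stated in the text
timesteps = 7 * 96      # seven days of 15 min intervals

signal = [float(i % 96) for i in range(timesteps)]           # synthetic input
feature_map = conv1d_valid(signal, [1.0 / KERNEL_SIZE] * KERNEL_SIZE)
pooled = max_pool1d(feature_map, POOL_SIZE)
flattened_length = N_FILTERS * len(pooled)                   # input size of the dense layer
```

Under these assumptions, each of the 16 feature maps has 670 values after the convolution and 335 after pooling, so the flattened vector has 5360 entries.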
The learning rate plays a crucial role in the estimation process. A higher learning rate allows the model to take larger steps during optimization, which may lead to faster convergence but may also risk overshooting the optimal solution. On the other hand, a smaller learning rate takes smaller steps, which can lead to more stable and accurate convergence but may result in slower training. In our case, we used an adaptive algorithm to control the learning rate during training. More specifically, we have used the efficient Adam implementation of stochastic gradient descent and fit the model for 20 epochs with a batch size of four.
The choice of the number of epochs for a CNN model depends on factors such as dataset size, model complexity and the specific problem. Selecting 20 epochs is a reasonable choice for moderate-sized datasets, allowing the model to see the data multiple times without overfitting. It is also suitable for scenarios with time and resource constraints (as in our case), since CNN training can be computationally expensive. Early stopping can be employed, monitoring validation performance to prevent overfitting. For our CNN model trained on time-series data of energy consumption, starting with a relatively small learning rate is generally a good practice to ensure stable training progress. The starting learning rate for such a model with 20 epochs falls within the range of 0.001 to 0.01.
During the training of our CNN model over 20 epochs using time-series data of energy consumption, the accuracy in the initial epochs (1-5) is low (around 10%) and the loss is relatively high (2.0 to 3.5). As the training progresses, accuracy improves steadily, reaching 40% to 60%, and the loss is reduced (around 0.5 to 1.5) in the mid-epochs (6-15). In the later epochs (15-20), accuracy continues to increase, reaching 60% to 80%, while the loss further decreases, leveling off in the range of 0.3 to 1.0 [21].
For the Cypriot pilot site, we had at our disposal data points with a granularity of 15 min for a total of 566 days, resulting in 54,336 energy consumption data points as historical data. This size has been more than enough to train the CNN deep learning algorithm. This dataset has been divided into the training set and the test set. To respect the rule of 70-30% for the training and test sets, we used 538 days as the training set and 28 days (4 weeks of 7 days) as the test set. We were able to forecast 4 weeks, since the whole set of historical data has been large enough to allow it.
On the other hand, for the Spanish pilot site, the whole set of historical data was smaller, but large enough so as to forecast 7 days. Therefore, we used 146 days of data as the training set and the final 7 days as the test set to evaluate the model.

Results for the Baseline Calculation
In this Section, we present the results of the methodology described in the previous Section when applied in our two pilot sites. We show the outcomes of the exploratory data analysis and the results of the three models applied for the energy baseline calculation.

Data Acquisition and Exploratory Data Analysis
The data obtained for the Cypriot pilot site covered a period of 566 days. It is worth mentioning that, for this pilot site, the luminance data came from the global horizontal irradiance (GHI) metric, as there were no available data for the specific pilot site.
We have obtained data corresponding to 15 min intervals, resulting in 54,336 data points. We combined two intervals in order to have information for a period of 30 min. Thus, the resulting average energy consumption is 31.24 kWh, while the maximum and minimum are 79.06 and −10.05 kWh, where the negative sign indicates production instead of consumption. The resulting total energy consumption is 1697.6 MWh. The mean value for the global horizontal irradiance is 227 W/m². Regarding the other variables, the mean temperature is 24.91 °C, whereas the mean humidity is 65.11%. Figure 3 shows the average, minimum and maximum values, together with the three quartiles (25%, 50% and 75%). Figure 4 shows the histogram plots for the following variables of the Cypriot pilot site: energy, temperature, humidity and global horizontal irradiance.
Regarding energy, most values range between 20-30 kWh, whereas the temperature values are between 15-32 °C. For humidity, there are a lot of values higher than 70%, whereas GHI is related to the season which we are examining. Figure 5 shows the correlation among the examined variables. It should be noted at this point that, in terms of model creation and resulting outcomes, we are only interested in the correlation between the energy and the other variables. The correlation among the other variables themselves, i.e., the correlation between humidity and temperature or between global horizontal irradiance and humidity, is only shown for reasons of completeness.
It is noticed that there is a correlation between temperature and the GHI metric, as expected, whereas there is a negative correlation between humidity and temperature. On the other hand, the energy variable seems to have little, if any, correlation with the other variables. We are reminded that a value of 1 indicates a perfect positive correlation, meaning that when one variable increases, so does the other. A value of −1 indicates a perfect negative correlation, meaning that when one variable increases, the other decreases. A value of zero indicates that there is no linear relationship between the variables.
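The correlation values discussed above are assumed here to be Pearson coefficients, which is the usual choice for such correlation matrices. A minimal sketch of the calculation, using made-up temperature and humidity readings rather than pilot-site data, is the following:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Illustrative values only (not measurements from the pilot sites).
temperature = [15.0, 20.0, 25.0, 30.0]
humidity = [80.0, 70.0, 60.0, 50.0]              # falls linearly as temperature rises
scaled = [2.0 * t + 1.0 for t in temperature]    # rises linearly with temperature

r_neg = pearson(temperature, humidity)
r_pos = pearson(temperature, scaled)
```

The two toy pairs reproduce the limiting cases mentioned in the text: a coefficient of −1 for the perfectly decreasing pair and 1 for the perfectly increasing one.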

Multivariable Regression Model
To realize the multivariable regression model, we used 70% of the available data for training the model and the remaining 30% of the available data to test the model. After running the model, the resulting coefficients for the three considered variables and the intercept factor are as follows:

$$\text{Energy} = -0.069 \times \text{temperature} + 0.084 \times \text{humidity} + 0.0012 \times \text{luminance} + 28.2 \tag{3}$$

The values of the two used metrics, as described in Section 3.5, are:
We are reminded here that the mean value of energy consumption for a time period of 15 min is around 31.24 kWh. Therefore, the value of RMSE cannot be considered appropriate for an accurate energy baseline calculation model. As a result, the variables of temperature, humidity and luminance are not considered adequate to provide an accurate energy baseline model. This is in line with the exploratory data analysis, which showed that there was little correlation between the three predictors and energy consumption.
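As a quick plausibility check of Equation (3), the fitted model can be evaluated at the mean values reported in the exploratory data analysis (24.91 °C, 65.11% and 227 W/m²). The function name below is hypothetical, and the use of GHI as the luminance input follows the description given for the Cypriot site.

```python
def baseline_multivariable(temperature_c, humidity_pct, luminance):
    """Equation (3): fitted multivariable regression for the Cypriot pilot site."""
    return (-0.069 * temperature_c
            + 0.084 * humidity_pct
            + 0.0012 * luminance
            + 28.2)


# Evaluate at the mean weather values from the exploratory data analysis.
at_means = baseline_multivariable(24.91, 65.11, 227.0)

# Effect of a 10 degree temperature increase with everything else fixed.
temperature_effect = baseline_multivariable(34.91, 65.11, 227.0) - at_means
```

The prediction at the mean inputs is about 32.2 kWh, and a 10 °C increase changes it by only about −0.69 kWh. The intercept therefore dominates the output, which is consistent with the weak correlation between the weather variables and energy.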

Univariate Regression Model
As explained in Section 3.4, the univariate regression model uses the lag feature, meaning that historical data of energy consumption are used to evaluate energy consumption baseline, shifted by one day. We have in total 96 values, as historical data for one day are recorded in 15 min time intervals; this historical data is shifted by one day in order to be used for the univariate regression model. The application of this model to the available data returns the following mathematical formula for the energy consumption baseline: In the above equation, Energy(t) represents the predicted energy consumption for one particular interval, whereas Energy(t − 1) represents the energy consumed for the particular interval the previous day. It is also worth noting that, even though the slope might be positive, this does not imply that predicted energy levels will be always positive.
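The lag feature and the fit of a straight line to it can be sketched as follows. The helper names, the toy day length of four intervals and the synthetic series are assumptions for illustration; ordinary least squares is assumed as the fitting method.

```python
def lag_pairs(series, lag):
    """Pair every value with the value observed `lag` steps earlier."""
    return series[:-lag], series[lag:]


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy day length (assumption); it would be 96 for 15 min data.
INTERVALS_PER_DAY = 4

day_1 = [1.0, 2.0, 3.0, 4.0]
day_2 = [0.5 * value + 1.0 for value in day_1]   # synthetic next-day consumption
series = day_1 + day_2

previous_day, current_day = lag_pairs(series, INTERVALS_PER_DAY)
slope, intercept = fit_line(previous_day, current_day)
```

Because the synthetic second day was generated from the first with a slope of 0.5 and an intercept of 1.0, the fit recovers exactly these two values.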
The values of the two used metrics, as described in Section 3.5, are: R² = 0.388 or 38.8%; RMSE = 10.52 kWh. When compared to the metrics calculated for the previous regression model, we note that the univariate model outperforms the multivariable model. Figure 6 shows the comparison of the actual and predicted energy consumption for a period of one week (672 time intervals), using the univariate model for the Cypriot pilot site. The dots represent the actual values, whereas the blue line shows the values predicted by the model.
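For reference, the two metrics can be computed as in the sketch below, which assumes the standard definitions of RMSE and of the coefficient of determination. The actual and predicted values are invented for the example.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(squared / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual variation / total variation."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative values in kWh (not pilot-site data).
actual = [30.0, 32.0, 28.0, 34.0]
predicted = [31.0, 31.0, 29.0, 33.0]

error = rmse(actual, predicted)
fit_quality = r_squared(actual, predicted)

# Reported univariate RMSE relative to the reported mean consumption.
relative_rmse = 10.52 / 31.24
```

Relating the RMSE to the mean consumption helps with interpretation: the reported 10.52 kWh corresponds to roughly 34% of the 31.24 kWh mean.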

Convolutional Neural Networks
The convolutional neural network described in Section 3.5 is applied to the Cypriot pilot site. We calculate the RMSE for all 96 intervals corresponding to one day and it results in: RMSE = 2.368 kWh. This means that the forecasted values differ on average from the actual ones only by 2.368 kWh. Given the fact that for the 15 min time interval the average consumption is 31.24 kWh, the resulting RMSE is considered a good value. Figure 7 shows the RMSE for one day for the Cypriot pilot site, divided in 96 intervals of 15 min each. The lower the values of the RMSE, the more accurate the forecast of the energy baseline. In addition, the first RMSE calculated corresponds to the interval at 12:00 midnight, meaning that the last step corresponds to 11:45 pm. According to Figure 7, the easy periods for predicting are from midnight to around 05:00 am, between 14:30 to 19:15 and between 21:00 and midnight. The hardest time period to predict the energy baseline lies between 14:30 and 19:15 approximately.
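One way to obtain an RMSE value per intraday interval, as plotted in Figure 7, is to compare forecasts and actual values across all forecasted days, interval by interval. The sketch below assumes this aggregation, which the text does not spell out, and uses three toy intervals per day instead of 96.

```python
from math import sqrt


def rmse_per_interval(actual_days, predicted_days):
    """Return one RMSE per intraday interval, computed across all days."""
    n_days = len(actual_days)
    n_intervals = len(actual_days[0])
    scores = []
    for i in range(n_intervals):
        squared = sum((actual_days[d][i] - predicted_days[d][i]) ** 2
                      for d in range(n_days))
        scores.append(sqrt(squared / n_days))
    return scores


# Two toy days with three intervals each (illustrative values in kWh).
actual = [[10.0, 20.0, 30.0], [12.0, 22.0, 28.0]]
predicted = [[10.0, 23.0, 30.0], [12.0, 18.0, 28.0]]

scores = rmse_per_interval(actual, predicted)
hardest_interval = scores.index(max(scores))
overall = sum(scores) / len(scores)   # average of the per-interval values
```

The interval with the largest score is the hardest one to forecast, which is how the easy and hard periods of the day can be read from such a plot.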

Data Acquisition and Exploratory Data Analysis
The collected data for the Spanish pilot site cover a period of approximately 5.5 months. Data are collected for the same four parameters used in the Cypriot pilot site, namely energy, luminance, temperature and humidity. The data regarding energy consumption come from smart meters and are measured in kWh. The rest of the variables are collected from sensors and are expressed in lumens (luminance), °C (temperature) and percentage (humidity). All data are collected with an interval of 30 min. Figure 8 shows the maximum, minimum and mean values for the four parameters mentioned, together with the three quartiles (25%, 50%, 75%).
In total, we have gathered 7344 datapoints with a recorded average energy consumption of 0.197 kWh and a maximum of 2.31 kWh. The mean value for temperature was 26.94 °C and the mean values for humidity and luminance were 53.14% and 7.17 lumens, respectively. Figure 9 shows the histograms for the four examined parameters: energy, luminance, temperature and humidity.
With respect to energy, the peak is around 0.2 kWh (for a period of 30 min). For temperature, the peaks are around 28-29 °C, whereas humidity approximately follows a normal distribution with a mean value around 55%. Figure 10 shows the correlation among the four examined variables. It is observed that there is very little, if any, correlation among the variables, which is an indication of the performance of the multivariable regression model. It should be noted that, in terms of model creation and resulting outcomes, we are only interested in the correlation between the energy and the other variables. The correlation among the other variables themselves, i.e., the correlation between humidity and temperature or between global horizontal irradiance and humidity, is only shown for reasons of completeness.


Multivariable Regression Model
In this subsection, we present the results of the multivariable regression model applied to the Spanish pilot site. First of all, 70% of the available data have been allocated to the training set (5140 out of 7344 datapoints) and the remaining 30% have been assigned to the test set (2204 out of 7344 datapoints). After running the model, the resulting coefficients for the three considered variables and the intercept factor are as follows:

$$\text{Energy} = -0.0088 \times \text{temperature} - 0.0033 \times \text{humidity} + 0.0002 \times \text{luminance} + 0.605 \tag{5}$$

The values of the two used metrics, as described in Section 3.5, are: R² = 0.0478 or 4.78%; RMSE = 0.218 kWh.
We are reminded here that the mean value of energy consumption for a time period of 30 min is around 0.2 kWh. Therefore, the RMSE value here cannot be considered acceptable, meaning that the three parameters of luminance, humidity and temperature do not provide a sufficiently accurate prediction of energy consumption.

Univariate Regression Model
In this subsection, we present the results when applying the univariate regression model to the Spanish pilot site. The obtained formula for the predicted energy consumption baseline is the following:

$$\text{Energy}(t) = 0.619 \times \text{Energy}(t-1) + 0.0759 \tag{6}$$

As for the Cypriot pilot site, Energy(t) represents the predicted energy consumption for one particular interval, whereas Energy(t − 1) represents the energy consumed in the same interval of the previous day. In this case, we have 30 min intervals for the available historical data. It is also worth noting that, while the slope is indeed positive, this does not imply that predicted energy levels will be always positive, as the value of the slope is smaller than 1. For example, for the Spanish pilot site, the relationship is y = 0.619x + 0.0759. That is, assuming a value of x = 1 kWh for Energy(t − 1), the predicted value y for Energy(t) will be 0.695 kWh, which is lower. One should also consider that the coefficient of determination for this method was in any case relatively low, which is why it was outperformed by the CNN.
The values of the two used metrics, as described in Section 3.5, are: R² = 0.379 or 37.9%; RMSE = 0.1746 kWh. When compared to the metrics calculated for the previous regression model, we notice that the univariate model outperforms the multivariable model. For the RMSE value, we are reminded that the closer the value is to zero, the better the performance of the model. Figure 11 shows the comparison of the actual energy consumption and the forecasted one for the period of one week. Again, the dots represent the actual consumption points and the blue line the forecasted consumption given by the model. From the graph, we notice that the accuracy increases when the values of energy are lower (0.1-0.3 kWh), whereas we have lower accuracy for higher values of energy consumption.
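The behaviour of Equation (6) can be explored numerically. The sketch below evaluates the fitted line for a high and a low previous-day value and computes the point at which the prediction equals its input. The constant and function names are illustrative.

```python
SLOPE = 0.619       # from Equation (6)
INTERCEPT = 0.0759  # from Equation (6)


def predict(previous_day_kwh):
    """Equation (6): univariate baseline for the Spanish pilot site."""
    return SLOPE * previous_day_kwh + INTERCEPT


high = predict(1.0)   # previous-day value above the typical level
low = predict(0.1)    # previous-day value below the typical level

# Value at which the prediction equals the previous-day input.
fixed_point = INTERCEPT / (1.0 - SLOPE)
```

A previous-day value of 1 kWh yields about 0.695 kWh, while 0.1 kWh yields about 0.138 kWh, so the model pulls predictions towards a level of about 0.199 kWh. This level is close to the mean consumption of 0.197 kWh reported for the site, and the pull towards it is in line with the lower accuracy observed for high consumption values.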

Convolutional Neural Networks
In this subsection, we give the results for the energy baseline evaluation when the convolutional neural network described in Section 3.5 is used. After applying the CNN model, we calculate the RMSE metric for the 48 intervals (30 min each) of one day. The resulting average RMSE for one day is: RMSE = 0.054 kWh. Figure 12 shows the plot of the RMSE calculated for each of the 48 intervals of 30 min of a single day. A lower RMSE value indicates that the energy baseline consumption is forecasted more accurately for that interval. In Figure 12, the first RMSE value corresponds to 09:00 in the morning, thus the last one corresponds to 08:30 the following morning.

Summary for the Baseline Calculation
For both pilot sites, three models have been used in order to calculate the baseline energy consumption in real-life scenarios. The three models for baseline calculation have been chosen out of several possible solutions, according to [14]. The selection was made in order to achieve the best possible results in terms of keeping the complexity at reasonable levels and achieving at the same time high accuracy, as described in Table 1.
For the pilot sites, historical data have been acquired and an exploratory data analysis has been performed. All three models have been applied and the metrics have been calculated. The results show that the model with the best performance is the convolutional neural network for both pilot sites, which outperforms the other two models. The least efficient model has been proved to be the multivariable regression model, since the three parameters, namely the temperature, the humidity and the luminance, exhibit limited correlation with the energy consumption. Therefore, the model chosen for the energy baseline calculation is the convolutional neural network, which gives the greatest accuracy in both pilot sites. In the following, we apply this baseline in order to evaluate the basic key performance indicators of the two pilot sites, thus providing a holistic evaluation of demand response events.

KPIs Evaluation
In this Section, we give the evaluation of the KPIs that describe the performance of the demand response events and the overall performance of the project. Five KPIs are presented for their evaluation, as these have been listed in the Introduction.

KPI 1: Distributed Building Demand Response Reliability
This KPI is related to the transformation of buildings into active participants of the smart grid. Each building participating in demand response programs is considered for the calculation of this KPI. In this paper, we propose a novel way of calculating this KPI, for which it is necessary to define in the best possible way when a demand response event is considered successful. For this reason, we consider the effect of a demand response event, which is a change in the end user's consumption or production during the event. Thus, it is important to define which percentage of deviation from the baseline is considered enough to have a successful demand response. For example, a 5% deviation can be considered independent from a demand response event. On the other hand, a 10% deviation between the actual energy consumption and the baseline can be considered representative of a successful demand response event, as such a deviation is great enough to consider that it occurred only due to a demand response event. This is further justified in [22]. In this work, we consider a 10% deviation from baseline consumption as proof of a successful demand response event. We calculate the consumption during the demand response event and derive the difference with respect to the baseline consumption. If this is larger than 10% of the baseline, then the demand response event is considered successful. Thus, if ev denotes a demand response event and $N_{ev}$ the total number of events, we have:

$$D_{cons,ev} = \int_{t \in \Delta t} \left( P_{baseline,elec,ev}(t) - P_{DR,elec,ev}(t) \right) \cdot \delta_{DR,ev}(t)\, dt \tag{8}$$

$$N_{ev,s} = \sum_{ev=1}^{N_{ev}} ID_{ev} \tag{10}$$

In the above equations, we have:
• $D_{cons}$: difference between the consumption during a demand response event and the consumption that would have taken place without the demand response event.
• $P_{baseline,elec}(t)$: consumption that would have taken place without the demand response event.
• $P_{DR,elec}(t)$: consumption during the demand response event.
The aforementioned formulas are applied to the demand response events that have taken place in the pilot sites. Table 2 shows the results obtained for the two pilot sites. As shown in the table above, the total number of demand response events in both pilot sites is around 500 altogether. There are slight differences between the two pilot sites with respect to the number of successful events, whereas the duration of the demand response events has been set at 15 min. It is observed that, overall, the number of successful demand response events reaches high ratios, resulting in the distributed building demand response reliability being higher than 75% for all sites. As the table reveals, the overall reliability KPI reaches 83%, which is considered to be a good index for the buildings that take part in the demand response events.
The methodology proposed above for calculating such an index can be used in future works that involve demand response in buildings and it is an efficient way of evaluating the performance of a building in demand response programs.
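The reliability calculation can be sketched as follows. This is an illustrative reading of the method, not the exact implementation: the integral of Equation (8) is approximated by a sum over sampled intervals, an event counts as successful when the deviation exceeds 10% of the baseline energy of that event, and the reliability is taken as the share of successful events. The function names and the four example events are hypothetical.

```python
def event_deviation(baseline_kw, actual_kw, dt_hours):
    """Discrete form of Equation (8): baseline minus actual energy over the event."""
    return sum((b - a) * dt_hours for b, a in zip(baseline_kw, actual_kw))


def is_successful(baseline_kw, actual_kw, dt_hours, threshold=0.10):
    """Assumed success rule: deviation larger than 10% of the baseline energy."""
    baseline_energy = sum(b * dt_hours for b in baseline_kw)
    deviation = event_deviation(baseline_kw, actual_kw, dt_hours)
    return deviation > threshold * baseline_energy


def reliability(events, dt_hours, threshold=0.10):
    """Share of successful events (count of successes as in Equation (10))."""
    n_success = sum(1 for baseline, actual in events
                    if is_successful(baseline, actual, dt_hours, threshold))
    return n_success / len(events)


# Four hypothetical 15 min events: (baseline power in kW, measured power in kW).
events = [
    ([4.0], [3.0]),   # 25% below baseline
    ([4.0], [3.9]),   # 2.5% below baseline
    ([2.0], [1.5]),   # 25% below baseline
    ([5.0], [4.4]),   # 12% below baseline
]
kpi_1 = reliability(events, dt_hours=0.25)
```

Three of the four example events exceed the threshold, giving a reliability of 75%. Events that request an increase in consumption would require comparing the absolute value of the deviation instead.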

KPI 2: Energy Consumption Reduction in Pilot Sites
This KPI, as the name implies, shows the reduction in energy consumption induced by DR events. The calculation of this KPI relies on the baseline estimation described in the previous Sections. The energy consumption reduction in the pilot sites is then calculated as the difference between the actual consumed energy and the estimated baseline. The exact time when DR takes place is also necessary. The following formulas are used in order to calculate this KPI [10].
Savings for electricity:

$$E_{savings,elec}(\Delta t) = \int_{t \in \Delta t} \left( P_{baseline,elec}(t) - P_{DR,elec}(t) \right) \cdot \delta_{DR,active}(t)\, dt \tag{12}$$

When there are discrete values instead of continuous ones, the integral is replaced by the corresponding sum over the sampled intervals. In the above equations, the following variables are defined:
• $\delta_{DR,active}$: DR event trigger (δ = 1 when a DR occurs, δ = 0 if no DR occurs);
• $P_{baseline}(t)$: baseline energy consumption when no DR event occurs (kW);
• $P_{DR}(t)$: real energy consumption during a DR event (kW).
The savings in percentage can be expressed with respect to the baseline energy consumption over the same period:

$$E_{cons,baseline}(\Delta t) = \int_{\Delta t} P_{baseline}(t)\, dt \approx \sum_{t \in \Delta t} P_{baseline}(t) \tag{15}$$

which stands for the energy consumption. For the calculations on the pilot sites, we take into account the successful demand response events, as these are analysed for the previous KPI. By taking into account the baseline calculations, we obtain the results for this KPI presented in Table 3. As can be observed from the table, the number of successful demand response events and the baseline calculations are critical for the evaluation of this KPI. For the Cypriot pilot site, there have been 193 successful DR events, whereas for the Spanish site, there are 178 and 42 DR events for the residential and offices sections, respectively. The energy savings vary between the two pilot sites, reaching nearly 15% for the Cypriot pilot site and above 50% for the Spanish pilot site (offices section).
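A discrete version of the savings calculation can be sketched as below. The percentage is assumed to be the ratio of the savings of Equation (12) to the baseline energy of Equation (15), which is an interpretation since the corresponding expression is not reproduced here. The function names and the four interval values are illustrative.

```python
def energy_savings(baseline, actual, dr_active):
    """Discrete form of Equation (12): savings counted only while a DR event is active."""
    return sum((b - a) * d for b, a, d in zip(baseline, actual, dr_active))


def baseline_energy(baseline):
    """Discrete form of Equation (15): baseline energy over the whole period."""
    return sum(baseline)


def savings_percent(baseline, actual, dr_active):
    """Assumed percentage: savings relative to the baseline energy of the period."""
    return 100.0 * energy_savings(baseline, actual, dr_active) / baseline_energy(baseline)


# Illustrative energy per interval in kWh; the DR event covers the two middle intervals.
baseline = [30.0, 32.0, 31.0, 29.0]
actual = [30.5, 27.0, 26.0, 29.0]
dr_active = [0, 1, 1, 0]

saved = energy_savings(baseline, actual, dr_active)
saved_pct = savings_percent(baseline, actual, dr_active)
```

The trigger δ ensures that the deviation in the first interval, where no DR event is active, does not enter the savings. The example yields 10 kWh saved, or about 8.2% of the baseline energy of the period.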

KPI 3: Energy Cost Savings in Pilot Demonstration Sites
This KPI shows the cost savings achieved at the pilot sites, meaning that the energy cost during the demand response events should be lower than the cost would have been without the demand response events. To calculate these energy cost savings, the electricity tariffs for each pilot site have been considered. The following formulas have been used for the calculation of this KPI [10]:

$$EG(\Delta t) = \Delta FR(\Delta t) + \sum_{t \in \Delta t} \Delta Ex(t) \tag{16}$$

where ∆Ex stands for the energy expenses variations (electricity, fuels and district heating) and ∆FR stands for financial rewards. For this case, we only have electricity as the energy source, thus fuels and district heating are not part of the equation. The output is calculated in the national currency.
In order to convert these savings in percent, the energy cost according to the baseline consumption is used for the calculation:

$$\frac{\text{Cost}_{savings}}{\text{Cost}_{baseline}} \tag{19}$$

with:

$$\text{Cost}_{baseline} = D_{baseline,elec}(t) \cdot Pr_{elec}(t) - S_{baseline,elec}(t) \cdot Pr_{elec,feedin} \tag{20}$$

To calculate this KPI, it is fundamental to have the average electricity tariffs, which are summarized in Table 4. According to the above equations, for the cost savings calculations, we first need to calculate the energy consumption that would take place according to the baseline and calculate the relevant costs and then compare these values to the actual consumption costs. Table 5 shows the resulting outcomes.
As can be concluded from Table 5, to evaluate this KPI, we made use of the calculated baseline and the successful demand response events for each pilot site. The results are positive in the sense that a substantial percentage reduction of costs has been achieved in both pilot sites. The size of this reduction varies considerably, spanning from 27% to 72%.

KPI 4: Reduction of CO2 Emissions in Pilot Sites
This KPI is related, as the name implies, to the reduction of CO2 emissions after the demand response program(s) have been applied. Since demand response programs are applied in order to avoid peaks in the overall electricity consumption, it is anticipated that such programs could also result in a reduction in overall consumption and, thus, a reduction of CO2 emissions. For this KPI evaluation, the baseline is necessary in order to calculate the CO2 emissions with and without the demand response event. This indicator can be calculated in kgCO2 and, according to [10], it is defined in terms of the following quantities:
• $D_{baseline}(t)$: energy demand without DR event (kW).
• $D_{DR}(t)$: energy demand during DR event (kW).
• $EF_{source}$: emission factors of national production sources and district heating supplier (kgCO2/kWh).
• $MIX_{source}(t)$: national electricity mix; production sources of electricity that can be extracted from the ENTSO-E database.
The equation for the CO2 savings simplifies, since only one source of electricity has been used for the demand response events:

$$\Delta I_{CO_2}(t) = (D_{DR,elec}(t) - D_{baseline,elec}(t)) \cdot EF \qquad (23)$$

To calculate the annualized CO2 savings for each pilot site, we use the results obtained for the period of the DR events and we extrapolate the data for a period of one year. For the other two parameters, we need to calculate the value resulting from the emission factor in relation to the electricity source. For Cyprus, most of the energy was produced by conventional fossil fuel units and the emission factor of national production sources for electricity in 2020 was 0.677 kg of CO2 per kWh. It is noted here that the adopted emission factor of national production sources for Cyprus corresponds to the most recent available data from 2020. For Spain, the corresponding emission factor was 0.19 kg of CO2 per kWh. Especially for Spain, the average energy reduction percentage is used for the offices and the residential sector, since consumption has been similar.
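Equation (23) and the annual extrapolation can be illustrated with a short Python sketch. The emission factors are the ones quoted in the text; the energy values, the function names and the linear scaling by number of days are assumptions made for illustration only, since the exact extrapolation procedure is not detailed here.

```python
# Emission factors quoted in the text (kgCO2 per kWh)
EMISSION_FACTORS = {"Cyprus": 0.677, "Spain": 0.19}


def co2_change_kg(dr_kwh, baseline_kwh, emission_factor):
    """Eq. (23) summed over the DR events: (D_DR - D_baseline) * EF.
    A negative result means that emissions were avoided."""
    return sum((d - b) * emission_factor
               for d, b in zip(dr_kwh, baseline_kwh))


def annualize(value, observed_days, days_per_year=365):
    """Assumed linear extrapolation from the observed period to one year."""
    if observed_days <= 0:
        raise ValueError("observed_days must be positive")
    return value * days_per_year / observed_days


# Hypothetical per-event energies (kWh) observed over 30 days
dr = [3.0, 3.0, 3.0]
baseline = [4.0, 5.0, 6.0]
change = co2_change_kg(dr, baseline, EMISSION_FACTORS["Cyprus"])
print(round(change, 3), round(annualize(change, observed_days=30), 1))
```

The sign convention follows Equation (23): consumption below the baseline yields a negative change, so the savings are the negated value.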
As can be observed from Table 6, all the calculations for the emissions savings have been based on the total energy reduction, that is, the difference between the energy actually consumed during demand response events and the energy that would have been consumed if no demand response events had taken place. The latter is derived from the baseline, which shows how critical an accurate baseline methodology is for accurately evaluating the effect of demand response events. For this KPI, it is clear that the demand response events have a positive effect with respect to CO2 savings, resulting in total CO2 savings of 383.29 tons.

KPI 5: Peak Load Reduction during Pilot Demonstration Activities
This KPI has to do with the reduction in the peak load consumption. Similarly to other KPIs, here it is also necessary to take into consideration the baseline, which shows the consumption that would have taken place without demand response programs. Afterwards, the KPI is evaluated by calculating the difference between the energy peaks in the actual consumption and in the baseline profiles. We propose a novel way of calculating this KPI, taking into account the difference between the baseline consumption and the actual consumption for every successful demand response event.
Thus, if we consider that evS is a successful demand response event, with a total of $N_{evS}$ successful events, the following quantities are used:
• $D_{cons}$: consumption difference between the consumption during a successful demand response event and the consumption that would have taken place without the demand response event.
• $P_{baseline,elec}(t)$: consumption that would have taken place without the demand response event.
• $P_{DR,elec}(t)$: consumption during the successful demand response event.
• $\delta_{DR,ev}$: DR event trigger ($\delta = 1$ when a successful DR occurs, $\delta = 0$ if no successful DR occurs).
• $N_{evS}$: number of successful demand response events.
• $D_{cons,norm}$: the normalised consumption difference (consumption difference between the consumption during a successful demand response event and the baseline consumption divided by the baseline consumption) in %.
• $D_{peak,reduction,max}$: the maximum normalised peak load reduction in %.
• $D_{peak,reduction,ave}$: the average normalised peak load reduction in %.
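The definitions above can be turned into a small Python sketch. It is one possible reading of those definitions, not a reproduction of the original equations: the normalised difference is assumed to be (baseline minus DR consumption) divided by the baseline, in percent, and the maximum and average are taken over the successful events only. All names and values are hypothetical.

```python
def normalised_reductions(events):
    """events: list of (baseline_kw, dr_kw, successful) tuples.
    Returns the normalised consumption difference in % for every
    successful event (delta = 1); other events are skipped (delta = 0)."""
    reductions = []
    for baseline_kw, dr_kw, successful in events:
        if not successful:
            continue
        if baseline_kw <= 0:
            raise ValueError("baseline consumption must be positive")
        reductions.append(100.0 * (baseline_kw - dr_kw) / baseline_kw)
    return reductions


def peak_reduction_summary(events):
    """Return (maximum, average) normalised peak load reduction in %."""
    reductions = normalised_reductions(events)
    if not reductions:
        raise ValueError("no successful DR events")
    return max(reductions), sum(reductions) / len(reductions)


# Hypothetical events: (baseline kW, measured kW, event successful?)
events = [
    (4.0, 3.0, True),
    (5.0, 2.5, True),
    (6.0, 6.5, False),  # unsuccessful event, ignored
    (2.0, 0.5, True),
]
print(peak_reduction_summary(events))  # (75.0, 50.0)
```

Filtering on the success flag before averaging keeps the average from being diluted by events in which no reduction was achieved.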
For the two pilot sites that we have examined, we have calculated two important parameters, the maximum peak load reduction and the average peak load reduction. Table 7 shows the two percentage values for each of the pilot sites.

Table 7. Maximum and average peak load reduction in the two pilot sites.

| | Cypriot Pilot Site | Spanish Pilot Site (Residential) | Spanish Pilot Site (Offices) |
|---|---|---|---|
| Average peak load reduction in % | 26% | 56% | 79% |
| Maximum peak load reduction in % | 99.55% | 99.72% | 98.83% |

As can be observed, the maximum peak load reduction is circa 99% in all cases, whereas the average peak load reduction ranges from 26% to 79%. This KPI shows the peak load reduction that can be achieved, which is also an indication for the energy grid: the lower the peaks in demand, the more easily the grid can cope with them, whereas high demand peaks mean that the grid needs to be reinforced in order to meet the demand, which requires more investment. As a result, we can see that demand response programs lower the peaks in demand, which means that the grid does not need to be reinforced in order to cope with high demand and fewer investments are needed.
From the above, we can see the importance of this KPI, as it is the only one that gives an indication for the grid itself. Therefore, its precise calculation is of crucial importance. In this work, we give a novel way of calculating this KPI, which helps in the overall evaluation of the project and of the effect of demand response events on the pilot sites. The proposed approach can be utilised in future works by the scientific community to evaluate the impact and effectiveness of demand response events.

Discussion and Future Work
In this paper, we have developed a methodology for evaluating demand response programs in real-life scenarios. Specifically, we have proposed the following:

1. A structured methodology for calculating the baseline consumption, meaning the consumption that would have taken place if there were no demand response events. The baseline consumption has been applied for evaluating the overall project.
2. Specific key performance indicators (KPIs) to evaluate the demand response events taking place in real-life scenarios, specifically in real pilot sites.
3. A quantitative methodology to estimate two of these KPIs, proposed, to the best of our knowledge, for the first time; for the other three KPIs, quantitative methodologies proposed in the literature are used. For the KPIs evaluation, the baseline consumption estimated in step 1 has been used.

For the baseline calculation, we have selected three models, which have been applied in the two pilot sites, using the available historical data. For both pilot sites, we have obtained historical data and an exploratory data analysis has been performed in order to identify outliers and trends in the data. The three models applied for extracting the baseline energy consumption have been selected in order to keep complexity at a reasonable level while increasing accuracy. The analysis described in [14], where the available models are listed, has been taken into consideration. The three models used here have been selected according to the aforementioned criteria and based on the description in [14]; namely, the multivariable regression model, the univariate regression model and the convolutional neural network (CNN) model have been used for the baseline energy calculation. According to the results, the CNN model has the greatest accuracy, with the univariate regression model being the second most accurate. The multivariable regression model has been the least accurate, since the variables used in the model, namely the temperature, the humidity and the luminance measured at the pilot sites, show little correlation with the energy consumption of the considered buildings. This means that, contrary to what one might expect, energy consumption is not significantly correlated with the weather variables. Such findings should be considered in energy conservation efforts, in the sense that energy consumption patterns need to be calculated with models that have been shown to work well, rather than derived without a scientific basis. Therefore, an effective methodology is required for baseline calculation in order to support energy conservation efforts.
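The kind of check that motivates this observation, namely how strongly a weather variable correlates with consumption, can be sketched with a plain Pearson correlation coefficient. The sketch below is illustrative only: the data are hypothetical and do not come from the pilot sites, and the Pearson coefficient is assumed here as the correlation measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical hourly samples: outdoor temperature (deg C) and consumption (kWh)
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
consumption = [3.1, 2.9, 3.0, 3.2, 2.8]
print(round(pearson(temperature, consumption), 2))
```

A coefficient close to zero for each candidate regressor would be consistent with a multivariable regression on those regressors adding little over a univariate model.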
It should be noted that, while CNNs are less commonly used for time-series analysis than RNNs or LSTMs, they can be effective when spatial patterns or local dependencies exist within the time-series data, and tasks like time-series classification or anomaly detection in multivariate time-series data might benefit from their application. In addition, the CNN model outperforms the two other models used in this work to predict the baseline, and it has been selected in order to calculate the energy baseline for our analysis.
Table 8 shows the KPIs that have been used for the evaluation of the demand response program and their role in the program's assessment. As can be seen from Table 8, KPIs 2 to 4 are related to the savings of the specific DR program, i.e., energy, cost and CO2 savings, respectively. The methodology for calculating these KPIs has been identified through a thorough literature review. These KPIs are important for the evaluation of DR programs, since they provide indicators for savings in relation to common European goals, such as the European Green Deal, which indicates that the future grid should be able to manage energy in a better way and contribute to climate neutrality.
The other two KPIs are equally important. The first of them, KPI 1, reveals valuable information about the effectiveness of a particular DR program: the more successful DR events take place, the better the program's performance. The second, KPI 5, gives an important indication for the grid operators. The presence of peak load demand means that the grid needs to cope with such peaks and be in a position to meet energy demand. This, in turn, means that the grid needs to be robust enough to cover peak load demand, which might require significant investments. On the other hand, if the peak load demand is lower, the grid does not need to be significantly expanded and strengthened, leading to lower investment costs for the grid operator. Thus, this KPI indicates to the grid operator whether the grid's capacity needs to be increased to cope with elevated peak loads. In this work, we have proposed a structured quantitative methodology for the evaluation of these two KPIs. This not only completes the work presented in this paper, but can also support future work in the field by providing a concrete way of evaluating two important KPIs.
The above KPIs are considered to provide a comprehensive description of a demand response program, giving information from the savings point of view and also completing the picture by quantifying the overall performance of the demand response program itself.
An accurate baseline calculation methodology is of vital importance, since the evaluation of all KPIs and, consequently, of the whole demand response program depends on the baseline consumption. Indeed, all KPIs are calculated by comparing the energy actually consumed with the energy that would have been consumed if no demand response event had taken place (the baseline).
Moreover, the current work can be the basis for further investigations in the demand response field. In particular, the planned future work will examine the effectiveness of other variables for the multivariate model for the calculation of energy baseline, like population density and unit effectiveness, to name some of them. Another issue left for future work is to keep track of European projects in the field of demand response, with a focus on residential demand response; such works will be the basis for investigation of other KPIs necessary for the evaluation of demand response programs and the way these can be addressed. In general, demand response is a field that keeps evolving; thus it is necessary to keep track of the state-of-the-art with continuous work on the field.
Keeping track of the state-of-the-art is also necessary from a policy perspective. Demand response is a field where policy-makers still need to take action in order to better clarify the role of the different involved actors and to reconcile differences among Member States. The research team of this article is actively involved in policy-making efforts and is expected to take part in new policy-making actions in the field of demand response focusing on the role of energy smart appliances. In this way, the scientific findings of this work, including baseline calculation methods and KPI evaluations, can be applied in real-world situations, e.g., to examine how large-scale implementation of energy smart appliances can be used in demand response. This planned project is a continuation of already ongoing work on energy smart appliances and their interoperability, where industry professionals, smart appliance manufacturers and other relevant energy actors are involved [23].
As future work, it is also intended to examine scalability issues, particularly the implications of applying demand response programs from a large-scale perspective. Such implications include baseline calculation issues, i.e., to what extent a single baseline methodology can be applied from a large scale perspective or alternatively how many different methodologies need to be applied for demand response programs applied in a region with differences in energy end consumers. Such future work is expected to shed light on how the paper's findings can be utilized in practical settings and real-world situations involving industry professionals and other stakeholders in the energy sector.

Conclusions
In this paper, we examine the topic of explicit demand response and we give a concrete methodology for evaluating demand response programs in real-life scenarios. First, we propose a novel way of evaluating the baseline, by taking into account artificial intelligence techniques and applying regression algorithms for this baseline calculation. The demand response events are applied in real pilot sites, representing real-life scenarios. For the baseline calculation, we use historical data from the two pilot sites; therefore, the baseline is designed taking into consideration the characteristics of each pilot site where demand response is to be applied.
Second, we use this baseline in practice in order to evaluate the effects of explicit demand response in two pilot sites involving real consumers. Energy consumption is recorded when demand response occurs and the baseline calculation is used to estimate the energy that would have been consumed if no demand response had taken place. The two curves are then used for the evaluation of key performance indicators (KPIs). For the correct evaluation of the whole demand response program applied on the pilot sites, we have used five KPIs. Three of these KPIs refer to savings of the demand response program, whereas the other two evaluate the successfulness of the program and indicate the peak load reduction. An original, quantitative way of calculating the last two KPIs has been proposed, providing an added value to this work, as the proposed KPIs and their evaluation methodology can be used in the assessment of future demand response programs. The proposed baseline and KPIs calculation have been applied on the two pilot sites and actual results have been presented, thus showing the applicability of the proposed methodologies through real-life results, including actual end consumers.
Summing up, this paper gives a clear methodology for evaluating real demand response programs by providing a concrete way of estimating the baseline and afterwards applying it to real pilot sites. We show how this baseline can be used for the evaluation of real KPIs, demonstrating the effects of demand response events in real-life scenarios.

Data Availability Statement:
The data presented in this study are available in [22].

Conflicts of Interest:
The authors declare no conflict of interest.