Energy Flexibility Prediction for Data Center Engagement in Demand Response Programs

Abstract: In this paper, we address the problem of the efficient and sustainable operation of data centers (DCs) from the perspective of their optimal integration with the local energy grid through active participation in demand response (DR) programs. For DCs' successful participation in such programs and for minimizing the risks for their core business processes, their energy demand and potential flexibility must be accurately forecasted in advance. Therefore, in this paper, we propose an energy prediction model that uses a genetic heuristic to determine the optimal ensemble of a set of neural network prediction models to minimize the prediction error and the uncertainty concerning DR participation. The model considers short term time horizons (i.e., day-ahead and 4-h-ahead refinements) and different aspects such as the energy demand and potential energy flexibility (the latter being defined in relation with the baseline energy consumption). The obtained results, considering the hardware characteristics as well as the historical energy consumption data of a medium scale DC, show that the genetic-based heuristic improves the energy demand prediction accuracy while the intra-day prediction refinements further reduce the day-ahead prediction error. In relation to flexibility, the prediction of both above and below baseline energy flexibility curves provides good results for the mean absolute percentage error (MAPE), which is just above 6%, allowing for safe DC participation in DR programs.


Introduction
The energy demand of data centers (DCs) is rapidly growing, and studies have shown that, in 2014, worldwide, they consumed 194 TWh of electricity (about 1% of the global electricity demand). This share is expected to increase to 3% by 2025 [1,2]. Cloud architectures and services have undergone massive development in the last decade; thus, DCs' energy demands have greatly increased, putting considerable pressure not only on their economic sustainability and environmental impact but also on the safe operation of the local energy grids to which they are connected, culminating in instability in the electricity network and a severe risk of supply shortage [3]. Thus, in the past decade, many industry efforts have concentrated on developing technologies for increasing the energy efficiency of DCs' operation with a view to decreasing their energy demand [4][5][6].
The advent of intermittent decentralized renewable energy sources (RES) is completely changing how grids are managed, increasing the need for demand response (DR) programs and energy storage to maintain grid balance and power quality [7]. The DR programs, where they are truly integrated within local energy systems, may represent a first significant option for transforming energy consumers into flexible, active energy users and integrating them in the emerging energy system. At the same time, this may bring significant benefits to the smart grids, since it allows a Distribution System Operator (DSO) to procure, in a cost-effective way, the necessary energy flexibility for integrating larger shares of intermittent RESs and stabilizing the grid while not compromising the security of supply and network reliability. For example, in Europe, studies have revealed that the EU peak demand could be reduced by 60 GW (approximately 10% of the EU's peak demand) through DR measures [8]. However, to achieve a significant impact on load, flexibility schemes will require the dispatching of set-points to a greater number of assets and over a broader timeframe covering all 24 hours of the day. This inherently requires the introduction of engagement strategies for new types of energy customers (such as DCs) and the overcoming of technological barriers such as the accurate forecasting of demand, generation, and flexibility. Few studies have approached the efficient integration of DCs with the local energy grids via direct participation in DR programs; most of them focused on increasing the utilization of on-site renewable energy to take advantage of low energy prices [9,10]. To participate in DR programs, the DCs must accurately forecast their demand and potential flexibility and also optimally manage their operation to follow a DR signal provided by the DSO directly or by a flexibility aggregator [11]. As a result, the DCs can contribute to the stability of the local grid and can obtain a new income source besides their main revenue streams.
To show the technological barrier for DC participation in DR programs, we have considered contemporary DR scenarios, such as those described in reference [12]. The DC should act as a prosumer of electrical energy that has established flexibility purchase contracts with a Flexibility Aggregator in order to alleviate the risks that flexibility provisioning poses to its core business. The contract includes the operating conditions for the flexibility delivered by the DC in DR programs and the details of the financial settlement. Flexibility trading can be achieved daily, as shown in Figure 1. The DC must forecast its energy demand for the next day and send those values to the Flexibility Aggregator, which will collect this kind of information from its entire portfolio of prosumers in order to provide it to the DSO. The DSO will use the energy demand figures to detect potential congestion points, and if such congestion is forecasted for the next day, it will send a flexibility request to the Flexibility Aggregator. Upon the aggregator's request, the DC must forecast its energy flexibility values for the next day in relation to a calculated baseline energy consumption. These values are then used by the Flexibility Aggregator to issue a flexibility order as a DR signal, expressed as a DC energy demand profile that must be accurately followed during the next day for the DC to be rewarded; otherwise the DC is at risk of penalty charges. During the operational day, these steps can be repeated considering a 4-h-ahead time frame.
As highlighted above, besides flexibility shifting and operation adaptation, the accurate forecasting of DCs' energy demand and flexibility is a key technical barrier for DCs' enrolment and successful participation in DR programs. Moreover, poor energy consumption analytics and the lack of forecasting tools are mentioned in the literature as major barriers for the adoption of DR programs [13]. Many factors may influence DC energy demand forecasting outcomes, such as the size of the training data set, the specific behavior of the system (quantity of noise, seasonality, trend, susceptibility to external factors, etc.), the level of data aggregation, the intrinsic parameters of the model used, the granularity at which the predictions are realized (minutes, hours, months, years) and the dimension of the prediction window (the number of steps in the future) [14][15][16]. In-depth analysis of prediction outcomes shows that some DC energy consumption prediction models can outperform others on some specific data sets and give poor results on others, depending on data characteristics. Even on the same data set, and with relatively good overall prediction accuracy, large discrepancies may occur between the quality of the results for different time intervals. At the same time, to our knowledge, there is no relevant state of the art approach addressing the forecasting of DC energy flexibility in relation to the baseline load.
In this paper we address the limitations identified in the literature for energy demand and flexibility forecasting by defining a DC energy prediction model that uses a genetic heuristic to determine the optimal ensemble of a set of neural network-based prediction models to obtain a high energy forecasting accuracy. It addresses holistically a mix of time horizons needed for DC participation in DR programs (i.e., day-ahead and intra-day) as well as different aspects of energy such as energy demand and energy flexibility.
Sustainability 2020, 12, 1417
Thus, the contributions of this paper are the following:
• Implementation of an ensemble-based DC energy prediction model that combines a set of individual neural network weak learners to forecast the DC energy demand for the next day and to refine it continuously considering four-hour intervals.
• Definition of energy flexibility in relation to the baseline load and a prediction model to forecast the potential DC energy flexibility to be used in DR programs.
• Implementation of a genetic heuristic to determine the optimal combination of the outcome of individual predictors to minimize the prediction error, thus lowering the uncertainty concerning DR participation.
The rest of the paper is structured as follows: Section 2 presents state of the art approaches on energy prediction related to DR programs, Section 3 describes the ensemble-based prediction model for DC energy demand and energy flexibility, Section 4 shows prediction results on a medium-scale DC testbed, while Section 5 concludes the paper and presents future work.

Related Work
Few state-of-the-art approaches address the topic of DC energy forecasting for DR participation; most of them focus on energy demand and none, to our knowledge, addresses the forecasting of energy flexibility. Dayarathna et al. [17] conducted a survey of techniques for DC energy consumption modelling and energy demand forecasting. Besides identifying the IT equipment and facility infrastructure as the main consumers, they classified existing machine learning-based approaches for DC energy consumption forecasting. They concluded that existing research work concentrated more on studying the energy efficiency of lower hardware levels of DCs, but less on higher aggregated levels, which are the ones desired for DR programs. Google's researchers [18] have trained a generic three-layered neural network on DCs to predict the power usage effectiveness (PUE) ratio, showing that prediction methods could be an effective way to model DCs' performance that could bring significant cost savings on the global energy market. They achieved a 40% reduction in the amount of energy used for cooling their DC. Several DC energy prediction models are proposed, most of them targeting the increase of energy efficiency of DC operation [19,20] in isolation, without considering smart energy grid integration. Similarly, multi-objective genetic algorithms can be used to dynamically forecast resource usage and energy consumption in DCs [19]. They can forecast the resource requirements for a future time slot according to the historical data in previous time slots, which is fed as input to virtual machine placement algorithms whose general objective is to decrease the DC energy consumption. In [20] the authors present a DC power prediction framework based on power profiling and deep learning models. For short-term forecasting, a recursive autoencoder is applied to fine-grained models, while for long-term prediction massive fine-grained historical data are encoded into the coarse-grained model. According to [21], the intermittent nature of renewable energy sources is seen as a major drawback of using them on the DC site. The authors propose a scheduler that uses IT and electrical models of the DC energy consumption together with an energy availability prediction engine for the next 48 h. In [22] a multi-layered ANN is defined and used to forecast the DC energy consumption on monthly intervals based on the historical energy consumption data. The forecast engine is implemented using MLP. Considering stability over prediction accuracy, in [23] a forecasting model that uses dynamic adaptive entropy-based weighting for total energy demand forecasting is proposed. The model combines classic prediction techniques such as the Holt-Winters Multiplicative algorithm and moving average, using a weight-based ensemble. Finally, in one of our previous works [24] we proposed and successfully used neural network prediction models to forecast the DC server room temperature, which is then used as input into the thermodynamic process simulations to decide on adapting the DC thermal energy profile for providing on-demand heat to nearby neighborhoods.
There are many approaches in the literature that address energy consumption forecasting of regular prosumers. The most important ones are the deep neural network energy forecasting models, which are often used for medium and short-term predictions due to their low time overhead, while raising concerns about their accuracy and creating the need for extensive comparative evaluation on similar energy sets [25]. In [26], the authors propose a novel building energy load forecasting methodology based on deep neural networks, more specifically long short-term memory (LSTM), which achieves accurate energy predictions, both at the aggregate and individual site level. For improving the forecasting results they investigate an LSTM architecture that maps sequences of different lengths. LSTM energy prediction models are compared with other machine learning-based approaches, showing good improvements in terms of forecasting accuracy [27]. In [28] the authors argue that existing methods are not able to model the uncertainty at the prosumer level due to the many fluctuations in influencing variables which negatively impact the forecasting process. Consequently, deep extreme learning models are proposed for improving the performance of energy consumption forecasting [29][30][31]. In [32], artificial neural networks (ANNs) are used for accurate short-term load prediction based on data gathered by smart meters; at the same time, the authors investigate different approaches for load aggregation. The authors of [33] propose an ANN-based approach for predicting the energy usage of buildings, additionally considering the users' characteristics and their activities as relevant features. In [34,35], an ANN model is proposed for carrying out day-ahead power predictions that, on a specific scenario, performs better than several other tested methods such as support-vector machine (SVM) and multilayer perceptron (MLP). The results show that the approach is suitable for assessing DR load shifting options based on a time-of-use pricing scheme, achieving district level cost savings of around 15%. In [36], three types of deep learning-based energy-related features are compared with conventional feature engineering methods using fully connected auto-encoders, convolutional autoencoders, and generative adversarial networks. The authors of [35] use deep learning models to predict electricity consumption for arbitrary time horizons, by dividing each predicted sample into a single forecasting sub-problem which is solved independently by identifying the best forecasting model.
In the context of DR programs, hybrid ensemble-based approaches can obtain better results for complex models [37]. The authors of [38] propose a hybrid SVM model to forecast the hourly electricity demand of buildings, while several articles report good energy forecasting results using hybrid models of convolutional neural network (CNN) and LSTM [39][40][41]. In [42], multiple CNN components are employed to extract rich features from the historical load sequence and an LSTM based recurrent neural component is used to model the variability and dynamics of historical energy data. In [43] a hybrid network is described, which can extract spatial-temporal information and irregular features of electric power consumption to effectively predict building energy consumption, while [44] defines a hybrid model between LSTM and recurrent neural network (RNN) to forecast the short- and medium-term aggregated load in micro-grids, providing good results in the case of medium to long-range forecasting. A hybrid forecasting model that combines wavelet transform, Particle Swarm Optimization (PSO) and SVM for estimating short-term (one-day-ahead) generation power of a real micro-grid is proposed in [45]. Similarly, a hybrid model for electricity load and price forecasting based on a combination of Stochastic Gradient Descent and SVM that shows improved prediction accuracy is the subject of [46]. Other approaches use a deep belief network to forecast the hourly load of the power grid [47] and combine the chicken swarm optimization algorithm with SVM to make predictions on short-term wind power and improve the stability of power system operation, reporting better convergence and accuracy compared with other bio-inspired models [48].
Our approach builds upon the existing state of the art by proposing an ensemble-based DC energy prediction model for determining the DC energy demand and flexibility for 24- and 4-hour intervals for their active participation in DR programs. To our knowledge, even though such ensemble models give good results for the energy prediction of different prosumers, they have never been applied to the specific case of DCs. Moreover, existing approaches are focused on forecasting the energy demand of prosumers and do not address other relevant aspects in DR such as the forecasting of energy flexibility. To determine the optimal combination of the individual neural network models' energy prediction outcomes we have used a genetic heuristic aiming to minimize the overall prediction process error, thus decreasing the uncertainty in relation to the successful participation of DCs in DR programs.

DC Energy Prediction Model
We propose an ensemble-based DC energy prediction model that aggregates the prediction outcomes of individual neural network based weak learners to forecast the DC energy demand and the potential energy flexibility. We have chosen neural networks due to their ability to model complex non-linear processes such as the energy exchange processes occurring inside a DC. At the same time, ensemble-based predictors seem to perform significantly better than individual algorithms for time series prediction tasks. The output of the weak learners is combined using a weighted average (see Figure 2) to improve the final prediction result. We have implemented a genetic algorithm-based ensemble method to determine the optimal weights that will generate the best combined outcome for the specific DC energy prediction problem, considering both the characteristics of the input data and the interrelations among different DC sub-systems.
Each individual predictor can be defined as a function $f^{\theta}_{\text{NN-Model}}$ with parameter $\theta$ that computes the predictions based on the input energy data of a DC energy sub-system ($E^{\text{historical}}_{\text{sub-system}}$), for a specific future time frame window $T$ and granularity $g$.

Analyzing the DC energy demand and flexibility patterns of various DCs, we have selected the IT servers and cooling sub-systems, which are the major contributors to the DC total energy demand, as relevant sub-systems for the energy forecasting process. There is a strong dependency between the energy demand and flexibility potential of the IT servers and cooling sub-systems, which needs to be captured in the prediction model (see Figure 3). If the IT servers sub-system has a higher energy demand, it will generate more heat and more cooling will be needed to maintain the temperature setpoints, generating an increase of energy demand from the cooling sub-system. Similarly, if the IT servers' energy demand decreases, the cooling energy demand will also decrease.
The ensemble predictor gathers the results of each individual predictor and combines them using a weighted average, whose weights are optimized through evolutionary computing, to predict the final outcome. Here $\omega$ represents the weights vector that is applied to the individual predictors to obtain the best prediction performance and $i$ is the number of individual predictors.
The goal is to determine the parameters $\theta$ and $\omega$ such that the error between the predicted energy value and the actual monitored one is minimized over the entire forecasting time window.
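The weighted combination and the error-minimizing search for the weights can be illustrated with a short sketch. It is only a sketch under stated assumptions: the weights are normalized to sum to one, the error is measured with MAPE (the metric reported in the abstract), and the genetic operators (elitism, blend crossover, Gaussian mutation) as well as the names `ensemble_predict`, `mape` and `evolve_weights` are hypothetical choices that the text above does not specify.

```python
import numpy as np

def ensemble_predict(predictions, weights):
    """Weighted average of individual forecasts.

    predictions: shape (n_predictors, horizon); weights: shape (n_predictors,).
    """
    weights = np.asarray(weights, dtype=float)
    # Assumption: weights are normalized so that they sum to one.
    return (weights / weights.sum()) @ np.asarray(predictions, dtype=float)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

def evolve_weights(predictions, actual, pop_size=30, generations=50, seed=0):
    """Toy genetic search for ensemble weights (all operators are assumptions)."""
    rng = np.random.default_rng(seed)
    n = len(predictions)
    population = rng.random((pop_size, n)) + 1e-6  # strictly positive weights
    population[0] = 1.0                             # include the equal-weight solution

    def fitness(w):
        return mape(actual, ensemble_predict(predictions, w))

    for _ in range(generations):
        errors = np.array([fitness(w) for w in population])
        order = np.argsort(errors)
        elite = population[order[: pop_size // 2]]  # keep the better half unchanged
        n_children = pop_size - len(elite)
        # Crossover: blend two randomly chosen elite parents.
        parents_a = elite[rng.integers(len(elite), size=n_children)]
        parents_b = elite[rng.integers(len(elite), size=n_children)]
        alpha = rng.random((n_children, 1))
        children = alpha * parents_a + (1 - alpha) * parents_b
        # Mutation: small Gaussian noise, clipped so weights stay positive.
        children = np.clip(children + rng.normal(0.0, 0.05, children.shape), 1e-6, None)
        population = np.vstack([elite, children])

    errors = np.array([fitness(w) for w in population])
    best = population[np.argmin(errors)]
    return best / best.sum(), errors.min()

# Toy data (invented for illustration; not taken from the paper).
actual = np.array([100.0, 110.0, 120.0, 115.0])
preds = np.array([actual - 10.0,   # predictor that under-forecasts
                  actual + 30.0])  # predictor that over-forecasts
weights, error = evolve_weights(preds, actual)
```

Because the better half of each generation is carried over unchanged, the best error found can never be worse than that of the equal-weight ensemble included in the initial population.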

Demand Forecasting
Each individual neural network is generically modeled as a set of neurons distributed over several hidden layers trying to map the energy inputs to outputs through a non-linear function (see Figure 4). Each model is then configured according to the forecasting time window, being fed with $N$ historical energy data inputs and additionally with the contextual features $C_F$, and will predict $M$ future energy values.
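To make the input and output dimensions of such a network concrete, the following is a minimal sketch of a feed-forward pass. The layer sizes, the ReLU activation, the random initialization and the helper names `init_mlp` and `forward` are assumptions made for illustration; the network is untrained and only shows how N historical values plus contextual features map to M outputs.

```python
import numpy as np

def init_mlp(n_inputs, hidden_sizes, n_outputs, seed=0):
    """Create random (untrained) weights and zero biases for each layer."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Propagate one input vector through the layers."""
    for layer, (w, b) in enumerate(params):
        x = x @ w + b
        if layer < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers (assumed activation)
    return x

# Day-ahead sizes: N = 24 past values, M = 24 future values.
# n_context = 3 is a hypothetical number of encoded contextual features.
N, n_context, M = 24, 3, 24
params = init_mlp(N + n_context, hidden_sizes=(32, 32), n_outputs=M)
x = np.concatenate([np.ones(N), np.zeros(n_context)])
y = forward(params, x)
```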

We denote by $P_{\text{sub-system}}(t_k)$ the instant power value of a DC sub-system at timestamp $t_k$. The energy of a DC sub-system on the time interval $T = [t_s, t_e]$ is denoted as $E^{T}_{\text{sub-system}}$ and is defined as the integral of power over the time interval:

$$E^{T}_{\text{sub-system}} = \int_{t_s}^{t_e} P_{\text{sub-system}}(t)\, dt$$

Because the prediction models use energy features sampled at equidistant timestamps, we define a discrete time model over which predictions are represented as a series of equidistant points on the time axis where the energy values are sampled. The power values of each DC sub-system will be computed from monitored power values on equal and continuous time intervals spreading between equidistant timestamps. The length of all intervals is constant, defined as $g$, the time granularity of the sampling process. Considering the above, the energy on a time interval $t_k$ is computed using a basic interpolation technique as the average of the power values sampled by the monitoring infrastructure in the same interval.

Our model can consider several target variables, and can reduce the DC energy prediction problem (at different time windows) to a univariate multi-step forecasting problem in which $P^{\text{NN-model},T}_{\text{component}}(t)$ is the prediction of a certain energy value at time $t$ from the forecasting time window $T$ with a granularity $g$, $C_F$ represents the contextual features considered in the forecasting process, $P^{\text{historical}}_{\text{NN-model}}(t)$ is the historical energy value at time $t$ used as input in the forecasting process, $N$ is the number of historical energy values used as inputs and $M$ is the size of the forecasting time window $T$.
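The discretization and the multi-step framing described above can be sketched as follows. This is an illustrative sketch only: the function names `energy_per_interval` and `make_windows`, the fixed number of monitored samples per interval and the toy power series are assumptions and are not taken from the paper.

```python
import numpy as np

def energy_per_interval(power_samples, samples_per_interval):
    """Average the monitored power samples that fall in each interval of length g.

    With power in kW and g equal to one hour, the average is numerically the
    energy in kWh; for other granularities it would be scaled by g in hours.
    """
    power = np.asarray(power_samples, dtype=float)
    usable = len(power) - len(power) % samples_per_interval  # drop an incomplete tail
    return power[:usable].reshape(-1, samples_per_interval).mean(axis=1)

def make_windows(series, n_inputs, n_outputs):
    """Build (X, y) pairs: N past values as input, M future values as target."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for start in range(len(series) - n_inputs - n_outputs + 1):
        X.append(series[start:start + n_inputs])
        y.append(series[start + n_inputs:start + n_inputs + n_outputs])
    return np.array(X), np.array(y)

power = np.arange(12, dtype=float)       # 12 made-up power samples
energy = energy_per_interval(power, 2)   # two samples per interval, so 6 values
X, y = make_windows(energy, n_inputs=3, n_outputs=2)
```

In the day-ahead setting of this paper the same windowing would be called with 24 inputs and 24 outputs on hourly values.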
Analyzing the operation of current DR programs, we have identified that two time horizons are relevant (see Figure 5) for the potential usage of forecasting results to enable the DC to participate as a prosumer:
• Day-ahead: energy values are forecasted for the next 24 h with a granularity of one hour;
• Intra-day: energy values are forecasted for the next 4 h with a granularity of half an hour.

In the day-ahead case, the prediction model must forecast the hourly energy values for the next day (i.e., 24 steps ahead), while the energy features considered are defined as historical energy data values spreading from the present to 24 h in the past, with a time interval granularity of one hour. In the intra-day case, the prediction model must forecast energy values over a four-hour time interval at a 30-min granularity (i.e., 8 steps ahead), while the energy features considered are defined as historical energy data values spreading from the present to 4 h in the past, at a half-hour granularity.

The energy-based features are further enhanced by adding contextual information as input, as we expect different energy profile patterns in different time contexts. Using them together with the features derived from energy values, more complex and possibly hidden consumption patterns can be found. The contextual features represent data that are not specific to energy but correlated to context, such as season, weekdays and calendar days:
• Season: the DC may consume/produce different quantities of energy depending on the season. For example, the energy consumption in summer can be higher than the energy consumption in winter, especially due to more intensive use of cooling processes. The same reasoning may apply if we consider the renewable energy generation (i.e., solar energy). The possible values for this feature are: Spring, Summer, Autumn and Winter.
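The two horizons and the contextual features can be captured in a small configuration sketch. The window sizes follow from the text (24 h at one-hour steps, 4 h at half-hour steps); everything else is an assumption made for illustration, in particular the dictionary layout, the function names, the month-to-season mapping (northern-hemisphere meteorological seasons) and the integer encoding of weekday and calendar day.

```python
from datetime import datetime

# Horizon settings derived from the text: history steps N, forecast steps M,
# and the granularity g in minutes.
HORIZONS = {
    "day_ahead": {"n_inputs": 24, "n_outputs": 24, "granularity_min": 60},
    "intra_day": {"n_inputs": 8, "n_outputs": 8, "granularity_min": 30},
}

# Assumption: meteorological seasons of the northern hemisphere.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}

def contextual_features(ts):
    """Return the contextual features for one timestamp (hypothetical encoding)."""
    return {
        "season": SEASON_BY_MONTH[ts.month],
        "weekday": ts.weekday(),   # Monday is 0, Sunday is 6
        "day_of_month": ts.day,
    }

features = contextual_features(datetime(2020, 2, 14, 10, 30))
```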

Flexibility Forecasting
The energy flexibility of a DC measures the potential of adapting its energy demand in relation to a calculated baseline by shifting energy and, as a result, increasing or decreasing its energy demand profile. The baseline energy consumption is an estimate of the electricity that would have been consumed by each individual DC sub-system, or by the entire DC, in the absence of any flexibility provisioning optimization. The baseline energy consumption profiling uses similar time scales as the energy demand forecasting process, but it is fundamentally different as it must satisfy both the consumer and utility sides [49] and it is used only to measure the performance of the DC participation in a DR program. To determine the baseline at sub-system level we have used the X of Y method, which calculates the baseline using the energy consumption data of Y previous days, out of which the most significant X days are selected [50]. The average model-middle selects X days with the average load, excluding both the highest and the lowest loads, if they are isolated events. In this way the baseline is much more stable, and the error with respect to the load is reduced, but the actual energy demand may exceed the baseline at times.

The bigger the Y parameter is, the more samples are needed, which usually increases the effectiveness of the estimation. But if Y is too big it could cause problems, such as the baseline being affected by changes in the characteristics of the workload run by the DC. Thus, in our model we have used X = 7, Y = 30.
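A minimal sketch of a middle X of Y baseline is given below. It assumes that days are ranked by their total daily energy and that the X days in the middle of that ranking are averaged step by step; the exact selection rule used in [50], the function name `middle_x_of_y_baseline` and the toy profiles are assumptions.

```python
import numpy as np

def middle_x_of_y_baseline(daily_profiles, x):
    """Baseline as the mean profile of the X 'middle' days out of Y.

    daily_profiles: shape (Y, steps_per_day), one row per previous day.
    Assumption: days are ranked by total daily energy, and the highest and
    lowest days are discarded symmetrically until X days remain.
    """
    profiles = np.asarray(daily_profiles, dtype=float)
    y = profiles.shape[0]
    if not 0 < x <= y:
        raise ValueError("x must be between 1 and the number of days Y")
    order = np.argsort(profiles.sum(axis=1))   # lowest to highest daily load
    drop_low = (y - x) // 2                     # days discarded at the low end
    middle = order[drop_low:drop_low + x]
    return profiles[middle].mean(axis=0)

# Five made-up days with two time steps each; the last day is an outlier.
days = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0], [100.0, 100.0]])
baseline = middle_x_of_y_baseline(days, x=3)
```

With the parameters stated above, the call would use 30 rows of historical daily profiles and `x=7`.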
Considering the calculated baseline, we aim to forecast the DC energy flexibility for the day-ahead and intra-day timeframes. To estimate the degree to which the DC can increase or decrease its load in a DR program using its internal latent flexibility, and to measure the adaptation during the program in a time interval [t_start, t_end], we have used the adaptability power curve (APC) metric defined in the context of the EU Smart City Cluster [51]. The APC metric computes the Manhattan distance between the actual and baseline energy profile vectors and normalizes it using the total power demand over the DR program time interval [t_start, t_end]. The APC metric is defined for each DC sub-system and for the entire DC.
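The verbal description of the metric can be turned into a short sketch. The exact published form of the APC (for example, whether the normalization uses the actual or the baseline demand, or whether the value is reported as a complement to 1) is not recoverable from the text, so the function below, with the hypothetical name `adaptability_ratio`, only illustrates a Manhattan distance normalized by the total actual demand.

```python
def adaptability_ratio(actual, baseline):
    """Manhattan distance between two profiles over [t_start, t_end],
    normalized by the total actual demand (an assumed normalization)."""
    if len(actual) != len(baseline):
        raise ValueError("profiles must cover the same timestamps")
    total = sum(actual)
    if total == 0:
        raise ValueError("total demand must be non-zero")
    distance = sum(abs(a - b) for a, b in zip(actual, baseline))
    return distance / total

# Made-up profiles over four timestamps of a DR program.
ratio = adaptability_ratio([10.0, 12.0, 8.0, 10.0], [10.0, 10.0, 10.0, 10.0])
```

A value of 0 means that the DC followed its baseline exactly, while larger values indicate a stronger adaptation of the load.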
Considering this metric, we define for each DC sub-system, over a timeframe [t_start, t_end] at granularity g, the flexibility above as the energy consumption values that are higher than the baseline and the flexibility below as the energy consumption values that are lower than the baseline. The flexibility above and below profiles, as well as the baseline for a DC sub-system over a period of 24 h, are illustrated in Figure 6. The difference between the above profile and the baseline, as well as the difference between the baseline and the below profile, provide the energy features for the machine learning algorithms used to forecast the energy flexibility of each DC sub-system.
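A minimal sketch of this decomposition is given below, assuming that the actual and baseline profiles are sampled at the same timestamps. The function name `split_flexibility` is hypothetical, and filling the non-applicable timestamps with zero is an assumption of this example.

```python
def split_flexibility(actual, baseline):
    """Return the (above, below) differences with respect to the baseline.

    above[t] = actual[t] - baseline[t] where the demand exceeds the baseline,
    below[t] = baseline[t] - actual[t] where the demand is under the baseline;
    the other timestamps are set to 0.0 (assumed convention).
    """
    above = [max(a - b, 0.0) for a, b in zip(actual, baseline)]
    below = [max(b - a, 0.0) for a, b in zip(actual, baseline)]
    return above, below

# Made-up values for three timestamps.
above, below = split_flexibility([12.0, 9.0, 10.0], [10.0, 10.0, 10.0])
```

The two resulting series can then be used as separate targets for the above-baseline and below-baseline forecasting models.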

The flexibility forecasting aims to determine the demand flexibility of each DC sub-system over M future timestamps at granularity g, based on a set of N historical values of the considered features. The main sources of energy demand flexibility in a DC are the IT servers and cooling sub-systems. We have considered that other components that may deliver certain flexibility levels, such as the auxiliary energy storage devices, are included in the flexibility profiles of the above-mentioned sub-systems. Following the energy relation between the DC sub-systems (see Figure 7), the cooling system flexibility forecasting model takes as input the output generated by the server room flexibility forecasting model. Each individual sub-system flexibility model has as inputs the historical monitored energy consumption values, the baseline values over the previous N time steps and the estimated baseline for the next M time steps, while the outputs of the models are aggregated to compute the total DC energy demand flexibility. Each sub-system flexibility model is implemented using two neural networks, one predicting the flexibility below the baseline (Flex_below^sub-system) and one predicting the flexibility above the baseline (Flex_above^sub-system). We have considered that the individual neural network models have similar characteristics, being composed of an input layer, two hidden layers with H neurons each and an output layer with M neurons.

Genetic Algorithm Based Ensemble
The DC energy prediction result over a specific forecasting time interval T is calculated as a weighted average of the outcomes of the individual weak learners, where j ranges over the individual weak learners and ω^T_NN-model_j is the weight of the energy prediction outcome generated by NN-model_j in the ensemble process. To determine the optimal values of the weight matrix ω (i.e., the best combination of weights <ω_t> for each timestamp t ∈ T) while taking into account the characteristics of the energy input data, the prediction goal and the energy interrelations between the different sub-systems of the DC, we use a genetic algorithm. The sum of all weights for each timestamp in the interval should be equal to 1. We have modeled each individual chromosome of the genetic algorithm as a vector representing a potential DC energy prediction weighted ensemble configuration. The entire population is composed of r individuals. We define the fitness function aiming to minimize the MAPE between a potential weighted energy prediction ensemble and the actual monitored energy data:

MIN(fitness(T, I)), where fitness(T, I) denotes the MAPE obtained over the forecasting time window T by the weighted ensemble encoded by individual I.
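To make the weighted ensemble and its fitness concrete, the sketch below combines the forecasts of several weak learners with one weight vector per timestamp and scores the result with the MAPE. The data layout (`weights[t][j]`, `predictions[j][t]`) and the function names are assumptions chosen for this example.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def ensemble_prediction(weights, predictions):
    """weights[t][j]     : weight of model j at timestamp t (each row sums to 1)
    predictions[j][t] : forecast of model j at timestamp t"""
    return [sum(w * predictions[j][t] for j, w in enumerate(row))
            for t, row in enumerate(weights)]

def fitness(weights, predictions, actual):
    """MAPE of the weighted ensemble against the monitored data (lower is better)."""
    return mape(actual, ensemble_prediction(weights, predictions))

# Made-up example: two models that err in opposite directions.
actual = [100.0, 100.0]
predictions = [[90.0, 110.0], [110.0, 90.0]]
equal_weights = [[0.5, 0.5], [0.5, 0.5]]
combined = ensemble_prediction(equal_weights, predictions)
```

In this toy case each model alone has a 10% MAPE, while equal weights cancel the errors, which is the effect the per-timestamp weights are meant to exploit.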
The pseudocode for the evolutionary optimized ensemble is presented in Figure 8. Each chromosome in the genetic algorithm has several genes corresponding to the length of forecasting time window T (number of timestamps considered): 24 in the case of day-ahead, and 8 in the case of intra-day. Initially the individuals are randomly created by generating a random weight vector (line 11) for each gene corresponding to the timestamp t.

Figure 8. Pseudocode for the evolutionary optimized ensemble. Its header lists the number of chromosomes in the population, the number of generations for population evolution, the number of chromosomes that will be used for mating and the number of individual prediction models; the result is the individual containing the weights of the ensemble with the best fitness value.
Then, for each new generation, the fitness function is computed for all individuals in the population (lines 12-13). The individuals with the best fitness values are selected as parents and mated for the next population generation (line 14). Using the crossover operation, each new offspring individual takes its first half of genes from the first parent and the second half from the second parent (the crossover point being defined at the center).
Next, several genes are selected for mutation. A random value α is added to or subtracted from the individual prediction model weights at a randomly determined position (the index of the gene), such that the mutation maintains the constraint defined in relation (20), i.e., the weights for each timestamp sum to 1. New populations are created based on the parents and offspring, re-iterating through the process until the maximum number of generations defined is reached (lines 15-18). In the end, the algorithm returns the best individual from the population, which contains the encoding of the ensemble weight matrix ω for each timestamp t of the forecasting time window T.
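The steps described above (random initialization, fitness evaluation, parent selection, center-point crossover and constraint-preserving mutation) can be sketched as follows. This is not the pseudocode of Figure 8: the parameter names, the default values and the way the mutation moves α between two model weights of the same gene are assumptions made to obtain a small runnable example.

```python
import random

def mape(actual, predicted):
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def fitness(individual, predictions, actual):
    """individual[t][j] is the weight of model j at timestamp t."""
    combined = [sum(w * predictions[j][t] for j, w in enumerate(gene))
                for t, gene in enumerate(individual)]
    return mape(actual, combined)

def random_gene(n_models, rng):
    raw = [rng.random() + 1e-9 for _ in range(n_models)]
    total = sum(raw)
    return [r / total for r in raw]              # weights sum to 1

def crossover(p1, p2):
    cut = len(p1) // 2                           # crossover point at the center
    return [list(g) for g in p1[:cut]] + [list(g) for g in p2[cut:]]

def mutate(individual, rng, max_alpha=0.1):
    gene = individual[rng.randrange(len(individual))]
    if len(gene) < 2:
        return
    i, j = rng.sample(range(len(gene)), 2)
    # Assumption: alpha is moved from weight j to weight i, so the sum stays 1
    # and no weight becomes negative.
    alpha = min(rng.uniform(0.0, max_alpha), gene[j])
    gene[i] += alpha
    gene[j] -= alpha

def evolve(predictions, actual, pop_size=30, generations=50, n_parents=10, seed=0):
    rng = random.Random(seed)
    n_models, horizon = len(predictions), len(actual)
    population = [[random_gene(n_models, rng) for _ in range(horizon)]
                  for _ in range(pop_size)]

    def score(individual):
        return fitness(individual, predictions, actual)

    for _ in range(generations):
        parents = sorted(population, key=score)[:n_parents]   # best individuals
        offspring = []
        while len(parents) + len(offspring) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            child = crossover(p1, p2)
            mutate(child, rng)
            offspring.append(child)
        population = parents + offspring                      # parents are kept unchanged
    return min(population, key=score)

# Made-up intra-day example: 8 timestamps, two weak learners with opposite bias.
actual = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0]
preds = [[a * 1.08 for a in actual], [a * 0.93 for a in actual]]
best = evolve(preds, actual)
```

Because the selected parents are carried over unchanged into the next population, the best fitness value can never get worse from one generation to the next in this sketch.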

Experimental Results
We have conducted a set of in-lab experiments to estimate the potential of our DC ensemble-based energy forecasting engine to generate accurate energy demand and energy flexibility predictions, enabling the DCs to participate in DR programs. The prediction results are calculated for the day-ahead and intra-day forecasting time windows and communicated on demand to the DSO, allowing it to accurately construct the next-day prognosis in the micro grid and potentially detect congestion points. For evaluation purposes, we have considered the hardware characteristics as well as the historical energy consumption data of a medium scale DC (see Table 1) [52].

Table 1. Hardware characteristics of the considered DC (columns: Sub-System, Characteristics).
Figure 9 shows the historical energy demand values for the DC, split into the IT servers and cooling sub-systems. The data span a period of 3 months with a sampling rate of 10 min. The initial data have been split into 80% for training and 20% for testing purposes. Out of the training data, 20% has been kept for training the genetic algorithm and has not been presented to the weak learners in order to avoid prediction model overfitting. The ensemble has been evaluated on the 20% of data kept for testing purposes.
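The data partitioning described above can be sketched as follows. The text does not state whether the split is chronological or random; the example assumes a chronological split, which is a common choice for time series, and the function name `chronological_split` is hypothetical.

```python
def chronological_split(samples, test_frac=0.2, ga_frac=0.2):
    """Split time-ordered samples into three consecutive parts:
    weak-learner training data, genetic-algorithm training data and test data.

    test_frac : share of all samples kept for testing
    ga_frac   : share of the remaining training samples kept for the genetic algorithm
    """
    n = len(samples)
    n_test = int(round(n * test_frac))
    train_all, test = samples[:n - n_test], samples[n - n_test:]
    n_ga = int(round(len(train_all) * ga_frac))
    learners = train_all[:len(train_all) - n_ga]
    ga = train_all[len(train_all) - n_ga:]
    return learners, ga, test

# 100 time-ordered samples give 64 / 16 / 20 samples.
learners, ga, test = chronological_split(list(range(100)))
```

Keeping the genetic algorithm data separate from the weak-learner data means that the ensemble weights are tuned on forecasts the individual models have not been fitted to.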
We have considered two types of individual neural network models (i.e., differentiated by the neuron types) as mathematical functions used for regression, aiming to forecast energy values over a future time window: (i) MLP that uses rectified linear units (ReLU) [53] and (ii) LSTM [54]. The MLP has proven its suitability for regression problems because it can be seen as a logistic regressor that is fed through an intermediate layer, called the "hidden layer", activated by a non-linear function. LSTM has gained popularity due to its capability to learn long-term dependencies in time series data and to scale up to several layers of LSTMs.
The DC energy prediction models have been implemented in the Python programming language using the TensorFlow machine learning library, making use of the integrated Keras API. Experiments have been carried out on a system equipped with an Intel Core i5 7600K CPU at 3.80 GHz, 24 GB of RAM and an NVIDIA GeForce GTX 1050 GPU.

DC Energy Demand Prediction Results
We have evaluated the performance of the implemented energy prediction model for forecasting the DC energy demand, considering both the day-ahead and intra-day forecasting time horizons. Each neural network model (i.e., NN-model) has been configured according to the energy features of the DC sub-systems and the timeframe for which the prediction must be computed. The numbers of inputs, outputs and hidden layers, as well as the neuron types, are presented in Table 2.
The first set of experiments aims to evaluate the performance of the ensemble predictor considering the day-ahead prediction framework and the results obtained over the test days by the MLP and LSTM neural networks. Figures 10 and 11 present details on the best forecasting results (i.e., the best day from the testing set) obtained in terms of the predicted energy profile compared with the actual one for different configurations of the forecasting models. The MAPE values for both types of weak learners considered are above 8% (i.e., 8.68% for MLP and 8.50% for LSTM).

The ensemble predictor, which uses the genetic algorithm-based approach to generate specific weights for all the timestamps of the forecasting window, achieves a better MAPE (see Figure 12) than the individual predictors (i.e., 8.15% for the IT servers and 8.09% for the cooling sub-system).
Figure 13 shows the average MAPE obtained by the two individual prediction models, LSTM and MLP, as well as by the ensemble model over the entire testing period for the day-ahead time frame. The average MAPE of the LSTM models is 9.50%, the MLP achieves a MAPE of 9.276%, while the ensemble models achieve a MAPE of 9%. As can be seen in the chart, on some days the LSTM works better, while on others the MLP models predict with better accuracy. On the second test day, the ensemble model achieves a MAPE of 8.15%, the best result obtained by any of the three models.
The second set of experiments aims to evaluate the enhancement brought by the intra-day forecasting process, considering not just the forecasting errors but also the difference between the energy values forecasted by the day-ahead and intra-day predictions and the actual monitored values. This represents an important measure of the prediction efficiency, as the deviation between the values forecasted on the two horizons and the real monitored values translates into uncertainty and cost in the delivery of flexibility services in the DR programs.
As can be seen from Figures 14-16, the prediction models on the intra-day forecasting outperform the day-ahead prediction results. At the same time, the ensemble predictor also gives the best results on the intra-day time window, achieving a MAPE of 7% on average across the server room and cooling sub-system components.
Furthermore, we have evaluated the deviation between the actual monitored energy values and the ones forecasted by the intra-day and day-ahead processes over the test data. We have computed the mean absolute error (MAE) value with respect to the actual values at a 10-min granularity. The intra-day prediction gives better results in terms of total energy estimation in 3 out of the 5 test days, achieving estimations that are better by more than 600 kWh of energy daily with respect to the day-ahead process. Over all 5 test days, the total energy prediction improvement of the intra-day prediction compared with the day-ahead one is about 100 kWh.
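The comparison between the two horizons relies on the MAE of each forecast with respect to the monitored values. The short sketch below shows this computation on made-up numbers; the values are illustrative only and are not taken from the experiments.

```python
def mae(actual, predicted):
    """Mean absolute error between monitored and forecasted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up 10-min samples (not measured data).
monitored = [100.0, 110.0, 120.0, 115.0]
day_ahead_forecast = [90.0, 118.0, 131.0, 108.0]
intra_day_forecast = [97.0, 112.0, 124.0, 113.0]

mae_day_ahead = mae(monitored, day_ahead_forecast)
mae_intra_day = mae(monitored, intra_day_forecast)
intra_day_is_better = mae_intra_day < mae_day_ahead
```

A lower MAE for the intra-day forecast indicates that the refinement reduces the deviation from the monitored profile on that day.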
Figure 17 shows the predictions of the day-ahead and intra-day models plotted against the actual monitored data at a 10-min sampling rate over the 4th day of test data. During this test day, the MAE between the day-ahead forecast and the monitored data is 191.42, while the MAE between the intra-day forecast and the monitored data is 179.42, meaning that, on average, the prediction is better by about 277 kWh of energy.
Table 3 presents the prediction results for the electrical energy consumption of the IT servers and cooling sub-system of the considered DC.

DC Flexibility Forecasting Results
For predicting the DC energy flexibility, we have divided the forecasting problem into two sub-problems, namely the prediction of the above-baseline flexibility curve and the prediction of the below-baseline flexibility curve. For this purpose, the initial training data have been split according to the computed baseline into two training datasets containing the decomposition of the training curve into below and above differences with respect to the calculated DC baseline. Each flexibility ensemble model trains two neural networks, namely MLP and LSTM, which are ensembled using the genetic heuristic.
To train each MLP neural network within each flexibility forecasting model, a dataset consisting of <input, output> pairs was used. The IT servers' sub-system flexibility prediction model was trained first, and then its output was used to train the cooling system flexibility prediction model. Table 4 shows the inputs and outputs of the two neural networks composing the server room flexibility model and of the two neural networks composing the cooling system flexibility model. The neural networks were trained using a k-fold technique with 100 epochs.
The flexibility forecasting techniques were evaluated by predicting the above and below energy flexibility curves for the 4 days of test data. Figure 18 shows the results obtained for assessing the flexibility above the baseline over the test period. The prediction has a 6.17% MAPE, with the predicted values (colored in blue) closely following the real values (colored in orange). The baseline is depicted in green. Furthermore, Figure 19 depicts the below flexibility, for which the prediction exhibits a MAPE of 6.58%. The average MAPE of the day-ahead flexibility prediction is 6.37%.

Conclusions
In this paper, we have proposed an ensemble-based energy prediction model to forecast the energy demand and flexibility of DCs, aiming to enable their safe participation in DR programs. The selection of the short-term time horizons (i.e., day-ahead and intra-day), of the energy flavors considered and of the main DC sub-systems modeled as flexible assets was driven by the characteristics of current DR programs. The implemented ensemble-based DC energy prediction model is based on a set of individual neural network weak learners, while a genetic heuristic is used to determine the optimal combination of the outcomes of the individual predictors to minimize the prediction error. The results are promising, as they show that the model can feasibly be used for engaging DCs in DR programs. In the case of DC energy demand forecasting, the ensemble prediction obtained the best MAPE values compared to individual predictors such as MLP and LSTM, namely 8.15% for the day-ahead time frame. The intra-day predictors manage to improve the results generated by the day-ahead ones (7.2% MAPE), while for flexibility forecasting an average MAPE of 6.37% was obtained for the curves below and above the baseline.
As for future work, we plan to investigate how the prediction model could be decentralized to run closer to the edge (i.e., the consumption point) and how it could work in conjunction with Big Data technologies to allow the integration of large-scale distributed streams of energy data generated by IoT power meters and by a significant number of smart grid prosumers. At the same time, other non-energy related features (e.g., holidays) or social features will be considered to improve the prediction of energy related behavior, and we plan to test the proposed approach in the