Heating Homes with Servers: Workload Scheduling for Heat Reuse in Distributed Data Centers

Data centers consume large amounts of energy to execute their computational workload and generate heat that is mostly wasted. In this paper, we address this problem by considering heat reuse in the case of a distributed data center that features IT equipment (i.e., servers) installed in residential homes to be used as a primary source of heat. We propose a workload scheduling solution for distributed data centers based on a constraint satisfaction model to optimally allocate workload on servers to reach and maintain the desired home temperature setpoint by reusing residual heat. We have defined two models to correlate the heat demand with the amount of workload to be executed by the servers: a mathematical model derived from thermodynamic laws calibrated with monitored data and a machine learning model able to predict the amount of workload to be executed by a server to reach a desired ambient temperature setpoint. The proposed solution was validated using the monitored data of an operational distributed data center. The server heat and power demand mathematical model achieves a correlation accuracy of 11.98%, while in the case of machine learning models, the best correlation accuracy of 4.74% is obtained for a Gradient Boosting Regressor algorithm. Also, our solution manages to distribute the workload so that the temperature setpoint is met in a reasonable time, while the server power demand accurately follows the heat demand.


Introduction
Data centers (DCs) consume about 2-3% of the total electrical energy generated worldwide and are thus becoming a global problem. The energy is used not only for the DCs' primary business objective, which is to execute their clients' workload, but also to maintain temperature conditions for the safe operation of IT equipment. As a result, dealing with excess heat has become an expensive process that negatively affects the DCs' profit and sustainability [1]. The continuous hardware upgrades of computing resources, which increase the power density of the processors, make the cooling processes even more complex. This generates even higher energy demands for cooling systems in an effort to remove the heat produced by the computing resources. Studies show that DCs in 2019 consumed over 1900 MW of energy, while the associated heat generated is sufficient to heat around 2 million households that need 20 GJ of heating on average [2].
In current DCs, the cooling processes are executed continuously to remove the heat generated by the computing resources and transfer it to a heat exchanger that uses air or liquid coolant [3]. This is rather inefficient, as the energy is consumed twice: first by the computing resources to execute the clients' workload and second by the cooling system to dissipate the accumulated heat. To address this issue, new research directions have emerged lately aiming to reuse the otherwise wasted heat of the DCs in nearby district heat grids [4][5][6].
Despite these recent efforts, only a few DCs are effectively reusing the generated heat and only a fraction of the excess heat is being recovered [7,8]. There are several reasons making heat reuse difficult. First, the DCs should be in an urban agglomeration benefiting from policies, operations, and infrastructure that enable the smart distribution of thermal energy. Few cities are offering the needed conditions for DCs to re-use their heat. A positive example is Stockholm where almost 10 percent of the city's heating needs are assured by using heat recovery [9,10]. Second is the relatively low quality of the recovered heat and the losses that occur when it is transported over long distances. To cope with this problem, systems were designed to transfer the absorbed heat at a higher temperature, making it suitable for long-distance transportation. Heat pumps are used to increase the temperature of the recovered heat to make it more marketable. With the help of heat pumps, the heat generated by servers at around 40 degrees Celsius can be transferred to heat water to around 80 degrees Celsius, suited for long-distance transportation in the nearby residences. At the same time, studies have shown that the coolant flow rate and server room outlet temperatures are important factors for the system's overall efficiency. The third is the concern regarding the exploitation of DCs' thermal flexibility of the server rooms in safe conditions for the IT equipment's operation [11]. The equipment can overheat leading to malfunctions if the temperature within the server room increases to generate more heat. The formation of hot spots needs to be prevented using complex management strategies and accurate simulations of the thermodynamic processes within the DC are used to avoid dangerous hot spots [12,13].
In this paper, we address the problem of DCs heat reuse from a novel perspective while considering the case of distributed DCs. In such a design, the IT equipment is not deployed in a server room but is distributed and deployed into buildings and used to provide heat for the tenants while executing the workload (see Figure 1).  The aforementioned design can be more energy efficient because it will also reuse the electricity that is normally used for space heating for executing the workload. The IT equipment is used as a primary source of heat being deployed in the building's rooms, which eliminates the costs associated with the cooling processes. The building network connection is used to get the delay-tolerant workload to be executed.
In summary, the paper provides the following contributions:
• Definition of the thermal aware workload scheduling for the distributed DC case as a constraint satisfaction problem aiming to meet the workload service level agreements and at the same time meet the heat demand of the tenants.
• Definition of a thermodynamic model to accurately estimate the heat demand needed to be generated by the IT equipment and the workload to be allocated for execution to meet the temperature setpoint defined by the tenant.
• Development of a machine learning-based model for learning and correlating the heat demand with monitored data related to the actual temperature in the room, the temperature of the heat generated by the IT equipment, and the temperature setpoint.
• The heat models and workload scheduling solution were tested and validated considering the characteristics and actual monitored data from an operational distributed DC, with the results being promising in terms of meeting the heat demand and the heat model's correlation accuracy.
The rest of the paper is structured as follows: Section 2 presents the related work on DC heat reuse and workload scheduling models; Section 3 defines the thermal aware workload scheduling in a distributed DC; Section 4 describes the models defined to determine the heat demand and the correlation between the workload to be executed and the temperature of the generated heat; Section 5 presents experimental results obtained using test data from an operational distributed DC; and Section 6 concludes the paper.

Related Work
Most approaches in the literature address the heat reuse of the common type of DC in which all the IT equipment is hosted in one building (i.e., a server room) and features a support infrastructure for power and cooling management [6,14,15]. Several heat reuse options are proposed, such as district heating [7][8][9], hot water grids [16,17], or nearby office buildings [4,5,18]. The heat reuse policy and infrastructure are well established in Nordic European countries [6,9,10], thus the DC potential for heat reuse is usually analyzed considering this use case. Efficient heat reuse will provide new revenue streams for DCs, but at the same time, several research challenges still need to be faced, such as the low-grade waste heat generated by the servers, especially in the case of air-cooled DCs, and the high investment costs [1,19]. These investments usually address the deployment and installation of heat pumps that are used for raising the quality of the heat [20][21][22]. District heating is seen as one of the most promising alternatives for residual heat recycling. The DCs' waste heat has the potential of replacing natural gas-based heat, bringing considerable cost savings and a lower carbon footprint to local communities [19,23].
In the area of DCs' heat reuse, two main research topics have a strict relation to this paper's objective and contributions: modeling and simulating the thermal characteristics of the DC and the thermal aware workload scheduling.
The first topic addresses the development of models to study the thermodynamic processes inside a DC and to determine the heat generation, transfer, and reuse characteristics [4,24,25,26]. The thermodynamics impose limits on both the maximum allowable temperature of the microprocessors and the coefficient of performance of the heat pumps [27,28]. Setting higher temperature setpoints in the server room is proposed and used to improve the quality of the recovered heat [4,29]. In this case, accurate thermal models of the server room are developed to predict the temperature variations, detect the formation of hot spots which may lead to equipment malfunctioning, and evaluate alternatives in cooling system configurations [3,11]. Computational Fluid Dynamics (CFD) models of the server room are used to run simulations for studying the interactions between servers and cooling units and their effect on the heat and temperature distribution [5,13,30]. The server heat generation and dissipation rates are analyzed and used to set the recommended temperature values for inlet air into the server room [15,31]. The CFD models of the IT server room are used to analyze the supply air temperature of the cooling system units and the inlet air temperature to find the allowed range of temperature for not damaging the equipment and activating the cooling systems [11]. As a result, higher temperature setpoints can be used for short periods, while techniques such as pre- or post-cooling may be used for thermal profile adaptation [4,32]. Even though the CFD simulations provide accurate temperature predictions, they are computationally expensive [33]. Tradeoffs should be made among the accuracy of the simulation, execution time, and resource overheads. Mathematical models and machine-learning-based approaches are used to address such tradeoffs [4,29,34] with varying levels of success.
Mathematical models of the temperature evolution in a server room are presented in [35,36] addressing the thermal behavior concerning heat generation, circulation, and air-cooling system using Navier-Stokes equations expressing thermal laws or by using fast approximate solvers [37,38].
The combination of thermodynamic process simulations with machine learning techniques offers promising results for determining a set of parameters empirically from monitored data [5,39,40]. In [39], a thermal forecasting model is defined and used for predicting temperatures surrounding servers in data centers. Continuous streams of temperature and airflow measurements are collected for obtaining online predictions with real-time sensor measurements. In [40], a fast converging solution is proposed using both a feed-forward network and a dynamic recurrent artificial neural network. The neural networks learn incrementally, using the incoming stream of data samples. Adaptiveness is presented as an essential feature, as the model can learn the characteristics of a server room with minimal training, and then it may continuously adapt to newly fed data without retraining [41]. In [5], a heat reuse model is defined which combines the simulation of the thermodynamic processes in a server room with deep learning processes. Multi-Layer Perceptron neural networks are used for predicting the hot air temperature distribution in the server room. Some models use a set of parameters from the server room that are relevant for the thermodynamic processes and use machine learning to predict their evolution. Gradient boosting decision trees, artificial neural networks, or deep learning models are used to predict the server room temperature [42,43]. Finally, Grammatical Evolution techniques [44] and Environmentally Opportunistic Computing [45] are used for analyzing server and inlet air temperatures and predicting the temperatures, in conjunction with thermal models of DCs. The models should reflect the physical nature of the system, rather than fitting the data purely mathematically. This is enforced using rules for a model's generation expressed as grammars written in Backus-Naur form [46].
Thermal aware workload scheduling algorithms for heat reuse are derived from scheduling algorithms developed to minimize the cooling system energy consumption [12,47]. The main goal of these scheduling algorithms is to distribute the workload in a data center to maintain a low ambient temperature and avoid hotspot formation [47,48]. In the case of heat reuse, the workload scheduling aims to increase the efficiency of heat pump operation and to meet the heat demand of the district heating network [49]. They rely on an optimization problem, defined either reactively or proactively, whose complexity is highly dependent on the representation of the thermodynamics processes and the correlations considered among workload and power and heat demand [50,51]. Workload placement strategies considered are based on zones discretization, minimize the heat recirculation, and prioritize the servers for task allocation by observing hot airflow within the DC [47,48,51]. Scheduling methodologies common in DCs such as first-come-first-serve or backfilling do not usually consider the thermal perspective [52,53]. Machine learning-based models are proposed to infer scheduling policies with thermal features. Server room temperature prediction approaches using machine learning are proposed in conjunction with scheduling algorithms to avoid thermal stress and hotspot formation [54]. In [4], thermal aware workload scheduling is proposed to adapt the DC heat generation to the district heating demand and maximize the waste heat reuse. Neural networks are used to learn the heat generation and heat distribution in the server room. Thermal aware scheduling may consider different heuristics such as tasks and servers' classification in hot or cold thermal prediction models, node ranking based on heat generation features, etc. [48,51,55]. 
In [56], an optimization problem is proposed for thermal scheduling considering the optimal setpoints for the workload distribution and the temperature in the server room. Heat flow models are proposed for determining temperatures in case of a thermal aware workload scheduling policy, while a heat recirculation matrix is used to define the thermal influences between the servers [57,58]. Algorithms are proposed to allocate workload on multiprocessors while minimizing the makespan and temperature constraints [59,60]. They aim to reduce the chip temperature while meeting the workload SLA, with the optimization problem being usually modeled as a mixed-integer linear program. Thermal aware task scheduling approaches to adjust CPU frequency based on Dynamic Voltage Frequency Scaling [61] are used to manage the energy demand and heat generation [62][63][64]. In [65], the authors build a steady and dynamical thermal interaction model of the DC. Based on these models, a task assignment and frequency optimization are performed in the first optimization stage, while the second stage uses a model predictive control (MPC) to represent the optimization problems that aim to minimize the cooling system power demand. In [12], a thermal aware consolidation mechanism is defined using a heat recirculation matrix and a set of bio-inspired algorithms that minimize overall DC energy consumption. Finally, in [56], the scheduling optimization problem is defined by considering the energy footprint reduction with thermal exchanges while incorporating both temperature and workload constraints.
After analyzing the existing state of the art, we did not find any relevant literature approach that addresses the thermal aware workload scheduling in the case of distributed DCs while considering the IT equipment as a primary source of heat. Several papers advocate the Data Furnace as the method of heating residential homes by deploying IT equipment in their premises [66,67]. Nevertheless, they are at the stage of ideas promoting some of its advantages such as a smaller carbon footprint or a reduced total cost of ownership per server without offering an actual scheduling solution.
In our paper, we address the identified knowledge gap in the literature by proposing a thermal-aware workload scheduling solution for distributed DCs. Our approach can consider both the service level agreement constraints of the workload to be executed and the heat demand of the residential home's tenants. To accurately estimate the heat to be generated by the IT equipment for meeting the heat demand levels, we define thermodynamic and machine learning models. They evaluate the heat demand based on the temperature of the heat generated by the IT equipment to determine the workload to be allocated for execution such that the temperature setpoint defined by the tenant is met. The scheduling algorithm and heat models are evaluated considering relevant data sets from an operational distributed DC, showing promising results.

Thermal Aware Workload Scheduling
The distributed DC is modeled as a collection of N isolated pieces of IT equipment (i.e., servers or micro data centers) deployed in residential buildings. Each one uses the building internet network and is deployed in a room that needs to be heated: The IT equipment offers a direct heat source, converting the electrical energy consumed for workload execution into thermal energy (i.e., a computing heater). The thermal energy is dissipated through radiators in the surrounding air, heating the room like conventional heaters to maintain the thermal comfort of the inhabitants.
The objective, in this case, is to schedule the workload tasks to be executed by the IT equipment deployed in the room k to bring the ambient temperature $T^{k}_{Room}$ close to the setpoint temperature $T^{k}_{set-point}$ desired by the dwellers and to maintain it there: The workload scheduling should consider the dockerized tasks and Service Level Agreement constraints in terms of computational resources to be allocated and execution time deadline. Also, the number of task migrations should be kept as low as possible to minimize their impact on the task SLA.
We defined the workload to be scheduled for execution over an interval [0 ... ∇] as a set of M tasks: Each task specifies the computational resources that need to be allocated, the estimated execution time, and the deadline according to the Service Level Agreements: The rooms should be heated by reusing residual heat generated by the servers. For each server, the computational resources and the idle and maximum power consumption are specified: $Server: \langle CPU, RAM, HDD, P_{IDLE}, P_{MAX} \rangle$.
To allocate tasks on servers to be executed during the interval [0 ... ∇], a scheduling matrix $W_{scheduling}$ is defined. The matrix stores the starting times of the tasks and their allocation on specific servers: If a task j is scheduled to be executed on server k, then: otherwise, $t^{kj}_{start} = 0$. The subset of tasks $W^{k}_{allocation}$ scheduled to be executed on each server k, from the N locations where the DC IT equipment is distributed, is determined using the task scheduling matrix:
In the process of determining the scheduling matrix, several constraints need to be met. The first one refers to the execution time of a scheduled task, which is not allowed to exceed the specified execution deadline: The second set of constraints refers to the relation between the total computational resources requested by the tasks scheduled for execution on a server and the server's available resources. The computational resources allocated to the tasks scheduled on a server k should be less than the total resources of that server:
Considering the resources allocated to tasks, the computational resource utilization levels for a server k are computed as: The power consumed by the server is determined using the utilization ratio and the power characteristics of the servers as: For the function $f_{power}$, we used a linear model to compute the power considering the idle and maximum power of the server k and the CPU utilization level [68]: The energy consumed by the server k over the time interval [0 ... ∇] can be computed as an integral of the power over this time window:
Among the computational resources considered for a server, the main heat source is the CPU, which is responsible for 30% up to 65% of the total heat dissipated and also has the highest temperature. Other resources have less impact with varying proportions, thus we denoted them with parameters that can be determined empirically.
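The constraints and the linear power model described above can be illustrated with a minimal Python sketch. The displayed equations are not reproduced in this text, so the forms below are assumptions: the linear model is taken as idle power plus the idle-to-maximum range scaled by CPU utilization, the energy integral is approximated with the trapezoidal rule, and the resource check uses "not exceeding" the server capacity. The class names, field names, and helper functions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Server:
    cpu: float     # total CPU capacity (e.g., cores)
    ram: float     # total RAM
    hdd: float     # total disk
    p_idle: float  # idle power, W
    p_max: float   # maximum power, W


@dataclass
class Task:
    cpu: float        # requested CPU
    ram: float        # requested RAM
    hdd: float        # requested disk
    exec_time: float  # estimated execution time, s
    deadline: float   # SLA deadline, s from the start of the interval


def meets_deadline(start: float, task: Task) -> bool:
    """Deadline constraint: the task must finish no later than its deadline."""
    return start + task.exec_time <= task.deadline


def fits(server: Server, tasks: list) -> bool:
    """Resource constraint: summed requests must not exceed the server capacity."""
    return (sum(t.cpu for t in tasks) <= server.cpu
            and sum(t.ram for t in tasks) <= server.ram
            and sum(t.hdd for t in tasks) <= server.hdd)


def cpu_utilization(server: Server, tasks: list) -> float:
    """CPU utilization level as the ratio of allocated to total CPU."""
    return sum(t.cpu for t in tasks) / server.cpu


def linear_power(server: Server, u_cpu: float) -> float:
    """Assumed linear model: P = P_IDLE + (P_MAX - P_IDLE) * u_cpu."""
    return server.p_idle + (server.p_max - server.p_idle) * u_cpu


def energy(power_samples: list, dt: float) -> float:
    """Trapezoidal approximation of the integral of power over time (J)."""
    return sum((a + b) / 2.0 * dt
               for a, b in zip(power_samples, power_samples[1:]))


server = Server(cpu=8, ram=32, hdd=500, p_idle=50.0, p_max=250.0)
tasks = [Task(2, 4, 10, 30.0, 60.0), Task(2, 8, 20, 45.0, 120.0)]
power = linear_power(server, cpu_utilization(server, tasks))
```

With half of the CPU allocated, the assumed model places the server halfway between its idle and maximum power.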
According to the law of energy conservation, the electrical energy consumed by the servers can be transformed into heat that is dissipated in the room where the servers reside over the interval [0 ... ∇]. This heat can be split according to the $n_c$ computational resources of the server, with each having been assigned a weight ω according to the proportion of the thermal energy generated: Considering the tasks allocated over the interval [0 ... ∇], the function $f_{schedule}: \mathbb{R}^{M} \rightarrow \mathbb{R}^{n_c}$ estimates, for each server, the energy consumed by each component, which can be used to estimate the server heat generation.
The heat dissipation in each room leads to a temperature modification according to a function $f_Q$ that estimates the modification of the ambient temperature $T^{k}_{Room}$ over the interval [0 ... ∇]: In our case, the workload scheduling for a distributed DC aims to allocate the workload tasks on the IT servers to optimally generate the heat according to the residents' demand. The room temperature setpoints need to be reached as fast as possible and then kept constant over the rest of the time interval, with no task migrations, in each of the N locations where the IT equipment is distributed. The thermal aware workload scheduling problem is modeled as a constraint satisfaction problem to determine the optimal workload scheduling matrix $W_{scheduling}$ such that the setpoint temperatures in each of the N rooms are met (see Algorithm 1). Its solving process involves nonlinear programming [69] because of the non-linearities of the objective function and the workload scheduling function $f_{schedule}$ and the continuous values of the $W_{scheduling}$ matrix. Also, it is an NP-hard problem [70], thus an approximation algorithm is needed to determine a solution.
The main challenge is the unknown nature of the $f^{k}_{Q}$ function, which may feature complex representations, making the use of optimization algorithms derived from stochastic gradient descent unfeasible [71]. To address this, we split the thermal aware scheduling problem into two subproblems (see Figure 2). The first subproblem estimates, for each room k, the heat that needs to be generated by the server to make the transition from $T^{k}_{Room}(0)$ to $T^{k}_{set-point}$. The second one determines the workload scheduling matrix $W_{scheduling}$ such that a large enough workload is scheduled for execution on each server to generate heat to meet the demand. The latter problem can be solved using an adaptation of the Multiple Knapsack Problem [72].
In the case of the first subproblem, we aimed to approximate the heat demand $Q^{k}_{Demand}$ needed over the interval [0 ... ∇] to make the transition from $T^{k}_{Room}(0)$ to $T^{k}_{set-point}$ and then keep the temperature constant at the setpoint level. It can be solved by defining a function $f^{-1}_{Q}$ that estimates the amount of heat needed to make the temperature transition:
In the case of the second subproblem, the goal is now to determine the workload scheduling matrix $W_{scheduling}$ so that the heat generated by each server closely matches the heat demand of the corresponding room: The optimization still involves nonlinear programming, but the functions are not defined as black-box models. As a result, it can be solved using approximation algorithms used to determine solutions for the Multiple Knapsack Problem [72]. Using a time discretization of the interval [0 ... ∇], the scheduling problem can be reduced to an N × ∇ knapsack problem, where the N × ∇ knapsack volumes correspond to the values of the heat demand to be generated on each of the ∇ intervals in each of the N rooms, while the items being packed are the workload tasks to be deployed on servers and executed.
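The knapsack view of the second subproblem can be sketched as follows. This is a simple greedy heuristic for a single time slot, given only to illustrate the packing idea; it is not the approximation algorithm of [72], and it ignores the deadline and resource constraints. The function name, the room and task identifiers, and the per-task heat values are hypothetical.

```python
def greedy_heat_packing(heat_demand, task_heat):
    """Pack tasks (items) into rooms (knapsacks) for one time slot.

    heat_demand: dict mapping room -> heat to be generated in the slot (J)
    task_heat:   dict mapping task -> heat produced when the task runs (J)
    Returns (allocation, remaining), where allocation maps each room to the
    list of tasks placed there and remaining maps each room to unmet demand.
    """
    remaining = dict(heat_demand)
    allocation = {room: [] for room in heat_demand}
    # Consider the largest heat producers first.
    ordered = sorted(task_heat.items(), key=lambda kv: kv[1], reverse=True)
    for task, heat in ordered:
        # Pick the room with the largest unmet heat demand.
        room = max(remaining, key=remaining.get)
        # Place the task only if it does not overshoot the room's demand.
        if heat <= remaining[room]:
            allocation[room].append(task)
            remaining[room] -= heat
    return allocation, remaining


allocation, remaining = greedy_heat_packing(
    {"room_a": 10.0, "room_b": 6.0},
    {"t1": 7.0, "t2": 5.0, "t3": 3.0, "t4": 2.0},
)
```

In this toy instance, task t4 stays unallocated because placing it would overshoot the remaining demand of every room; a delay-tolerant task of this kind could be deferred to a later slot.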

Heat Demand Estimation
The thermal aware workload scheduling model described in the previous section needs accurate estimations of the heat demand to be generated by the IT equipment such that the temperature setpoint in the room set by the residents is met. In this section, we propose two models for implementing the $f^{-1}_{Q}$ function used to determine the heat demand estimations.

Server Heat Transfer Model
We defined a thermodynamic model of the server calibrated with measurements. To ease the representation, we assumed that the room temperature $T^{k}_{Room}$ is constant during the workload scheduling period [0 ... ∇]. This means that all the power generated by the server, $Q_{server}$, is dissipated in the surrounding environment by rising to the ceiling, while cold air from the floor passes through the computing element radiator, leading to an energy loss $Q_{loss}$ of the room.
The server power consumption for executing the allocated tasks, the temperature of the heat generated ($T_{server}$), and the room temperature are linked as follows [73]: where $c_{server}$ is the server heat capacity, $f_{air}$ represents the airflow over the server surface, $c_{air}$ is the specific heat capacity of air, and $\varepsilon_{server}$ is the thermal server efficiency. Thermal efficiency is defined as the ratio of real to maximum power transfer between the server's body and the airflow and can be experimentally determined by measuring the temperature of the emerging airflow, denoted as $T_{ex}$.
$\varepsilon_{server} = \frac{P_{server}}{P_{max}}$
The changes in the server power $P_{server}$ influence the temperature of the heat generated. We assumed that the power changes linearly with respect to time with the ratio R: $P_{server}(t) = R \times t + P_{server}(0)$.
We defined the transient state of the server-generated heat temperature $T_{server}$ over the time interval in which $P_{server}$ changes, while the equilibrium state was defined to be when $P_{server}$ reaches a constant value. Relation (25) can be used to define both the transient and the equilibrium regimes of the computational equipment. The transient temperature is defined in Equation (28) as a solution of (25), as a function of the time coordinate, while the solution in the equilibrium regime is given by Equation (29).
In Equations (28) and (29), which both involve the ratio $C_{server} / (f_{air} \times c_{air} \times \varepsilon_{server})$, the constants A and B are fixed by the initial conditions in each case. The equilibrium regime defined in Equation (29) tends asymptotically to a state independent of time due to the rapid exponential decay of the first term, eventually reaching a state depending only on the power workload and predicting the behavior of $T_{server}$ with respect to the change of $P_{server}$. The server temperature after the equilibrium temperature setting can be estimated with Equation (30), derived from (29) when the first term decays to zero.
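The transient and equilibrium behavior described above can be illustrated numerically. Since Relation (25) and Equations (28) to (30) are not reproduced in this text, the sketch below assumes a first-order lumped heat balance, $C_{server} \frac{dT_{server}}{dt} = P_{server} - f_{air} c_{air} \varepsilon_{server} (T_{server} - T_{Room})$, which may differ from the authors' exact formulation. The function names and all numeric values are hypothetical.

```python
def simulate_server_temperature(p_server, t_room, t_start,
                                c_server, f_air, c_air, eps, dt, steps):
    """Explicit Euler integration of the ASSUMED first-order heat balance.

    p_server: constant server power (W); t_room: constant room temperature;
    c_server: server heat capacity (J/K); f_air: air mass flow (kg/s);
    c_air: specific heat of air (J/(kg K)); eps: thermal efficiency.
    """
    k = f_air * c_air * eps  # heat transfer coefficient to the airflow, W/K
    temp = t_start
    trace = [temp]
    for _ in range(steps):
        temp += dt * (p_server - k * (temp - t_room)) / c_server
        trace.append(temp)
    return trace


def equilibrium_temperature(p_server, t_room, f_air, c_air, eps):
    """Steady state of the assumed balance: T_room + P / (f_air * c_air * eps)."""
    return t_room + p_server / (f_air * c_air * eps)


params = dict(p_server=200.0, t_room=20.0, f_air=0.02, c_air=1005.0, eps=0.5)
trace = simulate_server_temperature(t_start=20.0, c_server=5000.0,
                                    dt=1.0, steps=5000, **params)
t_eq = equilibrium_temperature(**params)
```

Under these assumed values, the time constant $C_{server} / (f_{air} c_{air} \varepsilon_{server})$ is roughly 500 s, so after 5000 s the simulated temperature has effectively settled at the equilibrium value.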
The parameters used in defining the server as a heater model are detailed in Table 1, which indicates the parameters that should be measured and those that are tuned experimentally for a particular model. The parameters from Table 1 that are determined experimentally can be computed while considering a set of measurements from the physical server room, allowing the model to be fitted to the exact configuration of the real-world system. Finally, after reaching the equilibrium temperature, the assumption that $T_{Room}$ is constant can be relaxed, as the computational equipment will dissipate heat in the room, leading to a temperature change. Considering the equations developed above, the heat per unit of time demanded by the servers over time to pass the transient regime and transition the temperature $T_{server}$ to a temperature $T_{server-equilibrium}$ close to $T_{server-final}$ can be computed as the server demand over the transient regime: $Q_{demand}(t) = P_{server}(t) = R \times t + P_{server}(0)$.
However, to transition the room temperature from $T_{ROOM}$ to $T_{set-point}$, we considered that the server has reached a state sufficiently close to the asymptotic equilibrium and that the relation between $T_{server-final}$ and $T_{ROOM}$ is approximately linear, so for small changes and fluctuations, the difference between the two temperatures is largely unchanged. Using simple thermodynamic considerations once again, it can then be stated that:
which is an expression of the energy required to heat the room from $T_{ROOM}$ to $T_{set-point}$ in time ∆t after the server has reached its equilibrium temperature, where M is the mass of the air in the room, computed by multiplying the room volume V by the air density $\rho_{air}$, and $P_{loss}$ is the energy lost per unit time by the room due to air drafts and imperfect thermal insulation. Thus, from (32) we can deduce the expression:
We can now obtain the more useful expression for $Q_{Demand}(t)$, given $T_{set-point}$ for the air in the room, knowing that $Q_{Demand}(\nabla) = P_{server-final}$ and that the increase in power is linear:
It should be noted that the time in which the room heats up, ∆t, is a parameter that can be chosen freely, but it should not be made too short, as this will cause the server to overheat, i.e., exceed the recommended functioning temperature of approximately 35 °C. The minimum value of ∆t can be deduced from the previous expressions. Finally, if the room is to be heated, $P_{server-final}$ must be larger than $P_{loss}$; otherwise, the room will not receive any net heat and its temperature will not increase.
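The room heating balance can be made concrete with a short Python sketch. Equation (32) and the expressions derived from it are not reproduced in this text, so the code assumes the energy balance $(P_{server-final} - P_{loss}) \times \Delta t = M \times c_{air} \times (T_{set-point} - T_{ROOM})$, read off from the description above, together with a linear power ramp that reaches $P_{server-final}$ at the end of the interval. The function names, the default air density and specific heat, and the example values are assumptions.

```python
def required_final_power(t_room, t_setpoint, volume, delta_t, p_loss,
                         rho_air=1.2, c_air=1005.0):
    """Server power (W) needed to heat the room air in delta_t seconds.

    Assumed balance: (P_final - P_loss) * delta_t = M * c_air * (T_sp - T_room),
    with M = volume * rho_air. rho_air (kg/m^3) and c_air (J/(kg K)) are
    typical values for air, not values taken from the paper.
    """
    mass = volume * rho_air
    return mass * c_air * (t_setpoint - t_room) / delta_t + p_loss


def linear_demand(t, horizon, p_start, p_final):
    """Heat demand per unit time under a linear ramp: R * t + P(0),
    with R chosen so that the demand equals p_final at t = horizon."""
    rate = (p_final - p_start) / horizon
    return rate * t + p_start


# Example: a 50 m^3 room heated by 2 K in one hour with 100 W of losses.
p_final = required_final_power(t_room=20.0, t_setpoint=22.0, volume=50.0,
                               delta_t=3600.0, p_loss=100.0)
```

In this example the net heating term is small compared to the losses, which is consistent with the remark that $P_{server-final}$ must exceed $P_{loss}$ for the room temperature to rise at all.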

Machine Learning-Based Model
The machine learning model aims to infer, from monitored data, the heat demand that needs to be generated by the server executing the allocated workload (see Figure 3). We considered that the room temperature is collected using heat sensors placed at a certain distance from the server. The heatsink temperature is collected using the server's embedded temperature sensor, whereas the power consumption is obtained using a wattmeter. The machine learning model takes as inputs the actual room temperature $T_{Room}(0)$, the initial power consumption $P_{server}(0)$, and the desired room temperature $T_{set-point}$, and provides as output the heat demand that the server should generate by executing workload to reach the desired temperature, $T_{Room}(\nabla)$.
All data needs to be smoothed so that outliers are eliminated. We aimed to detect areas of interest, defined by a high correlation coefficient between the heatsink and ambient temperature, limited between two local peaks of the temperatures and power. Thus, we used the Pearson Correlation Coefficient, as defined in Equation (35), to measure the linear correlation between two datasets. Only relevant samples with values larger than a threshold were considered for further processing, namely data where the room temperature, server temperature, and power consumed by the server present similar patterns.
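For reference, the standard textbook definition of the Pearson coefficient for two series of n samples is reproduced here. The symbols x, y, and n are generic placeholders (for example, x could be the heatsink temperature and y the ambient temperature over one window) and may differ from the notation used in Equation (35).

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}\;
               \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}}
```

The coefficient lies in [−1, 1], with values close to 1 indicating that the two series rise and fall together.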
As the area of interest still may contain data that might be irrelevant for our final purpose, it needed further processing. Only the initial and final temperatures were selected from the scheduling interval [0 . . . ∇]. They represent the relevant information for predicting the power consumption, given the temperature. The initial temperatures can be computed as the mean temperature before a sudden change, whereas the final ones can be computed as the mean temperature after stabilization. The same goes for power, consequently obtaining the four needed values: where t 1 , t 2 , . . . t n ∈ [0 . . . ∇], are timestamps of the data acquisition.
Several models were implemented and used to learn the behavior of the function f −1 Q used to estimate the heat demand. Table 2 presents the description of the models and their configuration determined empirically on the test data.

Table 2. Machine learning models used for determining the heat demand.
Linear Regression: The basic linear regressor, used to determine the baseline for prediction accuracy.
Polynomial Regression: A second-degree polynomial regressor. Multiple degrees were considered, but the validation score began to drop for degrees higher than 2.
Gradient Boosted Regression: 90 estimators with a maximum depth of 4. The samples had a minimum split of 5 and the learning rate was 0.1. The loss was computed using the least-squares method.
Random Forest Regression: 9 estimators with a maximum depth of 4.
Support Vector Regression: A support vector regressor with a radial basis function kernel and parameters C = 100, γ = 0.01, ε = 0.1.
K Neighbors Regression: K-Nearest Neighbors regression with 2 neighbors and uniform weights.
Deep Learning Regression: Multi-Layer Perceptron having one input layer, two hidden layers of 128 and 256 neurons, and one output layer. The activation function for the hidden layers is the Rectified Linear Unit (ReLU), and 500 epochs were used for training. The loss function was the mean squared error and the optimizer was ADAM. Early stopping was employed with a patience of 50 epochs, monitoring the minimum validation loss.
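The configurations in Table 2 could be instantiated roughly as in the following sketch. It is only an illustration under several assumptions: scikit-learn's GradientBoostingRegressor stands in for the gradient boosted model, the feature order (initial room temperature, initial power, set-point) and the synthetic data are invented for the example, and the Multi-Layer Perceptron is omitted to keep the block dependency-light.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR


def build_models():
    """Return regressors configured with the hyperparameters listed in Table 2."""
    return {
        "linear": LinearRegression(),
        "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
        # Assumption: scikit-learn implementation; its default loss is squared error.
        "gradient_boosting": GradientBoostingRegressor(
            n_estimators=90, max_depth=4, min_samples_split=5,
            learning_rate=0.1, random_state=0),
        "random_forest": RandomForestRegressor(
            n_estimators=9, max_depth=4, random_state=0),
        "svr": SVR(kernel="rbf", C=100, gamma=0.01, epsilon=0.1),
        "knn": KNeighborsRegressor(n_neighbors=2, weights="uniform"),
    }


# Synthetic placeholder data (NOT the monitored QRad data):
# columns = [T_room(0) in °C, P_server(0) in W, T_setpoint in °C]
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(18, 24, 200),
    rng.uniform(50, 150, 200),
    rng.uniform(20, 26, 200),
])
# Hypothetical target: required final power grows with the temperature gap.
y = 40.0 * (X[:, 2] - X[:, 0]) + 0.5 * X[:, 1]

models = build_models()
for model in models.values():
    model.fit(X, y)
```

Each fitted entry of `models` then exposes `predict`, so the same evaluation loop can be applied to all of them.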

Evaluation Results
To evaluate the thermal-aware workload scheduling and the heat demand estimation models, we considered a test case distributed DC [74] composed of a main DC and a set of edge sites (see Figure 4). Each edge site hosts a workload distribution node, the QBox, and a set of server heater nodes, the QRads, that provide heat by executing the workload provided by the QBox. The QRad is a server-heater with no moving parts and three motherboards equipped with CPUs that execute workload and dissipate the heat into the surrounding environment, leading to a temperature rise. Each QRad node consumes about 400 W of electrical power and generates roughly 400 W of heat, depending mainly on the CPU model. The workload that runs on the QRad is composed of tasks that use full CPU resources, such as 3D animation or financial risk computing.
A sensor-based monitoring infrastructure was used to acquire relevant data regarding the ambient temperature (T Room), the power consumption of the QRad (P server), and the temperature of the server (T server). The acquired data span several months and were recorded at a granularity of 10 s. Both the characteristics of the QRad heaters and the monitored data acquired by the installed infrastructure are the inputs of our study.
However, the monitored data used to model the thermal behavior of the QRad heaters must be extracted for larger intervals that show suggestive temperature and power changes. Thus, a pipeline of data pre-processing operations was employed to extract the relevant data samples (see Figure 5). The acquired data were smoothed using an exponentially weighted moving average with a span of 90 data points, corresponding to a time window of 15 min. A mean filter with a window span of 60 data points, corresponding to a time window of 10 min, was applied to ignore sudden fluctuations. To find the exact times when the ambient (T Room) and the server (T server) temperatures rise or fall together, Pearson's r coefficient was used. It was computed on a rolling window with a span of 360 data points, corresponding to a time interval of 1 h. The samples corresponding to intervals where the scores exceeded 0.5 were considered. Finally, local maximum and minimum peaks were found over window sizes of 180 data points, corresponding to 30 min time intervals. Only samples that started at a local minimum and ended at a local maximum, or vice versa, were taken into consideration. The process involved manual inspection of the selected data and fine fitting so that the included regions could also contain the instant change in power, which sometimes takes place before a local peak. Figure 6 shows a relevant data sample obtained by filtering, smoothing, and applying Pearson's coefficient. The length of the time interval ∇ is measured from the start to the end of the temperature change.
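The pre-processing steps described above can be sketched with pandas and SciPy as follows. This is a minimal illustration, not the pipeline actually used: the function names are hypothetical, the `distance` argument of `find_peaks` is only an approximation of searching for peaks "over window sizes of 180 data points", the manual inspection step is not reproduced, and the input is a noise-free synthetic trace sampled every 10 s.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks


def smooth(series):
    """EWMA with span 90 (15 min) followed by a 60-point (10 min) mean filter."""
    ewma = series.ewm(span=90).mean()
    return ewma.rolling(window=60, min_periods=1).mean()


def correlated_mask(t_room, t_server, window=360, threshold=0.5):
    """True where the rolling 1 h Pearson r between the two series exceeds 0.5."""
    r = t_room.rolling(window).corr(t_server)
    return r > threshold  # NaN (incomplete windows) compares as False


def local_extrema(series, distance=180):
    """Indices of local minima and maxima at least `distance` samples apart."""
    values = series.to_numpy()
    maxima, _ = find_peaks(values, distance=distance)
    minima, _ = find_peaks(-values, distance=distance)
    return minima, maxima


# Synthetic placeholder trace (NOT the monitored data): 2000 samples at 10 s.
t = np.arange(2000)
t_server = pd.Series(30.0 + 5.0 * np.sin(2 * np.pi * t / 1000))
t_room = pd.Series(20.0 + 2.0 * np.sin(2 * np.pi * t / 1000))

room_s, server_s = smooth(t_room), smooth(t_server)
mask = correlated_mask(room_s, server_s)
minima, maxima = local_extrema(server_s)
```

Candidate samples would then be the stretches where `mask` is true and that run from an index in `minima` to the next index in `maxima`, or the reverse.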
Analyzing the data set, we determined ∇ for each sequence so as to record most temperature changes and to reach a steady-state situation. Since most of the samples show that the server power change is similar to a linear function, only the final server power consumption was considered (P Server (t n + ∇)). This can be computed based on the initial state parameters at time t n: the ambient temperature T Room (t n), the server temperature T server (t n), and the server power demand P server (t n).
Firstly, we considered the server heat transfer model detailed in Section 4.1. The model was calibrated by fitting some of its parameters on the actual data gathered from the QRad heaters. The heat model parameters from Table 1 (i.e., the server heat capacity C server and the server thermal efficiency ε server) have to be computed to best fit the monitored data, while the airflow over the server (i.e., f air) also has to be estimated.
The fitting process consists of feeding a trace of processed data of the form (T ROOM (t n), P server (t n), T server (t n)) → P Server (t n + ∇) for each QRad server CPU to an optimizer that computes the model parameters using gradient descent-based algorithms [71]. The processed data samples were split into 80% training data for model fitting and 20% validation data. The fitting process is performed iteratively: at each step, a sample of training data is read, and the variables C server, ε server, and f air are varied from a set of initial values until they best fit the data. The sequential fitting was done using a custom script built on the scipy.optimize module of the SciPy library [75], applied to each data sample; the values of C server, ε server, and f air obtained from one fit were recorded and used as initial guesses for the next sample's fit. Repeating this operation gave us progressively refined values of the server's parameters.
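The warm-started, sample-by-sample fitting loop can be illustrated as below. This is a sketch under strong assumptions: `toy_model` is a made-up linear placeholder and not the heat-transfer model of Section 4.1, its three parameters merely play the role of C server, ε server, and f air, `scipy.optimize.least_squares` is one possible choice of routine (the exact routine used is not stated), and the data are synthetic.

```python
import numpy as np
from scipy.optimize import least_squares


def toy_model(theta, t_room, p_server, t_server):
    """Placeholder for the heat model: predicts the final power P(t_n + interval).

    NOT the paper's equations; theta stands in for (C_server, eps_server, f_air).
    """
    a, b, c = theta
    return a * (t_server - t_room) + b * p_server + c


def fit_sequentially(samples, theta0):
    """Fit each sample in turn, reusing the previous estimate as the initial guess."""
    theta = np.asarray(theta0, dtype=float)
    history = []
    for t_room, p_server, t_server, p_final in samples:
        def residuals(th):
            return toy_model(th, t_room, p_server, t_server) - p_final

        theta = least_squares(residuals, theta).x  # warm start from last fit
        history.append(theta.copy())
    return theta, history


# Synthetic training samples generated from known parameters (illustration only).
TRUE_THETA = np.array([2.0, 0.8, 5.0])
rng = np.random.default_rng(0)
samples = []
for _ in range(5):
    t_room = rng.uniform(18, 24, 20)
    p_server = rng.uniform(100, 400, 20)
    t_server = rng.uniform(25, 35, 20)
    p_final = toy_model(TRUE_THETA, t_room, p_server, t_server)
    samples.append((t_room, p_server, t_server, p_final))

theta_hat, history = fit_sequentially(samples, theta0=[1.0, 1.0, 1.0])
```

With real, noisy traces the estimates in `history` would drift from sample to sample, which is why carrying the previous result forward as the starting point helps each new fit begin close to a plausible solution.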
After having determined the characteristics for each QRad server, we predicted the change in server power on the testing data. This was done by fixing C server, ε server, and f air to the values obtained, reading T server and T Room from the data, and finally obtaining our prediction for the change in P server, which we compared to the one in the data. A prediction chart is shown in Figure 7, illustrating as a linear function the power change for a server heater (green line) needed to increase the temperature (blue line) so that it matches the requested temperature (red line) as closely as possible, over a time interval of 20 min. The model was evaluated on a set of scenarios, with both fitting processes described above, achieving the average prediction accuracy from Table 3.
Secondly, we evaluated the machine learning-based heat model presented in Section 4.2. Several machine learning algorithms were implemented to model the QRad server heat generation. The Linear and Polynomial Regression, Random Forest Regression, Support Vector Regression, and K Neighbors Regression were implemented using Python's scikit-learn [76] library. The Multi-Layer Perceptron was implemented using Keras [77] and TensorFlow [78], while the XGBoost library [79] was used for the Gradient Boosting Regression.
The dataset processed using the defined pipeline of operations was split into train and test subsets, with proportions of 0.8 and 0.2 of the initial data. The models were validated using 5-fold cross-validation and were evaluated by computing the Root Mean Square Error (RMSE), the Root Mean Square Percentage Error (RMSPE), the coefficient of determination R2 (R squared), the error mean, the error standard deviation, and the Mean Absolute Percentage Error (MAPE). The average results obtained are listed in Table 4. As Table 4 shows, considering the MAPE and the RMSPE, the best results are obtained by the Gradient Boosting Regression (GBR). These are the most relevant metrics for prediction accuracy, since they express the errors as percentages. They show that the GBR model can predict the heater power demand with less than 5% error, corresponding to less than 10 W of power. However, when the mean and the standard deviation of the error are analyzed alongside the RMSPE and the MAPE, the Random Forest Regression (RFR) gives better results, showing a narrow error distribution around the mean error. Finally, the R2 metric for these two models is the closest to 1, with values of 0.94 for GBR and 0.92 for RFR. Looking at all calculated metrics, we considered that the GBR gives the best results on the datasets collected from the QRad heaters and is the most suitable for use in a thermal-aware workload scheduling algorithm.
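For clarity, the evaluation metrics named above can be computed as in the following sketch, which uses their usual definitions. The function name and the three-point example are illustrative only, and the percentage metrics assume that no true value is zero.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE, RMSPE (%), MAPE (%), R2, error mean and error std."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rel = err / y_true  # relative error; requires non-zero true values
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "rmspe": float(100.0 * np.sqrt(np.mean(rel ** 2))),
        "mape": float(100.0 * np.mean(np.abs(rel))),
        "r2": float(1.0 - ss_res / ss_tot),
        "error_mean": float(err.mean()),
        "error_std": float(err.std()),
    }


# Illustrative values in watts (not taken from Table 4).
metrics = regression_metrics([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

Under 5-fold cross-validation, such a function would be applied to the held-out fold of each split and the resulting values averaged.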
Thirdly, we evaluated the thermal-aware workload scheduling solution presented in Section 3. The heating requirements for a room can be determined by several models used in the industry to determine the recommended size of the heaters [80]. The main factors that influence the required power of the heaters in a room are the volume of the room and the caloric coefficient of the room. The latter has a value between 40 and 70 kcal/m³ and is influenced by the thermal insulation of the room, the number of exterior walls, and the number of windows together with their size and type. According to industry standards, the power needed to heat a room is computed from three quantities: V ROOM, the volume of the room expressed in cubic meters (m³); C cal ROOM, the caloric coefficient of the room expressed in kcal/m³; and c cal W, the conversion factor from kcal to W, which has an approximate value of 1.163. We determined the required heating power for rooms with different configurations and we assessed the number of QRad heaters that need to be installed (see Table 5). We considered that the heat losses from the room are much smaller than the heat generation rate.
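A small worked example may help to fix the orders of magnitude. It assumes that the three quantities combine as a simple product, which is what their units suggest, and that each QRad contributes roughly 400 W of heat as stated earlier. The room dimensions and the coefficient of 50 kcal/m³ are hypothetical values chosen for illustration and are not taken from Table 5.

```python
import math

KCAL_TO_W = 1.163    # approximate conversion factor quoted in the text
QRAD_HEAT_W = 400.0  # approximate heat output of one QRad


def required_heating_power_w(volume_m3, caloric_coeff_kcal_m3):
    """Assumed form: room volume x caloric coefficient x conversion factor."""
    return volume_m3 * caloric_coeff_kcal_m3 * KCAL_TO_W


def qrads_needed(power_w, qrad_heat_w=QRAD_HEAT_W):
    """Smallest number of QRads whose combined heat covers the required power."""
    return math.ceil(power_w / qrad_heat_w)


# Hypothetical room of 4 m x 5 m x 2.5 m with a mid-range caloric coefficient.
room_volume = 4.0 * 5.0 * 2.5
power = required_heating_power_w(room_volume, 50.0)  # about 2907.5 W
count = qrads_needed(power)                          # 8 heaters
```

A better insulated room (a coefficient near 40) of the same volume would need about 2326 W, that is, 6 heaters under the same assumptions.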
From the electrical energy consumption perspective, a heating system based on electrical radiators would consume the same amount of electricity as the server heaters to generate the same amount of heat. This is a result of the server design having no moving components; thus, according to the law of energy conservation, most of the electrical energy consumed by the servers is converted into thermal energy and dissipated as heat in the room. Finally, as the servers are designed like standard heaters, they are installed in the room in the same positions as standard heaters and require no ventilation system for heat recirculation.
To illustrate our case study, we have considered the power and thermal behavior of 4 QRad server heaters, each containing three motherboards, each with an Intel® Core™ i7-6950X CPU. The servers' initial temperature is 29 °C and their total power demand is 540 W, while each of the four servers has only one of its three CPUs active, with one core active out of the 10 total cores. The temperature and power status of the servers over a time interval of 15 min is displayed in Figure 8, showing a steady-state condition in the room. The desired server temperature is 34.5 °C, requiring the heaters to generate thermal energy to increase the room temperature.
Figure 8. Initial conditions: the 4 QRad server heaters' total power demand and initial temperatures.
Figure 9 shows the power demand prediction using the GBR model for total heat demand estimation for the 4 QRad server heaters to transition the room temperature to the set-point temperature. The estimated power demand of each of the 12 CPUs is roughly 115 W, meaning that the 4 QRads will generate approximately 1.4 kW of heat.
Based on the power demand given by the GBR model for heat demand estimation, the thermal-aware workload scheduling algorithm aims to schedule workload for execution so that the CPU usage leads to the required power demand. We considered a set of synthetic tasks running in Docker, each requiring 1 active CPU core at nearly 100% utilization and 1 GB of RAM. The task allocation result after solving the optimization problem from Section 3 is shown in Figure 10 (left). For each of the 4 servers and their 3 CPUs, the plan activates 8 cores in the time interval 0-100 s and 9 cores in the time interval 100-800 s. Thus, in total 108 cores will be active during the heating period, leading to a total of 1488 W, shown in Figure 10 (right), a value close to the power demand of 1400 W predicted by the GBR model.
Finally, we use the data logs to estimate the thermal behavior of the 4 servers for the load computed by the thermal-aware scheduling algorithm while considering the load predicted by the GBR model for heat demand estimation. As Figure 11 (right) shows, the server power demand is close to the predicted heat demand, matching it with 97% accuracy. As a result of the power demand and workload execution, the servers dissipate heat in the room, leading to the temperature evolution shown in Figure 11 (left). The blue dotted line shows the room temperature evolution obtained by simulating the installed QRad behavior with the power demand depicted in Figure 11 (right). The green dotted line depicts the monitored room temperature extracted from the data logs, showing a temperature increase close to the simulated QRad behavior and matching the set-point temperature after 800 s.
Figure 10. CPU active cores due to task scheduling (left) and CPU power demand, estimated and predicted (right).
Figure 11. QRad simulation results: temperature evolution (left) and power demand evolution (right).

Conclusions
In this paper, we consider the case of distributed DCs and the associated problems related to heat reuse when the servers installed in residential homes are used as a primary source of heat. We propose a workload scheduling solution based on constraint satisfaction to allocate workload on servers for reaching and maintaining the desired temperature set-point in residential homes by reusing their residual heat. Two models were defined to correlate the heat demand with the amount of workload to be executed by the servers: a mathematical model derived from thermodynamic laws and calibrated with monitored data, and a machine learning model able to predict the amount of workload to be executed by a server to reach a desired temperature set-point. The results obtained on monitored data from an operational distributed DC are promising. The workload scheduling solution can distribute the workload so that the temperature set-points are met in a reasonable time, while the server heat and power demand correlation models achieve good accuracy levels.
