A Temperature-Risk and Energy-Saving Evaluation Model for Supporting Energy-Saving Measures for Data Center Server Rooms

As data centers have become increasingly important in recent years, their operational management must attain higher efficiency and reliability. Moreover, the power consumption of a data center is extremely large and is expected to continue increasing, so energy saving has become an urgent issue for data centers. Meanwhile, the environment of server rooms in data centers has become complicated owing to the introduction of virtualization technology, the installation of high-heat-density information and communication technology (ICT) equipment and racks, and the diversification of cooling methods, and managing a server room in such an environment is very difficult. When energy-saving measures are implemented in such a server room, it is important to evaluate “temperature risks” in advance and to calculate the energy-saving effect after the measures are taken. In this study, two models are therefore proposed: a model that predicts the rack intake temperature (so that the temperature risk can be evaluated in support of energy-saving measures implemented in the server room) and a model that evaluates the energy-saving effect (in relation to a baseline). Both models were constructed by using machine learning. The first model evaluates the temperature risk in a verification room in advance, and it was confirmed that it can do so with high accuracy. The second model (the “baseline model” hereafter) supports energy-saving measures, and it was confirmed that it can calculate the baseline (energy consumption) with high accuracy as well. Moreover, a process for proposing energy-saving measures in the verification room was verified by using the two proposed models. In particular, the effectiveness of the model for evaluating temperature risk in advance and that of a technology for visualizing the energy-saving effect were confirmed.


Introduction
Information and communications technology (ICT) systems have become important tools for building the infrastructure that supports social life. Against that background, the role of data centers, which manage that information, is becoming more important [1]. Moreover, the power consumption of a data center is huge and is predicted to keep increasing; accordingly, reducing that power consumption has become an urgent issue [1,2]. Meanwhile, owing to factors such as cloud computing, virtualization technology, the high heat-generation density of ICT equipment, and the diversification of cooling methods, the environment surrounding a data center is becoming ever more complicated [3,4]. Even with such a complex environment, a data center must be operated at (ESCO) projects targeting building equipment, the energy-saving effect may be evaluated by using statistical values (such as average and minimum) of past data (the previous year's data, for example) as the baseline [18]. Jian et al. estimated the yearly power consumption of a cooling system by using the maximum and minimum values of some facilities [19]. However, it is not desirable to use statistical values such as the average, minimum, or maximum of past data as a baseline in an environment that changes fluidly in time and space (like a server room), since it is unknown whether the effect can be properly evaluated. In addition, when evaluating energy-saving measures, it is preferable to estimate the baseline on a monthly, daily, or hourly basis rather than on an annual basis. Another approach is energy simulation (for example, CFD is used in data centers, and TRNSYS [15] and EnergyPlus are used in buildings). Maurizio et al. used TRNSYS to calculate the energy consumption of electric steelmaking plants in order to evaluate energy and cost for different cooling methods [20]. Chao et al. used EnergyPlus to calculate energy consumption for three types of building (large office, small office, and residence) [21]. Cheng et al. reported the results of calculations using EnergyPlus based on BIM [22]. Stefan et al. combined EnergyPlus and CFD to calculate the COP of a cooling system in a building [23]. However, as with temperature prediction, building a model and setting boundary conditions take a lot of time, and the versatility of such models is low. To address these issues of energy simulation, there is also a technique that formulates a baseline as a regression equation using historical data. Chao et al. constructed a regression model and confirmed that energy consumption in a large office, a small office, and a residence can be predicted with high accuracy [21]. In addition, Massimiliano et al. constructed a multiple regression model based on the outside air temperature and other variables and showed that the power consumption of a building can be predicted [24]. From these studies, it is considered that the baseline can be estimated quickly and with high accuracy by using a model constructed from past data. However, baseline technology is mostly applied to office buildings and residences, and there are not many studies on baselines for CRACs in server rooms. In the present study, we therefore propose a model that calculates the baseline for CRACs in a server room and that is suitable for their current operation. The model is constructed by using machine learning with data from a server room, and it can perform self-learning with high accuracy.
When implementing energy-saving measures for data centers, it is important to execute the following four steps appropriately: (i) plan the measures, (ii) evaluate the temperature risk before implementing them, (iii) implement them, and (iv) evaluate their energy-saving effects (Figure 1). Accordingly, in this study, two models are proposed: a "rack intake temperature prediction model" that evaluates the temperature risk before the energy-saving measures are implemented, and a "baseline model" that supports the evaluation of the energy-saving effect. The accuracy of each model was evaluated by experiments in a verification room. Energy-saving measures were then implemented by applying these highly accurate models, and the effectiveness of each model was confirmed.

Verification Room
The configuration of the verification room is outlined in Figure 2, and its specification is listed in Table 1. The verification room is equipped with two computer room air conditioners (CRACs), two "task-ambient" CRACs (racks A3 and B5), and 26 racks (four rows, A to D) for ICT equipment. The target of the energy-saving measures in this study is the two CRACs (excluding the task-ambient ones). The verification room is configured in the "cold-aisle containment" manner [8].

Verification Data
In recent years, data center infrastructure management (DCIM) systems, which support efficient operation through the integrated management of various facilities and equipment in data centers, have attracted attention [25,26]. In a previous study, we summarized the concepts of DCIM and examples of its effects [27]. We use the DCIM as an onsite data-collection system installed inside a server room (Figure 3). The data-collection system is connected to the CRACs, rack intake temperature sensors, and power-distribution units (PDUs) for the racks via a local area network (LAN), and it collects and stores data from the connected devices. The data-collection interval for the CRACs and temperature sensors is one minute, and that for the rack PDUs is five minutes. The rack intake temperature prediction model uses the data accumulated by the data-collection system. In this study, 30-min average values were used. Rack intake temperature is defined as the value measured by the temperature sensor installed on the rack surface at a height of 1.5 m (Figure 4).
The operation data listed in Table 2 was used for constructing and evaluating the rack intake temperature prediction model and the baseline model for the CRACs. The models were evaluated during the period from 21 October to 23 December 2019; that is, the effectiveness of each model was verified by simulating an energy-saving measure that changes the setting of the CRAC return temperature.
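The 30-min averaging of the one-minute sensor data described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `aggregate_30min`, the `(timestamp, value)` record layout, and the sample temperatures are all hypothetical, and the windows are assumed to be aligned to the hour and half-hour.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def aggregate_30min(samples, window_minutes=30):
    """Average (timestamp, value) samples over fixed windows.

    Assumes window_minutes divides 60, so windows align to the clock hour.
    Returns a list of (window_start, mean_value) pairs sorted by time.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Floor the timestamp to the start of its window.
        floored_minute = (ts.minute // window_minutes) * window_minutes
        start = ts.replace(minute=floored_minute, second=0, microsecond=0)
        buckets[start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Hypothetical one-minute rack intake temperatures covering one hour.
t0 = datetime(2019, 10, 21, 0, 0)
one_minute = [(t0 + timedelta(minutes=i), 22.0 if i < 30 else 23.0) for i in range(60)]
averaged = aggregate_30min(one_minute)
```

The same function could be applied to the five-minute PDU data, since each 30-min window then simply contains fewer samples.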

Construction of Prediction Model
The rack intake temperature prediction model and the model for calculating the energy-consumption baseline for the CRACs are described hereafter.

Aim of Each Construction Model
In our research, as described in Section 1, we address the issue of energy saving in data centers whose load is highly fluid in time and space by proposing energy-saving measures that include the prior assessment of temperature risk and the verification of the energy-saving effect. To realize this proposal, technologies for predicting the rack intake temperature and for calculating the baseline of the CRACs are necessary, and this section describes how the two models are constructed.
Regarding the prediction of rack intake temperature, previous research has the problem that building a high-accuracy model using CFD takes a lot of time. Therefore, we construct and verify models based on multiple machine-learning methods that can be built by self-learning from past data, with the goal of obtaining highly accurate models. Regarding the baseline, the past statistical values (maximum and minimum values) often used in previous studies are difficult to apply as the baseline of CRACs in a data center with a load that is highly fluid in time and space. We therefore aim to build and verify a highly accurate machine-learning model that can be built by self-learning from past data, using multiple methods selected from the viewpoint of the AI's explanatory power.

Outline of Prediction Model of Rack Intake Temperature
The model that predicts rack intake temperature is described first.

Construction of Model for Predicting Rack Intake Temperature
For this prediction model, the target variable is the rack intake temperature one hour (or 30 min) ahead. The explanatory variables used in the prediction are listed in Table 3. The data set is divided into a learning period and a verification period, and the model constructed with the learning-period data is evaluated with the verification-period data. In this verification, data acquired up to n days before the verification period was used as the learning-period data. In a previous study, we examined a model that predicts the rack intake temperature 30 min ahead in response to changes in the ICT equipment in the server room and confirmed that highly accurate prediction is possible with two methods (gradient boosting decision tree (GBDT) and a state-space model) [28]. Based on those results, three methods were selected as candidates: the commonly used linear regression and the two methods of the previous study (Table 4). First, the prediction model using each method was verified with a dataset for April 2016, the accuracy of each method was evaluated, and the features of each method were identified. According to the results of these evaluations, the methods were narrowed down and then verified and evaluated in detail with a dataset acquired from May to October 2017, and the model was selected. When GBDT was selected as the machine-learning method, the library "XGBoost," which provides a scikit-learn-compatible Python interface, was used. The grid-search method was used, with parameters set as follows: learning rate: 0.1; lower limit of loss reduction due to addition of leaves: 0; maximum depth of tree: 3; ratio of randomly sampled samples (data): 1; and ratio of columns randomly extracted for each decision tree: 1. For the other parameters, the default values in the library were used. In the periods other than No. 1, the parameters were optimized by grid search in the same way. When the state-space model was selected as the machine-learning method, the Python library "statsmodels" was used. Parameters were estimated by using a Monte Carlo filter and a quasi-Newton method. The rack intake temperature prediction model was evaluated by using the following four evaluation indices:

1. Correlation coefficient (R): R expresses the explanatory power of the predicted value of the objective variable.
2. Correct answer rate: The ratio of the number of predicted values within ±0.5 °C of the measured value to the total number of predicted values.
3. Root-mean-square error (RMSE): The accuracy of the three machine-learning methods is evaluated in terms of RMSE, which is a commonly used index for numerical prediction.
4. Maximum peak error: As for predicting server-room temperature, maximum peak error is significant if the actual measured value and the predicted value deviate greatly. The error by which the actually measured value is larger than the predicted value is therefore defined as maximum peak error.
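The four evaluation indices above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' code: R is assumed to be the Pearson correlation coefficient, the ±0.5 °C band is assumed to include its boundaries, and the function names and sample values are hypothetical.

```python
import math


def correlation_coefficient(measured, predicted):
    """Pearson correlation between measured and predicted values (assumed meaning of R)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


def correct_answer_rate(measured, predicted, tolerance=0.5):
    """Share of predictions within +/- tolerance (degrees C) of the measurement."""
    hits = sum(1 for m, p in zip(measured, predicted) if abs(m - p) <= tolerance)
    return hits / len(measured)


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


def max_peak_error(measured, predicted):
    """Largest amount by which a measured value exceeds its prediction."""
    return max(m - p for m, p in zip(measured, predicted))


# Hypothetical rack intake temperatures in degrees C, for illustration only.
measured = [22.0, 22.5, 23.0, 24.0]
predicted = [22.0, 22.4, 23.2, 23.0]
r_value = correlation_coefficient(measured, predicted)
rate = correct_answer_rate(measured, predicted)
rmse_value = rmse(measured, predicted)
peak = max_peak_error(measured, predicted)
```

Because the maximum peak error keeps the sign of the difference, it captures only under-prediction, which is the case that matters for temperature risk.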

Outline of Baseline Model of CRAC
An overview of the baseline model for the CRACs follows.

Construction of a Baseline Model for CRAC
The purpose of the baseline model is to determine the effect of energy-saving measures. The objective variable of the model is therefore the power consumption of the CRACs when the energy-saving measure is not implemented. The baseline model calculates, in real time, the power consumption that would occur without the energy-saving measure; that is, it is not a model that predicts the future from that time point. The explanatory variables used are listed in Table 5. The dataset is divided into a learning period and a verification period, and the model constructed by using the learning-period dataset is evaluated by using the verification-period dataset. In this verification, the data collected up to n days before the verification period was used as the data for the learning period.

Table 5. Explanatory variables used by the baseline model.

No.  Explanatory variable
1    CRAC cooling capacity
2    Outside-air temperature
3    Power consumption of each rack
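Since the baseline represents the CRAC power consumption without the energy-saving measure, the energy-saving effect can be read as the gap between the baseline and the measured consumption. The sketch below shows that arithmetic only; the function name `energy_saving_kwh`, the kW values, and the use of 30-min intervals (taken from the averaging interval of the verification data) are assumptions, not the authors' procedure.

```python
def energy_saving_kwh(baseline_kw, measured_kw, interval_hours=0.5):
    """Energy saved relative to the baseline, in kWh.

    baseline_kw: baseline-model output (power without the measure), one value per interval.
    measured_kw: measured CRAC power with the measure in place, same intervals.
    """
    if len(baseline_kw) != len(measured_kw):
        raise ValueError("baseline and measured series must have the same length")
    return sum((b - m) * interval_hours for b, m in zip(baseline_kw, measured_kw))


# Hypothetical values for four consecutive 30-min intervals.
baseline = [10.0, 10.0, 12.0, 12.0]
measured = [9.0, 9.0, 10.0, 11.0]
saving = energy_saving_kwh(baseline, measured)
```

A negative result would indicate that consumption exceeded the baseline during the period.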

Methods Used by Baseline Model
In recent years, "explainable" AI (XAI), which can explain machine-learning prediction results and processes to humans, has received increasing attention [29,30]. An explainable model allows the user to explain and understand the result calculated by the model. It is essential that the model for determining the effect of energy-saving measures targeted in this study be explainable to scholars with specialized skills and to operators who carry out practical work. In addition, as in the ESCO business, compensation may be paid for energy-saving effects; accordingly, from the standpoint of data-center businesses too, it is desirable to be able to confirm the likelihood of the effects. The baseline model should therefore have high explanatory power. In this study, a total of three methods were therefore selected: (1) linear regression (which is considered to be highly explanatory); (2) a decision tree; and (3) a gradient-boosting decision tree (which is an application of a decision-tree model that is expected to improve accuracy). The state-space model (time-series model) used as a candidate for the rack intake temperature prediction model in this study is updated sequentially with data acquired at the preceding n time points. When the baseline without energy-saving measures is calculated, measured values for that condition are not available, so each prediction result must be used for the next prediction, which decreases prediction accuracy; the state-space model was therefore excluded from the methods used by the baseline model.

Method for Evaluating Baseline Model
The accuracy of the baseline model was evaluated by using evaluation indices 1, 3, 4, and 5. Evaluation index 2 (correct-answer rate) was considered inappropriate because the power consumption of the CRACs changes far more than temperature does. In addition, the NMBE, an index commonly used in energy simulations [31,32], is added as an index for the baseline model:

1. Correlation coefficient (R): R expresses the explanatory power of the predicted value of the objective variable.
3. Root-mean-square error (RMSE): The accuracy of the three machine-learning methods is evaluated in terms of RMSE, which is a commonly used index for numerical prediction.
4. Maximum peak error: As for predicting server-room temperature, maximum peak error is significant if the actual measured value and the predicted value deviate greatly. The error by which the actually measured value is larger than the predicted value is therefore defined as maximum peak error.
5. Normalized Mean Bias Error (NMBE): NMBE is a normalization of the MBE index that is used to scale the results of MBE, making them comparable. This index is used by IPMVP.
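As a worked illustration of index 5, the sketch below computes NMBE as the mean of (measured − predicted) divided by the mean measured value, expressed in percent. The exact definition used in the study is not given here, so this form is an assumption: some guidelines divide by n − p instead of n or use the opposite sign convention. The function name and the kW values are hypothetical.

```python
def nmbe_percent(measured, predicted):
    """Normalized mean bias error in percent.

    Positive when the model under-predicts on average, negative when it over-predicts
    (assumed sign convention).
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    mean_measured = sum(measured) / n
    bias = sum(m - p for m, p in zip(measured, predicted)) / n
    return 100.0 * bias / mean_measured


# Hypothetical CRAC power values in kW.
measured_kw = [10.0, 12.0, 14.0, 12.0]
baseline_kw = [11.0, 12.0, 13.0, 13.0]
nmbe = nmbe_percent(measured_kw, baseline_kw)
```

Unlike RMSE, positive and negative errors cancel in NMBE, so it indicates systematic over- or under-estimation of the baseline rather than scatter.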

Primary Evaluation of Prediction Model and Narrowing Down of Prediction Methods
A model for predicting the rack intake temperature was constructed with each method, with the last week of April 2016 as the verification period and the preceding 21 days as the learning period, and the accuracy of each model was evaluated. The evaluation values obtained with each method are listed in Table 6. In this table, evaluation indexes 1 to 3 are the averages of the evaluation values for all racks, and evaluation index 4 is the maximum value of the peak error for all racks. These results show that GBDT and the state-space model give similar values for evaluation indexes 1 to 4, meaning that the predictions by those methods achieve high accuracy. In contrast, although linear regression gives a correct-answer rate and peak error similar to those given by GBDT and the state-space model, its correlation coefficient R and RMSE are worse, so it can be concluded that the accuracy of linear regression is low compared with the other two methods. In light of that result, GBDT and the state-space model are focused on and investigated in detail hereafter.
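The split into a learning period of n days immediately before the verification period can be sketched as follows. The helper names and the daily records are hypothetical, and "the last week of April 2016" is assumed here to mean 24 to 30 April.

```python
from datetime import date, timedelta


def learning_and_verification_periods(verification_start, verification_days, learning_days):
    """Return (learning, verification) periods as inclusive (first_day, last_day) pairs."""
    verification_end = verification_start + timedelta(days=verification_days - 1)
    learning_end = verification_start - timedelta(days=1)
    learning_start = verification_start - timedelta(days=learning_days)
    return (learning_start, learning_end), (verification_start, verification_end)


def select(records, period):
    """Keep the (day, value) records whose day falls inside the inclusive period."""
    first_day, last_day = period
    return [r for r in records if first_day <= r[0] <= last_day]


# Assumption: the verification week is 24 to 30 April 2016, preceded by 21 learning days.
learning, verification = learning_and_verification_periods(date(2016, 4, 24), 7, 21)

# Hypothetical daily records for April 2016 (placeholder values, not measurements).
records = [(date(2016, 4, 1) + timedelta(days=i), 22.0) for i in range(30)]
learning_records = select(records, learning)
verification_records = select(records, verification)
```

Keeping the two periods disjoint and ordered in time means the model is never evaluated on data it was trained on.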

Secondary Evaluation of Prediction Model and Determination of Prediction Method
The two methods narrowed down in Section 4.1.1 are summarized in Figures 5 and 6 and Table 7. For the secondary evaluation, conducted from May to October 2017, the learning and verification periods were one month each, and the data of the month preceding the verification-target month was used as the learning data (Table 8). Evaluation values that compare the measured rack intake temperature with the temperature predicted by each prediction model are listed in Table 9. Evaluation indexes 1 to 3 are average evaluation values for all racks, and evaluation index 4 is the maximum evaluation value for all racks; these values were obtained by tabulating the prediction results for each month. It is clear from these results that the evaluation values are extremely high and that the prediction achieves high accuracy regardless of which of the two prediction models is used. It is also clear that the evaluation values (R and correct-answer rate) given by the state-space model are higher than those given by GBDT, although only slightly over the entire period.
Energies 2020, 13, 5222
Table 7. Methods selected as candidates for the rack intake temperature prediction model.

Method             Description
GBDT               An ensemble learning method using decision trees; a prediction method used in regression and classification problems [33]
State-space model  Prediction method used for time-series problems [34]

Next, the accuracy of the two methods, GBDT and the state-space model, was further examined on the basis of temporal changes in the measured and predicted temperatures. The measured rack intake temperature for rack A1 in July 2017 and the time-series changes of the values predicted by each prediction model are shown in Figure 7. According to these figures, the predicted temperatures follow the time-series changes in the measured values closely. The measured rack intake temperatures for rack A7 in July 2017 and the time-series changes of the temperatures predicted by each prediction model are shown in Figure 8. These figures show that the error between the measured temperature and the temperature predicted by the model using GBDT increased on 5 July 2017. This increased error is considered to have two causes: (i) the intake temperature of rack A7 at this time point was the lowest of all the racks during the learning and verification periods, and (ii) GBDT cannot handle inputs that fluctuate significantly, so its prediction degrades. In contrast, as shown in Figure 8b and Table 9, in the case of the state-space model, which gives higher evaluation values than GBDT, the result obtained at the previous time point is reflected in the prediction model, and the model is updated sequentially; as a result, its ability to follow extrapolation is high, and it is considered that good prediction results were obtained for this reason.
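The sequential updating attributed to the state-space model above can be illustrated with a deliberately simplified example. The sketch below is a local-level model (random walk plus observation noise) filtered with a scalar Kalman filter; it is not the model built in this study, which used statsmodels with a Monte Carlo filter, and the variances, function name, and temperature series are all assumed for illustration.

```python
def local_level_one_step(observations, process_var=0.01, obs_var=0.04,
                         init_level=None, init_var=1.0):
    """One-step-ahead predictions from a local-level model.

    Each prediction is made before the corresponding observation is seen;
    the state is then corrected with that observation (sequential update).
    """
    level = observations[0] if init_level is None else init_level
    var = init_var
    predictions = []
    for y in observations:
        # Predict: the level carries over and its uncertainty grows.
        var_pred = var + process_var
        predictions.append(level)
        # Update: blend the prediction with the new measurement via the Kalman gain.
        gain = var_pred / (var_pred + obs_var)
        level = level + gain * (y - level)
        var = (1.0 - gain) * var_pred
    return predictions


# Hypothetical intake temperatures with a step change outside the earlier range.
series = [22.0] * 5 + [24.0] * 10
preds = local_level_one_step(series)
```

After the step change, each new measurement pulls the estimated level toward the new temperature, which is the behavior that lets a sequentially updated model follow values outside its earlier range.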

Detailed Evaluation of the Determined Prediction Model
To understand the features of the prediction model, the effects of each explanatory variable and learning period on prediction accuracy are considered hereafter.

1. Effect of explanatory variables on accuracy
When building a predictive model, the choice of explanatory variables is paramount, because it affects not only accuracy but also the time required to build the model. To study the effect of each explanatory variable on the accuracy of the prediction model, the change in RMSE was observed when the explanatory variables were changed in two major ways. RMSE was used because it quantifies how far the predicted values deviate from the actual values during a certain period and thus makes it possible to evaluate the effect of the explanatory variables on the accuracy of the model. In the first way, one explanatory variable was removed from the full set of explanatory variables; in the second, only one explanatory variable was used. For period No. 2 (1 May to 31 October 2017; see Table 2), the effect of the explanatory variables was evaluated in terms of RMSE, and the results are listed in Table 10. When one explanatory variable is removed from the full set, the RMSE worsens significantly only when the CRAC return temperature is removed; when any other explanatory variable is removed, the RMSE does not change much. In the verification using only one explanatory variable, the best RMSE was obtained when only the CRAC return-air temperature was used. These results demonstrate that the return temperature of the CRAC significantly influences prediction accuracy. Furthermore, the time required for building the model did not change significantly when the explanatory variables were changed. This is probably because (i) the explanatory variables used in this study were not large in number and (ii) increasing or decreasing the number of variables had a small effect on building the model.

Effect of learning period on accuracy
When constructing a predictive model, it is paramount to consider the length of the learning period, because it significantly influences not only prediction accuracy (as in the case of explanatory variables) but also the time required to construct the prediction model. Therefore, to consider the influence of the learning period on accuracy, the model was verified by changing the learning period while using the data for August 2017. The effect on prediction accuracy when the learning period was varied was evaluated in terms of RMSE, and the evaluation results are listed in Table 11. As shown by the results in Table 12, RMSE does not significantly change when the learning period is varied. On the other hand, although the time required for learning increased by a maximum of about two times, that increase did not cause any operational problems when only the server room in this study was targeted. These results indicate that, when constructing a prediction model, it is possible to predict temperature with high accuracy if data covering about one week are available. Note that determining the shortest learning period that can achieve high prediction accuracy is left for future study.

Learning Period                          RMSE
7 days before the evaluation period      0.12
14 days before the evaluation period     0.13
21 days before the evaluation period     0.12
31 days before the evaluation period     0.14
61 days before the evaluation period     0.12
91 days before the evaluation period     0.13
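The learning periods in the table can be expressed as date windows that end on the day before the evaluation period starts. The sketch below is illustrative only: it assumes that the evaluation period begins on August 1, 2017 (the text states only that August 2017 data were used), and the helper names are hypothetical.

```python
from datetime import date, timedelta

def learning_window(eval_start, days):
    """Return (first_day, last_day) of the `days` immediately before `eval_start`."""
    last_day = eval_start - timedelta(days=1)
    first_day = eval_start - timedelta(days=days)
    return first_day, last_day

def select_records(records, window):
    """Keep (day, value) records whose day falls inside the window (inclusive)."""
    first_day, last_day = window
    return [(d, v) for d, v in records if first_day <= d <= last_day]

EVAL_START = date(2017, 8, 1)  # assumption: first day of the evaluation period
windows = {days: learning_window(EVAL_START, days) for days in (7, 14, 21, 31, 61, 91)}
```

Each window can then be passed to `select_records` to extract the training data for one row of the table, so that the model is rebuilt once per learning period and its RMSE compared.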

Summary of Evaluation of Accuracy of Temperature Prediction Model
As explained above, it was confirmed that the model for predicting rack intake temperature using the state-space model, which was narrowed down from the models constructed by multiple methods, can predict the temperature with high accuracy. Moreover, the effect of each explanatory variable used in the proposed model was evaluated, and the results demonstrate that the return temperature of the CRAC is important for predicting the rack intake temperature with this state-space-based prediction model. Finally, the effect of the learning period on the accuracy of the proposed model was evaluated, and the results obtained in the verification room show that it is possible to predict temperature with high accuracy using about one week's worth of learning data.

Primary Evaluation of Baseline Model and Narrowing Down of Prediction Methods
For the period from May 1 to December 31, 2017, the learning and verification periods were both set to one month, prediction models were constructed by using each method (linear regression, decision tree, and GBDT), and the prediction accuracy of each model was evaluated (Table 13). Among these values, evaluation indices (1) and (2) are averages of the monthly evaluation values, and evaluation index (4) is the maximum of the monthly values. According to these results, GBDT achieved the best values and linear regression achieved the worst values of evaluation indices (1), (3), and (4). The values of indices (1), (3), and (4) given by the prediction model using the decision-tree method are also worse than those given by GBDT. This finding is considered to be explained by the fact that, in contrast to the decision-tree method, GBDT adds "boosting" for weighting, and the presence or absence of that boosting affects prediction accuracy. Regarding index (5), linear regression is better than the other methods; this index and the corresponding results are discussed in detail in the next section.

Secondary Evaluation of Baseline Model and Determination of Prediction Method
The evaluation values for each month during the period from May 1 to December 31, 2017 are listed in Tables 14-17 and plotted in Figures 9-12. Regarding the correlation coefficients listed in Table 14 and plotted in Figure 9, all three methods give values indicating high correlation, so it can be considered that the prediction has sufficiently high accuracy. Even so, GBDT did not give the worst value of the three methods for any month; in fact, the results show that it often gave the best value. As for RMSE, listed in Table 15 and plotted in Figure 10, linear regression does not give the best value for any month, and its month-to-month fluctuation is the largest of the three methods. In addition, the RMSE values given by the decision-tree method and GBDT are close. Although RMSE varies slightly in a similar manner to the correlation coefficient, the values given by GBDT are the best for several of the months. Regarding peak error, listed in Table 16 and plotted in Figure 11, linear regression gives the best values for four of the months (July, October, November, and December). However, its month-to-month fluctuations of peak error are larger (minimum: 0.48 kW (July); maximum: 4.75 kW (September)) than those of the other two methods. Although the peak errors of the decision-tree method and GBDT are close, GBDT gives the better value for many months. It is concluded from these results that linear regression is unsuitable as the baseline model because the evaluation indices related to prediction accuracy (i.e., RMSE and peak error) fluctuate from month to month. In addition, both the decision-tree method and GBDT give high correlation coefficients, and it is concluded that they can predict power consumption with high accuracy.
However, it is considered that GBDT, which on the whole gives better values than those of the decision tree, has superior robustness and is suitable as the baseline model. Accordingly, the model using GBDT was evaluated as explained hereafter.

Regarding the NMBE listed in Table 17 and plotted in Figure 12, since the recommended range per month proposed by IPMVP is ±20%, it can be seen that the range is achieved by the decision-tree method or GBDT. In addition, linear regression gave the best value of NMBE when evaluated over the entire period. However, when analyzed monthly, the variation in linear regression was very large. Since NMBE does not use absolute values, positive and negative errors can cancel each other, and the evaluation results vary greatly depending on the period. In the following, RMSE, which reflects the magnitude of the errors, is used to evaluate the error as a whole. For these reasons, indices (1), (3), and (4) are used for evaluation hereafter.
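The cancellation of signed errors in NMBE can be illustrated with a small numerical example. The definitions below are assumptions made for illustration: NMBE is taken as the sum of signed errors divided by the number of samples times the mean measured value (a simplified form without a degrees-of-freedom correction), and peak error is taken as the maximum absolute error; the paper's exact definitions may differ.

```python
import math

def rmse(measured, predicted):
    """Root-mean-square error (same unit as the data)."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def nmbe_percent(measured, predicted):
    """Normalized mean bias error in percent (simplified: divides by n, assumption)."""
    n = len(measured)
    mean_measured = sum(measured) / n
    return 100.0 * sum(m - p for m, p in zip(measured, predicted)) / (n * mean_measured)

def peak_error(measured, predicted):
    """Maximum absolute error (assumed definition)."""
    return max(abs(m - p) for m, p in zip(measured, predicted))

# Hypothetical power values in kW: every prediction is off by 2 kW,
# but the signs alternate, so the bias cancels completely.
measured = [10.0, 12.0, 10.0, 12.0]
predicted = [12.0, 10.0, 12.0, 10.0]

bias = nmbe_percent(measured, predicted)   # 0.0 %
error = rmse(measured, predicted)          # 2.0 kW
peak = peak_error(measured, predicted)     # 2.0 kW
```

Here NMBE is zero even though no single prediction is correct, while RMSE and peak error both report the 2 kW deviation. This is one reason why a small NMBE over a long period is not sufficient evidence of accuracy on its own.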

Detailed Evaluation of Baseline Model
To understand the features of the baseline model, the effects of each explanatory variable and learning period on prediction accuracy are considered hereafter.

1. Effect of explanatory variables on prediction accuracy
The choice of explanatory variables is paramount when building a predictive model, because the explanatory variables chosen affect not only prediction accuracy but also the time required to build the model. Furthermore, when using the baseline model to verify the energy-saving effect of measures, it is important that the estimated values, which represent the case in which no energy-saving measure is taken, can be understood by or explained to the user. Accordingly, to investigate the influence of each explanatory variable on the prediction accuracy of the baseline model, the "degree of importance" of the explanatory variables was estimated. Degree of importance expresses the extent to which each explanatory variable contributes to improving prediction accuracy. For the model with November 2017 as the learning period and December 2017 as the evaluation period, the results calculated with an existing Python library (xgb.plot_importance(x), default gain) are shown in Figure 13. This result indicates that cooling capacity has the greatest effect on prediction accuracy. Although the degrees of importance of the other explanatory variables are less than half that of cooling capacity, they cannot be ignored and are considered to be important explanatory variables. Although GBDT is a more complicated method than linear regression or the decision-tree method, it makes it possible to understand part of the logic of the model by showing the importance of each explanatory variable for the target variable, as shown in the figure.
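To give an intuition for what a gain-type importance measures, the following sketch computes, for each explanatory variable, the largest reduction in squared error obtainable from a single split on that variable, and normalizes the results to shares. This is a deliberately simplified stand-in and not the computation performed by the library used in this study, which aggregates split gains over all trees of the boosted ensemble. The data are synthetic and the variable names are hypothetical labels for the three variables shown in Figure 13.

```python
import math

def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split_gain(feature, target):
    """Largest squared-error reduction achievable by one threshold on `feature`."""
    pairs = sorted(zip(feature, target))
    targets = [t for _, t in pairs]
    parent = sse(targets)
    best = 0.0
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates equal feature values
        best = max(best, parent - sse(targets[:i]) - sse(targets[i:]))
    return best

# Synthetic data (assumption): CRAC power depends mostly on cooling capacity.
n = 60
features = {
    "cooling_capacity": [50 + 20 * math.sin(i / 5) for i in range(n)],
    "outside_air_temp": [15 + 5 * math.cos(i / 3) for i in range(n)],
    "ict_total_power": [100 + 10 * math.sin(i / 2 + 1) for i in range(n)],
}
crac_power = [
    0.3 * c + 0.1 * o + 0.02 * p
    for c, o, p in zip(
        features["cooling_capacity"],
        features["outside_air_temp"],
        features["ict_total_power"],
    )
]

gains = {name: best_split_gain(col, crac_power) for name, col in features.items()}
total_gain = sum(gains.values())
shares = {name: g / total_gain for name, g in gains.items()}
```

In this synthetic example the share of `cooling_capacity` is the largest, mirroring the kind of ranking that an importance plot presents to the user.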

2. Effect of learning period on accuracy
With regard to the method narrowed down as described in the previous section, the effect of the length of the learning period on prediction accuracy is considered hereafter. The evaluation period was fixed to one month (November or December 2017), and the learning period was varied from the previous week to the previous six months (Table 18). The evaluation indices for accuracy during the one-month evaluation period (November or December 2017) when the learning period was changed are listed in Tables 19 and 20. It can be concluded from the results in these tables that the relationship between the learning period and the evaluation indices is insignificant when the learning period is the previous three weeks or longer. In contrast, in the case of a very short learning period, such as one week, the results indicate that accuracy decreases.

Figure 13. Importance of explanatory variables used in the baseline model (axis: feature importance; variables: cooling capacity, outside-air temperature, total power of ICT equipment).

No.  Learning period
3    Previous three weeks
4    Previous month
5    Previous two months
6    Previous three months
7    Previous four months
8    Previous five months
9    Previous six months

It was confirmed that the baseline model for the power consumption of a CRAC can calculate power consumption with high accuracy by constructing a model using GBDT (which was narrowed down from three methods). Moreover, the effect of each explanatory variable used in the proposed baseline model was visualized by calculating the degree of importance of each explanatory variable. Finally, the effect of the learning period on the prediction accuracy of the proposed model was evaluated, and the results show that it is possible to calculate the baseline with high accuracy if about three weeks of learning data acquired in the verification room are available.

Verification of Effectiveness of the Proposed Model When Energy-Saving Measures Are Implemented
This section presents the results of a preliminary examination and effectiveness verification of energy-saving measures by using the rack intake temperature prediction model constructed and evaluated in the previous section and the baseline model for the CRACs.

Overview of Effectiveness Verification
For the evaluation period from October to December 2019, the effectiveness of the proposed models was verified by simulating an energy-saving measure that changes the return-air temperature setting of the CRACs. The return temperature of the CRAC was set to 28 °C under normal conditions and was changed to 30 °C during the measure-implementation period (two weeks). Note that, for the period excluding the verification experiment simulating the energy-saving measure, the accuracy of each model was verified, and it was confirmed that the models can predict the temperature and calculate the baseline with high accuracy, as described in Section 4 (Tables 21 and 22). The results of the evaluation of temperature risk when the energy-saving measure was implemented and the results regarding the visualization of the effect of the measure are presented in the following sections.

Evaluation of Temperature Risk by Using Rack Intake Temperature Prediction Model
The change in rack intake temperature when the set value of the CRAC return temperature was changed to 30 °C was predicted in advance. Three explanatory variables were selected: rack intake temperature at the previous time point, CRAC return temperature, and total power consumption of the server room. The rack intake temperature at the previous time point was selected because the state-space model uses the value of the target variable at the previous time point. Return temperature was selected because it is an important explanatory variable that affects prediction accuracy (as discussed in Section 4.1.4). Since the set temperature was 30 °C in this verification, return temperature was also set to 30 °C. As for total power consumption, it is very difficult to predict the power consumption of each rack in a data center. In a previous study, Mehdi et al. created a scenario to estimate the power consumption of multiple ICT devices, including the relationship between temperature and ICT equipment power consumption [35]. In this study, from the viewpoint of evaluating risk, the simulation was performed for the maximum-risk case by adopting the maximum power consumption in the past week. A model with a learning period from November 8 to 29, 2019 was used to predict the temperature for the period from November 21 to December 6, 2019, and the predicted temperature was compared with the measured value (Table 23). The table shows that the calculation results are very close to the average and median measured values. In addition, for all racks, the error between the maximum measured value and the calculation result is within 2.37 °C, which is shown as the peak error in Table 23. This is relevant because the maximum peak error in the preliminary evaluation is considered to be the maximum error held by the model, so (calculation result + peak error) should be regarded as the maximum rack intake temperature that may occur during the evaluation period. Since this maximum possible rack intake temperature calculated in the preliminary study is larger than the maximum value measured during the evaluation period, the approach is considered useful for the preliminary study (Figure 14). Furthermore, because it is possible to predict which racks will become hot, the points to be noted when implementing energy-saving measures can be understood in advance. The effectiveness of using this prediction model for the preliminary examination of temperature risk was thus confirmed.
Figure 14. Relationship between calculation value + peak error and maximum measured value for each rack.
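The preliminary risk check described above, in which (calculation result + peak error) is treated as the highest rack intake temperature that may occur, can be sketched as follows. The peak error of 2.37 °C is the value quoted in the text; the rack names, the calculated temperatures, and the 30.0 °C limit are hypothetical values chosen only for illustration.

```python
PEAK_ERROR_C = 2.37  # peak error of the prediction model quoted in the text (°C)

def worst_case_intake(calculated_c, peak_error_c=PEAK_ERROR_C):
    """Upper estimate of rack intake temperature: calculation result + peak error."""
    return calculated_c + peak_error_c

def racks_over_limit(calculated_by_rack, limit_c, peak_error_c=PEAK_ERROR_C):
    """Return the racks whose worst-case intake temperature exceeds `limit_c`."""
    return sorted(
        rack
        for rack, calculated_c in calculated_by_rack.items()
        if worst_case_intake(calculated_c, peak_error_c) > limit_c
    )

# Hypothetical calculation results per rack (°C) and a hypothetical limit.
calculated = {"rack_A": 24.1, "rack_B": 27.9, "rack_C": 26.0}
LIMIT_C = 30.0

flagged = racks_over_limit(calculated, LIMIT_C)  # ["rack_B"]
```

A rack that is flagged in this way is a candidate for closer monitoring or for a countermeasure before the energy-saving measure is implemented.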

Visualization of Energy-Saving Effect by Using Baseline Model
The energy-saving effect was verified when the return-air temperature setting of the CRAC was changed from the normal condition of 28 °C to 30 °C. The energy-saving effect, calculated from the difference between the measured power consumption and the value predicted by the baseline model when the return-air temperature of the CRAC was increased to 30 °C, is shown in Table 24, and the time-series changes are plotted in Figure 15. It is clear that raising the set temperature by 2 °C for about 14 days has a reduction effect of about 50 kW in terms of power consumption during that period. Figure 15 also shows that, while the two time series follow a similar tendency, the baseline takes slightly higher values more often.
Past data may also be used to verify the energy-saving effect. To confirm the advantage of using the model based on machine learning, the energy-saving effect was also verified by taking the power consumption of the same period in the previous year (when the return-air temperature setting of the CRAC was 28 °C) as the reference for the power consumption in November 2019. Measured values of CRAC power consumption for each year and the time-series changes of the values predicted by the baseline model using 2019 data are shown in Figure 16. As is clear from this result, even with the return-air temperature of the CRAC set at the same value (28 °C) and with equivalent periods (21 November to 30 November), it is difficult to treat the past record of CRAC power consumption as a baseline. This difficulty can be attributed to the fact that the internal load changes over time (as described above), which makes last year's data hard to use.
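The calculation of the energy-saving effect as the difference between the baseline and the measured power consumption can be sketched as follows. The function name, the hourly sampling interval, and the sample values are assumptions made for illustration; they are not data from this study.

```python
def energy_saving_kwh(baseline_kw, measured_kw, interval_hours=1.0):
    """Sum of (baseline - measured) power over time, in kWh.
    Positive values mean that less energy was used than the baseline predicts."""
    if len(baseline_kw) != len(measured_kw):
        raise ValueError("baseline and measured series must have the same length")
    return sum((b - m) * interval_hours for b, m in zip(baseline_kw, measured_kw))

# Hypothetical hourly CRAC power (kW) during a measure-implementation period.
baseline = [10.0, 10.5, 11.0, 10.5]   # predicted by the baseline model
measured = [9.0, 9.5, 10.5, 10.0]     # actually measured

saving = energy_saving_kwh(baseline, measured)  # 3.0 kWh
```

With this sign convention, a measure that increases consumption (for example, lowering the return temperature setting) yields a negative value, so the same calculation quantifies both savings and increases.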

Additional Verification of Baseline When the Setting of CRAC Return Temperature Is Changed
To further support the reliability of the baseline when the setting of the CRAC return temperature is changed, as explained in the previous section, an additional verification was conducted in which the setting of the CRAC return temperature was lowered. The verification was carried out during the following period, and the verification results are shown in Table 25. The table shows that, when the setting of the CRAC return temperature is lowered, the actual power consumption of the CRAC is higher than the baseline, and it also shows how much the energy consumption increases when this change of the CRAC return temperature is implemented. It is therefore considered possible to quantitatively grasp what has happened.

Concluding Remarks
In this study, we proposed two methods, namely prediction of rack intake temperature and calculation of a baseline, to support the promotion of energy-saving measures. Evaluation indices were defined for each problem, suitable machine learning methods were considered for each problem, and the methods were narrowed down by experiment and verification. Using the selected methods, we constructed models that can predict the rack intake temperature and calculate the baseline with high accuracy in a server room. Furthermore, we clarified the effects on accuracy of the explanatory variables and of the amount of learning data, both of which affect data accumulation, and added considerations on data collection and accumulation in the server room. By utilizing the proposed methods and the data in the server room, it is possible to support highly reliable and highly efficient data center operation. However, since this study reports results for one server room, application of this technology to different rooms remains an issue for future research.
Prediction of rack intake temperature
• We defined evaluation indices that we considered to be important for data center operation and verified them with multiple machine learning methods that have a self-learning character. We built a prediction model using a state-space model, which was the most accurate method from the viewpoint of the evaluation indices.
• It was clarified that the return temperature of the CRAC is an important explanatory variable in this model.

Calculation of the CRAC baseline
• We selected a machine learning method with XAI, which we considered important for this problem.
• We verified multiple methods and selected GBDT as the most accurate method from the viewpoint of the evaluation indices. In addition, we quantified the influence of the explanatory variables on the objective variable and showed that the model has explanatory power.