Productive Efficiency of Energy-Aware Data Centers

Abstract: Information technologies must be made aware of the sustainability of cost reduction. Data centers may reach energy consumption levels comparable to many industrial facilities and small-sized towns. Therefore, innovative and transparent energy policies should be applied to improve energy consumption and deliver the best performance. This paper compares, analyzes and evaluates various energy efficiency policies, which shut down underutilized machines, on an extensive set of data-center environments. Data envelopment analysis (DEA) is then conducted for the detection of the best energy efficiency policy and data-center characterization for each case. This analysis evaluates energy consumption and performance indicators for natural DEA and constant returns to scale (CRS). We identify the best energy policies and scheduling strategies for high and low data-center demands and for medium-sized and large data-centers; moreover, this work enables data-center managers to detect inefficiencies and to implement further corrective actions.


Introduction
Data centers, which constitute the computational muscle for cloud computing, can be compared in energy consumption to many industrial facilities and towns. The latest trends show that these infrastructures represent approximately 2% of global energy consumption [1], with a 5% annual growth rate [2].
The data envelopment analysis mathematical model enables management to measure the performance of an organization by providing the relative efficiency of each organizational unit. This relative efficiency measurement can be applied to a set of decision-making units (DMUs) in order to assess their productive efficiency. Productive efficiency, also called technical efficiency, involves a collection of inputs (the resources needed for the production) and outputs (the production achieved). To this end, DEA constructs an "efficiency frontier" against which the relative performance of all units can be contrasted. This method is notably well-suited to examining complex, and even unknown, relations between numerous inputs and outputs, where the decisions made are affected by a level of uncertainty [3]. Moreover, DEA has been used both in private [4,5] and in public contexts [6][7][8][9].
Many initiatives have emerged that seek to reduce the energy consumption and the CO2 footprint of data-centers, especially those of a medium and large size. These facilities are composed of thousands and even tens of thousands of machines.
A substantial part of these initiatives focuses on the improvement of the Power Usage Effectiveness (PUE), which reflects the amount of energy consumed in non-computational tasks, such as power supply, cooling and networking components. These tasks account for more than half of the energy consumption of an Internet data-center (IDC).
Several strategies have been proposed to significantly improve energy efficiency in large-scale clusters [10]: cooling and temperature management [11,12]; power proportionality for CPU and memory hardware components [13,14]; less energy-hungry, non-mechanical hard disks [15]; and new proposals for energy distribution [16].
On the other hand, almost 50% of energy is consumed by computational servers to satisfy the incoming workload. Job arrival is not stable over time, but usually alternates between low and high periods, such as those present in day/night and weekday/weekend workload patterns.
Such scenarios present a huge opportunity for the improvement of energy efficiency through proper scheduling and through the application of low-energy consumption modes to servers, since keeping servers in an idle state is extremely energy-inefficient. Many energy-aware schedulers, which aim to raise server usage, have been proposed in order to free up the maximum number of machines so that they may be put into hibernation [17][18][19]. In addition to these schedulers, several energy-conservation strategies may be applied in virtualized environments, such as the consolidation and migration of virtual machines [20,21].
Other strategies focus on the reduction of energy consumption in specific scenarios, such as those of distributed file systems [22,23].
The most aggressive approach involves the shut-down of underutilized servers in order to minimize energy consumption. Several shut-down policies have been proposed for grid computing environments in [24]. This strategy is yet to be widely implemented in working data-centers, since data-center operators are usually reluctant to risk worsening QoS [25].
The innovation of the research presented in this paper involves the utilization of data envelopment analysis (DEA) as a mathematical technique to compare the efficiency regarding the consumption of energy and the performance of various workload scenarios, scheduling models and energy efficiency policies. This efficiency analysis enables data-center operators to make appropriate decisions about the number of machines, the scheduling solution and the shut-down strategy that must be applied so that data-centers run optimally. The final goal is the maximization of the productive efficiency, which is computed as the amount of energy consumed to serve a workload with a determined performance.
The major contributions of this paper can be summarized as follows:
1. Extensive empirical experimentation and analysis of various cloud-computing scenarios with a trustworthy and detailed simulation tool.
2. Impact analysis, by means of data envelopment analysis, of several energy efficiency policies that shut down idle machines, in terms of energy consumption and performance.
3. DEA-conducted analysis of the performance impact and energy consumption of a set of scheduling models for large-scale data-centers.
4. Empirical determination and proposal of corrective actions to achieve optimal efficiency.
The work is organized as follows. In Section 2, we introduce the current literature on the utilization of DEA in various areas, as well as the DEA model employed in this work. In Section 3, we briefly explain the set of energy efficiency policies that shut down idle servers. The scheduling models considered are explained in Section 4. In Section 5, the tool used for the simulation, the experimental environment, the energy model and the DEA inputs/outputs are presented. Natural constant returns to scale (CRS) DEA results are described and analyzed in Section 6. Finally, we summarize this paper and present conclusions in Section 7.

Data Envelopment Analysis Model
Data envelopment analysis (DEA) is a method that analyzes the connections between the outputs and inputs required in a production process in order to establish the efficiency frontiers [26]. This non-parametric technique was first described for the determination of the efficiency of DMUs by [27] and was formally defined by [28]. DEA has been proposed to measure the efficiency in various areas of operations research and management science [29][30][31][32]. Moreover, it has been applied to measure the environmental performance by other authors [33][34][35][36][37][38][39][40], who describe the gains of this method in the field of environmental management, which is a matter of undoubted relevance for the valuation of the sustainable development ability and pathway [41]. A critical feature of DEA for environmental analysis is the inclusion of desirable and undesirable outputs along with its own production variables, which cannot be isolated in an environmental analysis model of these features [42]. In this way, ref. [36] have refined a non-radial and radial model of DEA for environmental measurements. This approach separates the outputs into desirable and undesirable and presents two concepts: natural and managerial disposability. In this work, we employ the DEA radial approach for environmental assessments proposed by [37]. It should be borne in mind that a main feature of this approach is the utilization of DEA-RAM (range-adjusted measure), first proposed by [43], to treat the analysis of managerial and natural disposability in a unified manner.

Natural Disposability
Natural disposability refers to a DMU that improves its efficiency by decreasing its inputs in order to decrease its undesirable outputs, as well as to increase the desirable outputs.
In Model (1), each $j$-th DMU, $j = 1, \ldots, n$, considers inputs $X_j = (x_{1j}, \ldots, x_{mj})^T$ for the production of desirable outputs $G_j = (g_{1j}, \ldots, g_{sj})^T$ and undesirable outputs $B_j = (b_{1j}, \ldots, b_{hj})^T$. Furthermore, $d^x_i$, $i = 1, \ldots, m$, $d^g_r$, $r = 1, \ldots, s$, and $d^b_f$, $f = 1, \ldots, h$, are slack variables related to the inputs, desirable outputs and undesirable outputs, respectively. $\lambda = (\lambda_1, \ldots, \lambda_n)^T$ are structural or intensity variables, which are unknown and are used for the connection of the input and output vectors by means of a convex combination. $R$ is the range determined by the lower and upper limits of the inputs, desirable outputs and undesirable outputs, denoted by:

$$
R^x_i = \frac{1}{(m+s+h)\left(\max_j x_{ij} - \min_j x_{ij}\right)}, \quad
R^g_r = \frac{1}{(m+s+h)\left(\max_j g_{rj} - \min_j g_{rj}\right)}, \quad
R^b_f = \frac{1}{(m+s+h)\left(\max_j b_{fj} - \min_j b_{fj}\right)}.
$$

The natural efficiency of the $k$-th policy is computed by the following CRS and VRS radial model (see [37] for a better understanding):

$$
\begin{aligned}
\max \quad & \xi + \varepsilon \left( \sum_{i=1}^{m} R^x_i d^x_i + \sum_{r=1}^{s} R^g_r d^g_r + \sum_{f=1}^{h} R^b_f d^b_f \right) \\
\text{s.t.} \quad & \sum_{j=1}^{n} x_{ij} \lambda_j + d^x_i = x_{ik}, \quad i = 1, \ldots, m, \\
& \sum_{j=1}^{n} g_{rj} \lambda_j - d^g_r - \xi g_{rk} = g_{rk}, \quad r = 1, \ldots, s, \\
& \sum_{j=1}^{n} b_{fj} \lambda_j + d^b_f + \xi b_{fk} = b_{fk}, \quad f = 1, \ldots, h, \\
& \lambda_j \geq 0, \; \xi \text{ unrestricted}, \; d^x_i \geq 0, \; d^g_r \geq 0, \; d^b_f \geq 0,
\end{aligned}
\tag{1}
$$

where the unrestricted parameter $\xi$ denotes an unknown inefficiency rate expressing the gap between the efficiency frontier and an empirical group of undesirable and desirable outputs. The parameter $\varepsilon$ takes the value of 0.0001 in this work to minimize the influence of slack variables. If the restriction $\sum_{j=1}^{n} \lambda_j = 1$ is added to Model (1), then the obtained model is a VRS model (Model (1*)). The first restriction in equation systems ((1), (1*)) explores the values of $\lambda_j$ to create a composite unit, considering inputs such that $\sum_{j=1}^{n} x_{ij} \lambda_j = x_{ik} - d^x_i$, $i = 1, \ldots, m$. The values of the inputs can be decreased when positive slack variables $d^x_i$ are present. Such slack changes the observed input levels, which implies that the system presents some inefficiencies.
In the same way, the second restriction, $\sum_{j=1}^{n} g_{rj} \lambda_j = d^g_r + \xi g_{rk} + g_{rk}$, $r = 1, \ldots, s$, indicates that the desirable outputs can be maintained or increased by means of an increase of the slack variable $d^g_r$ and a radial expansion $\xi g_{rk}$.
The third restriction, $\sum_{j=1}^{n} b_{fj} \lambda_j = b_{fk} - d^b_f - \xi b_{fk}$, $f = 1, \ldots, h$, shows that, as the inputs decrease, the undesirable outputs can be reduced both through their slack variables and radially.
The objective function considers that two origins of inefficiency may be established: the radial inefficiency rate and the slack variables. A $k$-policy can be considered efficient when the following two conditions are met: (a) $\xi = 0$; (b) all slack variables are zero, that is, $d^x_i = d^g_r = d^b_f = 0$ for all $i$, $r$ and $f$. In this case, the $k$-policy belongs to the efficiency frontier, since it fulfills the constraints present in equation systems ((1), (1*)), and consequently, the objective function takes a value of zero. Otherwise, the value of the objective function for non-efficient policies is greater than zero, due to possible displacements in the slack variables and radial movements.
The natural efficiency is then computed by:

$$
\theta^* = 1 - \left[ \xi^* + \varepsilon \left( \sum_{i=1}^{m} R^x_i d^{x*}_i + \sum_{r=1}^{s} R^g_r d^{g*}_r + \sum_{f=1}^{h} R^b_f d^{b*}_f \right) \right].
$$

The value of this unified efficiency measure ranges between zero and one. If the $k$-policy is efficient, then the objective function of equation systems ((1), (1*)) is zero, and hence, the efficiency score equals $\theta^* = 1$. The optimal values of the slack variables in the models represented in equation systems ((1), (1*)) show the level of inefficiency.
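To make the scoring step concrete, the following minimal Python sketch computes range weights and the unified efficiency score from an already-solved model. It assumes the range-adjusted form used in [37], that is, a weight of 1 / ((m + s + h) · (max − min)) per factor and a score of 1 minus the optimal objective value; the function names and sample numbers are hypothetical and do not come from the paper's dataset.

```python
from typing import List, Sequence


def range_weights(columns: Sequence[Sequence[float]], n_factors: int) -> List[float]:
    """Return one range weight per factor column.

    Assumed form: R = 1 / (n_factors * (max - min)), where n_factors = m + s + h
    and each column holds the values of one input or output over all DMUs.
    """
    weights = []
    for values in columns:
        spread = max(values) - min(values)
        if spread == 0:
            raise ValueError("a factor with identical values has no defined range")
        weights.append(1.0 / (n_factors * spread))
    return weights


def unified_efficiency(xi: float, slacks: Sequence[float],
                       weights: Sequence[float], eps: float = 1e-4) -> float:
    """Assumed score: theta = 1 - (xi + eps * sum(R * d)) at the optimum."""
    weighted_slack = sum(r * d for r, d in zip(weights, slacks))
    return 1.0 - (xi + eps * weighted_slack)


# Hypothetical example: one input column (servers) in a model with 5 factors.
w = range_weights([[1000, 5000, 10000]], n_factors=5)
score = unified_efficiency(xi=0.25, slacks=[0.0], weights=w)
```

With ε = 0.0001 the slack term contributes very little, so the score is driven almost entirely by the radial rate ξ.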

Managerial Disposability
The managerial efficiency of the $k$-th policy is evaluated by the corresponding CRS and VRS radial model of [37], Model (2). Similarly, if the restriction $\sum_{j=1}^{n} \lambda_j = 1$ is added to Model (2), then the obtained model is a VRS model (Model (2*)). In Model (2), increasing the inputs is allowed, since new technologies that emit less CO2 into the atmosphere can be used.
By using the VRS models, we can obtain the returns to scale (RTS) and damages to scale (DTS) (see [37] for a better understanding). For natural efficiency, the returns to scale should be increasing, and for managerial efficiency, the damages to scale should be decreasing. Otherwise, the units are not working well and should correct their imbalances, using the information of the efficient units that they should resemble (their peers).
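For reference, Model (2) can be sketched as follows. This is a reconstruction that assumes the radial managerial-disposability formulation of [37] and reuses the notation of the natural model: $R$ are the range weights, $d$ the slack variables, $\xi$ the unrestricted inefficiency rate, and $\varepsilon$ is assumed to be the small constant that weights the slack terms. Under these assumptions, the only structural change is the sign of the input slack, which is what allows inputs to increase.

```latex
\begin{aligned}
\max \quad & \xi + \varepsilon \left( \sum_{i=1}^{m} R^x_i d^x_i
            + \sum_{r=1}^{s} R^g_r d^g_r + \sum_{f=1}^{h} R^b_f d^b_f \right) \\
\text{s.t.} \quad
 & \sum_{j=1}^{n} x_{ij} \lambda_j - d^x_i = x_{ik}, \quad i = 1, \ldots, m, \\
 & \sum_{j=1}^{n} g_{rj} \lambda_j - d^g_r - \xi g_{rk} = g_{rk}, \quad r = 1, \ldots, s, \\
 & \sum_{j=1}^{n} b_{fj} \lambda_j + d^b_f + \xi b_{fk} = b_{fk}, \quad f = 1, \ldots, h, \\
 & \lambda_j \geq 0, \; \xi \text{ unrestricted}, \;
   d^x_i \geq 0, \; d^g_r \geq 0, \; d^b_f \geq 0.
\end{aligned}
\tag{2}
```

Adding $\sum_{j=1}^{n} \lambda_j = 1$ to this system would give the VRS variant, Model (2*).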

Energy Policies for Data Centers at a Glance
The following set of energy efficiency policies for shutting down underutilized machines has been developed in this work as an evolution of those presented in [24], which have been adapted to the more complex reality of the cloud-computing paradigm:

• Never: prevents any shut-down process.
• Always: shuts down every server running in an idle state.
• Load: shuts down machines when data-center load pressure fails to reach a given threshold.
• Margin: assures that a determined number of machines are turned on and available before shutting down any machine.
• Random: shuts down machines randomly by means of a Bernoulli distribution with parameter 0.5.
• Exponential: shuts down machines when the probability of one incoming task negatively impacting on the data-center performance is lower than a given threshold. This probability is computed by means of the exponential distribution.
• Gamma: shuts down machines when the probability of incoming tasks oversubscribing to the available resources in a particular time period is lower than a given threshold; this probability is computed by means of the Gamma distribution.
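The decision rules above can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the `ClusterView` fields, the thresholds, the reserve size and the time horizon are hypothetical, and the exponential rule is simplified to the probability that at least one task arrives within a horizon when inter-arrival times are exponential. The Gamma policy is omitted because its parameters are not given here.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ClusterView:
    """Hypothetical snapshot inspected before powering off one idle machine."""
    load: float          # fraction of resources in use, between 0 and 1
    idle_on: int         # machines powered on and idle, candidate included
    arrival_rate: float  # assumed mean task arrivals per second


def never(view: ClusterView) -> bool:
    return False


def always(view: ClusterView) -> bool:
    return True


def load_policy(view: ClusterView, threshold: float = 0.5) -> bool:
    # Shut down only while load pressure stays below the threshold.
    return view.load < threshold


def margin(view: ClusterView, reserve: int = 10) -> bool:
    # Keep at least `reserve` idle machines available after the shut-down.
    return view.idle_on - 1 >= reserve


def random_policy(view: ClusterView, rng: random.Random, p: float = 0.5) -> bool:
    # Bernoulli trial with parameter p.
    return rng.random() < p


def exponential(view: ClusterView, horizon_s: float = 60.0,
                threshold: float = 0.05) -> bool:
    # Simplifying assumption: P(at least one arrival within the horizon).
    p_arrival = 1.0 - math.exp(-view.arrival_rate * horizon_s)
    return p_arrival < threshold


quiet = ClusterView(load=0.3, idle_on=12, arrival_rate=0.0001)
busy = ClusterView(load=0.8, idle_on=10, arrival_rate=1.0)
decisions = {name: fn(quiet) for name, fn in
             [("never", never), ("always", always), ("load", load_policy),
              ("margin", margin), ("exponential", exponential)]}
```

Each policy returns `True` when the candidate machine may be shut down, so a simulator could call the selected policy whenever a machine becomes idle.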

Scheduling Models for Data Centers at a Glance
Cluster schedulers constitute a core part of cloud computing systems, since they are responsible for optimal task assignation to computing nodes. Several degrees of parallelism have been added to overcome the limitations present in central monolithic scheduling approaches when complex and heterogeneous systems with a high number of incoming jobs are considered. The following scheduling models are studied in this work:
• Monolithic: A single centralized scheduler is responsible for scheduling all tasks in the workload in this model [44]. This scheduling approach may be the perfect choice when real-time responses are not required [45,46], since the omniscient algorithm performs high-quality task assignations by considering all restrictions and features of the data-center [47][48][49][50] at the cost of longer latency [46]. The scheduling process of a monolithic scheduler, such as that given by Google Borg [51], is illustrated in Figure 1.
• Two-level: This model achieves a higher level of parallelism by splitting the resource allocation and the task placement: a central manager blocks the whole cluster every time a scheduler makes a decision, in order to offer computing resources to the schedulers; and a set of parallel application-level schedulers performs the scheduling logic against the resources offered. This strategy may result in sub-optimal scheduling logic for each application, since the state of the data-center is shared neither with the central manager nor with the application schedulers. The workflow of the two-level schedulers [53,54] is represented in Figure 2.
• Shared-state: In shared-state schedulers, such as Omega [55], the state of the data-center is available to all the schedulers. The central manager coordinates all the simultaneous parallel schedulers, which perform the scheduling logic against an out-of-date copy of the state of the data-center. The scheduling decisions are then committed to the central manager, which strives to apply these decisions. The utilization of stale views of the cluster by the schedulers can result in conflicts, since the chosen resources may no longer be available. In such a scenario, the local view of the state of the data-center stored in the scheduler is refreshed before the scheduling process is repeated. The workflow of the shared-state scheduling model is represented in Figure 3.
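The conflict-and-retry behavior of the shared-state model can be sketched in a few lines of Python. The classes below are hypothetical illustrations, not the implementation of Omega or of the simulator: schedulers decide against a local copy of the cluster state, commit to the central manager, and refresh their copy when the commit is rejected.

```python
import copy


class CentralState:
    """Hypothetical authoritative view: free CPU cores per machine."""

    def __init__(self, free_cores):
        self.free_cores = dict(free_cores)

    def snapshot(self):
        return copy.deepcopy(self.free_cores)

    def commit(self, machine, cores):
        # Apply the claim only if the resources are still available.
        if self.free_cores.get(machine, 0) >= cores:
            self.free_cores[machine] -= cores
            return True
        return False


class Scheduler:
    def __init__(self, state):
        self.state = state
        self.view = state.snapshot()  # local copy, may become stale

    def place(self, cores, max_attempts=3):
        for _ in range(max_attempts):
            # Scheduling logic only looks at the local view.
            machine = next((m for m, free in sorted(self.view.items())
                            if free >= cores), None)
            if machine is not None and self.state.commit(machine, cores):
                self.view[machine] -= cores
                return machine
            # Conflict or no candidate in the stale view: refresh and retry.
            self.view = self.state.snapshot()
        return None


state = CentralState({"m1": 4, "m2": 4})
a, b = Scheduler(state), Scheduler(state)
first = a.place(4)    # succeeds against a fresh view
second = b.place(4)   # conflicts on m1, refreshes, then takes m2
third = a.place(1)    # no capacity left after refreshing
```

The second placement shows the cost of optimistic concurrency: the work done against the stale view is discarded and repeated after the refresh.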

Methodology
In the following subsections, the experimental environment designed for the natural CRS DEA analysis is presented. The workflow followed in this work is shown in Figure 4.

Simulation Tool
The SCORE simulator [52] is employed in this work, since simulation is the best alternative in scenarios where the implementation of the proposed strategies on real large-scale data-centers remains unfeasible. This simulator provides us with the tools for the development and application of the energy policies described in Section 3 and the scheduling models presented in Section 4 on realistic large-scale cloud computing systems.

Environment and DMU Definition
Following the trends presented in [56,57], two utilization environments have been simulated in this paper for seven days of operation:
• the low-utilization scenario, which represents highly over-provisioned infrastructures and achieves an average utilization of approximately 30%.
• the high-utilization scenario, which represents facilities of a more efficient nature that use approximately 65% of available resources on average.
These scenarios are applied to three data-center sizes: (a) Small: composed of 1000 computing servers; (b) Medium: composed of 5000 computing servers; and (c) Large: composed of 10,000 computing servers. Each server is equipped with four CPU cores and 8 GB of RAM.
Decision-making units (DMUs) are defined by the following elements: (a) an energy efficiency policy; (b) a scheduling model; and (c) a workload scenario.

Energy Model
The following states are presented for each resource in the energy model applied in this work: (a) Idle: when the machine is not executing tasks; and (b) Busy: otherwise.
Let $t^{i}_{idle}$ represent the time the $i$-th resource is idle, and let $t^{i}_{busy}$ denote the time during which the machine is computing tasks. In the same way, $P^{i}_{idle}$ and $P^{i}_{busy}$ represent the power required for the machines to run in these states, respectively.
The time a machine $M_i$ spends on executing a job is defined in terms of $Tasks_{ij}$, which represents the tasks of the $j$-th job assigned to $M_i$, and $C_t$, which denotes the completion time of the $t$-th task of the $j$-th job.
In the same way, the total time a machine is executing tasks, $t^{i}_{busy}$, and the total time it is in an idle state, $t^{i}_{idle}$, are defined with respect to $t^{i}_{total}$, which represents the total operation time. The energy consumption can therefore be expressed in terms of these times and the power required in each state. The considered power states, transitions and values for the energy model are shown in Figure 5.
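As an illustration of how these quantities combine, the sketch below computes the energy of a set of machines. It assumes that idle time is the total operation time minus the busy time and that the energy is the sum over machines of idle power times idle time plus busy power times busy time. The class name and the power values are hypothetical (the actual values belong to Figure 5), and powered-off time is not modeled.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MachineUsage:
    busy_s: float    # t_busy: seconds spent executing tasks
    total_s: float   # t_total: seconds of operation
    p_idle_w: float  # P_idle in watts (hypothetical value)
    p_busy_w: float  # P_busy in watts (hypothetical value)

    @property
    def idle_s(self) -> float:
        # Assumption: every second not spent busy is spent idle.
        return self.total_s - self.busy_s


def energy_joules(machines: Iterable[MachineUsage]) -> float:
    """Sum of P_idle * t_idle + P_busy * t_busy over all machines."""
    return sum(m.p_idle_w * m.idle_s + m.p_busy_w * m.busy_s for m in machines)


fleet = [
    MachineUsage(busy_s=3600.0, total_s=7200.0, p_idle_w=100.0, p_busy_w=200.0),
    MachineUsage(busy_s=0.0, total_s=7200.0, p_idle_w=100.0, p_busy_w=200.0),
]
total = energy_joules(fleet)
```

The second machine never runs a task yet still consumes energy, which is the waste that the shut-down policies target.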

DEA Inputs and Outputs
The inputs and outputs considered in the DEA analysis and representative experimentation values are shown in Tables 1 and 2, respectively. One hundred and eight DMUs were analyzed, which were the result of the combination of all energy policies, scheduling models, data-center sizes and workload types described in Sections 3, 4 and 5.2, respectively. However, for clarity, a subset of the eighteen most interesting DMUs is shown in this paper. Each environment presents the following inputs and outputs:

• Inputs: Two inputs are considered in this work: (a) the number of machines in the data-center (D.C.), as shown in Section 5.2; and (b) the number of shut-down operations performed. These inputs may be reduced or kept equal.
• Outputs: One desirable output and two undesirable outputs are considered in this paper: (a) the time used to perform the operations of the tasks. The longer the time, the less idle the data-center. This good output can be maximized or kept equal; (b) the energy consumption of the data-center. The lower the energy consumption, the more efficient the data-center. This bad output may be reduced or kept equal; and (c) the average time jobs spend in a queue until they are scheduled. The shorter the time, the more performant the system is. This bad output may be reduced or kept equal.

Natural CRS DEA Results
The whole dataset included as an Appendix is analyzed by means of natural CRS and VRS DEA.However, only the most relevant natural CRS DEA results for the most representative DMUs, which are presented in Table 2, are described in this section.
An efficiency analysis depending on the data-center size and on the energy policy is shown in Tables 3 and 4. The following conclusions can be drawn:

• The best efficiency levels are achieved for small data-centers. The data-center size input is predominant in this group of DMUs, since no major differences between energy policies, scheduling frameworks and workload scenarios are present (σ = 0.01, x̄ = 0.99).
• Mid-size data-centers should use the margin energy policy and monolithic or Omega schedulers and should avoid all other energy policies and the Mesos scheduler. Moreover, high workload scenarios are also more efficient than low workload scenarios. In addition, the following DMUs achieve a good level of efficiency, but they do not belong to the efficiency frontier: (a) the DMU combining the Gamma energy policy and the monolithic or Omega schedulers; (b) the DMU combining the exponential energy policy and the Omega scheduler.
• No DMU is efficient in large-scale data-centers. However, the following DMUs present good levels of efficiency: (a) the DMUs combining the Gamma, exponential or margin energy policy with the high workload scenario and the monolithic scheduler; and (b) the DMUs combining the Gamma or margin energy policy with the high workload scenario and the Omega scheduler.
• In high-loaded scenarios, the monolithic scheduler presents the lowest deviation regardless of the data-center size (σ = 0.32).
We can determine that it is always inefficient to operate in a low-utilization scenario in medium-sized and large data-centers. Moreover, both the margin and the probabilistic energy policies (Gamma and exponential) perform more efficiently than the rest of the energy policies, as shown in Figure 6. The monolithic scheduler seems to achieve good results even for large-scale data-centers, while the two-level scheduling approach has a negative impact on data-center performance. However, the trends show that the performance of the monolithic scheduling approach degrades with larger data-centers and higher workload pressure, and hence, lower efficiency levels are to be expected if larger sizes and higher utilization scenarios are considered.
The actions proposed for the improvement of the efficiency of the most relevant DMUs are shown in Table 5. DMU #104 presents a natural efficiency of 0.1697, which means that it is far from being efficient. The following corrective actions are suggested for it to belong to the efficiency frontier, as shown in Table 6:

• The time the data-center spends on task computation must be increased by 38.28 h (+83%).
• The average time jobs wait in a queue must be reduced by 3.23 s (−83%).
• The number of servers must be reduced by 9190 (−92%).
In addition to these corrective actions, the peers this DMU should emulate are #13, #34 and #18. This means that the workload must be increased, and better energy efficiency policies, such as margin and always, must be used. The full dataset containing all the DMUs and DEA analysis and corrections can be found as Supplementary Material in the Appendix. Some of the proposed changes involve the switching of the scheduling framework, which is hardly achievable with the current resource manager systems. To implement these corrections, a resource managing system able to dynamically change the scheduling framework during runtime would be necessary. Such a system is an interesting improvement to the current state of the art that the DEA analysis leads us to develop.
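The ±83% corrections reported for DMU #104 are consistent with a radial projection, which the following sketch illustrates. It assumes that the efficiency score equals one minus the radial rate ξ once the slack term, weighted by 0.0001, is neglected, and that the radial targets are (1 + ξ) times a desirable output and (1 − ξ) times an undesirable output. The function name is hypothetical and the baseline of 100 is a normalized placeholder, not a value from the dataset.

```python
def radial_targets(efficiency: float, desirable: float, undesirable: float):
    """Radial part of the projection onto the frontier.

    Assumption: xi ~= 1 - efficiency, i.e. the slack contribution to the
    score is negligible.
    """
    xi = 1.0 - efficiency
    return (1.0 + xi) * desirable, (1.0 - xi) * undesirable


# Normalized baseline of 100 for both outputs (placeholder values).
good_target, bad_target = radial_targets(0.1697, 100.0, 100.0)
radial_pct = round((1.0 - 0.1697) * 100)  # radial change in percent
```

Under these assumptions, the desirable output grows by about 83% and the undesirable output shrinks by about 83%, matching the first two corrective actions. The 92% reduction in servers is not radial and would come from the input slack instead.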

Conclusions and Policy Implications
In this work, we have confirmed the hypothesis that DEA constitutes a powerful tool for the analysis of technical efficiency in cloud-computing scenarios where large-scale data-centers provide the computational core.
Data envelopment analysis provides cloud-computing operators with the means for the identification of which data-center configuration better suits their requirements, both in terms of performance and energy efficiency.
This methodology allows us to analyze several energy efficiency policies that shut down idle servers, so that their behavior and differences can be compared in various data-center environments. It has been proven that policies based on a security margin and those that use statistical tools to predict the future workload, such as exponential and Gamma, deliver better results than policies based on data-center workload pressure and random strategies.
In addition, it has been empirically shown that even under medium and high workload pressure, in data-centers composed of up to 10,000 machines, monolithic schedulers perform better than other scheduling models, such as the two-level and shared-state approaches. Finally, cloud-computing infrastructure managers are provided with empirical knowledge of which data-centers are not being used optimally, and hence, they can make decisions regarding the shut-down of machines in order to achieve higher utilization levels of the cloud-computing system as a whole.
As future work related to the limitations of the presented work, we may include:

• The addition of different kinds of workload patterns, as well as real workload traces.
• The analysis of other scheduling models, such as distributed and hybrid models.
• The development of a new-generation resource-managing system that could dynamically apply the optimal scheduling framework depending on the environment and workload.
• The analysis of simulation data with other DEA approaches, such as Bayesian and probabilistic models, which could minimize the impact of the noise in current DEA models.

Figure 4. Methodology workflow employed in this work. DEA, data envelopment analysis.

Table 1. DEA inputs and outputs. The arrows in the Action column indicate whether the input/output value may be decreased (down arrow), increased (up arrow) or kept equal.

Table 2. Sample from the dataset for DEA analysis. The full dataset showing the results for the 108 DMUs analyzed can be found as the Supplementary Material. Energy policies, scheduling models, data-center sizes and workload types can be found in Sections 3, 4 and 5.2, respectively. D.C., data-center.
Summary of DEA natural constant returns to scale (CRS) efficiency results for energy efficiency policies.

Table 3. Efficiency analysis for data-center sizes.

Table 4. Efficiency analysis of energy policies.
DMU #104 is selected to illustrate how corrective actions are proposed by DEA in order to achieve efficiency. This DMU is defined by the combination of the random energy efficiency policy, the Omega scheduling model and a low utilization workload scenario.

Table 5. Resulting proposed corrections following DEA analysis. Peer projections for a DMU indicate which DMU it should emulate. The following actions may be taken for each input and output: ↑ when the parameter must be increased; ↓ if the parameter must be reduced; and £ if no further actions are needed to achieve efficiency.