1. Introduction
Data centers, which constitute the computational muscle for cloud computing, can be compared in energy consumption to many industrial facilities and towns. The latest trends show that these infrastructures represent approximately 2% of global energy consumption [1], with a 5% annual growth rate [2].
The data envelopment analysis (DEA) mathematical model enables managers to measure the performance of an organization by providing the relative efficiency of each organizational unit. This relative efficiency measurement can be applied to any set of decision-making units (DMUs) in order to assess their productive efficiency. Productive efficiency, also called technical efficiency, relates a collection of inputs (the resources needed for production) to a collection of outputs (the production achieved). To this end, DEA constructs an “efficiency frontier” against which the relative performance of all units can be contrasted. This method is notably well-suited to the examination of complex, even unknown, relations between numerous inputs and outputs, where the decisions made are affected by a level of uncertainty [3]. Moreover, DEA has been used both in private [4,5] and in public contexts [6,7,8,9].
Many initiatives have emerged that seek to reduce the energy consumption and the CO2 footprint of data-centers, especially those of medium and large size. These facilities are composed of thousands and even tens of thousands of machines.
A substantial part of these initiatives focuses on the improvement of the Power Usage Effectiveness (PUE), that is, the ratio of the total energy consumed by the facility to the energy consumed by its computing equipment. These initiatives target the energy spent on non-computational tasks, such as power supply, cooling and networking components, which accounts for more than half of the energy consumption of an Internet data-center (IDC).
Several strategies have been proposed to significantly improve energy efficiency in large-scale clusters [10]: cooling and temperature management [11,12]; power proportionality for CPU and memory hardware components [13,14]; less energy-hungry, non-mechanical hard disks [15]; and new proposals for energy distribution [16].
On the other hand, almost 50% of energy is consumed by computational servers to satisfy the incoming workload. Job arrival is not stable over time, but usually alternates between low and high periods, such as those present in day/night and weekday/weekend workload patterns.
Such scenarios present a huge opportunity for the improvement of energy efficiency through proper scheduling and through the application of low-energy consumption modes to servers, since keeping servers in an idle state is extremely energy-inefficient. Many energy-aware schedulers, which aim to raise server usage, have been proposed in order to free up the maximum number of machines so that they may be put into hibernation [17,18,19]. In addition to these schedulers, several energy-conservation strategies may be applied in virtualized environments, such as the consolidation and migration of virtual machines [20,21].
Other strategies focus on the reduction of energy consumption in specific scenarios, such as those of distributed file systems [22,23].
The most aggressive approach involves the shut-down of underutilized servers in order to minimize energy consumption. Several shut-down policies have been proposed for grid computing environments in [24]. This strategy is yet to be widely implemented in working data-centers, since data-center operators are naturally reluctant to risk worsening QoS [25].
The innovation of the research presented in this paper involves the utilization of data envelopment analysis (DEA) as a mathematical technique to compare the efficiency regarding the consumption of energy and the performance of various workload scenarios, scheduling models and energy efficiency policies. This efficiency analysis enables data-center operators to make appropriate decisions about the number of machines, the scheduling solution and the shut-down strategy that must be applied so that data-centers run optimally. The final goal is the maximization of the productive efficiency, which is computed as the amount of energy consumed to serve a workload with a determined performance.
The major contributions of this paper can be summarized as follows:
Extensive empirical experimentation and analysis of various cloud-computing scenarios with a trustworthy and detailed simulation tool.
DEA-based impact analysis, in terms of energy consumption and performance, of several energy efficiency policies that shut down idle machines.
DEA-conducted analysis of the performance impact and energy consumption of a set of scheduling models for large-scale data-centers.
Empirical determination and proposal of corrective actions to achieve optimal efficiency.
The work is organized as follows. In Section 2, we introduce the current literature on the utilization of DEA in various areas, as well as the DEA model employed in this work. In Section 3, we briefly explain the set of energy efficiency policies that shut down idle servers. The scheduling models considered are explained in Section 4. In Section 5, the tool used for the simulation, the experimental environment, the energy model and DEA inputs/outputs are presented. Natural constant returns to scale (CRS) DEA results are described and analyzed in Section 6. Finally, we summarize this paper and present conclusions in Section 7.
2. Data Envelopment Analysis Model
Data envelopment analysis (DEA) is a method that analyzes the connections between the inputs and outputs of a production process in order to establish the efficiency frontiers [26]. This non-parametric technique was first described for the determination of the efficiency of DMUs by [27] and was formally defined by [28]. DEA has been proposed to measure efficiency in various areas of operations research and management science [29,30,31,32]. Moreover, it has been applied to measure environmental performance by other authors [33,34,35,36,37,38,39,40], who describe the gains of this method in the field of environmental management, which is a matter of undoubted relevance for the valuation of the sustainable development ability and pathway [41]. A critical feature of DEA for environmental analysis is the inclusion of desirable and undesirable outputs along with its own production variables, which cannot be isolated in an environmental analysis model of these features [42]. In this regard, the authors of [36] refined radial and non-radial DEA models for environmental measurements. This approach separates the outputs into desirable and undesirable and presents two concepts: natural and managerial disposability. In this work, we employ the DEA radial approach for environmental assessments proposed by [37]. It should be borne in mind that a main feature of this approach is the utilization of DEA-RAM (range-adjusted measure), first proposed by [43], to treat the analysis of managerial and natural disposability in a unified manner.
2.1. Natural Disposability
Natural disposability refers to a DMU that improves its efficiency by decreasing its inputs in order to decrease its undesirable outputs, as well as to increase the desirable outputs.
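In Model (1), each j-th DMU, $j = 1, \ldots, n$, considers inputs $X_j = (x_{1j}, \ldots, x_{mj})^T$ for the production of desirable outputs $G_j = (g_{1j}, \ldots, g_{sj})^T$ and undesirable outputs $B_j = (b_{1j}, \ldots, b_{hj})^T$. Furthermore, $d_i^x$, $d_r^g$ and $d_f^b$ are all slack variables which are related to inputs, desirable and undesirable outputs, respectively. $\lambda = (\lambda_1, \ldots, \lambda_n)^T$ are structural or intensity variables, which are unknown and are used for the connection of the input and output vectors by means of a convex combination.
R is the range determined by the lower and upper limits of inputs, desirable outputs and undesirable outputs, denoted by:
$$
R_i^x = \frac{1}{(m+s+h)\left(\max_j x_{ij} - \min_j x_{ij}\right)}, \quad
R_r^g = \frac{1}{(m+s+h)\left(\max_j g_{rj} - \min_j g_{rj}\right)}, \quad
R_f^b = \frac{1}{(m+s+h)\left(\max_j b_{fj} - \min_j b_{fj}\right)}
$$
The natural efficiency of the k-th policy is computed by the following CRS and radial VRS model (see [37] for a better understanding):
$$
\begin{aligned}
\max \quad & \xi + \varepsilon \left[ \sum_{i=1}^{m} R_i^x d_i^x + \sum_{r=1}^{s} R_r^g d_r^g + \sum_{f=1}^{h} R_f^b d_f^b \right] \\
\text{s.t.} \quad & \sum_{j=1}^{n} x_{ij} \lambda_j + d_i^x = x_{ik}, \quad i = 1, \ldots, m, \\
& \sum_{j=1}^{n} g_{rj} \lambda_j - d_r^g - \xi g_{rk} = g_{rk}, \quad r = 1, \ldots, s, \\
& \sum_{j=1}^{n} b_{fj} \lambda_j + d_f^b + \xi b_{fk} = b_{fk}, \quad f = 1, \ldots, h, \\
& \lambda_j \geq 0, \quad \xi \ \text{unrestricted}, \quad d_i^x, d_r^g, d_f^b \geq 0
\end{aligned}
\tag{1}
$$
where the unrestricted parameter $\xi$ denotes an unknown inefficiency rate expressing the gap between the efficiency frontier and an empirical group of undesirable and desirable outputs. The parameter $\varepsilon$ takes the value of 0.0001 in this work to minimize the influence of slack variables. If the restriction $\sum_{j=1}^{n} \lambda_j = 1$ is added to Model (1), then the obtained model is the VRS form of Model (1).
The first restriction in both the CRS and VRS forms of Model (1) explores the values of $\lambda$ to create a composite unit, considering inputs such as: $\sum_{j=1}^{n} x_{ij} \lambda_j + d_i^x = x_{ik}$. The values of the inputs can be decreased when positive slack variables $d_i^x$ are present. This may unquestionably vary the given rates, which implies that the system presents some inefficiencies.
In the same way, the second restriction, $\sum_{j=1}^{n} g_{rj} \lambda_j - d_r^g - \xi g_{rk} = g_{rk}$, indicates that the desirable outputs can be maintained or increased by means of an increase of the slack variable $d_r^g$ and a radial expansion $\xi g_{rk}$.
The third restriction, $\sum_{j=1}^{n} b_{fj} \lambda_j + d_f^b + \xi b_{fk} = b_{fk}$, shows that, as the inputs decrease, the undesirable outputs can be reduced both through their slack variables $d_f^b$ and radially through $\xi b_{fk}$.
The objective function considers that two origins of inefficiency may be established. A k-policy can be considered efficient when the following two conditions are met: (a) $\xi^* = 0$; (b) $d_i^{x*} = 0$, $d_r^{g*} = 0$, $d_f^{b*} = 0$ for all $i$, $r$ and $f$. In this case, the k-policy belongs to the efficiency frontier, since it fulfills the constraints present in both forms of Model (1), and consequently, the objective function takes a value of zero. Otherwise, the value of the objective function for non-efficient policies is greater than zero, due to possible displacements in the slack variables and radial movements.
The natural efficiency is then computed by:
$$
\theta^* = 1 - \left[ \xi^* + \varepsilon \left( \sum_{i=1}^{m} R_i^x d_i^{x*} + \sum_{r=1}^{s} R_r^g d_r^{g*} + \sum_{f=1}^{h} R_f^b d_f^{b*} \right) \right]
$$
The value of this unified efficiency measure ranges between zero and one. If the k-policy is efficient, then the objective function of Model (1) is zero, and hence, the efficiency score equals $\theta^* = 1$. The optimal values of the slack variables in Model (1) show the level of inefficiency.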
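The natural disposability model can be solved as an ordinary linear program. The following is a minimal sketch, not the implementation used in this work: it assumes the constraint forms described above (input slack added, desirable outputs expanded radially, undesirable outputs contracted radially), assumes that every input and output varies across DMUs so that the ranges are non-zero, and relies on SciPy's `linprog`. The function name `natural_efficiency` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def natural_efficiency(X, G, B, k, vrs=False, eps=1e-4):
    """Natural-disposability efficiency of DMU k (sketch, assumed formulation).

    X, G, B: arrays of shape (n, m), (n, s), (n, h) holding inputs,
    desirable outputs and undesirable outputs, one row per DMU.
    """
    X, G, B = (np.asarray(a, dtype=float) for a in (X, G, B))
    n, m = X.shape
    s, h = G.shape[1], B.shape[1]
    n_slack = m + s + h

    def ranges(A):
        # Range-adjusted weights; assumes max > min in every column.
        return 1.0 / (n_slack * (A.max(axis=0) - A.min(axis=0)))

    w = np.concatenate([ranges(X), ranges(G), ranges(B)])

    # Variable order: lambda (n), d_x (m), d_g (s), d_b (h), xi (1).
    # linprog minimizes, so the objective is negated.
    c = -np.concatenate([np.zeros(n), eps * w, [1.0]])

    A_eq = np.zeros((n_slack + int(vrs), n + n_slack + 1))
    A_eq[:m, :n] = X.T                # sum_j x_ij * lambda_j
    A_eq[m:m + s, :n] = G.T           # sum_j g_rj * lambda_j
    A_eq[m + s:n_slack, :n] = B.T     # sum_j b_fj * lambda_j
    # Slack signs: +d_x, -d_g, +d_b.
    A_eq[:n_slack, n:n + n_slack] = np.diag([1.0] * m + [-1.0] * s + [1.0] * h)
    A_eq[m:m + s, -1] = -G[k]         # -xi * g_rk
    A_eq[m + s:n_slack, -1] = B[k]    # +xi * b_fk
    b_eq = np.concatenate([X[k], G[k], B[k]])

    if vrs:
        # Convexity restriction: sum_j lambda_j = 1.
        A_eq[-1, :n] = 1.0
        b_eq = np.append(b_eq, 1.0)

    # lambda and slacks are non-negative; xi is unrestricted in sign.
    bounds = [(0, None)] * (n + n_slack) + [(None, None)]
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    # Efficiency = 1 - optimal objective; res.fun is the negated objective.
    return 1.0 + res.fun


if __name__ == "__main__":
    # Hypothetical toy data: one input, one desirable, one undesirable output.
    X = [[10.0], [10.0], [20.0]]
    G = [[10.0], [5.0], [10.0]]
    B = [[10.0], [20.0], [30.0]]
    for k in range(3):
        print(k, natural_efficiency(X, G, B, k), natural_efficiency(X, G, B, k, vrs=True))
```

With this toy data, the first DMU dominates the other two and lies on the frontier, while the second DMU scores approximately 0.4 under CRS and approximately 0.5 under VRS. The small value of `eps` keeps the slack term from noticeably affecting the radial component.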
2.2. Managerial Disposability
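The managerial efficiency of the k-th policy is evaluated by the following CRS and VRS radial model [37], which uses the same notation as Model (1):
$$
\begin{aligned}
\max \quad & \xi + \varepsilon \left[ \sum_{i=1}^{m} R_i^x d_i^x + \sum_{r=1}^{s} R_r^g d_r^g + \sum_{f=1}^{h} R_f^b d_f^b \right] \\
\text{s.t.} \quad & \sum_{j=1}^{n} x_{ij} \lambda_j - d_i^x = x_{ik}, \quad i = 1, \ldots, m, \\
& \sum_{j=1}^{n} g_{rj} \lambda_j - d_r^g - \xi g_{rk} = g_{rk}, \quad r = 1, \ldots, s, \\
& \sum_{j=1}^{n} b_{fj} \lambda_j + d_f^b + \xi b_{fk} = b_{fk}, \quad f = 1, \ldots, h, \\
& \lambda_j \geq 0, \quad \xi \ \text{unrestricted}, \quad d_i^x, d_r^g, d_f^b \geq 0
\end{aligned}
\tag{2}
$$
Similarly, if the restriction $\sum_{j=1}^{n} \lambda_j = 1$ is added to Model (2), then the obtained model is the VRS form of Model (2). In Model (2), increasing the inputs is allowed, since new technologies that emit less CO2 to the atmosphere can be used.
By using the VRS models, we can obtain the returns to scale (RTS) and damage to scale (DTS) (see [37] for a better understanding). For natural efficiency, the returns to scale have to be increasing, and for managerial efficiency, the damages to scale have to be decreasing. Otherwise, the technical units are not working well and should correct the imbalances, using the information of the efficient units to which they have to be similar (peers).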
6. Natural CRS DEA Results
The whole dataset included as an Appendix is analyzed by means of natural CRS and VRS DEA. However, only the most relevant natural CRS DEA results for the most representative DMUs, which are presented in Table 2, are described in this section.
An efficiency analysis depending on the data-center size and on the energy policy is shown in Table 3 and Table 4. The following conclusions can be drawn:
The best efficiency levels are achieved for small data-centers. The data-center size input is predominant in this group of DMUs, since no major differences between energy policies, scheduling frameworks and workload scenarios are present ($\sigma$ = 0.01, $\bar{x}$ = 0.99).
Mid-size data-centers should use the margin energy policy and monolithic or Omega schedulers and should avoid all other energy policies and the Mesos scheduler. Moreover, high workload scenarios are also more efficient than low workload scenarios. In addition, the following DMUs achieve a good level of efficiency, but they do not belong to the efficiency frontier: (a) the DMU combining the Gamma energy policy and the monolithic or Omega schedulers; (b) the DMU combining the exponential energy policy and the Omega scheduler.
No DMU is efficient in large-scale data-centers. However, the following DMUs present good levels of efficiency: (a) the DMUs combining the Gamma, exponential or margin energy policy with the high workload scenario and the monolithic scheduler; and (b) the DMUs combining the Gamma or margin energy policy with the high workload scenario and the Omega scheduler.
In highly loaded scenarios, the monolithic scheduler presents the lowest deviation regardless of the data-center size ($\sigma$ = 0.32).
We can determine that it is always inefficient to operate in a low utilization scenario in medium-sized and large data-centers. Moreover, both the margin and the probabilistic energy policies (Gamma and exponential) perform more efficiently than the rest of the energy policies, as shown in Figure 6. The monolithic scheduler seems to achieve good results even for large-scale data-centers, while the two-level scheduling approach has a negative impact on data-center performance. However, the trends show that the performance of the monolithic scheduling approach suffers from degradation on larger data-centers and higher workload pressure, and hence, lower efficiency levels are to be expected if larger sizes and higher utilization scenarios are to be considered.
The actions proposed for the improvement of efficiency of the most relevant DMUs are shown in Table 5.
6.1. Proposed Corrections for a Sample DMU
DMU #104 is selected to illustrate how corrective actions are proposed by DEA in order to achieve efficiency. This DMU is defined by the combination of the random energy efficiency policy, the Omega scheduling model and a low utilization workload scenario.
DMU #104 presents a natural efficiency of 0.1697, which means it is far from being efficient. The following corrective actions are suggested for it to belong to the efficiency frontier, as shown in Table 6:
The time the data-center spends on task computation must be increased by 38.28 h (+83%).
Energy consumption must be reduced by 193.88 MWh (−83%).
The average time jobs wait in a queue must be reduced by 3.23 s (−83%).
The number of servers must be reduced by 9190 (−92%).
Shut-down operations must be reduced by 9680 (−24%).
In addition to these corrective actions, the peers this DMU should emulate are #13, #34 and #18. This means that the workload must be increased, and better energy efficiency policies, such as margin and always, must be used. The full dataset containing all the DMUs and DEA analysis and corrections can be found as Supplementary Material in the Appendix.
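The way such corrective actions follow from a natural DEA solution can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used to build Table 6: it assumes that the frontier targets are obtained by removing the input slacks, expanding the desirable outputs radially by the inefficiency rate and contracting the undesirable outputs by the same rate, and that the slack term weighted by the small parameter is negligible, so that the radial rate is approximately one minus the efficiency score. The function names and the numeric inputs of `natural_targets` are hypothetical.

```python
def radial_rate(efficiency):
    """Approximate radial inefficiency rate, neglecting the small slack term."""
    return 1.0 - efficiency


def natural_targets(x_k, g_k, b_k, xi, d_x, d_g, d_b):
    """Frontier targets for one DMU under natural disposability (sketch).

    Inputs shrink by their slacks, desirable outputs grow radially plus
    slack, and undesirable outputs shrink radially minus slack.
    """
    x_target = [x - d for x, d in zip(x_k, d_x)]
    g_target = [(1.0 + xi) * g + d for g, d in zip(g_k, d_g)]
    b_target = [(1.0 - xi) * b - d for b, d in zip(b_k, d_b)]
    return x_target, g_target, b_target


if __name__ == "__main__":
    # Efficiency reported for DMU #104 in the text.
    xi = radial_rate(0.1697)
    print(f"radial rate: {xi:.4f}")

    # Hypothetical values, chosen only to show the mechanics.
    print(natural_targets([100.0], [10.0], [50.0], 0.5, [20.0], [0.0], [5.0]))
```

Under these assumptions, an efficiency of 0.1697 corresponds to a radial rate of about 0.83, which is consistent with the 83% changes listed for computation time, energy consumption and queue time, while the reductions in servers and shut-down operations would stem from input slacks.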
Some of the proposed changes involve the switching of the scheduling framework, which is hardly achievable with the current resource manager systems. To implement these corrections, a resource managing system able to dynamically change the scheduling framework during runtime would be necessary. Such a system is an interesting improvement to the current state of the art that the DEA analysis leads us to develop.
7. Conclusions and Policy Implications
In this work, we have confirmed the hypothesis that DEA constitutes a powerful tool for the analysis of technical efficiency in cloud-computing scenarios where large-scale data-centers provide the computational core.
Data envelopment analysis provides cloud-computing operators with the means for the identification of which data-center configuration better suits their requirements, both in terms of performance and energy efficiency.
This methodology allows us to analyze several energy efficiency policies that shut down idle servers, so that their behavior and differences can be compared in various data-center environments. It has been proven that policies based on a security margin and those that use statistical tools to predict the future workload, such as exponential and Gamma, deliver better results than policies based on data-center workload pressure and random strategies.
In addition, it has been empirically shown that even under medium and high workload pressure, in data-centers composed of up to 10,000 machines, monolithic schedulers perform better than other scheduling models, such as the two-level and shared-state approaches.
Finally, cloud-computing infrastructure managers are provided with empirical knowledge of which data-centers are not being used optimally, and hence, they can make decisions regarding the shut-down of machines in order to achieve higher utilization levels of the cloud-computing system as a whole.
As future work related to the limitations of the presented work, we may include:
The addition of different kinds of workload patterns, as well as real workload traces.
The analysis of other scheduling models, such as distributed and hybrid models.
The development of a new-generation resource-managing system that could dynamically apply the optimal scheduling framework depending on the environment and workload.
The analysis of simulation data with other DEA approaches, such as Bayesian and probabilistic models, which could minimize the impact of the noise in current DEA models.