Energy–QoS Trade-Offs in Mobile Service Selection

An attractive advantage of mobile networks is that their users can gain easy access to different services. In some cases, equivalent services can be supplied by different providers, which raises the question of how to rationally select the best provider among all the possibilities. In this paper, we investigate an answer to this question from both the quality-of-service (QoS) and energy perspectives by formulating an optimisation problem. We illustrate the theoretical results with examples from experimental measurements of the resulting energy and performance.

The power consumption of mobile services will depend on the load. Clearly, quality of service (QoS) will also depend on load, because a more heavily loaded computational or communication resource will quite naturally have longer response times. However, such issues are somewhat more complex, because the server clusters hosting the services may turn off some of their resources under lighter loads, so that when load is higher, although power consumption will obviously increase, QoS can also improve.
A simple but quite realistic power consumption relation for current processing units is $\Pi = A + B\rho$, where $A$ is the power consumption of the processing unit when it is idle, and $B$ is the rate at which it increases as a function of the load factor $\rho$ [1]. Thus, a very efficient processor might have a very small value of $A$, and $B$ would correspond to the rate of increase in power consumption as more and more cores are turned on as the load increases. Unfortunately, for much of the current equipment, $A$ is still a significant part of the total processor power consumption. This is because the memory system, the peripheral equipment and the network connections need to be powered even when no jobs are being processed, and because the operating system can remain active (and hence contributes to the energy consumption) even when there are no external jobs that need to be processed. We can also obtain the expression for the energy consumption per job:
$$J = \frac{\Pi}{\lambda} = \frac{A}{\lambda} + B\,E[S] \qquad (1)$$
where $\lambda$ is the average arrival rate of jobs, $E[S]$ is the average job service time, and $\rho = \lambda E[S]$. The equation supports the principle of concentrating computation on a small number of processing units in order to minimise the power consumption per job, since the idle power $A$ is then shared among more jobs.
However, power consumption on its own is not the only important factor: quality of service (QoS) is also paramount. In [1] we discuss how we can achieve optimum energy consumption to QoS trade-offs by adjusting system load in the context of a computing cloud. In this paper we discuss the much broader question: suppose that a mobile community could access services both from a local server within the operator provider (the "local server") and from remote service providers (the "remote server"); what fraction of their workload should they send remotely if they wish to optimise both QoS and energy consumption?
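The effect of the idle power on the energy spent per job can be illustrated with a short numerical sketch of the relation Π = A + Bρ with ρ = λE[S], so that the energy per job is Π/λ = A/λ + B·E[S]. The function names and the numerical values (A = 100, B = 50, E[S] = 1 s) are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
def power(A, B, rho):
    """Power drawn by a processing unit at load factor rho: Pi = A + B * rho."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("load factor rho must lie in [0, 1]")
    return A + B * rho


def energy_per_job(A, B, lam, mean_service):
    """Energy per job: Pi / lambda, which equals A / lambda + B * E[S]."""
    if lam <= 0.0:
        raise ValueError("arrival rate must be positive")
    rho = lam * mean_service  # load factor rho = lambda * E[S]
    return power(A, B, rho) / lam


if __name__ == "__main__":
    # Hypothetical unit: A = 100 W idle, B = 50 W load-proportional, E[S] = 1 s.
    for lam in (0.2, 0.8):
        print(lam, energy_per_job(100.0, 50.0, lam, 1.0))
    # lambda = 0.2 gives about 550 J per job, lambda = 0.8 about 175 J per job.
```

With these illustrative numbers, raising the load from 0.2 to 0.8 cuts the energy per job to less than a third, because the fixed term A/λ dominates at light load.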
Of course, the decision to use a remote service will depend on a variety of considerations based on security, cost, data and software protection, and resilience. Nevertheless, there will also be technical considerations based on QoS and energy consumption per job. Thus, this paper only focuses on the technical choice between a local or remote cluster service, and shows that this choice can be formulated as an optimisation problem. In the sequel, we first review the literature, and then provide some experimental measurements regarding the energy consumption and performance of servers. Next, we formulate the optimisation problem, describe its solution and present some numerical examples.

Optimising Energy and QoS
A simple analytic model that uses the combined energy-QoS cost function includes, in its first part, the well-known Pollaczek-Khintchine formula [2,3] for the average response time, based on Poisson arrivals of jobs and general service time distributions, and, in its second part, the energy consumption per job:
$$C_{job} = a\left[E[S] + \frac{\lambda E[S]^2 (1 + C_S^2)}{2(1 - \rho)}\right] + b\left[\frac{A}{\lambda} + B\,E[S]\right] \qquad (2)$$
where $E[S]$ is the average job service time as before; $C_S^2$ is the squared coefficient of variation of service time; $\lambda$ is the job arrival rate; $\rho = \lambda E[S]$ is the load; and the constants $a$ and $b$ describe the relative importance placed on QoS and energy consumption. This allows us to compute the value of the arrival rate that minimises $C_{job}$. Setting $dC_{job}/d\lambda = 0$ gives $\rho^2/(1-\rho)^2 = 2bA/[a(1 + C_S^2)]$, so the optimum setting of the load $\rho^* = \lambda^* E[S]$ depends on $A$ (the idle power consumption) and on the ratio $b/a$:
$$\rho^* = \frac{\sqrt{2bA}}{\sqrt{2bA} + \sqrt{a(1 + C_S^2)}} \qquad (3)$$
The expression (3) gives us a simple rule of thumb for selecting the system load for optimum operation, depending on how we weigh the importance of energy consumption with respect to average response time, that is, how fast we are getting the jobs done. We also see that $\rho^*$ increases with the ratio $bA/[a(1 + C_S^2)]$. This tells us that the optimum load should increase with the system's idle power consumption and with the relative importance that we place on energy, and should decrease as the squared coefficient of variation of service time grows.
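The optimum load can be checked numerically. The sketch below assumes the Pollaczek-Khintchine mean response time in the form E[S] + λE[S]²(1 + C_S²)/(2(1 − ρ)) and a cost equal to a times the response time plus b times the energy per job A/λ + B·E[S]; under these assumptions the stationarity condition reduces to ρ/(1 − ρ) = sqrt(2bA/(a(1 + C_S²))). All function names and the parameter values in the usage example are hypothetical.

```python
import math


def pk_response_time(lam, mean_s, scv):
    """Mean M/G/1 response time (Pollaczek-Khintchine) for arrival rate lam."""
    rho = lam * mean_s
    if not 0.0 < rho < 1.0:
        raise ValueError("load must satisfy 0 < rho < 1")
    return mean_s + lam * mean_s ** 2 * (1.0 + scv) / (2.0 * (1.0 - rho))


def job_cost(lam, mean_s, scv, A, B, a, b):
    """Composite cost: a * response time + b * energy per job."""
    return a * pk_response_time(lam, mean_s, scv) + b * (A / lam + B * mean_s)


def optimal_load(A, a, b, scv):
    """Closed-form optimum load obtained from d(cost)/d(lambda) = 0."""
    k = math.sqrt(2.0 * b * A / (a * (1.0 + scv)))
    return k / (1.0 + k)


def grid_minimiser(mean_s, scv, A, B, a, b, steps=999):
    """Brute-force check: the load on a uniform grid with the lowest cost."""
    loads = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(loads, key=lambda rho: job_cost(rho / mean_s, mean_s, scv, A, B, a, b))


if __name__ == "__main__":
    # Hypothetical values: E[S] = 1 s, C_S^2 = 1, A = 100, B = 50, a = 1, b = 0.01.
    print(optimal_load(100.0, 1.0, 0.01, 1.0))
    print(grid_minimiser(1.0, 1.0, 100.0, 50.0, 1.0, 0.01))
```

The grid search is only a sanity check on the closed form; note that B does not affect the optimum, since the term B·E[S] does not depend on λ.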

Mathematical Model of Energy and Quality of Service at the Local and Remote Cluster
The local cluster (LC) is assumed to incorporate a rack of $L$ processors and related peripheral devices, with a power profile:
$$\Pi_L = A_L + L\,B_L\,\rho_L \qquad (4)$$
where:
• $A_L$ is the local power consumption related to the internal networking and shared memory systems (main and secondary) plus their induced cooling and ventilation costs;
• $B_L$ is the workload-proportional power consumption, including cooling, per processor in the LC rack; and
• $\rho_L$ is the individual utilisation (fraction of time it is busy) of each of the $L$ processors in the local rack.
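The local computational workload is represented by a flow of $\lambda_L$ jobs per second, each of which takes $S_L$ of processing time on average, and jobs are equally distributed among the $L$ processors, so that:
$$\rho_L = \frac{\lambda_L S_L}{L} \qquad (5)$$
As a result, the total expenditure of energy per job in the LC is the ratio of power consumption to total job arrival rate, or:
$$J_L = \frac{A_L + L\,B_L\,\rho_L}{\lambda_L} = \frac{A_L}{\lambda_L} + B_L S_L \qquad (6)$$
If the average response time $W(F, \lambda)$ is a function of the job arrival rate $\lambda$ and the job service time distribution $F$, we will have:
$$W_L = W\!\left(F_L, \frac{\lambda_L}{L}\right) \qquad (7)$$
where $F_L$ denotes the service time distribution at the LC processors.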

The Remote Cluster Model
Similarly, the remote cluster (RC) is assumed to incorporate a rack of $R$ processors and related peripheral devices, with a simplified power profile given by the expression:
$$\Pi_R = A_R + R\,B_R\,\rho_R \qquad (8)$$
where:
• $A_R$ is the power consumption in the RC related to the internal networking and shared memory systems (main and secondary) plus the power consumption for cooling and ventilation;
• $B_R$ is the workload-proportional power consumption, including cooling, per processor in the RC rack; and
• $\rho_R$ is the individual utilisation (fraction of time it is busy) of each of the $R$ processors in the RC rack.
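The computational workload in the RC is represented by a flow of $\lambda_R$ jobs per second, each of which takes $S_R$ of processing time on average, and jobs are equally distributed among the $R$ processors, so that:
$$\rho_R = \frac{\lambda_R S_R}{R} \qquad (9)$$
As a result, the total expenditure of energy per job in the RC is the ratio of power consumption to the total job arrival rate, or:
$$J_R = \frac{A_R + R\,B_R\,\rho_R}{\lambda_R} = \frac{A_R}{\lambda_R} + B_R S_R \qquad (10)$$
Assuming the same average response time formula $W(F, \lambda)$, a function of the job arrival rate and the job service time distribution, we have:
$$W_R = W\!\left(F_R, \frac{\lambda_R}{R}\right) \qquad (11)$$
where $F_R$ denotes the service time distribution at the RC processors.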

Transferring a Fraction α of Jobs to the Remote Cluster
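When the user decides to transfer some fraction $\alpha$ of its jobs to the remote cluster, and assuming that the RC has another load of jobs arriving at rate $\lambda$, we obtain that the net average response time perceived by the users who emanate from the LC is:
$$W(\alpha) = (1 - \alpha)\,W\!\left(F_L, \frac{(1 - \alpha)\lambda_L}{L}\right) + \alpha\,W\!\left(F_R, \frac{\lambda + \alpha\lambda_L}{R}\right) \qquad (12)$$
where $F_L$ and $F_R$ are the service time distributions at the LC and RC, and where we have assumed that the additional network delays between the users and the two clusters are equivalent, since the users need not be "resident" in the facility that hosts the LC. Under the assumption that each of the two clusters shares its load equally among its processors, that the RC processors are $f$ times faster than the LC processors, and that all job arrival traffic is Poisson, we can use the well-known Pollaczek-Khintchine formula [4] to estimate the average response time per job as a function of the load dispatching policy characterised by the fraction $\alpha$ of jobs that are sent to the RC. Writing $C_L^2$ and $C_R^2$ for the squared coefficients of variation of the service time in the LC and RC, we have:
$$W\!\left(F_L, \lambda'_L\right) = S_L + \frac{\lambda'_L S_L^2 (1 + C_L^2)}{2(1 - \lambda'_L S_L)}, \quad \lambda'_L = \frac{(1 - \alpha)\lambda_L}{L} \qquad (13)$$
and
$$W\!\left(F_R, \lambda'_R\right) = \frac{S_L}{f} + \frac{\lambda'_R (S_L/f)^2 (1 + C_R^2)}{2(1 - \lambda'_R S_L/f)}, \quad \lambda'_R = \frac{\lambda + \alpha\lambda_L}{R} \qquad (14)$$
The composite cost function that includes both the energy consumption per job and the average response time then becomes:
$$C(\alpha) = a\,W(\alpha) + b\,J(\alpha) \qquad (15)$$
where $J(\alpha)$ denotes the energy consumption per job under the dispatching fraction $\alpha$, and $a$ and $b$ are the relative importance of response time versus energy.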
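The dependence of the composite cost on α can be explored with a small sketch. It models each processor as an M/G/1 queue fed with an equal share of its cluster's Poisson traffic, and it makes one labelled assumption that is not taken from the text: the energy charged to a job is the per-job energy of the cluster that serves it, so the energy term is the α-weighted mixture of the two per-cluster values. The class and function names are hypothetical; the parameter values are the ones reported in the experimental section, with the squared coefficient of variation obtained by squaring the reported coefficient of variation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    n: int         # number of processors
    A: float       # load-independent power of the cluster
    B: float       # load-proportional power per processor
    mean_s: float  # mean job service time
    scv: float     # squared coefficient of variation of service time

    def response_time(self, lam):
        """Pollaczek-Khintchine mean response time when the cluster arrival
        rate lam is split equally among the n processors."""
        per_proc = lam / self.n
        rho = per_proc * self.mean_s
        if rho >= 1.0:
            return float("inf")  # beyond the cluster's capacity
        wait = per_proc * self.mean_s ** 2 * (1.0 + self.scv) / (2.0 * (1.0 - rho))
        return self.mean_s + wait

    def energy_per_job(self, lam):
        """(A + n * B * rho) / lam, which simplifies to A / lam + B * mean_s."""
        return self.A / lam + self.B * self.mean_s


def composite_cost(alpha, lam, local, remote, a, b, lam_other=0.0):
    """a * response time + b * energy per job for dispatching fraction alpha.
    ASSUMPTION: both terms are alpha-weighted mixtures of per-cluster values."""
    lam_local = (1.0 - alpha) * lam
    lam_remote = alpha * lam + lam_other
    w = e = 0.0
    if alpha < 1.0:  # some jobs stay local
        w += (1.0 - alpha) * local.response_time(lam_local)
        e += (1.0 - alpha) * local.energy_per_job(lam_local)
    if alpha > 0.0:  # some jobs go remote
        w += alpha * remote.response_time(lam_remote)
        e += alpha * remote.energy_per_job(lam_remote)
    return a * w + b * e


def best_alpha(lam, local, remote, a, b, steps=100):
    """Grid search for the fraction alpha in [0, 1] with the lowest cost."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda al: composite_cost(al, lam, local, remote, a, b))


# Parameter values as reported in the experimental section.
LOCAL = Cluster(n=3, A=223.0062, B=9.9302, mean_s=9.6119, scv=0.5103 ** 2)
REMOTE = Cluster(n=6, A=413.2667, B=11.5130, mean_s=9.5256, scv=0.6413 ** 2)

if __name__ == "__main__":
    for lam in (0.1, 0.2, 0.3):
        print(lam, best_alpha(lam, LOCAL, REMOTE, a=0.1, b=1.0))
```

A different way of apportioning the idle power between the two clusters would change the energy term and possibly the minimising α, so the output should be read as an illustration of the method rather than a reproduction of the reported curves.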

Experimental Results
To validate the main findings of this work, we have conducted a series of experiments using a representative number of computing machines. In particular, we have used six computers (R = 6) in the "remote cluster" and three similar computers (L = 3) for our "local cluster". The computers were selected from a set of dual-core Pentium 4 and quad-core Intel Xeon computers, all of them running Ubuntu Linux with CPU throttling enabled. Job requests originated from an additional machine connected through a Fast Ethernet switch to both clusters.
A job consisted of calculating the number π, using Machin's formula, to a desired level of precision and sending the resulting string back to the client over the network. Each job request indicated the desired number of digits to be used, which was randomly chosen in the range 10,000-50,000 by the client. In addition to generating requests at random times (exponentially distributed inter-arrival times, at rate λ), the client also determined the cluster that handled the request, as illustrated in Figure 1. With probability α a request was sent to the remote cluster and with probability (1 − α) to the local cluster. A round-robin scheduler ("RR" in Figure 1) was implemented in each of the clusters, so that each newly arriving job is assigned and placed in the input queue of the next machine in the list regardless of the machine's load. This in effect results in an equal distribution of the incoming flow of jobs to each of the machines in the cluster, with a separate queue being created at each machine in the cluster, and is reflected in the way in which we construct the mathematical model in Section 2.
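The workload described above can be sketched as follows. This is a minimal illustration and not the code used in the experiments: `machin_pi` evaluates Machin's formula, π = 16·arctan(1/5) − 4·arctan(1/239), with integer arithmetic, and `dispatch_plan` draws exponentially spaced arrivals, picks the cluster with probability α, and assigns machines in round-robin order. All names, the number of guard digits and the seed are assumptions made for this example.

```python
import itertools
import random


def arctan_inv(x, unity):
    """arctan(1/x) scaled by `unity`, from the alternating Taylor series."""
    total = term = unity // x
    x2 = x * x
    n, sign = 3, -1
    while term:
        term //= x2                 # next odd power of 1/x
        total += sign * (term // n)
        n += 2
        sign = -sign
    return total


def machin_pi(digits, guard=10):
    """Decimal string of pi truncated to `digits` digits after the point."""
    unity = 10 ** (digits + guard)  # guard digits absorb truncation error
    pi_scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    text = str(pi_scaled // 10 ** guard)
    return text[0] + "." + text[1:]


def dispatch_plan(n_jobs, lam, alpha, n_local, n_remote, seed=0):
    """List of (arrival time, cluster, machine index, digits) tuples."""
    rng = random.Random(seed)
    next_machine = {
        "local": itertools.cycle(range(n_local)),    # round-robin per cluster
        "remote": itertools.cycle(range(n_remote)),
    }
    t, plan = 0.0, []
    for _ in range(n_jobs):
        t += rng.expovariate(lam)                    # Poisson arrivals at rate lam
        cluster = "remote" if rng.random() < alpha else "local"
        digits = rng.randint(10_000, 50_000)         # requested precision
        plan.append((t, cluster, next(next_machine[cluster]), digits))
    return plan


if __name__ == "__main__":
    print(machin_pi(30))
    for job in dispatch_plan(5, lam=0.3, alpha=0.5, n_local=3, n_remote=6):
        print(job)
```

Because the round-robin counter ignores machine load, each machine receives every n-th job of its cluster, which is the equal split assumed by the analytical model.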
Because of the differences in both the number and kind of machines in the two clusters, the job service times varied as shown in Figure 2. The service time in the local cluster was on average 9.6119 and in the remote cluster 9.5256, giving a speed-up factor of f = 1.0091, with corresponding coefficients of variation of 0.5103 and 0.6413. The power consumption of both clusters is shown in Figure 3 [5]. From observations of power consumption and system utilization, it was possible to approximate the parameters $A_L = 223.0062$, $B_L = 9.9302$, $A_R = 413.2667$ and $B_R = 11.5130$ using linear regression. Note that the linear regressions were applied to each model's operational region, which depends on the number of processors available: 6 and 3 processors for the remote and local clusters, respectively. Power measurements beyond the models' validity regions are shown for illustration purposes only.
Clearly, the remote cluster has a higher power consumption because of the larger number of machines available to service jobs. By recording the start and completion times for each job at the client, we were able to measure the average job response time. We conducted the experiment by using only one of the clusters at a time (i.e., by fixing α = 0, and later α = 1). The results for each independent cluster are shown in Figure 4 alongside the theoretical values. The reported values were obtained by averaging the measurements corresponding to 1000 jobs. The saturation rate for the local cluster was at around λ = 0.3 and about twice as large for the remote cluster, which makes sense given the relative sizes of the clusters. Similarly, we recorded the power consumption while executing the jobs along with the total execution time for each job set, which allowed us to estimate the energy consumption per job (see Figure 5). As shown in Figures 4 and 5, the theoretical model approximated the measured values well within the operational range (i.e., when not exceeding the systems' capacity).
These models allowed us to obtain the energy-QoS costs (Equation 15) for different values of α, as shown in Figure 6. The choice of parameters a = 0.1 and b = 1.0 was made to approximately normalize (to 10) the varying range of the response time and energy per job. The former reaches around 100 s close to full system utilization, whereas the latter was around 10 kJ for low system utilization. It is interesting to observe the values of α that minimize the overall system cost. These are graphically illustrated in Figure 7 for the cases a = 0.1, b = 1.0 (left), and a = b = 1.0 (right). The choice of a and b would in the end depend on the importance that we would like to give to energy and to QoS. However, these two sets of values serve to illustrate the behaviour of the energy-QoS cost. The horizontal axis depicts values of α in the range 0 to 1. As previously explained, when α = 0, all the load is sent to the local cluster. At α = 1, all the load is sent to the remote cluster. The load is shared between the two clusters for values of α between these two extremes. The figures show that the choice of a and b can affect the cost of the load sharing, as well as the operating point (i.e., the value of λ).
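The reported speed-up factor and saturation rates can be cross-checked from the measured mean service times. The sketch below assumes that a cluster with n machines and equal load sharing saturates when the per-machine utilisation reaches one, that is, at an arrival rate of n divided by the mean service time; the variable names are hypothetical.

```python
# Mean service times and cluster sizes as reported in the text.
S_LOCAL, S_REMOTE = 9.6119, 9.5256
N_LOCAL, N_REMOTE = 3, 6

# Speed-up of the remote machines relative to the local ones.
speed_up = S_LOCAL / S_REMOTE

# ASSUMPTION: with equal load sharing, utilisation is lam * S / n,
# so the cluster saturates at lam = n / S.
saturation_local = N_LOCAL / S_LOCAL
saturation_remote = N_REMOTE / S_REMOTE

if __name__ == "__main__":
    print(f"speed-up f          = {speed_up:.4f}")
    print(f"local saturation    = {saturation_local:.4f} jobs/s")
    print(f"remote saturation   = {saturation_remote:.4f} jobs/s")
    print(f"remote/local ratio  = {saturation_remote / saturation_local:.3f}")
```

Under this assumption the local cluster saturates slightly above 0.31 jobs per second and the remote cluster at roughly twice that rate, in line with the values observed in the experiment.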

Related Work
Although we have not been able to find work that has discussed the issue that is at the centre of this paper, there has indeed been much work on the power aspects of servers and clusters. Most works have focused on power consumption models that offer the advantage of simplicity but lack accuracy, as suggested by Rivoire et al. [6], who examined five representative full-system power models in a recent study. The most common direction is the single-parameter black-box approach, which finds relationships between a server's load (normally CPU utilization) and power consumption from measured data. For example, linear regression models have been used by Sasaki et al. [7] in web server clusters and by Lewis et al. [8] in server blades. Fan et al. [9] obtained measurements of the power consumption of warehouse-sized computers (computers for large-scale Internet services), Li et al. [10] did similar work on web servers also running on blade systems, and Economou et al. proposed a modeling methodology for full-system power consumption [11]. Similar work has been done by Chu et al. [12], Jaiantilal et al. [13], and Yuan and Ahmad [14]. However, these models tend to be accurate only within certain operational regions, as suggested by Lien et al. [15]. Bolla et al. [16] investigated the impact on energy consumption and network performance of using low power idle and power scaling in network devices. The simultaneous minimization of a composite energy-QoS function in networks has previously been studied in [17], while other work has considered the dynamic flow of energy so as to support "On Demand" the energy needs of Cloud Computing [18,19].
A direct application of these models is in power saving techniques. A comprehensive survey of green networking research was compiled by Bianzino et al. [20]. Another survey on power and energy management in servers was done by Bianchini and Rajamony [21]. Some relevant examples of power optimization techniques are the work of Sankar et al. [22] on energy-delay metric composition and that of Chase et al. [23], who proposed an economic approach to server resource management. Rodero et al. researched application-aware power management looking at individual components [24]. A popular approach to reducing power usage is to switch off unused equipment, as done by Chen et al. [25] and Niyato et al. [26]. A control mechanism to adjust the peak power of a high-density server by means of a feedback controller was suggested by Lefurgy et al. [27].

Conclusions
In this paper, we have studied the optimum load sharing between a local and a remote cluster service as a function of a compromise between the perceived average response time and the energy consumption per job accessed from a mobile device. This requires that the average and variance of job service times, the average job arrival rates, and the power consumption parameters of the servers involved are effectively measured. We have also provided experimental measurements of these quantities for a test case. Yet much can still be done in this broad area, and some interesting questions that we would like to address include:
• Considering an organisation of servers as a set of specialised service facilities, with multiple specialised units, what are the energy-QoS trade-offs and operating points in such a system?
• With multiple types and distinct steps within jobs themselves, what are the best job allocation [28] strategies for each job type?
• If jobs have synchronisation constraints as in distributed databases [29], how does this affect the energy-QoS trade-off?
• If we wish to simultaneously evaluate multiple QoS and energy criteria [30,31], such as peak power consumption, energy consumption, turn-around times, and throughput, how can we design task allocation and routing algorithms?
• When sub-systems can be turned on and off, creating further start-up delays [32] and energy costs, how can we address the optimum operating point of each sub-system in an interconnected network of servers?

Figure 3. Power consumption of both the local and remote cluster.

Figure 5. Experimental and theoretical values for the energy per job.