Optimal Asynchronous Dynamic Policies in Energy-Efﬁcient Data Centers

Abstract: In this paper, we apply a Markov decision process to find the optimal asynchronous dynamic policy of an energy-efficient data center with two server groups. Servers in Group 1 always work, while servers in Group 2 may either work or sleep, and a fast setup process occurs when a server's state is changed from sleep to work. The servers in Group 1 are faster and cheaper than those in Group 2, so that Group 1 has a higher service priority. Putting each server in Group 2 to sleep can reduce system costs and energy consumption, but the system must bear setup costs and transfer costs. For such a data center, an asynchronous dynamic policy is designed as two sub-policies: the setup policy and the sleep policy, both of which determine the switch rule between the work and sleep states for each server in Group 2. To find the optimal asynchronous dynamic policy, we apply the sensitivity-based optimization to establish a block-structured policy-based Markov process and use a block-structured policy-based Poisson equation to compute the unique solution of the performance potential by means of the RG-factorization. Based on this, we characterize the monotonicity and optimality of the long-run average profit of the data center with respect to the asynchronous dynamic policy under different service prices. Furthermore, we prove that a bang–bang control is always optimal for this optimization problem. We hope that the methodology and results developed in this paper can shed light on the study of more general energy-efficient data centers.


Introduction
Over the last two decades, considerable attention has been given to studying energy-efficient data centers. On the one hand, as the number and size of data centers increase rapidly, energy consumption has become one of the main operating costs of data centers. On the other hand, data centers have become a fundamental part of the IT infrastructure of today's Internet services, in which a huge number of servers are deployed in each data center so that the data centers can provide cloud computing environments. Therefore, finding optimal energy-efficient policies and designing optimal energy-efficient mechanisms are always interesting, difficult, and challenging tasks in the energy-efficient management of data centers. Readers may refer to recent excellent survey papers, such as Masanet et al. [1], Zhang et al. [2], Nadjahi et al. [3], Koot and Wijnhoven [4], Shirmarz and Ghaffari [5], Li et al. [6], and Harchol-Balter [7]. Barroso and Hölzle [8] demonstrated that many data centers were designed to handle peak loads effectively, but this directly caused a significant number of servers (approximately 20%) in the data centers to be idle because no work was done in the off-peak period. Although the idle servers do not provide any services, they still continue to consume a notable amount of energy, approximately 70% of that consumed by servers working in the on-peak period. Therefore, it is necessary and useful to design energy-efficient mechanisms for data centers. The sensitivity-based optimization applied in this paper has been used successfully in various areas: for example, in energy-efficient data centers by Xia et al. [35]; in inventory rationing by Li et al. [36]; in blockchain selfish mining by Ma and Li [37]; and in finance by Xia [38].
The main contributions of this paper are threefold. The first contribution is to apply the sensitivity-based optimization (and the MDPs) to study a more general energy-efficient data center with key practical factors, for example, a finite buffer, a fast setup process, and the transfer of incomplete-service jobs to the idle servers in Group 1 or to the finite buffer, if any. Although these practical factors do not increase the difficulty of performance evaluation (e.g., modeling by means of queueing systems or Markov processes; see Gandhi [10] for more details), they cause substantial difficulties and challenges in finding optimal dynamic energy-efficient policies and, furthermore, in determining threshold-type policies by using the sensitivity-based optimization. For instance, the finite buffer changes the policy-based Markov process from the one-dimensional birth–death process of Ma et al. [31] and Xia et al. [35] into a two-dimensional block-structured Markov process.
Note that this paper has two related works: Ma et al. [31] and Xia et al. [35], so it is necessary to set up some useful relations between this paper and each of the two papers. Compared with Ma et al. [31], this paper considers more practical factors in the energy-efficient data center, so that the policy-based Markov process is block-structured, which makes solving the block-structured Poisson equation more complicated. Compared with Xia et al. [35], this paper introduces a more detailed cost and reward structure, which makes the analysis of the monotonicity and optimality of dynamic energy-efficient policies more difficult and challenging. Therefore, this paper is a necessary and valuable generalization of Ma et al. [31] and Xia et al. [35] through establishing block-structured policy-based Markov processes, which in fact are the core part of the sensitivity-based optimization theory and its applications in various practical systems.
The second contribution of this paper is that it is the first to find an optimal asynchronous dynamic policy in the study of energy-efficient data centers. Note that the servers of Group 2 in the data center have "setup actions from the sleep state to the work state" and "close actions from the work state to the sleep state"; thus, we follow the two action steps to form an asynchronous dynamic policy, which is decomposed into two sub-policies: the setup policy (or the setup action) and the sleep policy (or the close action). Crucially, one of the successes of this paper is to find the optimal asynchronous dynamic policy among the many asynchronous dynamic policies by means of the sensitivity-based optimization; to date, this task has remained very difficult and challenging within the MDP framework.
The third contribution of this paper is to provide a unified framework for applying the sensitivity-based optimization to study the optimal asynchronous dynamic policy of the energy-efficient data center. For such a complicated energy-efficient data center, we first establish a policy-based block-structured Markov process together with a detailed cost and reward structure, and provide an expression for the unique solution to the block-structured Poisson equation by means of the RG-factorization. Then, we show the monotonicity of the long-run average profit with respect to the setup policy, the sleep policy, and the asynchronous policy, respectively. Based on this, we find the optimal asynchronous policy when the service price is higher (or lower) than a key threshold. Finally, we show that the optimal control is a bang–bang control. Such a structure of the optimal asynchronous energy-efficient policy significantly reduces the search space, which lowers the optimization complexity and effectively alleviates the curse of dimensionality of MDPs; in other words, the optimal asynchronous dynamic policy is of threshold type in the energy-efficient data center. Since the optimality of a threshold-type policy realizes a large reduction of the search space, the optimal threshold-type policy is of great significance for solving the mechanism design problem of energy-efficient data centers. Therefore, the methodology and results developed in this paper provide new insights for understanding dynamic energy-efficient policy optimization and mechanism design in the study of more general data centers.
The organization of this paper is as follows. In Section 2, we give a problem description of an energy-efficient data center with several practical factors. In Section 3, we establish a policy-based continuous-time block-structured Markov process and define a suitable reward function with respect to both the states and the policies of the Markov process. In Section 4, we set up a block-structured Poisson equation and provide an expression for its unique solution by means of the RG-factorization. In Section 5, we study a perturbation realization factor of the policy-based continuous-time block-structured Markov process for the asynchronous dynamic policy, and analyze how the service price impacts the perturbation realization factor. In Section 6, we discuss the monotonicity and optimality of the long-run average profit of the energy-efficient data center with respect to the asynchronous policy; based on this, we give the optimal asynchronous dynamic policy of the energy-efficient data center. In Section 7, when the optimal asynchronous dynamic policy is of threshold type, we compute the maximal long-run average profit of the energy-efficient data center. In Section 8, we give some concluding remarks. Finally, three appendices provide two illustrative special cases, the state-transition relation figure of the policy-based block-structured continuous-time Markov process, and the block entries of its infinitesimal generator.

Model Description
In this section, we provide a problem description for setting up and optimizing an asynchronous dynamic policy in an energy-efficient data center with two groups of different servers, a finite buffer, and a fast setup process. Additionally, we provide the system structure, operational mode, and mathematical notations in the energy-efficient data center.
Server groups: The data center contains two server groups: Groups 1 and 2, each of which is also one interactive subsystem of the data center. Groups 1 and 2 have m_1 and m_2 servers, respectively. Servers in the same group are homogeneous, while those in different groups are heterogeneous. Note that Group 1 is viewed as a base-line group whose servers are always at the work state even if some of them are idle, the purpose of which is to guarantee a necessary service capacity in the data center. Hence, each server in Group 1 always works regardless of whether it has a job or not, so that it must consume an amount of energy at any time. In contrast, Group 2 is regarded as a reserved group whose servers may either work or sleep, so that each of the m_2 servers can switch its state between work and sleep. If one server in Group 2 is at the sleep state, then it consumes a smaller amount of energy than at the work state, as maintaining only the sleep state requires very little energy.
A finite buffer: The data center has a finite buffer of size m_3. Jobs must first enter the buffer, and then they are assigned to the groups (Group 1 is prior to Group 2) and subsequently to the servers. To guarantee that the full service capacity of Group 2 can be utilized when some jobs are taken from the buffer to Group 2, we assume that m_3 ≥ m_2, i.e., the capacity of the buffer must be no less than the number of servers in Group 2. Otherwise, if m_3 < m_2, some of the jobs transferred from Group 2 to the buffer could be lost when many jobs are already waiting in the buffer.
Arrival processes: The arrivals of jobs at the data center form a Poisson process with arrival rate λ. If the buffer is full, then any arriving job has to be lost immediately. This leads to an opportunity cost C_5 per unit of time for each lost job due to the full buffer.
Service processes: The service times provided by each server in Groups 1 and 2 are i.i.d. and exponential with service rates µ_1 and µ_2, respectively. We assume that µ_1 ≥ µ_2, which justifies the prior use of servers in Group 1. The service discipline of each server in the data center is First Come First Serve (FCFS). If a job finishes its service at a server, then it immediately leaves the system. At the same time, the data center obtains a fixed service reward (or service price) R from the served job.
Once a job enters the data center for its service, it has to pay holding costs per unit of time C_2^(1), C_2^(2), and C_2^(3) in Group 1, Group 2, and the buffer, respectively. We assume that C_2^(1) ≤ C_2^(2); that is, consistently with the service priority, each server in Group 1 is not only faster but also cheaper than that in Group 2.
Switching between work and sleep: To save energy, the servers in Group 2 can switch between the work and sleep states. On the one hand, if more jobs are waiting in the buffer, then Group 2 sets up and turns on some sleeping servers. This process incurs a setup cost C_3^(1); however, the setup time is very short because setup begins directly from the sleep state, and hence it can be ignored. On the other hand, if the number of jobs in Group 2 becomes small, then some working servers are switched to the sleep state, while their incomplete-service jobs are transferred to the buffer and treated as newly arriving ones.
Transfer rules: (1) To Group 1. Based on the prior use of servers in Group 1, if a server in Group 1 becomes idle and there is no job in the buffer, then an incomplete-service job (if it exists) in Group 2 must immediately be transferred to the idle server in Group 1. Additionally, the data center needs to pay a transfer cost C_4 for each transferred job.
(2) To the buffer. If some servers in Group 2 are closed to the sleep state, then the jobs in those closed servers are transferred to the buffer, and a transfer cost C_3^(2) is paid by the data center. To ensure that the transferred jobs can enter the buffer, we need to control the new jobs arriving at the buffer: if the sum of the number of jobs in the buffer and the number of jobs in Group 2 is equal to m_3, then the newly arriving jobs must be lost immediately.
Power consumption: The power consumption rates P_{1,W} and P_{2,W} are for the work states of servers in Groups 1 and 2, respectively, while P_{2,S} is only for the sleep state of a server in Group 2. Note that each server in Group 1 does not have a sleep state, so it is clear that P_{1,S} = 0. We assume that 0 < P_{2,S} < P_{2,W}. There is no power consumption for keeping jobs in the buffer. C_1 is the power consumption price per unit of power consumption rate and per unit of time.
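To make the cost bookkeeping concrete, the following minimal Python sketch (function and variable names are ours, not the paper's) computes the instantaneous power consumption cost rate when d_w of the m_2 reserved servers work and the remaining ones sleep, using the price C_1 and the rates P_{1,W}, P_{2,W}, P_{2,S} defined above:

def power_cost_rate(C1, m1, m2, d_w, P1W, P2W, P2S):
    # All m1 base-line servers always work; d_w reserved servers work,
    # and the remaining m2 - d_w reserved servers sleep.
    assert 0 <= d_w <= m2 and 0 < P2S < P2W
    return C1 * (m1 * P1W + d_w * P2W + (m2 - d_w) * P2S)

For example, power_cost_rate(1.0, 2, 2, 0, 3.0, 2.0, 0.5) returns the cost rate when all reserved servers sleep.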
Independence: We assume that all the random variables in the data center defined above are independent.
Finally, to aid reader understanding, the data center, together with its operational mode and mathematical notations, is depicted in Figure 1. Table 1 summarizes some notations involved in the model. This will be helpful in our later study.

Table 1. The cost and price notations and their interpretations.

Cost       Interpretation
C_1        The power consumption price per unit of power consumption rate and per unit of time
C_2^(1)    The holding cost for a job in Group 1 per unit of sojourn time
C_2^(2)    The holding cost for a job in Group 2 per unit of sojourn time
C_2^(3)    The holding cost for a job in the buffer per unit of sojourn time
C_3^(1)    The setup cost for a server switching from the sleep state to the work state
C_3^(2)    The transfer cost for an incomplete-service job returning to the buffer
C_4        The transfer cost for a job transferred from Group 2 to Group 1
C_5        The opportunity cost for each lost job
R          The service price from each served job

In the remainder of this section, it might be useful to compare the above model assumptions with those in Ma et al. [31] and in Xia et al. [35].

Remark 1.
(1) Compared with our previous paper [31], this paper considers several new practical factors, such as a finite buffer, a fast setup process, and a job transfer rule. The new factors make our MDP modeling more practical and useful in the study of energy-efficient data centers. Although the new factors do not increase the difficulty of performance evaluation through modeling by means of queueing systems or Markov processes, they cause substantially more difficulties and challenges in finding optimal dynamic energy-efficient policies and, furthermore, in determining threshold-type policies by using the sensitivity-based optimization. Note that the difficulties mainly grow out of establishing the policy-based block-structured Markov process and solving the block-structured Poisson equation. For this reason, we have simplified the above model descriptions: for example, the setup is immediate, the jobs can be transferred without delay either between the slow and fast servers or between the slow servers and the buffer, the jobs must be transferred as soon as a fast server becomes free, the finite buffer space is reserved for jobs in progress, and so on.
(2) For the energy-efficient data center operating with a buffer, it is seen from Figure A3 in Appendix B that the main challenge of our work is how to describe the policy-based block-structured Markov process. Obviously, (a) if there are more than two groups of servers, then the policy-based Markov process becomes multi-dimensional, so that its analysis is very difficult; (b) if the buffer is infinite, then we have to deal with a policy-based block-structured Markov process with infinitely many levels, for which the discussion and computation are very complicated.

Remark 2.
Compared with Xia et al. [35], this paper introduces a more detailed cost and reward structure, which makes the analysis of the monotonicity and optimality of the dynamic energy-efficient policies more difficult and challenging. Therefore, the many cost and reward factors make the MDP analysis and the sensitivity-based optimization more complicated.

Optimization Model Formulation
In this section, for the energy-efficient data center, we first establish a policy-based continuous-time Markov process with a finite block structure. Then, we define a suitable reward function with respect to both the states and the policies of the Markov process. This will be helpful and useful for setting up an MDP to find the optimal asynchronous dynamic policy in the energy-efficient data center.

A Policy-Based Block-Structured Continuous-Time Markov Process
The data center in Figure 1 consists of Group 1 with m_1 servers, Group 2 with m_2 servers, and a buffer of size m_3. We need to introduce both "states" and "policies" to express the stochastic dynamics of this data center. Let N_1(t), N_2(t), and N_3(t) be the numbers of jobs in Group 1, Group 2, and the buffer at time t, respectively. Then (N_1(t), N_2(t), N_3(t)) is regarded as a state of the data center at time t. All the cases of such a state form the set

Ω = Ω_0 ∪ Ω_1 ∪ Ω_2 ∪ ⋯ ∪ Ω_{m_2+1} ∪ Ω_{m_2+2},

where

Ω_0 = {(n_1, 0, 0) : n_1 = 0, 1, . . . , m_1},
Ω_1 = {(m_1, 0, n_3) : n_3 = 1, 2, . . . , m_3},
Ω_{n_2+1} = {(m_1, n_2, n_3) : n_3 = 0, 1, . . . , m_3 − n_2}, for n_2 = 1, 2, . . . , m_2,
Ω_{m_2+2} = {(m_1, m_2, n_3) : n_3 = m_3 − m_2 + 1, m_3 − m_2 + 2, . . . , m_3}.
For a state (n_1, n_2, n_3), it is seen from the model description that there are four different cases: (a) by the transfer rules, if n_1 = 0, 1, . . . , m_1 − 1, then n_2 = n_3 = 0; conversely, if either n_2 > 0 or n_3 > 0, then n_1 = m_1. (b) If n_1 = m_1 and n_2 = 0, then the number of jobs in the buffer can increase until the waiting room is full, i.e., n_3 = 1, 2, . . . , m_3. (c) If n_1 = m_1 and n_2 = 1, 2, . . . , m_2 − 1, then the total number of jobs in Group 2 and the buffer is no more than the buffer size, i.e., n_2 + n_3 ≤ m_3. (d) If n_1 = m_1 and n_2 = m_2, then the number of jobs in the buffer can also increase until the waiting room is full. Now, for Group 2, we introduce an asynchronous dynamic policy, which is related to two dynamic actions (or sub-policies): from sleep to work (setup) and from work to sleep (close). Let d^W_{n_1,n_2,n_3} and d^S_{n_1,n_2,n_3} be the numbers of working servers and of sleeping servers in Group 2 at state (n_1, n_2, n_3), respectively. By observing the state set Ω, we call d^W and d^S the setup policy (i.e., from sleep to work) and the sleep policy (i.e., from work to sleep), respectively.
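Since the state space Ω is assembled from several level sets, a short Python sketch (parameter values are hypothetical) can enumerate it and check its size against the closed-form count used later in the paper:

def enumerate_states(m1, m2, m3):
    states = [(n1, 0, 0) for n1 in range(m1 + 1)]               # Omega_0
    states += [(m1, 0, n3) for n3 in range(1, m3 + 1)]          # Omega_1
    for n2 in range(1, m2 + 1):                                 # Omega_2 .. Omega_{m2+1}
        states += [(m1, n2, n3) for n3 in range(m3 - n2 + 1)]
    states += [(m1, m2, n3)                                     # Omega_{m2+2}
               for n3 in range(m3 - m2 + 1, m3 + 1)]
    return states

m1, m2, m3 = 2, 3, 4                                            # requires m3 >= m2
omega = enumerate_states(m1, m2, m3)
assert len(omega) == m1 + (3 * m2 - m2 ** 2) // 2 + m2 * m3 + m3 + 1

The assertion reproduces the state-space size m_1 + 3m_2/2 − m_2^2/2 + m_2 m_3 + m_3 + 1 that appears in Section 4.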
Note that the servers in Group 2 can only be set up when all of them are idle (i.e., n_2 = 0), and the setup policy (d^W) cannot act at the same time as the sleep policy (d^S), since the servers in Group 2 are governed by the sleep policy whenever some of them are still working for jobs. This is what we call an asynchronous dynamic policy. Here, we consider the control optimization of the total system. For the two sub-policies, we provide an interpretation of four different cases as follows: (1) In Ω_0, if n_1 = 0, 1, . . . , m_1 − 1, then n_2 = n_3 = 0 due to the transfer rule. Thus, there are no jobs in Group 2 or in the buffer, so that no policy in Group 2 is used. (2) In Ω_1, the states affect how the setup policy is used. If n_1 = m_1, n_2 = 0, and n_3 = 1, 2, . . . , m_3, then d^W_{m_1,0,n_3} is the number of working servers in Group 2 at state (m_1, 0, n_3). Note that some of the slow servers need to start first, so that some jobs in the buffer can enter the activated slow servers; thus, d^W_{m_1,0,n_3} ∈ {0, 1, . . . , m_2}, each value of which can possibly occur under an optimal dynamic policy. (3) From Ω_2 to Ω_{m_2+1}, the states affect how the sleep policy is used. If n_1 = m_1, n_2 = 1, 2, . . . , m_2, and n_3 = 0, 1, . . . , m_3 − n_2, then d^S_{m_1,n_2,n_3} is the number of sleeping servers in Group 2 at state (m_1, n_2, n_3). We assume that the number of sleeping servers is no less than m_2 − n_2. Note that the sleep policy is independent of the setup policy. Once the sleep policy is applied, the servers without jobs must enter the sleep state; at the same time, some working servers with jobs may also be closed to the sleep state, and the jobs in those servers are transferred to the buffer. It is easy to see that d^S_{m_1,n_2,n_3} ∈ {m_2 − n_2, m_2 − n_2 + 1, . . . , m_2}. (4) In Ω_{m_2+2}, if n_1 = m_1 and n_2 = m_2, then n_3 may be any element of the set {m_3 − m_2 + 1, m_3 − m_2 + 2, . . . , m_3}; it is clear that n_2 + n_3 > m_3.
Our aim is to determine when, or under what conditions, an optimal number of servers in Group 2 switch between the sleep state and the work state such that the long-run average profit of the data center is maximal. From the state space Ω, we define an asynchronous dynamic energy-efficient policy d as

d = (d^W, d^S), with d^W = (d^W_{m_1,0,n_3} : n_3 = 1, 2, . . . , m_3) and d^S = (d^S_{m_1,n_2,n_3} : n_2 = 1, 2, . . . , m_2; n_3 = 0, 1, . . . , m_3 − n_2). (1)

Note that d^W is related to the fact that if there is no job in Group 2 at the initial time, then all the servers in Group 2 are at the sleep state. Once there are jobs in the buffer, we quickly set up some servers in Group 2 such that they enter the work state to serve the jobs. The sleep policy d^S can be understood similarly. In the state subset ∪_{i=2}^{m_2+2} Ω_i, the setup policy d^W is not needed because some servers are kept at the work state.
For all the possible policies d given in (1), we compose a policy space D as the set of all such policies. Let X^(d)(t) denote the state of the data center at time t for any given policy d ∈ D. Then {X^(d)(t) : t ≥ 0} is a policy-based block-structured continuous-time Markov process on the state space Ω, whose state transition relations are given in Figure A3 in Appendix B (we provide two simple special cases for understanding this policy-based block-structured continuous-time Markov process in Appendix A). Based on this, the infinitesimal generator of the Markov process {X^(d)(t) : t ≥ 0} is given by the block matrix

Q^(d) = (Q_{i,j})_{0 ≤ i,j ≤ m_2+2}, (2)

where every block element Q_{i,j} depends on the policy d (for simplicity of description, we omit "d" here), and it is expressed in Appendix C.
It is easy to see that the Markov process with infinitesimal generator Q^(d) has finitely many states and is irreducible with Q^(d) e = 0; thus, the Markov process Q^(d) is positive recurrent. In this case, we write the stationary probability vector of the Markov process {X^(d)(t) : t ≥ 0} as

π^(d) = (π^(d)_0, π^(d)_1, . . . , π^(d)_{m_2+2}), (3)

where π^(d)_k is the stationary probability subvector on level k for 0 ≤ k ≤ m_2 + 2. Note that the stationary probability vector π^(d) can be obtained by solving the system of linear equations π^(d) Q^(d) = 0 and π^(d) e = 1, where e is a column vector of ones with a suitable size. To this end, the RG-factorizations play an important role in our later computation; some computational details are given in Chapter 2 of Li [33]. Now, we use the UL-type RG-factorization to compute the stationary probability vector π^(d) as follows. For 0 ≤ i, j ≤ k and 0 ≤ k ≤ m_2 + 2, we write U_k for the censored generator to level k, and R_{i,j} (for i < j) and G_{i,j} (for i > j) for the R- and G-measures, respectively. Then the UL-type RG-factorization is given by

Q^(d) = (I − R_U) U_D (I − G_L),

where R_U is the upper-triangular matrix of the R-measures, U_D = diag(U_0, U_1, . . . , U_{m_2+2}), and G_L is the lower-triangular matrix of the G-measures. By using Theorem 2.9 of Chapter 2 in Li [33], the stationary probability vector of the Markov process Q^(d) is given by π^(d)_0 = τ x_0 and π^(d)_k = Σ_{i=0}^{k−1} π^(d)_i R_{i,k} for 1 ≤ k ≤ m_2 + 2, where x_0 is the stationary vector of the censored chain with generator U_0 and τ is a normalizing constant such that π^(d) e = 1.
Remark 3. The RG-factorizations provide a unified, constructive, and algorithmic framework for the numerical computation of many practical stochastic systems. They can be applied to provide effective solutions for block-structured Markov processes, and they are also shown to be useful for the optimal design and dynamic decision-making of many practical systems. See Li [33] for more details.
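For small instances, the stationary probability vector can be checked directly by linear algebra; the following Python sketch (a dense alternative to the RG-factorization, which remains the method of choice for large block-structured models) solves π Q = 0 with π e = 1:

import numpy as np

def stationary_vector(Q):
    # Stack the balance equations Q^T pi^T = 0 with the normalization
    # condition sum(pi) = 1 and solve in the least-squares sense.
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

Q = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.5, 1.0],
              [0.0, 2.0, -2.0]])    # a toy 3-state irreducible generator
print(stationary_vector(Q))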
The following theorem provides some useful observations on special policies d^W d^S ∈ D for which certain setup decisions have no effect on the infinitesimal generator Q^(d^W d^S) or the stationary probability vector π^(d^W d^S).
Theorem 1. Suppose that two asynchronous energy-efficient policies d^{W_1} d^S, d^{W_2} d^S ∈ D satisfy the following two conditions: (a) for each n_3 = 1, 2, . . . , m_2, if d^{W_1}_{m_1,0,n_3} ∈ {n_3, n_3 + 1, . . . , m_2}, then d^{W_2}_{m_1,0,n_3} may be taken as any element of the set {n_3, n_3 + 1, . . . , m_2}; (b) for each n_3 = m_2 + 1, m_2 + 2, . . . , m_3, we take d^{W_2}_{m_1,0,n_3} = d^{W_1}_{m_1,0,n_3}. Under these conditions, we have

Q^(d^{W_1} d^S) = Q^(d^{W_2} d^S) and π^(d^{W_1} d^S) = π^(d^{W_2} d^S).

Proof of Theorem 1. It is easy to see from (2) that all the levels of the matrix Q^(d^{W_1} d^S) are the same as those of the matrix Q^(d^{W_2} d^S), except possibly level 1; thus, we only need to compare level 1 of the two matrices. For two policies satisfying conditions (a) and (b), at each state (m_1, 0, n_3) with n_3 ≤ m_2 the setup decision activates at least n_3 servers, so that all n_3 waiting jobs enter Group 2 in either case and the corresponding block entries of level 1 coincide. Thus, it follows from (2) that Q^(d^{W_1} d^S) = Q^(d^{W_2} d^S), and therefore π^(d^{W_1} d^S) = π^(d^{W_2} d^S). This completes the proof.
Note that Theorem 1 will be necessary and useful for analyzing the policy monotonicity and optimality in our later study. Furthermore, see the proof of Theorem 4.

Remark 4.
This paper is the first to introduce and study the asynchronous dynamic policy in energy-efficient data centers. We highlight the impact of the two asynchronous sub-policies, the setup and sleep policies, on the long-run average profit of the energy-efficient data center.

The Reward Function
For the Markov process Q^(d), we now define a suitable reward function for the energy-efficient data center.
Based on the cost and price notations in Table 1, a reward function with respect to both states and policies is defined as a profit rate (i.e., the total revenue minus the total cost per unit of time). According to the impact of the asynchronous dynamic policy on the profit rate, the reward function at state (n_1, n_2, n_3) under policy d is divided into four cases as follows.
Case (a): For n_1 = 0, 1, . . . , m_1 and n_2 = n_3 = 0, the profit rate is not affected by any policy, and we have

f(n_1, 0, 0) = R n_1 µ_1 − C_1 (m_1 P_{1,W} + m_2 P_{2,S}) − C_2^(1) n_1. (4)

Note that in Case (a) there is no job in Group 2 or in the buffer; thus, each server in Group 2 is at the sleep state.
However, in the following two cases (b) and (c), since there are some jobs either in Group 2 or in the buffer, the policy d will play a key role in opening (i.e., setup) or closing (i.e., sleep) some servers of Group 2 to save energy efficiently.

Case (d): For n_1 = m_1, n_2 = m_2, and n_3 = m_3 − m_2 + 1, m_3 − m_2 + 2, . . . , m_3, the profit rate is not affected by any policy, and we have

f(m_1, m_2, n_3) = R (m_1 µ_1 + m_2 µ_2) − C_1 (m_1 P_{1,W} + m_2 P_{2,W}) − [C_2^(1) m_1 + C_2^(2) m_2 + C_2^(3) n_3] − C_5 λ 1_{{n_3 = m_3}}. (7)

We define a column vector f^(d) composed of the elements f(n_1, n_2, n_3), f^(d^W)(n_1, n_2, n_3), and f^(d^S)(n_1, n_2, n_3) over the state space Ω, ordered by levels as

f^(d) = (f_0, f_1^(d^W), f_2^(d^S), . . . , f_{m_2+1}^(d^S), f_{m_2+2}). (8)

In the remainder of this section, the long-run average profit of the data center (or of the policy-based continuous-time Markov process {X^(d)(t) : t ≥ 0}) under an asynchronous dynamic policy d is defined as

η^d = π^(d) f^(d), (9)

where π^(d) and f^(d) are given by (3) and (8), respectively. We observe that as the number of working servers in Group 2 decreases, the total revenue and the total cost of the data center decrease synchronously, and vice versa; equivalently, as the number of sleeping servers in Group 2 increases, the total revenue and the total cost decrease synchronously. Thus, there is a tradeoff between the total revenue and the total cost in choosing a suitable number of working and/or sleeping servers in Group 2 by means of the setup and sleep policies, respectively. This motivates us to study an optimal dynamic control mechanism for the energy-efficient data center. Our objective is to find an optimal asynchronous dynamic policy d* such that the long-run average profit η^d is maximized, that is,

d* = arg max_{d ∈ D} η^d. (10)

Since the setup and sleep policies d^W and d^S occur asynchronously, they cannot interact with each other at any time; therefore, the optimal policy can be decomposed into d* = d^{W*} d^{S*}. In fact, it is difficult and challenging to analyze the properties of the optimal asynchronous dynamic policy d* = d^{W*} d^{S*} and to provide an effective algorithm for computing the optimal policy d*. To this end, in the next sections we introduce the sensitivity-based optimization theory to study this energy-efficient optimization problem.
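In computational terms, (9) is just an inner product once π^(d) and f^(d) are available in the same state order; the sketch below (a hypothetical illustration, with the Case (a) rate mirroring (4)) shows both pieces:

import numpy as np

def reward_rate_case_a(n1, R, mu1, C1, C21, m1, m2, P1W, P2S):
    # Case (a): n1 jobs in Group 1, no jobs in Group 2 or the buffer,
    # all Group 2 servers asleep.
    return R * n1 * mu1 - C1 * (m1 * P1W + m2 * P2S) - C21 * n1

def long_run_average_profit(pi, f):
    # Equation (9): eta^d = pi^(d) f^(d).
    return float(np.dot(pi, f))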

The Block-Structured Poisson Equation
In this section, for the energy-efficient data center, we set up a block-structured Poisson equation, which provides a useful relation between the sensitivity-based optimization and the MDP. Additionally, we use the RG-factorization, given in Li [33], to solve the block-structured Poisson equation and provide an expression for its unique solution.
For d ∈ D, it follows from Chapter 2 of Cao [34] that, for the policy-based continuous-time Markov process {X^(d)(t) : t ≥ 0}, we define the performance potential as

g^(d)(n_1, n_2, n_3) = E[ ∫_0^∞ ( f^(d)(X^(d)(t)) − η^d ) dt | X^(d)(0) = (n_1, n_2, n_3) ],

where η^d is defined in (9). It is seen from Cao [34] that, for any policy d ∈ D, g^(d)(n_1, n_2, n_3) quantifies the contribution of the initial state (n_1, n_2, n_3) to the long-run average profit of the data center. Here, g^(d)(n_1, n_2, n_3) is also called the relative value function or the bias in the traditional Markov decision process theory; see, e.g., Puterman [39] for more details.
We further define a column vector g^(d) with elements g^(d)(n_1, n_2, n_3) for (n_1, n_2, n_3) ∈ Ω, ordered by levels as in (8). By a computation similar to that in Ma et al. [31], the block-structured Poisson equation is given by

Q^(d) g^(d) = η^d e − f^(d), (13)

where η^d is defined in (9), f^(d) is given in (8), and Q^(d) is given in (2).
To solve the system of linear equations (13), we note that the rank of Q^(d) is one less than its size; hence, the system (13) has infinitely many solutions, any two of which differ by an additive constant vector. Let Q̃^(d) be the matrix obtained by omitting the first row and the first column of the matrix Q^(d). Then Q̃_{0,1} is obtained by omitting the first row vector of Q_{0,1}, while Q̃_{1,0} and Q̃_{2,0} are obtained by omitting the first column vectors of Q_{1,0} and Q_{2,0}, respectively; the other block entries of Q̃^(d) are the same as the corresponding block entries of the matrix Q^(d).
Note that rank(Q̃^(d)) = m_1 + 3m_2/2 − m_2^2/2 + m_2 m_3 + m_3, so that Q̃^(d) is invertible. Let ψ^(d) and ϕ^(d) be two column vectors of size m_1 + 3m_2/2 − m_2^2/2 + m_2 m_3 + m_3 obtained by omitting the first element of the two column vectors f^(d) − η^d e and g^(d) of size m_1 + 3m_2/2 − m_2^2/2 + m_2 m_3 + m_3 + 1, respectively; that is, ψ^(d) and ϕ^(d) drop the entries f(0, 0, 0) − η^d and g^(d)(0, 0, 0), respectively (15). Therefore, it follows from (13) that

−Q̃^(d) ϕ^(d) = ψ^(d) + g^(d)(0, 0, 0) µ_1 e_1, (16)

where e_1 is a column vector with the first element being one and all the others being zero.
Note that the matrix −Q̃^(d) is invertible and (−Q̃^(d))^{-1} > 0; thus, the system (16) of linear equations always has the unique solution

ϕ^(d) = (−Q̃^(d))^{-1} [ψ^(d) + ξ µ_1 e_1], (17)

where g^(d)(0, 0, 0) = ξ is any given positive constant. For the convenience of computation, we take g^(d)(0, 0, 0) = ξ = 1. In this case, we have

ϕ^(d) = (−Q̃^(d))^{-1} [ψ^(d) + µ_1 e_1]. (18)

Note that the expression of the inverse matrix (−Q̃^(d))^{-1} can be obtained by means of the RG-factorization, which is given in Li [33] for general Markov processes.
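Under our reading of (16)-(18) (in particular, that the only transition into state (0, 0, 0) occurs from (1, 0, 0) at rate µ_1, so the omitted first column reduces to µ_1 e_1), the reduced solve can be sketched in a few lines of Python; all names here are ours:

import numpy as np

def solve_poisson(Q, f, pi, mu1):
    # Q, f, pi are ordered with state (0,0,0) first; g(0,0,0) is fixed to 1.
    eta = float(pi @ f)                  # long-run average profit, Eq. (9)
    Qt = Q[1:, 1:]                       # Q with first row and column removed
    psi = (f - eta)[1:]                  # f - eta*e with first entry dropped
    e1 = np.zeros(Qt.shape[0])
    e1[0] = 1.0
    phi = np.linalg.solve(-Qt, psi + mu1 * e1)   # Eq. (18)
    return eta, np.concatenate(([1.0], phi))     # full potential vector g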
For convenience of computation, we partition (−Q̃^(d))^{-1} into block rows Q_r according to the levels, and every element of the matrix Q_r is written as a scalar q^(r)_{n,l}, where n denotes a system state within the given block and l the column index, for r = 0, 1, . . . , m_2 + 2 and l = 1, 2, . . . , L, with L = m_1 + 3m_2/2 − m_2^2/2 + m_2 m_3 + m_3. It is easy to check that every q^(r)_{n,l} is positive and independent of the service price R. The following theorem provides an expression for the vector ϕ^(d) under the constraint condition g^(d)(0, 0, 0) = ξ = 1. Note that this expression is very useful for applications of the sensitivity-based optimization theory to the study of Markov decision processes later in the paper.
Proof of Theorem 2. It is seen from (18) that we need to compute two parts: (−Q̃^(d))^{-1} ψ^(d) and µ_1 (−Q̃^(d))^{-1} e_1. A simple computation of the vector (−Q̃^(d))^{-1} e_1 then yields the desired result. This completes the proof.

Impact of the Service Price
In this section, we study the perturbation realization factor of the policy-based continuous-time Markov process both for the setup policy and for the sleep policy (together they form the asynchronous energy-efficient policy), and we analyze how the service price impacts the perturbation realization factor. Our analysis includes two cases: the setup policy and the sleep policy. The results given in this section will be useful for establishing the optimal asynchronous dynamic policy of the energy-efficient data center in later sections.
A key point in our analysis is that the setup policy and the sleep policy are asynchronous at any time; thus, we can discuss the perturbation realization factor under the asynchronous dynamic policy in two separate computational steps.

The Setup Policy
For the performance potential vector ϕ^(d) under the constraint condition g^(d)(0, 0, 0) = 1, we define a perturbation realization factor as

G^(d)(n, n') = g^(d)(n_1, n_2, n_3) − g^(d)(n'_1, n'_2, n'_3), (19)

where n = (n_1, n_2, n_3) and n' = (n'_1, n'_2, n'_3). We can see that G^(d)(n, n') quantifies the difference between the two performance potentials g^(d)(n_1, n_2, n_3) and g^(d)(n'_1, n'_2, n'_3); it measures the long-run effect on the average profit of the data center when the system state is changed from n' to n. For our next discussion, through observing the state space, it is necessary to define the perturbation realization factors G^(d^W) for the setup policy over all couples (i_1, i_2) with 0 ≤ i_2 < i_1 ≤ m_2 and n_3 = 0, 1, . . . , m_3; it follows from Theorem 2 that each G^(d^W) can be expressed explicitly through the entries q^(r)_{n,l} of (−Q̃^(d))^{-1}.
To express the perturbation realization factor by means of the service price R, we write the entries of ψ^(d) as h_l for l = 1, 2, . . . , L, partitioned according to the four state ranges: 1 ≤ l ≤ l_0 for n_1 = 1, 2, . . . , m_1 and n_2 = n_3 = 0; l_0 + 1 ≤ l ≤ l_1 for n_1 = m_1, n_2 = 0, and n_3 = 1, 2, . . . , m_3; l_1 + 1 ≤ l ≤ l_{m_2+1} for n_1 = m_1, n_2 = 1, 2, . . . , m_2, and n_3 = 0, 1, . . . , m_3 − n_2; and l_{m_2+1} + 1 ≤ l ≤ L for n_1 = m_1, n_2 = m_2, and n_3 = m_3 − m_2 + 1, m_3 − m_2 + 2, . . . , m_3. Correspondingly, we rewrite π^(d) as (π_0; π_1, π_2, . . . , π_L), and the h_l on each range follow from (15).
If a job finishes its service at a server and leaves the system immediately, then the data center obtains a fixed revenue (i.e., the service price) R from the served job. Now, we study the influence of the service price R on the perturbation realization factor. Note that all the numbers q^(r)_{n,l} are positive and independent of the service price R, while all the numbers h_l are linear functions of R. Hence, for i_1, i_2 = 0, 1, . . . , m_2, G^(d^W) is a linear function of R, written as in (21), where the coefficient β_1 of R is determined by the q^(r)_{n,l}. From the later discussion in Section 6, we will see that G^(d^W) plays a fundamental role in the performance optimization of data centers, and the sign of G^(d^W) directly determines the selection of decision actions, as shown in (38) later. To this end, we analyze how the service price impacts G^(d^W). Substituting (21) into the linear equation G^(d^W) = 0, where ϕ(i_1) = m_1 µ_1 + i_1 µ_2, ϕ(i_2) = m_1 µ_1 + i_2 µ_2, and ψ(i_1, i_2) = (i_1 − i_2) µ_2, the unique solution of the price R in (22) is obtained. It is easy to see from (21) and (22) that G^(d^W) is increasing in the service price R and has a unique root for each couple (i_1, i_2). In the energy-efficient data center, we define two critical values related to the service price,

R^W_H = the maximum of these roots over all couples (i_1, i_2), (24)
R^W_L = the minimum of these roots over all couples (i_1, i_2). (25)

The following proposition uses the two critical values related to the service price to provide a key condition whose purpose is to establish a sensitivity-based optimization framework for the energy-efficient data center in our later study. Additionally, this proposition will be useful in the next section for studying the monotonicity of the asynchronous energy-efficient policies.
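Because G is affine in R while the q^(r)_{n,l} are price-independent, the root defining each critical value can be recovered from two evaluations; the helper below is a generic Python sketch in which G_of_R stands in for the expression in (21):

def critical_price(G_of_R):
    # G(R) = slope * R + intercept with slope != 0, so two evaluations
    # determine the unique root of G(R) = 0.
    g0 = G_of_R(0.0)
    g1 = G_of_R(1.0)
    slope = g1 - g0
    return -g0 / slope

Taking the maximum (respectively, minimum) of these roots over all couples (i_1, i_2) then yields R^W_H (respectively, R^W_L).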

Proposition 1.
(1) If R ≥ R^W_H, then for any d ∈ D and for each couple (i_1, i_2) with 0 ≤ i_2 < i_1 ≤ m_2, we have G^(d^W) ≥ 0. (2) If 0 ≤ R ≤ R^W_L, then for any d ∈ D and for each couple (i_1, i_2) with 0 ≤ i_2 < i_1 ≤ m_2, we have G^(d^W) ≤ 0.
Proof of Proposition 1. (1) For any d ∈ D and for each couple (i_1, i_2) with 0 ≤ i_2 < i_1 ≤ m_2, G^(d^W) is a linearly increasing function of R whose unique root is no greater than R^W_H; thus, R ≥ R^W_H makes G^(d^W) ≥ 0.
(2) Similarly, for any d ∈ D and for each couple (i_1, i_2) with 0 ≤ i_2 < i_1 ≤ m_2, the unique root of G^(d^W) is no less than R^W_L; thus, 0 ≤ R ≤ R^W_L gives G^(d^W) ≤ 0. This completes the proof.

The Sleep Policy
The analysis for the sleep policy is similar to that of the setup policy given in the above subsection. Here, we shall provide only a simple discussion.
We define the perturbation realization factor for the sleep policy, G^(d^S), for couples (j_1, j_2) with 0 ≤ j_2 < j_1 ≤ n_2, n_2 = 0, 1, . . . , m_2, and n_3 = 0, 1, . . . , m_3. It follows from Theorem 2 that G^(d^S) can also be expressed explicitly through the entries q^(r)_{n,l}. Similarly, to express the perturbation realization factor by means of the service price R, we note that G^(d^S) is a linear function of R, written as in (29). Now, we analyze how the service price impacts G^(d^S): substituting (29) into the linear equation G^(d^S) = 0, the unique solution of the price R in (30) is obtained, and it is easy to see from Equation (30) that the root is unique for each admissible couple (j_1, j_2). In the energy-efficient data center, we define two critical values related to the service price, R^S_H and R^S_L, as the maximum and the minimum, respectively, of these roots over all admissible couples (j_1, j_2).
Proposition 2. (1) If R ≥ R^S_H, then for any d ∈ D and for each couple (j_1, j_2) with 0 ≤ j_2 < j_1 ≤ n_2, we have G^(d^S) ≥ 0. (2) If 0 ≤ R ≤ R^S_L, then for any d ∈ D and for each couple (j_1, j_2) with 0 ≤ j_2 < j_1 ≤ n_2, we have G^(d^S) ≤ 0. This proposition is similar to Proposition 1; thus, its proof is omitted here.
From Propositions 1 and 2, we define two new critical values related to the service price:

R_H = max{R^W_H, R^S_H} and R_L = min{R^W_L, R^S_L}.

The following theorem provides a simple summary of Propositions 1 and 2, and it will be useful for studying the monotonicity and optimality of the asynchronous dynamic policy in later sections.

Theorem 3.
(1) If R ≥ R_H, then for any asynchronous dynamic policy d ∈ D, we have G^(d^W) ≥ 0 and G^(d^S) ≥ 0. (2) If 0 ≤ R ≤ R_L, then for any asynchronous policy d ∈ D, we have G^(d^W) ≤ 0 and G^(d^S) ≤ 0.

Monotonicity and Optimality
In this section, we use the block-structured Poisson equation to derive a useful performance difference equation, and we discuss the monotonicity and optimality of the long-run average profit of the energy-efficient data center with respect to the setup and sleep policies, respectively. Based on this, we can give the optimal asynchronous dynamic policy of the energy-efficient data center.
The standard Markov model-based formulation suffers from a number of drawbacks. First and foremost, the state space is usually too large for practical problems; that is, the number of potentials to be calculated or estimated is too large for most problems. Secondly, the generally applicable Markov model does not reflect any special structure of a particular problem, so it is not clear whether and how potentials can be aggregated to save computation by exploiting the special structure of the system. The sensitivity point of view and the flexible construction of the sensitivity formulas provide us with a new perspective to explore alternative approaches for the performance optimization of systems with special features.
For any given asynchronous energy-efficient policy d ∈ D, the policy-based continuous-time Markov process {X^(d)(t) : t ≥ 0} with infinitesimal generator Q^(d) given in (2) is irreducible, aperiodic, and positive recurrent. Therefore, by using an analysis similar to Ma et al. [31], the long-run average profit of the data center is given by

η^d = π^(d) f^(d), (33)

and the Poisson equation is written as

Q^(d) g^(d) = η^d e − f^(d). (34)

For state (n_1, n_2, n_3), it is seen from (2) that the asynchronous energy-efficient policy d directly affects not only the elements of the infinitesimal generator Q^(d) but also the reward function f^(d). That is, if the asynchronous policy d changes, then the infinitesimal generator Q^(d) and the reward function f^(d) change correspondingly. To express such a change mathematically, we take two different asynchronous energy-efficient policies d and d', with corresponding infinitesimal generators Q^(d) and Q^(d') and reward functions f^(d) and f^(d').
The following lemma provides a useful equation for the difference η^{d'} − η^d of the long-run average profits η^{d'} and η^d for any two asynchronous policies d, d' ∈ D. Here, we restate it without proof; readers may refer to Ma et al. [31] for more details.
Lemma 1. For any two asynchronous energy-efficient policies d, d' ∈ D, we have

η^{d'} − η^d = π^(d') [ (Q^(d') − Q^(d)) g^(d) + (f^(d') − f^(d)) ].

Now, we describe the first role played by the performance difference: we set up a partial order relation in the policy set D so that the optimal asynchronous energy-efficient policy in the finite set D can be found by means of finitely many comparisons. Based on the performance difference η^{d'} − η^d for any two asynchronous energy-efficient policies d, d' ∈ D, we set up a partial order in the policy set D as follows: we write d' ⪰ d if η^{d'} ≥ η^d. By using this partial order, our research target is to find an optimal asynchronous policy d* ∈ D such that d* ⪰ d for any asynchronous energy-efficient policy d ∈ D, or d* = arg max_{d ∈ D} η^d.
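Lemma 1 can be verified numerically on toy instances; the self-contained Python sketch below builds two random generators with reward vectors (all hypothetical) and confirms that both sides of the difference equation agree:

import numpy as np

rng = np.random.default_rng(0)

def random_generator(n):
    A = rng.random((n, n))
    np.fill_diagonal(A, 0.0)
    np.fill_diagonal(A, -A.sum(axis=1))      # rows sum to zero
    return A

def stationary(Q):
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones((1, n))])
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potentials(Q, f, pi):
    # Solve Q g = eta*e - f with normalization pi g = 0; Lemma 1 is
    # invariant to the additive constant in g.
    eta = float(pi @ f)
    A = np.vstack([Q, pi[None, :]])
    b = np.concatenate([eta - f, [0.0]])
    g, *_ = np.linalg.lstsq(A, b, rcond=None)
    return eta, g

n = 4
Q, Qp = random_generator(n), random_generator(n)
f, fp = rng.random(n), rng.random(n)
eta, g = potentials(Q, f, stationary(Q))
pip = stationary(Qp)
lhs = float(pip @ fp) - eta
rhs = float(pip @ ((Qp - Q) @ g + (fp - f)))
print(abs(lhs - rhs))                        # ~0 up to round-off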
It is seen that the policy set D contains (m_2 + 1)^{m_3} × 2^{m_3} × 3^{m_3−1} × ⋯ × (m_2 + 1)^{m_3−m_2+1} elements, so that the enumeration method for finding the optimal policy would require a huge workload. However, our following work greatly reduces the search for the optimal asynchronous policy d* by means of the sensitivity-based optimization theory. Now, we discuss the monotonicity of the long-run average profit η^d with respect to any asynchronous policy d under different service prices. Since the setup and sleep policies d^W and d^S occur asynchronously and do not interact with each other at any time, we can study the impacts of the policies d^W and d^S on the long-run average profit η^d separately. To this end, in what follows we discuss three different cases: R ≥ R_H, 0 ≤ R ≤ R_L, and R_L < R < R_H.
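The size of D can be tabulated directly from the decision sets described in Section 3; the following Python sketch computes it for small (m_2, m_3) and makes plain why plain enumeration is hopeless:

def policy_space_size(m2, m3):
    size = (m2 + 1) ** m3                    # setup decisions on Omega_1
    for n2 in range(1, m2 + 1):              # sleep decisions on Omega_{n2+1}
        size *= (n2 + 1) ** (m3 - n2 + 1)    # |{m2-n2, ..., m2}| = n2 + 1
    return size

print(policy_space_size(3, 4))               # 1,769,472 policies already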

The Service Price R ≥ R H
In the case of R ≥ R H , we discuss the monotonicity and optimality with respect to two different policies: the setup policy and the sleep policy, respectively.
The following theorem analyzes the right-half part of the unimodal structure (see Figure 2) of the long-run average profit η^d with respect to the setup policy, i.e., for d^W_{m_1,0,n_3} ∈ {n_3 ∧ m_2, (n_3 + 1) ∧ m_2, . . . , m_2}.
Theorem 4. For any setup policy d^W with d^W d^S ∈ D and for each n_3 = 1, 2, . . . , m_3, the long-run average profit η^{d^W d^S} is linearly decreasing with respect to the decision element d^W_{m_1,0,n_3} for d^W_{m_1,0,n_3} ∈ {(n_3 + 1) ∧ m_2, (n_3 + 2) ∧ m_2, . . . , m_2}.
Proof of Theorem 4. For each n_3 = 1, 2, . . . , m_3, we consider two interrelated policies d^W d^S, d'^W d^S ∈ D that differ only in the decision element at state (m_1, 0, n_3), with both decision elements in {(n_3 + 1) ∧ m_2, . . . , m_2}. By Theorem 1, the two generators coincide, and it is easy to check from (4) to (7) that the reward functions differ only at state (m_1, 0, n_3), where the extra working servers contribute only additional power consumption and setup costs. Thus, it follows from Lemma 1 that the performance difference (35) is proportional to π^(d^W d^S)(m_1, 0, n_3), whose value can be determined by taking d^W_{m_1,0,n_3} = n_3 ∧ m_2; in particular, π^(d^W d^S)(m_1, 0, n_3) is irrelevant to the decision element d^W_{m_1,0,n_3}. Furthermore, since η^{d^W d^S} is otherwise irrelevant to the decision element d^W_{m_1,0,n_3}, and P_{2,W} − P_{2,S}, C_1, and C_3^(1) are all positive constants, it is easy to see from (35) that the long-run average profit η^{d^W d^S} is linearly decreasing with respect to each decision element d^W_{m_1,0,n_3} for d^W_{m_1,0,n_3} ∈ {(n_3 + 1) ∧ m_2, (n_3 + 2) ∧ m_2, . . . , m_2}. It is worth noting that if m_2 ≤ n_3 ≤ m_3, then this set degenerates to {m_2}. This completes the proof.
In what follows, we discuss the left-half part of the unimodal structure (see Figure 2) of the long-run average profit η^d with respect to each decision element d^W_{m_1,0,n_3} ∈ {0, 1, . . . , n_3} if n_3 < m_2. Compared with the analysis of the right-half part, the discussion of the left-half part is somewhat more complicated.
Let the optimal setup policy be d^{W*} = arg max_{d^W} η^{d^W d^S}. Then, it is seen from Theorem 4 that d^{W*}_{m_1,0,n_3} ∈ {0, 1, . . . , n_3 ∧ m_2} for each n_3. Hence, Theorem 4 shrinks the area of finding the optimal setup policy d^{W*} from the large set {0, 1, . . . , m_2}^{m_3} to the greatly reduced area in which d^{W*}_{m_1,0,n_3} ∈ {0, 1, . . . , n_3 ∧ m_2}. To find the optimal setup policy d^{W*}, we consider two setup policies with an interrelated structure, d^W d^S and d'^W d^S, which differ only in the decision element at state (m_1, 0, n_3), where d'^W_{m_1,0,n_3} = i_1 > d^W_{m_1,0,n_3} = i_2 and d^W_{m_1,0,n_3}, d'^W_{m_1,0,n_3} ∈ {1, 2, . . . , n_3 ∧ m_2}. It is easy to check from (2) that the two generators differ only in the blocks of level 1 (36). On the other hand, from the reward functions given in (8), it is seen that for n_3 = 1, 2, . . . , m_2 and d^W_{m_1,0,n_3} ∈ {0, 1, . . . , n_3}, the two reward functions differ only at state (m_1, 0, n_3) (37).
The following theorem discusses the left-half part (see Figure 2) of the unimodal structure of the long-run average profit η^d with respect to each decision element d^W_{m_1,0,n_3}.
Theorem 5. If R ≥ R^W_H, then for any setup policy d^W with d^W d^S ∈ D and for each n_3 = 1, 2, . . . , m_2, the long-run average profit η^{d^W d^S} is strictly monotone increasing with respect to each decision element d^W_{m_1,0,n_3} ∈ {0, 1, . . . , n_3}.
Proof of Theorem 5. Consider the two setup policies above, where d'^W_{m_1,0,n_3} = i_1 > d^W_{m_1,0,n_3} = i_2 and d^W_{m_1,0,n_3}, d'^W_{m_1,0,n_3} ∈ {0, 1, . . . , n_3}. Applying Lemma 1, it follows from (36) and (37) that the sign of the difference η^{d'^W d^S} − η^{d^W d^S} coincides with the sign of G^(d^W). If R ≥ R^W_H, then it is seen from Proposition 1 that G^(d^W) ≥ 0. Thus, for the two policies d^W d^S, d'^W d^S ∈ D with d'^W_{m_1,0,n_3} > d^W_{m_1,0,n_3} and d^W_{m_1,0,n_3}, d'^W_{m_1,0,n_3} ∈ {0, 1, . . . , n_3}, we get η^{d'^W d^S} ≥ η^{d^W d^S}. This shows that η^{d^W d^S} is increasing in d^W_{m_1,0,n_3}. This completes the proof.
When R ≥ R^W_H, Figure 2 provides an intuitive summary of the main results given in Theorems 4 and 5. In the right-half part of Figure 2, η^{d^W d^S} is a linear function of the decision element d^W_{m_1,0,n_3}. By contrast, in the left-half part of Figure 2, we first need the restrictive condition R ≥ R^W_H; since G^(d^W) also depends on the decision element d^W_{m_1,0,n_3}, η^{d^W d^S} is a nonlinear function of the decision element d^W_{m_1,0,n_3}.

The Sleep Policy with R ≥ R S H
Unlike the setup policy, for the sleep policy each decision element is d^S_{m_1,n_2,n_3} ∈ {m_2 − n_2, m_2 − n_2 + 1, . . . , m_2}. Hence, we consider the structural properties of the long-run average profit η^{d^W d^S} with respect to each decision element d^S_{m_1,n_2,n_3}. We write the optimal sleep policy as d^{S*} = arg max_{d^S} η^{d^W d^S}; it is easy to see that the area of finding the optimal sleep policy d^{S*} is the set in which d^{S*}_{m_1,n_2,n_3} ∈ {m_2 − n_2, m_2 − n_2 + 1, . . . , m_2}. To find the optimal sleep policy d^{S*}, we consider two sleep policies with an interrelated structure:

d^S = (d^S_{m_1,1,0}, d^S_{m_1,1,1}, . . . , d^S_{m_1,n_2,n_3−1}, d^S_{m_1,n_2,n_3}, d^S_{m_1,n_2,n_3+1}, . . . , d^S_{m_1,m_2,m_3−m_2}),
d'^S = (d^S_{m_1,1,0}, d^S_{m_1,1,1}, . . . , d^S_{m_1,n_2,n_3−1}, d'^S_{m_1,n_2,n_3}, d^S_{m_1,n_2,n_3+1}, . . . , d^S_{m_1,m_2,m_3−m_2}),

where d'^S_{m_1,n_2,n_3} = m_2 − j_2 > d^S_{m_1,n_2,n_3} = m_2 − j_1 with 0 ≤ j_2 < j_1 ≤ n_2. It is easy to check from (2) that the two generators differ only in the blocks associated with state (m_1, n_2, n_3). On the other hand, from the reward functions given in (6), where d^S_{m_1,n_2,n_3}, d'^S_{m_1,n_2,n_3} lie in either {m_2 − n_2, m_2 − n_2 + 1, . . . , m_2} for 1 ≤ n_2 ≤ m_2 and 0 ≤ n_3 ≤ m_3 − n_2, or {0, 1, . . . , m_2} for n_2 = m_2 and 0 ≤ n_3 ≤ m_3 − m_2, the two reward functions differ only at state (m_1, n_2, n_3). The following theorem discusses the structure of the long-run average profit η^{d^W d^S} with respect to each decision element d^S_{m_1,n_2,n_3}.
Theorem 6. If R ≥ R^S_H, then for any sleep policy d^S with d^W d^S ∈ D and for each n_2 = 1, 2, . . . , m_2 and n_3 = 0, 1, . . . , m_3 − n_2, the long-run average profit η^{d^W d^S} is strictly monotone decreasing with respect to each decision element d^S_{m_1,n_2,n_3} ∈ {m_2 − n_2, m_2 − n_2 + 1, . . . , m_2}.

The Service Price 0 ≤ R ≤ R L
By an analysis similar to that of the case R ≥ R_H, we briefly discuss the monotonicity and optimality for the two policies, the setup policy and the sleep policy, when the service price satisfies 0 ≤ R ≤ R_L.
Theorem 8. If 0 ≤ R ≤ R^W_L, then for any setup policy d^W with d^W d^S ∈ D and for each n_3 = 1, 2, . . . , m_2, the long-run average profit η^{d^W d^S} is strictly monotone decreasing with respect to each decision element d^W_{m_1,0,n_3} ∈ {0, 1, . . . , n_3}.
When 0 ≤ R ≤ R^W_L, we also use Figure 4 to provide an intuitive summary of the main results given in Theorems 4 and 8.
Theorem 9. If 0 ≤ R ≤ R^S_L, then for any sleep policy d^S with d^W d^S ∈ D and for each n_2 = 1, 2, . . . , m_2 and n_3 = 0, 1, . . . , m_3 − n_2, the long-run average profit η^{d^W d^S} is strictly monotone increasing with respect to each decision element d^S_{m_1,n_2,n_3} ∈ {m_2 − n_2, m_2 − n_2 + 1, . . . , m_2}. The proof is similar to that of Theorem 6 and is omitted here.
When 0 ≤ R ≤ R^S_L, we also use Figure 5 to provide an intuitive summary of the main results given in Theorem 9. As a simple summary of Theorems 8 and 9, the following theorem further describes the monotone structure of the long-run average profit η^d with respect to the asynchronous energy-efficient policy; its proof follows directly from the fact that 0 ≤ R ≤ R_L implies both 0 ≤ R ≤ R^W_L and 0 ≤ R ≤ R^S_L.
Theorem 10. If 0 ≤ R ≤ R L , then for any asynchronous energy-efficient policy d ∈ D, the long-run average profit η d is strictly monotone with respect to each decision element of d W and of d S , respectively.
In the remainder of this section, we discuss a more complicated case, with service price R_L < R < R_H. In this case, we use the bang–bang control and the asynchronous structure of d ∈ D to prove that the optimal asynchronous energy-efficient policies d^{W*} and d^{S*} both have bang–bang control forms.
The Service Price R_L < R < R_H
For the price R_L < R < R_H, we can further derive the following theorems about the monotonicity of η^d with respect to the setup policy and the sleep policy, respectively.
The Setup Policy with R^W_L < R < R^W_H
For the service price R^W_L < R < R^W_H, the following theorem provides the monotonicity of η^d with respect to the decision element d^W_{m_1,0,n_3}.
Theorem 11. If R^W_L < R < R^W_H, then for any setup policy d^W with d^W d^S ∈ D and for each n_3 = 1, 2, . . . , m_2, the long-run average profit η^{d^W d^S} is monotone (either increasing or decreasing) with respect to each decision element d^W_{m_1,0,n_3} ∈ {0, 1, . . . , n_3}.
Proof of Theorem 11. Similarly to the first part of the proof of Theorem 5, we consider any two setup policies with an interrelated structure that differ only in the decision element d^W_{m_1,0,n_3}. On the other hand, we can similarly obtain the corresponding performance difference equation (43). By summing (42) and (43), we obtain the sign conservation equation (44), which means that the signs of G^(d^W) and G^(d'^W) are always identical when a particular decision element d^W_{m_1,0,n_3} is changed to any d'^W_{m_1,0,n_3}. With the sign conservation Equation (44) and the performance difference Equation (43), we directly derive that the long-run average profit η^{d^W d^S} is monotone with respect to d^W_{m_1,0,n_3}. This completes the proof.
Based on Theorem 11, the following corollary directly shows that the optimal decision element d^{W*}_{m_1,0,n_3} has a bang–bang control form (see more details in Cao [34] and Xia et al. [35]).

Corollary 1.
For the setup policy, the optimal decision element d^{W*}_{m_1,0,n_3} is either 0 or n_3; i.e., the bang–bang control is optimal.
With Corollary 1, we should either keep all the servers in Group 2 asleep or turn on servers so that the number of working servers equals the number of waiting jobs in the buffer. In addition, the search space of d^W_{m_1,0,n_3} is reduced from {0, 1, . . . , n_3} to the two-element set {0, n_3}, which is a significant reduction of search complexity.

The Sleep Policy with R^S_L < R < R^S_H
For the service price R^S_L < R < R^S_H, the following theorem provides the monotonicity of η^d with respect to the decision element d^S_{m_1,n_2,n_3}.
Theorem 12. If R^S_L < R < R^S_H, then for any sleep policy d^S with d^W d^S ∈ D and for each n_2 = 1, 2, . . . , m_2 and n_3 = 0, 1, . . . , m_3 − n_2, the long-run average profit η^{d^W d^S} is monotone (either increasing or decreasing) with respect to each decision element d^S_{m_1,n_2,n_3} ∈ {m_2 − n_2, m_2 − n_2 + 1, . . . , m_2}.
Proof of Theorem 12. Similarly to the proof of Theorem 11, we consider any two sleep policies with an interrelated structure that differ only in the decision element d^S_{m_1,n_2,n_3}. On the other hand, we can also obtain the corresponding performance difference equation, and the resulting sign conservation equation shows that the signs of G^(d^S) and G^(d'^S) are always identical when a particular decision element d^S_{m_1,n_2,n_3} is changed to any d'^S_{m_1,n_2,n_3}. We then directly derive that the long-run average profit η^{d^W d^S} is monotone with respect to d^S_{m_1,n_2,n_3}. This completes the proof.

Corollary 2.
For the sleep policy, the optimal decision element d^{S*}_{m_1,n_2,n_3} is either m_2 − n_2 or m_2; i.e., the bang–bang control is optimal.
With Corollary 2, we should either put all the servers in Group 2 to sleep or close only the servers without jobs, so that the number of sleeping servers equals the number of idle servers in Group 2. The search space of d^S_{m_1,n_2,n_3} is thus reduced from {m_2 − n_2, m_2 − n_2 + 1, . . . , m_2} to the two-element set {m_2 − n_2, m_2}, which is also a significant reduction of search complexity.
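The reduction promised by Corollaries 1 and 2 is easy to quantify: every decision set collapses to two elements, so the candidate count drops from policy_space_size(m_2, m_3) in the earlier sketch to 2 raised to the number of decision components:

def bang_bang_space_size(m2, m3):
    n_components = m3                                             # setup components
    n_components += sum(m3 - n2 + 1 for n2 in range(1, m2 + 1))   # sleep components
    return 2 ** n_components

print(bang_bang_space_size(3, 4))    # 2**13 = 8192, versus 1,769,472 before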
It is seen from Corollaries 1 and 2 that the form of the bang–bang control is very simple and easy to adopt in practice, while the optimality of the bang–bang control guarantees the performance of such simple control forms. This underlies the threshold type of the optimal asynchronous energy-efficient policy in the data center.

The Maximal Long-Run Average Profit
In this section, we provide the optimal asynchronous dynamic policy d* of threshold type in the energy-efficient data center and further compute the maximal long-run average profit.
We introduce some notation as follows: c_0 = (P_{2,W} − P_{2,S}) C_1 and c_1 = (m_1 P_{1,W} + m_2 P_{2,W}) C_1. Now, we express the optimal asynchronous energy-efficient policy d* of threshold type and compute the maximal long-run average profit η^{d*} under three different service prices.
Case 1. The service price R ≥ R_H. In this case, the monotonicity results for R ≥ R_H show that the optimal setup decision turns on as many servers as there are waiting jobs, i.e., d^{W*}_{m_1,0,n_3} = n_3 ∧ m_2, while the optimal sleep decision puts only the idle servers to sleep, i.e., d^{S*}_{m_1,n_2,n_3} = m_2 − n_2.
Case 2. The service price 0 ≤ R ≤ R_L. It follows from Theorem 10 that d^{W*}_{m_1,0,n_3} = 0 and d^{S*}_{m_1,n_2,n_3} = m_2.

Remark 5.
The above results are intuitive: when the service price is suitably high, the number of working servers equals a crucial number related to the waiting jobs both in Group 2 and in the buffer; when the service price is low, each server at the work state pays a high energy consumption cost but receives only a low revenue. In the latter case, the profit of the data center cannot increase, so that all the servers in Group 2 should be closed to the sleep state.
Case 3. The service price R_L < R < R_H. In Section 6.3, we proved the optimality of the bang–bang control for the setup and sleep policies, regardless of the service price R. However, if R_L < R < R_H, we cannot exactly determine the monotone form (i.e., increasing or decreasing) of the optimal asynchronous energy-efficient policy. This leads to the threshold type of the optimal asynchronous energy-efficient policy in the data center. In fact, such threshold-type policies also provide a practical way to compute the optimal setup and sleep policies: they not only have a very simple form but are also widely adopted in numerical applications.
In what follows, we focus our study on the threshold-type asynchronous policy, although its optimality is not proven in the following analysis.
We define a couple of threshold-type control parameters (θ_1, θ_2), where θ_1 and θ_2 are the setup and sleep thresholds, respectively. Furthermore, we introduce two interesting subsets of the policy set D: the setup policies determined by the threshold θ_1 and the sleep policies determined by the threshold θ_2. We write D̃ for the resulting set of threshold-type policies; it is easy to see that D̃ ⊂ D.
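To illustrate, a threshold-type policy in D̃ can be written down directly; this Python sketch reflects only our reading of the construction (the exact parameterization in the paper is not recoverable here), with θ_1 triggering setups once enough jobs wait and θ_2 triggering sleeps when Group 2 is lightly loaded:

def setup_decision(n3, m2, theta1):
    # Bang-bang setup: activate min(n3, m2) servers once the buffer
    # backlog reaches theta1, otherwise keep all reserved servers asleep.
    return min(n3, m2) if n3 >= theta1 else 0

def sleep_decision(n2, n3, m2, theta2):
    # Bang-bang sleep: put all m2 servers to sleep when the Group 2 load
    # falls to theta2 or below, otherwise let exactly the idle ones sleep.
    return m2 if n2 + n3 <= theta2 else m2 - n2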

Remark 7.
In this paper, we have discussed the optimal asynchronous dynamic policy of the energy-efficient data center in depth, and such types of policies are widespread in practice. It would be interesting to extend our results to more general situations. Although the sensitivity-based optimization theory can effectively overcome the drawbacks of MDPs, it still has some limitations; for example, it cannot handle the optimization of two or more dynamic control policies synchronously, which is a very important research direction in dynamic optimization.

Conclusions
In this paper, we highlight the optimal asynchronous dynamic policy of an energy-efficient data center by applying the sensitivity-based optimization theory and the RG-factorization. Such an asynchronous policy is important and necessary in the study of energy-efficient data centers, and it largely makes the optimal analysis of energy-efficient management more interesting, difficult, and challenging. To this end, we consider a practical model with several basic factors, for example, a finite buffer, a fast setup process from sleep to work, and the necessary cost of transferring jobs from Group 2 either to Group 1 or to the buffer. To find the optimal asynchronous dynamic policy in the energy-efficient data center, we set up a policy-based block-structured Poisson equation and provide an expression for its solution by means of the RG-factorization. Based on this, we derive the monotonicity and optimality of the long-run average profit with respect to the asynchronous dynamic policy under different service prices. We prove the optimality of the bang–bang control, which significantly reduces the action search space, and we study the optimal threshold-type asynchronous dynamic policy. Therefore, the results of this paper provide new insights into the discussion of optimal dynamic control policies of more general energy-efficient data centers.
Along this line, there are a number of interesting directions for potential future research, for example:
• Analyzing non-Poisson inputs, such as Markovian arrival processes (MAPs), and/or non-exponential service times, e.g., PH distributions;
• Developing effective algorithms for finding the optimal dynamic policies of the policy-based block-structured Markov process (i.e., block-structured MDPs);
• Discussing how the long-run performance is influenced by concave or convex reward (or cost) functions;
• Studying individual optimization for the energy-efficient management of data centers from the perspective of game theory.

Acknowledgments:
The authors are grateful to the editor and the anonymous referees for their constructive comments and suggestions, which have greatly helped the authors improve the presentation of this manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.
Case A1. m_1 = 2, m_2 = 2, and m_3 = 3. Its infinitesimal generator is given blockwise as follows: for level 0, the blocks contain only the arrival and service rates; for level 1, the setup policy affects the infinitesimal generator; and for level 4, we have

Q_{4,3} = [ 0   2µ_1 + 2µ_2 ; 0   0 ]  and  Q_{4,4} = [ −(λ + 2µ_1 + 2µ_2)   λ ; 2µ_1 + 2µ_2   −(2µ_1 + 2µ_2) ].

Case A2. m_1 = 2, m_2 = 3, and m_3 = 4. To further understand the policy-based block-structured continuous-time Markov process {X^(d)(t) : t ≥ 0}, we take another example to illustrate that if the parameters m_2 and m_3 increase slightly, the complexity of the state transition relations increases considerably. When the number of servers in Group 2 and the buffer capacity both increase by one, the number of state transitions affected by the setup and sleep policies increases from 5 to 9 and from 7 to 15, respectively. Compared with Figure A2, the state transition relations become more complicated. We divide the state transition rates of the Markov process into two parts, as shown in Figure A2: (a) the state transitions without any policy, and (b) the state transitions caused by the setup and sleep policies. Similarly to Case A1, for the state transitions in which the transfer rates and the policies are simultaneously involved (i.e., the Markov chain has synchronous jumps of multiple events), the transition rate equals the total entering rate at the corresponding state.
Furthermore, its infinitesimal generator is given by a block-structured matrix built in the same way as in Case A1.
Remark A1. The first key step in applying the sensitivity-based optimization to the study of energy-efficient data centers is to draw the state transition relation figure (e.g., see Figure A3) and to write the infinitesimal generator of the policy-based block-structured Markov process. Although this paper has largely simplified the model assumptions, Figure A3 is still slightly complicated, with its three separate parts (a), (b), and (c). Obviously, if we consider more general assumptions (for example, (i) the faster servers are not cheaper, (ii) Group 2 is not slower, or (iii) there is no transfer rule), then the state transition relation figure becomes more complicated, so that it is more difficult to write the infinitesimal generator of the policy-based block-structured Markov process and to solve the block-structured Poisson equation.
Remark A2. Figure A3 shows that Part (a) expresses the arrival and service rates, while Parts (b) and (c) express the state transition rates caused by the setup and sleep policies, respectively. Note that the setup policy is triggered only by the arrival and service processes at state (m_1, 0, k) for 1 ≤ k ≤ m_3 (see Part (b)), in which there is no setup rate because the setup time is so short that it is ignored. Part (c), for the sleep policy, can be understood similarly. It is worth noting that an idle server may be at the work state, as seen for the idle working servers in Group 1.

Appendix C. Block Elements in Q (d)
In this appendix, we write each block element in the matrix Q (d) .