Article

Optimal Asynchronous Dynamic Policies in Energy-Efficient Data Centers

1 Business School, Xuzhou University of Technology, Xuzhou 221018, China
2 School of Economics and Management, Beijing University of Technology, Beijing 100124, China
3 School of Business, Sun Yat-sen University, Guangzhou 510275, China
* Author to whom correspondence should be addressed.
Systems 2022, 10(2), 27; https://doi.org/10.3390/systems10020027
Submission received: 24 January 2022 / Revised: 26 February 2022 / Accepted: 28 February 2022 / Published: 2 March 2022
(This article belongs to the Special Issue Data Driven Decision-Making for Complex Production Systems)

Abstract

In this paper, we apply a Markov decision process to find the optimal asynchronous dynamic policy of an energy-efficient data center with two server groups. Servers in Group 1 always work, while servers in Group 2 may either work or sleep, and a fast setup process occurs when the server’s states are changed from sleep to work. The servers in Group 1 are faster and cheaper than those of Group 2 so that Group 1 has a higher service priority. Putting each server in Group 2 to sleep can reduce system costs and energy consumption, but it must bear setup costs and transfer costs. For such a data center, an asynchronous dynamic policy is designed as two sub-policies: The setup policy and the sleep policy, both of which determine the switch rule between the work and sleep states for each server in Group 2. To find the optimal asynchronous dynamic policy, we apply the sensitivity-based optimization to establish a block-structured policy-based Markov process and use a block-structured policy-based Poisson equation to compute the unique solution of the performance potential by means of the RG-factorization. Based on this, we can characterize the monotonicity and optimality of the long-run average profit of the data center with respect to the asynchronous dynamic policy under different service prices. Furthermore, we prove that a bang–bang control is always optimal for this optimization problem. We hope that the methodology and results developed in this paper can shed light on the study of more general energy-efficient data centers.

1. Introduction

Over the last two decades, considerable attention has been given to studying energy-efficient data centers. On the one hand, as the number and size of data centers increase rapidly, energy consumption has become a main part of the operating costs of data centers. On the other hand, data centers have become a fundamental part of the IT infrastructure in today's Internet services, in which a huge number of servers are deployed in each data center so that the data centers can provide cloud computing environments. Therefore, finding optimal energy-efficient policies and designing optimal energy-efficient mechanisms are always interesting, difficult, and challenging tasks in the energy-efficient management of data centers. Readers may refer to recent excellent survey papers, such as Masanet et al. [1], Zhang et al. [2], Nadjahi et al. [3], Koot and Wijnhoven [4], Shirmarz and Ghaffari [5], Li et al. [6], and Harchol-Balter [7].
Barroso and Hölzle [8] demonstrated that many data centers were designed to handle peak loads effectively, which directly causes a significant number of servers (approximately 20%) in the data centers to be idle because no work is done in the off-peak period. Although the idle servers do not provide any services, they still consume a notable amount of energy, approximately 70% of that consumed by a server working in the on-peak period. Therefore, it is necessary and useful to design an energy-efficient mechanism that specifically deals with idle servers. To this end, a key technique was introduced: the energy-efficient server states "Sleep" and "Off", so that an idle server can be put into a sleep or off state that consumes less energy; the sleep state consumes very little energy, while the energy consumption of the off state is zero. See Kuehn and Mashaly [9] for more interpretation. To analyze these energy-efficient states, some queueing systems either with server energy-efficient states (e.g., work, idle, sleep, and off) or with server control policies (e.g., vacation, setup, and N-policy) have been developed in the study of energy-efficient data centers. Important examples include the survey papers by Gandhi [10] and Li et al. [6], and the research papers by Gandhi et al. [11,12,13], Maccio and Down [14,15], Phung-Duc [16] and Phung-Duc and Kawanishi [17].
It is always necessary to design an optimal energy-efficient mechanism for data centers. To do this, several static optimization methods have been developed using two basic approaches. The first is to construct a suitable utility function for a performance-energy trade-off or a performance cost with respect to the synchronous optimization of different performance measures, for example, reducing energy consumption, reducing system response time, and improving quality of service. The second is to minimize the performance cost by means of optimization methods, including linear programming, nonlinear programming, and integer programming. Gandhi et al. [12] provided two classes of performance-energy trade-offs: (a) ERWS, the weighted sum β_1 E[R] + β_2 E[E] of the mean response time E[R] and the mean power cost E[E], where β_1, β_2 ≥ 0 are weighting coefficients; (b) ERP, the product E[R] · E[E] of the mean response time and the mean power cost. See Gandhi et al. [12] and Gandhi [10] for systematic research. Maccio and Down [14] generalized the ERP to a more general performance cost function that considers the expected cycle rate. Gebrehiwot et al. [18] made some interesting generalizations of the ERP and ERWS by introducing multiple intermediate sleep states. Additionally, Gebrehiwot et al. [19,20] generalized the FCFS queue of the data center with multiple intermediate sleep states to the processor-sharing discipline and to the shortest remaining processing time (SRPT) discipline, respectively. In addition, Mitrani [21,22] provided another interesting method to discuss a data center of N identical servers that contain m reserves, where the idle or working status of the servers is controlled by an up threshold U and a down threshold D. He established a new performance cost C = c_1 E[L] + c_2 E[S] and provided expressions for the average numbers E[L] and E[S], so that the performance cost C can be optimized with respect to the three key parameters m, U, and D.
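As a quick illustration of these two trade-off metrics, the following minimal sketch computes ERWS and ERP for hypothetical weights and measured values (none of these numbers come from the papers cited above):

```python
# Minimal sketch of the ERWS and ERP performance-energy trade-offs of Gandhi et al. [12].
# The weights beta1, beta2 and the mean values E[R], E[E] below are hypothetical.
beta1, beta2 = 0.6, 0.4            # weighting coefficients, beta1, beta2 >= 0
mean_response_time = 2.5           # E[R], mean response time (e.g., seconds)
mean_power_cost = 180.0            # E[E], mean power cost (e.g., watts)

erws = beta1 * mean_response_time + beta2 * mean_power_cost   # weighted-sum metric
erp = mean_response_time * mean_power_cost                    # product metric
print(f"ERWS = {erws:.2f}, ERP = {erp:.2f}")
```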
To date, little work has been done on applications of Markov decision processes (MDPs) to find the optimal dynamic control policies of energy-efficient data centers. Readers may refer to recent publications for details, among which Kamitsos et al. [23] constructed a discrete-time MDP and proved that the optimal sleep energy-efficient policy is simply hysteretic, so that it has a double threshold structure. Note that such an optimal hysteretic policy follows Hipp and Holzbaur [24] and Lu and Serfozo [25]. For policy optimization and dynamic power management for electronic systems or equipment, the MDPs and stochastic network optimization were developed from five different perspectives: (a) the discrete-time MDPs by Yang et al. [26]; (b) the continuous-time MDPs by Ding et al. [27]; (c) stochastic network optimization by Liang et al. [28]; (d) the sensitivity-based optimization by Xia and Chen [29], Xia et al. [30], and Ma et al. [31]; and (e) the deep reinforcement learning by Chi et al. [32].
The purpose of this paper is to apply continuous-time MDPs to set up optimal dynamic energy-efficient policies for data centers, in which the sensitivity-based optimization is used to find the optimal solution. Note that the sensitivity-based optimization refines the MDP approach by handling the Poisson equation through some novel tools, for instance, the performance potential and the performance difference. The sensitivity-based optimization is also closely related to Markov reward processes (see Li [33]); thus, it is an effective method for the performance optimization of many practical systems (see the excellent book by Cao [34] for more details). The sensitivity-based optimization theory has been applied to performance optimization in many practical areas, for example, energy-efficient data centers by Xia et al. [35], inventory rationing by Li et al. [36], blockchain selfish mining by Ma and Li [37], and finance by Xia [38].
The main contributions of this paper are threefold. The first contribution is to apply the sensitivity-based optimization (and the MDPs) to study a more general energy-efficient data center with key practical factors, for example, a finite buffer, a fast setup process, and the transfer of incomplete-service jobs to the idle servers in Group 1 or to the finite buffer, if any. Although these practical factors do not add any difficulty to performance evaluation (e.g., modeling by means of queueing systems or Markov processes; see Gandhi [10] for more details), they cause substantial difficulties and challenges in finding optimal dynamic energy-efficient policies and, furthermore, in determining threshold-type policies by using the sensitivity-based optimization. For instance, the finite buffer turns the policy-based Markov process into a two-dimensional block-structured Markov process, in contrast to the one-dimensional birth–death process given in Ma et al. [31] and Xia et al. [35].
Note that this paper has two related works: Ma et al. [31] and Xia et al. [35], and it might be necessary to set up some useful relations between this paper and each of the two papers. Compared with Ma et al. [31], this paper considers more practical factors in the energy-efficient data centers such that the policy-based Markov process is block-structured, which makes solving the block-structured Poisson equation more complicated. Compared with Xia et al. [35], this paper introduces a more detailed cost and reward structure, which makes an analysis of the monotonicity and optimality of dynamic energy-efficient policies more difficult and challenging. Therefore, this paper is a necessary and valuable generalization of Ma et al. [31] and Xia et al. [35] through extensively establishing the block-structured policy-based Markov processes, which in fact are the core part of the sensitivity-based optimization theory and its applications in various practical systems.
The second contribution of this paper is that it is the first to find an optimal asynchronous dynamic policy in the study of energy-efficient data centers. Note that the servers in Group 2 of the data center have "setup actions from the sleep state to the work state" and "close actions from the work state to the sleep state"; thus, we follow these two action steps to form an asynchronous dynamic policy, which is decomposed into two sub-policies: the setup policy (or the setup action) and the sleep policy (or the close action). Crucially, one of the successes of this paper is finding the optimal asynchronous dynamic policy among the many asynchronous dynamic policies by means of the sensitivity-based optimization. To date, this has remained very difficult and challenging within the MDP framework.
The third contribution of this paper is to provide a unified framework for applying the sensitivity-based optimization to study the optimal asynchronous dynamic policy of the energy-efficient data center. For such a complicated energy-efficient data center, we first establish a policy-based block-structured Markov process as well as a more detailed cost and reward structure, and provide an expression for the unique solution to the block-structured Poisson equation by means of the RG-factorization. Then, we show the monotonicity of the long-run average profit with respect to the setup policy, the sleep policy, and the asynchronous policy, respectively. Based on this, we find the optimal asynchronous policy when the service price is higher (or lower) than a key threshold. Finally, we show that the optimal control is a bang–bang control, so that the optimal asynchronous dynamic policy is of threshold type in the energy-efficient data center. Such a structure of the optimal asynchronous energy-efficient policy greatly reduces the search space, which significantly lowers the optimization complexity and effectively alleviates the curse of dimensionality of MDPs; hence, the optimal threshold-type policy is of great significance for solving the mechanism design problem of energy-efficient data centers. Therefore, the methodology and results developed in this paper provide new insights for understanding dynamic energy-efficient policy optimization and mechanism design in the study of more general data centers.
The organization of this paper is as follows. In Section 2, we give a problem description for an energy-efficient data center with more practical factors. In Section 3, we establish a policy-based continuous-time block-structured Markov process and define a suitable reward function with respect to both the states and the policies of the Markov process. In Section 4, we set up a block-structured Poisson equation and provide an expression for its unique solution by means of the RG-factorization. In Section 5, we study a perturbation realization factor of the policy-based continuous-time block-structured Markov process under the asynchronous dynamic policy, and analyze how the service price impacts the perturbation realization factor. In Section 6, we discuss the monotonicity and optimality of the long-run average profit of the energy-efficient data center with respect to the asynchronous policy. Based on this, we give the optimal asynchronous dynamic policy of the energy-efficient data center. In Section 7, when the optimal asynchronous dynamic policy is of threshold type, we compute the maximal long-run average profit of the energy-efficient data center. In Section 8, we give some concluding remarks. Finally, three appendices provide two simple special cases, the state-transition relation figure of the policy-based block-structured continuous-time Markov process, and the block entries of its infinitesimal generator.

2. Model Description

In this section, we provide a problem description for setting up and optimizing an asynchronous dynamic policy in an energy-efficient data center with two groups of different servers, a finite buffer, and a fast setup process. Additionally, we provide the system structure, operational mode, and mathematical notations in the energy-efficient data center.
Server groups: The data center contains two server groups: Groups 1 and 2, each of which is also one interactive subsystem of the data center. Groups 1 and 2 have m 1 and m 2 servers, respectively. Servers in the same group are homogeneous, while those in different groups are heterogeneous. Note that Group 1 is viewed as a base-line group whose servers are always at the work state even if some of them are idle, the purpose of which is to guarantee a necessary service capacity in the data center. Hence, each server in Group 1 always works regardless of whether it has a job or not, so that it must consume an amount of energy at any time. In contrast, Group 2 is regarded as a reserved group whose servers may either work or sleep so that each of the m 2 servers can switch its state between work and sleep. If one server in Group 2 is at the sleep state, then it consumes a smaller amount of energy than the work state, as maintaining only the sleep state requires very little energy.
A finite buffer: The data center has a finite buffer of size m_3. Jobs must first enter the buffer, and then they are assigned to the groups (Group 1 has priority over Group 2) and subsequently to the servers. To guarantee that the floating service capacity of Group 2 can be fully utilized when some jobs are taken from the buffer to Group 2, we assume that m_3 ≥ m_2, i.e., the buffer capacity is no less than the number of servers in Group 2; otherwise, when many jobs are waiting in the buffer, jobs transferred from Group 2 back to the buffer could be lost.
Arrival processes: The arrivals of jobs at the data center are a Poisson process with arrival rate λ . If the buffer is full, then any arriving job has to be lost immediately. This leads to an opportunity cost C 5 per unit of time for each lost job due to the full buffer.
Service processes: The service times provided by each server in Groups 1 and 2 are i.i.d. and exponential with service rates μ_1 and μ_2, respectively. We assume that μ_1 ≥ μ_2, which supports the prior use of the servers in Group 1. The service discipline of each server in the data center is First Come First Served (FCFS). If a job finishes its service at a server, then it immediately leaves the system. At the same time, the data center obtains a fixed service reward (or service price) R from the served job.
Once a job enters the data center for service, it pays holding costs per unit of time C_2^(1), C_2^(2), and C_2^(3) in Group 1, Group 2, and the buffer, respectively. We assume that C_2^(1) ≤ C_2^(2), which also guarantees the prior use of the servers in Group 1. Therefore, to support the service priority, each server in Group 1 is not only faster but also cheaper than each server in Group 2.
Switching between work and sleep: To save energy, the servers in Group 2 can switch between the work and sleep states. On the one hand, if more jobs are waiting in the buffer, then Group 2 sets up and turns on some sleeping servers. This process incurs a setup cost C_3^(1); however, the setup time is very short because the server starts directly from the sleep state, so it can be ignored. On the other hand, if the number of jobs in Group 2 becomes small, then some working servers are switched to the sleep state, while their incomplete-service jobs are transferred to the buffer and treated as newly arriving ones.
Transfer rules: (1) To Group 1. Based on the prior use of the servers in Group 1, if a server in Group 1 becomes idle and there is no job in the buffer, then an incomplete-service job (if one exists) in Group 2 is immediately transferred to the idle server in Group 1. In this case, the data center pays a transfer cost C_4 for the transferred job.
(2) To the buffer. If some servers in Group 2 are closed to the sleep state, then the jobs in those closed servers are transferred to the buffer, and a transfer cost C_3^(2) is paid by the data center.
To ensure that such transferred jobs can always enter the buffer, we need to restrict the newly arriving jobs: if the sum of the number of jobs in the buffer and the number of jobs in Group 2 equals m_3, then newly arriving jobs are lost immediately.
Power consumption: The power consumption rates P_{1,W} and P_{2,W} apply to the work states of the servers in Groups 1 and 2, respectively, while P_{2,S} applies to the sleep state of a server in Group 2. Each server in Group 1 has no sleep state, so clearly P_{1,S} = 0. We assume that 0 < P_{2,S} < P_{2,W}. There is no power consumption for keeping jobs in the buffer. C_1 is the power consumption price per unit of power consumption rate and per unit of time.
Independence: We assume that all the random variables in the data center defined above are independent.
Finally, to aid reader understanding, the data center, together with its operational mode and mathematical notations, is depicted in Figure 1. Table 1 summarizes some notations involved in the model. This will be helpful in our later study.
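For readers who want to experiment numerically with the model above, the following minimal sketch collects the primitives summarized in Table 1 into one container; all field names and numerical values are our own illustrative (hypothetical) choices, not values taken from the paper.

```python
from dataclasses import dataclass

# Illustrative container for the model primitives of Section 2 / Table 1.
# Field names and the sample values below are hypothetical.
@dataclass
class DataCenterModel:
    m1: int        # servers in Group 1 (always working)
    m2: int        # servers in Group 2 (work or sleep)
    m3: int        # buffer size, with m3 >= m2
    lam: float     # Poisson arrival rate lambda
    mu1: float     # service rate of a Group-1 server (mu1 >= mu2)
    mu2: float     # service rate of a Group-2 server
    P1W: float     # power rate of a working Group-1 server
    P2W: float     # power rate of a working Group-2 server
    P2S: float     # power rate of a sleeping Group-2 server (0 < P2S < P2W)
    C1: float      # power price per unit rate and per unit time
    C2: tuple      # holding costs (C2_1, C2_2, C2_3) in Group 1, Group 2, buffer
    C31: float     # setup cost per activated Group-2 server
    C32: float     # transfer cost of a job moved from Group 2 to the buffer
    C4: float      # transfer cost of a job moved from Group 2 to Group 1
    C5: float      # opportunity cost of a job lost at a full buffer
    R: float       # service price obtained per served job

model = DataCenterModel(m1=3, m2=2, m3=4, lam=5.0, mu1=2.0, mu2=1.5,
                        P1W=200.0, P2W=150.0, P2S=10.0, C1=0.1,
                        C2=(1.0, 1.5, 0.5), C31=2.0, C32=1.0, C4=0.8,
                        C5=3.0, R=5.0)
```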
In the remainder of this section, it might be useful to provide some comparison of the above model assumptions with those in Ma et al. [31] and in Xia et al. [35].
Remark 1.
(1) Compared with our previous paper [31], this paper considers several new practical factors, such as a finite buffer, a fast setup process, and a job transfer rule. The new factors make our MDP modeling more practical and useful in the study of energy-efficient data centers. Although the new factors do not increase any difficulty in performance evaluation through modeling by means of queueing systems or Markov processes, they can cause substantially more difficulties and challenges in finding optimal dynamic energy-efficient policies and, furthermore, in determining threshold-type policies by using the sensitivity-based optimization. Note that the difficulties mainly grow out of establishing the policy-based block-structured Markov process and solving the block-structured Poisson equation. On this occasion, we have simplified the above model descriptions: for example, the setup is immediate, the jobs can be transferred without delay either between the slow and fast servers or between the slow servers and the buffer, the jobs must be transferred as soon as the fast server becomes free, the finite buffer space is reserved for jobs in progress, and so on.
(2) For the energy-efficient data center operating with some buffer, it is seen from Figure A3 in Appendix B that the main challenge of our work is to focus on how to describe the policy-based block-structured Markov process. Obviously, (a) if there are more than two groups of servers, then it is easy to check that the policy-based Markov process will become multi-dimensional so that its analysis is very difficult; (b) if the buffer is infinite, then we have to deal with the policy-based block-structured Markov process with infinitely many levels, for which the discussion and computation are very complicated.
Remark 2.
Compared with Xia et al. [35], this paper introduces a more detailed cost and reward structure, which makes analysis for the monotonicity and optimality of the dynamic energy-efficient policies more difficult and challenging. Therefore, many cost and reward factors make the MDP analysis and the sensitivity-based optimization more complicated.

3. Optimization Model Formulation

In this section, for the energy-efficient data center, we first establish a policy-based continuous-time Markov process with a finite block structure. Then, we define a suitable reward function with respect to both states and policies of the Markov process. Note that this will be helpful and useful for setting up a MDP to find the optimal asynchronous dynamic policy in the energy-efficient data center.

3.1. A Policy-Based Block-Structured Continuous-Time Markov Process

The data center in Figure 1 consists of Group 1 with m_1 servers, Group 2 with m_2 servers, and a buffer of size m_3. We need to introduce both "states" and "policies" to express the stochastic dynamics of this data center. Let N_1(t), N_2(t), and N_3(t) be the numbers of jobs in Group 1, Group 2, and the buffer, respectively. Therefore, (N_1(t), N_2(t), N_3(t)) is regarded as a state of the data center at time t. All possible values of the state (N_1(t), N_2(t), N_3(t)) form the set
\Omega = \Omega_0 \cup \Omega_1 \cup \Omega_2 \cup \cdots \cup \Omega_{m_2} \cup \Omega_{m_2+1} \cup \Omega_{m_2+2},
where
\Omega_0 = \{(0,0,0), (1,0,0), \ldots, (m_1,0,0)\},
\Omega_1 = \{(m_1,0,1), (m_1,0,2), \ldots, (m_1,0,m_3)\},
\Omega_2 = \{(m_1,1,0), (m_1,1,1), \ldots, (m_1,1,m_3-1)\},
\vdots
\Omega_{m_2} = \{(m_1,m_2-1,0), (m_1,m_2-1,1), \ldots, (m_1,m_2-1,m_3-m_2+1)\},
\Omega_{m_2+1} = \{(m_1,m_2,0), (m_1,m_2,1), \ldots, (m_1,m_2,m_3-m_2)\},
\Omega_{m_2+2} = \{(m_1,m_2,m_3-m_2+1), (m_1,m_2,m_3-m_2+2), \ldots, (m_1,m_2,m_3)\}.
For a state (n_1, n_2, n_3), it is seen from the model description that there are four different cases: (a) By the transfer rules, if n_1 = 0, 1, …, m_1 − 1, then n_2 = n_3 = 0; if either n_2 ≠ 0 or n_3 ≠ 0, then n_1 = m_1. (b) If n_1 = m_1 and n_2 = 0, then the jobs in the buffer can increase until the waiting room is full, i.e., n_3 = 1, 2, …, m_3. (c) If n_1 = m_1 and n_2 = 1, 2, …, m_2 − 1, then the total number of jobs in Group 2 and the buffer is no more than the buffer size, i.e., n_2 + n_3 ≤ m_3. (d) If n_1 = m_1 and n_2 = m_2, then the jobs in the buffer can also increase until the waiting room is full, i.e., n_3 = 1, 2, …, m_3.
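A minimal sketch of this construction is given below (with small hypothetical values of m_1, m_2, m_3); it enumerates Ω block by block and can be used to check the state-space size formula m_1 + 3m_2/2 − m_2²/2 + m_2 m_3 + m_3 + 1 used in Section 4.

```python
# Minimal sketch: enumerate the state space Omega = Omega_0 U ... U Omega_{m2+2}
# for given m1, m2, m3 (the values below are hypothetical).
def build_state_space(m1, m2, m3):
    states = [(n1, 0, 0) for n1 in range(m1 + 1)]                    # Omega_0
    states += [(m1, 0, n3) for n3 in range(1, m3 + 1)]               # Omega_1
    for n2 in range(1, m2):                                          # Omega_2 ... Omega_{m2}
        states += [(m1, n2, n3) for n3 in range(m3 - n2 + 1)]
    states += [(m1, m2, n3) for n3 in range(m3 - m2 + 1)]            # Omega_{m2+1}
    states += [(m1, m2, n3) for n3 in range(m3 - m2 + 1, m3 + 1)]    # Omega_{m2+2}
    return states

m1, m2, m3 = 3, 2, 4
omega = build_state_space(m1, m2, m3)
# size check against m1 + 3*m2/2 - m2**2/2 + m2*m3 + m3 + 1
print(len(omega), m1 + 3 * m2 / 2 - m2 ** 2 / 2 + m2 * m3 + m3 + 1)
```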
Now, for Group 2, we introduce an asynchronous dynamic policy, which is related to two dynamic actions (or sub-policies): from sleep to work (setup) and from work to sleep (close). Let d^W_{(n_1,n_2,n_3)} and d^S_{(n_1,n_2,n_3)} be the numbers of working servers and of sleeping servers in Group 2 at State (n_1, n_2, n_3), respectively. By observing the state set Ω, we call d^W and d^S the setup policy (i.e., from sleep to work) and the sleep policy (i.e., from work to sleep), respectively.
Note that the servers in Group 2 can be set up only when all of them are idle (i.e., there is no job in Group 2), and the setup policy (d^W) cannot be applied at the same time as the sleep policy (d^S), because the servers in Group 2 are always governed by the sleep policy whenever they are still working on some jobs. This is why we call such policies asynchronous dynamic policies. Here, we consider the control optimization of the total system. For these sub-policies, we provide an interpretation of four different cases as follows:
(1) In Ω_0, if n_1 = 0, 1, …, m_1 − 1, then n_2 = n_3 = 0 due to the transfer rule. Thus, there are no jobs in Group 2 or in the buffer, so no policy in Group 2 is used;
(2) In Ω_1, the states affect how the setup policy is used. If n_1 = m_1, n_2 = 0, and n_3 = 1, 2, …, m_3, then d^W_{(m_1,0,n_3)} is the number of working servers in Group 2 at State (m_1, 0, n_3). Note that some of the slow servers need to be started first so that some jobs in the buffer can enter the activated slow servers; thus, d^W_{(m_1,0,n_3)} ∈ {0, 1, …, m_2}, each value of which can possibly be taken under an optimal dynamic policy;
(3) From Ω_2 to Ω_{m_2+1}, the states affect how the sleep policy is used. If n_1 = m_1, n_2 = 1, 2, …, m_2, and n_3 = 0, 1, …, m_3 − n_2, then d^S_{(m_1,n_2,n_3)} is the number of sleeping servers in Group 2 at State (m_1, n_2, n_3). We assume that the number of sleeping servers is no less than m_2 − n_2. Note that the sleep policy is independent of the work policy. Once the sleep policy is applied, the servers without jobs must enter the sleep state; at the same time, some working servers with jobs may also be closed to the sleep state, and the jobs in those servers are transferred to the buffer. It is easy to see that d^S_{(m_1,n_2,n_3)} ∈ {m_2 − n_2, m_2 − n_2 + 1, …, m_2};
(4) In Ω_{m_2+2}, if n_1 = m_1 and n_2 = m_2, then n_3 may be any element of the set {m_3 − m_2 + 1, m_3 − m_2 + 2, …, m_3}, and it is clear that n_2 + n_3 > m_3.
Our aim is to determine when or under what conditions an optimal number of servers in Group 2 switch between the sleep state and the work state such that the long-run average profit of the data center is maximal. From the state space Ω , we define an asynchronous dynamic energy-efficient policy d as
\mathbf{d} = \mathbf{d}^W \boxtimes \mathbf{d}^S,
where \mathbf{d}^W and \mathbf{d}^S are the setup and sleep policies, respectively; "⊠" denotes that the policies \mathbf{d}^W and \mathbf{d}^S occur asynchronously; and
\mathbf{d}^W = \left( d^W_{(m_1,0,1)}, d^W_{(m_1,0,2)}, \ldots, d^W_{(m_1,0,m_3)} \right),
\mathbf{d}^S = \left( d^S_{(m_1,1,0)}, d^S_{(m_1,1,1)}, \ldots, d^S_{(m_1,1,m_3-1)};\; d^S_{(m_1,2,0)}, d^S_{(m_1,2,1)}, \ldots, d^S_{(m_1,2,m_3-2)};\; \ldots;\; d^S_{(m_1,m_2,0)}, d^S_{(m_1,m_2,1)}, \ldots, d^S_{(m_1,m_2,m_3-m_2)} \right).
Note that \mathbf{d}^W reflects the fact that if there is no job in Group 2 at the initial time, then all the servers in Group 2 are in the sleep state. Once there are jobs in the buffer, we quickly set up some servers in Group 2 so that they enter the work state to serve the jobs. The sleep policy \mathbf{d}^S can be understood similarly. In the state subset \bigcup_{i=2}^{m_2+2} \Omega_i, the setup policy \mathbf{d}^W is not needed because some servers are already kept in the work state.
For all the possible policies d given in (1), we compose a policy space as follows:
\mathcal{D} := \left\{ \mathbf{d} = \mathbf{d}^W \boxtimes \mathbf{d}^S :\; d^W_{(m_1,0,n_3)} \in \{0, 1, \ldots, m_2\} \text{ for } 1 \le n_3 \le m_3;\; d^S_{(m_1,n_2,n_3)} \in \{m_2-n_2, m_2-n_2+1, \ldots, m_2\} \text{ for } (m_1,n_2,n_3) \in \Omega_2 \cup \Omega_3 \cup \cdots \cup \Omega_{m_2+1} \right\}.
Let X^{(\mathbf{d})}(t) = (N_1(t), N_2(t), N_3(t))^{(\mathbf{d})} for any given policy \mathbf{d} \in \mathcal{D}. Then \{X^{(\mathbf{d})}(t) : t \ge 0\} is a policy-based block-structured continuous-time Markov process on the state space Ω whose state-transition relations are given in Figure A3 in Appendix B (two simple special cases that help to understand this policy-based block-structured continuous-time Markov process are provided in Appendix A). Based on this, the infinitesimal generator of the Markov process \{X^{(\mathbf{d})}(t) : t \ge 0\} is given by
Q^{(\mathbf{d})} = \begin{pmatrix}
Q_{0,0} & Q_{0,1} & & & & & \\
Q_{1,0} & Q_{1,1} & Q_{1,2} & Q_{1,3} & Q_{1,4} & \cdots & Q_{1,m_2+1} & \\
Q_{2,0} & Q_{2,1} & Q_{2,2} & & & & & \\
 & Q_{3,1} & Q_{3,2} & Q_{3,3} & & & & \\
 & Q_{4,1} & Q_{4,2} & Q_{4,3} & Q_{4,4} & & & \\
 & \vdots & \vdots & \vdots & \vdots & \ddots & & \\
 & Q_{m_2+1,1} & Q_{m_2+1,2} & Q_{m_2+1,3} & Q_{m_2+1,4} & \cdots & Q_{m_2+1,m_2+1} & Q_{m_2+1,m_2+2} \\
 & & & & & & Q_{m_2+2,m_2+1} & Q_{m_2+2,m_2+2}
\end{pmatrix},
where every block entry Q_{i,j} depends on the policy \mathbf{d} (for notational simplicity we omit the superscript "\mathbf{d}"), and its expression is given in Appendix C.
The infinitesimal generator Q^{(\mathbf{d})} has finitely many states and is irreducible with Q^{(\mathbf{d})} e = 0; thus, the Markov process X^{(\mathbf{d})}(t) is positive recurrent. In this case, we write the stationary probability vector of the Markov process \{X^{(\mathbf{d})}(t) : t \ge 0\} as
\pi^{(\mathbf{d})} = \left( \pi_0^{(\mathbf{d})}, \pi_1^{(\mathbf{d})}, \ldots, \pi_{m_2+2}^{(\mathbf{d})} \right), \quad \mathbf{d} \in \mathcal{D},
where
\pi_0^{(\mathbf{d})} = \left( \pi^{(\mathbf{d})}(0,0,0), \pi^{(\mathbf{d})}(1,0,0), \ldots, \pi^{(\mathbf{d})}(m_1,0,0) \right),
\pi_1^{(\mathbf{d})} = \left( \pi^{(\mathbf{d})}(m_1,0,1), \pi^{(\mathbf{d})}(m_1,0,2), \ldots, \pi^{(\mathbf{d})}(m_1,0,m_3) \right),
\pi_2^{(\mathbf{d})} = \left( \pi^{(\mathbf{d})}(m_1,1,0), \pi^{(\mathbf{d})}(m_1,1,1), \ldots, \pi^{(\mathbf{d})}(m_1,1,m_3-1) \right),
\vdots
\pi_{m_2}^{(\mathbf{d})} = \left( \pi^{(\mathbf{d})}(m_1,m_2-1,0), \pi^{(\mathbf{d})}(m_1,m_2-1,1), \ldots, \pi^{(\mathbf{d})}(m_1,m_2-1,m_3-m_2+1) \right),
\pi_{m_2+1}^{(\mathbf{d})} = \left( \pi^{(\mathbf{d})}(m_1,m_2,0), \pi^{(\mathbf{d})}(m_1,m_2,1), \ldots, \pi^{(\mathbf{d})}(m_1,m_2,m_3-m_2) \right),
\pi_{m_2+2}^{(\mathbf{d})} = \left( \pi^{(\mathbf{d})}(m_1,m_2,m_3-m_2+1), \pi^{(\mathbf{d})}(m_1,m_2,m_3-m_2+2), \ldots, \pi^{(\mathbf{d})}(m_1,m_2,m_3) \right).
Note that the stationary probability vector π d can be obtained by means of solving the system of linear equations π d Q d = 0 and π d e = 1 , where e is a column vector of the ones with a suitable size. To this end, the RG-factorizations play an important role in our later computation. Note that some computational details are given in Chapter 2 in Li [33].
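As a small, generic illustration (it is not the RG-factorization itself), the stationary probability vector of any finite irreducible generator can be obtained directly from π Q = 0 and π e = 1; the generator below is a hypothetical 3-state toy example, not the block generator Q^{(d)} of this paper.

```python
import numpy as np

# Minimal sketch: stationary probability vector of a finite irreducible CTMC
# generator Q, computed from pi Q = 0 together with pi e = 1.
def stationary_vector(Q: np.ndarray) -> np.ndarray:
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])   # balance equations plus normalization
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

Q_toy = np.array([[-3.0, 2.0, 1.0],
                  [1.0, -2.0, 1.0],
                  [2.0, 2.0, -4.0]])
print(stationary_vector(Q_toy))   # nonnegative entries summing to one
```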
Now, we use the UL-type RG-factorization to compute the stationary probability vector \pi^{(\mathbf{d})} as follows. For 0 \le i, j \le k and 0 \le k \le m_2+2, we write
Q_{i,j}^{[k]} = Q_{i,j} + \sum_{n=k+1}^{m_2+2} Q_{i,n}^{[n]} \left( -Q_{n,n}^{[n]} \right)^{-1} Q_{n,j}^{[n]}.
Clearly, Q_{i,j}^{[m_2+2]} = Q_{i,j}. Let
U_n(\mathbf{d}) = Q_{n,n}^{[n]}, \quad 0 \le n \le m_2+2, \qquad R_{i,j}(\mathbf{d}) = Q_{i,j}^{[j]} \left[ -U_j(\mathbf{d}) \right]^{-1}, \quad 0 \le i < j \le m_2+2,
and
G_{i,j}(\mathbf{d}) = \left[ -U_i(\mathbf{d}) \right]^{-1} Q_{i,j}^{[i]}, \quad 0 \le j < i \le m_2+2.
Then the UL-type RG-factorization is given by
Q^{(\mathbf{d})} = \left[ I - R_U(\mathbf{d}) \right] U_D(\mathbf{d}) \left[ I - G_L(\mathbf{d}) \right],
where
R_U(\mathbf{d}) = \begin{pmatrix}
0 & R_{0,1}(\mathbf{d}) & R_{0,2}(\mathbf{d}) & \cdots & R_{0,m_2+1}(\mathbf{d}) & R_{0,m_2+2}(\mathbf{d}) \\
 & 0 & R_{1,2}(\mathbf{d}) & \cdots & R_{1,m_2+1}(\mathbf{d}) & R_{1,m_2+2}(\mathbf{d}) \\
 & & 0 & \ddots & \vdots & \vdots \\
 & & & \ddots & R_{m_2,m_2+1}(\mathbf{d}) & R_{m_2,m_2+2}(\mathbf{d}) \\
 & & & & 0 & R_{m_2+1,m_2+2}(\mathbf{d}) \\
 & & & & & 0
\end{pmatrix},
U_D(\mathbf{d}) = \operatorname{diag}\left( U_0(\mathbf{d}), U_1(\mathbf{d}), \ldots, U_{m_2+1}(\mathbf{d}), U_{m_2+2}(\mathbf{d}) \right),
and
G_L(\mathbf{d}) = \begin{pmatrix}
0 & & & & & \\
G_{1,0}(\mathbf{d}) & 0 & & & & \\
G_{2,0}(\mathbf{d}) & G_{2,1}(\mathbf{d}) & 0 & & & \\
G_{3,0}(\mathbf{d}) & G_{3,1}(\mathbf{d}) & G_{3,2}(\mathbf{d}) & 0 & & \\
\vdots & \vdots & \vdots & \ddots & \ddots & \\
G_{m_2+2,0}(\mathbf{d}) & G_{m_2+2,1}(\mathbf{d}) & G_{m_2+2,2}(\mathbf{d}) & \cdots & G_{m_2+2,m_2+1}(\mathbf{d}) & 0
\end{pmatrix}.
By using Theorem 2.9 of Chapter 2 in Li [33], the stationary probability vector of the Markov process X^{(\mathbf{d})}(t) is given by
\pi_0^{(\mathbf{d})} = \tau^{(\mathbf{d})} x_0^{(\mathbf{d})}, \qquad \pi_k^{(\mathbf{d})} = \sum_{i=0}^{k-1} \pi_i^{(\mathbf{d})} R_{i,k}(\mathbf{d}), \quad 1 \le k \le m_2+2,
where x_0^{(\mathbf{d})} is the stationary probability vector of the censored Markov chain U_0(\mathbf{d}) to level 0, and the positive scalar \tau^{(\mathbf{d})} is uniquely determined by \sum_{k=0}^{m_2+2} \pi_k^{(\mathbf{d})} e = 1.
Remark 3.
The RG-factorizations provide a unified, constructive, and algorithmic framework for the numerical computation of many practical stochastic systems. They yield effective solutions for block-structured Markov processes and are also useful for the optimal design and dynamic decision-making of many practical systems. See Li [33] for more details.
The following theorem provides some useful observations on certain special policies d^W ⊠ d^S ∈ \mathcal{D} that have no effect on the infinitesimal generator Q^{(d^W ⊠ d^S)} or the stationary probability vector π^{(d^W ⊠ d^S)}.
Theorem 1.
Suppose that two asynchronous energy-efficient policies d^{W_1} ⊠ d^S, d^{W_2} ⊠ d^S ∈ \mathcal{D} satisfy one of the following two conditions: (a) for each n_3 = 1, 2, …, m_2, if d^{W_1}_{(m_1,0,n_3)} ∈ {n_3, n_3+1, …, m_2}, then we take d^{W_2}_{(m_1,0,n_3)} as any element of the set {1, 2, …, m_2}; (b) for each n_3 = m_2+1, m_2+2, …, m_3, if d^{W_1}_{(m_1,0,n_3)} ∈ {1, 2, …, m_2}, then we take d^{W_2}_{(m_1,0,n_3)} = d^{W_1}_{(m_1,0,n_3)}. Under these conditions, we have
Q^{(d^{W_1} ⊠ d^S)} = Q^{(d^{W_2} ⊠ d^S)}, \qquad π^{(d^{W_1} ⊠ d^S)} = π^{(d^{W_2} ⊠ d^S)}.
Proof of Theorem 1.
It is easy to see from (2) that all the levels of the matrix Q^{(d^{W_1} ⊠ d^S)} are the same as those of the matrix Q^{(d^{W_2} ⊠ d^S)}, except level 1. Thus, we only need to compare level 1 of the matrix Q^{(d^{W_1} ⊠ d^S)} with that of the matrix Q^{(d^{W_2} ⊠ d^S)}.
For the two asynchronous energy-efficient policies d^{W_1} ⊠ d^S, d^{W_2} ⊠ d^S ∈ \mathcal{D} satisfying conditions (a) and (b), by using the indicator 1{d^W_{(m_1,0,n_3)} ≥ n_3} in (2), it is clear that for n_3 = 1, 2, …, m_2, if d^{W_1}_{(m_1,0,n_3)}, d^{W_2}_{(m_1,0,n_3)} ∈ {n_3, n_3+1, …, m_2}, then
1{d^{W_1}_{(m_1,0,n_3)} ≥ n_3} = 1{d^{W_2}_{(m_1,0,n_3)} ≥ n_3}.
Thus, it follows from (2) that Q_{1,k}^{(d^{W_1} ⊠ d^S)} = Q_{1,k}^{(d^{W_2} ⊠ d^S)} for k = 1, 2, …, m_2+1. This also gives Q^{(d^{W_1} ⊠ d^S)} = Q^{(d^{W_2} ⊠ d^S)}, and thus π^{(d^{W_1} ⊠ d^S)} = π^{(d^{W_2} ⊠ d^S)}. This completes the proof. □
Note that Theorem 1 will be necessary and useful for analyzing the policy monotonicity and optimality in our later study. Furthermore, see the proof of Theorem 4.
Remark 4.
This paper is the first to introduce and consider the asynchronous dynamic policy in the study of energy-efficient data centers. We highlight the impact of the two asynchronous sub-policies: the setup and sleep policies on the long-run average profit of the energy-efficient data center.

3.2. The Reward Function

For the Markov process Q d , now we define a suitable reward function for the energy-efficient data center.
Based on the above costs and price notations in Table 1, a reward function with respect to both states and policies is defined as a profit rate (i.e., the total revenues minus the total costs per unit of time). Therefore, according to the impact of the asynchronous dynamic policy on the profit rate, the reward function at State n 1 , n 2 , n 3 under policy d is divided into four cases as follows:
Case (a): For n 1 = 0 , 1 , , m 1 and n 2 = n 3 = 0 , the profit rate is not affected by any policy, and we have
f(n_1, 0, 0) = R\, n_1 \mu_1 - \left( m_1 P_{1,W} + m_2 P_{2,S} \right) C_1 - n_1 C_2^{(1)}.
Note that in Case (a), there is no job in Group 2 or in the buffer. Thus, it is clear that each server in Group 2 is at the sleep state.
However, in the following two cases (b) and (c), since there are some jobs either in Group 2 or in the buffer, the policy d will play a key role in opening (i.e., setup) or closing (i.e., sleep) some servers of Group 2 to save energy efficiently.
Case (b): For n_1 = m_1, n_2 = 0, and n_3 = 1, 2, …, m_3, the profit rate is affected by the setup policy d^W, and we have
f_{d^W}(m_1, 0, n_3) = R\, m_1 \mu_1 - \left[ m_1 P_{1,W} + d^W_{(m_1,0,n_3)} P_{2,W} + \left( m_2 - d^W_{(m_1,0,n_3)} \right) P_{2,S} \right] C_1 - \left[ m_1 C_2^{(1)} + n_3 C_2^{(3)} \right] - d^W_{(m_1,0,n_3)} C_3^{(1)} - \lambda\, 1\{n_1 = m_1, n_2 = 0, n_3 = m_3\}\, C_5,
where 1{·} is the indicator function, whose value is 1 when the event in {·} occurs and 0 otherwise.
Case (c): For n_1 = m_1, n_2 = 1, 2, …, m_2, and n_3 = 0, 1, …, m_3 − n_2, the profit rate is affected by the sleep policy d^S, and we have
f_{d^S}(m_1, n_2, n_3) = R \left[ m_1 \mu_1 + \left( m_2 - d^S_{(m_1,n_2,n_3)} \right) \mu_2 \right] - \left[ m_1 P_{1,W} + d^S_{(m_1,n_2,n_3)} P_{2,S} + \left( m_2 - d^S_{(m_1,n_2,n_3)} \right) P_{2,W} \right] C_1 - \left[ m_1 C_2^{(1)} + n_2 C_2^{(2)} + n_3 C_2^{(3)} \right] - \left[ n_2 - \left( m_2 - d^S_{(m_1,n_2,n_3)} \right) \right] C_3^{(2)} - m_1 \mu_1\, 1\{n_2 > 0, n_3 = 0\}\, C_4 - \lambda\, 1\{n_1 = m_1, n_2 + n_3 = m_3\}\, C_5.
Note that the job transfer rate from Group 2 to Group 1 is given by n_1 \mu_1 1\{n_2 > 0, n_3 = 0\}. If 0 ≤ n_1 ≤ m_1 − 1, then n_2 = 0 and n_1 \mu_1 1\{n_2 > 0, n_3 = 0\} C_4 = 0. If n_1 = m_1 and n_2 = 0, then n_1 \mu_1 1\{n_2 > 0, n_3 = 0\} C_4 = 0. If n_1 = m_1, 1 ≤ n_2 ≤ m_2, and n_3 = 0, then n_1 \mu_1 1\{n_2 > 0, n_3 = 0\} C_4 = m_1 \mu_1 C_4.
Case (d): For n_1 = m_1, n_2 = m_2, and n_3 = m_3 − m_2 + 1, m_3 − m_2 + 2, …, m_3, the profit rate is not affected by any policy, and we have
f(m_1, m_2, n_3) = R \left( m_1 \mu_1 + m_2 \mu_2 \right) - \left( m_1 P_{1,W} + m_2 P_{2,W} \right) C_1 - \left( m_1 C_2^{(1)} + m_2 C_2^{(2)} + n_3 C_2^{(3)} \right) - \lambda\, 1\{n_1 = m_1, n_2 = m_2, n_3 = m_3\}\, C_5.
We define a column vector composed of the elements f(n_1, n_2, n_3), f_{d^W}(n_1, n_2, n_3), and f_{d^S}(n_1, n_2, n_3) as
f(\mathbf{d}) = \left( f_0^T, f_1^T(\mathbf{d}^W), f_2^T(\mathbf{d}^S), \ldots, f_{m_2+1}^T(\mathbf{d}^S), f_{m_2+2}^T \right)^T,
where
f_0 = \left( f(0,0,0), f(1,0,0), \ldots, f(m_1,0,0) \right)^T,
f_1(\mathbf{d}^W) = \left( f_{d^W}(m_1,0,1), f_{d^W}(m_1,0,2), \ldots, f_{d^W}(m_1,0,m_3) \right)^T,
f_2(\mathbf{d}^S) = \left( f_{d^S}(m_1,1,0), f_{d^S}(m_1,1,1), \ldots, f_{d^S}(m_1,1,m_3-1) \right)^T,
\vdots
f_{m_2+1}(\mathbf{d}^S) = \left( f_{d^S}(m_1,m_2,0), f_{d^S}(m_1,m_2,1), \ldots, f_{d^S}(m_1,m_2,m_3-m_2) \right)^T,
f_{m_2+2} = \left( f(m_1,m_2,m_3-m_2+1), f(m_1,m_2,m_3-m_2+2), \ldots, f(m_1,m_2,m_3) \right)^T.
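A minimal sketch of this reward function is given below; it reuses the hypothetical DataCenterModel container sketched in Section 2, and the case analysis mirrors (a)–(d) above (any simplifications are ours).

```python
# Minimal sketch of the profit-rate function f for cases (a)-(d) above.
# "m" is the hypothetical DataCenterModel container sketched in Section 2;
# dW (setup) and dS (sleep) are the policy components at the given state.
def reward(m, n1, n2, n3, dW=0, dS=0):
    C2_1, C2_2, C2_3 = m.C2
    if n2 == 0 and n3 == 0:                                   # case (a)
        return (m.R * n1 * m.mu1
                - (m.m1 * m.P1W + m.m2 * m.P2S) * m.C1
                - n1 * C2_1)
    if n1 == m.m1 and n2 == 0:                                # case (b): setup policy dW
        lost = m.lam * m.C5 if n3 == m.m3 else 0.0
        return (m.R * m.m1 * m.mu1
                - (m.m1 * m.P1W + dW * m.P2W + (m.m2 - dW) * m.P2S) * m.C1
                - (m.m1 * C2_1 + n3 * C2_3) - dW * m.C31 - lost)
    if n1 == m.m1 and 1 <= n2 <= m.m2 and n2 + n3 <= m.m3:    # case (c): sleep policy dS
        work = m.m2 - dS                                      # working servers in Group 2
        lost = m.lam * m.C5 if n2 + n3 == m.m3 else 0.0
        transfer = m.m1 * m.mu1 * m.C4 if n3 == 0 else 0.0
        return (m.R * (m.m1 * m.mu1 + work * m.mu2)
                - (m.m1 * m.P1W + dS * m.P2S + work * m.P2W) * m.C1
                - (m.m1 * C2_1 + n2 * C2_2 + n3 * C2_3)
                - (n2 - work) * m.C32 - transfer - lost)
    # case (d): n1 = m1, n2 = m2, n3 > m3 - m2
    lost = m.lam * m.C5 if n3 == m.m3 else 0.0
    return (m.R * (m.m1 * m.mu1 + m.m2 * m.mu2)
            - (m.m1 * m.P1W + m.m2 * m.P2W) * m.C1
            - (m.m1 * C2_1 + m.m2 * C2_2 + n3 * C2_3) - lost)
```

For instance, reward(model, model.m1, 0, 2, dW=1) evaluates case (b) when one Group-2 server is set up while two jobs wait in the buffer.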
In the remainder of this section, the long-run average profit of the data center (or of the policy-based continuous-time Markov process \{X^{(\mathbf{d})}(t) : t \ge 0\}) under an asynchronous dynamic policy \mathbf{d} is defined as
\eta^{(\mathbf{d})} = \lim_{T \to +\infty} \mathbb{E}\left[ \frac{1}{T} \int_0^T f^{(\mathbf{d})}\left( X^{(\mathbf{d})}(t) \right) \mathrm{d}t \right] = \pi^{(\mathbf{d})} f^{(\mathbf{d})},
where \pi^{(\mathbf{d})} and f^{(\mathbf{d})} are given by (3) and (8), respectively.
We observe that as the number of working servers in Group 2 decreases, the total revenues and the total costs in the data center will decrease synchronously, and vice versa. On the other hand, as the number of sleeping servers in Group 2 increases, the total revenues and the total costs in the data center will decrease synchronously, and vice versa. Thus, there is a tradeoff between the total revenues and the total costs for a suitable number of working and/or sleeping servers in Group 2 by using the setup and sleep policies, respectively. This motivates us to study an optimal dynamic control mechanism for the energy-efficient data center. Thus, our objective is to find an optimal asynchronous dynamic policy d * such that the long-run average profit η d is maximized, that is,
\mathbf{d}^* = \arg\max_{\mathbf{d} \in \mathcal{D}} \eta^{(\mathbf{d})}.
Since the setup and sleep policies \mathbf{d}^W and \mathbf{d}^S occur asynchronously, they cannot interact with each other at any time. Therefore, the optimal policy can be decomposed into
\mathbf{d}^* = \mathbf{d}^{W*} \boxtimes \mathbf{d}^{S*} = \left( \arg\max_{\mathbf{d}^W \in \mathcal{D}} \eta^{(\mathbf{d}^W)} \right) \boxtimes \left( \arg\max_{\mathbf{d}^S \in \mathcal{D}} \eta^{(\mathbf{d}^S)} \right).
In fact, it is difficult and challenging to analyze the properties of the optimal asynchronous dynamic policy d * = d W * d S * , and to provide an effective algorithm for computing the optimal policy d * . To do this, in the next sections we will introduce the sensitivity-based optimization theory to study this energy-efficient optimization problem.
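Before turning to the sensitivity-based machinery, the following minimal sketch makes the objective concrete: it enumerates a tiny, hypothetical policy set, computes η(d) = π(d) f(d) for each policy, and keeps the best one. In the paper, Q(d) and f(d) are the block generator (2) and the reward vector (8); here they are toy 2-state stand-ins.

```python
import numpy as np

# Minimal sketch of the brute-force search d* = argmax_{d in D} eta(d).
# The two "policies" below are toy 2-state generators with reward vectors.
def stationary_vector(Q):
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def average_profit(Q, f):
    # eta(d) = pi(d) f(d), the long-run average profit under policy d
    return float(stationary_vector(Q) @ f)

policies = {
    "d1": (np.array([[-2.0, 2.0], [1.0, -1.0]]), np.array([3.0, -0.5])),
    "d2": (np.array([[-1.0, 1.0], [2.0, -2.0]]), np.array([2.0, 1.0])),
}
best = max(policies, key=lambda name: average_profit(*policies[name]))
print("optimal policy:", best, "with eta =", average_profit(*policies[best]))
```

Of course, enumerating the full policy space D of the paper quickly becomes intractable, which is exactly why the threshold-type structure established later is so useful.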

4. The Block-Structured Poisson Equation

In this section, for the energy-efficient data center, we set up a block-structured Poisson equation which provides a useful relation between the sensitivity-based optimization and the MDP. Additionally, we use the RG-factorization, given in Li [33], to solve the block-structured Poisson equation and provide an expression for its unique solution.
For \mathbf{d} \in \mathcal{D}, it follows from Chapter 2 in Cao [34] that, for the policy-based continuous-time Markov process \{X^{(\mathbf{d})}(t) : t \ge 0\}, the performance potential is defined as
g^{(\mathbf{d})}(n_1, n_2, n_3) = \mathbb{E}\left[ \int_0^{+\infty} \left( f^{(\mathbf{d})}\left( X^{(\mathbf{d})}(t) \right) - \eta^{(\mathbf{d})} \right) \mathrm{d}t \,\Big|\, X^{(\mathbf{d})}(0) = (n_1, n_2, n_3) \right],
where \eta^{(\mathbf{d})} is defined in (9). It is seen from Cao [34] that, for any policy \mathbf{d} \in \mathcal{D}, g^{(\mathbf{d})}(n_1, n_2, n_3) quantifies the contribution of the initial state (n_1, n_2, n_3) to the long-run average profit of the data center. Here, g^{(\mathbf{d})}(n_1, n_2, n_3) is also called the relative value function or the bias in the traditional Markov decision process theory; see, e.g., Puterman [39] for more details. We further define a column vector g^{(\mathbf{d})} with elements g^{(\mathbf{d})}(n_1, n_2, n_3) for (n_1, n_2, n_3) \in \Omega:
g^{(\mathbf{d})} = \left( g_0^{(\mathbf{d})}, g_1^{(\mathbf{d})}, g_2^{(\mathbf{d})}, \ldots, g_{m_2+1}^{(\mathbf{d})}, g_{m_2+2}^{(\mathbf{d})} \right)^T,
where
g_0^{(\mathbf{d})} = \left( g^{(\mathbf{d})}(0,0,0), g^{(\mathbf{d})}(1,0,0), \ldots, g^{(\mathbf{d})}(m_1,0,0) \right)^T,
g_1^{(\mathbf{d})} = \left( g^{(\mathbf{d})}(m_1,0,1), g^{(\mathbf{d})}(m_1,0,2), \ldots, g^{(\mathbf{d})}(m_1,0,m_3) \right)^T,
g_2^{(\mathbf{d})} = \left( g^{(\mathbf{d})}(m_1,1,0), g^{(\mathbf{d})}(m_1,1,1), \ldots, g^{(\mathbf{d})}(m_1,1,m_3-1) \right)^T,
\vdots
g_{m_2+1}^{(\mathbf{d})} = \left( g^{(\mathbf{d})}(m_1,m_2,0), g^{(\mathbf{d})}(m_1,m_2,1), \ldots, g^{(\mathbf{d})}(m_1,m_2,m_3-m_2) \right)^T,
g_{m_2+2}^{(\mathbf{d})} = \left( g^{(\mathbf{d})}(m_1,m_2,m_3-m_2+1), g^{(\mathbf{d})}(m_1,m_2,m_3-m_2+2), \ldots, g^{(\mathbf{d})}(m_1,m_2,m_3) \right)^T.
By a computation similar to that in Ma et al. [31], the block-structured Poisson equation is given by
Q^{(\mathbf{d})} g^{(\mathbf{d})} = \eta^{(\mathbf{d})} e - f^{(\mathbf{d})},
where \eta^{(\mathbf{d})} is defined in (9), f^{(\mathbf{d})} is given in (8), and Q^{(\mathbf{d})} is given in (2).
To solve the system of linear equations (13), we note that \operatorname{rank}(Q^{(\mathbf{d})}) = m_1 + 3m_2/2 - m_2^2/2 + m_2 m_3 + m_3 and \det(Q^{(\mathbf{d})}) = 0, because the size of the matrix Q^{(\mathbf{d})} is m_1 + 3m_2/2 - m_2^2/2 + m_2 m_3 + m_3 + 1. Hence, the system (13) has infinitely many solutions, which differ by a constant additive term. Let \widehat{Q}^{(\mathbf{d})} be the matrix obtained by omitting the first row and the first column of the matrix Q^{(\mathbf{d})}. Then,
\widehat{Q}^{(\mathbf{d})} = \begin{pmatrix}
\widehat{Q}_{0,0} & \widehat{Q}_{0,1} & & & & & \\
\widehat{Q}_{1,0} & Q_{1,1} & Q_{1,2} & Q_{1,3} & \cdots & Q_{1,m_2+1} & \\
\widehat{Q}_{2,0} & Q_{2,1} & Q_{2,2} & & & & \\
 & Q_{3,1} & Q_{3,2} & Q_{3,3} & & & \\
 & \vdots & \vdots & \vdots & \ddots & & \\
 & Q_{m_2+1,1} & Q_{m_2+1,2} & Q_{m_2+1,3} & \cdots & Q_{m_2+1,m_2+1} & Q_{m_2+1,m_2+2} \\
 & & & & & Q_{m_2+2,m_2+1} & Q_{m_2+2,m_2+2}
\end{pmatrix},
where
\widehat{Q}_{0,0} = \begin{pmatrix}
-(\lambda+\mu_1) & \lambda & & & \\
2\mu_1 & -(\lambda+2\mu_1) & \lambda & & \\
 & \ddots & \ddots & \ddots & \\
 & & (m_1-1)\mu_1 & -\left(\lambda+(m_1-1)\mu_1\right) & \lambda \\
 & & & m_1\mu_1 & -(\lambda+m_1\mu_1)
\end{pmatrix},
\widehat{Q}_{0,1} is obtained by omitting the first row vector of Q_{0,1}, and \widehat{Q}_{1,0} and \widehat{Q}_{2,0} are obtained by omitting the first column vectors of Q_{1,0} and Q_{2,0}, respectively. The other block entries in \widehat{Q}^{(\mathbf{d})} are the same as the corresponding block entries of the matrix Q^{(\mathbf{d})}.
Note that \operatorname{rank}(\widehat{Q}^{(\mathbf{d})}) = m_1 + 3m_2/2 - m_2^2/2 + m_2 m_3 + m_3 and the size of the matrix \widehat{Q}^{(\mathbf{d})} is m_1 + 3m_2/2 - m_2^2/2 + m_2 m_3 + m_3. Hence, the matrix \widehat{Q}^{(\mathbf{d})} is invertible.
Let h^{(\mathbf{d})} and \varphi^{(\mathbf{d})} be the two column vectors of size m_1 + 3m_2/2 - m_2^2/2 + m_2 m_3 + m_3 obtained by omitting the first element of the two column vectors f^{(\mathbf{d})} - \eta^{(\mathbf{d})} e and g^{(\mathbf{d})} of size m_1 + 3m_2/2 - m_2^2/2 + m_2 m_3 + m_3 + 1, respectively, and write
l_0 = m_1, \quad l_1 = m_1 + m_3, \quad l_2 = m_1 + 2m_3, \quad \ldots, \quad l_{m_2+1} = m_1 + \frac{m_2}{2} - \frac{m_2^2}{2} + m_2 m_3 + m_3, \quad L = m_1 + \frac{3m_2}{2} - \frac{m_2^2}{2} + m_2 m_3 + m_3.
Then,
h^{(\mathbf{d})} = \begin{pmatrix} \widehat{f}_0 - \eta^{(\mathbf{d})} e \\ f_1(\mathbf{d}^W) - \eta^{(\mathbf{d})} e \\ f_2(\mathbf{d}^S) - \eta^{(\mathbf{d})} e \\ \vdots \\ f_{m_2+1}(\mathbf{d}^S) - \eta^{(\mathbf{d})} e \\ f_{m_2+2} - \eta^{(\mathbf{d})} e \end{pmatrix} \overset{\mathrm{def}}{=} \begin{pmatrix} h_0(\mathbf{d}) \\ h_1(\mathbf{d}) \\ h_2(\mathbf{d}) \\ \vdots \\ h_{m_2+1}(\mathbf{d}) \\ h_{m_2+2}(\mathbf{d}) \end{pmatrix}, \qquad \varphi^{(\mathbf{d})} \overset{\mathrm{def}}{=} \begin{pmatrix} \widehat{g}_0^{(\mathbf{d})} \\ g_1^{(\mathbf{d})} \\ g_2^{(\mathbf{d})} \\ \vdots \\ g_{m_2+1}^{(\mathbf{d})} \\ g_{m_2+2}^{(\mathbf{d})} \end{pmatrix},
where \widehat{f}_0 and \widehat{g}_0^{(\mathbf{d})} are the two column vectors obtained by omitting the scalar entries f(0,0,0) and g^{(\mathbf{d})}(0,0,0) from f_0 and g_0^{(\mathbf{d})}, respectively, and
h_0(\mathbf{d}) = \left( h_1, h_2, \ldots, h_{l_0} \right)^T,
h_1(\mathbf{d}) = \left( h_{l_0+1}, h_{l_0+2}, \ldots, h_{l_1} \right)^T,
h_2(\mathbf{d}) = \left( h_{l_1+1}, h_{l_1+2}, \ldots, h_{l_2} \right)^T,
\vdots
h_{m_2+1}(\mathbf{d}) = \left( h_{l_{m_2}+1}, h_{l_{m_2}+2}, \ldots, h_{l_{m_2+1}} \right)^T,
h_{m_2+2}(\mathbf{d}) = \left( h_{l_{m_2+1}+1}, h_{l_{m_2+1}+2}, \ldots, h_L \right)^T.
Therefore, it follows from (13) that
-\widehat{Q}^{(\mathbf{d})} \varphi^{(\mathbf{d})} = h^{(\mathbf{d})} + \mu_1 e_1\, g^{(\mathbf{d})}(0,0,0),
where e_1 is a column vector whose first element is one and all other elements are zero. Note that the matrix \widehat{Q}^{(\mathbf{d})} is invertible and \left( -\widehat{Q}^{(\mathbf{d})} \right)^{-1} > 0; thus, the system (16) of linear equations always has a unique solution:
\varphi^{(\mathbf{d})} = \left( -\widehat{Q}^{(\mathbf{d})} \right)^{-1} h^{(\mathbf{d})} + \mu_1 \left( -\widehat{Q}^{(\mathbf{d})} \right)^{-1} e_1\, g^{(\mathbf{d})}(0,0,0),
where g^{(\mathbf{d})}(0,0,0) is any given positive constant. For convenience of computation, we take g^{(\mathbf{d})}(0,0,0) = 1. In this case, we have
\varphi^{(\mathbf{d})} = \left( -\widehat{Q}^{(\mathbf{d})} \right)^{-1} h^{(\mathbf{d})} + \mu_1 \left( -\widehat{Q}^{(\mathbf{d})} \right)^{-1} e_1.
Note that the expression for the inverse matrix \left( -\widehat{Q}^{(\mathbf{d})} \right)^{-1} can be obtained by means of the RG-factorization, which is given in Li [33] for general Markov processes.
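For small instances, the reduced Poisson equation above can also be solved directly with standard linear algebra rather than via the RG-factorization. The following minimal sketch does this for a hypothetical 3-state generator: it fixes the potential at a reference state (playing the role of g(0,0,0) = 1), moves the corresponding column to the right-hand side, and solves the remaining nonsingular system.

```python
import numpy as np

# Minimal sketch: solve the Poisson equation Q g = eta*e - f for a finite
# irreducible generator by fixing g at a reference state (cf. g(0,0,0) = 1).
def performance_potential(Q, f, ref=0, g_ref=1.0):
    n = Q.shape[0]
    # stationary vector and long-run average reward eta = pi f
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    eta = float(pi @ f)
    rhs = eta * np.ones(n) - f
    keep = [i for i in range(n) if i != ref]
    # move the known column Q[:, ref] * g_ref to the right-hand side
    reduced_rhs = rhs[keep] - Q[np.ix_(keep, [ref])].ravel() * g_ref
    g = np.empty(n)
    g[ref] = g_ref
    g[keep] = np.linalg.solve(Q[np.ix_(keep, keep)], reduced_rhs)
    return g, eta

Q_toy = np.array([[-3.0, 2.0, 1.0],
                  [1.0, -2.0, 1.0],
                  [2.0, 2.0, -4.0]])
f_toy = np.array([4.0, -1.0, 2.5])
g, eta = performance_potential(Q_toy, f_toy)
print("eta =", round(eta, 4), "g =", np.round(g, 4))
```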
For convenience of computation, we partition
\left( -\widehat{Q}^{(\mathbf{d})} \right)^{-1} = \left( Q_0^T, Q_1^T, Q_2^T, \ldots, Q_{m_2+1}^T, Q_{m_2+2}^T \right)^T,
and write every element of the block Q_r as a scalar q_{n,l}^{(r)}, where n denotes a system state within block r and l the column index, for r = 0, 1, …, m_2+2 and l = 1, 2, …, L, with L = m_1 + 3m_2/2 − m_2^2/2 + m_2 m_3 + m_3, and
n = 1, 2, \ldots, m_1 \text{ for } r = 0; \quad n = 1, 2, \ldots, m_3 \text{ for } r = 1; \quad n = 0, 1, \ldots, m_3 - r + 1 \text{ for } 2 \le r \le m_2+1; \quad n = m_3 - m_2 + 1, m_3 - m_2 + 2, \ldots, m_3 \text{ for } r = m_2+2.
It is easy to check that
Q_0 = \begin{pmatrix} q_{1,1}^{(0)} & q_{1,2}^{(0)} & \cdots & q_{1,L}^{(0)} \\ q_{2,1}^{(0)} & q_{2,2}^{(0)} & \cdots & q_{2,L}^{(0)} \\ \vdots & \vdots & & \vdots \\ q_{m_1,1}^{(0)} & q_{m_1,2}^{(0)} & \cdots & q_{m_1,L}^{(0)} \end{pmatrix}_{m_1 \times L}, \qquad Q_1 = \begin{pmatrix} q_{1,1}^{(1)} & q_{1,2}^{(1)} & \cdots & q_{1,L}^{(1)} \\ q_{2,1}^{(1)} & q_{2,2}^{(1)} & \cdots & q_{2,L}^{(1)} \\ \vdots & \vdots & & \vdots \\ q_{m_3,1}^{(1)} & q_{m_3,2}^{(1)} & \cdots & q_{m_3,L}^{(1)} \end{pmatrix}_{m_3 \times L},
for 2 \le r \le m_2+1,
Q_r = \begin{pmatrix} q_{0,1}^{(r)} & q_{0,2}^{(r)} & \cdots & q_{0,L}^{(r)} \\ q_{1,1}^{(r)} & q_{1,2}^{(r)} & \cdots & q_{1,L}^{(r)} \\ \vdots & \vdots & & \vdots \\ q_{m_3-r+1,1}^{(r)} & q_{m_3-r+1,2}^{(r)} & \cdots & q_{m_3-r+1,L}^{(r)} \end{pmatrix}_{(m_3-r+2) \times L},
and
Q_{m_2+2} = \begin{pmatrix} q_{m_3-m_2+1,1}^{(m_2+2)} & q_{m_3-m_2+1,2}^{(m_2+2)} & \cdots & q_{m_3-m_2+1,L}^{(m_2+2)} \\ q_{m_3-m_2+2,1}^{(m_2+2)} & q_{m_3-m_2+2,2}^{(m_2+2)} & \cdots & q_{m_3-m_2+2,L}^{(m_2+2)} \\ \vdots & \vdots & & \vdots \\ q_{m_3,1}^{(m_2+2)} & q_{m_3,2}^{(m_2+2)} & \cdots & q_{m_3,L}^{(m_2+2)} \end{pmatrix}_{m_2 \times L}.
The following theorem provides an expression for the vector \varphi^{(\mathbf{d})} under the constraint g^{(\mathbf{d})}(0,0,0) = 1. Note that this expression will be very useful when the sensitivity-based optimization theory is applied to the Markov decision processes in our later study.
Theorem 2.
If g^{(\mathbf{d})}(0,0,0) = 1, then for n_1 = 1, 2, \ldots, m_1,
g^{(\mathbf{d})}(n_1, 0, 0) = \sum_{l=1}^{L} q_{n_1,l}^{(0)} h_l + \mu_1 q_{n_1,1}^{(0)};
for n_3 = 1, 2, \ldots, m_3,
g^{(\mathbf{d})}(m_1, 0, n_3) = \sum_{l=1}^{L} q_{n_3,l}^{(1)} h_l + \mu_1 q_{n_3,1}^{(1)};
for n_2 = r - 1, n_3 = 0, 1, \ldots, m_3 - n_2, and 2 \le r \le m_2+1,
g^{(\mathbf{d})}(m_1, n_2, n_3) = \sum_{l=1}^{L} q_{n_3,l}^{(r)} h_l + \mu_1 q_{n_3,1}^{(r)};
for n_3 = m_3 - m_2 + 1, m_3 - m_2 + 2, \ldots, m_3,
g^{(\mathbf{d})}(m_1, m_2, n_3) = \sum_{l=1}^{L} q_{n_3,l}^{(m_2+2)} h_l + \mu_1 q_{n_3,1}^{(m_2+2)}.
Proof of Theorem 2.
It is seen from (18) that we need to compute two parts: \left( -\widehat{Q}^{(\mathbf{d})} \right)^{-1} h^{(\mathbf{d})} and \mu_1 \left( -\widehat{Q}^{(\mathbf{d})} \right)^{-1} e_1. Note that
\left( -\widehat{Q}^{(\mathbf{d})} \right)^{-1} h^{(\mathbf{d})} = \left( \sum_{l=1}^{L} q_{1,l}^{(0)} h_l, \ldots, \sum_{l=1}^{L} q_{m_1,l}^{(0)} h_l;\; \ldots;\; \sum_{l=1}^{L} q_{0,l}^{(r)} h_l, \ldots, \sum_{l=1}^{L} q_{m_3-r+1,l}^{(r)} h_l;\; \ldots;\; \sum_{l=1}^{L} q_{m_3-m_2+1,l}^{(m_2+2)} h_l, \ldots, \sum_{l=1}^{L} q_{m_3,l}^{(m_2+2)} h_l \right)^T
and
\mu_1 \left( -\widehat{Q}^{(\mathbf{d})} \right)^{-1} e_1 = \mu_1 \left( q_{1,1}^{(0)}, \ldots, q_{m_1,1}^{(0)};\; \ldots;\; q_{0,1}^{(r)}, \ldots, q_{m_3-r+1,1}^{(r)};\; \ldots;\; q_{m_3-m_2+1,1}^{(m_2+2)}, \ldots, q_{m_3,1}^{(m_2+2)} \right)^T,
both of size L \times 1. Thus, a simple computation of the vector \varphi^{(\mathbf{d})} = \left( -\widehat{Q}^{(\mathbf{d})} \right)^{-1} h^{(\mathbf{d})} + \mu_1 \left( -\widehat{Q}^{(\mathbf{d})} \right)^{-1} e_1 gives the desired results. This completes the proof. □

5. Impact of the Service Price

In this section, we study the perturbation realization factor of the policy-based continuous-time Markov process both for the setup policy and for the sleep policy (i.e., they form the asynchronous energy-efficient policy), and analyze how the service price impacts on the perturbation realization factor. To do this, our analysis includes the following two cases: the setup policy and the sleep policy. Note that the results given in this section will be useful for establishing the optimal asynchronous dynamic policy of the energy-efficient data center in later sections.
A key point of our analysis is that the setup policy and the sleep policy are never applied at the same time; thus, we can discuss the perturbation realization factor under the asynchronous dynamic policy in two separate computational steps.

5.1. The Setup Policy

For the performance potential vector \varphi^{(\mathbf{d})} under the constraint g^{(\mathbf{d})}(0,0,0) = 1, we define a perturbation realization factor as
G^{(\mathbf{d})}(\mathbf{n}', \mathbf{n}) \overset{\mathrm{def}}{=} g^{(\mathbf{d})}(\mathbf{n}') - g^{(\mathbf{d})}(\mathbf{n}),
where \mathbf{n}' = (n_1', n_2', n_3') and \mathbf{n} = (n_1, n_2, n_3). We can see that G^{(\mathbf{d})}(\mathbf{n}', \mathbf{n}) quantifies the difference between the two performance potentials g^{(\mathbf{d})}(n_1', n_2', n_3') and g^{(\mathbf{d})}(n_1, n_2, n_3). It measures the long-run effect on the average profit of the data center when the system state is changed from \mathbf{n} = (n_1, n_2, n_3) to \mathbf{n}' = (n_1', n_2', n_3'). For our next discussion, by observing the state space, it is necessary to define some perturbation realization factors as follows:
G_1(\mathbf{d}^W) \overset{\mathrm{def}}{=} g^{(\mathbf{d})}(m_1, i_1, n_3 - i_1) - g^{(\mathbf{d})}(m_1, i_2, n_3 - i_2),
G_2(\mathbf{d}^W) \overset{\mathrm{def}}{=} g^{(\mathbf{d})}(m_1, 0, n_3) - g^{(\mathbf{d})}(m_1, i_1, n_3 - i_1),
G_3(\mathbf{d}^W) \overset{\mathrm{def}}{=} g^{(\mathbf{d})}(m_1, 0, n_3) - g^{(\mathbf{d})}(m_1, i_2, n_3 - i_2),
where 0 \le i_2 < i_1 \le m_2 and n_3 = 0, 1, \ldots, m_3.
It follows from Theorem 2 that
g^{(\mathbf{d})}(m_1, 0, n_3) = \sum_{l=1}^{L} q_{n_3,l}^{(1)} h_l + \mu_1 q_{n_3,1}^{(1)},
g^{(\mathbf{d})}(m_1, i_1, n_3 - i_1) = \sum_{l=1}^{L} q_{n_3-i_1,l}^{(i_1+1)} h_l + \mu_1 q_{n_3-i_1,1}^{(i_1+1)},
g^{(\mathbf{d})}(m_1, i_2, n_3 - i_2) = \sum_{l=1}^{L} q_{n_3-i_2,l}^{(i_2+1)} h_l + \mu_1 q_{n_3-i_2,1}^{(i_2+1)}.
To express the perturbation realization factor in terms of the service price R, we write
A_0 = 0, \qquad B_0 = \left( m_1 P_{1,W} + m_2 P_{2,S} \right) C_1 > 0;
for 1 \le l \le l_0 and n_1 = 1, 2, \ldots, m_1, n_2 = n_3 = 0,
A_l = n_1 \mu_1 > 0, \qquad B_l = \left( m_1 P_{1,W} + m_2 P_{2,S} \right) C_1 + n_1 C_2^{(1)} > 0;
for l_0 + 1 \le l \le l_1, and n_1 = m_1, n_2 = 0, n_3 = 1, 2, \ldots, m_3,
A_l = m_1 \mu_1 > 0, \qquad B_l(\mathbf{d}^W) = \left[ m_1 P_{1,W} + d^W_{(m_1,0,n_3)} P_{2,W} + \left( m_2 - d^W_{(m_1,0,n_3)} \right) P_{2,S} \right] C_1 + m_1 C_2^{(1)} + n_3 C_2^{(3)} + d^W_{(m_1,0,n_3)} C_3^{(1)} + \lambda 1\{n_1 = m_1, n_2 = 0, n_3 = m_3\} C_5 > 0;
for l_1 + 1 \le l \le l_{m_2+1}, and n_1 = m_1, n_2 = 1, 2, \ldots, m_2, n_3 = 0, 1, \ldots, m_3 - n_2,
A_l(\mathbf{d}^S) = m_1 \mu_1 + \left( m_2 - d^S_{(m_1,n_2,n_3)} \right) \mu_2 > 0, \qquad B_l(\mathbf{d}^S) = \left[ m_1 P_{1,W} + d^S_{(m_1,n_2,n_3)} P_{2,S} + \left( m_2 - d^S_{(m_1,n_2,n_3)} \right) P_{2,W} \right] C_1 + m_1 C_2^{(1)} + n_2 C_2^{(2)} + n_3 C_2^{(3)} + \left[ n_2 - \left( m_2 - d^S_{(m_1,n_2,n_3)} \right) \right] C_3^{(2)} + m_1 \mu_1 1\{n_3 = 0\} C_4 + \lambda 1\{n_2 + n_3 = m_3\} C_5 > 0;
for l_{m_2+1} + 1 \le l \le L, and n_1 = m_1, n_2 = m_2, n_3 = m_3 - m_2 + 1, m_3 - m_2 + 2, \ldots, m_3,
A_l = m_1 \mu_1 + m_2 \mu_2 > 0,
B_l = \left( m_1 P_{1,W} + m_2 P_{2,W} \right) C_1 + m_1 C_2^{(1)} + m_2 C_2^{(2)} + n_3 C_2^{(3)} + \lambda 1\{n_1 = m_1, n_2 = m_2, n_3 = m_3\} C_5 > 0.
Then for 1 \le l \le l_0 and n_1 = 0, 1, \ldots, m_1, n_2 = n_3 = 0,
f(n_1, 0, 0) = R A_l - B_l;
for l_0 + 1 \le l \le l_1, and n_1 = m_1, n_2 = 0, n_3 = 1, 2, \ldots, m_3,
f_{d^W}(m_1, 0, n_3) = R A_l - B_l(\mathbf{d}^W);
for l_1 + 1 \le l \le l_{m_2+1}, and n_1 = m_1, n_2 = 1, 2, \ldots, m_2, n_3 = 0, 1, \ldots, m_3 - n_2,
f_{d^S}(m_1, n_2, n_3) = R A_l(\mathbf{d}^S) - B_l(\mathbf{d}^S);
for l_{m_2+1} + 1 \le l \le L, and n_1 = m_1, n_2 = m_2, n_3 = m_3 - m_2 + 1, m_3 - m_2 + 2, \ldots, m_3,
f(m_1, m_2, n_3) = R A_l - B_l.
We rewrite \pi^{(\mathbf{d})} as
\pi_0^{(\mathbf{d})} = \left( \pi_0; \pi_1, \pi_2, \ldots, \pi_{l_0} \right),
\pi_1^{(\mathbf{d})} = \left( \pi_{l_0+1}, \pi_{l_0+2}, \ldots, \pi_{l_1} \right),
\pi_2^{(\mathbf{d})} = \left( \pi_{l_1+1}, \pi_{l_1+2}, \ldots, \pi_{l_2} \right),
\vdots
\pi_{m_2+1}^{(\mathbf{d})} = \left( \pi_{l_{m_2}+1}, \pi_{l_{m_2}+2}, \ldots, \pi_{l_{m_2+1}} \right),
\pi_{m_2+2}^{(\mathbf{d})} = \left( \pi_{l_{m_2+1}+1}, \pi_{l_{m_2+1}+2}, \ldots, \pi_L \right).
Then it is easy to check that
D(\mathbf{d}) = \pi_0 A_0 + \sum_{l=1}^{l_0} \pi_l A_l + \sum_{l=l_0+1}^{l_1} \pi_l A_l + \sum_{l=l_1+1}^{l_{m_2+1}} \pi_l A_l(\mathbf{d}^S) + \sum_{l=l_{m_2+1}+1}^{L} \pi_l A_l > 0,
and
F(\mathbf{d}) = \pi_0 B_0 + \sum_{l=1}^{l_0} \pi_l B_l + \sum_{l=l_0+1}^{l_1} \pi_l B_l(\mathbf{d}^W) + \sum_{l=l_1+1}^{l_{m_2+1}} \pi_l B_l(\mathbf{d}^S) + \sum_{l=l_{m_2+1}+1}^{L} \pi_l B_l > 0.
Thus, we obtain
\eta^{(\mathbf{d})} = \pi^{(\mathbf{d})} f^{(\mathbf{d})} = \sum_{n_1=0}^{m_1} \pi^{(\mathbf{d})}(n_1,0,0) f(n_1,0,0) + \sum_{n_3=1}^{m_3} \pi^{(\mathbf{d})}(m_1,0,n_3) f_{d^W}(m_1,0,n_3) + \sum_{n_2=1}^{m_2} \sum_{n_3=0}^{m_3-n_2} \pi^{(\mathbf{d})}(m_1,n_2,n_3) f_{d^S}(m_1,n_2,n_3) + \sum_{n_3=m_3-m_2+1}^{m_3} \pi^{(\mathbf{d})}(m_1,m_2,n_3) f(m_1,m_2,n_3) = R\, D(\mathbf{d}) - F(\mathbf{d}).
It follows from (15) that for 1 \le l \le l_0,
h_l = R \left[ A_l - D(\mathbf{d}) \right] - \left[ B_l - F(\mathbf{d}) \right];
for l_0 + 1 \le l \le l_1,
h_l = R \left[ A_l - D(\mathbf{d}) \right] - \left[ B_l(\mathbf{d}^W) - F(\mathbf{d}) \right];
for l_1 + 1 \le l \le l_{m_2+1},
h_l = R \left[ A_l(\mathbf{d}^S) - D(\mathbf{d}) \right] - \left[ B_l(\mathbf{d}^S) - F(\mathbf{d}) \right];
for l_{m_2+1} + 1 \le l \le L,
h_l = R \left[ A_l - D(\mathbf{d}) \right] - \left[ B_l - F(\mathbf{d}) \right].
If a job finishes its service at a server and leaves the system immediately, then the data center obtains a fixed revenue (i.e., the service price) R from the served job. Now, we study the influence of the service price R on the perturbation realization factor. Note that all the numbers q_{n,l}^{(r)} are positive and independent of the service price R, while all the numbers h_l are linear functions of R. We write
W_n^{(r)} = \sum_{l=1}^{l_0} q_{n,l}^{(r)} \left[ A_l - D(\mathbf{d}) \right] + \sum_{l=l_0+1}^{l_1} q_{n,l}^{(r)} \left[ A_l - D(\mathbf{d}) \right] + \sum_{l=l_1+1}^{l_{m_2+1}} q_{n,l}^{(r)} \left[ A_l(\mathbf{d}^S) - D(\mathbf{d}) \right] + \sum_{l=l_{m_2+1}+1}^{L} q_{n,l}^{(r)} \left[ A_l - D(\mathbf{d}) \right] + \mu_1 q_{n,1}^{(r)}
and
V_n^{(r)} = \sum_{l=1}^{l_0} q_{n,l}^{(r)} \left[ B_l - F(\mathbf{d}) \right] + \sum_{l=l_0+1}^{l_1} q_{n,l}^{(r)} \left[ B_l(\mathbf{d}^W) - F(\mathbf{d}) \right] + \sum_{l=l_1+1}^{l_{m_2+1}} q_{n,l}^{(r)} \left[ B_l(\mathbf{d}^S) - F(\mathbf{d}) \right] + \sum_{l=l_{m_2+1}+1}^{L} q_{n,l}^{(r)} \left[ B_l - F(\mathbf{d}) \right],
then for i_1, i_2 = 0, 1, \ldots, m_2, we obtain
G_1(\mathbf{d}^W) = R \left[ W_{n_3-i_1}^{(i_1+1)} - W_{n_3-i_2}^{(i_2+1)} \right] - \left[ V_{n_3-i_1}^{(i_1+1)} - V_{n_3-i_2}^{(i_2+1)} \right],
G_2(\mathbf{d}^W) = R \left[ W_{n_3}^{(1)} - W_{n_3-i_1}^{(i_1+1)} \right] - \left[ V_{n_3}^{(1)} - V_{n_3-i_1}^{(i_1+1)} \right],
G_3(\mathbf{d}^W) = R \left[ W_{n_3}^{(1)} - W_{n_3-i_2}^{(i_2+1)} \right] - \left[ V_{n_3}^{(1)} - V_{n_3-i_2}^{(i_2+1)} \right].
Now, we define
G(\mathbf{d}^W) = m_1 \mu_1 G_1(\mathbf{d}^W) - i_1 \mu_2 \left[ G_2(\mathbf{d}^W) + \beta_1 \right] + i_2 \mu_2 \left[ G_3(\mathbf{d}^W) + \beta_1 \right],
where \beta_1 is defined as
\beta_1 = \frac{\left( P_{2,W} - P_{2,S} \right) C_1 + C_3^{(1)}}{\mu_2}.
From the later discussion in Section 6, we will see that G(\mathbf{d}^W) plays a fundamental role in the performance optimization of data centers, and the sign of G(\mathbf{d}^W) directly determines the selection of decision actions, as shown in (38) later. To this end, we analyze how the service price impacts G(\mathbf{d}^W). Substituting (21) into the linear equation G(\mathbf{d}^W) = 0, we obtain
R = \frac{\phi(i_1) V_{n_3-i_1}^{(i_1+1)} - \phi(i_2) V_{n_3-i_2}^{(i_2+1)} - \psi(i_1,i_2) \left[ V_{n_3}^{(1)} - \beta_1 \right]}{\phi(i_1) W_{n_3-i_1}^{(i_1+1)} - \phi(i_2) W_{n_3-i_2}^{(i_2+1)} - \psi(i_1,i_2) W_{n_3}^{(1)}},
where \phi(i_1) = m_1 \mu_1 + i_1 \mu_2, \phi(i_2) = m_1 \mu_1 + i_2 \mu_2, and \psi(i_1,i_2) = (i_1 - i_2) \mu_2.
Thus, the unique solution of the price R in (22) is given by
\mathcal{R}_{\mathbf{d}^W}(i_1, i_2) = \frac{\phi(i_1) V_{n_3-i_1}^{(i_1+1)} - \phi(i_2) V_{n_3-i_2}^{(i_2+1)} - \psi(i_1,i_2) \left[ V_{n_3}^{(1)} - \beta_1 \right]}{\phi(i_1) W_{n_3-i_1}^{(i_1+1)} - \phi(i_2) W_{n_3-i_2}^{(i_2+1)} - \psi(i_1,i_2) W_{n_3}^{(1)}}.
It is easy to see from (22) that (a) if R \ge \mathcal{R}_{\mathbf{d}^W}(i_1, i_2), then G(\mathbf{d}^W) \ge 0; and (b) if R \le \mathcal{R}_{\mathbf{d}^W}(i_1, i_2), then G(\mathbf{d}^W) \le 0.
In the energy-efficient data center, we define two critical values, related to the service price, as
R_H^W = \max_{\mathbf{d} \in \mathcal{D}} \left\{ 0, \mathcal{R}_{\mathbf{d}^W}(1,0), \mathcal{R}_{\mathbf{d}^W}(2,0), \ldots, \mathcal{R}_{\mathbf{d}^W}(m_2, m_2-1) \right\}
and
R_L^W = \min_{\mathbf{d} \in \mathcal{D}} \left\{ \mathcal{R}_{\mathbf{d}^W}(1,0), \mathcal{R}_{\mathbf{d}^W}(2,0), \ldots, \mathcal{R}_{\mathbf{d}^W}(m_2, m_2-1) \right\}.
The following proposition uses the two critical values, which are related to the service price, to provide a key condition whose purpose is to establish a sensitivity-based optimization framework of the energy-efficient data center in our later study. Additionally, this proposition will be useful in the next section for studying the monotonicity of the asynchronous energy-efficient policies.
Proposition 1.
(1) If R \ge R_H^W, then for any \mathbf{d} \in \mathcal{D} and for each couple (i_1, i_2) with 0 \le i_2 < i_1 \le m_2, we have
G(\mathbf{d}^W) \ge 0.
(2) If 0 \le R \le R_L^W, then for any \mathbf{d} \in \mathcal{D} and for each couple (i_1, i_2) with 0 \le i_2 < i_1 \le m_2, we have
G(\mathbf{d}^W) \le 0.
Proof of Proposition 1.
(1) For any \mathbf{d} \in \mathcal{D} and for each couple (i_1, i_2) with 0 \le i_2 < i_1 \le m_2, since R \ge R_H^W and R_H^W = \max_{\mathbf{d} \in \mathcal{D}} \{ 0, \mathcal{R}_{\mathbf{d}^W}(1,0), \mathcal{R}_{\mathbf{d}^W}(2,0), \ldots, \mathcal{R}_{\mathbf{d}^W}(m_2, m_2-1) \}, this gives
R \ge \mathcal{R}_{\mathbf{d}^W}(i_1, i_2).
Thus, for any couple (i_1, i_2) with 0 \le i_2 < i_1 \le m_2, this gives G(\mathbf{d}^W) \ge 0.
(2) For any \mathbf{d} \in \mathcal{D} and for each couple (i_1, i_2) with 0 \le i_2 < i_1 \le m_2, if 0 \le R \le R_L^W, we get
R \le \mathcal{R}_{\mathbf{d}^W}(i_1, i_2),
which gives G(\mathbf{d}^W) \le 0. This completes the proof. □

5.2. The Sleep Policy

The analysis for the sleep policy is similar to that of the setup policy given in the above subsection. Here, we shall provide only a simple discussion.
We define the perturbation realization factor for the sleep policy as follows:
G_1(\mathbf{d}^S) \overset{\mathrm{def}}{=} g^{(\mathbf{d})}(m_1, j_2, n_3 + n_2 - j_2) - g^{(\mathbf{d})}(m_1, j_1, n_3 + n_2 - j_1),
G_2(\mathbf{d}^S) \overset{\mathrm{def}}{=} g^{(\mathbf{d})}(m_1, n_2, n_3) - g^{(\mathbf{d})}(m_1, j_1, n_3 + n_2 - j_1),
G_3(\mathbf{d}^S) \overset{\mathrm{def}}{=} g^{(\mathbf{d})}(m_1, n_2, n_3) - g^{(\mathbf{d})}(m_1, j_2, n_3 + n_2 - j_2),
where 0 \le j_2 < j_1 \le n_2, n_2 = 0, 1, \ldots, m_2, and n_3 = 0, 1, \ldots, m_3.
It follows from Theorem 2 that
g^{(\mathbf{d})}(m_1, n_2, n_3) = \sum_{l=1}^{L} q_{n_3,l}^{(n_2+1)} h_l + \mu_1 q_{n_3,1}^{(n_2+1)},
g^{(\mathbf{d})}(m_1, j_1, n_3 + n_2 - j_1) = \sum_{l=1}^{L} q_{n_3+n_2-j_1,l}^{(j_1+1)} h_l + \mu_1 q_{n_3+n_2-j_1,1}^{(j_1+1)},
g^{(\mathbf{d})}(m_1, j_2, n_3 + n_2 - j_2) = \sum_{l=1}^{L} q_{n_3+n_2-j_2,l}^{(j_2+1)} h_l + \mu_1 q_{n_3+n_2-j_2,1}^{(j_2+1)}.
Similarly, to express the perturbation realization factor in terms of the service price R, we write
G(\mathbf{d}^S) = m_1 \mu_1 G_1(\mathbf{d}^S) + j_1 \mu_2 \left[ G_2(\mathbf{d}^S) + \beta_2 \right] - j_2 \mu_2 \left[ G_3(\mathbf{d}^S) + \beta_2 \right],
where
G_1(\mathbf{d}^S) = R \left[ W_{n_3+n_2-j_2}^{(j_2+1)} - W_{n_3+n_2-j_1}^{(j_1+1)} \right] - \left[ V_{n_3+n_2-j_2}^{(j_2+1)} - V_{n_3+n_2-j_1}^{(j_1+1)} \right],
G_2(\mathbf{d}^S) = R \left[ W_{n_3}^{(n_2+1)} - W_{n_3+n_2-j_1}^{(j_1+1)} \right] - \left[ V_{n_3}^{(n_2+1)} - V_{n_3+n_2-j_1}^{(j_1+1)} \right],
G_3(\mathbf{d}^S) = R \left[ W_{n_3}^{(n_2+1)} - W_{n_3+n_2-j_2}^{(j_2+1)} \right] - \left[ V_{n_3}^{(n_2+1)} - V_{n_3+n_2-j_2}^{(j_2+1)} \right],
and
\beta_2 = \frac{R + \left( P_{2,W} - P_{2,S} \right) C_1 + C_3^{(2)}}{\mu_2}.
Now, we analyze how the service price impacts G(\mathbf{d}^S). Substituting (29) into the linear equation G(\mathbf{d}^S) = 0, we obtain
R = \frac{\phi(j_1) V_{n_3+n_2-j_1}^{(j_1+1)} - \phi(j_2) V_{n_3+n_2-j_2}^{(j_2+1)} - \psi(j_1,j_2) \left[ V_{n_3}^{(n_2+1)} - \beta_2 \right]}{\phi(j_1) W_{n_3+n_2-j_1}^{(j_1+1)} - \phi(j_2) W_{n_3+n_2-j_2}^{(j_2+1)} - \frac{\psi(j_1,j_2)}{\mu_2} W_{n_3}^{(n_2+1)}}.
Then, the unique solution of the price R in (30) is given by
\mathcal{R}_{\mathbf{d}^S}(j_1, j_2) = \frac{\phi(j_1) V_{n_3+n_2-j_1}^{(j_1+1)} - \phi(j_2) V_{n_3+n_2-j_2}^{(j_2+1)} - \psi(j_1,j_2) \left[ V_{n_3}^{(n_2+1)} - \beta_2 \right]}{\phi(j_1) W_{n_3+n_2-j_1}^{(j_1+1)} - \phi(j_2) W_{n_3+n_2-j_2}^{(j_2+1)} - \frac{\psi(j_1,j_2)}{\mu_2} W_{n_3}^{(n_2+1)}}.
It is easy to see from Equation (30) that (a) if R \ge \mathcal{R}_{\mathbf{d}^S}(j_1, j_2), then G(\mathbf{d}^S) \ge 0; and (b) if R \le \mathcal{R}_{\mathbf{d}^S}(j_1, j_2), then G(\mathbf{d}^S) \le 0.
In the energy-efficient data center, we relate to the service price and define two critical values as
$R_H^S = \max_{d \in \mathcal{D}} \left\{ 0,\, d_S(1,0),\, d_S(2,0),\, \ldots,\, d_S(m_2, m_2 - 1) \right\}$
and
$R_L^S = \min_{d \in \mathcal{D}} \left\{ d_S(1,0),\, d_S(2,0),\, \ldots,\, d_S(m_2, m_2 - 1) \right\}.$
The following proposition is similar to Proposition 1, thus its proof is omitted here.
Proposition 2.
(1) If $R \geq R_H^S$, then for any $d \in \mathcal{D}$ and for each couple $(j_1, j_2)$ with $0 \leq j_2 < j_1 \leq n_2$, we have
$G(d_S) \leq 0.$
(2) If $0 \leq R \leq R_L^S$, then for any $d \in \mathcal{D}$ and for each couple $(j_1, j_2)$ with $0 \leq j_2 < j_1 \leq n_2$, we have
$G(d_S) \geq 0.$
From Propositions 1 and 2, we define two new critical values, related to the service price, as
$R_H = \max\left\{ R_H^W, R_H^S \right\} \quad \text{and} \quad R_L = \min\left\{ R_L^W, R_L^S \right\}.$
The following theorem provides a simple summarization from Propositions 1 and 2, and it will be useful for studying the monotonicity and optimality of the asynchronous dynamic policy in our later sections.
Theorem 3.
(1) If $R \geq R_H$, then for any asynchronous dynamic policy $d \in \mathcal{D}$, we have
$G(d_W) \geq 0 \quad \text{and} \quad G(d_S) \leq 0.$
(2) If $0 \leq R \leq R_L$, then for any asynchronous policy $d \in \mathcal{D}$, we have
$G(d_W) \leq 0 \quad \text{and} \quad G(d_S) \geq 0.$
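To make the three price regimes used below concrete, the following small sketch classifies a given service price R against the critical values. The candidate threshold prices are hypothetical numbers standing in for the values $d_W(i_1,i_2)$ and $d_S(j_1,j_2)$ obtained from (22) and (30), and the sign pattern follows Theorem 3 as reconstructed above.
```python
# Hypothetical candidate threshold prices; in the model they are the values
# d_W(i1, i2) and d_S(j1, j2) obtained from (22) and (30) over all policies d in D.
dW_vals = [1.8, 2.6, 3.1]
dS_vals = [2.2, 2.9, 3.5]

R_HW, R_LW = max([0.0] + dW_vals), min(dW_vals)
R_HS, R_LS = max([0.0] + dS_vals), min(dS_vals)
R_H, R_L = max(R_HW, R_HS), min(R_LW, R_LS)

def price_regime(R: float) -> str:
    """Classify the service price R according to Theorem 3."""
    if R >= R_H:
        return "R >= R_H: G(d_W) >= 0 and G(d_S) <= 0 for every policy d"
    if 0 <= R <= R_L:
        return "0 <= R <= R_L: G(d_W) <= 0 and G(d_S) >= 0 for every policy d"
    return "R_L < R < R_H: the sign depends on d (bang-bang analysis of Section 6.3)"

for R in (1.0, 2.5, 4.0):
    print(R, "->", price_regime(R))
```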

6. Monotonicity and Optimality

In this section, we use the block-structured Poisson equation to derive a useful performance difference equation, and discuss the monotonicity and optimality of the long-run average profit of the energy-efficient data center with respect to the setup and sleep policies, respectively. Based on this, we can give the optimal asynchronous dynamic policy of the energy-efficient data center.
The standard Markov model-based formulation suffers from a number of drawbacks. First and foremost, the state space is usually too large for practical problems. That is, the number of potentials to be calculated or estimated is too large for most problems. Secondly, the generally applicable Markov model does not reflect any special structure of a particular problem. Thus, it is not clear whether and how potentials can be aggregated to save computation by exploring the special structure of the system. The sensitivity point of view and the flexible construction of the sensitivity formulas provide us a new perspective to explore alternative approaches for the performance optimization of systems with some special features.
For any given asynchronous energy-efficient policy $d \in \mathcal{D}$, the policy-based continuous-time Markov process $\left\{ X^{d}(t) : t \geq 0 \right\}$ with infinitesimal generator $Q^{d}$ given in (2) is irreducible, aperiodic, and positive recurrent. Therefore, by using a similar analysis to Ma et al. [31], the long-run average profit of the data center is given by
$\eta^{d} = \pi^{d} f^{d},$
and the Poisson equation is written as
$Q^{d} g^{d} = \eta^{d} e - f^{d}.$
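As a concrete illustration of how $\eta^{d}$ and the performance potential $g^{d}$ can be obtained numerically for a fixed policy, the sketch below solves the stationary distribution and the Poisson equation for a small hypothetical generator; the matrix `Q` and reward vector `f` are placeholders rather than the generator (2) and reward function of the data-center model, and a direct linear solve stands in for the RG-factorization used in this paper.
```python
import numpy as np

def average_profit_and_potential(Q, f):
    """For an irreducible CTMC generator Q (rows sum to zero) and reward rates f,
    return (eta, pi, g) with pi Q = 0, pi e = 1, eta = pi f and Q g = eta e - f."""
    n = Q.shape[0]
    # Stationary distribution: drop one (redundant) balance equation, add normalization.
    A = np.vstack([Q.T[:-1], np.ones(n)])
    b = np.zeros(n); b[-1] = 1.0
    pi = np.linalg.solve(A, b)
    eta = pi @ f
    # Poisson equation: g is unique up to an additive constant; replacing the first
    # column of Q by ones pins down the solution with g[0] = 0.
    B = Q.astype(float).copy()
    B[:, 0] = 1.0
    g = np.linalg.solve(B, eta * np.ones(n) - f)
    return eta, pi, g

# Hypothetical 3-state example (not the data-center generator).
Q = np.array([[-2.0, 1.0, 1.0],
              [ 1.0, -3.0, 2.0],
              [ 0.5, 0.5, -1.0]])
f = np.array([5.0, 1.0, -2.0])
eta, pi, g = average_profit_and_potential(Q, f)
print("eta =", eta, " pi =", pi, " g =", g)
```
Given such a routine, the difference formula of Lemma 1 below can be checked numerically for any pair of policies.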
For state $(n_1, n_2, n_3)$, it is seen from (2) that the asynchronous energy-efficient policy $d$ directly affects not only the elements of the infinitesimal generator $Q^{d}$ but also the reward function $f^{d}$. That is, if the asynchronous policy $d$ changes, then the infinitesimal generator $Q^{d}$ and the reward function $f^{d}$ change correspondingly. To express such a change mathematically, we take two different asynchronous energy-efficient policies $d$ and $d'$, which correspond to the infinitesimal generators $Q^{d}$ and $Q^{d'}$ and to the reward functions $f^{d}$ and $f^{d'}$, respectively.
The following lemma provides a useful equation for the difference $\eta^{d'} - \eta^{d}$ of the long-run average performances $\eta^{d'}$ and $\eta^{d}$ for any two asynchronous policies $d, d' \in \mathcal{D}$. Here, we only restate it without proof, while readers may refer to Ma et al. [31] for more details.
Lemma 1.
For any two asynchronous energy-efficient policies $d, d' \in \mathcal{D}$, we have
$\eta^{d'} - \eta^{d} = \pi^{d'} \left[ \left( Q^{d'} - Q^{d} \right) g^{d} + f^{d'} - f^{d} \right].$
Now, we describe the first role played by the performance difference, in which we set up a partial order relation in the policy set $\mathcal{D}$ so that the optimal asynchronous energy-efficient policy in the finite set $\mathcal{D}$ can be found by means of finitely many comparisons. Based on the performance difference $\eta^{d'} - \eta^{d}$ for any two asynchronous energy-efficient policies $d, d' \in \mathcal{D}$, we can set up a partial order in the policy set $\mathcal{D}$ as follows. We write $d' \succ d$ if $\eta^{d'} > \eta^{d}$; $d' \approx d$ if $\eta^{d'} = \eta^{d}$; $d' \prec d$ if $\eta^{d'} < \eta^{d}$. Furthermore, we write $d' \succeq d$ if $\eta^{d'} \geq \eta^{d}$, and $d' \preceq d$ if $\eta^{d'} \leq \eta^{d}$. By using this partial order, our research target is to find an optimal asynchronous policy $d^{*} \in \mathcal{D}$ such that $d^{*} \succeq d$ for any asynchronous energy-efficient policy $d \in \mathcal{D}$, or
$d^{*} = \arg\max_{d \in \mathcal{D}} \eta^{d}.$
Note that the policy set D and the state space Ω are both finite, thus an enumeration method is feasible for finding the optimal asynchronous energy-efficient policy d * in the policy set D . Since
$\mathcal{D} = \left\{ d = d_W \sqcup d_S : d^{W}_{m_1, 0, n_3} \in \{0, 1, \ldots, m_2\} \text{ for } 1 \leq n_3 \leq m_3;\; d^{S}_{m_1, n_2, n_3} \in \{m_2 - n_2, m_2 - n_2 + 1, \ldots, m_2\} \text{ for } (m_1, n_2, n_3) \in \Omega_2 \cup \Omega_3 \cup \cdots \cup \Omega_{m_2+1} \right\},$
it is seen that the policy set $\mathcal{D}$ contains $(m_2 + 1)^{m_3} \times 2^{m_3} \times 3^{m_3 - 1} \times \cdots \times (m_2 + 1)^{m_3 - m_2 + 1}$ elements, so that the enumeration method used to find the optimal policy would require a huge workload. However, our following work will greatly reduce the search for the optimal asynchronous policy $d^{*}$ by means of the sensitivity-based optimization theory.
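To get a feel for this enumeration workload, the following sketch simply multiplies the numbers of choices of the individual decision elements, mirroring the product above (as reconstructed); the values of $m_2$ and $m_3$ are arbitrary illustrative choices.
```python
def policy_set_size(m2: int, m3: int) -> int:
    """Number of asynchronous policies d = d_W ⊔ d_S in the set D.

    Setup part: one element d^W in {0, ..., m2} for each n3 = 1, ..., m3.
    Sleep part: for each n2 = 1, ..., m2 there are m3 - n2 + 1 elements,
    each taking one of n2 + 1 values in {m2 - n2, ..., m2}.
    """
    size = (m2 + 1) ** m3                       # setup policy d_W
    for n2 in range(1, m2 + 1):                 # sleep policy d_S, level by level
        size *= (n2 + 1) ** (m3 - n2 + 1)
    return size

for m2, m3 in [(2, 4), (3, 6), (5, 10)]:        # hypothetical group/buffer sizes
    print(f"m2 = {m2}, m3 = {m3}: |D| = {policy_set_size(m2, m3)}")
```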
Now, we discuss the monotonicity of the long-run average profit $\eta^{d}$ with respect to any asynchronous policy $d$ under the different service prices. Since the setup and sleep policies $d_W$ and $d_S$ occur asynchronously and do not interact with each other at any time, we can study the impact of the policies $d_W$ and $d_S$ on the long-run average profit $\eta^{d}$ separately. To this end, in what follows we discuss three different cases: $R \geq R_H$, $0 \leq R \leq R_L$, and $R_L < R < R_H$.

6.1. The Service Price $R \geq R_H$

In the case of $R \geq R_H$, we discuss the monotonicity and optimality with respect to two different policies: the setup policy and the sleep policy, respectively.

6.1.1. The Setup Policy with $R \geq R_H^W$

The following theorem analyzes the right-half part of the unimodal structure (see Figure 2) of the long-run average profit $\eta^{d}$ with respect to the setup policy—either $d^{W}_{m_1, 0, n_3} \in \{n_3, n_3 + 1, \ldots, m_2\}$ if $n_3 \leq m_2$ or $d^{W}_{m_1, 0, n_3} = m_2$ if $n_3 > m_2$.
Theorem 4.
For any setup policy $d_W$ with $d_W \sqcup d_S \in \mathcal{D}$ and for each $n_3 = 1, 2, \ldots, m_3$, the long-run average profit $\eta^{d_W \sqcup d_S}$ is linearly decreasing with respect to the setup decision element—either $d^{W}_{m_1, 0, n_3} \in \{n_3, n_3 + 1, \ldots, m_2\}$ if $n_3 \leq m_2$ or $d^{W}_{m_1, 0, n_3} = m_2$ if $n_3 > m_2$.
Proof of Theorem 4.
For each $n_3 = 1, 2, \ldots, m_3$, we consider two interrelated policies $d_W \sqcup d_S,\, d'_W \sqcup d_S \in \mathcal{D}$ as follows:
$d_W = \left( d^{W}_{m_1,0,1}, d^{W}_{m_1,0,2}, \ldots, d^{W}_{m_1,0,n_3-1}, \underline{d^{W}_{m_1,0,n_3}}, d^{W}_{m_1,0,n_3+1}, \ldots, d^{W}_{m_1,0,m_3} \right),$
$d'_W = \left( d^{W}_{m_1,0,1}, d^{W}_{m_1,0,2}, \ldots, d^{W}_{m_1,0,n_3-1}, \underline{n_3 \wedge m_2}, d^{W}_{m_1,0,n_3+1}, \ldots, d^{W}_{m_1,0,m_3} \right),$
where $d^{W}_{m_1,0,n_3} \geq n_3 \wedge m_2$ and $a \wedge b$ denotes $\min\{a, b\}$. It is seen that the two policies $d_W$ and $d'_W$ differ only in their corresponding decision elements $d^{W}_{m_1,0,n_3}$ and $n_3 \wedge m_2$. In this case, it is seen from Theorem 1 that $Q^{d_W \sqcup d_S} = Q^{d'_W \sqcup d_S}$ and $\pi^{d_W \sqcup d_S} = \pi^{d'_W \sqcup d_S}$. Furthermore, it is easy to check from (4) to (7) that
$f^{d_W} - f^{d'_W} = \left( 0, 0, \ldots, 0, \underline{-\left[ d^{W}_{m_1,0,n_3} - (n_3 \wedge m_2) \right] \left[ (P_{2,W} - P_{2,S}) C_1 + C_3^{(1)} \right]}, 0, \ldots, 0 \right)^{T}.$
Thus, it follows from Lemma 1 that
$\eta^{d_W \sqcup d_S} - \eta^{d'_W \sqcup d_S} = \pi^{d_W \sqcup d_S} \left[ \left( Q^{d_W \sqcup d_S} - Q^{d'_W \sqcup d_S} \right) g^{d'_W \sqcup d_S} + f^{d_W} - f^{d'_W} \right] = -\pi^{d_W \sqcup d_S}(m_1, 0, n_3) \left[ d^{W}_{m_1,0,n_3} - (n_3 \wedge m_2) \right] \left[ (P_{2,W} - P_{2,S}) C_1 + C_3^{(1)} \right]$
or
$\eta^{d_W \sqcup d_S} = \eta^{d'_W \sqcup d_S} - \pi^{d_W \sqcup d_S}(m_1, 0, n_3) \left[ d^{W}_{m_1,0,n_3} - (n_3 \wedge m_2) \right] \left[ (P_{2,W} - P_{2,S}) C_1 + C_3^{(1)} \right].$
Since $\pi^{d_W \sqcup d_S} = \pi^{d'_W \sqcup d_S}$, the probability $\pi^{d_W \sqcup d_S}(m_1, 0, n_3) = \pi^{d'_W \sqcup d_S}(m_1, 0, n_3)$ is determined by $d'^{W}_{m_1,0,n_3} = n_3 \wedge m_2$, so that $\pi^{d_W \sqcup d_S}(m_1, 0, n_3)$ does not depend on the decision element $d^{W}_{m_1,0,n_3}$. Furthermore, note that $\eta^{d'_W \sqcup d_S}$ does not depend on the decision element $d^{W}_{m_1,0,n_3}$ either, and that $P_{2,W} - P_{2,S}$, $C_1$ and $C_3^{(1)}$ are all positive constants; thus it is easy to see from (35) that the long-run average profit $\eta^{d_W \sqcup d_S}$ is linearly decreasing with respect to each decision element $d^{W}_{m_1,0,n_3}$ for $d^{W}_{m_1,0,n_3} \in \{(n_3 + 1) \wedge m_2, (n_3 + 2) \wedge m_2, \ldots, m_2\}$. It is worth noting that if $m_2 \leq n_3 \leq m_3$, then $d^{W}_{m_1,0,n_3} \in \{m_2\}$. This completes the proof. □
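As a small numerical illustration of the linear decrease in (35), the sketch below evaluates $\eta^{d_W \sqcup d_S}$ over the right-half range of a single decision element; the baseline profit, the stationary probability and the cost constants are hypothetical placeholders rather than values computed from the model.
```python
# Hypothetical inputs (placeholders, not calibrated to the data-center model).
eta_base = 50.0      # eta of the policy whose element equals n3 ∧ m2
pi_state = 0.04      # stationary probability of state (m1, 0, n3)
P2W, P2S = 10.0, 2.0 # power consumption in work and sleep states
C1, C3_1 = 0.5, 1.0  # energy price and setup (transfer) cost
slope = pi_state * ((P2W - P2S) * C1 + C3_1)

n3, m2 = 3, 5
base = min(n3, m2)                       # n3 ∧ m2
for dW in range(base, m2 + 1):           # right-half range of the decision element
    eta = eta_base - (dW - base) * slope # formula (35)
    print(f"d^W = {dW}:  eta = {eta:.3f}")
```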
In what follows, we discuss the left-half part of the unimodal structure (see Figure 2) of the long-run average profit $\eta^{d}$ with respect to each decision element $d^{W}_{m_1,0,n_3} \in \{0, 1, \ldots, n_3\}$ if $n_3 < m_2$. Compared to the analysis of the right-half part, our discussion of the left-half part is a little more complicated.
Let the optimal setup policy $d^{*}_W = \arg\max_{d_W \sqcup d_S \in \mathcal{D}} \eta^{d_W \sqcup d_S}$ be
$d^{*}_W = \left( d^{W*}_{m_1,0,1}, d^{W*}_{m_1,0,2}, \ldots, d^{W*}_{m_1,0,m_3} \right).$
Then, it is seen from Theorem 4 that
$d^{W*}_{m_1,0,n_3} \in \begin{cases} \{0, 1, \ldots, n_3\}, & 1 \leq n_3 \leq m_2, \\ \{m_2\}, & m_2 \leq n_3 \leq m_3. \end{cases}$
Hence, Theorem 4 shrinks the search area for the optimal setup policy $d^{*}_W$ from the large set $\{0, 1, \ldots, m_2\}^{m_3}$ to the much smaller area $\{0, 1\} \times \{0, 1, 2\} \times \cdots \times \{0, 1, \ldots, m_2 - 1\} \times \{0, 1, \ldots, m_2\}^{m_3 - m_2 + 1}$.
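The following sketch compares the raw search space of the setup policy with the shrunken area implied by Theorem 4 (as written above); $m_2$ and $m_3$ are again arbitrary illustrative values.
```python
from math import prod

def setup_area_raw(m2: int, m3: int) -> int:
    # One element d^W in {0, ..., m2} for every n3 = 1, ..., m3.
    return (m2 + 1) ** m3

def setup_area_shrunken(m2: int, m3: int) -> int:
    # {0,1} x {0,1,2} x ... x {0,...,m2-1} x {0,...,m2}^(m3-m2+1), per Theorem 4.
    return prod(n3 + 1 for n3 in range(1, m2)) * (m2 + 1) ** (m3 - m2 + 1)

for m2, m3 in [(3, 6), (5, 10)]:
    raw, small = setup_area_raw(m2, m3), setup_area_shrunken(m2, m3)
    print(f"m2 = {m2}, m3 = {m3}: {raw} -> {small} (x{raw // small} smaller)")
```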
To find the optimal setup policy d W * , we consider two setup policies with an interrelated structure as follows:
$d_W = \left( d^{W}_{m_1,0,1}, \ldots, d^{W}_{m_1,0,n_3-1}, \underline{d^{W}_{m_1,0,n_3}}, d^{W}_{m_1,0,n_3+1}, \ldots, d^{W}_{m_1,0,m_2}, \ldots, d^{W}_{m_1,0,m_3} \right),$
$d'_W = \left( d^{W}_{m_1,0,1}, \ldots, d^{W}_{m_1,0,n_3-1}, \underline{d'^{W}_{m_1,0,n_3}}, d^{W}_{m_1,0,n_3+1}, \ldots, d^{W}_{m_1,0,m_2}, \ldots, d^{W}_{m_1,0,m_3} \right),$
where $d'^{W}_{m_1,0,n_3} = i_1 > d^{W}_{m_1,0,n_3} = i_2$, and $d^{W}_{m_1,0,n_3}, d'^{W}_{m_1,0,n_3} \in \{1, 2, \ldots, n_3 \wedge m_2\}$. It is easy to check from (2) that $Q^{d'_W \sqcup d_S} - Q^{d_W \sqcup d_S}$ is the matrix whose entries are all zero except those in the row of state $(m_1, 0, n_3)$, which are $-(i_1 - i_2)\mu_2$, $-(m_1\mu_1 + i_2\mu_2)$ and $m_1\mu_1 + i_1\mu_2$ in the columns of states $(m_1, 0, n_3)$, $(m_1, i_2, n_3 - i_2)$ and $(m_1, i_1, n_3 - i_1)$, respectively.
On the other hand, from the reward functions given in (8), it is seen that for $n_3 = 1, 2, \ldots, m_2$ and $d^{W}_{m_1,0,n_3} \in \{0, 1, \ldots, n_3\}$,
$f^{d_W}(m_1, 0, n_3) = -\left[ (P_{2,W} - P_{2,S}) C_1 + C_3^{(1)} \right] d^{W}_{m_1,0,n_3} + R m_1 \mu_1 - \left( m_1 P_{1,W} + m_2 P_{2,S} \right) C_1 - \left( m_1 C_2^{(1)} + n_3 C_2^{(3)} \right) - \lambda \mathbf{1}_{\{n_3 = m_3\}} C_5$
and
$f^{d'_W}(m_1, 0, n_3) = -\left[ (P_{2,W} - P_{2,S}) C_1 + C_3^{(1)} \right] d'^{W}_{m_1,0,n_3} + R m_1 \mu_1 - \left( m_1 P_{1,W} + m_2 P_{2,S} \right) C_1 - \left( m_1 C_2^{(1)} + n_3 C_2^{(3)} \right) - \lambda \mathbf{1}_{\{n_3 = m_3\}} C_5.$
Hence, we have
$f^{d'_W} - f^{d_W} = \left( 0, 0, \ldots, 0, (i_2 - i_1) \mu_2 \beta_1, 0, \ldots, 0 \right)^{T}.$
We write that
$\eta^{d}\!\left( d^{W}_{m_1,0,n_3} = i \right) = \pi^{d}\!\left( d^{W}_{m_1,0,n_3} = i \right) \cdot f^{d}\!\left( d^{W}_{m_1,0,n_3} = i \right).$
The following theorem discusses the left-half part (see Figure 2) of the unimodal structure of the long-run average profit $\eta^{d}$ with respect to each decision element $d^{W}_{m_1,0,n_3} \in \{0, 1, \ldots, m_2\}$.
Theorem 5.
If $R \geq R_H^W$, then for any setup policy $d_W$ with $d_W \sqcup d_S \in \mathcal{D}$ and for each $n_3 = 1, 2, \ldots, m_2$, the long-run average profit $\eta^{d_W \sqcup d_S}$ is strictly monotone increasing with respect to each decision element $d^{W}_{m_1,0,n_3}$ for $d^{W}_{m_1,0,n_3} \in \{0, 1, \ldots, n_3\}$.
Proof of Theorem 5.
For each n 3 = 1 , 2 , , m 2 , we consider two setup policies with an interrelated structure as follows:
$d_W = \left( d^{W}_{m_1,0,1}, \ldots, d^{W}_{m_1,0,n_3-1}, \underline{d^{W}_{m_1,0,n_3}}, d^{W}_{m_1,0,n_3+1}, \ldots, d^{W}_{m_1,0,m_2}, \ldots, d^{W}_{m_1,0,m_3} \right),$
$d'_W = \left( d^{W}_{m_1,0,1}, \ldots, d^{W}_{m_1,0,n_3-1}, \underline{d'^{W}_{m_1,0,n_3}}, d^{W}_{m_1,0,n_3+1}, \ldots, d^{W}_{m_1,0,m_2}, \ldots, d^{W}_{m_1,0,m_3} \right),$
where $d'^{W}_{m_1,0,n_3} = i_1 > d^{W}_{m_1,0,n_3} = i_2$, and $d^{W}_{m_1,0,n_3}, d'^{W}_{m_1,0,n_3} \in \{0, 1, \ldots, n_3\}$. Applying Lemma 1, it follows from (36) and (37) that
$\eta^{d'_W \sqcup d_S} - \eta^{d_W \sqcup d_S} = \pi^{d'_W \sqcup d_S} \left[ \left( Q^{d'_W \sqcup d_S} - Q^{d_W \sqcup d_S} \right) g^{d_W \sqcup d_S} + f^{d'_W} - f^{d_W} \right]$
$= \pi^{d'_W \sqcup d_S}(m_1, 0, n_3) \left[ -(i_1 - i_2)\mu_2\, g^{d}(m_1, 0, n_3) - (m_1\mu_1 + i_2\mu_2)\, g^{d}(m_1, i_2, n_3 - i_2) + (m_1\mu_1 + i_1\mu_2)\, g^{d}(m_1, i_1, n_3 - i_1) + (i_2 - i_1)\mu_2 \beta_1 \right]$
$= \pi^{d'_W \sqcup d_S}(m_1, 0, n_3) \left[ m_1\mu_1 G_1(d_W) - i_1\mu_2 \left( G_2(d_W) + \beta_1 \right) + i_2\mu_2 \left( G_3(d_W) + \beta_1 \right) \right]$
$= \pi^{d'_W \sqcup d_S}(m_1, 0, n_3)\, G(d_W).$
If $R \geq R_H^W$, then it is seen from Proposition 1 that $G(d_W) \geq 0$. Thus, we get that for the two policies $d_W \sqcup d_S,\, d'_W \sqcup d_S \in \mathcal{D}$ with $d'^{W}_{m_1,0,n_3} > d^{W}_{m_1,0,n_3}$ and $d^{W}_{m_1,0,n_3}, d'^{W}_{m_1,0,n_3} \in \{0, 1, \ldots, n_3\}$,
$\eta^{d'_W \sqcup d_S} > \eta^{d_W \sqcup d_S}.$
This shows that
$\eta^{d}\!\left( d^{W}_{m_1,0,n_3} = 1 \right) < \eta^{d}\!\left( d^{W}_{m_1,0,n_3} = 2 \right) < \cdots < \eta^{d}\!\left( d^{W}_{m_1,0,n_3} = n_3 - 1 \right) < \eta^{d}\!\left( d^{W}_{m_1,0,n_3} = n_3 \right).$
This completes the proof. □
When $R \geq R_H^W$, we now use Figure 2 to provide an intuitive summary of the main results given in Theorems 4 and 5. In the right-half part of Figure 2,
$\eta^{d_W \sqcup d_S} = \eta^{d'_W \sqcup d_S} - \pi^{d_W \sqcup d_S}(m_1, 0, n_3) \left( d^{W}_{m_1,0,n_3} - n_3 \right) \left[ (P_{2,W} - P_{2,S}) C_1 + C_3^{(1)} \right]$
shows that $\eta^{d_W \sqcup d_S}$ is a linear function of the decision element $d^{W}_{m_1,0,n_3}$. By contrast, in the left-half part of Figure 2, we first need to introduce a restrictive condition, $R \geq R_H^W$, under which
$\eta^{d'_W \sqcup d_S} - \eta^{d_W \sqcup d_S} = \pi^{d'_W \sqcup d_S}(m_1, 0, n_3)\, G(d_W).$
Since $G(d_W)$ also depends on the decision element $d^{W}_{m_1,0,n_3}$, it is clear that $\eta^{d_W \sqcup d_S}$ is a nonlinear function of the decision element $d^{W}_{m_1,0,n_3}$.

6.1.2. The Sleep Policy with $R \geq R_H^S$

The sleep policy differs from the setup policy in that each of its decision elements satisfies $d^{S}_{m_1,n_2,n_3} \in \{m_2 - n_2, m_2 - n_2 + 1, \ldots, m_2\}$. Hence, we only consider the structural properties of the long-run average profit $\eta^{d_W \sqcup d_S}$ with respect to each decision element $d^{S}_{m_1,n_2,n_3}$.
We write the optimal sleep policy as $d^{*}_S = \arg\max_{d_W \sqcup d_S \in \mathcal{D}} \eta^{d_W \sqcup d_S}$, where
$d^{*}_S = \left( d^{S*}_{m_1,1,0}, d^{S*}_{m_1,1,1}, \ldots, d^{S*}_{m_1,1,m_3-1};\; d^{S*}_{m_1,2,0}, \ldots, d^{S*}_{m_1,2,m_3-2};\; \ldots;\; d^{S*}_{m_1,m_2,0}, \ldots, d^{S*}_{m_1,m_2,m_3-m_2} \right).$
Then, it is seen that
$d^{S*}_{m_1,1,0} \in \{m_2 - 1, m_2\}, \ldots, d^{S*}_{m_1,1,m_3-1} \in \{m_2 - 1, m_2\};$
$d^{S*}_{m_1,2,0} \in \{m_2 - 2, m_2 - 1, m_2\}, \ldots, d^{S*}_{m_1,2,m_3-2} \in \{m_2 - 2, m_2 - 1, m_2\};$
$\vdots$
$d^{S*}_{m_1,m_2,0} \in \{0, 1, \ldots, m_2\}, \ldots, d^{S*}_{m_1,m_2,m_3-m_2} \in \{0, 1, \ldots, m_2\}.$
It is easy to see that the search area for the optimal sleep policy $d^{*}_S$ is $\{m_2 - 1, m_2\}^{m_3} \times \{m_2 - 2, m_2 - 1, m_2\}^{m_3 - 1} \times \cdots \times \{0, 1, \ldots, m_2\}^{m_3 - m_2 + 1}$.
To find the optimal sleep policy d S * , we consider two sleep policies with an interrelated structure as follows:
$d_S = \left( d^{S}_{m_1,1,0}, d^{S}_{m_1,1,1}, \ldots, d^{S}_{m_1,n_2,n_3-1}, \underline{d^{S}_{m_1,n_2,n_3}}, d^{S}_{m_1,n_2,n_3+1}, \ldots, d^{S}_{m_1,m_2,m_3-m_2} \right),$
$d'_S = \left( d^{S}_{m_1,1,0}, d^{S}_{m_1,1,1}, \ldots, d^{S}_{m_1,n_2,n_3-1}, \underline{d'^{S}_{m_1,n_2,n_3}}, d^{S}_{m_1,n_2,n_3+1}, \ldots, d^{S}_{m_1,m_2,m_3-m_2} \right),$
where $d'^{S}_{m_1,n_2,n_3} = m_2 - j_2 > d^{S}_{m_1,n_2,n_3} = m_2 - j_1$ with $0 \leq j_2 < j_1 \leq n_2$. It is easy to check from (2) that $Q^{d_W \sqcup d'_S} - Q^{d_W \sqcup d_S}$ is the matrix whose entries are all zero except those in the row of state $(m_1, n_2, n_3)$, which are $m_1\mu_1 + j_2\mu_2$, $-(m_1\mu_1 + j_1\mu_2)$ and $-(j_2 - j_1)\mu_2$ in the columns of states $(m_1, j_2, n_3 + n_2 - j_2)$, $(m_1, j_1, n_3 + n_2 - j_1)$ and $(m_1, n_2, n_3)$, respectively.
On the other hand, from the reward functions given in (6), since $d^{S}_{m_1,n_2,n_3}$ and $d'^{S}_{m_1,n_2,n_3}$ lie in either $\{m_2 - n_2, m_2 - n_2 + 1, \ldots, m_2\}$ for $1 \leq n_2 \leq m_2$ and $0 \leq n_3 \leq m_3 - n_2$, or $\{0, 1, \ldots, m_2\}$ for $n_2 = m_2$ and $0 \leq n_3 \leq m_3 - m_2$, we have
$f^{d_S}(m_1, n_2, n_3) = -\left[ R\mu_2 - (P_{2,W} - P_{2,S}) C_1 + C_3^{(2)} \right] d^{S}_{m_1,n_2,n_3} + R\left( m_1\mu_1 + m_2\mu_2 \right) - \left( m_1 P_{1,W} + m_2 P_{2,W} \right) C_1 - \left( m_1 C_2^{(1)} + n_2 C_2^{(2)} + n_3 C_2^{(3)} \right) - \left( n_2 - m_2 \right) C_3^{(2)} - m_1\mu_1 \mathbf{1}_{\{n_3 = 0\}} C_4$
and
$f^{d'_S}(m_1, n_2, n_3) = -\left[ R\mu_2 - (P_{2,W} - P_{2,S}) C_1 + C_3^{(2)} \right] d'^{S}_{m_1,n_2,n_3} + R\left( m_1\mu_1 + m_2\mu_2 \right) - \left( m_1 P_{1,W} + m_2 P_{2,W} \right) C_1 - \left( m_1 C_2^{(1)} + n_2 C_2^{(2)} + n_3 C_2^{(3)} \right) - \left( n_2 - m_2 \right) C_3^{(2)} - m_1\mu_1 \mathbf{1}_{\{n_3 = 0\}} C_4.$
Hence, we have
$f^{d'_S} - f^{d_S} = \left( 0, 0, \ldots, 0, -(j_2 - j_1)\mu_2 \beta_2, 0, \ldots, 0 \right)^{T}.$
We write
$\eta^{d}\!\left( d^{S}_{m_1,n_2,n_3} = m_2 - j \right) = \pi^{d}\!\left( d^{S}_{m_1,n_2,n_3} = m_2 - j \right) \cdot f^{d}\!\left( d^{S}_{m_1,n_2,n_3} = m_2 - j \right).$
The following theorem discusses the structure of the long-run average profit $\eta^{d_W \sqcup d_S}$ with respect to each decision element $d^{S}_{m_1,n_2,n_3}$.
Theorem 6.
If $R \geq R_H^S$, then for any sleep policy $d_S$ with $d_W \sqcup d_S \in \mathcal{D}$ and for each $n_2 = 1, 2, \ldots, m_2$ and $n_3 = 0, 1, \ldots, m_3 - n_2$, the long-run average profit $\eta^{d}$ is strictly monotone decreasing with respect to each decision element $d^{S}_{m_1,n_2,n_3} \in \{m_2 - n_2, m_2 - n_2 + 1, \ldots, m_2\}$.
Proof of Theorem 6.
For each $n_2 = 1, 2, \ldots, m_2$ and $n_3 = 1, 2, \ldots, m_3 - n_2$, we consider two sleep policies with an interrelated structure as follows:
$d_S = \left( d^{S}_{m_1,1,0}, \ldots, d^{S}_{m_1,n_2,n_3-1}, \underline{d^{S}_{m_1,n_2,n_3}}, d^{S}_{m_1,n_2,n_3+1}, \ldots, d^{S}_{m_1,m_2,m_3-m_2} \right),$
$d'_S = \left( d^{S}_{m_1,1,0}, \ldots, d^{S}_{m_1,n_2,n_3-1}, \underline{d'^{S}_{m_1,n_2,n_3}}, d^{S}_{m_1,n_2,n_3+1}, \ldots, d^{S}_{m_1,m_2,m_3-m_2} \right),$
where $d'^{S}_{m_1,n_2,n_3} = m_2 - j_2 > d^{S}_{m_1,n_2,n_3} = m_2 - j_1$ with $0 \leq j_2 < j_1 \leq n_2$ and $d^{S}_{m_1,n_2,n_3}, d'^{S}_{m_1,n_2,n_3} \in \{m_2 - n_2, m_2 - n_2 + 1, \ldots, m_2\}$. Applying Lemma 1, it follows from (39) and (40) that
$\eta^{d_W \sqcup d'_S} - \eta^{d_W \sqcup d_S} = \pi^{d_W \sqcup d'_S} \left[ \left( Q^{d_W \sqcup d'_S} - Q^{d_W \sqcup d_S} \right) g^{d_W \sqcup d_S} + f^{d'_S} - f^{d_S} \right]$
$= \pi^{d_W \sqcup d'_S}(m_1, n_2, n_3) \left[ (m_1\mu_1 + j_2\mu_2)\, g^{d}(m_1, j_2, n_3 + n_2 - j_2) - (m_1\mu_1 + j_1\mu_2)\, g^{d}(m_1, j_1, n_3 + n_2 - j_1) - (j_2 - j_1)\mu_2\, g^{d}(m_1, n_2, n_3) - (j_2 - j_1)\mu_2 \beta_2 \right]$
$= \pi^{d_W \sqcup d'_S}(m_1, n_2, n_3) \left[ m_1\mu_1 G_1(d_S) + j_1\mu_2 \left( G_2(d_S) + \beta_2 \right) - j_2\mu_2 \left( G_3(d_S) + \beta_2 \right) \right]$
$= \pi^{d_W \sqcup d'_S}(m_1, n_2, n_3)\, G(d_S).$
It is worthwhile to note that (41) has the same form as (38), since in the perturbation of the sleep policy, $j_1$ and $j_2$ denote the numbers of working servers. If $R \geq R_H^S$, then it is seen from Proposition 2 that for $j_2 < j_1$ we have $G(d_S) \leq 0$. Thus, we get that for $j_2 < j_1$ and $j_1, j_2 \in \{0, 1, \ldots, n_2\}$,
$\eta^{d_W \sqcup d_S} > \eta^{d_W \sqcup d'_S};$
this shows that $\eta^{d_W \sqcup d_S}$ is strictly monotone increasing with respect to $m_2 - d^{S}_{m_1,n_2,n_3}$. Thus, we get that for the two policies $d_W \sqcup d_S,\, d_W \sqcup d'_S \in \mathcal{D}$ with $d'^{S}_{m_1,n_2,n_3} > d^{S}_{m_1,n_2,n_3}$ and $d^{S}_{m_1,n_2,n_3}, d'^{S}_{m_1,n_2,n_3} \in \{m_2 - n_2, m_2 - n_2 + 1, \ldots, m_2\}$,
$\eta^{d_W \sqcup d'_S} < \eta^{d_W \sqcup d_S}.$
It is easy to see that
$\eta^{d}\!\left( d^{S}_{m_1,n_2,n_3} = m_2 - n_2 \right) > \eta^{d}\!\left( d^{S}_{m_1,n_2,n_3} = m_2 - n_2 + 1 \right) > \cdots > \eta^{d}\!\left( d^{S}_{m_1,n_2,n_3} = m_2 \right).$
This completes the proof. □
When $R \geq R_H^S$, we now use Figure 3 to provide an intuitive summary of the main results given in Theorem 6. According to (41), $G(d_S)$ depends on $d^{S}_{m_1,n_2,n_3}$, so it is clear that $\eta^{d_W \sqcup d_S}$ is a nonlinear function of $d^{S}_{m_1,n_2,n_3}$.
As a simple summary of Theorems 5 and 6, we obtain the monotone structure of the long-run average profit $\eta^{d}$ with respect to the asynchronous energy-efficient policy; its proof follows directly from the fact that $R \geq R_H$ implies both $R \geq R_H^W$ and $R \geq R_H^S$.
Theorem 7.
If $R \geq R_H$, then for any policy $d \in \mathcal{D}$, the long-run average profit $\eta^{d}$ is strictly monotone with respect to each decision element of $d_W$ and of $d_S$, respectively.

6.2. The Service Price $0 \leq R \leq R_L$

By a similar analysis to that of the case $R \geq R_H$, we briefly discuss the monotonicity and optimality under the service price $0 \leq R \leq R_L$ for two different policies: the setup policy and the sleep policy.

6.2.1. The Setup Policy with $0 \leq R \leq R_L^W$

Theorem 8.
If $0 \leq R \leq R_L^W$, then for any setup policy $d_W$ with $d_W \sqcup d_S \in \mathcal{D}$ and for each $n_3 = 1, 2, \ldots, m_2$, the long-run average profit $\eta^{d_W \sqcup d_S}$ is strictly monotone decreasing with respect to each decision element $d^{W}_{m_1,0,n_3} \in \{0, 1, \ldots, n_3\}$.
Proof of Theorem 8.
This proof is similar to that of Theorem 5. For each n 3 = 1 , 2 , , m 2 , we consider two setup policies with an interrelated structure as follows:
$d_W = \left( d^{W}_{m_1,0,1}, \ldots, d^{W}_{m_1,0,n_3-1}, \underline{d^{W}_{m_1,0,n_3}}, d^{W}_{m_1,0,n_3+1}, \ldots, d^{W}_{m_1,0,m_2}, \ldots, d^{W}_{m_1,0,m_3} \right),$
$d'_W = \left( d^{W}_{m_1,0,1}, \ldots, d^{W}_{m_1,0,n_3-1}, \underline{d'^{W}_{m_1,0,n_3}}, d^{W}_{m_1,0,n_3+1}, \ldots, d^{W}_{m_1,0,m_2}, \ldots, d^{W}_{m_1,0,m_3} \right),$
where $d'^{W}_{m_1,0,n_3} = i_1 > d^{W}_{m_1,0,n_3} = i_2$, and $d^{W}_{m_1,0,n_3}, d'^{W}_{m_1,0,n_3} \in \{0, 1, \ldots, n_3\}$ for $1 \leq n_3 \leq m_2$. It is clear that
$\eta^{d'_W \sqcup d_S} - \eta^{d_W \sqcup d_S} = \pi^{d'_W \sqcup d_S}(m_1, 0, n_3)\, G(d_W).$
If $0 \leq R \leq R_L^W$, then it is seen from Proposition 1 that $G(d_W) \leq 0$. Thus, we get that for the two setup policies with $d^{W}_{m_1,0,n_3} < d'^{W}_{m_1,0,n_3}$ and $d^{W}_{m_1,0,n_3}, d'^{W}_{m_1,0,n_3} \in \{0, 1, \ldots, n_3\}$,
$\eta^{d'_W \sqcup d_S} < \eta^{d_W \sqcup d_S}.$
This shows that for $1 \leq n_3 \leq m_2$,
$\eta^{d}\!\left( d^{W}_{m_1,0,n_3} = 1 \right) > \eta^{d}\!\left( d^{W}_{m_1,0,n_3} = 2 \right) > \cdots > \eta^{d}\!\left( d^{W}_{m_1,0,n_3} = n_3 - 1 \right) > \eta^{d}\!\left( d^{W}_{m_1,0,n_3} = n_3 \right).$
This completes the proof. □
When $0 \leq R \leq R_L^W$, we also use Figure 4 to provide an intuitive summary of the main results given in Theorems 4 and 8.

6.2.2. The Sleep Policy with $0 \leq R \leq R_L^S$

Theorem 9.
If $0 \leq R \leq R_L^S$, then for any sleep policy $d_S$ with $d_W \sqcup d_S \in \mathcal{D}$ and for each $n_2 = 1, 2, \ldots, m_2$ and $n_3 = 0, 1, \ldots, m_3 - n_2$, the long-run average profit $\eta^{d_W \sqcup d_S}$ is strictly monotone increasing with respect to each decision element $d^{S}_{m_1,n_2,n_3} \in \{m_2 - n_2, m_2 - n_2 + 1, \ldots, m_2\}$.
Proof of Theorem 9.
This proof is similar to that of Theorem 6. For each $n_2 = 1, 2, \ldots, m_2$ and $n_3 = 0, 1, \ldots, m_3 - n_2$, we consider two sleep policies with an interrelated structure as follows:
$d_S = \left( d^{S}_{m_1,1,0}, d^{S}_{m_1,1,1}, \ldots, d^{S}_{m_1,n_2,n_3-1}, \underline{d^{S}_{m_1,n_2,n_3}}, d^{S}_{m_1,n_2,n_3+1}, \ldots, d^{S}_{m_1,m_2,m_3-m_2} \right),$
$d'_S = \left( d^{S}_{m_1,1,0}, d^{S}_{m_1,1,1}, \ldots, d^{S}_{m_1,n_2,n_3-1}, \underline{d'^{S}_{m_1,n_2,n_3}}, d^{S}_{m_1,n_2,n_3+1}, \ldots, d^{S}_{m_1,m_2,m_3-m_2} \right),$
where $d'^{S}_{m_1,n_2,n_3} = m_2 - j_2 > d^{S}_{m_1,n_2,n_3} = m_2 - j_1$ with $0 \leq j_2 < j_1 \leq n_2$ and $d^{S}_{m_1,n_2,n_3}, d'^{S}_{m_1,n_2,n_3} \in \{m_2 - n_2, m_2 - n_2 + 1, \ldots, m_2\}$. It is clear that
$\eta^{d_W \sqcup d'_S} - \eta^{d_W \sqcup d_S} = \pi^{d_W \sqcup d'_S}(m_1, n_2, n_3)\, G(d_S).$
By a similar analysis to that in Theorem 6, if $0 \leq R \leq R_L^S$, then it is seen from Proposition 2 that for $j_2 < j_1$ we have $G(d_S) \geq 0$. Thus, for $j_2 < j_1$ and $j_1, j_2 \in \{0, 1, \ldots, n_2\}$, the profit $\eta^{d_W \sqcup d_S}$ is strictly monotone decreasing with respect to $m_2 - d^{S}_{m_1,n_2,n_3}$, and hence it is strictly monotone increasing with respect to $d^{S}_{m_1,n_2,n_3}$. Thus, we get that for the two sleep policies with $d^{S}_{m_1,n_2,n_3} < d'^{S}_{m_1,n_2,n_3}$ and $d^{S}_{m_1,n_2,n_3}, d'^{S}_{m_1,n_2,n_3} \in \{m_2 - n_2, m_2 - n_2 + 1, \ldots, m_2\}$,
$\eta^{d_W \sqcup d_S} < \eta^{d_W \sqcup d'_S}.$
This shows that
$\eta^{d}\!\left( d^{S}_{m_1,n_2,n_3} = m_2 - n_2 \right) < \eta^{d}\!\left( d^{S}_{m_1,n_2,n_3} = m_2 - n_2 + 1 \right) < \cdots < \eta^{d}\!\left( d^{S}_{m_1,n_2,n_3} = m_2 \right).$
This completes the proof. □
When $0 \leq R \leq R_L^S$, we also use Figure 5 to provide an intuitive summary of the main results given in Theorem 9.
As a simple summary of Theorems 8 and 9, the following theorem further describes the monotone structure of the long-run average profit $\eta^{d}$ with respect to the asynchronous energy-efficient policy; its proof follows directly from the fact that $0 \leq R \leq R_L$ implies both $0 \leq R \leq R_L^W$ and $0 \leq R \leq R_L^S$.
Theorem 10.
If $0 \leq R \leq R_L$, then for any asynchronous energy-efficient policy $d \in \mathcal{D}$, the long-run average profit $\eta^{d}$ is strictly monotone with respect to each decision element of $d_W$ and of $d_S$, respectively.
In the remainder of this section, we discuss a more complicated case with the service price $R_L < R < R_H$. In this case, we use the bang–bang control and the asynchronous structure of $d \in \mathcal{D}$ to prove that the optimal asynchronous energy-efficient policies $d^{*}_W$ and $d^{*}_S$ both have bang–bang control forms.
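Although the formal argument is given below, the computational content of a bang–bang optimum is simple: each decision element only needs to be compared at the two endpoints of its feasible range. The sketch below performs such a coordinate-wise endpoint search; it is a heuristic illustration rather than the algorithm of this paper, and `evaluate_eta` is a hypothetical stand-in for computing the long-run average profit of a full policy (for instance, via the Poisson-equation solve sketched earlier).
```python
from typing import Callable, Dict, Tuple

def bang_bang_search(elements: Dict[str, Tuple[int, int]],
                     evaluate_eta: Callable[[Dict[str, int]], float]) -> Dict[str, int]:
    """Coordinate-wise search over endpoint values of every decision element.

    elements     : maps an element name to its feasible interval (lower, upper),
                   e.g. a setup element to (0, n3) or a sleep element to (m2 - n2, m2).
    evaluate_eta : returns the long-run average profit of a complete policy.
    """
    policy = {name: lo for name, (lo, hi) in elements.items()}
    improved = True
    while improved:                               # sweep until no element improves
        improved = False
        for name, (lo, hi) in elements.items():
            best_val, best_eta = policy[name], evaluate_eta(policy)
            for candidate in (lo, hi):            # bang-bang: endpoints only
                trial = dict(policy, **{name: candidate})
                eta = evaluate_eta(trial)
                if eta > best_eta:
                    best_val, best_eta, improved = candidate, eta, True
            policy[name] = best_val
    return policy
```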

6.3. The Service Price $R_L < R < R_H$

For the price $R_L < R < R_H$, we can further derive the following theorems about the monotonicity of $\eta^{d}$ with respect to the setup policy and the sleep policy, respectively.

6.3.1. The Setup Policy with $R_L^W < R < R_H^W$

For the service price $R_L^W < R < R_H^W$, the following theorem provides the monotonicity of $\eta^{d}$ with respect to the decision element $d^{W}_{m_1,0,n_3}$.
Theorem 11.
If $R_L^W < R < R_H^W$, then the long-run average profit $\eta^{d_W \sqcup d_S}$ is monotone (either increasing or decreasing) with respect to the decision element $d^{W}_{m_1,0,n_3}$, where $n_3 = 1, 2, \ldots, m_3$ and $d^{W}_{m_1,0,n_3} \in \{0, 1, \ldots, n_3\}$.
Proof of Theorem 11.
Similarly to the first part of the proof for Theorem 5, we consider any two setup policies with an interrelated structure as follows:
$d_W = \left( d^{W}_{m_1,0,1}, \ldots, d^{W}_{m_1,0,n_3-1}, \underline{d^{W}_{m_1,0,n_3}}, d^{W}_{m_1,0,n_3+1}, \ldots, d^{W}_{m_1,0,m_2}, \ldots, d^{W}_{m_1,0,m_3} \right),$
$d'_W = \left( d^{W}_{m_1,0,1}, \ldots, d^{W}_{m_1,0,n_3-1}, \underline{d'^{W}_{m_1,0,n_3}}, d^{W}_{m_1,0,n_3+1}, \ldots, d^{W}_{m_1,0,m_2}, \ldots, d^{W}_{m_1,0,m_3} \right),$
where $d^{W}_{m_1,0,n_3}, d'^{W}_{m_1,0,n_3} \in \{0, 1, \ldots, n_3\}$. Applying Lemma 1, we obtain
$\eta^{d'_W \sqcup d_S} - \eta^{d_W \sqcup d_S} = \pi^{d'_W \sqcup d_S}(m_1, 0, n_3)\, G(d_W).$
On the other hand, we can similarly obtain the following difference equation
$\eta^{d_W \sqcup d_S} - \eta^{d'_W \sqcup d_S} = \pi^{d_W \sqcup d_S}(m_1, 0, n_3)\, G(d_W).$
By summing (42) and (43), we have
π d W d S